47 Replies. Latest reply on Jun 8, 2010 9:10 AM by timfox
      • 15. Re: many topics, paused queues, memory growing
        hantunca

        Tim, I do see the ThroughputConnectionFactory. It looks like the only difference is the batch-delay parameter. What does that parameter represent: the number of messages to wait for before sending, or a number of milliseconds?

        • 16. Re: many topics, paused queues, memory growing
          timfox

          It's the max delay in ms.
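A maximum batch delay generally means that writes are buffered and flushed together, with no write waiting longer than the configured number of milliseconds. A minimal sketch of that general technique follows. It is not HornetQ's implementation, and `BatchingWriter`, `maxBatchSize` and `tick()` are hypothetical names; time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a time-bounded write batcher; not HornetQ code.
class BatchingWriter {
    private final long maxDelayMs;   // analogous to a "batch-delay" setting, in milliseconds
    private final int maxBatchSize;  // hypothetical size cap, an assumption of this sketch
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();
    private long oldestWriteAt = -1; // timestamp of the first buffered write, -1 if empty

    BatchingWriter(long maxDelayMs, int maxBatchSize) {
        this.maxDelayMs = maxDelayMs;
        this.maxBatchSize = maxBatchSize;
    }

    // Buffer a write; flush immediately if the batch is full.
    void write(String msg, long nowMs) {
        if (buffer.isEmpty()) {
            oldestWriteAt = nowMs;
        }
        buffer.add(msg);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    // Called periodically (e.g. by a timer): flush once the oldest write has waited maxDelayMs.
    void tick(long nowMs) {
        if (!buffer.isEmpty() && nowMs - oldestWriteAt >= maxDelayMs) {
            flush();
        }
    }

    private void flush() {
        flushed.add(new ArrayList<>(buffer));
        buffer.clear();
        oldestWriteAt = -1;
    }

    List<List<String>> flushedBatches() {
        return flushed;
    }
}
```

A larger delay lets more writes share one flush, which tends to help throughput at the cost of up to that many milliseconds of extra latency per message; that trade-off is presumably why the parameter distinguishes a throughput-oriented connection factory.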

          • 17. Re: many topics, paused queues, memory growing
            timfox

            Anyway, to come back to the original point: please create a test case, attach it to a JIRA as explained on the wiki, and someone will take a look.

            • 18. Re: many topics, paused queues, memory growing
              hantunca

              OK, I've had a chance to peer into the code a little and try to figure out what's going on. First, let me explain my setup. I create an embedded core server. Once the core server is set up, I create embedded producers that start publishing messages. At this point there are no consumers for the addresses, so no queues are created. Then I start a non-embedded consumer that creates a queue for one of the addresses. When I start a consumer, sometimes messages start flowing and sometimes they don't. If messages are not flowing, I can call the resume method through the QueueControl interface and messages start flowing. It's hard to create a test case for this because the problem is intermittent.

               

              Now, looking through the code, the problem seems to occur in the QueueImpl class, specifically in the directDeliver method. In the cases where messages aren't flowing, the call to handlers.isEmpty() returns true. What happens is that directDeliver is called before addConsumer(). At that point directDeliver returns false to the add method, which then sets the 'direct' variable to false, and, as far as I can tell, 'direct' is not set back until the resume method is called. In the cases where messages flow as expected, addConsumer was called before directDeliver.

               

              This looks like a timing issue, though I'm not sure. I think one way to fix it is to reset the 'direct' variable at the end of addConsumer.
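The sequence described above can be modelled in a few lines. This is a hypothetical, single-threaded simplification that borrows the names from the post (`add`, `directDeliver`, `addConsumer`, `resume`, `direct`, `handlers`); it is not the 2.0 `QueueImpl` code. With `resetDirectOnAddConsumer` set, it applies the proposed fix, and it also drains the backlog before re-enabling direct delivery so queued messages are not overtaken by new ones (an assumption that goes slightly beyond simply resetting the flag).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, heavily simplified model of the behaviour described in the post; not HornetQ code.
class DirectDeliveryQueue {
    private final List<Consumer<String>> handlers = new ArrayList<>();
    private final Deque<String> backlog = new ArrayDeque<>();
    private boolean direct = true;
    private final boolean resetDirectOnAddConsumer; // the fix proposed in the post

    DirectDeliveryQueue(boolean resetDirectOnAddConsumer) {
        this.resetDirectOnAddConsumer = resetDirectOnAddConsumer;
    }

    void add(String msg) {
        if (direct && directDeliver(msg)) {
            return;
        }
        direct = false; // fall back to queued delivery; stays false until reset
        backlog.add(msg);
    }

    // Returns false when there is no handler yet, which is the case described in the post.
    private boolean directDeliver(String msg) {
        if (handlers.isEmpty()) {
            return false;
        }
        handlers.get(0).accept(msg);
        return true;
    }

    void addConsumer(Consumer<String> handler) {
        handlers.add(handler);
        if (resetDirectOnAddConsumer) {
            drainBacklog(); // deliver queued messages first to preserve order
            direct = true;
        }
    }

    void resume() {
        drainBacklog();
        direct = true;
    }

    private void drainBacklog() {
        while (!handlers.isEmpty() && !backlog.isEmpty()) {
            handlers.get(0).accept(backlog.poll());
        }
    }

    int backlogSize() {
        return backlog.size();
    }
}
```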

              • 19. Re: many topics, paused queues, memory growing
                timfox

                Please try 2.1.0 final. We've made lots of changes in delivery since 2.0.

                • 20. Re: many topics, paused queues, memory growing
                  hantunca

                  I tested 2.1.0 maybe 3-4 weeks ago and it had the same problem. In addition, I found that 2.1.0 had higher latency than 2.0.0. Since latency is critical for my app, I'll have to stay with 2.0.0 unless 2.1.0 can match its speed.

                  • 21. Re: many topics, paused queues, memory growing
                    timfox

                    You really should try 2.1.0.

                    "handlers" and the directDeliver method haven't existed since 2.0; that was months ago.

                    Also, regarding latency: by default, 2.1 latency should be exactly the same as 2.0. This was added since you last tried trunk.

                    This is discussed in the performance tuning chapter of the 2.1.0 documentation, which covers a set of new options for tuning throughput and latency.

                    Reporting issues against 2.0 doesn't really help us. We only investigate issues against TRUNK, which right now is 2.1. Your explanation doesn't really help, since, as I said, those methods don't exist any more anyway.

                    • 22. Re: many topics, paused queues, memory growing
                      hantunca

                      Tim, I just tried 2.1.0 straight from trunk, and on the very first run I saw the same issue, i.e. the queue starts to back up but is not in a 'paused' state. I won't be able to look further into the 2.1.0 codebase until early next week, but from what I could tell in the 2.0.0 codebase, there is a timing/threading issue: if messages start flowing before a handler is created, they back up in the queue, causing the problem. In most cases a handler is created first and then messages start flowing.

                       

                      I'm not sure if I'm the first to see this. If I am, perhaps it's because I create addresses and start publishing to them before any consumers are created?

                       

                      thanks,

                      Han

                      • 23. Re: many topics, paused queues, memory growing
                        hantunca

                        I've had a chance to look into the 2.1.0 codebase, and here's what I think is going on:

                         

                        In the 'add' method, the message reference is added to the messageReferences collection. If the collection then holds exactly one message, deliverAsync() is called, which causes an executor to call the deliver() method. At this point, because there are no consumers, deliver() exits. In the meantime, more messages are added to messageReferences via the 'add' method, but now the count is greater than 1, so deliverAsync() is never called again. As a result, messages are never taken out of messageReferences and keep queuing up. In fact, I can't see where, if anywhere, messages get taken out automatically once messageReferences holds more than one message. Looking at the 'resume' method, I see a call to deliverAsync(), which then starts pulling messages off messageReferences and sending them out (which explains why messages start flowing again when I call resume on the queue).
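To make the reasoning above concrete, here is a hypothetical, simplified model of the behaviour as described; it is not HornetQ's `QueueImpl`, and `TriggerOnFirstQueue` is an invented name. `add()` schedules `deliverAsync()` only when the collection goes from empty to one element, and `deliver()` returns early when there is no consumer. The `Executor` is injected so the example can run deliveries inline for determinism.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical simplified model of the 2.1 behaviour described in the post; not HornetQ code.
class TriggerOnFirstQueue {
    private final LinkedList<String> messageReferences = new LinkedList<>();
    private final List<Consumer<String>> consumers = new ArrayList<>();
    private final Executor executor;

    TriggerOnFirstQueue(Executor executor) {
        this.executor = executor;
    }

    synchronized void add(String ref) {
        messageReferences.add(ref);
        // Delivery is only scheduled when the collection goes from empty to one element.
        if (messageReferences.size() == 1) {
            deliverAsync();
        }
    }

    synchronized void addConsumer(Consumer<String> consumer) {
        consumers.add(consumer);
    }

    // Also the entry point a resume()-style call would use.
    void deliverAsync() {
        executor.execute(this::deliver);
    }

    private synchronized void deliver() {
        if (consumers.isEmpty()) {
            return; // no consumer yet: exit and leave the references where they are
        }
        while (!messageReferences.isEmpty()) {
            consumers.get(0).accept(messageReferences.poll());
        }
    }

    synchronized int size() {
        return messageReferences.size();
    }
}
```

In this model the backlog only drains when something else calls `deliverAsync()`; whether the real server has such a trigger is exactly the question the thread goes on to discuss.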

                         

                        To reproduce this, you should be able to modify the addConsumer method in the QueueImpl class to 'delay' adding the consumer to the consumerList/consumerSet collections while messages are being added. I believe that to replicate this on your system, you could:

                         

                        - modify the addConsumer method so that, instead of adding a new consumer right away, it creates a timer to add it in 20 seconds (or whatever delay you choose).

                        - start up a core session, create a producer, and start publishing messages (at a high rate).

                        - create a consumer (in another JVM) and start consuming ticks.
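The timer modification in the first step might look roughly like the following. This is a hypothetical stand-alone sketch: `DelayedConsumerRegistry` is an invented name, its fields only mirror the `consumerList`/`consumerSet` names from the post, and a `ScheduledExecutorService` stands in for whatever timer is actually used. A delay like this deliberately changes the ordering of events, so it is a debugging aid rather than a faithful model of an unmodified server.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical test hack: register consumers only after a delay, widening the window
// in which messages arrive before any consumer is known. Not HornetQ code.
class DelayedConsumerRegistry<C> {
    final List<C> consumerList = new CopyOnWriteArrayList<>();
    final Set<C> consumerSet = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final long delayMs;

    DelayedConsumerRegistry(long delayMs) {
        this.delayMs = delayMs;
    }

    // Instead of adding the consumer right away, schedule the addition.
    ScheduledFuture<?> addConsumer(C consumer) {
        return timer.schedule(() -> {
            consumerList.add(consumer);
            consumerSet.add(consumer);
            System.out.println("consumer added after " + delayMs + " ms");
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    // Stop the timer thread so the JVM can exit.
    void shutdown() {
        timer.shutdown();
    }
}
```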

                         

                        I realize I could be wrong in my assessment, and I'm hoping you can point out what I'm doing wrong.  However, if this is a bug, it seems pretty serious.

                         

                        Han.

                        • 24. Re: many topics, paused queues, memory growing
                          timfox

                          han tunca wrote:

                           

                          In fact, I can't really see where, if ever, when the messageReferences collection > 1, how those messages get taken out automatically -

                          When a consumer is created, the next thing that happens is that a flow control token hits the ServerConsumerImpl instance and causes promptDelivery() to be called. This in turn calls QueueImpl.deliverAsync(), which results in the delivery of any queued messages.
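The path described above can be sketched as a small, single-threaded model. `FlowTokenModel`, `receiveCredits()` and the inline `deliverAsync()` are hypothetical stand-ins, not the actual `ServerConsumerImpl`/`QueueImpl` code, and to isolate the token path `add()` never triggers delivery on its own here. The point is ordering: the delivery pass triggered by the token only helps if the consumer is already registered when it runs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical single-threaded model of:
// consumer created -> flow control token -> promptDelivery() -> deliverAsync().
// Names are illustrative only; this is not HornetQ's implementation.
class FlowTokenModel {
    private final Deque<String> messageReferences = new ArrayDeque<>();
    private final List<Consumer<String>> consumers = new ArrayList<>();

    // Producer side: the message waits until a delivery pass runs.
    void add(String ref) {
        messageReferences.add(ref);
    }

    void addConsumer(Consumer<String> consumer) {
        consumers.add(consumer);
    }

    // Models a flow control token (credits) reaching the server-side consumer.
    void receiveCredits() {
        promptDelivery();
    }

    private void promptDelivery() {
        deliverAsync();
    }

    // Run inline for determinism; a real server would hand this to an executor.
    private void deliverAsync() {
        if (consumers.isEmpty()) {
            return; // nobody to deliver to: the backlog stays put
        }
        while (!messageReferences.isEmpty()) {
            consumers.get(0).accept(messageReferences.poll());
        }
    }

    int backlog() {
        return messageReferences.size();
    }
}
```

With the consumer registered before the token arrives, the backlog drains; with the token arriving first and the consumer registered afterwards, the only delivery pass finds no consumer and the messages stay queued.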

                           

                          If you can provide a simple test case that demonstrates an issue, we'll be more than happy to investigate.

                          • 25. Re: many topics, paused queues, memory growing
                            hantunca

                            Tim,

                             

                            Attached is a test case. To replicate:

                             

                            - unzip the file

                            - start the server with server.sh

                            - start the producer with producer.sh

                            - start the consumer with consumer.sh

                             

                            The producer will start sending messages. Once the client starts, you will notice the server printing out how many messages are backed up in the queue. The server delays putting a consumer in its collections for 20 seconds; at 20 seconds you will see it add the consumer to its collections, but messages still back up. You can then use a JMX console to call "resume" on the queue to get the messages flowing. Please let me know if you need help getting this going.

                             

                            Please note that I've overwritten the QueueImpl class with one where I've put in some more debugging to see what's going on.

                             

                            Han

                            • 26. Re: many topics, paused queues, memory growing
                              timfox

                              It's the delay you added that is causing the messages to not get delivered.

                               

                              In a normal "unhacked" server, as I mentioned before, straight after a consumer is created, a flow token is sent, which causes deliverAsync() to be called on the queue, which delivers the messages.

                               

                              In your "hacked" version you are delaying the addition of the consumer until after the flow token has been sent, so the messages never get delivered. This would never happen in real life.

                              • 27. Re: many topics, paused queues, memory growing
                                hantunca

                                Tim,

                                 

                                The whole reason I started this thread is that I'm seeing the problem in real time in an unmodified version. I've seen it in both 2.0.0 and 2.1.0. As I've mentioned before, I can't reliably make it happen on a production system, so I modified the code to reproduce it reliably. I had a version of the QueueImpl class without the delay that only printed a message when adding a consumer and when adding messages. When I saw the problem, it had started adding messages before adding a consumer, so by adding the delay I'm doing my best to reliably replicate what I see happening in an unmodified system.
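Debug instrumentation of the kind described here can be reduced to recording, per queue, which came first: the first `add()` or the first `addConsumer()`. A minimal sketch follows, assuming hypothetical hook methods `onAdd`/`onAddConsumer` called from those two places; `OrderingProbe` is not part of HornetQ. On its own it only establishes ordering: messages arriving before a consumer is registered is not necessarily a fault if a later delivery pass drains them.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical instrumentation: record the order of the first add() and the first
// addConsumer() for a queue. Safe to call from multiple threads.
class OrderingProbe {
    private final AtomicLong sequence = new AtomicLong();
    private final AtomicLong firstAddSeq = new AtomicLong(-1);
    private final AtomicLong firstConsumerSeq = new AtomicLong(-1);

    void onAdd(String queueName) {
        long seq = sequence.incrementAndGet();
        if (firstAddSeq.compareAndSet(-1, seq)) {
            System.out.println(queueName + ": first message added (event " + seq + ")");
        }
    }

    void onAddConsumer(String queueName) {
        long seq = sequence.incrementAndGet();
        if (firstConsumerSeq.compareAndSet(-1, seq)) {
            System.out.println(queueName + ": first consumer added (event " + seq + ")");
        }
    }

    // True if at least one message was added before any consumer was registered.
    boolean messagesPrecededConsumer() {
        long add = firstAddSeq.get();
        long consumer = firstConsumerSeq.get();
        return add != -1 && (consumer == -1 || add < consumer);
    }
}
```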

                                 

                                Note that this system creates thousands of queues and sends many thousands of messages per minute. The problem definitely depends on how many messages are being sent: if I'm sending < 1000 messages per second, it happens less frequently than if I send > 4000 messages per second. I suspect that addConsumer and add are being called on two different threads, so the addConsumer thread pauses while the add calls continue? Please also note that I'm running on a system with 8 cores and many threads.

                                 

                                I really like HornetQ, and our code is built around it right now. But before I can put the code into production, I need to solve this problem. If you haven't seen this problem before, it must be something in my use case.

                                 

                                Han

                                • 28. Re: many topics, paused queues, memory growing
                                  timfox

                                  han tunca wrote:

                                   

                                  Tim,

                                   

                                  The whole reason I started this thread is I'm seeing the problem real-time in a non-modified version.  I've seen this problem in version 2.0.0 and 2.1.0.  As I've mentioned before, I can't reliably get this to happen on a production system, so I modified the code to reliably reproduce the problem. 

                                  It's your modification that reliably *causes* the problem

                                   

                                  han tunca wrote:


                                   

                                  I really like the hornetq system, and have our code built around it right now.  But before I can put the code into production, I need to solve this problem.  If you haven't seen this problem before, it must be in my use case.

                                   

                                   

                                  I haven't seen this problem before. And, as yet, I haven't seen anything that points to a problem in HornetQ. I'm not saying there isn't an issue here, but it needs to be reproducible for us to be able to do anything about it.

                                  • 29. Re: many topics, paused queues, memory growing
                                    timfox

                                    I'd also bear in mind that 2.0 and 2.1 have *completely different* delivery algorithms; delivery was completely rewritten.