26 Replies. Latest reply on Oct 1, 2015 10:23 AM by gabboflabbo

    Out of memory with Clustered HornetQ

    gabboflabbo

      I've been noticing memory issues with HornetQ and clustering. Specifically, once a certain number of messages are added to a queue, my server goes OOM.

       

      In my setup (both wildfly 9.0.1):

      Server A Windows:

      - produces messages to queue

       

      Server B Linux

      - consumes messages

       

      Server B can't consume messages as fast as they are produced, so Server A eventually starts paging. Debugging memory on Server A shows that the BridgeImpl class has a member called "refs" which holds a reference to every message (even messages that are paged). The problem is that if I push 1 million+ messages, then that refs member will hold 1 million+ references, which blows up my memory.

       

      Paging works fine in a single-server environment to limit memory usage but does not in a clustered environment. Is there some configuration option that will limit the number of MessageReferences held in BridgeImpl?
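      For context, paging in the single-server case is governed by the address settings in the messaging subsystem. A minimal sketch of what I mean (the queue name and sizes here are illustrative placeholders, not my actual values):

      ```xml
      <!-- Illustrative paging settings; the match and sizes are placeholders -->
      <address-settings>
          <address-setting match="jms.queue.testQueue">
              <max-size-bytes>10485760</max-size-bytes>        <!-- start paging once the address holds ~10 MiB -->
              <page-size-bytes>2097152</page-size-bytes>       <!-- 2 MiB page files on disk -->
              <address-full-policy>PAGE</address-full-policy>  <!-- page to disk instead of blocking or dropping -->
          </address-setting>
      </address-settings>
      ```

      With settings like these, a single server keeps only a bounded window of messages in memory; my point is that the bridge's "refs" queue bypasses that bound.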


       

        • 1. Re: Out of memory with Clustered HornetQ
          gabboflabbo

          I've attached 2 wars. They can run in WildFly with standalone-full-ha.xml (the cluster password will need to be set up).

          There's a queue.cli in conf/ to set up the queue.

           

          2 separate machines will have to be used (I did not experience this issue with 2 servers on one machine)

           

          Steps to reproduce:

           

          1.) Start up hornetq-clustering.war on server A

          2.) Visit the following url 10 times:

              http://localhost:8080/hornetq-clustering/rest/test/generate/50000  

          3.) Start up hornetq-clustering-consumer.war on Server B

           

          Server A will then begin to load references to all messages in memory and go OOM.

           

          Debugging BridgeImpl.class shows refs holding 500,000 entries.

          • 2. Re: Out of memory with Clustered HornetQ
            jbertram

            I'm not clear on your setup. Are you saying that both Server A and Server B are clustered from a HornetQ perspective, and that an application on Server A is producing messages to a local queue, the clustered bridge is then moving those messages to Server B, and a different application there is consuming them? If so, that seems less than ideal from a performance and complexity-of-configuration standpoint. From a performance perspective you're basically doubling the amount of work necessary to get a message to your consumer, since the message has to go to a local queue, get consumed by the bridge, then be sent to the other server by the bridge, where it is delivered to another queue. For durable messages that means the message would be written to and read from disk twice. If you simply sent the messages directly to Server B, or had the consumers from Server B read from Server A, the performance would be better. Such a change would also nullify the need for a cluster in the first place, which would simplify your configuration.

             

            Whether you change your configuration or stick with your current setup, you're always going to have a problem if you produce messages more quickly than you consume them. Paging messages to disk is only a temporary solution, as you will eventually fill up your disk. I recommend you conduct some performance benchmarks and then tune your producer's connection factory settings to throttle the message volume so the consumers don't fall behind. See the documentation for more details on that.
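            To sketch what I mean by throttling at the connection factory (the factory name, connector, and window size below are placeholders; check the documentation for the exact settings in your version):

            ```xml
            <!-- Sketch: window-based producer flow control on a connection factory -->
            <connection-factory name="MyConnectionFactory">
                <connectors>
                    <connector-ref connector-name="in-vm"/>
                </connectors>
                <entries>
                    <entry name="java:/MyConnectionFactory"/>
                </entries>
                <!-- bytes each producer may send before it must wait for credits -->
                <producer-window-size>65536</producer-window-size>
            </connection-factory>
            ```

            Shrinking the window slows producers down to roughly the rate at which the broker can absorb messages, which keeps a fast producer from drowning slow consumers.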

            • 3. Re: Out of memory with Clustered HornetQ
              jbertram

              One point here...

               

              2 separate machines will have to be used (I did not experience this issue with 2 servers on one machine)

              I don't have 2 physical machines so I wouldn't be able to test this anyway, but the fact that you need 2 physical machines to reproduce this problem suggests to me that the network connection between the two machines may not be able to sustain the message volume you're generating.  In any event, my recommendation is the same as in my previous comment.

              • 4. Re: Out of memory with Clustered HornetQ
                gabboflabbo

                Justin Bertram wrote:

                 

                I'm not clear on your setup. Are you saying that both Server A and Server B are clustered from a HornetQ perspective, and that an application on Server A is producing messages to a local queue, the clustered bridge is then moving those messages to Server B, and a different application there is consuming them? If so, that seems less than ideal from a performance and complexity-of-configuration standpoint. From a performance perspective you're basically doubling the amount of work necessary to get a message to your consumer, since the message has to go to a local queue, get consumed by the bridge, then be sent to the other server by the bridge, where it is delivered to another queue. For durable messages that means the message would be written to and read from disk twice. If you simply sent the messages directly to Server B, or had the consumers from Server B read from Server A, the performance would be better. Such a change would also nullify the need for a cluster in the first place, which would simplify your configuration.

                 

                Yes. I've provided as simple a test case as possible to illustrate the issue. The real-world application is different/more complicated (and has local consumers) but suffers from the same problem. In the end, a REST service consumes a JSON payload and pushes it to a queue. I'm showing that when Server B starts up, Server A dies. That's not good.

                Whether you change your configuration or stick with your current set up you're always going to have a problem if you produce messages more quickly than you consume them.  Paging messages to disk is only a temporary solution as you will eventually fill up your disk.  I recommend you conduct some performance benchmarks and then tune your producer's connection factory settings to throttle the message volume so the consumers don't fall behind.  See the documentation for more details on that.

                 

                Disk space is much easier to manage than memory. It would take a billion messages to run out of disk space, and we'd run out of memory long before that because of this issue. Also, our traffic is high during the day and low at night, so the system can catch up during periods of low activity. I can't tell our clients to stop hitting our server. This is the whole point of clustering.

                • 5. Re: Out of memory with Clustered HornetQ
                  gabboflabbo

                  Justin Bertram wrote:

                   

                  One point here...

                   

                  2 separate machines will have to be used (I did not experience this issue with 2 servers on one machine)

                  I don't have 2 physical machines so I wouldn't be able to test this anyway, but the fact that you need 2 physical machines to reproduce this problem suggests to me that the network connection between the two machines may not be able to sustain the message volume you're generating.  In any event, my recommendation is the same as in my previous comment.

                  Yes, but HornetQ shouldn't blow up memory if the network is slow.

                  • 6. Re: Out of memory with Clustered HornetQ
                    jbertram

                    Yes.  I've provided as simple as possible test case to illustrate the issue.

                    Unfortunately I can't use your test-case since it requires two physical machines (as I noted previously).  Furthermore, you did not include your server configuration or test source code so I can't see what your address settings are or what the code is doing exactly.

                     

                    I can't tell our clients to stop hitting our server.

                    I apologize if this point wasn't clear.  I wasn't suggesting that you tell your clients to stop hitting your server.  I only meant to explain how producing more messages than you consume is ultimately untenable, but you can mitigate that problem by using producer flow control.

                     

                    This is the whole point of clustering.

                    The way you're utilizing clustering is a bit bizarre, in my opinion.  In most use-cases clustering is implemented to spread the load across the servers fairly evenly (i.e. messages are sent to all nodes in the cluster in a fairly balanced way and consumers connect to all nodes in the cluster in a fairly balanced way).  However, in your use-case all production is happening on one node and all consumption is happening on the other.  That makes the cluster bridge between the two nodes a bottleneck (as you have observed).  As I have already noted, I don't think this configuration is optimal for your use-case.  You'd be much better off implementing what I described previously.

                     

                    Lastly, you may not even need multiple servers to deal with your load.  A single HornetQ server can handle millions of messages per second.  Have you conducted performance benchmarks with a single server and found that it couldn't support your performance requirements?

                    • 7. Re: Out of memory with Clustered HornetQ
                      gabboflabbo

                      Justin Bertram wrote:

                      Unfortunately I can't use your test-case since it requires two physical machines (as I noted previously).  Furthermore, you did not include your server configuration or test source code so I can't see what your address settings are or what the code is doing exactly.

                       

                      I'll attach the source.   (not at the office at the moment so it'll be in a few hours)

                       

                      The way you're utilizing clustering is a bit bizarre, in my opinion.  In most use-cases clustering is implemented to spread the load across the servers fairly evenly (i.e. messages are sent to all nodes in the cluster in a fairly balanced way and consumers connect to all nodes in the cluster in a fairly balanced way).  However, in your use-case all production is happening on one node and all consumption is happening on the other.  That makes the cluster bridge between the two nodes a bottleneck (as you have observed).  As I have already noted, I don't think this configuration is optimal for your use-case.  You'd be much better off implementing what I described previously.

                       

                      I haven't described the whole process, but I don't mind doing it. Basically, REST requests are received by our REST service. Each request contains up to 1000 messages, and each message contains an IP address. We then send each message to the queue with a group id of the IP address (this ensures messages per IP are processed serially). The consumers then process each of these messages. Since the consumption is quite a bit slower, we will be clustering the consumers heavily. The REST service will also be clustered.

                       

                      Lastly, you may not even need multiple servers to deal with your load.  A single HornetQ server can handle millions of messages per second.  Have you conducted performance benchmarks with a single server and found that it couldn't support your performance requirements?

                      It's not HornetQ that is slow; it's obviously my processing of the JMS messages. I've conducted many performance tests and we definitely need to cluster. At the moment I get about 1,500 msgs/second. We need to reach 10,000 regularly (and handle spikes).

                      • 8. Re: Out of memory with Clustered HornetQ
                        gabboflabbo

                        I've been able to reproduce this with one machine (two separate instances of WildFly). I've simplified the 2 test projects as well and attached my WildFly standalone-full-ha.xml for each of the 2 servers.

                         

                        Both wildfly instances were given 1 gig of memory.

                         

                        hornetq-clustering.war goes into ServerA

                        hornetq-clustering-consumer.war  goes into ServerB


                        To reproduce:


                        1.) Start up ServerA

                        2.) Visit http://{serverA-ip}:{serverA-Port}/hornetq-clustering/rest/test/generate/1000000

                          ( changing to server A's ip and port )


                        This will generate 1 million jms messages.  At this point in time server A is fine on memory.


                        3.) Start up Server B


                        Server B will start consuming messages, quickly at first, but then slow down as Server A becomes unresponsive (a log message on Server B appears every 10 seconds indicating how many messages were processed in the last 10 seconds):


                        11:02:00,036 INFO  [com.test.StatLogger] (EJB default - 1) Messages processed: 5222

                        11:02:10,026 INFO  [com.test.StatLogger] (EJB default - 2) Messages processed: 26511

                        11:02:20,113 INFO  [com.test.StatLogger] (EJB default - 3) Messages processed: 22724

                        11:02:30,032 INFO  [com.test.StatLogger] (EJB default - 4) Messages processed: 8511

                        11:02:40,080 INFO  [com.test.StatLogger] (EJB default - 5) Messages processed: 8387

                        11:02:50,026 INFO  [com.test.StatLogger] (EJB default - 6) Messages processed: 8666

                        11:03:00,107 INFO  [com.test.StatLogger] (EJB default - 7) Messages processed: 7760

                        11:03:10,051 INFO  [com.test.StatLogger] (EJB default - 8) Messages processed: 8532

                        11:03:20,029 INFO  [com.test.StatLogger] (EJB default - 9) Messages processed: 9222

                        11:03:30,032 INFO  [com.test.StatLogger] (EJB default - 10) Messages processed: 8400

                        11:03:40,007 INFO  [com.test.StatLogger] (EJB default - 1) Messages processed: 8880

                        11:03:50,001 INFO  [com.test.StatLogger] (EJB default - 2) Messages processed: 7963

                        11:04:00,001 INFO  [com.test.StatLogger] (EJB default - 3) Messages processed: 6656

                        11:04:10,032 INFO  [com.test.StatLogger] (EJB default - 4) Messages processed: 1568

                        11:04:20,001 INFO  [com.test.StatLogger] (EJB default - 5) Messages processed: 1232

                        11:04:30,014 INFO  [com.test.StatLogger] (EJB default - 6) Messages processed: 1060

                        11:04:40,007 INFO  [com.test.StatLogger] (EJB default - 7) Messages processed: 1356

                        11:04:50,001 INFO  [com.test.StatLogger] (EJB default - 8) Messages processed: 1072

                        11:05:00,001 INFO  [com.test.StatLogger] (EJB default - 9) Messages processed: 704

                        11:05:10,005 INFO  [com.test.StatLogger] (EJB default - 10) Messages processed: 0

                        11:05:20,027 INFO  [com.test.StatLogger] (EJB default - 1) Messages processed: 0

                        11:05:30,002 INFO  [com.test.StatLogger] (EJB default - 2) Messages processed: 0


                        At this point Server A runs into a lot of out-of-heap-space exceptions.


                        • 9. Re: Out of memory with Clustered HornetQ
                          jbertram

                          Are you able to reproduce the problem if you follow the recommendations I outlined previously?

                          • 10. Re: Out of memory with Clustered HornetQ
                            jbertram

                            Since the consumption is quite a bit slower,  we will be clustering the consumers heavily.     The rest service will also be clustered.

                            I recommend that you develop a test that mimics your actual production use-case rather than a fake use-case that IMO doesn't make much sense.

                            • 11. Re: Out of memory with Clustered HornetQ
                              jbertram

                              It's not hornetq that is slow,  it's obviously my processing of the jms messages.   I've conducted many performance tests and we definitely need to cluster.  Atm I get about 1500 msgs/second.    We need to reach 10,000 regularly.  (and handle spikes)

                              Are you gathering these numbers based on the test you've attached to this thread?  If so, I'd say your performance numbers are artificially limited by the test itself.  To get the most out of the broker you'd need quite a few more producers and consumers running concurrently (ideally distributed fairly evenly across the cluster nodes).

                              • 12. Re: Out of memory with Clustered HornetQ
                                jbertram

                                We then send each message to the queue with a group id of the IP address (This ensure messages per IP are processed serially).

                                Be sure to read the documentation on clustered grouping.  Pay special attention to the bit about the server with the LOCAL handler being a potential single point of failure.
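                                For reference, the grouping handler configuration looks roughly like this (the name, address, and timeout below are placeholders): exactly one node in the cluster declares a LOCAL handler, and every other node declares a REMOTE one.

                                ```xml
                                <!-- On the node that makes the grouping decisions; placeholders throughout -->
                                <grouping-handler name="my-grouping-handler">
                                    <type>LOCAL</type>   <!-- use REMOTE on all other nodes -->
                                    <address>jms</address>
                                    <timeout>5000</timeout>
                                </grouping-handler>
                                ```

                                Because every grouping decision routes through the node with the LOCAL handler, that node is the potential single point of failure I mentioned.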

                                • 13. Re: Out of memory with Clustered HornetQ
                                  gabboflabbo

                                  I recommend that you develop a test that mimics your actual production use-case rather than a fake use-case that IMO doesn't make much sense.

                                  I'm trying to show that BridgeImpl has a ConcurrentLinkedQueue "refs" that has no limit on the number of MessageReference entries, and that causes problems in certain cases. I've provided a reproducer to show one such case, but another simple one is when clustered consumers can't keep up; that linked queue will eventually cause memory problems.

                                   

                                  Are you gathering these numbers based on the test you've attached to this thread?  If so, I'd say your performance numbers are artificially limited by the test itself.  To get the most out of the broker you'd need quite a few more producers and consumers running concurrently (ideally distributed fairly evenly across the cluster nodes).

                                  The numbers I showed were from production real world use,  I was merely trying to say that our real world application will require clustering.

                                   

                                  Are you able to reproduce the problem if you follow the recommendations I outlined previously?

                                  If you are referring to blocking producer flow control, that is unfortunately not an option for us. We need to respond to our clients' requests immediately. We're looking for a dump-and-run solution. If there are too many JMS messages to process, that's fine as long as we record them in the queue.

                                   

                                  Have you had a chance to try the example?

                                   

                                  If this is a limitation of HornetQ, then a possible solution might be to have the initial queue not be clustered; a consumer would then pull from that and push to the clustered queue with producer flow control. I assume it's possible to configure whether a queue is clustered or not?
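                                  Roughly what I have in mind, sketched as a core bridge draining a local (non-clustered) staging queue into the clustered queue. All the names here are hypothetical, and I haven't verified the exact flow-control knobs available to bridges:

                                  ```xml
                                  <!-- Hypothetical sketch: queue names and connector are made up -->
                                  <bridge name="staging-bridge">
                                      <queue-name>jms.queue.stagingQueue</queue-name>
                                      <forwarding-address>jms.queue.clusteredQueue</forwarding-address>
                                      <use-duplicate-detection>true</use-duplicate-detection>
                                      <!-- bound the unconfirmed data the bridge keeps in flight -->
                                      <confirmation-window-size>1048576</confirmation-window-size>
                                      <static-connectors>
                                          <connector-ref>remote-connector</connector-ref>
                                      </static-connectors>
                                  </bridge>
                                  ```

                                  The idea is that the staging queue can page freely on disk while the bridge forwards at whatever rate the downstream cluster can absorb.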

                                  • 14. Re: Out of memory with Clustered HornetQ
                                    jbertram

                                    I'm trying to show that BridgeImpl has a ConcurrentLinkedQueue "refs" that has no limit on the number of MessageReference entries, and that causes problems in certain cases. I've provided a reproducer to show one such case, but another simple one is when clustered consumers can't keep up; that linked queue will eventually cause memory problems.

                                    I understand that. I looked at the code in question when you first started the thread. I'm pushing back on this because I think you've reproduced the problem with a use-case that doesn't reflect a real-world scenario.

                                     

                                    Regarding the technical aspects of the issue you're seeing...This looks like it might be a byproduct of an optimization to reduce message latency which delivers the message using the thread that sent it.  In effect, when your application sends a message to a queue it is being delivered to the cluster bridge straight away on the thread you're using to send it.  The sending operation is so fast (i.e. the bridge is receiving messages so quickly) that the message references are accumulating in memory more quickly than the bridge is able to send them.  The bridge is attempting to send messages over the Netty transport whereas it is receiving messages directly over an in-vm transport.  I believe the mismatch here is ultimately causing the problem.  It's more pronounced when the machines are separated by a physical network, but the impact of the transport discrepancy still exists when the HornetQ instances are on the same machine.  You just have to push harder to see it.  This can be tuned, however, by setting the "direct-deliver" param on the in-vm acceptor you're using to "false".  You can read more about the "direct-deliver" parameter in the documentation.
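                                    To illustrate, in the messaging subsystem that would look something like this (a sketch; the acceptor name and server-id are from a typical default configuration and may differ in yours):

                                    ```xml
                                    <!-- Sketch: disable direct delivery on the in-vm acceptor so delivery
                                         happens on a separate thread instead of the sending thread -->
                                    <in-vm-acceptor name="in-vm" server-id="0">
                                        <param key="direct-deliver" value="false"/>
                                    </in-vm-acceptor>
                                    ```

                                    This trades a bit of latency for back-pressure: the sending thread no longer hands messages straight to the bridge faster than the bridge can push them out over Netty.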

                                     

                                    The numbers I showed were from production real world use...

                                    So you're only getting 1,500 msgs/sec in your production tests with multiple producers and consumers? 

                                     

                                    If you are referring to blocking producer flow that is unfortunately not an option for us.   We need to respond to our clients requests immediately. We're looking for a dump and run solution.  If there are too many jms messages to process that's fine as long as we record them in the queue.

                                    I wasn't referring to the flow control option.  I was referring to my original assessment of your use-case.  Here's what I said, "...that seems less than ideal from a performance and complexity-of-configuration standpoint.  From a performance perspective you're basically doubling the amount of work necessary to get a message to your consumer since the message has to go to a local queue, get consumed by the bridge, then sent to the other server by the bridge where it is delivered to another queue.  For durable messages that means the message would be written to and read from disk twice.  If you simply sent the messages directly to the server B or had the consumers from server B read from server A the performance would be better.  Such a change would also nullify the need for a cluster in the first place which would simplify your configuration."  I emphasized the most relevant text.  Aside from boosting the performance it would completely eliminate the problem you're seeing since it would remove the cluster bridge from the equation.
