14 Replies Latest reply on Jul 30, 2009 2:42 PM by adridi

    JBM2 cluster fails under heavy load

    adridi

      I have set up a cluster of 2 nodes; each node has its own backup.
      Each pair (live/backup) is installed on a 64-bit Linux box.
      The configuration of all 4 servers is the same, except that "backup" is set to false on each live node.

      On each node I have 102 distributed queues. A producer sends messages to an InBoundQueue on each node, and a consumer
      reads from the InBoundQueue and distributes the messages over 100 queues depending on the message content. Each of those 100 queues
      has a consumer that copies its messages to a common distributed outBoundQueue.
      I have one consumer and one producer per queue, except for the outBoundQueue, where I have a pool of 100 producers and 1 consumer.

      Each InBoundQueue producer runs at 500 msg/s, which gives 1000 msg/s for the cluster.
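
      The per-node flow described above (InBoundQueue, then 100 content-routed queues, then a common outBoundQueue) can be modeled in memory roughly as follows. This is only a sketch under assumptions: it uses plain java.util.concurrent queues rather than JBM, runs single-threaded, and the class name, the routeIndex rule, and the message bodies are hypothetical stand-ins for the real test code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** In-memory model of one node's queue topology. Not JBM code. */
public class TopologySketch {
    /** Number of content-routed queues between inbound and outbound. */
    public static final int WORKER_QUEUES = 100;

    /** Hypothetical routing rule: derive a queue index from the message content. */
    public static int routeIndex(String body) {
        return Math.floorMod(body.hashCode(), WORKER_QUEUES);
    }

    /** Moves every message inbound -> routed queue -> outbound and returns the outbound contents. */
    public static List<String> run(List<String> messages) {
        BlockingQueue<String> inbound = new LinkedBlockingQueue<>(messages);
        List<BlockingQueue<String>> workers = new ArrayList<>();
        for (int i = 0; i < WORKER_QUEUES; i++) {
            workers.add(new LinkedBlockingQueue<>());
        }
        BlockingQueue<String> outbound = new LinkedBlockingQueue<>();

        // Stage 1: the single inbound consumer routes by content.
        String message;
        while ((message = inbound.poll()) != null) {
            workers.get(routeIndex(message)).add(message);
        }
        // Stage 2: each routed queue's consumer copies its messages to the outbound queue.
        for (BlockingQueue<String> queue : workers) {
            queue.drainTo(outbound);
        }
        return new ArrayList<>(outbound);
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            input.add("msg-" + i);
        }
        System.out.println(run(input).size() + " messages reached the outbound queue");
    }
}
```

      In the real setup each stage is a separate JMS consumer/producer, so every message crosses the broker three times; the model only shows the routing shape, not the load.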

      After 30 min of running, I got the following error:

      Jul 22, 2009 6:37:11 PM org.jboss.messaging.core.logging.Logger warn
      WARNING: Connection failure has been detected Did not receive data from server (or ping).:3
      18:37:42,055 ERROR @Thread-12 (group:JBM-client-global-threads-621631806) [SmppQueueListener] Exception in onMessage():
      javax.jms.JMSException: Timed out waiting for response when sending packet 43
       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$ChannelImpl.sendBlocking(RemotingConnectionImpl.java:1155)
       at org.jboss.messaging.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:420)
       at org.jboss.messaging.jms.client.JBossMessage.acknowledge(JBossMessage.java:969)
       at com.clairmail.test.happypath.SmppQueueListener.onMessage(SmppQueueListener.java:56)
       at org.jboss.messaging.jms.client.JMSMessageListenerWrapper.onMessage(JMSMessageListenerWrapper.java:97)
       at org.jboss.messaging.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:670)
       at org.jboss.messaging.core.client.impl.ClientConsumerImpl.access$100(ClientConsumerImpl.java:41)
       at org.jboss.messaging.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:787)
       at org.jboss.messaging.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:105)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       at java.lang.Thread.run(Thread.java:619)
      Caused by: MessagingException[errorCode=3 message=Timed out waiting for response when sending packet 43]
      


      Then failover took over for 5 min or so, after which all connections were destroyed.

      I followed Tim's docs (ch. 36 and 37) to set up the cluster and the backup nodes.
      Do you think it's the network switch that's causing the problem?
      I do have a 1G switch, though.

      Thanks,
      Abdel

        • 1. Re: JBM2 cluster fails under heavy load
          clebert.suconic

          Any chance you could try it using a build from trunk? There was a ping problem on trunk that was fixed.


          In any case, if you have a test to share, that would be really nice; we would try to replicate your issue.

          • 2. Re: JBM2 cluster fails under heavy load
            timfox

            Also it would help a lot if you always mention what exact version you are running.

            • 3. Re: JBM2 cluster fails under heavy load
              adridi

              Thanks Tim and Clebert,

              I am using the latest version, JBM2-Beta3.
              I will try with a build from trunk as Clebert suggested. Otherwise, can I create a Jira and attach my code?

              Abdel

              • 4. Re: JBM2 cluster fails under heavy load
                clebert.suconic

                Yeah...


                please create a JIRA if you still see an issue:

                https://jira.jboss.org/jira/browse/JBMESSAGING

                If you attach your code with instructions to replicate it, we will give it a try.

                • 5. Re: JBM2 cluster fails under heavy load
                  adridi

                  I tried with trunk and still have the problem, though I saw some improvement with the trunk build.

                  I created a Jira: JBMESSAGING-1691

                  Thanks,
                  Abdel

                  • 6. Re: JBM2 cluster fails under heavy load
                    adridi

                    Clebert,

                    Did you have a chance to check my Jira?
                    I believe there's a clustering/failover issue: when some connections attached to the live server fail, the backup node kicks in and you end up with both members of the live/backup pair broadcasting the same node id.

                    Besides, when trying JBM2 with a limited number of queues (like 5) things work well, but when increasing the number of queues (100s) the system doesn't behave properly.

                    Any ideas how I can resolve the connection failure issue?

                    Thanks,
                    Abdel

                    • 7. Re: JBM2 cluster fails under heavy load
                      clebert.suconic

                      I'm actually working on it right now...

                      Give me a few hours and I will get back on this thread.

                      • 8. Re: JBM2 cluster fails under heavy load
                        clebert.suconic

                        I was having a bit of trouble configuring my environment (some hardware issues of my own).

                        But now I'm a bit confused about how you start the test. Do you use the TestLauncher, or do you run the test in a different way? Can you add a description of how to replicate it to the JIRA?

                        • 9. Re: JBM2 cluster fails under heavy load
                          clebert.suconic

                          Abdel,


                          I added a build.xml, as I won't have access to the graphical interface on the machine I'm using.

                          Also, I first ran nodeA, but nothing happened. Your test was just waiting on some latch. Do I need to run both nodes in order for this test to work?

                          Your test seems a bit complex, as it involves a bunch of stuff you're doing besides JBM (which is OK). I just didn't want to spend time debugging things that are beyond the scope now. So that's why I'm asking: the test just hangs when I run only the first node.

                          • 10. Re: JBM2 cluster fails under heavy load
                            adridi

                            I added some detailed information in the Jira.
                            Please take a look at the "jbmTest.jpg" picture; it shows the queues that have been deployed in each container.

                            Please, let me know what I might be doing wrong.

                            Thanks,
                            Abdel

                            • 11. Re: JBM2 cluster fails under heavy load
                              clebert.suconic

                              At this point I can't even run your test. I'm starting the first node only, and it hangs. I was wondering if you could give me a hint as to why? (I mean, I don't want to debug your code beyond the possible problem.)

                              I will try it again tomorrow.

                              • 12. Re: JBM2 cluster fails under heavy load
                                adridi

                                Clebert,
                                On a single node everything works perfectly.
                                It's the cluster/failover that's failing.
                                And yes, you need both nodes.

                                My test case is all JBM2, but in order to start all consumers/producers in parallel I am using a Java concurrency library, JCSP.

                                You can ignore the latch: it waits either for the total count of messages to be consumed or for an end message, and then stops consuming.
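
                                A latch with that behavior (release once the expected number of messages has been consumed, or as soon as an end message arrives) could look roughly like the sketch below. It is not the code attached to the Jira: the class name, the "END" marker, and the timeout-based wait are assumptions made for illustration, built only on java.util.concurrent.CountDownLatch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Releases waiters after N messages or on an end marker, whichever comes first. */
public class CompletionLatchSketch {
    /** Hypothetical body of the end-of-test message. */
    public static final String END_MARKER = "END";

    private final CountDownLatch latch;

    public CompletionLatchSketch(int expectedMessages) {
        this.latch = new CountDownLatch(expectedMessages);
    }

    /** Called once per consumed message, e.g. from a listener's onMessage(). */
    public void onMessage(String body) {
        if (END_MARKER.equals(body)) {
            // End marker: drain the remaining count so waiters are released immediately.
            while (latch.getCount() > 0) {
                latch.countDown();
            }
        } else {
            latch.countDown();
        }
    }

    /** Waits up to the timeout; returns true if the latch was released, false otherwise. */
    public boolean awaitDone(long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        CompletionLatchSketch done = new CompletionLatchSketch(3);
        done.onMessage("a");
        done.onMessage(END_MARKER);
        System.out.println("released: " + done.awaitDone(1, TimeUnit.SECONDS));
    }
}
```

                                Waiting with a timeout rather than indefinitely means a node that never receives traffic reports a failure instead of appearing to hang.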

                                Abdel

                                • 13. Re: JBM2 cluster fails under heavy load
                                  clebert.suconic

                                  Abdel,
                                  We are doing a major piece of work/refactoring (Tim Fox is doing it) on the replication. I will get back to this as soon as Tim Fox is done with that over the next week (or two weeks maybe).

                                  We will create soak tests for that, and I will make sure your case will work with this.

                                  It will be more productive this way.

                                  • 14. Re: JBM2 cluster fails under heavy load
                                    adridi

                                    Thanks Clebert!

                                    I know that Tim is on vacation.
                                    Your suggestion makes sense to me.

                                    Abdel