6 Replies Latest reply on Oct 2, 2012 11:19 AM by michellegx11

    Cluster didn't distribute messages

    michellegx11

      Hi All,

       

      I am using HornetQ version 2.2.5.Final and am trying to establish a cluster with 2 nodes: node1 and node2. On each node, I have a live machine and a backup machine, but the pairs are crossed: the standalone HornetQ on Node2 is the backup of the live server on Node1, and the standalone HornetQ on Node1 is the backup of the live server on Node2.

       

      Node1: Live1 (embedded Hornetq) + Backup2 (standalone Hornetq)

      Node2: Live2 (embedded Hornetq) + Backup1 (standalone Hornetq)

       

      Each live server is the HornetQ embedded in JBoss 6.1.0.Final together with our application software. Each backup server is a standalone HornetQ and is used only to process unhandled queue messages in case the live machine crashes.
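
      For context, the backup declaration in the standalone server's hornetq-configuration.xml looks roughly as below. This is a minimal sketch assuming shared-store failover (the mode supported by 2.2.x); the connectors, acceptors and other settings are omitted:

          <configuration xmlns="urn:hornetq"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                         xsi:schemaLocation="urn:hornetq /schema/hornetq-configuration.xsd">

             <!-- this server starts passive and only activates when its live server dies -->
             <backup>true</backup>

             <!-- shared-store failover: the backup takes over the live server's journal -->
             <shared-store>true</shared-store>

             <!-- connectors, acceptors, cluster settings etc. omitted -->

          </configuration>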

       

      After starting the live and backup servers on both nodes, each backup machine announced itself as a backup. On the live machines, the logs indicate that the cluster has 2 members, as shown below:

       

      2012-09-27 17:07:47,385 INFO  [org.jboss.ha.framework.server.ClusterPartition.lifecycle.Partition1] (Incoming-15,null) New cluster view for partition Partition1 (id: 7, delta: 1, merge: false) : [192.168.134.128:1099, 192.168.134.130:1099]

      2012-09-27 17:07:47,385 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-15,null) Received new cluster view: [192.168.134.128:1099|7] [192.168.134.128:1099, 192.168.134.130:1099]

      2012-09-27 17:07:47,385 INFO  [org.jboss.ha.core.framework.server.DistributedReplicantManagerImpl.Partition1] (AsynchViewChangeHandler Thread) I am (192.168.134.128:1099) received membershipChanged event:

      2012-09-27 17:07:47,386 INFO  [org.jboss.ha.core.framework.server.DistributedReplicantManagerImpl.Partition1] (AsynchViewChangeHandler Thread) Dead members: 0 ([])

      2012-09-27 17:07:47,386 INFO  [org.jboss.ha.core.framework.server.DistributedReplicantManagerImpl.Partition1] (AsynchViewChangeHandler Thread) New Members : 1 ([192.168.134.130:1099])

      2012-09-27 17:07:47,387 INFO  [org.jboss.ha.core.framework.server.DistributedReplicantManagerImpl.Partition1] (AsynchViewChangeHandler Thread) All Members : 2 ([192.168.134.128:1099, 192.168.134.130:1099])

       

      But when sending messages, the messages are not distributed at all. If I send a message to node1, it is consumed by Live1; if I send to node2, it is consumed by Live2.
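
      For reference, distribution between the live servers is driven by the <cluster-connection> definition in hornetq-configuration.xml. Mine follows the 2.2.x defaults along these lines (a sketch; the connector and discovery-group names are placeholders, not my exact values):

          <cluster-connections>
             <cluster-connection name="my-cluster">
                <!-- only messages sent to addresses starting with this prefix are balanced -->
                <address>jms</address>
                <!-- must reference a connector defined elsewhere in this file -->
                <connector-ref>netty-connector</connector-ref>
                <retry-interval>500</retry-interval>
                <use-duplicate-detection>true</use-duplicate-detection>
                <!-- false: forward messages only to nodes that have matching consumers -->
                <forward-when-no-consumers>false</forward-when-no-consumers>
                <max-hops>1</max-hops>
                <discovery-group-ref discovery-group-name="dg-group1"/>
             </cluster-connection>
          </cluster-connections>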

       

      Anybody got an idea? Many thanks in advance!

        • 1. Re: Cluster didn't distribute messages
          clebert.suconic

          Are you sure you didn't copy the data between the nodes?

           

           

          You should use a newer version, BTW.

          • 2. Re: Cluster didn't distribute messages
            michellegx11

            Hi Clebert,

             

            Thanks for your reply. I tested the setup again and found that the cluster does distribute the messages now. I may have accidentally used the wrong way to send messages with our software yesterday.

             

            But now I have run into another problem. After I killed the live server, the backup server successfully announced itself as live, as shown below:

            [Thread-1] 14:00:24,720 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  Backup Server is now live
            [Thread-6 (group:HornetQ-server-threads1762702912-750922299)] 14:00:24,891 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Connecting bridge sf.my-cluster.fa8a6b0a-095d-11e2-a8d2-000c2986119a to its destination [f88154cf-095d-11e2-92c9-000c299c9d74]
            [Thread-6 (group:HornetQ-server-threads1762702912-750922299)] 14:00:24,957 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Bridge sf.my-cluster.fa8a6b0a-095d-11e2-a8d2-000c2986119a is connected [f88154cf-095d-11e2-92c9-000c299c9d74-> sf.my-cluster.fa8a6b0a-095d-11e2-a8d2-000c2986119a]

             

            But the messages stored in the journal are not processed at all by the backup machine.

             

            In the logs of the other live machine, I see the following entries. It seems the other live machine did recognize the failover, but for some reason the channel is disconnected. Do you know why this happened? Thanks!

             

            2012-09-28 14:00:23,862 WARN  [org.hornetq.core.server.cluster.impl.BridgeImpl] (Thread-0 (group:HornetQ-client-global-threads-561276172)) sf.my-cluster.f88154cf-095d-11e2-92c9-000c299c9d74::Connection failed before reconnect : HornetQException[errorCode=2 message=Channel disconnected]
                    at org.hornetq.core.client.impl.ClientSessionFactoryImpl.connectionDestroyed(ClientSessionFactoryImpl.java:363)
                    at org.hornetq.core.remoting.impl.netty.NettyConnector$Listener$1.run(NettyConnector.java:687)
                    at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                    at java.lang.Thread.run(Thread.java:619)

            2012-09-28 14:00:24,934 WARN  [org.hornetq.core.server.cluster.impl.BridgeImpl] (Thread-0 (group:HornetQ-client-global-threads-561276172)) sf.my-cluster.f88154cf-095d-11e2-92c9-000c299c9d74::Connection failed with failedOver=true: HornetQException[errorCode=2 message=Channel disconnected]
                    at org.hornetq.core.client.impl.ClientSessionFactoryImpl.connectionDestroyed(ClientSessionFactoryImpl.java:363)
                    at org.hornetq.core.remoting.impl.netty.NettyConnector$Listener$1.run(NettyConnector.java:687)
                    at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                    at java.lang.Thread.run(Thread.java:619)

            2012-09-28 14:00:24,948 WARN  [org.hornetq.core.protocol.core.impl.ChannelImpl] (Old I/O client worker (channelId: 635122141, /127.0.0.1:54161 => localhost/127.0.0.1:5446)) Can't find packet to clear:  last received command id 3 first stored command id 3
            2012-09-28 14:00:24,951 WARN  [org.hornetq.core.protocol.core.impl.ChannelImpl] (Old I/O client worker (channelId: 635122141, /127.0.0.1:54161 => localhost/127.0.0.1:5446)) Can't find packet to clear:  last received command id 4 first stored command id 4
            2012-09-28 14:00:24,951 WARN  [org.hornetq.core.protocol.core.impl.ChannelImpl] (Old I/O client worker (channelId: 635122141, /127.0.0.1:54161 => localhost/127.0.0.1:5446)) Can't find packet to clear:  last received command id 5 first stored command id 5
            2012-09-28 14:00:24,953 WARN  [org.hornetq.core.protocol.core.impl.ChannelImpl] (Old I/O client worker (channelId: 635122141, /127.0.0.1:54161 => localhost/127.0.0.1:5446)) Can't find packet to clear:  last received command id 6 first stored command id 6
            2012-09-28 14:00:25,357 INFO  [org.jboss.ha.framework.server.ClusterPartition.Partition1] (VERIFY_SUSPECT.TimerThread,Partition1-HAPartition,192.168.134.128:1099) Suspected member: 192.168.134.130:1099
            2012-09-28 14:00:25,488 INFO  [org.jboss.ha.framework.server.ClusterPartition.lifecycle.Partition1] (Incoming-4,null) New cluster view for partition Partition1 (id: 2, delta: -1, merge: false) : [192.168.134.128:1099]
            2012-09-28 14:00:25,488 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-4,null) Received new cluster view: [192.168.134.128:1099|2] [192.168.134.128:1099]
            2012-09-28 14:00:25,491 INFO  [org.jboss.ha.core.framework.server.DistributedReplicantManagerImpl.Partition1] (AsynchViewChangeHandler Thread) I am (192.168.134.128:1099) received membershipChanged event:
            2012-09-28 14:00:25,492 INFO  [org.jboss.ha.core.framework.server.DistributedReplicantManagerImpl.Partition1] (AsynchViewChangeHandler Thread) Dead members: 1 ([192.168.134.130:1099])
            2012-09-28 14:00:25,492 INFO  [org.jboss.ha.core.framework.server.DistributedReplicantManagerImpl.Partition1] (AsynchViewChangeHandler Thread) New Members : 0 ([])
            2012-09-28 14:00:25,492 INFO  [org.jboss.ha.core.framework.server.DistributedReplicantManagerImpl.Partition1] (AsynchViewChangeHandler Thread) All Members : 1 ([192.168.134.128:1099])

             

            I have checked the standalone HornetQ with JConsole, and it seems the backup machine has formed the cluster with the other live machine. The queues were also created successfully on the backup server, but there is no consumer to consume the messages.

             

             

            PS: The reason I didn't use the latest version is that our software already used 2.2.5.Final in the past, when we first adopted HornetQ. I recently worked on it again just to add the backup servers, so I used the same version as before.

             

             

            • 3. Re: Cluster didn't distribute messages
              clebert.suconic

              I will repeat my previous question: are you sure you didn't copy the journal between the servers as a template? There's a file in the journal folder with the ID of the server, which is supposed to be unique. If you copy that file between your servers, you will duplicate the identity and create problems.

              • 4. Re: Cluster didn't distribute messages
                michellegx11

                Hi Clebert,

                 

                Yes, I am sure I didn't copy the journals between the live/backup pairs. I created 2 new folders to store the files for each pair and deleted those files to get a fresh start for my tests.
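
                Roughly, each pair points at its own set of data directories in hornetq-configuration.xml (a sketch; the paths are illustrative, not my real ones):

                    <bindings-directory>/data/pair1/bindings</bindings-directory>
                    <journal-directory>/data/pair1/journal</journal-directory>
                    <large-messages-directory>/data/pair1/large-messages</large-messages-directory>
                    <paging-directory>/data/pair1/paging</paging-directory>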

                 

                Xue

                • 5. Re: Cluster didn't distribute messages
                  michellegx11

                  Since the backup server seems not to have any consumers, I suppose the MDBs are not able to connect to the remote HornetQ. I therefore updated ra.xml (under jms-ra.rar/META-INF) to mention both the live servers and the backup servers, as shown below:

                   

                        <config-property>
                            ......
                            <config-property-name>ConnectorClassName</config-property-name>
                            <config-property-type>java.lang.String</config-property-type>
                            <config-property-value>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory,org.hornetq.core.remoting.impl.netty.NettyConnectorFactory,org.hornetq.core.remoting.impl.netty.NettyConnectorFactory,org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</config-property-value>
                        </config-property>
                        <config-property>
                            ......
                            <config-property-name>ConnectionParameters</config-property-name>
                            <config-property-type>java.lang.String</config-property-type>
                            <config-property-value>host=192.168.134.128;port=5445,host=192.168.134.128;port=5446,host=192.168.134.130;port=5445,host=192.168.134.130;port=5446</config-property-value>
                        </config-property>
                        <config-property>
                            ......
                            <config-property-name>HA</config-property-name>
                            <config-property-type>java.lang.Boolean</config-property-type>
                            <config-property-value>true</config-property-value>
                        </config-property>

                   

                  After this configuration, I can see consumers on the backup server after killing the live server, and the backup server is able to process the unhandled messages.

                   

                  The above scenario works fine, but while testing further I ran into a new problem.

                   

                  After I restart the live server, say Live1, I see Live1 take back control and Backup1 announce itself as a backup again. When sending messages, according to the message counts in the JMX console, the messages reach both Live1 and Live2 equally, but the consumers that actually consume them are always on Live2. Live1 has consumers, but it no longer consumes any messages. If I stop Live2, Live1 begins to consume messages again; if I then start Live2 again, Live2 in turn stops consuming and all the messages are consumed by Live1.

                   

                   

                   

                  I have tested further, and it seems the "no load balancing on the client side" problem I described above was caused by the ra.xml changes.

                   

                  I removed the changes to ra.xml and the messages started to be consumed equally again, which of course takes me back to the earlier stage where my backup server has no consumers.

                   

                   

                  So I am back to the original point now.

                  • 6. Re: Cluster didn't distribute messages
                    michellegx11

                    I have updated hornetq-configuration.xml on the backup server to allow message redistribution by adding <redistribution-delay>0</redistribution-delay> under the <address-settings> section.
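
                    The block, roughly (the address match here is an assumption and should be adjusted to cover your queues):

                        <address-settings>
                           <address-setting match="jms.#">
                              <!-- 0 = redistribute as soon as local consumers are gone; the default -1 never redistributes -->
                              <redistribution-delay>0</redistribution-delay>
                           </address-setting>
                        </address-settings>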

                     

                    Now the backup server is able to forward the unhandled messages to the other cluster machine after the live server crashes. After the live server comes back, it forms the cluster with the other live machine again and the messages are handled in a balanced way.