13 Replies Latest reply on Jul 24, 2014 7:29 PM by Daniel Baum

    Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.

    John Muhlestein Newbie

      Accidentally posted this as an article before...


      I have 4 nodes set up, all running in JBoss 7 (7.1.2). I have upgraded the HornetQ running inside the app server to 2.2.23.AS7.Final.


      Here is my scenario.

      1. start node 1  - all things are good (cluster1.tmx.com)
      2. start node 2  (cluster2.tmx.com)
        1. Node 2 comes up correctly. Through JMX (to Node 2) I can see an entry in ClusterConnection with my cluster name ("cluster1"). When I look at the Nodes attribute, I see 1 node listed.
          1. The node listed has a hash value = node1.host.com/ip address:port
          2. 49917ce9-63e8-11e2-9ffc-b94078d8350a=cluster1.tmx.com/
          3. All things look good
        2. Looking back at Node 1 through JMX, the same ClusterConnection has the following entry:
          1. 72248456-6972-11e2-b14d-4fe9b2c880f3=cluster1.tmx.com/
          2. This is where it gets interesting. Notice that there is a different hash value, but the host name/IP address it points at is NOT the second node in the cluster. Instead, it points at Node 1 itself.
      3. Node 2 is able to communicate with Node1, but Node1 is not able to communicate with Node 2
      4. I now start Node3
        1. Node 3 now has 2 nodes listed in its ClusterConnection, both correctly pointing to Node 2 and Node 1.
        2. Node 2 now has 2 nodes listed in its ClusterConnection. The original entry points back to Node 1; the new entry points back to Node 2 itself.
        3. Node 1 now has 2 nodes listed in its ClusterConnection, and both point back to Node 1 itself.
      5. Node 3 can communicate with 2 and 1, Node 2 can only communicate with Node 1, Node 1 cannot communicate with either 2 or 3
      6. I now start Node4 (wait for it....)
        1. Node 4 now has 3 nodes listed in its ClusterConnection, all three correctly pointing to Node 3, Node 2 and Node 1.
        2. Node 3 now has 3 nodes listed: 2 correctly pointing at Node 2 and Node 1, and a new one pointing back to itself.
        3. Node 2 now has 3 nodes listed: 1 correctly pointing at Node 1 and 2 incorrectly pointing back to itself.
        4. Node 1 now has 3 Nodes listed. All three are incorrectly pointing back to itself.
      7. A couple of other interesting items:
        1. If you look at this entry in Node 3
          1. 72248456-6972-11e2-b14d-4fe9b2c880f3=cluster2.tmx.com/
        2. And the same entry (based on the hash value) in Node1
          1. 72248456-6972-11e2-b14d-4fe9b2c880f3=cluster1.tmx.com/ (this is incorrect, this address points back to the Node 1 machine)
        3. Notice that identical hashes point to different addresses. It seems all the hash values are identified correctly on each node, but the address/host they point to is wrong.
        4. If I go back and stop JBoss on Node 1, then restart it, discovery seems to work correctly: all three nodes listed now point to the correct host name, and Node 1 is able to communicate with Node 2, Node 3 and Node 4.
        5. Restarting each node one at a time eventually gets the entire cluster communicating.
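
The mismatch described in item 7 (the same hash listed with different addresses on different nodes) can be checked mechanically instead of by eye. The following is only a rough sketch: it assumes the Nodes attribute of each ClusterConnection has been copied by hand from JMX into a dict, and the function name `find_conflicting_ids` and the observer labels `node1`, `node2`, `node3` are made up for illustration. It does not talk to JMX or HornetQ itself.

```python
from collections import defaultdict


def find_conflicting_ids(views):
    """Return {node_id: {address: [observers]}} for IDs seen with more than one address.

    `views` maps an observer label to the Nodes map copied from that
    observer's ClusterConnection (node ID -> address string).
    """
    seen = defaultdict(lambda: defaultdict(list))
    for observer, nodes in views.items():
        for node_id, address in nodes.items():
            seen[node_id][address].append(observer)
    # Keep only IDs that at least two observers disagree about.
    return {
        node_id: {addr: sorted(obs) for addr, obs in by_addr.items()}
        for node_id, by_addr in seen.items()
        if len(by_addr) > 1
    }


# Entries copied by hand from the JMX output quoted above.
views = {
    "node1": {"72248456-6972-11e2-b14d-4fe9b2c880f3": "cluster1.tmx.com/"},
    "node2": {"49917ce9-63e8-11e2-9ffc-b94078d8350a": "cluster1.tmx.com/"},
    "node3": {"72248456-6972-11e2-b14d-4fe9b2c880f3": "cluster2.tmx.com/"},
}

conflicts = find_conflicting_ids(views)
for node_id, by_addr in conflicts.items():
    for address, observers in by_addr.items():
        print(f"{node_id} -> {address} (as seen by {', '.join(observers)})")
```

With the entries above, only the 72248456-... hash is reported, because Node 1 and Node 3 disagree about which host it belongs to.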


      Any ideas out there on why this happens?



      All nodes have the following in their standalone.xml config file:

                    <broadcast-group name="hq-cluster-broadcast">
                      <connector-ref connector-name="netty">netty</connector-ref>

                    <discovery-group name="hq-cluster-discovery">

                    <cluster-connection name="cluster-conn-1">
                      <connector-ref connector-name="netty">netty</connector-ref>
                      <discovery-group-ref discovery-group-name="hq-cluster-discovery" />

              <socket-binding name="messaging-group" multicast-address="${jboss.messaging.group.address:}" multicast-port="${jboss.messaging.group.port:9876}"/>