13 Replies Latest reply on Jul 24, 2014 7:29 PM by daniel.baum

    Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.

    johnnysoccer

      Accidentally posted as an article before...

       

      I have 4 nodes set up, all running in JBoss 7 (7.1.2). I have upgraded to HornetQ 2.2.23.AS7.Final running inside the app server.

       

      Here is my scenario.

      1. start node 1  - all things are good (cluster1.tmx.com)
      2. start node 2  (cluster2.tmx.com)
        1. Node 2 comes up correctly. Through JMX (connected to Node 2) I can see an entry in ClusterConnection with my cluster name ("cluster1"). When I look at its Nodes attribute, I see 1 node listed.
          1. The node listed has a hash value = node1.host.com/ip address:port
          2. 49917ce9-63e8-11e2-9ffc-b94078d8350a=cluster1.tmx.com/172.21.162.48:5445
          3. All things look good
        2. Looking back at Node 1 through JMX, the same ClusterConnection has the following entry:
          1. 72248456-6972-11e2-b14d-4fe9b2c880f3=cluster1.tmx.com/172.21.162.48:5445
          2. This is where it gets interesting. Notice that there is a different hash value, but the host name/IP address it points at is NOT the second node in the cluster; it points at Node 1 itself.
      3. Node 2 is able to communicate with Node1, but Node1 is not able to communicate with Node 2
      4. I now start Node3
        1. Node 3 now has 2 nodes listed in its ClusterConnection, correctly pointing to Node 2 and Node 1.
        2. Node 2 now has 2 nodes listed in its ClusterConnection. The original entry points back to Node 1; the new entry points back to Node 2 itself.
        3. Node 1 now has 2 nodes listed in its ClusterConnection. Both entries point back to Node 1 itself.
      5. Node 3 can communicate with 2 and 1, Node 2 can only communicate with Node 1, Node 1 cannot communicate with either 2 or 3
      6. I now start Node4 (wait for it....)
        1. Node 4 now has 3 nodes listed in its ClusterConnection, all three correctly pointing to Node 3, Node 2 and Node 1
        2. Node 3 now has 3 Nodes listed. 2 correctly pointing at Node 2 and Node 1 and a new Node pointing back to itself
        3. Node 2 now has 3 Nodes listed. 1 correctly pointing at Node 1 and 2 incorrectly pointing back to itself
        4. Node 1 now has 3 Nodes listed. All three are incorrectly pointing back to itself.
      7. A couple of other interesting items:
        1. If you look at this entry in Node 3
          1. 72248456-6972-11e2-b14d-4fe9b2c880f3=cluster2.tmx.com/172.21.162.49:5445
        2. And the same entry (based on the hash value) in Node1
          1. 72248456-6972-11e2-b14d-4fe9b2c880f3=cluster1.tmx.com/172.21.162.48:5445 (this is incorrect, this address points back to the Node 1 machine)
        3. Notice that identical hashes point to different addresses. It seems all the hash values are identified correctly in each node, but the address/host they point to is wrong.
        4. If I go back and stop JBoss on Node 1, then restart it, discovery seems to work correctly: all three nodes listed now point to the correct host name, and Node 1 is able to communicate with Node 2, Node 3 and Node 4.
        5. Restarting each node, one at a time, eventually gets the entire cluster communicating.

       

      Any ideas out there on why this happens?

       

       

      All Nodes have the following in their standalone.xml config file

       

       

           <broadcast-groups>
             <broadcast-group name="hq-cluster-broadcast">
               <socket-binding>messaging-group</socket-binding>
               <connector-ref connector-name="netty">netty</connector-ref>
             </broadcast-group>
           </broadcast-groups>
           <discovery-groups>
             <discovery-group name="hq-cluster-discovery">
               <socket-binding>messaging-group</socket-binding>
               <refresh-timeout>60000</refresh-timeout>
             </discovery-group>
           </discovery-groups>
           <cluster-connections>
             <cluster-connection name="cluster-conn-1">
               <connector-ref connector-name="netty">netty</connector-ref>
               <address>jms</address>
               <max-hops>1</max-hops>
               <discovery-group-ref discovery-group-name="hq-cluster-discovery" />
             </cluster-connection>
           </cluster-connections>

       

              <socket-binding name="messaging-group" multicast-address="${jboss.messaging.group.address:231.7.7.7}" multicast-port="${jboss.messaging.group.port:9876}"/>

       

      thanks,

      John

        • 1. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
          gaohoward

          What's your 'netty' connector's configuration for each node?

           

          Howard

          • 2. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
            johnnysoccer

            standalone.xml is the same for all 4 nodes

             

                        <connectors>
                          <netty-connector name="netty" socket-binding="messaging"/>
                          <netty-connector name="netty-throughput" socket-binding="messaging-throughput">
                            <param key="batch-delay" value="50"/>
                          </netty-connector>
                          <in-vm-connector name="in-vm" server-id="0"/>
                        </connectors>
                        <acceptors>
                          <netty-acceptor name="netty" socket-binding="messaging"/>
                          <netty-acceptor name="netty-throughput" socket-binding="messaging-throughput">
                            <param key="batch-delay" value="50"/>
                            <param key="direct-deliver" value="false"/>
                          </netty-acceptor>
                          <in-vm-acceptor name="in-vm" server-id="0"/>
                        </acceptors>

             

                    <socket-binding name="messaging" port="5445"/>

                    <socket-binding name="messaging-throughput" port="5455"/>

                    <socket-binding name="messaging-group" multicast-address="${jboss.messaging.group.address:231.7.7.7}" multicast-port="${jboss.messaging.group.port:9876}"/>

            • 3. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
              johnnysoccer

              Any thoughts on this?

              I included my netty connector information

              Does the connector name need to be different for each node?

               

              John

              • 4. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
                ataylor

                Make sure they aren't all using the same journal.

                • 5. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
                  johnnysoccer

                  I'm not sure what you mean by using the same journal.

                  Do you mean that all 4 machines need to be sharing a drive and all their journal-directory mappings need to point to that shared drive?

                  • 6. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
                    ataylor

                    Do you mean that all 4 machines need to be sharing a drive and all their journal-directory mappings need to point to that shared drive?

                    No, that's exactly what you don't want. Since you are running on different machines, it's not an issue (unless you have accidentally copied over the journal from one machine to another).

                     

                    If you are using the same config, how are you starting each server? I will try tomorrow and see what happens.

                     

                    Also note that a node should not be able to connect to itself as it ignores anything with the same node ID as itself.

                     

                    Also, if you can, try the latest AS7 nightly build.

                    • 7. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
                      johnnysoccer

                      Sorry, I misread your previous post (I read it as "are" instead of "aren't")

                      Each machine is completely independent.

                       

                      I'm not sure I'm doing anything special to get them to start on different machines with the same standalone.xml file.

                       

                      Also note that a node should not be able to connect to itself as it ignores anything with the same node ID as itself.

                       

                      This is interesting. It does appear to ignore the same node ID, but, as I mentioned in the original post, when you look at different nodes in the cluster, the same node ID is mapped to a different machine name/IP address

                      (you can look at point 7.1 and 7.2 above).

                      • 8. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
                        ataylor

                        I'm not sure I'm doing anything special to get them to start on different machines with the same standalone.xml file.

                         

                        In that case, are they not just binding to localhost, which I thought was the default?

                         

                        This is interesting. It does appear to ignore the same node ID, but, as I mentioned in the original post, when you look at different nodes in the cluster, the same node ID is mapped to a different machine name/IP address

                        (you can look at point 7.1 and 7.2 above).

                        That can only happen if you have copied over the journal from one machine to another. Make sure you only do this with a clean installation, or make sure the data directory is clean.

                        • 9. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
                          johnnysoccer

                          We are telling the servers to bind to 0.0.0.0 in the standalone.conf and standalone.sh, so they are binding to all interfaces on the machine.

                           

                          The data directories were all clean to start with. As a matter of fact, I've cleared them out and recreated them several times to destroy all the old bindings.

                           

                          In standalone.conf:

                            JAVA_OPTS="$JAVA_OPTS -Djboss.bind.address.management=0.0.0.0"

                           

                          When we call standalone.sh we pass in -b 0.0.0.0 as well.
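
                          For reference, the -b switch and the jboss.bind.address.* system properties feed the interface definitions in standalone.xml. The fragment below is a sketch of that section as it appears in an unmodified AS 7.1 install; this is an assumption, so check your own copy, since the names and defaults may have been edited locally.

                          ```xml
                          <!-- Sketch of the stock interface definitions (assumed unmodified). -->
                          <interfaces>
                            <!-- Set by -Djboss.bind.address.management=...; falls back to 127.0.0.1 -->
                            <interface name="management">
                              <inet-address value="${jboss.bind.address.management:127.0.0.1}"/>
                            </interface>
                            <!-- Set by -b <address> (jboss.bind.address); falls back to 127.0.0.1 -->
                            <interface name="public">
                              <inet-address value="${jboss.bind.address:127.0.0.1}"/>
                            </interface>
                          </interfaces>
                          ```

                          A socket binding with no interface attribute, such as the "messaging" binding posted above, uses the default interface of its socket-binding-group (normally "public"), so with -b 0.0.0.0 it resolves to the wildcard address rather than to a specific host address.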

                          • 10. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
                            ataylor

                            Can you try the nightly build? If you still see the same behavior, I will take a look tomorrow.

                            • 11. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
                              johnnysoccer

                              Nightly build of HornetQ or JBoss 7?

                               

                              Going to a version of JBoss7 > 7.1.2 isn't an option due to software certification.

                              • 12. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
                                johnnysoccer

                                Just as an FYI, I seem to have discovered the problem.

                                 

                                1. The discovery-group name on each node needs to be unique.

                                2. I also modified the configuration so that each node in the cluster has its netty-connector set up on a unique port (node1 = 5445, node2 = 5446, node3 = 5447, etc.).

                                 

                                These configuration changes seem to have solved the problems with initial discovery not working correctly.
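
                                As an illustration of these two changes, node 2's configuration might look like the fragment below. It is a sketch built from the configuration posted earlier in the thread: the "-node2" name suffix is hypothetical (the post does not show the exact names used), and the port follows the node2 = 5446 scheme described above.

                                ```xml
                                <!-- Sketch for node 2 only; the "-node2" suffix is an illustrative name. -->
                                <discovery-groups>
                                  <discovery-group name="hq-cluster-discovery-node2">
                                    <socket-binding>messaging-group</socket-binding>
                                    <refresh-timeout>60000</refresh-timeout>
                                  </discovery-group>
                                </discovery-groups>
                                <cluster-connections>
                                  <cluster-connection name="cluster-conn-1">
                                    <connector-ref connector-name="netty">netty</connector-ref>
                                    <address>jms</address>
                                    <max-hops>1</max-hops>
                                    <!-- Must match the renamed discovery group above. -->
                                    <discovery-group-ref discovery-group-name="hq-cluster-discovery-node2"/>
                                  </cluster-connection>
                                </cluster-connections>

                                <!-- In the socket-binding-group: node 1 keeps 5445, node 2 uses 5446, and so on. -->
                                <socket-binding name="messaging" port="5446"/>
                                ```

                                The multicast "messaging-group" binding stays identical on every node, since all nodes still have to broadcast and listen on the same group address and port to find each other.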

                                • 13. Re: Clustering standalone HornetQ in Jboss7. Initial discovery is incorrect.
                                  daniel.baum

                                  I had this same problem and attempted to do the 2 fixes that John mentioned.  I only had to do #2 in order to get my first node to forward JMS messages to my second node.  Here is what I changed in the socket bindings...

                                   

                                  Went from

                                  <socket-binding name="messaging" port="5445"/>

                                  to

                                  <socket-binding name="messaging" port="${jboss.default.massaging.address:5445}"/>

                                   

                                  And in the .bat scripts to start each server just set "-Djboss.default.massaging.address=5445" or "-Djboss.default.massaging.address=5446" and so on.