5 Replies Latest reply on Oct 24, 2011 10:46 AM by dskiles

    HornetQ clustering issues on a 3 node setup

    dskiles

      I'm currently trying to set up a load-balanced, three-node JBoss cluster for an application that uses HornetQ.  When I create a two-node cluster, everything works beautifully.

       

      When I set up a three node cluster, one of the nodes always throws an error that NettyConnector has failed to create a netty connection due to a SocketTimeoutException (stack trace is attached).  When I try to run a test case through that setup, 1/3 of all of my messages seem to get lost.

       

      If I restart the offending node while the other two are still running, I don't get the exception, but I still see the same behavior when I rerun my tests.

       

      Nothing in the FAQ jumped out at me as an obvious issue: each node is assigned a specific bind address (not 0.0.0.0) and all three nodes can communicate with one another.  Am I missing something obvious here that would cause messages to get dropped in a three node setup, but not a two node?

       

      I have attached the stack trace that I mentioned earlier, as well as the hornetq-configuration.xml and hornetq-jms.xml for each node.  If anyone has any suggestions or ideas, I would appreciate hearing them.  If any more information would be useful, please let me know.

        • 1. Re: HornetQ clustering issues on a 3 node setup
          dskiles

          Additional details based on further testing

           

          Based on what I'm seeing in Spring's TRACE level logging, I am definitely sending 1000 messages using JmsTemplate*.  The logging also seems to show that the Spring JMS listener on one node is receiving exactly 1/3 of all of the requests, while the other two nodes are each receiving 1/6 of all requests.  This corresponds with what I am seeing in my own application level logging, as well as what eventually ends up persisted to the database.  The remaining 1/3 of all messages seems to just go out into the void.

           

          When I look at the JBoss JMX console for the JMS queue in question on each node, the "MessagesAdded" attribute seems to record that all three nodes receive exactly 1/3 of the messages, but on 2 of the nodes, there's a disconnect between that value and the number of messages I actually see through logging and in the values that my application persists.

           

          When I parse the available data, I see this pattern emerge, where A, B, and C represent the node that is processing each message:

           

          A, B, A, C, A, B, A, C, A, B, A, etc...

           

          Which node is A, which is B, and which is C seems to largely be the luck of the draw between cluster restarts.
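The arithmetic behind that pattern can be sketched as follows. This is only an illustration of the numbers above, not a model of HornetQ internals. It assumes messages are spread round-robin across three nodes, that node A keeps everything, and that nodes B and C each drop every other message they are handed. The choice of which half B and C drop, and the class and method names, are hypothetical and were picked so that the output matches the observed order.

```java
import java.util.ArrayList;
import java.util.List;

class DeliveryPatternSketch {

    /** Returns the label of the node that processes each surviving message, in send order. */
    static List<Character> processedOrder(int messageCount) {
        char[] nodes = {'A', 'B', 'C'};
        int[] arrivals = new int[nodes.length]; // messages handed to each node so far
        List<Character> processed = new ArrayList<>();

        for (int i = 0; i < messageCount; i++) {
            int node = i % nodes.length;        // round-robin distribution (assumption)
            int localIndex = arrivals[node]++;

            boolean kept;
            switch (node) {
                case 0:  kept = true;                break; // A keeps everything
                case 1:  kept = localIndex % 2 == 0; break; // B keeps its 1st, 3rd, ... (assumption)
                default: kept = localIndex % 2 == 1; break; // C keeps its 2nd, 4th, ... (assumption)
            }
            if (kept) {
                processed.add(nodes[node]);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // Prints the processing order for the first 12 messages.
        System.out.println(processedOrder(12));

        // Prints how many of 1000 messages survive under these assumptions.
        System.out.println(processedOrder(1000).size() + " of 1000 processed");
    }
}
```

Under these assumptions A handles roughly 1/3 of what was sent, B and C roughly 1/6 each, and the remaining third never reaches a listener, which lines up with the counts in the logs.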

           

          *(I am aware that JmsTemplate is regarded as an antipattern.  However, this is pre-existing code and I am doing what I can to mitigate the problem by using a cached connection factory.)

          • 2. Re: HornetQ clustering issues on a 3 node setup
            dskiles

            Second update based on additional testing.  I would be grateful to hear any suggestions.

             

            Using a small test harness I've reproduced something that is almost what I'm seeing in the full system.

             

            I have a standalone, non-clustered server (configuration attached), a class acting as a listener on a queue (attached), and a class that produces messages and sends them to the same queue (attached).

             

            Here's what I'm seeing:

             

            1. Start standalone server.
            2. Launch listener.
            3. Launch producer and allow it to complete.
            4. Listener receives all 100 messages.
            5. Stop listener.
            6. Start listener again.
            7. Launch producer again.
            8. Listener will only receive every other message.

             

            This is close to what I'm seeing on 2 of the 3 nodes in my cluster.  Do I have a configuration mistake or something?
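One way to confirm the "every other message" symptom from steps like the ones above is to number the messages and check which numbers never arrive. The sketch below is a hypothetical helper, not part of the attached harness: it assumes the producer stamps each message with a sequence number from 0 to sent - 1 and that the listener records the numbers it receives.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class SequenceGapCheck {

    /** Returns the sequence numbers in [0, sent) that do not appear in received. */
    static List<Integer> missing(int sent, int[] received) {
        BitSet seen = new BitSet(sent);
        for (int seq : received) {
            if (seq >= 0 && seq < sent) {
                seen.set(seq); // ignore anything outside the expected range
            }
        }

        List<Integer> gaps = new ArrayList<>();
        for (int seq = seen.nextClearBit(0); seq < sent; seq = seen.nextClearBit(seq + 1)) {
            gaps.add(seq);
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Simulated listener log: only even-numbered messages out of 100 arrived.
        int[] received = new int[50];
        for (int k = 0; k < received.length; k++) {
            received[k] = 2 * k;
        }
        System.out.println("Missing: " + missing(100, received));
    }
}
```

If the gaps come out strictly alternating, that points at something taking a regular share of the messages rather than at random loss.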

            • 3. Re: HornetQ clustering issues on a 3 node setup
              clebert.suconic

              It seems your acceptor is only listening on localhost.

               

              Set a real IP address on the acceptor and connector. Make it the same as the one you are using for the static cluster.

              • 4. Re: HornetQ clustering issues on a 3 node setup
                dskiles

                Sorry about the late response.  My testbed got co-opted for other purposes.

                 

                On all three hosts I removed the system property reference for the local node and replaced it with a string literal in the connector and acceptor. (configs are attached).

                 

                After restarting the system I reproduced my test scenario in my main application and saw the same behavior that I had before.

                 

                All three connectors on each host reference a specific, string literal bind address and a specific, string literal port.  All three nodes can communicate with each other, but on two of the nodes exactly one half of the incoming messages are getting lost.

                 

                Any other ideas?

                • 5. Re: HornetQ clustering issues on a 3 node setup
                  dskiles

                  I ended up fixing this by systematically changing each configuration setting, one by one, until the problem went away.

                   

                  The offending configuration element was <forward-when-no-consumers>true</forward-when-no-consumers>.  Setting that value to false fixed the problem.
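For context, as I read the HornetQ documentation, forward-when-no-consumers controls whether a cluster connection distributes messages round-robin to other nodes regardless of whether those nodes have consumers on the queue. The toy model below only illustrates that documented rule for the remote nodes; it is not HornetQ code, it does not explain why my nodes lost messages, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class ForwardingSketch {

    /**
     * Returns the indices of remote nodes that may receive forwarded messages.
     * remoteConsumers[i] is the number of consumers on the queue at remote node i.
     */
    static List<Integer> eligibleRemoteNodes(int[] remoteConsumers, boolean forwardWhenNoConsumers) {
        List<Integer> eligible = new ArrayList<>();
        for (int node = 0; node < remoteConsumers.length; node++) {
            // true: forward regardless of consumers; false: only where consumers exist
            if (forwardWhenNoConsumers || remoteConsumers[node] > 0) {
                eligible.add(node);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        int[] remoteConsumers = {1, 0, 1}; // the middle node has no consumers

        System.out.println("true:  " + eligibleRemoteNodes(remoteConsumers, true));
        System.out.println("false: " + eligibleRemoteNodes(remoteConsumers, false));
    }
}
```

With the setting at true, a node with no consumer attached is still in the rotation, so anything forwarded to it sits there until something consumes it.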