
    Round-robin client-side load-balancing and connection failure detection with pooled-connection-factory configured with multiple netty-connectors

    vbchin2

      We are attempting to use a pooled-connection-factory (HornetQ) configured with two netty-connectors pointing to two remote AS 7 instances, each acting purely as a JMS messaging server. The setup (quoted below) works, but we find that:

       

      1. The connection pool appears to be filled with connections from only one netty-connector
      2. When a series of connections is opened and closed in sequence, with only one connection active at a time, we find that every connection created belongs to the same netty-connector. We understand the pool behavior, but we were hoping to see connections alternate (round-robin being the default behavior) between the two configured netty-connectors; see the sketch after this list.
      3. Assume a connection was created using one netty-connector (for example, netty-remote1), and messaging server #1 dies between that connection being closed and a new connection being requested. Instead of throwing an error stack trace, the pooled-connection-factory keeps trying to get a connection from the dead messaging server, waiting indefinitely.
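
      For reference, the open/close test was along the lines of the minimal sketch below (the bean and method names are illustrative, not our actual test code); it looks up the java:/RemoteJmsXA entry defined in the configuration further down and cycles connections one at a time:

          import javax.annotation.Resource;
          import javax.ejb.Stateless;
          import javax.jms.Connection;
          import javax.jms.ConnectionFactory;
          import javax.jms.JMSException;

          @Stateless
          public class ConnectorProbeBean {

              // Pooled connection factory bound by the configuration below
              @Resource(mappedName = "java:/RemoteJmsXA")
              private ConnectionFactory remoteCf;

              // Opens and closes connections one at a time. In our tests every
              // physical connection ended up on the same netty-connector instead
              // of alternating between netty-remote1 and netty-remote2.
              public void probe(int iterations) throws JMSException {
                  for (int i = 0; i < iterations; i++) {
                      Connection connection = remoteCf.createConnection();
                      try {
                          connection.start();
                      } finally {
                          connection.close(); // hands the connection back to the pool
                      }
                  }
              }
          }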


      So coming to the questions:

      1. Is this a well-known or expected behavior?
      2. What can be done so that the pooled-connection-factory:
        • Detects failures and serves only the established connections, instead of waiting indefinitely for the dead server to come back up
        • Creates the connection pool by round-robining between the netty-connectors, and also serves connections in round-robin fashion when a connection is requested


      The following goes into the domain.xml file for the app servers (pasting only the relevant sections):

          <connectors>
              <netty-connector name="netty" socket-binding="messaging"/>
              <netty-connector name="netty-remote1" socket-binding="messaging-remote1"/>
              <netty-connector name="netty-remote2" socket-binding="messaging-remote2"/>
              <netty-connector name="netty-throughput" socket-binding="messaging-throughput">
                  <param key="batch-delay" value="50"/>
              </netty-connector>
              <in-vm-connector name="in-vm" server-id="0"/>
          </connectors>

          <pooled-connection-factory name="hornetq-ra-remote">
              <connectors>
                  <connector-ref connector-name="netty-remote1"/>
                  <connector-ref connector-name="netty-remote2"/>
              </connectors>
              <entries>
                  <entry name="java:/RemoteJmsXA"/>
              </entries>
          </pooled-connection-factory>

          ....

          <outbound-socket-binding name="messaging-remote1">
              <remote-destination host="192.168.1.2" port="5445"/>
          </outbound-socket-binding>
          <outbound-socket-binding name="messaging-remote2">
              <remote-destination host="192.168.1.3" port="5445"/>
          </outbound-socket-binding>
          <outbound-socket-binding name="mail-smtp">
              <remote-destination host="localhost" port="25"/>
          </outbound-socket-binding>
      </socket-binding-group>

      (Attachment: Messaging Architecture.jpg)

        • 1. Re: Round-robin client-side load-balancing and connection failure detection with pooled-connection-factory configured with multiple netty-connectors
          vbchin2

          We believe we achieved closure on this issue. Before I dive into what worked and how, I would like to point out that, to achieve the same objective, the HornetQ configuration had to differ slightly between EAP 6.0.1 and EAP 6.1.1, which was quite unexpected. Thanks to Justin for all his feedback on a relevant ticket raised directly through Red Hat Support.


          Disclaimer: I understand that the discussions here are meant for the community product only, but I think this post will help somebody in a similar situation.


          Common behavior and configuration between EAP 6.0.1 and EAP 6.1.1

           

          1. The connection pool appears to be filled with connections from only one netty-connector
          2. When a series of connections is opened and closed in sequence, with only one connection active at a time, we find that every connection created belongs to the same netty-connector. We understand the pool behavior, but we were hoping to see connections alternate (round-robin being the default behavior) between the two configured netty-connectors.
            1. The above behavior was consistent regardless of whether or not all the previous connections were closed before a new one was opened.
            2. The client-side round-robin behavior for connections (not messages) was *not* honored by a pooled-connection-factory configured with more than one netty-connector.


          We therefore decided to fall back on server-side clustering of messages (not connections) to achieve (somewhat) the same behavior when it comes to message distribution. In effect, all of the configuration posted in the original post remained the same, except for a minor change for EAP 6.1.1 discussed below.


          Different behavior and configuration between EAP 6.0.1 and EAP 6.1.1


          In EAP 6.1.1:


          Suppose a test was done to send messages via the pooled-connection-factory configured on the app server, and (for example) the netty-remote1 connector pointing to messaging server 1 was chosen to send the batch of messages. If the same test was then repeated with messaging server 1 down, the app server kept trying to establish a connection to the downed server, without giving up and falling back on the other connector. We *did not* see the same behavior on 6.0.1.
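
          For context, the batch-send part of that test was roughly along the lines of the sketch below (the bean name and the queue binding are illustrative, not our exact code):

              import javax.annotation.Resource;
              import javax.ejb.Stateless;
              import javax.jms.Connection;
              import javax.jms.ConnectionFactory;
              import javax.jms.JMSException;
              import javax.jms.MessageProducer;
              import javax.jms.Queue;
              import javax.jms.Session;

              @Stateless
              public class BatchSenderBean {

                  @Resource(mappedName = "java:/RemoteJmsXA")
                  private ConnectionFactory remoteCf;

                  // Illustrative destination; the real queue in our test differs
                  @Resource(mappedName = "java:/queue/testQueue")
                  private Queue testQueue;

                  // Sends a batch of messages through the pooled-connection-factory;
                  // the whole batch goes out over whichever connector the pool picked.
                  public void sendBatch(int count) throws JMSException {
                      Connection connection = remoteCf.createConnection();
                      try {
                          Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                          MessageProducer producer = session.createProducer(testQueue);
                          for (int i = 0; i < count; i++) {
                              producer.send(session.createTextMessage("message-" + i));
                          }
                      } finally {
                          connection.close(); // also closes the session and producer
                      }
                  }
              }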


          The fix was to throw in <reconnect-attempts>0</reconnect-attempts> as shown below:

              <pooled-connection-factory name="hornetq-ra-remote">
                  <connectors>
                      <connector-ref connector-name="netty-remote1"/>
                      <connector-ref connector-name="netty-remote2"/>
                  </connectors>
                  <entries>
                      <entry name="java:/RemoteJmsXA"/>
                  </entries>
                  <reconnect-attempts>0</reconnect-attempts>
              </pooled-connection-factory>



          In EAP 6.0.1:


          When we relied on server-side load balancing of messages, we noticed that the core bridge established as part of the cluster would not terminate if one of the messaging servers was brought down. The problem we faced was that the pooled-connection-factory detected the connection failure and redirected the whole batch of messages to the live server. But since the live server was attempting to distribute half of its load to the other server, which was dead, and kept waiting for it to come back up, it never displayed the remaining half in the management console or made it available for consumption. We *did not* see the same behavior on 6.1.1.

           

          To fix that issue we had to add the same snippet, but into the following section on both of the messaging servers:

           

           

              <cluster-connections>
                  <cluster-connection name="my-standalone-cluster">
                      <address>jms</address>
                      <connector-ref>netty</connector-ref>
                      <discovery-group-ref discovery-group-name="dg-group2"/>
                      <reconnect-attempts>0</reconnect-attempts>
                  </cluster-connection>
              </cluster-connections>

           

           

          If anyone has any follow-up questions on what was discussed, please let me know.

          • 2. Re: Round-robin client-side load-balancing and connection failure detection with pooled-connection-factory configured with multiple netty-connectors
            gaohoward

            Is there an <ha> attribute for the pooled-connection-factory? Like, if you do

             

            <pooled-connection-factory name="...">
                <ha>true</ha>
                ......
            </pooled-connection-factory>

             

            Will it make any difference?

            • 3. Re: Round-robin client-side load-balancing and connection failure detection with pooled-connection-factory configured with multiple netty-connectors
              gaohoward

              OK, that might not help. Please ignore the messages, I'm deleting them.

               

              Thanks

              Howard