7 Replies Latest reply on May 20, 2010 12:55 PM by artp

    Core Bridge Failover issues

    artp

      I'm having issues with a core bridge during failover. I have two clusters of JBoss 5.1 servers running HornetQ 2.0. Cluster A contains three nodes (A1, A2, A3); cluster B has two nodes (B1, B2). Each node in cluster A has a core bridge configured to send messages to cluster B (see below), with B1 as the connector and B2 as the backup. To test failover, I took down B1, and messages produced on cluster A went to B2 as expected. Then I took down B2, so no nodes were running in cluster B. I waited for some time, then brought B1 back up. After B1 started, I saw an exception on each node in cluster A (see below) and a few exceptions on B1 (one is shown below). I tried sending messages on node A1, but none were sent over the bridge. Messages were backing up on the forwarding address queue (jms.topic.WOPEvents) of the bridge.

       

      It looks like my issue is similar to https://community.jboss.org/thread/149213

       

      Also, would it help to upgrade to 2.1?

       

       

      Bridge on cluster A nodes

      <bridge name="ots-bridge">
         <queue-name>jms.queue.OTSForward</queue-name>
         <forwarding-address>jms.topic.WOPEvents</forwarding-address>
         <retry-interval>5000</retry-interval>
         <reconnect-attempts>-1</reconnect-attempts>
         <failover-on-server-shutdown>true</failover-on-server-shutdown>
         <use-duplicate-detection>false</use-duplicate-detection>
         <connector-ref connector-name="B1"
                        backup-connector-name="B2"/>
      </bridge>
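
      For readers following along, the B1 and B2 connectors that the bridge refers to could be declared roughly as below. This is only a sketch, not the actual configuration from this setup: the host names are placeholders, port 5445 is taken from the log further down, the factory class is inferred from the package seen in that log, and the param key names are an assumption that should be checked against the manual for the release in use, since they may differ between releases.

      ```xml
      <!-- Hypothetical connector definitions; host names are placeholders -->
      <connectors>
         <connector name="B1">
            <factory-class>org.hornetq.integration.transports.netty.NettyConnectorFactory</factory-class>
            <!-- Key names are assumed; verify them against your release -->
            <param key="hornetq.remoting.netty.host" value="b1.example.com"/>
            <param key="hornetq.remoting.netty.port" value="5445"/>
         </connector>
         <connector name="B2">
            <factory-class>org.hornetq.integration.transports.netty.NettyConnectorFactory</factory-class>
            <param key="hornetq.remoting.netty.host" value="b2.example.com"/>
            <param key="hornetq.remoting.netty.port" value="5445"/>
         </connector>
      </connectors>
      ```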

       

       

       

      Exception on A1,A2,A3

      2010-05-18 23:14:41,522 WARN  [org.hornetq.core.remoting.impl.RemotingConnectionImpl] (Thread-19 (group:HornetQ-client-global-threads-968713772)) Connection failure has been detected: Did not receive data from server for org.hornetq.integration.transports.netty.NettyConnection@220334b4[local= /10.20.28.168:56337, remote=euca-10-20-28-165.eucalyptus.ec.company.corp/10.20.28.165:5445] [code=3]
      2010-05-18 23:15:11,528 ERROR [org.hornetq.core.client.impl.ClientSessionImpl] (Thread-14 (group:HornetQ-client-global-threads-968713772)) Failed to handle failover
      HornetQException[errorCode=3 message=Timed out waiting for response when sending packet 32]
              at org.hornetq.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:270)
              at org.hornetq.core.client.impl.ClientSessionImpl.handleFailover(ClientSessionImpl.java:863)
              at org.hornetq.core.client.impl.FailoverManagerImpl.reconnectSessions(FailoverManagerImpl.java:785)
              at org.hornetq.core.client.impl.FailoverManagerImpl.failoverOrReconnect(FailoverManagerImpl.java:686)
              at org.hornetq.core.client.impl.FailoverManagerImpl.handleConnectionFailure(FailoverManagerImpl.java:548)
              at org.hornetq.core.client.impl.FailoverManagerImpl.access$600(FailoverManagerImpl.java:69)
              at org.hornetq.core.client.impl.FailoverManagerImpl$DelegatingFailureListener.connectionFailed(FailoverManagerImpl.java:1111)
              at org.hornetq.core.remoting.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:445)
              at org.hornetq.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:250)
              at org.hornetq.core.client.impl.FailoverManagerImpl$PingRunnable$1.run(FailoverManagerImpl.java:1169)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)

       

       

      Exception on B1

      7:39,274 ERROR [org.hornetq.core.client.impl.ClientSessionImpl] (Thread-6 (group:HornetQ-client-global-threads-1492690777)) Failed to handle failover
      HornetQException[errorCode=3 message=Timed out waiting for response when sending packet 32]
              at org.hornetq.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:270)
              at org.hornetq.core.client.impl.ClientSessionImpl.handleFailover(ClientSessionImpl.java:863)
              at org.hornetq.core.client.impl.FailoverManagerImpl.reconnectSessions(FailoverManagerImpl.java:785)
              at org.hornetq.core.client.impl.FailoverManagerImpl.failoverOrReconnect(FailoverManagerImpl.java:686)
              at org.hornetq.core.client.impl.FailoverManagerImpl.handleConnectionFailure(FailoverManagerImpl.java:548)
              at org.hornetq.core.client.impl.FailoverManagerImpl.access$600(FailoverManagerImpl.java:69)
              at org.hornetq.core.client.impl.FailoverManagerImpl$DelegatingFailureListener.connectionFailed(FailoverManagerImpl.java:1111)
              at org.hornetq.core.remoting.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:445)
              at org.hornetq.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:250)
              at org.hornetq.core.client.impl.FailoverManagerImpl$PingRunnable$1.run(FailoverManagerImpl.java:1169)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)

        • 1. Re: Core Bridge Failover issues
          ataylor

          Failing back over from a backup node to the original node is not yet supported. I think there is a JIRA for this for 2.2.

          • 2. Re: Core Bridge Failover issues
            artp

            How should I recover from that situation? All nodes in cluster B are down, and when I bring B1 up, the bridge doesn't reconnect from any of the cluster A nodes. I would have expected the bridge from cluster A to reconnect to any B node, since none were available (FYI, I took down the nodes in cluster B gracefully). Is that not the case after failover occurs? Would I need to bring up the failover node (i.e. B2) first to ensure the bridge reconnects?

            • 3. Re: Core Bridge Failover issues
              artp

              Also, when is 2.2 scheduled to be released?

              • 4. Re: Core Bridge Failover issues
                clebert.suconic

                He is doing a failover from one live node to another live node...

                 

                I thought we had a JIRA for that, but I can't find it..

                 

                It seems the failover is working fine; however, he is not able to fail back. Maybe we should allow circular failover when failing over between live nodes.

                • 5. Re: Core Bridge Failover issues
                  artp

                  Just to be clear, what I'm trying to achieve is to have the core bridge fail over to another node in cluster B when the node the bridge is connected to leaves the cluster. All the nodes in cluster B were clustered live JBoss 5.1 servers, with no backup configured in the sense of the HA chapter of the user manual.

                   

                  The functionality I was seeking was similar to the JMS Bridge, which can be configured with multiple servers in provider.url. If one fails, it will try to connect to the next in the list.

                  • 6. Re: Core Bridge Failover issues
                    timfox

                    How do you configure the JMS bridge with multiple targets?

                     

                    Currently a core bridge always tries to reconnect to the same node.

                     

                    It can also optionally failover to another node.

                     

                    There's a JIRA somewhere for automatically reconnecting to other nodes.
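
                    As a rough illustration of the two behaviours described above, here is the bridge from the first post with comments on which elements drive each one. It is a sketch for discussion, not a recommended configuration; the names B1 and B2 are simply reused from that post.

                    ```xml
                    <bridge name="ots-bridge">
                       <queue-name>jms.queue.OTSForward</queue-name>
                       <forwarding-address>jms.topic.WOPEvents</forwarding-address>
                       <!-- Reconnecting to the same node: wait 5000 ms between attempts -->
                       <retry-interval>5000</retry-interval>
                       <!-- A value of -1 means keep retrying indefinitely -->
                       <reconnect-attempts>-1</reconnect-attempts>
                       <!-- Optional failover: also fail over when the target is shut down cleanly -->
                       <failover-on-server-shutdown>true</failover-on-server-shutdown>
                       <use-duplicate-detection>false</use-duplicate-detection>
                       <!-- backup-connector-name is the optional part; without it only B1 is retried -->
                       <connector-ref connector-name="B1"
                                      backup-connector-name="B2"/>
                    </bridge>
                    ```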

                    • 7. Re: Core Bridge Failover issues
                      artp

                      Starting from the JMS Bridge example, I was adding multiple hosts to java.naming.provider.url. BTW, I've noticed that my core bridge configuration doesn't seem to connect correctly if the target server isn't available when I start up the source host first. See my configuration below. The steps are: first start up the source cluster, then start up the target node configured in the core bridge. I'm still troubleshooting this issue. Is it worth upgrading to 2.1, given that I'm running this in JBoss?

                       

                      <bean name="TargetJNDI">
                         <constructor>
                            <map keyClass="java.lang.String"
                                 valueClass="java.lang.String">
                               <entry>
                                  <key>java.naming.factory.initial</key>
                                  <value>org.jnp.interfaces.NamingContextFactory</value>
                               </entry>
                               <entry>
                                  <key>java.naming.provider.url</key>
                                  <value>host01:1100,host02:1100</value>
                               </entry>
                               <entry>
                                  <key>java.naming.factory.url.pkgs</key>
                                  <value>org.jboss.naming:org.jnp.interfaces</value>
                               </entry>
                               <entry>
                                  <key>jnp.partitionName</key>
                                  <value>artp-OTS</value>
                               </entry>
                            </map>
                         </constructor>
                      </bean>
                      
                      
                      

                       

                       

                      Each source host node in cluster A has two bridges to the target host in cluster B. All nodes are JBoss 5.1 with HornetQ 2.0 GA.

                       

                      <bridge name="ots-bridge">
                         <queue-name>jms.queue.OTSForward</queue-name>
                         <forwarding-address>jms.topic.WOPEvents</forwarding-address>
                         <retry-interval>5000</retry-interval>
                         <reconnect-attempts>-1</reconnect-attempts>
                         <use-duplicate-detection>false</use-duplicate-detection>
                         <connector-ref connector-name="ots-connector-0"/>
                      </bridge>
                      <bridge name="ots-pkg-bridge">
                         <queue-name>jms.queue.OTSPkgForward</queue-name>
                         <forwarding-address>jms.queue.OTSEventQueue</forwarding-address>
                         <retry-interval>5000</retry-interval>
                         <reconnect-attempts>-1</reconnect-attempts>
                         <use-duplicate-detection>false</use-duplicate-detection>
                         <connector-ref connector-name="ots-connector-0"/>
                      </bridge>