How often do you see this message? The message itself says, "will attempt reconnect on next pass," so I'm curious if it actually does reconnect after a few warnings.
The JBoss server log is shown below. While attempting to reconnect on the next pass, an XA exception occurs, and the server stops receiving events from the remote HornetQ cluster.
23:57:52,009 WARN [org.hornetq.jms.server.recovery.HornetQXAResourceWrapper] (Thread-197 (HornetQ-client-global-threads-1956891780)) Notified of connection failure in xa recovery connectionFactory for provider ClientSessionFactoryImpl [serverLocator=ServerLocatorImpl [initialConnectors=[org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=10-252-122-239], discoveryGroupConfiguration=DiscoveryGroupConfiguration [discoveryInitialWaitTimeout=10000, groupAddress=231.7.7.8, groupPort=9879, localBindAddress=null, name=5198dc46-710e-11e3-bfe1-00505689359f, refreshTimeout=10000]], connectorConfig=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=10-252-122-239, backupConfig=null] will attempt reconnect on next pass: HornetQException[errorCode=2 message=Channel disconnected]
at org.hornetq.core.client.impl.ClientSessionFactoryImpl.connectionDestroyed(ClientSessionFactoryImpl.java:380) [hornetq-core-2.2.13.Final.jar:]
at org.hornetq.core.remoting.impl.netty.NettyConnector$Listener$1.run(NettyConnector.java:711) [hornetq-core-2.2.13.Final.jar:]
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100) [hornetq-core-2.2.13.Final.jar:]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_21]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_21]
at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_21]
23:57:52,535 WARN [org.hornetq.core.cluster.impl.DiscoveryGroupImpl] (hornetq-discovery-group-thread-51bda25a-710e-11e3-bfe1-00505689359f) There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=ff558ad8-67be-11e3-9ca1-d78e71deb2e5
23:57:52,536 WARN [org.hornetq.core.cluster.impl.DiscoveryGroupImpl] (hornetq-discovery-group-thread-51bd2d28-710e-11e3-bfe1-00505689359f) There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=ff558ad8-67be-11e3-9ca1-d78e71deb2e5
23:57:52,538 WARN [org.hornetq.core.cluster.impl.DiscoveryGroupImpl] (hornetq-discovery-group-thread-5198dc44-710e-11e3-bfe1-00505689359f) There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=ff558ad8-67be-11e3-9ca1-d78e71deb2e5
23:57:57,873 WARN [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016027: Local XARecoveryModule.xaRecovery got XA exception XAException.XAER_RMERR: javax.transaction.xa.XAException: Error trying to connect to any providers for xa recovery
at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.getDelegate(HornetQXAResourceWrapper.java:275) [hornetq-jms-2.2.13.Final.jar:]
at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.recover(HornetQXAResourceWrapper.java:77) [hornetq-jms-2.2.13.Final.jar:]
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecovery(XARecoveryModule.java:503) [jbossjts-4.16.2.Final.jar:]
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.resourceInitiatedRecoveryForRecoveryHelpers(XARecoveryModule.java:471) [jbossjts-4.16.2.Final.jar:]
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.bottomUpRecovery(XARecoveryModule.java:385) [jbossjts-4.16.2.Final.jar:]
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass(XARecoveryModule.java:166) [jbossjts-4.16.2.Final.jar:]
at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:789) [jbossjts-4.16.2.Final.jar:]
at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:371) [jbossjts-4.16.2.Final.jar:]
Caused by: java.lang.IllegalStateException: Cannot create session factory, server locator is closed (maybe it has been garbage collected)
at org.hornetq.core.client.impl.ServerLocatorImpl.assertOpen(ServerLocatorImpl.java:1823) [hornetq-core-2.2.13.Final.jar:]
at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:699) [hornetq-core-2.2.13.Final.jar:]
at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.connect(HornetQXAResourceWrapper.java:321) [hornetq-jms-2.2.13.Final.jar:]
at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.getDelegate(HornetQXAResourceWrapper.java:251) [hornetq-jms-2.2.13.Final.jar:]
If you really want failover functionality (which it appears you do), you need to enable it by setting <ha>true</ha> on the pooled-connection-factory.
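For reference, a minimal sketch of what that could look like in the messaging subsystem of standalone.xml. This is an illustration, not your actual configuration: the connector name, JNDI entry, and the reconnect-attempts value are placeholders.

```xml
<!-- Hypothetical pooled-connection-factory; connector and entry names
     are placeholders for your actual configuration. -->
<pooled-connection-factory name="hornetq-ra">
    <!-- Enable client-side high availability / failover -->
    <ha>true</ha>
    <!-- Keep retrying after the live server goes down; -1 = retry forever -->
    <reconnect-attempts>-1</reconnect-attempts>
    <transaction mode="xa"/>
    <connectors>
        <connector-ref connector-name="netty-remote"/>
    </connectors>
    <entries>
        <entry name="java:/JmsXA"/>
    </entries>
</pooled-connection-factory>
```

Without <reconnect-attempts> set to a non-zero value, the factory may give up after the first connection loss even with <ha>true</ha>.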
Even after adding this tag (<ha>true</ha>), the JBoss server stops receiving messages after the very first failover on the HornetQ cluster, i.e., JBoss receives messages only until the first HornetQ failover that happens after JBoss has started.
What resource is it still unable to find? Are you talking about the earlier WARN message from HornetQXAResourceWrapper? Are you seeing any functional impact from that? Are you actually using XA transactions in your application?
Sorry for being unclear. What I meant was: when the resource-adapter tag is not provided, the exception above occurs. When the resource-adapter tag is provided, the XA exception does not occur; instead, the message shown below appears three times, and the server still stops receiving further messages from the HornetQ cluster. The disturbing part is that a failover breaks functionality, and the JBoss server has to be restarted to regain it. I don't think we are using any XA transactions in our application right now, but we may in the future, which was the reason for using XA. If this can be solved by avoiding XA transactions, that would also be fine with us for the time being.
00:35:14,356 WARN [org.hornetq.core.cluster.impl.DiscoveryGroupImpl] (hornetq-discovery-group-thread-013546bf-7114-11e3-9370-00505689359f) There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=ff558ad8-67be-11e3-9ca1-d78e71deb2e5
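If sidestepping XA for now is acceptable, one option is to switch the pooled-connection-factory to local transactions, so the Arjuna periodic-recovery thread no longer needs to register and reconnect an XA resource for HornetQ. A sketch, assuming the standard messaging-subsystem schema; the connector and JNDI names are placeholders:

```xml
<pooled-connection-factory name="hornetq-ra">
    <!-- Local transactions: the XA recovery manager will not attempt
         to connect to HornetQ during its recovery passes -->
    <transaction mode="local"/>
    <ha>true</ha>
    <connectors>
        <connector-ref connector-name="netty-remote"/>
    </connectors>
    <entries>
        <entry name="java:/JmsLocal"/>
    </entries>
</pooled-connection-factory>
```

Note this only removes the XA recovery path; whether the underlying consumer reconnects after failover still depends on the <ha> and reconnect settings.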
After enabling DEBUG logging, the following appeared in the JBoss server.log. The "Node ... going up" line near the end shows that the JBoss server is able to detect the live and backup server details, yet the earlier "backup update" line shows that the backup configuration is null. Is this because backupConfig is being set before the actual details are obtained via discovery?
04:29:05,182 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] (Periodic Recovery) Trying to connect with connector = org.hornetq.core.remoting.impl.netty.NettyConnectorFactory@7b4ad4ae, parameters = {port=5445, host=10.252.122.239} connector = NettyConnector [host=10.252.122.239, port=5445, httpEnabled=false, useServlet=false, servletPath=/messaging/HornetQServlet, sslEnabled=false, useNio=false]
04:29:05,183 DEBUG [org.hornetq.core.remoting.impl.netty.NettyConnector] (Periodic Recovery) Started Netty Connector version 3.2.5.Final-a96d88c
04:29:05,183 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] (Periodic Recovery) Trying to connect at the main server using connector :org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=10-252-122-239
04:29:05,186 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] (Periodic Recovery) Reconnection successfull
04:29:05,186 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] (Periodic Recovery) ClientSessionFactoryImpl received backup update for live/backup pair = org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=10-252-122-239 / null but it didn't belong to org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=10-252-122-239
04:29:05,189 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] (Old I/O client worker ([id: 0x6dd12abe, /10.252.122.248:58338 => /10.252.122.239:5445])) Node ff558ad8-67be-11e3-9ca1-d78e71deb2e5 going up, connector = Pair[a=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=10-252-122-239, b=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5446&host=10-252-122-240], isLast=true csf created at
serverLocator=ServerLocatorImpl [initialConnectors=[org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=10-252-122-239], discoveryGroupConfiguration=DiscoveryGroupConfiguration [discoveryInitialWaitTimeout=10000, groupAddress=231.7.7.8, groupPort=9879, localBindAddress=null, name=f6e95cf7-71fd-11e3-bf8f-00505689359f, refreshTimeout=10000]]: java.lang.Exception
at org.hornetq.core.client.impl.ClientSessionFactoryImpl.<init>(ClientSessionFactoryImpl.java:180) [hornetq-core-2.2.13.Final.jar:]
at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:732) [hornetq-core-2.2.13.Final.jar:]
at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.connect(HornetQXAResourceWrapper.java:321) [hornetq-jms-2.2.13.Final.jar:]
at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.getDelegate(HornetQXAResourceWrapper.java:251) [hornetq-jms-2.2.13.Final.jar:]
at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.recover(HornetQXAResourceWrapper.java:77) [hornetq-jms-2.2.13.Final.jar:]
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecovery(XARecoveryModule.java:503) [jbossjts-4.16.2.Final.jar:]
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.resourceInitiatedRecoveryForRecoveryHelpers(XARecoveryModule.java:471) [jbossjts-4.16.2.Final.jar:]
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.bottomUpRecovery(XARecoveryModule.java:385) [jbossjts-4.16.2.Final.jar:]
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass(XARecoveryModule.java:166) [jbossjts-4.16.2.Final.jar:]
at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:789) [jbossjts-4.16.2.Final.jar:]
at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:371) [jbossjts-4.16.2.Final.jar:]