-
1. Re: Automatic fail-over is not working
ataylor Feb 15, 2012 6:33 AM (in response to perltom)Do you see the backup server announce itself, with a log line like "backup announced"?
Also, if they are on the same machine, make sure you have a loopback address configured.
-
2. Re: Automatic fail-over is not working
perltom Feb 15, 2012 9:44 AM (in response to ataylor)Hi Andy,
Yes, to both questions:
- the live and back-up servers are on the same host
- the back-up server announces itself correctly.
The following is the output from the log files:
Live server's log
==================================
[main] 09:34:47,557 INFO [org.hornetq.integration.bootstrap.HornetQBootstrapServer] Starting HornetQ Server
[main] 09:34:48,768 INFO [org.hornetq.core.server.impl.HornetQServerImpl] live server is starting with configuration HornetQ Configuration (clustered=true,backup=false,sharedStore=true,journalDirectory=/jms02/journal,bindingsDirectory=/jms02/bindings,largeMessagesDirectory=/jms02/large-messages,pagingDirectory=/jms02/paging)
[main] 09:34:48,769 INFO [org.hornetq.core.server.impl.HornetQServerImpl] Waiting to obtain live lock
[main] 09:34:48,804 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager] Using AIO Journal
[main] 09:34:49,007 INFO [org.hornetq.core.server.impl.AIOFileLockNodeManager] Waiting to obtain live lock
[main] 09:34:49,008 INFO [org.hornetq.core.server.impl.AIOFileLockNodeManager] Live Server Obtained live lock
[main] 09:34:56,863 INFO [org.hornetq.core.server.impl.HornetQServerImpl] trying to deploy queue jms.queue.DLQ
[main] 09:34:56,887 INFO [org.hornetq.core.server.impl.HornetQServerImpl] trying to deploy queue jms.queue.ExpiryQueue
[main] 09:34:56,893 INFO [org.hornetq.core.server.impl.HornetQServerImpl] trying to deploy queue jms.queue.ExampleQueue
[main] 09:34:56,898 INFO [org.hornetq.core.server.impl.HornetQServerImpl] trying to deploy queue jms.topic.CacheMQTopic
[main] 09:34:56,907 INFO [org.hornetq.core.server.impl.HornetQServerImpl] trying to deploy queue jms.topic.exampleTopic
[main] 09:34:57,008 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor] Started Netty Acceptor version 3.2.3.Final-r${buildNumber} wpqajms02.wiley.com:10500 for CORE protocol
[main] 09:34:57,011 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor] Started Netty Acceptor version 3.2.3.Final-r${buildNumber} wpqajms02.wiley.com:10400 for CORE protocol
[main] 09:34:57,028 INFO [org.hornetq.core.server.impl.HornetQServerImpl] Server is now live
[main] 09:34:57,028 INFO [org.hornetq.core.server.impl.HornetQServerImpl] HornetQ Server version 2.2.5.Final (HQ_2_2_5_FINAL_AS7, 121) [0fe2b036-55ed-11e1-be2b-001a64664c6a] started
[Thread-29 (group:HornetQ-server-threads1173925084-435456241)] 09:34:57,076 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl] Connecting bridge sf.wileyplus-cluster.3f1be2a6-54f8-11e1-860b-0015172ce7cd to its destination [0fe2b036-55ed-11e1-be2b-001a64664c6a]
[Thread-0 (group:HornetQ-server-threads1173925084-435456241)] 09:34:57,108 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl] Connecting bridge sf.wileyplus-cluster.4c7983db-54f8-11e1-885c-001a64664c6a to its destination [0fe2b036-55ed-11e1-be2b-001a64664c6a]
[Thread-0 (group:HornetQ-server-threads1173925084-435456241)] 09:34:57,149 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl] Bridge sf.wileyplus-cluster.4c7983db-54f8-11e1-885c-001a64664c6a is connected [0fe2b036-55ed-11e1-be2b-001a64664c6a-> sf.wileyplus-cluster.4c7983db-54f8-11e1-885c-001a64664c6a]
[Thread-29 (group:HornetQ-server-threads1173925084-435456241)] 09:34:57,157 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl] Bridge sf.wileyplus-cluster.3f1be2a6-54f8-11e1-860b-0015172ce7cd is connected [0fe2b036-55ed-11e1-be2b-001a64664c6a-> sf.wileyplus-cluster.3f1be2a6-54f8-11e1-860b-0015172ce7cd]
The back-up server's log
======================================
[main] 09:34:52,759 INFO [org.hornetq.integration.bootstrap.HornetQBootstrapServer] Starting HornetQ Server
[main] 09:34:53,837 INFO [org.hornetq.core.server.impl.HornetQServerImpl] backup server is starting with configuration HornetQ Configuration (clustered=true,backup=true,sharedStore=true,journalDirectory=/jms02/journal,bindingsDirectory=/jms02/bindings,largeMessagesDirectory=/jms02/large-messages,pagingDirectory=/jms02/paging)
[Thread-1] 09:34:53,840 INFO [org.hornetq.core.server.impl.AIOFileLockNodeManager] Waiting to become backup node
[Thread-1] 09:34:53,841 INFO [org.hornetq.core.server.impl.AIOFileLockNodeManager] ** got backup lock
[Thread-1] 09:34:53,871 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager] Using AIO Journal
[Thread-1] 09:34:54,028 INFO [org.hornetq.core.server.cluster.impl.ClusterManagerImpl] announcing backup
[Thread-1] 09:34:54,029 INFO [org.hornetq.core.server.impl.HornetQServerImpl] HornetQ Backup Server version 2.2.5.Final (HQ_2_2_5_FINAL_AS7, 121) [0fe2b036-55ed-11e1-be2b-001a64664c6a] started, waiting live to fail before it gets active
[Thread-0 (group:HornetQ-server-threads452794384-1921072065)] 09:34:55,085 INFO [org.hornetq.core.server.cluster.impl.ClusterManagerImpl] backup announced
Live server goes down
====================================
[hornetq-shutdown-thread] 09:39:05,551 INFO [org.hornetq.integration.bootstrap.HornetQBootstrapServer] Stopping HornetQ Server...
Back-up becomes live
====================================
[Thread-1] 09:39:14,077 INFO [org.hornetq.core.server.impl.HornetQServerImpl] trying to deploy queue jms.queue.DLQ
[Thread-1] 09:39:14,099 INFO [org.hornetq.core.server.impl.HornetQServerImpl] trying to deploy queue jms.queue.ExpiryQueue
[Thread-1] 09:39:14,105 INFO [org.hornetq.core.server.impl.HornetQServerImpl] trying to deploy queue jms.queue.ExampleQueue
[Thread-1] 09:39:14,111 INFO [org.hornetq.core.server.impl.HornetQServerImpl] trying to deploy queue jms.topic.CacheMQTopic
[Thread-1] 09:39:14,174 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor] Started Netty Acceptor version 3.2.3.Final-r${buildNumber} wpqajms02.wiley.com:11400 for CORE protocol
[Thread-1] 09:39:14,177 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor] Started Netty Acceptor version 3.2.3.Final-r${buildNumber} wpqajms02.wiley.com:11500 for CORE protocol
[Thread-1] 09:39:14,238 INFO [org.hornetq.core.server.impl.HornetQServerImpl] Backup Server is now live
[Thread-8 (group:HornetQ-server-threads452794384-1921072065)] 09:39:14,448 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl] Connecting bridge sf.wileyplus-cluster.3f1be2a6-54f8-11e1-860b-0015172ce7cd to its destination [0fe2b036-55ed-11e1-be2b-001a64664c6a]
[Thread-2 (group:HornetQ-server-threads452794384-1921072065)] 09:39:14,448 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl] Connecting bridge sf.wileyplus-cluster.4c7983db-54f8-11e1-885c-001a64664c6a to its destination [0fe2b036-55ed-11e1-be2b-001a64664c6a]
[Thread-8 (group:HornetQ-server-threads452794384-1921072065)] 09:39:14,490 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl] Bridge sf.wileyplus-cluster.3f1be2a6-54f8-11e1-860b-0015172ce7cd is connected [0fe2b036-55ed-11e1-be2b-001a64664c6a-> sf.wileyplus-cluster.3f1be2a6-54f8-11e1-860b-0015172ce7cd]
The following are the settings used to set up both servers. The settings are read by the run.sh script.
# Cluster settings
# a1
data_dir_a1=/jms01
jnp_port_a1=10000
jnp_rmiPort_a1=10100
jmx_port_a1=10200
hq_host_a1=myhost
remoting_netty_port_a1=10400
remoting_netty_batch_port_a1=10500
# a2
data_dir_a2=/jms01
jnp_port_a2=11000
jnp_rmiPort_a2=11100
jmx_port_a2=11200
hq_host_a2=myhost
remoting_netty_port_a2=11400
remoting_netty_batch_port_a2=11500
Each node in the cluster consists of a live and a back-up server that access a shared SAN file system - /jms01.
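For completeness, a shared-store live/backup pair in HornetQ 2.2.x is configured along these lines in each server's hornetq-configuration.xml. This is only a sketch - the directories follow the settings above, everything else is illustrative. Note in particular failover-on-shutdown, which defaults to false in 2.2.x; if I read the docs correctly, without it a clean shutdown of the live server may not trigger client fail-over, only a crash will.

<!-- live server (sketch) -->
<configuration xmlns="urn:hornetq">
   <clustered>true</clustered>
   <backup>false</backup>
   <shared-store>true</shared-store>
   <failover-on-shutdown>true</failover-on-shutdown>
   <journal-directory>/jms01/journal</journal-directory>
   <bindings-directory>/jms01/bindings</bindings-directory>
   <large-messages-directory>/jms01/large-messages</large-messages-directory>
   <paging-directory>/jms01/paging</paging-directory>
   <!-- connectors, acceptors and cluster-connections omitted -->
</configuration>

The backup server uses the identical directories with <backup>true</backup>, so it can take over the shared journal once it obtains the live lock.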
I hope this helps.
Thanks,
Dimitar
-
3. Re: Automatic fail-over is not working
underscore_dot Feb 16, 2012 12:15 PM (in response to perltom)I'm having a similar issue.
I'm using a spring-jms based client, which fails to redirect to the backup server. Instead, it tries to create a new JNDI connection factory from the previous live node.
I successfully ran the jms/non-transaction-failover example that ships with version 2.2.5.Final, which uses plain JMS, so I guess the problem is either that spring-jms doesn't work properly or that I'm not using it correctly.
Are you using spring-jms?
Cheers.
-
4. Re: Automatic fail-over is not working
perltom Feb 16, 2012 5:37 PM (in response to underscore_dot)The primary application that will use the cluster is based on EJB 2.1, but it uses the HornetQ JMS API to connect to the server. At the moment, this application can neither re-connect to the server nor fail over to another node in the cluster. However, the next version of this application (still in beta) is based on Spring, and it looks like the developers working on it were successful in implementing all the HA features - re-attach, re-connect, and fail-over. I plan to verify this in the next few days and post my findings in this thread.
I should also mention that a Java test client was developed to help us test the performance of the HornetQ stand-alone server and cluster. It cannot fail over either. The client was written in Java 6, uses the HornetQ JMS API, and was intended to run in Grinder-type load tests.
-
5. Re: Automatic fail-over is not working
perltom Apr 6, 2012 10:35 AM (in response to perltom)While working on this issue I realized its description needs to be broken into two parts:
- Client session re-connection
- Client fail-over
#1 was addressed by adding the following setting to the connection factory (the connection-factory element) in the hornetq-jms.xml configuration file:
<confirmation-window-size>1048676</confirmation-window-size>
This enables client-side buffering of sent commands, so that a session can be re-attached to the server after a connection failure, and sets the buffer to roughly 1 MB. The default value is -1, which disables the buffer.
Tuning the connection-ttl and client-failure-check-period settings also made sure that connections are not dropped by either the HornetQ server or the clients during peak days.
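Put together, these settings would sit in the connection-factory element of hornetq-jms.xml roughly as follows. This is only a sketch: the factory name, connector name, entry name, and the retry/interval values are illustrative assumptions, not our actual configuration.

<connection-factory name="NettyConnectionFactory">
   <connectors>
      <connector-ref connector-name="netty"/>
   </connectors>
   <entries>
      <entry name="/ConnectionFactory"/>
   </entries>
   <!-- client-side command buffer; required for session re-attachment -->
   <confirmation-window-size>1048576</confirmation-window-size>
   <!-- keep connections alive across quiet periods -->
   <connection-ttl>60000</connection-ttl>
   <client-failure-check-period>30000</client-failure-check-period>
   <!-- retry forever after a failure -->
   <retry-interval>2000</retry-interval>
   <reconnect-attempts>-1</reconnect-attempts>
</connection-factory>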
I still need to test #2 with these changes, but for now we are addressing the client fail-over requirements by implementing listeners on the client side that notify the client when the connection fails. According to the documentation, client fail-over should work out of the box, but at this point I am not sure that is the case. I guess this feature needs further research and a review of the requirements.
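The listener approach mentioned above can be sketched in plain JMS (Java 6 style, to match our client). This is only an illustration - the JNDI name "/ConnectionFactory" and the recovery action in the listener are assumptions, not our actual code:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.ExceptionListener;
import javax.jms.JMSException;
import javax.naming.InitialContext;

public class FailureAwareClient {
    public static void main(String[] args) throws Exception {
        // jndi.properties is assumed to point at the live server's JNDI port
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("/ConnectionFactory");
        Connection connection = cf.createConnection();
        // Invoked asynchronously when the connection to the live server fails
        connection.setExceptionListener(new ExceptionListener() {
            public void onException(JMSException e) {
                System.err.println("Connection failed: " + e.getMessage());
                // either re-create the connection here, or rely on the
                // factory's reconnect-attempts / HA settings to fail over
            }
        });
        connection.start();
    }
}
```

Note that the listener only tells you the connection died; whether the reconnect lands on the backup node depends on the factory's HA configuration.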
In any case, HornetQ works really well in a production environment. At about 4K msgs/min, the CPU and memory utilization of the HornetQ process is very low.
More updates to come..