0 Replies Latest reply on Oct 31, 2011 5:15 PM by Alex Corvino

    4 nodes, clustered and HA in 2.2.5.Final

    Alex Corvino Newbie

      Here's what I'm trying to do. I want four instances of HornetQ: two as a load-balancing cluster and two as HA backups. Currently, when I run all four instances of HornetQ, I see the following, very encouraging, messages:

       

      Instance 1

      [main] 12:54:11,459 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.3.Final-r${buildNumber} localhost:5455 for CORE protocol

      [main] 12:54:11,509 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  Server is now live

      [main] 12:54:11,510 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  HornetQ Server version 2.2.5.Final (HQ_2_2_5_FINAL_AS7, 121) [e4064500-01b5-11e1-b1fa-005056a10001] started

      [Thread-22 (group:HornetQ-server-threads2137169521-780298059)] 12:54:16,641 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Connecting bridge sf.my-cluster.615f7230-01bc-11e1-93df-005056a10001 to its destination [e4064500-01b5-11e1-b1fa-005056a10001]

      [Thread-22 (group:HornetQ-server-threads2137169521-780298059)] 12:54:16,878 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Bridge sf.my-cluster.615f7230-01bc-11e1-93df-005056a10001 is connected [e4064500-01b5-11e1-b1fa-005056a10001-> sf.my-cluster.615f7230-01bc-11e1-93df-005056a10001]

       

      Instance 2

      [main] 12:54:15,520 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.3.Final-r${buildNumber} localhost:5446 for CORE protocol

      [main] 12:54:15,545 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  Server is now live

      [main] 12:54:15,545 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  HornetQ Server version 2.2.5.Final (HQ_2_2_5_FINAL_AS7, 121) [615f7230-01bc-11e1-93df-005056a10001] started

      [Thread-21 (group:HornetQ-server-threads2137169521-780298059)] 12:54:16,625 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Connecting bridge sf.my-cluster.e4064500-01b5-11e1-b1fa-005056a10001 to its destination [615f7230-01bc-11e1-93df-005056a10001]

      [Thread-21 (group:HornetQ-server-threads2137169521-780298059)] 12:54:16,825 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Bridge sf.my-cluster.e4064500-01b5-11e1-b1fa-005056a10001 is connected [615f7230-01bc-11e1-93df-005056a10001-> sf.my-cluster.e4064500-01b5-11e1-b1fa-005056a10001]

       

      Instance 3 (Instance 1's backup)

      [Thread-1] 12:54:22,897 INFO [org.hornetq.core.server.impl.AIOFileLockNodeManager]  Waiting to become backup node

      [Thread-1] 12:54:22,899 INFO [org.hornetq.core.server.impl.AIOFileLockNodeManager]  ** got backup lock

      [Thread-1] 12:54:22,937 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager]  Using AIO Journal

      [Thread-1] 12:54:22,960 WARNING [org.hornetq.core.server.impl.HornetQServerImpl]  Security risk! It has been detected that the cluster admin user and password have not been changed from the installation default. Please see the HornetQ user guide, cluster chapter, for instructions on how to do this.

      [Thread-1] 12:54:23,380 INFO [org.hornetq.core.server.cluster.impl.ClusterManagerImpl]  announcing backup

      [Thread-1] 12:54:23,390 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  HornetQ Backup Server version 2.2.5.Final (HQ_2_2_5_FINAL_AS7, 121) [e4064500-01b5-11e1-b1fa-005056a10001] started, waiting live to fail before it gets active

      [Thread-0 (group:HornetQ-server-threads1543103262-1433965066)] 12:54:25,668 INFO [org.hornetq.core.server.cluster.impl.ClusterManagerImpl]  backup announced

       

      Instance 4 (Instance 2's backup)

      [Thread-1] 12:54:29,251 INFO [org.hornetq.core.server.impl.AIOFileLockNodeManager]  Waiting to become backup node

      [Thread-1] 12:54:29,254 INFO [org.hornetq.core.server.impl.AIOFileLockNodeManager]  ** got backup lock

      [Thread-1] 12:54:29,288 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager]  Using AIO Journal

      [Thread-1] 12:54:29,309 WARNING [org.hornetq.core.server.impl.HornetQServerImpl]  Security risk! It has been detected that the cluster admin user and password have not been changed from the installation default. Please see the HornetQ user guide, cluster chapter, for instructions on how to do this.

      [Thread-1] 12:54:29,728 INFO [org.hornetq.core.server.cluster.impl.ClusterManagerImpl]  announcing backup

      [Thread-1] 12:54:29,736 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  HornetQ Backup Server version 2.2.5.Final (HQ_2_2_5_FINAL_AS7, 121) [615f7230-01bc-11e1-93df-005056a10001] started, waiting live to fail before it gets active

      [Thread-0 (group:HornetQ-server-threads1543103262-1433965066)] 12:54:30,661 INFO [org.hornetq.core.server.cluster.impl.ClusterManagerImpl]  backup announced
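
      The backup startup messages above ("Waiting to become backup node" / "** got backup lock") reflect how shared-store HA works: the live and backup servers coordinate through a file lock on the shared journal, and the backup activates only once it can take the lock. A minimal JDK-only sketch of that handoff (illustrative only, not HornetQ's actual AIOFileLockNodeManager code):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

public class SharedStoreLockDemo {

    // Simulates the live/backup handoff: the live server owns the lock, the
    // backup waits, and when the live lock is released the backup acquires it.
    static String simulate() throws Exception {
        File store = File.createTempFile("journal", ".lock");
        store.deleteOnExit();
        try (RandomAccessFile liveFile = new RandomAccessFile(store, "rw");
             RandomAccessFile backupFile = new RandomAccessFile(store, "rw")) {
            FileChannel live = liveFile.getChannel();
            FileChannel backup = backupFile.getChannel();

            FileLock liveLock = live.tryLock();      // live server starts, owns the store
            boolean liveHolds = liveLock != null && liveLock.isValid();

            boolean backupWaiting;
            try {                                    // backup cannot take the lock yet
                FileLock l = backup.tryLock();
                backupWaiting = (l == null);
            } catch (OverlappingFileLockException e) {
                backupWaiting = true;                // same-JVM demo: the lock is already held
            }

            liveLock.release();                      // the live server "crashes"
            FileLock backupLock = backup.tryLock();  // backup gets the lock and goes live
            boolean failover = backupLock != null && backupLock.isValid();
            backupLock.release();

            return "live=" + liveHolds
                 + " backup-waiting=" + backupWaiting
                 + " failover=" + failover;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(simulate());
    }
}
```

      In the real servers the two JVMs contend for the lock across processes; the same-JVM demo above only mimics the sequence of events.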

       

      So far, so good! For my test I start sending JMS messages with their JMSReplyTo field set to a non-temporary reply queue. About halfway through, I kill instance 1 right before a reply is sent. When I do that I see:

       

      Instance 2:

      [Thread-6 (group:HornetQ-client-global-threads-2041134904)] 13:04:02,895 WARNING [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl]  Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]

      [Thread-4 (group:HornetQ-client-global-threads-2041134904)] 13:04:02,896 WARNING [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl]  Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]

      [Thread-5 (group:HornetQ-client-global-threads-2041134904)] 13:04:02,917 WARNING [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl]  Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]

      [Thread-5 (group:HornetQ-client-global-threads-2041134904)] 13:04:02,918 WARNING [org.hornetq.core.server.cluster.impl.BridgeImpl]  sf.my-cluster.e4064500-01b5-11e1-b1fa-005056a10001::Connection failed before reconnect

      HornetQException[errorCode=4 message=The connection was disconnected because of server shutdown]

              at org.hornetq.core.client.impl.ClientSessionFactoryImpl$Channel0Handler$1.run(ClientSessionFactoryImpl.java:1262)

              at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

              at java.lang.Thread.run(Thread.java:662)

      [Thread-5 (group:HornetQ-client-global-threads-2041134904)] 13:04:06,068 WARNING [org.hornetq.core.server.cluster.impl.BridgeImpl]  sf.my-cluster.e4064500-01b5-11e1-b1fa-005056a10001::Connection failed with failedOver=true

      HornetQException[errorCode=4 message=The connection was disconnected because of server shutdown]

              at org.hornetq.core.client.impl.ClientSessionFactoryImpl$Channel0Handler$1.run(ClientSessionFactoryImpl.java:1262)

              at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

              at java.lang.Thread.run(Thread.java:662)

      [Old I/O client worker ([id: 0x315863e4, /127.0.0.1:48053 => localhost/127.0.0.1:65445])] 13:04:06,130 WARNING [org.hornetq.core.protocol.core.impl.ChannelImpl]  Can't find packet to clear:  last received command id 2 first stored command id 2

      [Old I/O client worker ([id: 0x315863e4, /127.0.0.1:48053 => localhost/127.0.0.1:65445])] 13:04:06,130 WARNING [org.hornetq.core.protocol.core.impl.ChannelImpl]  Can't find packet to clear:  last received command id 3 first stored command id 3

      [Old I/O client worker ([id: 0x315863e4, /127.0.0.1:48053 => localhost/127.0.0.1:65445])] 13:04:06,137 WARNING [org.hornetq.core.protocol.core.impl.ChannelImpl]  Can't find packet to clear:  last received command id 4 first stored command id 4

      [Old I/O client worker ([id: 0x315863e4, /127.0.0.1:48053 => localhost/127.0.0.1:65445])] 13:04:06,139 WARNING [org.hornetq.core.protocol.core.impl.ChannelImpl]  Can't find packet to clear:  last received command id 5 first stored command id 5

      [hornetq-discovery-group-thread-dg-group2] 13:04:08,370 WARNING [org.hornetq.core.cluster.impl.DiscoveryGroupImpl]  There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=e4064500-01b5-11e1-b1fa-005056a10001

       

      Instance 3 (Instance 1's backup)

      [Thread-1] 13:04:05,931 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.3.Final-r${buildNumber} localhost:65445 for CORE protocol

      [Thread-1] 13:04:05,937 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.3.Final-r${buildNumber} localhost:65455 for CORE protocol

      [Thread-1] 13:04:05,974 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  Backup Server is now live

      [Thread-23 (group:HornetQ-server-threads1543103262-1433965066)] 13:04:06,048 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Connecting bridge sf.my-cluster.615f7230-01bc-11e1-93df-005056a10001 to its destination [e4064500-01b5-11e1-b1fa-005056a10001]

      [Thread-23 (group:HornetQ-server-threads1543103262-1433965066)] 13:04:06,166 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Bridge sf.my-cluster.615f7230-01bc-11e1-93df-005056a10001 is connected [e4064500-01b5-11e1-b1fa-005056a10001-> sf.my-cluster.615f7230-01bc-11e1-93df-005056a10001]


      Again, so far so good! Instance 2 complains that another node in its cluster went down, and instance 3 takes over as the new live server. Just what I want. Then I have my test consumer send its reply, and I get an exception:

       

      Caused by: javax.jms.IllegalStateException: Session is closed

              at org.hornetq.jms.client.HornetQSession.checkClosed(HornetQSession.java:1008)

              at org.hornetq.jms.client.HornetQSession.createObjectMessage(HornetQSession.java:165)
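
      For what it's worth, here is a toy model of one way this could happen (every type below is a stand-in I made up, not the real javax.jms or Spring classes): a caching connection factory keeps handing back a session object that died with the old live server until its cache is cleared.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a JMS Session that refuses to work once closed.
class ToySession {
    private boolean closed;
    void close() { closed = true; }
    String createObjectMessage() {
        if (closed) throw new IllegalStateException("Session is closed");
        return "msg";
    }
}

// Stand-in for a session-caching factory: it always returns the cached
// instance, even if that instance has since been closed by a failover.
class ToyCachingFactory {
    private final List<ToySession> cache = new ArrayList<>();
    ToySession getSession() {
        if (cache.isEmpty()) cache.add(new ToySession());
        return cache.get(0);
    }
    void reset() { cache.clear(); }  // drop stale cached sessions
}

public class FailoverCacheDemo {
    public static void main(String[] args) {
        ToyCachingFactory factory = new ToyCachingFactory();
        ToySession s = factory.getSession();
        s.close();                        // the old live server dies; the cached session is dead
        try {
            factory.getSession().createObjectMessage();
        } catch (IllegalStateException e) {
            System.out.println("cached session: " + e.getMessage());
        }
        factory.reset();                  // clear stale entries, then retry
        System.out.println("after reset: " + factory.getSession().createObjectMessage());
    }
}
```

      Whether Spring's session cache actually behaves this way during a HornetQ failover is exactly what I'm unsure about; the sketch is just to make the failure mode concrete.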

       

      Not what I was hoping for. All of these queues are non-transacted and were created on the JMS side from a connection factory built with HornetQJMSClient.createConnectionFactoryWithHA. The core API is also used, through a ServerLocator, to create each queue if it doesn't already exist. Both are wired into Spring like so:

       

       


      <util:constant id="QUEUE_CF"
          static-field="org.hornetq.api.jms.JMSFactoryType.QUEUE_CF" />

      <bean name="targetConnectionFactory" class="org.hornetq.api.jms.HornetQJMSClient"
          factory-method="createConnectionFactoryWithHA">
          <constructor-arg index="0" ref="QUEUE_CF" />
          <constructor-arg index="1" ref="transportConfiguration" />
      </bean>

      <bean name="targetServerLocatorFactory" class="org.hornetq.api.core.client.HornetQClient"
          factory-method="createServerLocatorWithHA">
          <constructor-arg index="0" ref="transportConfiguration" />
      </bean>

      <bean name="transportConfiguration"
          class="org.hornetq.api.core.TransportConfiguration">
          <constructor-arg
              value="org.hornetq.core.remoting.impl.netty.NettyConnectorFactory" />
          <constructor-arg>
              <map key-type="java.lang.String" value-type="java.lang.Object">
                  <entry key="host" value="localhost" />
                  <entry key="port" value="5445" />
              </map>
          </constructor-arg>
      </bean>

      <bean id="jmsFactory"
          class="org.springframework.jms.connection.CachingConnectionFactory">
          <property name="targetConnectionFactory" ref="targetConnectionFactory" />
          <property name="sessionCacheSize" value="100" />
          <property name="cacheProducers" value="true" />
      </bean>

       

      Finally, the settings I've changed in hornetq-configuration.xml look like this:

       

      <clustered>true</clustered>

      <shared-store>true</shared-store>

      <persistence-enabled>true</persistence-enabled>

      <failover-on-shutdown>true</failover-on-shutdown>

      <allow-failback>false</allow-failback>

      <jmx-management-enabled>false</jmx-management-enabled>

      <journal-directory>/testenv/shared/hq/data/journal</journal-directory>

      <paging-directory>/testenv/shared/hq/data/paging</paging-directory>

      <bindings-directory>/testenv/shared/hq/data/bindings</bindings-directory>

      <journal-min-files>10</journal-min-files>  

      <large-messages-directory>/testenv/shared/hq/data/large-messages</large-messages-directory>

       

      Instances 3 and 4 additionally have <backup>true</backup>. Any input on where I may have gone wrong?

       

      Edit: After reading this thread (http://community.jboss.org/thread/174199?tstart=0) I also tried setting failover-on-shutdown to false, but that didn't change the behavior.