
    JMS queues started on both nodes in a cluster after temporary network loss

    iankenn

      Hi

      I'm currently developing a system which uses JMS queuing for async processing of messages. I'm looking at deploying to a cluster of two JBoss 3.2.3 servers to provide some level of fail-over/resilience.

      During testing of the JMS fail-over I've tried killing one of the JBoss instances (the one running the JMS server) and have seen the JMS queues migrate to the other node. But when I simulated a temporary loss of network connectivity between the two machines (by removing one of the network cables and then replacing it), the cluster seemed to break and both machines started running the JMS queues.

      When the network cable is reconnected, neither node appears to know that there is another node in the same partition. Effectively the cluster is not re-established, and the only way to make the two nodes see each other again is to restart one of them. Is there something that I have misconfigured or not configured? I am new to clustering and would appreciate some advice. I am currently testing on two Windows machines but intend to deploy to Linux boxes.
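
      For reference, the queue itself is deployed with a plain JBossMQ destination MBean along these lines (the queue name and file name below are just placeholders for my real ones, and where the descriptor should live in a clustered setup is part of what I'm unsure about):

          <!-- my-queue-service.xml (sketch) - deploys the queue used for async processing -->
          <server>
            <mbean code="org.jboss.mq.server.jmx.Queue"
                   name="jboss.mq.destination:service=Queue,name=MyAsyncQueue">
              <depends optional-attribute-name="DestinationManager">
                jboss.mq:service=DestinationManager
              </depends>
            </mbean>
          </server>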

      Thanks,

      Ian

        • 1. Re: JMS queues started on both nodes in a cluster after temporary network loss
          crobert

          Hello,

          That looks like the same problem I'm having with JBoss 3.2.3 in
          http://www.jboss.org/index.html?module=bb&op=viewtopic&t=45855
          Unfortunately, no one has answered.

          I don't use JMS, but the behaviour seems similar: if I kill one of the servers, the cluster does not always re-form. If I shut it down gracefully, the cluster re-forms.

          Regards,
          Robert

          • 2. Re: JMS queues started on both nodes in a cluster after temporary network loss

            It sounds like the merge processing is not working correctly.

            Are you seeing messages on the console saying it is attempting to merge
            the nodes?
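
            For reference, the merge is driven by the MERGE2 protocol in the
            partition's JGroups stack, configured inside the ClusterPartition
            MBean in deploy/cluster-service.xml. A rough excerpt, with values
            that are only illustrative defaults - check your own file:

                <Config>
                  <UDP mcast_addr="228.1.2.3" mcast_port="45566" ip_ttl="8"/>
                  <PING timeout="2000" num_initial_members="3"/>
                  <MERGE2 min_interval="5000" max_interval="10000"/>
                  <FD shun="true" timeout="2500" max_tries="5"/>
                  <VERIFY_SUSPECT timeout="3000"/>
                  <!-- remaining protocols (NAKACK, UNICAST, STABLE, GMS, ...)
                       omitted here -->
                </Config>

            If MERGE2 is missing from the stack, sub-partitions are never
            detected and merged after the cable is reconnected.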

            Post the steps and the logs as a bug at www.sf.net/projects/jboss.
            Enable the example cluster TRACE logging found at the bottom of
            conf/log4j.xml to get a complete log.
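
            The clustering example at the bottom of conf/log4j.xml looks
            roughly like this once uncommented (the exact categories may
            differ slightly between versions):

                <category name="org.jgroups">
                  <priority value="TRACE" class="org.jboss.logging.XLevel"/>
                </category>
                <category name="org.jboss.ha">
                  <priority value="TRACE" class="org.jboss.logging.XLevel"/>
                </category>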

            Regards,
            Adrian

            • 3. Re: JMS queues started on both nodes in a cluster after temporary network loss
              iankenn

              I do see the following messages (sometimes), on the node which was not the singleton before the network error:
              12:00:47,920 INFO [DefaultPartition:ReplicantManager] Start merging members in DRM service...
              12:00:48,045 INFO [HAILServerILService] Notified to stop acting as singleton.
              12:00:48,061 INFO [DefaultPartition:ReplicantManager] ..Finished merging members in DRM service

              It does not always try to merge, and even when it says that it is merging, it doesn't seem to merge the cluster state correctly.

              I will repeat the test with TRACE on and submit it as a bug.

              Thanks

              Ian