
    JMS queues started on both nodes in a cluster after temporary network loss

    iankenn

      Hi

      I'm currently developing a system which uses JMS queuing for async processing of messages. I'm looking at deploying to a cluster of two JBoss 3.2.3 servers to provide some level of fail-over/resilience.

      During testing of the JMS fail-over I've tried killing one of the JBoss instances (the one running the JMS server) and have seen the JMS queues migrate to the other node. But when I simulated a temporary loss of network connectivity between the two machines (by removing one of the network cables and then replacing it), the cluster seemed to break and both machines started running the JMS queues.

      When the network cable is reconnected, neither node appears to know that there is another node in the same partition. Effectively the cluster is not re-established, and the only way to make the two nodes see each other again is to restart one of them. Is there something that I have misconfigured or not configured? I am new to clustering and would appreciate some advice. I am currently testing on two Windows machines but intend to deploy to Linux boxes.
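
      For reference, the queue itself is deployed with a plain JBossMQ destination MBean along these lines (the queue name and file name below are just placeholders for my real ones, and where the descriptor should live in a clustered setup is part of what I'm unsure about):

          <!-- my-queue-service.xml (sketch) - deploys the queue used for async processing -->
          <server>
            <mbean code="org.jboss.mq.server.jmx.Queue"
                   name="jboss.mq.destination:service=Queue,name=MyAsyncQueue">
              <depends optional-attribute-name="DestinationManager">
                jboss.mq:service=DestinationManager
              </depends>
            </mbean>
          </server>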

      Thanks,

      Ian

        • 1. Re: JMS queues started on both nodes in a cluster after temporary network loss
          crobert

          Hello,

          That looks like the same problem I'm having with JBoss 3.2.3 in
          http://www.jboss.org/index.html?module=bb&op=viewtopic&t=45855
          Unfortunately, no one has answered.

          I don't use JMS, but the behaviour seems similar: if I kill one of the servers, the cluster does not always re-form. If I shut it down gracefully, the cluster re-forms.

          Regards,
          Robert

          • 2. Re: JMS queues started on both nodes in a cluster after temporary network loss

            It sounds like the merge processing is not working correctly.

            Are you seeing messages on the console saying it is attempting to merge
            the nodes?
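
            For reference, the merge is driven by the MERGE2 protocol in the
            partition's JGroups stack, configured inside the ClusterPartition
            MBean in deploy/cluster-service.xml. A rough excerpt, with values
            that are only illustrative defaults - check your own file:

                <Config>
                  <UDP mcast_addr="228.1.2.3" mcast_port="45566" ip_ttl="8"/>
                  <PING timeout="2000" num_initial_members="3"/>
                  <MERGE2 min_interval="5000" max_interval="10000"/>
                  <FD shun="true" timeout="2500" max_tries="5"/>
                  <VERIFY_SUSPECT timeout="3000"/>
                  <!-- remaining protocols (NAKACK, UNICAST, STABLE, GMS, ...)
                       omitted here -->
                </Config>

            If MERGE2 is missing from the stack, sub-partitions are never
            detected and merged after the cable is reconnected.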

            Post the steps and the logs as a bug at www.sf.net/projects/jboss.
            Enable the example cluster TRACE logging found at the bottom of
            conf/log4j.xml to get a complete log.
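
            The clustering example at the bottom of conf/log4j.xml looks
            roughly like this once uncommented (the exact categories may
            differ slightly between versions):

                <category name="org.jgroups">
                  <priority value="TRACE" class="org.jboss.logging.XLevel"/>
                </category>
                <category name="org.jboss.ha">
                  <priority value="TRACE" class="org.jboss.logging.XLevel"/>
                </category>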

            Regards,
            Adrian

            • 3. Re: JMS queues started on both nodes in a cluster after temporary network loss
              iankenn

              I do see the following messages (sometimes), on the node which was not the singleton before the network error:
              12:00:47,920 INFO [DefaultPartition:ReplicantManager] Start merging members in DRM service...
              12:00:48,045 INFO [HAILServerILService] Notified to stop acting as singleton.
              12:00:48,061 INFO [DefaultPartition:ReplicantManager] ..Finished merging members in DRM service

              It does not always try to merge, and even when it says that it is merging, it doesn't seem to merge the cluster state correctly.

              I will repeat the test with TRACE on and submit it as a bug.

              Thanks

              Ian