4 Replies Latest reply on Jul 20, 2007 5:54 AM by timfox

    problem when testing multiple failovers in a cluster

    chatzi

      Hi

      I have set up a JBoss 4.2.0 cluster running JBoss Messaging 1.4.0CR1 on top of SQL Server 2005. I have configured a distributed topic. With the cluster running, publishing and subscribing to messages from a client works reasonably well, even if one of the (two) cluster nodes is shut down or killed. However, if I shut down one node, bring it up again and then shut down the other node, failover does not work as I would expect. It seems like the restarted node never really finds its way back into the cluster. The first exception I get is something like 'org.jboss.jms.exception.MessagingJMSException: Cannot handle invocation since messaging server is not active (it is either starting up or shutting down)'.

      Big question: Is this not yet implemented, or am I doing something wrong? Do I need to 'reinitialise' my publishers/subscribers after I 'somehow' realise that a cluster node has joined the cluster?

      Thanks in advance for all help.

      Regards
      Alex

        • 1. Re: problem when testing multiple failovers in a cluster
          timfox

          Not sure I understood what the problem is from your explanation.

          Can you explain in more detail?

          When you say "failover does not work as I would expect it" - how would you expect it to work?

          If you can give me step-by-step instructions to reproduce the problem and state what your expected behaviour is, that would be a great help.

          • 2. Re: problem when testing multiple failovers in a cluster
            chatzi

            Hi

            Thanks for the reply.

            Initially I did the following (all locally on my machine, Windows XP, SQL Server 2005).
            1) Start the Cluster.
            2) Wait for it to completely come up.
            3) Start my 'test' tool
            - The test simply sends messages as quickly as possible to a distributed topic, while additionally (in another thread) receiving them again.
            - At this point in time both JBoss instances start consuming CPU.
            4) Shut down one JBoss instance (through CTRL-C in its command window).
            - There are now some exceptions thrown in the background, but the test continues to run.
            - There is no 're-initialisation' of any JMS resources whatsoever from my side (the cluster should be transparent to me).
            5) Wait for 'my' test to finish.
            - At the end, the test reports the number of sent and received messages. Some were lost, but ok, not too many.
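
            For readers who want to picture the shape of such a test, here is a minimal sketch of a send/receive counter. It is not the original tool: the class name SendReceiveCount and the method run are hypothetical, and an in-memory BlockingQueue stands in for the distributed topic so the example runs without a JMS provider. A real test would replace the queue with a JMS publisher and subscriber.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SendReceiveCount {

    // Sends `messages` messages on the calling thread while a second thread
    // receives them. Returns {sent, received}.
    // ASSUMPTION: the BlockingQueue is only a stand-in for the distributed topic.
    static long[] run(int messages) throws InterruptedException {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>();
        AtomicLong sent = new AtomicLong();
        AtomicLong received = new AtomicLong();

        Thread receiver = new Thread(() -> {
            try {
                // Keep receiving until no message arrives for one second.
                while (topic.poll(1, TimeUnit.SECONDS) != null) {
                    received.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();

        // Send as quickly as possible, counting each successful send.
        for (int i = 0; i < messages; i++) {
            topic.put("message " + i);
            sent.incrementAndGet();
        }

        receiver.join();
        return new long[] { sent.get(), received.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] counts = run(10_000);
        // The difference between the two counters is the number of lost messages.
        System.out.println("sent=" + counts[0]
                + " received=" + counts[1]
                + " lost=" + (counts[0] - counts[1]));
    }
}
```

            Comparing the two counters at the end is what makes message loss during a node shutdown visible.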

            Then I ran the test again, with the following differences:
            1) Extend the duration of the test.
            2) Start everything again.
            3) Shut down JBoss instance 'A' (as above)
            4) Restart JBoss instance 'A'
            5) Wait for it to come up again.
            6) Shut down JBoss instance 'B'
            - THE PROBLEM: At this point exceptions start to pour in and the test fails. 'Failover', i.e. the sender connection that previously used 'B' switching to 'A', did not happen for me.

            Regards
            Alex

            • 3. Re: problem when testing multiple failovers in a cluster
              timfox

              Failover kicks in when a server *dies*.

              CTRL-C does not kill a server, it shuts it down cleanly - this won't cause failover.

              Kill it using kill, or using Task Manager in Windows.
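
              On Unix this typically means something like kill -9 <pid>, and on Windows taskkill /F /PID <pid> from a command prompt. The same forced termination can be scripted from a test harness. The following is a minimal sketch, assuming Java 9 or later for the ProcessHandle API; the class name KillServer and the method kill are hypothetical, and the PID of the JBoss instance has to be looked up separately.

```java
import java.util.Optional;

public class KillServer {

    // Requests forced termination of the process with the given PID.
    // Returns false if no such process exists or the request failed.
    // Unlike a clean shutdown via CTRL-C, a forced kill gives the target
    // no chance to run its normal shutdown sequence.
    static boolean kill(long pid) {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        return handle.map(ProcessHandle::destroyForcibly).orElse(false);
    }

    public static void main(String[] args) {
        // ASSUMPTION: the PID of the server to kill is passed as the first argument.
        long pid = Long.parseLong(args[0]);
        System.out.println(kill(pid) ? "kill requested" : "could not kill " + pid);
    }
}
```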

              There have been long discussions on this in other threads.

              • 4. Re: problem when testing multiple failovers in a cluster
                timfox