15 Replies Latest reply on Jul 16, 2008 10:55 AM by bfach

    JBoss Cluster failover not working

    bfach

      Hi,

      I am using JBoss 4.2.2.GA with JBoss Messaging 1.4.0.SP3. I am running two MDBs on a server.

      I have two identical nodes running in a cluster, both using the same database but with different ServerPeerIDs. I configured them to run as a cluster by setting the following:

      -Djboss.partition.name=
      -Djboss.messaging.datachanneludpaddress=
      -Djboss.messaging.controlchanneludpaddress=

      Both servers start and join the cluster together. Failover has been an issue. A standalone Java app sends messages to a queue on ServerA, and ServerA processes the messages.

      1. We kill A; B should take over.
      2. B reports that failover is complete, followed by this message:

      Timed out getting a connection from the pool. Try increasing clientMaxPoolSize and/or numberOfRetries attributes in remoting-xxx-service.xml

      org.jboss.jms.exception.MessagingJMSException: Failed to invoke


      3. No messages are processed on B

      There are cases where failover works and you see the usual error messages such as:

      peer closed connection, trying to re-send msg

      2nd attempt to send data failed too

      Got marshalling exception, exiting


      These messages appear during failover to B, but the messages from step 2 also appear when failover does not succeed.

      MORE INFO:
      1. Binding to the machine name (machine-xp.net)
      2. Using the same machine name in the jndi.properties file for the standalone Java app that sends the messages
      3. Using ClusteredConnectionFactory for the JMS producer in the Java app

      Any help would be greatly appreciated.

      Thank You,

        • 1. Re: JBoss Cluster failover not working
          timfox

          Do the clustering and failover examples in the distro work for you?

          Have you tried 1.4.0.SP3_CP02?

          • 2. Re: JBoss Cluster failover not working
            bfach

            Tim,

            Thanks for the fast reply. I have tried them before but just to make sure, could you point me to them so I can verify that I am following the examples correctly.

            CP2? We upgraded last night and still have the same issues.

            Thanks for the help,

            • 3. Re: JBoss Cluster failover not working
              timfox

               

              "bfach" wrote:
              Tim,

              Thanks for the fast reply. I have tried them before but just to make sure, could you point me to them so I can verify that I am following the examples correctly.

              Thanks for the help,


              Take a look at the installation section in the userguide. It goes on to explain how to run the examples.

              • 4. Re: JBoss Cluster failover not working
                bfach

                Thanks Tim. I am going to look at this first and I will get back to you.

                Just one question about FailoverOnNodeLeave in CP2: should this property be set to true if you want another node to take over when a node is shut down cleanly, or when it is killed?

                Thanks in advance,

                • 5. Re: JBoss Cluster failover not working
                  timfox

                  By default, failover will *not* occur if you shut down a node cleanly (see FAQ on wiki).

                  If you want it to occur in this situation set this property to true.

                  There's a JIRA about this against the 1.4.0.SP3_CP02 release with info and links to discussions on the subject.

                  • 6. Re: JBoss Cluster failover not working
                    bfach

                    Thanks. I thought so. Ok thanks for the help.

                    • 7. Re: JBoss Cluster failover not working
                      bfach

                      OK, I have gone through the example in the user guide for 1.4.0.SP3 and found nothing out of the ordinary.

                      I did find that failover on my ClusterPullConnectionFactory was set to false, as was load balancing. Failover has worked before with these properties set to false. I switched them to true to see if it made a difference.

                      I finally turned my attention to the standalone java app that sends the messages. I realized that someone had changed the order of the classpath.

                      I have switched it back to the following:

                      export SIM_CLASSPATH=$SIM_HOME/bin:\
                      $SIM_HOME/vendor/jaxb/1.2/jaxb-impl.jar:\
                      $SIM_HOME/vendor/jaxb/1.2/jaxb-jsr173_1.0.jar:\
                      $SIM_HOME/vendor/commons/commons-lang-2.3.jar:\
                      $SIM_HOME/vendor/jboss/4.2.2.GA/log4j.jar:\
                      $SIM_HOME/vendor/jboss/4.2.2.GA/trove.jar:\
                      $SIM_HOME/vendor/jboss/4.2.2.GA/javassist.jar:\
                      $SIM_HOME/vendor/jboss/4.2.2.GA/jbossall-client.jar:\
                      $SIM_HOME/vendor/jboss/4.2.2.GA/jboss-aop-jdk50.jar:\
                      $SIM_HOME/vendor/jboss/4.2.2.GA/jboss-ejb3x.jar:\
                      $SIM_HOME/vendor/jboss/4.2.2.GA/jboss-remoting.jar:\
                      $SIM_HOME/vendor/jbm/1.4.0.SP3/jboss-messaging-client.jar:\
                      $SIM_HOME/conf:

                      After changing it back to this order, failover works again with no issues. Load balancing works between the two nodes running in the cluster.

                      The one small issue is that if you kill a node and bring it back up, it joins the cluster but doesn't take over any of the load after it is running again. If you kill the node doing all the work, the newly started node takes over. The desired behaviour after successful restart of a node is for it to share some of the load again.

                      Is there anything that doesn't seem right in the classpath?

                      Thanks for the help!

                      • 8. Re: JBoss Cluster failover not working
                        timfox

                        The JBoss Remoting jar and the Messaging client jar should really be the first items on the classpath, as explained in the user guide.
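
For illustration, here is a minimal sketch of the reordered classpath. It assumes the same `SIM_HOME` layout and jar list bfach posted in reply 7, with the two jars moved to the front. The relative order of those two jars and the `/opt/sim` fallback path are assumptions, not something stated in this thread.

```shell
# Sketch only: the jar list from reply 7, with the remoting and messaging
# client jars moved to the front of the classpath.
# /opt/sim is a placeholder default used purely for illustration.
SIM_HOME="${SIM_HOME:-/opt/sim}"

# Inside double quotes a backslash-newline is a line continuation, so the
# value below is a single colon-separated string (trailing colon omitted).
export SIM_CLASSPATH="$SIM_HOME/vendor/jboss/4.2.2.GA/jboss-remoting.jar:\
$SIM_HOME/vendor/jbm/1.4.0.SP3/jboss-messaging-client.jar:\
$SIM_HOME/bin:\
$SIM_HOME/vendor/jaxb/1.2/jaxb-impl.jar:\
$SIM_HOME/vendor/jaxb/1.2/jaxb-jsr173_1.0.jar:\
$SIM_HOME/vendor/commons/commons-lang-2.3.jar:\
$SIM_HOME/vendor/jboss/4.2.2.GA/log4j.jar:\
$SIM_HOME/vendor/jboss/4.2.2.GA/trove.jar:\
$SIM_HOME/vendor/jboss/4.2.2.GA/javassist.jar:\
$SIM_HOME/vendor/jboss/4.2.2.GA/jbossall-client.jar:\
$SIM_HOME/vendor/jboss/4.2.2.GA/jboss-aop-jdk50.jar:\
$SIM_HOME/vendor/jboss/4.2.2.GA/jboss-ejb3x.jar:\
$SIM_HOME/conf"
```

Java searches classpath entries from left to right, so classes in the first two jars take precedence over same-named classes in any later entry.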

                        • 9. Re: JBoss Cluster failover not working
                          bfach

                          Jeez... I just saw that after the post. I have seen this before and had forgotten about it.

                          I will make the change to the classpath.

                          Do you think this could be causing load balancing not to happen after a node restart?

                          Thanks,

                          • 10. Re: JBoss Cluster failover not working
                            bfach

                            Tim,

                            I am still having one small issue with failover. With 2 nodes, I find that if I do the following:

                            Start Node A and B
                            Start sending messages to A on HA-JNDI port 1100
                            Kill A -> B takes over
                            Start A
                            Kill B -> A takes over
                            Start B
                            Kill A -> B takes over, but messages do not make it to the queue this time.

                            Any thoughts? Sounds like it is linked to my question in the previous post.

                            Thanks,

                            • 11. Re: JBoss Cluster failover not working
                              bfach

                              It turns out that this was a classpath issue. The remoting jar for SP3 was not included; an older version was on the classpath instead. You need to have the 2.2.2.SP4 remoting jar included or failover will not work for the JMS client.
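
One way to spot a stale or duplicate jar like this is to list every classpath entry whose file name matches, in search order. The sketch below is a bash helper for that; the name `cp_find` and the sample paths in the usage note are hypothetical and do not come from JBoss or this thread.

```shell
# Hypothetical helper: print each classpath entry whose file name contains
# the given text, prefixed with its 1-based position in the search order.
cp_find() {
    local classpath="$1" needle="$2" entry index=0
    local IFS=':'                      # split the classpath on colons
    for entry in $classpath; do
        index=$((index + 1))
        case "${entry##*/}" in         # compare against the file name only
            *"$needle"*) echo "$index $entry" ;;
        esac
    done
}
```

For example, `cp_find "$SIM_CLASSPATH" remoting` prints one line per matching entry, and more than one line means an extra copy is on the classpath. To check which version a given jar is, reading its manifest with `unzip -p /path/to/jboss-remoting.jar META-INF/MANIFEST.MF` is one option, assuming the jar records version information there.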

                              • 12. Re: JBoss Cluster failover not working
                                timfox

                                 

                                "bfach" wrote:
                                It turns out that this was a classpath issue. The remoting jar for SP3 was not included; an older version was on the classpath instead. You need to have the 2.2.2.SP4 remoting jar included or failover will not work for the JMS client.


                                Glad to hear it is working :)

                                Just to clarify - are you saying that the version of JBoss Remoting specified to use in the installation guide was wrong?

                                • 13. Re: JBoss Cluster failover not working
                                  bfach

                                  I did not see it in the user guide. I just remembered a mention of updating the remoting jar to match the version of Messaging you are running. 1.4.0.SP3 provides the correct messaging. Once I replaced the remoting jar in JBoss and in the remote client, failover worked correctly.

                                  The only issue I still have is round-robin load balancing after failover. This has to do with this thread. If I have JBossA and JBossB, I connect via HA-JNDI to one of the two on port 1100. From what I read, round robin is achieved via the client-side connections to JBoss.

                                  If the above is true, how do you get load balancing back if you kill JBossA and it fails over to JBossB? When you bring JBossA back, all the traffic continues to go to B and no load is directed at A. You can still fail over to A if you kill B, but load balancing is gone.

                                  Is there anything that can be done? I wish load balancing were done by an intermediate layer between the client and the server(s). That way round robin could be achieved both before and after failover.

                                  Thanks in advance!

                                  • 14. Re: JBoss Cluster failover not working
                                    timfox

                                     

                                    "bfach" wrote:
                                    I did not see it in the users guide


                                    http://www.jboss.org/file-access/default/members/jbossmessaging/freezone/docs/userguide-1.4.0.SP3/html/installation.html#install.extra-steps
