6 Replies Latest reply on Oct 4, 2007 11:57 PM by brian.stansberry

    HA-JMS

    rarondini

      Hi all,

      When JBoss is deployed in a clustered environment with HA-JMS, what is the policy for choosing the master node (JBoss 4.0.5)?

        • 1. Re: HA-JMS
          brian.stansberry

          Basically, it's the JGroups coordinator for the jboss:service=DefaultPartition channel, i.e. the first member shown in the "CurrentView" attribute for that mbean [1].

          99% case: JGroups coordinator is the node that's been a member of the cluster the longest. The view is ordered by when nodes joined.

          1% case: there was some network disruption and the cluster fell apart into more than one subgroup. Then the disruption ended and the subgroups merged. In that case the view will be ordered based on the IP address / port of the members, so the coordinator will be the one that sorts first.

          [1] Odd case: you're not using deploy-hasingleton on all nodes in the cluster. In that case, the master will be the first node in the view where deploy-hasingleton is used. Rules for ordering the view are as described above.
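
As a rough illustration of the rule above (not the actual JBoss or JGroups code), the election can be sketched in Java. The class name, method name, and sample addresses are made up for the example, and the view is assumed to be already ordered as described: by join time in the normal case, by address/port after a merge.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: names and addresses are illustrative, not JBoss APIs.
public class MasterElectionSketch {

    /**
     * Returns the first member of the (already ordered) view that has
     * deploy-hasingleton in use, or empty if no member does.
     */
    public static Optional<String> electMaster(List<String> orderedView,
                                               Set<String> nodesWithSingleton) {
        return orderedView.stream()
                .filter(nodesWithSingleton::contains)
                .findFirst();
    }

    public static void main(String[] args) {
        // View ordered by join time: the first entry is the JGroups coordinator.
        List<String> view = Arrays.asList(
                "192.168.0.10:1099", "192.168.0.11:1099", "192.168.0.12:1099");

        // Usual case: every node uses deploy-hasingleton, so the coordinator wins.
        Set<String> all = new HashSet<>(view);
        System.out.println(electMaster(view, all));

        // Odd case: the coordinator does not use deploy-hasingleton,
        // so the next node in the view that does becomes the master.
        Set<String> some = new HashSet<>(
                Arrays.asList("192.168.0.11:1099", "192.168.0.12:1099"));
        System.out.println(electMaster(view, some));
    }
}
```

With the sample data, the first call picks the coordinator and the second picks the next node in the view that uses deploy-hasingleton.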

          • 2. Re: HA-JMS
            ratrask

            Hi Brian,

            I'm sleep deprived and having a very hard time so please excuse me if I sound like an idiot, but I really need some help.

            What you describe here does not seem to happen by default, and I can't figure out how to make it so.

            I have a 4.2.1 cluster, using MySQL as the DefaultDS. The configuration for all nodes is identical (I set up one and copied the rest). Everything works fine when the first node comes up: the queues are deployed and there are no exceptions. When the second and subsequent nodes come up, the original node receives a stop from the BarrierController and undeploys the queues. The second node does not start up the queues, and neither do subsequent nodes. Whichever node is the last to be shut down will bring back the queues.

            This behavior is easy to replicate: just bring up 2 virgin JBoss installations with the configuration -c all.

            Your email indicates that the answer may lie in the JGroups documentation, but if you can point me in the right direction that would be greatly appreciated.

            Ron

            • 3. Re: HA-JMS
              ratrask

              Just to clarify, here are some snippets of a log that show the problem.

              Install 2 virgin copies of JBoss on separate nodes. Then bring up the first node with the all configuration (run.bat -c all). Note that it is elected the master:

              2007-09-24 06:48:09,718 DEBUG [org.jboss.ha.singleton.HASingletonController] partitionTopologyChanged, isElectedNewMaster=true, isMasterNode=false, viewID=-35945124
              2007-09-24 06:48:09,718 DEBUG [org.jboss.ha.singleton.HASingletonController] startNewMaster, isMasterNode=false
              2007-09-24 06:48:09,718 DEBUG [org.jboss.ha.singleton.HASingletonController] startSingleton() : elected for master singleton node
              2007-09-24 06:48:09,718 DEBUG [org.jboss.ha.singleton.HASingletonController] Calling operation: deploy(file:/C:/JBoss/jboss-4.2.1.GA/server/all//deploy-hasingleton), on target: 'jboss.system:service=MainDeployer'

              ... it then created the queues, for example:

              2007-09-24 06:48:10,484 DEBUG [org.jboss.system.ServiceCreator] About to create bean: jboss.mq.destination:service=Queue,name=A with code: org.jboss.mq.server.jmx.Queue
              2007-09-24 06:48:10,499 DEBUG [org.jboss.system.ServiceCreator] Created bean: jboss.mq.destination:service=Queue,name=A
              2007-09-24 06:48:10,499 DEBUG [org.jboss.system.ServiceController] recording that jboss.mq.destination:service=Queue,name=A depends on jboss.mq:service=DestinationManager
              2007-09-24 06:48:10,499 DEBUG [org.jboss.system.ServiceConfigurator] considering DestinationManager with object name jboss.mq:service=DestinationManager

              ... now bring up the second node. The log on the original node records the following:

              2007-09-24 07:37:55,512 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] New cluster view for partition DefaultPartition (id: 1, delta: 1) : [127.0.0.1:1099, 127.0.0.1:1099]
              2007-09-24 07:37:55,527 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] membership changed from 1 to 2

              ... followed by

              2007-09-24 07:37:56,840 DEBUG [org.jboss.system.BarrierController] Saw 'stop' handback, stopping barrier
              2007-09-24 07:37:56,840 DEBUG [org.jboss.system.ServiceController] stopping service: jboss.ha:service=HASingletonDeployer,type=Barrier
              2007-09-24 07:37:56,840 DEBUG [org.jboss.system.ServiceController] stopping dependent services for: jboss.ha:service=HASingletonDeployer,type=Barrier dependent services are: []

              ... and by

              2007-09-24 07:37:56,855 DEBUG [org.jboss.system.ServiceController] stopping service: jboss.mq.destination:service=Queue,name=A
              2007-09-24 07:37:56,855 DEBUG [org.jboss.system.ServiceController] stopping dependent services for: jboss.mq.destination:service=Queue,name=A dependent services are: []
              2007-09-24 07:37:56,855 DEBUG [org.jboss.mq.server.jmx.Queue.A] Stopping jboss.mq.destination:service=Queue,name=A
              2007-09-24 07:37:56,855 INFO [org.jboss.mq.server.jmx.Queue.A] Unbinding JNDI name: queue/A
              2007-09-24 07:37:56,871 DEBUG [org.jboss.mq.server.JMSDestinationManager] Closing destination QUEUE.A
              2007-09-24 07:37:56,871 DEBUG [org.jboss.mq.server.jmx.Queue.A] Stopped jboss.mq.destination:service=Queue,name=A

              • 4. Re: HA-JMS
                brian.stansberry

                 

                2007-09-24 07:37:55,512 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] New cluster view for partition DefaultPartition (id: 1, delta: 1) : [127.0.0.1:1099, 127.0.0.1:1099]


                Something is strange with your environment; both members of the view are shown as having the same address/port.

                I suspect these nodes are on separate machines, and you're not passing -b when you start JBoss. Is that correct?

                If so, can you try a couple things:

                1) Start JBoss passing a real IP address on each node, e.g.

                ./run.sh -c all -b 192.xxx.xxx.xxx

                2) Start JBoss telling it to bind services to all addresses:

                ./run.sh -c all -b 0.0.0.0

                Try the latter 2 or 3 times. I'm curious whether you have problems with it.

                • 5. Re: HA-JMS
                  ratrask

                  Thanks Brian,

                  I did not see your reply until today. It would have saved me some pretty frustrating intervening days.

                  When I try either of the -b options, it works with no problems. If I remove the -b option, the problem recurs.

                  The problem was pretty vexing, because it occurred on two different networks that were administered independently. It happened every time I tried it as long as I had a valid setup.

                  I am kind of a Cygwin junkie, and the only commonality between the two environments was that I started the servers from Cygwin with run.sh. Do you think this could be related to the issue?

                  None of the servers are dual homed, so I was dubious that your suggestion would work.

                  Another curiosity is that the problem happens in 4.2.0, but not in 4.0.5.

                  At this point I have rolled back to 4.0.5.

                  • 6. Re: HA-JMS
                    brian.stansberry

                    Yes, this problem would only occur in 4.2 and later.

                    In 4.2, if you don't pass -b at startup, JBoss will bind almost all services (e.g. JNDI) to 127.0.0.1. This is a security precaution; basically the server is not remotely available unless you say you want it to be.

                    An exception to this is JGroups channels, which will communicate using the default interface if you don't set -b. This allows servers to cluster. So, your servers clustered.

                    For services based on the HAPartition (e.g. HASingleton) the nodes in the cluster are identified by taking the IP address and port that the JNDI service is bound to. I won't go into the details as to why this is, other than to say the goal is to detect/prevent the same node joining the group twice. This breaks down if the nodes in the group are all binding JNDI to localhost, since every node has the same id!! You can see this in your log:

                    2007-09-24 07:37:55,512 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] New cluster view for partition DefaultPartition (id: 1, delta: 1) : [127.0.0.1:1099, 127.0.0.1:1099]

                    Having two nodes with the same id like that must be messing up the master node election process. Hence my suggestion to use -b.
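
To make the collision concrete, here is a minimal Java sketch, assuming (as described above) that a node's id is simply the JNDI bind address plus port. The class and method names are hypothetical and this is not the HAPartition implementation; the 192.168.x.x addresses are sample values.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: illustrates the id collision, not actual JBoss code.
public class NodeIdCollisionSketch {

    /** Builds a node id from the address and port the JNDI service is bound to. */
    public static String nodeId(String jndiBindAddress, int jndiPort) {
        return jndiBindAddress + ":" + jndiPort;
    }

    /** Counts how many distinct ids a set of nodes would present to the cluster. */
    public static int distinctIds(String[] jndiBindAddresses, int jndiPort) {
        Set<String> ids = new LinkedHashSet<>();
        for (String address : jndiBindAddresses) {
            ids.add(nodeId(address, jndiPort));
        }
        return ids.size();
    }

    public static void main(String[] args) {
        // Without -b on 4.2: every node binds JNDI to 127.0.0.1, so the ids collide.
        String[] withoutBind = {"127.0.0.1", "127.0.0.1"};
        System.out.println(distinctIds(withoutBind, 1099)); // 1 distinct id for 2 nodes

        // With -b <real address>: each node gets its own id.
        String[] withBind = {"192.168.0.10", "192.168.0.11"};
        System.out.println(distinctIds(withBind, 1099)); // 2 distinct ids
    }
}
```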