-
1. Re: HA-JMS
brian.stansberry Sep 12, 2007 4:51 PM (in response to rarondini)Basically, it's the JGroups coordinator for the jboss:service=DefaultPartition channel, i.e. the first member shown in the "CurrentView" attribute for that mbean [1].
99% case: JGroups coordinator is the node that's been a member of the cluster the longest. The view is ordered by when nodes joined.
1% case: there was some network disruption and the cluster fell apart into more than one subgroup. Then the disruption ended and the subgroups merged. In that case the view will be ordered based on the IP address / port of the members, so the coordinator will be the one that sorts first.
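The two ordering rules above can be sketched roughly like this. This is illustrative Python, not JGroups code; the `(ip, port)` tuples and the merge behavior are simplified assumptions for the sake of the example:

```python
# Illustrative sketch of the coordinator-selection rules described above.
# NOT real JGroups code; members are simplified to (ip, port) tuples.

def coordinator(view):
    """The coordinator is simply the first member of the current view."""
    return view[0]

def join(view, member):
    """Normal (99%) case: new members are appended, so the view stays
    ordered by join time and the longest-lived member is coordinator."""
    return view + [member]

def merge(*subgroup_views):
    """Merge (1%) case: after a partition heals, the merged view is
    ordered by sorting members on address/port, so the coordinator is
    whichever member sorts first (sorting here is simplified to plain
    tuple comparison)."""
    members = [m for v in subgroup_views for m in v]
    return sorted(members)

# Normal case: the first joiner stays coordinator as others join.
view = join(join([("192.168.1.10", 1099)], ("192.168.1.11", 1099)),
            ("192.168.1.12", 1099))
print(coordinator(view))  # ('192.168.1.10', 1099), the longest-lived member

# Merge case: the coordinator is the lowest-sorting address, regardless
# of how long each member has been in its subgroup.
merged = merge([("192.168.1.12", 1099)], [("192.168.1.10", 1099)])
print(coordinator(merged))  # ('192.168.1.10', 1099)
```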
[1] Odd case: you're not using deploy-hasingleton on all nodes in the cluster. In that case, the master will be the first node in the view where deploy-hasingleton is used. Rules for ordering the view are as described above. -
2. Re: HA-JMS
ratrask Sep 28, 2007 12:14 AM (in response to rarondini)Hi Brian,
I'm sleep deprived and having a very hard time so please excuse me if I sound like an idiot, but I really need some help.
What you describe here does not seem to happen by default, and I can't figure out how to make it so.
I have a 4.2.1 cluster using MySQL as the DefaultDS. The configuration for all nodes is identical (did one, copied the rest). Everything works fine when the first node comes up: the queues are deployed and there are no exceptions. When the second and subsequent nodes come up, the original node receives a stop from the BarrierController and undeploys the queues. The second node does not start up the queues, and neither do subsequent nodes. Whichever node is the last to be shut down will bring back the queues.
This behavior is easy to replicate: just bring up two virgin JBoss installations with configuration -c all.
Your reply indicates that the answer may lie in the JGroups documentation, but if you can point me in the right direction that would be greatly appreciated.
Ron -
3. Re: HA-JMS
ratrask Sep 28, 2007 7:25 AM (in response to rarondini)Just to clarify, here are some snippets of a log that show the problem.
Install 2 virgin copies of JBoss on separate nodes. Then bring up the first node with the all configuration (run.bat -c all). Note that it is elected the master:
2007-09-24 06:48:09,718 DEBUG [org.jboss.ha.singleton.HASingletonController] partitionTopologyChanged, isElectedNewMaster=true, isMasterNode=false, viewID=-35945124
2007-09-24 06:48:09,718 DEBUG [org.jboss.ha.singleton.HASingletonController] startNewMaster, isMasterNode=false
2007-09-24 06:48:09,718 DEBUG [org.jboss.ha.singleton.HASingletonController] startSingleton() : elected for master singleton node
2007-09-24 06:48:09,718 DEBUG [org.jboss.ha.singleton.HASingletonController] Calling operation: deploy(file:/C:/JBoss/jboss-4.2.1.GA/server/all//deploy-hasingleton), on target: 'jboss.system:service=MainDeployer'
... it then created the queues, for example:
2007-09-24 06:48:10,484 DEBUG [org.jboss.system.ServiceCreator] About to create bean: jboss.mq.destination:service=Queue,name=A with code: org.jboss.mq.server.jmx.Queue
2007-09-24 06:48:10,499 DEBUG [org.jboss.system.ServiceCreator] Created bean: jboss.mq.destination:service=Queue,name=A
2007-09-24 06:48:10,499 DEBUG [org.jboss.system.ServiceController] recording that jboss.mq.destination:service=Queue,name=A depends on jboss.mq:service=DestinationManager
2007-09-24 06:48:10,499 DEBUG [org.jboss.system.ServiceConfigurator] considering DestinationManager with object name jboss.mq:service=DestinationManager
... now bring up the second node. The log on the original node records the following:
2007-09-24 07:37:55,512 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] New cluster view for partition DefaultPartition (id: 1, delta: 1) : [127.0.0.1:1099, 127.0.0.1:1099]
2007-09-24 07:37:55,527 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] membership changed from 1 to 2
... followed by:
2007-09-24 07:37:56,840 DEBUG [org.jboss.system.BarrierController] Saw 'stop' handback, stopping barrier
2007-09-24 07:37:56,840 DEBUG [org.jboss.system.ServiceController] stopping service: jboss.ha:service=HASingletonDeployer,type=Barrier
2007-09-24 07:37:56,840 DEBUG [org.jboss.system.ServiceController] stopping dependent services for: jboss.ha:service=HASingletonDeployer,type=Barrier dependent services are: []
... and by:
2007-09-24 07:37:56,855 DEBUG [org.jboss.system.ServiceController] stopping service: jboss.mq.destination:service=Queue,name=A
2007-09-24 07:37:56,855 DEBUG [org.jboss.system.ServiceController] stopping dependent services for: jboss.mq.destination:service=Queue,name=A dependent services are: []
2007-09-24 07:37:56,855 DEBUG [org.jboss.mq.server.jmx.Queue.A] Stopping jboss.mq.destination:service=Queue,name=A
2007-09-24 07:37:56,855 INFO [org.jboss.mq.server.jmx.Queue.A] Unbinding JNDI name: queue/A
2007-09-24 07:37:56,871 DEBUG [org.jboss.mq.server.JMSDestinationManager] Closing destination QUEUE.A
2007-09-24 07:37:56,871 DEBUG [org.jboss.mq.server.jmx.Queue.A] Stopped jboss.mq.destination:service=Queue,name=A -
4. Re: HA-JMS
brian.stansberry Sep 28, 2007 11:07 AM (in response to rarondini)2007-09-24 07:37:55,512 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] New cluster view for partition DefaultPartition (id: 1, delta: 1) : [127.0.0.1:1099, 127.0.0.1:1099]
Something is strange with your environment; both members of the view are shown as having the same address/port.
I suspect these nodes are on separate machines, and you're not passing -b when you start JBoss. Is that correct?
If so, can you try a couple things:
1) Start JBoss passing a real IP address on each node, e.g.
./run.sh -c all -b 192.xxx.xxx.xxx
2) Start JBoss telling it to bind services to all addresses:
./run.sh -c all -b 0.0.0.0
Try the latter 2 or 3 times; I'm curious whether you have problems with it. -
5. Re: HA-JMS
ratrask Oct 4, 2007 11:36 PM (in response to rarondini)Thanks Brian,
I did not see your reply until today; it would have saved me some pretty frustrating intervening days.
When I try the -b option (either variant), it works with no problems; if I remove the -b option, the problem recurs.
The problem was pretty vexing, because it occurred on two different networks that were administered independently. It happened every time I tried it as long as I had a valid setup.
I am kind of a Cygwin junkie, and the only commonality between the two environments was that I started the servers from Cygwin with run.sh. Do you think this could be related to the issue?
None of the servers are dual homed, so I was dubious that your suggestion would work.
Another curiosity is that the problem happens in 4.2.0 but not in 4.0.5.
At this point I have rolled back to 4.0.5. -
6. Re: HA-JMS
brian.stansberry Oct 4, 2007 11:57 PM (in response to rarondini)Yes, this problem would only occur in 4.2 and later.
In 4.2, if you don't pass -b at startup, JBoss will bind almost all services (e.g. JNDI) to 127.0.0.1. This is a security precaution; basically, the server is not remotely accessible unless you say you want it to be.
An exception to this is JGroups channels, which will communicate using the default interface if you don't set -b. This allows servers to cluster. So, your servers clustered.
For services based on the HAPartition (e.g. HASingleton), the nodes in the cluster are identified by the IP address and port to which the JNDI service is bound. I won't go into the details as to why this is, other than to say the goal is to detect/prevent the same node joining the group twice. This breaks down if all the nodes in the group bind JNDI to localhost, since every node has the same id! You can see this in your log:
2007-09-24 07:37:55,512 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] New cluster view for partition DefaultPartition (id: 1, delta: 1) : [127.0.0.1:1099, 127.0.0.1:1099]
Having two nodes with the same id like that must be messing up the master node election process. Hence my suggestion to use -b.
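A rough sketch of why identical ids break things. This is illustrative Python under the simplifying assumption (per the explanation above) that the node id is just the JNDI bind address and port; it is not actual JBoss code:

```python
# Illustrative sketch: the HAPartition node id modeled as (jndi_ip, jndi_port).
# NOT real JBoss code; shows why binding every node's JNDI to 127.0.0.1
# defeats the duplicate-node check described above.

def node_id(jndi_ip, jndi_port=1099):
    """A node is identified by the address/port its JNDI service binds to."""
    return (jndi_ip, jndi_port)

def distinct_members(view):
    """The view only identifies nodes usefully if the ids are unique."""
    return len(set(view)) == len(view)

# With -b and real addresses, each node gets a distinct id:
good_view = [node_id("192.168.1.10"), node_id("192.168.1.11")]
print(distinct_members(good_view))   # True

# Without -b on 4.2, JNDI binds to 127.0.0.1 on every machine, so the
# view looks like [127.0.0.1:1099, 127.0.0.1:1099] -- two "identical" nodes
# that the master election cannot tell apart:
bad_view = [node_id("127.0.0.1"), node_id("127.0.0.1")]
print(distinct_members(bad_view))    # False
```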