Testing Cluster Formation with JGroups
When JBoss nodes cannot find each other, it is best to start at the bottom of the stack and first test cluster formation with a simple JGroups program. We will use the org.jgroups.demos.ViewDemo program with the same configuration that JBoss uses. To do this, perform the following steps:
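Find the JGroups protocol stack configuration (see "Finding the JGroups Protocol Stack Configuration" below for how to do this).
Copy the protocol stack configuration to the clipboard: everything from the opening <config> tag to the closing </config> tag, and nothing outside them. For example (edited):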
<config>
    <UDP mcast_addr="239.192.2.2" mcast_port="48866"
         mcast_send_buf_size="10000000" ucast_recv_buf_size="10000000"
         loopback="false" mcast_recv_buf_size="10000000"
         max_bundle_size="64000" max_bundle_timeout="30"
         use_incoming_packet_handler="false" use_outgoing_packet_handler="true"
         ucast_send_buf_size="10000000" ip_ttl="32" enable_bundling="true"/>
    <PING timeout="2000" down_thread="false" num_initial_members="3"/>
    <MERGE2 max_interval="10000" down_thread="false" min_interval="5000"/>
    <FD_SOCK down_thread="false"/>
    <VERIFY_SUSPECT timeout="1500" down_thread="false"/>
    <pbcast.NAKACK max_xmit_size="60000" down_thread="false" use_mcast_xmit="true"
                   gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"/>
    <UNICAST timeout="300,600,1200,2400,3600" down_thread="false"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="5000"
                   down_thread="false" max_bytes="250000"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000" down_thread="false"
                join_retry_timeout="2000" shun="true"/>
    <FRAG frag_size="60000" down_thread="false" up_thread="true"/>
    <pbcast.STATE_TRANSFER down_thread="false" up_thread="false"/>
</config>
Paste this into a file, e.g. test.xml.
We need the following JARs on the classpath: jgroups.jar, commons-logging.jar and concurrent.jar.
Start 2 or more ViewDemo instances with this configuration (shown for JBoss 4.0.3RC1):
cd c:\jboss-4.0.3\server\default\lib
C:\jboss-4.0.3RC1>java -cp lib\concurrent.jar;server\default\lib\jgroups.jar;server\default\lib\commons-logging.jar org.jgroups.demos.ViewDemo -props c:\test.xml

-------------------------------------------------------
GMS: address is laptop:1440
-------------------------------------------------------
** New view: [laptop:1440|0] [laptop:1440]
When another instance is started, the current instance should print a new view. It does the same when a member is killed:
** New view: [laptop:1440|1] [laptop:1440, laptop:1448]
Suspected(laptop:1448)
** New view: [laptop:1440|2] [laptop:1440]
In this case, our instance is laptop:1440. Then a new instance is started as laptop:1448. Finally, the laptop:1448 instance is killed.
This shows that the JGroups instances do find each other.
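To make the view notation concrete: each view is printed as [creator|id] [members], where the first bracket holds the address of the member that created the view plus a counter that grows with each membership change, and the second bracket lists the current members. The sketch below is a hypothetical helper, not part of JGroups; the class name ViewLine and its fields are made up for illustration, and it assumes only the output format shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of JGroups): splits a printed view into its parts. */
final class ViewLine {
    // Matches "[creator|id] [member, member, ...]" as printed by ViewDemo.
    private static final Pattern VIEW =
            Pattern.compile("\\[([^|\\]]+)\\|(\\d+)\\]\\s*\\[([^\\]]*)\\]");

    final String creator;        // address of the member that created the view
    final long id;               // counter that grows with each membership change
    final List<String> members;  // current members, in the order printed

    private ViewLine(String creator, long id, List<String> members) {
        this.creator = creator;
        this.id = id;
        this.members = members;
    }

    static ViewLine parse(String line) {
        Matcher m = VIEW.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a view line: " + line);
        }
        List<String> members = new ArrayList<>();
        for (String s : m.group(3).split(",")) {
            if (!s.trim().isEmpty()) {
                members.add(s.trim());
            }
        }
        return new ViewLine(m.group(1), Long.parseLong(m.group(2)), members);
    }

    public static void main(String[] args) {
        ViewLine v = parse("** New view: [laptop:1440|1] [laptop:1440, laptop:1448]");
        // prints: laptop:1440 id=1 members=[laptop:1440, laptop:1448]
        System.out.println(v.creator + " id=" + v.id + " members=" + v.members);
    }
}
```

A member count that goes from 1 to 2 and back to 1, as in the output above, is the sign that the instances discovered each other and that the failure of one was detected.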
If you work with dedicated bind addresses (run.sh -b <ip>), start the JGroups test with the additional system property -Djgroups.bind_addr=<ip>, for example -Djgroups.bind_addr=192.168.128.5. Use the same address that your final JBoss cluster will bind to, because the bind address makes a difference for the network configuration.
If this test is successful, then we can move on and see what the problem is in JBoss. Otherwise, further troubleshooting needs to be done on the JGroups level.
One common problem in JBoss is that multiple clusters run on the same network but are not separated by different multicast addresses and/or ports.
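A quick way to check for this is to compare the mcast_addr and mcast_port attributes of the UDP element in each cluster's stack configuration. The following is a minimal sketch, assuming each configuration is available as a <config> string like the one above; the class and method names are hypothetical, the second address is made up for the example, and attributes that are left out of a configuration are treated as empty strings rather than resolved to JGroups defaults.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: detects two stack configs that share a multicast address and port. */
final class McastOverlapCheck {

    /** Returns "mcast_addr:mcast_port" taken from the UDP element of a config string. */
    static String mcastEndpoint(String configXml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(configXml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot read stack config", e);
        }
        Element udp = (Element) root.getElementsByTagName("UDP").item(0);
        if (udp == null) {
            throw new IllegalArgumentException("no UDP element in config");
        }
        // Missing attributes come back as "" (JGroups defaults are not modelled here).
        return udp.getAttribute("mcast_addr") + ":" + udp.getAttribute("mcast_port");
    }

    /** True when both configs use the same multicast address AND the same port. */
    static boolean overlaps(String configA, String configB) {
        return mcastEndpoint(configA).equals(mcastEndpoint(configB));
    }

    public static void main(String[] args) {
        String clusterA = "<config><UDP mcast_addr=\"239.192.2.2\" mcast_port=\"48866\"/></config>";
        // Second cluster with a made-up address, for illustration only.
        String clusterB = "<config><UDP mcast_addr=\"239.192.2.3\" mcast_port=\"48866\"/></config>";
        System.out.println("A vs A overlaps: " + overlaps(clusterA, clusterA)); // true
        System.out.println("A vs B overlaps: " + overlaps(clusterA, clusterB)); // false
    }
}
```

If two clusters that are meant to be independent report an overlap, change the address and/or port of one of them.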
Finding the JGroups Protocol Stack Configuration
In the AS 5 series, JGroups protocol stack configurations are all managed by the ChannelFactory service. Configurations are listed in the server/all/deploy/cluster/jgroups-channelfactory.sar/META-INF/jgroups-channelfactory-stacks.xml file; which configuration is relevant depends on what AS service is using the channel; see the wiki page for a listing of the standard protocol stacks and their typical usages.
Note that the udp and jbm-control configs include the UDP protocol configuration by means of an XML entity reference, &shared-udp;. If you copy and paste one of those configs, you will have to replace the &shared-udp; reference with the UDP protocol configuration declared at the top of the file.
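Instead of substituting the entity by hand, you can let an XML parser do it, since a DOM parser expands internal entities by default. The sketch below illustrates the idea on a tiny stand-in document. The layout it assumes (a protocol_stacks root with stack elements that carry a name attribute and contain a config element) and all attribute values are assumptions for illustration, so check them against your actual file; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: prints one stack's config with entity references expanded. */
final class ExpandSharedUdp {
    // Stand-in for a stacks file; structure and values are illustrative assumptions.
    static final String SAMPLE =
          "<!DOCTYPE protocol_stacks [\n"
        + "  <!ENTITY shared-udp '<UDP mcast_addr=\"239.192.2.2\" mcast_port=\"48866\"/>'>\n"
        + "]>\n"
        + "<protocol_stacks>\n"
        + "  <stack name=\"udp\"><config>&shared-udp;<PING timeout=\"2000\"/></config></stack>\n"
        + "</protocol_stacks>";

    /** Returns the config element of the named stack as text, entities already expanded. */
    static String expandedConfig(String stacksXml, String stackName) {
        try {
            // The DOM parser replaces &shared-udp; with the entity's content while parsing.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(stacksXml)));
            NodeList stacks = doc.getElementsByTagName("stack");
            for (int i = 0; i < stacks.getLength(); i++) {
                Element stack = (Element) stacks.item(i);
                Node config = stack.getElementsByTagName("config").item(0);
                if (stackName.equals(stack.getAttribute("name")) && config != null) {
                    Transformer t = TransformerFactory.newInstance().newTransformer();
                    t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                    StringWriter out = new StringWriter();
                    t.transform(new DOMSource(config), new StreamResult(out));
                    return out.toString();
                }
            }
        } catch (ParserConfigurationException | SAXException
                | IOException | TransformerException e) {
            throw new IllegalStateException("cannot process stacks XML", e);
        }
        throw new IllegalArgumentException("no stack with a config named " + stackName);
    }

    public static void main(String[] args) {
        // The printed config contains the UDP element in place of the entity reference.
        System.out.println(expandedConfig(SAMPLE, "udp"));
    }
}
```

The printed text can then be pasted into test.xml and passed to ViewDemo with -props.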
In AS 4.x and earlier, the JGroups protocol configurations were included in the deployment descriptor of whatever service was creating the channel, e.g. cluster-service.xml (for HAPartition), jboss-web-cluster.sar/META-INF/jboss-service.xml (HttpSession replication in 4.2.x), ejb3-clustered-sfsbcache-service.xml, ejb3-entity-cache-service.xml.
Further Troubleshooting
See http://www.jgroups.org/javagroupsnew/docs/manual/html/ch02.html and in particular the sub-page http://www.jgroups.org/javagroupsnew/docs/manual/html/ch02.html#ItDoesntWork.