Q. Joining a cluster fails
Join can fail for several reasons, e.g. the coordinator crashed just before the new member joined.
When FD is used, it takes some time to detect the failed coordinator. The joining client will loop in JOIN until the new coordinator has taken over. To troubleshoot this, it is useful to have:
a stack trace of the joiner and the coordinator (oldest member, who handles the JOIN)
logs: org.jgroups at the TRACE level, see JGroups Logging
the output of probe (see Probe protocol)
Q. What is this shunning thing, and should I turn it on or off ?
Check the page on shunning for details.
Q. What is the version of JGroups ?
Execute from the command line:
java -cp jgroups.jar org.jgroups.Version
or
java -jar jgroups.jar
Q. Can JGroups bind to 0.0.0.0 ?
A bind address of 0.0.0.0 works for other JBoss services, but not for JGroups: the address plus port constitute the identity of a JGroups node, and 0.0.0.0 is not a valid address for that purpose.
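As an illustration of why 0.0.0.0 cannot act as an identity: Java treats it as the wildcard ("any local") address rather than the address of one specific interface. The following is a minimal sketch using only java.net.InetAddress; the class and method names are hypothetical and not part of JGroups.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of JGroups: checks whether an IP literal
// names one concrete address, as a node identity would require.
public class BindAddressCheck {

    // Returns false for the wildcard address (0.0.0.0 or ::), true otherwise.
    // Expects an IP literal; a host name would trigger a DNS lookup.
    public static boolean isUsableAsIdentity(String ipLiteral) {
        try {
            InetAddress addr = InetAddress.getByName(ipLiteral);
            return !addr.isAnyLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot parse address: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("0.0.0.0     usable: " + isUsableAsIdentity("0.0.0.0"));
        System.out.println("192.168.5.2 usable: " + isUsableAsIdentity("192.168.5.2"));
    }
}
```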
Q. How do I bind TCP sockets to all interfaces?
In order to bind to all network interfaces:
remove bind_addr
add receive_interfaces (see example below) and
set -Dignore.bind.address=true, so that the values set in the XML files are taken
Example:
<TCP receive_interfaces="192.168.5.2,192.168.0.2" start_port="7800" ...></TCP>
Q. How does a JGroups transport protocol decide which address to bind to ?
There are two ways in which the bind address can be specified:
Using the bind.address system property
Specifying the bind_addr XML attribute in any of the transport protocols
The system property always overrides the XML attribute, unless you also set the system property -Dignore.bind.address=true (added in JGroups 2.2.8). In that case the bind_addr value from the XML config file is used.
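The precedence rule above can be sketched in a few lines of Java. This is only a model of the rule as described here, not the actual JGroups implementation; the class and method names are hypothetical, and the properties are passed in explicitly so the behaviour is easy to try out.

```java
import java.util.Properties;

// Hypothetical sketch of the precedence rule; not actual JGroups code.
public class BindAddressResolver {

    // xmlBindAddr: value of the bind_addr XML attribute (may be null)
    // sysProps:    properties to consult (bind.address, ignore.bind.address)
    public static String resolve(String xmlBindAddr, Properties sysProps) {
        boolean ignore = Boolean.parseBoolean(
                sysProps.getProperty("ignore.bind.address", "false"));
        String fromSystem = sysProps.getProperty("bind.address");
        if (!ignore && fromSystem != null) {
            return fromSystem;   // system property overrides the XML attribute
        }
        return xmlBindAddr;      // XML value is used (null if none was set)
    }

    public static void main(String[] args) {
        // Resolve against the real JVM system properties, with an example XML value.
        System.out.println(resolve("10.0.0.10", System.getProperties()));
    }
}
```

For example, with bind.address=192.168.1.10 and bind_addr="10.0.0.10", the sketch returns 192.168.1.10 unless ignore.bind.address=true is also set, in which case it returns 10.0.0.10.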
Q. How do I bind all JBoss services to the same IP address but have the JGroups traffic (i.e. clustering) going over a different network?
For JGroups 2.2.8 and later:
Use -Dignore.bind.address=true and the bind_addr attribute in the protocol stack config -- JGroups will then ignore the -b switch and use bind_addr.
For JGroups 2.2.7 and earlier:
In this case, you can't use ignore.bind.address because it was added in 2.2.8. Instead, you need to start the AS with something like this:
$ run.sh -b 192.168.1.10 -Dbind.address=10.0.0.10
Note that for releases prior to 4.0.5.GA the -Dbind.address part must come after the -b part. When the AS parses the command line args, the -b switch sets two system properties -- jboss.bind.address and bind.address. JGroups uses the latter. If you specifically set bind.address after -b is parsed, that value will be preserved and JGroups will use it. For 4.0.5.GA and later it doesn't matter in what order you specify things; if you set -Dbind.address, the value you pass will be used by JGroups.
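To see why the order matters before 4.0.5.GA, here is a simplified model of in-order argument processing. It is a hypothetical sketch of the behaviour described above, not the actual AS command-line parser; the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical model of in-order argument handling for
// releases prior to 4.0.5.GA; not actual JBoss AS code.
public class BindArgOrder {

    public static Map<String, String> parse(String... args) {
        Map<String, String> props = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-b") && i + 1 < args.length) {
                String host = args[++i];
                // -b sets both properties, overwriting any earlier value
                props.put("jboss.bind.address", host);
                props.put("bind.address", host);
            } else if (arg.startsWith("-D") && arg.contains("=")) {
                int eq = arg.indexOf('=');
                props.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // -D after -b: the explicit bind.address is preserved
        System.out.println(parse("-b", "192.168.1.10", "-Dbind.address=10.0.0.10"));
        // -D before -b: in this model, -b overwrites bind.address
        System.out.println(parse("-Dbind.address=10.0.0.10", "-b", "192.168.1.10"));
    }
}
```

In this model, the first call leaves bind.address at 10.0.0.10 while jboss.bind.address stays 192.168.1.10; the second call ends with both properties set to 192.168.1.10.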
Q. Merging does not occur even though Shunning is disabled, what could be the problem ?
If you are using TCPPING, check the initial_hosts attribute as explained in MERGE2 protocol.
Q. I get a "java.net.BindException: Cannot assign requested address" exception
If you get an exception like the one below, switch to IPv4 with -Djava.net.preferIPv4Stack=true. This is most likely caused by trying to use IPv6 on Linux, where Sun's JDK has a bug that won't be fixed until Java 6. See IPv6.
Caused by: java.lang.Exception: problem creating sockets (bind_addr=/fe80:0:0:0:217:a4ff:fe10:3ee7%3, mcast_addr=null)
    at org.jgroups.protocols.UDP.start(UDP.java:372)
    at org.jgroups.stack.Protocol.handleSpecialDownEvent(Protocol.java:589)
    ... 1 more
Caused by: java.net.BindException: Cannot assign requested address
    at java.net.PlainDatagramSocketImpl.bind0(Native Method)
    at java.net.PlainDatagramSocketImpl.bind(PlainDatagramSocketImpl.java:82)
    at java.net.DatagramSocket.bind(DatagramSocket.java:368)
    at java.net.DatagramSocket.<init>(DatagramSocket.java:210)
    at java.net.DatagramSocket.<init>(DatagramSocket.java:261)
    at org.jgroups.protocols.UDP.createEphemeralDatagramSocket(UDP.java:572)
    at org.jgroups.protocols.UDP.createSockets(UDP.java:436)
    at org.jgroups.protocols.UDP.start(UDP.java:367)
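As a rough diagnostic, one could check at startup whether the IPv4 preference is active and whether a candidate bind address is an IPv6 link-local address like the one in the trace. The sketch below uses only standard java.net classes; the class and method names are hypothetical, not part of JGroups, and the scope suffix (%3) from the trace is left out of the sample address.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper, not part of JGroups.
public class Ipv4StackCheck {

    // True if the JVM was started with -Djava.net.preferIPv4Stack=true.
    public static boolean prefersIPv4Stack() {
        return Boolean.getBoolean("java.net.preferIPv4Stack");
    }

    // True if the IP literal is an IPv6 link-local address (fe80::/10).
    public static boolean isIPv6LinkLocal(String ipLiteral) {
        try {
            InetAddress addr = InetAddress.getByName(ipLiteral);
            return addr instanceof Inet6Address && addr.isLinkLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot parse address: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        String bindAddr = args.length > 0 ? args[0] : "fe80:0:0:0:217:a4ff:fe10:3ee7";
        if (isIPv6LinkLocal(bindAddr) && !prefersIPv4Stack()) {
            System.out.println("IPv6 link-local bind address without "
                    + "-Djava.net.preferIPv4Stack=true: " + bindAddr);
        } else {
            System.out.println("No IPv6 link-local issue detected for " + bindAddr);
        }
    }
}
```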
Q. During a load test, I get: "ERROR org.jgroups.blocks.GroupRequest both corr and transport are null, cannot send group request", how do I get around it?
This error message can appear under high load. Increasing the FD timeout setting should make this error disappear. In fact, it's recommended to use a combined FD and FD_SOCK failure detection mechanism, as explained in FDVersusFD_SOCK, with a high FD timeout.