JGroups Multiple Registrations
kvbisme Mar 18, 2008 10:29 AM
We have two JBoss servers (4.0.5.GA) clustered together. Each machine has two network cards: one connects to the outside world, and the other connects to a small internal subnet shared by the JBoss servers and the other support servers used by the enterprise (databases and the like).
In run.conf we added a -Djboss.bind.address= setting at startup so that each server binds to the network card connected to that internal subnet.
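A minimal sketch of that run.conf setting, assuming the option is appended to JAVA_OPTS (the usual place for JVM options in run.conf); the address shown is the same masked placeholder used in the logs below, not a real one:

```shell
# run.conf fragment (sketch; the exact line on the servers may differ)
# 111.222.333.001 is a masked placeholder for the internal-subnet address
JAVA_OPTS="$JAVA_OPTS -Djboss.bind.address=111.222.333.001"
```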
Every so often we start getting the following messages (the machine names and IP addresses have been changed to accommodate my overly concerned boss):
[org.jgroups.protocols.FD] I was suspected, but will not remove myself from membership (waiting for EXIT message)
[org.jgroups.protocols.pbcast.GMS] checkSelfInclusion() failed, Machine1-int:33252 (additional data: 18 bytes) is not a member of view [Machine2-int:33200 (additional data: 18 bytes) |2] [Machine2-int:33200 (additional data: 18 bytes)]; discarding view
[org.jgroups.protocols.pbcast.GMS] I (Machine1-int:33252 (additional data: 18 bytes)) am being shunned, will leave and rejoin group (prev_members are [Machine1-int:33252 (additional data: 18 bytes) Machine2-int:33200 (additional data: 18 bytes) ])
[org.jgroups.protocols.pbcast.NAKACK] [Machine1-int:33252 (additional data: 18 bytes)] discarded message from non-member Machine2-int:33200 (additional data: 18 bytes)
[org.jgroups.protocols.pbcast.NAKACK] [Machine1-int:33252 (additional data: 18 bytes)] discarded message from non-member Machine2-int:33200 (additional data: 18 bytes)
[org.jgroups.protocol.PING] down_handler thread for PING was interrupted (in order to be terminated), but is is still alive
----------------------------------------------------------------------------
GMS: address is Machine1-int:33265 (additional data: 18 bytes)
----------------------------------------------------------------------------
[org.jgroups.protocol.pbcast.NAKACK] sender Machine1-int:33252 (additional data: 18 bytes) not found in received_msgs
[org.jgroups.protocol.pbcast.NAKACK] range is null
[org.jgroups.protocol.pbcast.NAKACK] sender Machine2-int:33200 (additional data: 18 bytes) not found in received_msgs
[org.jgroups.protocol.pbcast.NAKACK] range is null
[org.jgroups.protocol.pbcast.Digest] sender is null, will not add it !
[org.jgroups.protocol.pbcast.Digest] sender is null, will not add it !
[org.jgroups.protocols.pbcast.NAKACK] sender at index 1 in digest is null
[org.jgroups.protocols.pbcast.NAKACK] sender at index 2 in digest is null
[org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] New cluster view for partition DefaultPartition ( id: 3, delta: 1) : [111.222.333.001:1099, 111.222.333.002:1099, 111.222.333.001:1099]
[org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] I am (111.222.333.001:1099) received membershipChanged event:
[org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] Dead Members: 0 ([])
[org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] New Members: 0 ([])
[org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] All Members : 3 ([ 111.222.333.001:1099, 111.222.333.002:1099, 111.222.333.001:1099])
[org.jgroups.protocols.pbcast.STATE_TRANSFER] GET_APPLSTATE_OK: received application state, but there are no requestors !
Then there are four sets of the following messages with sequence numbers starting at zero and ending at 1273:
[org.jgroups.protocols.pbcast.NAKACK] (requestor=Machine1-int:33265 (additional data: 18 bytes), local_addr=Machine1-int:33252 (additional data: 18 bytes)) message with seqno=0 not found in sent_msgs ! sent_msgs=[1274 - 1274]
. . .
[org.jgroups.protocols.pbcast.NAKACK] (requestor=Machine1-int:33265 (additional data: 18 bytes), local_addr=Machine1-int:33252 (additional data: 18 bytes)) message with seqno=1273 not found in sent_msgs ! sent_msgs=[1274 - 1274]
At this point Machine1 starts adding itself to the cluster over and over again until we have to stop and restart the machine.
What could possibly be going on here?