Server fails to join cluster after being restarted
erikpearson Nov 21, 2013 12:44 AMDoes anyone have suggestions about why a server can't drop and rejoin a cluster? I'm running Infinispan 6.0.0.Final in a two node cluster. With servers A and B running and clustered, if B is brought down, then restarted, it fails to rejoin the cluster. We see the following errors in the logs:
Server A:
[2013-11-20 23:26:53,087][ViewHandler,mn4s34196-5743][DEBUG][org.jgroups.protocols.pbcast.STABLE] suspending message garbage collection
[2013-11-20 23:26:53,087][ViewHandler,mn4s34196-5743][DEBUG][org.jgroups.protocols.pbcast.STABLE] mn4s34196-5743: resume task started, max_suspend_time=33000
[2013-11-20 23:26:53,087][Incoming-1,mn4s34196-5743][DEBUG][org.jgroups.protocols.pbcast.GMS] mn4s34196-5743: installing view [mn4s34196-5743|9] (2) [mn4s34196-5743, mn4s34198-20624]
[2013-11-20 23:26:53,088][Incoming-1,mn4s34196-5743][DEBUG][org.jgroups.protocols.FD_SOCK] VIEW_CHANGE received: [mn4s34196-5743, mn4s34198-20624]
[2013-11-20 23:26:53,088][FD_SOCK pinger,mn4s34196-5743][DEBUG][org.jgroups.protocols.FD_SOCK] ping_dest is mn4s34198-20624, pingable_mbrs=[mn4s34196-5743, mn4s34198-20624]
[2013-11-20 23:26:53,088][Incoming-1,mn4s34196-5743][DEBUG][org.infinispan.remoting.transport.jgroups.JGroupsTransport] New view accepted: [mn4s34196-5743|9] (2) [mn4s34196-5743, mn4s34198-20624]
[2013-11-20 23:26:53,088][Incoming-1,mn4s34196-5743][INFO][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000094: Received new cluster view: [mn4s34196-5743|9] (2) [mn4s34196-5743,,mn4s34198-20624]
[2013-11-20 23:26:53,135][ViewHandler,mn4s34196-5743][DEBUG][org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
[2013-11-20 23:27:52,497][Timer-4,mn4s34196-5743][DEBUG][org.jgroups.protocols.FD_ALL] haven't received a heartbeat from mn4s34198-20624 for 50351 ms, adding it to suspect list
[2013-11-20 23:27:52,497][Timer-4,mn4s34196-5743][DEBUG][org.jgroups.protocols.FD_ALL] suspecting [mn4s34198-20624]
[2013-11-20 23:27:52,498][Incoming-2,mn4s34196-5743][WARN][org.jgroups.protocols.pbcast.GMS] mn4s34196-5743: not member of view [mn4s34198-20624|10]; discarding it
[2013-11-20 23:27:55,499][Timer-2,mn4s34196-5743][DEBUG][org.jgroups.protocols.MERGE2] mn4s34196-5743 found different views : [mn4s34198-20624|10], [mn4s34196-5743|9]; sending up MERGE event with merge participants [mn4s34198-20624, mn4s34196-5743].
Discovery results:
[mn4s34198-20624]: coord=mn4s34198-20624
[mn4s34196-5743]: coord=mn4s34196-5743
[2013-11-20 23:27:56,748][INT-1,mn4s34196-5743][DEBUG][org.jgroups.protocols.pbcast.STABLE] suspending message garbage collection
[2013-11-20 23:27:56,748][INT-1,mn4s34196-5743][DEBUG][org.jgroups.protocols.pbcast.STABLE] mn4s34196-5743: resume task started, max_suspend_time=220000
[2013-11-20 23:27:56,803][Incoming-1,mn4s34196-5743][DEBUG][org.jgroups.protocols.pbcast.NAKACK2]
[mn4s34196-5743 mergeDigest()]
existing digest: mn4s34198-20624: [3 (3)], mn4s34196-5743: [14 (14)]
new digest: mn4s34198-20624: [1 (1)], mn4s34196-5743: [14 (14)]
resulting digest: mn4s34198-20624: [3 (3)], mn4s34196-5743: [14 (14)]
[2013-11-20 23:27:56,803][Incoming-1,mn4s34196-5743][DEBUG][org.jgroups.protocols.pbcast.GMS] mn4s34196-5743: installing view MergeView::[mn4s34198-20624|11] (2) [mn4s34198-20624, mn4s34196-5743], 2 subgroups: [mn4s34198-20624|10] (1) [mn4s34198-20624], [mn4s34196-5743|9] (1) [mn4s34196-5743]
[2013-11-20 23:27:56,804][Incoming-1,mn4s34196-5743][DEBUG][org.jgroups.protocols.FD_SOCK] VIEW_CHANGE received: [mn4s34198-20624, mn4s34196-5743]
[2013-11-20 23:27:56,804][Incoming-1,mn4s34196-5743][DEBUG][org.infinispan.remoting.transport.jgroups.JGroupsTransport] New view accepted: MergeView::[mn4s34198-20624|11] (2) [mn4s34198-20624, mn4s34196-5743], 2 subgroups: [mn4s34198-20624|10] (1) [mn4s34198-20624], [mn4s34196-5743|9] (1) [mn4s34196-5743]
[2013-11-20 23:27:56,804][Incoming-1,mn4s34196-5743][INFO][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000093: Received new, MERGED cluster view: MergeView::[mn4s34198-20624|11] (2) [mn4s34198-20624, mn4s34196-5743], 2 subgroups: [mn4s34198-20624|10] (1) [mn4s34198-20624], [mn4s34196-5743|9] (1) [mn4s34196-5743]
[2013-11-20 23:27:56,804][Incoming-1,mn4s34196-5743][DEBUG][org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
…
[2013-11-20 23:27:58,806][Incoming-2,mn4s34196-5743][WARN][org.jgroups.protocols.pbcast.GMS] mn4s34196-5743: failed to collect all ACKs (expected=2) for view [mn4s34198-20624|11] after 2000ms, missing 2 ACKs from mn4s34198-20624, mn4s34196-5743
Server B:
[2013-11-20 23:26:52,976][main][DEBUG][org.jgroups.protocols.UDP] socket information:
mcast_addr=230.33.1.1:46655, bind_addr=/10.151.115.22, ttl=2
sock: bound to 10.151.115.22:39198, receive buffer size=20000000, send buffer size=640000
mcast_sock: bound to 10.151.115.22:46655, send buffer size=640000, receive buffer size=25000000
[2013-11-20 23:26:52,978][main][DEBUG][org.jgroups.protocols.pbcast.GMS] address=mn4s34198-20624, cluster=ISPN, physical address=10.151.115.22:39198
[2013-11-20 23:26:53,007][main][DEBUG][org.jgroups.protocols.pbcast.GMS] mn4s34198-20624: sending JOIN(mn4s34198-20624) to mn4s34196-5743
[2013-11-20 23:26:53,116][main][DEBUG][org.jgroups.protocols.pbcast.NAKACK2]
[mn4s34198-20624 setDigest()]
existing digest: []
new digest: mn4s34196-5743: [13 (13)], mn4s34198-20624: [0 (0)]
resulting digest: mn4s34196-5743: [13 (13)], mn4s34198-20624: [0 (0)]
[2013-11-20 23:26:53,117][main][DEBUG][org.jgroups.protocols.pbcast.GMS] mn4s34198-20624: installing view [mn4s34196-5743|9] (2) [mn4s34196-5743, mn4s34198-20624]
[2013-11-20 23:26:53,117][main][DEBUG][org.jgroups.protocols.FD_SOCK] VIEW_CHANGE received: [mn4s34196-5743, mn4s34198-20624]
[2013-11-20 23:26:53,120][main][DEBUG][org.infinispan.remoting.transport.jgroups.JGroupsTransport] New view accepted: [mn4s34196-5743|9] (2) [mn4s34196-5743, mn4s34198-20624]
[2013-11-20 23:26:53,120][main][INFO][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000094: Received new cluster view: [mn4s34196-5743|9] (2) [mn4s34196-5743, mn4s34198-20624]
[2013-11-20 23:26:53,121][main][INFO][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000079: Cache local address is mn4s34198-20624, physical addresses are [10.151.115.22:39198]
[2013-11-20 23:26:53,122][main][DEBUG][org.infinispan.remoting.transport.jgroups.JGroupsTransport] Waiting on view being accepted
[2013-11-20 23:26:53,125][main][INFO][org.infinispan.factories.GlobalComponentRegistry] ISPN000128: Infinispan version: Infinispan 'Infinium' 6.0.0.Final
[2013-11-20 23:26:53,156][FD_SOCK pinger,mn4s34198-20624][DEBUG][org.jgroups.protocols.FD_SOCK] ping_dest is mn4s34196-5743, pingable_mbrs=[mn4s34196-5743, mn4s34198-20624]
...
[2013-11-20 23:27:17,138][Timer-3,mn4s34198-20624][DEBUG][org.jgroups.protocols.FD_ALL] haven't received a heartbeat from mn4s34196-5743 for 15026 ms, adding it to suspect list
[2013-11-20 23:27:17,138][Timer-3,mn4s34198-20624][DEBUG][org.jgroups.protocols.FD_ALL] suspecting [mn4s34196-5743]
[2013-11-20 23:27:18,641][VERIFY_SUSPECT.TimerThread,mn4s34198-20624][DEBUG][org.jgroups.blocks.RequestCorrelator] suspect=mn4s34196-5743
[2013-11-20 23:27:18,695][ViewHandler,mn4s34198-20624][DEBUG][org.jgroups.protocols.pbcast.GMS] mn4s34198-20624: members are [mn4s34196-5743, mn4s34198-20624], coord=mn4s34198-20624: I'm the new coord !
[2013-11-20 23:27:18,696][ViewHandler,mn4s34198-20624][DEBUG][org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
[2013-11-20 23:27:18,749][Incoming-1,mn4s34198-20624][DEBUG][org.jgroups.protocols.pbcast.GMS] mn4s34198-20624: installing view [mn4s34198-20624|10] (1) [mn4s34198-20624]
[2013-11-20 23:27:18,749][Incoming-1,mn4s34198-20624][DEBUG][org.jgroups.protocols.pbcast.NAKACK2] mn4s34198-20624: removed mn4s34196-5743 from xmit_table (not member anymore)
[2013-11-20 23:27:18,749][Incoming-1,mn4s34198-20624][DEBUG][org.jgroups.protocols.FD_SOCK] VIEW_CHANGE received: [mn4s34198-20624]
[2013-11-20 23:27:19,091][FD_SOCK pinger,mn4s34198-20624][DEBUG][org.jgroups.protocols.FD_SOCK] socket to null was closed gracefully
[2013-11-20 23:27:19,093][Incoming-1,mn4s34198-20624][DEBUG][org.infinispan.remoting.transport.jgroups.JGroupsTransport] New view accepted: [mn4s34198-20624|10] (1) [mn4s34198-20624]
[2013-11-20 23:27:19,093][Incoming-1,mn4s34198-20624][INFO][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000094: Received new cluster view: [mn4s34198-20624|10] (1) [mn4s34198-20624] !