-
1. Re: Clustered nodes cannot discover each other after unplugg
ben.wang Dec 13, 2005 11:50 AM (in response to vegecat)
You should turn on JGroups log tracing to see the details. Please refer to the JGroups page on the JBoss wiki site.
-
2. Re: Clustered nodes cannot discover each other after unplugg
vegecat Dec 13, 2005 5:08 PM (in response to vegecat)
Hi, Ben
Thanks for the pointer. I tested the unplug/replug network cable scenario again with TRACE logging enabled for JGroups. The result was different from what I saw previously: the two clustered nodes could discover each other after an initial failure. This is the output shown on one node after I replugged the network cable (NMS is the partition name):
16:59:04,867 INFO [TreeCache] viewAccepted(): new members: [USABBRDUL14407:2567, USABBRDUL15163:1672]
16:59:04,877 ERROR [GMS] [USABBRDUL14407:2567] received view <= current view; discarding it (current vid: [USABBRDUL14407:2567|13], new vid: [USABBRDUL14407:2567|13])
16:59:05,077 WARN [NAKACK] [USABBRDUL14407:2569 (additional data: 19 bytes)] discarded message from non-member USABBRDUL15163:1674 (additional data: 18 bytes)
16:59:06,740 INFO [TreeCache] viewAccepted(): new members: [USABBRDUL14407:2568, USABBRDUL15163:1676]
16:59:06,740 ERROR [GMS] [USABBRDUL14407:2568] received view <= current view; discarding it (current vid: [USABBRDUL14407:2568|13], new vid: [USABBRDUL14407:2568|13])
16:59:08,642 WARN [NAKACK] [USABBRDUL14407:2569 (additional data: 19 bytes)] discarded message from non-member USABBRDUL15163:1674 (additional data: 18 bytes)
16:59:08,963 INFO [NMS] New cluster view for partition NMS (id: 13, delta: 1) : [130.110.93.205:1099, 10.66.248.243:1099]
16:59:08,963 INFO [NMS] Merging partitions...
16:59:08,963 INFO [NMS] Dead members: 0
16:59:08,963 INFO [NMS] Originating groups: [[USABBRDUL14407:2569 (additional data: 19 bytes)|12] [USABBRDUL14407:2569 (additional data: 19 bytes)], [USABBRDUL15163:1674 (additional data: 18 bytes)|12] [USABBRDUL15163:1674 (additional data: 18 bytes)]]
16:59:08,973 ERROR [GMS] [USABBRDUL14407:2569 (additional data: 19 bytes)] received view <= current view; discarding it (current vid: [USABBRDUL14407:2569 (additional data: 19 bytes)|13], new vid: [USABBRDUL14407:2569 (additional data: 19 bytes)|13])
16:59:09,323 ERROR [NMS] merge failed
java.lang.ClassCastException: EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap
    at org.jboss.ha.framework.server.DistributedReplicantManagerImpl.mergeMembers(DistributedReplicantManagerImpl.java:791)
    at org.jboss.ha.framework.server.DistributedReplicantManagerImpl$MergeMembers.run(DistributedReplicantManagerImpl.java:927)
16:59:17,595 WARN [CoordGmsImpl] merge responses from subgroup coordinators <= 1 ([sender=USABBRDUL14407:2534, view=[USABBRDUL14407:2534|1] [USABBRDUL14407:2534, USABBRDUL15163:1641], digest=[USABBRDUL14407:2534: [0 : 21, USABBRDUL15163:1641: [0 : 0]]). Cancelling merge
16:59:37,784 WARN [CoordGmsImpl] merge responses from subgroup coordinators <= 1 ([sender=USABBRDUL14407:2534, view=[USABBRDUL14407:2534|1] [USABBRDUL14407:2534, USABBRDUL15163:1641], digest=[USABBRDUL14407:2534: [0 : 21, USABBRDUL15163:1641: [0 : 0]]). Cancelling merge
16:59:50,963 WARN [CoordGmsImpl] merge responses from subgroup coordinators <= 1 ([sender=USABBRDUL14407:2534, view=[USABBRDUL14407:2534|1] [USABBRDUL14407:2534, USABBRDUL15163:1641], digest=[USABBRDUL14407:2534: [0 : 22, USABBRDUL15163:1641: [0 : 0]]). Cancelling merge
17:00:04,162 WARN [CoordGmsImpl] merge responses from subgroup coordinators <= 1 ([sender=USABBRDUL14407:2534, view=[USABBRDUL14407:2534|1] [USABBRDUL14407:2534, USABBRDUL15163:1641], digest=[USABBRDUL14407:2534: [0 : 22, USABBRDUL15163:1641: [0 : 0]]). Cancelling merge
17:00:18,172 WARN [CoordGmsImpl] merge responses from subgroup coordinators <= 1 ([sender=USABBRDUL14407:2534, view=[USABBRDUL14407:2534|1] [USABBRDUL14407:2534, USABBRDUL15163:1641], digest=[USABBRDUL14407:2534: [0 : 23, USABBRDUL15163:1641: [0 : 0]]). Cancelling merge
17:00:21,848 WARN [FD] ping_dest is null: members=[USABBRDUL14407:2568, USABBRDUL15163:1676], pingable_mbrs=[USABBRDUL14407:2568], local_addr=USABBRDUL14407:2568
17:00:22,979 WARN [FD] ping_dest is null: members=[USABBRDUL14407:2567, USABBRDUL15163:1672], pingable_mbrs=[USABBRDUL14407:2567], local_addr=USABBRDUL14407:2567
17:00:23,350 INFO [TreeCache] viewAccepted(): new members: [USABBRDUL14407:2568]
17:00:24,482 INFO [TreeCache] viewAccepted(): new members: [USABBRDUL14407:2567]
17:00:29,148 WARN [FD] ping_dest is null: members=[USABBRDUL14407:2569 (additional data: 19 bytes), USABBRDUL15163:1674 (additional data: 18 bytes)], pingable_mbrs=[USABBRDUL14407:2569 (additional data: 19 bytes)], local_addr=USABBRDUL14407:2569 (additional data: 19 bytes)
17:00:29,579 INFO [NMS] Suspected member: USABBRDUL15163:1674 (additional data: 18 bytes)
17:00:29,579 INFO [NMS] New cluster view for partition NMS (id: 14, delta: -1) : [130.110.93.205:1099]
17:00:29,589 INFO [NMS] I am (130.110.93.205:1099) received membershipChanged event:
17:00:29,589 INFO [NMS] Dead members: 1 ([10.66.248.243:1099])
17:00:29,589 INFO [NMS] New Members : 0 ([])
17:00:29,589 INFO [NMS] All Members : 1 ([130.110.93.205:1099])
17:00:33,044 WARN [NAKACK] [USABBRDUL14407:2569 (additional data: 19 bytes)] discarded message from non-member USABBRDUL15163:1674 (additional data: 18 bytes)
17:00:33,054 WARN [NAKACK] [USABBRDUL14407:2568] discarded message from non-member USABBRDUL15163:1676
17:00:33,064 WARN [NAKACK] [USABBRDUL14407:2567] discarded message from non-member USABBRDUL15163:1672
17:00:35,958 INFO [TreeCache] viewAccepted(): new members: [USABBRDUL14407:2568, USABBRDUL15163:1698]
17:00:35,958 INFO [TreeCache] viewAccepted(): new members: [USABBRDUL14407:2567, USABBRDUL15163:1701]
17:00:35,968 INFO [NMS] New cluster view for partition NMS (id: 15, delta: 1) : [130.110.93.205:1099, 10.66.248.243:1099]
17:00:35,968 INFO [NMS] I am (130.110.93.205:1099) received membershipChanged event:
17:00:35,968 INFO [NMS] Dead members: 0 ([])
17:00:35,968 INFO [NMS] New Members : 1 ([10.66.248.243:1099])
17:00:35,978 INFO [TreeCache] locking the tree to obtain transient state
17:00:35,978 INFO [TreeCache] returning the transient state (140 bytes)
17:00:35,978 INFO [NMS] All Members : 2 ([130.110.93.205:1099, 10.66.248.243:1099])
17:00:35,978 INFO [TreeCache] locking the tree to obtain transient state
17:00:35,978 INFO [TreeCache] returning the transient state (140 bytes)
17:00:52,071 WARN [CoordGmsImpl] merge responses from subgroup coordinators <= 1 ([sender=USABBRDUL14407:2534, view=[USABBRDUL14407:2534|1] [USABBRDUL14407:2534, USABBRDUL15163:1641], digest=[USABBRDUL14407:2534: [0 : 24, USABBRDUL15163:1641: [0 : 0]]). Cancelling merge
17:01:06,442 WARN [CoordGmsImpl] merge responses from subgroup coordinators <= 1 ([sender=USABBRDUL14407:2534, view=[USABBRDUL14407:2534|1] [USABBRDUL14407:2534, USABBRDUL15163:1641], digest=[USABBRDUL14407:2534: [0 : 24, USABBRDUL15163:1641: [0 : 0]]). Cancelling merge
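For anyone trying to follow the membership changes in a log like this one, the partition view lines can be pulled out with a short script. What follows is only a minimal sketch in Python, assuming one log entry per line in the format shown in this post. The function name `partition_views` and the regular expression are illustrative and are not part of JBoss or JGroups.

```python
import re

# Matches entries such as:
# 16:59:08,963 INFO [NMS] New cluster view for partition NMS (id: 13, delta: 1) : [a:1099, b:1099]
# The pattern is an assumption based on the log format in this thread.
VIEW_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s+INFO\s+\[[^\]]+\]\s+"
    r"New cluster view for partition \S+ "
    r"\(id: (?P<id>\d+), delta: (?P<delta>-?\d+)\)\s*:\s*"
    r"\[(?P<members>[^\]]*)\]"
)


def partition_views(lines):
    """Return (time, view id, delta, members) for each partition view entry."""
    views = []
    for line in lines:
        match = VIEW_RE.match(line.strip())
        if match is None:
            continue  # not a partition view entry
        members = [m.strip() for m in match.group("members").split(",") if m.strip()]
        views.append(
            (match.group("time"), int(match.group("id")), int(match.group("delta")), members)
        )
    return views


# Entries copied from the log in this post, rejoined onto single lines.
SAMPLE = [
    "16:59:08,963 INFO [NMS] New cluster view for partition NMS (id: 13, delta: 1) : [130.110.93.205:1099, 10.66.248.243:1099]",
    "16:59:08,963 INFO [NMS] Merging partitions...",
    "17:00:29,579 INFO [NMS] New cluster view for partition NMS (id: 14, delta: -1) : [130.110.93.205:1099]",
    "17:00:35,968 INFO [NMS] New cluster view for partition NMS (id: 15, delta: 1) : [130.110.93.205:1099, 10.66.248.243:1099]",
]

if __name__ == "__main__":
    for time, view_id, delta, members in partition_views(SAMPLE):
        print(time, "view", view_id, "delta", delta, "members", len(members))
```

Read this way, the NMS partition goes from two members (view 13) to one (view 14) and back to two (view 15) within about 90 seconds of the cable being replugged.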
-
3. Re: Clustered nodes cannot discover each other after unplugg
brian.stansberry Dec 13, 2005 6:53 PM (in response to vegecat)
The
16:59:09,323 ERROR [NMS] merge failed
java.lang.ClassCastException: EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap
    at org.jboss.ha.framework.server.DistributedReplicantManagerImpl.mergeMembers(DistributedReplicantManagerImpl.java:791)
    at org.jboss.ha.framework.server.DistributedReplicantManagerImpl$MergeMembers.run(DistributedReplicantManagerImpl.java:927)
problem is due to http://jira.jboss.com/jira/browse/JBAS-2439.
As for the rest of the problems, it's hard to tell without understanding your environment. Is this the "all" config from 4.0.3, with 3 tree caches (session replication, SFSB replication, entity bean replication) + the NMS Partition? If so, it looks like 2 of the caches recovered:
17:00:35,958 INFO [TreeCache] viewAccepted(): new members: [USABBRDUL14407:2568, USABBRDUL15163:1698]
17:00:35,958 INFO [TreeCache] viewAccepted(): new members: [USABBRDUL14407:2567, USABBRDUL15163:1701]
while the NMS partition lost a member:
17:00:35,968 INFO [NMS] New Members : 1 ([10.66.248.243:1099])
which isn't surprising given the above referenced bug. Not clear what happened to the 3rd TreeCache. By the end of the log snippet it hadn't recovered.
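One rough way to check which TreeCache channels rejoined is to keep only the last viewAccepted entry seen for each channel. The Python sketch below does that under one loudly stated assumption: the first address in each view is taken to identify the local channel, which holds for this snippet but is not guaranteed in general. The name `last_views` is hypothetical and not a JBoss or JGroups API.

```python
import re

# Pattern assumed from the TreeCache entries quoted in this thread.
VIEW_ACCEPTED_RE = re.compile(
    r"\[TreeCache\] viewAccepted\(\): new members: \[(?P<members>[^\]]*)\]"
)


def last_views(lines):
    """Map each channel (keyed by the first address in its view) to its latest view."""
    latest = {}
    for line in lines:
        match = VIEW_ACCEPTED_RE.search(line)
        if match is None:
            continue  # not a TreeCache view entry
        members = [m.strip() for m in match.group("members").split(",") if m.strip()]
        if members:
            # Assumption: the first member is the local node's channel address.
            latest[members[0]] = members
    return latest


# Entries copied from the log in this thread, rejoined onto single lines.
SAMPLE = [
    "17:00:23,350 INFO [TreeCache] viewAccepted(): new members: [USABBRDUL14407:2568]",
    "17:00:24,482 INFO [TreeCache] viewAccepted(): new members: [USABBRDUL14407:2567]",
    "17:00:35,958 INFO [TreeCache] viewAccepted(): new members: [USABBRDUL14407:2568, USABBRDUL15163:1698]",
    "17:00:35,958 INFO [TreeCache] viewAccepted(): new members: [USABBRDUL14407:2567, USABBRDUL15163:1701]",
]

if __name__ == "__main__":
    for channel, members in last_views(SAMPLE).items():
        print(channel, "->", len(members), "member(s)")
```

On these entries only two channels appear, each ending with a two-member view, which matches the two recovered caches; a third channel that never logs a new view simply does not show up.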