"suspected members" on cluster with 2 nodes
israeldl Aug 10, 2010 4:30 PMHi guys,
I'm trying to run infinispan-demo-gui on 2 nodes (on the same network), but the cluster becomes unstable.
It seems that the nodes recognize each other, start a cluster, and then one of them is declared suspect ... repeatedly. This loop never ends...so the log is pretty big.
One of the nodes(rooster) have this log (not complete...):
2993 [pool-1-thread-1] DEBUG org.jgroups.protocols.pbcast.GMS - new_view=[ozzy-22851|1] [ozzy-22851, rooster-3720]
2993 [pool-1-thread-1] DEBUG org.jgroups.protocols.pbcast.GMS - rooster-3720: view is [ozzy-22851|1] [ozzy-22851, rooster-3720]
2997 [pool-1-thread-1] DEBUG org.jgroups.protocols.FD_SOCK - VIEW_CHANGE received: [ozzy-22851, rooster-3720]
3009 [FD_SOCK pinger,rooster-3720] DEBUG org.jgroups.protocols.FD_SOCK - ping_dest is ozzy-22851, pingable_mbrs=[ozzy-22851, rooster-3720]
3013 [pool-1-thread-1] INFO org.infinispan.remoting.transport.jgroups.JGroupsTransport - Received new cluster view: [ozzy-22851|1] [ozzy-22851, rooster-3720]
3016 [pool-1-thread-1] DEBUG org.jgroups.protocols.pbcast.FLUSH - rooster-3720: installing view [ozzy-22851|1] [ozzy-22851, rooster-3720]
3017 [pool-1-thread-1] DEBUG org.jgroups.protocols.pbcast.GMS - rooster-3720:
5018 [pool-1-thread-1] WARN org.jgroups.protocols.pbcast.FLUSH - rooster-3720: waiting for UNBLOCK timed out after 2000 ms
5019 [pool-1-thread-1] INFO org.infinispan.remoting.transport.jgroups.JGroupsTransport - Cache local address is rooster-3720, physical addresses are [192.168.1.2:50702]
5020 [pool-1-thread-1] INFO org.infinispan.factories.GlobalComponentRegistry - Infinispan version: Infinispan 'Pagoa' 5.0.0.SNAPSHOT
5058 [pool-1-thread-1] DEBUG org.infinispan.interceptors.InterceptorChain - Interceptor chain size: 6
5059 [pool-1-thread-1] DEBUG org.infinispan.interceptors.InterceptorChain - Interceptor chain is:
>> org.infinispan.interceptors.InvocationContextInterceptor
>> org.infinispan.interceptors.DistTxInterceptor
>> org.infinispan.interceptors.NotificationInterceptor
>> org.infinispan.interceptors.DistLockingInterceptor
>> org.infinispan.interceptors.DistributionInterceptor
>> org.infinispan.interceptors.CallInterceptor
5106 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.JoinTask - Commencing rehash on node: rooster-3720. Before start, dmi.joinComplete = false
9009 [OOB-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH - rooster-3720: suspect is ozzy-22851, completed false, flushOkSet {}, flushMembers []
9010 [OOB-1,rooster-3720] DEBUG org.jgroups.blocks.RequestCorrelator - suspect=ozzy-22851
9064 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS - suspected members=[ozzy-22851], suspected_mbrs=[ozzy-22851]
9065 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS - members are [ozzy-22851, rooster-3720], coord=rooster-3720: I'm the new coord !
9067 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS - rooster-3720:
9121 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS - new=[], suspected=[ozzy-22851], leaving=[], new view: [rooster-3720|2] [rooster-3720]
9123 [Incoming-2,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH - rooster-3720: received START_FLUSH, but not sending BLOCK up
9125 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH - rooster-3720: flush coordinator is starting FLUSH with participants [rooster-3720]
9126 [Incoming-2,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH - rooster-3720: received START_FLUSH, responded with FLUSH_COMPLETED to rooster-3720
9127 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH - rooster-3720: FLUSH_COMPLETED from rooster-3720, completed true, flushMembers [rooster-3720], flushCompleted [rooster-3720]
9127 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH - rooster-3720: all FLUSH_COMPLETED received
9130 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS - rooster-3720: view is [rooster-3720|2] [rooster-3720]
9131 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.NAKACK - removed ozzy-22851 from xmit_table (not member anymore)
9132 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.FD_SOCK - VIEW_CHANGE received: [rooster-3720]
9436 [FD_SOCK pinger,rooster-3720] DEBUG org.jgroups.protocols.FD_SOCK - socket to null was closed gracefully
9443 [Incoming-1,rooster-3720] INFO org.infinispan.remoting.transport.jgroups.JGroupsTransport - Received new cluster view: [rooster-3720|2] [rooster-3720]
9451 [Incoming-1,rooster-3720] INFO org.infinispan.distribution.DistributionManagerImpl - Detected a view change. Member list changed from [ozzy-22851, rooster-3720] to [rooster-3720]
9451 [Incoming-1,rooster-3720] INFO org.infinispan.distribution.DistributionManagerImpl - This is a LEAVE event! Node ozzy-22851 has just left
9453 [Incoming-1,rooster-3720] INFO org.infinispan.distribution.DistributionManagerImpl - I rooster-3720 am participating in rehash
9454 [Incoming-1,rooster-3720] INFO org.infinispan.distribution.TransactionLoggerImpl - Starting transaction logging
9458 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH - rooster-3720: installing view [rooster-3720|2] [rooster-3720]
9459 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH - rooster-3720: received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
9460 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS - rooster-3720: sending RESUME event
9460 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH - rooster-3720: received RESUME, sending STOP_FLUSH to all
9463 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH - rooster-3720: received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
9793 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.JoinTask - Retrieved old consistent hash address list [ozzy-22851]
9794 [Rehasher-rooster-3720] INFO org.infinispan.distribution.TransactionLoggerImpl - Starting transaction logging
We're the only member in the cluster; Don't invoke remotely.
We're the only member in the cluster; Don't invoke remotely.
9796 [Rehasher-rooster-3720] INFO org.infinispan.distribution.TransactionLoggerImpl - Stopping transaction logging
We're the only member in the cluster; Don't invoke remotely.
We're the only member in the cluster; Don't invoke remotely.
9796 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.JoinTask - Invalidating entries that have migrated across
9800 [pool-1-thread-1] INFO org.infinispan.factories.ComponentRegistry - Infinispan version: Infinispan 'Pagoa' 5.0.0.SNAPSHOT
9802 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.InvertedLeaveTask - Commencing rehash at rooster-3720, I am a state receiver
9803 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.InvertedLeaveTask - I rooster-3720 am pulling state from [rooster-3720]
We're the only member in the cluster; Don't invoke remotely.
9803 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.InvertedLeaveTask - I rooster-3720 received response []
9833 [Rehasher-rooster-3720] INFO org.infinispan.distribution.TransactionLoggerImpl - Stopping transaction logging
9833 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.InvertedLeaveTask - Invalidating entries that have migrated across
9962 [pool-1-thread-1] INFO org.hibernate.search.Version - Hibernate Search 3.2.1.Final
10002 [pool-1-thread-1] INFO org.hibernate.annotations.common.Version - Hibernate Commons Annotations 3.2.0.Final
10083 [pool-1-thread-1] DEBUG org.hibernate.search.impl.InitContext - Using default similarity implementation: org.apache.lucene.search.DefaultSimilarity
10395 [pool-1-thread-1] DEBUG org.hibernate.search.engine.DocumentBuilderIndexedEntity - Field selection in projections is set to true for entity org.infinispan.query.Pessoa.
10677 [pool-1-thread-1] DEBUG org.infinispan.query.backend.QueryHelper - Entered QueryHelper.applyProperties()
10684 [pool-1-thread-1] DEBUG org.infinispan.factories.ComponentRegistry - Looking in configuration for an instance of interface javax.transaction.TransactionManager that may have been injected from an external source.
10685 [pool-1-thread-1] DEBUG org.infinispan.query.backend.LocalQueryInterceptor - Entered LocalQueryInterceptor.init()
10685 [pool-1-thread-1] DEBUG org.infinispan.factories.ComponentRegistry - Looking in configuration for an instance of interface javax.transaction.TransactionManager that may have been injected from an external source.
10685 [pool-1-thread-1] DEBUG org.infinispan.query.backend.LocalQueryInterceptor - Entered LocalQueryInterceptor.init()
10736 [pool-1-thread-1] DEBUG org.infinispan.demo.InfinispanDemo - Updating cluster table with new member list [rooster-3720]
24005 [Timer-1,rooster-3720] DEBUG org.jgroups.protocols.MERGE2 - rooster-3720 found different views : [ozzy-22851|3], [rooster-3720|2]; sending up MERGE event with merge participants [ozzy-22851, rooster-3720].
Discovery results:
[rooster-3720]: [rooster-3720|2] [rooster-3720]
[ozzy-22851]: [ozzy-22851|3] [ozzy-22851]
24059 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS - determining merge leader from [ozzy-22851, rooster-3720]
24059 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS - I (rooster-3720) will be the leader. Starting the merge task for [ozzy-22851, rooster-3720]
24062 [MergeTask,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS - rooster-3720: sending MERGE_REQ to [ozzy-22851, rooster-3720]
24064 [OOB-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.STABLE - suspending message garbage collection
The other node(Ozzy) have this (not complete too):
45905 [Incoming-6,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS - ozzy-22851: view is MergeView::[rooster-3720|4] [rooster-3720, ozzy-22851], subgroups=[[ozzy-22851|3] [ozzy-22851], [rooster-3720|2] [rooster-3720]]
45905 [Incoming-6,ozzy-22851] DEBUG org.jgroups.protocols.FD_SOCK - VIEW_CHANGE received: [rooster-3720, ozzy-22851]
45905 [FD_SOCK pinger,ozzy-22851] DEBUG org.jgroups.protocols.FD_SOCK - ping_dest is rooster-3720, pingable_mbrs=[rooster-3720, ozzy-22851]
45905 [Incoming-6,ozzy-22851] INFO org.infinispan.remoting.transport.jgroups.JGroupsTransport - Received new cluster view: MergeView::[rooster-3720|4] [rooster-3720, ozzy-22851], subgroups=[[ozzy-22851|3] [ozzy-22851], [rooster-3720|2] [rooster-3720]]
45908 [Incoming-6,ozzy-22851] DEBUG org.infinispan.demo.InfinispanDemo - Updating cluster table with new member list [rooster-3720, ozzy-22851]
45909 [Incoming-6,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH - ozzy-22851: installing view MergeView::[rooster-3720|4] [rooster-3720, ozzy-22851], subgroups=[[ozzy-22851|3] [ozzy-22851], [rooster-3720|2] [rooster-3720]]
45909 [Rehasher-ozzy-22851] DEBUG org.infinispan.distribution.JoinTask - Commencing rehash on node: ozzy-22851. Before start, dmi.joinComplete = true
45909 [Incoming-6,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS - ozzy-22851:
45910 [Rehasher-ozzy-22851] ERROR org.infinispan.distribution.JoinTask - Caught exception!
java.lang.IllegalStateException: Join cannot be complete without rehash to finish (node ozzy-22851 )
at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:82)
at org.infinispan.distribution.RehashTask.call(RehashTask.java:52)
at org.infinispan.distribution.RehashTask.call(RehashTask.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
45912 [Rehasher-ozzy-22851] INFO org.infinispan.distribution.TransactionLoggerImpl - Stopping transaction logging
45912 [Incoming-4,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.STABLE - resuming message garbage collection
46227 [OOB-4,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH - ozzy-22851: received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
51910 [OOB-5,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH - ozzy-22851: suspect is ozzy-22851, completed false, flushOkSet {}, flushMembers []
51910 [OOB-5,ozzy-22851] DEBUG org.jgroups.blocks.RequestCorrelator - suspect=ozzy-22851
51960 [ViewHandler,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS - suspected members=[ozzy-22851], suspected_mbrs=[ozzy-22851]
51993 [Incoming-7,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH - ozzy-22851: received START_FLUSH but I am not flush participant, not responding
51993 [Incoming-7,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS - ozzy-22851: view is [rooster-3720|5] [rooster-3720]
51994 [Incoming-7,ozzy-22851] WARN org.jgroups.protocols.pbcast.GMS - ozzy-22851: not member of view [rooster-3720|5] [rooster-3720]; discarding it
52304 [Incoming-7,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH - ozzy-22851: received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
57906 [OOB-6,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH - ozzy-22851: suspect is rooster-3720, completed false, flushOkSet {}, flushMembers []
57907 [OOB-6,ozzy-22851] DEBUG org.jgroups.blocks.RequestCorrelator - suspect=rooster-3720
57957 [ViewHandler,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS - suspected members=[rooster-3720], suspected_mbrs=[rooster-3720]
57957 [ViewHandler,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS - members are [rooster-3720, ozzy-22851], coord=ozzy-22851: I'm the new coord !
57957 [ViewHandler,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS - ozzy-22851:
58007 [ViewHandler,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS - new=[], suspected=[rooster-3720], leaving=[], new view: [ozzy-22851|6] [ozzy-22851]
58007 [ViewHandler,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH - ozzy-22851: flush coordinator is starting FLUSH with participants [ozzy-22851]
58007 [Incoming-7,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH - ozzy-22851: received START_FLUSH, responded with FLUSH_COMPLETED to ozzy-22851
58008 [Incoming-5,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH - ozzy-22851: FLUSH_COMPLETED from ozzy-22851, completed true, flushMembers [ozzy-22851], flushCompleted [ozzy-22851]
58009 [Incoming-5,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH - ozzy-22851: all FLUSH_COMPLETED received
58010 [Incoming-5,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS - ozzy-22851: view is [ozzy-22851|6] [ozzy-22851]
58010 [Incoming-5,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.NAKACK - removed rooster-3720 from xmit_table (not member anymore)
58010 [Incoming-5,ozzy-22851] DEBUG org.jgroups.protocols.FD_SOCK - VIEW_CHANGE received: [ozzy-22851]
58312 [Incoming-5,ozzy-22851] INFO org.infinispan.remoting.transport.jgroups.JGroupsTransport - Received new cluster view: [ozzy-22851|6] [ozzy-22851]
58312 [Incoming-5,ozzy-22851] INFO org.infinispan.distribution.DistributionManagerImpl - Detected a view change. Member list changed from [rooster-3720, ozzy-22851] to [ozzy-22851]
58312 [Incoming-5,ozzy-22851] INFO org.infinispan.distribution.DistributionManagerImpl - This is a LEAVE event! Node rooster-3720 has just left
58312 [Incoming-5,ozzy-22851] INFO org.infinispan.distribution.DistributionManagerImpl - I ozzy-22851 am participating in rehash
58312 [Incoming-5,ozzy-22851] INFO org.infinispan.distribution.TransactionLoggerImpl - Starting transaction logging