3 Replies Latest reply on Aug 12, 2010 5:32 AM by manik

    "suspected members" on cluster with 2 nodes

    israeldl

      Hi guys,

       

      I'm trying to run infinispan-demo-gui on 2 nodes (on the same network), but the cluster becomes unstable.

       

      It seems that the nodes recognize each other, start a cluster, and then one of them is declared suspect ... repeatedly. This loop never ends...so the log is pretty big.

       

       

      One of the nodes(rooster) have this log (not complete...):

       

      2993 [pool-1-thread-1] DEBUG org.jgroups.protocols.pbcast.GMS  - new_view=[ozzy-22851|1] [ozzy-22851, rooster-3720]
      2993 [pool-1-thread-1] DEBUG org.jgroups.protocols.pbcast.GMS  - rooster-3720: view is [ozzy-22851|1] [ozzy-22851, rooster-3720]
      2997 [pool-1-thread-1] DEBUG org.jgroups.protocols.FD_SOCK  - VIEW_CHANGE received: [ozzy-22851, rooster-3720]
      3009 [FD_SOCK pinger,rooster-3720] DEBUG org.jgroups.protocols.FD_SOCK  - ping_dest is ozzy-22851, pingable_mbrs=[ozzy-22851, rooster-3720]
      3013 [pool-1-thread-1] INFO org.infinispan.remoting.transport.jgroups.JGroupsTransport  - Received new cluster view: [ozzy-22851|1] [ozzy-22851, rooster-3720]
      3016 [pool-1-thread-1] DEBUG org.jgroups.protocols.pbcast.FLUSH  - rooster-3720: installing view [ozzy-22851|1] [ozzy-22851, rooster-3720]
      3017 [pool-1-thread-1] DEBUG org.jgroups.protocols.pbcast.GMS  - rooster-3720:
      5018 [pool-1-thread-1] WARN org.jgroups.protocols.pbcast.FLUSH  - rooster-3720: waiting for UNBLOCK timed out after 2000 ms
      5019 [pool-1-thread-1] INFO org.infinispan.remoting.transport.jgroups.JGroupsTransport  - Cache local address is rooster-3720, physical addresses are [192.168.1.2:50702]
      5020 [pool-1-thread-1] INFO org.infinispan.factories.GlobalComponentRegistry  - Infinispan version: Infinispan 'Pagoa' 5.0.0.SNAPSHOT
      5058 [pool-1-thread-1] DEBUG org.infinispan.interceptors.InterceptorChain  - Interceptor chain size: 6
      5059 [pool-1-thread-1] DEBUG org.infinispan.interceptors.InterceptorChain  - Interceptor chain is:
          >> org.infinispan.interceptors.InvocationContextInterceptor
          >> org.infinispan.interceptors.DistTxInterceptor
          >> org.infinispan.interceptors.NotificationInterceptor
          >> org.infinispan.interceptors.DistLockingInterceptor
          >> org.infinispan.interceptors.DistributionInterceptor
          >> org.infinispan.interceptors.CallInterceptor
      5106 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.JoinTask  - Commencing rehash on node: rooster-3720. Before start, dmi.joinComplete = false
      9009 [OOB-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH  - rooster-3720: suspect is ozzy-22851, completed false, flushOkSet {}, flushMembers []
      9010 [OOB-1,rooster-3720] DEBUG org.jgroups.blocks.RequestCorrelator  - suspect=ozzy-22851
      9064 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS  - suspected members=[ozzy-22851], suspected_mbrs=[ozzy-22851]
      9065 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS  - members are [ozzy-22851, rooster-3720], coord=rooster-3720: I'm the new coord !
      9067 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS  - rooster-3720:
      9121 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS  - new=[], suspected=[ozzy-22851], leaving=[], new view: [rooster-3720|2] [rooster-3720]
      9123 [Incoming-2,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH  - rooster-3720: received START_FLUSH, but not sending BLOCK up
      9125 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH  - rooster-3720: flush coordinator  is starting FLUSH with participants [rooster-3720]
      9126 [Incoming-2,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH  - rooster-3720: received START_FLUSH, responded with FLUSH_COMPLETED to rooster-3720
      9127 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH  - rooster-3720: FLUSH_COMPLETED from rooster-3720, completed true, flushMembers [rooster-3720], flushCompleted [rooster-3720]
      9127 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH  - rooster-3720: all FLUSH_COMPLETED received
      9130 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS  - rooster-3720: view is [rooster-3720|2] [rooster-3720]
      9131 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.NAKACK  - removed ozzy-22851 from xmit_table (not member anymore)
      9132 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.FD_SOCK  - VIEW_CHANGE received: [rooster-3720]
      9436 [FD_SOCK pinger,rooster-3720] DEBUG org.jgroups.protocols.FD_SOCK  - socket to null was closed gracefully
      9443 [Incoming-1,rooster-3720] INFO org.infinispan.remoting.transport.jgroups.JGroupsTransport  - Received new cluster view: [rooster-3720|2] [rooster-3720]
      9451 [Incoming-1,rooster-3720] INFO org.infinispan.distribution.DistributionManagerImpl  - Detected a view change.  Member list changed from [ozzy-22851, rooster-3720] to [rooster-3720]
      9451 [Incoming-1,rooster-3720] INFO org.infinispan.distribution.DistributionManagerImpl  - This is a LEAVE event!  Node ozzy-22851 has just left
      9453 [Incoming-1,rooster-3720] INFO org.infinispan.distribution.DistributionManagerImpl  - I rooster-3720 am participating in rehash
      9454 [Incoming-1,rooster-3720] INFO org.infinispan.distribution.TransactionLoggerImpl  - Starting transaction logging
      9458 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH  - rooster-3720: installing view [rooster-3720|2] [rooster-3720]
      9459 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH  - rooster-3720: received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
      9460 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS  - rooster-3720: sending RESUME event
      9460 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH  - rooster-3720: received RESUME, sending STOP_FLUSH to all
      9463 [Incoming-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.FLUSH  - rooster-3720: received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
      9793 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.JoinTask  - Retrieved old consistent hash address list [ozzy-22851]
      9794 [Rehasher-rooster-3720] INFO org.infinispan.distribution.TransactionLoggerImpl  - Starting transaction logging
      We're the only member in the cluster; Don't invoke remotely.
      We're the only member in the cluster; Don't invoke remotely.
      9796 [Rehasher-rooster-3720] INFO org.infinispan.distribution.TransactionLoggerImpl  - Stopping transaction logging
      We're the only member in the cluster; Don't invoke remotely.
      We're the only member in the cluster; Don't invoke remotely.
      9796 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.JoinTask  - Invalidating entries that have migrated across
      9800 [pool-1-thread-1] INFO org.infinispan.factories.ComponentRegistry  - Infinispan version: Infinispan 'Pagoa' 5.0.0.SNAPSHOT
      9802 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.InvertedLeaveTask  - Commencing rehash at rooster-3720, I am a state receiver
      9803 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.InvertedLeaveTask  - I rooster-3720 am pulling state from [rooster-3720]
      We're the only member in the cluster; Don't invoke remotely.
      9803 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.InvertedLeaveTask  - I rooster-3720 received response []
      9833 [Rehasher-rooster-3720] INFO org.infinispan.distribution.TransactionLoggerImpl  - Stopping transaction logging
      9833 [Rehasher-rooster-3720] DEBUG org.infinispan.distribution.InvertedLeaveTask  - Invalidating entries that have migrated across
      9962 [pool-1-thread-1] INFO org.hibernate.search.Version  - Hibernate Search 3.2.1.Final
      10002 [pool-1-thread-1] INFO org.hibernate.annotations.common.Version  - Hibernate Commons Annotations 3.2.0.Final
      10083 [pool-1-thread-1] DEBUG org.hibernate.search.impl.InitContext  - Using default similarity implementation: org.apache.lucene.search.DefaultSimilarity
      10395 [pool-1-thread-1] DEBUG org.hibernate.search.engine.DocumentBuilderIndexedEntity  - Field selection in projections is set to true for entity org.infinispan.query.Pessoa.
      10677 [pool-1-thread-1] DEBUG org.infinispan.query.backend.QueryHelper  - Entered QueryHelper.applyProperties()
      10684 [pool-1-thread-1] DEBUG org.infinispan.factories.ComponentRegistry  - Looking in configuration for an instance of interface javax.transaction.TransactionManager that may have been injected from an external source.
      10685 [pool-1-thread-1] DEBUG org.infinispan.query.backend.LocalQueryInterceptor  - Entered LocalQueryInterceptor.init()
      10685 [pool-1-thread-1] DEBUG org.infinispan.factories.ComponentRegistry  - Looking in configuration for an instance of interface javax.transaction.TransactionManager that may have been injected from an external source.
      10685 [pool-1-thread-1] DEBUG org.infinispan.query.backend.LocalQueryInterceptor  - Entered LocalQueryInterceptor.init()
      10736 [pool-1-thread-1] DEBUG org.infinispan.demo.InfinispanDemo  - Updating cluster table with new member list [rooster-3720]
      24005 [Timer-1,rooster-3720] DEBUG org.jgroups.protocols.MERGE2  - rooster-3720 found different views : [ozzy-22851|3], [rooster-3720|2]; sending up MERGE event with merge participants [ozzy-22851, rooster-3720].
      Discovery results:
      [rooster-3720]: [rooster-3720|2] [rooster-3720]
      [ozzy-22851]: [ozzy-22851|3] [ozzy-22851]

       

      24059 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS  - determining merge leader from [ozzy-22851, rooster-3720]
      24059 [ViewHandler,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS  - I (rooster-3720) will be the leader. Starting the merge task for [ozzy-22851, rooster-3720]
      24062 [MergeTask,rooster-3720] DEBUG org.jgroups.protocols.pbcast.GMS  - rooster-3720: sending MERGE_REQ to [ozzy-22851, rooster-3720]
      24064 [OOB-1,rooster-3720] DEBUG org.jgroups.protocols.pbcast.STABLE  - suspending message garbage collection

       

       

      The other node(Ozzy) have this (not complete too):

       

       

      45905 [Incoming-6,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS   - ozzy-22851: view is MergeView::[rooster-3720|4] [rooster-3720,  ozzy-22851], subgroups=[[ozzy-22851|3] [ozzy-22851], [rooster-3720|2]  [rooster-3720]]
      45905 [Incoming-6,ozzy-22851] DEBUG org.jgroups.protocols.FD_SOCK  - VIEW_CHANGE received: [rooster-3720, ozzy-22851]
      45905  [FD_SOCK pinger,ozzy-22851] DEBUG org.jgroups.protocols.FD_SOCK  -  ping_dest is rooster-3720, pingable_mbrs=[rooster-3720, ozzy-22851]
      45905 [Incoming-6,ozzy-22851] INFO org.infinispan.remoting.transport.jgroups.JGroupsTransport   - Received new cluster view: MergeView::[rooster-3720|4] [rooster-3720,  ozzy-22851], subgroups=[[ozzy-22851|3] [ozzy-22851], [rooster-3720|2]  [rooster-3720]]
      45908 [Incoming-6,ozzy-22851] DEBUG org.infinispan.demo.InfinispanDemo  - Updating cluster table with new member list [rooster-3720, ozzy-22851]
      45909 [Incoming-6,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH   - ozzy-22851: installing view MergeView::[rooster-3720|4]  [rooster-3720, ozzy-22851], subgroups=[[ozzy-22851|3] [ozzy-22851],  [rooster-3720|2] [rooster-3720]]
      45909 [Rehasher-ozzy-22851] DEBUG org.infinispan.distribution.JoinTask  - Commencing rehash on node: ozzy-22851. Before start, dmi.joinComplete = true
      45909 [Incoming-6,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS  - ozzy-22851:
      45910 [Rehasher-ozzy-22851] ERROR org.infinispan.distribution.JoinTask  - Caught exception!
      java.lang.IllegalStateException: Join cannot be complete without rehash to finish (node ozzy-22851 )
          at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:82)
           at org.infinispan.distribution.RehashTask.call(RehashTask.java:52)
          at org.infinispan.distribution.RehashTask.call(RehashTask.java:1)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
           at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
           at java.lang.Thread.run(Thread.java:619)
      45912 [Rehasher-ozzy-22851] INFO org.infinispan.distribution.TransactionLoggerImpl  - Stopping transaction logging
      45912 [Incoming-4,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.STABLE  - resuming message garbage collection
      46227 [OOB-4,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH  - ozzy-22851: received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
      51910 [OOB-5,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH  - ozzy-22851: suspect is ozzy-22851, completed false, flushOkSet {}, flushMembers []
      51910 [OOB-5,ozzy-22851] DEBUG org.jgroups.blocks.RequestCorrelator  - suspect=ozzy-22851
      51960 [ViewHandler,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS  - suspected members=[ozzy-22851], suspected_mbrs=[ozzy-22851]
      51993 [Incoming-7,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH  - ozzy-22851: received START_FLUSH but I am not flush participant, not responding
      51993 [Incoming-7,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS  - ozzy-22851: view is [rooster-3720|5] [rooster-3720]
      51994 [Incoming-7,ozzy-22851] WARN org.jgroups.protocols.pbcast.GMS  - ozzy-22851: not member of view [rooster-3720|5] [rooster-3720]; discarding it
      52304 [Incoming-7,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH  - ozzy-22851: received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up
      57906 [OOB-6,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH  - ozzy-22851: suspect is rooster-3720, completed false, flushOkSet {}, flushMembers []
      57907 [OOB-6,ozzy-22851] DEBUG org.jgroups.blocks.RequestCorrelator  - suspect=rooster-3720
      57957 [ViewHandler,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS  - suspected members=[rooster-3720], suspected_mbrs=[rooster-3720]
      57957 [ViewHandler,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS  - members are [rooster-3720, ozzy-22851], coord=ozzy-22851: I'm the new coord !
      57957 [ViewHandler,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS  - ozzy-22851:
      58007 [ViewHandler,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS  - new=[], suspected=[rooster-3720], leaving=[], new view: [ozzy-22851|6] [ozzy-22851]
      58007 [ViewHandler,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH  - ozzy-22851: flush coordinator  is starting FLUSH with participants [ozzy-22851]
      58007 [Incoming-7,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH  - ozzy-22851: received START_FLUSH, responded with FLUSH_COMPLETED to ozzy-22851
      58008 [Incoming-5,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH  - ozzy-22851: FLUSH_COMPLETED from ozzy-22851, completed true, flushMembers [ozzy-22851], flushCompleted [ozzy-22851]
      58009 [Incoming-5,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.FLUSH  - ozzy-22851: all FLUSH_COMPLETED received
      58010 [Incoming-5,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.GMS  - ozzy-22851: view is [ozzy-22851|6] [ozzy-22851]
      58010 [Incoming-5,ozzy-22851] DEBUG org.jgroups.protocols.pbcast.NAKACK  - removed rooster-3720 from xmit_table (not member anymore)
      58010 [Incoming-5,ozzy-22851] DEBUG org.jgroups.protocols.FD_SOCK  - VIEW_CHANGE received: [ozzy-22851]
      58312 [Incoming-5,ozzy-22851] INFO org.infinispan.remoting.transport.jgroups.JGroupsTransport  - Received new cluster view: [ozzy-22851|6] [ozzy-22851]
      58312 [Incoming-5,ozzy-22851] INFO org.infinispan.distribution.DistributionManagerImpl  - Detected a view change.  Member list changed from [rooster-3720, ozzy-22851] to [ozzy-22851]
      58312 [Incoming-5,ozzy-22851] INFO org.infinispan.distribution.DistributionManagerImpl  - This is a LEAVE event!  Node rooster-3720 has just left
      58312 [Incoming-5,ozzy-22851] INFO org.infinispan.distribution.DistributionManagerImpl  - I ozzy-22851 am participating in rehash
      58312 [Incoming-5,ozzy-22851] INFO org.infinispan.distribution.TransactionLoggerImpl  - Starting transaction logging