0 Replies Latest reply on Nov 21, 2013 4:39 AM by R R

    "Suspected member" log repeats indefinitely and no new coordinator is elected when the master (coordinator) goes down

    R R Newbie

      Hello,

       

      We have a 4-node JBoss 5.1.0 cluster, and we have observed some strange behavior during failover:

       

      Sometimes, when all the nodes are under heavy load and the master node is shut down, none of the remaining member nodes becomes coordinator, and the following messages are logged indefinitely on the other nodes:

       

      DEBUG [org.jgroups.protocols.VERIFY_SUSPECT] diff=1500, mbr 10.XXX.XX.XXX:7600 is dead (passing up SUSPECT event)

      DEBUG [org.jgroups.protocols.VERIFY_SUSPECT] diff=1500, mbr 10.XXX.XX.XXX:7600 is dead (passing up SUSPECT event)

      DEBUG [org.jgroups.protocols.VERIFY_SUSPECT] diff=1500, mbr 10.XXX.XX.XXX:7600 is dead (passing up SUSPECT event)

      ..

      ..

      ..

       

      and

       

      INFO  [org.jboss.ha.framework.interfaces.HAPartition.sfhsw-fdfksdjbvsdvsdv9fsdfj-311ee0d88e9f] Suspected member: 10.XXX.XX.XXX:7600

      INFO  [org.jboss.ha.framework.interfaces.HAPartition.sfhsw-fdfksdjbvsdvsdv9fsdfj-311ee0d88e9f] Suspected member: 10.XXX.XX.XXX:7600

      INFO  [org.jboss.ha.framework.interfaces.HAPartition.sfhsw-fdfksdjbvsdvsdv9fsdfj-311ee0d88e9f] Suspected member: 10.XXX.XX.XXX:7600

      INFO  [org.jboss.ha.framework.interfaces.HAPartition.sfhsw-fdfksdjbvsdvsdv9fsdfj-311ee0d88e9f] Suspected member: 10.XXX.XX.XXX:7600

      INFO  [org.jboss.ha.framework.interfaces.HAPartition.sfhsw-fdfksdjbvsdvsdv9fsdfj-311ee0d88e9f] Suspected member: 10.XXX.XX.XXX:7600

      INFO  [org.jboss.ha.framework.interfaces.HAPartition.sfhsw-fdfksdjbvsdvsdv9fsdfj-311ee0d88e9f] Suspected member: 10.XXX.XX.XXX:7600

      ...

      ...

      ..
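
      In case it helps to quantify the pattern, here is a minimal sketch of how the two messages above could be tallied per member from a server log. It is not part of our deployment: the class name SuspectLogSummary and the regular expressions are my own assumptions, based only on the two message shapes quoted in this post, and the log file path is whatever you pass on the command line.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts how often each member address is reported as suspected.
final class SuspectLogSummary {
    // Assumed patterns, derived only from the two log lines quoted in this post.
    // Group 1 captures the member address (for example host:port).
    private static final Pattern VERIFY_SUSPECT =
            Pattern.compile("VERIFY_SUSPECT\\].*\\bmbr (\\S+) is dead");
    private static final Pattern HA_PARTITION =
            Pattern.compile("Suspected member: (\\S+)");

    // Returns member address -> number of matching log lines, in first-seen order.
    static Map<String, Integer> countSuspects(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = VERIFY_SUSPECT.matcher(line);
            if (!m.find()) {
                m = HA_PARTITION.matcher(line);
                if (!m.find()) {
                    continue; // not one of the two messages of interest
                }
            }
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to a server log file (an assumption of this sketch).
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countSuspects(lines).forEach((member, n) ->
                System.out.println(member + " reported as suspected " + n + " time(s)"));
    }
}
```

      Running this against the log of each surviving node should show whether all the repeated messages refer to the node that was shut down, or whether other members are being suspected as well.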

       

      Any idea why this might be happening?

      Any help is highly appreciated.

       

      Thanks