9 Replies Latest reply on Nov 5, 2007 5:33 PM by Brian Stansberry

    purgeDeadMembers Failing - what is the cause?

    Cody Addison Newbie

      Hello everyone,

      I have yet another issue with my cluster. I have seen this issue in other forum entries, but I have not been able to find a definite answer as to its cause.

      I have a cluster of 6 nodes, with fail-over and session replication implemented within the cluster. My problem is that when specific nodes in the cluster are stopped, their session information is not accurate when fail-over occurs.

      I have investigated each node's logs and discovered that not all of my nodes have this problem.

      Here is the log output for a node which does have the problem, from the moment I called the shutdown.

      2007-11-04 21:15:04,784 DEBUG [org.jboss.ha.framework.server.HARMIServerImpl$RefreshProxiesHATarget] replicantsChanged 'HAJNDI' to 2 (intra-view id: 56409408)
      2007-11-04 21:15:04,860 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.jboss1] New cluster view for partition jboss1 (id: 27, delta: -1) : [192.168.202.x:1099, 192.168.202.x:1099]
      2007-11-04 21:15:04,860 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.jboss1] dead members: [192.168.202.x:1099]
      2007-11-04 21:15:04,860 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.jboss1] membership changed from 3 to 2
      2007-11-04 21:15:04,862 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.jboss1] Begin notifyListeners, viewID: 27
      2007-11-04 21:15:04,863 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] I am (192.168.202.x:1099) received membershipChanged event:
      2007-11-04 21:15:04,863 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] Dead members: 1 ([192.168.202.x:1099])
      2007-11-04 21:15:04,864 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] New Members : 0 ([])
      2007-11-04 21:15:04,864 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] All Members : 2 ([192.168.202.x:1099, 192.168.202.x:1099])
      2007-11-04 21:15:04,864 DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] purgeDeadMembers, [192.168.202.x:1099]
      2007-11-04 21:15:04,864 DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] trying to remove deadMember 192.168.202.x:1099 for key DCacheBridge-DefaultJGBridge
      2007-11-04 21:15:04,864 DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] 192.168.202.11:1099 was NOT removed!!!
      2007-11-04 21:15:04,864 DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] trying to remove deadMember 192.168.202.x:1099 for key jboss.ha:service=HASingletonDeployer
      2007-11-04 21:15:04,864 DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] 192.168.202.11:1099 was NOT removed!!!
      2007-11-04 21:15:04,864 DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] trying to remove deadMember 192.168.202.x:1099 for key HAJNDI
      2007-11-04 21:15:04,864 DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] 192.168.202.11:1099 was NOT removed!!!
      2007-11-04 21:15:04,864 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.jboss1] End notifyListeners, viewID: 27
      2007-11-04 21:15:05,625 INFO [org.jboss.cache.TreeCache] viewAccepted(): [192.168.202.x:33173|27] [192.168.202.x:33173, 192.168.202.x:33183]
      2007-11-04 21:15:07,200 DEBUG [org.jboss.web.tomcat.service.session.JBossCacheManager] Looking for sessions that have expired ...
      2007-11-04 21:15:14,362 INFO [org.jboss.cache.TreeCache] viewAccepted(): [192.168.202.x:33180|28] [192.168.202.x:33180]
      2007-11-04 21:15:14,470 DEBUG [org.jboss.ha.singleton.HASingletonController] partitionTopologyChanged, isElectedNewMaster=true, isMasterNode=true, viewID=-424447
      2007-11-04 21:15:14,500 DEBUG [org.jboss.cache.invalidation.bridges.JGCacheInvalidationBridge] Updating list of invalidation groups that are bridged...
      
      


      Next, I shut down another node, which does not have the problem.

      2007-11-04 21:15:14,500 DEBUG [org.jboss.cache.invalidation.bridges.JGCacheInvalidationBridge] ... nothing needs to be bridged.
      2007-11-04 21:15:14,501 DEBUG [org.jboss.cache.invalidation.bridges.JGCacheInvalidationBridge] The list of replicant for the JG bridge has changed, computing and updating local info...
      2007-11-04 21:15:14,501 DEBUG [org.jboss.cache.invalidation.bridges.JGCacheInvalidationBridge] ... No bridge info was associated to this node
      2007-11-04 21:15:14,731 INFO [org.jboss.cache.TreeCache] viewAccepted(): [192.168.202.x:33178|28] [192.168.202.x:33178]
      2007-11-04 21:15:14,744 DEBUG [org.jboss.ha.framework.server.HARMIServerImpl$RefreshProxiesHATarget] replicantsChanged 'HAJNDI' to 1 (intra-view id: -424447)
      2007-11-04 21:15:15,078 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.jboss1] New cluster view for partition jboss1 (id: 28, delta: -1) : [192.168.202.x:1099]
      2007-11-04 21:15:15,078 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.jboss1] dead members: [192.168.202.x:1099]
      2007-11-04 21:15:15,079 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.jboss1] membership changed from 2 to 1
      2007-11-04 21:15:15,081 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.jboss1] Begin notifyListeners, viewID: 28
      2007-11-04 21:15:15,081 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] I am (192.168.202.x:1099) received membershipChanged event:
      2007-11-04 21:15:15,081 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] Dead members: 1 ([192.168.202.x:1099])
      2007-11-04 21:15:15,081 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] New Members : 0 ([])
      2007-11-04 21:15:15,081 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] All Members : 1 ([192.168.202.x:1099])
      2007-11-04 21:15:15,081 DEBUG [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.jboss1] purgeDeadMembers, [192.168.202.x:1099]
      2007-11-04 21:15:15,081 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.jboss1] End notifyListeners, viewID: 28
      2007-11-04 21:15:15,571 INFO [org.jboss.cache.TreeCache] viewAccepted(): [192.168.202.x:33173|28] [192.168.202.x:33173]
      
      



      I do not understand why one node can be removed and the other cannot. The two configurations are the same, yet "DistributedReplicantManager" is able to remove one node and not the other. What causes this?
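
For anyone comparing logs across several nodes, here is a minimal Python sketch that pulls out the purgeDeadMembers failures per replicant key. The function name `failed_purges` is hypothetical, and the patterns assume the exact DEBUG message wording shown in the excerpts above. It only summarises the symptom; it does not explain the cause.

```python
import re
from collections import defaultdict

# Patterns copied from the DEBUG lines quoted in this post (assumed wording).
TRYING = re.compile(r"trying to remove deadMember (\S+) for key (\S+)")
NOT_REMOVED = re.compile(r"(\S+) was NOT removed!!!")

def failed_purges(lines):
    """Map each replicant key to the members logged as 'was NOT removed'."""
    failures = defaultdict(list)
    pending_key = None  # key named by the most recent "trying to remove" line
    for line in lines:
        m = TRYING.search(line)
        if m:
            pending_key = m.group(2)
            continue
        m = NOT_REMOVED.search(line)
        if m and pending_key is not None:
            failures[pending_key].append(m.group(1))
            pending_key = None
    return dict(failures)
```

Passing the lines of each node's log to `failed_purges` gives one dictionary per node, so a node with the problem shows keys such as HAJNDI, while a node without it gives an empty dictionary.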

      Could anyone please advise as to what my problem may be?

      Thank you in advance.