4 Replies Latest reply on Aug 20, 2009 12:40 PM by lowecg2004

    Cluster suddenly dies with NPE in ModCluster component

    lowecg2004

      Hi Jean-Frederic,

      All nodes in my cluster suddenly went down recently with the following log output:

      Node 2:

      2009-08-18 13:48:41,455 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-20,123.234.123.101:7600) Received new cluster view: [123.234.123.101:7600|14] [123.234.123.101:7600]
      2009-08-18 13:48:41,519 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-20,123.234.123.101:7600) Received new cluster view: [123.234.123.101:7600|14] [123.234.123.101:7600]
      2009-08-18 13:48:47,127 INFO [org.jboss.ha.framework.interfaces.HAPartition.SaPartition] (VERIFY_SUSPECT.TimerThread,SaPartition,123.234.123.101:7600) Suspected member: 123.234.123.100:7600
      2009-08-18 13:48:47,331 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.SaPartition] (Incoming-4,123.234.123.101:7600) New cluster view for partition SaPartition (id: 14, delta: -1) : [123.234.123.101:1099]
      2009-08-18 13:48:47,383 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.SaPartition] (AsynchViewChangeHandler Thread) I am (123.234.123.101:1099) received membershipChanged event:
      2009-08-18 13:48:47,383 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.SaPartition] (AsynchViewChangeHandler Thread) Dead members: 1 ([123.234.123.100:1099])
      2009-08-18 13:48:47,383 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.SaPartition] (AsynchViewChangeHandler Thread) New Members : 0 ([])
      2009-08-18 13:48:47,383 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.SaPartition] (AsynchViewChangeHandler Thread) All Members : 1 ([123.234.123.101:1099])
      2009-08-18 13:48:59,928 WARN [org.jgroups.protocols.pbcast.GMS] (Incoming-9,123.234.123.101:7600) merge was supposed to be cancelled at merge participant 123.234.123.101:7600 (merge_id=[123.234.123.100:7600|1250603331953]), but it is not since merge ids do not match
      2009-08-18 13:53:31,011 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-6,123.234.123.101:7600) sender 123.234.123.100:7600 not found in xmit_table
      2009-08-18 13:53:31,011 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-6,123.234.123.101:7600) range is null
      2009-08-18 13:53:31,051 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-6,123.234.123.101:7600) Received new cluster view: MergeView::[123.234.123.100:7600|15] [123.234.123.100:7600, 123.234.123.101:7600], subgroups=[[123.234.123.100:7600|14] [123.234.123.100:7600], [123.234.123.101:7600|14] [123.234.123.101:7600]]
      2009-08-18 13:53:37,168 WARN [org.jgroups.protocols.pbcast.NAKACK] (Incoming-11,123.234.123.101:7600) 123.234.123.101:7600] discarded message from non-member 123.234.123.100:7600, my view is [123.234.123.101:7600|14] [123.234.123.101:7600]
      2009-08-18 13:53:37,264 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-13,123.234.123.101:7600) sender 123.234.123.100:7600 not found in xmit_table
      2009-08-18 13:53:37,264 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-13,123.234.123.101:7600) range is null
      2009-08-18 13:53:37,292 INFO [org.jboss.ha.framework.interfaces.HAPartition.SaPartition] (Incoming-13,123.234.123.101:7600) New cluster view for partition SaPartition: 15 ([123.234.123.100:1099, 123.234.123.101:1099] delta: 1)
      2009-08-18 13:53:37,296 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.SaPartition] (AsynchViewChangeHandler Thread) Merging partitions...
      2009-08-18 13:53:37,296 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.SaPartition] (AsynchViewChangeHandler Thread) Dead members: 0
      2009-08-18 13:53:37,296 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.SaPartition] (AsynchViewChangeHandler Thread) Originating groups: [[123.234.123.101:7600|14] [123.234.123.101:7600], [123.234.123.100:7600|14] [123.234.123.100:7600]]
      2009-08-18 13:53:37,792 ERROR [org.apache.catalina.core.ContainerBase] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) Exception invoking periodic operation:
      java.lang.NullPointerException
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.updateClusterStatus(HAModClusterService.java:896)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:836)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:780)
       at org.jboss.modcluster.CatalinaEventHandlerAdapter.lifecycleEvent(CatalinaEventHandlerAdapter.java:163)
       at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
       at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1348)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1612)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1601)
       at java.lang.Thread.run(Thread.java:619)
      2009-08-18 13:53:42,012 WARN [org.jgroups.protocols.pbcast.NAKACK] (OOB-11635,123.234.123.101:7600) 123.234.123.101:7600] discarded message from non-member 123.234.123.100:7600, my view is [123.234.123.101:7600|14] [123.234.123.101:7600]
      2009-08-18 13:53:42,180 WARN [org.jgroups.protocols.pbcast.NAKACK] (OOB-11635,123.234.123.101:7600) 123.234.123.101:7600] discarded message from non-member 123.234.123.100:7600, my view is [123.234.123.101:7600|14] [123.234.123.101:7600]
      2009-08-18 13:53:47,824 ERROR [org.apache.catalina.core.ContainerBase] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) Exception invoking periodic operation:
      java.lang.NullPointerException
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.updateClusterStatus(HAModClusterService.java:896)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:836)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:780)
       at org.jboss.modcluster.CatalinaEventHandlerAdapter.lifecycleEvent(CatalinaEventHandlerAdapter.java:163)
       at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
       at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1348)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1612)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1601)
       at java.lang.Thread.run(Thread.java:619)
      2009-08-18 13:53:50,889 WARN [org.jgroups.protocols.pbcast.NAKACK] (Incoming-3,123.234.123.101:7600) 123.234.123.101:7600] discarded message from non-member 123.234.123.100:7600, my view is [123.234.123.101:7600|14] [123.234.123.101:7600]
      2009-08-18 13:53:50,985 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-6,123.234.123.101:7600) sender 123.234.123.100:7600 not found in xmit_table
      2009-08-18 13:53:50,985 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-6,123.234.123.101:7600) range is null
      2009-08-18 13:53:51,013 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-6,123.234.123.101:7600) Received new cluster view: MergeView::[123.234.123.100:7600|15] [123.234.123.100:7600, 123.234.123.101:7600], subgroups=[[123.234.123.101:7600|14] [123.234.123.101:7600], [123.234.123.100:7600|14] [123.234.123.100:7600]]
      2009-08-18 13:53:57,861 ERROR [org.apache.catalina.core.ContainerBase] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) Exception invoking periodic operation:
      java.lang.NullPointerException
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.updateClusterStatus(HAModClusterService.java:896)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:836)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:780)
       at org.jboss.modcluster.CatalinaEventHandlerAdapter.lifecycleEvent(CatalinaEventHandlerAdapter.java:163)
       at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
       at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1348)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1612)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1601)
       at java.lang.Thread.run(Thread.java:619)
      2009-08-18 13:54:07,894 ERROR [org.apache.catalina.core.ContainerBase] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) Exception invoking periodic operation:
      java.lang.NullPointerException
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.updateClusterStatus(HAModClusterService.java:896)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:836)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:780)
       at org.jboss.modcluster.CatalinaEventHandlerAdapter.lifecycleEvent(CatalinaEventHandlerAdapter.java:163)
       at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
       at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1348)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1612)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1601)
       at java.lang.Thread.run(Thread.java:619)
      2009-08-18 13:54:17,926 ERROR [org.apache.catalina.core.ContainerBase] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) Exception invoking periodic operation:
      java.lang.NullPointerException
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.updateClusterStatus(HAModClusterService.java:896)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:836)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:780)
       at org.jboss.modcluster.CatalinaEventHandlerAdapter.lifecycleEvent(CatalinaEventHandlerAdapter.java:163)
       at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
       at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1348)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1612)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1601)
       at java.lang.Thread.run(Thread.java:619)
      



      On the other node, there is similar output around the same time:

      Node 1:

      2009-08-18 13:53:30,981 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-5,123.234.123.100:7600) sender 123.234.123.101:7600 not found in xmit_table
      2009-08-18 13:53:30,981 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-5,123.234.123.100:7600) range is null
      2009-08-18 13:53:30,981 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-5,123.234.123.100:7600) Received new cluster view: MergeView::[123.234.123.100:7600|15] [123.234.123.100:7600, 123.234.123.101:7600], subgroups=[[123.234.123.100:7600|14] [123.234.123.100:7600], [123.234.123.101:7600|14] [123.234.123.101:7600]]
      2009-08-18 13:53:37,153 WARN [org.jgroups.protocols.pbcast.NAKACK] (Incoming-7,123.234.123.100:7600) 123.234.123.100:7600] discarded message from non-member 123.234.123.101:7600, my view is [123.234.123.100:7600|14] [123.234.123.100:7600]
      2009-08-18 13:53:37,265 WARN [org.jgroups.protocols.pbcast.NAKACK] (Incoming-11,123.234.123.100:7600) 123.234.123.100:7600] discarded message from non-member 123.234.123.101:7600, my view is [123.234.123.100:7600|14] [123.234.123.100:7600]
      2009-08-18 13:53:37,265 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-12,123.234.123.100:7600) sender 123.234.123.101:7600 not found in xmit_table
      2009-08-18 13:53:37,265 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-12,123.234.123.100:7600) range is null
      2009-08-18 13:53:37,297 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.SaPartition] (Incoming-12,123.234.123.100:7600) New cluster view for partition SaPartition (id: 15, delta: 1) : [123.234.123.100:1099, 123.234.123.101:1099]
      2009-08-18 13:53:37,297 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.SaPartition] (AsynchViewChangeHandler Thread) Merging partitions...
      2009-08-18 13:53:37,297 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.SaPartition] (AsynchViewChangeHandler Thread) Dead members: 0
      2009-08-18 13:53:37,297 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.SaPartition] (AsynchViewChangeHandler Thread) Originating groups: [[123.234.123.101:7600|14] [123.234.123.101:7600], [123.234.123.100:7600|14] [123.234.123.100:7600]]
      2009-08-18 13:53:46,801 ERROR [org.apache.catalina.core.ContainerBase] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) Exception invoking periodic operation:
      java.lang.NullPointerException
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.updateClusterStatus(HAModClusterService.java:896)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:836)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:780)
       at org.jboss.modcluster.CatalinaEventHandlerAdapter.lifecycleEvent(CatalinaEventHandlerAdapter.java:163)
       at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
       at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1348)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1612)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1601)
       at java.lang.Thread.run(Thread.java:619)
      2009-08-18 13:53:50,886 WARN [org.jgroups.protocols.pbcast.NAKACK] (Incoming-20,123.234.123.100:7600) 123.234.123.100:7600] discarded message from non-member 123.234.123.101:7600, my view is [123.234.123.100:7600|14] [123.234.123.100:7600]
      2009-08-18 13:53:50,986 WARN [org.jgroups.protocols.pbcast.NAKACK] (Incoming-4,123.234.123.100:7600) 123.234.123.100:7600] discarded message from non-member 123.234.123.101:7600, my view is [123.234.123.100:7600|14] [123.234.123.100:7600]
      2009-08-18 13:53:50,986 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-6,123.234.123.100:7600) sender 123.234.123.101:7600 not found in xmit_table
      2009-08-18 13:53:50,986 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-6,123.234.123.100:7600) range is null
      2009-08-18 13:53:51,022 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-6,123.234.123.100:7600) Received new cluster view: MergeView::[123.234.123.100:7600|15] [123.234.123.100:7600, 123.234.123.101:7600], subgroups=[[123.234.123.101:7600|14] [123.234.123.101:7600], [123.234.123.100:7600|14] [123.234.123.100:7600]]
      2009-08-18 13:53:56,834 ERROR [org.apache.catalina.core.ContainerBase] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) Exception invoking periodic operation:
      java.lang.NullPointerException
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.updateClusterStatus(HAModClusterService.java:896)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:836)
       at org.jboss.modcluster.ha.HAModClusterService$ClusteredCatalinaEventHandler.status(HAModClusterService.java:780)
       at org.jboss.modcluster.CatalinaEventHandlerAdapter.lifecycleEvent(CatalinaEventHandlerAdapter.java:163)
       at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
       at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1348)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1612)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1601)
       at java.lang.Thread.run(Thread.java:619)
      




      Where in both nodes, the NullPointerException continued to repeated every 10 seconds.

      Have you seen this before?

      Cheers,

      Chris