4 Replies Latest reply on Aug 2, 2007 9:00 AM by belaban

    Session Replication deadlock

    stringaling

      Hi,

      We are using Jboss 4.0.5 in a cluster. We are experiencing deadlock within the Jboss Cache, which is causing out of memory problems on the server. The Jboss Cache Version is 1.4.1SP3 (bundled Cache for jboss 4.0.5). We only expirience this issue when BUDDY replication is enabled.

      The thread dump reveils a lot of threads waiting on a lock for jgroups..

      "http-0.0.0.0-9006-11" daemon prio=1 tid=0x086eb710 nid=0x719 waiting for monitor entry [0x87d44000..0x87d45600]
      at org.jgroups.protocols.UNICAST.down(UNICAST.java:265)
      - waiting to lock <0x9246a478> (a org.jgroups.protocols.UNICAST$Entry)
      at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
      at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
      at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:283)
      at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
      at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
      at org.jgroups.protocols.pbcast.GMS.down(GMS.java:840)
      at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
      at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
      at org.jgroups.protocols.FC.sendCreditRequest(FC.java:567)
      at org.jgroups.protocols.FC.handleDownMessage(FC.java:390)
      at org.jgroups.protocols.FC.down(FC.java:320)
      at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
      at org.jgroups.protocols.FC.receiveDownEvent(FC.java:314)
      at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
      at org.jgroups.protocols.FRAG2.down(FRAG2.java:167)
      at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
      at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
      at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:294)
      at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
      at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:385)
      at org.jgroups.JChannel.down(JChannel.java:1231)
      at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:788)
      at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.passDown(MessageDispatcher.java:765)
      at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:295)
      at org.jgroups.blocks.GroupRequest.doExecute(GroupRequest.java:444)
      at org.jgroups.blocks.GroupRequest.execute(GroupRequest.java:193)
      at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:429)
      at org.jgroups.blocks.RpcDispatcher.callRemoteMethods(RpcDispatcher.java:192)
      at sun.reflect.GeneratedMethodAccessor177.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:585)
      at org.jboss.cache.TreeCache.callRemoteMethodsViaReflection(TreeCache.java:4435)
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4391)
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4344)
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4455)
      at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:110)
      at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:88)
      at org.jboss.cache.interceptors.ReplicationInterceptor.handleReplicatedMethod(ReplicationInterceptor.java:119)
      at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:88)
      at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
      at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:365)
      at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:160)
      at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
      at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:167)
      at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5863)
      at org.jboss.cache.TreeCache.put(TreeCache.java:3759)
      at sun.reflect.GeneratedMethodAccessor186.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:585)
      at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
      at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
      at org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
      at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
      at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
      at org.jboss.mx.util.MBeanProxyExt.invoke(MBeanProxyExt.java:210)
      at $Proxy57.put(Unknown Source)
      at org.jboss.web.tomcat.tc5.session.JBossCacheWrapper.put(JBossCacheWrapper.java:141)
      at org.jboss.web.tomcat.tc5.session.JBossCacheService.putSession(JBossCacheService.java:315)
      at org.jboss.web.tomcat.tc5.session.JBossCacheClusteredSession.processSessionRepl(JBossCacheClusteredSession.java:121)
      - locked <0x93ffa170> (a org.jboss.web.tomcat.tc5.session.SessionBasedClusteredSession)
      at org.jboss.web.tomcat.tc5.session.JBossCacheManager.processSessionRepl(JBossCacheManager.java:1093)
      at org.jboss.web.tomcat.tc5.session.JBossCacheManager.storeSession(JBossCacheManager.java:648)
      - locked <0x93ffa170> (a org.jboss.web.tomcat.tc5.session.SessionBasedClusteredSession)
      at org.jboss.web.tomcat.tc5.session.InstantSnapshotManager.snapshot(InstantSnapshotManager.java:49)
      at org.jboss.web.tomcat.tc5.session.ClusteredSessionValve.invoke(ClusteredSessionValve.java:98)
      at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
      at org.jboss.web.tomcat.tc5.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:156)
      at org.apache.catalina.valves.FastCommonAccessLogValve.invoke(FastCommonAccessLogValve.java:495)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
      at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
      at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
      at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)


      Here is the object that has the lock:

      http-0.0.0.0-9006-12" daemon prio=1 tid=0x8a563918 nid=0x71a waiting for monitor entry [0x87dc5000..0x87dc6480]
      at org.jgroups.stack.Retransmitter$Entry.(Retransmitter.java:272)
      at org.jgroups.stack.Retransmitter.add(Retransmitter.java:104)
      - locked <0x9246a670> (a java.util.LinkedList)
      at org.jgroups.stack.AckSenderWindow.add(AckSenderWindow.java:100)
      at org.jgroups.protocols.UNICAST.down(UNICAST.java:279)
      - locked <0x9246a478> (a org.jgroups.protocols.UNICAST$Entry)
      at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
      at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
      at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:283)
      at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
      at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
      at org.jgroups.protocols.pbcast.GMS.down(GMS.java:840)
      at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
      at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
      at org.jgroups.protocols.FC.sendCreditRequest(FC.java:567)
      at org.jgroups.protocols.FC.handleDownMessage(FC.java:390)
      at org.jgroups.protocols.FC.down(FC.java:320)
      at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
      at org.jgroups.protocols.FC.receiveDownEvent(FC.java:314)
      at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
      at org.jgroups.protocols.FRAG2.down(FRAG2.java:167)
      at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
      at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
      at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:294)
      at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
      at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:385)
      at org.jgroups.JChannel.down(JChannel.java:1231)
      at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:788)
      at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.passDown(MessageDispatcher.java:765)
      at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:295)
      at org.jgroups.blocks.GroupRequest.doExecute(GroupRequest.java:444)
      at org.jgroups.blocks.GroupRequest.execute(GroupRequest.java:193)
      at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:429)
      at org.jgroups.blocks.RpcDispatcher.callRemoteMethods(RpcDispatcher.java:192)
      at sun.reflect.GeneratedMethodAccessor177.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:585)
      at org.jboss.cache.TreeCache.callRemoteMethodsViaReflection(TreeCache.java:4435)
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4391)
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4344)
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4455)
      at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:110)
      at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:88)
      at org.jboss.cache.interceptors.ReplicationInterceptor.handleReplicatedMethod(ReplicationInterceptor.java:119)
      at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:88)
      at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
      at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:365)
      at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:160)
      at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
      at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:167)
      at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5863)
      at org.jboss.cache.TreeCache.put(TreeCache.java:3759)
      ...


      There are some JIRA's for Jboss Cache that appear to have a similar stack trace (533, 1103), but they are Caceloader related ?

      Any help ?!?!

      Thanks,
      David

        • 1. Re: Session Replication deadlock
          belaban

          That's clearly a bug in JGroups. I think I fixed it in 2.4.x. Which version of JGroups are you running ?

          java -cp jgroups.jar org.jgroups.Version

          will tell you.

          If you use 2.4.1.SP4, this will disappear (probably 2.3 already fixes this issue too).

          • 2. Re: Session Replication deadlock
            stringaling

            Thank-you for the prompt reply.

            I will try upgrading jgroups.

            Would it be advisable to upgrade jboss-cache to SP4 as well?


            [devadm@dev01 lib]$ java -cp jgroups.jar org.jgroups.Version

            Version: 2.4.1
            CVS: $Id: Version.java,v 1.42.2.1 2006/12/04 13:57:06 belaban Exp $
            History: (see doc/history.txt for details)

            [devadm@dev01 lib]$ java -cp jboss-cache.jar org.jboss.cache.Version

            Version: 1.4.1.SP3
            Codename: Cayenne
            CVS: $Id: Version.java,v 1.18.2.11 2007/03/06 20:12:53 msurtani Exp $
            History: (see http://jira.jboss.com/jira/browse/JBCACHE for details)

            Cheers,
            David Stringer

            • 3. Re: Session Replication deadlock
              stringaling

              Moving to Jboss-Cache 1.4.1 SP4 and Jgroups 2.4.1SP3 seems to resolve this issue.

              There is an interesting memory usage pattern occuring though: After the server starts up we see a gradual increase in memory usage on one node in the cluster. It approaches the Maximum allocated to the instance, then drops off and behaves.

              • 4. Re: Session Replication deadlock
                belaban

                The deadlock you ran into was fixed on 9/11/06 in 2.4.1 SP1, so, yes, SP4 will of course contain the fix as well.