0 Replies Latest reply on Mar 15, 2006 2:39 PM by jcprout

    Cluster error: failed to replicate session

    jcprout

      We had a melt-down on our production JBoss cluster this morning. The errors being logged (1000s of times) were this:



      2006-03-15 07:31:41,612 ERROR [org.jboss.cache.lock.IdentityLock] write lock for /JSESSION/localhost/sunOLM2/hxCHnkvAh4ZboOrlmGqzSg** could not be acquired after 15000 ms. Locks: Read lock owners: []
      Write lock owner: <g1apiapp4:54077>:49603
      (caller=<g1apiapp5:58563>:119341, lock info: write owner=<g1apiapp4:54077>:49603 (activeReaders=0, activeWriter=Thread[MessageDispatcher up processing thread,5,jboss], waitingReaders=0, waitingWriters=0, waitingUpgrader=0))
      2006-03-15 07:31:41,614 INFO [STDOUT] org.jboss.cache.lock.TimeoutException: write lock for /JSESSION/localhost/sunOLM2/hxCHnkvAh4ZboOrlmGqzSg** could not be acquired after 15000 ms. Locks: Read lock owners: []
      Write lock owner: <g1apiapp4:54077>:49603
      (caller=<g1apiapp5:58563>:119341, lock info: write owner=<g1apiapp4:54077>:49603 (activeReaders=0, activeWriter=Thread[MessageDispatcher up processing thread,5,jboss], waitingReaders=0, waitingWriters=0, waitingUpgrader=0))
      2006-03-15 07:31:41,616 INFO [STDOUT] at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:174)
      2006-03-15 07:31:41,617 INFO [STDOUT] at org.jboss.cache.Node.acquireWriteLock(Node.java:518)
      2006-03-15 07:31:41,618 INFO [STDOUT] at org.jboss.cache.Node.acquire(Node.java:475)
      2006-03-15 07:31:41,618 INFO [STDOUT] at org.jboss.cache.interceptors.LockInterceptor.lock(LockInterceptor.java:242)
      2006-03-15 07:31:41,619 INFO [STDOUT] at org.jboss.cache.interceptors.LockInterceptor.invoke(LockInterceptor.java:153)
      2006-03-15 07:31:41,619 INFO [STDOUT] at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:41)
      2006-03-15 07:31:41,620 INFO [STDOUT] at org.jboss.cache.interceptors.UnlockInterceptor.invoke(UnlockInterceptor.java:35)
      2006-03-15 07:31:41,620 INFO [STDOUT] at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:41)
      2006-03-15 07:31:41,621 INFO [STDOUT] at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:54)
      2006-03-15 07:31:41,621 INFO [STDOUT] at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:3116)
      2006-03-15 07:31:41,622 INFO [STDOUT] at org.jboss.cache.TreeCache.put(TreeCache.java:1762)
      2006-03-15 07:31:41,623 INFO [STDOUT] at sun.reflect.GeneratedMethodAccessor298.invoke(Unknown Source)
      2006-03-15 07:31:41,623 INFO [STDOUT] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      2006-03-15 07:31:41,624 INFO [STDOUT] at java.lang.reflect.Method.invoke(Method.java:585)
      2006-03-15 07:31:41,624 INFO [STDOUT] at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:141)
      2006-03-15 07:31:41,625 INFO [STDOUT] at org.jboss.mx.server.Invocation.dispatch(Invocation.java:80)
      2006-03-15 07:31:41,625 INFO [STDOUT] at org.jboss.mx.server.Invocation.invoke(Invocation.java:72)
      2006-03-15 07:31:41,625 INFO [STDOUT] at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:245)
      2006-03-15 07:31:41,626 INFO [STDOUT] at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:644)
      2006-03-15 07:31:41,626 INFO [STDOUT] at org.jboss.mx.util.MBeanProxyExt.invoke(MBeanProxyExt.java:176)
      2006-03-15 07:31:41,627 INFO [STDOUT] at $Proxy85.put(Unknown Source)
      2006-03-15 07:31:41,628 INFO [STDOUT] at org.jboss.web.tomcat.tc5.session.JBossCacheService._put(JBossCacheService.java:433)
      2006-03-15 07:31:41,628 INFO [STDOUT] at org.jboss.web.tomcat.tc5.session.JBossCacheService.putSession(JBossCacheService.java:229)
      2006-03-15 07:31:41,629 INFO [STDOUT] at org.jboss.web.tomcat.tc5.session.SessionBasedClusteredSession.processSessionRepl(SessionBasedClusteredSession.java:165)
      2006-03-15 07:31:41,629 INFO [STDOUT] at org.jboss.web.tomcat.tc5.session.JBossCacheManager.processSessionRepl(JBossCacheManager.java:606)
      2006-03-15 07:31:41,630 INFO [STDOUT] at org.jboss.web.tomcat.tc5.session.JBossCacheManager.storeSession(JBossCacheManager.java:375)
      2006-03-15 07:31:41,630 INFO [STDOUT] at org.jboss.web.tomcat.tc5.session.InstantSnapshotManager.snapshot(InstantSnapshotManager.java:38)
      2006-03-15 07:31:41,631 INFO [STDOUT] at org.jboss.web.tomcat.tc5.session.ClusteredSessionValve.invoke(ClusteredSessionValve.java:91)
      etc...


      and this:




      2006-03-15 07:31:41,638 ERROR [org.jboss.web.tomcat.tc5.session.JBossCacheManager] processSessionRepl: failed with exception: java.lang.RuntimeException: JBossCacheService: exception occurred in cache put after retry ...
      2006-03-15 07:31:41,639 WARN [org.jboss.web.tomcat.tc5.session.InstantSnapshotManager] Failed to replicate sessionID:hxCHnkvAh4ZboOrlmGqzSg**
      org.jboss.util.NestedRuntimeException: JBossCacheManager.processSessionRepl(): failed to replicate session.; -
      nested throwable: (java.lang.RuntimeException: JBossCacheService: exception occurred in cache put after retry ... )
      at org.jboss.web.tomcat.tc5.session.JBossCacheManager.processSessionRepl(JBossCacheManager.java:622)
      at org.jboss.web.tomcat.tc5.session.JBossCacheManager.storeSession(JBossCacheManager.java:375)
      at org.jboss.web.tomcat.tc5.session.InstantSnapshotManager.snapshot(InstantSnapshotManager.java:38)
      at org.jboss.web.tomcat.tc5.session.ClusteredSessionValve.invoke(ClusteredSessionValve.java:91)
      etc.


      Does anyone know what causes these exceptions and if there's anything I can do to stop them from happening again?

      We're running JBoss 4.0.3 with Java 5.0 in a two node cluster (soon to increase). At present, the cluster is configured to NOT use sticky-session, although we'll be converting to sticky-session this weekend.

      Is it possible that the non-sticky-session configuration was a cause of this problem?

      Both cluster machines were under very heavy load (constant 90%+ utilization) when this problem happened.

      The errors cleared when I restarted the problem JBoss server

      Thanks in advance

      John