7 Replies Latest reply on May 16, 2007 10:06 AM by manik

    TimeoutException occuring on one node of jboss cache

    raghavendray

      hi,
      we are facing a problem with jboss cache which is running on 4 nodes with Websphere Application server 5.0.2. One of the nodes gives the following error.

      [2007-04-12 15:22:38,369] [WARN ][org.jboss.cache.interceptors.ReplicationInterceptor#390] - runPreparePhase() failed. Transaction is marked
      as rolled back
      org.jboss.cache.lock.TimeoutException: rsp=sender=toledo:39243, retval=null, received=false, suspected=false
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:2186)
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:2208)
      at org.jboss.cache.interceptors.ReplicationInterceptor.runPreparePhase(ReplicationInterceptor.java:479)
      at org.jboss.cache.interceptors.ReplicationInterceptor$SynchronizationHandler.beforeCompletion(ReplicationInterceptor.java:383)
      at org.jboss.cache.interceptors.OrderedSynchronizationHandler.beforeCompletion(OrderedSynchronizationHandler.java:71)
      at com.ibm.ejs.jts.jta.TransactionImpl$SyncronizationWrapper.before_completion(TransactionImpl.java:1285)
      at com.ibm.ejs.jts.jts.CoordinatorImpl$Sync.beforePrepare(CoordinatorImpl.java:1250)
      at com.ibm.ejs.jts.tran.EventCallback.executeCallback(EventCallback.java:166)
      at com.ibm.ejs.jts.tran.EventCallback.executeCallbackTree(EventCallback.java:132)
      at com.ibm.ejs.jts.tran.EventPrepare.ExecuteBeforePrepareCallbacks(EventPrepare.java:307)
      at com.ibm.ejs.jts.tran.EventPrepare.event_LocalPrepareWork(EventPrepare.java:287)
      at com.ibm.ejs.jts.tran.EventPrepare.event_BecomeCoordinator(EventPrepare.java:970)
      at com.ibm.ejs.jts.tran.EventControl.event_EndTopLevel(EventControl.java:295)
      at com.ibm.ejs.jts.tran.TrecInterface.end(TrecInterface.java:94)
      at com.ibm.ejs.jts.jts.TerminatorImpl.commit(TerminatorImpl.java:105)
      at com.ibm.ejs.jts.jts.CurrentImpl.commit(CurrentImpl.java:250)
      at com.ibm.ejs.jts.jts.CurrentSet.commit(CurrentSet.java:316)
      at com.ibm.ejs.csi.TranStrategy.commit(TranStrategy.java:628)
      at com.ibm.ejs.csi.TranStrategy.postInvoke(TranStrategy.java:153)
      at com.ibm.ejs.csi.TransactionControlImpl.postInvoke(TransactionControlImpl.java:540)
      at com.ibm.ejs.container.EJSContainer.postInvoke(EJSContainer.java:3040)
      at com.ibm.ejs.container.EJSContainer.postInvoke(EJSContainer.java:2914)
      at com.orange.fraud.fpc.service.maintainprofile.EJSRemoteStatelessMaintainProfileBean_2efad162.startProfileCache(Unknown Source)
      at com.orange.fraud.fpc.service.maintainprofile._MaintainProfile_Stub.startProfileCache(Unknown Source)
      at com.orange.fraud.fpc.startup.FpcStartupServlet.init(FpcStartupServlet.java:47)
      at com.ibm.ws.webcontainer.servlet.StrictServletInstance.doInit(StrictServletInstance.java:82)
      at com.ibm.ws.webcontainer.servlet.StrictLifecycleServlet._init(StrictLifecycleServlet.java:147)
      at com.ibm.ws.webcontainer.servlet.PreInitializedServletState.init(StrictLifecycleServlet.java:270)
      at com.ibm.ws.webcontainer.servlet.StrictLifecycleServlet.init(StrictLifecycleServlet.java:113)
      at com.ibm.ws.webcontainer.servlet.ServletInstance.init(ServletInstance.java:189)
      at javax.servlet.GenericServlet.init(GenericServlet.java:258)
      at com.ibm.ws.webcontainer.webapp.WebAppServletManager.addServlet(WebAppServletManager.java:876)
      at com.ibm.ws.webcontainer.webapp.WebAppServletManager.loadServlet(WebAppServletManager.java:224)
      at com.ibm.ws.webcontainer.webapp.WebAppServletManager.loadAutoLoadServlets(WebAppServletManager.java:542)
      at com.ibm.ws.webcontainer.webapp.WebApp.loadServletManager(WebApp.java:1370)
      at com.ibm.ws.webcontainer.webapp.WebApp.init(WebApp.java:286)
      at com.ibm.ws.webcontainer.srt.WebGroup.loadWebApp(WebGroup.java:418)
      at com.ibm.ws.webcontainer.srt.WebGroup.init(WebGroup.java:220)
      at com.ibm.ws.webcontainer.WebContainer.addWebApplication(WebContainer.java:1051)
      at com.ibm.ws.runtime.component.WebContainerImpl.install(WebContainerImpl.java:136)
      at com.ibm.ws.runtime.component.WebContainerImpl.start(WebContainerImpl.java:356)
      at com.ibm.ws.runtime.component.ApplicationMgrImpl.start(ApplicationMgrImpl.java:419)
      at com.ibm.ws.runtime.component.DeployedApplicationImpl.fireDeployedObjectStart(DeployedApplicationImpl.java:808)
      at com.ibm.ws.runtime.component.DeployedModuleImpl.start(DeployedModuleImpl.java:351)
      at com.ibm.ws.runtime.component.DeployedApplicationImpl.start(DeployedApplicationImpl.java:578)
      at com.ibm.ws.runtime.component.ApplicationMgrImpl.startApplication(ApplicationMgrImpl.java:272)
      at com.ibm.ws.runtime.component.ApplicationMgrImpl.start(ApplicationMgrImpl.java:250)
      at com.ibm.ws.runtime.component.ContainerImpl.startComponents(ContainerImpl.java:543)
      at com.ibm.ws.runtime.component.ContainerImpl.start(ContainerImpl.java:418)
      at com.ibm.ws.runtime.component.ApplicationServerImpl.start(ApplicationServerImpl.java:134)
      at com.ibm.ws.runtime.component.ContainerImpl.startComponents(ContainerImpl.java:543)
      at com.ibm.ws.runtime.component.ContainerImpl.start(ContainerImpl.java:418)
      at com.ibm.ws.runtime.component.ServerImpl.start(ServerImpl.java:183)
      at com.ibm.ws.runtime.WsServer.start(WsServer.java:128)
      at com.ibm.ws.runtime.WsServer.main(WsServer.java:225)
      at java.lang.reflect.Method.invoke(Native Method)
      at com.ibm.ws.bootstrap.WSLauncher.run(WSLauncher.java:128)
      at java.lang.Thread.run(Thread.java:479)


      This happens only on one node when ever it is brought up or when the cache alone is restarted. We are using the cache in REPL_SYNC mode.

      Has someone faced similar problems with Jboss cache? any help is greatly appreciated.

      Thanks in advance
      Raghu

        • 1. Re: TimeoutException occuring on one node of jboss cache
          brian.stansberry

          This log indicates the node did not get a response from another node, toledo:39243. You need to figure out why not.

          In your configuration, make sure your "SyncReplTimeout" is a few seconds longer than "LockAcquisitionTimeout". This may not solve your problem, but if toledo:39243 isn't replying in time because it's waiting to acquire a lock, you'll at minimum get a more useful error message on the calling node.

          • 2. Re: TimeoutException occuring on one node of jboss cache
            raghavendray

            Hi Brian,

            Thanks for your response.

            We tried increasing the SyncReplTimeout to a value greater than LockAcquisitionTimeout as follows:

            <attribute name="SyncReplTimeout">30000</attribute>
             <attribute name="LockAcquisitionTimeout">20000</attribute>


            But still we are getting the same error. Not sure what is going on.
            Any idea what could be wrong?

            Thanks
            Raghu

            • 3. Re: TimeoutException occuring on one node of jboss cache
              brian.stansberry

              Is it the same stack trace?

              • 4. Re: TimeoutException occuring on one node of jboss cache
                raghavendray

                yes. it is the same.

                when Websphere was started with Debug enabled, we got the following message:

                [2007-04-16 14:32:12,027] [INFO ][org.jboss.cache.TreeCache#1253] - interceptor chain is:
                class org.jboss.cache.interceptors.CallInterceptor
                class org.jboss.cache.interceptors.LockInterceptor
                class org.jboss.cache.interceptors.UnlockInterceptor
                class org.jboss.cache.interceptors.ReplicationInterceptor
                [2007-04-16 14:32:12,032] [INFO ][org.jboss.cache.TreeCache#1094] - cache mode is REPL_SYNC
                [2007-04-16 14:32:23,108] [INFO ][org.jboss.cache.TreeCache#2977] - viewAccepted(): new members: [node1:48411, node2:46315][2007-04-16 14:33:24,380] [INFO ][com.orange.fraud.fpc.controller.ProfileCacheSingleton#177] - Loading profile rules into cach
                e, number of records identified are: 2109
                [2007-04-16 14:40:06,115] [WARN ][org.jboss.cache.TreeCache#3064] - status is 1 (not ACTIVE or PREPARING); returning null)
                [2007-04-16 14:40:20,306] [WARN ][org.jboss.cache.TreeCache#3064] - status is 1 (not ACTIVE or PREPARING); returning null)
                [2007-04-16 14:40:20,351] [WARN ][org.jboss.cache.TreeCache#3064] - status is 1 (not ACTIVE or PREPARING); returning null)


                The last line is being printed continuously and we had to stop the process inorder to not flood the log file. As you can see in the log (in bold) it is showing 2 nodes as new members. Node 2 is being restarted but node 1 has been running. There are 2 other nodes which are not mentioned here.
                I suspect there is something wrong in node 1 but not sure what exactly.

                Another thing to note is this is happening only on production environment. We are not able to reproduce it on any other environment.

                Thanks
                Raghu



                • 5. Re: TimeoutException occuring on one node of jboss cache
                  manik

                  What version of JBoss Cache are you using? Also see JBCACHE-982

                  • 6. Re: TimeoutException occuring on one node of jboss cache
                    raghavendray

                    we are using 1.2 version, running on Websphere 5.0.2

                    • 7. Re: TimeoutException occuring on one node of jboss cache
                      manik

                      That is a pretty old version of JBC. Is it at all possible to upgrade to something newer in the 1.x series? It is API compatible. (Remember to use the JGroups version that comes with JBoss Cache as this has been updated too)