7 Replies Latest reply on Sep 4, 2004 4:48 PM by belaban

    TimeOutException using 1.1 with fragmented cache

    smievoll

      I'm trying to set up a fault tolerant JBossCache configuration in a WAN environment, but am experiencing a few difficulties. The current setup is 4 machines that is to be split between two locations, but currently they're connected together in a test configuration using two simple switches.

      After upgrading to JC 1.1 I started getting the following TimeOutException after disconecting the cable between the switches and so creating two separate partitions of the cluster.

      What is attempted to be done is the creation of two nodes in the TreeCache initiated by 10.0.0.11. The nodes should be replicated to 10.0.0.12 and it seems it's when this is beeing done the exceptions occurs.

      The first log-excerpt is from my own code on 10.0.0.11. This is after both caches have accepted the new view configuration (see JBossCache logs below). Please ignore the Axis stuff, it's there because runtime exceptions isn't caught anywhere below. (I see from another thread that your working on your use of exceptions :-). From the JC logs it seems the two nodes are eventually created in 10.0.0.12, but each creation causes an exception in the initiator (10.0.0.11).

      On reconnection of all four machines and subsequent merge of views this problem disappears. I'm not able to fully understand what happens; is it the remote call on 10.0.0.11 it self that times out, or is it 10.0.0.12 that's reporting a timeout. The clocks on the two machines are syncronized.

      Is this intended behaviour? Would increasing the timeout values on locks help? I'm currently using these values:

       <attribute name="SyncReplTimeout">15000</attribute>
      
       <attribute name="LockAcquisitionTimeout">15000</attribute>
      
       <attribute name="LockLeaseTimeout">20000</attribute>
      


      I'm running jGroups with a TCP/TCPPING/MERGE2 configuration.

      Thanks.

      2004-08-31 15:11:09,031 WARN [SimpleAxisServlet] [-] "Invocation of the axis engine failed"
      AxisFault
       faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
       faultSubcode:
       faultString: org.jboss.util.NestedRuntimeException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false; - nested throwable:
      (org.jboss.cache.lock.TimeoutException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false)
       faultActor:
       faultNode:
       faultDetail:
       {http://xml.apache.org/axis/}stackTrace: org.jboss.util.NestedRuntimeException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false; - nested throwable:
      (org.jboss.cache.lock.TimeoutException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:3179)
       at org.jboss.cache.TreeCache.put(TreeCache.java:1709)
       at no.feide.moria.store.MoriaCacheStore.insertIntoStore(MoriaCacheStore.java:849)
       at no.feide.moria.store.MoriaCacheStore.createAuthnAttempt(MoriaCacheStore.java:283)
       at no.feide.moria.controller.MoriaController.initiateAuthentication(MoriaController.java:593)
       at no.feide.moria.webservices.v1_1.Authentication.initiateAuthentication(Authentication.java:102)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:324)
       at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:402)
       at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:309)
       at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:333)
       at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:71)
       at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:150)
       at org.apache.axis.SimpleChain.invoke(SimpleChain.java:120)
       at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:481)
       at org.apache.axis.server.AxisServer.invoke(AxisServer.java:323)
       at no.feide.moria.servlet.SimpleAxisServlet.doPost(SimpleAxisServlet.java:327)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:165)
       at org.apache.axis.transport.http.AxisServletBase.service(AxisServletBase.java:339)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:103)
       at com.caucho.server.dispatch.ServletFilterChain.doFilter(ServletFilterChain.java:113)
       at com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:190)
       at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:177)
       at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:221)
       at com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:392)
       at com.caucho.server.port.TcpConnection.run(TcpConnection.java:323)
       at com.caucho.util.ThreadPool.runTasks(ThreadPool.java:430)
       at com.caucho.util.ThreadPool.run(ThreadPool.java:377)
       at java.lang.Thread.run(Thread.java:534)
      Caused by: org.jboss.cache.lock.TimeoutException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false
       at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:2145)
       at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:2167)
       at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:89)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:39)
       at org.jboss.cache.interceptors.TransactionInterceptor.invoke(TransactionInterceptor.java:53)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:3176)
       ... 30 more
      
      
      org.jboss.util.NestedRuntimeException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false; - nested throwable:
      (org.jboss.cache.lock.TimeoutException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false)
       at org.apache.axis.AxisFault.makeFault(AxisFault.java:129)
       at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:343)
       at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:71)
       at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:150)
       at org.apache.axis.SimpleChain.invoke(SimpleChain.java:120)
       at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:481)
       at org.apache.axis.server.AxisServer.invoke(AxisServer.java:323)
       at no.feide.moria.servlet.SimpleAxisServlet.doPost(SimpleAxisServlet.java:327)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:165)
       at org.apache.axis.transport.http.AxisServletBase.service(AxisServletBase.java:339)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:103)
       at com.caucho.server.dispatch.ServletFilterChain.doFilter(ServletFilterChain.java:113)
       at com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:190)
       at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:177)
       at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:221)
       at com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:392)
       at com.caucho.server.port.TcpConnection.run(TcpConnection.java:323)
       at com.caucho.util.ThreadPool.runTasks(ThreadPool.java:430)
       at com.caucho.util.ThreadPool.run(ThreadPool.java:377)
       at java.lang.Thread.run(Thread.java:534)
      Caused by: org.jboss.util.NestedRuntimeException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false; - nested throwable:
      (org.jboss.cache.lock.TimeoutException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:3179)
       at org.jboss.cache.TreeCache.put(TreeCache.java:1709)
       at no.feide.moria.store.MoriaCacheStore.insertIntoStore(MoriaCacheStore.java:849)
       at no.feide.moria.store.MoriaCacheStore.createAuthnAttempt(MoriaCacheStore.java:283)
       at no.feide.moria.controller.MoriaController.initiateAuthentication(MoriaController.java:593)
       at no.feide.moria.webservices.v1_1.Authentication.initiateAuthentication(Authentication.java:102)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:324)
       at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:402)
       at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:309)
       at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:333)
       ... 18 more
      Caused by: org.jboss.cache.lock.TimeoutException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false
       at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:2145)
       at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:2167)
       at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:89)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:39)
       at org.jboss.cache.interceptors.TransactionInterceptor.invoke(TransactionInterceptor.java:53)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:3176)
       ... 30 more
      
      2004-08-31 15:12:31,806 WARN [SimpleAxisServlet] [-] "Invocation of the axis engine failed"
      AxisFault
       faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
       faultSubcode:
       faultString: org.jboss.util.NestedRuntimeException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false; - nested throwable:
      (org.jboss.cache.lock.TimeoutException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false)
       faultActor:
       faultNode:
       faultDetail:
       {http://xml.apache.org/axis/}stackTrace: org.jboss.util.NestedRuntimeException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false; - nested throwable:
      (org.jboss.cache.lock.TimeoutException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:3179)
       at org.jboss.cache.TreeCache.put(TreeCache.java:1709)
       at no.feide.moria.store.MoriaCacheStore.insertIntoStore(MoriaCacheStore.java:849)
       at no.feide.moria.store.MoriaCacheStore.createAuthnAttempt(MoriaCacheStore.java:283)
       at no.feide.moria.controller.MoriaController.initiateAuthentication(MoriaController.java:593)
       at no.feide.moria.webservices.v1_1.Authentication.initiateAuthentication(Authentication.java:102)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:324)
       at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:402)
       at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:309)
       at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:333)
       at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:71)
       at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:150)
       at org.apache.axis.SimpleChain.invoke(SimpleChain.java:120)
       at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:481)
       at org.apache.axis.server.AxisServer.invoke(AxisServer.java:323)
       at no.feide.moria.servlet.SimpleAxisServlet.doPost(SimpleAxisServlet.java:327)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:165)
       at org.apache.axis.transport.http.AxisServletBase.service(AxisServletBase.java:339)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:103)
       at com.caucho.server.dispatch.ServletFilterChain.doFilter(ServletFilterChain.java:113)
       at com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:190)
       at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:177)
       at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:221)
       at com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:392)
       at com.caucho.server.port.TcpConnection.run(TcpConnection.java:323)
       at com.caucho.util.ThreadPool.runTasks(ThreadPool.java:430)
       at com.caucho.util.ThreadPool.run(ThreadPool.java:377)
       at java.lang.Thread.run(Thread.java:534)
      Caused by: org.jboss.cache.lock.TimeoutException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false
       at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:2145)
       at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:2167)
       at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:89)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:39)
       at org.jboss.cache.interceptors.TransactionInterceptor.invoke(TransactionInterceptor.java:53)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:3176)
       ... 30 more
      
      
      org.jboss.util.NestedRuntimeException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false; - nested throwable:
      (org.jboss.cache.lock.TimeoutException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false)
       at org.apache.axis.AxisFault.makeFault(AxisFault.java:129)
       at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:343)
       at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:71)
       at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:150)
       at org.apache.axis.SimpleChain.invoke(SimpleChain.java:120)
       at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:481)
       at org.apache.axis.server.AxisServer.invoke(AxisServer.java:323)
       at no.feide.moria.servlet.SimpleAxisServlet.doPost(SimpleAxisServlet.java:327)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:165)
       at org.apache.axis.transport.http.AxisServletBase.service(AxisServletBase.java:339)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:103)
       at com.caucho.server.dispatch.ServletFilterChain.doFilter(ServletFilterChain.java:113)
       at com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:190)
       at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:177)
       at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:221)
       at com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:392)
       at com.caucho.server.port.TcpConnection.run(TcpConnection.java:323)
       at com.caucho.util.ThreadPool.runTasks(ThreadPool.java:430)
       at com.caucho.util.ThreadPool.run(ThreadPool.java:377)
       at java.lang.Thread.run(Thread.java:534)
      Caused by: org.jboss.util.NestedRuntimeException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false; - nested throwable:
      (org.jboss.cache.lock.TimeoutException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:3179)
       at org.jboss.cache.TreeCache.put(TreeCache.java:1709)
       at no.feide.moria.store.MoriaCacheStore.insertIntoStore(MoriaCacheStore.java:849)
       at no.feide.moria.store.MoriaCacheStore.createAuthnAttempt(MoriaCacheStore.java:283)
       at no.feide.moria.controller.MoriaController.initiateAuthentication(MoriaController.java:593)
       at no.feide.moria.webservices.v1_1.Authentication.initiateAuthentication(Authentication.java:102)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:324)
       at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:402)
       at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:309)
       at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:333)
       ... 18 more
      Caused by: org.jboss.cache.lock.TimeoutException: rsp=sender=10.0.0.12:7890, retval=null, received=false, suspected=false
       at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:2145)
       at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:2167)
       at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:89)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:39)
       at org.jboss.cache.interceptors.TransactionInterceptor.invoke(TransactionInterceptor.java:53)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:3176)
       ... 30 more
      
      


      JC log 10.0.0.11:
      2004-08-31 15:06:43,932 INFO [TreeCache] viewAccepted(): new members: [10.0.0.11:7890, 10.0.0.12:7890]
      2004-08-31 15:10:53,856 DEBUG [TreeCache] _put(null, "/Login Ticket/ioCAix7SAAAA-rT4IndfEPtZC2RvW7*mArS0vjILMssBaocG-5eFjN1VCBOo1Uk9SRhRUWiEHKUGXWAWhcM",
      {MoriaData=no.feide.moria.store.MoriaAuthnAttempt@1de7497, TimeToLive=1093969373810, TicketType=Login Ticket, Principal=demo})
      2004-08-31 15:10:53,856 DEBUG [Node] createChild: fqn=/Login Ticket
      2004-08-31 15:10:53,862 DEBUG [Node] createChild: fqn=/Login Ticket/ioCAix7SAAAA-rT4IndfEPtZC2RvW7*mArS0vjILMssBaocG-5eFjN1VCBOo1Uk9SRhRUWiEHKUGXWAWhcM
      2004-08-31 15:10:53,863 DEBUG [TreeCache] callRemoteMethods(): valid members are [10.0.0.12:7890]
      2004-08-31 15:12:16,766 DEBUG [TreeCache] _put(null, "/Login Ticket/ioCAix7SAAAA-rT5Zn7eveJfvNkqy4cRyOJuDzadpcswm6rUZYwb6Dt3BYdU-xNmuqTVtNwkJNpFZJ-uJ4s",
      {MoriaData=no.feide.moria.store.MoriaAuthnAttempt@13e6346, TimeToLive=1093969456766, TicketType=Login Ticket, Principal=demo})
      2004-08-31 15:12:16,767 DEBUG [Node] createChild: fqn=/Login Ticket/ioCAix7SAAAA-rT5Zn7eveJfvNkqy4cRyOJuDzadpcswm6rUZYwb6Dt3BYdU-xNmuqTVtNwkJNpFZJ-uJ4s
      2004-08-31 15:12:16,767 DEBUG [TreeCache] callRemoteMethods(): valid members are [10.0.0.12:7890]
      
      


      JC log 10.0.0.12:
      2004-08-31 15:06:43,993 INFO [TreeCache] viewAccepted(): new members: [10.0.0.11:7890, 10.0.0.12:7890]
      2004-08-31 15:13:23,771 DEBUG [TreeCache] _put(null, "/Login Ticket/ioCAix7SAAAA-rT4IndfEPtZC2RvW7*mArS0vjILMssBaocG-5eFjN1VCBOo1Uk9SRhRUWiEHKUGXWAWhcM",
      {TicketType=Login Ticket, TimeToLive=1093969373810, MoriaData=no.feide.moria.store.MoriaAuthnAttempt@122c9df, Principal=demo})
      2004-08-31 15:13:23,771 DEBUG [Node] createChild: fqn=/Login Ticket
      2004-08-31 15:13:23,775 DEBUG [Node] createChild: fqn=/Login Ticket/ioCAix7SAAAA-rT4IndfEPtZC2RvW7*mArS0vjILMssBaocG-5eFjN1VCBOo1Uk9SRhRUWiEHKUGXWAWhcM
      2004-08-31 15:16:05,701 DEBUG [TreeCache] _put(null, "/Login Ticket/ioCAix7SAAAA-rT5Zn7eveJfvNkqy4cRyOJuDzadpcswm6rUZYwb6Dt3BYdU-xNmuqTVtNwkJNpFZJ-uJ4s",
      {TicketType=Login Ticket, TimeToLive=1093969456766, MoriaData=no.feide.moria.store.MoriaAuthnAttempt@1a70b8, Principal=demo})
      2004-08-31 15:16:05,720 DEBUG [Node] createChild: fqn=/Login Ticket/ioCAix7SAAAA-rT5Zn7eveJfvNkqy4cRyOJuDzadpcswm6rUZYwb6Dt3BYdU-xNmuqTVtNwkJNpFZJ-uJ4s
      
      


        • 1. Re: TimeOutException using 1.1 with fragmented cache
          belaban

          All calls will fail until the 2 subgroups have formed. So, if you e.g. have {A,B,C,D}, then kill the switch between {A,B} and {C,D}, and invoke a method call (a.k.a. put an element in the cache) *before* the failure has been detected, then the caller still assumes a view of 4 members, and we will hang (e.g. if A invokes the call) until C and D have been excluded from the view. This may take some time, depending on your FD setup.
          Also note that *state merging* (after a partition) is *not* provided in this release yet. You have to do it yourself: register as TreeCacheListener, on viewChange(View v): is v instanceof MergeView --> then you need to consolidate the state. One simple solution is currently to just re-enter the values you are 'responsible' for.

          Bela

          • 2. Re: TimeOutException using 1.1 with fragmented cache
            smievoll

            If you look at the log-excerpts you see that both cache instances reports having accepted the new view (containing just the two machines) about four minutes before I try to create the nodes.

            I've implemented state merging using the TreeCacheListener interface catching MergeViews, and that seems to work ok.

            • 3. Re: TimeOutException using 1.1 with fragmented cache
              belaban

              Okay, then it should work. I suggest you enable tracing (add category "org.jgroups" to log4j and set its level to DEBUG (or TRACE if run inside JBoss).

              Bela

              • 4. Re: TimeOutException using 1.1 with fragmented cache
                smievoll

                I have the logs but they're rather large so I put them here: http://www.conduct.no/~bos/jc/. Can't say I find anything that explains the three minute lag between the distribution of each node, but it seems that 11 does not respond to 12s FD_SIMPLE ARE_YOU_ALIVE requests just as it the puts are beeing done. 11 also reports this 15 seconds after each put:

                2004-08-31 15:11:08,964 DEBUG [GroupRequest] call did not execute correctly, request is [GroupRequest:
                req_id=1093957853950
                members: 10.0.0.12:7890
                responses: null
                request_msg: [dst: <null>, src: <null> (2 headers), size = 1301 bytes]
                rsp_mode: 2
                done: false
                timeout: 15000
                expected_mbrs: 0
                ]
                2004-08-31 15:11:08,972 DEBUG [RpcDispatcher] responses: [sender=10.0.0.12:7890, retval=null, received=false, suspected=false]
                


                When 12 finally recives the first node this is what's in the log:
                2004-08-31 15:13:23,728 DEBUG [TCP] received msg [dst: 10.0.0.12:7890, src: 10.0.0.11:7890 (3 headers), size = 1301 bytes]
                2004-08-31 15:13:23,729 DEBUG [NAKACK] [10.0.0.12:7890] received <10.0.0.11:7890#27>
                2004-08-31 15:13:23,729 DEBUG [NAKACK] receiver window for 10.0.0.11:7890 is: received_msgs: [27 - 27 (size=1, missing=0)], delivered_msgs: [0 - 26]
                2004-08-31 15:13:23,729 DEBUG [STABLE] stable task started; num_gossip_runs=3, max_gossip_runs=3
                2004-08-31 15:13:23,730 DEBUG [RequestCorrelator] header is [Header: name=MessageDispatcher, type=REQ, id=1093957853950, rsp_expected=true], dest_mbrs=[10.0.0.12:7890]
                2004-08-31 15:13:23,730 DEBUG [RequestCorrelator] calling (org.jgroups.blocks.RpcDispatcher) with request 1093957853950
                2004-08-31 15:13:23,771 DEBUG [RpcDispatcher] [sender=10.0.0.11:7890], method_call: _replicate(_put(null,
                /Login Ticket/ioCAix7SAAAA-rT4IndfEPtZC2RvW7*mArS0vjILMssBaocG-5eFjN1VCBOo1Uk9SRhRUWiEHKUGXWAWhcM, {TicketType=Login Ticket
                , TimeToLive=1093969373810, MoriaData=no.feide.moria.store.MoriaAuthnAttempt@122c9df, Principal=demo}, true))
                2004-08-31 15:13:23,775 DEBUG [RequestCorrelator] sending rsp for 1093957853950 to 10.0.0.11:7890
                2004-08-31 15:13:23,776 DEBUG [TCP] dest=10.0.0.11:7890, hdrs:
                TCP: [TCP:group_addr=Moria-Cluster]
                MessageDispatcher: [Header: name=MessageDispatcher, type=RSP, id=1093957853950, rsp_expected=false]
                


                and on 11 at the same moment:
                2004-08-31 15:13:23,742 DEBUG [NAKACK] [10.0.0.11:7890] received <10.0.0.11:7890#27>
                2004-08-31 15:13:23,742 DEBUG [TCP] dest=10.0.0.12:7890, hdrs:
                NAKACK: [NAKACK: MSG, seqno=27, range=null]
                TCP: [TCP:group_addr=Moria-Cluster]
                MessageDispatcher: [Header: name=MessageDispatcher, type=REQ, id=1093957853950, rsp_expected=true], dest_mbrs=[10.0.0.12:7890]
                
                2004-08-31 15:13:23,742 DEBUG [NAKACK] receiver window for 10.0.0.11:7890 is: received_msgs: [27 - 27 (size=1, missing=0)], delivered_msgs: [0 - 26]
                2004-08-31 15:13:23,743 DEBUG [RequestCorrelator] discarded request from 10.0.0.11:7890 as we are not part of destination list
                 (local_addr=10.0.0.11:7890, hdr=[Header: name=MessageDispatcher, type=REQ, id=1093957853950, rsp_expected=true], dest_mbrs=[10.0.0.12:7890])
                2004-08-31 15:13:23,743 DEBUG [TCP] dest=10.0.0.21:7890, hdrs:
                TCP: [TCP:group_addr=Moria-Cluster]
                TCPPING: [PING: type=GET_MBRS_REQ, arg=null]
                
                2004-08-31 15:13:23,743 DEBUG [TCP] dest=10.0.0.21:7891, hdrs:
                TCP: [TCP:group_addr=Moria-Cluster]
                TCPPING: [PING: type=GET_MBRS_REQ, arg=null]
                
                2004-08-31 15:13:23,745 DEBUG [TCP] received msg [dst: 10.0.0.11:7890, src: 10.0.0.12:7890 (2 headers), size = 0 bytes]
                2004-08-31 15:13:23,746 INFO [FD_SIMPLE] received I_AM_ALIVE response from 10.0.0.12:7890
                2004-08-31 15:13:23,747 DEBUG [TCP] received msg [dst: 10.0.0.11:7890, src: 10.0.0.12:7890 (2 headers), size = 0 bytes]
                2004-08-31 15:13:23,747 DEBUG [TCP] received msg [dst: 10.0.0.11:7890, src: 10.0.0.12:7890 (2 headers), size = 0 bytes]
                2004-08-31 15:13:23,748 INFO [FD_SIMPLE] received I_AM_ALIVE response from 10.0.0.12:7890
                2004-08-31 15:13:23,748 DEBUG [TCP] received msg [dst: 10.0.0.11:7890, src: 10.0.0.12:7890 (2 headers), size = 0 bytes]
                2004-08-31 15:13:23,748 INFO [FD_SIMPLE] received I_AM_ALIVE response from 10.0.0.12:7890
                2004-08-31 15:13:23,799 DEBUG [TCP] received msg [dst: 10.0.0.11:7890, src: 10.0.0.12:7890 (2 headers), size = 5 bytes]
                2004-08-31 15:13:23,799 DEBUG [RequestCorrelator] header is [Header: name=MessageDispatcher, type=RSP, id=1093957853950, rsp_expected=false]
                2004-08-31 15:13:23,819 INFO [FD_SIMPLE] sending ARE_YOU_ALIVE message to 10.0.0.12:7890, counters are
                10.0.0.12:7890: 0
                10.0.0.11:7890: 0
                


                None of this tell me anything really, but maybe you can read something from it.


                • 5. Re: TimeOutException using 1.1 with fragmented cache
                  belaban

                  Looks okay to me - nothing suspicious besides the fact the the method call from 11 to 12 does not succeed. Are you sure 11 can 'see' 12 ?

                  On another note: how did FD_SIMPLE end up in your config ? This is not the default... Can you post your config ?
                  Bela

                  • 6. Re: TimeOutException using 1.1 with fragmented cache
                    smievoll

                    Pretty shure they see each other, can't see anything indicating the opposite. The boxes are connected to the same switch. Ended up with FD_SIMPLE just as an experiment, we've tried about a zillion different configurations. Things didn't work any better with FD.
                    The configs are here: http://www.conduct.no/~bos/jc/.

                    • 7. Re: TimeOutException using 1.1 with fragmented cache
                      belaban

                      Try FD (with shun="true") and GMS with sun="true"