1 Reply Latest reply on Nov 29, 2006 9:45 AM by manik

    Replication problem and hang in JBoss Cache

    jochenweberfmi

      Hello all,

      we are seeing some strange problems in our standalone application with our JBoss Cache/JGroups implementation.
      The application works fine for a while and then gets stuck; from then on, each put into the TreeCache leads to a ReplicationException (see below).
      The port that is used (in this case 33146) was printed in the log file as the first output of the GMS protocol.
      ----------------
      GMS: address is ?.
      ----------------
      More GMS output like this is available.

      From this point on, the cache cannot be reinitialized.

      After some minutes, on one server we see log entries like:

      (caller=Thread[Timer-0,5,main], lock info: read owners=[Thread[UpHandler (GMS),5,JGroups threads]] (activeReaders=1, activeWriter=null, waitingReaders=0, waitingWriters=0, waitingUpgrader=0))
      2006-11-27 19:00:28,145 ERROR org.jgroups.protocols.pbcast.GMS:446 up_handler thread for GMS was interrupted (in order to be terminated), but is still alive
      2006-11-27

      and after a while only entries like:

      19:01:06,320 ERROR org.jgroups.protocols.pbcast.GMS:845 coords or merge_id == null

      Restarting a single server does not help; in the end all servers have to be restarted.
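
      To see what the GMS up-handler thread is blocked on when this happens, a thread dump of the JGroups threads would probably help. Below is a minimal sketch, assuming the thread name contains "GMS" as in the log entry above. The class name GmsThreadDump is made up for illustration; it relies only on the JDK's Thread.getAllStackTraces(), so it needs Java 5 or later and has to run inside the affected JVM.

```java
import java.util.Map;

// Hypothetical helper (not part of JGroups or JBoss Cache): dumps the stack
// traces of all live threads whose name contains a given fragment.
final class GmsThreadDump {

    static String dump(String nameFragment) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().contains(nameFragment)) {
                continue; // not one of the threads we are interested in
            }
            // Header line: thread name plus its current state (e.g. WAITING, BLOCKED)
            sb.append(t.getName()).append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the stuck thread is named "UpHandler (GMS)" as in the log
        System.out.print(dump("GMS"));
    }
}
```

      Calling GmsThreadDump.dump("GMS") from within the application (for example from a timer or an admin action) when the ReplicationException starts to appear should show where the up-handler is waiting. Sending the process a SIGQUIT (kill -3) gives a full thread dump as an alternative.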

      We are using the following versions for JGroups and JBoss Cache:
      JGroups-2.4.0
      JBossCache-1.4.0.SP1
      JDK:JBossCache-1.4.0.SP1

      The JGroups configuration is the following:
      **************************************************************************************
      <UDP mcast_addr="228.1.2.3" mcast_port="45566" ip_ttl="64" ip_mcast="true"
      mcast_send_buf_size="150000" mcast_recv_buf_size="80000" ucast_send_buf_size="150000"
      ucast_recv_buf_size="80000" loopback="false" />
      <PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false" />
      <MERGE2 min_interval="10000" max_interval="20000" />
      <FD shun="true" up_thread="true" down_thread="true" />
      <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />
      <pbcast.NAKACK gc_lag="50" max_xmit_size="8192" retransmit_timeout="600,1200,2400,4800" up_thread="false" down_thread="false"/>
      <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false" />
      <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" />
      <FRAG frag_size="8192" down_thread="false" up_thread="false" />
      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true" />
      <pbcast.STATE_TRANSFER up_thread="false" down_thread="false" />

      Caught exception:
      ***************************************************************************************************************
      2006-11-27 18:39:23,095 FATAL com.fmi.mapserver.licence.LicencedKeysSingleton:? stack: 520E16:CacheException in enableAccessCacheException rsp=sender=10.160.33.18:33146, retval=null, received=false, suspected=false
      org.jboss.cache.ReplicationException: rsp=sender=10.160.33.18:33146, retval=null, received=false, suspected=false
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4191)
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4114)
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4215)
      at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:110)
      at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:88)
      at org.jboss.cache.interceptors.ReplicationInterceptor.handleReplicatedMethod(ReplicationInterceptor.java:119)
      at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:83)
      at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
      at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:345)
      at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:156)
      at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
      at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:157)
      at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5520)
      at org.jboss.cache.TreeCache.put(TreeCache.java:3678)
      at org.jboss.cache.TreeCache.put(TreeCache.java:3616)
      at com.fmi.mapserver.licence.LicencedKeysSingleton.enableAccess(Unknown Source)
      at com.fmi.mapserver.action.InitAccessAction.execute(Unknown Source)
      at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java:419)
      at org.apache.struts.action.RequestProcessor.process(RequestProcessor.java:224)
      at org.apache.struts.action.ActionServlet.process(ActionServlet.java:1196)
      at org.apache.struts.action.ActionServlet.doGet(ActionServlet.java:414)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
      at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
      at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
      at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
      at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
      at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
      at java.lang.Thread.run(Thread.java:595)
      Caused by: org.jboss.cache.lock.TimeoutException: Response timed out: sender=10.160.33.18:33146, retval=null, received=false, suspected=false
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4189)
      ... 36 more

      Does anybody have any idea?

      Jochen