4 Replies Latest reply on Aug 29, 2006 5:20 AM by jl

    ReplicationException caused by SuspectException when I shut

    jl

      Hello all,

      We use a two-member cluster.
      Some nodes in the cache are modified very often (the rate scales with the number of concurrent clients on the system).

      When I shut down one cluster member, in most cases I get the following exception chain:

      2006-08-25 17:28:55,378 [ReactivateArchive_1188239188#702] ERROR org.jboss.cache.transaction.DummyTransaction - beforeCompletion() failed for tx=org.jboss.cache.transaction.DummyTransaction@86f68d, handlers=[TxInterceptor.LocalSynchronizationHandler(gtx=GlobalTransaction:<192.168.4.174:7801>:237, tx=org.jboss.cache.transaction.DummyTransaction@86f68d)]
      java.lang.RuntimeException:
      at org.jboss.cache.interceptors.TxInterceptor$LocalSynchronizationHandler.beforeCompletion(TxInterceptor.java:1091)
      at org.jboss.cache.interceptors.OrderedSynchronizationHandler.beforeCompletion(OrderedSynchronizationHandler.java:75)
      at org.jboss.cache.transaction.DummyTransaction.notifyBeforeCompletion(DummyTransaction.java:247)
      at org.jboss.cache.transaction.DummyTransaction.commit(DummyTransaction.java:54)
      at org.jboss.cache.transaction.DummyBaseTransactionManager.commit(DummyBaseTransactionManager.java:61)
      at com.xtramind.common.distributed.DefaultDistributedResourceAgent.putData(DefaultDistributedResourceAgent.java:499)
      at com.xtramind.common.distributed.DefaultDistributedResourceManager.putData(DefaultDistributedResourceManager.java:166)
      at com.xtramind.irma.archive.ArchiveAgent.addReactivateTask(ArchiveAgent.java:1204)
      at com.xtramind.irma.archive.ReactivateTask.refreshDistributedCache(ReactivateTask.java:331)
      at com.xtramind.irma.archive.AbstractTask.setCurrentCount(AbstractTask.java:187)
      at com.xtramind.irma.archive.ReactivateTask.reactivate(ReactivateTask.java:140)
      at com.xtramind.irma.archive.ReactivateTask.run(ReactivateTask.java:66)
      at java.lang.Thread.run(Thread.java:595)
      Caused by: org.jboss.cache.ReplicationException: rsp=sender=192.168.4.174:7800, retval=null, received=false, suspected=true
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4191)
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4114)
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4215)
      at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:110)
      at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:88)
      at org.jboss.cache.interceptors.ReplicationInterceptor.runPreparePhase(ReplicationInterceptor.java:147)
      at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:64)
      at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
      at org.jboss.cache.interceptors.TxInterceptor.runPreparePhase(TxInterceptor.java:804)
      at org.jboss.cache.interceptors.TxInterceptor$LocalSynchronizationHandler.beforeCompletion(TxInterceptor.java:1069)
      ... 12 more
      Caused by: org.jboss.cache.SuspectException: Response suspected: sender=192.168.4.174:7800, retval=null, received=false, suspected=true
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4185)
      ... 21 more
      [192.168.4.174:7801|2] [192.168.4.174:7801]
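
      To make the chain above easier to read: the RuntimeException thrown from beforeCompletion() wraps a ReplicationException, which in turn wraps the SuspectException. Below is a minimal sketch of how the calling code could walk such a chain and check whether a failed commit was ultimately caused by a suspected member. The two nested exception classes are hypothetical stand-ins for org.jboss.cache.ReplicationException and org.jboss.cache.SuspectException so the example compiles without JBoss Cache on the classpath; the class and method names (CauseChain, chain, causedBySuspect, sample) are my own and not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

public class CauseChain {

    // Hypothetical stand-in for org.jboss.cache.ReplicationException.
    static class ReplicationException extends RuntimeException {
        ReplicationException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Hypothetical stand-in for org.jboss.cache.SuspectException.
    static class SuspectException extends RuntimeException {
        SuspectException(String msg) { super(msg); }
    }

    /** Returns the simple class names of the cause chain, outermost first. */
    static List<String> chain(Throwable t) {
        List<String> names = new ArrayList<>();
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            names.add(cur.getClass().getSimpleName());
        }
        return names;
    }

    /** True if any exception in the cause chain is a SuspectException. */
    static boolean causedBySuspect(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof SuspectException) {
                return true;
            }
        }
        return false;
    }

    /** Builds a chain with the same shape as the one in the log. */
    static Throwable sample() {
        Throwable suspect = new SuspectException("Response suspected");
        Throwable repl = new ReplicationException("rsp=..., suspected=true", suspect);
        return new RuntimeException("", repl);
    }

    public static void main(String[] args) {
        Throwable t = sample();
        System.out.println(chain(t));
        System.out.println("caused by suspected member: " + causedBySuspect(t));
    }
}
```

      With a check like this, the code around the commit could at least tell "the other member went away" apart from other replication failures. Whether retrying is then safe depends on the application.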



      So my question:

      Is there anything I have to do on the cache before I shut down one member?

      Here is the config file we use:


      <?xml version="1.0" encoding="UTF-8"?>

      <!-- ===================================================================== -->
      <!-- -->
      <!-- Mailminder TreeCache Service Configuration -->
      <!-- DON'T MODIFY THIS FILE -->
      <!-- ===================================================================== -->





      jboss:service=Naming
      jboss:service=TransactionManager

      <!--
      Configure the TransactionManager
      -->
      org.jboss.cache.DummyTransactionManagerLookup


      <!--
      Node locking scheme:
      OPTIMISTIC
      PESSIMISTIC (default)
      -->
      PESSIMISTIC

      <!--
      Note that this attribute is IGNORED if your NodeLockingScheme above is OPTIMISTIC.

      Isolation level : SERIALIZABLE
      REPEATABLE_READ (default)
      READ_COMMITTED
      READ_UNCOMMITTED
      NONE
      -->
      REPEATABLE_READ

      <!--
      Valid modes are LOCAL
      REPL_ASYNC
      REPL_SYNC
      INVALIDATION_ASYNC
      INVALIDATION_SYNC
      -->
      REPL_SYNC

      <!--
      Just used for async repl: use a replication queue
      -->
      false

      <!--
      Replication interval for replication queue (in ms)
      -->
      0

      <!--
      Max number of elements which trigger replication
      -->
      0

      <!-- Name of cluster. Needs to be the same for all clusters, in order
      to find each other
      -->
      MMC

      <!-- JGroups protocol stack properties. Can also be a URL,
      e.g. file:/home/bela/default.xml

      -->



      <TCP start_port="7800" loopback="true" send_buf_size="100000" recv_buf_size="200000"/>
      <TCPPING timeout="3000" initial_hosts="127.0.0.1[7800]" port_range="3" num_initial_members="3"/>
      <FD timeout="2000" max_tries="4"/>
      <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
      <pbcast.NAKACK gc_lag="100" retransmit_timeout="600,1200,2400,4800"/>
      <pbcast.STABLE stability_delay="1000" desired_avg_gossip="20000" down_thread="false" max_bytes="0" up_thread="false"/>
      <VIEW_SYNC avg_send_interval="60000" down_thread="false" up_thread="false" />
      <pbcast.GMS print_local_addr="true" join_timeout="5000" join_retry_timeout="2000" shun="true"/>
      <pbcast.STATE_TRANSFER />




      <!--
      Whether or not to fetch state on joining a cluster
      NOTE this used to be called FetchStateOnStartup and has been renamed to be more descriptive.
      -->
      true

      <!--
      The max amount of time (in milliseconds) we wait until the
      initial state (ie. the contents of the cache) are retrieved from
      existing members in a clustered environment
      -->
      70000

      <!--
      Number of milliseconds to wait until all responses for a
      synchronous call have been received.
      -->
      70000

      <!-- Max number of milliseconds to wait for a lock acquisition -->
      30000

      <!-- Name of the eviction policy class. -->
      org.jboss.cache.eviction.LRUPolicy
      <!-- Specific eviction policy configurations. This is LRU -->


      5
      <!-- Cache wide default -->

      0
      0
      0



      0
      600
      3600




      <!--
      Indicate whether to use marshalling or not. Set this to true if you are running under a scoped
      class loader, e.g., inside an application server. Default is "false".
      -->
      false




      <!-- Uncomment to get a graphical view of the TreeCache MBean above -->
      <!-- -->
      <!-- jboss.cache:service=TreeCache-->
      <!-- jboss.cache:service=TreeCache-->
      <!-- -->
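
      For reference, here is a rough estimate of how long failure detection may take with the FD and VERIFY_SUSPECT settings in the stack above, compared with the 70000 ms synchronous replication timeout. This is only a sketch under an assumption: that FD suspects a member after max_tries unanswered checks of timeout ms each, and that VERIFY_SUSPECT then waits its own timeout before the suspicion is confirmed. The class and method names are made up for the illustration.

```java
public class FailureDetectionBudget {

    /** Assumed time until FD raises a suspicion: timeout * max_tries. */
    static long suspectAfterMs(long fdTimeoutMs, int fdMaxTries) {
        return fdTimeoutMs * fdMaxTries;
    }

    /** Assumed time until the suspicion is confirmed by VERIFY_SUSPECT. */
    static long confirmedAfterMs(long fdTimeoutMs, int fdMaxTries, long verifySuspectTimeoutMs) {
        return suspectAfterMs(fdTimeoutMs, fdMaxTries) + verifySuspectTimeoutMs;
    }

    public static void main(String[] args) {
        long fdTimeoutMs = 2000;            // <FD timeout="2000" .../>
        int fdMaxTries = 4;                 // <FD ... max_tries="4"/>
        long verifySuspectTimeoutMs = 1500; // <VERIFY_SUSPECT timeout="1500" .../>
        long syncReplTimeoutMs = 70000;     // synchronous call timeout from the config

        long confirmed = confirmedAfterMs(fdTimeoutMs, fdMaxTries, verifySuspectTimeoutMs);
        System.out.println("suspected after ~" + suspectAfterMs(fdTimeoutMs, fdMaxTries) + " ms");
        System.out.println("confirmed after ~" + confirmed + " ms");
        System.out.println("within sync repl timeout: " + (confirmed < syncReplTimeoutMs));
    }
}
```

      Under that assumption, a synchronous call that is waiting for the stopped member would come back with suspected=true well before the 70000 ms timeout, which would match the "received=false, suspected=true" response in the log.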





      Thanks for your help,

      JL