ReplicationException caused by SuspectException when I shut down a cluster member
jl, Aug 28, 2006 4:31 AM
Hello all,
We use a two-member cluster.
Some nodes in the cache are modified very often (how often scales with the number of concurrent clients on the system).
When I shut down one cluster member, in most cases I get the following exception chain:
2006-08-25 17:28:55,378 [ReactivateArchive_1188239188#702] ERROR org.jboss.cache.transaction.DummyTransaction - beforeCompletion() failed for tx=org.jboss.cache.transaction.DummyTransaction@86f68d, handlers=[TxInterceptor.LocalSynchronizationHandler(gtx=GlobalTransaction:<192.168.4.174:7801>:237, tx=org.jboss.cache.transaction.DummyTransaction@86f68d)]
java.lang.RuntimeException:
at org.jboss.cache.interceptors.TxInterceptor$LocalSynchronizationHandler.beforeCompletion(TxInterceptor.java:1091)
at org.jboss.cache.interceptors.OrderedSynchronizationHandler.beforeCompletion(OrderedSynchronizationHandler.java:75)
at org.jboss.cache.transaction.DummyTransaction.notifyBeforeCompletion(DummyTransaction.java:247)
at org.jboss.cache.transaction.DummyTransaction.commit(DummyTransaction.java:54)
at org.jboss.cache.transaction.DummyBaseTransactionManager.commit(DummyBaseTransactionManager.java:61)
at com.xtramind.common.distributed.DefaultDistributedResourceAgent.putData(DefaultDistributedResourceAgent.java:499)
at com.xtramind.common.distributed.DefaultDistributedResourceManager.putData(DefaultDistributedResourceManager.java:166)
at com.xtramind.irma.archive.ArchiveAgent.addReactivateTask(ArchiveAgent.java:1204)
at com.xtramind.irma.archive.ReactivateTask.refreshDistributedCache(ReactivateTask.java:331)
at com.xtramind.irma.archive.AbstractTask.setCurrentCount(AbstractTask.java:187)
at com.xtramind.irma.archive.ReactivateTask.reactivate(ReactivateTask.java:140)
at com.xtramind.irma.archive.ReactivateTask.run(ReactivateTask.java:66)
at java.lang.Thread.run(Thread.java:595)
Caused by: org.jboss.cache.ReplicationException: rsp=sender=192.168.4.174:7800, retval=null, received=false, suspected=true
at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4191)
at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4114)
at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4215)
at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:110)
at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:88)
at org.jboss.cache.interceptors.ReplicationInterceptor.runPreparePhase(ReplicationInterceptor.java:147)
at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:64)
at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
at org.jboss.cache.interceptors.TxInterceptor.runPreparePhase(TxInterceptor.java:804)
at org.jboss.cache.interceptors.TxInterceptor$LocalSynchronizationHandler.beforeCompletion(TxInterceptor.java:1069)
... 12 more
Caused by: org.jboss.cache.SuspectException: Response suspected: sender=192.168.4.174:7800, retval=null, received=false, suspected=true
at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4185)
... 21 more
[192.168.4.174:7801|2] [192.168.4.174:7801]
So my question is:
Is there anything I have to do on the cache before I shut down one member?
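Independently of the shutdown procedure, a common application-level mitigation is to retry a write whose commit failed only because a member was suspected, since the next attempt should run against the new view (here `[192.168.4.174:7801|2]`). The sketch below is a minimal, self-contained illustration and not JBoss Cache API: `MemberSuspectedException` is a hypothetical stand-in for `org.jboss.cache.SuspectException`, and the `Callable` is assumed to wrap the application's own `putData()` logic (begin transaction, put, commit).

```java
import java.util.concurrent.Callable;

/**
 * Sketch: retry a transactional cache write when it failed only because a
 * cluster member was suspected (for example, because it was shutting down).
 */
public class RetryOnSuspect {

    /** Hypothetical stand-in for org.jboss.cache.SuspectException. */
    static class MemberSuspectedException extends RuntimeException {
        MemberSuspectedException(String msg) { super(msg); }
    }

    /** True if any exception in the cause chain signals a suspected member. */
    static boolean causedBySuspect(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof MemberSuspectedException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the write up to maxAttempts times, sleeping backoffMillis between
     * attempts. Failures not caused by a suspected member are rethrown at once.
     */
    static <T> T run(Callable<T> write, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return write.call();
            } catch (Exception e) {
                if (!causedBySuspect(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // Assumption: a short pause gives the group time to install the new view.
                Thread.sleep(backoffMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = run(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                // Mimics the logged chain: RuntimeException <- ReplicationException <- SuspectException
                throw new RuntimeException(new IllegalStateException(
                        new MemberSuspectedException("sender=192.168.4.174:7800 suspected")));
            }
            return "committed";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Each attempt has to begin a fresh transaction, since the failed one has presumably already been rolled back; the attempt cap keeps a genuinely broken cluster from causing an endless loop.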
Here is the config file we use:
<?xml version="1.0" encoding="UTF-8"?>
<!-- ===================================================================== -->
<!-- -->
<!-- Mailminder TreeCache Service Configuration -->
<!-- DON'T MODIFY THIS FILE -->
<!-- ===================================================================== -->
jboss:service=Naming
jboss:service=TransactionManager
<!--
Configure the TransactionManager
-->
org.jboss.cache.DummyTransactionManagerLookup
<!--
Node locking scheme:
OPTIMISTIC
PESSIMISTIC (default)
-->
PESSIMISTIC
<!--
Note that this attribute is IGNORED if your NodeLockingScheme above is OPTIMISTIC.
Isolation level : SERIALIZABLE
REPEATABLE_READ (default)
READ_COMMITTED
READ_UNCOMMITTED
NONE
-->
REPEATABLE_READ
<!--
Valid modes are LOCAL
REPL_ASYNC
REPL_SYNC
INVALIDATION_ASYNC
INVALIDATION_SYNC
-->
REPL_SYNC
<!--
Just used for async repl: use a replication queue
-->
false
<!--
Replication interval for replication queue (in ms)
-->
0
<!--
Max number of elements which trigger replication
-->
0
<!-- Name of cluster. Needs to be the same for all clusters, in order
to find each other
-->
MMC
<!-- JGroups protocol stack properties. Can also be a URL,
e.g. file:/home/bela/default.xml
-->
<TCP start_port="7800" loopback="true" send_buf_size="100000" recv_buf_size="200000"/>
<TCPPING timeout="3000" initial_hosts="127.0.0.1[7800]" port_range="3" num_initial_members="3"/>
<FD timeout="2000" max_tries="4"/>
<VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
<pbcast.NAKACK gc_lag="100" retransmit_timeout="600,1200,2400,4800"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="20000" down_thread="false" max_bytes="0" up_thread="false"/>
<VIEW_SYNC avg_send_interval="60000" down_thread="false" up_thread="false" />
<pbcast.GMS print_local_addr="true" join_timeout="5000" join_retry_timeout="2000" shun="true"/>
<pbcast.STATE_TRANSFER />
<!--
Whether or not to fetch state on joining a cluster
NOTE this used to be called FetchStateOnStartup and has been renamed to be more descriptive.
-->
true
<!--
The max amount of time (in milliseconds) we wait until the
initial state (ie. the contents of the cache) are retrieved from
existing members in a clustered environment
-->
70000
<!--
Number of milliseconds to wait until all responses for a
synchronous call have been received.
-->
70000
<!-- Max number of milliseconds to wait for a lock acquisition -->
30000
<!-- Name of the eviction policy class. -->
org.jboss.cache.eviction.LRUPolicy
<!-- Specific eviction policy configurations. This is LRU -->
5
<!-- Cache wide default -->
0
0
0
0
600
3600
<!--
Indicate whether to use marshalling or not. Set this to true if you are running under a scoped
class loader, e.g., inside an application server. Default is "false".
-->
false
<!-- Uncomment to get a graphical view of the TreeCache MBean above -->
<!-- -->
<!-- jboss.cache:service=TreeCache-->
<!-- jboss.cache:service=TreeCache-->
<!-- -->
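For context on the `suspected=true` responses: with the FD and VERIFY_SUSPECT values in the stack above, a member that stops answering should be suspected well before the 70000 ms synchronous call timeout expires. This is a back-of-the-envelope sketch, assuming FD suspects a member after `max_tries` consecutive missed heartbeats of `timeout` ms each and VERIFY_SUSPECT then waits its own `timeout` before confirming; it ignores network latency and retransmissions, and the class and method names are illustrative only.

```java
/**
 * Rough estimate of how long the JGroups stack in this config takes to
 * suspect an unresponsive member (values copied from the config above).
 */
public class SuspectTiming {
    static final long FD_TIMEOUT_MS = 2000;             // <FD timeout="2000" .../>
    static final int FD_MAX_TRIES = 4;                  // <FD ... max_tries="4"/>
    static final long VERIFY_SUSPECT_TIMEOUT_MS = 1500; // <VERIFY_SUSPECT timeout="1500" .../>
    static final long SYNC_REPL_TIMEOUT_MS = 70000;     // wait for all responses of a synchronous call

    /** Approximate worst case: max_tries missed heartbeats, then verification. */
    static long approxSuspectTimeMs(long fdTimeout, int maxTries, long verifyTimeout) {
        return fdTimeout * maxTries + verifyTimeout;
    }

    public static void main(String[] args) {
        long detect = approxSuspectTimeMs(FD_TIMEOUT_MS, FD_MAX_TRIES, VERIFY_SUSPECT_TIMEOUT_MS);
        System.out.println("approx. time to suspect a silent member: " + detect + " ms");
        System.out.println("sync call timeout: " + SYNC_REPL_TIMEOUT_MS + " ms");
        System.out.println("suspicion arrives first: " + (detect < SYNC_REPL_TIMEOUT_MS));
    }
}
```

With these values the estimate is 2000 * 4 + 1500 = 9500 ms, so a blocked synchronous replication call ends with a suspicion long before the 70 s timeout.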
Thanks for your help,
JL