9 Replies Latest reply on Mar 14, 2008 9:18 AM by manik

    newbie question - ReplicationException

    slattery

      I constructed a test where two JVMs join the same PojoCache cluster.

      Each JVM starts one transaction. Here are the two timelines matched up:

       JVM-1                    JVM-2
      
       --Begin Tx1---
       Modify cachedObj1
                                --Begin Tx2---
       (sleeps...)              Modify cachedObj1
                                --Commit Tx2---
       Modify cachedObj1
       --Commit Tx1---
      


      Basically, two overlapping transactions try to modify the same cached object.

      I was surprised to find that both transactions fail during commit()--both throwing a ReplicationException() with a nested lock.TimeoutException().

      Am I doing something wrong? Shouldn't at least one transaction succeed, and then the other can try again? Otherwise, it seems like they both have to try again, and they could keep colliding forever.

      Thanks in advance for your feedback.

        • 1. Re: newbie question - ReplicationException
          aditsu

          Yes, I found that this can happen. My solution is to have each side time out quickly, wait a random (and short) amount of time, and then try again.
          I don't know if there is a "proper" solution.
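          Roughly, something like this - just a sketch, not production code. The attempt count and backoff bound are arbitrary, and the transactional write is abstracted as a Callable<Boolean> that returns false (after rolling back) when the commit fails, e.g. with a lock TimeoutException:

import java.util.Random;
import java.util.concurrent.Callable;

// Sketch of the retry-with-random-backoff idea described above.
public final class RetryHelper {

    private static final int MAX_ATTEMPTS = 5;      // arbitrary
    private static final int MAX_BACKOFF_MS = 200;  // short, random upper bound
    private static final Random RANDOM = new Random();

    public static boolean runWithRetry(final Callable<Boolean> write) throws Exception {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            if (write.call()) {
                return true; // committed
            }
            // Commit failed: back off for a short, random interval so the two
            // sides are unlikely to collide again on the next attempt.
            Thread.sleep(RANDOM.nextInt(MAX_BACKOFF_MS) + 1);
        }
        return false; // gave up after MAX_ATTEMPTS
    }
}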

          Adrian

          • 2. Re: newbie question - ReplicationException
            manik

            This is something that can happen with pessimistic locking, and it is part of the motivation behind developing an optimistic locking solution.

            I'd recommend trying optimistic locking.

            • 3. Re: newbie question - ReplicationException
              aditsu

              Hi, I decided to come back to this problem and find a better solution than the random sleeps and retries.
              We tried using optimistic locking, but it doesn't seem to help at all.
              Here's a test program, using 2 replicated cache instances and writing to the same node:

package CacheRepl;

import java.util.concurrent.atomic.AtomicInteger;

import javax.transaction.UserTransaction;

import org.apache.log4j.Logger;
import org.jboss.cache.Cache;
import org.jboss.cache.DefaultCacheFactory;
import org.jboss.cache.Fqn;
import org.jboss.cache.transaction.DummyTransactionManager;
import org.jboss.cache.transaction.DummyUserTransaction;

public class CacheThread extends Thread {
    private static final Logger LOG = Logger.getLogger("test");
    private static final int THREADS = 2;
    private static final int REPEAT = 20;
    private static final int SLEEP_TIME = 10;

    protected int threadId;
    protected final Cache<Object, Object> cache;

    public CacheThread(final int threadId) {
        this.threadId = threadId;
        cache = DefaultCacheFactory.getInstance().createCache("replSync-service.xml", true);
    }

    private boolean doTransaction() {
        final UserTransaction tx = new DummyUserTransaction(DummyTransactionManager.getInstance());
        try {
            tx.begin();
            cache.put(new Fqn<Object>("node"), "key", "value" + threadId);
            tx.commit();
            return true;
        } catch (Exception e) {
            LOG.error("transaction failed", e);
            try {
                tx.rollback();
            } catch (Exception e1) {
                LOG.warn("rollback failed", e1);
            }
            return false;
        }
    }

    @Override
    public void run() {
        int failed = 0;
        for (int i = 0; i < REPEAT; ++i) {
            try {
                Thread.sleep(SLEEP_TIME);
            } catch (Exception e) {
                e.printStackTrace();
            }
            if (!doTransaction()) {
                failed++;
            }
        }
        System.out.println("Thread" + threadId + ": " + failed + " failures");
    }

    private void close() {
        cache.stop();
    }

    public static void main(String[] args) throws InterruptedException {
        final CacheThread[] threads = new CacheThread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            threads[t] = new CacheThread(t);
        }
        for (int t = 0; t < THREADS; t++) {
            threads[t].start();
        }
        System.out.println("started");
        for (int t = 0; t < THREADS; t++) {
            threads[t].join();
        }
        for (int t = 0; t < THREADS; t++) {
            threads[t].close();
        }
    }
}
              


              and the cache configuration:

              <server>
               <mbean code="org.jboss.cache.jmx.CacheJmxWrapper"
               name="jboss.cache:service=TreeCache">
              
               <depends>jboss:service=Naming</depends>
               <depends>jboss:service=TransactionManager</depends>
              
               <attribute name="TransactionManagerLookupClass">org.jboss.cache.transaction.DummyTransactionManagerLookup
               </attribute>
               <!--
               Node locking scheme:
               OPTIMISTIC
               PESSIMISTIC (default)
               -->
               <attribute name="NodeLockingScheme">OPTIMISTIC</attribute>
               <!--
               Isolation level : SERIALIZABLE
               REPEATABLE_READ (default)
               READ_COMMITTED
               READ_UNCOMMITTED
               NONE
               -->
               <attribute name="IsolationLevel">READ_COMMITTED</attribute>
              
               <attribute name="CacheMode">REPL_SYNC</attribute>
              
               <attribute name="ClusterName">Test cache</attribute>
              
               <attribute name="ClusterConfig">
               <config>
               <UDP bind_addr="10.0.0.226"
               mcast_addr="228.10.10.10"
               mcast_port="45588"
               tos="8"
               ucast_recv_buf_size="20000000"
               ucast_send_buf_size="640000"
               mcast_recv_buf_size="25000000"
               mcast_send_buf_size="640000"
               loopback="false"
               discard_incompatible_packets="true"
               max_bundle_size="64000"
               max_bundle_timeout="30"
               use_incoming_packet_handler="true"
               ip_ttl="2"
               enable_bundling="false"
               enable_diagnostics="true"
               use_concurrent_stack="true"
               thread_naming_pattern="pl"
               thread_pool.enabled="true"
               thread_pool.min_threads="1"
               thread_pool.max_threads="25"
               thread_pool.keep_alive_time="30000"
               thread_pool.queue_enabled="true"
               thread_pool.queue_max_size="10"
               thread_pool.rejection_policy="Run"
               oob_thread_pool.enabled="true"
               oob_thread_pool.min_threads="1"
               oob_thread_pool.max_threads="4"
               oob_thread_pool.keep_alive_time="10000"
               oob_thread_pool.queue_enabled="true"
               oob_thread_pool.queue_max_size="10"
               oob_thread_pool.rejection_policy="Run"/>
               <PING timeout="2000" num_initial_members="3"/>
               <MERGE2 max_interval="30000" min_interval="10000"/>
               <FD_SOCK/>
               <FD timeout="10000" max_tries="5" shun="true"/>
               <VERIFY_SUSPECT timeout="1500"/>
               <pbcast.NAKACK max_xmit_size="60000"
               use_mcast_xmit="false" gc_lag="0"
               retransmit_timeout="300,600,1200,2400,4800"
               discard_delivered_msgs="true"/>
               <UNICAST timeout="300,600,1200,2400,3600"/>
               <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
               max_bytes="400000"/>
               <pbcast.GMS print_local_addr="true" join_timeout="5000"
               join_retry_timeout="2000" shun="false"
               view_bundling="true" view_ack_collection_timeout="5000"/>
               <FRAG2 frag_size="60000"/>
               <pbcast.STREAMING_STATE_TRANSFER use_reading_thread="true"/>
               <!-- <pbcast.STATE_TRANSFER/> -->
               <pbcast.FLUSH timeout="0"/>
               </config>
               </attribute>
              
               <attribute name="FetchInMemoryState">true</attribute>
               <attribute name="StateRetrievalTimeout">15000</attribute>
               <attribute name="SyncReplTimeout">15000</attribute>
               <attribute name="LockAcquisitionTimeout">1000</attribute>
              
               <attribute name="UseRegionBasedMarshalling">true</attribute>
               <attribute name="TransactionTimeout">300</attribute>
               </mbean>
              
              </server>
              


              With both pessimistic and optimistic locking, there are failures every time.
              Here are some example exceptions:

              - optimistic locking:
              org.jboss.cache.lock.TimeoutException: failure acquiring lock: fqn=/, caller=GlobalTransaction:<10.0.0.226:32932>:1, lock=write owner=GlobalTransaction:<10.0.0.226:32931>:2 (org.jboss.cache.lock.LockStrategyReadCommitted@1fd6bea)
               at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:528)
               at org.jboss.cache.interceptors.OptimisticLockingInterceptor.lockNodes(OptimisticLockingInterceptor.java:119)
               at org.jboss.cache.interceptors.OptimisticLockingInterceptor.invoke(OptimisticLockingInterceptor.java:52)
               at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:76)
               at org.jboss.cache.interceptors.OptimisticReplicationInterceptor.invoke(OptimisticReplicationInterceptor.java:73)
               at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:76)
               at org.jboss.cache.interceptors.NotificationInterceptor.invoke(NotificationInterceptor.java:32)
               at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:76)
               at org.jboss.cache.interceptors.TxInterceptor.handleOptimisticPrepare(TxInterceptor.java:374)
               at org.jboss.cache.interceptors.TxInterceptor.handleRemotePrepare(TxInterceptor.java:250)
               at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:100)
               at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:76)
               at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:123)
               at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:76)
               at org.jboss.cache.interceptors.InvocationContextInterceptor.invoke(InvocationContextInterceptor.java:62)
               at org.jboss.cache.CacheImpl.invokeMethod(CacheImpl.java:3939)
               at org.jboss.cache.CacheImpl._replicate(CacheImpl.java:2853)
               at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
               at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
               at java.lang.reflect.Method.invoke(Method.java:585)
               at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:330)
               at org.jboss.cache.marshall.InactiveRegionAwareRpcDispatcher.handle(InactiveRegionAwareRpcDispatcher.java:77)
               at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:624)
               at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:533)
               at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:365)
               at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:736)
               at org.jgroups.JChannel.up(JChannel.java:1063)
               at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:325)
               at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:406)
               at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:255)
               at org.jgroups.protocols.FRAG2.up(FRAG2.java:197)
               at org.jgroups.protocols.pbcast.GMS.up(GMS.java:722)
               at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
               at org.jgroups.protocols.UNICAST.up(UNICAST.java:263)
               at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:720)
               at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:546)
               at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
               at org.jgroups.protocols.FD.up(FD.java:322)
               at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:298)
               at org.jgroups.protocols.MERGE2.up(MERGE2.java:145)
               at org.jgroups.protocols.Discovery.up(Discovery.java:220)
               at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1486)
               at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1440)
               at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
               at java.lang.Thread.run(Thread.java:595)
              Caused by: org.jboss.cache.lock.TimeoutException: write lock for / could not be acquired after 1000 ms. Locks: Read lock owners: []
              Write lock owner: GlobalTransaction:<10.0.0.226:32931>:2
               (caller=GlobalTransaction:<10.0.0.226:32932>:1, lock info: write owner=GlobalTransaction:<10.0.0.226:32931>:2 (org.jboss.cache.lock.LockStrategyReadCommitted@1fd6bea))
               at org.jboss.cache.lock.IdentityLock.acquireWriteLock0(IdentityLock.java:244)
               at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:167)
               at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:497)
               ... 46 more
              
              javax.transaction.RollbackException: outcome is false status: 1
               at org.jboss.cache.transaction.DummyTransaction.commit(DummyTransaction.java:75)
               at org.jboss.cache.transaction.DummyBaseTransactionManager.commit(DummyBaseTransactionManager.java:78)
               at org.jboss.cache.transaction.DummyUserTransaction.commit(DummyUserTransaction.java:77)
               at CacheRepl.CacheThread.doTransaction(CacheThread.java:31)
               at CacheRepl.CacheThread.run(CacheThread.java:53)
              


              - pessimistic locking:
              org.jboss.cache.lock.TimeoutException: failure acquiring lock: fqn=/node, caller=GlobalTransaction:<10.0.0.226:32933>:2, lock=write owner=GlobalTransaction:<10.0.0.226:32934>:1 (org.jboss.cache.lock.LockStrategyReadCommitted@1977b9b)
               at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:528)
               at org.jboss.cache.interceptors.PessimisticLockInterceptor$LockManager.acquire(PessimisticLockInterceptor.java:579)
               at org.jboss.cache.interceptors.PessimisticLockInterceptor.acquireNodeLock(PessimisticLockInterceptor.java:393)
               at org.jboss.cache.interceptors.PessimisticLockInterceptor.lock(PessimisticLockInterceptor.java:329)
               at org.jboss.cache.interceptors.PessimisticLockInterceptor.invoke(PessimisticLockInterceptor.java:187)
               at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:76)
               at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:34)
               at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:76)
               at org.jboss.cache.interceptors.NotificationInterceptor.invoke(NotificationInterceptor.java:32)
               at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:76)
               at org.jboss.cache.interceptors.TxInterceptor.replayModifications(TxInterceptor.java:511)
               at org.jboss.cache.interceptors.TxInterceptor.handlePessimisticPrepare(TxInterceptor.java:394)
               at org.jboss.cache.interceptors.TxInterceptor.handleRemotePrepare(TxInterceptor.java:254)
               at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:100)
               at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:76)
               at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:123)
               at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:76)
               at org.jboss.cache.interceptors.InvocationContextInterceptor.invoke(InvocationContextInterceptor.java:62)
               at org.jboss.cache.CacheImpl.invokeMethod(CacheImpl.java:3939)
               at org.jboss.cache.CacheImpl._replicate(CacheImpl.java:2853)
               at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
               at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
               at java.lang.reflect.Method.invoke(Method.java:585)
               at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:330)
               at org.jboss.cache.marshall.InactiveRegionAwareRpcDispatcher.handle(InactiveRegionAwareRpcDispatcher.java:77)
               at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:624)
               at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:533)
               at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:365)
               at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:736)
               at org.jgroups.JChannel.up(JChannel.java:1063)
               at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:325)
               at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:406)
               at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:255)
               at org.jgroups.protocols.FRAG2.up(FRAG2.java:197)
               at org.jgroups.protocols.pbcast.GMS.up(GMS.java:722)
               at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
               at org.jgroups.protocols.UNICAST.up(UNICAST.java:263)
               at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:720)
               at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:546)
               at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
               at org.jgroups.protocols.FD.up(FD.java:322)
               at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:298)
               at org.jgroups.protocols.MERGE2.up(MERGE2.java:145)
               at org.jgroups.protocols.Discovery.up(Discovery.java:220)
               at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1486)
               at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1440)
               at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
               at java.lang.Thread.run(Thread.java:595)
              Caused by: org.jboss.cache.lock.TimeoutException: write lock for /node could not be acquired after 1000 ms. Locks: Read lock owners: []
              Write lock owner: GlobalTransaction:<10.0.0.226:32934>:1
               (caller=GlobalTransaction:<10.0.0.226:32933>:2, lock info: write owner=GlobalTransaction:<10.0.0.226:32934>:1 (org.jboss.cache.lock.LockStrategyReadCommitted@1977b9b))
               at org.jboss.cache.lock.IdentityLock.acquireWriteLock0(IdentityLock.java:244)
               at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:167)
               at org.jboss.cache.lock.IdentityLock.acquire(IdentityLock.java:497)
               ... 49 more
              
              javax.transaction.RollbackException: outcome is false status: 1
               at org.jboss.cache.transaction.DummyTransaction.commit(DummyTransaction.java:75)
               at org.jboss.cache.transaction.DummyBaseTransactionManager.commit(DummyBaseTransactionManager.java:78)
               at org.jboss.cache.transaction.DummyUserTransaction.commit(DummyUserTransaction.java:77)
               at CacheRepl.CacheThread.doTransaction(CacheThread.java:31)
               at CacheRepl.CacheThread.run(CacheThread.java:53)
              


              I don't care about the configuration details, but this test program should NEVER, EVER, NOT EVER, fail under any circumstances, except nuclear attack or something like that.
              One cache should always get the lock first, write to the node, finish the transaction, then the other cache can do the same.
              Just like 2 threads setting the same AtomicInteger - that NEVER throws an exception.
              Is there any solution to this?
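              For comparison, here is the AtomicInteger analogy in plain Java - just an illustration, nothing cache-related. Both writers contend, one value simply overwrites the other, and neither thread ever sees an exception:

import java.util.concurrent.atomic.AtomicInteger;

// Two threads set the same AtomicInteger concurrently: one write simply
// overwrites the other, but no write ever fails or times out.
public class AtomicContention {
    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger shared = new AtomicInteger();
        final Runnable writer = new Runnable() {
            public void run() {
                for (int i = 0; i < 1000000; i++) {
                    shared.set(i); // never throws
                }
            }
        };
        final Thread t1 = new Thread(writer);
        final Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("final value: " + shared.get() + ", no exceptions thrown");
    }
}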

              Thanks
              Adrian

              • 4. Re: newbie question - ReplicationException
                manik

                 Both locking schemes will have one node fail in the described scenario. The difference, using the original poster's example, is that with PL the commit on JVM2 will fail, while with OL the commit on JVM1 will fail.

                 Other differences are that with OL, the failure on JVM1 is local, i.e., it fails before any remote calls are made and is hence more efficient. Also, with OL, if the tx on JVM1 hangs, other txs on other JVMs (or even the same JVM) can proceed.

                 If you want proper atomicity, then you get into the realm of distributed locks (or fail-fast cooperative locks), but either way we're talking about extremely non-scalable solutions, especially since with a cache you're looking at a mostly-read use case, which should make such a failure rare and a retry acceptable.

                If you are looking at using the cache to store a heavily updated value, like a counter, for example, then a cache is almost certainly the wrong tool for the job.

                 That said, enough folks have asked for distributed locking that it is on the roadmap (albeit low priority - JBCACHE-1098). It will almost certainly not be enabled by default, though, for scalability reasons.


                • 5. Re: newbie question - ReplicationException
                  aditsu

                  Hm, the forum system ate my reply, I'll try to write it again.
                  In my case, with both locking schemes, both nodes usually fail at the same time (but not always).
                   I'm not necessarily asking for a specific type of lock (e.g. distributed), especially if it's "extremely non-scalable". Indeed, this situation is rare, and if you think a retry is the solution here then I can accept it. However, I would expect the cache to handle it transparently (and in a configurable way); failure is not acceptable.

                  Also, this is not about storing a heavily updated value. Say we have to make some changes in the cache based on some external real-time events (from a data feed), and we have 2 replicated caches, for failover.
                  Both caches should listen to the events and perform the same changes immediately (which basically means at the same time). This ensures that no data can be lost if one cache dies, but it also triggers the problem I found. How can this be handled?

                  Thanks
                  Adrian

                  • 6. Re: newbie question - ReplicationException
                    manik

                    Not necessarily heavily updated, but more importantly concurrently updated, which is what you described.

                     In your case though, perhaps a good approach would be for both caches to attempt to write the change, and if one cache fails to write, to assume that this is because the other has completed the write, and hence the write is no longer necessary?
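                     A rough sketch of what I mean, meant to live inside your test class and reusing its doTransaction() method and LOG field (applyEvent() is just an illustrative name, not an API):

// Both replicas react to the same external event and attempt the same write.
// If this node's commit fails with a lock timeout, assume the peer's
// identical write went through and carry on instead of retrying.
private void applyEvent() {
    final boolean wroteIt = doTransaction(); // your method: false on failed commit
    if (!wroteIt) {
        LOG.info("lost the write race; assuming the other replica applied the change");
    }
}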

                     Also, I find it odd that *both nodes* fail. I guess with P/L and identical lock acquisition timeouts this may happen. If your setup is a master/backup, perhaps your backup cache could have a much shorter lock acquisition timeout so it would always fail first, allowing the other cache to continue?

                    Both caches failing should almost certainly not happen with optimistic locking - let me know if it does and we can investigate that.

                    • 7. Re: newbie question - ReplicationException
                      aditsu

                       

                      manik wrote: "In your case though, perhaps a good approach would be for both caches to attempt to write the change, and if one cache fails to write, to assume that this is because the other has completed the write, and hence the write is no longer necessary?"


                      That seems like a terrible approach, because there's no guarantee that the other cache succeeded. In fact, as I will show, it's almost certain that it also failed (if they have the same timeout).

                      manik wrote: "Both caches failing should almost certainly not happen with optimistic locking - let me know if it does and we can investigate that."


                      Well, with pessimistic locking I can say that if one cache fails then both fail, practically always.
                      With optimistic locking, they don't always fail together, but "just" most of the time. Also, the overall percentage of failures seems to be greater than with pessimistic locking.

                      Here's the modified code:

package CacheRepl;

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;

import javax.transaction.UserTransaction;

import org.apache.log4j.Logger;
import org.jboss.cache.Cache;
import org.jboss.cache.DefaultCacheFactory;
import org.jboss.cache.Fqn;
import org.jboss.cache.transaction.DummyTransactionManager;
import org.jboss.cache.transaction.DummyUserTransaction;

public class CacheThread2 {
    private static final Logger LOG = Logger.getLogger("test");
    private static final int THREADS = 2;
    private static final int REPEAT = 20;
    protected static final int INTERVAL = 1000;

    protected int count = 0;
    protected final Timer timer = new Timer();
    protected final int threadId;
    protected final Cache<Object, Object> cache;
    protected final DateFormat df = new SimpleDateFormat("HH:mm:ss.SSS");
    protected volatile boolean done = false;
    protected final boolean[] results = new boolean[REPEAT];

    protected class CacheTask extends TimerTask {
        @Override
        public void run() {
            final String t = df.format(new Date());
            final boolean result = doTransaction();
            System.out.println(t + " step " + count + " T" + threadId
                    + (result ? " succeeded" : " failed"));
            results[count++] = result;
            if (count >= REPEAT) {
                timer.cancel();
                done = true;
            }
        }
    }

    public CacheThread2(final int threadId) {
        this.threadId = threadId;
        cache = DefaultCacheFactory.getInstance().createCache("replSync-service.xml", true);
    }

    public void start(final Date time) {
        timer.scheduleAtFixedRate(new CacheTask(), time, INTERVAL);
    }

    protected boolean doTransaction() {
        final UserTransaction tx = new DummyUserTransaction(DummyTransactionManager.getInstance());
        try {
            tx.begin();
            cache.put(new Fqn<Object>("node"), "key", "value" + threadId);
            tx.commit();
            return true;
        } catch (Exception e) {
            LOG.error("transaction failed", e);
            try {
                tx.rollback();
            } catch (Exception e1) {
                LOG.warn("rollback failed", e1);
            }
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final CacheThread2[] threads = new CacheThread2[THREADS];
        for (int t = 0; t < THREADS; t++) {
            threads[t] = new CacheThread2(t);
        }
        Date time = new Date(System.currentTimeMillis() + 2000);
        for (int t = 0; t < THREADS; t++) {
            threads[t].start(time);
        }
        while (!threads[0].done || !threads[1].done) {
            Thread.sleep(500);
        }
        final int[] stats = new int[4];
        for (int j = 0; j < REPEAT; ++j) {
            stats[(threads[0].results[j] ? 2 : 0) + (threads[1].results[j] ? 1 : 0)]++;
        }
        System.out.println("\nBoth failed: " + stats[0] + " times");
        System.out.println("First one failed: " + stats[1] + " times");
        System.out.println("Second one failed: " + stats[2] + " times");
        System.out.println("Both succeeded: " + stats[3] + " times");
    }
}
                      


                      and cache configuration:

                      <server>
                       <mbean code="org.jboss.cache.jmx.CacheJmxWrapper"
                       name="jboss.cache:service=TreeCache">
                      
                       <depends>jboss:service=Naming</depends>
                       <depends>jboss:service=TransactionManager</depends>
                      
                       <attribute name="TransactionManagerLookupClass">org.jboss.cache.transaction.DummyTransactionManagerLookup
                       </attribute>
                       <!--
                       Node locking scheme:
                       OPTIMISTIC
                       PESSIMISTIC (default)
                       -->
                       <attribute name="NodeLockingScheme">OPTIMISTIC</attribute>
                       <!--
                       Isolation level : SERIALIZABLE
                       REPEATABLE_READ (default)
                       READ_COMMITTED
                       READ_UNCOMMITTED
                       NONE
                       -->
                       <attribute name="IsolationLevel">READ_COMMITTED</attribute>
                      
                       <attribute name="CacheMode">REPL_SYNC</attribute>
                      
                       <attribute name="ClusterName">Test cache</attribute>
                      
                       <attribute name="ClusterConfig">
                       <config>
                       <UDP bind_addr="10.0.0.226"
                       mcast_addr="228.10.10.10"
                       mcast_port="45588"
                       tos="8"
                       ucast_recv_buf_size="20000000"
                       ucast_send_buf_size="640000"
                       mcast_recv_buf_size="25000000"
                       mcast_send_buf_size="640000"
                       loopback="false"
                       discard_incompatible_packets="true"
                       max_bundle_size="64000"
                       max_bundle_timeout="30"
                       use_incoming_packet_handler="true"
                       ip_ttl="2"
                       enable_bundling="false"
                       enable_diagnostics="true"
                       use_concurrent_stack="true"
                       thread_naming_pattern="pl"
                       thread_pool.enabled="true"
                       thread_pool.min_threads="1"
                       thread_pool.max_threads="25"
                       thread_pool.keep_alive_time="30000"
                       thread_pool.queue_enabled="true"
                       thread_pool.queue_max_size="10"
                       thread_pool.rejection_policy="Run"
                       oob_thread_pool.enabled="true"
                       oob_thread_pool.min_threads="1"
                       oob_thread_pool.max_threads="4"
                       oob_thread_pool.keep_alive_time="10000"
                       oob_thread_pool.queue_enabled="true"
                       oob_thread_pool.queue_max_size="10"
                       oob_thread_pool.rejection_policy="Run"/>
                       <PING timeout="2000" num_initial_members="3"/>
                       <MERGE2 max_interval="30000" min_interval="10000"/>
                       <FD_SOCK/>
                       <FD timeout="10000" max_tries="5" shun="true"/>
                       <VERIFY_SUSPECT timeout="1500"/>
                       <pbcast.NAKACK max_xmit_size="60000"
                       use_mcast_xmit="false" gc_lag="0"
                       retransmit_timeout="300,600,1200,2400,4800"
                       discard_delivered_msgs="true"/>
                       <UNICAST timeout="300,600,1200,2400,3600"/>
                       <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                       max_bytes="400000"/>
                       <pbcast.GMS print_local_addr="true" join_timeout="5000"
                       join_retry_timeout="2000" shun="false"
                       view_bundling="true" view_ack_collection_timeout="5000"/>
                       <FRAG2 frag_size="60000"/>
                       <pbcast.STREAMING_STATE_TRANSFER use_reading_thread="true"/>
                       <!-- <pbcast.STATE_TRANSFER/> -->
                       <pbcast.FLUSH timeout="0"/>
                       </config>
                       </attribute>
                      
                       <attribute name="FetchInMemoryState">true</attribute>
                       <attribute name="StateRetrievalTimeout">15000</attribute>
                       <attribute name="SyncReplTimeout">15000</attribute>
                       <attribute name="LockAcquisitionTimeout">500</attribute>
                      
                       <attribute name="UseRegionBasedMarshalling">true</attribute>
                       <attribute name="TransactionTimeout">300</attribute>
                       </mbean>
                      
                      </server>
                      


                      Note that LockAcquisitionTimeout should be smaller than INTERVAL in the code.

                      Here's an example result with PESSIMISTIC locking:

                      Both failed: 15 times
                      First one failed: 0 times
                      Second one failed: 0 times
                      Both succeeded: 5 times
                      


                      and with OPTIMISTIC locking:

                      Both failed: 15 times
                      First one failed: 2 times
                      Second one failed: 1 times
                      Both succeeded: 2 times
                      


                      What do you think?

                      Thanks
                      Adrian

                      • 8. Re: newbie question - ReplicationException
                        aditsu

                        bump

                        • 9. Re: newbie question - ReplicationException
                          manik

                          Related to what you have been saying - http://www.jboss.com/index.html?module=bb&op=viewtopic&t=131474

                          But yes, specific to your problem, I have created a JIRA task: JBCACHE-1307