I'm kicking off this discussion in response to JBCACHE-923 and Jacek Halat's post on node locking algorithms at http://jboss.org/index.html?module=bb&op=viewtopic&p=4084417#4084417.
Most of the problems reported centre on the premise that a transaction manager may attempt to clean up timed-out transactions (by rolling them back) in a separate thread.
Now this can lead to problems with the existing pessimistic lock interceptor, since locks are obtained and held by Thread (in the non-tx case) or by GlobalTransaction (in the tx case). The design implicitly assumes that only a single thread will ever use a given GlobalTransaction at a time, and we do see problems when two threads (one being a cleanup rollback thread) try to use the same gtx simultaneously. Consider:
1. App thread starts tx, writes to /a
2. App thread attempts to write to /a again
3. Tx times out, TM attempts a rollback in a separate thread
4. The write lock on /a is owned by the gtx, which means the rollback thread is allowed to restore the state of /a while the app thread from (2) is *also* allowed to change it.
In addition to inconsistent state, this can also lead to stale locks.
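The scenario above boils down to lock ownership being keyed on the gtx rather than the thread. Here is a toy model of that (the class and method names are mine, not JBoss Cache internals) showing that a second thread presenting the same gtx sails straight through:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of a lock whose owner is an arbitrary object (the gtx),
// not a Thread. Illustrative only -- not JBoss Cache's actual lock code.
class OwnerLock {
    private Object owner;

    synchronized boolean acquire(Object requester) {
        if (owner == null || owner == requester) {
            owner = requester; // re-entrant per *owner*, not per thread
            return true;
        }
        return false;
    }
}

public class GtxLockDemo {
    // Returns true if a second thread presenting the same gtx can also
    // "acquire" the lock -- i.e. the collision described above occurs.
    static boolean collides() throws InterruptedException {
        OwnerLock lockOnA = new OwnerLock();
        Object gtx = new Object(); // stands in for a GlobalTransaction

        // App thread takes the write lock on /a under the gtx.
        boolean appThreadHoldsLock = lockOnA.acquire(gtx);

        // A separate rollback thread presents the *same* gtx and also
        // succeeds, so both threads may now mutate /a concurrently.
        AtomicBoolean rollbackGotLock = new AtomicBoolean();
        Thread rollback = new Thread(() -> rollbackGotLock.set(lockOnA.acquire(gtx)));
        rollback.start();
        rollback.join();

        return appThreadHoldsLock && rollbackGotLock.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("both threads hold the lock on /a: " + collides());
    }
}
```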
Now a lot of these issues stem from the assumption that the TM will not attempt a rollback in a different thread simultaneously with the app thread working in the transaction. A proper fix would involve rearchitecting how locking is handled in the cache (and this is underway), but as a workaround, does it make sense to serialize access to the cache on a per-gtx basis?
1. Have the tx interceptor maintain a list of 'currently working' gtxs.
2. Allow only one thread at a time up the interceptor chain per gtx, causing other threads to block.
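A rough sketch of what I have in mind, using a lock per gtx (class and method names are hypothetical, not existing JBoss Cache API):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the workaround: one lock per gtx, so only one thread at a
// time may proceed up the interceptor chain for a given gtx.
class PerGtxSerializer {
    // the 'currently working' gtxs, each mapped to its serialization lock
    private final ConcurrentMap<Object, ReentrantLock> working = new ConcurrentHashMap<>();

    void enter(Object gtx) {
        working.computeIfAbsent(gtx, k -> new ReentrantLock()).lock();
    }

    void exit(Object gtx) {
        ReentrantLock l = working.get(gtx);
        if (l != null) l.unlock(); // a blocked thread for this gtx may now proceed
    }
}

public class SerializerDemo {
    // The app thread enters first; the rollback thread blocks in enter()
    // until the app thread exits, so the two can never overlap.
    static String run() throws InterruptedException {
        PerGtxSerializer serializer = new PerGtxSerializer();
        Object gtx = new Object(); // stands in for a GlobalTransaction
        StringBuilder order = new StringBuilder();

        serializer.enter(gtx); // app thread starts working in the tx
        Thread rollback = new Thread(() -> {
            serializer.enter(gtx); // blocks: the gtx is in use
            order.append("rollback");
            serializer.exit(gtx);
        });
        rollback.start();

        Thread.sleep(100);     // simulate work inside the transaction
        order.append("app;");
        serializer.exit(gtx);  // now the rollback thread may proceed
        rollback.join();
        return order.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

Note this is only a sketch: a real version would also have to remove the map entry once the gtx completes (to avoid leaking locks), and since ReentrantLock is owned per-thread, exit() must be called by the same thread that called enter().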
This should effectively prevent such collisions. Keep in mind I haven't spent much time thinking about this solution yet - so by all means do tear it apart! :-)