(Inadvertently posted this on the Dev-Cluster forum last week; moving it to the right place)
Discussion on the JIRA issue...
First, an issue we'd briefly discussed off-line was how to find the transaction that holds the locks. I believe that's actually easy; as you walk through the tree, you can call DataNode.getLock() to get the IdentityLock object; from there you can determine what object holds the lock. If it's a GlobalTransaction you can then find the Transaction in the tx_table. If SERIALIZABLE semantics are in place you have to check the read lock owners; otherwise just the write lock owner.
If the owner is not a GlobalTransaction (i.e. it's a Thread), I'll loop a few times trying to acquire the lock with a 1 ms timeout. After a few loops, I'll release the lock myself. This shouldn't cause a problem with the UnlockInterceptor, which shouldn't fail if it later tries to release a lock that I've already released.
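To make the "loop a few times, then release it myself" idea concrete, here is a rough sketch. BreakableLock is a hypothetical stand-in I made up for illustration, not the real IdentityLock API, and the attempt count and timeout are just parameters, not values anyone has agreed on.

```java
final class LockBreaker {
    /** Hypothetical stand-in for the cache's lock type; NOT the real IdentityLock API. */
    interface BreakableLock {
        /** Tries to acquire the lock for the caller, waiting at most timeoutMillis. */
        boolean tryAcquire(long timeoutMillis);

        /** Forcibly releases whatever the current owner holds. */
        void forceRelease();
    }

    /**
     * Returns true if the lock was acquired without interference,
     * false if the owner's lock had to be released by force first.
     */
    static boolean acquireOrBreak(BreakableLock lock, int maxAttempts, long timeoutMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (lock.tryAcquire(timeoutMillis)) {
                return true;
            }
        }
        // Patience exhausted: release the owner's lock ourselves, then take it.
        lock.forceRelease();
        if (!lock.tryAcquire(timeoutMillis)) {
            throw new IllegalStateException("lock still unavailable after forced release");
        }
        return false;
    }
}
```

With a 1 ms timeout and, say, three attempts, the worst case adds only a few milliseconds per node before we give up waiting on the owner.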
The trickier part is what to do if the lock is held by a transaction. We want to acquire the lock with as little disruption as possible. I think the answer here depends on the status of the transaction.
1) ACTIVE, MARKED_ROLLBACK, PREPARING: call tx.rollback().
2) COMMITTING, ROLLING_BACK: try to let it finish. Loop a while trying to acquire with a 1 ms timeout. Once my patience is exhausted, release the lock myself.
3) COMMITTED, ROLLED_BACK, NO_TRANSACTION: just release the lock myself.
4) PREPARED: Here we need to determine who initiated the gtx. If we did, it means we've already sent a prepare() call to the cluster. We can still abort the commit phase and have a rollback() sent to the cluster by calling tx.setRollbackOnly(). We can't call tx.rollback(), as that throws an IllegalStateException once the tx is STATUS_PREPARED. Once we've done this, loop a bit trying to acquire, and once patience is exhausted release the lock myself.
If the gtx wasn't initiated on our cache instance, we're in the middle of processing a commit() call from another cache and just happened to hit the instant when the tx was STATUS_PREPARED. Treat this the same as COMMITTING above: try to let it finish. Loop a while trying to acquire with a 1 ms timeout. Once my patience is exhausted, release the lock myself.
5) UNKNOWN: I'm thinking just call tx.rollback().
With all of the above, if we catch an exception we'll mark the tx rollback only and release the lock ourselves.
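The status handling above boils down to a small decision table, sketched below. The enum and method names are hypothetical and exist only for this illustration; real code would read the int constants from javax.transaction.Status via tx.getStatus(), and the exception handling from the last paragraph is left out.

```java
final class LockOwnerPolicy {
    /** Status names as used in this post; hypothetical mirror of javax.transaction.Status. */
    enum TxStatus {
        ACTIVE, MARKED_ROLLBACK, PREPARING, PREPARED, COMMITTING, ROLLING_BACK,
        COMMITTED, ROLLED_BACK, NO_TRANSACTION, UNKNOWN
    }

    /** What to do about the transaction that owns the lock. */
    enum Action {
        ROLLBACK,                    // call tx.rollback()
        WAIT_THEN_RELEASE,           // loop on acquire, then release the lock ourselves
        RELEASE_NOW,                 // release the lock ourselves immediately
        SET_ROLLBACK_ONLY_THEN_WAIT  // call tx.setRollbackOnly(), then loop and release
    }

    /** initiatedLocally only matters for PREPARED: did this cache instance start the gtx? */
    static Action decide(TxStatus status, boolean initiatedLocally) {
        switch (status) {
            case ACTIVE:
            case MARKED_ROLLBACK:
            case PREPARING:
            case UNKNOWN:
                return Action.ROLLBACK;
            case COMMITTING:
            case ROLLING_BACK:
                return Action.WAIT_THEN_RELEASE;
            case COMMITTED:
            case ROLLED_BACK:
            case NO_TRANSACTION:
                return Action.RELEASE_NOW;
            case PREPARED:
                return initiatedLocally
                        ? Action.SET_ROLLBACK_ONLY_THEN_WAIT
                        : Action.WAIT_THEN_RELEASE;
            default:
                throw new IllegalArgumentException("unhandled status: " + status);
        }
    }
}
```

Keeping the decision separate from the side effects would also make it easy to unit test each status without a live transaction manager.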