Eager deadlock detection with 2 phase commits
manik Mar 8, 2008 11:06 AMAt JBoss World, Jason had proposed a mechanism for eager deadlock detection when using a 2 phase commit and pessimistic locking or MVCC.
A description of the problem:
* Server 1 updates /a, /b and broadcasts a prepare.
* Server 2 updates /a, /c and broadcasts a prepare at the same time.
---> Deadlock, even though both txs succeed in getting local locks.
With OL, one will fail, with PL, both will fail after TIMING OUT (LockAcquisitionTimeout millisecs), which can be unnecessarily slow. MVCC will have the same behaviour as PL since MVCC’s write lock strategy is similar to PL.
Eager Deadlock Detection (EDD) requires:
* lock() method on the lock manager to inspect other owners of the lock attempting to be acquired if the lock CANNOT be acquired.
* Can be achieved with acquire(someSmallNumber); in a loop to achieve the real timeout required:
lock(int timeout) { while (currentTime < startTime + timeout) { if (acquire(smallTimeout)) break; testEDD(); } }
Assume we have 2 servers, S1 and S2. S1 has thread TL1 which is a thread pertaining to a local transaction. S1-TL1 will have a corresponding thread, S2-TR1 pertaining to the remote prepare broadcast to S2. Similarly, S1-TL2 is a local prepare on S2, and corresponds to S1-TR2.
This process will be followed by either S1-TR2 or S2-TR1.
1) If the lock is owned and the current thread originates remotely, inspect the owner.
2) If the owner is a LOCAL tx, inspect it’s state. If it is COMMITTING or COMMITTED, then there will definitely be a deadlock, since the remote tx wants a lock on a node that has already been locally locked and is committing - which means the local tx thread will fail on other remote nodes for the same reason.
3) If the deadlock is detected abort the local transaction. (e.g., if the current thread is S1-TR2, S2-TL2 will be forced to roll back). This will allow S2-TR1 to get the remote locks it needs, allowing S1-TL1 to succeed.
4) Only one instance should do this with the local transaction, not both (since otherwise both will not succeed). I.e., only S2-TR1 or S1-TR2 should follow this process; not both.
5) Sufficiently randomise the probability of "winning" in a deadlock by each node sending out a "coin toss" - a random number generated for each transaction and stored in the TransactionEntry. This way all cache instances know, deterministically, whether it will yield it's local transaction during a deadlock. I.e., both instances know which of the two threads, S1-TR2 or S2-TR1, will yield.
The upside of such an approach is that the deadlock can be determined very quickly (milliseconds?) and one of the participants in the deadlock forced to abort, allowing at least one transaction to complete quickly.