Possible locking improvements

Version 2

    Note: infinispan does not acquire locks for reads, so by “lock” I always mean write lock, i.e. lock that is being acquired when data is written to the cluster.


         1.  During a transaction locks are acquired locally before commit time.

    E.g. on same node two tx running:

    Tx1: writes on a -> b -> c

    Tx2: writes on a -> c -> d

    • these two transactions will execute in sequence, as after Tx1 locks “a” Tx2 won’t be able to make any progress until Tx1 is finsihed.
    • remote transactions that need to acquire a remote lock on “a” (for prepare) won’t be able to do any progress until Tx1 (and potentially Tx2) are completed.


    Suggester optimization:  do not acquire any locks until commit time. Use same behavior as  when reading the keys. This would reduce lock’s scope and give better throughput by allowing more parallelism between transactions.



    Non transactional writes from two nodes on the same key (tx situation discussed at a further point).






    Let’s say we have key “a” so that consistentHash(“a”) = {N3, N4}.  Two threads write on key “a” on nodes N1 and N2, at the same time. With the right timing RPCs can happen in the order indicated in the above diagram. Both 3 and 4 deadlock in this case.


    Possible optimization: 





    Disregarding numOwners always go to the main owner first (in this case N3). Acquire lock on it and allow it to multicast to the remaining nodes:

    • this makes sure that there won’t be deadlocks when multiple nodes write on the same key
    • user performance is affected as 1 and 2 are now executed in sequence so user performance might be affected
    • optimization stands for high contention on same key
    • also valid for situations where async replication is used as it offers better overall consistency and potentially better throughput by not having deadlocks


         3.  This one is about another approach for deadlock avoidance. Optimization is for distributed caches, without eager locking.


    Tx1: writes on “a” then “b”

    Tx2: writes on “b” then “a”

    With some “right” timing => deadlock during prepare time.


    Suggested optimization:

    • during prepare for each transaction order the keys based on their consistent hash value
    • acquire the locks in this sequence
    • this means that locks are NOT acquired in the order in which the corresponding operation has happened
    • this will assure that there won’t be a deadlock between the two transaction


    • might not work for keys that hash to the same value. For these we can use key’s native hash code - this is valid if Object.hashCode is overridden. Optionally we can expose a pluggable user callback to ask for ordering.
    • this can be extended to replicated and local caches as well.
    • it is simpler and more efficient than the current DLD mechanism as it doesn’t force one TX to rollback.




         4.  This is similar to 2 but extended to transactions. Let’s say we have Tx1 on node N1 writing to {a,b,c} and Tx2 on N2 writing {d,e,c}. Also consistentHash(c) = {N3, N4}



    If the prepare RPCs happen in the order shown on above diagram then:

    • Tx1 has lock on c@N3
    • Tx2 has lock on c@N4
    • N1 and N2 are in a deadlock
    • not handled by current DLD mechanism as e.g. on node N4 there’s no way to determine who owns the lock on N3 without querying N3 (RPC) for this.
    • what happens?
      • locks on {a,b,c,d,e,f} are being held for (potentially) lockAcquisitionTimeout - defaults to 10 seconds
      • at least of of Tx1/Tx2 won’t succeed - potentially both!
      • bad for throughput!!!


    Suggested optimization:

    • only acquire lock on the main data owner. mainDataOwner(key) =  consistentHash(key).get(0). In the above example the main data owner for “c” is N3
    • multicast the prepare message to other backups as well, just don’t acquire locks there.



    Given the example, what would be the output with the optimization in place?

    Tx1 won’t block on Tx2 when acquiring lock on N4 (RPC 3 in above diagram). It will be able to complete and, after releasing locks allow Tx2 to complete as well.

    What happens when N3 fails after prepare but before receiving the commit message? At this point Tx1 is prepared but still we don’t have any cluster lock acquired for it - what’s to be done?  One way of doing it is by marking the transaction for rollback. Here is a potentially better approach though:

    • N4 is the new data owen for “c”. It has both Tx1 and Tx2 prepare state on it
    • N4 knows that “c” was locked on main owner, but doesn’t know which of Tx1 and Tx2 was the lock owner.
    • when topology change notification is received on N4 it locks “c”  on behalf of both Tx1 and Tx2. This makes sure that another tx won’t be able to acquire locking on “c”
    • N1 will send the commit message to N4 and a new backup (N5). N4 receives the commit and “passes” the “c”‘s lock to Tx1 which completes. Lock is then released.



         5.  This has more to do with lock 2 key association. ATM we associate locks with each cache entries. For transactions that touch multiple keys it might be more efficient just use one lock instance per transaction.