8 Replies Latest reply on Jul 27, 2007 7:59 AM by Manik Surtani

    New Locking Strategy Proposal for JBoss Cache

    Jason Greene Master

      Hello everyone,

      I would like your feedback on a new locking strategy design. This design,
      if ratified, would be targeted for a future version of JBoss Cache (not 2.0.0)

      Modern databases are able to achieve high concurrency by ensuring that
      readers are never blocked by writers. Our current pessimistic locking
      strategy does not offer this precondition, which leads to poor
      performance (lock contention, upgrade exceptions, timeouts, etc) when
      there is a mixture of reads and writes on the same node (Pretty much the
      use case of PojoCache). Also, the current pessimistic model also has a
      dirty-read race condition with READ COMMITTED.

      Optimistic locking solves the concurrency problems by implementing full
      MVCC, this however has other problems. It has memory overhead, since
      every tx has to keep its own node copy. It can't be used with write
      locks on reads (SELECT FOR UPDATE, Option.setForceWriteLock()) and for
      the same reason it can't implement the SERIALIZATION isolation level.
      Last, it has to prevent overwrites (rollback exception), which while
      useful in some cases, it's not always the desired behavior.

      My proposed solution is capable of all isolation levels, high read
      concurrency, and has a relatively low memory footprint. This is done by
      using limited global versioning with write locking, and MVCC for reads:

      An incoming reader is notated as R, a writer as W, the current version
      as v, and the node state is a function N(v). Essentially at any given
      time a reader will see N(v). Once a write occurs, we copy the node,
      N(v)' which only the writer sees. So in short we have:

      R = N(v)
      W = N(v)'

      There can only ever be one writer (one write lock), so therefore only
      one instance of N(v)'. This gives us the essential property "readers
      never blocked by writers" as well as prevents dirty reads.

      Once the writer TX commits, the new version is visible to new readers,
      this will of course require a short lived atomic operation for the swap,
      which gives us READ COMMITTED:
      N(v + 1) = N(v)'
      v = v + 1

      If REPEATABLE READ is desired, on the first read of a node, the TX holds
      a reference to that specific version (this is just a reference! no
      copying, so everyone that hits the same version shares the same
      version), z. This version is used for all read calls, until either a
      write occurs within the same TX or the TX ends.

      z = v
      RR = N(z)

      When a RR TX obtains a write lock on a node we toss the stale version,
      N(z), (which is eventually GCd). The same semantics in READ COMMITTED
      are followed, N(v)' is created as a copy of N(v). Reads and writes from
      the write lock holding TX then use N(v)' This gives us SELECT FOR
      UPDATE.

      READ UNCOMMITTED can be implemented by either substituting READ
      COMMITTED (which is allowed), or by not creating an N(v)' for writes as
      an optimization.

      SERIALIZABLE can be implemented by just obtaining a write lock on all
      reads, N(v)' is not needed as well.

      So, in summary, all isolation modes with the exception of SERIALIZABLE
      will never block on another TX for a read (unless setForceWriteLock() is
      used on the read). Memory footprint will only be 1 additional copy per
      active writer in READ COMMITTED, and in the case of REPEATABLE READ only
      1 additional copy per write commit that is visible to a reading TX.

      I created a nice table to outline the differences between the previous
      locking modes, and this proposal. I refer to the proposal
      as NL (New Locking)

      Item PL OL NL
      ---------------------------------------------------------------
      Non-Blocking Reads N Y Y
      All Isolation Modes Y[1] N[2] Y
      SELECT FOR UPDATE Y N Y
      Prevents write-skew in RR Y Y Y (Optional)
      Low memory overhead Y N Y
      No comparison phase Y N Y
      No copy on read Y N Y
      No copy on write Y N Y (Only RU + S)
      Fail-fast version checking N/A N Y (Optional)
      
      
      1. READ COMMITTED has to use a RW lock to be reliable,
      this isn't the case now (dirty read race).
      2. SERIALIZABLE is not possible
      


      Thoughts?