8 Replies Latest reply on Jan 28, 2012 5:41 PM by manik

    Write Skew issue (versioning)

    pruivo

      Hi,

       

      I think I have spotted a problem with the write skew check implementation based on versioning.

       

      I've made this test to confirm:

       

      I have a global counter that is incremented concurrently by two different nodes, running ISPN with Repeatable Read and the write skew check enabled. I expected each successfully committed transaction to commit a different value.

       

      In detail, each node does the following:

       

      beginTx

      Integer count = cache.get("counter");

      count = count + 1;

      cache.put("counter", count);

      commitTx
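
      The property this test checks (no two successful transactions commit the same value) can be sketched without a cluster. The following is a minimal, self-contained stand-in, not Infinispan code: a ConcurrentHashMap plays the cache, two threads play the two nodes, and the atomic replace(key, expected, new) call is an assumed substitute for a working write skew check. The class and method names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CounterDuplicateCheck {

    // Stand-in for the cache: a plain in-memory map, NOT an Infinispan cache.
    static final ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();

    // Every value that a successful "transaction" committed.
    static final Set<Integer> committed = ConcurrentHashMap.newKeySet();

    // Number of times a committed value had already been committed before.
    static final AtomicInteger duplicates = new AtomicInteger();

    // One attempt: read, increment, commit only if the counter is unchanged.
    static boolean tryIncrement() {
        Integer read = cache.get("counter");
        Integer next = read + 1;
        // Assumption: atomic compare-and-replace stands in for the write skew check.
        boolean ok = cache.replace("counter", read, next);
        if (ok && !committed.add(next)) {
            duplicates.incrementAndGet();
        }
        return ok;
    }

    // Runs two "nodes" concurrently and returns the final counter value.
    static int run(int attemptsPerNode) {
        cache.put("counter", 0);
        committed.clear();
        duplicates.set(0);
        Runnable node = () -> {
            for (int i = 0; i < attemptsPerNode; i++) {
                tryIncrement(); // a failed attempt corresponds to an aborted transaction
            }
        };
        Thread a = new Thread(node);
        Thread b = new Thread(node);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return cache.get("counter");
    }

    public static void main(String[] args) {
        int finalValue = run(10_000);
        System.out.println("commits=" + committed.size()
                + ", counter=" + finalValue
                + ", duplicates=" + duplicates.get());
    }
}
```

      With a correct check, the number of distinct committed values equals the final counter value and the duplicate count stays at zero. Repeated values, as seen on 5.1.0.CR4, would show up as a non-zero duplicate count.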

       

      To rule out errors, I ran this test on two ISPN versions: 5.1.0.CR4 and 5.0.1.Final. On 5.0.1.Final it works as expected. On 5.1.0.CR4, however, I get a lot of repeated values. After a first look at the code, my impression is that the version numbers of the keys on which the write skew check should run are not sent with the prepare command.

       

      The ISPN config file can be found here: http://pastebin.com/UCxGXw3K

       

      Cheers,

      Pedro Ruivo

        • 1. Re: Write Skew issue (versioning)
          manik

          Hi Pedro. 

           

          I don't understand how this could have worked in 5.0.x, since write skew checks in a cluster were not supported until 5.1. 

           

          Are you testing local mode?

           

          Cheers

          Manik

          • 2. Re: Write Skew issue (versioning)
            mircea.markus

            One way or the other, you shouldn't get duplicate counter values, right?

            • 3. Re: Write Skew issue (versioning)
              pruivo

              Hi,

               

              I'm testing in replicated mode (full replication).

               

              In 5.0.x it works because of the locking scheme. In more detail, two cases can happen (each shown as a list of events):

               

              1) write skew is detected:

               

              localTx reads "counter" and gets the value x

              remote prepare (remoteTx) is received

              remoteTx acquires lock on "counter"

              localTx tries to acquire lock on "counter"

              remoteTx updates "counter" to x+1

              remoteTx releases the lock

              localTx acquires the lock

              localTx detects that "counter"'s value is x+1 and aborts (see [1])

               

              2) deadlock/timeout while acquiring the locks:

               

              localTx reads "counter" and gets the value x

              localTx acquires the lock on "counter"

              remote prepare (remoteTx) is received

              remoteTx tries to acquire lock on "counter"

               

              deadlock is detected (or a timeout is triggered)

               

              For 5.1.x, I was expecting behavior like this:

               

              localTx reads "counter" and gets the value x (version y)

              remote prepare (remoteTx) is received and updates the "counter" to x+1 (version y+1)

              localTx sends the prepare command and the coordinator performs the write skew check

               

              The coordinator detects that the read version (y) is different from the actual version (y+1) and aborts the transaction
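
              The expected 5.1.x sequence above can be sketched as follows. This is a simplified, single-process illustration under stated assumptions, not the Infinispan implementation: the Coordinator and Versioned classes and the prepareAndCommit method are hypothetical names, and prepare and commit are collapsed into one step.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionedWriteSkewSketch {

    // A value plus a version that grows by one on every committed write.
    static final class Versioned {
        final int value;
        final long version;

        Versioned(int value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    // Hypothetical stand-in for the node that validates prepare commands.
    static final class Coordinator {
        private final Map<String, Versioned> store = new HashMap<>();

        Coordinator() {
            store.put("counter", new Versioned(0, 0));
        }

        synchronized Versioned read(String key) {
            return store.get(key);
        }

        // The prepare carries the version the transaction read.
        // A mismatch with the current version is treated as write skew.
        synchronized boolean prepareAndCommit(String key, long versionRead, int newValue) {
            Versioned current = store.get(key);
            if (current.version != versionRead) {
                return false; // abort: another transaction committed after our read
            }
            store.put(key, new Versioned(newValue, current.version + 1));
            return true;
        }
    }

    public static void main(String[] args) {
        Coordinator c = new Coordinator();
        Versioned localRead = c.read("counter");  // localTx reads x (version y)
        Versioned remoteRead = c.read("counter"); // remoteTx reads the same entry
        boolean remoteOk = c.prepareAndCommit("counter", remoteRead.version, remoteRead.value + 1);
        boolean localOk = c.prepareAndCommit("counter", localRead.version, localRead.value + 1);
        // prints "remote committed: true, local committed: false"
        System.out.println("remote committed: " + remoteOk + ", local committed: " + localOk);
    }
}
```

              In this sketch the check only works because versionRead travels with the prepare. If the read version were missing, the coordinator would have nothing to compare against the current version.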

               

              This is my "definition" of write skew.

               

              Cheers,

              Pedro

               

               

              [1] in RepeatableReadEntry

              if (actualValue != null && actualValue != value) {

                log.unableToCopyEntryForUpdate(getKey());

                throw new CacheException("Detected write skew");

              }
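
              For comparison, case 1 of the 5.0.x behavior (lock first, then compare the stored value with the value read) can be sketched in isolation. This is only an illustration of the shape of the check in [1], not the actual Infinispan code path: the Entry and Tx classes are hypothetical, and the write is applied as soon as the lock is held rather than at commit time.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockedValueCheckSketch {

    // Hypothetical stand-in for the stored "counter" entry and its lock.
    static final class Entry {
        final ReentrantLock lock = new ReentrantLock();
        Object stored;

        Entry(Object initial) {
            this.stored = initial;
        }
    }

    // A transaction remembers the exact object it read (its repeatable-read snapshot).
    static final class Tx {
        private final Entry entry;
        private final Object valueRead;

        Tx(Entry entry) {
            this.entry = entry;
            this.valueRead = entry.stored;
        }

        void write(Object newValue) {
            entry.lock.lock();
            try {
                Object actualValue = entry.stored;
                // Same shape as the check in [1]: a reference comparison, not equals().
                if (actualValue != null && actualValue != valueRead) {
                    throw new IllegalStateException("Detected write skew");
                }
                entry.stored = newValue;
            } finally {
                entry.lock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        Entry counter = new Entry(Integer.valueOf(0));
        Tx localTx = new Tx(counter);  // localTx reads x
        Tx remoteTx = new Tx(counter); // remoteTx reads x as well
        remoteTx.write(Integer.valueOf(1)); // remoteTx locks, updates to x+1, unlocks
        try {
            localTx.write(Integer.valueOf(1)); // localTx locks and sees x+1
        } catch (IllegalStateException e) {
            System.out.println("localTx aborted: " + e.getMessage());
        }
    }
}
```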

              • 4. Re: Write Skew issue (versioning)
                manik

                No, in 5.0.x you may still get dupes.

                • 5. Re: Write Skew issue (versioning)
                  manik

                  BTW is this unit test in a form that can be added to the Infinispan codebase?  If you could fork the project and create a pull request with a commit containing the test that would be great.

                  • 6. Re: Write Skew issue (versioning)
                    pruivo

                    No. The code was implemented in a modified version of radargun... However, I can try to implement it as a unit test this weekend if you are interested.

                     

                    How hard is it to implement a unit test?

                    • 7. Re: Write Skew issue (versioning)
                      pruivo

                      I have made a pull request with the test case. This is the first time I have created a test case and a pull request. If anything is wrong, please let me know.

                       

                      Cheers,

                      Pedro

                      • 8. Re: Write Skew issue (versioning)
                        manik

                        Thanks for the test case.  I've incorporated this into Infinispan's test suite.  The bug is documented here and fixed here.