8 Replies Latest reply on Nov 15, 2010 11:23 AM by infinikli

    Explicit locking in DIST_SYNC mode

    infinikli Newbie

      Hello everybody.

       

      We're currently evaluating Infinispan as a data grid. Because we need to control access to the shared objects, we need a locking mechanism. The idea is to lock data access in the following manner:


          public <K,V> boolean performLockedCacheEntryAction(ItemAction<V> action, K key) throws NotSupportedException, SystemException {
              TransactionManager tm = null;
              try {
                  Cache<K, V> c =  cacheManager.getCache();
                  /*
                   * Obtain TransactionManager.
                   */
                  tm = c.getAdvancedCache().getTransactionManager();
                 
                  /*
                   * Begin transaction.
                   */
                  tm.begin();
                  /*
                   * Lock key clusterwide to avoid concurrent access on the action.
                   */
                  c.getAdvancedCache().lock(key);
                 
                  /*
                   * Perform action.
                   */
                  boolean changedState = action.doAction();
                  if(changedState){
                      /*
                       * Put object into cache when state has changed.
                       */
                      c.put(key, action.getItem());
                  }
                  /*
                   * Commit transaction and release locks.
                   */
                  tm.commit();
                  return changedState;
              } catch (Throwable t) {
                  logger.error(t,t);
                  if(tm!=null && tm.getStatus() == Status.STATUS_ACTIVE){
                      logger.debug("Rolling back transaction "+tm);
                      tm.rollback();
                  }
                  return false;
              }
          }

       

      However when we test this code-snippet on two nodes, the action is sometimes performed twice. The Cache is configured as follows:

       

              Configuration config = new Configuration();
              config.setCacheMode(CacheMode.DIST_SYNC);
              config.setL1CacheEnabled(true);
              config.setL1Lifespan(60000);
              config.setTransactionManagerLookupClass("org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup");
              config.setEagerLockSingleNode(true);       
              cacheManager = new DefaultCacheManager(GlobalConfiguration.getClusteredDefault());

       

      Are we missing something?

      N.B.: We are using the 4.2.0.BETA1 artifacts. All earlier versions had massive locking problems, which led to TimeoutExceptions while acquiring the lock.

        • 1. Re: Explicit locking in DIST_SYNC mode
          Manik Surtani Master

          I'm not sure I understand what you mean by the action is sometimes performed twice.  Is your method (performLockedCacheEntryAction) called in a loop?  By multiple threads?  Both?

          • 2. Re: Explicit locking in DIST_SYNC mode
            infinikli Newbie

            We run this action on two different nodes simultaneously.

            Node A: putIntoCache(key1, objectXY) @ T = 1

            Node A: performLockedCacheEntryAction(actionXY,key1) @ T=2

            Node B: performLockedCacheEntryAction(actionXY,key1) @ T=2

            Where the action changes the state of the objectXY (with 'key1' as key in our distributed cache).

            • 3. Re: Explicit locking in DIST_SYNC mode
              Manik Surtani Master

              Yes, in this case performLockedCacheEntryAction would happen twice, if the tx on NodeA and the tx on NodeB don't overlap.  Or one may block on the other to finish, and then run.

              • 4. Re: Explicit locking in DIST_SYNC mode
                infinikli Newbie

                Ok, my fault. I should have explained what the action is doing:

                boolean doAction() {
                     if (item.state == 0) {
                          item.state = 1;     // state changed
                          return true;
                     }
                     return false;            // state not changed
                }

                The problem is that this method returns true on both nodes.

                • 5. Re: Explicit locking in DIST_SYNC mode
                  Erik Salter Newbie

                  You mentioned lots of timeouts.  Can you try removing the L1 caching?  See https://jira.jboss.org/browse/ISPN-763 for more details.

                   

                  Also, I'm unsure of where you're getting your state -- the item.state value returned by doAction().  From the code snippet above, it doesn't look like it's coming from the cache.  You might be missing a get() call after you lock the key.

                  • 6. Re: Explicit locking in DIST_SYNC mode
                    infinikli Newbie

                    The item above is shared across the cache. Before performing the action, we create an Action object that takes this cached item as a constructor parameter.

                     

                    //getFromCache performs a get() on the cache.

                    ItemTestImpl itl = cacheManager.<Integer,ItemTestImpl>getFromCache(currentCacheName, key);

                    //create the action
                    TestAction testAction = new TestAction(itl);

                    //perform the action
                    boolean intoCache = cacheManager.performLockedCacheEntryAction(currentCacheName, testAction, key);

                    if(intoCache) logger.info("Action performed");

                    • 7. Re: Explicit locking in DIST_SYNC mode
                      Erik Salter Newbie

                      Have you read the locking and transaction sections of the wiki?  Specifically, ISPN uses MVCC -- in a nutshell, reads don't block writers.  The default isolation level is READ_COMMITTED as well.

                       

                      It's certainly possible, given your chain of events, that the following is happening.

                       

                      Node A: putIntoCache(key1, objectXY).  Value is 1.

                      Node A: Reads value of objectXY.  Last committed value is 1.

                      Node B: Reads value of objectXY.  Last committed value is 1.

                      Node A: performLockedCacheEntryAction(actionXY,key1)

                      Node B: performLockedCacheEntryAction(actionXY,key1)

                       

                      It seems that to get the behavior you want, you might need to either change the isolation level to REPEATABLE_READ and handle the possibility of a write-skew error, or explicitly lock the cache key before reading the value.
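
                      To make the difference between the two orderings concrete, here is a minimal single-JVM sketch. It is not Infinispan code: the class LockBeforeReadSketch, its Item type, and the ReentrantLock standing in for lock(key) are all assumptions made for illustration, and copy() only mimics each node receiving its own instance of the cached value. In the method from the original post, the second ordering would roughly correspond to calling get() on the cache after lock(key), inside the same transaction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// In-JVM stand-in for the scenario in this thread; hypothetical names, not Infinispan API.
final class LockBeforeReadSketch {

    // Hypothetical cached value; copy() mimics each caller getting its own instance.
    static final class Item {
        int state;
        Item(int state) { this.state = state; }
        Item copy() { return new Item(state); }
    }

    private final Map<String, Item> store = new ConcurrentHashMap<>();
    private final ReentrantLock keyLock = new ReentrantLock(); // stands in for lock(key)

    void put(String key, Item item) { store.put(key, item.copy()); }

    Item get(String key) { return store.get(key).copy(); }

    // The check-and-set from the thread: flips state 0 -> 1 and reports whether it changed.
    static boolean doAction(Item item) {
        if (item.state == 0) {
            item.state = 1;
            return true;
        }
        return false;
    }

    // Ordering from the question: the item was read before the lock was taken.
    boolean actOnPreReadItem(String key, Item preRead) {
        keyLock.lock();
        try {
            boolean changed = doAction(preRead);
            if (changed) put(key, preRead);
            return changed;
        } finally {
            keyLock.unlock();
        }
    }

    // Suggested ordering: take the lock first, then read the current value.
    boolean lockThenReadAndAct(String key) {
        keyLock.lock();
        try {
            Item current = get(key);
            boolean changed = doAction(current);
            if (changed) put(key, current);
            return changed;
        } finally {
            keyLock.unlock();
        }
    }

    // Two callers both read state 0 before either one locks.
    static int successesWithPreRead() {
        LockBeforeReadSketch cache = new LockBeforeReadSketch();
        cache.put("key1", new Item(0));
        Item readOnA = cache.get("key1");
        Item readOnB = cache.get("key1");
        int successes = 0;
        if (cache.actOnPreReadItem("key1", readOnA)) successes++;
        if (cache.actOnPreReadItem("key1", readOnB)) successes++;
        return successes;
    }

    // Two callers each read only after holding the lock.
    static int successesWithLockThenRead() {
        LockBeforeReadSketch cache = new LockBeforeReadSketch();
        cache.put("key1", new Item(0));
        int successes = 0;
        if (cache.lockThenReadAndAct("key1")) successes++;
        if (cache.lockThenReadAndAct("key1")) successes++;
        return successes;
    }

    public static void main(String[] args) {
        System.out.println("read before lock: " + successesWithPreRead());      // prints 2
        System.out.println("lock before read: " + successesWithLockThenRead()); // prints 1
    }
}
```

                      In the first ordering the lock serializes the two calls, but each caller still acts on the copy it read earlier, so both report a state change. Reading under the lock lets the second caller see the first caller's write.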

                      • 8. Re: Explicit locking in DIST_SYNC mode
                        infinikli Newbie

                        Ok, thanks for the response. I finally got the point.