5 Replies Latest reply on Jul 27, 2011 6:18 AM by Manik Surtani

    Locks not working as expected in DIST_SYNC cache

    Forman Loop Newbie

      Well, we made it past the joining, but we seem to be missing something on locking.  It seems that we cannot achieve cluster-wide locking.

       

      We are running Infinispan within a JBoss AS 5.1.0.GA cluster of two systems.  Our application accepts JMS messages and places them in the Infinispan cache.  Listeners to the cache on each system awaken and try to lock the just-added entry, process it, and delete it.

       

      The logic in the listener is as follows:

      public void newEntry( CacheEntryCreatedEvent cece ) {
          TransactionManager tm = cache.getAdvancedCache().getTransactionManager();
          try {
              tm.begin();
              String key = cece.getKey().toString();
              cache.getAdvancedCache().lock(key);  // Hoping that this is a cluster-wide lock - one node gets it, the other waits
              if (cache.containsKey(key)) {        // May have been removed by the listener on the other node
                  // ... publish on a particular JMS topic ...
                  cache.remove(key);
              }
              tm.commit();                         // Releases the lock
          } catch (Exception e) {
              // Roll back so the lock is released even on failure
              try { tm.rollback(); } catch (Exception ignored) { }
          }
      }

       

      Unfortunately, we are regularly getting errors like the following:

      2011-07-05 15:33:28,129 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (OOB-1,MY_CLUSTER,MyHost-15542) Execution error:

      org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [ID:JBM-5d328a73-3b57-4a43-b3f4-acc677b1e1e3] for requestor [GlobalTransaction:<OtherHost-43220>:9:remote]! Lock held by [GlobalTransaction:<MyHost-15542>:5:local]

          at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:228)

          at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:155)

          ...

       

      This confuses us.  The work we do while the key is locked is sub-second work. 

      We often see that both nodes get the error on the same key.

       

      The cache is configured as follows:

          GlobalConfiguration gc = GlobalConfiguration.getClusteredDefault();
          gc.setClusterName( "MY_CLUSTER" );
          Properties p = new Properties();
          p.setProperty("configurationFile", "jgroups-tcp.xml");  // Not sure this is required
          gc.setTransportProperties(p);

          Configuration c = new Configuration();
          c.setCacheMode( Configuration.CacheMode.DIST_SYNC );    // ********
          c.setUseEagerLocking(true);                             // Seems necessary
          c.setSyncCommitPhase(true);
          c.setSyncRollbackPhase(true);
      //    c.setUseReplQueue(false);
          c.setIsolationLevel(IsolationLevel.READ_COMMITTED);
          c.setTransactionManagerLookupClass("org.infinispan.transaction.lookup.GenericTransactionManagerLookup" );

          MarshallerFactory foo = (MarshallerFactory) Thread.currentThread().getContextClassLoader()
              .loadClass("org.jboss.marshalling.river.RiverMarshallerFactory").newInstance();

          EmbeddedCacheManager cm = new DefaultCacheManager( gc, c );
          cache = cm.getCache( "MY_CACHE" );

       

      What have we missed?  It seems that the cluster-wide locking is not working for us.

       

      Is our model not a good one for Infinispan use?  Is there something we could do to better match Infinispan's strengths?

      Are we just not configuring things correctly to achieve cluster-wide locking?

      Are we misusing the transactions or lock calls?

       

      Any hints will be appreciated.

        • 1. Re: Locks not working as expected in DIST_SYNC cache
          Manik Surtani Master

          Are you using Infinispan purely as a distributed lock to synchronize access to a JMS topic?  This is abuse of Infinispan.  It isn't a distributed lock API, but rather a distributed data structure.  However, if you really wanted to hack Infinispan to be used as a distributed lock API, you would want to make sure all lock acquisitions are both ordered and pessimistic.  For this, I suggest using a customised JGroups config file (copy the one shipped with Infinispan and modify) to add a total order protocol to the JGroups stack (SEQUENCER does this).  This will slow things down, but will guarantee that cluster-wide lock acquisitions happen in order and you won't see the deadlocks that cause your problem above.
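          For anyone following along, here is a sketch of what that might look like in a copy of the shipped JGroups stack file.  The protocol names around SEQUENCER and its exact position are illustrative only - placement within the stack matters, so check the JGroups manual for where SEQUENCER must sit relative to the reliable-multicast and GMS protocols:

          ```xml
          <!-- Illustrative fragment of a copied jgroups-tcp.xml; only the SEQUENCER line is new. -->
          <config xmlns="urn:org:jgroups">
             <TCP bind_port="7800" />
             <!-- ... discovery, failure detection, reliable delivery, etc. as shipped ... -->
             <pbcast.GMS join_timeout="3000" />
             <!-- Total-order protocol: all multicasts are delivered in the same order on every node -->
             <SEQUENCER />
             <!-- ... flow control, fragmentation, as shipped ... -->
          </config>
          ```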

          • 2. Re: Locks not working as expected in DIST_SYNC cache
            Forman Loop Newbie

            Manik,

             

            Thanks for your reply.

             

            This is not just a distributed lock.  It is basically a backing store for JMS.  We accept JMS messages from various sources and need to reliably deliver them to our clients, again through JMS.  Only one system should publish any particular message to the clients.

             

            Placing the messages into the cache is done by locking the key and then doing the putIfAbsent within a transaction.  A CacheListener awakens on a CacheEntryCreated event.  The listener then locks the key and, if the cache still contains it (it may have been deleted by the other system), publishes the message and deletes the cache entry, all within a transaction.  This should allow only one system to publish the message to the clients.

             

            As in the original post, the cluster-wide locking we were expecting does not seem to happen.

             

            We experimented with SEQUENCER and also with CENTRAL_LOCK/PEER_LOCK locking.  Thanks for the pointers.  However, we are getting deadlock exceptions.  Config details below.

             

            Then we experimented with setting eagerLockSingleNode to true and we seem to get the desired behavior – surprisingly.  We are working in a two-system cluster, and eagerLockSingleNode is documented to acquire the lock on only a single remote node.  It seems to have given us the cluster-wide (two-system) locking behavior we need.  Are we fooling ourselves?

             

            We are configuring the cache with a file patterned off of the example config files with our additions and minor mods:

             

            <?xml version="1.0" encoding="UTF-8"?>
            <infinispan
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:schemaLocation="urn:infinispan:config:4.2 http://www.infinispan.org/schemas/infinispan-config-4.2.xsd"
                  xmlns="urn:infinispan:config:4.2">

               <global>
                  <globalJmxStatistics
                        enabled="true"
                        jmxDomain="org.infinispan"
                        cacheManagerName="SampleCacheManager"/>
                  <transport
                        clusterName="infinispan-cluster"
                        machineId="m1-DJR"
                        rackId="r1" nodeName="Node-A-DJR">
                     <properties>
                        <property name="configurationFile" value="jgroups-tcp.xml" />
                     </properties>
                  </transport>
               </global>

               <default>
                  <locking
                     isolationLevel="READ_COMMITTED"
                     lockAcquisitionTimeout="20000"
                     writeSkewCheck="false"
                     concurrencyLevel="5000"
                     useLockStriping="false"
                  />
                  <transaction
                        transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
                        syncRollbackPhase="true"
                        syncCommitPhase="true"
                        useEagerLocking="true"
                        eagerLockSingleNode="true"
                        cacheStopTimeout="30000" />
                  <deadlockDetection enabled="true" spinDuration="1000"/>
                  <jmxStatistics enabled="true"/>
                  <clustering mode="replication">
                     <stateRetrieval
                        timeout="20000"
                        fetchInMemoryState="false"
                        alwaysProvideInMemoryState="false"
                     />
                     <sync replTimeout="20000"/>
                  </clustering>
               </default>

               <namedCache name="OUR_CACHE">
                  <clustering mode="distribution">
                     <sync/>
                     <hash
                        numOwners="2"
                        rehashWait="120000"
                        rehashRpcTimeout="600000"
                     />
                     <l1
                        enabled="false"
                        lifespan="600000"
                     />
                  </clustering>
               </namedCache>
            </infinispan>

             

            The jgroups-tcp.xml is taken from the examples as well.  It is in there that we added the SEQUENCER block:

               <SEQUENCER ergonomics="false"
                          level="TRACE" />

            We tried this block at both the top and the bottom of the protocol stack, but continued to have deadlock issues.  We saw SEQUENCER in the logged JGroups configuration, but nowhere else.

             

            How can we best achieve our goals?

            • 3. Re: Locks not working as expected in DIST_SYNC cache
              Manik Surtani Master

              Hmm, putIfAbsent and similar atomic operations should not be used within the scope of a transaction as this can provide pretty unexpected behaviour.  See this thread for more details.
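              To illustrate the point with a local, JDK-only analogue (not the Infinispan API - the class and method names below are hypothetical): once a key is held under an explicit lock, a plain check-then-put already gives the "only one winner" behaviour, so the atomic putIfAbsent buys nothing extra and can be replaced with an ordinary put inside the locked section.

              ```java
              import java.util.concurrent.ConcurrentHashMap;
              import java.util.concurrent.locks.ReentrantLock;

              // Single-JVM sketch of "explicit lock + plain put" instead of
              // putIfAbsent inside a transaction.  Names are illustrative.
              public class GuardedPut {
                  public static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
                  private static final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

                  // Analogue of acquiring the per-key lock (cf. advancedCache.lock(key))
                  private static ReentrantLock lockFor(String key) {
                      return locks.computeIfAbsent(key, k -> new ReentrantLock());
                  }

                  // Insert only if absent, guarded by the explicit lock.
                  // Under the lock, containsKey + put is safe; no atomic op needed.
                  public static boolean insertIfAbsent(String key, String value) {
                      ReentrantLock lock = lockFor(key);
                      lock.lock();
                      try {
                          if (cache.containsKey(key)) {
                              return false;           // another caller won; nothing to do
                          }
                          cache.put(key, value);      // plain put suffices under the lock
                          return true;
                      } finally {
                          lock.unlock();              // always release the lock
                      }
                  }
              }
              ```

              The same shape applies to the listener side: lock the key, check containsKey, publish, remove - the lock makes the check-then-act sequence safe without relying on the atomic operations.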

              • 4. Re: Locks not working as expected in DIST_SYNC cache
                Forman Loop Newbie

                Manik,

                 

                Thanks for the pointer and the warning about putIfAbsent.  We read many of the posts and had high hopes that replacing putIfAbsent() with a simpler put() would suddenly make everything work.  Unfortunately, we continued to have problems - some new and solvable (e.g. a ClassNotFoundException on UUID) and others related to locking.

                 

                We're back to wondering if we are trying to make Infinispan do things for which it was not designed.  It seems that having a few writers to the cache, many readers and fairly long data lifetimes is a supported paradigm.

                 

                We have two writers, two readers and data lifetimes in the milliseconds.  We anticipated that eager locking would slow us down, but we expected it to work.  We can't get it to work.

                 

                Are we trying to put a square peg in a round hole?

                 

                Thanks again.

                • 5. Re: Locks not working as expected in DIST_SYNC cache
                  Manik Surtani Master

                  So you still get deadlock exceptions when using a put() instead of a putIfAbsent()?  I noticed that your deadlock detection spin duration is very high.  That means it will take quite a while to detect such deadlocks.  Try something like 100ms.
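                  Concretely, that would mean changing the deadlockDetection line in the configuration posted above to something like the following (spinDuration is in milliseconds):

                  ```xml
                  <deadlockDetection enabled="true" spinDuration="100"/>
                  ```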