3 Replies Latest reply on Jun 16, 2015 2:44 AM by rvansa

cannot lock entry after entry owner has been disconnected

jseparovic Jun 14, 2015 12:07 AM

Hi,

I have a 3-node replicated clustered cache backing a high-frequency "lock then process" type application. Each iteration tries to lock a record in the cache; if that succeeds, it processes the record and then (sometimes) writes it back. If the lock fails, it moves on to the next record.
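
For readers who want the loop spelled out, here is a minimal single-JVM sketch of the "lock then process" pattern described above. It is not the poster's code and does not use Infinispan: a `Semaphore` per key stands in for the per-entry cluster lock, and the class and method names (`LockThenProcess`, `tryLock`, `runOnce`) are hypothetical. In the real application the lock attempt would presumably be a pessimistic lock taken inside a JTA transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Local stand-in for the clustered cache. Every name here is hypothetical;
// none of this is Infinispan API.
class LockThenProcess {
    private final Map<String, String> records = new ConcurrentHashMap<>();
    // One binary semaphore per key plays the role of the per-entry lock.
    private final Map<String, Semaphore> locks = new ConcurrentHashMap<>();

    void put(String key, String value) { records.put(key, value); }

    String get(String key) { return records.get(key); }

    private Semaphore lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new Semaphore(1));
    }

    // Waits up to timeoutMs for the lock; false means "could not lock".
    boolean tryLock(String key, long timeoutMs) {
        try {
            return lockFor(key).tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void unlock(String key) { lockFor(key).release(); }

    // One pass over the keys: lock, process, write back, unlock.
    // Keys whose lock cannot be taken in time are skipped.
    List<String> runOnce(List<String> keys, long timeoutMs) {
        List<String> processed = new ArrayList<>();
        for (String key : keys) {
            if (!tryLock(key, timeoutMs)) {
                continue; // lock failed: move on to the next record
            }
            try {
                records.computeIfPresent(key, (k, v) -> v + "-processed");
                processed.add(key);
            } finally {
                unlock(key); // always release, even if processing throws
            }
        }
        return processed;
    }
}
```

The point of the sketch is only the control flow: a failed lock attempt is a normal outcome that skips the record, and the lock is always released in a `finally` block.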

return new ConfigurationBuilder()
    .locking()
        .concurrencyLevel(3)
        .lockAcquisitionTimeout(3000L)
        .isolationLevel(IsolationLevel.REPEATABLE_READ)
        .useLockStriping(false)
    .clustering()
        .cacheMode(CacheMode.REPL_SYNC)
    .transaction()
        .transactionManagerLookup(new JBossTransactionManagerLookup())
        .lockingMode(LockingMode.PESSIMISTIC)
        .transactionMode(TransactionMode.TRANSACTIONAL)
        .autoCommit(false)
    .build();

*version: jboss-as-clustering-infinispan-7.2.0.Final.jar

When all 3 nodes are up, everything operates as expected. But when I "ifdown" the owner of the cache entry, the other 2 nodes cannot obtain a lock on that entry until the owner comes back up.

After bringing down the owner, I get a SuspectException; on the next attempt I get a TimeoutException, which then repeats until the owner node comes back up (this happens on both non-owner nodes):

14-Jun-2015 03:40:28,801 DEBUG [CacheContainer] (pool-12-thread-1) TX: lock failed on 5a594cd4-405c-4c50-9086-5deb0bda6571 : org.infinispan.remoting.transport.jgroups.SuspectException : Suspected member: node1/mycache

Lock info: AbstractPerEntryLockContainer{locks={}}

14-Jun-2015 03:40:28,801 DEBUG [Controller] (pool-12-thread-1) Couldn't Lock: 5a594cd4-405c-4c50-9086-5deb0bda6571

14-Jun-2015 03:40:28,801 DEBUG [CacheContainer] (pool-12-thread-1) TX: rollback

14-Jun-2015 03:40:33,803 DEBUG [Controller] (pool-12-thread-1) CacheContainer lock info: AbstractPerEntryLockContainer{locks={}}

14-Jun-2015 03:40:33,803 DEBUG [CacheContainer] (pool-12-thread-1) TX: begin

14-Jun-2015 03:40:33,804 DEBUG [CacheContainer] (pool-12-thread-1) TX: attempting lock on 5a594cd4-405c-4c50-9086-5deb0bda6571

14-Jun-2015 03:40:36,809 DEBUG [CacheContainer] (pool-12-thread-1) TX: lock failed on 5a594cd4-405c-4c50-9086-5deb0bda6571 : org.infinispan.util.concurrent.TimeoutException : Could not acquire lock on 5a594cd4-405c-4c50-9086-5deb0bda6571 on behalf of transaction GlobalTransaction:<node2:mycache>:9:local. Lock is being held by null

Any ideas on how to handle the stale lock once the SuspectException is raised? Should this be handled by Infinispan?

Cheers,

Jason Separovic

1. Re: cannot lock entry after entry owner has been disconnected

rvansa Jun 15, 2015 3:57 AM (in response to jseparovic)

SuspectExceptions should be handled transparently. I'm not sure whether you only see it in the Infinispan logs or whether it is thrown to the application; the application should only see replication exceptions if the dead node does not reply soon enough (or a TimeoutException on lock acquisition, though I don't think that applies in this case). After the node gets suspected, a rebalance should take place and another node should become the owner (writes should actually be possible even during the rebalance). Marking the node as dead usually takes about 10 to 60 seconds, depending on your JGroups configuration. So in your case you should get exceptions for several seconds after ifdown, but not for too long.
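
If the failures really are transient as described above, the application side could ride them out with a bounded retry. The following is a minimal sketch under that assumption only; `RetryUntilDeadline` and its parameters are hypothetical names, and the `attempt` callback stands for one complete begin/lock/rollback-on-failure cycle in the application. It would not help with a lock that stays stuck indefinitely.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Infinispan or JGroups.
class RetryUntilDeadline {
    // Runs attempt until it returns true or until another pause would
    // overrun the deadline. Returns true if some attempt succeeded.
    static boolean retry(BooleanSupplier attempt, long pauseMs, long deadlineMs) {
        long end = System.nanoTime() + deadlineMs * 1_000_000L;
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() + pauseMs * 1_000_000L > end) {
                return false; // give up: the failure no longer looks transient
            }
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A caller would pass something like `() -> tryLockRecord(key)` (a hypothetical application method) with a deadline somewhat longer than the expected failure-detection and rebalance time, and treat a `false` result as a real error worth reporting.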
2. Re: cannot lock entry after entry owner has been disconnected

jseparovic Jun 15, 2015 3:12 PM (in response to rvansa)

Based on my JGroups config, I can see the SuspectException around 6 seconds after issuing ifdown on node1.

But node2 and node3 then get TimeoutExceptions continuously: "Lock is being held by null". (One test bed has had this null lock since Saturday.)

            <stack name="tcp">
                <transport type="TCP" socket-binding="jgroups-tcp" diagnostics-socket-binding="jgroups-diagnostics-tcp"/>
                <protocol type="TCPPING">
                    <property name="initial_hosts">
                        node1[7600],node2[7600],node3[7600]
                    </property>
                    <property name="num_initial_members">
                        2
                    </property>
                    <property name="port_range">
                        0
                    </property>
                    <property name="timeout">
                        2000
                    </property>
                </protocol>
                <protocol type="MERGE2"/>
                <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd">
                </protocol>
                <protocol type="FD">
                    <property name="timeout">2000</property>
                    <property name="max_tries">3</property>
                </protocol>
                <protocol type="VERIFY_SUSPECT"/>
                <protocol type="BARRIER"/>
                <protocol type="pbcast.NAKACK"/>
                <protocol type="UNICAST2"/>
                <protocol type="pbcast.STABLE"/>
                <protocol type="pbcast.GMS"/>
                <protocol type="UFC"/>
                <protocol type="MFC"/>
                <protocol type="FRAG2"/>
            </stack>
3. Re: cannot lock entry after entry owner has been disconnected

rvansa Jun 16, 2015 2:44 AM (in response to jseparovic)

OK, the SuspectExceptions should definitely be handled, and "Lock is being held by null" seems a bit strange (after the failed lock attempt, it seems that the lock is not held by anyone). Please file a JIRA with logging set to TRACE level on org.infinispan.