2 Replies Latest reply on Jan 14, 2011 3:33 PM by ryanhos

Unable to acquire lock on Fqn error

drcallaway Nov 9, 2009 11:44 AM

During load tests, we keep running into this error:

org.jboss.cache.lock.TimeoutException: Unable to acquire lock on Fqn [/session/c06045ea-bb09-44c6-910d-d84eedc67d3e] after [10000] milliseconds
for requestor [Thread[http-0.0.0.0-8080-199,5,jboss]]! Lock held by [null]

I've tried a number of different configuration changes with no success. Currently, my locking configuration looks like this:

<locking isolationLevel="REPEATABLE_READ" lockParentForChildInsertRemove="false" lockAcquisitionTimeout="10000"
writeSkewCheck="false" useLockStriping="false" concurrencyLevel="1000"/>

I'm also using the JDBCCacheLoader with Oracle 11i. The load tests will run fine for quite a while but will eventually fail with the "Unable to acquire lock on Fqn" error. The session node specified by the FQN is only accessed by a single thread at a time. These session nodes are both read and written to but I've also seen this error on read-only nodes. Any ideas why this might occur? Why does it indicate that the lock is held by "[null]"?

Any help is appreciated.

Dustin

1. Re: Unable to acquire lock on Fqn error

ghurdyl Mar 22, 2010 8:25 AM (in response to drcallaway)
Hello,

I'm reviving this topic because we have the same problem and can't find a solution.
We are using JBoss Cache 3.2.1 to store and retrieve pictures encapsulated in an HttpResponse.
We have this problem when trying to use JDBCCacheLoader (on a SQL Server database) instead of FileCacheLoader.

The lock configuration is as follows:

<locking isolationLevel="READ_COMMITTED" lockAcquisitionTimeout="10000" nodeLockingScheme="mvcc" writeSkewCheck="false" concurrencyLevel="1000" />

A few attempts at changing these values did not significantly affect the behaviour.

The error "org.jboss.cache.lock.TimeoutException: Unable to acquire lock on Fqn [/command_image/style_20/root/3/2/2/2] after [10000] milliseconds for requestor [Thread[http-8080-49,5,main]]! Lock held by [null]"
occurs when stress testing the cache (100 requests in 10 seconds, repeated 10 times with JMeter).
The error tends to appear after a while; the first requests usually pass successfully.

The application that uses JBoss Cache used an earlier version (1.4) before I upgraded it so I may have forgotten something in the migration.

I am new to the JBoss Cache world, so I may have missed something very obvious.

Thanks for any help.

Nicolas.
2. Re: Unable to acquire lock on Fqn error

ryanhos Jan 14, 2011 3:33 PM (in response to ghurdyl)
Those of you stuck on JBoss Cache 3.2.x that are running into this bug can use the following work-around to eliminate this issue.

The root of the problem is, as previously mentioned, the "acquire local lock on modification, but only acquire the remote lock on commit" pattern (actually, the remote lock is acquired on prepare(), but JBoss Cache's prepare() is called during JTA's commit()). The answer is to prevent the deadlock from ever occurring by denying lock requests that would create such a deadlock.

1. Determine which LockManager your configuration uses. Inspect LockManagerFactory and your current <jbosscache> XML or runtime configuration to determine which one gets constructed for you.
2. MyCompanyCustomLockManager extends JBCLockManagerFromStepOne.
3. Intercept every visible lock() method.
4. Implement shouldLockBeGranted(Object, GlobalTransaction) throws CacheException, call it during each of the intercepted methods. Throw CacheException when the lock should not be granted.
5. Install your custom Lock Manager.
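To make the interception steps concrete, here is a minimal sketch of the pattern. It deliberately uses stand-in types: BaseLockManager, GuardedLockManager, and the simplified lock(String, Object, long) signature are assumptions made for illustration, not the JBoss Cache classes or their real method signatures. Only the names CacheException, WriteLockDeniedException, and shouldLockBeGranted come from this post.

```java
// Stand-ins for illustration only; these are NOT the JBoss Cache types.
class CacheException extends RuntimeException {
    CacheException(String message) { super(message); }
}

class WriteLockDeniedException extends CacheException {
    WriteLockDeniedException(String message) { super(message); }
}

/** Hypothetical stand-in for the lock manager identified in step 1. */
class BaseLockManager {
    boolean lock(String fqn, Object owner, long timeoutMillis) {
        return true; // pretend the underlying lock was acquired
    }
}

/** Runs a grant check before every lock() call, then delegates to the parent. */
abstract class GuardedLockManager extends BaseLockManager {
    /** Throws CacheException when granting the lock would set up a deadlock. */
    protected abstract void shouldLockBeGranted(String fqn, Object owner);

    @Override
    boolean lock(String fqn, Object owner, long timeoutMillis) {
        shouldLockBeGranted(fqn, owner); // deny early instead of timing out later
        return super.lock(fqn, owner, timeoutMillis);
    }
}

class LockManagerSketch {
    public static void main(String[] args) {
        // Toy policy: the owner named "loser" is always denied.
        GuardedLockManager manager = new GuardedLockManager() {
            @Override
            protected void shouldLockBeGranted(String fqn, Object owner) {
                if ("loser".equals(owner)) {
                    throw new WriteLockDeniedException("lock on " + fqn + " denied for " + owner);
                }
            }
        };
        System.out.println(manager.lock("/foo", "winner", 10000));
        try {
            manager.lock("/foo", "loser", 10000);
        } catch (WriteLockDeniedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In a real subclass, every overloaded lock method that the parent exposes would need the same guard, otherwise some call paths bypass the check.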

On installing your custom lock manager: If you are permitted to modify and repackage OSS code on your project, you're home free. If not, keep reading. JBoss Cache contains a homegrown DI framework. You can abuse this DI framework to inject your own LockManager.

1. Implement MyCompanyCacheFactory extends DefaultCacheFactory. Update your configuration/code to use this cache factory instead.
2. Make sure this is called before cache start(): componentRegistry.registerComponent(new MyCompanyCustomLockManager(), LockManager.class);
3. Annotate MyCompanyCustomLockManager as @NonVolatile. This is a violation of the spirit of that annotation, but the component registry purges all volatile components during the cache start() phase. The caveat here is that the LockManager will be fixed, regardless of changes to the <locking> portion of the configuration. Remember, we're just trying to duct-tape a broken bit of software that we're stuck with, not make durable, maintainable software.

As for the algorithm of shouldLockBeGranted(Object, GlobalTransaction), I cannot give you the code. It must be deterministic. Each node must be able to calculate the superiority of one lock request over another without communicating with the other nodes. You must create some artificial method of ordering GlobalTransactions. The primary key of a GlobalTransaction is a JGroups Address and a java.lang.Long transaction ID. That artificial ordering may not be fair, but it does at least allow one process to win, while the other one is told that it requested a write which would have ended in a deadlock. (e.g. WriteLockDeniedException extends CacheException).
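For readers who want a starting point, the following is one possible shape for such an ordering. It is not the code used in this post (which was not shared), only a sketch under assumptions: TxKey is a hypothetical stand-in for the GlobalTransaction key, the address is reduced to its string form, and "address first, then transaction ID" is an arbitrary choice. Treating a request from the current holder as grantable is also an assumption.

```java
import java.util.Comparator;

/** Hypothetical stand-in for a GlobalTransaction key: originating address plus a long id. */
final class TxKey {
    final String address; // stand-in for the JGroups Address, here just its string form
    final long id;

    TxKey(String address, long id) {
        this.address = address;
        this.id = id;
    }
}

final class TxOrdering {
    /** Arbitrary but deterministic: compare the address first, then the transaction id. */
    static final Comparator<TxKey> ORDER =
            Comparator.comparing((TxKey k) -> k.address).thenComparingLong(k -> k.id);

    /**
     * Grants when nothing holds the lock, when the requester is the holder,
     * or when the requester sorts strictly before the current holder.
     */
    static boolean shouldLockBeGranted(TxKey requester, TxKey holder) {
        if (holder == null) {
            return true; // a request always beats "no existing lock"
        }
        return ORDER.compare(requester, holder) <= 0;
    }
}
```

Because the comparison uses only data carried by the two transaction keys, every cluster node reaches the same verdict for the same pair without any extra communication.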

3 Cluster Nodes: A, B, and C. 2 Transactions: X and Y. 1 Cache node: "foo".
Assume that the artificial ordering of GlobalTransactions places Y before X (Y < X).

A: cache.put(foo, bar, bat); "foo" is now locally locked for TX X.
A: Create GlobalTransaction X, associate with JTA transaction.
C: cache.put(foo, bar, boo); "foo" is now locally locked for TX Y.
C: Create GlobalTransaction Y, associate with JTA transaction.
(remember, time ordering doesn't matter here. The TX ordering is artificial and not fair, just deterministic).
A: commit();
B: received request for lock on "foo" for TX X. shouldLockBeGranted == X < null == true. Granted. (a lock request is always considered superior to "no existing lock", i.e. getWriteOwner() == null).
C: commit();
B: received request for lock on "foo" for TX Y. shouldLockBeGranted == Y < X == true. Granted, but waiting lockAcquisitionTimeout millis until lock is available.
A: received request for lock on "foo" for TX Y. shouldLockBeGranted == Y < X == true. Granted, but waiting.
C: received request for lock on "foo" from TX X. shouldLockBeGranted == X < Y == FALSE. Not granted.
A: prepare() failed on C. Okay, tell everybody to abort, unlocking "foo"
B: got abort for TX X. unlock "foo".
B: lock foo is now available for TX Y
A: everybody aborted, time for me to abort. lock is now available for TX Y.
C: Got locks for TX Y from every other Node (A, B). prepare() done. commit().
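
The lock decisions in this sequence can be replayed on a toy model. The sketch below is an illustration only: NodeLockTable and its Outcome values are hypothetical names, each cluster node is reduced to a map from cache node name to holding transaction, and the ordering Y < X is hard-wired through a reversed string comparison. Real waiting, timeouts, and network messages are not modelled.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

/** Toy per-cluster-node lock table: cache node name -> transaction holding the lock. */
final class NodeLockTable {
    enum Outcome { GRANTED, GRANTED_BUT_WAITING, DENIED }

    private final Map<String, String> holderByFqn = new HashMap<>();
    private final Comparator<String> order;

    private NodeLockTable(Comparator<String> order) {
        this.order = order;
    }

    /** Reversed string order, so that "Y" sorts before "X" as assumed in the walkthrough. */
    static NodeLockTable withWalkthroughOrdering() {
        return new NodeLockTable(Comparator.<String>reverseOrder());
    }

    Outcome request(String fqn, String tx) {
        String holder = holderByFqn.get(fqn);
        if (holder == null || holder.equals(tx)) {
            holderByFqn.put(fqn, tx);
            return Outcome.GRANTED;
        }
        // The requester may queue only if it sorts before the current holder.
        return order.compare(tx, holder) < 0 ? Outcome.GRANTED_BUT_WAITING : Outcome.DENIED;
    }

    /** Releases the lock only if the given transaction is the one holding it. */
    void release(String fqn, String tx) {
        holderByFqn.remove(fqn, tx);
    }
}

class WalkthroughReplay {
    public static void main(String[] args) {
        NodeLockTable a = NodeLockTable.withWalkthroughOrdering();
        NodeLockTable b = NodeLockTable.withWalkthroughOrdering();
        NodeLockTable c = NodeLockTable.withWalkthroughOrdering();

        a.request("foo", "X"); // A: local put under TX X
        c.request("foo", "Y"); // C: local put under TX Y

        System.out.println("B <- X: " + b.request("foo", "X")); // GRANTED
        System.out.println("B <- Y: " + b.request("foo", "Y")); // GRANTED_BUT_WAITING
        System.out.println("A <- Y: " + a.request("foo", "Y")); // GRANTED_BUT_WAITING
        System.out.println("C <- X: " + c.request("foo", "X")); // DENIED

        // TX X aborts everywhere, which frees "foo" for TX Y.
        a.release("foo", "X");
        b.release("foo", "X");
        System.out.println("B <- Y after abort: " + b.request("foo", "Y")); // GRANTED
        System.out.println("A <- Y after abort: " + a.request("foo", "Y")); // GRANTED
    }
}
```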

Caveats and notes:
All of our transactions only involve a single cache node. It's a cache, not a database...
We use MVCC locking. If you're not, YMMV.
This was my best effort within the constraints of my project. It works for me with 4 JBoss cluster nodes, 40 cache writes per second per cluster node, performed randomly on a pool of 500 cache nodes. Some nodes win, some nodes get the WriteLockDeniedException. It's better than everybody waiting 15 seconds for a TimeoutException.
The inherent unfairness of the artificial ordering of transactions is mitigated by the fact that we use node.replace(key, oldValue, newValue) to guarantee that good cache state is not overwritten.
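
To show why a conditional replace protects good state, here is a small sketch that uses java.util.concurrent.ConcurrentMap.replace(key, oldValue, newValue) as a stand-in for the cache node's three-argument replace. The map, the key, and the writeIfUnchanged helper are assumptions made for illustration; the point is only that a write based on a stale read is rejected rather than applied.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConditionalReplaceSketch {
    /** Writes next only if the stored value is still the one the caller read earlier. */
    static boolean writeIfUnchanged(ConcurrentMap<String, String> node,
                                    String key, String expected, String next) {
        return node.replace(key, expected, next);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> node = new ConcurrentHashMap<>();
        node.put("bar", "v1");

        // Two writers read the same starting value.
        String seenByFirst = node.get("bar");
        String seenBySecond = node.get("bar");

        // The first write succeeds; the second is based on a stale read and is rejected.
        boolean firstWon = writeIfUnchanged(node, "bar", seenByFirst, "v2-from-first");
        boolean secondWon = writeIfUnchanged(node, "bar", seenBySecond, "v2-from-second");

        System.out.println(firstWon + " " + secondWon + " " + node.get("bar"));
    }
}
```

The losing writer can then re-read the current value and decide whether its update still makes sense.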