-
1. Re: Lock striping broken for Second Level Cache use case
manik Mar 18, 2009 5:14 AM (in response to brian.stansberry)This is always going to be a problem when you have shared locks, and increasing concurrency level will only mitigate the problem, not resolve it altogether.
There is, unfortunately, a fundamental problem with a lock-per-fqn scheme, in that the lock needs to exist somewhere, even on a non-existent Fqn (for creates, or removes on nodes that don't exist).
We had this problem in PL since locks (per node) were stored in the node, and it involved all sorts of hacks to get it to work, severely degrading performance and still exposing a few race conditions. Not pleasant stuff.
Let me think about alternate approaches here, and what can be done. -
2. Re: Lock striping broken for Second Level Cache use case
manik Mar 18, 2009 5:38 AM (in response to brian.stansberry)One approach that may work is to use a ConcurrentMap as a lock container rather than an array of fixed size. Instead of mapping locks to a modularised hashcode of a given Fqn as we do now, we could use the Fqn itself as a key to a lock, and use concurrent methods on the map to locate the lock, creating new ones if absent, etc. It will perform worse (both in terms of memory and lock locating, since unnecessary locks may be created and thrown away) but it would prevent the deadlock issue mentioned above.
Also, it would be simple enough to implement as an alternate LockContainer impl and plug in. Let me give this a bit more thought. -
3. Re: Lock striping broken for Second Level Cache use case
manik Mar 18, 2009 5:43 AM (in response to brian.stansberry)I have created a JIRA to track this:
https://jira.jboss.org/jira/browse/JBCACHE-1494 -
4. Re: Lock striping broken for Second Level Cache use case
manik Mar 18, 2009 5:44 AM (in response to brian.stansberry)Ok, the solution I proposed above is broken. The map would grow infinitely and be a memory leak since there is no way to remove the lock from the concurrent map, unless there is some further synchronization, which will make things even less performant. More to think about.
-
5. Re: Lock striping broken for Second Level Cache use case
manik Mar 18, 2009 5:58 AM (in response to brian.stansberry)Ok, this can be done, but it won't be altogether straightforward:
getLock(Fqn) { 1. create new lock. 2. putIfAbsent on the concurrent map that holds locks, swapping to the actual lock in the map if needed 3. acquire lock 4. after acquisition, check again that the lock is the current lock in the concurrent map, if not, release lock and retry this method. } releaseLock(Fqn) { 1. get lock from map 2. if not null, and if owner, attempt to release lock 3. remove lock from collection }
While this will work, this will cause waiters to spin on getLock().4 until it has got the same lock it expects. In releaseLock().3, this will cause threads that release locks to clean up, but will cause additional spinning on getLock().4. Basically, in getLock(), a lock is only valid if you created it in getLock().1. -
6. Re: Lock striping broken for Second Level Cache use case
manik Mar 18, 2009 6:00 AM (in response to brian.stansberry)Actually, getLock().4 should read:
4. after acquisition, test that lock acquired is same as lock created in 1, otherwise unluck, discard lock and try again. -
7. Re: Lock striping broken for Second Level Cache use case
manik Mar 18, 2009 7:45 AM (in response to brian.stansberry)Ok, I have checked in something that may work, have a look at the JIRA for details.
This does not impact existing code in any way since it is controlled by a config switch which defaults to using lock striping. If anyone is interested in building trunk and giving this a try beyond my unit tests, feedback would be appreciated. -
8. Re: Lock striping broken for Second Level Cache use case
brian.stansberry Mar 18, 2009 1:28 PM (in response to brian.stansberry)I tried a locally built snapshot of trunk with dukehoops' test and the problem goes away.
Can you use a putIfAbsent instead of a get to try to preserve the original lock? Idea being that if threads are waiting on the original lock you try to keep using it rather than having them queue up on a new lock.### Eclipse Workspace Patch 1.0 #P jbosscache-core Index: src/main/java/org/jboss/cache/util/concurrent/locks/PerElementLockContainer.java =================================================================== --- src/main/java/org/jboss/cache/util/concurrent/locks/PerElementLockContainer.java (revision 7915) +++ src/main/java/org/jboss/cache/util/concurrent/locks/PerElementLockContainer.java (working copy) @@ -56,7 +56,8 @@ Lock l = getLock(object); l.lock(); // now check that the lock is still valid... - if (l != locks.get(object)) + Lock official = locks.putIfAbsent(object, l); + if (official != null && l != official) { // we acquired the wrong lock! l.unlock();
(The above would be applied to the overloaded acquireLock(E, long. TimeUnit) as well.)
There is still a possibility that after the lock owner removes the lock from the map in releaseLock() another thread could sneak in and create a new lock. But that's not incorrect, just not fair. -
9. Re: Lock striping broken for Second Level Cache use case
jason.greene Mar 18, 2009 1:49 PM (in response to brian.stansberry)"manik.surtani@jboss.com" wrote:
Actually, getLock().4 should read:
4. after acquisition, test that lock acquired is same as lock created in 1, otherwise unluck, discard lock and try again.
Why is this step needed? If everything is using putIfAbsent you shouldn't have a problem. -
10. Re: Lock striping broken for Second Level Cache use case
jason.greene Mar 18, 2009 2:07 PM (in response to brian.stansberry)"jason.greene@jboss.com" wrote:
"manik.surtani@jboss.com" wrote:
Actually, getLock().4 should read:
4. after acquisition, test that lock acquired is same as lock created in 1, otherwise unluck, discard lock and try again.
Why is this step needed? If everything is using putIfAbsent you shouldn't have a problem.
Ah I see this is to handle the case of a blocked/queued thread acquiring a lock. Hmm this kind of defeats the performance benefit of aqs lock queueing -
11. Re: Lock striping broken for Second Level Cache use case
jason.greene Mar 18, 2009 2:15 PM (in response to brian.stansberry)"bstansberry@jboss.com" wrote:
Can you use a putIfAbsent instead of a get to try to preserve the original lock? Idea being that if threads are waiting on the original lock you try to keep using it rather than having them queue up on a new lock.
Yes this is better, although still sucks when you have contended access on a Lock. I think it might make more sense to keep locks around and evict them later, either when the key is removed/evicted, or some other special lock eviction algorithm that uses timestamps. -
12. Re: Lock striping broken for Second Level Cache use case
brian.stansberry Mar 18, 2009 2:24 PM (in response to brian.stansberry)PerElementLockContainer.locks is using the default CHM c'tor which means 16 segments. That's low since this will be a very write-heavy map.
-
13. Re: Lock striping broken for Second Level Cache use case
jason.greene Mar 18, 2009 2:28 PM (in response to brian.stansberry)"jason.greene@jboss.com" wrote:
"bstansberry@jboss.com" wrote:
Can you use a putIfAbsent instead of a get to try to preserve the original lock? Idea being that if threads are waiting on the original lock you try to keep using it rather than having them queue up on a new lock.
Yes this is better, although still sucks when you have contended access on a Lock. I think it might make more sense to keep locks around and evict them later, either when the key is removed/evicted, or some other special lock eviction algorithm that uses timestamps.
IMO A lazy LRU lock eviction alg that is executed on lock aquire would work very well for this. -
14. Re: Lock striping broken for Second Level Cache use case
manik Mar 18, 2009 8:10 PM (in response to brian.stansberry)"jason.greene@jboss.com" wrote:
IMO A lazy LRU lock eviction alg that is executed on lock aquire would work very well for this.
Yeah, but another thread? :/
I suppose any existing eviction thread could be reused, if one exists, to mitigate this...