-
15. Re: Deadlock situation when L1 is enabled for distributed cache
manik Apr 12, 2010 8:12 AM (in response to yelin66)
Hmm, I wonder if this has to do with lock stripes. Have you tried disabling lock striping?
http://docs.jboss.org/infinispan/4.0/apidocs/config.html#ce_default_locking
-
16. Re: Deadlock situation when L1 is enabled for distributed cache
yelin66 Apr 12, 2010 10:00 AM (in response to manik)
I did disable lock striping already. Otherwise, the deadlock happened in no time with only two cache instances. As I mentioned in an earlier message in this thread, this case does NOT seem to be a deadlock, as I reproduced it once with the failure on ONLY one instance. So it seems to me that it just fails to grab the lock sometimes...
BTW, another thing I noticed: after I started using FD_SOCK plus FD_ALL in the JGroups configuration per Bela Ban's suggestion, this lock failure happens much less often than before. Any hints from this observation?
-
17. Re: Deadlock situation when L1 is enabled for distributed cache
galder.zamarreno Apr 14, 2010 6:17 AM (in response to yelin66)
What did you originally have in your configuration? Only FD_SOCK or only FD/FD_ALL?
-
18. Re: Deadlock situation when L1 is enabled for distributed cache
yelin66 Apr 14, 2010 9:37 AM (in response to galder.zamarreno)
Originally, I didn't specify a JGroups configuration file in my Infinispan configuration, so it was supposed to use the default one that comes with jgroups-2.9.0.GA.jar. In that case, the lock acquisition failure and other exceptions happened quite often. Now I have the following config, and only the lock acquisition failure occurs occasionally.
<FD_SOCK/>
<FD_ALL timeout="15000" interval="5000"/>
-
19. Re: Deadlock situation when L1 is enabled for distributed cache
manik Apr 23, 2010 7:51 AM (in response to yelin66)
Lin,
Sorry for taking so long to get back to you on this. I did some investigation, and the problem is that when you are not using transactions, the locks are held by the thread performing an invocation and the locks used are JDK ReentrantLocks. ReentrantLocks do not have a mechanism to reliably determine the current owner, even when the lock is held, and only provide a best-effort guess (see ReentrantLock.getOwner()).
When you use transactions, however, the locks used are OwnableReentrantLocks (in the org.infinispan.util.concurrent.locks package) which can accurately determine which transaction holds the lock.
I agree that the 'null' is misleading though, since it suggests that the lock is not held by anyone. I have fixed this in Infinispan trunk so that the lock owner is not logged as null, but rather as 'another thread' which is more accurate.
Could you give this a try? Your best bet would be to get a hold of trunk and build it yourself, or if you use Maven, point to infinispan 4.1.0-SNAPSHOT.
Cheers
Manik
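
For readers following the point about ReentrantLock ownership, here is a minimal sketch using only the JDK. It is not Infinispan's code: the class names LockOwnerDemo and InspectableLock and the thread name are hypothetical, chosen for illustration. It shows that ReentrantLock.getOwner() is protected (so a subclass is needed to read it) and that a non-owner thread only gets a best-effort view of the owner.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOwnerDemo {

    /** getOwner() is protected in ReentrantLock, so expose it through a subclass. */
    static class InspectableLock extends ReentrantLock {
        Thread currentOwner() {
            // Per the JDK Javadoc, this is a best-effort approximation
            // when called by a thread that is not the owner.
            return getOwner();
        }
    }

    /** Returns the name of the lock owner as observed from the calling thread. */
    public static String observedOwnerName() {
        InspectableLock lock = new InspectableLock();
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                acquired.countDown(); // signal that the lock is now held
                release.await();      // keep holding until told to release
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        }, "holder-thread");
        holder.start();

        try {
            acquired.await();
            Thread owner = lock.currentOwner();
            return owner == null ? "null" : owner.getName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the holder thread unlock and exit
        }
    }

    public static void main(String[] args) {
        System.out.println("Owner as seen by main: " + observedOwnerName());
    }
}
```

In this controlled case the latch guarantees the lock is held when the owner is read, so the observed name is stable. Under real contention the owner can change between the read and the log statement, which is one reason a logged owner may be misleading.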
-
20. Re: Deadlock situation when L1 is enabled for distributed cache
yelin666 Apr 23, 2010 9:22 AM (in response to manik)
Manik,
Thanks for the response. I'll give it a try, and get back to you with more meaningful log data.
Meantime, I reproduced the same issue with replicated cache yesterday during stress testing, so it seems not directly L1 related. Anyway I'll get more data to you later.
Thanks,
Lin
P.S. Something strange happened with my JBoss Community login. I haven't been able to log in with my old account since yesterday, so I ended up creating a new account. Any idea what's going on?
-
21. Re: Deadlock situation when L1 is enabled for distributed cache
manik Apr 23, 2010 11:44 AM (in response to yelin666)
Hi, I just released 4.1.0.ALPHA3 which has the fix mentioned above so you don't need to build Infinispan from sources.
Re: JBoss.org logins, have you read this announcement?
-
22. Re: Deadlock situation when L1 is enabled for distributed cache
yelin666 Apr 23, 2010 12:09 PM (in response to manik)
Great, thanks!
And thank you for sending the announcement. I guess that's my problem...
Have a wonderful weekend!
-
23. Re: Deadlock situation when L1 is enabled for distributed cache
manik Apr 27, 2010 9:01 AM (in response to yelin666)
"Meantime, I reproduced the same issue with replicated cache yesterday during stress testing, so it seems not directly L1 related. Anyway I'll get more data to you later."
Was this sync or async replicated? And were you using transactions?
-
24. Re: Deadlock situation when L1 is enabled for distributed cache
yelin66 Apr 27, 2010 9:35 AM (in response to manik)
This is sync replicated, and I did NOT use transactions in any of my testing, as we don't have any transaction requirements so far.
-
25. Re: Deadlock situation when L1 is enabled for distributed cache
manik Apr 27, 2010 9:46 AM (in response to yelin66)
Hmm, I cannot imagine how such a deadlock would occur then, assuming:
- SYNC repl
- Each node only edits its own keys (e.g., key = "NodeName" + key)
- Lock striping is disabled
If you have any TRACE level logs here that would be great.
-
26. Re: Intermittent lock acquisition failure for replicated/distributed cache
yelin66 Apr 27, 2010 9:53 AM (in response to manik)
I should probably start a new thread of discussion for this, as the title is INCORRECT. As I mentioned in an earlier message in the middle of this thread, I don't think it's really a deadlock. On some occasions the lock acquisition failed on only one node, so it's a lock acquisition failure rather than a deadlock.
Sorry about the confusion over the title; I assumed it was a deadlock initially. I'll try to update the title in this message, and hope it works.
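
The difference between a lock acquisition failure and a deadlock can be sketched with plain JDK locks. This is only an illustration under assumptions, not how Infinispan acquires locks internally: the class name LockTimeoutDemo, the thread name, and the 100 ms timeout are all hypothetical. A single waiter gives up after a timed attempt while the holder is merely slow, and a later attempt succeeds once the holder has released the lock. There is no cycle of threads waiting on each other, so nothing is deadlocked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockTimeoutDemo {

    /**
     * Attempts a timed acquisition while another thread holds the lock,
     * then again after that thread has released it.
     * Returns { firstAttemptSucceeded, secondAttemptSucceeded }.
     */
    public static boolean[] attemptWhileHeldThenAfterRelease() {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread slowWriter = new Thread(() -> {
            lock.lock();
            try {
                acquired.countDown(); // the lock is now held
                release.await();      // simulate a slow operation
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        }, "slow-writer");
        slowWriter.start();

        try {
            acquired.await();
            // The holder is still busy, so this timed attempt gives up.
            boolean first = lock.tryLock(100, TimeUnit.MILLISECONDS);
            if (first) {
                lock.unlock();
            }

            release.countDown(); // the holder finishes its work
            slowWriter.join();

            // The lock is free again; the same caller can now acquire it.
            boolean second = lock.tryLock(100, TimeUnit.MILLISECONDS);
            if (second) {
                lock.unlock();
            }
            return new boolean[] { first, second };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = attemptWhileHeldThenAfterRelease();
        System.out.println("while held: " + result[0] + ", after release: " + result[1]);
    }
}
```

A true deadlock would need at least two parties each holding a lock the other wants, in which case neither side would recover by simply waiting and retrying.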