14 Replies Latest reply on May 20, 2005 1:01 PM by ben.wang

Lock Acquisition Timeouts In JBC 1.2.1

jiwils Mar 25, 2005 12:50 PM

I have been using JBC 1.2 for some time, and our application has a case where multiple threads (many of them) might access the same node in the cache at the same time and/or update a node's children. We are not using transactions. This has worked fine even under high volume.

With the introduction of JBC 1.2.1, I now get lock acquisition timeout errors in medium- to high-volume situations, and I would like to understand how to turn them off. The Javadocs for TreeCache indicate that setting the lock acquisition timeout to zero turns them off, but this only makes the errors occur with *much* greater frequency. It appears that a zero setting really means "wait zero milliseconds before timing out". I can set this value to an arbitrarily high number, but disabling the timeout would be much better. As I understand it, the reason for a lock acquisition timeout is to avoid deadlock situations (such as a distributed deadlock), but since I am using neither transactions nor a synchronously replicating cache, I would not expect deadlocks to occur. Is it not possible to turn these timeouts off?
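For readers unfamiliar with timed lock attempts, the "zero means do not wait" behaviour described here can be reproduced with the standard java.util.concurrent locks. This is only an analogy and a minimal sketch: it uses ReentrantLock, not JBoss Cache's own lock classes, and the class and method names (ZeroTimeoutDemo, acquireFromOtherThread) are made up for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class ZeroTimeoutDemo {

    /**
     * Attempts to take the lock from a second thread, waiting at most
     * timeoutMillis, and reports whether the attempt succeeded.
     * (Hypothetical helper, not part of any cache API.)
     */
    static boolean acquireFromOtherThread(ReentrantLock lock, long timeoutMillis) {
        final boolean[] acquired = new boolean[1];
        Thread worker = new Thread(() -> {
            try {
                if (lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    acquired[0] = true;
                    lock.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            worker.join(); // join makes the worker's write to acquired[0] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        ReentrantLock nodeLock = new ReentrantLock();

        nodeLock.lock(); // the main thread plays a writer that holds the node
        try {
            // A zero timeout gives up at once instead of waiting for the holder.
            System.out.println("held, timeout 0: " + acquireFromOtherThread(nodeLock, 0));
        } finally {
            nodeLock.unlock();
        }
        // With the lock free, the same zero-timeout attempt succeeds.
        System.out.println("free, timeout 0: " + acquireFromOtherThread(nodeLock, 0));
    }
}
```

With a timeout of zero, the attempt fails whenever another thread happens to hold the lock, which would explain why contention errors become more frequent rather than disappearing.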

I am going to try setting the isolation level to NONE instead of REPEATABLE_READ (the default) to see if it makes any difference. In JBC 1.2, isolation levels did not come into play unless transactions were used (that is what I understood to be the case, anyway). Has this changed? Posts on this forum and the documentation with the 1.2.1 release seem to suggest that maybe it has.

1. Re: Lock Acquisition Timeouts In JBC 1.2.1

jiwils Mar 25, 2005 4:30 PM (in response to jiwils)

By setting the isolation level to NONE, I got a different error. Its stacktrace is below.

java.lang.IllegalStateException: addWriter(): owner already existed
 at org.jboss.cache.lock.LockMap.addWriter(LockMap.java:112)
 at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:175)
 at org.jboss.cache.Node.acquireWriteLock(Node.java:483)
 at org.jboss.cache.Node.acquire(Node.java:440)
 at org.jboss.cache.interceptors.LockInterceptor.lock(LockInterceptor.java:240)
 at org.jboss.cache.interceptors.LockInterceptor.invoke(LockInterceptor.java:156)
 at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:40)
 at org.jboss.cache.interceptors.UnlockInterceptor.invoke(UnlockInterceptor.java:35)
 at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:40)
 at org.jboss.cache.interceptors.ReplicationInterceptor.replicate(ReplicationInterceptor.java:217)
 at org.jboss.cache.TreeCache._replicate(TreeCache.java:2682)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:324)
 at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:236)
 at org.jgroups.blocks.RpcDispatcher.handle(RpcDispatcher.java:220)
 at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:615)
 at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:512)
 at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:326)
 at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUp(MessageDispatcher.java:722)
 at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.access$300(MessageDispatcher.java:554)
 at org.jgroups.blocks.MessageDispatcher$1.run(MessageDispatcher.java:691)
 at java.lang.Thread.run(Thread.java:534)

I am not sure what this means, beyond the fact that the isolation level change apparently did influence how the cache behaves even though I am not using transactions.

Is the answer to handle the lock acquisition timeout exceptions mentioned in the previous post, somehow turn them off (how?), or is there another alternative?

2. Re: Lock Acquisition Timeouts In JBC 1.2.1

jiwils Mar 25, 2005 4:57 PM (in response to jiwils)

One more piece of new information: there is obviously some deadlock going on here. I increased the lock acquisition timeout to 5 minutes and noticed the following:

* the process stopped (did nothing) waiting for 5 minutes after about 500 simultaneous client requests
* after this timeout period, more timeout exceptions occurred than when the timeout was set to 15 or 30 seconds

I am not sure what to do now as this same code (both client and server) works *fine* when used with JBC 1.2.
3. Re: Lock Acquisition Timeouts In JBC 1.2.1

jiwils Mar 25, 2005 6:36 PM (in response to jiwils)

Additionally, I have noticed/tried the following:

* the lock acquisition timeouts seem to occur only while listing a given node's children and then retrieving information from each of the children
* if the isolation level is set to READ_COMMITTED, the lock acquisition timeouts completely go away (I have no idea why)

Should a change in isolation level create/alleviate a deadlock situation like this?
4. Re: Lock Acquisition Timeouts In JBC 1.2.1

belaban Mar 26, 2005 3:17 AM (in response to jiwils)

In 1.2.1, we use method-level locking (the lock is held for the duration of the method call, e.g. put()) if no TXs are used. Therefore, setting the isolation level to NONE should solve this.
5. Re: Lock Acquisition Timeouts In JBC 1.2.1

jiwils Mar 26, 2005 1:40 PM (in response to jiwils)

I thought it might be something like that, but should I get the IllegalStateException that I got (shown above) when I did that?
6. Re: Lock Acquisition Timeouts In JBC 1.2.1

belaban Mar 30, 2005 4:03 AM (in response to jiwils)

I fixed this (http://jira.jboss.com/jira/browse/JBCACHE-117).
The fix will be available in JBossCache 1.2.2.
7. Re: Lock Acquisition Timeouts In JBC 1.2.1

neilalbiston May 17, 2005 7:21 AM (in response to jiwils)

I'm getting this Exception quite a lot in our live environment.
Customers are starting to shout so I need to fix it as soon as possible.

I have a few questions

Do you know when 1.2.3 will be available?

Is there a beta release available now which contains the fix?

I can catch the exception and retry the put operation, but are there any housekeeping calls I can make to ensure that the retry will complete?

I'm currently running Jcache 1.2.2 jBoss 3.2.5
Clustered
REP_SYNC
Isolation Level = NONE ( until we sort out the teething problems )

Any help or guidance appreciated,

Thanks,

Neil
8. Re: Lock Acquisition Timeouts In JBC 1.2.1

belaban May 17, 2005 8:01 AM (in response to jiwils)

#1 You should still use REPEATABLE_READ, or READ_UNCOMMITTED (for dirty reads)

#2 1.2.3 will be out before the end of June, I'm shooting for mid June
9. Re: Lock Acquisition Timeouts In JBC 1.2.1

neilalbiston May 17, 2005 8:29 AM (in response to jiwils)

Thank you for your quick response.

Initially the setting was synchronized. This was causing areas of the cache to lock up. I tried READ_UNCOMMITTED but the write lock still occurred.

I've set up an eviction policy against a region ( REGIONA ) and initially I was using
put("REGIONA", "key1", Object1)

to put items in the cache. I expected each item to go into a seperate node ...but the eviction behaviour suggests differently.
...and lots of locking errors occurred.

changing the code to
put("/REGIONA/key1", "key1", Object1)

appears to behave the way I expect with each item having its own lifespan.

Could this have also been causing my locking problem? Was I putting all the objects into one node?

This locking only happens in live, so I cannot really 'try it just to see what happens'. Customers complain.

Neil
10. Re: Lock Acquisition Timeouts In JBC 1.2.1

belaban May 17, 2005 9:18 AM (in response to jiwils)

The nature of pessimistic locking may lead to some deadlocks, e.g. check out the DeadlockUnit test case:
If tx1 acquires /a/b, then tx2 acquires /1/2, then tx1 tries to acquire /1/2 and tx2 tries to acquire /a/b, that's a deadlock.
One of the two transactions will time out and therefore roll back its changes, and the other one will succeed. You should be prepared to catch TimeoutExceptions and handle them, e.g. by retrying.
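The catch-and-retry advice can be sketched as a small helper. This is a minimal, self-contained illustration and not JBoss Cache code: LockTimeoutException is a hypothetical stand-in for the cache's timeout exception, and withRetry, maxAttempts and backoffMillis are names invented for this example. In a real application the supplier would wrap the actual cache call.

```java
import java.util.function.Supplier;

class RetryOnTimeout {

    /** Hypothetical stand-in for the cache's lock acquisition timeout exception. */
    static class LockTimeoutException extends RuntimeException {
        LockTimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Runs op and retries it when lock acquisition times out, pausing a
     * little longer before each new attempt. Rethrows the last timeout
     * once maxAttempts is exhausted.
     */
    static <T> T withRetry(Supplier<T> op, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        LockTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (LockTimeoutException e) {
                last = e;
                if (attempt < maxAttempts) {
                    pause(backoffMillis * attempt); // linear backoff between attempts
                }
            }
        }
        throw last;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated operation: times out twice, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new LockTimeoutException("simulated lock timeout");
            }
            return "stored";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The backoff between attempts is a design choice in this sketch rather than something the thread prescribes; it gives the competing caller time to finish and release its locks before the retry.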
11. Re: Lock Acquisition Timeouts In JBC 1.2.1

neilalbiston May 17, 2005 9:35 AM (in response to jiwils)

Thank you.
I thought I was being careful about the deadlock because (I thought that) I was putting each object in a new node by giving it an exclusive key.

If I use....

put("/REGIONA", "key1", Object1)
and
put("/REGIONA", "key2", Object2)

Is there a potential lock because they are both in the same path?
and if so will changing to....
put("/REGIONA/key1", "key1", Object1)
and
put("/REGIONA/key2", "key2", Object2)

....solve the deadlock?
12. Re: Lock Acquisition Timeouts In JBC 1.2.1

belaban May 17, 2005 10:25 AM (in response to jiwils)

Yes, we lock on FQNs, and both of your calls use the same FQN (/REGIONA). Yes to the 2nd question as well.
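The effect of locking per FQN can be illustrated with a toy model that keeps one lock per path string. This is a sketch under stated assumptions, not the cache's implementation: FqnLockDemo and canWriteWhileHolding are hypothetical names, plain ReentrantLocks stand in for the cache's node locks, and any locks the real cache may take on ancestor nodes are ignored.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class FqnLockDemo {

    // One lock per FQN string; a simplification of per-node locking.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String fqn) {
        return locks.computeIfAbsent(fqn, key -> new ReentrantLock());
    }

    /**
     * Holds the lock for heldFqn on the calling thread, then checks from a
     * second thread whether the lock for otherFqn can be taken right away.
     */
    boolean canWriteWhileHolding(String heldFqn, String otherFqn) {
        ReentrantLock held = lockFor(heldFqn);
        held.lock();
        try {
            final boolean[] acquired = new boolean[1];
            Thread writer = new Thread(() -> {
                ReentrantLock other = lockFor(otherFqn);
                if (other.tryLock()) {
                    acquired[0] = true;
                    other.unlock();
                }
            });
            writer.start();
            writer.join();
            return acquired[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writer", e);
        } finally {
            held.unlock();
        }
    }

    public static void main(String[] args) {
        FqnLockDemo demo = new FqnLockDemo();
        // Both writers target /REGIONA: the second one is blocked.
        System.out.println(demo.canWriteWhileHolding("/REGIONA", "/REGIONA"));
        // Each key has its own node: the writers do not contend.
        System.out.println(demo.canWriteWhileHolding("/REGIONA/key1", "/REGIONA/key2"));
    }
}
```

In this model the first check prints false and the second prints true, which matches the answer above: keys stored under one FQN share a lock, while keys stored under separate child nodes do not.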
13. Re: Lock Acquisition Timeouts In JBC 1.2.1

neilalbiston May 20, 2005 4:32 AM (in response to jiwils)

It's all working fine now. Replication set to Synchronised. No deadlock errors. Customers are happy. Helpdesk is quiet again and I understand the cache a little better. Thank you for your quick response, it saved the day.

Neil
14. Re: Lock Acquisition Timeouts In JBC 1.2.1

ben.wang May 20, 2005 1:01 PM (in response to jiwils)

Yes, if it is just one node, the locking granularity is on the node! So you should create a tree if you can.

-Ben