By setting the isolation level to NONE, I got a different error. Its stacktrace is below.
java.lang.IllegalStateException: addWriter(): owner already existed
        at org.jboss.cache.lock.LockMap.addWriter(LockMap.java:112)
        at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:175)
        at org.jboss.cache.Node.acquireWriteLock(Node.java:483)
        at org.jboss.cache.Node.acquire(Node.java:440)
        at org.jboss.cache.interceptors.LockInterceptor.lock(LockInterceptor.java:240)
        at org.jboss.cache.interceptors.LockInterceptor.invoke(LockInterceptor.java:156)
        at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:40)
        at org.jboss.cache.interceptors.UnlockInterceptor.invoke(UnlockInterceptor.java:35)
        at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:40)
        at org.jboss.cache.interceptors.ReplicationInterceptor.replicate(ReplicationInterceptor.java:217)
        at org.jboss.cache.TreeCache._replicate(TreeCache.java:2682)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:236)
        at org.jgroups.blocks.RpcDispatcher.handle(RpcDispatcher.java:220)
        at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:615)
        at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:512)
        at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:326)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUp(MessageDispatcher.java:722)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.access$300(MessageDispatcher.java:554)
        at org.jgroups.blocks.MessageDispatcher$1.run(MessageDispatcher.java:691)
        at java.lang.Thread.run(Thread.java:534)
I am not sure what this means, beyond the fact that the isolation level change did apparently influence how the cache was behaving, even though I am not using transactions.
Is the answer to handle the lock acquisition timeout exceptions mentioned in the previous post, to somehow turn them off (how?), or is there another alternative?
One more piece of new information: there is obviously some deadlock going on here. I increased the lock acquisition timeout to 5 minutes and noticed the following:
* the process stopped (did nothing) waiting for 5 minutes after about 500 simultaneous client requests
* after this timeout period, more timeout exceptions occurred than when the timeout was set to 15 or 30 seconds
I am not sure what to do now as this same code (both client and server) works *fine* when used with JBC 1.2.
Additionally, I have noticed/tried the following:
* the lock acquisition timeouts seem to occur only while listing a given node's children and then retrieving information from each of the children
* if the isolation level is set to READ_COMMITTED, the lock acquisition timeouts completely go away (I have no idea why)
Should a change in isolation level create/alleviate a deadlock situation like this?
In 1.2.1, we use method-level locking (the lock is held for the duration of the method call, e.g. put()) if no TXs are used. Therefore, setting the isolation level to NONE should solve this.
I thought it might be something like that, but should I get the IllegalStateException shown above when I do that?
I'm getting this Exception quite a lot in our live environment.
Customers are starting to shout so I need to fix it as soon as possible.
I have a few questions:
Do you know when 1.2.3 will be available?
Is there a beta release available now which contains the fix?
I can catch the exception and retry the put operation, but are there any housekeeping calls I can make to ensure that the retry will complete?
I'm currently running JBossCache 1.2.2 on JBoss 3.2.5
Isolation Level = NONE (until we sort out the teething problems)
Any help or guidance appreciated,
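For what it's worth, the retry wrapper I have in mind looks roughly like this. It's a minimal sketch against a stand-in map, since I can't reproduce the live cache here; in the real code the put would be TreeCache.put(fqn, key, value) and the exception would be the lock acquisition TimeoutException from org.jboss.cache.lock:

```java
import java.util.concurrent.ConcurrentHashMap;

public class RetryPut {
    // Stand-in for the cache's lock acquisition timeout exception.
    static class CacheTimeoutException extends RuntimeException {}

    // Stand-in for the cache itself; keyed by "fqn#key" for the demo.
    static final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();
    static int failuresLeft = 2; // simulate two lock-acquisition timeouts

    static void flakyPut(String fqn, String key, Object value) {
        if (failuresLeft > 0) { failuresLeft--; throw new CacheTimeoutException(); }
        cache.put(fqn + "#" + key, value);
    }

    // Retry the put a bounded number of times, backing off between attempts.
    static boolean putWithRetry(String fqn, String key, Object value, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                flakyPut(fqn, key, value);
                return true;
            } catch (CacheTimeoutException e) {
                if (attempt == maxAttempts) return false;
                try { Thread.sleep(50L * attempt); }         // linear backoff
                catch (InterruptedException ie) { Thread.currentThread().interrupt(); return false; }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        boolean ok = putWithRetry("/REGIONA/key1", "key1", "value1", 5);
        System.out.println(ok ? "put succeeded" : "put failed"); // prints: put succeeded
    }
}
```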
#1 You should still use REPEATABLE_READ, or READ_UNCOMMITTED (for dirty reads)
#2 1.2.3 will be out before the end of June, I'm shooting for mid June
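For reference, the isolation level (and the lock timeout discussed earlier) are set as attributes on the TreeCache MBean in the -service.xml; the attribute names below are as I recall them from JBC 1.2.x, so double-check against your own descriptor:

```xml
<!-- In the TreeCache MBean descriptor, e.g. tree-cache-service.xml -->
<attribute name="IsolationLevel">REPEATABLE_READ</attribute>
<attribute name="LockAcquisitionTimeout">15000</attribute>
```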
Thank you for your quick response.
Initially replication was set to synchronous. This was causing areas of the cache to lock up. I tried READ_UNCOMMITTED but the write locks still occurred.
I've set up an eviction policy against a region (REGIONA) and initially I was using
put("REGIONA", "key1", Object1)
to put items in the cache. I expected each item to go into a separate node... but the eviction behaviour suggests differently.
...and lots of locking errors occurred.
changing the code to
put("/REGIONA/key1", "key1", Object1)
appears to behave the way I expect with each item having its own lifespan.
Could this have also been causing my locking problem? Was I putting all the objects into one node?
This locking only happens in live... so I cannot really 'try it just to see what happens'. Customers complain.
The nature of pessimistic locking may lead to some deadlocks, e.g. check out the DeadlockUnit test case:
If tx1 acquires /a/b, then tx2 acquires /1/2, then tx1 tries to acquire /1/2 and tx2 tries to acquire /a/b, that's a deadlock.
One of the 2 transactions will time out and therefore roll back its changes, and the other one will succeed. You should be prepared to catch TimeoutExceptions and handle them, e.g. retry.
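The timeout-and-release behaviour is easy to model with plain JDK semaphores (a sketch, not JBC code): one "tx" already holds /a/b, the other takes /1/2 and then times out waiting for /a/b, releases what it holds, and can retry:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class DeadlockSketch {
    // Two node locks, standing in for /a/b and /1/2.
    static final Semaphore lockAB = new Semaphore(1);
    static final Semaphore lock12 = new Semaphore(1);

    // Take 'first', then try 'second' with a timeout. On timeout, release
    // 'first' -- that is the rollback that lets the other tx proceed.
    static boolean acquireBoth(Semaphore first, Semaphore second, long timeoutMs)
            throws InterruptedException {
        first.acquire();
        try {
            if (second.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
                second.release();
                return true;   // both acquired (released again for the demo)
            }
            return false;      // timed out: caller should roll back and retry
        } finally {
            first.release();
        }
    }

    public static void main(String[] args) throws Exception {
        lockAB.acquire();      // tx1 already holds /a/b
        // tx2 holds /1/2 and now wants /a/b -> it times out and backs off.
        boolean ok = acquireBoth(lock12, lockAB, 200);
        System.out.println("tx2 acquired both: " + ok); // prints: tx2 acquired both: false
        lockAB.release();
    }
}
```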
I thought I was being careful about the deadlock because I believed I was putting each object in a new node by giving it an exclusive key.
If I use....
put("/REGIONA", "key1", Object1)
put("/REGIONA", "key2", Object2)
Is there a potential lock because they are both in the same path?
and if so, will changing to...
put("/REGIONA/key1", "key1", Object1)
put("/REGIONA/key2", "key2", Object2)
....solve the deadlock?
Yes, we lock on FQNs, so you were using the same FQN (/REGIONA). And yes to the 2nd question.
It's all working fine now. Replication set to synchronous. No deadlock errors. Customers are happy. Helpdesk is quiet again, and I understand the cache a little better. Thank you for your quick response; it saved the day.
Yes, if it is just one node, the locking granularity is on the node! So you should create a tree if you can.
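To make the granularity concrete, here is a toy model of the tree using plain Java maps (not the JBC API): the flat style piles every entry into the single /REGIONA node, which in JBC means one lock guards them all, while the tree style gives each key its own node and therefore its own lock:

```java
import java.util.HashMap;
import java.util.Map;

public class FqnGranularity {
    // Toy model of the tree: one entry per FQN, each holding its own key/value map.
    // In JBC, each of these nodes would carry its own lock.
    static final Map<String, Map<String, Object>> tree = new HashMap<>();

    static void put(String fqn, String key, Object value) {
        tree.computeIfAbsent(fqn, f -> new HashMap<>()).put(key, value);
    }

    public static void main(String[] args) {
        // Flat style: both entries land in the single /REGIONA node -> one lock for all.
        put("/REGIONA", "key1", "v1");
        put("/REGIONA", "key2", "v2");

        // Tree style: one node (and thus one lock) per key.
        put("/REGIONB/key1", "key1", "v1");
        put("/REGIONB/key2", "key2", "v2");

        System.out.println("entries sharing the /REGIONA node: " + tree.get("/REGIONA").size());
        System.out.println("separate /REGIONB/* nodes: "
                + tree.keySet().stream().filter(f -> f.startsWith("/REGIONB/")).count());
    }
}
```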