14 Replies Latest reply on May 20, 2005 1:01 PM by ben.wang

    Lock Acquisition Timeouts In JBC 1.2.1

    jiwils

      I have been using JBC 1.2 for some time, and our application has a case where multiple threads (many of them) might access the same node in the cache at the same time and/or update a node's children. We are not using transactions. This has worked fine even under high volume.

      With the introduction of JBC 1.2.1, I now get lock acquisition timeout errors in medium to high volume situations, and I would like to understand how to turn them off. The Javadocs for TreeCache indicate that setting the lock acquisition timeout to zero turns them off, but this only makes the errors occur with *much* greater frequency. It appears that a zero setting really means "wait zero milliseconds before timing out". I can set this value to an arbitrarily high number, but disabling the timeout would be much better. As I understand it, the reason we want a lock acquisition timeout is to avoid deadlock situations (such as a distributed deadlock), but since I am using neither transactions nor a synchronously replicating cache, I would not expect deadlocks to occur. Is it not possible to turn these timeouts off?
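Editor's sketch of the "zero means wait zero milliseconds" reading described above. It does not use JBossCache at all: a plain `java.util.concurrent.locks.ReentrantLock` stands in for a node lock, and the class and method names (`ZeroTimeoutDemo`, `acquireFromOtherThread`) are made up for illustration. It only shows that a timed lock attempt with a timeout of 0 fails immediately when another thread holds the lock, which would match the higher error rate reported here if the cache treats the setting the same way.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only: a ReentrantLock stands in for a cache node lock.
class ZeroTimeoutDemo {

    // Tries to take `lock` from a second thread, waiting at most timeoutMs.
    // Returns true only if that thread acquired the lock within the timeout.
    static boolean acquireFromOtherThread(ReentrantLock lock, long timeoutMs) {
        boolean[] acquired = new boolean[1];
        Thread t = new Thread(() -> {
            try {
                if (lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    acquired[0] = true;
                    lock.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        try {
            t.join(); // join() makes the worker's write to acquired[0] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // the main thread plays the current lock owner
        try {
            System.out.println("held, timeout 0: " + acquireFromOtherThread(lock, 0));
        } finally {
            lock.unlock();
        }
        System.out.println("free, timeout 0: " + acquireFromOtherThread(lock, 0));
    }
}
```

With a timeout of 0, a contended acquisition is never given a chance to wait, so under load nearly every collision turns into a timeout rather than a short delay.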

      I am going to try setting the isolation level to NONE from REPEATABLE_READ (the default) to see if it makes any difference. In JBC 1.2, isolation levels did not come into play unless transactions were utilized (that is what I understood to be the case anyway). Has this changed? Posts on this forum and the documentation with the 1.2.1 release seem to suggest that maybe it has.

        • 1. Re: Lock Acquisition Timeouts In JBC 1.2.1
          jiwils

          By setting the isolation level to NONE, I got a different error. Its stacktrace is below.

          java.lang.IllegalStateException: addWriter(): owner already existed
           at org.jboss.cache.lock.LockMap.addWriter(LockMap.java:112)
           at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:175)
           at org.jboss.cache.Node.acquireWriteLock(Node.java:483)
           at org.jboss.cache.Node.acquire(Node.java:440)
           at org.jboss.cache.interceptors.LockInterceptor.lock(LockInterceptor.java:240)
           at org.jboss.cache.interceptors.LockInterceptor.invoke(LockInterceptor.java:156)
           at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:40)
           at org.jboss.cache.interceptors.UnlockInterceptor.invoke(UnlockInterceptor.java:35)
           at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:40)
           at org.jboss.cache.interceptors.ReplicationInterceptor.replicate(ReplicationInterceptor.java:217)
           at org.jboss.cache.TreeCache._replicate(TreeCache.java:2682)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
           at java.lang.reflect.Method.invoke(Method.java:324)
           at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:236)
           at org.jgroups.blocks.RpcDispatcher.handle(RpcDispatcher.java:220)
           at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:615)
           at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:512)
           at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:326)
           at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUp(MessageDispatcher.java:722)
           at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.access$300(MessageDispatcher.java:554)
           at org.jgroups.blocks.MessageDispatcher$1.run(MessageDispatcher.java:691)
           at java.lang.Thread.run(Thread.java:534)
          


          I am not sure what this means, beyond the fact that the isolation level change apparently did influence how the cache behaves even though I am not using transactions.

          Is the answer to handle the lock acquisition timeout exceptions mentioned in the previous post, somehow turn them off (how?), or is there another alternative?

          • 2. Re: Lock Acquisition Timeouts In JBC 1.2.1
            jiwils

            One more piece of new information: there is obviously some deadlock going on here. I increased the lock acquisition timeout to 5 minutes and noticed the following:

            * the process stopped (did nothing) waiting for 5 minutes after about 500 simultaneous client requests
            * after this timeout period, more timeout exceptions occurred than when the timeout was set to 15 or 30 seconds

            I am not sure what to do now as this same code (both client and server) works *fine* when used with JBC 1.2.

            • 3. Re: Lock Acquisition Timeouts In JBC 1.2.1
              jiwils

              Additionally, I have noticed/tried the following:

              * the lock acquisition timeouts seem to occur only while listing a given node's children and then retrieving information from each of the children
              * if the isolation level is set to READ_COMMITTED, the lock acquisition timeouts completely go away (I have no idea why)

              Should a change in isolation level create/alleviate a deadlock situation like this?

              • 4. Re: Lock Acquisition Timeouts In JBC 1.2.1
                belaban

                In 1.2.1, we use method-level locking (the lock is held for the duration of the method call, e.g. put()) if no TXs are used. Therefore, setting the TX isolation level to NONE should solve this.

                • 5. Re: Lock Acquisition Timeouts In JBC 1.2.1
                  jiwils

                  I thought it might be something like that, but should I get the IllegalStateException that I got (shown above) when I did that?

                  • 6. Re: Lock Acquisition Timeouts In JBC 1.2.1
                    belaban

                    I fixed this (http://jira.jboss.com/jira/browse/JBCACHE-117).
                    Will be available in JBossCache 1.2.2

                    • 7. Re: Lock Acquisition Timeouts In JBC 1.2.1
                      neilalbiston

                      I'm getting this Exception quite a lot in our live environment.
                      Customers are starting to shout so I need to fix it as soon as possible.

                      I have a few questions

                      Do you know when 1.2.3 will be available?

                      Is there a beta release available now which contains the fix?

                      I can catch the exception and retry the put operation, but are there any housekeeping calls I can make to ensure that the retry will complete?

                      I'm currently running Jcache 1.2.2 jBoss 3.2.5
                      Clustered
                      REPL_SYNC
                      Isolation Level = NONE ( until we sort out the teething problems )

                      Any help or guidance appreciated,

                      Thanks,

                      Neil

                      • 8. Re: Lock Acquisition Timeouts In JBC 1.2.1
                        belaban

                        #1 You should still use REPEATABLE_READ, or READ_UNCOMMITTED (for dirty reads)

                        #2 1.2.3 will be out before the end of June, I'm shooting for mid June

                        • 9. Re: Lock Acquisition Timeouts In JBC 1.2.1
                          neilalbiston

                          Thank you for your quick response.

                          Initially the setting was synchronized. This was causing areas of the cache to lock up. I tried READ_UNCOMMITTED but the write lock still occurred.

                          I've set up an eviction policy against a region ( REGIONA ) and initially I was using
                          put("REGIONA", "key1", Object1)

                          to put items in the cache. I expected each item to go into a separate node ...but the eviction behaviour suggests differently.
                          ...and lots of locking errors occurred.

                          changing the code to
                          put("/REGIONA/key1", "key1", Object1)

                          appears to behave the way I expect with each item having its own lifespan.

                          Could this have also been causing my locking problem? Was I putting all the objects into one node?

                          This locking only happens in live ...so I cannot really 'try it' just to see what happens. Customers complain.

                          Neil

                          • 10. Re: Lock Acquisition Timeouts In JBC 1.2.1
                            belaban

                            The nature of pessimistic locking may lead to some deadlocks, e.g. check out the DeadlockUnit test case:
                            If tx1 acquires /a/b, then tx2 acquires /1/2, then tx1 tries to acquire /1/2 and tx2 tries to acquire /a/b, that's a deadlock.
                            One of the 2 transactions will time out and therefore roll back its changes, and the other one will succeed. You should be prepared to catch TimeoutExceptions and handle them, e.g. by retrying.
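Editor's sketch of the catch-and-retry advice above. To stay self-contained it does not call JBossCache: `LockTimeoutException` is a hypothetical stand-in for whatever timeout exception the cache throws, and `withRetry`, `maxAttempts` and `backoffMs` are illustrative names, not part of any cache API. In real code the `Supplier` would wrap the cache call (for example a put).

```java
import java.util.function.Supplier;

// Generic retry helper; not tied to any cache API.
class RetryOnTimeout {

    // Hypothetical stand-in for the cache's lock timeout exception.
    static class LockTimeoutException extends RuntimeException {
        LockTimeoutException(String message) {
            super(message);
        }
    }

    // Runs op, retrying when it times out, up to maxAttempts attempts in total.
    // Sleeps a little longer before each retry so competing callers can finish.
    static <T> T withRetry(Supplier<T> op, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        LockTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (LockTimeoutException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleepQuietly(backoffMs * attempt);
                }
            }
        }
        throw last; // every attempt timed out: surface the last failure
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated operation: times out twice, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new LockTimeoutException("simulated lock timeout");
            }
            return "ok";
        }, 5, 1);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Bounding the number of attempts matters: if the contention is structural (for example every writer hitting the same node), unbounded retries only add load.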

                            • 11. Re: Lock Acquisition Timeouts In JBC 1.2.1
                              neilalbiston

                              Thank you.
                              I thought I was being careful about the deadlock because (I thought that) I was putting each object in a new node by giving it an exclusive key.

                              If I use....

                              put("/REGIONA", "key1", Object1)
                              and
                              put("/REGIONA", "key2", Object2)

                              Is there a potential lock because they are both in the same path?
                              and if so will changing to....
                              put("/REGIONA/key1", "key1", Object1)
                              and
                              put("/REGIONA/key2", "key2", Object2)

                              ....solve the deadlock?

                              • 12. Re: Lock Acquisition Timeouts In JBC 1.2.1
                                belaban

                                Yes, we lock on FQNs, so both calls use the same FQN (/REGIONA) and contend for the same lock. Yes on the 2nd question.
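Editor's sketch of why the two layouts behave differently. This is a deliberately simplified model, not JBossCache code: it keeps one lock per FQN string in a map and ignores anything the real cache may do with ancestor nodes. `FqnLockModel`, `lockFor` and `contend` are hypothetical names used only here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model: exactly one lock per FQN, nothing else.
class FqnLockModel {

    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Returns the single lock guarding the node at this FQN, creating it on first use.
    ReentrantLock lockFor(String fqn) {
        return locks.computeIfAbsent(fqn, k -> new ReentrantLock());
    }

    // True when writes to the two FQNs would have to take the same lock.
    boolean contend(String fqnA, String fqnB) {
        return lockFor(fqnA) == lockFor(fqnB);
    }

    public static void main(String[] args) {
        FqnLockModel model = new FqnLockModel();
        // put("/REGIONA", "key1", ...) and put("/REGIONA", "key2", ...): same node.
        System.out.println("same FQN contends: "
                + model.contend("/REGIONA", "/REGIONA"));
        // put("/REGIONA/key1", ...) and put("/REGIONA/key2", ...): separate nodes.
        System.out.println("separate FQNs contend: "
                + model.contend("/REGIONA/key1", "/REGIONA/key2"));
    }
}
```

In this model the key inside a node plays no part in locking; only the FQN does, which is why giving each item its own child node spreads writers across different locks.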

                                • 13. Re: Lock Acquisition Timeouts In JBC 1.2.1
                                  neilalbiston

                                  It's all working fine now. Replication set to Synchronised. No deadlock errors. Customers are happy. Helpdesk is quiet again and I understand the cache a little better. Thank you for your quick response, it saved the day.

                                  Neil

                                  • 14. Re: Lock Acquisition Timeouts In JBC 1.2.1
                                    ben.wang

                                    Yes, if it is just one node, the locking granularity is on the node! So you should create a tree if you can.

                                    -Ben