4 Replies Latest reply on Feb 9, 2012 10:43 AM by sudheerk84

    Help in debugging locking and time out issues

    sudheerk84

      I run infinispan in 4 nodes under distributed mode and i get teh following exception.

       

      330613 2012-01-25 19:18:42,330 [Incoming-2,tspexServicesClusterPRODUCTION,378944-eng2.com-56328] ERROR org.infinispan.interceptors.InvocationContextInterceptor.handleAll:111  - ISPN000136: Execution error

      org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [25] for requestor [GlobalTransaction:<378941-eng1com-56847>:10:remote]! Lock held by [GlobalTransaction:<37894

      4-eng2.com-56328>:16:local]

              at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:240)

              at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:157)

              at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:109)

              at org.infinispan.interceptors.LockingInterceptor.visitPutKeyValueCommand(LockingInterceptor.java:293)

       

       

      I understand that , exception which clearly says that some other local vm thread , has taken a lock on this key entry.

       

      Is there any way in which i can use this log information to find out , which exact process/ code or thread is holding the lock at this moment of time. ?

       

      Lock held by [GlobalTransaction:<378944-eng2.com-56328>:16:local

       

      GlobalTransaction: - means its global transaction

       

      378944-eng2.com- - gives me information about teh vm .

       

      What exactly do 56328 and 16 mean ?

       

      Is there anything i can use in jmx to get more information about these kind of errors .

       

       


       


        • 1. Re: Help in debugging locking and time out issues
          dan.berindei

          Infinispan uses a specialized lock that can be held by a transaction instead of a thread - that's because the originator of a transaction could be a remote thread. So in order to find the thread holding the lock you have to find the thread that started transaction 'GlobalTransaction:<378941-eng1com-56847>:10:remote'.

           

          • 378941-eng1com is indeed the name of the node (as defined in the transport configuration, or the hostname of the machine if you haven't defined it)
          • 56847 is a random id for the node - you can ignore it. Every time a node starts it gets a new id, so that after a restart the other members know it doesn't have the old data any more.
          • 10 is the transaction id - it's guaranteed to be unique in a node - this is the one you have to look for.

           

          I don't think you can find about a specific transaction in JMX, but if you run your application with a debugger you can pause it and check which one of your user threads started this particular transaction.

          You can also enable TRACE logging for org.infinispan.transaction.xa.TransactionXaAdapter and it will log the beginning of every transaction.

          1 of 1 people found this helpful
          • 2. Re: Help in debugging locking and time out issues
            sudheerk84

            Thanks for the reply.

             

            I canot use teh debugger because i see most of these errors in my production like environment with high load and concurrency . I think enabling teh logging as you suggested may be teh way to go.

            • 3. Re: Help in debugging locking and time out issues
              sudheerk84

              i have tried to enable these logs and started debuuging my issue. But i have the following confusions.

               

              1. I  see many of the following logs (for different transactions)

               

                   2012-02-09 11:34:55,453 [serviceTaskExecutor-23] TRACE org.infinispan.transaction.xa.TransactionXaAdapter.tracef:1815  - start called on tx GlobalTransaction:<TSSEDEV5-60439>:327:local

               

                   2012-02-09 11:34:55,766 [Atomikos:125] TRACE org.infinispan.transaction.xa.TransactionXaAdapter.tracef:1815  - end called on tx GlobalTransaction:<TSSEDEV5-60439>:327:local

               

                   2012-02-09 11:34:55,810 [Atomikos:127] TRACE org.infinispan.transaction.xa.TransactionXaAdapter.tracef:1815  - Skipping prepare as we're configured to run 1PC. Xid=tm_tspex-app-container_10027900078tm_tspex-app-container_1472

               

                   2012-02-09 11:34:55,817 [Atomikos:124] TRACE org.infinispan.transaction.xa.TransactionXaAdapter.tracef:1800  - Committing transaction GlobalTransaction:<TSSEDEV5-60439>:327:local. One phase? false

               

                   followed by call on - transaction commit event listeners

               

              a. Why is it skipping prepare ?

               

              b.  I see many such transactions when i am expecting only one transaction ?

               

              I use Atomikos and spring transaction  annotaions - i have marked the outer most method @Tranastcional  - and inner methods are not expected to start new transactions. Is this the problem with my configurations ?

               

              OR

               

              Infinspan maintains this transaction per cache ? (what is the relation of this id with my JTA tranactions). If i have multiple calls to a cache inside my JTA transaction will there be multiple of these logs above ?

              • 4. Re: Help in debugging locking and time out issues
                sudheerk84

                Now i have noted that

                 

                "   2012-02-09 11:34:55,810 [Atomikos:127] TRACE org.infinispan.transaction.xa.TransactionXaAdapter.tracef:1815  - Skipping prepare as we're configured to run 1PC. Xid=tm_tspex-app-container_10027900078tm_tspex-app-container_1472 "

                 

                comes when teh replicated cache is configured for asynchronous replication across nodes.

                 

                The cache whicg have synchronuis replciation do not seem to skip this step.

                 

                Also it will be great if you let me know which is the logger i have to enable to get teh cache name printed along with the transaction.

                 

                PS:

                 

                One more observation : I also see taht a transtcion gets started when we do a cache.get() operation. is this expected ?