26 Replies  Latest reply on Apr 27, 2010 9:53 AM by yelin66

    Deadlock situation when L1 is enabled for distributed cache

    yelin66

      I am doing some testing with Infinispan. One of my tests performs parallel updates from multiple cache instances, but there are no key collisions between the updates (the keys on each instance carry a different prefix). I also disabled lock striping (useLockStriping), and since there are no key collisions I never hit a deadlock when running the test against a replicated cache or a distributed cache without L1. However, when I enable L1, the deadlock appears. Looking at the exception messages, some cache instances tried to lock keys that are supposed to be used only by other instances. As the L1 cache is read-only and gets invalidated rather than updated, I am wondering why this happens. Could you please suggest?

       

      Thanks in advance.
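
       

      To illustrate the access pattern (a simplified sketch, not my actual test code; the configuration file, cache name and counts below are placeholders):

          import org.infinispan.Cache;
          import org.infinispan.manager.DefaultCacheManager;

          public class PrefixedUpdateSketch {
              public static void main(String[] args) throws Exception {
                  // Placeholder config file and cache name -- not the real test artifacts.
                  DefaultCacheManager manager = new DefaultCacheManager("infinispan-dist.xml");
                  Cache<String, String> cache = manager.getCache("testCache");

                  String prefix = "Node1-";                // unique per instance, e.g. "Node2-" on another node
                  for (int i = 0; i < 2500; i++) {
                      cache.put(prefix + i, "value-" + i); // keys never collide across instances
                  }
                  manager.stop();
              }
          }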

        • 1. Re: Deadlock situation when L1 is enabled for distributed cache
          manik

          Do you have a unit test for this case, where an instance attempts to lock a key it does not own (i.e. a key that is in L1)?

          • 2. Re: Deadlock situation when L1 is enabled for distributed cache
            yelin66

            Thanks for the response. I don't have a unit test for it, and I am afraid it's hard to reproduce the case in a unit test. Actually, it only happens when I have parallel updates from multiple cache instances. Basically, on each instance I am only reading/writing keys that contain a substring specific to that instance, so the L1 on that instance should only contain keys with that substring. Looking at the log, I noticed that a certain instance tried to lock a key containing a substring specific to a different instance; that key may or may not be owned by the local instance, but it is supposed to be updated only by the other instance. Any suggestions on what's going on?

             

            Also, while I was reproducing the above issue, I got couple other exceptions.

             

            1. The following replication timeout exception occurred about 10 seconds after the update was kicked off, even though my replTimeout was set much longer than that. Also, when the update succeeded, on average it took only around 60 milliseconds. Could you please suggest what may potentially cause the timeout?

            13:20:53,888 | INFO  | Thread-14        | ReadUpdateTestDriver             | t.drivers.ReadUpdateTestDriver$5  341 | updating 2500 entries
            13:21:03,230 | ERROR | Thread-14        | InvocationContextInterceptor     | ors.InvocationContextInterceptor   55 | Execution error:
            org.infinispan.util.concurrent.TimeoutException: Replication timeout for SSELabCl07-53048
                at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:421)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:100)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:124)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:229)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:216)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:212)
                at org.infinispan.interceptors.DistributionInterceptor.handleWriteCommand(DistributionInterceptor.java:304)
                at org.infinispan.interceptors.DistributionInterceptor.visitPutMapCommand(DistributionInterceptor.java:168)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.LockingInterceptor.visitPutMapCommand(LockingInterceptor.java:210)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
                at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.TxInterceptor.enlistWriteAndInvokeNext(TxInterceptor.java:183)
                at org.infinispan.interceptors.TxInterceptor.visitPutMapCommand(TxInterceptor.java:152)
                at org.infinispan.interceptors.DistTxInterceptor.visitPutMapCommand(DistTxInterceptor.java:104)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:48)
                at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:34)
                at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:269)
                at org.infinispan.CacheDelegate.putAll(CacheDelegate.java:447)
                at org.infinispan.CacheDelegate.putAll(CacheDelegate.java:216)
                at ge.energy.smartgrid.applications.omtest.controllers.InfinispanCacheController.putAll(InfinispanCacheController.java:180)
                at ge.energy.smartgrid.applications.omtest.drivers.ReadUpdateTestDriver$5.run(ReadUpdateTestDriver.java:370)
            13:21:03,232 | ERROR | Thread-14        | jgroups                          | org.jgroups.util.Util$1            76 | uncaught exception in Thread[Thread-14,5,Thread Pools] (thread group=org.jgroups.util.Util$1[name=JGroups,maxpri=10] )
            org.infinispan.util.concurrent.TimeoutException: Replication timeout for SSELabCl07-53048
                at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:421)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:100)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:124)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:229)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:216)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:212)
                at org.infinispan.interceptors.DistributionInterceptor.handleWriteCommand(DistributionInterceptor.java:304)
                at org.infinispan.interceptors.DistributionInterceptor.visitPutMapCommand(DistributionInterceptor.java:168)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.LockingInterceptor.visitPutMapCommand(LockingInterceptor.java:210)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
                at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.TxInterceptor.enlistWriteAndInvokeNext(TxInterceptor.java:183)
                at org.infinispan.interceptors.TxInterceptor.visitPutMapCommand(TxInterceptor.java:152)
                at org.infinispan.interceptors.DistTxInterceptor.visitPutMapCommand(DistTxInterceptor.java:104)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:48)
                at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:34)
                at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:269)
                at org.infinispan.CacheDelegate.putAll(CacheDelegate.java:447)
                at org.infinispan.CacheDelegate.putAll(CacheDelegate.java:216)
                at ge.energy.smartgrid.applications.omtest.controllers.InfinispanCacheController.putAll(InfinispanCacheController.java:180)
                at ge.energy.smartgrid.applications.omtest.drivers.ReadUpdateTestDriver$5.run(ReadUpdateTestDriver.java:370)
            13:21:03,492 | WARN  | bCluster01-36455 | NAKACK                           | .jgroups.protocols.pbcast.NAKACK  965 | (requester=SSELabClusterMaster-58229, local_addr=LabCluster01-36455) message SSELabCl04-17481::7670 not found in retransmission table of SSELabCl04-17481:
            [7654 : 7671 (7671)]
            13:21:03,492 | WARN  | bCluster01-36455 | NAKACK                           | .jgroups.protocols.pbcast.NAKACK  965 | (requester=SSELabCl07-53048, local_addr=LabCluster01-36455) message SSELabCl04-17481::7670 not found in retransmission table of SSELabCl04-17481:
            [7654 : 7671 (7671)]
            13:21:03,493 | WARN  | bCluster01-36455 | NAKACK                           | .jgroups.protocols.pbcast.NAKACK  965 | (requester=SSELabCl05-11846, local_addr=LabCluster01-36455) message SSELabCl04-17481::7670 not found in retransmission table of SSELabCl04-17481:
            [7654 : 7671 (7671)]
            13:21:03,492 | WARN  | bCluster01-36455 | NAKACK                           | .jgroups.protocols.pbcast.NAKACK  965 | (requester=SSELabCl08-8505, local_addr=LabCluster01-36455) message SSELabCl04-17481::7670 not found in retransmission table of SSELabCl04-17481:

             

            2. I also got the following SuspectException, but I am not sure what it means. Could you please suggest?

            11:35:20,063 | WARN  | SSELabCl02-54447 | GMS                              | org.jgroups.protocols.pbcast.GMS  562 | SSELabCl02-54447: not member of view [SSELabClusterMaster-6376|10] [SSELabClusterMaster-6376, LabCluster01-42248, SSELabCl03-25876, SSELabCl04-52466, SSELabCl05-1712, SSELabCl06-47092, SSELabCl08-59019, SSELabCl09-22842]; discarding it
            11:35:22,181 | WARN  | SSELabCl02-54447 | GMS                              | org.jgroups.protocols.pbcast.GMS  562 | SSELabCl02-54447: not member of view [SSELabClusterMaster-6376|11] [SSELabClusterMaster-6376]; discarding it
            11:35:24,090 | ERROR | Thread-8         | InvocationContextInterceptor     | ors.InvocationContextInterceptor   55 | Execution error:
            org.infinispan.remoting.transport.jgroups.SuspectException: Suspected member: SSELabCl08-59019
                at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:418)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:100)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:124)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:229)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:216)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:212)
                at org.infinispan.interceptors.DistributionInterceptor.handleWriteCommand(DistributionInterceptor.java:304)
                at org.infinispan.interceptors.DistributionInterceptor.visitPutMapCommand(DistributionInterceptor.java:168)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.LockingInterceptor.visitPutMapCommand(LockingInterceptor.java:210)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
                at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.TxInterceptor.enlistWriteAndInvokeNext(TxInterceptor.java:183)
                at org.infinispan.interceptors.TxInterceptor.visitPutMapCommand(TxInterceptor.java:152)
                at org.infinispan.interceptors.DistTxInterceptor.visitPutMapCommand(DistTxInterceptor.java:104)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:48)
                at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:34)
                at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:269)
                at org.infinispan.CacheDelegate.putAll(CacheDelegate.java:447)
                at org.infinispan.CacheDelegate.putAll(CacheDelegate.java:216)
                at ge.energy.smartgrid.applications.omtest.controllers.InfinispanCacheController.putAll(InfinispanCacheController.java:180)
                at ge.energy.smartgrid.applications.omtest.drivers.ReadUpdateTestDriver$2.run(ReadUpdateTestDriver.java:182)
            11:35:24,092 | ERROR | Thread-8         | jgroups                          | org.jgroups.util.Util$1            76 | uncaught exception in Thread[Thread-8,5,Thread Pools] (thread group=org.jgroups.util.Util$1[name=JGroups,maxpri=10] )
            org.infinispan.remoting.transport.jgroups.SuspectException: Suspected member: SSELabCl08-59019
                at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:418)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:100)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:124)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:229)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:216)
                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:212)
                at org.infinispan.interceptors.DistributionInterceptor.handleWriteCommand(DistributionInterceptor.java:304)
                at org.infinispan.interceptors.DistributionInterceptor.visitPutMapCommand(DistributionInterceptor.java:168)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.LockingInterceptor.visitPutMapCommand(LockingInterceptor.java:210)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
                at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.TxInterceptor.enlistWriteAndInvokeNext(TxInterceptor.java:183)
                at org.infinispan.interceptors.TxInterceptor.visitPutMapCommand(TxInterceptor.java:152)
                at org.infinispan.interceptors.DistTxInterceptor.visitPutMapCommand(DistTxInterceptor.java:104)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:48)
                at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:34)
                at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:269)
                at org.infinispan.CacheDelegate.putAll(CacheDelegate.java:447)
                at org.infinispan.CacheDelegate.putAll(CacheDelegate.java:216)
                at ge.energy.smartgrid.applications.omtest.controllers.InfinispanCacheController.putAll(InfinispanCacheController.java:180)
                at ge.energy.smartgrid.applications.omtest.drivers.ReadUpdateTestDriver$2.run(ReadUpdateTestDriver.java:182)

            • 3. Re: Deadlock situation when L1 is enabled for distributed cache
              manik

              Lin Ye wrote:

               

              Thanks for the response. I don't have a unit test for it, and I am afraid it's hard to reproduce the case in a unit test. Actually, it only happens when I have parallel updates from multiple cache instances. Basically, on each instance I am only reading/writing keys that contain a substring specific to that instance, so the L1 on that instance should only contain keys with that substring. Looking at the log, I noticed that a certain instance tried to lock a key containing a substring specific to a different instance; that key may or may not be owned by the local instance, but it is supposed to be updated only by the other instance. Any suggestions on what's going on?

               

              I'm still not entirely sure I get what you mean.  So you have the following setup:

               

                   Node1: read/write on keys "Node1-*"

                   Node2: read/write on keys "Node2-*"

               

              And you are surprised to see Node1 locking a key starting with "Node2-"?  If you are using DIST with numOwners > 1, this is normal, since Node1 will hold backups of Node2's keys (and vice versa).
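
               

              To make the ownership point concrete, a distributed setup along the following lines shows this behaviour (just a sketch using the 4.x-era programmatic Configuration; the setter names and values are illustrative, so check them against your version):

                  import org.infinispan.config.Configuration;
                  import org.infinispan.manager.DefaultCacheManager;

                  // DIST cache with two owners per key and L1 enabled. With numOwners = 2,
                  // the consistent hash can pick Node1 as the backup owner of a key that is
                  // only ever written by Node2, so Node1 legitimately locks that key when
                  // the write is replicated to it.
                  Configuration cfg = new Configuration();
                  cfg.setCacheMode(Configuration.CacheMode.DIST_SYNC);
                  cfg.setNumOwners(2);
                  cfg.setL1CacheEnabled(true);
                  cfg.setL1Lifespan(600000);        // illustrative value
                  cfg.setUseLockStriping(false);

                  DefaultCacheManager manager = new DefaultCacheManager();
                  manager.defineConfiguration("distCache", cfg);   // "distCache" is a placeholder name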

               

              Re: your second issue, could you start a separate thread for that?

              • 4. Re: Deadlock situation when L1 is enabled for distributed cache
                yelin66

                Thanks for the reply. What you described for the setup is correct. However, I am still not clear on why the deadlock happens. Node1 could hold node2's backup and try to acquire a lock, but I don't see why it would fail to acquire the lock and cause a deadlock on both sides. As my test program on node1 and node2 never updates the same key at the same time, why did it cause a deadlock? Actually, when I run the test against a replicated cache with 10 nodes holding exactly the same copy of the data, with lock striping disabled, I don't see the deadlock because there are no key collisions. I only saw this when L1 is enabled.

                • 5. Re: Deadlock situation when L1 is enabled for distributed cache
                  manik

                  Hmm, only when L1 is enabled.  Interesting.  L1 only comes into play when a node reads state it does not own.  Can I confirm that:

                   

                  1. your test runs on 2 nodes
                  2. you have numOwners set to 2
                  3. that you do not see the deadlock when you use distribution with L1 disabled?

                   

                  Also, is this something easy to reproduce?  Can you grab TRACE level logs on both nodes when the deadlock occurs?  I'd like to see who's holding the locks on both sides.

                  • 6. Re: Deadlock situation when L1 is enabled for distributed cache
                    yelin66

                    My test runs on 10 nodes with numOwners set to 2. I haven't seen the deadlock with distribution and L1 disabled so far, but I can't guarantee it won't happen, as it may be down to chance. My test runs a read test first, then the write test, so L1 gets populated during the read test and may somehow relate to the write test later, but I am not sure.

                     

                    I used to be able to reproduce this. However, I am currently seeing the replication timeout and SuspectException from the other post more frequently, which tends to mask this one. I'll try to reproduce this and get the TRACE-level logs to you. In the meantime, if you have any hints on what's going on, please let me know.

                    • 7. Re: Deadlock situation when L1 is enabled for distributed cache
                      manik

                      Lin Ye wrote:

                       

                      My test runs on 10 nodes with numOwners set to 2. I haven't seen the deadlock with distribution and L1 disabled so far, but I can't guarantee it won't happen, as it may be down to chance. My test runs a read test first, then the write test, so L1 gets populated during the read test and may somehow relate to the write test later, but I am not sure.

                      How can you guarantee that the reads are local?  E.g.,

                       

                           cache1.put(k1, v1);

                           cache1.get(k1);

                       

                      there is no guarantee that k1 maps to cache1.  It could be resident on cache7 and cache9, for example.  Then the get() on cache1 will cause the entry to be pulled into cache1's L1.
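
                       

                      In other words, something like this is enough to populate L1 (the key and node names are purely illustrative):

                           // Executed on node1. The consistent hash may map "Node1-42" to node7
                           // and node9, so node1 does not own it even though node1 wrote it.
                           cache.put("Node1-42", "v1");

                           // Still on node1: the value is fetched from a remote owner and, with
                           // L1 enabled, a copy is stored in node1's L1.
                           Object v = cache.get("Node1-42");

                           // Any later write to "Node1-42" invalidates that L1 copy, which is
                           // where the extra lock acquisition on a non-owner comes from.
                           cache.put("Node1-42", "v2");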

                      • 8. Re: Deadlock situation when L1 is enabled for distributed cache
                        yelin66

                        Sorry, I got confused by your latest message and need clarification. What do you refer to by cache1, cache7, cache9? Do you mean the portion of cache data on node1, node7, and node9? If that's the case, I NEVER said a cache.put(k1) on node1 is guaranteed to map to node1, and that's why I said a cache.get(k1) on node1 may bring an entry into node1's L1. I hope that's clear. However, regardless of which node k1 maps to, if I do cache.put(k1) on node1 and cache.put(k7) on node7 at the same time, where k1 and k7 are never the same key and lock striping (useLockStriping) is disabled, do you expect a deadlock situation to arise by design?

                        • 9. Re: Deadlock situation when L1 is enabled for distributed cache
                          manik

                          Right, that makes sense.  And no, I still don't expect any deadlocks to arise - I'll have a look and see why this is the case.  Again, if you can recreate this on, say, 3 nodes, in the form of a functional unit test, that would help us immensely.

                          • 10. Re: Deadlock situation when L1 is enabled for distributed cache
                            yelin66

                             Thanks for the feedback. I'll try to reproduce it.

                            • 11. Re: Deadlock situation when L1 is enabled for distributed cache
                              yelin66

                              I couldn't reproduce it with 3 or 5 nodes so far; I only got it with 10 nodes. Turning on TRACE-level logging for log4j.logger.org.infinispan.util.concurrent.locks produced too much output: the test took forever, and the meaningful part of the log was lost at the end. Here I can only give you the exception info. Please let me know if you get any hints from it...

                               

                              Basically, I got exceptions in the logs on 2 nodes. Please note the clocks on the 2 nodes drifted a little, so the exceptions actually occurred around the same time. It looks to me as if the update was initiated by SSELabCl04, since it appears in the key, but the key was owned by SSELabCl05. I am not sure what "for requestor [Thread[OOB-23,SSELabCl05-61693,5,Thread Pools]]! Lock held by [null]" means. Please suggest. Also, for this particular test the only node that failed to finish was SSELabCl04, so it seems to me this may not be a deadlock situation but simply a failure to acquire the lock somehow. Any idea what may cause the failure and how to avoid it? I would appreciate your feedback.

                               

                              The exception on node SSELabCl04:

                              12:54:43,020 | ERROR | Thread-14        | InvocationContextInterceptor     | ors.InvocationContextInterceptor   55 | Execution error:
                              org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [/measurements/acline/value@SSELabCl04/10.44.44.154_439] for requestor [Thread[OOB-23,SSELabCl05-61693,5,Thread Pools]]! Lock held by [null]
                                  at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:213)
                                  at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:148)
                                  at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:106)
                                  at org.infinispan.interceptors.LockingInterceptor.visitPutMapCommand(LockingInterceptor.java:208)
                                  at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                                  at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                                  at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
                                  at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                                  at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                                  at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                                  at org.infinispan.interceptors.TxInterceptor.enlistWriteAndInvokeNext(TxInterceptor.java:183)
                                  at org.infinispan.interceptors.TxInterceptor.visitPutMapCommand(TxInterceptor.java:152)
                                  at org.infinispan.interceptors.DistTxInterceptor.visitPutMapCommand(DistTxInterceptor.java:104)
                                  at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                                  at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                                  at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:48)
                                  at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:34)
                                  at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                                  at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                                  at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:269)
                                  at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommand(BaseRpcInvokingCommand.java:41)
                                  at org.infinispan.commands.remote.SingleRpcCommand.perform(SingleRpcCommand.java:44)
                                  at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:65)
                                  at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:178)
                                  at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:150)
                                  at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:578)
                                  at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:489)
                                  at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:350)
                                  at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:716)
                                  at org.jgroups.JChannel.up(JChannel.java:1456)
                                  at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:829)
                                  at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:441)
                                  at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:270)
                                  at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
                                  at org.jgroups.protocols.FC.up(FC.java:495)
                                  at org.jgroups.protocols.pbcast.GMS.up(GMS.java:890)
                                  at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:236)
                                  at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:559)
                                  at org.jgroups.protocols.UNICAST.up(UNICAST.java:275)
                                  at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:705)
                                  at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
                                  at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:178)
                                  at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
                                  at org.jgroups.stack.Protocol.up(Protocol.java:340)
                                  at org.jgroups.protocols.Discovery.up(Discovery.java:277)
                                  at org.jgroups.protocols.PING.up(PING.java:67)
                                  at org.jgroups.protocols.TP.passMessageUp(TP.java:953)
                                  at org.jgroups.protocols.TP.access$100(TP.java:53)
                                  at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1457)
                                  at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1439)
                                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                                  at java.lang.Thread.run(Unknown Source)
                              12:54:43,028 | ERROR | Thread-14        | jgroups                          | org.jgroups.util.Util$1            76 | uncaught exception in Thread[Thread-14,5,Thread Pools] (thread group=org.jgroups.util.Util$1[name=JGroups,maxpri=10] )
                              org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [/measurements/acline/value@SSELabCl04/10.44.44.154_439] for requestor [Thread[OOB-23,SSELabCl05-61693,5,Thread Pools]]! Lock held by [null]
                                  at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:213)
                                  at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:148)
                                  at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:106)
                                  at org.infinispan.interceptors.LockingInterceptor.visitPutMapCommand(LockingInterceptor.java:208)
                                  at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                                  at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                                  at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
                                  at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                                  at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                                  at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                                  at org.infinispan.interceptors.TxInterceptor.enlistWriteAndInvokeNext(TxInterceptor.java:183)
                                  at org.infinispan.interceptors.TxInterceptor.visitPutMapCommand(TxInterceptor.java:152)
                                  at org.infinispan.interceptors.DistTxInterceptor.visitPutMapCommand(DistTxInterceptor.java:104)
                                  at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                                  at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                                  at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:48)
                                  at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:34)
                                  at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                                  at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                                  at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:269)
                                  at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommand(BaseRpcInvokingCommand.java:41)
                                  at org.infinispan.commands.remote.SingleRpcCommand.perform(SingleRpcCommand.java:44)
                                  at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:65)
                                  at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:178)
                                  at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:150)
                                  at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:578)
                                  at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:489)
                                  at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:350)
                                  at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:716)
                                  at org.jgroups.JChannel.up(JChannel.java:1456)
                                  at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:829)
                                  at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:441)
                                  at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:270)
                                  at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
                                  at org.jgroups.protocols.FC.up(FC.java:495)
                                  at org.jgroups.protocols.pbcast.GMS.up(GMS.java:890)
                                  at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:236)
                                  at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:559)
                                  at org.jgroups.protocols.UNICAST.up(UNICAST.java:275)
                                  at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:705)
                                  at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
                                  at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:178)
                                  at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
                                  at org.jgroups.stack.Protocol.up(Protocol.java:340)
                                  at org.jgroups.protocols.Discovery.up(Discovery.java:277)
                                  at org.jgroups.protocols.PING.up(PING.java:67)
                                  at org.jgroups.protocols.TP.passMessageUp(TP.java:953)
                                  at org.jgroups.protocols.TP.access$100(TP.java:53)
                                  at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1457)
                                  at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1439)
                                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                                  at java.lang.Thread.run(Unknown Source)

                               

                              The exception on node SSELabCl05:

                              12:55:39,372 | ERROR | SSELabCl05-61693 | InvocationContextInterceptor     | ors.InvocationContextInterceptor   55 | Execution error:
                              org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [/measurements/acline/value@SSELabCl04/10.44.44.154_439] for requestor [Thread[OOB-23,SSELabCl05-61693,5,Thread Pools]]! Lock held by [null]
                                  at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:213)
                                  at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:148)
                                  at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:106)
                                  at org.infinispan.interceptors.LockingInterceptor.visitPutMapCommand(LockingInterceptor.java:208)
                                  at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                                  at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                                  at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
                                  at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                                  at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                                  at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                                  at org.infinispan.interceptors.TxInterceptor.enlistWriteAndInvokeNext(TxInterceptor.java:183)
                                  at org.infinispan.interceptors.TxInterceptor.visitPutMapCommand(TxInterceptor.java:152)
                                  at org.infinispan.interceptors.DistTxInterceptor.visitPutMapCommand(DistTxInterceptor.java:104)
                                  at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                                  at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                                  at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:48)
                                  at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:34)
                                  at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:73)
                                  at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:64)
                                  at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:269)
                                  at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommand(BaseRpcInvokingCommand.java:41)
                                  at org.infinispan.commands.remote.SingleRpcCommand.perform(SingleRpcCommand.java:44)
                                  at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:65)
                                  at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:178)
                                  at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:150)
                                  at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:578)
                                  at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:489)
                                  at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:350)
                                  at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:716)
                                  at org.jgroups.JChannel.up(JChannel.java:1456)
                                  at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:829)
                                  at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:441)
                                  at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:270)
                                  at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
                                  at org.jgroups.protocols.FC.up(FC.java:495)
                                  at org.jgroups.protocols.pbcast.GMS.up(GMS.java:890)
                                  at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:236)
                                  at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:559)
                                  at org.jgroups.protocols.UNICAST.up(UNICAST.java:275)
                                  at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:705)
                                  at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
                                  at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:178)
                                  at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
                                  at org.jgroups.stack.Protocol.up(Protocol.java:340)
                                  at org.jgroups.protocols.Discovery.up(Discovery.java:277)
                                  at org.jgroups.protocols.PING.up(PING.java:67)
                                  at org.jgroups.protocols.TP.passMessageUp(TP.java:953)
                                  at org.jgroups.protocols.TP.access$100(TP.java:53)
                                  at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1457)
                                  at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1439)
                                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                                  at java.lang.Thread.run(Unknown Source)
                              12:55:39,372 | ERROR | SSELabCl05-61693 | InvocationContextInterceptor     | ors.InvocationContextInterceptor   55 | Execution error:
                              org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [/measurements/acline/value@SSELabCl04/10.44.44.154_7161] for requestor [Thread[OOB-18,SSELabCl05-61693,5,Thread Pools]]! Lock held by [null]
                                  at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:213)
                                  at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:148)
                                  at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:106)
                                  at org.infinispan.interceptors.LockingInterceptor.visitInvalidateCommand(LockingInterceptor.java:186)
                                  at org.infinispan.commands.write.InvalidateCommand.acceptVisitor(InvalidateCommand.java:119)
                                  at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                                  at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
                                  at org.infinispan.commands.AbstractVisitor.visitInvalidateCommand(AbstractVisitor.java:117)
                                  at org.infinispan.commands.write.InvalidateCommand.acceptVisitor(InvalidateCommand.java:119)
                                  at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                                  at org.infinispan.interceptors.TxInterceptor.enlistWriteAndInvokeNext(TxInterceptor.java:183)
                                  at org.infinispan.interceptors.TxInterceptor.visitInvalidateCommand(TxInterceptor.java:157)
                                  at org.infinispan.commands.write.InvalidateCommand.acceptVisitor(InvalidateCommand.java:119)
                                  at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
                                  at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:48)
                                  at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:34)
                                  at org.infinispan.commands.AbstractVisitor.visitInvalidateCommand(AbstractVisitor.java:117)
                                  at org.infinispan.commands.write.InvalidateCommand.acceptVisitor(InvalidateCommand.java:119)
                                  at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:269)
                                  at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommand(BaseRpcInvokingCommand.java:41)
                                  at org.infinispan.commands.remote.SingleRpcCommand.perform(SingleRpcCommand.java:44)
                                  at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:65)
                                  at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:178)
                                  at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:150)
                                  at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:578)
                                  at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:489)
                                  at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:350)
                                  at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:716)
                                  at org.jgroups.JChannel.up(JChannel.java:1456)
                                  at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:829)
                                  at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:489)
                                  at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:270)
                                  at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
                                  at org.jgroups.protocols.FC.up(FC.java:495)
                                  at org.jgroups.protocols.pbcast.GMS.up(GMS.java:890)
                                  at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:236)
                                  at org.jgroups.protocols.UNICAST.up(UNICAST.java:290)
                                  at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:811)
                                  at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:667)
                                  at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
                                  at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:178)
                                  at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
                                  at org.jgroups.stack.Protocol.up(Protocol.java:340)
                                  at org.jgroups.protocols.Discovery.up(Discovery.java:277)
                                  at org.jgroups.protocols.PING.up(PING.java:67)
                                  at org.jgroups.protocols.TP.passMessageUp(TP.java:953)
                                  at org.jgroups.protocols.TP.access$100(TP.java:53)
                                  at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1457)
                                  at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1439)
                                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                                  at java.lang.Thread.run(Unknown Source)

                              • 12. Re: Deadlock situation when L1 is enabled for distributed cache
                                manik

                                 Hmm, the lock owner being null is weird.  Perhaps it is a scheduling/thread-starvation issue: no one actually owns the lock (it is uncontended), but due to scheduling the thread cannot acquire it before the timeout kicks in?

                                 

                                 Or it could be a race: the lock could not be acquired because it was held elsewhere, but by the time this is reported (i.e. the exception is thrown) the lock has already been released.

                                 

                                Here is the code that generates this exception:

                                 

                                 

                                if (lockManager.lockAndRecord(key, ctx)) {
                                  return true;
                                } else {
                                  Object owner = lockManager.getOwner(key);
                                  // if lock cannot be acquired, expose the key itself, not the marshalled value
                                  if (key instanceof MarshalledValue) {
                                    key = ((MarshalledValue) key).get();
                                  }
                                  throw new TimeoutException("Unable to acquire lock after [" + 
                                        Util.prettyPrintTime(getLockAcquisitionTimeout(ctx)) + 
                                        "] on key [" + key + "] for requestor [" +
                                        ctx.getLockOwner() + "]! Lock held by [" + owner + "]");
                                }
                                
                                
                                • 13. Re: Deadlock situation when L1 is enabled for distributed cache
                                  manik

                                   Anyway, I have this on my to-do list; I'll try to create a unit/stress test for this case and investigate the issue in more detail.

                                  • 14. Re: Deadlock situation when L1 is enabled for distributed cache
                                    yelin66

                                    Manik, thanks a lot for looking into it.

                                     

                                     FYI, when I used a replicated cache for a scenario with key-collision updates, and an update on the same key from two instances caused a true deadlock, I got the same message "Lock held by [null]". I don't know why it's [null], but that's what I got there. You could probably try reproducing that deadlock situation first (as it's much easier than the case above) and see whether you can find out who held the lock... If you can get that information, let me know how, so I can try to collect it as well; it may be helpful.
