-
15. Re: failure situation using cluster-wide lock
manik Dec 3, 2010 10:00 AM (in response to yelin666)Hi Lin - this is done, see details of the fix in http://lists.jboss.org/pipermail/infinispan-dev/2010-December/006943.html
You will see this fix in 4.2.0.CR3.
-
16. Re: failure situation using cluster-wide lock
yelin666 Dec 6, 2010 9:21 AM (in response to manik)I restested with 4.2.0.CR3 using the simple test above & JBossTX. Unfortunated, it still locked up every time after two instances running for a while.
-
17. Re: failure situation using cluster-wide lock
manik Dec 14, 2010 6:51 AM (in response to yelin666)Hi Lin
I haven't been able to reproduce this with more recent snapshots. I just published a new 4.2.0-SNAPSHOT which contains some important fixes in the transaction adapters, which may be related. Could you pls try with this snapshot, and if it works, this will form the basis of 4.2.0.CR4.
Thanks
Manik
-
18. Re: failure situation using cluster-wide lock
yelin666 Dec 14, 2010 8:52 AM (in response to manik)Hi Manik,
I downloaded infinispan-core-4.2.0-20101214.115023-9.jar (assumed that's the snapshot build you mentioned). I am still seeing the lock up running the simple test I attached earlier in this thread, when the counter reached about 20 (doesn't take long). Please suggest if that's the right version to try.
Thanks,
Lin
-
19. Re: failure situation using cluster-wide lock
manik Dec 17, 2010 10:21 AM (in response to yelin666)I'm not sure if that is the right snapshot, but 4.2.0.CR4 is out. Could you pls try that?
-
20. Re: failure situation using cluster-wide lock
yelin666 Dec 17, 2010 2:11 PM (in response to manik)Manik,
I just tried 4.2.0.CR4, and it locked up as previous versions. It seems one thread was holding the lock and never released it. I am wondering how you verify it is fixed on your side?
I understand a cluster-wide lock is not easy to implement. However, with a lock up in a simple test case above, I won't be able to recommend using the lock in a production environment to our team.
Thanks,
Lin
-
21. Re: failure situation using cluster-wide lock
manik Jan 5, 2011 3:07 PM (in response to yelin666)Hi Lin - we did have a few unit tests for this. I'll have a look again, but could I ask you to try 4.2.0.FINAL as well?
-
22. Re: failure situation using cluster-wide lock
yelin666 Jan 5, 2011 9:32 PM (in response to manik)I tried 4.2.0.FINAL, and it's having the same issue with my test attached above. If I kill one instance, it may release the lock and the other instance could get the lock and continue. So it seems one thread in the process being killed was holding the lock.
Did you have a unit test for this particular issue? Did it pass? I am just wondering...
-
23. Re: failure situation using cluster-wide lock
manik Jan 6, 2011 7:48 AM (in response to yelin666)This is the test that I use.
I'll give it another run and test for regressions.
-
24. Re: failure situation using cluster-wide lock
manik Jan 6, 2011 11:11 AM (in response to manik)I've reopened the JIRA and will investigate further. I recommend adding a watch on that JIRA. :-)
-
25. Re: failure situation using cluster-wide lock
yelin666 Jan 6, 2011 11:19 AM (in response to manik)Thanks for the update. Did you reproduce it with the stress test? I hope it'll be fixed in 4.2.1.
-
26. failure situation using cluster-wide lock
manik Jan 7, 2011 10:55 AM (in response to yelin666)Hi - I haven't been able to reproduce this with the stress test as yet. The test failure you still see - is it still exactly the same? Or is it some other error that you now see? Could you share a recent (4.2.0.FINAL) log file?
-
27. Re: failure situation using cluster-wide lock
yelin666 Jan 9, 2011 9:11 PM (in response to manik)The problem I saw is exactly the same as before. I looked at the code of the stress test you sent, and noticed a major difference between your test and my test. Basically, in your test, only two test threads were started in the main thread, and the CacheManager of one thread got killed and the lock owner was checked to verify the correctness. In my test, I created a timer to repeatedly start the CounterTask at a fixed rate, so more threads got involved in the test. I attached my test code again. It's a standalone Java application, and I run two instances to let them race for the lock. After a while, both of them got locked out, and no one could acquire the lock anymore. The following is the output from one instance.
Create cache manager ...
previous settings {}
log4j:WARN No appenders could be found for logger (org.infinispan.factories.GlobalComponentRegistry).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
get cache
memebers [T00696119-60327, T00696119-38912]
cache size 1
aquiring lock on cache test-grid key locked-counter...
org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [locked-counter] for requestor [GlobalTransaction:<T00696119-38912>:1:remote]! Lock held by [GlobalTransaction:<T00696119-60327>:4:local]
at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:228)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:155)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:107)
at org.infinispan.interceptors.LockingInterceptor.lockKeysForLockCommand(LockingInterceptor.java:217)
at org.infinispan.interceptors.LockingInterceptor.visitLockControlCommand(LockingInterceptor.java:195)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:152)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.TxInterceptor.enlistReadAndInvokeNext(TxInterceptor.java:169)
at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:125)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:87)
at org.infinispan.interceptors.InvocationContextInterceptor.visitLockControlCommand(InvocationContextInterceptor.java:63)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:273)
at org.infinispan.commands.control.LockControlCommand.perform(LockControlCommand.java:165)
at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:94)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:177)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:149)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:578)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:489)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:365)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:771)
at org.jgroups.JChannel.up(JChannel.java:1465)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:478)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:265)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:190)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:419)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:419)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:888)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:310)
at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:806)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:671)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
at org.jgroups.protocols.FD.up(FD.java:266)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
at org.jgroups.protocols.Discovery.up(Discovery.java:292)
at org.jgroups.protocols.PING.up(PING.java:67)
rollback...
at org.jgroups.protocols.MPING.up(MPING.java:175)
done rollback
at org.jgroups.protocols.TP.passMessageUp(TP.java:1093)
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
at org.jgroups.protocols.TP.access$100(TP.java:56)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1633)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1615)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
current value : 6
commit...
done commit
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
current value : 7
commit...
done commit
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
current value : 9
commit...
done commit
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
current value : 11
commit...
done commit
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
current value : 12
commit...
done commit
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
current value : 14
commit...
done commit
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
current value : 16
commit...
done commit
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
current value : 18
commit...
done commit
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
current value : 19
commit...
done commit
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
current value : 21
commit...
done commit
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
current value : 23
commit...
done commit
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
2011-01-09 20:54:24,958 [Thread-4] WARN com.arjuna.ats.arjuna.logging.arjLoggerI18N - [com.arjuna.ats.arjuna.coordinator.TransactionReaper_18] - TransactionReaper::check timeout for TX 0:ffff032b3859:662:4d2a6694:32 in state RUN
2011-01-09 20:54:24,958 [Thread-5] WARN com.arjuna.ats.arjuna.logging.arjLoggerI18N - [com.arjuna.ats.arjuna.coordinator.BasicAction_58] - Abort of action id 0:ffff032b3859:662:4d2a6694:32 invoked while multiple threads active within it.
2011-01-09 20:54:24,958 [Thread-5] WARN com.arjuna.ats.arjuna.logging.arjLoggerI18N - [com.arjuna.ats.arjuna.coordinator.CheckedAction_2] - CheckedAction::check - atomic action 0:ffff032b3859:662:4d2a6694:32 aborting with 1 threads active!
2011-01-09 20:54:24,974 [Thread-5] WARN com.arjuna.ats.arjuna.logging.arjLoggerI18N - [com.arjuna.ats.arjuna.coordinator.TransactionReaper_7] - TransactionReaper::doCancellations worker Thread[Thread-5,5,main] successfully canceled TX 0:ffff032b3859:662:4d2a6694:32
java.lang.IllegalStateException: Transaction TransactionImple < ac, BasicAction: 0:ffff032b3859:662:4d2a6694:32 status: ActionStatus.ABORTED > is not in a valid state to be invoking cache operations on.
at org.infinispan.interceptors.TxInterceptor.enlist(TxInterceptor.java:193)
at org.infinispan.interceptors.TxInterceptor.enlistReadAndInvokeNext(TxInterceptor.java:165)
at org.infinispan.interceptors.TxInterceptor.visitGetKeyValueCommand(TxInterceptor.java:160)
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:59)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:87)
rollback...
at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:58)
2011-01-09 20:54:26,381 [Timer-0] WARN com.arjuna.ats.arjuna.logging.arjLoggerI18N - [com.arjuna.ats.arjuna.coordinator.BasicAction_40] - Abort called on already aborted atomic action 0:ffff032b3859:662:4d2a6694:32
done rollback
tx status at the end : no transaction
at org.infinispan.commands.AbstractVisitor.visitGetKeyValueCommand(AbstractVisitor.java:88)
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:59)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:273)
at org.infinispan.CacheDelegate.get(CacheDelegate.java:207)
at CacheScheduledCounter$CounterTask.run(CacheScheduledCounter.java:84)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
aquiring lock on cache test-grid key locked-counter...
org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [locked-counter] for requestor [GlobalTransaction:<T00696119-38912>:14:local]! Lock held by [GlobalTransaction:<T00696119-38912>:13:local]
at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:228)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:155)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:107)
at org.infinispan.interceptors.LockingInterceptor.lockKeysForLockCommand(LockingInterceptor.java:217)
at org.infinispan.interceptors.LockingInterceptor.visitLockControlCommand(LockingInterceptor.java:195)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:152)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.TxInterceptor.enlistReadAndInvokeNext(TxInterceptor.java:169)
at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:125)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:87)
rollback...
done rollback
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
at org.infinispan.interceptors.InvocationContextInterceptor.visitLockControlCommand(InvocationContextInterceptor.java:63)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:273)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:311)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:303)
at CacheScheduledCounter$CounterTask.run(CacheScheduledCounter.java:82)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [locked-counter] for requestor [GlobalTransaction:<T00696119-38912>:15:local]! Lock held by [GlobalTransaction:<T00696119-38912>:13:local]
at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:228)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:155)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:107)
at org.infinispan.interceptors.LockingInterceptor.lockKeysForLockCommand(LockingInterceptor.java:217)
at org.infinispan.interceptors.LockingInterceptor.visitLockControlCommand(LockingInterceptor.java:195)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:152)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.TxInterceptor.enlistReadAndInvokeNext(TxInterceptor.java:169)
at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:125)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:87)
rollback...
done rollback
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
at org.infinispan.interceptors.InvocationContextInterceptor.visitLockControlCommand(InvocationContextInterceptor.java:63)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:273)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:311)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:303)
at CacheScheduledCounter$CounterTask.run(CacheScheduledCounter.java:82)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [locked-counter] for requestor [GlobalTransaction:<T00696119-38912>:16:local]! Lock held by [GlobalTransaction:<T00696119-38912>:13:local]
at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:228)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:155)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:107)
at org.infinispan.interceptors.LockingInterceptor.lockKeysForLockCommand(LockingInterceptor.java:217)
at org.infinispan.interceptors.LockingInterceptor.visitLockControlCommand(LockingInterceptor.java:195)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:152)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.TxInterceptor.enlistReadAndInvokeNext(TxInterceptor.java:169)
at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:125)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:87)
rollback...
done rollback
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
at org.infinispan.interceptors.InvocationContextInterceptor.visitLockControlCommand(InvocationContextInterceptor.java:63)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:273)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:311)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:303)
at CacheScheduledCounter$CounterTask.run(CacheScheduledCounter.java:82)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [locked-counter] for requestor [GlobalTransaction:<T00696119-38912>:17:local]! Lock held by [GlobalTransaction:<T00696119-38912>:13:local]
at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:228)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:155)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:107)
at org.infinispan.interceptors.LockingInterceptor.lockKeysForLockCommand(LockingInterceptor.java:217)
at org.infinispan.interceptors.LockingInterceptor.visitLockControlCommand(LockingInterceptor.java:195)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:152)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.TxInterceptor.enlistReadAndInvokeNext(TxInterceptor.java:169)
at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:125)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:87)
rollback...
done rollback
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
at org.infinispan.interceptors.InvocationContextInterceptor.visitLockControlCommand(InvocationContextInterceptor.java:63)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:273)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:311)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:303)
at CacheScheduledCounter$CounterTask.run(CacheScheduledCounter.java:82)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [locked-counter] for requestor [GlobalTransaction:<T00696119-38912>:18:local]! Lock held by [GlobalTransaction:<T00696119-38912>:13:local]
at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:228)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:155)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:107)
at org.infinispan.interceptors.LockingInterceptor.lockKeysForLockCommand(LockingInterceptor.java:217)
at org.infinispan.interceptors.LockingInterceptor.visitLockControlCommand(LockingInterceptor.java:195)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:152)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.TxInterceptor.enlistReadAndInvokeNext(TxInterceptor.java:169)
at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:125)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:87)
rollback...
done rollback
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
at org.infinispan.interceptors.InvocationContextInterceptor.visitLockControlCommand(InvocationContextInterceptor.java:63)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:273)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:311)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:303)
at CacheScheduledCounter$CounterTask.run(CacheScheduledCounter.java:82)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
2011-01-09 20:55:29,871 [Thread-4] WARN com.arjuna.ats.arjuna.logging.arjLoggerI18N - [com.arjuna.ats.arjuna.coordinator.TransactionReaper_18] - TransactionReaper::check timeout for TX 0:ffff032b3859:662:4d2a6694:4a in state RUN
2011-01-09 20:55:29,871 [Thread-5] WARN com.arjuna.ats.arjuna.logging.arjLoggerI18N - [com.arjuna.ats.arjuna.coordinator.BasicAction_58] - Abort of action id 0:ffff032b3859:662:4d2a6694:4a invoked while multiple threads active within it.
2011-01-09 20:55:29,871 [Thread-5] WARN com.arjuna.ats.arjuna.logging.arjLoggerI18N - [com.arjuna.ats.arjuna.coordinator.CheckedAction_2] - CheckedAction::check - atomic action 0:ffff032b3859:662:4d2a6694:4a aborting with 1 threads active!
2011-01-09 20:55:29,871 [Thread-5] WARN com.arjuna.ats.arjuna.logging.arjLoggerI18N - [com.arjuna.ats.arjuna.coordinator.TransactionReaper_7] - TransactionReaper::doCancellations worker Thread[Thread-5,5,main] successfully canceled TX 0:ffff032b3859:662:4d2a6694:4a
org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [locked-counter] for requestor [GlobalTransaction:<T00696119-38912>:19:local]! Lock held by [GlobalTransaction:<T00696119-38912>:13:local]
at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:228)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:155)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:107)
at org.infinispan.interceptors.LockingInterceptor.lockKeysForLockCommand(LockingInterceptor.java:217)
at org.infinispan.interceptors.LockingInterceptor.visitLockControlCommand(LockingInterceptor.java:195)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:152)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.TxInterceptor.enlistReadAndInvokeNext(TxInterceptor.java:169)
at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:125)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:87)
rollback...
2011-01-09 20:55:30,027 [Timer-0] WARN com.arjuna.ats.arjuna.logging.arjLoggerI18N - [com.arjuna.ats.arjuna.coordinator.BasicAction_40] - Abort called on already aborted atomic action 0:ffff032b3859:662:4d2a6694:4a
at org.infinispan.interceptors.InvocationContextInterceptor.visitLockControlCommand(InvocationContextInterceptor.java:63)
done rollback
tx status at the end : no transaction
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(aquiring lock on cache test-grid key locked-counter...
LockControlCommand.java:143)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:273)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:311)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:303)
at CacheScheduledCounter$CounterTask.run(CacheScheduledCounter.java:82)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [10 seconds] on key [locked-counter] for requestor [GlobalTransaction:<T00696119-38912>:20:local]! Lock held by [GlobalTransaction:<T00696119-38912>:13:local]
at org.infinispan.container.EntryFactoryImpl.acquireLock(EntryFactoryImpl.java:228)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:155)
at org.infinispan.container.EntryFactoryImpl.wrapEntryForWriting(EntryFactoryImpl.java:107)
at org.infinispan.interceptors.LockingInterceptor.lockKeysForLockCommand(LockingInterceptor.java:217)
at org.infinispan.interceptors.LockingInterceptor.visitLockControlCommand(LockingInterceptor.java:195)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132)
at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:152)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.TxInterceptor.enlistReadAndInvokeNext(TxInterceptor.java:169)
at org.infinispan.interceptors.TxInterceptor.visitLockControlCommand(TxInterceptor.java:125)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:87)
rollback...
done rollback
tx status at the end : no transaction
aquiring lock on cache test-grid key locked-counter...
at org.infinispan.interceptors.InvocationContextInterceptor.visitLockControlCommand(InvocationContextInterceptor.java:63)
at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:143)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:273)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:311)
at org.infinispan.CacheDelegate.lock(CacheDelegate.java:303)
at CacheScheduledCounter$CounterTask.run(CacheScheduledCounter.java:82)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
-
28. Re: failure situation using cluster-wide lock
manik Jan 11, 2011 11:48 AM (in response to yelin666)I just tried your test. Here is what I see:
- start one instance. It increments counter to 3 or 4
- start another instance. They take turns incrementing the counter. Occasional timeout exceptions (but this is expected). In general, the counter is still increasing on both sides.
- Kill one instance (I specifically kill it just after it prints that it has acquired the lock and is waiting 2 secs)
- The other instance still continues as per normal.
- Start the 2nd instance again. Goto 3.
I couldn't make it fail? I did set the networking to use 127.0.0.1 as a bind address and both instances were running on my laptop - although I don't think that would make much of a difference. Any other suggestions on making this fail? Did you have more than 2 instances of your program running?
Cheers
Manik
-
29. failure situation using cluster-wide lock
yelin666 Jan 11, 2011 11:58 AM (in response to manik)Hi Manik,
Thanks for the update. I am playing with it right now. Basically, what I got so far is that it doesn't fail with the DummyTransactionManager; However, I got the problem with the JBossStandalonJTAManager. Not sure if it's a TX Manager specific issue. So I am going to do a testing with the TX Manager we used in our container. I am in a whole week meeting at the moment, and will have limited time to test it, but I'll try my best to get back to you as soon as possible.
Cheers,
Lin