5 Replies Latest reply on Sep 25, 2017 11:16 AM by rvansa

    Could not commit implicit transaction & RpcException

    stehlik.michal

      Hi All,

       

      we have system deployed on customer site. Sometimes, based on unknown conditions, happened that we have exception in our workflow which forbid us to continue some critical operation. In our logs, we have following exception:

      org.infinispan.commons.CacheException: Could not commit implicit transaction
      at org.infinispan.cache.impl.CacheImpl.tryCommit(CacheImpl.java:1738)
      at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1692)
      at org.infinispan.cache.impl.CacheImpl.putInternal(CacheImpl.java:1134)
      at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1124)
      at org.infinispan.cache.impl.DecoratedCache.put(DecoratedCache.java:453)
      

       

      Cache over which this exception happened has following settings:

      <distributed-cache name="metadata" owners="5" mode="SYNC" deadlock-detection-spin="1000">
              <store-as-binary keys="false" values="false"/>
              <transaction mode="NON_DURABLE_XA" 
              locking="PESSIMISTIC" 
              transaction-manager-lookup="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup" 
              stop-timeout="30000" />
              <locking isolation="READ_COMMITTED"
              acquire-timeout="60000"
              striping="false"
              concurrency-level="1000"/>
              <state-transfer enabled="true" timeout="2000000" />
      </distributed-cache>
      

       

      Infinispan version where this sometimes occurred is 8.2.6. We have also infinispan logs in separate file, there I can see detailed reason, why this issue occurred. Reason is following:

      (InvocationContextInterceptor.java:136) - ISPN000136: Error executing command PrepareCommand, writing keys [31cc4536-49d7-4cb9-acbc-4ee0ae22374e:light]
      org.infinispan.util.concurrent.TimeoutException: Replication timeout
      at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:639) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at java.util.concurrent.CompletableFuture.uniApply(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.CompletableFuture.complete(Unknown Source) ~[?:1.8.0_71]
      at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:61) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:16) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_71]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_71]
      at java.lang.Thread.run(Unknown Source) [?:1.8.0_71]
      Suppressed: org.infinispan.remoting.RpcException: Not accepted: null
      at org.infinispan.remoting.transport.jgroups.JGroupsTransport.addSuppressedExceptions(JGroupsTransport.java:694) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:639) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at java.util.concurrent.CompletableFuture.uniApply(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.CompletableFuture.complete(Unknown Source) ~[?:1.8.0_71]
      at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:61) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:16) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_71]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_71]
      at java.lang.Thread.run(Unknown Source) [?:1.8.0_71]
      Suppressed: org.infinispan.remoting.RpcException: Not accepted: null
      at org.infinispan.remoting.transport.jgroups.JGroupsTransport.addSuppressedExceptions(JGroupsTransport.java:694) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:639) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at java.util.concurrent.CompletableFuture.uniApply(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.CompletableFuture.complete(Unknown Source) ~[?:1.8.0_71]
      at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:61) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:16) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_71]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_71]
      at java.lang.Thread.run(Unknown Source) [?:1.8.0_71]
      Suppressed: org.infinispan.remoting.RpcException: Not accepted: null
      at org.infinispan.remoting.transport.jgroups.JGroupsTransport.addSuppressedExceptions(JGroupsTransport.java:694) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:639) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at java.util.concurrent.CompletableFuture.uniApply(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.CompletableFuture.complete(Unknown Source) ~[?:1.8.0_71]
      at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:61) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:16) ~[infinispan-embedded-8.2.6.Final.jar:8.2.6.Final]
      at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:1.8.0_71]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_71]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_71]
      at java.lang.Thread.run(Unknown Source) [?:1.8.0_71]
      

       

      I don't understand why this happened as well as conditions when this happened is not clear to me, since this is deep in Infinispan & Jgroups implementation. How to avoid such state? Will it helps if we create explicit transaction and commit it? I think, main reason why this happened is RpcException: Not accepted: null, am I right?: t accepted: null

        • 1. Re: Could not commit implicit transaction & RpcException
          rvansa

          All the stack trace says is that the response was not received within timeout, while 3 nodes have responded (I admit that "Not accepted: null" is not the best indicator of that). Since you have 5 owners, I guess that one node is the local one (sending the RPC) and therefore we're missing one response.

           

          About the reasons: one is GC pause, then there may be some delay during topology change (was the cluster stable at that moment?) and last but not least is a bug. Have you checked logs on the other nodes?

          1 of 1 people found this helpful
          • 2. Re: Could not commit implicit transaction & RpcException
            stehlik.michal

            Unfortunately I can not get logs from remaining nodes. Currently, it's difficult to reproduce this issue on our environments.

            To possible reasons i have following comments:

            • GC pause: this is possible in our case. How can we increase timeout values?
            • delay during topology change: before exception occurred, topology was changed (added new node), probably, to avoid such exception will helps increase timeouts.
            • bug: since issue is really hard to reproduce and we have only logs from one node, this we can not (dis)approve.

             

            To me, currently it seems, that adjust timeouts for transaction can help. Can you give me advise, how to set this up (configuration or programatically)?

            • 3. Re: Could not commit implicit transaction & RpcException
              rvansa

              In XML configuration, there's "timeout" attribute on "distributed-cache"/"replicated-cache" element; programmatically you'll find it in configuration.clustering().remoteTimeout(...). This setting handles timeouts for most RPCs.

               

              I strongly suggest logging garbage collection info (as much as you can, it has very low impact) in production.

              • 4. Re: Could not commit implicit transaction & RpcException
                stehlik.michal

                We examine Infinispan logs deeply and we found something maybe interesting... 1st I would like to explain some workflow to understand our need.

                Our nodes start with some Infinispan configuration, Jgroups configuration & own configuration. Infinispan is used for distributed data, Jgroups for node communication. As soon, as Jgroups notice topology change, this node will set NTP (network time protocol) configuration to some server, because it's vital for us to keep time in sync. In logs we found, that some synchronization of time happened around time when mention exception occurred. It seems, time shift was around 1 sec.

                • How this kind of time shifts affects putting data into caches?
                • Can RPC call timeouts be affected by time shifts?

                 

                This are interesting log entries from our logs:

                2017-09-11 23:27:20,190 DEBUG  [transport-thread-22275785-64f3-41ff-81c2-e92af1cc2e98-p4-t2] (LocalTopologyManagerImpl.java:344) - Updating local topology for cache systemPermissions: CacheTopology{id=14, rebalanceId=8, currentCH=ReplicatedConsistentHash{ns = 256, owners = (7)[661185d5-4f6e-424e-b089-f653aad2b9e4-35366: 39, 22275785-64f3-41ff-81c2-e92af1cc2e98-36575: 38, 178b3a71-3c7a-4325-b179-8ffd30457663-61588: 33, 45b82e9b-5f65-4615-98f0-b67fac9968cb-27609: 39, c9ac983c-c663-43a4-9d2c-7ea9d06db100-17177: 39, 5a009b18-610b-4633-8498-1fb7c867cb7b-10733: 33, d69416d7-4488-4f77-baa2-ad09f2747c95-18595: 35]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (8)[661185d5-4f6e-424e-b089-f653aad2b9e4-35366: 34, 22275785-64f3-41ff-81c2-e92af1cc2e98-36575: 34, 178b3a71-3c7a-4325-b179-8ffd30457663-61588: 29, 45b82e9b-5f65-4615-98f0-b67fac9968cb-27609: 32, c9ac983c-c663-43a4-9d2c-7ea9d06db100-17177: 34, 5a009b18-610b-4633-8498-1fb7c867cb7b-10733: 29, d69416d7-4488-4f77-baa2-ad09f2747c95-18595: 32, e2789e17-398f-40d1-965a-45d69dbc56d7-4039: 32]}, unionCH=null, actualMembers=[661185d5-4f6e-424e-b089-f653aad2b9e4-35366, 22275785-64f3-41ff-81c2-e92af1cc2e98-36575, 178b3a71-3c7a-4325-b179-8ffd30457663-61588, 45b82e9b-5f65-4615-98f0-b67fac9968cb-27609, c9ac983c-c663-43a4-9d2c-7ea9d06db100-17177, 5a009b18-610b-4633-8498-1fb7c867cb7b-10733, d69416d7-4488-4f77-baa2-ad09f2747c95-18595, e2789e17-398f-40d1-965a-45d69dbc56d7-4039]}
                2017-09-11 23:27:20,195 DEBUG  [transport-thread-22275785-64f3-41ff-81c2-e92af1cc2e98-p4-t2] (LocalTopologyManagerImpl.java:473) - Starting local rebalance for cache systemPermissions, topology = CacheTopology{id=14, rebalanceId=8, currentCH=ReplicatedConsistentHash{ns = 256, owners = (7)[661185d5-4f6e-424e-b089-f653aad2b9e4-35366: 39, 22275785-64f3-41ff-81c2-e92af1cc2e98-36575: 38, 178b3a71-3c7a-4325-b179-8ffd30457663-61588: 33, 45b82e9b-5f65-4615-98f0-b67fac9968cb-27609: 39, c9ac983c-c663-43a4-9d2c-7ea9d06db100-17177: 39, 5a009b18-610b-4633-8498-1fb7c867cb7b-10733: 33, d69416d7-4488-4f77-baa2-ad09f2747c95-18595: 35]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (8)[661185d5-4f6e-424e-b089-f653aad2b9e4-35366: 34, 22275785-64f3-41ff-81c2-e92af1cc2e98-36575: 34, 178b3a71-3c7a-4325-b179-8ffd30457663-61588: 29, 45b82e9b-5f65-4615-98f0-b67fac9968cb-27609: 32, c9ac983c-c663-43a4-9d2c-7ea9d06db100-17177: 34, 5a009b18-610b-4633-8498-1fb7c867cb7b-10733: 29, d69416d7-4488-4f77-baa2-ad09f2747c95-18595: 32, e2789e17-398f-40d1-965a-45d69dbc56d7-4039: 32]}, unionCH=null, actualMembers=[661185d5-4f6e-424e-b089-f653aad2b9e4-35366, 22275785-64f3-41ff-81c2-e92af1cc2e98-36575, 178b3a71-3c7a-4325-b179-8ffd30457663-61588, 45b82e9b-5f65-4615-98f0-b67fac9968cb-27609, c9ac983c-c663-43a4-9d2c-7ea9d06db100-17177, 5a009b18-610b-4633-8498-1fb7c867cb7b-10733, d69416d7-4488-4f77-baa2-ad09f2747c95-18595, e2789e17-398f-40d1-965a-45d69dbc56d7-4039]}
                2017-09-11 23:27:19,892 ERROR  [Camel (camel-1) thread #5 - seda://omAccessor] (InvocationContextInterceptor.java:136) - ISPN000136: Error executing command PrepareCommand, writing keys [31cc4536-49d7-4cb9-acbc-4ee0ae22374e:light]
                org.infinispan.util.concurrent.TimeoutException: Replication timeout
                
                • 5. Re: Could not commit implicit transaction & RpcException
                  rvansa

                  From my experience a 'weird' order of messages in log may not be caused only by changing the system time but also by a heavy contention in logging itself.

                  The logic behind timeouts is quite simple, setting up a deadline using System.currentTimeMillis() + timeout, but I can't guarantee that a backwards time shift wouldn't break anything; we don't test this. When adjusting the time systems usually rather slow down the perceived wall clock time to keep the monotonicity than setting it instantly.