    Upgrade from 6.0.2 to 9.2.4 - issue after split brain recovery

    Prakash Kolandaivelu Newbie

      I upgraded Infinispan from 6.0.2.Final to 9.2.4.Final. Everything seems to work when the cluster starts, but after recovering from a split brain the exception below is thrown and the cluster stops working. No clustering-related application code was changed as part of the upgrade.
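
      For context, the cache manager is bootstrapped roughly as follows. This is a minimal sketch; the configuration file name and the key/value types are assumptions, not the actual application code:

          // Minimal bootstrap sketch (file name and types are assumptions).
          import org.infinispan.Cache;
          import org.infinispan.manager.DefaultCacheManager;

          public class ClusterBootstrap {
              public static void main(String[] args) throws Exception {
                  // The bootstrap API is the same before and after the upgrade;
                  // only the XML configuration format differs between 6.x and 9.x.
                  DefaultCacheManager manager =
                          new DefaultCacheManager("configuration/infinispan.xml");
                  try {
                      Cache<String, String> cache = manager.getCache("transactional-type");
                      cache.put("admin", "some-value");
                  } finally {
                      manager.stop();
                  }
              }
          }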

       

      Infinispan 6 config.

      -------------------

          <infinispan>
              <global>
                  <transport>
                      <properties>
                          <property name="configurationFile" value="configuration/jgroups.xml"/>
                      </properties>
                  </transport>
                  <!-- Enable JMX statistics -->
                  <globalJmxStatistics
                      enabled="true"
                      jmxDomain="org.infinispan"
                      cacheManagerName="CacheManager"/>
              </global>
              <default>
                  <!-- Configure a synchronous replication cache -->
                  <clustering mode="replication">
                      <sync/>
                  </clustering>
                  <!-- Used to register JMX statistics in any available MBean server -->
                  <jmxStatistics enabled="true"/>
              </default>
              <namedCache name="transactional-type">
                  <transaction
                      transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"
                      syncRollbackPhase="true"
                      syncCommitPhase="true"
                      cacheStopTimeout="30000"
                      use1PcForAutoCommitTransactions="true"
                      autoCommit="true"
                      lockingMode="OPTIMISTIC"
                      useSynchronization="true"
                      transactionMode="TRANSACTIONAL"/>
              </namedCache>
          </infinispan>

       

      Infinispan 9 config.

      -----------------------

          <?xml version="1.0" encoding="UTF-8"?>
          <infinispan>
              <jgroups>
                  <stack-file name="tcp" path="configuration/jgroups.xml"/>
              </jgroups>
              <cache-container default-cache="default">
                  <transport stack="tcp" cluster="cluster" channel="other"/>
                  <replicated-cache name="default" statistics="true" mode="ASYNC" remote-timeout="20000">
                      <locking isolation="READ_COMMITTED" striping="false" acquire-timeout="180000"
                               write-skew="true"/>
                      <versioning scheme="SIMPLE"/>
                  </replicated-cache>
                  <replicated-cache name="transactional-type" statistics="true" mode="SYNC" remote-timeout="20000">
                      <locking isolation="READ_COMMITTED" striping="false" acquire-timeout="180000"
                               write-skew="true"/>
                      <transaction
                          locking="OPTIMISTIC"
                          auto-commit="true"
                          complete-timeout="60000"
                          mode="FULL_XA"
                          notifications="true"
                          stop-timeout="30000"
                          transaction-manager-lookup="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"/>
                  </replicated-cache>
              </cache-container>
          </infinispan>
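
      For anyone configuring Infinispan 9.x programmatically rather than via XML, the transactional-type cache above corresponds roughly to the builder calls below. This is only a sketch: the wrapper class name is illustrative, and the write-skew and versioning settings are omitted for brevity.

          // Rough programmatic equivalent of the "transactional-type" cache above
          // (Infinispan 9.2.x builder API; wrapper class name is illustrative).
          import org.infinispan.configuration.cache.CacheMode;
          import org.infinispan.configuration.cache.Configuration;
          import org.infinispan.configuration.cache.ConfigurationBuilder;
          import org.infinispan.transaction.LockingMode;
          import org.infinispan.transaction.TransactionMode;
          import org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup;
          import org.infinispan.util.concurrent.IsolationLevel;

          public class TransactionalCacheConfig {
              public static Configuration build() {
                  return new ConfigurationBuilder()
                      // <replicated-cache mode="SYNC" remote-timeout="20000">
                      .clustering().cacheMode(CacheMode.REPL_SYNC).remoteTimeout(20000)
                      // <locking isolation="READ_COMMITTED" striping="false" acquire-timeout="180000"/>
                      .locking().isolationLevel(IsolationLevel.READ_COMMITTED)
                                .useLockStriping(false)
                                .lockAcquisitionTimeout(180000)
                      // <transaction mode="FULL_XA" .../>, i.e. XA with recovery enabled
                      .transaction().transactionMode(TransactionMode.TRANSACTIONAL)
                                    .lockingMode(LockingMode.OPTIMISTIC)
                                    .autoCommit(true)
                                    .useSynchronization(false)
                                    .transactionManagerLookup(new JBossStandaloneJTAManagerLookup())
                                    .recovery().enable()
                      .build();
              }
          }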

       

      jgroups.xml after the upgrade.

      -----------------------------

          <config xmlns="urn:org:jgroups"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.0.xsd">
              <TCP bind_addr="${jgroups.tcp.address}"
                   bind_port="12000" port_range="10"
                   recv_buf_size="20000000"
                   send_buf_size="640000"
                   bundler_type="no-bundler"
                   enable_diagnostics="true"
                   thread_naming_pattern="cl"
                   thread_pool.min_threads="0"
                   thread_pool.max_threads="8"
                   thread_pool.keep_alive_time="5000"/>
              <TCPGOSSIP initial_hosts="${jgroups.tcpgossip.initial_hosts}"/>
              <MERGE3/>
              <FD timeout="3000" max_tries="3"/>
              <VERIFY_SUSPECT timeout="500"/>
              <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="true"/>
              <UNICAST3/>
              <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="10m"/>
              <pbcast.GMS print_local_addr="true" join_timeout="${jgroups.join_timeout:5000}"
                          max_bundling_time="30"
                          view_bundling="true"/>
              <MFC max_credits="2M" min_threshold="0.4"/>
              <FRAG3/>
          </config>

       

      Everything seems to work, including the merge after split-brain recovery, but the exception below is sometimes thrown once the partitions rejoin. It does not happen consistently; it occurs roughly 1 time out of 4.

       

          2018-06-12 16:15:21.748 IST [jgroups-302,ndb-cluster,ndbDevBox-39617] ERROR org.infinispan.interceptors.impl.InvocationContextInterceptor - ISPN000136: Error executing command PrepareCommand, writing keys [admin]
          org.infinispan.remoting.RemoteException: ISPN000217: Received exception from DevBox-43633, see cause for remote stack trace
              at org.infinispan.remoting.transport.ResponseCollectors.wrapRemoteException(ResponseCollectors.java:27) ~[infinispan-core-9.2.4.Final.jar:9.2.4.Final]
              at org.infinispan.remoting.transport.impl.MapResponseCollector.addException(MapResponseCollector.java:65) ~[infinispan-core-9.2.4.Final.jar:9.2.4.Final]
              at org.infinispan.remoting.transport.impl.MapResponseCollector$IgnoreLeavers.addException(MapResponseCollector.java:103) ~[infinispan-core-9.2.4.Final.jar:9.2.4.Final]
              at org.infinispan.remoting.transport.ValidResponseCollector.addResponse(ValidResponseCollector.java:29) ~[infinispan-core-9.2.4.Final.jar:9.2.4.Final]
              at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:91) ~[infinispan-core-9.2.4.Final.jar:9.2.4.Final]
              at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:52) ~[na:na]
              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1318) [infinispan-core-9.2.4.Final.jar:9.2.4.Final]
              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1221) [infinispan-core-9.2.4.Final.jar:9.2.4.Final]
              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) [infinispan-core-9.2.4.Final.jar:9.2.4.Final]
              at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1356) [infinispan-core-9.2.4.Final.jar:9.2.4.Final]
              at org.jgroups.JChannel.up(JChannel.java:819) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.FD.up(FD.java:212) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.Discovery.up(Discovery.java:267) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) [jgroups-4.0.11.Final.jar:4.0.11.Final]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
              at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
          Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 10 seconds for key admin and requestor RecoveryAwareGlobalTransaction{xid=Xid{formatId=131077, globalTransactionId=00000000000000000000FFFF7F00010100008DB35B1F966000000D0E31,branchQualifier=00000000000000000000FFFF7F00010100008DB35B1F966000000D0F0000000000000000}, internalId=1688854155231238} GlobalTx:DevBox-39617:837. Lock is held by RecoveryAwareGlobalTransaction{xid=Xid{formatId=131077, globalTransactionId=00000000000000000000FFFF7F0001010000BA065B1F967200000FCE31,branchQualifier=00000000000000000000FFFF7F0001010000BA065B1F967200000FCF0000000000000000}, internalId=1688854155231236} GlobalTx:DevBox-43633:1012
              at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:288) ~[na:na]
              at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:218) ~[na:na]
              at org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:436) ~[na:na]
              at org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.toInvocationStage(InfinispanLock.java:408) ~[na:na]
              at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:248) ~[na:na]
              at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:272) ~[na:na]
              at org.infinispan.interceptors.locking.OptimisticLockingInterceptor.visitPrepareCommand(OptimisticLockingInterceptor.java:53) ~[na:na]
              at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:185) ~[na:na]
              at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:183) ~[na:na]
              at org.infinispan.interceptors.impl.TxInterceptor.handlePrepareCommand(TxInterceptor.java:137) ~[na:na]
              at org.infinispan.interceptors.impl.TxInterceptor.visitPrepareCommand(TxInterceptor.java:126) ~[na:na]
              at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:185) ~[na:na]
              at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[na:na]
              at org.infinispan.interceptors.BaseAsyncInterceptor.lambda$new$0(BaseAsyncInterceptor.java:22) ~[na:na]
              at org.infinispan.interceptors.InvocationSuccessFunction.apply(InvocationSuccessFunction.java:21) ~[na:na]
              at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.addCallback(SimpleAsyncInvocationStage.java:69) ~[na:na]
              at org.infinispan.interceptors.InvocationStage.thenApply(InvocationStage.java:24) ~[na:na]
              at org.infinispan.interceptors.BaseAsyncInterceptor.asyncInvokeNext(BaseAsyncInterceptor.java:224) ~[na:na]
              at org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitCommand(TransactionSynchronizerInterceptor.java:46) ~[na:na]
              at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:185) ~[na:na]
              at org.infinispan.statetransfer.StateTransferInterceptor.handleTxCommand(StateTransferInterceptor.java:203) ~[na:na]
              at org.infinispan.statetransfer.StateTransferInterceptor.visitPrepareCommand(StateTransferInterceptor.java:69) ~[na:na]
              at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:185) ~[na:na]
              at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[na:na]
              at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) ~[na:na]
              at org.infinispan.interceptors.DDAsyncInterceptor.visitPrepareCommand(DDAsyncInterceptor.java:132) ~[na:na]
              at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:185) ~[na:na]
              at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123) ~[na:na]
              at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90) ~[na:na]
              at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56) ~[na:na]
              at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) ~[na:na]
              at org.infinispan.interceptors.DDAsyncInterceptor.visitPrepareCommand(DDAsyncInterceptor.java:132) ~[na:na]
              at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:185) ~[na:na]
              at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50) ~[na:na]
              at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234) ~[na:na]
              at org.infinispan.commands.tx.PrepareCommand.invokeAsync(PrepareCommand.java:109) ~[na:na]
              at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) ~[na:na]
              at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99) ~[na:na]
              at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71) ~[na:na]
              at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40) ~[na:na]
              ... 3 common frames omitted

       

      Cluster members can update the same key concurrently.

      I need help fixing this exception. I am not sure whether it has been discussed in another thread; I could not find anything, so I am posting it here.

      Is a code change required in the way the cache is updated? Is this a configuration mistake, or does something else need to be corrected?

        • 1. Re: Upgrade from 6.0.2 to 9.2.4 - issue after split brain recovery
          Dan Berindei Expert

          I'm not 100% sure, but I think it's a combination of a rather small JGroups thread pool and Infinispan 9.2.4 performing conflict resolution on merge by default. BTW, we've reverted that in 9.3.0.Beta1, so you now have to request conflict resolution explicitly; we should probably backport ISPN-9103 to 9.2.x.

           

          I suggest disabling conflict resolution with <partition-handling merge-policy="NONE"/> first. If you still have problems, increase thread_pool.max_threads (e.g. to 32). If problems persist after that, please collect some thread dumps and TRACE (or at least DEBUG) logs for org.infinispan so we can investigate further.
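
          For reference, the merge-policy change goes inside each <replicated-cache> element in the Infinispan XML above, and the thread pool setting goes on the <TCP> element in jgroups.xml. A rough programmatic equivalent of the merge-policy change on 9.2.x is sketched below; the wrapper class and method names are illustrative, not Infinispan API:

              // Sketch: programmatic equivalent of <partition-handling merge-policy="NONE"/>,
              // i.e. skip conflict resolution when partitions merge (Infinispan 9.2.x).
              import org.infinispan.conflict.MergePolicy;
              import org.infinispan.configuration.cache.CacheMode;
              import org.infinispan.configuration.cache.Configuration;
              import org.infinispan.configuration.cache.ConfigurationBuilder;

              public class NoMergeConflictResolution {
                  public static Configuration replicatedCacheWithoutConflictResolution() {
                      return new ConfigurationBuilder()
                          .clustering().cacheMode(CacheMode.REPL_SYNC)
                          // Do not run the conflict-resolution phase after a merge.
                          .partitionHandling().mergePolicy(MergePolicy.NONE)
                          .build();
                  }
              }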