Upgrade from 6.0.2 to 9.2.4 - issue after split brain recovery
pkolanda Jun 12, 2018 11:39 AM
Upgraded Infinispan from 6.0.2.Final to 9.2.4.Final. Everything seems to work when the cluster is started, but after recovering from a split brain the exception below is thrown and the cluster stops working. No code changes were made to the clustering logic after the upgrade.
Infinispan 6 config.
-------------------
<infinispan>
   <global>
      <transport>
         <properties>
            <property name="configurationFile" value="configuration/jgroups.xml"/>
         </properties>
      </transport>
      <!-- Enable JMX statistics -->
      <globalJmxStatistics
         enabled="true"
         jmxDomain="org.infinispan"
         cacheManagerName="CacheManager"/>
   </global>
   <default>
      <!-- Configure a synchronous replication cache -->
      <clustering mode="replication">
         <sync/>
      </clustering>
      <!-- Used to register JMX statistics in any available MBean server -->
      <jmxStatistics enabled="true"/>
   </default>
   <!-- transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup" -->
   <namedCache name="transactional-type">
      <transaction
         transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"
         syncRollbackPhase="true"
         syncCommitPhase="true"
         cacheStopTimeout="30000"
         use1PcForAutoCommitTransactions="true"
         autoCommit="true"
         lockingMode="OPTIMISTIC"
         useSynchronization="true"
         transactionMode="TRANSACTIONAL"/>
   </namedCache>
</infinispan>
Infinispan 9 config.
-----------------------
<?xml version="1.0" encoding="UTF-8"?>
<infinispan>
   <jgroups>
      <stack-file name="tcp" path="configuration/jgroups.xml"/>
   </jgroups>
   <cache-container default-cache="default">
      <transport stack="tcp" cluster="cluster" channel="other"/>
      <replicated-cache name="default" statistics="true" mode="ASYNC" remote-timeout="20000">
         <locking isolation="READ_COMMITTED" striping="false" acquire-timeout="180000" write-skew="true"/>
         <versioning scheme="SIMPLE"/>
      </replicated-cache>
      <replicated-cache name="transactional-type" statistics="true" mode="SYNC" remote-timeout="20000">
         <locking isolation="READ_COMMITTED" striping="false" acquire-timeout="180000" write-skew="true"/>
         <transaction
            locking="OPTIMISTIC"
            auto-commit="true"
            complete-timeout="60000"
            mode="FULL_XA"
            notifications="true"
            stop-timeout="30000"
            transaction-manager-lookup="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"/>
      </replicated-cache>
   </cache-container>
</infinispan>
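For context, the caches are obtained from this configuration roughly as follows (a minimal sketch assuming the standard embedded API; the class name and config file path are illustrative, the actual application code is not shown here):

import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class CacheBootstrap {
    public static void main(String[] args) throws Exception {
        // Assumption: the XML above is available as configuration/infinispan.xml
        EmbeddedCacheManager cacheManager = new DefaultCacheManager("configuration/infinispan.xml");
        // "transactional-type" is the FULL_XA replicated cache defined above
        Cache<String, Object> cache = cacheManager.getCache("transactional-type");
        System.out.println("Cluster members: " + cacheManager.getMembers());
    }
}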
jgroups.xml after the upgrade.
-----------------------------
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.0.xsd">
   <TCP bind_addr="${jgroups.tcp.address}"
        bind_port="12000" port_range="10"
        recv_buf_size="20000000"
        send_buf_size="640000"
        bundler_type="no-bundler"
        enable_diagnostics="true"
        thread_naming_pattern="cl"
        thread_pool.min_threads="0"
        thread_pool.max_threads="8"
        thread_pool.keep_alive_time="5000"/>
   <TCPGOSSIP initial_hosts="${jgroups.tcpgossip.initial_hosts}"/>
   <MERGE3/>
   <FD timeout="3000" max_tries="3"/>
   <VERIFY_SUSPECT timeout="500"/>
   <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="true"/>
   <UNICAST3/>
   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="10m"/>
   <pbcast.GMS print_local_addr="true" join_timeout="${jgroups.join_timeout:5000}"
               max_bundling_time="30"
               view_bundling="true"/>
   <MFC max_credits="2M" min_threshold="0.4"/>
   <FRAG3/>
</config>
Everything seems to work, including the merge after split brain recovery, but the exception below is sometimes thrown after the merge. It does not happen consistently; roughly 1 out of 4 times.
2018-06-12 16:15:21.748 IST [jgroups-302,ndb-cluster,ndbDevBox-39617] ERROR org.infinispan.interceptors.impl.InvocationContextInterceptor - ISPN000136: Error executing command PrepareCommand, writing keys [admin]
org.infinispan.remoting.RemoteException: ISPN000217: Received exception from DevBox-43633, see cause for remote stack trace
at org.infinispan.remoting.transport.ResponseCollectors.wrapRemoteException(ResponseCollectors.java:27) ~[infinispan-core-9.2.4.Final.jar:9.2.4.Final]
at org.infinispan.remoting.transport.impl.MapResponseCollector.addException(MapResponseCollector.java:65) ~[infinispan-core-9.2.4.Final.jar:9.2.4.Final]
at org.infinispan.remoting.transport.impl.MapResponseCollector$IgnoreLeavers.addException(MapResponseCollector.java:103) ~[infinispan-core-9.2.4.Final.jar:9.2.4.Final]
at org.infinispan.remoting.transport.ValidResponseCollector.addResponse(ValidResponseCollector.java:29) ~[infinispan-core-9.2.4.Final.jar:9.2.4.Final]
at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:91) ~[infinispan-core-9.2.4.Final.jar:9.2.4.Final]
at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:52) ~[na:na]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1318) [infinispan-core-9.2.4.Final.jar:9.2.4.Final]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1221) [infinispan-core-9.2.4.Final.jar:9.2.4.Final]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:123) [infinispan-core-9.2.4.Final.jar:9.2.4.Final]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1356) [infinispan-core-9.2.4.Final.jar:9.2.4.Final]
at org.jgroups.JChannel.up(JChannel.java:819) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:893) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.FRAG3.up(FRAG3.java:171) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.FlowControl.up(FlowControl.java:343) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.FD.up(FD.java:212) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.MERGE3.up(MERGE3.java:276) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.Discovery.up(Discovery.java:267) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.protocols.TP.passMessageUp(TP.java:1248) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87) [jgroups-4.0.11.Final.jar:4.0.11.Final]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 10 seconds for key admin and requestor RecoveryAwareGlobalTransaction{xid=Xid{formatId=131077, globalTransactionId=00000000000000000000FFFF7F00010100008DB35B1F966000000D0E31,branchQualifier=00000000000000000000FFFF7F00010100008DB35B1F966000000D0F0000000000000000}, internalId=1688854155231238} GlobalTx:DevBox-39617:837. Lock is held by RecoveryAwareGlobalTransaction{xid=Xid{formatId=131077, globalTransactionId=00000000000000000000FFFF7F0001010000BA065B1F967200000FCE31,branchQualifier=00000000000000000000FFFF7F0001010000BA065B1F967200000FCF0000000000000000}, internalId=1688854155231236} GlobalTx:DevBox-43633:1012
at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:288) ~[na:na]
at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:218) ~[na:na]
at org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:436) ~[na:na]
at org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.toInvocationStage(InfinispanLock.java:408) ~[na:na]
at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:248) ~[na:na]
at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:272) ~[na:na]
at org.infinispan.interceptors.locking.OptimisticLockingInterceptor.visitPrepareCommand(OptimisticLockingInterceptor.java:53) ~[na:na]
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:185) ~[na:na]
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:183) ~[na:na]
at org.infinispan.interceptors.impl.TxInterceptor.handlePrepareCommand(TxInterceptor.java:137) ~[na:na]
at org.infinispan.interceptors.impl.TxInterceptor.visitPrepareCommand(TxInterceptor.java:126) ~[na:na]
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:185) ~[na:na]
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[na:na]
at org.infinispan.interceptors.BaseAsyncInterceptor.lambda$new$0(BaseAsyncInterceptor.java:22) ~[na:na]
at org.infinispan.interceptors.InvocationSuccessFunction.apply(InvocationSuccessFunction.java:21) ~[na:na]
at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.addCallback(SimpleAsyncInvocationStage.java:69) ~[na:na]
at org.infinispan.interceptors.InvocationStage.thenApply(InvocationStage.java:24) ~[na:na]
at org.infinispan.interceptors.BaseAsyncInterceptor.asyncInvokeNext(BaseAsyncInterceptor.java:224) ~[na:na]
at org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitCommand(TransactionSynchronizerInterceptor.java:46) ~[na:na]
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:185) ~[na:na]
at org.infinispan.statetransfer.StateTransferInterceptor.handleTxCommand(StateTransferInterceptor.java:203) ~[na:na]
at org.infinispan.statetransfer.StateTransferInterceptor.visitPrepareCommand(StateTransferInterceptor.java:69) ~[na:na]
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:185) ~[na:na]
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) ~[na:na]
at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) ~[na:na]
at org.infinispan.interceptors.DDAsyncInterceptor.visitPrepareCommand(DDAsyncInterceptor.java:132) ~[na:na]
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:185) ~[na:na]
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123) ~[na:na]
at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90) ~[na:na]
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56) ~[na:na]
at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) ~[na:na]
at org.infinispan.interceptors.DDAsyncInterceptor.visitPrepareCommand(DDAsyncInterceptor.java:132) ~[na:na]
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:185) ~[na:na]
at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50) ~[na:na]
at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234) ~[na:na]
at org.infinispan.commands.tx.PrepareCommand.invokeAsync(PrepareCommand.java:109) ~[na:na]
at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) ~[na:na]
at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99) ~[na:na]
at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71) ~[na:na]
at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40) ~[na:na]
... 3 common frames omitted
Multiple cluster members can update the same key; the write pattern is roughly as sketched below.
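This is a minimal sketch of that pattern, assuming each node writes through a JTA transaction against the FULL_XA "transactional-type" cache (the class and method names, and the value, are illustrative, not the actual application code):

import javax.transaction.TransactionManager;
import org.infinispan.Cache;

public class AdminUpdater {
    // Hypothetical helper illustrating concurrent updates of the same key ("admin")
    static void updateAdmin(Cache<String, Object> cache, Object newValue) throws Exception {
        // With FULL_XA and OPTIMISTIC locking, the lock on "admin" is acquired at prepare time
        TransactionManager tm = cache.getAdvancedCache().getTransactionManager();
        tm.begin();
        try {
            cache.put("admin", newValue);   // several nodes can write this key concurrently
            tm.commit();                    // the remote PrepareCommand may time out waiting for the lock
        } catch (Exception e) {
            tm.rollback();
            throw e;
        }
    }
}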
I need help fixing this exception. I am not sure whether it has been discussed in another thread; I could not find anything, so I am posting it here.
Is a code change required in the way the cache is updated?
Is this a configuration mistake, or does something else need to be corrected?