Infinispan gets locked and idle when a cluster is synchronizing
zeituni Dec 3, 2012 6:09 AMHi,
We are using Infinispan version 5.1.5 Final and experiencing an outage of the infinispan when a cluster is trying to synchronize, this is what we see in the logs (jbossUpload.log):
2012-11-29 19:33:28,148 ERROR [org.infinispan.cacheviews.CacheViewsManagerImpl] ISPN000172: Failed to prepare view CacheView{viewId=28, members=[BBTPNJDADO5-G-FN-SPMB-02-23310, BBTPNJDADO4-G--FN-SPMB-01-33526]} for cache SIM-CACHE, rolling back to view CacheView{viewId=27, members=[BBTPNJDADO5-G-FN-SPMB-02-23310]}
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
at java.util.concurrent.FutureTask.get(FutureTask.java:91)
at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:318)
at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:249)
at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:875)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2012-11-29 19:33:28,911 WARN [org.infinispan.cacheviews.CacheViewsManagerImpl] ISPN000168: Error rolling back to cache view 27 for cache SIM-CACHE
java.lang.IllegalArgumentException: Cannot rollback to view 27, we are at view 25
at org.infinispan.statetransfer.BaseStateTransferManagerImpl.rollbackView(BaseStateTransferManagerImpl.java:364)
at org.infinispan.cacheviews.CacheViewsManagerImpl.handleRollbackView(CacheViewsManagerImpl.java:538)
at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterRollbackView(CacheViewsManagerImpl.java:354)
at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:274)
at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:875)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2012-11-29 19:33:39,896 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] ISPN000136: Execution error
org.infinispan.statetransfer.StateTransferInProgressException: Timed out waiting for the state transfer lock, state transfer in progress for view 24
at org.infinispan.interceptors.StateTransferLockInterceptor.signalStateTransferInProgress(StateTransferLockInterceptor.java:201)
at org.infinispan.interceptors.StateTransferLockInterceptor.handleWriteCommand(StateTransferLockInterceptor.java:177)
at org.infinispan.interceptors.StateTransferLockInterceptor.visitPutKeyValueCommand(StateTransferLockInterceptor.java:152)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:132)
at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:91)
at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:62)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)
at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1006)
at org.infinispan.CacheImpl.put(CacheImpl.java:702)
at org.infinispan.CacheImpl.put(CacheImpl.java:694)
at org.infinispan.CacheSupport.put(CacheSupport.java:53)
at com.flash.storage.service.cache.DistributedCacheStorage.setIdentity(DistributedCacheStorage.java:81)
at com.flash.storage.service.cache.dao.IPToMSISDNAccessStorage.setIdentity(IPToMSISDNAccessStorage.java:76)
at com.flash.storage.server.handlers.GetSessionInformationBaseHandler.addHPIDataToSIM(GetSessionInformationBaseHandler.java:34)
at com.flash.storage.server.handlers.GetNATSessionInformationProtocolHandler.getSessionInformation(GetNATSessionInformationProtocolHandler.java:63)
at com.flash.storage.server.handlers.TcpProtocolHandler.handleRequest(TcpProtocolHandler.java:69)
at com.flash.externalInterface.impls.BaseProtocolHandler.messageReceived(BaseProtocolHandler.java:98)
at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.messageReceived(AbstractIoFilterChain.java:570)
at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:299)
at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:53)
at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:648)
at org.apache.mina.filter.executor.ExecutorFilter.processEvent(ExecutorFilter.java:220)
at org.apache.mina.filter.executor.ExecutorFilter$ProcessEventsRunnable.run(ExecutorFilter.java:264)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
We have 2 members in the cluster on 2 different servers. Our Infinispan cache handles ~2000 TPS: ~1000 get requests, and ~1000 put / remove requests. This problem usually occurs when the cache size is more than 500,000 records - and the average size is ~2 million records.
This is how our jgroups tcp.xml looks like:
<config xmlns="urn:org:jgroups"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.0.xsd">
<TCP bind_port="20000"
bind_addr="${bind.address}"
loopback="false"
discard_incompatible_packets="true"
use_send_queues="true"
sock_conn_timeout="300"
conn_expire_time="1800000"
timer_type="new"
timer.min_threads="4"
timer.max_threads="30"
timer.keep_alive_time="3000"
timer.queue_max_size="500"
thread_pool.enabled="true"
thread_pool.min_threads="1"
thread_pool.max_threads="30"
thread_pool.keep_alive_time="5000"
thread_pool.queue_enabled="false"
thread_pool.queue_max_size="100"
thread_pool.rejection_policy="discard"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="1"
oob_thread_pool.max_threads="8"
oob_thread_pool.keep_alive_time="5000"
oob_thread_pool.queue_enabled="false"
oob_thread_pool.queue_max_size="100"
oob_thread_pool.rejection_policy="discard"/>
<TCPPING timeout="2000"
initial_hosts="${active.storage.bind.address}[20000],${passive.storage.bind.address}[20000]"
port_range="2"
num_initial_members="3"/>
<MERGE2 min_interval="10000"
max_interval="20000"/>
<FD_SOCK/>
<FD timeout="3000" max_tries="3" />
<VERIFY_SUSPECT timeout="1500" />
<BARRIER />
<pbcast.NAKACK use_mcast_xmit="false"
exponential_backoff="500"
discard_delivered_msgs="true"/>
<UNICAST />
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="4M"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true"/>
<UFC max_credits="2M"
min_threshold="0.4"/>
<MFC max_credits="2M"
min_threshold="0.4"/>
<FRAG2 frag_size="8192" />
<pbcast.STATE_TRANSFER/>
</config>
This is how our infinispan is configured via code:
Configuration configuration = new ConfigurationBuilder().clustering().cacheMode("REPL_ASYNC").async()
.replQueueInterval(3000).replQueueMaxElements(50).useReplQueue(false)
.stateTransfer().fetchInMemoryState(true).timeout(10000).eviction().strategy(EvictionStrategy.LIRS)
.maxEntries(5000000).expiration().lifespan(-1).maxIdle(259200000).wakeUpInterval(4320000).build();
We have upgraded from the former jboss-cache to infinispan so we will be able to handle more load.
Whar are we missing here?
Thanks,
Uri Zeituni