9 Replies Latest reply on Dec 10, 2012 12:51 PM by vblagojevic

    Infinispan gets locked and idle when a cluster is synchronizing

    zeituni

      Hi,

      We are using Infinispan version 5.1.5 Final and experiencing an outage of the infinispan when a cluster is trying to synchronize, this is what we see in the logs (jbossUpload.log):

       

      2012-11-29 19:33:28,148 ERROR [org.infinispan.cacheviews.CacheViewsManagerImpl] ISPN000172: Failed to prepare view CacheView{viewId=28, members=[BBTPNJDADO5-G-FN-SPMB-02-23310, BBTPNJDADO4-G--FN-SPMB-01-33526]} for cache  SIM-CACHE, rolling back to view CacheView{viewId=27, members=[BBTPNJDADO5-G-FN-SPMB-02-23310]}

      java.util.concurrent.TimeoutException

                at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)

                at java.util.concurrent.FutureTask.get(FutureTask.java:91)

                at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:318)

                at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:249)

                at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:875)

                at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

                at java.util.concurrent.FutureTask.run(FutureTask.java:138)

                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

                at java.lang.Thread.run(Thread.java:662)

      2012-11-29 19:33:28,911 WARN  [org.infinispan.cacheviews.CacheViewsManagerImpl] ISPN000168: Error rolling back to cache view 27 for cache SIM-CACHE

      java.lang.IllegalArgumentException: Cannot rollback to view 27, we are at view 25

                at org.infinispan.statetransfer.BaseStateTransferManagerImpl.rollbackView(BaseStateTransferManagerImpl.java:364)

                at org.infinispan.cacheviews.CacheViewsManagerImpl.handleRollbackView(CacheViewsManagerImpl.java:538)

                at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterRollbackView(CacheViewsManagerImpl.java:354)

                at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:274)

                at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:875)

                at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

                at java.util.concurrent.FutureTask.run(FutureTask.java:138)

                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

                at java.lang.Thread.run(Thread.java:662)

      2012-11-29 19:33:39,896 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] ISPN000136: Execution error

      org.infinispan.statetransfer.StateTransferInProgressException: Timed out waiting for the state transfer lock, state transfer in progress for view 24

                at org.infinispan.interceptors.StateTransferLockInterceptor.signalStateTransferInProgress(StateTransferLockInterceptor.java:201)

                at org.infinispan.interceptors.StateTransferLockInterceptor.handleWriteCommand(StateTransferLockInterceptor.java:177)

                at org.infinispan.interceptors.StateTransferLockInterceptor.visitPutKeyValueCommand(StateTransferLockInterceptor.java:152)

                at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)

                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

                at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:132)

                at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:91)

                at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:62)

                at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77)

                at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)

                at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1006)

                at org.infinispan.CacheImpl.put(CacheImpl.java:702)

                at org.infinispan.CacheImpl.put(CacheImpl.java:694)

                at org.infinispan.CacheSupport.put(CacheSupport.java:53)

                at com.flash.storage.service.cache.DistributedCacheStorage.setIdentity(DistributedCacheStorage.java:81)

                at com.flash.storage.service.cache.dao.IPToMSISDNAccessStorage.setIdentity(IPToMSISDNAccessStorage.java:76)

                at com.flash.storage.server.handlers.GetSessionInformationBaseHandler.addHPIDataToSIM(GetSessionInformationBaseHandler.java:34)

                at com.flash.storage.server.handlers.GetNATSessionInformationProtocolHandler.getSessionInformation(GetNATSessionInformationProtocolHandler.java:63)

                at com.flash.storage.server.handlers.TcpProtocolHandler.handleRequest(TcpProtocolHandler.java:69)

                at com.flash.externalInterface.impls.BaseProtocolHandler.messageReceived(BaseProtocolHandler.java:98)

                at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.messageReceived(AbstractIoFilterChain.java:570)

                at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:299)

                at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:53)

                at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:648)

                at org.apache.mina.filter.executor.ExecutorFilter.processEvent(ExecutorFilter.java:220)

                at org.apache.mina.filter.executor.ExecutorFilter$ProcessEventsRunnable.run(ExecutorFilter.java:264)

                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

                at java.lang.Thread.run(Thread.java:662)

       

      We have 2 members in the cluster on 2 different servers. Our Infinispan cache handles ~2000 TPS: ~1000 get requests, and ~1000 put / remove requests. This problem usually occurs when the cache size is more than 500,000 records - and the average size is ~2 million records.

       

      This is how our jgroups tcp.xml looks like:

      <config xmlns="urn:org:jgroups"

              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

              xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.0.xsd">

          <TCP bind_port="20000"

               bind_addr="${bind.address}"

               loopback="false"

               discard_incompatible_packets="true"

               use_send_queues="true"

               sock_conn_timeout="300"

               conn_expire_time="1800000"

       

               timer_type="new"

               timer.min_threads="4"

               timer.max_threads="30"

               timer.keep_alive_time="3000"

               timer.queue_max_size="500"

       

               thread_pool.enabled="true"

               thread_pool.min_threads="1"

               thread_pool.max_threads="30"

               thread_pool.keep_alive_time="5000"

               thread_pool.queue_enabled="false"

               thread_pool.queue_max_size="100"

               thread_pool.rejection_policy="discard"

       

               oob_thread_pool.enabled="true"

               oob_thread_pool.min_threads="1"

               oob_thread_pool.max_threads="8"

               oob_thread_pool.keep_alive_time="5000"

               oob_thread_pool.queue_enabled="false"

               oob_thread_pool.queue_max_size="100"

               oob_thread_pool.rejection_policy="discard"/>

       

          <TCPPING timeout="2000"

                   initial_hosts="${active.storage.bind.address}[20000],${passive.storage.bind.address}[20000]"

                   port_range="2"

                   num_initial_members="3"/>

          <MERGE2  min_interval="10000"

                   max_interval="20000"/>

          <FD_SOCK/>

          <FD timeout="3000" max_tries="3" />

          <VERIFY_SUSPECT timeout="1500"  />

          <BARRIER />

          <pbcast.NAKACK use_mcast_xmit="false"

                         exponential_backoff="500"

                         discard_delivered_msgs="true"/>

          <UNICAST />

          <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="4M"/>

          <pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true"/>

          <UFC max_credits="2M"

               min_threshold="0.4"/>

          <MFC max_credits="2M"

               min_threshold="0.4"/>

           <FRAG2 frag_size="8192" />

               <pbcast.STATE_TRANSFER/>

      </config>

       

      This is how our infinispan is configured via code:

       

      Configuration configuration = new ConfigurationBuilder().clustering().cacheMode("REPL_ASYNC").async()

           .replQueueInterval(3000).replQueueMaxElements(50).useReplQueue(false)

           .stateTransfer().fetchInMemoryState(true).timeout(10000).eviction().strategy(EvictionStrategy.LIRS)

           .maxEntries(5000000).expiration().lifespan(-1).maxIdle(259200000).wakeUpInterval(4320000).build();

       

      We have upgraded from the former jboss-cache to infinispan so we will be able to handle more load.

      Whar are we missing here?

      Thanks,

      Uri Zeituni