4 Replies Latest reply on Nov 26, 2012 9:10 AM by dan.berindei

    Timeout initializing cache

    acorbellini

      Hi!

       

      I've been struggling with this issue for several weeks and couldn't find a workaround or a fix for it.

       

      The thing is, I successfully load my data (about 40 GB) into a 6-node Infinispan cluster and persist it using a JDBM loader.

       

      Now, when I try to restart Infinispan (by shutting it down and starting it again), it seems to get into a deadlock, and finally throws the following exception:

       

       

      2012-11-14 10:00:52,130 WARN  [LegacyStoreConfigurationBuilder] (InfinispanServer-Main) ISPN000149: Fetch persistent state and purge on startup are both disabled, cache may contain stale entries on startup
      2012-11-14 10:00:52,861 WARN  [Configurator] (InfinispanServer-Main) [JGRP00013] TP.discard_incompatible_packets has been deprecated: incompatible packets are discarded anyway
      2012-11-14 10:00:52,898 WARN  [Configurator] (InfinispanServer-Main) [JGRP00013] UNICAST2.setTimeout has been deprecated: not used anymore
      Failed to boot JBoss:
      org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
           at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
           at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:883)
           at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:654)
           at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:643)
           at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:546)
           at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:199)
           at org.infinispan.CacheImpl.start(CacheImpl.java:520)
           at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:690)
           at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:653)
           at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:549)
           at org.infinispan.server.hotrod.HotRodServer$$anonfun$preStartCaches$1.apply(HotRodServer.scala:127)
           at org.infinispan.server.hotrod.HotRodServer$$anonfun$preStartCaches$1.apply(HotRodServer.scala:125)
           at scala.collection.Iterator$class.foreach(Iterator.scala:772)
           at scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:573)
           at org.infinispan.server.hotrod.HotRodServer.preStartCaches(HotRodServer.scala:125)
           at org.infinispan.server.hotrod.HotRodServer.startTransport(HotRodServer.scala:114)
           at org.infinispan.server.core.AbstractProtocolServer.start(AbstractProtocolServer.scala:99)
           at org.infinispan.server.hotrod.HotRodServer.start(HotRodServer.scala:93)
           at org.infinispan.server.core.Main$.boot(Main.scala:140)
           at org.infinispan.server.core.Main$$anon$1.call(Main.scala:94)
           at org.infinispan.server.core.Main$$anon$1.call(Main.scala:91)
           at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
           at java.util.concurrent.FutureTask.run(FutureTask.java:138)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
           at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
           at java.lang.Thread.run(Thread.java:662)
      Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache mycache on GridCluster1-60905
           at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:209)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
           at java.lang.reflect.Method.invoke(Method.java:597)
           at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
           ... 27 more
      Exception in thread "main" java.util.concurrent.ExecutionException: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
           at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
           at java.util.concurrent.FutureTask.get(FutureTask.java:83)
           at org.infinispan.server.core.Main$.main(Main.scala:112)
           at org.infinispan.server.core.Main.main(Main.scala)
      Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
           at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
           at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:883)
           at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:654)
           at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:643)
           at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:546)
           at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:199)
           at org.infinispan.CacheImpl.start(CacheImpl.java:520)
           at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:690)
           at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:653)
           at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:549)
           at org.infinispan.server.hotrod.HotRodServer$$anonfun$preStartCaches$1.apply(HotRodServer.scala:127)
           at org.infinispan.server.hotrod.HotRodServer$$anonfun$preStartCaches$1.apply(HotRodServer.scala:125)
           at scala.collection.Iterator$class.foreach(Iterator.scala:772)
           at scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:573)
           at org.infinispan.server.hotrod.HotRodServer.preStartCaches(HotRodServer.scala:125)
           at org.infinispan.server.hotrod.HotRodServer.startTransport(HotRodServer.scala:114)
           at org.infinispan.server.core.AbstractProtocolServer.start(AbstractProtocolServer.scala:99)
           at org.infinispan.server.hotrod.HotRodServer.start(HotRodServer.scala:93)
           at org.infinispan.server.core.Main$.boot(Main.scala:140)
           at org.infinispan.server.core.Main$$anon$1.call(Main.scala:94)
           at org.infinispan.server.core.Main$$anon$1.call(Main.scala:91)
           at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
           at java.util.concurrent.FutureTask.run(FutureTask.java:138)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
           at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
           at java.lang.Thread.run(Thread.java:662)
      Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache mycache on GridCluster1-60905
           at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:209)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
           at java.lang.reflect.Method.invoke(Method.java:597)
           at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
           ... 27 more
      2012-11-14 13:10:50,718 WARN  [GMS] (Incoming-2,GridCluster1-60905) GridCluster1-60905: not member of view [GridCluster5-57770|9]; discarding it
      2012-11-14 13:10:53,128 WARN  [GMS] (ViewHandler,GridCluster1-60905) GridCluster1-60905: failed to collect all ACKs (expected=3) for view [GridCluster1-60905|10] after 2000ms, missing ACKs from [GridCluster4-28055, GridCluster2-32147]
      2012-11-14 13:11:08,746 WARN  [NAKACK] (OOB-97,GridCluster1-60905) [JGRP00011] GridCluster1-60905: dropped message 21 from non-member GridCluster5-57770 (view=[GridCluster1-60905|10] [GridCluster1-60905, GridCluster4-28055, GridCluster2-32147])
      2012-11-14 13:11:51,670 ERROR [ClusterTopologyManagerImpl] (notification-thread-0) ISPN000196: Failed to recover cluster state after the current node became the coordinator
      java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
           at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
           at java.util.concurrent.FutureTask.get(FutureTask.java:91)
           at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:563)
           at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:427)
           at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:231)
           at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.handleViewChange(ClusterTopologyManagerImpl.java:593)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
           at java.lang.reflect.Method.invoke(Method.java:597)
           at org.infinispan.notifications.AbstractListenerImpl$ListenerInvocation$1.run(AbstractListenerImpl.java:200)
           at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
           at java.lang.Thread.run(Thread.java:662)
      Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
           at org.infinispan.util.Util.rewrapAsCacheException(Util.java:532)
           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:152)
           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.broadcastRemoteCommands(CommandAwareRpcDispatcher.java:196)
           at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:498)
           at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:545)
           at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:542)
           at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
           at java.util.concurrent.FutureTask.run(FutureTask.java:138)
           ... 3 more
      Caused by: org.jgroups.TimeoutException: TimeoutException
           at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:145)
           at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:40)
           at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)
           at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:287)
           at org.jgroups.protocols.RSVP.down(RSVP.java:118)
           at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
           at org.jgroups.JChannel.down(JChannel.java:700)
           at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:622)
           at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:173)
           at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:360)
           at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:103)
           at org.jgroups.blocks.Request.execute(Request.java:83)
           at org.jgroups.blocks.MessageDispatcher.cast(MessageDispatcher.java:335)
           at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:249)
           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processCalls(CommandAwareRpcDispatcher.java:330)
           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:145)
           ... 9 more
      
      

       

      My infinispan.xml looks like this:

       

       

      <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="urn:infinispan:config:5.0 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd "
                xmlns="urn:infinispan:config:5.0">
                <global>
                          <globalJmxStatistics cacheManagerName="mycache"
                                    enabled="true" jmxDomain="org.infinispan" /> 
                          <transport clusterName="infinispan">
                                    <properties>
                                              <property name="configurationFile" value="/home/user/infinispan/jgroups-udp.xml" />
                                    </properties>
                          </transport>
                          <asyncTransportExecutor>
                             <properties>
                                <property name="maxThreads" value="1000" />
                             </properties>
                          </asyncTransportExecutor>
                </global>
                <default>
                          <expiration lifespan="100"/>
                </default>
      
      
                <namedCache name="mycache">
                          <locking useLockStriping="false" 
                                    concurrencyLevel="100" 
                                    lockAcquisitionTimeout="10000"/>
                          <clustering mode="dist">
                                    <async />
                                    <hash numOwners="1" />
                                    <stateTransfer fetchInMemoryState="false"/>
                          </clustering>
                          <eviction maxEntries="100000" strategy="LRU" />
        
                          <loaders passivation="false" preload="false">
                                    <loader  purgeOnStartup="false" 
                                              ignoreModifications="false"
                                              class="org.infinispan.loaders.jdbm.JdbmCacheStore">
                                              <async enabled="true" threadPoolSize="10" />
                                              <properties>
                                                        <property name="location" value="/home/user/jdbm-db/lists" />
                                              </properties>
                                    </loader>
                          </loaders>
                </namedCache>
      </infinispan>
      
      

       

      I'm currently using a version of Infinispan that I built myself from the GitHub source (using just "mvn package"). I've already tried versions 5.0.1, 5.2.0-Beta3, and 5.1.6-Final, with no luck. I've also seen https://issues.jboss.org/browse/ISPN-2373, which is why I'm using the source build.

      I've already tried using TCP, with the same result.

       

      Any idea what is wrong with my configuration?

       

      Thanks in advance!!

      Ale

        • 1. Re: Timeout initializing cache
          dan.berindei

          Your log looks a bit odd: the first messages are logged at 10:00 and the last messages at 13:00. Did you really wait for 3 hours for the cluster to start???

           

          Based on the second part of the log I think it might be a JGroups problem - you should not get a TimeoutException in RSVP if JGroups is functioning properly. Well, unless all the threads in your OOB pool were busy, but I don't think that's the case.
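           (For reference, RSVP's ack timeout is configurable in the JGroups stack; raising it would only be a band-aid, and the values below are just an illustration, not a recommendation:)

           <!-- Illustration only: RSVP with a longer ack timeout (milliseconds) than the default -->
           <RSVP timeout="60000" resend_interval="2000"/>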

           

          Please post your JGroups configuration and some logs with DEBUG enabled for everything (they shouldn't be very big).
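           If you're using the bundled log4j configuration (an assumption on my part - adjust for whatever logging setup you actually have), categories like these should do it:

           <!-- Sketch, assuming a log4j.xml: DEBUG for both Infinispan and JGroups -->
           <category name="org.infinispan">
                <priority value="DEBUG"/>
           </category>
           <category name="org.jgroups">
                <priority value="DEBUG"/>
           </category>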

          • 2. Re: Timeout initializing cache
            acorbellini

            Hi!

            First of all, thanks for your answer. Before using the source build, it took about 10 minutes to fail.

            As I was coding other alternatives to Infinispan, I didn't realize it took 3 hours to fail. I think that's connected to the LockBreakingService log messages shown below.

             

            In the first post, I posted the standard output from the server (sorry). Here is the output from infinispan.log (I think DEBUG is enabled):

             

            2012-11-17 12:08:47,562 INFO  (main) [org.infinispan.server.core.Main$] ISPN005001: Start main with args: -r, hotrod, -c, /home/acorbellini/infinispan/infinispan.xml, -l, 192.168.240.1, -p, 11222
            2012-11-17 12:08:47,755 DEBUG (InfinispanServer-Main) [org.infinispan.util.FileLookup] Unable to find file /home/acorbellini/infinispan/infinispan.xml in classpath; searching for this file on the filesystem instead.
            2012-11-17 12:08:48,240 WARN  (InfinispanServer-Main) [org.infinispan.configuration.cache.LegacyStoreConfigurationBuilder] ISPN000149: Fetch persistent state and purge on startup are both disabled, cache may contain stale entries on startup
            2012-11-17 12:08:48,269 DEBUG (InfinispanServer-Main) [org.infinispan.util.ModuleProperties] Loading lifecycle SPI class: org.infinispan.server.memcached.LifecycleCallbacks@494b6bed
            2012-11-17 12:08:48,270 DEBUG (InfinispanServer-Main) [org.infinispan.util.ModuleProperties] Loading lifecycle SPI class: org.infinispan.server.hotrod.LifecycleCallbacks@b3e75a5
            2012-11-17 12:08:48,282 DEBUG (InfinispanServer-Main) [org.infinispan.util.ModuleProperties] Loading lifecycle SPI class: org.infinispan.cli.interpreter.LifecycleCallbacks@c569c60
            2012-11-17 12:08:48,284 DEBUG (InfinispanServer-Main) [org.infinispan.util.ModuleProperties] Loading lifecycle SPI class: org.infinispan.server.core.LifecycleCallbacks@786db724
            2012-11-17 12:08:48,373 DEBUG (InfinispanServer-Main) [org.infinispan.util.ModuleProperties] No module command extensions to load
            2012-11-17 12:08:48,472 DEBUG (InfinispanServer-Main) [org.infinispan.manager.DefaultCacheManager] Started cache manager infinispan-twitter on null
            2012-11-17 12:08:48,575 DEBUG (InfinispanServer-Main) [org.infinispan.server.hotrod.HotRodServer] Starting server with basic settings: host=192.168.240.1, port=11222, masterThreads=-1, workerThreads=12, idleTimeout=-1, tcpNoDelay=true, sendBufSize=0, recvBufSize=0
            2012-11-17 12:08:48,769 INFO  (InfinispanServer-Main) [org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000078: Starting JGroups Channel
            2012-11-17 12:08:48,914 DEBUG (InfinispanServer-Main) [org.jgroups.conf.ClassConfigurator] Using jg-magic-map.xml as magic number file and jg-protocol-ids.xml for protocol IDs
            2012-11-17 12:08:49,263 DEBUG (InfinispanServer-Main) [org.jgroups.stack.Configurator] set property UDP.diagnostics_addr to default value /224.0.75.75
            2012-11-17 12:08:49,313 DEBUG (InfinispanServer-Main) [org.jgroups.protocols.FRAG2] received CONFIG event: {bind_addr=/192.168.240.1}
            2012-11-17 12:08:49,326 DEBUG (InfinispanServer-Main) [org.jgroups.protocols.UDP] sockets will use interface 192.168.240.1
            2012-11-17 12:08:49,330 DEBUG (InfinispanServer-Main) [org.jgroups.protocols.UDP] socket information:
            , mcast_addr=228.6.7.8:46655, bind_addr=/192.168.240.1, ttl=2
            sock: bound to 192.168.240.1:54149, receive buffer size=20000000, send buffer size=640000
            mcast_sock: bound to 192.168.240.1:46655, send buffer size=640000, receive buffer size=25000000
            2012-11-17 12:08:49,332 DEBUG (InfinispanServer-Main) [org.jgroups.protocols.pbcast.GMS] address=GridCluster1-26415, cluster=infinispan-twitter, physical address=192.168.240.1:54149
            2012-11-17 12:08:49,454 WARN  (OOB-3,GridCluster1-26415) [org.jgroups.protocols.UDP] [JGRP00012] discarded message from different cluster DKVS (our cluster is infinispan-twitter). Sender was e868c8d8-3552-5e8f-5186-7d2645511457
            2012-11-17 12:08:49,532 WARN  (OOB-3,GridCluster1-26415) [org.jgroups.protocols.UDP] [JGRP00012] discarded message from different cluster DKVS (our cluster is infinispan-twitter). Sender was 561006b2-13d7-6c3d-f1c9-134914678135
            2012-11-17 12:08:50,074 WARN  (OOB-3,GridCluster1-26415) [org.jgroups.protocols.UDP] [JGRP00012] discarded message from different cluster DKVS (our cluster is infinispan-twitter). Sender was 20ed63eb-095b-5eeb-418a-7439ff20154a
            2012-11-17 12:08:50,128 WARN  (OOB-3,GridCluster1-26415) [org.jgroups.protocols.UDP] [JGRP00012] discarded message from different cluster DKVS (our cluster is infinispan-twitter). Sender was 2ae07fa6-7aef-9c5d-2656-2ad8fb194409
            2012-11-17 12:08:50,136 WARN  (OOB-3,GridCluster1-26415) [org.jgroups.protocols.UDP] [JGRP00012] discarded message from different cluster DKVS (our cluster is infinispan-twitter). Sender was e62ab18b-9677-eac0-81d9-0b1081c50ff9
            2012-11-17 12:08:50,357 WARN  (OOB-3,GridCluster1-26415) [org.jgroups.protocols.UDP] [JGRP00012] discarded message from different cluster DKVS (our cluster is infinispan-twitter). Sender was 3a0ff170-11de-405e-6104-daafb3c0a298
            2012-11-17 12:08:50,991 WARN  (OOB-3,GridCluster1-26415) [org.jgroups.protocols.UDP] [JGRP00012] discarded message from different cluster DKVS (our cluster is infinispan-twitter). Sender was 82fd23cc-17b7-d74b-0dfc-61a35e79c3ed
            2012-11-17 12:08:52,182 WARN  (OOB-3,GridCluster1-26415) [org.jgroups.protocols.UDP] [JGRP00012] discarded message from different cluster DKVS (our cluster is infinispan-twitter). Sender was a0c99186-d28b-abe9-5d28-5eeaabe61bc5
            2012-11-17 12:08:52,346 DEBUG (InfinispanServer-Main) [org.jgroups.protocols.pbcast.GMS] election results: {}
            2012-11-17 12:08:52,848 DEBUG (InfinispanServer-Main) [org.jgroups.protocols.pbcast.GMS] election results: {GridCluster3-61902=1}
            2012-11-17 12:08:52,848 DEBUG (InfinispanServer-Main) [org.jgroups.protocols.pbcast.GMS] sending JOIN(GridCluster1-26415) to GridCluster3-61902
            2012-11-17 12:08:53,319 DEBUG (InfinispanServer-Main) [org.jgroups.protocols.pbcast.NAKACK2] 
            [GridCluster1-26415 setDigest()]
            existing digest:  []
            new digest:       GridCluster3-61902: [2 (2)], GridCluster7-33991: [0 (0)], GridCluster2-5131: [0 (0)], GridCluster1-26415: [0 (0)], GridCluster4-5796: [0 (0)], GridCluster5-11747: [0 (0)]
            resulting digest: GridCluster2-5131: [0 (0)], GridCluster1-26415: [0 (0)], GridCluster3-61902: [2 (2)], GridCluster5-11747: [0 (0)], GridCluster4-5796: [0 (0)], GridCluster7-33991: [0 (0)]
            2012-11-17 12:08:53,319 DEBUG (InfinispanServer-Main) [org.jgroups.protocols.pbcast.GMS] GridCluster1-26415: installing view [GridCluster3-61902|3] [GridCluster3-61902, GridCluster7-33991, GridCluster2-5131, GridCluster1-26415, GridCluster4-5796, GridCluster5-11747]
            2012-11-17 12:08:53,320 DEBUG (InfinispanServer-Main) [org.jgroups.protocols.FD_SOCK] VIEW_CHANGE received: [GridCluster3-61902, GridCluster7-33991, GridCluster2-5131, GridCluster1-26415, GridCluster4-5796, GridCluster5-11747]
            2012-11-17 12:08:53,324 DEBUG (FD_SOCK pinger,GridCluster1-26415) [org.jgroups.protocols.FD_SOCK] ping_dest is GridCluster4-5796, pingable_mbrs=[GridCluster3-61902, GridCluster7-33991, GridCluster2-5131, GridCluster1-26415, GridCluster4-5796, GridCluster5-11747]
            2012-11-17 12:08:53,345 DEBUG (InfinispanServer-Main) [org.jgroups.protocols.pbcast.STABLE] [ergonomics] setting max_bytes to 6MB (6 members)
            2012-11-17 12:08:53,355 DEBUG (InfinispanServer-Main) [org.infinispan.remoting.transport.jgroups.JGroupsTransport] New view accepted: [GridCluster3-61902|3] [GridCluster3-61902, GridCluster7-33991, GridCluster2-5131, GridCluster1-26415, GridCluster4-5796, GridCluster5-11747]
            2012-11-17 12:08:53,356 INFO  (InfinispanServer-Main) [org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000094: Received new cluster view: [GridCluster3-61902|3] [GridCluster3-61902, GridCluster7-33991, GridCluster2-5131, GridCluster1-26415, GridCluster4-5796, GridCluster5-11747]
            2012-11-17 12:08:53,407 INFO  (InfinispanServer-Main) [org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000079: Cache local address is GridCluster1-26415, physical addresses are [192.168.240.1:54149]
            2012-11-17 12:08:53,407 DEBUG (InfinispanServer-Main) [org.infinispan.remoting.transport.jgroups.JGroupsTransport] Waiting on view being accepted
            2012-11-17 12:08:53,612 DEBUG (InfinispanServer-Main) [org.infinispan.interceptors.InterceptorChain] Interceptor chain size: 5
            2012-11-17 12:08:53,612 DEBUG (InfinispanServer-Main) [org.infinispan.interceptors.InterceptorChain] Interceptor chain is: 
                      >> org.infinispan.interceptors.InvocationContextInterceptor
                      >> org.infinispan.interceptors.NotificationInterceptor
                      >> org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor
                      >> org.infinispan.interceptors.EntryWrappingInterceptor
                      >> org.infinispan.interceptors.CallInterceptor
            2012-11-17 12:08:53,649 DEBUG (InfinispanServer-Main) [org.infinispan.CacheImpl] Started cache ___defaultcache on GridCluster1-26415
            2012-11-17 12:08:53,719 DEBUG (InfinispanServer-Main) [org.infinispan.loaders.decorators.AsyncStore] Async cache loader starting org.infinispan.loaders.decorators.AsyncStore@190872ce
            2012-11-17 12:08:53,795 DEBUG (InfinispanServer-Main) [org.infinispan.loaders.jdbm.JdbmCacheStore] CacheLoader located as 131100
            2012-11-17 12:08:53,811 INFO  (InfinispanServer-Main) [org.infinispan.loaders.jdbm.JdbmCacheStore] ISPN009001: JDBM database /home/acorbellini/jdbm-twitter/lists/twitter-cache opened
            2012-11-17 12:08:53,811 DEBUG (InfinispanServer-Main) [org.infinispan.loaders.jdbm.JdbmCacheStore] cleaning up expired entries...
            2012-11-17 12:08:53,812 DEBUG (InfinispanServer-Main) [org.infinispan.loaders.jdbm.JdbmCacheStore] started
            2012-11-17 12:08:53,816 DEBUG (InfinispanServer-Main) [org.infinispan.interceptors.InterceptorChain] Interceptor chain size: 10
            2012-11-17 12:08:53,816 DEBUG (InfinispanServer-Main) [org.infinispan.interceptors.InterceptorChain] Interceptor chain is: 
                      >> org.infinispan.interceptors.InvocationContextInterceptor
                      >> org.infinispan.interceptors.IsMarshallableInterceptor
                      >> org.infinispan.statetransfer.StateTransferInterceptor
                      >> org.infinispan.interceptors.NotificationInterceptor
                      >> org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor
                      >> org.infinispan.interceptors.EntryWrappingInterceptor
                      >> org.infinispan.interceptors.ClusteredCacheLoaderInterceptor
                      >> org.infinispan.interceptors.DistCacheStoreInterceptor
                      >> org.infinispan.interceptors.DistributionInterceptor
                      >> org.infinispan.interceptors.CallInterceptor
            2012-11-17 12:08:53,821 DEBUG (InfinispanServer-Main) [org.infinispan.topology.LocalTopologyManagerImpl] Node GridCluster1-26415 joining cache twitter-cache
            2012-11-17 12:08:53,868 DEBUG (InfinispanServer-Main) [org.infinispan.topology.LocalTopologyManagerImpl] Updating local consistent hash(es) for cache twitter-cache: new topology = CacheTopology{id=1, currentCH=DefaultConsistentHash{numSegments=60, numOwners=1, members=[GridCluster7-33991], owners={0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 38: 0, 39: 0, 40: 0, 41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 0, 51: 0, 52: 0, 53: 0, 54: 0, 55: 0, 56: 0, 57: 0, 58: 0, 59: 0}, pendingCH=DefaultConsistentHash{numSegments=60, numOwners=1, members=[GridCluster7-33991, GridCluster3-61902], owners={0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 1, 31: 1, 32: 1, 33: 1, 34: 1, 35: 1, 36: 1, 37: 1, 38: 1, 39: 1, 40: 1, 41: 1, 42: 1, 43: 1, 44: 1, 45: 1, 46: 1, 47: 1, 48: 1, 49: 1, 50: 1, 51: 1, 52: 1, 53: 1, 54: 1, 55: 1, 56: 1, 57: 1, 58: 1, 59: 1}}
            2012-11-17 12:08:53,871 DEBUG (InfinispanServer-Main) [org.infinispan.transaction.TransactionTable] Topology changed, recalculating minTopologyId
            2012-11-17 12:08:54,334 DEBUG (OOB-3,GridCluster1-26415) [org.infinispan.topology.LocalTopologyManagerImpl] Starting local rebalance for cache twitter-cache, topology = CacheTopology{id=1, currentCH=DefaultConsistentHash{numSegments=60, numOwners=1, members=[GridCluster7-33991], owners={0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 38: 0, 39: 0, 40: 0, 41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 0, 51: 0, 52: 0, 53: 0, 54: 0, 55: 0, 56: 0, 57: 0, 58: 0, 59: 0}, pendingCH=DefaultConsistentHash{numSegments=60, numOwners=1, members=[GridCluster7-33991, GridCluster3-61902], owners={0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 1, 31: 1, 32: 1, 33: 1, 34: 1, 35: 1, 36: 1, 37: 1, 38: 1, 39: 1, 40: 1, 41: 1, 42: 1, 43: 1, 44: 1, 45: 1, 46: 1, 47: 1, 48: 1, 49: 1, 50: 1, 51: 1, 52: 1, 53: 1, 54: 1, 55: 1, 56: 1, 57: 1, 58: 1, 59: 1}}
            2012-11-17 12:08:54,704 DEBUG (LockBreakingService,twitter-cache,GridCluster1-26415) [org.infinispan.transaction.TransactionTable] About to cleanup completed transaction. Initial size is 0
            2012-11-17 12:08:54,705 DEBUG (LockBreakingService,twitter-cache,GridCluster1-26415) [org.infinispan.transaction.TransactionTable] Finished cleaning up completed transactions. 0 transactions were removed, total duration was 0 millis, current number of completed transactions is 0
            
            

             

            The last 2 messages repeat every second. I'm using the jgroups-udp.xml that comes with Infinispan. I borrowed that XML file for another project, and it works fine there. The "only" difference is that there I used pbcast.STATE_TRANSFER to manage shared state between nodes, as sketched below.
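            For what it's worth, the extra protocol in that other project's stack was just this, at the bottom of jgroups-udp.xml (from memory, so treat it as a sketch):

            <!-- Added at the end of the protocol stack in the other project (sketch from memory) -->
            <pbcast.STATE_TRANSFER/>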

             

            Thanks again!

            • 3. Re: Timeout initializing cache
              dan.berindei

              Ok, I imagine the messages around 13:10 are actually caused by you stopping the cluster then, and there isn't anything wrong with JGroups.

               

              The "LockBreakingService" messages are completely harmless, there is a thread trying to garbage-collect finished transactions every seconds and it just logs that there aren't any transactions (because the cache didn't  start yet). In fact, these logs have already been switched to TRACE in master.

               

              I'm not sure I understand your comment about pbcast.STATE_TRANSFER. Are you saying that you added STATE_TRANSFER only in your other project that uses JGroups directly?

               

              From the log you posted, it looks like a state transfer started between the first 2 nodes that started (GridCluster7 and GridCluster3) and it never finished. Possibly one of them tried to load all the entries in the cache store (although there should be no need to do that with the settings you posted) and caused some sort of timeout that the cluster never recovered from.
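              If one of them is just slow iterating the store, you could try raising the initial state transfer timeout so the joiner doesn't give up - a sketch only, assuming the timeout attribute from the 5.x schema (the 8 minutes here is arbitrary):

              <clustering mode="dist">
                   <async />
                   <hash numOwners="1" />
                   <!-- Sketch: longer initial state transfer timeout, in milliseconds -->
                   <stateTransfer fetchInMemoryState="false" timeout="480000"/>
              </clustering>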

               

              Could you post the logs from all the nodes, and also a dump of the thread stacks on all of them? Actually, you should try with TRACE enabled first; if you get GBs of logs in a couple of minutes, then you'll know something's wrong...
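              If you're on the bundled log4j setup (same assumption as before), a single TRACE category is all it takes, and a thread dump can be grabbed with jstack <pid>:

              <!-- Sketch: TRACE for all of Infinispan; expect very verbose output -->
              <category name="org.infinispan">
                   <priority value="TRACE"/>
              </category>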

              • 4. Re: Timeout initializing cache
                dan.berindei

                I think I know what the problem is: when the cluster starts back up, the consistent hash (the way Infinispan wants to distribute data between the nodes) is not the same as it was when it shut down. So there is a lot of data in the local cache store that suddenly "doesn't belong" on the local node, and Infinispan is trying to delete it.

                 

                This is not a problem when the first node starts, because it "owns" everything, but as soon as the second node starts up they both end up deleting a lot of keys from their cache stores (even though they don't actually swap any data between them, as both in-memory and persistent data transfer is disabled). We are doing this because we don't want any node to end up with stale data (there is no easy way to "expire" this data, so it stays stale more or less forever). But perhaps we shouldn't try to delete anything from the cache store when persistent data transfer is disabled...
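                In the meantime, a possible stopgap (untested, just a sketch) would be to enable persistent state transfer on the store, so entries are actually moved to their new owners instead of being purged - fetchPersistentState is the same loader attribute the ISPN000149 warning at the top of your log refers to:

                <loaders passivation="false" preload="false">
                     <!-- Sketch, untested: fetch persisted state on join instead of discarding it -->
                     <loader class="org.infinispan.loaders.jdbm.JdbmCacheStore"
                               fetchPersistentState="true"
                               purgeOnStartup="false"
                               ignoreModifications="false">
                          <properties>
                               <property name="location" value="/home/user/jdbm-db/lists" />
                          </properties>
                     </loader>
                </loaders>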