14 Replies Latest reply on Jul 13, 2012 12:28 PM by dex80526

    jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup

    dex80526

      I have a 2-node cluster in replication mode using the jgroups-tcp.xml config.

      It has a cache loader configured with about half a million entries in Berkeley DB.

      It takes as long as 8 minutes for the second node to join the cluster, and I see a lot of exceptions in the logs.

      What should I look at, and where, to fix the problem?

      On the joining node:

      2012-06-01/09:57:22.210/MDT [OOB-3,null] WARN org.infinispan.statetransfer.BaseStateTransferManagerImpl[224] - ISPN000167: Rejecting state pushed by node portal2.performancetest.com-48158 for view 7, there is no state transfer in progress (we are at view 8)

      2012-06-01/09:57:27.271/MDT [OOB-2,null] WARN org.infinispan.statetransfer.BaseStateTransferManagerImpl[224] - ISPN000167: Rejecting state pushed by node portal2.performancetest.com-48158 for view 7, there is no state transfer in progress (we are at view 8)

       

      On the first node (coordinator):

      2012-06-01/09:53:00.219/MDT [Incoming-1,null] INFO org.infinispan.remoting.transport.jgroups.JGroupsTransport[607] - ISPN000094: Received new cluster view: [portal2.performancetest.com-48158|3] [portal2.performancetest.com-48158, portal1.performancetest.com-840]

      2012-06-01/09:53:24.324/MDT [OOB-2,null] WARN org.infinispan.commands.control.CacheViewControlCommand[141] - ISPN000071: Caught exception when handling command CacheViewControlCommand{cache=keychain, type=PREPARE_VIEW, sender=portal1.performancetest.com-42037, newViewId=4, newMembers=[portal2.performancetest.com-48158, portal1.performancetest.com-42037], oldViewId=3, oldMembers=[portal2.performancetest.com-48158]}

      java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException

           at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)

           at java.util.concurrent.FutureTask.get(FutureTask.java:119)

           at org.infinispan.util.concurrent.AggregatingNotifyingFutureBuilder.get(AggregatingNotifyingFutureBuilder.java:93)

           at org.infinispan.statetransfer.BaseStateTransferTask.finishPushingState(BaseStateTransferTask.java:139)

           at org.infinispan.statetransfer.ReplicatedStateTransferTask.doPerformStateTransfer(ReplicatedStateTransferTask.java:116)

           at org.infinispan.statetransfer.BaseStateTransferTask.performStateTransfer(BaseStateTransferTask.java:93)

           at org.infinispan.statetransfer.BaseStateTransferManagerImpl.prepareView(BaseStateTransferManagerImpl.java:331)

           at org.infinispan.cacheviews.CacheViewsManagerImpl.handlePrepareView(CacheViewsManagerImpl.java:485)

           at org.infinispan.commands.control.CacheViewControlCommand.perform(CacheViewControlCommand.java:126)

           at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:95)

           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:221)

           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:201)

           at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:456)

           at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:363)

           at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:238)

           at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:543)

           at org.jgroups.JChannel.up(JChannel.java:716)

           at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1026)

           at org.jgroups.protocols.RSVP.up(RSVP.java:179)

           at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)

           at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)

           at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)

           at org.jgroups.protocols.pbcast.GMS.up(GMS.java:889)

           at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)

           at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:759)

           at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:365)

           at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:602)

           at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)

           at org.jgroups.protocols.FD.up(FD.java:273)

           at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)

           at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)

           at org.jgroups.protocols.Discovery.up(Discovery.java:359)

           at org.jgroups.stack.Protocol.up(Protocol.java:363)

           at org.jgroups.protocols.TP.passMessageUp(TP.java:1180)

           at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1728)

           at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1710)

           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

           at java.lang.Thread.run(Thread.java:722)

      Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException

           at org.infinispan.util.Util.rewrapAsCacheException(Util.java:525)

           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:172)

           at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:489)

           at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:161)

           at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:183)

           at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:240)

           at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:78)

           at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:274)

           at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

           at java.util.concurrent.FutureTask.run(FutureTask.java:166)

           ... 3 more

      Caused by: org.jgroups.TimeoutException: TimeoutException

           at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:82)

           at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:41)

           at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)

           at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:275)

           at org.jgroups.protocols.RSVP.down(RSVP.java:114)

           at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1033)

           at org.jgroups.JChannel.down(JChannel.java:730)

           at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:559)

           at org.jgroups.blocks.RequestCorrelator.sendUnicastRequest(RequestCorrelator.java:193)

           at org.jgroups.blocks.UnicastRequest.sendRequest(UnicastRequest.java:44)

           at org.jgroups.blocks.Request.execute(Request.java:83)

           at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:342)

           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:270)

           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:165)

           ... 11 more

        • 1. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
          dex80526

          Here is my jgroups-tcp.xml:

           

          <config xmlns="urn:org:jgroups"

                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                  xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-3.0.xsd">

             <TCP

                  bind_addr="${jgroups.tcp.address:127.0.0.1}"

                  bind_port="${jgroups.tcp.port:7900}"

                  loopback="true"

                  port_range="1"

                  recv_buf_size="20M"

                  send_buf_size="640K"

                  discard_incompatible_packets="true"

                  max_bundle_size="64K"

                  max_bundle_timeout="30"

                  enable_bundling="true"

                  use_send_queues="true"

                  sock_conn_timeout="300" 

                  enable_diagnostics="false"

                  bundler_type="old"

                  singleton_name="tcp"

                 

                  timer_type="new"

                  timer.min_threads="4"

                  timer.max_threads="10"

                  timer.keep_alive_time="3000"

                  timer.queue_max_size="500"

                 

                  thread_naming_pattern="pl"

           

                  thread_pool.enabled="true"

                  thread_pool.min_threads="2"

                  thread_pool.max_threads="30"

                  thread_pool.keep_alive_time="60000"

                  thread_pool.queue_enabled="true"

                  thread_pool.queue_max_size="100"

                  thread_pool.rejection_policy="Discard"

           

                  oob_thread_pool.enabled="true"

                  oob_thread_pool.min_threads="2"

                  oob_thread_pool.max_threads="30"

                  oob_thread_pool.keep_alive_time="60000"

                  oob_thread_pool.queue_enabled="false"

                  oob_thread_pool.queue_max_size="100"

                  oob_thread_pool.rejection_policy="Discard"       

                   />

           

             <!-- Ergonomics, new in JGroups 2.11, are disabled by default in TCPPING until JGRP-1253 is resolved -->

             <TCPPING timeout="3000"

                      initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7900]}"

                    

                      port_range="1"

                      num_initial_members="1"

                      ergonomics="false"

                  />

          <!--

             <MPING bind_addr="${jgroups.bind_addr:127.0.0.1}" break_on_coord_rsp="true"

                mcast_addr="${jgroups.udp.mcast_addr:228.6.7.8}" mcast_port="${jgroups.udp.mcast_port:46655}" ip_ttl="${jgroups.udp.ip_ttl:2}"

                num_initial_members="3"/>

          -->

             <MERGE2 max_interval="30000"

                     min_interval="10000"/>

             <FD_SOCK start_port="7902" port_range="1"/>

             <FD timeout="3000" max_tries="3"/>

             <VERIFY_SUSPECT timeout="1500"/>

             <pbcast.NAKACK

                   use_mcast_xmit="false"

                   retransmit_timeout="300,600,1200,2400,4800"

                   discard_delivered_msgs="false"/>

             <UNICAST2 timeout="300,600,1200" stable_interval="5000" max_bytes="1M" />

             <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"

                            max_bytes="1M"/>

             <pbcast.GMS print_local_addr="false" join_timeout="7000" view_bundling="true"/>

             <UFC max_credits="200K" min_threshold="0.20"/>

             <MFC max_credits="200K" min_threshold="0.20"/>

             <FRAG2 frag_size="60K"/>

             <RSVP timeout="60000" resend_interval="500" ack_on_delivery="false" />

          </config>

          • 2. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
            galder.zamarreno

            What's the Infinispan configuration like?

            • 3. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
              dex80526

              Here is the infinispan config:

              <?xml version="1.0" encoding="UTF-8"?>

              <infinispan

                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                    xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"

                    xmlns="urn:infinispan:config:5.1">

               

                 <global>

                    <transport clusterName="TestCluster"

                             machineId="node1"

                              rackId="r1" nodeName="Node1">

                       <properties>

                          <property name="configurationFile" value="./test-resources/jgroups-tcp.xml" />

                       </properties>

                    </transport>

                    <globalJmxStatistics enabled="false"/>

                    <!--

                          Used to register JVM shutdown hooks.

                          hookBehavior: DEFAULT, REGISTER, DONT_REGISTER

                    -->

                     <shutdown hookBehavior="DONT_REGISTER"/>

                 </global>

               

                 <default>

                   <locking

                       isolationLevel="READ_COMMITTED"

                       lockAcquisitionTimeout="1500"

                       writeSkewCheck="false"

                       concurrencyLevel="500"

                       useLockStriping="false"

                    />

                  

                    <transaction

                          transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"

                         

                          syncRollbackPhase="false"

                          syncCommitPhase="false"

                          useEagerLocking="false"

                          eagerLockSingleNode="false"

                          cacheStopTimeout="30000" />

                      

                    <deadlockDetection enabled="true" spinDuration="1000"/>

                    <jmxStatistics enabled="false"/>

                   

                  </default>

                  

                 <namedCache name="session">

                    <clustering mode="replication">

                       <stateTransfer

                          timeout="240000"

                          fetchInMemoryState="true"

                       />

                       <async useReplQueue="true" replQueueInterval="5000" replQueueMaxElements="500" asyncMarshalling="false" />

               

                    </clustering>

                    <transaction transactionMode="TRANSACTIONAL"/>

                    <eviction

                       maxEntries="500000"

                       strategy="LRU"

                    />

                    <!--  time units below are milliseconds -->

                    <expiration

                       wakeUpInterval="-1"

                       lifespan="-1"

                       maxIdle="-1"

                    />

                   

                 </namedCache>

               

                 <namedCache name="keychain" >  <!--  the name must match CacheType.java -->

                    <clustering mode="replication">

                       <stateTransfer

                          timeout="240000"

                          fetchInMemoryState="true"

                       />

                       <sync replTimeout="20000"/> 

                      

                    </clustering>

                    <transaction  transactionMode="TRANSACTIONAL" />

                    <expiration

                       wakeUpInterval="-1"

                       lifespan="-1"

                       maxIdle="-1"

                    />

                     

                       <loaders

                          passivation="false"

                          shared="false"

                          preload="true">

                          <loader

                            class="org.infinispan.loaders.jdbm.JdbmCacheStore"

                            fetchPersistentState="true"

                            purgeOnStartup="false">

                            <properties>

                               <property name="location" value="./target/cacheData/upDB"/>

                            </properties>

                       

                            <async enabled="true" flushLockTimeout="15000" shutdownTimeout="10000" modificationQueueSize="10" threadPoolSize="50"/>

                         

                         </loader>

                      </loaders>

                 </namedCache>

               

                  <!-- LDAP user store cookie cache -->

                  <namedCache name="ispn-ldapcookie">

                      <clustering mode="replication">

                          <stateTransfer

                                  timeout="240000"

                                  fetchInMemoryState="true"

                                  />

                          <async useReplQueue="true" replQueueInterval="5000" replQueueMaxElements="50" asyncMarshalling="false" />

                      </clustering>

                      <loaders passivation="false" shared="false" preload="true">

                          <loader

                             class="org.infinispan.loaders.jdbm.JdbmCacheStore"

                             fetchPersistentState="true"

                             purgeOnStartup="false">

                            <properties>

                               <property name="location" value="./target/cacheData/ldapcooki"/>

                            </properties>

                            <async enabled="true" flushLockTimeout="15000" shutdownTimeout="10000" modificationQueueSize="10" threadPoolSize="5"/>

                         </loader>

                      </loaders>

                      

                      <transaction transactionMode="NON_TRANSACTIONAL"/>

                     

                      <!--  time units below are milliseconds -->

                      <expiration

                              wakeUpInterval="-1"

                              lifespan="-1"

                              maxIdle="-1"

                              />

               

                  </namedCache>

                  <!-- Cluster Wide Lock Token -->

                  <namedCache name="ispn-locktoken">

                      <clustering mode="replication">

                       <stateTransfer fetchInMemoryState="true" timeout="240000"/>

                    </clustering>

                    <transaction transactionMode="TRANSACTIONAL" cacheStopTimeout="30000" eagerLockSingleNode="false" syncCommitPhase="false" syncRollbackPhase="false" transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup" useEagerLocking="true" lockingMode="OPTIMISTIC"/>

                  

                    <!--  time units below are milliseconds -->

                    <expiration lifespan="-1" maxIdle="-1" wakeUpInterval="1000"/>

               

                  </namedCache>

                 

              </infinispan>

              • 4. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                galder.zamarreno

                The error is coming from a cache called keychain, which is configured with JDBM rather than the Berkeley store you say you are having issues with?

                 

                I'd try switching fetchPersistentState to false in the keychain cache loader config.

                 

                Other than that, I'd profile the startup to see where the time is going and why the timeout happens. If you don't have a profiler, maybe get some thread dumps and see if it's blocking on something else.

                • 5. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                  dex80526

                  Galder: the Infinispan config I used in testing was using Berkeley Derby (I just copy-pasted a similar config in which JDBM had not been replaced).

                  fetchPersistentState is set to false by default according to the docs. Is that not true?

                  I'll try that anyway. Thanks,

                  • 6. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                    dex80526

                    Just FYI: I came up with a workaround. I set preload=false and have a thread load cache entries from the DB into memory in the background, which speeds up the startup.

                    • 7. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                      galder.zamarreno

                      Out of curiosity, how different is the implementation of your cache load logic, compared to our preload? Can you post the code?

                      • 8. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                        dex80526

                        Not much different. I did that for 2 reasons:

                        1) In 5.1.4.Final there is a bug related to Derby (it does not allow specifying maxEntries in eviction; this may not be directly related to this issue), which you have now fixed in 5.1.5. With that bug, I cannot specify the max entries to preload.

                        2) If preload is set to true, it can take a long time to load all entries (even if we are able to specify maxEntries), which prevents the cache service from serving real requests. This can result in a long startup time for the application.

                         

                        My approach works around the above problems, especially the second one. It reduces the overall startup time of our application.

                         

                        Code-wise: I get a connection to the database from the connection pool, select a set of keys from the DB directly, and call get(key) on the cache for each key. This does not trigger replication, and if an entry is already in memory it is not loaded from the store again. I might be able to optimize it a bit.
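
                        As an illustration only (this is not the actual application code and it does not use the Infinispan API), the idea can be sketched with plain JDK classes. The class LazyWarmUpSketch, its in-memory map, and the map standing in for the database are all hypothetical names invented for the sketch; updates are kept in memory only to keep it short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a cache with a read-through store; not the Infinispan API.
class LazyWarmUpSketch {
    private final Map<String, String> store;                              // plays the role of the DB / cache store
    private final Map<String, String> memory = new ConcurrentHashMap<>(); // in-memory entries
    private final AtomicInteger storeReads = new AtomicInteger();         // counts loads from the store

    LazyWarmUpSketch(Map<String, String> store) {
        this.store = new ConcurrentHashMap<>(store);
    }

    // Read-through get: the store is consulted only when the key is not in memory yet.
    String get(String key) {
        return memory.computeIfAbsent(key, k -> {
            storeReads.incrementAndGet();
            return store.get(k);
        });
    }

    // Application update; kept in memory only to keep the sketch short.
    void put(String key, String value) {
        memory.put(key, value);
    }

    int storeReadCount() {
        return storeReads.get();
    }

    // Starts the warm-up on a background thread and returns immediately,
    // so the caller (application startup) is not blocked by the load.
    CompletableFuture<Void> warmUpInBackground(List<String> keys) {
        return CompletableFuture.runAsync(() -> keys.forEach(this::get));
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("a", "1");
        db.put("b", "2");
        db.put("c", "3");

        LazyWarmUpSketch cache = new LazyWarmUpSketch(db);
        cache.put("a", "updated"); // entry that is already in memory before the warm-up runs

        CompletableFuture<Void> warmUp = cache.warmUpInBackground(Arrays.asList("a", "b", "c"));
        warmUp.join(); // only so the demo can print a stable result

        // "a" keeps its in-memory value; only "b" and "c" were read from the store.
        System.out.println(cache.get("a") + " / store reads: " + cache.storeReadCount());
        // prints: updated / store reads: 2
    }
}
```

                        Because the warm-up goes through get rather than put, a key that is already in memory is left untouched and is not read from the store a second time.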

                         

                        Let me know if you see something I could improve, and what your thoughts are.

                        thanks.

                        • 9. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                          galder.zamarreno

                          Our preload logic does not trigger replication, but it does not check whether a key is present in memory; it just overwrites it. It assumes that the cache loader has the right info and that anything in memory is invalid. Other than that, I don't see much difference right now to explain where your speedup is coming from.

                          • 10. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                            dex80526

                            The speedup is in forming the cluster. With preload on, the cache manager startup call (it seems to me) returns only after the preload has completed. In my approach, the join completes regardless of my lazy load status.

                            If preload does not block cache manager startup, then my approach makes no difference compared with enabling preload.

                            • 11. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                              galder.zamarreno

                              Well, the obvious problem there is that if the lazy load happens after the cache is started, there can be invocations on the cache that won't return anything (due to timing). That's an invalid situation in many cases, though maybe not in yours.

                              • 12. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                                dex80526

                                I might be wrong, but as I understand it and have observed, the cache tries to load an entry from the cache store when get is called if the lazy loader has not yet loaded it into memory. In other words, the timing of the lazy load is not an issue.

                                • 13. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                                  galder.zamarreno

                                  Indeed, that is not a problem. What could be problematic is the fact that you can make updates to the cache, so the following could potentially happen (dunno if it applies to your use case):

                                   

                                  1. T1: cache.get(a)

                                  2. T1: cache.put(a, 1)

                                  3. T-Preload: cache.put(a, 0)

                                   

                                  In this case, the updated value would be overridden by the preload, but it all depends on the logic of your preload. Our preload just overrides in-memory state.
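
                                  The interleaving above can be replayed sequentially with plain JDK classes to show how the outcome depends on the preload logic. This is only a sketch: the ConcurrentHashMap stands in for the in-memory cache, the class PreloadRaceSketch is a hypothetical name, and the put-if-absent variant is an assumed alternative for comparison rather than something the built-in preload does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Plain-JDK illustration of the three steps; not Infinispan code.
class PreloadRaceSketch {

    // Replays the steps in order. 'overwrite' selects how the preload thread stores its value.
    static int replay(boolean overwrite) {
        ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();

        cache.get("a");                 // 1. T1: cache.get(a), a miss because nothing is loaded yet
        cache.put("a", 1);              // 2. T1: cache.put(a, 1), the application update

        if (overwrite) {
            cache.put("a", 0);          // 3. T-Preload: overwrites whatever is in memory
        } else {
            cache.putIfAbsent("a", 0);  // 3. T-Preload: leaves keys that are already in memory alone
        }
        return cache.get("a");
    }

    public static void main(String[] args) {
        System.out.println("overwriting preload:   a=" + replay(true));   // prints a=0, the update is lost
        System.out.println("put-if-absent preload: a=" + replay(false));  // prints a=1, the update survives
    }
}
```

                                  With the overwriting variant the stale stored value replaces the newer update; a loader that only fills in missing keys avoids that particular outcome.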

                                  • 14. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                                    dex80526

                                    Galder: I got your point. My lazy load does not override values already in memory, since I do not use put directly. In other words, values already in memory take precedence. But it is good to know about the race condition. Thanks,