14 Replies Latest reply on Jul 13, 2012 12:28 PM by dex chen

    jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup

    dex chen Novice

      I have a 2-node cluster in replication mode using the jgroups-tcp.xml config.

       

      It has a cache loader configured with roughly half a million entries in Berkeley DB.

      It takes a long 8 minutes for the second node to join the cluster. I see a lot of exceptions in the logs.

       

      What and where should I look to fix the problem?

       

       

       

       

      On the joining node:

      2012-06-01/09:57:22.210/MDT [OOB-3,null] WARN org.infinispan.statetransfer.BaseStateTransferManagerImpl[224] - ISPN000167: Rejecting state pushed by node portal2.performancetest.com-48158 for view 7, there is no state transfer in progress (we are at view 8)

      2012-06-01/09:57:27.271/MDT [OOB-2,null] WARN org.infinispan.statetransfer.BaseStateTransferManagerImpl[224] - ISPN000167: Rejecting state pushed by node portal2.performancetest.com-48158 for view 7, there is no state transfer in progress (we are at view 8)

       

      On the first node (coordinator):

      2012-06-01/09:53:00.219/MDT [Incoming-1,null] INFO org.infinispan.remoting.transport.jgroups.JGroupsTransport[607] - ISPN000094: Received new cluster view: [portal2.performancetest.com-48158|3] [portal2.performancetest.com-48158, portal1.performancetest.com-840]

      2012-06-01/09:53:24.324/MDT [OOB-2,null] WARN org.infinispan.commands.control.CacheViewControlCommand[141] - ISPN000071: Caught exception when handling command CacheViewControlCommand{cache=keychain, type=PREPARE_VIEW, sender=portal1.performancetest.com-42037, newViewId=4, newMembers=[portal2.performancetest.com-48158, portal1.performancetest.com-42037], oldViewId=3, oldMembers=[portal2.performancetest.com-48158]}

      java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException

           at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)

           at java.util.concurrent.FutureTask.get(FutureTask.java:119)

           at org.infinispan.util.concurrent.AggregatingNotifyingFutureBuilder.get(AggregatingNotifyingFutureBuilder.java:93)

           at org.infinispan.statetransfer.BaseStateTransferTask.finishPushingState(BaseStateTransferTask.java:139)

           at org.infinispan.statetransfer.ReplicatedStateTransferTask.doPerformStateTransfer(ReplicatedStateTransferTask.java:116)

           at org.infinispan.statetransfer.BaseStateTransferTask.performStateTransfer(BaseStateTransferTask.java:93)

           at org.infinispan.statetransfer.BaseStateTransferManagerImpl.prepareView(BaseStateTransferManagerImpl.java:331)

           at org.infinispan.cacheviews.CacheViewsManagerImpl.handlePrepareView(CacheViewsManagerImpl.java:485)

           at org.infinispan.commands.control.CacheViewControlCommand.perform(CacheViewControlCommand.java:126)

           at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:95)

           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:221)

           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:201)

           at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:456)

           at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:363)

           at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:238)

           at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:543)

           at org.jgroups.JChannel.up(JChannel.java:716)

           at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1026)

           at org.jgroups.protocols.RSVP.up(RSVP.java:179)

           at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)

           at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)

           at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)

           at org.jgroups.protocols.pbcast.GMS.up(GMS.java:889)

           at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)

           at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:759)

           at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:365)

           at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:602)

           at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)

           at org.jgroups.protocols.FD.up(FD.java:273)

           at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)

           at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)

           at org.jgroups.protocols.Discovery.up(Discovery.java:359)

           at org.jgroups.stack.Protocol.up(Protocol.java:363)

           at org.jgroups.protocols.TP.passMessageUp(TP.java:1180)

           at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1728)

           at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1710)

           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

           at java.lang.Thread.run(Thread.java:722)

      Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException

           at org.infinispan.util.Util.rewrapAsCacheException(Util.java:525)

           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:172)

           at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:489)

           at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:161)

           at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:183)

           at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:240)

           at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:78)

           at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:274)

           at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

           at java.util.concurrent.FutureTask.run(FutureTask.java:166)

           ... 3 more

      Caused by: org.jgroups.TimeoutException: TimeoutException

           at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:82)

           at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:41)

           at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)

           at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:275)

           at org.jgroups.protocols.RSVP.down(RSVP.java:114)

           at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1033)

           at org.jgroups.JChannel.down(JChannel.java:730)

           at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:559)

           at org.jgroups.blocks.RequestCorrelator.sendUnicastRequest(RequestCorrelator.java:193)

           at org.jgroups.blocks.UnicastRequest.sendRequest(UnicastRequest.java:44)

           at org.jgroups.blocks.Request.execute(Request.java:83)

           at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:342)

           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:270)

           at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:165)

           ... 11 more

        • 1. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
          dex chen Novice

          Here is my jgroups-tcp.xml:

           

          <config xmlns="urn:org:jgroups"

                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                  xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-3.0.xsd">

             <TCP

                  bind_addr="${jgroups.tcp.address:127.0.0.1}"

                  bind_port="${jgroups.tcp.port:7900}"

                  loopback="true"

                  port_range="1"

                  recv_buf_size="20M"

                  send_buf_size="640K"

                  discard_incompatible_packets="true"

                  max_bundle_size="64K"

                  max_bundle_timeout="30"

                  enable_bundling="true"

                  use_send_queues="true"

                  sock_conn_timeout="300" 

                  enable_diagnostics="false"

                  bundler_type="old"

                  singleton_name="tcp"

                 

                  timer_type="new"

                  timer.min_threads="4"

                  timer.max_threads="10"

                  timer.keep_alive_time="3000"

                  timer.queue_max_size="500"

                 

                  thread_naming_pattern="pl"

           

                  thread_pool.enabled="true"

                  thread_pool.min_threads="2"

                  thread_pool.max_threads="30"

                  thread_pool.keep_alive_time="60000"

                  thread_pool.queue_enabled="true"

                  thread_pool.queue_max_size="100"

                  thread_pool.rejection_policy="Discard"

           

                  oob_thread_pool.enabled="true"

                  oob_thread_pool.min_threads="2"

                  oob_thread_pool.max_threads="30"

                  oob_thread_pool.keep_alive_time="60000"

                  oob_thread_pool.queue_enabled="false"

                  oob_thread_pool.queue_max_size="100"

                  oob_thread_pool.rejection_policy="Discard"       

                   />

           

             <!-- Ergonomics, new in JGroups 2.11, are disabled by default in TCPPING until JGRP-1253 is resolved -->

             <TCPPING timeout="3000"

                      initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7900]}"

                    

                      port_range="1"

                      num_initial_members="1"

                      ergonomics="false"

                  />

          <!--

             <MPING bind_addr="${jgroups.bind_addr:127.0.0.1}" break_on_coord_rsp="true"

                mcast_addr="${jgroups.udp.mcast_addr:228.6.7.8}" mcast_port="${jgroups.udp.mcast_port:46655}" ip_ttl="${jgroups.udp.ip_ttl:2}"

                num_initial_members="3"/>

          -->

             <MERGE2 max_interval="30000"

                     min_interval="10000"/>

             <FD_SOCK start_port="7902" port_range="1"/>

             <FD timeout="3000" max_tries="3"/>

             <VERIFY_SUSPECT timeout="1500"/>

             <pbcast.NAKACK

                   use_mcast_xmit="false"

                   retransmit_timeout="300,600,1200,2400,4800"

                   discard_delivered_msgs="false"/>

             <UNICAST2 timeout="300,600,1200" stable_interval="5000" max_bytes="1M" />

             <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"

                            max_bytes="1M"/>

             <pbcast.GMS print_local_addr="false" join_timeout="7000" view_bundling="true"/>

             <UFC max_credits="200K" min_threshold="0.20"/>

             <MFC max_credits="200K" min_threshold="0.20"/>

             <FRAG2 frag_size="60K"/>

             <RSVP timeout="60000" resend_interval="500" ack_on_delivery="false" />

          </config>

          • 3. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
            dex chen Novice

            Here is the infinispan config:

            <?xml version="1.0" encoding="UTF-8"?>

            <infinispan

                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                  xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"

                  xmlns="urn:infinispan:config:5.1">

             

               <global>

                  <transport clusterName="TestCluster"

                           machineId="node1"

                            rackId="r1" nodeName="Node1">

                     <properties>

                        <property name="configurationFile" value="./test-resources/jgroups-tcp.xml" />

                     </properties>

                  </transport>

                  <globalJmxStatistics enabled="false"/>

                  <!--

                        Used to register JVM shutdown hooks.

                        hookBehavior: DEFAULT, REGISTER, DONT_REGISTER

                  -->

                   <shutdown hookBehavior="DONT_REGISTER"/>

               </global>

             

               <default>

                 <locking

                     isolationLevel="READ_COMMITTED"

                     lockAcquisitionTimeout="1500"

                     writeSkewCheck="false"

                     concurrencyLevel="500"

                     useLockStriping="false"

                  />

                

                  <transaction

                        transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"

                       

                        syncRollbackPhase="false"

                        syncCommitPhase="false"

                        useEagerLocking="false"

                        eagerLockSingleNode="false"

                        cacheStopTimeout="30000" />

                    

                  <deadlockDetection enabled="true" spinDuration="1000"/>

                  <jmxStatistics enabled="false"/>

                 

                </default>

                

               <namedCache name="session">

                  <clustering mode="replication">

                     <stateTransfer

                        timeout="240000"

                        fetchInMemoryState="true"

                     />

                     <async useReplQueue="true" replQueueInterval="5000" replQueueMaxElements="500" asyncMarshalling="false" />

             

                  </clustering>

                  <transaction transactionMode="TRANSACTIONAL"/>

                  <eviction

                     maxEntries="500000"

                     strategy="LRU"

                  />

                  <!--  time units below are milliseconds -->

                  <expiration

                     wakeUpInterval="-1"

                     lifespan="-1"

                     maxIdle="-1"

                  />

                 

               </namedCache>

             

               <namedCache name="keychain" >  <!--  the name must match CacheType.java -->

                  <clustering mode="replication">

                     <stateTransfer

                        timeout="240000"

                        fetchInMemoryState="true"

                     />

                     <sync replTimeout="20000"/> 

                    

                  </clustering>

                  <transaction  transactionMode="TRANSACTIONAL" />

                  <expiration

                     wakeUpInterval="-1"

                     lifespan="-1"

                     maxIdle="-1"

                  />

                   

                     <loaders

                        passivation="false"

                        shared="false"

                        preload="true">

                        <loader

                          class="org.infinispan.loaders.jdbm.JdbmCacheStore"

                          fetchPersistentState="true"

                          purgeOnStartup="false">

                          <properties>

                             <property name="location" value="./target/cacheData/upDB"/>

                          </properties>

                     

                          <async enabled="true" flushLockTimeout="15000" shutdownTimeout="10000" modificationQueueSize="10" threadPoolSize="50"/>

                       

                       </loader>

                    </loaders>

               </namedCache>

             

                <!-- LDAP user store cookie cache -->

                <namedCache name="ispn-ldapcookie">

                    <clustering mode="replication">

                        <stateTransfer

                                timeout="240000"

                                fetchInMemoryState="true"

                                />

                        <async useReplQueue="true" replQueueInterval="5000" replQueueMaxElements="50" asyncMarshalling="false" />

                    </clustering>

                    <loaders passivation="false" shared="false" preload="true">

                        <loader

                           class="org.infinispan.loaders.jdbm.JdbmCacheStore"

                           fetchPersistentState="true"

                           purgeOnStartup="false">

                          <properties>

                             <property name="location" value="./target/cacheData/ldapcooki"/>

                          </properties>

                          <async enabled="true" flushLockTimeout="15000" shutdownTimeout="10000" modificationQueueSize="10" threadPoolSize="5"/>

                       </loader>

                    </loaders>

                    

                    <transaction transactionMode="NON_TRANSACTIONAL"/>

                   

                    <!--  time units below are milliseconds -->

                    <expiration

                            wakeUpInterval="-1"

                            lifespan="-1"

                            maxIdle="-1"

                            />

             

                </namedCache>

                <!-- Cluster Wide Lock Token -->

                <namedCache name="ispn-locktoken">

                    <clustering mode="replication">

                     <stateTransfer fetchInMemoryState="true" timeout="240000"/>

                  </clustering>

                  <transaction transactionMode="TRANSACTIONAL" cacheStopTimeout="30000" eagerLockSingleNode="false" syncCommitPhase="false" syncRollbackPhase="false" transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup" useEagerLocking="true" lockingMode="OPMISTIC"/>

                

                  <!--  time units below are milliseconds -->

                  <expiration lifespan="-1" maxIdle="-1" wakeUpInterval="1000"/>

             

                </namedCache>

               

            </infinispan>

            • 4. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
              Galder Zamarreño Master

              The error is coming from a cache called keychain which is configured with JDBM instead of Berkeley, which you claim to have issues with?

               

              I'd try switching fetchPersistentState to false in the keychain cache loader config.

               

              Other than that, I'd profile the startup to see where the time is going and why the timeout happens. If you don't have a profiler, maybe get some thread dumps and see if it's blocking on something else.
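
A thread dump is usually taken with the JDK's `jstack <pid>`. Where that is not convenient, a dump can also be produced from inside the JVM. The following is a minimal sketch using only the standard `java.lang.management` API; the class name `ThreadDumpSketch` and the five-second spacing between dumps are illustrative choices, not something taken from this thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: prints what every live thread is doing right now.
final class ThreadDumpSketch {

    /** Returns a text dump of all live threads with their full stack traces. */
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Only ask for lock details when the JVM can provide them.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                // The lock or condition this thread is currently waiting on.
                out.append(" waiting on ").append(info.getLockName());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few dumps some seconds apart: frames that stay on top
        // across dumps are where the time is going.
        for (int i = 0; i < 3; i++) {
            System.out.println("=== dump " + i + " ===");
            System.out.println(dumpAllThreads());
            Thread.sleep(5000);
        }
    }
}
```

Calling `dumpAllThreads()` from a timer or a diagnostic hook while the second node is joining should show whether the state transfer and transport threads are busy or blocked.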

              • 5. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                dex chen Novice

                Galder: The Infinispan config I used in testing was using Berkeley Derby (I just copy-pasted a similar one in which JDBM had not been replaced).

                According to the docs, fetchPersistentState defaults to false. Is that not true?

                 

                I'll try that anyway. Thanks.

                • 6. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                  dex chen Novice

                  Just FYI: I came up with a workaround. I set preload=false and have a thread that loads cache entries from the DB into memory in the background, which speeds up startup.

                  • 7. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                    Galder Zamarreño Master

                    Out of curiosity, how different is the implementation of your cache load logic, compared to our preload? Can you post the code?

                    • 8. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                      dex chen Novice

                      Not much different. I did that for 2 reasons:

                      1) In 5.1.4.Final there is a bug related to Derby (it does not allow maxEntries to be specified in eviction; it may not be directly related to this issue), which you have now fixed in 5.1.5. With that bug, I cannot specify the maximum number of entries to preload.

                      2) If preload is set to true, it can take a long time to load all entries (even if we were able to specify maxEntries), which prevents the cache service from serving real requests. This can result in a long startup time for the application.

                       

                      My approach works around the above problems, especially the second one. It reduces the overall startup time of our application.

                       

                      Code-wise: I get a connection to the database from the connection pool, select a set of keys directly from the DB, and call get(key) on the cache for each key. This does not trigger replication (and if the entry is already in memory, it is not loaded from the store again). I might be able to optimize it a bit.

                       

                      Let me know if you see something I could improve, and your thoughts.

                      Thanks.
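
The approach described in this post can be sketched roughly as follows. This is not the poster's actual code and not Infinispan's API: `ReadThroughCache` is a hypothetical stand-in that loads from a store only on a miss, and the key list stands in for the keys selected from the database.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LazyWarmUpSketch {

    /** Hypothetical stand-in for a cache backed by a cache store. */
    static final class ReadThroughCache {
        private final Map<String, String> memory = new ConcurrentHashMap<>();
        private final Map<String, String> store; // stands in for the on-disk store

        ReadThroughCache(Map<String, String> store) {
            this.store = store;
        }

        /** Returns the in-memory value, loading from the store only on a miss. */
        String get(String key) {
            return memory.computeIfAbsent(key, store::get);
        }

        void put(String key, String value) {
            memory.put(key, value);
        }

        int inMemorySize() {
            return memory.size();
        }
    }

    /** Starts the warm-up on a background thread and returns immediately. */
    static Future<?> warmUpInBackground(ReadThroughCache cache, List<String> keys,
                                        ExecutorService executor) {
        return executor.submit(() -> {
            for (String key : keys) {
                cache.get(key); // a read, so entries already in memory are left alone
            }
        });
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> store = new ConcurrentHashMap<>();
        store.put("a", "0");
        store.put("b", "0");
        ReadThroughCache cache = new ReadThroughCache(store);

        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> warmUp = warmUpInBackground(cache, Arrays.asList("a", "b"), executor);
        // The caller is free to serve requests here while the warm-up runs.
        warmUp.get();
        executor.shutdown();
        System.out.println("entries in memory: " + cache.inMemorySize());
    }
}
```

Because the warm-up only reads, startup does not have to wait for it, and a request that arrives before a key has been warmed simply takes the slower load-on-miss path.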

                      • 9. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                        Galder Zamarreño Master

                        Our preload logic does not trigger replication, but it does not check whether a key is present in memory; it just overwrites it. It assumes that the cache loader has the right info and that anything in memory is invalid. Other than that, I don't see many differences right now to explain where your speed-up is coming from.

                        • 10. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                          dex chen Novice

                          The speed-up is in forming the cluster. If preload is on, the cache manager startup call (it seems to me) completes (returns) only after the preload has completed. In my approach, the join completes regardless of my lazy load status.

                           

                          If preload does not block cache manager startup, then my approach makes no difference compared with enabling preload.

                          • 11. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                            Galder Zamarreño Master

                            Well, the obvious problem there is that if the lazy load happens after the cache is started, there can be invocations to the cache that won't return anything (due to timing). That's an invalid situation in many cases, though maybe not in yours.

                            • 12. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                              dex chen Novice

                              I might be wrong. As I understand it and have observed, the cache will try to load from the cache store when get is called if the entry has not yet been loaded into memory by the lazy loader. In other words, the timing of the lazy load is not an issue.

                              • 13. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                                Galder Zamarreño Master

                                Indeed, that is not a problem. What could be problematic is the fact that you can make updates to the cache, so the following could potentially happen (dunno if it applies to your use case):

                                 

                                1. T1: cache.get(a)

                                2. T1: cache.put(a, 1)

                                3. T-Preload: cache.put(a, 0)

                                 

                                In this case, the updated value would be overridden by the preload, but it all depends on the logic of your preload. Our preload just overrides in-memory state.
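
The interleaving above can be replayed deterministically. The sketch below uses a plain `ConcurrentHashMap` as a stand-in for the in-memory state, so it illustrates the ordering problem only and says nothing about Infinispan's internals; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PreloadRaceSketch {

    /** Replays the three steps with a preload that overwrites in-memory state. */
    static int overwritingPreload() {
        Map<String, Integer> memory = new ConcurrentHashMap<>();
        memory.get("a");            // 1. T1: get(a)
        memory.put("a", 1);         // 2. T1: put(a, 1)
        memory.put("a", 0);         // 3. T-Preload: put(a, 0), the stale stored value
        return memory.get("a");
    }

    /** Same interleaving, but the preload only fills entries that are still missing. */
    static int nonOverwritingPreload() {
        Map<String, Integer> memory = new ConcurrentHashMap<>();
        memory.get("a");            // 1. T1: get(a)
        memory.put("a", 1);         // 2. T1: put(a, 1)
        memory.putIfAbsent("a", 0); // 3. T-Preload: no effect, T1's value is kept
        return memory.get("a");
    }

    public static void main(String[] args) {
        System.out.println("overwriting preload leaves a = " + overwritingPreload());
        System.out.println("non-overwriting preload leaves a = " + nonOverwritingPreload());
    }
}
```

With the overwriting preload the update from T1 is lost (a ends up as 0); with the non-overwriting variant it survives (a stays 1).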

                                • 14. Re: jgroups.TimeoutException causes failure of prepare view and long time to form a cluster in startup
                                  dex chen Novice

                                  Galder: I got your point. My lazy load does not override values already in memory, since I do not use put directly. In other words, the values already in memory take precedence. But it is good to know about the race condition. Thanks.