4 Replies Latest reply on Mar 25, 2008 12:58 AM by vblagojevic

    CacheException: Unable to fetch state on startup

    calatberk

      JBoss Cache (JBC) 2.1 CR4 with JGroups 2.6.1, running in Tomcat on Linux. Initial state transfer fails once the cluster grows beyond about 100 nodes, but only when a cache loader is configured (either JDBM or Berkeley DB). With roughly 100 nodes or fewer, or without a cache loader, state transfer succeeds.

      The following messages appear in the log:

      dev2|DEBUG|2008-03-17 20:42:05,766 Incoming,xxcid,192.168.3.204:52659|handler|org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER|||STATE_TRANSFER_INPUTSTREAM_CLOSED received,passing down a RESUME_STABLE event
      dev2|ERROR|2008-03-17 20:42:05,767 Incoming,xxcid,192.168.3.204:52659|handler|org.jgroups.protocols.pbcast.NAKACK|||sender 192.168.2.235:34532 not found in xmit_table
      dev2|ERROR|2008-03-17 20:42:05,767 Incoming,xxcid,192.168.3.204:52659|handler|org.jgroups.protocols.pbcast.NAKACK|||range is null
      dev2|ERROR|2008-03-17 20:42:05,767 Incoming,xxcid,192.168.3.204:52659|handler|org.jgroups.protocols.pbcast.NAKACK|||sender 192.168.3.204:52659 not found in xmit_table
      dev2|ERROR|2008-03-17 20:42:05,767 Incoming,xxcid,192.168.3.204:52659|handler|org.jgroups.protocols.pbcast.NAKACK|||range is null
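
      With many nodes it can help to see which senders NAKACK reports as missing from its xmit_table. The sketch below is a hypothetical triage helper (the class `XmitTableErrors` is not part of JBoss Cache or JGroups). It assumes the pipe-delimited layout shown above, with the log level in the second field and the message in the last field; adjust it if your log4j pattern differs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical triage helper (not part of JBoss Cache or JGroups): counts
 * NAKACK "sender X not found in xmit_table" errors per sender address.
 * ASSUMPTION: lines use the pipe-delimited layout shown above, with the log
 * level in the second field and the message in the last field.
 */
final class XmitTableErrors {
    private static final String PREFIX = "sender ";
    private static final String SUFFIX = " not found in xmit_table";

    static Map<String, Integer> countBySender(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\\|");
            // Keep only ERROR-level lines.
            if (fields.length < 2 || !"ERROR".equals(fields[1])) {
                continue;
            }
            String message = fields[fields.length - 1].trim();
            if (message.length() < PREFIX.length() + SUFFIX.length()
                    || !message.startsWith(PREFIX) || !message.endsWith(SUFFIX)) {
                continue;
            }
            // The text between prefix and suffix is the sender's host:port.
            String sender = message.substring(PREFIX.length(), message.length() - SUFFIX.length());
            counts.merge(sender, 1, Integer::sum);
        }
        return counts;
    }
}
```

      The lines could come from `java.nio.file.Files.readAllLines(...)` on each node's log file; comparing the resulting sender sets across nodes may show whether the errors cluster around a few members or are spread over the whole group.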

      Shortly afterwards, the following exception is thrown:

      org.jboss.cache.CacheException: Unable to fetch state on startup
      at org.jboss.cache.RPCManagerImpl.start(RPCManagerImpl.java:146)
      at org.jboss.cache.CacheImpl.startManualComponents(CacheImpl.java:439)
      at org.jboss.cache.CacheImpl.internalStart(CacheImpl.java:410)
      at org.jboss.cache.CacheImpl.start(CacheImpl.java:344)
      at org.jboss.cache.invocation.CacheInvocationDelegate.start(CacheInvocationDelegate.java:256)
      at org.jboss.cache.DefaultCacheFactory.createCache(DefaultCacheFactory.java:96)
      at org.jboss.cache.DefaultCacheFactory.createCache(DefaultCacheFactory.java:68)
      at org.jboss.cache.DefaultCacheFactory.createCache(DefaultCacheFactory.java:61) ...
      Caused by: org.jgroups.StateTransferException: 192.168.3.204:52659 could not fetch state null from null
      at org.jgroups.JChannel.connect(JChannel.java:453)
      at org.jboss.cache.RPCManagerImpl.start(RPCManagerImpl.java:134)
      ... 60 more
      Caused by: org.jgroups.StateTransferException: 192.168.3.204:52659 could not fetch state null from null
      at org.jgroups.JChannel.connect(JChannel.java:446)
      ... 61 more
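
      Because the trace is truncated ("... 60 more"), it can be useful to log the whole cause chain explicitly when cache startup fails. This is a minimal sketch; `RootCause` is a hypothetical helper, not a JBoss Cache or JGroups class, and it only uses the standard `Throwable.getCause()`.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * Hypothetical helper: finds and describes the cause chain of an exception,
 * e.g. a CacheException wrapping one or more org.jgroups.StateTransferException.
 */
final class RootCause {
    /** Returns the innermost cause (or t itself if it has no cause). */
    static Throwable of(Throwable t) {
        // Identity set guards against (rare) cycles in the cause chain.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    /** Renders the chain outermost first: "Type: message; caused by Type: message ...". */
    static String describeChain(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        StringBuilder sb = new StringBuilder();
        for (Throwable c = t; c != null && seen.add(c); c = c.getCause()) {
            if (sb.length() > 0) {
                sb.append("; caused by ");
            }
            sb.append(c.getClass().getSimpleName()).append(": ").append(c.getMessage());
        }
        return sb.toString();
    }
}
```

      One way to use it is to wrap the `DefaultCacheFactory.createCache(...)` call seen in the trace in a try/catch for `org.jboss.cache.CacheException` and log `RootCause.describeChain(e)`, so every node records the full chain in a single line.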

      Any help is greatly appreciated! Thanks very much.

        • 1. Re: CacheException: Unable to fetch state on startup
          calatberk

          Sorry, the JBC config file:

          • 2. Re: CacheException: Unable to fetch state on startup
            calatberk

            JBC config file:

            <server>
              <mbean code="org.jboss.cache.jmx.CacheJmxWrapper" name="jboss.cache:service=TreeCache">

                <attribute name="IsolationLevel">READ_UNCOMMITTED</attribute>
                <attribute name="CacheMode">REPL_ASYNC</attribute>
                <attribute name="ClusterName">xxcid</attribute>
                <attribute name="ClusterConfig">
                  <config>
                    <UDP mcast_addr="228.8.8.199" mcast_port="45199" bind_addr="192.168.2.235" />
                    <AUTOCONF/>
                    <PING timeout="2000" num_initial_members="3"/>
                    <MERGE2 max_interval="3000000" min_interval="10000"/>
                    <FD timeout="10000" max_tries="5" shun="true"/>
                    <VERIFY_SUSPECT timeout="15000"/>
                    <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0" retransmit_timeout="3000,6000,12000,24000,48000" discard_delivered_msgs="true"/>
                    <UNICAST timeout="3000,6000,12000,24000,36000"/>
                    <pbcast.STABLE stability_delay="10000" desired_avg_gossip="0" max_bytes="40000000"/>
                    <pbcast.GMS print_local_addr="true" join_timeout="5000000" shun="false" view_bundling="true" view_ack_collection_timeout="5000000"/>
                    <FC max_credits="20000000" min_threshold="0.10"/>
                    <FRAG2 frag_size="60000"/>
                    <pbcast.STREAMING_STATE_TRANSFER />
                    <!-- <pbcast.STATE_TRANSFER/> -->
                  </config>
                </attribute>
                <attribute name="FetchInMemoryState">true</attribute>
                <attribute name="InitialStateRetrievalTimeout">5000000</attribute>
                <attribute name="SyncReplTimeout">5000000</attribute>
                <attribute name="LockAcquisitionTimeout">15000</attribute>

                <attribute name="EvictionPolicyConfig">
                  <config>
                    <attribute name="wakeUpIntervalSeconds">5</attribute>
                    <attribute name="eventQueueSize">30000000</attribute>
                    <attribute name="policyClass">org.jboss.cache.eviction.LFUPolicy</attribute>
                    <region name="/_default_">
                      <attribute name="maxNodes">10000</attribute>
                      <attribute name="timeToLiveSeconds">0</attribute>
                    </region>
                    <region name="/TN">
                      <attribute name="maxNodes">350000</attribute>
                      <attribute name="timeToLiveSeconds">0</attribute>
                    </region>
                    <region name="/GID">
                      <attribute name="maxNodes">10</attribute>
                      <attribute name="timeToLiveSeconds">0</attribute>
                    </region>
                    <region name="/STB">
                      <attribute name="maxNodes">10</attribute>
                      <attribute name="timeToLiveSeconds">0</attribute>
                    </region>
                  </config>
                </attribute>
                <!-- Cache loader config block -->
                <attribute name="CacheLoaderConfig">
                  <config>
                    <passivation>false</passivation>
                    <preload/>
                    <shared>false</shared>
                    <cacheloader>
                      <class>org.jboss.cache.loader.bdbje.BdbjeCacheLoader</class>
                      <async>true</async>
                      <fetchPersistentState>true</fetchPersistentState>
                      <ignoreModifications>false</ignoreModifications>
                      <purgeOnStartup>false</purgeOnStartup>
                      <properties>
                        location=./
                      </properties>
                    </cacheloader>
                  </config>
                </attribute>
              </mbean>
            </server>
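
            Several timeouts in this config are very large (InitialStateRetrievalTimeout, SyncReplTimeout, join_timeout and view_ack_collection_timeout are all 5000000 ms), and NAKACK uses an escalating retransmit_timeout list. The hypothetical `TimeoutMath` class below converts these into wall-clock terms. It assumes that once the retransmit_timeout list is exhausted, the last interval is reused for later attempts; that matches my understanding of JGroups' behaviour but should be verified against 2.6.1.

```java
import java.util.Arrays;

/**
 * Back-of-the-envelope arithmetic on values from the ClusterConfig above.
 * ASSUMPTION: once NAKACK's retransmit_timeout list is exhausted, the last
 * interval is reused for later attempts (verify against JGroups 2.6.1).
 */
final class TimeoutMath {
    /** Parses a JGroups-style comma-separated list of millisecond values. */
    static long[] parseIntervals(String csv) {
        String[] parts = csv.split(",");
        long[] intervals = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            intervals[i] = Long.parseLong(parts[i].trim());
        }
        return intervals;
    }

    /** Elapsed ms (after a missing message is detected) at which each of the first n retransmit requests fires. */
    static long[] retransmitSchedule(long[] intervals, int n) {
        long[] times = new long[n];
        long elapsed = 0;
        for (int i = 0; i < n; i++) {
            // Reuse the last interval once the list runs out (see ASSUMPTION above).
            elapsed += intervals[Math.min(i, intervals.length - 1)];
            times[i] = elapsed;
        }
        return times;
    }

    static double msToMinutes(long ms) {
        return ms / 60_000.0;
    }

    public static void main(String[] args) {
        long[] schedule = retransmitSchedule(parseIntervals("3000,6000,12000,24000,48000"), 6);
        System.out.println("NAKACK retransmit requests at (ms): " + Arrays.toString(schedule));
        // InitialStateRetrievalTimeout, SyncReplTimeout, join_timeout and
        // view_ack_collection_timeout are all 5000000 ms in this config.
        System.out.println("5000000 ms in minutes: " + msToMinutes(5_000_000L));
    }
}
```

            Under that assumption, the first six retransmit requests for a missing message fall at 3, 9, 21, 45, 93 and 141 seconds, and each 5000000 ms timeout works out to roughly 83 minutes.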
            


            • 3. Re: CacheException: Unable to fetch state on startup
              calatberk

              I just noticed that 2.1 has been released, but unfortunately I ran into the same problem with that version.

              • 4. Re: CacheException: Unable to fetch state on startup
                vblagojevic

                How do you start these 100 nodes? Concurrently? Tell us a bit more about your setup.