6 Replies Latest reply on Mar 14, 2008 8:45 AM by Jorge Morales

    FetchInMemoryState && InitialStateRetrievalTimeout

    David Sancho Newbie

      Hi guys

      I am facing a problem when trying to get the state from the cache in a two-node clustered cache environment. What I get on the node that is starting and trying to get the state from another node in the cluster is:

      2008-03-10 11:01:20,630 WARN [org.jboss.system.ServiceController] Problem starting service jboss.cache:service=ServicesDataCache
      org.jboss.cache.CacheException: Initial state transfer failed: Channel.getState() returned false


      As suggested in http://wiki.jboss.org/wiki/Wiki.jsp?page=JBossCacheTroubleshooting, I increased the InitialStateRetrievalTimeout parameter in the cache configuration file and everything worked fine.
      Then I set the FetchInMemoryState parameter to false so I could test a different solution (in my case, I do not mind starting with a cold cache). What I got from this test is that some cache attributes (around 55000) were transferred to the cache that is starting (there are around 90000 attributes in the other node).
      Perhaps I am missing some configuration parameter. I am using JBoss Cache 1.4.1.SP8 and JBoss AS 4.2.2.GA. Here is my cache config file:

      <?xml version="1.0" encoding="UTF-8"?>
      
      <!-- ===================================================================== -->
      <!-- -->
      <!-- ServicesCache Service Configuration -->
      <!-- -->
      <!-- ===================================================================== -->
      
      <server>
      
       <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar" />
      
      
       <!-- ==================================================================== -->
       <!-- Defines TreeCache configuration -->
       <!-- ==================================================================== -->
      
      
       <mbean code="org.jboss.cache.TreeCache"
       name="jboss.cache:service=ServicesDataCache">
      
       <depends>jboss:service=Naming</depends>
       <depends>jboss:service=TransactionManager</depends>
      
       <attribute name="TransactionManagerLookupClass">
       org.jboss.cache.JBossTransactionManagerLookup
       </attribute>
      
       <!--
       Isolation level : SERIALIZABLE
       REPEATABLE_READ (default)
       READ_COMMITTED
       READ_UNCOMMITTED
       NONE
       -->
       <attribute name="IsolationLevel">NONE</attribute>
      
       <!--
       Valid modes are LOCAL
       REPL_ASYNC
       REPL_SYNC
       INVALIDATION_ASYNC
       INVALIDATION_SYNC
       -->
       <attribute name="CacheMode">REPL_SYNC</attribute>
      
       <!--
       Just used for async repl: use a replication queue
       -->
       <attribute name="UseReplQueue">false</attribute>
      
       <!--
       Replication interval for replication queue (in ms)
       -->
       <attribute name="ReplQueueInterval">0</attribute>
      
       <!--
       Max number of elements which trigger replication
       -->
       <attribute name="ReplQueueMaxElements">0</attribute>
      
       <!-- Name of cluster. Needs to be the same for all clusters, in order
       to find each other
       -->
       <attribute name="ClusterName">
       SOMServicesCache-Cluster
       </attribute>
      
       <!-- JGroups protocol stack properties. Can also be a URL,
       e.g. file:/home/bela/default.xml
       <attribute name="ClusterProperties"></attribute>
       -->
      
       <attribute name="ClusterConfig">
       <config>
       <!-- UDP: if you have a multihomed machine,
       set the bind_addr attribute to the appropriate NIC IP address, e.g bind_addr="192.168.0.2"
       -->
       <!-- UDP: On Windows machines, because of the media sense feature
       being broken with multicast (even after disabling media sense)
       set the loopback attribute to true -->
       <UDP
       mcast_addr="${jboss.cache.ServicesCache.addr:228.1.2.3}"
       mcast_port="${jboss.cache.ServicesCache.port:48866}" ip_ttl="64"
       ip_mcast="true" mcast_send_buf_size="150000"
       mcast_recv_buf_size="80000" ucast_send_buf_size="150000"
       ucast_recv_buf_size="80000" loopback="false" />
       <PING timeout="2000" num_initial_members="3"
       up_thread="false" down_thread="false" />
       <MERGE2 min_interval="10000" max_interval="20000" />
       <!-- <FD shun="true" up_thread="true" down_thread="true" />-->
       <FD_SOCK />
       <VERIFY_SUSPECT timeout="1500" up_thread="false"
       down_thread="false" />
       <pbcast.NAKACK gc_lag="50"
       retransmit_timeout="600,1200,2400,4800" max_xmit_size="8192"
       up_thread="false" down_thread="false" />
       <UNICAST timeout="600,1200,2400" down_thread="false" />
       <pbcast.STABLE desired_avg_gossip="20000"
       up_thread="false" down_thread="false" />
       <FRAG frag_size="8192" down_thread="false"
       up_thread="false" />
       <pbcast.GMS join_timeout="5000"
       join_retry_timeout="2000" shun="true" print_local_addr="true" />
       <pbcast.STATE_TRANSFER up_thread="true"
       down_thread="true" />
       </config>
       </attribute>
      
      
       <!--
       Whether or not to fetch state on joining a cluster
       NOTE this used to be called FetchStateOnStartup and has been renamed to be more descriptive.
       -->
       <attribute name="FetchInMemoryState">false</attribute>
      
       <!--
       The max amount of time (in milliseconds) we wait until the
       initial state (ie. the contents of the cache) are retrieved from
       existing members in a clustered environment
       -->
       <attribute name="InitialStateRetrievalTimeout">15000</attribute>
      
       <!--
       Number of milliseconds to wait until all responses for a
       synchronous call have been received.
       -->
       <attribute name="SyncReplTimeout">15000</attribute>
      
       <!-- Max number of milliseconds to wait for a lock acquisition -->
       <attribute name="LockAcquisitionTimeout">10000</attribute>
      
       <!-- Name of the eviction policy class. -->
       <attribute name="EvictionPolicyClass">
       org.jboss.cache.eviction.LRUPolicy
       </attribute>
      
       <!-- Specific eviction policy configurations. This is LRU -->
       <attribute name="EvictionPolicyConfig">
       <config>
       <!-- This attribute will be share by all eviction policies -->
       <attribute name="wakeUpIntervalSeconds">30</attribute>
       <!-- Cache wide default 86400-->
       <region name="/_default_">
       <attribute name="maxNodes">100000</attribute>
       <attribute name="timeToLiveSeconds">86400</attribute>
       </region>
       </config>
       </attribute>
      
      
       <!--
       Indicate whether to use region based marshalling or not. Set this to true if you are running under a scoped
       class loader, e.g., inside an application server. Default is "false".
       -->
      
       <attribute name="UseRegionBasedMarshalling">true</attribute>
       <attribute name="InactiveOnStartup">false</attribute>
      
       <attribute name="CacheLoaderConfiguration">
       <config>
       <passivation>false</passivation>
       <preload>/</preload>
       <shared>true</shared>
      
       <cacheloader>
       <class>
       org.jboss.cache.loader.ClusteredCacheLoader
       </class>
       <properties>timeout=1000</properties>
       <async>true</async>
       <fetchPersistentState>false</fetchPersistentState>
       <ignoreModifications>false</ignoreModifications>
      
       </cacheloader>
      
       </config>
       </attribute>
      
      
       </mbean>
      
      
       <!-- Uncomment to get a graphical view of the TreeCache MBean above -->
       <!-- <mbean code="org.jboss.cache.TreeCacheView" name="jboss.cache:service=TreeCacheView">-->
       <!-- <depends>jboss.cache:service=TreeCache</depends>-->
       <!-- <attribute name="CacheService">jboss.cache:service=TreeCache</attribute>-->
       <!-- </mbean>-->
      
      
      </server>
      
      



      I hope you can help me.
      Thanks in advance.



        • 1. Re: FetchInMemoryState && InitialStateRetrievalTimeout
          Manik Surtani Master

          What are you trying to achieve here? If you want state to be transferred from neighbouring caches in the cluster, you have 2 options - 1) enable FetchInMemoryState with a suitable InitialStateRetrievalTimeout, or 2) use the ClusteredCacheLoader which will load these elements lazily.

          With option 2, since elements are fetched lazily, there is no guarantee that all cache instances will have all the state at the same time. Also, the fact that you have an eviction policy with no persistent cache loader means that you could be losing cached state.
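
          A minimal sketch of option 1, assuming the same TreeCache MBean as in the original post. The timeout value below is purely illustrative and not a recommendation; it needs to be sized to how long the state transfer actually takes for your data volume.

          ```xml
          <!-- Option 1: fetch the in-memory state eagerly when this node joins the cluster -->
          <attribute name="FetchInMemoryState">true</attribute>

          <!-- Illustrative value only (milliseconds); must be long enough for the
               whole state to be transferred from an existing member -->
          <attribute name="InitialStateRetrievalTimeout">60000</attribute>
          ```

          With this option the joining node blocks at startup until the state arrives or the timeout expires, so startup time grows with the amount of cached data.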

          • 2. Re: FetchInMemoryState && InitialStateRetrievalTimeout
            Jorge Morales Master

            We need to ensure that the data that has not been evicted is accessible from both nodes of the cluster.

            We have a large amount of data in the cache, so startup time is very high if we do an initial state transfer. We wanted to test whether disabling the initial state transfer, and letting the cache get state from the other node as time passes, would affect performance, for better or for worse. We expected startup time to be much better, but instead the cache gets stuck while loading: it loads a lot of data from the other cache, but not all of it.

            We don't know if this is the correct configuration for what we are trying to achieve. Until now it was fine, but we faced an unexpected startup problem in production when we had a lot of data, which we had to evict in order to start up again, until we changed our app to handle the initial state transfer correctly. That was safer than changing something we didn't fully understand at the moment.

            Thanks, Manik

            • 3. Re: FetchInMemoryState && InitialStateRetrievalTimeout
              Manik Surtani Master

              Ah, I see your problem - don't preload the root node ("/") in your CacheLoader. Let all the data load lazily.
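
              A minimal sketch of that change, based on the CacheLoaderConfiguration from the original post with only the preload element removed. It assumes that preload can simply be omitted, as in the ClusteredCacheLoader example in the JBoss Cache 1.4 documentation; every other element is carried over unchanged from the posted config.

              ```xml
              <attribute name="FetchInMemoryState">false</attribute>

              <attribute name="CacheLoaderConfiguration">
                <config>
                  <passivation>false</passivation>
                  <!-- no <preload>/</preload> here: nothing is pulled in at startup,
                       nodes are fetched from the other cluster members on first access -->
                  <shared>true</shared>

                  <cacheloader>
                    <class>org.jboss.cache.loader.ClusteredCacheLoader</class>
                    <!-- how long (ms) to wait for other members to answer a lookup -->
                    <properties>timeout=1000</properties>
                    <async>true</async>
                    <fetchPersistentState>false</fetchPersistentState>
                    <ignoreModifications>false</ignoreModifications>
                  </cacheloader>
                </config>
              </attribute>
              ```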

              • 4. Re: FetchInMemoryState && InitialStateRetrievalTimeout
                Jorge Morales Master

                 

                "manik.surtani@jboss.com" wrote:
                Ah, I see your problem - don't preload the root node ("/") in your CacheLoader. Let all the data load lazily.

                From what I see in the docs:
                preload allows us to define a list of nodes, or even entire subtrees, that are visited by the cache on startup, in order to preload the data associated with those nodes. The default ("/") loads the entire data available in the backend store into the cache, which is probably not a good idea given that the data in the backend store might be large.


                and

                fetchPersistentState determines whether or not to fetch the persistent state of a cache when joining a cluster. Only one configured cache loader may set this property to true.


                So it seems to me that these two parameters should be mutually exclusive. I mean, if I preload "/" then setting fetchPersistentState makes no sense. And if I set fetchPersistentState, then preload makes no sense.

                Now I don't know what the correct way to go is.

                • 5. Re: FetchInMemoryState && InitialStateRetrievalTimeout
                  Manik Surtani Master

                  Preload specifies which Fqns to preload from a cache loader, regardless of whether you are referring to a ClusteredCacheLoader or a local persistence engine such as a JDBCCacheLoader. Also, preload allows you to be more fine-grained with what is preloaded, e.g.,

                  <preload>/catalog, /users, /countryCodes</preload>
                  


                  fetchPersistentState, on the other hand, is a flag that is only used with FetchInMemoryState. If the latter is true and you are retrieving all the state from a neighbouring cache at startup, state in any cache loader marked with fetchPersistentState set to true will be retrieved as well.

                  So, it's not mutually exclusive, and in fact pertains to different functions altogether. Hope this makes sense...
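
                  To illustrate how the two settings can sit side by side, here is a rough sketch that assumes a local JDBCCacheLoader rather than the ClusteredCacheLoader from the original post. The datasource JNDI name is a placeholder, and the preload paths are just the example Fqns from above.

                  ```xml
                  <!-- State transfer on join: in-memory state is fetched from a neighbour -->
                  <attribute name="FetchInMemoryState">true</attribute>

                  <attribute name="CacheLoaderConfiguration">
                    <config>
                      <passivation>false</passivation>
                      <!-- preload: which Fqns this node loads from its own cache loader at startup -->
                      <preload>/catalog, /users, /countryCodes</preload>
                      <shared>false</shared>

                      <cacheloader>
                        <class>org.jboss.cache.loader.JDBCCacheLoader</class>
                        <!-- placeholder datasource name, adjust to your environment -->
                        <properties>cache.jdbc.datasource=java:/DefaultDS</properties>
                        <async>false</async>
                        <!-- fetchPersistentState: also transfer this loader's persistent
                             state as part of the state transfer on join -->
                        <fetchPersistentState>true</fetchPersistentState>
                        <ignoreModifications>false</ignoreModifications>
                      </cacheloader>
                    </config>
                  </attribute>
                  ```

                  Here preload controls what is read from the loader into memory at startup, while fetchPersistentState only decides whether the loader's contents take part in the state transfer from a neighbouring cache.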

                  • 6. Re: FetchInMemoryState && InitialStateRetrievalTimeout
                    Jorge Morales Master

                    Thanks,

                    It seems to make sense. I'll do some tests to fully understand it.