2 Replies Latest reply on Jan 26, 2009 10:44 AM by abcdefg1234

    Replicated Cache - Failover & Failback

    abcdefg1234

      We have a 2-node replicated cache, i.e.:
      node1 --> application instance 1 --> cache instance 1
      node2 --> application instance 2 --> cache instance 2

      With the configuration below, everything seems to work fine, i.e. when node 1 is shut down and brought back up, any changes made to cache instance 2 are replicated.

      Problem: What if the application server on node 1 is still running, but only the cache instance on node 1 is shut down or crashes? I would expect the application on node 1 to be able to access the cache on node 2, but that doesn't happen. Is it possible to have a configuration in which all the instances in the clustered cache are searched?

      I even tried the ClusteredCacheLoader, as shown in the configuration below, but that does not make a difference. Isn't that what ClusteredCacheLoader is used for?

      --------------------------------------------------------------------------------
      <?xml version="1.0" encoding="UTF-8"?>
      
      <!-- ===================================================================== -->
      <!-- -->
      <!-- Sample TreeCache Service Configuration -->
      <!-- -->
      <!-- ===================================================================== -->
      
      <server>
      
       <classpath codebase="./lib" archives="jbosscache-core.jar, jboss-common-core.jar, jgroups.jar"/>
       <!-- ==================================================================== -->
       <!-- Defines TreeCache configuration -->
       <!-- ==================================================================== -->
      
       <mbean code="org.jboss.cache.TreeCache"
       name="jboss.cache:service=TreeCache">
      
       <depends>jboss:service=Naming</depends>
       <depends>jboss:service=TransactionManager</depends>
      
       <!--
       Configure the TransactionManager
       -->
       <attribute name="TransactionManagerLookupClass">org.jboss.cache.transaction.DummyTransactionManagerLookup
       </attribute>
      
       <!--
       Node locking level : SERIALIZABLE
       REPEATABLE_READ (default)
       READ_COMMITTED
       READ_UNCOMMITTED
       NONE
       -->
       <attribute name="IsolationLevel">REPEATABLE_READ</attribute>
      
       <!--
       Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC
       -->
       <attribute name="CacheMode">REPL_SYNC</attribute>
      
       <!--
       Just used for async repl: use a replication queue
       -->
       <attribute name="UseReplQueue">false</attribute>
      
       <!--
       Replication interval for replication queue (in ms)
       -->
       <attribute name="ReplQueueInterval">0</attribute>
      
       <!--
       Max number of elements which trigger replication
       -->
       <attribute name="ReplQueueMaxElements">0</attribute>
      
       <!-- Name of cluster. Needs to be the same for all TreeCache nodes in a
       cluster in order to find each other.
       -->
       <attribute name="ClusterName">JBossCache-Cluster</attribute>
      
       <attribute name="ClusterConfig">
       <config>
       <!-- UDP: if you have a multihomed machine,
       set the bind_addr attribute to the appropriate NIC IP address, e.g bind_addr="192.168.0.2".
       The mcast_addr and mcast_port should be different from the cluster address and port to avoid contention.
       -->
       <!-- UDP: On Windows machines, because of the media sense feature
       being broken with multicast (even after disabling media sense)
       set the loopback attribute to true -->
       <UDP mcast_addr="224.10.10.10" mcast_port="45566"
       ip_ttl="64" ip_mcast="true"
       mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
       ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
       loopback="false" bind_addr="127.0.0.1" bind_port="5544" />
       <PING timeout="2000" num_initial_members="3"
       up_thread="false" down_thread="false"/>
       <MERGE2 min_interval="10000" max_interval="20000"/>
       <!-- <FD shun="true" up_thread="true" down_thread="true" />-->
       <FD_SOCK/>
       <VERIFY_SUSPECT timeout="1500"
       up_thread="false" down_thread="false"/>
       <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
       max_xmit_size="8192" up_thread="false" down_thread="false"/>
       <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
       down_thread="false"/>
       <pbcast.STABLE desired_avg_gossip="20000"
       up_thread="false" down_thread="false"/>
       <FRAG frag_size="8192"
       down_thread="false" up_thread="false"/>
       <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
       shun="true" print_local_addr="true"/>
       <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
       </config>
       </attribute>
      
       <!--
       Whether or not to fetch state on joining a cluster
       -->
       <attribute name="FetchStateOnStartup">true</attribute>
      
       <!--
       The max amount of time (in milliseconds) we wait until the
       initial state (ie. the contents of the cache) are retrieved from
       existing members in a clustered environment
       -->
       <attribute name="InitialStateRetrievalTimeout">5000</attribute>
      
       <!--
       The max amount of time (in milliseconds) we wait until the
       state (ie. the contents of the cache) are retrieved from
       existing members in a clustered environment
       -->
       <attribute name="StateRetrievalTimeout">20000</attribute>
      
       <!--
       Number of milliseconds to wait until all responses for a
       synchronous call have been received.
       -->
       <attribute name="SyncReplTimeout">20000</attribute>
      
       <!-- Cache loader configuration: ClusteredCacheLoader queries other cluster members for data -->
       <attribute name="CacheLoaderConfig" replace="false">
       <config>
       <cacheloader>
       <class>org.jboss.cache.loader.ClusteredCacheLoader</class>
       <properties>
       timeout=15000
       </properties>
       </cacheloader>
       </config>
       </attribute>
       </mbean>
      </server>
      ------------------------------------------------------------------------------------


        • 1. Re: Replicated Cache - Failover & Failback
          manik

          Not quite sure I understand your problem. If a cache instance is shut down, then clients on the same VM as the stopped cache have no interface to the clustered cache as a whole.

          The ClusteredCacheLoader (CCL) is for cases where you want to disable state transfer (FetchStateOnStartup), perhaps because you have too much state and it takes too long to start up. The CCL is just a way for state to be fetched lazily, on demand. But you still need both cache instances up and running.
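
          A minimal sketch of that lazy setup, reusing the attribute names from the configuration posted above. The values are illustrative assumptions, and the exact attribute names can differ between JBoss Cache versions, so check them against the release you run:

          ```xml
          <!-- Assumption: skip the full state transfer when this instance joins the cluster -->
          <attribute name="FetchStateOnStartup">false</attribute>

          <!-- Fetch nodes lazily from other running cache instances on a local miss -->
          <attribute name="CacheLoaderConfig" replace="false">
            <config>
              <cacheloader>
                <class>org.jboss.cache.loader.ClusteredCacheLoader</class>
                <properties>
                  timeout=15000
                </properties>
              </cacheloader>
            </config>
          </attribute>
          ```

          This only changes when state is fetched. The local cache instance still has to be running for the application on that node to reach the cluster.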

          • 2. Re: Replicated Cache - Failover & Failback
            abcdefg1234

            Thanks for the quick reply. I had assumed that in a CCL configuration, if one of the cache instances in a cluster is shut down, requests are automatically rerouted to the other cache in the cluster until the stopped instance is brought back up.