5 Replies Latest reply on Jan 25, 2005 5:04 PM by belaban

    Cache Replication Timeout in a Cluster

    joshid

      Hello Everyone,

      We are a JBoss shop currently running JBoss 3.2.6, and we use the JBossCache version that ships with 3.2.6 to store user session information (we are not using Tomcat, so we have our own session-tracking solution built on a specially configured JBossCache). We are in production for a large .com site, and overall things have been running well. However, one issue is preventing us from running JBossCache successfully in a cluster:

      1. For node restarts, we have set FetchStateOnStartup to true and InitialStateRetrievalTimeout to 120000 milliseconds. Keeping in mind that this is a REPL_SYNC cache, the problem is that on restart, once the timeout is hit, the cache appears to stop replicating, so the restarted node ends up with an incomplete copy of the cache. Does anyone know how to force the cache to finish replication on startup, so that we are guaranteed a full duplicate of the cache on every node in our cluster?

      Thanks in advance for your time and consideration,

      Dan

        • 1. Re: Cache Replication Timeout in a Cluster
          belaban

          I just fixed a bug in CVS head that left the cache locked if state transfer failed, for whatever reason.

          Try the latest from CVS head, or increase that state transfer timeout to work around the issue for now.

          • 2. Re: Cache Replication Timeout in a Cluster
            joshid

            Excellent! That will definitely help. Two followup questions:

            1. Are there any issues with using the HEAD version of JBossCache in JBoss 3.2.6? We are currently using 3.2.6 in production, and I've tried putting JBossCache 1.2 into 3.2.6. Unfortunately, after working through the new JNDI settings, I ran into some ClassCastExceptions.

            2. How would one lock the cache for state transfer to ensure that the cache is entirely duplicated when a new node is added to the cluster? That way there is no timeout that can short-circuit the state transfer process. Or have I missed something and with the fix you just mentioned the cache already does this?

            Thank you again and I appreciate your help.

            • 3. Re: Cache Replication Timeout in a Cluster
              belaban

              1. The latest JBossCache is not in the 3.2 branch, but in CVS head. You are probably missing jboss-remoting.jar.

              • 4. Re: Cache Replication Timeout in a Cluster
                joshid

                Hi Bela,

                Thank you very much for the advice. I had already tried putting jboss-remoting.jar in the \lib directory, and after following that line of thinking I got the following exception:

                2005-01-25 12:36:11,617 ERROR [org.jboss.cache.TreeCache:bindToJndi] failed binding to JNDI as {locatorURI=rmi://localhost:10444, name=MDVTreeCache}, exception=org/apache/commons/httpclient/HttpMethod

                After figuring out that the above class does not exist in 3.2.6, I copied commons-httpclient.jar from jboss-4.0.1 and the error went away. Now JBossCache registers the MBean in JNDI, but when I look it up I get a ClassCastException. The line of code where I look up the JNDI reference looks like:

                cache = (TreeCache) context.lookup("XXXX");

                I'm continuing to research the problem. Any ideas/insight would be most appreciated.

                • 5. Re: Cache Replication Timeout in a Cluster
                  belaban

                  1. You should copy the libs shipped with JBossCache. Using an arbitrary HTTP lib *might* work, but there's no guarantee.

                  2. You need to cast to TreeCacheMBean, not TreeCache.
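
                  To illustrate point 2, here is a minimal, self-contained sketch of why a cast to the concrete class can fail while a cast to the interface succeeds. It assumes that the object returned by the lookup is a proxy that implements only the management interface; the thread does not confirm that this is what JBossCache binds, so treat it as one plausible explanation. CacheMBean, Cache and lookup() are hypothetical stand-ins, not the real TreeCacheMBean, TreeCache or JNDI API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyCastDemo {

    /** Hypothetical stand-in for a management interface such as TreeCacheMBean. */
    public interface CacheMBean {
        String getClusterName();
    }

    /** Hypothetical stand-in for a concrete implementation such as TreeCache. */
    public static class Cache implements CacheMBean {
        public String getClusterName() {
            return "demo-cluster";
        }
    }

    /**
     * Simulates a lookup that returns a dynamic proxy instead of the concrete
     * object (an assumption about what the real lookup might hand back).
     * The proxy implements CacheMBean only; it is not a subclass of Cache.
     */
    public static Object lookup() {
        final Cache target = new Cache();
        InvocationHandler handler = (proxy, method, args) -> method.invoke(target, args);
        return Proxy.newProxyInstance(
                CacheMBean.class.getClassLoader(),
                new Class<?>[] { CacheMBean.class },
                handler);
    }

    /** Casting to the interface works, and calls are forwarded to the target. */
    public static boolean castToInterfaceWorks() {
        CacheMBean cache = (CacheMBean) lookup();
        return "demo-cluster".equals(cache.getClusterName());
    }

    /** Casting to the concrete class throws ClassCastException. */
    public static boolean castToClassFails() {
        try {
            Cache cache = (Cache) lookup();
            return cache == null; // not reached: the cast above throws
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("cast to interface works: " + castToInterfaceWorks());
        System.out.println("cast to class fails: " + castToClassFails());
    }
}
```

                  Applied to the lookup quoted in reply 4, the same idea would mean declaring the variable as TreeCacheMBean and casting the result of context.lookup(...) to that interface.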