3 Replies Latest reply on Oct 10, 2005 9:31 AM by belaban

    Problem with state replication of new cache

    joc

  Hi, I'm having a problem with replication of an existing cache.

  If I start a cache and populate it with some values, then I start a long-running transaction on that cache. While the transaction is running, another cache is started and attempts to get the values from the first cache - but this seems to fail if the first cache is still inside the transaction (?)

  Once the transaction on the first cache has completed and been committed, the new value is readable from the second cache; but if it was rolled back, the original value is not readable from the second cache (I just get a NULL).

  I'm sure there's something I've missed here, but many of the methods in the javadoc have no description of what they actually do, so it's all rather confusing.

  So, 2 main questions here really:
  1) Is what I described above expected behaviour? I know a long transaction is bad, but it might just happen - and I don't want a new server to be started up and get an empty cache.
  2) Is there a way to check whether the replication worked OK? I didn't seem to get any exception to say something went wrong; it just carried on as if it worked fine.

  I attach a (very rough) example showing the problem below.

      ********************************************************
  package jbosscachetest;

  import javax.transaction.*;

  import org.jboss.cache.*;
  import org.jboss.cache.transaction.*;

  public class Example {

      // volatile so flag changes made by one thread are visible to the others
      volatile boolean sleeping = true;
      volatile boolean finished = false;

      public static void main(String[] args) {
          new Example();
      }

      public Example() {

          Thread coordinator = new Thread(new Runnable() {
              // first cache: created up front, before the second cache joins
              TreeCache tree = createCache();

              public void run() {
                  try {
                      // Set an initial value
                      tree.put("/c", "lockMe", "not yet changed");
                      System.out.println("Tree 1 updated");

                      // Let the main thread start up the second cache
                      sleeping = false;

                      Transaction tx2 = startTransaction();
                      System.out.println("Thread 1 starting LONG transaction");

                      // Change the value inside the long-running transaction
                      tree.put("/c", "lockMe", "this-changed-and-committed");
                      _sleep(15000);

                      tx2.commit();
                      System.out.println("Finished transaction");
                  }
                  catch (Exception ex) {
                      System.out.println("EXCEPTION - " + ex);
                  }
                  try {
                      System.out.println("Thread 1 out of transaction - value = " +
                                         tree.get("/c", "lockMe"));
                  }
                  catch (CacheException ex1) {
                      System.out.println("EXCEPTION - " + ex1);
                  }
                  _sleep(5000);
                  tree.stopService();
                  tree.destroyService();
                  // was 'finished = false', which left the reader thread looping forever
                  finished = true;
              }
          });

          coordinator.start();

          Thread reader = new Thread(new Runnable() {
              TreeCache tree2 = null;

              public void run() {
                  System.out.println("Thread two woken, creating cache");
                  tree2 = createCache();
                  System.out.println("Created second cache - would hope not to see NULL here....");
                  while (!finished) {
                      try {
                          System.out.println("FETCHING - " + tree2.get("/c", "lockMe"));
                      }
                      catch (CacheException ex) {
                          System.out.println("EXCEPTION - " + ex);
                      }
                      _sleep(1000);
                  }
                  tree2.stopService();
                  tree2.destroyService();
                  System.out.println("Test finished");
              }
          });

          // Wait until the first cache holds its initial value, then join with the second
          while (sleeping) {
              Thread.yield();
          }
          reader.start();
      }

      public TreeCache createCache() {
          TreeCache tree = null;
          try {
              tree = new TreeCache();
              PropertyConfigurator config = new PropertyConfigurator();
              config.configure(tree, "META-INF/replSync-service.xml");
              tree.setClusterName("demo-cluster");

              tree.setFetchStateOnStartup(true);
              tree.setInactiveOnStartup(true);
              tree.setSyncCommitPhase(true);
              tree.setSyncReplTimeout(0);

              tree.createService();
              tree.startService();
          }
          catch (Exception ex) {
              System.out.println("EXCEPTION - " + ex);
          }
          return tree;
      }

      void _sleep(long time) {
          try {
              Thread.sleep(time);
          }
          catch (InterruptedException e) {
          }
      }

      private Transaction startTransaction() throws SystemException,
              NotSupportedException {
          DummyTransactionManager mgr = DummyTransactionManager.getInstance();
          mgr.begin();
          return mgr.getTransaction();
      }
  }

      ******************************************
      Here's hoping there's a simple solution.... :)

      << JOC >>

        • 1. Re: Problem with state replication of new cache
          belaban

          This is a well-known 'feature'. A state transfer will time out unless it can acquire all locks on the oldest member (=coordinator). If another TX holds locks on the coordinator, then the state transfer to the new member will fail.

          In 1.3 we will change this behavior by simply breaking (= force-releasing) locks held by any other TX and rolling back those TXs.

          Current workarounds include:
          - Don't use long-running TXs
          - Use optimistic locking (this is implemented, but not officially supported until 1.3)
          - Use a different isolation level, e.g. READ_UNCOMMITTED
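
          For the isolation-level workaround, the setting goes in the cache's service XML (e.g. the replSync-service.xml used in the example above). A minimal sketch - the attribute name follows the standard JBoss Cache 1.x MBean configuration; check the values against your version's docs:

          ```xml
          <!-- Inside the TreeCache MBean definition in replSync-service.xml. -->
          <!-- READ_UNCOMMITTED lets readers (including state transfer) see data
               without being blocked by write locks held by a long-running TX. -->
          <attribute name="IsolationLevel">READ_UNCOMMITTED</attribute>
          ```

          The trade-off is that readers may observe uncommitted (possibly later rolled-back) values, so this only suits applications that can tolerate dirty reads.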

          • 2. Re: Problem with state replication of new cache
            joc

            Thanks for the incredibly fast reply,

            It has however raised a couple of other questions (as answers usually do :) )

            When you say "A state transfer will time out unless it can acquire all locks on the oldest member", I assume this means that, in effect, the whole cluster is locked for the duration, since a transaction would need to update every member?

            Are these locks only for writing, and you can still read?

            Assuming each node is locked individually (I assume a lot :) ), are all the locks kept for the duration of the transfer to the new member, or released as each node is transferred? (We cache a lot of large objects, so the transfer could take some time - I haven't got as far as investigating transfer speed yet, but I've read some posts saying it can be quite slow.)

            Would I be right in thinking that, because of the locking, you should therefore only add one cache at a time? Or is there something to handle this case? (So if we restarted all the servers in a cluster at the same time and the transfer took, say, 1 minute, possibly all but the first one in might fail?)

            Will the transfer always be an all-or-nothing thing? (So is it safe to assume that if you can't read a node you *know* should be there, the transfer failed?) Or is there a better/proper way of telling that it failed to get the state correctly?


            Sorry for the rapid fire questions, and thanks again for your time

            << JOC >>

            • 3. Re: Problem with state replication of new cache
              belaban


              "joc" wrote:
              Thanks for the incredibly fast reply,

              It has however raised a couple of other questions (as answers usually do :) )

              When you say "A state transfer will time out unless it can acquire all locks on the oldest member", I assume this means that, in effect, the whole cluster is locked for the duration, since a transaction would need to update every member?


              No, it only locks the oldest member in the cluster (the coordinator), and only until we have copied its state. With streaming state transfer, where we will transfer the state in user-defined chunks of bytes, the tree will remain locked until the transfer is done.


              Are these locks only for writing, and you can still read?


              We use read-locks on all nodes (Node.acquireAll()). In the future, we may simply acquire a WL (write lock) on the root node; then we are guaranteed that no RL (read lock) can be held anywhere underneath us in the tree.


              Assuming each node is locked individually (I assume a lot :) ), are all the locks kept for the duration of the transfer to the new member, or released as each node is transferred?

              Until the state has been copied, or (with streaming state transfer) until the state transfer is done.


              Would I be right in thinking that, because of the locking, you should therefore only add one cache at a time? Or is there something to handle this case? (So if we restarted all the servers in a cluster at the same time and the transfer took, say, 1 minute, possibly all but the first one in might fail?)


              If you restarted all nodes in a cluster at the same time, you'd potentially lose all in-memory state, unless the state is backed up by a shared CacheLoader, e.g. JDBCCacheLoader.
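
              A shared JDBCCacheLoader along those lines would be configured in the service XML roughly as below. This is only a sketch: the driver class, URL, credentials, and table name are placeholders, and the cache.jdbc.* property names should be checked against the JDBCCacheLoader documentation for your JBoss Cache version:

              ```xml
              <!-- Sketch: back the cache with a shared JDBC store so a full
                   cluster restart does not lose all state.
                   Driver/URL/credentials/table name are placeholders. -->
              <attribute name="CacheLoaderClass">org.jboss.cache.loader.JDBCCacheLoader</attribute>
              <attribute name="CacheLoaderShared">true</attribute>
              <attribute name="CacheLoaderConfig">
                  cache.jdbc.driver=com.example.jdbc.Driver
                  cache.jdbc.url=jdbc:example://dbhost/cachedb
                  cache.jdbc.user=cacheuser
                  cache.jdbc.password=secret
                  cache.jdbc.table.name=jbosscache
              </attribute>
              ```

              With CacheLoaderShared set to true, all nodes read from and write to the same database, so the first node to restart repopulates itself from the store rather than starting empty.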


              Will the transfer always be an all-or-nothing thing? (So is it safe to assume that if you can't read a node you *know* should be there, the transfer failed?) Or is there a better/proper way of telling that it failed to get the state correctly?


              Yes, all-or-nothing: if I cannot acquire a lock, the entire state transfer will fail.