0 Replies Latest reply on Aug 6, 2009 5:21 AM by Tim Fox

    Replication

    Tim Fox Master

      Currently JBM 2.0 uses server replication between a live and a backup server to maintain the backup server in a (quasi) identical state so that on failover of a client from the live to the backup server, the clients session(s) can be found in the exact same state they left off on on the live server and can be reattached so the client can continue it's operation 100% transparently.

      The current implementation in TRUNK uses a single thread for replication which makes it easier to guarantee that any state changes on the backup are applied in the same order as live, but has a down side in that forcing everything to be single threaded destroys concurrency on the server, effectively pushing everything to a single core.

      Recently I have been working on multi-threaded replication. This allows state changes to be applied on backup by many different threads so we solve the concurrency problem. However, we still have to ensure that state changes are applied globally in the same order on backup as live. This is tricky with multiple threads. The technique used is to note the acquisition of mutexes around shared data on the live node and when replicating we replicate this list of acquires too. On the backup node we create a special mutex which forces locks to be obtained in the same order as the list.

      This is a complex problem to solve/implement. The current status is it's "more or less working" but not ready yet, and would probably take a significant amount of time to complete/debug fully etc.

      The replication code significantly complicates the server code, and all replication comes at a cost of latency - since you need to make sure each packet is replicated and received on the backup before returning results to the user.

      Let's take a look at what other messaging systems do:

      1) Weblogic JMS - they don't use server replication
      2) Websphere MQ - they don't use server replication
      3) Tibco EMS - they don't use server replication
      4) ActiveMQ - has slow synchronous single threaded replication
      5) SonicMQ - *does* have full server replication.

      Really, only one of our significant competitors (SonicMQ) actually does server replication.

      Most of them do failover via a shared store on a shared filesystem, any session state is lost.

      Since most users use one of the above systems which typically don't have server replication, it seems to me it can't be a critically important feature, that's worth the cost (latency, performance). A non replicating server is likely to be faster than a replicating server.

      Therefore, what I am proposing is we remove full server replication from the JBM 2.0 server, since it's not worth the cost in terms of
      a) Performance overhead
      b) Maintainability difficulties
      c) Hard work in implementing and debugging it.

      Compared to the small benefit of having 100% transparent failover.

      If we remove full server replication, when a client detects server failure it can stil automatically fail over to the backup server and automatically reconnect, the only difference will be the session state won't be there, so in a non transacted session, any messages or acks sent might not have actually reached the server which could result in sent messages being lost or duplicates delivered.

      For a transacted session, that has already sent messages or acks before failover occurs, we just need to flag the session as rollback only, and on commit, the commit will fail with a TransactionRolledBackException. The user would need to catch this and restart the transaction. In such a way we can maintain the once and only once delivery guarantee and never lose messages or get duplicates with transacted sessions.

      AIUI is pretty much how the majority of the other messaging systems handle failover.

      With no full server replication the user can choose two modes of HA:

      1) Failover via a shared store residing on a shared file system. When the live fails the backup loads the journal, and clients can connect to it.

      2) Replicated data store. We can replicate the data store from the live to the backup node so there's no need for a shared file system. Replicating the data store is a lot easier than replicating the entire server

      I'll park the MT replication code in a branch in case we want to revisit it in the future.

      Thoughts?