14 Replies Latest reply on Oct 7, 2009 12:14 PM by clebert.suconic

    StorageManager replicator

    clebert.suconic


      I was looking at how we could make the StorageManager replicated.

      I - The replicator:

      The idea I currently have is to encapsulate the sending and receiving of replication in a single class, which would be called the replicator.

      I could have basic methods on that class at the journal level, such as:

      replicateUpdate, replicateDelete, replicateCommit, replicateTXupdate, replicateTXappend.. etc.

      For those methods, I could have a proxy implementing the Journal interface that talks to the replicator directly. (I won't need to make any changes to the StorageManager for the journal operations.)
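A minimal sketch of the proxy idea, assuming a cut-down journal interface. `SimpleJournal`, `Replicator`, `ReplicatingJournalProxy` and `RecordingReplicator` are hypothetical names made up for illustration, not classes from the code base; only the `replicateXxx` method names come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down journal interface; the real Journal has many more methods.
interface SimpleJournal {
    void appendUpdateRecord(long id, byte[] record);
    void appendDeleteRecord(long id);
    void appendCommitRecord(long txId);
}

// Hypothetical replicator: the one class that knows how to talk to the backup.
interface Replicator {
    void replicateUpdate(long id, byte[] record);
    void replicateDelete(long id);
    void replicateCommit(long txId);
}

// Looks like a journal to its caller, but forwards every call to the replicator.
final class ReplicatingJournalProxy implements SimpleJournal {
    private final Replicator replicator;

    ReplicatingJournalProxy(Replicator replicator) {
        this.replicator = replicator;
    }

    @Override public void appendUpdateRecord(long id, byte[] record) { replicator.replicateUpdate(id, record); }
    @Override public void appendDeleteRecord(long id) { replicator.replicateDelete(id); }
    @Override public void appendCommitRecord(long txId) { replicator.replicateCommit(txId); }
}

// Stand-in that records what would have been sent instead of using the network.
final class RecordingReplicator implements Replicator {
    final List<String> sent = new ArrayList<>();

    @Override public void replicateUpdate(long id, byte[] record) { sent.add("update:" + id); }
    @Override public void replicateDelete(long id) { sent.add("delete:" + id); }
    @Override public void replicateCommit(long txId) { sent.add("commit:" + txId); }
}
```

The caller only sees `SimpleJournal`, so swapping the proxy in or out would not require changes on the calling side.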




      Paging and LargeMessage

      I will also add operations for paging and large messages, and delegate them through the StorageManager. In this case, the StorageManager will talk to the Replicator and make sure the pages and large messages are replicated.





      Wait for transmission:

      It would be simple to work with latches, waiting for full transmission only at transaction operations such as commit.

      Say... you are writing a transaction with 10 messages, and a commit.

      All the 10 writes to the journal are just replicated, but we don't wait for the transmission.

      Later at the commit, we wait for the full transmission through a latch.
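A rough sketch of the latch idea as described here, under the assumption that the channel to the backup is ordered, so an acknowledged commit implies that the earlier writes arrived too. `LatchedTxReplication` is a hypothetical class and the single-threaded executor is only a stand-in for the network; note that the committing thread blocks, which is the point debated in the replies.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class LatchedTxReplication {
    // A single thread stands in for an ordered channel to the backup (assumption).
    private final ExecutorService channel = Executors.newSingleThreadExecutor();
    // Number of records the simulated backup has received so far.
    final AtomicInteger receivedOnBackup = new AtomicInteger();

    // Plain writes are handed to the channel and the caller returns immediately.
    void replicateWrite(byte[] record) {
        channel.execute(() -> receivedOnBackup.incrementAndGet());
    }

    // The commit waits on a latch until the backup has processed it,
    // or until the timeout expires. Returns true if it was acknowledged.
    boolean replicateCommit(long timeoutMillis) {
        CountDownLatch committed = new CountDownLatch(1);
        channel.execute(() -> {
            receivedOnBackup.incrementAndGet();
            committed.countDown();
        });
        try {
            return committed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void close() {
        channel.shutdown();
    }
}
```

With 10 calls to `replicateWrite` followed by one `replicateCommit`, only the commit call waits.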




      Syncing:

      Another aspect I'm thinking about: we don't need to wait for any syncs while replicating. We just need to ensure the data is saved on the backup side. We could just sync when the backup is activated.




      Taking a backup from a live node:

      This will probably be another post, but I already have some thoughts about how to do this.
      For the journal itself, it would be possible to disable reclaiming while the backup is being taken, batch all the commands issued in the meantime, then flush the commands and make the backup active. Of course there will be other considerations for paging and large messages, but that discussion will probably happen after the initial implementation is done.

        • 1. Re: StorageManager replicator
          timfox

          clebert, all you need to do is:
          clebert, when an operation arrives on the storagemanager
          clebert, you add that operation to a queue
          clebert, and you replicate the operation
          clebert, then the next one arrives, you do the same, etc
          jbossfox: and how you replicate the operation?
          Isn't that what I wrote?
          clebert, some time later
          clebert, you get a response back
          clebert, and you pull the op from the queue and execute it locally
          jbossfox: that's how it is currently done...
          clebert, this is how our replication worked before
          clebert, yes, it's the same
          clebert, just copy and paste that
          jbossfox: I would still be doing that to update the latch...
          clebert, job done
          clebert, why do you need a latch
          clebert, ?
          clebert, you don't need any latch
          jbossfox: I was just trying to avoid a wait for instance on:
          clebert, you don't need to wait
          sendMessage(TX, message)
          sendMessage(tx, message)
          tx.commit();
          I would only wait for the full cycle on the commit
          clebert, no
          clebert, have a look how the session etc replication works
          jbossfox: well.. ok.. it's the same for me then... the only thing is remove the latch from my post
          clebert, you need to do the same
          jbossfox: I know how it works
          clebert, you don't need any latches
          jbossfox: ok.. remove that then...
          clebert, just replicate the action
          clebert, this can be pipelined
          but I would still encapsulate through a Replicator...
          And have the journal talking to the replicator...
          clebert, no you can't do that
          for most cases on the Storagemanager... I only need to replicate 4 operations
          clebert, since this is not RPC!
          clebert, the proxy idea makes no sense
          clebert, you can't block waiting for the result
          clebert, we're not doing an RPC approach
          clebert, that would be very slow
          clebert, we are pipelining
          clebert, ==> much faster
          clebert, just need to replicate the action
          clebert, then go off and do the next thing
          clebert, *later* a response comes back
          clebert, and you pick the action off the queue
          clebert, and execute it locally
          jbossfox: yes.. but for instance.... since we are only replicating the SM action:
          say.. on route
          you do
          SM.persistMessage
          sm.persistReference
          to make sure the information is on the backup...
          I would need to block on persistMessage, waiting before i can continue
          clebert, no!
          to make sure the information is at the backup level
          clebert, no, no. no
          clebert, you're not following
          before we would pipeline a bigger operation.. that would encapsulate but the storage and the routing
          clebert, it works like this
          s/but/both
          clebert, actually this is the same as how replication used to work
          clebert, so just look at the old code to see how to do it
          clebert, let's take the example of a session commit
          clebert, in the old code
          clebert, (i have removed this code in my branch but the principle is sound)
          clebert, so.. with a transaction commit, we need to make sure it is committed on the backup before the user call to commit returns, right?
          right
          clebert, the way this works is as follows:
          transaction commit would be a good example...
          clebert, the commit arrives on the live node
          it's a single operation
          clebert, we add the commit action to an internal queue
          I would be more interested on routing, what takes two journal operations
          clebert, then we replicate it to the backup
          clebert, note *we do not block*
          clebert, the remoting thread then services another request
          clebert, the commit action then arrives on the backup
          clebert, it is executed on the backup
          clebert, the backup sends a response back to the live
          jbossfox: you don't block on the server, but the user will be blocked waiting the response
          clebert, the live node receives the response and removes the action from the top of the queue
          clebert, it then executes the action
          clebert, the action when complete, sends a null response packet to the client
          clebert, the client receives that packet and the client call to commit returns
          clebert, *the only blocking is happening on the client side* not the server side
          clebert, there is zero blocking on the server
          clebert, in other words we are pipelining on the server
          clebert, u with me?
          jbossfox: Yeah.. I know how that works...
          but look at routing for instance
          clebert, so you need to do the same thing
          clebert, proxy won't work, since it implies RPC which is blocking on the server
          clebert, and that will be realllllly slooow
          clebert, you'll have a network RTT per operation replicated :(
          jbossfox: commit was a simple use case.. I knew how I would implement that...
          I' m more concerned about routing... let me get the code here. .just 1 sec
          clebert, routing?
          sending a message
          just 1 sec
          jbossfox: for instance, handleSend
          clebert, sending a message is the same
          clebert, it's the same for all operations
          in a simple case... that is translated as at least two operations....
          clebert, handleSend - is on the session, not the storagemanager
          clebert, you're replicating the storagemanager
          jbossfox: yes.. I know...
          but ...
          how could I pipeline that?
          clebert, i don't understand what you mean
          clebert, you pipeline it the same way for all operations
          clebert, like i described
          clebert, it's the same as how the old replication used to work
          jbossfox: just 1 minute
          let me find the code
          jbossfox: simple operation....
          QueueImpl::route
          that is making two operations on the storageManager
          first: storeMessage
          and second storeReference
          clebert, what is the issue here?
          I can't continue routing.. until the data is replicated....
          pipelining this would require route to send a callback... and continue the rest of the code inside the callback
          clebert, ?
          and there are two operations here
          storeMessage
          and
          storeReference
          clebert, i don't see what the issue is
          also.. updateScheduledDeliveryTime
          clebert, you just replicate it
          clebert, like i described
          clebert, when you get the response back you execute it locally
          look at QueueImpl::route....
          lets simplify the code.. just as an example:
          clebert, i'm not interested in queue
          clebert, we are replicating storagemanager operations
          clebert, not queue operations
          jbossfox: QueueImpl is calling the storageManager
          clebert, it's not relevant what calls it
          how can I pipeline from the Storagemanager, when all the rest of the operation is inside queue
          clebert, i don't care who calls it
          clebert, i don't understand your question
          clebert, as operations arrive on storagemanager
          say.. you call storeMessage(message)... I send it to the "pipeline"...
          clebert, it's simple
          clebert, you just need to replicate them
          I don't have guarantees it is already replicated
          from QueueImpl
          clebert, you don't need any guarantee
          clebert, this is async right?
          clebert, you're thinking in RPC terms
          jbossfox: no.. I' m not....
          I" m just saying that is "easy" (relatively) to implement at handleSend
          clebert, no!
          clebert, it's very easy to implement
          clebert, if you do as i described
          on handleSend.. you can just send a callback to be executed after replicated
          clebert, forget handlesend
          clebert, it's not relevant
          jbossfox: can we think how sending a message would work?
          jbossfox: I mean.. I would need you to look at QueueImpl::route....
          for instance.. say the user needs guarantees of a send
          (sync on send.. non transactional)
          the user will be waiting the return from the server, until the message is persisted on disk, and replicated to the backup
          (I know you know that BTW)
          (just completing a thought)
          when send is happening at the server, you will have at least 2 StorageManager operations.... (maybe 3)
          you need all the 3 operations replicated before you can return
          before you can send the NullResponse back
          before (the current/old schema).. that was a single replication operation.. so you needed a single callback to be executed after the replication
          clebert, i don't follow what the problem si
          clebert, s/si/is
          to do a correct pipeline now.. you would need to break route into several callback operations for instance
          clebert, i agree with what you said, but what is the problem?
          (it would be a horrible code)
          do you understand about pipelining when we replicate StorageManager directly?
          about what I mean with the pipeline? (I meant)
          clebert, what do you mean, do I understand?
          say.. this is the current code on routing:
          storageManager.storeMessage(message)
          storageManager.storeReference(ref)
          storageManager.updateScheduledDeliveryTime(ref)
          to do a correct pipeline here, I would need to do:
          storageManager.storeMessage(message, new Callback() { public void run() { storageManager.storeReference(ref); } });
          clebert, no
          or else I would need to block on storeMessage....
          clebert, no
          clebert, you *do not* need to wait for the previous operation to be replicated before replicating the next one
          clebert, you just pipeline them
          clebert, when the last one comes back, you send the null response to the user
          clebert, ouch. if you had to wait on each one that would be a network RTT per operation!
          clebert, it would be very slow
          clebert, this is async
          clebert, are you with me?
          yes.. I don't want to wait RTT...
          I wouldn't do that
          clebert, once you understand this I think you will have a eureka moment
          clebert, your code above does exactly that!
          clebert, it waits for the response from the storeMessage before it sends the storeReference!
          clebert, that means a network RTT!
          that's the current code
          clebert, current code?
          clebert, huh?
          (01:11:48 PM) clebert: say.. this is the current code on routing:
          (01:11:48 PM) clebert: storageManager.storeMessage(message)
          (01:11:48 PM) clebert: storageManager.storeReference(ref)
          (01:11:48 PM) clebert: storageManager.updateScheduledDeliveryTime(ref)
          then I said:
          (01:12:25 PM) clebert: to do a correct pipeline here, I would need to do:
          (01:12:25 PM) clebert: storageManager.storeMessage(message, new Callback() { public void run() { storageManager.storeReference(ref); } });
          clebert, what do you mean "current code" ?
          clebert, we haven't implemented this yet
          if you look at QueueImpl::route...
          how it is interacting with the storageManager
          clebert, yes i know
          clebert, what queue does
          clebert, you're misunderstanding this clebert
          clebert, you do not have to wait for the response from storeMessage to return before replicating the storeReference!
          clebert, this is a key point
          clebert, in your example code you posted above
          (01:12:25 PM) clebert: storageManager.storeMessage(message, new Callback() { public void run() { storageManager.storeReference(ref); } });
          clebert, that code waits for the response before sending the next replication
          clebert, that is incorrect
          clebert, you can send the replications one after another
          clebert, without waiting for a response
          jbossfox: but I need to ensure everything is replicated before sending the response to the user
          clebert, then when the response for the last one for that route comes back
          clebert, yes! you only wait for the * last response * to come back
          clebert, you do not wait on all of them!
          jbossfox: ok.. that would be same thing on the Latch.. but it will be an Callable instead of a latch
          clebert, no!
          clebert, there is no blocking on the server side
          jbossfox: yes.. that's what I' m saying
          clebert, it's completely different from latch
          clebert, latches block
          clebert, you just said it was like latch. that is incorrect
          jbossfox: I was just thinking aloud in how I would implement this
          clebert, it is not like latch
          clebert, you implement it, like i described before ;)
          clebert, it's the same as how the replication code I removed worked
          clebert, it's no different
          clebert, i think you're with me now, right?
          jbossfox: I know how the current code works Tim....
          My problem is where to place the callback for the last operation.
          clebert, you probably just need to add a callback runnable as a parameter to each storagemanager operation
          clebert, and when the response comes back it gets called
          clebert, then you can put whatever you want in the runnable
          clebert, in many cases it will be null
          jbossfox: that's the part I don't like....
          like... when routing is being called.. I don't have any information about channels.. or anything else like that
          jbossfox: I'm thinking about adding a method...
          ensureReplicated(Callback)
          that could be called ServerSession level
          and the callback could be called from there
          jbossfox: but that's an implementation detail I think
          clebert, ok
          jbossfox: also.. I don't need to wait for any callback when transacted.. unless it's the commit operation
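The pipelining described in the chat above can be sketched roughly as follows. This is only an illustration under stated assumptions: `PipelinedReplicator` is a hypothetical class, the "wire" is a plain list, and responses from the backup are assumed to arrive in the same order the operations were sent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class PipelinedReplicator {
    // Local actions waiting for the backup's response, in send order.
    private final Queue<Runnable> pending = new ArrayDeque<>();
    // Stand-in for the network: operations "sent" to the backup (assumption).
    final List<String> sentToBackup = new ArrayList<>();

    // Called on the live node: queue the local action, send the operation,
    // and return immediately. Nothing here waits for the backup.
    synchronized void replicate(String operation, Runnable localAction) {
        pending.add(localAction);
        sentToBackup.add(operation);
    }

    // Called later, when a response from the backup arrives: take the oldest
    // pending action off the queue and execute it locally.
    void onBackupResponse() {
        Runnable action;
        synchronized (this) {
            action = pending.poll();
        }
        if (action != null) {
            action.run();
        }
    }
}
```

For a route that needs two storage operations, both can be sent back to back, and only the action attached to the last one needs to send the response to the client, so there is one wait on the client side instead of one network round trip per operation.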

          • 2. Re: StorageManager replicator
            clebert.suconic

            I need an Acceptor at the backup side that will initialize the channel where the live node will send the replication packets produced by the StorageManager.

            That Acceptor will need its own packet manager.

            To achieve that, I have so far done some minor refactoring on RemotingServiceImpl: I pass a HandlerFactory as a parameter, and I removed the reference from RemotingServiceImpl to the server.

            I could also start the Acceptor manually, but that would duplicate a lot of code already in the RemotingService. I'm just trying the easiest path, reusing what's already there.

            • 3. Re: StorageManager replicator
              timfox

               

              "clebert.suconic@jboss.com" wrote:
              I need an Acceptor at the backup side that will initialize the channel where the live node will send the replication packets produced by the StorageManager.

              That Acceptor will need its own packet manager.

              To achieve that, I have so far done some minor refactoring on RemotingServiceImpl: I pass a HandlerFactory as a parameter, and I removed the reference from RemotingServiceImpl to the server.

              I could also start the Acceptor manually, but that would duplicate a lot of code already in the RemotingService. I'm just trying the easiest path, reusing what's already there.


              Take a look at how it was done before with replicating connections; you just need to do the same.

              Most of the code is still commented out in HornetQServerImpl so you can just uncomment it.

              You don't need any special packet manager or acceptor.



              • 4. Re: StorageManager replicator
                timfox

                All the code for replicating connections should be exactly the same (and configured the same - see user manual)

                • 5. Re: StorageManager replicator
                  clebert.suconic

                   

                  "timfox" wrote:
                  All the code for replicating connections should be exactly the same (and configured the same - see user manual)



                  The old code replicated every single packet to the backup, which was done at the channel level.

                  The new code will need some sort of endpoint on the backup side that will take commands and add them to the storage on the backup side.

                  Say you replicate storageManager.storeMessage. That will be translated into a packet (TBD), and the storeMessage will be replayed on the backup side. As far as I understand, I would need an endpoint for that. I don't want to do this at the Channel level, so I was planning a new PacketHandler for it.
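A minimal sketch of what such a backup-side endpoint might look like. The packet format was still TBD at this point, so `ReplicationPacket`, its type codes and `ReplicationEndpoint` are all hypothetical, and a `HashMap` stands in for the backup's storage.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical packet carrying one storage command.
final class ReplicationPacket {
    static final byte STORE_MESSAGE = 1;
    static final byte DELETE_MESSAGE = 2;

    final byte type;
    final long id;
    final byte[] body;

    ReplicationPacket(byte type, long id, byte[] body) {
        this.type = type;
        this.id = id;
        this.body = body;
    }
}

// Backup-side endpoint: takes commands off the wire and applies them to storage.
final class ReplicationEndpoint {
    // Stand-in for the backup's storage (assumption).
    final Map<Long, byte[]> storage = new HashMap<>();

    void handlePacket(ReplicationPacket packet) {
        switch (packet.type) {
            case ReplicationPacket.STORE_MESSAGE:
                storage.put(packet.id, packet.body);
                break;
            case ReplicationPacket.DELETE_MESSAGE:
                storage.remove(packet.id);
                break;
            default:
                throw new IllegalArgumentException("Unknown packet type " + packet.type);
        }
    }
}
```

Keeping the dispatch in a dedicated handler like this leaves the channel code unaware of what the replicated commands mean.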

                  • 6. Re: StorageManager replicator
                    clebert.suconic

                    BTW: All the configuration (as you mentioned) will still be the same.

                    I will reuse a lot of the old code, maybe just moving some of it to a different place due to some subtle differences in concept (like replicating the whole server versus just the storage).

                    • 7. Re: StorageManager replicator
                      timfox

                      Ok sure you need a new packet handler, but all the rest should be the same - you don't need to create any special acceptors or make any changes to remoting connection like you said previously.

                      • 8. Re: StorageManager replicator
                        clebert.suconic

                        Just a status update about today:


                        I have the endpoints being created..
                        I have started writing a few tests..

                        Tomorrow: I plan to start doing some replication at the journal level (which will be the easiest part). I will start changing Paging and LargeMessage after that.

                        • 9. Re: StorageManager replicator
                          clebert.suconic


                          Still have work to do. But just another update:


                          I have FailoverTest working with the replicated journals already.

                          I'm replicating at the Journal level. The StorageManager will have two replicated journal instances that will be sending messages to the backup endpoint. I'm also playing with the ReplicationToken as we discussed.

                          Tomorrow I will be testing. Paging and LargeMessage are not going to be as difficult as I thought they would be, so I'm still good.

                          • 10. Re: StorageManager replicator
                            timfox

                            You're replicating at the journal level?

                            This seems like a big change of plan to how this thread started and what we discussed...

                            Can you explain why? Also, paging and large messages don't work at the journal level so how will this work?

                            • 11. Re: StorageManager replicator
                              clebert.suconic

                              It is replicated at the journal level, but still controlled through the JournalStorageManager.

                              When I started implementing this, everything was translating to add, delete, update, prepare and commit. So instead of replicating all the 10+ methods from StorageManager, I'm replicating the basic methods at the journal level.


                              There will be an inner class on JournalStorageManager called ReplicatedJournal that will replicate each journal method call and delegate to the real journal. It was much easier and simpler to do it that way.


                              The JournalImpl is still intact. No changes.


                              Paging and LargeMessage are still as we discussed.

                              • 12. Re: StorageManager replicator
                                clebert.suconic

                                The live node is connecting to the backup node through the ConnectionManager. This was removed on trunk, so I was wondering how a live node should connect to the endpoint channel on the backup?

                                Basically I just re-enabled the code that was commented out. I could still use the FailoverManager as a replacement, but I'm not sure if that's the right way to go any more.

                                • 13. Re: StorageManager replicator
                                  timfox

                                   

                                  "clebert.suconic@jboss.com" wrote:
                                  The live node is connecting to the backup node through the ConnectionManager. This was removed on trunk, so I was wondering how a live node should connect to the endpoint channel on the backup?

                                  Basically I just re-enabled the code that was commented out. I could still use the FailoverManager as a replacement, but I'm not sure if that's the right way to go any more.


                                  Yes, you could just use FailoverManager

                                  • 14. Re: StorageManager replicator
                                    clebert.suconic

                                    Summary of the changes so far:


                                    1 - package replication:

                                    I have created a class called ReplicationManager. As usual there will be a ReplicationManager interface and an impl.

                                    The ReplicationManager will contact the Endpoint on the backup side, and it has all the details on talking to the backup side.

                                    Most of the changes for replication are concentrated here.

                                    1.1 - ReplicatedJournalImpl

                                    There is also a class called ReplicatedJournalImpl, which will be used by the StorageManager. I was going to make it an inner class on the StorageManager at first, but it is a separate class in the replication package ATM.

                                    As I was replicating the StorageManager, everything was being translated to adds, deletes, prepares, etc. (all journal operations). So it was easier to just create this class, which will act as a proxy between the ReplicationManager and the real journals.


                                    1.2 - ReplicationToken

                                    We can't add to the queues, deliver or return the response on sync calls until data is replicated. The ReplicationManager will store a ReplicationToken in a ThreadLocal. I have also added token methods to the StorageManager, so QueueImpl and SessionImpl will be able to configure actions to be done after replication.
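One way the token idea could look, as a sketch only: the real ReplicationToken is not shown in this thread, so the methods below (`replicationStarted`, `replicationConfirmed`, `afterReplicated`) and the `ReplicationTokens` holder are assumptions made for illustration. The token counts outstanding replications and runs the registered actions once all of them are confirmed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token tracking the replications issued for one unit of work.
final class ReplicationToken {
    private int pending;
    private final List<Runnable> actions = new ArrayList<>();

    // Called each time an operation is sent to the backup.
    synchronized void replicationStarted() {
        pending++;
    }

    // Called when the backup confirms one operation. When the last one is
    // confirmed, the registered actions run (outside the lock).
    void replicationConfirmed() {
        List<Runnable> toRun = null;
        synchronized (this) {
            if (--pending == 0) {
                toRun = new ArrayList<>(actions);
                actions.clear();
            }
        }
        if (toRun != null) {
            for (Runnable action : toRun) {
                action.run();
            }
        }
    }

    // Registers work (deliver, send the response, ...) to run after replication.
    // Runs it immediately if nothing is outstanding.
    void afterReplicated(Runnable action) {
        boolean runNow;
        synchronized (this) {
            runNow = pending == 0;
            if (!runNow) {
                actions.add(action);
            }
        }
        if (runNow) {
            action.run();
        }
    }
}

// Holds the token for the current thread, as described above.
final class ReplicationTokens {
    private static final ThreadLocal<ReplicationToken> CURRENT = new ThreadLocal<>();

    static ReplicationToken current() {
        ReplicationToken token = CURRENT.get();
        if (token == null) {
            token = new ReplicationToken();
            CURRENT.set(token);
        }
        return token;
    }

    // Detaches the token so the next unit of work on this thread gets a fresh one.
    static ReplicationToken detach() {
        ReplicationToken token = current();
        CURRENT.remove();
        return token;
    }
}
```

The thread-local lets callers such as a queue or session register their follow-up work without the token being passed through every method signature.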


                                    2 - StorageManager changes

                                    The StorageManager is installing the ReplicatedJournal on both journals.


                                    3 - Paging and LargeMessage

                                    I'm not 100% sure the StorageManager would be the right place to control Paging and LargeMessages. That would bring some functionality into the JournalStorageManager that doesn't belong to any journal. I would prefer to keep that separate and have those two entities delegate to the same place the JournalStorageManager is talking to.


                                    The current changes on the JournalStorageManager are minimal, so I don't think it would be an issue to keep LargeMessage and Paging separate from it. I'm still evaluating this approach, but I will develop something during the day today.