14 Replies Latest reply on Oct 7, 2009 12:14 PM by clebert.suconic

StorageManager replicator

clebert.suconic Sep 24, 2009 1:38 PM

I was looking at how we could make the StorageManager replicated.

I - The replicator:

The idea I currently have is to encapsulate the sending and receiving of replication data in a single class, which would be called the Replicator.

I could have basic methods on that class at the journal level, such as:

replicateUpdate, replicateDelete, replicateCommit, replicateTXupdate, replicateTXappend.. etc.

For those methods, I could have a proxy implementing the Journal interface that talks to the replicator directly. (I won't need to make any changes to the StorageManager for the journal operations.)

Paging and LargeMessage

I will also add operations for paging and large messages, and delegate through the StorageManager. In this case, the StorageManager will talk to the Replicator and make sure the pages and large messages are replicated.

Wait for transmission:

It would be simple to work with latches, and wait for the full transmission only at transaction boundaries.

Say... you are writing a transaction with 10 messages, and a commit.

All the 10 writes to the journal are just replicated, but we don't wait for the transmission.

Later at the commit, we wait for the full transmission through a latch.
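
As a rough illustration of the latch idea above (a sketch only; TxReplicationTracker and its methods are made-up names, not existing HornetQ code): each replicated write is just counted, and the commit is the only call that waits until the backup has acknowledged everything.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not HornetQ code): tracks replication acks for one transaction.
class TxReplicationTracker {
    private final AtomicInteger pending = new AtomicInteger();
    private volatile CountDownLatch commitLatch;

    // Called after each journal write is handed to the replicator; never blocks.
    void writeSent() {
        pending.incrementAndGet();
    }

    // Called when the backup acknowledges one write.
    void ackReceived() {
        if (pending.decrementAndGet() == 0) {
            CountDownLatch latch = commitLatch;
            if (latch != null) {
                latch.countDown();
            }
        }
    }

    // Called at commit: the only place that waits.
    // Returns false on timeout or if the thread is interrupted.
    boolean awaitCommit(long timeout, TimeUnit unit) {
        CountDownLatch latch = new CountDownLatch(1);
        commitLatch = latch;
        if (pending.get() == 0) {
            latch.countDown(); // everything was already acknowledged
        }
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With 10 writes and a commit, writeSent() would be called 10 times without waiting, and awaitCommit() would return true only once ackReceived() has been called 10 times.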

Syncing:

Another aspect I'm thinking about: we don't need to wait for any syncs while replicating. We just need to ensure the data is saved on the backup side. We could just sync when the backup is activated.

Taking a backup from a live node:

This will probably be another post, but I already have some thoughts about how to do this.
For the journal itself, it would be possible to disable reclaiming while the backup is being taken, batch all the commands issued in the meantime, then flush the commands and make the backup active. Of course there will be other considerations for paging and large messages, but that discussion will probably happen after the initial implementation is done.

1. Re: StorageManager replicator

timfox Sep 24, 2009 2:27 PM (in response to clebert.suconic)

clebert, all you need to do is:
clebert, when an operation arrives on the storagemanager
clebert, you add that operation to a queue
clebert, and you replicate the operation
clebert, then the next one arrives, you do the same, etc
jbossfox: and how you replicate the operation?
Isn't that what I wrote?
clebert, some time later
clebert, you get a response back
clebert, and you pull the op from the queue and execute it locally
jbossfox: that's how it is currently done...
clebert, this is how our replication worked before
clebert, yes, it's the same
clebert, just copy and paste that
jbossfox: I would still be doing that to update the latch...
clebert, job done
clebert, why do you need a latch
clebert, ?
clebert, you don't need any latch
jbossfox: I was just trying to avoid a wait for instance on:
clebert, you don't need to wait
sendMessage(TX, message)
sendMessage(tx, message)
tx.commit();
I would only wait for the full cycle on the commit
clebert, no
clebert, have a look how the session etc replication works
jbossfox: well.. ok.. it's the same for me then... the only thing is remove the latch from my post
clebert, you need to do the same
jbossfox: I know how it works
clebert, you don't need any latches
jbossfox: ok.. remove that then...
clebert, just replicate the action
clebert, this can be pipelined
but I would still encapsulate through a Replicator...
And have the journal talking to the replicator...
clebert, no you can't do that
for most cases on the Storagemanager... I only need to replicate 4 operations
clebert, since this is not RPC!
clebert, the proxy idea makes no sense
clebert, you can't block waiting for the result
clebert, we're not doing an RPC approach
clebert, that would be very slow
clebert, we are pipelining
clebert, ==> much faster
clebert, just need to replicate the action
clebert, then go off and do the next thing
clebert, *later* a response comes back
clebert, and you pick the action off the queue
clebert, and execute it locally
jbossfox: yes.. but for instance.... since we are only replicating the SM action:
say.. on route
you do
SM.persistMessage
sm.persistReference
to make sure the information is on the backup...
I would need to block on persistMessage, waiting before i can continue
clebert, no!
to make sure the information is at the backup level
clebert, no, no. no
clebert, you're not following
before we would pipeline a bigger operation.. that would encapsulate but the storage and the routing
clebert, it works like this
s/but/both
clebert, actually this is the same as how replication used to work
clebert, so just look at the old code to see how to do it
clebert, let's take the example of a session commit
clebert, in the old code
clebert, (i have removed this code in my branch but the principle is sound)
clebert, so.. with a transaction commit, we need to make sure it is committed on the backup before the user call to commit returns, right?
right
clebert, the way this works is as follows:
transaction commit would be a good example...
clebert, the commit arrives on the live node
it's a single operation
clebert, we add the commit action to an internal queue
I would be more interested on routing, what takes two journal operations
clebert, then we replicate it to the backup
clebert, note *we do not block*
clebert, the remoting thread then services another request
clebert, the commit action then arrives on the backup
clebert, it is executed on the backup
clebert, the backup sends a response back to the live
jbossfox: you don't block on the server, but the user will be blocked waiting the response
clebert, the live node receives the response and removes the action from the top of the queue
clebert, it then executes the action
clebert, the action when complete, sends a null response packet to the client
clebert, the client receives that packet and the client call to commit returns
clebert, *the only blocking is happening on the client side* not the server side
clebert, there is zero blocking on the server
clebert, in other words we are pipelining on the server
clebert, u with me?
jbossfox: Yeah.. I know how that works...
but look at routing for instance
clebert, so you need to do the same thing
clebert, proxy won't work, since it implies RPC which is blocking on the server
clebert, and that will be realllllly slooow
clebert, you'll have a network RTT per operation replicated :(
jbossfox: commit was a simple use case.. I knew how I would implement that...
I'm more concerned about routing... let me get the code here.. just 1 sec
clebert, routing?
sending a message
just 1 sec
jbossfox: for instance, handleSend
clebert, sending a message is the same
clebert, it's the same for all operations
in a simple case... that is translated as at least two operations....
clebert, handleSend - is on the session, not the storagemanager
clebert, you're replicating the storagemanager
jbossfox: yes.. I know...
but ...
how could I pipeline that?
clebert, i don't understand what you mean
clebert, you pipeline it the same way for all operations
clebert, like i described
clebert, it's the same as how the old replication used to work
jbossfox: just 1 minute
let me find the code
jbossfox: simple operation....
queueImpl::route
that is making two operations on the storageManager
first: storeMessage
and second storeReference
clebert, what is the issue here?
I can't continue routing.. until the data is replicated....
pipelining this would require route to send a callback... and continue the rest of the code inside the callback
clebert, ?
and there are two operations here
storeMessage
and
storeReference
clebert, i don't see what the issue is
also.. updateScheduledDeliveryTime
clebert, you just replicate it
clebert, like i described
clebert, when you get the response back you execute it locally
look at QueueImpl::route....
lets simplify the code.. just as an example:
clebert, i'm not interested in queue
clebert, we are replicatng storagemanager operations
clebert, not queue operations
jbossfox: QueueImpl is calling the storageManager
clebert, it's not relevant what calls it
how can I pipeline from the Storagemanager, when all the rest of the operation is inside queue
clebert, i don't care who calls it
clebert, i don't understand your question
clebert, as operations arrive on storage manager
say.. you call storeMessage(message)... I send it to the "pipeline"...
clebert, it's simple
clebert, you just need to replicate them
I don't have guarantees it is already replicated
from QueueImpl
clebert, you don't need any guarantee
clebert, this is async right?
clebert, you're thinking in RPC terms
jbossfox: no.. I'm not....
I'm just saying that is "easy" (relatively) to implement at handleSend
clebert, no!
clebert, it's very easy to implement
clebert, if you do as i described
on handleSend.. you can just send a callback to be executed after replicated
clebert, forget handlesend
clebert, it's not relevant
jbossfox: can we think how sending a message would work?
jbossfox: I mean.. I would need you to look at QueueImpl::route....
for instance.. say the user needs guarantees of a send
(sync on send.. non transactional)
the user will be waiting the return from the server, until the message is persisted on disk, and replicated to the backup
(I know you know that BTW)
(just completing a thought)
when send is happening at the server, you will have at least 2 StorageManager operations.... (maybe 3)
you need all the 3 operations replicated before you can return
before you can send the NullResponse back
before (the current/old schema).. that was a single replication operation.. so you needed a single callback to be executed after the replication
clebert, i don't follow what the problem si
clebert, s/si/is
to do a correct pipeline now.. you would need to break route into several callback operations for instance
clebert, i agree with what you said, but what is the problem?
(it would be a horrible code)
do you understand about pipelining when we replicate StorageManager directly?
about what I mean with the pipeline? (I meant)
clebert, what do you mean, do I understand?
say.. this is the current code on routing:
storageManager.storeMessage(message)
storageManager.storeReference(ref)
storageManager.updatescheduleDelivery(ref)
to do a correct pipeline here, I would need to do:
storageManager.storeMessage(message, new Callback() { public void run() { storageManager.storeReference(ref); } });
clebert, no
or else I would need to block on storeMessage....
clebert, no
clebert, you *do not* need to wait for the previous operation to be replicated before replicating the next one
clebert, you just pipeline them
clebert, when the last one comes back, you send the null response to the user
clebert, ouch. if you had to wait on each one that would be a network RTT per operation!
clebert, it would be very slow
clebert, this is async
clebert, are you with me?
yes.. I don't want to wait RTT...
I wouldn't do that
clebert, once you understand this I think you will have a eureka moment
clebert, your code above does exactly that!
clebert, it waits for the response from the storeMessage before it sends the storeReference!
clebert, that means a network RTT!
that's the current code
clebert, current code?
clebert, huh?
(01:11:48 PM) clebert: say.. this is the current code on routing:
(01:11:48 PM) clebert: storageManager.storeMessage(message)
(01:11:48 PM) clebert: storageManager.storeReference(ref)
(01:11:48 PM) clebert: storageManager.updatescheduleDelivery(ref)
then I said:
(01:12:25 PM) clebert: to do a correct pipeline here, I would need to do:
(01:12:25 PM) clebert: storageManager.storeMessage(message, new Callback() { public void run() { storageManager.storeReference(ref); } });
clebert, what do you mean "current code" ?
clebert, we haven't implemented this yet
if you look at QueueImpl::route...
how it is interacting with the storageManager
clebert, yes i know
clebert, what queue does
clebert, you're misunderstanding this clebert
clebert, you do not have to wait for the response from storeMessage to return before replicating the storeReference!
clebert, this is a key point
clebert, in your example code you posted above
(01:12:25 PM) clebert: storageManager.storeMessage(message, new Callback() { public void run() { storageManager.storeReference(ref); } });
clebert, that code waits for the response before sending the next replication
clebert, that is incorrect
clebert, you can send the replications one after another
clebert, without waiting for a response
jbossfox: but I need to ensure everything is replicated before sending the response to the user
clebert, then when the response for the last one for that route comes back
clebert, yes! you only wait for the * last response * to come back
clebert, you do not wait on all of them!
jbossfox: ok.. that would be same thing on the Latch.. but it will be an Callable instead of a latch
clebert, no!
clebert, there is no blocking on the server side
jbossfox: yes.. that's what I'm saying
clebert, it's completely different from latch
clebert, latches block
clebert, you just said it was like latch. that is incorrect
jbossfox: I was just thinking aloud in how I would implement this
clebert, it is not like latch
clebert, you implement it, like i described before ;)
clebert, it's the same as how the replication code I removed worked
clebert, it's no different
clebert, i think you're with me now, right?
jbossfox: I know how the current code works Tim....
My problem is where to place the callback for the last operation.
clebert, you probably just need to add a callback runnable as a parameter to each storagemanager operation
clebert, and when the response comes back it gets called
clebert, then you can put whatever you want in the runnable
clebert, in many cases it will be null
jbossfox: that's the part I don't like....
like... when routing is being called.. I don't have any information about channels.. or anything else like that
jbossfox: I'm thinking about adding a method...
ensureReplicated(Callback)
that could be called ServerSession level
and the callback could be called from there
jbossfox: but that's an implementation detail I think
clebert, ok
jbossfox: also.. I don't need to wait for any callback when transacted.. unless it's the commit operation
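
The pipelining described in the chat above can be sketched roughly as follows. This is only an illustration under assumptions: PipelinedReplicator, BackupChannel and the method names are invented here and are not the removed replication code. Each operation is queued and sent without blocking; when a response comes back, the oldest queued action is taken off the queue and executed locally. It also assumes responses arrive in the same order the operations were sent.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch (names are illustrative, not HornetQ classes).
class PipelinedReplicator {
    // Stand-in for the connection to the backup node.
    interface BackupChannel {
        void send(String operation);
    }

    private final Queue<Runnable> pending = new ArrayDeque<>();
    private final BackupChannel channel;

    PipelinedReplicator(BackupChannel channel) {
        this.channel = channel;
    }

    // Queue the local action and replicate the operation; returns immediately.
    synchronized void replicate(String operation, Runnable localAction) {
        pending.add(localAction);
        channel.send(operation);
    }

    // Called when a response arrives from the backup.
    // Assumes responses come back in send order.
    void onResponse() {
        Runnable action;
        synchronized (this) {
            action = pending.poll();
        }
        if (action != null) {
            action.run();
        }
    }
}
```

For a route that does storeMessage followed by storeReference, both replications go out back to back; the action attached to the last one is where the null response to the client would be sent, so there is no network RTT per operation.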
2. Re: StorageManager replicator

clebert.suconic Sep 25, 2009 9:46 PM (in response to clebert.suconic)

I need an Acceptor on the backup side that will initialize the channel to which the live node will send the replication packets generated by the StorageManager.

That Acceptor will need its own packet manager.

To achieve that, so far I have done some minor refactoring on RemotingServiceImpl: I pass a HandlerFactory as a parameter and removed the reference from RemotingServiceImpl to the server.

I could also start the Acceptor manually, but I would duplicate a lot of code that is already in the RemotingService. I'm just trying the easiest path, reusing what's already there.
3. Re: StorageManager replicator

timfox Sep 26, 2009 2:28 AM (in response to clebert.suconic)

"clebert.suconic@jboss.com" wrote:
I need an Acceptor on the backup side that will initialize the channel to which the live node will send the replication packets generated by the StorageManager.

That Acceptor will need its own packet manager.

To achieve that, so far I have done some minor refactoring on RemotingServiceImpl: I pass a HandlerFactory as a parameter and removed the reference from RemotingServiceImpl to the server.

I could also start the Acceptor manually, but I would duplicate a lot of code that is already in the RemotingService. I'm just trying the easiest path, reusing what's already there.

Take a look at how it was done before with replicating connections - you just need to do the same.

Most of the code is still commented out in HornetQServerImpl so you can just uncomment it.

You don't need any special packet manager or acceptor.
4. Re: StorageManager replicator

timfox Sep 26, 2009 2:29 AM (in response to clebert.suconic)

All the code for replicating connections should be exactly the same (and configured the same - see user manual)
5. Re: StorageManager replicator

clebert.suconic Sep 26, 2009 10:03 AM (in response to clebert.suconic)

"timfox" wrote:
All the code for replicating connections should be exactly the same (and configured the same - see user manual)

The old code replicates every single packet to the backup, which is done at the channel level.

The new code will need some sort of endpoint on the backup side that will take commands and add them to the storage on the backup side.

Say... you replicate the storageManager.storeMessage. That will be translated to a packet (TBD), and the storeMessage will then be replayed on the backup side. As far as I understand, I would need an endpoint for that. I don't want to do this at the Channel level, so I was planning a new PacketHandler for this.
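
To make the endpoint idea a bit more concrete, here is a minimal sketch of what such a handler could look like. Everything in it is an assumption for illustration: the packet format is still TBD, and BackupEndpointSketch, ReplicationPacket and PacketType are invented names, not the real PacketHandler interface. The handler applies each replicated command to the backup's storage and then sends a response back to the live node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a backup-side endpoint; packet layout and names are assumptions.
class BackupEndpointSketch {
    enum PacketType { STORE_MESSAGE, DELETE_MESSAGE }

    // Stand-in for the replication packet, whose real format is still to be defined.
    static final class ReplicationPacket {
        final PacketType type;
        final long messageId;
        final byte[] body;

        ReplicationPacket(PacketType type, long messageId, byte[] body) {
            this.type = type;
            this.messageId = messageId;
            this.body = body;
        }
    }

    // In-memory map standing in for the backup's storage.
    private final Map<Long, byte[]> backupStorage = new HashMap<>();
    // Stand-in for sending a response packet back to the live node.
    private final Runnable responseSender;

    BackupEndpointSketch(Runnable responseSender) {
        this.responseSender = responseSender;
    }

    // Replay the replicated command on the backup, then respond to the live node.
    void handlePacket(ReplicationPacket packet) {
        switch (packet.type) {
            case STORE_MESSAGE:
                backupStorage.put(packet.messageId, packet.body);
                break;
            case DELETE_MESSAGE:
                backupStorage.remove(packet.messageId);
                break;
        }
        responseSender.run();
    }

    boolean isStored(long messageId) {
        return backupStorage.containsKey(messageId);
    }
}
```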
6. Re: StorageManager replicator

clebert.suconic Sep 26, 2009 10:05 AM (in response to clebert.suconic)

BTW: All the configuration (as you mentioned) will still be the same.

I will reuse a lot of the old code, maybe just moving some of it to a different place due to some subtle differences in concept (like replicating the whole server versus just the storage).
7. Re: StorageManager replicator

timfox Sep 26, 2009 10:39 AM (in response to clebert.suconic)

Ok sure you need a new packet handler, but all the rest should be the same - you don't need to create any special acceptors or make any changes to remoting connection like you said previously.
8. Re: StorageManager replicator

clebert.suconic Sep 28, 2009 9:29 PM (in response to clebert.suconic)

Just a status update about today:

I have the endpoints being created..
I have started writing a few tests..

Tomorrow: I plan to do some replication at the journal level (which will be the easiest part). I will start changing Paging and LargeMessage after that.
9. Re: StorageManager replicator

clebert.suconic Oct 5, 2009 7:21 PM (in response to clebert.suconic)

Still have work to do. But just another update:

I already have FailoverTest working with the replicated Journals.

I'm replicating at the Journal level. The StorageManager will have two replicated journal instances that will be sending messages to the backup endpoint. I'm also playing with the ReplicationToken as we discussed.

Tomorrow I will be testing. Paging and LargeMessage are not going to be as difficult as I thought they would be, so I'm still good.
10. Re: StorageManager replicator

timfox Oct 6, 2009 4:19 AM (in response to clebert.suconic)

You're replicating at the journal level?

This seems like a big change of plan from how this thread started and what we discussed...

Can you explain why? Also, paging and large messages don't work at the journal level so how will this work?
11. Re: StorageManager replicator

clebert.suconic Oct 6, 2009 8:54 AM (in response to clebert.suconic)

It is replicated at the journal level, but still controlled through the JournalStorageManager.

When I started implementing this, everything was translating to add, delete, update, prepare and commit. So instead of replicating all the 10+ methods from StorageManager, I'm replicating the basic methods at the journal level.

There will be an inner class in JournalStorageManager called ReplicatedJournal that will replicate the journal method and delegate to the real journal. It was much easier and simpler doing it that way.

The JournalImpl is still intact. No changes.

Paging and LargeMessage are still as we discussed.
12. Re: StorageManager replicator

clebert.suconic Oct 6, 2009 11:21 AM (in response to clebert.suconic)

The live node is connecting to the backup node through the ConnectionManager. This was removed on trunk, so I was wondering how one live node would connect to the endpoint channel on the backup?

Basically I just re-enabled the code that was commented out. I could still use the FailoverManager instead, but I'm not sure if that's the right way to go any more.
13. Re: StorageManager replicator

timfox Oct 6, 2009 3:47 PM (in response to clebert.suconic)

"clebert.suconic@jboss.com" wrote:
The live node is connecting to the backup node through the ConnectionManager. This was removed on trunk, so I was wondering how one live node would connect to the endpoint channel on the backup?

Basically I just re-enabled the code that was commented out. I could still use the FailoverManager instead, but I'm not sure if that's the right way to go any more.

Yes, you could just use FailoverManager
14. Re: StorageManager replicator

clebert.suconic Oct 7, 2009 12:14 PM (in response to clebert.suconic)

Summary of the changes so far:

1 - package replication:

I have created a class called ReplicationManager. As usual, there will be a ReplicationManager interface and an impl.

The ReplicationManager will contact the Endpoint on the backup side, and it has all the details on talking to the backup side.

Most of the changes for replication are concentrated here.

1.1 - ReplicatedJournalImpl

There is also a class called ReplicatedJournalImpl, which will be used by the StorageManager. I was going to make it an inner class of the StorageManager at first, but it is a separate class in the replication package ATM.

As I was replicating the StorageManager, everything was being translated to adds, deletes, prepares, etc. (all journal operations). So it was easier to just create this class, which will act as a proxy between the ReplicationManager and the real journals.
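
A stripped-down sketch of the proxy idea, under assumptions: SimpleJournal and ReplicationSender are simplified stand-ins invented for this example (the real Journal interface has many more methods and different signatures), and ReplicatedJournalSketch is not the actual ReplicatedJournalImpl. Each journal call is first handed to the replication side and then delegated to the real journal, so the real journal needs no changes.

```java
// Simplified stand-in for the journal interface (assumption, not the real Journal).
interface SimpleJournal {
    void appendAddRecord(long id, byte[] data);
    void appendDeleteRecord(long id);
}

// Hypothetical stand-in for whatever ships operations to the backup.
interface ReplicationSender {
    void send(String operation, long id, byte[] data);
}

// Proxy: replicates each journal operation, then delegates to the real journal.
class ReplicatedJournalSketch implements SimpleJournal {
    private final ReplicationSender sender;
    private final SimpleJournal localJournal;

    ReplicatedJournalSketch(ReplicationSender sender, SimpleJournal localJournal) {
        this.sender = sender;
        this.localJournal = localJournal;
    }

    @Override
    public void appendAddRecord(long id, byte[] data) {
        sender.send("ADD", id, data);           // hand off to replication, no waiting
        localJournal.appendAddRecord(id, data); // then write to the real journal
    }

    @Override
    public void appendDeleteRecord(long id) {
        sender.send("DELETE", id, null);
        localJournal.appendDeleteRecord(id);
    }
}
```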

1.2 - ReplicationToken

We can't add to the queues, deliver or return the response on sync calls until data is replicated. The ReplicationManager will store a ReplicationToken in a ThreadLocal. I have also added token methods to the StorageManager, so QueueImpl and SessionImpl will be able to configure actions to be done after replication.
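
Roughly, the token could work like the sketch below. This is an assumption-laden illustration, not the actual ReplicationToken: the class name, the method names and the counting scheme are made up here. The token is kept in a ThreadLocal, counts replications still in flight, and runs the registered actions only once every one of them has been confirmed by the backup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the actual ReplicationToken may look quite different.
class ReplicationTokenSketch {
    private static final ThreadLocal<ReplicationTokenSketch> CURRENT =
        ThreadLocal.withInitial(ReplicationTokenSketch::new);

    private int pendingReplications;
    private final List<Runnable> actions = new ArrayList<>();

    // Token bound to the calling thread.
    static ReplicationTokenSketch current() {
        return CURRENT.get();
    }

    // Called each time an operation is sent to the backup.
    synchronized void replicationSent() {
        pendingReplications++;
    }

    // Register work (deliver, send the response...) to run once everything is replicated.
    void afterReplicated(Runnable action) {
        synchronized (this) {
            if (pendingReplications > 0) {
                actions.add(action);
                return;
            }
        }
        action.run(); // nothing in flight, run right away
    }

    // Called when the backup confirms one operation.
    void replicationDone() {
        List<Runnable> toRun = new ArrayList<>();
        synchronized (this) {
            if (--pendingReplications == 0) {
                toRun.addAll(actions);
                actions.clear();
            }
        }
        for (Runnable action : toRun) {
            action.run();
        }
    }
}
```

With this shape, QueueImpl or SessionImpl would only call afterReplicated(...) with whatever needs to happen after replication, without knowing how many journal operations were replicated underneath.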

2 - StorageManager changes:

The StorageManager is installing the ReplicatedJournal on both journals.

3 - Paging and LargeMessage:

I'm not 100% sure if the StorageManager would be the right place to control Paging and LargeMessages. That would bring some functionality into the JournalStorageManager that doesn't belong to any journal. I would prefer to keep that separated and have those two entities delegate to the same place the JournalStorageManager is talking to.

The current changes on the JournalStorageManager are minimal, so I don't think it would be an issue to keep LargeMessage and Paging separated from it. I'm still evaluating this approach, but I will develop something over the course of today.
