      • 15. Re: scale down and prepared transactions
        ataylor

        Handling Store and Forward queues on scale down.


        Let me outline the different scenarios. Firstly, we can have the following topology:


        1. we have 3 servers S1, S2 and S3
        2. each server has the address AddA
        3. each server has a bridge to each of the other 2 servers, each with its own Store and Forward queue, so S1 will have store and forward queues SNF2 and SNF3, and so forth
        4. each server will have a queue on address AddA, each with a different queue ID


        In this scenario S1 will be scaling down to S2.


        The first thing to remember is that a message in a Store and Forward queue can be in one of 2 states; let's call these 'before' and 'after' send. When the message is routed to the remote queue binding we add a header holding the queue IDs, named something like “_HQ_ROUTE_TOsf.cluster1.c7c6d187-b01e-11e3-93a4-a5c816132e94”, where the suffix is the name of the bridge; this is the 'before' state. When we send the message via the cluster connection bridge we remove this property and replace it with the plain “_HQ_ROUTE_TO” property holding the queue IDs, which is used when routing at the destination post office to choose the bindings. Which state a message is in depends on when the bridge failed and whether or not a send was attempted on the message.
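        To make the two states concrete, here is a minimal sketch. This is not the real HornetQ API; the message is modelled as a plain property map, and the property names are the ones quoted above.

            import java.util.Map;

            public class SnfState {
                // 'after send' key; the 'before send' key is this prefix plus the bridge name
                static final String ROUTE_TO = "_HQ_ROUTE_TO";
                static final String BRIDGE_ROUTE_TO =
                        ROUTE_TO + "sf.cluster1.c7c6d187-b01e-11e3-93a4-a5c816132e94";

                // A message is in the 'before send' state while the queue IDs
                // still sit under the bridge-specific key.
                static boolean isBeforeSend(Map<String, byte[]> props) {
                    return props.containsKey(BRIDGE_ROUTE_TO);
                }
            }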


        S1 handling SNF2


        Since this is what the cluster connection bridge would be doing anyway, all we have to do is make sure the queue IDs on each message are set correctly. If a message is in the 'before' state we need to replace the bridge-specific “_HQ_ROUTE_TOsf.cluster1.c7c6d187-b01e-11e3-93a4-a5c816132e94” property with the plain “_HQ_ROUTE_TO” property. If it is in the 'after' state then we need to do nothing. Currently we only deal with the second scenario.
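        As a minimal sketch of that replacement (same plain-map model as above; normalize is a made-up name):

            import java.util.Map;

            public class Snf2Handling {
                // Convert a 'before send' message into the 'after send' state by
                // moving the queue IDs from the bridge-specific key to the plain
                // key; messages already in the 'after' state are left untouched.
                static void normalize(Map<String, byte[]> props, String bridgeKey) {
                    byte[] queueIds = props.remove(bridgeKey);
                    if (queueIds != null) {
                        props.put("_HQ_ROUTE_TO", queueIds);
                    }
                }
            }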


        S1 handling SNF3


        Basically these messages just need to end up in SNF3 on server 2 so they can be forwarded by the bridge to S3; however, they need to arrive at SNF3 in the correct state. If a message is sent in the 'before' state then the post office at S2 will strip the property off and route the message to SNF3, and the cluster bridge will then throw an exception whilst trying to determine the route-to IDs. If it is in the 'after' state then it will be routed and handled correctly. Currently we don't check which state messages are in, and we need to.
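        A sketch of what the missing check could look like; the names and the list-of-maps model are assumptions, not the actual implementation:

            import java.util.List;
            import java.util.Map;

            public class Snf3Handling {
                // Every message bound for S3 must reach SNF3 on S2 already in
                // the 'after send' state, otherwise S2's post office strips the
                // bridge-specific key and the cluster bridge cannot work out
                // the route-to IDs.
                static void moveToSnf3OnS2(List<Map<String, byte[]>> snf3OnS1,
                                           String bridgeKey,
                                           List<Map<String, byte[]>> snf3OnS2) {
                    for (Map<String, byte[]> props : snf3OnS1) {
                        byte[] queueIds = props.remove(bridgeKey);
                        if (queueIds != null) {                  // was 'before send'
                            props.put("_HQ_ROUTE_TO", queueIds); // now 'after send'
                        }
                        snf3OnS2.add(props);
                    }
                    snf3OnS1.clear();
                }
            }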


        At this point all the messages on S1 have been handled; however, there will be store and forward queues for S1 on all the other nodes. One option is to notify the other nodes, before we start to scale down, to handle any remaining messages, so that when S1 scales down these queues can just be deleted. The downside is that this could take a while if there are lots of messages, and it also won't work for subscriptions: any subscription queues on S1 would need to receive all messages routed to the other servers between the bridge going down and the subscription being created on another server. You would either have to pause routing by the post office until scale down had occurred (or at least until the subscription had been recreated) or lose the messages.


        The second way to handle this would be to notify the other servers, after scale down, what they should do with their store and forward queue for S1. Each server would need to know the target server that was scaled down to and the new mappings for the queue IDs, i.e. the queue with ID 2 on S1 is the queue with ID 3 on S2. The following 2 scenarios would then need handling.
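        First, a hypothetical sketch of what that notification could carry; the class and field names are made up, it just pins down "target server plus ID mappings":

            import java.util.Map;

            public class ScaleDownNotice {
                final String targetServer;            // e.g. "S2"
                final Map<Long, Long> queueIdMapping; // S1 queue ID -> S2 queue ID, e.g. 2 -> 3

                ScaleDownNotice(String targetServer, Map<Long, Long> queueIdMapping) {
                    this.targetServer = targetServer;
                    this.queueIdMapping = queueIdMapping;
                }
            }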


        S2 handling SNF1


        These messages were originally destined for server S1. Those queues will now exist on S2, so the messages will need to be moved from SNF1 to whichever local queue they were originally destined for, using the mappings provided after scale down.
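        A sketch of that move, with messages reduced to just their route-to queue IDs (toy types, not the real ones):

            import java.util.ArrayList;
            import java.util.List;
            import java.util.Map;

            public class Snf1OnS2 {
                static class Msg {
                    List<Long> routeToIds;            // the queue IDs the message carries
                    Msg(List<Long> ids) { routeToIds = ids; }
                }

                // Deliver each SNF1 message to the local queue it was originally
                // destined for, translating S1's queue IDs via the scale-down mapping.
                static void drain(List<Msg> snf1, Map<Long, Long> mapping,
                                  Map<Long, List<Msg>> localQueues) {
                    for (Msg m : snf1) {
                        for (Long oldId : m.routeToIds) {
                            Long newId = mapping.get(oldId); // e.g. 2 on S1 -> 3 on S2
                            if (newId != null) {
                                localQueues.computeIfAbsent(newId, k -> new ArrayList<>()).add(m);
                            }
                        }
                    }
                    snf1.clear();
                }
            }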


        S3 handling SNF1


        These messages will need to be added to SNF2 on S2 with the correct queue IDs set using the mappings provided after scale down. They will then be forwarded on the bridge.
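        And the corresponding sketch for S3, using the same toy types: rewrite the IDs, then hand the message to SNF2 so the existing bridge does the rest.

            import java.util.ArrayList;
            import java.util.List;
            import java.util.Map;

            public class Snf1OnS3 {
                static class Msg {
                    List<Long> routeToIds;
                    Msg(List<Long> ids) { routeToIds = ids; }
                }

                // Retarget each SNF1 message at S2's queues and queue it on
                // SNF2, whose bridge will forward it to S2 as normal.
                static void retarget(List<Msg> snf1, Map<Long, Long> mapping, List<Msg> snf2) {
                    for (Msg m : snf1) {
                        List<Long> remapped = new ArrayList<>();
                        for (Long oldId : m.routeToIds) {
                            Long newId = mapping.get(oldId);
                            if (newId != null) remapped.add(newId);
                        }
                        m.routeToIds = remapped;
                        snf2.add(m);
                    }
                    snf1.clear();
                }
            }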



        For both of the above, any routing of new messages to this store and forward queue would need to be paused while this happens; after this the store and forward queue would need to be deleted, along with its remote binding being changed to use the SNF2 queue on S3 or deleted entirely on S2. I'm not sure how easy it would be to do this or what effect this would have on throughput etc.


        Alternatively we could leave SNF1 as it is and simply move messages from one queue to another as they arrive. The downside to this is that as a cluster grows and shrinks these bindings and queues may become unmanageable, as new ones will be created on every server added.


        I think this just about covers what we need to do. Thoughts? Have I missed something?


        • 16. Re: scale down and prepared transactions
          clebert.suconic

          TBH I got lost on these SNFs...

           

          But a node with SnFs to other nodes is easy to address.. in fact Justin has already done it where we just finish what the Bridge was supposed to do. I think this is what you were describing here on your post.

           

           

          The issue is when the node going down is the target of another node..

           

          Say in your example S1 is going down... being merged at SX (doesn't matter where.. a specific node)

           

          Messages on the SNF (S2->S1) would have to be now sent to SX...

           

           

          The IDs for the queues on the SNF(S2->S1) won't match any valid queue on SX. You would have to rename them.. you would have to recreate them and make sure they are translated. The issue will get complex in the case where S2 fails during the process.  If we have the information to recreate these queues it will be on the Bindings.. and I don't know if that will survive:

           

          - until we start merging the queues

          - an eventual crash during the merge.

           

           

          We can talk about this on Monday / Tuesday. But this thing can get complex... that's why I was suggesting we should do something simple such as proposing a clean shutdown on scale down.

          • 17. Re: scale down and prepared transactions
            ataylor

            But a node with SnFs to other nodes is easy to address.. in fact Justin has already done it where we just finish what the Bridge was supposed to do. I think this is what you were describing here on your post.

            This is S1 handling SNF3; currently this doesn't work when the message is in the 'before' state, the bridge on S3 ends up throwing an exception.

             

            Also, S1 handling SNF2 currently only works when the message is in the 'after' state.

             

            I think my post covers everything that we need to do.

            • 18. Re: scale down and prepared transactions
              ataylor

              We can talk about this on Monday / Tuesday. But this thing can get complex... that's why I was suggesting we should do something simple such as proposing a clean shutdown on scale down.

              I don't think a clean shutdown makes any difference, you still get the same issues. But let's talk next week.
