Austin Clustering meeting thoughts | JBoss.org Content Archive (Read Only)

15. Re: Austin Clustering meeting thoughts

marklittle May 8, 2006 5:53 AM (in response to timfox)

You'd think it's trivial, but the number of implementations (commercial and OSS) that get it wrong has been surprising. By "wrong" I mean over-engineered. Best effort doesn't even require an ack from the receiver: you can fabricate the ack (if one is needed) at the sender, because ultimately the sender won't be able to tell the difference between a situation where the message was delivered but the receiver subsequently crashed and one where the message was never delivered (e.g., because the sender dropped it on the floor due to flow-control issues).
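
A minimal Java sketch of the best-effort point above. `Transport`, `BestEffortSender` and `BestEffortDemo` are hypothetical names made up for illustration and are not part of any JBoss or JMS API. The sender never waits for the receiver; it reports success locally whether or not the message arrived.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical transport: it may silently drop a message (e.g. under flow control). */
interface Transport {
    void transmit(String message);
}

/** Best-effort sender: the "ack" is fabricated locally and never awaited from the receiver. */
final class BestEffortSender {
    private final Transport transport;

    BestEffortSender(Transport transport) {
        this.transport = transport;
    }

    /**
     * Always returns true. The sender cannot tell "delivered, then the receiver
     * crashed" apart from "never delivered", so it does not try to.
     */
    boolean send(String message) {
        try {
            transport.transmit(message);
        } catch (RuntimeException e) {
            // Best effort: a failed transmit is swallowed, not retried.
        }
        return true; // fabricated ack
    }
}

final class BestEffortDemo {
    /** Sends each message over a transport that drops every second send; returns what arrived. */
    static List<String> run(List<String> messages) {
        List<String> received = new ArrayList<>();
        int[] sends = {0};
        Transport lossy = m -> {
            if (sends[0]++ % 2 == 0) {
                received.add(m); // sends 0, 2, 4, ... arrive; the others are dropped
            }
        };
        BestEffortSender sender = new BestEffortSender(lossy);
        for (String m : messages) {
            sender.send(m); // returns true even for the dropped messages
        }
        return received;
    }
}
```

Every call to `send` reports success, including the ones the lossy transport dropped, which is all that best effort promises.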

16. Re: Austin Clustering meeting thoughts

marklittle May 8, 2006 5:56 AM (in response to timfox)

"adrian@jboss.org" wrote:

It is the guaranteed **ONLY ONCE** delivery that is hard.

Oh and I know it's hard, but solutions have existed for well over a decade (can point you at my PhD if you really want ;-)

Mark.

17. Re: Austin Clustering meeting thoughts

marklittle May 8, 2006 6:08 AM (in response to timfox)

"adrian@jboss.org" wrote:
Guys, I know I've told all you before about this, except Mark Little
(and I might have only mentioned it in passing to Tim?).

JMS does not require total ordering, except as it appears
for one client connection/session.

Agreed, but I think the point Tim and Bela were discussing was about replicating the JMS system so that it was highly available. In that case, if you are using an active replication protocol to emulate a highly available, fault-tolerant implementation, all of the replicas MUST receive the same set of messages in the same order (and they MUST start in the same state). Otherwise you don't get deterministic behaviour. This is irrespective of the entity you are trying to replicate (the same is the case for replicating a transaction system, a spreadsheet, a bank account, etc.)
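
To make the determinism requirement concrete, here is a small Java sketch. `Replica` and `ActiveReplicationDemo` are hypothetical names, and the replica state is reduced to a single `long` purely for illustration. Two replicas that start in the same state and apply the same operations in the same order end up identical; swap the order of two non-commuting operations and they diverge.

```java
import java.util.List;
import java.util.function.LongUnaryOperator;

/** Hypothetical deterministic replica: its state is a single number. */
final class Replica {
    private long state;

    Replica(long initialState) {
        this.state = initialState;
    }

    /** Applies one replicated message, modelled as a pure function on the state. */
    void apply(LongUnaryOperator op) {
        state = op.applyAsLong(state);
    }

    long state() {
        return state;
    }
}

final class ActiveReplicationDemo {
    /** Starts a fresh replica in the given state and applies the messages in list order. */
    static long replay(long initialState, List<LongUnaryOperator> messages) {
        Replica replica = new Replica(initialState);
        for (LongUnaryOperator op : messages) {
            replica.apply(op);
        }
        return replica.state();
    }
}
```

For example, starting from 100, "add 10" followed by "double" gives 220 on every replica that sees that order, while a replica that sees "double" first ends at 210.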

If you have competing senders there is no guarantee how the race
will be resolved so there is no point introducing total ordering.
This is true for one node as it is for clustered nodes.

This is replication for high availability and not for load balancing. Two different things, which can be catered for using replication techniques, but those techniques are different. If you want to do both, then you need a suitable replication protocol (which may, as you point out, have to be tailored for the system being replicated if we want to optimise it).

e.g. You can have two clients sending messages A and B to a topic
simultaneously. Two topic subscriptions can see these messages
in different orders.

All that is required for HA JMS is:
1) You have a singleton location that is in charge of a queue/topic
subscription
2) A client can connect to the singleton location regardless of which
machine it actually initially connects to
3) You replicate/move the messages to the singleton location in a guaranteed way (for persistent messages and durable destinations only), i.e. persist/forward/ack/delete
4) You duplicate the singleton location under the singleton location's control
(for consistency). This can be total replication across every node, buddy replication, or some other mechanism such as a shared database.

That would work as long as your subscribers aren't part of a replica group themselves; if they are, they'd either have to get messages from the same queue/topic, or see messages in the same order. If you don't ensure that the states of replicas remain identical, then consistency and correctness go out the window!

If they are truly independent subscribers (essentially not replicas of one another) then I agree, the order isn't important. This is an example of using application semantics to optimize the protocol though. For example, if you combine transactions and replication, you don't need total ordering either because the transaction system will impose the ordering when necessary.

18. Re: Austin Clustering meeting thoughts

adrian.brock May 8, 2006 6:09 AM (in response to timfox)

"mark.little@jboss.com" wrote:
"adrian@jboss.org" wrote:

It is the guaranteed **ONLY ONCE** delivery that is hard.

Oh and I know it's hard, but solutions have existed for well over a decade (can point you at my PhD if you really want ;-)

Mark.

I mean hard to do efficiently. Not hard as in we don't know how to do it. :-)
Introducing things like total ordering across the cluster is not efficient!

I've always argued that "load balancing" queues is inefficient as well.
It is like trying to "load balance" SFSB or a web session.
i.e. Lots of needless replication when you can just make the
state sticky to a node coupled with some kind of "hot" backup.

19. Re: Austin Clustering meeting thoughts

marklittle May 8, 2006 6:14 AM (in response to timfox)

"adrian@jboss.org" wrote:
"mark.little@jboss.com" wrote:
"adrian@jboss.org" wrote:

It is the guaranteed **ONLY ONCE** delivery that is hard.

Oh and I know it's hard, but solutions have existed for well over a decade (can point you at my PhD if you really want ;-)

Mark.

I mean hard to do efficiently. Not hard as in we don't know how to do it. :-)
Introducing things like total ordering across the cluster is not efficient!

I've always argued that "load balancing" queues is inefficient as well.
It is like trying to "load balance" SFSB or a web session.
i.e. Lots of needless replication when you can just make the
state sticky to a node coupled with some kind of "hot" backup.

I agree. You've got to introduce semantic information (like, as you pointed out, the fact that two different clients can see the same set of messages in different orders) and optimize for the common cases, rather than take the brute-force approach, which would give you something that worked, but ran like a tortoise. Like trying to kill an ant with an atom bomb!

20. Re: Austin Clustering meeting thoughts

adrian.brock May 8, 2006 6:19 AM (in response to timfox)

"mark.little@jboss.com" wrote:
"adrian@jboss.org" wrote:

JMS does not require total ordering, except as it appears
for one client connection/session.

Agreed, but I think the point Tim and Bela were discussing was about replicating the JMS system so that it was highly available. In that case, if you are using an active replication protocol to emulate a highly available, fault-tolerant implementation, all of the replicas MUST receive the same set of messages in the same order (and they MUST start in the same state). Otherwise you don't get deterministic behaviour. This is irrespective of the entity you are trying to replicate (the same is the case for replicating a transaction system, a spreadsheet, a bank account, etc.)

I agree you need consistent behaviour on which nodes control
the "singleton location" and the backups. The only time this
impacts the clients is when the jms cluster is making decisions
to move the singleton to different nodes because of failure or
load balancing concerns.

As long as the "protocol" is correct, you might only get some inefficiency during the move, e.g.
Cluster: decide to move queue from node A to node B
Cluster: tell A the queue is now on B
Client: connect to node A, told to talk to B instead
Client: connect to node B, told to talk to node A (B doesn't know about the change yet)
Cluster: finally tells B it is the singleton in control
Client: connect to node A, told to talk to node B
Client: connect to node B, all is well
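
The sequence above can be sketched in Java as follows. `RedirectDemo`, `tell` and `connect` are hypothetical names, and each node's knowledge is reduced to a single "who owns the queue" entry; this is only an illustration of the redirect behaviour, not how any JBoss cluster implements it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model: each node holds its own belief about which node owns the queue. */
final class RedirectDemo {
    private final Map<String, String> ownerView = new HashMap<>();

    /** All listed nodes start out believing that initialOwner hosts the queue. */
    RedirectDemo(String initialOwner, String... nodes) {
        for (String node : nodes) {
            ownerView.put(node, initialOwner);
        }
    }

    /** The cluster informs one node of the (new) owner. */
    void tell(String node, String newOwner) {
        ownerView.put(node, newOwner);
    }

    /**
     * A client connects to start and follows redirects until some node says
     * "I am the owner". Returns the number of redirects followed, or -1 if
     * the client is still being bounced after maxRedirects.
     */
    int connect(String start, int maxRedirects) {
        String node = start;
        for (int redirects = 0; redirects <= maxRedirects; redirects++) {
            String owner = ownerView.get(node);
            if (owner.equals(node)) {
                return redirects; // all is well
            }
            node = owner; // told to talk to another node instead
        }
        return -1;
    }
}
```

While only A has been told about the move, a client bounces between A and B; once B has been told as well, the client settles on B after one redirect. The cost is extra round trips during the move, not incorrect behaviour.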

21. Re: Austin Clustering meeting thoughts

adrian.brock May 8, 2006 6:22 AM (in response to timfox)

"mark.little@jboss.com" wrote:
"adrian@jboss.org" wrote:

If you have competing senders there is no guarantee how the race
will be resolved so there is no point introducing total ordering.
This is true for one node as it is for clustered nodes.

This is replication for high availability and not for load balancing. Two different things, which can be catered for using replication techniques, but those techniques are different. If you want to do both, then you need a suitable replication protocol (which may, as you point out, have to be tailored for the system being replicated if we want to optimise it).

HA >> load balancing

That is, more people are interested in configuring JMS for guaranteed
delivery behaviour.

Like I said before, once you get into contested queues the idea
of two clients talking to different nodes to remove messages from the
same queue just looks expensive to me.

Better to transparently make the nodes talk to the same singleton
either directly or via proxying.

22. Re: Austin Clustering meeting thoughts

timfox May 8, 2006 6:24 AM (in response to timfox)

"adrian@jboss.org" wrote:
Guys, I know I've told all you before about this, except Mark Little
(and I might have only mentioned it in passing to Tim?).

JMS does not require total ordering, except as it appears
for one client connection/session.

True, but if you want to support competing consumers on a queue (not required by JMS but just about every JMS implementation does it), then if your queue is load balanced across several nodes, and you have one consumer on node A and another on node B then you need to make sure that both consumers don't get the same message when they call receive().

In that case I think you need total ordering so you can ensure that each node gets the receive() call in the same order.

Alternatively you could just have a singleton queue on one node and have each node forward its receive() to that, which is what I think you suggested. That doesn't seem a scalable solution to me, though, since you've then got lots of contention on that single queue instance to retrieve the messages, which will get worse as the number of nodes grows.

Interesting discussion though, and I am sure this is just the start ;)

Need to pack my bags to catch my plane now...

23. Re: Austin Clustering meeting thoughts

adrian.brock May 8, 2006 6:27 AM (in response to timfox)

"mark.little@jboss.com" wrote:

That would work as long as your subscribers aren't part of a replica group themselves; if they are, they'd either have to get messages from the same queue/topic, or see messages in the same order. If you don't ensure that the states of replicas remain identical, then consistency and correctness go out the window!

If they are truly independent subscribers (essentially not replicas of one another) then I agree, the order isn't important. This is an example of using application semantics to optimize the protocol though. For example, if you combine transactions and replication, you don't need total ordering either because the transaction system will impose the ordering when necessary.

That is a different problem. Requiring all topic subscriptions to have
the same ordering is a stronger semantic than JMS provides.
It is a "value add" configuration that must be dealt with at the
topic configuration level.

24. Re: Austin Clustering meeting thoughts

adrian.brock May 8, 2006 6:33 AM (in response to timfox)

"timfox" wrote:
"adrian@jboss.org" wrote:
Guys, I know I've told all you before about this, except Mark Little
(and I might have only mentioned it in passing to Tim?).

JMS does not require total ordering, except as it appears
for one client connection/session.

True, but if you want to support competing consumers on a queue (not required by JMS but just about every JMS implementation does it), then if your queue is load balanced across several nodes, and you have one consumer on node A and another on node B then you need to make sure that both consumers don't get the same message when they call receive().

In that case I think you need total ordering so you can ensure that each node gets the receive() call in the same order.

Alternatively you could just have a singleton queue on one node and have each node forward its receive() to that, which is what I think you suggested. That doesn't seem a scalable solution to me, though, since you've then got lots of contention on that single queue instance to retrieve the messages, which will get worse as the number of nodes grows.

Interesting discussion though, and I am sure this is just the start ;)

Need to pack my bags to catch my plane now...

The problem with load balancing the queue is that you need
to replicate the queue. So you are doing the same work over network
anyway (probably a lot more since you need to slow down the
cluster with the overhead of the locking/ordering guarantee).

In most cases, this is likely to be redundant anyway.
Simply forwarding the client to the singleton means you can
use an in memory lock and messages only go to other nodes
(besides the backup(s)) as required.
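
A rough Java sketch of the forwarding idea, assuming a single owning node and ignoring backups, networking and failure. `SingletonQueue` and `ForwardingNode` are hypothetical names, not classes from JBoss Messaging.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical singleton location: the one node that owns the queue state. */
final class SingletonQueue {
    private final Deque<String> messages = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock(); // plain in-memory lock

    void send(String message) {
        lock.lock();
        try {
            messages.addLast(message);
        } finally {
            lock.unlock();
        }
    }

    /** Hands out the next message to exactly one caller, or null if the queue is empty. */
    String receive() {
        lock.lock();
        try {
            return messages.pollFirst();
        } finally {
            lock.unlock();
        }
    }
}

/** Hypothetical non-owning node: it holds no queue state and only forwards calls. */
final class ForwardingNode {
    private final SingletonQueue owner;

    ForwardingNode(SingletonQueue owner) {
        this.owner = owner;
    }

    /** In a real cluster this would be a remote call to the owning node. */
    String receive() {
        return owner.receive();
    }
}
```

Because every receive() ends up under the same local lock, two consumers attached to different nodes can never be handed the same message, and no cluster-wide ordering protocol is involved. The trade-off raised earlier in the thread, contention on the single owner, still applies.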

25. Re: Austin Clustering meeting thoughts

marklittle May 8, 2006 6:37 AM (in response to timfox)

"adrian@jboss.org" wrote:
"mark.little@jboss.com" wrote:

That would work as long as your subscribers aren't part of a replica group themselves; if they are, they'd either have to get messages from the same queue/topic, or see messages in the same order. If you don't ensure that the states of replicas remain identical, then consistency and correctness go out the window!

If they are truly independent subscribers (essentially not replicas of one another) then I agree, the order isn't important. This is an example of using application semantics to optimize the protocol though. For example, if you combine transactions and replication, you don't need total ordering either because the transaction system will impose the ordering when necessary.

That is a different problem. Requiring all topic subscriptions to have
the same ordering is a stronger semantic than JMS provides.
It is a "value add" configuration that must be dealt with at the
topic configuration level.

I agree that total order is overkill iff the topic subscribers are not related (not replicas of one another). Otherwise, they do need to see the same set of messages in the same order. However, even then you may be able to accomplish this via some deterministic algorithm at the side of the topic/queue combined with reliable and unordered message delivery (but I can see some windows of vulnerability there).

I'm not convinced (and I think this is where we agree) that it's necessarily the behaviour (ordered and reliable) the majority of users want from an HA solution.

26. Re: Austin Clustering meeting thoughts

adrian.brock May 8, 2006 6:39 AM (in response to timfox)

"adrian@jboss.org" wrote:

The problem with load balancing the queue is that you need
to replicate the queue.

In fact, you need to do more than this. You also need to provide
a consistent view of:
1) Who is waiting for messages
2) What messages have been (n)acked such that they are not mistakenly
re-introduced into the queue or re-introduced with the wrong state
(e.g. redelivery count, redelivery delay, etc.)

27. Re: Austin Clustering meeting thoughts

marklittle May 8, 2006 6:39 AM (in response to timfox)

"adrian@jboss.org" wrote:

I agree you need consistent behaviour on which nodes control
the "singleton location" and the backups. The only time this
impacts the clients is when the jms cluster is making decisions
to move the singleton to different nodes because of failure or
load balancing concerns.

As long as the "protocol" is correct, you might only get some inefficiency during the move, e.g.
Cluster: decide to move queue from node A to node B
Cluster: tell A the queue is now on B
Client: connect to node A, told to talk to B instead
Client: connect to node B, told to talk to node A (B doesn't know about the change yet)
Cluster: finally tells B it is the singleton in control
Client: connect to node A, told to talk to node B
Client: connect to node B, all is well

Seems like passive (primary copy) replication would be a much better approach :-)

28. Re: Austin Clustering meeting thoughts

marklittle May 8, 2006 6:41 AM (in response to timfox)

"adrian@jboss.org" wrote:
"mark.little@jboss.com" wrote:
"adrian@jboss.org" wrote:

If you have competing senders there is no guarantee how the race
will be resolved so there is no point introducing total ordering.
This is true for one node as it is for clustered nodes.

This is replication for high availability and not for load balancing. Two different things, which can be catered for using replication techniques, but those techniques are different. If you want to do both, then you need a suitable replication protocol (which may, as you point out, have to be tailored for the system being replicated if we want to optimise it).

HA >> load balancing

That is, more people are interested in configuring JMS for guaranteed
delivery behaviour.

Like I said before, once you get into contested queues the idea
of two clients talking to different nodes to remove messages from the
same queue just looks expensive to me.

Better to transparently make the nodes talk to the same singleton
either directly or via proxying.

I agree, but that doesn't negate the original issue (or my reading of it): how to ensure that the node that is hosting the queue is highly available?

29. Re: Austin Clustering meeting thoughts

adrian.brock May 8, 2006 6:51 AM (in response to timfox)

"mark.little@jboss.com" wrote:

I agree, but that doesn't negate the original issue (or my reading of it): how to ensure that the node that is hosting the queue is highly available?

I think you mean the queue is HA? :-)
The node is just a JVM that can crash at any time.

I said before there needs to be a "hot" backup. That is provided for
either by replication or shared persistent store/logs.

This introduces the other major problem. That of messages/transactions getting temporarily "lost" until the crashed node recovers any
prepared transactions.

e.g. In the store/forward protocol, it is possible that the client
has been told the message will be delivered, but it won't actually
be delivered until the node it sent the message to is recovered
and forwards the message to the real destination.

This does not break the JMS spec, which just guarantees delivery.
It doesn't guarantee when or even that you can send a message
to a queue and then instantaneously re-retrieve the message.
JMS has quite weak "atomic" requirements in that respect.

send() just means it will be delivered at some point.
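
A simplified Java sketch of that store/forward behaviour. `StoreAndForwardNode` is a hypothetical name; an in-memory deque stands in for the persistent store and a plain list stands in for the real destination. Those are assumptions for illustration only, and a real implementation would also have to handle a crash between the forward and the delete.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical node that accepts sends locally and forwards them later. */
final class StoreAndForwardNode {
    private final Deque<String> store = new ArrayDeque<>(); // stand-in for a persistent store
    private final List<String> destination;                 // stand-in for the singleton location
    private boolean up = true;

    StoreAndForwardNode(List<String> destination) {
        this.destination = destination;
    }

    /** Returns once the message is persisted locally; delivery happens at some later point. */
    void send(String message) {
        store.addLast(message);
    }

    void crash() {
        up = false;
    }

    /** On recovery the node forwards whatever it had accepted before the crash. */
    void recover() {
        up = true;
        forwardPending();
    }

    /** persist/forward/ack/delete: a message leaves the store only after the forward returns. */
    void forwardPending() {
        if (!up) {
            return; // a crashed node forwards nothing; its messages are temporarily "lost"
        }
        while (!store.isEmpty()) {
            destination.add(store.peekFirst()); // forward; a normal return is treated as the ack
            store.removeFirst();                // delete only after the ack
        }
    }
}
```

A message accepted just before a crash is invisible at the destination until recover() runs, which matches the "delivered at some point" reading of send().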