43 Replies. Latest reply on May 10, 2006 9:14 AM by marklittle
      • 15. Re: Austin Clustering meeting thoughts
        marklittle

        You'd think it's trivial, but the number of implementations (commercial and OSS) that get it wrong has been surprising. By "wrong" I mean: over-engineered. Best effort doesn't even require an ack from the receiver: you can fabricate the ack (if one is needed) at the sender, because ultimately the sender won't be able to tell the difference between a situation where the message was delivered but the receiver subsequently crashed and the situation where the message was never delivered (e.g., because the sender dropped it on the floor due to flow-control issues).
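To make the fabricated-ack point concrete, here is a minimal sketch in Java. `Transport`, `BestEffortSender` and `send` are hypothetical names made up for illustration, not part of any JBoss or JMS API; the only thing it shows is that the sender's "ack" carries no information about delivery.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical transport: may silently drop a message. */
interface Transport {
    void transmit(String message);
}

/** Best-effort sender that fabricates the ack locally instead of waiting for the receiver. */
final class BestEffortSender {
    private final Transport transport;

    BestEffortSender(Transport transport) {
        this.transport = transport;
    }

    /** Always returns true: the "ack" says nothing about whether the message arrived. */
    boolean send(String message) {
        try {
            transport.transmit(message);
        } catch (RuntimeException dropped) {
            // Best effort: a failed or dropped transmission is not reported to the caller.
        }
        return true; // fabricated ack
    }
}

final class BestEffortDemo {
    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        // Assumed lossy link: anything starting with "drop" never reaches the receiver.
        Transport lossy = m -> { if (!m.startsWith("drop")) received.add(m); };
        BestEffortSender sender = new BestEffortSender(lossy);
        System.out.println(sender.send("hello"));   // true
        System.out.println(sender.send("drop me")); // true as well
        System.out.println(received);               // [hello]
    }
}
```

Both calls report success even though only one message arrived, which is all that best effort promises.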

        • 16. Re: Austin Clustering meeting thoughts
          marklittle

           

          "adrian@jboss.org" wrote:

          It is the guaranteed **ONLY ONCE** delivery that is hard.


          Oh and I know it's hard, but solutions have existed for well over a decade (can point you at my PhD if you really want ;-)

          Mark.


          • 17. Re: Austin Clustering meeting thoughts
            marklittle

             

            "adrian@jboss.org" wrote:
            Guys, I know I've told all you before about this, except Mark Little
            (and I might have only mentioned it in passing to Tim?).

            JMS does not require total ordering, except as it appears
            for one client connection/session.


            Agreed, but I think the point Tim and Bela were discussing was about replicating the JMS system so that it was highly available. In that case, if you are using an active replication protocol to emulate a highly available, fault-tolerant implementation, all of the replicas MUST receive the same set of messages in the same order (and they MUST start in the same state). Otherwise you don't get deterministic behaviour. This is irrespective of the entity you are trying to replicate (the same is the case for replicating a transaction system, a spreadsheet, a bank account, etc.)
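A toy illustration of why active replication needs the same messages in the same order, sketched in Java. `AccountReplica` and its two message types are invented for the example and stand in for any deterministic state machine.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical deterministic replica: its state depends only on the start state and the message order. */
final class AccountReplica {
    private long balance;

    AccountReplica(long initialBalance) {
        this.balance = initialBalance;
    }

    /** Applies one message: "deposit:N" adds N, "double" doubles the balance. */
    void apply(String message) {
        if (message.startsWith("deposit:")) {
            balance += Long.parseLong(message.substring("deposit:".length()));
        } else if (message.equals("double")) {
            balance *= 2;
        } else {
            throw new IllegalArgumentException("unknown message: " + message);
        }
    }

    long balance() {
        return balance;
    }

    /** Starts a fresh replica in the given state and feeds it the messages in list order. */
    static long replay(long initialBalance, List<String> messages) {
        AccountReplica replica = new AccountReplica(initialBalance);
        for (String m : messages) {
            replica.apply(m);
        }
        return replica.balance();
    }

    public static void main(String[] args) {
        List<String> order1 = Arrays.asList("deposit:10", "double");
        List<String> order2 = Arrays.asList("double", "deposit:10");
        System.out.println(replay(100, order1)); // 220
        System.out.println(replay(100, order1)); // 220: same start state, same order, same result
        System.out.println(replay(100, order2)); // 210: same set of messages, different order
    }
}
```

Two replicas that see the same set of messages in different orders end up in different states, so neither can take over from the other.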


            If you have competing senders there is no guarantee how the race
            will be resolved so there is no point introducing total ordering.
            This is true for one node as it is for clustered nodes.


            This is replication for high availability and not for load balancing. Two different things, which can be catered for using replication techniques, but those techniques are different. If you want to do both, then you need a suitable replication protocol (which may, as you point out, have to be tailored for the system being replicated if we want to optimise it).


            e.g. You can have two clients sending messages A and B to a topic
            simultaneously. Two topic subscriptions can see these messages
            in different orders.

            All that is required for HA JMS is:
            1) You have a singleton location that is in charge of a queue/topic
            subscription
            2) A client can connect to the singleton location regardless of which
            machine it actually initially connects to
            3) You replicate/move the messages to the singleton location in a guaranteed way (for persistent messages and durable destinations only). i.e. persist/forward/ack/delete
            4) You duplicate the singleton location under the singleton location's control
            (for consistency) this can either be total replication across every node or it could be buddy replication or some other mechanism like shared database.


            That would work as long as your subscribers aren't part of a replica group themselves; if they are, they'd either have to get messages from the same queue/topic, or see messages in the same order. If you don't ensure that the states of replicas remain identical, then consistency and correctness go out the window!

            If they are truly independent subscribers (essentially not replicas of one another) then I agree, the order isn't important. This is an example of using application semantics to optimize the protocol though. For example, if you combine transactions and replication, you don't need total ordering either because the transaction system will impose the ordering when necessary.

            • 18. Re: Austin Clustering meeting thoughts

               

              "mark.little@jboss.com" wrote:
              "adrian@jboss.org" wrote:

              It is the guaranteed **ONLY ONCE** delivery that is hard.


              Oh and I know it's hard, but solutions have existed for well over a decade (can point you at my PhD if you really want ;-)

              Mark.


              I mean hard to do efficiently. Not hard as in we don't know how to do it. :-)
              Introducing things like total ordering across the cluster is not efficient!

              I've always argued that "load balancing" queues is inefficient as well.
              It is like trying to "load balance" SFSB or a web session.
              i.e. Lots of needless replication when you can just make the
              state sticky to a node coupled with some kind of "hot" backup.

              • 19. Re: Austin Clustering meeting thoughts
                marklittle

                 

                "adrian@jboss.org" wrote:
                "mark.little@jboss.com" wrote:
                "adrian@jboss.org" wrote:

                It is the guaranteed **ONLY ONCE** delivery that is hard.


                Oh and I know it's hard, but solutions have existed for well over a decade (can point you at my PhD if you really want ;-)

                Mark.


                I mean hard to do efficiently. Not hard as in we don't know how to do it. :-)
                Introducing things like total ordering across the cluster is not efficient!

                I've always argued that "load balancing" queues is inefficient as well.
                It is like trying to "load balance" SFSB or a web session.
                i.e. Lots of needless replication when you can just make the
                state sticky to a node coupled with some kind of "hot" backup.


                I agree. You've got to introduce semantic information (like, as you pointed out, the fact that two different clients can see the same set of messages in different orders) and optimize for the common cases, rather than take the brute-force approach, which would give you something that worked, but ran like a tortoise. Like trying to kill an ant with an atom bomb!

                • 20. Re: Austin Clustering meeting thoughts

                   

                  "mark.little@jboss.com" wrote:
                  "adrian@jboss.org" wrote:

                  JMS does not require total ordering, except as it appears
                  for one client connection/session.


                  Agreed, but I think the point Tim and Bela were discussing was about replicating the JMS system so that it was highly available. In that case, if you are using an active replication protocol to emulate a highly available, fault-tolerant implementation, all of the replicas MUST receive the same set of messages in the same order (and they MUST start in the same state). Otherwise you don't get deterministic behaviour. This is irrespective of the entity you are trying to replicate (the same is the case for replicating a transaction system, a spreadsheet, a bank account, etc.)


                  I agree you need consistent behaviour on which nodes control
                  the "singleton location" and the backups. The only time this
                  impacts the clients is when the jms cluster is making decisions
                  to move the singleton to different nodes because of failure or
                  load balancing concerns.

                  As long as the "protocol" is correct, you might only get some inefficiency during the move, e.g.
                  Cluster: decide to move queue from node A to node B
                  Cluster: tell A the queue is now on B
                  Client: connect to node A, told to talk to B instead
                  Client: connect to node B, told to talk to node A (B doesn't know about the change yet)
                  Cluster: finally tells B it is the singleton in control
                  Client: connect to node A, told to talk to node B
                  Client: connect to node B, all is well
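The trace above can be sketched roughly as follows, in Java. `RedirectingCluster`, `setView` and `connect` are hypothetical names; the sketch only models each node's (possibly stale) belief about who owns the queue and a client that follows redirects for a bounded number of hops.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: each node has its own, possibly stale, view of which node owns the queue. */
final class RedirectingCluster {
    // node name -> the node that this node currently believes owns the queue
    private final Map<String, String> ownerView = new HashMap<>();

    void setView(String node, String believedOwner) {
        ownerView.put(node, believedOwner);
    }

    /**
     * Follows redirects starting at firstNode until some node claims ownership itself.
     * Returns the owning node, or null if no node accepted within maxHops.
     */
    String connect(String firstNode, int maxHops) {
        String node = firstNode;
        for (int hop = 0; hop < maxHops; hop++) {
            String owner = ownerView.get(node);
            if (owner == null) {
                throw new IllegalArgumentException("unknown node: " + node);
            }
            if (node.equals(owner)) {
                return node; // this node believes it is the singleton: all is well
            }
            node = owner;    // told to talk to another node instead
        }
        return null;         // views have not converged yet; the caller may retry later
    }

    public static void main(String[] args) {
        RedirectingCluster cluster = new RedirectingCluster();
        cluster.setView("A", "B"); // A has been told the queue is now on B
        cluster.setView("B", "A"); // B does not know about the change yet
        System.out.println(cluster.connect("A", 4)); // null: the client bounces between A and B
        cluster.setView("B", "B"); // the cluster finally tells B it is in control
        System.out.println(cluster.connect("A", 4)); // B
    }
}
```

The bounce is only an inefficiency: no node ever serves the queue while believing somebody else owns it.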

                  • 21. Re: Austin Clustering meeting thoughts

                     

                    "mark.little@jboss.com" wrote:
                    "adrian@jboss.org" wrote:

                    If you have competing senders there is no guarantee how the race
                    will be resolved so there is no point introducing total ordering.
                    This is true for one node as it is for clustered nodes.


                    This is replication for high availability and not for load balancing. Two different things, which can be catered for using replication techniques, but those techniques are different. If you want to do both, then you need a suitable replication protocol (which may, as you point out, have to be tailored for the system being replicated if we want to optimise it).


                    HA >> load balancing

                    That is, more people are interested in configuring JMS for guaranteed
                    delivery behaviour.

                    Like I said before, once you get into contested queues the idea
                    of two clients talking to different nodes to remove messages from the
                    same queue just looks expensive to me.

                    Better to transparently make the nodes talk to the same singleton
                    either directly or via proxying.

                    • 22. Re: Austin Clustering meeting thoughts
                      timfox

                       

                      "adrian@jboss.org" wrote:
                      Guys, I know I've told all you before about this, except Mark Little
                      (and I might have only mentioned it in passing to Tim?).

                      JMS does not require total ordering, except as it appears
                      for one client connection/session.


                      True, but if you want to support competing consumers on a queue (not required by JMS but just about every JMS implementation does it), then if your queue is load balanced across several nodes, and you have one consumer on node A and another on node B, then you need to make sure that both consumers don't get the same message when they call receive().

                      In that case I think you need total ordering so you can ensure that each node gets the receive() call in the same order.

                      Alternatively you could just have a singleton queue on one node and each node forwards its receive() to that - which is what I think you suggested, but that doesn't seem a scalable solution to me since you've then got lots of contention on that single queue instance to retrieve the messages, which will degrade with the number of nodes.

                      Interesting discussion though, and I am sure this is just the start ;)

                      Need to pack my bags to catch my plane now...

                      • 23. Re: Austin Clustering meeting thoughts

                         

                        "mark.little@jboss.com" wrote:

                        That would work as long as your subscribers aren't part of a replica group themselves; if they are, they'd either have to get messages from the same queue/topic, or see messages in the same order. If you don't ensure that the states of replicas remain identical, then consistency and correctness go out the window!

                        If they are truly independent subscribers (essentially not replicas of one another) then I agree, the order isn't important. This is an example of using application semantics to optimize the protocol though. For example, if you combine transactions and replication, you don't need total ordering either because the transaction system will impose the ordering when necessary.


                        That is a different problem. Requiring all topic subscriptions to have
                        the same ordering is a stronger semantic than JMS provides.
                        It is a "value add" configuration that must be dealt with at the
                        topic configuration level.

                        • 24. Re: Austin Clustering meeting thoughts

                           

                          "timfox" wrote:
                          "adrian@jboss.org" wrote:
                          Guys, I know I've told all you before about this, except Mark Little
                          (and I might have only mentioned it in passing to Tim?).

                          JMS does not require total ordering, except as it appears
                          for one client connection/session.


                          True, but if you want to support competing consumers on a queue (not required by JMS but just about every JMS implementation does it), then if your queue is load balanced across several nodes, and you have one consumer on node A and another on node B, then you need to make sure that both consumers don't get the same message when they call receive().

                          In that case I think you need total ordering so you can ensure that each node gets the receive() call in the same order.

                          Alternatively you could just have a singleton queue on one node and each node forwards its receive() to that - which is what I think you suggested, but that doesn't seem a scalable solution to me since you've then got lots of contention on that single queue instance to retrieve the messages, which will degrade with the number of nodes.

                          Interesting discussion though, and I am sure this is just the start ;)

                          Need to pack my bags to catch my plane now...


                          The problem with load balancing the queue is that you need
                          to replicate the queue. So you are doing the same work over network
                          anyway (probably a lot more since you need to slow down the
                          cluster with the overhead of the locking/ordering guarantee).

                          In most cases, this is likely to be redundant anyway.
                          Simply forwarding the client to the singleton means you can
                          use an in memory lock and messages only go to other nodes
                          (besides the backup(s)) as required.
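Roughly what forwarding to the singleton looks like, as a Java sketch. `SingletonQueue` and `ForwardingNode` are hypothetical names, the remote hop is replaced by a plain method call, and the backups are left out entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical singleton queue: one in-memory lock decides which consumer gets each message. */
final class SingletonQueue {
    private final Deque<String> messages = new ArrayDeque<>();

    synchronized void send(String message) {
        messages.addLast(message);
    }

    /** Returns the next message, or null if the queue is empty. Each message is handed out once. */
    synchronized String receive() {
        return messages.pollFirst();
    }
}

/** Hypothetical cluster node that does not host the queue and simply forwards to the singleton. */
final class ForwardingNode {
    private final SingletonQueue singleton;

    ForwardingNode(SingletonQueue singleton) {
        this.singleton = singleton;
    }

    String receive() {
        return singleton.receive(); // in a real cluster this would be a remote call
    }
}

final class SingletonQueueDemo {
    public static void main(String[] args) {
        SingletonQueue queue = new SingletonQueue();
        queue.send("m1");
        queue.send("m2");
        ForwardingNode nodeA = new ForwardingNode(queue);
        ForwardingNode nodeB = new ForwardingNode(queue);
        // Competing consumers on different nodes never see the same message.
        System.out.println(nodeA.receive()); // m1
        System.out.println(nodeB.receive()); // m2
        System.out.println(nodeA.receive()); // null
    }
}
```

The ordering decision is a local `synchronized` block rather than a cluster-wide protocol, at the cost of every receive() going through one node.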

                          • 25. Re: Austin Clustering meeting thoughts
                            marklittle

                             

                            "adrian@jboss.org" wrote:
                            "mark.little@jboss.com" wrote:

                            That would work as long as your subscribers aren't part of a replica group themselves; if they are, they'd either have to get messages from the same queue/topic, or see messages in the same order. If you don't ensure that the states of replicas remain identical, then consistency and correctness go out the window!

                            If they are truly independent subscribers (essentially not replicas of one another) then I agree, the order isn't important. This is an example of using application semantics to optimize the protocol though. For example, if you combine transactions and replication, you don't need total ordering either because the transaction system will impose the ordering when necessary.


                            That is a different problem. Requiring all topic subscriptions to have
                            the same ordering is a stronger semantic than JMS provides.
                            It is a "value add" configuration that must be dealt with at the
                            topic configuration level.


                            I agree that total order is overkill iff the topic subscribers are not related (not replicas of one another). Otherwise, they do need to see the same set of messages in the same order. However, even then you may be able to accomplish this via some deterministic algorithm at the side of the topic/queue combined with reliable and unordered message delivery (but I can see some windows of vulnerability there).

                            I'm not convinced (and I think this is where we agree) that it's necessarily the behaviour (ordered and reliable) the majority of users want from an HA solution.
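One possible reading of "a deterministic algorithm at the side of the topic combined with reliable, unordered delivery" is sequence numbers assigned at the topic and a reorder buffer at each subscriber. The Java sketch below assumes exactly that; `ReorderBuffer` and `onArrival` are hypothetical names, and it does not address the windows of vulnerability (for example, the numbering node failing).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical subscriber-side buffer: delivers messages in the order of topic-assigned sequence numbers. */
final class ReorderBuffer {
    private final Map<Long, String> pending = new HashMap<>();
    private final List<String> delivered = new ArrayList<>();
    private long nextExpected = 0;

    /** Accepts a message in any arrival order; duplicates and already delivered numbers are ignored. */
    void onArrival(long sequence, String payload) {
        if (sequence < nextExpected || pending.containsKey(sequence)) {
            return;
        }
        pending.put(sequence, payload);
        // Deliver every message that is now contiguous with what was already delivered.
        String next;
        while ((next = pending.remove(nextExpected)) != null) {
            delivered.add(next);
            nextExpected++;
        }
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        ReorderBuffer replica1 = new ReorderBuffer();
        replica1.onArrival(2, "c");
        replica1.onArrival(0, "a");
        replica1.onArrival(1, "b");

        ReorderBuffer replica2 = new ReorderBuffer();
        replica2.onArrival(1, "b");
        replica2.onArrival(2, "c");
        replica2.onArrival(0, "a");

        System.out.println(replica1.delivered()); // [a, b, c]
        System.out.println(replica2.delivered()); // [a, b, c]
    }
}
```

Both replicas deliver in the same order even though the network handed them the messages in different orders.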

                            • 26. Re: Austin Clustering meeting thoughts

                               

                              "adrian@jboss.org" wrote:

                              The problem with load balancing the queue is that you need
                              to replicate the queue.


                              In fact, you need to do more than this. You also need to provide
                              a consistent view of:
                              1) Who is waiting for messages
                              2) What messages have been (n)acked such that they are not mistakenly
                              re-introduced into the queue or re-introduced with the wrong state
                              (e.g. redelivery count, redelivery delay, etc.)
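To show the kind of state that has to stay consistent, here is a single-node sketch in Java. `DeliveryTracker`, `Delivery` and the method names are hypothetical; a load-balanced queue would need every node to agree on the contents of both collections.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical bookkeeping for one queue: what is ready, what is in flight, and how often it was redelivered. */
final class DeliveryTracker {
    static final class Delivery {
        final String id;
        final int redeliveryCount;

        Delivery(String id, int redeliveryCount) {
            this.id = id;
            this.redeliveryCount = redeliveryCount;
        }
    }

    private final Deque<Delivery> ready = new ArrayDeque<>();
    private final Map<String, Delivery> inFlight = new HashMap<>();

    void send(String id) {
        ready.addLast(new Delivery(id, 0));
    }

    /** Hands out the next message and remembers it as unacknowledged; null if nothing is ready. */
    Delivery receive() {
        Delivery d = ready.pollFirst();
        if (d != null) {
            inFlight.put(d.id, d);
        }
        return d;
    }

    /** Ack: the message is finished and must never be re-introduced. */
    boolean ack(String id) {
        return inFlight.remove(id) != null;
    }

    /** Nack: the message goes back to the head of the queue with its redelivery count incremented. */
    boolean nack(String id) {
        Delivery d = inFlight.remove(id);
        if (d == null) {
            return false; // unknown or already settled: do not re-introduce it
        }
        ready.addFirst(new Delivery(d.id, d.redeliveryCount + 1));
        return true;
    }

    public static void main(String[] args) {
        DeliveryTracker tracker = new DeliveryTracker();
        tracker.send("m1");
        System.out.println(tracker.receive().redeliveryCount); // 0
        System.out.println(tracker.nack("m1"));                // true
        System.out.println(tracker.receive().redeliveryCount); // 1
        System.out.println(tracker.ack("m1"));                 // true
        System.out.println(tracker.nack("m1"));                // false: already acked
    }
}
```

If two nodes disagreed about `inFlight`, an acked message could come back, or a nacked one could come back with the wrong count.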

                              • 27. Re: Austin Clustering meeting thoughts
                                marklittle

                                 

                                "adrian@jboss.org" wrote:

                                I agree you need consistent behaviour on which nodes control
                                the "singleton location" and the backups. The only time this
                                impacts the clients is when the jms cluster is making decisions
                                to move the singleton to different nodes because of failure or
                                load balancing concerns.

                                As long as the "protocol" is correct, you might only get some inefficiency during the move, e.g.
                                Cluster: decide to move queue from node A to node B
                                Cluster: tell A the queue is now on B
                                Client: connect to node A, told to talk to B instead
                                Client: connect to node B, told to talk to node A (B doesn't know about the change yet)
                                Cluster: finally tells B it is the singleton in control
                                Client: connect to node A, told to talk to node B
                                Client: connect to node B, all is well


                                Seems like passive (primary copy) replication would be a much better approach :-)
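For anyone unfamiliar with the term, a very small sketch of passive (primary copy) replication in Java. `PrimaryCopyQueue` and its methods are hypothetical, both copies live in one process purely for illustration, and failure detection is reduced to a method call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical primary-copy sketch: only the primary serves requests, the backup just receives updates. */
final class PrimaryCopyQueue {
    private final List<String> primary = new ArrayList<>();
    private final List<String> backup = new ArrayList<>();
    private boolean primaryAlive = true;

    /** Each update is copied to the backup before the call returns to the client. */
    void send(String message) {
        if (primaryAlive) {
            primary.add(message);
            backup.add(message); // checkpoint to the backup before acknowledging
        } else {
            backup.add(message); // the promoted backup now acts as the primary
        }
    }

    /** Simulates losing the primary together with its in-memory state. */
    void crashPrimary() {
        primaryAlive = false;
        primary.clear();
    }

    /** Returns a copy of the messages held by whichever copy is currently serving. */
    List<String> contents() {
        return new ArrayList<>(primaryAlive ? primary : backup);
    }

    public static void main(String[] args) {
        PrimaryCopyQueue queue = new PrimaryCopyQueue();
        queue.send("a");
        queue.send("b");
        queue.crashPrimary();
        queue.send("c");
        System.out.println(queue.contents()); // [a, b, c]
    }
}
```

Because only one copy processes requests at a time, the replicas never need to agree on an order among themselves.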

                                • 28. Re: Austin Clustering meeting thoughts
                                  marklittle

                                   

                                  "adrian@jboss.org" wrote:
                                  "mark.little@jboss.com" wrote:
                                  "adrian@jboss.org" wrote:

                                  If you have competing senders there is no guarantee how the race
                                  will be resolved so there is no point introducing total ordering.
                                  This is true for one node as it is for clustered nodes.


                                  This is replication for high availability and not for load balancing. Two different things, which can be catered for using replication techniques, but those techniques are different. If you want to do both, then you need a suitable replication protocol (which may, as you point out, have to be tailored for the system being replicated if we want to optimise it).


                                  HA >> load balancing

                                  That is, more people are interested in configuring JMS for guaranteed
                                  delivery behaviour.

                                  Like I said before, once you get into contested queues the idea
                                  of two clients talking to different nodes to remove messages from the
                                  same queue just looks expensive to me.

                                  Better to transparently make the nodes talk to the same singleton
                                  either directly or via proxying.


                                  I agree, but that doesn't negate the original issue (or my reading of it): how to ensure that the node that is hosting the queue is highly available?

                                  • 29. Re: Austin Clustering meeting thoughts

                                     

                                    "mark.little@jboss.com" wrote:

                                    I agree, but that doesn't negate the original issue (or my reading of it): how to ensure that the node that is hosting the queue is highly available?


                                    I think you mean the queue is HA? :-)
                                    The node is just a JVM that can crash at any time.

                                    I said before there needs to be a "hot" backup. That is provided for
                                    either by replication or shared persistent store/logs.

                                    This introduces the other major problem. That of messages/transactions getting temporarily "lost" until the crashed node recovers any
                                    prepared transactions.

                                    e.g. In the store/forward protocol, it is possible that the client
                                    has been told the message will be delivered, but it won't actually
                                    be delivered until the node it sent the message to has recovered
                                    and forwards the message to the real destination.

                                    This does not break the JMS spec, which just guarantees delivery.
                                    It doesn't guarantee when or even that you can send a message
                                    to a queue and then instantaneously re-retrieve the message.
                                    JMS has quite weak "atomic" requirements in that respect.

                                    send() just means it will be delivered at some point.
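A sketch of the temporarily "lost" message in the store/forward case, in Java. `StoreAndForwardNode` is a hypothetical name, an in-memory list stands in for the persistent store, and a plain list stands in for the real destination.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node implementing persist/forward/ack/delete against a remote destination. */
final class StoreAndForwardNode {
    private final List<String> store = new ArrayList<>(); // stands in for a persistent log
    private final List<String> destination;               // stands in for the singleton location
    private boolean up = true;

    StoreAndForwardNode(List<String> destination) {
        this.destination = destination;
    }

    /** Persists the message and returns; from here on the client has been told it will be delivered. */
    void send(String message) {
        if (!up) {
            throw new IllegalStateException("node is down");
        }
        store.add(message); // persist
    }

    /** Forwards stored messages; each one is deleted locally only after the destination accepted it. */
    void forwardPending() {
        if (!up) {
            return; // nothing moves while the node is down
        }
        while (!store.isEmpty()) {
            String message = store.get(0);
            destination.add(message); // forward; returning normally stands in for the ack
            store.remove(0);          // delete
        }
    }

    void crash() {
        up = false; // the persistent store is assumed to survive the crash
    }

    void recover() {
        up = true;
    }

    public static void main(String[] args) {
        List<String> realQueue = new ArrayList<>();
        StoreAndForwardNode node = new StoreAndForwardNode(realQueue);
        node.send("m1");   // client is told the message will be delivered
        node.crash();
        node.forwardPending();
        System.out.println(realQueue); // []: accepted but not yet visible at the destination
        node.recover();
        node.forwardPending();
        System.out.println(realQueue); // [m1]
    }
}
```

The sketch ignores a crash between the forward and the delete, which would resend the message after recovery; that gap is where the only-once guarantee gets hard.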