22 Replies, latest reply on Jun 1, 2006 10:22 AM by timfox

    Non persistent messages and 2PC multicast

    timfox

      As discussed in Austin, if a client is attached to a non-active node and sends a message to a destination whose active node is a different node, and that active node also has a buddy replica, then we should multicast the message across the LAN so that both the active node and the backup can pick it up.

      There's a possible race condition here: the message arrives at the active node and is consumed and acknowledged before it arrives at the buddy, so it gets removed from the active node and then, some time later, arrives at the buddy.

      The active node then fails over onto the buddy, where the already-consumed message is still present and gets redelivered.

      The solution we discussed for this was to use a 2PC protocol to make sure the message arrives at all participants before continuing.

      In my notes I have it down as "only applies to persistent messages", but I think this has to apply to non persistent messages too, since the race condition can still occur there.

      Any comments?

      I'd like to avoid having to multicast 2PC for non persistent messages too for performance reasons.

        • 1. Re: Non persistent messages and 2PC multicast
          marklittle

          How about a flag on the volatile messages so the sender can say whether or not they are idempotent? If they are, then redoing the work on the backup shouldn't be a problem. If they aren't idempotent, then you need to do something else. Can you multicast the response from the primary so that the backup also sees it?

          • 2. Re: Non persistent messages and 2PC multicast

             


            How about a flag on the volatile messages so the sender can say whether or not they are idempotent?


            It is the receiver that specifies their Idempotency
            javax.jms.Session.DUPS_OK_ACKNOWLEDGE


            Can you multicast the response from the primary so that the backup also sees it?


            Only the sender gets the acks back from the multicast.
            Here we are talking about one receiver (the master) knowing
            whether another (the buddy) has a copy before it delivers the
            message to the client.

            Without that you get:

            sender -> multicast
            master -> receives multicast
            master -> ack message to sender
            master -> deliver to client
            client -> ack to master
            master -> replicate client's ack to buddy
            buddy -> what are you talking about?
            buddy -> finally processes original multicast

            This is an example where total ordering is useful.
            However 2PR (2 phase replication) also works

            sender -> multicast prepare
            master -> receives multicast
            master -> ack message to sender
            master -> wait for confirmation
            buddy -> receives multicast
            buddy -> ack message to sender
            sender -> multicast commit
            master -> receives commit and delivers to the client
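
The 2PR sequence above can be sketched as a small in-memory Java simulation. This is only an illustration under stated assumptions: `Replica`, `TwoPhaseSender`, `prepare` and `commit` are hypothetical names, not JBoss Messaging API, and multicast is modeled as a plain loop over the replicas.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical replica (master or buddy); not a JBoss Messaging class.
class Replica {
    final String name;
    final Map<String, String> prepared = new HashMap<>();  // received but not yet visible
    final List<String> deliverable = new ArrayList<>();    // visible to consumers after commit

    Replica(String name) {
        this.name = name;
    }

    // Phase 1: store the message and ack back to the sender. Nothing is delivered yet.
    boolean prepare(String id, String body) {
        prepared.put(id, body);
        return true;
    }

    // Phase 2: the message becomes deliverable only once the commit arrives.
    void commit(String id) {
        String body = prepared.remove(id);
        if (body != null) {
            deliverable.add(body);
        }
    }
}

class TwoPhaseSender {
    // Multicast is modeled as a loop. Returns false if any replica fails to ack
    // the prepare; a real implementation would also need timeouts and an abort path.
    static boolean send(String id, String body, List<Replica> replicas) {
        for (Replica r : replicas) {      // "multicast prepare" and collect acks
            if (!r.prepare(id, body)) {
                return false;
            }
        }
        for (Replica r : replicas) {      // "multicast commit"
            r.commit(id);
        }
        return true;
    }
}
```

The point of the sketch is that the master cannot hand a message to a client until the sender has seen acks from both the master and the buddy, which closes the race in the first sequence.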


            • 3. Re: Non persistent messages and 2PC multicast
              timfox

               

              "adrian@jboss.org" wrote:


              It is the receiver that specifies their Idempotency
              javax.jms.Session.DUPS_OK_ACKNOWLEDGE



              Right, but I guess we could implement a JBoss Messaging specific feature, where the sender specifies that duplicates might happen.

              E.g. DeliveryMode.DUPS_OK
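
As a rough sketch of how such a sender-side flag might drive the choice of replication protocol: the `DUPS_OK` mode below is hypothetical (standard JMS `javax.jms.DeliveryMode` defines only `PERSISTENT` and `NON_PERSISTENT`), and all type names are made up for illustration.

```java
// Hypothetical delivery modes; DUPS_OK does not exist in standard JMS.
enum SketchDeliveryMode { PERSISTENT, NON_PERSISTENT, DUPS_OK }

// The two ways of getting a message to the active node and its buddy.
enum ReplicationStrategy { PLAIN_MULTICAST, TWO_PHASE_MULTICAST }

class ReplicationPolicy {
    // Only senders that explicitly tolerate duplicates skip the two-phase
    // protocol; persistent and non-persistent messages both keep it, because
    // the redelivery race applies to both.
    static ReplicationStrategy choose(SketchDeliveryMode mode) {
        return mode == SketchDeliveryMode.DUPS_OK
                ? ReplicationStrategy.PLAIN_MULTICAST
                : ReplicationStrategy.TWO_PHASE_MULTICAST;
    }
}
```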

              • 4. Re: Non persistent messages and 2PC multicast
                marklittle

                Total ordering is necessary; I was just wondering whether there is a way to do it only when you have to, by exploiting application-specific semantics.

                • 5. Re: Non persistent messages and 2PC multicast
                  timfox

                  Ouch.

                  I don't even think 2PC is sufficient since that won't guarantee total ordering.

                  If different nodes each multicast their messages via 2PC to the active and the buddy, both the active and the buddy need to receive them in the same order, which won't necessarily be the case.

                  So this means we'd need multicast plus a total order protocol...

                  • 6. Re: Non persistent messages and 2PC multicast
                    timfox

                    If we did passive replication rather than active we could avoid the total ordering.

                    So node A sends message to active node. Then active node synchronously sends message to replica(s) before returning.

                    This is more network traffic and more latency though...
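
A minimal sketch of the passive replication idea, assuming an in-memory model in which the active node forwards each message to its replicas before the send returns. `ActiveNode` and `PassiveReplica` are hypothetical names, and a direct method call stands in for the network hop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replica that only ever hears from the active node.
class PassiveReplica {
    final List<String> queue = new ArrayList<>();

    void replicate(String msg) {
        queue.add(msg);
    }
}

class ActiveNode {
    final List<String> queue = new ArrayList<>();
    private final List<PassiveReplica> replicas;

    ActiveNode(List<PassiveReplica> replicas) {
        this.replicas = replicas;
    }

    // Returns only after every replica has stored the message. Because all
    // sends pass through this one synchronized method, the replicas see the
    // messages in exactly the order the active node accepted them.
    synchronized void send(String msg) {
        queue.add(msg);
        for (PassiveReplica r : replicas) {
            r.replicate(msg);
        }
    }
}
```

The extra hop (sender to active, then active to replica) is the additional traffic and latency mentioned above; in exchange, the active node serializes everything and no separate total order protocol is needed.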

                    • 7. Re: Non persistent messages and 2PC multicast

                       

                      "timfox" wrote:

                      I don't even think 2PC is sufficient since that won't guarantee total ordering.


                      You don't need total ordering if you have 2PR.
                      Each node has a view of the state (including persisting it) from the prepare.
                      Nothing is done externally until the commit.

                      Yes, the commit on the buddy might race with the ack from the client via
                      the master, but that is easily dealt with if you have a well defined state
                      machine.

                      Remember, as far as the client is concerned it hasn't fully acked the
                      message until the invocation on the server returns.

                      client -> ack the message
                      master -> replicate ack to buddy and update internal state (again 2PR)
                      master -> return to client
                      * Only at this point does the client know there won't be a redelivery.

                      Of course for non persistent messages and DUPS_OK these rules
                      can be relaxed. But only as long as the server(s) don't get into a
                      confused state.

                      • 8. Re: Non persistent messages and 2PC multicast
                        timfox

                         

                        "adrian@jboss.org" wrote:


                        You don't need total ordering if you have 2PR.


                        What about the following situation:

                        I have 4 nodes.

                        node A and B are inactive, node C has the active, and node D has the replica (buddy)

                        I send a message m1 from node A.

                        Prepare(m1) gets multicast.
                        commit (m1) gets multicast

                        Around the same time I send a message m2 from node B.

                        Prepare(m2) gets multicast
                        Commit (m2) gets multicast



                        node C receives this:

                        prepare (m1)
                        prepare (m2)
                        commit (m1)
                        commit (m2)

                        node D receives this:

                        prepare (m2)
                        prepare (m1)
                        commit (m2)
                        commit (m1)

                        Can this happen?

                        If so, then we need total ordering surely...

                        • 9. Re: Non persistent messages and 2PC multicast

                          Yes it can happen, but it is OK.

                          It doesn't matter that the order changed; the messages came from
                          different senders, so their relative order cannot matter. There is a race
                          condition here anyway, even if you have only one JMS server,
                          i.e. which of the servers A and B gets to send its message first.

                          Total ordering across multiple sender sessions is not a required
                          JMS semantic. It is only required if the sends come from the same
                          session/transaction.

                          Can we put this in the FAQ somewhere? I keep having to repeat this. ;-)

                          As we discussed on a different thread, if somebody does want this
                          semantic then it is a "value add" configuration: they pay the cost in throughput for this total ordering.
                          It should not be the default!
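
The ordering rule described in this reply (order matters only within a single sender session) can be illustrated with a small check. This is a sketch with hypothetical names (`Sent`, `OrderingCheck`); the per-session sequence number is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A received message tagged with its sending session and that session's own
// sequence number (both hypothetical fields for this sketch).
class Sent {
    final String session;
    final int seq;

    Sent(String session, int seq) {
        this.session = session;
        this.seq = seq;
    }
}

class OrderingCheck {
    // True if, for every session, messages appear in increasing sequence
    // order. Interleaving between different sessions is ignored.
    static boolean preservesPerSessionOrder(List<Sent> received) {
        Map<String, Integer> lastSeq = new HashMap<>();
        for (Sent m : received) {
            Integer prev = lastSeq.put(m.session, m.seq);
            if (prev != null && prev >= m.seq) {
                return false;
            }
        }
        return true;
    }
}
```

Under this check, two nodes that receive messages from sessions A and B in different interleavings are both acceptable, as long as neither reorders messages from the same session.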

                          • 10. Re: Non persistent messages and 2PC multicast
                            timfox

                            I know it's not a required JMS semantic, and I want to avoid total ordering if at all possible as much as the next man :)

                            What you're saying is it's ok for the active and the backup to have different states.

                            E.g. active node has messages A, B, C, D

                            but backup has messages C, A, B, D

                            So they're not true replicas. I'm trying to figure out if this works.

                            This means when I multicast a receive(), the active node is going to remove the first message - message A and put it in delivering state.

                            But the backup node is going to remove the first message - message C and put it in the delivering state.

                            Then I multicast an ack(message A). The active node will receive it and remove message A.

                            But the backup node will receive it and say "but message A is not in the delivering state" and barf

                            • 11. Re: Non persistent messages and 2PC multicast

                              That's what I said about implementing a proper state machine
                              so the server doesn't get its "knickers in a twist".

                              If the master is going to deliver a message to the client,
                              the buddy needs to know that the message should be in the delivered
                              state. Besides the reason you state, it needs to know this in case
                              the master crashes at this point.

                              master -> deliver -> client
                              master -> crash
                              client -> ack -> failover -> buddy

                              If the message isn't in the delivered state then another client could
                              come in after failover and steal it before the original client acks it.

                              There is no requirement for total ordering here either.
                              All the "put in delivered state" and ack/nack messages to the buddy
                              are coming from the master node which provides the ordering.
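
One way to picture the buddy-side state machine is the sketch below. It assumes state is tracked per message ID rather than by queue position, and that every transition is driven by a notification from the master; `BuddyState`, `MsgState` and the method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

enum MsgState { QUEUED, DELIVERING }

// Hypothetical buddy-side bookkeeping, keyed by message ID.
class BuddyState {
    private final Map<String, MsgState> states = new HashMap<>();

    // A new message has been replicated to the buddy.
    void onAdd(String id) {
        states.put(id, MsgState.QUEUED);
    }

    // Sent by the master before it hands the message to a client.
    void onDelivering(String id) {
        if (states.get(id) != MsgState.QUEUED) {
            throw new IllegalStateException("not queued: " + id);
        }
        states.put(id, MsgState.DELIVERING);
    }

    // Sent by the master when the client acks; the message is gone for good.
    void onAck(String id) {
        if (states.get(id) != MsgState.DELIVERING) {
            throw new IllegalStateException("not delivering: " + id);
        }
        states.remove(id);
    }

    // Sent by the master on a nack; the message may be delivered again.
    void onNack(String id) {
        if (states.get(id) != MsgState.DELIVERING) {
            throw new IllegalStateException("not delivering: " + id);
        }
        states.put(id, MsgState.QUEUED);
    }

    // After failover, only QUEUED messages may be given to a new consumer.
    boolean availableToNewConsumer(String id) {
        return states.get(id) == MsgState.QUEUED;
    }
}
```

Because the master names the message explicitly, it does not matter that the buddy holds its messages in a different arrival order: the buddy never has to guess which message is "first".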

                              • 12. Re: Non persistent messages and 2PC multicast
                                timfox

                                 

                                "adrian@jboss.org" wrote:

                                There is no requirement for total ordering here either.
                                All the "put in delivered state" and ack/nack messages to the buddy
                                are coming from the master node which provides the ordering.


                                Ok, I was assuming the "put in delivered state" and ack/nack messages *wouldn't* be coming from the master node, but could be coming from any node.

                                If they are coming from the master node, then I agree that would be ok.

                                My previous assumption was that if a client is connected to a node other than the active node, then any send(Message m), ack(MessageID id), receive() calls would be multicast.

                                My understanding of what you're saying is we should only multicast the send() call to the master and the buddy, and all others (ack, receive) need to be channelled through the master to the buddy to give the ordering guarantee.

                                We could just channel everything through the master (even the send), which is what I suggested earlier:


                                So node A sends message to active node. Then active node synchronously sends message to replica(s) before returning.

                                This is more network traffic and more latency though...


                                At the expense of latency, ensuring only the master talks to the replica might make our lives easier.





                                • 13. Re: Non persistent messages and 2PC multicast
                                  ovidiu.feodorov

                                   

                                  Adrian wrote:
                                  master -> deliver to client
                                  client -> ack to master
                                  master -> replicate client's ack to buddy
                                  buddy -> what are you talking about?

                                  The buddy knows it's a buddy, so it's supposed to hold replicated state. When it receives such an out-of-order acknowledgment, why can't it just store it and then let it be cancelled out by the (eventually) arriving message?
                                  It doesn't matter in which order they arrive; what matters is that eventually they'll cancel each other out (or won't, if the master crashes and cannot send the acknowledgment at all).
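
The "store the early ack and let the message cancel it" idea might look like the following sketch. `TolerantBuddy` and its fields are hypothetical names, and message IDs are assumed to be unique.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical buddy that tolerates an ack arriving before its message.
class TolerantBuddy {
    final Set<String> messages = new HashSet<>();     // replicated, not yet acked
    final Set<String> pendingAcks = new HashSet<>();  // acks that arrived early

    // If an ack for this message was parked earlier, the two cancel out and
    // the message is never stored.
    void onMessage(String id) {
        if (!pendingAcks.remove(id)) {
            messages.add(id);
        }
    }

    // If the message is already here, remove it; otherwise park the ack
    // until the message shows up.
    void onAck(String id) {
        if (!messages.remove(id)) {
            pendingAcks.add(id);
        }
    }
}
```

Either arrival order leaves the buddy with no trace of an acknowledged message. If the master crashes before sending the ack, the message simply stays in `messages` and is available after failover. A real implementation would presumably also need to expire parked acks whose message never arrives.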

                                  • 14. Re: Non persistent messages and 2PC multicast
                                    timfox

                                     

                                    "ovidiu.feodorov@jboss.com" wrote:
                                    Adrian wrote:
                                    master -> deliver to client
                                    client -> ack to master
                                    master -> replicate client's ack to buddy
                                    buddy -> what are you talking about?

                                    The buddy knows it's a buddy, so it's supposed to hold replicated state. When it receives such an out-of-order acknowledgment, why can't it just store it and then let it be cancelled out by the (eventually) arriving message?
                                    It doesn't matter in which order they arrive; what matters is that eventually they'll cancel each other out (or won't, if the master crashes and cannot send the acknowledgment at all).


                                    Well, this is all moot now, if we channel everything through the master.
