49 Replies Latest reply on Oct 26, 2006 10:13 AM by clebert.suconic

    Client failover redeliveries discussion

    clebert.suconic

      I already have consumers seamlessly reconnecting to a new server during a HA event.

      Basically what I do on failover is to create a new consumer on the new server, replace IDs and re-register the Callback handler. At this point the server will think a new client is connecting. In the case of durable subscribers/queues and persistent messages, the queue will be refilled.

      There are some issues that I would like to discuss now:

      If the consumer has received a message through the Callback but hasn't sent an ACK yet, then after the failover the new server, not knowing the message, might throw an exception (messageId not known).

      There are a couple of use cases we have to consider.
      - Persistent Messages. (how to treat a redelivery).
      - Should we send the list of previously ACKs to the server?
      - Should we ignore ACKs for non existent messages on the server?

      Second point also:

      What to do when a durable subscriber gets the queue refilled?
      - The client will probably receive the message again. I would just ignore redeliveries.

      I'm considering having a conference call with developers about these possibilities.


      Clebert Suconic

        • 1. Re: Client failover redeliveries discussion
          timfox

           

          "clebert.suconic@jboss.com" wrote:
          I already have consumers seamlessly reconnecting to a new server during a HA event.

          Basically what I do on failover is to create a new consumer on the new server, replace IDs and re-register the Callback handler.


          Sounds about right. But what do you mean by "replace ids"?



          There are some issues that I would like to discuss now:

          If the consumer receives a message from CallBack but if it didn't send an ACK yet, after the failover, the server not knowing the message might throw an exception (messageId not known).

          There are a couple of use cases we have to consider.
          - Persistent Messages. (how to treat a redelivery).
          - Should we send the list of previously ACKs to the server?



          Yes - we should send the ids of every persistent message as part of the failover protocol - the server then repopulates the delivery list in the server consumer endpoint


          - Should we ignore ACKs for non existent messages on the server?


          Non existent messages on the server will be non persistent messages that didn't survive the failover.

          They should be removed from the client side list on failover so the acks will never get sent.




          Second point also:

          What to do when a durable subscriber gets the queue refilled?
          - The client will probably receive the message again. I would just ignore redeliveries.



          I don't understand the issue here. Can you explain more?


          I'm considering having a conference call with developers about these possibilities.

          Clebert Suconic


          Yes, let's do that

          • 2. Re: Client failover redeliveries discussion
            clebert.suconic

             

            "timfox" wrote:
            "clebert.suconic@jboss.com" wrote:
            I already have consumers seamlessly reconnecting to a new server during a HA event.

            Basically what I do on failover is to create a new consumer on the new server, replace IDs and re-register the Callback handler.


            Sounds about right. But what do you mean by "replace ids"?


            Take an example of recreating the connection.
            On the HA experiment branch, look at ConnectionAspect::handleFailover:

            failOver receives a new ClientConnectionDelegate as its parameter. The idea is to get a new connection, but keep the actual delegates we are using.

            Creating a new connection on the new server will create a new server object, and consequently a new server ID.

            The methods State.failOver and Delegate.transferHAState are together responsible for making the delegates of the old connection take on the new IDs coming from the new server.

            ClientConnectionDelegate otherConnection =
                (ClientConnectionDelegate) ((MethodInvocation) invocation).getArguments()[0];
            ConnectionState newConnectionState = (ConnectionState) otherConnection.getState();

            currentConnectionState.failOver(newConnectionState);

            // Carry the client ID over to the new connection, if one was set
            if (currentConnectionState.getClientID() != null)
            {
                otherConnection.setClientID(currentConnectionState.getClientID());
            }

            // Transferring state from newDelegate to currentDelegate
            currentDelegate.transferHAState(otherConnection);
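
The ID swap described here can be sketched roughly as follows. This is a minimal illustration and not the JBoss Messaging code: `SketchDelegate`, its fields, and the body of `transferHAState` are hypothetical stand-ins. It only shows the idea that the old client-side object keeps its listeners while adopting the server ID of the delegate freshly created on the failover server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a client-side delegate; not the real ClientConnectionDelegate.
class SketchDelegate {
    private int serverId;                                        // ID of the server-side object
    private final List<Runnable> listeners = new ArrayList<>();  // client state we want to keep

    SketchDelegate(int serverId) { this.serverId = serverId; }

    int getServerId() { return serverId; }
    void addListener(Runnable listener) { listeners.add(listener); }
    int listenerCount() { return listeners.size(); }

    // Adopt the ID of the delegate created on the failover server.
    // Listeners and other client-side state stay on this instance.
    void transferHAState(SketchDelegate newDelegate) {
        this.serverId = newDelegate.serverId;
    }
}
```

After `oldDelegate.transferHAState(newDelegate)`, application code still holds `oldDelegate`, but its calls would be routed to the new server-side object.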
            
            
            


            • 3. Re: Client failover redeliveries discussion
              timfox

               

              "clebert.suconic@jboss.com" wrote:
              The idea is to get a new connection, but keep the actual delegates we are using.


              I see what you are doing. You want to re-use the old delegate instance - in which case you need to change the ids.

              Having said that, why do you want to keep the old delegates? Why not just create new ones? Then you won't have to change any ids - I would have thought this would be simpler and it's what I imagined when I thought this through.

              • 4. Re: Client failover redeliveries discussion
                clebert.suconic

                 

                "Tim" wrote:

                "Clebert" wrote:

                Second point also:

                What to do when a durable subscriber gets the queue refilled?
                - The client will probably receive the message again. I would just ignore redeliveries.


                I don't understand the issue here. Can you explain more?


                The consumer is going to be recreated the same way as the connection in the previous example (creating a new consumer and replacing the IDs on the old objects).

                In that case, the new server will think a new client is connecting and will resend uncommitted messages from a durable subscription (so I hope).

                In that case, I'm considering ignoring messages that were previously sent and are on the list of CurrentTransaction.ACKs(). I just want to know if this is the behavior everybody expects.
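
One way to picture the "ignore redeliveries" idea is a client-side filter keyed on message ID. The sketch below is an assumption-laden illustration: `RedeliveryFilter` and its methods are hypothetical names, and the set of pending IDs merely stands in for whatever the current transaction's ACK list holds.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical client-side filter; names are illustrative only.
class RedeliveryFilter {
    // IDs already handed to the application and still awaiting ACK/commit.
    private final Set<Long> pendingAckIds = new HashSet<>();

    void recordDelivered(long messageId) { pendingAckIds.add(messageId); }
    void recordAcked(long messageId) { pendingAckIds.remove(messageId); }

    // Returns true if the message should be handed to the application,
    // false if it is a redelivery of something that is already pending.
    // A message that passes is recorded as pending from then on.
    boolean shouldDeliver(long messageId) {
        return pendingAckIds.add(messageId);
    }
}
```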

                • 5. Re: Client failover redeliveries discussion
                  clebert.suconic

                   

                  "timfox" wrote:
                  "clebert.suconic@jboss.com" wrote:
                  The idea is to get a new connection, but keep the actual delegates we are using.


                  I see what you are doing. You want to re-use the old delegate instance - in which case you need to change the ids.

                  Having said that, why do you want to keep the old delegates? Why not just create new ones? Then you won't have to change any ids - I would have thought this would be simpler and it's what I imagined when I thought this through.


                  That was my first idea, to replace the delegates. But it was easier to just replace the ID instead of also replacing listeners and things like that.

                  If I wanted to replace the entire delegate I would have more data to replace (think about listeners), and consequently more room for errors.

                  Replacing the ID was just easier.

                  • 6. Re: Client failover redeliveries discussion
                    clebert.suconic

                    Another point also was the relationship between States, Delegates and JMS facades. Besides the state itself I would have to eventually treat/replace more data.

                    • 7. Re: Client failover redeliveries discussion
                      timfox

                       

                      "clebert.suconic@jboss.com" wrote:
                      Another point also was the relationship between States, Delegates and JMS facades. Besides the state itself I would have to eventually treat/replace more data.


                      Ok fair enough

                      • 8. Re: Client failover redeliveries discussion
                        timfox

                         

                        "clebert.suconic@jboss.com" wrote:

                        The consumer is going to be recreated the same way the connection on the previous example. (creating a new consumer / replacing the IDs on the old objects).

                        On that case, the new server will think it's a new client connecting and it will resend non committed messages from a durable subscription. (So I hope).

                        On that case, I'm considering to ignore message previously sent, and on the list of CurrentTransaction.ACKs(). I just want to know if this is everybody's expected behavior.


                        I don't see why anything would be received twice (apart from non-persistent messages, since we lose the acks for them, but that's fine).

                        As long as you send the list of ids for the unacked persistent messages and recreate the delivery list on the server before any message delivery occurs, you should be fine.

                        • 9. Re: Client failover redeliveries discussion
                          timfox

                          So to summarise:

                          1) Detect failover
                          2) Find the "correct" failover server. (This may take several hops)
                          3) Let the server "stall" you until server failover has completed
                          4) Recreate the connections, sessions, consumers, producers and browsers. (Swapping ids here sounds fine)
                          5) Delete any non persistent messages from the client list of unacked messages in any sessions in the failed connection.
                          6) For each consumer that failed, send the server a list of the ids of its persistent messages. For each list, recreate the ServerConsumerEndpoint delivery list by removing the refs from the channel, creating deliveries, and putting them in the list.
                          7) The connection is now ready to be used
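
Steps 5 and 6 of the list above can be sketched on the client side roughly as follows. This is a minimal sketch under stated assumptions: `UnackedMessage`, `ClientFailoverSketch` and `pruneAndCollectIds` are hypothetical names, and the session's unacked list is modeled as a plain `List`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical record of a message delivered to the client but not yet acknowledged.
class UnackedMessage {
    final long id;
    final boolean persistent;

    UnackedMessage(long id, boolean persistent) {
        this.id = id;
        this.persistent = persistent;
    }
}

class ClientFailoverSketch {
    // Step 5: drop non-persistent messages from the unacked list, since they
    //         did not survive on the server and must never be ACKed.
    // Step 6: return the ids of the remaining persistent messages, which the
    //         client would then send to the failover server.
    static List<Long> pruneAndCollectIds(List<UnackedMessage> unacked) {
        List<Long> persistentIds = new ArrayList<>();
        for (Iterator<UnackedMessage> it = unacked.iterator(); it.hasNext();) {
            UnackedMessage m = it.next();
            if (m.persistent) {
                persistentIds.add(m.id);
            } else {
                it.remove();
            }
        }
        return persistentIds;
    }
}
```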

                          Note: We must also ensure that no new connections are created on the failover node while old connections are being recreated, otherwise we can have a situation where the new connections grab the messages which have already been delivered to consumers in the failed connection!
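
The constraint in the note can be pictured as a gate on the failover node. The sketch below is only an illustration: `FailoverGate` and its methods are hypothetical, and it assumes the node knows when recreation of failed-over connections starts and ends, which the thread does not specify.

```java
// Hypothetical gate that refuses brand-new connections while failed-over
// connections are still being recreated on this node.
class FailoverGate {
    private int recoveriesInProgress = 0;

    synchronized void beginRecovery() {
        recoveriesInProgress++;
    }

    synchronized void endRecovery() {
        if (recoveriesInProgress == 0) {
            throw new IllegalStateException("no recovery in progress");
        }
        recoveriesInProgress--;
    }

    // Runs createConnection only if no recovery is in progress.
    // Returns true if the connection was created, false if it was refused.
    synchronized boolean tryCreateNewConnection(Runnable createConnection) {
        if (recoveriesInProgress > 0) {
            return false;
        }
        createConnection.run();
        return true;
    }
}
```

Holding the lock while `createConnection` runs keeps the check and the creation atomic, at the cost of serializing new connections. A real server might prefer to make refused callers wait rather than fail.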

                          • 10. Re: Client failover redeliveries discussion
                            clebert.suconic

                             

                            "Tim" wrote:

                            I don't see why anything would be received twice (apart from non persistent
                            messages since we lose the acks for them but that's fine)


                            There are some scenarios I'm thinking about, all of them under incomplete transactions.

                            I will create a few testcases and bring them to the discussion later.


                            As for your summary list: I'm already doing 4, and will be working on 5 and 6 now. 1-3 depend on how we are going to discover nodes on ConnectionFactories (which depends on Remoting).

                            • 11. Re: Client failover redeliveries discussion
                              ovidiu.feodorov


                              Clebert wrote:

                              Basically what I do on failover is to create a new consumer on the new server, replace IDs and re-register the Callback handler. The server at this point will think it's a new client coming. In case of durable subscribers/queues and persistent messages you will have the queue refilled


                              Could you explain in a little bit more detail how this works? You detect the failure at remoting level, and then, what happens? Do you preserve the Connection/Session/Consumer hierarchy on the client? How do you create the corresponding endpoint hierarchy on the fail-over ServerPeer?

                              (btw, for the sake of clarity, when you describe a process, could you please qualify the actors a little bit better. For example, instead of saying "what I do on failover is to create a new consumer", you probably wanted to say "what I do on failover is to create a new ClientConsumerDelegate (?) instance")

                              Clebert wrote:

                              If the consumer receives a message from CallBack but if it didn't send an ACK yet, after the failover, the server not knowing the message might throw an exception (messageId not known).


                              The client stack knows that the fail-over took place, so if the message is non-persistent, it doesn't need to send the ACK anyway (from the server's perspective, the message is lost), and if the message is persistent, it will be recovered on the fail-over server, so the ACK will arrive for a known message. Am I missing something?

                              Tim wrote:

                              Yes - we should send the ids of every persistent message as part of the failover protocol - the server then repopulates the delivery list in the server consumer endpoint

                              What do you mean by "send the ids of every persistent message as part of the failover protocol"? Who sends the ids? The messages are in the database until they are ACKed, and this is where the fail-over server will recover them from and repopulate the queue.

                              Clebert wrote:

                              I'm considering having a conference call with developers about these possibilities.


                              Before having a conference call, I think a better idea is to summarize what you've implemented so far. Describe how the fail-over happens step by step. Expand http://wiki.jboss.org/wiki/Wiki.jsp?page=NewMessagingHADesign for that.

                              • 12. Re: Client failover redeliveries discussion
                                timfox

                                 

                                "ovidiu.feodorov@jboss.com" wrote:

                                The client stack knows that the fail-over took place, so if the message is non-persistent, it doesn't need to send the ACK anyway (from the server's perspective, the message is lost), and if the message is persistent, it will be recovered on the fail-over server, so the ACK will arrive for a known message. Am I missing something?



                                This is all explained in my summary (previous post) and on the wiki page.

                                So yes, on failover, any non persistent messages on the client side need to be removed from the list (since they will be lost on the server side) so the ACKs will never be sent, so there will be no exception.


                                What do you mean by "send the ids of every persistent message as part of the failover protocol"? Who sends the ids? The messages are in the database until they are ACKed, and this is where the fail-over server will recover them from and repopulate the queue.


                                The client needs to send the ids of any persistent messages.
                                Yes, the server repopulates the queues but does not have enough information to populate the server consumer endpoint delivery list. This is what the ids are for.
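
The server side of this exchange might look roughly like the sketch below. It is an illustration under stated assumptions, not the ServerConsumerEndpoint code: the channel is modeled as a map from message id to a reference (here just a `String`), and `ServerRecoverySketch` and `rebuildDeliveryList` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ServerRecoverySketch {
    // channel: message id -> reference, as reloaded from the persistent store.
    // idsFromClient: ids of unacked persistent messages reported by one consumer.
    // Moves the matching references out of the channel and returns them as the
    // consumer's rebuilt delivery list, so they are not delivered to anyone else.
    static List<String> rebuildDeliveryList(Map<Long, String> channel, List<Long> idsFromClient) {
        List<String> deliveries = new ArrayList<>();
        for (Long id : idsFromClient) {
            String ref = channel.remove(id);  // take the reference out of the channel
            if (ref != null) {
                deliveries.add(ref);          // ids unknown to the channel are skipped in this sketch
            }
        }
        return deliveries;
    }
}
```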


                                • 13. Re: Client failover redeliveries discussion
                                  clebert.suconic

                                   

                                  "ovidiu.feodorov@jboss.com" wrote:

                                  Clebert wrote:

                                  Basically what I do on failover is to create a new consumer on the new server, replace IDs and re-register the Callback handler. The server at this point will think it's a new client coming. In case of durable subscribers/queues and persistent messages you will have the queue refilled


                                  Could you explain in a little bit more detail how this works? You detect the failure at remoting level, and then, what happens? Do you preserve the Connection/Session/Consumer hierarchy on the client? How do you create the corresponding endpoint hierarchy on the fail-over ServerPeer?


                                  At this point I'm not yet detecting the failure at the remoting level.
                                  There is a method called failOver on ClientConnectionDelegate:

                                  class ClientConnectionDelegate:
                                   public void failOver(ConnectionDelegate newConnection)
                                  


                                  That method is implemented as an aspect in ConnectionAspect::handleFailover.

                                  handleFailover will navigate over the state hierarchy described at http://wiki.jboss.org/wiki/Wiki.jsp?page=NewMessagingHADesign, and it will create another server-side object for each instance found in the hierarchy.

                                  Every time a new server-side object is created, the client object's ID is swapped for that of the newly created object, and the remotingConnection instance is swapped for the connection passed as a parameter. This way, all the failed delegates are re-established on the new server.
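
The traversal described here can be sketched as a simple recursive walk. This is a hedged illustration: `StateNode`, `HierarchyFailoverSketch` and the `createOnNewServer` callback are hypothetical, and the swap of the remoting connection is left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical node in the client-side state hierarchy
// (for example connection -> session -> consumer).
class StateNode {
    final String kind;   // e.g. "connection", "session", "consumer"
    int serverId;        // ID of the matching server-side object
    final List<StateNode> children = new ArrayList<>();

    StateNode(String kind, int serverId) {
        this.kind = kind;
        this.serverId = serverId;
    }

    StateNode add(StateNode child) {
        children.add(child);
        return this;
    }
}

class HierarchyFailoverSketch {
    // Walks the hierarchy top-down. For each node, createOnNewServer stands in
    // for the call that creates a matching object on the failover server and
    // returns its ID; the existing client-side node then adopts that ID.
    static void failOver(StateNode node, Function<String, Integer> createOnNewServer) {
        node.serverId = createOnNewServer.apply(node.kind);
        for (StateNode child : node.children) {
            failOver(child, createOnNewServer);
        }
    }
}
```

Parents are processed before children because a session can only be created on the new server once its connection exists there.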


                                  "Ovidiu" wrote:
                                  Before having a conference call, I think a better idea is to summarize what you've implemented so far. Describe how the fail-over happens step by step.


                                  I will work with some testcases today, and update the design document. After that we will be ready for a conference about this. This thread has provided the discussion I needed already.



                                  • 14. Re: Client failover redeliveries discussion
                                    clebert.suconic

                                    I have added a test into org.jboss.test.messaging.jms.ManualClusteringTest, named testHAFailoeverRequirementsOnDurable.

                                    The test basically mimics what happens when a failover recreates the subscription on a new server.

                                    Please take a look at the test and let's discuss it.

                                    I have also updated http://wiki.jboss.org/wiki/Wiki.jsp?page=NewMessagingHADesign with the description of the actual HA implementation.

                                    (I'm post-editing this message: I have removed the test added to ManualClusteringTest because it is not valid. JBoss Messaging uses a partial queue concept that will require some sort of server transfer when a connection is failed over.)

                                    Clebert Suconic
