19 Replies Latest reply on May 16, 2011 12:12 PM by clebert.suconic

    Messages being dropped during failover?

    stwhit

      If a JMS consumer is using a transacted session, receives a message, attempts to commit, and receives a TransactionRolledBackException on the commit() call, what should that consumer do with that message?

       

      I believe the answer is "discard it".  But if that is the case, I've got some code that demonstrates that (sometimes) that message will never get redelivered.  See the attachment.

       

      This example uses a live/backup HA pair of hornetq instances.  The application starts 2 threads (a producer and a consumer).  The producer produces messages containing incrementing numbers beginning with 1.  After a period of time, the live server is killed, and everything fails over to the backup properly.

       

      When the producer is finished, it sends a final message containing the negative of the total number of messages sent.  So, when the consumer receives a negative value, it knows the total number of messages to expect, and will shut down when all those messages have been received.

       

      Many times, this example will run to successful completion, which means the consumer received all the messages it expected.  But sometimes, the example will hang.  During these runs, the consumer never exits, because it is forever waiting on a message it will never receive.  The message that the consumer is waiting for is the message that it received but failed to commit.

       

      To run the example, untar it, set your HORNETQ_222_HOME environment variable to point to the directory containing a hornetq 2.2.2 installation, and run build.sh.

       

      If you see the "SUCCESS!" message, that means the problem didn't occur during that run.  Please re-run a few times, and you should see the problem.

       

      I've also attached a couple of logs, one demonstrating successful completion of the example, the other demonstrating the consumer hanging waiting for a message that never arrives.

        • 1. Messages being dropped during failover?
          clebert.suconic

          Are you failing during a commit or something like that on the test?

           

          Are you using XA?

           

           

          You would need XA to completely avoid issues. The client could get a connection exception even though the server failed only after the write was accepted.

          • 2. Messages being dropped during failover?
            stwhit

            Thanks for the response.

             

            The test sometimes successfully runs to completion, sometimes hangs forever because a message the consumer expects never arrives.

             

            No, I'm not using XA.  I am using a transacted JMS session.  I thought this would be sufficient to guarantee once-and-only-once delivery.  The manual states:

             

            By catching the rollback exceptions and retrying, catching unblocked calls and enabling duplicate detection, once and only once delivery guarantees for messages can be provided in the case of failure, guaranteeing 100% no loss or duplication of messages.

             

            This led me to believe that the code, as written, was sufficient for once-and-only-once delivery.

             

            To be honest, I'm pretty new to JMS and totally new to XA, so maybe I need to do some more homework on XA.  Thanks.

            • 3. Messages being dropped during failover?
              stwhit

              (thinking about this a little more...)

               

              I used the "jms/transaction-failover" example shipped with HornetQ as the basis for my example code.  The readme.html for that example states:

               

              When a transacted JMS session is used, once-and-only once delivery is guaranteed.

               

              So I don't understand how XA comes into this picture.  What am I missing?  Thanks!

              • 4. Messages being dropped during failover?
                clebert.suconic

                " When a transacted JMS session is used, once-and-only once delivery is guaranteed. "

                 

                 

                err.... when it succeeds....

                 

                 

                When it fails, you could have the server failing after the commit was written to the disk.

                 

                And this is not exclusively a HornetQ "feature". The same goes for DBs and other systems as well.

                • 5. Re: Messages being dropped during failover?
                  stwhit

                  Yes, I understand that only if the transaction succeeds should my application consider a message "sent" or "received".  Please look at the sample code.  The message producer uses the following code to send messages:

                   

                   

                     private boolean sendMessage(long value) throws JMSException {
                        ObjectMessage message = session.createObjectMessage(new Long(value));
                        message.setStringProperty(Message.HDR_DUPLICATE_DETECTION_ID.toString(), jobId + "_" + Long.toString(value));

                        messageProducer.send(message);

                        try {
                           session.commit();
                           System.out.println("Sent " + value);
                           return true;
                        } catch (TransactionRolledBackException ex) {
                           System.out.println("Exception sending " + value + ": " + ex);
                           return false;
                        }
                     }

                   

                  If the transaction is rolled back when sending a value, this method returns false, which causes my app to resend the value.

                   

                  The consumer uses the following code to receive messages:

                   

                   

                     private long receiveMessage() throws JMSException {
                        while (true) {
                           ObjectMessage message = (ObjectMessage)messageConsumer.receive();
                           long value = ((Long)message.getObject()).longValue();
                           try {
                              session.commit();
                              System.out.println("Received (and committed) " + value);
                              return value;
                           } catch (TransactionRolledBackException ex) {
                              System.out.println("Received " + value + " but transaction was rolled back, so discarding...");
                           }
                        }
                     }

                   

                  Again, we're using a transacted JMS session, and calling .commit() to determine whether the message was successfully received or not.  If the .commit() call throws a TransactionRolledBackException, the value is discarded and we call .receive() again.

                  • 6. Re: Messages being dropped during failover?
                    clebert.suconic

                    There's currently no way to verify if the server failed after or before persisting the commit on the disk.

                     

                    The only way to guarantee this atm is with XA.

                    • 7. Re: Messages being dropped during failover?
                      stwhit

                      I see.  Thank you very much for your help.

                      • 8. Messages being dropped during failover?
                        timfox

                        It was specifically designed this way so you don't need XA.

                         

                        It works like this:

                         

                        If you send a message in a local tx (not XA!), and commit fails because failover happened around then, *you do not know whether it failed before or after the tx actually committed on the server*. That means you do not know whether the message reached the server or not.

                         

                        The correct thing to do in this case is to *resend the message* to the server. Make sure you enable duplicate detection. Now, if the message *did* originally hit the server, when you send it again it won't matter since duplicate detection will weed out the duplicate. If the original message didn't reach the server, all is well and good, now it has the message.

                         

                        This is how you get once and only once without XA. You shouldn't need XA for this.
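
The resend pattern described above can be sketched in plain Java. This is a toy in-memory model, not HornetQ or JMS code: `ToyServer`, `CommitFailed`, `sendReliably` and the two failure flags are hypothetical names made up for illustration. The only thing taken from the thread is the `jobId + "_" + value` duplicate ID scheme.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy in-memory model; every name here is hypothetical, not a HornetQ or JMS API. */
class ResendSketch {

    /** Stand-in for the exception the client sees when a commit is interrupted. */
    static class CommitFailed extends Exception {}

    /** Hypothetical server that ignores a message whose duplicate ID it has already stored. */
    static class ToyServer {
        final List<Long> queue = new ArrayList<>();
        private final Set<String> seenIds = new HashSet<>();
        boolean failBeforeStoring; // next commit is lost before the message is stored
        boolean failAfterStoring;  // next commit is stored, but the client still sees a failure

        void sendAndCommit(String duplicateId, long value) throws CommitFailed {
            if (failBeforeStoring) {
                failBeforeStoring = false;
                throw new CommitFailed();
            }
            // Duplicate detection: only a previously unseen ID is enqueued.
            if (seenIds.add(duplicateId)) {
                queue.add(value);
            }
            if (failAfterStoring) {
                failAfterStoring = false;
                throw new CommitFailed();
            }
        }
    }

    /** Resends with the same duplicate ID until a commit is acknowledged. */
    static void sendReliably(ToyServer server, String jobId, long value) {
        String duplicateId = jobId + "_" + value;
        while (true) {
            try {
                server.sendAndCommit(duplicateId, value);
                return;
            } catch (CommitFailed e) {
                // Outcome unknown: resend. If the first attempt was stored, the server drops this copy.
            }
        }
    }

    public static void main(String[] args) {
        ToyServer server = new ToyServer();
        server.failAfterStoring = true;  // message 1 is stored, yet the client sees a failure
        sendReliably(server, "job", 1);
        server.failBeforeStoring = true; // message 2 is lost on the first attempt
        sendReliably(server, "job", 2);
        System.out.println(server.queue); // prints [1, 2]
    }
}
```

In both failure cases the queue ends up with exactly one copy of each value, which is the point of combining resend with duplicate detection.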

                        • 9. Messages being dropped during failover?
                          leosbitto

                          Tim, sending is clear, but what would you suggest when the JMS Session is used to receive some message(s) and committing of this JMS Session fails? The client then doesn't know whether the server performed the commit (i.e. deleted the received messages) or not (i.e. the messages will be redelivered), and maybe that client's processing of the received messages is not idempotent...

                          • 10. Messages being dropped during failover?
                            stwhit

                            Leos has gotten to the heart of my question.  Thank you Leos.

                             

                            If you look at the code, the producer *is* resending messages using duplicate detection.

                             

                            The question is: what to do in the consumer?  The consumer calls .commit(), but doesn't know whether the message will be redelivered or not.  As my code demonstrates, sometimes the message is redelivered, and sometimes it is not.  What is the appropriate way to handle this situation?

                             

                            Thanks guys.

                            • 11. Messages being dropped during failover?
                              clebert.suconic

                              You could have an unbound topic/address (Address or Topic without any queues or Subscriptions), and send one message with a duplicateID set to that "fake" destination per transaction. When you replay the transaction you could send the same duplicateID and you would know if the TX was committed or not.
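
A rough sketch of that marker idea, as a toy in-memory model in plain Java. Nothing here is a HornetQ or JMS API: `ToyServer`, `Crash`, `CommitFailed`, `rejectsAsDuplicate` and `commitAndVerify` are hypothetical names, and the probe is modelled as a simple lookup where a real client would resend the marker and see it rejected as a duplicate.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy in-memory model; every name here is hypothetical, not a HornetQ or JMS API. */
class CommitMarkerSketch {

    /** Where the simulated server dies relative to persisting the commit. */
    enum Crash { NONE, BEFORE_PERSIST, AFTER_PERSIST }

    /** Stand-in for the exception the client sees when a commit is interrupted. */
    static class CommitFailed extends Exception {}

    static class ToyServer {
        // Duplicate IDs recorded by committed transactions (the "fake" destination).
        private final Set<String> markerIds = new HashSet<>();

        /** Commits a transaction that carries exactly one marker ID. */
        void commit(String markerId, Crash crash) throws CommitFailed {
            if (crash == Crash.BEFORE_PERSIST) {
                throw new CommitFailed();
            }
            markerIds.add(markerId);
            if (crash == Crash.AFTER_PERSIST) {
                throw new CommitFailed();
            }
        }

        /** Models resending the marker: true means it would be rejected as a duplicate. */
        boolean rejectsAsDuplicate(String markerId) {
            return markerIds.contains(markerId);
        }
    }

    /** Returns true if the transaction is known to have committed on the server. */
    static boolean commitAndVerify(ToyServer server, String markerId, Crash crash) {
        try {
            server.commit(markerId, crash);
            return true;
        } catch (CommitFailed e) {
            // The client cannot tell what happened, so it asks the server via the marker.
            return server.rejectsAsDuplicate(markerId);
        }
    }

    public static void main(String[] args) {
        ToyServer server = new ToyServer();
        System.out.println(commitAndVerify(server, "tx-1", Crash.NONE));           // true
        System.out.println(commitAndVerify(server, "tx-2", Crash.BEFORE_PERSIST)); // false
        System.out.println(commitAndVerify(server, "tx-3", Crash.AFTER_PERSIST));  // true
    }
}
```

The interesting case is `AFTER_PERSIST`: the client sees a failure, but the marker shows the transaction did commit, so the received messages will not be redelivered.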

                               

                               

                              Well... having said that... I believe I could encapsulate this behavior on failover, changing commit to send a UUID. In case the commit fails, I could then wait for reconnection or failover and check for that UUID.

                               

                               

                              @Stuart: can you open a JIRA for this? I had one open at some point but I closed it in favor of XA for receiving, as I hadn't thought of this solution back then.

                               

                              Call it "Making non XA Transactions Idempotent for receiving messages with reconnection or failover"

                              • 12. Messages being dropped during failover?
                                timfox

                                Ah, on consume. This is a classic issue, that affects just about every messaging system I know of.

                                 

                                To deal with this you really need to do client side duplicate detection on the consumer.
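
One possible reading of client-side duplicate detection, sketched in plain Java. It assumes each message carries an ID that stays the same across redeliveries (for example the `jobId + "_" + value` string the producer in this thread already sets). `Msg`, `ConsumerDedupSketch` and `onMessage` are hypothetical names, not JMS or HornetQ types. The idea is that the consumer keeps a message even when commit fails, remembers its ID, and drops it if the server delivers it again.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy consumer-side duplicate filter; every name here is hypothetical. */
class ConsumerDedupSketch {

    /** Hypothetical message shape: an ID that is stable across redeliveries, plus a payload. */
    static class Msg {
        final String id;
        final long value;

        Msg(String id, long value) {
            this.id = id;
            this.value = value;
        }
    }

    // IDs of messages already handed to the application.
    private final Set<String> processedIds = new HashSet<>();
    final List<Long> accepted = new ArrayList<>();

    /** Returns true if the message was processed, false if it was a redelivered duplicate. */
    boolean onMessage(Msg message) {
        if (!processedIds.add(message.id)) {
            return false; // already seen: the earlier commit did not stick, so it came back
        }
        accepted.add(message.value);
        return true;
    }

    public static void main(String[] args) {
        ConsumerDedupSketch consumer = new ConsumerDedupSketch();
        consumer.onMessage(new Msg("job_1", 1));
        consumer.onMessage(new Msg("job_2", 2)); // suppose commit failed after this one
        consumer.onMessage(new Msg("job_2", 2)); // server redelivers it: filtered out
        consumer.onMessage(new Msg("job_3", 3));
        System.out.println(consumer.accepted); // prints [1, 2, 3]
    }
}
```

If the server never redelivers the message (because the commit did go through), the consumer still has it. In a real application the set of processed IDs would probably need to be bounded and to survive a consumer restart, which this sketch leaves out.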

                                • 13. Messages being dropped during failover?
                                  clebert.suconic
                                  "Ah, on consume. This is a classic issue, that affects just about every messaging system I know of."

                                   

                                   

                                  I believe I found a solution that will work for us: having failover do some sort of duplicate detection using IDs.

                                   

                                  I will post it next week on the dev-forums.

                                  • 14. Messages being dropped during failover?
                                    stwhit

                                    @Clebert: I am out of town and will return late next week and will open the JIRA then. Thanks.
