27 Replies Latest reply on Feb 6, 2007 1:27 PM by clebert.suconic

    http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Failure

    clebert.suconic

      I have replicated the root cause of the MultiThreadFailoverTest failure in a simpler scenario (single-threaded).

      If a failure happens in the middle of an ACK invocation (more precisely, right after the ACK has completed, while the response is on its way back), the message can't be found on the new server, because it was already ACKed on the previous server.

      Maybe we should ignore the failure if the message being ACKed can't be found.


      FailoverTest::testFailureRightAfterACK is failing because of this.

        • 1. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
          timfox

          Yes.

          We should not barf if an ack is received and the message can't be found.

          Actually this is valid even without failover.

          In general, any invocation can fail while the response is being written back to the caller, after the actual deed has already been done on the server. This applies to sends as well as acks.
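To make this failure mode concrete, here is a minimal, hypothetical Java sketch (the class and method names are illustrative, not JBoss Messaging APIs). The server-side work runs to completion, then the invocation throws before a response is returned, so the caller sees an exception even though the message is already in the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch: the work completes on the server, but the response is lost. */
public class LostResponseDemo {

    /** In-memory stand-in for a server-side queue. */
    static final List<String> queue = new ArrayList<>();

    /** Runs the server-side work, then simulates a failure before the response is written. */
    static <T> T invokeAndLoseResponse(Supplier<T> serverWork) {
        serverWork.get(); // the "deed" is done on the server
        throw new RuntimeException("connection lost while writing response");
    }

    public static void main(String[] args) {
        try {
            invokeAndLoseResponse(() -> queue.add("msg-1"));
        } catch (RuntimeException e) {
            // The caller sees a failure...
            System.out.println("client got: " + e.getMessage());
        }
        // ...but the message is already in the queue: prints "queue contents: [msg-1]"
        System.out.println("queue contents: " + queue);
    }
}
```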

          In the case of sends this means the call to invoke() throws an exception but the message has actually reached the queue, in which case you don't know whether to retry or not since you don't want duplicate messages in the queue. This is where duplicate message detection becomes useful.
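A minimal sketch of duplicate message detection, assuming every message carries a unique ID that the client reuses when it retries (an assumption; the thread does not say how the planned task will work). The server remembers the IDs it has already accepted and silently drops a resend. `Queue` and `send` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of server-side duplicate message detection. */
public class DuplicateDetectionDemo {

    static class Queue {
        private final Set<String> seenIds = new HashSet<>();
        final List<String> messages = new ArrayList<>();

        /** Returns true if the message was enqueued, false if it was a duplicate. */
        boolean send(String messageId, String body) {
            if (!seenIds.add(messageId)) {
                return false; // already received: drop the retry silently
            }
            messages.add(body);
            return true;
        }
    }

    public static void main(String[] args) {
        Queue q = new Queue();
        q.send("id-1", "hello"); // original send; assume the response was lost
        q.send("id-1", "hello"); // client retries with the same ID
        System.out.println(q.messages); // [hello]
    }
}
```

A real implementation would need to bound the set of remembered IDs (for example with a time window or a fixed-size cache) and persist it so detection survives a restart; this sketch keeps everything in memory.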

          In the case of an ack it is always safe to retry the ack, *as long* as the server silently ignores the ack if the message can't be found. If we're not doing that already we should.
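A minimal sketch of an idempotent ack, assuming the server tracks unacknowledged deliveries in a map keyed by a delivery ID (`Server`, `unacked` and `acknowledge` are hypothetical names, not the JBoss Messaging API). The lenient variant silently ignores an ack for a message that is already gone, so a retry is safe; the strict variant reproduces the failure described at the start of the thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: strict vs. lenient handling of an ack whose message is gone. */
public class IdempotentAckDemo {

    static class Server {
        final Map<Long, String> unacked = new HashMap<>();
        final boolean lenient;

        Server(boolean lenient) {
            this.lenient = lenient;
        }

        void acknowledge(long deliveryId) {
            if (unacked.remove(deliveryId) == null && !lenient) {
                throw new IllegalStateException("Cannot find message for ack " + deliveryId);
            }
            // lenient: an ack for a message that was already removed is a no-op
        }
    }

    public static void main(String[] args) {
        Server server = new Server(true);
        server.unacked.put(42L, "msg");
        server.acknowledge(42L); // first ack; assume the response was lost
        server.acknowledge(42L); // the retry is harmless
        System.out.println("remaining: " + server.unacked.size()); // remaining: 0
    }
}
```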

          • 2. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
            timfox

            In other words, ideally all our operations should be idempotent. This is easy for acks, but not so easy for sends.

            • 3. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
              clebert.suconic

              As part of this discussion I'm also adding testFailureRightBeforeSend and testFailureRightAfterSend

              • 4. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                clebert.suconic

                I have just committed a fix for this.

                Tim and Ovidiu, could you please take a look?

                Especially at JDBCPersistenceManager: MultiThreadFailoverTest was failing on the changed line. If you think this change is not OK, I can investigate why this was happening.

                I used "http://jira.jboss.com/jira/browse/JBMESSAGING-808 - fix" as the SVN commit comment, so you can locate the changes by searching for it.

                • 5. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                  timfox

                  Looks good.

                  But when you make changes don't just comment things out, get rid of them if they are not needed.

                   if (rows != 1)
                   {
                   // http://jira.jboss.com/jira/browse/JBMESSAGING-808
                   log.warn("Failed to remove row for: " + ref);
                   return;
                   //throw new IllegalStateException("Failed to remove row for: " + ref);
                   }
                  


                  Otherwise we end up with the code scattered with rubbish.

                  If we want to retrieve the previous version, it's in version control.

                  I'm not sure if the log.warn is really necessary either.

                  log.warn would imply there is probably something wrong. Is this the case?

                  • 6. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                    clebert.suconic

                     

                    "Tim Fox" wrote:
                    But when you make changes don't just comment things out, get rid of them if they are not needed.


                    Fair enough... I will remove these comments later (if you haven't done so yet).
                    "Tim Fox" wrote:

                    I'm not sure if the log.warn is really necessary either.

                    log.warn would imply there is probably something wrong. Is this the case?


                    It's not a problem.

                    I just wanted to log ACKs whose message is not found, in case we get lots of them (if something goes wrong with our code at some point, for example).

                    And I kept it as log.warn in case something is wrong with the configuration of a production system (for example, if someone deleted information from the database in the middle of an operation).

                    But we could use log.debug if you think it's better.
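For comparison, a sketch of the branch with the commented-out throw removed and the message downgraded to debug level, one of the options discussed above. It uses `java.util.logging` (`FINE` standing in for `log.debug`) instead of the project's own logger, and `checkRemoved` is a hypothetical helper wrapping the row-count check, not the actual JDBCPersistenceManager code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of the cleaned-up row-count check. */
public class RemoveRowDemo {

    private static final Logger log = Logger.getLogger(RemoveRowDemo.class.getName());

    /**
     * Checks the row count returned by a DELETE. A missing row is tolerated
     * (the ack may be a retry, see JBMESSAGING-808) and only logged at debug level.
     * Returns true if exactly one row was removed.
     */
    static boolean checkRemoved(int rows, Object ref) {
        if (rows != 1) {
            if (log.isLoggable(Level.FINE)) {
                log.fine("Failed to remove row for: " + ref);
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(checkRemoved(1, "ref-1")); // true
        System.out.println(checkRemoved(0, "ref-1")); // false, and no exception
    }
}
```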

                    • 7. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                      clebert.suconic

                      As part of the investigation of this issue, and per what you said about producers, I have found this other one:

                      http://jira.jboss.com/jira/browse/JBMESSAGING-809

                      This one has to do with sending messages. It's a rare event, but it can happen if you crash the server under high load (over 30% probability in MultiThreadFailover if numberOfThreadConsumers > numberOfThreadProducers; MultiThreadFailover has two properties you can change to set these thread counts).

                      • 8. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                        timfox

                        Well... there's not much you can do about this other than implement duplicate message detection (there's already a task for this).

                        What you could do though, is isolate the exact case in a test.

                        This should be easy: use the PoisonInterceptor to crash the server as a call to send() is returning. Send a persistent message; we then know the message is in storage after the send, but the client will receive an exception. Then try to send the message again, and you'll get an exception (probably a PK violation).

                        This would be solved properly by duplicate message detection.

                        Also, for a partial solution, we could introduce a flag "ignore PK violations" which ignores the send if it's already in the database.

                        That solution would only be partial since you could still get duplicate messages sent since the original one might be acked before the second copy is sent, but at least it means that failover will work ok.
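A minimal in-memory sketch of the proposed "ignore PK violations" flag, assuming messages are stored keyed by message ID. `Store`, `insert`, `ack` and `DuplicateKeyException` are hypothetical names standing in for the database table and its primary key violation. The second half of `main` shows why the fix is only partial.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an "ignore PK violations" flag on message insert. */
public class IgnorePkDemo {

    /** Stand-in for the primary key violation raised by the database. */
    static class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(long id) {
            super("PK violation for message " + id);
        }
    }

    static class Store {
        final Map<Long, String> rows = new HashMap<>();
        final boolean ignorePkViolations;

        Store(boolean ignorePkViolations) {
            this.ignorePkViolations = ignorePkViolations;
        }

        void insert(long messageId, String body) {
            if (rows.containsKey(messageId)) {
                if (ignorePkViolations) {
                    return; // already stored by the original send: treat the retry as done
                }
                throw new DuplicateKeyException(messageId);
            }
            rows.put(messageId, body);
        }

        void ack(long messageId) {
            rows.remove(messageId);
        }
    }

    public static void main(String[] args) {
        Store store = new Store(true);
        store.insert(1L, "hello"); // original send; assume the response was lost
        store.insert(1L, "hello"); // the retry is ignored
        System.out.println(store.rows.size()); // 1

        // Why this is only partial: if the original is consumed and acked
        // before the retry arrives, the retry is stored as a new message.
        store.ack(1L);
        store.insert(1L, "hello");
        System.out.println(store.rows.size()); // 1 again, i.e. a duplicate delivery
    }
}
```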

                        • 9. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                          clebert.suconic

                          I forgot to mention that I had already created the test case (using the PoisonInterceptor):

                          MultiThreadFailoverTest::testFailureOnSendReceiveSynchronized


                          It crashes the server while two threads are each inside a receive/send method. It always replicates the problem.

                          • 10. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                            timfox

                            You could replicate this even more simply with just a single send; no receive is necessary.

                            • 11. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                              clebert.suconic

                              I created MultiThreadFailoverTest::testFailureOnSendReceiveSynchronized on Friday because FailoverTest::testFailureRightBeforeSend and FailoverTest::testFailureRightAfterSend were not failing.

                              • 12. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                                timfox

                                I'm not sure what those tests do, but you just need to do this:

                                
                                1. send message
                                2. crash the server in the poison interceptor after the send has been handled but before the response is written
                                3. client will get an exception
                                4. try and send the message again - should give a PK violation.
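The four steps could be sketched as a plain Java method like the one below. This is a stand-in built on assumptions: the in-memory `Server` replaces a real server plus database, its `storedIds` set plays the role of persistent storage that survives the crash, the one-shot `poisonAfterSend` flag approximates what the PoisonInterceptor does, and `PkViolation` is a hypothetical exception standing in for the database error. No failover is involved.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the "failure right after send" test. */
public class FailureRightAfterSendSketch {

    /** Stand-in for the primary key violation raised by the database. */
    static class PkViolation extends RuntimeException {
        PkViolation(String id) {
            super("PK violation: " + id);
        }
    }

    static class Server {
        final Set<String> storedIds = new HashSet<>(); // survives the "crash"
        boolean poisonAfterSend;                       // crash once, after handling a send

        void send(String messageId) {
            if (!storedIds.add(messageId)) {
                throw new PkViolation(messageId);
            }
            if (poisonAfterSend) {
                poisonAfterSend = false;
                // The message is already stored, but the response is never written.
                throw new IllegalStateException("server crashed before writing the response");
            }
        }
    }

    /** Runs steps 1-4 and returns the exception raised by the resend, or null if none. */
    static RuntimeException runScenario() {
        Server server = new Server();
        server.poisonAfterSend = true;

        boolean firstSendFailed = false;
        try {
            server.send("msg-1");   // steps 1-2: send; the server "crashes" after storing it
        } catch (IllegalStateException e) {
            firstSendFailed = true; // step 3: the client gets an exception
        }
        if (!firstSendFailed) {
            throw new AssertionError("the first send should have failed");
        }

        try {
            server.send("msg-1");   // step 4: send the same message again
            return null;
        } catch (PkViolation e) {
            return e;
        }
    }

    public static void main(String[] args) {
        System.out.println(runScenario().getMessage()); // PK violation: msg-1
    }
}
```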
                                




                                • 13. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                                  timfox

                                  Also, no failover is necessary.

                                  The test can be done in a non-clustered environment.

                                  • 14. Re: http://jira.jboss.org/jira/browse/JBMESSAGING-808 - Fail
                                    clebert.suconic

                                     

                                    "timfox" wrote:
                                    I'm not sure what those tests do, but you just need to do this:

                                    
                                    1. send message
                                    2. crash the server in the poison interceptor after the send has been handled but before the response is written
                                    3. client will get an exception
                                    4. try and send the message again - should give a PK violation.
                                    





                                    That's what FailoverTest::testFailureRightBeforeSend and FailoverTest::testFailureRightAfterSend are about.


