1 2 Previous Next 25 Replies Latest reply on Jul 19, 2010 3:38 PM by clebert.suconic

    commit ordering issue

    ataylor

      this is regarding http://community.jboss.org/message/551534#551534.

       

      Ive added a test to TransactedSessionTest that shows this. Firstly i thought the issue was just that an ordered executor wasn't being used, however this is noty the case. Secondly I thought it might be how OperationContext worked, basically for every operation, such as a commit, a new ordered executor is created and used to run the work, this could mean that if the first commit took a while, i.e. there were lots of messages to persist, and a second commit arrived the second commit would use a different ordered executor and could complete before the first. I thought setting the main threadpool size to one would stop this thus proving it, however it didnt.

       

      I'm a bit confused by Cleberts code and how it's all meant to hang together, Clebert any chance you could take a look. Th etest should fail every time as its in a loop.

        • 1. Re: commit ordering issue
          clebert.suconic

          The OperationContext will complete operations as soon as their confirmation come back from the storage.

           

          The Storage shouldn't ever send a confirmation out of the order on syncs. Libaio doesn't guarantee ordering on the callback, but we control this at AsyncFileImpl...

           

          Even if the confirmation came back out of order... the OperationContext would assume it belonged to the correct order and it would deliver the sends in order.

           

           

          The OperationContext is also using an OrderedExecutor that is shared by the Session. So.. and the sends should be in order.

           

           

          And anyway.. I couldn't replicate any issues from your code. The only issue was an assertion at tearDown (It didn't have all the messages consumed). Later I removed that and I started having a OutOfMemoryException. (It appears delivering is happening much faster than consumer as you send transactionally (batching severl messages per transaction) while consume in AutoACK what will make a sync on each message.

           

           

          I've also tried the user's code and I couldn't replicate an issue.

           

           

          Maybe there's some issue with buffering?

          • 2. Re: commit ordering issue
            clebert.suconic

            The test will never failed If I ACKed at a greater rate that's being produced.

             

            If there's an issue with Ordering here will be because of OutOfmemoryExceptions at the server's. (Which is a case under investigation ATM actually)

             

            The only issue I could produce with both Andy's and the user's code was OOME as the destination was not configured to block or page.

            • 3. Re: commit ordering issue
              clebert.suconic
              The test will never failed If I ACKed at a greater rate that's being produced.

              ran it for two hours straight.. no failures. 161.936.000 persistent messages sent/received.

               

              I will run it more two hours like this.

               

               

              so.. it seems definetely related to the OME.

              • 4. Re: commit ordering issue
                timfox

                Even with OOM, messages should *never* be received out of order.

                 

                If this is occurring the problem needs to be understood better.

                • 5. Re: commit ordering issue
                  clebert.suconic

                  I need someone to replicate and confirm out of order issues.

                   

                   

                  The only issue I could replicate on the test was OutOfmemoryExceptions and some calls failing because the server was out of resources.

                  • 6. Re: commit ordering issue
                    jmesnil

                    I confirm I see the out of order thing (using ben spiller's code).

                     

                    From what I saw, the issue is that some messages are delivered "outside a transaction". They jump ahead of the messages which are delivered inside a tx and thus appears the out of order delivery.

                     

                    Why/when does the messages are dealt with outside a tx? After some help from clebert, I think the issues is when we commit paged messages.

                     

                    In PostOfficeImpl.PageMessageOperation.pageMessages(Transaction) which is called before we commit a tx, we have the code

                     

                                   if (message.page(tx.getID(), first))
                                   {
                                      ...
                                   }
                                   else
                                   {
                                      // This could happen when the PageStore left the pageState
                    
                                      // TODO is this correct - don't we lose transactionality here???
                                      route(message, false);
                                   }
                    

                     

                    The route() call will use a null transaction and we will then route the message outside the tx which is completing!

                     

                    I think the issue is fixed by simply making sure the message will be routed in the context of the tx:

                     

                    route(message, tx, false);

                     

                    Am I missing a reason why we would want to lose the transaction *on purpose*?

                    • 7. Re: commit ordering issue
                      clebert.suconic

                      "Am I missing a reason why we would want to lose the transaction *on purpose*?

                       

                      That method is called inside the commit. On Before Commit. So, those messages should be delivered for sure.

                       

                      It's a bit of a challenge here, since we are already inside the TX adding messages to paging.

                       

                      There's a test already on PagingTest (testDepageDuringTransaction) validating the ordering, however that's only testing for PersistentMessages. PersistentMessages will use the callback from the persistence and will only deliver messages after persistence has finished.

                       

                      However, for non-persistent messages... it won't wait any of that and the message will be delivered immediately what is causing the issue. The user's test is not using persistent messages.

                      • 8. Re: commit ordering issue
                        clebert.suconic

                        Please, look again at my previous message as I post edited it... I have used OpenOffice last week.. and I got the mania of pressing Ctrl-S (for saving it) while thinking...  as a result it submitted before I wanted it.

                        • 9. Re: commit ordering issue
                          clebert.suconic

                          Jeff,

                           

                          Can you try the attached patch here?

                           

                          what I changed on the patch is:

                           

                          I can't send the same tx to route here, or we would have issues on when the routing is being executed:

                           

                          On PostOfficeImpl::PageMessageOperation::pageMessages

                           

                           

                                        {

                                            // This could happen when the PageStore left the pageState

                           

                                            route(message, tx, false); // I can't send the tx here

                                         }

                           

                           

                          If I send the same tx here, I will get modification exceptions on the operations, as the code is currently running operations here.

                           

                           

                          And I can't just open a new TX here, and commit when commit is called.

                           

                           

                          So, what I did on the patch was to create a new TX object that will have new collections for the operations, but will use the same id as the original TX. As it will all happen on the same TX on the journal, but using different Collections for the operations.

                           

                          And when the afterCommit, afterPrepare.. etc was called, this will call the aftercommit, afterPrepare.. etc on the TX (I had to expose new methods for this).

                           

                          Everything would happen inside the same TX and at the same natural order.

                           

                          If you could fix the issue with the patch, we would need to validate the changes and try creating a test (stress test maybe) that would replicate the issue.

                          • 10. Re: commit ordering issue
                            timfox

                            I'm not happy with the patch without understanding the issue further.

                             

                            As I understand it, a transaction commits and messages are paged to disk, then before the commit completes the address stops paging, this means the remaining messages are routed directly, outside a tx context. Correct?

                             

                            How does this result in messages being delivered out of order? I would like us all to be sure we understand the problem here before jumping to solutions.

                            • 11. Re: commit ordering issue
                              jmesnil

                              clebert, this patch is not working, it generates a StackOverflowError (http://gist.github.com/473791#file_gistfile1.java)

                               

                              Let me first be sure to understand what/when the issue occurs and I'll fix it. Using transaction's subcontext seems convoluted... and I'm already lost in the paging + routing with tx code

                              • 12. Re: commit ordering issue
                                clebert.suconic

                                There is a caveat on paging. Once it starts paging.. everything should go to the page files. As soon as the page files are consumed it goes out of page mode.

                                 

                                 

                                first issue:

                                 

                                 

                                When sending messages on a transacted session, we verify: isPaging -> the message should be cached until a commit was made.

                                 

                                And I remember I used to do a check on paging If an address was ever paged on a TX all messages to that address should be cached until commit is done. That will definitely cause an issue with ordering here, as the system could enter and leave page mode while the TX is alive.

                                 

                                We need to place a check on ServerMessageImpl::storeIsPaging():

                                 

                                 

                                public boolean storeIsPaging(Transaction tx) // add the tx somehow

                                   {

                                      if (pagingStore != null)

                                      {

                                         if (pagingStore.isPaging())

                                         {

                                              return true;

                                         }

                                         else

                                         {

                                             if (tx != null && tx.isPaging(...getTheAddressSomehow....)) // need to add this method

                                             {

                                                      return true;

                                             }

                                             else

                                             {

                                                       return false;

                                             }

                                         }

                                      }

                                      else

                                      {

                                         return false;

                                      }

                                   }

                                 

                                 

                                 

                                second issue:

                                 

                                 

                                Another issue is with the Postoffice::PageMessageOperation::pageMessages code:

                                 

                                The system could also enter and leave page mode while this loop is done. It would be a rare condition but it could happen.

                                 

                                We should add a message on Paging that would send a whole batch of messages to page. or nothing.

                                 

                                This way we would have everything or nothing on page mode.

                                 

                                for (ServerMessage message : messagesToPage)

                                            {

                                               if (message.page(tx.getID(), first))

                                               {

                                                  if (message.isDurable())

                                                  {

                                                     // We only create pageTransactions if using persistent messages

                                                     pageTransaction.increment();

                                                     pagingPersistent = true;

                                                     pagingStoresToSync.add(message.getPagingStore());

                                                  }

                                               }

                                               else

                                               {

                                                  // This could happen when the PageStore left the pageState

                                 

                                                  route(message, getOrCreateSubTX(tx), false);

                                               }

                                               first = false;

                                            }

                                 

                                 

                                Third issue: (probably non related to ordering)

                                 

                                I would place the messages as part of the same TX somehow when the system is not in page mode any more at commit time. But that would mean change the Operations. A way I thought for this was to create another Transaction that would have a sub-set of collections. (which was on my last patch).

                                • 13. Re: commit ordering issue
                                  jmesnil

                                  Taking a step back to the user issue, I realized he never said he was paging messages.

                                  He confirmed he configured to BLOCK messages. If that's the case, we shouldn't even have to deal with pages but it appears this is the case.

                                   

                                  There is one issue in PagingStoreImpl.isPaging(). If the policy is BLOCK, the method will return isFull() (which is false at the start of the example and then becomes true as there more messages which are produced than consumed).

                                  This is not correct if the policy is to BLOCK, there is not paging and the method should always return false.

                                  With the current code, in PostOfficeImpl.route(), we add a PageMessageOperation to the transaction but this operation will not page the message later on and the message will be routed again outside of the transactional context... wow, this is confusing!

                                   

                                  Clebert showed me https://jira.jboss.org/browse/HORNETQ-209 to rename PagingManager to MemoryManager. Maybe we should do it now, it is very confusing to read the code as it is written (page() means to page except when it drops or blocks...)

                                  Clebert Suconic wrote:

                                   

                                  When sending messages on a transacted session, we verify: isPaging -> the message should be cached until a commit was made.

                                   

                                  And I remember I used to do a check on paging If an address was ever paged on a TX all messages to that address should be cached until commit is done. That will definitely cause an issue with ordering here, as the system could enter and leave page mode while the TX is alive.

                                  Indeed, this looks like what happened to the user (with the "accidental" paging described above).

                                   

                                  Clebert Suconic wrote:

                                   

                                  Third issue: (probably non related to ordering)

                                   

                                  I would place the messages as part of the same TX somehow when the system is not in page mode any more at commit time. But that would mean change the Operations. A way I thought for this was to create another Transaction that would have a sub-set of collections. (which was on my last patch).

                                  I did not understand the third issue. Could you explain it a bit further?

                                  • 14. Re: commit ordering issue
                                    clebert.suconic

                                    "I did not understand the third issue. Could you explain it a bit further?"

                                     

                                    It would be a bit of a rare case.. but it could happen:

                                     

                                    Say, the user is preparing a TX, but the system is not on page mode any more.

                                     

                                    The message will be sent right away on that case.

                                     

                                     

                                    We need to make the messages to be part of the same TX somehow. ATM I could only think how to do it with a sub-TX sharing the same TX ID, but using different collections.

                                    1 2 Previous Next