2 Replies Latest reply on Jul 29, 2010 11:04 AM by clebert.suconic

    Messages building up on the journal because of postACK

    clebert.suconic

      Say you are sending and ACKing messages transactionally at a rate of 5k messages per commit.

       

      We delete the messages after the ACKs are committed, in postAcknowledge, right here:

       

       

      QueueImpl::postAcknowledge(final MessageReference ref)
      ...
      if (durableRef)
      {
         ...
         storageManager.deleteMessage(message.getMessageID());
      
      

       

       

      It happens that the appendDeleteRecord done here is sync (syncNonTransactional's default is true, as far as I know):

       

       

       

      JournalStorageManager::deleteMessage:

       

       

       

      public void deleteMessage(final long messageID) throws Exception
      {
         messageJournal.appendDeleteRecord(messageID, syncNonTransactional, getContext(syncNonTransactional));
      }
      

       

       

       

      So we are ACKing messages at possibly 5k messages per commit, while deleting them individually and synchronously.

       

      We would lose the deletes anyway in case the server dies with these messages built up on the Executor's queue.

       

      My suggestion to fix this problem is:

       

       

      I - delete messages with sync=false on postACK (at least for now)

      II - at a later point, do a proper sync with a batch of post-ACKed messages, maybe as a single TX.
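
To make the difference between the two approaches concrete, here is a minimal sketch in Java. It is only an illustration under stated assumptions: CountingJournal, its sync() method and the PostAckDeleteSketch class are hypothetical stand-ins, not the real journal API, and appendDeleteRecord is reduced to (id, sync) without the IO context argument. It counts how many times we would wait on a sync for one 5k commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the journal: it only records deletes and counts syncs.
// This is NOT the real journal interface.
class CountingJournal {
    final List<Long> deleted = new ArrayList<>();
    int syncCount = 0;

    // Same idea as appendDeleteRecord(id, sync, context) in the post,
    // without the IO context argument.
    void appendDeleteRecord(long messageID, boolean sync) {
        deleted.add(messageID);
        if (sync) {
            syncCount++; // each sync stands for one wait on the disk
        }
    }

    // Hypothetical explicit sync covering everything appended so far.
    void sync() {
        syncCount++;
    }
}

class PostAckDeleteSketch {
    // Behaviour described in the post: one synchronous delete per ACKed message.
    static void deleteOneByOne(CountingJournal journal, long[] ids) {
        for (long id : ids) {
            journal.appendDeleteRecord(id, true);
        }
    }

    // Suggestions I and II together: append with sync=false, then sync once per batch.
    static void deleteBatched(CountingJournal journal, long[] ids) {
        for (long id : ids) {
            journal.appendDeleteRecord(id, false);
        }
        journal.sync();
    }

    public static void main(String[] args) {
        long[] ids = new long[5000];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }

        CountingJournal oneByOne = new CountingJournal();
        deleteOneByOne(oneByOne, ids);

        CountingJournal batched = new CountingJournal();
        deleteBatched(batched, ids);

        System.out.println("one-by-one syncs: " + oneByOne.syncCount); // 5000
        System.out.println("batched syncs: " + batched.syncCount);     // 1
    }
}
```

Both variants append the same 5000 delete records. The only difference in this model is how many times the caller waits for a sync.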

        • 1. Re: Messages building up on the journal because of postACK
          timfox

          The delete is done lazily outside the transaction since we don't lose transactionality if the delete is lost.

           

          What should happen is that on startup, the JournalStorageManager load code detects that the message has been acked from all queues, so it can delete it / ignore it.
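
A rough sketch of what such a startup check could look like, in Java. This is an assumption about the shape of the check, not the actual load code: the StartupOrphanCheckSketch class, the findFullyAcked method and the two maps (message ID to routed queue IDs, message ID to acked queue IDs) are hypothetical simplifications of what the load would rebuild from the journal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StartupOrphanCheckSketch {
    // Hypothetical simplified view of the loaded journal:
    //   refs: message ID -> IDs of the queues the message was routed to
    //   acks: message ID -> IDs of the queues that have ACKed it
    // Returns the messages that no queue still needs, so their delete was lost
    // (or never written) and they can be deleted / ignored on load.
    static List<Long> findFullyAcked(Collection<Long> messageIDs,
                                     Map<Long, Set<Long>> refs,
                                     Map<Long, Set<Long>> acks) {
        List<Long> toDelete = new ArrayList<>();
        for (Long id : messageIDs) {
            Set<Long> routed = refs.getOrDefault(id, Collections.<Long>emptySet());
            Set<Long> acked = acks.getOrDefault(id, Collections.<Long>emptySet());
            // A message with no references at all also satisfies this check.
            if (acked.containsAll(routed)) {
                toDelete.add(id);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> refs = new HashMap<>();
        Map<Long, Set<Long>> acks = new HashMap<>();

        // Message 1: routed to queues 10 and 11, ACKed on both.
        refs.put(1L, new HashSet<>(Arrays.asList(10L, 11L)));
        acks.put(1L, new HashSet<>(Arrays.asList(10L, 11L)));

        // Message 2: routed to queues 10 and 11, ACKed only on 10.
        refs.put(2L, new HashSet<>(Arrays.asList(10L, 11L)));
        acks.put(2L, new HashSet<>(Arrays.asList(10L)));

        // Message 3: no references left at all.

        List<Long> toDelete = findFullyAcked(Arrays.asList(1L, 2L, 3L), refs, acks);
        System.out.println(toDelete); // [1, 3]
    }
}
```

Message 2 is kept because queue 11 has not ACKed it yet. Messages 1 and 3 are the ones whose delete record is missing from the journal.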

          • 2. Re: Messages building up on the journal because of postACK
            clebert.suconic

            I know the TX is not lost and the delete happens outside of the context of the TX.

             

            My point is that it's taking a while to execute these deletes in the case of sustained insertion with large transactions.

             

            My suggestion is to always execute these deletes with sync=false. Sync=true here is not guaranteeing you anything. Quite the opposite, actually: it will increase the chances of losing these deletes.

             

            Since we need a startup check anyway, I suggest we use sync=false when deleting messages on post-ack, and add the startup check for messages without references.

             

            That's low-hanging fruit for an optimization, as we would release space on the journal faster in those cases.