5 Replies Latest reply on May 29, 2008 10:28 AM by clebert.suconic

    Some serious issues with new journal code for AIO

    timfox

      Further to what Clebert and I discussed yesterday, and in the process of making some small changes to the journal I have discovered several major bugs / issues in the changes that were added to the journal when integrating the AIO journal.

      1) appendAddRecord(final long id, final byte recordType, final EncodingSupport record):

      Why bother with a callback - much easier just to call write with sync = true? (less code)

      2) public void appendAddRecord(final long id, final byte recordType, final byte[] record) throws Exception

      same as 1)

      3) public void appendDeleteRecord(long id) throws Exception

      same as 1)

      4) public void appendAddRecordTransactional(final long txID, final byte recordType, final long id,
      final byte[] record) throws Exception

      Previous code was always using callbacks - even if the journal didn't support callbacks

      5) public void appendUpdateRecordTransactional(final long txID, byte recordType, final long id,
      final byte[] record) throws Exception

      same as 4)

      6) public void appendDeleteRecordTransactional(final long txID, final long id) throws Exception

      Same as 4)

      7) public void appendPrepareRecord(final long txID) throws Exception

      Two bugs in this one
      a) Previous code was always using callbacks - even if the journal didn't support callbacks
      b) Not waiting for completion on prepare

      8) public void appendRollbackRecord(final long txID) throws Exception

      The bug here is that it doesn't wait for completion of all the transactional operations that came before it. It only waits for completion if the current operation

      Some of these are pretty critical bugs, I can only asume the new functionality test coverage isn't sufficient since these weren't caught.

        • 1. Re: Some serious issues with new journal code for AIO
          timfox

          I have fixed all the issues apart from 7).

          This requires more thought since we probably need to reset the callback somehow - i.e. we need to wait on completion more than once - once at prepare and then again on commit/rollback.

          • 2. Re: Some serious issues with new journal code for AIO
            timfox

            The overloaded methods that Clebert introduced, have a lot of code duplication, e.g.

            public void appendAddRecordTransactional(final long txID, final byte recordType, final long id,
            final EncodingSupport record) throws Exception

            and

            public void appendAddRecordTransactional(final long txID, final byte recordType, final long id,
            final byte[] record) throws Exception

            Are 90% the same.

            The similar parts should be refactored into a common method.

            • 3. Re: Some serious issues with new journal code for AIO
              clebert.suconic

               

              "timfox" wrote:

              8) public void appendRollbackRecord(final long txID) throws Exception

              The bug here is that it doesn't wait for completion of all the transactional operations that came before it. It only waits for completion if the current operation


              A rollback is actually marking an ignore on the previous transaction with the same id... why do we care about them to complete if the journal is marking all the previous transaction as ignore?


              Example:

              Journal is sending 1000 messages on transaction 17.

              Transaction 17 is marked as rollback while Journal is still processing messaging 300.

              All the 700 upcoming messages are ignored. On an eventual recovery you would see a rollback anyway.. so I don't think you need to wait all the 700 messages to complete. (At that point you just need to make sure the rollback is serialized on disk)

              • 4. Re: Some serious issues with new journal code for AIO
                clebert.suconic

                 

                "timfox" wrote:
                The overloaded methods that Clebert introduced, have a lot of code duplication, e.g.


                That's why I created http://jira.jboss.org/jira/browse/JBMESSAGING-1347

                • 5. Re: Some serious issues with new journal code for AIO
                  clebert.suconic

                   

                  "timfox" wrote:

                  Some of these are pretty critical bugs, I can only asume the new functionality test coverage isn't sufficient since these weren't caught.



                  The only real bug I see here is 7. 8 is working as I expected and 1-6 are just simple refactoring as a Callback is aways called even on NIO.

                  As for 1... the sync call on AIO wouldn't make it wait until yesterday.