8 Replies Latest reply on Dec 7, 2009 10:50 AM by clebert.suconic

    TransactionHealthTest failing

    timfox

      Intermittent failure on Hudson:

      http://hudson.qa.jboss.com/hudson/job/HornetQ/2126/testReport/org.hornetq.tests.integration.journal/ValidateTransactionHealthTest/testNIO2MultiThread/


      This is easy to replicate:

      public void test() throws Exception
      {
         for (int i = 0; i < 1000000; i++)
         {
            log.info("****** ITERATION " + i);

            this.testNIO2MultiThread();

            tearDown();

            setUp();
         }
      }

      It fails after a few iterations.

      Now, one thing I notice: if I add a Thread.sleep(500) at the end of the main method of the spawned process:

          if (transactionSize == 0)
          {
             journal.debugWait();
          }
       }
       catch (Exception e)
       {
          this.e = e;
       }

       try
       {
          Thread.sleep(500);
       }
       catch (Exception e)
       {

       }
      


      It never fails. It seems to me that files are still being written to disk after the process has finished. For NIO this could happen because data hasn't been synced from the OS buffers.
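      For reference, this is roughly what an explicit sync looks like with plain java.nio. It is a minimal sketch, not the HornetQ SequentialFile code: the class name ForceExample, the method writeHeader and the 4-byte header are assumptions made for illustration. Note that data sitting in the OS cache is normally still visible to other processes on the same machine, so a missing sync on its own may not explain the failure.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of HornetQ.
class ForceExample
{
   // Writes a 4-byte file ID at the start of the file and syncs it to the device.
   static void writeHeader(Path file, int fileID) throws IOException
   {
      try (FileChannel channel = FileChannel.open(file,
                                                  StandardOpenOption.CREATE,
                                                  StandardOpenOption.WRITE))
      {
         ByteBuffer bb = ByteBuffer.allocate(4);
         bb.putInt(fileID);
         bb.flip();

         // A single write() may be partial, so loop until the buffer is drained.
         while (bb.hasRemaining())
         {
            channel.write(bb);
         }

         // Ask the OS to flush both content and metadata to the storage device.
         channel.force(true);
      }
   }
}
```

      Without the force(true) call the bytes are handed to the OS when write() returns, but they are only guaranteed to survive an OS crash or power loss once they have been synced.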

      To see what happens when it *does* fail (without the sleep), I added some debug logging to the orderFiles() method:

       int bytesRead = file.read(bb);

       int fileID;
       try
       {
          fileID = bb.getInt();
       }
       catch (BufferUnderflowException e)
       {
          log.info("read " + bytesRead + " bytes");

          throw e;
       }
      


      This shows the file has a size of zero (the read returns -1).
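      The -1 is the end-of-stream value a read returns when there is nothing to read. A guard like the one below would report a too-short file instead of hitting the BufferUnderflowException. This is only a sketch using java.nio.channels.FileChannel rather than the journal's own file abstraction; the class HeaderReader, the method readFileID and the assumption that the header is a single 4-byte int are all made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.OptionalInt;

// Hypothetical helper, not part of HornetQ.
class HeaderReader
{
   // Assumption: the header is just the int file ID.
   static final int SIZE_HEADER = 4;

   // Returns the file ID, or an empty OptionalInt if the file is too short to hold one.
   static OptionalInt readFileID(Path file) throws IOException
   {
      try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ))
      {
         ByteBuffer bb = ByteBuffer.allocate(SIZE_HEADER);

         int total = 0;
         while (bb.hasRemaining())
         {
            int n = channel.read(bb);
            if (n < 0)
            {
               // End of stream; an empty file lands here on the first read.
               break;
            }
            total += n;
         }

         if (total < SIZE_HEADER)
         {
            return OptionalInt.empty();
         }

         // Switch the buffer from writing to reading before calling getInt().
         bb.flip();
         return OptionalInt.of(bb.getInt());
      }
   }
}
```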

      Perhaps something is creating a new file and not flushing it before the spawned process completes?

        • 1. Re: TransactionHealthTest failing
          timfox

          Grrrrr....

          This is now fixed.

          It was a broken test that wasn't calling stop() on the journal when it was done with it. This meant the executors weren't being shut down.

          • 2. Re: TransactionHealthTest failing
            clebert.suconic

            The test was not supposed to stop the journal.

            The test was supposed to crash *right after* commits.


            The fix would be in createFile, where we would only rename the file to its final extension after the file was finished.


            I'm not going to commit it since Andy already finished the release. This is not critical IMO as there is a workaround (remove any truncated files after a crash).

            private JournalFile createFile(final boolean keepOpened,
                                           final boolean multiAIO,
                                           final boolean fill,
                                           final boolean tmpCompact) throws Exception
            {
               int fileID = generateFileID();

               String fileName;

               if (tmpCompact)
               {
                  fileName = filePrefix + "-" + fileID + "." + fileExtension + ".cmp";
               }
               else
               {
                  fileName = filePrefix + "-" + fileID + "." + fileExtension;
               }

               if (trace)
               {
                  trace("Creating file " + fileName);
               }

               String tmpFileName = fileName + ".tmp";

               SequentialFile sequentialFile = fileFactory.createSequentialFile(tmpFileName, maxAIO);

               sequentialFile.open(1, false);

               if (fill)
               {
                  sequentialFile.fill(0, fileSize, FILL_CHARACTER);

                  ByteBuffer bb = fileFactory.newBuffer(SIZE_HEADER);

                  bb.putInt(fileID);

                  bb.rewind();

                  sequentialFile.writeDirect(bb, true);
               }

               sequentialFile.close();

               sequentialFile.renameTo(fileName);

               if (keepOpened)
               {
                  if (multiAIO)
                  {
                     sequentialFile.open();
                  }
                  else
                  {
                     sequentialFile.open(1, false);
                  }
               }

               return new JournalFileImpl(sequentialFile, fileID);
            }
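            The same write-to-temporary-name-then-rename idea, together with the cleanup workaround, can be sketched with the standard java.nio.file API. This is an illustration under assumptions, not the journal code: the class TmpThenRename and its methods are hypothetical, the file is zero-filled rather than filled with FILL_CHARACTER, and fileSize is assumed to be at least 4 bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of HornetQ.
class TmpThenRename
{
   // Builds the file under "<fileName>.tmp" and renames it only once it is complete.
   static Path createFile(Path dir, String fileName, int fileID, int fileSize) throws IOException
   {
      Path tmp = dir.resolve(fileName + ".tmp");
      Path target = dir.resolve(fileName);

      try (FileChannel channel = FileChannel.open(tmp,
                                                  StandardOpenOption.CREATE,
                                                  StandardOpenOption.TRUNCATE_EXISTING,
                                                  StandardOpenOption.WRITE))
      {
         // A freshly allocated buffer is zero-filled; the ID goes in the first 4 bytes.
         ByteBuffer bb = ByteBuffer.allocate(fileSize);
         bb.putInt(0, fileID); // absolute put, position stays at 0

         while (bb.hasRemaining())
         {
            channel.write(bb);
         }

         channel.force(true);
      }

      // A reader never sees the final name until the content is fully written.
      return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
   }

   // Workaround after a crash: delete files that never reached their final name.
   static int removeLeftovers(Path dir) throws IOException
   {
      int removed = 0;
      try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.tmp"))
      {
         for (Path leftover : stream)
         {
            Files.delete(leftover);
            removed++;
         }
      }
      return removed;
   }
}
```

            With this shape, a crash in the middle of creation leaves only a .tmp file behind, which the loader can ignore or delete.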
            




            • 3. Re: TransactionHealthTest failing
              clebert.suconic
              • 4. Re: TransactionHealthTest failing
                timfox

                 

                "clebert.suconic@jboss.com" wrote:
                The test was not supposed to stop the journal.



                Ok, so add a big comment in the file explaining the test is not supposed to stop the journal, since it's not clear otherwise.

                • 5. Re: TransactionHealthTest failing
                  timfox

                   

                  "clebert.suconic@jboss.com" wrote:
                  The test was not supposed to stop the journal.

                  The test was supposed to crash *right after* commits.



                  But the journal did not crash.

                  The test is not valid.

                  It's not valid to read a journal if it's still running, which is what is occurring in this case.

                  If you want a test that *crashes* a journal, then write a test that crashes the journal. Trying to read a journal that is still running is not the same thing.

                  • 6. Re: TransactionHealthTest failing
                    clebert.suconic

                    The test is spawning a new process that will append and commit to the journal.

                    As soon as the process is done (i.e. it has exited without stopping the journal), the caller will reload the journal.

                    This test is not loading the journal while another process is also using it.

                    • 7. Re: TransactionHealthTest failing
                      timfox

                      I still don't think it is a good test.

                      If you want a test that tests what happens if you try and load a journal that has crashed then you should be killing the journal process (not just not stopping the journal).

                      That would be a much better test than just not stopping the journal.

                      • 8. Re: TransactionHealthTest failing
                        clebert.suconic

                        That was the intention of the test...

                        I have changed a System.exit to Runtime.getRuntime().halt() to make sure it behaves like a crash / kill.