11 Replies Latest reply on Feb 17, 2009 7:29 AM by timfox

    Journal Clean up / Compactor

    clebert.suconic


      Let me first explain the issues we have on the journal:

      Issue I - The biggest issue we have on the journal right now is the second criterion on Reclaiming.

      The first criterion is simple reference counting of adds and deletes.

      But since we mix adds and deletes in the same files, we can't delete a file if it holds deletes that point at records in another file. And if we delay ACKing a message at any point, that will leave a lot of files hanging on disk.
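
      Just to make the two criteria concrete, here is a rough sketch of the reclaim check (the class, field and method names below are made up for illustration; they are not the real journal API):

       import java.util.ArrayList;
       import java.util.List;

       // Sketch only: illustrates the two reclaiming criteria described above.
       class ReclaimSketch
       {
          static class FileInfo
          {
             int liveRecords;   // adds not yet matched by a delete
             List<FileInfo> deletesAgainst = new ArrayList<FileInfo>();   // files this file holds deletes for
          }

          static boolean canReclaim(FileInfo file)
          {
             // Criterion 1: plain reference counting - no live adds left in this file.
             if (file.liveRecords > 0)
             {
                return false;
             }

             // Criterion 2: every file this one holds deletes against must itself be
             // reclaimable; otherwise dropping this file would lose those deletes and
             // the records would reappear on reload.
             for (FileInfo target : file.deletesAgainst)
             {
                if (!canReclaim(target))
                {
                   return false;
                }
             }
             return true;
          }
       }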


      To fix this first issue, I see three possible solutions:

      Solution I) Have a cleaner thread remove the links between files: something that would physically eliminate deleted records carrying the second-level dependency, allowing several files to be reclaimed.

      Solution II) Put deletes and adds in separate journal files. We would have JournalLogs used only for adds, and JournalLogs used only for deletes.

      Solution III) Read all valid records, copy them to another journal file, and replace the old files.


      Solutions I and II are very easy to implement.

      Solution III has a little extra complexity due to locking issues (deletes arriving while we are still compacting).

      ______________________________________________________________________________________________________

      Issue II) Supposing you no longer have an issue with linked deletes (criterion 2 on reclaiming), there is also a need to compact files.

      Say you have an address where you produce and consume messages very fast, say at a rate of 1000/second, and another address where you produce messages without any consumers, say at a rate of 1 message per second.

      This won't be as bad as if you had the linked-deletes, but it may still be an issue depending on the usage.

      Solution IV) To solve this we need to compact several files into a single file, as a slight variation of Solution III.

      There is some complexity in keeping the original file-ids in the compacted file, since a commit holds a count of records per file (the transaction summary) used to detect incomplete commits, besides the locking issue as well.
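
      Roughly, Solution IV would copy only the live records of several files into one new file; a sketch (filesToCompact, isStillLive and appendToCompactedFile are made-up names here, while readJournalFile/JournalReader follow the refactoring described below):

       // Hypothetical sketch of Solution IV, not the actual implementation.
       private void compact(final List<JournalFile> filesToCompact) throws Exception
       {
          for (final JournalFile file : filesToCompact)
          {
             readJournalFile(file, new JournalReader()
             {
                public void addRecord(final int recordPos, final RecordInfo recordInfo) throws Exception
                {
                   // Only live records are copied; deleted ones are dropped.
                   if (isStillLive(recordInfo.id))
                   {
                      // Keeping the original file-id here is the tricky part, since the
                      // commit's transaction summary counts records per original file.
                      appendToCompactedFile(file, recordInfo);
                   }
                }
                // (other callbacks omitted)
             });
          }
       }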


      ______________________________________________________________________________________________________


      So far, independently of any of those solutions, I need to be able to read JournalFiles outside the context of loading. For that I have refactored the load method into a private method readJournalFile(JournalFile....), passing an interface as a parameter.

      This way readJournalFile knows how to deal with the journal data format, and the implementation passed as a parameter deals with the specifics of loading or compacting.

      I don't want to duplicate the journal file's data format across two distinct methods.


      (I'm actually considering moving that method to JournalFile, but I'm not sure yet)
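
      Roughly, that interface could look like this (a sketch; only addRecord shows up in the code I'm posting further down, anything else here is an assumption):

       // Sketch of the callback interface idea; only addRecord is confirmed by the
       // code later in this thread.
       interface JournalReader
       {
          // Called for every add record found while scanning a JournalFile.
          void addRecord(int recordPos, RecordInfo recordInfo) throws Exception;

          // Similar callbacks would presumably exist for updates, deletes and
          // transactional records, so that load, cleanup and compact can each react
          // differently while the file-format parsing stays in readJournalFile.
       }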

        • 1. Re: Journal Clean up / Compactor
          timfox

          I don't understand the issue with the "linked list" and I don't understand your solution to it (you didn't provide enough detail).

          If I have three files

          F1, F2, F3

          (A = add, D = delete)

          F1 contains:
          A1, A2

          F2 contains:
          D1, D2, A3, A4

          F3 contains:
          D3, D4

          The current algorithm should delete all files.

          Are you saying it is not?

          • 2. Re: Journal Clean up / Compactor
            clebert.suconic

            The problem is when one record is *not* deleted.

            f1: A1, A2 (A2 won't be deleted for a while)

            f2: D1, A3, A4

            f3: D3, D4, A5, A6

            f4: D5, D6, A7, A8

            f5: D7, D8, A9, A10


            f2 won't be reclaimed because f1 is not complete yet. f3 won't be reclaimed because it has deletes waiting on records in f2... f4 -> f3, f5 -> f4


            If I physically removed A1 from f1 (as it is already deleted and confirmed), f2, f3, f4 and f5 would be instantly reclaimable.



            That's the common scenario that would break reclaiming.

            • 3. Re: Journal Clean up / Compactor
              clebert.suconic

              So... if you keep a single live record in any file...

              You can have 1 million adds and 1 million deletes.... and you will end up with about a thousand files in the journal, when you only needed 1.

              • 4. Re: Journal Clean up / Compactor
                timfox

                Right, but that's the job of the compactor - to physically remove records that have been deleted - then the problem disappears.

                • 5. Re: Journal Clean up / Compactor
                  clebert.suconic

                  Yes.. but there are two ways of doing this:


                  I - Doing only a cleanup on the file, which would remove just the records that are causing issues and let reclaiming work. This is a very simple approach to implement.

                  II - Doing a full compaction of files, merging multiple files, which has more complexity and would require a bit more time to implement.

                  • 6. Re: Journal Clean up / Compactor
                    timfox

                    Tried to ping you on IRC but you'd gone.

                    Anyway, after some reflection I think your idea of moving those records that cause the "linked list" problem in memory is good. Note, however, that you only need to change the JournalFile objects and counts in memory - you don't need to change anything on disk. Doing so will make normal reclaim kick in and delete the files.

                    Be careful, though, to make sure that if you delete one file you delete them all - or you might end up deleting the files with deletes in them, leaving the adds!

                    On top of that, we also need to solve the other reclaiming issue I talked about before: undeleted records in a file surrounded by deleted records. That can only be solved by compaction.
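
                    As a minimal sketch of that in-memory-only adjustment (the method itself is hypothetical; decNegCount/decPosCount are the counters that appear in Clebert's code later in the thread):

                     // Sketch: drop the cross-file dependency in the JournalFile counters only,
                     // without touching the disk. Method name is made up for illustration.
                     private void dropDependencyInMemory(final JournalFile addFile, final JournalFile deleteFile)
                     {
                        // The delete file no longer counts a delete against the add file...
                        deleteFile.decNegCount(addFile);
                        // ...and the add file no longer counts the already-deleted add.
                        addFile.decPosCount();
                        // Normal reclaiming can then kick in for the whole chain of files.
                     }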

                    • 7. Re: Journal Clean up / Compactor
                      clebert.suconic

                       

                      "timfox" wrote:
                      Tried to ping you on IRC but you'd gone.

                      Anyway, after some reflection I think your idea of moving those records that cause the "linked list" problem in memory is good. Note, however, that you only need to change the JournalFile objects and counts in memory - you don't need to change anything on disk. Doing so will make normal reclaim kick in and delete the files.


                      The idea was to update the file on disk and change the counters in memory, so that in case there is a reload after the linked files were deleted, the data wouldn't be duplicated.

                      • 8. Re: Journal Clean up / Compactor
                        timfox

                         

                        "clebert.suconic@jboss.com" wrote:

                        The idea was to update the file on disk and change the counters in memory, so that in case there is a reload after the linked files were deleted, the data wouldn't be duplicated.


                        Why would you need to update the file on disk? If you update in memory, then the files get deleted, so they won't get reloaded... What am I missing here?

                        • 9. Re: Journal Clean up / Compactor
                          clebert.suconic

                           

                          "Tim" wrote:
                          Why would you need to update the file on disk? If you update in memory, then the files get deleted, so they won't get reloaded... What am I missing here?



                          Take my initial example into consideration:


                          f1: A1, A2 (A2 won't be deleted for a while)

                          f2: D1, A3, A4

                          f3: D3, D4, A5, A6

                          f4: D5, D6, A7, A8

                          f5: D7, D8, A9, A10
                          ....

                          f1000: D998, D999, A1000, A1001
                          
                          


                          You have 1000 files hanging on A1. So what I need to do in the above example is to physically update f1, so f1 will be:

                          f1: A2 (there is no more A1 in the file).


                          And remove the memory dependency between f1 and f2.


                          After I do that, 999 files will be reclaimed or deleted.


                          If I don't update f1, A1 would come back during reload, as the delete D1 is gone with the reclaimed file.

                          • 10. Re: Journal Clean up / Compactor
                            clebert.suconic

                            This is a summary of the main changes:

                            http://fisheye.jboss.org/browse/Messaging/trunk/src/main/org/jboss/messaging/core/journal/impl/JournalImpl.java?r1=5858&r2=5864

                            and

                            http://fisheye.jboss.org/browse/Messaging/trunk/src/main/org/jboss/messaging/core/journal/impl/JournalFile.java?r1=5387&r2=5864


                            - JournalFile:
                            I have added:

                             void addCleanupInfo(long id, JournalFile deleteFile);
                             JournalFile getCleanupInfo(long id);
                            


                            I need to know which records are possibly bad. If a delete happened in a different file, I need that information to later clear the record during cleanup. The cleanup info gives me that list.
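
                            One simple way to back that (an assumption about the implementation; only the two method signatures above are from the actual change) is a map from record id to the file holding the delete:

                             // Hypothetical backing inside the JournalFile implementation.
                             private final java.util.Map<Long, JournalFile> cleanupInfo =
                                   new java.util.concurrent.ConcurrentHashMap<Long, JournalFile>();

                             public void addCleanupInfo(final long id, final JournalFile deleteFile)
                             {
                                cleanupInfo.put(id, deleteFile);
                             }

                             public JournalFile getCleanupInfo(final long id)
                             {
                                return cleanupInfo.get(id);
                             }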


                            In JournalImpl, when a record is deleted, if the delete is in a different file than the add, I add the information to that list:

                            inner class PosFiles:
                            ...

                             void addDelete(final long id, final JournalFile deleteFile)
                             {
                                // If the delete landed in a different file than the add, remember it so the
                                // cleanup can later clear the add record and break the cross-file dependency.
                                if (addFile != deleteFile)
                                {
                                   addFile.addCleanupInfo(id, deleteFile);
                                }

                                deleteFile.incNegCount(addFile);

                                if (updateFiles != null)
                                {
                                   // Updates to the record may live in yet other files; track those too.
                                   for (JournalFile updateF : updateFiles)
                                   {
                                      if (addFile != updateF)
                                      {
                                         // ... (some comments suppressed here)
                                         updateF.addCleanupInfo(id, deleteFile);
                                      }

                                      deleteFile.incNegCount(updateF);
                                   }
                                }
                             }
                            



                            During the cleanup, if a record is on that list, I change the record-type field for that record to CLEARED.


                            readJournalFile(journalFile, new JournalReader()
                            {
                               public void addRecord(final int recordPos, final RecordInfo recordInfo) throws Exception
                               {
                                  // A non-null cleanup file means this add was deleted from another file,
                                  // so the record has to be cleared here to break the dependency.
                                  JournalFile cleanupFile = journalFile.getCleanupInfo(recordInfo.id);
                                  if (cleanupFile != null)
                                  {
                                     if (trace)
                                     {
                                        trace("Cleaning addRecord id = " + recordInfo.id);
                                     }

                                     // Regenerate the record and overwrite it in place at its original position.
                                     ByteBufferWrapper buffer = generateAddRecord(false,
                                                                                  fileID,
                                                                                  recordInfo.id,
                                                                                  recordInfo.userRecordType,
                                                                                  new ByteArrayEncoding(recordInfo.data));

                                     buffer.rewind();

                                     sf.position(recordPos);
                                     sf.write(buffer.getBuffer(), false);

                                     // Eliminating the dependency between the add file and the delete file
                                     cleanupFile.decNegCount(journalFile);
                                     journalFile.decPosCount();
                                  }
                               }
                            




                            And to support reading JournalFiles for purposes other than loading, I have created a method readJournalFile, which will be used by load, cleanup, and compacting (in the future). The logic for reading the file is kept in a single place:


                             // Used by both load, cleanup, and compact (in future)
                             private int readJournalFile(final JournalFile file, final JournalReader reader) throws Exception
                             {
                            

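                            With that in place, load itself can hypothetically be written as just another JournalReader over the same method (a sketch; loadManager and the omitted callbacks are assumptions, only the reader pattern and the addRecord signature come from the code above):

                             // Sketch: load expressed as another JournalReader implementation.
                             readJournalFile(file, new JournalReader()
                             {
                                public void addRecord(final int recordPos, final RecordInfo recordInfo) throws Exception
                                {
                                   // During load, an add simply becomes a live record handed to whatever
                                   // is rebuilding the in-memory state.
                                   loadManager.addRecord(recordInfo);
                                }
                                // (other callbacks - updates, deletes, transactions - omitted)
                             });
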

                            • 11. Re: Journal Clean up / Compactor
                              timfox

                              Can you pls revert your changes like we discussed?