Journal Clean up / Compactor
clebert.suconic Feb 10, 2009 12:20 PM
Let me first explain the issues we have on the journal:
Issue I - The biggest issue we have on the journal right now is the second criterion for reclaiming.
The first criterion is a simple reference count of adds and deletes.
But, as we mix adds and deletes in the same files, we can't delete a file if it holds deletes pointing at records in another file. And if a consumer delays ACKing a message at any point, that will leave a lot of files hanging around on disk.
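A minimal sketch of the two reclaiming criteria described above. The class and field names (FileCounters, liveAdds, deletesToward) are hypothetical and only illustrate the idea; they are not taken from the actual journal code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-file counters; names are illustrative only. */
class FileCounters {
    final int fileId;
    int liveAdds; // adds stored in this file that have not been deleted yet
    // Deletes stored in this file, keyed by the id of the file that holds the deleted add.
    final Map<Integer, Integer> deletesToward = new HashMap<Integer, Integer>();

    FileCounters(int fileId) {
        this.fileId = fileId;
    }
}

class ReclaimCheck {
    /** A file may be reclaimed only when both criteria hold. */
    static boolean canReclaim(FileCounters file, Map<Integer, FileCounters> filesOnDisk) {
        // Criterion 1: every add stored in this file has been deleted.
        if (file.liveAdds > 0) {
            return false;
        }
        // Criterion 2: no delete in this file points at an add in another file still on disk.
        for (Map.Entry<Integer, Integer> e : file.deletesToward.entrySet()) {
            int target = e.getKey();
            if (target != file.fileId && e.getValue() > 0 && filesOnDisk.containsKey(target)) {
                return false;
            }
        }
        return true;
    }
}
```

With one un-ACKed message keeping file 1 alive, any later file holding a delete toward file 1 fails the second check, which is how a single delayed ACK can pin a chain of files.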
To fix this first issue, I see three possible solutions:
Solution I) Have a Cleaner Thread removing the links between the files. It would physically eliminate the deleted records that cause the second-level dependency, which would allow several files to be reclaimed.
Solution II) Put Deletes and Adds on separate journal files. We would have JournalLogs only used to add, and JournalLogs only used to delete.
Solution III) Read all valid records, place them on another journal-file and replace the files.
Solutions I and II are very easy to implement.
Solution III will have a little extra complexity due to locking issues (deletes arriving while we are still compacting).
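A rough sketch of what Solution III could look like, including one possible way to handle deletes that arrive during compaction. Rec, Compactor and onDeleteDuringCompaction are hypothetical names, and buffering the late deletes under a lock is an assumption made for illustration, not a decided design.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory view of a journal record. */
class Rec {
    final long id;
    final boolean isDelete;

    Rec(long id, boolean isDelete) {
        this.id = id;
        this.isDelete = isDelete;
    }
}

class Compactor {
    // Ids deleted by the live journal while compaction is running (assumed mechanism).
    private final Set<Long> lateDeletes = new HashSet<Long>();

    /** Called when a delete arrives while we are still compacting. */
    synchronized void onDeleteDuringCompaction(long id) {
        lateDeletes.add(id);
    }

    /** Returns the records that would be written to the replacement file. */
    List<Rec> compact(List<Rec> oldRecords) {
        // First pass: collect the ids that already have a delete record.
        Set<Long> deleted = new HashSet<Long>();
        for (Rec r : oldRecords) {
            if (r.isDelete) {
                deleted.add(r.id);
            }
        }
        // Second pass: keep only adds that were never deleted.
        List<Rec> out = new ArrayList<Rec>();
        for (Rec r : oldRecords) {
            if (!r.isDelete && !deleted.contains(r.id)) {
                out.add(r);
            }
        }
        // Before the files are swapped, drop anything deleted in the meantime.
        synchronized (this) {
            out.removeIf(rec -> lateDeletes.contains(rec.id));
            lateDeletes.clear();
        }
        return out;
    }
}
```

The synchronized section at the end is where the locking complexity shows up: the swap has to happen while no further delete can slip in.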
______________________________________________________________________________________________________
Issue II) Supposing you no longer have an issue with linked deletes (Criterion 2 on reclaiming), there is also a need for compacting files.
Say you have an Address where you Produce and Consume Messages very fast, say at a rate of 1000/second, and another Address where you Produce a message without any consumers, say at a rate of 1 message per second. Each journal file then holds a few live records from the second Address among thousands of records that are already deleted, so the file can't be reclaimed.
This won't be as bad as the linked deletes, but it may still be an issue depending on the usage.
Solution IV) To solve this we need to compact several files into a single file, a slight variation of Solution III.
There is some complexity in keeping the original file-ids in the compacted file, as the commit holds a count of records per file (transaction-summary) to detect incomplete commits, in addition to the locking issue.
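To illustrate why the file-ids matter, here is a small sketch of a transaction-summary check. TxSummaryCheck and its map-based representation are assumptions for illustration; the post only says the commit holds a count of records per file.

```java
import java.util.Map;

class TxSummaryCheck {
    /**
     * summaryInCommit: fileId -> number of records the commit says the transaction wrote there.
     * foundOnLoad:     fileId -> number of records of that transaction actually read back.
     * Returns true when no file is missing records of the transaction.
     */
    static boolean isComplete(Map<Integer, Integer> summaryInCommit,
                              Map<Integer, Integer> foundOnLoad) {
        for (Map.Entry<Integer, Integer> e : summaryInCommit.entrySet()) {
            Integer found = foundOnLoad.get(e.getKey());
            if (found == null || found < e.getValue()) {
                return false; // records are missing, so the commit is incomplete
            }
        }
        return true;
    }
}
```

If a commit recorded two records in file 3 and one in file 4, and compaction moved all three into a new file 10 without keeping the original ids, this check would wrongly report the transaction as incomplete.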
______________________________________________________________________________________________________
So far, independently of which of those solutions we pick, I need to be able to read JournalFiles outside of the context of loading the files. For that I have refactored the method load, extracting a private method readJournalFile(JournalFile....) that takes an interface as a parameter.
This way readJournalFile will know how to deal with the journal data-format, and the implementation passed as a parameter will deal with the specifics of loading or compacting.
I don't want to duplicate the data-format of the journal file across two distinct methods.
(I'm actually considering moving that method to JournalFile, but I'm not sure yet)
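The refactoring described above could be sketched roughly as follows. JournalReaderCallback, LoadCallback, CompactCallback and the toy {type, id} entry format are all hypothetical; the post does not give the real interface or the on-disk layout, and this readJournalFile takes an array instead of a JournalFile only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical callback; the real interface is not shown in the post. */
interface JournalReaderCallback {
    void onAdd(long id);
    void onDelete(long id);
}

class JournalReader {
    // Toy format, assumed for illustration: each entry is {type, id}; 1 = add, 2 = delete.
    // Only this method knows the format; callers just receive callbacks.
    static void readJournalFile(long[][] entries, JournalReaderCallback callback) {
        for (long[] entry : entries) {
            if (entry[0] == 1) {
                callback.onAdd(entry[1]);
            } else if (entry[0] == 2) {
                callback.onDelete(entry[1]);
            }
        }
    }
}

/** Loading: rebuild the list of live record ids. */
class LoadCallback implements JournalReaderCallback {
    final List<Long> liveIds = new ArrayList<Long>();

    public void onAdd(long id) {
        liveIds.add(id);
    }

    public void onDelete(long id) {
        liveIds.remove(Long.valueOf(id));
    }
}

/** Compacting: remember which ids were deleted so they can be skipped when copying. */
class CompactCallback implements JournalReaderCallback {
    final Set<Long> deletedIds = new HashSet<Long>();

    public void onAdd(long id) {
        // nothing to do in this pass
    }

    public void onDelete(long id) {
        deletedIds.add(id);
    }
}
```

Both callers go through the same parsing loop, so a change to the data-format only has to be made in one place.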