Journal Cleanup and Journal Compactor
clebert.suconic Jun 9, 2009 4:16 PM

As documented, and as all the JBM developers know, the journal is append-only. When we delete a record we actually append a delete record. When every record in a file has a matching delete somewhere else, that file is marked as reclaimable. It can then be reused or deleted, depending on the number of files in use.
However, there is an issue when a record is not deleted for a long time. Say the consumer needs a couple of days to come and consume its messages (DLQs are one use case; I could list a few more).
If a file holds even one record that will never be deleted, or will only be deleted much later, the dependencies between files create what I call the "linked-list effect" on the journal.
You would have something like:
JournalFile 1:
- AddRecord1
- AddRecord2
JournalFile 2:
- DeleteRecord2
- AddRecord3
JournalFile 3:
- DeleteRecord3
- AddRecord4
....
JournalFile 1000:
- DeleteRecord1000
- AddRecord1001
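The chained layout above can be modeled in a few lines. This is a toy sketch with made-up names (the real JBM journal is far more involved): a file can only be reclaimed when every record it adds has a delete somewhere in the journal, and every delete it holds points at a record whose add-file is already gone (otherwise dropping the file would resurrect the record on reload).

```python
def reclaim(files):
    """files: {file_id: (set_of_added_record_ids, set_of_deleted_record_ids)}.
    Returns the ids of the files that can be reclaimed, in order."""
    live, order = dict(files), []
    while True:
        # Every delete present in the journal, and every add still live.
        all_dels = set().union(*(d for _, d in live.values()))
        live_adds = set().union(*(a for a, _ in live.values()))
        ready = [f for f, (adds, dels) in sorted(live.items())
                 if adds <= all_dels            # all its records are deleted
                 and not (dels & live_adds)]    # its deletes are no longer needed
        if not ready:
            return order
        for f in ready:
            del live[f]
        order += ready

# The linked-list effect in miniature: JournalFile 1 adds records 1 and 2;
# JournalFile i (2..10) deletes record i and adds record i+1.
chain = {1: ({1, 2}, set())}
chain.update({i: ({i + 1}, {i}) for i in range(2, 11)})
print(reclaim(chain))   # -> []  (undeleted record 1 pins the whole chain)

# Once record 1 is finally deleted (appended in a new file), the chain
# unwinds and every file except the one holding the newest add is freed:
chain[11] = (set(), {1})
print(reclaim(chain))
```

The second run frees ten of the eleven files, which is exactly why one stuck record at the head of the chain is so expensive.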
We need to somehow remove the dependencies between files when this happens. My current suggestion is to update JournalFile 1 in place, so that no record from any other file is needed to process the delete. (I call this journal cleanup.)
So you would have:
JournalFile 1:
- AddRecord1
- AddRecord2 (XXXXX... removed .... XXXX)
Having done that, you are free to delete the 999 files that depended on JournalFile 1.
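The cleanup step above can be sketched like this (the record layout and names are hypothetical, purely for illustration): rewrite the old file so that records deleted elsewhere become tombstones inside the file itself, after which no delete record in any other file is needed.

```python
def cleanup(records, deleted_elsewhere):
    """records: list of (record_id, payload) making up one journal file.
    Returns the rewritten file with externally-deleted records blanked
    out in place (None acts as the in-file tombstone)."""
    return [(rid, None if rid in deleted_elsewhere else payload)
            for rid, payload in records]

# JournalFile 1 holds records 1 and 2; record 2 was deleted in JournalFile 2.
file1 = [(1, "Record1"), (2, "Record2")]
print(cleanup(file1, {2}))   # -> [(1, 'Record1'), (2, None)]
```

After the rewrite, JournalFile 2's DeleteRecord2 is redundant, so JournalFile 2 no longer depends on JournalFile 1 surviving.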
There is another problem as well.
Say you now have 100 records pending, but only one record in each journal file. You would be using 1 GiB of disk to store roughly 10 KiB of real data. To fix that, I would compact all 100 records into a single journal file, reclaiming the space. (I call this journal compacting.)
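Compacting can be sketched the same way (again with a made-up layout): sweep the live records scattered one-per-file into a single fresh journal file, so the 100 old files can be dropped.

```python
def compact(files):
    """files: list of journal files, each a list of (record_id, payload);
    a payload of None means the record is dead. Returns a single
    compacted file holding only the live records."""
    return [(rid, payload)
            for f in files
            for rid, payload in f
            if payload is not None]

# 100 sparse files, each wasting a whole file on one ~100-byte live record:
sparse = [[(i, "x" * 100)] for i in range(100)]
compacted = compact(sparse)
print(len(sparse), "files ->", "1 file with", len(compacted), "records")
```

One hundred mostly-empty files collapse into a single file holding the ~10 KiB of live data.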
You may think that journal compacting alone would also solve cleanup, but there are a few tricky situations:
I - You may need to move more data than is strictly required.
II - It doesn't solve the linked-list effect. As soon as you compact, the long-lived records are still there, so the linked-list effect is sure to build up again, and you end up in an endless cycle of compacting, compacting, compacting.
So, my conclusion: both solutions are equally necessary.
I - We need to fix the linked-list effect anyway. If not by updating the file in place, then we need to find some other way.
II - We need to fix the compacting issue. It's not acceptable to use 1 GiB of disk, or any number near that, to store just a few records.