We've recently run into an issue while importing a relatively large number of documents into a ModeShape JCR repository. For now, we are storing the content using file-based binary storage on the local filesystem. However, the modeshape.repository file is growing much larger, and much faster, than we expected. We're currently using ModeShape 5.4.1.Final (we temporarily rolled back to 5.0.0.Final and observed the same behaviour there).
As a quick test, I wrote a simple tool that repeatedly adds a configurable number of nt:file nodes, each with a 10240-byte (10 KB) binary attachment (the jcr:content child node), using the JcrTools.uploadFile Java API. I also configured the repository so that any binary larger than 1024 bytes (1 KB) is stored in the binary store (minimumBinarySizeInBytes = 1024) rather than in the persistence storage (modeshape.repository). I ran the test with 100, 1000 and 10000 files. As the figures below show, the binary store grew in line with expectations, but the persistence storage grew far faster than the number of files, almost exponentially. Is this expected behaviour? Is there really that much overhead in storing simple nt:folder/nt:file/jcr:content nodes?
100 x 10 KB binary files: Total binary storage (~1MB). Total persistence storage (1.9MB)
1000 x 10 KB binary files: Total binary storage (~10MB). Total persistence storage (62MB)
10000 x 10 KB binary files: Total binary storage (~100MB). Total persistence storage (3.76GB!!!)
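To put those numbers in perspective, here is a rough back-of-the-envelope sketch (plain Java, no ModeShape involved) that divides the persistence storage size by the number of files. The class and method names are just made up for illustration, and it assumes 1 MB = 1024 KB and 1 GB = 1024 MB.

```java
public class PersistenceGrowth {

    // Persistence storage per file in KB, assuming 1 MB = 1024 KB.
    static double kbPerFile(double persistenceMb, int files) {
        return persistenceMb * 1024.0 / files;
    }

    public static void main(String[] args) {
        // Figures taken from the three test runs above.
        int[] files = {100, 1000, 10000};
        double[] persistenceMb = {1.9, 62.0, 3.76 * 1024}; // 3.76 GB expressed in MB

        for (int i = 0; i < files.length; i++) {
            System.out.printf("%d files: %.1f KB of persistence storage per file%n",
                    files[i], kbPerFile(persistenceMb[i], files[i]));
        }
    }
}
```

By this calculation the persistence storage works out to roughly 19 KB, 63 KB and 394 KB per file for the three runs, so the cost per node is itself growing as more files are added, rather than staying constant as I would have expected.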
I feel like I'm missing something fundamental here - please tell me I'm missing something fundamental!! Is something being cached within the modeshape.repository file? If so, how can that space be freed up, and can that be forced programmatically (garbage collection, maybe)? Shutting down and restarting our Spring-based application (within which ModeShape is embedded) has no effect on the overall size of the persistence storage.