-
1. Re: Infinispan Binary Store - Preview
hchiorean Aug 6, 2012 10:25 AM (in response to tfromm)Some questions/answers:
- regarding the testing, we have this: https://issues.jboss.org/browse/MODE-1590 which will hopefully unify all the testing for the various binary stores implementations and provide an easy way to use different configurations
- what kind of data would the "metadata-cache" store for a binary value ?
- storeValue does trigger text extraction, but this is not something that different binary store implementations should worry about. The text extraction process should work the same way, regardless of the implementation, by extending AbstractBinaryStore. The different implementations should only "worry" about the storeExtractedText and getExtractedText methods. I guess this shows that there is a certain degree of "responsibility ambiguity" among the BinaryStore interface and AbstractBinaryStore class.
-
2. Re: Infinispan Binary Store - Preview
rhauch Aug 6, 2012 10:31 AM (in response to tfromm)Thomas Fromm wrote:
Hi,
I've agreed rhauch to provide an BinaryStore impl for Infinispan.
Today I've commited at my branch a first shot:
https://github.com/kosch/modeshape/commit/2cfe15c78b3a60ab82c15dfa46afd01c3e9437f5
Very nice. BTW, you can create a pull-request and in the description say that you're seeking feedback and that it's not ready to be merged. The pull-request will be automatically updated when you make changes locally and either push new commits to the same branch or amend and push the last commit (effectively rewriting it). The benefit of this is that it's easier to get line-by-line comments (e.g., what GitHub calls "line notes").
To the impl:
As you can see I've not used GridFS of Infinispan at least due lack of capability of storing additional properties.
However I use the same concept: One cache for the metadata of the blob, 2nd cache for the blob itself divided into chunks. The default chunk size is now 5 MB, means an entry at the blob cache can have a size up to 5MB. Maybe reduce the size to 1MB? Maybe easiert to calculate when BLOB-Cache has eviction configuration... (until Infinispan provides size-based eviction :-)
That looks good. The Infinispan folks describe GridFS as a "prototype", so I think it's certainly okay to do it the way you did. Regarding eviction, can't we use a LRU strategy for the cache. Yes, it requires a bit of computation, but I think people should plan on tuning the configuration (and the chunk size, the number of entries, eviction strategy, etc.) for their environment. (Once we start performance optimization, we may find that certain values may be optimal for lots of situations, in which case we can change the defaults).
The biggest point still open: The integration into Modeshape configuration. Maybe this is better to discussed in #modeshape. I need at least 2 cache names.
It's okay to have two cache names; we do that when configuring ModeShape to store the Lucene indexes in Infinispan; see here for the lines in our JSON Schema. Any chance that two of those three cache names would work for the Infinispan binary store?
-
3. Re: Infinispan Binary Store - Preview
rhauch Aug 6, 2012 10:40 AM (in response to tfromm)- Has modeshape some kind of central tmp directory except the default on uses in the jvm?
Considering that both the DatabaseBinaryStore, InfinispanBinaryStore, and the MongodbCacheStore all need some form of temp space to write the content before storing it (e.g., to compute the BinaryKey), perhaps we need a new subclass of AbstractBinaryStore that encapsulates this. Not sure if that would really simplify all three, but it might be worth considering.
-
4. Re: Infinispan Binary Store - Preview
rhauch Aug 6, 2012 10:45 AM (in response to hchiorean)- storeValue does trigger text extraction, but this is not something that different binary store implementations should worry about. The text extraction process should work the same way, regardless of the implementation, by extending AbstractBinaryStore. The different implementations should only "worry" about the storeExtractedText and getExtractedText methods. I guess this shows that there is a certain degree of "responsibility ambiguity" among the BinaryStore interface and AbstractBinaryStore class.
It does seem like we need to refactor the base class. If we really think the methods implemented in AbstractBinaryStore will never need to be overridden, then the easiest approach might be to simply mark them as "final".
Alternatively, we could extract the basic logic operations (e.g., storeExtractedText and getExtractedText) used in the AbstractBinaryStore to a separate interface/class, which the concrete subclass of AbstractBinaryStore provides in the constructor. This is less straightforward.
Any other ideas? On one hand, I'd like to improve this framework; on the other, I don't know that we'll really have that many more implementations. (Perhaps I'm wrong.)
-
5. Re: Infinispan Binary Store - Preview
tfromm Aug 6, 2012 10:50 AM (in response to hchiorean)- regarding the testing, we have this: https://issues.jboss.org/browse/MODE-1590 which will hopefully unify all the testing for the various binary stores implementations and provide an easy way to use different configurations
Ok, hm. Nobody works on it, so the AbstractBinaryStoreTest could be used to fix also this issue. Need just to extend the current tests (some are still missing) and port the other BinaryStore Impls.
- what kind of data would the "metadata-cache" store for a binary value ?
The Metadata object holt in the cache contains stuff like unused timestamp, last modified, size, mimetype. I assume the cache for the metadata will be in the most cases a replicated cache to provide fast access to the metadata of an binary entry.
-
6. Re: Infinispan Binary Store - Preview
hchiorean Aug 6, 2012 10:56 AM (in response to rhauch)It seems to me as the preemptive text extraction is causing these problems: the storeValue method is something each type of binary store should implement, while on the other hand, the text extraction mechanism should always be triggered. I think we might need to either extract another class (private to the -jcr module) and use that or somehow "minimize" the scope of the BinaryStore interface
-
7. Re: Infinispan Binary Store - Preview
rhauch Aug 6, 2012 10:59 AM (in response to tfromm)- regarding the testing, we have this: https://issues.jboss.org/browse/MODE-1590 which will hopefully unify all the testing for the various binary stores implementations and provide an easy way to use different configurations
Ok, hm. Nobody works on it, so the AbstractBinaryStoreTest could be used to fix also this issue. Need just to extend the current tests (some are still missing) and port the other BinaryStore Impls.
We just logged the issue this past Thursday. And actually MODE-1590 deals just as much with the Maven environment and being able to specify the repository configuration used for the unified BinaryStore tests as it does refactoring the current tests into a single BinaryStoreTest class (or whatever it's called). Note that each impl may still want some unit tests, but basic functionality should be tested with the BinaryStoreTest class.
- what kind of data would the "metadata-cache" store for a binary value ?
The Metadata object holt in the cache contains stuff like unused timestamp, last modified, size, mimetype. I assume the cache for the metadata will be in the most cases a replicated cache to provide fast access to the metadata of an binary entry.
I agree - the main reason for breaking the "metadata" cache into a separate cache is so that it can be replicated, whereas the regular "data" cache will have different modes (e.g., perhaps replicated in a small-ish cluster, and distributed for almost every other configuration).