8 Replies Latest reply on Jun 13, 2012 10:54 AM by hchiorean

    Where should ModeShape 3 store indexes (and binaries) by default?

    rhauch

      Neil tried to use ModeShape 3 (outside of AS7) and ran into queries that returned no results after a restart. The cause is that ModeShape 3 currently stores the Lucene indexes in memory by default, although this is easily remedied by providing a specific location in the configuration. Neil asked whether there should be some sort of warning, so I'd like to discuss the default behavior in this thread.

       

      Our initial thinking (even with ModeShape 2) was that we know very little about where the application is running, so we should behave in a transient way. This has several issues: the data is lost if the engine is shut down, and the memory footprint is pretty high (since we're storing nearly everything in memory only). Consequently, I no longer think this is the best approach, but I'd like to get your impressions of what the default behavior should be. I can think of two primary possibilities:

       

      1. Transient by default - This is basically the current behavior, where binaries are stored in a temporary directory, content is stored in-memory, and indexes are stored in-memory. This results in a fairly large memory footprint, which we could reduce by storing the indexes in a temporary directory as well, and even by setting up a default internal Infinispan cache with a file-system cache store that also writes to a temporary directory. Regardless, this behavior means that as soon as the engine is shut down, all the data is lost.
      2. Persistent by default - This would change the behavior so that all data is stored locally on the file system within the directory where the application is being run. So for example, we could create a directory with the repository name, and in that directory create a "store" directory (where we'd persist the Infinispan cache), a "binaries" directory for binary storage, and an "indexes" directory for the Lucene indexes. The benefit is that all the repository data survives a restart with no extra configuration, and the memory footprint is smaller. It would also (hopefully) be clear to users who see these directories and don't like the location that they can change it.
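
      To make option 2 concrete, here is a minimal sketch of that directory layout using only the JDK. The class and method names (RepositoryLayout, create) are hypothetical and are not part of ModeShape; only the "store", "binaries", and "indexes" directory names come from the description above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not a ModeShape class: computes and creates the
// per-repository directories described in option 2.
final class RepositoryLayout {
    final Path root;      // <baseDir>/<repositoryName>
    final Path store;     // persisted Infinispan cache
    final Path binaries;  // binary storage
    final Path indexes;   // Lucene indexes

    private RepositoryLayout(Path root) {
        this.root = root;
        this.store = root.resolve("store");
        this.binaries = root.resolve("binaries");
        this.indexes = root.resolve("indexes");
    }

    // Creates any missing directories; directories that already exist
    // (and their contents) are left untouched, so data survives a restart.
    static RepositoryLayout create(Path baseDir, String repositoryName) throws IOException {
        RepositoryLayout layout = new RepositoryLayout(baseDir.resolve(repositoryName));
        Files.createDirectories(layout.store);
        Files.createDirectories(layout.binaries);
        Files.createDirectories(layout.indexes);
        return layout;
    }
}
```

      An application started in its working directory might call RepositoryLayout.create(Paths.get("."), "my-repo"), where "my-repo" is a placeholder repository name. Calling it again after a restart would simply find the existing directories.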

       

      What do you think? Which makes sense for the zero/minimal configuration case?

        • 1. Re: Where should ModeShape 3 store indexes (and binaries) by default?
          jonathandfields

          I vote for #2 persistent.

           

          For the AS7 deployment kit, I would expect the data files to be stored in a "modeshape" directory in the JBoss server data directory, along with HornetQ etc. (I haven't yet tried ModeShape 3, so forgive me if that's already the case.) Since that is the only way that I will be using ModeShape, I do not have a preference for the standalone or embedded deployments.

           

          I think that more log messages stating where data is located can avoid confusion.
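
          As a rough illustration of the kind of startup messages meant here, the sketch below uses java.util.logging. StorageLocationReporter and the logger name are hypothetical and not ModeShape code; a null path is used here to mean "kept in memory only".

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical helper, not part of ModeShape: reports where each kind of data lives.
final class StorageLocationReporter {
    private static final Logger LOG = Logger.getLogger("storage.locations");

    // Builds one message per storage area, keyed by area name.
    // A null path is treated as "in memory only".
    static Map<String, String> describe(Map<String, Path> locations) {
        Map<String, String> messages = new LinkedHashMap<>();
        for (Map.Entry<String, Path> entry : locations.entrySet()) {
            Path path = entry.getValue();
            String where = (path == null)
                    ? "in memory only (lost on shutdown)"
                    : path.toAbsolutePath().normalize().toString();
            messages.put(entry.getKey(), entry.getKey() + ": " + where);
        }
        return messages;
    }

    // Logs every message at INFO level so the locations show up at startup.
    static void logAll(Map<String, Path> locations) {
        for (String message : describe(locations).values()) {
            LOG.info(message);
        }
    }
}
```

          Flagging the in-memory case explicitly would also cover the warning that was asked about at the start of the thread.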

          • 2. Re: Where should ModeShape 3 store indexes (and binaries) by default?
            jonathandfields

            On the topic of configuration and defaults, I'd like to suggest that ModeShape provide Infinispan configuration examples to help new users get started. I would guess that some ModeShape users, like me, have little or no Infinispan experience. Perhaps there could be a section in the Wiki documentation with some examples and quick starts for common configurations (both with AS7 and without, though I'm mostly interested in AS7).

             

            For example, I would like to configure a distributed Infinispan cache over multiple servers, with a ModeShape app running in AS7, to create an in-memory content grid. How would that be accomplished? What would the Infinispan configuration be? How do I start Infinispan on all of the nodes (other than the node running ModeShape and AS7)? This might be obvious to some, but to me it is not obvious from the Infinispan docs.

            • 3. Re: Where should ModeShape 3 store indexes (and binaries) by default?
              kbachl

              I second Jonathan's points. First, ModeShape should by default not lose anything (it's easier to discard things later than to ask for things that are already lost), and I would also like to see example configs for simple use cases like:

               

              - data is stored in Infinispan but backed on disk in a folder

              - data is stored in Infinispan but backed by a JDBC-accessed RDBMS

              - data is stored in memory (no backing)

              - data is stored in distributed Infinispan, where each node puts a backup on disk in a folder

              • 4. Re: Where should ModeShape 3 store indexes (and binaries) by default?
                rhauch

                These are all really good points. We absolutely do need to provide some configuration recipes; I'll add them to the list.

                • 5. Re: Where should ModeShape 3 store indexes (and binaries) by default?
                  rhauch

                  Added MODE-1511 to cover the change to the default behavior.

                  • 6. Re: Where should ModeShape 3 store indexes (and binaries) by default?
                    hchiorean

                    I agree that from a client perspective the data should be persistent out of the box.

                     

                    However, if we change the default configuration, we need to make sure that our tests still use the in-memory settings.

                    • 7. Re: Where should ModeShape 3 store indexes (and binaries) by default?
                      rhauch

                      However, we need to make sure if we change the default configuration, that our tests still use the in-memory settings

                      That's an excellent point, Horia. I hadn't thought of that aspect, but I completely agree. Interestingly, our unit tests need to be set up in a special way anyway, because of the testing requirements for Infinispan (e.g., the cache container needs to be killed after each test to clean out any cached information and to prevent content created in one test from leaking into other tests). We already have an AbstractJcrRepositoryTest (with SingleUseAbstractTest and MultiUseAbstractTest subclasses) that we're hopefully using in as many places as we can. Perhaps we need to use them more consistently, or perhaps we need to update how we set up our repositories for unit testing. WDYT?
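
                      The isolation requirement described above (a fresh container per test, destroyed afterwards so nothing leaks into the next test) can be sketched in plain Java. InMemoryStore and SingleUseHarness are hypothetical stand-ins written for this illustration; they are not Infinispan or ModeShape types and are not the existing SingleUseAbstractTest.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an in-memory cache container.
final class InMemoryStore implements AutoCloseable {
    private final Map<String, String> data = new HashMap<>();
    private boolean closed;

    void put(String key, String value) { ensureOpen(); data.put(key, value); }
    String get(String key) { ensureOpen(); return data.get(key); }
    boolean isClosed() { return closed; }

    // "Kills" the container: drops all content and refuses further use.
    @Override
    public void close() {
        data.clear();
        closed = true;
    }

    private void ensureOpen() {
        if (closed) throw new IllegalStateException("store already closed");
    }
}

// Hypothetical "single use" harness: every test body gets a brand-new store,
// and the store is closed afterwards even if the body throws.
final class SingleUseHarness {
    interface TestBody { void run(InMemoryStore store); }

    static void runIsolated(TestBody body) {
        try (InMemoryStore store = new InMemoryStore()) {
            body.run(store);
        }
    }
}
```

                      If every test goes through one harness like this, switching the product default to persistent storage does not affect the tests, because the harness chooses the in-memory setup explicitly.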

                      • 8. Re: Where should ModeShape 3 store indexes (and binaries) by default?
                        hchiorean

                        I'd go with "unifying" our testing approach and using our base test classes by default (hopefully for all tests). We just need to be careful of variations such as testing with a real JTA implementation (as a good number of our tests do, with JBoss JTA) while keeping the storage in memory.