2 Replies Latest reply on Sep 16, 2015 8:42 AM by dankelleher

    Modeshape 4.x mapdb index sizes

    wesssel

      Hey everyone, I am currently experimenting with ModeShape 4.x. In ModeShape 3.x everything was indexed with Lucene and Hibernate Search, which are pretty powerful technologies. My initial impression is that the MapDB index is a lot bigger than the Lucene indexes were. With Lucene, indexing ~2 million nodes, each with about 10 string properties plus all the basic JCR properties, took up around 4 GB. I've been adding dummy nodes to ModeShape 4 and am currently at 5 million nodes with just one index, on the jcr:name attribute, and the index is already 6.7 GB. That works out to roughly 1.3 KB per node for a single indexed property, against roughly 2 KB per node for the full set of properties under Lucene.
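
      For anyone who wants to compare their own numbers, the per-node figures can be worked out with a few lines of Java. This is only a rough sketch: the class and method names are made up for illustration, and it assumes "gig" means decimal gigabytes (10^9 bytes), which the figures quoted here do not actually specify.

```java
// Hypothetical helper, not part of ModeShape: rough index bytes per node.
class IndexSizePerNode {

    // Assumption: 1 GB = 10^9 bytes. Use 1L << 30 instead if the sizes are GiB.
    static final double BYTES_PER_GB = 1_000_000_000.0;

    /** Returns the average number of index bytes consumed per node. */
    static double bytesPerNode(double indexSizeGb, long nodeCount) {
        return indexSizeGb * BYTES_PER_GB / nodeCount;
    }

    public static void main(String[] args) {
        // Figures taken from the post: Lucene on 3.x vs. MapDB on 4.x.
        double lucene = bytesPerNode(4.0, 2_000_000L);
        double mapDb = bytesPerNode(6.7, 5_000_000L);

        System.out.printf("Lucene (3.x, ~10 properties): ~%.0f bytes per node%n", lucene);
        System.out.printf("MapDB (4.x, jcr:name only):   ~%.0f bytes per node%n", mapDb);
    }
}
```

      The two setups index different things, so the comparison is only indicative: the Lucene figure covers about ten properties per node, while the MapDB figure covers a single one.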

       

      Index provider and defined indexes:

                      <index-providers>
                          <index-provider name="local2" classname="org.modeshape.jcr.index.local.LocalIndexProvider" relative-to="/" path="data/modeshape-indexes"/>
                      </index-providers>
                      <indexes>
                          <index name="nameIndex" provider-name="local2" synchronous="false" node-type="nt:base" columns="jcr:name(NAME)"/>
                      </indexes>
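
      To make sizes comparable between setups, it may help to measure the index directory the same way everywhere. The sketch below sums the sizes of all regular files under a directory using only the JDK; the class name is hypothetical, and the directory to pass in is whatever the `path` attribute above resolves to on your system (not assumed here).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of ModeShape: total on-disk size of a directory tree.
class IndexDirectorySize {

    /** Sums the sizes, in bytes, of all regular files under the given directory. */
    static long totalBytes(Path dir) throws IOException {
        // Files.walk returns a lazily populated stream that must be closed.
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java IndexDirectorySize <index-directory>");
            return;
        }
        Path dir = Paths.get(args[0]);
        System.out.printf("%s: %d bytes%n", dir, totalBytes(dir));
    }
}
```

      Note that this reports file sizes as the file system sees them, so any space the index files have preallocated but not yet filled is counted as well.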

       

      I'm curious what other people are experiencing with regard to index size and indexing performance in version 4.x.

        • 1. Re: Modeshape 4.x mapdb index sizes
          ma6rl

          wesssel, I'm seeing similar issues with index sizes and share your concerns about the size of the MapDB index.

           

          The other problems I'm running into are:

           

          - the time it takes for a new instance to build its index from scratch when it is added to a cluster.

          - the lack of support for LIKE and Full Text Search.

           

          At the moment adding Lucene as an index provider is scheduled for 4.4, but it does not look like any work has been done on it yet (MODE-2159, "Store indexes in local Lucene", in the JBoss issue tracker). This may well help with the index size and with LIKE/full-text queries, but it most likely will not help with the time it takes for a new cluster instance to build its index.

           

          Given this, I'm currently looking at what it would take to add support for Solr or ElasticSearch as an index provider to ModeShape. I know they are on the roadmap (MODE-2161, "Store indexes in Solr", and MODE-2162, "Store indexes in ElasticSearch", in the JBoss issue tracker), but there are currently no resources assigned to work on them. Would adding support for either of these be of use to you, and if so, which one would you prefer? At the moment I'm leaning towards ElasticSearch, as we run our instances in AWS, which provides a hosted ElasticSearch service that would make our lives easier.

          • 2. Re: Modeshape 4.x mapdb index sizes
            dankelleher

            I'm having exactly the same problem with index sizes at the moment on ModeShape 4.1.0 and would be interested in knowing if there are any best practices we can follow, or if reverting to Lucene (or moving to ElasticSearch/Solr) is an option.