3 Replies Latest reply on Oct 27, 2014 2:22 AM by hchiorean

    Questions regarding indexing

    dalbani

      Hello,

       

      Before opening yet another invalid bug report, I have a couple of questions / remarks regarding indexing.

       

      First, is it expected to have *large* index files?

      In my setup, on the one side, I have a LevelDB-based (non binary) store of around 400 MB.

      On the other side, the MapDB index files amount to almost 5 GB!? The local-indexes.db.t file is the largest by far.

      I have around 20 indexes defined, mainly on STRING columns.

       

      Another question, which is more of a bug report: here's the exception that I get when I use Workspace.reindex():

       

      2014-10-23 22:00:06,678 ERROR [org.modeshape.jcr.RepositoryIndexManager$ScanningRequest] (modeshape-reindexing-6-thread-2) Error while indexing '/' in workspace 'default': null: java.lang.NullPointerException
              at org.mapdb.DataOutput2.writeUTF(DataOutput2.java:147) [mapdb-1.0.6.jar:]
              at org.mapdb.Serializer$1.serialize(Serializer.java:70) [mapdb-1.0.6.jar:]
              at org.mapdb.Serializer$1.serialize(Serializer.java:67) [mapdb-1.0.6.jar:]
              at org.modeshape.jcr.index.local.MapDB$UniqueKeyBTreeSerializer.serialize(MapDB.java:434) [modeshape-jcr-4.0.0.Final.jar:4.0.0.Final]
              at org.mapdb.BTreeMap$NodeSerializer.serialize(BTreeMap.java:385) [mapdb-1.0.6.jar:]
              at org.mapdb.BTreeMap$NodeSerializer.serialize(BTreeMap.java:288) [mapdb-1.0.6.jar:]
              at org.mapdb.Store.serialize(Store.java:154) [mapdb-1.0.6.jar:]
              at org.mapdb.StoreWAL.update(StoreWAL.java:403) [mapdb-1.0.6.jar:]
              at org.mapdb.Caches$HashTable.update(Caches.java:269) [mapdb-1.0.6.jar:]
              at org.mapdb.BTreeMap.put2(BTreeMap.java:746) [mapdb-1.0.6.jar:]
              at org.mapdb.BTreeMap.put(BTreeMap.java:643) [mapdb-1.0.6.jar:]
              at org.modeshape.jcr.index.local.LocalDuplicateIndex.add(LocalDuplicateIndex.java:90) [modeshape-jcr-4.0.0.Final.jar:4.0.0.Final]
              at org.modeshape.jcr.index.local.IndexChangeAdapters$SingleValuedPropertyChangeAdapter.addValues(IndexChangeAdapters.java:587) [modeshape-jcr-4.0.0.Final.jar:4.0.0.Final]
              at org.modeshape.jcr.index.local.IndexChangeAdapters$AbstractPropertyChangeAdapter.reindexNode(IndexChangeAdapters.java:490) [modeshape-jcr-4.0.0.Final.jar:4.0.0.Final]
              at org.modeshape.jcr.spi.index.provider.IndexChangeAdapter.index(IndexChangeAdapter.java:66) [modeshape-jcr-4.0.0.Final.jar:4.0.0.Final]
              at org.modeshape.jcr.spi.index.provider.IndexProvider$7.add(IndexProvider.java:802) [modeshape-jcr-4.0.0.Final.jar:4.0.0.Final]
              at org.modeshape.jcr.spi.index.provider.IndexProvider$1.add(IndexProvider.java:190) [modeshape-jcr-4.0.0.Final.jar:4.0.0.Final]
              at org.modeshape.jcr.RepositoryQueryManager.reindexContent(RepositoryQueryManager.java:484) [modeshape-jcr-4.0.0.Final.jar:4.0.0.Final]
              at org.modeshape.jcr.RepositoryQueryManager$2$1.scan(RepositoryQueryManager.java:279) [modeshape-jcr-4.0.0.Final.jar:4.0.0.Final]
              at org.modeshape.jcr.RepositoryIndexManager$ScanningRequest.onEachPathInWorkspace(RepositoryIndexManager.java:889) [modeshape-jcr-4.0.0.Final.jar:4.0.0.Final]
              at org.modeshape.jcr.RepositoryQueryManager$2.call(RepositoryQueryManager.java:285) [modeshape-jcr-4.0.0.Final.jar:4.0.0.Final]
              at org.modeshape.jcr.RepositoryQueryManager$2.call(RepositoryQueryManager.java:252) [modeshape-jcr-4.0.0.Final.jar:4.0.0.Final]
              at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_72]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_72]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_72]
              at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_72]
      

       

      What's strange is that it doesn't happen when using Workspace.reindexAsync()?!

       

      Thanks.

        • 1. Re: Questions regarding indexing
          hchiorean

          The exception does look like a bug, so please feel free to open a JIRA for it. Also, make sure you attach either a test case or some runnable code that we can use to reproduce it.
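A note that may help when building the reproducer: the top frame of the trace is a writeUTF call made from a string serializer. The JDK's own java.io.DataOutputStream.writeUTF throws a NullPointerException when it is handed a null string. Whether MapDB's DataOutput2 fails for the same reason is an assumption, not something confirmed in this thread, so treat the sketch below as a hint about where to look (for example, nodes lacking a value for an indexed property) rather than as a diagnosis. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; it uses only the JDK, not MapDB or ModeShape.
class WriteUtfNullDemo {

    // Returns true if DataOutputStream.writeUTF throws a NullPointerException
    // for the given value, false if the value is written normally.
    public static boolean writeUtfThrowsNpe(String value) {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(value);
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("null value throws NPE: " + writeUtfThrowsNpe(null));
        System.out.println("\"abc\" throws NPE: " + writeUtfThrowsNpe("abc"));
    }
}
```

If the reproducer shows the same pattern, checking the content for nodes of the indexed type that have no value for the indexed property would be a reasonable next step.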

          However, are you sure that the statement:

          What's strange is that it doesn't happen when using Workspace.reindexAsync()?!

          is correct? The stack trace clearly shows that the exception occurs on another thread, not the main thread. Workspace.reindex() should never produce such a stack trace, because that method is executed in-thread.
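To illustrate how a stack trace reveals which thread the code ran on: the trace posted above ends in FutureTask.run, ThreadPoolExecutor.runWorker, ThreadPoolExecutor$Worker.run and Thread.run, and the log line names the thread "modeshape-reindexing-6-thread-2". The following is a minimal, JDK-only sketch of that check. It does not use ModeShape at all, and the class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; captureStack() stands in for "the indexing work".
class StackOriginDemo {

    // Returns true if any frame belongs to ThreadPoolExecutor (or its Worker),
    // which means the code was executed by a pool worker thread.
    public static boolean ranOnPoolThread(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().startsWith("java.util.concurrent.ThreadPoolExecutor")) {
                return true;
            }
        }
        return false;
    }

    // Captures the stack of whichever thread calls this method.
    public static StackTraceElement[] captureStack() {
        return new Throwable().getStackTrace();
    }

    // Runs captureStack() on a single-thread pool and returns the captured stack.
    public static StackTraceElement[] captureOnPoolThread() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<StackTraceElement[]> task = StackOriginDemo::captureStack;
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("in-thread call on pool thread? " + ranOnPoolThread(captureStack()));
        System.out.println("submitted task on pool thread? " + ranOnPoolThread(captureOnPoolThread()));
    }
}
```

A call made directly on the caller's thread has the caller's frames at the bottom of the stack, whereas work handed to an executor bottoms out in the pool's worker frames, as the posted trace does.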

           

          Regarding the size of the index, I suspect that can happen if you have a large number of nodes in conjunction with your index definitions. Also, if the STRING properties that you're indexing contain large string values, that will probably increase the size of the indexes as well.

          • 2. Re: Re: Questions regarding indexing
            dalbani

            The exception does look like a bug, so please feel free to open a JIRA for it. Also, make sure you attach either a test case or some runnable code that we can use to reproduce it.

            Well, the test case is pretty simple: I simply call session.getWorkspace().reindex() on a brand new session object. Couldn't be simpler!

            (This call is made from a Spring REST @RequestMapping method, in case it matters.)

            And I do confirm that reindexAsync() doesn't trigger the exception.

             

            As for my problem related to the size of the index, I've reduced it to only a couple of properties (STRING values, a dozen characters each at most).

            And I still get index files in the range of several GB?!

            But I've had a look at the contents of the local-indexes.db.t (binary) file: it looks like all properties of all the nodes in my repository are indexed?!

            I have no idea what's going on here...

            It absolutely doesn't reflect my configuration:

             

            <index-providers>
                <index-provider name="local-index-provider" classname="local" relative-to="jboss.server.data.dir" path="modeshape/index/xyz"/>
            </index-providers>
            <indexes>
                <index name="xyzIdentifier" provider-name="local-index-provider" kind="VALUE" node-type="xyz:item" columns="imp:identifier(STRING)"/>
                <index name="xyzReference" provider-name="local-index-provider" kind="VALUE" node-type="xyz:item" columns="imp:reference(STRING)"/>
            </indexes>
            

             

            I have of course restarted WildFly numerous times and completely deleted the "modeshape/index/xyz" directory each time.

             

            And when I removed all the indexing settings from the WildFly configuration file, ModeShape went back to normal behavior: reindex() produced no error (and, of course, no index files were created on disk).

             

            Is my ModeShape bewitched or what?

            • 3. Re: Re: Questions regarding indexing
              hchiorean

              We already have tests for the simple workspace.reindex() calls (see JcrRepositoryTest.java on the master branch of the ModeShape/modeshape GitHub repository), so it's more complicated than this, which is why we need something runnable to reproduce the exception. It may very well depend on your index definitions, content, etc.

              Also, as stated in my previous comment, if you read the stack trace you posted, it doesn't seem to have anything to do with Workspace.reindex(), which is an in-thread method call. Your stack trace is clearly executing on another thread.