8 Replies Latest reply on Sep 4, 2014 5:23 PM by rhauch

    Can't get the Indexmanager to work :(

    bes82

      Hi,

       

      please take a look at the attached test and tell me what I'm doing wrong / why my index is not used?

       

      {code}
      package org.erratic.wm.test;

      import javax.enterprise.context.RequestScoped;
      import javax.inject.Inject;
      import javax.jcr.PropertyType;
      import javax.jcr.RepositoryException;
      import javax.jcr.Session;
      import javax.jcr.query.Query;
      import static org.apache.deltaspike.core.api.projectstage.ProjectStage.UnitTest;
      import org.apache.deltaspike.testcontrol.api.TestControl;
      import org.apache.deltaspike.testcontrol.api.junit.CdiTestRunner;
      import org.junit.Assert;
      import org.junit.FixMethodOrder;
      import org.junit.Test;
      import org.junit.runner.RunWith;
      import org.junit.runners.MethodSorters;
      import org.modeshape.jcr.api.index.IndexColumnDefinition;
      import org.modeshape.jcr.api.index.IndexDefinition;
      import org.modeshape.jcr.api.index.IndexDefinitionTemplate;
      import org.modeshape.jcr.api.index.IndexManager;
      import org.slf4j.Logger;

      /**
       *
       * @author bschmidt
       */
      @RunWith(CdiTestRunner.class)
      @TestControl(startScopes = RequestScoped.class, projectStage = UnitTest.class)
      @FixMethodOrder(MethodSorters.NAME_ASCENDING)
      //@Ignore
      public class WmModeshapeIndexTest {

          @Inject
          private Session session;

          @Inject
          protected transient Logger logger;

          @Test
          public void test001CheckSomethingWithIndexes() throws RepositoryException {

              ensureIndex(indexManager(), "ntusysname", IndexDefinition.IndexKind.VALUE, "local", "nt:unstructured", "", null, "sysName", PropertyType.STRING);
              Query q = session.getWorkspace().getQueryManager().createQuery("select BASE.* FROM [nt:unstructured] as BASE WHERE BASE.sysName=$sysName",Query.JCR_SQL2);
              q.bindValue("sysName", session.getValueFactory().createValue("X"));
              String plan = ((org.modeshape.jcr.api.query.Query)q).explain().getPlan();
              logger.info(plan);

              Assert.assertEquals(1,indexManager().getIndexDefinitions().size());

              Assert.assertEquals(true,plan.contains("INDEX_USED=true"));
          }

          protected IndexManager indexManager(){
              if (!(session instanceof org.modeshape.jcr.api.Session)){
                  return null;
              }

              try {
                  return ((org.modeshape.jcr.api.Session) session).getWorkspace().getIndexManager();
              } catch (RepositoryException ex) {
                  logger.error(ex.toString(), ex);
              }
              return null;
          }

          protected void ensureIndex(IndexManager manager, String indexName, IndexDefinition.IndexKind kind, String providerName, String indexedNodeType, String desc, String workspaceNamePattern, String propertyName, int propertyType) throws RepositoryException {

              if (manager.getIndexDefinitions().containsKey(indexName)){
                  return;
              }

              logger.info("registering index on property "+propertyName+", type "+indexedNodeType);

              // Create the index template ...
              IndexDefinitionTemplate template = manager.createIndexDefinitionTemplate();
              template.setName(indexName);
              template.setKind(kind);
              template.setNodeTypeName(indexedNodeType);
              template.setProviderName(providerName);
              if (workspaceNamePattern != null) {
                  template.setWorkspaceNamePattern(workspaceNamePattern);
              } else {
                  template.setAllWorkspaces();
              }
              if (desc != null) {
                  template.setDescription(desc);
              }

              // Set up the columns ...
              IndexColumnDefinition colDefn = manager.createIndexColumnDefinitionTemplate().setPropertyName(propertyName).setColumnType(propertyType);
              template.setColumnDefinitions(colDefn);

              // Register the index ...
              manager.registerIndex(template, false);
          }

      }
      {code}

        • 1. Re: Can't get the Indexmanager to work :(
          rhauch

          One thing to note: when indexes are registered, creating and populating the indexes themselves is an asynchronous process across the cluster (even when there is just one process in the cluster). We did this because it may take quite a bit of time if there is a lot of content. Therefore, try waiting 0.5 seconds or so after registering before the indexes are used, plus any time necessary for the existing content to be reindexed.

           

          Otherwise, the code looks good. I presume that you're not overwriting an existing index definition, given the "false" second parameter in "registerIndex".
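
          Because the index becomes available asynchronously, a test can poll for it instead of sleeping for a fixed time. The following is only a minimal sketch in plain Java, not part of ModeShape: the class name IndexWait and the method awaitCondition are made up for illustration, and the condition to poll is supplied by the caller.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not a ModeShape class): polls a caller-supplied condition.
final class IndexWait {

    /**
     * Checks the condition immediately and then every pollMillis milliseconds.
     * Returns true as soon as the condition holds, or false if timeoutMillis
     * elapses first or the waiting thread is interrupted.
     */
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

          In the test above, the condition could re-create the query, call explain() and check whether the plan contains "INDEX_USED=true", with a timeout of a few seconds. That keeps the test fast on an empty repository while still tolerating a slower reindex.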

          • 2. Re: Can't get the Indexmanager to work :(
            bes82

            Ok, I waited 60 seconds, nothing changes. Additionally, I think I don't understand what you wrote:

            I register an index but there is no content. But I would still expect that the querybuilder considers using the index even if there is no content?

             

             

            Anyway, even after registering an index, waiting, registering a node, waiting again, the test still fails.

            There has to be a bug or an operator error on my side, and I need help. Please take a look at the changed test below.

             

             

            Additional Questions:

             

             

            1. after registering an index, do already existing nodes get indexed and how does this work, assuming the repo contains millions of nodes?

            2. indexing supertypes (nt:base) for instance also indexes subtypes, right?

            3. if there is an index for subtype and supertype on the same property, will the subtype index be used if the query searches only for the subtype?

            4. after shutting down the repo, deleting an index and restarting modeshape, when will existing nodes be indexed again?

            5. Do you have to reregister an index in this case or is the indexdefinition stored inside the repo itself?

            6. what does the update index property actually mean, I think this is related to 1.

             

             

            And finally will this (my usecase) actually work:

             

             

            At runtime, while discovering POJO classes with fields annotated to be indexed, I create a definition key based on class/field, ask the index manager whether it already knows this definition and, if not, create an index definition on the corresponding node type (which is also derived from the class name). What should then happen is that once the index is defined (or was already defined), all queries consider using it (maybe with a little delay, ok). This should also work if the index files were deleted during a server/app restart, hence questions 5 and 6. And it should also work for nodes created before the index was defined.

             

             

            Thanks in advance

            package org.erratic.wm.test;
            
            import javax.enterprise.context.RequestScoped;
            import javax.inject.Inject;
            import javax.jcr.Node;
            import javax.jcr.PropertyType;
            import javax.jcr.RepositoryException;
            import javax.jcr.Session;
            import javax.jcr.query.Query;
            import static org.apache.deltaspike.core.api.projectstage.ProjectStage.UnitTest;
            import org.apache.deltaspike.testcontrol.api.TestControl;
            import org.apache.deltaspike.testcontrol.api.junit.CdiTestRunner;
            import org.junit.Assert;
            import org.junit.FixMethodOrder;
            import org.junit.Test;
            import org.junit.runner.RunWith;
            import org.junit.runners.MethodSorters;
            import org.modeshape.jcr.api.index.IndexColumnDefinition;
            import org.modeshape.jcr.api.index.IndexDefinition;
            import org.modeshape.jcr.api.index.IndexDefinitionTemplate;
            import org.modeshape.jcr.api.index.IndexManager;
            import org.slf4j.Logger;
            
            /**
             *
             * @author bschmidt
             */
            @RunWith(CdiTestRunner.class)
            @TestControl(startScopes = RequestScoped.class, projectStage = UnitTest.class)
            @FixMethodOrder(MethodSorters.NAME_ASCENDING)
            //@Ignore
            public class WmModeshapeIndexTest {
            
                 
                 @Inject
                 private Session session;
            
                 @Inject
                 protected transient Logger logger;
            
                 @Test
                 public void test001CheckSomethingWithIndexes() throws RepositoryException, InterruptedException {
            
                      
                      ensureIndex(indexManager(), "ntusysname", IndexDefinition.IndexKind.VALUE, "local", "nt:unstructured", "", null, "sysName", PropertyType.STRING);
                      
                      Thread.sleep(2500);
                      
                      Node newNode = session.getRootNode().addNode("XNODE","nt:unstructured");
                      newNode.setProperty("sysName", "X");
                      
                      Thread.sleep(2500);
                      
                      Query q = session.getWorkspace().getQueryManager().createQuery("select BASE.* FROM [nt:unstructured] as BASE WHERE BASE.sysName=$sysName",Query.JCR_SQL2);
                      q.bindValue("sysName", session.getValueFactory().createValue("X"));
                      String plan = ((org.modeshape.jcr.api.query.Query)q).explain().getPlan();
                      logger.info(plan);
                      
                      Assert.assertEquals(1,indexManager().getIndexDefinitions().size());
                      
                      Assert.assertEquals(true,plan.contains("INDEX_USED=true"));
                 }
                 
                 protected IndexManager indexManager(){
                      if (!(session instanceof org.modeshape.jcr.api.Session)){
                           return null;
                      }
                      
                      try {
                           return ((org.modeshape.jcr.api.Session) session).getWorkspace().getIndexManager();
                      } catch (RepositoryException ex) {
                           logger.error(ex.toString(), ex);
                      }
                      return null;
                 }
                 
                 protected void ensureIndex(IndexManager manager, String indexName, IndexDefinition.IndexKind kind, String providerName, String indexedNodeType, String desc, String workspaceNamePattern, String propertyName, int propertyType) throws RepositoryException {
            
            
                      if (manager.getIndexDefinitions().containsKey(indexName)){
                           return;
                      }
                      
                      logger.info("registering index on property "+propertyName+", type "+indexedNodeType);
                      
                      // Create the index template ...
                      IndexDefinitionTemplate template = manager.createIndexDefinitionTemplate();
                      template.setName(indexName);
                      template.setKind(kind);
                      template.setNodeTypeName(indexedNodeType);
                      template.setProviderName(providerName);
                      if (workspaceNamePattern != null) {
                           template.setWorkspaceNamePattern(workspaceNamePattern);
                      } else {
                           template.setAllWorkspaces();
                      }
                      if (desc != null) {
                           template.setDescription(desc);
                      }
                      
                      // Set up the columns ...
                      IndexColumnDefinition colDefn = manager.createIndexColumnDefinitionTemplate().setPropertyName(propertyName).setColumnType(propertyType);
                      template.setColumnDefinitions(colDefn);
            
                      // Register the index ...
                      manager.registerIndex(template, false);
                 }
                 
                 
            }
            
            
            • 3. Re: Can't get the Indexmanager to work :(
              rhauch

              What is the repository configuration? In particular, I'd like to know what the index provider is defined to be.

               

              Everything in your example seems to make sense. This is very similar to our test case, which we know works on Beta1. I suggest turning up debug/trace logging on "org.modeshape.jcr.index" to see if you can spot what's going on.

               

              UPDATE: Please try changing your second "registerIndex(...)" parameter from "false" to "true".

               

              Ok, I waited 60 seconds, nothing changes. Additionally, I think I don't understand what you wrote:

              I register an index but there is no content. But I would still expect that the querybuilder considers using the index even if there is no content?

               

              It still takes a very short amount of time for the index provider to receive the asynchronous notification that it needs to create a new index, and then to create the new index. On my system this takes < 0.1 seconds, but it still is not synchronous.

               

              Additional Questions:

               

               

              1. after registering an index, do already existing nodes get indexed and how does this work, assuming the repo contains millions of nodes?

               

              Yes. When an index definition is registered, the repository figures out which index provider applies, and asks it to validate the index definition. If there are any problems, then an exception is thrown from the "registerIndex" method. If the index definition is valid, then the repository saves the index definition to the repository state under "/jcr:system/mode:indexes". If there are multiple index definitions registered, then they are all written to the system area with one save/txn. Then, asynchronously, a listener receives the notification that changes were made under "/jcr:system/mode:indexes", and it does some fairly complex logic to figure out what changed, and for each index definition it then notifies the index provider of the changes so that the index provider can create/update/remove the appropriate index definitions. For each of these calls, the affiliated index provider will then create/update/remove the indexes described by the new/updated/removed definition, register an event listener so that it can keep this index updated as content is changed, and the provider then tells the repository which paths, if any, need to be (re)indexed to (re)populate the index. The repository then initiates a reindex of the specified paths, and this reindexing occurs in a separate thread.

               

              For example, your test case is registering a single index definition. When your test code calls "registerIndex(...)", the repository is validating the index definition and, if valid, writing it to the "/jcr:system/mode:indexes" area. Asynchronously, the index manager's event listener is notified of the changes under "/jcr:system/mode:indexes", figures out that this is a new index definition, that the "local" index provider should own this index, and then it calls the "local" index provider with this information.

               

              The "local" index provider then looks at the new index definition, discovers that it should apply to all workspaces, then will create a new local index for each existing workspace. It registers an event listener for each index that will update the index based upon the incoming events. It then tells the repository (via an input parameter passed to the index provider's call) that the "/" path should be reindexed for each workspace. The repository then starts an asynchronous reindex process for (in this case) all content in each existing workspace. Since your workspace(s) are pretty empty, this will complete very quickly. If the workspace contains millions of nodes, then it will take quite some time to complete, and while the reindexing is underway the partially-filled index will be visible and available to the query engine. Obviously, you want to do this only when necessary, and at an appropriate time.

               

              Theoretically, there should be no need to create a new index when the repository is so large, except when you've added new functionality to your application. And you simply need to plan enough time for the "upgrade" to complete.

               

              BTW, if two separate indexes require the "/" path to be reindexed, then the repository only scans the "/" path once.
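
              The "scan once" behaviour can be pictured with a small sketch. This is a simplified model in plain Java, not ModeShape's actual code: ReindexPlanner, ReindexRequest and the index name "otherIndex" are hypothetical, and the model only merges identical paths (it does not notice that "/" already covers every other path).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of collecting reindex requests from several indexes.
final class ReindexPlanner {

    /** One index asking for one path to be rescanned. */
    static final class ReindexRequest {
        final String indexName;
        final String path;

        ReindexRequest(String indexName, String path) {
            this.indexName = indexName;
            this.path = path;
        }
    }

    /** Returns every requested path exactly once, in first-seen order. */
    static List<String> pathsToScan(List<ReindexRequest> requests) {
        Set<String> unique = new LinkedHashSet<>();
        for (ReindexRequest request : requests) {
            unique.add(request.path);
        }
        return new ArrayList<>(unique);
    }
}
```

              With two requests for "/" (say from "ntusysname" and "otherIndex"), the planner returns a single "/" entry, so the content would be walked once and each visited node offered to both indexes.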

               

              2. indexing supertypes (nt:base) for instance also indexes subtypes, right?

              Yes. Basically, the index's event listener uses this node type to know whether the node described by an incoming event should be included in the index. It's a simple "is-a" relationship. So if the index definition's node type is 'nt:base', then every node (since all node types extend 'nt:base') will be included in the index. If the index definition's node type is 'nt:unstructured', then only nodes whose primary type or one of whose mixin types is 'nt:unstructured', or a subtype of it, will be included in the index.

               

              The property (or properties) play no part in determining whether the changed node should be indexed. Instead, once the above logic determines that a node should be indexed, the node's property value(s) are extracted from the event. The nonexistence of a property value is treated as NULL.
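
              The "is-a" check can be sketched as follows. This is a toy model in plain Java, not ModeShape's implementation: NodeTypeModel and its methods are hypothetical, and it assumes each type has a single direct supertype, which real JCR node types do not require.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the "is-a" test an index applies to a node.
final class NodeTypeModel {

    // Maps a type name to its direct supertype (single inheritance only).
    private final Map<String, String> supertypeOf = new HashMap<>();

    void declare(String type, String supertype) {
        supertypeOf.put(type, supertype);
    }

    /** True if 'type' is 'ancestor' itself or inherits from it. */
    boolean isA(String type, String ancestor) {
        for (String t = type; t != null; t = supertypeOf.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    /** A node qualifies if its primary type or any mixin type is-a the index's node type. */
    boolean shouldIndex(String indexNodeType, String primaryType, String... mixinTypes) {
        if (isA(primaryType, indexNodeType)) {
            return true;
        }
        for (String mixin : mixinTypes) {
            if (isA(mixin, indexNodeType)) {
                return true;
            }
        }
        return false;
    }
}
```

              In this model an index on 'nt:base' accepts an 'nt:folder' node, while an index on 'nt:unstructured' accepts 'nt:unstructured' nodes but rejects 'nt:folder'. The indexed property is never consulted when making this decision.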

               

              3. if there is an index for subtype and supertype on the same property, will the subtype index be used if the query searches only for the subtype?

               

              Don't do this. If index definition A is on 'nt:base' and some property 'foo', and index definition B is on 'nt:unstructured' and the same property 'foo', then index A contains everything that index B has, and B contains nothing that index A lacks. Doing this consumes extra storage space, adds extra work to every transaction, and offers no value whatsoever.

               

              4. after shutting down the repo, deleting an index and restarting modeshape, when will existing nodes be indexed again?

               

              When a repository is restarted, the repository will read all of the index definitions persisted under '/jcr:system/mode:indexes' and will pass them to the appropriate index providers. Each provider is then responsible for looking at its persisted indexes and telling the repository which paths (if any) should be reindexed.

               

              If you delete the files that the "local" index provider uses for a particular index (e.g., "sysname" in the "default" workspace), then when the "local" index provider is restarted it will discover that the index files are missing and will notify the repository that the "default" workspace's content should be reindexed so that the index can be repopulated.

               

              5. Do you have to reregister an index in this case or is the indexdefinition stored inside the repo itself?

               

              The index definition is stored within the "/jcr:system/mode:indexes" area, and the files for each of the "local" index provider's indexes are stored on the file system. There is no need to re-register the index definition.

              6. what does the update index property actually mean, I think this is related to 1?

              I'm not sure what you mean by "update index property".

              • 4. Re: Can't get the Indexmanager to work :(
                bes82

                Thanks for answering in detail, this definitely helped me to understand how indexing is working.

                 

                About question 6: this is also related to what you wrote at the beginning: UPDATE: Please try changing your second "registerIndex(...)" parameter from "false" to "true".

                 

                What does this parameter actually do? If the Indexprovider / Repository automatically figure out which nodes have to be (re)indexed, what does this parameter change?

                 

                This is my repository config:

                 

                 

                {
                    "name" : "modeshapeRepository",
                    "monitoring" : {
                        "enabled" : true
                    },
                    "storage" : {
                        "cacheName" : "contentRepository",
                        "cacheConfiguration" : "META-INF/infinispan-test-file-config.xml",
                        "binaryStorage" : {
                            "type" : "file",
                            "directory": "/tmp/modeshape-binaries"
                        }
                    },
                    "workspaces" : {
                        "default" : "default",
                        "allowCreation" : true
                    },
                    "indexProviders" : {
                        "local" : {
                            "classname" : "org.modeshape.jcr.index.local.LocalIndexProvider",
                            "directory" : "/tmp/modeshape-local-index"
                        }
                    },
                    "security" : {
                        "anonymous" : {
                            "roles" : ["readonly","readwrite","admin"],
                            "useOnFailedLogin" : false
                        }
                    }
                }

                 

                 

                What I noticed is that the local-indexes.db.t file stays at 16 bytes for a long time, then at some point it starts to grow. I can't figure out what triggers that; the indexes were defined long before. Query performance isn't affected by it, though: queries are still slow.

                 

                Then after restarting the app, the local-indexes.db.t goes to 16 bytes again. So it seems the indexes get deleted again.

                • 5. Re: Can't get the Indexmanager to work :(
                  bes82

                  Interesting:

                   

                  If I try to access a node under /jcr:system/mode:indexes/local, I get this:

                   

                  javax.jcr.RepositoryException: No valid property definition on node '/jcr:system/mode:indexes/local/ntusysname/mode:indexColumn' with primary type 'mode:indexColumn' and mixin types [] for the property: mode:columnTypeName="String"

                  • 6. Re: Can't get the Indexmanager to work :(
                    rhauch

                    I've tracked this down to the indexes not being considered properly when the constraint has a bind variable or uses an aliased property. I've logged MODE-2290 to deal with this.
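
                    Until a build with the fix is available, one thing to try is writing the constraint without an alias and with the value inlined as a literal instead of a bind variable. The following is only a sketch in plain Java that builds such a JCR-SQL2 string: LiteralQuery and its methods are hypothetical names, whether the resulting query actually picks up the index in a given build is an untested assumption, and the node type and property names are inserted unescaped, so they must come from trusted code.

```java
// Hypothetical helper that builds an unaliased JCR-SQL2 query with an inlined literal.
final class LiteralQuery {

    /** Quotes a value as a JCR-SQL2 string literal by doubling embedded single quotes. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds "SELECT * FROM [type] WHERE [property] = 'value'" with no alias and no bind variable. */
    static String equalsQuery(String nodeType, String property, String value) {
        return "SELECT * FROM [" + nodeType + "] WHERE [" + property + "] = " + quote(value);
    }
}
```

                    For the test in this thread, equalsQuery("nt:unstructured", "sysName", "X") yields SELECT * FROM [nt:unstructured] WHERE [sysName] = 'X', which can be passed to createQuery(...) with Query.JCR_SQL2 and checked with explain() as before.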

                    • 7. Re: Can't get the Indexmanager to work :(
                      rhauch

                      BTW, I've merged the fixes for MODE-2290 into the 'master' branch. Give it a whirl, and let us know how it works.

                      • 8. Re: Can't get the Indexmanager to work :(
                        rhauch

                        Thanks for sending your test case. I was able to use it to create a much simpler unit test case, and I logged MODE-2292 to handle the restart issue. The new test case shows that it's possible to restart the repository and the indexes now work as expected. Your test case also passes with the changes for MODE-2292, which have already been merged into the 'master' branch.