3 Replies Latest reply on Aug 6, 2014 3:41 AM by folch

    How can we control what content is indexed by Lucene in Modeshape 3.8

    folch

      Hi,

       

      Is there a configuration setting (in a config file or in code) that lets us specify which kinds of nodes Lucene should index?

      I'm particularly interested in something like: only index content of node type nt:unstructured, or content under a certain path structure.

       

      Is this possible? By default it seems that Lucene indexes the entire repository, even if we only search a small subset of the content.

      By the way, I'm using ModeShape 3.8.

       

      Thanks

      Guillem

        • 1. Re: How can we control what content is indexed by Lucene in Modeshape 3.8
          hchiorean

          JCR does have a "noquery" attribute on a node type. For example: [mode:Acl] noquery. However, this only works in an all-or-nothing manner, i.e. either the entire node type is queryable or it isn't.
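
          As a rough sketch of what such a definition could look like in CND for custom types: the `acme` prefix, its namespace URI, and the type names `acme:unindexedFolder` and `acme:unindexedResource` are made up for illustration and are not part of ModeShape. Whether nodes below a node of such a type are also skipped by the indexer should be verified against your ModeShape version.

          ```
          // Hypothetical namespace for the custom types (an assumption, pick your own)
          <acme = 'http://www.example.com/acme/1.0'>
          <nt = 'http://www.jcp.org/jcr/nt/1.0'>

          // Same structure as nt:folder, but marked as not queryable
          [acme:unindexedFolder] > nt:folder noquery

          // Same structure as nt:resource, but marked as not queryable
          [acme:unindexedResource] > nt:resource noquery
          ```

          The types would still have to be registered with the repository (for example from a CND file) before nodes can be created with them, and they only take effect for nodes that are created with these types as their primary type.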

          • 2. Re: How can we control what content is indexed by Lucene in Modeshape 3.8
            rhauch

            Horia is correct that the `noquery` node type attribute will prevent indexing of that node (and IIRC all descendants). It's also possible in 3.x to completely disable the query system and indexing, but obviously this is only an option if your application never queries for data.

             

            The community's thinking through 3.x was that ModeShape would index all content (except for 'noquery' types), so that apps can issue any query without having to set up any indexing. This keeps things simple but is less efficient.

             

            With 4.x we've taken the approach that all queries will work, but will generally be slow until you explicitly create indexes on the properties that make sense.

            • 3. Re: How can we control what content is indexed by Lucene in Modeshape 3.8
              folch

              The thing is we are:

              • Uploading zip files
              • Unzipping the content (each zip file contains roughly 30 folders and 30 files) under a single node which contains metadata

               

              For instance:

              • documents\023\089\{docId}\content

               

              where {docId} is different for each file and the content folder contains the unzipped content.

               

              This operation is repeated many times throughout the day (up to 12,000 times).

              We have observed that indexing consumes a lot of time and blocks repository operations, even though we don't need all of the content indexed.

              We only need to index the {docId} node as it's the only one that contains metadata and is used in searches.

               

              So in summary, we cannot disable indexing because we still need to search document metadata, but we don't want to index all the nt:folder and nt:resource nodes generated by the unzip operation.

              Based on Horia's recommendation, we should define new node types based on nt:folder or nt:resource that carry the 'noquery' attribute, and then use these new node types when unzipping the content. Am I right?

               

              Do you have an example of how to do that? I mean, how to create a new type based on nt:file that has the 'noquery' attribute?

               

              Thanks in advance

              Guillem