7 Replies Latest reply on Mar 4, 2016 5:15 AM by hchiorean

    How to configure full-text search stemming?

    zcc39r

      According to Full text search:

      ModeShape uses a complex system to analyze the node content and the query terms, and may perform a number of optimizations, such as ... converting words to base forms using a process called stemming (e.g., "running" into "run", "customers" into "customer").

      How can I make sure that stemming is in effect? Are there configuration steps that need to be performed?

        • 1. Re: How to configure full-text search stemming?
          hchiorean

          This is left-over documentation from ModeShape 3 (which I've corrected) and doesn't necessarily apply to ModeShape 4.

           

          In ModeShape 4 if you want to try stemming, you have to use the Lucene index provider and define one or more full text indexes. In addition, you should configure the Lucene index provider to use an analyzerClass which supports stemming (for example org.apache.lucene.analysis.en.EnglishAnalyzer). Note that it's not something we've tried so it may or may not work.

          • 2. Re: How to configure full-text search stemming?
            zcc39r

            So if Lucene indexing is not configured, we have no stemming. But what are the characteristics of the default full-text search? With an nt:unstructured node that has a property with the value "class", I could find the node by searching for "class", "lass" or "ass", but not by searching for "classes". So I guess a leading wildcard is in effect even though I didn't specify one, and the Full text search page is out of date as regards wildcards. Am I right?

            • 3. Re: How to configure full-text search stemming?
              hchiorean

              The default behavior is to use Java's regex "matches" on each of the node's properties, as they are, without any processing. In other words, it's "strict matching", but you can use special regex chars in the FTS expression.
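To make the difference between full and partial regex matching concrete, here is a minimal Java sketch that uses only java.util.regex. It is not ModeShape's code, and the class and method names are made up for illustration. Which of the two modes ModeShape applies internally is not established in this thread. The partial-match mode is the one consistent with "lass" and "ass" finding the value "class" while "classes" does not.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of ModeShape.
class RegexMatchDemo {

    // Full match: the whole value must match the expression
    // (the semantics of String.matches / Matcher.matches).
    static boolean fullMatch(String value, String expr) {
        return Pattern.compile(expr).matcher(value).matches();
    }

    // Partial match: the expression may match anywhere inside the value
    // (the semantics of Matcher.find), like implicit leading and trailing wildcards.
    static boolean partialMatch(String value, String expr) {
        return Pattern.compile(expr).matcher(value).find();
    }

    public static void main(String[] args) {
        String value = "class"; // the property value from the question
        String[] terms = {"class", "lass", "ass", "classes", "cl.*"};
        for (String term : terms) {
            System.out.println(term
                    + " full=" + fullMatch(value, term)
                    + " partial=" + partialMatch(value, term));
        }
    }
}
```

In both modes "classes" fails against "class", because the search term is longer than the stored value and no stemming is applied. Regex metacharacters such as ".*" work in either mode.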

              • 4. Re: How to configure full-text search stemming?
                zcc39r

                In other words, the default behaviour is to use implicit left and right '*' wildcards.

                Returning to stemming, I tried to configure the Lucene index provider in WildFly just as described in Lucene and immediately got the following error:

                10:59:28,358 ERROR [org.modeshape.jcr.JcrRepository] (ServerService Thread Pool -- 57) Unable to initialize the "lucene" index provider for repository "illmysql": org.apache.lucene.analysis.ro.RomanianAnalyzer cannot be cast to org.apache.lucene.analysis.Analyzer: java.lang.ClassCastException: org.apache.lucene.analysis.ro.RomanianAnalyzer cannot be cast to org.apache.lucene.analysis.Analyzer

                at org.modeshape.jcr.index.lucene.LuceneConfig.analyzer(LuceneConfig.java:171) [modeshape-lucene-index-provider-4.6.0.Final.jar:4.6.0.Final]

                 

                 

                ...

                So what's wrong with this configuration?

                <index-provider
                    name="lucene"
                    classname="lucene"
                    module="org.modeshape.index-provider.lucene"
                    lockFactoryClass="org.apache.lucene.store.NoLockFactory"
                    directoryClass="org.apache.lucene.store.RAMDirectory"
                    analyzerClass="org.apache.lucene.analysis.ro.RomanianAnalyzer"
                    codec="Lucene53"/>

                • 5. Re: How to configure full-text search stemming?
                  hchiorean

                  RomanianAnalyzer is definitely an instance of org.apache.lucene.analysis.Analyzer. Unless your classpath is corrupt (i.e. you have multiple Lucene jars on your classpath, not just the ones from the org.apache.lucene.531 module), this should work.
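A ClassCastException of the form "X cannot be cast to Y" where X plainly extends Y usually means the two classes were defined by different class loaders. The following is a minimal diagnostic sketch, assuming you can run a few lines of Java inside the affected deployment. The class and method names are hypothetical, and JDK classes stand in for the Lucene ones so that the example runs without Lucene. In the real case you would pass the Analyzer class as ModeShape sees it and the configured analyzer class.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of ModeShape or WildFly.
class ClassOriginDemo {

    // Reports which class loader defined a class and where it was loaded from.
    static String describe(Class<?> type) {
        ClassLoader loader = type.getClassLoader(); // null means the bootstrap loader
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "<unknown>"
                : source.getLocation().toString();
        return type.getName()
                + " loader=" + (loader == null ? "<bootstrap>" : loader.toString())
                + " from=" + location;
    }

    // True only if an instance of 'candidate' can be cast to 'expected'.
    // Two classes with the same name from different loaders are unrelated types.
    static boolean castable(Class<?> expected, Class<?> candidate) {
        return expected.isAssignableFrom(candidate);
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));
        System.out.println(describe(ClassOriginDemo.class));
        System.out.println(castable(CharSequence.class, String.class));
    }
}
```

If the loader or location printed for the expected base class differs from the one the analyzer's own superclass came from, two copies of the library are in play.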

                  • 6. Re: How to configure full-text search stemming?
                    zcc39r

                    I tried fresh installs of WildFly 8.2.0 and 9.0.2 and encountered the same problem. At the moment I can't tell what is wrong on the classpath. In fact, ModeShape 4.6.0 contains two Lucene modules, 4.10 and 5.3.1. Could this be the root of the problem?

                    • 7. Re: How to configure full-text search stemming?
                      hchiorean

                      That's a good point and yes, the fact that there are multiple Lucene modules causes the problem.

                       

                      I've raised [MODE-2578] Cannot configure custom Lucene analzyer class in Wildfly when multiple Lucene modules are installed, which we'll look at for 5.0. There is no workaround in Wildfly at the moment, because simply removing 4.10 will cause Infinispan to fail (and ModeShape won't work without Infinispan).