6 Replies Latest reply on Mar 13, 2015 3:29 PM by brmeyer

    Design discussion: Free Text Search

    brmeyer

      [ARTIF-260] Free Text Search for artifacts Enhancement - JBoss Issue Tracker

       

      Currently, the query syntax supports searching metadata values only.  The XPath2 syntax allows wildcard searches using this type of function:

       

      xp2:matches(@FooProp, '.*foo.*') <-- find all artifacts with "foo" in the FooProp custom property

       

      We'd like to expand that in multiple ways:

      1. continue to support querying by single property names
      2. support querying all metadata values
      3. support querying artifact content
      4. support querying both artifact content and metadata

       

      This will involve expanding the 'matches' function arguments.  Here's a proposal, in order of appearance above:

      1. xp2:matches(@FooProp, '.*foo.*')
      2. xp2:matches(@*, '.*foo.*')
      3. xp2:matches(., '.*foo.*')
      4. xp2:matches(*, '.*foo.*')

       

      So, '@*' would search all metadata values.  '.' would follow XPath conventions and search the text-based children of the current node (i.e., the artifact's content).

       

      '*' could be used to search both content and metadata, but that's up for discussion.  We'd considered something like xp2:matches(@* | ., 'foo.*'), but that felt a bit wonky (not to mention it'd be a pain to parse).
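      To make the four proposed forms concrete, here is a minimal sketch of how the first argument of 'matches' could be mapped to a search scope. It is only an illustration: the Scope enum, the parse_matches helper, and the regular expression are hypothetical names invented for this example, not part of the actual query parser.

```python
import re
from enum import Enum


class Scope(Enum):
    """Hypothetical search scopes, one per proposed first-argument form."""
    PROPERTY = "property"          # @FooProp
    ALL_METADATA = "all-metadata"  # @*
    CONTENT = "content"            # .
    EVERYTHING = "everything"      # *


# Accepts only the four argument shapes listed in the proposal.
_MATCHES = re.compile(
    r"^\s*xp2:matches\(\s*(?P<target>@\*|@[\w.\-]+|\.|\*)\s*,\s*'(?P<pattern>[^']*)'\s*\)\s*$"
)


def parse_matches(expr):
    """Return (scope, property_name_or_None, pattern) for one matches() call."""
    m = _MATCHES.match(expr)
    if m is None:
        raise ValueError("unsupported matches() expression: " + expr)
    target, pattern = m.group("target"), m.group("pattern")
    if target == "@*":
        return (Scope.ALL_METADATA, None, pattern)
    if target == ".":
        return (Scope.CONTENT, None, pattern)
    if target == "*":
        return (Scope.EVERYTHING, None, pattern)
    # Remaining case: a single named property such as @FooProp.
    return (Scope.PROPERTY, target[1:], pattern)
```

      Note that a union such as '@* | .' is deliberately rejected by this sketch, which reflects the parsing concern mentioned above.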

       

      Any thoughts?  As a user, what would feel more natural?

        • 1. Re: Design discussion: Free Text Search
          eric.wittmann

          When searching against content you'll do a full text search of some kind?  Lucene or something?

           

          Wondering how the wildcards translate to a full text search engine.

          • 2. Re: Design discussion: Free Text Search
            brmeyer

            When searching against content you'll do a full text search of some kind?  Lucene or something?

            We'd currently rely on ModeShape 4.0's full-text support (they provide a 'CONTAINS' function).  From what I understand, that relies on internal indexing (MS 4 indexing is quite different from MS 3's) and Tika text extractors.  However, they're currently working on additional index providers, Lucene being the first.  An alternative could be using MS 'CONTAINS' for the metadata, but an external index provider for the content (which would be possible, since we use filesystem file storage).  But, I'll cross that bridge when MS performance becomes an issue.

            Wondering how the wildcards translate to a full text search engine.

            https://docs.jboss.org/author/display/MODE/Full+text+search

            JCR-SQL2 - ModeShape 3 - Project Documentation Editor

             

            The full-text search 'CONTAINS' method supports wildcards, etc.
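            As a rough illustration of that translation, the sketch below turns the simple regex-style patterns used in this thread into a full-text term and wraps it in a JCR-SQL2 query. It assumes 'CONTAINS' accepts '*' and '?' as wildcards as described in the linked docs; the helper names, the 'artifact' selector, and the use of [nt:base] as the node type are placeholders for this example, not what the implementation uses.

```python
def regex_to_fulltext_term(pattern):
    """Convert a very limited regex ('.*', '.', word characters) to a wildcard term."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith(".*", i):
            out.append("*")  # '.*' -> any run of characters
            i += 2
        elif pattern[i] == ".":
            out.append("?")  # '.' -> exactly one character
            i += 1
        elif pattern[i].isalnum() or pattern[i] == "_":
            out.append(pattern[i])  # literal word character
            i += 1
        else:
            # Anything else (character classes, anchors, quotes) is out of scope here.
            raise ValueError("unsupported pattern element: " + pattern[i])
    return "".join(out)


def build_fulltext_query(pattern):
    """Build a JCR-SQL2 query string that searches all properties of a node."""
    term = regex_to_fulltext_term(pattern)
    # The term can only contain word characters, '*' and '?', so no quoting is needed.
    return (
        "SELECT * FROM [nt:base] AS artifact "
        "WHERE CONTAINS(artifact.*, '" + term + "')"
    )
```

            For example, build_fulltext_query('.*foo.*') would produce a query whose search term is '*foo*'. Whether leading wildcards perform acceptably depends on the index provider, which this sketch does not address.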

            • 3. Re: Design discussion: Free Text Search
              eric.wittmann

              Ah perfect - the state of the art for full text indexing has apparently progressed since the last time I really dove into it (*cough* 2001 *cough*).

              • 4. Re: Design discussion: Free Text Search
                brmeyer

                Actually, I might suggest simplifying this.  ModeShape only supports full-text searching in general -- it's an all-or-nothing approach.  I.e., there's no way to search metadata-only or content-only.  Further, I may be overthinking it.  Most users would probably need the original queries + full-text search on the whole shebang.  So, reduce the requirements to:

                 

                1. xp2:matches(@FooProp, '.*foo.*')
                2. xp2:matches(*, '.*foo.*')

                 

                Thoughts?

                • 5. Re: Design discussion: Free Text Search
                  eric.wittmann

                  This makes sense to me. Keep it simple, stupid.

                   

                  However, if you're going with just these two options, then my suggestion would be to use . instead of * for #2:

                   

                  2. xp2:matches(., 'foo.*')

                   

                  That's more consistent with XPath semantics, I think.

                  • 6. Re: Design discussion: Free Text Search
                    brmeyer

                    That's more consistent with XPath semantics, I think.

                    Fair point, will do.  Thanks!