2 Replies Latest reply on Apr 13, 2017 2:39 AM by Paul Vlasin

    Modeshape full-text-search works only on binary files but not on text files

    Paul Vlasin Newbie

      I am trying to perform a full-text search on my Modeshape 5.3.0.Final repository. The query is as simple as:

       

      Query query = queryManager.createQuery("SELECT * FROM [nt:resource] AS data WHERE ISDESCENDANTNODE('/somenode') AND CONTAINS(data.*, '*" + text + "*')", Query.JCR_SQL2);
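      Separately from the text-file problem, concatenating text straight into the statement breaks the query (or changes its meaning) when the search text contains a single quote or one of the full-text operator characters. Below is a minimal sketch of a hypothetical helper, FullTextQueryBuilder (not part of ModeShape or the JCR API), that escapes the value first. It assumes the JCR 2.0 escaping rules as I understand them: inside a full-text term the double quote, minus sign and backslash are escaped with a backslash, and a single quote inside a JCR-SQL2 string literal is written twice. The asterisks are deliberately left unescaped so the wildcards keep working.

```java
// Hypothetical helper, not part of ModeShape or the JCR API.
// Builds the JCR-SQL2 statement from the question with the inputs escaped,
// assuming the JCR 2.0 escaping rules described in the text above.
final class FullTextQueryBuilder {

    private FullTextQueryBuilder() {}

    /** Backslash-escapes the characters the JCR 2.0 full-text grammar treats specially. */
    static String escapeFullTextTerm(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c == '"' || c == '-' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Doubles single quotes so the value can sit inside a JCR-SQL2 string literal. */
    static String escapeSqlLiteral(String literal) {
        return literal.replace("'", "''");
    }

    /** Wraps the escaped text in '*' wildcards, as the original query does. */
    static String buildStatement(String ancestorPath, String text) {
        String term = "*" + escapeFullTextTerm(text) + "*";
        return "SELECT * FROM [nt:resource] AS data"
                + " WHERE ISDESCENDANTNODE('" + escapeSqlLiteral(ancestorPath) + "')"
                + " AND CONTAINS(data.*, '" + escapeSqlLiteral(term) + "')";
    }

    public static void main(String[] args) {
        // Prints the statement with the quote in "O'Neil" doubled.
        System.out.println(buildStatement("/somenode", "O'Neil"));
    }
}
```

      The returned string is what would then be passed to queryManager.createQuery(statement, Query.JCR_SQL2). As far as I know, the JCR 2.0 grammar also accepts a bind variable as the full-text expression (with the value set through Query.bindValue), which would avoid hand-escaping altogether.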

       

      It seems to work well for files stored as binaries (e.g. PDF, DOC, DOCX), but it does not match .txt files or any other plain-text format.

       

      As a workaround, I currently run a second search for files with one of a configured list of text-file extensions, extract their text manually with Tika (which may not even be needed, since the content is already text), and search that text for occurrences.
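      For that workaround, Tika is probably unnecessary for plain-text files as long as the character encoding is known. Below is a minimal sketch that decodes the stream and does a case-insensitive substring check. PlainTextMatcher is a hypothetical name; the caller supplies the charset (for example from the node's optional jcr:encoding property, with UTF-8 as an assumed fallback), and in the repository the InputStream would come from node.getProperty("jcr:data").getBinary().getStream(). This only reproduces the workaround; it does not explain or fix why the text index misses these files.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for the manual plain-text fallback (no Tika involved).
final class PlainTextMatcher {

    private PlainTextMatcher() {}

    /**
     * Decodes the whole stream with the given charset and reports whether it
     * contains the needle, ignoring case. The stream is read fully into memory,
     * so this only suits modestly sized files. The stream is closed afterwards.
     */
    static boolean containsIgnoreCase(InputStream in, Charset charset, String needle) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString().toLowerCase(Locale.ROOT)
                .contains(needle.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Quarterly Report\nTotal: 42".getBytes(StandardCharsets.UTF_8);
        // Matches "Report" despite the different case.
        System.out.println(containsIgnoreCase(new ByteArrayInputStream(data), StandardCharsets.UTF_8, "report"));
    }
}
```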

      Does anybody know whether this is expected behavior, or am I doing something wrong?

      Cheers!

       

       

      P.S. This is my repository configuration:

       

      {
        "name": "Persisted-Repository",
        "textExtraction": {
          "extractors": {
            "tikaExtractor": {
              "name": "General content-based extractor",
              "classname": "tika"
            }
          }
        },
        "workspaces": {
          "predefined": [
            "otherWorkspace"
          ],
          "default": "default",
          "allowCreation": true
        },
        "security": {
          "anonymous": {
            "roles": [
              "readonly",
              "readwrite",
              "admin"
            ],
            "useOnFailedLogin": false
          }
        },
        "storage": {
          "persistence": {
            "type": "file",
            "path": "/var/content/storage"
          },
          "binaryStorage": {
            "type": "file",
            "directory": "/var/content/binaries",
            "minimumBinarySizeInBytes": 999,
            "mimeTypeDetection": "content"
          }
        },
        "indexProviders": {
          "lucene": {
            "classname": "lucene",
            "directory": "/var/content/indexes"
          }
        },
        "indexes": {
          "textFromFiles": {
            "kind": "text",
            "provider": "lucene",
            "nodeType": "nt:resource",
            "columns": "jcr:data(BINARY)"
          }
        }
      }