2 Replies Latest reply on Apr 13, 2017 2:39 AM by paulbistr

Modeshape full-text-search works only on binary files but not on text files

paulbistr Apr 12, 2017 3:27 PM

I am trying to perform a full-text-search on my Modeshape 5.3.0.Final repository. The query is as simple as:

Query query = queryManager.createQuery("SELECT * FROM [nt:resource] AS data WHERE ISDESCENDANTNODE('/somenode') AND CONTAINS(data.*, '*" + text + "*')", Query.JCR_SQL2);
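
Since the query above concatenates user-supplied text straight into the statement, a small helper that escapes single quotes may be worth having. The following is only a sketch: `FullTextQueryBuilder` and its methods are hypothetical names, not part of ModeShape or the JCR API, and it assumes JCR-SQL2's rule that a single quote inside a string literal is written as two single quotes. It does not escape characters that are special to the full-text search expression itself.

```java
// Hypothetical helper for illustration; not part of ModeShape or javax.jcr.
final class FullTextQueryBuilder {

    private FullTextQueryBuilder() {}

    // Double every single quote so the value cannot terminate the
    // JCR-SQL2 string literal it is embedded in (assumed escaping rule).
    static String escapeLiteral(String value) {
        return value.replace("'", "''");
    }

    // Builds the same statement as the query in the post, with both
    // the ancestor path and the search text escaped.
    static String build(String ancestorPath, String text) {
        return "SELECT * FROM [nt:resource] AS data"
             + " WHERE ISDESCENDANTNODE('" + escapeLiteral(ancestorPath) + "')"
             + " AND CONTAINS(data.*, '*" + escapeLiteral(text) + "*')";
    }
}
```

The resulting string would then be passed to `queryManager.createQuery(statement, Query.JCR_SQL2)` as before.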

It seems to work well for files stored in binary formats (e.g. pdf, doc, docx), but it does not match txt files or any other plain-text format.

Currently I work around this with a hack: I run a second search over a configured set of text file extensions, then manually use Tika to extract the text and search it for occurrences (since the content is already plain text, Tika is probably not required here).
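
For what it's worth, the workaround described above can probably be done without Tika for plain-text content. The sketch below is an assumption-laden illustration, not the code actually used: `PlainTextMatcher` and its methods are hypothetical names, the content is assumed to be UTF-8, and matching is done line by line, so an occurrence that spans a line break is not found.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of ModeShape or Tika.
final class PlainTextMatcher {

    private PlainTextMatcher() {}

    // True if the file name ends with one of the configured
    // (lower-case) text extensions, e.g. "txt" or "md".
    static boolean hasTextExtension(String fileName, Set<String> extensions) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        return extensions.contains(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // Case-insensitive substring search over the stream, read as UTF-8
    // (an assumption). The stream is closed when the method returns.
    static boolean containsIgnoreCase(InputStream content, String needle) {
        String wanted = needle.toLowerCase(Locale.ROOT);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(content, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.toLowerCase(Locale.ROOT).contains(wanted)) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a JCR context the stream would typically come from something like `node.getProperty("jcr:data").getBinary().getStream()`, called on each `nt:resource` node whose parent file name passes `hasTextExtension`.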

Does anybody know if this is expected behavior or maybe I am doing something wrong?

Cheers!

p.s. This is my repo config

{
  "name": "Persisted-Repository",
  "textExtraction": {
    "extractors": {
      "tikaExtractor": {
        "name": "General content-based extractor",
        "classname": "tika"
      }
    }
  },
  "workspaces": {
    "predefined": [ "otherWorkspace" ],
    "default": "default",
    "allowCreation": true
  },
  "security": {
    "anonymous": {
      "roles": [ "readonly", "readwrite", "admin" ],
      "useOnFailedLogin": false
    }
  },
  "storage": {
    "persistence": {
      "type": "file",
      "path": "/var/content/storage"
    },
    "binaryStorage": {
      "type": "file",
      "directory": "/var/content/binaries",
      "minimumBinarySizeInBytes": 999,
      "mimeTypeDetection": "content"
    }
  },
  "indexProviders": {
    "lucene": {
      "classname": "lucene",
      "directory": "/var/content/indexes"
    }
  },
  "indexes": {
    "textFromFiles": {
      "kind": "text",
      "provider": "lucene",
      "nodeType": "nt:resource",
      "columns": "jcr:data(BINARY)"
    }
  }
}

1. Re: Modeshape full-text-search works only on binary files but not on text files

hchiorean Apr 13, 2017 1:39 AM (in response to paulbistr)

It's hard to say what's going on, especially since we're testing for this exact behavior; see LuceneIndexProviderTest.java on the master branch of the ModeShape/modeshape repository on GitHub. Note that I also ran a quick test locally with an added ISDESCENDANTNODE constraint, and the test still passes.
If you can provide a simple test case for this, please log a JIRA. Thanks.
2. Re: Modeshape full-text-search works only on binary files but not on text files

paulbistr Apr 13, 2017 2:39 AM (in response to hchiorean)

Thanks, hchiorean, for taking the time to run the local test and for pointing me to the right place.
I will try to write a test case and log a JIRA if the problem is not on my side.
Cheers!

p.s. really appreciate your work on modeshape