Error while trying to set up Tika text extractor in ModeShape
satyakishor.m Jan 25, 2013 5:17 PM
I am running into an issue while trying to set up the Tika text extractor in ModeShape. Below is the error I get when running my application with the Tika text extractor.
17:00:55,076 ERROR [stderr] (modeshape-text-extractor-7-thread-1) Exception in thread "modeshape-text-extractor-7-thread-1" java.lang.ExceptionInInitializerError
17:00:55,091 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.apache.poi.openxml4j.opc.internal.unmarshallers.PackagePropertiesUnmarshaller.<clinit>(PackagePropertiesUnmarshaller.java:49)
17:00:55,091 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:154)
17:00:55,091 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:141)
17:00:55,107 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
17:00:55,107 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:99)
17:00:55,107 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:207)
17:00:55,123 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.apache.tika.parser.pkg.ZipContainerDetector.detectOfficeOpenXML(ZipContainerDetector.java:194)
17:00:55,123 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:134)
17:00:55,123 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:77)
17:00:55,138 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
17:00:55,138 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.modeshape.jcr.mimetype.TikaMimeTypeDetector.mimeTypeOf(TikaMimeTypeDetector.java:126)
17:00:55,138 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.modeshape.jcr.mimetype.MimeTypeDetectors.mimeTypeOf(MimeTypeDetectors.java:74)
17:00:55,154 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.modeshape.jcr.value.binary.AbstractBinaryStore.getMimeType(AbstractBinaryStore.java:161)
17:00:55,154 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.modeshape.jcr.value.binary.StoredBinaryValue.getMimeType(StoredBinaryValue.java:69)
17:00:55,154 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.modeshape.jcr.TextExtractors$Worker.run(TextExtractors.java:175)
17:00:55,170 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
17:00:55,170 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
17:00:55,170 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at java.lang.Thread.run(Thread.java:722)
17:00:55,185 ERROR [stderr] (modeshape-text-extractor-7-thread-1) Caused by: java.lang.ClassCastException: org.dom4j.DocumentFactory cannot be cast to org.dom4j.DocumentFactory
17:00:55,185 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.dom4j.DocumentFactory.getInstance(DocumentFactory.java:97)
17:00:55,185 ERROR [stderr] (modeshape-text-extractor-7-thread-1) at org.dom4j.tree.AbstractNode.<clinit>(AbstractNode.java:39)
17:00:55,201 ERROR [stderr] (modeshape-text-extractor-7-thread-1) ... 18 more
I am not sure why I am running into this issue. I checked my classpath for duplicate dom4j jars and found only one.
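A ClassCastException of the form "X cannot be cast to X" generally means the same class was defined by two different class loaders, which can happen even when only one jar is visible on the application's own classpath (for example, if the container also ships that library). The following is a minimal diagnostic sketch, not part of ModeShape or Tika: the class name ClassOrigin and the method describe are made up for illustration. It reports which class loader defined a class and where its bytes came from. Calling it for org.dom4j.DocumentFactory from both the application code and the extractor thread would be one way to check whether two copies are in play.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration only; not a ModeShape or Tika API.
public class ClassOrigin {

    /** Describes which class loader defined the class and where its bytes came from. */
    public static String describe(Class<?> type) {
        // A null class loader means the class was defined by the bootstrap loader.
        ClassLoader loader = type.getClassLoader();
        // The code source is typically null for classes that ship with the JDK.
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "<unknown location>"
                : source.getLocation().toString();
        return type.getName()
                + " loaded by " + (loader == null ? "<bootstrap>" : loader.toString())
                + " from " + location;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass the class to inspect as the first argument,
        // e.g. org.dom4j.DocumentFactory when dom4j is on the classpath.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        // Resolve through the thread context class loader without initializing the class.
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println(describe(Class.forName(name, false, context)));
    }
}
```

If the two call sites print different class loaders or different jar locations for the same class name, that would point to a duplicate copy of dom4j being loaded rather than to the Tika configuration itself.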
Following is my JCR configuration:
{
    "name" : "jerms",
    "jndiName" : "jcr/jerms",
    "storage" : {
        "transactionManagerLookup" : "org.infinispan.transaction.lookup.DummyTransactionManagerLookup"
    },
    "query" : {
        "indexStorage" : {
            "type" : "ram"
        },
        "textExtracting" : {
            "extractors" : {
                "tikaExtractor" : {
                    "name" : "Tika content-based extractor",
                    "classname" : "tika"
                }
            }
        }
    }
}
I have been stuck on this issue for a couple of hours; any help is appreciated.