2 Replies Latest reply on Feb 26, 2013 11:40 PM by tamer_sk

    Tika writeLimit

    tamer_sk

      Hello,

       

      With ModeShape 3.1.x, I see that writeLimit was added to TikaTextExtractor. This is really great, since the default limit is, I believe, 10000 characters. My question is: what would be the best way to set the limit to a higher number? Is it possible to do that through some configuration, or does it have to be done programmatically? What would be the best practice?

       

       

      Thanks

        • 1. Re: Tika writeLimit
          rhauch

          Yes, you can set the write limit in the repository configuration. Here is a simple example, taken from one of our unit test configurations, that shows how this is done:

           

          {
              "name" : "Test Repository",
              "query" : {
                  "enabled" : true,
                  "enableFullTextSearch" : true,
                  "rebuildUponStartup" : "if_missing",
                  "indexStorage" : {
                      "type" : "ram"
                  },
                  "textExtracting": {
                      "extractors" : {
                          "tikaExtractor":{
                              "name" : "Tika content-based extractor",
                              "classname" : "org.modeshape.extractor.tika.TikaTextExtractor",
                              "writeLimit" : 100
                          }
                      }
                  }
              }
          }

           

          Note the "writeLimit" field, which is set to 100 here. You can set it to any integer value you want. Otherwise, it's just a normal repository configuration file that you should set up based on your needs. (You can also do this in the AS7 configuration, in a similar manner.)
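          As a rough illustration of what the limit controls: in Tika, a write limit caps how many characters of extracted text are collected (for Tika's `BodyContentHandler`, a limit of -1 disables it). The Java sketch below assumes ModeShape simply hands the configured value through to Tika. The `LimitedText` class is hypothetical and stdlib-only: it keeps the first `writeLimit` characters and records that the limit was hit, whereas Tika's own handler signals the limit in its own way.

```java
public class WriteLimitDemo {

    // Hypothetical text sink: keeps at most writeLimit characters.
    // A negative limit means "no limit" in this sketch.
    static final class LimitedText {
        private final StringBuilder text = new StringBuilder();
        private final int writeLimit;
        private boolean limitReached;

        LimitedText(int writeLimit) {
            this.writeLimit = writeLimit;
        }

        // Appends as much of the chunk as still fits under the limit.
        void append(String chunk) {
            if (writeLimit < 0) {
                text.append(chunk);
                return;
            }
            int room = writeLimit - text.length();
            if (chunk.length() <= room) {
                text.append(chunk);
            } else {
                text.append(chunk, 0, Math.max(room, 0)); // keep only what fits
                limitReached = true;
            }
        }

        String text() {
            return text.toString();
        }

        boolean limitReached() {
            return limitReached;
        }
    }

    public static void main(String[] args) {
        // Simulates an extractor emitting text in chunks with writeLimit = 100.
        LimitedText sink = new LimitedText(100);
        for (int i = 0; i < 30; i++) {
            sink.append("chunk-" + i + " ");
        }
        System.out.println(sink.text().length() + " " + sink.limitReached()); // prints "100 true"
    }
}
```

          Under that assumption, a limit of 100 means only the first 100 characters of each binary value's text would reach the full-text index, so for real documents you would normally pick a much larger value.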

           

          BTW, this works because such "extra" properties (in the nested document defining the Tika extractor) are used to set field values on the org.modeshape.extractor.tika.TikaTextExtractor Java object.
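          The mechanism described above, copying "extra" properties onto same-named fields of the extractor object, can be sketched with plain Java reflection. This is not ModeShape's actual code: `DemoExtractor`, `applyExtraProperties`, and the choice to skip `name` and `classname` are hypothetical, made up for illustration.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ExtraPropertiesDemo {

    // Hypothetical stand-in for an extractor class; only the field name
    // "writeLimit" comes from the configuration in this thread.
    static class DemoExtractor {
        private int writeLimit = -1; // hypothetical default used only by this sketch

        int getWriteLimit() {
            return writeLimit;
        }
    }

    // Keys this sketch treats as describing the extractor, not its fields.
    private static final Set<String> RESERVED = Set.of("name", "classname");

    // Sets each extra property on the declared field with the same name, if any.
    static void applyExtraProperties(Object target, Map<String, Object> props)
            throws IllegalAccessException {
        for (Map.Entry<String, Object> entry : props.entrySet()) {
            if (RESERVED.contains(entry.getKey())) {
                continue;
            }
            Field field;
            try {
                field = target.getClass().getDeclaredField(entry.getKey());
            } catch (NoSuchFieldException e) {
                continue; // unknown property: ignored in this sketch
            }
            field.setAccessible(true);
            Object value = entry.getValue();
            if (field.getType() == int.class && value instanceof Number) {
                field.setInt(target, ((Number) value).intValue());
            } else {
                field.set(target, value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Mirrors the "tikaExtractor" document from the configuration above.
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("name", "Tika content-based extractor");
        props.put("classname", "org.modeshape.extractor.tika.TikaTextExtractor");
        props.put("writeLimit", 100);

        DemoExtractor extractor = new DemoExtractor();
        applyExtraProperties(extractor, props);
        System.out.println("writeLimit = " + extractor.getWriteLimit()); // prints "writeLimit = 100"
    }
}
```

          The upside of this design is that any new field on an extractor becomes configurable without changes to the configuration schema; the trade-off is that a misspelled property name is easy to miss.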

          • 2. Re: Tika writeLimit
            tamer_sk

            Thank you very much, Randall.