2 Replies Latest reply on Feb 17, 2014 4:33 AM by nl

    Text extractor ignores text if write limit is exceeded

    nl Newbie

      Hello ModeShapers,

       

      If the text extractor runs into an exception, it does not check whether any output is already available. When the write limit is exceeded, Tika throws a TikaException

       

      {noformat}

      Parsing exception while extracting text: Your document contained more than 1001 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).

      {noformat}

       

      which is caught by the extractor, but the output is not recorded (it is recorded only when no exception occurs).
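
To illustrate the behaviour I would have expected, here is a minimal, self-contained sketch. It deliberately does not use the ModeShape or Tika APIs: `PartialTextDemo`, `LimitedBuffer`, `LimitExceededException` and `extract` are hypothetical names made up for this example, and the buffer only imitates a handler with a write limit. The point is just that the catch block keeps the text collected so far instead of discarding it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a limited content handler; not ModeShape or Tika code.
class PartialTextDemo {

    /** Signals that more characters were written than the limit allows. */
    static class LimitExceededException extends Exception {
        LimitExceededException(int limit) {
            super("Write limit of " + limit + " characters reached");
        }
    }

    /** Collects text up to a fixed number of characters. */
    static class LimitedBuffer {
        private final StringBuilder text = new StringBuilder();
        private final int limit;

        LimitedBuffer(int limit) {
            this.limit = limit;
        }

        void write(String chunk) throws LimitExceededException {
            int room = limit - text.length();
            if (chunk.length() > room) {
                text.append(chunk, 0, room); // keep the part that still fits
                throw new LimitExceededException(limit);
            }
            text.append(chunk);
        }

        String text() {
            return text.toString();
        }
    }

    /** Returns the extracted text, including partial text when the limit is hit. */
    static String extract(List<String> chunks, int limit) {
        LimitedBuffer buffer = new LimitedBuffer(limit);
        try {
            for (String chunk : chunks) {
                buffer.write(chunk);
            }
        } catch (LimitExceededException e) {
            // Limit reached: not a real failure, so fall through and keep the partial text.
        }
        return buffer.text();
    }

    public static void main(String[] args) {
        System.out.println(extract(Arrays.asList("Hello ", "ModeShape"), 8));
    }
}
```

In a real extractor, only the "limit reached" case should be treated this way; genuine parse failures should still be reported. In Tika 1.x, I believe `WriteOutContentHandler.isWriteLimitReached(Throwable)` is the call for telling the two apart, but treat that as an assumption to verify against the Tika version that ModeShape bundles.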

       

      Is there a reason for this behaviour or is it just a bug?

       

      Thanks, Niels

       

      EDIT: This refers to ModeShape 3.x