3 Replies Latest reply on Jun 9, 2008 2:12 PM by mathwizard

    New sequencers

    mathwizard

      Hello,
      I have already written to the mailing list, but as I am not sure which is the better way to communicate (the forums or the mailing list), I am also writing here. I really like the project and find it very interesting, with huge possibilities. I would very much like to contribute. I have started by creating a simple msoffice sequencer that takes almost any MS Office document and searches for common metadata such as title, author, and keywords (I haven't tried Office 2007). Sequencers for specific document types, and extraction of things like formulas from Excel, should be written separately.

      I also started thinking about writing a sequencer for recognizing mime types.

      Please let me know if I should follow this direction or if you have other things I can help with. Also, please let me know where I should upload the code and what I should do to claim tickets in Jira.

      Regards,

      Michael Trezzi

        • 1. Re: New sequencers
          rhauch

          I've received your sequencer code on the dev list, and will take a look at it. I wasn't sure whether it'd be better to have multiple sequencers or a single sequencer for the different kinds of MS Office files.

          Glad you're finding the project interesting. As for other activities, take a look at some of the smaller issues and feel free to submit patches (you could attach them to the JIRA issue for someone to review). I'll also contact you by email to talk about the specifics.

          Thanks again!

          • 2. Re: New sequencers
            jpav

            Michael,

            I'm going to tackle Jira issue DNA-76, which proposes providing a sequencing context to sequencers, and which also now incorporates your concern about MIME types. MIME type determination would be handled by the sequencer service rather than by a standalone sequencer. It does seem like we need to develop some sort of framework, similar to the sequencer framework, that allows for contributed processors that determine the MIME type, name, etc.; their results would be populated within the proposed sequencing context.
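
            To make the idea concrete, here is a rough sketch of how contributed detectors might feed a sequencing context. Every name in it (MimeTypeDetector, SequencingContext, SequencerService, the two detector classes) is made up for illustration and is not the project's API or the design that DNA-76 will settle on. It only assumes that the service tries detectors in registration order and falls back to a generic type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical contract for a contributed processor that guesses a MIME type.
interface MimeTypeDetector {
    Optional<String> detect(String fileName, byte[] content);
}

// Hypothetical context handed to a sequencer; the service fills it in first.
final class SequencingContext {
    final String fileName;
    final String mimeType;

    SequencingContext(String fileName, String mimeType) {
        this.fileName = fileName;
        this.mimeType = mimeType;
    }
}

// Content-based detector: PDF files start with the ASCII bytes "%PDF-".
final class PdfMagicDetector implements MimeTypeDetector {
    private static final byte[] MAGIC = {'%', 'P', 'D', 'F', '-'};

    public Optional<String> detect(String fileName, byte[] content) {
        if (content.length < MAGIC.length) {
            return Optional.empty();
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (content[i] != MAGIC[i]) {
                return Optional.empty();
            }
        }
        return Optional.of("application/pdf");
    }
}

// Name-based detector covering a few classic MS Office extensions.
final class ExtensionDetector implements MimeTypeDetector {
    public Optional<String> detect(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        switch (fileName.substring(dot + 1).toLowerCase(Locale.ROOT)) {
            case "doc": return Optional.of("application/msword");
            case "xls": return Optional.of("application/vnd.ms-excel");
            case "ppt": return Optional.of("application/vnd.ms-powerpoint");
            default: return Optional.empty();
        }
    }
}

// Hypothetical service: asks each detector in registration order and
// builds the context from the first answer, so sequencers never guess.
final class SequencerService {
    private final List<MimeTypeDetector> detectors = new ArrayList<>();

    SequencerService register(MimeTypeDetector detector) {
        detectors.add(detector);
        return this;
    }

    SequencingContext contextFor(String fileName, byte[] content) {
        for (MimeTypeDetector detector : detectors) {
            Optional<String> guess = detector.detect(fileName, content);
            if (guess.isPresent()) {
                return new SequencingContext(fileName, guess.get());
            }
        }
        // No detector recognised the input; fall back to the generic type.
        return new SequencingContext(fileName, "application/octet-stream");
    }
}
```

            With something like this, a sequencer would read context.mimeType instead of working it out itself, and new detectors could be contributed without touching existing sequencers.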

            • 3. Re: New sequencers
              mathwizard

              That's definitely a solution. However, it must be clearly stated what should be handled by sequencers and what should be handled by the sequencing engine itself (or the processing engine).