I'm wondering about the relationship between sequencers and extractors.
It seems like extractors are for creating indexes for full-text searching, whereas sequencers are for manipulating the repository-- adding nodes and properties (which can also be indexed for full-text searches) and such.
Specifically, I was happy to see some stuff for PDF files, and merrily went about trying it out. Love the way everything is glued together BTW! It was really easy to test the various configurations. Everything worked like it was supposed to. After the successful text extraction, I wanted to pull up an excerpt of what had been extracted-- similar to MODE-1163, which talks about a Jackrabbit-specific way of doing it.
I say "similar" in that I think I actually want that data sequenced instead of indexed (assuming I understand the relationship between extraction and sequencing), as I need to be able to display what was extracted, and probably put things in nodes/properties as opposed to a blob of text.
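As a rough illustration of the difference I mean, here is a minimal sketch that turns extractor-style output (a metadata map plus one big string) into flat name/value pairs with a display excerpt. Everything in it is hypothetical: the class name, the `doc:` prefix, and the `doc:excerpt` property are made up for illustration, and nothing here touches the ModeShape or Tika APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of ModeShape or Tika. */
final class ExtractedDocument {
    private ExtractedDocument() {}

    /**
     * Collapses runs of whitespace and returns at most maxChars characters,
     * appending "..." when the text had to be cut.
     */
    static String excerpt(String text, int maxChars) {
        String collapsed = text.trim().replaceAll("\\s+", " ");
        if (collapsed.length() <= maxChars) {
            return collapsed;
        }
        return collapsed.substring(0, maxChars) + "...";
    }

    /**
     * Maps each metadata entry to a "doc:"-prefixed property (an assumed
     * naming scheme) and adds a "doc:excerpt" property built from the text.
     */
    static Map<String, String> toProperties(Map<String, String> metadata,
                                            String text,
                                            int excerptLength) {
        Map<String, String> props = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            props.put("doc:" + entry.getKey(), entry.getValue());
        }
        props.put("doc:excerpt", excerpt(text, excerptLength));
        return props;
    }
}
```

For example, `ExtractedDocument.excerpt("  Hello   world,\n this is text ", 11)` yields `Hello world...`. A real sequencer would write these values as properties on an output node rather than into a map.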
Basically I'm wondering if I have the relation right 'twixt the two. Judging by MODE-1163, getting at the data the extractor stores isn't trivial-- but I honestly haven't looked at it, I'm just going by the ticket being pushed out a few times.
I wrote a quick sequencer for PDF files, but it was so easy I fear I'm missing something. The ticket for a PDF sequencer was closed ages ago, but I didn't see anything besides the Tika extractor in the sources. (Seems like instead of a PDF sequencer, a Tika sequencer would be more useful, since it can introspect so many file types, but before I mess around more I wanted to do a quick sanity check.) Am I missing something obvious?
If I'm not missing something, and the only reason there's no Tika sequencer is that nobody has needed one, then my next question will be about configuring sequencers: it doesn't seem like there's autowire magic for them like there is for extractors, configuration-wise. I think only initialize(blah,blah) is called, vs looking for setters or whatnot, but I haven't really dug into it. Maybe I'm just overlooking something that turns on the magic, so to speak.
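To make "autowire magic" concrete: what I have in mind is bean-style setter injection, where each configured property name is matched to a `setXxx(String)` method. The sketch below is plain Java reflection under that assumption. It is not ModeShape's code, and `SetterConfigurer`, `DemoSequencer`, and `excerptLength` are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.Map;

/** Hypothetical sketch of bean-style configuration; not ModeShape's mechanism. */
final class SetterConfigurer {
    private SetterConfigurer() {}

    /**
     * For each "name" -> value setting, calls a public setName(String) on the
     * target if one exists. Settings without a matching setter are skipped.
     * Returns the number of settings that were applied.
     */
    static int configure(Object target, Map<String, String> settings) {
        int applied = 0;
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            String name = entry.getKey();
            if (name.isEmpty()) {
                continue;
            }
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            try {
                Method method = target.getClass().getMethod(setter, String.class);
                method.invoke(target, entry.getValue());
                applied++;
            } catch (NoSuchMethodException ignored) {
                // No matching setter on the target: leave this setting unused.
            } catch (ReflectiveOperationException ex) {
                throw new IllegalStateException("Could not call " + setter, ex);
            }
        }
        return applied;
    }
}

/** Hypothetical sequencer-like class with a single configurable property. */
class DemoSequencer {
    private String excerptLength = "200";

    public void setExcerptLength(String value) {
        this.excerptLength = value;
    }

    public String getExcerptLength() {
        return excerptLength;
    }
}
```

Calling `SetterConfigurer.configure(new DemoSequencer(), settings)` with `excerptLength` mapped to `500` would invoke `setExcerptLength("500")`. Whether sequencers get anything like this, or only the initialize call, is exactly what I'm asking.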
Anyways, this project is swell-- I've had lots of fun and little frustration, so kudos!