JBossWS Streaming Implementation Proposal
jason.greene Dec 15, 2004 2:46 AM
Hello everyone,
After much thought, I was able to narrow everything down to one design, which I think is the best solution. It is based on a concept similar to the XML fragment design, though it does introduce a lot of changes to the existing code base.
First, I will start with a bit of background on StAX. StAX consists of two APIs: the cursor API and the event API. The cursor API has two primary interfaces, XMLStreamReader and XMLStreamWriter. It is forward-only, and all functionality is accessed through those interfaces. Each time the cursor is advanced, the reader returns an event type that corresponds to the token the parser just encountered (e.g. START_ELEMENT, CHARACTERS, COMMENT). The consumer then calls the accessor methods that are valid for that event.
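To make the cursor model concrete, here is a minimal sketch using the standard javax.xml.stream API (the class and method names are only illustrative). `next()` advances the cursor and returns an int event type; accessors such as `getLocalName()` are only meaningful while the cursor sits on the matching event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CursorSketch {
    // Walks a document with the cursor API and returns the local names of all elements, in document order.
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next(); // advance the cursor; returns an event type constant
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Accessor is valid only while the cursor is on this START_ELEMENT.
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(elementNames("<Envelope><Body><add><a>1</a></add></Body></Envelope>"));
    }
}
```

Nothing is allocated per token beyond what the caller asks for, which is why the cursor API suits the front message parser.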
The event API operates similarly to the cursor API, except that it allocates and returns an event object whose class hierarchy is based on the event type. An event object can be held indefinitely, which makes it ideal for pipelining. The two main interfaces a consumer uses to interact with the event API are XMLEventReader and XMLEventWriter.
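A minimal sketch of the event API, again with only standard javax.xml.stream calls (class and method names are illustrative): every `XMLEvent` is collected into a list, which is only possible because each event is a standalone object, and the held events are later replayed into an `XMLEventWriter`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class EventSketch {
    // Reads every event into a list; each XMLEvent stays usable after the reader has moved on.
    static List<XMLEvent> readAll(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<XMLEvent> events = new ArrayList<>();
        while (reader.hasNext()) {
            events.add(reader.nextEvent());
        }
        reader.close();
        return events;
    }

    // Replays previously held events through an XMLEventWriter.
    static String writeAll(List<XMLEvent> events) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        for (XMLEvent event : events) {
            writer.add(event);
        }
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        List<XMLEvent> events = readAll("<a><b>x</b></a>");
        System.out.println(events.size() + " events held");
        System.out.println(writeAll(events));
    }
}
```

The price of this flexibility is one object allocation per token, which is why the proposal reserves the event API for the handler pipeline.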
I will only describe the process from the unmarshalling perspective, since marshalling is simply the mirror image.
For unmarshalling, this would involve a front message parser that uses the StAX cursor API (XMLStreamReader) to pull from the incoming message stream and examine each element in the order in which it occurs. Based on the type mapping registry, a deserializer would be handed the XMLStreamReader positioned at a START_ELEMENT event. The deserializer would then construct the appropriate object by lazily pulling from the parser until it reaches the corresponding END_ELEMENT. The front message parser would then continue to the next START_ELEMENT that needs to be delegated. The JAXB spec already provides such a concept in its JAXBContext interface (when passed an XMLStreamReader, it expects it to be positioned at a START_ELEMENT, and advances to the corresponding END_ELEMENT).
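The delegation scheme above can be sketched as follows, assuming a hypothetical `Deserializer` interface and a plain `Map<QName, Deserializer>` standing in for the type mapping registry (neither is an existing JBossWS or JAX-RPC type). The contract is the one described: the deserializer receives the reader on a START_ELEMENT and returns with the cursor on the matching END_ELEMENT. The simple-type case leans on the standard `getElementText()`, which has exactly that postcondition.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class FrontParserSketch {
    // Hypothetical contract: called on a START_ELEMENT, returns with the cursor on the matching END_ELEMENT.
    interface Deserializer {
        Object deserialize(XMLStreamReader reader) throws XMLStreamException;
    }

    // Simple type: getElementText() leaves the cursor on the matching END_ELEMENT.
    static final Deserializer INT = r -> Integer.valueOf(r.getElementText().trim());

    // Flat complex type: lazily pulls text-only children until the parent's END_ELEMENT.
    static final Deserializer FIELDS = r -> {
        Map<String, String> fields = new LinkedHashMap<>();
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            String name = r.getLocalName();
            fields.put(name, r.getElementText()); // cursor now on the child's END_ELEMENT
        }
        return fields; // nextTag() stopped on the parent's END_ELEMENT
    };

    // Front message parser: walks the stream and delegates registered elements.
    static List<Object> parse(String xml, Map<QName, Deserializer> registry) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Object> results = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    Deserializer d = registry.get(reader.getName());
                    if (d != null) {
                        results.add(d.deserialize(reader)); // resume after the matching END_ELEMENT
                    }
                    // Unregistered elements (Envelope, Body, wrappers) are simply descended into.
                }
            }
        } finally {
            reader.close();
        }
        return results;
    }

    public static void main(String[] args) throws XMLStreamException {
        Map<QName, Deserializer> registry = new HashMap<>();
        registry.put(new QName("count"), INT);
        registry.put(new QName("point"), FIELDS);
        String xml = "<Envelope><Body><op><count> 42 </count>"
                + "<point><x>1</x><y>2</y></point></op></Body></Envelope>";
        System.out.println(parse(xml, registry)); // [42, {x=1, y=2}]
    }
}
```

A real registry would key on namespace-qualified names and pick deserializers by XML type; the unqualified names here only keep the sketch short.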
Now, I know what you are thinking: what about SAAJ? We know in advance whether a handler is registered. If there is one, we unavoidably must convert the incoming stream into a DOM tree. If there isn't one and there are attachments, we just MIME-decode the stream on the fly and process the XML portion, ignoring the attachments. Assuming there was a handler, once the handler has manipulated the message (or not), the message is fed into our unmarshalling component as described above. We take a hit here by reparsing a message we just processed, but IMO this is far better than the alternative of maintaining two code paths.
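The dispatch decision can be sketched like this. It is an illustration under simplifying assumptions: a plain DOM `Document` stands in for the SAAJ `SOAPMessage`, the handler is modeled as a `Consumer<Document>`, `openForUnmarshalling` is a hypothetical name, and MIME decoding of attachments is left out. Either way the unmarshaller receives an `XMLStreamReader`, so only one unmarshalling code path exists; the handler path pays for it with a serialize-and-reparse step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class DispatchSketch {
    // Hypothetical dispatch step: DOM path if a handler is registered, pure streaming otherwise.
    static XMLStreamReader openForUnmarshalling(byte[] message, Consumer<Document> handler) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        if (handler == null) {
            // No handler: stream straight from the message bytes, no tree is built.
            return xif.createXMLStreamReader(new ByteArrayInputStream(message));
        }
        // Handler present: build a DOM (stand-in for the SAAJ message) and let the handler see it.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(message));
        handler.accept(doc); // the handler may modify the tree, or not
        // Serialize and reparse: the extra parse is the cost of keeping a single unmarshalling path.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(buf));
        return xif.createXMLStreamReader(new ByteArrayInputStream(buf.toByteArray()));
    }

    // Stand-in for the unmarshaller: just counts elements.
    static int countElements(XMLStreamReader r) throws Exception {
        int n = 0;
        while (r.hasNext()) {
            if (r.next() == XMLStreamReader.START_ELEMENT) {
                n++;
            }
        }
        r.close();
        return n;
    }

    public static void main(String[] args) throws Exception {
        byte[] msg = "<Envelope><Body><a>1</a></Body></Envelope>".getBytes(StandardCharsets.UTF_8);
        System.out.println(countElements(openForUnmarshalling(msg, null))); // 3
        Consumer<Document> addHeader = d -> d.getDocumentElement()
                .insertBefore(d.createElement("Header"), d.getDocumentElement().getFirstChild());
        System.out.println(countElements(openForUnmarshalling(msg, addHeader))); // 4
    }
}
```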
The main problem with a streaming parser implementation is that stream parsing and SAAJ are mutually exclusive, which is why I also propose adding a proprietary enhancement to the protocol handler's SOAPMessageContext that would allow a handler to obtain an XMLEventReader and XMLEventWriter pair (or an XMLStreamReader/XMLStreamWriter pair). Either way, the interfaces would be emulated so that a dispatch component could pipeline XMLEvent objects into and out of each handler in the chain, the last handler being the front message parser itself. Each push and pull operation on the reader/writer would pipe chunks to the next handler. A handler would only push if it could pipeline. So if, for example, a handler needed to process the entire message before modifying it, it would just queue the events and hold off pushing until the end. If a handler still wanted to use SAAJ, we would lazily construct the SAAJ message when the handler called getSOAPMessage(). I emailed this idea to the JAX-RPC comments address and got a response saying that the expert group would look into it, so it could potentially become part of the standard.
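A minimal sketch of the handler pipeline, under stated assumptions: the `Sink` and `Handler` interfaces are hypothetical and replace the emulated reader/writer pair with a single push method, and the final stage simply re-serializes with an `XMLEventWriter` where the front message parser would sit. It shows both behaviors described above: a streaming handler forwards each event immediately, while a buffering handler queues everything and pushes only at END_DOCUMENT.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class HandlerPipelineSketch {
    // Hypothetical push-side contract: receives one XMLEvent at a time.
    interface Sink {
        void accept(XMLEvent event) throws XMLStreamException;
    }

    // Hypothetical handler contract: wraps the next stage of the chain.
    interface Handler {
        Sink wrap(Sink downstream);
    }

    // Streaming handler: forwards each event immediately, dropping comments.
    static final Handler STRIP_COMMENTS = downstream -> event -> {
        if (event.getEventType() != XMLEvent.COMMENT) {
            downstream.accept(event);
        }
    };

    // Buffering handler: queues every event and pushes nothing until END_DOCUMENT.
    static final Handler BUFFER_ALL = downstream -> {
        List<XMLEvent> queue = new ArrayList<>();
        return event -> {
            queue.add(event); // XMLEvent objects can be held indefinitely
            if (event.isEndDocument()) {
                for (XMLEvent queued : queue) {
                    downstream.accept(queued);
                }
                queue.clear();
            }
        };
    };

    static String run(String xml, List<Handler> chain) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        // Last stage: stands in for the front message parser; here it just re-serializes.
        Sink sink = writer::add;
        // Wrap back to front so the first handler in the list sees each event first.
        for (int i = chain.size() - 1; i >= 0; i--) {
            sink = chain.get(i).wrap(sink);
        }
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        while (reader.hasNext()) {
            sink.accept(reader.nextEvent()); // one pull drives one push through the chain
        }
        reader.close();
        writer.flush();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Envelope><!-- note --><Body><a>1</a></Body></Envelope>";
        System.out.println(run(xml, List.of(STRIP_COMMENTS, BUFFER_ALL)));
    }
}
```

Because no handler in the chain asks for a tree, the message is parsed once and no DOM is built; a handler that called getSOAPMessage() would be the point where that changes.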
Putting all of these pieces together, we can parse a SOAP message from a stream exactly once, even with many handlers in the chain, and without any large copies.
-Jason