Introduction
This article discusses the reasoning behind the payload parsing strategies considered for the PicketLink project, along with the related design considerations.
Background
PicketLink is a project that supports both the SAML and WS-Trust specifications. In version 1.x, we used the JAXB2 object model and the parsing mechanism provided by the JDK.
Challenges/Issues
- XML Security (XML Digital Signature and XML Encryption), in the current version of the specifications, relies on a DOM representation of the payload. (In his presentation, Sean Mullan talks about StAX as a potential future solution to the performance issues.)
- We need a Java object model to correctly represent the state of the payload that came on the wire.
- Because of XML Security, we had to parse the payload via DOM, apply the XML Security semantics (signature validation and decryption), and then apply an XML transformation to the DOM so that it could be passed to JAXB to produce an object model that PicketLink can rely on.
- JAXB2 parsing is extremely complex and performance intensive. There is very little control over either the object model or the namespace prefix that gets written to the wire.
Performance Alternatives for Parsing
The JDK provides various solutions to XML parsing:
- JAXP - DOM and SAX parsing
- JAXB2
- StAX
We can choose any of these based on our needs: memory, time, space, and complexity considerations.
- DOM parsing is extremely simple, but it holds the whole document in memory, which can be trouble for large documents.
- SAX parsing is extremely fast, but the callback-driven code can become unmaintainable.
- JAXB2 is performance intensive, and the binding between Java and XML is not always clear.
- StAX performs much better than JAXB2 but is slightly slower than SAX. It gives the developer greater control over parsing.
StAX operates on the philosophy of XML pull parsing, whereas the other approaches use a push mechanism.
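To make the push versus pull distinction concrete, the following is a minimal sketch that counts elements both ways using only JDK APIs. The class name PushVsPull, its method names, and the sample XML are assumptions made for illustration; none of this is PicketLink code.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: class and method names are assumptions, not PicketLink code.
class PushVsPull {

    // Push (SAX): the parser drives and calls back into our handler.
    static int countWithSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    // Pull (StAX): our code drives and asks the parser for the next event.
    static int countWithStax(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                count++;
            }
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b>text</b><c/></a>";
        System.out.println("SAX: " + countWithSax(xml) + ", StAX: " + countWithStax(xml));
    }
}
```

In the pull version the loop belongs to the caller, so it can stop early or hand the reader to another method at any point; in the push version that control sits with the parser.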
StAX Design Considerations
There are two ways of doing StAX parsing.
- Streaming
- Event based reading.
StAX streaming (XMLStreamReader) is extremely fast, but the code can become cumbersome, because you work with a low-level cursor that moves through the stream one parse event at a time. The StAX event mechanism (XMLEventReader) leads to cleaner code and is only slightly slower than the streaming mechanism.
If you absolutely need blazing fast parsing, then streaming is the way to go.
Under normal circumstances, event based parsing should suffice.
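The difference between the two styles can be sketched by reading the text of one element with each API. This is a minimal illustration using only JDK classes; the class name CursorVsEvent and its method names are assumptions, not PicketLink code.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Illustrative only: class and method names are assumptions, not PicketLink code.
class CursorVsEvent {

    // Streaming (cursor) API: state is read from the reader itself via int constants.
    static String firstTextWithCursor(String xml, String localName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && localName.equals(reader.getLocalName())) {
                return reader.getElementText();
            }
        }
        return null;
    }

    // Event API: each step yields an immutable XMLEvent object that can be passed around.
    static String firstTextWithEvents(String xml, String localName) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                if (localName.equals(start.getName().getLocalPart())) {
                    return reader.getElementText();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<saml2:AudienceRestriction xmlns:saml2=\"urn:oasis:names:tc:SAML:2.0:assertion\">"
                + "<saml2:Audience>http://services.testcorp.org/provider2</saml2:Audience>"
                + "</saml2:AudienceRestriction>";
        System.out.println(firstTextWithCursor(xml, "Audience"));
        System.out.println(firstTextWithEvents(xml, "Audience"));
    }
}
```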
PicketLink Parsing
Design Choice 1:
Choice: As a first pass, we will continue using the JAXB2 object model (it is just a set of Java objects) for both SAML and WS-Trust, but the parsing will be done using StAX.
Reason: The SAML object model is so large that it is not productive to handcraft it.
Design Choice 2:
Choice: We will use event based parsing.
Reason: We want maintainable parsing code.
Design Choice 3:
Choice: We will use an event filter that only emits start and end elements.
Reason: We can write decent code with just these two event types, StartElement and EndElement.
<saml2:AudienceRestriction>
  <saml2:Audience>http://services.testcorp.org/provider2</saml2:Audience>
</saml2:AudienceRestriction>
In this example, our event filter would kick in to provide the following XML events to our parser.
<saml2:AudienceRestriction>
  <saml2:Audience>
</saml2:AudienceRestriction>
When we reach the Audience start element, we call getElementText() on the XMLEventReader. This call also consumes the end element for Audience, which is why it does not appear in the sequence of events above.
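The filter and the getElementText() behavior can be sketched with the JDK's EventFilter and XMLInputFactory.createFilteredReader. The class name StartEndFilterDemo and the trace format are assumptions made for illustration, and the sketch relies on the behavior of the JDK's built-in StAX implementation; it is not the PicketLink filter itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Illustrative only: class name and trace format are assumptions, not PicketLink code.
class StartEndFilterDemo {

    // Records what the parsing code actually sees when only start/end elements pass the filter.
    static List<String> trace(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        EventFilter startAndEndOnly = event -> event.isStartElement() || event.isEndElement();
        XMLEventReader reader = factory.createFilteredReader(
                factory.createXMLEventReader(new StringReader(xml)), startAndEndOnly);

        List<String> seen = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                String name = event.asStartElement().getName().getLocalPart();
                seen.add("start:" + name);
                if ("Audience".equals(name)) {
                    // Reads the text and also consumes the matching end element.
                    seen.add("text:" + reader.getElementText());
                }
            } else {
                seen.add("end:" + event.asEndElement().getName().getLocalPart());
            }
        }
        return seen;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<saml2:AudienceRestriction xmlns:saml2=\"urn:oasis:names:tc:SAML:2.0:assertion\"> "
                + "<saml2:Audience>http://services.testcorp.org/provider2</saml2:Audience> "
                + "</saml2:AudienceRestriction>";
        for (String entry : trace(xml)) {
            System.out.println(entry);
        }
    }
}
```

Whitespace between the elements never reaches the parsing code, and no end event is recorded for Audience because getElementText() has already consumed it.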
Other Useful Information:
- There is a utility class called StaxParserUtil. It provides the project with all the utility methods needed for parsing. All of its methods throw a PicketLink ParsingException that wraps any XMLStreamException.
- All our parsers implement the PicketLink interface ParserNamespaceSupport. Its important method, "supports", takes a QName and tells whether the parser is capable of parsing that QName.
- When you get a complex element such as Conditions, it is better to parse it in either a separate method or a separate parser.
- Calling getElementText() on the XMLEventReader also consumes the end element of that particular element.
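The "supports" idea can be sketched as a lookup over candidate parsers. Everything named here (QNameSupport, SamlAssertionSupport, WsTrustSupport, ParserSelector) is a hypothetical stand-in written for this illustration; the real ParserNamespaceSupport interface in PicketLink may have a different shape.

```java
import java.util.Arrays;
import java.util.List;
import javax.xml.namespace.QName;

// Hypothetical stand-in for PicketLink's ParserNamespaceSupport; the real signature may differ.
interface QNameSupport {
    boolean supports(QName qname);
}

// Hypothetical parser that claims every element in the SAML 2.0 assertion namespace.
class SamlAssertionSupport implements QNameSupport {
    static final String NS = "urn:oasis:names:tc:SAML:2.0:assertion";

    public boolean supports(QName qname) {
        return NS.equals(qname.getNamespaceURI());
    }
}

// Hypothetical parser that claims every element in the WS-Trust namespace.
class WsTrustSupport implements QNameSupport {
    static final String NS = "http://docs.oasis-open.org/ws-sx/ws-trust/200512";

    public boolean supports(QName qname) {
        return NS.equals(qname.getNamespaceURI());
    }
}

// Hypothetical helper: picks the first parser that supports the given QName.
class ParserSelector {
    static QNameSupport select(List<QNameSupport> parsers, QName qname) {
        for (QNameSupport parser : parsers) {
            if (parser.supports(qname)) {
                return parser;
            }
        }
        return null; // no registered parser understands this element
    }

    public static void main(String[] args) {
        List<QNameSupport> parsers = Arrays.asList(new SamlAssertionSupport(), new WsTrustSupport());
        QName assertion = new QName(SamlAssertionSupport.NS, "Assertion");
        System.out.println(select(parsers, assertion).getClass().getSimpleName());
    }
}
```

A dispatcher of this kind lets the top-level parser hand a start element to whichever sub-parser recognizes its namespace, without hard-coding the mapping.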
StaxParserUtil Class
This is an important utility class in the PicketLink federation project. It should be the single source for obtaining StAX events as well as for validating start and end elements.
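The role of such a utility class can be sketched as follows. This is not the actual StaxParserUtil: the names StaxUtilSketch and SketchParsingException, and all method signatures, are hypothetical and only illustrate the pattern of centralizing event access, validation, and exception wrapping.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical stand-in for a project-specific parsing exception.
class SketchParsingException extends Exception {
    SketchParsingException(String message) {
        super(message);
    }

    SketchParsingException(Throwable cause) {
        super(cause);
    }
}

// Hypothetical utility class; the real StaxParserUtil API may look different.
class StaxUtilSketch {

    static XMLEventReader reader(String xml) throws SketchParsingException {
        try {
            return XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new SketchParsingException(e); // callers never see XMLStreamException
        }
    }

    // Returns the next start element, or null when the document is exhausted.
    static StartElement nextStartElement(XMLEventReader reader) throws SketchParsingException {
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    return event.asStartElement();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new SketchParsingException(e);
        }
    }

    // Fails fast when the parser is not positioned on the element it expects.
    static void validate(StartElement start, String expectedLocalName) throws SketchParsingException {
        String actual = start.getName().getLocalPart();
        if (!expectedLocalName.equals(actual)) {
            throw new SketchParsingException(
                    "Expected <" + expectedLocalName + "> but found <" + actual + ">");
        }
    }

    public static void main(String[] args) throws SketchParsingException {
        XMLEventReader reader = reader("<Conditions><Audience>x</Audience></Conditions>");
        validate(nextStartElement(reader), "Conditions");
        System.out.println("Conditions validated");
    }
}
```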
Performance Numbers
Measured on a Lenovo T61 running Fedora 13 with 4 GB RAM, using the Sun HotSpot JDK 1.6.
File to Parse:
<wst:RequestSecurityToken Context="validatecontext2" xmlns:wst="http://docs.oasis-open.org/ws-sx/ws-trust/200512">
  <wst:RequestType>http://docs.oasis-open.org/ws-sx/ws-trust/200512/BatchValidate</wst:RequestType>
  <wst:TokenType>http://docs.oasis-open.org/ws-sx/ws-trust/200512/RSTR/Status</wst:TokenType>
  <wst:ValidateTarget>
    <saml2:Assertion xmlns:saml2="urn:oasis:names:tc:SAML:2.0:assertion" ID="ID_cf9efbf0-9d7f-4b4a-b77f-d83ecaafd374" IssueInstant="2010-09-30T19:13:37.911Z" Version="2.0">
      <saml2:Issuer>Test STS</saml2:Issuer>
      <saml2:Subject>
        <saml2:NameID NameQualifier="urn:picketlink:identity-federation">jduke</saml2:NameID>
        <saml2:SubjectConfirmation Method="urn:oasis:names:tc:SAML:2.0:cm:bearer"/>
      </saml2:Subject>
      <saml2:Conditions NotBefore="2010-09-30T19:13:37.911Z" NotOnOrAfter="2010-09-30T21:13:37.911Z">
        <saml2:AudienceRestriction>
          <saml2:Audience>http://services.testcorp.org/provider2</saml2:Audience>
        </saml2:AudienceRestriction>
      </saml2:Conditions>
      <ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
        <ds:SignedInfo>
          <ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#WithComments"/>
          <ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmlds#rsa-sha1"/>
          <ds:Reference URI="#ID_cf9efbf0-9d7f-4b4a-b77f-d83ecaafd374">
            <ds:Transforms>
              <ds:Transform Algorithm="http://www.w3.org/2000/09/xmlds#enveloped-signature"/>
              <ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
            </ds:Transforms>
            <ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmlds#sha1"/>
            <ds:DigestValue>TMZdBOA0MvR7aNpCAg2CXggkdZc=</ds:DigestValue>
          </ds:Reference>
        </ds:SignedInfo>
        <ds:SignatureValue>
          Q8mEzGWlnWmSmb+KUkP0wju4LOINaUYXBBXNF5vRhYVBixSUe8HSHKzNIdQ+ZGtijaV1vh0LUFbT
          //faZKyHRgPXtskDn8cJTVT6obp7rUIOCKMoCs5p9/bUAbtaQHYjfWpifdT3PaTdlehpS8INK2P0
          JUQYU3q8F3u7je9VHbA=
        </ds:SignatureValue>
        <ds:KeyInfo>
          <ds:KeyValue>
            <ds:RSAKeyValue>
              <ds:Modulus>
                suGIyhVTbFvDwZdx8Av62zmP+aGOlsBN8WUE3eEEcDtOIZgO78SImMQGwB2C0eIVMhiLRzVPqoW1
                dCPAveTm653zHOmubaps1fY0lLJDSZbTbhjeYhoQmmaBro/tDpVw5lKJwspqVnMuRK19ju2dxpKw
                lYGGtrP5VQv00dfNPbs=
              </ds:Modulus>
              <ds:Exponent>AQAB</ds:Exponent>
            </ds:RSAKeyValue>
          </ds:KeyValue>
        </ds:KeyInfo>
      </ds:Signature>
    </saml2:Assertion>
  </wst:ValidateTarget>
</wst:RequestSecurityToken>
Numbers:
JAXB, time spent for 1000 iterations = 4169 ms or 4.169 secs
STAX, time spent for 1000 iterations = 2347 ms or 2.347 secs
JAXB, time spent for 10000 iterations = 21733 ms or 21.733 secs
STAX, time spent for 10000 iterations = 16939 ms or 16.939 secs
JAXB, time spent for 5000 iterations = 12613 ms or 12.613 secs
STAX, time spent for 5000 iterations = 10216 ms or 10.216 secs
Note: StAX parsing simply bypasses the contents of the <ds:Signature /> element by ignoring the streamed events.
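Bypassing an element in event based parsing usually amounts to reading events until the matching end element is reached. The following is a minimal sketch of that technique using JDK classes only; the class name SkipElementDemo and its methods are assumptions for illustration and do not reproduce the PicketLink parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Illustrative only: class and method names are assumptions, not PicketLink code.
class SkipElementDemo {
    static final String DSIG_NS = "http://www.w3.org/2000/09/xmldsig#";

    // Call right after reading a start element: discards events up to and
    // including the matching end element, tracking nesting depth.
    static void skipElement(XMLEventReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0 && reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
            }
        }
    }

    // Collects the local names of all elements except ds:Signature and its children.
    static List<String> elementNamesOutsideSignature(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (!event.isStartElement()) {
                continue;
            }
            QName name = event.asStartElement().getName();
            if (DSIG_NS.equals(name.getNamespaceURI()) && "Signature".equals(name.getLocalPart())) {
                skipElement(reader);
            } else {
                names.add(name.getLocalPart());
            }
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<a:Assertion xmlns:a=\"urn:example:a\" xmlns:ds=\"" + DSIG_NS + "\">"
                + "<a:Issuer>Test STS</a:Issuer>"
                + "<ds:Signature><ds:SignedInfo/><ds:SignatureValue>abc</ds:SignatureValue></ds:Signature>"
                + "<a:Subject/>"
                + "</a:Assertion>";
        System.out.println(elementNamesOutsideSignature(xml));
    }
}
```

Because the signature subtree is skipped rather than bound to objects, the StAX timings above do less work per document than the JAXB timings, which is worth keeping in mind when comparing the numbers.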
Test Code:
    private int runs = 1000;
    String fileName = "parser/perf/wst-batch-validate-one.xml";

    public void testParsingPerformance() throws Exception {
        ClassLoader tcl = Thread.currentThread().getContextClassLoader();
        InputStream configStream = tcl.getResourceAsStream( fileName );
        Document doc = DocumentUtil.getDocument( configStream );
        Source source = DocumentUtil.getXMLSource(doc);

        //JAXB way
        long start = System.currentTimeMillis();
        for( int i = 0 ; i < runs; i++ ) {
            useJAXB( source );
        }
        long elapsedTimeMillis = System.currentTimeMillis() - start;
        System.out.println("JAXB, time spent for " + runs + " iterations = " + elapsedTimeMillis + " ms or " + elapsedTimeMillis/1000F + " secs");

        configStream = tcl.getResourceAsStream( fileName );
        byte[] xmlData = new byte[ configStream.available() ]; //This can be a problem on some jvm
        configStream.read( xmlData );

        //Stax Way
        start = System.currentTimeMillis();
        for( int i = 0 ; i < runs; i++ ) {
            useStax( new ByteArrayInputStream( xmlData ) );
        }
        elapsedTimeMillis = System.currentTimeMillis() - start;
        System.out.println("STAX, time spent for " + runs + " iterations = " + elapsedTimeMillis + " ms or " + elapsedTimeMillis/1000F + " secs");
    }

    private void useJAXB( Source source ) throws Exception {
        WSTrustJAXBFactory.getInstance().parseRequestSecurityToken(source);
    }

    private void useStax( InputStream configStream ) throws Exception {
        WSTrustParser parser = new WSTrustParser();
        parser.parse( configStream );
    }
}
An important point from David M Lloyd:
dmlloyd: with stax you tend to do some of your processing inline with parsing, as well as validation
dmlloyd: with jaxb the validation and parsing come first, and then the processing is after