-
1. Re: first stax performance test
aloubyansky May 7, 2010 6:54 AM (in response to aloubyansky)
BTW, within the AS, in the case of the SAX-based parser, the SAX parser factory will be created and initialized just once. All the XML files except the first one will benefit from the initialization during the first XML parsing.
-
2. Re: first stax performance test
jason.greene May 7, 2010 1:09 PM (in response to aloubyansky)
Hi Alexey,
Which StAX impl are you using? In the past I found Woodstox to be much faster than what is bundled in the JDK.
-
3. Re: first stax performance test
aloubyansky May 10, 2010 6:03 AM (in response to jason.greene)
It was the one from Sun's JDK 6 on Windows. I'll try Woodstox.
-
4. Re: first stax performance test
aloubyansky May 10, 2010 3:15 PM (in response to aloubyansky)
I've tested the latest Woodstox, 4.0.8. I also found a simply stupid bug in my previous test: XMLInputFactory was an instance variable instead of a class variable. Sorry, that makes the previous tests unfair to StAX.
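For illustration, here is a minimal sketch of the fix described above: the XMLInputFactory is held in a class (static) variable, so the factory lookup and initialization cost is paid once rather than for every instance. The class and method names are hypothetical and are not taken from the actual XB test code.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical wrapper, not the real XB test class.
class CachedFactoryParser {
    // Class variable: created once when the class is initialized and
    // reused by every parse, instead of once per parser instance.
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    // Counts start elements, simply to force a full pass over the document.
    static int countElements(String xml) throws XMLStreamException {
        XMLStreamReader reader = FACTORY.createXMLStreamReader(new StringReader(xml));
        try {
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            return count;
        } finally {
            reader.close();
        }
    }
}
```

The StAX specification does not promise that a factory is thread-safe, so sharing one instance across threads may need extra care depending on the implementation.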
I've re-run the tests only for XB just switching the parsers: SAX, JDK StAX, Woodstox.
For all the parsers, validation was disabled, namespace awareness enabled, and XInclude disabled.
XB features: property replacement disabled, default reflection-based handlers.
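As a rough sketch of the parser settings listed above, using only the standard JAXP and StAX APIs (the class and method names are hypothetical, and this is not the actual test setup):

```java
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper that mirrors the settings described in the post.
class ParserTestConfig {
    static SAXParserFactory newSaxFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);      // validation disabled
        factory.setNamespaceAware(true);   // namespace awareness enabled
        factory.setXIncludeAware(false);   // XInclude disabled
        return factory;
    }

    static XMLInputFactory newStaxFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        // The standard StAX API defines no XInclude property,
        // so there is nothing to switch off on this side.
        return factory;
    }
}
```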
                          JDK6 SAX   JDK6 StAX   Woodstox
first run (in ms)         119.9      21.8        79.5
next 1000 runs (in ms)    939.2      546         463.3

Woodstox takes longer to initialize but the total time (first plus subsequent 1000 runs) is still slightly better than the JDK's StAX (542.8 vs 567.8).
WRT the XB testsuite, Woodstox showed the same failures as the JDK6 StAX, i.e. no support for default attribute values (same reason, i.e. schema validation is not supported) and XInclude.
Woodstox does support validation but only for DTD, which is not that interesting now.
It does make sense to look into switching to a StAX impl, which means the limitations above will have to be addressed. Default attribute values could be specified with Java binding annotations or initialized manually. But XInclude still has to be implemented. I'm gonna look into that.
Another thing, entity resolution (which is in JBossEntityResolver) will have to be adapted as well. StAX API, of course, doesn't use SAX's EntityResolver and InputSource.
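One possible shape for that adaptation is a thin adapter that exposes an existing SAX EntityResolver through the StAX XMLResolver interface. This is only a sketch under assumptions: the class name is hypothetical, it is not how JBossEntityResolver was actually adapted, and it only handles InputSource objects that carry a byte stream.

```java
import java.io.IOException;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical adapter from a SAX EntityResolver to a StAX XMLResolver.
class SaxToStaxResolver implements XMLResolver {
    private final EntityResolver delegate;

    SaxToStaxResolver(EntityResolver delegate) {
        this.delegate = delegate;
    }

    @Override
    public Object resolveEntity(String publicID, String systemID,
                                String baseURI, String namespace)
            throws XMLStreamException {
        try {
            InputSource source = delegate.resolveEntity(publicID, systemID);
            if (source == null) {
                // Not known to the delegate: let the parser use its default lookup.
                return null;
            }
            // A StAX resolver may return an InputStream, an XMLStreamReader or an
            // XMLEventReader. This sketch passes on the byte stream only; it is
            // null if the InputSource carries just a Reader or a system id.
            return source.getByteStream();
        } catch (SAXException | IOException e) {
            throw new XMLStreamException("Entity resolution failed for " + systemID, e);
        }
    }
}
```

It would be registered with `XMLInputFactory.setXMLResolver(new SaxToStaxResolver(existingResolver))`, where `existingResolver` stands for whatever SAX resolver is already in use.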
-
5. Re: first stax performance test
jason.greene May 13, 2010 9:44 AM (in response to aloubyansky)
Wow, they have really improved StAX JDK performance. It might not be worth using Woodstox after all.
-
6. Re: first stax performance test
aloubyansky May 13, 2010 5:49 PM (in response to jason.greene)
Actually, I had to continue with Woodstox. JDK's impl doesn't give me full DTD info. It does report the DTD event but I couldn't get publicId/systemId. But using Woodstox's API I could get to it. It's really necessary for the metadata and deployers. I got all the metadata tests passing and the AS booting.
We actually don't have any XML at the moment in the AS with XInclude. But I am sure there was something in MC using XInclude.
-
7. Re: first stax performance test
dmlloyd May 13, 2010 6:21 PM (in response to aloubyansky)
Alexey Loubyansky wrote:
Actually, I had to continue with Woodstox. JDK's impl doesn't give me full DTD info. It does report the DTD event but I couldn't get publicId/systemId. But using Woodstox's API I could get to it. It's really necessary for the metadata and deployers. I got all the metadata tests passing and the AS booting.
We actually don't have any XML at the moment in the AS with XInclude. But I am sure there was something in MC using XInclude.
Wow, I'm surprised at this. Are you using the stream reader or event reader interface? Stream reader might be better (and, it might give access to the publicId/systemId stuff by way of getPIData or something like that).
-
8. Re: first stax performance test
aloubyansky May 14, 2010 1:52 AM (in response to dmlloyd)
I originally used stream readers, and the test results above are for stream readers. But then I needed to get the DTD publicId/systemId and with stream readers I couldn't. Although I haven't tried getPIData(), I assumed it's for processing instructions. In the standard StAX API there is the general getText(); with the JDK impl it returns something like "couldn't get the DTD info" and with Woodstox just an empty string.
The only way I've found to get publicId/systemId so far is by using event readers and Woodstox-specific API.
I haven't run the performance comparison tests for streams vs events yet as I wanted to make it work for the metadata and the AS first.
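To show where the DTD event surfaces in the two standard reader styles, here is a small sketch using only `javax.xml.stream` (class and method names are hypothetical). Neither standard call exposes publicId/systemId as separate values, and what the returned text contains depends on the implementation, so no output is shown. The Woodstox-specific API mentioned above is not used here.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.DTD;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class.
class DtdEventDemo {
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    // Stream reader style: returns getText() at the DTD event, or null if none is seen.
    static String dtdTextFromStreamReader(String xml) throws XMLStreamException {
        XMLStreamReader reader = FACTORY.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.DTD) {
                    return reader.getText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    // Event reader style: returns the declaration text of the DTD event, or null.
    static String dtdTextFromEventReader(String xml) throws XMLStreamException {
        XMLEventReader reader = FACTORY.createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.getEventType() == XMLStreamConstants.DTD) {
                    return ((DTD) event).getDocumentTypeDeclaration();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```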
-
9. Re: first stax performance test
aloubyansky May 14, 2010 11:09 AM (in response to aloubyansky)
Here are the results of Woodstox event readers vs stream readers (running the same tests as above against XB):
                          event readers   stream readers
first run (in ms)         101.6           85.8
next 1000 runs (in ms)    541.2           464.8

And here is the average of 10 AS 6 trunk start-ups comparing Woodstox event readers against SAX:
Woodstox: 18223.2 ms
SAX: 19038.3 ms
The results of the AS start-ups were very inconsistent (I actually booted the AS many more times): sometimes SAX would boot in 17 sec + something and Woodstox in 22 sec + something. But still, on average the difference is probably close to the one above.
When I was testing the start-up I was offline, with no anti-virus or other heavy processes running. It's on Windows Vista.
-
10. Re: first stax performance test
dmlloyd May 20, 2010 12:41 PM (in response to aloubyansky)
You might be able to set an XMLResolver on the XMLInputFactory and use that to detect the publicId/systemId?
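A minimal sketch of that idea, assuming the parser consults the XMLResolver when it loads the external DTD subset (whether and when it does so is implementation-dependent). The class and method names are hypothetical. The resolver records the ids it is handed and returns an empty stream so that nothing is fetched.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class.
class DoctypeIdCapture {
    // Returns {publicId, systemId} from the first resolver call,
    // or null if the parser never called the resolver.
    static String[] captureIds(String xml) throws XMLStreamException {
        final String[] ids = new String[2];
        final boolean[] called = {false};

        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setXMLResolver((publicID, systemID, baseURI, namespace) -> {
            if (!called[0]) {
                // The first call is assumed to be for the external DTD subset;
                // later calls could be for other external entities.
                ids[0] = publicID;
                ids[1] = systemID;
                called[0] = true;
            }
            // Empty replacement content, so no external resource is loaded.
            return new ByteArrayInputStream(new byte[0]);
        });

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                reader.next();
            }
        } finally {
            reader.close();
        }
        return called[0] ? ids : null;
    }
}
```

Returning an empty stream also means DTD-defined entities and default attribute values are not available, so a real resolver would return the actual DTD content instead.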
-
11. Re: first stax performance test
aloubyansky May 20, 2010 6:04 PM (in response to dmlloyd)
I actually do set an XMLResolver. But it's a kind of separate component, which wraps the current JBossEntityResolver with all the registered schemas/DTDs.
But, yes, it might be a good idea. I'll look into it.
-
12. Re: first stax performance test
dmlloyd May 20, 2010 8:45 PM (in response to aloubyansky)
Alexey Loubyansky wrote:
I actually do set an XMLResolver. But it's a kind of a separate component, which actually wraps current JBossEntityResolver with all the registered schemas/dtds.
But, yes, it might be a good idea. I'll look into it.
It's probably OK to use the Woodstox API too. I was just surprised that such a (seemingly basic) thing is so hard to do with a basic StAX spec implementation.
-
13. Re: first stax performance test
aloubyansky May 21, 2010 2:22 AM (in response to dmlloyd)
The API is OK, i.e. it provides the methods to get the information, but the impl doesn't return it.