I spent quite a long time investigating possible schema validator solutions. Schema validation is part of the Java XML API, so it is nothing new. What I struggled with was the tremendous parsing time of some of the Java EE descriptors, for example application_6.xsd, which takes up to 30 seconds. I tried for a long time to serialize the parsed schemas, but this failed with every serializer I found (xstream, kryo).
Finally, I profiled the parsing step and realized that the time is spent in HTTP calls that resolve external entities (xs:includes), mainly downloading the xsd.xml. After that, a prototype was quickly developed. The solution is based on the Xerces XNI library and lives in a new package called schema-validator.
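As a rough illustration of the idea (this is not the code from the branch), the same effect can be sketched with the standard JAXP LSResourceResolver hook instead of Xerces XNI: schema references are answered from local copies, so no HTTP call is made. The class name OfflineSchemaDemo, the URL http://example.invalid/common.xsd and the two tiny schemas are made up for this example; a real setup would presumably load bundled copies of the Java EE schemas from the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

// Hypothetical demo class; all names and schemas here are illustrative only.
class OfflineSchemaDemo {

    // Made-up remote location that the main schema includes.
    static final String REMOTE = "http://example.invalid/common.xsd";

    static final String COMMON_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:simpleType name='NameType'><xs:restriction base='xs:string'>"
      + "<xs:minLength value='1'/></xs:restriction></xs:simpleType>"
      + "</xs:schema>";

    static final String MAIN_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:include schemaLocation='" + REMOTE + "'/>"
      + "<xs:element name='app' type='NameType'/>"
      + "</xs:schema>";

    // Maps remote system ids to local schema text.
    static Map<String, String> localCopies() {
        Map<String, String> copies = new HashMap<>();
        copies.put(REMOTE, COMMON_XSD);
        return copies;
    }

    // Answers schema references from the local map instead of the network.
    static class LocalResolver implements LSResourceResolver {
        private final Map<String, String> local;
        private final DOMImplementationLS ls;
        int hits = 0; // number of references served locally

        LocalResolver(Map<String, String> local) throws Exception {
            this.local = local;
            this.ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        }

        @Override
        public LSInput resolveResource(String type, String namespaceURI,
                String publicId, String systemId, String baseURI) {
            String text = local.get(systemId);
            if (text == null) {
                return null; // unknown reference: the parser falls back to its default lookup
            }
            hits++;
            LSInput input = ls.createLSInput();
            input.setStringData(text);
            input.setSystemId(systemId);
            input.setPublicId(publicId);
            input.setBaseURI(baseURI);
            return input;
        }
    }

    // Compiles the main schema; the include is served by the resolver.
    static Schema buildSchema(LocalResolver resolver) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver(resolver);
        return factory.newSchema(new StreamSource(new StringReader(MAIN_XSD)));
    }

    // Returns true if the document validates, false on a validation error.
    static boolean isValid(Schema schema, String xml) throws IOException {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        LocalResolver resolver = new LocalResolver(localCopies());
        Schema schema = buildSchema(resolver);
        System.out.println("served locally: " + resolver.hits);
        System.out.println("valid doc:   " + isValid(schema, "<app>demo</app>"));
        System.out.println("invalid doc: " + isValid(schema, "<app></app>"));
    }
}
```

Returning null for an unknown reference lets the parser fall back to its default lookup, which may go out to the network again. A stricter variant could throw instead, so that a missing local copy shows up immediately.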
Here is an example of validating an XML file against one of the supported schemas:
final XmlValidator validator = new XmlValidator(SchemaType.XSD);
Including initialization, the validation takes about 0.4s instead of 20-30s. You can also keep the instance for further validations, which will then be even quicker.
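Why reuse pays off can be sketched in plain JAXP terms, without assuming anything about how XmlValidator works internally: a compiled javax.xml.validation.Schema is thread-safe and can be kept around, while a Validator is not thread-safe and is cheap to create per validation. The SchemaCache class, its key and the demo schema below are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: compiles each schema once and reuses it afterwards.
class SchemaCache {

    static final String DEMO_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='app' type='xs:string'/>"
      + "</xs:schema>";

    private final ConcurrentMap<String, Schema> compiled = new ConcurrentHashMap<>();
    final AtomicInteger compilations = new AtomicInteger(); // counts the expensive step

    // Returns the cached Schema for the key, compiling it on first use.
    Schema get(String key, String xsdText) {
        return compiled.computeIfAbsent(key, k -> compile(k, xsdText));
    }

    private Schema compile(String key, String xsdText) {
        try {
            compilations.incrementAndGet();
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsdText)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Schema does not compile: " + key, e);
        }
    }

    // A new Validator per call, since Validator instances are not thread-safe.
    boolean isValid(String key, String xsdText, String xml) throws IOException {
        try {
            get(key, xsdText).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        SchemaCache cache = new SchemaCache();
        for (int i = 0; i < 3; i++) {
            System.out.println(cache.isValid("demo", DEMO_XSD, "<app>x</app>"));
        }
        System.out.println("compilations: " + cache.compilations.get());
    }
}
```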
Currently, the validator is not integrated into the metadata-parser; on its own, it allows validating XML files against the supported schemas.
The link to the branch is: https://github.com/rbattenfeld/descriptors/tree/SHRINKDESC-130
Let me know what you think and what the next steps could be.