22 Replies Latest reply on Jan 10, 2006 3:52 AM by aloubyansky

    XSD parsing and annotations

       

      1 DEBUG [JbxbPojoServerUnitTestCase] ==== Starting testSimpleCollection ====
      1 DEBUG [JbxbPojoServerUnitTestCase] ================ Getting Schema Binding
      66 TRACE [Util] loading xsd: /home/adrian/jboss-head/workspace/testsuite/src/resources/xml/jbxb-bean-deployer_1_0.xsd
      700 TRACE [Util] Loaded xsd: /home/adrian/jboss-head/workspace/testsuite/src/resources/xml/jbxb-bean-deployer_1_0.xsd in 634ms
      ...
      2398 DEBUG [JbxbPojoServerUnitTestCase] ================ Got Schema Binding in 2397ms
      ...
      3036 DEBUG [JbxbPojoServerUnitTestCase] testSimpleCollection took 3035ms
      


      The schema/annotation parsing seems to be taking a long time, although admittedly I have TRACE logging enabled. Without TRACE logging:

      0 DEBUG [JbxbPojoServerUnitTestCase] ==== Starting testSimpleCollection ====
      0 DEBUG [JbxbPojoServerUnitTestCase] ================ Getting Schema Binding
      1607 DEBUG [JbxbPojoServerUnitTestCase] ================ Got Schema Binding in 1606ms
      ...
      2189 DEBUG [JbxbPojoServerUnitTestCase] testSimpleCollection took 2189ms
      2189 DEBUG [JbxbPojoServerUnitTestCase] ==== Stopping testSimpleCollection ====
      


      The problem appears to be that it does a first pass of the parsing
      using Xerces to load the model
      Util.loadSchema
       XSModel model = schemaLoader.loadURI(xsdURL);
      

      then it parses each annotation individually using a new parser for each.

      Are there any plans to improve this?
      Wouldn't it be better to just write our own XSD model/parser and remove the dependency
      on Xerces altogether? i.e. any SAX parser could be used.

      The only related issue I can find is a (bad in my opinion) request from the WS team
      to get a hold of the Xerces model from JBossXB.
      http://jira.jboss.com/jira/browse/JBXB-33
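
For illustration, here is a minimal sketch of the single-pass idea: one SAX parse of the schema that picks up the binding elements under xsd:appinfo as it goes, instead of loading a model first and re-parsing each annotation afterwards. It uses only the JAXP/SAX APIs in the JDK. The class name AppinfoCollector, the collect method and the "name attr=value" string format are hypothetical and are not JBossXB code; a real parser would build a schema model rather than a list of strings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of JBossXB): records every element found
// inside xsd:appinfo during a single SAX pass over the schema document.
class AppinfoCollector extends DefaultHandler {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    final List<String> bindings = new ArrayList<>();
    private boolean inAppinfo;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (XSD_NS.equals(uri) && "appinfo".equals(local)) {
            inAppinfo = true;
        } else if (inAppinfo) {
            // Record the element as "localName attr=value ..." (illustrative format only).
            StringBuilder sb = new StringBuilder(local);
            for (int i = 0; i < atts.getLength(); i++) {
                sb.append(' ').append(atts.getLocalName(i)).append('=').append(atts.getValue(i));
            }
            bindings.add(sb.toString());
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (XSD_NS.equals(uri) && "appinfo".equals(local)) {
            inAppinfo = false;
        }
    }

    // Parses the schema text once with whatever SAX parser JAXP provides.
    static List<String> collect(String xsd) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        AppinfoCollector handler = new AppinfoCollector();
        factory.newSAXParser().parse(new InputSource(new StringReader(xsd)), handler);
        return handler.bindings;
    }
}
```

Calling AppinfoCollector.collect(schemaText) returns the recorded entries in document order. Note that this sketch does not validate the schema at all.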

        • 1. Re: XSD parsing and annotations
          starksm64

          I have been complaining about this double parse and the poor state of XSD integration into the JAXP APIs. The to-do, which I don't think has a formal JIRA issue, is to look at some of the infoset APIs in JDK 5 that I believe are tied to W3C proposed recommendations for better integration of the XSD info. I believe you can disable validation at the parser level to avoid the double parse.

          • 2. Re: XSD parsing and annotations

            Re: JAXP 1.3

            There is improved support for schemas and data types,
            http://java.sun.com/developer/technicalArticles/xml/jaxp1-3/
            but the schema enhancements appear to be limited to validation:
            http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/validation/Schema.html
            rather than retrieval/manipulation of the model.
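
A minimal sketch of what the javax.xml.validation API offers, assuming a JAXP 1.3 implementation is available: the compiled Schema can create validators, but it exposes no methods for walking the schema components. The class name ValidationOnly and its two helper methods are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper showing the validation-only nature of javax.xml.validation.
class ValidationOnly {
    // Compiles the XSD text into an opaque Schema object.
    static Schema compile(String xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Returns true if the document validates against the compiled schema.
    static boolean isValid(Schema schema, String xml) throws IOException {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Validation (or well-formedness) failure.
            return false;
        }
    }
}
```

The Schema instance can be reused for many documents, so the XSD is compiled once for validation purposes, but the element and type declarations cannot be read back from it.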

            • 3. Re: XSD parsing and annotations
              starksm64

              The only potential tie between JAXP and the XSD model I can see is the post-schema-validation infoset (PSVI) that is discussed in the following Xerces FAQ: http://xerces.apache.org/xerces2-j/faq-xs.html#faq. There is a connection to JAXP via the javax.xml.validation.* API, but it's not clear that this provides sufficient info. This is what I had asked Alexey to look into as a way to have a single validation-driven pass in which the parser produces the XSD object model needed for JBossXB as a byproduct.

              • 4. Re: XSD parsing and annotations
                anil.saldhana

                Duplicating the work of the Xerces research team by building our own custom schema model (based on a SAX parser), as suggested by Adrian, may work in JBoss's favor when it comes to dependencies.

                The problem, though, is the effort required to build a schema model of our own and to take care of every case that a schema can support. I think Xerces is probably the best schema model implementation out there.

                With regard to http://jira.jboss.com/jira/browse/JBXB-33, the ideal solution would be a read/write schema model that exists in JBossXB (Xerces can be an implementation model, as it is now), which the web services module can make use of.

                • 5. Re: XSD parsing and annotations

                  Yes, but like I said elsewhere, that API is not a standard; it is only a submission to the W3C.

                  There is no guarantee it will become a standard, or that Xerces will maintain the api
                  across versions.

                  If it does become a standard, it will certainly change package names
                  and probably implementation details as it goes through the standards body.

                  So why not implement this api using ANY sax parser in an org.jboss.xs namespace.
                  It can be discarded later if/when it becomes a part of JAXP/DOM.

                  It seems to me that this will be especially important when this information/processing
                  is needed in a client environment where we don't control the parser being used
                  or the JAXP/DOM level of the JDK.

                  • 6. Re: XSD parsing and annotations
                    anil.saldhana

                    I had built a schema model based off of the JBossXB Object Model framework in the March/April timeframe. That survived for about a fortnight before a collective decision was made to move to the Xerces schema API.

                    I think xml-commons on Apache also has a schema model that has passed the w3c TCK. Just FYI. Then there is Apache XMLBeans.

                    Given this, if we really have to make a decision on having a custom schema model, then the decision needs to be made ASAP, as the dependencies on the JBossXB project are spread across other projects, including Web Services and Microcontainer, and counting.

                    • 7. Re: XSD parsing and annotations

                       

                      "anil.saldhana@jboss.com" wrote:

                      I think xml-commons on Apache also has a schema model that has passed the w3c TCK. Just FYI. Then there is Apache XMLBeans.


                      Hmm, yet more dependency on non-POSS projects. :-)

                      Find me a committer that will support this software for JBoss otherwise
                      WE must support it either inside the Apache projects or our own implementation.

                      I for one dislike the trend of dumping Apache software inside the JBoss project
                      because of "we need to do it ASAP" without thought to how it will be supported
                      going forward or the integration problems it causes us or our users.

                      There is a long litany of issues with apache-commons relating to its quality
                      (or at least suitability for use by platform software), stability of api and conflicts with user versions.

                      At least when the likes of Sun or IBM do this, they change the package names
                      to avoid the most basic conflicts.

                      • 8. Re: XSD parsing and annotations

                         

                        "anil.saldhana@jboss.com" wrote:
                        I had built a schema model based off of the JBossXB Object Model framework in the March/April timeframe.


                        Link please. I don't remember any discussion on a public mailing list?

                        • 9. Re: XSD parsing and annotations
                          anil.saldhana

                          Sorry, Adrian. No public discussion was done then. :(

                          The schema model that I built using JBossXB was more like a prototype that never went further (developed and vanished in a fortnight).

                          • 10. Re: XSD parsing and annotations
                            starksm64

                             

                            "adrian@jboss.org" wrote:

                            So why not implement this api using ANY sax parser in an org.jboss.xs namespace.
                            It can be discarded later if/when it becomes a part of JAXP/DOM.

                            Ok, but this is beyond the current issue of double parsing the schema. I was talking about just doing a better job on the current implementation to avoid the redundant work. If there are uses of the schema model outside of JBossXB, then that needs to be decoupled from the parser to avoid leaking implementation details through the API.

                            • 11. Re: XSD parsing and annotations
                              aloubyansky

                              I am aware of these issues. To clarify, I think Scott and Adrian are talking about two different double parsings.
                              Scott is talking about parsing whole schemas when validation is enabled (disabling validation is a possible workaround), while Adrian is talking about parsing each xsd:annotation separately (xsd annotations are available only as unparsed strings in the Xerces schema model, so there is no way around this short of our own schema parser).
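
To make the per-annotation cost concrete, here is a minimal sketch of parsing annotation strings one at a time. It assumes each annotation arrives as a standalone, well-formed XML string with its namespace declarations included, and it reuses one SAXParser across calls rather than creating a new one per annotation. That reuse is only a mitigation of the setup cost, not a removal of the second parse. AnnotationStringParser and elementNames are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses unparsed annotation strings with one shared parser.
// Not thread-safe; intended for sequential use only.
class AnnotationStringParser {
    private final SAXParser parser;

    AnnotationStringParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        parser = factory.newSAXParser(); // created once, reused for every annotation
    }

    // Returns the local names of all elements in the annotation, in document order.
    List<String> elementNames(String annotation) throws Exception {
        List<String> names = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(annotation)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                names.add(local);
            }
        });
        return names;
    }
}
```

Each call is still a separate parse of text the schema loader has already read once, which is the redundancy being discussed.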

                              "adrian" wrote:
                              Wouldn't it be better to just write our own XSD model/parser and remove the dependency
                              on Xerces altogether? i.e. any SAX parser could be used.


                              That would be better and I have been thinking about investigating this. Since this has now become a real problem, I'll raise the priority.

                              "adrian" wrote:
                              The only related issue I can find is a (bad in my opinion) request from the WS team
                              to get a hold of the Xerces model from JBossXB.
                              http://jira.jboss.com/jira/browse/JBXB-33


                              I think we will switch to the SchemaBinding representation of the schema in WS anyway, so that should not be an issue.

                              "adrian" wrote:
                              So why not implement this api using ANY sax parser in an org.jboss.xs namespace.


                              The schema API already exists and it is the SchemaBinding API. Eventually, it should be repackaged since it is in the **.unmarshalling.** package.

                              • 12. Re: XSD parsing and annotations
                                aloubyansky

                                Besides implementing an XSD parser that would create an instance of SchemaBinding out of the input schema[s], there are validation problems:

                                * schema validation. Currently, we rely on Xerces. If we implement our own schema parser we have to do one of the following:
                                a) validate the schema ourselves (AFAICT, it's far from trivial);
                                b) trust the user and assume that the schema is valid;
                                c) make a config option and use one of the available APIs to validate the passed-in schema[s], which means double parsing.

                                * xml validation. If validation is enabled, the SAX parser will have to parse the XSD (double parsing). If we want to avoid double parsing here we have to implement XML validation ourselves, which is also not trivial. As I mentioned before in other threads, XML validation is currently available to some extent (as a side effect of model group binding support), i.e. what is validated is the position of an element in the XML content. Attributes, values and other constraints are not validated.

                                • 13. Re: XSD parsing and annotations

                                  Am I correct in assuming the following?

                                  1) Our schema parse will only happen once
                                  2) Validation requires the JAXP implementation to reparse the schema
                                  3) We can make validation optional
                                  4) We can optimize the validation/schema parse later
                                  5) JDK5 has features towards pluggable/optimized schema validation
                                  http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/validation/SchemaFactory.html#newInstance(java.lang.String)
                                  6) Even without JDK5 it is likely our endorsed JAXP implementation has this api?
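
Points 3 and 5 can be sketched together, assuming a JAXP 1.3 parser: compile the schema once with SchemaFactory, then pass it to SAXParserFactory.setSchema only when validation is wanted, so the document itself is parsed and validated in one pass. The names OptionalValidation, compile and countElements are hypothetical, and counting elements stands in for whatever the real content handler would do.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: validation is switched on only when a Schema is supplied.
class OptionalValidation {
    // Compiles the XSD once; the result can be shared across parses.
    static Schema compile(String xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Parses the document, validating against schemaOrNull if it is non-null.
    static int countElements(String xml, Schema schemaOrNull) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        if (schemaOrNull != null) {
            factory.setSchema(schemaOrNull); // validation happens during the same parse
        }
        final int[] count = {0};
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                count[0]++;
            }

            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // DefaultHandler ignores validation errors unless rethrown
            }
        });
        return count[0];
    }
}
```

This leaves the schema itself being read twice when validation is on (once by our own parser, once by SchemaFactory), which matches point 2.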

                                  • 14. Re: XSD parsing and annotations
                                    aloubyansky

                                     

                                    "adrian@jboss.org" wrote:
                                    1) Our schema parse will only happen once


                                    If we write our own xsd parser and either make xsd/xml validation optional or implement validation ourselves, then yes.

                                    "adrian@jboss.org" wrote:
                                    2) Validation requires the JAXP implementation to reparse the schema


                                    Yes. Maybe not necessarily JAXP.

                                    "adrian@jboss.org" wrote:
                                    3) We can make validation optional


                                    Yes. But that's not nice.

                                    "adrian@jboss.org" wrote:
                                    4) We can optimize the validation/schema parse later


                                    Yes. I like this one.

                                    "adrian@jboss.org" wrote:
                                    5) JDK5 has features towards pluggable/optimized schema validation


                                    Right. But I see it as a re-parse.

                                    "adrian@jboss.org" wrote:
                                    6) Even without JDK5 it is likely our endorsed JAXP implementation has this api?


                                    E.g. Xerces. But I haven't looked into the internals yet to find out whether/how we can give it an already parsed XSModel.
