8 Replies Latest reply on Feb 11, 2007 9:25 PM by aguizar

    Text nodes in SOAP messages

    aguizar

      In the TCK, the test case saaj/api/javax_xml_soap/SOAPElement contains two tests about adding text nodes to different SOAPElement subtypes, namely SOAPBody and SOAPHeader.

      Neither the specification nor the API documents are clear on which SOAPElement subtypes accept Text nodes. The TCK requires that SOAPBodies always accept Text nodes, whereas SOAPHeaders must accept them only if the protocol is SOAP 1.1.

      Apart from that, I saw our SOAPEnvelopeImpl class overrides addTextNode() to signal it is not legal to attach Text nodes to it. However, one can still add a Text node by invoking the DOM method appendChild().
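
      To illustrate the gap, here is a minimal sketch. It is written against Python's xml.dom.minidom rather than SAAJ, and EnvelopeSketch, add_text_node, append_child and create_text are hypothetical names, not our SOAPEnvelopeImpl code. It only shows that guarding the convenience method leaves the generic DOM path open.

```python
from xml.dom import minidom


class EnvelopeSketch:
    """Hypothetical stand-in for an envelope implementation (not SOAPEnvelopeImpl)."""

    def __init__(self):
        self._doc = minidom.Document()
        self.element = self._doc.createElement("Envelope")
        self._doc.appendChild(self.element)

    def create_text(self, text):
        # Text nodes must be created by the owning document.
        return self._doc.createTextNode(text)

    def add_text_node(self, text):
        # Mirrors an overridden addTextNode(): the convenience method refuses.
        raise ValueError("text nodes are not legal in an envelope")

    def append_child(self, node):
        # Generic DOM path: nothing here inspects the node type.
        return self.element.appendChild(node)


def has_text_child(element):
    """Return True if any direct child of the element is a text node."""
    return any(n.nodeType == n.TEXT_NODE for n in element.childNodes)
```

      Closing the gap would mean applying the same check in the generic path (appendChild(), insertBefore() and so on), not only in the convenience method.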

      How should we treat Text nodes? AFAICS there are three approaches in increasing order of effort:

      1) Accept them in general, unless the TCK requires otherwise
      2) Behave as the RI does
      3) Scan the SOAP specifications and XML Schemas, as well as the Basic Profiles for explicit requirements or implicit clues

        • 1. Re: Text nodes in SOAP messages
          thomas.diesler

          The SOAP-1.2 spec is clear about text in body and bodyElement

          http://www.w3.org/TR/soap12-part1/#soapbody


          5.3 SOAP Body

          A SOAP body provides a mechanism for transmitting information to an ultimate SOAP receiver (see 2.5 Structure and Interpretation of SOAP Bodies).

          The Body element information item has:

          * A [local name] of Body .
          * A [namespace name] of "http://www.w3.org/2003/05/soap-envelope".
          * Zero or more namespace qualified attribute information items in its [attributes] property.
          * Zero or more namespace qualified element information items in its [children] property.

          The Body element information item MAY have any number of character information item children whose character code is amongst the white space characters as defined by XML 1.0 [XML 1.0]. These are considered significant.



          • 2. Re: Text nodes in SOAP messages
            thomas.diesler

            SOAP-1.1 is also clear and does not allow text

            http://www.w3.org/TR/2000/NOTE-SOAP-20000508/#_Toc478383503


            4.3 SOAP Body

            The SOAP Body element provides a simple mechanism for exchanging mandatory information intended for the ultimate recipient of the message. Typical uses of the Body element include marshalling RPC calls and error reporting.

            The Body element is encoded as an immediate child element of the SOAP Envelope XML element. If a Header element is present then the Body element MUST immediately follow the Header element, otherwise it MUST be the first immediate child element of the Envelope element.

            All immediate child elements of the Body element are called body entries and each body entry is encoded as an independent element within the SOAP Body element.

            The encoding rules for body entries are as follows:

            1. A body entry is identified by its fully qualified element name, which consists of the namespace URI and the local name. Immediate child elements of the SOAP Body element MAY be namespace-qualified.
            2. The SOAP encodingStyle attribute MAY be used to indicate the encoding style used for the body entries (see section 4.1.1).

            SOAP defines one body entry, which is the Fault entry used for reporting errors (see section 4.4).


            • 3. Re: Text nodes in SOAP messages
              thomas.diesler

              I only looked at the specs for SOAPBody. Could you please look at the other SOAP elements before we jump to the conclusion that text is generally allowed or disallowed? Once we have found out, we can challenge the CTS with our findings.

              SOAP-1.1
              Envelope: ?
              Header: ?
              HeaderElement: ?
              Body: no
              BodyElement: ?

              SOAP-1.2
              Envelope: ?
              Header: ?
              HeaderElement: ?
              Body: yes
              BodyElement: ?

              Thanks

              • 4. Re: Text nodes in SOAP messages
                aguizar

                According to the SOAP 1.2 section you quoted, text nodes are accepted in Body, but only if they are whitespace. The CTS actually adds non-whitespace text to a Body in the 1.2 namespace.

                SOAP 1.1 does not mention character information items (CIIs) in Body, but that does not necessarily mean they are disallowed. Comparing the two schemas, neither seems to allow text content, unless it is whitespace for pretty-printing purposes. The CTS adds non-whitespace text to a Body in the 1.1 namespace as well.

                SOAP 1.1 Schema

                <xs:element name="Body" type="tns:Body"/>
                <xs:complexType name="Body">
                  <xs:sequence>
                    <xs:any namespace="##any" minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
                  </xs:sequence>
                  <xs:anyAttribute namespace="##any" processContents="lax"/>
                </xs:complexType>

                SOAP 1.2 Schema

                <xs:element name="Body" type="tns:Body"/>
                <xs:complexType name="Body">
                  <xs:sequence>
                    <xs:any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
                  </xs:sequence>
                  <xs:anyAttribute namespace="##other" processContents="lax"/>
                </xs:complexType>


                Do we want to validate that only whitespace text is added?
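
                If we did validate, the check itself would be small. A rough sketch, in Python for brevity and not tied to our implementation: is_xml_whitespace and check_body_text are hypothetical names. The character set is the XML 1.0 white space production (space, tab, carriage return, line feed), which is narrower than what a generic "is whitespace" library call usually accepts.

```python
# XML 1.0 white space (production S): #x20, #x9, #xD, #xA.
XML_WHITESPACE = frozenset(" \t\r\n")


def is_xml_whitespace(text):
    """Return True if every character is XML 1.0 white space."""
    return all(ch in XML_WHITESPACE for ch in text)


def check_body_text(text):
    """Reject text that a whitespace-only element must not contain."""
    if not is_xml_whitespace(text):
        raise ValueError("only white space text is allowed here: %r" % text)
    return text
```

                Note that a no-break space (U+00A0) is rejected by this check, although Python's str.isspace() would accept it.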

                Furthermore, that SOAP 1.2 section specifies that unqualified elements and attributes are invalid. We do not currently check for that. How far do we want to go with validation?
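
                For a sense of what the qualification check would involve, here is a sketch using Python's xml.dom.minidom on a parsed message. find_body and unqualified_children are hypothetical helpers, and only child elements are examined, not attributes.

```python
from xml.dom import minidom

SOAP12_NS = "http://www.w3.org/2003/05/soap-envelope"


def find_body(xml_text):
    """Parse the message and return the first SOAP 1.2 Body element, or None."""
    doc = minidom.parseString(xml_text)
    bodies = doc.getElementsByTagNameNS(SOAP12_NS, "Body")
    return bodies[0] if bodies else None


def unqualified_children(body):
    """Return the tag names of child elements that have no namespace."""
    return [
        child.tagName
        for child in body.childNodes
        if child.nodeType == child.ELEMENT_NODE and not child.namespaceURI
    ]
```

                An empty result means every child element of the Body is namespace qualified.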

                • 5. Re: Text nodes in SOAP messages
                  aguizar

                  In the earlier post I missed the following extract from the SOAP 1.2 spec:

                  5. SOAP Message Construct
                  [...]
                  Element information items defined by this specification that only have element information items defined as allowable members of their [children] property can also have zero or more character information item children whose character code is amongst the white space characters as defined by XML 1.0 [XML 1.0]. Unless otherwise indicated, such character information items are considered insignificant.


                  • 6. Re: Text nodes in SOAP messages
                    aguizar

                    The above extract allows me to complete the checklist:

                    SOAP-1.2
                    Envelope: whitespace only, insignificant
                    Header: whitespace only, insignificant
                    HeaderElement: any character, significant
                    Body: whitespace only, significant
                    BodyElement: any character, significant
                    Fault: whitespace only, insignificant
                    Detail: whitespace only, significant
                    DetailEntry: any character, significant

                    SOAP 1.1 does not explicitly allow or disallow them. Basic Profile 1.2 explicitly disallows DTDs and processing instructions (PIs), but not text nodes. The only source of information on text nodes is the schema. All elements defined by SOAP 1.1 have an element-only content type. Given the validation rules from XML Schema Part 1 (see below), SOAP 1.1 elements allow text nodes in the same way as their 1.2 counterparts, except that whitespace is always insignificant.

                    3.4.4 Complex Type Definition Validation Rules
                    [...]
                    If the {content type} is element-only, then the element information item has no character information item [children] other than those whose [character code] is defined as a white space in [XML 1.0 (Second Edition)].
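
                    The checklist lends itself to a lookup table. The sketch below is one possible encoding, in Python for brevity: the element kinds are the checklist labels, while accepts_text and whitespace_significant are hypothetical names. The SOAP 1.1 branch encodes my reading above (same acceptance, whitespace never significant in whitespace-only elements) and is an assumption, not something the 1.1 note states.

```python
# XML 1.0 white space: space, tab, carriage return, line feed.
XML_WHITESPACE = frozenset(" \t\r\n")

# kind -> (any characters allowed?, text significant under SOAP 1.2?)
SOAP12_TEXT_POLICY = {
    "Envelope": (False, False),
    "Header": (False, False),
    "HeaderElement": (True, True),
    "Body": (False, True),
    "BodyElement": (True, True),
    "Fault": (False, False),
    "Detail": (False, True),
    "DetailEntry": (True, True),
}


def accepts_text(kind, text):
    """Return True if the element kind may contain the given text."""
    any_chars, _significant = SOAP12_TEXT_POLICY[kind]
    return any_chars or all(ch in XML_WHITESPACE for ch in text)


def whitespace_significant(kind, soap11=False):
    """Return True if text in this element kind is considered significant."""
    any_chars, significant = SOAP12_TEXT_POLICY[kind]
    # Assumption: in SOAP 1.1, whitespace-only elements never carry
    # significant whitespace; elements with free content are unchanged.
    return any_chars if soap11 else significant
```

                    For example, accepts_text("Body", "hello") is False under this table, which is exactly where the CTS disagrees.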


                    • 7. Re: Text nodes in SOAP messages
                      thomas.diesler

                      I assume comments are not white space. Is that right?


                      2.5 Comments

                      [Definition: Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string "--" (double-hyphen) must not occur within comments.] Parameter entity references are not recognized within comments.




                      • 8. Re: Text nodes in SOAP messages
                        aguizar

                        Correct, comments are not whitespace. In SOAP 1.2 they are considered separately:

                        5. SOAP Message Construct
                        [...]
                        Comment information items MAY appear as children and/or descendants of the [document element] element information item but not before or after that element information item.
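
                        A check for that rule could look roughly like this. It is a sketch using Python's xml.dom.minidom, and comments_outside_root is a hypothetical helper. It collects comments that are siblings of the document element, which the extract above rules out, and ignores comments nested inside it.

```python
from xml.dom import minidom


def comments_outside_root(xml_text):
    """Return the text of comments placed before or after the document element."""
    doc = minidom.parseString(xml_text)
    return [
        node.data
        for node in doc.childNodes
        if node.nodeType == node.COMMENT_NODE
    ]
```

                        A message passes when the returned list is empty.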