14 Replies Latest reply on Jun 20, 2007 5:40 AM by aloubyansky

    Ignoring non-ignorable white space?

    wolfc

      Why is JBossXB ignoring non-ignorable white space?

      Besides the obvious: because the code does so.

      public void characters(char ch[], int start, int length)
      {
         // todo look at this later
         // do not notify content handler if these are just whitespaces
         int i = start;
         while(i < start + length)
         {
            if(!Character.isWhitespace(ch[i++]))
            {
               contentHandler.characters(ch, start, length);
               break;
            }
         }
      }

      public void ignorableWhitespace(char ch[], int start, int length)
      {
      }

      In the end when I put '<env-entry-value> </env-entry-value>' in a deployment descriptor I get a null back.
      This is IMHO wrong.
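
      A minimal, self-contained sketch of the effect, assuming a plain JDK SAX parser and a handler that applies the same whitespace-only test as the method quoted above. The class and method names (WhitespaceFilterSketch, filteredText) are hypothetical and not part of JBossXB; the sketch only shows why a lone space never reaches the downstream handler.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WhitespaceFilterSketch {
    /** Returns the text that survives the whitespace-only filter, or null if nothing is forwarded. */
    static String filteredText(String xml) {
        final StringBuilder seen = new StringBuilder();
        final boolean[] forwarded = {false};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Same test as the quoted method: forward the whole chunk
                // only if it contains at least one non-whitespace character.
                for (int i = start; i < start + length; i++) {
                    if (!Character.isWhitespace(ch[i])) {
                        seen.append(ch, start, length);
                        forwarded[0] = true;
                        break;
                    }
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so the sketch can be called without checked exceptions.
            throw new IllegalStateException(e);
        }
        return forwarded[0] ? seen.toString() : null;
    }

    public static void main(String[] args) {
        // The single space is dropped, so nothing is forwarded and the result is null.
        System.out.println(filteredText("<env-entry-value> </env-entry-value>"));
        // A chunk with a non-whitespace character is forwarded unchanged.
        System.out.println("[" + filteredText("<env-entry-value> x </env-entry-value>") + "]");
    }
}
```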

        • 1. Re: Ignoring non-ignorable white space?
          aloubyansky

          If I just call characters() without filtering whitespace, I get 97 errors in the testsuite where there should be 3 (the known ones).
          The problem is that the whitespace between e1 and e2 in the following example is reported as text content, which is wrong.

          <e1>
           <e2>text</e2>
          </e1>
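
          To see this outside JBossXB, here is a rough sketch that parses the same shape of document with a plain, non-validating JDK SAX parser and records which callback receives each run of text. The class name IndentationEventsSketch and the "characters:" / "ignorable:" prefixes are made up for illustration. With no DTD or schema in play, the parser has no way to know the indentation is insignificant, so it should arrive through characters().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class IndentationEventsSketch {
    /** Parses xml with a non-validating SAX parser and records each text callback. */
    static List<String> events(String xml) {
        final List<String> events = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("characters:" + new String(ch, start, length));
            }

            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                events.add("ignorable:" + new String(ch, start, length));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so the sketch can be called without checked exceptions.
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Print each event with line breaks escaped so the indentation is visible.
        for (String event : events("<e1>\n <e2>text</e2>\n</e1>")) {
            System.out.println(event.replace("\n", "\\n"));
        }
    }
}
```

          A SAX parser is free to split text into several characters() calls, so the number of events is not guaranteed; only their concatenation is.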
          


          • 2. Re: Ignoring non-ignorable white space?
            aloubyansky

            You get null using the ObjectModelFactory API? Using XSD you should definitely get an empty string if the type of the element is xsd:string. You would get null if there were an xsi:nil='1' attribute.

            • 3. Re: Ignoring non-ignorable white space?
              wolfc

              Yup. I've opened http://jira.jboss.com/jira/browse/JBXB-103 and attached a unit test patch. (Why can't I commit there?)

              • 4. Re: Ignoring non-ignorable white space?
                aloubyansky

                I am running your test and logging the characters(). Here is the output:

                1292 DEBUG [SaxJBossXBParser] Using parser: org.apache.xerces.jaxp.SAXParserImpl@12d7a10, isNamespaceAware: true, isValidating: true, isXIncludeAware: true
                1412 DEBUG [SaxJBossXBParser] characters: '
                 '
                1432 DEBUG [SaxJBossXBParser] characters: ' '
                1432 DEBUG [IgnorableWhitespaceUnitTestCase] Add org.jboss.test.xml.IgnorableWhitespaceUnitTestCase$Top@161f10f null
                1432 DEBUG [SaxJBossXBParser] characters: '
                '


                How can we tell ignorable whitespace from non-ignorable whitespace?

                • 5. Re: Ignoring non-ignorable white space?
                  aloubyansky

                  If I modify the schema to

                  private static final String XSD =
                     "<?xml version='1.0' encoding='UTF-8'?>" +
                     "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'" +
                     " targetNamespace='http://www.jboss.org/test/xml/simpleContent'" +
                     " xmlns='http://www.jboss.org/test/xml/simpleContent'" +
                     " elementFormDefault='qualified'" +
                     " attributeFormDefault='unqualified'" +
                     " version='1.0'>" +
                     " <xsd:element name='top'>" +
                     " <xsd:complexType>" +
                     " <xsd:sequence>" +
                     " <xsd:element name='string' type='xsd:string' minOccurs='0' maxOccurs='1'/>" +
                     " </xsd:sequence>" +
                     " </xsd:complexType>" +
                     " </xsd:element>" +
                     "</xsd:schema>";


                  and the class to
                  public static class Top
                  {
                     public String string;
                  }


                  and remove that interceptor, then top.string == ''.

                  • 6. Re: Ignoring non-ignorable white space?
                    wolfc

                    Wicked, when I try the following:

                    import java.io.IOException;
                    import java.io.StringReader;
                    import javax.xml.parsers.ParserConfigurationException;
                    import javax.xml.parsers.SAXParserFactory;
                    import javax.xml.transform.stream.StreamSource;
                    import javax.xml.validation.Schema;
                    import javax.xml.validation.SchemaFactory;
                    import javax.xml.validation.ValidatorHandler;
                    import org.xml.sax.InputSource;
                    import org.xml.sax.SAXException;
                    import org.xml.sax.XMLReader;
                    import org.xml.sax.helpers.DefaultHandler;
                    
                    
                    public class test {

                        public static final String XSD = "<?xml version='1.0'?>\n"
                            + "<schema xmlns='http://www.w3.org/2001/XMLSchema'\n"
                            + " xmlns:test='jaxp13_test'\n"
                            + " targetNamespace='jaxp13_test'\n"
                            + " elementFormDefault='qualified'>\n"
                            + " <element name='test'>\n"
                            + " <complexType>\n"
                            + " <sequence>\n"
                            + " <element name='child' type='string' maxOccurs='unbounded'/>\n"
                            + " </sequence>\n"
                            + " </complexType>\n"
                            + " </element>\n"
                            + "</schema>\n";

                        public static final String XML = "<?xml version='1.0'?>\n"
                            + "<ns:test xmlns:ns='jaxp13_test'>\n"
                            + " <ns:child> </ns:child>\n"
                            + " <ns:child> 123 </ns:child>\n"
                            + "</ns:test>\n";


                        private ValidatorHandler createValidatorHandler(String xsd)
                                throws SAXException {
                            SchemaFactory schemaFactory =
                                SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

                            StringReader reader = new StringReader(xsd);
                            StreamSource xsdSource = new StreamSource(reader);

                            Schema schema = schemaFactory.newSchema(xsdSource);
                            return schema.newValidatorHandler();
                        }

                        private XMLReader createXMLReader() throws ParserConfigurationException,
                                SAXException {
                            SAXParserFactory parserFactory = SAXParserFactory.newInstance();
                            if (!parserFactory.isNamespaceAware()) {
                                parserFactory.setNamespaceAware(true);
                            }

                            return parserFactory.newSAXParser().getXMLReader();
                        }

                        private void parse(XMLReader xmlReader, String xml) throws SAXException,
                                IOException {
                            StringReader reader = new StringReader(xml);
                            InputSource inSource = new InputSource(reader);

                            xmlReader.parse(inSource);
                        }

                        public static void main(String argv[]) {
                            try {
                                new test().run();
                            } catch (Exception e) {
                                e.printStackTrace();
                                System.exit(1);
                            }
                        }

                        public void run() throws SAXException, ParserConfigurationException,
                                IOException {
                            XMLReader xmlReader = createXMLReader();
                            ValidatorHandler validatorHandler = createValidatorHandler(XSD);
                            xmlReader.setContentHandler(validatorHandler);

                            final boolean[] invoked = {false};
                            DefaultHandler contentHandler = new DefaultHandler() {
                                @Override
                                public void characters(char[] ch, int start, int length) throws SAXException
                                {
                                    StringBuffer sb = new StringBuffer();
                                    sb.append(ch, start, length);
                                    System.err.println("characters: '" + sb.toString() + "'");
                                }

                                public void ignorableWhitespace(char[] ch,
                                                                int start,
                                                                int length)
                                        throws SAXException {
                                    StringBuffer sb = new StringBuffer();
                                    sb.append(ch, start, length);
                                    System.err.println("whitespace: '" + sb.toString() + "'");
                                    invoked[0] = true;
                                }
                            };
                            validatorHandler.setContentHandler(contentHandler);

                            parse(xmlReader, XML);

                            if (!invoked[0]) {
                                System.out.println("Method ignorableWhitespace() was not invoked.");
                            } else {
                                System.out.println("OK");
                            }
                        }
                    }

                    I get:
                    whitespace: '
                     '
                    characters: ' '
                    whitespace: '
                     '
                    characters: ' 123 '
                    whitespace: '
                    '
                    OK


                    java version "1.5.0_11"
                    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_11-b03)
                    Java HotSpot(TM) Client VM (build 1.5.0_11-b03, mixed mode, sharing)

                    • 7. Re: Ignoring non-ignorable white space?
                      aloubyansky

                      Actually, even with maxOccurs='unbounded', if the type is xsd:string it works if you expect an empty string.

                      public void testCollectionOverrideProperty() throws Exception
                      {
                         SchemaBinding schema = XsdBinder.bind(new StringReader(XSD), null);

                         schema.setIgnoreUnresolvedFieldOrClass(false);

                         ClassMetaData classMetaData = new ClassMetaData();
                         classMetaData.setImpl(Top.class.getName());
                         ElementBinding element = schema.getElement(new QName(NS, "top"));
                         assertNotNull(element);
                         element.setClassMetaData(classMetaData);

                         Top top = (Top) unmarshal("IgnorableWhitespaceContent.xml", schema, Top.class);
                         assertNotNull(top.string);
                         assertEquals(1, top.string.size());
                         assertEquals("", top.string.get(0));
                      }


                      • 8. Re: Ignoring non-ignorable white space?
                        wolfc

                        No, I expect a space.

                        There is no ValidatorHandler in between, that seems to be the problem.

                        • 9. Re: Ignoring non-ignorable white space?
                          aloubyansky

                          Yes, it should be a validating parser. There should be a config option for this. Otherwise, we'll have to implement it ourselves. I'll look into this.

                          • 10. Re: Ignoring non-ignorable white space?
                            aloubyansky

                            Before filtering characters, we should check whether the element's type can contain text content. A lot of the current tests fail if I do this, mainly the ones that use anyType with indented elements in its content.

                            • 11. Re: Ignoring non-ignorable white space?
                              aloubyansky

                              I've just committed the test and the following changes to trunk:
                              - if the type is simple then all the characters are reported as they appear in the xml content
                              - if the type doesn't allow text content then all the characters are ignored
                              - otherwise it depends on the value of SchemaBinding.isIgnoreWhitespacesInMixedContent()

                              For now, ignoreWhitespacesInMixedContent defaults to true for backwards compatibility, but we can discuss this. On the one hand, it's OK to report whitespace and line breaks between elements as text content in this case; on the other hand, it may not be desirable.
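
                              The three rules listed above can be sketched as a small decision function. This is only an illustration of the described behavior, not the committed JBossXB code: ContentKind, isWhitespaceOnly and textToReport are hypothetical names, and the sketch assumes that the flag drops whitespace-only runs and that "not reported" is represented by null.

```java
class WhitespacePolicySketch {
    // Hypothetical classification of an element's type; not a JBossXB class.
    enum ContentKind { SIMPLE, ELEMENT_ONLY, MIXED }

    static boolean isWhitespaceOnly(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Returns the text to hand to the binding layer, or null when nothing is reported. */
    static String textToReport(ContentKind kind, String text,
                               boolean ignoreWhitespacesInMixedContent) {
        switch (kind) {
            case SIMPLE:
                // Simple type: characters are reported exactly as they appear.
                return text;
            case ELEMENT_ONLY:
                // The type does not allow text content: all characters are ignored.
                return null;
            default:
                // Mixed content: assumed here to drop whitespace-only runs when the flag is set.
                return (ignoreWhitespacesInMixedContent && isWhitespaceOnly(text)) ? null : text;
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + textToReport(ContentKind.SIMPLE, " ", true) + "]");
        System.out.println(textToReport(ContentKind.ELEMENT_ONLY, "\n ", false));
        System.out.println(textToReport(ContentKind.MIXED, "\n ", true));
    }
}
```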

                              Another question: if ignoreWhitespacesInMixedContent is true, should indentation be reported as an empty string or as null?

                              • 12. Re: Ignoring non-ignorable white space?
                                wolfc

                                I like it.

                                How can I set this option from a regular ObjectModelFactory? (For example org.jboss.ejb3.metamodel.JBossDDObjectFactory.)

                                As for the last question, I would say it should return an empty string, because then you can determine whether an element was specified at all. But this breaks backwards compatibility.

                                • 13. Re: Ignoring non-ignorable white space?
                                  aloubyansky

                                  There is no option for the ObjectModelFactory at the moment. Maybe I'll add a common one to the Unmarshaller.
                                  As to the extended test you added with new-lines, that was part of the indentation recognition logic I was trying. So, it worked ;) I'll fix it.

                                  • 14. Re: Ignoring non-ignorable white space?
                                    aloubyansky

                                    I've added a test for the ObjectModelFactory to the IgnorableWhitespaceUnitTestCase. To make it pass you should set TrimTextContent to false (it can be set in the newRoot for all elements or in newChild per element):

                                    public Object newRoot(Object root, UnmarshallingContext ctx, String namespaceURI, String localName, Attributes attrs)
                                    {
                                       ctx.setTrimTextContent(false);
                                       return new Top();
                                    }


                                    Does this work for everybody?