1. Re: Ignoring non-ignorable white space?
aloubyansky Jun 7, 2007 5:56 AM (in response to wolfc)
If I just call characters() without filtering whitespace, I get 97 errors in the testsuite, while there should be only 3 (the known ones).
The problem is that the whitespace between e1 and e2 in the following example is reported as text content, which is wrong:
```xml
<e1> <e2>text</e2> </e1>
```
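The effect can be reproduced with plain JAXP, outside JBossXB. This is a minimal sketch; the class and method names (WhitespaceDemo, textChunks) are made up for illustration. With no DTD or schema attached, the SAX parser has no way of knowing that e1 has element-only content, so the whitespace around e2 arrives through characters() exactly like real text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceDemo {

    /** Collects every run of character data reported via characters(), split at element boundaries. */
    static List<String> textChunks(String xml) throws Exception {
        final List<String> chunks = new ArrayList<String>();
        final StringBuilder pending = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            // A parser may split one text run across several characters() calls,
            // so buffer until the next element boundary.
            @Override
            public void characters(char[] ch, int start, int length) {
                pending.append(ch, start, length);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                flush();
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flush();
            }

            private void flush() {
                if (pending.length() > 0) {
                    chunks.add(pending.toString());
                    pending.setLength(0);
                }
            }
        };

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // No DTD or schema: the parser cannot tell that e1 is element-only.
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        for (String chunk : textChunks("<e1> <e2>text</e2> </e1>")) {
            System.out.println("characters: '" + chunk + "'");
        }
    }
}
```

Both single spaces show up as character data next to "text", which is why a binding layer that forwards every characters() call treats the indentation as content.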
2. Re: Ignoring non-ignorable white space?
aloubyansky Jun 7, 2007 6:02 AM (in response to wolfc)
You get null using the ObjectModelFactory API? Using XSD, you should definitely get an empty string if the element's type is xsd:string. You would get null if the element had an xsi:nil='1' attribute.
3. Re: Ignoring non-ignorable white space?
wolfc Jun 7, 2007 7:10 AM (in response to wolfc)
Yup. I've opened http://jira.jboss.com/jira/browse/JBXB-103 and attached a unit test patch. (Why can't I commit there?)
4. Re: Ignoring non-ignorable white space?
aloubyansky Jun 8, 2007 5:09 AM (in response to wolfc)
I am running your test and logging the characters() calls. Here is the output:
```
1292 DEBUG [SaxJBossXBParser] Using parser: org.apache.xerces.jaxp.SAXParserImpl@12d7a10, isNamespaceAware: true, isValidating: true, isXIncludeAware: true
1412 DEBUG [SaxJBossXBParser] characters: ' '
1432 DEBUG [SaxJBossXBParser] characters: ' '
1432 DEBUG [IgnorableWhitespaceUnitTestCase] Add org.jboss.test.xml.IgnorableWhitespaceUnitTestCase$Top@161f10f null
1432 DEBUG [SaxJBossXBParser] characters: ' '
```
How can we tell ignorable whitespace apart from non-ignorable whitespace?
5. Re: Ignoring non-ignorable white space?
aloubyansky Jun 8, 2007 5:50 AM (in response to wolfc)
If I modify the schema to
```java
private static final String XSD =
    "<?xml version='1.0' encoding='UTF-8'?>" +
    "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'" +
    " targetNamespace='http://www.jboss.org/test/xml/simpleContent'" +
    " xmlns='http://www.jboss.org/test/xml/simpleContent'" +
    " elementFormDefault='qualified'" +
    " attributeFormDefault='unqualified'" +
    " version='1.0'>" +
    " <xsd:element name='top'>" +
    " <xsd:complexType>" +
    " <xsd:sequence>" +
    " <xsd:element name='string' type='xsd:string' minOccurs='0' maxOccurs='1'/>" +
    " </xsd:sequence>" +
    " </xsd:complexType>" +
    " </xsd:element>" +
    "</xsd:schema>";
```
and the class to
```java
public static class Top {
    public String string;
}
```
and remove that interceptor, then top.string is '' (an empty string).
6. Re: Ignoring non-ignorable white space?
wolfc Jun 8, 2007 5:51 AM (in response to wolfc)
Wicked, when I try the following:
```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class test {
    public static final String XSD = "<?xml version='1.0'?>\n" +
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'\n" +
        " xmlns:test='jaxp13_test'\n" +
        " targetNamespace='jaxp13_test'\n" +
        " elementFormDefault='qualified'>\n" +
        " <element name='test'>\n" +
        " <complexType>\n" +
        " <sequence>\n" +
        " <element name='child' type='string' maxOccurs='unbounded'/>\n" +
        " </sequence>\n" +
        " </complexType>\n" +
        " </element>\n" +
        "</schema>\n";

    public static final String XML = "<?xml version='1.0'?>\n" +
        "<ns:test xmlns:ns='jaxp13_test'>\n" +
        " <ns:child> </ns:child>\n" +
        " <ns:child> 123 </ns:child>\n" +
        "</ns:test>\n";

    private ValidatorHandler createValidatorHandler(String xsd) throws SAXException {
        SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));
        return schema.newValidatorHandler();
    }

    private XMLReader createXMLReader() throws ParserConfigurationException, SAXException {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        if (!parserFactory.isNamespaceAware()) {
            parserFactory.setNamespaceAware(true);
        }
        return parserFactory.newSAXParser().getXMLReader();
    }

    private void parse(XMLReader xmlReader, String xml) throws SAXException, IOException {
        xmlReader.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] argv) {
        try {
            new test().run();
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }

    public void run() throws SAXException, ParserConfigurationException, IOException {
        // Pipeline: XMLReader -> ValidatorHandler (schema-aware) -> contentHandler
        XMLReader xmlReader = createXMLReader();
        ValidatorHandler validatorHandler = createValidatorHandler(XSD);
        xmlReader.setContentHandler(validatorHandler);

        final boolean[] invoked = {false};
        DefaultHandler contentHandler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                System.err.println("characters: '" + new String(ch, start, length) + "'");
            }

            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
                System.err.println("whitespace: '" + new String(ch, start, length) + "'");
                invoked[0] = true;
            }
        };
        validatorHandler.setContentHandler(contentHandler);

        parse(xmlReader, XML);
        if (!invoked[0]) {
            System.out.println("Method ignorableWhitespace() was not invoked.");
        } else {
            System.out.println("OK");
        }
    }
}
```
I get:
```
whitespace: ' '
characters: ' '
whitespace: ' '
characters: ' 123 '
whitespace: ' '
OK
```
with:
```
java version "1.5.0_11"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_11-b03)
Java HotSpot(TM) Client VM (build 1.5.0_11-b03, mixed mode, sharing)
```
7. Re: Ignoring non-ignorable white space?
aloubyansky Jun 8, 2007 5:56 AM (in response to wolfc)
Actually, even with maxOccurs='unbounded', it works if the type is xsd:string and you expect an empty string:
```java
public void testCollectionOverrideProperty() throws Exception {
    SchemaBinding schema = XsdBinder.bind(new StringReader(XSD), null);
    schema.setIgnoreUnresolvedFieldOrClass(false);

    ClassMetaData classMetaData = new ClassMetaData();
    classMetaData.setImpl(Top.class.getName());
    ElementBinding element = schema.getElement(new QName(NS, "top"));
    assertNotNull(element);
    element.setClassMetaData(classMetaData);

    Top top = (Top) unmarshal("IgnorableWhitespaceContent.xml", schema, Top.class);
    assertNotNull(top.string);
    assertEquals(1, top.string.size());
    assertEquals("", top.string.get(0));
}
```
8. Re: Ignoring non-ignorable white space?
wolfc Jun 8, 2007 6:01 AM (in response to wolfc)
No, I expect a space.
There is no ValidatorHandler in between; that seems to be the problem.
9. Re: Ignoring non-ignorable white space?
aloubyansky Jun 8, 2007 7:44 AM (in response to wolfc)
Yes, it should be a validating parser. There should be a config option for this. Otherwise, we'll have to implement it ourselves. I'll look into this.
10. Re: Ignoring non-ignorable white space?
aloubyansky Jun 11, 2007 6:14 AM (in response to wolfc)
Before filtering characters, we should check whether the element's type can contain text content. Many of the current tests fail if I do this, mainly the ones that use anyType with indented elements in their content.
11. Re: Ignoring non-ignorable white space?
aloubyansky Jun 14, 2007 6:54 AM (in response to wolfc)
I've just committed the test and the following changes to trunk:
- if the type is simple, then all the characters are reported as they appear in the XML content
- if the type doesn't allow text content then all the characters are ignored
- otherwise it depends on the value of SchemaBinding.isIgnoreWhitespacesInMixedContent()
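The three rules above can be sketched as one filtering decision. This is a minimal sketch, not the JBossXB code: ContentKind and filter() are hypothetical names, and it assumes that "ignoring" whitespace in mixed content means dropping whitespace-only runs (returned here as null rather than ""), which is one of the two choices still under discussion.

```java
public class WhitespacePolicy {

    /** Hypothetical classification of an element's XSD content type. */
    enum ContentKind { SIMPLE, ELEMENT_ONLY, MIXED }

    /**
     * Returns the character data to pass on to the binding layer,
     * or null when it should be dropped (an assumption; "" is the alternative).
     */
    static String filter(String text, ContentKind kind, boolean ignoreWhitespacesInMixedContent) {
        switch (kind) {
            case SIMPLE:
                // Simple type: report the characters exactly as they appear.
                return text;
            case ELEMENT_ONLY:
                // No text content allowed: ignore all characters.
                return null;
            default:
                // Mixed content: drop whitespace-only runs only if the flag is set.
                if (ignoreWhitespacesInMixedContent && text.trim().length() == 0) {
                    return null;
                }
                return text;
        }
    }

    public static void main(String[] args) {
        System.out.println("'" + filter(" ", ContentKind.SIMPLE, true) + "'");
        System.out.println(filter(" ", ContentKind.ELEMENT_ONLY, true));
        System.out.println(filter("\n  ", ContentKind.MIXED, true));
        System.out.println("'" + filter(" a ", ContentKind.MIXED, true) + "'");
    }
}
```

Note that the decision depends only on the element's type and the flag, never on the parser, which is what makes it independent of whether a ValidatorHandler sits in the pipeline.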
For now, ignoreWhitespacesInMixedContent defaults to true for backward compatibility, but we can discuss this. On the one hand, it's fine to report whitespace and line breaks between elements as text content in this case; on the other hand, it may not be desirable.
Another question: if ignoreWhitespacesInMixedContent is true, should indentation be reported as an empty string or as null?
12. Re: Ignoring non-ignorable white space?
wolfc Jun 14, 2007 10:50 AM (in response to wolfc)
I like it.
How can I set this option from a regular ObjectModelFactory? (For example org.jboss.ejb3.metamodel.JBossDDObjectFactory.)
As for the last question, I would say it should return an empty string, because then you can tell whether an element was specified at all. But this breaks backward compatibility.
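The argument for an empty string can be illustrated with a hypothetical bound class (not the Top class from the test). Assuming the binding layer leaves a field null when the element is absent, reporting whitespace-only content as "" keeps "absent" and "present but empty" apart, whereas reporting it as null would make the two indistinguishable:

```java
public class PresenceCheck {

    /** Hypothetical bound object: the field stays null unless the element was unmarshalled. */
    static class Top {
        String string;
    }

    static String describe(Top top) {
        if (top.string == null) {
            return "element not specified";
        }
        if (top.string.length() == 0) {
            return "element specified but empty";
        }
        return "element specified with text '" + top.string + "'";
    }

    public static void main(String[] args) {
        Top absent = new Top();
        Top empty = new Top();
        empty.string = "";  // what an empty-string policy would produce for <string> </string>
        System.out.println(describe(absent));
        System.out.println(describe(empty));
    }
}
```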
13. Re: Ignoring non-ignorable white space?
aloubyansky Jun 18, 2007 9:53 AM (in response to wolfc)
There is no option for the ObjectModelFactory at the moment. Maybe I'll add a common one to the Unmarshaller.
As for the extended test you added with newlines: that was part of the indentation-recognition logic I was trying out. So, it worked ;) I'll fix it.
14. Re: Ignoring non-ignorable white space?
aloubyansky Jun 20, 2007 5:40 AM (in response to wolfc)
I've added a test for the ObjectModelFactory to the IgnorableWhitespaceUnitTestCase. To make it pass, you should set TrimTextContent to false (it can be set in newRoot for all elements or in newChild per element):
```java
public Object newRoot(Object root, UnmarshallingContext ctx, String namespaceURI,
                      String localName, Attributes attrs) {
    ctx.setTrimTextContent(false);
    return new Top();
}
```
Does this work for everybody?