2 Replies Latest reply on Feb 12, 2013 9:44 AM by kattaw

    Escape Characters Being Unescaped in UTF-8 to ISO-8859-1 conversion

    kattaw

      In short, some XML special characters are being unescaped when some CXF code changes the encoding from UTF-8 to ISO-8859-1 in the process of returning a SOAP response. We need to keep the UTF-8 encoding.

       

      We’re seeing this issue in a JAX-WS web service running on JBoss 6. The CXF version we are using is 2.5.1, pulled in as a Maven dependency. Unfortunately, we do not have the option of moving to JBoss 7, so we’d really like to find a solution with our current version.

       

      We are attempting to send an XML document in a SOAP response as an MTOM attachment, and it is a business requirement that the document's UTF-8 encoding be preserved through the process and returned unchanged. The desired encoding is set in the XML declaration (<?xml version="1.0" encoding="UTF-8"?>), and we have also tried adding the charset to the @XmlMimeType annotation on the field in question, with no change in behavior.

       

      While debugging through the CXF code, we noticed that a second response message variable is maintained, separate from the one we constructed. This second variable contains exactly our response, with one change: its encoding is set to ISO-8859-1. At some point before the response is actually returned, our originally constructed message (UTF-8) is replaced with this newly created and wrongly encoded message (ISO-8859-1). In our web.xml file we have configured an encodingFilter (of class org.springframework.web.filter.CharacterEncodingFilter) that sets the encoding to UTF-8. Our response passes through this filter before being returned, so while what we return is technically UTF-8 encoded, the message has already been altered during the UTF-8 to ISO-8859-1 conversion. During that conversion, some of the XML special characters become unescaped.
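
One way to check whether the charset change by itself could explain the unescaping is to transcode a sample string outside of CXF. This is only a minimal sketch, assuming Java 7 or later; the class and method names (TranscodeCheck, utf8ToLatin1) are made up for illustration and are not part of CXF or our code base.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of CXF: pushes a string through the same
// UTF-8 -> ISO-8859-1 charset change, with no XML parsing involved.
final class TranscodeCheck {

    static String utf8ToLatin1(String xml) {
        // Encode as UTF-8 and decode again, as if the document were read off the wire.
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, StandardCharsets.UTF_8);

        // Re-encode as ISO-8859-1. Characters that do not exist in Latin-1
        // are replaced with '?', but ASCII text is carried over byte for byte.
        byte[] latin1 = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String sample = "<root> hello &gt; world &quot;x&quot; &apos;y&apos; </root>";
        // The entity references are plain ASCII, so they should come back unchanged.
        System.out.println(sample.equals(utf8ToLatin1(sample)));
    }
}
```

Since the five predefined entity references consist only of ASCII characters, a pure byte-level transcoding should leave them untouched. If that holds, the unescaping would more likely come from the document being parsed and re-serialized somewhere along the way than from the charset conversion itself.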

       

      Specifically, & gt; becomes >, & quot; becomes ", and & apos; becomes '. The other two XML special characters (& lt; and & amp;) remain in their escaped form, which we consider correct.

       

       

      We also tried using the cxf-api jar in the common/lib folder of JBoss instead of getting it from Maven, but the result was the same.

       

      To summarize, we are looking for some way to entirely avoid the UTF-8 to ISO-8859-1 conversion, which appears to occur in the InvokerJSE.invoke method. Is there perhaps a configuration setting we’re missing? All of our searches for one so far have been fruitless. Any input would be very helpful.

       

      Example:

       

      Original Message: <root> hello & gt; world </root>

       

      Expected Output: <root> hello & gt; world </root>

       

      Actual Output: <root> hello > world </root>
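
To pin down which escape sequences are lost between the document we build and the one that is returned, a small comparison helper along these lines can be used. It is only a sketch, assuming Java 7 or later; EntityDiff and its methods are hypothetical names, and the strings in main are the example above written without the extra spaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper: reports which predefined XML entity
// references occur fewer (or more) times in the received document.
final class EntityDiff {

    private static final String[] ENTITIES = {"&lt;", "&gt;", "&amp;", "&quot;", "&apos;"};

    // Counts non-overlapping occurrences of token in text.
    static int count(String text, String token) {
        int n = 0;
        for (int i = text.indexOf(token); i >= 0; i = text.indexOf(token, i + token.length())) {
            n++;
        }
        return n;
    }

    // Maps each entity to (occurrences sent - occurrences received), omitting zeros.
    static Map<String, Integer> lost(String sent, String received) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String entity : ENTITIES) {
            int diff = count(sent, entity) - count(received, entity);
            if (diff != 0) {
                result.put(entity, diff);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sent = "<root> hello &gt; world </root>";
        String received = "<root> hello > world </root>";
        System.out.println(lost(sent, received));
    }
}
```

Running this against the request and response captured at different points in the interceptor chain may help narrow down where the escape sequences disappear.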

       

      PLEASE NOTE: Throughout this post, I have spread out the escape sequences to prevent them from being unescaped when this question is posted. Ordinarily, all characters of the escape sequence are together without spaces.