1 Reply Latest reply on Nov 5, 2012 10:32 PM by bondchan921

Character encoding changes in JBoss 5.1: UTF-8 vs ISO-8859-1 - how to handle?

mkantor Oct 15, 2012 10:02 AM

Upgrading from JBoss 4.* to 5.1, I notice that some pages containing characters outside the ASCII range break: specifically, the HTTP response from JBoss is truncated mid-page. (These characters are read from an SQL database.)

I believe this has to do with character encoding. I have found references to the following two settings, which do not fix the problem:

1. In <JBOSS-ROOT>\server\default\deploy\jbossweb.sar\server.xml, setting URIEncoding="UTF-8" on the <Connector> elements.

2. In run.bat (or the equivalent startup script), setting the file encoding: set "JAVA_OPTS=-Dfile.encoding=utf-8 %JAVA_OPTS%"
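Neither setting controls the response body, which would fit what is described next. As I understand it, URIEncoding only tells the connector how to decode percent-escaped bytes in the request URI (including GET query parameters), and -Dfile.encoding only changes the JVM's default charset; the servlet response still defaults to ISO-8859-1 unless a charset is set on it. The sketch below illustrates the first point with plain JDK code: the same escaped bytes decode differently depending on the charset, which is the choice URIEncoding makes. The class and method names are hypothetical, and java.net.URLDecoder does form-style decoding ('+' becomes a space), which is close enough for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical illustration: the same percent-escaped bytes decoded with two
// charsets. Choosing that charset for the request URI is what URIEncoding does;
// it has no effect on how the response body is encoded.
class UriDecodingDemo {

    // Decode a percent-escaped string using the named charset.
    static String decode(String escaped, String charsetName) {
        try {
            return URLDecoder.decode(escaped, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, as a browser escapes it when using UTF-8
        String escaped = "caf%C3%A9";
        // Decodes to the intended 4-character word
        System.out.println(decode(escaped, "UTF-8"));
        // Decodes the two UTF-8 bytes as two separate Latin-1 characters (mojibake)
        System.out.println(decode(escaped, "ISO-8859-1"));
    }
}
```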

After both these changes, HTTP responses still carry the header Content-Type: text/html;charset=ISO-8859-1, and pages are still truncated whenever they contain characters outside the ASCII range.
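The charset in that header decides how many bytes each non-ASCII character occupies on the wire. One plausible mechanism for mid-page truncation (an assumption, not confirmed anywhere in this thread) is a Content-Length computed under one charset, or from the character count, while the body is actually encoded in another: the browser stops reading at the advertised length. The hypothetical class below simulates that with plain JDK code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of an assumed failure mode: the header advertises
// fewer bytes than the UTF-8 body really contains, so the client stops early.
class ContentLengthMismatchDemo {

    // Simulates a client that reads at most contentLength bytes of the body
    // and decodes them with the given charset.
    static String readAsClient(byte[] body, int contentLength, Charset cs) {
        byte[] received = Arrays.copyOf(body, Math.min(contentLength, body.length));
        return new String(received, cs);
    }

    public static void main(String[] args) {
        String page = "caf\u00e9 \u00fcber";                        // 9 characters
        byte[] utf8 = page.getBytes(StandardCharsets.UTF_8);          // 11 bytes
        byte[] latin1 = page.getBytes(StandardCharsets.ISO_8859_1);   // 9 bytes

        // Prints: 9 chars, 11 UTF-8 bytes, 9 ISO-8859-1 bytes
        System.out.println(page.length() + " chars, " + utf8.length + " UTF-8 bytes, "
                + latin1.length + " ISO-8859-1 bytes");

        // Content-Length taken from the ISO-8859-1 size, body sent as UTF-8:
        // the client gets the text cut off after the 'b' of the last word.
        System.out.println(readAsClient(utf8, latin1.length, StandardCharsets.UTF_8));
    }
}
```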

I have the following solution which DOES solve the immediate problem, but I don't fully understand why, and am not confident in its correctness:

I wrote a servlet filter that ensures the output of text/html content pages is UTF-8 encoded:

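import java.io.IOException;
import java.util.Locale;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

public class EncodingFilter implements Filter {
    ...

    /**
     * Set the character encoding for the request. Wrap the response and set the
     * character encoding on the return trip.
     */
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain) throws IOException, ServletException {

        // req.setCharacterEncoding() must run before any request parameter is read;
        // resp.setCharacterEncoding() has no effect once getWriter() has been called.
        resp.setCharacterEncoding(encoding);
        req.setCharacterEncoding(encoding);

        // Create a wrapper around the response, so we can intercept it and change it later
        CharResponseWrapper wrapper = new CharResponseWrapper((HttpServletResponse) resp);

        // Now let the request go through other filters and the servlet
        chain.doFilter(req, wrapper);

        // getContentType() may be null or shorter than 9 characters,
        // so use startsWith() instead of substring(0, 9)
        String contentType = wrapper.getContentType();
        boolean isHtml = contentType != null && contentType.startsWith("text/html");

        // if content type is text/html and not already UTF-8, set character encoding
        if (isHtml && !contentType.toUpperCase(Locale.ENGLISH).contains("UTF-8")) {
            // Set headers BEFORE getWriter(): the writer's charset is fixed once it is obtained
            ((HttpServletResponse) resp).setHeader("Content-Type", "text/html;charset=" + encoding);
            ((HttpServletResponse) resp).setHeader("X-Wrapper-Encoding", "text/html;charset=" + wrapper.getCharacterEncoding());

            // Now transfer the content from the wrapper to the response
            resp.getWriter().write(wrapper.toString());
        } else {
            // Not text/html (or already UTF-8), so we just want the plain character stream
            wrapper.writeToStream(resp.getWriter());
        }
    }
}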

The variable 'encoding' is read from web.xml and has the value "UTF-8". CharResponseWrapper uses a CharArrayWriter to capture the response, and exposes its writeToStream and toString methods.
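For readers without the elided code, here is a servlet-free sketch of the capture logic described above. It is an assumption about how CharResponseWrapper works, not the poster's class, and the name CapturedOutput is hypothetical. A real wrapper would extend javax.servlet.http.HttpServletResponseWrapper, return this PrintWriter from an overridden getWriter(), and also handle getOutputStream(), which is omitted here.

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical, servlet-free sketch of the capture part of a response wrapper.
class CapturedOutput {
    private final CharArrayWriter buffer = new CharArrayWriter();
    private final PrintWriter writer = new PrintWriter(buffer);

    // What the servlet/JSP would write to instead of the real response.
    PrintWriter getWriter() {
        return writer;
    }

    // Copy the captured characters to the real response writer.
    void writeToStream(Writer out) throws IOException {
        writer.flush();
        buffer.writeTo(out);
    }

    // The captured page as a String (the buffer is not cleared).
    @Override
    public String toString() {
        writer.flush();
        return buffer.toString();
    }

    public static void main(String[] args) throws IOException {
        CapturedOutput captured = new CapturedOutput();
        captured.getWriter().print("<p>Hello</p>");

        StringWriter realResponse = new StringWriter();
        captured.writeToStream(realResponse);
        System.out.println(realResponse);  // <p>Hello</p>
        System.out.println(captured);      // <p>Hello</p>
    }
}
```

Note that the buffer holds characters, not bytes. No charset is applied until the filter copies the text into resp.getWriter(), so the encoding in effect at that moment is the one that reaches the browser, which may be part of why the filter's UTF-8 setting wins.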

As I said, this solves the immediate problem. The responses now carry the header Content-Type: text/html;charset=UTF-8, and the pages are no longer truncated. Randomly sampled characters in the 128-255 range (outside ASCII, which ends at 127) appear correctly in the browser. What I don't understand is:

1. What could be choking on the non-ASCII characters?

2. Why does changing the encoding help?

3. What changed about JBoss between 4.* and 5.1 to cause this problem?

4. Is the filter a correct solution to the problem?

5. Is there a better solution?
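Part of question 2 can be illustrated with plain JDK code. ISO-8859-1 covers only U+0000 to U+00FF, so characters that are often called "extended ASCII" but actually come from Windows-1252, such as the euro sign or curly quotes, cannot be encoded and are replaced with '?', whereas UTF-8 round-trips every character. This does not by itself explain the truncation, whether the SQL data contains such characters is an assumption, and the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: String.getBytes(Charset) replaces characters the
// charset cannot represent with its replacement byte ('?' for ISO-8859-1).
class CharsetRoundTripDemo {

    // Encode with the given charset, then decode with the same charset.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // euro sign, curly double quotes, and e-acute (the last one IS in ISO-8859-1)
        String text = "\u20ac5 \u201cquoted\u201d \u00e9";

        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1)
                .equals("?5 ?quoted? \u00e9"));                               // true
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```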

Any help is much appreciated.

1. Re: Character encoding changes in JBoss 5.1: UTF-8 vs ISO-8859-1 - how to handle?

bondchan921 Nov 5, 2012 10:32 PM (in response to mkantor)

Michael,

There is a bug in jboss-5.1.0.GA; see https://community.jboss.org/message/497765#497765

You can find the full details in JIRA issue JBAS-6442.