Character encoding changes in JBoss 5.1: UTF-8 vs ISO-8859-1 - how to handle?
mkantor Oct 15, 2012 10:02 AM
Upgrading from version 4.* to 5.1, I notice that some pages containing characters outside the regular ASCII range break - specifically, the HTTP response from JBoss gets truncated mid-page. (These characters are being read from SQL).
I believe this has to do with character encoding. I have found references to the following two settings, which do not fix the problem:
1. In <JBOSS-ROOT>server\default\deploy\jbossweb.sar\server.xml, setting URIEncoding="UTF-8" in <Connector> elements.
2. In run.bat (or equivalent startup file), setting file encoding: set "JAVA_OPTS=-Dfile.encoding=utf-8 %JAVA_OPTS%"
After both of these changes, HTTP responses continue to include the header Content-Type: text/html;charset=ISO-8859-1, and continue to be truncated when characters outside the ASCII range are included.
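For context on why the two settings may leave that header unchanged: URIEncoding controls how the connector decodes request URIs, and -Dfile.encoding sets the JVM default charset; neither one declares a charset on the response, and the servlet specification falls back to ISO-8859-1 when the application sets none. The sketch below uses only the JDK and is not taken from the application in question; the class name EncodingDemo and the sample string are made up for illustration. It shows how the same text produces different bytes under the two encodings. Whether a mismatch like this is what truncates the pages is not confirmed by anything in this post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Number of bytes the text occupies once encoded with the given charset.
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Encode and decode with the same charset; characters the charset
    // cannot represent come back as '?'.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Hypothetical sample: 'e' with acute accent (U+00E9) exists in
        // ISO-8859-1, the euro sign (U+20AC) does not.
        String text = "caf\u00e9 \u20ac";

        System.out.println("chars:             " + text.length());
        System.out.println("ISO-8859-1 bytes:  "
                + encodedLength(text, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 bytes:       "
                + encodedLength(text, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 result: "
                + roundTrip(text, StandardCharsets.ISO_8859_1));
        System.out.println("JVM default:       " + Charset.defaultCharset());
    }
}
```

The last line printed is a quick way to see whether the -Dfile.encoding option actually reached the JVM.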
I have the following solution which DOES solve the immediate problem, but I don't fully understand why, and am not confident in its correctness:
I wrote a servlet filter that ensures the output of text/html content pages is UTF-8 encoded:
public class EncodingFilter implements Filter {
    ...
    /**
     * Set the character encoding for request. Wrap the response and set character encoding
     * on the return trip.
     */
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        resp.setCharacterEncoding(encoding);
        req.setCharacterEncoding(encoding);

        // Create a wrapper around the response, so we can intercept it and change it later
        CharResponseWrapper wrapper = new CharResponseWrapper((HttpServletResponse) resp);

        // Now let the request go through other filters and the servlet
        chain.doFilter(req, wrapper);

        PrintWriter respStream = resp.getWriter();

        // if content type is text/html, set character encoding
        if (wrapper.getContentType().substring(0, 9).equals("text/html")
                && !wrapper.getContentType().contains("UTF-8")) {
            ((HttpServletResponse) resp).setHeader("Content-Type", "text/html;charset=" + encoding);
            ((HttpServletResponse) resp).setHeader("X-Wrapper-Encoding",
                    "text/html;charset=" + wrapper.getCharacterEncoding());
            // Now transfer the content from the wrapper to the response
            respStream.write(wrapper.toString());
        } else {
            // Not text/html, so we just want the plain character stream
            wrapper.writeToStream(respStream);
        }
    }
The variable 'encoding' is read from web.xml, and has the value "UTF-8". The CharResponseWrapper uses a CharArrayWriter to capture the response, and exposes writeToStream and toString methods.
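The CharResponseWrapper source is not included in this post. As a rough illustration of the buffering it is described as doing, here is a minimal JDK-only sketch. The class name CaptureBuffer is hypothetical; a real wrapper would presumably extend HttpServletResponseWrapper and return this PrintWriter from its getWriter() override, which is left out here so the example runs without the servlet API.

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical stand-in for the buffering core of a response wrapper.
public class CaptureBuffer {

    // Holds everything written by the servlet as characters, not bytes.
    private final CharArrayWriter buffer = new CharArrayWriter();
    private final PrintWriter writer = new PrintWriter(buffer);

    // What a wrapper's getWriter() would hand to the servlet.
    public PrintWriter getWriter() {
        return writer;
    }

    // Copy the captured characters to the real output.
    public void writeToStream(Writer out) throws IOException {
        writer.flush(); // make sure pending output reaches the buffer
        buffer.writeTo(out);
    }

    // Captured content as a String.
    @Override
    public String toString() {
        writer.flush();
        return buffer.toString();
    }

    public static void main(String[] args) throws IOException {
        CaptureBuffer capture = new CaptureBuffer();
        capture.getWriter().print("<p>caf\u00e9</p>");

        StringWriter out = new StringWriter();
        capture.writeToStream(out);
        System.out.println(out.toString().equals(capture.toString()));
    }
}
```

Because the content is held as characters, no encoding is applied until it is written to the real response, which is the point at which the response's character encoding takes effect.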
As I said, this solves the immediate problem. The responses now say content type is "text/html;charset=UTF-8", and the pages are not truncated. Randomly sampled characters in the 128-255 range (that is, beyond 7-bit ASCII) appear correctly in the browser. What I don't understand is:
1. What could be choking on the non-ASCII characters?
2. Why does changing the encoding help?
3. What changed about JBoss between 4.* and 5.1 to cause this problem?
4. Is the filter a correct solution to the problem?
5. Is there a better solution?
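A side note relevant to question 4, separate from the encoding issue itself: the content-type check in the filter calls substring(0,9) on the result of getContentType(), which throws if the servlet never set a content type (null) or set one shorter than nine characters such as "text/css", and its contains("UTF-8") test is case-sensitive. A more defensive version of that check might look like the sketch below; the class and method names are hypothetical and the code is not a drop-in replacement for the filter.

```java
import java.util.Locale;

public class ContentTypeCheck {

    // True when the content type is HTML and does not already declare UTF-8.
    // Returns false for a missing (null) content type instead of throwing.
    public static boolean needsUtf8Charset(String contentType) {
        if (contentType == null) {
            return false;
        }
        String normalized = contentType.trim().toLowerCase(Locale.ROOT);
        return normalized.startsWith("text/html") && !normalized.contains("utf-8");
    }

    public static void main(String[] args) {
        String[] samples = {
            "text/html;charset=ISO-8859-1",
            "text/html;charset=utf-8",
            "text/css",
            null
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + needsUtf8Charset(sample));
        }
    }
}
```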
Any help is much appreciated.