Character encoding changes in JBoss 5.1: UTF-8 vs ISO-8859-1 - how to handle?
mkantor Oct 15, 2012 10:02 AM
Upgrading from version 4.* to 5.1, I notice that some pages containing characters outside the regular ASCII range break. Specifically, the HTTP response from JBoss gets truncated mid-page. (These characters are being read from SQL.)
I believe this has to do with character encoding. I have found references to the following two settings, which do not fix the problem:
1. In <JBOSS-ROOT>server\default\deploy\jbossweb.sar\server.xml, setting URIEncoding="UTF-8" in <Connector> elements.
2. In run.bat (or equivalent startup file), setting file encoding: set "JAVA_OPTS=-Dfile.encoding=utf-8 %JAVA_OPTS%"
After both these changes, HTTP responses continue to include the header Content-Type = "text/html;charset=ISO-8859-1", and continue to have the truncation problem when characters outside the ASCII range are included.
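To make the difference between the two charsets concrete, here is a minimal sketch in plain Java. It is an illustration only, not a diagnosis of the truncation: the class name EncodingLengthDemo and its methods are hypothetical, and it uses java.nio.charset.StandardCharsets, which requires Java 7 or later. It shows that the same string occupies a different number of bytes in each encoding, and that characters ISO-8859-1 cannot represent are replaced with '?'.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the application or of JBoss.
class EncodingLengthDemo {

    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies once encoded as ISO-8859-1.
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Encode as ISO-8859-1 and decode again. Characters that the charset
    // cannot represent are replaced with '?' during encoding.
    static String latin1RoundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // e-acute (U+00E9) exists in ISO-8859-1; the euro sign (U+20AC) does not.
        String sample = "caf\u00e9 \u20ac5";
        System.out.println("chars:  " + sample.length());
        System.out.println("UTF-8:  " + utf8Length(sample) + " bytes");
        System.out.println("Latin1: " + latin1Length(sample) + " bytes");
        System.out.println("Latin1 round trip keeps text intact: "
                + sample.equals(latin1RoundTrip(sample)));
    }
}
```

The point of the sketch is that character count and byte count only coincide for single-byte encodings, so any component that assumes one while another writes the other will disagree about the size of the response body.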
I have the following solution which DOES solve the immediate problem, but I don't fully understand why, and am not confident in its correctness:
I wrote a servlet filter that ensures the output of text/html content pages is UTF-8 encoded:
public class EncodingFilter implements Filter {
    ...
    /**
     * Set the character encoding for request. Wrap the response and set character encoding
     * on the return trip.
     */
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain) throws IOException, ServletException {
        resp.setCharacterEncoding(encoding);
        req.setCharacterEncoding(encoding);
        // Create a wrapper around the response, so we can intercept it and change it later
        CharResponseWrapper wrapper = new CharResponseWrapper((HttpServletResponse) resp);
        // Now let the request go through other filters and the servlet
        chain.doFilter(req, wrapper);
        PrintWriter respStream = resp.getWriter();
        // getContentType() returns null if no content type was set, so check before using it
        String contentType = wrapper.getContentType();
        // if content type is text/html, set character encoding
        if (contentType != null && contentType.startsWith("text/html") && !contentType.contains("UTF-8")) {
            ((HttpServletResponse) resp).setHeader("Content-Type", "text/html;charset=" + encoding);
            ((HttpServletResponse) resp).setHeader("X-Wrapper-Encoding", "text/html;charset=" + wrapper.getCharacterEncoding());
            // Now transfer the content from the wrapper to the response
            respStream.write(wrapper.toString());
        } else {
            // Not text/html, so we just want the plain character stream
            wrapper.writeToStream(respStream);
        }
    }
The variable 'encoding' is read from web.xml, and has value "UTF-8". The CharResponseWrapper uses a CharArrayWriter to capture the response, and exposes its writeToStream and toString methods.
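For readers who want to see the capture step in isolation, the following is a rough sketch of the buffering that such a wrapper performs, written against the JDK only so it can run without the servlet API. CapturedOutput and encodeUtf8 are hypothetical names and are not the actual CharResponseWrapper; a real wrapper would extend HttpServletResponseWrapper and return this PrintWriter from getWriter(). UncheckedIOException assumes Java 8 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the buffering part of a response wrapper.
class CapturedOutput {
    // Holds everything written by downstream code as characters, not bytes.
    private final CharArrayWriter buffer = new CharArrayWriter();
    private final PrintWriter writer = new PrintWriter(buffer);

    // A servlet wrapper would return this from getWriter().
    PrintWriter getWriter() {
        return writer;
    }

    // Copy the captured characters to the real output unchanged.
    void writeToStream(Writer out) throws IOException {
        writer.flush();
        buffer.writeTo(out);
    }

    // Return the captured characters as a String.
    @Override
    public String toString() {
        writer.flush();
        return buffer.toString();
    }

    // Encode the captured characters as UTF-8 bytes, which is what the
    // container does when the response character encoding is UTF-8.
    static byte[] encodeUtf8(CapturedOutput captured) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            captured.writeToStream(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Because the buffer stores characters, the choice of charset is deferred until the content is finally written out, which is why the filter can still decide on the encoding after the servlet has finished.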
As I said, this solves the immediate problem. The responses now say content type is "text/html;charset=UTF-8", and the pages are not truncated. Randomly sampled characters in the 128-255 range (just beyond 7-bit ASCII) appear correctly in the browser. What I don't understand is:
1. What could be choking on the non-ASCII characters?
2. Why does changing the encoding help?
3. What changed about JBoss between 4.* and 5.1 to cause this problem?
4. Is the filter a correct solution to the problem?
5. Is there a better solution?
Any help is much appreciated.