2 Replies Latest reply on Oct 11, 2018 9:08 AM by waterstorm

    Wildfly Umlaut (UTF-8) issue

    waterstorm

      Hello,

       

      I'm currently having a lot of issues with special characters, such as German umlauts, when using AJP (this started after upgrading Wildfly past version 10). I already created a Stackoverflow question a while ago, but I wanted to ask here in the forums one more time before posting this as a bug report.

       

      I'll update my question according to my latest tests, but it will be very similar. So here we go:

       

      As already stated this problem happened after upgrading Wildfly. Originally (when writing my Stackoverflow question) I had 10 installed and upgraded to 13, but the problem still exists with the most current version (14.0.1).

      I'm having serious issues with encoding in Wildfly. I don't know if this is a Wildfly bug, so I'm asking for help here first. Maybe I just missed something.

      I also tested this with multiple Java versions in the meantime, so I'm pretty sure that it's not related to my Java 8 to Java 10 upgrade (which I stated in the Stackoverflow question).

       

      My setup is a bit more "complex":

      Shibboleth SP -> Apache2 (with AJP) -> Wildfly -> Wicket Application

      Before the upgrade, everything was working as expected. I'd get the attributes of the logged-in user via the HttpServletRequest in Java:

      (HttpServletRequest)getRequest().getContainerRequest().attributes

      For example if I wanted to get the display name I could do:

      ((HttpServletRequest)getRequest().getContainerRequest()).getAttribute("displayName")

      However, this now returns, for example, "Ãberpruefung" instead of "Überpruefung". Before the update I did not have this issue.

      I've come a long way to post here, so I'll briefly describe my (failed) attempts to fix this issue.

      Validating the problem

      First I checked and validated what exactly the issue is and it turned out that my string was encoded somehow/somewhere in ISO-8859-1 (latin-1) because I could "fix" the issue by doing:

      new String(attribute.getBytes("ISO-8859-1"), "UTF-8")

      However, this seems to me to be nothing but a workaround. I'd rather have this fixed properly (and it did work before, after all...).
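To make the symptom concrete, here is a minimal sketch of how this kind of garbling arises and why the workaround reverses it. It assumes the bytes arrive as UTF-8 and are decoded as ISO-8859-1 somewhere along the chain; where that happens is exactly the open question. The class and method names are hypothetical and not part of the application.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the application described in this post.
final class MojibakeDemo {

    // Simulates the suspected fault: UTF-8 bytes decoded as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The workaround: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

Calling `misdecode` on "Überpruefung" yields a 13-character string that starts with U+00C3 ("Ã") followed by the invisible control character U+009C, which matches the "Ãberpruefung" seen in the log. Passing that result to `repair` returns the original 12-character string.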

      1 Wildfly

      Obviously I thought Wildfly was the issue, so I set UTF-8 as the default. I've done this as suggested here for the servlet container and the AJP listener:

      <servlet-container name="default" default-encoding="UTF-8">
      <ajp-listener name="ajp" socket-binding="ajp" url-charset="UTF-8"/>

      This showed up in the Wildfly interface, so it was set, but it did not change anything on the issue.

      2 Java

      Second, I suspected Java 10, which I had upgraded to around the same time and which could therefore also be the cause.

      I tried setting all Java charsets to default to UTF-8 using the VM options:

      -Dfile.encoding=UTF8 -Dfile.io.encoding=UTF8 -DjavaEncoding=UTF8

      Just to be sure, I checked this in Java using this code snippet:

      System.err.println("Default Charset: " + Charset.defaultCharset());
      System.err.println("file.encoding: " + System.getProperty("file.encoding"));
      System.err.println("Default Charset in use: " + getDefaultCharSet());
      System.err.println("Request encoding: " + ((HttpServletRequest)getRequest().getContainerRequest()).getCharacterEncoding());

      Which returned:

      15:33:28,354 ERROR [stderr] (default task-1) Default Charset: UTF-8
      15:33:28,355 ERROR [stderr] (default task-1) file.encoding: utf-8
      15:33:28,356 ERROR [stderr] (default task-1) Default Charset in use: UTF8
      15:33:28,356 ERROR [stderr] (default task-1) Request encoding: UTF-8
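The checks above report the JVM default charset and the encoding declared for the request (per the Servlet API, `getCharacterEncoding()` refers to the request body). They do not directly show what ended up inside the attribute string. A small sketch for inspecting that is to dump the code points of the value; the class name `CodePointDump` is a hypothetical helper, not something from the application.

```java
// Hypothetical diagnostic helper for inspecting what a String really contains.
final class CodePointDump {

    // Renders each code point as U+XXXX, separated by single spaces.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }
}
```

A correctly decoded "Ü" shows up as the single code point U+00DC. If the dump of the attribute instead starts with U+00C3 U+009C, the two UTF-8 bytes of "Ü" were each turned into a separate character before the string reached the application.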

      Shibboleth / Apache2 / Wicket

      I then checked every other step along the way to make sure UTF-8 is the default, even though it worked before in the same setup. I did not upgrade any of these components.

      Shibboleth

      Docs here and here only speak of UTF-8 and checking the attributes using the URL /Shibboleth.sso/Session did show all chars just fine.

      Apache

      As suggested somewhere on Stackoverflow I've added AddDefaultCharset UTF-8 to /etc/apache2/apache2.conf.

      But this did not change anything either. The string was still showing up in the "wrong" charset in Java.

      Wicket

      As suggested here, Wicket can be set to UTF-8 as well. But this was already set in my Application.java:

      @Override
      protected void init() {
          super.init();

          getMarkupSettings().setDefaultMarkupEncoding("UTF-8");
          getRequestCycleSettings().setResponseRequestEncoding("UTF-8");
      }

      So nothing new here either.

      Filter

      I've read some bug reports on the Wildfly JIRA about this issue and about custom filters for UTF-8 here, here and here (yes some of them are quite old and marked fixed, but I was kind of desperate). So I tried implementing a very basic filter as described in one of the reports and added it to the web.xml:

      import java.io.IOException;

      import javax.servlet.Filter;
      import javax.servlet.FilterChain;
      import javax.servlet.FilterConfig;
      import javax.servlet.ServletException;
      import javax.servlet.ServletRequest;
      import javax.servlet.ServletResponse;

      public class Utf8Filter implements Filter {

          @Override
          public void init(FilterConfig filterConfig) {
          }

          @Override
          public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain) throws IOException, ServletException {
              System.err.println("encoding is " + servletRequest.getCharacterEncoding());
              // Only set UTF-8 if the container has not already chosen an encoding.
              if (servletRequest.getCharacterEncoding() == null) {
                  servletRequest.setCharacterEncoding("UTF-8");
              }

              if (servletRequest.getCharacterEncoding() == null) {
                  System.err.println("could not set encoding");
                  Thread.dumpStack();
              }

              filterChain.doFilter(servletRequest, servletResponse);
          }

          @Override
          public void destroy() {
          }
      }

      This prints "encoding is UTF-8" in the terminal, so the request encoding was already set correctly and the filter changed nothing. The string still showed up as ISO-8859-1:

      15:33:28,356 ERROR [stderr] (default task-1) Display Name: Ãberpruefung
      15:33:28,357 ERROR [stderr] (default task-1) Detected Charset of Display Name: ISO-8859-1

      Everything I can think of is now set manually to UTF-8. Clearly the string gets encoded somewhere in ISO-8859-1 but I just can't figure out where and how to prevent this.
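Until the faulty decoding step is found, the ISO-8859-1 workaround shown earlier could at least be guarded so that it only touches values that really look misdecoded. The following is a minimal sketch of such a heuristic, not a fix for the underlying problem: it re-decodes a value only if every character fits in one byte and the resulting bytes form strictly valid UTF-8. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a heuristic, not part of any Wildfly or Servlet API.
final class Utf8Repair {

    static String repairIfMisdecoded(String value) {
        if (value == null) {
            return null;
        }
        boolean hasHighChar = false;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c > 0xFF) {
                return value; // cannot have come from single ISO-8859-1 bytes
            }
            if (c >= 0x80) {
                hasHighChar = true;
            }
        }
        if (!hasHighChar) {
            return value; // pure ASCII is identical in both charsets
        }
        // Strict decoder: malformed input throws instead of being replaced.
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            byte[] raw = value.getBytes(StandardCharsets.ISO_8859_1);
            return strictUtf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return value; // not valid UTF-8, so it was probably decoded correctly
        }
    }
}
```

With this guard, a correctly decoded "Überpruefung" passes through unchanged, so the code would keep working if the container behavior is corrected later. The heuristic can still misfire on a genuine ISO-8859-1 string that happens to be valid UTF-8 byte-wise, which is rare for names but not impossible.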

      Did I miss something? How can I get the attributes from the servlet request in UTF-8 so that German umlauts are shown correctly?

      It clearly worked before, I just don't know why it does not work now. Any help is much appreciated. Thank you!

      The link to the original Stackoverflow question: https://stackoverflow.com/questions/51542234/wildfly-13-utf-8-encoding-servlet-issues