3 Replies Latest reply on Sep 30, 2005 4:56 AM by maint175

    Encoding problem -- looks like a bug

    sunpy

      I have a very basic webapp built with Struts. It reads Chinese characters and saves them in a database. The main servlet looks like this:

      request.setCharacterEncoding("UTF-8");
      
      String inputTex = form.getEnteredText();
      
      // save inputTex to the DB
      

      The content header of the input JSP was set with charset="UTF-8", but the characters saved in the DB were all question marks. After fiddling for days, I added a statement like this:
      inputTex = new String(inputTex.getBytes("ISO-8859-1"), "UTF-8");

      and now everything works fine. It looks like JBoss uses ISO-8859-1 no matter what is set in the request. Besides getting the input value from a form bean, I also tried getting it directly with
      inputTex = request.getParameter("input");

      and the result is the same. The JBoss version I use is 4.0.1sp1. A search turned up similar problems reported by others, but no one gave a definitive answer. Is this a bug?
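
For readers following along, here is a minimal sketch of why the re-decoding statement above recovers the text. It assumes the container decoded UTF-8 request bytes as ISO-8859-1, which is what the post suspects but does not prove. The class and method names (MojibakeDemo, misdecode, repair) are hypothetical, and StandardCharsets is used in place of the charset-name strings from the post.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Assumption: simulates a container that receives UTF-8 bytes
    // but decodes them as ISO-8859-1.
    public static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the post: ISO-8859-1 maps every byte to exactly
    // one char, so getBytes recovers the raw bytes, which are then decoded
    // as UTF-8.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        String garbled = misdecode(original);
        String repaired = repair(garbled);

        // Each character is 3 bytes in UTF-8, so the garbled string has 6 chars.
        System.out.println("garbled length: " + garbled.length());
        System.out.println("round trip ok: " + repaired.equals(original));
    }
}
```

The round trip only works because the wrong decoding step was lossless. If the wrong charset had dropped or replaced bytes, the original text could not be recovered this way.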

        • 1. Re: Encoding problem -- looks like a bug
          raja05

          Hmm, that doesn't seem to be the case. JBoss by default uses a file.encoding value of UTF-8, so you shouldn't even need to set the request encoding to UTF-8.
          I created a simple webapp with a form containing a text field; submitting it goes to a servlet, where I print the value back to the screen. Entering some Japanese text in the form prints it back just fine.
          Would it be possible for you to send me a sample of your application, in case it is more complex than my test setup above? My email is rajasaur at gmail dot com

          • 2. Re: Encoding problem -- looks like a bug
            sunpy

            I created a simple WAR after seeing your reply, and it seems to work without having to do the encoding conversion. Now I suspect the problem was caused by Struts, as the application in my first question was built with Struts. I have sent the application to your email. I really appreciate your help.

            • 3. Re: Encoding problem -- looks like a bug
              maint175

              Check the URIEncoding parameter in the Connector configuration.

              URIEncoding

              This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, ISO-8859-1 will be used.
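
The effect of that setting can be illustrated with a small sketch using java.net.URLDecoder from the standard library. This is not the container's own decoding code, only a demonstration of how the same %xx sequence yields different text depending on the charset chosen. The class and method names (UriDecodingDemo, decode) are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriDecodingDemo {

    // Decodes a percent-encoded string using the named charset.
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes of two Chinese characters, percent-encoded as a browser would send them.
        String encoded = "%E4%B8%AD%E6%96%87";

        String asUtf8 = decode(encoded, "UTF-8");
        String asLatin1 = decode(encoded, "ISO-8859-1");

        // Decoded as UTF-8, the six bytes form two characters.
        System.out.println("UTF-8 length: " + asUtf8.length());
        // Decoded as ISO-8859-1, each byte becomes its own character.
        System.out.println("ISO-8859-1 length: " + asLatin1.length());
    }
}
```

If this is the cause, it would affect parameters carried in the URL itself, such as a GET query string, and setting URIEncoding="UTF-8" on the Connector would be the corresponding fix.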