4 Replies Latest reply on Sep 28, 2005 2:47 AM by tcomtcom

    Lost in the Charset (utf-8) configuration

    tcomtcom

      I'm trying to create a portal that is completely UTF-8.
      My application lives in a JBossPortlet that prepares all the EJB3 data and then dispatches the request to a JSP.

      To configure the UTF-8 charset:
      1) I save the JSP file in UTF-8 format.
      2) I call response.setContentType("text/html;charset=utf-8") at the beginning of the doView method in the portlet.

      The portlet looks like this:

      public void doView (JBossRenderRequest request, JBossRenderResponse response) throws PortletException, IOException
      {
       // Declare the response as UTF-8 before any output is written
       response.setContentType("text/html;charset=utf-8");
       request.setAttribute("eacute", "é");
       request.setAttribute("eacute2", "\uc3a9");
       // The path must be a quoted String; the portlet API expects it to start with "/"
       PortletRequestDispatcher rd = getPortletContext().getRequestDispatcher("/page.jsp");
       rd.include(request, response);
      }

      The JSP, page.jsp, which is saved in UTF-8 format:
      <%@ page language="java" %>
      <%@ taglib uri="/WEB-INF/tld/c-rt.tld" prefix="c" %>
      <p>
       é <br>
       ${eacute}
       ${eacute2}
      </p>

      The result is that the browser (IE) detects the right encoding and displays the page as UTF-8.
      The first é prints correctly (because the JSP file itself is correctly encoded).
      The second é prints wrong: I get a Chinese character. Looking at the hex dump of the output, I see that the é has the code E9. E9 is the ISO-8859-1 (Latin-1) code for é, so I think the é coming from the portlet has not been converted to UTF-8.
      The third é is a '?', and I don't know why...
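The byte values described above can be reproduced outside the portal with a small standalone sketch. It assumes a JDK with `java.nio.charset.StandardCharsets` (Java 7 or later); the class name `EacuteBytes` and the helper `hex` are made up for illustration. It also suggests one possible reason for the '?': the escape `"\uc3a9"` denotes the single code point U+C3A9 (a Hangul syllable), not the two UTF-8 bytes C3 A9 of é, and such a character cannot be represented in ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Standalone sketch, not portlet code: prints the bytes that each charset
// produces for the two strings set as request attributes in doView.
class EacuteBytes {
    // Formats a byte array as space-separated uppercase hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eacute = "\u00e9";  // U+00E9, the character é
        String eacute2 = "\uc3a9"; // U+C3A9, one Hangul syllable, not é

        System.out.println(hex(eacute.getBytes(StandardCharsets.ISO_8859_1)));  // E9
        System.out.println(hex(eacute.getBytes(StandardCharsets.UTF_8)));       // C3 A9
        // Unmappable in ISO-8859-1, so the encoder substitutes '?' (0x3F).
        System.out.println(hex(eacute2.getBytes(StandardCharsets.ISO_8859_1))); // 3F
        System.out.println(hex(eacute2.getBytes(StandardCharsets.UTF_8)));      // EC 8E A9
    }
}
```

If the attribute is meant to hold é, the escape would be `"\u00e9"`; whether that alone fixes the page still depends on which charset the response writer actually uses.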

      Looking through the documentation, I found that for servlets the setContentType method causes all output to be encoded in the specified charset. Is it the same for portlets? Should I configure something else?

      I found that for servlets it's possible to do this:
      PrintWriter out = new PrintWriter(new OutputStreamWriter(response.getOutputStream(), "UTF8"), true);
      Can I do something similar with portlets?

      To sum up, I don't know how to correctly output Strings that come from the Java code. If someone has already run into this problem, any help would be great. Thanks in advance.

      David

      Note: Sorry for my bad English :(

        • 1. Re: Lost in the Charset (utf-8) configuration
          tcomtcom

          Hi all,

          I'm still investigating my problem... The problem appears when I use

          response.getWriter().write("é").
          In Java every String is Unicode, so the problem arises when the writer encodes the text. Instead of writing the two bytes of the UTF-8 character "é", this method outputs only one byte.
          This is probably a configuration problem. If someone knows where to configure the output charset that the writer uses, please let me know.
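The one-byte versus two-byte behavior can be illustrated with a plain `OutputStreamWriter`, which is a reasonable stand-in for a response writer only under the assumption that the container's writer is bound to a single charset in the same way. The class `WriterCharsetDemo` and the method `writeWith` are hypothetical names for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Standalone sketch: the charset a Writer is bound to decides how many
// bytes a given character becomes on the wire.
class WriterCharsetDemo {
    // Writes text through a Writer bound to the given charset and
    // returns the raw bytes that reached the underlying stream.
    static byte[] writeWith(Charset charset, String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String eacute = "\u00e9"; // the character é
        System.out.println(writeWith(StandardCharsets.ISO_8859_1, eacute).length); // 1
        System.out.println(writeWith(StandardCharsets.UTF_8, eacute).length);      // 2
    }
}
```

A single byte on output is therefore consistent with a writer that was created for ISO-8859-1 rather than UTF-8.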

          Have a good day

          • 2. Re: Lost in the Charset (utf-8) configuration

            Your web pages need a proper doctype:

            <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
             "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">


            and a proper encoding declaration in the head:

            <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />

            IE is not going to understand entities the way you hope unless you are using standards; otherwise it's all hit and miss.

            Here is an enormously useful presentation on the whole topic:
            http://www.w3.org/International/tutorials/tutorial-char-enc/


            • 3. Re: Lost in the Charset (utf-8) configuration
              tcomtcom

              First, thanks for your answer, but it doesn't help in my case...

              I already use the doctype, but it's located in the layout rather than in the portlet. My layout starts like this:

              <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
              <%@ taglib uri="/WEB-INF/theme/portal-layout.tld" prefix="p" %>
              <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
              <head>
               <meta http-equiv="content-type" content="text/html; charset=utf-8">
               ...
              


              I think my problem comes from the Java output, which is not being written in UTF-8. Maybe this message belongs in another JBoss forum, but I don't know which one. If someone knows where...



              • 4. Re: Lost in the Charset (utf-8) configuration
                tcomtcom

                Hello all,

                I'm still trying to find out where my problem is. The latest thing I did was to analyze my HTTP headers. They look like this:


                HTTP/1.1 200 OK
                Server: Apache-Coyote/1.1
                X-Powered-By: Servlet 2.4; JBoss-4.0.3RC1 (build:CVSTag=JBoss_4_0_3_RC1 date=200506260723)/Tomcat-5.5
                Set-Cookie: JSESSIONID=E91F699FB51CFD9E9AFF56DAA8172F80; Path=/
                Content-Type: text/html;charset=ISO-8859-1
                Date: Wed, 28 Sep 2005 06:37:31 GMT
                Connection: close

                I don't know why the charset is still set to ISO-8859-1; I have changed every charset parameter to UTF-8... So where do I change the charset parameter of the HTTP headers?
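The header above declares ISO-8859-1 while the layout's meta tag declares UTF-8, and browsers generally give the HTTP header priority over the meta tag. When checking captured responses, a small helper can pull the charset out of a Content-Type value so it can be compared with what the page declares. This is only a sketch: `charsetOf` is a hypothetical helper, not part of the servlet or portlet API, and it ignores edge cases of the header grammar.

```java
import java.util.Locale;

// Standalone sketch: extracts the charset parameter from a Content-Type
// header value such as "text/html;charset=ISO-8859-1".
class ContentTypeCharset {
    // Returns the charset parameter as written in the header,
    // or null if the header carries no charset parameter.
    static String charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Drop the parameter name and any surrounding quotes.
                return param.substring("charset=".length()).trim().replace("\"", "");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=ISO-8859-1")); // ISO-8859-1
        System.out.println(charsetOf("text/html; charset=utf-8"));     // utf-8
    }
}
```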