Unicode peculiarity with JBoss/Tomcat

jimotte Dec 16, 2002 11:32 AM

Here is the situation. I was perplexed by what happens in a certain JSP page when the user inputs the bullet character (Unicode value \u2022, i.e. #8226; with an ANSI / ISO_8859-1 value of #149). I wanted to get the Unicode value from the user input, so I tried the following:
1. I put <%@ page contentType = "text/html; charset=UTF-8"%> and <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=UTF-8"> in the JSP. I thought that by calling String s = new String(parameterString, "UTF8"); in the bean that receives the parameters, I would get the Unicode value #8226 when I converted the character using Integer.toString( charVal );. However, it didn't. It always came up as #149, as if it was ignoring the Unicode headers. I am running JBoss 2.4.6 with Tomcat 4.0.3, standalone. Since this didn't work, I tried this:

2. I set the same JSP headers as before, but with US-ASCII:
<%@ page contentType = "text/html; charset=US-ASCII"%>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=US-ASCII">
In my bean I just used the string as it came in from the parameter and called Integer.toString( charVal );. Using my standalone version of Tomcat 4 with JBuilder, out popped #8226, the value I wanted. I thought I had it solved! But once it ran in the JBoss container I kept getting #149. What was going on? So I did some more digging and tried this:

3. In my JSP page I did not place any encoding tags at all, but in my bean I placed:
String toConvert = new String(toConvert2.getBytes("8859_1"));
I then got the charVal off of that String and voila, out popped #8226, the Unicode value I wanted.
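
For reference, this is roughly how I was inspecting the values in all three attempts. It is only a minimal sketch: the class and method names (CharValues, charValues) are made up for this post, and the two hard-coded strings stand in for whatever the request parameter actually contained.

```java
class CharValues {
    // Hypothetical helper: renders each char of s as its decimal value,
    // in the "#NNN" notation used in this post.
    static String charValues(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            char charVal = s.charAt(i);
            // A char widens to int, so this yields the numeric value of the char.
            out.append('#').append(Integer.toString(charVal));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: these literals stand in for the received parameter.
        System.out.println(charValues("\u0095")); // prints #149
        System.out.println(charValues("\u2022")); // prints #8226
    }
}
```

So a result of #149 means the String held the single char U+0095, and #8226 means it held U+2022.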

My question, now that I have it working, is: why does it work this way?
Is this a bug in JBoss/Tomcat that ignores my headers (thus making attempt 2 not work)?
Why am I not able to say
String toConvert = new String(toConvert2.getBytes(), "8859_1");
To me that would be saying the original String is 8859_1 and I want Unicode. But this does not work.
What does work is
String toConvert = new String(toConvert2.getBytes("8859_1"));
This seems to me to be saying: get the original String's 8859_1 bytes, and the encoding is Unicode. It works, but it seems counterintuitive, since it really appears to be saying: get me the original String's 8859_1 bytes and use the Unicode character set (which they were not really in) to create the Unicode String.
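
Spelling the two statements out step by step may make the difference easier to see. getBytes(charset) encodes the String's chars into bytes, and new String(bytes, charset) decodes bytes back into chars; the no-argument versions use the platform default encoding. The sketch below is built on two assumptions that I have not confirmed: that the container hands the bean the char U+0095 (value #149), and that the platform default encoding on my machine is windows-1252. The class and method names are made up for illustration, and the default is passed in explicitly so the result does not depend on the machine it runs on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TwoForms {
    // The form that did not work: new String(s.getBytes(), "8859_1").
    // Encodes with the (assumed) default, then decodes as ISO-8859-1.
    static String encodeDefaultDecodeLatin1(String s, Charset assumedDefault) {
        byte[] raw = s.getBytes(assumedDefault);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // The form that worked: new String(s.getBytes("8859_1")).
    // Encodes as ISO-8859-1, then decodes with the (assumed) default.
    static String encodeLatin1DecodeDefault(String s, Charset assumedDefault) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, assumedDefault);
    }

    public static void main(String[] args) {
        String received = "\u0095"; // assumption: what the bean sees
        Charset assumedDefault = Charset.forName("windows-1252"); // assumption

        String worked = encodeLatin1DecodeDefault(received, assumedDefault);
        String failed = encodeDefaultDecodeLatin1(received, assumedDefault);

        System.out.println((int) worked.charAt(0)); // prints 8226
        // Decoding as ISO-8859-1 can only produce chars up to U+00FF,
        // so this form can never yield 8226.
        System.out.println((int) failed.charAt(0));
    }
}
```

If those assumptions hold, the working form is not converting "8859_1 to Unicode" at all. It recovers the raw byte 0x95 that the browser sent and then reinterprets it in the default encoding, where 0x95 happens to be the bullet. That would also mean the result depends on the default encoding of the machine running the container.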

Am I missing something? Does anyone else have any insight into what is going on here and why it works this way, just for my own comprehension?

Thanks
Jim