3 Replies Latest reply on Jan 18, 2007 1:50 AM by Frank LaRosa

    Unicode character issue - happens only on Linux

    Frank LaRosa Newbie

      Hi,

      I have a customer who regularly copies text from Word documents and pastes it into forms I created for him on his web site. The text often contains non-ASCII characters such as U+2019 for single quotes or U+201C for double quotes. We were having some problems storing these characters in our database, so I added a filter that replaces them with the standard ASCII quotes.

      I tested my work by deploying to a local copy of JBoss on my workstation, which is a Windows XP computer, and it worked fine. I did the conversion using the String.replace method, for example:

      s = s.replace('\u201C', '"');
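For illustration, a minimal sketch of such a filter might look like the following. The class name QuoteFilter and the method normalizeQuotes are hypothetical, and handling U+2018 and U+201D alongside the two characters mentioned above is an assumption, not something taken from the actual application.

```java
public class QuoteFilter {

    // Hypothetical helper: replaces the common Word "smart quotes"
    // with their plain ASCII equivalents using String.replace(char, char).
    static String normalizeQuotes(String s) {
        return s.replace('\u2018', '\'')   // left single quotation mark
                .replace('\u2019', '\'')   // right single quotation mark
                .replace('\u201C', '"')    // left double quotation mark
                .replace('\u201D', '"');   // right double quotation mark
    }

    public static void main(String[] args) {
        String pasted = "\u201CIt\u2019s fine\u201D";
        System.out.println(normalizeQuotes(pasted));
    }
}
```

A filter like this only helps if the string still contains the original characters when it reaches the replace calls.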

      However, when I deployed this to my production environment, which has the same version of Java and the same version of JBoss but runs Linux, it failed. To see what was going on, I tried logging all the characters of the input string using s.codePointAt(). It turns out that instead of getting characters 201C and 2019, I'm getting character FFFD (the Unicode replacement character) in both cases.
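The kind of logging described here could be sketched roughly as follows. The class name CodePointDump and the method dump are hypothetical, and the U+XXXX output format is an assumption chosen for readability.

```java
public class CodePointDump {

    // Hypothetical helper: returns every code point of s in U+XXXX form,
    // separated by single spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by 1 or 2 chars depending on whether cp is supplementary.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the code points of a string containing two smart quotes.
        System.out.println(dump("\u201CHi\u2019"));
    }
}
```

Running the same dump on both machines against the same form input makes it easy to compare what each server actually receives.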

      Does anyone understand why this is happening? I have been working with Java for almost 7 years, and I have never encountered an inconsistency between its behavior on Linux and Windows before.

      Thanks,
      Frank