1 Reply Latest reply on Feb 1, 2004 9:44 AM by julien1

    Unicode message issue in module bb


      "paxsonyang" wrote:
      First of all, I want to say that the port of module bb is really admirable work.

      I installed Nukes as an internal knowledge-sharing tool for our company in Taiwan. While using module bb, I found a problem with how Unicode strings are handled in the message body. The message is formatted by TagFilter and CodeFilter, and TagFilter replaces every "&" with "&amp;". That is fine for a standalone "&" symbol, but it breaks Unicode text: Chinese characters reach the filter as numeric character references such as "&#20013;", and the leading "&" of each reference is replaced with "&amp;" as well. As a result, Chinese text is not displayed correctly.

      I see the same problem in the Nukes installation on the JBoss website.


      Example: the Unicode string
      中文
      is displayed as the literal text
      &#20013;&#25991;
      instead.
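To illustrate why the escaping step mangles such text, here is a minimal Java sketch. `escapeNaively` is a hypothetical stand-in, not the actual TagFilter code, and it assumes the message reaches the filter as decimal numeric character references such as `&#20013;&#25991;` (the code points of 中文).

```java
final class NaiveEscapeDemo {
    // Hypothetical stand-in for the filter's escaping step:
    // every '&' is replaced, including the one that starts "&#20013;".
    static String escapeNaively(String input) {
        return input.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        // Assumed form of the submitted message: decimal references for U+4E2D and U+6587.
        String message = "&#20013;&#25991;";
        // prints: &amp;#20013;&amp;#25991;
        System.out.println(escapeNaively(message));
    }
}
```

A browser renders the escaped output as the literal characters `&#20013;&#25991;` rather than as the two Chinese characters, which matches the symptom described above.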

      I fixed this problem last night with the following steps.

      [1] Add a new token named UNICODE_TEXT in TagAnalyzer.jj.
      <DEFAULT> TOKEN :
      {
       <OPEN_TAG: "<" (["A"-"Z","a"-"z"])+
       (
       (" ")+ (["A"-"Z","a"-"z"])+
       (" ")* "="
       (
       (" ")*("\""|"'")(~["\"","'"])*("\""|"'") |
       (["0"-"9","A"-"Z","a"-"z","-","_"])+
       )
       )*
       (" ")*
       (["/"])? ">"> |
       <CLOSE_TAG : "</" (["A"-"Z","a"-"z"])+ ">">
      }
      
      + <DEFAULT> TOKEN :
      + {
      + <UNICODE_TEXT: "&#" (["0"-"9"]){5,5} ";">
      + }
      
      <DEFAULT> TOKEN :
      {
       <AMP: "&"> |
       <GT: ">"> |
       <LT: "<"> |
       <CR: "\n">
      }
      
      <DEFAULT> TOKEN :
      {
       <TEXT: ~[]>
      }
      


      [2] Add handling for the UNICODE_TEXT token in TagFilter.
       case TEXT:
       log.info("### TEXT");
      
       chars.append(t.image.charAt(0));
       break;
      + case UNICODE_TEXT:
      + log.info("### UNICODE_TEXT");
      +
      + chars.append(t.image);
      + break;
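
For readers who want to try the rule outside the JavaCC grammar, the same behaviour can be approximated in plain Java. This is only a sketch under stated assumptions: `ReferenceAwareEscape` and `escape` are hypothetical names, and a regular expression with a negative lookahead stands in for the UNICODE_TEXT token. It deliberately mirrors the five-digit restriction of the token, so shorter, longer, or hexadecimal references are still escaped.

```java
final class ReferenceAwareEscape {
    // Escape '&' unless it starts a five-digit decimal reference ("&#ddddd;"),
    // mirroring the UNICODE_TEXT token: "&#" (["0"-"9"]){5,5} ";".
    static String escape(String input) {
        return input.replaceAll("&(?!#[0-9]{5};)", "&amp;");
    }

    public static void main(String[] args) {
        // prints: Tom &amp; Jerry &#20013;&#25991;
        System.out.println(escape("Tom & Jerry &#20013;&#25991;"));
        // A three-digit reference does not match the pattern and is escaped.
        // prints: &amp;#123;
        System.out.println(escape("&#123;"));
    }
}
```

If references of other lengths need to pass through as well, both the token and this pattern would have to accept a variable number of digits; whether that is safe for the rest of the filter is not something this sketch checks.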
      


      I am not sure whether this fixes the problem completely, but I can now display Chinese text without problems. I am posting it as a reference for anyone who needs Unicode string support.