6 Replies Latest reply on May 22, 2004 12:58 PM by julien1

can nukes support chinese now

hawking May 19, 2004 5:30 PM

Yesterday I downloaded a version of Nukes from CVS, compiled it, and deployed it on JBoss, but I found that it still can't support Chinese (that is, Chinese can't be used in the forum).

1. Re: can nukes support chinese now

hawking May 19, 2004 5:31 PM (in response to hawking)

Here is the Chinese word for "Chinese": 中文
2. Re: can nukes support chinese now

hawking May 19, 2004 5:34 PM (in response to hawking)

Or is there anywhere I can configure it to support Chinese?
3. Re: can nukes support chinese now

julien1 May 21, 2004 4:11 AM (in response to hawking)

I looked at the problem two days ago. It is mostly a charset issue.

To reproduce it and insert these characters in the HTML module, I had to switch my browser to a Chinese charset. Then I pasted the Chinese characters and submitted them. The thing is that they are stored as XML entities in the database.

When they are displayed, they are treated as text, so "&" becomes "&amp;" and this does not work. I have to look further into how to change that; it is not an obvious thing because it spans many layers.
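
The double-escaping described above can be reproduced with a small sketch. The class and method names are hypothetical stand-ins, not the actual Nukes code; the sketch only assumes a display-time filter that escapes every "&" it sees.

```java
final class NaiveEscapeDemo {
    // Hypothetical display-time filter: escapes every markup-significant
    // character, including the '&' that starts a stored character reference.
    static String escapeAsText(String stored) {
        StringBuilder out = new StringBuilder(stored.length());
        for (int i = 0; i < stored.length(); i++) {
            char c = stored.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two characters of "Chinese" as stored decimal references.
        String stored = "&#20013;&#25991;";
        // Prints &amp;#20013;&amp;#25991; so the browser shows the raw
        // reference text instead of the characters.
        System.out.println(escapeAsText(stored));
    }
}
```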
4. Re: can nukes support chinese now

hawking May 21, 2004 8:38 AM (in response to hawking)

Thank you so much.
I'll wait for your good news.
I searched the forum and found this.
Can this help?

"paxsonyang" wrote:
First of all, I want to say that the port of the bb module is really an admirable job.

I installed Nukes as the internal knowledge-sharing tool for our company in Taiwan. When using the bb module, I found an issue with Unicode string handling in the message part. The message is formatted via TagFilter and CodeFilter, and TagFilter replaces "&" with "&amp;". That's not a problem for a single "&" symbol. However, it is a problem for Unicode strings. When using Chinese, the "&" character that starts each Unicode character reference is replaced by "&amp;". That causes the display problem with Chinese strings.

I found the same problem with the Nukes instance on the JBoss website.

Ex. When displaying the Unicode string 中文, which is
&#20013;&#25991;
it will become
&amp;#20013;&amp;#25991;

I fixed this problem last night by the following steps.

[1] Add a new token named UNICODE_TEXT in TagAnalyzer.jj.
Code:

  TOKEN :
  {
    <OPEN_TAG: "<" (["A"-"Z","a"-"z"])+
      (
        (" ")+ (["A"-"Z","a"-"z"])+
        (" ")* "="
        (
          (" ")*("\""|"'")(~["\"","'"])*("\""|"'") |
          (["0"-"9","A"-"Z","a"-"z","-","_"])+
        )
      )*
      (" ")*
      (["/"])? ">"> |
    <CLOSE_TAG : "</" (["A"-"Z","a"-"z"])+ ">">
  }

+ TOKEN :
+ {
+   <UNICODE_TEXT: "&#" (["0"-"9"]){5,5} ";">
+ }

  TOKEN :
  {
    <AMP: "&"> |
    <GT: ">"> |
    <LT: "<"> |
    <CR: "\n">
  }

  TOKEN :
  {
    <TEXT: ~[]>
  }

[2] Add the UNICODE_TEXT processing in TagFilter.
Code:

  case TEXT:
    log.info("### TEXT");

    chars.append(t.image.charAt(0));
    break;
+ case UNICODE_TEXT:
+   log.info("### UNICODE_TEXT");
+
+   chars.append(t.image);
+   break;

I have no idea whether this fixes the problem perfectly. However, I can now display Chinese strings without problems. This is just offered as a reference for supporting Unicode strings.
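
For readers without the JavaCC sources at hand, the effect of the UNICODE_TEXT token can be approximated in plain Java. This is only a sketch under assumptions: the class name is hypothetical, and a regular expression stands in for the generated tokenizer. It copies the same pattern as the token ("&#", exactly five digits, ";") and tries it before the single-character "&" rule.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityAwareEscapeDemo {
    // Same shape as the UNICODE_TEXT token: "&#", exactly five digits, ";".
    private static final Pattern UNICODE_TEXT = Pattern.compile("&#[0-9]{5};");

    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        Matcher m = UNICODE_TEXT.matcher(input);
        int i = 0;
        while (i < input.length()) {
            // Try the longer reference pattern first, as the tokenizer
            // prefers the longest match at the current position.
            if (m.region(i, input.length()).lookingAt()) {
                out.append(m.group()); // copy the reference through unchanged
                i = m.end();
                continue;
            }
            char c = input.charAt(i++);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The lone "&" is escaped; the two character references are kept.
        System.out.println(escape("a & b &#20013;&#25991;"));
    }
}
```

Like the token it imitates, this only recognizes five-digit decimal references, so shorter references such as "&#123;" and hexadecimal ones are still escaped.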
5. Re: can nukes support chinese now

julien1 May 21, 2004 8:57 AM (in response to hawking)

Yes, I understand why it fixes the issue.

But I am wondering whether that is the normal way to do it. I mean, when I select Chinese as the charset in my browser and submit the data, that forces the browser to send the characters as Unicode entities. So another way to handle it would be to decode such requests and translate the entities back into UTF-16 (as in java.lang.String).

On the other hand, the parser stack should provide support for Unicode, and the above code seems to do that in a good way, so I would first integrate that code and see how it goes.
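
The alternative mentioned above, decoding the references back into a java.lang.String, could look roughly like the following. It is a minimal sketch with a hypothetical class name, it handles decimal references only, and where in the request handling it would be called is left open.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericReferenceDecoder {
    // Decimal character references such as &#20013; (one to seven digits).
    private static final Pattern DECIMAL_REF = Pattern.compile("&#([0-9]{1,7});");

    static String decode(String input) {
        Matcher m = DECIMAL_REF.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous reference and this one.
            out.append(input, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                // Appends one or two UTF-16 code units as needed.
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group()); // out of range: leave the text as is
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the two decoded characters, given a UTF-8 capable console.
        System.out.println(decode("&#20013;&#25991;"));
    }
}
```

Once the text is decoded this way, the filter's ordinary "&" escaping no longer touches it, at the cost of requiring the database and the response encoding to handle the characters directly.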
6. Re: can nukes support chinese now

julien1 May 22, 2004 12:58 PM (in response to hawking)

I have modified the parser stack to take that behaviour into account. The above code is legacy, so I had to adapt it to our current parsing scheme.

Of course feedback is appreciated.