4 Replies Latest reply on Jun 7, 2005 3:01 PM by chiba

lexer won't tokenize identifiers with non us-ascii character

hengels May 18, 2005 11:10 AM

I'm using JBoss AOP and am trying to aspectize a method with a German umlaut in its name (that was not my idea). The parser throws a CompileError("; is missing") in line 593 (version 3.0).

Regards,

Holger

1. Re: lexer won't tokenize identifiers with non us-ascii chara

hengels May 19, 2005 2:54 AM (in response to hengels)

Could anyone please comment on this? Is this possibly a bug? Could you please check whether the grammar correctly tokenizes identifiers that contain non-US-ASCII characters and are valid from the Java language perspective?

Thanks,

Holger

2. FIX

hengels May 19, 2005 8:17 AM (in response to hengels)

The following patches fix the problem:

--- javassist-3.0/src/main/javassist/preproc/Compiler.java 2005-01-18 15:53:48.000000000 +0100
+++ javassist-3.0_patched/src/main/javassist/preproc/Compiler.java 2005-05-19 12:07:20.038916592 +0200
@@ -199,8 +199,7 @@
 throws IOException
 {
 int c = skipSpaces(reader, ' ');
- while ('A' <= c && c <= 'Z' || 'a' <= c && c <= 'z'
- || '0' <= c && c <= '9' || c == '.' || c == '_') {
+ while (Character.isJavaIdentifierPart((char)c)) {
 buf.append((char)c);
 c = reader.read();
 }

--- javassist-3.0/src/main/javassist/compiler/Lex.java 2005-01-18 15:53:48.000000000 +0100
+++ javassist-3.0_patched/src/main/javassist/compiler/Lex.java 2005-05-19 13:09:20.419332800 +0200
@@ -133,8 +133,7 @@
 return readSeparator('.');
 }
 }
- else if ('A' <= c && c <= 'Z' || 'a' <= c && c <= 'z' || c == '_'
- || c == '$')
+ else if (Character.isJavaIdentifierStart((char)c))
 return readIdentifier(c, token);
 else
 return readSeparator(c);
@@ -434,8 +433,7 @@
 do {
 tbuf.append((char)c);
 c = getc();
- } while ('A' <= c && c <= 'Z' || 'a' <= c && c <= 'z' || c == '_'
- || c == '$' || '0' <= c && c <= '9');
+ } while (Character.isJavaIdentifierPart((char)c));

 ungetc(c);
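
To illustrate what the patches above change, here is a minimal, self-contained sketch comparing the old ASCII-only range test with the `Character.isJavaIdentifierStart` / `Character.isJavaIdentifierPart` checks. The class `IdentifierCheck` and its helper methods are hypothetical names made up for this example and are not part of Javassist. The sketch only tests character classes, so it does not reject reserved words such as `class`.

```java
public final class IdentifierCheck {

    // Mirrors the ASCII-only condition that the patch removes from Lex.java
    // (hypothetical helper, not Javassist code).
    public static boolean isAsciiIdentifierChar(int c) {
        return 'A' <= c && c <= 'Z' || 'a' <= c && c <= 'z' || c == '_'
                || c == '$' || '0' <= c && c <= '9';
    }

    // Old behaviour: every character must be in the ASCII set,
    // and the first one must not be a digit.
    public static boolean isAsciiIdentifier(String s) {
        if (s.isEmpty() || ('0' <= s.charAt(0) && s.charAt(0) <= '9')) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isAsciiIdentifierChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Patched behaviour: delegate to the JDK's own notion of
    // identifier start and identifier part characters.
    public static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "größe", written with Unicode escapes so the source file
        // encoding does not matter.
        String name = "gr\u00f6\u00dfe";
        System.out.println("ASCII-only check: " + isAsciiIdentifier(name));
        System.out.println("JDK-based check:  " + isJavaIdentifier(name));
    }
}
```

With the ASCII-only test, the lexer stops reading the identifier at the first umlaut, which would explain why the parser then complains about a missing `;`. Note also that `.` is not an identifier part, so a loop based purely on `Character.isJavaIdentifierPart` stops at dots in qualified names.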

3. Re: lexer won't tokenize identifiers with non us-ascii chara

kabirkhan May 20, 2005 5:18 AM (in response to hengels)

Hi,

I've added this to JIRA http://jira.jboss.org/jira/browse/JASSIST-10
4. Re: lexer won't tokenize identifiers with non us-ascii chara

chiba Jun 7, 2005 3:01 PM (in response to hengels)

I applied your patch to both Javassist 3.0 and 3.1.