4 Replies Latest reply on Jun 7, 2005 3:01 PM by chiba

    lexer won't tokenize identifiers with non-US-ASCII characters

    hengels

      I'm using JBoss AOP and am trying to aspectize a method with a German umlaut in its name (that was not my idea). The parser throws a CompileError("; is missing") in line 593 (version 3.0).

      Regards,

      Holger

        • 1. Re: lexer won't tokenize identifiers with non us-ascii chara
          hengels

          Could anyone please comment on this? Is this possibly a bug? Could you please check whether the grammar correctly tokenizes identifiers that contain non-US-ASCII characters? Such identifiers are valid from the Java language perspective.

          Thanks,

          Holger

          • 2. FIX
            hengels

            The following patches fix the problem:

            --- javassist-3.0/src/main/javassist/preproc/Compiler.java 2005-01-18 15:53:48.000000000 +0100
            +++ javassist-3.0_patched/src/main/javassist/preproc/Compiler.java 2005-05-19 12:07:20.038916592 +0200
            @@ -199,8 +199,7 @@
             throws IOException
             {
             int c = skipSpaces(reader, ' ');
            - while ('A' <= c && c <= 'Z' || 'a' <= c && c <= 'z'
            - || '0' <= c && c <= '9' || c == '.' || c == '_') {
            + while (Character.isJavaIdentifierPart((char)c)) {
             buf.append((char)c);
             c = reader.read();
             }


            --- javassist-3.0/src/main/javassist/compiler/Lex.java 2005-01-18 15:53:48.000000000 +0100
            +++ javassist-3.0_patched/src/main/javassist/compiler/Lex.java 2005-05-19 13:09:20.419332800 +0200
            @@ -133,8 +133,7 @@
             return readSeparator('.');
             }
             }
            - else if ('A' <= c && c <= 'Z' || 'a' <= c && c <= 'z' || c == '_'
            - || c == '$')
            + else if (Character.isJavaIdentifierStart((char)c))
             return readIdentifier(c, token);
             else
             return readSeparator(c);
            @@ -434,8 +433,7 @@
             do {
             tbuf.append((char)c);
             c = getc();
            - } while ('A' <= c && c <= 'Z' || 'a' <= c && c <= 'z' || c == '_'
            - || c == '$' || '0' <= c && c <= '9');
            + } while (Character.isJavaIdentifierPart((char)c));
            
             ungetc(c);
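
To illustrate what the patches change, here is a minimal, standalone sketch that compares the old ASCII range check with the `Character.isJavaIdentifierStart` / `Character.isJavaIdentifierPart` calls used above. The class `IdentifierCheck` and its helper methods are hypothetical names made up for this example; they are not part of Javassist. The ASCII check is modeled on the `Lex.java` hunks only.

```java
// Hypothetical demo class, not part of Javassist.
class IdentifierCheck {
    // ASCII-only start test, modeled on the condition removed from Lex.java.
    static boolean asciiStart(int c) {
        return 'A' <= c && c <= 'Z' || 'a' <= c && c <= 'z'
            || c == '_' || c == '$';
    }

    // ASCII-only part test: start characters plus digits.
    static boolean asciiPart(int c) {
        return asciiStart(c) || '0' <= c && c <= '9';
    }

    // True if the whole name passes the old ASCII range checks.
    static boolean isAsciiIdentifier(String name) {
        if (name.isEmpty() || !asciiStart(name.charAt(0)))
            return false;
        for (int i = 1; i < name.length(); i++)
            if (!asciiPart(name.charAt(i)))
                return false;
        return true;
    }

    // True if the whole name passes the java.lang.Character checks
    // that the patch switches to.
    static boolean isJavaIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0)))
            return false;
        for (int i = 1; i < name.length(); i++)
            if (!Character.isJavaIdentifierPart(name.charAt(i)))
                return false;
        return true;
    }

    public static void main(String[] args) {
        // "größe", written with Unicode escapes to avoid source-encoding issues.
        String name = "gr\u00f6\u00dfe";
        System.out.println(isAsciiIdentifier(name)); // false
        System.out.println(isJavaIdentifier(name));  // true
        System.out.println(isJavaIdentifier("1abc")); // false: a digit cannot start an identifier
    }
}
```

One thing worth checking when applying the patch: the old condition in the `Compiler.java` hunk also accepted `'.'`, which `Character.isJavaIdentifierPart` does not, while `'$'` is now accepted. Whether that matters depends on how the surrounding code uses the result.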
            


            • 3. Re: lexer won't tokenize identifiers with non us-ascii chara
              kabirkhan
              • 4. Re: lexer won't tokenize identifiers with non us-ascii chara
                chiba

                I applied your patch to both Javassist 3.0 and 3.1.