
Why does Antlr think there is a missing bracket?

Developer (https://www.devze.com), 2023-03-20 17:22, source: the web

I've created a grammar to parse a simple LDAP query syntax. The grammar is:

expression   :   LEFT_PAREN! ('&' | '||' | '!')^ (atom | expression)* RIGHT_PAREN! EOF ;

atom    :   LEFT_PAREN! left '='^ right RIGHT_PAREN! ;

left    :   ITEM;
right   :   ITEM;

ITEM        :   ALPHANUMERIC+; 
LEFT_PAREN  :   '(';
RIGHT_PAREN :   ')';

fragment ALPHANUMERIC
    :   ('a'..'z' | 'A'..'Z' | '0'..'9'); 

WHITESPACE : (' ' | '\t' | '\r' | '\n') { skip(); };

Now this grammar works fine for:

(!(attr=hello2))
(&(attr=hello2)(attr2=12))
(||(attr=hello2)(attr2=12))

However, when I try to run:

(||(attr=hello2)(!(attr2=12)))

It fails with: line 1:29 extraneous input ')' expecting EOF

If I remove the EOF from the expression rule, everything passes, but then a wrong number of brackets is no longer reported as a syntax error. (This is being parsed into a tree, hence the ^ and ! after tokens.) What have I missed?


As others have already mentioned, your expression rule has to end with an EOF, but a nested expression cannot end with an EOF, of course.

Remove the EOF from expression, and create a proper "entry point" for your parser that ends with the EOF.

file: T.g

grammar T;

options {
  output=AST;
}

parse
  :  expression EOF!
  ;

expression
  :  '('! ('&' | '||' | '!')^ (atom | expression)* ')'!
  ;

atom
  :  '('! ITEM '='^ ITEM ')'!
  ;

ITEM        
  :  ALPHANUMERIC+
  ;

fragment ALPHANUMERIC
  :  ('a'..'z' | 'A'..'Z' | '0'..'9')
  ;

WHITESPACE 
  :  (' ' | '\t' | '\r' | '\n') { skip(); }
  ;

file: Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = "(||(attr=hello2)(!(attr2=12)))";
    // TLexer and TParser are generated by ANTLR from T.g
    TLexer lexer = new TLexer(new ANTLRStringStream(source));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    // start at the entry rule 'parse', the only rule that requires EOF
    CommonTree tree = (CommonTree)parser.parse().getTree();
    // render the AST as Graphviz DOT source
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}

To run the demo, do:

*nix/MacOS:

java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

Windows:

java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main

which produces the DOT code representing the following AST:

[Image: the AST for the input above, with || at the root and two children, = (attr, hello2) and ! (whose single child is = (attr2, 12)); image created using graphviz-dev.appspot.com]
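The same fix can be illustrated without the ANTLR runtime. Below is a minimal hand-written recursive-descent recognizer that mirrors the parse, expression and atom rules of T.g. It is a sketch, not ANTLR-generated code: the class LdapFilterCheck and all its methods are hypothetical names, and it only accepts or rejects input rather than building an AST. The point it shows is that the EOF check lives only in the top-level parse method, so a recursive expression call may be followed by the enclosing ')'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written recognizer mirroring the parse/expression/atom rules of T.g.
// It only accepts or rejects input; it does not build an AST.
public class LdapFilterCheck {
    private static final String EOF = "<EOF>";
    private final List<String> tokens;
    private int pos = 0;

    private LdapFilterCheck(String input) { this.tokens = tokenize(input); }

    private static boolean isAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Produces "(", ")", "&", "||", "!", "=" and ITEM tokens; skips ' ', '\t', '\r', '\n'.
    private static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (" \t\r\n".indexOf(c) >= 0) {
                i++;
            } else if (s.startsWith("||", i)) {
                out.add("||");
                i += 2;
            } else if ("()&!=".indexOf(c) >= 0) {
                out.add(String.valueOf(c));
                i++;
            } else if (isAlnum(c)) {
                int start = i;
                while (i < s.length() && isAlnum(s.charAt(i))) i++;
                out.add(s.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return out;
    }

    // Token k positions ahead, or EOF past the end of the input.
    private String la(int k) { return pos + k < tokens.size() ? tokens.get(pos + k) : EOF; }

    private static boolean isItem(String t) { return !t.isEmpty() && isAlnum(t.charAt(0)); }

    private void match(String expected) {
        if (!la(0).equals(expected)) {
            throw new IllegalStateException("expecting " + expected + " but found " + la(0));
        }
        pos++;
    }

    // parse : expression EOF ;   (the only rule that demands EOF)
    private void parse() {
        expression();
        match(EOF);
    }

    // expression : '(' ('&' | '||' | '!') (atom | expression)* ')' ;   (no EOF here)
    private void expression() {
        match("(");
        String op = la(0);
        if (!op.equals("&") && !op.equals("||") && !op.equals("!")) {
            throw new IllegalStateException("expecting operator but found " + op);
        }
        pos++;
        // '(' followed by an ITEM starts an atom; '(' followed by an operator starts a nested expression.
        while (la(0).equals("(")) {
            if (isItem(la(1))) atom(); else expression();
        }
        match(")");
    }

    // atom : '(' ITEM '=' ITEM ')' ;
    private void atom() {
        match("(");
        matchItem();
        match("=");
        matchItem();
        match(")");
    }

    private void matchItem() {
        if (!isItem(la(0))) throw new IllegalStateException("expecting ITEM but found " + la(0));
        pos++;
    }

    public static boolean isWellFormed(String input) {
        try {
            new LdapFilterCheck(input).parse();
            return true;
        } catch (IllegalArgumentException | IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("(||(attr=hello2)(!(attr2=12)))"));  // true
        System.out.println(isWellFormed("(||(attr=hello2)(!(attr2=12))))")); // false: one ')' too many
        System.out.println(isWellFormed("(||(attr=hello2)(!(attr2=12))"));   // false: one ')' missing
    }
}
```

Both bracket errors are still caught: an extra ')' is left over when parse expects EOF, and a missing ')' makes the outermost expression fail at its closing match.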


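In your definition of expression, the parentheses can contain a nested expression, but every expression, including a nested one, has to end in EOF. In your sample input the nested expression (!(attr2=12)) does not end in EOF: it is followed by the outer closing ')' at column 29, which is exactly the token the error message reports.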
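To see where column 29 comes from, here is a hand-written sketch of the question's original rule, with the EOF check inside the recursive expression. It is hypothetical code (the class EofInsideExpression and its methods are illustrative names, not ANTLR output); it stops at the first error instead of recovering the way ANTLR does, and its error text only imitates ANTLR's message.

```java
// Hypothetical sketch of the question's original rule, with EOF inside the recursive expression.
// Offsets are 0-based character positions, like the column in ANTLR's "line 1:29" message.
public class EofInsideExpression {
    private final String s;
    private int i = 0;

    private EofInsideExpression(String s) { this.s = s; }

    private static boolean isAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    private void skipWs() {
        while (i < s.length() && " \t\r\n".indexOf(s.charAt(i)) >= 0) i++;
    }

    private boolean lookingAt(String t) {
        skipWs();
        return s.startsWith(t, i);
    }

    private void expect(String t) {
        if (!lookingAt(t)) throw new IllegalStateException("expecting '" + t + "' at " + i);
        i += t.length();
    }

    private void item() {
        skipWs();
        int start = i;
        while (i < s.length() && isAlnum(s.charAt(i))) i++;
        if (i == start) throw new IllegalStateException("expecting ITEM at " + i);
    }

    // atom : '(' ITEM '=' ITEM ')' ;
    private void atom() {
        expect("(");
        item();
        expect("=");
        item();
        expect(")");
    }

    // True if the next '(' opens an atom, i.e. is followed by an ITEM rather than an operator.
    private boolean atomAhead() {
        int save = i;
        expect("(");
        skipWs();
        boolean result = i < s.length() && isAlnum(s.charAt(i));
        i = save;
        return result;
    }

    // expression : '(' ('&' | '||' | '!') (atom | expression)* ')' EOF ;
    private void expression() {
        expect("(");
        if (lookingAt("||")) i += 2;
        else if (lookingAt("&") || lookingAt("!")) i += 1;
        else throw new IllegalStateException("expecting operator at " + i);
        while (lookingAt("(")) {
            if (atomAhead()) atom(); else expression();
        }
        expect(")");
        // The bug: this EOF check also runs when a nested expression returns.
        skipWs();
        if (i < s.length()) {
            throw new IllegalStateException("extraneous input '" + s.charAt(i) + "' at " + i + ", expecting EOF");
        }
    }

    // Returns "ok" or the first error message.
    public static String check(String input) {
        try {
            new EofInsideExpression(input).expression();
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("(&(attr=hello2)(attr2=12))"));     // ok
        System.out.println(check("(||(attr=hello2)(!(attr2=12)))")); // extraneous input ')' at 29, expecting EOF
    }
}
```

Inputs without a nested expression pass, because the only EOF check that runs is the outermost one. With nesting, the inner expression finishes at offset 28 and then demands EOF, but finds the outer ')' at offset 29.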
