
Collapsing some AST subtrees in Antlr using rewrite rules

Developer https://www.devze.com 2023-02-10 03:05 Source: Internet

I have the following grammar (I'm showing only the important sections):

packageDeclaration
    :   PACKAGE qualifiedIdentifier SEMI -> ^(PACKAGE qualifiedIdentifier+) 
    ;

qualifiedIdentifier
    :   (   IDENT               ->  IDENT
        )
        (   DOT ident=IDENT     ->  ^(DOT $qualifiedIdentifier $ident)
        )*
    ;

And let's say I have a package declaration of "package a.b.c" (it's a Java parser, by the way).

What I get now is something of the form (package (. (. a b) c)).

What I want is to inline the package name so that I get (package a.b.c).
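To see where the nesting comes from, here is a small Python model (not ANTLR itself, and not the ANTLR runtime API) of what the qualifiedIdentifier rewrite does: each pass through the (DOT IDENT)* loop makes a new DOT root whose left child is the tree built so far ($qualifiedIdentifier) and whose right child is the new IDENT. The Node class and the function names are made up for illustration, and tokens are plain strings with "." standing for DOT.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: token text plus children (not an ANTLR class)."""

    text: str
    children: list[Node] = field(default_factory=list)

    def to_lisp(self) -> str:
        # LISP-like notation as in the question: "(root child ...)".
        if not self.children:
            return self.text
        return "(" + " ".join([self.text] + [c.to_lisp() for c in self.children]) + ")"


def qualified_identifier(tokens: list[str]) -> Node:
    """Mimic the qualifiedIdentifier rewrite rules on e.g. a . b . c."""
    # ( IDENT -> IDENT ): the rule's tree starts as the first identifier.
    tree = Node(tokens[0])
    # ( DOT ident=IDENT -> ^(DOT $qualifiedIdentifier $ident) )*:
    # DOT becomes the new root, the tree so far is its left child and the
    # freshly matched IDENT is its right child.
    for i in range(1, len(tokens), 2):
        assert tokens[i] == ".", "expected DOT between identifiers"
        tree = Node(".", [tree, Node(tokens[i + 1])])
    return tree


def package_declaration(tokens: list[str]) -> Node:
    # ^(PACKAGE qualifiedIdentifier): PACKAGE root with one subtree child.
    return Node("package", [qualified_identifier(tokens)])


print(package_declaration(["a", ".", "b", ".", "c"]).to_lisp())
# (package (. (. a b) c))
```

Because the loop always wraps the previous result, the nesting grows to the left: a longer name such as a.b.c.d gives one more DOT level.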

I would prefer not to change the rewriting for qualifiedIdentifier, but only for packageDeclaration.

How can I do this?

I understand what you mean by the alias and + now. But I'm still unclear about the difference between qualifiedIdentifier and $qualifiedIdentifier in the rewrite rule.

As for my second question, what I mean is that I removed the rewrite rule for qualifiedIdentifier, and for the package I have the following:

packageDeclaration
    :   PACKAGE ident=qualifiedIdentifier SEMI -> ^(PACKAGE $ident+) 
    ;

What I get as a result of this is nested tokens, as in:

 (package
    (a) [end:a]
    (.) [end:.]
    (b) [end:b]
    (.) [end:.]
    (c) [end:c]
  ) [end:package]

Each token is represented as "(<token's text property>) [end: <token's text property>]"

I hope it's clear from the output above: I have one parent token (package) with 5 children. They are in the correct order and all that. What I would like is the same parent with only one child, as in:

 (package
    (a.b.c) [end:a.b.c]
  ) [end:package]


If you want to remove the DOTs from the tree and keep only the tokens a, b and c from package a.b.c; (with PACKAGE as the root, of course), try:

qualifiedIdentifier
  :  IDENT (DOT IDENT)* -> IDENT+
  ;

or, to keep the DOTs, simply remove the rewrite rule:

qualifiedIdentifier
  :  IDENT (DOT IDENT)*
  ;
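A rough Python model of the two options above (the helper names are hypothetical; tokens are plain strings, with "." standing for DOT). It assumes output=AST, where a rule without a rewrite returns its tokens as a flat list that is spliced in as children of PACKAGE. That matches the five-children tree shown in the question.

```python
from __future__ import annotations

from typing import Callable


def without_rewrite(tokens: list[str]) -> list[str]:
    """qualifiedIdentifier : IDENT (DOT IDENT)* ;   (no rewrite)

    Every matched token becomes a sibling in a flat list, DOTs included.
    """
    return list(tokens)


def idents_only(tokens: list[str]) -> list[str]:
    """qualifiedIdentifier : IDENT (DOT IDENT)* -> IDENT+ ;

    IDENT+ emits every IDENT the rule matched, in order. DOTs disappear
    because the rewrite never mentions them.
    """
    idents = [t for t in tokens if t != "."]
    if not idents:
        # IDENT+ needs at least one element (the grammar always matches one).
        raise ValueError("IDENT+ requires at least one IDENT")
    return idents


def package_tree(rule: Callable[[list[str]], list[str]], tokens: list[str]) -> str:
    # ^(PACKAGE qualifiedIdentifier): the flat list becomes PACKAGE's children.
    return "(package " + " ".join(rule(tokens)) + ")"


tokens = ["a", ".", "b", ".", "c"]
print(package_tree(without_rewrite, tokens))  # (package a . b . c)
print(package_tree(idents_only, tokens))      # (package a b c)
```

Either way PACKAGE ends up with one child per kept token, never a single a.b.c child.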

By the way, your packageDeclaration has an error in it:

packageDeclaration
  :  PACKAGE qualifiedIdentifier SEMI -> ^(PACKAGE qualifiedIdentifier+)
  ;

qualifiedIdentifier+ should be qualifiedIdentifier instead.

Coder wrote:

I would prefer not to change the rewriting for the qualifiedIdentifier, but only for the packageDeclaration.

That is not possible, unless you do not use qualifiedIdentifier inside packageDeclaration, in which case you could do something like:

packageDeclaration
  :  PACKAGE IDENT (DOT IDENT)* SEMI -> ^(PACKAGE IDENT+)
  ;

EDIT

Regarding your comments:

Coder wrote:

I'm a little confused about the meaning of + in the rewrite rule. The documentation at antlr.org/wiki/display/ANTLR3/Tree+construction says that you can break apart trees into sequences: "a : (^(ID INT))+ -> INT+ ID+ ; // break apart trees into sequences".

From the rule:

a 
  :  (^(ID INT))+ -> INT+ ID+
  ; 

(^(ID INT))+ means there are one or more IDs and one or more INTs, and only then can you use + in the rewrite rule (everything to the right of ->).

You on the other hand have:

packageDeclaration 
  :  PACKAGE qualifiedIdentifier SEMI
  ; 

There's only one qualifiedIdentifier in there, so you can only use that single qualifiedIdentifier in your rewrite rule (no +!).
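To make the documentation example concrete: in a : (^(ID INT))+ -> INT+ ID+ the ^(ID INT) subtree can match several times, so there are several IDs and several INTs to emit, and INT+ and ID+ each emit all of them, grouped by element. A minimal Python model, assuming each matched subtree is given as an (ID, INT) pair of strings; break_apart is a hypothetical name, not part of ANTLR.

```python
from __future__ import annotations


def break_apart(pairs: list[tuple[str, str]]) -> list[str]:
    """Model of the tree-grammar rule  a : (^(ID INT))+ -> INT+ ID+ ;

    Each matched ^(ID INT) contributes one ID and one INT. The rewrite
    emits all collected INTs first (INT+), then all collected IDs (ID+).
    """
    if not pairs:
        # (...)+ on the left-hand side: at least one ^(ID INT) must match.
        raise ValueError("(^(ID INT))+ needs at least one subtree")
    ids = [id_ for id_, _ in pairs]
    ints = [int_ for _, int_ in pairs]
    return ints + ids


# Input trees ^(x 1) ^(y 2) become the flat sequence 1 2 x y.
print(break_apart([("x", "1"), ("y", "2")]))  # ['1', '2', 'x', 'y']
```

The + on the right only makes sense here because the left-hand side can collect more than one ID and INT.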

Coder wrote:

I'm also confused about the difference between specifying an alias in the rewrite rule, as in "PACKAGE ident=qualifiedIdentifier -> $ident", vs. using qualifiedIdentifier vs. $qualifiedIdentifier.

An alias can be used if it's not apparent which rule should be placed where in the tree, or if you want more control over how the children are placed. For example, given the rule:

name
  :  ID '.' ID
  ;

and suppose you want the first ID to become the right child. Doing:

name
  :  ID '.' ID -> ^(NAME ID ID)
  ;

will place the first ID as the left child. To make it the right child, do:

name
  :  a=ID '.' b=ID -> ^(NAME $b $a)
  ;
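The same idea as a tiny Python model (the function names are hypothetical, and the result is written in the LISP-like tree notation used in the question): without labels the IDs come out in the order they were matched, while labels let you put $b and $a wherever you like.

```python
from __future__ import annotations


def name_default(tokens: list[str]) -> str:
    """name : ID '.' ID -> ^(NAME ID ID) ;

    Unlabeled ID references are emitted in match order, so the first ID
    ends up as the left child.
    """
    first, _dot, second = tokens
    return f"(NAME {first} {second})"


def name_swapped(tokens: list[str]) -> str:
    """name : a=ID '.' b=ID -> ^(NAME $b $a) ;

    The labels pick out individual tokens, so the first ID ($a) can be
    placed as the right child.
    """
    a, _dot, b = tokens
    return f"(NAME {b} {a})"


print(name_default(["x", ".", "y"]))  # (NAME x y)
print(name_swapped(["x", ".", "y"]))  # (NAME y x)
```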

EDIT II

Coder wrote:

What I get as a result of this is nested tokens, as in:

(SNIP)

Each token is represented as ...

Okay, I see what you mean. You must understand that the parser "feeds off" the lexer. The lexer chops up the input characters and creates tokens from them (IDENT is such a token, just as DOT is). These tokens are then passed to the parser, and the parser can't just create or merge tokens. So the answer is: what you want is not easily done, and it is certainly not possible in "ANTLR syntax" inside your grammar.
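If you really need a single a.b.c child, one workaround outside the grammar (a different technique from the rewrite rules discussed here) is to post-process the tree after parsing: take the PACKAGE node and replace its children with one node whose text is their texts joined together. A minimal Python sketch, assuming a hypothetical Node class rather than the real ANTLR runtime tree classes; collapse_children and its sep parameter are likewise made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an AST node (not an ANTLR runtime class)."""

    text: str
    children: list[Node] = field(default_factory=list)

    def to_lisp(self) -> str:
        # LISP-like notation as in the question: "(root child ...)".
        if not self.children:
            return self.text
        return "(" + " ".join([self.text] + [c.to_lisp() for c in self.children]) + ")"


def collapse_children(node: Node, sep: str = "") -> Node:
    """Return a copy of `node` whose children are merged into one leaf.

    The leaf's text is the children's texts joined with `sep`. This runs on
    the finished tree after parsing, so no token is merged in the grammar.
    """
    joined = sep.join(child.text for child in node.children)
    return Node(node.text, [Node(joined)])


# Tree produced when qualifiedIdentifier has no rewrite: (package a . b . c)
flat = Node("package", [Node(t) for t in ["a", ".", "b", ".", "c"]])
print(flat.to_lisp())                     # (package a . b . c)
print(collapse_children(flat).to_lisp())  # (package a.b.c)
```

With the -> IDENT+ rewrite instead (so the tree is (package a b c)), calling collapse_children(tree, sep=".") gives the same (package a.b.c).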

