How do I write flex and bison files to parse this language?

开发者 https://www.devze.com 2023-04-12 04:09 Source: Internet

Let's define a language:

VAR := [0-9A-Za-z_]+
Exp := VAR 
   | VAR,'=',VAR 
   | '(', Exp, ')'
   | Exp, '&', Exp 
   | Exp ,'|', Exp       

For example, "( a = b ) & ( c | (d=e) )" is legal.

I've read the Yacc & Lex manuals, but I'm totally confused. I just want a parser that can handle this language.

Can you tell me how to write the flex and bison input files for this language?

Here is what I've done so far:

file a.l:

%{

#include <string.h>
#include "stdlib.h"
#include "stdio.h"
#include "y.tab.h"

%}

%%

("&"|"and"|"AND")   { return AND; }
("|"|"or"|"OR")   { return OR; }
("="|"eq"|"EQ")   { return EQ; }
([A-Za-z0-9_]+)   { return VAR;}
("(")   { return LB ;}
(")")   { return RB ;}
("\n")   { return LN ;}



%%

int main(void)
{
 yyparse();
 return 0;
}

int yywrap(void)
{
 return 0;
}

int yyerror(void)
{
  printf("Error\n");
  exit(1);
}

file a.y:

%{
#include <stdio.h>
%}

%token AND OR EQ VAR LB RB LN

%left AND OR
%left EQ

%%

line : 
       | exp LN{ printf("LN: %s",$1);}
;

exp:    VAR             { printf("var:%s",$1);}
    |  VAR EQ VAR      { printf("var=:%s %s %s",$1,$2,$3);}
    |  exp AND exp      { printf("and :%s %s %s",$1,$2,$3);}
    |  exp OR exp      { printf("or :%s %s %s",$1,$2,$3);}
    |  LB exp RB      { printf("abstract :%s %s %s",$1,$2,$3);}    

    ;

I have now edited the files as Chris Dodd suggested. It seems much better (at least the lexer works fine), but I get output like this:

disk_path>myprogram
a=b
var=:(null) (null) (null)LN: (null)ab=b
Error

So why does printf output (null)? And why, after I enter a second line, does it print Error and exit the program?


First write a lex file to tokenize input (and print out what it sees)

You want to introduce the terminals:

  • [0-9A-Za-z_]+ --> VAR
  • ( --> LPAREN and ) --> RPAREN
  • & --> AND
  • | --> OR
  • = --> EQUAL

and just print out a word for each. For your example

( a = b ) & ( c | (d=e) ) --> LPAREN VAR EQUAL VAR RPAREN AND LPAREN VAR OR LPAREN VAR EQUAL VAR RPAREN RPAREN

This is doable in pure lex. Once you have done this, update your question and we can talk about the next step.
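The tokenizing step can also be prototyped in plain C before any lex file exists. The sketch below is a hand-written stand-in for the lex rules, not something flex generates. The function name `tokenize` and its output-buffer interface are assumptions made for illustration, while the token names are the ones listed above.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: writes the token names for `input` into `out`,
 * separated by single spaces. Returns 0 on success, -1 on an unexpected
 * character or when `out` is too small. */
int tokenize(const char *input, char *out, size_t outsz)
{
    size_t used = 0;

    assert(input != NULL && out != NULL);
    if (outsz == 0)
        return -1;
    out[0] = '\0';

    for (const char *p = input; *p != '\0'; ) {
        const char *name;

        if (isspace((unsigned char)*p)) {   /* ignore blanks instead of failing */
            p++;
            continue;
        }
        if (isalnum((unsigned char)*p) || *p == '_') {
            /* [0-9A-Za-z_]+ : consume the longest possible run */
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            name = "VAR";
        } else {
            switch (*p) {
            case '(': name = "LPAREN"; break;
            case ')': name = "RPAREN"; break;
            case '&': name = "AND";    break;
            case '|': name = "OR";     break;
            case '=': name = "EQUAL";  break;
            default:  return -1;            /* character outside the language */
            }
            p++;
        }

        size_t len = strlen(name);
        size_t sep = used > 0 ? 1 : 0;      /* space before every name but the first */
        if (used + sep + len + 1 > outsz)
            return -1;
        if (sep)
            out[used++] = ' ';
        memcpy(out + used, name, len + 1);  /* copies the terminating NUL too */
        used += len;
    }
    return 0;
}
```

Calling `tokenize("( a = b ) & ( c | (d=e) ) ", buf, sizeof buf)` with a 256-byte `buf` should leave the same word sequence in `buf` as the example line above. Each branch corresponds to one lex rule, so moving this to a `.l` file is mostly a matter of turning each case into a pattern with a `return`.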


Your lex rule ("[0-9A-Za-z_]+") will match (only) the literal string [0-9A-Za-z_]+ -- get rid of the " characters to have it be a pattern to match any identifier or number.

Your yacc code does not match your lex code for punctuation -- the lex code returns AND for & while the yacc code is expecting an & -- so either change the lex code to return '&' or change the yacc code to use the token AND, and similarly for |, (, and ). You might also want to ignore spaces in the lex code (rather than treating them as errors). You also have no lex rule to match and return '\n', even though you use that in your yacc grammar.

Your yacc code is otherwise correct, but your grammar is ambiguous, which gives you shift/reduce conflicts -- an input like a&b|c can be parsed as either (a&b)|c or a&(b|c). You need to decide how that ambiguity should be resolved and reflect that in your grammar -- either by using more non-terminals, or by using yacc's built-in precedence support for resolving this kind of ambiguity. If you put the declarations:

%left '|'
%left '&'

at the top of your yacc file, that will resolve the ambiguity by making both & and | left associative and giving & higher precedence than |, which would be the normal interpretation.
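To see what "& binds tighter than |, both left associative" means for the parse, here is a sketch of the other option mentioned above: one non-terminal per precedence level, written as a small hand-coded recursive-descent parser in C rather than as yacc rules. The names `to_postfix` and `struct parser` are hypothetical, and printing the result in postfix form is only an illustrative way to make the grouping visible.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical parser state: input cursor plus an append-only output buffer. */
struct parser {
    const char *p;   /* next unread input character */
    char *out;       /* caller's buffer for the postfix text */
    size_t cap;      /* size of out in bytes */
    size_t len;      /* bytes used in out, excluding the NUL */
    int ok;          /* cleared on syntax error or overflow */
};

static void parse_or(struct parser *ps);

/* Skips blanks and returns the next character without consuming it. */
static char peek(struct parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
    return *ps->p;
}

/* Appends n bytes of s to the output, preceded by a space when needed. */
static void emit(struct parser *ps, const char *s, size_t n)
{
    if (ps->len + n + 2 > ps->cap) { ps->ok = 0; return; }
    if (ps->len > 0)
        ps->out[ps->len++] = ' ';
    memcpy(ps->out + ps->len, s, n);
    ps->len += n;
    ps->out[ps->len] = '\0';
}

/* Reads one VAR ([0-9A-Za-z_]+) and emits it; clears ok if none is present. */
static void parse_var(struct parser *ps)
{
    peek(ps);                                /* skip leading blanks */
    const char *start = ps->p;
    while (isalnum((unsigned char)*ps->p) || *ps->p == '_')
        ps->p++;
    if (ps->p == start)
        ps->ok = 0;
    else
        emit(ps, start, (size_t)(ps->p - start));
}

/* primary := VAR | VAR '=' VAR | '(' or_expr ')' */
static void parse_primary(struct parser *ps)
{
    if (peek(ps) == '(') {
        ps->p++;
        parse_or(ps);
        if (peek(ps) == ')')
            ps->p++;
        else
            ps->ok = 0;
        return;
    }
    parse_var(ps);
    if (ps->ok && peek(ps) == '=') {
        ps->p++;
        parse_var(ps);
        emit(ps, "=", 1);
    }
}

/* and_expr := primary ('&' primary)*   (the loop gives left associativity) */
static void parse_and(struct parser *ps)
{
    parse_primary(ps);
    while (ps->ok && peek(ps) == '&') {
        ps->p++;
        parse_primary(ps);
        emit(ps, "&", 1);
    }
}

/* or_expr := and_expr ('|' and_expr)*  (lowest precedence, so outermost) */
static void parse_or(struct parser *ps)
{
    parse_and(ps);
    while (ps->ok && peek(ps) == '|') {
        ps->p++;
        parse_and(ps);
        emit(ps, "|", 1);
    }
}

/* Returns 0 with the postfix form in out, or -1 (out may hold partial text). */
int to_postfix(const char *input, char *out, size_t outsz)
{
    assert(input != NULL && out != NULL && outsz > 0);
    struct parser ps = { .p = input, .out = out, .cap = outsz, .len = 0, .ok = 1 };
    out[0] = '\0';
    parse_or(&ps);
    return (ps.ok && peek(&ps) == '\0') ? 0 : -1;
}
```

With this layering, `a&b|c` comes out as `a b & c |`, that is (a&b)|c, and `a|b&c` comes out as `a b c & |`, that is a|(b&c). The two `%left` declarations are meant to produce the same grouping without the extra non-terminals.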

Edit

The problem you have now is that you never define YYSTYPE (either directly or with %union) in your .y file and you never set yylval in your .l file. The first problem means that $1 etc. are just ints, not pointers (so it makes no sense to try to print them with %s -- you should be getting a warning from your C compiler about that). The second problem means that they never have a value anyway, so each one is always the default 0 value of an uninitialized global variable, which your printf happens to display as (null).

The easiest fix would be to add

%union {
    char *name;
}
%token <name> VAR LB RB LN
%left <name> AND OR
%left <name> EQ
%type <name> exp

to the top of the yacc file. Then change the all the lex rules to be something like

([A-Za-z0-9_]+)   { yylval.name = strdup(yytext); return VAR;}

Finally, you also need to change the bison actions for exp to set $$, eg:

|  LB exp RB      { asprintf(&$$, "%s %s %s",$1,$2,$3);  printf("abstract: %s\n", $$); }

This will at least work, though it will leak lots of memory for the allocated strings.
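One way to reduce the leak is to free the three operand strings as soon as the combined string has been built. The helper below is only a sketch: `join3_take` is a hypothetical name, and it assumes every $n it receives is a heap string the action owns (for example one produced by strdup in the lexer). It uses malloc and snprintf instead of asprintf, since asprintf is a GNU/BSD extension that glibc only declares when _GNU_SOURCE is defined.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'ed "a b c" and frees the
 * three inputs, so ownership moves to the result. Returns NULL if the
 * allocation fails (the inputs are freed in that case too). */
char *join3_take(char *a, char *b, char *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    size_t n = strlen(a) + strlen(b) + strlen(c) + 3;  /* two spaces + NUL */
    char *s = malloc(n);
    if (s != NULL)
        snprintf(s, n, "%s %s %s", a, b, c);

    /* The pieces are no longer needed once they have been copied. */
    free(a);
    free(b);
    free(c);
    return s;
}
```

An action could then read `{ $$ = join3_take($1, $2, $3); printf("abstract: %s\n", $$); }`. The string left in the final $1 of the line rule still has to be freed there, after it has been printed.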

The last problem you have is that your line rule only matches a single line, so a second line of input causes an error. You need a recursive rule like:

line: /* empty */
    | line exp LN { printf....