I'm trying to build a compiler in C for a C like language. I have right now written a lexer which would read the input file and outputs a stream of tokens. Now, I understand the theory behind grammars and I know how to write parse trees by hand for different expressions. But I'm kind of lost relating it to the actual code! How should I start the proces开发者_运维百科s of parsing once I have got the tokens? To start with, what should my parser program do when it receives the first token? I know somehow it has to get arranged as a tree. But how do I start? Any help will be great, I'm just a beginner.
Thanks a lot!
the next token after it processes the current token. Now, I'm planning to start off by assuming that my language consists entirely of assignment statements.
So, the BNF form is something like:< assignment_statement > ::= < destination > := < factor >
< destination > ::= < identifier >
< identifier > ::= [a-zA-Z][a-zA-Z0-9]*
<factor > ::= < name >|< number >|< string >
< name > ::= < identifier >
< number > ::= [0-9]+
< string > ::="[a-zA-Z0-9]*"
Now, when I see a statement like : var := 9
, the lexer will return the first token as "var" , what will the parent rule be and the sub-rules be?
Also, how do I go about building the parse tree for this statement?
Thanks a lot!
A common pattern is to have the parser ask for tokens. The parser loop is usually of the form "read next token, determine what to do with it, update partially parsed tree, repeat" and it's easier to achieve if the parser calls the lexer itself (instead of a third piece of code reading from the lexer and feeding tokens to the parser).
Of course, the heart of the parser depends on the algorithm you're using (recursive descent, LR ...) but it usually consists in determining if the new token fits within the rule you are currently parsing (like finding a -
when reading EXPR = '-' EXPR | ...
), fits with a sub-rule (like finding a class
when reading DEF = CLASS | ...
such that CLASS = 'class' ...
) or does not fit at all (at which point you should terminate the current rule, constructing the corresponding AST node, and repeat the process on the parent rule).
Recursive descent parsers do this with sub-calls (for sub-rules) and returning values (for going back to the parent rule) while RL parsers tend to compact several rules and sub-rules into one and either shift to stay in the current set of rules or reduce to terminate the rules and build one or more AST nodes.
精彩评论