Recursive descent parser

Incomputer science,arecursive descent parseris a kind oftop-down parserbuilt from a set ofmutually recursiveprocedures (or a non-recursive equivalent) where each suchprocedureimplements one of thenonterminalsof thegrammar.Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.[1][2]

Apredictive parseris a recursive descent parser that does not requirebacktracking.[3]Predictive parsing is possible only for the class ofLL(k)grammars, which are thecontext-free grammarsfor which there exists some positive integerkthat allows a recursive descent parser to decide which production to use by examining only the nextktokens of input. The LL(k) grammars therefore exclude allambiguous grammars,as well as all grammars that containleft recursion.Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(k) grammar. A predictive parser runs inlinear time.

Recursive descent with backtracking is a technique that determines whichproductionto use by trying each production in turn. Recursive descent with backtracking is not limited to LL(k) grammars, but is not guaranteed to terminate unless the grammar is LL(k). Even when they terminate, parsers that use recursive descent with backtracking may requireexponential time.

Although predictive parsers are widely used, and are frequently chosen if writing a parser by hand, programmers often prefer to use a table-based parser produced by aparser generator,[citation needed]either for an LL(k) language or using an alternative parser, such asLALRorLR.This is particularly the case if a grammar is not inLL(k)form, as transforming the grammar to LL to make it suitable for predictive parsing is involved. Predictive parsers can also be automatically generated, using tools likeANTLR.

Predictive parsers can be depicted using transition diagrams for each non-terminal symbol where the edges between the initial and the final states are labelled by the symbols (terminals and non-terminals) of the right side of the production rule.[4]

Example parser

edit

The followingEBNF-likegrammar(forNiklaus Wirth'sPL/0programming language, fromAlgorithms + Data Structures = Programs) is inLL(1)form:

program=block".".

block=
["const"ident"="number{","ident"="number}";"]
["var"ident{","ident}";"]
{"procedure"ident";"block";"}statement.

statement=
ident":="expression
|"call"ident
|"begin"statement{";"statement}"end"
|"if"condition"then"statement
|"while"condition"do"statement.

condition=
"odd"expression
|expression("="|"#"|"<"|"<="|">"|">=")expression.

expression=["+"|"-"]term{("+"|"-")term}.

term=factor{("*"|"/")factor}.

factor=
ident
|number
|"("expression")".

Terminalsare expressed in quotes. Eachnonterminalis defined by a rule in the grammar, except foridentandnumber,which are assumed to be implicitly defined.

C implementation

edit

What follows is an implementation of a recursive descent parser for the above language inC.The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly.

Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner until the final nonterminal has been processed. The program fragment depends on a global variable,sym,which contains the current symbol from the input, and the functionnextsym,which updatessymwhen called.

The implementations of the functionsnextsymanderrorare omitted for simplicity.

typedefenum{ident,number,lparen,rparen,times,slash,plus,
minus,eql,neq,lss,leq,gtr,geq,callsym,beginsym,semicolon,
endsym,ifsym,whilesym,becomes,thensym,dosym,constsym,comma,
varsym,procsym,period,oddsym}Symbol;

Symbolsym;
voidnextsym(void);
voiderror(constcharmsg[]);

intaccept(Symbols){
if(sym==s){
nextsym();
return1;
}
return0;
}

intexpect(Symbols){
if(accept(s))
return1;
error("expect: unexpected symbol");
return0;
}

voidfactor(void){
if(accept(ident)){
;
}elseif(accept(number)){
;
}elseif(accept(lparen)){
expression();
expect(rparen);
}else{
error("factor: syntax error");
nextsym();
}
}

voidterm(void){
factor();
while(sym==times||sym==slash){
nextsym();
factor();
}
}

voidexpression(void){
if(sym==plus||sym==minus)
nextsym();
term();
while(sym==plus||sym==minus){
nextsym();
term();
}
}

voidcondition(void){
if(accept(oddsym)){
expression();
}else{
expression();
if(sym==eql||sym==neq||sym==lss||sym==leq||sym==gtr||sym==geq){
nextsym();
expression();
}else{
error("condition: invalid operator");
nextsym();
}
}
}

voidstatement(void){
if(accept(ident)){
expect(becomes);
expression();
}elseif(accept(callsym)){
expect(ident);
}elseif(accept(beginsym)){
do{
statement();
}while(accept(semicolon));
expect(endsym);
}elseif(accept(ifsym)){
condition();
expect(thensym);
statement();
}elseif(accept(whilesym)){
condition();
expect(dosym);
statement();
}else{
error("statement: syntax error");
nextsym();
}
}

voidblock(void){
if(accept(constsym)){
do{
expect(ident);
expect(eql);
expect(number);
}while(accept(comma));
expect(semicolon);
}
if(accept(varsym)){
do{
expect(ident);
}while(accept(comma));
expect(semicolon);
}
while(accept(procsym)){
expect(ident);
expect(semicolon);
block();
expect(semicolon);
}
statement();
}

voidprogram(void){
nextsym();
block();
expect(period);
}

Examples

edit

Some recursive descent parser generators:

See also

edit

References

edit
  1. ^This article is based on material taken fromRecursive+descent+parserat theFree On-line Dictionary of Computingprior to 1 November 2008 and incorporated under the "relicensing" terms of theGFDL,version 1.3 or later.
  2. ^Burge, W.H. (1975).Recursive Programming Techniques.Addison-Wesley Publishing Company.ISBN0-201-14450-6.
  3. ^Watson, Des (22 March 2017).A Practical Approach to Compiler Construction.Springer.ISBN978-3-319-52789-5.
  4. ^Aho, Alfred V.;Sethi, Ravi;Ullman, Jeffrey(1986).Compilers: Principles, Techniques and Tools(first ed.). Addison Wesley. p.183.

General references

edit
edit