# 4. An example: a little expression interpreter

This interpreter is able to calculate the value of any mathematical expression written with real numbers, the operators "+", "-" (binary and unary operators), "*", "/", "^" (power operator). This interpreter will not be able to use variables or functions.

## 4.1. The Lex part of the interpreter

Here is the source:

%{ #include "global.h" #include "calc.h" #include <stdlib.h> %} white [ \t]+ digit [0-9] integer {digit}+ exponant [eE][+-]?{integer} real {integer}("."{integer})?{exponant}? %% {white} { /* We ignore white characters */ } {real} { yylval=atof(yytext); return(NUMBER); } "+" return(PLUS); "-" return(MINUS); "*" return(TIMES); "/" return(DIVIDE); "^" return(POWER); "(" return(LEFT_PARENTHESIS); ")" return(RIGHT_PARENTHESIS); "\n" return(END); |

Explaination:

- The first part includes the file
`calc.h`, that will be generated later by Yacc and that will contain the definition for NUMBER, PLUS, MINUS, etc... We include the stdlib header, because we will use the`atof()`function after. We declare the*real*notion used in the second part. The`global.h`file contains only the`#define YYSTYPE double`declaration, because all the structures we will manipulate have the type`double`. By the way, this is the type of`yylval`. - The second part tells the syntaxical parser which type
of token it encountered. If it is a number, we put its value
in the
`yylval`variable, in order to be used later. - Finally, the third part is empty, because we do not want
Lex to create a
`main()`function, that will be declared in the Yacc file.

## 4.2. The Yacc part of the interpreter

This is the most important, and the most interesting:

%{ #include "global.h" #include <stdio.h> #include <stdlib.h> #include <math.h> %} %token NUMBER %token PLUS MINUS TIMES DIVIDE POWER %token LEFT_PARENTHESIS RIGHT_PARENTHESIS %token END %left PLUS MINUS %left TIMES DIVIDE %left NEG %right POWER %start Input %% Input: /* Empty */ | Input Line ; Line: END | Expression END { printf("Result: %f\n",$1); } ; Expression: NUMBER { $$=$1; } | Expression PLUS Expression { $$=$1+$3; } | Expression MINUS Expression { $$=$1-$3; } | Expression TIMES Expression { $$=$1*$3; } | Expression DIVIDE Expression { $$=$1/$3; } | MINUS Expression %prec NEG { $$=-$2; } | Expression POWER Expression { $$=pow($1,$3); } | LEFT_PARENTHESIS Expression RIGHT_PARENTHESIS { $$=$2; } ; %% int yyerror(char *s) { printf("%s\n",s); } int main(void) { yyparse(); } |

Well, it seems a little less simple, isn't it? In fact, it
is not so complicated. We include the usual files, and we use
the `%token` keyword to declare the tokens
that we can find. There is, in this case, no particular order
for the declaration.

Then we have the `%left` and
`%right` keywords. This is used to tell Yacc
the associativity of the operators, and their priority. Then, we
define the operators in an increasing order of priority. So,
"1+2*3" is evaluated as "1+(2*3)". You will have to choose
between "left" or "right" declaration for your operator. For a
left-associative operator (`%left` - "+" in
this example), `a+b+c` will be evaluated as
`(a+b)+c`. For a right-associative operator
("^" here), `a^b^c` will be evaluated as
`a^(b^c)`.

Then will tell Yacc that the axiom will be
`Input`, that is its state will be such as it
will consider any entry as an `Input`, at the
begining. You should also note the recursivity in the definition
of `Input`. It is used to treat an entry
which size is unknown. For internal reason to Yacc, you should
use:

Input: /* Empty */ | Input Line ; |

instead of:

Input: /* Vide */ | Line Input ; |

(This permits a *reduction* as soon as
possible).

Let's have a look to the definition of
`Line`. The definition itself is quite
simple, but you should ask yourself what represents the
`$1`. In fact, `$1` is a
reference to the value returned to the first notion of the
production. It is similar for `$2`,
`$3`, ... And `$$` is the
value returned by the production. So, in the definition of
`Expression`, `$$=$1+$3`
adds the value of the first expression to the value of the
second expression (this is the third notion) and returns the
result in `$$`.

If you have a look to the definition of the unary minus,
the `%prec` keyword is used to tell Yacc that
the priority is that of `NEG`.

Finally, the third part of the file is simple, since it
just calls the `yyparse()` function.

## 4.3. Compiling and running the example

Provided that the Lex file is called
`calc.lex`, and the Yacc file
`calc.y`, all you have to do is:

bison -d calc.y mv calc.tab.h calc.h mv calc.tab.c calc.y.c flex calc.lex mv lex.yy.c calc.lex.c gcc -c calc.lex.c -o calc.lex.o gcc -c calc.y.c -o calc.y.o gcc -o calc calc.lex.o calc.y.o -ll -lm [and maybe -lfl] |

Please note that you need to create a file called
`global.h` which will contain:

#define YYSTYPE double extern YYSTYPE yylval; |

I only have `bison` and
`flex` instead of `yacc`
and `lex`, but it should be the same, except
the file names.

The call to bison with the "-d" parameter creates the
`calc.tab.h` header file, which defines the
tokens. We call flex and we rename the files we get. Then, you
only have to compile, and do not forget the proper libraries. We
get:

$ calc 1+2*3 Result : 7.000000 2.5*(3.2-4.1^2) Result : -34.025000 |

## 4.4. A better calculator

When there is a syntax error, the program stops. In order to continue, we may replace

Line: END | Expression END { printf("Result : %f\n",$1); } ; |

by

Line: END | Expression END { printf("Result : %f\n",$1); } | error END { yyerrok; } ; |

but, of course, it is only an idea and there are many others (usage and definition of variables and functions, many data types, etc.).