4. An example: a little expression interpreter

This interpreter is able to calculate the value of any mathematical expression written with real numbers, the operators "+", "-" (binary and unary operators), "*", "/", "^" (power operator). This interpreter will not be able to use variables or functions.

4.1. The Lex part of the interpreter

Here is the source:

%{
#include "global.h"
#include "calc.h"

#include <stdlib.h>

%}

white		[ \t]+

digit		[0-9]
integer 	{digit}+
exponant	[eE][+-]?{integer}

real		{integer}("."{integer})?{exponant}?

%%

{white} 	{ /* We ignore white characters */ }

{real}		{
		  yylval=atof(yytext);
		  return(NUMBER);
		}

"+"		return(PLUS);
"-"		return(MINUS);

"*"		return(TIMES);
"/"		return(DIVIDE);

"^"		return(POWER);

"("		return(LEFT_PARENTHESIS);
")"		return(RIGHT_PARENTHESIS);

"\n"	return(END);

Explaination:

4.2. The Yacc part of the interpreter

This is the most important, and the most interesting:

%{

#include "global.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

%}

%token	NUMBER
%token	PLUS	MINUS	TIMES	DIVIDE	POWER
%token	LEFT_PARENTHESIS	RIGHT_PARENTHESIS
%token	END

%left	PLUS	MINUS
%left	TIMES	DIVIDE
%left	NEG
%right	POWER

%start Input
%%

Input:
	  /* Empty */
	| Input Line
	;

Line:
	  END
	| Expression END		{ printf("Result: %f\n",$1); }
	;

Expression:
	  NUMBER			{ $$=$1; }
	| Expression PLUS Expression	{ $$=$1+$3; }
	| Expression MINUS Expression	{ $$=$1-$3; }
	| Expression TIMES Expression	{ $$=$1*$3; }
	| Expression DIVIDE Expression	{ $$=$1/$3; }
	| MINUS Expression %prec NEG	{ $$=-$2; }
	| Expression POWER Expression	{ $$=pow($1,$3); }
	| LEFT_PARENTHESIS Expression RIGHT_PARENTHESIS	{ $$=$2; }
	;

%%

int yyerror(char *s) {
  printf("%s\n",s);
}

int main(void) {
  yyparse();
}

Well, it seems a little less simple, isn't it? In fact, it is not so complicated. We include the usual files, and we use the %token keyword to declare the tokens that we can find. There is, in this case, no particular order for the declaration.

Then we have the %left and %right keywords. This is used to tell Yacc the associativity of the operators, and their priority. Then, we define the operators in an increasing order of priority. So, "1+2*3" is evaluated as "1+(2*3)". You will have to choose between "left" or "right" declaration for your operator. For a left-associative operator (%left - "+" in this example), a+b+c will be evaluated as (a+b)+c. For a right-associative operator ("^" here), a^b^c will be evaluated as a^(b^c).

Then will tell Yacc that the axiom will be Input, that is its state will be such as it will consider any entry as an Input, at the begining. You should also note the recursivity in the definition of Input. It is used to treat an entry which size is unknown. For internal reason to Yacc, you should use:

Input:
       /* Empty */
     | Input Line
     ;

instead of:

Input:
       /* Vide */
     | Line Input
     ;

(This permits a reduction as soon as possible).

Let's have a look to the definition of Line. The definition itself is quite simple, but you should ask yourself what represents the $1. In fact, $1 is a reference to the value returned to the first notion of the production. It is similar for $2, $3, ... And $$ is the value returned by the production. So, in the definition of Expression, $$=$1+$3 adds the value of the first expression to the value of the second expression (this is the third notion) and returns the result in $$.

If you have a look to the definition of the unary minus, the %prec keyword is used to tell Yacc that the priority is that of NEG.

Finally, the third part of the file is simple, since it just calls the yyparse() function.

4.3. Compiling and running the example

Provided that the Lex file is called calc.lex, and the Yacc file calc.y, all you have to do is:

bison -d calc.y
mv calc.tab.h calc.h
mv calc.tab.c calc.y.c
flex calc.lex
mv lex.yy.c calc.lex.c
gcc -c calc.lex.c -o calc.lex.o
gcc -c calc.y.c -o calc.y.o
gcc -o calc calc.lex.o calc.y.o -ll -lm [and maybe -lfl]

Please note that you need to create a file called global.h which will contain:

#define YYSTYPE double
extern YYSTYPE yylval;

I only have bison and flex instead of yacc and lex, but it should be the same, except the file names.

The call to bison with the "-d" parameter creates the calc.tab.h header file, which defines the tokens. We call flex and we rename the files we get. Then, you only have to compile, and do not forget the proper libraries. We get:

$ calc
 1+2*3
 Result : 7.000000
 2.5*(3.2-4.1^2)
 Result : -34.025000

4.4. A better calculator

When there is a syntax error, the program stops. In order to continue, we may replace

Line:
          END
        | Expression END                { printf("Result : %f\n",$1); }
        ;

by

Line:
          END
        | Expression END                { printf("Result : %f\n",$1); }
        | error END                     { yyerrok; }
        ;

but, of course, it is only an idea and there are many others (usage and definition of variables and functions, many data types, etc.).