Converting from lex & yacc to flex & bison
Recommended reading
- flex & bison: Text Processing Tools by John Levine.
This is part one of two pages explaining how to convert example lex and yacc code from Lex & Yacc by Levine, Mason and Brown [1] to work on modern systems using flex and bison. The source code from the book is available from the publisher's site.
That code was written over 20 years ago for the old, buggy AT&T lex and yacc software, and since then C compilers have changed slightly in which external functions they provide by default. The message boards seem to be full of complaints along the lines of "OMG I downloaded the code and it doesn't compile. Help!".
This page takes a sample lex and yacc program and explains what we did to make it compile without errors or warnings on a standard Linux system in 2013 (Bodhi 3.2.0.19-generic with gcc 4.4.3, bison 2.4.1 and flex 2.5.35). Part two, Using flex and bison in MSVC++, covers how we made it compile without any errors or warnings on Windows using Microsoft Visual Studio 2008. We also look at how to fix the memory leak problem.
UPDATE: There is a more recent book: Flex & Bison by John Levine [2].
We should add that we are AR* programmers who like our source code to compile without any warnings whatsoever, and we like to be able to compile the same source code both with our Windows MSVC compiler and on Linux, preferably without convoluted preprocessor macros. The code we give here compiles for us on both systems without any warnings (OK, we do suppress a few of them) and does not have any memory leaks.
* AR = overly obsessive about small details.
The sample program
We try the simple calculator with variables and real values from page 64 of [1], using the original files ch3-03.y and ch3-03.l. Compiling on Linux with the following commands

```
bison -d -v -b y ch3-03.y
flex ch3-03.l
g++ -o ch3-03 y.tab.c lex.yy.c -lfl
```

we get these error messages:
```
y.tab.c: In function ‘int yyparse()’:
y.tab.c:1258: error: ‘yylex’ was not declared in this scope
ch3-03.y:23: error: ‘printf’ was not declared in this scope
ch3-03.y:31: error: ‘yyerror’ was not declared in this scope
y.tab.c:1439: error: ‘yyerror’ was not declared in this scope
y.tab.c:1582: error: ‘yyerror’ was not declared in this scope
```

We should explain why we use the above commands. The -b y option makes bison name its output files y.tab.c and y.tab.h; -d tells it to write that header, which holds the token definitions the .l file needs, and -v writes a report on the parser states to y.output. The -lfl option links the flex library, which supplies a default yywrap(). We use g++ instead of gcc because, surprisingly enough, it seems to be the easier option on Linux for compiling the pure ANSI C output from flex and bison.
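All five errors have the same cause. g++ treats the generated files as C++, and C++ refuses to compile a call to a function it has not seen declared, whereas the older C compilers the book was written for simply assumed an undeclared function returned int. The cure is to make the declarations visible before yyparse() uses them: stdio.h for printf(), and prototypes for yylex() and yyerror(). The sketch below shows only that ordering; fake_parse() and the token array are hypothetical stand-ins for the bison-generated parser and the flex scanner, not real flex or bison code.

```c
#include <stdio.h>   /* for fprintf() */

/* Prototypes first, as in the %{ ... %} prologue of the revised
 * ch3-03.y. Because they come before any call, both gcc and g++
 * accept the calls inside fake_parse() below. */
int yylex(void);
void yyerror(char *s);

/* Hypothetical stand-in for the bison-generated yyparse(): it only
 * counts tokens until yylex() returns 0 (end of input). */
int fake_parse(void)
{
    static char empty_msg[] = "empty input";  /* modifiable array: no const issue */
    int count = 0;

    while (yylex() != 0)
        count++;
    if (count == 0)
        yyerror(empty_msg);
    return count;
}

/* Hypothetical stand-in scanner: hands out three tokens, then 0. */
static const int fake_tokens[] = { 'a', '+', 'b', 0 };
static int fake_pos = 0;

int yylex(void)
{
    if (fake_tokens[fake_pos] == 0)
        return 0;
    return fake_tokens[fake_pos++];
}

/* Defined after use: legal in C++ only because of the prototype above. */
void yyerror(char *s)
{
    fprintf(stderr, "ERROR: %s\n", s);
}
```

This compiles as either C or C++; delete the two prototype lines and g++ reports the same kind of "was not declared in this scope" error shown above.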
The fix
Revised file: ch3-03.l

```
 1  %{
 2  /* We usually need these... */
 3  #include <stdio.h>
 4  #include <stdlib.h>
 5
 6  /* Include this to use yylex_destroy for flex version < 2.5.9
 7   */
 8  #include "flex_memory_fix.h"
 9
10  /* This is required and is generated automatically by bison
11     from the .y file */
12  #include "y.tab.h"
13
14  /* Local stuff we need here... */
15  #include <math.h>
16  extern double vbltable[26];
17  %}
18
19  /* Add this to get line numbers... */
20  %option yylineno
21
22  %%
23  ([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?) {
24          yylval.dval = atof(yytext); return NUMBER;
25      }
26
27  [ \t]   ;   /* ignore white space */
28
29  [a-z]   { yylval.vblno = yytext[0] - 'a'; return NAME; }
30
31  "$"     { return 0; /* end of input */ }
32
33  \n      |
34  .       return yytext[0];
35  %%
36
37  /* We need to add a main() function.
38   * It is more convenient to put it here to manage flex
39   * memory management issues.
40   * At the minimum it must call yyparse().
41   */
42  extern int yyparse();
43
44  int main(int argc, char *argv[])
45  {
46      printf("Enter sums using + - * / and () or type $ to quit."
47             "\n");
48      yyparse();        /* REQUIRED */
49      yylex_destroy();  /* Add to clean up memory leaks */
50  }
```

Revised file: ch3-03.y

```
 1  %{
 2  /* For printf() */
 3  #include <stdio.h>
 4
 5  /* Prototypes for functions we define below... */
 6  void yyerror(char *s);
 7  int yylex(void);
 8
 9  /* Specific for here... */
10  double vbltable[26];
11  %}
12
13  %union {
14      double dval;
15      int vblno;
16  }
17
18  %token <vblno> NAME
19  %token <dval> NUMBER
20  %left '-' '+'
21  %left '*' '/'
22  %nonassoc UMINUS
23
24  %type <dval> expression
25  %%
26  statement_list: statement '\n'
27      | statement_list statement '\n'
28      ;
29
30  statement: NAME '=' expression  { vbltable[$1] = $3; }
31      | expression                { printf("= %g\n", $1); }
32      ;
33
34  expression: expression '+' expression { $$ = $1 + $3; }
35      | expression '-' expression { $$ = $1 - $3; }
36      | expression '*' expression { $$ = $1 * $3; }
37      | expression '/' expression
38          {   if($3 == 0.0)
39                  yyerror("divide by zero");
40              else
41                  $$ = $1 / $3;
42          }
43      | '-' expression %prec UMINUS { $$ = -$2; }
44      | '(' expression ')'  { $$ = $2; }
45      | NUMBER
46      | NAME                { $$ = vbltable[$1]; }
47      ;
48  %%
49  /* An optional but friendlier yyerror function... */
50  void yyerror(char *s)
51  {
52      extern int yylineno;  // defined and maintained in lex
53      extern char *yytext;  // defined and maintained in lex
54      fprintf(stderr, "ERROR: %s at symbol '%s' on line %d\n", s,
55          yytext, yylineno);
56  }
```
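The rules section of the revised ch3-03.l is unchanged from the book, and it can help to see exactly what it matches. The following is a hand-written C approximation of the same scanner, offered only as an illustration: toy_lex(), toy_lval and the TOY_ token codes are hypothetical stand-ins for yylex(), yylval and the codes bison writes to y.tab.h, and the sketch reads from a string rather than from flex's input. One quirk of the original number pattern is worth noticing: the exponent can only follow a number that contains a decimal point, so 1.0e5 is a single NUMBER while 1e5 is scanned as NUMBER 1, NAME e, NUMBER 5.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; bison also numbers its tokens from 258,
 * but the real values live in the generated y.tab.h. */
enum { TOY_NAME = 258, TOY_NUMBER = 259 };

/* Hypothetical stand-in for the %union yylval. */
union toy_value { double dval; int vblno; };
union toy_value toy_lval;

/* Length of the longest match at s of the NUMBER pattern
 *   [0-9]+ | [0-9]*\.[0-9]+ ([eE][-+]?[0-9]+)?
 * or 0 if neither alternative matches. */
static size_t match_number(const char *s)
{
    size_t i = 0, best;

    while (isdigit((unsigned char)s[i]))
        i++;
    best = i;                                 /* [0-9]+ */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        i += 2;                               /* the '.' and first digit */
        while (isdigit((unsigned char)s[i]))
            i++;
        best = i;                             /* [0-9]*\.[0-9]+ */
        if (s[i] == 'e' || s[i] == 'E') {
            size_t j = i + 1;
            if (s[j] == '-' || s[j] == '+')
                j++;
            if (isdigit((unsigned char)s[j])) {
                while (isdigit((unsigned char)s[j]))
                    j++;
                best = j;                     /* ...with the exponent */
            }
        }
    }
    return best;
}

/* Returns the next token from *p and advances *p past it. */
int toy_lex(const char **p)
{
    const char *s = *p;
    size_t n;

    while (*s == ' ' || *s == '\t')           /* [ \t] ; ignore white space */
        s++;
    if (*s == '\0') {                         /* end of the string */
        *p = s;
        return 0;
    }
    if ((n = match_number(s)) > 0) {          /* NUMBER: atof(yytext) */
        char *text = malloc(n + 1);           /* NUL-terminated copy, like yytext */
        if (text == NULL)
            abort();
        memcpy(text, s, n);
        text[n] = '\0';
        toy_lval.dval = atof(text);
        free(text);
        *p = s + n;
        return TOY_NUMBER;
    }
    *p = s + 1;                               /* every other rule matches one char */
    if (*s >= 'a' && *s <= 'z') {             /* [a-z] -> NAME */
        toy_lval.vblno = *s - 'a';
        return TOY_NAME;
    }
    if (*s == '$')                            /* "$" -> end of input */
        return 0;
    return *s;                                /* \n | . return yytext[0]; */
}
```

Flex resolves overlapping rules by taking the longest match and, on a tie, the earlier rule; the sketch reaches the same result here because every rule other than NUMBER matches exactly one character.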
Differences
To see the highlighted differences between the revised and original files, look at ch3-03.l-differences and ch3-03.y-differences.
The revised source files ch3-03.l, ch3-03.y and flex_memory_fix.h are in this zip file (2 kB).
Some explanation
- We need to explicitly add the standard ANSI C include files such as stdio.h and stdlib.h (see .l: lines 3-4, .y: line 3). Back in 1990 you generally didn't need to do that.
- We need to declare yylex() and yyerror() before the parser calls them (see .y: lines 5-7). This is what cures the "not declared in this scope" errors, because g++ will not call a function it has not seen declared.
- We need to add a main() function and (optionally) a yyerror() function. You could put these in a separate .c module. If not, it is more convenient to put the main function in the .l file (see .l: lines 44-50).
- The core lex regular expressions and yacc grammar rules between the %% lines are unchanged from the originals.
- We call yylex_destroy() (.l: line 49) to fix a memory leak. The optional flex_memory_fix.h file (.l: line 8) should be added for versions of flex earlier than 2.5.9.
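Since the grammar rules are unchanged, their behaviour comes mostly from the precedence declarations: %left '-' '+' and %left '*' '/' make all four operators left-associative, with * and / binding more tightly because they are declared later, and %nonassoc UMINUS (used via %prec UMINUS) gives unary minus the highest precedence of all. To illustrate what that means, here is a hand-written recursive-descent evaluator for the same statements, with one function per precedence level. It is only a sketch, not how bison works internally (bison builds an LALR(1) table-driven parser); calc_statement(), calc_vars and the other names are hypothetical; it evaluates one line at a time without the trailing newline; and it uses strtod() for numbers, which accepts a little more than the flex pattern does. On division by zero it keeps the left operand, which as far as we can tell matches the original rule, because bison's parser sets $$ to $1 before running an action.

```c
#include <stdio.h>
#include <stdlib.h>

double calc_vars[26];            /* hypothetical stand-in for vbltable[26] */
static const char *cp;           /* current position in the input line */

static void calc_skip_ws(void) { while (*cp == ' ' || *cp == '\t') cp++; }

static double calc_expr(void);

/* NUMBER | NAME | '(' expression ')' */
static double calc_primary(void)
{
    char *end;
    double v;

    calc_skip_ws();
    if (*cp == '(') {
        cp++;
        v = calc_expr();
        calc_skip_ws();
        if (*cp == ')') cp++;    /* sketch: a missing ')' is not reported */
        return v;
    }
    if (*cp >= 'a' && *cp <= 'z')
        return calc_vars[*cp++ - 'a'];
    v = strtod(cp, &end);        /* more permissive than the flex pattern */
    cp = end;
    return v;
}

/* '-' expression %prec UMINUS: unary minus binds tightest */
static double calc_unary(void)
{
    calc_skip_ws();
    if (*cp == '-') { cp++; return -calc_unary(); }
    return calc_primary();
}

/* %left '*' '/': looping left to right makes 8/4/2 mean (8/4)/2 */
static double calc_term(void)
{
    double v = calc_unary(), d;

    for (;;) {
        calc_skip_ws();
        if (*cp == '*') { cp++; v *= calc_unary(); }
        else if (*cp == '/') {
            cp++;
            d = calc_unary();
            if (d == 0.0) fprintf(stderr, "ERROR: divide by zero\n"); /* v keeps $1 */
            else v /= d;
        }
        else return v;
    }
}

/* %left '-' '+': lowest precedence, also left-associative */
static double calc_expr(void)
{
    double v = calc_term();

    for (;;) {
        calc_skip_ws();
        if (*cp == '+') { cp++; v += calc_term(); }
        else if (*cp == '-') { cp++; v -= calc_term(); }
        else return v;
    }
}

/* statement: NAME '=' expression | expression; returns the value */
double calc_statement(const char *line)
{
    const char *after;

    cp = line;
    calc_skip_ws();
    if (*cp >= 'a' && *cp <= 'z') {
        after = cp + 1;
        while (*after == ' ' || *after == '\t') after++;
        if (*after == '=') {
            int slot = *cp - 'a';
            cp = after + 1;
            return calc_vars[slot] = calc_expr();
        }
    }
    return calc_expr();
}
```

For example, calc_statement("2 + 3 * 4") gives 14 and calc_statement("-2 * 3") gives -6, the same answers the bison calculator prints.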
A diagram
If you are interested, the logic in the bison file is shown in this diagram (309 kB).
To obtain this gif file, we used the -g option in bison and then ran dot (available from Graphviz) on the resulting .dot file:

```
bison -d -v -b y -g ch3-03.y
dot -Tgif y.dot -o ch3-03.dot.gif
```
Compiling without warnings
Compile with

```
g++ -Wno-write-strings -o ch3-03 y.tab.c lex.yy.c -lfl
```

to avoid an annoying and useless warning message [deprecated conversion from string constant to 'char *'].
Even better, use these options:

```
g++ -Wall -Wno-unused -Wno-deprecated -Wno-write-strings -o ch3-03 y.tab.c lex.yy.c -lfl
```
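The write-strings warning comes from compiling the generated code as C++, where a string literal such as "divide by zero" is an array of const char, so passing it to yyerror(char *s) drops the const. If you would rather remove the cause than suppress the warning, one option (a sketch, not what we did in the files above) is to declare the parameter as const char * in both the prototype and the definition in ch3-03.y. We believe bison's generated code passes yyerror() only string literals and char buffers, and both convert to const char * without a cast. In the sketch, report_error() is a hypothetical helper added so the output can be checked, and the definitions of yytext and yylineno are stand-ins for the ones flex generates in lex.yy.c; in the real program they must not be defined again.

```c
#include <stdio.h>

/* Stand-ins ONLY for this sketch: in the real program flex defines
 * yytext and yylineno in lex.yy.c, so do not define them again. */
static char sample_text[] = "+";
char *yytext = sample_text;
int yylineno = 3;

/* Hypothetical helper: writes the same message as the article's
 * yyerror() to any stream and returns fprintf()'s character count. */
int report_error(FILE *out, const char *s)
{
    return fprintf(out, "ERROR: %s at symbol '%s' on line %d\n",
                   s, yytext, yylineno);
}

/* const-correct yyerror(); the prototype at the top of the .y file
 * would change to match:  void yyerror(const char *s);  */
void yyerror(const char *s)
{
    report_error(stderr, s);
}
```

With this change, the yyerror("divide by zero") call in the grammar should compile under g++ without needing -Wno-write-strings.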
Memory leak problem
There is a memory leak problem in GNU flex: see Memory leak - 16386 bytes allocated by malloc. In flex versions 2.5.9 and above you can fix this by calling yylex_destroy(). The fudge in flex_memory_fix.h should allow you to solve this for earlier versions.
This is more of an issue for Windows users, for whom the latest available GNU version of flex is only 2.5.4 (as of March 2013).
If your version of flex is 2.5.9 or above, then you don't need this .h file.
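Deciding whether you need flex_memory_fix.h comes down to comparing your flex version with 2.5.9, and that comparison has to be numeric, one component at a time: the flex 2.5.35 on our Linux system is newer than 2.5.9, even though a plain string comparison such as strcmp("2.5.35", "2.5.9") says it is older. Here is a small sketch of such a check. flex_has_yylex_destroy() is a hypothetical name; it takes the text printed by flex --version, which looks something like "flex 2.5.35", and because the exact wording varies between versions it simply starts reading at the first digit.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper. Given version text such as "flex 2.5.35",
 * returns 1 if that version is 2.5.9 or later (so yylex_destroy()
 * is available), 0 if it is older, and -1 if no version is found. */
int flex_has_yylex_destroy(const char *text)
{
    int major = 0, minor = 0, patch = 0;

    while (*text != '\0' && !isdigit((unsigned char)*text))
        text++;                           /* skip the leading words */
    if (sscanf(text, "%d.%d.%d", &major, &minor, &patch) < 2)
        return -1;                        /* need at least "major.minor" */

    /* Compare numerically, one component at a time: 35 > 9. */
    if (major != 2)
        return major > 2;
    if (minor != 5)
        return minor > 5;
    return patch >= 9;
}
```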
Flex and bison in Windows
See Using flex and bison in MSVC++.
References
- [1] Levine, John, Tony Mason, and Doug Brown. Lex & Yacc, 2nd edition. O'Reilly, 1992.
- [2] Levine, John. Flex & Bison: Text Processing Tools. O'Reilly, 2009.
Contact
To comment on this page or to contact us, please send us a message.
If you send us a relevant comment for this page, we'll post it here. Just mention "converting_from_lex_and_yacc" in your message.
This page first published 3 March 2013. Last updated 24 June 2020.