Thursday, April 22, 2004

Recursive Descent Parsing



Recursive descent parsing is a form of top-down parsing.

The simplest way to construct a top-down parser is to regard each production rule as defining a function, where:

- the name of the function is the non-terminal on the l.h.s. of the rule
- each instance of a non-terminal on the r.h.s. is a call to the corresponding function
- each instance of a terminal on the r.h.s. tells us to match it with the input symbol

A grammar, in turn, is simply a set of production rules. A minimal sketch of this rule-to-function mapping follows.
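As an illustration, here is a minimal recursive descent parser in C for a tiny grammar of my own choosing (expr : term ('+' term)* and term : DIGIT); the function names and input handling are assumptions for this sketch, not part of any generated parser:

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

static const char *input;              /* current position in the input */

static void match(char expected)       /* a terminal: match it with the input symbol */
{
    if (*input == expected)
        input++;
    else {
        fprintf(stderr, "syntax error at '%c'\n", *input);
        exit(1);
    }
}

static void term(void)                 /* term : DIGIT */
{
    if (isdigit((unsigned char)*input))
        input++;
    else {
        fprintf(stderr, "digit expected\n");
        exit(1);
    }
}

static void expr(void)                 /* expr : term ('+' term)* */
{
    term();
    while (*input == '+') {            /* each '+' starts another term */
        match('+');
        term();
    }
}

int main(void)
{
    input = "1+2+3";
    expr();
    puts(*input == '\0' ? "accepted" : "rejected");
    return 0;
}

Each non-terminal (expr, term) has become a function, and each terminal is matched directly against the input symbol, exactly as the list above describes.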

Tuesday, April 20, 2004

Some of the important Lexer & Parser Generator tools



Yacc - Yet Another Compiler-Compiler
AT&T Yacc is a classical parser generator which comes with Unix. It provides a general tool for describing the input to a computer program. The user specifies the structures of his input, together with code to be invoked as each such structure is recognized. Yacc turns such a specification into a subroutine that handles the input process; frequently, it is convenient and appropriate to have most of the flow of control in the user's application handled by this subroutine.

Yacc++
Does it sound like something to do with C++? Yacc++ contains not only C++ wrappers but also enhanced object-oriented features that make it a powerful tool. To learn more, check out Yacc++.

Lex - A Lexical Analyzer Generator
Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream. It is well suited for editor-script type transformations and for segmenting input in preparation for a parsing routine.
Lex source is a table of regular expressions and corresponding program fragments. The table is translated to a program which reads an input stream, copying it to an output stream and partitioning the input into strings which match the given expressions. As each such string is recognized the corresponding program fragment is executed. The recognition of the expressions is performed by a deterministic finite automaton (DFA) generated by Lex. The program fragments written by the user are executed in the order in which the corresponding regular expressions occur in the input stream.

Flex - A fast scanner generator
Flex is a tool for generating scanners: programs which recognize lexical patterns in text. It reads the given input files, or its standard input if no file names are given, for a description of a scanner to generate. The description is in the form of pairs of regular expressions and C code, called rules. flex generates as output a C source file, `lex.yy.c', which defines a routine `yylex()'. This file is compiled and linked with the `-lfl' library to produce an executable. When the executable is run, it analyzes its input for occurrences of the regular expressions. Whenever it finds one, it executes the corresponding C code.

Bison - The YACC-compatible Parser Generator
Bison is a general-purpose parser generator that converts a grammar description for an LALR(1) context-free grammar into a C program to parse that grammar. Once you are proficient with Bison, you may use it to develop a wide range of language parsers, from those used in simple desk calculators to complex programming languages.
Bison is upward compatible with Yacc: all properly-written Yacc grammars ought to work with Bison with no change. Anyone familiar with Yacc should be able to use Bison with little trouble.

Monday, April 19, 2004

Reduce-reduce conflict...Lex/Yacc



Let's take a look at the reduce-reduce conflict in shift-reduce parsers.
Consider the following grammar:

line : nameValueTypes1
     | nameValueTypes2
     | newLine
     ;

nameValueTypes1 : NAME EQUALTOSEPARATOR VALUE
                ;

nameValueTypes2 : NAME COLONSEPARATOR VALUE
                ;

newLine : NL
        ;


The above grammar is meant to read a line containing a name-value pair of one of the following forms:

name = value
name : value

Can you figure out how the above grammar leads to a reduce-reduce conflict?

When the parser has a complete name-value pair on the stack, it must decide whether to reduce using nameValueTypes1 or nameValueTypes2. As long as the two separators reach the parser as distinct tokens the choice is clear; but if the lexer were to return a single SEPARATOR token for both = and :, the two rules would have identical right-hand sides and the parser could not tell which one to reduce by. That is a reduce-reduce conflict.
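A minimal yacc sketch of that situation (token and rule names are my own; the lexer is assumed to return one SEPARATOR token for both separators):

%token NAME SEPARATOR VALUE NL
%%
line : nameValueTypes1
     | nameValueTypes2
     | NL
     ;

nameValueTypes1 : NAME SEPARATOR VALUE ;

nameValueTypes2 : NAME SEPARATOR VALUE ;

Both rules now have the same right-hand side, so after shifting NAME SEPARATOR VALUE the parser has two legal reductions. Yacc reports this as a reduce-reduce conflict and, by default, resolves it in favour of the rule that appears first in the grammar.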

Sunday, April 18, 2004

Shift-reduce conflict in shift-reduce parsers...continued



Let us take a typical example:

int example(String str1, String str2, Hashtable htb1, Hashtable htb2)

So, what would the left-hand and right-hand side rules look like?

function : TYPENAME FUNCNAME OPENCIRCLRBRACKET paramlist CLOSECIRCLRBRACKET
               {
               }
               | TYPENAME FUNCNAME OPENCIRCLRBRACKET CLOSECIRCLRBRACKET
               {
               }
               ;

paramlist : param
               {
               }
               | paramlist COMMA param
               {
               }
               ;

param : TYPENAME IDENTIFIER
               {
               }
               ;

The above grammar shifts and reduces the tokens correctly. But what if the language decided to allow something like the following:

int example(String str1,str2, Hashtable htb1,htb2)

The rules are modified in the following manner:

param : TYPENAME identifier-list
      ;

identifier-list : IDENTIFIER
                | identifier-list COMMA IDENTIFIER
                ;

This leads to a shift-reduce conflict. Can you determine why?... :-) Hint: the confusion is over whether to shift or reduce when COMMA(,) is read as the look-ahead token, because the symbol after a COMMA could be either another IDENTIFIER (continuing the identifier-list) or a TYPENAME (starting a new param), and the parser cannot see that far ahead.
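To see the dilemma concretely (a sketch of my own, using the rules above):

stack: ... TYPENAME identifier-list        look-ahead: COMMA

  shift:  the COMMA continues the identifier-list, as in "str1,str2"
  reduce: the identifier-list is complete; reduce it to param, and the
          COMMA separates the next param, as in "str2, Hashtable htb1"

With only one token of look-ahead, the parser cannot know whether an IDENTIFIER or a TYPENAME follows the COMMA, so it cannot choose between the two actions.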


Shift-reduce parsers...Lex/Yacc



So, where do the shift-reduce conflicts talked about earlier arise? ...You are correct! They occur in shift-reduce parsers.

What are shift reduce parsers?

The most common bottom-up parsers are called shift-reduce parsers. Recall the bottom-up parsers discussed earlier (LR(k) parsers) in the topic Various kinds of parsers. These parsers examine the input tokens and either shift them onto a stack or reduce the top of the stack, replacing a handle (the right-hand side of a production rule) with the non-terminal on its left-hand side.

Shift-reduce decisions are based solely on what is on the stack and what the next tokens are.
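For example (my own trace, using a hypothetical rule pair : NAME SEPARATOR VALUE), parsing the input NAME SEPARATOR VALUE proceeds as follows:

Stack                     Input                     Action
$                         NAME SEPARATOR VALUE $    shift NAME
$ NAME                    SEPARATOR VALUE $         shift SEPARATOR
$ NAME SEPARATOR          VALUE $                   shift VALUE
$ NAME SEPARATOR VALUE    $                         reduce by pair : NAME SEPARATOR VALUE
$ pair                    $                         accept

At every step the parser chooses between shifting the next token and reducing the top of the stack; a conflict arises whenever both choices are legal.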

Saturday, April 17, 2004

Shift-reduce conflict...Lex/Yacc



The shift-reduce and reduce-reduce conflicts relating to parsers have been among the things that create curiosity in a beginner. Well, that's something that also made me curious when I started doing lex/yacc... However, I ignored it in the beginning and moved ahead with my lex/yacc work. But now I can truly appreciate that knowledge of the shift/reduce and reduce/reduce issues in lex/yacc is very important.

What is a conflict?
Sometimes the grammar is written in such a way that the parser generator cannot determine what to do in every possible circumstance. (We assume a lookahead of one symbol so only the first symbol in the input buffer can be examined at each step. More is possible if multiple symbol lookahead is allowed, but this complicates the parsing process.) Situations where problems can occur are called conflicts.

So, what is a shift/reduce conflict?

As the name suggests, the problem is the choice between shifting an input token onto the stack OR reducing and executing the action code, if any. Let's say the parser reads the following five tokens:

NAME
SEPARATOR
VALUE
NEWLINE
WHITESPACE
and the grammar has the following two rules:

NAME SEPARATOR VALUE
{
Action code goes here...
}

NAME SEPARATOR VALUE NEWLINE WHITESPACE
{
Action code goes here...
}

The above creates a shift-reduce problem. :-) When the NEWLINE token is read and becomes the look-ahead token, it is perfectly all right to reduce the contents of the stack. But it is also legitimate to shift the NEWLINE, because that would eventually lead to a reduction by the second production rule.
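As a concrete sketch (rule and token names are my own; the line wrapper is what makes a NEWLINE legal after the short form):

%token NAME SEPARATOR VALUE NEWLINE WHITESPACE
%%
line : pair NEWLINE
     | longpair
     ;

pair : NAME SEPARATOR VALUE ;

longpair : NAME SEPARATOR VALUE NEWLINE WHITESPACE ;

With NAME SEPARATOR VALUE on the stack and NEWLINE as the look-ahead token, the parser may either reduce by pair (the NEWLINE then belongs to line) or shift the NEWLINE (working towards longpair). Yacc reports exactly this as a shift-reduce conflict.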

So, what is the default behaviour of Yacc parser generator in such case ?
If there is a shift-reduce conflict then Yacc will prefer the shift over the reduce. However, relying on such default behaviour can be dangerous, because changes to the way the grammar is written can affect the parser in subtle ways. For example, just reordering the productions changes the way reduce-reduce conflicts are resolved.

Tuesday, April 13, 2004

Let's talk about grammar...Parsers



So, what is grammar in the context of parsers? What is a context-free grammar? What is Backus-Naur Form (BNF)? What is van Wijngaarden form?

In general, a grammar is a book of rules and examples which describes and teaches a language.

A grammar is used to formally specify the syntax of a language. It can be specified as a structure (N, T, P, S), where N is a set of non-terminals, T is a set of terminals, P is a set of productions, and S is a special non-terminal called the start symbol of the grammar. In computer science, a formal grammar is a way to describe a formal language, i.e., a set of finite-length strings over a certain finite alphabet. They are named formal grammars by analogy with the concept of grammar for human languages.
The basic idea behind these grammars is that we generate strings by beginning with the special start symbol and then applying rules that indicate how certain combinations of symbols may be replaced with other combinations of symbols.

What is context-free grammar?

A context-free grammar (CFG) is a formal grammar in which every production rule is of the form V → w, where V is a non-terminal symbol and w is a string consisting of terminals and/or non-terminals. The term "context-free" comes from the fact that the variable V can always be replaced by w, no matter in what context it occurs. A formal language is context-free if there is a context-free grammar which generates it.

What is BNF?

BNF (Backus-Naur Form) is a metasyntax used to express context-free grammars: that is, a formal way to describe formal languages.
It is widely used as a notation for the grammars of computer programming languages, command sets and communication protocols; most textbooks on programming language theory and/or semantics document BNF. Some variants, for example ABNF, have their own documentation.
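For example (a classic textbook illustration, not tied to any particular language), the grammar of simple arithmetic expressions can be written in BNF as:

<expr>   ::= <expr> "+" <term> | <term>
<term>   ::= <term> "*" <factor> | <factor>
<factor> ::= "(" <expr> ")" | NUMBER

Here <expr>, <term> and <factor> are non-terminals, while "+", "*", "(", ")" and the token NUMBER are terminals; ::= reads "is defined as" and | separates alternatives.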

Parser generators...Lex/Yacc



By the way, what are parser generators? Heard about common text processing tools in Unix, such as grep, sed, or AWK?... :-)
Parser generators are tools that can write recognizers (another name for parsers) automatically, given a description of a language's grammatical structure.
grep itself is a parser generator whose grammar is the language of regular expressions. AWK, as an extension of grep and sed, takes the regular expression technology a step further to implement a general programming language based on text processing and regular expression pattern matching. Perl, like AWK, is yet another attempt to build a generalized text processing language based on regular expressions.

Watch out for some detailed discussion on parser generators on a later date...!

How about a code migration tool, or say, a code browser, or say, a debugger? :-) Applications relating to lex/yacc and parsers are many!
You will find a discussion of each one of them in time to come! Enjoy! A saintly piece of advice: one has to visualize a lot while working with parsers.
One also needs a little bit of patience to let this stuff sink in! :-)

I have become really fascinated by parsers and their applications!

....an_0916@hotmail.com

Some applications...Lex/Yacc



People talk about converting C++ code to Java, or Java to C++, or code written in one language being converted to code in some other language. Do you think lexers/parsers have some role in achieving this objective?... :-) Watch out for a detailed discussion in the near future! :-)

XSLT: The most common application of XSLT is to convert XML into an HTML file for display purposes. So, we specify the XML tags/tokens and provide stylesheet/HTML tags to represent each one of them. Doesn't it sound like taking a specific action based on the input token?... I think yes! Somewhere at a deep level, XSLT could be using a parser in one way or another to transform XML and present it as an HTML page... sounds interesting, right? That's something I was wondering about: how XSLT might be working...

Parsers are already being used extensively in a number of disciplines:

Computer science
  - compiler construction
  - database interfaces
  - self-describing databases
  - artificial intelligence
Linguistics
  - text analysis
  - corpora analysis
  - machine translation
  - textual analysis of biblical texts
Document preparation and conversion
Typesetting of chemical formulae and chromosome recognition

The above list indicates only a few of the important applications; parsers can be used (and perhaps are) in a far larger number of disciplines.

The combination of lex and yacc is very effective for writing language interpreters of all kinds. Though most programmers never get to do the kind of general-purpose compiler-building that these tools were meant to assist, they're extremely useful for writing parsers for run-control file syntaxes and domain-specific minilanguages.



an_0916@hotmail.com

NFA Vs DFA...Lex/Yacc



Deterministic Finite-state Automata (DFA): A DFA is one whose behavior is fully determined by the state it is in and the current input symbol. In a DFA, when the machine is in state q and reads the next symbol s, we know exactly what the next state f(q, s) will be.

DFAs are often represented by directed graphs called (state) transition diagrams. The vertices of a transition diagram represent the states of the DFA, and the arcs, labeled with input symbols, correspond to the transitions. An arc from vertex p to vertex q with label r represents the transition f(p, r) = q. The accepting states are indicated by double circles. Transition functions can also be represented by tables called transition tables.

Non-deterministic Finite-state Automata (NFA): A non-deterministic automaton can have multiple possible moves for the same input and current state.

Both the DFA and the NFA model comprise five components (a 5-tuple (Q, A, f, q0, F)). They are the following:
1. A finite set of states, Q
2. A finite set of tokens/symbols (the alphabet), A
3. A mapping or transition function, f
4. A start state, q0
5. A finite set of final (accepting) states, F


NFAs differ from DFAs in that a given input token may allow multiple possible next moves. Thus, in an NFA, one could move from state q0 to any of the states q1, q2, ..., qk on seeing the same input; in a DFA, for each state and input there is exactly one next state.
In simple words, in a DFA there is exactly one transition exiting a state for each symbol in the alphabet, while in an NFA there could be 0, 1, or more transitions from a state for each symbol in the alphabet.
In a DFA every transition requires a symbol, while an NFA may also have transitions that do not consume a symbol (ε-transitions).


NFA is a generalization of DFA. Thus, every DFA is an NFA.
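A small illustration (my own example over the alphabet {a, b}):

DFA: f(q0, a) = q1              exactly one next state per input
NFA: f(q0, a) = {q1, q2}        two legal moves on the same input
     f(q1, ε) = {q3}            a move that consumes no input at all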

Sunday, April 11, 2004

Finite State Automata... Lexers/Parsers



How do we tackle the problem of recognizing languages while doing Lex/Yacc? We solve it with a finite state automaton (FSA).

Before defining an FSA, let's take a look at what a language is.
A language can be defined as a set of strings. Languages are defined by a set of rules called a grammar.

A finite state automaton is an abstract model of a simple machine used for recognizing simple syntactical structures or patterns in text strings. A finite state machine (FSM) is all about states and transitions.

The machine can be in a finite number of states. It receives symbols as input, and the result of receiving a particular input in a particular state is a transition to a specified new state. One of these states is the start state, one or more are final states, and the rest are intermediate states. If the machine is in one of the final states when the input ends, it has ended successfully (it has accepted the input).

FSA could be one of the following:
- Non-deterministic (NFA) (Non-deterministic finite-state machines)
- Deterministic (DFA) (Deterministic finite-state machines)



Before going any further on FSA, let us look at some definitions that will enable a better understanding of FSA:

(S) Start state: the starting state of the machine

(F) Final state or accept state: a finishing state of the machine. There can be more than one final state.

(I) Intermediate states: states between the start and final states

(T) Token/string: the tokens/strings the machine waits for. Based on these tokens, the machine stays in the same state or transitions to the next state.

(Tr) Transitions: the set of moves between states


Say there are two tokens, A and B, and 4 states (1, 2, 3, 4).

1 is the start state. 4 is the final state. 2 and 3 are intermediate states.

The following defines the transitions (state, token, next state) of the FSA:

Tr = {(1,A,2),
(1,B,4),
(2,A,2),
(2,B,3),
(3,A,3),
(3,B,4),
(4,A,3),
(4,B,4)}

Each triple above is of the form (S,T,I), (S,T,F), (I,T,I), (I,T,F), (F,T,I), or (F,T,F). These are the various transitions of this particular FSM model.

The above can be read as follows:

If the machine is in state 1 and it receives A, it transitions to state 2; or else, if it receives B, it transitions to the final state 4. If the machine is in state 2 and receives A, it stays in the same state and waits for further tokens; or else, if it receives B, it transitions to state 3. And so on...

Thus, an FSA can be represented as a transition table of states vs. symbols, where each row represents a particular state and each column a token.
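As an illustration, here is a minimal table-driven C sketch of exactly this machine (the function names, error handling, and the assumption that the input consists only of A and B are my own):

#include <stdio.h>

/* transition table for the Tr set above: rows are states 1..4,
   columns are the tokens A and B; row 0 is unused */
static const int table[5][2] = {
    {0, 0},
    {2, 4},   /* state 1: A -> 2, B -> 4 */
    {2, 3},   /* state 2: A -> 2, B -> 3 */
    {3, 4},   /* state 3: A -> 3, B -> 4 */
    {3, 4},   /* state 4: A -> 3, B -> 4 */
};

/* run the machine over the input; 1 is the start state, 4 the final state */
static int accepts(const char *input)
{
    int state = 1;
    for (; *input; input++)
        state = table[state][*input == 'B'];  /* 'A' -> column 0, 'B' -> column 1 */
    return state == 4;
}

int main(void)
{
    printf("AAB: %s\n", accepts("AAB") ? "accepted" : "rejected");  /* ends in state 3 */
    printf("ABB: %s\n", accepts("ABB") ? "accepted" : "rejected");  /* 1 -> 2 -> 3 -> 4 */
    return 0;
}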

The question now is: what are DFA and NFA, and what is the difference between an NFA and a DFA?

Watch out for a detailed discussion... Enjoy <!:-)

The finite-state machine (FSM) enjoys a special place in computer science. It has proven to be a very useful model for many practical tasks and deserves to be among the tools of every practicing computer scientist. Many simple tasks, such as interpreting the commands typed into a keyboard or running a calculator, can be modeled by finite-state machines.

Some examples of FSMs are the Moore machine, the Mealy machine, etc.
A Moore machine is an FSM which produces output for each state.
A Mealy machine is an FSM which produces output for each transition.


Regular expressions..Lexers/Parsers



Before going ahead, let's look at some notes on regular expressions. As discussed earlier, say A and B are two macro definitions.

(A|B) : Matches {A, B}
(A|B)(A|B) : Matches {AA, AB, BA, BB}
(A|B)+ : Matches {A, B, AB, AAB, BAA, ...}


Some notations and their meaning in regular expressions:

. Matches a single character.

? The preceding item is optional and matched at most once.

* The preceding item will be matched zero or more times.

+ The preceding item will be matched one or more times.

{n} The preceding item is matched exactly n times.
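For example (an illustration of my own), an identifier as found in most programming languages can be described as

[A-Za-z_][A-Za-z0-9_]*

that is, a letter or underscore followed by zero or more letters, digits or underscores. Similarly, [0-9]+ matches one or more digits, and [0-9]{3} matches exactly three of them.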

One could refer to the book Mastering Regular Expressions (O'Reilly) to learn regular expressions in depth! However, I will be highlighting key concepts from time to time. Keep watching this space for more discussions! Enjoy <!:-)

email for any queries... an_0916@hotmail.com

Thursday, April 08, 2004

Before going any further on lexer/parser...



Let's discuss something about languages and context-free grammars. Some of the terms discussed will be BNF, terminals, non-terminals, left derivation, right derivation, recursive descent parsers, etc.! Watch out for the content!

Chill out with brain parser in real life!



Listen to everything that comes your way, but train your mind to act appropriately on certain patterns. Your brain might not have an action like staying happy when patterns like 'failure' happen! However, retrain your neural networks consciously! Modify the rules associated with these tokens (or failure patterns) and stay happy in life! Enjoy :-) Just believe in yourself! You could stay so much happier! Rightly sung by Kelly on believing:

I believe I can fly
I believe I can touch the sky
I think about it every night and day
Spread my wings and fly away
I believe I can soar
I see me running through that open door
I believe I can fly
I believe I can fly
Oh I believe I can fly

----
an_0916@hotmail.com

Different kind of parsers!


As we know by now, the parser analyses the relationships of the tokens passed to it by the lexer. There are various possible ways in which a language, a set of tokens, can be analysed. They are the following:

LL(1) Parsers : LL(1) parsers are used to analyse very simple languages using a Left-to-right scan and a Leftmost derivation. Only the very next token is used to decide which pattern applies; there is no further lookahead and no backtracking. Just the very next token. That is it! These parsers are also called top-down parsers.

LL(k) Parsers : These parsers are also called top-down parsers. In this case, a lookahead of k tokens is possible. This is a generalization of the LL(1) parser. Left recursion is not allowed in the rules used with these parsers.

LALR(1) Parsers : These parsers use Look-Ahead, Left-to-right, Rightmost-derivation grammars to analyse the language. In simple words, these parsers are look-ahead LR parsers: they look ahead one symbol in the input string before going for a reduce action. The look-ahead helps in knowing whether the complete rule has been matched or not.
Yacc generates LALR parsers. In this case, ambiguity sometimes arises in deciding whether to shift the tokens onto the stack or reduce the tokens on the stack using the grammar rules at a particular stage. These parsers are also called bottom-up parsers.

LR(1) Parsers : These parsers perform a Left-to-right scan and produce a Rightmost derivation (in reverse). These parsers at times are also called bottom-up parsers.


Thus, parsers basically fall into two broad categories. They are the following:

top-down parsers
bottom-up parsers - also called shift-reduce parsers

A top-down parser applies as many rules as possible to match the input. In the worst case, it goes through the same rule time and again, forever, if the rule is left-recursive.

A bottom-up parser reads as much input as possible before deciding which rule to match and what actions to take. Bottom-up parsers are more efficient; however, they are often restricted to languages where a fixed number of tokens, say k tokens, suffices to decide which rule to use.

Some interesting research topics for parsers!


Some of the interesting topics relating to parsers are the following:

- Adaptive neural network parsers : the applicability of an adaptive neural network to problems with time-dependent input, demonstrated by showing that a deterministic parser for natural-language inputs of significant syntactic complexity can be developed using recurrent connectionist architectures. Let me know if you want to know more on this.

- Incremental Parsing of Natural Language Using Recursive Neural Networks

- Parsing and translation decisions

...Watch out for these topics. Latest research and development work relating to parsing technology would be displayed and discussed here. Enjoy..! :-)


Wednesday, April 07, 2004

let's play around with yet another lexer/parser idea!


How about scanning a file, finding some common patterns and taking the appropriate action? Could you think about writing a scanner for an XML file?

What macro definitions come to your mind? Could they be something like the following:

OPENTAG = <
CLOSETAG = >
EMPTYCLOSETAG = />
REALCLOSETAG = </
NAME =[^ \t\f\b\n\r\"']
VALUE =[^ \t\f\b\n\r\"']
WHITESPACE = [ \t]


To match an XML text pattern such as <domain />, we could write the following rule:

{OPENTAG}{NAME}+{WHITESPACE}*{EMPTYCLOSETAG}

Do try reading the above rule vis-a-vis the macro definitions!
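Reading the rule piece by piece against <domain /> (my own decomposition):

<           OPENTAG
domain      {NAME}+
(space)     {WHITESPACE}*
/>          EMPTYCLOSETAG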

Thus, any file having some constant text pattern can be read pretty easily with a lexer and passed to the parser as tokens. Writing an ordinary program to do the same task could be quite complicated, error-prone and inflexible. Herein lies the advantage of lexer/parser. Enjoy reading further! :-)

We all know the Internet is an unimaginably large repository of knowledge. Don't we want to use this knowledge in an effective manner, to our best possible advantage? But how could we do that? Webpages are so ugly with their markup. Well, you can get rid of it with the help of regular expressions in the form of macro definitions in a scanner. Use only the information that you want. Say, for example, you want to get price information from a collection of websites on a weekly basis because you are training some neural network for prediction of e-market prices. How do you get that price information? Doesn't it sound complex? Well... the answer could be lexer/parser...! :-)

For all your suggestions/feedback/queries, please drop an email to an_0916@hotmail.com. Thank you!

Macro definitions and regular expression related to Lexers


Macro definitions form part of the declarations. Macro definitions are basically regular expressions. Say there are two macro definitions, A and B, that are to be used to match a particular pattern. One or more macro definitions together can also match a particular pattern.

Example:
A = [^ \t\n\r]
B = \n\r
C = [ \t]
D = "="

Now, the above four macro definitions can be used to match the following text pattern:

user=aj
password=aj

So, how should the above macro definitions A, B, C, D come together to match the text pattern mentioned above? Probably in the following manner:

<state> {A}+{C}*{D}{A}+{B}

The above reads in the following manner:

one or more characters matching {A}+, followed by zero or more spaces identified by {C}*, followed by the = sign identified by {D}, followed by one or more characters identified by {A}+, followed by a new line identified by {B}.

The following operators form a very important part of macro definitions used in matching various kinds of text patterns:

A | B : Matches all input that is matched by A or by B

A B : Matches the input matched by A followed by the input matched by B.

A* : Matches zero or more repetitions of the input matched by A

A+ : Matches input matched by at least one A

A? : Matches the input matched by zero or one A

!A : Matches everything but the strings matched by A.

~A : Matches everything up to the first occurrence of A.

A{n} : Matches n occurrences of A. Thus, A{3} represents 3 occurrences of A.

A{n,m} : Matches at least n and at most m occurrences of A. Thus, A{3,5} represents at least 3 and at most 5 occurrences of A.

(A) : Matches A; the parentheses are used for grouping.



As patterns are matched, some action can be associated with them. This can be represented in the following manner (continuing the above example):

<state> {A}+{C}*{D}{A}+{B} { action code... for example: printf("name-value pair separated by the = separator\n"); }

What <state> means will be explained a bit later. Hope you have got it so far!

Just going a bit out of the context... Human brain vs Parsers


Did we realize that our human brain at its core has a very intelligent lexer/parser? Whatever we see or feel, the lexical analyser of our brain passes this information in the form of tokens to the syntactic analyser, which takes action based on the training provided to it over a period of years. That indicates a beautiful relationship between the parser in our brain and the neural networks that train our brain according to the input!

Is it ringing some bells relating to association of neural networks/parser in real world applications? Could we use this combination? Can we think of any such useful application? :-)

Tuesday, April 06, 2004

Example relating to writing lexer:



Say the target language is Java, the lexer tool used is JFlex, and the parser tool used is BYacc/J.

An HTML page consists of markup/attributes etc. It is required to find all the hyperlinks and store them in an ArrayList.

The DEFINITION part would look something like the following:

%{

import java.io.*;

%}

HREF = "href"
HYPERLINK = http://[^ \t\n\r]+

%%

HREF { return HREF; }
HYPERLINK { return HYPERLINK; }

%%

A lexer file basically looks like the following:


DEFINITION
%%
RULES
%%
USER CODE

The definition section has two parts:
1. Code written in the target language, enclosed between %{ ... %}
2. Macro definitions: regular expressions defining the classes of tokens

Monday, April 05, 2004

Let us try and understand how it works:


There are two important functions: yylex() and yyparse(). The lexical analyser does its analysis using patterns that match strings in the input. The matched strings are then converted and consumed as tokens in yylex(). Tokens are basically numerical representations of strings.

The lexer supplies tokens to the syntactic analyser, or parser, generated by Yacc. Grammar rules are used to analyse the tokens from lex and create a syntax tree. Yacc reads the grammar description and generates the parser, the function yyparse().

Thus, as a flow, a main function is created that calls yyparse(). yyparse() function calls yylex() to obtain each token.

main() - calls --> yyparse() - calls --> yylex()

In all of the above, you just need to create a main function and call the yyparse() function. The yyparse() and yylex() functions are generated by the Yacc and Lex tools respectively.
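A minimal sketch of such a driver (assuming the generated files lex.yy.c and y.tab.c are compiled and linked together with it):

/* main.c -- link with the lex- and yacc-generated sources */
#include <stdio.h>

int yyparse(void);          /* generated by yacc */

int main(void)
{
    return yyparse();       /* yyparse() calls yylex() for each token */
}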

One example application of lex and yacc

is a web crawler! Let's see how!

Say a Java application retrieves a web page from the net. The lexical analyser searches this webpage, with its various markup, for hyperlinks. The following could be written to filter out hyperlinks:

href=\"[a-zA-Z0-9:/_]+\"
{
code... goes here
return "URL"
}

The equivalent parser code for handling the token could look like the following:

file : URL
     {
         /* go to this link..! */
     }
     ;

The example above is just presented to give an idea!

One of the important pre-requisites for writing Lexers



One of the important pre-requisites for writing a lexical analyser is to have a solid understanding of regular expressions.

Then, there are other concepts that one must understand. They are the following:
1. Various approaches to write lexer
2. Data structure Stack
3. Various methodologies in parsers.

Lexer and parser work side by side! In very simple words, the lexer passes tokens to the parser, and the parser takes action based on rules formed around these tokens.

For example, in the addition example, if the lexer passed the following three tokens

NUMBER ADD NUMBER

and if the parser has the rule

NUMBER ADD NUMBER {
    add($1.ival, $3.ival);    /* ival holds the integer value of each NUMBER token */
}

the parser would call the add(int, int) function to add the numbers.
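For completeness, here is a minimal classic yacc (C) sketch of the same idea; the token names are my own, and a lex-generated yylex() is assumed to supply NUMBER tokens with their integer values (the default YYSTYPE of int is used):

%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}

%token NUMBER ADD

%%
expr : NUMBER ADD NUMBER   { printf("sum = %d\n", $1 + $3); }
     ;
%%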

Lex and Yacc Utilities



There are several lex and yacc utilities for languages such as C and Java. For example, for Java, one could use JFlex as the lexer and BYacc/J as the parser utility!

Lex and Yacc utilities play an important part in writing compilers.

For example, let's say the C source code contains the following statement:

i = j + k;

During compilation, the statement is first broken into tokens (token_i, =, token_j, +, token_k) and then converted into assembly code in the following manner:

load token_j
add token_k
store token_i


To do the above, the lexical analyser (lexer) and the parser work in the following manner:

The lexer recognizes the tokens and stores them in a table. The parser then reads each of the tokens and takes the appropriate action. Thus, in the above example, the parser sees the + sign between two tokens, so it generates the assembly output shown above to add them and store the result in another variable (register).
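As a hedged sketch (rule and token names are my own, assuming the token values carry the identifier names, e.g. YYSTYPE declared as char *) of how a yacc rule could emit exactly that assembly:

stmt : IDENT '=' IDENT '+' IDENT ';'
       {
           /* $1, $3 and $5 carry the names of i, j and k */
           printf("load %s\n",  $3);
           printf("add %s\n",   $5);
           printf("store %s\n", $1);
       }
     ;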