# Class 9: Grammars and parsing, continued

Held Wednesday, September 16, 1998

Notes

• Next Tuesday (Sept. 22), representatives of Iowa State's Dept of Electrical and Computer Engineering will be visiting campus to recruit students for their graduate program. If you're interested in going to grad school, it may be worth chatting with their recruiters. There's a poster at the north end of the NS hallway.
• Next Tuesday (Sept. 22) at 11am Prof. Vikram Dalal from ISU will be giving a talk on microelectronics. There is also a lunch excursion for students only. Check with Paul Weber if you're interested in going.
• Are there questions on the readings?
• Are there questions on the homework assignments?
• Oh no! Time to take another quiz! (which won't be ready until right before class).
• What were your results for the take-home assignment? I'm particularly interested in how you proved your answers correct.
• Those of you who survived CS302 will find parts of today's class familiar. I apologize for the duplication, but I think this is an important topic to cover.

## Context free grammars

• The most common form of BNF grammar is the so-called context-free grammar.
• In context-free grammars, the left-hand-sides of productions are always single nonterminals.
• When parsing with context-free grammars, we can build parse trees that prove that the string is in the language.
• The root of a parse tree is the start symbol.
• The leaves of a parse tree are the tokens of the string (in order).
• Each node and its down edges correspond to a production.
• You can build the tree starting with the start symbol and working your way down, or starting with the tokens and working your way up.
• Can you write a context-free grammar for ``strings of a's and b's with equal numbers of a's and b's''?
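One candidate answer, using a hypothetical start symbol S (proving that this grammar generates exactly the strings in question is the interesting part of the exercise):

```
S ::=                  -- the empty string
S ::= 'a' S 'b' S
S ::= 'b' S 'a' S
```

The intuition is that every such string starts with some letter, and that letter is eventually "balanced" by the first later position where the counts even out.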

## Predictive parsing

• In predictive parsing, you build the tree from the top down, starting at the left and going to the right.
• It is easiest to think of a predictive parser as having a `parse` function for each nonterminal.
• For example, we might write the `parse` function for a program as
```
-- Find whether t represents a program.  Returns true if it does
-- and false otherwise.  Modifies the token stream.
boolean parseProgram(TokenStream t) {
  -- Every program must begin with the program keyword
  if (t.first() != 'program') then return false;
  -- Consume the program token
  t.deleteFirst();
  -- Check and consume the program name
  if (t.first() != identifier) then return false;
  t.deleteFirst();
  -- Check and consume the left paren
  if (t.first() != lparen) then return false;
  t.deleteFirst();
  -- Get the identifier list
  if (!parseIDList(t)) then return false;
  -- Check and consume the right paren
  if (t.first() != rparen) then return false;
  t.deleteFirst();
  -- The declaration list
  if (!parseDecList(t)) then return false;
  -- And the compound statement
  if (!parseCompound(t)) then return false;
  -- The period and we're done
  if (t.first() != period) then return false;
  t.deleteFirst();
  return true;
} -- parseProgram
```
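The pseudocode above can also be written as a runnable sketch in Python. The `TokenStream` class, the token names, and the stubbed-out helper parsers below are assumptions made for illustration; they are not the course's actual code.

```python
# A minimal, runnable sketch of the parseProgram pseudocode above.
# Tokens are (kind, text) pairs; the stub helpers accept only the
# simplest inputs.  All names here are illustrative assumptions.

class TokenStream:
    """A trivial token stream backed by a list of (kind, text) pairs."""
    def __init__(self, tokens):
        self.tokens = list(tokens)

    def first(self):
        """Return the kind of the first token, or None when empty."""
        return self.tokens[0][0] if self.tokens else None

    def delete_first(self):
        """Consume the first token."""
        self.tokens.pop(0)

def parse_id_list(t):
    # Stub: accept a single identifier; a real parser would also
    # handle comma-separated lists.
    if t.first() != 'identifier':
        return False
    t.delete_first()
    return True

def parse_dec_list(t):
    # Stub: treat the declaration list as empty.
    return True

def parse_compound(t):
    # Stub: treat the compound statement as empty.
    return True

def parse_program(t):
    """Find whether t represents a program, consuming tokens as it goes."""
    # Every program must begin with the program keyword.
    if t.first() != 'program':
        return False
    t.delete_first()
    # Check and consume the program name.
    if t.first() != 'identifier':
        return False
    t.delete_first()
    # Check and consume the left paren.
    if t.first() != 'lparen':
        return False
    t.delete_first()
    # Get the identifier list.
    if not parse_id_list(t):
        return False
    # Check and consume the right paren.
    if t.first() != 'rparen':
        return False
    t.delete_first()
    # The declaration list and the compound statement.
    if not parse_dec_list(t):
        return False
    if not parse_compound(t):
        return False
    # The period, and we're done.
    if t.first() != 'period':
        return False
    t.delete_first()
    return True
```

For example, a stream of the kinds `program identifier lparen identifier rparen period` parses successfully, while a stream starting with anything other than the `program` keyword fails immediately.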

## An Expression Grammar

• Let us consider a grammar one might use to define simple arithmetic expressions over variables and numbers.
• Each number is an expression. We get numbers from the lexer.
```
Exp ::= number
```
• Each identifier is an expression (we won't worry about types right now).
```
Exp ::= identifier
```
• Each application of a unary operator to an expression is an expression.
```
Exp ::= UnOp Exp
```
• The legal unary operators are the plus symbol and the minus symbol.
```
UnOp ::= '+'
UnOp ::= '-'
```
• The application of a binary operator to two expressions is an expression.
```
Exp ::= Exp BinOp Exp
```
• And there are a number of binary operators. For shorthand, we'll write a vertical bar (representing "or") to separate alternatives.
```
BinOp ::= '+' | '-' | '*' | '/'
```
• A parenthesized expression is an expression
```
Exp ::= '(' Exp ')'
```
• How do we show that `a+b*2` is an expression? First, we observe that the lexer converts this to
`identifier + identifier * number`
• The derivation follows. A right arrow (=>) is used to indicate one derivation step.
```
Exp =>                                   // Exp ::= Exp BinOp Exp
Exp BinOp Exp =>                         // Exp ::= identifier
identifier BinOp Exp =>                  // BinOp ::= '+'
identifier + Exp =>                      // Exp ::= Exp BinOp Exp
identifier + Exp BinOp Exp =>            // Exp ::= number
identifier + Exp BinOp number =>         // Exp ::= identifier
identifier + identifier BinOp number =>  // BinOp ::= '*'
identifier + identifier * number
```
• Note that we can have multiple choices at each step. For example, we might have expanded the second `Exp` rather than the first in `Exp BinOp Exp`.
• To describe these simultaneous choices, we often write visual descriptions of context-free derivations using trees. The interior nodes of a tree are the nonterminals. A derivation is shown by connecting a node (representing the lhs) to children representing the rhs of a derivation.
```
                  Exp
                /  |  \
             Exp BinOp  Exp
              |    |   / | \
      identifier   +  Exp BinOp Exp
                       |    |    |
                 identifier *  number
```

### Ambiguity

• As in many other areas of computer science, careless grammar design can lead to ambiguity in understanding (and parsing) the objects we describe.
• An ambiguous grammar is one that provides multiple parse trees for the same string.
• Why is this a problem? Often the parse trees provide a natural mechanism for evaluating, compiling, understanding, or otherwise using the parsed expression.
• For example, one might evaluate parsed expressions by evaluating the subtrees and then applying the appropriate operation. We might get different results if we had different parse trees.
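To make this concrete, here is a small Python sketch (using a hypothetical nested-tuple representation of parse trees) that evaluates the two trees the ambiguous grammar permits for `a+b*2`, with a = 1 and b = 2:

```python
# Two parse trees for a+b*2 under the ambiguous grammar, represented
# as nested tuples (op, left, right); leaves are plain numbers.
# This representation is an illustrative assumption, not the
# course's actual data structure.
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def evaluate(tree):
    # A leaf is a plain number; an interior node applies its
    # operator to the values of its two subtrees.
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

a, b = 1, 2
# Tree 1: a + (b * 2) -- the multiplication was grouped first.
tree1 = ('+', a, ('*', b, 2))
# Tree 2: (a + b) * 2 -- the addition was grouped first.
tree2 = ('*', ('+', a, b), 2)

evaluate(tree1)  # 5
evaluate(tree2)  # 6
```

Same string, two trees, two different answers: exactly the problem ambiguity causes.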
• How do we get around ambiguity?
• We identify potential areas of ambiguity (often, using tools that tell us about such ambiguities).
• We decide which parse tree is correct for our intended meaning.
• We rewrite our grammar to eliminate the ambiguity.
• We ensure that the new grammar describes an identical language to the old grammar.
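For the expression grammar above, one standard rewrite (a sketch; there are other ways to do it) layers the grammar so that '*' and '/' group more tightly than '+' and '-':

```
Exp    ::= Exp AddOp Term | Term
Term   ::= Term MulOp Factor | Factor
Factor ::= number | identifier | UnOp Factor | '(' Exp ')'
AddOp  ::= '+' | '-'
MulOp  ::= '*' | '/'
```

With this grammar, `a+b*2` has only one parse tree, in which the multiplication is a subtree of the addition.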


Disclaimer Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.