# Class 10: An Expression Grammar

Held Friday, September 18, 1998

Notes

• Next Tuesday at 4:15 Yuriy will be presenting a cool talk on his summer research.
• Sunday the 27th at noon is the math/cs picnic. Come and have good food.
• Once again, I apologize to the survivors of CS302 for any duplication in today's class.

## An Expression Grammar

• Let us consider a grammar one might use to define simple arithmetic expressions over variables and numbers.
• Each number is an expression. We get numbers from the lexer.
```
Exp ::= number
```
• Each identifier is an expression (we won't worry about types right now).
```
Exp ::= identifier
```
• Each application of a unary operator to an expression is an expression.
```
Exp ::= UnOp Exp
```
• The legal unary operators are the plus symbol and the minus symbol.
```
UnOp ::= '+'
UnOp ::= '-'
```
• The application of a binary operator to two expressions is an expression.
```
Exp ::= Exp BinOp Exp
```
• And there are a number of binary operators. As shorthand, we'll write a vertical bar (representing "or") to separate alternatives.
```
BinOp ::= '+' | '-' | '*' | '/'
```
• A parenthesized expression is an expression.
```
Exp ::= '(' Exp ')'
```
• How do we show that `a+b*2` is an expression? First, we observe that the lexer converts this to
`identifier + identifier * number`
• The derivation follows. A right arrow (=>) is used to indicate one derivation step. The comment on each line names the production applied to reach the next line.
```
Exp =>                                  // Exp ::= Exp BinOp Exp
Exp BinOp Exp =>                        // Exp ::= identifier
identifier BinOp Exp =>                 // BinOp ::= '+'
identifier + Exp =>                     // Exp ::= Exp BinOp Exp
identifier + Exp BinOp Exp =>           // Exp ::= number
identifier + Exp BinOp number =>        // Exp ::= identifier
identifier + identifier BinOp number => // BinOp ::= '*'
identifier + identifier * number
```
• Note that we can have multiple choices at each step. For example, we might have expanded the second `Exp` rather than the first in `Exp BinOp Exp`.
• To describe these simultaneous choices, we often write visual descriptions of context-free derivations using trees. The interior nodes of a tree are the nonterminals and the leaves are the terminals. A derivation step is shown by connecting a node (representing the lhs of a production) to children representing its rhs.
```
                   Exp
                 /  |  \
              /     |     \
          Exp     BinOp     Exp
           |        |    /   |   \
      identifier    + Exp  BinOp  Exp
                       |     |     |
                  identifier *  number
```
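The derivation above can be replayed mechanically. The following is a minimal sketch, assuming a sentential form is stored as a Python list of symbols; the names `rewrite`, `derive`, and `STEPS` are hypothetical and chosen only for this illustration. Each step says which position to rewrite and which production to use, in the same order as the derivation.

```python
def rewrite(form, index, lhs, rhs):
    """Replace the nonterminal at position `index` with the rhs of a production."""
    if form[index] != lhs:
        raise ValueError(f"expected {lhs} at position {index}, found {form[index]}")
    return form[:index] + list(rhs) + form[index + 1:]


# (position in the current form, lhs, rhs) for each derivation step.
STEPS = [
    (0, "Exp", ["Exp", "BinOp", "Exp"]),
    (0, "Exp", ["identifier"]),
    (1, "BinOp", ["+"]),
    (2, "Exp", ["Exp", "BinOp", "Exp"]),
    (4, "Exp", ["number"]),
    (2, "Exp", ["identifier"]),
    (3, "BinOp", ["*"]),
]


def derive(start, steps):
    """Return every sentential form produced by applying the steps in order."""
    form = [start]
    history = [form]
    for index, lhs, rhs in steps:
        form = rewrite(form, index, lhs, rhs)
        history.append(form)
    return history


for form in derive("Exp", STEPS):
    print(" ".join(form))
```

Changing the order of the steps (for example, expanding the second `Exp` first) gives a different derivation that can still end in the same string.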

### Ambiguity

• As in many other areas of computer science, careless grammar design can lead to ambiguity in understanding (and parsing) the objects we describe.
• An ambiguous grammar is one that provides multiple parse trees for at least one string.
• Why is this a problem? Often the parse trees provide a natural mechanism for evaluating, compiling, understanding, or otherwise using the parsed expression.
• For example, one might evaluate parsed expressions by evaluating the subtrees and then applying the appropriate operation. We might get different results if we had different parse trees.
• How do we get around ambiguity?
• We identify potential areas of ambiguity (often, using tools that tell us about such ambiguities).
• We decide which parse tree is correct for our intended meaning.
• We rewrite our grammar to eliminate the ambiguity.
• We ensure that the new grammar describes an identical language to the old grammar.

### A Revised Expression Grammar

• The standard expression grammar is ambiguous. Why? Because there are multiple ways to parse expressions with two operations.
• Consider `num + num * num`.
• It might be parsed (in shorthand) as
```
        Exp
       / | \
      /  |  \
   Exp   +   Exp
    |       / | \
    |      /  |  \
   num   Exp  *  Exp
          |       |
          |       |
         num     num
```
• It might also be parsed as
```
            Exp
           / | \
          /  |  \
       Exp   *   Exp
      / | \       |
     /  |  \      |
  Exp   +   Exp  num
   |         |
   |         |
  num       num
```
• Similarly, `num - num - num` might be parsed with two different trees.
• Is this a problem? Yes, in both cases.
• Which tree is correct?
• In the first case, it's the first tree, which shows us doing addition after multiplication. This is because multiplication has precedence over addition.
• In the second case, it's the tree that does the second subtraction second, grouping the expression as `(num - num) - num`. This is because subtraction is left associative.
• How do we resolve the problems? It turns out that it's easiest to resolve the two problems separately.
• We're going to ignore the unary operators for this discussion.
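To see why the choice of tree matters, here is a small sketch that evaluates both trees for each example. The concrete numbers (`2 + 3 * 4` and `8 - 3 - 2`), the tuple encoding of trees, and the `evaluate` function are assumptions made for illustration; they do not come from the grammar itself.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(tree):
    """Evaluate a tree that is either a number or a (left, op, right) tuple."""
    if isinstance(tree, tuple):
        left, op, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree


# Two trees for 2 + 3 * 4.
multiply_first = (2, "+", (3, "*", 4))
add_first = ((2, "+", 3), "*", 4)

# Two trees for 8 - 3 - 2.
left_grouped = ((8, "-", 3), "-", 2)
right_grouped = (8, "-", (3, "-", 2))

print(evaluate(multiply_first), evaluate(add_first))  # 14 20
print(evaluate(left_grouped), evaluate(right_grouped))  # 3 7
```

Only the first tree of each pair gives the conventional answer, which is why the grammar should admit that tree and no other.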

### Adding Precedence to the Expression Grammar

• How do we handle precedence?
• By considering what's wrong with the misparsed tree.
• By dividing expressions up into categories to eliminate such problems.
• What's wrong with that tree?
• There's a subtree of a "multiplication tree" that contains unparenthesized addition.
• What does this suggest about categories? That we need a category for "does not include unparenthesized addition". We'll call this category `Term`.
• What is a `Term`? Something that may include multiplication but does not include unparenthesized addition.
• What other categories do we need? We also need a category for stuff that can include addition.
• It could be "stuff that includes unparenthesized addition". We might call this `AddExp`.
• It could be "stuff that may (but need not) include unparenthesized addition".
• Which do you prefer? We'll use the second, since it's close to our definition of `Term`.
• We observe that there are two possibilities for `Term`.
• Something that includes multiplication as the "top level" operation.
• Something that doesn't include multiplication as the "top level" operation. We'll call this `Factor`.
• What expressions include multiplication as the top level operation? We need multiplication. We also need safe "arguments". But we've defined such safe arguments as `Term`s.
```
Term ::= Term MulOp Term
MulOp ::= '*' | '/'
```
• What are the other `Term`s? Those that don't include multiplication. We've already called those `Factor`s.
```
Term ::= Factor
```
• What are the factors? The expressions that include parentheses and the base expressions.
```
Factor ::= '(' Exp ')'
Factor ::= num
Factor ::= id
```
• Now, let's return to general expressions. These may or may not include addition at the root.
• As with `Term`, we can separate them into those that do and those that don't.
```
Exp ::= Exp AddOp Exp
Exp ::= Term
AddOp ::= '+' | '-'
```
• Have we eliminated the problem with precedence? Yes. If we try to build the incorrect tree for our original expression, we'll find that the grammar no longer allows it: a `Term` cannot derive the unparenthesized `num + num`.
• Do we have the same language? Informally, yes. It's clear that we haven't added any strings. Have we removed any? No, but I wouldn't want to try to argue that.

### Adding Associativity to the Expression Grammar

• Have we solved the associativity problem? No. You'll observe that we can still misparse `num - num - num` (although with a different tree).
• How do we handle associativity?
• By considering which tree is correct.
• By considering why the incorrect tree is incorrect.
• By trying to change the grammar to eliminate the incorrect tree.
• Why is the incorrect tree incorrect? Because there is unparenthesized subtraction to "the right of" subtraction.
• What should we do? Come up with a category for "no unparenthesized subtraction".
• Do we have such a category? Yes, we called it `Term`.
```
Exp ::= Exp AddOp Term
```
• Are there other similar areas we must worry about? Yes, we'll need to worry about repeated division (or multiplication). We can use the same strategy.
```
Term ::= Term MulOp Factor
```
• Have we changed the language? No, but the proof is more difficult.
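One way to check the claims of the last two sections on small inputs is to count parse trees by brute force. The sketch below is an illustration, not a tool from the course: `count_trees` and the dictionary encoding of the grammars are hypothetical, `AddOp` is assumed to be `'+' | '-'` by analogy with `MulOp`, and unary operators are left out as in the discussion above. It relies on these grammars having no empty right-hand sides and no cycles of single-symbol rules.

```python
from functools import lru_cache


def count_trees(grammar, start, tokens):
    """Count the parse trees that derive `tokens` from `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of trees rooted at `symbol` that cover tokens[i:j].
        if symbol not in grammar:  # terminal: must match exactly one token
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(match(tuple(rhs), i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def match(rhs, i, j):
        # Number of ways the symbols of `rhs` split tokens[i:j] among them.
        first, rest = rhs[0], rhs[1:]
        if not rest:
            return derive(first, i, j)
        total = 0
        # Leave at least one token for every remaining symbol.
        for k in range(i + 1, j - len(rest) + 1):
            left = derive(first, i, k)
            if left:
                total += left * match(rest, k, j)
        return total

    return derive(start, 0, len(tokens))


ORIGINAL = {
    "Exp": [["num"], ["id"], ["Exp", "BinOp", "Exp"], ["(", "Exp", ")"]],
    "BinOp": [["+"], ["-"], ["*"], ["/"]],
}
SHARED = {
    "Factor": [["(", "Exp", ")"], ["num"], ["id"]],
    "AddOp": [["+"], ["-"]],
    "MulOp": [["*"], ["/"]],
}
PRECEDENCE_ONLY = {
    "Exp": [["Exp", "AddOp", "Exp"], ["Term"]],
    "Term": [["Term", "MulOp", "Term"], ["Factor"]],
    **SHARED,
}
BOTH_CHANGES = {
    "Exp": [["Exp", "AddOp", "Term"], ["Term"]],
    "Term": [["Term", "MulOp", "Factor"], ["Factor"]],
    **SHARED,
}

for name, grammar in [("original", ORIGINAL),
                      ("precedence only", PRECEDENCE_ONLY),
                      ("both changes", BOTH_CHANGES)]:
    mixed = count_trees(grammar, "Exp", "num + num * num".split())
    repeated = count_trees(grammar, "Exp", "num - num - num".split())
    print(name, mixed, repeated)
```

The original grammar gives two trees for each string, the precedence-only grammar gives one tree for `num + num * num` but still two for `num - num - num`, and the grammar with both changes gives one tree for each. A count on a few strings is evidence, not a proof that the grammar is unambiguous.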

### The Final Grammar

• Here is our unambiguous grammar, so far.
```
Exp    ::= Exp AddOp Term
        |  Term
Term   ::= Term MulOp Factor
        |  Factor
Factor ::= '(' Exp ')'
        |  num
        |  id
AddOp  ::= '+' | '-'
MulOp  ::= '*' | '/'
```
• We could further extend this grammar in many ways.
• Include the unary operators. Note that at most one unary operator is typically used (e.g., `-+num` is meaningless, as is `---num`).
• Include exponentiation. Is exponentiation left or right associative?
• Include other operations.
• Function calls.
• ...
• These extensions are left as an exercise to the reader.
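Returning to the final grammar (without the extensions), here is a rough sketch of a hand-written parser that builds the tree the grammar describes. The names `parse`, `factor`, `term`, and `exp`, the nested-tuple tree format, and the decision to turn each left-recursive rule into a loop are all assumptions of this sketch; these notes do not say how to build a parser for the grammar. Only the operator nodes are kept, so the chains `Exp`, `Term`, `Factor` are collapsed.

```python
def parse(tokens):
    """Parse a list of tokens; return a nested (left, op, right) tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        # Factor ::= '(' Exp ')' | num | id
        token = take()
        if token == "(":
            tree = exp()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if token is None or token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token

    def term():
        # Term ::= Term MulOp Factor | Factor
        # Read as: a Factor followed by any number of (MulOp Factor) pairs,
        # grouped to the left.
        tree = factor()
        while peek() in ("*", "/"):
            op = take()
            tree = (tree, op, factor())
        return tree

    def exp():
        # Exp ::= Exp AddOp Term | Term
        # Read as: a Term followed by any number of (AddOp Term) pairs,
        # grouped to the left.
        tree = term()
        while peek() in ("+", "-"):
            op = take()
            tree = (tree, op, term())
        return tree

    tree = exp()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("num + num * num".split()))
print(parse("num - num - num".split()))
```

The first call groups the multiplication below the addition, and the second groups the first subtraction below the second, matching the trees chosen as correct earlier.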


History

• Created Friday, September 18, 1998. Parts taken from the outline of the previous class (everything on the expression grammar).
• Monday, September 21, 1998. Moved discussion of recursive-descent parsing to the next outline.

Disclaimer Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.