Programming Languages (CSC-302 98S)


# Outline of Class 6: An Expression Grammar

Held: Friday, January 30, 1998

• Randy Smith of Torrent Systems, Inc., a "Parallel Software Infrastructure Company", will give a talk entitled The Programming They Don't Teach You In College on Monday, February 9, as part of the noon brown-bag lunch series. He'll also meet with students at 1pm in the forum for an informal lunch. Mr. Smith has worked for Thinking Machines, the Open Software Foundation, and the Free Software Foundation. Sign up with Sam Rebelsky if you'd like to meet with Mr. Smith.
• As some of you may know, the MathLAN was down from early yesterday evening until about 9 a.m. this morning. As you might guess, that negatively impacted my ability to prepare online notes for today's class, but I've tried to get most of the stuff out here.
• Don't forget Monday's brown-bag lunch on "More Java".
• A number of you have been sick over the past few days and haven't turned in your answers to assignment 1. Since I prefer to grade assignments en masse, I haven't begun grading.
• Assignment 2 is now ready.
• The annual meeting on summer research opportunities in math and CS will be held Thursday at 4:15 in Sci 2424.

## An Expression Grammar

• Let us consider a grammar one might use to define simple arithmetic expressions over variables and numbers.
• Each number is an expression. We get numbers from the lexer.
```
Exp ::= number
```

• Each identifier is an expression (we won't worry about types right now).
```
Exp ::= identifier
```

• Each application of a unary operator to an expression is an expression.
```
Exp ::= UnOp Exp
```

• The legal unary operators are the plus symbol and the minus symbol.
```
UnOp ::= '+'
UnOp ::= '-'
```

• The application of a binary operator to two expressions is an expression.
```
Exp ::= Exp BinOp Exp
```

• And there are a number of binary operators. As shorthand, we'll write a vertical bar (representing "or") to separate alternatives.
```
BinOp ::= '+' | '-' | '*' | '/'
```

• A parenthesized expression is an expression.
```
Exp ::= '(' Exp ')'
```

• How do we show that `a+b*2` is an expression? First, we observe that the lexer converts this to
`identifier + identifier * number`
• The derivation follows. A right arrow (=>) is used to indicate one derivation step. The comment on each line names the production applied to reach the next line.
```
Exp =>                                   // Exp ::= Exp BinOp Exp
Exp BinOp Exp =>                         // Exp ::= identifier
identifier BinOp Exp =>                  // BinOp ::= '+'
identifier + Exp =>                      // Exp ::= Exp BinOp Exp
identifier + Exp BinOp Exp =>            // Exp ::= number
identifier + Exp BinOp number =>         // Exp ::= identifier
identifier + identifier BinOp number =>  // BinOp ::= '*'
identifier + identifier * number
```

• Note that we can have multiple choices at each step. For example, we might have expanded the second `Exp` rather than the first in `Exp BinOp Exp`.
• To describe these simultaneous choices, we often draw context-free derivations as trees. The interior nodes of a tree are the nonterminals. A derivation step is shown by connecting a node (representing the left-hand side of a production) to children representing its right-hand side.
```
           Exp
         /  |  \
       /    |     \
    Exp   BinOp     Exp
     |      |      / | \
     |      |    /   |   \
identifier  + Exp  BinOp  Exp
               |     |     |
               |     |     |
        identifier   *   number
```
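• The tree above can also be written down as data and checked mechanically. What follows is only a minimal sketch: the tuple representation and the names `PRODUCTIONS`, `TREE`, `leaves`, and `is_valid` are assumptions made for illustration, not part of the grammar notation itself.

```python
# Productions of the expression grammar, as (lhs, rhs) pairs.
PRODUCTIONS = {
    ("Exp", ("number",)),
    ("Exp", ("identifier",)),
    ("Exp", ("UnOp", "Exp")),
    ("Exp", ("Exp", "BinOp", "Exp")),
    ("Exp", ("(", "Exp", ")")),
    ("UnOp", ("+",)), ("UnOp", ("-",)),
    ("BinOp", ("+",)), ("BinOp", ("-",)),
    ("BinOp", ("*",)), ("BinOp", ("/",)),
}

# Hypothetical encoding: an interior node is (label, [children]);
# a leaf (terminal) is a plain string.
TREE = ("Exp", [
    ("Exp", ["identifier"]),
    ("BinOp", ["+"]),
    ("Exp", [
        ("Exp", ["identifier"]),
        ("BinOp", ["*"]),
        ("Exp", ["number"]),
    ]),
])


def label_of(node):
    """Return the grammar symbol at the root of a node."""
    return node if isinstance(node, str) else node[0]


def leaves(tree):
    """Return the terminals at the leaves, read left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, children = tree
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


def is_valid(tree):
    """Check that every interior node matches some production."""
    if isinstance(tree, str):
        return True
    label, children = tree
    rhs = tuple(label_of(child) for child in children)
    return (label, rhs) in PRODUCTIONS and all(is_valid(c) for c in children)


print(" ".join(leaves(TREE)))  # identifier + identifier * number
print(is_valid(TREE))          # True
```

• Reading the leaves from left to right gives back the token sequence produced by the lexer, and every interior node corresponds to one derivation step.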

## Ambiguity

• As in many other areas of computer science, careless grammar design can lead to ambiguity in understanding (and parsing) the objects we describe.
• An ambiguous grammar is one that provides more than one parse tree for at least one string.
• Why is this a problem? Often the parse trees provide a natural mechanism for evaluating, compiling, understanding, or otherwise using the parsed expression.
• For example, one might evaluate parsed expressions by evaluating the subtrees and then applying the appropriate operation. We might get different results if we had different parse trees.
• How do we get around ambiguity?
• We identify potential areas of ambiguity (often, using tools that tell us about such ambiguities).
• We decide which parse tree is correct for our intended meaning.
• We rewrite our grammar to eliminate the ambiguity.
• We ensure that the new grammar describes the same language as the old grammar.
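• One way to see an ambiguity concretely is to count parse trees. The sketch below is a brute-force count for a cut-down grammar, `Exp ::= Exp BinOp Exp | num`, with parentheses and unary operators left out. The function name `count_parse_trees` is hypothetical, and this is not how real ambiguity-reporting tools work; it only illustrates the definition.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees for `tokens` under Exp ::= Exp BinOp Exp | num."""
    tokens = tuple(tokens)
    operators = {"+", "-", "*", "/"}

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of distinct ways tokens[lo:hi] can be derived from Exp.
        if hi - lo == 1:
            return 1 if tokens[lo] == "num" else 0
        total = 0
        # Try every operator in the span as the BinOp at the root.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in operators:
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["num"]))                                      # 1
print(count_parse_trees(["num", "+", "num", "*", "num"]))              # 2
print(count_parse_trees(["num", "-", "num", "-", "num", "-", "num"]))  # 5
```

• Any count greater than one shows that the grammar is ambiguous for that string.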

## A Revised Expression Grammar

• The standard expression grammar is ambiguous. Why? Because there are multiple ways to parse expressions with two operations.
• Consider `num + num * num`.
• It might be parsed (in shorthand) as
```
       Exp
      / | \
     /  |  \
   Exp  +  Exp
    |     / | \
    |    /  |  \
   num Exp  *  Exp
        |       |
        |       |
       num     num
```

• It might also be parsed as
```
         Exp
        / | \
       /  |  \
     Exp  *  Exp
    / | \     |
   /  |  \    |
 Exp  +  Exp num
  |       |
  |       |
 num     num
```

• Similarly, `num - num - num` might be parsed with two different trees.
• Is this a problem? Yes, in both cases.
• Which tree is correct?
• In the first case, it's the first tree, which shows us doing addition after multiplication. This is because multiplication has precedence over addition.
• In the second case (`num - num - num`), it's the tree shaped like the second tree above, which nests the first subtraction deeper and so shows us doing the second subtraction second. This is because subtraction is left associative.
• How do we resolve the problems? It turns out that it's easiest to resolve the two problems separately.
• We're going to ignore the unary operators for this discussion.
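• To see why the choice of tree matters, we can evaluate both shapes with concrete numbers. This is a small sketch; the tuple encoding `(left, op, right)` and the sample values are assumptions chosen for illustration.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(tree):
    """Evaluate a tree that is either a number or a tuple (left, op, right)."""
    if isinstance(tree, (int, float)):
        return tree
    left, op, right = tree
    # Evaluate the subtrees, then apply the operation at the root.
    return OPS[op](evaluate(left), evaluate(right))


# Two trees for 2 + 3 * 4.
first_tree = (2, "+", (3, "*", 4))    # multiplication is the subtree
second_tree = ((2, "+", 3), "*", 4)   # addition is the subtree
print(evaluate(first_tree), evaluate(second_tree))    # 14 20

# Two trees for 10 - 4 - 3.
left_nested = ((10, "-", 4), "-", 3)   # first subtraction done first
right_nested = (10, "-", (4, "-", 3))  # second subtraction done first
print(evaluate(left_nested), evaluate(right_nested))  # 3 9
```

• The same string gives 14 or 20 in the first case and 3 or 9 in the second, depending only on which parse tree we pick.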

### Adding Precedence to the Expression Grammar

• How do we handle precedence?
• By considering what's wrong with the misparsed tree.
• By dividing expressions up into categories to eliminate such problems.
• What's wrong with that tree?
• There's a subtree of a "multiplication tree" that contains unparenthesized addition.
• What does this suggest about categories? That we need a category for "does not include unparenthesized addition". We'll call this category `Term`.
• What is a `Term`? Something that may include multiplication but does not include unparenthesized addition.
• What other categories do we need? We also need a category for stuff that can include addition.
• It could be "stuff that includes unparenthesized addition". We might call this `AddExp`.
• It could be "stuff that may (but need not) include unparenthesized addition".
• Which do you prefer? We'll use the second, since it's close to our definition of `Term`.
• We observe that there are two possibilities for `Term`.
• Something that includes multiplication as the "top level" operation.
• Something that doesn't include multiplication as the "top level" operation. We'll call this `Factor`.
• What expressions include multiplication as the top level operation? We need multiplication. We also need safe "arguments". But we've defined such safe arguments as `Term`s.
```
Term ::= Term MulOp Term
MulOp ::= '*' | '/'
```

• What are the other `Term`s? Those that don't include multiplication. We've already called those `Factor`s.
```
Term ::= Factor
```

• What are the factors? The expressions that include parentheses and the base expressions.
```
Factor ::= '(' Exp ')'
Factor ::= num
Factor ::= id
```

• Now, let's return to general expressions. These may or may not include addition at the root.
• As with `Term`, we can separate them into those that do and those that don't.
```
Exp ::= Exp AddOp Exp
Exp ::= Term
AddOp ::= '+' | '-'
```

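• Have we eliminated the problem with precedence? If we try to build the incorrect tree for our original expression, we'll find that we can't: the operands of `*` must now be `Term`s, and a `Term` cannot contain unparenthesized addition.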
• Do we have the same language? Informally, yes. It's clear that we haven't added any strings. Have we removed any? No, but I wouldn't want to try to argue that.
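• The layering of `Exp`, `Term`, and `Factor` can be sketched as a small hand-written parser. Several things here are assumptions rather than part of the grammar above: the names `tokenize`, `Parser`, and `parse` are hypothetical, the parser uses loops in place of the recursive `Exp AddOp Exp` and `Term MulOp Term` productions, and it nests repeated operators to the left, which is a choice the grammar at this stage does not yet settle. As in the discussion, unary operators are ignored.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    """Split text into numbers, identifiers, operators, and parentheses."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def parse_exp(self):
        # Exp: Terms joined by AddOps ('+' or '-').
        tree = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.take()
            tree = (op, tree, self.parse_term())
        return tree

    def parse_term(self):
        # Term: Factors joined by MulOps ('*' or '/').
        tree = self.parse_factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            tree = (op, tree, self.parse_factor())
        return tree

    def parse_factor(self):
        # Factor: a parenthesized Exp, a number, or an identifier.
        token = self.take()
        if token == "(":
            tree = self.parse_exp()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token


def parse(text):
    """Parse a whole string as an Exp and return a nested-tuple tree."""
    parser = Parser(tokenize(text))
    tree = parser.parse_exp()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("a + b * 2"))    # ('+', 'a', ('*', 'b', '2'))
print(parse("(a + b) * 2"))  # ('*', ('+', 'a', 'b'), '2')
```

• Because `parse_term` only ever asks for `Factor`s, an addition can end up beneath a multiplication only when it is parenthesized.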

## A Note on Order of Evaluation

• Many people assume that the parse structure completely determines the order of evaluation.
• However, this is not true in most languages.
• Consider `alpha() + beta() * gamma()`
• We know that the result of `beta()` is multiplied by the result of `gamma()` and that product is then added to the result of `alpha()`.
• Do you know what order `alpha()`, `beta()` and `gamma()` are evaluated in? In most languages, the answer is "it's up to the implementer."
• Does it make a difference? In many cases, yes. Consider the following code fragment in a generic language.
```
global integer x = 1;

function zebra
begin
  x = x + 1;
  return x;
end zebra;

...
y = zebra() + zebra() - zebra();
```

• What is the value of `y`?
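• One way to explore the question is to simulate every order in which the three calls could be made. This is only a sketch: the function `run` and the idea of listing operand positions in evaluation order are assumptions for illustration, and the simulation sets the call order explicitly instead of relying on any particular language's rules.

```python
from itertools import permutations


def run(order):
    """Compute zebra() + zebra() - zebra() with the calls made in `order`.

    `order` lists the operand positions (0, 1, 2) in the order in which
    their calls to zebra are evaluated.
    """
    x = 1
    results = [None, None, None]
    for position in order:
        x = x + 1              # body of zebra: increment the global x
        results[position] = x  # and return its new value
    return results[0] + results[1] - results[2]


outcomes = {order: run(order) for order in permutations(range(3))}
for order, y in sorted(outcomes.items()):
    print(order, y)

print(sorted(set(outcomes.values())))  # [1, 3, 5]
```

• Evaluating the operands left to right gives 1 and right to left gives 5, so the value of `y` depends on a choice that the parse tree does not make for us.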


Disclaimer: Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.