Class 10: An Expression Grammar
Held Friday, September 18, 1998
- Next Tuesday at 4:15 Yuriy will be presenting a cool talk on his
- Sunday the 27th at noon is the math/cs picnic. Come and have
- Once again, I apologize to the survivors of CS302 for any duplication
in today's class.
- Let us consider a grammar one might use to define simple arithmetic
expressions over variables and numbers.
- Each number is an expression. We get numbers from the lexer.
Exp ::= number
- Each identifier is an expression (we won't worry about types right now).
Exp ::= identifier
- Each application of a unary operator to an expression is an expression.
Exp ::= UnOp Exp
- The legal unary operators are the plus symbol and the minus symbol.
UnOp ::= '+'
UnOp ::= '-'
- The application of a binary operator to two expressions is an expression.
Exp ::= Exp BinOp Exp
- And there are a number of binary operators. For shorthand, we'll write
a vertical bar (representing "or") to separate alternatives.
BinOp ::= '+' | '-' | '*' | '/'
- A parenthesized expression is an expression.
Exp ::= '(' Exp ')'
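- To make the grammar concrete, here is a minimal sketch of a recognizer that answers "is this token list an Exp?". It assumes the lexer hands us a list of strings, with "number" and "identifier" for the base tokens and the operator and parenthesis characters as themselves; that encoding and the name is_exp are hypothetical. It simply tries every production on every span of the input, so it is slow but follows the rules above one for one.

```python
from functools import lru_cache

# Hypothetical token encoding: "number", "identifier", and the
# operator/parenthesis characters as one-character strings.
UNOPS = {"+", "-"}
BINOPS = {"+", "-", "*", "/"}

def is_exp(tokens):
    """Return True if the token list can be derived from Exp."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def exp(i, j):
        # Does toks[i:j] derive from Exp?
        n = j - i
        if n <= 0:
            return False
        # Exp ::= number | identifier
        if n == 1:
            return toks[i] in ("number", "identifier")
        # Exp ::= UnOp Exp
        if toks[i] in UNOPS and exp(i + 1, j):
            return True
        # Exp ::= '(' Exp ')'
        if toks[i] == "(" and toks[j - 1] == ")" and exp(i + 1, j - 1):
            return True
        # Exp ::= Exp BinOp Exp, trying every position k for the operator
        return any(toks[k] in BINOPS and exp(i, k) and exp(k + 1, j)
                   for k in range(i + 1, j - 1))

    return exp(0, len(toks))

print(is_exp(["identifier", "+", "identifier", "*", "number"]))  # True
print(is_exp(["identifier", "+"]))                               # False
```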
- How do we show that a+b*2 is an expression? First, we observe that
the lexer converts this to
identifier + identifier * number
- The derivation follows. A right arrow (=>) indicates one derivation
step, and the comment on each line names the production applied to
produce the next line.
Exp => // Exp ::= Exp BinOp Exp
Exp BinOp Exp => // Exp ::= identifier
identifier BinOp Exp => // BinOp ::= '+'
identifier + Exp => // Exp ::= Exp BinOp Exp
identifier + Exp BinOp Exp => // Exp ::= number
identifier + Exp BinOp number => // Exp ::= identifier
identifier + identifier BinOp number => // BinOp ::= '*'
identifier + identifier * number
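- Checking a derivation like this one is mechanical: each line must come from the previous one by replacing a single nonterminal with the right-hand side of one of its productions. The sketch below does exactly that. The names PRODUCTIONS, is_step, and is_derivation are hypothetical, and quoted terminals such as '+' are written without their quotes.

```python
# The productions of the grammar above, as (lhs, rhs) pairs.
PRODUCTIONS = [
    ("Exp", ["number"]),
    ("Exp", ["identifier"]),
    ("Exp", ["UnOp", "Exp"]),
    ("Exp", ["Exp", "BinOp", "Exp"]),
    ("Exp", ["(", "Exp", ")"]),
    ("UnOp", ["+"]), ("UnOp", ["-"]),
    ("BinOp", ["+"]), ("BinOp", ["-"]), ("BinOp", ["*"]), ("BinOp", ["/"]),
]

def is_step(before, after):
    """True if `after` results from rewriting one nonterminal in `before`."""
    for pos, symbol in enumerate(before):
        for lhs, rhs in PRODUCTIONS:
            if symbol == lhs and before[:pos] + rhs + before[pos + 1:] == after:
                return True
    return False

def is_derivation(forms):
    """True if each sentential form follows from the previous one in one step."""
    return all(is_step(a, b) for a, b in zip(forms, forms[1:]))

derivation = [
    "Exp",
    "Exp BinOp Exp",
    "identifier BinOp Exp",
    "identifier + Exp",
    "identifier + Exp BinOp Exp",
    "identifier + Exp BinOp number",
    "identifier + identifier BinOp number",
    "identifier + identifier * number",
]
print(is_derivation([f.split() for f in derivation]))  # True
```

Note that is_step lets any nonterminal be rewritten, not just the leftmost one; the derivation above relies on that when it expands the last Exp into number before the Exp to its left.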
- Note that we can have multiple choices at each step. For example,
after Exp BinOp Exp we might have expanded the second Exp rather than
the first.
- To describe derivations without recording the order in which these
choices were made, we often draw context-free derivations as trees.
The interior nodes of a tree are nonterminals. Each derivation step
is shown by connecting a node (the lhs of the production used) to
children representing that production's rhs.
                 Exp
                / | \
              /   |    \
           Exp  BinOp    Exp
            |     |     / | \
            |     |   /   |   \
    identifier    + Exp BinOp Exp
                     |    |    |
               identifier * number
- As in many other areas of computer science, careless grammar design
can lead to ambiguity in understanding (and parsing)
the objects we describe.
- An ambiguous grammar is one that provides multiple parse trees for
the same string.
- Why is this a problem? Often the parse trees provide a natural
mechanism for evaluating, compiling, understanding, or otherwise
using the parsed expression.
- For example, one might evaluate parsed expressions by evaluating the
subtrees and then applying the appropriate operation. We might get
different results if we had different parse trees.
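- To see the problem concretely, take a+b*2 with the made-up values a = 1 and b = 2. The ambiguous grammar allows two parse trees: one with * lower in the tree, and one where Exp BinOp Exp is used first for the * and the left Exp then expands to a+b. Evaluating subtrees first and then applying the operator at the root gives different answers. The tree encoding (left, op, right) and the name evaluate are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

# Two parse trees for a+b*2 with a = 1, b = 2 (illustrative values).
# Binary node: (left, op, right). Leaf: a number.
times_lower = (1, "+", (2, "*", 2))   # a + (b*2): multiplication deeper in the tree
plus_lower = ((1, "+", 2), "*", 2)    # (a+b) * 2: addition deeper in the tree

def evaluate(node):
    """Evaluate the subtrees first, then apply the operator at the root."""
    if isinstance(node, (int, float)):
        return node
    left, op, right = node
    return OPS[op](evaluate(left), evaluate(right))

print(evaluate(times_lower))  # 5
print(evaluate(plus_lower))   # 6
```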
- How do we get around ambiguity?
- We identify potential areas of ambiguity (often, using tools that
tell us about such ambiguities).
- We decide which parse tree is correct for our intended meaning.
- We rewrite our grammar to eliminate the ambiguity.
- We ensure that the new grammar describes the same language as the
old grammar.
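- For the first step, a simple tool can count the parse trees a grammar gives a particular string; any count above one exposes ambiguity. The sketch below, with the hypothetical name count_parses and the same assumed token encoding as before, counts trees for the original expression grammar. It only examines one string at a time, so it can reveal ambiguity but never prove a grammar unambiguous (in general, deciding whether a context-free grammar is ambiguous is undecidable).

```python
from functools import lru_cache

UNOPS = {"+", "-"}
BINOPS = {"+", "-", "*", "/"}

def count_parses(tokens):
    """Count the distinct parse trees for Exp over the token list."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def exp(i, j):
        n = j - i
        if n <= 0:
            return 0
        if n == 1:                                 # Exp ::= number | identifier
            return 1 if toks[i] in ("number", "identifier") else 0
        total = 0
        if toks[i] in UNOPS:                       # Exp ::= UnOp Exp
            total += exp(i + 1, j)
        if toks[i] == "(" and toks[j - 1] == ")":  # Exp ::= '(' Exp ')'
            total += exp(i + 1, j - 1)
        for k in range(i + 1, j - 1):              # Exp ::= Exp BinOp Exp
            if toks[k] in BINOPS:
                total += exp(i, k) * exp(k + 1, j)
        return total

    return exp(0, len(toks))

print(count_parses(["identifier", "+", "identifier", "*", "number"]))  # 2
```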
- How do we handle precedence?
- By considering what's wrong with the misparsed tree.
- By dividing expressions up into categories to eliminate such problems.
- What's wrong with that tree?
- There's a subtree of a "multiplication tree" that contains
unparenthesized addition.
- What does this suggest about categories? That we need a
category for "does not include unparenthesized addition". We'll call
it Term.
- What is a Term? Something that may include multiplication but does
not include unparenthesized addition.
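- The flaw in the misparsed tree can be written as a check on trees: it fails when a * or / node has an addition below it that isn't protected by parentheses. A Term is exactly something whose tree passes this check. The sketch assumes a hypothetical encoding, (left, op, right) for binary nodes, ("()", inner) for a parenthesized expression, and token strings at the leaves; unary operators are left out, and has_misplaced_addition is a made-up name.

```python
MULOPS = {"*", "/"}
ADDOPS = {"+", "-"}

def is_binary(node):
    return isinstance(node, tuple) and len(node) == 3

def has_misplaced_addition(node):
    """True if a * or / node has an unparenthesized + or - beneath it."""
    if not isinstance(node, tuple):
        return False                      # a leaf token
    if is_binary(node):
        left, op, right = node
        # Checking direct children suffices: any path from a * node down to
        # a + node without parentheses contains some * node whose child is +.
        if op in MULOPS and any(is_binary(c) and c[1] in ADDOPS
                                for c in (left, right)):
            return True
        return has_misplaced_addition(left) or has_misplaced_addition(right)
    return has_misplaced_addition(node[1])  # ("()", inner)

misparse = (("identifier", "+", "identifier"), "*", "number")
intended = ("identifier", "+", ("identifier", "*", "number"))
bracketed = (("()", ("identifier", "+", "identifier")), "*", "number")

print(has_misplaced_addition(misparse))   # True
print(has_misplaced_addition(intended))   # False
print(has_misplaced_addition(bracketed))  # False
```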
- What other categories do we need? We also need a category
for stuff that can include addition.
- It could be "stuff that includes unparenthesized addition". We
might call this
- It could be "stuff that may (but need not) include unparenthesized
addition".
- Which do you prefer? We'll use the second, since it's
close to our definition of
- We observe that there are two possibilities for Terms:
- Something that includes multiplication as the "top level" operation.
- Something that doesn't include multiplication as the "top level"
operation. We'll call this a Factor.
- What expressions include multiplication as the top level operation?
We need multiplication. We also need safe "arguments". But we've
defined such safe arguments as Terms.
Term ::= Term MulOp Term
MulOp ::= '*' | '/'
- What are the other Terms? Those that don't include multiplication.
We've already called those Factors.
Term ::= Factor
- What are the factors? The expressions that include parentheses and
the base expressions.
Factor ::= '(' Exp ')'
Factor ::= number
Factor ::= identifier
- Now, let's return to general expressions. These may or may not
include addition at the root.
- As with Term, we can separate them into those that do and those
that don't.
Exp ::= Exp AddOp Exp
Exp ::= Term
AddOp ::= '+' | '-'
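- To test the precedence claim on our example, the sketch below enumerates every parse tree of the layered grammar (Exp, Term, Factor) for a token list. It returns a list, so any remaining ambiguity would show up as more than one tree. The token encoding, the tree encoding ((left, op, right), ("()", inner), token strings at leaves, with the Exp/Term/Factor nodes left implicit), and the name parses are all hypothetical.

```python
from functools import lru_cache

ADDOPS = {"+", "-"}
MULOPS = {"*", "/"}

def parses(tokens):
    """Return every parse tree for Exp under the Exp/Term/Factor grammar."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def factor(i, j):
        if j - i == 1 and toks[i] in ("number", "identifier"):
            return [toks[i]]                          # Factor ::= number | identifier
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            return [("()", t) for t in exp(i + 1, j - 1)]  # Factor ::= '(' Exp ')'
        return []

    @lru_cache(maxsize=None)
    def term(i, j):
        trees = list(factor(i, j))                    # Term ::= Factor
        for k in range(i + 1, j - 1):                 # Term ::= Term MulOp Term
            if toks[k] in MULOPS:
                trees += [(l, toks[k], r)
                          for l in term(i, k) for r in term(k + 1, j)]
        return trees

    @lru_cache(maxsize=None)
    def exp(i, j):
        trees = list(term(i, j))                      # Exp ::= Term
        for k in range(i + 1, j - 1):                 # Exp ::= Exp AddOp Exp
            if toks[k] in ADDOPS:
                trees += [(l, toks[k], r)
                          for l in exp(i, k) for r in exp(k + 1, j)]
        return trees

    return exp(0, len(toks))

print(parses(["identifier", "+", "identifier", "*", "number"]))
# [('identifier', '+', ('identifier', '*', 'number'))]
```

Only the tree with + at the root survives: a * node's arguments must be Terms, and the lone identifier + identifier span is not a Term unless it is parenthesized.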
- Have we eliminated the problem with precedence? If we try
to misparse our original expression, we'll find that we can't even
- Do we have the same language? Informally, yes. It's clear that
we haven't added any strings. Have we removed any? No, but I wouldn't
want to try to argue that.
- Created Friday, September 18, 1998. Parts taken from the outline of
the previous class (everything on the expression grammar).
- Monday, September 21, 1998. Moved discussion of recursive-descent
parsing to the next outline.
Disclaimer Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.
Source text last modified Mon Sep 21 10:09:55 1998.
This page generated on Wed Sep 23 13:11:43 1998 by SiteWeaver.