# Class 13: Shift-reduce parsing

Held Monday, September 28, 1998

Handouts

• An LR parsing table for expressions.
• The corresponding finite automaton.

Notes

• I've graded assignment 4 (the written assignment).
• Note that grades may be lower than you expect. This is because, as I've suggested in the past, I begin with a B for "did the work".
• How could you do better? On this assignment, explanations would have helped.
• There are a number of problems with the figures in the section on LR(1) and LALR(1) parsing. Make sure to check out the errata sheet for the Java or C edition.
• Today we will finish with chapter 3. You should begin reading chapter 4.

## Problems with this strategy

• As you may know, there are a number of problems with this strategy.
• In particular, when choosing a right-hand side for a nonterminal N based on the next input symbol, we may
• Assign two right-hand-sides to the same symbol because both have that symbol in their first sets
• Assign two right-hand-sides to the same symbol because one has that symbol in its first set and the other is nullable and that symbol appears in N's Follow set.
• Assign two right-hand-sides to the same symbol because both are nullable and that symbol appears in N's Follow set.
• Do some combination of the above.

### Left-recursive grammars

• Left-recursive grammars are a particular problem. These are grammars that have a rule of the form N ::= N beta.
• Consider
```
A ::= A a
A ::= b
```
• When we see a b, what should we do?
• Note that any left-recursive grammar will have a similar problem, since anything that can begin a string derivable from the nonterminal can also begin the left-recursive right-hand side.
• We can make such grammars right-recursive, although they then become less intuitive.
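One way to make the rewriting concrete is the following Python sketch, which handles only immediate left recursion (rules of the form N ::= N beta). The function name and the primed nonterminal `A'` are assumptions made for illustration; they do not come from the textbook.

```python
def remove_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite N ::= N beta | gamma as N ::= gamma N', N' ::= beta N' | (empty)."""
    # Split the alternatives into the left-recursive ones (with the leading N
    # stripped off) and all the others.
    recursive = [rhs[1:] for rhs in alternatives if rhs[:1] == [nonterminal]]
    others = [rhs for rhs in alternatives if rhs[:1] != [nonterminal]]
    if not recursive:
        return {nonterminal: alternatives}
    tail = nonterminal + "'"  # hypothetical name for the new nonterminal
    return {
        nonterminal: [rhs + [tail] for rhs in others],
        # The empty list stands for the empty right-hand side.
        tail: [rhs + [tail] for rhs in recursive] + [[]],
    }


rewritten = remove_immediate_left_recursion("A", [["A", "a"], ["b"]])
for lhs, alternatives in rewritten.items():
    for rhs in alternatives:
        print(lhs, "::=", " ".join(rhs) or "(empty)")
```

For the grammar above this yields A ::= b A' together with A' ::= a A' and an empty alternative for A'. The result is right-recursive, and arguably less intuitive than the original.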

### Left-factoring

• At times, we have two right-hand-sides that begin with the same symbols. Often, we can factor out the common part to allow us to choose between the rules.
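As a rough illustration, the Python sketch below factors out the prefix shared by all alternatives of one nonterminal. It is deliberately simplified: it only factors when every alternative shares the prefix, and the if-then-else rule and the primed name `S'` are assumptions chosen for the example.

```python
def left_factor(nonterminal, alternatives):
    """Factor the longest prefix common to all alternatives of a nonterminal."""
    prefix = []
    # zip stops at the shortest alternative, so we never run off the end.
    for symbols in zip(*alternatives):
        if all(symbol == symbols[0] for symbol in symbols):
            prefix.append(symbols[0])
        else:
            break
    if not prefix or len(alternatives) < 2:
        return {nonterminal: alternatives}
    tail = nonterminal + "'"  # hypothetical name for the new nonterminal
    return {
        nonterminal: [prefix + [tail]],
        # Whatever followed the common prefix; [] is the empty right-hand side.
        tail: [rhs[len(prefix):] for rhs in alternatives],
    }


factored = left_factor("S", [
    ["if", "E", "then", "S", "else", "S"],
    ["if", "E", "then", "S"],
])
for lhs, alternatives in factored.items():
    for rhs in alternatives:
        print(lhs, "::=", " ".join(rhs) or "(empty)")
```

After factoring, the parser can commit to the shared `if E then S` part and postpone the choice until it sees whether an `else` follows.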

## Recursive descent parsing and other alternatives

• There are some significant disadvantages to recursive descent parsers.
• Some common grammars seem to require an arbitrary amount of lookahead (e.g., the standard expression grammar).
• Recursive descent often requires rewriting the original grammar, and the rewritten grammar may be significantly less natural.
• Hence, compiler writers often look to other techniques.
• Note that recursive descent parsers are, in effect, top-down (you start with the start symbol and attempt to derive the string).
• We can gain some power by starting at the bottom and working our way up.

## Shift-reduce parsing

• The most common bottom-up parsers are the so-called shift-reduce parsers. These parsers examine the input tokens and either shift them onto a stack or reduce the top of the stack, replacing a right-hand side by a left-hand side.
• In some ways, these parsers can be viewed as finite automata that use a stack (also known as push-down automata).
• Shift-reduce parsing is traditionally done with LR(k) parsers. The L stands for "left-to-right traversal of the input", the R stands for "rightmost derivation", and the k stands for "number of tokens of lookahead".
• You should be able to understand the traversal and lookahead.
• The "rightmost" refers to the kind of derivation that the parser constructs. In a rightmost derivation from the start symbol, you always replace the rightmost nonterminal.
• While LR parsers are bottom-up, they trace out a rightmost top-down derivation, in reverse order.
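The shift and reduce steps can be seen in a small Python sketch. This is not an LR parser: it reduces greedily whenever the top of the stack matches a right-hand side, which happens to work for the tiny left-recursive grammar used earlier but not for grammars in general. The names `RULES` and `shift_reduce` are assumptions made for illustration.

```python
# The left-recursive grammar from earlier in these notes.
RULES = [("A", ["A", "a"]), ("A", ["b"])]


def shift_reduce(tokens):
    """Return the final stack and the sentential form seen after each step."""
    tokens = list(tokens)
    stack = []
    forms = [tokens[:]]  # the input itself is the last form of the derivation
    for index, token in enumerate(tokens):
        stack.append(token)  # shift
        rest = tokens[index + 1:]
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    stack[-len(rhs):] = [lhs]  # reduce: replace rhs by lhs
                    forms.append(stack + rest)
                    reduced = True
                    break
    return stack, forms


stack, forms = shift_reduce(["b", "a", "a"])
# Reading the recorded forms backwards gives a rightmost derivation.
for form in reversed(forms):
    print(" ".join(form))
```

Read in reverse, the recorded forms are A, then A a, then A a a, then b a a, which is the rightmost derivation of the input.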

## LR(0) Automata

• The simplest LR parsers are based on LR(0) automata.
• "Wait!" you may say, "How can we build a parser with no lookahead?"
• It turns out that the 0 refers to the lookahead used in constructing the automata, not (in this case) to the lookahead used in running them.
• The states of all LR parsers are sets of extended productions (also called items), representing the states of all possible parsers (more or less).
• Each production is extended with a position (where in the parse we may be).
• Each production may also be extended with lookahead symbols (none in LR(0) automata).
• We begin building an LR(0) parser by augmenting the grammar with a simple rule in which we add an "end of input" symbol (\$) to the start symbol.
```
S' ::= S $
```
• The initial state of an LR(0) parser begins with
```
S' ::= . S $
```
• This means ``when we begin parsing, we are ready to match an `S` and the end-of-input symbol''
• We then augment this with the other things that we might match on our way to building an S and end of input. What are those things? Anything that might build us an S. What have we seen of those things? Nothing, yet.
• For our expression grammar, we might write
```
S' ::= . E $
E ::= . E + T
E ::= . E - T
E ::= . T
```
• Of course, in this case, if we're waiting to see a T, then we also need to add the T rules (and then the F rules).
```
S' ::= . E $
E ::= . E + T
E ::= . E - T
E ::= . T
T ::= . T mulop F
T ::= . F
F ::= . id
F ::= . num
F ::= . ( E )
```
• We make new states in the automaton by choosing a symbol (terminal or nonterminal) and advancing the ``here mark'' (period) over that symbol in all rules, and then filling in the rest.
• For example, if we see an E in state 0, we could be
```
S' ::= E . $
E ::= E . + T
E ::= E . - T
```
• If we see a plus sign in that state, we could only be progressing on one rule, giving us
```
E ::= E + . T
```
• But now, we're ready to see a `T`, so we need to fill in all the items that say ``ready to see a T''
```
E ::= E + . T
T ::= . T mulop F
T ::= . F
F ::= . id
F ::= . num
F ::= . ( E )
```
• If we do indeed see a T, we advance the ``here mark'' and get to
```
E ::= E + T .
T ::= T . mulop F
```
• The first part suggests that we may be at the end of an `E`. The second suggests that we may be in the midst of a `T`. How do we decide which it is? By context (and a little lookahead in some cases).

### Constructing LR(0) Automata, Formalized

• Let us now formalize what we did.
• We need a method that "fills in the rest" whenever we advance the mark. We'll call this `closure`.
```
closure(State S)
  repeat
    for each item, N ::= alpha . M beta in S
      for each production M ::= gamma
        add M ::= . gamma to S
      end for each production
    end for each item
  until no changes are made to S
  return S
end closure
```
• We need a method that describes the edges in the automaton. We'll call this `goto`.
```
goto(State S, Symbol s)
  NewS = {}
  for each item N ::= alpha . s beta in S
    NewS = NewS union { N ::= alpha s . beta }
  end for
  return closure(NewS)
end goto
```
• Finally, we can build the automaton as follows
```
S0 = { S' ::= . S $ }
S0 = closure(S0)
while there are unmarked states
  pick an unmarked state, S
  mark S
  for each symbol, s, for which goto(S,s) is nonempty
    add state goto(S,s) (if it is not already present) with an edge from S labelled s
  end for
end while
```
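The three pieces of pseudocode above can be sketched in Python for the expression grammar. The representation is an assumption made for illustration: an item is a tuple `(lhs, rhs, dot)` where `dot` is the position of the "here mark", a state is a frozenset of items, and the names `GRAMMAR` and `build_automaton` are hypothetical.

```python
GRAMMAR = [
    ("S'", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("E", "-", "T")),
    ("E", ("T",)),
    ("T", ("T", "mulop", "F")),
    ("T", ("F",)),
    ("F", ("id",)),
    ("F", ("num",)),
    ("F", ("(", "E", ")")),
]


def closure(items):
    """Add M ::= . gamma for every nonterminal M that follows a mark."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot == len(rhs):
                continue  # mark is at the end; nothing follows it
            for prod_lhs, prod_rhs in GRAMMAR:
                item = (prod_lhs, prod_rhs, 0)
                if prod_lhs == rhs[dot] and item not in items:
                    items.add(item)
                    changed = True
    return frozenset(items)


def goto(state, symbol):
    """Advance the mark over symbol in every item that allows it."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


def build_automaton():
    """Return the list of states and a dict of edges (state, symbol) -> state."""
    start = closure({("S'", ("E", "$"), 0)})
    states, edges, unmarked = [start], {}, [start]
    while unmarked:
        state = unmarked.pop()
        # Only symbols that follow a mark can give a nonempty goto.
        symbols = {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                unmarked.append(target)
            edges[(states.index(state), symbol)] = states.index(target)
    return states, edges


states, edges = build_automaton()
after_e_plus_t = goto(goto(goto(states[0], "E"), "+"), "T")
for lhs, rhs, dot in sorted(after_e_plus_t):
    print(lhs, "::=", " ".join(rhs[:dot] + (".",) + rhs[dot:]))
```

The state reached by following E, then +, then T from state 0 contains the two items shown earlier, `E ::= E + T .` and `T ::= T . mulop F`. Apart from state 0, the numbering of states depends on the order in which they are discovered, so it need not match the handout.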

### Using LR(0) Automata

• It is fairly easy to use these automata.
• Begin in state 0, with state 0 on the stack.
• For each input token, follow the appropriate edge and push the token and the new state on the stack (a shift).
• If you hit a state containing N ::= alpha ., reduce: pop the right-hand side (and the states pushed along with it) off the stack and push the left-hand side.
• After reducing, follow the edge labelled with the left-hand side from the state now exposed on top of the stack, and push the resulting state.
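The steps above can be sketched as a small Python driver. Two assumptions are made for illustration. First, the expression automaton has states that mix a completed item with an unfinished one, so this sketch uses a smaller, hypothetical grammar (S' ::= S $, S ::= ( S ), S ::= x) whose LR(0) automaton has no such states. Second, the automaton is written out by hand as the tables `EDGES`, `REDUCE`, and `ACCEPT`, and the stack holds only states, since each state determines the symbol that led to it.

```python
# Hand-built LR(0) automaton for  S' ::= S $,  S ::= ( S ),  S ::= x.
EDGES = {
    (0, "S"): 1, (0, "("): 2, (0, "x"): 3,
    (1, "$"): 4,
    (2, "S"): 5, (2, "("): 2, (2, "x"): 3,
    (5, ")"): 6,
}
# States containing a completed item N ::= alpha . map to (N, alpha).
REDUCE = {3: ("S", ["x"]), 6: ("S", ["(", "S", ")"])}
ACCEPT = 4  # the state containing S' ::= S $ .


def parse(tokens):
    """Return the reductions performed, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = [0]
    position = 0
    reductions = []
    while stack[-1] != ACCEPT:
        state = stack[-1]
        if state in REDUCE:
            # Reduce: pop one state per right-hand-side symbol, then follow
            # the edge labelled with the left-hand side.
            symbol, rhs = REDUCE[state]
            del stack[-len(rhs):]
            reductions.append(symbol + " ::= " + " ".join(rhs))
        else:
            # Shift: consume the next token and follow its edge.
            symbol = tokens[position]
            position += 1
        if (stack[-1], symbol) not in EDGES:
            raise SyntaxError("unexpected " + symbol)
        stack.append(EDGES[(stack[-1], symbol)])
    return reductions


for reduction in parse(["(", "(", "x", ")", ")"]):
    print(reduction)
```

The reductions come out innermost first: S ::= x, then S ::= ( S ) twice. An input such as `( x` fails when the end-of-input symbol has no edge out of the current state.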

History

• Created Wednesday, September 23, 1998 (no contents)
• Added contents on Monday, September 28, 1998


Disclaimer Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.