# Class 16: Shift-Reduce Parsing, Continued


Held Monday, February 26, 2001

Summary

Today we consider how to build shift-reduce automata and the tables that represent them.

Overview

• Building LR(0) automata
• Example
• Some potential problems
• Other techniques for computing shift-reduce automata

## LR(0) Automata

• The simplest LR parsers are based on LR(0) automata, which use no lookahead.
• "Wait!" you may say, "How can we build a parser with no lookahead?"
• It turns out that the 0 refers to the lookahead used in constructing the automaton, not (in this case) to the lookahead used in running it.
• The states of all LR parsers are sets of extended productions (also called items), representing the states of all possible parses of the input (more or less).
• Each production is extended with a position (where in the parse we may be).
• Each production may also be extended with lookahead symbols (none in LR(0) automata).
• We begin building an LR(0) parser by augmenting the grammar with a new start symbol, S', and a simple rule that adds an "end of input" symbol ($) after the original start symbol.
```
S' ::= S $
```
• The initial state of an LR(0) parser begins with
```
S' ::= . S $
```
• This means "when we begin parsing, we are ready to match an `S` and the end-of-input symbol"
• We then augment this with the other things that we might match on our way to building an S and end of input. What are those things? Anything that might build us an S. What have we seen of those things? Nothing, yet.
• For our expression grammar, we might write
```
S' ::= . E $
E ::= . E + T
E ::= . E - T
E ::= . T
```
• Of course, in this case, if we're waiting to see a T, then we also need to add the T rules (and then the F rules).
```
S' ::= . E $
E ::= . E + T
E ::= . E - T
E ::= . T
T ::= . T mulop F
T ::= . F
F ::= . id
F ::= . num
F ::= . ( E )
```
• We make new states in the automaton by choosing a symbol (terminal or nonterminal) and advancing the "here mark" (period) over that symbol in all rules, and then filling in the rest.
• For example, if we see an E in state 0, we could be
```
S' ::= E . $
E ::= E . + T
E ::= E . - T
```
• If we see a plus sign in that state, we could only be making progress on one rule, giving us
```
E ::= E + . T
```
• But now, we're ready to see a `T`, so we need to fill in all the items that say "ready to see a T"
```
E ::= E + . T
T ::= . T mulop F
T ::= . F
F ::= . id
F ::= . num
F ::= . ( E )
```
• If we do indeed see a T, we advance the "here mark" and get to
```
E ::= E + T .
T ::= T . mulop F
```
• The first item suggests that we may be at the end of an `E`. The second suggests that we may be in the midst of a `T`. How do we decide which it is? By context (and a little lookahead in some cases).

### Constructing LR(0) Automata, Formalized

• Let us now formalize what we did.
• We need a method that "fills in the rest" whenever we advance the mark. We'll call this `closure`
```
closure(State S)
  repeat
    for each item N ::= alpha . M beta in S, where M is a nonterminal
      for each production M ::= gamma
        add M ::= . gamma to S
      end for each production
    end for each item
  until no changes are made to S
  return S
end closure
```
• We need a method that describes the edges in the automaton. We'll call this `goto`
```
goto(State S, Symbol s)
  newS = {}
  for each item N ::= alpha . s beta in S
    newS = newS union { N ::= alpha s . beta }
  end for
  return closure(newS)
end goto
```
• Finally, we can build the automaton as follows
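```
S0 = closure({ S' ::= . S $ })
S0 starts out as the only (unmarked) state
while there are unmarked states
  pick an unmarked state, S
  mark S
  for each symbol, s, that appears just after the mark in some item of S
    T = goto(S, s)
    if T is not already a state, add it as a new, unmarked state
    add an edge labelled s from S to T
  end for
end while
```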
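The construction above can be sketched in Python as follows. This is a minimal sketch under assumptions of our own: items are `(lhs, rhs, dot)` tuples, the grammar is the expression grammar from the walkthrough written as a hypothetical `GRAMMAR` list, and the names `closure`, `goto`, and `build_lr0` simply mirror the pseudocode rather than any particular tool.

```python
# Expression grammar from the walkthrough, augmented with S' ::= E $.
# Each production is (lhs, rhs); "mulop", "id", "num", "$" are terminals.
GRAMMAR = [
    ("S'", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("E", "-", "T")),
    ("E", ("T",)),
    ("T", ("T", "mulop", "F")),
    ("T", ("F",)),
    ("F", ("id",)),
    ("F", ("num",)),
    ("F", ("(", "E", ")")),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}

# An item N ::= alpha . beta is (lhs, rhs, dot), where dot indexes into rhs.

def closure(items):
    """Add M ::= . gamma for every item whose mark is before nonterminal M."""
    state = set(items)
    changed = True
    while changed:  # "repeat ... until no changes are made to S"
        changed = False
        for lhs, rhs, dot in list(state):
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                for plhs, prhs in GRAMMAR:
                    if plhs == rhs[dot] and (plhs, prhs, 0) not in state:
                        state.add((plhs, prhs, 0))
                        changed = True
    return frozenset(state)

def goto(state, sym):
    """Advance the mark over sym in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == sym}
    return closure(moved)

def build_lr0():
    """Return (states, edges); edges maps (state index, symbol) -> index."""
    states = [closure({("S'", ("E", "$"), 0)})]
    edges = {}
    unmarked = [0]
    while unmarked:
        i = unmarked.pop()  # pick an unmarked state and mark it
        # Only symbols right after the mark give a nonempty goto.
        symbols = {rhs[dot] for _, rhs, dot in states[i] if dot < len(rhs)}
        for sym in sorted(symbols):
            target = goto(states[i], sym)
            if target not in states:
                states.append(target)
                unmarked.append(len(states) - 1)
            edges[(i, sym)] = states.index(target)
    return states, edges
```

Calling `build_lr0()` and following the edges labelled E, +, and T from state 0 reproduces the item sets worked out by hand earlier. Trying only the symbols that appear after the mark is a design choice that avoids creating an empty "dead" state.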

### Using LR(0) Automata

• It is fairly easy to use these automata.
• Begin in state 0 with state 0 on the stack
• For each input token, follow the appropriate edge and push the token and state on the stack.
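• If you reach a state containing a final item N ::= alpha ., reduce: pop one (symbol, state) pair off the stack for each symbol in alpha, then push the left-hand side, N.
• After reducing, follow the edge labelled N from the state now exposed on top of the stack, and push the resulting state.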
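Here is a small driver that follows these steps. It is a sketch under stated assumptions: the grammar `S ::= ( S ) | x` is a hypothetical toy example (not from these notes), its LR(0) edges were worked out by hand with `closure` and `goto`, and, rather than shifting `$` into a separate state, the sketch accepts when it is in the state containing `S' ::= S . $` and the next token is `$`.

```python
# Hypothetical toy grammar, chosen so the table stays small:
#   S' ::= S $      S ::= ( S )      S ::= x
# Edges of its LR(0) automaton: (state, symbol) -> next state.
EDGES = {
    (0, "S"): 1, (0, "("): 2, (0, "x"): 3,
    (2, "S"): 4, (2, "("): 2, (2, "x"): 3,
    (4, ")"): 5,
}
# States holding a final item N ::= alpha . map to (N, len(alpha)).
REDUCE = {3: ("S", 1), 5: ("S", 3)}
ACCEPT = 1  # the state containing S' ::= S . $

def parse(tokens):
    """Return True iff tokens form a sentence of the toy grammar."""
    stack = [(None, 0)]  # (symbol, state) pairs; state 0 at the bottom
    tokens = list(tokens) + ["$"]
    pos = 0
    while True:
        state = stack[-1][1]
        if state in REDUCE:  # LR(0): reduce without looking at the token
            lhs, n = REDUCE[state]
            del stack[-n:]  # pop the right-hand side
            # Follow the lhs edge from the newly exposed state.
            stack.append((lhs, EDGES[(stack[-1][1], lhs)]))
            continue
        tok = tokens[pos]
        if state == ACCEPT and tok == "$":
            return True
        if (state, tok) not in EDGES:
            return False  # no edge for this token: syntax error
        stack.append((tok, EDGES[(state, tok)]))  # shift
        pos += 1
```

For example, `parse("((x))")` returns `True`, while `parse("(x")` returns `False` because state 4 has no edge on `$`.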

## Other LR Issues

### Conflicts in LR Automata

• At times, there are conflicts in LR automata. What kinds of conflicts?
• A state may include a "final" item (one in which the position marker is at the end) and a nonfinal item with a terminal just after the marker.
• This is called a shift-reduce conflict.
• A state may include two different "final" items.
• This is called a reduce-reduce conflict.
• Can we have reduce-reduce conflicts in unambiguous grammars?
• Can we have a shift-shift conflict?
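As a sketch of how one might spot these conflicts in a finished LR(0) state, the function below classifies an item set. It assumes items are `(lhs, rhs, dot)` tuples and that the nonterminals of the expression grammar are known; the name `conflicts` is hypothetical. A nonfinal item counts as a shift only when a terminal follows the mark, since nonterminal edges are followed only after a reduction.

```python
# Nonterminals of the (augmented) expression grammar; assumed known.
NONTERMS = {"S'", "E", "T", "F"}

def conflicts(state):
    """Return the kinds of LR(0) conflict present in a closed item set."""
    final = [it for it in state if it[2] == len(it[1])]
    shifts = [it for it in state
              if it[2] < len(it[1]) and it[1][it[2]] not in NONTERMS]
    kinds = set()
    if final and shifts:
        kinds.add("shift-reduce")
    if len(final) > 1:
        kinds.add("reduce-reduce")
    return kinds

# The state reached on E, +, T in the walkthrough:
state = {("E", ("E", "+", "T"), 3), ("T", ("T", "mulop", "F"), 1)}
print(conflicts(state))
```

Applied to the state reached after E + T in the walkthrough, it reports a shift-reduce conflict: LR(0) alone cannot decide between reducing to E and shifting `mulop`.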

### SLR Automata

• You may have noted (e.g., from our example last class) that LR(0) automata can be overly aggressive in choosing to reduce.
• Such automata reduce whenever we've reached the end of a right-hand side, regardless of the next token.
• SLR automata only reduce when the next token is in `Follow` of the left-hand side of the reduction.
• This restriction can resolve both shift-reduce and reduce-reduce conflicts.
• Hence, SLR automata are used more commonly in parser generators.
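To see the SLR rule at work, the sketch below computes `Follow` sets for the expression grammar by iterating to a fixed point, then lists the SLR actions for a state on a given token. The helper names (`first_of`, `follow_sets`, `slr_actions`) and the `(lhs, rhs, dot)` item representation are assumptions made for illustration; the grammar is written without empty productions, although the code also handles nullable symbols.

```python
# Expression grammar from these notes, augmented with S' ::= E $.
GRAMMAR = [
    ("S'", ("E", "$")),
    ("E", ("E", "+", "T")), ("E", ("E", "-", "T")), ("E", ("T",)),
    ("T", ("T", "mulop", "F")), ("T", ("F",)),
    ("F", ("id",)), ("F", ("num",)), ("F", ("(", "E", ")")),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}

def first_of(seq, first, nullable):
    """First set of a symbol sequence, plus whether the sequence is nullable."""
    out = set()
    for sym in seq:
        if sym not in NONTERMS:
            out.add(sym)
            return out, False
        out |= first[sym]
        if sym not in nullable:
            return out, False
    return out, True

def follow_sets(grammar):
    """Compute First, nullable, and Follow together until nothing changes."""
    nullable = set()
    first = {n: set() for n in NONTERMS}
    follow = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            f, null = first_of(rhs, first, nullable)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            if null and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            for i, sym in enumerate(rhs):
                if sym in NONTERMS:
                    # Follow(sym) gets First(rest), plus Follow(lhs) if rest is nullable.
                    f, null = first_of(rhs[i + 1:], first, nullable)
                    new = (f | follow[lhs]) if null else f
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

def slr_actions(state, token, follow):
    """All SLR actions for state on token; more than one means a conflict."""
    actions = set()
    for lhs, rhs, dot in state:
        if dot == len(rhs):
            if token in follow[lhs]:  # reduce only if token can follow lhs
                actions.add(("reduce", lhs, rhs))
        elif rhs[dot] == token:
            actions.add(("shift", token))
    return actions
```

Here `Follow(E)` is `{$, +, -, )}`, which does not contain `mulop`. So in the state holding `E ::= E + T .` and `T ::= T . mulop F`, the SLR automaton shifts on `mulop` and reduces on `+`, `-`, `)`, or `$`, removing the LR(0) shift-reduce conflict.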

### LR(1) Automata

• LR(1) automata require a more complicated construction process, one that involves lookahead.
• In effect, instead of using the Follow table (as SLR automata do), LR(1) automata build more specific follow tables that correspond to the possible follow symbols according to a particular context.
• Each LR(1) item contains not just an extended production, but also a lookahead token: a token that can follow the production's left-hand side in the context that led to the current state.
• The tokens are inserted by the closure routine.
• If we have N ::= alpha . M beta then when we insert the M items, we indicate that each of them can be followed by the tokens in first(beta)
• If beta is nullable, then the M items can also be followed by whatever can follow N (in the given LR(1) item).
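The LR(1) closure rule can be sketched as follows, assuming items of the form `(lhs, rhs, dot, lookahead)` and the expression grammar from these notes. The helper names are hypothetical, and so is the placeholder lookahead `"?"` on the start item: since `S' ::= E $` already ends in `$`, that lookahead is never consulted. Computing `First(beta a)` for the sequence beta followed by the item's lookahead `a` covers both cases above: if beta is nullable, `a` itself ends up in the result.

```python
# Expression grammar from these notes, augmented with S' ::= E $.
GRAMMAR = [
    ("S'", ("E", "$")),
    ("E", ("E", "+", "T")), ("E", ("E", "-", "T")), ("E", ("T",)),
    ("T", ("T", "mulop", "F")), ("T", ("F",)),
    ("F", ("id",)), ("F", ("num",)), ("F", ("(", "E", ")")),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}

def first_sets(grammar):
    """First set of each nonterminal, plus the set of nullable nonterminals."""
    nullable, first = set(), {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for sym in rhs:
                add = first[sym] if sym in NONTERMS else {sym}
                if not add <= first[lhs]:
                    first[lhs] |= add
                    changed = True
                if sym not in nullable:  # terminals are never nullable
                    break
            else:  # every symbol in rhs is nullable (or rhs is empty)
                if lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
    return first, nullable

def first_of_seq(seq, first, nullable):
    """First(seq) for a symbol sequence that ends in a terminal."""
    out = set()
    for sym in seq:
        if sym not in NONTERMS:
            out.add(sym)
            break
        out |= first[sym]
        if sym not in nullable:
            break
    return out

def closure1(items, first, nullable):
    """LR(1) closure: each new M item gets every lookahead in First(beta a)."""
    state = set(items)
    work = list(state)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            lookaheads = first_of_seq(rhs[dot + 1:] + (la,), first, nullable)
            for plhs, prhs in GRAMMAR:
                if plhs == rhs[dot]:
                    for b in lookaheads:
                        item = (plhs, prhs, 0, b)
                        if item not in state:
                            state.add(item)
                            work.append(item)
    return frozenset(state)
```

Closing `S' ::= . E $` this way gives `F` items whose lookaheads are `$`, `+`, `-`, and `mulop`, but not `)`, even though `)` is in `Follow(F)`. This is the "more specific follow table" that LR(1) construction buys over SLR.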

## History

Monday, 22 January 2001

• Created as a blank outline.

Friday, 23 February 2001


Disclaimer: I usually create these pages on the fly. This means that they are rarely proofread and may contain bad grammar and incorrect details. It also means that I may update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This page was generated by Siteweaver on Mon Apr 30 10:51:57 2001.
This page may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2001S/outline.16.html`.