# Class 07: From Specification to NFA


Held Monday, February 5, 2001

## Summary

Today we consider how to move from the readable but declarative regular expression notation to the executable but obtuse finite automaton notation.

## Notes

• If you have not done so already, please start reading Chapter 4 of the Red Dragon.
• Please don't forget lab this week.
• I'm still working on writing up last Wednesday's class.

## Overview

• Regular expressions vs. DFAs.
• Regular expressions as lexical descriptions.
• From lexical specification to DFA.
• From one regular expression to NFA.
• From many regular expressions to NFA.

## Tokenizing with Finite Automata

• We've seen how to match with finite automata, but how do we tokenize?
• We only tokenize with deterministic finite automata.
• We begin by attaching a token (or, sometimes, a set of instructions) to each final state.
• When we reach an appropriate final state (possibly not the first final state), we stop and return the corresponding token.
• How do we decide whether a final state is appropriate? We keep track of the last final state we encountered and peek ahead until we find another final state or hit a dead end.
```
/**
 * Find the first token in the candidate string, starting
 * at a particular position.  We remember the last final
 * state we reach and keep reading until the automaton
 * hits a dead end ("maximal munch").
 */
public Token findToken(String candidate, int starting_pos)
begin
  State current_state = q0
  State final_found = undefined
  int final_pos = undefined
  for i = starting_pos to the length of candidate
    current_state = edge(current_state, candidate.symbolAt(i))
    if current_state is undefined then
      exit the for loop
    end if
    if current_state is a final state then
      final_found = current_state
      final_pos = i
    end if
  end for
  if final_found is defined then
    return the token given by final_found at position final_pos
  else
    report that no token can be found
  end if
end
```
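As a concreteness check, here is one runnable Java version of that loop. The tiny transition table (a single identifier-like token matching a[a-d]*) and the names FindToken, EDGES, and TOKENS are assumptions of this sketch, not part of the notes.

```java
import java.util.*;

public class FindToken {
    // A hypothetical DFA: state 0 -a-> 1, state 1 -[a-d]-> 1,
    // with state 1 final and labeled with the token ID.
    static final Map<Integer, Map<Character, Integer>> EDGES = new HashMap<>();
    static final Map<Integer, String> TOKENS = new HashMap<>();
    static {
        EDGES.put(0, Map.of('a', 1));
        EDGES.put(1, Map.of('a', 1, 'b', 1, 'c', 1, 'd', 1));
        TOKENS.put(1, "ID");
    }

    /** Longest token starting at startingPos as "NAME:text", or null if none. */
    static String findToken(String candidate, int startingPos) {
        int currentState = 0;       // q0
        int lastFinal = -1;         // last final state encountered, if any
        int finalPos = -1;
        for (int i = startingPos; i < candidate.length(); i++) {
            Map<Character, Integer> out = EDGES.get(currentState);
            Integer next = (out == null) ? null : out.get(candidate.charAt(i));
            if (next == null) break;            // dead end: stop peeking ahead
            currentState = next;
            if (TOKENS.containsKey(currentState)) { // remember this final state
                lastFinal = currentState;
                finalPos = i;
            }
        }
        if (lastFinal < 0) return null;         // no token can be found
        return TOKENS.get(lastFinal) + ":" + candidate.substring(startingPos, finalPos + 1);
    }

    public static void main(String[] args) {
        System.out.println(findToken("abcx", 0)); // ID:abc (stops at the dead end 'x')
        System.out.println(findToken("xabc", 0)); // null
    }
}
```

Note that the dead-end check comes before the final-state check, so we never record an undefined state.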

## From Regular Expression to DFA

• We've now seen two mechanisms for describing tokens (and, in effect, for tokenizing): regular expressions and finite automata.
• How do we go from one to another? We convert regular expressions to nondeterministic finite automata using a surprisingly naive translation. We convert NFAs to DFAs. We then "optimize" the DFAs.
• We'll do this with a trio of expressions: one for an odd variant of identifiers ((a|bb)[a-d]*) and two for simple keywords (aa and dd).

### From Regular Expression to NFA

• Our regular expression to NFA converter builds a finite automaton with one final state (and one start state). It uses the recursive definitions of regular expressions to build a finite automaton for a compound expression from the finite automata for the base expressions.
• The finite automaton for the empty string has two states with an epsilon transition between the states. The first state is the start state. The second state is the final state.
• The finite automaton for a single symbol is similar, except that the transition is labeled with the symbol.
• The finite automaton for alternation has a new start state with epsilon edges to the start states of the automata for the subexpressions and a new final state with epsilon edges from the final states of the automata for the subexpressions.
• The finite automaton for concatenation uses the start state of the automaton for the first expression and the final state of the automaton for the second expression. There is an epsilon transition from the final state of the first automaton to the start state of the second automaton.
• The finite automaton for repetition has a new start state and a new final state. There are also epsilon transitions:
  • from the start state to the final state,
  • from the start state to the start state of the subautomaton,
  • from the final state of the subautomaton to the final state, and
  • from the final state of the subautomaton to the start state of the subautomaton.
• If it seems to you that some of these extra states and epsilon transitions are unnecessary, you're probably right. However, you can't get rid of all of them. For example, consider what happens if you don't create any extra states for concatenation or repetition.
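The five cases above can be sketched in Java. Everything here is an assumed design for illustration (the class name Thompson, the shared global edge tables, and the match-by-simulation helper are mine), but each constructor adds exactly the states and epsilon edges the notes describe.

```java
import java.util.*;

public class Thompson {
    // Shared global tables for all fragments (a simplification in this sketch).
    static final Map<Integer, Map<Character, Integer>> EDGE = new HashMap<>();
    static final Map<Integer, List<Integer>> EPS = new HashMap<>();
    static int nextState = 0;

    static int newState() { return nextState++; }
    static void eps(int from, int to) {
        EPS.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** A fragment with exactly one start state and one final state. */
    static class Frag {
        final int start, end;
        Frag(int start, int end) { this.start = start; this.end = end; }
    }

    static Frag empty() {                 // two states, one epsilon edge
        Frag f = new Frag(newState(), newState());
        eps(f.start, f.end);
        return f;
    }
    static Frag symbol(char c) {          // like empty, but the edge is labeled c
        Frag f = new Frag(newState(), newState());
        EDGE.computeIfAbsent(f.start, k -> new HashMap<>()).put(c, f.end);
        return f;
    }
    static Frag alt(Frag a, Frag b) {     // new start/final, epsilons to/from both
        Frag f = new Frag(newState(), newState());
        eps(f.start, a.start); eps(f.start, b.start);
        eps(a.end, f.end);     eps(b.end, f.end);
        return f;
    }
    static Frag concat(Frag a, Frag b) {  // epsilon from a's final to b's start
        eps(a.end, b.start);
        return new Frag(a.start, b.end);
    }
    static Frag star(Frag a) {            // the four epsilon edges from the notes
        Frag f = new Frag(newState(), newState());
        eps(f.start, f.end);      // zero repetitions
        eps(f.start, a.start);    // enter the subautomaton
        eps(a.end, f.end);        // leave the subautomaton
        eps(a.end, a.start);      // loop back for another repetition
        return f;
    }

    /** Simulate the NFA: epsilon-closure, then one step per input symbol. */
    static boolean matches(Frag n, String input) {
        Set<Integer> cur = closure(Set.of(n.start));
        for (char c : input.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int q : cur) {
                Integer t = EDGE.getOrDefault(q, Map.of()).get(c);
                if (t != null) next.add(t);
            }
            cur = closure(next);
        }
        return cur.contains(n.end);
    }
    static Set<Integer> closure(Set<Integer> states) {
        Deque<Integer> work = new ArrayDeque<>(states);
        Set<Integer> seen = new HashSet<>(states);
        while (!work.isEmpty())
            for (int t : EPS.getOrDefault(work.pop(), List.of()))
                if (seen.add(t)) work.push(t);
        return seen;
    }

    public static void main(String[] args) {
        // (a|bb)[a-d]*, with [a-d] spelled out as nested alternations
        Frag range = alt(alt(symbol('a'), symbol('b')), alt(symbol('c'), symbol('d')));
        Frag re = concat(alt(symbol('a'), concat(symbol('b'), symbol('b'))), star(range));
        System.out.println(matches(re, "bbac"));  // true
        System.out.println(matches(re, "ba"));    // false: a lone b is not a|bb
    }
}
```

Because every fragment has exactly one start and one final state, the constructors can compose fragments without inspecting their insides; this is what makes the translation "surprisingly naive."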

### From Token Definitions to NFA

• How do we convert a series of token definitions (using regular expressions) to a tokenizing NFA?
• We build an NFA for each of the token definitions.
• We augment the final states with token type and priority.
• We put 'em together in one big automaton by adding a new start state with epsilon transitions to the start states of all the other automata.
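A small Java sketch of this combination step. The two tiny token automata (the keyword aa and identifiers a[ab]*), the tagging of final states with (token, priority) pairs, and the lower-number-wins priority convention are all assumptions of this example.

```java
import java.util.*;

public class CombinedNfa {
    // Per-token NFAs, hand-built for this example:
    //   keyword "aa":        1 -a-> 2 -a-> 3      (3 final: KEYWORD, priority 0)
    //   identifiers a[ab]*:  4 -a-> 5, 5 -a,b-> 5 (5 final: ID, priority 1)
    // State 0 is the new start state, with epsilon edges to both automata.
    static final Map<Integer, Map<Character, Set<Integer>>> EDGE = new HashMap<>();
    static final Map<Integer, Set<Integer>> EPS = new HashMap<>();
    static final Map<Integer, int[]> FINAL = new HashMap<>(); // state -> {token, priority}
    static final int KEYWORD = 0, ID = 1;
    static {
        edge(1, 'a', 2); edge(2, 'a', 3); FINAL.put(3, new int[]{KEYWORD, 0});
        edge(4, 'a', 5); edge(5, 'a', 5); edge(5, 'b', 5); FINAL.put(5, new int[]{ID, 1});
        EPS.put(0, Set.of(1, 4));
    }
    static void edge(int from, char c, int to) {
        EDGE.computeIfAbsent(from, k -> new HashMap<>())
            .computeIfAbsent(c, k -> new HashSet<>()).add(to);
    }
    static Set<Integer> closure(Set<Integer> states) {
        Deque<Integer> work = new ArrayDeque<>(states);
        Set<Integer> seen = new HashSet<>(states);
        while (!work.isEmpty())
            for (int t : EPS.getOrDefault(work.pop(), Set.of()))
                if (seen.add(t)) work.push(t);
        return seen;
    }
    /** Token for an exact match of input (KEYWORD or ID), or -1 for none.
     *  When several final states are reached, the lowest priority number wins. */
    static int tokenFor(String input) {
        Set<Integer> cur = closure(Set.of(0));
        for (char c : input.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int q : cur)
                next.addAll(EDGE.getOrDefault(q, Map.of()).getOrDefault(c, Set.of()));
            cur = closure(next);
        }
        int best = -1, bestPriority = Integer.MAX_VALUE;
        for (int q : cur) {
            int[] tag = FINAL.get(q);
            if (tag != null && tag[1] < bestPriority) { best = tag[0]; bestPriority = tag[1]; }
        }
        return best;
    }
    public static void main(String[] args) {
        System.out.println(tokenFor("aa")); // both accept; the keyword wins on priority
        System.out.println(tokenFor("ab")); // only the identifier automaton accepts
    }
}
```

The priority resolves the inevitable overlap: the string aa is both a keyword and an identifier, and the combined automaton reaches both final states at once.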

## History

Monday, 22 January 2001

• Created as a blank outline.

Monday, 5 February 2001

• Filled in the details, based primarily on outline 7 of CSC152 98F.
• Updated formatting.

This page may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2001S/outline.07.html`.