# Class 06: From Specification to Optimal DFA

Back to Finite Automata. On to From Specification to Optimal DFA, Continued.

Held Wednesday, September 11, 2002

Summary

Today we start to consider how to move from the readable but declarative regular expression notation to the executable but obtuse finite automaton notation.

Notes

• A few of you didn't show up to lab yesterday and didn't warn me in advance. That's a no no.
• The CS picnic is next Friday, Sept. 20th. Please sign up for the picnic. (And yes, you can bring your parents, but include a note to that effect).
• I plan to assign the lexical analyzer on Friday or Monday.

Assignments

• Do Homework 1 (Due Wednesday, September 17, 2002)

Overview

• From regular expressions to NFAs
• An example
• From NFAs to DFAs
• Example continued

## From Regular Expression to DFA

• We've now seen two mechanisms for describing tokens (and, in effect, for tokenizing): regular expressions and finite automata.
• How do we go from one to another?
• We convert regular expressions to nondeterministic finite automata using a surprisingly naive translation.
• We convert NFAs to DFAs.
• We then optimize the DFAs.

## From Regular Expression to NFA

• Our regular expression to NFA converter builds a finite automaton with one final state (and one start state). It uses the recursive definitions of regular expressions to build a finite automaton for a compound expression from the finite automata for the base expressions.
• Each created NFA has a unique start state and a unique final state.
• The NFA for the empty string has two states with an epsilon transition between the states. The first state is the start state. The second state is the final state.
• The NFA for a single symbol is similar, except that the transition is labeled with the symbol.
• The NFA for alternation has a new start state with epsilon edges to the start states of the NFA for the subexpressions and a new final state with epsilon edges from the final states of the NFA for the subexpressions.
• The finite automaton for concatenation uses the start state of the NFA for the first expression and the final state of the NFA for the second expression. There is an epislon transition from the final state of the first NFA to the start state of the second NFA.
• Alternatively, we can merge the final state of the first NFA with the start state of the second NFA.
• The NFA for repetition has a new start state and a new final state. There are also epsilon transitions
• from the start state to the final state,
• from the start state to the start state of the sub-NFA,
• from the final state of the sub-NFA to the final state, and
• from the final state of the sub-NFA to the start state of the sub-NFA.
• If it seems to you that some of these extra states and epsilon transitions are unnecessary, you're probably right. However, you can't get rid of all of them.
• For example, consider what happens if you don't create any extra states for concatenation or repetition.

## Some Examples

• We'll develop some examples on the fly.
• I'll suggest that we do integers in C
• (([1-9][0-9]*)|(0[1-9]*)|(0x[1-9A-Fa-f]+))[sSuU]?[lL]?

## From NFA to DFA

• How do we turn this lovely NFA into a deterministic computing machine?
• Basically, we build a DFA that simultaneously follows all the paths that we can go through in the NFA.
• Each state in the DFA is a set of states in the corresponding NFA.
• The algorithm is again fairly simple
• Terminology :
• qx is a state in the NFA and q0 is its start state
• QX is a state in the DFA and Q0 is its start state
• The eqsilon-closure of a DFA state consists of all the states in the NFA that we can get to with epsilon moves from the states in the DFA state.
```Q0 = { q0 }
// but there are some states we can reach from q0 at no cost
Q0 = epsilon-closure(Q0)
while there are states we haven't processed
pick one such state, Qn
for each symbol s
let tmp be a new, empty, set
for each q in Qn
end for
tmp = epsilon-closure(tmp)
if tmp is not in the DFA then
let Qi be a new state
Qi = tmp
else
let Qi be the state equivalent to tmp
end if
add an edge from Qn to Qi in the automaton
end for
end while
for each Qi
if there is a q in Qi that is a final state then
Qi is a final state
end if
end for
```

## History

Thursday, 29 August 2002

• First version, based somewhat on outlines from CS362 2001S.

Back to Finite Automata. On to From Specification to Optimal DFA, Continued.

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Fri Dec 6 10:37:57 2002.
The source to the document was last modified on Wed Sep 4 10:08:33 2002.
This document may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2002F/Outlines/outline.06.html`.

You may wish to validate this document's HTML ; ; Check with Bobby

Glimmer Labs: The Grinnell Laboratory for Interactive Multimedia Experimentation & Research
glimmer@grinnell.edu