[Instructions] [Search] [Current] [News] [Syllabus] [Handouts] [Outlines] [Assignments]

**Held** Monday, September 7, 1998

**Handouts**

**Notes**

- Mr. Walker will be observing classes this week in preparation for my interim review. Feel free to let him know your thoughts on my teaching abilities.
- The structure for the next few days is
- Monday: introduction to lexical analysis; regular expressions
- Wednesday: finite automata
- Friday: more on finite automata

- Given that structure, do you have any questions on the readings appropriate for today?
- When I am done recording or answering those questions, we will take a quiz on lexical analysis.
- I have prepared some comments on the initial debates. Most of those comments apply to most of you. I will also send a few select comments to individuals.
- I would also like you to fill out a short survey on the debates.
- The in-class evaluation forms were clearly too much (too many questions for too many people). There was significant variation (from 1 to 5 on the same question for the same person), not everyone filled them out, and there were few qualitative comments (the more useful comments, in my opinion). Next time, you will only evaluate each team as a whole. I will not be summarizing these evaluation forms for this debate.
- I've designed a few new guidelines that we will be using in our subsequent debates (if we have subsequent debates).
- Tomorrow at 4:30, Hilary, Kevin, Raphen, and Sarah L. will be discussing their summer research. Stop by at 4:15 and get food first.
- I will do my best to assign project teams by Wednesday.
- In preparation for Wednesday's class, you should develop a regular
expression for the following language (a variant of comments in C, Pascal,
and Java):
- A subset of [a-z]* in which each element begins with `az`, ends with `za`, and contains no substring of the form `az`.

- Lexical analysis is traditionally the first "real" step in compilation.
During lexical analysis, one identifies the simple
*tokens* (also called *lexemes*) that make up a program.
- What are these tokens? Things like identifiers, particular keywords,
symbols, and such.
- Some tokens have associated *semantic* values, such as the name of an identifier or the value of an integer.

- During the tokenizing phase, a compiler converts a sequence of characters (sometimes thought of as a sequence of bytes) to a sequence of tokens.
- During this conversion we often eliminate "unnecessary" components, such
as whitespace (spaces, tabs, newlines) and comments.
- However, there are reasons to preserve some related information, such as the number of the line (used in printing error messages).
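
The tokenizing phase described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token classes, the keyword set, and the names `Token`, `TOKEN_SPEC`, and `tokenize` are all hypothetical and belong to a toy language, not to any compiler we will build. It converts characters to tokens, attaches semantic values to integers and identifiers, drops whitespace and comments, and still keeps the line number for error messages.

```python
import re
from typing import Iterator, NamedTuple, Optional


class Token(NamedTuple):
    kind: str                 # token class, e.g. "IDENT", "INT", "SYMBOL"
    value: Optional[object]   # semantic value, if the token has one
    line: int                 # line number, kept for error messages


# Hypothetical token classes for a toy language (an assumption, not from the notes).
# Order matters: COMMENT must come before SYMBOL so "/*" is not read as "/" then "*".
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),
    ("INT", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SYMBOL", r"[-+*/=();]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)
KEYWORDS = {"if", "then", "else"}  # hypothetical keyword set


def tokenize(text: str) -> Iterator[Token]:
    """Convert a sequence of characters into a sequence of tokens."""
    line = 1
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"line {line}: unexpected character {text[pos]!r}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind in ("SKIP", "NEWLINE", "COMMENT"):
            # Discard the text itself, but keep counting lines.
            line += lexeme.count("\n")
        elif kind == "INT":
            yield Token("INT", int(lexeme), line)
        elif kind == "IDENT" and lexeme in KEYWORDS:
            yield Token(lexeme.upper(), None, line)
        elif kind == "IDENT":
            yield Token("IDENT", lexeme, line)
        else:
            yield Token("SYMBOL", lexeme, line)


if __name__ == "__main__":
    for token in tokenize("x = 42 /* note\n */\ny = x"):
        print(token)
```

Note that the comment and the whitespace produce no tokens, yet the tokens after them report line 3, because the newlines inside discarded text are still counted.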

- Regular expressions can only describe relatively simple languages, but they permit very efficient membership tests and matching.
- [Note that we need a language to describe regular expressions, but we're only now learning how to describe languages, which leads to an interesting problem. We'll use an informal description of regular expressions.]
- Like all languages, the language of regular expressions is based on an underlying alphabet, sigma.
- We'll denote the set of utterances a regular expression denotes by L(exp).
- There is a special symbol, epsilon, not in sigma.
- In these notes, epsilon is spelled out as the word epsilon.
- L(epsilon) is the set of the empty string, { "" }.

- Any single symbol in sigma is a regular expression.
- The regular expression for that symbol is that symbol
- For each symbol s in sigma, L(s) = { s }.

- The *concatenation* of any two regular expressions is a regular expression denoting the combinations of strings from those two regular expressions.
- If R and S are regular expressions, the concatenation of R and S is written (R.S)
- L((R.S)) = { concat(x,y) | x in L(R), y in L(S) }

- The concatenation of any two strings is the two strings written in
sequence with no intervening space.
- concat("hello", "world") = "helloworld"

- epsilon is the identity string for concatenation
- concat(epsilon,s) = concat(s,epsilon) = s

- We sometimes will want shorthand
- (R.R) may also be written as (R)^2; ((R.R).(R)) as (R)^3; and so on and so forth.
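
To make the set definition of concatenation concrete, here is a small Python sketch that treats a language as a finite set of strings. The function names `concat_languages` and `power` are my own for this illustration, and real regular languages are often infinite, so this only works for finite examples.

```python
def concat_languages(r, s):
    """L((R.S)): every string of r followed by every string of s."""
    return {x + y for x in r for y in s}


def power(r, n):
    """L((R)^n): r concatenated with itself n times; (R)^0 is { "" }."""
    result = {""}  # the language of epsilon, the identity for concatenation
    for _ in range(n):
        result = concat_languages(result, r)
    return result


if __name__ == "__main__":
    print(concat_languages({"hello"}, {"world"}))
    print(sorted(power({"a", "b"}, 2)))
```

Because the empty string is the identity for string concatenation, concatenating any language with { "" } leaves it unchanged.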

- The *alternation* of any two regular expressions is a regular expression denoting strings in either language.
- If R and S are regular expressions, the alternation of R and S is written (R|S).
- L((R|S)) = L(R) union L(S)

- If R is a regular expression, then the Kleene star of R is a regular
expression.
- If R is a regular expression, the Kleene star of R is written (R*).
- L((R*)) = L(epsilon) union L((R)^1) union L((R)^2) union ...
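
Alternation and Kleene star can be sketched the same way, with languages as finite Python sets. Since L((R*)) is usually infinite, the hypothetical helper `star_up_to` below only enumerates the strings of at most `max_len` characters; both function names and the length bound are assumptions made for this illustration.

```python
def alternation(r, s):
    """L((R|S)) = L(R) union L(S)."""
    return r | s


def star_up_to(r, max_len):
    """Strings of L((R*)) that have at most max_len characters."""
    result = {""}    # L(epsilon) is always part of the star
    frontier = {""}  # strings found in the previous round
    while frontier:
        # Append one more string from r, keeping only new, short-enough results.
        frontier = {
            x + y for x in frontier for y in r if len(x + y) <= max_len
        } - result
        result |= frontier
    return result


if __name__ == "__main__":
    print(sorted(star_up_to({"ab"}, 5)))
    print(sorted(star_up_to(alternation({"a"}, {"b"}), 2)))
```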

- Because the parentheses are cumbersome, you may remove them if the
meaning is obvious without them. In addition, we often don't write
the concatenation symbol.
- Kleene star has highest precedence; concatenation next; alternation least.
- ab* is (a.(b*)) and not ((a.b)*)
- a|bc is (a|(b.c)) and not ((a|b).c)
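
Python's `re` module happens to use the same precedence rules, so we can check the two examples above experimentally. This is just a sanity check with a helper I am calling `matches`; it relies on `re.fullmatch`, which succeeds only when the whole string matches the pattern.

```python
import re


def matches(pattern, text):
    """True if the entire text is in the language of the pattern."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    # ab* is (a.(b*)): one a followed by any number of b's.
    print(matches("ab*", "abbb"), matches("ab*", "abab"))
    # ((a.b)*) needs explicit parentheses.
    print(matches("(ab)*", "abab"))
    # a|bc is (a|(b.c)): either "a" or "bc".
    print(matches("a|bc", "a"), matches("a|bc", "bc"), matches("a|bc", "ac"))
    # ((a|b).c) needs explicit parentheses.
    print(matches("(a|b)c", "ac"))
```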

- There are a number of shorthands for regular expressions. None add
any expressive power and all can be "compiled" into normal regular
expressions. These include
- A postfix plus sign (+) for "at least one instance". a+ is (a.a*).
- Brackets for alternation of larger sets. [abc] is a|b|c. [a-c] is also a|b|c.
- A postfix question mark for "optional". a? is shorthand for (a|epsilon).
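
One way to convince yourself that these shorthands add no expressive power is to compare each shorthand with its expansion on every short string. The sketch below does that by brute force with Python's `re` module. The helper name `same_language`, the test alphabet, and the length bound are assumptions of this illustration, and agreeing on short strings is evidence, not a proof. In Python syntax, (a|epsilon) is written `(a|)`.

```python
import re
from itertools import product


def same_language(p, q, alphabet="abcd", max_len=4):
    """Do patterns p and q accept the same strings of length <= max_len?"""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (re.fullmatch(p, s) is None) != (re.fullmatch(q, s) is None):
                return False
    return True


if __name__ == "__main__":
    print(same_language("a+", "aa*"))        # plus
    print(same_language("[abc]", "a|b|c"))   # brackets
    print(same_language("[a-c]", "a|b|c"))   # bracket range
    print(same_language("a?", "(a|)"))       # optional
    print(same_language("a+", "a*"))         # differ on the empty string
```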

- What are some sample regular expressions?
- All strings of a's and b's: (a|b)*
- Strings of a's and b's with exactly one b: a*ba*
- Strings of one or more a's: aa* (also a*a, also a*aa*)
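
It can help to list the shortest members of each sample language. The sketch below filters all short strings over {a, b} through Python's `re.fullmatch`, using | for alternation as defined earlier. The helper name `members` and the length bound are assumptions for illustration only.

```python
import re
from itertools import product


def members(pattern, alphabet="ab", max_len=3):
    """Strings over the alphabet, shortest first, that the pattern accepts."""
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if re.fullmatch(pattern, s):
                found.append(s)
    return found


if __name__ == "__main__":
    print(members("(a|b)*", max_len=1))  # all strings of a's and b's
    print(members("a*ba*", max_len=2))   # exactly one b
    print(members("aa*"))                # one or more a's
```

The three ways of writing "one or more a's" (aa*, a*a, and a*aa*) give the same list.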



**Disclaimer** Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.

Source text last modified Fri Sep 11 12:28:14 1998.

This page generated on Wed Sep 16 11:21:16 1998 by SiteWeaver.

Contact our webmaster at rebelsky@math.grin.edu