
Back to Prolog, Concluded. On to An Expression Grammar.

**Held** Wednesday, April 7

**Summary**

**Contents**

**Notes**

- The exam is due on Friday. I'll try to have it graded by the beginning of next week.
- Sorry, still working on the answer key for homework 4. I do have HW4 graded, though. Finding free time is nearly impossible these days.
- Those of you on the job market may want to look at
`http://www.jobweb.org/pubs/joboutlook/want.htm`

- We've spent some time looking at the formal *semantics* of languages, particularly the semantics of Scheme. Are there other aspects of a language we should study?
- Yes, we should also study the *syntax* of a language.

- The *syntax* of a language describes the way the language looks: the symbols and combinations of symbols that form valid utterances.
- At times, the syntax describes a superset of the valid utterances; if we said that every ``noun-phrase transitive-verb noun-phrase'' sequence was an English sentence, we'd allow some meaningless sentences like ``the dog flew an ice-cream cone''.
- The semantics can then help classify utterances as ``valid'' or ``invalid''.

- The syntax is also used to assign ``types'' to utterances (e.g., this is a noun-phrase, this is a sentence, this is a program, this is a function call).
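
As a rough illustration of a purely syntactic check, here is a small Python sketch. The lexicon and the simplified noun-phrase rule (a determiner followed by one or more nouns) are hypothetical choices made for this example; the point is that a check that looks only at word categories happily accepts ``the dog flew an ice-cream cone''.

```python
# Hypothetical toy lexicon: each word is mapped to its syntactic category.
LEXICON = {
    "the": "det", "a": "det", "an": "det",
    "dog": "noun", "cat": "noun", "ice-cream": "noun", "cone": "noun",
    "chased": "tverb", "flew": "tverb", "ate": "tverb",
}


def parse_noun_phrase(tags, i):
    """Return the index just past a noun phrase that starts at i, or None.
    Simplified rule (an assumption): a determiner followed by one or more nouns."""
    if i >= len(tags) or tags[i] != "det":
        return None
    j = i + 1
    while j < len(tags) and tags[j] == "noun":
        j += 1
    return j if j > i + 1 else None


def is_sentence(text):
    """Purely syntactic test for noun-phrase transitive-verb noun-phrase.
    Meaning is never consulted."""
    words = text.split()
    if any(w not in LEXICON for w in words):
        return False
    tags = [LEXICON[w] for w in words]
    i = parse_noun_phrase(tags, 0)
    if i is None or i >= len(tags) or tags[i] != "tverb":
        return False
    return parse_noun_phrase(tags, i + 1) == len(tags)
```

Both `is_sentence("the dog chased a cat")` and `is_sentence("the dog flew an ice-cream cone")` return `True`; deciding that the second one is nonsense is a job for the semantics.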

- From the syntactic perspective, a *language* is a potentially infinite set of strings built from symbols in a base *alphabet*.
- A *string* is a finite linear sequence of symbols from the alphabet.

- How do we finitely describe an infinite set?
- Typically, with a *grammar*: rules in a particular language along with a logic of generation that those rules can use.
- Rule: ``A sentence can be a noun-phrase transitive-verb noun-phrase triplet.''
- Logic: ``If $X$ can be $X_1 \ldots X_n$, then to build $X$, we first build all the $X_i$'s.''
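
The rule-plus-logic idea can be sketched in Python as follows. The tiny grammar is a hypothetical example, and the sketch assumes the grammar is not recursive so that enumerating every sentence terminates; a grammar for an infinite language would need a different strategy (for instance, bounding the length of what is built).

```python
from itertools import product

# Hypothetical toy grammar. Each name maps to its alternatives; each
# alternative is a sequence X_1 ... X_n. Anything that is not a key is a word.
GRAMMAR = {
    "sentence": [["noun-phrase", "transitive-verb", "noun-phrase"]],
    "noun-phrase": [["the", "noun"]],
    "noun": [["dog"], ["cone"]],
    "transitive-verb": [["chased"], ["flew"]],
}


def build(symbol):
    """Yield every word list that `symbol` can be (the grammar must not recurse)."""
    if symbol not in GRAMMAR:
        yield [symbol]  # a word builds only itself
        return
    for parts in GRAMMAR[symbol]:
        # To build X from X_1 ... X_n, first build every X_i ...
        options = [list(build(part)) for part in parts]
        # ... then pick one result for each X_i and join them in order.
        for combo in product(*options):
            yield [word for piece in combo for word in piece]


sentences = [" ".join(words) for words in build("sentence")]
```

With two nouns and two verbs, this grammar yields 8 sentences, including meaningless ones such as ``the cone chased the dog''.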

- One particularly convenient form of grammar is the *regular expression*.
- Regular expressions can only describe relatively simple languages, but they permit very efficient membership tests and matching.
- [Note that we need a language to describe regular expressions, but we're only now learning how to describe languages, which leads to an interesting problem in self-reference. We'll use an informal description of regular expressions.]
- Like all languages, the language of regular expressions is based on an underlying alphabet, sigma.
- We'll write L(exp) for the set of utterances (strings) that a regular expression exp denotes.
- There is a special symbol, epsilon, not in sigma; epsilon is itself a regular expression.
- epsilon is written as epsilon
- L(epsilon) is the set containing only the empty string, { "" }.

- Any single symbol in sigma is a regular expression.
- The regular expression for that symbol is that symbol
- For each symbol s in sigma, L(s) = { s }.

- The concatenation of any two regular expressions is a regular expression denoting every string formed by following a string from the first expression's language with a string from the second expression's language.
- If R and S are regular expressions, the concatenation of R and S is written (R)(S)
- L((R)(S)) = { concat(x,y) | x in L(R), y in L(S) }
- concat(epsilon,s) = concat(s,epsilon) = s
- (R)(R) may also be written as (R)^2; (R)(R)(R) as (R)^3; and so on.
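
For finite languages, the concatenation rule translates almost directly into a set comprehension. This Python sketch represents a language as a set of strings; the names `concat_langs` and `power` are hypothetical, and treating (R)^0 as epsilon is an assumption (it matches the first term of the Kleene-star definition).

```python
def concat_langs(lang_r, lang_s):
    """L((R)(S)) = { concat(x, y) | x in L(R), y in L(S) }, for finite sets."""
    return {x + y for x in lang_r for y in lang_s}


def power(lang_r, n):
    """L((R)^n) for n >= 0; (R)^0 is taken to be epsilon (an assumption)."""
    result = {""}  # L(epsilon) = { "" }, which concatenation leaves unchanged
    for _ in range(n):
        result = concat_langs(result, lang_r)
    return result


print(sorted(power({"a", "b"}, 2)))  # ['aa', 'ab', 'ba', 'bb']
```

Starting `power` from `{""}` relies on concat(epsilon,s) = s: concatenating with the empty string changes nothing.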

- The alternation of any two regular expressions is a regular expression
denoting strings in either language.
- If R and S are regular expressions, the alternation of R and S is written (R)+(S).
- L((R)+(S)) = L(R) union L(S)

- If R is a regular expression, then the Kleene star of R is a regular
expression.
- If R is a regular expression, the Kleene star of R is written (R)*.
- L((R)*) = L(epsilon) union L((R)^1) union L((R)^2) union ...
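
Putting the five cases together, one way to make L(exp) concrete is a small evaluator over a tree representation of regular expressions. This is a sketch under stated assumptions: the class names are hypothetical, and because L((R)*) is usually infinite, `lang` only returns the strings whose length is at most a caller-chosen `max_len`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Epsilon:
    """The regular expression epsilon."""


@dataclass(frozen=True)
class Sym:
    symbol: str  # a single symbol from sigma


@dataclass(frozen=True)
class Concat:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:
    body: "Regex"


Regex = Union[Epsilon, Sym, Concat, Alt, Star]


def lang(exp, max_len):
    """Return every string of L(exp) whose length is at most max_len."""
    if isinstance(exp, Epsilon):
        return {""}
    if isinstance(exp, Sym):
        return {exp.symbol} if len(exp.symbol) <= max_len else set()
    if isinstance(exp, Concat):
        return {x + y
                for x in lang(exp.left, max_len)
                for y in lang(exp.right, max_len)
                if len(x) + len(y) <= max_len}
    if isinstance(exp, Alt):
        return lang(exp.left, max_len) | lang(exp.right, max_len)
    if isinstance(exp, Star):
        body = lang(exp.body, max_len)
        result = {""}    # L(epsilon): zero copies of R
        frontier = {""}  # strings added in the previous round
        while frontier:
            # Append one more copy of R to each new string; keep only unseen ones.
            frontier = {x + y for x in frontier for y in body
                        if len(x) + len(y) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {exp!r}")


a, b = Sym("a"), Sym("b")
one_b = Concat(Concat(Star(a), b), Star(a))  # a*ba*
print(sorted(lang(one_b, 3), key=lambda s: (len(s), s)))
# ['b', 'ab', 'ba', 'aab', 'aba', 'baa']
```

The Kleene-star case terminates because there are only finitely many strings of length at most `max_len`, so eventually no round adds anything new.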

- Because the parentheses are cumbersome, you may remove them if the
meaning is obvious without them.
- Kleene star has highest precedence; concatenation next; alternation least.
- ab* is (a)((b)*) and not ((a)(b))*
- a+bc is (a)+((b)(c)) and not ((a)+(b))(c)
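
The precedence rules can be checked mechanically with a small recursive-descent parser that rewrites an expression into its fully parenthesized form, one parsing function per precedence level. This Python sketch assumes every character other than `+`, `*`, `(` and `)` is a one-character symbol from sigma, and it does not handle epsilon or whitespace; the names are hypothetical.

```python
class Parser:
    """Rewrite the informal notation into fully parenthesized form.
    Precedence, tightest first: Kleene star, concatenation, alternation."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def parse(self):
        result = self.alternation()
        if self.peek() is not None:
            raise ValueError(f"unexpected {self.peek()!r} at {self.pos}")
        return result

    def alternation(self):
        # Lowest precedence: R+S+T groups as ((R)+(S))+(T).
        left = self.concatenation()
        while self.peek() == "+":
            self.pos += 1
            left = f"({left})+({self.concatenation()})"
        return left

    def concatenation(self):
        # Middle precedence: a run of starred atoms, grouped left to right.
        left = self.starred()
        while self.peek() is not None and self.peek() not in "+)":
            left = f"({left})({self.starred()})"
        return left

    def starred(self):
        # Highest precedence: '*' applies only to the atom just before it.
        result = self.atom()
        while self.peek() == "*":
            self.pos += 1
            result = f"({result})*"
        return result

    def atom(self):
        ch = self.peek()
        if ch == "(":
            self.pos += 1
            inner = self.alternation()
            if self.peek() != ")":
                raise ValueError(f"expected ')' at {self.pos}")
            self.pos += 1
            return inner
        if ch is None or ch in "+*)":
            raise ValueError(f"expected a symbol at {self.pos}")
        self.pos += 1
        return ch


def fully_parenthesize(text):
    return Parser(text).parse()
```

For instance, `fully_parenthesize("ab*")` gives `(a)((b)*)` and `fully_parenthesize("a+bc")` gives `(a)+((b)(c))`, matching the two readings above.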

- There are a number of shorthands for regular expressions. None of them adds any expressive power, and all can be ``compiled'' into normal regular expressions.
- For example, many use ``set notation'': [abc] is shorthand for a+b+c.
- Similarly, many use ``range notation'': [a-f] is shorthand for a+b+c+d+e+f.
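
To illustrate that the bracket shorthands add no expressive power, here is a Python sketch that ``compiles'' them away. It assumes ranges follow character-code order (so [a-f] means a through f), and it wraps each expansion in parentheses, a design choice so that, for example, [ab]* becomes (a+b)* rather than a+b*. The function names are hypothetical.

```python
import re


def expand_class(body):
    """Expand the inside of a bracket, e.g. 'abc' or 'a-f', into an alternation."""
    symbols = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # Range notation: every character from body[i] through body[i + 2].
            symbols.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            # Set notation: a single listed symbol.
            symbols.append(body[i])
            i += 1
    return "(" + "+".join(symbols) + ")"


def desugar(exp):
    """Replace each [...] shorthand with an equivalent parenthesized alternation."""
    return re.sub(r"\[([^\]]*)\]", lambda m: expand_class(m.group(1)), exp)
```

Here `desugar("[a-f]")` yields `(a+b+c+d+e+f)`, and the result uses only the basic operators.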

- What are some sample regular expressions?
- All strings of a's and b's: (a+b)*
- Strings of a's and b's with exactly one b: a*ba*
- Strings of one or more a's: aa* (also a*a, also a*aa*)
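
The sample expressions can be tried out with Python's `re` module. Python's notation differs from ours: it writes alternation as `|` (its `+` means ``one or more''), so (a+b)* becomes `(a|b)*`. The sketch also checks, on every string of a's and b's up to length 6, that the three forms of ``one or more a's'' agree; `matches` and `strings_up_to` are hypothetical helpers.

```python
import re
from itertools import product

# Each trailing comment gives the expression in the course's notation.
ALL_AB = re.compile(r"(a|b)*")          # (a+b)*
EXACTLY_ONE_B = re.compile(r"a*ba*")    # a*ba*
ONE_OR_MORE_A = re.compile(r"aa*")      # aa*


def matches(pattern, s):
    """True if the whole string s (not just a prefix) matches the pattern."""
    return pattern.fullmatch(s) is not None


def strings_up_to(n, alphabet="ab"):
    """Yield every string over the alphabet of length 0 through n."""
    for k in range(n + 1):
        for letters in product(alphabet, repeat=k):
            yield "".join(letters)


# aa*, a*a and a*aa* should accept exactly the same strings.
forms = [re.compile(p) for p in (r"aa*", r"a*a", r"a*aa*")]
forms_agree = all(
    len({matches(form, s) for form in forms}) == 1 for s in strings_up_to(6)
)
```

Using `fullmatch` rather than `search` matters: membership in a language is about the whole string.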

- Often, we name our regular expressions so that we can refer to them easily and build larger expressions from smaller ones.
- Within a definition, we may use any previously defined name, but only previously defined names. (This restriction prevents recursive or mutually recursive definitions.)

- For example:
- `lowercase: a+b+c+d+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z`
- `uppercase: A+B+C+D+E+F+G+H+I+J+K+L+M+N+O+P+Q+R+S+T+U+V+W+X+Y+Z`
- `letter: (uppercase)+(lowercase)`
- `digit: 0+1+2+3+4+5+6+7+8+9`
- `number: (digit)(digit)*`
- `word: (letter)(lowercase)*`
- `identifier: ((letter)+_)((digit)+(letter)+_)*`
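
These named definitions translate naturally into Python `re` patterns built in order, since each definition uses only earlier names. This is a sketch, not a faithful transcription: it uses Python's own notation (`[a-z]` ranges, `|` for alternation, `(?:...)` non-capturing groups), and the helper `is_a` is hypothetical.

```python
import re

# Built in definition order: each pattern pastes in only earlier ones,
# which is exactly what the no-recursion restriction guarantees is possible.
lowercase = "[a-z]"                                     # a+b+...+z
uppercase = "[A-Z]"                                     # A+B+...+Z
letter = f"(?:{uppercase}|{lowercase})"                 # (uppercase)+(lowercase)
digit = "[0-9]"                                         # 0+1+...+9
number = f"{digit}{digit}*"                             # (digit)(digit)*
word = f"{letter}{lowercase}*"                          # (letter)(lowercase)*
identifier = f"(?:{letter}|_)(?:{digit}|{letter}|_)*"   # ((letter)+_)((digit)+(letter)+_)*


def is_a(pattern, s):
    """True if the whole string s belongs to the named language."""
    return re.fullmatch(pattern, s) is not None
```

So `is_a(word, "Prolog")` holds, while `is_a(word, "PROLOG")` does not, since only the first character of a word may be uppercase.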

**History**

- Created Tuesday, January 19, 1999 as a blank outline.
- Filled in the details on Monday, April 5, 1999. Many of the details on language syntax were based on outline 5 of CS302 98S, although they were rewritten to accommodate changes to the structure of the class (we're covering topics in a much different order).
- Removed some material that we did not cover on Friday, April 9, 1999.



**Disclaimer** Often, these pages were created ``on the fly'' with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.

This page may be found at http://www.math.grin.edu/~rebelsky/Courses/CS302/99S/Outlines/outline.26.html

Source text last modified Fri Apr 9 12:26:44 1999.

This page generated on Wed Apr 14 10:42:11 1999 by SiteWeaver.

Contact our webmaster at rebelsky@math.grin.edu