Software Design (CSC-223 97F)
Outline of Class 34: Regular Expressions, Continued
- How did the previous assignment go? It was somewhat harder than
I intended. Hopefully, it got you thinking about the use and
complexity of regular expressions.
- I've done my best to modify the previous
outline to better reflect what we talked about in class.
- While most of my coauthors were happy with our chapter, one
ripped it to shreds. This may mean that I won't get to grading
this weekend, and for that I apologize.
- Don't forget the many forthcoming talks, including today's 2:15 talk
on chaos in biology.
- Mr. O'Fallon and Mr. Bright will be presenting today on the
year 2000 problem. You can find their presentation at
- Those of you still looking for a topic might want to consider
as the basis of a talk. You would, however, need to find more
- You may recall that
grep is a program designed to
find lines in a file that match a regular expression.
grep returns all lines which contain a substring
matched by the regular expression (all lines that contain a
substring that is in the set denoted by the regular expression).
- The normal form is
grep options pattern file
- This says to look for all matching lines in a particular file.
- The typical form used in filtering is
grep options pattern
- This reads standard input and returns the matching lines.
- An alternate form is
grep options pattern file1 file2 ...
- This is used to look for a pattern in multiple files.
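To try the three forms above, here is a small sketch. The file names notes1.txt and notes2.txt, their contents, and the pattern ab*c are all made up for illustration; only the grep invocations themselves come from the outline.

```shell
# Work in a scratch directory so nothing else is touched.
dir=$(mktemp -d)
printf '%s\n' 'xxabbbcxx' 'no match here' 'ac' > "$dir/notes1.txt"
printf '%s\n' 'abc' 'abd' > "$dir/notes2.txt"

# Normal form: search one file. A line is reported if any substring of it
# is matched by the expression, so 'xxabbbcxx' is reported as well as 'ac'.
one_file=$(grep 'ab*c' "$dir/notes1.txt")

# Filter form: no file argument, so grep reads standard input.
filtered=$(cat "$dir/notes1.txt" | grep 'ab*c')

# Multiple files: each matching line is prefixed with the name of its file.
many_files=$(cd "$dir" && grep 'ab*c' notes1.txt notes2.txt)

printf '%s\n' "$one_file" "$filtered" "$many_files"
rm -r "$dir"
```

The first two forms should print the same lines; the third adds a file-name prefix so you can tell where each match came from.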
grep permits a variety of command-line switches, including
-E to use extended regular expressions.
-v to negate the match (only nonmatching lines are printed).
-i to ignore case differences.
-l to list files that contain the pattern (but
not print the lines).
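A quick way to see what each of these switches does is to run them against a tiny file. This is only a sketch: tools.txt, other.txt, and their contents are invented for the example.

```shell
dir=$(mktemp -d)
printf '%s\n' 'Grep is handy' 'grep or egrep' 'sed and awk' > "$dir/tools.txt"
printf '%s\n' 'nothing relevant' > "$dir/other.txt"

# -E: extended syntax, so | means alternation without a backslash.
extended=$(grep -E 'sed|awk' "$dir/tools.txt")

# -v: print only the lines that do NOT match.
negated=$(grep -v 'grep' "$dir/tools.txt")

# -i: ignore case, so 'Grep' and 'grep' both count.
nocase=$(grep -i 'grep' "$dir/tools.txt")

# -l: print just the names of the files that contain a match.
listed=$(cd "$dir" && grep -l 'grep' tools.txt other.txt)

printf '%s\n' "$extended" "$negated" "$nocase" "$listed"
rm -r "$dir"
```

Note that without -i the line 'Grep is handy' does not match 'grep', which is why it shows up in the -v output.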
- We worked on a fairly hard expression in class to identify lines
in a file that have a less-than sign that does not have a corresponding
greater-than sign before the end of the line.
- Purpose: identifying potentially bad HTML
- What are some lines that should match?
- < text
- text <
- text < text
- < <tag>
- < <tag> >
- <tag> <
- What are some lines that shouldn't match?
- One person suggested using the
-v flag in our solution.
We would then want to come up with a pattern for "legal lines".
- How do we identify legal lines? By breaking the line up into parts.
- A legal line begins with some non-tag stuff. We can use the set
negation notation for that,
- Then there is a series of tag/nontag pairs.
- A tag is a less-than sign, non-signs, a greater-than sign. Or,
- We've already done nontags.
- After all these tag/nontag pairs comes an optional set of nontag characters.
- We need to anchor the pattern at the beginning and end of the line.
- Putting it all together ...
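The outline does not preserve the final expression, so what follows is one possible reconstruction of the steps above, not necessarily what we wrote in class. It assumes that nontag text is any run of characters other than a less-than sign, that the inside of a tag contains neither sign, and that we use extended syntax (-E). The variable names and the sample file page.html are hypothetical.

```shell
# Building blocks (assumed forms, see the note above).
nontag='[^<]*'        # any run of characters other than <
tag='<[^<>]*>'        # <, then characters that are neither < nor >, then >

# A legal line: nontag text, then any number of tag/nontag pairs,
# anchored at the beginning and end of the line.
legal="^${nontag}(${tag}${nontag})*\$"

dir=$(mktemp -d)
printf '%s\n' \
  '< text' 'text <' 'text < text' '< <tag>' '< <tag> >' '<tag> <' \
  'plain text' '<b>bold</b> text' 'a > b' > "$dir/page.html"

# -E for the unescaped parentheses, -v to keep only the lines that are NOT legal.
suspect=$(grep -Ev "$legal" "$dir/page.html")
printf '%s\n' "$suspect"
rm -r "$dir"
```

Because each tag/nontag pair already ends in nontag text, the trailing optional nontag from the outline is covered by the repeated group. With these assumptions a stray greater-than sign (as in 'a > b') is treated as ordinary text; if you would rather flag it, exclude > from nontag as well.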