
**1. A BNF Grammar for Regular Expressions**

Recall that regular expressions over an alphabet, sigma, consist of

- epsilon
- singleton elements
- concatenated regular expressions of the form `RS`
- alternated regular expressions of the form `R+S`
- starred regular expressions of the form `R*`
- parenthesized regular expressions of the form `(R)`

The formal definition of regular expressions uses parentheses in each
case. This helps avoid ambiguity but leads to longer expressions, like
`((a)+(b))((c)*)`. Hence, in practice we often use a more concise
notation. To ensure that every expression, such as `d+ef*`, is
unambiguous, we need to choose appropriate precedence levels.
Parenthesization has highest precedence, followed by Kleene star,
followed by concatenation, followed by alternation. For example, the
first expression above might be written `(a+b)c*` and the
second expression above is shorthand for `(d)+((e)((f)*))`.

Write an unambiguous BNF grammar for regular expressions that follows
these precedence rules. You can assume that members of the alphabet
have been tokenized and can be referred to with the terminal symbol
`symbol`.

**Solution**

I've chosen to write a grammar in which starred expressions cannot
be starred. This is because `E**` is semantically equivalent to `E*`.

    RegExp    ::= RegExp '+' RegTerm | RegTerm
    RegTerm   ::= RegTerm RegStar | RegStar
    RegStar   ::= RegFactor '*' | RegFactor
    RegFactor ::= epsilon | symbol | '(' RegExp ')'
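
One way to check that the grammar gives the intended precedence is to turn it into a small parser. The following is only a sketch under some assumptions that are not part of the problem: symbols are single letters, epsilon is written `_`, and the left-recursive productions are replaced by loops (which build the same left-associative trees). The names `parse` and `show` are made up for this illustration.

```python
def parse(text):
    """Parse a concise regular expression into a nested-tuple tree."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def starts_factor(tok):
        # A RegFactor begins with a symbol, epsilon ('_'), or '('.
        return tok is not None and (tok.isalpha() or tok in "_(")

    def reg_exp():
        # RegExp ::= RegExp '+' RegTerm | RegTerm
        node = reg_term()
        while peek() == "+":
            advance()
            node = ("alt", node, reg_term())
        return node

    def reg_term():
        # RegTerm ::= RegTerm RegStar | RegStar
        node = reg_star()
        while starts_factor(peek()):
            node = ("cat", node, reg_star())
        return node

    def reg_star():
        # RegStar ::= RegFactor '*' | RegFactor   (at most one star)
        node = reg_factor()
        if peek() == "*":
            advance()
            node = ("star", node)
        return node

    def reg_factor():
        # RegFactor ::= epsilon | symbol | '(' RegExp ')'
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            node = reg_exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok == "_":
            advance()
            return ("eps",)
        if tok.isalpha():
            advance()
            return ("sym", tok)
        raise SyntaxError("unexpected token " + repr(tok))

    tree = reg_exp()
    if peek() is not None:
        raise SyntaxError("unexpected token " + repr(peek()))
    return tree


def show(node):
    """Print a tree in the fully parenthesized formal notation."""
    def wrap(child):
        return "(" + show(child) + ")"

    kind = node[0]
    if kind == "eps":
        return "_"
    if kind == "sym":
        return node[1]
    if kind == "star":
        return wrap(node[1]) + "*"
    if kind == "cat":
        return wrap(node[1]) + wrap(node[2])
    return wrap(node[1]) + "+" + wrap(node[2])


print(show(parse("d+ef*")))    # (d)+((e)((f)*))
print(show(parse("(a+b)c*")))  # ((a)+(b))((c)*)
```

Under these assumptions, `parse("a**")` raises `SyntaxError`, which reflects the choice that starred expressions cannot be starred.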

**Notes**

The errors for this problem tended to be a failure to make the grammar unambiguous or to assign appropriate precedence. Because '+' and concatenation are not obviously "associative", some found it difficult to decide how to handle associativity and therefore left the grammar ambiguous.

Another common problem was to insert a `'.'` for concatenation
when it wasn't part of the original language. I was relatively lenient
when correcting this error, as it seemed less significant.

**2. L-values and R-values**

Give an assignment statement in an Algol-like language involving variable X such that the R-value of X is used on the left-hand-side of the assignment and the L-value of X is used on the right-hand-side of the assignment. You may choose any type you deem appropriate for X. You may also use other variables in the assignment. Again, you should indicate the type of each variable you use, especially of X.

**Solution**

    var y, z: ref int;
    var x: int;
    if (x < 3) then y else z := x;

The `x<3` comparison requires the *r-value* of `x` and falls on the
left-hand side. The result of the `if` expression is a variable of type
"ref int", so we'll need to get a "ref int" on the right-hand side.
Since the *l-value* of `x` is of type "ref int", we use that l-value.

Another common solution was

    var A : array of ref int;
    A[x] = x;
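
To make the distinction concrete, here is a rough Python model of the array solution. It is a sketch, not Algol semantics: the `Memory` class and its method names are invented for this illustration, addresses are simply list indices, and the array `A` is represented as a Python list whose slots hold addresses.

```python
class Memory:
    """A toy store: each variable name maps to an address, each address to contents."""

    def __init__(self):
        self.cells = []   # address -> contents
        self.env = {}     # variable name -> address

    def declare(self, name, initial=None):
        self.env[name] = len(self.cells)
        self.cells.append(initial)

    def l_value(self, name):
        # The l-value is the location of the variable.
        return self.env[name]

    def r_value(self, name):
        # The r-value is what is stored at that location.
        return self.cells[self.env[name]]

    def store(self, address, value):
        self.cells[address] = value


mem = Memory()
mem.declare("x", 2)            # var x: int, currently holding 2
A = [None] * 5                 # var A: array of ref int (slots hold addresses)

index = mem.r_value("x")       # left-hand side: A[x] needs the contents of x
A[index] = mem.l_value("x")    # right-hand side: a "ref int" is the address of x

# Because A[2] refers to x, a later change to x is visible through A[2].
mem.store(mem.l_value("x"), 7)
print(mem.cells[A[2]])         # 7
```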

**3. Critiquing Variant Records**

Some versions of Pascal support a data storage mechanism called the
*variant record* in which it is possible to assign multiple types
to the same storage location. Why do you think the designers of these
versions of Pascal included such a mechanism? What language design
criteria does it violate, and why?

**Notes**

Most of your answers were fairly good. A few of you failed to answer half of the question (Why do you think they chose to include it in the language?) and were penalized appropriately.

**4. Assigning Arrays**

While arrays play a large role in many programming languages, they are often supported in a way inferior to other data types. For example, many languages only permit you to describe the full contents of an array when the array is being created. Similarly, few permit you to assign one array to another (with the sense of copying all the elements from one array to another).

A language designer might choose to provide many more mechanisms for
working with arrays. In particular, one might allow programmers to
assign to arrays. For example, if `A` and `B` are
arrays of the same size, then `A=B` is a legal assignment and
copies all the elements of `B` into `A`.
Similarly, one might also permit the programmer to describe "array
values" with an appropriate syntax, such as surrounding the contents
with parentheses. For example, `(4, 2, 7)` might represent
an array of three integers. Similarly, if `x` and `y` are string
variables, then `(x, y)` is a valid array value. If `A` is an array of
size 5, then `A = (x,y,4,2,x)` would be a valid assignment.

*4.1. What design principles would suggest such a language design?*

**Partial Solution**

There are a number of criteria that would suggest such a design. One of
these is *uniformity*. Since we can assign to variables of other
types (e.g., integer, boolean, real), uniformity suggests that we should
be able to assign to variables of all types. Similarly, since we can
express "base values" in various primitive types, uniformity suggests that
we should also be able to express base values in our array types.

Another reasonable criterion is *writability*. It is far
easier to initialize all of the elements of an array in one fell swoop
than to do it an element at a time. Some of you suggested doing the
initialization with a for loop, but I'm not sure how you would do that.

One might even take this idea to an extreme, and permit such array
values on either side of an assignment statement. If `C` is
an array of size 2, then `(x,y) = C` might be a valid
assignment that assigns the first value in `C` to `x` and the second
value in `C` to `y`.

*4.2. What design principles would suggest such a design?*

**Partial Solution**

*Orthogonality* normally suggests that if you can do something
in one way, then you should be able to do it in similar ways. However,
this isn't the best argument for such an assignment, since the ability
to write `x = 3` does not imply an ability to write `3 = x`.

Again, *writability* seems to be a significant motivating
criterion for this type of assignment. Among other things, it might
allow us to return multiple values from functions and then separate
those values into different variables. It also makes it easier to
"mass initialize" variables.

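Python happens to support this style of assignment, so it can serve as an illustration of both uses mentioned above. The function name `min_and_max` and the sample values are made up for this sketch.

```python
def min_and_max(values):
    """Return two results at once, packaged as a single tuple."""
    return (min(values), max(values))


# Separate the returned values into different variables.
(smallest, largest) = min_and_max([4, 2, 7])

# "Mass initialize" several variables in one statement.
(x, y, z) = (0, 0, 0)

# With C of size 2, assign its first element to x and its second to y.
C = ["first", "second"]
(x, y) = C
```
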
*4.3. What difficulties do you foresee in adding such array support to
the language?*

**Partial Solution**

The greatest difficulties will have to do with typing. There are a number of key issues in typing: (1) the size of the arrays (and what you do if they're different); (2) the "types" of the arrays; and (3) the types of the values within the arrays.

What should we do in A = B when A and B are different sizes? We could
disallow such assignments (most likely as a *run time* error).
We could permit such assignments only when B is bigger than A, with the
implied meaning that we only assign to the corresponding elements of A.
We could permit such assignments when B is smaller than A, with the
implied meaning that assignments are only done when there is a corresponding
value in B.
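
The three options can be sketched side by side. The policy names `"strict"`, `"truncate"`, and `"partial"` are labels invented for this illustration, and Python lists stand in for arrays; this is a sketch of the design space, not a recommendation.

```python
def assign_array(a, b, policy="strict"):
    """Copy elements of b into a (in place) under one size policy."""
    if policy == "strict":
        # Disallow the assignment when the sizes differ.
        if len(a) != len(b):
            raise ValueError("array sizes differ")
    elif policy == "truncate":
        # Allow a larger B; the extra elements of B are ignored.
        if len(b) < len(a):
            raise ValueError("source array is too small")
    elif policy == "partial":
        # Allow a smaller B; the remaining elements of A are left alone.
        if len(b) > len(a):
            raise ValueError("source array is too large")
    else:
        raise ValueError("unknown policy: " + repr(policy))

    # Copy every position that exists in both arrays.
    for i in range(min(len(a), len(b))):
        a[i] = b[i]
```

Note that the size check here happens when the assignment runs, which matches the remark that a mismatch would most likely be a run-time error.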

A more significant problem is what to do with the "constant" arrays.
Note that they have very different meanings depending on which side of
the assignment they fall on. For example, while `A = (a b c)` means
"assign the r-value of the ith element of `(a b c)` to
the memory location *corresponding to* the ith element of `A`",
`(a b c) = A` means "assign the r-value of
the ith element of `A` to the memory location given by the
l-value of the ith element of `(a b c)`". That is, in
one case, we assign directly to the elements of the array and in the
other, we assign indirectly based on the contents of the elements of
the array.
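
The two meanings can be written out as two different procedures. In this sketch the `Cell` class and both function names are hypothetical, a `Cell` object stands for a variable's storage location, and the two sides are assumed to have the same size.

```python
class Cell:
    """A storage location; `contents` plays the role of the r-value."""

    def __init__(self, contents):
        self.contents = contents


def assign_from_variables(array, variables):
    """A = (a b c): copy each variable's r-value into the array's own slots."""
    for i, var in enumerate(variables):
        array[i] = var.contents


def assign_to_variables(variables, array):
    """(a b c) = A: copy each array element into the location of a variable."""
    for i, var in enumerate(variables):
        var.contents = array[i]


a, b, c = Cell(1), Cell(2), Cell(3)

A = [0, 0, 0]
assign_from_variables(A, [a, b, c])   # A changes; a, b, c are only read

A2 = [7, 8, 9]
assign_to_variables([a, b, c], A2)    # a, b, c change; A2 is only read
```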

The third problem is easier to catch and handle at compile time. This is simply "are all the elements of the same type".

Are there other problems? Certainly. These are some of the most significant ones.

**5. Computing with Binary Representation**

The following is a simple BNF grammar for binary numbers.

    Num   ::= Digit | Num Digit
    Digit ::= '0' | '1'

One useful attribute of binary numbers is their base-ten value. Extend this
grammar with a `value` attribute (for both `Digit` and `Num`) and rules
for computing these attributes.

**Solution**

    Num ::= Digit              Num.value = Digit.value
    Num0 ::= Num1 Digit        Num0.value = 2 * Num1.value + Digit.value
    Digit ::= '0'              Digit.value = 0
    Digit ::= '1'              Digit.value = 1
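
The rules can be checked by evaluating them recursively, one function per nonterminal. This is a sketch that assumes the input is a Python string of `'0'` and `'1'` characters; the function names `digit_value` and `num_value` are made up here. Each branch applies only the rule attached to the production it matches.

```python
def digit_value(digit):
    """Compute Digit.value."""
    if digit == "0":        # Digit ::= '0'    Digit.value = 0
        return 0
    if digit == "1":        # Digit ::= '1'    Digit.value = 1
        return 1
    raise ValueError("not a binary digit: " + repr(digit))


def num_value(num):
    """Compute Num.value for a nonempty string of binary digits."""
    if len(num) == 0:
        raise ValueError("a Num needs at least one Digit")
    if len(num) == 1:
        # Num ::= Digit     Num.value = Digit.value
        return digit_value(num)
    # Num0 ::= Num1 Digit   Num0.value = 2 * Num1.value + Digit.value
    return 2 * num_value(num[:-1]) + digit_value(num[-1])


print(num_value("101"))   # 5
```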

**Notes**

The biggest problem I saw with your solutions was a failure to abide by the rules of attribute grammars. In an attribute grammar, the rules for computing attributes must be associated with productions. The attributes computed in a rule can only be based on attributes of the nonterminals used in the corresponding production.

In addition, each attribute must be associated with a nonterminal. There are no "global attributes".


**Disclaimer** Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.

Source text last modified Wed Oct 14 14:11:11 1998.

This page generated on Wed Oct 14 14:20:12 1998 by SiteWeaver.

Contact our webmaster at rebelsky@math.grin.edu