# Solutions to Examination 1

1. A BNF Grammar for Regular Expressions

Recall that regular expressions over an alphabet, sigma, consist of

• epsilon
• singleton elements
• concatenated regular expressions of the form RS
• alternated regular expressions of the form R+S
• starred regular expressions of the form R*
• parenthesized regular expressions of the form (R)

The formal definition of regular expressions uses parentheses in each case. This helps avoid ambiguity but leads to longer expressions, like ((a)+(b))((c)*). Hence, in practice we often use a more concise notation. To ensure that every expression, such as d+ef*, is unambiguous, we need to choose appropriate precedence levels. Parenthesization has highest precedence, followed by Kleene star, followed by concatenation, followed by alternation. For example, the first expression above might be written (a+b)c* and the second expression above is shorthand for (d)+((e)((f)*)).

Write an unambiguous BNF grammar for regular expressions that follows these precedence rules. You can assume that members of the alphabet have been tokenized and can be referred to with the terminal symbol symbol.

Solution

I've chosen to write a grammar in which starred expressions cannot be starred. This is because E** is semantically equivalent to E*.

RegExp ::= RegExp '+' RegTerm
|  RegTerm
RegTerm ::= RegTerm RegStar
|  RegStar
RegStar ::= RegFactor '*'
|  RegFactor
RegFactor ::= epsilon
|  symbol
|  '(' RegExp ')'
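
To see how the grammar enforces the precedence levels, here is a minimal recursive-descent sketch in Python (not part of the original solution). It assumes that any single letter is a symbol token and that the character ε stands for epsilon; the function name parse_regexp is hypothetical. The two left-recursive productions are handled with loops, which keeps alternation and concatenation left-associative. The parser returns a fully parenthesized string so the grouping is visible.

```python
def parse_regexp(text):
    """Parse text with the grammar above; return a fully parenthesized string."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def advance():
        nonlocal pos
        ch = text[pos]
        pos += 1
        return ch

    def starts_factor(ch):
        # Assumption: a symbol (or epsilon, written ε) is any single letter.
        return ch is not None and (ch.isalpha() or ch == "(")

    def regexp():
        # RegExp ::= RegExp '+' RegTerm | RegTerm   (left recursion as a loop)
        left = regterm()
        while peek() == "+":
            advance()
            right = regterm()
            left = f"({left}+{right})"
        return left

    def regterm():
        # RegTerm ::= RegTerm RegStar | RegStar     (left recursion as a loop)
        left = regstar()
        while starts_factor(peek()):
            right = regstar()
            left = f"({left}{right})"
        return left

    def regstar():
        # RegStar ::= RegFactor '*' | RegFactor     (at most one star)
        inner = regfactor()
        if peek() == "*":
            advance()
            return f"({inner}*)"
        return inner

    def regfactor():
        # RegFactor ::= epsilon | symbol | '(' RegExp ')'
        ch = peek()
        if ch == "(":
            advance()
            inner = regexp()
            if peek() != ")":
                raise SyntaxError(f"expected ')' at position {pos}")
            advance()
            return inner
        if ch is not None and ch.isalpha():
            return advance()
        raise SyntaxError(f"unexpected input at position {pos}")

    result = regexp()
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return result
```

Under these assumptions, parse_regexp("d+ef*") groups the star with f first, then the concatenation, then the alternation, while an input such as "a**" is rejected because RegStar allows only one star.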

Notes

The errors for this problem tended to be a failure to make the grammar unambiguous or to assign appropriate precedence levels. Because '+' and concatenation have no obvious associativity, some found it difficult to decide how to handle associativity and therefore left the grammar ambiguous.

Another common problem was to insert a '.' for concatenation when it wasn't part of the original language. I was relatively lenient when correcting this error, as it seemed less significant.

2. L-values and R-values

Give an assignment statement in an Algol-like language involving variable X such that the R-value of X is used on the left-hand-side of the assignment and the L-value of X is used on the right-hand-side of the assignment. You may choose any type you deem appropriate for X. You may also use other variables in the assignment. Again, you should indicate the type of each variable you use, especially of X.

Solution

var y,z: ref int;
var x: int;
if (x < 3) then y else z := x;

The x<3 comparison requires the r-value of x and falls on the left-hand side. The result of the if expression is a variable of type "ref int", so we'll need to get a "ref int" on the right-hand side. Since the l-value of x is of type "ref int", we use that l-value.
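
One way to picture this is with an explicit memory model. The Python sketch below is an illustration under assumptions of my own: an environment maps each name to a location (its l-value) and a store maps each location to its contents (its r-value). The names run_assignment, lvalue, and rvalue and the location numbers are all hypothetical.

```python
def run_assignment(x_value):
    """Simulate: if (x < 3) then y else z := x;"""
    # Environment: name -> location (the l-value of the variable).
    env = {"x": 0, "y": 1, "z": 2}
    # Store: location -> contents (the r-value of the variable).
    # y and z are "ref int" variables that do not refer to anything yet.
    store = {0: x_value, 1: None, 2: None}

    def lvalue(name):
        return env[name]

    def rvalue(name):
        return store[env[name]]

    # Left-hand side: the comparison reads the r-value of x, and the
    # conditional selects the variable (location) to assign to.
    target = lvalue("y") if rvalue("x") < 3 else lvalue("z")
    # Right-hand side: the target holds a "ref int", so we store the
    # l-value (location) of x rather than its contents.
    store[target] = lvalue("x")
    return env, store
```

With x holding 2, the store entry for y ends up containing location 0, the l-value of x, and z is left untouched.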

Another common solution was

var A : array of ref int;
A[x] := x;

3. Critiquing Variant Records

Some versions of Pascal support a data storage mechanism called the variant record in which it is possible to assign multiple types to the same storage location. Why do you think the designers of these versions of Pascal included such a mechanism? What language design criteria does it violate, and why?

Notes

Most of your answers were fairly good. A few of you failed to answer half of the question (Why do you think they chose to include it in the language?) and were penalized appropriately.

4. Assigning Arrays

While arrays play a large role in many programming languages, they are often supported in a way inferior to other data types. For example, many languages only permit you to describe the full contents of an array when the array is being created. Similarly, few permit you to assign one array to another (with the sense of copying all the elements from one array to another).

A language designer might choose to provide many more mechanisms for working with arrays. In particular, one might allow the programmer to assign to arrays. For example, if A and B are arrays of the same size, then A=B is a legal assignment and copies all the elements of B into A. Similarly, one might also permit the programmer to describe "array values" with an appropriate syntax, such as surrounding the contents with parentheses. For example, (4, 2, 7) might represent an array of three integers. Similarly, if x and y are string variables, then (x, y) is a valid array value. If A is an array of size 5, then A = (x,y,4,2,x) would be a valid assignment.

4.1. What design principles would suggest such a language design?

Partial Solution

There are a number of criteria that would suggest such a design. One of these is uniformity. Since we can assign to variables of other types (e.g., integer, boolean, real), uniformity suggests that we should be able to assign to variables of all types. Similarly, since we can express "base values" in various primitive types, uniformity suggests that we should also be able to express base values in our array types.

Another reasonable criterion is writability. It is far easier to initialize all of the elements of an array in one fell swoop than to do it an element at a time. Some of you suggested doing the initialization with a for loop, but I'm not sure how you would do that.

One might even take this idea to an extreme, and permit such array values on either side of an assignment statement. If C is an array of size 2, then (x,y) = C might be a valid assignment that assigns the first value in C to x and the second value in C to y.

4.2. What design principles would suggest such a design?

Partial Solution

Orthogonality normally suggests that if you can do something in one way, then you should be able to do it in similar ways. However, this isn't the best argument for such an assignment, since the ability to write x = 3 does not imply an ability to write 3 = x.

Again, writability seems to be a significant motivating criterion for this type of assignment. Among other things, it might allow us to return multiple values from functions and then separate those values into different variables. It also makes it easier to "mass initialize" variables.
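
Python is one language that supports a form of this assignment, so it can serve as a small illustration of both uses mentioned above. The sketch below is only an example; the function divide and the variable names are hypothetical.

```python
def divide(n, d):
    """Return two values at once: the quotient and the remainder."""
    return (n // d, n % d)

# Separate the returned values into different variables.
(q, r) = divide(17, 5)

# "Mass initialize" several variables from one array value.
C = ["left", "right"]
(x, y) = C

# The same notation on the right-hand side builds an array value.
A = (x, y, 4, 2, x)
```

Note that Python answers the size question from the next part by refusing the assignment: unpacking a sequence into the wrong number of targets raises a ValueError at run time.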

4.3. What difficulties do you foresee in adding such array support to the language?

Partial Solution

The greatest difficulties will have to do with typing. There are a number of key issues in typing: (1) the size of the arrays (and what you do if they're different); (2) the "types" of the arrays; and (3) the types of the values within the arrays.

What should we do in A = B when A and B are different sizes? We could disallow such assignments (most likely as a run time error). We could permit such assignments only when B is bigger than A, with the implied meaning that we only assign to the corresponding elements of A. We could permit such assignments when B is smaller than A, with the implied meaning that assignments are only done when there is a corresponding value in B.
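
The choices for mismatched sizes can be sketched as two copy routines in Python. This is only an illustration; the names assign_strict and assign_overlap are hypothetical, and the second routine covers both of the permissive options by copying only the positions that exist in both arrays.

```python
def assign_strict(a, b):
    """A = B is an error unless the sizes match."""
    if len(a) != len(b):
        raise ValueError("arrays differ in size")
    a[:] = b

def assign_overlap(a, b):
    """A = B copies only the elements that have a counterpart.

    If B is bigger, the extra elements of B are ignored.
    If B is smaller, the extra elements of A keep their old values.
    """
    n = min(len(a), len(b))
    a[:n] = b[:n]
```

For example, with A = [0, 0, 0], assign_overlap(A, [1, 2]) leaves A as [1, 2, 0], while assign_strict(A, [1, 2]) raises an error.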

A more significant problem is what to do with the "constant" arrays. Note that they have very different meanings depending on which side of the assignment they fall on. For example, while A = (a b c) means "assign the r-value of the ith element of (a b c) to the memory location corresponding to the ith value of A", (a b c) = A means "assign the r-value of the ith element of A to the memory location given by the l-value of the ith element of (a b c)". That is, in one case, we assign directly to the elements of the array and in the other, we assign indirectly based on the contents of the elements of the array.

The third problem is easier to catch and handle at compile time. This is simply "are all the elements of the same type".

Are there other problems? Certainly. These are some of the most significant ones.

5. Computing with Binary Representation

The following is a simple BNF grammar for binary numbers.

Num ::= Digit
|  Num Digit
Digit ::= '0'
|  '1'

One useful attribute of binary numbers is their base-ten value. Extend this grammar with a value attribute (for both Digit and Num) and rules for computing these attributes.

Solution

Num ::= Digit
Num.value = Digit.value
Num0 ::= Num1 Digit
Num0.value = 2 * Num1.value + Digit.value
Digit ::= '0'
Digit.value = 0
Digit ::= '1'
Digit.value = 1
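
As a check on these rules, here is a minimal Python sketch that evaluates the value attribute by following the productions directly. It assumes the numeral arrives as a string of '0' and '1' characters; the function names digit_value and num_value are hypothetical.

```python
def digit_value(ch):
    """Digit.value for the productions Digit ::= '0' and Digit ::= '1'."""
    if ch == "0":
        return 0
    if ch == "1":
        return 1
    raise ValueError(f"not a binary digit: {ch!r}")

def num_value(numeral):
    """Num.value, computed only from the attributes of the children."""
    if not numeral:
        raise ValueError("a Num needs at least one Digit")
    if len(numeral) == 1:
        # Num ::= Digit
        return digit_value(numeral)
    # Num0 ::= Num1 Digit, where Num1 is everything but the last digit
    return 2 * num_value(numeral[:-1]) + digit_value(numeral[-1])
```

For the numeral 1011, the rules give 1, then 2*1+0 = 2, then 2*2+1 = 5, then 2*5+1 = 11.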

Notes

The biggest problem I saw with your solutions was a failure to abide by the rules of attribute grammars. In an attribute grammar, the rules for computing attributes must be associated with productions. The attributes computed in a rule can only be based on attributes of the nonterminals used in the corresponding production.

In addition, each attribute must be associated with a nonterminal. There are no "global attributes".

Disclaimer: Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.