1. A BNF Grammar for Regular Expressions
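Recall that regular expressions over an alphabet, sigma, consist of
symbols from sigma, epsilon, and the following compound forms, where
R and S are regular expressions:
RS (concatenation)
R+S (alternation)
R* (Kleene star)
(R) (parenthesization)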
The formal definition of regular expressions uses parentheses in each
case. This helps avoid ambiguity but leads to longer expressions, like
((a)+(b))((c)*). Hence, in practice we often use a more concise
notation. To ensure that every expression, such as d+ef*,
is unambiguous, we need to choose appropriate precedence levels.
Parenthesization has highest precedence, followed by Kleene star,
followed by concatenation, followed by alternation. For example, the
first expression above might be written (a+b)c* and the
second expression above is shorthand for
(d)+((e)((f)*)).
Write an unambiguous BNF grammar for regular expressions that follows
these precedence rules. You can assume that members of the alphabet
have been tokenized and can be referred to with the terminal symbol
symbol.
Solution
I've chosen to write a grammar in which a starred expression cannot
be starred again unless it is parenthesized. This is because E** is
semantically equivalent to E*.
RegExp    ::= RegExp '+' RegTerm
            | RegTerm
RegTerm   ::= RegTerm RegStar
            | RegStar
RegStar   ::= RegFactor '*'
            | RegFactor
RegFactor ::= epsilon
            | symbol
            | '(' RegExp ')'
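To see how the grammar enforces the precedence levels, here is a minimal recursive-descent sketch in Python. It is an illustration under stated assumptions, not part of the original solution: each symbol is assumed to be a single character, the epsilon alternative is omitted, the left-recursive rules are rewritten as loops, and the names parse, reg_exp, reg_term, reg_star, and reg_factor are invented for this example.

```python
def parse(text):
    """Parse a regular expression into a nested-tuple syntax tree.

    Assumptions (not from the original solution): every symbol is one
    character, and the epsilon alternative of RegFactor is left out.
    """
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def reg_exp():
        # RegExp ::= RegExp '+' RegTerm | RegTerm   (left recursion as a loop)
        nonlocal pos
        node = reg_term()
        while peek() == '+':
            pos += 1
            node = ('alt', node, reg_term())
        return node

    def reg_term():
        # RegTerm ::= RegTerm RegStar | RegStar     (left recursion as a loop)
        node = reg_star()
        while peek() is not None and peek() not in '+)*':
            node = ('cat', node, reg_star())
        return node

    def reg_star():
        # RegStar ::= RegFactor '*' | RegFactor     (at most one star)
        nonlocal pos
        node = reg_factor()
        if peek() == '*':
            pos += 1
            node = ('star', node)
        return node

    def reg_factor():
        # RegFactor ::= symbol | '(' RegExp ')'
        nonlocal pos
        ch = peek()
        if ch == '(':
            pos += 1
            node = reg_exp()
            if peek() != ')':
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1
            return node
        if ch is None or ch in '+*)':
            raise ValueError("expected a symbol at position %d" % pos)
        pos += 1
        return ('sym', ch)

    tree = reg_exp()
    if pos != len(text):
        raise ValueError("unexpected %r at position %d" % (text[pos], pos))
    return tree
```

With this sketch, parse("d+ef*") groups the input as (d)+((e)((f)*)), and parse("a**") is rejected because a starred expression cannot be starred again without parentheses.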
Notes
The errors for this problem tended to be a failure to make the grammar unambiguous or to assign appropriate precedence rules. Because '+' and concatenation are not obviously "associative", some found it difficult to decide how to handle associativity and therefore left the grammar ambiguous.
Another common problem was to insert a '.' for concatenation
when it wasn't part of the original language. I was relatively lenient
when correcting this error, as it seemed less significant.
2. L-values and R-values
Give an assignment statement in an Algol-like language involving variable X such that the R-value of X is used on the left-hand-side of the assignment and the L-value of X is used on the right-hand-side of the assignment. You may choose any type you deem appropriate for X. You may also use other variables in the assignment. Again, you should indicate the type of each variable you use, especially of X.
Solution
var y,z: ref int;
var x: int;
if (x < 3) then y else z := x;
The x<3 comparison requires the r-value of
x and falls on the left-hand-side. The result of the
if expression is a variable of type "ref int", so we'll
need to get a "ref int" on the right hand side. Since the l-value
of x is of type "ref int", we use that l-value.
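The distinction can be made concrete with a small simulated store in Python. This is only a sketch under assumptions: the list memory, the helper alloc, and the function run_example are invented for illustration, an address (list index) plays the role of an l-value, and the contents at that address play the role of an r-value.

```python
def run_example(initial_x):
    """Simulate: if (x < 3) then y else z := x  with x: int and y, z: ref int."""
    memory = []  # simulated store; an index into it stands for an l-value

    def alloc(value):
        # Reserve a new cell and return its address (the variable's l-value).
        memory.append(value)
        return len(memory) - 1

    x = alloc(initial_x)  # x: int
    y = alloc(None)       # y: ref int, initially refers to nothing
    z = alloc(None)       # z: ref int, initially refers to nothing

    # Left-hand side: the comparison reads memory[x], the r-value of x.
    target = y if memory[x] < 3 else z

    # Right-hand side: the value stored is the address x, the l-value of x.
    memory[target] = x
    return memory, x, y, z
```

After run_example(1), the cell for y holds the address of x, so following that reference yields 1. After run_example(5), the cell for z holds the address of x instead.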
Another common solution was
var A : array of ref int;
A[x] := x;
3. Critiquing Variant Records
Some versions of Pascal support a data storage mechanism called the variant record in which it is possible to assign multiple types to the same storage location. Why do you think the designers of these versions of Pascal included such a mechanism? What language design criteria does it violate, and why?
Notes
Most of your answers were fairly good. A few of you failed to answer half of the question (Why do you think they chose to include it in the language?) and were penalized appropriately.
4. Assigning Arrays
While arrays play a large role in many programming languages, they are often supported in a way inferior to other data types. For example, many languages only permit you to describe the full contents of an array when the array is being created. Similarly, few permit you to assign one array to another (with the sense of copying all the elements from one array to another).
A language designer might choose to provide many more mechanisms for
working with arrays. In particular, one might allow the programmer to
assign to arrays. For example, if A and B are
arrays of the same size, then A=B is a legal assignment and
copies all the elements of B into A.
Similarly, one might also permit the programmer to describe "array
values" with an appropriate syntax, such as surrounding the contents
with parentheses. For example, (4, 2, 7) might represent
an array of three integers. Similarly, if x and
y are string variables, then (x, y) is a valid
array value. If A is an array of size 5, then A =
(x,y,4,2,x) would be a valid assignment.
4.1. What design principles would suggest such a language design?
Partial Solution
There are a number of criteria that would suggest such a design. One of these is uniformity. Since we can assign to variables of other types (e.g., integer, boolean, real), uniformity suggests that we should be able to assign to variables of all types. Similarly, since we can express "base values" in various primitive types, uniformity suggests that we should also be able to express base values in our array types.
Another reasonable criterion is writability. It is far easier to initialize all of the elements of an array in one fell swoop than to do it one element at a time. Some of you suggested doing the initialization with a for loop, but I'm not sure how you would do that.
One might even take this idea to an extreme, and permit such array
values on either side of an assignment statement. If C is
an array of size 2, then (x,y) = C might be a valid
assignment that assigns the first value in
C to x and the second value
in C to y.
4.2. What design principles would suggest such a design?
Partial Solution
Orthogonality normally suggests that if you can do something
in one way, then you should be able to do it in similar ways. However,
this isn't the best argument for such an assignment, since the ability
to write x = 3 does not imply an ability to write
3 = x.
Again, writability seems to be a significant motivating criterion for this type of assignment. Among other things, it might allow us to return multiple values from functions and then separate those values into different variables. It also makes it easier to "mass initialize" variables.
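Python happens to support this style of assignment, so it can serve as a quick illustration of the multiple-return-value use. The function min_max below is a hypothetical example, not something from the exam.

```python
def min_max(values):
    """Return the smallest and largest element as a two-element value."""
    return (min(values), max(values))


# The left-hand side is an "array value" of variables; each one receives
# the corresponding element of the result.
lo, hi = min_max([4, 2, 7])
```

Here lo receives the first element of the result and hi the second, much as (x,y) = C would assign the two elements of C.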
4.3. What difficulties do you foresee in adding such array support to the language?
Partial Solution
The greatest difficulties will have to do with typing. There are a number of key issues in typing: (1) the size of the arrays (and what you do if they're different); (2) the "types" of the arrays; and (3) the types of the values within the arrays.
What should we do in A = B when A and B are different sizes? We could disallow such assignments (most likely as a run time error). We could permit such assignments only when B is bigger than A, with the implied meaning that we only assign to the corresponding elements of A. We could permit such assignments when B is smaller than A, with the implied meaning that assignments are only done when there is a corresponding value in B.
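The three options can be compared in a short Python sketch. The function assign_array and its policy names are invented for this illustration; Python lists stand in for the arrays, and the size check is done at run time.

```python
def assign_array(a, b, policy="same_size"):
    """Copy elements of b into a under one of three size policies.

    Policy names are hypothetical labels for the options in the text:
      "same_size"       - sizes must match, otherwise a run-time error
      "larger_source"   - b may be bigger than a; extra elements are ignored
      "smaller_source"  - b may be smaller than a; the rest of a is untouched
    """
    if policy == "same_size":
        allowed = len(a) == len(b)
    elif policy == "larger_source":
        allowed = len(b) >= len(a)
    elif policy == "smaller_source":
        allowed = len(b) <= len(a)
    else:
        raise ValueError("unknown policy: %r" % policy)

    if not allowed:
        raise ValueError("array sizes %d and %d not permitted under %s"
                         % (len(a), len(b), policy))

    # Only positions that exist in both arrays are assigned.
    for i in range(min(len(a), len(b))):
        a[i] = b[i]
```

For example, with a = [0, 0, 0], assigning [1, 2] under "smaller_source" leaves a as [1, 2, 0], while the same assignment under "same_size" raises an error.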
A more significant problem is what to do with the "constant" arrays.
Note that they have very different meanings depending on which side of
the assignment they fall on. For example, while A = (a, b, c)
means "assign the r-value of the ith element of (a, b, c) to
the memory location corresponding to the ith element of
A", (a, b, c) = A means "assign the r-value of
the ith element of A to the memory location given by the
l-value of the ith element of (a, b, c)". That is, in
one case, we assign directly to the elements of the array and in the
other, we assign indirectly based on the contents of the elements of
the array.
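The two meanings can be sketched in Python with a dictionary acting as the store. This is an illustration under assumptions: the dictionary store, and the functions array_from_values and values_from_array, are invented here, and a variable's name stands in for its l-value.

```python
def array_from_values(store, array_name, var_names):
    """A = (a, b, c): copy the r-value of each variable into A's ith cell."""
    for i, name in enumerate(var_names):
        store[array_name][i] = store[name]


def values_from_array(store, var_names, array_name):
    """(a, b, c) = A: store A's ith element at the location named by the
    ith variable, that is, use each variable's l-value as the target."""
    for i, name in enumerate(var_names):
        store[name] = store[array_name][i]
```

In the first function the cells of A are the targets; in the second the targets are found indirectly, through the variables listed in the array value.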
The third problem is easier to catch and handle at compile time. This is simply "are all the elements of the same type?".
Are there other problems? Certainly. These are some of the most significant ones.
5. Computing with Binary Representation
The following is a simple BNF grammar for binary numbers.
Num ::= Digit
| Num Digit
Digit ::= '0'
| '1'
One useful attribute of binary numbers is their base-ten value. Extend this
grammar with a value attribute (for both Digit and
Num) and rules for computing these attributes.
Solution
Num ::= Digit            Num.value = Digit.value
Num0 ::= Num1 Digit      Num0.value = 2 * Num1.value + Digit.value
Digit ::= '0'            Digit.value = 0
Digit ::= '1'            Digit.value = 1
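As a check on these rules, the following Python sketch evaluates the value attribute by following the productions directly. The function names digit_value and num_value are invented for this illustration, and the input is assumed to be a non-empty string of '0' and '1' characters.

```python
def digit_value(digit):
    """Digit ::= '0' gives value 0; Digit ::= '1' gives value 1."""
    if digit == '0':
        return 0
    if digit == '1':
        return 1
    raise ValueError("not a binary digit: %r" % digit)


def num_value(numeral):
    """Compute Num.value for a non-empty string of binary digits."""
    if not numeral:
        raise ValueError("a Num must contain at least one digit")
    if len(numeral) == 1:
        # Num ::= Digit          Num.value = Digit.value
        return digit_value(numeral)
    # Num0 ::= Num1 Digit    Num0.value = 2 * Num1.value + Digit.value
    return 2 * num_value(numeral[:-1]) + digit_value(numeral[-1])
```

For the numeral 1011, the rules give 2 * (2 * (2 * 1 + 0) + 1) + 1 = 11. Each value depends only on the attributes of the nonterminals in its own production, as an attribute grammar requires.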
Notes
The biggest problem I saw with your solutions was a failure to abide by the rules of attribute grammars. In an attribute grammar, the rules for computing attributes must be associated with productions. The attributes computed in a rule can only be based on attributes of the nonterminals used in the corresponding production.
In addition, each attribute must be associated with a nonterminal. There are no "global attributes".
Disclaimer: Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.
Source text last modified Wed Oct 14 14:11:11 1998.
This page generated on Wed Oct 14 14:20:12 1998 by SiteWeaver.
Contact our webmaster at rebelsky@math.grin.edu