Compilers (CS362 2002F)

A Context-Sensitive Grammar for Set Before Use

Introduction

In our discussion of context-free and context-sensitive grammars, someone asked for an example of why we might need a context-sensitive grammar. This document provides one, using a fairly simple language that consists only of assignment statements. We use context-sensitive rules to ensure that every variable is set before it is used.

This language and grammar are similar to a language and grammar presented in class. This document is intended to help you understand more of the details of the design of the grammar.

Basic Grammar

We'll begin with a basic grammar for a language of sequences of assignment statements. At this point, we won't worry about requiring that every variable be set before it is used.

;;; A program is a sequence of assignment statements each of
;;; which is terminated by a semicolon. The program is
;;; terminated by a period.
Program --> Assignments .
Assignments --> Assignment ; Assignments
Assignments -->  
;;; An assignment statement has the normal imperative form.
;;; We use Pascal's assignment operation.
Assignment --> id := Exp
;;; We use the simple ambiguous expression grammar described elsewhere.
Exp --> num
Exp --> id
Exp --> Exp op Exp
Exp --> ( Exp )

Some Derivations

Let's consider some simple derivations of programs in this language (including some we might not like because of their use of unset variables).

The Empty Program

Program
=> Assignments .
=>   .

A Simple Assignment

Deriving: id(a) := num(3) ; .

Program
=> Assignments .
=> Assignment ; Assignments .
=> Assignment ;  .
=> id(a) := Exp ; .
=> id(a) := num(3) ; .
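
The derivation above can also be replayed mechanically. The following is a minimal Python sketch, not part of the original grammar: the names GRAMMAR, step, and derive are made up for illustration, and the tokens id and num are left without their values (a and 3) to keep the sketch short.

```python
# Productions of the basic grammar, keyed by nonterminal.
# Each production is a list of symbols; [] is the empty production.
GRAMMAR = {
    "Program": [["Assignments", "."]],
    "Assignments": [["Assignment", ";", "Assignments"], []],
    "Assignment": [["id", ":=", "Exp"]],
    "Exp": [["num"], ["id"], ["Exp", "op", "Exp"], ["(", "Exp", ")"]],
}

def step(form, nonterminal, choice):
    """Rewrite the leftmost occurrence of `nonterminal` in `form`
    using production number `choice` for that nonterminal."""
    i = form.index(nonterminal)  # raises ValueError if it does not occur
    return form[:i] + GRAMMAR[nonterminal][choice] + form[i + 1:]

def derive(steps, start="Program"):
    """Apply a sequence of (nonterminal, choice) steps to the start symbol."""
    form = [start]
    for nonterminal, choice in steps:
        form = step(form, nonterminal, choice)
    return form

# The same five steps as the derivation of  id(a) := num(3) ; .
simple = derive([
    ("Program", 0),
    ("Assignments", 0),
    ("Assignments", 1),
    ("Assignment", 0),
    ("Exp", 0),
])
print(" ".join(simple))  # id := num ; .
```

Because every left-hand side is a single nonterminal, a step needs to know only which nonterminal to rewrite, never what surrounds it. That is exactly what makes this grammar context-free.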

Multiple Assignments

Deriving: id(a) := num(3) ; id(b) := id(a) op(+) num(2) ; .

Program
=> Assignments .
=> Assignment ; Assignments .
=> id(a) := Exp ; Assignments .
=> id(a) := num(3) ; Assignments .
=> id(a) := num(3) ; Assignment ; Assignments .
=> id(a) := num(3) ; id(b) := Exp ; Assignments .
=> id(a) := num(3) ; id(b) := Exp op(+) Exp ; Assignments .
=> id(a) := num(3) ; id(b) := id(a) op(+) Exp ; Assignments .
=> id(a) := num(3) ; id(b) := id(a) op(+) num(2) ; Assignments .
=> id(a) := num(3) ; id(b) := id(a) op(+) num(2) ;   .

Deriving: id(a) := num(3) ; id(b) := num(4) ; id(c) := id(a) ; .

Program
=> Assignments .
=> Assignment ; Assignments .
=> id(a) := Exp ; Assignments .
=> id(a) := num(3) ; Assignments .
=> id(a) := num(3) ; Assignment ; Assignments .
=> id(a) := num(3) ; id(b) := Exp ; Assignments .
=> id(a) := num(3) ; id(b) := num(4) ; Assignments .
=> id(a) := num(3) ; id(b) := num(4) ; Assignment ; Assignments .
=> id(a) := num(3) ; id(b) := num(4) ; id(c) := Exp ; Assignments .
=> id(a) := num(3) ; id(b) := num(4) ; id(c) := id(a) ; Assignments .
=> id(a) := num(3) ; id(b) := num(4) ; id(c) := id(a) ;   .

An Invalid Assignment

Deriving: id(a) := num(3) ; id(b) := id(c) op(+) num(2) ; .

Program
=> Assignments .
=> Assignment ; Assignments .
=> id(a) := Exp ; Assignments .
=> id(a) := num(3) ; Assignments .
=> id(a) := num(3) ; Assignment ; Assignments .
=> id(a) := num(3) ; id(b) := Exp ; Assignments .
=> id(a) := num(3) ; id(b) := Exp op(+) Exp ; Assignments .
=> id(a) := num(3) ; id(b) := id(c) op(+) Exp ; Assignments .
=> id(a) := num(3) ; id(b) := id(c) op(+) num(2) ; Assignments .
=> id(a) := num(3) ; id(b) := id(c) op(+) num(2) ;   .

Adding Context

As the last example suggests, this grammar makes it possible to derive invalid sentences, ones that use variables before they are set. How can we solve the problem? By allowing a variable to be used only after it has been set.
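
Stated procedurally, the property we want is easy to check. The sketch below is only an illustration of that property, not the grammar-based solution this document develops. The function name unset_uses and the id(name) token spelling (borrowed from the derivations above) are assumptions of the sketch.

```python
import re

ID = re.compile(r"id\((\w+)\)")

def unset_uses(program):
    """Return the names of variables that are used before they are set."""
    set_vars, problems = set(), []
    # Drop the closing period, then split into statements at semicolons.
    body = program.strip().rstrip(".")
    for statement in body.split(";"):
        tokens = statement.split()
        if not tokens:
            continue
        target, rhs = tokens[0], tokens[2:]  # tokens[1] is ":="
        for token in rhs:
            use = ID.fullmatch(token)
            if use and use.group(1) not in set_vars:
                problems.append(use.group(1))
        # The target counts as set only after its right-hand side is checked.
        set_vars.add(ID.fullmatch(target).group(1))
    return problems

print(unset_uses("id(a) := num(3) ; id(b) := id(c) op(+) num(2) ; ."))  # ['c']
```

The check has to remember which variables were set earlier in the program. That memory is the context that the grammar rules below must carry along.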

We'll include the phrase IsSet id1 in the sentential form whenever we know that the identifier id1 has been set.

Fixing Identifier Expressions

To make sure that the only identifiers used in expressions are those that have already been set, we need to replace the rule that reads

Exp --> id

with one that reads

IsSet id1 Exp --> id1

Noting Variables Are Set

We now need a way to generate the phrase IsSet id1. We'll do so whenever we make an assignment to a particular identifier. In particular, we'll replace the rule that reads

Assignment --> id := Exp

with the more complicated pair of rules

Assignment --> Assign id
Assign id1 --> id1 := Exp IsSet id1

Propagating Notes that Variables Are Set

Unfortunately, we have not yet solved the problem. As things currently stand, the note IsSet id1 is only available immediately after the identifier is set. What if we need it later in the program (as we do in the example above) or in the middle of an expression? More importantly, given that we need the note in expressions and it currently appears at the end of an assignment, how do we get it where we need it? The solution is to propagate the note throughout the rest of the program.

The first step is to allow the note to propagate through semicolons.

IsSet id1 ; --> ; IsSet id1

The next step is to allow the note to get to the expression.

IsSet id1 Assignment --> id2 := IsSet id1 Exp

We also want to say that a variable that is currently set can be used in both a future assignment and the current assignment.

IsSet id1 Assignment --> IsSet id1 Assignment IsSet id1

Finally, we need to propagate it into subexpressions.

IsSet id1 Exp --> IsSet id1 Exp op IsSet id1 Exp
IsSet id1 Exp --> ( IsSet id1 Exp )

Deleting Notes that Variables Are Set

We now have a lot of potential notes that a variable has been set. Eventually, we want to get rid of the notes. The easiest way to do so is to simply allow them to be deleted at any time.

IsSet id -->  

The Final Grammar

;;; A program is a sequence of assignment statements each of
;;; which is terminated by a semicolon. The program is
;;; terminated by a period.
P01: Program --> Assignments .
P02: Assignments --> Assignment ; Assignments
;;; The variable set in an assignment is available afterwards.
P03: Assignment --> Assign id1
P04: Assign id1 --> id1 := Exp IsSet id1
;;; Set variables propagate over semicolons.
P05: IsSet id1 ; --> ; IsSet id1
;;; Set variables can be used in the current and future assignments.
P06: IsSet id1 Assignment --> IsSet id1 Assignment IsSet id1
;;; The note for a set variable may move into the right-hand side of an assignment.
P07: IsSet id1 Assignment --> id2 := IsSet id1 Exp
;;; A number can be used for an expression.
P08: Exp --> num
;;; A variable may be used for an expression if the variable has already been set.
P09: IsSet id1 Exp --> id1
;;; An expression can be formed by applying an operator to two expressions.
;;; Any variable that can be used for the expression can be used for the two subexpressions.
P10: IsSet id1 Exp --> IsSet id1 Exp op IsSet id1 Exp
;;; An expression can be formed by parenthesizing an expression.
;;; Any variable that can be used for the expression can be used for the subexpression.
P11: IsSet id1 Exp --> ( IsSet id1 Exp )
;;; Notes about variables being set can be deleted.
P12: IsSet id -->  
;;; The sequence of assignments may be empty.
P13: Assignments -->

A Revised Example

Using the new grammar to derive: id(a) := num(3) ; id(b) := num(4) ; id(c) := id(a) ; .

Program
=> (P01) Assignments .
=> (P02) Assignment ; Assignments .
=> (P03) Assign id(a) ; Assignments .
=> (P04) id(a) := Exp IsSet id(a) ; Assignments .
=> (P08) id(a) := num(3) IsSet id(a) ; Assignments .
=> (P05) id(a) := num(3) ; IsSet id(a) Assignments .
=> (P02) id(a) := num(3) ; IsSet id(a) Assignment ; Assignments .
=> (P06) id(a) := num(3) ; IsSet id(a) Assignment IsSet id(a) ; Assignments .
=> (P12) id(a) := num(3) ;   Assignment IsSet id(a) ; Assignments .
=> (P03) id(a) := num(3) ; Assign id(b) IsSet id(a) ; Assignments .
=> (P04) id(a) := num(3) ; id(b) := Exp IsSet id(b) IsSet id(a) ; Assignments .
=> (P08) id(a) := num(3) ; id(b) := num(4) IsSet id(b) IsSet id(a) ; Assignments .
=> (P05) id(a) := num(3) ; id(b) := num(4) IsSet id(b) ; IsSet id(a) Assignments .
=> (P02) id(a) := num(3) ; id(b) := num(4) IsSet id(b) ; IsSet id(a) Assignment ; Assignments .
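=> (P12) id(a) := num(3) ; id(b) := num(4) ; IsSet id(a) Assignment ; Assignments .
=> (IsSet id1 Assignment --> id2 := IsSet id1 Exp) id(a) := num(3) ; id(b) := num(4) ; id(c) := IsSet id(a) Exp ; Assignments .
=> (P09) id(a) := num(3) ; id(b) := num(4) ; id(c) := id(a) ; Assignments .
=> (Assignments --> ) id(a) := num(3) ; id(b) := num(4) ; id(c) := id(a) ;   .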
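
The opening steps of this derivation can be replayed with a small rewriting sketch in Python. It is an illustration only. The names RULES, matches, and apply are hypothetical, only the productions needed for the replayed steps are included, and treating id and id1 as pattern variables that match tokens of the form id(name) is an assumption about how to read the notation.

```python
import re

# A subset of the final grammar, written as (left side, right side) strings.
RULES = {
    "P01": ("Program", "Assignments ."),
    "P02": ("Assignments", "Assignment ; Assignments"),
    "P03": ("Assignment", "Assign id1"),
    "P04": ("Assign id1", "id1 := Exp IsSet id1"),
    "P05": ("IsSet id1 ;", "; IsSet id1"),
    "P06": ("IsSet id1 Assignment", "IsSet id1 Assignment IsSet id1"),
    "P08": ("Exp", "num"),
    "P12": ("IsSet id", ""),
}

def matches(pattern, token, env):
    """Match one rule symbol against one token, binding id variables in env."""
    if re.fullmatch(r"id\d*", pattern):
        if not token.startswith("id("):
            return False
        # The same variable must match the same identifier everywhere.
        return env.setdefault(pattern, token) == token
    return pattern == token

def apply(form, name, **given):
    """Apply rule `name` at its leftmost match; `given` supplies values
    for symbols the left side does not determine (a fresh id1, or num)."""
    lhs, rhs = (side.split() for side in RULES[name])
    for i in range(len(form) - len(lhs) + 1):
        env = dict(given)
        window = form[i:i + len(lhs)]
        if all(matches(p, t, env) for p, t in zip(lhs, window)):
            return form[:i] + [env.get(s, s) for s in rhs] + form[i + len(lhs):]
    raise ValueError(f"{name} does not apply to: {' '.join(form)}")

form = ["Program"]
for name, given in [
    ("P01", {}), ("P02", {}), ("P03", {"id1": "id(a)"}),
    ("P04", {}), ("P08", {"num": "num(3)"}), ("P05", {}),
]:
    form = apply(form, name, **given)
print(" ".join(form))  # id(a) := num(3) ; IsSet id(a) Assignments .
```

Unlike a context-free step, each application here has to match a window of several adjacent symbols, and the binding for id1 has to agree across the whole window. That agreement is what carries the "already set" information from one part of the program to another.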

An Alternate Grammar

We used a special kind of rule in this grammar, one which duplicates a terminal on the right-hand side and expects the duplicated terminals to have the same value. For example,

Assignment --> Assign id
Assign id1 --> id1 := Exp IsSet id1

Some might argue that such rules are not valid in the standard context-sensitive-grammar notation, since each occurrence of a symbol such as id on the right-hand side is supposed to stand for any such token, not for one particular identifier.

Can we avoid this problem? The only mechanism I've come up with is to use a separate token for each identifier. For example, if the only legal identifiers are a, b, and c, I might use the following.

Assignment --> a := Exp SetA
Assignment --> b := Exp SetB
Assignment --> c := Exp SetC

We must now update the rules for propagating notes and the rules for using notes.

SetA Exp --> a
SetB Exp --> b
SetC Exp --> c

Similarly, we need separate rules to propagate each of the notes.

SetA ; --> ; SetA
SetB ; --> ; SetB
SetC ; --> ; SetC
...

And we need separate rules to eliminate each note.

SetA -->  
SetB -->  
SetC -->  
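
Since each identifier needs its own copy of every rule, the rules can be generated rather than written by hand. Here is a rough Python sketch under stated assumptions: rules_for is a hypothetical helper, it produces only the four rule shapes written out above, and it does not attempt the further propagation rules that the "..." leaves out.

```python
def rules_for(identifiers):
    """Generate the per-identifier rules for a finite set of identifiers."""
    rules = []
    for name in identifiers:
        note = "Set" + name.upper()  # e.g. the note SetA for identifier a
        rules.append(f"Assignment --> {name} := Exp {note}")  # set the variable
        rules.append(f"{note} Exp --> {name}")                # use the variable
        rules.append(f"{note} ; --> ; {note}")                # pass a semicolon
        rules.append(f"{note} -->")                           # delete the note
    return rules

for rule in rules_for(["a", "b", "c"]):
    print(rule)
```

The number of rules grows with the number of identifiers, so this approach only works when the set of legal identifiers is finite and fixed in advance.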

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Mon Sep 23 10:55:25 2002.
The source to the document was last modified on Mon Sep 23 10:55:15 2002.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2002F/Readings/set-before-use.html.

Glimmer Labs: The Grinnell Laboratory for Interactive Multimedia Experimentation & Research
glimmer@grinnell.edu