Class 2: Introduction, continued
- A few pages from Appel on intermediate representation trees.
- Hopefully, you've all taken the introductory survey. I have
responses to your questions.
- I've forgotten to mention that we decided to be fairly casual with
the prereqs for this class. At a larger school, it is likely that
an architecture course and a software design course would be
prerequisites for this course. It is also possible that a languages
course (like 302), an algorithms course (like 301), and an automata
theory course (like 341) would be prerequisites. You'll need to work
with each other to smooth out the problems with the topics other
folks don't understand.
- On Wednesday, we'll have a debate on C vs. Java as implementation
language.
- Arguing for C: Sarah S., Hilary, Rachel (4 min per opening argument).
- Arguing for Java: Raphen, David, Omar (4 min per opening argument).
- On Friday, we'll have a debate on Tiger vs. "Our own language"
as implemented language.
- Arguing for Tiger: Vivek, Jamal, Kevin, Scott. (3 min per opening argument).
- Arguing for "Our own": Ryan, Jack, Sarah, Yu (3 min per opening argument).
- We'll do the initial votes on those debates today.
- Note that you should feel free to discuss possible debate strategies
- Here is the way I traditionally think of a compiler. Note that
our book uses a slightly different one.
- A program begins as a sequence of bytes (characters, whatever
you want to call them).
- In some compilers, compilation begins with preprocessing,
which typically involves expanding macros. Some consider this
process external to the compilation process.
- The real fun begins with lexical analysis. In this phase,
the sequence of bytes is transformed into a series of tokens:
the basic syntactic values of the language. During lexical analysis,
comments, whitespace, and other non-essential parts are stripped.
- The syntactic analyzer or parser ensures that
the program is syntactically correct. You can also think of this
phase as building a parse tree to represent the syntactic
structure of the program.
- The parse tree is then refined into an abstract syntax tree (AST),
in which we throw away the useless syntactic entities (such as parentheses).
- The abstract syntax tree undergoes semantic analysis to ensure
that it is valid. Most typically, this phase does type checking and
ensures that all variables are declared.
- The AST is translated into an intermediate code, most
frequently an intermediate representation tree (IRT). The
IRT is effectively a generic, tree-structured assembly code.
- This translation may involve some sort of frame layout to
determine how information will be arranged before, during, and after
function calls (more or less, where in memory does everything go).
- The IRT may be optimized to produce better code (this is
really improvement, rather than optimization).
- Instructions in the target architecture are selected
for appropriate parts of the IRT.
- This code is analyzed. Typically, the flow of control and
the liveness of variables are analyzed.
- The code is again optimized.
- Details are filled in (e.g., registers are assigned to particular
- The final code is emitted.
- If you do not understand the syntax vs. semantics distinction
made earlier, let's consider it in terms of English.
- Syntax gives the structure of a sentence. Semantics
assigns a meaning.
- In English, you might say that one form of English sentence
contains a noun-phrase, a verb, and a noun-phrase.
- Given that weak definition, the following might all be sentences.
- John threw the ball.
- The books are pretty colors.
- John are a book.
- The colors threw John.
- Some of these are simply meaningless (e.g., colors traditionally do not
throw things).
- At least one is invalid because of a "number mismatch" (e.g., singular
noun with plural verb).
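The distinction above can be sketched in a few lines of Python. This is only an illustration: the tiny lexicon, the noun-phrase rule (optional determiner, any adjectives, then a noun), and the function names are all assumptions made for the example, not anything from our book.

```python
# Hypothetical toy lexicon: word -> (category, number).
# "any" means the word agrees with both singular and plural.
LEXICON = {
    "John": ("noun", "singular"), "ball": ("noun", "singular"),
    "book": ("noun", "singular"), "books": ("noun", "plural"),
    "colors": ("noun", "plural"), "threw": ("verb", "any"),
    "are": ("verb", "plural"), "the": ("det", "any"),
    "a": ("det", "any"), "pretty": ("adj", "any"),
}

SENTENCES = [
    "John threw the ball.",
    "The books are pretty colors.",
    "John are a book.",
    "The colors threw John.",
]

def category(word):
    """Look a word up, falling back to lower case (so "The" finds "the")."""
    key = word if word in LEXICON else word.lower()
    return LEXICON[key]

def is_noun_phrase(words):
    """Assumed rule: optional determiner, any adjectives, then one noun."""
    cats = [category(w)[0] for w in words]
    if cats and cats[0] == "det":
        cats = cats[1:]
    while cats and cats[0] == "adj":
        cats = cats[1:]
    return cats == ["noun"]

def parse(sentence):
    """Syntax: return (subject, verb, object), or None if the sentence
    does not have the shape noun-phrase verb noun-phrase."""
    words = sentence.rstrip(".").split()
    for i, word in enumerate(words):
        if category(word)[0] == "verb":
            subject, obj = words[:i], words[i + 1:]
            if is_noun_phrase(subject) and is_noun_phrase(obj):
                return subject, word, obj
            return None
    return None

def agrees(subject, verb):
    """A first semantic-style check: subject noun and verb agree in number."""
    noun_number = category(subject[-1])[1]
    verb_number = category(verb)[1]
    return verb_number == "any" or verb_number == noun_number

for s in SENTENCES:
    subject, verb, obj = parse(s)
    print(s, "| syntax ok | number agreement:", agrees(subject, verb))
```

All four sentences pass the syntactic check. Only "John are a book." fails the agreement check. "The colors threw John." passes both, which shows that catching meaninglessness takes a deeper analysis than this sketch attempts.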
- Let us consider the various phases with an extended example (one which
we may continue in the next class session).
- We will begin with a simple program in a fictitious and unnamed language.
1: program whatever;
2: begin
3: int j,k; -- Not sure yet
4: int n = 10; -- The number of elements
5: array(1..10) of real A; -- A is an array
6: A[1] = 1; -- Set up the first element of the array
7: int i; -- A counter variable
8: for i = 2 to n do
9: A[i] = i + A[i-1];
10: od
11: end.
- When we tokenize this, we might get a sequence like
program id(whatever) semi begin int id(j) comma id(k) semi
int id(n) assign integer(10) semi
array lparen integer(1) ellipses integer(10) rparen of real id(A) semi
id(A) lbracket integer(1) rbracket assign integer(1) semi int id(i) semi
for id(i) assign integer(2) to id(n) do id(A) lbracket id(i) rbracket
assign id(i) plus id(A) lbracket id(i) minus integer(1) rbracket semi
od end period
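Here is a minimal sketch of a tokenizer that produces names like those above. The keyword set, the regular expressions, and the token names for punctuation are my assumptions about the fictitious language, inferred from the sample program and token list; a real lexer would also record line numbers.

```python
import re

# Assumed keywords of the fictitious language.
KEYWORDS = {"program", "begin", "end", "int", "real", "array",
            "of", "for", "to", "do", "od"}

# Order matters: comments before minus, ".." before ".".
TOKEN_SPEC = [
    ("comment", r"--[^\n]*"),
    ("space", r"\s+"),
    ("ellipses", r"\.\."),
    ("integer", r"\d+"),
    ("word", r"[A-Za-z_]\w*"),
    ("semi", r";"), ("comma", r","), ("assign", r"="),
    ("lparen", r"\("), ("rparen", r"\)"),
    ("lbracket", r"\["), ("rbracket", r"\]"),
    ("plus", r"\+"), ("minus", r"-"), ("period", r"\."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Turn a string of characters into a list of token names."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("comment", "space"):
            continue                      # stripped during lexical analysis
        if kind == "word":
            tokens.append(lexeme if lexeme in KEYWORDS else f"id({lexeme})")
        elif kind == "integer":
            tokens.append(f"integer({lexeme})")
        else:
            tokens.append(kind)
    return tokens

print(" ".join(tokenize("A[i] = i + A[i-1]; -- the loop body")))
```

Note that the comment disappears and that keywords are distinguished from identifiers only after the whole word has been read.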
- Turning this into an approximate tree, we get something like
the following. Note that the words in all caps are
nonterminals.
- Considering just the assignment in the loop, we might see
                /    |    \
          LVALUE  assign   EXP
         / | | \          / | \
    id(A) [ EXP ]      EXP  +  EXP
            |           |       |
          id(i)       id(i)   ARREF
                             / | | \
                        id(A) [ EXP ]
                                / | \
                             EXP  -  EXP
- Of course, there are lots of extraneous pieces here. Hence,
   /  \        /  \
id(A) id(i) id(i) ARREF
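One way to picture the step from parse tree to abstract syntax tree is the sketch below. The nested-tuple representation, the root label STATEMENT, and the leaves id(i) and integer(1) under the innermost expressions are my assumptions (the last two are read off the source line A[i] = i + A[i-1]). It is a sketch of the idea, not how a real parser generator builds ASTs.

```python
# A parse-tree node is (LABEL, child, child, ...); a token is a plain string.
# STATEMENT is a hypothetical name for the root.
PARSE_TREE = (
    "STATEMENT",
    ("LVALUE", "id(A)", "[", ("EXP", "id(i)"), "]"),
    "assign",
    ("EXP",
     ("EXP", "id(i)"),
     "+",
     ("EXP", ("ARREF", "id(A)", "[",
              ("EXP", ("EXP", "id(i)"), "-", ("EXP", "integer(1)")),
              "]"))),
)

def to_ast(node):
    """Throw away brackets and single-child chains, keeping only structure."""
    if isinstance(node, str):            # a token is already a leaf
        return node
    _label, *children = node
    kids = [to_ast(child) for child in children if child not in ("[", "]")]
    if len(kids) == 1:                   # a chain such as EXP -> id(i)
        return kids[0]
    if len(kids) == 2:                   # array name and index expression
        return ("ARREF", kids[0], kids[1])
    if len(kids) == 3:                   # operand, operator, operand
        left, op, right = kids
        return (op, left, right)
    raise ValueError(f"unexpected node shape: {node!r}")

AST = to_ast(PARSE_TREE)
print(AST)
```

The result has the operator at the root of each subtree and no brackets at all; the structure of the tuples carries the information that the punctuation carried in the text.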
- During semantic analysis, we ensure that all the variables used
in the program are declared. The following is not how the analysis
is done, but illustrates some of the issues.
A is used on line 6. It is declared on line 5.
i is used on line 8. It is declared on line 7.
n is used on line 8. It is declared on line 4.
A is used on line 9 (twice). It is declared on line 5.
i is used on line 9. It is declared on line 7.
- The value of n is used on line 8. Is it assigned earlier?
Yes. It is assigned on line 4.
- The value of i is used on line 9. Is it assigned earlier?
Yes. It is assigned on line 8.
- The value of A[i-1] is used on line 9. Is it assigned
earlier? It's probably beyond our capabilities to check right now.
- On line 6, 1 is used as an index into array A.
Is it a valid index? It is an integer, and A is indexed
by integers. It has the value 1, and A permits indices
between 1 and 10.
- On line 9, i is used as an index into array A.
Is it a valid index? It is an integer variable. It is probably too
difficult to ensure that its value is acceptable.
- On line 9, i-1 is used as an index into array A.
Is it a valid index? i is an integer. 1 is an integer. Hence,
i-1 is an integer, and valid in A.
- On line 9, the expression i + A[i-1] is assigned to
A[i]. The right-hand side must have type real, since
A contains real values. i is an integer,
being added to a real. Can we do that?
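The checks above might be sketched as follows. The symbol table is filled in by hand from the example program, the expression encoding is made up for the sketch, and the allow_widening flag is my way of leaving the "Can we do that?" question open: whether an integer may be added to a real is a language design decision, not something the sketch decides for us.

```python
class SemanticError(Exception):
    pass

# name -> (type, line of declaration), copied by hand from the example.
SYMBOLS = {
    "j": ("int", 3), "k": ("int", 3), "n": ("int", 4),
    "A": (("array", "real"), 5), "i": ("int", 7),
}

def check_declared(name, use_line):
    """Return the declaring line, or complain if there is none before the use."""
    if name not in SYMBOLS:
        raise SemanticError(f"{name} is not declared")
    decl_line = SYMBOLS[name][1]
    if decl_line > use_line:
        raise SemanticError(f"{name} is used on line {use_line} "
                            f"but declared on line {decl_line}")
    return decl_line

def type_of(expr, line, allow_widening=True):
    """Compute the type of an expression written as nested tuples:
    ("int", 1), ("id", "i"), ("index", "A", expr), ("+", e1, e2), ("-", e1, e2)."""
    kind = expr[0]
    if kind == "int":
        return "int"
    if kind == "id":
        check_declared(expr[1], line)
        return SYMBOLS[expr[1]][0]
    if kind == "index":
        _, array, index = expr
        check_declared(array, line)
        array_type = SYMBOLS[array][0]
        if not isinstance(array_type, tuple):
            raise SemanticError(f"{array} is not an array")
        if type_of(index, line, allow_widening) != "int":
            raise SemanticError("array index must be an int")
        return array_type[1]             # the element type
    if kind in ("+", "-"):
        left = type_of(expr[1], line, allow_widening)
        right = type_of(expr[2], line, allow_widening)
        if left == right:
            return left
        if allow_widening and {left, right} == {"int", "real"}:
            return "real"                # assumed rule: int widens to real
        raise SemanticError(f"cannot combine {left} and {right}")
    raise SemanticError(f"unknown expression {expr!r}")

# Right-hand side of line 9: i + A[i-1]
RHS = ("+", ("id", "i"), ("index", "A", ("-", ("id", "i"), ("int", 1))))
print(type_of(RHS, 9))
```

As in the notes, this checks types and declarations but says nothing about whether the value of i-1 stays between 1 and 10, or whether A[i-1] has been assigned.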
- We can now consider the frame layout. In particular, how
will we organize memory?
- We are now ready to consider an intermediate code. Our book presents
one such code on pp. 157-8 (red) or 151-3 (green).
- Our program might translate into
MOVE(MEM(BINOP(+, TEMP(0), 8), 4), CONST(10)),
MOVE(MEM(BINOP(+, TEMP(0), 12), 8), CONST(1)),
MOVE(MEM(BINOP(+, TEMP(0), 92), 4), CONST(2)),
MEM(BINOP(+, TEMP(0), 92), 4), MEM(BINOP(+, TEMP(0), 8), 4),
- Note that I have perhaps made this more sequential than is necessary
(which may then be handled in a subsequent step).
- I've also left out the translation of the assignment statement.
I'll write that in a shorthand, using a lot of temporary variables.
The explanation will come in class, as we redevelop the code from
t1 = i-1;
t2 = t1-1;
t3 = t2*8;
t4 = base+8+t3;
t5 = i-1;
t6 = t5*8;
t7 = base+8+t6;
mem(t7) = i + mem(t4);
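To check the address arithmetic, here is a small simulation. It assumes what the shorthand suggests: each real occupies 8 bytes, the first element (index 1) sits at base+8, and so A[k] lives at base + 8 + (k-1)*8. The base address 1000 and the dictionary standing in for memory are made up for the sketch.

```python
ELEMENT_SIZE = 8   # bytes per real, as in the shorthand
FIRST_OFFSET = 8   # the "+8" before the first element (assumed meaning)

def element_address(base, index):
    """Address of A[index] for an array whose first index is 1."""
    return base + FIRST_OFFSET + (index - 1) * ELEMENT_SIZE

def run_loop(base=1000, n=10):
    """Simulate lines 6 through 9, using a dict as a stand-in for memory."""
    mem = {element_address(base, 1): 1.0}       # set up the first element
    for i in range(2, n + 1):                   # for i = 2 to n do
        t4 = element_address(base, i - 1)       # address of A[i-1]
        t7 = element_address(base, i)           # address of A[i]
        mem[t7] = i + mem[t4]                   # A[i] = i + A[i-1]
    return mem

memory = run_loop()
for address in sorted(memory):
    print(address, memory[address])
```

Each element ends up exactly 8 bytes after the previous one, and every read of A[i-1] hits an address that the previous iteration (or the setup) already wrote.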
- The next stages might involve analysis and optimization. There are a number
of kinds of optimization we might do.
- We might observe that we're repeating work (lots of
- We might observe that we're repeating the values stored in temporaries:
t1 and t5 both hold i-1.
- We might observe that some variables are never used. For example,
we never use
- We might observe that some instructions can be simplified. For
example, instead of multiplying by eight, we might shift left by
three (or is that not reasonable?).
- We might instead observe that some instructions can be simplified
by better analysis of surrounding code. For example, since i
increases by one each time through the loop, each array offset increases
by 8 each time through the loop.
- We might observe that since there is no output, the program is
essentially the null program.
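Two of these improvements can be sketched on the temporaries for the loop body. The tuple encoding of instructions, the name A0 (standing for base+8), and the function names are assumptions made for the sketch. The elimination pass is only valid here because the code is straight-line and each name is assigned once.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "<<": operator.lshift}

# (destination, operator, left, right); A0 is a made-up name for base+8.
CODE = [
    ("t1", "-", "i", "1"),
    ("t2", "-", "t1", "1"),
    ("t3", "*", "t2", "8"),
    ("t4", "+", "A0", "t3"),
    ("t5", "-", "i", "1"),
    ("t6", "*", "t5", "8"),
    ("t7", "+", "A0", "t6"),
]

def eliminate_common_subexpressions(code):
    """Drop an instruction if an earlier one already computed the same value."""
    seen = {}     # (op, left, right) -> temporary that holds the value
    alias = {}    # dropped temporary -> temporary to use instead
    kept = []
    for dest, op, left, right in code:
        left, right = alias.get(left, left), alias.get(right, right)
        key = (op, left, right)
        if key in seen:
            alias[dest] = seen[key]
        else:
            seen[key] = dest
            kept.append((dest, op, left, right))
    return kept, alias

def reduce_strength(code):
    """Replace multiplication by 8 with a left shift by 3."""
    return [(dest, "<<", left, "3") if op == "*" and right == "8"
            else (dest, op, left, right)
            for dest, op, left, right in code]

def evaluate(code, env):
    """Run the instructions on integers, to check nothing changed."""
    env = dict(env)
    for dest, op, left, right in code:
        a = env[left] if left in env else int(left)
        b = env[right] if right in env else int(right)
        env[dest] = OPS[op](a, b)
    return env

improved, alias = eliminate_common_subexpressions(CODE)
improved = reduce_strength(improved)
for instruction in improved:
    print(instruction)
```

The shift is safe in this sketch because the values being multiplied are nonnegative integers; whether it is actually faster depends on the target machine, which is part of why the question above is a fair one.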
Disclaimer: Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.
Source text last modified Fri Sep 11 12:28:03 1998.
This page generated on Wed Sep 16 11:21:14 1998 by SiteWeaver.