To what audience or audiences should program source code be directed?
What rhetorical problems, if any, does writing source code entail?
In what ways does collaborative software development differ from
independent, single-programmer software development?
The preliminary syllabus for this course identified mastery of the
following software tools as particularly important for software designers
and developers: a compiler, a programmable text editor, a debugger, a
programmable build tool, a version control system, a documentation
generator, and a profiler. Would your list of essential tools include all
of these? Would it include tools not listed here?
Wednesday, January 25
How does one create an “opaque” data type in C? In other
words, how can one establish and enforce an abstraction boundary between
code that implements that data type and code that uses it?
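One common way to do this is to declare an incomplete struct type in the header and complete it only in the implementation file. The sketch below uses a hypothetical counter type, and all of its names are invented for illustration. Client code can hold pointers to a counter, but it cannot see or modify the fields. The two files appear in a single listing so that it compiles as one unit. In a real project they would be separate files, and clients would include only the header.

```c
/* ---- counter.h: the public interface -------------------------------- */
/* The header declares the struct but never defines it, so client code can
   only hold pointers to a counter and must go through these functions. */
typedef struct counter counter;

counter *counter_create(void);           /* returns NULL if allocation fails */
void     counter_destroy(counter *c);
void     counter_increment(counter *c);
long     counter_value(const counter *c);

/* ---- counter.c: the implementation ---------------------------------- */
/* Only this file sees the complete type, so only it can touch the fields. */
#include <stdlib.h>

struct counter {
    long count;
};

counter *counter_create(void)
{
    counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->count = 0;
    return c;
}

void counter_destroy(counter *c)
{
    free(c);
}

void counter_increment(counter *c)
{
    c->count++;
}

long counter_value(const counter *c)
{
    return c->count;
}
```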
The software “life cycle” is sometimes thought of as
consisting of distinct phases: specification, design, coding, testing,
debugging, packaging, maintenance, extension, and decommissioning. How are
these phases related? Which of them logically must precede which
others? Which of them may need to be iterated, in order to “feed
back” knowledge acquired during one phase to improve the quality of another
phase by passing through it again?
What are the advantages of adopting a single, comprehensive
methodology for software development? What are the limitations of that
policy?
Friday, January 27
What's the biggest problem with the waterfall model of
software development? To what extent can a judicious selection of
development techniques overcome or neutralize this problem?
Some programming methodologies encourage programmers to refactor
their code early and often, as soon as they notice any difficulty that
refactoring could address. Others recommend spending time and effort up
front, on specification and design, so that subsequent refactoring will
seldom be necessary (though, when it does have to be done, it consumes more
of the team's effort). What kinds of programming projects and work environments
are particularly conducive to one or the other of these extreme attitudes?
In pair programming, two programmers work at each station, but only
one manages the keyboard. What would it be most useful for the other one
to do?
Monday, January 30
What is the main thesis of Brian Kernighan's account (in chapter 1 of
Beautiful code) of Rob Pike's regular-expression matcher? Can
you think of any way in which Pike's code could be further improved?
Kernighan suggests several possible extensions to Pike's code that
the reader might undertake. Pick one and implement it. Can you preserve
the concision and elegance for which Kernighan praises Pike's code while
adding new features to it?
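The sketch below implements one extension, the + (one or more) operator. The base matcher is reproduced from memory of Kernighan's chapter with const qualifiers added, so small details may differ from the printed version. The matchplus helper is new and hypothetical. It stays short because c+ is simply one c followed by c*, so matchplus can hand the rest of the work to matchstar.

```c
static int matchhere(const char *regexp, const char *text);
static int matchstar(int c, const char *regexp, const char *text);
static int matchplus(int c, const char *regexp, const char *text);

/* match: search for regexp anywhere in text; return 1 if found. */
int match(const char *regexp, const char *text)
{
    if (regexp[0] == '^')
        return matchhere(regexp + 1, text);
    do {    /* must look even if the text is empty */
        if (matchhere(regexp, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* matchhere: search for regexp at the beginning of text. */
static int matchhere(const char *regexp, const char *text)
{
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchstar(regexp[0], regexp + 2, text);
    if (regexp[1] == '+')                        /* new: one or more */
        return matchplus(regexp[0], regexp + 2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return matchhere(regexp + 1, text + 1);
    return 0;
}

/* matchstar: search for c*regexp at the beginning of text. */
static int matchstar(int c, const char *regexp, const char *text)
{
    do {    /* a * matches zero or more instances */
        if (matchhere(regexp, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* matchplus: c+regexp means one c followed by c*regexp. */
static int matchplus(int c, const char *regexp, const char *text)
{
    if (*text != '\0' && (*text == c || c == '.'))
        return matchstar(c, regexp, text + 1);
    return 0;
}
```

The extension has one cost. Because the matcher has no escape mechanism, a + that follows another character is now always read as an operator.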
One common variation of quicksort finds the kth smallest
element in an array of n values, given k, n, and the
array, by choosing a random pivot, partitioning the array around it,
calculating which partition contains the element sought, and recursing on
just that partition. Can you adapt Bentley's methods to calculate the
average number of comparisons that this algorithm makes, when an array of
size n is filled in randomly and k is chosen randomly from the
range from 1 to n?
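Before working through the analysis, it may help to have an instrumented implementation to check a derivation against. The sketch below makes several assumptions of its own:

- The name quickselect is hypothetical.
- It uses Lomuto partitioning with a pivot chosen by rand().
- It loops instead of recursing. The loop plays the role of the tail call on the chosen partition.
- It counts one comparison for each element compared against the pivot.

```c
#include <stddef.h>
#include <stdlib.h>

/* Exchanges a[i] and a[j]. */
static void swap(int *a, size_t i, size_t j)
{
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
}

/* Returns the kth smallest (1 <= k <= n) of a[0..n-1], rearranging a.
   Adds the number of element-versus-pivot comparisons to *comparisons. */
int quickselect(int *a, size_t n, size_t k, unsigned long *comparisons)
{
    size_t lo = 0, hi = n - 1;        /* current partition is a[lo..hi] */
    size_t target = k - 1;            /* zero-based position sought */

    for (;;) {
        if (lo == hi)
            return a[lo];
        /* Random pivot, moved to the end for Lomuto partitioning. */
        swap(a, lo + (size_t) rand() % (hi - lo + 1), hi);
        int pivot = a[hi];
        size_t store = lo;
        for (size_t i = lo; i < hi; i++) {
            ++*comparisons;
            if (a[i] < pivot)
                swap(a, i, store++);
        }
        swap(a, store, hi);           /* pivot reaches its final position */
        if (target == store)
            return a[store];
        if (target < store)
            hi = store - 1;           /* continue in the left part */
        else
            lo = store + 1;           /* continue in the right part */
    }
}
```

To get empirical figures that an analytical formula should match, average the counter over many random arrays and random values of k, for several values of n.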
Wednesday, February 1
How is McConnell using the term “prerequisites” in chapter 3
of Code complete? What are the prerequisites of a
software-development project?
Suppose that one of the requirements for a software system is
essentially trivial to fulfill. Under what conditions, if any, is it worth
while to specify that requirement explicitly? (The underlying question
here is: What are requirements for?)
Suppose that a requirement for a software system is worded in such a
way that it is impossible to tell whether or not that requirement has been
met. (Perhaps the requirement is stated vaguely, or ambiguously, or in
subjective, non-quantitative terms.) Under what conditions, if any, would
it be worth while to include that requirement in the specification?
Some of the items in McConnell's checklists presuppose that his
readers are working within the object-oriented programming paradigm and
using an object-oriented language, which we are not. How does this affect
the process of drawing up prerequisites?
Friday, February 3
How did you decide whether to implement queues using arrays or linked
lists? What are the consequences of each alternative?
Explain why it would be a bad idea to include a function called main in your queues.c file.
Study the long list of highly judgemental comments on
programming assignment #0. Find one that seems, on
due consideration, to be simplistic, hyperbolic, or wrong-headed. Write a
revised comment that offers a more judicious account of the issue raised.
Monday, February 6
What's the problem with the current mechanism for authenticating
Web sites, as part of the preparation for establishing secure connections
for transmitting sensitive data (such as credit-card numbers)?
How does the Internet Engineering Task Force design and promulgate
standards for Internet protocols such as iCalendar?
One distinctive feature of the Cryptonite mail user agent is that it is designed to empower a
class of users who are in some other respects at a technological
disadvantage: individuals who need secure, uncensored e-mail exchange and
cannot rely on an Internet service provider to supply it (e.g., “activists, NGOs, and reporters working in repressive countries,” “whistle-blowers, witnesses, and victims of domestic abuse”). In what
sense, if any, does this aspect of the design make the software “beautiful code”? What's the relationship, if any, between the kind of
software-development practices that the essays in Beautiful
code promote and the ideals of freedom and social justice that Gulhati
wishes to advance?
Wednesday, February 8
What is McConnell's “General Principle of Software Quality”?
How is it related to Eric Raymond's maxim (which he named “Linus's
Law,” after Linus Torvalds) that “given enough eyeballs, all bugs
are shallow”?
McConnell identifies the management of complexity as “software's primary technical imperative” (Code complete,
p. 77). What does he mean by this? Aren't correctness and efficient use
of resources at least as important as managing complexity?
Define the term loose coupling. Is the use of global
variables an indicator of loose coupling, tight coupling, or neither? Give
an example, or another example, of a programming construction that is an
indicator of loose coupling.
Friday, February 10
McConnell's examples of semantic coupling (Code complete, p. 102) arise only within the object-oriented model, but
the hazards are even greater when you're working in a low-level procedural
language like C. Give some examples of how semantic coupling might arise
in C programs, or restate or generalize McConnell's examples.
In what language is the “excerpt from the read system
call specification” that Spinellis reproduces on p. 288 of Beautiful code written? How is it related to the C code that he describes
and reproduces earlier in his essay? How and why would one use the awk utility to write C code, as he describes?
Define the term multiplexing and describe how the design
pattern that it denotes figures in the “layered indirection” design
that Spinellis describes.
Monday, February 13
In addition to awk, Spinellis mentions lex (and flex), yacc (and bison), and cfront as
domain-specific languages commonly used in software development. Give at
least one additional example.
What are some indications against introducing an indirection
layer into a software system? In other words, why would one ever decide
not to solve a problem in this way?
In Spinellis's example, the arguments to the VOP_READ function
that constitutes the “kernel-side interface” of the read
system call (the vnode identifying the file to be read from, the buffer in
which the value being read is to be stored, and so on) are packaged into a
struct vop_read_args value before being passed to the VOP_READ_APV function that constitutes the next level down. Eventually,
though, they just have to be broken out of that structure again by one of
the functions at the lowest level of Figure 17-1 (the cd9660_read
function, perhaps). So what's the point of building the structure in the
first place?
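Here is a drastically simplified, hypothetical sketch of the pattern. None of these names are the real BSD kernel identifiers. Every operation has the same signature: it takes one pointer to its argument structure. As a result, a generic layer such as the dispatcher here can forward, count, or redirect any operation without knowing what arguments it carries. Only the bottom layer unpacks the fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical argument package for a read operation. */
struct read_args {
    const char *source;   /* stands in for the vnode being read */
    char       *buf;      /* destination buffer */
    size_t      len;      /* bytes requested */
    size_t      nread;    /* out: bytes actually copied */
};

/* Every operation has the same type, whatever arguments it needs. */
typedef int (*vop_fn)(void *args);

enum { OP_READ, OP_COUNT };

/* Lowest layer: one file-system type's read unpacks the structure. */
static int memfs_read(void *v)
{
    struct read_args *a = v;
    size_t avail = strlen(a->source);
    a->nread = a->len < avail ? a->len : avail;
    memcpy(a->buf, a->source, a->nread);
    return 0;
}

/* Operation table for a hypothetical memory-backed file system. */
vop_fn const memfs_ops[OP_COUNT] = { memfs_read };

/* Middle layer: generic dispatch. Because it only forwards a pointer, it can
   count, trace, or redirect any operation without knowing its arguments. */
unsigned long calls_dispatched;

static int vop_dispatch(vop_fn const *ops, int op, void *args)
{
    calls_dispatched++;
    return ops[op](args);
}

/* Top layer: a typed wrapper packs individual arguments into the structure. */
int vop_read(vop_fn const *ops, const char *source, char *buf, size_t len,
             size_t *nread)
{
    struct read_args a = { source, buf, len, 0 };
    int err = vop_dispatch(ops, OP_READ, &a);
    *nread = a.nread;
    return err;
}
```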
Working within the constraints of the C programming language, how
would you address the “multiple instances of an abstract data type”
problem that McConnell discusses on pages 131 through 133 of Code complete?
How would you adapt the rules that McConnell presents in the section
“Constructors” (pages 151 and 152) to C, which lacks any special
syntax for constructor and destructor methods?
What support, if any, does C provide for Booleans? for enumerated
types?
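Here is a minimal illustration, assuming C99 or later. The weekday type and the is_weekend function are hypothetical. The <stdbool.h> header supplies bool, true, and false, while enum declares named integer constants. Note that C does not prevent an enum variable from holding values outside its declared list.

```c
#include <stdbool.h>   /* C99: bool, true, false */

/* An enumerated type: the constants are ordinary int values 0, 1, 2, ... */
enum weekday { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY };

/* Returns true for Saturday and Sunday. */
bool is_weekend(enum weekday d)
{
    return d == SATURDAY || d == SUNDAY;
}
```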
Friday, February 17
What are the advantages, if any, of using Unicode representations of
characters in C programs? What kinds of applications would benefit most
from the systematic use of Unicode?
How would one represent a string comprising the single Unicode
character ∃, the existential-quantifier symbol, in the UTF-8
encoding?
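The existential quantifier is code point U+2203, which falls in the three-byte range of UTF-8 (U+0800 through U+FFFF). Distributing its 16 bits into the pattern 1110xxxx 10xxxxxx 10xxxxxx gives the bytes E2 88 83. In C source, the string could therefore be written "\xE2\x88\x83". The encoder below is a sketch with a hypothetical name. It performs the same bit distribution for any code point.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes code point cp as UTF-8 into out (room for 4 bytes) and returns the
   number of bytes written, or 0 if cp is not a valid Unicode scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates cannot be encoded */
        return 0;
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {                   /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode codespace */
}
```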
Suggest an algorithm for computing the number of Unicode characters
in a given file that uses the UTF-8 encoding throughout.
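The sketch below shows one simple approach. It assumes that the file contains only valid UTF-8, and its function name is hypothetical. Every character has exactly one leading byte, and all other bytes have the form 10xxxxxx. Counting the bytes that are not continuation bytes therefore counts the characters.

```c
#include <stddef.h>
#include <stdio.h>

/* Counts the characters in a UTF-8 stream by counting every byte that is
   not a continuation byte (10xxxxxx). Malformed input is not detected. */
size_t utf8_count_chars(FILE *in)
{
    size_t count = 0;
    int b;
    while ((b = getc(in)) != EOF)
        if ((b & 0xC0) != 0x80)
            count++;
    return count;
}
```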
Monday, February 20
(class session terminated prematurely)
Wednesday, February 22
Do the lab on git and submit a list of
the commands you gave (in the terminal window) to complete the steps of the
lab. (The history command gives you a list of recent commands, from
which you may be able to extract the ones you want if you neglect to record
them as you give them. The script command can be used to make a
transcription of a terminal session, if you remember to turn it on at the
beginning of the session and off (by pressing control-D) at the end.)
What is “SHA-1”? Why does git use it to construct
names for heads?
What is a “fast forward merge”?
Friday, February 24
Describe how to turn assertions on and off in C programs (that is,
how to activate or deactivate the assert macro at compilation time, without
changing the source code).
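The standard mechanism is the NDEBUG macro. When NDEBUG is defined at the point where <assert.h> is included, assert expands to nothing. With GCC, compiling with the -DNDEBUG option (for example, gcc -DNDEBUG -c queues.c) disables assertions, and omitting the flag leaves them active. Here is a small illustration with a hypothetical function:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the mean of the first n elements of values. The precondition is
   checked only when NDEBUG is not defined; compiling with -DNDEBUG removes
   the check without any change to this source text. */
double average(const double *values, size_t n)
{
    assert(values != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum / (double) n;
}
```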
Give an example of “offensive programming” in the
specification of the queue library (in programming assignment #0).
C does not treat integer overflow or underflow as an exception. What
kinds of steps would a defensive programmer take to accommodate this
language feature?
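One defensive tactic is to test whether an operation would overflow before performing it. Signed overflow in C is undefined behavior, so it cannot be detected reliably after the fact. Here is a sketch for addition, using the hypothetical name checked_add. GCC and Clang also offer the nonstandard __builtin_add_overflow for the same purpose.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *result and returns true if the sum fits in an int;
   returns false, leaving *result unchanged, if the addition would overflow.
   The test uses only subtractions that cannot themselves overflow. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *result = a + b;
    return true;
}
```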
Monday, February 27
How does the Debian Project generally package libraries such as libical (for iCalendar) and libepub (for EPUB)? How might the
packages that developers need differ from those that end users need?
How would the milestones for a project using the top-down methodology
differ from those for a project developed using a core-followed-by-addons
structure?
What additional kinds of testing become possible when there is
already a working implementation or prototype for a project?
Wednesday, February 29
What is the purpose of regression testing? Once a function or an
application has demonstrated that it yields correct results, what is the
point of repeating the same test?
What are the costs and benefits of keeping a detailed log of one's
programming errors?
McConnell recommends writing tests before implementing code. Why?
Friday, March 2
What does it mean to say that a hypothesis is falsifiable?
Why is it important for a hypothesis about the cause of a program error to
be falsifiable?
What is the significance of “psychological distance” in
choosing identifiers for related values or functions?
Does the GNU C compiler have a command-line option that converts
warnings into errors, as McConnell suggests? If so, what is it?
Monday, March 5
Start a session transcript by opening a terminal window and typing
script gdb-lab-transcript at the prompt. Complete parts 0 through 9
of the lab on the gdb debugger. At the end,
press Ctrl/D to terminate the session transcript, then e-mail the file to
me.
How would gdb display a null pointer?
Is it possible to set a breakpoint after the program has received the
values of argc and argv from the command line, but before
any statement in the actual text of the program has been executed? How
would one do this? What commands would one give to inspect the values of
argc and argv at that point?
Explain why it's not possible to set a watchpoint on a function
parameter before that function has been invoked.
Use gdb's help system to find out what the command d
does.
Friday, March 9
Start a session transcript by opening a terminal window and typing
script make-lab-transcript at the prompt. Complete the lab
on the make builder. At the end, press Ctrl/D to
terminate the session transcript, then e-mail the file to me.
Look over the manual page for lowriter (LibreOffice Writer)
and formulate a make rule for constructing a Portable Document
Format file, say frogs.pdf, from an Open Document Format text,
say frogs.odt.
In many large software packages, the first rule in a Makefile uses the non-file-name identifier all as its target,
specifies as prerequisites all of the file names that are targets in
subsequent rules, and has no actions. What is the point of such a rule?
Why would it be placed first?
Monday, March 12
If a Makefile contains the definition TEX =
/usr/share/tetex/bin/tex, but the shell that invokes the instance of make processing this Makefile defines TEX instead as
/usr/bin/tex, which executable will be invoked by an action in the
Makefile that specifies ${TEX}?
Why is it usually more difficult to make fast code correct than to
make correct code fast?
In Unicode, the first 65536 code points constitute the Basic
Multilingual Plane, which includes the most commonly used characters for
most of the world's writing systems. How much storage would be needed to
hold one Boolean value for each of these characters, if you packed them
together as tightly as possible? What data structure would you use for
this purpose? What changes would you need to make in order to store a
Boolean value for each of the 1,114,112 code points in the full Unicode
codespace?
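Packed one bit per code point, the Basic Multilingual Plane needs 65,536 / 8 = 8,192 bytes, and the full codespace would need 1,114,112 / 8 = 139,264 bytes. A plain bit array is one reasonable choice, sketched below with hypothetical names. Covering the full codespace would mostly mean changing the size constant to 0x110000. Where large stretches of the array are uniform, a two-level table that shares storage among identical blocks is a possible refinement.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define BMP_SIZE 0x10000u                          /* 65,536 code points */
#define BITS_PER_WORD (sizeof(uint32_t) * CHAR_BIT)

/* One bit per BMP code point: 65536 / 32 = 2048 words = 8192 bytes. */
static uint32_t bmp_flags[BMP_SIZE / BITS_PER_WORD];

/* Sets (value true) or clears (value false) the flag for cp < 0x10000. */
void bmp_set(uint32_t cp, bool value)
{
    uint32_t mask = UINT32_C(1) << (cp % BITS_PER_WORD);
    if (value)
        bmp_flags[cp / BITS_PER_WORD] |= mask;
    else
        bmp_flags[cp / BITS_PER_WORD] &= ~mask;
}

/* Returns the flag for code point cp < 0x10000. */
bool bmp_get(uint32_t cp)
{
    return (bmp_flags[cp / BITS_PER_WORD] >> (cp % BITS_PER_WORD)) & 1u;
}
```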
Wednesday, March 14
Why is nesting of control structures a stronger indicator of
complexity than sequencing of control structures?
One strong reason for using recursion, even in a low-level procedural
language such as C, is that procedures for operating on recursively defined
data structures can be expressed more naturally, particularly when such a
structure can contain two or more substructures of the same type. Give
some examples of common data structures that have this property.
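The binary tree is perhaps the most familiar example, and expression trees, directory hierarchies, and quadtrees share the same property. The sketch below uses a hypothetical struct tree. It shows how the recursive functions follow the shape of the recursive definition.

```c
#include <stddef.h>

/* A binary tree node contains two substructures of its own type. */
struct tree {
    int value;
    struct tree *left, *right;
};

/* Number of nodes: an empty tree has none; otherwise count this node plus
   the nodes of both subtrees. */
size_t tree_size(const struct tree *t)
{
    return t == NULL ? 0 : 1 + tree_size(t->left) + tree_size(t->right);
}

/* Height: 0 for an empty tree, else 1 more than the taller subtree. */
size_t tree_height(const struct tree *t)
{
    if (t == NULL)
        return 0;
    size_t hl = tree_height(t->left), hr = tree_height(t->right);
    return 1 + (hl > hr ? hl : hr);
}
```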
One of Knuth's pseudocode examples of the use of goto
statements to avoid redundant tests is a text-processing application. The
sequence below reads in a character from standard input and normally echoes
it to standard output. If it is a slash character, /, a tab
character is output instead; but if two slash characters appear in
succession, a newline should be output instead. Finally, whenever a
full-stop character, ., is output, an extra space is output
immediately afterwards.
x := read char;
if x = slash
  then x := read char;
       if x = slash
         then return the carriage; go to char processed;
         else tabulate;
       fi;
fi;
write char (x);
if x = period then write char (space) fi;
char processed:
How would one implement this algorithm in modern C (using getchar()
for read char, putchar for write char, putchar('\n') for return the carriage, and putchar('\t') for
tabulate)?
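Here is a sketch of one possible answer. When the algorithm runs in a loop, C's continue statement can play the role of go to char processed. For testability, the sketch takes its input and output streams as parameters and uses getc and putc. (getchar() is equivalent to getc(stdin), and putchar(c) is equivalent to putc(c, stdout).) The name slash_filter and the handling of a slash at end of input are assumptions.

```c
#include <stdio.h>

/* Copies in to out: "//" becomes a newline, a single "/" becomes a tab
   (followed by the character after it), and every "." is followed by a
   space. A lone slash at end of input produces just a tab. */
void slash_filter(FILE *in, FILE *out)
{
    int x;
    while ((x = getc(in)) != EOF) {
        if (x == '/') {
            x = getc(in);
            if (x == '/') {
                putc('\n', out);     /* return the carriage ...        */
                continue;            /* ... and go to char processed   */
            }
            putc('\t', out);         /* tabulate */
            if (x == EOF)
                break;
        }
        putc(x, out);
        if (x == '.')
            putc(' ', out);
    }
}
```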
Friday, March 16
Do the lab on doxygen. Package the
finished directory into a tarball and e-mail it to me.
What options would you change in the Doxyfile if you
wanted to use doxygen in connection with a program written in Java
rather than C or C++?
What special commands would you include in a file to identify two or
more people who are joint authors of the code it contains?
Monday, April 2
Suggest one or more layout rules for struct definitions that
accord with McConnell's principle that “good visual layout shows
the logical structure of the program”.
To what extent does the logical structure of a program correspond to
its syntactic structure? Suggest a case in which these structures are not
the same.
How does the need to anticipate program maintenance constrain layout
practices?
Wednesday, April 4
There is an error in example 29-1 (Beautiful code, page
478). Does the presence of this error strengthen or weaken Matsumoto's
line of reasoning?
Suppose that, in creating the naive-collinear procedure
(Beautiful code, page 542), Hayes had not had the slope and y-intercept procedures already on hand. What would naive-collinear have looked like if the expressions for computing the
slope and the y-intercept had been written out in line instead of
being expressed as procedure calls? Suggest a way of simplifying the
resulting expression algebraically that would have led Hayes directly to a
much simpler predicate.
A line in a Euclidean plane through points p0 and
p1 separates all the other points of the
plane that are not on the line into two half-planes, one on each side of
the line. The line segment from p2 to p3 crosses
this line if one of its endpoints belongs to each of these half-planes.
Write a Scheme predicate that takes the x- and y-coordinates of
p0, p1, p2, and p3 as
arguments and returns #t if the line segment from p2 to
p3 crosses the line through p0 and p1.
What cases must be handled specially? What is the best way to deal with
them?
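The question asks for Scheme, but to keep this page's examples in one language, the idea is sketched here in C. The function names are hypothetical. The sign of a cross product tells which side of the line a point lies on, so the segment crosses when the two signs are strictly opposite. Three special cases arise:

- When p0 equals p1, there is no unique line.
- An endpoint can lie exactly on the line, giving a sign of zero. This sketch treats that case as not crossing, which is a policy choice.
- Floating-point rounding can misjudge points very close to the line.

```c
#include <stdbool.h>

/* Sign of the cross product (p1 - p0) x (p - p0): +1 if p lies to the left
   of the directed line from p0 to p1, -1 if to the right, 0 if on it. */
static int side(double x0, double y0, double x1, double y1, double x, double y)
{
    double cross = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0);
    return (cross > 0) - (cross < 0);
}

/* True if p2 and p3 lie strictly in opposite half-planes of the line
   through p0 and p1. False if p0 == p1, since then no unique line exists. */
bool segment_crosses_line(double x0, double y0, double x1, double y1,
                          double x2, double y2, double x3, double y3)
{
    if (x0 == x1 && y0 == y1)
        return false;
    return side(x0, y0, x1, y1, x2, y2) * side(x0, y0, x1, y1, x3, y3) < 0;
}
```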
Friday, April 6
We saw an example of replacing a deeply nested conditional control
structure with a table lookup in one of the earlier readings. Which one?
In that context, did the replacement have the desirable results that
McConnell claims for this tactic?
Converting a deeply nested conditional control structure into a table
lookup is an example of radical refactoring -- making a gigantic
change in the structure of a section of code in order to make it simpler,
easier to understand, and easier to maintain. Suggest other examples of
radical refactoring.
In many programming languages, including C, it is possible to create
a table of functions, and to select a function from this table and
invoke it dynamically, with the selection being based on variables whose
values are not determined until the program is executed. (In C, the
functions themselves must be defined statically as part of the program
text, but the selection of the appropriate function from the table need not
be.) Write and test a C function that takes two arguments, an unsigned
integer and a double, and returns the sine of the double if the unsigned
integer is even, or its cosine if the unsigned integer is odd, by selecting
the appropriate function from a two-element table and applying it. (The
function may not use an if-statement, a switch-statement, or a
conditional expression.)
Monday, April 9
Relate Dean and Ghemawat's rationale for adopting the MapReduce
programming model to our previous discussions of the advantages of
modularity.
Describe a simple algorithm that the National Oceanic and
Atmospheric Administration, which prepares weather reports and predictions
and monitors climatic trends, might wish to apply to an immense data set,
using the MapReduce programming model.
New programming models and design patterns often emerge from
computing environments that are specialized in some way, just as MapReduce
emerged from computing environments in which the data sets are too large to
be processed serially, by a single machine. Suggest other instances of
this generalization.
Read section 10.4 (Iteration) in
the GNU Emacs Lisp reference manual. Then rewrite
the rev function described in the lab, using an iterative control
structure and assignments to one or more local variables (introduced
through let).
Read sections 25.3 (Reading from files),
25.9 (Contents of directories),
and 27.9 (Creating buffers).
Then write an Emacs Lisp procedure that takes one argument, the name of a
directory, and creates a buffer containing copies of all of the files in
that directory. The name of the buffer should be the name of the directory
with the suffix .all appended.
The interactive Emacs Lisp function exchange-point-and-mark
simultaneously moves the editing cursor to the position of the most
recently set mark and sets a new mark at the (previous) position of the
editing cursor. If this function were not predefined, how would you
implement it?
Define an Emacs Lisp function that takes a character as argument and
determines the number of occurrences of that character in the current
buffer.
Find the source code for the untabify function in Emacs Lisp.
What happens if the goto-char function receives a negative
integer as its argument?
Wednesday, April 18
What is mutation testing? Under what conditions, if any,
is it worth while to test the test procedures themselves?
Prove, by analyzing Savoia's implementation of the binarySearch method (in Beautiful code, page 89), that it
always returns a value, and that that value is always either -1 or a
non-negative integer. Using this result, prove that Savoia's Theory 3
follows logically from his Theory 2, and that Theory 4 similarly follows
from Theory 1.
Savoia describes the idea of creating instrumented implementations of
the code being tested, as in his comparison-count tests of the binary
search method, as a “developer testing trick.” What are the
hazards of this approach? Would any of those hazards be alleviated or
eliminated if you could turn on a compiler option to add instrumentation
instructions to the executable without changing the source code?
Friday, April 20
Does your project team have a clear path from the current state of
the project to its completion? If so, describe the remaining steps; if
not, what issues have to be resolved in order to make progress?
What features of the project have turned out to be the most difficult
ones to implement correctly?
What is the state of your project's documentation? If you ran
Doxygen on your project just as it now stands in your repository, would the
result be satisfactory?
Monday, April 23
In chapter 29 of Code complete, McConnell argues that
trying to integrate components of a system in the wrong order can create
obstacles and frustrate developers. How can you tell whether you're
integrating components in an inappropriate order? What are the danger
signs?
One integration schedule that McConnell does not discuss is opportunistic integration, in which two modules (or collections of modules
resulting from previous integration steps) are integrated as soon as they
are completed, provided that at least one of them contains calls to
functions or methods defined in the other. Discuss the advantages and
disadvantages of this approach in contrast to phased integration.
What is a smoke test? Are there any circumstances in which it would
be undesirable to make such tests a routine part of software
development?
Wednesday, April 25
It seems likely that McConnell tried, at the beginning of Code complete, to list all of the important components of software
quality. Did he succeed, or are there significant quality characteristics
that he omitted?
In Table 28-2, McConnell lists “Severity of each defect” as
one of a few dozen “useful software-development measurements.” How
would one measure the severity of a defect? Is there some kind of discrete
“severity object” that one could count, or a standard “severity unit”, so that one could express the severity of a defect as a
multiple of that unit? If not, what constraints should one place on the
possible uses of this measurement?
Choose one of the measurements in Table 28-2 and describe the
possible motivational side effects of adopting that measurement as an
important criterion of software quality (e.g., by paying small bonuses to
programmers whose code has a high score under that criterion).
Run test-queues and have gprof construct a report
giving the flat profile and the call graph analysis for the execution.
Save gprof's output to a file. Now delete the gmon.out file,
run test-queues again, and again have gprof construct the
report, saving it in a different file. Compare the files -- it is unlikely
that they will be identical. Why? Which of the quantities measured will
be the same on each run, and which ones are likely to vary from one run to
another?
What are the vertices of the call graph that gprof
constructs and analyzes? Is it a directed or undirected graph? Could it
contain cycles?
The mtrace utility displays the sizes of blocks of allocated
storage in base 16. Why? How would you get a workstation or calculator to
compute the value of a base-16 number such as 56C7?
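Converting by hand is an application of Horner's rule: 56C7 is ((5 × 16 + 6) × 16 + 12) × 16 + 7 = 22215. Here is a C sketch of the same computation, using the hypothetical name hex_value. The standard library's strtoul(s, NULL, 16) does the same job.

```c
#include <ctype.h>

/* Evaluates a string of hexadecimal digits by Horner's rule: each digit
   multiplies the running value by 16 and adds its own value. Stops at the
   first character that is not a hexadecimal digit. */
unsigned long hex_value(const char *s)
{
    unsigned long value = 0;
    for (; isxdigit((unsigned char) *s); s++) {
        int c = toupper((unsigned char) *s);
        int digit = isdigit(c) ? c - '0' : c - 'A' + 10;
        value = value * 16 + (unsigned long) digit;
    }
    return value;
}
```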
Find a C program that you wrote for CSC 161 or some other course, one
that includes calls to malloc, and use mtrace to determine
whether it leaks memory. If so, make it stop.
Wednesday, May 2
What experience led Barton Miller of the University of Wisconsin to
consider fuzz-testing Unix utilities?
The file /home/stone/courses/software-design/code/fuzz-generator.c is a simple
utility for generating files of random bytes. Compile it and use it to
generate a file, fuzz.dat, containing exactly 320,000 random
bytes.
Run the line-sorter utility (from the lab on detecting
memory leaks) on the fuzz.dat file, collecting the
output in a new file, fuzz.sorted. Note and explain any unusual
behavior (crashing, hanging, printing warning messages, assertion failures,
etc.). Do you obtain a correct fuzz.sorted file? If not,
explain the nature and cause of the error. (Hint: look carefully at the
output of the command ls -l fuzz.dat fuzz.sorted.)
Friday, May 4
Explain the significance and purpose of the numeric constant 0x33333333 in example 10-1 (page 151) of Beautiful code.
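For reference, here is a sketch of the widely published divide-and-conquer population count. The book's example may differ in detail. In binary, 0x33333333 is 0011 repeated. After the first step leaves a 2-bit count in each bit pair, masking with this constant selects alternate pairs so that adjacent pairs can be added into 4-bit counts.

```c
#include <stdint.h>

/* Population count by summing bits in ever wider fields. */
int pop(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums */
    x = x + (x >> 8);                                 /* 16-bit sums */
    x = x + (x >> 16);                                /* total in low bits */
    return (int) (x & 0x3Fu);                         /* at most 32 */
}
```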
Both Java (in its BigInteger class) and Scheme support
arbitrarily large integer values. What algorithm would you use to compute
the population count of such a value? Why?
A directed graph containing 32 nodes could be represented as an array
of unsigned 32-bit integers, one for each node, with each bit indicating
the presence or absence of an arc: using zero-based indexing and counting
from the least significant bit, bit j in element i of the array
is 1 if there is an arc from node i to node j in the graph, 0 if there is
no such arc. In this model, what graph property do the population
counts of the array elements represent?
Monday, May 7
One of the more consequential entries in Table 25-2 of
Code complete (pages 601 and 602) indicates that an integer
division takes about five times as long as other arithmetic operations.
This suggests that one could improve the performance of a program by
replacing the arithmetic expression n / 64, where n is a
variable of type int, with n >> 6. Confirm or refute this
suggestion by measurement.
Is the code transformation proposed in the preceding exercise valid
if the value of n is negative?
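The functions below support a quick check. They assume an implementation such as GCC, where >> on a negative int is an arithmetic shift; the C standard leaves that case implementation-defined. All names are hypothetical. The third function adds a bias to negative values before shifting, which makes the shift agree with /. This is roughly what optimizing compilers emit for signed division by a power of two.

```c
#include <limits.h>

/* n / 64 as C defines it: the quotient is truncated toward zero. */
int div64(int n)
{
    return n / 64;
}

/* The proposed replacement; agrees with div64 only for n >= 0. */
int shr6(int n)
{
    return n >> 6;
}

/* Adds 63 to negative values before shifting, so the result is truncated
   toward zero like n / 64. ASSUMES >> on a negative int is an arithmetic
   shift (implementation-defined in C; GCC sign-extends). */
int div64_by_shift(int n)
{
    int bias = (n >> (sizeof(int) * CHAR_BIT - 1)) & 63;   /* 63 if n < 0 */
    return (n + bias) >> 6;
}
```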
What is "loop unrolling"? How could unrolling a loop ever reduce its
execution time?
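Here is a sketch of the transformation, with hypothetical function names. The unrolled version pays the loop test and index increment once per four elements instead of once per element. Its four independent partial sums may also let a pipelined processor overlap more of the additions. Whether it actually runs faster has to be measured, because optimizing compilers may already unroll loops on their own (GCC, for example, with -funroll-loops).

```c
#include <stddef.h>

/* Straightforward loop: one addition, one increment, and one loop test
   per element. */
long sum_simple(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Unrolled by four, with a cleanup loop for the 0 to 3 leftover elements. */
long sum_unrolled(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```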
Wednesday, May 9
In McConnell's discussion of character in chapter 33, he posits that
intelligence is innate and unchangeable, while other traits (humility,
curiosity, intellectual honesty, creativity, ability to communicate and
share, and self-governance) are learned and can be increased or
strengthened through habit formation. Are these assumptions correct?
What is “enlightened laziness”? Is McConnell right to extend
a kind of grudging tolerance towards it? Why is he so skeptical of the
opposite practice that he calls “gonzo programming”?
The “key points” at the end of chapter 34 effectively
summarize the high-level advice that McConnell wants readers to take away
from the book. Has he adequately justified them? Has he made the case for
following his advice?
Friday, May 11
Summarize the status of your team's project and provide a timetable
for its completion.
When is the final exam? How will it be structured?
Assess the content, structure, and effectiveness of this course. How
could it be improved? What topics related to software design did it fail
to address adequately?