Study questions

CSC 323: Software design · Spring, 2012

Department of Computer Science · Grinnell College

Monday, January 23

  1. To what audience or audiences should program source code be directed? What rhetorical problems, if any, does writing source code entail?
  2. In what ways does collaborative software development differ from independent, single-programmer software development?
  3. The preliminary syllabus for this course identified mastery of the following software tools as particularly important for software designers and developers: a compiler, a programmable text editor, a debugger, a programmable build tool, a version control system, a documentation generator, and a profiler. Would your list of essential tools include all of these? Would it include tools not listed here?

Wednesday, January 25

  1. How does one create an “opaque” data type in C? In other words, how can one establish and enforce an abstraction boundary between code that implements that data type and code that uses it?
  2. The software “life cycle” is sometimes thought of as consisting of distinct phases: specification, design, coding, testing, debugging, packaging, maintenance, extension, and decommissioning. How are these phases related? Which of them logically must precede which others? Which of them may need to be iterated, in order to “feed back” knowledge acquired during one phase to improve the quality of another phase by passing through it again?
  3. What are the advantages of adopting a single, comprehensive methodology for software development? What are the limitations of that policy?

Friday, January 27

  1. What's the biggest problem with the waterfall model of software development? To what extent can a judicious selection of development techniques overcome or neutralize this problem?
  2. Some programming methodologies encourage programmers to refactor their code early and often, as soon as they notice any difficulty that refactoring could address. Others recommend spending time and effort up front, on specification and design, so that subsequent refactoring will seldom be necessary, even though it then consumes more of the team's effort when it has to be done anyway. What kinds of programming projects and work environments are particularly conducive to one or the other of these extreme attitudes?
  3. In pair programming, two programmers work at each station, but only one manages the keyboard. What would it be most useful for the other one to do?

Monday, January 30

  1. What is the main thesis of Brian Kernighan's account (in chapter 1 of Beautiful code) of Rob Pike's regular-expression matcher? Can you think of any way in which Pike's code could be further improved?
  2. Kernighan suggests several possible extensions to Pike's code that the reader might undertake. Pick one and implement it. Can you preserve the concision and elegance for which Kernighan praises Pike's code while adding new features to it?
  3. One common variation of quicksort finds the kth smallest element in an array of n values, given k, n, and the array, by choosing a random pivot, partitioning the array around it, calculating which partition contains the element sought, and recursing on just that partition. Can you adapt Bentley's methods to calculate the average number of comparisons that this algorithm makes, when an array of size n is filled in randomly and k is chosen randomly from the range from 1 to n?
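
For reference, the selection algorithm described in question 3 can be sketched in C roughly as follows. This is only one possible rendering, under several assumptions that the question does not fix: the function name kth_smallest is hypothetical, the values are ints, k is counted from 1, a Lomuto-style partition is used, and a loop stands in for the recursion (each pass simply narrows the range to the partition that contains the element sought). The pivot is chosen with rand() and a modulus, which is slightly biased but adequate for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Exchange two array elements. */
static void swap(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/*
 * Return the kth smallest element (k counted from 1) of values[0..n-1].
 * The array is rearranged in the process.
 * Precondition: values is not NULL and 1 <= k <= n.
 */
int kth_smallest(int *values, size_t n, size_t k)
{
    assert(values != NULL && k >= 1 && k <= n);

    size_t lo = 0;          /* first index of the partition still in play */
    size_t hi = n - 1;      /* last index of the partition still in play */
    size_t target = k - 1;  /* zero-based position of the element sought */

    while (lo < hi) {
        /* Choose a random pivot and park it at the end of the partition. */
        size_t p = lo + (size_t) rand() % (hi - lo + 1);
        swap(&values[p], &values[hi]);
        int pivot = values[hi];

        /* Partition: elements smaller than the pivot migrate to the front. */
        size_t store = lo;
        for (size_t i = lo; i < hi; i++) {
            if (values[i] < pivot) {
                swap(&values[i], &values[store]);
                store++;
            }
        }
        swap(&values[store], &values[hi]);  /* pivot reaches its final place */

        /* Continue with only the side that contains the target position. */
        if (target == store)
            return values[store];
        else if (target < store)
            hi = store - 1;
        else
            lo = store + 1;
    }
    return values[lo];
}
```

Each pass through the for loop makes one comparison per element of the current partition other than the pivot, which is the quantity that the question asks you to average.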

Wednesday, February 1

  1. How is McConnell using the term “prerequisites” in chapter 3 of Code complete? What are the prerequisites of a software-development project?
  2. Suppose that one of the requirements for a software system is essentially trivial to fulfill. Under what conditions, if any, is it worth while to specify that requirement explicitly? (The underlying question here is: What are requirements for?)
  3. Suppose that a requirement for a software system is worded in such a way that it is impossible to tell whether or not that requirement has been met. (Perhaps the requirement is stated vaguely, or ambiguously, or in subjective, non-quantitative terms.) Under what conditions, if any, would it be worth while to include that requirement in the specification?
  4. Some of the items in McConnell's checklists presuppose that his readers are working within the object-oriented programming paradigm and using an object-oriented language, which we are not. How does this affect the process of drawing up prerequisites?

Friday, February 3

  1. How did you decide whether to implement queues using arrays or linked lists? What are the consequences of each alternative?
  2. Explain why it would be a bad idea to include a function called main in your queues.c file.
  3. Study the long list of highly judgemental comments on programming assignment #0. Find one that seems, on due consideration, to be simplistic, hyperbolic, or wrong-headed. Write a revised comment that offers a more judicious account of the issue raised.

Monday, February 6

  1. What's the problem with the current mechanism for authenticating Web sites, as part of the preparation for establishing secure connections for transmitting sensitive data (such as credit-card numbers)?
  2. How does the Internet Engineering Task Force design and promulgate standards for Internet protocols such as iCalendar?
  3. The Cryptonite mail user agent is designed to empower a class of users who are in some other respects at a technological disadvantage: individuals who need secure, uncensored e-mail exchange and cannot rely on an Internet service provider to supply it (e.g., “activists, NGOs, and reporters working in repressive countries,” “whistle-blowers, witnesses, and victims of domestic abuse”). In what sense, if any, does this aspect of the design make the software “beautiful code”? What's the relationship, if any, between the kind of software-development practices that the essays in Beautiful code promote and the ideals of freedom and social justice that Gulhati wishes to advance?

Wednesday, February 8

  1. What is McConnell's “General Principle of Software Quality”? How is it related to Eric Raymond's maxim (which he named “Linus's Law,” after Linus Torvalds) that “given enough eyeballs, all bugs are shallow”?
  2. McConnell identifies the management of complexity as “software's primary technical imperative” (Code complete, p. 77). What does he mean by this? Aren't correctness and efficient use of resources at least as important as managing complexity?
  3. Define the term loose coupling. Is the use of global variables an indicator of loose coupling, tight coupling, or neither? Give an example, or another example, of a programming construction that is an indicator of loose coupling.

Friday, February 10

  1. McConnell's examples of semantic coupling (Code complete, p. 102) arise only within the object-oriented model, but the hazards are even greater when you're working in a low-level procedural language like C. Give some examples of how semantic coupling might arise in C programs, or restate or generalize McConnell's examples.
  2. In what language is the “excerpt from the read system call specification” that Spinellis reproduces on p. 288 of Beautiful code written? How is it related to the C code that he describes and reproduces earlier in his essay? How and why would one use the awk utility to write C code, as he describes?
  3. Define the term multiplexing and describe how the design pattern that it denotes figures in the “layered indirection” design that Spinellis describes.

Monday, February 13

  1. In addition to awk, Spinellis mentions lex (and flex), yacc (and bison), and cfront as domain-specific languages commonly used in software development. Give at least one additional example.
  2. What are some indications against introducing an indirection layer into a software system? In other words, why would one ever decide not to solve a problem in this way?
  3. In Spinellis's example, the arguments to the VOP_READ function that constitutes the “kernel-side interface” of the read system call (the vnode identifying the file to be read from, the buffer in which the value being read is to be stored, and so on) are packaged into a struct vop_read_args value before being passed to the VOP_READ_APV function that constitutes the next level down. Eventually, though, they just have to be broken out of that structure again by one of the functions at the lowest level of Figure 17-1 (the cd9660_read function, perhaps). So what's the point of building the structure in the first place?

Incidentally, in case you're curious, you can see the code for the FreeBSD cd9660_read function at http://freebsd.active-venture.com/FreeBSD-srctree/newsrc/isofs/cd9660/cd9660_vnops.c.html#cd9660_read. It shows that even there we haven't quite reached the bottom layer, since cd9660_read invokes cluster_read or bread (i.e., “block read”) or breadn functions at an even lower level.

Wednesday, February 15

  1. Working within the constraints of the C programming language, how would you address the “multiple instances of an abstract data type” problem that McConnell discusses on pages 131 through 133 of Code complete?
  2. How would you adapt the rules that McConnell presents in the section “Constructors” (pages 151 and 152) to C, which lacks any special syntax for constructor and destructor methods?
  3. What support, if any, does C provide for Booleans? for enumerated types?

Friday, February 17

  1. What are the advantages, if any, of using Unicode representations of characters in C programs? What kinds of applications would benefit most from the systematic use of Unicode?
  2. How would one represent a string comprising the single Unicode character ∃, the existential-quantifier symbol, in the UTF-8 encoding?
  3. Suggest an algorithm for computing the number of Unicode characters in a given file that uses the UTF-8 encoding throughout.

Monday, February 20

(class session terminated prematurely)

Wednesday, February 22

  1. Do the lab on git and submit a list of the commands you gave (in the terminal window) to complete the steps of the lab. (The history command gives you a list of recent commands, from which you may be able to extract the ones you want if you neglect to record them as you give them. The script command can be used to make a transcription of a terminal session, if you remember to turn it on at the beginning of the session and off (by pressing control-D) at the end.)
  2. What is “SHA-1”? Why does git use it to construct names for heads?
  3. What is a “fast forward merge”?

Friday, February 24

  1. Describe how to turn assertions on and off in C programs (that is, how to activate or deactivate the assert macro at compilation time, without changing the source code).
  2. Give an example of “offensive programming” in the specification of the queue library (in programming assignment #0).
  3. C does not treat integer overflow or underflow as an exception. What kinds of steps would a defensive programmer take to accommodate this language feature?

Monday, February 27

  1. How does the Debian Project generally package libraries such as libical (for iCalendar) and libepub (for EPUB)? How might the packages that developers need differ from those that end users need?
  2. How would the milestones for a project using the top-down methodology differ from those for a project developed using a core-followed-by-addons structure?
  3. What additional kinds of testing become possible when there is already a working implementation or prototype for a project?

Wednesday, February 29

  1. What is the purpose of regression testing? Once a function or an application has demonstrated that it yields correct results, what is the point of repeating the same test?
  2. What are the costs and benefits of keeping a detailed log of one's programming errors?
  3. McConnell recommends writing tests before implementing code. Why?

Friday, March 2

  1. What does it mean to say that a hypothesis is falsifiable? Why is it important for a hypothesis about the cause of a program error to be falsifiable?
  2. What is the significance of “psychological distance” in choosing identifiers for related values or functions?
  3. Does the GNU C compiler have a command-line option that converts warnings into errors, as McConnell suggests? If so, what is it?

Monday, March 5

  1. Start a session transcript by opening a terminal window and typing script gdb-lab-transcript at the prompt. Complete parts 0 through 9 of the lab on the gdb debugger. At the end, press Ctrl/D to terminate the session transcript, then e-mail the file to me.
  2. How would gdb display a null pointer?
  3. Is it possible to set a breakpoint after the program has received the values of argc and argv from the command line, but before any statement in the actual text of the program has been executed? How would one do this? What commands would one give to inspect the values of argc and argv at that point?

Wednesday, March 7

  1. Complete parts 10 and 11 of the lab on the gdb debugger.
  2. Explain why it's not possible to set a watchpoint on a function parameter before that function has been invoked.
  3. Use gdb's help system to find out what the command d does.

Friday, March 9

  1. Start a session transcript by opening a terminal window and typing script make-lab-transcript at the prompt. Complete the lab on the make builder. At the end, press Ctrl/D to terminate the session transcript, then e-mail the file to me.
  2. Look over the manual page for lowriter (LibreOffice Writer) and formulate a make rule for constructing a Portable Document Format file, say frogs.pdf, from an Open Document Format text, say frogs.odt.
  3. In many large software packages, the first rule in a Makefile uses the non-file-name identifier all as its target, specifies as prerequisites all of the file names that are targets in subsequent rules, and has no actions. What is the point of such a rule? Why would it be placed first?

Monday, March 12

  1. If a Makefile contains the definition TEX = /usr/share/tetex/bin/tex, but the shell from which the instance of make that processes this Makefile is invoked defines TEX instead as /usr/bin/tex, which executable will be invoked by an action in the Makefile that specifies ${TEX}?
  2. Why is it usually more difficult to make fast code correct than to make correct code fast?
  3. In Unicode, the first 65536 code points constitute the Basic Multilingual Plane, which includes the most commonly used characters for most of the world's writing systems. How much storage would be needed to hold one Boolean value for each of these characters, if you packed them together as tightly as possible? What data structure would you use for this purpose? What changes would you need to make in order to store a Boolean value for each of the 1114112 code points in the full Unicode codespace?

Wednesday, March 14

  1. Why is nesting of control structures a stronger indicator of complexity than sequencing of control structures?
  2. One strong reason for using recursion, even in a low-level procedural language such as C, is that procedures for operating on recursively defined data structures can be expressed more naturally, particularly when such a structure can contain two or more substructures of the same type. Give some examples of common data structures that have this property.
  3. One of Knuth's pseudocode examples of the use of goto statements to avoid redundant tests is a text-processing application. The sequence below reads in a character from standard input and normally echoes it to standard output. If it is a slash character, /, a tab character is output instead; but if two slash characters appear in succession, a newline should be output instead. Finally, whenever a full-stop character, ., is output, an extra space is output immediately afterwards.
    x := read char;
    if x = slash
    then x := read char;
         if x = slash
         then return the carriage;
              go to char processed;
         else tabulate;
         fi;
    fi;
    write char (x);
    if x = period then write char (space) fi;
    char processed:
    How would one implement this algorithm in modern C (using getchar() for read char, putchar for write char, putchar('\n') for return the carriage, and putchar('\t') for tabulate)?

Friday, March 16

  1. Do the lab on doxygen. Package the finished directory into a tarball and e-mail it to me.
  2. What options would you change in the Doxyfile if you wanted to use doxygen in connection with a program written in Java rather than C or C++?
  3. What special commands would you include in a file to identify two or more people who are joint authors of the code it contains?

Monday, April 2

  1. Suggest one or more layout rules for struct definitions that accord with McConnell's principle that “good visual layout shows the logical structure of the program”.
  2. To what extent does the logical structure of a program correspond to its syntactic structure? Suggest a case in which these structures are not the same.
  3. How does the need to anticipate program maintenance constrain layout practices?

Wednesday, April 4

  1. There is an error in example 29-1 (Beautiful code, page 478). Does the presence of this error strengthen or weaken Matsumoto's line of reasoning?
  2. Suppose that, in creating the naive-collinear procedure (Beautiful code, page 542), Hayes had not had the slope and y-intercept procedures already on hand. What would naive-collinear have looked like if the expressions for computing the slope and the y-intercept had been written out in line instead of being expressed as procedure calls? Suggest a way of simplifying the resulting expression algebraically that would have led Hayes directly to a much simpler predicate.
  3. A line in a Euclidean plane through points p0 and p1 separates all the other points of the plane that are not on the line into two half-planes, one on each side of the line. The line segment from p2 to p3 crosses this line if one of its endpoints belongs to each of these half-planes. Write a Scheme predicate that takes the x- and y-coordinates of p0, p1, p2, and p3 as arguments and returns #t if the line segment from p2 to p3 crosses the line through p0 and p1. What cases must be handled specially? What is the best way to deal with them?

Friday, April 6

  1. We saw an example of replacing a deeply nested conditional control structure with a table lookup in one of the earlier readings. Which one? In that context, did the replacement have the desirable results that McConnell claims for this tactic?
  2. Converting a deeply nested conditional control structure into a table lookup is an example of radical refactoring -- making a gigantic change in the structure of a section of code in order to make it simpler, easier to understand, and easier to maintain. Suggest other examples of radical refactoring.
  3. In many programming languages, including C, it is possible to create a table of functions, and to select a function from this table and invoke it dynamically, with the selection being based on variables whose values are not determined until the program is executed. (In C, the functions themselves must be defined statically as part of the program text, but the selection of the appropriate function from the table need not be.) Write and test a C function that takes two arguments, an unsigned integer and a double, and returns the sine of the double if the unsigned integer is even, or its cosine if the unsigned integer is odd, by selecting the appropriate function from a two-element table and applying it. (The function may not use an if-statement, a switch-statement, or a conditional expression.)
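
As background for question 3, here is a minimal sketch of the general mechanism, not a solution to the exercise. Everything named in it is hypothetical: the three small functions twice, square, and negate, the unary_fn type, the operations table, and apply_operation are invented for illustration. The functions are defined statically in the program text, while the choice among them depends on a value supplied at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example functions, all with the same signature. */
static double twice(double x)  { return 2.0 * x; }
static double square(double x) { return x * x; }
static double negate(double x) { return -x; }

/* A pointer to any function that maps a double to a double. */
typedef double (*unary_fn)(double);

/* The table itself: its contents are fixed at compile time. */
static const unary_fn operations[] = { twice, square, negate };

/* Number of entries in the table, computed rather than hard-coded. */
static const size_t operation_count =
    sizeof operations / sizeof operations[0];

/*
 * Apply the operation stored at position `which` to x.
 * Precondition: which is a valid index into the table.
 */
double apply_operation(size_t which, double x)
{
    assert(which < operation_count);
    return operations[which](x);   /* index the table, then call the result */
}
```

With these definitions, apply_operation(1, 3.0) invokes square and yields 9.0. Note that the selection is made purely by array indexing, with no branching on the value of which.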

Monday, April 9

  1. Relate Dean and Ghemawat's rationale for adopting the MapReduce programming model to our previous discussions of the advantages of modularity.
  2. Describe a simple algorithm that the National Oceanic and Atmospheric Administration, which prepares weather reports and predictions and monitors climatic trends, might wish to apply to an immense data set, using the MapReduce programming model.
  3. New programming models and design patterns often emerge from computing environments that are specialized in some way, just as MapReduce emerged from computing environments in which the data sets are too large to be processed serially, by a single machine. Suggest other instances of this generalization.

Wednesday, April 11

  1. Do parts 0 through 3 of the lab introducing Emacs Lisp. Write up your results and observations and send them to stone@cs.grinnell.edu.
  2. Read section 10.4 (Iteration) in the GNU Emacs Lisp reference manual. Then rewrite the rev function described in the lab, using an iterative control structure and assignments to one or more local variables (introduced through let).
  3. Read sections 25.3 (Reading from files), 25.9 (Contents of directories), and 27.9 (Creating buffers). Then write an Emacs Lisp procedure that takes one argument, the name of a directory, and creates a buffer containing copies of all of the files in that directory. The name of the buffer should be the name of the directory with the suffix .all appended.

Friday, April 13

  1. Do parts 4 through 7 of the lab introducing Emacs Lisp. Write up your results and observations and send them to stone@cs.grinnell.edu.
  2. The interactive Emacs Lisp function exchange-point-and-mark simultaneously moves the editing cursor to the position of the most recently set mark and sets a new mark at the (previous) position of the editing cursor. If this function were not predefined, how would you implement it?
  3. Define an Emacs Lisp function that takes a character as argument and determines the number of occurrences of that character in the current buffer.

Monday, April 16

  1. Do any of the parts of the lab introducing Emacs Lisp that you have not yet completed. Write up your results and observations and send them to stone@cs.grinnell.edu.
  2. Find the source code for the untabify function in Emacs Lisp.
  3. What happens if the goto-char function receives a negative integer as its argument?

Wednesday, April 18

  1. What is mutation testing? Under what conditions, if any, is it worth while to test the test procedures themselves?
  2. Prove, by analyzing Savoia's implementation of the binarySearch method (in Beautiful code, page 89), that it always returns a value, and that that value is always either -1 or a non-negative integer. Using this result, prove that Savoia's Theory 3 follows logically from his Theory 2, and that Theory 4 similarly follows from Theory 1.
  3. Savoia describes the idea of creating instrumented implementations of the code being tested, as in his comparison-count tests of the binary search method, as a “developer testing trick.” What are the hazards of this approach? Would any of those hazards be alleviated or eliminated if you could turn on a compiler option to add instrumentation instructions to the executable without changing the source code?

Friday, April 20

  1. Does your project team have a clear path from the current state of the project to its completion? If so, describe the remaining steps; if not, what issues have to be resolved in order to make progress?
  2. What features of the project have turned out to be the most difficult ones to implement correctly?
  3. What is the state of your project's documentation? If you ran Doxygen on your project just as it now stands in your repository, would the result be satisfactory?

Monday, April 23

  1. In chapter 29 of Code complete, McConnell argues that trying to integrate components of a system in the wrong order can create obstacles and frustrate developers. How can you tell whether you're integrating components in an inappropriate order? What are the danger signs?
  2. One integration schedule that McConnell does not discuss is opportunistic integration, in which two modules (or collections of modules resulting from previous integration steps) are integrated as soon as they are completed, provided that at least one of them contains calls to functions or methods defined in the other. Discuss the advantages and disadvantages of this approach in contrast to phased integration.
  3. What is a smoke test? Are there any circumstances in which it would be undesirable to make such tests a routine part of software development?

Wednesday, April 25

  1. It seems likely that McConnell tried, at the beginning of Code complete, to list all of the important components of software quality. Did he succeed, or are there significant quality characteristics that he omitted?
  2. In Table 28-2, McConnell lists “Severity of each defect” as one of a few dozen “useful software-development measurements.” How would one measure the severity of a defect? Is there some kind of discrete “severity object” that one could count, or a standard “severity unit”, so that one could express the severity of a defect as a multiple of that unit? If not, what constraints should one place on the possible uses of this measurement?
  3. Choose one of the measurements in Table 28-2 and describe the possible motivational side effects of adopting that measurement as an important criterion of software quality (e.g., by paying small bonuses to programmers whose code has a high score under that criterion).

Friday, April 27

  1. Work through the lab on profiling and send me the answers to the questions it poses.
  2. Run test-queues and have gprof construct a report giving the flat profile and the call graph analysis for the execution. Save gprof's output to a file. Now delete the gmon.out file, run test-queues again, and again have gprof construct the report, saving it in a different file. Compare the files -- it is unlikely that they will be identical. Why? Which of the quantities measured will be the same on each run, and which ones are likely to vary from one run to another?
  3. What are the vertices of the call graph that gprof constructs and analyzes? Is it a directed or undirected graph? Could it contain cycles?

Monday, April 30

  1. Work through the lab on detecting memory leaks. What change(s) to line-sorter.c would stop the memory leak?
  2. The mtrace utility displays the sizes of blocks of allocated storage in base 16. Why? How would you get a workstation or calculator to compute the value of a base-16 number such as 56C7?
  3. Find a C program that you wrote for CSC 161 or some other course, one that includes calls to malloc, and use mtrace to determine whether it leaks memory. If so, make it stop.

Wednesday, May 2

  1. What experience led Barton Miller of the University of Wisconsin to consider fuzz-testing Unix utilities?
  2. The file /home/stone/courses/software-design/code/fuzz-generator.c is a simple utility for generating files of random bytes. Compile it and use it to generate a file, fuzz.dat, containing exactly 320,000 random bytes.
  3. Run the line-sorter utility (from the lab on detecting memory leaks) on the fuzz.dat file, collecting the output in a new file, fuzz.sorted. Note and explain any unusual behavior (crashing, hanging, printing warning messages, assertion failures, etc.). Do you obtain a correct fuzz.sorted file? If not, explain the nature and cause of the error. (Hint: look carefully at the output of the command ls -l fuzz.dat fuzz.sorted.)

Friday, May 4

  1. Explain the significance and purpose of the numeric constant 0x33333333 in example 10-1 (page 151) of Beautiful code.
  2. Both Java (in its BigInteger class) and Scheme support arbitrarily large integer values. What algorithm would you use to compute the population count of such a value? Why?
  3. A directed graph containing 32 nodes could be represented as an array of unsigned 32-bit integers, one for each node, with each bit indicating the presence or absence of an arc: using zero-based indexing and counting from the least significant bit, bit j in element i of the array is 1 if there is an arc from node i to node j in the graph, 0 if there is no such arc. In this model, what graph property do the population counts of the array elements represent?
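
The representation described in question 3 might be set up in C along the following lines. This is a sketch under stated assumptions: the type name digraph32 and the three function names are hypothetical, and the bit convention is the one given in the question (bit j of element i, counting from the least significant bit, records an arc from node i to node j). It deliberately stops short of computing population counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_COUNT 32

/* One 32-bit row per node; bit j of row i records the arc from i to j. */
typedef struct {
    uint32_t row[NODE_COUNT];
} digraph32;

/* Remove every arc from the graph. */
void digraph32_clear(digraph32 *g)
{
    assert(g != NULL);
    for (size_t i = 0; i < NODE_COUNT; i++)
        g->row[i] = 0;
}

/* Record an arc from node `from` to node `to`. */
void digraph32_add_arc(digraph32 *g, unsigned from, unsigned to)
{
    assert(g != NULL && from < NODE_COUNT && to < NODE_COUNT);
    g->row[from] |= UINT32_C(1) << to;
}

/* Report whether there is an arc from node `from` to node `to`. */
bool digraph32_has_arc(const digraph32 *g, unsigned from, unsigned to)
{
    assert(g != NULL && from < NODE_COUNT && to < NODE_COUNT);
    return ((g->row[from] >> to) & UINT32_C(1)) != 0;
}
```

After digraph32_add_arc(&g, 3, 0) and digraph32_add_arc(&g, 3, 31) on a cleared graph, g.row[3] holds 0x80000001 and every other row is still zero.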

Monday, May 7

  1. One of the more consequential entries in Table 25-2 of Code complete (pages 601 and 602) indicates that an integer division takes about five times as long as other arithmetic operations. This suggests that one could improve the performance of a program by replacing the arithmetic expression n / 64, where n is a variable of type int, with n >> 6. Confirm or refute this suggestion by measurement.
  2. Is the code transformation proposed in the preceding exercise valid if the value of n is negative?
  3. What is “loop unrolling”? How could unrolling a loop ever reduce its execution time?

Wednesday, May 9

  1. In McConnell's discussion of character in chapter 33, he posits that intelligence is innate and unchangeable, while other traits (humility, curiosity, intellectual honesty, creativity, ability to communicate and share, and self-governance) are learned and can be increased or strengthened through habit formation. Are these assumptions correct?
  2. What is “enlightened laziness”? Is McConnell right to extend a kind of grudging tolerance towards it? Why is he so skeptical of the opposite practice that he calls “gonzo programming”?
  3. The “key points” at the end of chapter 34 effectively summarize the high-level advice that McConnell wants readers to take away from the book. Has he adequately justified them? Has he made the case for following his advice?

Friday, May 11

  1. Summarize the status of your team's project and provide a timetable for its completion.
  2. When is the final exam? How will it be structured?
  3. Assess the content, structure, and effectiveness of this course. How could it be improved? What topics related to software design did it fail to address adequately?