Algorithm Analysis (CSC 301 2015F) : Assignments

Exam 2: Sorting, Graphs, Dynamic Programming, and More


Preliminaries

Exam Format

This is a take-home examination. You may use any time or times you deem appropriate to complete the exam, provided you return it to me by the due date.

There are five problems on this examination. You must do your best to answer all of them. The problems are not necessarily of equal difficulty. Problems may include subproblems. If you complete five problems correctly or mostly correctly, you will earn an A. If you complete four problems correctly or mostly correctly, you will earn a B. If you complete three problems correctly or mostly correctly, you will earn a C. If you complete two problems correctly or mostly correctly, you will earn a D. If you complete fewer than two problems correctly or mostly correctly, you will earn an F. If you do not attempt the examination, you will earn a 0. Partially correct solutions may or may not earn you a partial grade, depending on the discretion of the grader.

I rarely give makeup problems because my experience in past semesters is that students spend a lot of effort on such problems but do not significantly improve their grade.

Read the entire examination before you begin.

I expect that someone who has mastered the material and works at a moderate rate should have little trouble completing the exam in a reasonable amount of time. In particular, this exam is likely to take you about ten hours, depending on how well you've learned the topics and how fast you work.

Blind Grading

In the interest of fairness, I prefer to do blind grading on my examinations. I will assign you a random number. You should write your random number on every page of the exam. You should do your best to avoid including any information that would personally identify you within the exam.

In addition to your examination, you will turn in a separate cover sheet that provides your name, random number, and academic honesty statements (see below for details).

After grading the anonymous examinations, I will merge them with the cover sheets before returning them to you.

Academic Honesty

This examination is open book, open notes, open mind, open computer, and open Web. However, it is closed person. That means you should not talk to other people about the exam. Other than as restricted by that limitation, you should feel free to use all reasonable resources available to you.

As always, you are expected to turn in your own work. If you find ideas in a book or on the Web, be sure to cite them appropriately. If you use code that you wrote for a previous lab or homework, cite that lab or homework and the other members of your group. If you use code that you found on the course Web site, be sure to cite that code. You need not cite the code provided in the body of the examination.

Although you may use the Web for this exam, you may not post your answers to this examination on the Web. (You certainly should not post them to GitHub unless you create a private repository for your exam.) And, in case it's not clear, you may not ask others (in person, via email, via IM, via IRC, by posting a “please help” message on StackOverflow or elsewhere, or in any other way) to put answers on the Web.

Because different students may be taking the exam at different times, you are not permitted to discuss the exam with anyone until after I have returned it. If you must say something about the exam, you are allowed to say “This is among the hardest exams I have ever taken. If you don't start it early, you will have no chance of finishing the exam.” You may also summarize these policies. You may not tell other students which problems you've finished. You may not tell other students how long you've spent on the exam.

You must include both of the following statements on the cover sheet of the examination.

  1. I have neither received nor given inappropriate assistance on this examination.
  2. I am not aware of any other students who have given or received inappropriate assistance on this examination.

Please sign and date each statement. Note that the statements must be true; if you are unable to sign either statement, please talk to me at your earliest convenience. You need not reveal the particulars of the dishonesty, simply that it happened. Note also that inappropriate assistance is assistance from (or to) anyone other than Professor Rebelsky (that's me).

Presenting Your Work

You will only present your exam to me in physical form.

Physical copy: You must write all of your answers (using the computer or by hand), print them out, number the pages, and put your assigned number on the top of every page. You must turn in a separate cover sheet on which you hand write, sign, and date each of the academic honesty statements (provided you are able to do so). If you fail to number the printed pages, you may suffer a penalty. If you fail to turn in a legible version of the exam, you are also likely to suffer some sort of penalty.

Partial Credit: I may give partial credit for partially correct answers. I am best able to give such partial credit if you include a clear set of work that shows how you derived your answer. You ensure the best possible grade for yourself by clearly indicating what part of your answer is work and what part is your final answer.

Getting Help

I may not be available at the time you take the exam. If you feel that a question is badly worded or impossible to answer, note the issue you have observed and attempt to reword the question in such a way that it is answerable. You should also feel free to send me electronic mail at any time of day.

I will also reserve time at the start of each class before the exam is due to discuss any general questions you have on the exam.

Problems

Problem 1: Searching for Ratings

Recently, the Scarlet and Black has been gathering numeric ratings of courses at Grinnell on a -10 (horrible) to 10 (excellent) scale, where ratings are real numbers. We will refer to the courses as C_0 through C_{n-1}. The S&B has had different students reviewing the courses, whom we will refer to as S_0 through S_{n-1}. (Yes, we have the same number of reviewers and courses.)

To make it easier to study the data, the editor has arranged it in a table with rows numbered from 0 to n-1 (for the student reviewers) and columns numbered from 0 to n-1 (for the courses). We will refer to the review of course j by student i as R[i,j].

In looking at the table, the editor has found some very interesting properties of the data. First, each row is represented in strictly increasing order, so the reviewers agree on the relative ranking of the courses. Second, each column is represented in strictly decreasing order, so student i is always more positive about each course than student i+1.
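
To make the two table properties concrete, here is a minimal Python sketch that checks them on a small made-up table. The function name `has_rating_table_shape` and the sample values are assumptions for illustration only. The check itself takes O(n^2) time, so it is useful for validating test data but is not an answer to any of the questions below.

```python
def has_rating_table_shape(R):
    """Return True if every row strictly increases and every column strictly decreases.

    R is a square list of lists; rows index students and columns index courses.
    """
    n = len(R)
    rows_ok = all(R[i][j] < R[i][j + 1] for i in range(n) for j in range(n - 1))
    cols_ok = all(R[i][j] > R[i + 1][j] for i in range(n - 1) for j in range(n))
    return rows_ok and cols_ok


# Hypothetical 3x3 table of ratings (not real S&B data).
example = [
    [1.0, 2.5, 4.0],
    [0.0, 1.5, 3.0],
    [-2.0, -0.5, 0.0],
]
print(has_rating_table_shape(example))
```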

President Kington, always interested in numeric data, asks the following questions:

  • a. How many entries in the table contain a value of 0?
  • b. How many entries in the table contain a value of 0 or above?
  • c. Suppose all of the ratings are different. Does any course have a rating of exactly 0?

Write algorithms to answer all three questions and then analyze the worst case running time of each algorithm. You may use no more than a constant amount of extra space. You should strive to make your algorithms as fast as possible, so an O(n^2) algorithm is unacceptable.

Citation: This problem is based on problems from Skiena's The Algorithm Design Manual.

Problem 2: Picking Paths for Ford-Fulkerson

As you may recall, the Ford-Fulkerson algorithm finds a maximum flow by repeatedly finding an augmenting path in a modified version of the original graph, updating the original graph to remove that path and add a “back-path”, and adding that path to the flow graph.

As we saw, bad choices for augmenting paths can make the Ford-Fulkerson algorithm very slow, with the cost of the algorithm dependent on the weights of the edges rather than the number of nodes or edges.
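
As a reminder of the overall structure, here is a minimal Python sketch of the generic algorithm. It picks an arbitrary augmenting path with a depth-first search, so it implements neither of the two selection rules discussed below. The function name `max_flow`, the dict-of-dicts graph representation, and the sample graph are all assumptions for illustration, and the sketch assumes the source and sink are distinct.

```python
def max_flow(capacity, source, sink):
    """Generic Ford-Fulkerson; capacity maps node -> {neighbor: edge capacity}."""
    # Build the residual graph: copy capacities and add a zero-capacity back edge
    # for every edge that does not already have a reverse edge.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    def find_path():
        # Depth-first search for any source-to-sink path with spare capacity.
        parent = {source: None}
        stack = [source]
        while stack:
            u = stack.pop()
            if u == sink:
                break
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    stack.append(v)
        if sink not in parent:
            return None
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        path.reverse()
        return path

    total = 0
    while True:
        path = find_path()
        if path is None:
            return total
        # The path can carry as much flow as its smallest residual capacity.
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck  # use up forward capacity
            residual[v][u] += bottleneck  # add the "back-path"
        total += bottleneck


# Hypothetical example graph.
graph = {
    "s": {"a": 1000, "b": 1000},
    "a": {"b": 1, "t": 1000},
    "b": {"t": 1000},
}
print(max_flow(graph, "s", "t"))
```

The only step that changes between path-selection strategies is `find_path`; the bookkeeping around it stays the same.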

Here are two approaches one might use to try to choose augmenting paths.

a. Choose the “shortest” augmenting path, where distance is measured only in terms of nodes visited. (If multiple paths have the same length, choose any of them.)

b. Choose the “most valuable” augmenting path, where the value of a path is the weight of its smallest edge. (If multiple paths have the same value, choose any of them.)

For each, either provide an example in which the cost of the algorithm depends on the edge weights rather than only on the number of nodes and edges in the graph, or sketch a proof that the approach yields a running time polynomial in the size of the graph.

Problem 3: Maximizing Task Completion

GrinCo, Inc. is a consulting firm. Through long experience, they can accurately assess the length of time any project will take. Their task board therefore contains a list of deadline/length pairs. Even though tasks may take different lengths, each task has the same value. Tasks must be completed at or before their deadline. So a task with a deadline of 12 and a length of 7 must be started at or before 5 (so at 0, 1, 2, 3, 4 or 5).
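
The deadline arithmetic above can be sketched in Python as follows. The helper names `latest_start` and `meets_deadlines` are hypothetical, and the sketch assumes that time starts at 0 and that a single consultant works through the given tasks back to back in the order listed. It only checks a proposed ordering; it does not choose one.

```python
def latest_start(deadline, length):
    """Latest time at which a task can start and still finish by its deadline."""
    return deadline - length


def meets_deadlines(tasks):
    """Check whether one consultant, starting at time 0, finishes every task on time.

    tasks is a list of (deadline, length) pairs, worked in the order given.
    """
    clock = 0
    for deadline, length in tasks:
        clock += length  # the task finishes at this time
        if clock > deadline:
            return False
    return True


print(latest_start(12, 7))
print(meets_deadlines([(4, 2), (7, 4)]))
print(meets_deadlines([(7, 4), (4, 2)]))
```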

[optional] a. Describe an algorithm to find the most tasks that k consultants can complete by the individual tasks' deadlines.

b. Given the following randomly generated set of tasks, find the best solution you can with k of 2, 4, and 6. (If you have an algorithm, apply your algorithm.)

deadline        length
--------        ------
   10              6
   13              4
    4              2
   12              6
   10              2
   16              3
   15              5
   12              5
    7              4
    7              2
    6              5
   12              7
   14              4
   13              2
    6              4
   13              4
   14              4
   11              3
   19              6
   13              7

Problem 4: Formatting Text

As you know, one of the great advances of word processing is the ability of the word processor to take your long paragraphs and break them up into lines, and to rearrange those lines as you reformat the text (e.g., by changing the font or the borders).

The greedy algorithm for formatting text fits as much as possible on each line before going on to the next line. However, that algorithm sometimes creates surprisingly unpleasant line breaks. Consider the text “Ted is in Tampa” written in a monospace font with a column width of 6. The greedy algorithm produces

Ted is
in
Tampa

However, if we break before the “is”, we get the somewhat more aesthetically pleasing

Ted
is in
Tampa

The legendary Donald E. Knuth has suggested that we should measure the quality of a line wrapping algorithm by summing the squares of the spaces left at the end of the line (smaller numbers are better). In the greedy algorithm, we would get 0^2 + 4^2 + 1^2, or 17. In the alternate approach, we get 3^2 + 1^2 + 1^2, or 11.
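
The greedy wrapping and the sum-of-squares measure for the “Ted is in Tampa” example can be sketched in Python as follows. The function names `greedy_wrap` and `slack_cost` and the `count_last` parameter are assumptions for illustration, and the sketch assumes no single word is longer than the column width.

```python
def greedy_wrap(words, width):
    """Fit as many words as possible on each line before starting the next line."""
    lines = []
    current = ""
    for word in words:
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= width:
            current += " " + word  # the word fits after a single space
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def slack_cost(lines, width, count_last=True):
    """Sum of the squares of the unused columns at the end of each line."""
    counted = lines if count_last else lines[:-1]
    return sum((width - len(line)) ** 2 for line in counted)


greedy = greedy_wrap("Ted is in Tampa".split(), 6)
alternate = ["Ted", "is in", "Tampa"]
print(greedy, slack_cost(greedy, 6))        # cost 17
print(alternate, slack_cost(alternate, 6))  # cost 11
```

Passing `count_last=False` leaves the final line out of the total.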

As you might expect, the natural way to minimize this measure requires us to try lots of choices, each of which has implications for other choices. Each place we might cut on the first line affects where we might cut on the second line, which affects where we might cut on the third line, and so on and so forth.

And, as you know, when we see this kind of blow-up, we often turn to dynamic programming as a solution.

a. Design a dynamic programming algorithm that will find the optimal line breaks for a paragraph of n words, with a maximum line length of m. You should not count the number of spaces in the last line when assessing the quality of a paragraph.

b. Write two paragraphs of twenty or so words that demonstrate the efficacy of your algorithm vs. the greedy algorithm. Show the table that your algorithm builds for each paragraph with a line length of 15. Show the output that your algorithm and greedy algorithm give for each paragraph.

Citation: This problem is based on ideas from Donald E. Knuth.

Problem 5: Huffman Coding

Huffman Coding is a technique for choosing variable-length encodings of symbols in an alphabet, based on the frequency of those symbols. We mark each symbol with its frequency and repeatedly combine the two lowest-frequency symbols into a “meta symbol” with the sum of the frequencies. We represent each combination as a node in a binary tree. We compute the encoding of any symbol by reading off the path from the root to the node, with left branches represented by 0 and right branches represented by 1.

For example, given an alphabet of p, q, r, s, t, u and corresponding frequencies 8, 4, 7, 6, 14, and 11,

  • We combine q (4) and s (6) into a node representing a total frequency of 10. We'll call that node (q/s).
  • We combine r (7) and p (8) into a node representing a total frequency of 15. We'll call that node (r/p).
  • We combine (q/s) (10) and u (11) into a node representing a total frequency of 21. We'll call that node ((q/s)/u).
  • We combine t (14) and (r/p) (15) into a node representing a total frequency of 29. We'll call that node (t/(r/p)).
  • We combine the two remaining nodes.

Our tree then looks as follows.

                      *(50)
                   0 /     \ 1
                    /       \
                 *(21)      *(29)
                0 / \ 1   0 / \ 1
                 /   \     /   \
             *(10) u(11) t(14) *(15)
          0 / \ 1           0 / \ 1
           /   \             /   \
         q(4)  s(6)        r(7)  p(8)

The codings are as follows: p: 111, q: 000, r: 110, s: 001, u: 01, t: 10. (Usually, the tree is less balanced.)
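
The construction for this six-symbol example can be sketched in Python with a priority queue. The function name `huffman_codes` is hypothetical, and the tie-breaking order (subtree height, then insertion order) is an assumption; the example has no ties, so that choice does not affect the result here. The sketch assumes symbols are strings and the alphabet is non-empty.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman tree from {symbol: frequency} and return {symbol: bit string}."""
    order = count()
    # Heap entries are (frequency, height, insertion order, tree).
    # A tree is either a symbol or a (left, right) pair.
    heap = [(f, 0, next(order), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, h1, _, left = heapq.heappop(heap)   # lowest frequency goes left
        f2, h2, _, right = heapq.heappop(heap)  # next lowest goes right
        merged = (f1 + f2, max(h1, h2) + 1, next(order), (left, right))
        heapq.heappush(heap, merged)

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")  # left branch
            walk(tree[1], prefix + "1")  # right branch
        else:
            codes[tree] = prefix or "0"  # a one-symbol alphabet still needs a bit

    walk(heap[0][3], "")
    return codes


example_codes = huffman_codes({"p": 8, "q": 4, "r": 7, "s": 6, "t": 14, "u": 11})
print(example_codes)
```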

a. Construct the Huffman tree for the following set of frequencies, which was generated from Grinnell's mission statement.

  a   b   c   d   e   f   g   h   i   j   k   l   m 
 71   8  35  42 104  22  21  42  69   0   7  49  14 

  n   o   p   q   r   s   t   u   v   w   x   y   z 
 60  67  11   4  49  50  70  27  10  16   2  13   0 

  space
174

Always put the subtree representing smaller frequencies to the left. If there is a tie, put the shorter subtree to the left.

b. Show the encoding of "the college aims to graduate individuals who can think clearly".

c. How much space, if any, does that save relative to the number of bits required for a naive encoding, with 5 bits per character?

Citation: This problem was inspired by a problem by Henry Walker.

Questions and Answers

Here you will find questions from students that are likely to be of general interest.

Errata

Here you will find errors of spelling, grammar, and design that students have noted. I will not give extra credit for such errors, but I will nonetheless record them.