Algorithms and OOD (CSC 207 2014S) : Outlines
Held: Wednesday, 5 March 2014
Back to Outline 26 - Quadratic Sorts.
On to Outline 28 - Quicksort.
We consider merge sort, our first O(n log n) sorting algorithm.
- Lower bounds on sorting.
- Divide and conquer algorithms.
- An introduction to merge sort.
- Analyzing merge sort.
- Have fun with Earnest!
- Reading for Friday: Quicksort
- Today's lab writeup: Invariants for merge (part of Exercise 2a)
- You can draw pictures on the computer
- You can draw pictures on paper
- You can write things a bit more mathematically
- KS is the note taker today.
- Earnest is happy to answer questions about Skip Lists, whether he knows
it or not.
- I'll also try to be on email, and you can collaborate on sending
  questions.
- Extra credit:
- Convocation, noon, today.
- Presentations on Grinnell institutional image, noon on Thursday or Friday.
- Other things you should do (warning! tickets go quickly)
- Neverland players.
- Balancing acts.
An introduction to merge sort
- There's a theoretical analysis that shows that Omega(n log n) comparisons
  are necessary for any comparison-based sort.
- All of the sorting algorithms we've seen so far are O(n^2).
- Can we do better? (Can we achieve the known lower bound?)
- One strategy for writing faster algorithms is "divide and conquer".
When presented with a large problem,
- split it into two parts
- solve each part
- combine the solutions
- The easiest way to split an array: first half and second half.
- We sort the two halves.
- What can we do after sorting the two halves?
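We can merge the two sorted halves into one sorted whole. A minimal sketch of that strategy in Java (the names `mergeSort` and `merge`, and the use of an `int` array, are illustrative choices, not the lab's required interface):

```java
import java.util.Arrays;

public class MergeSortSketch {
  /** Sort values[lb..ub) using merge sort. */
  public static void mergeSort(int[] values, int lb, int ub) {
    if (ub - lb <= 1) return;            // Base case: 0 or 1 elements
    int mid = lb + (ub - lb) / 2;
    mergeSort(values, lb, mid);          // Sort the first half
    mergeSort(values, mid, ub);          // Sort the second half
    merge(values, lb, mid, ub);          // Combine the sorted halves
  } // mergeSort(int[], int, int)

  /** Merge the sorted ranges [lb..mid) and [mid..ub) into one sorted range. */
  private static void merge(int[] values, int lb, int mid, int ub) {
    int[] left = Arrays.copyOfRange(values, lb, mid);
    int[] right = Arrays.copyOfRange(values, mid, ub);
    int i = 0, j = 0, k = lb;
    while (i < left.length && j < right.length) {
      values[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    } // while both halves have elements left
    while (i < left.length) values[k++] = left[i++];
    while (j < right.length) values[k++] = right[j++];
  } // merge(int[], int, int, int)

  public static void main(String[] args) {
    int[] a = { 5, 2, 8, 1, 9, 3 };
    mergeSort(a, 0, a.length);
    System.out.println(Arrays.toString(a)); // [1, 2, 3, 5, 8, 9]
  } // main(String[])
} // class MergeSortSketch
```

Note that `merge` does a single pass over the two halves, which is where the "n steps to merge" in the analysis below comes from.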
- Let's let t(n) represent the time mergesort takes on input of size n.
- To sort an array of size n, we must sort two arrays of size n/2, and
then merge the two. Merging takes n steps.
- We have a simple recurrence relation: t(n) = 2*t(n/2) + n
- We can explore recurrence relations top-down or bottom up.
- Bottom up
- t(1) = 1
- t(2) = 2*1 + 2 = 4
- t(4) = 2*4 + 4 = 12
- t(8) = 2*12 + 8 = 32
- t(16) = 2*32 + 16 = 80
- Hmmm ...
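The bottom-up table falls directly out of the recurrence; a quick sketch (assuming t(1) = 1 and n a power of two, as above):

```java
public class RecurrenceTable {
  /** Compute t(n) = 2*t(n/2) + n with t(1) = 1, for n a power of two. */
  static long t(long n) {
    return (n == 1) ? 1 : 2 * t(n / 2) + n;
  } // t(long)

  public static void main(String[] args) {
    for (long n = 1; n <= 16; n *= 2) {
      System.out.println("t(" + n + ") = " + t(n));
    } // for each power of two up to 16
    // Prints t(1) = 1, t(2) = 4, t(4) = 12, t(8) = 32, t(16) = 80
  } // main(String[])
} // class RecurrenceTable
```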
- Top down
- t(n) = 2t(n/2) + n
- t(n) = 2(2t(n/4) + n/2) + n // Expand t(n/2)
- t(n) = 2*2t(n/4) + 2n/2 + n // Distribute
- t(n) = 4t(n/4) + 2n // Simplify
- t(n) = 4(2t(n/8) + n/4) + 2n // Expand t(n/4)
- t(n) = 4*2t(n/8) + 4n/4 + 2n // Distribute
- t(n) = 8t(n/8) + 3n // Simplify
- t(n) = 8(2t(n/16) + n/8) + 3n // Expand t(n/8)
- t(n) = 8*2t(n/16) + 8n/8 + 3n // Distribute
- t(n) = 16*t(n/16) + 4n // Simplify
- I see a pattern:
- t(n) = 2^k*t(n/(2^k)) + kn
- If we let 2^x = n, we get
- t(n) = n*t(1) + xn
- If 2^x = n, then x = log2(n)
- So t(n) = n + log2(n) * n
- The second term dominates, so t(n) is in O(n log n).
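We can sanity-check the closed form t(n) = n + n*log2(n) against the recurrence for many powers of two; a sketch (the helper names `t` and `log2` are illustrative):

```java
public class ClosedFormCheck {
  /** The recurrence: t(n) = 2*t(n/2) + n with t(1) = 1. */
  static long t(long n) {
    return (n == 1) ? 1 : 2 * t(n / 2) + n;
  } // t(long)

  /** Floor of log base 2, for positive n. */
  static long log2(long n) {
    return 63 - Long.numberOfLeadingZeros(n);
  } // log2(long)

  public static void main(String[] args) {
    for (long n = 1; n <= (1L << 20); n *= 2) {
      long closed = n + log2(n) * n;   // The claimed closed form
      if (t(n) != closed) {
        throw new AssertionError("mismatch at n = " + n);
      } // if the recurrence and closed form disagree
    } // for each power of two up to 2^20
    System.out.println("closed form matches the recurrence up to n = 2^20");
  } // main(String[])
} // class ClosedFormCheck
```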