Algorithms and OOD (CSC 207 2014S) : Outlines

# Outline 27: Merge Sort

Held: Wednesday, 5 March 2014


Summary

We consider merge sort, our first O(n log n) sorting algorithm.

Overview

• Lower bounds on sorting.
• Divide and conquer algorithms.
• An introduction to merge sort.
• Analyzing merge sort.

• Have fun with Earnest!
• Today's lab writeup: Invariants for merge (part of Exercise 2a)
• You can draw pictures on the computer
• You can draw pictures on paper
• You can write things a bit more mathematically
• KS is the note taker today.
• Earnest is happy to answer questions about Skip Lists, whether he knows it or not.
• I'll also try to be on email, and you can collaborate on sending me messages.
• Extra credit:
• Convocation, noon, today.
• Presentations on Grinnell institutional image, noon on Thursday or Friday.
• Other things you should do (warning! tickets go quickly)
• Neverland players.
• Balancing acts.

## An introduction to merge sort

• A theoretical analysis shows that any comparison-based sort needs on the order of n log n comparisons in the worst case. That is a lower bound, often written Ω(n log n).
• All of the sorting algorithms we've seen so far take O(n^2) time in the worst case.
• Can we do better? (Can we achieve the known lower bound?)
• One strategy for writing faster algorithms is "divide and conquer". When presented with a large problem,
• split it into two parts
• solve each part
• combine the solutions
• The easiest way to split an array: first half and second half.
• We sort the two halves.
• What can we do after sorting the two halves? Merge them into a single sorted array.
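The split, sort, and combine steps above can be sketched in Java roughly as follows. This is a minimal sketch, not the version from the lab: the class name `MergeSortSketch`, the use of `int[]`, and the single shared scratch array are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical class name; a sketch of top-down merge sort on int arrays.
class MergeSortSketch {
  /** Sorts values in place, in ascending order. */
  static void sort(int[] values) {
    if (values.length < 2) {
      return; // Arrays of size 0 or 1 are already sorted.
    }
    int[] scratch = new int[values.length]; // Assumed helper space for merging.
    sort(values, scratch, 0, values.length);
  }

  /** Sorts the half-open range values[lo..hi). */
  private static void sort(int[] values, int[] scratch, int lo, int hi) {
    if (hi - lo < 2) {
      return;
    }
    int mid = lo + (hi - lo) / 2;       // Split: first half and second half.
    sort(values, scratch, lo, mid);     // Solve each part.
    sort(values, scratch, mid, hi);
    merge(values, scratch, lo, mid, hi); // Combine the solutions.
  }

  /** Merges the sorted ranges values[lo..mid) and values[mid..hi). */
  private static void merge(int[] values, int[] scratch, int lo, int mid, int hi) {
    int left = lo;
    int right = mid;
    int out = lo;
    while (left < mid && right < hi) {
      // Taking from the left on ties keeps equal elements in their original order.
      if (values[left] <= values[right]) {
        scratch[out++] = values[left++];
      } else {
        scratch[out++] = values[right++];
      }
    }
    while (left < mid) {
      scratch[out++] = values[left++];
    }
    while (right < hi) {
      scratch[out++] = values[right++];
    }
    // Copy the merged range back into the original array.
    System.arraycopy(scratch, lo, values, lo, hi - lo);
  }

  public static void main(String[] args) {
    int[] data = {5, 2, 9, 1, 5, 6};
    sort(data);
    System.out.println(Arrays.toString(data)); // prints [1, 2, 5, 5, 6, 9]
  }
}
```

Each call to `merge` touches every element in its range once, which is where the "merging takes n steps" assumption in the analysis comes from.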

## Analysis

• Let t(n) represent the time merge sort takes on an input of size n.
• To sort an array of size n, we must sort two arrays of size n/2 and then merge the two. Merging takes about n steps.
• We have a simple recurrence relation: t(n) = 2*t(n/2) + n
• We can explore recurrence relations top down or bottom up.
• Bottom up
• t(1) = 1
• t(2) = 2*1 + 2 = 4
• t(4) = 2*4 + 4 = 12
• t(8) = 2*12 + 8 = 32
• t(16) = 2*32 + 16 = 80
• Hmmm ...
• Top down
• t(n) = 2*t(n/2) + n
• t(n) = 2*(2*t(n/4) + n/2) + n // Expand t(n/2)
• t(n) = 2*2*t(n/4) + 2*n/2 + n // Distribute
• t(n) = 4*t(n/4) + 2n // Simplify
• t(n) = 4*(2*t(n/8) + n/4) + 2n // Expand t(n/4)
• t(n) = 4*2*t(n/8) + 4*n/4 + 2n // Distribute
• t(n) = 8*t(n/8) + 3n // Simplify
• t(n) = 8*(2*t(n/16) + n/8) + 3n // Expand t(n/8)
• t(n) = 8*2*t(n/16) + 8*n/8 + 3n // Distribute
• t(n) = 16*t(n/16) + 4n // Simplify
• I see a pattern:
• t(n) = 2^k*t(n/(2^k)) + kn
• If we choose k = x such that 2^x = n, then n/(2^k) = 1 and we get
• t(n) = n*t(1) + xn
• If 2^x = n, then x = log2(n)
• So, since t(1) = 1, t(n) = n + log2(n) * n
• The second term dominates. t(n) is in O(n log n)
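One way to sanity-check the pattern is to compute the recurrence directly and compare it with the closed form. The sketch below assumes n is a power of two and t(1) = 1, as in the bottom-up table; the class and method names (`MergeSortRecurrence`, `t`, `closedForm`) are made up for this example.

```java
// Hypothetical class name; checks t(n) = 2*t(n/2) + n against n + n*log2(n).
class MergeSortRecurrence {
  /** Evaluates the recurrence directly. Assumes n is a power of two. */
  static long t(long n) {
    if (n == 1) {
      return 1; // Base case: t(1) = 1.
    }
    return 2 * t(n / 2) + n;
  }

  /** Evaluates the closed form n + log2(n) * n. Assumes n is a power of two. */
  static long closedForm(long n) {
    // For a power of two, this is exactly log2(n).
    int log2 = 63 - Long.numberOfLeadingZeros(n);
    return n + log2 * n;
  }

  public static void main(String[] args) {
    for (long n = 1; n <= 1024; n *= 2) {
      System.out.println(n + ": " + t(n) + " vs " + closedForm(n));
    }
  }
}
```

For n = 16 both methods give 80, matching the last row of the bottom-up table.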

## Lab

Copyright (c) 2013-14 Samuel A. Rebelsky.

This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit `http://creativecommons.org/licenses/by/3.0/` or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.