Algorithms and OOD (CSC 207 2013F) : Outlines

# Outline 32: Merge Sort

Held: Wednesday, 30 October 2013

Summary

We consider the merge sort algorithm, our first O(nlogn) sorting algorithm.

Overview

• An introduction to merge sort.
• Analyzing merge sort.
• Lab.

• Today we will do a quick analysis of merge sort and then follow it up with some lab exercises.
• I'm moving the due time for the electronic and printed versions of the exam to 10:30 pm on Friday night. Put the printed version under my door.
• We should discuss the project and the role of Ushahidi in this class. Clearly, we were less successful at getting the materials ready than we would have liked this summer, and I was as unsuccessful at getting them ready during the semester.
• Upcoming extra credit opportunities:
• Study in Budapest Lunch, Today
• Learning from Alumni, Thursday: Jordan Shkolnick '11 (Microsoft)
• CS Table, Friday: Ambient Belonging
• One Grinnell Prize Event next week

## An introduction to merge sort

• There's a theoretical analysis that shows that any comparison-based sort needs on the order of nlogn comparisons in the worst case (a lower bound, usually written Ω(nlogn)).
• All of the sorting algorithms we've seen so far are O(n^2).
• Can we do better? (Can we achieve the known lower bound?)
• One strategy for writing faster algorithms is "divide and conquer". When presented with a large problem,
• split it into two parts
• solve each part
• combine the solutions
• The easiest way to split an array: first half and second half.
• We sort the two halves.
• What can we do after sorting the two halves?
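
One answer to that last question is to merge the two sorted halves into a single sorted array. The following is a minimal sketch of the divide-and-conquer steps above, assuming `int` arrays and a version that returns a new array rather than sorting in place. The class and method names (`MergeSortSketch`, `sort`, `merge`) are illustrative and are not taken from the course materials.

```java
import java.util.Arrays;

/** A sketch of top-down merge sort on int arrays (names are illustrative). */
class MergeSortSketch {
  /** Returns a new sorted array; the input array is left unchanged. */
  static int[] sort(int[] values) {
    // Base case: arrays of size 0 or 1 are already sorted.
    if (values.length <= 1) {
      return Arrays.copyOf(values, values.length);
    }
    // Split into first half and second half, then sort each half.
    int mid = values.length / 2;
    int[] left = sort(Arrays.copyOfRange(values, 0, mid));
    int[] right = sort(Arrays.copyOfRange(values, mid, values.length));
    // Combine the two sorted halves.
    return merge(left, right);
  }

  /** Merges two sorted arrays, copying each element exactly once. */
  static int[] merge(int[] left, int[] right) {
    int[] result = new int[left.length + right.length];
    int i = 0;
    int j = 0;
    int k = 0;
    while (i < left.length && j < right.length) {
      // Using <= takes from the left half on ties, which keeps the sort stable.
      if (left[i] <= right[j]) {
        result[k++] = left[i++];
      } else {
        result[k++] = right[j++];
      }
    }
    // At most one of these loops does any work: copy whatever remains.
    while (i < left.length) {
      result[k++] = left[i++];
    }
    while (j < right.length) {
      result[k++] = right[j++];
    }
    return result;
  }

  public static void main(String[] args) {
    int[] data = {5, 2, 9, 1, 5, 6};
    System.out.println(Arrays.toString(sort(data))); // prints [1, 2, 5, 5, 6, 9]
  }
}
```

Copying the halves keeps the sketch short. An in-place variant that shares one scratch array is also common, but it needs more index bookkeeping.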

## Analysis

• Let's let t(n) represent the time mergesort takes on input of size n.
• To sort an array of size n, we must sort two arrays of size n/2, and then merge the two. Merging takes n steps.
• We have a simple recurrence relation: t(n) = 2*t(n/2) + n
• We can explore recurrence relations top down or bottom up.
• Bottom up
• t(1) = 1
• t(2) = 2*1 + 2 = 4
• t(4) = 2*4 + 4 = 12
• t(8) = 2*12 + 8 = 32
• t(16) = 2*32 + 16 = 80
• Hmmm ...
• Top down
• t(n) = 2t(n/2) + n
• t(n) = 2(2t(n/4) + n/2) + n // Expand t(n/2)
• t(n) = 2*2*t(n/4) + 2*n/2 + n // Distribute
• t(n) = 4t(n/4) + 2n // Simplify
• t(n) = 4(2t(n/8) + n/4) + 2n // Expand t(n/4)
• t(n) = 4*2*t(n/8) + 4*n/4 + 2n // Distribute
• t(n) = 8t(n/8) + 3n // Simplify
• t(n) = 8(2t(n/16) + n/8) + 3n // Expand t(n/8)
• t(n) = 8*2*t(n/16) + 8*n/8 + 3n // Distribute
• t(n) = 16*t(n/16) + 4n // Simplify
• I see a pattern:
• t(n) = 2^k*t(n/(2^k)) + kn
• If we let k = x, where 2^x = n, then n/(2^k) = 1 and we get
• t(n) = n*t(1) + xn
• If 2^x = n, then x = log2(n)
• So, using t(1) = 1, t(n) = n + log2(n) * n
• The second term dominates. t(n) is in O(nlogn)
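
As a sanity check on the analysis above, the recurrence can be evaluated directly and compared with the closed form. This is a small sketch, assuming n is a power of two and t(1) = 1 as in the bottom-up table. The names `MergeSortRecurrence`, `t`, and `closedForm` are made up for illustration.

```java
/** Compares the recurrence t(n) = 2*t(n/2) + n with n + n*log2(n). */
class MergeSortRecurrence {
  /** Evaluates the recurrence directly; n is assumed to be a power of two. */
  static long t(long n) {
    if (n == 1) {
      return 1; // base case from the bottom-up table: t(1) = 1
    }
    return 2 * t(n / 2) + n;
  }

  /** Evaluates the closed form n + n*log2(n); n is assumed to be a power of two. */
  static long closedForm(long n) {
    // For a power of two, the number of trailing zero bits is exactly log2(n).
    int log2 = Long.numberOfTrailingZeros(n);
    return n + log2 * n;
  }

  public static void main(String[] args) {
    // Print both values for n = 1, 2, 4, ..., 1024.
    for (long n = 1; n <= 1024; n *= 2) {
      System.out.println("t(" + n + ") = " + t(n)
          + ", n + n*log2(n) = " + closedForm(n));
    }
  }
}
```

For n = 16, both methods give 80, matching the bottom-up table.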

## Lab

Copyright (c) 2013 Samuel A. Rebelsky.

This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit `http://creativecommons.org/licenses/by/3.0/` or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.