[Instructions] [Search] [Current] [Changes] [Syllabus] [Handouts] [Outlines] [Labs] [Assignments] [Examples] [Bailey Docs] [SamR Docs] [Tutorial] [API]

**Held**: Wednesday, March 4, 1998

- Grinnell Women in Science is hosting at least four seminars and talks over the next two days. I'd strongly encourage you to attend some or all of them.
- Hopefully, I'll have the exams graded by Friday, but don't hold your breath.
- I'm also working on a library to illustrate all of these techniques. Perhaps that will also be ready on Friday.

- Another simple sorting technique is *insertion sort*.
- Insertion sort operates by segmenting the list into sorted and unsorted portions, and repeatedly removing an element from the unsorted portion and inserting it into the sorted portion.
- This may be likened to the way a typical card player sorts their hand.
- In approximate code (assuming that we're writing this as part of
  a class that subclasses `Vector`):

```java
/**
 * Sort a subvector.
 * pre: 0 <= lb <= ub < size()
 * pre: the elements in the vector are comparable using
 *      a lessEqual method
 * post: elementAt(lb) <= elementAt(lb+1) <= ... <= elementAt(ub)
 */
public void insertionSort(int lb, int ub) {
  // Variables
  Object tmp; // The object that we'll be inserting
  // Check the preconditions
  Assert.pre(lb <= ub, "nonempty subrange");
  Assert.pre(0 <= lb, "reasonable starting point");
  Assert.pre(ub < size(), "reasonable ending point");
  // Base case: one element, so it's sorted
  if (lb == ub) { return; }
  // Remember the element at the end, and then remove it
  tmp = elementAt(ub);
  removeElementAt(ub);
  // Sort all but that element
  insertionSort(lb, ub-1);
  // Insert the element at the proper place.  This may throw an
  // IncomparableException.
  insert(findPlace(lb, ub-1, tmp), tmp);
} // insertionSort
```

- What's the running time? There are O(n) insertions and O(n) calls to `findPlace()` (which finds the proper place in the vector to insert the element). Each insertion requires O(n) steps, and each place determination takes O(log_2(n)) steps, so the running time is O(n*(n+log_2(n))), which is O(n^2).
- What's the extra storage (ignoring recursive calls)? It should be constant.
- What's the extra storage for the recursive method calls? It can be O(n), since this is not a tail-recursive method.
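To make the recursive structure above concrete, here is a minimal, self-contained sketch in plain Java. It uses an `int` array instead of our `Vector` subclass, so the shifting loop plays the role of `findPlace` plus `insert`; the class and method names here are my own, not part of the course library.

```java
import java.util.Arrays;

public class InsertionSortDemo {
  /** Recursively sort a[lb..ub], mirroring the structure above. */
  static void insertionSort(int[] a, int lb, int ub) {
    // Base case: zero or one element, so it's sorted
    if (lb >= ub) { return; }
    // Remember the element at the end
    int tmp = a[ub];
    // Sort all but that element
    insertionSort(a, lb, ub - 1);
    // Shift larger elements right to open a slot, then insert tmp
    int i = ub - 1;
    while (i >= lb && a[i] > tmp) {
      a[i + 1] = a[i];
      i = i - 1;
    }
    a[i + 1] = tmp;
  }

  public static void main(String[] args) {
    int[] values = { 5, 2, 9, 1, 5, 6 };
    insertionSort(values, 0, values.length - 1);
    System.out.println(Arrays.toString(values)); // prints [1, 2, 5, 5, 6, 9]
  }
}
```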

- We can come up with a number of sorting techniques based on the *divide and conquer* technique. One of the most straightforward is **merge sort**.
- In merge sort, you split the vector (or list or array or ...) into two parts, sort each part, and then merge them together.
- Unlike the previous algorithms, merge sort requires extra space for the sorted vector (or subvectors).
- The book uses a helper array named `temp`.
- I'll be writing a slightly more readable but less efficient version that returns the sorted version of the vector.
- In approximate Java code:

```java
/**
 * Sort a vector, returning a sorted version of the vector.
 * pre: The elements in the vector are comparable.
 * post: Returns a sorted version of the vector (defined elsewhere).
 * post: The original vector is not changed.
 */
public static VectorOfComparable mergeSort(VectorOfComparable v) {
  int middle; // Index of middle element
  // Base case: vector of size 0 or 1
  if (v.size() <= 1) {
    return (VectorOfComparable) v.clone(); // Cloned, so that it is safe
                                           // to modify the returned vector
  } // if
  // Recursive case: split, sort, and merge
  middle = v.size() / 2; // Integer division gives an integer
  return merge(mergeSort(v.subVector(0, middle)),
               mergeSort(v.subVector(middle, v.size())));
} // mergeSort

/**
 * Merge two sorted vectors into a single sorted vector.
 * pre: Both vectors are sorted.
 * pre: Elements in both vectors are comparable.
 * post: The returned vector is sorted, and contains all the
 *       elements of the two vectors.
 * post: The two arguments are not changed.
 */
public static VectorOfComparable merge(VectorOfComparable left,
                                       VectorOfComparable right) {
  // Variables
  VectorOfComparable result = new VectorOfComparable();
  int left_index = 0;  // Index into left vector
  int right_index = 0; // Index into right vector
  // As long as both vectors have elements, copy the smaller one.
  while ((left_index < left.size()) && (right_index < right.size())) {
    if (left.elementAt(left_index).smaller(right.elementAt(right_index))) {
      result.addElement(left.elementAt(left_index++));
    } // first element in left subvector is smaller
    else {
      result.addElement(right.elementAt(right_index++));
    } // first element in right subvector is smaller
  } // while
  // Copy any remaining parts of each vector.
  while (left_index < left.size()) {
    result.addElement(left.elementAt(left_index++));
  }
  while (right_index < right.size()) {
    result.addElement(right.elementAt(right_index++));
  }
  // That's it
  return result;
} // merge
```

- What is the running time? We can use a somewhat clever analysis
technique.
- Assume that we're dealing with n = 2^x for some x.
- Consider all the sorts of vectors of size k as the same "level".
- There are n/k vectors of size k.
- Because we divide in half each time, there are log_2(n) levels.
- Going from one level to the next (merging the size-k vectors into vectors of size 2k), we do O(n) total work to merge.
- So, the running time is O(n*log_2(n)).

- Unfortunately, merge sort requires significantly more memory than do the other sorting routines (you can spend some time trying to come up with an "in place" merge sort, but you are quite likely to fail).
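The version above relies on our course's `VectorOfComparable` class. As a minimal, self-contained sketch of the same algorithm, here it is over plain `int` arrays; `Arrays.copyOfRange` stands in for `subVector`, and all names are my own. Note how the extra-space cost shows up directly as the new arrays allocated in `merge`.

```java
import java.util.Arrays;

public class MergeSortDemo {
  /** Return a sorted copy of a; the argument is not changed. */
  static int[] mergeSort(int[] a) {
    // Base case: array of size 0 or 1
    if (a.length <= 1) { return a.clone(); }
    // Recursive case: split, sort, and merge
    int middle = a.length / 2;
    return merge(mergeSort(Arrays.copyOfRange(a, 0, middle)),
                 mergeSort(Arrays.copyOfRange(a, middle, a.length)));
  }

  /** Merge two sorted arrays into one sorted array. */
  static int[] merge(int[] left, int[] right) {
    int[] result = new int[left.length + right.length];
    int l = 0, r = 0, i = 0;
    // As long as both halves have elements, copy the smaller front one.
    while (l < left.length && r < right.length) {
      result[i++] = (left[l] <= right[r]) ? left[l++] : right[r++];
    }
    // Copy any remaining parts of each half.
    while (l < left.length)  { result[i++] = left[l++]; }
    while (r < right.length) { result[i++] = right[r++]; }
    return result;
  }

  public static void main(String[] args) {
    int[] original = { 8, 3, 5, 1, 9, 2 };
    System.out.println(Arrays.toString(mergeSort(original))); // [1, 2, 3, 5, 8, 9]
    System.out.println(Arrays.toString(original)); // unchanged: [8, 3, 5, 1, 9, 2]
  }
}
```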

- Is it possible to write an O(n*log_2(n)) sorting algorithm that is based on comparing and swapping, but doesn't require significantly extra space?
- Yes, if you're willing to rely on probabilities.
- In the **quicksort** algorithm, you split (partition) the vector to be sorted into three parts: those smaller than some middle element (the pivot), those equal to the pivot, and those larger than the pivot. You can then sort the pieces and glue them back together.
- With a little work, you can do this partitioning in place, so that there is no space overhead (and so that "gluing" is basically a free operation).
- (It is not necessarily okay to partition the array into only two parts: those less than or equal to the element, and those greater than or equal to the element. If every element lands in one part, as can happen when the pivot is the smallest or largest element, the recursion makes no progress.)
```java
/**
 * Sort a subvector using quick sort.
 * pre: All elements are comparable.
 * pre: 0 <= lb <= ub < size()
 * post: The subvector is sorted.
 */
public void quickSort(int lb, int ub) {
  // Variables
  Comparable pivot; // The pivot used to split the vector
  int mid;          // The position of the pivot
  // Base case: size one
  if (lb >= ub) return;
  // Pick a pivot.  Often, this is the second element of the
  // vector (for some reason, it works better than the first).
  // More clever selection techniques might also exist.
  pivot = selectPivot(lb, ub);
  // Determine the position of the pivot
  mid = partition(pivot, lb, ub);
  // Recurse on the two remaining parts
  if (mid-1 > lb) quickSort(lb, mid-1);
  if (mid+1 < ub) quickSort(mid+1, ub);
} // quickSort

/**
 * Partition a subvector into two parts: those less than or
 * equal to a particular element (the pivot), and those greater
 * than the pivot.
 * returns: The index of the pivot (mid) if it's in the vector;
 *          otherwise, the index of the largest element smaller
 *          than the pivot.
 * pre: lb <= ub
 * pre: the elements in the vector are comparable
 * post: for all lb <= i <= mid < j <= ub,
 *       elementAt(i) <= elementAt(j)
 */
public int partition(Comparable pivot, int lb, int ub) {
  // Keep going until the bounds meet in the middle
  while (lb < ub) {
    if (elementAt(lb).lessEqual(pivot)) {
      // A small enough element on the left; advance the lower bound.
      ++lb;
    }
    else if (pivot.less(elementAt(ub))) {
      // A large enough element on the right; advance the upper bound.
      // For safety, we don't advance both in the same round.
      --ub;
    }
    else {
      // Both elements are in the wrong part of the vector; swap them.
      swap(lb, ub);
    }
  } // while
  // lb = ub, so we can return
  return lb;
} // partition
```

- We might want to spend some time verifying that our `partition` method actually works (that it partitions correctly and that it terminates).
- What is the running time of this algorithm? It depends on how well we partition. If we partition into two equal halves, then we can say:
- Partitioning a vector of length n takes O(n) steps.
- The time to partition those two partitions into four parts is also O(n).
- If each partition is perfect (splits it exactly in half), we can stop the process after O(log_2(n)) levels.
- This gives a running time of O(n*log_2(n)).

- However, a bad choice of pivots can give significantly worse running time. If we always chose the largest element as the pivot, this algorithm would be equivalent to `selectionSort`, and would take time O(n^2).
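As a minimal, self-contained sketch of in-place quicksort over a plain `int` array: rather than the lecture's `selectPivot`/`lessEqual` helpers, this version uses the common Lomuto scheme (last element as pivot, placed at its final position), so the names and pivot choice here are my own, not the course library's.

```java
import java.util.Arrays;

public class QuickSortDemo {
  /** Sort a[lb..ub] in place. */
  static void quickSort(int[] a, int lb, int ub) {
    // Base case: zero or one element
    if (lb >= ub) { return; }
    // Put the pivot at its final position and recurse on each side
    int mid = partition(a, lb, ub);
    quickSort(a, lb, mid - 1);
    quickSort(a, mid + 1, ub);
  }

  /** Lomuto-style partition: place the pivot a[ub] at its final
   *  position and return that index. */
  static int partition(int[] a, int lb, int ub) {
    int pivot = a[ub];
    int small = lb; // next slot for an element <= pivot
    for (int i = lb; i < ub; i++) {
      if (a[i] <= pivot) {
        int tmp = a[i]; a[i] = a[small]; a[small] = tmp;
        small++;
      }
    }
    // Swap the pivot into place
    int tmp = a[ub]; a[ub] = a[small]; a[small] = tmp;
    return small;
  }

  public static void main(String[] args) {
    int[] values = { 7, 2, 8, 1, 9, 3 };
    quickSort(values, 0, values.length - 1);
    System.out.println(Arrays.toString(values)); // prints [1, 2, 3, 7, 8, 9]
  }
}
```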

On to Sorting Without Swapping

Back to Introduction to Sorting




**Disclaimer** Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.

This page may be found at http://www.math.grin.edu/~rebelsky/Courses/CS152/98S/home/rebelsky/public_html/Courses/CS152/98S/Outlines/outline.25.html

Source text last modified Tue Jan 12 11:52:24 1999.

This page generated on Mon Jan 25 09:49:23 1999 by SiteWeaver.

Contact our webmaster at rebelsky@math.grin.edu