The insertion sort algorithm is not always the best one to use. When sorting n values, starting from a random initial arrangement, the insertion sort has to look through, on average, half of the elements in the sorted part of the data structure to find the correct insertion point for each new value it places. The size of that sorted part increases linearly from 0 to n, so its average size is n/2 and the average number of comparisons needed to insert one element is n/4. Taking all n insertions together, then, the insertion sort performs about n^2/4 comparisons to sort the entire set.
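The estimate can be checked with a short calculation. As a sketch, assume (as above) that inserting a value into a sorted part of size k costs about k/2 comparisons on average, with k running from 0 to n - 1:

```latex
\[
\sum_{k=0}^{n-1} \frac{k}{2}
  \;=\; \frac{1}{2} \cdot \frac{n(n-1)}{2}
  \;=\; \frac{n(n-1)}{4}
  \;\approx\; \frac{n^2}{4}
\]
```

For n = 100, the sum is 2475, close to the estimate of 100^2/4 = 2500.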
Accordingly, when the number of values to be sorted is large (greater than one hundred, say), it is preferable to use a sorting method that is more complicated to set up initially but performs fewer operations on each element in the process of positioning it correctly. The merge sort algorithm is often an appropriate choice. We shall examine two variants of this algorithm -- one taking a list of values and returning a newly allocated list containing the same values, but in sorted order, and the other one taking a vector of values and arranging its elements as a ``destructive'' side effect.
One of the building blocks for the first version of the merge sort is the
merge procedure, which takes two sorted lists as arguments and
returns a sorted list containing all of the values from both argument
lists. (Program 4.3 on page 98 of our textbook is an implementation of
this procedure, in the special case where the lists are lists of real
numbers arranged in ascending order.) If we abstract out the ordering
relation as may-precede?, a curried version of the
merge procedure looks like this:
(define merge
  (lambda (may-precede?)
    (lambda (initial-left initial-right)
      (let kernel ((left initial-left)
                   (right initial-right))
        (cond ((null? left) right)
              ((null? right) left)
              ((may-precede? (car left) (car right))
               (cons (car left) (kernel (cdr left) right)))
              (else
               (cons (car right) (kernel left (cdr right)))))))))
If either of the given lists is null, the result is simply the other list. Otherwise, we split off whichever of the first elements of the given lists may precede the other, issue a recursive call to merge the remainders of both lists, and prepend the selected first element to the result.
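Because merge is curried, it is applied in two stages: first to the ordering relation, then to the two lists. The following is a minimal sketch of that usage. The definition of merge is repeated only so that the example stands alone, and the name merge-ascending is hypothetical, chosen for this illustration.

```scheme
;; The curried merge from above, repeated so that this example stands alone.
(define merge
  (lambda (may-precede?)
    (lambda (initial-left initial-right)
      (let kernel ((left initial-left)
                   (right initial-right))
        (cond ((null? left) right)
              ((null? right) left)
              ((may-precede? (car left) (car right))
               (cons (car left) (kernel (cdr left) right)))
              (else
               (cons (car right) (kernel left (cdr right)))))))))

;; Supplying only the ordering relation yields a reusable merging procedure.
;; (merge-ascending is a hypothetical name, not part of the lab.)
(define merge-ascending (merge <=))

(merge-ascending (list 1 4 9) (list 2 3 10))   ; => (1 2 3 4 9 10)

;; A different relation gives a different merger. Here both arguments
;; must already be in descending order.
((merge >=) (list 9 5 1) (list 8 2))           ; => (9 8 5 2 1)
```

In each case the precondition matters: the argument lists are already sorted with respect to the same relation that is passed to merge.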
Use merge to merge the lists (2 3 4 7 8 10 12)
and (1 6 11 13 14).
What happens if merge is applied to lists that are not already
in ascending order?
What happens if merge is given two empty lists as arguments?
Why?
The merge sort works by starting with short lists, merging them to form
somewhat longer lists, merging the somewhat longer lists to form still
longer ones, and so on until only one list remains -- the result of the
final merge operation. The short lists that we begin with must satisfy the
precondition for the merge procedure: they must already be
sorted. The textbook suggests two ways of establishing this precondition:
One approach is to put just one value into each short list. A single value
is already ``sorted,'' in a trivial or vacuous sense (and the
merge procedure will work correctly on such single-element
lists).
A second idea, which the textbook calls the ``natural mergesort,'' is to pre-process the data in their initial arrangement, grouping together sequences of values that are already in the correct relative order, and then merging the resulting groups.
In this lab, we'll look at a third alternative, using a kind of recursion
known as ``divide and conquer.'' Sorting a list ls is trivial
if ls has no elements, or only one; those are our base cases
for the recursion. If ls has two or more elements, we can use
the merge procedure to produce a sorted list, provided that we
can somehow get two lists, sorted separately, from ls. Well,
we can use recursive calls to sort the two parts of ls,
provided that we can somehow split ls into two sublists. And
we get the greatest leverage if those lists are equal in size, so that the
subproblems for the recursive calls are much smaller than the original
problem.
This suggests that we need a procedure split that takes a list
ls of two or more elements and divides it into two lists,
equal or nearly equal in size. Since the splitting precedes the sorting,
the elements can be distributed into the two lists in any order. Here's
one method:
(define split
  (lambda (ls)
    (let kernel ((rest ls)
                 (left null)
                 (right null))
      (if (null? rest)
          (values left right)
          (kernel (cdr rest) (cons (car rest) right) left)))))
The parameters left and right are both ``so-far''
accumulators, each holding about half of the elements so far encountered in
the original list. At each invocation of kernel, either
left and right have the same number of elements
(as is true initially) or left has one more element than
right; prepending the element taken from rest to
right and then having the two lists swap places in each
recursive call ensures that any imbalance is immediately redressed.
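The swapping of the two accumulators is easier to follow in a trace. The sketch below repeats split so that it stands alone (writing '() where the version above uses DrScheme's null, an assumption made for portability) and follows a four-element list through the successive calls to kernel.

```scheme
;; The split procedure from above, repeated so that this example stands
;; alone. '() is written in place of DrScheme's null for portability.
(define split
  (lambda (ls)
    (let kernel ((rest ls)
                 (left '())
                 (right '()))
      (if (null? rest)
          (values left right)
          (kernel (cdr rest) (cons (car rest) right) left)))))

;; Trace of (split (list 10 20 30 40)), one line per call to kernel:
;;
;;   rest             left       right
;;   (10 20 30 40)    ()         ()
;;   (20 30 40)       (10)       ()
;;   (30 40)          (20)       (10)
;;   (40)             (30 10)    (20)
;;   ()               (40 20)    (30 10)

;; split returns two values, so a receiver is needed to collect them.
(call-with-values
 (lambda () (split (list 10 20 30 40)))
 (lambda (left right)
   (list left right)))              ; => ((40 20) (30 10))
```

Note that neither result preserves the original order of the elements. That is harmless here, because the splitting precedes the sorting.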
What are the values of (split (list 'a 'b 'c 'd 'e 'f 'g))?
Figure it out by hand first, then use DrScheme to check your answer.
Account for any differences between DrScheme's answer and yours.
The full merge-sort procedure checks to see whether either of
the base cases holds; if not, it invokes split to create two
subproblems of the same kind, solves each one by a recursive call, and
finally invokes merge to combine the results.
(define merge-sort
  (lambda (may-precede?)
    (let ((merger (merge may-precede?)))
      (lambda (ls)
        (let kernel ((subproblem ls))
          (if (or (null? subproblem) (null? (cdr subproblem)))
              subproblem
              (call-with-values
               (lambda () (split subproblem))
               (lambda (left right)
                 (merger (kernel left) (kernel right))))))))))
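To see the three procedures working together, here is a self-contained sketch that repeats merge, split, and merge-sort and applies the result to a short list of numbers. As an assumption for portability, '() replaces DrScheme's null in split. The sample data are invented for this illustration.

```scheme
;; merge, split, and merge-sort as given above, repeated so that this
;; example stands alone.
(define merge
  (lambda (may-precede?)
    (lambda (initial-left initial-right)
      (let kernel ((left initial-left)
                   (right initial-right))
        (cond ((null? left) right)
              ((null? right) left)
              ((may-precede? (car left) (car right))
               (cons (car left) (kernel (cdr left) right)))
              (else
               (cons (car right) (kernel left (cdr right)))))))))

(define split
  (lambda (ls)
    (let kernel ((rest ls)
                 (left '())
                 (right '()))
      (if (null? rest)
          (values left right)
          (kernel (cdr rest) (cons (car rest) right) left)))))

(define merge-sort
  (lambda (may-precede?)
    (let ((merger (merge may-precede?)))
      (lambda (ls)
        (let kernel ((subproblem ls))
          (if (or (null? subproblem) (null? (cdr subproblem)))
              subproblem
              (call-with-values
               (lambda () (split subproblem))
               (lambda (left right)
                 (merger (kernel left) (kernel right))))))))))

;; Like merge, merge-sort is curried: the ordering relation comes first.
(define sample (list 31 4 15 9 26 5 3))

((merge-sort <=) sample)            ; => (3 4 5 9 15 26 31)
((merge-sort >=) sample)            ; => (31 26 15 9 5 4 3)

;; This version is constructive: the original list is left unchanged.
sample                              ; => (31 4 15 9 26 5 3)
```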
Using merge-sort, sort the strings "blanc",
"noir", "rouge", "bleu",
"jaune", "vert", "gris",
"brun", and "rose" into alphabetical order.
Suppose, now, that the values to be sorted arrive in the form of a vector, and that the objective is to rearrange the contents of the vector so that at the end of the sorting procedure the same elements are present but arranged in the order specified by the comparison rule. As in the constructive version of the algorithm, we want to work our way up from single-element subvectors. Instead of allocating a separate vector for each single element, however, we can take advantage of constant-time access to the elements of a vector by making the separation purely notional: we can identify a subvector of the original vector by keeping track of the positions at which it begins and ends.
I'll adopt the convention that the starting position of a
subvector is the position of the first element that is inside the
subvector, and the ending position is the position of the first
element after and outside of the subvector (or, if there is no
such element, the length of the entire vector). So, within the vector
'#(a b c d e), the subvector with elements 'b and
'c has starting position 1 and ending position 3. (This
convention has the advantage that the number of elements in the subvector
is the difference between the ending position and the starting position.
The arguments to Scheme's substring procedure are required to
follow the same convention, for the same reason.)
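A brief illustration of the convention, using the standard substring procedure. The helper subvector-size is hypothetical, introduced only for this example.

```scheme
;; Starting position 1 is inside the substring; ending position 3 is the
;; first position outside it, so the result has 3 - 1 = 2 characters.
(substring "abcde" 1 3)             ; => "bc"

;; Hypothetical helper, not part of the lab: the number of elements in a
;; subvector identified by its starting and ending positions.
(define subvector-size
  (lambda (start finish)
    (- finish start)))

(subvector-size 1 3)                ; => 2

;; Adjacent subvectors share a boundary position. The subvectors from 0
;; to 2 and from 2 to 5 together cover a five-element vector, with no
;; overlap and no gap.
(+ (subvector-size 0 2) (subvector-size 2 5))   ; => 5
```

The last expression shows why the convention suits the merge sort: the ending position of one subvector can serve directly as the starting position of the next.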
The merge! procedure takes two adjacent subvectors
of the same vector, both of which must already be sorted, and overwrites
them with the merged (and therefore sorted) version of their elements.
Unfortunately, this cannot be done completely ``in place''; there must be
a ``holding area'' that provides separate storage for the elements as they
are merged, and at the end of the merging process the elements have to be
copied back from the holding area into the original vector. In this
implementation, the holding area takes the form of a second vector, of the
same size as the original. As the two adjacent subvectors of
the original vector are merged, their elements are placed into the positions
of the holding vector that they will eventually occupy in the original vector;
at the end of the merge, they are copied back.
Define a Scheme procedure copy-subvector! that takes four
arguments -- a source vector source, the starting position
start and ending position finish of a subvector
of source, and a target vector target of the same
size as source -- and copies the specified subvector of
source into the corresponding positions in
target, using vector-set!.
> (define vector-1 (vector 'alpha 'beta 'gamma 'delta 'epsilon))
> (define vector-2 (vector 'first 'second 'third 'fourth 'fifth))
> (copy-subvector! vector-1 1 3 vector-2)
> vector-2
#(first beta gamma fourth fifth)
> vector-1
#(alpha beta gamma delta epsilon)
The arguments to the merge! procedure are the starting
position of the left subvector, the boundary position (which is both the
ending position of the left subvector and the starting position of the
right one) and the ending position of the right subvector.
The kernel procedure keeps track of three positions. The parameter
target counts off the positions in the holding vector as they
are filled up, from left to right. The parameter current-left keeps track
of the position of the leftmost element in the first subvector that has not
yet been copied into the holding vector; current-right does
the same for the second subvector. There are three cases, which are
handled in three separate cond-clauses:
If current-left has been incremented enough times to make it
equal to the boundary, then the recursion can stop and all the
elements that have been moved to the holding vector can be copied back into
the original vector. The remaining elements in the second subvector are
already in their correct sorted positions and need not be moved at all.
If no more elements remain in the second subvector, because
current-right has been incremented until it is equal to
finish-right, or if the current element from the first subvector
may precede the current element of the second subvector, copy the current
element from the first subvector into the holding area, then advance to the
next position in the first subvector and in the holding area.
Otherwise, copy the current element from the second subvector into the holding area, then advance to the next position in the second subvector and in the holding area.
(define merge!
  (lambda (start-left boundary finish-right)
    (let kernel ((target start-left)
                 (current-left start-left)
                 (current-right boundary))
      (cond ((= current-left boundary)
             (copy-subvector! holding start-left current-right vec))
            ((or (= current-right finish-right)
                 (may-precede? (vector-ref vec current-left)
                               (vector-ref vec current-right)))
             (vector-set! holding target (vector-ref vec current-left))
             (kernel (+ target 1) (+ current-left 1) current-right))
            (else
             (vector-set! holding target (vector-ref vec current-right))
             (kernel (+ target 1) current-left (+ current-right 1)))))))
In this definition of merge!, the identifiers
vec, holding, and may-precede? are
not bound. To make it work, one must embed it in a context in which all of
these identifiers have been bound. We'll do this by making
merge! a locally defined procedure inside
merge-sort!.
Since we have random access to vectors, the vector analogue of the
split procedure just computes the midpoint of some subvector
and returns it, so that it can be used as the boundary between the two
subproblems to be solved recursively. The index of the midpoint is the
average of the starting point and the ending point of the subvector,
rounded down to an integer.
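To make the recursive subdivision concrete, the following sketch computes only the boundaries, without touching any vector. The procedure merge-schedule is hypothetical, not part of the lab. It uses the same midpoint computation, (quotient (+ start finish) 2), and returns the (start boundary finish) triples in the order in which the merges would be carried out.

```scheme
;; Hypothetical helper: list the (start boundary finish) triples for the
;; merges that a divide-and-conquer sort of the subvector from start to
;; finish would perform, in order. Subvectors with fewer than two
;; elements need no merge and contribute nothing.
(define merge-schedule
  (lambda (start finish)
    (if (< 1 (- finish start))
        (let ((boundary (quotient (+ start finish) 2)))
          (append (merge-schedule start boundary)
                  (merge-schedule boundary finish)
                  (list (list start boundary finish))))
        '())))

;; For a five-element vector, the midpoint of 0 and 5 is 2, so the two
;; subproblems cover positions 0 to 2 and positions 2 to 5.
(merge-schedule 0 5)    ; => ((0 1 2) (3 4 5) (2 3 5) (0 2 5))

;; A one-element subvector is already sorted, so nothing is scheduled.
(merge-schedule 0 1)    ; => ()
```

Each triple has the shape of the arguments expected by a three-position merging procedure: the left subvector runs from start to boundary and the right one from boundary to finish.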
(define merge-sort!
  (lambda (may-precede?)
    (lambda (vec)
      (let* ((len (vector-length vec))
             (holding (make-vector len)))
        (letrec
            ((subsort!
              (lambda (start finish)
                (if (< 1 (- finish start))
                    (let ((boundary (quotient (+ start finish) 2)))
                      (subsort! start boundary)
                      (subsort! boundary finish)
                      (merge! start boundary finish)))))
             (merge!
              (lambda (start-left boundary finish-right)
                (let kernel ((target start-left)
                             (current-left start-left)
                             (current-right boundary))
                  (cond ((= current-left boundary)
                         (copy-subvector! holding start-left current-right vec))
                        ((or (= current-right finish-right)
                             (may-precede? (vector-ref vec current-left)
                                           (vector-ref vec current-right)))
                         (vector-set! holding target
                                      (vector-ref vec current-left))
                         (kernel (+ target 1) (+ current-left 1) current-right))
                        (else
                         (vector-set! holding target
                                      (vector-ref vec current-right))
                         (kernel (+ target 1) current-left (+ current-right 1))))))))
          (subsort! 0 len))))))
Define a Scheme procedure that takes a vector of real numbers and
determines whether its elements are arranged in ascending numerical order,
returning #t if it is and #f if it is not.
Design a test procedure that generates a vector of ten thousand integers
randomly selected from the range from 0 to 999999, calls
merge-sort! to sort the vector into ascending numerical order,
and checks whether the sort succeeded, returning #t if the
elements wind up in ascending numerical order and #f if they
do not.
Sometimes, instead of sorting the elements of a list, we want to sort the entries in an association list so that the keys are arranged in a particular order (without regard to the corresponding values). Let's call a procedure that takes an association list as its argument and returns another association list with the same entries, but arranged so that the keys are in a particular order, an alist-sorter.
Define a procedure alist-sort that takes, as its only
argument, a binary predicate may-precede? expressing an
ordering relation, and returns an alist-sorter that arranges the entries
of any given association list so that their keys are ordered by
may-precede?.
This document is available on the World Wide Web as
http://www.cs.grinnell.edu/~stone/courses/scheme/merge-sort.xhtml
created November 25, 1997
last revised April 21, 2000