The merge sort

The insertion sort algorithm is not always the best one to use. When sorting n values, starting from a random initial arrangement, the insertion sort has to look, on average, through half of the elements in the sorted part of the data structure to find the correct insertion point for each new value it places. The size of that sorted part increases linearly from 0 to n, so its average size is n/2 and the average number of comparisons needed to insert one element is n/4. Taking all the insertions together, then, the insertion sort performs about n²/4 comparisons to sort the entire set.
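Written out as a sum, under the same averaging assumption (inserting into a sorted part of size k costs about k/2 comparisons, with k running from 0 to n - 1), the estimate works out as follows:

```latex
\sum_{k=0}^{n-1} \frac{k}{2} \;=\; \frac{1}{2}\cdot\frac{(n-1)\,n}{2} \;=\; \frac{n(n-1)}{4} \;\approx\; \frac{n^2}{4}
```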

Accordingly, when the number of values to be sorted is large (greater than one hundred, say), it is preferable to use a sorting method that is more complicated to set up initially but performs fewer operations on each element in the process of positioning it correctly. The merge sort algorithm is often an appropriate choice. We shall examine two variants of this algorithm -- one taking a list of values and returning a newly allocated list containing the same values, but in sorted order, and the other taking a vector of values and arranging its elements as a ``destructive'' side effect.
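To get a feel for how quickly the two approaches diverge, here is a small sketch that tabulates the insertion sort's n²/4 estimate against the commonly cited figure of roughly n log₂ n comparisons for the merge sort. That second figure is a standard approximation, not something derived in this lab, and the procedure names are hypothetical, introduced only for this illustration; both columns are rough estimates rather than exact counts.

```scheme
;; Rough comparison counts for sorting n values.  The procedure names
;; are hypothetical, introduced only for this illustration.

;; Insertion sort estimate: about n^2/4 comparisons.
(define insertion-estimate
  (lambda (n)
    (/ (* n n) 4)))

;; Usual approximation for merge sort: about n log2 n comparisons.
;; Scheme's log is the natural logarithm, so dividing by (log 2)
;; converts it to a base-2 logarithm.
(define merge-estimate
  (lambda (n)
    (* n (/ (log n) (log 2)))))

;; For each size n, list n, the insertion estimate, and the merge
;; estimate rounded to the nearest integer.
(define compare-estimates
  (lambda (sizes)
    (map (lambda (n)
           (list n
                 (insertion-estimate n)
                 (inexact->exact (round (merge-estimate n)))))
         sizes)))

(compare-estimates (list 100 1000 10000))
;; => ((100 2500 664) (1000 250000 9966) (10000 25000000 132877))
```

Even at a hundred values the estimates differ by a factor of almost four, and the gap widens rapidly as n grows.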

Merge sort: constructing a sorted list

One of the building blocks for the first version of the merge sort is the merge procedure, which takes two sorted lists as arguments and returns a sorted list containing all of the values from both argument lists. (Program 4.3 on page 98 of our textbook is an implementation of this procedure, in the special case where the lists are lists of real numbers arranged in ascending order.) If we abstract out the ordering relation as may-precede?, a curried version of the merge looks like this:

(define merge
  (lambda (may-precede?)
    (lambda (initial-left initial-right)
      (let kernel ((left initial-left)
                   (right initial-right))
        (cond ((null? left) right)
              ((null? right) left)
              ((may-precede? (car left) (car right))
               (cons (car left) (kernel (cdr left) right)))
              (else
               (cons (car right) (kernel left (cdr right)))))))))

If either of the given lists is null, the result is simply the other list. Otherwise, we split off whichever of the two first elements may precede the other, issue a recursive call to merge the rest of the list it came from with the whole of the other list, and prepend the selected element to the result.
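As a quick illustration of how the curried procedure is called (using strings, so as not to anticipate the exercises below), one first applies merge to an ordering predicate and then applies the resulting procedure to two sorted lists. The name merge-strings is just an illustrative binding, and merge is repeated so that the example stands on its own.

```scheme
;; The curried merge procedure from above, repeated so that this
;; example is self-contained.
(define merge
  (lambda (may-precede?)
    (lambda (initial-left initial-right)
      (let kernel ((left initial-left)
                   (right initial-right))
        (cond ((null? left) right)
              ((null? right) left)
              ((may-precede? (car left) (car right))
               (cons (car left) (kernel (cdr left) right)))
              (else
               (cons (car right) (kernel left (cdr right)))))))))

;; Supplying string<? as may-precede? yields a merger for lists of
;; strings that are already in alphabetical order.
(define merge-strings (merge string<?))

(merge-strings (list "apple" "cherry" "fig") (list "banana" "date"))
;; => ("apple" "banana" "cherry" "date" "fig")
```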


Exercise 1

Use merge to merge the lists (2 3 4 7 8 10 12) and (1 6 11 13 14).


Exercise 2

What happens if merge is applied to lists that are not already in ascending order?


Exercise 3

What happens if merge is given two empty lists as arguments? Why?


The merge sort works by starting with short lists, merging them to form somewhat longer lists, merging the somewhat longer lists to form still longer ones, and so on until only one list remains -- the result of the final merge operation. The short lists that we begin with must satisfy the precondition for the merge procedure: they must already be sorted. The textbook suggests two ways of establishing this precondition:

In this lab, we'll look at a third alternative, using a kind of recursion known as ``divide and conquer.'' Sorting a list ls is trivial if ls has no elements, or only one; those are our base cases for the recursion. If ls has two or more elements, we can use the merge procedure to produce a sorted list, provided that we can somehow get two lists, sorted separately, from ls. Well, we can use recursive calls to sort the two parts of ls, provided that we can somehow split ls into two sublists. And we get the greatest leverage if those sublists are equal or nearly equal in size, so that each subproblem for a recursive call is only about half as large as the original problem.

This suggests that we need a procedure split that takes a list ls of two or more elements and divides it into two lists, equal or nearly equal in size. Since the splitting precedes the sorting, the elements can be distributed into the two lists in any order. Here's one method:

(define split
  (lambda (ls)
    (let kernel ((rest ls)
                 (left null)
                 (right null))
      (if (null? rest)
          (values left right)
          (kernel (cdr rest) (cons (car rest) right) left)))))

The parameters left and right are both ``so-far'' accumulators, each holding about half of the elements so far encountered in the original list. At each invocation of kernel, either left and right have the same number of elements (as is true initially) or left has one more element than right; prepending the element taken from rest to right and then having the two lists swap places in each recursive call ensures that any imbalance is immediately redressed.
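The balance claim can be checked mechanically. The sketch below repeats split (with '() in place of DrScheme's null, so that it also runs in other Schemes) and uses two hypothetical helpers, iota-list and balanced-split?, to confirm that for every list length from 0 through 20 the returned left list is either the same length as right or exactly one element longer.

```scheme
;; split, as defined above, with '() in place of DrScheme's null.
(define split
  (lambda (ls)
    (let kernel ((rest ls)
                 (left '())
                 (right '()))
      (if (null? rest)
          (values left right)
          (kernel (cdr rest) (cons (car rest) right) left)))))

;; Hypothetical helper: the list of integers 0, 1, ..., n-1.
(define iota-list
  (lambda (n)
    (let loop ((k (- n 1)) (acc '()))
      (if (< k 0)
          acc
          (loop (- k 1) (cons k acc))))))

;; Hypothetical helper: #t if the lists returned by split are equal in
;; length or left is exactly one element longer than right.
(define balanced-split?
  (lambda (ls)
    (call-with-values
      (lambda () (split ls))
      (lambda (left right)
        (let ((difference (- (length left) (length right))))
          (or (= difference 0) (= difference 1)))))))

;; Check every list length from 0 through 20.
(define all-balanced?
  (let loop ((n 0))
    (cond ((> n 20) #t)
          ((balanced-split? (iota-list n)) (loop (+ n 1)))
          (else #f))))
```

Here all-balanced? evaluates to #t, as the invariant described above predicts.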


Exercise 4

What are the values of (split (list 'a 'b 'c 'd 'e 'f 'g))? Figure it out by hand first, then use DrScheme to check your answer. Account for any differences between DrScheme's answer and yours.


The full merge-sort procedure checks to see whether either of the base cases holds; if not, it invokes split to create two subproblems of the same kind, solves each one by a recursive call, and finally invokes merge to combine the results.

(define merge-sort
  (lambda (may-precede?)
    (let ((merger (merge may-precede?)))
      (lambda (ls)
        (let kernel ((subproblem ls))
          (if (or (null? subproblem) (null? (cdr subproblem)))
              subproblem
              (call-with-values
                (lambda () (split subproblem))
                (lambda (left right)
                  (merger (kernel left) (kernel right))))))))))
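Because the ordering relation is a parameter, the same merge-sort yields sorters for different orders. The sketch below repeats the three definitions (again with '() for null) and builds a descending-order sorter by supplying >; the names sort-descending and data are illustrative only. Since nothing in these procedures uses a mutator, the argument list is left unchanged.

```scheme
;; merge, split, and merge-sort as defined in this lab, repeated so the
;; example is self-contained ('() stands in for DrScheme's null).
(define merge
  (lambda (may-precede?)
    (lambda (initial-left initial-right)
      (let kernel ((left initial-left)
                   (right initial-right))
        (cond ((null? left) right)
              ((null? right) left)
              ((may-precede? (car left) (car right))
               (cons (car left) (kernel (cdr left) right)))
              (else
               (cons (car right) (kernel left (cdr right)))))))))

(define split
  (lambda (ls)
    (let kernel ((rest ls)
                 (left '())
                 (right '()))
      (if (null? rest)
          (values left right)
          (kernel (cdr rest) (cons (car rest) right) left)))))

(define merge-sort
  (lambda (may-precede?)
    (let ((merger (merge may-precede?)))
      (lambda (ls)
        (let kernel ((subproblem ls))
          (if (or (null? subproblem) (null? (cdr subproblem)))
              subproblem
              (call-with-values
                (lambda () (split subproblem))
                (lambda (left right)
                  (merger (kernel left) (kernel right))))))))))

;; Supplying > as may-precede? gives a sorter for descending order.
(define sort-descending (merge-sort >))

(define data (list 5 2 9 1 5 6))
(sort-descending data)   ; => (9 6 5 5 2 1)
data                     ; => (5 2 9 1 5 6), unchanged
```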

Exercise 5

Using merge-sort, sort the strings "blanc", "noir", "rouge", "bleu", "jaune", "vert", "gris", "brun", and "rose" into alphabetical order.


Merge sort: overwriting the contents of a vector

Suppose, now, that the values to be sorted arrive in the form of a vector, and that the objective is to revise the contents of the vector so that at the end of the sorting procedure the same elements are present but arranged in the order specified by the comparison rule. As in the constructive version of the algorithm, we want to work our way up from single-element subvectors. Instead of allocating a separate vector for each single element, however, we can take advantage of constant-time access to the elements of a vector by making the separation purely notional: we can identify a subvector of the original vector by keeping track of the positions at which it begins and ends.

I'll adopt the convention that the starting position of a subvector is the position of the first element that is inside the subvector, and the ending position is the position of the first element after and outside of the subvector (or, if there is no such element, the length of the entire vector). So, within the vector '#(a b c d e), the subvector with elements 'b and 'c has starting position 1 and ending position 3. (This convention has the advantage that the number of elements in the subvector is the difference between the ending position and the starting position. The arguments to Scheme's substring procedure are required to follow the same convention, for the same reason.)
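A small sketch of the convention: the hypothetical helper subvector->list (not part of the lab) collects the elements of the subvector that starts at start and ends at finish. An empty subvector is simply one whose starting and ending positions coincide, and substring reads its positions the same way.

```scheme
;; Hypothetical helper, for illustration only: the elements of the
;; subvector of vec from position start (inclusive) to position finish
;; (exclusive), collected into a list.
(define subvector->list
  (lambda (vec start finish)
    ;; Walk backwards from finish - 1 so that consing preserves order.
    (let loop ((position (- finish 1))
               (acc '()))
      (if (< position start)
          acc
          (loop (- position 1) (cons (vector-ref vec position) acc))))))

(define letters (vector 'a 'b 'c 'd 'e))

(subvector->list letters 1 3)                        ; => (b c)
(- 3 1)                                              ; => 2, its length
(subvector->list letters 2 2)                        ; => (), empty
(subvector->list letters 3 (vector-length letters))  ; => (d e)
(substring "abcde" 1 3)                              ; => "bc"
```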

The merge! procedure takes two adjacent subvectors of the same vector, both of which must already be sorted, and overwrites them with the merged (and therefore sorted) version of their elements. Unfortunately, this cannot easily be done completely ``in place''; there must be a ``holding area'' that provides separate storage for the elements as they are merged, and at the end of the merging process the elements have to be copied back from the holding area into the original vector. In this implementation, the holding area takes the form of a second vector, of the same size as the original. As the elements of the two adjacent subvectors are merged, each is placed into the position of the holding vector that it will eventually occupy in the original vector; at the end of the merge, they are copied back into the corresponding positions of the original vector.


Exercise 6

Define a Scheme procedure copy-subvector! that takes four arguments -- a source vector source, the starting position start and ending position finish of a subvector of source, and a target vector target of the same size as source -- and copies the specified subvector of source into the corresponding positions in target, using vector-set!.

> (define vector-1 (vector 'alpha 'beta 'gamma 'delta 'epsilon))
> (define vector-2 (vector 'first 'second 'third 'fourth 'fifth))
> (copy-subvector! vector-1 1 3 vector-2)
> vector-2
#(first beta gamma fourth fifth)
> vector-1
#(alpha beta gamma delta epsilon)

The arguments to the merge! procedure are the starting position of the left subvector, the boundary position (which is both the ending position of the left subvector and the starting position of the right one) and the ending position of the right subvector.

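The kernel procedure keeps track of three positions. The parameter target counts off the positions in the holding vector as they are filled, from left to right. Current-left keeps track of the position of the leftmost element in the first subvector that has not yet been copied into the holding vector; current-right does the same for the second subvector. There are three cases, which are handled in three separate cond-clauses. If the left subvector is exhausted, the merged elements are copied from the holding vector back into the original vector; any elements still remaining in the right subvector are already in their correct final positions, so they need not be moved. If the right subvector is exhausted, or if the next element of the left subvector may precede the next element of the right one, the left element is moved into the holding vector. Otherwise, the right element is moved: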

(define merge!
  (lambda (start-left boundary finish-right)
    (let kernel ((target start-left)
                 (current-left start-left)
                 (current-right boundary))
      (cond ((= current-left boundary)
             (copy-subvector! holding start-left current-right vec))
            ((or (= current-right finish-right)
                 (may-precede? (vector-ref vec current-left)
                               (vector-ref vec current-right)))
             (vector-set! holding target (vector-ref vec current-left))
             (kernel (+ target 1) (+ current-left 1) current-right))
            (else
             (vector-set! holding target (vector-ref vec current-right))
             (kernel (+ target 1) current-left (+ current-right 1)))))))

In this definition of merge!, the identifiers vec, holding, and may-precede? are not bound (copy-subvector! is the procedure from Exercise 6). To make it work, one must embed it in a context in which all of these identifiers have been bound. We'll do this by making merge! a locally defined procedure inside merge-sort!.

Since we have random access to vectors, the vector analogue of the split procedure just computes the midpoint of some subvector and returns it, so that it can be used as the boundary between the two subproblems to be solved recursively. The index of the midpoint is the average of the starting point and the ending point of the subvector, rounded down to an integer (which is why merge-sort! uses quotient rather than /).

(define merge-sort!
  (lambda (may-precede?)
    (lambda (vec)
      (let* ((len (vector-length vec))
             (holding (make-vector len)))
        (letrec
          ((subsort!
            (lambda (start finish)
              (if (< 1 (- finish start))
                  (let ((boundary (quotient (+ start finish) 2)))
                    (subsort! start boundary)
                    (subsort! boundary finish)
                    (merge! start boundary finish)))))
           (merge!
            (lambda (start-left boundary finish-right)
              (let kernel ((target start-left)
                           (current-left start-left)
                           (current-right boundary))
                (cond ((= current-left boundary)
                       (copy-subvector! holding start-left current-right vec))
                      ((or (= current-right finish-right)
                           (may-precede? (vector-ref vec current-left)
                                         (vector-ref vec current-right)))
                       (vector-set! holding target
                         (vector-ref vec current-left))
                       (kernel (+ target 1) (+ current-left 1) current-right))
                      (else
                       (vector-set! holding target
                         (vector-ref vec current-right))
                       (kernel (+ target 1) current-left (+ current-right 1))))))))
          (subsort! 0 len))))))
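To see the order in which subsort! divides the vector and calls merge!, here is a sketch of a hypothetical procedure, merge-intervals, that repeats the recursion of subsort! but, instead of merging anything, records the arguments (start boundary finish) of each call that subsort! would make to merge!. It touches no vector, so it does not depend on copy-subvector!. (It uses when rather than a one-armed if, since some current Schemes, Racket among them, require if to have both branches.)

```scheme
;; Hypothetical helper, for illustration only: returns, in order, the
;; argument triples (start boundary finish) of the calls to merge! that
;; merge-sort! makes on a vector of length len.  The recursion mirrors
;; subsort!, but nothing is sorted; positions are merely recorded.
(define merge-intervals
  (lambda (len)
    (let ((merges '()))
      (let subsort ((start 0) (finish len))
        ;; Only subvectors with two or more elements need a merge.
        (when (< 1 (- finish start))
          (let ((boundary (quotient (+ start finish) 2)))
            (subsort start boundary)
            (subsort boundary finish)
            ;; Record the merge after both halves are done, as subsort! does.
            (set! merges (cons (list start boundary finish) merges)))))
      ;; merges was built by prepending, so reverse it to get call order.
      (reverse merges))))

(merge-intervals 5)
;; => ((0 1 2) (3 4 5) (2 3 5) (0 2 5))
```

For a five-element vector, the short runs are merged first, and the final merge, with boundary 2 = (quotient (+ 0 5) 2), combines positions 0 and 1 with positions 2 through 4. In general, a vector of length n involves n - 1 calls to merge!, one for each division.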

Exercise 7

Define a Scheme procedure that takes a vector of real numbers and determines whether its elements are arranged in ascending numerical order, returning #t if they are and #f if they are not.


Exercise 8

Design a test procedure that generates a vector of ten thousand integers randomly selected from the range 0 to 999999, calls merge-sort! to sort the vector into ascending numerical order, and checks whether the sort succeeded, returning #t if the elements wind up in ascending numerical order and #f if they do not.


Exercise 9

Sometimes, instead of sorting the elements of a list, we want to sort the entries in an association list so that the keys are arranged in a particular order (without regard to the corresponding values). Let's call a procedure that takes an association list as its argument and returns another association list with the same entries, but arranged so that the keys are in a particular order, an alist-sorter.

Define a procedure alist-sort that takes, as its only argument, a binary predicate may-precede? expressing an ordering relation, and returns an alist-sorter that arranges the entries of any given association list so that their keys are ordered by may-precede?.


This document is available on the World Wide Web as

http://www.cs.grinnell.edu/~stone/courses/scheme/merge-sort.xhtml

created November 25, 1997
last revised April 21, 2000

John David Stone (stone@cs.grinnell.edu)