Sorting a collection of values -- arranging them in a fixed order, usually alphabetical or numerical -- is one of the commonest computing applications. When the number of values is even moderately large, sorting is such a tiresome, error-prone, and time-consuming process for human beings that the programmer should automate it whenever possible. For this reason, computer scientists have studied this application with extreme care and thoroughness.
One of the clear results of their investigations is that no one algorithm for sorting is best in all cases. Which approach is best depends on whether one is sorting a small collection or a large one, on whether the individual elements occupy a lot of storage (so that moving them around in memory is time-consuming), on how easy it is to compare two elements to figure out which one should precede the other, and so on. In this course we'll be looking at two of the most generally useful algorithms for sorting: the insertion sort, which is the subject of today's reading, and the merge sort, which we'll talk about in the next reading.
Imagine first that we're given a collection of values and a rule for
arranging them. The values might actually be stored either in a list or in
a vector; let's assume first that they are in a list. The rule typically
takes the form of a predicate of arity 2 that can be applied to any two
values in the set to determine whether the first of them could precede the
second when the values have been sorted. (For example, if one wants to
sort a set of real numbers into ascending numerical order, the rule should
be the predicate <=; if one wants to sort a set of strings into
alphabetical order, ignoring case, the rule should be string-ci<=?,
and so on.)
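To make the idea of a precedence rule concrete, here is a small sketch that applies the two predicates just mentioned to pairs of values. The particular numbers and strings are chosen only for illustration.

```scheme
;; Each call asks: may the first argument precede the second?
(<= 3 5)                          ; #t: 3 may come before 5
(<= 5 3)                          ; #f: 5 may not come before 3
(<= 4 4)                          ; #t: equal values may appear in either order
(string-ci<=? "apple" "Banana")   ; #t: case is ignored, so "a" comes before "b"
(string-ci<=? "banana" "APPLE")   ; #f
```

Note that each predicate also accepts two equal values, which is what lets a sorted list contain duplicates.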
The insertion sort works by taking the values one by one and inserting each
one into a new list that it constructs, constantly maintaining the
condition that the elements of the new list are in the desired order with
respect to one another. Clearly, this condition will not be maintained if
each element is added to the new list at the beginning, using
cons; instead, the insertion sort adds each element at a
carefully selected position within the new list, placing the new element
after each previously placed element that precedes it according
to the given precedence rule, but before every such element that
it precedes. The following procedure, insert, adds a new
element to a list in exactly this way. For the moment, we'll assume that
the elements of the list are real numbers and that we want to sort them
into ascending order; <= is therefore used as the ordering
predicate.
;;; insert: add a given real number to a given list of real numbers in
;;; ascending order, returning a new list, also in ascending order
;;; Givens:
;;;   NEW-ELEMENT, a real number
;;;   LS, a list of real numbers
;;; Result:
;;;   EXTENDED, a list of real numbers
;;; Precondition:
;;;   LS is in ascending order (that is, each element other than the first
;;;   is greater than or equal to the one before it).
;;; Postconditions:
;;;   (1) The elements of EXTENDED are exactly the elements of LS together
;;;       with NEW-ELEMENT.
;;;   (2) EXTENDED is in ascending order.
(define insert
  (lambda (new-element ls)
    (cond ((null? ls) (list new-element))
          ((<= new-element (car ls)) (cons new-element ls))
          (else (cons (car ls) (insert new-element (cdr ls)))))))
In English: If the list into which the new element is to be inserted is empty, return a list containing only the new element. If the new element can precede the first element of the existing list, then, since the existing list is assumed to be sorted already, it must also be able to precede every element of the existing list, so attach the new element onto the front of the existing list and return the result. Otherwise, we haven't yet found the place, so issue a recursive call to insert the new element into the cdr of the current list, then reattach its car at the beginning of the result.
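It may help to watch the three cases at work on a small list. The sketch below repeats the definition of insert so that it can be run on its own; the sample lists are chosen only for illustration.

```scheme
;; insert, repeated from the reading so that this example runs by itself.
(define insert
  (lambda (new-element ls)
    (cond ((null? ls) (list new-element))
          ((<= new-element (car ls)) (cons new-element ls))
          (else (cons (car ls) (insert new-element (cdr ls)))))))

;; Trace of (insert 4 '(1 3 5 7)):
;;   (insert 4 '(1 3 5 7))
;;   => (cons 1 (insert 4 '(3 5 7)))           ; 4 may not precede 1
;;   => (cons 1 (cons 3 (insert 4 '(5 7))))    ; 4 may not precede 3
;;   => (cons 1 (cons 3 (cons 4 '(5 7))))      ; 4 may precede 5: stop
;;   => (1 3 4 5 7)
(insert 4 '(1 3 5 7))   ; (1 3 4 5 7)
(insert 4 '())          ; (4)          the empty-list case
(insert 9 '(1 3 5 7))   ; (1 3 5 7 9)  the new element goes at the very end
```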
The preceding version of the insert procedure is not tail-recursive.
When dealing with long lists, you may want to use the following
tail-recursive version, which uses space more economically:
;;; insert: add a given real number to a given list of real numbers in
;;; ascending order, returning a new list, also in ascending order
;; Givens:
;;   NEW-ELEMENT, a real number
;;   LS, a list of real numbers
;; Result:
;;   EXTENDED, a list of real numbers
;; Precondition:
;;   LS is in ascending order (that is, each element other than the first
;;   is greater than or equal to the one before it).
;; Postconditions:
;;   (1) The elements of EXTENDED are exactly the elements of LS together
;;       with NEW-ELEMENT.
;;   (2) EXTENDED is in ascending order.
(define insert
  (lambda (new-element ls)
    (let kernel ((rest ls)
                 (bypassed '()))
      (cond ((null? rest)
             (revappend bypassed (list new-element)))
            ((<= new-element (car rest))
             (revappend bypassed (cons new-element rest)))
            (else
             (kernel (cdr rest) (cons (car rest) bypassed)))))))

;;; revappend: attach the reverse of one given list to the front of another
;; Givens:
;;   LEFT and RIGHT, both lists.
;; Result:
;;   COMBINED, a list.
;; Preconditions:
;;   None.
;; Postcondition:
;;   The elements of COMBINED are the elements of LEFT, in reverse order,
;;   followed by the elements of RIGHT (without reversal).
(define revappend
  (lambda (left right)
    (if (null? left)
        right
        (revappend (cdr left) (cons (car left) right)))))
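The role of revappend is easier to see with concrete values. In the sketch below, which repeats revappend so that it runs on its own, the first call reproduces the final step of inserting 4 into the list (1 3 5 7): by that point the kernel has passed over 1 and then 3, so the bypassed elements are held in reverse order.

```scheme
;; revappend, repeated from the reading so that this example runs by itself.
(define revappend
  (lambda (left right)
    (if (null? left)
        right
        (revappend (cdr left) (cons (car left) right)))))

;; BYPASSED is (3 1) and REST is (5 7) when the kernel finds the place for 4.
(revappend '(3 1) (cons 4 '(5 7)))   ; (1 3 4 5 7)
(revappend '() '(a b))               ; (a b)    nothing to reverse
(revappend '(c b a) '())             ; (a b c)  plain reversal
```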
Now let's return to the overall process of sorting an entire list. The insertion sort algorithm simply takes up the elements of the list to be sorted one by one and inserts each one into a new list, initially empty. We can achieve this with a simple fold:
;;; insertion-sort: arrange a given list of real numbers in ascending order
;; Given:
;;   UNSORTED, a list of real numbers.
;; Result:
;;   SORTED, a list of real numbers.
;; Preconditions:
;;   None.
;; Postconditions:
;;   (1) The elements of SORTED are exactly the elements of UNSORTED.
;;   (2) SORTED is in ascending order.
(define insertion-sort (fold-list '() insert))
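This reading does not define fold-list, so the following sketch rests on an assumption: that (fold-list base combiner) returns a procedure that folds a list from the right, applying the combiner to each element and to the result of folding the rest of the list. The definition of fold-list given here is a guess made for illustration, not necessarily the one used elsewhere in the course.

```scheme
;; ASSUMPTION: a hypothetical fold-list, written only for this example.
;; It builds a procedure that computes
;;   (combiner e0 (combiner e1 ... (combiner en base)))
(define fold-list
  (lambda (base combiner)
    (letrec ((folder (lambda (ls)
                       (if (null? ls)
                           base
                           (combiner (car ls) (folder (cdr ls)))))))
      folder)))

;; insert, repeated from the reading so that this example runs by itself.
(define insert
  (lambda (new-element ls)
    (cond ((null? ls) (list new-element))
          ((<= new-element (car ls)) (cons new-element ls))
          (else (cons (car ls) (insert new-element (cdr ls)))))))

(define insertion-sort (fold-list '() insert))

;; Under the assumption above, (insertion-sort '(3 1 2)) unfolds to
;;   (insert 3 (insert 1 (insert 2 '())))
;;   => (insert 3 (insert 1 '(2)))
;;   => (insert 3 '(1 2))
;;   => (1 2 3)
(insertion-sort '(3 1 2))             ; (1 2 3)
(insertion-sort '(3 1 4 1 5 9 2 6))   ; (1 1 2 3 4 5 6 9)
```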
By writing the specific predicate <= into the definition of insert, we restricted the preceding version of insertion-sort so
that it applies only to lists of real numbers and always returns a list in
ascending numerical order. Let's go back now and lift that restriction.
One way to do this would be to give insert three arguments instead
of two, providing the ordering predicate as well as the list and the new
element. In many applications, however, the nature of the desired ordering
is known before the particular list to be ordered and is constant over many
applications to different lists and new elements, so that it makes sense to
tweak the interface to insert to accommodate this difference.
Our generalized insert procedure, therefore, will be a higher-order
procedure that takes the ordering predicate may-precede? as its
argument and returns a customized insertion procedure that uses that
predicate. This customized procedure takes a new element and a (sorted)
list as its arguments and returns the result of inserting the new
element into the list at the appropriate position.
;;; insert: given an ordering predicate, construct a procedure that
;;; adds a given value to a given ordered list, returning the new list
;;; (also ordered)
;;; Given:
;;;   MAY-PRECEDE?, a binary predicate
;;; Result:
;;;   INSERTER, a binary procedure
;;; Precondition:
;;;   MAY-PRECEDE? expresses an ordering relation (that is, it is connected
;;;   and transitive).
;;; Postcondition:
;;;   Given any value NEW-ELEMENT that meets any precondition that
;;;   MAY-PRECEDE? imposes on its first argument, and any list LS of values
;;;   that meet the preconditions that MAY-PRECEDE? imposes on either of
;;;   its arguments and moreover are ordered within LS by MAY-PRECEDE?,
;;;   INSERTER returns a list, also ordered by MAY-PRECEDE?, containing all
;;;   of the elements of LS and in addition NEW-ELEMENT.
(define insert
  (lambda (may-precede?)
    (letrec ((inserter
              (lambda (new-element ls)
                (cond ((null? ls) (list new-element))
                      ((may-precede? new-element (car ls))
                       (cons new-element ls))
                      (else
                       (cons (car ls) (inserter new-element (cdr ls))))))))
      inserter)))
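As a quick illustration of the two-stage interface, the sketch below first supplies a predicate to obtain a customized inserter and then applies that inserter to a new element and a list. The name insert-descending and the sample data are introduced only for this example.

```scheme
;; Generalized insert, repeated from the reading so this example runs alone.
(define insert
  (lambda (may-precede?)
    (letrec ((inserter
              (lambda (new-element ls)
                (cond ((null? ls) (list new-element))
                      ((may-precede? new-element (car ls))
                       (cons new-element ls))
                      (else
                       (cons (car ls) (inserter new-element (cdr ls))))))))
      inserter)))

;; Stage 1: fix the ordering. Stage 2: insert into lists in that order.
(define insert-descending (insert >=))   ; hypothetical name
(insert-descending 4 '(7 5 3 1))         ; (7 5 4 3 1)

;; The two stages can also be written as one nested call.
((insert string-ci<=?) "Falcon" '("Bakyu" "Hecker" "Sims"))
                                         ; ("Bakyu" "Falcon" "Hecker" "Sims")
```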
Using this generalized insertion procedure, we can write a similarly
generalized version of insertion-sort. Once again, we'll separate
the may-precede? parameter from the others, so that a call to insertion-sort actually returns a customized sorting procedure,
constructed by fold-list, which can then be applied to a list.
;;; insertion-sort: given a binary ordering predicate, construct a
;;; procedure that takes a list and arranges its elements to be consistent
;;; with the ordering
;; Given:
;;   MAY-PRECEDE?, a binary predicate
;; Result:
;;   SORTER, a procedure.
;; Precondition:
;;   MAY-PRECEDE? expresses an ordering relation (that is, it is connected
;;   and transitive).
;; Postconditions:
;;   Given any list UNSORTED of elements that meet any preconditions that
;;   MAY-PRECEDE? imposes on either of its arguments, SORTER returns a
;;   list SORTED such that
;;
;;   (1) The elements of SORTED are exactly the elements of UNSORTED.
;;   (2) SORTED is ordered by MAY-PRECEDE?.  (That is, if an element e0
;;       precedes an element e1 in SORTED, then applying MAY-PRECEDE? to
;;       e0 and e1 would yield the result #T.)
(define insertion-sort
  (lambda (may-precede?)
    (fold-list '() (insert may-precede?))))
Here are a few examples of how this version of insertion-sort can be
used:
> ((insertion-sort <=) '(3 1 4 1 5 9 2 6))
(1 1 2 3 4 5 6 9)
> ((insertion-sort >=) '(3 1 4 1 5 9 2 6))
(9 6 5 4 3 2 1 1)
> (define alphabetize (insertion-sort string-ci<=?))
> (alphabetize '("Brunner" "Furuta" "Romero" "Shadel" "Poulin" "Hecker"
                 "Falcon" "Rapp" "Bakyu" "Manfredi" "Benness" "Sims"
                 "Morrison" "Herrington" "Pecsok" "Chamberlain"))
("Bakyu" "Benness" "Brunner" "Chamberlain" "Falcon" "Furuta" "Hecker"
 "Herrington" "Manfredi" "Morrison" "Pecsok" "Poulin" "Rapp" "Romero"
 "Shadel" "Sims")
> (define sort-chars (insertion-sort char-ci<=?))
> (define alphanagram
    (lambda (str)
      (list->string (sort-chars (string->list str)))))
> (alphanagram "conglomeration")
"acegilmnnooort"
Now let's consider the rather different case in which the values that we want to arrange are presented as a vector and the goal of the sorting algorithm is to overwrite the old arrangement of those values with a new, sorted arrangement of the same values.
Instead of constructing a new vector, we partition the original vector into two subvectors: a sorted subvector, in which all of the elements are in the correct order relative to one another, and an unsorted subvector in which the elements are still in their original positions. The two subvectors are not actually split into separate data structures. Instead, we just keep track of a boundary between them inside the original vector: Items to the left of the boundary are in the sorted subvector; items to its right, in the unsorted one. Initially the boundary is at the left end of the vector. The plan is to shift the boundary, one position at a time, to the right end. When it arrives, the entire vector has been sorted.
Once again, we curry the insert! procedure so that the ordering rule
can be provided before the vector. The procedure that insert!
returns takes three arguments: an element to be inserted into the sorted
part of the vector, the vector itself, and the current boundary position.
The new element can be inserted at any position up to and including the
current boundary position, but it must be placed in the correct order
relative to elements to the left of that boundary. This means that any
elements that should follow the new one should be shifted one position to
the right in order to make room for the new one. (Elements that precede
the new one can keep their current positions.)
;;; insert!: given an ordering predicate, construct a procedure that
;;; places a given value into a given vector, preserving ordering
;;; within an initial segment of the vector
;;; Given:
;;;   MAY-PRECEDE?, a binary predicate
;;; Result:
;;;   INSERTER!, a ternary procedure
;;; Precondition:
;;;   MAY-PRECEDE? expresses an ordering relation (that is, it is connected
;;;   and transitive).
;;; Postcondition:
;;;   Given any value NEW-ELEMENT that meets any precondition that
;;;   MAY-PRECEDE? imposes on its second argument, any vector VEC of values
;;;   that meet the preconditions that MAY-PRECEDE? imposes on either of
;;;   its arguments, and any natural number BOUNDARY less than the length
;;;   of VEC, with the additional provision that the elements of VEC in
;;;   positions less than BOUNDARY are already ordered with respect to
;;;   MAY-PRECEDE?, INSERTER! rearranges the elements of VEC so that
;;;
;;;   (1) The elements of VEC in positions less than or equal to BOUNDARY
;;;       are NEW-ELEMENT and the values that were initially in positions
;;;       less than BOUNDARY in VEC.
;;;   (2) The elements in positions 0 through BOUNDARY, inclusive, in VEC
;;;       are ordered by MAY-PRECEDE?.
;;;   (3) The elements of VEC in positions greater than BOUNDARY are the
;;;       same values that initially occupied those positions.
(define insert!
  (lambda (may-precede?)
    (lambda (new-element vec boundary)
      (do ((candidate boundary (- candidate 1)))
          ((or (zero? candidate)
               (may-precede? (vector-ref vec (- candidate 1)) new-element))
           (vector-set! vec candidate new-element))
        (vector-set! vec candidate (vector-ref vec (- candidate 1)))))))
In English: Starting at the boundary and working from right to left,
examine each position in turn as a candidate for the position at which
new-element should be inserted. If the position number is 0 (so
that we've reached the left end of the vector), or if the element just to
the left of the current position can precede new-element, stop and
put new-element in the current candidate position. Otherwise, fill
in the current candidate position by copying the element just to its left
into it and proceed to the next iteration, in which the position of the
element just copied will be overwritten one way or the other.
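A single call on a small vector shows the shifting in action. The sketch below repeats insert! so that it runs on its own; the vector contents are chosen only for illustration, and the vector is built with the vector procedure rather than written as a literal so that it can safely be modified.

```scheme
;; insert!, repeated from the reading so that this example runs by itself.
(define insert!
  (lambda (may-precede?)
    (lambda (new-element vec boundary)
      (do ((candidate boundary (- candidate 1)))
          ((or (zero? candidate)
               (may-precede? (vector-ref vec (- candidate 1)) new-element))
           (vector-set! vec candidate new-element))
        (vector-set! vec candidate (vector-ref vec (- candidate 1)))))))

;; Positions 0 through 2 hold the sorted part (1 3 5); the boundary is at 3.
(define vec (vector 1 3 5 4 2))
((insert! <=) (vector-ref vec 3) vec 3)
;; candidate 3: 5 may not precede 4, so copy 5 rightward -> #(1 3 5 5 2)
;; candidate 2: 3 may precede 4, so store 4 here         -> #(1 3 4 5 2)
vec   ; #(1 3 4 5 2)
```

The element in position 4, beyond the boundary, is left exactly where it was.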
Here, then, is the general, curried insertion sort procedure for vectors:
;;; insertion-sort!: given a binary ordering predicate, construct a
;;; procedure that takes a vector and destructively rearranges its elements
;;; to be consistent with the ordering
;;; Given:
;;;   MAY-PRECEDE?, a binary predicate
;;; Result:
;;;   SORTER!, a procedure.
;;; Precondition:
;;;   MAY-PRECEDE? expresses an ordering relation (that is, it is connected
;;;   and transitive).
;;; Postconditions:
;;;   Given any vector VEC of elements that meet any preconditions that
;;;   MAY-PRECEDE? imposes on either of its arguments, SORTER! destructively
;;;   modifies VEC so that the following conditions are met:
;;;
;;;   (1) The elements of VEC are the same as in its initial state.
;;;   (2) VEC is in order by MAY-PRECEDE?
;;;
;;;   SORTER! does not return any particular value; it is invoked only for
;;;   its side effect.
(define insertion-sort!
  (lambda (may-precede?)
    (let ((inserter! (insert! may-precede?)))
      (lambda (vec)
        (let ((len (vector-length vec)))
          (do ((boundary 0 (+ boundary 1)))
              ((= boundary len))
            (inserter! (vector-ref vec boundary) vec boundary)))))))
Note that the local binding for inserter! is placed outside the
lambda-expression for the procedure that insertion-sort!
returns. This is so that inserter! is computed only once, when the
customized sorting procedure is constructed, rather than every time that
sorting procedure is invoked.
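To see both points at once, the sketch below builds one customized sorting procedure and applies it to two vectors, so that the inserter is constructed a single time and then reused. Both definitions are repeated from the reading so that the example runs on its own; the names sort-ascending!, v1, and v2 are introduced only for illustration.

```scheme
(define insert!
  (lambda (may-precede?)
    (lambda (new-element vec boundary)
      (do ((candidate boundary (- candidate 1)))
          ((or (zero? candidate)
               (may-precede? (vector-ref vec (- candidate 1)) new-element))
           (vector-set! vec candidate new-element))
        (vector-set! vec candidate (vector-ref vec (- candidate 1)))))))

(define insertion-sort!
  (lambda (may-precede?)
    (let ((inserter! (insert! may-precede?)))
      (lambda (vec)
        (let ((len (vector-length vec)))
          (do ((boundary 0 (+ boundary 1)))
              ((= boundary len))
            (inserter! (vector-ref vec boundary) vec boundary)))))))

;; The call to insert! happens here, once, not on each later use.
(define sort-ascending! (insertion-sort! <=))   ; hypothetical name

(define v1 (vector 3 1 4 1 5 9 2 6))
(define v2 (vector 3 1 2))
(sort-ascending! v1)
(sort-ascending! v2)
;; State of v2 after each boundary step:
;;   boundary 0: #(3 1 2)   3 is trivially in place
;;   boundary 1: #(1 3 2)   1 moves in front of 3
;;   boundary 2: #(1 2 3)   2 moves between 1 and 3
v1   ; #(1 1 2 3 4 5 6 9)
v2   ; #(1 2 3)
```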