To search a data structure is to examine its elements singly
until one has either found an element that has a desired property or
concluded that the data structure contains no such element. For instance,
one might search a list of integers for an even element, or a vector of
pairs for a pair having the string "elephant" as its cdr. Scheme's
predefined assq, assv, and assoc procedures search
association lists.
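The sketch below illustrates the searches just mentioned. It uses assoc and assq, which return the first pair whose car matches the given key, or #f if there is none. assoc compares keys with equal? and assq compares them with eq?. The association list animals and the list-searching helper first-even are illustrative assumptions, not procedures from this reading.

```scheme
;; A small association list (illustrative assumption): string keys,
;; numeric values.
(define animals
  (list (cons "aardvark" 1) (cons "elephant" 2) (cons "zebra" 3)))

;; assoc compares keys with equal?, so string keys work.
(assoc "elephant" animals)    ; => ("elephant" . 2)
(assoc "giraffe" animals)     ; => #f

;; assq compares keys with eq?, which is reliable for symbols.
(assq 'b '((a . 1) (b . 2)))  ; => (b . 2)

;; first-even (hypothetical helper): a linear search of a list for an
;; even element; returns that element, or #f if there is none.
(define first-even
  (lambda (ls)
    (cond ((null? ls) #f)
          ((even? (car ls)) (car ls))
          (else (first-even (cdr ls))))))

(first-even (list 1 3 6 7 8))  ; => 6
```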
In a linear data structure, such as a flat list or a vector, there is an obvious algorithm for conducting a search: Start at the beginning of the data structure and traverse it, testing each element. Eventually one will either find an element that has the desired property or reach the end of the structure without finding such an element, thus conclusively proving that there is no such element. Here's a vector version of the linear-search algorithm:
;;; linear-search: find the position of an element in a given vector that
;;; satisfies a given predicate
;;; Givens:
;;;   TEST?, a unary predicate.
;;;   VEC, a vector.
;;; Result:
;;;   OUTCOME, either a natural number or #F.
;;; Precondition:
;;;   Every element of VEC satisfies the preconditions that TEST? imposes on
;;;   its argument.
;;; Postconditions:
;;;   (1) If no element of VEC satisfies TEST?, OUTCOME is #F.
;;;   (2) If at least one element of VEC satisfies TEST?, then OUTCOME is
;;;       a position in VEC, and the element at position OUTCOME in VEC
;;;       satisfies TEST?

(define linear-search
  (lambda (test? vec)
    (let ((len (vector-length vec)))
      (let kernel ((position 0))
        (cond ((= position len) #f)
              ((test? (vector-ref vec position)) position)
              (else (kernel (+ position 1))))))))
Here are two examples of the use of this procedure:
> (define sample (vector 1 3 5 7 8 11 13))
> (linear-search even? sample)
4
; (right-section = 12) behaves like (lambda (x) (= x 12))
> (linear-search (right-section = 12) sample)
#f
This search procedure returns #f if the search is unsuccessful; if
it is successful, it returns the position in the specified vector at which
the desired element can be found. There are many variants of this idea:
One might, for instance, prefer to signal an error or display a diagnostic
message if a search is unsuccessful. In the case of a successful search,
one might simply return #t (if all that is needed is an indication
of whether an element having the desired property is present in or absent
from the vector), or one might return the element found rather than its
position in the vector.
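Two of these variants might be sketched as follows, assuming the same loop structure as linear-search. The names linear-search-element and linear-search-or-error are hypothetical, and the error branch assumes the R7RS error procedure. The usage lines also carry out the earlier example of searching a vector of pairs for one whose cdr is the string "elephant".

```scheme
;; linear-search-element (hypothetical variant): like linear-search, but
;; returns the matching element itself rather than its position.
;; Caveat: the #F result is ambiguous if VEC may itself contain #F.
(define linear-search-element
  (lambda (test? vec)
    (let ((len (vector-length vec)))
      (let kernel ((position 0))
        (cond ((= position len) #f)
              ((test? (vector-ref vec position)) (vector-ref vec position))
              (else (kernel (+ position 1))))))))

;; linear-search-or-error (hypothetical variant): returns the position,
;; but signals an error when no element satisfies TEST?.
(define linear-search-or-error
  (lambda (test? vec)
    (let ((len (vector-length vec)))
      (let kernel ((position 0))
        (cond ((= position len)
               (error "linear-search-or-error: no element satisfies the test"))
              ((test? (vector-ref vec position)) position)
              (else (kernel (+ position 1))))))))

;; A vector of pairs; we want the pair whose cdr is "elephant".
(define pairs
  (vector (cons 1 "aardvark") (cons 2 "elephant") (cons 3 "zebra")))

(linear-search-element (lambda (pr) (equal? (cdr pr) "elephant")) pairs)
;; => (2 . "elephant")
```

The #t-or-#f variant needs no new loop at all: wrapping a call as (if (linear-search test? vec) #t #f) is enough.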
The linear-search algorithms just described can be quite slow if the data structure to be searched is large, since an unsuccessful search must examine every element. If one has a number of searches to carry out in the same data structure, it is often more efficient to "pre-process" the values, transferring them to a vector and sorting them, before starting those searches. One can then use the much faster binary-search algorithm.
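The pre-processing step might look like the sketch below: sort the values, then transfer them to a vector with list->vector. The sort here is a compact list-based merge sort standing in for the vector-based version from the reading on sorting by merging, which is not reproduced here; the names merge-lists, merge-sort-list, and sorted-vector are hypothetical.

```scheme
;; merge-lists (hypothetical): merge two lists already ordered by
;; MAY-PRECEDE? into one ordered list.  Ties take the left element first,
;; so the merge is stable.
(define merge-lists
  (lambda (may-precede? left right)
    (cond ((null? left) right)
          ((null? right) left)
          ((may-precede? (car left) (car right))
           (cons (car left) (merge-lists may-precede? (cdr left) right)))
          (else
           (cons (car right) (merge-lists may-precede? left (cdr right)))))))

;; merge-sort-list (hypothetical): deal the elements alternately into two
;; smaller lists, sort each recursively, and merge the results.
(define merge-sort-list
  (lambda (may-precede? ls)
    (if (or (null? ls) (null? (cdr ls)))
        ls
        (let split ((rest ls) (evens '()) (odds '()))
          (if (null? rest)
              (merge-lists may-precede?
                           (merge-sort-list may-precede? evens)
                           (merge-sort-list may-precede? odds))
              ;; Swapping the two accumulators on each step alternates them.
              (split (cdr rest) (cons (car rest) odds) evens))))))

;; sorted-vector (hypothetical): the pre-processing step, producing a
;; vector ordered by MAY-PRECEDE? that is ready for binary search.
(define sorted-vector
  (lambda (may-precede? ls)
    (list->vector (merge-sort-list may-precede? ls))))

(sorted-vector <= (list 13 1 8 5 11 3 7))  ; => #(1 3 5 7 8 11 13)
```

Merge sort takes time roughly proportional to n log n, so the pre-processing pays off only when enough searches follow it.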
Binary search is a more specialized algorithm than linear search. It requires a random-access structure, as opposed to one that offers only sequential access, and it is limited to the kind of test in which one is looking for a particular value that has a unique relative position in some ordering. For instance, one could use a binary search to look for an element equal to 12 in a vector of integers, since 12 is uniquely located between integers less than 12 and integers greater than 12; but one wouldn't use binary search to look for an even integer, since the even integers don't have a unique position in any natural ordering of the integers.
The idea in a binary search is to divide the sorted vector into two approximately equal parts, examining the element at the point of division to determine which of the parts must contain the value sought. As long as the part of the vector being searched is not empty, there are three possibilities:
The element at the point of division cannot precede the value sought in the ordering that was used to sort the vector. In this case, the value sought, if it is present at all, must be in a position with a lower index than the element at the point of division; in other words, it must be in the left half of the vector. The search procedure invokes itself recursively to search just the left half of the vector.
The value sought cannot precede the element at the point of division. In this case, the value sought, if it is present at all, must be in a higher-indexed position, in the right half of the vector. The search procedure invokes itself recursively to search just the right half of the vector.
The value sought is the element at the point of division. The search has succeeded.
There is one other way in which the recursion can bottom out: If, in some recursive call, the subvector to be searched (which will be half of a half of a half of ... of the original vector) contains no elements at all, then the search obviously cannot succeed and the procedure should take the appropriate failure action.
Here, then, is the basic binary-search algorithm. It is curried, so that
the ordering predicate is to be supplied first and separately; binary-search returns a customized searching procedure that one can, in
turn, apply to a vector and the item one is looking for. The identifiers
lower-bound and upper-bound denote the starting and ending
positions of the part of the vector within which the value sought must lie,
if it is present at all. As in the reading on sorting by merging, let's adopt the convention that the starting
position is "inclusive" (it is the first position that is in the
subvector) and the ending position is "exclusive" (it is the position
after the last position in the subvector). Under this convention, the
subvector is empty exactly when the two bounds are equal.
;;; binary-search: given an ordering predicate, construct and return a
;;; procedure that finds the position of a given value in a given vector
;;; ordered by that predicate
;;; Given:
;;;   MAY-PRECEDE?, a binary predicate.
;;; Result:
;;;   SEEKER, a binary procedure.
;;; Precondition:
;;;   MAY-PRECEDE? is an ordering relation.
;;; Postcondition:
;;;   Given a vector VEC, every element of which satisfies the
;;;   preconditions that MAY-PRECEDE? imposes on either of its arguments,
;;;   and which is ordered by MAY-PRECEDE?, and a value SOUGHT, SEEKER
;;;   returns either #F (if SOUGHT is not an element of VEC) or a zero-based
;;;   position in VEC at which SOUGHT occurs.

(define binary-search
  (lambda (may-precede?)
    (lambda (vec sought)
      (let kernel ((lower-bound 0)
                   (upper-bound (vector-length vec)))
        (if (< lower-bound upper-bound)
            (let* ((midpoint (quotient (+ lower-bound upper-bound) 2))
                   (middle-element (vector-ref vec midpoint)))
              (cond ((not (may-precede? middle-element sought))
                     ;; SOUGHT, if present, lies in the left half.
                     (kernel lower-bound midpoint))
                    ((not (may-precede? sought middle-element))
                     ;; SOUGHT, if present, lies in the right half.
                     (kernel (+ midpoint 1) upper-bound))
                    ;; Neither precedes the other: SOUGHT is here.
                    (else midpoint)))
            ;; Empty subvector: the search has failed.
            #f)))))
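In each recursive call to kernel, the length of the subvector within
which the value sought must lie, if it is present at all, is cut at least
in half. A vector of n elements can be halved only about log2 n times
before no elements remain, so a binary search of a million-element vector
examines at most 20 elements, where a linear search might examine all
million. Binary search is therefore typically much, much faster than
linear search.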
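To make the halving argument concrete, the sketch below repeats the loop of binary-search but counts how many elements it examines instead of returning a position. The names count-probes and big are hypothetical, introduced only for this illustration.

```scheme
;; count-probes (hypothetical): the same kernel loop as binary-search,
;; but it returns the number of elements examined rather than a position.
(define count-probes
  (lambda (may-precede? vec sought)
    (let kernel ((lower-bound 0)
                 (upper-bound (vector-length vec))
                 (probes 0))
      (if (< lower-bound upper-bound)
          (let* ((midpoint (quotient (+ lower-bound upper-bound) 2))
                 (middle-element (vector-ref vec midpoint)))
            (cond ((not (may-precede? middle-element sought))
                   (kernel lower-bound midpoint (+ probes 1)))
                  ((not (may-precede? sought middle-element))
                   (kernel (+ midpoint 1) upper-bound (+ probes 1)))
                  (else (+ probes 1))))
          ;; Empty subvector: report how many elements were examined.
          probes))))

;; big (hypothetical): a vector holding 0, 1, ..., 999999 in order.
(define big
  (let ((vec (make-vector 1000000)))
    (do ((i 0 (+ i 1)))
        ((= i 1000000) vec)
      (vector-set! vec i i))))

(count-probes <= big -1)      ; => 20 (unsuccessful search)
(count-probes <= big 500000)  ; => 1  (found at the first midpoint)
```

An unsuccessful search for -1 always moves left, so the upper bound runs 1000000, 500000, ..., 3, 1, 0, giving twenty probes. A linear search for the same value would examine all million elements.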