Radix sorting

Sorting a collection of records usually involves inspecting one field of each record, called the key, and arranging the records so that their keys are in ascending order. The keys may be natural numbers -- serial numbers, perhaps -- or they may be short strings of digits or other characters, such as the letters of a person's surname or the digits of a ZIP code.

If a key is a short array of characters or other values, or if, like a serial number, it can easily be converted into such an array, it is often possible to use a sorting method known as radix sorting. In radix sorting, one sets up a queue for each possible value of a component of the key. (For instance, if sorting by ZIP codes, one would set up ten queues, one for each digit from 0 to 9.) Then one distributes the records into these component queues by examining the last or least significant component of each key (so that, for instance, all of the records with ZIP codes ending in 0 would be placed in the 0 queue, all those with ZIP codes ending in 1 in the 1 queue, and so on). Next, one reconstructs the full collection by taking all of the elements in the 0 queue, then all of the elements in the 1 queue, and so on in order; the result is a master queue in which the records are sorted by their last digit.
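A single cycle of distribution and reconstruction can be sketched as follows. This is a minimal illustration in Python rather than HP Pascal; the function name `distribute_by_last_digit` and the sample ZIP codes are hypothetical, and each key is assumed to be a string of decimal digits.

```python
from collections import deque

def distribute_by_last_digit(master):
    """One cycle: distribute records by the last digit of the key, then rebuild."""
    # One component queue per possible digit, 0 through 9.
    queues = [deque() for _ in range(10)]
    # Distribution: remove records from the front of the master queue.
    while master:
        key = master.popleft()
        queues[int(key[-1])].append(key)
    # Reconstruction: empty the 0 queue, then the 1 queue, and so on.
    for q in queues:
        while q:
            master.append(q.popleft())
    return master

zips = deque(["50112", "10001", "94110", "60614", "02139", "50311"])
print(list(distribute_by_last_digit(zips)))
```

With this sample input the master queue ends up as 94110, 10001, 50311, 50112, 60614, 02139: sorted by last digit only, with 10001 still ahead of 50311 because it was ahead of it to begin with.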

The next step is to redistribute the records into the component queues, this time according to the next-to-last component of the key, and to reconstruct the master queue from the component queues in the same way as before. Since the distribution process is stable, in the sense that records placed in the same component queue keep the relative order they had in the master queue, the resulting master queue is correctly sorted by the last two digits of the key.
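The effect of stability is easiest to see on a small example. The sketch below, again in Python and not part of the original program, runs two cycles over hypothetical two-digit keys. Each record is a (key, label) pair, and the labels exist only to make the order of records with equal keys visible.

```python
from collections import deque

def one_pass(master, position):
    """Distribute (key, label) records by key[position], then rebuild the master queue."""
    queues = [deque() for _ in range(10)]
    while master:
        record = master.popleft()
        queues[int(record[0][position])].append(record)
    for q in queues:
        master.extend(q)  # appends in each queue's first-in, first-out order
    return master

records = deque([("21", "a"), ("12", "b"), ("22", "c"), ("11", "d"), ("12", "e")])
one_pass(records, 1)            # cycle 1: last digit
after_first = list(records)
one_pass(records, 0)            # cycle 2: next-to-last digit
after_second = list(records)
print(after_first)
print(after_second)
```

After the first cycle the order of the keys is 21, 11, 12, 22, 12; after the second it is 11, 12, 12, 21, 22. The two records with key 12 stay in their original order (b before e), and within each leading digit the records keep the last-digit order established by the first cycle.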

By repeating the distribution and reconstruction steps for each component of the key, from least significant to most significant, one eventually obtains a completely sorted master queue. (If one is sorting by five-digit ZIP codes, for instance, five cycles of distribution and reconstruction are needed.)
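Putting the cycles together, and converting integer keys such as serial numbers into arrays of digits first, might look like the following Python sketch. The names `KEY_SIZE`, `to_key`, and `radix_sort` are illustrative assumptions, as is the choice of five decimal digits per key.

```python
from collections import deque

KEY_SIZE = 5  # assumed: five decimal digits per key, as for ZIP codes

def to_key(serial):
    """Convert a non-negative integer into KEY_SIZE digits, most significant first."""
    if not 0 <= serial < 10 ** KEY_SIZE:
        raise ValueError("serial number does not fit in KEY_SIZE digits")
    return [int(ch) for ch in str(serial).zfill(KEY_SIZE)]

def radix_sort(serials):
    """Sort integers with one distribution/reconstruction cycle per digit."""
    master = deque((to_key(s), s) for s in serials)
    # Work from the least significant position to the most significant.
    for position in range(KEY_SIZE - 1, -1, -1):
        queues = [deque() for _ in range(10)]
        while master:
            record = master.popleft()
            queues[record[0][position]].append(record)
        for q in queues:
            while q:
                master.append(q.popleft())
    return [serial for _, serial in master]

print(radix_sort([50112, 2139, 94110, 10001, 60614, 2139, 7]))
```

Padding each number with leading zeros gives every key the same number of components, so that the same position in every key has the same significance.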

Here is an HP Pascal procedure that implements this algorithm:

const
  KeySize = { the number of components in a key };
  Least = { the least possible value for one component of a key };
  Greatest = { the greatest possible value for one component of a key };
type
  Component = Least .. Greatest;
  KeyType = array [1 .. KeySize] of Component;
  Element = record
              Key: KeyType;
              { presumably other fields as well }
            end;

procedure RadixSort (var Master: Queue);

var
  Val: Component;
    { runs through the possible values of a component of the key }
  SmallQueue: array [Component] of Queue;
    { a queue for each of those possible values }
  Position: 1 .. KeySize;
    { runs through the positions of the components within a key }
  Item: Element;
    { one item at a time from the master queue }

begin

  { Set up the component queues. }

  for Val := Least to Greatest do
    SmallQueue[Val] := CreateQueue;

  { Run through a cycle of distribution and reconstruction for each
    component of the key. }

  for Position := KeySize downto 1 do begin

    { Distribute items from the master queue into the component queues. }

    while not EmptyQueue (Master) do begin
      Item := Dequeue (Master);
      Enqueue (Item, SmallQueue[Item.Key[Position]])
    end;

    { Reconstruct the master queue. }

    for Val := Least to Greatest do
      while not EmptyQueue (SmallQueue[Val]) do
        Enqueue (Dequeue (SmallQueue[Val]), Master)

  end;

  { Recycle the (empty) component queues. }

  for Val := Least to Greatest do
    DeallocateQueue (SmallQueue[Val]);

end;

This implementation presupposes the existence of the five basic queue operations CreateQueue, EmptyQueue, Dequeue, Enqueue, and DeallocateQueue. Here is a module that provides them, implementing queues as singly-linked lists with a header containing pointers to the first and last components:

{ This module defines an interface for a queue data type and implements it
  for HP 9000 Series 700 workstations under HP-UX 9.x, using HP Pascal.

  Programmer: John Stone, Grinnell College.
  Original version: April 18, 1996.
  Last revised: August 5, 1996.
}

{ The Dispose procedure does not actually recycle storage unless the
  heap_dispose compiler option is turned on. }

$heap_dispose on$

module Queues;

$search 'queue-element.o'$
import Element;

export

  type
    Queue = ^QueueHeader;

  { The CreateQueue function constructs and returns an empty queue capable
    of holding any number of elements. }

  function CreateQueue: Queue;

  { The EmptyQueue function determines whether a given queue is empty. }

  function EmptyQueue (Q: Queue): Boolean;

  { The Dequeue function extracts the oldest element from a non-empty queue
    and returns it.  It is an error to give an empty queue as the argument
    to Dequeue. }

  function Dequeue (var Q: Queue): Element;

  { The Enqueue procedure adds an element at the end of an existing
    queue. }

  procedure Enqueue (Item: Element; var Q: Queue);

  { The DeallocateQueue procedure recycles all the storage associated with
    a given queue, leaving its argument undefined. }

  procedure DeallocateQueue (var Q: Queue);

implement

  import
    StdErr;

  const

    { The following constants are more or less arbitrary integers
      signifying various kinds of exceptions that can occur within this
      module. }

    FirstExceptionCode = 1;

    UninitializedQueueException = 1;
    DequeueException = 2;
    ExceptionException = 3;

    LastExceptionCode = 3;

  type
    Link = ^QueueComponent;
    QueueComponent = record
                       Datum: Element;
                       Next: Link;
                     end;
    QueueHeader = record
                    Front, Rear: Link
                  end;

  { The QueueExceptionHandler procedure, which is not exported, is invoked
    whenever one of the preconditions for the successful execution of a
    procedure is found to be false.  It prints out an appropriate
    explanation of the exception just before the program is halted. }

  procedure QueueExceptionHandler (ExceptionCode: integer);
  begin
    if (ExceptionCode < FirstExceptionCode) or
           (LastExceptionCode < ExceptionCode) then
      ExceptionCode := ExceptionException;
    Write (StdErr, 'Exception #', ExceptionCode : 1, ' in module Queues: ');
    case ExceptionCode of
      UninitializedQueueException:
        WriteLn (StdErr, 'An operation was applied to an uninitialized ',
                 'queue.');
      DequeueException:
        WriteLn (StdErr, 'An empty queue was passed as argument to the ',
                 'Dequeue function.');
      ExceptionException:
        WriteLn (StdErr, 'The QueueExceptionHandler procedure received ',
                 'an unknown exception code.');
    end
  end;

  function CreateQueue: Queue;
  var
    Result: Queue;
      { the queue that is constructed }
  begin
    New (Result);
    Result^.Front := Nil;
    Result^.Rear := Nil;
    CreateQueue := Result
  end;

  function EmptyQueue (Q: Queue): Boolean;
  begin
    Assert (Q <> Nil, UninitializedQueueException, QueueExceptionHandler);
    EmptyQueue := (Q^.Front = Nil)
  end;

  function Dequeue (var Q: Queue): Element;
  var
    OldLink: Link;
      { a pointer to the component to be removed from the queue }
  begin
    Assert (Q <> Nil, UninitializedQueueException, QueueExceptionHandler);
    Assert (Q^.Front <> Nil, DequeueException, QueueExceptionHandler);
    Dequeue := Q^.Front^.Datum;
    OldLink := Q^.Front;
    Q^.Front := OldLink^.Next;
    if Q^.Front = Nil then
      Q^.Rear := Nil;
    Dispose (OldLink)
  end;

  procedure Enqueue (Item: Element; var Q: Queue);
  var
    NewLink: Link;
      { a pointer to the component to be added to the queue }
  begin
    Assert (Q <> Nil, UninitializedQueueException, QueueExceptionHandler);
    New (NewLink);
    NewLink^.Datum := Item;
    NewLink^.Next := Nil;
    if Q^.Rear = Nil then
      Q^.Front := NewLink
    else
      Q^.Rear^.Next := NewLink;
    Q^.Rear := NewLink
  end;

  procedure DeallocateQueue (var Q: Queue);
  var
    Traverser: Link;
      { a pointer to successive components of the underlying linked list }
    Trailer: Link;
      { a similar pointer, lagging one component behind Traverser }
  begin
    Assert (Q <> Nil, UninitializedQueueException, QueueExceptionHandler);
    Traverser := Q^.Front;
    while Traverser <> Nil do begin
      Trailer := Traverser;
      Traverser := Traverser^.Next;
      Dispose (Trailer)
    end;
    Dispose (Q);
    Q := Nil
  end;

end.

A much faster implementation of the radix sort can be obtained by manipulating the Link pointers directly; for instance, instead of using Dequeue and Enqueue to transfer records one at a time from the component queues into the master queue, one could rebuild the master queue by linking the last item in each component queue to the first item in the next. However, the handling of the special cases that arise when some of the component queues are empty obscures the working of the radix-sorting algorithm, so the slower but simpler version is presented here.
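To give an idea of what the pointer-splicing reconstruction involves, here is a rough sketch in Python rather than HP Pascal. The classes `Node` and `LinkedQueue` and the function `splice` are hypothetical stand-ins for the QueueComponent and QueueHeader records above; they are not part of the Queues module.

```python
class Node:
    def __init__(self, datum):
        self.datum = datum
        self.next = None

class LinkedQueue:
    """Singly-linked queue with front and rear pointers."""
    def __init__(self):
        self.front = None
        self.rear = None

    def enqueue(self, datum):
        node = Node(datum)
        if self.rear is None:
            self.front = node
        else:
            self.rear.next = node
        self.rear = node

    def to_list(self):
        out, node = [], self.front
        while node is not None:
            out.append(node.datum)
            node = node.next
        return out

def splice(master, small_queues):
    """Rebuild an empty master queue by linking the component queues end to end."""
    for q in small_queues:
        if q.front is None:
            continue                    # special case: skip an empty component queue
        if master.rear is None:
            master.front = q.front      # first non-empty queue supplies the front
        else:
            master.rear.next = q.front  # link last item so far to the next first item
        master.rear = q.rear
        q.front = q.rear = None         # the component queue is empty again
    return master

queues = [LinkedQueue() for _ in range(10)]
for key in ["94110", "10001", "50311", "50112"]:
    queues[int(key[-1])].enqueue(key)
master = splice(LinkedQueue(), queues)
print(master.to_list())
```

In this sketch the reconstruction step does work proportional to the number of component queues rather than to the number of records, and it allocates and disposes of no list components. The two tests inside `splice` are the special cases mentioned above.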


This document is available on the World Wide Web as

http://www.math.grin.edu/~stone/courses/fundamentals/radix-sorting.html

created April 18, 1996
last revised August 16, 1996