Binary-coded decimal representations

The cost of converting integer data

Some application programs use integer data without ever operating on it arithmetically, usually because the integers are simply serial numbers or identification keys for which arithmetic operations are insignificant. (What would it mean to add two Social Security numbers, for instance?) In these cases, it is inefficient to spend time converting each integer numeral into the corresponding numeric value for the computer to use internally, and then converting it back again when the number is to be printed or displayed. The rationale for the conversions is that integer values can be operated on arithmetically; if no arithmetic is to be done, then there's no point in using integer variables to hold the data at all.

In fact, there are cases in which converting the data back and forth actually degrades it. For instance, when Social Security numbers are converted to and from integer values, the SSNs that begin with one or more leading zeroes lose those digits, so that (for instance) the number that is read in as 004-79-4226 becomes 4-79-4226 when it is written out.
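
The loss of leading zeroes is easy to reproduce. The following is only a minimal sketch, written for a Free Pascal-style compiler (the length function and the longint type are assumptions about the dialect; the program and its names are hypothetical and not part of the original course code). It converts a nine-digit numeral to its numeric value and prints both forms:

```pascal
program leading_zero_demo;
{ Hypothetical demonstration: converting a numeral to an integer value
  discards its leading zeroes. }
var
  numeral: string[9];
    { the Social Security number 004-79-4226, without the hyphens }
  value: longint;
    { the numeric value that the numeral expresses }
  position: integer;
    { counts off character positions in the numeral }
begin
  numeral := '004794226';

  { Build up the numeric value one digit at a time. }
  value := 0;
  for position := 1 to length (numeral) do
    value := 10 * value + (ord (numeral[position]) - ord ('0'));

  writeln (numeral);      { prints 004794226 }
  writeln (value : 1)     { prints 4794226 }
end.
```

Nothing in the integer value records how many digits the original numeral had, so the two leading zeroes cannot be recovered unless the program separately knows that every number is to be printed with nine digits.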

Storing integers as character strings

In most such cases, it is sufficient to treat the integers as strings that happen to be composed of digit characters. In Pascal, comparisons between such strings give the same results as comparisons between the integer values they express, provided that the strings are all of the same length (with any leading zeroes retained), so the string representation is adequate even in applications that involve sorting and searching.
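
Here is a small sketch of string comparison at work, again assuming a Free Pascal-style compiler in which strings can be compared with the < operator (the program and variable names are hypothetical). It also shows why the digit strings should all have the same length: a shorter numeral can compare as greater than a longer one.

```pascal
program compare_demo;
{ Hypothetical demonstration of comparing digit strings. }
type
  digit_text = string[9];
var
  first, second: digit_text;
    { two numerals of the same length }
  short_one, long_one: digit_text;
    { two numerals of different lengths }
begin
  first := '004794226';
  second := '123456789';
  { Same length: character-by-character order agrees with numeric order. }
  writeln (first < second);         { prints TRUE }

  short_one := '9';
  long_one := '10';
  { Different lengths: '9' follows '1', so the string order disagrees
    with the numeric order. }
  writeln (short_one < long_one)    { prints FALSE }
end.
```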

However, the string representation is not very economical in its use of space. Even though there are only ten possible digits, a full byte is required for each one. A nine-digit Social Security number requires nine bytes of memory; if it is converted to an integer, it requires only four bytes. If you have lots of Social Security numbers to keep track of and not all that much available memory, this difference could be significant. There's a trade-off between economical use of time (string representations) and economical use of memory (integer representations).

The binary-coded decimal representation

The binary-coded decimal representation of integers combines the advantages of the two approaches; on some machines, at least, it's almost as fast as the string representation and almost as small as the integer representation. Also, the common arithmetical operations of addition, subtraction, and comparison can be programmed fairly straightforwardly (or even, on some machines, provided in hardware, as processor instructions). The idea of the BCD representation is to take each decimal digit individually and convert it to an integer that can be stored in four bits: 0000 for the digit 0, 0001 for 1, 0010 for 2, and so on up to 1001 for 9. Two of these four-bit representations can be packed, side by side, into one byte, so you wind up using only half as many bytes as there are digits in the decimal numeral. A nine-digit Social Security number thus occupies four and a half (or, realistically, five) bytes of storage.
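
To make the packing concrete, here is a minimal sketch that places two digits in one byte by ordinary arithmetic: multiplying by 16 shifts a digit into the left four bits, and div and mod recover the two halves. The program and its names are hypothetical; a compiler that packs subrange values for you does the equivalent work behind the scenes.

```pascal
program nibble_demo;
{ Hypothetical illustration of storing two decimal digits in one byte. }
var
  high_digit, low_digit: 0 .. 9;
    { the two decimal digits to be stored }
  packed_byte: 0 .. 255;
    { one byte's worth of storage }
begin
  high_digit := 4;
  low_digit := 2;

  { The left four bits hold one digit, the right four bits the other. }
  packed_byte := 16 * high_digit + low_digit;
  writeln (packed_byte : 1);           { prints 66, binary 0100 0010 }

  { Recover the two digits from the byte. }
  writeln (packed_byte div 16 : 1);    { prints 4 }
  writeln (packed_byte mod 16 : 1)     { prints 2 }
end.
```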

Since the conversion from a string of digit characters to the BCD representation simply involves subtracting ord ('0') from each character and storing the result into the appropriate four bits of the appropriate byte, it's much faster than a full conversion to integer. For example, here's how it might be coded in HP Pascal, assuming a left-justified string of no more than MAXLEN digits, where MAXLEN is a constant defined elsewhere in the program. The digits are right-justified in the result, and any unused positions on the left are filled with zeroes:

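type
  decimal_digit = 0 .. 9;
  bcd = packed array [1 .. MAXLEN] of decimal_digit;
  digit_string_type = string[MAXLEN];
    { named so that it can be used as the type of a formal parameter }

procedure digit_string_to_bcd (digit_string: digit_string_type;
                               var result: bcd);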
var
  bcd_position: integer;
    { counts off half-bytes in the BCD representation }
  position: integer;
    { counts off character positions in the string representation }
begin
  bcd_position := MAXLEN;
  for position := strlen (digit_string) downto 1 do begin
    result[bcd_position] := ord (digit_string[position]) - ord ('0');
    bcd_position := bcd_position - 1
  end;
  { fill any unused positions on the left with zeroes }
  for bcd_position := bcd_position downto 1 do
    result[bcd_position] := 0
end;

The use of the packed array type ensures (in HP Pascal) that the four-bit representations of values in the integer subrange 0 .. 9 will be stored two to a byte.
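
The conversion in the other direction, for printing or display, is just as simple: add ord ('0') to each digit and apply chr. The following is only a sketch under stated assumptions. It is written for a Free Pascal-style compiler (string concatenation with + is an assumption about the dialect, and HP Pascal's string procedures differ), and the names bcd_to_digit_string and digit_string_type, along with the value chosen for MAXLEN, are hypothetical.

```pascal
program bcd_output_demo;
{ Hypothetical sketch: converting a BCD value back to a digit string. }
const
  MAXLEN = 9;    { assumed here: the length of a Social Security number }
type
  decimal_digit = 0 .. 9;
  bcd = packed array [1 .. MAXLEN] of decimal_digit;
  digit_string_type = string[MAXLEN];

procedure bcd_to_digit_string (source: bcd;
                               var result_string: digit_string_type);
var
  position: integer;
    { counts off digit positions, from left to right }
begin
  result_string := '';
  for position := 1 to MAXLEN do
    result_string := result_string + chr (source[position] + ord ('0'))
end;

var
  numeral, text_form: digit_string_type;
  number: bcd;
  position: integer;
begin
  { Build the BCD value directly from a full-length numeral. }
  numeral := '004794226';
  for position := 1 to MAXLEN do
    number[position] := ord (numeral[position]) - ord ('0');

  bcd_to_digit_string (number, text_form);
  writeln (text_form)    { prints 004794226 }
end.
```

Because every digit position is stored, leading zeroes included, the numeral that comes out is the same as the one that went in.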

Here's how to add two BCD representations:

procedure add_bcd (augend, addend: bcd; var sum: bcd);
var
  position: integer;
    { counts off digit positions, from right to left }
  carry: integer;
    { a carry from one digit position into the neighboring position to
      the left }
  place_sum: integer;
    { the sum of the digits in one digit position and the carry into that
      position }
begin
  carry := 0;
  for position := MAXLEN downto 1 do begin
    place_sum := augend[position] + addend[position] + carry;
    if place_sum < 10 then begin
      sum[position] := place_sum;
      carry := 0
    end
    else begin
      sum[position] := place_sum - 10;
      carry := 1
    end
  end
end;

An overflow occurs if the value of carry is 1 when the for-loop is finished; a fully developed implementation would include code to handle this situation.
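
One possible way to handle the overflow is to report it to the caller through an extra variable parameter. The following is a sketch of that idea, not a definitive implementation: the procedure name add_bcd_checked, the overflow parameter, and the small value of MAXLEN are all hypothetical, and the sketch uses mod and div in place of the if-statement above to split place_sum into a digit and a carry (the results are the same).

```pascal
program add_overflow_demo;
{ Hypothetical sketch: BCD addition that reports overflow to the caller. }
const
  MAXLEN = 3;    { kept small so that overflow is easy to provoke }
type
  decimal_digit = 0 .. 9;
  bcd = packed array [1 .. MAXLEN] of decimal_digit;

procedure add_bcd_checked (augend, addend: bcd; var sum: bcd;
                           var overflow: Boolean);
var
  position: integer;
    { counts off digit positions, from right to left }
  carry: integer;
    { a carry into the neighboring position to the left }
  place_sum: integer;
    { the sum of the digits in one position and the carry into it }
begin
  carry := 0;
  for position := MAXLEN downto 1 do begin
    place_sum := augend[position] + addend[position] + carry;
    sum[position] := place_sum mod 10;
    carry := place_sum div 10
  end;
  { A carry out of the leftmost position means the sum did not fit. }
  overflow := (carry = 1)
end;

var
  a, b, total: bcd;
  overflowed: Boolean;
begin
  a[1] := 9;  a[2] := 9;  a[3] := 9;    { 999 }
  b[1] := 0;  b[2] := 0;  b[3] := 1;    { 001 }
  add_bcd_checked (a, b, total, overflowed);
  writeln (total[1] : 1, total[2] : 1, total[3] : 1);    { prints 000 }
  writeln (overflowed)                                   { prints TRUE }
end.
```

What the caller should do when overflow is TRUE (report an error, widen the representation, and so on) depends on the application.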

Pascal doesn't permit direct comparison of two BCD representations, but comparison functions are easily written and reasonably efficient. As an example, here's a "less than" function:

function less_than_bcd (first, second: bcd): Boolean;
var
  finished: Boolean;
    { indicates whether more digits must be compared }
  position: integer;
    { counts off digit positions in both bcds }
  first_digit, second_digit: integer;
    { corresponding single-digit values from first and second }
begin
  finished := FALSE;
  position := 1;
  while not finished do begin
    first_digit := first[position];
    second_digit := second[position];
    if first_digit < second_digit then begin
      less_than_bcd := TRUE;
      finished := TRUE
    end
    else if second_digit < first_digit then begin
      less_than_bcd := FALSE;
      finished := TRUE
    end
    else if position = MAXLEN then begin { identical bcds }
      less_than_bcd := FALSE;
      finished := TRUE
    end
    else
      position := position + 1
  end
end;

The reason for recovering the individual digits and storing them in separate variables is that the subscripting operation on packed arrays is less efficient than for ordinary arrays; the processor must recover a whole byte and then extract just the part of the byte in which the array element is stored. So it makes sense to perform that operation only once and to save the result.

Using pack and unpack

Indeed, if your program calls for a lot of subscripting into a value that is stored as a packed array, it often makes sense to use the predefined procedure unpack to create a non-packed version of the array, operate on the result, and then use pack to assemble any results or changes back into the smaller structure. For instance, one might revise the add_bcd procedure above so that all the subscripting is performed on non-packed arrays:

procedure add_bcd (augend, addend: bcd; var sum: bcd);
type
  non_packed_bcd = array [1 .. MAXLEN] of decimal_digit;
var
  np_augend, np_addend, np_sum: non_packed_bcd;
    { non-packed versions of the augend, addend, and sum }
  position: integer;
    { counts off digit positions, from right to left }
  carry: integer;
    { a carry from one digit position into the neighboring position to
      the left }
  place_sum: integer;
    { the sum of the digits in one digit position and the carry into that
      position }
begin
  unpack (augend, np_augend, 1);
  unpack (addend, np_addend, 1);
  carry := 0;
  for position := MAXLEN downto 1 do begin
    place_sum := np_augend[position] + np_addend[position] + carry;
    if place_sum < 10 then begin
      np_sum[position] := place_sum;
      carry := 0
    end
    else begin
      np_sum[position] := place_sum - 10;
      carry := 1
    end
  end;
  pack (np_sum, 1, sum)
end;

Whether this technique actually saves any time can be expected to vary from one machine architecture to another and from one Pascal compiler to another.


This document is available on the World Wide Web as

http://www.math.grin.edu/~stone/courses/fundamentals/bcd.html

created January 10, 1996
last revised January 10, 1996