Real numbers

Real numbers as an abstract data type

The real numbers -- mathematically characterized as least upper bounds of non-empty bounded sets of rational numbers -- can also be treated as an abstract data type, though they pose even more difficult implementation problems than the integers. Since the set of real numbers is indenumerable, no system of numeration can give a name to every one of them, and no implementation can provide a representation for each one; the implementer must choose which ones shall be represented exactly and which ones represented by approximations. Indeed, between any two distinct real values, there are indenumerably many others, so that an implementation can exactly represent only a subset of the real numbers in any range.

This handout uses ordinary decimal numeration for reals, so that it represents exactly only those real numbers that can be written as an integer multiplied by an integer power of 10.

Two reals that cannot be represented exactly in this system of numeration are important enough to receive names:

pi, the ratio of the circumference of a circle to its diameter.

e, the base of natural logarithms.

It would be pleasant if all of the operations defined on real numbers were continuous. (Loosely speaking, an operation is continuous if applying it to any sequence of real values that approach a limit yields values that also converge on a limit.) Unfortunately, many common and necessary mathematical operations, such as taking the reciprocal of a real number, do not have this property: taking the reciprocals of a succession of reals that approach zero as a limit yields a sequence of results that diverges to infinity. Hence the use of approximate representations can have devastating effects in some computations; a tiny rounding error can be disproportionately inflated when such operations are applied.
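
The inflation of a rounding error by the reciprocal operation can be made concrete with a small sketch. The program below is only an illustration and not part of the proposed interface: the function name ReciprocalDrift and the sample values are assumptions chosen for the example. It perturbs an operand near zero by a small error and reports how far the reciprocal moves.

```pascal
program ReciprocalError;

{ Hypothetical helper: the absolute change in the reciprocal of Operand
  when Operand is perturbed by Error.  Assumes that neither Operand nor
  Operand + Error is 0.0. }
function ReciprocalDrift (Operand, Error: Real): Real;
begin
  ReciprocalDrift := Abs (1 / (Operand + Error) - 1 / Operand)
end;

begin
  { An error of one ten-millionth in an operand of one millionth ... }
  WriteLn ('Error in operand:    ', 1.0E-7);
  { ... shifts the reciprocal by roughly ninety thousand. }
  WriteLn ('Error in reciprocal: ', ReciprocalDrift (1.0E-6, 1.0E-7))
end.
```

Mathematically the drift is Error / (Operand * (Operand + Error)), which is about 90909 for these values. The same error applied to an operand of 1.0 would shift the reciprocal by less than 1.0E-7.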

The following operations belong to my proposed interface for the real data type. First, the basic arithmetic operations:

negate
Input: negand, a real number.
Output: result, a real number.
Preconditions: none.
Postcondition: The sum of negand and result is 0.0.

absolute-value
Input: operand, a real number.
Output: result, a real number.
Preconditions: none.
Postcondition: result is the magnitude of operand, that is, its distance (in either direction) from 0.0.

add
Inputs: augend and addend, both real numbers.
Output: sum, a real number.
Preconditions: none.
Postcondition: sum is the sum of augend and addend.

subtract
Inputs: minuend and subtrahend, both real numbers.
Output: difference, a real number.
Preconditions: none.
Postcondition: minuend is the sum of difference and subtrahend.

multiply
Inputs: multiplicand and multiplier, both real numbers.
Output: product, a real number.
Preconditions: none.
Postcondition: product is the product of multiplicand and multiplier.

divide
Inputs: dividend and divisor, both real numbers.
Output: quotient, a real number.
Precondition: divisor is not 0.0.
Postcondition: dividend is the product of quotient and divisor.

modulo
Inputs: moduland and modulus, both real numbers.
Output: result, a real number.
Precondition: modulus is not 0.0.
Postconditions: The difference between moduland and result is an exact integer multiple of modulus. The magnitude of result is less than the magnitude of modulus. If result is not 0.0, its sign is the same as the sign of modulus.

fractional-part
Input: operand, a real number.
Output: result, a real number.
Preconditions: none.
Postconditions: The difference between operand and result is an integer. result is greater than -1.0 and less than 1.0. If result is not 0.0, it has the same sign as operand.

whole
Input: operand, a real number.
Output: result, a Boolean.
Preconditions: none.
Postcondition: result is true if operand is an integer value, a whole number, and false if it is not.

exponential
Input: operand, a real number.
Output: result, a real number.
Preconditions: none.
Postcondition: result is the result of raising e to the power operand.

natural-logarithm
Input: operand, a real number.
Output: result, a real number.
Precondition: operand is positive.
Postcondition: operand is the result of raising e to the power result.

raise
Inputs: base and exponent, both real numbers.
Output: power, a real number.
Preconditions: Either exponent is an integer or base is not negative. Either exponent is not negative or base is not 0.0.
Postconditions: power is the result of raising base to the power of exponent. If both base and exponent are 0.0, power is 1.0.

logarithm
Inputs: base and power, both real numbers.
Output: exponent, a real number.
Preconditions: Both base and power are positive. base is not 1.0.
Postcondition: power is the result of raising base to the power of exponent.

The next seven operations are special cases of the preceding ones that occur frequently enough to be treated separately:

twice
Input: operand, a real number.
Output: result, a real number.
Preconditions: none.
Postcondition: result is the product of operand and 2.0.

half
Input: operand, a real number.
Output: result, a real number.
Preconditions: none.
Postcondition: operand is the product of result and 2.0.

reciprocal
Input: operand, a real number.
Output: result, a real number.
Precondition: operand is not 0.0.
Postcondition: The product of operand and result is 1.0.

square
Input: operand, a real number.
Output: result, a real number.
Preconditions: none.
Postcondition: result is the result of raising operand to the power 2.0.

cube
Input: operand, a real number.
Output: result, a real number.
Preconditions: none.
Postcondition: result is the result of raising operand to the power 3.0.

square-root
Input: operand, a real number.
Output: result, a real number.
Precondition: operand is not negative.
Postcondition: result is the result of raising operand to the power 0.5.

binary-logarithm
Input: operand, a real number.
Output: result, a real number.
Precondition: operand is positive.
Postcondition: operand is the result of raising 2.0 to the power result.

Next, some trigonometric functions:

sine
Input: operand, a real number.
Output: result, a real number.
Preconditions: none.
Postcondition: result is the sine of operand.

cosine
Input: operand, a real number.
Output: result, a real number.
Preconditions: none.
Postcondition: result is the cosine of operand.

tangent
Input: operand, a real number.
Output: result, a real number.
Precondition: The result of dividing operand by half of pi is not an odd integer.
Postcondition: result is the tangent of operand.

arc-sine
Input: operand, a real number.
Output: result, a real number.
Precondition: operand is greater than or equal to -1.0 and less than or equal to 1.0.
Postconditions: operand is the sine of result. result is greater than or equal to half of the negative of pi and less than or equal to half of pi.

arc-cosine
Input: operand, a real number.
Output: result, a real number.
Precondition: operand is greater than or equal to -1.0 and less than or equal to 1.0.
Postconditions: operand is the cosine of result. result is non-negative and less than or equal to pi.

arc-tangent
Input: operand, a real number.
Output: result, a real number.
Preconditions: none.
Postconditions: operand is the tangent of result. result is greater than the negative of half of pi and less than half of pi.

ratio-arc-tangent
Inputs: numerator and denominator, both real numbers.
Output: result, a real number.
Preconditions: Either numerator is not 0.0 or denominator is not 0.0.
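Postconditions: result is greater than the negative of pi and less than or equal to pi. The sine of result has the same sign as numerator, and the cosine of result has the same sign as denominator (0.0 counting as a sign of its own). If denominator is not 0.0, the tangent of result is the quotient of numerator and denominator.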

There are four common ways of mapping real numbers to nearby integers:

round
Input: operand, a real number.
Output: result, an integer.
Preconditions: none.
Postconditions: If there is an integer that differs from operand by less than any other integer, that integer is result. If there are two integers that differ from operand by 0.5, result is the even one.

truncate
Input: operand, a real number.
Output: result, an integer.
Preconditions: none.
Postcondition: Either operand is negative and result is the least integer not less than operand, or operand is non-negative and result is the greatest integer not greater than operand.

floor
Input: operand, a real number.
Output: result, an integer.
Preconditions: none.
Postcondition: result is the greatest integer not greater than operand.

ceiling
Input: operand, a real number.
Output: result, an integer.
Preconditions: none.
Postcondition: result is the least integer not less than operand.

Next, the comparison operations:

equal
Inputs: left-operand and right-operand, both real numbers.
Output: result, a Boolean.
Preconditions: none.
Postcondition: result is true if the operands are the same real number, false if they are different real numbers.

unequal
Inputs: left-operand and right-operand, both real numbers.
Output: result, a Boolean.
Preconditions: none.
Postcondition: result is true if the operands are different real numbers, false if they are the same real number.

less
Inputs: left-operand and right-operand, both real numbers.
Output: result, a Boolean.
Preconditions: none.
Postcondition: result is true if left-operand is less than right-operand, false if it is greater than right-operand or if both operands are the same real number.

less-or-equal
Inputs: left-operand and right-operand, both real numbers.
Output: result, a Boolean.
Preconditions: none.
Postcondition: result is true if left-operand is less than right-operand or if both operands are the same real number, false if left-operand is greater than right-operand.

greater
Inputs: left-operand and right-operand, both real numbers.
Output: result, a Boolean.
Preconditions: none.
Postcondition: result is true if left-operand is greater than right-operand, false if it is less than right-operand or if both operands are the same real number.

greater-or-equal
Inputs: left-operand and right-operand, both real numbers.
Output: result, a Boolean.
Preconditions: none.
Postcondition: result is true if left-operand is greater than right-operand or if both operands are the same real number, false if left-operand is less than right-operand.

major
Inputs: left-operand and right-operand, both real numbers.
Output: result, a real number.
Preconditions: none.
Postcondition: result is the greater of the operands.

minor
Inputs: left-operand and right-operand, both real numbers.
Output: result, a real number.
Preconditions: none.
Postcondition: result is the lesser of the operands.

Again, some special cases of the preceding predicates are worth defining:

zero
Input: operand, a real number.
Output: result, a Boolean.
Preconditions: none.
Postcondition: result is true if operand is 0.0, false if it is any other real.

negative
Input: operand, a real number.
Output: result, a Boolean.
Preconditions: none.
Postcondition: result is true if operand is less than 0.0, false if it is greater or if it is 0.0.

positive
Input: operand, a real number.
Output: result, a Boolean.
Preconditions: none.
Postcondition: result is true if operand is greater than 0.0, false if it is less or if it is 0.0.

Finally, the input and output operations:

read
Input: source, a data source (e.g., a file, the keyboard, a device).
Outputs: legend, a real number, and success, a Boolean.
Preconditions: none.
Postcondition: Either some representation of a real number has been extracted from source, legend is that real number, and success is true, or an input error of some kind has occurred and success is false.

write
Inputs: target, a data sink (e.g., a file, a window, a device), and scribend, a real number.
Outputs: none.
Preconditions: none.
Postcondition: A representation of scribend has been appended to target.

Reals in standard Pascal

Values of the standard Pascal Real type are almost always represented as data of a fixed size, and therefore constitute a finite selection of the real numbers, with a least and greatest member. Usually, real values less than the least member of this elite set or greater than its largest member are treated as unrepresentable, though in some implementations they are identified with ``negative infinity'' and ``positive infinity,'' which are more or less arbitrary bit sequences that are tossed into the Real type even though they do not accurately represent any real values. As with the Integer data type, preconditions are added to all of the operations, asserting that the inputs and outputs of the operation all lie within the specified range.

Real values that lie within the range bounded by the least and greatest members of the elite set, but are not themselves members of that set, are equated to the closest real value that can be represented exactly, by an implementation-dependent rounding mechanism. All operations on values of the Real type implicitly presuppose that the errors introduced by such approximations are negligible. In practice, this presupposition is often false. Approximation is a frequent cause of error in computer programs. Do not use the Real data type if there is an alternative, such as integers with a scaling factor or ratios.
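
As a minimal sketch of the scaling-factor alternative, the program below keeps amounts as Integer counts of hundredths, so that addition and multiplication by a whole number are exact. The names Scale, ScaledTotal, and WriteScaled are hypothetical, and the sketch assumes non-negative amounts that stay within the range of Integer.

```pascal
program ScaledIntegers;

const
  Scale = 100;   { assumed scaling factor: one unit represents 1/100 }

{ The exact total, in hundredths, of Quantity items costing UnitPrice
  hundredths each. }
function ScaledTotal (UnitPrice, Quantity: Integer): Integer;
begin
  ScaledTotal := UnitPrice * Quantity
end;

{ Write a non-negative scaled value as a decimal numeral. }
procedure WriteScaled (Value: Integer);
begin
  Write (Value div Scale : 1, '.');
  if Value mod Scale < 10 then
    Write ('0');                { keep the leading zero of, say, .05 }
  WriteLn (Value mod Scale : 1)
end;

begin
  { Three items at 19.99 each: 3 * 1999 hundredths = 5997 hundredths. }
  WriteScaled (ScaledTotal (1999, 3))   { writes 59.97 }
end.
```

Because every intermediate value is an integer, no rounding takes place; the decimal point is reintroduced only when the value is written.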

In Standard Pascal, as in the presentation above, decimal numerals are used as names for values of the Real type. It is also possible to use a variant of scientific notation in which the decimal numeral is followed by the letter E and an integer exponent, representing a power of ten by which the value of the numeral preceding the E should be multiplied.
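
A brief sketch of the two notations follows. The values are chosen so that each numeral is exactly representable in binary floating point, which is an assumption about the implementation rather than something the standard guarantees; the program and variable names are illustrative only.

```pascal
program ScientificNotation;

var
  Plain, Scientific: Real;

begin
  { 2.5E2 means 2.5 times 10 to the power 2. }
  Plain := 250.0;
  Scientific := 2.5E2;
  WriteLn (Plain = Scientific);

  { A negative exponent shifts the decimal point to the left:
    5.0E-1 means 5.0 times 10 to the power -1. }
  Plain := 0.5;
  Scientific := 5.0E-1;
  WriteLn (Plain = Scientific)
end.
```

Both comparisons should report true, since each pair of numerals names the same real number.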

pi and e are not predefined, but can be introduced as constants, as shown below. Of course, the numerals shown do not give the exact values of pi and e. However, under HP Pascal, the exact value of pi and the value denoted by the numeral 3.14159265358979324 are approximated by the same real number, so it is impossible to improve on the given definition; and similarly for e.

const
  Pi = 3.14159265358979324;
  E = 2.71828182845904524;
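
Of the operations listed above, standard Pascal provides the following: negate (the prefix operator -), absolute-value (Abs), add (+), subtract (-), multiply (*), divide (/), exponential (Exp), natural-logarithm (Ln), square (Sqr), square-root (Sqrt), sine (Sin), cosine (Cos), arc-tangent (ArcTan), truncate (Trunc), the comparisons equal (=), unequal (<>), less (<), less-or-equal (<=), greater (>), and greater-or-equal (>=), and the input and output operations read (Read) and write (Write).

The standard does not require that the range of the ArcTan function be (-pi/2, pi/2), but it is normally so implemented.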

In addition, standard Pascal provides a Round function that differs from the one prescribed for the abstract data type only in some of the cases where operand is halfway between two adjacent integers. In these cases, Round always returns the integer that is farther from 0, which is not necessarily the even one.
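
One way to recover the rounding rule prescribed for the abstract data type is sketched below. It is an illustration rather than part of the handout's implementation: the name RoundHalfEven is hypothetical, and the function relies only on the standard Trunc, Abs, and Odd functions, so it does not depend on how a particular Round resolves ties. Like Round itself, it assumes that the result lies within the range of Integer.

```pascal
program RoundingDemo;

{ Hypothetical function: round Operand to the nearest integer, choosing
  the even neighbor when Operand is exactly halfway between two integers. }
function RoundHalfEven (Operand: Real): Integer;
var
  Toward: Integer;   { Operand with its fractional part discarded }
  Away: Integer;     { the adjacent integer farther from 0 }
  Fraction: Real;    { magnitude of the discarded fractional part }
begin
  Toward := Trunc (Operand);
  if Operand < 0.0 then
    Away := Toward - 1
  else
    Away := Toward + 1;
  Fraction := Abs (Operand - Toward);
  if Fraction < 0.5 then
    RoundHalfEven := Toward
  else if Fraction > 0.5 then
    RoundHalfEven := Away
  else if Odd (Toward) then
    RoundHalfEven := Away       { tie: Away is the even neighbor }
  else
    RoundHalfEven := Toward     { tie: Toward is the even neighbor }
end;

begin
  { The four results are 0, 2, 2, and -2. }
  WriteLn (RoundHalfEven (0.5) : 3, RoundHalfEven (1.5) : 3,
           RoundHalfEven (2.5) : 3, RoundHalfEven (-2.5) : 3)
end.
```

The sample operands are all exact in binary floating point, so the ties are detected reliably; an operand that is only approximately halfway between two integers is rounded according to its stored value.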

Supplying the missing operations

We proceed to supplement Pascal's predefined functions with programmer-defined functions and procedures to complete the implementation of the abstract data type:
  { In this implementation of the Modulo operation, the idea is to subtract
    away integer multiples of Modulus from Moduland until the remainder is
    smaller than Modulus.  (This is easiest to understand if both Moduland
    and Modulus are non-negative, so I've done some preprocessing to
    establish this condition initially and some postprocessing to adjust
    the result at the end in case either operand is negative.)

    To keep the number of subtractions small, the function first doubles
    the modulus (stored in the variable Subtrahend) repeatedly until it is
    about to become greater than Moduland, thus obtaining the largest
    power-of-two multiple of Modulus that will fit into Moduland.  It
    subtracts this multiple away and then backtracks through the smaller
    power-of-two multiples of Modulus, subtracting away any of them that
    fits into the remainder of Moduland, until it reaches Modulus itself.
    The final remainder is the desired value of Modulo (apart from the
    adjustments for signed operands). }

  function Modulo (Moduland: Real; Modulus: Real): Real;
  var
    NegativeModulus: Boolean;
    NegativeModuland: Boolean;
    Subtrahend: Real;
  begin
    { Assert (Modulus <> 0.0); }

    { Adjust the signs of the operands. }

    NegativeModulus := (Modulus < 0.0);
    Modulus := Abs (Modulus);
    NegativeModuland := (Moduland < 0.0);
    Moduland := Abs (Moduland);

    { Find the appropriate power-of-two multiple of Modulus. }

    Subtrahend := Modulus;
    while Subtrahend <= Moduland / 2 do
      Subtrahend := Subtrahend * 2;

    { Back down through the power-of-two multiples of Modulus, subtracting
      any that fit into what's left of Moduland. }

    while Modulus <= Subtrahend do begin
      if Subtrahend <= Moduland then
        Moduland := Moduland - Subtrahend;
      Subtrahend := Subtrahend / 2
    end;

    { Compensate for a negative Moduland. }

    if (Moduland <> 0.0) and NegativeModuland then
      Moduland := Modulus - Moduland;

    { Compensate for a negative Modulus. }

    if (Moduland <> 0.0) and NegativeModulus then
      Modulo := Moduland - Modulus
    else
      Modulo := Moduland
  end;

  function FractionalPart (Operand: Real): Real;
  begin
    if Operand < 0.0 then
      FractionalPart := Modulo (Operand, -1.0)
    else
      FractionalPart := Modulo (Operand, 1.0)
  end;

  function Whole (Operand: Real): Boolean;
  begin
    Whole := (Modulo (Operand, 1.0) = 0.0)
  end;

  { Integer powers are computed by a method analogous to that used in
    integers.p, except that if the initial exponent is negative, it is
    changed to the equivalent positive exponent and the reciprocal taken
    afterwards.  Non-integer powers are computed by logarithms: Base ^
    Exponent is the exponential of the product of the Exponent and the
    logarithm of Base. }

  function Raise (Base: Real; Exponent: Real): Real;

    function Helper (Exponent: Real): Real;
    var
      HalfExponent: Real;
    begin
      if Exponent = 0.0 then
        Helper := 1.0
      else begin
        HalfExponent := Exponent / 2;
        if Whole (HalfExponent) then
          Helper := Sqr (Helper (HalfExponent))
        else
          Helper := Base * Sqr (Helper ((Exponent - 1) / 2))
      end
    end;

  begin { function Raise }
    { Assert (Whole (Exponent) or (0.0 <= Base)); }
    { Assert ((0.0 <= Exponent) or (Base <> 0.0)); }
    if Whole (Exponent) then begin
      if 0.0 <= Exponent then
        Raise := Helper (Exponent)
      else
        Raise := 1 / Helper (-Exponent)
    end
    else if Base = 0.0 then
      Raise := 0.0
    else
      Raise := Exp (Exponent * Ln (Base))
  end;

  function Logarithm (Base: Real; Power: Real): Real;
  begin
    { Assert ((0.0 < Base) and (0.0 < Power) and (Base <> 1.0)); }
    Logarithm := Ln (Power) / Ln (Base)
  end;

  function Twice (Operand: Real): Real;
  begin
    Twice := Operand * 2
  end;

  function Half (Operand: Real): Real;
  begin
    Half := Operand / 2
  end;

  function Reciprocal (Operand: Real): Real;
  begin
    { Assert (Operand <> 0.0); }
    Reciprocal := 1 / Operand
  end;

  function Cube (Operand: Real): Real;
  begin
    Cube := Operand * Operand * Operand
  end;

  function BinaryLogarithm (Operand: Real): Real;
  const
    Ln2 = 0.69314718055994531; { the natural logarithm of 2 }
  begin
    { Assert (0.0 < Operand); }
    BinaryLogarithm := Ln (Operand) / Ln2
  end;

  function Tangent (Operand: Real): Real;
  begin
    { Assert (Modulo (Operand - Half (Pi), Pi) <> 0.0); }
    Tangent := Sin (Operand) / Cos (Operand)
  end;

  function ArcSine (Operand: Real): Real;
  begin
    { Assert ((-1.0 <= Operand) and (Operand <= 1.0)); }
    if Operand = -1.0 then
      ArcSine := -Pi / 2
    else if Operand = 1.0 then
      ArcSine := Pi / 2
    else
      ArcSine := ArcTan (Operand / Sqrt (1.0 - Sqr (Operand)))
  end;

  function ArcCosine (Operand: Real): Real;
  begin
    { Assert ((-1.0 <= Operand) and (Operand <= 1.0)); }
    ArcCosine := Pi / 2 - ArcSine (Operand)
  end;

  function RatioArcTangent (Numerator, Denominator: Real): Real;
  begin
    { Assert ((Numerator <> 0.0) or (Denominator <> 0.0)); }
    if Denominator = 0.0 then begin
      if Numerator < 0.0 then
        RatioArcTangent := -Pi / 2
      else
        RatioArcTangent := Pi / 2
    end
    else if 0.0 < Denominator then
      RatioArcTangent := ArcTan (Numerator / Denominator)
    else if Numerator < 0.0 then
      RatioArcTangent := ArcTan (Numerator / Denominator) - Pi
    else
      RatioArcTangent := ArcTan (Numerator / Denominator) + Pi
  end;

  function Floor (Operand: Real): Integer;
  begin
    Floor := Round (Operand - Modulo (Operand, 1.0))
  end;

  function Ceiling (Operand: Real): Integer;
  begin
    Ceiling := Round (Operand + Modulo (-Operand, 1.0))
  end;

  function Major (LeftOperand, RightOperand: Real): Real;
  begin
    if LeftOperand < RightOperand then
      Major := RightOperand
    else
      Major := LeftOperand
  end;

  function Minor (LeftOperand, RightOperand: Real): Real;
  begin
    if LeftOperand < RightOperand then
      Minor := LeftOperand
    else
      Minor := RightOperand
  end;

  function Zero (Operand: Real): Boolean;
  begin
    Zero := (Operand = 0.0)
  end;

  function Negative (Operand: Real): Boolean;
  begin
    Negative := (Operand < 0.0)
  end;

  function Positive (Operand: Real): Boolean;
  begin
    Positive := (0.0 < Operand)
  end;

This document is available on the World Wide Web as

http://www.math.grin.edu/~stone/courses/fundamentals/reals.html

created July 24, 1996
last revised September 18, 1996

John David Stone (stone@math.grin.edu)