# Class 12: Representing Real Numbers

Back to Predicates. On to Fundamental Types in C.

Held: Friday, 7 February 2003

Summary: Today we consider mechanisms for representing real numbers, particularly the IEEE standards.


Notes:

• I note that a number of assignments were turned in at about 3 a.m. As I said the first day, I'd prefer that you get to sleep at a reasonable hour and turn in the assignment late.
• A few former students reminded me that I tend to expect too much time from my students. I'll be distributing a quick, anonymous survey to help me gauge how much time the course takes.

Overview:

• Real numbers.
• Method one: Rationals.
• Method two: Fixed precision.
• Method three: Floating point.
• Basics of the IEEE Standard.
• Some Special Cases.
• Learning from Java.

## Representing Values in Binary

• We know how to represent a subset of the integers as a fixed-length sequence of 0's and 1's.
• Can we do the same for other values?
• Representing many other kinds of values can often be reduced to the problem of representing integers. All you need to do is create a collating sequence that maps the values to integers.
• The ASCII sequence is a famous collating sequence for mapping the characters on an American keyboard to numeric values.
• C makes the design decision that since there's a natural collating sequence for characters, characters are, in effect, numbers.
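As a small illustration of the collating-sequence idea, here is a sketch in Java rather than C (Java's `char` is also a numeric type; it holds 16-bit Unicode values, the first 128 of which coincide with ASCII). The class and method names are made up for this example.

```java
class CollatingDemo {
    /** Returns the position of a character in the collating sequence. */
    static int codeOf(char c) {
        return (int) c;
    }

    /** Returns the character at a given position in the collating sequence. */
    static char charAt(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println("'A' maps to " + codeOf('A'));           // 65
        System.out.println("66 maps back to '" + charAt(66) + "'"); // 'B'
        // Because characters are numbers, arithmetic on them is meaningful.
        System.out.println("'a' - 'A' = " + ('a' - 'A'));           // 32
    }
}
```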

## Representing Reals

• But what about real numbers? There isn't a natural mapping of reals onto the integers.
• See your Math professor for more details.
• If we're going to use a fixed number of bits, we're unlikely to be able to represent many irrational numbers or particularly small numbers exactly, so we will never be able to represent all the reals in a particular range. We must approximate them.
• We'll design a few different representations and I'll challenge you to critique them.
• You are allowed to reflect on the reading in your criticisms.

### Reals as Rationals

• Since we have already given up on representing irrational reals exactly, we can represent many real numbers fairly easily as rational numbers: ratios of two integers.
• Reserve some number of bits for the numerator, represented as a signed integer in the notation of your choice.
• Reserve some number of bits for the denominator.
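One way to picture this design is the following Java sketch. It assumes 32 bits for each part (Java's `int`) and a positive denominator; the class name `RationalSketch` and its methods are hypothetical. It is meant only to give you something concrete to critique.

```java
class RationalSketch {
    final int numerator;   // signed integer
    final int denominator; // assumed positive in this sketch

    RationalSketch(int numerator, int denominator) {
        if (denominator <= 0) {
            throw new IllegalArgumentException("denominator must be positive");
        }
        this.numerator = numerator;
        this.denominator = denominator;
    }

    /** Adds two rationals; throws ArithmeticException if an int overflows. */
    RationalSketch plus(RationalSketch other) {
        int left = Math.multiplyExact(numerator, other.denominator);
        int right = Math.multiplyExact(other.numerator, denominator);
        return new RationalSketch(Math.addExact(left, right),
                                  Math.multiplyExact(denominator, other.denominator));
    }

    /** Converts to the nearest double, for display purposes only. */
    double approximate() {
        return (double) numerator / denominator;
    }

    @Override
    public String toString() {
        return numerator + "/" + denominator;
    }

    public static void main(String[] args) {
        RationalSketch sum = new RationalSketch(1, 3).plus(new RationalSketch(1, 6));
        // The result is not reduced, so 1/2 shows up as 9/18.
        System.out.println(sum + " is about " + sum.approximate());
    }
}
```

Note that the unreduced result means one value can have many bit patterns, and that denominators grow quickly under repeated arithmetic. Both are things you might raise in a critique.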

### Fixed-Point Numbers

• A common early representation of reals was the fixed-point strategy.
• Given a sequence of bits to use to represent a real number, select a position for the decimal point (except that it should be called the binary point).
• The column to its left is the 1's column, the next column to the left is the 2's, then the 4's, and so on and so forth.
• The column to the right of the point is the 1/2's column. The next column is the 1/4's, and so on and so forth.
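The column weights can be checked with a short Java sketch. It assumes an unsigned pattern and takes the number of columns to the right of the binary point as a parameter; the class and method names are invented for illustration.

```java
class FixedPointDemo {
    /**
     * Interprets the low totalBits bits of pattern as an unsigned fixed-point
     * number with fractionBits columns to the right of the binary point.
     */
    static double valueOf(int pattern, int totalBits, int fractionBits) {
        double value = 0.0;
        double weight = Math.pow(2, -fractionBits); // weight of the rightmost column
        for (int column = 0; column < totalBits; column++) {
            if (((pattern >> column) & 1) == 1) {
                value += weight;
            }
            weight *= 2; // each column to the left is worth twice as much
        }
        return value;
    }

    public static void main(String[] args) {
        // 0101.1010 in binary is 4 + 1 + 1/2 + 1/8.
        System.out.println(valueOf(0b01011010, 8, 4)); // 5.625
    }
}
```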

## Detour: Scientific Notation

• Rather than trying to design a representation from scratch, perhaps we should reflect on common practices in other disciplines.
• Many folks who work with real numbers (including approximations of real numbers) use scientific notation, as in 1.231 × 10^5.
• What is involved in scientific notation?
• A sign, which indicates whether the number is positive or negative.
• A mantissa, which gives the primary digits of the number.
• A base of exponentiation (in this case, 10).
• An exponent.
• Might we not use the same general technique in designing a representation for real numbers?
• Such notations are called floating point because the point moves depending on the exponent.
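To make the four parts concrete, here is a Java sketch that splits a decimal number into sign, mantissa, and exponent, with the base fixed at 10. It uses `java.math.BigDecimal` so that the decimal digits stay exact; the class and method names are assumptions made for this example.

```java
import java.math.BigDecimal;

class SciNotationDemo {
    /** Power of 10 that leaves exactly one digit before the point. */
    static int exponentOf(BigDecimal x) {
        return x.precision() - x.scale() - 1;
    }

    /** The primary digits of x, scaled into [1, 10) when x is nonzero. */
    static BigDecimal mantissaOf(BigDecimal x) {
        return x.abs().movePointLeft(exponentOf(x)).stripTrailingZeros();
    }

    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("-123100");
        String sign = x.signum() < 0 ? "-" : "+";
        System.out.println("sign:     " + sign);
        System.out.println("mantissa: " + mantissaOf(x).toPlainString());
        System.out.println("base:     10");
        System.out.println("exponent: " + exponentOf(x));
    }
}
```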

## IEEE Single-Precision Floating-Point Numbers

• Rather than having each computer manufacturer decide on a particular floating-point representation, standards groups worked to design a few standard representations.
• Two of the most popular representations are IEEE Single-Precision and Double-Precision floating-point numbers.
• We'll concentrate on single-precision numbers. Double-precision numbers follow similar conventions.
• Single-precision numbers use 32 bits.
• One bit gives the sign.
• Eight bits give the exponent (represented in bias-127 notation).
• The remaining 23 bits give the mantissa, using fixed-point notation.
• Note that the base of exponentiation is not represented. It is instead fixed at 2.
• There are some important restrictions on the various parts.
• The exponent must be between -126 and 127, inclusive.
• The mantissa must be no smaller than 1.0 and cannot be as large as 2.0.
• Note the clever trick: Given that restriction on the mantissa, you don't need to represent the first bit of the mantissa (since it's always 1).
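The layout above can be exercised with a Java sketch that pulls the three fields out of a `float` and rebuilds the value from them. It assumes a normalized number (exponent field neither all 0's nor all 1's); the class and method names are hypothetical.

```java
class FloatFields {
    /** The sign bit: 0 for positive, 1 for negative. */
    static int signBit(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    /** The eight exponent bits, still in bias-127 form. */
    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    /** The 23 stored mantissa bits (the leading 1 is not stored). */
    static int fraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    /** Rebuilds a normalized value from its three fields. */
    static double rebuild(int sign, int biasedExponent, int fraction) {
        double mantissa = 1.0 + fraction / (double) (1 << 23); // restore the hidden 1
        double magnitude = mantissa * Math.pow(2, biasedExponent - 127);
        return sign == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        float f = -6.5f; // -1.625 x 2^2
        System.out.println("sign:     " + signBit(f));        // 1
        System.out.println("exponent: " + biasedExponent(f)); // 129, that is, 2 + 127
        System.out.println("fraction: " + fraction(f));       // 5242880, that is, 0.625 x 2^23
        System.out.println("rebuilt:  "
            + rebuild(signBit(f), biasedExponent(f), fraction(f)));
    }
}
```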

## Special Cases

• There are some parts the previous discussion ignores.
• The range of exponents only gives 254 different values. We have two additional unused values, corresponding to bit sequences 00000000 and 11111111.
• If the mantissa can be no smaller than 1.0, how do we represent 0?

### Representing Small Values

• If the exponent consists of only 0's, we use -126 as the exponent and restrict the mantissa to values no less than 0 and less than 1.
• Hence, 0 has an exponent of all 0's and a mantissa of all 0's.
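A short Java sketch can confirm these rules. It assumes the interpretation given above, with a fixed exponent of -126 and no hidden leading 1; the class and method names are invented for illustration.

```java
class SmallValues {
    /** True when the eight exponent bits of f are all 0's. */
    static boolean hasZeroExponentField(float f) {
        return ((Float.floatToIntBits(f) >>> 23) & 0xFF) == 0;
    }

    /** Value of a float whose exponent bits are all 0's, given its 23 mantissa bits. */
    static double smallValue(int fraction) {
        double mantissa = fraction / (double) (1 << 23); // in [0, 1): no hidden 1
        return mantissa * Math.pow(2, -126);
    }

    public static void main(String[] args) {
        // Zero is all 0's; the smallest positive float has only its last bit set.
        System.out.println(Integer.toHexString(Float.floatToIntBits(0.0f)));            // 0
        System.out.println(Integer.toHexString(Float.floatToIntBits(Float.MIN_VALUE))); // 1
        System.out.println(smallValue(1) == Float.MIN_VALUE);                           // true
    }
}
```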

### Error Values

• If the exponent consists of only 1's, the float represents some special value.
• Positive infinity uses all 0's for the mantissa, as does negative infinity (sign bit gives the sign).
• Anything else (that is, any nonzero mantissa) is NaN, "not a number".
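These rules translate fairly directly into code. The following Java sketch classifies a 32-bit pattern; the class name, method name, and returned labels are assumptions made for this example.

```java
class SpecialValues {
    /** Classifies a 32-bit pattern according to the rules for an all-1's exponent. */
    static String classify(int bits) {
        int exponent = (bits >>> 23) & 0xFF;
        int fraction = bits & 0x7FFFFF;
        if (exponent != 0xFF) {
            return "ordinary";
        }
        if (fraction != 0) {
            return "NaN";
        }
        return (bits >>> 31) == 0 ? "+infinity" : "-infinity";
    }

    public static void main(String[] args) {
        float zero = 0.0f;
        System.out.println(classify(Float.floatToIntBits(1.0f / zero)));  // +infinity
        System.out.println(classify(Float.floatToIntBits(-1.0f / zero))); // -infinity
        System.out.println(classify(Float.floatToIntBits(zero / zero)));  // NaN
        System.out.println(classify(Float.floatToIntBits(1.0f)));         // ordinary
    }
}
```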

## Learning from Java

• Here's a simple Java program that prints out the bits of a floating-point number supplied on the command line.
```
/**
 * Print the bits in the representation of a floating-point number
 * supplied on the command line.
 *
 * @author Samuel A. Rebelsky
 * @version 1.0 of February 2003
 */
public class PrintFloatBits
{
  /**
   * Grab the floating-point number from the command line, convert
   * it to bits, and print 'em out.
   */
  public static void main(String[] args) {
    // (0) Make sure that there is a number to convert.
    if (args.length != 1) {
      System.err.println("Usage: java PrintFloatBits number");
      System.exit(1);
    }
    // (1) Convert the string to a float.
    float f = Float.parseFloat(args[0]);
    // (2) Convert the float to an integer with equivalent bit pattern.
    int i = Float.floatToIntBits(f);
    // (3) Convert the integer to a bit string (missing leading 0's).
    String bits = Integer.toBinaryString(i);
    // (4) Print out the result.
    System.out.println(args[0] + " is represented as " + bits);
    // (5) Be nice to the operating system.
    System.exit(0);
  } // main(String[])
} // class PrintFloatBits
```
• If there's time, we'll try running it and analyzing the results.
• You may want to extend it to
• Insert leading 0's so that all 32 bits appear.
• Separate the number into its three components.
• Print the number using the decimal representation of its 3 components.
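One possible approach to the first two extensions is sketched below. The class and method names are made up for this example, and it is one way of doing it rather than the intended solution.

```java
class PaddedFloatBits {
    /** Returns all 32 bits of f, including leading 0's. */
    static String allBits(float f) {
        String bits = Integer.toBinaryString(Float.floatToIntBits(f));
        StringBuilder padded = new StringBuilder();
        for (int i = bits.length(); i < 32; i++) {
            padded.append('0');
        }
        return padded.append(bits).toString();
    }

    /** Returns the bits grouped as "sign exponent mantissa". */
    static String splitBits(float f) {
        String bits = allBits(f);
        return bits.substring(0, 1) + " "   // 1 sign bit
             + bits.substring(1, 9) + " "   // 8 exponent bits
             + bits.substring(9);           // 23 mantissa bits
    }

    public static void main(String[] args) {
        System.out.println(splitBits(1.0f));
        System.out.println(splitBits(-2.0f));
    }
}
```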

## History

Tuesday, 7 January 2003 [Samuel A. Rebelsky]

• Created generic version to set up course.

Friday, 7 February 2003 [Samuel A. Rebelsky]

• Filled in the details.


Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Fri May 2 14:20:21 2003.
The source to the document was last modified on Fri Feb 7 13:56:25 2003.
This document may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS195/2003S/Outlines/outline.12.html`.


Samuel A. Rebelsky, rebelsky@grinnell.edu