Outline of Class 35: More Binary Representation
Held: Monday, April 6, 1998
 If you haven't done so already, read the
handout on IEEE representation
of real numbers.
 Start reading chapter 8 of Bailey.
 Any questions on
assignment five?
 Today's brown-bag lunch is on The Design of C++. C++ is
an object-oriented language with a syntax not unlike Java's, but with
some very different design decisions. I encourage you to attend
(but I won't be there due to a prior commitment).
 One disadvantage of the three standard representations of signed integers
(signed magnitude, one's complement, two's complement) is that all three
support only a fixed range of values.
 In a biased representation of a range of integers, you select
a bias (offset) and then (traditionally) use the standard
positive-only representation.
 If the bias is b, you represent n using the
positive-only representation of b+n.
 To represent the numbers from -1 to 254 in one byte, you use a bias of 1.
 To represent the numbers from -255 to 0 in one byte, you use a bias of 255.
 Alternately, you can think of a biased representation as taking
the series of bits, computing the corresponding positive integer,
and then subtracting the bias to determine the actual value
represented.
 The most typical bias is 2^(m-1), where m is the number of bits used
to represent the number. This is called excess 2^(m-1).
 For one byte, the bias is 128. This means that the smallest number
we can represent is -128 and the largest is 127.
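 The two views of a biased representation (add the bias to store, subtract it to read) can be sketched in a few lines of Python. This is only an illustration: the function names encode_biased and decode_biased are made up for this sketch, and the defaults assume one byte with excess 128.

```python
def encode_biased(n, bias=128, bits=8):
    """Return the bit string that represents n in a biased (excess) code."""
    stored = n + bias  # the non-negative value that is actually stored
    if not 0 <= stored < 2 ** bits:
        raise ValueError(f"{n} is out of range for bias {bias} in {bits} bits")
    return format(stored, f"0{bits}b")


def decode_biased(bit_string, bias=128):
    """Read the bits as a non-negative integer, then subtract the bias."""
    return int(bit_string, 2) - bias


print(encode_biased(-128))        # 00000000
print(encode_biased(0))           # 10000000
print(encode_biased(127))         # 11111111
print(decode_biased("00000001"))  # -127
```

 Changing the bias argument to 1 or 255 reproduces the other two one-byte ranges mentioned above.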
 In most computers, characters are represented as integers, using
a mapping between integers and characters (and back again).
 For example, one might decide that 'A' was 66, 'B' was 67, and so
on and so forth.
 In designing such a code, you need to consider how many possible
characters you wish to allow. This helps you determine how many
bits or bytes to allow per character.
 It turns out that there are fewer than 128 different characters
available on the standard US keyboard. So, we might use seven bits
(or expand to eight bits to fill a whole byte) to represent our
characters.
 However, as we incorporate other languages or other symbols (such
as the copyright or registered trademark signs), we may need more bits
and bytes.
 At one point, each manufacturer had its own encoding. This made
transmission of data between machines more complicated than it should
be. These days, there are standards.
 The standard on most US-based computers is ASCII, the American
Standard Code for Information Interchange. It defines 128 characters
(seven bits each), usually stored one per eight-bit byte. You can see the
ASCII encoding by typing
man ascii
on our HPs.
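 If you'd rather experiment than read the man page, here is a small Python sketch of the character-to-integer mapping. Python's built-in ord and chr agree with ASCII for codes 0 through 127; the helper name ascii_codes is made up for this sketch.

```python
def ascii_codes(text):
    """Map each character to its integer code, rejecting non-ASCII characters."""
    codes = []
    for ch in text:
        code = ord(ch)  # character -> integer
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        codes.append(code)
    return codes


print(ascii_codes("AB"))        # [65, 66]
print(chr(66))                  # B  (integer -> character)
print(format(ord("A"), "07b"))  # 1000001, the seven-bit pattern for 'A'
```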
 At one time, IBM promoted EBCDIC (I have no idea what it stands for,
perhaps "extended binary coding of diverse characters"; a reference
tells me that it's "extended binary-coded decimal interchange code").
One interesting aspect of EBCDIC is that it doesn't code the characters
in sequence (that is, it's not guaranteed that if "A" has code n, then
"B" has code n+1).
 The big coding standard these days is Unicode. Java supports it,
and it's huge. I'm happy if you know it exists (you don't
need to know the details). Java stores each Unicode character in two bytes.
 What if we want to deal with numbers that may have a fractional
part (something after the decimal point)?
 We need to think about the meaning of bits after the point.
Traditionally, we continue the meaning we use in decimal.
 The first bit after the binary point is 2^-1. The next bit
is 2^-2, and so on and so forth.
 For example, in binary 0.1 is 1/2, 0.01 is 1/4, and 0.11 is 3/4.
 Let's try some exercises in conversion (and think about our
conversion algorithm)
 Fraction = decimal = binary
 7/16 = .4375 = ?
 1/3 = .333... = ?
 1/10 = .1 = ?
 Observe that this changes the numbers we can represent with
a finite number of digits. For example, our
handout suggests that
2/5 cannot be represented in a finite number of binary digits.
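 One possible conversion algorithm is repeated doubling: multiply the fraction by two, and whatever lands in front of the point is the next bit. The Python sketch below is one way to write it, not necessarily the algorithm from the handout; the name fraction_bits and the 12-bit cutoff are assumptions made for the illustration. It uses exact fractions so that rounding doesn't hide the repeating pattern.

```python
from fractions import Fraction


def fraction_bits(value, max_bits=12):
    """Return the bits after the binary point of 0 <= value < 1,
    plus a flag saying whether the expansion terminated."""
    bits = ""
    while value != 0 and len(bits) < max_bits:
        value *= 2          # shift the binary point one place to the right
        if value >= 1:      # a 1 moved in front of the point
            bits += "1"
            value -= 1
        else:
            bits += "0"
    return bits, value == 0


print(fraction_bits(Fraction(3, 4)))  # ('11', True)
print(fraction_bits(Fraction(2, 5)))  # ('011001100110', False)
```

 The second result never terminates: the group 0110 keeps repeating, which matches the handout's claim about 2/5.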
 Nonetheless, this seems like the best way to represent numbers
with fractional parts.
 Are there others? Yes. One might use sets of four bits to
represent decimal digits. This is clearly less efficient.
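 To see the cost of the four-bits-per-digit idea, compare it with plain binary. The sketch below is only an illustration (the name to_bcd is made up here, and it handles non-negative whole numbers only).

```python
def to_bcd(n):
    """Encode a non-negative integer with one four-bit group per decimal digit."""
    return " ".join(format(int(digit), "04b") for digit in str(n))


print(to_bcd(1998))       # 0001 1001 1001 1000  (16 bits)
print(format(1998, "b"))  # 11111001110          (11 bits)
```

 Each four-bit group could hold sixteen values but only ever holds ten, which is where the wasted space comes from.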
 However, there are still further design decisions to make. For
example, how do we place the decimal point?
 In fixed-precision or fixed-point representation,
you pick some number of bits that come
after the decimal point, and use those to represent the fractional
part.
 This limits your accuracy for small numbers. For example, if you've
only allowed three bits after the decimal point, your accuracy is
limited to about 1/8. This means that you'd represent both (1/16) and
(3/64) as 0.000.
 This limits the overall size of your numbers. For example, if you've
only allocated 13 bits to the whole part, your largest whole part can't
be bigger than 2^13-1, or about 8,000.
 However, computation is relatively cheap. You can simply use standard
integer computation and then shift the decimal point.
 On the other hand, this can limit accuracy.
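 Here is a rough Python sketch of fixed-point arithmetic done with ordinary integers. It assumes three bits after the point and non-negative values only; the names FRACTION_BITS, to_fixed, from_fixed, and fixed_mul are made up for the illustration.

```python
FRACTION_BITS = 3            # assumed: three bits after the binary point
SCALE = 1 << FRACTION_BITS   # so every stored integer counts eighths


def to_fixed(x):
    """Store x as a whole number of eighths, truncating what doesn't fit."""
    return int(x * SCALE)


def from_fixed(raw):
    """Convert a stored integer back to the value it stands for."""
    return raw / SCALE


def fixed_mul(a, b):
    """Multiply as integers, then shift the point back into place."""
    return (a * b) >> FRACTION_BITS


print(to_fixed(1 / 16), to_fixed(3 / 64))  # 0 0  (both collapse to 0.000)

a = to_fixed(2.5)                    # stored as 20
b = to_fixed(1.25)                   # stored as 10
print(from_fixed(a + b))             # 3.75   (addition is plain integer addition)
print(from_fixed(fixed_mul(a, b)))   # 3.125
```

 The first line of output shows the accuracy limit mentioned above; the rest shows that the arithmetic itself is just integer work plus a shift.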
 To handle the aforementioned problems,
you might instead let the decimal point move ("float") and use extra
bits to indicate where the decimal point is positioned.
 In floating-point representation, you use something similar
to scientific notation (+/- n.nnnn * 10^x), and represent
 the digits,
 the exponent, and
 the sign separately.
 For example, in decimal .125 (that is, 1.25 * 10^-1) might be represented as
 + for the sign
 125 for the digits
 -1 for the exponent (10^-1)
 As in the cases above, some things get a little bit confusing as
we move to binary. In particular, our exponents are powers of
two, instead of powers of ten.
 So, you would not represent .125 as
 0 for the sign
 01111101 for the 125
 11111111 for the -1 (in two's complement)
 Instead, you might represent .125 as
 0 for the sign
 00000001 for 1
 11111101 for the exponent (-3 in two's complement)
 because .125 is 1/8, or .001 in fixed-precision binary.
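 The toy format in this example (a sign, a whole-number mantissa, and a power-of-two exponent in two's complement) can be checked with a short Python sketch. This is the classroom format from the lines above, not a real standard, and the names twos_complement and toy_float_value are made up for the illustration.

```python
def twos_complement(n, bits=8):
    """Return the two's-complement bit string for n in the given width."""
    return format(n & ((1 << bits) - 1), f"0{bits}b")


def toy_float_value(sign, mantissa, exponent):
    """Value of (sign, mantissa, exponent): +/- mantissa * 2^exponent."""
    return (-1) ** sign * mantissa * 2.0 ** exponent


print(twos_complement(-3))         # 11111101
print(twos_complement(-1))         # 11111111
print(toy_float_value(0, 1, -3))   # 0.125
```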
 It turns out that arithmetic is complicated in floating point.
Plauger tells us that it takes about as much microcode to implement
the basic floating-point operations as it does to implement
everything else on a typical small computer.
 Designing floating point representations (and computation) is still
nontrivial. You must still concern yourselves with a number of issues.
 How many bits will you use for each component?
 How will you represent each component? Signed-magnitude, two's
complement, as a biased value? Will you use the same representation
for each component, or different ones?
 Will you use a separate bit for the sign (in effect, doing
signed-magnitude)?
 The IEEE (Institute of Electrical and Electronics Engineers)
serves as a standards body for many issues in computing.
They issue language, protocol, design, and other standards.
 (The IEEE does a number of other things, but that is the most
pertinent to our current concerns.)
 One of their most widely used standards is the IEEE Standard
for Binary Floating-Point Arithmetic (IEEE standard 754)
which discusses not just representation of floating point numbers,
but also computation with those numbers. This standard was
released in 1985.
 As suggested earlier, some of the first issues
in the design of a floating point representation are how to allocate bits
and represent components.
 The IEEE single-precision representation uses 32 bits, with
 one bit for the sign (effectively using signed-magnitude)
 23 bits for the mantissa
 eight bits for the exponent
 The actual order is
SEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMM
 The mantissa is represented as an unsigned value. The base position
of the decimal is right before the mantissa, although the exponent
can shift it.
 The exponent uses a 127-bias representation.
 But there are also some tricks ...
 The smallest exponent allowed is -126 (represented as 00000001) and the
largest exponent allowed is 127 (represented as 11111110).
 Observe that this leaves 00000000 and 11111111 free as special exponent
strings.
 00000000 is used for zero and for numbers "close to zero", and affects other issues
 11111111 is used for "error" values (infinities and not-a-number results)
 In the standard representations (those not close to zero), a special
trick is used to get one more bit of accuracy.
 For nonzero numbers, it's clear that in standard scientific notation,
using binary (+/- b.bbb * 2^x), the mantissa will always be at least 1
and less than 2.
 If it's less than one, we should simply shift the bits left and
decrease the exponent
 If it's two or more, we should shift the bits right and
increase the exponent
 So, we can just assume the leftmost bit is 1 and not bother including it
in our representation.
 This bit is called the hidden bit.
 In the representations of numbers close to zero, the hidden bit isn't
used.
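 If you want to check your understanding of the layout, the Python sketch below pulls apart a single-precision value and puts it back together using the rules above (127 bias, hidden bit, no hidden bit when the exponent string is 00000000). It relies on Python's struct module to produce the 32 bits; the function names float_to_fields and fields_to_value are made up for the illustration, and the all-ones exponent is simply rejected rather than decoded.

```python
import struct


def float_to_fields(x):
    """Split the single-precision encoding of x into sign, exponent, mantissa."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]


def fields_to_value(sign, exponent, mantissa):
    """Rebuild the value from the three bit strings."""
    e = int(exponent, 2)
    fraction = int(mantissa, 2) / 2 ** 23     # the bits after the binary point
    if e == 255:
        raise ValueError("exponent 11111111 is reserved for error values")
    if e == 0:
        value = fraction * 2.0 ** -126        # close to zero: no hidden bit
    else:
        value = (1 + fraction) * 2.0 ** (e - 127)   # hidden bit, 127 bias
    return -value if sign == "1" else value


print(float_to_fields(0.125))
# ('0', '01111100', '00000000000000000000000')
print(fields_to_value("0", "01111100", "0" * 23))   # 0.125
```

 Here .125 is 1.0 * 2^-3, so the stored exponent is -3 + 127 = 124 and every mantissa bit is zero, because the leading 1 is the hidden bit.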
 Exercises
 What is 0 1000 1000 00000000000000000000000?
 What is 1 1000 1000 00000000000000000000000?
 What is 0 0000 1000 00000000000000000000000?
 What is 0 1000 1000 01000000000000000000000?
 How do you represent 0?
 How do you represent 1?
 What is the smallest number you can represent?
 What is the largest number you can represent?