Outline of Class 35: More Binary Representation
Held: Monday, April 6, 1998
- If you haven't done so already, read the handout on IEEE representation
of real numbers.
- Start reading chapter 8 of Bailey.
- Any questions on assignment five?
- Today's brown-bag lunch is on The Design of C++. C++ is
an object-oriented language with a syntax not unlike Java's, but with
some very different design decisions. I encourage you to attend
(but I won't be there due to a prior commitment).
- One disadvantage of the three standard representations of signed integers
(signed magnitude, one's complement, two's complement) is that each fixes
the range of values (roughly centered on zero); you cannot shift that range
to suit the values you actually need.
- In a biased representation of a range of integers, you select
a bias (offset) and then (traditionally) use the standard
positive-only representation.
- If the bias is b, you represent n using the
positive-only representation of b+n.
- To represent the numbers from -1 to 254 in one byte, you use a bias of 1.
- To represent the numbers from -255 to 0 in one byte, you use a bias of 255.
- Alternately, you can think of a biased representation as taking
the series of bits, computing the corresponding positive integer,
and then subtracting the bias to determine the actual value
represented.
- The most typical bias is 2^(m-1), where m is the number of bits used
to represent the number. This is called excess 2^(m-1).
- For one byte, the bias is 128. This means that the smallest number
we can represent is -128 and the largest is 127.
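The rule above (store n + b as an unsigned number; to read it back, take the unsigned value and subtract b) can be sketched in a few lines of Python. The names `encode_biased` and `decode_biased` are illustrative, not from any library, and the sketch assumes one-byte (8-bit) values by default.

```python
def encode_biased(n: int, bias: int, bits: int = 8) -> str:
    """Bit string for n in a biased (excess-`bias`) representation."""
    stored = n + bias  # shift n into the unsigned range 0 .. 2^bits - 1
    if not 0 <= stored < (1 << bits):
        raise ValueError(f"{n} is out of range for bias {bias} in {bits} bits")
    return format(stored, f"0{bits}b")


def decode_biased(pattern: str, bias: int) -> int:
    """Read the bits as an unsigned integer, then subtract the bias."""
    return int(pattern, 2) - bias


# Excess 128 (bias 2^(8-1)) covers -128 .. 127 in one byte.
print(encode_biased(-128, 128))  # 00000000
print(encode_biased(0, 128))     # 10000000
print(encode_biased(127, 128))   # 11111111
# Bias 1 covers -1 .. 254.
print(decode_biased("00000000", 1))  # -1
```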
- In most computers, characters are represented as integers, using
a mapping between integers and characters (and back again).
- For example, one might decide that 'A' was 66, 'B' was 67, and so
on and so forth.
- In designing such a code, you need to consider how many possible
characters you wish to allow. This helps you determine how many
bits or bytes to allow per character.
- It turns out that there are fewer than 128 different characters
available on the standard US keyboard. So, we might use seven bits
(perhaps padded to eight bits to fill a whole byte) to represent our
characters.
- However, as we incorporate other languages or other symbols (such
as the copyright or registered trademark signs), we may need more bits
and bytes.
- At one point, each manufacturer had its own encoding. This made
transmission of data between machines more complicated than it should
be. These days, there are standards.
- The standard on most US-based computers is ASCII, the American
Standard Code for Information Interchange. It uses seven bits per
character (128 codes), and each character is usually stored in an
eight-bit byte. You can see the ASCII encoding by typing
man ascii on our HPs.
- At one time, IBM promoted EBCDIC (I have no idea what it stands for,
perhaps "extended binary coding of diverse characters"; a reference
tells me that it's "Extended Binary Coded Decimal Interchange Code").
One interesting aspect of EBCDIC is that it doesn't code the letters
contiguously (that is, it's not guaranteed that if one letter has code n,
the next letter has code n+1; "I" and "J", for example, are not adjacent).
- The big coding standard these days is Unicode. Java supports it,
and it's huge. I'm happy if you know it exists (you don't
need to know the details). Java stores each character in a two-byte
char, although characters beyond the first 65,536 code points need a
pair of chars.
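The differences among these codes are easy to observe with Python's standard codecs. This sketch assumes the `cp037` codec, which implements one EBCDIC code page (IBM 037), and uses UTF-16, the encoding behind Java's two-byte `char`; `codes` and `encoded_lengths` are illustrative helper names. Note that in real ASCII, 'A' is 65.

```python
def codes(text: str, encoding: str) -> list:
    """Numeric value of each byte when text is encoded with the given codec."""
    return list(text.encode(encoding))


def encoded_lengths(ch: str) -> tuple:
    """Bytes needed for ch in UTF-8 and in UTF-16 (big-endian, no byte-order mark)."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# ASCII codes letters consecutively; EBCDIC (code page 037) jumps after I.
print(codes("HIJK", "ascii"))  # [72, 73, 74, 75]
print(codes("HIJK", "cp037"))  # [200, 201, 209, 210]

# Unicode: A, e-acute, the euro sign, and an emoji beyond U+FFFF.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
# U+0041 (1, 2)
# U+00E9 (2, 2)
# U+20AC (3, 2)
# U+1F600 (4, 4)
```

The last line shows a character that needs four bytes (two 16-bit units) in UTF-16.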
- What if we want to deal with numbers that may have a fractional
part (something after the decimal point)?
- We need to think about the meaning of bits after the point.
Traditionally, we extend the place values we use in decimal.
- The first bit after the binary point is worth 2^-1. The next bit
is worth 2^-2, and so on and so forth.
- For example, in binary 0.1 is 1/2, 0.01 is 1/4, and 0.11 is 3/4.
- Let's try some exercises in conversion (and think about our
conversion algorithm)
- Fraction = decimal = binary
- 7/16 = .4375 = ?
- 1/3 = .333... = ?
- 1/10 = .1 = ?
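Both directions of conversion can be sketched with exact rational arithmetic. Reading a binary fraction adds 2^-k for each 1 bit in position k; writing one uses repeated doubling: double the fraction, and whether the result reaches 1 gives the next bit. The sketch assumes Python's `fractions.Fraction` to avoid rounding; the function names are illustrative, and `max_bits` is an arbitrary cutoff for expansions that never end.

```python
from fractions import Fraction


def binary_fraction_value(bits: str) -> Fraction:
    """Value of a string like '0.101': bit k after the point is worth 2^-k."""
    whole, _, frac = bits.partition(".")
    value = Fraction(int(whole or "0", 2))
    for k, bit in enumerate(frac, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** k)
    return value


def to_binary_fraction(x: Fraction, max_bits: int = 20) -> str:
    """Binary digits of 0 <= x < 1 by repeated doubling.

    Each doubling moves the next bit to the left of the point: if the result
    is at least 1, that bit is 1 and we subtract 1. Stops when the remainder
    reaches 0; otherwise truncates after max_bits digits.
    """
    if not 0 <= x < 1:
        raise ValueError("expected 0 <= x < 1")
    digits = []
    while x != 0 and len(digits) < max_bits:
        x *= 2
        if x >= 1:
            digits.append("1")
            x -= 1
        else:
            digits.append("0")
    return "0." + ("".join(digits) or "0")


print(binary_fraction_value("0.11"))           # 3/4
print(to_binary_fraction(Fraction(7, 16)))      # 0.0111
print(to_binary_fraction(Fraction(1, 10), 12))  # 0.000110011001 (never terminates)
```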
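- Observe that this changes which numbers we can represent with
a finite number of digits. For example, our handout suggests that
2/5 cannot be represented in a finite number of binary digits, even
though it is simply 0.4 in decimal. In general, a fraction in lowest
terms has a finite binary expansion exactly when its denominator is a
power of two.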
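Whether a fraction terminates in binary depends only on its denominator in lowest terms: the expansion is finite exactly when that denominator is a power of two (just as a decimal expansion is finite exactly when the denominator has no prime factors other than 2 and 5). A minimal sketch of that test, with a hypothetical function name and Python's `fractions.Fraction`:

```python
from fractions import Fraction


def has_finite_binary_expansion(x: Fraction) -> bool:
    """True if x can be written with finitely many binary digits.

    Fraction keeps itself in lowest terms, so we only need to check that the
    denominator is a power of two, i.e. has exactly one 1 bit.
    """
    d = x.denominator
    return (d & (d - 1)) == 0


print(has_finite_binary_expansion(Fraction(7, 16)))  # True
print(has_finite_binary_expansion(Fraction(2, 5)))   # False
print(has_finite_binary_expansion(Fraction(3, 12)))  # True (3/12 is 1/4)
```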
- Nonetheless, this seems like the best way to represent numbers
with fractional parts.
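- Are there others? Yes. One might use groups of four bits to
represent decimal digits (binary-coded decimal, or BCD). This is clearly
less efficient, since each group of four bits has sixteen patterns but
uses only ten.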
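The four-bits-per-digit scheme (binary-coded decimal) is easy to sketch; `to_bcd` is an illustrative name. Because each group uses only 10 of its 16 patterns, one byte of BCD holds 0 to 99, while one byte of plain binary holds 0 to 255.

```python
def to_bcd(digits: str) -> str:
    """Encode each decimal digit as its own 4-bit group (binary-coded decimal)."""
    return " ".join(format(int(d), "04b") for d in digits)


print(to_bcd("1998"))     # 0001 1001 1001 1000 (16 bits)
print(format(1998, "b"))  # 11111001110 (plain binary needs only 11 bits)
```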
- However, there are still further design decisions to make. For
example, how do we place the decimal point?
- In fixed-precision or fixed-point representation,
you pick some number of bits that come
after the decimal point, and use those to represent the fractional
part.
- This limits your accuracy for small numbers. For example, if you've
only allowed three bits after the decimal point, your resolution is
1/8 (the value of the last bit). This means that you'd represent both (1/16) and
(-3/64) as 0.000.
- This limits the overall size of your numbers. For example, if you've
only allocated 13 bits to the whole part, your largest number must be
less than 2^13, or about 8,000.
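- However, computation is relatively cheap. You can simply use standard
integer computation: addition and subtraction work directly, and after a
multiplication you shift to put the decimal point back in place.
- On the other hand, that shift discards low-order bits, which can
further limit accuracy.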
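Here is a sketch of fixed-point arithmetic with three fraction bits, storing each value as an integer count of eighths. `FRAC_BITS`, `to_fixed`, `from_fixed`, and `fixed_mul` are hypothetical names, and the sketch assumes that conversion truncates toward zero. Addition is plain integer addition; multiplication needs a shift afterward, and that shift can lose bits.

```python
FRAC_BITS = 3            # hypothetical: three bits after the point
SCALE = 1 << FRAC_BITS   # the stored integer counts eighths


def to_fixed(x: float) -> int:
    """Store x as a whole number of eighths; int() truncates toward zero."""
    return int(x * SCALE)


def from_fixed(n: int) -> float:
    return n / SCALE


def fixed_mul(a: int, b: int) -> int:
    """The raw product counts 64ths (2 * FRAC_BITS fraction bits); shift back to eighths.

    Python's >> rounds toward negative infinity, so low-order bits are discarded.
    """
    return (a * b) >> FRAC_BITS


a, b = to_fixed(2.5), to_fixed(1.375)      # stored as 20 and 11
print(from_fixed(a + b))                    # 3.875: addition needs no shift
print(from_fixed(fixed_mul(b, b)))          # 1.875, although 1.375 ** 2 is 1.890625
print(to_fixed(1 / 16), to_fixed(-3 / 64))  # 0 0: both below the 1/8 resolution
print(from_fixed((1 << 16) - 1))            # 8191.875: largest in 16 bits (13 whole + 3 fraction)
```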
- To handle the aforementioned problems,
you might instead let the decimal point move ("float") and use extra
bits to indicate where the decimal point is positioned.
- In floating-point representation, you use something similar
to scientific notation (+/- n.nnnn * 10^x), and represent
- the digits,
- the exponent, and
- the sign separately.
- For example, in decimal .125 = 1.25 * 10^-1 might be represented as
- + for the sign
- 125 for the digits 1.25
- -1 for the exponent (10^-1)
- As in the cases above, some things get a little bit confusing as
we move to binary. In particular, the exponent now counts powers of
two instead of powers of ten.
- So, you would not represent .125 as
- 0 for the sign
- 01111101 for the 125
- 11111111 for the -1 (in two's complement)
- Instead, you might represent .125 as
- 0 for the sign
- 00000001 for 1
- 11111101 for the exponent (-3 in two's complement)
- Because .125 is 1/8, or .001 in binary, which is 1 * 2^-3.
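The second scheme (an integer significand times a power of two, with the exponent in two's complement) can be computed for any nonzero Python float. This sketch relies on `float.as_integer_ratio`, whose denominator is always a power of two for a finite float; `split_float` and `twos_complement` are illustrative names, and the 8-bit field widths follow the example above.

```python
def split_float(x: float):
    """Return (sign, significand, exponent) with
    x == (-1)**sign * significand * 2**exponent and significand an odd positive integer."""
    if x == 0:
        raise ValueError("zero has no such representation")
    sign = 1 if x < 0 else 0
    num, den = abs(x).as_integer_ratio()  # den is a power of two for any finite float
    exponent = -(den.bit_length() - 1)    # den == 2 ** -exponent
    while num % 2 == 0:                   # whole numbers: move trailing zero bits into the exponent
        num //= 2
        exponent += 1
    return sign, num, exponent


def twos_complement(n: int, bits: int = 8) -> str:
    """n as a bits-wide two's complement bit string."""
    if not -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
        raise ValueError(f"{n} does not fit in {bits} bits")
    return format(n & ((1 << bits) - 1), f"0{bits}b")


sign, significand, exponent = split_float(0.125)
print(sign, format(significand, "08b"), twos_complement(exponent))  # 0 00000001 11111101
```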
- It turns out that arithmetic is complicated in floating point.
Plauger reports that implementing the basic floating-point operations
takes as much microcode as implementing everything else on a typical
small computer.
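A toy model suggests why: even adding two positive numbers takes several steps. This sketch stores a number as an integer significand m and exponent e (value m * 2^e), with m normalized to a hypothetical 8-bit width. It is not the IEEE algorithm: it ignores signs, rounding, overflow, and special values, all of which real hardware must also handle.

```python
P = 8  # hypothetical significand width: 2**(P-1) <= m < 2**P


def fp_value(m: int, e: int) -> float:
    """The number m * 2**e as a Python float (for display only)."""
    return m * 2.0 ** e


def fp_add(a, b):
    """Add two positive (m, e) pairs; truncates instead of rounding."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:                       # let a be the operand with the larger exponent
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    mb >>= ea - eb                    # 1. align binary points; b's low bits fall off
    m, e = ma + mb, ea                # 2. add the significands
    while m >= 1 << P:                # 3. renormalize after a carry out of the top bit
        m >>= 1
        e += 1
    return m, e


one = (128, -7)                                 # 128 * 2**-7 == 1.0
print(fp_value(*fp_add((192, -7), (192, -8))))  # 2.25 (1.5 + 0.75)
print(fp_value(*fp_add(one, (128, -15))))       # 1.0 (2**-8 lost in step 1)
```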
- Designing floating point representations (and computation) is still
nontrivial. You must consider a number of issues.
- How many bits will you use for each component?
- How will you represent each component? Signed-magnitude, two's
complement, as a biased value? Will you use the same representation
for each component, or different ones?
- Will you use a separate bit for the sign (in effect, doing
signed-magnitude)?
- The IEEE (Institute of Electrical and Electronics Engineers)
serves as a standards body for many issues in computing.
They issue language, protocol, design, and other standards.
- (The IEEE does a number of other things, but that is the most
pertinent to our current concerns.)
- One of their most widely used standards is the IEEE Standard
for Binary Floating-Point Arithmetic (IEEE standard 754)
which discusses not just representation of floating point numbers,
but also computation with those numbers. This standard was
released in 1985.
- As suggested earlier, some of the first issues
in the design of a floating point representation are how to allocate bits
and represent components.
- The IEEE single-precision representation uses 32 bits, with
- one bit for the sign (effectively using signed-magnitude)
- 23 bits for the mantissa
- eight bits for the exponent
- The actual order is
SEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMM
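To inspect this layout on a real machine, one can ask Python's `struct` module to pack a number as a 4-byte IEEE single (format `f`) and reread the same bytes as an unsigned integer (format `I`). The helper name `float32_fields` is illustrative.

```python
import struct


def float32_fields(x: float):
    """Sign, exponent, and mantissa bit strings of x stored as an IEEE single."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # same 4 bytes, read as an unsigned int
    bits = format(word, "032b")
    return bits[0], bits[1:9], bits[9:]


for x in [1.0, -0.125, 5.0]:
    sign, exponent, mantissa = float32_fields(x)
    print(x, sign, exponent, int(exponent, 2) - 127, mantissa[:4] + "...")
# 1.0 0 01111111 0 0000...
# -0.125 1 01111100 -3 0000...
# 5.0 0 10000001 2 0100...
```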
- The mantissa is represented as an unsigned value. The base position
of the binary point is right before the mantissa, although the exponent
can shift it.
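- The exponent uses a 127-bias representation (excess 127, that is,
2^7 - 1, one less than the excess 2^(m-1) = 128 you might expect for eight bits).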
- But there are also some tricks ...
- The smallest exponent allowed is -126 (represented as 00000001) and the
largest exponent allowed is 127 (represented as 11111110).
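- Observe that this leaves 00000000 and 11111111 as reserved exponent
strings.
- 00000000 is used for zero and other numbers "close to zero"
(denormalized numbers), which are interpreted differently (see below)
- 11111111 is used for "error" values: infinity (mantissa all zeros)
and NaN, "not a number" (any nonzero mantissa)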
- In the standard representations (those not close to zero), a special
trick is used to get one more bit of accuracy.
- For nonzero numbers, it's clear that in standard scientific notation,
using binary (+/- b.bbb * 2^x), the mantissa will always be at least
1 and less than 2.
- If it's less than one, we should simply shift the bits left and
decrease the exponent
- If it's two or more, we should shift the bits right and increase
the exponent
- So, we can just assume the leftmost bit is 1 and not bother including it
in our representation.
- This bit is called the hidden bit.
- In the representations of numbers close to zero, the hidden bit isn't
used (it is taken to be 0), and the exponent is fixed at -126.
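Putting the pieces together (sign bit, bias 127, hidden bit, and the two reserved exponent patterns), a hand-written decoder might look like the sketch below. `decode_float32` and `check` are hypothetical names; `check` compares the result with what Python's `struct` module produces for the same bits. Spaces in the bit string are ignored, so it accepts grouped notation like that in the exercises.

```python
import math
import struct


def decode_float32(bits: str) -> float:
    """Decode a 32-bit IEEE single given as a bit string (spaces ignored)."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    sign = -1.0 if bits[0] == "1" else 1.0
    exp_field = int(bits[1:9], 2)
    frac = int(bits[9:], 2) / 2 ** 23      # the 23 mantissa bits as a fraction in [0, 1)
    if exp_field == 0b11111111:            # reserved: infinity (frac == 0) or NaN
        return sign * math.inf if frac == 0 else math.nan
    if exp_field == 0:                     # close to zero: hidden bit 0, exponent fixed at -126
        return sign * frac * 2.0 ** -126
    return sign * (1 + frac) * 2.0 ** (exp_field - 127)  # hidden bit 1, bias 127


def check(bits: str) -> bool:
    """Compare decode_float32 with the machine's own reading of the same bits."""
    raw = int(bits.replace(" ", ""), 2).to_bytes(4, "big")
    (expected,) = struct.unpack(">f", raw)
    got = decode_float32(bits)
    return (math.isnan(got) and math.isnan(expected)) or got == expected


print(decode_float32("0 10000001 01" + "0" * 21))  # 5.0
print(decode_float32("1 11111111 " + "0" * 23))     # -inf
print(check("0 00000000 1" + "0" * 22))             # True (a number close to zero)
```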
- Exercises
- What is 0 1000 1000 00000000000000000000000?
- What is 1 1000 1000 00000000000000000000000?
- What is 0 0000 1000 00000000000000000000000?
- What is 0 1000 1000 01000000000000000000000?
- How do you represent 0?
- How do you represent 1?
- What is the smallest number you can represent?
- What is the largest number you can represent?