Fundamentals of Computer Science II (CSC-152 97F)


Machine-level data types

Although most programming languages encourage the programmer to believe that such data types as characters, integers, and reals are primitive and unanalyzable, this is an illusion. Computers represent values of these types in ways that correspond more directly to the nature and structure of their electronic components, and they operate on such values in ways that correspond more directly to circuit designs that are simple and inexpensive to make. Pascal, Scheme, and Java's simple data types are abstractions; the authors of a compiler for these languages (or of a Java Virtual Machine) have to implement these abstractions, using the machine-level data types provided by the particular computer that will run the executable programs that the compiler constructs. In addition, the designer of the base computer will need to implement the corresponding machine-level data types.

The designs of computers (i.e., the natures and structures of their components) are very diverse, but a few generalizations hold for all, or almost all, contemporary computers. One fundamental principle is that the smallest components in which data of any kind are stored are bistable, which means they are capable of taking on either of two distinguishable states and remaining in either state for a long time. Bistability may depend upon power being supplied to the component: some bistable devices cannot be relied on to retain their state if power is shut off, and these are said to be volatile.

Different kinds of computer hardware use different physical components as bistable devices, and the technologies have also varied enormously over time. In practice, many of the bistable devices have a continuous range of values, and a dividing point is used to determine which of the two states they are in. For our purposes, we don't need to make any assumptions about their physical structure, because the "programming interface" to any bistable device is the same: the programmer can set it to either of its two states, and can determine which state it was most recently set to. A toggle-style light switch is a familiar model of a bistable device: You can put it in the "on" or in the "off" position, and you can see by looking at it which position it is currently in.
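
The interface just described can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real hardware: the class name BistableDevice, the method names, the voltage-like level, and the dividing point of 0.5 are all assumptions invented for the example.

```python
class BistableDevice:
    """Toy model of a bistable device (hypothetical; not real hardware)."""

    THRESHOLD = 0.5  # assumed dividing point between the two states

    def __init__(self):
        # A continuous underlying quantity, such as a voltage (assumed).
        self._level = 0.0

    def set_state(self, state):
        """Put the device in state 0 or state 1."""
        if state not in (0, 1):
            raise ValueError("a bistable device has only two states")
        # The stored level need not be exactly 0.0 or 1.0; it only has
        # to fall clearly on one side of the dividing point.
        self._level = 0.9 if state == 1 else 0.1

    def read_state(self):
        """Report which state the device was most recently set to."""
        return 1 if self._level > self.THRESHOLD else 0


switch = BistableDevice()
switch.set_state(1)
print(switch.read_state())  # 1
switch.set_state(0)
print(switch.read_state())  # 0
```

Code that uses the device sees only set_state and read_state; the continuous level stays hidden, which is the sense in which the physical structure does not matter to the programmer.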

There are several different ways of describing the states of a bistable device. Sometimes, as in the case of the light switch, it is natural to call them "on" and "off." Putting a switch in the "on" position is sometimes called "setting" the device, and putting it in the "off" position is called "clearing" it; hence the states are sometimes called "set" and "clear." But the most common notation uses the digits 0 and 1 for the "off" and "on" states. In the binary or base-two system of numeration, only these two digits are used, so they are often referred to as binary digits or, for short, bits. By extension, one also sometimes refers to a bistable device as a bit.

Since there are only two binary digits, a single bistable device can represent a datum only if the type of the datum has at most two members. For example, it can represent a Boolean datum, since the Boolean data type contains only the two values true and false. (Which binary digit represents true is more or less arbitrary, but it is conventional to use 1 for true and 0 for false.) However, most data types contain many more possible values and are therefore represented at the machine level by sequences of bits.
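
How long must such a sequence be? A sequence of n bits has 2^n distinct patterns, so a type with k values needs the smallest n for which 2^n is at least k. The following Python sketch computes that number; the function name bits_needed is made up for this example.

```python
def bits_needed(num_values):
    """Smallest number of bits whose patterns can distinguish num_values values."""
    if num_values < 1:
        raise ValueError("a data type must have at least one value")
    # n bits give 2**n patterns; (num_values - 1).bit_length() is the
    # smallest n for which 2**n >= num_values.
    return (num_values - 1).bit_length()


print(bits_needed(2))    # 1  (a Boolean fits in a single bit)
print(bits_needed(7))    # 3  (e.g., a type with seven values)
print(bits_needed(256))  # 8
print(bits_needed(257))  # 9
```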

In most computer designs, it turns out to be simplest and most cost-effective to arrange bistable devices in groups of fixed sizes, which can be treated as units and connected up with other devices in parallel, so that data can be copied from one group of bistable devices to another more quickly. The term byte is often used for a small group of bits, typically the smallest group that forms an addressable unit in a computer memory. Typically, one character datum can be stored in a byte. The number of bits in a group varies from one machine design to another and also from one component of a given machine to another, but over the history of computer engineering designers have increasingly tended to use powers of two as convenient sizes for groups. For instance, although many past computers used six-bit bytes or nine-bit bytes -- that is, their memories were designed so as to treat groups of six or nine bistable devices as units -- the most popular present-day computers use eight-bit bytes.

Since each bit in a byte can independently be either a 0 or a 1, the number of different values that can be stored in an eight-bit byte is two to the eighth power, or 256. A datum of any type that includes 256 or fewer values, such as the character data type in Pascal, can be stored in an eight-bit byte.
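
The count of 256 can be checked by brute force. The Python sketch below lists every possible pattern of eight bits and then shows the pattern for one character; the use of the ASCII code for the letter A is an assumption made for illustration, since the text does not name a character code.

```python
from itertools import product

BYTE_SIZE = 8  # bits per byte, as on the machines discussed above

# Each bit is independently "0" or "1", so take every combination.
patterns = ["".join(bits) for bits in product("01", repeat=BYTE_SIZE)]

print(len(patterns))  # 256
print(patterns[0])    # 00000000
print(patterns[-1])   # 11111111

# One character stored in one byte (assumption: ASCII, where A is 65).
print(format(ord("A"), "08b"))  # 01000001
```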

It is conventional to number the bits that make up a byte from right to left, starting with 0. In an eight-bit byte, then, the leftmost bit is "bit 7" and the rightmost is "bit 0."
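
Under this numbering, bit i is the one that contributes 2^i to the value of the byte, so it can be picked out by shifting and masking. The helper names below (get_bit, set_bit, clear_bit) are hypothetical, chosen for this sketch, and the byte is modeled as a Python integer between 0 and 255.

```python
def _check(byte, position):
    """Reject values that do not fit the eight-bit model used here."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("value does not fit in an eight-bit byte")
    if not 0 <= position <= 7:
        raise ValueError("bit positions run from 0 to 7")


def get_bit(byte, position):
    """Return bit `position` of the byte; bit 0 is the rightmost."""
    _check(byte, position)
    return (byte >> position) & 1


def set_bit(byte, position):
    """Return the byte with bit `position` set to 1."""
    _check(byte, position)
    return byte | (1 << position)


def clear_bit(byte, position):
    """Return the byte with bit `position` cleared to 0."""
    _check(byte, position)
    return byte & ~(1 << position) & 0xFF


value = 0b10000110
print(get_bit(value, 7))  # 1  (leftmost bit)
print(get_bit(value, 0))  # 0  (rightmost bit)
print(format(set_bit(value, 0), "08b"))    # 10000111
print(format(clear_bit(value, 7), "08b"))  # 00000110
```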

In most computers, the memory system is byte-addressable -- each byte of memory has an address, which is simply a sequence of bits that identifies it uniquely. (In many implementations, Java's references are essentially the addresses of the bytes that make up an object, although the language itself does not expose them as numbers.) The number of bits in an address depends both on the design of the memory system and on its maximum size; the most common address size at present is thirty-two bits. Interestingly, addresses can also be treated as data (stored in memory, copied from one component of the computer to another, operated on in various ways); in fact, in most implementations of Pascal, addresses are used to represent values of pointer types.
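
A byte-addressable memory can be modeled, very roughly, as an array of bytes whose indices play the role of addresses. The sixteen-byte memory and the particular addresses below are assumptions chosen to keep the sketch small; a real memory is vastly larger, and a thirty-two-bit address would not fit in a single byte as it does here.

```python
ADDRESS_BITS = 32
print(2 ** ADDRESS_BITS)  # 4294967296 distinct addresses are possible

# Toy memory: 16 bytes, with addresses 0 through 15.
memory = bytearray(16)

# Store a character datum at address 3 (assumption: ASCII code).
memory[3] = ord("A")

# An address is itself data: store the address 3 in the byte at address 8.
memory[8] = 3

# Following the stored address, as a pointer would, recovers the character.
pointer = memory[8]
print(chr(memory[pointer]))  # A
```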

The other common grouping of bits is called a word. In most computer designs, the central processing unit contains a number of registers -- groups of bistable devices that are intensively wired up so that a variety of logical and arithmetic operations can be performed on them. All, or almost all, of these registers will be the same size, in the sense that they contain the same number of bits. This number is the computer's word size, and a word is simply a group of bistable devices, not necessarily located in the central processing unit, that is the same size as a register. Word sizes vary from computer to computer, but at present the most popular designs use thirty-two-bit, sixteen-bit, or sixty-four-bit words. The Hewlett-Packard workstations on MathLAN use thirty-two-bit words.

The number of different values that can be stored in a word is two to the power of the word size -- in the case of the HPs, two to the thirty-second power, or 4294967296. Values of the integer data type (for instance) are generally stored in words. Like the bits in a byte, the bits in a word are conventionally numbered from right to left, starting with 0.
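
The arithmetic can be checked directly. Python integers are not limited to a word, so the sketch below imitates a thirty-two-bit word by masking off everything above bit 31. Treating the word as an unsigned integer and discarding any carry out of the top bit are assumptions of this example, not claims about how any particular machine behaves.

```python
WORD_SIZE = 32                     # bits per word, as on the HPs
WORD_MASK = (1 << WORD_SIZE) - 1   # thirty-two 1 bits


def add_words(a, b):
    """Add two unsigned words, keeping only the low WORD_SIZE bits."""
    return (a + b) & WORD_MASK


print(1 << WORD_SIZE)           # 4294967296 distinct values
print(WORD_MASK)                # 4294967295, the largest unsigned value
print(add_words(WORD_MASK, 1))  # 0, since only 2**32 patterns exist
```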

Computer designs, like that of the HPs, in which addresses and words contain the same number of bits, have the advantage that registers can be used interchangeably for addresses and for other data. But not all computers have this feature, and there is no necessary connection between address size and word size.


Disclaimer: Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.

Source text last modified Tue Oct 14 09:03:51 1997.

This page generated on Tue Oct 14 09:04:33 1997 by SamR's Site Suite.

Contact our webmaster at rebelsky@math.grin.edu