Regardless of whether you decide to implement queues with arrays or with
lists, the type of the values you store into that underlying data structure
should be void *, since that is the form in which the enqueue
function receives them and the form in which dequeue and front_of_queue are expected to deliver them. It is pointless to use any
other pointer type for this purpose and dangerous to use int, since
C makes no guarantee that casting a void pointer to an int value and
back again preserves its value.
Indeed, on a computer that uses a 64-bit architecture, a void pointer is
likely to occupy eight bytes of storage and an int only four, so
that casting a void pointer to int drops the four most significant
bytes, with no hope of recovery. Bad idea.
Nor may you dereference the void pointer and store whatever is on the other
end by casting it to, say, int. The problem with an expression like
is that the caller did not necessarily store an int at the other end
of that pointer. It may have started out as a char * or as a
pointer to some complicated struct, and so that chopping out four
bytes from the value at the other end of the pointer and storing those four
bytes in a int often loses part of the datum. In any case, storing
only this int value during enqueueing will not enable you to
reconstruct the void pointer that the dequeue function should
return.
Nor can you use malloc to allocate storage dynamically for the
values at the other end of void pointers. You have no way to figure out
how large that value is. Writing something like
is incorrect, because the size of the pointer is likely to be different from (and less than) the size of the value to which it points. Writing something like
is very likely to fail unless there really is a string at the other end of
the data pointer, since a zero-valued byte can occur in the middle
of a struct; and if it doesn't, there's no reason to expect one to
occur immediately after the struct.
Dereferencing a void pointer is almost always the wrong thing to do.
Since theoretically there is no upper bound on the number of elements that
a queue can contain, and that number can vary widely and erratically during
the execution of a program, it is natural to store the elements of a queue
in a singly linked list, so that a new, dynamically allocated component can
be added every time a value is enqueued. Such a component will be freed
when the value, having passed through the queue, reaches the front and is
dequeued, or when the entire queue is freed by a call to free_queue.
Each component of a singly linked list is a structure, looking something like this:
An empty singly linked list is then conventionally represented by (struct queue_node *) NULL, and a non-empty singly linked list by a
pointer to a struct queue_node in which the stored_value
field is the first element of the list (the car, in Scheme terminology) and
the next field is the rest of the list (the cdr).
It is possible to write a queue implementation in which struct
queue_structure and struct queue_node are identical, but more
commonly the fields of struct queue_structure provide information
that speed up some common operations on queues. For instance, the
interface to a queue type sometimes includes a queue_size function
that reports the number of values currently stored in a given queue, with
the prototype
To support this function, struct queue_structure might well include
a size field that make_queue initializes to 0 (don't forget
this step!), enqueue increments, and dequeue decrements. In
the singly linked list implementation, one would otherwise have to compute
the size by traversing the list, counting the components along the way.
Under that approach, queue_size would have an O(n)
(linear) running time, whereas maintaining a size field allows the
function to return a value in O(1) (constant) time.
Similarly, a naive algorithm for enqueue would traverse the list to
get the last component before attaching the component containing the new
value, again yielding an O(n) running time. A better idea is
to add a field to struct queue_structure containing a pointer to the
most recently appended component: a rear pointer. The definition of
struct queue_structure might look like this:
The front pointer points to the struct queue_node containing
the least recently enqueued value in a non-empty queue; in an empty queue,
front will be (struct queue_node *) NULL. It doesn't make
any difference what value is stored in the rear field when a queue
is empty, but it is good practice to have make_queue store a null
pointer and also to have dequeue store a null pointer there when
removing the last component of the linked list, just so that only valid,
non-dangling pointers occur in the structure.
For the same reason, the enqueue function should store a null
pointer in the next field of the struct queue_node that it
allocates.
It's a little dangerous to have three kinds of pointers around (void
pointers for the actual values stored in the queues, pointers to
linked-list components, and pointers to struct queue_structure
headers), so programmers who adopt the linked-list implementation need to
take care that their local variables and, especially, return values are
conceptually of the correct type. (The C compiler is no help here, since
it tacitly casts any pointer to whichever type it thinks is needed.)
Alternatively, the elements of a queue can be stored in a (dynamically
allocated) array, in which case the struct queue_structure typically
comprises two fields, each of which holds an index into that array. In one
good implementation, the “front” index is the position occupied by
the least recently added element, and the “rear” index is the
(vacant) position that will be occupied, when the next enqueue operation
occurs, by the new element that it adds. In this representation, a queue
is empty if the two indices are equal (note: not if the values stored
at those indices are equal, but if the indices themselves are
equal). The rear index is incremented every time an element is enqueued,
the front index every time an element is dequeued. In each case, the
incrementation occurs after the array operation has been completed.
It is usual for both indices to “wrap around” to 0 when the high end of the array is reached, so that the positions at the low end of the array, occupied by values that are no longer in the queue, can be recycled for new values. The size of the array is then an upper bound on the number of elements that can be stored in the queue simultaneously, rather than on the total number of values that can pass through the queue during the entire execution of the program.
The array implementation is simpler and faster when the programmer can
guarantee that the upper bound on the size of the queue will never be
exceeded. In the absence of such a guarantee, the programmer must decide
what course of action to take if enqueue is called at a time when
there is no vacant position in which to store the new value. (In the
wraparound implementation described above, a problem arises even when there
is one vacant position remaining, because filling it and incrementing the
rear index would cause the two indices to be equal, a situation
indistinguishable from an empty queue.)
One possibility is to dynamically allocate a new, larger array, copy the
values in the existing array into it, and free the existing array. Since
this is a slow process, it is probably a good idea to allocate an array
that is, say, twice as large as the existing array, thus leaving room for a
lot more elements, in order to avoid or at least postpone another resizing
operation. On this plan, the struct queue_structure needs a third
field, to keep track of the size of the currently allocated array (not the
number of elements currently in the queue, but the amount of storage
allocated for queue elements, including both the positions that are
occupied by current elements and those that are not).
If it is inconvenient or impossible to allocate storage for a larger array,
the enqueue function may have to report an error and halt. It is
better to do this than to access storage locations outside of the array (by
allowing the indices to grow too large) or to overwrite queue elements (by
wrapping around into array positions that are already in use).
If you adopt this approach, don't forget to free the old array once the elements have been copied into the new one!
The dequeue and front_of_queue functions (and, less
obviously, the free_queue function) have preconditions that the
caller is supposed to satisfy. It is almost inevitable, in the use of any
program library, that someone will eventually call the procedure without
ensuring that the precondition is satisfied. The programmer of the queue
library must consider what her code should do in such a situation.
One possibility, of course, is to do nothing. In the case of dequeue or front_of_queue, the function then attempts to
dereference a null pointer and the caller's program will crash with the
unhelpful error message “Segmentation fault”. Well, after all, it
is the caller's fault -- he should have checked the documentation and
made sure that the documented precondition was satisfied!
Another possibility is to use an assertion to check that the precondition is met, writing something like
or even
to catch the case in which the user tries to dequeue from a queue that has been freed, or one that was never allocated in the first place.
A more traditional approach, which gives the programmer greater flexibility in reporting and handling precondition failures, is to test the precondition explicitly and write a warning or error message to standard error:
If you want program execution to stop when a precondition is not met,
invoke the library procedure exit, giving it a nonzero argument:
EXIT_FAILURE is defined in stdlib.h to be a standard
nonzero value. You can define your own status codes, of type int,
if you want to give a more precise diagnosis of the cause of the error. On
GNU/Linux and Unix systems, the shell running in a terminal window can
report the status code of the last program that it ran. On MathLAN, for
instance, typing the command
reports that status code. (It's typically 0, which is the conventional status code for a program that terminated normally.)
If you're trying to perform some operation on each of the elements of a singly linked list, it won't work to write
This is wrong for two reasons: (0) It skips over the last element of a
non-empty list, which never gets operated on, and (1) it crashes if the
list is empty. (There is no traverser->next when traverser
is a null pointer.)
The code you want is
The code at the end of the preceding section, for traversing a singly
linked list, doesn't quite work if the operation that you're performing on
each element is to free the storage that it is occupying! Since this
operation invalidates the value of traverser itself, you can't turn
around and use that value to advance to the next node in the list.
Instead, write
It is important to traverse the linked-list structure and free each
component of the linked list, since the queue module itself allocated those
components (in calls to enqueue) and knows how to get their
addresses. The caller has no other way to free the components except by
invoking dequeue repeatedly until the queue is empty, and not all
users of the queue module will even consider that approach.
Note, that it is not necessary and not desirable to include the step
to deallocate the storage to which the void pointer in the datum
field points. The queue functions are not responsible for that storage.
They didn't allocate it, and they have no way of knowing where it came from
or even whether it was dynamically allocated to begin with. Moreover, the
caller may still have another copy of the pointer stored in the datum and may have some use for the value at the other end. Don't take it
away just because the queue no longer needs access to it!
The free_queue function should also deallocate the storage for the
"header" structure:
However, it is an error to write
in the free_queue function, since address_of_queue itself is
not dynamically allocated (it does not acquire its value from malloc, either directly or indirectly).
In an array implementation, free_queue should deallocate the space
for the array and for the header:
but, again, not for the stored data (free(q->elements[index]); would
be incorrect) and not for the storage location containing the address of
the queue (free(address_of_queue); would still be incorrect).
address_of_queue
The specification for the free_queue function says that, after the
queue has been freed, the pointer to that queue should be overwritten with
a null pointer. (This practice helps to prevent the circulation of
“dangling” pointers -- pointers to storage locations that have been
deallocated, so that the pointers are no longer valid -- and also makes it
easier to detect and correct the common error of attempting to free the
same queue more than once.)
The code that performs this step is simply
and it would be conventional to place this step at the end of the
free_queue procedure, after the traversal and deallocation of the
queue components.
Incidentally, it's probably a good idea to program defensively against
users of the library who try to free the same queue twice by enclosing the
body of free_queue in an if-statement:
or, more aggressively, by writing
at the beginning.
I saw a few variations like this:
(without the initial *). This is pointless. It changes the value
of the parameter within the free_queue function, but has no effect
on the caller's variables.
malloc
In order to allocate enough storage for a value, the malloc function
needs to know how big that value will be -- that is, how many bytes such a
value occupies. You can use the sizeof operator to compute the size
of a value of a specified type. For instance,
gives you the number of bytes that one value of type struct
queue_structure occupies. Do not attempt to compute this explicitly by
adding up the sizes of the fields of the structure -- the C compiler is
allowed to leave holes in the middle of a structure, to rearrange its
fields, or to pad it with additional bookkeeping information, all of which
can affect the value that sizeof will return.
The expression
gives you the number of bytes needed to store one value of type queue -- that is, one value of type struct queue_structure *. This
is unlikely to be what you want. An allocation statement like
where the pointer that malloc returns is of the type that is
provided as the argument to sizeof, is almost certainly wrong, since
normally the type of a pointer differs from the type of the object to which
it points, and there is no reason to expect the sizes of values of these
types to be equal.
If you find yourself typing in the same code sequence, or nearly the same code sequence, over and over again, stop it! You are creating code that is tedious to write, tedious to read, difficult to maintain, and extremely error-prone.
Look around for some way to rearrange or refactor the code so that the part
you're repeating occurs only once. For instance, if the consequent and the
alternative of an if-statement are both long and are almost alike,
reduce the scope of the if-statement so that it encloses only the
part that differs. If the same loop occurs in several different places,
with only the starting value of the loop control variable and the number of
iterations being different, put the loop in a function and pass in the two
varying items as arguments. And so on -- there are lots of refactoring
techniques.
If you find yourself using copy and paste to write large swatches of code quickly, stop it! Really, just stop it! -- you're abusing the editing tools. The resulting code is slightly less tedious to write, but it's still just as tedious to read, difficult to maintain, and error-prone as if you had retyped it. Stop and think instead.
malloc
The specifications for the library functions malloc, calloc,
and realloc acknowledge that it may be impossible for these
functions to allocate dynamic storage on demand. If any of these functions
fails in its attempt to allocate dynamic storage, it returns (void *)
NULL.
On modern computers, the amount of storage available is very large. On the other hand, many of the programs that we tend to write demand extraordinary amounts of storage. On balance, it's probably a good idea to get into the habit of checking the value returned by a call to a storage-allocation function before trying to dereference the pointer:
I have the habit of writing explicit casts in many cases where C allows implicit ones, for instance in
and
and even
These make absolutely no difference in the execution of compiled code, since such casts are no-ops at the machine level anyway. They are basically reminders to me and my readers to bear in mind the data types to which the expressed values belong and to acknowledge the need for adapters to accommodate the differences between what libraries provide and what applications need. You can adopt this practice if you find it useful, but a “reading knowledge” is sufficient if you find that it gets in your way.
If you use identifiers that are defined in one or more of the standard
libraries, you must include the header files for those libraries in your
code, preferably using the #include directive.
Some C compilers will fill in the missing #include directives for
you automatically, at least in some cases. Don't rely on this.
If you find it difficult to remember which of the libraries is the one that
contains, say, malloc, open a terminal window and type
The synopsis at the top of the manual page tells you which header file to include.
main function
When you're writing a library that will be compiled separately and linked
to a variety of application programs, don't include a function called main. C reserves that function name as the entry point for a stand-alone
executable.
To write the queue library to which queue.h is the interface and a test program that tests the functions in the library, you should create two separate files, queue.c and test-queue.c. Both would contain the line
to ensure that they agree on the interface. The command to compile the library should be something like
and the (subsequent) command to compile the test program, linking the compiled library to it, should be something like
This yields a standalone executable program, test-queue, which you can run just by typing its name:
Two submitters were foolish enough to modify queue.h, even
though the assignment unconditionally says not to do that. In particular,
it is important to separate the implementation of the opaque data type
struct queue_structure and of the six specified functions from their
public interface, which is all that a .h file should contain.
If you do rewrite some code file that I supply, amend the authorship comment at the top of the file to reflect your contributions.
Many C programmers strive for concision and efficient execution, sometimes even at the expense of clarity. For instance, a C programmer is not likely to write
when she can instead write
with the same effect.
C allows programmers to omit the return type from the function prototype
and header, but this is probably bad style. Many readers will assume that
the unspecified return type is void, but it's actually int.
Many of you did some precondition testing and executed an early function
return when the precondition was not met, often after printing an error
message. Unless the function is declared with the result type void,
however, you should not do this by executing a return-statement with
no expression in it! The danger in this practice is that the caller,
expecting to receive a useful value, will get whatever garbage bits happen
to be in the position on the run-time stack where the return value should
have gone, and is likely to try to treat those garbage bits as if they
constituted a useful value.
If you're going to return after detecting an error (instead of invoking the
exit function or failing an assertion), you must return some value
or other -- usually a null pointer, if the result type is some pointer
type, or 0 or -1 if it is an integer type, etc.
Nothing in the definition of the C language stops you from attempting to index into an array with a negative index expression, or with one that has a value greater than or equal to the declared size of the array. Nevertheless, such an attempt usually ends unhappily.
printf
To bring up the manual page on MathLAN that describes format specifications
for the library function printf, type
at the prompt in a terminal window. Use the format specification appropriate to the type of value that you want to print.
In particular, if you want to print out a pointer, cast it to void *
and use the %p format specification. On MathLAN workstations,
you'll get a base-sixteen numeral specifying the address of the storage
location to which the pointer points.
It is an error to write
The first line declares sample to be a variable, of type “pointer to integer,” but it does not initialize that variable, by
storing a valid address into it; it leaves that variable uninitialized, so
that the bits stored in the variable are whatever random garbage happens to
be in the memory location associated with that variable.
The second line then tries to perform the dereferencing operation on the
value of sample, so as to recover the address stored in that
variable and to place the integer 42 at that address. Since the “address” consists of garbage bits, this is unlikely to succeed.
There are two ways to get a valid address to store into sample. One
can call the dynamic storage allocator to get one:
or one can take the address of a variable or parameter of the relevant type, provided that it is in scope:
In the latter case, the value of target becomes 42, as a side effect
of the assignment through sample.
Many programmers use a “telegraphic” prose style when writing comments. In this style, the subject of a sentence is omitted when it would be obvious or conventional (e.g., “This function”), as are articles and full stops at the ends of declarative and imperative sentences. For instance, Rob Pike's comments in the regular-expression matcher discussed in chapter 1 of Beautiful code are written in telegraphic style.
In my opinion, the use of this style is an obsolete convention, dating from a time when there was a measurable cost for every byte consumed by a program's source code, so that there was an economic incentive to keep comments very short. Those days are long gone. My advice today is to follow the same conventions when writing comments as when writing expository essays.