Exercise #5, our last programming assignment, asks you to write a DrScheme program that generates and displays, in a window, a two-dimensional plot in which the heights of some trees are plotted against their diameters. The data to be plotted are available in a text file from which we can read directly. (If you haven't yet read the statement of the exercise, do so now.)
After the lab on drawing classes, setting up a frame with a canvas inside it, establishing the dimensions of the canvas, and selecting its device context are matters of routine.
Set up a frame with a canvas inside it, establish the dimensions of the
canvas, and select and name the canvas's device context. Send the frame a
show message to make it visible.
The basic plotting operation is to place a dot on the canvas at the
position that represents the height and diameter of a given tree. The
naive approach is simply to send the device context a
draw-point message, enclosing the tree's height and diameter,
as recovered from the data file.
Using file recursion, write a procedure that takes an already-open input
port as its argument, reads in the diameter, height, and volume of a tree,
sends a draw-point message to the canvas's device context, and
calls itself recursively to deal with the next tree, stopping when the
attempt to read in the next diameter results in an end-of-file object.
When such a message has been sent for each of the thirty-one trees, the canvas looks like this:
Clearly there are some difficulties here.
To begin with, the dots aren't big enough. The draw-point
message blacks out only one pixel, which is almost invisible.
Suggest and implement a solution to this problem. (There are two good approaches: (1) In addition to the pixel that represents the tree's diameter and height, black out a few adjacent pixels. (2) Instead of a point, draw some other small figure.)
Next problem: the dots are all bunched up in the upper left-hand corner. Instead of using the raw data values as pixel-index coordinates, we should apply a scaling function that transforms each datum into a coordinate, and place the dot at the position indicated by the result of the transformations.
Define and test a higher-order procedure rescaler that takes
four real numbers as arguments -- the lower and upper bounds of the range
within which a group of data lie, and the lower and upper bounds of the
range of coordinates into which the data are to be transformed -- and
returns a procedure that takes any real number in the former range and
transforms it into the corresponding real number in the latter range.
> (define diameter-scaler (rescaler 8.3 20.6 0 511))
> (diameter-scaler 8.3)
0.0
> (diameter-scaler 20.6)
511.0
> (diameter-scaler 11.3)
124.6341463414634
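One way to approach rescaler is as a linear interpolation: find how far the datum lies along the data range, as a fraction, and move the same fraction along the coordinate range. The following is a minimal sketch under that assumption, not the only acceptable definition; the parameter names are invented for illustration.

```scheme
;; rescaler: given the bounds of a data range and the bounds of a
;; coordinate range, return a procedure that maps a datum linearly
;; from the first range into the second.
;; Assumes old-low and old-high differ (otherwise we divide by zero).
(define (rescaler old-low old-high new-low new-high)
  (lambda (datum)
    (+ new-low
       (* (/ (- datum old-low) (- old-high old-low))
          (- new-high new-low)))))

(define diameter-scaler (rescaler 8.3 20.6 0 511))
```

Because the result is a procedure, the bounds are fixed once and the scaler can then be applied to every datum in turn.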
In order to construct our scaling procedures, however, we need to know the minimum and the maximum of the tree diameters in the data set that we're going to plot, and also the minimum and the maximum of the tree heights. Since we have to inspect all of the data to determine these values, we're going to have to separate the input phase of the computation from the processing and output phases. Instead of reading in the data file one tree at a time and plotting the data for each tree immediately, before going on to the next one, we need to read in the whole file, saving all the data in some structure, and then search through that structure for the minima and maxima that we need.
Choose a suitable structure for retaining the diameter, height, and volume
of an indefinite number of trees. (There are several reasonable choices: a
matrix, a list of tree records, and a vector of objects of a
tree% class are among the possibilities.)
Using file recursion, define a procedure that transfers all of the tree data from the specified source file to the structure you have chosen, then returns that structure.
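As an illustration of the file-recursion pattern, here is a sketch that assumes the simplest of the structures mentioned above: a list in which each tree is itself a three-element list of diameter, height, and volume. It also assumes that the data file contains nothing but whitespace-separated numbers, three per tree. The names read-trees and trees-from-file are hypothetical, and the sample values are made up for the demonstration rather than taken from the exercise's data file.

```scheme
;; read-trees: read diameter, height, and volume triples from an
;; already-open input port until end-of-file, returning a list of
;; three-element lists (diameter height volume).
(define (read-trees port)
  (let ((diameter (read port)))
    (if (eof-object? diameter)
        '()
        (let* ((height (read port))
               (volume (read port)))
          (cons (list diameter height volume)
                (read-trees port))))))

;; trees-from-file: open the named file, read all of the trees from it,
;; and return the resulting list.
(define (trees-from-file filename)
  (call-with-input-file filename read-trees))

;; Demonstration on a string port, so that no file is needed.
;; (open-input-string is available in DrScheme; it is not part of R5RS.)
(define sample-trees
  (read-trees
   (open-input-string "8.3 70 10.3\n11.3 79 24.2\n20.6 87 77.0\n")))
```

The let* matters here: it guarantees that the height is read before the volume, which a plain let does not promise.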
Revise the plotting procedure from step 3 so that it takes your data structure as its argument instead of the input port and uses the appropriate form of recursion to traverse that structure tree by tree, plotting the diameter of each tree against its height.
Define procedures for finding the minimum diameter, the maximum diameter, the minimum height, and the maximum height of the trees in the data set, given your chosen structure.
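If the chosen structure is a list of three-element lists (one possible choice among those suggested earlier), the four procedures differ only in which field they inspect and which comparison they use, so one helper can serve all of them. This is a sketch under that assumption; extreme, tree-diameter, and tree-height are names invented here, and the sample data are made up.

```scheme
;; Selectors for a tree represented as (diameter height volume).
(define (tree-diameter tree) (car tree))
(define (tree-height tree) (cadr tree))

;; extreme: return the value of (selector tree) that beats all the others
;; according to better?, which should be < for a minimum or > for a maximum.
;; Precondition: trees is a non-empty list.
(define (extreme better? selector trees)
  (let loop ((best (selector (car trees)))
             (rest (cdr trees)))
    (cond ((null? rest) best)
          ((better? (selector (car rest)) best)
           (loop (selector (car rest)) (cdr rest)))
          (else (loop best (cdr rest))))))

(define (minimum-diameter trees) (extreme < tree-diameter trees))
(define (maximum-diameter trees) (extreme > tree-diameter trees))
(define (minimum-height trees) (extreme < tree-height trees))
(define (maximum-height trees) (extreme > tree-height trees))

;; Made-up sample data for testing.
(define sample-trees '((11.3 79 24.2) (8.3 70 10.3) (20.6 87 77.0)))
```

With the sample list, (minimum-diameter sample-trees) is 8.3 and (maximum-height sample-trees) is 87.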
Construct and name appropriate scaling procedures for transforming tree diameters to x-coordinates of your canvas and for transforming tree heights to y-coordinates. (Remember that the y-coordinates are distances from the top of the canvas, so the "lower" and "upper" bounds of the range of coordinates are the opposite of what you might initially suppose.)
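To see what the reversed bounds look like in practice, here is a small sketch. It assumes a canvas 512 pixels high and hypothetical height bounds of 63.0 and 87.0; the real bounds should come from the data. The definition of rescaler is one possible linear-interpolation version, repeated so that the example stands alone.

```scheme
;; One possible definition of rescaler (linear interpolation).
(define (rescaler old-low old-high new-low new-high)
  (lambda (datum)
    (+ new-low
       (* (/ (- datum old-low) (- old-high old-low))
          (- new-high new-low)))))

;; The smallest height maps to the bottom row of the canvas (511) and
;; the greatest height maps to the top row (0), so the coordinate
;; bounds are given in the opposite order from the data bounds.
(define height-scaler (rescaler 63.0 87.0 511 0))
```

Here (height-scaler 63.0) is 511.0, (height-scaler 87.0) is 0.0, and a tree halfway between, at 75.0, lands at 255.5, in the middle of the canvas.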
Revise the plotting procedure from step 5 so that it applies the appropriate scaling procedure to each datum to get the correct position on the canvas.
Next problem: The dots are floating in space. To give the viewer some idea of what values they stand for, we need coordinate axes, with some points along them labeled.
Well, the axes themselves are easy: They are just lines, one vertical and one horizontal, that meet at a point near the lower left corner of the canvas.
Draw in a pair of coordinate axes, thirty pixels in from the left and bottom edges of the canvas, stopping thirty pixels from the top and right edges.
Revise your calls to rescaler so that the data are transformed
into coordinates in the region above and to the right of the axes.
We now have the problem of figuring out where to put tick marks and labels. For maximum legibility, we'd like to have somewhere between eight and twenty tick marks on each axis, and perhaps half as many labels. Moreover, the labels should be successive multiples of "round" numbers, like 10, 20, or 50, not of arbitrary values like 38.742 or 114. But how can we find appropriate round numbers automatically?
If we know the minimum and maximum values in the data set, we can subtract the former from the latter to get the size of the range. Suppose we divide this size by four, so that the range is divided into four equal parts. The quotient is probably not a round number; however, if we take the greatest round number not exceeding the quotient (where a round number is defined as 1, 2, or 5 times some power of 10), then there will be somewhere between four and ten multiples of that round number in the range. For in the worst case, where the quotient is just a little bit less than 5 times some power of 10, so that the round number that we calculate is 2 times the same power of 10, we still will divide the range up into only two and a half times as many parts, so if there were four to begin with, we'll still have no more than ten.
So the problem reduces to defining a procedure
nearest-smaller-round-number that takes any positive real
number (e.g., one-fourth of the length of the range that we want to divide)
as its argument and returns the greatest round number not greater than the
argument.
We can figure out what the appropriate power of ten is by taking the
base-ten logarithm of the argument (recall that we wrote a procedure for
computing base-ten logarithms as exercise 8 of the lab on numbers), applying the floor
procedure to that logarithm to eliminate its fractional part, and raising
10 to the power of the floored logarithm. The desired round number is
either 5 times that power of ten, or 2 times that power of ten, or that
power of ten itself, whichever is the first that does not exceed the
original argument.
Define and test the nearest-smaller-round-number procedure.
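The recipe just described can be sketched as follows. One caution applies: a logarithm computed in floating point can land a hair below or above the true value (the base-ten logarithm of 1000 may come out as slightly less than 3), so this sketch checks one power of ten higher and one lower than the floored logarithm suggests. The helper log10 is a stand-in for the procedure from the lab on numbers, and the two extra guard clauses are an addition of this sketch, not part of the description above.

```scheme
;; log10: base-ten logarithm, computed from natural logarithms.
(define (log10 x)
  (/ (log x) (log 10)))

;; nearest-smaller-round-number: the greatest number of the form
;; 1, 2, or 5 times a power of ten that does not exceed x.
;; Precondition: x is a positive real number.
(define (nearest-smaller-round-number x)
  (let* ((exponent (inexact->exact (floor (log10 x))))
         (power (expt 10 exponent)))
    (cond ((<= (* 10 power) x) (* 10 power)) ; logarithm was rounded too low
          ((<= (* 5 power) x) (* 5 power))
          ((<= (* 2 power) x) (* 2 power))
          ((<= power x) power)
          (else (* 5 (/ power 10))))))       ; logarithm was rounded too high
```

Because the power of ten is computed exactly, the result is exact: 7 gives 5, 49.9 gives 20, 1000 gives 1000, and 0.3 gives the exact fraction 1/5. Apply exact->inexact to the result if a decimal is wanted.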
Define a procedure that takes two real numbers as arguments -- the lower and upper bounds of a range of data values -- and returns the greatest round number not greater than one-fourth of the size of the range. (This value will be the interval between the labels along one of the axes, in the same units in which the data values are expressed. The interval between the tick marks will be half of it.)
Define a procedure that takes two real numbers as arguments -- the lower bound of a range of data values, and the interval between tick marks -- and returns the least multiple of the latter that is not less than the former. (The result indicates the value that the first tick mark will represent.)
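The least multiple of the interval that is not less than the lower bound can be found by dividing, rounding up, and multiplying back. A minimal sketch, with the hypothetical name first-tick and assuming a positive interval:

```scheme
;; first-tick: the least multiple of interval that is greater than or
;; equal to lower-bound.
;; Precondition: interval is positive.
(define (first-tick lower-bound interval)
  (* interval (ceiling (/ lower-bound interval))))
```

For example, (first-tick 284.7 50) is 300.0, since 284.7 divided by 50 is 5.694, which rounds up to 6. When the lower bound is already a multiple of the interval, it is returned unchanged: (first-tick 300 50) is 300.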
Define a procedure that takes three arguments -- the interval between the tick marks, and the lower and upper bounds of the data values represented by the horizontal axis -- and draws a line six pixels in length, running down from the horizontal axis, at each coordinate that results from applying the appropriate scaling procedure to some multiple of the tick-mark interval that is greater than or equal to the lower bound and less than or equal to the upper bound. For instance, if the interval between tick marks is 50, and the lower and upper bounds are 284.7 and 792.3, the tick marks should be placed at the coordinates obtained by applying the scaling procedure to 300, 350, 400, 450, 500, 550, 600, 650, 700, and 750.
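It may help to separate the arithmetic from the drawing: first compute the data values at which tick marks belong, then scale each one and draw a short line there. The sketch below covers only the first half. The name tick-values is hypothetical, and the interval is assumed to be positive. Each value is computed as a whole-number multiple of the interval rather than by repeated addition, so that rounding errors do not accumulate.

```scheme
;; tick-values: the list of multiples of interval lying between
;; lower-bound and upper-bound, inclusive, in increasing order.
;; Precondition: interval is positive.
(define (tick-values interval lower-bound upper-bound)
  (let loop ((k (ceiling (/ lower-bound interval))))
    (let ((value (* k interval)))
      (if (> value upper-bound)
          '()
          (cons value (loop (+ k 1)))))))
```

With the figures from the example, (tick-values 50 284.7 792.3) yields the ten values from 300.0 through 750.0. The drawing procedure can then traverse this list, applying the scaling procedure to each value to find where the tick mark goes.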
Define the analogous procedure for drawing six-pixel tick marks leftwards from the vertical axis.
The remaining problem is to print the labels at the free ends of alternate
tick marks. We can compute their numerical values as in steps 11 and 12,
convert them to strings with number->string, and
send draw-text messages to the device context to print each
one. The tricky part is computing the upper left-hand corner of the
region into which each label will be printed.
If we know what font we want to use, we can compute the exact dimensions of
that region by sending a get-text-extent message to the device
context, giving it the string we want to print and the font we want to
print it in as arguments. This message returns four values, of which the
first two are the ones we're interested in: they are the width and height
of the region that the text will fill, measured in pixels.
(If you use the default font, incidentally, you can easily do the computation yourself -- the width of the text region is 7 times the length of the string, and its height is always 13.)
Define a procedure that takes three arguments -- the interval between the labels, and the lower and upper bounds of the data values represented by the horizontal axis -- and draws the string numeral for each coordinate that results from applying the appropriate scaling procedure to some multiple of the label interval that is greater than or equal to the lower bound and less than or equal to the upper bound, centered below the tick mark at the same coordinate, with a gap of three pixels between the lower end of the tick mark and the top of the label text. For instance, if the interval between labels is 100, and the lower and upper bounds are 284.7 and 792.3, the labels should be placed just below the tick marks for 300, 400, 500, 600, and 700.
Define the analogous procedure for printing labels on the vertical axis, each one centered vertically on the corresponding tick mark and placed just to its left, with a three-pixel gap between the right edge of the label text and the left end of the tick mark.
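The corner computations for both axes are plain arithmetic once the text dimensions are known. The sketch below uses the default-font estimate mentioned earlier (7 pixels per character, 13 pixels high) in place of a get-text-extent message, and takes the position of the tick mark and of the axis as arguments. All of the names are invented for illustration, and each corner is returned as a two-element list of x and y.

```scheme
(define tick-length 6)          ; pixels, as in the tick-mark steps
(define label-gap 3)            ; pixels between tick mark and label
(define estimated-text-height 13)

;; estimated-text-width: width in pixels of label in the default font.
(define (estimated-text-width label)
  (* 7 (string-length label)))

;; horizontal-label-corner: upper left corner for a label centered below
;; the tick mark at x-coordinate tick-x on a horizontal axis at axis-y.
(define (horizontal-label-corner tick-x axis-y label)
  (list (- tick-x (/ (estimated-text-width label) 2.0))
        (+ axis-y tick-length label-gap)))

;; vertical-label-corner: upper left corner for a label centered
;; vertically on the tick mark at y-coordinate tick-y, to the left of a
;; vertical axis at axis-x.
(define (vertical-label-corner axis-x tick-y label)
  (list (- axis-x tick-length label-gap (estimated-text-width label))
        (- tick-y (/ estimated-text-height 2.0))))
```

For instance, (horizontal-label-corner 200 481 "300") is (189.5 490), and (vertical-label-corner 30 100 "70") is (7 93.5). With a non-default font, the two estimates would be replaced by the first two values returned by get-text-extent.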
This last step can cause problems if the data involve very large or very small numbers, since the labels may not fit in the thirty-pixel margin between the left edge of the canvas and the vertical axis. Ideally, we'd like to go back to step 6 and replace the hard-wired value 30 with, say, 15 plus the width in pixels of the longest label that we'll have to print.
Define a procedure that takes three arguments -- the interval between the labels, and the lower and upper bounds of the data values represented by the vertical axis -- and returns the width in pixels of the longest of the labels to be printed next to the vertical axis.
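A sketch of this computation, again assuming the default font so that the width of a label can be estimated as 7 pixels per character; with another font, the estimate would be replaced by a get-text-extent message to the device context. The names are hypothetical, and the interval is assumed to be positive.

```scheme
;; estimated-text-width: width in pixels of label in the default font.
(define (estimated-text-width label)
  (* 7 (string-length label)))

;; longest-label-width: the greatest estimated width, in pixels, among
;; the labels for the multiples of interval lying between lower-bound
;; and upper-bound, inclusive; 0 if there are no such multiples.
;; Precondition: interval is positive.
(define (longest-label-width interval lower-bound upper-bound)
  (let loop ((k (ceiling (/ lower-bound interval)))
             (widest 0))
    (let ((value (* k interval)))
      (if (> value upper-bound)
          widest
          (loop (+ k 1)
                (max widest
                     (estimated-text-width (number->string value))))))))
```

For example, (longest-label-width 10 95 125) is 21, because the labels are 100, 110, and 120, each three characters long. Note that number->string applied to an inexact number includes the decimal point, so a label such as 300.0 counts as five characters.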
Rewrite the code you wrote in step 8 to accommodate the labels for the vertical axis.
If the data values are extremely large or small, there could be problems with the labels on the horizontal axis as well -- they might overlap, and they might overrun either the left edge or the right edge of the canvas. Is there any easy way to deal with this problem? If so, implement it; if not, document it, and try to formulate the precondition that it imposes on the data.
If the vertical-axis labels turn out to be very long, we'd like to use a wider canvas, so that we can just shift the region in which the points are plotted rightwards instead of shrinking the horizontal dimension to accommodate the labels. So when we set the dimensions of the canvas itself, we might set its height at a fixed value such as 480 or 512, but make its width dependent on the width of the longest label -- say, 466 plus the width of the longest label.
However, the timing is a little tricky. To find out the width of the longest label (unless we're using the default font) we have to send a message to the device context, which means we must already have created the canvas that supplies the device context, which in turn means that we must already have created the frame that contains that canvas. Moreover, we must already know what the strings that we want to use as labels are, which means that we must already have read in the data, determined the minimum and maximum of the tree heights, and calculated the interval between labels. From this information, we can compute the numeric values of the labels, convert them to strings, find out the widths of the strings in pixels, and keep track of the maximum of those widths. Finally, we can use that result to compute and set the width of the canvas.
This involves some rearrangement of the steps listed above and some additions to the computation. Document carefully, giving the rationale for any subtle decision that you make in the design or implementation of the program.
Once all the steps are in place, however, it requires only a short series
of procedure calls and send-expressions to draw and label
the axes and plot the points.
Complete the program by appropriately arranging the code you wrote in the
previous steps and adding some procedure calls and
send-expressions.
This document is available on the World Wide Web as
http://www.cs.grinnell.edu/~stone/courses/scheme/plotting.xhtml
created May 8, 2000
last revised May 9, 2000