Displaying data

Summary: In this lab, you will have the opportunity to explore some of the visualizations available through DrRacket’s plot package along with our data sets.

Preparation

a. Do the traditional lab preparation. That is,

  • Start DrRacket.
  • Check for updates to the csc151 package.
  • Require the csc151 package with (require csc151).

b. Also require the plot package with (require plot).

c. Load the list of cities arranged by zip codes.

(define zips (read-csv-file "/home/username/Desktop/us-zip-codes.csv"))

d. If you haven’t done so, save a copy of the Project Gutenberg version of Jane Eyre on your desktop.

e. Add the following undocumented procedures to your definitions pane.

(define zip-ends-with
  (lambda (city three-char-suffix)
    (string=? (substring (car city) 2) three-char-suffix)))
(define zip-starts-with
  (lambda (city three-char-prefix)
    (string=? (substring (car city) 0 3) three-char-prefix)))

f. Create four different small subsets of the zips data using filter, zip-ends-with, and zip-starts-with.

(define zips1
  (filter (section zip-ends-with <> "021") zips))
(define zips2
  (filter (section zip-ends-with <> "606") zips))
(define zips3
  (filter (section zip-starts-with <> "021") zips))
(define zips4
  (filter (section zip-starts-with <> "606") zips))
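To see what these filters select, here is a minimal sketch on a tiny made-up list. The entries in sample-zips are hypothetical (only the zip, latitude, and longitude positions follow the procedures in this lab; the remaining fields are invented), and a plain lambda stands in for section, on the assumption that (section zip-ends-with <> "021") behaves like (lambda (city) (zip-ends-with city "021")).

```racket
#lang racket

;; Same definitions as in step e.
(define zip-ends-with
  (lambda (city three-char-suffix)
    (string=? (substring (car city) 2) three-char-suffix)))
(define zip-starts-with
  (lambda (city three-char-prefix)
    (string=? (substring (car city) 0 3) three-char-prefix)))

;; Hypothetical entries, not real rows from the data file.
(define sample-zips
  (list (list "50021" 41.7 -93.6 "Sample A" "IA")
        (list "02139" 42.4 -71.1 "Sample B" "MA")
        (list "60601" 41.9 -87.6 "Sample C" "IL")))

;; A lambda stands in for (section zip-ends-with <> "021").
(define ends-021
  (filter (lambda (city) (zip-ends-with city "021")) sample-zips))
(define starts-021
  (filter (lambda (city) (zip-starts-with city "021")) sample-zips))

(map car ends-021)   ; zip codes whose last three characters are "021"
(map car starts-021) ; zip codes whose first three characters are "021"
```

Note that zip-ends-with takes everything from position 2 onward, so it only yields a three-character suffix when the zip code has exactly five characters.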

g. Add the following undocumented procedure to your definitions pane.

(define useful-entry?
  (lambda (entry)
    (and (real? (cadr entry))
         (real? (caddr entry)))))

h. Explain to yourself why useful-entry? is likely to be useful.
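If you want to check your explanation, you might try useful-entry? on a couple of entries. The two entries below are hypothetical, and the idea that a missing latitude or longitude shows up as an empty string is an assumption; look at the actual data to see how missing values appear.

```racket
#lang racket

;; Same definition as in step g.
(define useful-entry?
  (lambda (entry)
    (and (real? (cadr entry))
         (real? (caddr entry)))))

;; Hypothetical entries: the second and third fields hold latitude and longitude.
(define complete-entry (list "50112" 41.7 -92.7 "Sample City" "IA"))
;; Assumption: a missing value appears as an empty string.
(define incomplete-entry (list "00000" "" "" "Sample City" "XX"))

(useful-entry? complete-entry)   ; both fields are real numbers
(useful-entry? incomplete-entry) ; neither field is a real number
```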

Exercises

Exercise 1: Plotting cities

a. Using filter, write an expression that selects only the elements of zips1 that contain a latitude and longitude.

> (define valid1 (filter ... zips1))

b. Using map1, extract only the latitude and longitude from that list. (You may want to write a separate helper that extracts a latitude and longitude from a single entry.)

> (define lat-long-1 (map1 ... valid1))

c. Using plot and points, display the points.

> (plot (points ...))

d. Repeat those steps with zips2.

Since latitude and longitude are angles, rather than x and y coordinates, this approach is imperfect. But it will suffice for our experiments.

Exercise 2: Plotting cities, revisited

a. Write an expression or series of expressions that plots the first two sets of points, using one color for the valid entries in zips1 and another for the valid entries in zips2.

b. Do you expect to see something similar or different for the entries in zips3 or zips4?

c. Check your answer experimentally. Then discuss with your partner any differences you see.

Exercise 3: Plotting cities, re-revisited

a. Write an expression or expressions to plot the cities in zips1 so that those north of latitude 39.72 are one color and those south of it are another color. For example, those north of 39.72 might be blue and those south of 39.72 might be gray.

b. Write an expression or expressions to plot the cities in zips1 and zips3 using four colors: one for zips1 north of 39.72, one for zips1 south of 39.72, one for zips3 north of 39.72, and one for zips3 south of 39.72.

Exercise 4: Detour: Exploring colors

Here’s a simple expression to plot some points.

> (plot (list (points (list (list 0 0) (list 10 10) (list 3 5) (list 1 4))
                      #:fill-color "red"
                      #:sym 'fullcircle6)
              (points (list (list 5 10) (list 6 9) (list 8 7))
                      #:fill-color "black"
                      #:sym 'fullcircle6)
              (points (list (list 1 1) (list 2 3) (list 3 5))
                      #:fill-color "blue"
                      #:sym 'fullcircle6)))

In addition to color names, DrRacket’s plot package lets you specify colors as RGB triplets, that is, lists of three integers, as in #:fill-color (list 200 10 180).

Experiment with a few triplets to find five or so colors you find useful as a set.
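One way to organize such an experiment is to keep the candidate triplets in a list and build one points renderer per color. The sketch below is only an illustration: the names palette and point-groups, the three triplets, and the points themselves are all made up.

```racket
#lang racket
(require plot)

;; Hypothetical palette: RGB triplets chosen only for illustration.
(define palette
  (list (list 200 10 180)
        (list 20 120 200)
        (list 240 160 0)))

;; One small group of made-up points per color.
(define point-groups
  (list (list (list 0 0) (list 1 4) (list 3 5))
        (list (list 5 10) (list 6 9) (list 8 7))
        (list (list 1 1) (list 2 3) (list 10 10))))

;; Pair each group of points with its color and plot all groups together.
(plot (map (lambda (pts color)
             (points pts #:fill-color color #:sym 'fullcircle6))
           point-groups
           palette))
```

Changing a triplet in palette and rerunning lets you compare colors side by side without touching the rest of the expression.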

Exercise 5: Categorical data

In a recent lab, you wrote a procedure something like the following.

(define categorize
  (lambda (city)
    (cond
      [(not (useful-entry? city))
       "Unknown"]
      [(> (cadr city) 39.72)
       "North"]
      [(< (cadr city) 39.72)
       "South"]
      [else
       "Other"])))

a. Using tally-all, map1, and categorize, create summary information for zips1. Here’s one possible output.

> (.... zips1)
'(("North" 27) ("Unknown" 1) ("South" 31))

b. Using plot and discrete-histogram, make a histogram of these values.

c. Repeat your work for zips3.

d. Repeat your work for zips.

e. Given those results, how representative do you feel your sample data are?
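As a reminder of the shape of data that discrete-histogram accepts, here is a small sketch with made-up tallies unrelated to the zip data. The name sample-tallies and its values are hypothetical; each element is a two-element list of a label and a count, the same shape as the sample output above.

```racket
#lang racket
(require plot)

;; Made-up tallies in (label count) form.
(define sample-tallies
  (list (list "Red" 4) (list "Green" 7) (list "Blue" 2)))

;; Draw one bar per label, with height given by the count.
(plot (discrete-histogram sample-tallies))
```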

Exercise 6: Tallying different types

a. Write a procedure, tally-alphabetic, that, given a list of characters, determines how many are alphabetic.

> (tally-alphabetic (list #\a #\b #\3 #\d))
3
> (tally-alphabetic (string->list "a and b3 & q4"))
6

Hint: One approach is to filter the alphabetic characters and then find out how long the list is.

Hint: char-alphabetic? is a built-in Scheme procedure.

b. Write a procedure, tally-digits, that, given a list of characters, determines how many are digits.

> (tally-digits (list #\a #\b #\3 #\d))
1
> (tally-digits (string->list "a and b3 & q4"))
2

Hint: char-numeric? is a built-in Scheme procedure.

c. Write a procedure, tally-whitespace, that, given a list of characters, determines how many are whitespace.

> (tally-whitespace (string->list "a and b3 & q4"))
4

Hint: char-whitespace? is a built-in Scheme procedure.

d. Write a procedure, tally-other, that, given a list of characters, determines how many are neither alphabetic, nor digits, nor whitespace.

> (tally-other (string->list "a and b3 & q4"))
1

e. Write a procedure, char-tallies, that, given a string, produces a list of four numbers corresponding to the four numbers above.

> (char-tallies "a and b3 & q4")
'(6 2 4 1)
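The filter-then-count approach from the first hint can be sketched with a different predicate. The procedure tally-uppercase below is a hypothetical example, not one of the procedures you are asked to write; it counts upper-case letters using the built-in char-upper-case?.

```racket
#lang racket

;; Keep only the upper-case characters, then count how many remain.
(define tally-uppercase
  (lambda (chars)
    (length (filter char-upper-case? chars))))

(tally-uppercase (string->list "Hello World, ABC")) ; counts H, W, A, B, C
```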

Exercise 7: Visualizing tallies

a. Write a procedure, explore-strings, that takes a list of strings as input and produces a stacked histogram of the distribution of characters in the strings using char-tallies.

(define explore-strings
  (lambda (strings)
    (plot (stacked-histogram (map1 (lambda (str) (cons "" ...)) 
                                   strings)))))

b. Run explore-strings on a few sample inputs.

> (explore-strings 
   (list 
    "Now is the time for all good men to come to the aid of their country."
    "A 1 and a 2 and a 3 and ...."
    "'Twas brillig and the slithy toves; did gyre and gimble in the wabe."))

c. Run explore-strings on lines 100-110 of Jane Eyre.

Exercise 8: Exploring strings, revisited

Arrange for the histogram you created in the previous exercise to have an appropriate legend, title, and other labels.

For those with extra time

Extra 1: Other groupings

We created zips1 and zips2 by selecting the entries whose zip codes end with the same three digits.

We created two other lists, zips3 and zips4, by selecting the entries whose zip codes start with the same three digits, using “021” and “606” as the leading digits.

a. What do you expect to happen when we plot the four sets of data?

b. Check your answer experimentally.

Extra 2: Side-by-side histograms

Skim the DrRacket documentation on histograms.

Using the ideas contained therein, show the north-south histograms of zips1, zips2, zips3, and zips4 in one diagram that makes it easier for the reader to understand how they relate.

Extra 3: Side-by-side histograms, revisited

a. What do you expect to happen if we add zips to the solution above?

b. Check your answer experimentally.

c. You should observe that the large list of zips so dominates that the others become almost invisible. How might you solve this problem?

d. Discuss your answer with a teacher or mentor.

e. Implement your solution (or the one your teacher or mentor suggests).