Introduction to Statistics (MAT/SST 115.03 2008S)

R Notes for Topic 10: More Summary Measures and Graphs


R notes for Activity 10-3

We'll be learning a bit more of R in this exercise. In particular, we'll think about ways to create new columns in a data frame and revisit ways to select rows from a data frame.

You can read in the data for this exercise with

ICC = read.csv("/home/rebelsky/Stats115/Data/IceCreamCalories.csv")

You can see the first and last few lines of the data set with

head(ICC)
tail(ICC)

As those lines suggest, there are six columns in the table: BenAndJerrys, BJcal, ColdStoneCreamery, CScal, Dreyers, and Dcal. You may also note that there are a lot of NA values at the end of the table. That's because the columns have different lengths, and R pads the shorter ones with NA so that every column has the same number of rows.
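
Those NA values matter once you start computing with a column. Here is a small sketch of what the padding does, using a made-up three-row table (the name toy and the numbers are hypothetical, not taken from IceCreamCalories.csv).

```r
# Hypothetical miniature of the calorie table: three columns of unequal
# length, padded with NA the way the real table appears to be.
toy = data.frame(
  BJcal = c(250, 270, 300),
  CScal = c(330, 390, NA),
  Dcal  = c(120, NA, NA)
)

# Count the real (non-missing) values in each column.
counts = colSums(!is.na(toy))
print(counts)

# mean() returns NA if any value is missing, unless we ask it to skip them.
mean_with_na    = mean(toy$CScal)
mean_without_na = mean(toy$CScal, na.rm=TRUE)
print(mean_with_na)
print(mean_without_na)
```

The summary and boxplot commands used below skip NA values on their own, so the padding should not get in the way of this activity.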

Exercise a: Five-number Summaries

Recall that you can get a five-number summary using the summary function. (R also reports the mean, so you will see six numbers rather than five.)

summary(ICC$BJcal)
summary(ICC$CScal)
summary(ICC$Dcal)

It is also possible to get all the summaries at once by just asking for a summary of the frame.

summary(ICC)
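
If you want exactly five numbers, base R also has a fivenum function. The sketch below compares the two on a made-up vector (the values are hypothetical). Note that fivenum reports hinges, which can differ slightly from the quartiles that summary reports on other data.

```r
# Hypothetical calorie counts, just for illustration.
x = c(110, 150, 160, 200, 230, 250, 310)

# fivenum gives minimum, lower hinge, median, upper hinge, maximum.
five = fivenum(x)
print(five)

# summary gives the same kind of information, plus the mean.
six = summary(x)
print(six)
```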

b. Boxplots

As you might hope, you make box plots in R with the boxplot command. You make the simplest box plots from vectors.

boxplot(ICC$BJcal)

Surprisingly, R likes to make vertical box plots, rather than the horizontal box plots that most of us like. To get horizontal box plots, add horizontal=T to the command.

boxplot(ICC$BJcal, horizontal=T)

Of course, we often like to stack box plots on top of each other to compare variables (as this problem requests). If the variables are already in a frame, we can just use those columns of the frame.

boxplot(ICC[,c(2,4,6)], horizontal=T)

f. Converting Calories

This exercise asks you to create a new column in the table. You can create a new column in a table by referring to it. If the column is based on other columns, you use the appropriate formula to compute it. For example, suppose we call the new column CScal2:

ICC$CScal2 = ICC$CScal / 170 * 146 / 2
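
To see what that kind of assignment does, here is a small sketch on a made-up table (toy and its values are hypothetical). The arithmetic is applied to each row separately, and rows that are NA stay NA.

```r
# Hypothetical stand-in for the CScal column.
toy = data.frame(CScal = c(340, 170, NA))

# Same arithmetic as above, applied to every row at once.
toy$CScal2 = toy$CScal / 170 * 146 / 2

# The table now has two columns; the NA row is still NA.
print(toy)
```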

h. Comparing Calories, Revisited

Recall that we made a box plot from columns 2, 4, and 6 of the frame with

boxplot(ICC[,c(2,4,6)], horizontal=T)

For the new box plot, you want columns 2, 7, and 6. (Or, if you want to keep the old data, 2, 4, 7, and 6.)
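
Counting column positions gets error-prone once you start adding columns, so you may prefer to select columns by name. Here is a sketch on a made-up table (toy, its flavors, and its values are hypothetical, and the division by 2 is only a stand-in for the real conversion).

```r
# Hypothetical miniature of the ice cream table.
toy = data.frame(
  BenAndJerrys = c("Flavor A", "Flavor B", "Flavor C"),
  BJcal = c(250, 270, 300),
  CScal = c(330, 390, NA),
  Dcal  = c(120, 150, 140)
)

# Stand-in for the converted column; it is added at the end of the table.
toy$CScal2 = toy$CScal / 2

# Select columns by name, in the order we want them plotted.
picked = toy[, c("BJcal", "CScal2", "Dcal")]
boxplot(picked, horizontal=TRUE)
```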

R notes for Activity 10-4: Fan Cost Index

We'll be learning a bit more of R in this exercise. In particular, we'll think about ways to create new columns in a data frame and revisit ways to select rows from a data frame.

Getting Started

To read in the initial Fan Cost Index table, use

FanCost = read.csv("/home/rebelsky/Stats115/Data/FanCost06.csv")

You can get a look at the first few lines of the table with

head(FanCost)

As that summary suggests, the columns are Team, Adult, Child, Parking, Program, Cap, Beer, Beer.Oz, Soda, Soda.Oz, and HotDog.

a. Computing FCI

This exercise asks you to create a new column in the table. You can create a new column in a table by referring to it. If the column is based on other columns, you use the appropriate formula to compute it. For example, if FCI were based on one adult ticket and two children's tickets, you would write

FanCost$FCI = FanCost$Adult + 2*FanCost$Child

You can see that the new column is added with head.

It is, of course, left to the reader to figure out what formula to use for the FCI reported in the book. (If you can't figure it out, it is reproduced at the end of this section.)

Next, this exercise asks you to find the teams with the highest and lowest fan cost. You should start by finding out those values.

max(FanCost$FCI)
min(FanCost$FCI)

Now, how do we figure out which teams correspond to those numbers? We could look at the whole table (just type FanCost). We could also look at just the two columns we care about.

FanCost[,c(1,12)]

or

data.frame(City=FanCost$Team,FCI=FanCost$FCI)

However, our best bet is to get R to search for us.

FanCost[FanCost$FCI == max(FanCost$FCI), ]
FanCost[FanCost$FCI == min(FanCost$FCI), ]
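
There are other ways to ask the same question. The sketch below uses which.max, which.min, and order on a made-up table (toy, the team names, and the values are hypothetical). One difference to keep in mind: which.max and which.min report only the first match, while the == approach above returns every row that ties for the extreme value.

```r
# Hypothetical stand-in for the FanCost table.
toy = data.frame(
  Team = c("Team A", "Team B", "Team C"),
  FCI  = c(150.5, 210.25, 120),
  stringsAsFactors = FALSE
)

# Row number of the first largest and first smallest FCI.
highest = toy$Team[which.max(toy$FCI)]
lowest  = toy$Team[which.min(toy$FCI)]
print(highest)
print(lowest)

# order() sorts the whole table by FCI, so both ends are visible at once.
sorted = toy[order(toy$FCI), ]
print(sorted)
```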

Exercise b: Plotting

I'm not sure why the book asks for a dotplot, rather than a boxplot, but, hey, we'll do both.

library(BHH2, lib.loc="/home/rebelsky/Stats115/Packages")
dotPlot(FanCost$FCI)
X11()
boxplot(FanCost$FCI, horizontal=T)

Exercise h: Price Per Ounce

Just in case it wasn't clear, you can compute price per ounce with the same technique you used to compute FCI.

FanCost$SodaPPO = FanCost$Soda/FanCost$Soda.Oz
FanCost[FanCost$SodaPPO==max(FanCost$SodaPPO),]
FanCost[FanCost$SodaPPO==min(FanCost$SodaPPO),]

Exercise a, revisited

If you could not figure out the formula for FCI, here it is.

FanCost$FCI = 2*FanCost$Adult + 2*FanCost$Child + FanCost$Parking +
  2*FanCost$Program + 2*FanCost$Cap + 
  2*FanCost$Beer + 4*FanCost$Soda + 2*FanCost$HotDog

R notes for Activity 10-5: Digital Cameras

As always, we start by reading in some values.

DC = read.csv("/home/rebelsky/Stats115/Data/DigitalCameras.csv")
head(DC)

As that summary suggests, the five columns in the table are Brand, Model, Type, Price, and Score. Now, the book asks us to separate them by type. Let's check what kinds of types there are.

summary(DC$Type)

Well, it looks like the four types are advanced compact, compact, subcompact, and super-zoom. Each is probably represented as a string.
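
One caution: in R 4.0.0 and later, read.csv leaves text columns as plain strings by default, and summary of a string column reports only its length, class, and mode. The table function counts the values either way. Here is a sketch on a made-up vector of types (the vector is hypothetical, not the real camera data).

```r
# Hypothetical Type column, stored as plain strings.
types = c("compact", "subcompact", "compact", "super-zoom",
          "advanced compact", "compact")

# table() counts how many times each distinct value appears.
counts = table(types)
print(counts)

# unique() lists the distinct values without counting them.
kinds = unique(types)
print(kinds)
```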

a. Segmenting and Summarizing

This problem asks us to summarize the data by type of camera. In order to get summaries, we need to break apart the data according to type. Let's start by creating a vector of prices for each of the four kinds of camera. Note that the price is in column 4, so we can use a selector to get the appropriate rows and then just take column 4. From that vector, we compute the six-number summary.

summary(DC[DC$Type=="advanced compact", 4])
summary(DC[DC$Type=="compact", 4])
summary(DC[DC$Type=="subcompact", 4])
summary(DC[DC$Type=="super-zoom", 4])

In addition to those summaries, we might make boxplots. If we're going to do aligned boxplots, we need a way to join those boxplots together. Alternatively, we can look for a command to draw multiple boxplots, stacked on top of each other. Fortunately, the split operation comes into play here.

boxplot(split(DC$Price,DC$Type), horizontal=T)
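
The same split can also produce the per-type summaries in one step, rather than one summary command per type. Here is a sketch on a made-up table (toy and its prices are hypothetical); tapply is a base R function that applies a function to each group.

```r
# Hypothetical miniature of the camera table.
toy = data.frame(
  Type  = c("compact", "compact", "subcompact", "subcompact", "subcompact"),
  Price = c(200, 300, 150, 260, 350)
)

# split() returns a list with one vector of prices per type.
groups = split(toy$Price, toy$Type)

# Summarize every group at once.
print(lapply(groups, summary))

# tapply() does the split and the computation in a single call.
medians = tapply(toy$Price, toy$Type, median)
print(medians)
```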

Exercise d: Comparing Ratings

You may recall that we recently split prices by type. Here, we want to split the Score column. The particulars of that command are left as an exercise for the reader.

Creative Commons License

Samuel A. Rebelsky, rebelsky@grinnell.edu

Copyright (c) 2007-8 Samuel A. Rebelsky.

This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.