Introduction to Statistics (MAT/SST 115.03 2008S)

R Notes for Topic 22: Comparing Two Means


R notes for Activity 22-1: Close Friends

22-1 i. Computing the test statistic

You can use R to compute the test statistic. Recall that this test statistic has the form

(x1bar - x2bar)/sqrt(s1*s1/n1 + s2*s2/n2)

You can fill in the values.
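If you'd rather not retype the formula each time, here is a minimal sketch that wraps it in a helper. The name two_sample_t is made up for this illustration (it is not a built-in R function), and the summary values are the ones used in the p-value computation later in these notes.

```r
# Hypothetical helper (not part of base R): two-sample t statistic
# computed from summary statistics, without pooling the variances.
two_sample_t <- function(xbar1, s1, n1, xbar2, s2, n2) {
  (xbar1 - xbar2) / sqrt(s1^2 / n1 + s2^2 / n2)
}

# Summary values taken from the p-value computation in these notes.
t_stat <- two_sample_t(1.861, 1.777, 654, 2.089, 1.760, 813)
print(t_stat)
```

Writing s1^2 rather than s1*s1 is only a matter of taste; both give the same value.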

22-1 o. Computing a confidence interval

Okay, you need to compute

(xfbar-xmbar) +/- tkstar*sqrt(sf*sf/nf + sm*sm/nm)

I'll leave you to fill in the details.
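If you want to check your work, here is one way the computation might look in R, using qt to get the critical value. This is a sketch under some assumptions: the 95% confidence level is my choice for illustration, the degrees of freedom are the conservative "smaller sample size minus one", and the two groups are simply numbered in the order their summary values appear in the p-value computation in these notes.

```r
# Summary statistics, in the order used elsewhere in these notes.
xbar1 <- 1.861; s1 <- 1.777; n1 <- 654
xbar2 <- 2.089; s2 <- 1.760; n2 <- 813

conf_level <- 0.95                 # assumed confidence level
deg_free <- min(n1, n2) - 1        # conservative degrees of freedom
t_star <- qt(1 - (1 - conf_level) / 2, df = deg_free)

std_err <- sqrt(s1^2 / n1 + s2^2 / n2)
margin <- t_star * std_err
ci <- c(lower = (xbar1 - xbar2) - margin,
        upper = (xbar1 - xbar2) + margin)
print(ci)
```

Swapping the order of the two groups flips the signs of both endpoints but does not change the width of the interval.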

22-1 p. Computing the p-value

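As the book suggests, you can use the applet. You can also use R's pt function to compute the p-value. pt gives the area to the left of its argument. Since our test statistic is negative and it's a two-sided test, we should double the computed value. The degrees of freedom, 653, are one less than the smaller sample size.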

t = (1.861-2.089)/sqrt(1.777*1.777/654 + 1.760*1.760/813)
2*pt(t, df=653)
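Doubling pt(t, ...) only works when t is negative. A slightly more defensive sketch, which gives the same answer no matter which group you subtract first, takes the absolute value before looking up the tail area. The variable names here are just for illustration.

```r
# Test statistic from the summary values used above.
t_stat <- (1.861 - 2.089) / sqrt(1.777^2 / 654 + 1.760^2 / 813)

# Two-sided p-value: twice the tail area beyond |t|.
# Using -abs() gives the same answer whichever group is subtracted first.
p_two_sided <- 2 * pt(-abs(t_stat), df = 653)
p_reversed  <- 2 * pt(-abs(-t_stat), df = 653)
print(p_two_sided)
```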

R notes for Activity 22-2: Hypothetical Commuting Times

This is one of those fun times in which our data set combines a number of essentially independent columns into a single data frame. Since R pads the empty cells in the data frame with NA values, our analyses may be slightly more complicated.

Let's start by loading the data. There's little enough data that we can look at all of it.

CommuteTimes = read.csv("/home/rebelsky/Stats115/Data/HypoCommute.csv")
CommuteTimes

The columns are named A1 (for Alex's Route 1), A2 (for Alex's Route 2), B1 (for Barb's Route 1), and so on.

22-2 c. Computing Alex's route statistics

You should be able to read the sample size from the table. To get the sample mean and standard deviation, we can use mean and sd, but we need to tell the functions to ignore the NA values. (Having to tell the functions to deal with the NA values differently is one of the disadvantages of combining the columns.)

mean(CommuteTimes$A1, na.rm=TRUE)
sd(CommuteTimes$A1, na.rm=TRUE)
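To see why na.rm matters, here is a small sketch with a made-up data frame (the column names R1 and R2 and all the numbers are invented, not taken from HypoCommute.csv). It also shows one way to count the sample size instead of reading it off the table.

```r
# Made-up data: two columns of different lengths, padded with NA
# the way a combined data frame would be.
toy <- data.frame(R1 = c(28, 30, 32, 29, 31),
                  R2 = c(25, 35, 30, NA, NA))

mean(toy$R2)                   # NA, because of the padding
mean(toy$R2, na.rm = TRUE)     # mean of the three observed values
sd(toy$R2, na.rm = TRUE)       # standard deviation of the observed values
n_R2 <- sum(!is.na(toy$R2))    # sample size: count the non-NA entries
print(n_R2)
```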

22-2 d. Conducting the significance test

R makes two-sample t-tests very easy to compute. Just call t.test with the two samples. Unlike mean and sd, t.test drops the NA values on its own.

t.test(CommuteTimes$A1,CommuteTimes$A2)
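If you're curious how the t.test output relates to the formula from Activity 22-1, the sketch below compares the two on made-up samples (the numbers are invented for illustration). By default t.test does not pool the variances, so its statistic matches the formula, but it uses an approximation for the degrees of freedom rather than the conservative "smaller sample size minus one", so its p-value can differ a little from a hand computation.

```r
# Made-up samples of unequal size, padded with NA as in a combined data frame.
x <- c(28, 30, 32, 29, 31)
y <- c(24, 35, 28, NA, NA)

result <- t.test(x, y)         # unpooled (Welch) two-sample test by default
result$statistic               # the t statistic
result$parameter               # approximate degrees of freedom
result$p.value                 # two-sided p-value

# The same statistic by hand, after dropping the NA values.
y_obs <- y[!is.na(y)]
t_manual <- (mean(x) - mean(y_obs)) /
  sqrt(var(x) / length(x) + var(y_obs) / length(y_obs))
print(t_manual)
```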

22-2 f. Confidence intervals

We repeat the t-test, telling it to use a different confidence level.

t.test(CommuteTimes$A1,CommuteTimes$A2, conf.level=.90)
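If you want only the interval rather than the whole printout, you can pull it out of the result with $conf.int. Here is a sketch on made-up samples (the numbers are invented) that also confirms the 90% interval is narrower than the default 95% one.

```r
# Made-up samples, just for illustration.
x <- c(28, 30, 32, 29, 31)
y <- c(24, 35, 28, 33, 26, 30)

ci95 <- t.test(x, y)$conf.int                      # default conf.level = 0.95
ci90 <- t.test(x, y, conf.level = 0.90)$conf.int   # narrower interval
print(ci95)
print(ci90)
attr(ci90, "conf.level")                           # level stored with the interval
```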

22-2 k. More Computations

You should be able to figure out how to do these computations by revisiting the Alex examples from above.

R notes for Activity 22-3

Since we ended up with very different sample sizes (I'm not sure why), I've put the data into two files, ConvenientSequence.csv and InconvenientSequence.csv.

Convenient = read.csv("/home/rebelsky/Stats115/Data/ConvenientSequence.csv")
Inconvenient = read.csv("/home/rebelsky/Stats115/Data/InconvenientSequence.csv")

22-3 e. Visual displays

Here are the commands you might use to build four windows with the four separate displays.

library(BHH2, lib.loc="/home/rebelsky/Stats115/Packages")
X11()
boxplot(Convenient,horizontal=TRUE)
X11()
boxplot(Inconvenient,horizontal=TRUE)
X11()
dotPlot(Convenient)
X11()
dotPlot(Inconvenient)

22-3 g. Test Statistics

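Are we doing a two-sided test or a one-sided test? If you're using a one-sided test, what is the direction of the test? Use your answer to choose one of the following commands. In t.test, alternative="greater" means that the mean of the first sample is greater than the mean of the second, and alternative="less" means the reverse.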

t.test(Convenient,Inconvenient)
t.test(Convenient,Inconvenient, alternative="greater")
t.test(Convenient,Inconvenient, alternative="less")
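To get a feel for how the three forms relate, here is a sketch on made-up samples (the vectors a and b are invented, not the class data). The two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of them.

```r
# Made-up samples; a tends to be larger than b.
a <- c(12, 15, 11, 14, 13, 16)
b <- c(10, 9, 13, 11, 8)

p_two     <- t.test(a, b)$p.value                           # two-sided
p_greater <- t.test(a, b, alternative = "greater")$p.value  # mean(a) > mean(b)
p_less    <- t.test(a, b, alternative = "less")$p.value     # mean(a) < mean(b)
print(c(two.sided = p_two, greater = p_greater, less = p_less))
```

If you pick the wrong direction, the p-value will be larger than 0.5, which is a handy sanity check.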

22-3 h. Confidence intervals

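Add conf.level=.90 to your previous answer to compute the confidence interval. Note that if your previous answer used a one-sided alternative, t.test reports a one-sided confidence bound; for the usual two-sided interval, use the two-sided form of the command.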
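Here is a small sketch, again with made-up numbers, of how the interval that t.test reports depends on the alternative. With alternative="greater" one endpoint is infinite; with the default two-sided alternative both endpoints are finite.

```r
# Made-up samples, just for illustration.
a <- c(12, 15, 11, 14, 13, 16)
b <- c(10, 9, 13, 11, 8)

one_sided <- t.test(a, b, alternative = "greater", conf.level = 0.90)$conf.int
two_sided <- t.test(a, b, conf.level = 0.90)$conf.int
print(one_sided)   # upper endpoint is Inf
print(two_sided)   # both endpoints are finite
```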

Creative Commons License

Samuel A. Rebelsky, rebelsky@grinnell.edu

Copyright (c) 2007-8 Samuel A. Rebelsky.

This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.