# R Notes for Topic 22: Comparing Two Means

## R notes for Activity 22-1: Close Friends

### 22-1 i. Computing the test statistic

You can use R to compute the test statistic. Recall that this test statistic has the form

```
(x1bar - x2bar)/sqrt(s1*s1/n1 + s2*s2/n2)
```

You can fill in the values.
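If you want to check your arithmetic, here is a minimal sketch that wraps the formula in a helper. The function name `two_sample_t` is made up for these notes and is not part of R. The sample call uses the summary values that appear in the part p computation in these notes.

```r
# Hypothetical helper (not built into R): two-sample t statistic
# computed from summary statistics rather than raw data.
two_sample_t <- function(x1bar, x2bar, s1, s2, n1, n2) {
  (x1bar - x2bar) / sqrt(s1^2 / n1 + s2^2 / n2)
}

# Summary values as they appear in the part p computation.
t_stat <- two_sample_t(1.861, 2.089, 1.777, 1.760, 654, 813)
t_stat
```

Writing `s1^2` instead of `s1*s1` is only a matter of taste; both give the same result.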

### 22-1 o. Computing a confidence interval

Okay, you need to compute

```
(xfbar - xmbar) +/- tkstar*sqrt(sf*sf/nf + sm*sm/nm)
```

I'll leave you to fill in the details.
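If you get stuck, the following sketch shows one way to carry out the computation. It makes several assumptions that you should adjust: the confidence level is set to 95%, the degrees of freedom use the conservative choice of one less than the smaller sample size, and the two groups are simply labeled 1 and 2 in the order used in the part p computation (matching them to `f` and `m` is up to you).

```r
# Summary statistics, labeled group 1 and group 2 in the order they
# appear in the part p computation.
x1bar <- 1.861; s1 <- 1.777; n1 <- 654
x2bar <- 2.089; s2 <- 1.760; n2 <- 813

conf_level <- 0.95        # assumed level; use whatever the activity asks for
k <- min(n1, n2) - 1      # conservative degrees of freedom

# Critical value: the t quantile that leaves (1 - conf_level)/2 in the upper tail.
tkstar <- qt(1 - (1 - conf_level) / 2, df = k)

# Standard error of the difference in sample means.
se <- sqrt(s1^2 / n1 + s2^2 / n2)

# Lower and upper endpoints of the interval.
ci <- (x1bar - x2bar) + c(-1, 1) * tkstar * se
ci
```

The `c(-1, 1)` trick stands in for the `+/-` in the formula, so `ci` holds both endpoints at once.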

### 22-1 p. Computing the p-value

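As the book suggests, you can use the applet. You can also use R's `pt` function, which returns the lower-tail probability for a given t value. The test statistic here is negative, so `pt` gives the area in the tail beyond it, and since the test is two-sided, we double that value. The degrees of freedom, 653, are one less than the smaller sample size.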

```
t = (1.861-2.089)/sqrt(1.777*1.777/654 + 1.760*1.760/813)
2*pt(t, df=653)
```
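Doubling `pt(t, ...)` only works when the test statistic is negative. As a sketch of a form that works for either sign, you can take the tail beyond `-abs(t)`. The variable names `t_stat`, `p_two_sided`, and `p_check` are just illustrative.

```r
t_stat <- (1.861 - 2.089) / sqrt(1.777^2 / 654 + 1.760^2 / 813)
k <- 653   # degrees of freedom used above

# pt() returns the lower-tail area P(T <= t). Using -abs(t_stat) always
# selects the tail beyond the observed statistic, whatever its sign.
p_two_sided <- 2 * pt(-abs(t_stat), df = k)

# The same quantity computed from the upper tail, as a cross-check.
p_check <- 2 * pt(abs(t_stat), df = k, lower.tail = FALSE)

p_two_sided
```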

## R notes for Activity 22-2: Hypothetical Commuting Times

This is one of those fun times in which our data set combines a number of essentially independent columns into a single data frame. Since R pads the empty cells in the data frame with `NA` values, our analyses may be slightly more complicated.

Let's start by loading the data. There's little enough data that we can look at all of it.

```
CommuteTimes = read.csv("/home/rebelsky/Stats115/Data/HypoCommute.csv")
CommuteTimes
```

The columns are named `A1` (for Alex's Route 1), `A2` (for Alex's Route 2), `B1` (for Barb's Route 1), and so on.

### 22-2 c. Computing Alex's route statistics

You should be able to read the sample size from the table. To get the sample mean and standard deviation, we can use `mean` and `sd`, but we need to tell the functions to ignore the `NA` values. (Having to tell the functions to deal with the `NA` values differently is one of the disadvantages of combining the columns.)

```
mean(CommuteTimes$A1, na.rm=TRUE)
sd(CommuteTimes$A1, na.rm=TRUE)
```
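To see what `na.rm` is doing, here is a small sketch on a made-up data frame. The name `Toy` and its values are invented for illustration and are not the course data. It also shows one way to get the sample size from R rather than by counting rows.

```r
# Made-up data frame whose shorter column is padded with NA, mimicking
# the layout described above. These numbers are NOT the course data.
Toy <- data.frame(A1 = c(20, 25, 30, NA, NA),
                  A2 = c(22, 24, 26, 28, 30))

mean(Toy$A1)                 # NA, because missing values propagate
mean(Toy$A1, na.rm = TRUE)   # mean of the observed values only
sd(Toy$A1, na.rm = TRUE)     # standard deviation of the observed values
sum(!is.na(Toy$A1))          # sample size: count of non-missing entries
```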

### 22-2 d. Conducting the significance test

R makes two-sample t-tests very easy to compute. Just call `t.test` with the two samples.

```
t.test(CommuteTimes$A1,CommuteTimes$A2)
```
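A few details about what `t.test` does may help when you compare its output with a hand computation. By default it runs the unequal-variance (Welch) version of the test, it drops `NA` values on its own, and its degrees of freedom will generally differ from the conservative "smaller sample size minus one" choice. The sketch below uses made-up vectors (`route1` and `route2` are invented, not the course data) and pulls individual pieces out of the result.

```r
# Made-up commute times (NOT the course data), with NA padding.
route1 <- c(20, 25, 30, 27, 23, NA, NA)
route2 <- c(22, 24, 26, 28, 30, 29, 31)

# Welch two-sample t-test; missing values are removed automatically.
result <- t.test(route1, route2)

result$statistic   # t statistic
result$parameter   # Welch-approximated degrees of freedom
result$p.value     # two-sided p-value
result$conf.int    # 95% confidence interval for the difference in means
```

The t statistic itself matches the formula from Activity 22-1; only the degrees of freedom (and so the p-value and interval) depend on the Welch approximation.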

### 22-2 f. Confidence intervals

We repeat the t-test, telling it to use a different confidence level.

```
t.test(CommuteTimes$A1,CommuteTimes$A2, conf.level=.90)
```

### 22-2 k. More Computations

You should be able to figure out how to do these computations by revisiting the Alex examples from above.

## R notes for Activity 22-3

Since we ended up with very different sample sizes (I'm not sure why), I've put the data into two files, `Convenient.csv` and `Inconvenient.csv`. The command below loads the first; load the second into `Inconvenient` in the same way.

```
Convenient = read.csv("/home/rebelsky/Stats115/Data/ConvenientSequence.csv")
```

### 22-3 e. Visual displays

Here are the commands you might use to build four windows with the four separate displays.

```
library(BHH2, lib.loc="/home/rebelsky/Stats115/Packages")
X11()
boxplot(Convenient,horizontal=TRUE)
X11()
boxplot(Inconvenient,horizontal=TRUE)
X11()
dotPlot(Convenient)
X11()
dotPlot(Inconvenient)
```

### 22-3 g. Test Statistics

Are we doing a two-sided test or a one-sided test? If it's a one-sided test, what is its direction? Use your answers to figure out which of the following commands to select.

```
t.test(Convenient,Inconvenient)
t.test(Convenient,Inconvenient, alternative="greater")
t.test(Convenient,Inconvenient, alternative="less")
```
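When choosing a direction, remember that `alternative` is stated in terms of the first argument minus the second: `"greater"` tests whether the mean of the first sample exceeds the mean of the second. The sketch below uses made-up samples (`x` and `y` are invented, not the course data) to show how the three p-values relate.

```r
# Made-up samples (NOT the course data); x has the larger sample mean.
x <- c(12, 15, 14, 16, 13, 17)
y <- c(10, 11, 12, 9, 13, 11)

# "greater" tests whether mean(x) - mean(y) > 0; "less" tests the reverse.
p_greater <- t.test(x, y, alternative = "greater")$p.value
p_less    <- t.test(x, y, alternative = "less")$p.value
p_two     <- t.test(x, y)$p.value   # default is two-sided

c(greater = p_greater, less = p_less, two.sided = p_two)
```

The two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of them, so picking the wrong direction gives a p-value close to 1 rather than close to 0.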

### 22-3 h. Confidence intervals

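Add `conf.level=.90` to your previous answer to compute the confidence interval. Note that if your previous answer used a one-sided `alternative`, `t.test` reports a one-sided interval (one endpoint is infinite); use the two-sided call if you want the usual two-sided interval.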

Samuel A. Rebelsky, rebelsky@grinnell.edu

Copyright (c) 2007-8 Samuel A. Rebelsky.

This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit `http://creativecommons.org/licenses/by-nc/2.5/` or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.