Introduction to Statistics (MAT/SST 115.03 2008S)

R Notes for Topic 19: Confidence Intervals: Means


R notes for Activity 19-2: Exploring the t-Distribution

While the book does not ask you to use technology in this section, I thought it would be useful for you to learn a bit about how you can use R for working with the t-distribution.

Because the t-distribution table is complicated to use, it is helpful to have R do the computation for us. R provides a function, qt, that behaves much like that table. However, given an area, qt computes the t-value with that area to its left (rather than to its right, as shown in the table).

For a 95% confidence interval, we call qt with .975. (Why .975? Because there's 0.025 to the right, and therefore 0.975 to the left.) More generally, the area we give to qt is the average of the confidence level and 1: for a confidence level C, we use (1 + C)/2. The qt function also expects a second parameter, which represents the degrees of freedom. For example, part d asks us to find the value of t* for a 95% confidence interval with a sample size of 10 (9 degrees of freedom). In R, we would write

qt(.975, 9)

Of course, you should make sure that you know how to use the table to find t* and confirm that you get the same answer in both cases.
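If you find yourself computing t* for several confidence levels, the "average the confidence level and 1" rule can be wrapped in a small function. What follows is only a sketch: the name t_star_for is one I made up for illustration, not something R or the book provides.

```r
# Hypothetical helper (not part of R): t* for a given confidence level
# and degrees of freedom.
t_star_for = function(conf_level, df) {
  # qt wants the area to the LEFT of t*, which is the average of
  # the confidence level and 1.
  qt((1 + conf_level) / 2, df)
}

t_star_for(0.95, 9)    # same as qt(.975, 9)
t_star_for(0.90, 9)    # same as qt(.95, 9)

# pt goes the other way: given a t-value, it returns the area to its left.
pt(t_star_for(0.95, 9), 9)
```

The last line is a handy sanity check: feeding t* back to pt should return the area you started with.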

R notes for Activity 19-3

19-3 c: Visualizing Sample Data

You can read and preview the data with

BodyTemps = read.csv("/home/rebelsky/Stats115/Data/BodyTemps.csv")
summary(BodyTemps)
head(BodyTemps)

You'll note that the data have two columns: BodyTemp and Sex. We just want the first column, which we will select with BodyTemps$BodyTemp.

We can build a quick histogram of those data with the following command. (Since R and Minitab make different decisions as to how to make intervals, this may look a bit different than the sample answer.)

hist(BodyTemps$BodyTemp)

But we should certainly give the plot a title and label the x axis.

hist(BodyTemps$BodyTemp,
  main="Sample Body Temperatures",
  xlab="Body Temperature in Degrees F"
)

If we'd rather do a dot plot, we can use

library(BHH2, lib.loc="/home/rebelsky/Stats115/Packages")
dotPlot(BodyTemps$BodyTemp,
  main="Sample Body Temperatures",
  xlab="Body Temperature in Degrees F"
)

We can create the normal probability plot with

qqnorm(BodyTemps$BodyTemp, datax=TRUE, ylab="Body Temperature in Degrees F")

19-3 f: Computing Confidence Intervals

Since you used some form of technology to compute these confidence intervals in activity 19-1, I'm not sure why they're asking you to do so again. But, hey, let's cooperate. One technique is to tell R the formula. We'll start by recording the values we know.

x_bar = 98.249
s = .733
n = 130

We can use qt to compute t*. Unlike the table on p. 625, qt computes the appropriate t value given the area to the left of that t. Hence, for a 95% confidence interval, we use .975. (Why .975? Because there's 0.025 to the right, and therefore 0.975 to the left.) As you should recall from the reading, the degrees of freedom should be n-1.

t_star = qt(0.975, n-1)

Now, we're ready to compute the lower and upper bounds of the confidence interval using the standard formula.

ci_lower = x_bar - t_star*s/sqrt(n)
ci_upper = x_bar + t_star*s/sqrt(n)
c(ci_lower, ci_upper)
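If you expect to repeat this computation, you might bundle the steps into a function. The following is a minimal sketch, assuming the same formula as above; the name ci_from_summary and its parameter names are my own, not built into R.

```r
# Hypothetical helper: t-based confidence interval for a mean, computed
# from summary statistics alone.
ci_from_summary = function(x_bar, s, n, conf_level = 0.95) {
  t_star = qt((1 + conf_level) / 2, n - 1)   # area to the left of t*
  margin = t_star * s / sqrt(n)              # t* times the standard error
  c(lower = x_bar - margin, upper = x_bar + margin)
}

# The body temperature summary statistics from above, at three levels.
ci_from_summary(98.249, .733, 130, conf_level = 0.90)
ci_from_summary(98.249, .733, 130, conf_level = 0.95)
ci_from_summary(98.249, .733, 130, conf_level = 0.99)
```

You should see the interval get wider as the confidence level increases.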

Of course, that's a lot of work. Hence, we might want to use the built-in t.test function, which provides not just the confidence interval, but also a lot of other information. However, we need to work from the original data set, rather than from the mean and standard deviation already computed from that data set. (If you only know the mean, standard deviation, and sample size, you'll need to use the technique above.) When you call t.test, you will usually also provide a hypothesized population mean (mu) and a desired confidence level (conf.level). If you leave them out, R uses mu=0 and conf.level=0.95. The value of mu does not affect the confidence interval, but t.test computes more than just the confidence interval (including a test statistic and p-value), and those other results do depend on mu.

t.test(BodyTemps$BodyTemp, mu=98.6, conf.level=0.95)

For the other two confidence intervals, we would use

t.test(BodyTemps$BodyTemp, mu=98.6, conf.level=0.90)
t.test(BodyTemps$BodyTemp, mu=98.6, conf.level=0.99)
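If you only want the interval, and not the rest of the report, you can pull it out of the value that t.test returns. Here is a sketch that uses simulated temperatures as a stand-in, since you may not have the data file handy; the vector temps is made up and is not the real BodyTemps data.

```r
# Simulated stand-in for BodyTemps$BodyTemp (NOT the real data).
set.seed(115)
temps = rnorm(130, mean = 98.25, sd = 0.73)

# t.test returns a list; the conf.int component holds just the interval.
result = t.test(temps, mu = 98.6, conf.level = 0.95)
result$conf.int

# Changing mu changes the test statistic and p-value, but not the interval.
t.test(temps, mu = 0, conf.level = 0.95)$conf.int

# The interval agrees with the formula-based computation.
n = length(temps)
t_star = qt(0.975, n - 1)
mean(temps) + c(-1, 1) * t_star * sd(temps) / sqrt(n)
```

With the real data, you would replace temps with BodyTemps$BodyTemp.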

19-3 j: Computing Another CI

Since you don't have the original data set, you cannot use the t.test function. Hence, you must provide R with the formulae.

x_bar = 98.249
s = .733
n = 13
t_star = qt(0.975, n-1)
ci_lower = x_bar - t_star*s/sqrt(n)
ci_upper = x_bar + t_star*s/sqrt(n)
c(ci_lower, ci_upper)
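To see how much the sample size matters, you can give R several sample sizes at once, since qt and sqrt both work on vectors. This is just a sketch for comparison: the sample size of 1300 is an extra case I made up, and the variable names sizes and margin are mine.

```r
x_bar = 98.249
s = .733
sizes = c(13, 130, 1300)          # 1300 is a made-up extra case

t_star = qt(0.975, sizes - 1)     # one t* per sample size
margin = t_star * s / sqrt(sizes) # one margin of error per sample size

# One row per sample size.
data.frame(n = sizes, t_star = t_star,
           ci_lower = x_bar - margin, ci_upper = x_bar + margin)
```

Both t* and the standard error shrink as n grows, so the interval narrows on both counts.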

R notes for Activity 19-5: Sleeping Times

I will gather your sleep time data at the start of class and enter them into a file. You can read them into a table with

SleepData = read.csv("/home/rebelsky/Stats115/Data/SleepData.csv")
SleepTimes = SleepData$HoursSlept

The second line allows us to refer to the vector as SleepTimes.

19-5 a: Graphical Displays

You should refer to activity 19-3 c for ideas of graphical displays.

19-5 b: Sample Statistics

You can get sample size, sample mean, and sample standard deviation with

length(SleepTimes)
mean(SleepTimes)
sd(SleepTimes)

19-5 d: Confidence Interval

Since we have all of the original data, the easiest way to get the confidence interval is to use the t.test function. You should substitute your own guess as to the mean hours slept (in place of 6).

t.test(SleepTimes, mu=6, conf.level=.90)

Of course, you might also want to provide R with step-by-step instructions.

x_bar = mean(SleepTimes)
n = length(SleepTimes)
s = sd(SleepTimes)
t_star = qt(0.95, n-1)
ci_lower = x_bar - t_star*s/sqrt(n)
ci_upper = x_bar + t_star*s/sqrt(n)
c(ci_lower, ci_upper)

19-5 e: Counting

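Rather than counting values by hand, you can get R to produce a vector of the times that fall within the confidence interval. (This uses the ci_lower and ci_upper values from the step-by-step computation in part d.)

NearMean = SleepTimes[(SleepTimes > ci_lower) & (SleepTimes < ci_upper)]
length(NearMean)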
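You can also count without building the intermediate vector, since R treats TRUE as 1 and FALSE as 0. The sketch below uses ten made-up sleep times so that it runs on its own; DemoTimes is not the class data, and the names ci and inside are mine.

```r
# Made-up sleep times, in hours (NOT the class data).
DemoTimes = c(5.5, 6, 6.5, 7, 7, 7.5, 8, 8, 8.5, 9)

# 90% confidence interval for the mean, as a two-element vector.
ci = t.test(DemoTimes, conf.level = .90)$conf.int

# The comparison yields a logical vector; sum counts the TRUEs and
# mean gives the proportion of TRUEs.
inside = (DemoTimes > ci[1]) & (DemoTimes < ci[2])
sum(inside)
mean(inside)
```

The proportion is usually far below 90%, which is a good reminder that the interval estimates the population mean, not where individual values fall.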

R notes for Activity 19-6: Backpack Weights

You can read the data and preview basic information with

Backpack = read.csv("/home/rebelsky/Stats115/Data/Backpack.csv")
summary(Backpack)
head(Backpack)
tail(Backpack)

As you will find, there are 100 rows in the table, and three columns. The columns are labeled BackpackWeight, BodyWeight, and Sex.

19-6 b: A Ratio Vector

This question asks you to build a vector that represents the ratio of backpack weight to body weight. As you may recall, we can create that vector with

Ratio = Backpack$BackpackWeight/Backpack$BodyWeight

You should be able to figure out the appropriate ways to display these data. If you'd like to get a histogram close to the one in the sample answers, you can use

hist(Ratio, 
  breaks=seq(from=-0.008, to=0.20, by=0.015),
  axes=FALSE,
  main="",
  xlab="Ratio of Backpack Weight to Body Weight"
)
axis(1, seq(from=0,to=0.2,by=0.02)) 
axis(2, seq(from=0,to=20,by=4))

19-6 c: Confidence Intervals

The authors recommend that you compute this confidence interval by hand. However, you may also use t.test.
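If you do use t.test, one way to check your hand computation is sketched below. I've assumed a 95% confidence level (substitute whatever level the activity asks for), and I've used a tiny made-up data frame, Demo, with the same column names as Backpack so that the example runs without the data file.

```r
# Tiny made-up data frame with the same column names as Backpack
# (NOT the real data).
Demo = data.frame(
  BackpackWeight = c(9, 8, 10, 6, 12, 15, 7, 11),
  BodyWeight     = c(125, 195, 120, 155, 180, 170, 140, 150)
)
DemoRatio = Demo$BackpackWeight / Demo$BodyWeight

# mu is left at its default, since only the interval is wanted.
t.test(DemoRatio, conf.level = 0.95)$conf.int

# By-hand check using the same formula as in 19-3 f.
n = length(DemoRatio)
mean(DemoRatio) + c(-1, 1) * qt(0.975, n - 1) * sd(DemoRatio) / sqrt(n)
```

With the real data, you would use the Ratio vector from part b in place of DemoRatio.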

Samuel A. Rebelsky, rebelsky@grinnell.edu

Copyright (c) 2007-8 Samuel A. Rebelsky.

This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.