Introduction to Statistics (MAT/SST 115.03 2008S)
Primary: [Front Door] [Syllabus] [Current Outline] [R] - [Academic Honesty] [Instructions]
Groupings: [Applets] [Assignments] [Data] [Examples] [Handouts] [Labs] [Outlines] [Projects] [Readings] [Solutions]
External Links: [R Front Door] [SamR's Front Door]
While the book does not ask you to use technology in this section, I thought it would be useful for you to learn a bit about how you can use R for working with the t-distribution.
Particularly since the t-distribution table is complicated to use, it is helpful to be able to have R do the computation for us. R provides a procedure, qt, that behaves much like that table. However, qt, given an area, computes a t-value with that area to the left (rather than to the right, as shown in the table).
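To see the left-versus-right difference concretely, here is a small sketch. It uses the lower.tail argument of qt, which tells R to treat the area as lying to the right, the way the table does. The 9 degrees of freedom are just an illustrative choice.

```r
# Area of 0.975 to the left of t (the default behavior of qt)
left = qt(0.975, 9)

# Area of 0.025 to the right of t (the way the table is organized)
right = qt(0.025, 9, lower.tail = FALSE)

# Both calls describe the same t-value
c(left, right)
```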
For a 95% confidence interval, we call qt with .975. (Why .975? Because there's 0.025 to the right, and therefore 0.975 to the left.) More generally, the area we give to qt is the average of the confidence level and 1, that is, (confidence level + 1)/2. The qt function also expects a second parameter, which represents the degrees of freedom.
For example, part d asks us to find the value of
t^{*} for a 95% confidence
interval with a sample size of 10 (9 degrees of freedom).
In R, we would write
qt(.975, 9)
Of course, you should make sure that you know how to use the table to find t^{*} and confirm that you get the same answer in both cases.
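If you find yourself computing t^{*} often, the "average the confidence level and 1" rule can be wrapped in a small helper. This is only a sketch: the name t_star_for is my own invention, not something built into R.

```r
# Hypothetical helper: t* for a given confidence level and sample size
t_star_for = function(conf_level, n) {
  # The area to the left is the average of the confidence level and 1
  area_left = (conf_level + 1) / 2
  # Degrees of freedom are one less than the sample size
  qt(area_left, n - 1)
}

# t* for 90%, 95%, and 99% intervals with a sample size of 10
t_star_for(c(0.90, 0.95, 0.99), 10)
```

Because qt accepts a vector of areas, one call gives all three values, which you can then compare against the table.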
You can read and preview the data with
BodyTemps = read.csv("/home/rebelsky/Stats115/Data/BodyTemps.csv")
summary(BodyTemps)
head(BodyTemps)
You'll note that the data have two columns: BodyTemp and Sex. We just want the first column, which we will select with BodyTemps$BodyTemp.
We can build a quick histogram of those data with the following command. (Since R and Minitab make different decisions as to how to make intervals, this may look a bit different than the sample answer.)
hist(BodyTemps$BodyTemp)
But we should certainly label the x axis and give the plot a title.
hist(BodyTemps$BodyTemp, main="Sample Body Temperatures", xlab="Body Temperature in Degrees F")
If we'd rather do a dot plot, we can use
library(BHH2, lib.loc="/home/rebelsky/Stats115/Packages")
dotPlot(BodyTemps$BodyTemp, main="Sample Body Temperatures", xlab="Body Temperature in Degrees F")
We can create the normal probability plot with
qqnorm(BodyTemps$BodyTemp, datax=T, ylab="Body Temperature in Degrees F")
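It can be easier to judge a normal probability plot when there is a reference line to compare against. The sketch below adds one with qqline. So that it runs on its own, it uses made-up temperatures (the vector temps is simulated, not the class data). With the real data you would use BodyTemps$BodyTemp instead.

```r
# Made-up temperatures, used only so this example runs on its own
set.seed(115)
temps = rnorm(130, mean = 98.25, sd = 0.73)

# Normal probability plot with the data on the x axis
qq = qqnorm(temps, datax = TRUE, ylab = "Body Temperature in Degrees F")

# Reference line through the first and third quartiles;
# datax must match the setting used in qqnorm
qqline(temps, datax = TRUE)
```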
Since you used some form of technology to compute these confidence intervals in activity 19-1, I'm not sure why they're asking you to do so again. But, hey, let's cooperate. One technique is to tell R the formula. We'll start by recording the values we know.
x_bar = 98.249
s = .733
n = 130
We can use qt to compute t*. Unlike the table on p. 625, qt computes the appropriate t value given the area to the left of that t. Hence, for a 95% confidence interval, we use .975. (Why .975? Because there's 0.025 to the right, and therefore 0.975 to the left.) As you should recall from the reading, the degrees of freedom should be n-1.
t_star = qt(0.975, n-1)
Now, we're ready to compute the lower and upper bounds of the confidence interval using the standard formula.
ci_lower = x_bar - t_star*s/sqrt(n)
ci_upper = x_bar + t_star*s/sqrt(n)
c(ci_lower, ci_upper)
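Since the activity asks for several confidence levels, it may help to package the formula as a function. The following is a sketch under my own naming: ci_from_summary is a hypothetical helper, not part of R.

```r
# Hypothetical helper: confidence interval for a mean from summary statistics
ci_from_summary = function(x_bar, s, n, conf_level = 0.95) {
  # t* with (conf_level + 1)/2 of the area to its left
  t_star = qt((conf_level + 1) / 2, n - 1)
  # Margin of error: t* times the standard error of the mean
  margin = t_star * s / sqrt(n)
  c(lower = x_bar - margin, upper = x_bar + margin)
}

# The three intervals for the body temperature summary values
ci_90 = ci_from_summary(98.249, 0.733, 130, 0.90)
ci_95 = ci_from_summary(98.249, 0.733, 130, 0.95)
ci_99 = ci_from_summary(98.249, 0.733, 130, 0.99)
rbind(ci_90, ci_95, ci_99)
```

Printing the three intervals side by side makes it easy to see that the interval widens as the confidence level increases.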
Of course, that's a lot of work. Hence, we might want to use the built-in t.test function, which provides not just the confidence interval, but also a lot of other data. However, we need to work from the original data set, rather than from the mean and standard deviation already computed from that data set. (If you only know mean, standard deviation, and sample size, you'll need to use the technique above.) When we call t.test, we also supply a hypothesized population mean (mu) and a desired confidence level (conf.level). While mu does not affect the confidence interval, the t-test computes more than just the confidence interval, and therefore uses a bit more. (If you omit them, R uses mu=0 and conf.level=0.95.)
t.test(BodyTemps$BodyTemp, mu=98.6, conf.level=0.95)
For the other two confidence intervals, we would use
t.test(BodyTemps$BodyTemp, mu=98.6, conf.level=0.90)
t.test(BodyTemps$BodyTemp, mu=98.6, conf.level=0.99)
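If you only want the interval, you can pull it out of the result of t.test through its conf.int component. The sketch below also checks that t.test and the formula agree. It uses simulated data (the vector temps is made up so the example runs anywhere). With the class data you would use BodyTemps$BodyTemp.

```r
# Made-up data standing in for BodyTemps$BodyTemp
set.seed(2008)
temps = rnorm(130, mean = 98.25, sd = 0.73)

# Pull just the confidence interval out of the t.test result
result = t.test(temps, mu = 98.6, conf.level = 0.95)
from_t_test = as.numeric(result$conf.int)

# The same interval computed from the formula
n = length(temps)
margin = qt(0.975, n - 1) * sd(temps) / sqrt(n)
from_formula = c(mean(temps) - margin, mean(temps) + margin)

# The two rows should match
rbind(from_t_test, from_formula)
```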
Since you don't have the original data set, you cannot use the t.test function. Hence, you must provide R with the formulae.
x_bar = 98.249
s = .733
n = 13
t_star = qt(0.975, n-1)
ci_lower = x_bar - t_star*s/sqrt(n)
ci_upper = x_bar + t_star*s/sqrt(n)
c(ci_lower, ci_upper)
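To see how much the sample size matters, we can compute both cases at once. This sketch relies on the fact that qt and sqrt work on whole vectors. The names sizes and margin are my own.

```r
x_bar = 98.249
s = 0.733

# The two sample sizes we want to compare
sizes = c(13, 130)

# t* and margin of error for each sample size (95% confidence)
t_star = qt(0.975, sizes - 1)
margin = t_star * s / sqrt(sizes)

# One row per sample size
data.frame(n = sizes, t_star = t_star,
           lower = x_bar - margin, upper = x_bar + margin)
```

Both a larger t* and a larger standard error contribute to the wider interval for the smaller sample.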
I will gather your sleep time data at the start of class and enter them into a file. You can read them into a table with
SleepData = read.csv("/home/rebelsky/Stats115/Data/SleepData.csv")
SleepTimes = SleepData$HoursSlept
The second line allows us to refer to the vector as SleepTimes.
You should refer to activity 19-3 c for ideas of graphical displays.
You can get sample size, sample mean, and sample standard deviation with
length(SleepTimes)
mean(SleepTimes)
sd(SleepTimes)
Since we have all of the original data, the easiest way to get the confidence interval is to use the t.test function. You should substitute your own guess as to the mean hours slept (in place of 6).
t.test(SleepTimes, mu=6, conf.level=.90)
Of course, you might also want to provide R with step-by-step instructions.
x_bar = mean(SleepTimes)
n = length(SleepTimes)
s = sd(SleepTimes)
t_star = qt(0.95, n-1)
ci_lower = x_bar - t_star*s/sqrt(n)
ci_upper = x_bar + t_star*s/sqrt(n)
c(ci_lower, ci_upper)
Rather than counting values by hand, you can get R to produce a vector of the times in this interval with
NearMedian = SleepTimes[(SleepTimes > ci_lower) & (SleepTimes < ci_upper)]
length(NearMedian)
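A slightly shorter approach is to count the TRUE values directly, since R treats TRUE as 1 and FALSE as 0. The sketch below uses a handful of made-up sleep times (the vector sleep is not the class data) so that it runs on its own.

```r
# Made-up sleep times (hours), used only for illustration
sleep = c(5.5, 6, 6.5, 7, 7, 7.5, 8, 8, 8.5, 9)

# 90% confidence interval for the mean, computed from the formula
n = length(sleep)
margin = qt(0.95, n - 1) * sd(sleep) / sqrt(n)
lower = mean(sleep) - margin
upper = mean(sleep) + margin

# TRUE for each value that falls strictly inside the interval
inside = (sleep > lower) & (sleep < upper)

sum(inside)    # number of values inside the interval
mean(inside)   # proportion of values inside the interval
```

The proportion is usually well below the confidence level, which is a useful reminder that the interval describes the population mean, not the individual values.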
You can read the data and preview basic information with
Backpack = read.csv("/home/rebelsky/Stats115/Data/Backpack.csv")
summary(Backpack)
head(Backpack)
tail(Backpack)
As you will find, there are 100 rows in the table, and three columns. The columns are labeled BackpackWeight, BodyWeight, and Sex.
This question asks you to build a vector that represents the ratio of backpack weight to body weight. As you may recall, we can create that vector with
Ratio = Backpack$BackpackWeight/Backpack$BodyWeight
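If it is not obvious why a single division produces a whole vector, a tiny example may help. R divides the two columns element by element. The table below is made up (the name Example and its values are hypothetical); only the column names match the real data.

```r
# A tiny made-up table with the same column names as Backpack
Example = data.frame(
  BackpackWeight = c(9, 12, 20),
  BodyWeight = c(120, 150, 200)
)

# Each backpack weight is divided by the body weight in the same row
ExampleRatio = Example$BackpackWeight / Example$BodyWeight
ExampleRatio
```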
You should be able to figure out the appropriate ways to display these data. If you'd like to get a histogram close to the one in the sample answers, you can use
hist(Ratio,
     breaks=seq(from=-0.008, to=0.20, by=0.015),
     axes=FALSE,
     main="",
     xlab="Ratio of Backpack Weight to Body Weight")
axis(1, seq(from=0, to=0.2, by=0.02))
axis(2, seq(from=0, to=20, by=4))
The authors recommend that you compute this confidence interval by hand. However, you may also use t.test.
Copyright (c) 2007-8 Samuel A. Rebelsky.
This work is licensed under a Creative Commons
Attribution-NonCommercial 2.5 License. To view a copy of this
license, visit http://creativecommons.org/licenses/by-nc/2.5/
or send a letter to Creative Commons, 543 Howard Street, 5th Floor,
San Francisco, California, 94105, USA.