Introduction to Statistics (MAT/SST 115.03 2008S)
Primary: [Front Door] [Syllabus] [Current Outline] [R] - [Academic Honesty] [Instructions]
Groupings: [Applets] [Assignments] [Data] [Examples] [Handouts] [Labs] [Outlines] [Projects] [Readings] [Solutions]
External Links: [R Front Door] [SamR's Front Door]
As you saw (or will see) in Topics 16-18, there are two principal techniques of statistical inference: confidence intervals and tests of significance. To compute these values in R, you simply need to follow the formulae. We can make it easier to compute with the formulae by using names.
We'll start with confidence intervals. The lower bound of a confidence interval is
c_lower = p_hat - z_star*sqrt(p_hat*(1-p_hat)/n)
where c_lower
is the lower bound of the interval,
p_hat
is the sample proportion, z_star
is the desired confidence level (aka the “critical value”),
and n
is the sample size.
Similarly, the upper bound of a confidence interval is
c_upper = p_hat + z_star*sqrt(p_hat*(1-p_hat)/n)
One way to use these formulae is to plug in the values directly.
For example, to find the bounds of the 99% confidence interval
(z_star
is 2.576)
when p_hat
is 68% and the population is 2032, we might
write
c_lower = .68 - 2.576*sqrt(.68*(1-.68)/2032) c_upper = .68 + 2.576*sqrt(.68*(1-.68)/2032) c_lower c_upper
We echo the c_lower
and c_upper
so that R
not only computes the values, but also prints them out.
However, that's a lot of work to change the formula each time we
want to compute a new confidence interval. Instead, we can assign
values to the names p_hat
, z_star
, and
n
and then use the formulae with the mnemonics.
p_hat = .68 z_star = 2.576 n = 2032 c_lower = p_hat - z_star*sqrt(p_hat*(1-p_hat)/n) c_upper = p_hat + z_star*sqrt(p_hat*(1-p_hat)/n) c_lower c_upper
In fact, we can even use R to compute z_star
, so that we
don't have to look it up or memorize it each time. For a confidence
level of 99%, we need the opposite of the z-score
for a proportion of .05. (Why .05? Because we want only 1%
outside of the confidence level, and half of
the values will be below the interval and half above.) We can use
qnorm
(the inverse to pnorm
)
to compute that z-score.
z_star = -qnorm(.05) z_star
If we don't want to compute the .05 by hand, we can even use a formula for it.
confidence_level = .99 z_star = -qnorm((1-confidence_level)/2) z_star
You'll note that both of these computations give a slightly different result than the 2.576 that we've been using. As you've figured out from your repeated use of the normal table, it is common practice to use approximations of most values, and R computes a more precise approximation.
Returning to our original example (99% confidence interval for a sample proportion of .68 and size of 2032), we might ask R to compute the confidence interval with
confidence_level = .99 p_hat = .68 n = 2032 z_star = -qnorm((1-confidence_level)/2) c_lower = p_hat - z_star*sqrt(p_hat*(1-p_hat)/n) c_upper = p_hat + z_star*sqrt(p_hat*(1-p_hat)/n) c_lower c_upper
We can use the same strategy here that we used above: Write the formulae for the test statistic and the p-value and then make sure the variables used in those formulae are defined as we'd like.
z = (p_hat-pi_0)/sqrt(pi_0*(1-pi_0)/n) p_left = pnorm(z) p_right = 1-pnorm(z) p_both = 2*pnorm(-abs(z))
where p_hat
is the sample proportion,
pi_0
is the hypothesized population proportion,
and n
is the sample size.
So, for example, if we want to compute the p-value of getting a test proportion at least as large as 24/74 (.324) with a hypothesized population proportion of .25 when the sample size is 74, we would write,
p_hat = 24/74 pi_0 = .25 n = 72 z = (p_hat-pi_0)/sqrt(pi_0*(1-pi_0)/n) p = 1 - pnorm(z) p
Similarly, to compute the double-sided p-value when the test proportion is .68, the hypothesized population proprotion is .65, and the sample size is 2032, we would write
p_hat = .68 pi_0 = .65 n = 2032 z = (p_hat-pi_0)/sqrt(pi_0*(1-pi_0)/n) p = 2*pnorm(-abs(z)) p
Warning! This section contains R concepts beyond the scope of this course.
Of course, rather than copying and pasting the same code every time, we might find it useful to create functions that encapsulate the computation. (We may also ask ourselves why R does not include these functions, but that's a question for another day.) It is not our goal to teach you how to write functions in R, so for this section, we'll just give you some code that you might use.
We can also turn this code into a function of three parameters.
proportion_ci = function(confidence_level, p_hat, n) { z_star = -qnorm((1-confidence_level)/2) c_lower = p_hat - z_star*sqrt(p_hat*(1-p_hat)/n) c_upper = p_hat + z_star*sqrt(p_hat*(1-p_hat)/n) c(c_lower,c_upper) }
Now, rather than naming each value separately, we can call this function like we'd call any other R function.
>
proportion_ci(.95,.68,2032)
[1] 0.6597178 0.7002822
>
proportion_ci(.99,.68,2032)
[1] 0.6533446 0.7066554
>
proportion_ci(.95,.72,996)
[1] 0.6921154 0.7478846
>
proportion_ci(.95,.64,1036)
[1] 0.6107713 0.6692287
We'll start with a function to compute the test statistic for proportions, since it's used in any computation of p-values.
proportion_z = function(pi_0, p_hat, n) { (p_hat-pi_0)/sqrt(pi_0*(1-pi_0)/n) }
Now, there are three different ways to compute p-values: At least as extreme to the left, at least as extreme, or at least as extreme in both directions. We'll represent these as separate funtions.
proportion_p_left = function(pi_0, p_hat, n) { pnorm(proportion_z(pi_0, p_hat, n)) } proportion_p_right = function(pi_0, p_hat, n) { 1 - pnorm(proportion_z(pi_0, p_hat, n)) } proportion_p_both = function(pi_0, p_hat, n) { 2*pnorm(-abs(proportion_z(pi_0, p_hat, n))) }
Here's how we'd start to fill in the table for Activity 17-4.
>
proportion_z(.25, .30, 50)
[1] 0.8164966
>
proportion_p_right(.25, .30, 50)
[1] 0.2071081
>
proportion_z(.25, .30, 100)
[1] 1.154701
>
proportion_p_right(.25, .30, 100)
[1] 0.1241065
>
proportion_z(.25, .30, 150)
[1] 1.414214
>
proportion_p_right(.25, .30, 150)
[1] 0.0786496
Similarly, here's how we'd start to fill out the table for Activity 18-1.
>
proportion_ci(.99, .68, 2032)
[1] 0.6533446 0.7066554
>
proportion_p_both(.65, .68, 2032)
[1] 0.004578886
>
proportion_p_both(.6667, .68, 2032)
[1] 0.2034319
>
proportion_p_both(.7, .68, 2032)
[1] 0.04914258
>
proportion_p_both(.707, .68, 2032)
[1] 0.007492398
Primary: [Front Door] [Syllabus] [Current Outline] [R] - [Academic Honesty] [Instructions]
Groupings: [Applets] [Assignments] [Data] [Examples] [Handouts] [Labs] [Outlines] [Projects] [Readings] [Solutions]
External Links: [R Front Door] [SamR's Front Door]
Copyright (c) 2007-8 Samuel A. Rebelsky.
This work is licensed under a Creative Commons
Attribution-NonCommercial 2.5 License. To view a copy of this
license, visit http://creativecommons.org/licenses/by-nc/2.5/
or send a letter to Creative Commons, 543 Howard Street, 5th Floor,
San Francisco, California, 94105, USA.