# Techniques of Statistical Inference: Proportions

As you saw (or will see) in Topics 16-18, there are two principal techniques of statistical inference: confidence intervals and tests of significance. To compute these values in R, you simply need to follow the formulae. We can make it easier to compute with the formulae by using names.

## Confidence Intervals

We'll start with confidence intervals. The lower bound of a confidence interval is

```c_lower = p_hat - z_star*sqrt(p_hat*(1-p_hat)/n)
```

where `c_lower` is the lower bound of the interval, `p_hat` is the sample proportion, `z_star` is the desired confidence level (aka the “critical value”), and `n` is the sample size.

Similarly, the upper bound of a confidence interval is

```c_upper = p_hat + z_star*sqrt(p_hat*(1-p_hat)/n)
```

One way to use these formulae is to plug in the values directly. For example, to find the bounds of the 99% confidence interval (`z_star` is 2.576) when `p_hat` is 68% and the population is 2032, we might write

```c_lower = .68 - 2.576*sqrt(.68*(1-.68)/2032)
c_upper = .68 + 2.576*sqrt(.68*(1-.68)/2032)
c_lower
c_upper
```

We echo the `c_lower` and `c_upper` so that R not only computes the values, but also prints them out.

However, that's a lot of work to change the formula each time we want to compute a new confidence interval. Instead, we can assign values to the names `p_hat`, `z_star`, and `n` and then use the formulae with the mnemonics.

```p_hat = .68
z_star = 2.576
n = 2032
c_lower = p_hat - z_star*sqrt(p_hat*(1-p_hat)/n)
c_upper = p_hat + z_star*sqrt(p_hat*(1-p_hat)/n)
c_lower
c_upper
```

In fact, we can even use R to compute `z_star`, so that we don't have to look it up or memorize it each time. For a confidence level of 99%, we need the opposite of the z-score for a proportion of .05. (Why .05? Because we want only 1% outside of the confidence level, and half of the values will be below the interval and half above.) We can use `qnorm` (the inverse to `pnorm`) to compute that z-score.

```z_star = -qnorm(.05)
z_star
```

If we don't want to compute the .05 by hand, we can even use a formula for it.

```confidence_level = .99
z_star = -qnorm((1-confidence_level)/2)
z_star
```

You'll note that both of these computations give a slightly different result than the 2.576 that we've been using. As you've figured out from your repeated use of the normal table, it is common practice to use approximations of most values, and R computes a more precise approximation.

Returning to our original example (99% confidence interval for a sample proportion of .68 and size of 2032), we might ask R to compute the confidence interval with

```confidence_level = .99
p_hat = .68
n = 2032
z_star = -qnorm((1-confidence_level)/2)
c_lower = p_hat - z_star*sqrt(p_hat*(1-p_hat)/n)
c_upper = p_hat + z_star*sqrt(p_hat*(1-p_hat)/n)
c_lower
c_upper
```

## Significance Tests

We can use the same strategy here that we used above: Write the formulae for the test statistic and the p-value and then make sure the variables used in those formulae are defined as we'd like.

```z = (p_hat-pi_0)/sqrt(pi_0*(1-pi_0)/n)
p_left = pnorm(z)
p_right = 1-pnorm(z)
p_both = 2*pnorm(-abs(z))
```

where `p_hat` is the sample proportion, `pi_0` is the hypothesized population proportion, and `n` is the sample size.

So, for example, if we want to compute the p-value of getting a test proportion at least as large as 24/74 (.324) with a hypothesized population proportion of .25 when the sample size is 74, we would write,

```p_hat = 24/74
pi_0 = .25
n = 72
z = (p_hat-pi_0)/sqrt(pi_0*(1-pi_0)/n)
p = 1 - pnorm(z)
p
```

Similarly, to compute the double-sided p-value when the test proportion is .68, the hypothesized population proprotion is .65, and the sample size is 2032, we would write

```p_hat = .68
pi_0 = .65
n = 2032
z = (p_hat-pi_0)/sqrt(pi_0*(1-pi_0)/n)
p = 2*pnorm(-abs(z))
p
```

Warning! This section contains R concepts beyond the scope of this course.

Of course, rather than copying and pasting the same code every time, we might find it useful to create functions that encapsulate the computation. (We may also ask ourselves why R does not include these functions, but that's a question for another day.) It is not our goal to teach you how to write functions in R, so for this section, we'll just give you some code that you might use.

### A Function for Confidence Intervals

We can also turn this code into a function of three parameters.

```proportion_ci = function(confidence_level, p_hat, n) {
z_star = -qnorm((1-confidence_level)/2)
c_lower = p_hat - z_star*sqrt(p_hat*(1-p_hat)/n)
c_upper = p_hat + z_star*sqrt(p_hat*(1-p_hat)/n)
c(c_lower,c_upper)
}
```

Now, rather than naming each value separately, we can call this function like we'd call any other R function.

````>` `proportion_ci(.95,.68,2032)`
`[1] 0.6597178 0.7002822`
`>` `proportion_ci(.99,.68,2032)`
`[1] 0.6533446 0.7066554`
`>` `proportion_ci(.95,.72,996)`
`[1] 0.6921154 0.7478846`
`>` `proportion_ci(.95,.64,1036)`
`[1] 0.6107713 0.6692287`
```

### Functions for Computing p-values

We'll start with a function to compute the test statistic for proportions, since it's used in any computation of p-values.

```proportion_z = function(pi_0, p_hat, n) {
(p_hat-pi_0)/sqrt(pi_0*(1-pi_0)/n)
}
```

Now, there are three different ways to compute p-values: At least as extreme to the left, at least as extreme, or at least as extreme in both directions. We'll represent these as separate funtions.

```proportion_p_left = function(pi_0, p_hat, n) {
pnorm(proportion_z(pi_0, p_hat, n))
}
proportion_p_right = function(pi_0, p_hat, n) {
1 - pnorm(proportion_z(pi_0, p_hat, n))
}
proportion_p_both = function(pi_0, p_hat, n) {
2*pnorm(-abs(proportion_z(pi_0, p_hat, n)))
}
```

Here's how we'd start to fill in the table for Activity 17-4.

````>` `proportion_z(.25, .30, 50)`
`[1] 0.8164966`
`>` `proportion_p_right(.25, .30, 50)`
`[1] 0.2071081`
`>` `proportion_z(.25, .30, 100)`
`[1] 1.154701`
`>` `proportion_p_right(.25, .30, 100)`
`[1] 0.1241065`
`>` `proportion_z(.25, .30, 150)`
`[1] 1.414214`
`>` `proportion_p_right(.25, .30, 150)`
`[1] 0.0786496`
```

Similarly, here's how we'd start to fill out the table for Activity 18-1.

````>` `proportion_ci(.99, .68, 2032)`
`[1] 0.6533446 0.7066554`
`>` `proportion_p_both(.65, .68, 2032)`
`[1] 0.004578886`
`>` `proportion_p_both(.6667, .68, 2032)`
`[1] 0.2034319`
`>` `proportion_p_both(.7, .68, 2032)`
`[1] 0.04914258`
`>` `proportion_p_both(.707, .68, 2032)`
`[1] 0.007492398`
```

Samuel A. Rebelsky, rebelsky@grinnell.edu

Copyright (c) 2007-8 Samuel A. Rebelsky.

This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit `http://creativecommons.org/licenses/by-nc/2.5/` or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.