## MATH 335: The Hypergeometric and Binomial distributions

For each of several "named" distributions, including the binomial and hypergeometric, R provides a collection of functions that:

• Compute the probability density function (e.g., the dbinom function);
• Compute the cumulative distribution function, i.e., the probability of obtaining a value less than or equal to some prescribed value, x (e.g., the pbinom function);
• Generate a random selection of values from that distribution (e.g., the rbinom function).

Examples for the Binomial distribution

The format of the dbinom function is dbinom(x,size,prob), where x=the number of successes you want the probability of, size=the number of trials (more conventionally called n), and prob=the probability of success on a single trial (more conventionally called p).

Suppose we have an 80% free-throw shooter who attempts 10 free throws. Let X = the number of successful free throws in the 10 attempts. Then:

```
dbinom(7,10,.8) # gives the probability of exactly 7 successes
pbinom(7,10,.8) # gives the probability of 7 or fewer successes
```
Try these two examples; the answers should be .2013266 and .3222005.
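As a check on where these numbers come from, the binomial pmf can be computed directly from the formula P(X = x) = C(n, x) p^x (1 - p)^(n - x) using base R's choose(), and the CDF is just a sum of pmf values. A minimal sketch (the variable names are ours, chosen for illustration):

```r
n <- 10    # number of free throws (R's size)
p <- 0.8   # probability of making one (R's prob)

# pmf from the formula versus dbinom
by_formula <- choose(n, 7) * p^7 * (1 - p)^(n - 7)
by_dbinom  <- dbinom(7, size = n, prob = p)
c(by_formula, by_dbinom)        # both 0.2013266

# CDF as a sum of pmf values versus pbinom
cdf_by_sum    <- sum(dbinom(0:7, size = n, prob = p))
cdf_by_pbinom <- pbinom(7, size = n, prob = p)
c(cdf_by_sum, cdf_by_pbinom)    # both 0.3222005
```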

Examples for the Hypergeometric distribution

The format of the dhyper function is dhyper(x,m,n,k), where x=the number of white balls you want the probability of, m=the number of white balls in the urn, n=the number of black balls in the urn, and k=the number of balls you draw from the urn. Similarly, for the phyper function, which gives the probability of x or fewer white balls, the format is phyper(x,m,n,k).

Here is the translation from R's notation and language to that of Larsen and Marx (the course textbook):

```
R                                        Larsen & Marx
-----------------------------------------------------------------
urn with white and black balls           red and white chips
focus on white balls                     focus on red chips
m = # of white balls in urn              r = # of red chips in urn
n = # of black balls in urn              w = # of white chips in urn
m + n = total # balls in urn             N = r + w = total # chips
k = sample size                          n = sample size
-----------------------------------------------------------------
```
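To see the correspondence in action, the sketch below uses the Larsen and Marx names (r, w, n, N; our choice of variable names) and checks that dhyper(x, r, w, n) agrees with the textbook formula C(r, x) C(w, n - x) / C(N, n). The urn numbers are those of the example that follows.

```r
r <- 10      # red chips in urn   (R's m)
w <- 20      # white chips in urn (R's n)
n <- 15      # sample size        (R's k)
N <- r + w   # total chips in urn (R's m + n)
x <- 8       # number of red chips in the sample

by_formula <- choose(r, x) * choose(w, n - x) / choose(N, n)
by_dhyper  <- dhyper(x, r, w, n)   # positional: dhyper(x, m, n, k)
c(by_formula, by_dhyper)           # both 0.02248876
```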

Suppose you have an urn with 30 chips, 10 red and 20 white (using the Larsen and Marx language). You select 15 at random, without replacement. What is the probability that the sample contains exactly 8 red chips? 8 or more red chips?

```
dhyper(8,10,20,15)      # P(exactly 8 red); answer is 0.02248876
1 - phyper(7,10,20,15)  # P(8 or more red); answer is about 0.0251
```
On that last one, do you see why we entered 7 instead of 8 in the argument?
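One way to see why 7 appears: phyper(x, ...) gives P(X ≤ x), so P(X ≥ 8) = 1 - P(X ≤ 7). The sketch below checks this three ways: the complement, a direct sum of dhyper over 8, 9, and 10 (10 is the most red chips a sample can contain), and phyper's lower.tail = FALSE option, which returns P(X > x).

```r
# P(X >= 8) for an urn of 10 red and 20 white chips, sample of 15
via_complement <- 1 - phyper(7, 10, 20, 15)
via_sum        <- sum(dhyper(8:10, 10, 20, 15))
via_upper_tail <- phyper(7, 10, 20, 15, lower.tail = FALSE)  # P(X > 7)
c(via_complement, via_sum, via_upper_tail)                   # all about 0.0251
```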

Generating random numbers

We have learned how to use the sample function to generate random samples from a vector, either with or without replacement. We can use this function to simulate random outcomes from either the hypergeometric or the binomial distribution.
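For the hypergeometric case, the urn can be represented as a vector of chips and sampled without replacement. A minimal sketch for the urn example above (10 red, 20 white, draw 15), coding red as 1 and white as 0 (our choice); the seed and the 10000 repetitions are arbitrary:

```r
set.seed(335)                      # arbitrary seed, for reproducibility
urn <- c(rep(1, 10), rep(0, 20))   # 1 = red chip, 0 = white chip
draws <- 10000

# number of red chips in each of 'draws' samples of 15, drawn without replacement
reds <- replicate(draws, sum(sample(urn, 15, replace = FALSE)))

mean(reds >= 8)                    # simulated estimate of P(X >= 8)
1 - phyper(7, 10, 20, 15)          # exact value, for comparison
```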

Example: Simulate the free-throw problem.

Let's simulate many repetitions of the free-throw problem above. Here is a solution using what we already know.

```
n <- 10
x <- c(rep(0,2),rep(1,8))  # we want an 80% chance of making a FT
sample(x,n,replace=TRUE) # generates 10 FTs, 0 = miss; 1 = make
sum(sample(x,n,replace=TRUE)) # counts the number of makes (in a fresh set of 10)

reps <- 100
replicate(reps, sum(sample(x,n,replace=TRUE))) # produces reps instances
# of the number of makes in 10 FTs

sum(replicate(reps, sum(sample(x,n,replace=TRUE))) <= 7)/reps
# estimates the probability of getting 7 or fewer
```
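With only 100 repetitions the estimate is rough. A sketch that raises the number of repetitions (10000, an arbitrary choice), fixes a seed so the result is reproducible, and compares the estimate with the exact value from pbinom. Here mean() of a logical vector gives the proportion of TRUEs, the same as sum(...)/reps:

```r
set.seed(1)                          # arbitrary seed
n <- 10
x <- c(rep(0, 2), rep(1, 8))         # 80% chance of making a FT
reps <- 10000

makes <- replicate(reps, sum(sample(x, n, replace = TRUE)))
estimate <- mean(makes <= 7)         # proportion of reps with 7 or fewer makes
exact <- pbinom(7, n, 0.8)           # 0.3222005
c(estimate = estimate, exact = exact)
```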

But R has built-in functions to simulate from well-known distributions. They all begin with the letter r; for example, rbinom generates random numbers from the binomial. So we can simulate, say, 20 repetitions of 10 free throws when p = .8 by:

```
rbinom(20, 10, .8)
[1]  9  9  7  9  7  8  9  9  8  7  9  7  6  9  8 10  8  8  9  7

results <- rbinom(20, 10, .8)  # a new, independent batch of 20
sum(results <= 7)
[1] 7
```

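Notice that, in the second batch (results), 7 of the 20 repetitions resulted in 7 or fewer made free throws. Because each call to rbinom draws a new random sample, your output will differ.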
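The hypergeometric counterpart of rbinom is rhyper(nn, m, n, k), where nn is the number of values to generate. A sketch that redoes the urn question (10 red, 20 white, draw 15) by simulation and compares with the exact answer; the seed and the 10000 draws are arbitrary:

```r
set.seed(2)                                     # arbitrary seed
reds <- rhyper(10000, m = 10, n = 20, k = 15)   # red chips in 10000 samples of 15
mean(reds >= 8)                                 # simulated P(X >= 8)
1 - phyper(7, 10, 20, 15)                       # exact, about 0.0251
```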

On page 33 of the Introduction to R manual, you can find a full listing of the "named" distributions available in R. In our year of study, we will be working with the following:

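```
-------------------------------------------------------
Distribution        R name         additional arguments
-------------------------------------------------------
binomial            binom          size, prob
chi-squared         chisq          df, ncp
exponential         exp            rate
F                   f              df1, df2, ncp
gamma               gamma          shape, scale
hypergeometric      hyper          m, n, k
normal              norm           mean, sd
Poisson             pois           lambda
Student's t         t              df, ncp
uniform             unif           min, max
-------------------------------------------------------
```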
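Each entry in the "R name" column becomes a function name once a one-letter prefix is attached: d (density or pmf), p (CDF), q (quantile), and r (random generation). The sketch below, which assumes only that the stats package is attached (as it is by default), builds every prefixed name for the distributions in the table and checks that each one exists:

```r
r_names  <- c("binom", "chisq", "exp", "f", "gamma",
              "hyper", "norm", "pois", "t", "unif")
prefixes <- c("d", "p", "q", "r")

# every prefix/name combination: "dbinom", "pbinom", ..., "runif"
fun_names <- as.vector(outer(prefixes, r_names, paste0))
found <- sapply(fun_names, exists)
all(found)   # TRUE
```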

Keep in mind that with each "name" there are really four R functions implied. For example, with binom the four functions are dbinom, pbinom, qbinom, and rbinom for, respectively, the pdf, the CDF, the quantile function (the inverse of the CDF), and generating random numbers from the distribution.
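The q function inverts the p function: qbinom(prob, size, p) returns the smallest x with P(X ≤ x) ≥ prob. A short sketch for the free-throw example, using P(X ≤ 6) ≈ 0.121, P(X ≤ 7) ≈ 0.322, and P(X ≤ 8) ≈ 0.624:

```r
# median number of makes for an 80% shooter taking 10 free throws:
# P(X <= 7) = 0.322 < 0.5 <= P(X <= 8) = 0.624, so the answer is 8
qbinom(0.5, 10, 0.8)   # 8

# P(X <= 6) = 0.121 < 0.3 <= P(X <= 7) = 0.322, so the answer is 7
qbinom(0.3, 10, 0.8)   # 7
```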

Example: Generate 10000 random numbers from the interval of real numbers [0, 432]. Then generate the vector of first digits from these, using the function first.digit (defined in the solution). Then make a histogram of the first digits.

Solution:

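```
# Here is the first.digit function; it must be defined before it is called.
first.digit <- function(y) {
  # y should be a vector of positive real numbers.
  k <- floor(log10(y))   # power of ten of the leading digit
  floor(y/(10^k))        # the leading digit, 1 to 9
}

x <- runif(10000, 0, 432)
y <- first.digit(x)
hist(y,breaks=seq(.5,9.5)) # breaking bins at .5 marks centers
                           # each bar over a digit: a nicer histogram
```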
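For this interval the first digits are far from equally likely, and the expected proportions can be worked out by hand. Under the uniform distribution on [0, 432], the probability of first digit d is the total length of the subintervals whose numbers start with d, divided by 432. Digits 1 to 3 each get [d, d+1), [10d, 10d+10), and [100d, 100d+100), plus pieces below 1 (such as [0.1d, 0.1d+0.1)) totalling 1/9, for 1000/9 in all; digit 4 gets only [400, 432) at the top, for 100/9 + 32; digits 5 to 9 get 100/9 each. The sketch below (the derivation and variable names are ours) compares these with a simulation, redefining first.digit so the block runs on its own:

```r
first.digit <- function(y) {
  # y: vector of positive real numbers
  k <- floor(log10(y))     # power of ten of the leading digit
  floor(y / 10^k)          # leading digit, 1 to 9
}

# expected proportions of first digits 1..9 for Uniform(0, 432)
seg_length <- c(rep(1000/9, 3), 100/9 + 32, rep(100/9, 5))
expected <- seg_length / 432     # sums to 1

set.seed(3)                      # arbitrary seed
y <- first.digit(runif(100000, 0, 432))
observed <- as.vector(table(factor(y, levels = 1:9))) / length(y)

round(rbind(expected, observed), 3)
```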