MATH 335: The Hypergeometric and Binomial distributions

For each of several "named" distributions, including the binomial and hypergeometric, R provides a collection of functions that:

  • Compute the probability density function (e.g., the dbinom function);
  • Compute the cumulative distribution function, i.e., the probability of obtaining a value less than or equal to some prescribed value, x (e.g., the pbinom function);
  • Generate a random selection of values from that distribution (e.g., the rbinom function).

    Examples for the Binomial distribution

    The format of the dbinom function is dbinom(x,size,prob), where x=the number of successes you want the probability of, size=the number of trials (more conventionally called n), and prob=the probability of success on a single trial (more conventionally called p).
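    As a check on what dbinom computes, the sketch below evaluates the binomial formula P(X = x) = choose(n, x) p^x (1 - p)^(n - x) directly with R's choose function and compares it with dbinom. It uses n = 10, p = .8, x = 7 from the free-throw example that follows; the variable names are only for illustration.

```r
# Binomial pmf written out by hand: choose(n, x) * p^x * (1 - p)^(n - x)
n <- 10    # number of trials (R's size)
p <- 0.8   # probability of success on one trial (R's prob)
x <- 7     # number of successes

by.hand  <- choose(n, x) * p^x * (1 - p)^(n - x)
built.in <- dbinom(x, size = n, prob = p)

by.hand                        # 0.2013266
all.equal(by.hand, built.in)   # TRUE
```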

    Suppose we have an 80% free-throw shooter who takes 10 free throws. Let X = the number of successful free throws in the 10 attempts. Then:

    dbinom(7,10,.8) # gives the probability of exactly 7 successes;
    pbinom(7,10,.8) # gives the probability of 7 or fewer successes.
    Try these two examples; the answers should be .2013266 and .3222005.
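    The two answers are linked: since the events X = 0, 1, ..., 7 cannot happen together, pbinom(7,10,.8) is just the sum of dbinom(0:7,10,.8). The short sketch below confirms this, and uses cumsum to build the whole CDF from the individual probabilities at once.

```r
# P(X <= 7) is the sum of P(X = 0), P(X = 1), ..., P(X = 7)
probs <- dbinom(0:7, size = 10, prob = 0.8)   # the 8 individual probabilities
sum(probs)                                    # 0.3222005
pbinom(7, size = 10, prob = 0.8)              # 0.3222005

# cumsum of the full pmf reproduces pbinom at every value 0, 1, ..., 10
cdf <- cumsum(dbinom(0:10, size = 10, prob = 0.8))
all.equal(cdf, pbinom(0:10, size = 10, prob = 0.8))   # TRUE
```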

    Examples for the Hypergeometric distribution

    The format of the dhyper function is dhyper(x,m,n,k), where x = the number of white balls drawn (the value whose probability you want), m = the number of white balls in the urn, n = the number of black balls in the urn, and k = the number of balls you draw from the urn. The phyper function takes the same arguments: phyper(x,m,n,k) gives the probability of drawing x or fewer white balls.
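    In R's notation, dhyper computes P(X = x) = choose(m, x) * choose(n, k - x) / choose(m + n, k): the number of ways to pick x of the m white balls and k - x of the n black balls, divided by the number of ways to pick any k balls. The sketch below checks this against dhyper, using the urn numbers from the example further on.

```r
# Hypergeometric pmf by hand, in R's notation:
#   m white balls, n black balls, k balls drawn, x = number of white balls drawn
m <- 10; n <- 20; k <- 15; x <- 8

by.hand  <- choose(m, x) * choose(n, k - x) / choose(m + n, k)
built.in <- dhyper(x, m, n, k)

by.hand                        # 0.02248876
all.equal(by.hand, built.in)   # TRUE
```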

    Here is the translation from R's notation and language to that of Larsen and Marx (the course textbook):

    R                                         Larsen & Marx
    urn with white and black balls            urn with red and white chips
    focus on white balls                      focus on red chips
    m = # of white balls in urn               r = # of red chips in urn
    n = # of black balls in urn               w = # of white chips in urn
    m + n = total # of balls in urn           N = r + w = total # of chips in urn
    k = sample size                           n = sample size
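    If you prefer to work in the textbook's notation, one option is a pair of small wrapper functions. The names lm.dhyper and lm.phyper below are hypothetical (they are not part of R); each one just renames its arguments according to the table above and passes them on to dhyper or phyper.

```r
# Hypothetical wrappers (NOT built into R) that accept Larsen & Marx notation:
#   x = number of red chips in the sample
#   r = number of red chips in the urn    (R's m)
#   w = number of white chips in the urn  (R's n)
#   n = sample size                       (R's k)
lm.dhyper <- function(x, r, w, n) dhyper(x, m = r, n = w, k = n)
lm.phyper <- function(x, r, w, n) phyper(x, m = r, n = w, k = n)

# Inside the wrappers, "n = w" sets R's n argument, while "k = n" passes
# the sample size, i.e. the wrapper's own argument n.
lm.dhyper(8, r = 10, w = 20, n = 15)   # same as dhyper(8, 10, 20, 15)
lm.phyper(7, r = 10, w = 20, n = 15)   # same as phyper(7, 10, 20, 15)
```

    The two different meanings of n (black balls in R, sample size in Larsen & Marx) are the most common source of mistakes, which is why the translation table is worth keeping handy.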

    Suppose you have an urn with 30 chips, 10 red and 20 white (using the Larsen and Marx language). You select 15 at random, without replacement. What is the probability that the sample contains exactly 8 red chips? That it contains 8 or more?

    Obtain the answers via:

    dhyper(8,10,20,15)  # answer is 0.02248876
    1-phyper(7,10,20,15) # answer is 0.02508746
    On that last one, do you see why we entered 7 instead of 8 in the argument?
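    The reason is that X only takes whole-number values, so "X is 8 or more" is the complement of "X is 7 or less", and P(X >= 8) = 1 - P(X <= 7). The sketch below computes the same probability three ways: by that complement, with phyper's lower.tail = FALSE argument (which returns P(X > 7) directly), and by adding the individual probabilities for 8, 9, and 10 red chips.

```r
# P(X >= 8) three ways, for 15 chips drawn from 10 red and 20 white
upper1 <- 1 - phyper(7, 10, 20, 15)                   # complement of P(X <= 7)
upper2 <- phyper(7, 10, 20, 15, lower.tail = FALSE)   # P(X > 7), the same event as X >= 8
upper3 <- sum(dhyper(8:10, 10, 20, 15))               # P(X = 8) + P(X = 9) + P(X = 10)
c(upper1, upper2, upper3)                             # all approximately 0.02508746
```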

    Generating random numbers

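    We have learned about using the sample function to generate random samples from a vector, either with or without replacement. We can use this function to simulate random outcomes from either the binomial distribution (sampling with replacement, so every trial has the same success probability) or the hypergeometric distribution (sampling without replacement from an urn).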
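    For instance, here is a minimal sketch for the urn example above: code the 30 chips as 1 = red and 0 = white, draw 15 without replacement many times, and compare the fraction of samples with 8 or more red chips to the exact answer. The seed and the number of repetitions are arbitrary choices.

```r
set.seed(335)                       # arbitrary seed, so the run is reproducible
urn <- c(rep(1, 10), rep(0, 20))    # 10 red chips (1) and 20 white chips (0)
reps <- 10000

# each replicate: draw 15 chips WITHOUT replacement and count the red ones
reds <- replicate(reps, sum(sample(urn, 15, replace = FALSE)))

mean(reds >= 8)                     # simulated estimate of P(X >= 8)
1 - phyper(7, 10, 20, 15)           # exact value
```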

    Example: Simulate the free-throw problem.

    Let's simulate many repetitions of the free-throw problem above. Here is a solution using what we already know.

    n <- 10
    x <- c(rep(0,2),rep(1,8))  # we want an 80% chance of making a FT
    sample(x,n,replace=TRUE) # generates 10 FTs, 0 = miss; 1 = make
    sum(sample(x,n,replace=TRUE)) # counts the number of makes
    reps <- 100
    replicate(reps, sum(sample(x,n,replace=TRUE))) # produces reps instances
                                 # of the number of makes in 10 FTs
    sum(replicate(reps, sum(sample(x,n,replace=TRUE))) <= 7)/reps
                          # estimates the probability of getting 7 or fewer.
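    With only 100 repetitions the estimate is fairly rough. The sketch below repeats the simulation 10000 times (the seed and the number of repetitions are arbitrary choices) and compares the simulated proportion with the exact value from pbinom.

```r
set.seed(1)                          # arbitrary seed for reproducibility
n <- 10
x <- c(rep(0, 2), rep(1, 8))         # 80% chance of a make on each draw
reps <- 10000

makes <- replicate(reps, sum(sample(x, n, replace = TRUE)))
estimate <- mean(makes <= 7)         # same as sum(makes <= 7) / reps
estimate                             # simulated P(X <= 7)
pbinom(7, n, 0.8)                    # exact: 0.3222005
```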

    But R has built-in functions to simulate from well-known distributions. Their names all begin with the letter r; for example, rbinom generates random numbers from the binomial. So we can simulate, say, 20 repetitions of 10 free throws with p = .8 by:

    rbinom(20, 10, .8)
     [1]  9  9  7  9  7  8  9  9  8  7  9  7  6  9  8 10  8  8  9  7
    results <- rbinom(20, 10, .8)
    sum(results <= 7)
    [1] 7

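    Notice that in this run 7 of the 20 repetitions resulted in 7 or fewer made free throws. Because results comes from a second call to rbinom, it is not the vector printed above (which has only 6 values of 7 or fewer), and your own numbers will differ from run to run.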
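    The hypergeometric has a matching generator, rhyper(nn, m, n, k), which returns nn random counts of white balls in R's notation. A sketch for the urn example (15 chips drawn from 10 red and 20 white), with an arbitrary seed:

```r
set.seed(2)                            # arbitrary seed for reproducibility
draws <- rhyper(10000, 10, 20, 15)     # 10000 counts of red chips in a sample of 15
mean(draws >= 8)                       # simulated P(X >= 8)
1 - phyper(7, 10, 20, 15)              # exact value
```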

    On page 33 of the Introduction to R manual, you can find a full listing of the "named" distributions available in R. In our year of study, we will be working with the following:

    Distribution        R name        additional arguments
    binomial            binom          size, prob
    chi-squared         chisq          df, ncp
    exponential         exp            rate
    F                   f              df1, df2, ncp
    gamma               gamma          shape, scale
    hypergeometric      hyper          m, n, k
    normal              norm           mean, sd
    Poisson             pois           lambda
    Student's t         t              df, ncp
    uniform             unif           min, max

    Keep in mind that with each "name" there are several R functions implied. For example, with binom the functions dbinom, pbinom, and rbinom give, respectively, the pdf, the CDF, and random numbers from the distribution; there is also qbinom, which gives quantiles (the inverse of the CDF).
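    To see that the naming pattern carries over beyond the binomial, here is a short sketch with the Poisson (R name pois, argument lambda, from the table above). The value lambda = 2 and the seed are arbitrary choices for illustration.

```r
# The same d / p / r pattern for the Poisson distribution with lambda = 2
dpois(3, lambda = 2)                 # P(X = 3)
ppois(3, lambda = 2)                 # P(X <= 3)
sum(dpois(0:3, lambda = 2))          # same as ppois(3, lambda = 2)

set.seed(3)                          # arbitrary seed for reproducibility
sims <- rpois(10000, lambda = 2)     # 10000 random Poisson counts
mean(sims <= 3)                      # simulated estimate of P(X <= 3)
```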

    Example: Generate 10000 random numbers from the interval of real numbers [0, 432]. Then compute the vector of their first digits using the function first.digit, and make a histogram of the first digits.


    x <- runif(10000, 0, 432)
    y <- first.digit(x)
    hist(y,breaks=seq(.5,9.5)) # breaking bins at .5 marks makes
                               # a nicer histogram
    # Here is the first.digit function (define it before running the lines above):
    first.digit <- function(y) {
    # y should be a positive real number.
    k <- floor(log10(y))