MATH 335: The Hypergeometric and Binomial distributions

For each of several "named" distributions, including the binomial and hypergeometric, R provides a collection of functions that:

  • Compute the probability density function, which for a discrete distribution such as these gives P(X = x) (e.g., the dbinom function);
  • Compute the cumulative distribution function, i.e., the probability of obtaining a value less than or equal to some prescribed value, x (e.g., the pbinom function);
  • Generate a random selection of values from that distribution (e.g., the rbinom function).
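As a quick sketch of how the three kinds of functions relate, the following R code uses the numbers from the binomial example below (10 trials, success probability 0.8). The seed and the number of simulated values (10,000) are arbitrary choices for illustration; with a large simulation, the observed proportions should land close to the exact probabilities.

```r
# d-, p-, and r- functions for a binomial with size = 10, prob = 0.8
set.seed(335)                    # arbitrary seed so the simulation is reproducible

p7   <- dbinom(7, 10, 0.8)       # exact P(X = 7)
cdf7 <- pbinom(7, 10, 0.8)       # exact P(X <= 7)
sims <- rbinom(10000, 10, 0.8)   # 10,000 simulated values of X

# Proportion of simulated values equal to 7 should be close to P(X = 7),
# and the proportion of values <= 7 should be close to P(X <= 7).
mean(sims == 7)
mean(sims <= 7)
```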

    Examples for the Binomial distribution

    The format of the dbinom function is dbinom(x,size,prob), where x=the number of successes you want the probability of, size=the number of trials (more conventionally called n), and prob=the probability of success on a single trial (more conventionally called p). The pbinom function takes the same three arguments.
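The value dbinom returns is the usual binomial probability, choose(n, x) p^x (1 - p)^(n - x). A quick check in R, using R's choose() function and the numbers from the free-throw example below:

```r
# Binomial probability from the formula: choose(size, x) * prob^x * (1 - prob)^(size - x)
x <- 7; size <- 10; prob <- 0.8
by_formula <- choose(size, x) * prob^x * (1 - prob)^(size - x)
by_dbinom  <- dbinom(x, size, prob)
all.equal(by_formula, by_dbinom)   # TRUE
```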

    Suppose we have an 80% free throw shooter who attempts 10 free throws. Let X=the number of successful free throws in the 10 attempts. Then:

    dbinom(7,10,.8) # gives the probability of exactly 7 successes;
    pbinom(7,10,.8) # gives the probability of 7 or fewer successes.
    
    
    Try these two examples; the answers should be .2013266 and .3222005.
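To see how the two functions fit together, one option is to tabulate the whole distribution of X. The cumulative probabilities are running totals of the individual probabilities, so cumsum() applied to the dbinom values should reproduce the pbinom values.

```r
# Full distribution of X = number of made free throws in 10 attempts, p = 0.8
x   <- 0:10
pdf <- dbinom(x, 10, 0.8)   # P(X = x)
cdf <- pbinom(x, 10, 0.8)   # P(X <= x)
round(cbind(x, pdf, cdf), 7)

# Running totals of the pdf reproduce the cdf (up to rounding error)
all.equal(cumsum(pdf), cdf)   # TRUE
sum(pdf)                      # total probability: 1
```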

    Examples for the Hypergeometric distribution

    The format of the dhyper function is dhyper(x,m,n,k), where x=the number of white balls in the sample whose probability you want, m=the number of white balls in the urn, n=the number of black balls in the urn, and k=the number of balls you draw from the urn. The phyper function, which gives the probability of drawing x or fewer white balls, has the same format: phyper(x,m,n,k).
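The value dhyper returns is the usual hypergeometric probability: the number of ways to choose x of the m white balls and k - x of the n black balls, divided by the number of ways to choose k balls from all m + n. A sketch checking this in R with the numbers from the example below (m = 10, n = 20, k = 15, x = 8):

```r
# Hypergeometric probability from the counting formula:
#   choose(m, x) * choose(n, k - x) / choose(m + n, k)
x <- 8; m <- 10; n <- 20; k <- 15
by_formula <- choose(m, x) * choose(n, k - x) / choose(m + n, k)
by_dhyper  <- dhyper(x, m, n, k)
all.equal(by_formula, by_dhyper)   # TRUE

# phyper(x, m, n, k) is P(X <= x): the sum of dhyper over 0, 1, ..., x
all.equal(sum(dhyper(0:x, m, n, k)), phyper(x, m, n, k))   # TRUE
```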

    Here is the translation from R's notation and language to that of Larsen and Marx (the course textbook):

    R                                        Larsen & Marx
    -----------------------------------------------------------------
    urn with white and black balls           red and white chips
    focus on white balls                     focus on red chips
    m= # of white balls in urn               r= # of red chips in urn
    n= # of black balls in urn               w= # of white chips in urn
    m+n=total # balls in urn                 N = r+w = total # chips
    k=sample size                            n=sample size
    -----------------------------------------------------------------
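Because the letter n means the number of black balls in R but the sample size in Larsen and Marx, it is easy to mix up the arguments. One option is a small wrapper that takes arguments in Larsen and Marx's terms and passes them to dhyper and phyper by name. The names dhyper_lm and phyper_lm are hypothetical helpers for illustration only; they are not part of R.

```r
# Hypothetical helpers (not part of base R): arguments in Larsen & Marx notation.
#   x = number of red chips in the sample
#   r = number of red chips in the urn
#   w = number of white chips in the urn
#   n = sample size
dhyper_lm <- function(x, r, w, n) dhyper(x, m = r, n = w, k = n)  # P(X = x)
phyper_lm <- function(x, r, w, n) phyper(x, m = r, n = w, k = n)  # P(X <= x)

# Urn with r = 10 red and w = 20 white chips, sample of n = 15:
dhyper_lm(8, r = 10, w = 20, n = 15)   # same value as dhyper(8, 10, 20, 15)
```

Naming the arguments inside the wrapper (m = r, n = w, k = n) makes the translation in the table explicit, so the two meanings of n never have to be juggled at the call site.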
    

    Suppose you have an urn with 30 chips, 10 red and 20 white (using the Larsen and Marx language). You select 15 at random. What is the probability that the sample contains exactly 8 red chips? That it contains 8 or more red chips?

    Obtain the answers via:

    dhyper(8,10,20,15)  # answer is 0.02248876
    1-phyper(7,10,20,15) # answer is 0.02508746
    
    
    On that last one, do you see why we entered 7 instead of 8 in the argument?
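If you want to check your reasoning numerically, the following sketch computes P(X >= 8) in two other ways: by adding up the individual probabilities for 8, 9, and 10 red chips (10 is the most possible, since the urn holds only 10 red), and by using phyper's lower.tail = FALSE option, which returns P(X > x) rather than P(X <= x).

```r
# Three ways to get P(X >= 8) for 10 red, 20 white, sample of 15
via_complement <- 1 - phyper(7, 10, 20, 15)                   # 1 - P(X <= 7)
via_sum        <- sum(dhyper(8:10, 10, 20, 15))               # P(8) + P(9) + P(10)
via_upper_tail <- phyper(7, 10, 20, 15, lower.tail = FALSE)   # P(X > 7)

c(via_complement, via_sum, via_upper_tail)   # all approximately 0.02508746
```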