Examples for the Binomial distribution
The format of the dbinom function is dbinom(x,size,prob), where x=the number of successes you want the probability of, size=the number of trials (more conventionally called n), and prob=the probability of success on a single trial (more conventionally called p).
Suppose we have an 80% free-throw shooter who takes 10 shots. Let X=the number of successful free throws in the 10 trials. Then:
dbinom(7,10,.8) # gives the probability of exactly 7 successes
pbinom(7,10,.8) # gives the probability of 7 or fewer successes
Try these two examples; the answers should be .2013266 and .3222005.
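As a check on what dbinom and pbinom compute, the sketch below rebuilds both answers from the binomial formula P(X = x) = choose(n, x) p^x (1-p)^(n-x). The variable names are ours, chosen only for illustration.

```r
n <- 10    # number of free throws (the size argument)
p <- 0.8   # probability of making a single free throw (the prob argument)

# P(X = 7) from the binomial formula
exact7 <- choose(n, 7) * p^7 * (1 - p)^(n - 7)

# P(X <= 7) as the sum of P(X = 0), ..., P(X = 7)
atmost7 <- sum(dbinom(0:7, n, p))

all.equal(exact7, dbinom(7, n, p))    # TRUE
all.equal(atmost7, pbinom(7, n, p))   # TRUE
```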
Examples for the Hypergeometric distribution
The format of the dhyper function is dhyper(x,m,n,k), where x=the number of white balls you want the probability of, m=the number of white balls in the urn, n=the number of black balls in the urn, and k=the number of balls you draw from the urn. Similarly, for the phyper function, the format is phyper(x,m,n,k).
Here is the translation from R's notation and language to that of Larsen and Marx (the course textbook):
R                                Larsen&Marx
-----------------------------------------------------------------
urn with white and black balls   red and white chips
focus on white balls             focus on red chips
m= # of white balls in urn       r= # of red chips in urn
n= # of black balls in urn       w= # of white chips
m+n=total # balls in urn         N = r+w = total # chips
k=sample size                    n=sample size
-----------------------------------------------------------------
Suppose you have an urn with 30 balls, 10 red and 20 white (using the Larsen and Marx language). You select 15 at random. What is the probability that the sample contains exactly 8 red? That it contains 8 or more red?
Obtain the answers via:
dhyper(8,10,20,15) # answer is 0.02248876
1-phyper(7,10,20,15) # answer is 0.02508746
On that last one, do you see why we entered 7 instead of 8 in the argument?
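One way to check both answers, and to see where the 7 comes from, is to compute them directly. This sketch uses the Larsen and Marx notation (r red chips, w white chips, sample size n); the variable names are illustrative only.

```r
r <- 10   # red chips    (R's m)
w <- 20   # white chips  (R's n)
n <- 15   # sample size  (R's k)

# P(X = 8) from the hypergeometric formula
p8 <- choose(r, 8) * choose(w, n - 8) / choose(r + w, n)
all.equal(p8, dhyper(8, r, w, n))                          # TRUE

# P(X >= 8) = P(X = 8) + P(X = 9) + P(X = 10) = 1 - P(X <= 7)
upper <- sum(dhyper(8:10, r, w, n))
all.equal(upper, 1 - phyper(7, r, w, n))                   # TRUE

# phyper can also return the upper tail P(X > 7) directly
all.equal(upper, phyper(7, r, w, n, lower.tail = FALSE))   # TRUE
```

Since phyper(x, ...) gives P(X <= x), the complement of "7 or fewer" is "8 or more", which is why 7 is the argument.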
Generating random numbers
We have learned about using the sample function to generate random samples from a vector, either with or without replacement. We can use this function to simulate random outcomes from either the hypergeometric distribution (sampling without replacement) or the binomial distribution (sampling with replacement).
Example: Simulate the free-throw problem.
Let's simulate many repetitions of the free-throw problem above. Here is a solution using what we already know.
n <- 10
x <- c(rep(0,2),rep(1,8)) # we want 80% chance of making a FT
sample(x,n,replace=TRUE) # generates 10 FTs, 0 = miss; 1 = make
sum(sample(x,n,replace=TRUE)) # counts the number of makes
reps <- 100
replicate(reps, sum(sample(x,n,replace=TRUE))) # produces reps instances
                                               # of the number of makes in 10 FTs
sum(replicate(reps, sum(sample(x,n,replace=TRUE))) <= 7)/reps
                                               # estimates the probability of getting 7 or fewer.
But R has built-in functions to simulate from well-known distributions. They all begin with the letter r, for example rbinom to generate random numbers from the binomial. So we can simulate, say, 20 repetitions of 10 free throws when p=.8 by:
rbinom(20, 10, .8)
 [1]  9  9  7  9  7  8  9  9  8  7  9  7  6  9  8 10  8  8  9  7
results <- rbinom(20, 10, .8)
sum(results <= 7)
[1] 7
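Notice that 7 of the 20 repetitions stored in results had 7 or fewer made free throws. (The second call to rbinom draws a fresh random sample, so results is not the vector printed above.)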
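The same two approaches carry over to the hypergeometric urn example from earlier (10 red and 20 white, 15 drawn). In this sketch, which assumes that setup, sample without replacement plays the role of drawing from the urn, and rhyper(nn, m, n, k) is the built-in generator. The call to set.seed and the variable names are our additions so that the run can be repeated.

```r
set.seed(1)                         # arbitrary seed, for repeatability only
urn  <- c(rep(1, 10), rep(0, 20))   # 1 = red, 0 = white
reps <- 10000

# draw 15 without replacement and count the reds, reps times
by_sample <- replicate(reps, sum(sample(urn, 15, replace = FALSE)))

# the same experiment using the built-in generator
by_rhyper <- rhyper(reps, 10, 20, 15)

mean(by_sample >= 8)        # estimate of P(8 or more red)
mean(by_rhyper >= 8)        # second estimate of the same probability
1 - phyper(7, 10, 20, 15)   # exact value, 0.02508746
```

With 10000 repetitions both estimates should land close to the exact value; a smaller reps gives a noisier estimate.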
On page 33 of the Introduction to R manual, you can find a full listing of the "named" distributions available in R. In our year of study, we will be working with the following:
Distribution     R name   additional arguments
-------------------------------------------------------
binomial         binom    size, prob
chi-squared      chisq    df, ncp
exponential      exp      rate
F                f        df1, df2, ncp
gamma            gamma    shape, scale
hypergeometric   hyper    m, n, k
normal           norm     mean, sd
Poisson          pois     lambda
Student's t      t        df, ncp
uniform          unif     min, max
Keep in mind that with each "name" there are really several R functions implied. For example, with binom the three we use here are dbinom, pbinom, and rbinom for, respectively, the pdf, the CDF, and generating random numbers from the distribution. (A fourth, qbinom, returns quantiles.)
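To see how the d, p, and r functions fit together, here is a short sketch using the free-throw setting (size 10, prob 0.8). The seed and variable names are our own choices for illustration.

```r
set.seed(2)                        # arbitrary seed, for repeatability only

# d: probabilities of 0, 1, ..., 10 makes; they sum to 1
probs <- dbinom(0:10, 10, 0.8)
sum(probs)

# p: the CDF is the running total of the d values
all.equal(cumsum(probs), pbinom(0:10, 10, 0.8))   # TRUE

# r: the proportion of simulated values <= 7 approximates pbinom(7, 10, 0.8)
sims <- rbinom(100000, 10, 0.8)
mean(sims <= 7)                    # close to 0.3222005
```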
Example: Generate 10000 random numbers from the interval of real numbers [0, 432]. Then generate the vector of first digits from these numbers, using the function first.digit. Then make a histogram of the first digits.
Solution:
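# Define the first.digit function first, so that it exists when it is called:
first.digit <- function(y) {
  # y should be a vector of positive real numbers.
  k <- floor(log10(y))   # power of 10 at or just below each y
  floor(y/(10^k))        # scale y into [1, 10) and keep the integer part
}
x <- runif(10000, 0, 432)
y <- first.digit(x)
hist(y,breaks=seq(.5,9.5)) # breaking bins at .5 marks makes
                           # a nicer histogram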
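As a numerical companion to the histogram, the sketch below tries first.digit on a few hand-picked values and then tabulates the proportion of each first digit. The function is repeated so the block runs on its own; the seed, the test values, and the name d are our additions.

```r
first.digit <- function(y) {        # same function as in the solution above
  k <- floor(log10(y))
  floor(y / (10^k))
}

first.digit(c(432, 57.3, 0.0081))   # 4 5 8

set.seed(3)                         # arbitrary seed, for repeatability only
d <- first.digit(runif(10000, 0, 432))

# observed proportion of each first digit, 1 through 9
round(table(factor(d, levels = 1:9)) / length(d), 3)
```

Because the interval stops at 432, partway through the 400s, digits 1 to 3 should each turn up about 26% of the time, digit 4 about 10%, and digits 5 to 9 about 2.6% each.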