When you have k independent samples to compare there is a temptation to use the two-sample t-test and make a bunch of pairwise comparisons. Let's investigate this strategy by doing a classtime simulation. For this activitiy we will simplify things by strictly following an alpha=.05 decision rule. Then we will compare these two decision strategies:
I want each of you to generate 6 random samples of size 50. So we are imagining that we have 6 "treatments" and 6 independent samples of size 50 from each treatment. But we are going to generate data where there are truly no mean differences whatsoever.
After generating the data (which go into the matrix called data ) we use the subsequent commands to find the 6 column means and note which column has the maximum mean and which the minimum mean. We then assign these columns to x and y resp., and will test whether these means are significantly different by using a two-sample t-test.
Here is the code for doing this:
data <- matrix(rnorm(300),ncol=6) foo <- apply(data,2,mean) x <- data[,foo==max(foo)] y <- data[,foo==min(foo)] t.test(x,y)
When I did this, x was the fifth column and y was the second. The output from the two-sample t-test was:
Standard Two-Sample t-Test
data: x and y
t = 1.5509, df = 98, p-value = 0.1241
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.08721542 0.71119975
sample estimates:
mean of x mean of y
0.08181305 -0.2301791
The P-value is .1241, which is not strong evidence against the null. This is a pleasing result since the two columns are independent random samples from the same standard normal distribution and so the true means are identical (zero) and the null hypothesis is, indeed, true.
For simplicity in what follows, we will concentrate on the P-value from the test which can be isolated via:
t.test(x,y)[[3]]
Now let us use these 6 samples to test the overall hypothesis that the 6 population means are equal. We use the one-way ANOVA F-test. To do this, we need to put the data into the form expected by aov.
y2 <- as.vector(data) #This stacks columns 1, 2, ..., 6 into a single vector. x2 <- as.factor(c(rep(1,50),rep(2,50),rep(3,50),rep(4,50),rep(5,50),rep(6,50))) fit1 <- aov(y2~x2) summary(fit1) # We are interested in the P-value. summary(fit1)[[1]][5] # To obtain the P-value alone.Now, to make all of the above steps easier to do (with a cut-and-paste into R) we collect them here:
# Cut-and-paste all of these lines into R. Do this 10 times and # record each of the 2 P-values for each of the 10 runs. data <- matrix(rnorm(300),ncol=6) foo <- apply(data,2,mean) x <- data[,foo==max(foo)] y <- data[,foo==min(foo)] t.test(x,y)[[3]] y2 <- as.vector(data) x2 <- factor(c(rep(1,50),rep(2,50),rep(3,50),rep(4,50),rep(5,50),rep(6,50))) fit1 <- aov(y2~x2) summary(fit1)[[1]][5] # Select all of this code (including this line); paste at the >