x <- matrix(rnorm(40000,mean=0, sd=1),ncol=4) y <- matrix(rnorm(200000,mean=0,sd=1),ncol=20) xbar <- apply(x,1,mean) ybar <- apply(y,1,mean) sx <- apply(x,1,sd) sy <- apply(y,1,sd) sp <- sqrt((3*sx^2 + 19*sy^2)/22) t.ratio <- (xbar-ybar)/(sp*sqrt(1/4 + 1/20)) sum(t.ratio <= -1.717144)/10000 welch.ratio <- (xbar-ybar)/sqrt(sx^2/4 + sy^2/20) welch.df <- ((sx^2/4 + sy^2/20)^2) /( (sx^2/4)^2/(3) + (sy^2/20)^2/(19) ) welch.tcrit <- qt(.05,welch.df) sum(welch.ratio <= welch.tcrit)/10000 x <- matrix(rnorm(40000,mean=0, sd=6),ncol=4) y <- matrix(rnorm(200000,mean=0,sd=1),ncol=20) xbar <- apply(x,1,mean) ybar <- apply(y,1,mean) sx <- apply(x,1,sd) sy <- apply(y,1,sd) sp <- sqrt((3*sx^2 + 19*sy^2)/22) t.ratio <- (xbar-ybar)/(sp*sqrt(1/4 + 1/20)) sum(t.ratio <= -1.717144)/10000 welch.ratio <- (xbar-ybar)/sqrt(sx^2/4 + sy^2/20) welch.df <- ((sx^2/4 + sy^2/20)^2) /( (sx^2/4)^2/(3) + (sy^2/20)^2/(19) ) welch.tcrit <- qt(.05,welch.df) sum(welch.ratio <= welch.tcrit)/10000 x <- matrix(runif(40000,min=-10,max=10),ncol=4) y <- matrix(runif(200000,min=-1,max=1),ncol=20) xbar <- apply(x,1,mean) ybar <- apply(y,1,mean) sx <- apply(x,1,sd) sy <- apply(y,1,sd) sp <- sqrt((3*sx^2 + 19*sy^2)/22) t.ratio <- (xbar-ybar)/(sp*sqrt(1/4 + 1/20)) sum(t.ratio <= -1.717144)/10000 welch.ratio <- (xbar-ybar)/sqrt(sx^2/4 + sy^2/20) welch.df <- ((sx^2/4 + sy^2/20)^2) /( (sx^2/4)^2/(3) + (sy^2/20)^2/(19) ) welch.tcrit <- qt(.05,welch.df) sum(welch.ratio <= welch.tcrit)/10000 x <- matrix(runif(40000,min=-10,max=10),ncol=4) y <- matrix(runif(40000,min=-1,max=1),ncol=4) xbar <- apply(x,1,mean) ybar <- apply(y,1,mean) sx <- apply(x,1,sd) sy <- apply(y,1,sd) sp <- sqrt((3*sx^2 + 3*sy^2)/6) t.ratio <- (xbar-ybar)/(sp*sqrt(1/4 + 1/4)) sum(t.ratio <= -1.943180)/10000 welch.ratio <- (xbar-ybar)/sqrt(sx^2/4 + sy^2/4) welch.df <- ((sx^2/4 + sy^2/4)^2) /( (sx^2/4)^2/(3) + (sy^2/4)^2/(3) ) welch.tcrit <- qt(.05,welch.df) sum(welch.ratio <= welch.tcrit)/10000 x <- matrix(runif(200000,min=-10,max=10),ncol=20) y <- matrix(runif(200000,min=-1,max=1),ncol=20) xbar <- apply(x,1,mean) ybar <- apply(y,1,mean) sx <- apply(x,1,sd) sy <- apply(y,1,sd) sp <- sqrt((19*sx^2 + 19*sy^2)/38) t.ratio <- (xbar-ybar)/(sp*sqrt(1/20 + 1/20)) sum(t.ratio <= -1.685954)/10000 welch.ratio <- (xbar-ybar)/sqrt(sx^2/20 + sy^2/20) welch.df <- ((sx^2/20 + sy^2/20)^2) /( (sx^2/20)^2/(19) + (sy^2/20)^2/(19) ) welch.tcrit <- qt(.05,welch.df) sum(welch.ratio <= welch.tcrit)/10000
RESULTS: The nominal relative of frequency of times that each "t ratio" should be less than the .05 critical t value is, of course, .05. So for each entry in the last two columns below, the closer to .05 the better. Notice that across all conditions Welch does a good job of hitting the target, never moe than .01665 away. The pooled t-ratio, on the other hand, is out of whack seriously in two cases: conditions (2) and (3). In condition (2), both populations are normal, but the population variances differ: 36 versus 1. In condition (3), the population is not normal and again the variances are not equal. (For a uniform U(a,b) the variance is (b-a)^2/12, so the variances in conditions (3)-(5) are about 33 and .33, radically different.) Notice that the lack of robustness of the pooled t appears goes largely away when the sample sizes are made equal (= 4) in condidtion (4), although both tests over shoot the .05 a little. Then in condition (5), with the uniforms, different variances, and samples both at the higher value of 20, the rejection rates are almost dead on the .05.
condition n m X Y pooled Welch ----------------------------------------------------------- (1) 4 20 N(0,1) N(0,1) .0474 .0576 (2) 4 20 N(0,36) N(0,1) .2403 .0474 (3) 4 20 U(-10,10) U(-1,1) .2505 .0626 (4) 4 4 U(-10,10) U(-1,1) .0825 .0665 (5) 20 20 U(-10,10) U(-1,1) .0491 .0452 -----------------------------------------------------------