Here we see how to carry out some basic steps of simple linear regression in R. The data are the US News college rankings for 2004, along with some extra variables collected for use in statistics courses by Katherine McClelland. See the US News college rankings data link for the full data set.
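The small data set below is a subset consisting of the top 10 schools and two variables: Peer Rating (Peer) and Rank. Copy these data, including the header line, to the clipboard and read them into R:
Peer Rank
4.7 1
4.7 2
4.6 3
4.5 4
4.4 5
4.3 6
4.3 7
4.1 8
4.1 9
4.3 10
small.df <- read.table(file='clipboard', header=TRUE)  # 'clipboard' works on Windows; on macOS use read.table(pipe('pbpaste'), header=TRUE)
Now, run the following R commands: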
attach(small.df)
plot(Peer, Rank)
college.lm <- lm(Rank ~ Peer)
abline(college.lm)
summary(college.lm)
Over time, we will learn what this summary is trying to say. The lm function fits a straight line to the data and returns the fitted model, which we have named college.lm. Note how nimbly R extracts what it needs to draw the line when we pass college.lm to the abline function, and extracts other information when we pass it to the summary function.
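Besides abline() and summary(), R has extractor functions that pull individual pieces out of an lm object. The sketch below is a minimal illustration, not the only way to do it: the ten rows are typed in with data.frame() rather than read from the clipboard, and data = small.df is passed to lm() instead of using attach(). The coefficients should come out as an intercept of 60.5 and a slope of -12.5, matching the hand calculation further down.

```r
# Enter the top-10 subset directly (no clipboard needed)
small.df <- data.frame(
  Peer = c(4.7, 4.7, 4.6, 4.5, 4.4, 4.3, 4.3, 4.1, 4.1, 4.3),
  Rank = 1:10
)

# data = small.df lets lm() find the columns without attach()
college.lm <- lm(Rank ~ Peer, data = small.df)

names(college.lm)      # components stored in the fitted-model object
coef(college.lm)       # intercept and slope
fitted(college.lm)     # fitted Rank for each school
residuals(college.lm)  # observed Rank minus fitted Rank
```

coef(), fitted() and residuals() are generic functions, so they are generally preferred over reaching into the object with $, since they work the same way for other kinds of fitted models.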
The method of fitting a line to the data is the method of least squares. The following code illustrates the calculations for the same 10 schools in small.df (the first 10 points of college06.df).
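Written out, least squares picks the intercept b0 and slope b1 that minimize the sum of squared vertical distances from the points to the line. Setting the two partial derivatives of that sum to zero gives the formulas that the b1 and b0 lines in the code compute; the second part plugs in the sums the code prints.

```latex
\min_{b_0,\,b_1} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2
\quad\Longrightarrow\quad
b_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}

% With n = 10 and the sums from the code:
b_1 = \frac{10(236.5) - (44)(55)}{10(194.04) - 44^2} = \frac{-55}{4.4} = -12.5,
\qquad
b_0 = 5.5 - (-12.5)(4.4) = 60.5
```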
y <- Rank
x <- Peer
plot(x,y)
xy <- x*y
xx <- x^2
fit <- fitted(college.lm)     # fitted values from the least-squares line
res <- residuals(college.lm)  # residuals: y - fit
round(cbind(x,y,xx,xy,fit,res),3)
#         x  y    xx   xy  fit   res
#  [1,] 4.7  1 22.09  4.7 1.75 -0.75
#  [2,] 4.7  2 22.09  9.4 1.75  0.25
#  [3,] 4.6  3 21.16 13.8 3.00  0.00
#  [4,] 4.5  4 20.25 18.0 4.25 -0.25
#  [5,] 4.4  5 19.36 22.0 5.50 -0.50
#  [6,] 4.3  6 18.49 25.8 6.75 -0.75
#  [7,] 4.3  7 18.49 30.1 6.75  0.25
#  [8,] 4.1  8 16.81 32.8 9.25 -1.25
#  [9,] 4.1  9 16.81 36.9 9.25 -0.25
# [10,] 4.3 10 18.49 43.0 6.75  3.25
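Two properties of least-squares residuals can be read off the res column: they sum to zero, and so does x times the residual, because these are exactly the two conditions (the normal equations) that determine b0 and b1. A small sketch that refits the same ten points and checks both numerically; expect values of zero up to floating-point rounding.

```r
x <- c(4.7, 4.7, 4.6, 4.5, 4.4, 4.3, 4.3, 4.1, 4.1, 4.3)  # Peer
y <- 1:10                                                  # Rank
college.lm <- lm(y ~ x)

fit <- fitted(college.lm)
res <- residuals(college.lm)

# Residuals are observed minus fitted values
all.equal(unname(res), y - unname(fit))

# Normal equations: both sums are zero up to rounding error
sum(res)
sum(x * res)
```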
sum(xy)
# [1] 236.5
sum(x)
# [1] 44
sum(y)
# [1] 55
sum(xx)
# [1] 194.04
# Least-squares slope from the raw sums (n = 10 schools)
b1 <- (10*sum(xy) - sum(x)*sum(y))/(10*sum(xx) - sum(x)^2)
b1
# [1] -12.5
b0 <- mean(y) - b1*mean(x)
b0
# [1] 60.5
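The raw-sum formula above is one of several algebraically equivalent ways to get the same estimates. The sketch below compares two others: the centered form b1 = Sxy/Sxx (which equals cov(x, y)/var(x)), and solving the normal equations X'X b = X'y directly with a design matrix X whose first column is all ones. Both should reproduce b0 = 60.5 and b1 = -12.5. The centered form tends to be more accurate numerically, since the raw-sum denominator subtracts two large, nearly equal numbers (1940.4 - 1936 here); lm() itself uses a QR decomposition rather than either of these.

```r
x <- c(4.7, 4.7, 4.6, 4.5, 4.4, 4.3, 4.3, 4.1, 4.1, 4.3)  # Peer
y <- 1:10                                                  # Rank

# Centered form: b1 = Sxy / Sxx
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
b1 <- Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)
c(b0 = b0, b1 = b1)

# Same slope via sample covariance and variance (the n - 1 factors cancel)
cov(x, y) / var(x)

# Normal equations in matrix form: (X'X) b = X'y
X <- cbind(1, x)
b <- solve(crossprod(X), crossprod(X, y))
drop(b)
```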