Introduction to Statistics (MAT/SST 115.03 2008S)

R Notes for Topic 29: Inference for Correlation and Regression


R notes for Activity 29-1: Studying and Grades

You can load the data with

UOPgpa = read.csv("/home/rebelsky/Stats115/Data/UOPgpa.csv")

The column headings are ID, Hours, and GPA.

29-1 b. Scatterplot

plot(UOPgpa$GPA ~ UOPgpa$Hours)

29-1 c. Least-squares line

lm(UOPgpa$GPA ~ UOPgpa$Hours)

Let's add it to the scatterplot, too.

abline(lm(UOPgpa$GPA ~ UOPgpa$Hours))
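If you get tired of retyping the lm call, one option is to store the model in a variable and reuse it. Here is a minimal sketch. The data frame study and its values are made up for illustration (they are not the UOPgpa data), and the names fit, a, and b are my own choices.

```r
# Made-up illustrative values; NOT the real UOPgpa data.
study <- data.frame(
  Hours = c(2, 4, 5, 7, 8, 10, 12, 15),
  GPA   = c(2.4, 2.9, 2.7, 3.1, 3.0, 3.4, 3.3, 3.8)
)

# Fit once and keep the result so that later steps can reuse it.
fit <- lm(GPA ~ Hours, data = study)

# coef() returns a named vector: "(Intercept)" and "Hours" (the slope).
a <- coef(fit)["(Intercept)"]
b <- coef(fit)["Hours"]

# With a scatterplot already open, abline(fit) draws the fitted line.
```

Writing the formula as GPA ~ Hours with data = study also gives shorter labels in the output (Hours rather than study$Hours).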

29-1 d. Correlation coefficient

r = cor(UOPgpa$GPA,UOPgpa$Hours)
r^2
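As a sanity check, the r^2 you compute this way should match the "Multiple R-squared" that R reports for the linear model. The sketch below uses a made-up data frame (study is a hypothetical name, and the values are not the UOPgpa data).

```r
# Made-up illustrative values; NOT the real UOPgpa data.
study <- data.frame(
  Hours = c(2, 4, 5, 7, 8, 10, 12, 15),
  GPA   = c(2.4, 2.9, 2.7, 3.1, 3.0, 3.4, 3.3, 3.8)
)

# Correlation coefficient and its square.
r <- cor(study$GPA, study$Hours)

# R-squared as stored in the model summary.
fit <- lm(GPA ~ Hours, data = study)
r_squared <- summary(fit)$r.squared

# TRUE when the two agree (up to floating-point rounding).
isTRUE(all.equal(r^2, r_squared))
```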

29-1 n. Standard error

I'll admit that I only know how to read the standard error from R, and not how to automatically get it into a variable.

We can get more information than you'll ever need about a linear model by asking for a summary of the linear model.

summary(lm(UOPgpa$GPA ~ UOPgpa$Hours))

The line of interest is the one labeled UOPgpa$Hours. It gives the estimate, standard error, test statistic, and p-value for the slope.

> summary(lm(UOPgpa$GPA ~ UOPgpa$Hours))
...
             Estimate Std. Error t value Pr(>|t|)    
...
UOPgpa$Hours  0.08938    0.02771   3.226  0.00184 ** 
...
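It does appear to be possible to get the standard error into a variable: coef applied to the summary returns the coefficient table as a matrix, which you can index by row and column name. A minimal sketch, assuming a made-up data frame (study and its values are hypothetical, not the UOPgpa data):

```r
# Made-up illustrative values; NOT the real UOPgpa data.
study <- data.frame(
  Hours = c(2, 4, 5, 7, 8, 10, 12, 15),
  GPA   = c(2.4, 2.9, 2.7, 3.1, 3.0, 3.4, 3.3, 3.8)
)
fit <- lm(GPA ~ Hours, data = study)

# Matrix with columns "Estimate", "Std. Error", "t value", "Pr(>|t|)".
coefs <- coef(summary(fit))

# Pull out the slope row by name.
b  <- coefs["Hours", "Estimate"]
se <- coefs["Hours", "Std. Error"]
```

If you fit the model as lm(UOPgpa$GPA ~ UOPgpa$Hours), the row is named "UOPgpa$Hours" rather than "Hours".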

29-1 o. The test statistic

Although R just reported the test statistic, you should calculate it yourself just to make sure that R is correct.
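For testing whether the slope is zero, the test statistic is the slope estimate divided by its standard error. Using the two numbers from the summary output above (the variable names are my own):

```r
b  <- 0.08938   # slope estimate from the summary output
se <- 0.02771   # standard error of the slope from the summary output

# Test statistic for the null hypothesis that the slope is 0.
t_stat <- b / se
t_stat
```

The result should agree with the t value that R printed (3.226), up to rounding of the inputs.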

29-1 p. The p-value

Although R just reported the p-value, you should see whether you get a similar value from the table.
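The table will only bracket the p-value; pt, the cumulative distribution function of the t distribution, gives you a number to compare against. Here is a sketch on a made-up data frame (study and its values are hypothetical, not the UOPgpa data). The degrees of freedom for a simple regression are n - 2, and R's summary reports a two-sided p-value.

```r
# Made-up illustrative values; NOT the real UOPgpa data.
study <- data.frame(
  Hours = c(2, 4, 5, 7, 8, 10, 12, 15),
  GPA   = c(2.4, 2.9, 2.7, 3.1, 3.0, 3.4, 3.3, 3.8)
)
fit   <- lm(GPA ~ Hours, data = study)
coefs <- coef(summary(fit))

t_stat <- coefs["Hours", "t value"]
df     <- df.residual(fit)        # n - 2 for one explanatory variable

# Two-sided p-value: area in both tails beyond |t|.
p_two_sided <- 2 * pt(-abs(t_stat), df)

# Should match the Pr(>|t|) entry that summary reports.
isTRUE(all.equal(p_two_sided, coefs["Hours", "Pr(>|t|)"]))
```

For a one-sided alternative in the direction the data point, you would halve the two-sided value.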

29-1 r. Confidence interval

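Remember, you compute the confidence interval with b +/- (t*)(SE(b)), where b is the sample slope and t* is the critical value from the t distribution with n-2 degrees of freedom. You can look up the critical value in the table.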
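If you want to check your table lookup, qt gives critical values and confint computes the interval directly. A sketch, assuming a 95% confidence level and a made-up data frame (study and its values are hypothetical, not the UOPgpa data):

```r
# Made-up illustrative values; NOT the real UOPgpa data.
study <- data.frame(
  Hours = c(2, 4, 5, 7, 8, 10, 12, 15),
  GPA   = c(2.4, 2.9, 2.7, 3.1, 3.0, 3.4, 3.3, 3.8)
)
fit   <- lm(GPA ~ Hours, data = study)
coefs <- coef(summary(fit))

b  <- coefs["Hours", "Estimate"]
se <- coefs["Hours", "Std. Error"]

# Critical value t* for 95% confidence: 2.5% in each tail, n - 2 df.
t_star <- qt(0.975, df.residual(fit))

# b +/- t* SE(b)
lower <- b - t_star * se
upper <- b + t_star * se

# The built-in interval for the slope, for comparison.
confint(fit, "Hours", level = 0.95)
```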

R notes for Activity 29-2: Studying and Grades

Although we did some of the calculations by hand last time (to make sure that you understood where residuals come from), you can combine residuals and lm to get the residuals directly.

res = residuals(lm(UOPgpa$GPA ~ UOPgpa$Hours))
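As a reminder of what residuals returns, each residual is the observed value minus the value the line predicts. The sketch below checks that on a made-up data frame (study and its values are hypothetical, not the UOPgpa data).

```r
# Made-up illustrative values; NOT the real UOPgpa data.
study <- data.frame(
  Hours = c(2, 4, 5, 7, 8, 10, 12, 15),
  GPA   = c(2.4, 2.9, 2.7, 3.1, 3.0, 3.4, 3.3, 3.8)
)
fit <- lm(GPA ~ Hours, data = study)

# Residuals straight from the model.
res <- residuals(fit)

# Residuals by hand: observed GPA minus predicted GPA.
by_hand <- study$GPA - fitted(fit)

# TRUE when the two agree (up to floating-point rounding).
isTRUE(all.equal(unname(res), unname(by_hand)))
```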

29-2 a. Some plots

Recall that we use hist to produce the histogram.

hist(res)

We make normal probability plots and lines through them with

qqnorm(res, datax=TRUE)
qqline(res, datax=TRUE)

29-2 b. A scatterplot

You should be able to figure this one out, since we've been doing scatterplots lately.

R notes for Activity 29-3: House Prices

You can load the data with

HousePrices = read.csv("/home/rebelsky/Stats115/Data/HousePricesAG.csv")

You should be able to figure out the rest on your own (from previous exercises).

Creative Commons License

Samuel A. Rebelsky, rebelsky@grinnell.edu

Copyright (c) 2007-8 Samuel A. Rebelsky.

This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.