# R Notes for Topic 28: Least Squares Regression

## Least Squares Regression

R provides the `lm` function to compute the least-squares regression line. (The “lm” stands for “linear model”.) You give it a formula, written with the `~` operator, that pairs the response vector (on the left) with the explanatory vector (on the right).

```
lm(response ~ explanatory)
```

For example, if we had a data frame called `People` with one column called `FootLength` and another called `Height`, we might compute the coefficients as follows. (We get somewhat different values than given in Activity 28-1 because we're not working with exactly the same data set.)

```
> lm(People$Height~People$FootLength)

Call:
lm(formula = People$Height ~ People$FootLength)

Coefficients:
      (Intercept)  People$FootLength  
           38.668              1.022  
```
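As an aside, `lm` also accepts a `data` argument, which lets you name the columns directly in the formula and gives shorter coefficient labels. Here's a minimal sketch using R's built-in `cars` data set (columns `speed` and `dist`) as a stand-in, since the `People` data isn't included here; with our data the equivalent call would presumably be `lm(Height ~ FootLength, data = People)`.

```r
# Fit stopping distance against speed using the built-in cars data set.
# The data argument tells lm where to find the columns named in the formula.
fit = lm(dist ~ speed, data = cars)

# The coefficient names are now just "(Intercept)" and "speed",
# rather than something like "cars$speed".
coef(fit)
```

The fitted line is the same either way; only the labels change.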

That's a lot of text, and not in a particularly usable format. Fortunately, we can use the `coef` function to extract just the coefficients from the result.

```
> coef(lm(People$Height~People$FootLength))
      (Intercept) People$FootLength 
        38.668071          1.022173 
```

We can even grab and name the two coefficients.

```
> ab = coef(lm(People$Height~People$FootLength))
> a = ab[1]
> b = ab[2]
> a
(Intercept) 
   38.66807 
> b
People$FootLength 
         1.022173 
```

We can then use those values in predictions, such as predicting the height (in inches) of someone with a foot length of 29 centimeters. (Your guess is as good as mine as to why they switch units.)

```
> a + 29*b
(Intercept) 
   68.31109 
```

Yeah, that “(Intercept)” is annoying. Ignore it for now.
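If the label does bother you, `unname` strips it, and extracting with double brackets (`ab[[1]]`) gives a bare number in the first place. A minimal sketch, using a stand-in vector typed in from the coefficients printed above so that it runs without the `People` data:

```r
# Stand-in for coef(lm(People$Height~People$FootLength)), typed in
# from the output above.
ab = c("(Intercept)" = 38.668071, "People$FootLength" = 1.022173)
a = ab[1]
b = ab[2]

# a + 29*b inherits the "(Intercept)" name from a; unname() drops it.
unname(a + 29*b)

# Double brackets extract the values without their names.
a = ab[[1]]
b = ab[[2]]
a + 29*b
```

Both approaches print `[1] 68.31109`.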

We can even plot the line, using `abline(a,b)`. In this case, we want to put it on a scatterplot of height vs. foot length.

```
> plot(People$Height ~ People$FootLength, main="Height (in Inches) vs. Foot Length (in cm)")
> abline(a,b)
```

## R notes for Activity 28-2: House Prices

You can load the data with

```
HousePrices = read.csv("/home/rebelsky/Stats115/Data/HousePricesAG.csv")
```

The columns are `Address`, `Price`, `Bedrooms`, `Bathrooms`, and `Size`.

You can plot house price vs. size (without the regression line) with

```
plot(HousePrices$Price ~ HousePrices$Size,
     ylab = "House Price (in $)",
     xlab = "House Size (in sq. ft.)")
```

### 28-2 b. Computing coefficients

For this problem, you should simply use R as a calculator, entering the values in the formulae.
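If it helps to check your arithmetic, the least-squares formulas are b = r·s_y/s_x for the slope and a = ȳ − b·x̄ for the intercept. Here's a minimal sketch of those formulas in R, applied to the built-in `cars` data set as a stand-in (for the activity, you'd plug in the summary statistics from the text instead), and compared against `lm`:

```r
# Stand-in data: x is the explanatory variable, y is the response.
x = cars$speed
y = cars$dist

# The summary statistics the formulas need.
r = cor(x, y)
sx = sd(x)
sy = sd(y)
xbar = mean(x)
ybar = mean(y)

# Least-squares slope and intercept.
b = r * sy / sx
a = ybar - b * xbar

# These should match the coefficients lm computes.
c(a, b)
coef(lm(y ~ x))
```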

### 28-2 c. Checking with technology

Since this is your first time using `lm`, we'll go through all of the steps. First, we just ask R for a summary of the fitted line.

```
lm(HousePrices$Price ~ HousePrices$Size)
```

That summary should be enough to confirm your answer. However, you may find it helpful to have the intercept (a) and slope (b) in variables, so we'll do that, too.

```
ab = coef(lm(HousePrices$Price ~ HousePrices$Size))
a = ab[1]
b = ab[2]
```

### 28-2 d. Predicting prices

Now we see why it was useful to put `a` and `b` in variables.

```
a + b*1242
```
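R's `predict` function can do the same computation directly from a fitted model. It works most smoothly when the model is fit with a `data` argument and you pass a data frame of new explanatory values whose column name matches. A minimal sketch, again using the built-in `cars` data as a stand-in; for the house data it would presumably look like `predict(lm(Price ~ Size, data = HousePrices), newdata = data.frame(Size = 1242))`.

```r
fit = lm(dist ~ speed, data = cars)

# newdata needs a column with the same name as the explanatory variable.
predict(fit, newdata = data.frame(speed = 21))

# The same prediction by hand from the coefficients.
ab = coef(fit)
unname(ab[1] + ab[2] * 21)
```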

### 28-2 l. Explaining variability with least squares lines

In case you missed it, the description of the proportion of variability explained by the least squares line is given in the text at the top of p. 579.
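One way to see that idea in R: the proportion of variability explained is 1 − SSE/SST, where SSE is the sum of the squared residuals and SST is the sum of the squared deviations of the response from its mean. For a least-squares line, that proportion equals r². A minimal sketch using the built-in `cars` data as a stand-in:

```r
x = cars$speed
y = cars$dist
fit = lm(y ~ x)

SSE = sum(residuals(fit)^2)   # variability left over around the line
SST = sum((y - mean(y))^2)    # total variability in the response

1 - SSE/SST                   # proportion explained by the line
cor(x, y)^2                   # r^2, the same value
summary(fit)$r.squared        # summary() reports it as well
```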

## R notes for Activity 28-3: Animal Trotting Speeds

Let's start by gathering the data, building the scatterplot, computing the parameters of the least-squares line, and plotting that line. Since we're using the plot to explore data, and not for presentations, we won't worry about labels.

```
TrotSpeeds = read.csv("/home/rebelsky/Stats115/Data/TrotSpeeds.csv")
plot(TrotSpeeds$Trot.Speed ~ TrotSpeeds$Body.Mass)
ab = coef(lm(TrotSpeeds$Trot.Speed ~ TrotSpeeds$Body.Mass))
a = ab[1]
b = ab[2]
abline(a,b)
```

We'll also compute the r² value.

```
r = cor(TrotSpeeds$Trot.Speed, TrotSpeeds$Body.Mass)
r^2
```

### 28-3 d. A residual plot

Okay, the first thing we have to do is compute the residuals. So, we need to predict the values and subtract those predicted values from the observed values.

```
predicted = a + b*TrotSpeeds$Body.Mass
residuals = TrotSpeeds$Trot.Speed - predicted
```
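If you want to double-check your residuals, the `residuals` function (or its abbreviation `resid`) extracts them from a fitted model. A minimal sketch using the built-in `cars` data as a stand-in, comparing the by-hand computation above with what `lm` stores:

```r
fit = lm(dist ~ speed, data = cars)
ab = coef(fit)

# Residual = observed - predicted, computed by hand...
predicted = ab[1] + ab[2] * cars$speed
byhand = cars$dist - predicted

# ...and pulled from the fitted model.
fromlm = residuals(fit)

# unname() because residuals(fit) labels each value with its row number.
all.equal(unname(byhand), unname(fromlm))
```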

Now, we're ready to plot. You should be able to figure out the plot command yourself. Remember, the form is

```
plot(response ~ explanatory)
```

You may find it useful to add a horizontal line for the residual of 0.

```
abline(h=0)
```

### 28-3 e. A logarithmic transformation

We'll start by computing the logs.

```
log10BodyMass = log10(TrotSpeeds$Body.Mass)
```

You can create the plot with

```
plot(TrotSpeeds$Trot.Speed ~ log10BodyMass)
```

### 28-3 f. New least-squares line

The R is fairly straightforward.

```
lm(TrotSpeeds$Trot.Speed ~ log10BodyMass)
ab = coef(lm(TrotSpeeds$Trot.Speed ~ log10BodyMass))
a = ab[1]
b = ab[2]
abline(a,b)
```
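You can also put the transformation right inside the formula, skipping the separate `log10BodyMass` variable. A minimal sketch using the built-in `cars` data as a stand-in (here `log10(speed)` plays the role of `log10BodyMass`); for the trotting data it would presumably be `lm(Trot.Speed ~ log10(Body.Mass), data = TrotSpeeds)`.

```r
# Transform inside the formula...
fit1 = lm(dist ~ log10(speed), data = cars)

# ...or compute the transformed variable first, as in the notes above.
log10speed = log10(cars$speed)
fit2 = lm(cars$dist ~ log10speed)

# Both give the same intercept and slope.
coef(fit1)
coef(fit2)
```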

The value of r² is computed by

```
r = cor(TrotSpeeds$Trot.Speed, log10BodyMass)
r^2
```

### 28-3 g. Another residual plot

This plot is a bit subtle, since the residuals are computed from the log (base 10) of the body mass, but the X axis should still be the original body mass.

```
predicted = a + b * log10BodyMass
residuals = TrotSpeeds$Trot.Speed - predicted
plot(residuals ~ TrotSpeeds$Body.Mass)
abline(h=0)
```

## R notes for Activity 28-4: Textbook Prices

We'll load the data using our standard strategy.

```
TBP = read.csv("/home/rebelsky/Stats115/Data/TextbookPrices.csv")
```

### 28-4 b. Scatterplots

R is happy to make you a grid of scatterplots, one for each pair of variables in the data frame.

```
plot(TBP)
```

If you'd rather see the individual scatterplots, you can write

```
dev.new()   # open a new, empty plot window
plot(TBP$Price ~ TBP$Pages)
dev.new()   # a second window, so the first plot isn't replaced
plot(TBP$Price ~ TBP$Year)
```

### 28-4 d. Least-squares line

Since this is a self-check exercise, you should figure out how to do this and the remaining problems using the prior answers.

Samuel A. Rebelsky, rebelsky@grinnell.edu

Copyright (c) 2007-8 Samuel A. Rebelsky.

This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit `http://creativecommons.org/licenses/by-nc/2.5/` or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.