MATH 336: Probability and Statistics II


Residual plot example

The following data come from an experiment with a simple pendulum. A pendulum of a given length L is allowed to swing back and forth for 50 cycles and the time elapsed for these 50 cycles is recorded. How well can we estimate the time per cycle from the pendulum length?

Here are the data:
Length  Time
175.2  2.650
151.5  2.468
126.4  2.256
101.7  2.024
 77.0  1.764

Length is the pendulum length in centimeters, and Time is the cycle time in seconds (that is, the elapsed time for 50 cycles divided by 50).

Select and copy the data, including the header line, from the web page; then use the R command:

pendulum.df <- read.table(file = 'clipboard', header = TRUE)
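
Reading from the clipboard depends on the platform and on what was last copied. As a minimal alternative sketch, the same five observations can be passed to read.table() through its text argument, which should give the same pendulum.df without any copying:

```r
# Build pendulum.df directly from the data listed above,
# instead of reading it from the clipboard.
pendulum.df <- read.table(header = TRUE, text = "
Length  Time
175.2  2.650
151.5  2.468
126.4  2.256
101.7  2.024
 77.0  1.764
")
str(pendulum.df)   # check: 5 observations of 2 numeric variables
```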

First use a straight line

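attach(pendulum.df)          # make Length and Time visible by name
fit.lm <- lm(Time ~ Length)  # least-squares straight line
plot(Length, Time)
abline(fit.lm)               # add the fitted line to the scatterplot
summary(fit.lm)
# observed data alongside fitted values and residuals
cbind(Length, Time, fitted = fitted(fit.lm), resid = resid(fit.lm))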
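
For reference, the line that lm() draws is the least-squares line, whose slope and intercept have closed forms. The following is a small self-contained sketch that re-enters the data as plain vectors and checks those formulas against coef(); the names b0 and b1 are illustrative and do not appear elsewhere in this handout.

```r
Length <- c(175.2, 151.5, 126.4, 101.7, 77.0)   # cm
Time   <- c(2.650, 2.468, 2.256, 2.024, 1.764)  # seconds per cycle

# slope = Sxy / Sxx, intercept = ybar - slope * xbar
b1 <- sum((Length - mean(Length)) * (Time - mean(Time))) /
      sum((Length - mean(Length))^2)
b0 <- mean(Time) - b1 * mean(Length)

fit.lm <- lm(Time ~ Length)
rbind(by.hand = c(b0, b1), lm = coef(fit.lm))   # the two rows should agree
```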

Now make a residual plot:
plot(Length, resid(fit.lm))
abline(h = 0, lty = 2)       # reference line at zero
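
The shape of the residual plot can also be read off the residuals directly. Here is a minimal sketch, re-entering the data so that it runs on its own, that lists each residual with its sign in order of increasing length:

```r
Length <- c(175.2, 151.5, 126.4, 101.7, 77.0)
Time   <- c(2.650, 2.468, 2.256, 2.024, 1.764)
fit.lm <- lm(Time ~ Length)

ord <- order(Length)                    # sort by increasing length
res <- resid(fit.lm)[ord]
data.frame(Length = Length[ord],
           resid  = round(res, 4),
           sign   = ifelse(res > 0, "+", "-"))
```

A run of one sign in the middle of the range, flanked by the opposite sign at both ends, is the numerical counterpart of a curved residual plot.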

Transform the data using the square root of length

The residual plot shows a systematic curved pattern rather than random scatter about zero, which suggests that a straight-line model is deficient. We can use a transformation to improve the description.

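sqrt.L <- sqrt(Length)         # transformed predictor
fit2.lm <- lm(Time ~ sqrt.L)   # straight line in sqrt(Length)
summary(fit2.lm)
plot(sqrt.L, resid(fit2.lm))   # residual plot for the transformed model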
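
To put the two fits side by side, one option is to compare their residual standard errors and to draw both residual plots on a common vertical scale. This is a sketch under the same data, again written to be self-contained:

```r
Length <- c(175.2, 151.5, 126.4, 101.7, 77.0)
Time   <- c(2.650, 2.468, 2.256, 2.024, 1.764)
sqrt.L <- sqrt(Length)
fit.lm  <- lm(Time ~ Length)
fit2.lm <- lm(Time ~ sqrt.L)

# residual standard error of each model (smaller means a tighter fit)
c(linear = summary(fit.lm)$sigma, sqrt = summary(fit2.lm)$sigma)

# both residual plots with the same y-axis limits
ylim <- range(resid(fit.lm), resid(fit2.lm))
op <- par(mfrow = c(1, 2))
plot(Length, resid(fit.lm), ylim = ylim, main = "Time ~ Length")
abline(h = 0, lty = 2)
plot(sqrt.L, resid(fit2.lm), ylim = ylim, main = "Time ~ sqrt.L")
abline(h = 0, lty = 2)
par(op)                        # restore the previous plot layout
```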

The residual plot lacks the obvious curvilinear pattern, which suggests the model Time ~ sqrt(Length) is a better description. (This, of course, squares with physical theory about the behavior of the pendulum.)
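
The physical theory referred to here is the small-angle formula for a simple pendulum, T = 2*pi*sqrt(L/g), which predicts a line through the origin in sqrt(L) with slope 2*pi/sqrt(g). As a rough check, the sketch below compares that slope with the fitted one. The value g = 980.665 cm/s^2 (standard gravity) is an assumption, since the local value for this experiment is not given, and the name slope.theory is illustrative.

```r
Length <- c(175.2, 151.5, 126.4, 101.7, 77.0)   # cm
Time   <- c(2.650, 2.468, 2.256, 2.024, 1.764)  # seconds per cycle
sqrt.L <- sqrt(Length)
fit2.lm <- lm(Time ~ sqrt.L)

g <- 980.665                       # cm/s^2, assumed standard gravity
slope.theory <- 2 * pi / sqrt(g)   # predicted coefficient of sqrt.L
c(theory = slope.theory, fitted = unname(coef(fit2.lm)["sqrt.L"]))
```

Both values come out close to 0.2 seconds per square-root centimeter.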

Let's go further with the analysis, using the summary table given below.

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0195021  0.0039852   4.894   0.0163 *
sqrt.L      0.1988327  0.0003545 560.844 1.25e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.001252 on 3 degrees of freedom
Multiple R-Squared:     1,      Adjusted R-squared:     1
F-statistic: 3.145e+05 on 1 and 3 DF,  p-value: 1.25e-08
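
One natural next step, which the handout does not spell out, is to turn the table into confidence intervals for the coefficients. The sketch below uses only the printed estimates, standard errors, and the 3 residual degrees of freedom; the names est, se, dof, and tcrit are illustrative.

```r
# values copied from the summary table above
est <- c(Intercept = 0.0195021, sqrt.L = 0.1988327)
se  <- c(Intercept = 0.0039852, sqrt.L = 0.0003545)
dof <- 3                              # residual degrees of freedom

est / se                              # reproduces the "t value" column, up to rounding
tcrit <- qt(0.975, dof)               # two-sided 95% critical value
cbind(lower = est - tcrit * se,
      upper = est + tcrit * se)       # 95% confidence intervals
```

With the fitted object still in the workspace, confint(fit2.lm) should give the same intervals without retyping the numbers.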