Introduction to Statistics (MAT/SST 115.03 2008S)
This exercise asks you to create a variety of histograms. Interestingly, different software packages make very different choices as to where to put the breaks in histograms. For example, the book shows a histogram in which the first subinterval has a midpoint of 0. Many packages instead start the first subinterval at 0, so that the next break falls at the subinterval width.
R's hist function lets you control where the breaks fall. However, this means that you have to specify a vector of breaks. Fortunately, the seq function makes it easy to build that vector.
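As a quick illustration of how seq builds a vector of breaks, here is a minimal sketch. The endpoints and step are made-up values chosen for illustration, not values from the book.

```r
# Breaks from 0 to 20 in steps of 5 (illustrative values only)
breaks = seq(from=0, to=20, by=5)
print(breaks)            # 0 5 10 15 20

# Five breaks mark the edges of four subintervals
num_subintervals = length(breaks) - 1
print(num_subintervals)  # 4
```

In general, a vector of n breaks describes n-1 subintervals, which is worth keeping in mind when the book asks for a particular number of bars.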
As you found with R's dot plots, you may need to generate the axes with a separate command. In general, your command to create a histogram will look something like the following:
    hist(vector,
         breaks=seq(from=min,to=max,by=step),
         axes=FALSE,
         main="Title",
         xlab="Label of X Axis",
         ylab="Label of Y Axis")
    axis(1, seq(from=min,to=max,by=step))
    axis(2, seq(from=min,to=max,by=step))
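If you would like to try this pattern before reading in any data, the following sketch fills it in with a small made-up vector. The name ages and its values are hypothetical and are not part of the course's data sets.

```r
# Made-up ages, for illustration only
ages = c(3, 7, 12, 12, 18, 21, 25, 25, 25, 34, 38)

# Four subintervals of width 10, with breaks at 0, 10, 20, 30, and 40
hist(ages,
     breaks=seq(from=0, to=40, by=10),
     axes=FALSE,
     main="Made-Up Ages",
     xlab="Age",
     ylab="Number of People")

# Because axes=FALSE suppressed the default axes, draw them separately
axis(1, seq(from=0, to=40, by=10))   # horizontal axis ticks
axis(2, seq(from=0, to=5, by=1))     # vertical axis ticks
```

The tick positions passed to axis do not have to match the breaks, which is why the two are given as separate seq calls.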
To complete this part of the activity, you'll need to start by reading in the data.
Diabetes = read.csv("/home/rebelsky/Stats115/Data/Diabetes.csv")
To get the histogram shown on p. 127, you would use
    hist(Diabetes$AgeAtDiagnosis,
         breaks=seq(from=-2.5,to=92.5,by=5),
         axes=FALSE,
         main="Diabetes Diagnoses",
         xlab="Age of Diabetes Diagnosis",
         ylab="Number of People")
    axis(1, seq(from=0,to=90,by=5))
    axis(2, seq(from=0,to=70,by=10))
If, however, you think that the first bar should include the values 0, 1, 2, 3, and 4 (and perhaps even 5), rather than just 0, 1, and 2 (after all, who has a negative age?), you might use
    hist(Diabetes$AgeAtDiagnosis,
         breaks=seq(from=0,to=90,by=5),
         axes=FALSE,
         main="Diabetes Diagnoses",
         xlab="Age of Diabetes Diagnosis",
         ylab="Number of People")
    axis(1, seq(from=0,to=90,by=5))
    axis(2, seq(from=0,to=70,by=10))
Notice, however, that this slight shift in subintervals can have a significant effect on how we look at the data. Compare, for example, the lower end of the graph in each case.
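One way to see this effect in numbers rather than pictures is to ask hist for its counts without drawing anything, using plot=FALSE. The sketch below uses a small made-up vector named ages (hypothetical values, not the Diabetes data), so the counts are purely illustrative.

```r
# Made-up ages clustered near zero, for illustration only
ages = c(0, 1, 2, 3, 3, 4, 4, 6, 7, 9)

# First subinterval centered at 0: breaks at -2.5, 2.5, 7.5, 12.5
centered = hist(ages, breaks=seq(from=-2.5, to=12.5, by=5), plot=FALSE)

# First subinterval starting at 0: breaks at 0, 5, 10
from_zero = hist(ages, breaks=seq(from=0, to=10, by=5), plot=FALSE)

print(centered$counts)    # 3 6 1
print(from_zero$counts)   # 7 3
```

With the same ten values, the first bar holds three people under one choice of breaks and seven under the other.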
The book then asks us to decrease the number of subintervals to 10. To achieve that result, we want each subinterval to have a width of 10. We would therefore use
    hist(Diabetes$AgeAtDiagnosis,
         breaks=seq(from=-5,to=95,by=10),
         axes=FALSE,
         main="Diabetes Diagnoses",
         xlab="Age of Diabetes Diagnosis",
         ylab="Number of People")
    axis(1, seq(from=0,to=90,by=10))
    axis(2, seq(from=0,to=130,by=10))
The book next asks us to decrease the number of subintervals to 5. In this case, we want each subinterval to have a width of about 20.
    hist(Diabetes$AgeAtDiagnosis,
         breaks=seq(from=-10,to=90,by=20),
         axes=FALSE,
         main="Diabetes Diagnoses",
         xlab="Age of Diabetes Diagnosis",
         ylab="Number of People")
    axis(1, seq(from=0,to=90,by=20))
    axis(2, seq(from=0,to=200,by=10))
Finally, the book asks us to increase the number of subintervals to 30. The width of each should be 3. Let's start at -1.5 (so that the first subinterval is centered at 0).
    hist(Diabetes$AgeAtDiagnosis,
         breaks=seq(from=-1.5,to=90,by=3),
         axes=FALSE,
         main="Diabetes Diagnoses",
         xlab="Age of Diabetes Diagnosis",
         ylab="Number of People")
    axis(1, seq(from=0,to=90,by=3))
    axis(2, seq(from=0,to=50,by=10))
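Note that seq stops at the last value that does not exceed to, so the breaks in this command end at 88.5 rather than at 90. A quick check, using only the numbers from the command above:

```r
# Same breaks as in the 30-subinterval histogram
breaks = seq(from=-1.5, to=90, by=3)

print(max(breaks))          # 88.5
print(length(breaks) - 1)   # 30
```

That gives exactly the 30 subintervals the book asks for. If the data happened to include an age above 88.5, hist would stop with an error because the breaks would not span the data. In that case, one possible adjustment is to extend the sequence to 91.5, at the cost of one extra subinterval.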
After you have completed this exercise, you should review the commands you copied and pasted to see how they differed, and use those differences to help you understand R's hist function.
Copyright (c) 2007-8 Samuel A. Rebelsky.
This work is licensed under a Creative Commons
Attribution-NonCommercial 2.5 License. To view a copy of this
license, visit http://creativecommons.org/licenses/by-nc/2.5/
or send a letter to Creative Commons, 543 Howard Street, 5th Floor,
San Francisco, California, 94105, USA.