Introduction to Statistics (MAT/SST 115.03 2008S)
Primary: [Front Door] [Syllabus] [Current Outline] [R] - [Academic Honesty] [Instructions]
Groupings: [Applets] [Assignments] [Data] [Examples] [Handouts] [Labs] [Outlines] [Projects] [Readings] [Solutions]
External Links: [R Front Door] [SamR's Front Door]
Although the book tells you that the data are stored in a single file,
I've found it easier to split them into five files, ClassF, ClassG,
ClassH, ClassI, and ClassJ.
You should load each separately. For example,
ClassF = read.csv("/home/rebelsky/Stats115/Data/ClassF.csv")
ClassG = read.csv("/home/rebelsky/Stats115/Data/ClassG.csv")
ClassH = read.csv("/home/rebelsky/Stats115/Data/ClassH.csv")
ClassI = read.csv("/home/rebelsky/Stats115/Data/ClassI.csv")
ClassJ = read.csv("/home/rebelsky/Stats115/Data/ClassJ.csv")
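If typing five nearly identical lines feels repetitive, the same loading can be sketched as a loop. This is only a sketch: the helper name loadClasses and the list name classes are made up for illustration, and the demonstration writes small made-up files to a temporary directory so that it runs anywhere. On the course machines you would pass the data directory used above instead.

```r
# Hypothetical helper (not part of R or of the course files): read
# ClassF.csv through ClassJ.csv from a directory into one named list
# of data frames.
loadClasses = function(dir, classNames = c("ClassF", "ClassG", "ClassH", "ClassI", "ClassJ")) {
  classes = lapply(classNames, function(name) {
    read.csv(file.path(dir, paste0(name, ".csv")))
  })
  names(classes) = classNames
  classes
}

# Demonstration with made-up ratings (NOT the book's data).
demoDir = tempfile("classes")
dir.create(demoDir)
for (name in c("ClassF", "ClassG", "ClassH", "ClassI", "ClassJ")) {
  write.csv(data.frame(Ratings = c(4, 5, 6)),
            file.path(demoDir, paste0(name, ".csv")),
            row.names = FALSE)
}

classes = loadClasses(demoDir)
names(classes)            # the five class names
classes$ClassF$Ratings    # the ratings column for one class
```

With this approach you refer to classes$ClassF rather than ClassF. The five separate variables are simpler if you only need one or two of the classes.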
Each of these CSV files contains a single column, titled Ratings.
Hence, to make a histogram for one of them, you would write something
like the following.
hist(ClassF$Ratings)
Of course, the book doesn't tell you to make your own histograms, but you might find it useful to do so.
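If you do make histograms for more than one class, it helps to draw them on a common scale. A minimal sketch, using two made-up vectors in place of ClassF$Ratings and ClassG$Ratings. The values, the bin boundaries, and the axis limits are all assumptions chosen only for illustration.

```r
# Made-up ratings standing in for ClassF$Ratings and ClassG$Ratings.
ratingsF = c(4, 5, 5, 5, 6, 5, 4, 5, 6, 5)
ratingsG = c(1, 9, 2, 8, 1, 9, 2, 8, 5, 5)

# Use the same bins and the same vertical axis for both plots, so that
# the two histograms can be compared directly.
bins = seq(0.5, 9.5, by = 1)

par(mfrow = c(2, 1))   # stack two plots in one window
hist(ratingsF, breaks = bins, ylim = c(0, 10),
     main = "Class F (made-up data)", xlab = "Rating")
hist(ratingsG, breaks = bins, ylim = c(0, 10),
     main = "Class G (made-up data)", xlab = "Rating")
par(mfrow = c(1, 1))   # restore the default layout
```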
What the book does is ask you to compute a variety of numbers, including
range, interquartile range, and standard deviation. R's range function
gives you the min and the max, rather than the difference between the
two. To compute the difference between the two, you need to subtract
the min from the max.
max(ClassF$Ratings) - min(ClassF$Ratings)
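To see the difference between what range returns and the statistic the book asks for, here is a small sketch with made-up numbers (not the book's data). The call diff(range(x)) is an equivalent shortcut for the subtraction above.

```r
# Made-up ratings, used only for illustration.
x = c(2, 7, 4, 9, 5)

range(x)           # 2 9  (the min and the max, not their difference)
max(x) - min(x)    # 7
diff(range(x))     # 7  (the same value, computed from the output of range)
```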
You compute interquartile range with IQR and standard deviation with sd.
(And no, I do not know why they use different capitalization in
different places.)
IQR(ClassF$Ratings)
sd(ClassF$Ratings)
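As a check on what these two functions compute, the following sketch uses made-up numbers. It assumes R's default settings: IQR is the difference between the third and first quartiles as returned by quantile, and sd divides by n - 1. Quartiles that you compute by hand with a different rule may differ slightly from R's.

```r
# Made-up values, used only for illustration.
x = c(2, 4, 4, 5, 7, 9)

# Interquartile range: third quartile minus first quartile.
IQR(x)
quantile(x, 0.75) - quantile(x, 0.25)   # same number (printed with a "75%" label)

# Standard deviation: square root of the sum of squared deviations
# from the mean, divided by n - 1.
sd(x)
sqrt(sum((x - mean(x))^2) / (length(x) - 1))   # same number
```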
Problems i and j ask you to create a hypothetical example. Use something like the following (replacing the 0's by other numbers).
iHypotheticals = c(0,0,0,0,0,0,0,0,0,0)
sd(iHypotheticals)
jHypotheticals = c(0,0,0,0,0,0,0,0,0,0)
sd(jHypotheticals)
The data for this exercise are stored in MarriageAges.csv.
MarriageAges = read.csv("/home/rebelsky/Stats115/Data/MarriageAges.csv")
The column names in the frame are Couple, HusbandAge, WifeAge, and
Difference. We might, for example, compute the median husband age with
median(MarriageAges$HusbandAge)
You may once again find it useful to ask for summaries of the different variables.
summary(MarriageAges$HusbandAge)
We can get summaries of all the variables with
summary(MarriageAges)
The problem also asks you to compute standard deviations and
interquartile ranges. You use the sd function to compute standard
deviations. You use the IQR function to compute interquartile ranges.
sd(MarriageAges$HusbandAge)
IQR(MarriageAges$HusbandAge)
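If you want the same statistic for several columns at once, sapply can apply a function to each column of a frame. A minimal sketch, using a tiny made-up frame named demoAges that borrows two column names from MarriageAges. The values are invented and are not the real data.

```r
# Made-up stand-in for MarriageAges; substitute the real frame.
demoAges = data.frame(
  Couple = 1:4,
  HusbandAge = c(25, 30, 41, 52),
  WifeAge = c(22, 31, 38, 50)
)

# Keep only the age columns, then apply each function column by column.
ageColumns = demoAges[, c("HusbandAge", "WifeAge")]
sapply(ageColumns, sd)     # one standard deviation per column
sapply(ageColumns, IQR)    # one interquartile range per column
```

Each call returns a vector labeled with the column names, which makes it easy to compare husbands and wives side by side.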
Copyright (c) 2007-8 Samuel A. Rebelsky.
This work is licensed under a Creative Commons
Attribution-NonCommercial 2.5 License. To view a copy of this
license, visit http://creativecommons.org/licenses/by-nc/2.5/
or send a letter to Creative Commons, 543 Howard Street, 5th Floor,
San Francisco, California, 94105, USA.