By a collective decision of the faculty of Grinnell College, we now administer the same student-evaluation form in all of our courses except the first-year tutorial. The form presents six assertions, oddly referred to as “questions”:
1. The course sessions were conducted in a manner that helped me to understand the subject matter of the course.
2. The instructor helped me to understand the subject matter of the course.
3. Work completed with and/or discussions with other students in this course helped me to understand the subject matter of the course.
4. The oral and written work, tests, and/or other assignments helped me to understand the subject matter of the course.
5. Required readings or other course materials helped me to understand the subject matter of the course.
6. I learned a lot in this course.
The student is invited to disagree strongly, disagree moderately, disagree slightly, agree slightly, agree moderately, or agree strongly with each of these assertions, or to decline to express an opinion (“Not Applicable/Don't Know”). The Office of Institutional Research tallies the responses and returns the tallies to the teacher of the course.
In the first-year tutorial, we use a similar but slightly more elaborate form, inviting students to agree or disagree with a total of eighteen assertions.
The faculty approved the use of the results of these evaluations for only one purpose: When a faculty member is being reviewed (for retention, promotion, or tenure), the Personnel Committee receives the results of his or her evaluations. However, factoids derived from the evaluations are equally useful -- indeed, exactly equally useful -- in many other contexts. At the suggestion of the President of the College, therefore, I have made the results of evaluations of my courses available for whatever purposes readers may conceive. I invite comments and suggestions from interested readers.
The OIR also computes various additional statistics from the tallies by awarding points for each response according to the following table:
| Response | Score |
|---|---|
| Strongly Disagree | 1 point |
| Moderately Disagree | 2 points |
| Slightly Disagree | 3 points |
| Slightly Agree | 4 points |
| Moderately Agree | 5 points |
| Strongly Agree | 6 points |
Using these scores, the OIR computes a 95% confidence interval for the supposed mean score for each question, and sometimes includes these in its reports to teachers. Because computing such intervals exemplifies the elementary statistical fallacy that is explained in my article “Scales,” I have not bothered to reproduce them here.
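For readers curious about what such a computation involves, here is a minimal sketch. The page does not say which method the OIR uses, so the normal-approximation interval below is an assumption, as is the exclusion of “Not Applicable/Don't Know” responses (which have no score in the table). The tallies are invented for illustration and do not come from any real course.

```python
import math

# Point values from the scoring table above.
SCORES = {
    "Strongly Disagree": 1,
    "Moderately Disagree": 2,
    "Slightly Disagree": 3,
    "Slightly Agree": 4,
    "Moderately Agree": 5,
    "Strongly Agree": 6,
}


def mean_and_interval(tallies, z=1.96):
    """Return (mean, lower, upper) for a dict mapping response -> count.

    Assumption: a normal-approximation interval, mean +/- z * s / sqrt(n),
    with z = 1.96 for a nominal 95% level.
    """
    n = sum(tallies.values())
    if n < 2:
        raise ValueError("need at least two scored responses")
    mean = sum(SCORES[r] * c for r, c in tallies.items()) / n
    # Sample variance (n - 1 in the denominator).
    variance = sum(c * (SCORES[r] - mean) ** 2 for r, c in tallies.items()) / (n - 1)
    half_width = z * math.sqrt(variance / n)
    return mean, mean - half_width, mean + half_width


# Invented tallies for one question in one hypothetical course.
example_tallies = {
    "Strongly Disagree": 0,
    "Moderately Disagree": 1,
    "Slightly Disagree": 1,
    "Slightly Agree": 3,
    "Moderately Agree": 6,
    "Strongly Agree": 9,
}

mean, lower, upper = mean_and_interval(example_tallies)
print(f"mean = {mean:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

Note that the sketch treats the point values as if they were measurements on an interval scale, which is exactly the step the article “Scales” objects to.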
For comparison, here are the college-wide aggregate results. (However, the Office of Institutional Research compiled at least some of these tallies by a method that yields incorrect results. The data in these tables should therefore be taken as approximate.)
In interpreting evaluations from previous years, readers may wish to take into account the secular inflation of ratings, particularly on questions 2 and 6. Note, for instance, the rise in the percentage of students giving the response “strongly agree” to each question:
| Question | “Strongly agree,” spring 1999 | “Strongly agree,” fall 2012 |
|---|---|---|
| The course sessions were conducted in a manner that helped me to understand the subject matter of the course. | 48% | 58% |
| The instructor helped me to understand the subject matter of the course. | 49% | 68% |
| Work completed with and/or discussions with other students in this course helped me to understand the subject matter of the course. | 39% | 49% |
| The oral and written work, tests, and/or other assignments helped me to understand the subject matter of the course. | 38% | 50% |
| Required readings or other course materials helped me to understand the subject matter of the course. | 41% | 48% |
| I learned a lot in this course. | 53% | 67% |
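The rise in the table above can be tabulated in percentage points with a few lines of Python. This is only an arithmetic check on the figures shown. The variable names are illustrative, and the percentages are copied from the table.

```python
# "Strongly agree" percentages from the table above, keyed by question number:
# (spring 1999, fall 2012).
strongly_agree = {
    1: (48, 58),
    2: (49, 68),
    3: (39, 49),
    4: (38, 50),
    5: (41, 48),
    6: (53, 67),
}

# Rise in percentage points for each question.
rise = {q: later - earlier for q, (earlier, later) in strongly_agree.items()}

# The two questions with the largest rise.
largest_two = sorted(rise, key=rise.get, reverse=True)[:2]

for q in sorted(rise):
    print(f"Question {q}: +{rise[q]} points")
print("Largest rises:", largest_two)
```

On these figures the largest rises are on questions 2 and 6 (19 and 14 points respectively).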
Readers may also wish to note that both the bandwidth of the quantitative part of the evaluation form (the upper bound on the amount of information that can be transmitted through that channel) and the information content of the actual responses are surprisingly small.
Each student's response to each question is assigned to one of seven groups (a group for each of the six scalar responses and a combined “not applicable,” “don't know,” no-response group). The bandwidth is therefore log₂ 7 (approximately 2.8) bits per response, for a total of 6 log₂ 7 (approximately 16.8) bits for the entire form.
However, our students' actual responses do not use this bandwidth with perfect efficiency, since not all of the responses to a question are equally likely. For each of the six questions, the distribution of responses is heavily and predictably skewed towards the “Strongly Agree” and “Moderately Agree” groups. Hence the mean information content of those responses is less than the theoretical maximum. In addition, the mean information content has diminished over time, as secular inflation has tended to make the distributions even more skewed.
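The effect of skew can be illustrated with a short sketch. It assumes that “mean information content” means the Shannon entropy of the response distribution, which this page does not state outright. The skewed tallies are invented for illustration and are not taken from the college's data.

```python
import math


def entropy_bits(counts):
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    total = sum(counts)
    # Empty groups contribute nothing (p * log2(p) tends to 0 as p tends to 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Seven groups: six scalar responses plus the combined
# "not applicable" / "don't know" / no-response group.
upper_bound = math.log2(7)

# All seven groups equally likely: entropy reaches the upper bound.
uniform = [10, 10, 10, 10, 10, 10, 10]

# Invented tallies, ordered from "Strongly Disagree" to "Strongly Agree",
# with the combined group last; heavily skewed towards agreement.
skewed = [1, 2, 3, 8, 25, 58, 3]

print(f"upper bound: {upper_bound:.2f} bits")
print(f"uniform:     {entropy_bits(uniform):.2f} bits")
print(f"skewed:      {entropy_bits(skewed):.2f} bits")
```

The more a distribution concentrates on one or two groups, the further its entropy falls below the log₂ 7 bound.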
The following table shows the mean information content (in bits) of responses, broken down by question and by semester. The “Total” column shows the mean total information content of all six questions in the specified semester.
| Semester | Question 1 | Question 2 | Question 3 | Question 4 | Question 5 | Question 6 | Total |
|---|---|---|---|---|---|---|---|
| Spring 1999 | 1.78 | 1.91 | 2.29 | 2.07 | 2.05 | 1.74 | 11.84 |
| Spring 2000 | 1.75 | 1.61 | 2.18 | 2.05 | 2.10 | 1.70 | 11.39 |
| Fall 2000 | 1.83 | 1.68 | 2.16 | 2.03 | 2.15 | 1.70 | 11.55 |
| Spring 2001 | 1.83 | 1.69 | 2.17 | 2.02 | 2.07 | 1.70 | 11.48 |
| Fall 2001 | 1.81 | 1.66 | 2.18 | 2.12 | 2.16 | 1.69 | 11.63 |
| Spring 2002 | 1.82 | 1.63 | 2.20 | 2.07 | 2.04 | 1.65 | 11.41 |
| Fall 2002 | 1.77 | 1.61 | 2.17 | 1.98 | 2.08 | 1.63 | 11.24 |
| Spring 2003 | 1.73 | 1.52 | 2.13 | 1.95 | 1.99 | 1.61 | 10.93 |
| Fall 2003 | 1.80 | 1.60 | 2.14 | 1.97 | 2.06 | 1.69 | 11.26 |
| Spring 2004 | 1.75 | 1.60 | 2.15 | 1.96 | 2.07 | 1.62 | 11.13 |
| Fall 2004 | 1.76 | 1.56 | 2.17 | 2.00 | 2.10 | 1.62 | 11.21 |
| Spring 2005 | 1.81 | 1.57 | 2.10 | 1.98 | 2.03 | 1.60 | 11.09 |
| Fall 2005 | 1.69 | 1.47 | 2.09 | 1.89 | 2.01 | 1.55 | 10.70 |
| Spring 2006 | 1.80 | 1.62 | 2.18 | 1.99 | 2.05 | 1.66 | 11.31 |
| Fall 2006 | 1.74 | 1.58 | 2.12 | 1.95 | 2.07 | 1.61 | 11.07 |
| Spring 2007 | 1.73 | 1.54 | 2.09 | 1.98 | 2.04 | 1.58 | 10.96 |
| Fall 2007 | 1.76 | 1.59 | 2.08 | 1.97 | 2.08 | 1.62 | 11.11 |
| Spring 2008 | 1.70 | 1.49 | 2.03 | 1.94 | 2.02 | 1.53 | 10.71 |
| Fall 2008 | 1.62 | 1.44 | 2.00 | 1.87 | 2.07 | 1.52 | 10.52 |
| Spring 2009 | 1.70 | 1.47 | 2.02 | 1.92 | 2.07 | 1.53 | 10.70 |
| Fall 2009 | 1.63 | 1.46 | 2.01 | 1.88 | 2.04 | 1.49 | 10.52 |
| Spring 2010 | 1.65 | 1.42 | 2.00 | 1.90 | 2.03 | 1.47 | 10.46 |
| Fall 2010 | 1.62 | 1.41 | 1.95 | 1.87 | 2.03 | 1.45 | 10.33 |
| Spring 2011 | 1.64 | 1.41 | 1.97 | 1.86 | 2.01 | 1.47 | 10.36 |
| Spring 2012 | 1.58 | 1.37 | 1.93 | 1.81 | 1.98 | 1.42 | 10.10 |
| Fall 2012 | 1.56 | 1.38 | 1.90 | 1.84 | 1.96 | 1.41 | 10.06 |
For comparison, the information content of written English prose is generally estimated to be at least 5.5 bits per word. Thus one set of responses to the quantitative part of the evaluation form could in principle carry about the same amount of information as three words of prose (16.8 / 5.5 ≈ 3.1), but in practice conveys only about two words' worth, on average.
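The comparison with prose is simple division, sketched below. The 5.5 bits-per-word figure is the lower-bound estimate quoted above, the two semester totals are taken from the first and last rows of the table, and the variable names are illustrative.

```python
import math

BITS_PER_WORD = 5.5  # lower-bound estimate for English prose quoted above

# Theoretical capacity of the six-question form, in bits.
capacity_bits = 6 * math.log2(7)

# "Total" column for the first and last semesters in the table.
totals = {"Spring 1999": 11.84, "Fall 2012": 10.06}

capacity_words = capacity_bits / BITS_PER_WORD
words = {semester: bits / BITS_PER_WORD for semester, bits in totals.items()}

print(f"capacity: {capacity_words:.2f} words")
for semester, w in words.items():
    print(f"{semester}: {w:.2f} words")
```

Because 5.5 bits per word is a lower bound, these word counts are upper bounds on the equivalent amount of prose.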
Professor Rebelsky's essay “About end-of-course evaluations” explains the purpose and recent history of student evaluations at Grinnell in greater detail. (It's also more interesting reading than this page.)
This document is available on the World Wide Web as
http://www.cs.grinnell.edu/~stone/evaluations/
created January 18, 2001
last revised January 30, 2013