The Average Student Does Not Exist

A look into student individuality in higher education

In high-stakes testing, a student’s performance is most commonly judged by how their total score compares with the average. Data from roughly 1,500 computer science finals graded with Gradescope suggest this may be an ineffective way to characterize student performance.

In the late 1940s, the US Air Force had a problem. Its pilots were crashing their warplanes too often. After ruling out pilot error and faulty mechanics, the main hypothesis became that the average American pilot had outgrown the cockpit, which was designed during the First World War.

In 1950, officials commissioned a new study to measure 140 dimensions of the human body to determine the new “average pilot.” Over 4,000 young pilots had their height, chest circumference, and other measurements taken for this endeavor. The “averagarian” thinking at the time was that a majority of pilots would measure near the average on most dimensions.

One researcher doubted this approach. Lt. Gilbert S. Daniels calculated the average of 10 physical dimensions believed to be most relevant for cockpit design and determined how many pilots measured near the average for all dimensions. Daniels himself was stunned by the actual number.



Zero. Out of 4,063 pilots, not a single one fell within the average 30 percent on all 10 dimensions.



Harvard Professor Todd Rose’s book, The End of Average, which features this story, debunks the idea that determining the average amongst a group of people will provide universal insight. As he puts it, “If you’ve designed a cockpit to fit the average pilot, you’ve actually designed it to fit no one.”



Rose argues instead that most human characteristics, from size to intelligence, consist of multiple dimensions that are weakly related to one another, if at all. He calls this principle “jaggedness.”

We sought to determine whether student performance, like pilot size, is “jagged.”

Our team analyzed the results from a past final exam taken by 1506 students in John DeNero’s UC Berkeley Computer Science 61A course. It consisted of 7 questions, 26 subquestions, and 154 rubric items*, with a mean score of 46 out of 80 total points.

Do “average” students exist at a question level?

We wanted to find out whether students were likely to score near the average on multiple questions of the exam.



Out of 1506 exam submissions, only one student scored within the average 20 percent on all 7 questions. Furthermore, only 60 students, fewer than 1 in 25, scored near the average on 5 or more questions. In fact, 365 students, or nearly 25 percent, did not score within the average range on a single question.

We calculated whether or not students scored within the average 20% (+/-10% of the mean score) for each question.
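The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for the exam: the function names, the list-of-lists data layout, and the toy scores are all hypothetical, and the band is assumed to mean "within 10 percent of the question's mean score, above or below," which is one reading of "+/-10% of the mean score."

```python
from collections import Counter
from statistics import mean


def near_average_counts(scores, band=0.10):
    """For each student, count the questions scored near that question's mean.

    scores: list of per-student lists, one score per question
            (a hypothetical layout chosen for this sketch).
    band:   half-width of the "average" range as a fraction of the
            question mean (assumed reading of "+/-10% of the mean score").
    """
    n_questions = len(scores[0])
    # Mean score on each question across all students.
    means = [mean(student[q] for student in scores) for q in range(n_questions)]
    counts = []
    for student in scores:
        # A question counts as a hit if the score is within band * mean of the mean.
        hits = sum(
            1
            for q in range(n_questions)
            if abs(student[q] - means[q]) <= band * means[q]
        )
        counts.append(hits)
    return counts


def summarize(counts):
    """Map 'number of questions near average' to 'number of students'."""
    return dict(sorted(Counter(counts).items()))


# Made-up scores for four students on three questions, for illustration only.
toy_scores = [
    [10, 5, 8],
    [10, 9, 2],
    [2, 5, 8],
    [10, 1, 6],
]
toy_counts = near_average_counts(toy_scores)
toy_summary = summarize(toy_counts)
print(toy_summary)
```

In the toy data, no student is near the average on more than one of the three questions, even though each question has a well-defined mean. If the band were instead defined relative to each question's maximum points, only the comparison inside the loop would need to change.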

Even among students with average total scores, ranging from 38/80 to 53.5/80, no fewer than 14 students did not score within the average 20 percent on any of the 7 individual questions.

While a polarized distribution of overall scores could explain why few students scored near the average on multiple questions, that was not the case. The score distribution was unimodal and the standard deviation was 17 points.

There is no average student.

For example, we looked at two students who both earned 47.5/80 (the median score) and found that, despite their average final grades, each was an “A” through “D” student on individual questions. Furthermore, their question-level performance differed widely from each other, with a discrepancy of 25 points across their 7 question scores.