In the last post, we discovered that random guessing results in the same average score on both a matching question and a set of multiple choice questions. However, I still feel as though there is a difference between the two types of questions. Personally, the fact that it is impossible to score a 3 on the matching question piques my interest. To investigate the problem, we can use some of the results from the previous post.

Recall, we had the following table for the matching problem:

We can convert the ‘ways’ into probabilities by dividing by the total number of ways, 24. This gives:
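The counts and probabilities can be reproduced with a brute-force check. The sketch below assumes a matching question with four items (consistent with the 24 total ways mentioned above). The function name `matching_distribution` is only illustrative. It enumerates every possible guess and tallies how many matches each one gets right.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

N_ITEMS = 4  # assumption: four items to match, so 4! = 24 possible guesses


def matching_distribution(n=N_ITEMS):
    """Return {score: probability} for a randomly guessed matching question."""
    # Count how many of the n! guesses produce each score.
    ways = Counter(
        sum(guess[i] == i for i in range(n))  # number of correct matches
        for guess in permutations(range(n))
    )
    total = sum(ways.values())
    # Include scores that never occur (such as 3) with probability 0.
    return {score: Fraction(ways.get(score, 0), total) for score in range(n + 1)}


if __name__ == "__main__":
    for score, prob in matching_distribution().items():
        print(f"score {score}: {prob} = {float(prob):.4f}")
```

Using `Fraction` keeps the probabilities exact, so it is easy to confirm that they sum to 1 and that a score of 3 never occurs.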

Next, we need to compute the same table for the multiple choice situation. The math is simple, if a bit long:
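For the multiple choice case, the same table follows from the binomial formula. This is a minimal sketch, assuming four independent questions with four options each (one correct), which is what gives the same average score of 1. The names `multiple_choice_distribution`, `N_QUESTIONS`, and `P_CORRECT` are illustrative.

```python
from fractions import Fraction
from math import comb

N_QUESTIONS = 4             # assumption: four multiple choice questions
P_CORRECT = Fraction(1, 4)  # assumption: four options per question, one correct


def multiple_choice_distribution(n=N_QUESTIONS, p=P_CORRECT):
    """Return {score: probability} when every question is guessed at random."""
    # Binomial probability: choose which k questions are right,
    # then multiply by the chance of k hits and n - k misses.
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


if __name__ == "__main__":
    for score, prob in multiple_choice_distribution().items():
        print(f"score {score}: {prob} = {float(prob):.4f}")
```

Unlike the matching question, every score from 0 to 4 has a nonzero probability here, because each question is answered independently of the others.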

That table is strange indeed. Comparing the two types of questions, almost nothing looks the same between them! Starting on the left, we see that with the matching question, you are about 6 percentage points more likely to score no points than with the multiple choice questions (37.5% versus roughly 31.6%). That’s about 6 more students out of 100 who would score no marks! Then, on the far right, you are more than 10 times as likely to score 4 points (full marks) on the matching question as on the multiple choice questions (about 4.2% versus 0.4%). In a class of 100, about 4 students would score full marks on the matching question, while most likely none would on the multiple choice questions. That hardly seems fair.
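The two comparisons at the ends of the table can be checked directly. The sketch below again assumes four items to match and four questions with four options each. `compare_extremes` is a hypothetical helper, not something from the previous post.

```python
from fractions import Fraction
from itertools import permutations


def compare_extremes(n=4):
    """Compare the chances of scoring 0 and full marks on the two question types.

    Assumes n items to match, and n multiple choice questions with n options each.
    """
    guesses = list(permutations(range(n)))
    # Matching: guesses with no correct match, and the single all-correct guess.
    match_zero = Fraction(sum(all(g[i] != i for i in range(n)) for g in guesses), len(guesses))
    match_full = Fraction(1, len(guesses))
    # Multiple choice: every question wrong, or every question right.
    mc_zero = (1 - Fraction(1, n)) ** n
    mc_full = Fraction(1, n) ** n
    return {
        "zero_gap": match_zero - mc_zero,       # difference in P(score = 0)
        "full_ratio": match_full / mc_full,     # ratio of P(full marks)
        "full_per_100": (100 * match_full, 100 * mc_full),
    }


if __name__ == "__main__":
    result = compare_extremes()
    print(f"gap in P(0): {float(result['zero_gap']):.4f}")
    print(f"ratio of P(full marks): {float(result['full_ratio']):.2f}")
    print("expected full marks per 100 students:",
          [round(float(x), 2) for x in result["full_per_100"]])
```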

In fact, it feels like the matching question is ‘all or nothing.’ While you might score 1 point on average (see the previous post), you are more likely to score no points or full points than if you had been guessing on a set of multiple choice questions.

In statistics, we quantify this variability in the outcome using the standard deviation. Using this language, we would say our matching question has a higher standard deviation than our multiple choice question, even though they have the same average.
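To put numbers on this, the mean and standard deviation can be computed straight from the two probability tables. The sketch below hard-codes the distributions under the same assumptions as before (four items to match; four multiple choice questions with four options each). `mean_and_sd` is an illustrative helper.

```python
from fractions import Fraction
from math import sqrt

# Assumed distributions: score -> probability under random guessing.
matching = {0: Fraction(9, 24), 1: Fraction(8, 24), 2: Fraction(6, 24),
            3: Fraction(0), 4: Fraction(1, 24)}
multiple_choice = {0: Fraction(81, 256), 1: Fraction(108, 256), 2: Fraction(54, 256),
                   3: Fraction(12, 256), 4: Fraction(1, 256)}


def mean_and_sd(dist):
    """Return (mean, standard deviation) of a {score: probability} table."""
    mean = sum(score * p for score, p in dist.items())
    # Variance is the probability-weighted squared distance from the mean.
    variance = sum(p * (score - mean) ** 2 for score, p in dist.items())
    return float(mean), sqrt(variance)


if __name__ == "__main__":
    for name, dist in [("matching", matching), ("multiple choice", multiple_choice)]:
        mean, sd = mean_and_sd(dist)
        print(f"{name}: mean = {mean:.3f}, sd = {sd:.3f}")
```

Under these assumptions both question types have a mean of 1, while the standard deviation is 1 for the matching question and about 0.87 for the multiple choice questions.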

As an analogy, consider a room of big NFL players and skinny mathematicians. The average weight of a person in the room is probably around 200 pounds. However, there will be huge variability in that weight, with some people weighing in around 275 pounds and others around 150. This means our room would have a high standard deviation.

If, instead, you had a room consisting only of 5-foot-10-inch-tall males with an average build, the average weight might still be around 200 pounds. But most of these people would be close to the average weight. This means our second room would have a low standard deviation.

So which question type (or room) is fairer? It really comes down to preference. While having question types with low variability might seem ideal, this is not set in stone. We appreciate the variability in the human population. If everyone were basically the same size, football wouldn’t be nearly as interesting. In fact, taking the low variability idea to the extreme, we could create a test that consisted entirely of true-false questions. While this test might seem fairer, I doubt many students would enjoy it. I, for one, appreciate variability in my test questions.