Exams over and results awaited, time to catch up on a recent Cambridge Assessment report on the causes of volatility in GCSE results.

Big grade swings from year to year are a reliable cause of headaches, heartache and hand-wringing in late August. It is tempting to see this report as a pre-emptive strike by an exam board against the inevitable cries of "foul".

Indeed, the authors are at pains to show that when schools suffer big swings, the cause is not uneven marking or the setting of grade boundaries. The bulk of year-on-year volatility in any particular subject is explained by changes in the ability of a school’s cohort (measured by performance in other GCSE subjects – which does seem a little circular, but still…).

Some residual volatility remains, and every year it leaves some subjects, schools and students in a bind. Yet the study dismisses it: although it cannot be explained, it can be quantified and predicted using a probabilistic model.

In fact, the authors go so far as to suggest that, from a statistical point of view, it would be “worrisome” if all students achieved the results that were expected of them.

The study focuses on how GCSE performs at the level of the system and of schools, rather than of individual students. It concludes that the qualification performs predictably in aggregate terms, but it is disarmingly honest about the uncertainties involved in individual student performances – defining students as “independent random variables”.

Students, we are told, “are not machines who can access the same information from memory every single time it is required.” (It is scary that GCSE is conceived as primarily a test of memory, rather than of knowledge, understanding and application.)

So-called “normal” variation in a student’s performance is put down to such factors as state of mind and confidence, diet, even the temperature in the exam room – the latter is suggested as a reason why a candidate might get a D rather than a C.

Grade distributions system-wide are predictable; the grade an individual ends up with comes down to the operation of “chance in an indeterministic system”. The authors conclude that there will always be “inherent uncertainty in the outcome of any individual pupil on a specific exam”.
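The report's point about chance near a grade boundary can be made concrete with a toy simulation. Everything here is invented for illustration – the candidate's underlying mark, the boundary and the size of the day-to-day variation are assumptions, not figures from the report:

```python
import random

random.seed(0)

# Hypothetical numbers: a candidate whose underlying attainment sits
# just above an assumed C boundary of 60 marks, with a small
# day-to-day variation in performance of a few marks either way.
TRUE_SCORE = 62   # assumed underlying attainment, in marks
C_BOUNDARY = 60   # assumed grade boundary, in marks
NOISE_SD = 4      # assumed day-to-day variation, in marks

TRIALS = 100_000
d_grades = sum(
    1 for _ in range(TRIALS)
    if random.gauss(TRUE_SCORE, NOISE_SD) < C_BOUNDARY
)
print(f"Simulated sittings awarded a D: {d_grades / TRIALS:.0%}")
```

With these assumed numbers, roughly three sittings in ten tip the candidate into a D: nothing about the candidate changes between runs, only the noise.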

The under- or over-performance of individuals is not a problem because it is evened out in the aggregation. Except that it is a problem for the individual. What happened to the reliability principle – that the grade a student gets should not depend on the year (let alone the day) she took the exam?

And if an assessment outcome is truly “indeterministic”, how can anyone claim that it is criterion-referenced? This study goes a long way to undermine oft-made claims about the value of high-stakes exams.

The study exonerates GCSE in statistical terms and concludes that it has utility as a measure of school effectiveness.

In acknowledging the uncertainty inherent in individual student outcomes, though, it fails to follow up with the obvious conclusion – that results from one-off exams should be used with great caution – grades should be seen not as absolute values but as statistical approximations bounded by clear confidence limits.
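What a grade bounded by confidence limits might look like can be sketched in a few lines. The observed mark and the measurement error below are invented for illustration; the point is only that a plausible interval can easily straddle a boundary:

```python
# Hypothetical sketch: treat an observed mark as an estimate of
# underlying attainment with a known measurement error. A 95%
# confidence interval spans roughly +/- 1.96 standard errors.
OBSERVED_MARK = 62   # assumed mark achieved on the day
MEASUREMENT_SD = 4   # assumed standard error of measurement, in marks

low = OBSERVED_MARK - 1.96 * MEASUREMENT_SD
high = OBSERVED_MARK + 1.96 * MEASUREMENT_SD
print(f"95% interval: {low:.1f} to {high:.1f} marks")
# prints: 95% interval: 54.2 to 69.8 marks
# Any boundary falling inside this interval (say, a C boundary at 60)
# cannot confidently separate the candidate's grade.
```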

Yet this is precisely not how these summative assessments are used in the real world. What should be a snapshot of an individual’s performance on the day is inscribed instead as an indelible record of that student’s abilities – for all time.

Exam boards are not responsible for how GCSE grades are used by others. But the keepers of the high-stakes assessment flame do have a duty to warn of their misinterpretation and misapplication.

This report is as close as exam boards are likely to come to admitting that the whole paraphernalia of GCSE succeeds in monitoring system and school performance, but not individual attainment.

So, as results day looms, the best of luck to all you “independent random variables” – sorry, I mean students.

Dr Kevin Stannard is the director of innovation and learning at the Girls' Day School Trust. He tweets as @KevinStannard1


Dr Tom Benton, principal research officer at Cambridge Assessment and co-author of the research report Volatility happens: Understanding variation in schools' GCSE results, responds.

I agree with Kevin Stannard that awarding organisations have a duty to warn others about the “misinterpretation and misapplication” of exam results.

This is precisely what our research was trying to do and why it is so thoroughly disheartening to see it misinterpreted as saying that the grades students receive are “largely down to chance”.

This statement is not to be found anywhere within our research for the simple reason that it is not true.

The grades that students get are largely down to their ability within the subjects that they are studying. However, I imagine anyone who has ever taken an exam can remember the feeling of “if only I’d answered that question in a different way”. As such they are probably aware that students are capable of doing a little bit better on some occasions than on others (emphasis on “a little”).

Awarding organisations spend quite a lot of time trying to quantify exactly how large this “little” variation in pupils’ achievement between different occasions might be, and trying to keep it as small as possible.

Indeed, in 2012 Ofqual published a reliability compendium containing nearly 1,000 pages of research into this subject (including large contributions from the awarding organisations themselves) – rather in contrast to Dr Stannard’s suggestion that this is some kind of industry secret.

Our own research was about trying to work out what the effect of these minor variations might be on school-level performance in particular subjects. It actually showed that volatility in schools’ results is affected rather more by easily observable changes in ability between different cohorts of students than by any unreliability in the exams themselves.

More importantly, our research was about ensuring that decision makers do not overreact to a single year of unexpected results in a particular subject.

It is extremely important that all users of exam results are aware of both the utility of exam results and the level of caution that should be applied in their use.

This requires a sensible recognition that exam results are neither “largely down to chance” nor an “indelible record of that student’s abilities – for all time”.

We maintain that exams are good indicators of student abilities. Communicating with the public about exactly how good they are and how much caution is needed in interpretation is not helped by exaggerated claims about the impact of chance.

Dr Tom Benton is the principal research officer at Cambridge Assessment

Dr Kevin Stannard responds to Dr Tom Benton of Cambridge Assessment

Tom Benton’s response takes aim not at the substantive points in my comment piece but at its headline. The latter was not in my control, and may well seem a bit hyperbolic. It isn’t all down to chance. But the research report on grade volatility is based on a probabilistic analysis, and hence chance, indeterminacy and the unpredictability of individual student grades keep cropping up.

There is very little with which to disagree in the research report. My point is that while its conclusions seek to confirm GCSE’s value system-wide, many of the observations made along the way cast a disconcerting light on the qualification’s value as a summative one-time assessment of an individual student’s attainment.

I would encourage everyone to read the article with an eye on what it says about individual students. The report refers to the “influence of chance in an indeterministic system”; in which each student “has the same amount of uncertainty in which actual grade he will achieve given his abilities”. We should read “performance scores as values selected from a range of possible outcomes”.

The authors stress “the inherent uncertainty in the outcome of any individual pupil on a specific exam”, whereby “a small change in the test conditions (the example used is that of the temperature in the exam room) might cause a candidate who would have achieved a C to fall just below the C boundary and get a D instead”.

Hence, “even with extensive data on a student’s typical performance, we never know how he or she will perform on a particular test taken on a particular day”.

Unsurprising, then, that in the study, students are described as “independent random variables”.

The question that follows, surely, is whether a stochastic approach to assessment should be allowed to underpin such a high-stakes qualification.

Dr Benton takes issue with my description of a student’s GCSE profile as an indelible record. Surely as a summative assessment, even allowing for the odd retake, that is precisely what it is.

