Assessment has seen a lot of evolution recently. New accountability measures across Key Stages have forced schools to reconsider their models. To adapt, a few teachers in my science department have been reading Daisy Christodoulou’s lucid new book on assessment: Making Good Progress? We have also been reading various books, blogs and research papers to help us consider the findings of cognitive science and various forms of instruction, so that we can improve our curriculum and assessment model.

We are modifying our assessment model to allow more valid inferences to be made from the data, and to help our pupils to make better progress. This blog post documents some of this thinking. Please feel free to comment and advise me on how valid(!) my interpretations are.

The main message in Christodoulou’s book, for me, is the need to make a clear-cut distinction between assessment for formative purposes and assessment for summative purposes. These two have been somewhat jumbled in my mind, and her thesis is helping me re-organise my thoughts on assessment and how they fit in with my evolving views on curriculum and pedagogy.

Let me begin by summarising some of the research and implications for teaching practice that Christodoulou presents in her book:

Research: skills are not generic and transferable – they are domain-specific.

Teaching: plan curriculum to teach knowledge, not generic skills. Content is king.

Research: working memory is limited; long-term memory has no known limit.

Teaching: direct instruction is more effective than project-based activities at teaching pupils knowledge and (domain-specific) skills.

Research: practising tasks resembling the type given in summative assessment is less effective than deliberate practice of tasks which break down skills into their constituent parts.

Teaching: give pupils lots of opportunity to practise tasks which may not resemble the desired end goal, but which will help pupils make progress in each of the components that sum to the end goal.

The third idea is the one that has been most helpful in categorising the kinds of assessment I give my pupils and the feedback that I share with them. To contextualise the ideas around summative assessment, Christodoulou points out that some subjects lend themselves to a difficulty model of assessment, where pupils sit exam papers with questions of increasing difficulty (e.g. Science). Other subjects lend themselves to a ‘best-fit’ model, where all pupils are examined on the same task, with their performance matched to descriptors of different levels (e.g. English).

Implications for Assessment Models in Science

My department’s approach to assessments has been to give an exam paper at the end of every unit of teaching. For example, after 3 weeks of ‘Cells & Organisation’, Year 7s would sit an exam with questions about cells. After 4 weeks of teaching ‘Photosynthesis and Respiration’, pupils would sit an exam on that topic. Our whole year was mapped out into such distinct units, each separated by a summative assessment. Teachers marked the tests and recorded the results: historically as a national curriculum level, and later as one of 4 labels (ranging from ‘below expectations’ to ‘exceeding expectations’… or something of the sort). For GCSE, we made UMS converters to record a grade. We would then give formative feedback on the exam paper, initially through an individualised approach and later through whole-class feedback.

Reading Christodoulou’s book reveals four faults in this model:

1. Summative tests were far too frequent

The purpose of a summative test is to obtain ‘shared meaning’. What does this mean? Firstly, it is a snapshot of where pupils are in their learning, allowing comparisons to be made between and within cohorts. For example, it can help identify pupils further behind than others, identify classes who might need more support, and identify topics that pupils find particularly challenging. Secondly, it allows various stakeholders to hold each other to account.

With our model, the reality was that the short gaps between assessments made achieving either of these purposes difficult. Furthermore, the data was likely to overestimate pupil learning, since each exam tested only what pupils had learned recently.

In order to genuinely capture learning over time, we must consider the cognitive science definition of learning: ‘a change in the long-term memory’. To measure long-term changes, assessments must be spaced out over much longer periods of time.

The solution: use a maximum of 3 summative assessments across the year, with each being cumulative. This provides sufficient data for the intended purposes. Being cumulative makes the data a more reliable measure of learning: a better measure of change in the long-term memory.

2. Formative feedback was given on summative tests

For many questions used in summative assessments, there are many factors that could explain why a pupil did not answer correctly: misunderstanding a command word (describe/explain is notorious); the literacy demands of a question; not understanding a new application, etc. In light of this, Christodoulou’s critique is bang on: giving formative feedback to pupils on summative tasks is muddled, as it is likely to miss the reason the pupil answered incorrectly. Yet there are plenty of science questions in summative tests which clearly reveal misconceptions. But these tend to be AO1 (knowledge/recall) questions, which are a limited part of the summative assessment. Why not focus on these during instruction rather than after? You could also argue that feedback can still be given on more complex test questions, on aspects specific to the summative nature of the task such as exam technique and question interpretation, bearing in mind that these can be broken down into procedural knowledge parts (e.g. defining ‘describe’ as a command word).

On reflection, what is essential is that the teacher be clear that this is the intention of the feedback. Furthermore, each of these separate components can and should be taught and assessed through deliberate practice first. The kinds of post-hoc analyses and feedback we were used to giving actually addressed skills that we had not explicitly taught before, and so were more of an afterthought than a considered part of the curriculum. Which brings me to the third problem with our model…

3. Our assessments did not always test what we intended to

Firstly, the high number of exam-style assessments we were using meant that teachers in our department often wrote them in a rush. Time is our biggest constraint. The assessments were almost always written as teachers were reaching the end of teaching a unit of work. Secondly, they often included skills and content that pupils weren’t necessarily taught explicitly. For example, I might only find one good question about diffusion in a bank of exam questions. However, this question requires interpretation of a table and a graph, which is something that pupils have not practised sufficiently. This is a flaw, because a crude analysis of the exam data may suggest diffusion is a weakness of my class, when in fact the weakness could be interpretation of graphs and tables. Even if I do not make this flawed inference from the assessment data, I have failed to capture any meaningful data on whether my pupils understood diffusion as a concept, even though I intended to when I made the test.

To solve both of these problems, we need to clearly map out agreed knowledge – both declarative (diffusion is…) and procedural (to read a table, first I…) – and make it an explicit part of the curriculum. Following this, we should write the assessment at the start of the unit. The clarity from the outset must be thus: here are the types of questions I expect my pupils to be able to answer. Of course, teachers shouldn’t have access to the paper and know exactly what will come up, otherwise we end up (subconsciously or otherwise) teaching to the test.

Furthermore, we should develop formative feedback tasks that allow teachers to isolate specific knowledge in a timely way.

4. Mastery after a taught unit was (implicitly) assumed

By giving an exam-style assessment, we were assuming that pupils had mastered content and would be able to apply it to new contexts. In reality, we were giving assessments at such short intervals that pupils were not mastering content. We were not building sufficient time into schemes of work to recap and review the content that had been covered. It was very much a ‘progress within the lesson’ model of teaching and learning.

Whilst ‘AfL’ was a part of our lessons, it looked something like this:

Check what pupils have understood within the lesson in which the content was taught

Written feedback on a specific piece of work, pre-selected in each scheme of work

It is now apparent that these sorts of ‘AfL’ assess performance rather than learning. It was uncommon to assess content taught many lessons ago.

Treating the end-of-unit assessment as the end point of a topic also meant that there was no opportunity to review learning after the assessment (other than one feedback lesson – see point 2). If pupils did not master topics (as evidenced by the assessment), there was no time to go back to them, until perhaps towards the end of the year.

The solution: regular review through interleaved, spaced retrieval practice. We now use recap/drill quizzes at the start of every single lesson. These cumulatively give pupils the opportunity to retrieve knowledge from prior lessons within the current topic, and often, topics taught a long time ago.

I hope I have given a clear picture of what we used to do and why it wasn’t the best way to help our pupils make progress. I have alluded to some changes we are making. Assessment and curriculum are inextricably intertwined – both must be modified alongside each other. So, where are we heading with respect to curriculum and assessment?

We are still in the process of remodelling. One key source of input will be ongoing discussions with other teachers I’ve been fortunate enough to talk to on Twitter. Dr. Niki Kaiser has been a catalyst of these discussions, and has compiled a brilliant set of sources here on cognitive load and on mastery, which will be the basis for further discussion.

In a (near-)future post, I hope to share how these talks have helped us in re-shaping our curriculum and assessment model:

What do our retrieval practice quiz starters look like?

How do we ensure content is delivered consistently across the faculty, with a focus on broken down knowledge and skills?

How will we ensure pupils get sufficient practice to master content?

How will we teach and space revision to ensure pupils remember content forever?

How will we regularly assess pupils to reveal areas of weakness and give formative feedback?

How will we summatively assess pupils to make valid inferences and create ‘shared meaning’?

If you have any ideas or are at a similar or more progressed stage of curriculum/assessment model development, I’d be grateful to hear. @Mr_Raichura

PS You can follow/join science-specific discussions on applying findings from cognitive science by using the hashtag: #cogscisci