Comparative Judgement, Part 2

Some weeks ago I wrote a detailed blog post on a comparative judgement exercise my department undertook to evaluate its effectiveness in assessing Year 12 English Literature unseen mock examinations. David Didau and Chris Wheadon both offered detailed, thoughtful comments on our methodology, and I resolved to try the process again, this time using my Year 11 students as markers. This blog post reports what I learned from the process, which I hope will prove useful to teachers interested in this powerful tool.

The method

All students had completed a one-hour timed piece of writing (IGCSE English Language ‘Directed Writing’). I anonymised their work, assigning a random three-digit code to each student. I scanned and uploaded the scripts, and set up my class as judges. I briefed the students on what I wanted them to do, telling them they should average about 30 seconds per judgement. I included myself in the judging panel, to determine my infit with the students.
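The anonymisation step is easy to script yourself. This is a minimal, hypothetical sketch (not the tool we actually used, and the function name is my own) of assigning each student a unique random three-digit code:

```python
import random

def assign_codes(students, seed=None):
    """Map each student to a unique random three-digit code (100-999).

    Illustrative helper only; a real comparative judgement platform
    will normally handle anonymisation for you.
    """
    rng = random.Random(seed)
    # sample() draws without replacement, so the codes are guaranteed unique
    codes = rng.sample(range(100, 1000), k=len(students))
    return dict(zip(students, codes))
```

Keep the resulting mapping somewhere private, so the scripts can be de-anonymised once the judging is done.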

The results

were interesting.

Here are our top three candidates, according to the comparative judgement exercise:

[Image: CJ Top 3]

There was a fair level of consensus on the best essay (standardised score of 86.23, comfortably more than two standard deviations above the mean of 50, σ = 15), whose fluent, sparky prose clearly caught our class’s eye. The next best candidate was also highly praised, but less uniformly: their infit score was above 1, though not by much. The third best was a long way back, at the top of a big cluster of results.
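For readers curious how pairwise judgements turn into scores like these: comparative judgement tools typically fit a Bradley–Terry-style model to the win/loss record and rescale the fitted strengths. The sketch below is an illustrative simplification, not the algorithm any particular platform uses; the function names and the mean-50, σ = 15 rescaling are my own choices.

```python
import math

def bradley_terry(items, judgements, iters=200):
    """Fit Bradley-Terry strengths by minorisation-maximisation.

    judgements is a list of (winner, loser) pairs. For this simple
    sketch every item is assumed to win at least one comparison;
    real tools regularise the model to avoid that requirement.
    """
    strength = {i: 1.0 for i in items}
    wins = {i: 0 for i in items}
    for winner, _ in judgements:
        wins[winner] += 1
    for _ in range(iters):
        new = {}
        for i in items:
            # sum 1/(p_i + p_j) over every comparison involving item i
            denom = sum(1.0 / (strength[a] + strength[b])
                        for a, b in judgements if i in (a, b))
            new[i] = wins[i] / denom if denom else strength[i]
        # normalise so the strengths keep a geometric mean of 1
        g = math.exp(sum(math.log(v) for v in new.values()) / len(new))
        strength = {i: v / g for i, v in new.items()}
    return strength

def standardise(strength, mean=50.0, sd=15.0):
    """Rescale log-strengths onto a familiar mean-50, sd-15 scale."""
    logs = {i: math.log(v) for i, v in strength.items()}
    mu = sum(logs.values()) / len(logs)
    var = sum((x - mu) ** 2 for x in logs.values()) / len(logs)
    spread = math.sqrt(var) or 1.0
    return {i: mean + sd * (x - mu) / spread for i, x in logs.items()}
```

Infit statistics then compare each judge’s individual decisions against the fitted strengths: a judge whose choices often go against the model’s predictions drifts above 1.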

The thing was, our judges had got it wrong. The best candidate set up a line of argument in their introduction, and promptly contradicted themselves completely on page two. Equally, after a purposeful start, our second-placed candidate lapsed into those tiresome ‘semi-sentences’ so beloved of teenagers (omitting verbs, or using commas and full stops as if one were only a non-committal version of the other). Neither was our best. Our best got off to a slow start, but by the end of the essay had developed an argument of genuine sophistication and force.

My Conclusions:

  1. Judges can make reliable judgements quickly, but a quick judgement is not always a reliable judgement of a complex argument;
  2. Teenagers are seduced by the meretricious, and not inclined to read carefully;
  3. This exercise taught me a lot more about what my students thought was good than about the genuine quality of the work. Therefore, it was an unmitigated, unexpected success.
    1. If my class agrees that a flawed essay is brilliant, I need to address that in my teaching.

And so, my journey of comparative judgement continues.


I strongly recommend you have a look at Chris Wheadon’s blog and David Didau’s post on Rethinking Assessment for more on this fascinating development in how we assess our students’ work.

Reading aloud allowed

My first post on this blog was an attempt to put into words my thoughts having completed a comparative judgement trial based on Lower 6th essays completed for a timed examination. I am exceedingly grateful to David Didau and Chris Wheadon for their generous and thoughtful comments, which have greatly expanded my understanding of the process and its rewards.

One of the most telling criticisms made of our methodology was the time it took for a teacher to arrive at a judgement. Our median time was about five minutes; Chris Wheadon cites evidence that reliable judgements can be achieved in as little as seven seconds, with a median of about half a minute. A number of factors slowed us down, but it prompts me to ask: how often do we read a student’s work really closely?

I would suggest, not that often. The standard model of marking for an English teacher is about as bad as you can get: a pile of essays must be waded through, and each assigned a slot on an arbitrary scale. This makes our reading bad in two ways: firstly, we have a pile of essays to mark, so are disinclined to spend long on each one; secondly, we tend to look for things we can tick: ‘analysis of language’, technical vocabulary and suchlike.

I’ve tried a range of moderation strategies in my department: blind marking, anonymous submission of marks (try using Google Forms for this), ranking exercises, paired marking and suchlike. Each has its place, yet none comes close in real value to sitting down together around a table and reading students’ work aloud. I pick out a fairly random selection of essays (controlling for gender if the selection is obviously skewed), anonymise them, and then we read them. Aloud.

Reading aloud is slow and effortful, horribly inefficient when it comes to that pile of essays. Yet it ensures three things:

  • Every teacher around the table has taken in every word that candidate has written. Silent reading cannot guarantee this, even with trained professionals (our attention may be elsewhere, or the handwriting may just be too crabbed to read);
  • We have had to make sense of the work. To read aloud is to turn words into meaning, the voice articulating ideas and their relationship to one another;
  • We will arrive at a shared view of what the writer has actually said.

Having done this, we can begin to apply the mark scheme, working carefully on thoughts which are fresh in our mind. Going through four or five essays takes about three quarters of an hour, but it is time very well spent.

Does it make a difference? I think it does. It reminds us what we’re looking for and what we think is a good answer, as well as reminding us to look beyond the easily-ticked technical terms and suchlike. Equally importantly, it’s led to more consistent marking, with less variation between teachers, and, hence, more reliable data on which we can base our discussions.

Why slow reading matters

One of the most humbling experiences of my career occurred at a standardisation meeting for Pre-U English Literature. We’d read essays aloud around the table and dissected their arguments, and I’d enjoyed every moment of it. Later in the session, the team of examiners was divided into pairs, each pair taking away a small pile of scripts to mark in tandem, each moderating the other’s work. I was paired with a colleague some decades my senior, and it was utterly instructive to watch her work. When she read a student’s essay, it was as if she were in the room with that student, talking to them as they wrote, reading and re-reading to ensure she was completely certain of what they had said. When a candidate misquoted, she knew instantly (these essays were on Hamlet, Measure for Measure and Henry IV Part 1, as well as works by Pinter, Churchill and Jonson); when they misrepresented the play, she picked them up on it; when they illuminated something, she praised them to the skies.

Ever since then, I’ve known I ought to replicate that care and attention. I’d like to say I attempt it with every pile of IGCSE essays, but I’d be lying through my teeth. Sometimes, though, I do lock myself in my classroom and read aloud, trying to put a voice to the words on the page. I listen better that way.