Comparative Judgement, Part 2

Some weeks ago I wrote a detailed blog post on a comparative judgement exercise my department undertook to evaluate its effectiveness in assessing Year 12 English Literature unseen mock examinations. David Didau and Chris Wheadon both offered detailed, thoughtful comments on our methodology, and I resolved to try the process again, this time using my Year 11 students as markers. This blog post reports what I learned from the process, and I hope will prove useful to teachers interested in this powerful tool.

The method

All students had completed a one-hour timed piece of writing (IGCSE English Language ‘Directed Writing’). I anonymised their work, assigning each student a random three-digit code. I scanned and uploaded the scripts, and set up my class as judges. I briefed the students on what I wanted them to do, telling them they should average about 30 seconds per judgement. I included myself in the judging panel, to determine my infit with the students.
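The anonymisation step above is simple to automate. As a minimal sketch (the function name and use of Python's standard library are my own assumptions, not part of the original exercise), each student can be given a unique random three-digit code like so:

```python
import random

def anonymise(names, seed=None):
    """Assign each student a unique random three-digit code (100-999)."""
    rng = random.Random(seed)
    codes = rng.sample(range(100, 1000), len(names))  # sample() guarantees no duplicates
    return {name: str(code) for name, code in zip(names, codes)}
```

Passing a seed makes the mapping reproducible, which is useful if you need to regenerate the code-to-student key later.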

The results

were interesting.

Here are our top three candidates, according to the comparative judgement exercise:

[Image: CJ Top 3]

There was a fair level of consensus on the best essay (standardised score of 86.23, comfortably more than two standard deviations above the mean of 50, with σ = 15), whose fluent, sparky prose clearly caught our class’s eye. The next-best candidate was also highly praised, but less uniformly: their infit score was above 1, though not by much. The third-best was a long way back, at the top of a big cluster of results.
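Those standardised scores come from scaling the ability estimates that a comparative judgement engine fits from the pairwise wins and losses. As a rough sketch only (the original exercise used a dedicated platform, not this code; the Bradley-Terry model and the mean-50, SD-15 rescaling are the standard approach, but every name and number here is illustrative), the pipeline looks something like this:

```python
import numpy as np

def fit_bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths by the MM algorithm.

    wins[i][j] = number of times script i beat script j.
    Assumes every script has at least one win and the
    comparison graph is connected, so the fit converges.
    """
    n = wins.shape[0]
    p = np.ones(n)
    for _ in range(n_iter):
        for i in range(n):
            total_wins = wins[i].sum()
            denom = sum((wins[i, j] + wins[j, i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            p[i] = total_wins / denom
        p /= p.sum()  # fix the arbitrary overall scale
    return p

def scale_scores(thetas):
    """Rescale ability estimates (log-strengths) to mean 50, SD 15."""
    t = np.asarray(thetas, dtype=float)
    return 50 + 15 * (t - t.mean()) / t.std()
```

A candidate landing above 80 on this scale, as our top essay did, is more than two standard deviations from the mean; infit (not computed here) then measures how consistently individual judges agreed with that ordering.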

The thing was, our judges had got it wrong. The best candidate set up a line of argument in their introduction, then promptly contradicted it on page two. Equally, after a purposeful start, our second-placed candidate lapsed into those tiresome ‘semi-sentences’ so beloved of teenagers (omitting verbs, or using commas and full stops as if one were only a sort of non-committal version of the other). Neither was our best. Our best got off to a slow start, but by the end of the essay had developed an argument of genuine sophistication and force.

My Conclusions:

  1. Judges can make reliable judgements quickly, but a quick judgement is not always a reliable judgement of a complex argument;
  2. Teenagers are seduced by the meretricious, and not inclined to read carefully;
  3. This exercise taught me a lot more about what my students thought was good, than about the genuine quality of the work. Therefore, it was an unmitigated, unexpected success.
    1. If my class agrees that a flawed essay is brilliant, I need to address that in my teaching.

And so, my journey of comparative judgement continues.


I strongly recommend you have a look at Chris Wheadon’s blog and David Didau’s post on Rethinking Assessment for more on this fascinating development in how we assess our students’ work.
