Testing Times: What Can We Learn About Data from the UK’s Great Exam Omnishambles

Industry data experts on the importance of empathy and testing to avoid unintended consequences – like those seen with recent A-level and Scottish Higher results in the UK
2020 has not been a great year to be a UK school leaver. No leavers dance – or prom, if you must. No first pint in a pub – or if you haven’t met legal drinking age, sneaking off to the pub with a fake ID. No idea of what the grown-up world of work is going to look like – or even whether it will have any jobs for you. 

And then there’s the question of exams. In the UK, lockdown prevented end of year exams – and so the governments across the UK decided to give out grades based on an algorithm. The result? To cut a long story short: computer said no. 

The extent of the issue first became apparent when the Scottish Qualifications Authority (SQA) released their results. The ‘normalisation’ algorithm took into account factors like teachers’ predictions, mock exam results and schools’ previous performances. 124,000 students had their predicted results downgraded – and this impacted students in more deprived areas more profoundly, particularly hardworking and talented outliers who had been predicted to excel despite difficult circumstances. Lo and behold, when A-level results came out across the rest of the UK a week or so later, the same pattern was observed and the media was flooded with tales of talented youngsters losing prestigious scholarships, industry training places and university places. Both the Scottish and UK governments have, by now, agreed to walk back the algorithmically decided grades and to award grades estimated by teachers. As a result, universities are oversubscribed and youngsters and parents are still trying to figure out the next steps.

It's a sorry state of affairs – but an instructive one too. It shows how data, poorly used, can entrench and reinforce systemic bias. It shows how ‘the power of prediction’ can poorly serve exceptional outliers. And how blind faith in ‘an algorithm’ reveals a lack of sophistication. Most pointedly, it reminds us that behind every data point is a human being, struggling through a crisis as well as they can.

We caught up with some of the UK industry’s data experts to find out what they make of the mess and what we can learn from it.



Alex Steer, EMEA Chief Data Officer, Wunderman Thompson 

Much of the confusion and anger over this year’s A-level results has focused on the ‘algorithm’ that the government used to allocate results based on the past performance of students and schools, rather than relying on the predicted grades given by teachers. It’s made many people suddenly, shockingly aware of the influence of predictive models over so many areas of our lives. 
 
But describing the model as a mysterious ‘black-box algorithm’ misses the point. It’s painstakingly explained in a 300-page technical paper on the government’s website. It’s not light reading, but it is clear. It has its flaws – such as a built-in bias towards small classes and niche subjects which ends up benefiting affluent schools – but these are by design, not accident. The model didn’t go rogue, it did what it was told to: prevent grade inflation, be a realist not an optimist, and ignore outliers.
 
After a pandemic and heading into a downturn, nobody wants that kind of realism, and we're all cheering for the outliers and the underdogs. Reducing young people's futures to a formula wasn't a failure of maths, it was a failure of humanity. Ministers, not models, are to blame for this one.
 
The lesson for leaders isn’t to avoid algorithms. We need to understand their assumptions, scrutinise their recommendations, and be accountable for the decisions we make using them. 


Leila Seith Hassan, Head of Data Science & Analytics, Digitas UK

Most people don’t think about algorithms and the impact they have on their lives. That changed with last week's A-level and GCSE results.

There is an assumption that because algorithms are technology, data and maths, they are fairer and less fallible than their human counterparts. They are free from biases and assumptions that make us human. But algorithms learn from historical data that contains all the past decisions and behaviours of humans. And ultimately, humans still build and train algorithms.

Last week’s results were based on an algorithm that looked at past school and student performance but fell short of accounting for students who were outperforming, schools that had made recent improvements or exceptional teachers. So, by default, it favoured those in areas that had long histories of doing well.

It missed factors around the schools, such as neighbourhood and demographic changes or property prices, and factors around the students, such as changes at home or activities undertaken outside of school (although collecting the latter is not possible and screams of GDPR violations).

Because of this missing information, the algorithm arguably wasn’t good (unless you went to school in a privileged area) and the scores and future of tens of thousands of students were at risk.

In adland there are five lessons we should take:  
1. Is the data fit for the purpose?  
2. What is the benefit of using an algorithm? Or is it just a shiny new thing?
3. Do we understand the results (predictions)? Could we explain them?
4. Is this negatively affecting specific groups of people? Why?
5. Is it fair?



David Fletcher, Chief Data Officer at Wavemaker UK

Scratch just a little bit under ‘led by the science’ and you’ll quickly hear that the science itself is ambiguous, nuanced, containing as many if not more uncertainties than it does inviolate knowledge.

Those of us who work in data know parallel truths, and many have been the time when I’ve stopped clients who’ve expressed a desire to be ‘led by’ the data – when ‘informed by’ or ‘fuelled by’ are far more scalable for practical application.

Most algorithms play out over multiple iterations. Small-scale tests for provable confirmation of a hypothesis lead to larger trials with multiple tests for sensitivity at the margins, and the algorithm is then deployed at scale, still with tests to track continuing performance over time.

Even Google’s algorithms change multiple times a year. The changes in consumer behaviour and the destinations search can point to require change to keep pace, but even Google’s algorithms have glitches and unforeseen consequences that get course-corrected over time.
And very few algorithms have meaningful individual consequences that matter. Credit scorers and insurance businesses get close but have big regulators that have many years’ practice in the issues and the art.

So pity, then, the poor data scientists tasked by Ofqual to solve the impossible.
Rock, meet hard place.


Justine O’Neill, Director, Analytic Partners

Humans are hardwired to have inbuilt biases based on our own lives and experiences and a tendency to ignore evidence that doesn’t fit with our world view – the confirmation bias. Which is why scenario planning using historic and live data is such a powerful tool for marketers. But data and the tech behind it can only be as good as the way it is used. As someone once succinctly said, ‘put rubbish in, you’ll get rubbish out’. The recent fiasco over algorithmically generated A-level grades is a prime example of how important it is to use data wisely. In all walks of life and for marketers especially, we need to make the data and tech work for us, not us for it. We need to factor out our human bias, but not forget our human insight which will provide the empathy bridge between businesses and their data to provide robust results.


Nic Pietersma, Director of Analytics, Ebiquity

Algorithms are getting a lot of bad PR at the moment, but an algorithm is just a set of instructions or a mathematical routine that needs to be followed. Algorithms aren’t intrinsically good or bad – they should be judged by their usefulness.
 
In this case, Ofqual seems to have misjudged the legal and political ramifications of downgrading results to the extent that they have. Accepting teacher assessments may have been the lesser of two evils, but no doubt would also have repercussions elsewhere in the university selection process.
 
In programmatic marketing we often trust algorithms too much, without anyone in the room having a full end-to-end understanding of what they do with our investment. Our advice to clients is to have some form of validation to regularly ‘kick the tyres’ on the algorithm – we recommend transparent test and control methods.

LBB Editorial, 1 month ago