The limits to school improvement

The evidence has been growing for some time that our present understanding of school improvement, and the pre-eminence we give to certain forms of data in evaluating it, may have reached the end of their useful life. The root of the problem lies in our inadequate understanding of the nature of that data, even as we try to use it for purposes for which it was never intended.

In 2009, Stephen Gorard raised serious doubts about school effectiveness. He highlights a number of methodological weaknesses in the use of statistics, especially the propagation of error at every stage, and concludes, in the context of the use of value-added scores:

It is not enough to do well. Others have to fail for any school to obtain a positive result. Or more accurately, it is not even necessary to do well at all; it is only necessary to do not as badly as others.

Although value-added in that form has now been replaced in government data, fundamental problems remain in our new focus on pupil progress. Becky Allen, in a 2018 blog, highlights the difficulty, perhaps even the impossibility, of accurately measuring individual pupil progress, let alone school progress. She suggests:

When we use standardised tests to measure relative progress, we often look to see whether a student has moved up (good) or down (bad) the bell curve. A student scored 114 at the end of the year (having begun scoring 109). On the face of it this looks like they’ve made good progress, and learnt more than similar students over the course of the year. However, 109 is a noisy measure of what they knew at the start of the year and 114 is a noisy measure of what they knew at the end of the year. Neither test is reliable enough to say if this individual pupil’s progress is actually better or worse than should be expected, given their starting point.
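The arithmetic behind this point can be sketched roughly as follows. The standard error of measurement of 5 points per test is an assumption chosen purely for illustration, not a published figure for any real test, and the function name gain_interval is hypothetical. The sketch also assumes the errors on the two tests are independent.

```python
import math

def gain_interval(start, end, sem, z=1.96):
    """Approximate 95% interval for the true gain between two noisy scores.

    sem is the standard error of measurement of each test. It is assumed to
    be the same for both tests, with independent errors (an illustrative
    assumption, not a property of any particular assessment).
    """
    gain = end - start
    # Independent errors add in quadrature, so the difference between two
    # scores is noisier than either score on its own.
    se_gain = math.sqrt(2) * sem
    return gain - z * se_gain, gain + z * se_gain

low, high = gain_interval(start=109, end=114, sem=5.0)
print(f"Observed gain: 5 points; plausible true gain: {low:.1f} to {high:.1f}")
```

Under these assumptions the plausible range for the true gain runs from roughly -9 to +19 points, so the observed 5-point rise cannot be distinguished from no relative progress at all.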

When we turn to national high-stakes testing and use it for school accountability, a further complication arises from the nature of the bell curve against which standards are calibrated. Tom Sherrington’s blog highlights the extent to which it has to be a zero-sum game. It is by definition the case that a school can only succeed as long as someone else fails. Not everyone can be at the head of the bell curve.
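A small sketch may make the zero-sum point concrete. The five school scores below are made-up numbers, and relative_scores is a hypothetical helper that expresses each score in standard deviations from the cohort mean. It is a simplified stand-in for any measure calibrated against the cohort, not a description of how national data is actually calculated.

```python
import statistics

def relative_scores(raw_scores):
    """Express each score relative to the cohort mean, in standard deviations."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    return [(score - mean) / sd for score in raw_scores]

# Hypothetical raw results for five schools (made-up numbers).
before = [48.0, 52.0, 55.0, 60.0, 65.0]
# Every school improves by the same 10 points.
after = [score + 10.0 for score in before]

rel_before = relative_scores(before)
rel_after = relative_scores(after)

# Relative scores always sum to (approximately) zero: gains above the mean
# are matched by losses below it.
print(round(sum(rel_after), 9))
# A uniform improvement leaves every school's relative position unchanged.
print(all(abs(a - b) < 1e-9 for a, b in zip(rel_before, rel_after)))
```

In this toy example every school genuinely improves, yet no school's relative position moves. On a measure like this, one school can only rise if another falls.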

Finally, the evidence has been clear for a while that, among the factors that influence educational outcomes, schools account for only about 30% of the effect. The other 70% or so lies outside the school (see, for example, Silins and Mulford, 2002; Moreno et al., 2007).

While it is of course entirely right that schools should be as effective as they possibly can be in influencing the factors within their control, it is perhaps also time to recognise the inbuilt limits of this. Perhaps then we might stop committing logical errors such as expecting everyone to be above average, and start looking at how the school can extend its influence beyond the school gates for the next stage of school improvement.

Moreno, M., Mulford, B. and Hargreaves, A. (2007). Trusting Leadership: From Standards to Social Capital. Nottingham: NCSL.

Silins, H. and Mulford, B. (2002). Leadership and School Results, in K. Leithwood and P. Hallinger (eds), Second International Handbook of Educational Leadership and Administration. Norwell, MA: Kluwer Academic Press.

Living with uncertainty and the measurement trap

Perhaps it's because we're uncomfortable living with uncertainty that we've been lured into the measurement trap.

In any redefinition of school quality and purpose, both what we measure and how we measure it need to be chosen carefully so that they reflect that change. ‘Don’t value what you measure, measure what you value’ may be an old adage, but it is one that is often neglected in the world of education today. Measurement is important, and data is really valuable, but the purpose needs to be clear. The role of data, properly understood, is not to provide definitive answers but to support the asking of powerful questions.

A measurement is an observation that quantitatively reduces uncertainty. It cannot yield a perfectly precise or certain result, but it can narrow the range of what we believe to be true.
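One simple way to picture this is the standard error of an average, which shrinks as observations accumulate but never reaches zero. The sketch below assumes independent observations with a spread of 15 points each. Both the figure and the function name standard_error are illustrative assumptions rather than anything drawn from real assessment data.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean of n independent observations with spread sd."""
    return sd / math.sqrt(n)

# Hypothetical spread of 15 points per observation (illustrative only).
for n in (1, 4, 25, 100):
    print(n, standard_error(15.0, n))
```

Each additional observation reduces the uncertainty, but with diminishing returns: going from 1 to 4 observations halves it, while going from 25 to 100 is needed to halve it again.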

The object of measurement must be described clearly, in terms of what we are seeking to observe. Even if you cannot measure exactly what you want, you can learn about your area of interest with related data. A business may not be able to measure the exact benefit of a happy customer, for example, but it could get measures which give evidence of the value and magnitude of its work. It could also get measures of the cost of dissatisfied customers.

But with all measures, you must use judgement. The danger is that we often mistake the measure for the thing itself. Measures are a proxy, and we need to understand the limitations of the data we use.

We should not just pretend that the data we have tells us everything we need to know. We need to think and ask powerful questions. We need to understand. And we need to exercise our judgement.

Perhaps that is what we're really uncomfortable with.