Ben Goldacre has recently authored a paper, Building Evidence into Education, published through the UK’s Department for Education (which invited him to do so). It has been extensively (mis)reported in the press and social media. A video of a related speech is available on YouTube. I am intrigued by Goldacre’s arguments, and fascinated by the response his paper has provoked; see:
1. The Guardian website, following an article that bears a curious subtitle (which Goldacre corrects);
2. Goldacre’s own website, with the paper and responses.
3. Geoff Whitty’s (Director Emeritus of the Institute of Education) ‘guarded welcome’ to the paper, on the Centre for Education Research & Policy website, which asserts the need for a degree of realism
4. Prof Mary James’ (President of British Educational Research Association, and University of Cambridge) response on the BERA website, which raises crucial critical questions and doubts
5. Twitter, where, for example, @MarkRPriestly refers Goldacre to literature about the need to treat RCTs with caution in social sciences.
A bit of context
For those who may be less familiar with science policy and media in the UK, Ben Goldacre is a key public figure who, among other things, holds scientists to account for the quality of their research, and the media to account for questionable reporting of research findings. He has recently moved out of the medicine / pharmaceutical area and written on educational research. An easy reaction is perhaps to position Goldacre as an outsider – neither teacher nor educational researcher. ‘Get off our lawns’ is generally not a very helpful response in my experience. I think educational researchers have benefitted enormously from insights from outside – historians, anthropologists, sociologists, economists: why not medics, too?
I wish to lay out clearly what I understand Goldacre is and is not arguing.
What Ben Goldacre is arguing
1. That we can improve outcomes for children and increase professional independence of school teachers by collecting better evidence about what works best, and establishing a culture where this evidence is used as a matter of routine.
2. That education can reap these benefits by replicating the successes that have been enjoyed in medicine, including the emphasis on randomised controlled trial (RCT) research, information architectures, and cultures of being research literate and research engaged.
3. That RCTs are the best process we have for comparing one treatment or intervention against another, for answering questions of whether something works better than something else.
I do not disagree with any of the aims Ben Goldacre is trying to achieve – who would argue for worse outcomes for children or reduced professional independence? (NB. I am not ignoring widespread views that teacher professional autonomy has been undermined in recent years; this is an important, but tangential issue for now).
Neither do I disagree with any of the claims I’ve identified above (a selection from many arguments Goldacre makes). I do have some caveats that I would append to his arguments, and some questions and comments that might explain aspects of the response that Goldacre says surprised him.
What Goldacre is not arguing
My reading of the paper, and of Goldacre’s rejoinders to comments, leads me to the following understanding:
1. He does not suggest that RCTs should replace qualitative or other quantitative approaches to research in education. He does suggest the balance needs to swing to extend the number of RCTs.
2. He does not suggest that RCTs are a kind of gold standard for all educational research. He is generally very careful in his wording, arguing that RCTs are the best we have for answering questions of what works. He explicitly acknowledges the limits of RCTs (for example they can’t tell us why something works), and acknowledges a valuable role for qualitative research (and presumably other quantitative methods, such as quasi-experiments where interventions are applied to groups not themselves created through random allocation).
Why I agree that RCTs are important and there should be more of them in (British) educational research
RCTs are an incredibly powerful research tool. Through random allocation of participants to either an intervention group or a control group, RCTs can take care of a whole bunch of complex factors that make research involving human beings (whether medical, educational, development-based etc) difficult. As Goldacre argues, they really are the fairest test for whether something works better than something else. Note, they only ever give a comparative answer, not an absolute one: they don’t provide a complete recipe for ‘what works’. They tell us that something works differently (and we interpret that difference as better/worse) than something else. Goldacre is clear on this.
They are amazing, and on top of the examples given by Goldacre, I’d add the High/Scope Perry project, which continues to build an incredible evidence base relating to the difference that a particular approach to nursery education can make to people throughout their lives. RCTs really are like Heineken: they can reach parts of understanding that other methods cannot, as long as your question is ‘Does X work better than Y?’.
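The logic of random allocation can be made concrete with a minimal simulated sketch in Python. Everything in it is an assumption made for illustration: the function name `run_trial`, the number of pupils, the spread of prior attainment and the size of the effect are invented, and come neither from Goldacre’s paper nor from any real trial. It shows two things: that allocating pupils at random leaves the two groups with roughly the same prior attainment, and that the difference in average outcomes then gives a comparative estimate (X versus Y), not an absolute one.

```python
import random
import statistics


def run_trial(n_pupils=2000, true_effect=5.0, seed=42):
    """Simulate a simple two-arm trial with random allocation.

    All numbers are hypothetical. Returns (baseline_gap, estimated_effect).
    """
    rng = random.Random(seed)

    # Hypothetical prior attainment: a factor we did not control for.
    prior = [rng.gauss(50, 10) for _ in range(n_pupils)]

    # Random allocation: shuffle pupil ids, first half get the intervention.
    ids = list(range(n_pupils))
    rng.shuffle(ids)
    intervention = set(ids[: n_pupils // 2])

    # Outcome = prior attainment + effect (intervention only) + noise.
    outcome = [
        prior[i] + (true_effect if i in intervention else 0.0) + rng.gauss(0, 5)
        for i in range(n_pupils)
    ]

    treated = [i for i in range(n_pupils) if i in intervention]
    control = [i for i in range(n_pupils) if i not in intervention]

    # How different were the groups before anything was done to them?
    baseline_gap = (
        statistics.mean(prior[i] for i in treated)
        - statistics.mean(prior[i] for i in control)
    )
    # The comparative answer: intervention versus control, nothing more.
    estimated_effect = (
        statistics.mean(outcome[i] for i in treated)
        - statistics.mean(outcome[i] for i in control)
    )
    return baseline_gap, estimated_effect


if __name__ == "__main__":
    gap, effect = run_trial()
    print(f"Baseline gap between groups: {gap:.2f}")
    print(f"Estimated effect of intervention: {effect:.2f}")
```

In this toy set-up the baseline gap should come out close to zero and the estimate close to the effect that was built in. The sketch says nothing about why the intervention works, or whether it would work anywhere else.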
The limits to RCTs
Ask any other kind of question, and the value of an RCT quickly diminishes. Goldacre acknowledges this, too. Ask ‘what should we do?’, and until you have a new idea or intervention to try against something else, an RCT is useless. Often challenges in practice and problems in research start from a more open question – the luxury of having things to compare comes later. And other research approaches, as well as pioneering practitioners, ideological commitments to social justice, technological advances, social changes etc. all have a role to play in getting us to the point of being able to try something new out and exploit the power of RCTs to give us causally robust comparisons of X and Y. We don’t always need an RCT to make important changes, either. We no longer allow corporal punishment in schools. It didn’t (nor should it) take an RCT to show that schools are better places without the cane.
RCTs are not perfect, nor are they the best method in all circumstances. I am in no dispute with Goldacre on this point.
There are lots of important questions where RCTs might never figure. RCTs cannot tell us what it is (morally) right to do, what is just. Other approaches are often better placed to identify social inequalities that might remain hidden (for example, around gender or racial differences in educational outcomes, where results from national tests are very useful). RCTs are not necessarily blind to questions of social justice (depending on the outcome measures involved). I’m reinforcing the simple point here that there are lots of questions where RCTs are not the best approach.
Goldacre is not arguing that we should ignore these questions, but there is a risk in the way he presents his arguments, that questions of ‘what works’ are heralded as the most important. Paying less attention to other kinds of questions serves the argument for more and better RCTs, but could (unintentionally) lead to a narrowing of the conceived role of research in education.
RCTs as a valuable contributor to evidence that can contribute to evidence-informed practice
It might be possible to read Goldacre’s paper and to (wrongly in my view) equate the evidence used in evidence-based practice with outcomes from RCTs. Goldacre doesn’t say this, but he doesn’t talk in any detail about other kinds of evidence, or other sorts of relationship between evidence and practice. We might, hypothetically, start seriously looking at literacy teaching practices based on evidence that current curricular and testing regimes disadvantage certain students and reproduce social inequalities. If the only evidence that could influence practice was that coming from RCTs, we would have to ignore the other evidence that the status quo is grossly unfair.
Evidence is valuable to practitioners in helping point to ‘what works’ but is also valuable in other ways, and these alternatives are played down in Goldacre’s paper, in his construction of an argument that seeks to redress a perceived imbalance and neglect of RCTs. Such diverse evidence-practice connections apply in medicine, too – there are lots of things that doctors advise (eg suggesting you give up smoking) or do (eg heart transplants) based on other kinds of evidence, not just RCT outcomes.
So I agree with Goldacre – we do need more, better, and more joined-up RCTs, because they are so powerful and the best tools for comparing two or more approaches against each other. But to avoid possible over-extensions of this argument, it is important to be very clear about the important role of other kinds of evidence. It’s not that other research is useful for other things, and only RCTs can be used in evidence-based practice. Evidence-informed practice (a phrase I prefer because it points better to the requirement for professional judgement in interpreting evidence, which Goldacre mentions), can be enriched by all kinds of evidence.
What is educational research for?
I think Peter Mortimore (2000, p. 18) captured some of this in his writing on the role of educational research:
“Who else but independent researchers would risk making themselves unpopular by questioning the wisdom of hasty or incoherent policy? Who else could challenge inspection evidence and offer a reasoned argument as to how empirical flaws had led to erroneous conclusions? Who else would dare say ‘the King has no clothes’? Who else would work with teachers and others in the system in order to look below the surface:
- to notice the unfairness suffered by those who are young for their school year yet for whom no adjustment is made to their assessment scores;
- to count, and to identify variations in, the numbers of minority pupils excluded from school;
- to point out that many of the supermarket shelf-fillers are our further education students trying to get by financially;
- to investigate whether adult learners need the same or a different pedagogy from pupils;
- to make fair comparisons of schools, as opposed to the travesty of league tables;
- to tease out why poverty is associated with failure in a competitive system, in which only so many can succeed, rather than just being an excuse for low expectations or poor teaching;
- to monitor trends and changes in educational aspirations, attitudes and attainments.”
On the relevance of the medical model
Goldacre argues that education has much to learn from medicine in terms of the way research is conducted, and in particular the way practitioners are involved in research and the way doctors are trained and expected to be research literate.
As an educational researcher who has in the past done considerable work based in schools, I find no greater reward than thinking that what I have found out has made a difference to the lives of teachers and/or pupils. There is no greater insult, to me at least, than to find one’s research accrues value only through citations by other researchers and makes no connection to the ground. As Goldacre rightly suggests, this connection is not only a property of the quality of evidence. It is also a question of the (perceived) relevance of that evidence, and of the user/reader’s ability to make good sense of that evidence. The relationship between evidence and practice isn’t as simple as finding out X is better than Y and then making sure all teachers read the relevant paper.
So I would be quick to welcome and celebrate a shift in school teaching cultures that brought teachers into a different relationship, both more routine and more critical (as Goldacre advocates), with research. I would add this should be with the full richness of the evidence base that educational research has to offer, not just outcomes from RCTs (and I don’t think I’m contradicting Goldacre here either).
Medicine and education may not be that different in some respects
Goldacre’s paper, and the responses and rejoinders online, do raise questions about how valid or relevant comparisons with medical research, forms of evidence, and practice are. Many (including Goldacre) have rightly shot down crude arguments that medicine is about ‘physical stuff like cells’ and education is about ‘people’ or that all diseases/patients are treated the same, that medicine is devoid of the kinds of social complexity that pervade education. One only has to look at the complex deliberations underpinning the development of NICE guidelines to understand that what happens in hospitals and doctors’ surgeries is not simply a question of knowing ‘what works’. Questions of what can be afforded, what is politically acceptable (remember headlines about postcode lotteries in breast cancer treatment?), what is practical – these all have a bearing too.
It’s easy to police boundaries and protect education from medical experts by screwing our eyes shut and shouting ‘but classrooms are different!’ as loud as we can. There are presumably some very important things to consider about the nature of medical and educational research and practices, and whether elements from one system might inform those of another (and shock, horror, this might even involve medicine learning from education!). I don’t think Goldacre offers an adequate account of these issues, but at least he acknowledges them. My critique is not of Goldacre’s oversight (there’s only so much one can say in a paper or a 20 minute talk), but of the risks that others simply elevate medicine as an ideal type and naively expect education (and other systems) to follow.
I am, however, curious to learn more about the sorts of trials Goldacre argues can be done cheaply, efficiently, and effectively in education. As he points out, medical trials have developed into complex designs involving treatments that are not just based on one pill or quantifiable medication regimen versus another – trials of psychotherapeutic interventions, for example. My understanding is that steps must be taken in RCTs in education to ensure compliance – that what is being done in classrooms is actually what the trial is supposed to be testing (as happens in some medical trials). This often requires teams of researchers to check and observe what is being done, which is very expensive, or places a burden of documentation on teachers. I’m not saying it’s not possible. I’m saying that I’d like to see Goldacre’s vision of cheap, robust RCTs (which involves all sorts of considerations about levels of randomisation and their relationship to levels of outcome measurement) explained in more detail.
Why medicine over other models?
What Goldacre doesn’t address, except perhaps indirectly in his introduction where he attributes the leaps forward made in medicine to RCTs, is why medicine should be held up as a model to replicate rather than other systems. Sure, medicine has come a long way in recent decades. So have other aspects of life, too. Why are educational practices and evidence best approached in a medicine-like fashion? Why not sociological? Anthropological? Why not arts-based? Why are medical models better than approaches that might acknowledge things that RCTs can’t, like morality, or justice? Answers like ‘because those aren’t objective’ or ‘can’t be tested fairly’ miss the point (it should be obvious why). I’d love to see Goldacre develop more sophisticated arguments as to why medicine should trump other conceptions of evidence and other notions of evidence-practice relationships. I am guessing there is more to this than the mere fact that Goldacre knows medicine best, but we would benefit from further explanation on such issues. I’m not defending educational research against outside influences. On the contrary, I strongly believe we are and will be better off for these. But we should understand this as a complex choice, with significant implications depending on which perspectives get left out of the arguments.
On the defensive reaction by some qualitative educational researchers
Goldacre replies to some of the responses to the Guardian article as follows:
“It is very odd, I think we’ve seen some rather peculiar protectionism here from qualitative researchers working in education. I’ve not seen this attitude among the very good multidisciplinary teams working on mixed methods approaches to medical research, where quantitative and qualitative research is done harmoniously with mutual respect, in my experience at any rate. It may be a peculiarity of the qualitative research community in education, or it may be that we are seeing only bad apples in this thread. I don’t think they do their profession any favours”.
What might lie beneath ‘protectionism’? Why might qualitative educational researchers react differently from their medical colleagues in mixed-methods teams? Why would we expect them to react in the same way?
Notwithstanding the histories of marginalisation that many qualitative researchers would argue they have suffered at the hands of pseudo-scientific dominance in educational research, I think part of the explanation lies in some of the ways in which Goldacre’s language might be interpreted, and the genuine sense of threat that such interpretations could pose to some scholars’ values, ethical commitments, and livelihoods.
Why might qualitative educational researchers (of which I am one) react differently from medical researchers in mixed-methods teams? Maybe because many of us are not in mixed-methods teams (for better or worse), but instead collaborate in other ways, for example working with teachers and schools in solely qualitative paradigms. Arguments that the pendulum should swing back to re-emphasise RCTs can be interpreted as a move that will diminish the place of other approaches. This was not Goldacre’s intention, but this is what many perceive has happened in the US as a result of the way federal funding for educational research has been allocated. Protectionism seems quite understandable, as part of a professional ethos that preserves mutual respect and place for different kinds of research (an ethos that Goldacre himself subscribes to). What surprises me is that Goldacre was surprised by this reaction.
On the representation of qualitative research
“‘Qualitative’ research – such as asking people open questions about their experiences – can help give a better understanding of how and why things worked, or failed, on the ground. This kind of research can also be useful for generating new questions about what works best, to be answered with trials. But qualitative research is very bad for finding out whether an intervention has worked… The trick is to ensure that the right method is used to answer the right questions.” (p.13)
I agree wholly with the point that the right method should be used to answer the right questions. In my view Goldacre’s paper does not adequately capture the range or value of qualitative approaches, and risks them being positioned as subservient to trials. I do not follow qualitative researchers who campaign against RCTs. I think we should have more of them. But this should not be at the expense of other approaches, and certainly not based on accounts of qualitative research that convey a potentially misleading and diminished view of what the alternatives are and what they offer. Goldacre does not clarify the extent to which he thinks ‘what works’ questions should trump other questions. Protectionism may reflect concerns that others may take Goldacre’s arguments as a basis for a narrowing of the kind of question (and by implication the kind of research that is valued) in educational research.
Indeed, Goldacre makes the very good point, I think, that educational research (or at least that which focuses on teaching and learning in schools) could be enhanced by pursuing agendas and questions from the ground up, i.e. those identified as priorities by teachers. This would be very welcome, although I would always seek to preserve space for outsiders to pose questions, too, for they can often challenge assumptions and see possibilities that are difficult to imagine from the inside. But the bigger problem here arises if RCTs become a blanket preferential mode of enquiry (which is not what Goldacre advocates, but is not implausible). Rather than opening up the possibility for teachers to lead the direction of research, this would close it down by limiting the kind of questions that teachers can ask to those of a ‘what works’ variety. There are myriad other important kinds of question that teachers want to ask, too.
The overlooked value of locally-based, locally-relevant research
There’s something curious about Goldacre’s critique of piecemeal individual projects that are oriented to figuring out what works locally, and his open admission that RCTs don’t often generalise: there is rarely going to be a ‘what works’ solution that applies to all schools, age groups, subjects etc. I agree that isolated pockets of poorly supported research that never leave the boundaries of a particular institution aren’t a great set-up. So yes, we need more joined-up infrastructure, for research and for disseminating and sharing evidence. But could not such local projects also be ideal ways to test out, empirically, and in an evidence-based way, how local conditions shape the meaning of RCT outcomes developed elsewhere? Might not some of these projects into which teachers pour their heart and soul, which Goldacre criticises for turning out to be too small, lacking robust design (p.17), in fact be avenues for translating distant evidence into locally relevant forms?
A related point concerns the critical research literacies mentioned by Goldacre. These are of course important, and if teaching is to benefit from any kind of research evidence, there must be critical appraisals of that evidence. But that critique cannot be limited just to understanding RCTs and ‘what works?’ kinds of research. Such critical skills should also involve understanding different approaches and their value. Goldacre doesn’t close off on the kind of critical understanding he’s advocating for, but I think it’s important to be really clear that a narrow RCT-focused literacy will not suffice.
On the perils of misinterpretation
In the criticism of small local projects mentioned above (p.17), we find an example of how some of the language used might have (perhaps unintentionally) provoked the strong reaction from some qualitative researchers. There is potential for readers to infer from Goldacre’s wording an equation of ‘small’ with lacking robustness (and such readings are readily apparent in the comments on the Guardian webpage). If big sample = better research, then we should look away from RCTs and more to statistical analyses of existing datasets, and the use of various regression models to figure out which schools are performing best. Sample size is a poor proxy for research quality. Goldacre knows this, but not all of his readers appear to notice this point.
There are other moments, too, for example when Goldacre mentions the risk of pilot studies ‘misleading’ on benefits and harms. I agree such risks are real and important. But it is not the pilot itself that poses the risk. It is the flawed interpretation or application of findings that poses the risks. Qualitative researchers might be forgiven for interpreting what was written as laying the problem at the door of qualitative research itself, rather than at the door of those who misuse or abuse its outcomes. Yes, RCTs are the only true ‘fair test’, but this doesn’t make other approaches ‘unfair’ provided they’re not doing a certain kind of test. Goldacre knows this. Many of his readers may miss this point.
Then there is the different language used in the 20 minute talk, which was less precise in its wording and thus more open to misinterpretation. Goldacre spoke of people being ‘horribly misled by weaker forms of evidence’. Any evidence, from an RCT or otherwise, has the potential to horribly mislead. Any evidence, from an RCT or otherwise, may be strong or weak depending on the question. The care Goldacre took in his written paper to manage these issues was less evident in the speech, in which listeners could easily be led into equating RCTs with strong evidence, and other approaches with misleading and weak evidence. That equation holds only for ‘what works’ questions. The impression is reinforced by phrasing that links good quality evidence with RCTs, again without explicitly placing a caveat of ‘only if we are asking is X better than Y’. And again in talk of swinging the balance towards more robust quantitative research. More robust than what? The potential for listeners to interpret this as a slight against qualitative research, or as a suggestion that qualitative evidence by definition lacks robustness, is clear.
As an aside, Goldacre also contrasts ‘nerdy academics’ with ‘teachers on the ground’ – setting up another potentially damaging binary. In particular this kind of talk fuels the Govian anti-academic rhetoric and misleads the public into outdated conceptions of ivory tower academics (see Pat Thomson’s blog on this). Many educational researchers are in schools week in, week out, working with new and experienced teachers. They are ‘on the ground’. They are also ‘on the ground’ because most academics are also teachers, themselves. This applies not only, but particularly, to educational researchers. If having a scholarly or theoretical interest in learning and pedagogy makes us nerds, then I’ll wear the nerd badge with pride. But I do take issue with characterisations that reinforce notions of aloof nerdiness against on the ground realism.
Another binary set up in Goldacre’s talk is between evidence-based practice on the one hand, and leaving everything to individual professional judgement on the other. I’m convinced Goldacre has a more sophisticated view of practice than this – his writing about the need for critical appraisal of research suggests so – but again this kind of phrase can provoke defensive reactions, and risks being taken up in unhelpful ways if not set in a wider context.
On the risk of over-promising
Finally, there are some other very real risks that Goldacre himself acknowledges. He rightly says that evidence-based practice isn’t about telling teachers what to do. As if evidence (from RCTs or otherwise) could ever be so prescriptive. Goldacre imagines a greater role for RCTs and networked participation of teachers in research, supported by experts, and feeding into two-way information architectures, as a means of setting the profession free from governments. Forgive me if I don’t hold my breath, for the simple reason that even if we were to achieve everything Goldacre sets out, it would offer few guarantees for children’s outcomes or teacher professional independence. Goldacre does not imply otherwise, but does not engage adequately with other features of the political-practice landscape.
Many teachers and educational researchers share a view that the education system in the US, which Goldacre notes funds way more RCTs in educational research than the UK, is straining – with many school buildings in urgent need of renewal, and high-stakes testing policies exerting a significant influence on practice. Outcomes from the What Works Clearinghouse are undoubtedly valuable, but do not land on a tabula rasa. And of course, not all RCTs change practice.
Even with more, better RCTs, and research cultures and information architectures of the sort Goldacre imagines, without stability in other aspects of the education system (for example in curriculum content, examinations, accountability structures, inspection regimes), any knowledge of ‘what works’ seems likely to be reduced in value either through short lifespan (it worked in the old system, but not the one now), or by simply failing to register on the radar in a profession that is straining from incessant change. I agree that, in theory, what Goldacre proposes might play an important role in emancipating the profession from ‘the odd spectacle of governments telling teachers how to teach’ (p. 19). I wonder how likely this promise is to hold true.
Some of the protectionism that Goldacre is surprised to see, and so strongly puts down as reflecting poorly on our profession, may in fact be understood as people with passionate commitments to precisely the same aims and improvements that Goldacre wishes to see, differing in their confidence in the whole of his vision becoming a reality, and clear in their understanding that even if it were to be realised, it would not be enough to secure the kind of conditions they feel best serve children and teachers.
I do educational research (and I’ll confess, it is of a qualitative kind most of the time) because I think research-based evidence has a lot to offer teaching and learning. Like Goldacre, I don’t think we have exclusive rights to this kind of influence, nor do I think there is no space for political ideology. I’m all for more evidence, better evidence, greater research literacies, more joined-up research, and weaker divides between academe and schools. But please let us treat visions such as that set out by Goldacre with the careful and critical reading to which we should subject research.
Mortimore, P. 2000, ‘Does educational research matter?’, British Educational Research Journal, vol. 26, no. 1, pp. 5-24.