Philosophy
The fine art of critiquing an academic paper
"Researchers don't mean to exaggerate, but lots of things can distort findings"
It's possible people are not bothering to report negative results alongside the positive ones they found
by Ben Goldacre
August 12th, 2011
guardian.co.uk
You may have seen some news stories saying one part of the brain is bigger, or smaller, in people with a certain mental health problem, or even a specific job. These are generally based on real, published research. But how reliable are the studies?
One way of critiquing a piece of scientific research is to read the academic paper in detail, looking for flaws. But that may not be enough if some sources of bias lie outside the paper, in the wider system of science.
By now you'll be familiar with publication bias: the phenomenon where studies with boring, negative results are less likely to get written up or published. You can estimate this using a tool such as, say, a funnel plot. The principle is simple: expensive landmark studies are harder to brush under the carpet, but small ones can disappear more easily. So split your studies into "big ones" and "small ones": if the small studies, averaged out together, give a more positive result than the big studies, then maybe some small negative studies have gone missing in action.
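As a rough illustration of that split, here is a minimal sketch in Python; the effect sizes, sample sizes and the median split are all invented for illustration, not taken from any real meta-analysis.

```python
# Minimal sketch of the "big studies vs small studies" check described above.
# All numbers are made up for illustration; a real analysis would use the
# published effect sizes and sample sizes from the studies themselves.
import numpy as np

# (effect size, sample size) for a handful of hypothetical studies
studies = [(0.45, 18), (0.60, 22), (0.55, 15), (0.10, 120), (0.08, 150), (0.12, 95)]

effects = np.array([e for e, n in studies])
sizes = np.array([n for e, n in studies])

threshold = np.median(sizes)            # split into "small" and "big" studies
small = effects[sizes < threshold].mean()
big = effects[sizes >= threshold].mean()

print(f"Average effect in small studies: {small:.2f}")
print(f"Average effect in big studies:   {big:.2f}")
# If small studies look systematically more positive than big ones,
# that hints that small negative studies may have gone unpublished.
```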
Sadly, this doesn't work with brain scan studies, because there's not enough variation in the sizes of the studies. So Professor John Ioannidis, a godlike figure in the field of "research about research", took a different approach. He collected a large, representative sample of these anatomical studies, counted up how many positive results they got, and how positive those results were, and then compared this to how many similarly positive results you could plausibly have expected to detect, simply from the sizes of the studies.
This can be derived from something called the "power calculation". Everyone knows that the more data you collect for a piece of research, the greater your ability to detect a modest effect. What people often miss is that the size of sample needed also changes with the size of the effect you're trying to detect: detecting a true 0.2% difference in the size of the hippocampus between two groups, say, would need more subjects than a study aiming to detect a 25% difference.
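To make that point concrete, here is a small sketch using the statsmodels power calculator; the effect sizes are expressed as standardised differences (Cohen's d) rather than the percentage differences mentioned above, and the exact numbers are illustrative only.

```python
# Rough sketch of the power-calculation point: the smaller the true effect
# you hope to detect, the more participants you need for the same power.
from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()

for d in (0.1, 0.3, 0.8):  # small, medium, large standardised effects
    n = power_analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"Effect size d={d}: ~{n:.0f} participants per group for 80% power")
```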
By working backwards and sideways from these kinds of calculations, Ioannidis was able to determine, from the sizes of effects measured and from the numbers of people scanned, how many positive findings could plausibly have been expected, and to compare that with how many were actually reported. The answer was stark: even being generous, there were twice as many positive findings as you could realistically have expected from the amount of data reported on.
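In the same spirit (though not a reproduction of Ioannidis's actual method), a sketch of that expected-versus-observed comparison might look like this; the sample sizes, the assumed true effect and the count of observed positives are all invented for illustration.

```python
# Loose sketch of an "excess significance" comparison: for an assumed true
# effect size, compute each study's power, sum the powers to get the expected
# number of positive findings, and compare with the number actually observed.
from scipy.stats import binom
from statsmodels.stats.power import TTestIndPower

sample_sizes = [15, 20, 25, 18, 30, 22, 16, 40]   # per-group n for each study
observed_positives = 6                             # significant results reported
assumed_effect = 0.4                               # plausible true effect (Cohen's d)

analysis = TTestIndPower()
powers = [analysis.power(effect_size=assumed_effect, nobs1=n, alpha=0.05)
          for n in sample_sizes]
expected = sum(powers)

print(f"Expected positive findings: {expected:.1f} of {len(sample_sizes)}")
print(f"Observed positive findings: {observed_positives}")

# Crude binomial check, treating the studies as comparable: how likely is it
# to see at least this many positives if the average power is as computed?
p_mean = expected / len(sample_sizes)
p_excess = 1 - binom.cdf(observed_positives - 1, len(sample_sizes), p_mean)
print(f"P(at least {observed_positives} positives by chance): {p_excess:.3f}")
```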
What could explain this? Inadequate blinding is an issue: a fair amount of judgment goes into measuring the size of a brain area on a scan, so wishful nudges can creep in. And boring old publication bias is another: maybe whole negative papers aren't getting published.
But a final, more interesting explanation is also possible. In these kinds of studies, it's possible that many brain areas are measured to see if they're bigger or smaller, and maybe, then, only the positive findings get reported within each study.
There is one final line of evidence to support this. In studies of depression, for example, 31 studies report data on the hippocampus, six on the putamen and seven on the prefrontal cortex. Perhaps more investigators really did focus solely on the hippocampus. But given how easy it is to measure the size of another area – once you've recruited and scanned your participants – it's also possible that people are measuring these other areas, finding no change and not bothering to report that negative result in their paper alongside the positive ones they've found.
There's only one way to prevent this: researchers would have to publicly pre-register which areas they plan to measure and then report all findings. In the absence of that process, the entire field might be distorted, by a form of exaggeration that is – we trust – honest and unconscious, but more interestingly, collective and disseminated.