Unreliable Biomedical Research

An American drug company, Amgen, that tried to replicate 53 landmark studies in cancer was able to reproduce the original results in only 6 cases—even though they worked with the original researchers!

That’s not all. Scientists at the pharmaceutical company Bayer were able to reproduce the published results in just a quarter of 67 studies!

How could things be so bad? The picture here shows two reasons:

If most interesting hypotheses are false, a lot of positive results will be ‘false positives’. Negative results may be more reliable. But few people publish negative results, so we miss out on those!
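Here is the arithmetic behind that point as a small sketch. The numbers are assumptions chosen for illustration: 1000 hypotheses, 10% of them true, tests that detect a real effect 80% of the time, and a 5% false-positive rate.

```python
def screening_outcomes(n_hypotheses, frac_true, power, alpha):
    """Expected outcomes when many hypotheses are tested (illustrative only)."""
    n_true = n_hypotheses * frac_true      # hypotheses that are actually true
    n_false = n_hypotheses - n_true        # hypotheses that are actually false
    true_pos = n_true * power              # real effects the test detects
    false_neg = n_true - true_pos          # real effects the test misses
    false_pos = n_false * alpha            # false hypotheses that pass by chance
    true_neg = n_false - false_pos         # false hypotheses correctly rejected
    return {
        "share_of_positives_false": false_pos / (true_pos + false_pos),
        "share_of_negatives_false": false_neg / (true_neg + false_neg),
    }

# Assumed inputs: 1000 hypotheses, 10% true, 80% power, 5% significance level.
print(screening_outcomes(1000, 0.10, 0.80, 0.05))
```

With these assumed inputs, 45 of the 125 positive results (36%) are false, while only 20 of the 875 negative results (about 2.3%) are.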

And then there’s wishful thinking, sloppiness and downright fraud. Read this Economist article for more on the problems—and how to fix them:

Trouble at the lab, Economist, 18 October 2013.

That’s where I got the picture above.

18 Responses to Unreliable Biomedical Research

  1. svik says:

    Maybe the underpaid grad student knows the trick to get it to work. Or maybe there was a software error that was not replicated the 2nd time.

  2. arch1 says:

    It seems hard to avoid the conclusion that a lot of scientists, including a lot of senior scientists (else this could not have become so pervasive), have lost sight of what science is for.

    • André Joyal says:

      Biomedical research is predominantly experimental. The stakes are very high: a vaccine could save millions of lives. Also, a magic potion can win the industry a pot of gold. Negative results are depressing: who wants to know that a complex treatment does not cure the common cold? The faintest indication that a new cocktail may do some good could raise exaggerated hopes. Multiple independent verifications are absolutely essential in this field. End of the story?

  3. I can’t tell what they’re trying to say in the first paragraph under “Not even wrong”. It seems as if they’re maligning PLoS One, but it’s a good thing that PLoS One doesn’t look for “novelty and significance” — because they’re trying to address one of the problems you mention above, the unpublish-ability of negative results.

  4. Negative results aren’t sexy; they don’t please funders: “we worked on this project for x years and got no result; the synthesis didn’t work; the name reaction failed to produce any product after x years of work; molecular properties predicted by the professor’s pet theory are explained (more comprehensively or more accurately) by another theory (dangerous if you really want that PhD with Prof as your research director)”, and so on.

    Recent example: Permaculturists advocate keyline tilling, and recently an extensive study of that technique was performed by a prof from the soil sciences department of a reputable university. The claim by the proponents of the theory is that it markedly increases soil fertility over conventional methods. The site selection and tilling were done by a proponent of the technique, who sells plows costing $7,000 to $10,000 each for this kind of tilling. The study was done over a two-and-a-half-year period, at four separate farms, with thousands of baseline samples, samples taken during the tilling study period, and samples taken after the study period.

    Result: No measurable increase or change in fertility by any criterion was found. The proponent of the method immediately condemned the study – even though he chose the tilling pattern and did the tilling himself. There’s a huge outcry against science being used to evaluate permaculture and especially this method and so forth and so on, and this study will most probably be swept under the rug if at all possible. See: http://onpasture.com/2013/06/24/keyline-plowing-gets-you-522720-worms-for-280/

    The same thing has happened in mainstream hard science. A really famous French professor published a synthetic reaction that he claimed to have invented, which gave amazingly good results, in a highly reputable French journal (peer-reviewed, of course). This reaction had been in the literature for 10 years without apparent challenge, and a friend doing his PhD research at Cornell had been assigned to use it in one of his syntheses. It didn’t work. Not once in the course of two years of fiddling with reaction conditions and so forth, even though there’d been extensive consultation with the French prof. Finally someone got up the nerve to ask for the raw data, the experimental notebooks, and alas! they were missing: the dog ate them… I think the paper was finally retracted, but it cost my friend two wasted years of frustration.

  5. lee bloomquist says:


    The first training films about groupthink that I saw in corporate America were about groupthink in drug companies.

  6. Uncle Al says:

    The primary effect of SSRIs is dry mouth. “Double-blind studies” are wholly transparent to both sides of test populations. The bottom line is marketing, plus screwing up shellfish molting and reproduction from urinated metabolites. Dry mouths put dentists back in the dental caries business (abetted by unfluoridated bottled water – and its explosive popularity given dry mouths).

  7. I saw this Economist article (and several other similar ones) a while back when it came out and had read some of the original journal articles. There is certainly a problem not only with how science is being done, but also with how it is reported after the fact.

    The recent news about the NSA and how they track metadata for fighting terrorism should give us all a better example for how to collect data and report it.

    If we think of negative results as the negative of a photograph, the absence of light still shows us an outline of what is actually there. The problem is that humans aren’t as good at seeing this conceptually, and are even worse at it when the data isn’t a simple image but raw numbers. Science should be doing a better job of reporting negative results, particularly in biology, given the growing number of genomic studies and the impact of epigenomic data. These negative results will become even more important as our ability to handle and analyze big data improves.

  8. • Jonah Lehrer, The Truth Wears Off: Is there something wrong with the scientific method?, New Yorker, 13 December, 2010.

    It’s not just biomedical; it’s far, far more widespread.

    • The New Yorker article I just posted above quotes only one physics effect, at the very end: “the weak coupling ratio exhibited by decaying neutrons, which appears to have fallen by more than ten standard deviations between 1969 and 2001.” I can remember one more from my student days: the fine structure constant seems to have changed by more than six sigma (?) since the 1940s. Presumably this is all “experimental error”, but these are measurements made by world-class, top-notch, name-brand physicists … not exactly error-prone ding-dongs. So the explanation, whatever it is, is subtle. The typical publication-bias effect seems insufficient.
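To see why shifts of six or ten sigma are so striking, here is a sketch of how often a measurement would miss by that much by pure chance. It assumes the errors are Gaussian and the quoted error bars are honest; neither assumption is guaranteed for real experiments, which is exactly why such shifts point to underestimated (for example, systematic) errors.

```python
import math

def two_sided_tail(k):
    """Chance that a Gaussian error exceeds k standard deviations in either
    direction: P(|Z| > k) = erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2))

for k in (2, 6, 10):
    print(f"{k} sigma: {two_sided_tail(k):.3g}")
```

Under these assumptions a 2-sigma miss happens about 4.6% of the time, a 6-sigma miss roughly twice in a billion, and a 10-sigma miss around once in 10^23 tries.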

      • John Beattie says:

        That is an interesting article. Does anyone know what this sentence means?

        “…the weak coupling ratio exhibited by decaying neutrons, which appears to have fallen by more than ten standard deviations between 1969 and 2001.” [second-last paragraph of the article]

        “…fallen by ten standard deviations…” is what is giving me most difficulty.

        Does the sentence say that there has been a reduction in the measured value and the reduction equals ten times the estimate of the error, i.e. the error estimate made for the first measurement?

        I very much doubt if that is what it is saying but I feel I ought to give a guess at what it might be saying, in order to help show what my difficulty is.

      • John Baez says:

        John Beattie wrote:

        Does the sentence say that there has been a reduction in the measured value and the reduction equals ten times the estimate of the error, i.e. the error estimate made for the first measurement?

        Yes, that’s how I read the sentence.

        I doubt they’re claiming the weak interaction constant has actually decreased that much from 1969 to 2001, though one could read the sentence that way if one didn’t know physicists don’t believe constants of nature change so rapidly. (Some believe they change, but much more slowly.)
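The reading discussed here is a z-score: the change in the measured value divided by the error estimate. A sketch follows; the numbers are hypothetical, not the actual neutron-decay data. It also shows a common alternative convention (an assumption about how one might compare two measurements), which divides by both error bars combined in quadrature and so gives a smaller number of sigmas.

```python
import math

def shift_in_sigmas(old_value, old_sigma, new_value, new_sigma=None):
    """Number of standard deviations separating two measurements.

    If new_sigma is None, divide by the first measurement's error bar only.
    Otherwise combine the two (assumed independent) error bars in quadrature."""
    diff = old_value - new_value
    if new_sigma is None:
        return diff / old_sigma
    return diff / math.sqrt(old_sigma ** 2 + new_sigma ** 2)

# Hypothetical values chosen for illustration only.
print(shift_in_sigmas(1.00, 0.01, 0.90))         # first error bar only: about 10
print(shift_in_sigmas(1.00, 0.01, 0.90, 0.005))  # combined error bars: about 8.9
```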

        • Yes, I think that is what it is. Now, as I think about it, I believe you can dig up many more examples of this, e.g. in searches for gravitational waves or dark matter or neutrino mass (or older Higgs boson searches) or the Pioneer anomaly… I’d occasionally see graphs with ridiculously tiny error bars on them, and some pertinent phenomenon far, far outside those error bars, and I’d wonder: is that a misprint? Am I misunderstanding the graph? Were the people who made those error bars just plain wrong? How could measurement/prediction X have possibly been so confidently wrong?

          If the criterion for correctness in physics is that “the true answer must lie within a few lengths of error bars”, then I humbly suggest that a lot of what is published in physics is also wrong and/or unreproducible. Again, the explanation for why this is so is surely subtle.

          (Sorry, I have no specific examples at my fingertips, but I think that, with some attention, they can be found.)

    • John Baez says:

      I wrote a blog article here about Jonah Lehrer’s article:

      The decline effect, Azimuth, 18 October 2011.

      I dug up some good discussions of the ‘decline effect’ he discusses—the effect where initially an experiment is confirmed and later, slowly, it becomes harder to confirm.
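One mechanism often offered for a decline effect, sketched here as an assumption rather than as the explanation in Lehrer's article, is selection on significance: if early results get published only when they clear a significance threshold, the published estimates are biased upward, and later unselected replications drift back toward the true value. The true effect, noise level and threshold below are hypothetical.

```python
import random
import statistics

def published_effect_sizes(true_effect, sigma, n_studies, z_crit=1.96, seed=0):
    """Simulate noisy estimates of one true effect, then 'publish' only those
    whose estimate / sigma exceeds z_crit. Returns (mean of all estimates,
    mean of published estimates)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = [rng.gauss(true_effect, sigma) for _ in range(n_studies)]
    published = [e for e in estimates if e / sigma > z_crit]
    return statistics.mean(estimates), statistics.mean(published)

all_mean, published_mean = published_effect_sizes(0.2, 0.15, 10_000)
print(f"mean of all estimates:       {all_mean:.3f}")
print(f"mean of published estimates: {published_mean:.3f}")
```

With these assumed numbers the published estimates average well above the true effect of 0.2, while the full set of estimates stays close to it; a faithful replication behaves like the full set, so the effect appears to shrink.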

      • Chris Aldrich says:

        This also sounds similar to the concept of the “Proteus Phenomenon”, which is the “tendency for early findings in a new area of research to alternate between opposite conclusions”, which I ran across recently via WordSpy.

  9. Anonymous says:

    If the researchers were more interested in the actual truth, maybe they would design better experiments (by taking larger sample sizes, ruling out possible errors in the experiment more rigorously, etc.). However, the current funding system in academia does not give much incentive to search valiantly for the truth. Instead, it rewards quantity over quality and flashy (but wrong) results over solid but boring work. Even worse, the system actually penalizes rigour and solid, long-term work by kicking out people who are not able to (or do not want to) churn out publications at high speed, no matter what their quality is.

    I’m an advanced graduate student and I even consider myself relatively good at what I do. I also love physics, but the academic world and perverse financial incentives and high pressure to perform are things that make me constantly worry about my future and make me think that maybe I should go work in some other field instead and keep physics as just a hobby.

  10. David Lyon says:

    The 21st century model for collaborative problem solving contains concepts such as open source code, peer-to-peer data distribution and crowd-sourced funding. The core idea is to build secure, persistent systems by removing single points of failure. The current tragedy of the untrustworthiness of scientific publishing cannot be blamed on scientists, but on the failure-prone 20th century system that they feel trapped within. If the single funding source is corrupt, data analysis is twisted or data is lost. If the single steward of the data depends too much on publishing a certain result at a certain time, the research is spoiled. If the editors of the top handful of journals are corrupt or under outside control, research is guided in directions that they choose and other avenues are left unexplored.

    As a physics graduate student, I’ve completely lost faith in the way science is currently done and my PhD thesis date has therefore receded towards the end of time. However, I’m excited about the promise of 21st century science 2.0 that uses public data, a decentralized anonymous scientific reputation system, free peer-to-peer publishing with no middlemen, etc. Something much like the Selected Papers Network advocated by this community will be a vital component of science 2.0, but even more reform is needed throughout the entire supply chain of science.

  11. Hank Roberts says:

    I followed that plowing link through to where the guy who did the plow work commented:

    … I was under the impression that the study would unfortunately not be ‘publishable’ in academic journals due to an insufficient number of test plots at each site. My understanding was that an insufficient number of soil samples at each site and high variability between samples brought about results with low statistical reliability. Although the study results did indicate keyline plowing had no effect on soils (save for those lovely (but hungry) earthworms), it was not possible to determine if it was the result of the tool or under-sampling.
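The point about under-sampling can be made concrete with a statistical power calculation: with few samples and high variability, a study can easily miss a real effect, so "no effect found" is weak evidence. This is a sketch using a normal approximation for a two-sided, two-sample comparison of means; the effect size and spread below are hypothetical, not taken from the keyline study.

```python
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample comparison of means."""
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    se = sd * (2 / n_per_group) ** 0.5    # standard error of the difference in means
    shift = abs(effect) / se              # true effect measured in standard errors
    # Probability the test statistic lands beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical: a true difference of 0.10 in some soil measure, sample spread 0.30.
for n in (3, 10, 30, 100):
    print(n, round(approx_power(0.10, 0.30, n), 2))
```

Under these assumptions, three samples per group detect the effect only about 6% of the time, barely above the 5% false-positive rate, and even 100 per group give roughly a two-in-three chance.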
