Why Most Published Research Findings Are False

My title here is the eye-catching (but exaggerated!) title of this well-known paper:

• John P. A. Ioannidis, Why most published research findings are false, PLoS Medicine 2 (2005), e124.

It’s open-access, so go ahead and read it! Here is his bold claim:

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies to the most modern molecular research. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.

He’s not really talking about all ‘research findings’, just research that uses the

ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05.

His main interests are medicine and biology, but many of the problems he discusses are more general.

His paper is a bit technical—but luckily, one of the main points was nicely explained in the comic strip xkcd:

[The xkcd comic “Significant”, in which scientists test 20 colors of jelly beans for a link to acne; one color comes up “significant” at p < 0.05 and makes the headline.]

If you try 20 or more things, you should not be surprised when an event with probability 0.05 = 1/20 happens at least once! It’s nothing to write home about… and nothing to write a scientific paper about.
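
To put a number on it: if each test has a 5% chance of giving a false alarm, the chance of at least one false alarm in 20 independent tests is 1 − 0.95^20 ≈ 0.64. Here is a minimal sketch in Python, assuming independent tests at the 0.05 level:

    # Chance of at least one false positive among m independent tests,
    # each run at significance level alpha, when no real effect exists.
    def prob_false_alarm(m, alpha=0.05):
        return 1 - (1 - alpha) ** m

    for m in (1, 5, 20, 100):
        print(f"{m:3d} tests: chance of a spurious 'finding' = {prob_false_alarm(m):.2f}")

    # 20 tests gives about 0.64: one spurious jelly-bean result is to be expected.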

Even researchers who don’t make this mistake deliberately can do it accidentally. Ioannidis draws several conclusions, which he calls corollaries:

Corollary 1: The smaller the studies, the less likely the research findings are to be true. (If you test just a few jelly beans to see which ones ‘cause acne’, you can easily fool yourself.)

Corollary 2: The smaller the effects being measured, the less likely the research findings are to be true. (If you’re studying whether jelly beans cause just a tiny bit of acne, you can easily fool yourself.)

Corollary 3: The more quantities there are to find relationships between, the less likely the research findings are to be true. (If you’re studying whether hundreds of colors of jelly beans cause hundreds of different diseases, you can easily fool yourself.)

Corollary 4: The greater the flexibility in designing studies, the less likely the research findings are to be true. (If you use lots and lots of different tricks to see if different colors of jelly beans ‘cause acne’, you can easily fool yourself.)

Corollary 5: The more financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. (If there’s huge money to be made selling acne-preventing jelly beans to teenagers, you can easily fool yourself.)

Corollary 6: The hotter a scientific field, and the more scientific teams involved, the less likely the research findings are to be true. (If lots of scientists are eagerly doing experiments to find colors of jelly beans that prevent acne, it’s easy for someone to fool themselves… and everyone else.)

Ioannidis states his corollaries in more detail; I’ve simplified them to make them easy to understand, but if you care about this stuff, you should read what he actually says!
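
The quantitative core of the paper is a short Bayesian calculation. In Box 1, Ioannidis writes R for the prior odds that a tested relationship is real, α for the Type I error rate and β for the Type II error rate, and computes the positive predictive value, the chance that a claimed finding is actually true: PPV = (1 − β)R/(R − βR + α). Here is a minimal sketch of that calculation in Python; the values of R below are purely illustrative, not taken from the paper:

    # Positive predictive value (PPV) from Box 1 of Ioannidis (2005):
    #   PPV = (1 - beta) * R / (R - beta * R + alpha)
    # R     = prior odds that the relationship being tested is real
    # alpha = Type I error rate (the significance threshold)
    # beta  = Type II error rate, i.e. 1 - statistical power
    def ppv(R, alpha=0.05, beta=0.2):
        return (1 - beta) * R / (R - beta * R + alpha)

    for R in (0.001, 0.01, 0.1, 0.5, 1.0):
        print(f"prior odds R = {R:<5}  ->  PPV = {ppv(R):.2f}")

    # With 80% power and R = 0.01 (a long-shot hypothesis), PPV is about 0.14:
    # in that regime most claimed findings are false, even before any bias.

Bias and many teams independently testing the same hypothesis push the PPV lower still; the corollaries above describe those effects qualitatively.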

The Open Science Framework

Since his paper came out—and many others on this general theme—people have gotten more serious about improving the quality of statistical studies. One effort is the Open Science Framework.

Here’s what their website says:

The Open Science Framework (OSF) is part network of research materials, part version control system, and part collaboration software. The purpose of the software is to support the scientist’s workflow and help increase the alignment between scientific values and scientific practices.

Document and archive studies.

Move the organization and management of study materials from the desktop into the cloud. Labs can organize, share, and archive study materials among team members. Web-based project management reduces the likelihood of losing study materials due to computer malfunction, changing personnel, or just forgetting where you put the damn thing.

Share and find materials.

With a click, make study materials public so that other researchers can find, use and cite them. Find materials by other researchers to avoid reinventing something that already exists.

Detail individual contribution.

Assign citable, contributor credit to any research material – tools, analysis scripts, methods, measures, data.

Increase transparency.

Make as much of the scientific workflow public as desired – as it is developed or after publication of reports. Find public projects here.

Registration.

Registering materials can certify what was done in advance of data analysis, or confirm the exact state of the project at important points of the lifecycle such as manuscript submission or at the onset of data collection. Discover public registrations here.

Manage scientific workflow.

A structured, flexible system can provide efficiency gain to workflow and clarity to project objectives, as pictured.

CONSORT

Another group trying to improve the quality of scientific research is CONSORT, which stands for Consolidated Standards of Reporting Trials. This is mainly aimed at medicine, but it’s more broadly applicable.

The key here is the “CONSORT Statement”, a 25-point checklist saying what you should have in any paper about a randomized controlled trial, and a flow chart saying a bit about how the experiment should work.

What else?

What are the biggest other efforts that are being made to improve the quality of scientific research?

14 Responses to Why Most Published Research Findings Are False

  1. Jon Awbrey says:

    One can’t help thinking of Peter Medawar’s question, “Is The Scientific Paper A Fraud?”

  2. […] Professor John Carlos Baez tipped and explained John Ioannidis’ famous 2005 paper “Why Most Published Research […]

  3. How does the reasoning in the IPCC reports, and the studies they’re based on, stand up to the kind of scrutiny Ioannidis suggests?

    • John Baez says:

      That’s a huge question, since the IPCC reports are thousands of pages long and rely on vast numbers of different studies, and often a meta-analysis of these studies. I bet you’ll only get ideologically motivated answers if you pose the question at such a broad level: some people are motivated to attack the IPCC, and others to defend it. To get more interesting replies it’s probably better to focus on a more specific aspect of the IPCC reports.

  4. arch1 says:

    Eye-opening, thanks very much! Among other things I now have more insight into the ever changing stream of nutritional and medical guidelines coming at me from all directions.
    1) What is the “odds ratio” mentioned in Box 1 of the paper?
    2) Apologies for my ignorance, but how often is it possible to meaningfully estimate β and R? I suspect R is typically the sticky one. But even a rough and partly subjective PPV value would seem a valuable corrective to producers (and a very valuable caveat to consumers) of these studies.
    3) Even *I* realize that per-study estimation of u is a fool’s errand; but in cases where the bias-free PPV *is* meaningfully estimable, a journal could require that PPV be reported assuming each of a spread of plausible u-values, e.g. {0, 0.05, 0.1, 0.2}.

    • John Baez says:

      I don’t have much to say about question 2); maybe a statistician around here could tackle that one!

      As for question 1), here’s what Wikipedia says about the ‘odds ratio’:

      In statistics, imagine each individual in a population either does or does not have a property ″A,″ and also either does or does not have a property ″B.″ For example, ″A″ might be “has high blood pressure,″ and ″B″ might be ″drinks more than one alcoholic drink a day,″ where both properties need to be appropriately defined and quantified (the properties need not be medical, though, and they need not be ″good″ or ″bad″). The odds ratio (usually abbreviated ″OR″) is one of three main ways to quantify how strongly the having or not having of the property A is associated with having or not having the property B in that population. As the name implies, to compute the OR, one follows these steps: 1) computes the odds that an individual in the population has ″A″ given that he or she has ″B″ (probability of ″A″ given ″B″ divided by the probability of not-″A″ given ″B″); 2) Computes the odds that an individual in the population has ″A″ given that he or she does not have ″B″; and 3) Divides the first odds by the second odds to obtain the odds ratio, the OR. If the OR is greater than 1, then having ″A″ is ″associated″ with having ″B″ in the sense that the having of ″B″ raises (relative to not-having ″B″) the odds of having ″A.″ Note that this is not enough to establish that B is a contributing cause of ″A″: it could be that the association is due to a third property, ″C,″ which is a contributing cause of both ″A″ and ″B.″
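
      In code, those three steps boil down to a ratio of two odds. Here is a minimal sketch in Python, using made-up counts for the blood-pressure-and-drinking example above:

          # Odds ratio (OR) from a 2x2 table of counts:
          #                has A (high BP)   lacks A
          #   has B              a              b
          #   lacks B            c              d
          # OR = (odds of A given B) / (odds of A given not-B) = (a/b) / (c/d)
          def odds_ratio(a, b, c, d):
              return (a / b) / (c / d)

          # Hypothetical counts: 30 of 100 drinkers have high blood pressure,
          # versus 20 of 100 non-drinkers.
          print(odds_ratio(30, 70, 20, 80))   # (30/70) / (20/80) ≈ 1.71

      An odds ratio of 1 would mean no association; these invented numbers give an OR of about 1.7.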

  5. Reproducibility is a big deal in computational mathematics, both pure and applied. Last December I participated in a workshop at ICERM on this topic. You can read the final report, which is certainly an “effort to improve the quality of scientific research”, here: http://icerm.brown.edu/html/programs/topical/tw12_5_rcem/icerm_report.pdf

    • John Baez says:

      That looks good! Let me quote some, since most people will be too busy to click the link:

      For reproducibility to be fostered and maintained, workshop participants agreed that cultural changes need to take place within the field of computationally based research that instill the open and transparent communication of results as a default. Such a mode will increase productivity—less time wasted in trying to recover output that was lost or misplaced, less time wasted trying to double-check results in the manuscript with computational output, and less time wasted trying to determine whether other published results (or even their own) are truly reliable. Open access to any data used in the research and to both primary and auxiliary source code also provides the basis for research to be communicated transparently creating the opportunity to build upon previous work, in a similar spirit as open software provided the basis for Linux. Code and data should be made available under open licensing terms as discussed in Appendix F. This practice enables researchers both to benefit more deeply from the creative energies of the global community and to participate more fully in it. Most great science is built upon the discoveries of preceding generations and open access to the data and code associated with published computational science allows this tradition to continue. Researchers should be encouraged to recognize the potential benefits of openness and reproducibility.

      It is also important to recognize that there are costs and barriers to shifting to a practice of reproducible research, particularly when the culture does not recognize the value of developing this new paradigm or the effort that can be required to develop or learn to use suitable tools. This is of particular concern to young people who need to earn tenure or secure a permanent position. To encourage more movement towards openness and reproducibility, it is crucial that such work be acknowledged and rewarded. The current system, which places a great deal of emphasis on the number of journal publications and virtually none on reproducibility (and often too little on related computational issues such as verification and validation), penalizes authors who spend extra time on a publication rather than doing the minimum required to meet current community standards. Appropriate credit should given for code and data contributions including an expectation of citation. Another suggestion is to instantiate yearly award from journals and/or professional societies, to be awarded to investigators for excellent reproducible practice. Such awards are highly motivating to young researchers in particular, and potentially could result in a sea change in attitudes. These awards could also be cross-conference and journal awards; the collected list of award recipients would both increase the visibility of researchers following good practices and provide examples for others. More generally, it is unfortunate that software development and data curation are often discounted in the scientific community, and programming is treated as something to spend as little time on as possible. Serious scientists are not expected to carefully test code, let alone document it, in the same way they are trained to properly use other tools or document their experiments. It has been said in some quarters that writing a large piece of software is akin to building infrastructure such as a telescope rather than a creditable scientific contribution, and not worthy of tenure or comparable status at a research laboratory. This attitude must change if we are to encourage young researchers to specialize in computing skills that are essential for the future of mathematical and scientific research. We believe the more proper analog to a large scale scientific instrument is a supercomputer, whereas software reflects the intellectual engine that makes the supercomputers useful, and has scientific value beyond the hardware itself. Important computational results, accompanied by verification, validation, and reproducibility, should be accorded with honors similar to a strong publication record.

      Several tools were presented at the workshop that enable users to write and publish documents that integrate the text and figures seen in reports with code and data used to generate both text and graphical results, such as IPython, Sage notebooks, Lepton, knitr, and Vistrails. Slides for these talks are available on the wiki [1] and Appendix E discusses these and other tools in detail.

    • Graham Jones says:

      Yes, this does look good, but people have been saying things like this for a long time without much changing.

      This article Lost Branches on the Tree of Life
      http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001636
      documents the sad state of affairs in my area. For example, BEAST is a very popular program for phylogenetic analysis. The user must supply an xml file as input, and that file can be included in supplementary material when publishing. But

      “Our survey of publications that implemented BEAST revealed that only 11 out of 100 (11%) examined studies provided access to the underlying xml input file, which is critical for reproducing BEAST results.”

      and

      “Failure on a Massive Scale

      “Our findings indicate that while some journals (e.g., Evolution, Nature, PLOS Biology, Systematic Biology) currently require nucleotide sequence alignments, associated tree files, and other relevant data to be deposited in public repositories, most journals do not have these requirements; resultantly, the systematics community is doing a poor job of making the actual datasets available. More troublesome perhaps is that the situation has barely improved over the 12 years covered in this study (Figures 1 and 2). In addition, when data are deposited, they often do not include critical information such as what was actually included in data alignments (e.g., what characters were excluded, full taxon names; see Table S1 and Figure S1). Without accurate details describing how alignments were implemented, it is difficult or perhaps impossible to faithfully reproduce the study results. Additionally, parameters for the program BEAST are rarely made available for scrutiny. Lastly, in many cases when data were not deposited to TreeBASE, the authors indicated that the data could be obtained directly from them; however, our survey indicates this is typically not the case (only ~40% of authors even respond, and of these only a small percent actually provide the requested data)—hence, many alignments and analysis parameters seem to be lost forever.”

  6. […] John Baez does a nice job of simplifying the corollaries, and there is some good discussion to follow it on his blog. […]

  7. Roger Witte says:

    This article showing lack of rigour in open access journals gives further cause for alarm: http://www.sciencemag.org/content/342/6154/60.full

    • John Baez says:

      Yes, but note that this study in Science did not compare open access journals to other journals—like, umm, Science.

      This is bad, because if there are problems with all journals, or most journals, the study will make these seem like problems with open access journals! By analogy, consider a study of black American men who beat their wives that does not bother to compare the behavior of white men. Would that be wise?

      There’s a more detailed critique here:

      • Mike Taylor, Anti-tutorial: how to design and execute a really bad study, Sauropod Vertebra Picture of the Week, 7 October 2013.

      Quoting:

      Suppose, hypothetically, that you worked for an organisation whose nominal goal is the advancement of science, but which has mutated into a highly profitable subscription-based publisher. And suppose you wanted to construct a study that showed the alternative — open-access publishing — is inferior.

      What would you do?

      You might decide that a good way to test publishers is by sending them an obviously flawed paper and seeing whether their peer-review weeds it out.

      But you wouldn’t want to risk showing up subscription publishers. So the first thing you’d do is decide up front not to send your flawed paper to any subscription journals. You might justify this by saying something like “the turnaround time for traditional journals is usually months and sometimes more than a year. How could I ever pull off a representative sample?”.

      Next, you’d need to choose a set of open-access journals to send it to. At this point, you would carefully avoid consulting the membership list of the Open Access Scholarly Publishers Association, since that list has specific criteria and members have to adhere to a code of conduct. You don’t want the good open-access journals — they won’t give you the result you want.

      Instead, you would draw your list of publishers from the much broader Directory of Open Access Journals, since that started out as a catalogue rather than a whitelist. (That’s changing, and journals are now being cut from the list faster than they’re being added, but lots of old entries are still in place.)

      Then, to help remove many of the publishers that are in the game only to advance research, you’d trim out all the journals that don’t levy an article processing charge.

      But the resulting list might still have an inconveniently high proportion of quality journals. So you would bring down the quality by adding in known-bad publishers from Beall’s list of predatory open-access publishers.

      To make sure you get a good, impressive result that will have a lot of “impact”, you might find it necessary to discard some inconvenient data points, omitting from the results some open-access journals that rejected the paper.

      Now you have your results, it’s time to spin them. Use sweeping, unsupported generalisations like “Most of the players are murky. The identity and location of the journals’ editors, as well as the financial workings of their publishers, are often purposefully obscured.”

      The link is to another critique:

      • Gunther Eysenbach, Unscientific spoof paper accepted by 157 “black sheep” open access journals – but the Bohannon study has severe flaws itself, 5 October 2013.

      Clearly we need bloggers to keep an eye on the supposedly serious researchers who publish in Science.
