Quantitative Reasoning at Yale-NUS College

27 June, 2013

What mathematics should any well-educated person know? It’s rather rare that people have a chance not just to think about this question, but to do something about it. But it’s happening now.

There’s a new college called Yale-NUS College starting up this fall in Singapore, jointly run by Yale College and the National University of Singapore. The buildings aren’t finished yet: the above picture shows how a bit of it should look when they are. Faculty are busily setting up the courses and indeed the whole administrative structure of the university, and I’ve had the privilege of watching some of this and even helping out a bit.

It’s interesting because you usually meet an institution when it’s already formed—and you encounter and learn about only those aspects that matter to you. But in this case, the whole institution is being created, and every aspect discussed. And this is especially interesting because Yale-NUS College is designed to be a ‘liberal arts college for Asia for the 21st century’.

As far as I can tell, there are no liberal arts colleges in Asia. Creating a good one requires rethinking the generally Eurocentric attitudes toward history, philosophy, literature, classics and so on that are built into the traditional idea of the liberal arts. Plus, the whole idea of a liberal arts education needs to be rethought for the 21st century. What should a well-educated person know, and be able to do? Luckily, the faculty of Yale-NUS College are taking a fresh look at this question, and coming up with some new answers.

I’m really excited about the Quantitative Reasoning course that all students will take in the second semester of their first year. It will cover topics like this:

• innumeracy, use of numbers in the media.
• visualizing quantitative data.
• cognitive biases, operationalization.
• qualitative heuristics, cognitive biases, formal logic and mathematical proof.
• formal logic, mathematical proofs.
• probability, conditional probability (Bayes’ rule), gambling and odds.
• decision trees, expected utility, optimal decisions and prospect theory.
• sampling, uncertainty.
• quantifying uncertainty, hypothesis testing, p-values and their limitations.
• statistical power and significance levels, evaluating evidence.
• correlation and causation, regression analysis.

The idea is not to go into vast detail or to bombard the students with sophisticated mathematical methods, but to help them:

• learn how to criticize and question claims in an informed way;

• learn to think clearly, to understand logical and intuitive reasoning, and to consider appropriate standards of proof in different contexts;

• develop a facility and comfort with a variety of representations of quantitative data, and practical experience in gathering data;

• understand the sources of bias and error in seemingly objective numerical data;

• become familiar with the basic concepts of probability and statistics, with particular emphasis on recognizing when these techniques provide reliable results and when they threaten to mislead us.

They’ll do some easy calculations using R, a programming language optimized for statistics.
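To give a flavor of what those calculations might look like, here is a tiny example in R of Bayes’ rule applied to a diagnostic test. The numbers are made up purely for illustration; they are not from the course materials:

# Bayes' rule for a diagnostic test (made-up numbers, purely illustrative).
prevalence  <- 0.01   # P(disease)
sensitivity <- 0.95   # P(test positive | disease)
specificity <- 0.90   # P(test negative | no disease)

# P(test positive), by the law of total probability:
p_positive <- sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# P(disease | test positive), by Bayes' rule:
sensitivity * prevalence / p_positive   # about 0.088, far lower than most people guess

Even a calculation this small makes the point about innumeracy: when a condition is rare, a positive result from a seemingly accurate test can still leave less than a 10% chance of actually having the condition.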

Most exciting of all to me is how the course will be taught. There will be about 9 teachers. It will be ‘team-based learning’, where students are divided into (carefully chosen) groups of six. A typical class will start with a multiple-choice question designed to test the students’ understanding of the material they’ve just studied. Then the teams will discuss their answers, while professors walk around and help out; then they’ll take the quiz again; and then one professor will talk about that topic.

This idea is called ‘peer instruction’. Some studies have shown this approach works better than the traditional lecture style. I’ve never seen it in action, though my friend Christopher Lee now uses it in his bioinformatics class, and he says it’s great. You can read about its use in physics here:

• Eric Mazur, Physics Education.

I’ll be interested to see it in action starting in August, and later I hope to teach part-time at Yale-NUS College and see how it works for myself!

At the very least, it’s exciting to see people try new things.


The Selected Papers Network (Part 2)

14 June, 2013

Last time Christopher Lee and I described some problems with scholarly publishing. The big problems are expensive journals and ineffective peer review. But we argued that solving these problems requires new methods of

selection—assessing papers

and

endorsement—making the quality of papers known, thus giving scholars the prestige they need to get jobs and promotions.

The Selected Papers Network is an infrastructure for doing both these jobs in an open, distributed way. It’s not yet the solution to the big visible problems—just a framework upon which we can build those solutions. It’s just getting started, and it can use your help.

But before I talk about where all this is heading, and how you can help, let me say what exists now.

This is a bit dangerous, because if you’re not sure what a framework is for, and it’s not fully built yet, it can be confusing to see what’s been built so far! But if you’ve thought about the problems of scholarly publishing, you’re probably sick of hearing about dreams and hopes. You probably want to know what we’ve done so far. So let me start there.

SelectedPapers.net as it stands today

SelectedPapers.net lets you recommend papers, comment on them, discuss them, or simply add them to your reading list.

But instead of “locking up” your comments within its own website—the “walled garden” strategy followed by many other services—it explicitly shares these data in a way that people not on SelectedPapers.net can easily see. Any other service can see and use them too. It does this by using existing social networks—so that users of those social networks can see your recommendations and discuss them, even if they’ve never heard of SelectedPapers.net!

The idea is simple. You add some hashtags to let SelectedPapers.net know you’re talking to it, and to let it know which paper you’re talking about. It notices these hashtags and copies your comments over to its publicly accessible database.

So far Christopher Lee has got it working on Google+. So right now, if you’re a Google+ user, you can post comments on SelectedPapers.net using your usual Google+ identity and posting process, just by including suitable hashtags. Your post will be seen by your usual audience—but also by people visiting the SelectedPapers.net website, who don’t use Google+.

If you want to strip the idea down to one sentence, it’s this:

Given that social networks already exist, all we need for truly open scientific communication is a convention on a consistent set of tags and IDs for discussing papers.

That makes it possible to integrate discussion from all social networks—big and small—as a single unified forum. It’s a federated approach, rather than a single isolated website. And it won’t rely on any one social network: after Google+, we can get it working for Twitter and other networks and forums.

But more about the theory later. How, exactly, do you use it?

Getting Started

To see how it works, take a look here:

https://selectedpapers.net

Under ‘Recent activity’ you’ll see comments and recommendations of different papers, so far mostly on the arXiv.

Support for other social networks such as Twitter is coming soon. But here’s how you can use it now, if you’re a member of Google+:

• We suggest that you first create (in your Google+ account) a Google+ Circle specifically for the people you want to discuss research with (e.g. call it “Research”). If you already have such a circle, or circles, you can just use those.

• Click Sign in with Google on https://selectedpapers.net or on a paper discussion page.

• The usual Google sign-in window will appear (unless you are already signed in). Google will ask if you want to use the Selected Papers network, and specifically which Circle(s) to let it see the membership lists of (i.e. the names of people you have added to those Circles). SelectedPapers.net uses this as your initial “subscriptions”, i.e. the list of people whose recommendations you want to receive. We suggest you limit this to your “Research” circle, or whatever Circle(s) of yours fit this purpose.

Note the only information you are giving SelectedPapers.net access to is this list of names; in all other respects SelectedPapers.net is limited by Google+ to the same information that anyone on the internet can see, i.e. your public posts. For example, SelectedPapers.net cannot ever see your private posts within any of your Circles.

• Now you can initiate and join discussions of papers directly on any SelectedPapers.net page.

• Alternatively, without even signing in to SelectedPapers.net, you can just write posts on Google+ containing the hashtag #spnetwork, and they will automatically be included within the SelectedPapers.net discussions (i.e. indexed and displayed so that other people can reply to them, etc.). Here’s an example of such a Google+ post:

This article by Perelman outlines a proof of the Poincare conjecture!

#spnetwork #mustread #geometry #poincareConjecture arXiv:math/0211159

You need the tag #spnetwork for SelectedPapers.net to notice your post. Tags like #mustread, #recommend, and so on indicate your attitude to a paper. Tags like #geometry, #poincareConjecture and so on indicate a subject area: they let people search for papers by subject. A tag of the form arXiv:math/0211159 is necessary for arXiv papers; note that this does not include a # symbol.

For PubMed papers, include a tag of the form PMID:22291635. Other published papers usually have a DOI (digital object identifier), so for those include a tag of the form doi:10.3389/fncom.2012.00001.

Tags are the backbone of SelectedPapers.net; you can read more about them here.
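To make the convention concrete, here is a rough sketch in R of how any service could pull the relevant tags and paper IDs out of a post like the one above. This is just an illustration of the tagging convention, not the actual SelectedPapers.net code:

# A toy tag extractor: a sketch of the convention, not SelectedPapers.net's real code.
post <- "This article by Perelman outlines a proof of the Poincare conjecture!
#spnetwork #mustread #geometry #poincareConjecture arXiv:math/0211159"

hashtags  <- regmatches(post, gregexpr("#[A-Za-z0-9]+", post))[[1]]
paper_ids <- regmatches(post, gregexpr("(arXiv:[A-Za-z0-9./-]+|PMID:[0-9]+|doi:[^ ]+)", post))[[1]]

indexed <- "#spnetwork" %in% hashtags   # only posts carrying #spnetwork get picked up
hashtags    # "#spnetwork" "#mustread" "#geometry" "#poincareConjecture"
paper_ids   # "arXiv:math/0211159"

Anything that can run a regular expression over public posts can do this, which is the whole point of the federated design.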

• You can also post and see comments at https://selectedpapers.net. This page also lets you search for papers in the arXiv and search for published papers via their DOI or Pubmed ID. If you are signed in, the homepage will also show the latest recommendations (from people you’re subscribed to), papers on your reading list, and papers you tagged as interesting for your work.

Papers

Papers are the center of just about everything on the selected papers network. Here’s what you can currently do with a paper:

• click to see the full text of the paper via the arXiv or the publisher’s website.

• read other people’s recommendations and discussion of the paper.

• add it to your Reading List. This is simply a private list of papers—a convenient way of marking a paper for further attention later. When you are logged in, your Reading list is shown on the homepage. No one else can see your reading list.

• share the paper with others (such as your Google+ Circles or Google+ communities that you are part of).

• tag it as interesting for a specific topic. You do this either by clicking the checkbox of a topic (the page shows the topics that other readers have tagged the paper with), by selecting from a list of topics that you have previously tagged as interesting to you, or by simply typing a tag name. These tags are public; that is, everyone can see what topics the paper has been tagged with, and who tagged them.

• post a question or comment about the paper, or reply to what other people have said about it. This traffic is public. Specifically, clicking the Discuss this Paper button gives you a Google+ window (with appropriate tags already filled in) for writing a post. Note that in order for the spnet to see your post, you must include Public in the list of recipients for your post (this is an inherent limitation of Google+, which limits apps to seeing only the same posts that any internet user would see, even when you are signed in to the app as yourself on Google+).

• recommend it to others. Once again, you must include Public in the list of recipients for your post, or the spnet cannot see it.

We strongly suggest that you include a topic hashtag for your research interest area. For example, if there is a hashtag that people in your field commonly use for posting on Twitter, use it. If you have to make up a new hashtag, keep it intuitive and follow “camelCase” capitalization e.g. #openPeerReview.

Open design

Note that thanks to our open design, you do not even need to create a SelectedPapers.net login. Instead, SelectedPapers.net authenticates with Google (for example) that you are signed in to Google+; you never give SelectedPapers.net your Google password or access to any confidential information.

Moreover, even when you are signed in to SelectedPapers.net using your Google sign-in, it cannot see any of your private posts, only those you posted publicly—in other words, exactly the same as what anybody on the Internet can see.

What to do next?

We really need some people to start using SelectedPapers.net and start giving us bug reports. The place to do that is here:

https://github.com/cjlee112/spnet/issues

or if that’s too difficult for some reason, you can just leave a comment on this blog entry.

We could also use people who can write software to improve and expand the system. I can think of fifty ways the setup could be improved: but as usual with open-source software, what matters most is not what you suggest, but what you’re willing to do.

Next, let me mention three things we could do in the longer term. But I want to emphasize that these are just a few of many things that can be done in the ecosystem created by a selected papers network. We don’t all need to do the same thing, since it’s an open, federated system.

Overlay journals. A journal doesn’t need to do distribution and archiving of papers anymore: the arXiv or PubMed can do that. A journal can focus on the crucial work of selection and endorsement—it can just point to a paper on the arXiv or PubMed, and say “this paper is published”. Such journals, called overlay journals, are already being contemplated—see for example Tim Gowers’ post. But they should work better in the ecosystem created by a selected papers network.

Review boards. Publication doesn’t need to be a monogamous relation between a journal and an author. We could also have prestigious ‘review boards’ like the Harvard Genomics Board or the Institute of Network Science who pick, every so often, what they consider to be the best papers in their chosen area. In their CVs, scholars could then say things like “this paper was chosen as one of the Top Ten Papers in Topology in 2015 by the International Topology Review Board”. Of course, boards would become prestigious in the usual recursive way: by having prestigious members, being associated with prestigious institutions, and correctly choosing good papers to bestow prestige upon. But all this could be done quite cheaply.

Open peer review. Last time, we listed lots of problems with how journals referee papers. Open peer review is a way to solve these problems. I’ll say more about it next time. For now, go here:

• Christopher Lee, Open peer review by a selected-papers network, Frontiers in Computational Neuroscience 6 (2012).

A federated system

After reading this, you may be tempted to ask: “Doesn’t website X already do most of this? Why bother starting another?”

Here’s the answer: our approach is different because it is federated. What does that mean? Here’s the test: if somebody else were to write their own implementation of the SelectedPapers.net protocol and run it on their own website, would data entered by users of that site show up automatically on selectedpapers.net, and vice versa? The answer is yes, because the protocol transports its data on open, public networks, so the same mechanism that allows selectedpapers.net to read its users’ messages would work for anyone else. Note that no special communications between the new site and SelectedPapers.net would be required; it is just federated by design!

One more little website is not going to solve the problems with journals. The last thing anybody wants is another password to remember! There are already various sites trying to solve different pieces of the problem, but none of them are really getting traction. One reason is that the different sites can’t or won’t talk to each other—that is, federate. They are walled gardens, closed ecosystems. As a result, progress has been stalled for years.

And frankly, even if some walled garden did eventually win out, that wouldn’t solve the problem of expensive journals. If one party became able to control the flow of scholarly information, they’d eventually exploit this just as the journals do now.

So, we need a federated system, to make scholarly communication openly accessible not just for scholars but for everyone—and to keep it that way.


The Selected Papers Network (Part 1)

7 June, 2013

Christopher Lee has developed some new software called the Selected Papers Network. I want to explain that and invite you all to try using it! But first, in this article, I want to review the problems it’s trying to address.

There are lots of problems with scholarly publishing, and of course even more with academia as a whole. But I think Chris and I are focused on two: expensive journals, and ineffective peer review.

Expensive Journals

Our current method of publication has some big problems. For one thing, the academic community has allowed middlemen to take over the process of publication. We, the academic community, do most of the really tricky work. In particular, we write the papers and referee them. But they, the publishers, get almost all the money, and charge our libraries for it—more and more, thanks to their monopoly power. It’s an amazing business model:

Get smart people to work for free, then sell what they make back to them at high prices.

People outside academia have trouble understanding how this continues! To understand it, we need to think about what scholarly publishing and libraries actually achieve. In short:

1. Distribution. The results of scholarly work get distributed in publicly accessible form.

2. Archiving. The results, once distributed, are safely preserved.

3. Selection. The quality of the results is assessed, e.g. by refereeing.

4. Endorsement. The quality of the results is made known, giving the scholars the prestige they need to get jobs and promotions.

Thanks to the internet, jobs 1 and 2 have become much easier. Anyone can put anything on a website, and work can be safely preserved at sites like the arXiv and PubMed Central. All this is either cheap or already supported by government funds. We don’t need journals for this.

The journals still do jobs 3 and 4. These are the jobs that academia still needs to find new ways to do, to bring down the price of journals or make them entirely obsolete.

The big commercial publishers like to emphasize how they do job 3: selection. The editors contact the referees, remind them to deliver their referee reports, and communicate these reports to the authors, while maintaining the anonymity of the referees. This takes work.

However, this work can be done much more cheaply than you’d think from the prices of journals run by the big commercial publishers. We know this from the existence of good journals that charge much less. And we know it from the shockingly high profit margins of the big publishers, particularly Elsevier.

It’s clear that the big commercial publishers are using their monopoly power to charge outrageous prices for their products. Why do they continue to get away with this? Why don’t academics rebel and publish in cheaper journals?

One reason is a broken feedback loop. The academics don’t pay for journals out of their own pocket. Instead, their university library pays for the journals. Rising journal costs do hurt the academics: money goes into paying for journals that could be spent in other ways. But most of them don’t notice this.

The other reason is item 4: endorsement. This is the part of academic publishing that outsiders don’t understand. Academics want to get jobs and promotions. To do this, we need to prove that we’re ‘good’. But academia is so specialized that our colleagues are unable to tell how good our papers are. Not by actually reading them, anyway! So, they try to tell by indirect methods—and a very important one is the prestige of the journals we publish in.

The big commercial publishers have bought most of the prestigious journals. We can start new journals, and some of us are already doing that, but it takes time for these journals to become prestigious. In the meantime, most scholars prefer to publish in prestigious journals owned by the big publishers, even if this slowly drives their own libraries bankrupt. This is not because these scholars are dumb. It’s because a successful career in academia requires the constant accumulation of prestige.

The Elsevier boycott shows that more and more academics understand this trap and hate it. But hating a trap is not enough to escape the trap.

Boycotting Elsevier and other monopolistic publishers is a good thing. The arXiv and PubMed Central are good things, because they show that we can solve the distribution and archiving problems without the help of big commercial publishers. But we need to develop methods of scholarly publishing that solve the selection and endorsement problems in ways that can’t be captured by the big commercial publishers.

I emphasize ‘can’t be captured’, because these publishers won’t go down without a fight. Anything that works well, they will try to buy—and then they will try to extract a stream of revenue from it.

Ineffective Peer Review

While I am mostly concerned with how the big commercial publishers are driving libraries bankrupt, my friend Christopher Lee is more concerned with the failures of the current peer review system. He does a lot of innovative work on bioinformatics and genomics. This gives him a different perspective from mine. So, let me just quote the list of problems from this paper:

• Christopher Lee, Open peer review by a selected-papers network, Frontiers in Computational Neuroscience 6 (2012).

The rest of this section is a quote:

Expert peer review (EPR) does not work for interdisciplinary peer review (IDPR). EPR means the assumption that the reviewer is expert in all aspects of the paper, and thus can evaluate both its impact and validity, and can evaluate the paper prior to obtaining answers from the authors or other referees. IDPR means the situation where at least one part of the paper lies outside the reviewer’s expertise. Since journals universally assume EPR, this creates artificially high barriers to innovative papers that combine two fields [Lee, 2006]—one of the most valuable sources of new discoveries.

Shoot first and ask questions later means the reviewer is expected to state a REJECT/ACCEPT position before getting answers from the authors or other referees on questions that lie outside the reviewer’s expertise.

No synthesis: if review of a paper requires synthesis—combining the different expertise of the authors and reviewers in order to determine what assumptions and criteria are valid for evaluating it—both of the previous assumptions can fail badly [Lee, 2006].

Journals provide no tools for finding the right audience for an innovative paper. A paper that introduces a new combination of fields or ideas has an audience search problem: it must search multiple fields for people who can appreciate that new combination. Whereas a journal is like a TV channel (a large, pre-defined audience for a standard topic), such a paper needs something more like Google—a way of quickly searching multiple audiences to find the subset of people who can understand its value.

Each paper’s impact is pre-determined rather than post-evaluated: By ‘pre-determination’ I mean that both its impact metric (which for most purposes is simply the title of the journal it was published in) and its actual readership are locked in (by the referees’ decision to publish it in a given journal) before any readers are allowed to see it. By ‘post-evaluation’ I mean that impact should simply be measured by the research community’s long-term response and evaluation of it.

Non-expert PUSH means that a pre-determination decision is made by someone outside the paper’s actual audience, i.e., the reviewer would not ordinarily choose to read it, because it does not seem to contribute sufficiently to his personal research interests. Such a reviewer is forced to guess whether (and how much) the paper will interest other audiences that lie outside his personal interests and expertise. Unfortunately, people are not good at making such guesses; history is littered with examples of rejected papers and grants that later turned out to be of great interest to many researchers. The highly specialized character of scientific research, and the rapid emergence of new subfields, make this a big problem.

In addition to such false-negatives, non-expert PUSH also causes a huge false-positive problem, i.e., reviewers accept many papers that do not personally interest them and which turn out not to interest anybody; a large fraction of published papers subsequently receive zero or only one citation (even including self-citations [Adler et al., 2008]). Note that non-expert PUSH will occur by default unless reviewers are instructed to refuse to review anything that is not of compelling interest for their own work. Unfortunately journals assert an opposite policy.

One man, one nuke means the standard in which a single negative review equals REJECT. Whereas post-evaluation measures a paper’s value over the whole research community (‘one man, one vote’), standard peer review enforces conformity: if one referee does not understand or like it, prevent everyone from seeing it.

PUSH makes refereeing a political minefield: consider the contrast between a conference (where researchers publicly speak up to ask challenging questions or to criticize) vs. journal peer review (where it is reckoned necessary to hide their identities in a ‘referee protection program’). The problem is that each referee is given artificial power over what other people can like—he can either confer a large value on the paper (by giving it the imprimatur and readership of the journal) or consign it zero value (by preventing those readers from seeing it). This artificial power warps many aspects of the review process; even the ‘solution’ to this problem—shrouding the referees in secrecy—causes many pathologies. Fundamentally, current peer review treats the reviewer not as a peer but as one who wields a diktat: prosecutor, jury, and executioner all rolled into one.

Restart at zero means each journal conducts a completely separate review process of a paper, multiplying the costs (in time and effort) for publishing it in proportion to the number of journals it must be submitted to. Note that this particularly impedes innovative papers, which tend to aim for higher-profile journals, and are more likely to suffer from referees’ IDPR errors. When the time cost for publishing such work exceeds by several fold the time required to do the work, it becomes more cost-effective to simply abandon that effort, and switch to a ‘standard’ research topic where repetition of a pattern in many papers has established a clear template for a publishable unit (i.e., a widely agreed checklist of criteria for a paper to be accepted).

The reviews are thrown away: after all the work invested in obtaining reviews, no readers are permitted to see them. Important concerns and contributions are thus denied to the research community, and the referees receive no credit for the vital contribution they have made to validating the paper.

In summary, current peer review is designed to work for large, well-established fields, i.e., where you can easily find a journal with a high probability that every one of your reviewers will be in your paper’s target audience and will be expert in all aspects of your paper. Unfortunately, this is just not the case for a large fraction of researchers, due to the high level of specialization in science, the rapid emergence of new subfields, and the high value of boundary-crossing research (e.g., bioinformatics, which intersects biology, computer science, and math).

Toward solutions

Next time I’ll talk about the software Christopher Lee has set up. But if you want to get a rough sense of how it works, read the section of Christopher Lee’s paper called The Proposal in Brief.


Meta-Rationality

15 March, 2013

On his blog, Eli Dourado writes something that’s very relevant to the global warming debate, and indeed most other debates.

He’s talking about Paul Krugman, but I think with small modifications we could substitute the name of almost any intelligent pundit. I don’t care about Krugman here, I care about the general issue:

Nobel laureate, Princeton economics professor, and New York Times columnist Paul Krugman is a brilliant man. I am not so brilliant. So when Krugman makes strident claims about macroeconomics, a complex subject on which he has significantly more expertise than I do, should I just accept them? How should we evaluate the claims of people much smarter than ourselves?

A starting point for thinking about this question is the work of another Nobelist, Robert Aumann. In 1976, Aumann showed that under certain strong assumptions, disagreement on questions of fact is irrational. Suppose that Krugman and I have read all the same papers about macroeconomics, and we have access to all the same macroeconomic data. Suppose further that we agree that Krugman is smarter than I am. All it should take, according to Aumann, for our beliefs to converge is for us to exchange our views. If we have common “priors” and we are mutually aware of each other’s views, then if we do not agree ex post, at least one of us is being irrational.

It seems natural to conclude, given these facts, that if Krugman and I disagree, the fault lies with me. After all, he is much smarter than I am, so shouldn’t I converge much more to his view than he does to mine?

Not necessarily. One problem is that if I change my belief to match Krugman’s, I would still disagree with a lot of really smart people, including many people as smart as or possibly even smarter than Krugman. These people have read the same macroeconomics literature that Krugman and I have, and they have access to the same data. So the fact that they all disagree with each other on some margin suggests that very few of them behave according to the theory of disagreement. There must be some systematic problem with the beliefs of macroeconomists.

In their paper on disagreement, Tyler Cowen and Robin Hanson grapple with the problem of self-deception. Self-favoring priors, they note, can help to serve other functions besides arriving at the truth. People who “irrationally” believe in themselves are often more successful than those who do not. Because pursuit of the truth is often irrelevant in evolutionary competition, humans have an evolved tendency to hold self-favoring priors and self-deceive about the existence of these priors in ourselves, even though we frequently observe them in others.

Self-deception is in some ways a more serious problem than mere lack of intelligence. It is embarrassing to be caught in a logical contradiction, as a stupid person might be, because it is often impossible to deny. But when accused of disagreeing due to a self-favoring prior, such as having an inflated opinion of one’s own judgment, people can and do simply deny the accusation.

How can we best cope with the problem of self-deception? Cowen and Hanson argue that we should be on the lookout for people who are “meta-rational,” honest truth-seekers who choose opinions as if they understand the problem of disagreement and self-deception. According to the theory of disagreement, meta-rational people will not have disagreements among themselves caused by faith in their own superior knowledge or reasoning ability. The fact that disagreement remains widespread suggests that most people are not meta-rational, or—what seems less likely—that meta-rational people cannot distinguish one another.

We can try to identify meta-rational people through their cognitive and conversational styles. Someone who is really seeking the truth should be eager to collect new information through listening rather than speaking, construe opposing perspectives in their most favorable light, and offer information of which the other parties are not aware, instead of simply repeating arguments the other side has already heard.

All this seems obvious to me, but it’s discussed much too rarely. Maybe we can figure out ways to encourage this virtue that Cowen and Hanson call ‘meta-rationality’? There are already too many mechanisms that reward people for aggressively arguing for fixed positions. If Krugman really were ‘meta-rational’, he might still have his Nobel Prize, but he probably wouldn’t be a popular newspaper columnist.

The Azimuth Project, and this blog, are already doing a lot of things to prevent people from getting locked into fixed positions and filtering out evidence that goes against their views. Most crucial seems to be the policy of forbidding insults, bullying, and overly repetitive restatement of the same views. These behaviors increase what I call the ‘heat’ in a discussion, and I’ve decided that, all things considered, it’s best to keep the heat fairly low.

Heat attracts many people, so I’m sure we could get a lot more people to read this blog by turning up the heat. A little heat is a good thing, because it engages people’s energy. But heat also makes it harder for people to change their minds. When the heat gets too high, changing one’s mind is perceived as a defeat, to be avoided at all costs. Even worse, people form ‘tribes’ who back each other up in every argument, regardless of the topic. Rationality goes out the window. And meta-rationality? Forget it!

Some Questions

Dourado talks about ways to “identify meta-rational people.” This is very attractive, but I think it’s better to talk about “identifying when people are behaving meta-rationally”. I don’t think we should spend too much of our time looking around for paragons of meta-rationality. First of all, nobody is perfect. Second of all, as soon as someone gets a big reputation for rationality, meta-rationality, or any other virtue, it seems they develop a fan club that runs a big risk of turning into a cult. This often makes it harder rather than easier for people to think clearly and change their minds!

I’d rather look for customs and institutions that encourage meta-rationality. So, my big question is:

How can we encourage rationality and meta-rationality, and make them more popular?

Of course science, and academia, are institutions that have been grappling with this question for centuries. Universities, seminars, conferences, journals, and so on—they all put a lot of work into encouraging the search for knowledge and examining the conditions under which it thrives.

And of course these institutions are imperfect: everything humans do is riddled with flaws.

But instead of listing cases where existing institutions failed to do their job optimally, I’d like to think about ways of developing new customs and institutions that encourage meta-rationality… and linking these to the existing ones.

Why? Because I feel the existing institutions don’t reach out enough to the ‘general public’, or ‘laymen’. The mere existence of these terms is a clue. There are a lot of people who consider academia as an ‘ivory tower’, separate from their own lives and largely irrelevant. And there are a lot of good reasons for this.

There’s one you’ve heard me talk about a lot: academia has let its journals get bought by big multimedia conglomerates, who then charge high fees for access. So, we have scientific research on global warming paid for by our tax dollars, and published by prestigious journals such as Science and Nature… which unfortunately aren’t available to the ‘general public’.

That’s like a fire alarm you have to pay to hear.

But there’s another problem: institutions that try to encourage meta-rationality seem to operate by shielding themselves from the broader sphere that favors ‘hot’ discussions. Meanwhile, the hot discussions don’t get enough input from ‘cooler’ forums… and vice versa!

For example: we have researchers in climate science who publish in refereed journals, which mostly academics read. We have conferences, seminars and courses where this research is discussed and criticized. These are again attended mostly by academics. Then we have journalists and bloggers who try to explain and discuss these papers in more easily accessed venues. There are some blogs written by climate scientists, who try to short-circuit the middlemen a bit. Unfortunately the heated atmosphere of some of these blogs makes meta-rationality difficult. There are also blogs by ‘climate skeptics’, many from outside academia. These often criticize the published papers, but—it seems to me—rarely get into discussions with the papers’ authors in conditions that make it easy for either party to change their mind. And on top of all this, we have various think tanks who are more or less pre-committed to fixed positions… and of course, corporations and nonprofits paying for advertisements pushing various agendas.

Of course, it’s not just the global warming problem that suffers from a lack of public forums that encourage meta-rationality. That’s just an example. There have got to be some ways to improve the overall landscape a little. Just a little: I’m not expecting miracles!

Details

Here’s the paper by Aumann:

• Robert J. Aumann, Agreeing to disagree, The Annals of Statistics 4 (1976), 1236-1239.

and here’s the one by Cowen and Hanson:

• Tyler Cowen and Robin Hanson, Are disagreements honest?, 18 August 2004.

Personally I find Aumann’s paper uninteresting, because he’s discussing agents that are not only rational Bayesians, but rational Bayesians that share the same priors to begin with! It’s unsurprising that such agents would have trouble finding things to argue about.

His abstract summarizes his result quite clearly… except that he calls these idealized agents ‘people’, which is misleading:

Abstract. Two people, 1 and 2, are said to have common knowledge of an event E if both know it, 1 knows that 2 knows it, 2 knows that 1 knows it, 1 knows that 2 knows that 1 knows it, and so on.

Theorem. If two people have the same priors, and their posteriors for an event A are common knowledge, then these posteriors are equal.
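Written out a bit more formally (my paraphrase, not a quote from Aumann’s paper): suppose agents 1 and 2 share a common prior $P$ and have information partitions $\mathcal{P}_1$ and $\mathcal{P}_2$ of the set of states of the world. If at some state $\omega$ the posteriors $q_1 = P(A \mid \mathcal{P}_1)(\omega)$ and $q_2 = P(A \mid \mathcal{P}_2)(\omega)$ are common knowledge, then $q_1 = q_2$.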

Cowen and Hanson’s paper is more interesting to me. Here are some key sections for what we’re talking about here:

How Few Meta-rationals?

We can call someone a truth-seeker if, given his information and level of effort on a topic, he chooses his beliefs to be as close as possible to the truth. A non-truth seeker will, in contrast, also put substantial weight on other goals when choosing his beliefs. Let us also call someone meta-rational if he is an honest truth-seeker who chooses his opinions as if he understands the basic theory of disagreement, and abides by the rationality standards that most people uphold, which seem to preclude self-favoring priors.

The theory of disagreement says that meta-rational people will not knowingly have self-favoring disagreements among themselves. They might have some honest disagreements, such as on values or on topics of fact where their DNA encodes relevant non-self-favoring attitudes. But they will not have dishonest disagreements, i.e., disagreements directly on their relative ability, or disagreements on other random topics caused by their faith in their own superior knowledge or reasoning ability.

Our working hypothesis for explaining the ubiquity of persistent disagreement is that people are not usually meta-rational. While several factors contribute to this situation, a sufficient cause that usually remains when other causes are removed is that people do not typically seek only truth in their beliefs, not even in a persistent rational core. People tend to be hypocritical in having self-favoring priors, such as priors that violate indexical independence, even though they criticize others for such priors. And they are reluctant to admit this, either publicly or to themselves.

How many meta-rational people can there be? Even if the evidence is not consistent with most people being meta-rational, it seems consistent with there being exactly one meta-rational person. After all, in this case there never appears a pair of meta-rationals to agree with each other. So how many more meta-rationals are possible?

If meta-rational people were common, and able to distinguish one another, then we should see many pairs of people who have almost no dishonest disagreements with each other. In reality, however, it seems very hard to find any pair of people who, if put in contact, could not identify many persistent disagreements. While this is an admittedly difficult empirical determination to make, it suggests that there are either extremely few meta-rational people, or that they have virtually no way to distinguish each other.

Yet it seems that meta-rational people should be discernible via their conversation style. We know that, on a topic where self-favoring opinions would be relevant, the sequence of alternating opinions between a pair of people who are mutually aware of both being meta-rational must follow a random walk. And we know that the opinion sequence between typical non-meta-rational humans is nothing of the sort. If, when responding to the opinions of someone else of uncertain type, a meta-rational person acts differently from an ordinary non-meta-rational person, then two meta-rational people should be able to discern one another via a long enough conversation. And once they discern one another, two meta-rational people should no longer have dishonest disagreements. (Aaronson (2004) has shown that regardless of the topic or their initial opinions, any two Bayesians have less than a 10% chance of disagreeing by more than a 10% after exchanging about a thousand bits, and less than a 1% chance of disagreeing by more than a 1% after exchanging about a million bits.)

Since most people have extensive conversations with hundreds of people, many of whom they know very well, it seems that the fraction of people who are meta-rational must be very small. For example, given N people, a fraction f of whom are meta-rational, let each person participate in C conversations with random others that last long enough for two meta-rational people to discern each other. If so, there should be on average f^2CN/2 pairs who no longer disagree. If, across the world, two billion people, one in ten thousand of whom are meta-rational, have one hundred long conversations each, then we should see one thousand pairs of people with only honest disagreements. If, within academia, two million people, one in ten thousand of whom are meta-rational, have one thousand long conversations each, we should see ten agreeing pairs of academics. And if meta-rational people had any other clues to discern one another, and preferred to talk with one another, there should be far more such pairs. Yet, with the possible exception of some cult-like or fan-like relationships, where there is an obvious alternative explanation for their agreement, we know of no such pairs of people who no longer disagree on topics where self-favoring opinions are relevant.

We therefore conclude that unless meta-rationals simply cannot distinguish each other, only a tiny non-descript percentage of the population, or of academics, can be meta-rational. Either few people have truth-seeking rational cores, and those that do cannot be readily distinguished, or most people have such cores but they are in control infrequently and unpredictably. Worse, since it seems unlikely that the only signals of meta-rationality would be purely private signals, we each seem to have little grounds for confidence in our own meta-rationality, however much we would like to believe otherwise.
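Before moving on, it may help to check the arithmetic behind that estimate. Here is the quoted f^2CN/2 formula in R, just reproducing the figures in the passage above:

# Expected number of mutually agreeing pairs, f^2 * C * N / 2, from the quoted passage.
expected_pairs <- function(N, f, C) f^2 * C * N / 2

expected_pairs(N = 2e9, f = 1e-4, C = 100)    # about 1000 agreeing pairs worldwide
expected_pairs(N = 2e6, f = 1e-4, C = 1000)   # about 10 agreeing pairs within academia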

Personally, I think the failure to find ‘ten agreeing pairs of academics’ is not very interesting. Instead of looking for people who are meta-rational in all respects, which seems futile, I’m more interested in looking for contexts and institutions that encourage people to behave meta-rationally when discussing specific issues.

For example, there’s surprisingly little disagreement among mathematicians when they’re discussing mathematics and they’re on their best behavior—for example, talking in a classroom. Disagreements show up, but they’re often dismissed quickly when one or both parties realize their mistake. The same people can argue bitterly and endlessly over politics or other topics. They are not meta-rational people: I doubt such people exist. They are people who have been encouraged by an institution to behave meta-rationally in specific limited ways… because the institution rewards this behavior.

Moving on:

Personal policy implications

Readers need not be concerned about the above conclusion if they have not accepted our empirical arguments, or if they are willing to embrace the rationality of self-favoring priors, and to forgo criticizing the beliefs of others caused by such priors. Let us assume, however, that you, the reader, are trying to be one of those rare meta-rational souls in the world, if indeed there are any. How guilty should you feel when you disagree on topics where self-favoring opinions are relevant?

If you and the people you disagree with completely ignored each other’s opinions, then you might tend to be right more if you had greater intelligence and information. And if you were sure that you were meta-rational, the fact that most people were not might embolden you to disagree with them. But for a truth-seeker, the key question must be how sure you can be that you, at the moment, are substantially more likely to have a truth-seeking, in-control, rational core than the people you now disagree with. This is because if either of you have some substantial degree of meta-rationality, then your relative intelligence and information are largely irrelevant except as they may indicate which of you is more likely to be self-deceived about being meta-rational.

One approach would be to try to never assume that you are more meta-rational than anyone else. But this cannot mean that you should agree with everyone, because you simply cannot do so when other people disagree among themselves. Alternatively, you could adopt a “middle” opinion. There are, however, many ways to define middle, and people can disagree about which middle is best (Barns 1998). Not only are there disagreements on many topics, but there are also disagreements on how to best correct for one’s limited meta-rationality.

Ideally we would want to construct a model of the process of individual self-deception, consistent with available data on behavior and opinion. We could then use such a model to take the observed distribution of opinion, and infer where lies the weight of evidence, and hence the best estimate of the truth. [Ideally this model would also satisfy a reflexivity constraint: when applied to disputes about self-deception it should select itself as the best model of self-deception. If people reject the claim that most people are self-deceived about their meta-rationality, this approach becomes more difficult, though perhaps not impossible.]

A more limited, but perhaps more feasible, approach to relative meta-rationality is to seek observable signs that indicate when people are self-deceived about their meta-rationality on a particular topic. You might then try to disagree only with those who display such signs more strongly than you do. For example, psychologists have found numerous correlates of self-deception. Self-deception is harder regarding one’s overt behaviors, there is less self-deception in a galvanic skin response (as used in lie detector tests) than in speech, the right brain hemisphere tends to be more honest, evaluations of actions are less honest after those actions are chosen than before (Trivers 2000), self-deceivers have more self-esteem and less psychopathology, especially less depression (Paulhus 1986), and older children are better than younger ones at hiding their self-deception from others (Feldman & Custrini 1988). Each correlate implies a corresponding sign of self-deception.

Other commonly suggested signs of self-deception include idiocy, self-interest, emotional arousal, informality of analysis, an inability to articulate supporting arguments, an unwillingness to consider contrary arguments, and ignorance of standard mental biases. If verified by further research, each of these signs would offer clues for identifying other people as self-deceivers.

Of course, this is easier said than done. It is easy to see how self-deceiving people, seeking to justify their disagreements, might try to favor themselves over their opponents by emphasizing different signs of self-deception in different situations. So looking for signs of self-deception need not be an easier approach than trying to overcome disagreement directly by further discussion on the topic of the disagreement.

We therefore end on a cautionary note. While we have identified some considerations to keep in mind, were one trying to be one of those rare meta-rational souls, we have no general recipe for how to proceed. Perhaps recognizing the difficulty of this problem can at least make us a bit more wary of our own judgments when we disagree.


The Faculty of 1000

31 January, 2012

As of this minute, 1890 scholars have signed a pledge not to cooperate with the publisher Elsevier. People are starting to notice. According to this Wired article, the open-access movement is “catching fire”:

• David Dobbs, Testify: the open-science movement catches fire, Wired, 30 January 2012.


Now is a good time to take more substantial actions. But what?

Many things are being discussed, but it’s good to spend a bit of time thinking about the root problems and the ultimate solutions.

The world-wide web has made journals obsolete: it would be better to put papers on freely available archives and then let boards of top scholars referee them. But how do we get to this system?

In math and physics we have the arXiv, but nobody referees those papers. In biology and medicine, a board called the Faculty of 1000 chooses and evaluates the best papers, but there’s no archive: they get those papers from traditional journals.

Whoops—never mind! That was yesterday. Now the Faculty of 1000 has started an archive!

• Rebecca Lawrence, F1000 Research – join us and shape the future of scholarly communication, F1000, 30 January 2012.

• Ivan Oransky, An arXiv for all of science? F1000 launches new immediate publication journal, Retraction Watch, 30 January 2012.

This blog article says “an arXiv for all of science”, but it seems the new F1000 Research archive is just for biology and medicine. So now it’s time for the mathematicians and physicists to start catching up.


Azimuth on Google Plus (Part 5)

1 January, 2012

Happy New Year! I’m back from Laos. Here are seven items, mostly from the Azimuth Circle on Google Plus:

1) Phil Libin is the boss of a Silicon Valley startup. When he’s off travelling, he uses a telepresence robot to keep an eye on things. It looks like a stick figure on wheels. Its bulbous head has two eyes, which are actually a camera and a laser. On its forehead is a screen, where you can see Libin’s face. It’s made by a company called Anybots, and it costs just $15,000.


I predict that within my lifetime we’ll be using things like this to radically cut travel costs and carbon emissions for business and for conferences. It seems weird now, but so did telephones. Future models will be better to look at. But let’s try it soon!

• Laura Sydell, No excuses: robots put you in two places at once, Weekend Edition Saturday, 31 December 2011.

Bruce Bartlett and I are already planning for me to use telepresence to give a lecture on mathematics and the environment at Stellenbosch University in South Africa. But we’d been planning to use old-fashioned videoconferencing technology.

Anybots is located in Mountain View, California. That’s near Google’s main campus. Can anyone help me set up a talk on energy and the environment at Google, where I use an Anybot?

(Or, for that matter, anywhere else around there?)

2) A study claims to have found a correlation between weather and the day of the week! The claim is that there are more tornados and hailstorms in the eastern USA during weekdays. One possible mechanism could be that aerosols from car exhaust help seed clouds.


I make no claims that this study is correct. But at the very least, it would be interesting to examine their use of statistics and see if it’s convincing or flawed:

• Thomas Bell and Daniel Rosenfeld, Why do tornados and hailstorms rest on weekends?, Journal of Geophysical Research 116 (2011), D20211.

Abstract. This study shows for the first time statistical evidence that when anthropogenic aerosols over the eastern United States during summertime are at their weekly mid-week peak, tornado and hailstorm activity there is also near its weekly maximum. The weekly cycle in summertime storm activity for 1995–2009 was found to be statistically significant and unlikely to be due to natural variability. It correlates well with previously observed weekly cycles of other measures of storm activity. The pattern of variability supports the hypothesis that air pollution aerosols invigorate deep convective clouds in a moist, unstable atmosphere, to the extent of inducing production of large hailstones and tornados. This is caused by the effect of aerosols on cloud drop nucleation, making cloud drops smaller and hydrometeors larger. According to simulations, the larger ice hydrometeors contribute to more hail. The reduced evaporation from the larger hydrometeors produces weaker cold pools. Simulations have shown that too cold and fast-expanding pools inhibit the formation of tornados. The statistical observations suggest that this might be the mechanism by which the weekly modulation in pollution aerosols is causing the weekly cycle in severe convective storms during summer over the eastern United States. Although we focus here on the role of aerosols, they are not a primary atmospheric driver of tornados and hailstorms but rather modulate them in certain conditions.

Here’s a discussion of it:

• Bob Yirka, New research may explain why serious thunderstorms and tornados are less prevalent on the weekends, PhysOrg, 22 December 2011.

3) And if you like to check how people use statistics, here’s a paper that would be incredibly important if its findings were correct:

• Joseph J. Mangano and Janette D. Sherman, An unexpected mortality increase in the United States follows arrival of the radioactive plume from Fukushima: is there a correlation?, International Journal of Health Services 42 (2012), 47–64.

The title has a question mark in it, but it’s been cited in very dramatic terms in many places, for example this video entitled “Peer reviewed study shows 14,000 U.S. deaths from Fukushima”:

Starting at 1:31 you’ll see an interview with one of the paper’s authors, Janette Sherman.

14,000 deaths in the US due to Fukushima? Wow! How did they get that figure? This quote from the paper explains how:

During weeks 12 to 25 [after the Fukushima disaster began], total deaths in 119 U.S. cities increased from 148,395 (2010) to 155,015 (2011), or 4.46 percent. This was nearly double the 2.34 percent rise in total deaths (142,006 to 145,324) in 104 cities for the prior 14 weeks, significant at p < 0.000001 (Table 2). This difference between actual and expected changes of +2.12 percentage points (+4.46% – 2.34%) translates to 3,286 “excess” deaths (155,015 × 0.0212) nationwide. Assuming a total of 2,450,000 U.S. deaths will occur in 2011 (47,115 per week), then 23.5 percent of deaths are reported (155,015/14 = 11,073, or 23.5% of 47,115). Dividing 3,286 by 23.5 percent yields a projected 13,983 excess U.S. deaths in weeks 12 to 25 of 2011.

Hmm. Can you think of some potential problems with this analysis?
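If you want to poke at the numbers yourself, here is the paper’s arithmetic replayed step by step in R. This just reproduces the calculation quoted above; it is not an endorsement of the method:

# Replaying the paper's own arithmetic for weeks 12 to 25 after the disaster began.
rise_2011   <- 155015 / 148395 - 1       # about a 4.46% rise in deaths (119 cities)
rise_prior  <- 145324 / 142006 - 1       # about a 2.34% rise (104 cities, prior 14 weeks)
excess_rate <- rise_2011 - rise_prior    # about 2.1 percentage points of "excess"

excess_in_sample  <- 155015 * excess_rate    # roughly 3,300 "excess" deaths in the sample
reported_fraction <- (155015 / 14) / 47115   # these cities report about 23.5% of U.S. deaths
excess_in_sample / reported_fraction         # roughly 14,000 projected excess deaths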

In the interview, Janette Sherman also mentions increased death rates of children in British Columbia. Here’s the evidence the paper presents for that:

Shortly after the report [another paper by the authors] was issued, officials from British Columbia, Canada, proximate to the northwestern United States, announced that 21 residents had died of sudden infant death syndrome (SIDS) in the first half of 2011, compared with 16 SIDS deaths in all of the prior year. Moreover, the number of deaths from SIDS rose from 1 to 10 in the months of March, April, May, and June 2011, after Fukushima fallout arrived, compared with the same period in 2010. While officials could not offer any explanation for the abrupt increase, it coincides with our findings in the Pacific Northwest.

4) For the first time in 87 years, a wild gray wolf was spotted in California:

• Stephen Messenger, First gray wolf in 80 years enters California, Treehugger, 29 December 2011.

Researchers have been tracking this juvenile male using a GPS-enabled collar since it departed northern Oregon. In just a few weeks, it walked some 730 miles to California. It was last seen surfing off Malibu. Here is a photograph:

5) George Musser left the Centre for Quantum Technologies and returned to New Jersey, but not before writing a nice blog article explaining how the GRACE satellite uses the Earth’s gravitational field to measure the melting of glaciers:

• George Musser, Melting glaciers muck up Earth’s gravitational field, Scientific American, 22 December 2011.

6) The American Physical Society has started a new group: a Topical Group on the Physics of Climate! If you’re a member of the APS, and care about climate issues, you should join this.

7) Finally, here’s a cool picture taken in the Gulf of Alaska by Kent Smith:

He believes this was caused by fresher water meeting saltier water, but it doesn't sound like he's sure. Can anyone figure out what's going on? The foam where the waters meet is especially intriguing.


The Decline Effect

18 October, 2011

I bumped into a surprising article recently:

• Jonah Lehrer, Is there something wrong with the scientific method?, New Yorker, 13 December 2010.

It starts with a bit of a bang:

Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard for the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.

But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.

This phenomenon does have a name now: it’s called the decline effect. The article tells some amazing stories about it. If you’re in the mood for some fun, I suggest going to your favorite couch or café now, and reading them!

For example: John Ioannidis is the author of the most heavily downloaded paper in the open-access journal PLoS Medicine. It's called Why most published research findings are false.

In a separate study, published in JAMA, Ioannidis looked at the 49 most cited clinical research studies in three prestigious medical journals. 45 of them reported positive results. But of the 34 whose findings people tried to replicate, 41% were either directly contradicted or had their effect sizes significantly downgraded.

For more examples, read the article or listen to this radio show:

• Cosmic Habituation, Radiolab, 3 May 2011.

It’s a bit sensationalistic… but it’s fun. It features Jonathan Schooler, who discovered a famous effect in psychology, called verbal overshadowing. It doesn’t really matter what this effect is. What matters is that it showed up very strongly in his first experiments… but as he and others continued to study it, it gradually diminished over time! He got freaked out. And then looked around, and saw that this sort of decline happened all over the place, in lots of cases.

What could cause this ‘decline effect’? There are lots of possible explanations.

At one extreme, maybe the decline effect doesn’t really exist. Maybe this sort of decline just happens sometimes purely by chance. Maybe there are equally many cases where effects seem to get stronger each time they’re measured!

At the other extreme, a very disturbing possibility has been proposed by Jonathan Schooler. He suggests that somehow the laws of reality change when they’re studied, in such a way that initially strong effects gradually get weaker.

I don’t believe this. It’s logically possible, but there are lots of less radical explanations to rule out first.

But if it were true, maybe we could make the decline effect go away by studying it. The decline effect would itself decline!

Unless of course, you started studying the decline of the decline effect.

Okay. On to some explanations that are interesting but less far-out.

One plausible explanation is significance chasing. Scientists work really hard to find something that’s ‘statistically significant’ according to the widely-used criterion of having a p-value of less than 0.05.

That sounds technical, but basically all it means is this: if the expected situation were true, there would have been less than a 5% chance of finding a deviation at least as big as the one you actually found.

(To play this game, you have to say ahead of time what the ‘expected situation’ is: this is your null hypothesis.)
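In symbols (just a restatement of the above): writing H0 for the null hypothesis and D for how far your data deviate from what H0 predicts,

p = P( D ≥ the deviation you actually observed | H0 is true )

and 'statistically significant at the 0.05 level' just means p < 0.05.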

Why is significance chasing dangerous? How can it lead to the decline effect?

Well, here’s how to write a paper with a statistically significant result. Go through 20 different colors of jelly bean and see if people who eat them have more acne than average. There’s a good chance that one of your experiments will say ‘yes’ with a p-value of less than 0.05, just because 0.05 = 1/20. If so, this experiment gives a statistically significant result!

I took this example from Randall Munroe’s cartoon strip xkcd:

It’s funny… but it’s actually sad: some testing of drugs is not much better than this! Clearly a result obtained this way is junk, so when you try to replicate it, the ‘decline effect’ will kick in.

Another possible cause of the decline effect is publication bias: scientists and journals prefer positive results over null results, where no effect is found. And surely there are other explanations, too: for starters, all the ways people can fool themselves into thinking they’ve discovered something interesting.

For suggestions on how to avoid the evils of ‘publication bias’, try these:

• Jonathan Schooler, Unpublished results hide the decline effect, Nature 470 (2011), 437.

Putting an end to ‘significance chasing’ may require people to learn more about statistics:

• Geoff Cumming, Significant does not equal important: why we need the new statistics, 9 October 2011.

He explains the problem in simple language:

Consider a psychologist who's investigating a new therapy for anxiety. She randomly assigns anxious clients to the therapy group or a control group. You might think the most informative result would be an estimate of the benefit of therapy – the average improvement as a number of points on the anxiety scale – together with an indication of the uncertainty in that estimate: the confidence interval around that average. But psychology typically uses significance testing rather than estimation.

Introductory statistics books often introduce significance testing as a step-by-step recipe:

Step 1. Assume the new therapy has zero effect. You don’t believe this and you fervently hope it’s not true, but you assume it.

Step 2. You use that assumption to calculate a strange thing called a ‘p value’, which is the probability that, if the therapy really has zero effect, the experiment would have given a difference as large as you observed, or even larger.

Step 3. If the p value is small, in particular less than the hallowed criterion of .05 (that’s 1 chance in 20), you are permitted to reject your initial assumption—which you never believed anyway—and declare that the therapy has a ‘significant’ effect.

If that’s confusing, you’re in good company. Significance testing relies on weird backward logic. No wonder countless students every year are bamboozled by their introduction to statistics! Why this strange ritual they ask, and what does a p value actually mean? Why don’t we focus on how large an improvement the therapy gives, and whether people actually find it helpful? These are excellent questions, and estimation gives the best answers.

For half a century distinguished scholars have published damning critiques of significance testing, and explained how it hampers research progress. There’s also extensive evidence that students, researchers, and even statistics teachers often don’t understand significance testing correctly. Strangely, the critiques of significance testing have hardly prompted any defences by its supporters. Instead, psychology and other disciplines have simply continued with the significance testing ritual, which is now deeply entrenched. It’s used in more than 90% of published research in psychology, and taught in every introductory textbook.
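To make the contrast concrete, here is a minimal sketch, in Python with made-up data, of Cumming's three steps alongside the estimation-style summary he recommends: the mean improvement and a 95% confidence interval. The scores and group sizes are hypothetical.

# Steps 1-3 of the significance-testing ritual for a hypothetical anxiety
# therapy, followed by the estimate Cumming would rather we report.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up improvements on an anxiety scale (higher = more improvement).
therapy = rng.normal(loc=6.0, scale=8.0, size=40)   # therapy group
control = rng.normal(loc=2.0, scale=8.0, size=40)   # control group

# Steps 1-3: assume zero effect, compute p, compare with .05.
t_stat, p = stats.ttest_ind(therapy, control)
print(f"p = {p:.3f} ->", "'significant'" if p < 0.05 else "not 'significant'")

# Estimation: how big is the benefit, and how precisely do we know it?
n1, n2 = len(therapy), len(control)
diff = therapy.mean() - control.mean()
pooled_var = ((n1 - 1) * therapy.var(ddof=1) +
              (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
margin = stats.t.ppf(0.975, n1 + n2 - 2) * se
print(f"mean improvement: {diff:.1f} points, "
      f"95% CI [{diff - margin:.1f}, {diff + margin:.1f}]")

The p-value answers a roundabout question about the null hypothesis; the last line answers the question the psychologist actually cares about: roughly how much better off are the clients who got the therapy, and how precisely do we know that?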

For more discussion and references, try my co-blogger:

• Tom Leinster, Fetishizing p-values, n-Category Café.

He gives some good examples of how significance testing can lead us astray. Anyone who uses the p-test should read these! He also discusses this book:

• Stephen T. Ziliak and Deirdre N. McCloskey, The Cult of Statistical Significance, University of Michigan Press, Ann Arbor, 2008. (Online summary here.)

Now, back to the provocative title of that New Yorker article: “Is there something wrong with the scientific method?”

The answer is yes if we mean science as actually practiced, now. Lots of scientists are using cookbook recipes they learned in statistics class without understanding them, or investigating the alternatives. Worse, some are treating statistics as a necessary but unpleasant piece of bureaucratic red tape, and then doing whatever it takes to achieve the appearance of a significant result!

This is a bit depressing. There's a student I know who is taking an introductory statistics course. After she read about this stuff, she said:

So, what I’m gleaning here is that what I’m studying is basically bull. It struck me as bull to start with, admittedly, but since my grade depended on it, I grinned and swallowed. At least my eyes are open now, I guess.

But there’s some good news, buried in her last sentence. Science has the marvelous ability to notice and correct its own mistakes. It’s scientists who noticed the decline effect and significance chasing. They’ll eventually figure out what’s going on, and learn how to fix any mistakes that they’ve been making. So ultimately, I don’t find this story depressing. It’s actually inspiring!

The scientific method is not a fixed rulebook handed down from on high. It’s a work in progress. It’s only been around for a few centuries—not very long, in the grand scheme of things. The widespread use of statistics in science has been around for less than one century. And computers, which make heavy-duty number-crunching easy, have only been cheap for 30 years! No wonder people still use primitive cookbook methods for analyzing data, when they could do better.

So science is still evolving. And I think that’s fun, because it means we can help it along. If you see someone claim their results are statistically significant, you can ask them what they mean, exactly… and what they had to do to get those results.


I thank a lot of people on Google+ for discussions on this topic, including (but not limited to) John Forbes, Roko Mijic, Heather Vandagriff, and Willie Wong.