Meta-Rationality

15 March, 2013

On his blog, Eli Dourado writes something that’s very relevant to the global warming debate, and indeed most other debates.

He’s talking about Paul Krugman, but I think with small modifications we could substitute the name of almost any intelligent pundit. I don’t care about Krugman here, I care about the general issue:

Nobel laureate, Princeton economics professor, and New York Times columnist Paul Krugman is a brilliant man. I am not so brilliant. So when Krugman makes strident claims about macroeconomics, a complex subject on which he has significantly more expertise than I do, should I just accept them? How should we evaluate the claims of people much smarter than ourselves?

A starting point for thinking about this question is the work of another Nobelist, Robert Aumann. In 1976, Aumann showed that under certain strong assumptions, disagreement on questions of fact is irrational. Suppose that Krugman and I have read all the same papers about macroeconomics, and we have access to all the same macroeconomic data. Suppose further that we agree that Krugman is smarter than I am. All it should take, according to Aumann, for our beliefs to converge is for us to exchange our views. If we have common “priors” and we are mutually aware of each others’ views, then if we do not agree ex post, at least one of us is being irrational.

It seems natural to conclude, given these facts, that if Krugman and I disagree, the fault lies with me. After all, he is much smarter than I am, so shouldn’t I converge much more to his view than he does to mine?

Not necessarily. One problem is that if I change my belief to match Krugman’s, I would still disagree with a lot of really smart people, including many people as smart as or possibly even smarter than Krugman. These people have read the same macroeconomics literature that Krugman and I have, and they have access to the same data. So the fact that they all disagree with each other on some margin suggests that very few of them behave according to the theory of disagreement. There must be some systematic problem with the beliefs of macroeconomists.

In their paper on disagreement, Tyler Cowen and Robin Hanson grapple with the problem of self-deception. Self-favoring priors, they note, can help to serve other functions besides arriving at the truth. People who “irrationally” believe in themselves are often more successful than those who do not. Because pursuit of the truth is often irrelevant in evolutionary competition, humans have an evolved tendency to hold self-favoring priors and self-deceive about the existence of these priors in ourselves, even though we frequently observe them in others.

Self-deception is in some ways a more serious problem than mere lack of intelligence. It is embarrassing to be caught in a logical contradiction, as a stupid person might be, because it is often impossible to deny. But when accused of disagreeing due to a self-favoring prior, such as having an inflated opinion of one’s own judgment, people can and do simply deny the accusation.

How can we best cope with the problem of self-deception? Cowen and Hanson argue that we should be on the lookout for people who are “meta-rational,” honest truth-seekers who choose opinions as if they understand the problem of disagreement and self-deception. According to the theory of disagreement, meta-rational people will not have disagreements among themselves caused by faith in their own superior knowledge or reasoning ability. The fact that disagreement remains widespread suggests that most people are not meta-rational, or—what seems less likely—that meta-rational people cannot distinguish one another.

We can try to identify meta-rational people through their cognitive and conversational styles. Someone who is really seeking the truth should be eager to collect new information through listening rather than speaking, construe opposing perspectives in their most favorable light, and offer information of which the other parties are not aware, instead of simply repeating arguments the other side has already heard.

All this seems obvious to me, but it’s discussed much too rarely. Maybe we can figure out ways to encourage this virtue that Cohen and Hanson call ‘meta-rationality’? There are already too many mechanisms that reward people for aggressively arguing for fixed positions. If Krugman really were ‘meta-rational’, he might still have his Nobel Prize, but he probably wouldn’t be a popular newspaper columnist.

The Azimuth Project, and this blog, are already doing a lot of things to prevent people from getting locked into fixed positions and filtering out evidence that goes against their views. Most crucial seems to be the policy of forbidding insults, bullying, and overly repetitive restatement of the same views. These behaviors increase what I call the ‘heat’ in a discussion, and I’ve decided that, all things considered, it’s best to keep the heat fairly low.

Heat attracts many people, so I’m sure we could get a lot more people to read this blog by turning up the heat. A little heat is a good thing, because it engages people’s energy. But heat also makes it harder for people to change their minds. When the heat gets too high, changing ones mind is perceived as a defeat, to be avoided at all costs. Even worse, people form ‘tribes’ who back each other up in every argument, regardless of the topic. Rationality goes out the window. And meta-rationality? Forget it!

Some Questions

Dourado talks about ways to “identify meta-rational people.” This is very attractive, but I think it’s better to talk about “identifying when people are behaving meta-rationally”. I don’t think we should spend too much of our time looking around for paragons of meta-rationality. First of all, nobody is perfect. Second of all, as soon as someone gets a big reputation for rationality, meta-rationality, or any other virtue, it seems they develop a fan club that runs a big risk of turning into a cult. This often makes it harder rather than easier for people to think clearly and change their minds!

I’d rather look for customs and institutions that encourage meta-rationality. So, my big question is:

How can we encourage rationality and meta-rationality, and make them more popular?

Of course science, and academia, are institutions that have been grappling with this question for centuries. Universities, seminars, conferences, journals, and so on—they all put a lot of work into encouraging the search for knowledge and examining the conditions under which it thrives.

And of course these institutions are imperfect: everything humans do is riddled with flaws.

But instead of listing cases where existing institutions failed to do their job optimally, I’d like to think about ways of developing new customs and institutions that encourage meta-rationality… and linking these to the existing ones.

Why? Because I feel the existing institutions don’t reach out enough to the ‘general public’, or ‘laymen’. The mere existence of these terms is a clue. There are a lot of people who consider academia as an ‘ivory tower’, separate from their own lives and largely irrelevant. And there are a lot of good reasons for this.

There’s one you’ve heard me talk about a lot: academia has let its journals get bought by big multimedia conglomerates, who then charge high fees for access. So, we have have scientific research on global warming paid for by our tax dollars, and published by prestigious journals such as Science and Nature… which unfortunately aren’t available to the ‘general public’.

That’s like a fire alarm you have to pay to hear.

But there’s another problem: institutions that try to encourage meta-rationality seem to operate by shielding themselves from the broader sphere that favors ‘hot’ discussions. Meanwhile, the hot discussions don’t get enough input from ‘cooler’ forums… and vice versa!

For example: we have researchers in climate science who publish in refereed journals, which mostly academics read. We have conferences, seminars and courses where this research is discussed and criticized. These are again attended mostly by academics. Then we have journalists and bloggers who try to explain and discuss these papers in more easily accessed venues. There are some blogs written by climate scientists, who try to short-circuit the middlemen a bit. Unfortunately the heated atmosphere of some of these blogs makes meta-rationality difficult. There are also blogs by ‘climate skeptics’, many from outside academia. These often criticize the published papers, but—it seems to me—rarely get into discussions with the papers’ authors in conditions that make it easy for either party to change their mind. And on top of all this, we have various think tanks who are more or less pre-committed to fixed positions… and of course, corporations and nonprofits paying for advertisements pushing various agendas.

Of course, it’s not just the global warming problem that suffers from a lack of public forums that encourage meta-rationality. That’s just an example. There have got to be some ways to improve the overall landscape a little. Just a little: I’m not expecting miracles!

Details

Here’s the paper by Aumann:

• Robert J. Aumann, Agreeing to disagree, The Annals of Statistics 4 (1976), 1236-1239.

and here’s the one by Cowen and Hanson:

• Tyler Cowen and Robin Hanson, Are disagreements honest?, 18 August 2004.

Personally I find Aumann’s paper uninteresting, because he’s discussing agents that are not only rational Bayesians, but rational Bayesians that share the same priors to begin with! It’s unsurprising that such agents would have trouble finding things to argue about.

His abstract summarizes his result quite clearly… except that he calls these idealized agents ‘people’, which is misleading:

Abstract. Two people, 1 and 2, are said to have common knowledge of an event E if both know it, 1 knows that 2 knows it, 2 knows that 1 knows is, 1 knows that 2 knows that 1 knows it, and so on.

Theorem. If two people have the same priors, and their posteriors for an event A are common knowledge, then these posteriors are equal.

Cowen and Hanson’s paper is more interesting to me. Here are some key sections for what we’re talking about here:

How Few Meta-rationals?

We can call someone a truth-seeker if, given his information and level of effort on a topic, he chooses his beliefs to be as close as possible to the truth. A non-truth seeker will, in contrast, also put substantial weight on other goals when choosing his beliefs. Let us also call someone meta-rational if he is an honest truth-seeker who chooses his opinions as if he understands the basic theory of disagreement, and abides by the rationality standards that most people uphold, which seem to preclude self-favoring priors.

The theory of disagreement says that meta-rational people will not knowingly have self-favoring disagreements among themselves. They might have some honest disagreements, such as on values or on topics of fact where their DNA encodes relevant non-self-favoring attitudes. But they will not have dishonest disagreements, i.e., disagreements directly on their relative ability, or disagreements on other random topics caused by their faith in their own superior knowledge or reasoning ability.

Our working hypothesis for explaining the ubiquity of persistent disagreement is that people are not usually meta-rational. While several factors contribute to this situation, a sufficient cause that usually remains when other causes are removed is that people do not typically seek only truth in their beliefs, not even in a persistent rational core. People tend to be hypocritical in have self-favoring priors, such as priors that violate indexical independence, even though they criticize others for such priors. And they are reluctant to admit this, either publicly or to themselves.

How many meta-rational people can there be? Even if the evidence is not consistent with most people being meta-rational, it seems consistent with there being exactly one meta-rational person. After all, in this case there never appears a pair of meta-rationals to agree with each other. So how many more meta-rationals are possible?

If meta-rational people were common, and able to distinguish one another, then we should see many pairs of people who have almost no dishonest disagreements with each other. In reality, however, it seems very hard to find any pair of people who, if put in contact, could not identify many persistent disagreements. While this is an admittedly difficult empirical determination to make, it suggests that there are either extremely few meta-rational people, or that they have virtually no way to distinguish each other.

Yet it seems that meta-rational people should be discernible via their conversation style. We know that, on a topic where self-favoring opinions would be relevant, the sequence of alternating opinions between a pair of people who are mutually aware of both being meta-rational must follow a random walk. And we know that the opinion sequence between typical non-meta-rational humans is nothing of the sort. If, when responding to the opinions of someone else of uncertain type, a meta-rational person acts differently from an ordinary non-meta-rational person, then two meta-rational people should be able to discern one another via a long enough conversation. And once they discern one another, two meta-rational people should no longer have dishonest disagreements. (Aaronson (2004) has shown that regardless of the topic or their initial opinions, any two Bayesians have less than a 10% chance of disagreeing by more than a 10% after exchanging about a thousand bits, and less than a 1% chance of disagreeing by more than a 1% after exchanging about a million bits.)

Since most people have extensive conversations with hundreds of people, many of whom they know very well, it seems that the fraction of people who are meta-rational must be very small. For example, given N people, a fraction f of whom are meta-rational, let each person participate in C conversations with random others that last long enough for two meta-rational people to discern each other. If so, there should be on average f^2CN/2 pairs who no longer disagree. If, across the world, two billion people, one in ten thousand of who are meta-rational, have one hundred long conversations each, then we should see one thousand pairs of people with only honest disagreements. If, within academia, two million people, one in ten thousand of who are meta-rational, have one thousand long conversations each, we should see ten agreeing pairs of academics. And if meta-rational people had any other clues to discern each another, and preferred to talk with one another, there should be far more such pairs. Yet, with the possible exception of some cult-like or fan-like relationships, where there is an obvious alternative explanation for their agreement, we know of no such pairs of people who no longer disagree on topics where self-favoring opinions are relevant.

We therefore conclude that unless meta-rationals simply cannot distinguish each other, only a tiny non-descript percentage of the population, or of academics, can be meta-rational. Either few people have truth-seeking rational cores, and those that do cannot be readily distinguished, or most people have such cores but they are in control infrequently and unpredictably. Worse, since it seems unlikely that the only signals of meta-rationality would be purely private signals, we each seem to have little grounds for confidence in our own meta-rationality, however much we would like to believe otherwise.

Personally, I think the failure to find ‘ten agreeing pairs of academics’ is not very interesting. Instead of looking for people who are meta-rational in all respects, which seems futile, I’m more interested in to looking for contexts and institutions that encourage people to behave meta-rationally when discussing specific issues.

For example, there’s surprisingly little disagreement among mathematicians when they’re discussing mathematics and they’re on their best behavior—for example, talking in a classroom. Disagreements show up, but they’re often dismissed quickly when one or both parties realize their mistake. The same people can argue bitterly and endlessly over politics or other topics. They are not meta-rational people: I doubt such people exist. They are people who have been encouraged by an institution to behave meta-rationally in specific limited ways… because the institution rewards this behavior.

Moving on:

Personal policy implications

Readers need not be concerned about the above conclusion if they have not accepted our empirical arguments, or if they are willing to embrace the rationality of self-favoring priors, and to forgo criticizing the beliefs of others caused by such priors. Let us assume, however, that you, the reader, are trying to be one of those rare meta-rational souls in the world, if indeed there are any. How guilty should you feel when you disagree on topics where self-favoring opinions are relevant?

If you and the people you disagree with completely ignored each other’s opinions, then you might tend to be right more if you had greater intelligence and information. And if you were sure that you were meta-rational, the fact that most people were not might embolden you to disagree with them. But for a truth-seeker, the key question must be how sure you can be that you, at the moment, are substantially more likely to have a truth-seeking, in-control, rational core than the people you now disagree with. This is because if either of you have some substantial degree of meta-rationality, then your relative intelligence and information are largely irrelevant except as they may indicate which of you is more likely to be self-deceived about being meta-rational.

One approach would be to try to never assume that you are more meta-rational than anyone else. But this cannot mean that you should agree with everyone, because you simply cannot do so when other people disagree among themselves. Alternatively, you could adopt a “middle” opinion. There are, however, many ways to define middle, and people can disagree about which middle is best (Barns 1998). Not only are there disagreements on many topics, but there are also disagreements on how to best correct for one’s limited meta-rationality.

Ideally we would want to construct a model of the process of individual self-deception, consistent with available data on behavior and opinion. We could then use such a model to take the observed distribution of opinion, and infer where lies the weight of evidence, and hence the best estimate of the truth. [Ideally this model would also satisfy a reflexivity constraint: when applied to disputes about self-deception it should select itself as the best model of self-deception. If people reject the claim that most people are self-deceived about their meta-rationality, this approach becomes more difficult, though perhaps not impossible.]

A more limited, but perhaps more feasible, approach to relative meta-rationality is to seek observable signs that indicate when people are self-deceived about their meta-rationality on a particular topic. You might then try to disagree only with those who display such signs more strongly than you do. For example, psychologists have found numerous correlates of self-deception. Self-deception is harder regarding one’s overt behaviors, there is less self-deception in a galvanic skin response (as used in lie detector tests) than in speech, the right brain hemisphere tends to be more honest, evaluations of actions are less honest after those actions are chosen than before (Trivers 2000), self-deceivers have more self-esteem and less psychopathology, especially less depression (Paulhus 1986), and older children are better than younger ones at hiding their self-deception from others (Feldman & Custrini 1988). Each correlate implies a corresponding sign of self-deception.

Other commonly suggested signs of self-deception include idiocy, self-interest, emotional arousal, informality of analysis, an inability to articulate supporting arguments, an unwillingness to consider contrary arguments, and ignorance of standard mental biases. If verified by further research, each of these signs would offer clues for identifying other people as self-deceivers.

Of course, this is easier said than done. It is easy to see how self-deceiving people, seeking to justify their disagreements, might try to favor themselves over their opponents by emphasizing different signs of self-deception in different situations. So looking for signs of self-deception need not be an easier approach than trying to overcome disagreement directly by further discussion on the topic of the disagreement.

We therefore end on a cautionary note. While we have identified some considerations to keep in mind, were one trying to be one of those rare meta-rational souls, we have no general recipe for how to proceed. Perhaps recognizing the difficulty of this problem can at least make us a bit more wary of our own judgments when we disagree.


The Faculty of 1000

31 January, 2012

As of this minute, 1890 scholars have signed a pledge not to cooperate with the publisher Elsevier. People are starting to notice. According to this Wired article, the open-access movement is “catching fire”:

• David Dobbs, Testify: the open-science movement catches fire, Wired, 30 January 2012.


Now is a good time to take more substantial actions. But what?

Many things are being discussed, but it’s good to spend a bit of time thinking about the root problems and the ultimate solutions.

The world-wide web has made journals obsolete: it would be better to put papers on freely available archives and then let boards of top scholars referee them. But how do we get to this system?

In math and physics we have the arXiv, but nobody referees those papers. In biology and medicine, a board called the Faculty of 1000 chooses and evaluates the best papers, but there’s no archive: they get those papers from traditional journals.

Whoops—never mind! That was yesterday. Now the Faculty of 1000 has started an archive!

• Rebecca Lawrence, F1000 Research – join us and shape the future of scholarly communication, F1000, 30 January 2012.

• Ivan Oransky, An arXiv for all of science? F1000 launches new immediate publication journal, Retraction Watch, 30 January 2012.

This blog article says “an arXiv for all science”, but it seems the new F1000 Research archive is just for biology and medicine. So now it’s time for the mathematicians and physicists to start catching up.


Azimuth on Google Plus (Part 5)

1 January, 2012

Happy New Year! I’m back from Laos. Here are seven items, mostly from the Azimuth Circle on Google Plus:

1) Phil Libin is the boss of a Silicon Valley startup. When he’s off travelling, he uses a telepresence robot to keep an eye on things. It looks like a stick figure on wheels. Its bulbous head has two eyes, which are actually a camera and a laser. On its forehead is a screen, where you can see Libin’s face. It’s made by a company called Anybots, and it costs just $15,000.


I predict that within my life we’ll be using things like this to radically cut travel costs and carbon emissions for business and for conferences. It seems weird now, but so did telephones. Future models will be better to look at. But let’s try it soon!

• Laura Sydell No excuses: robots put you in two places at once, Weekend Edition Saturday, 31 December 2011.

Bruce Bartlett and I are already planning for me to use telepresence to give a lecture on mathematics and the environment at Stellenbosch University in South Africa. But we’d been planning to use old-fashioned videoconferencing technology.

Anybots is located in Mountain View, California. That’s near Google’s main campus. Can anyone help me set up a talk on energy and the environment at Google, where I use an Anybot?

(Or, for that matter, anywhere else around there?)

2) A study claims to have found a correlation between weather and the day of the week! The claim is that there are more tornados and hailstorms in the eastern USA during weekdays. One possible mechanism could be that aerosols from car exhaust help seed clouds.


I make no claims that this study is correct. But at the very least, it would be interesting to examine their use of statistics and see if it’s convincing or flawed:

• Thomas Bell and Daniel Rosenfeld, Why do tornados and hailstorms rest on weekends?, Journal of Geophysical Research 116 (2011), D20211.

Abstract. This study shows for the first time statistical evidence that when anthropogenic aerosols over the eastern United States during summertime are at their weekly mid-week peak, tornado and hailstorm activity there is also near its weekly maximum. The weekly cycle in summertime storm activity for 1995–2009 was found to be statistically significant and unlikely to be due to natural variability. It correlates well with previously observed weekly cycles of other measures of storm activity. The pattern of variability supports the hypothesis that air pollution aerosols invigorate deep convective clouds in a moist, unstable atmosphere, to the extent of inducing production of large hailstones and tornados. This is caused by the effect of aerosols on cloud drop nucleation, making cloud drops smaller and hydrometeors larger. According to simulations, the larger ice hydrometeors contribute to more hail. The reduced evaporation from the larger hydrometeors produces weaker cold pools. Simulations have shown that too cold and fast-expanding pools inhibit the formation of tornados. The statistical observations suggest that this might be the mechanism by which the weekly modulation in pollution aerosols is causing the weekly cycle in severe convective storms during summer over the eastern United States. Although we focus here on the role of aerosols, they are not a primary atmospheric driver of tornados and hailstorms but rather modulate them in certain conditions.

Here’s a discussion of it:

• Bob Yirka, New research may explain why serious thunderstorms and tornados are less prevalent on the weekends, PhysOrg, 22 December 2011.

3) And if you like to check how people use statistics, here’s a paper that would be incredibly important if its findings were correct:

• Joseph J. Mangano and Janette D. Sherman, An unexpected mortality increase in the United States follows arrival of the radioactive plume from Fukushima: is there a correlation?, International Journal of Health Services 42 (2012), 47–64.

The title has a question mark in it, but it’s been cited in very dramatic terms in many places, for example this video entitled “Peer reviewed study shows 14,000 U.S. deaths from Fukushima”:

Starting at 1:31 you’ll see an interview with one of the paper’s authors, Janette Sherman.

14,000 deaths in the US due to Fukushima? Wow! How did they get that figure? This quote from the paper explains how:

During weeks 12 to 25 [after the Fukushima disaster began], total deaths in 119 U.S. cities increased from 148,395 (2010) to 155,015 (2011), or 4.46 percent. This was nearly double the 2.34 percent rise in total deaths (142,006 to 145,324) in 104 cities for the prior 14 weeks, significant at p < 0.000001 (Table 2). This difference between actual and expected changes of +2.12 percentage points (+4.46% – 2.34%) translates to 3,286 “excess” deaths (155,015 × 0.0212) nationwide. Assuming a total of 2,450,000 U.S. deaths will occur in 2011 (47,115 per week), then 23.5 percent of deaths are reported (155,015/14 = 11,073, or 23.5% of 47,115). Dividing 3,286 by 23.5 percent yields a projected 13,983 excess U.S. deaths in weeks 12 to 25 of 2011.

Hmm. Can you think of some potential problems with this analysis?

In the interview, Janette Sherman also mentions increased death rates of children in British Columbia. Here’s the evidence the paper presents for that:

Shortly after the report [another paper by the authors] was issued, officials from British Columbia, Canada, proximate to the northwestern United States, announced that 21 residents had died of sudden infant death syndrome (SIDS) in the first half of 2011, compared with 16 SIDS deaths in all of the prior year. Moreover, the number of deaths from SIDS rose from 1 to 10 in the months of March, April, May, and June 2011, after Fukushima fallout arrived, compared with the same period in 2010. While officials could not offer any explanation for the abrupt increase, it coincides with our findings in the Pacific Northwest.

4) For the first time in 87 years, a wild gray wolf was spotted in California:

• Stephen Messenger, First gray wolf in 80 years enters California, Treehugger, 29 December 2011.

Researchers have been tracking this juvenile male using a GPS-enabled collar since it departed northern Oregon. In just a few weeks, it walked some 730 miles to California. It was last seen surfing off Malibu. Here is a photograph:

5) George Musser left the Centre for Quantum Technologies and returned to New Jersey, but not before writing a nice blog article explaining how the GRACE satellite uses the Earth’s gravitational field to measure the melting of glaciers:

• George Musser, Melting glaciers muck up Earth’s gravitational field, Scientific American, 22 December 2011.

6) The American Physical Society has started a new group: a Topical Group on the Physics of Climate! If you’re a member of the APS, and care about climate issues, you should join this.

7) Finally, here’s a cool picture taken in the Gulf of Alaska by Kent Smith:

He believes this was caused by fresher water meeting more salty water, but it doesn’t sounds like he’s sure. Can anyone figure out what’s going on? The foam where the waters meet is especially intriguing.


The Decline Effect

18 October, 2011

I bumped into a surprising article recently:

• Jonah Lehrer, Is there something wrong with the scientific method?, New Yorker, 13 December 2010.

It starts with a bit of a bang:

Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard for the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.

But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.

This phenomenon does have a name now: it’s called the decline effect. The article tells some amazing stories about it. If you’re in the mood for some fun, I suggest going to your favorite couch or café now, and reading them!

For example: John Ioannides is the author of the most heavily downloaded paper in the open-access journal PLoS Medicine. It’s called Why most published research findings are false.

In it, Ioannides took three prestigious medical journals and looked at the 49 most cited clinical research studies. 45 of them used randomized controlled trials and reported positive results. But of the 34 that people tried to replicate, 41% were either directly contradicted or had their effect sizes significantly downgraded.

For more examples, read the article or listen to this radio show:

Cosmic Habituation, Radiolab, May 3, 2011.

It’s a bit sensationalistic… but it’s fun. It features Jonathan Schooler, who discovered a famous effect in psychology, called verbal overshadowing. It doesn’t really matter what this effect is. What matters is that it showed up very strongly in his first experiments… but as he and others continued to study it, it gradually diminished over time! He got freaked out. And then looked around, and saw that this sort of decline happened all over the place, in lots of cases.

What could cause this ‘decline effect’? There are lots of possible explanations.

At one extreme, maybe the decline effect doesn’t really exist. Maybe this sort of decline just happens sometimes purely by chance. Maybe there are equally many cases where effects seem to get stronger each time they’re measured!

At the other extreme, a very disturbing possibility has been proposed by Jonathan Schooler. He suggests that somehow the laws of reality change when they’re studied, in such a way that initially strong effects gradually get weaker.

I don’t believe this. It’s logically possible, but there are lots of less radical explanations to rule out first.

But if it were true, maybe we could make the decline effect go away by studying it. The decline effect would itself decline!

Unless of course, you started studying the decline of the decline effect.

Okay. On to some explanations that are interesting but less far-out.

One plausible explanation is significance chasing. Scientists work really hard to find something that’s ‘statistically significant’ according to the widely-used criterion of having a p-value of less than 0.05.

That sounds technical, but basically all it means is this: there was at most a 5% chance of having found a deviation from the expected situation that’s as big as the one you found.

(To play this game, you have to say ahead of time what the ‘expected situation’ is: this is your null hypothesis.)

Why is significance chasing dangerous? How can it lead to the decline effect?

Well, here’s how to write a paper with a statistically significant result. Go through 20 different colors of jelly bean and see if people who eat them have more acne than average. There’s a good chance that one of your experiments will say ‘yes’ with a p-value of less than 0.05, just because 0.05 = 1/20. If so, this experiment gives a statistically significant result!

I took this example from Randall Munroe’s cartoon strip xkcd:

It’s funny… but it’s actually sad: some testing of drugs is not much better than this! Clearly a result obtained this way is junk, so when you try to replicate it, the ‘decline effect’ will kick in.

Another possible cause of the decline effect is publication bias: scientists and journals prefer positive results over null results, where no effect is found. And surely there are other explanations, too: for starters, all the ways people can fool themselves into thinking they’ve discovered something interesting.

For suggestions on how to avoid the evils of ‘publication bias’, try these:

• Jonathan Schooler, Unpublished results hide the decline effect, Nature 470 (2011), 437.

Putting an end to ‘significance chasing’ may require people to learn more about statistics:

• Geoff Cumming, Significant does not equal important: why we need the new statistics, 9 October 2011.

He explains the problem in simple language:

Consider a psychologist who’s investigating a new therapy for anxiety. She randomly assigns anxious clients to the therapy group, or a control group. You might think the most informative result would be an estimate of the benefit of therapy – the average improvement as a number of points on the anxiety scale-together with the amount that’s the confidence interval around that average. But psychology typically uses significance testing rather than estimation.

Introductory statistics books often introduce significance testing as a step-by-step recipe:

Step 1. Assume the new therapy has zero effect. You don’t believe this and you fervently hope it’s not true, but you assume it.

Step 2. You use that assumption to calculate a strange thing called a ‘p value’, which is the probability that, if the therapy really has zero effect, the experiment would have given a difference as large as you observed, or even larger.

Step 3. If the p value is small, in particular less than the hallowed criterion of .05 (that’s 1 chance in 20), you are permitted to reject your initial assumption—which you never believed anyway—and declare that the therapy has a ‘significant’ effect.

If that’s confusing, you’re in good company. Significance testing relies on weird backward logic. No wonder countless students every year are bamboozled by their introduction to statistics! Why this strange ritual they ask, and what does a p value actually mean? Why don’t we focus on how large an improvement the therapy gives, and whether people actually find it helpful? These are excellent questions, and estimation gives the best answers.

For half a century distinguished scholars have published damning critiques of significance testing, and explained how it hampers research progress. There’s also extensive evidence that students, researchers, and even statistics teachers often don’t understand significance testing correctly. Strangely, the critiques of significance testing have hardly prompted any defences by its supporters. Instead, psychology and other disciplines have simply continued with the significance testing ritual, which is now deeply entrenched. It’s used in more than 90% of published research in psychology, and taught in every introductory textbook.

For more discussion and references, try my co-blogger:

• Tom Leinster, Fetishizing p-values, n-Category Café.

He gives some good examples of how significance testing can lead us astray. Anyone who uses the p-test should read these! He also discusses this book:

• Stephen T. Ziliak and Deirdre N. McCloskey, The Cult of Statistical Significance, University of Michigan Press, Ann Arbor, 2008. (Online summary here.)

Now, back to the provocative title of that New Yorker article: “Is there something wrong with the scientific method?”

The answer is yes if we mean science as actually practiced, now. Lots of scientists are using cookbook recipes they learned in statistics class without understanding them, or investigating the alternatives. Worse, some are treating statistics as a necessary but unpleasant piece of bureaucratic red tape, and then doing whatever it takes to achieve the appearance of a significant result!

This is a bit depressing. There’s a student I know, who is taking an introductory statistics course. After she read about this stuff she said:

So, what I’m gleaning here is that what I’m studying is basically bull. It struck me as bull to start with, admittedly, but since my grade depended on it, I grinned and swallowed. At least my eyes are open now, I guess.

But there’s some good news, buried in her last sentence. Science has the marvelous ability to notice and correct its own mistakes. It’s scientists who noticed the decline effect and significance chasing. They’ll eventually figure out what’s going on, and learn how to fix any mistakes that they’ve been making. So ultimately, I don’t find this story depressing. It’s actually inspiring!

The scientific method is not a fixed rulebook handed down from on high. It’s a work in progress. It’s only been around for a few centuries—not very long, in the grand scheme of things. The widespread use of statistics in science has been around for less than one century. And computers, which make heavy-duty number-crunching easy, have only been cheap for 30 years! No wonder people still use primitive cookbook methods for analyzing data, when they could do better.

So science is still evolving. And I think that’s fun, because it means we can help it along. If you see someone claim their results are statistically significant, you can ask them what they mean, exactly… and what they had to do to get those results.


I thank a lot of people on Google+ for discussions on this topic, including (but not limited to) John Forbes, Roko Mijic, Heather Vandagriff, and Willie Wong.


Follow

Get every new post delivered to your Inbox.

Join 744 other followers