Hi John,

I computed Bayes probability intervals for the success probabilities of Ludescher’s predictions (and for my variations). My method is not perfect, since I assumed independence across years, which is surely not correct. However, taking into account the serial correlations would only increase the length of the error bars, and they are plenty long enough already! I’m not certain how to do better, but I suspect I would need to use Monte Carlo methods, and they have their own problems.

Ludescher’s method has an expected posterior probability of successfully predicting an El Nino initiation event, when one actually occurs, of 0.601. This is very close to the frequentist estimate (11 successes / 18 El Ninos = 0.611); so, the prior distribution has little effect on the estimate of the mean. The 95% Bayes probability interval is from 0.387 to 0.798; so, the data and method did succeed in narrowing the prior uniform interval (see the next paragraph) that extends from 0.286 to 1. The intervals for “non” El Nino events are shorter: 0.768 to 0.951 for the probability of successfully predicting a “non” event; however, the prior for the non-event is from 0.714 to 1, so Ludescher’s method doesn’t narrow the range very much!

If we don’t condition on the outcome, then the estimate of the mean success probability is 0.795 using Ludescher’s method; but, if we simply use the “dumb” rule (always predict “no El Nino”) then we will be right with probability 0.714 – the data and Ludescher gain us very little!
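As a sanity check on that unconditional figure, the conditional means combine by the law of total probability. The non-event posterior mean is not stated in the letter; the value of roughly 0.873 below is my own back-calculation from the reported 0.795, so treat it as an assumption:

```python
# Hedged back-calculation (not from the letter): combine the conditional
# success probabilities by the law of total probability.
p_el_nino = 0.286           # long-run frequency of El Nino initiation years
p_success_given_en = 0.601  # posterior mean, given an El Nino occurs
p_success_given_no = 0.873  # inferred non-event posterior mean (assumption)

p_unconditional = (p_el_nino * p_success_given_en
                   + (1 - p_el_nino) * p_success_given_no)
print(round(p_unconditional, 3))  # ~0.795, matching the letter

p_dumb = 1 - p_el_nino  # always predict "no El Nino"
print(round(p_dumb, 3))  # 0.714
```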

Truncated Uniform Prior: I assume that any reasonable method will do at least as well as chance. Over many years we have experienced an El Nino initiation event in 28.6% of those years. So, a dumb method that simply declares “El Nino will happen next year!” with probability 28.6% and “will not happen!” with probability 71.4% will be successful in “predicting” 28.6% of all El Ninos. So, I set the minimum success probability at p0 = 28.6%, given that an El Nino actually occurs. Similarly, the dumb method successfully predicts 71.4% of the “no El Nino” years; so, I set the minimum success probability at p0 = 71.4% for any prediction method, given that the outcome is “no El Nino”. In both cases the upper limit is p1 = 1 for a perfect prediction method.

For a binomial sampling situation with a truncated uniform prior the posterior density is expressible with the help of the beta distribution function (normalized incomplete beta function). The formulas can be found in Bayesian Reliability Analysis by Martz & Waller, Wiley, 1982, pp. 262–264. The posterior mean has a closed form, but the Bayes probability intervals must be found by iterative methods (I used the Excel “solver” add-in).
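To make the recipe concrete, here is a self-contained sketch of the calculation in Python. This is my own reimplementation, not Steve’s spreadsheet: the incomplete beta function uses the standard continued-fraction expansion, and bisection on the posterior CDF stands in for the Excel solver.

```python
# With s successes in n trials and a Uniform(p0, p1) prior, the posterior
# is a Beta(s + 1, n - s + 1) density renormalized to [p0, p1].
import math

def betacf(a, b, x):
    """Continued fraction for the incomplete beta function."""
    MAXIT, EPS, FPMIN = 200, 3e-12, 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    if abs(d) < FPMIN:
        d = FPMIN
    d = 1.0 / d
    h = d
    for m in range(1, MAXIT + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            if abs(d) < FPMIN:
                d = FPMIN
            c = 1.0 + aa / c
            if abs(c) < FPMIN:
                c = FPMIN
            d = 1.0 / d
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < EPS:
            break
    return h

def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b) = P(Beta(a, b) <= x)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * betacf(a, b, x) / a
    return 1.0 - front * betacf(b, a, 1.0 - x) / b

def truncated_posterior(s, n, p0, p1=1.0, level=0.95):
    """Posterior mean and equal-tailed Bayes probability interval for a
    binomial success probability under a uniform prior truncated to [p0, p1]."""
    a, b = s + 1, n - s + 1
    z = beta_cdf(p1, a, b) - beta_cdf(p0, a, b)  # posterior normalizing mass
    # Closed-form mean: p times the Beta(a, b) kernel integrates like Beta(a+1, b).
    mean = (a / (a + b)) * (beta_cdf(p1, a + 1, b) - beta_cdf(p0, a + 1, b)) / z
    def quantile(q):
        lo, hi = p0, p1  # bisection on the truncated CDF (the "solver" step)
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if (beta_cdf(mid, a, b) - beta_cdf(p0, a, b)) / z < q:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)
    alpha = 0.5 * (1.0 - level)
    return mean, quantile(alpha), quantile(1.0 - alpha)

# Ludescher's El Nino case: 11 successes in 18 events, prior on [0.286, 1].
post_mean, post_lo, post_hi = truncated_posterior(11, 18, 0.286)
print(round(post_mean, 3), round(post_lo, 3), round(post_hi, 3))
```

Run on the numbers quoted above, this should land very close to the reported mean of 0.601 and interval (0.387, 0.798), assuming Steve’s intervals are equal-tailed; a noticeable mismatch would suggest he used highest-posterior-density intervals instead.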

The details are on the spreadsheet. I greyed out superfluous stuff I used to help me get the formulas straight.

Cheers,

Steve

He added:

Hi, I just wanted to add a couple of comments:

The reason I used both the training and the validation data in estimating confidence limits was because the validation data show a better fit to the model than the training data; so, it seemed more fair to use both data sets for these calculations.

I did some rough calculations to estimate the increase in the error bars that might result if I were to take the serial correlations into account. For instance, I think that the lower limit for the probability of successfully predicting an El Nino with Ludescher’s method is actually closer to 0.366, rather than the 0.387 reported below, and the upper limit would increase from 0.798 to 0.815. Since my ideas for this adjustment are only half-baked, I won’t go into the details here.

• El Niño project (part 1): basic introduction to El Niño and our project here.

• El Niño project (part 2): introduction to the physics of El Niño.

• El Niño project (part 3): summary of the work of Ludescher *et al*.

• El Niño project (part 4): how Graham Jones replicated the work by Ludescher *et al*, using software written in R.

• El Niño project (part 5): how to download R and use it to get files of climate data.

• El Niño project (part 6): Steve Wenner’s statistical analysis of the work of Ludescher *et al*.

• El Niño project (part 7): the definition of El Niño.

“How many flavors of …?” http://journals.ametsoc.org/doi/abs/10.1175/JCLI-D-12-00649.1

Thanks. I would like to know (but couldn’t find it in the links) which El Nino events are judged EP-ENSO and which CP-ENSO. I have a hypothesis: Ludescher et al’s method is better at EPs than CPs.

Whoops! Thanks, I fixed it. You can find Steve Wenner’s spreadsheet here. The annotations are interesting.

(My link called the file “ElNinoTemps.xslx”, but it’s “ElNinoTemps.xlsx”.)

A comment about (my attitude toward) null hypotheses: except in rare circumstances null hypotheses are always false, and we don’t need any data to confirm that!

Right! You and Hypergeometric (= Jan Galkowski) already know this stuff… but since we have lots of non-statistician readers here, I urge them to read this great blog article:

• Tom Leinster, Fetishizing p-values, The *n*-Category Café, 23 September 2010.

For those too lazy to click the link, here’s the basic point:

Suppose that you and I are each trying to develop a drug to speed up the healing of broken legs. You do some trials and run some statistics and conclude that your drug works—in other words, you reject the null hypothesis that patients taking your drug heal no faster than a control group. I do the same for my drug. Your p-value is 0.1 (a certainty of 90%), and mine is 0.001 (a certainty of 99.9%).

What does this tell us about the comparative usefulness of our drugs? Nothing. That’s because we know nothing about the magnitude of the effect. I can now reveal that my wonder drug for broken legs is…

an apple a day. Who knows, maybe this does have a minuscule positive effect; for the sake of argument, let’s say that it does. […] Given enough time to conduct trials, I can truthfully claim any degree of statistical significance I like.

The danger is that someone who buys into the ‘cult of statistical significance’ might simply look at the numbers and say ‘90% is less than 99.9%, so the second drug must be better’.
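Leinster’s claim that enough trials can buy any significance level is easy to check with a back-of-the-envelope calculation. The effect size and the one-sample z test below are my own illustrative choices, not from his post:

```python
# A truly tiny effect (0.01 standard deviations) yields any p-value you
# like once the sample is large enough: p shrinks as sqrt(n) grows.
import math

def z_test_p_value(effect_sd, n):
    """Two-sided p-value of a one-sample z test, evaluated at the expected
    value of the test statistic, effect_sd * sqrt(n)."""
    z = effect_sd * math.sqrt(n)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

effect = 0.01  # an "apple a day" sized effect
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}  p = {z_test_p_value(effect, n):.3g}")
```

With n = 100 the apple looks worthless (p near 0.9); with a million patients the same apple is significant at essentially any level you care to name.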

So, I agree completely with Jan’s point:

This all does suggest some thought should be given to how anyone’s going to be able to tell if an Azimuth-derived improvement is actually an improvement over Ludescher et al. I’m not saying they can’t. I’m saying it should be given some thought, and perhaps some of us more test engineering-oriented people should invest in that.

And here’s a reason why it’s really worthwhile. The question “how good is this method of El Niño prediction” is not only important for the methods we come up with here at Azimuth. It’s also important for rating the methods the ‘big boys’ use!

Look at this huge range of predictions the ‘big boys’ are making:

Which of these models should we trust? That’s an important question.

Of course, the ‘big boys’ have also thought about this question:

• Anthony G. Barnston, Michael K. Tippett, Michelle L. L’Heureux, Shuhua Li and David G. DeWitt, Skill of real-time seasonal ENSO model predictions during 2002–2011: is our capability increasing?, *Science and Technology Infusion Climate Bulletin*, NOAA’s National Weather Service, 36th NOAA Annual Climate Diagnostics and Prediction Workshop, Fort Worth, TX, 3–6 October 2011.

So, we’d need to understand what they’re doing before we could do something better.
