*guest post by Steven Wenner*

Hi, I’m Steve Wenner.

I’m an industrial statistician with over 40 years of experience in a wide range applications (quality, reliability, product development, consumer research, biostatistics); but, somehow, time series only rarely crossed my path. Currently I’m working for a large consumer products company.

My undergraduate degree is in physics, and I also have a master’s in pure math. I never could reconcile how physicists used math (explain that Dirac delta function to me again in math terms? Heaviside calculus? On the other hand, I thought category theory was abstract nonsense until John showed me otherwise!). Anyway, I had to admit that I lacked the talent to pursue pure math or theoretical physics, so I became a statistician. I never regretted it—statistics has provided a very interesting and intellectually challenging career.

I got interested in Ludescher *et al*’s paper on El Niño prediction by reading Part 3 of this series. I have no expertise in climate science, except for an intense interest in the subject as a concerned citizen. So, I won’t talk about things like how Ludescher *et al* use a nonstandard definition of ‘El Niño’—that’s a topic for another time. Instead, I’ll look at some statistical aspects of their paper:

• Josef Ludescher, Avi Gozolchiani, Mikhail I. Bogachev, Armin Bunde, Shlomo Havlin, and Hans Joachim Schellnhuber, Very early warning of next El Niño, *Proceedings of the National Academy of Sciences*, February 2014. (Click title for free version, journal name for official version.)

### Analysis

I downloaded the NOAA adjusted monthly temperature anomaly data and compared the El Niño periods with the charts in this paper. I found what appear to be two errors (“phantom” El Niños) and noted some interesting situations. Some of these are annotated on the images below. Click to enlarge them:

I also listed for each year whether an El Niño initiation was predicted, or not, and whether one actually happened. I did the predictions five ways: first, I listed the author’s “arrows” as they appeared on their charts, and then I tried to match their predictions by following in turn four sets of rules. Nevertheless, I could not come up with any detailed rules that exactly reproduced the author’s results.

These were the rules I used:

An El Niño initiation is predicted for a calendar year if during the preceding year the average link strength crossed above the 2.82 threshold. However, we could also invoke additional requirements. Two possibilities are:

1. Preemption rule: the prediction of a new El Niño is canceled if the preceding year ends in an El Niño period.

2. End-of-year rule: the link strength must be above 2.82 at year’s end.

I counted the predictions using all four combinations of these two rules and compared the results to the arrows on the charts.

I defined an “El Niño initiation month” to be a month where the monthly average adjusted temperature anomaly rises up to at least 0.5 C and remains above or equal to 0.5 °C for at least five months. Note that the NOAA El Niño monthly temperature estimates are rounded to hundredths; and, on occasion, the anomaly is reported as exactly 0.5 °C. I found slightly better agreement with the authors’ El Niño periods if I counted an anomaly of exactly 0.5 °C as satisfying the threshold criterion, instead of using the strictly “greater than” condition.

Anyway, I did some formal hypothesis testing and estimation under all five scenarios. The good news is that under most scenarios the prediction method gave better results than merely guessing. (But, I wonder how many things the authors tried before they settled on their final method? Also, did they do all their work on the learning series, and then only at the end check the validation series—or were they checking both as they went about their investigations?)

The bad news is that the predictions varied with the method, and the methods were rather weak. For instance, in the training series there were 9 El Niño periods in 30 years; the authors’ rules (whatever they were, exactly) found five of the nine. At the same time, they had three false alarms in the 21 years that did not have an El Niño initiated.

I used Fisher’s exact test to compute some p-values. Suppose (as our ‘null hypothesis’) that Ludescher *et al’*s method does not improve the odds of a successful prediction of an El Nino initiation. What’s the probability of that method getting at least as many predictions right just by chance? Answer: 0.032 – this is marginally more significant than the conventional 1 in 20 chance that is the usual threshold for rejecting a null hypothesis, but still not terribly convincing. This was, by the way, the most significant of the five p-values for the alternative rule sets applied to the learning series.

I also computed the “relative risk” statistics for all scenarios; for instance, we are more than three times as likely to see an El Niño initiation if Ludescher *et al* predict one, than if they predict otherwise (the 90% confidence interval for that ratio is 1.2 to 9.7, with the point estimate 3.4). Here is a screen shot of some statistics for that case:

Here is a screen shot of part of the spreadsheet list I made. In the margin on the right I made comments about special situations of interest.

Again, click to enlarge—but my whole working spreadsheet is available with more details for anyone who wishes to see it. I did the statistical analysis with a program called JMP, a product of the SAS corporation.

My overall impression from all this is that Ludescher *et al* are suggesting a somewhat arbitrary (and not particularly well-defined) method for revealing the relationship between link strength and El Niño initiation, if, indeed, a relationship exists. Slight variations in the interpretation of their criteria and slight variations in the data result in appreciably different predictions. I wonder if there are better ways to analyze these two correlated time series.