One problem with using a finer grid is that, as John pointed out earlier, the temperatures were not measured using one thermometer per grid point, but estimated in a complicated way. Their statistical properties are likely to be strange.

Steve writes:

“Ludescher et al used temperature data on a 7.5° × 7.5° grid. Data is available for a 1.5° × 1.5° grid.”

They got their temperature data by taking data on the 1.5° × 1.5° grid and ‘subsampling’ it in some (undescribed?) manner to get temperatures on a coarser 7.5° × 7.5° grid. Our own program, written by Graham Jones, takes the obvious approach of averaging over groups of nine squares. As you note, this lets us think of each temperature on the coarser grid as the mean of a sample of random variables.
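For what it's worth, the block-averaging step is easy to sketch. Here is a minimal Python/NumPy version (the project's actual code is in R; the function name and the toy grid below are just illustrative):

```python
import numpy as np

def coarsen(grid, block):
    """Average non-overlapping block x block groups of cells of a 2-D grid."""
    ny, nx = grid.shape
    assert ny % block == 0 and nx % block == 0
    return grid.reshape(ny // block, block, nx // block, block).mean(axis=(1, 3))

fine = np.arange(36.0).reshape(6, 6)   # a toy 6 x 6 'temperature' grid
coarse = coarsen(fine, block=3)        # one value per 3 x 3 group of cells
print(coarse.shape)  # (2, 2)
```

The reshape trick avoids explicit loops: each 3 × 3 group becomes a pair of axes that `mean` collapses.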

If you can think of anything interesting to do with that, I’d like to hear it! Graham and I may or may not have the energy to implement your ideas in software. But I’ll be releasing Graham’s software, written in R, in an hour or so, and it would be easy to ‘tweak’ it to carry out various other statistical analyses. If you can program in R, maybe you could help out.

John Baez points out that “Ludescher et al used temperature data on a 7.5° × 7.5° grid. Data is available for a 1.5° × 1.5° grid.” Great! That leaves lots of room for repeated sampling schemes to estimate the standard error of link strength estimates (given the underlying temperature data). For instance, we could sample two 1.5° × 1.5° “squares” within each 7.5° × 7.5° area, and do this in 25·24/2 = 300 different ways, providing many (though not entirely independent) estimates of the link strength under slightly modified definitions. Also, we could investigate the validity of any proposed link-strength-based El Niño prediction scheme by repeating the calculations for each sample and calculating the precision and accuracy of the predictions. We could investigate the validation question by using the same (threshold) parameter(s) on all samples, or by fitting the threshold separately for each sample.
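As a sanity check on the combinatorics: a 7.5° × 7.5° area contains 5 × 5 = 25 cells of 1.5° × 1.5°, and the unordered pairs of those cells can be enumerated directly. This Python snippet just confirms the 25·24/2 = 300 count:

```python
from itertools import combinations

# Each 7.5 x 7.5 degree area contains 5 x 5 = 25 cells of 1.5 x 1.5 degrees.
pairs = list(combinations(range(25), 2))
print(len(pairs))  # 300
```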

Maybe. On another level perhaps my ideas make no sense (maybe I am trying too hard to think of these things as a frequentist-oriented statistician – everything has to be a sample from a random distribution!).

John writes: “I think an El Niño starts the moment the blue peak starts… as long as it lasts at least 5 months.” – but they didn’t start the blue peaks until 5 months in. I wish they had started the blue peaks from the first day (but charting only those crossings that lasted at least 5 months).

John also writes: “I think that if the average link strength exceeds 2.82 any time in 2011, and we’re not already in an El Niño at that time, they predict an El Niño will start sometime in 2012.” – does that mean they cannot make a prediction until December 31, so as to know whether the threshold has been crossed sometime in the year? Now that we have entered July, what is the conditional probability that we will still see the start of an El Niño before the end of 2014, given that one has not yet started in the first six months?

I suspect there is a better way to go about all this, even neglecting questions about the link strength calculations.

A couple of fine points:

If θ were set to 1.5, their method would give no alarms, since it can only give an alarm when the average link strength crosses θ from below to above. It is not obvious that θ is tightly determined by this data set: θ = 2.5 (say) gives a different set of alarms, and it would take some careful examination of graphs to see whether it was better or worse.

Since their criterion for alarms includes a condition on the El Niño index, their method depends on the choice of this index.

Steve wrote:

In their paper they define the blue areas in the charts as indicating “when the NINO3.4 index is above 0.5 °C for at least 5 mo”. Is that also the time above 0.5 °C required to declare that an El Niño event is occurring? – You say that the required time is 3 months, not 5 months.

Thanks for catching that! That was a mistake on my part. I’ve fixed the blog article: now it says 5 months.

Once we accumulate 3 months (or 5 months?) above 0.5 °C, do we then declare that we have been in an El Niño from the first day the threshold was crossed, or does the clock start at 3 months?

That’s a great question. I hope we can figure out the answer—I’ve been much more focused on writing up Graham’s replication of the average link strength calculation.

Personally, I interpret them as meaning the El Niño starts from the first day the threshold was crossed. Their graph shows these blue peaks, and I think an El Niño starts the moment the blue peak starts… as long as it lasts at least 5 months.

But it couldn’t hurt (much) to ask Ludescher *et al*.
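Under that reading – the event is dated from the first month the index crosses 0.5 °C, provided the excursion lasts at least 5 months – start detection is straightforward to sketch. This Python sketch uses made-up monthly index values, and is one possible interpretation rather than Ludescher *et al*’s confirmed definition:

```python
def event_starts(nino34, thresh=0.5, min_months=5):
    """Months where a run of nino34 > thresh begins, counted only if
    the run lasts at least min_months."""
    starts, i, n = [], 0, len(nino34)
    while i < n:
        if nino34[i] > thresh:
            j = i
            while j < n and nino34[j] > thresh:
                j += 1
            if j - i >= min_months:
                starts.append(i)   # event dated from the first month of the run
            i = j
        else:
            i += 1
    return starts

idx = [0.2, 0.6, 0.7, 0.9, 0.8, 0.6, 0.3, 0.6, 0.6, 0.1]  # toy monthly index
print(event_starts(idx))  # [1] -- the 5-month run starting at month 1 counts;
                          # the 2-month run starting at month 7 does not
```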

I’m having a little trouble deciding on an interpretation of “they predict an El Niño will start in the following calendar year”. Does this mean their prediction is that 12 months hence we will be in an El Niño; or does it mean that sometime in the next 12 months an El Niño will commence, whether or not we are still in one at the end of the 12-month period?

More great questions. But the use of the term **calendar year** makes me think this is not about 12 month periods. I think that if the average link strength exceeds 2.82 any time in 2011, and we’re not already in an El Niño at that time, they predict an El Niño will start sometime in 2012.

I don’t think they would say “next calendar year” if they meant “next 12 months”.
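The calendar-year reading can be pinned down in a few lines of Python. The data layout here is hypothetical (per-year lists of link-strength samples with matching already-in-event flags), and the 2.82 threshold is the one quoted earlier in the thread:

```python
def predicted_years(strength_by_year, in_nino_by_year, theta=2.82):
    """If the average link strength exceeds theta at some time in year y
    while we are not already in an El Nino, predict an event in year y + 1."""
    return [y + 1
            for y in sorted(strength_by_year)
            if any(s > theta and not already
                   for s, already in zip(strength_by_year[y], in_nino_by_year[y]))]

strength = {2010: [2.5, 2.6], 2011: [2.9, 2.7]}        # toy data
in_nino  = {2010: [False, False], 2011: [False, True]}
print(predicted_years(strength, in_nino))  # [2012]
```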

By the way, is the data for the El Niño and link strength time series available? I’d love to play around with them.

I don’t think Ludescher *et al* have made this data available. However, the point of our project is to replicate their work and make all the data, software etc. publicly available—and explain how everything works. So, in the next couple of blog articles you’ll get this stuff.

Steve Wenner wrote:

But I intended to ask the more difficult question: what is the uncertainty engendered by the full process of choosing the grid, generating the data and estimating the model? To be really meaningful, I think these investigations must be very robust, and not simply artifacts of a very particular data set and analysis scheme.

Okay, now I see what you mean. I don’t agree.

I don’t think anyone is claiming that the average link strength as a function of time, or the threshold θ, are robustly interesting facts about nature. They may depend on all sorts of details like the grid used, etcetera. Perhaps they *are* robust, and that would be interesting. But I don’t think anybody is claiming they are. The point is to develop a scheme for predicting El Niños. It will be justified if it succeeds in predicting future El Niños.

For comparison, people predicting the weather use all sorts of somewhat arbitrary schemes based on a mix of science and intuition. What really matters is whether these schemes succeed in predicting the weather.

Of course, robustness and—more generally—methodological soundness can be used as arguments that we’re doing something right. But I don’t really think Ludescher *et al*‘s ‘average link strength’ should be independent of things like the grid size. I imagine that as you shrink the grid, the average link strength goes up.

Luckily, this is something you can easily check! Ludescher *et al* used temperature data on a 7.5° × 7.5° grid. Data is available for a 1.5° × 1.5° grid; they subsampled this to get a coarser grid. In my next post I’ll give links to Graham Jones’ software for calculating the average link strength. It would be quite easy to modify this to use a 1.5° × 1.5° grid! You’d mainly need to remove some lines that subsample down to the coarser 7.5° × 7.5° grid. And, you’d need to let the resulting program grind away for quite a bit longer. On a PC it takes about half an hour with the coarser grid.

I predict average link strengths will go up.

You can get Niño 3.4 data here:

• Niño 3.4 data since 1870 calculated from the HadISST1, NOAA. Discussed in N. A. Rayner et al., Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century, J. Geophys. Res. 108 (2003), 4407.

You can also get Niño 3.4 data here:

• Monthly Niño 3.4 index, Climate Prediction Center, National Weather Service.

The actual temperatures in Celsius are close to those at the other website, but the anomalies are rather different, because they’re computed in a way that takes global warming into account. See the website for details.

It’s interesting to see how they take global warming into account. The anomaly is now defined as the temperature minus the average temperature (on the same day of the year) over a certain 30-year moving period! So, the baseline average itself goes up over time.
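One simple way to implement such a moving baseline can be sketched in Python. This is only an illustration of the idea: here the baseline for each year is the mean over the preceding 30 years, computed per month rather than per day, whereas the actual scheme fixes and updates its 30-year base periods in its own particular way (see their website for details):

```python
import numpy as np

def moving_anomaly(temps, base_len=30):
    """temps: array of shape (n_years, 12), monthly mean temperatures.
    Anomaly for year y, month m = temperature minus the mean of month m
    over the previous base_len years."""
    n_years = temps.shape[0]
    anom = np.full(temps.shape, np.nan)   # no baseline for the first base_len years
    for y in range(base_len, n_years):
        anom[y] = temps[y] - temps[y - base_len:y].mean(axis=0)
    return anom

# With a steady warming trend the moving baseline rises too, so the
# anomaly stays bounded instead of growing without limit.
trend = np.tile(np.arange(40.0)[:, None], (1, 12))  # +1 degree per year
print(moving_anomaly(trend)[35, 0])  # 15.5
```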
