What follows is a draft of a talk I’ll be giving at the Neural Information Processing Seminar on December 10th. The actual talk may contain more stuff, for example more work that Dara Shayda has done. But I’d love comments now, so I’m posting this now and hoping you can help out.
You can click on any of the pictures to see where it came from or get more information.
Preliminary throat-clearing
I’m very flattered to be invited to speak here. I was probably invited because of my abstract mathematical work on networks and category theory. But when I got the invitation, instead of talking about something I understood, I thought I’d learn about something a bit more practical and talk about that. That was a bad idea. But I’ll try to make the best of it.
I’ve been trying to learn climate science. There’s a subject called ‘complex networks’ where people do statistical analyses of large graphs like the worldwide web or Facebook and draw conclusions from them. People are trying to apply these ideas to climate science. So that’s what I’ll talk about. I’ll be reviewing a lot of other people’s work, but also describing some work by a project I’m involved in, the Azimuth Project.
The Azimuth Project is an all-volunteer project involving scientists and programmers, many outside academia, who are concerned about environmental issues and want to use their skills to help. This talk is based on the work of many people in the Azimuth Project, including Jan Galkowski, Graham Jones, Nadja Kutz, Daniel Mahler, Blake Pollard, Paul Pukite, Dara Shayda, David Tanzer, David Tweed, Steve Wenner and others. Needless to say, I’m to blame for all the mistakes.
Climate variability and El Niño
Okay, let’s get started.
You’ve probably heard about the ‘global warming pause’. Is this a real thing? If so, is it due to ‘natural variability’, heat going into the deep oceans, some combination of both, a massive failure of our understanding of climate processes, or something else?
Here is a chart of global average air temperatures at sea level, put together by NASA’s Goddard Institute for Space Studies:
You can see a lot of fluctuations, including a big dip after 1940 and a tiny dip after 2000. That tiny dip is the so-called ‘global warming pause’. What causes these fluctuations? That’s a big, complicated question.
One cause of temperature fluctuations is a kind of cycle whose extremes are called El Niño and La Niña.
A lot of things happen during an El Niño. For example, in 1997 and 1998, a big El Niño, we saw all these events:
El Niño is part of an irregular cycle that happens every 3 to 7 years, called the El Niño Southern Oscillation or ENSO. Two strongly correlated signs of an El Niño are:
1) Increased sea surface temperatures in a patch of the Pacific called the Niño 3.4 region. The temperature anomaly in this region—how much warmer it is than usual for that time of year—is called the Niño 3.4 index.
2) A decrease in air pressures in the western side of the Pacific compared to those further east. This is measured by the Southern Oscillation Index or SOI.
You can see the correlation here:
El Niños are important because they can cause billions of dollars of economic damage. They also seem to bring heat stored in the deeper waters of the Pacific into the atmosphere. So, one reason for the ‘global warming pause’ may be that we haven’t had a strong El Niño since 1998. The global warming pause might end with the next El Niño. For a while it seemed we were due for a big one this fall, but that hasn’t happened.
Teleconnections
The ENSO cycle is just one of many cycles involving teleconnections: strong correlations between weather at distant locations, typically thousands of kilometers. People have systematically looked for these teleconnections using principal component analysis of climate data, and also other techniques.
The ENSO cycle shows up automatically when you do this kind of study. It stands out as the biggest source of climate variability on time scales greater than a year and less than a decade. Some others include:
• The Pacific-North America Oscillation.
• The Pacific Decadal Oscillation.
• The North Atlantic Oscillation.
• The Arctic Oscillation.
For example, the Pacific Decadal Oscillation is a longer-period relative of the ENSO, centered in the north Pacific:
Complex network theory
Recently people have begun to study teleconnections using ideas from ‘complex network theory’.
What’s that? In complex network theory, people often start with a weighted graph: that is, a set $N$ of nodes and for any pair of nodes $i, j \in N$, a weight $A_{ij}$, which can be any nonnegative real number.
Why is this called a weighted graph? It’s really just a matrix of nonnegative real numbers!
The reason is that we can turn any weighted graph into a graph by drawing an edge from node $j$ to node $i$ whenever $A_{ij} > 0$. This is a directed graph, meaning that we should draw an arrow pointing from $j$ to $i$. We could have an edge from $i$ to $j$ but not vice versa! Note that we can also have an edge from a node to itself.
Conversely, if we have any directed graph, we can turn it into a weighted graph by choosing the weight $A_{ij} = 1$ when there’s an edge from $j$ to $i$, and $A_{ij} = 0$ otherwise.
For example, we can make a weighted graph where the nodes are web pages and $A_{ij}$ is the number of links from the web page $j$ to the web page $i$.
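The two conversions just described can be sketched in a few lines of Python. This is only an illustration: the name `A` for the weight matrix, the helper names, and the convention that a positive `A[i][j]` means an edge from node `j` to node `i` are assumptions made for the example, and the three-page link counts are made up.

```python
def edges_from_weights(A):
    """Return the directed edges (source, target) that carry a positive weight."""
    n = len(A)
    return {(j, i) for i in range(n) for j in range(n) if A[i][j] > 0}


def weights_from_edges(n, edges):
    """Return the 0/1 weight matrix of a directed graph on nodes 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for source, target in edges:
        A[target][source] = 1
    return A


# Toy web: A[i][j] counts links from page j to page i (made-up numbers).
A = [
    [0, 2, 0],  # page 0 receives 2 links from page 1
    [0, 1, 0],  # page 1 links to itself once
    [3, 0, 0],  # page 2 receives 3 links from page 0
]
edges = edges_from_weights(A)
unweighted = weights_from_edges(len(A), edges)
```

Going from weights to edges and back forgets the link counts: every positive weight becomes 1.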
People in complex network theory like examples of this sort: large weighted graphs that describe connections between web pages, or people, or cities, or neurons, or other things. The goal, so far, is to compute numbers from weighted graphs in ways that describe interesting properties of these complex networks—and then formulate and test hypotheses about the complex networks we see in real life.
The El Niño basin
Here’s a very simple example of what we can do with a weighted graph. For any node $i$ we can sum up the weights $A_{ij}$ of edges going into $i$:
$$\sum_j A_{ij}$$
This is called the degree of the node $i$. For example, if lots of people have web pages with lots of links to yours, your webpage will have a high degree. If lots of people like you on Facebook, you will have a high degree.
So, the degree is some measure of how ‘important’ a node is.
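As a small sketch of this definition, assuming the weights are stored as a matrix `A` in which row `i` holds the weights of the edges going into node `i` (the storage convention and the numbers are assumptions for the example):

```python
def degree(A, i):
    """Sum of the weights of the edges going into node i (row i of A)."""
    return sum(A[i])


# Made-up link counts between three web pages.
A = [
    [0, 2, 0],
    [0, 1, 0],
    [3, 0, 0],
]
degrees = [degree(A, i) for i in range(len(A))]
```

Here page 2 has the highest degree, because it receives the most links.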
People have constructed climate networks where the nodes are locations on the Earth’s surface, and the weight $A_{ij}$ measures how correlated the weather is at the $i$th and $j$th location. Then, the degree says how ‘important’ a given location is for the Earth’s climate, in some vague sense.
For example, in Complex networks in climate dynamics, Donges et al take surface air temperature data on a grid and compute the correlation between grid points.
More precisely, let $T_i(t)$ be the temperature at the $i$th grid point at month $t$ after the average for that month in all years under consideration has been subtracted off, to eliminate some seasonal variations. They compute the Pearson correlation $A_{ij}$ of $T_i(t)$ and $T_j(t)$ for each pair of grid points $i, j$. The Pearson correlation is the simplest measure of linear correlation, normalized to range between $-1$ and $1$.
We could construct a weighted graph this way, and it would be symmetric, or undirected:
$$A_{ij} = A_{ji}$$
However, Donges et al prefer to work with a graph rather than a weighted graph. So, they create a graph where there is an edge from $i$ to $j$ (and also from $j$ to $i$) when $|A_{ij}|$ exceeds a certain threshold, and no edge otherwise.
They can adjust this threshold so that any desired fraction of pairs actually have an edge between them. After some experimentation they chose this fraction to be 0.5%.
A certain patch dominates the world! This is the El Niño basin. The Indian Ocean comes in second.
(Some details, which I may not say:
The Pearson correlation is the covariance
$$\langle (T_i - \langle T_i \rangle)(T_j - \langle T_j \rangle) \rangle$$
normalized by dividing by the standard deviation of $T_i$ and the standard deviation of $T_j$.
The reddest shade of red in the above picture shows nodes that are connected to 5% or more of the other nodes. These nodes are connected to at least 10 times as many nodes as average.)
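Here is a minimal sketch of the construction described above, in plain Python. It is not Donges et al’s code: the function names are hypothetical, the anomaly step assumes a monthly series that starts in January and covers at least one full year, and the threshold is found by ranking pairs by the absolute value of their correlation and keeping the top fraction, which is one simple way to hit a target link density. The toy series are made up.

```python
import math
from itertools import combinations


def monthly_anomalies(series):
    """Subtract each calendar month's mean (series assumed to start in January)."""
    means = [sum(series[m::12]) / len(series[m::12]) for m in range(12)]
    return [value - means[t % 12] for t, value in enumerate(series)]


def pearson(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def threshold_network(anomalies, density):
    """Keep the `density` fraction of node pairs with the largest |correlation|."""
    pairs = list(combinations(range(len(anomalies)), 2))
    strength = {p: abs(pearson(anomalies[p[0]], anomalies[p[1]])) for p in pairs}
    n_links = max(1, round(density * len(pairs)))
    return set(sorted(pairs, key=strength.get, reverse=True)[:n_links])


def degrees(n_nodes, links):
    """Number of links touching each node."""
    counts = [0] * n_nodes
    for i, j in links:
        counts[i] += 1
        counts[j] += 1
    return counts


# Made-up anomaly series for four grid points; node 1 is exactly twice node 0.
toy = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12],
    [3, 1, 4, 1, 5, 9],
    [2, 7, 1, 8, 2, 8],
]
links = threshold_network(toy, density=1 / 6)  # keep 1 of the 6 possible pairs
node_degrees = degrees(len(toy), links)
```

With real data one would first pass each grid point’s monthly temperatures through `monthly_anomalies` and then call `threshold_network` with `density=0.005`.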
The Pearson correlation detects linear correlations. A more flexible measure is mutual information: how many bits of information knowing the temperature at time $t$ at grid point $i$ tells you about the temperature at the same time at grid point $j$.
Donges et al create a climate network this way as well, putting an edge between nodes if their mutual information exceeds a certain cutoff. They choose this cutoff so that 0.5% of node pairs have an edge between them, and get the following map:
The result is almost indistinguishable in the El Niño basin. So, this feature is not just an artifact of focusing on linear correlations.
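For readers who have not met mutual information before, here is a rough sketch of one common way to estimate it from two time series: bin both series and compare the joint histogram with the product of the marginals. The equal-width binning and the choice of four bins are assumptions made for the example; the source does not say which estimator Donges et al use.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=4):
    """Estimate mutual information in bits from a 2D histogram with equal-width bins."""

    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid a zero width for constant series
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum of p(a, b) * log2( p(a, b) / (p(a) p(b)) ) over the occupied bins.
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items()
    )


# A series shares 2 bits with itself when it is spread evenly over 4 bins...
x = [0, 1, 2, 3] * 5
mi_same = mutual_information(x, x)

# ...and 0 bits with a series that is independent of it.
u = [0, 0, 1, 1] * 5
v = [0, 1, 0, 1] * 5
mi_independent = mutual_information(u, v, bins=2)
```

Unlike the Pearson correlation, this quantity does not care whether the relationship between the two series is linear.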
El Niño breaks climate links
We can also look at how climate networks change with time—and in particular, how they are affected by El Niños. This is the subject of a 2008 paper by Tsonis and Swanson, Topology and predictability of El Niño and La Niña networks.
They create a climate network in a way that’s similar to the one I just described. The main differences are that they:
• separately create climate networks for El Niño and La Niña time periods;
• create a link between grid points when their Pearson correlation has absolute value greater than 0.5;
• only use temperature data from November to March in each year, claiming that summertime introduces spurious links.
They get this map for La Niña conditions:
and this map for El Niño conditions:
They conclude that “El Niño breaks climate links”.
This may seem to contradict what I just said a minute ago. But it doesn’t! While the El Niño basin is a region where the surface air temperatures are highly correlated to temperatures at many other points, when an El Niño actually occurs it disrupts correlations between temperatures at different locations worldwide—and even in the El Niño basin!
For the rest of the talk I want to focus on a third claim: namely, that El Niños can be predicted by means of an increase in correlations between temperatures within the El Niño basin and temperatures outside this region. This claim was made in a recent paper by Ludescher et al. I want to examine it somewhat critically.
Predicting El Niños
People really want to predict El Niños, because they have huge effects on agriculture, especially around the Pacific ocean. However, it’s generally regarded as very hard to predict El Niños more than 6 months in advance. There is also a spring barrier: it’s harder to predict El Niños through the spring of any year.
It’s controversial how much of the unpredictability in the ENSO cycle is due to chaos intrinsic to the Pacific ocean system, and how much is due to noise from outside the system. Both may be involved.
There are many teams trying to predict El Niños, some using physical models of the Earth’s climate, and others using machine learning techniques. There is a kind of competition going on, which you can see at a National Oceanic and Atmospheric Administration website.
The most recent predictions give a sense of how hard this job is:
When the 3-month running average of the Niño 3.4 index exceeds 0.5°C for 5 months, we officially declare that there is an El Niño.
As you can see, it’s hard to be sure if there will be an El Niño early next year! However, the consensus forecast is yes, a weak El Niño. This is the best we can do, now. Right now multi-model ensembles have better predictive skill than any one model.
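The official criterion quoted above can be turned into a short check on a monthly Niño 3.4 series. This sketch makes two assumptions that the text does not spell out: the running average is centered on each month, and ‘for 5 months’ means five consecutive values of the smoothed series. The function names and the toy series are hypothetical.

```python
def running_mean_3(index):
    """Centered 3-month running mean; the first and last months are dropped."""
    return [sum(index[t - 1:t + 2]) / 3 for t in range(1, len(index) - 1)]


def el_nino_flags(index, threshold=0.5, min_run=5):
    """Flag smoothed months lying in a run of at least min_run values above threshold."""
    smooth = running_mean_3(index)
    flags = [False] * len(smooth)
    run_start = None
    for t, value in enumerate(smooth + [float("-inf")]):  # sentinel ends any final run
        if value > threshold:
            if run_start is None:
                run_start = t
        else:
            if run_start is not None and t - run_start >= min_run:
                for k in range(run_start, t):
                    flags[k] = True
            run_start = None
    return flags


# Made-up anomalies in degrees C: a seven-month warm spell and a two-month one.
long_spell = [0.0, 0.0] + [1.0] * 7 + [0.0, 0.0]
short_spell = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
long_flags = el_nino_flags(long_spell)
short_flags = el_nino_flags(short_spell)
```

Only the long spell is flagged; the short one never stays above the threshold for five smoothed months.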
The work of Ludescher et al
The Azimuth Project has carefully examined a 2013 paper by Ludescher et al called Very early warning of next El Niño, which uses a climate network for El Niño prediction.
They build their climate network using correlations of daily surface air temperature data between points inside the El Niño basin and certain points outside this region, as shown here:
The red dots are the points in their version of the El Niño basin.
(Next I will describe Ludescher’s procedure. I may omit some details in the actual talk, but let me include them here.)
The main idea of Ludescher et al is to construct a climate network that is a weighted graph, and to say an El Niño will occur if the average weight of edges between points in the El Niño basin and points outside this basin exceeds a certain threshold.
As in the other papers I mentioned, Ludescher et al let $T_i(t)$ be the surface air temperature at the $i$th grid point at time $t$ minus the average temperature at that location at that time of year in all years under consideration, to eliminate the most obvious seasonal effects.
They consider a time-delayed covariance between temperatures at different grid points $i, j$:
$$\langle T_i(t) T_j(t - \tau) \rangle - \langle T_i(t) \rangle \langle T_j(t - \tau) \rangle$$
where $\tau$ is a time delay, and the angle brackets denote a running average over the last year, that is:
$$\langle f(t) \rangle = \frac{1}{365} \sum_{d = 0}^{364} f(t - d)$$
where $t$ is the time in days.
They normalize this to define a correlation $C_{i,j}^t(\tau)$ that ranges from $-1$ to $1$.
Next, for any pair of nodes $i$ and $j$, and for each time $t$, they determine the maximum, the mean and the standard deviation of $|C_{i,j}^t(\tau)|$, as the delay $\tau$ ranges from $-200$ to $200$ days.
They define the link strength $S_{ij}(t)$ as the difference between the maximum and the mean value of $|C_{i,j}^t(\tau)|$, divided by its standard deviation.
Finally, they let $S(t)$ be the average link strength, calculated by averaging $S_{ij}(t)$ over all pairs $(i, j)$ where $i$ is a grid point inside their El Niño basin and $j$ is a grid point outside this basin, but still in their larger rectangle.
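The link strength for a single pair of grid points can be sketched as follows. This is an illustrative reading of the procedure, not the code of Ludescher et al or of the Azimuth Project: the function names are hypothetical, and the handling of negative delays (shifting the first series into the past so that no future data is needed) is an assumption. The check at the end uses synthetic noise and a scaled-down window and delay range so that it runs instantly.

```python
import math
import random


def window_corr(x, y):
    """Pearson correlation of two equal-length windows."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def delayed_correlations(Ti, Tj, t, window=365, max_delay=200):
    """|C(tau)| for tau = -max_delay..max_delay, each over `window` days ending near t."""
    result = {}
    for tau in range(-max_delay, max_delay + 1):
        end_i = t - max(-tau, 0)  # negative tau: shift series i into the past
        end_j = t - max(tau, 0)   # positive tau: shift series j into the past
        wi = Ti[end_i - window + 1:end_i + 1]
        wj = Tj[end_j - window + 1:end_j + 1]
        result[tau] = abs(window_corr(wi, wj))
    return result


def link_strength(Ti, Tj, t, window=365, max_delay=200):
    """(max - mean) / standard deviation of |C(tau)| over all delays."""
    values = list(delayed_correlations(Ti, Tj, t, window, max_delay).values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return (max(values) - mean) / std


# Synthetic check: series j is series i delayed by 3 days.
rng = random.Random(0)
Ti = [rng.gauss(0, 1) for _ in range(120)]
Tj = [0.0] * 3 + Ti[:-3]
corr = delayed_correlations(Ti, Tj, t=119, window=50, max_delay=5)
strength = link_strength(Ti, Tj, t=119, window=50, max_delay=5)
```

With the default arguments, `t` must be at least 564 so that every window fits inside the data. The average link strength would then be the mean of `link_strength` over all pairs with one point inside the basin and one outside.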
Here is what they get:
The blue peaks are El Niños: episodes where the Niño 3.4 index is over 0.5°C for at least 5 months.
The red line is their ‘average link strength’. Whenever this exceeds a certain threshold and the Niño 3.4 index is not already over 0.5°C, they predict an El Niño will start in the following calendar year.
Ludescher et al chose their threshold for El Niño prediction by training their algorithm on climate data from 1948 to 1980, and tested it on data from 1981 to 2013. They claim that with this threshold, their El Niño predictions were correct 76% of the time, and their predictions of no El Niño were correct in 86% of all cases.
On this basis they claimed—when their paper was published in February 2014—that the Niño 3.4 index would exceed 0.5 by the end of 2014 with probability 3/4.
The latest data as of 1 December 2014 seems to say: yes, it happened!
Replication and critique
Graham Jones of the Azimuth Project wrote code implementing Ludescher et al’s algorithm, as best as we could understand it, and got results close to theirs, though not identical. The code is open-source; one goal of the Azimuth Project is to do science ‘in the open’.
More interesting than the small discrepancies between our calculation and theirs is the question of whether ‘average link strengths’ between points in the El Niño basin and points outside are really helpful in predicting El Niños.
Steve Wenner, a statistician helping the Azimuth Project, noted some ambiguities in Ludescher et al’s El Niño prediction rules and disambiguated them in a number of ways. For each way he used Fisher’s exact test to compute the $p$-value of the null hypothesis that Ludescher et al’s El Niño prediction does not improve the odds that what they predict will occur.
The best he got (that is, the lowest $p$-value) was 0.03. This is just a bit more significant than the conventional 0.05 threshold for rejecting a null hypothesis.
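To make the test concrete, here is a small sketch of a one-sided Fisher exact test on a 2×2 table of forecasts against outcomes. It is not Steve Wenner’s analysis: the function name is hypothetical, the choice of a one-sided test is an assumption, and the counts are invented purely to exercise the code.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(at least `a` hits) under the null hypothesis, for the table [[a, b], [c, d]].

    a: El Niño predicted and observed       b: predicted but not observed
    c: not predicted but observed           d: neither predicted nor observed
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(col1, k) * comb(total - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(total, row1)


# Invented counts, not the real forecast record.
p_mixed = fisher_one_sided(3, 1, 1, 3)    # 17/70, about 0.24
p_perfect = fisher_one_sided(4, 0, 0, 4)  # 1/70, about 0.014
```

With only a handful of events, even a mostly correct forecast record gives an unimpressive $p$-value, which is the small-sample problem in miniature. If SciPy is available, `scipy.stats.fisher_exact` with `alternative='greater'` should give the same one-sided values.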
Do high average link strengths between points in the El Niño basin and points elsewhere in the Pacific really increase the chance that an El Niño is coming? It is hard to tell from the work of Ludescher et al.
One reason is that they treat El Niño as a binary condition, either on or off depending on whether the Niño 3.4 index for a given month exceeds 0.5 or not. This is not the usual definition of El Niño, but the real problem is that they are only making a single yes-or-no prediction each year for 65 years: does an El Niño occur during this year, or not? 31 of these years (1950-1980) are used for training their algorithm, leaving just 33 retrodictions and one actual prediction (1981-2013, and 2014).
So, there is a serious problem with small sample size.
We can learn a bit by taking a different approach, and simply running some linear regressions between the average link strength and the Niño 3.4 index for each month. There are 766 months from 1950 to 2013, so this gives us more data to look at. Of course, it’s possible that the relation between average link strength and Niño is highly nonlinear, so a linear regression may not be appropriate. But it is at least worth looking at!
Daniel Mahler and Dara Shayda of the Azimuth Project did this and found the following interesting results.
Simple linear models
Here is a scatter plot showing the Niño 3.4 index as a function of the average link strength on the same month:
(Click on these scatter plots for more information.)
The coefficient of determination, $R^2$, is 0.0175. In simple terms, this means that the average link strength in a given month explains just 1.75% of the variance of the Niño 3.4 index. That’s quite low!
Here is a scatter plot showing the Niño 3.4 index as a function of the average link strength six months earlier:
Now $R^2$ is 0.088. So, the link strength explains 8.8% of the variance in the Niño 3.4 index 6 months later. This is still not much, but interestingly, it’s much more than when we try to relate them at the same moment in time! And the $p$-value is small enough that the effect is statistically significant.
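The calculation behind these numbers is an ordinary least-squares fit of one series against a delayed copy of another. A minimal sketch, with hypothetical function names and made-up data in which the target simply follows the predictor two steps later:

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def lagged_r_squared(predictor, target, lag):
    """R^2 for predicting target[t + lag] from predictor[t]."""
    if lag == 0:
        return r_squared(predictor, target)
    return r_squared(predictor[:-lag], target[lag:])


# Made-up series: target[t + 2] = 3 * predictor[t] + 1.
predictor = [0, 1, 0, 2, 0, 3, 1, 0]
target = [9, 9] + [3 * p + 1 for p in predictor[:-2]]
same_month = lagged_r_squared(predictor, target, 0)
two_later = lagged_r_squared(predictor, target, 2)
```

For the plots above, `predictor` would be the monthly average link strength, `target` the monthly Niño 3.4 index, and `lag` either 0 or 6.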
Of course, we could also try to use Niño 3.4 to predict itself. Here is the Niño 3.4 index plotted against the Niño 3.4 index six months earlier:
Now $R^2$ is about 0.16. So, this is better than using the average link strength!
That doesn’t sound good for average link strength. But now let’s try to predict Niño 3.4 using both itself and the average link strength 6 months earlier. Here is a scatter plot showing that:
Here the $x$ axis is an optimally chosen linear combination of average link strength and Niño 3.4: one that maximizes $R^2$.
In this case we get $R^2$ of about 0.22.
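For two predictors, the best achievable $R^2$ can be computed from three pairwise correlations, without fitting anything explicitly. The sketch below uses that standard identity; the function names and the toy data are assumptions made for the example, and the data are constructed so that the target is an exact linear combination of the two predictors.

```python
import math


def corr(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_two(x1, x2, y):
    """R^2 of the least-squares fit y ~ a + b1*x1 + b2*x2 (x1, x2 not collinear)."""
    r1, r2, r12 = corr(x1, y), corr(x2, y), corr(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


# Made-up predictors and a target that is an exact combination of them.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [3, 1, 4, 1, 5, 9]
y = [2 * a - b + 1 for a, b in zip(x1, x2)]
combined = r_squared_two(x1, x2, y)
```

When the two predictors are uncorrelated, the formula reduces to the sum of the two individual $R^2$ values. With the rounded figures quoted in this talk that sum would be about 24%, slightly more than the 22% reported for the pair.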
Conclusions
What can we conclude from this?
Using a linear model, the average link strength on a given month accounts for only 8% of the variance of Niño 3.4 index 6 months in the future. That sounds bad, and indeed it is.
However, there are more interesting things to say than this!
Both the Niño 3.4 index and the average link strength can be computed from the surface air temperature of the Pacific during some window in time. The Niño 3.4 index explains 16% of its own variance 6 months into the future; the average link strength explains 8%, and taken together they explain 22%. So, these two variables contain a fair amount of independent information about the Niño 3.4 index 6 months in the future.
Furthermore, they explain a surprisingly large amount of its variance for just 2 variables.
For comparison, Mahler used a random forest variant called ExtraTreesRegressor to predict the Niño 3.4 index 6 months into the future from much larger collections of data. Out of the 778 months available he trained the algorithm on the first 400 and tested it on the remaining 378.
The result: using a full worldwide grid of surface air temperature values at a given moment in time explains only 23% of the variance of the Niño 3.4 index 6 months into the future. A full grid of surface air pressure values does considerably better, but still explains only 34% of the variance. Using twelve months of the full grid of pressure values only gets around 37%.
From this viewpoint, explaining 22% of the variance with just two variables doesn’t look so bad!
Moreover, while the Niño 3.4 index is maximally correlated with itself at the same moment in time, for obvious reasons, the average link strength is maximally correlated with the Niño 3.4 index 10 months into the future:
(The lines here occur at monthly intervals.)
However, we have not tried to determine if the average link strength as Ludescher et al define it is optimal in this respect. Graham Jones has shown that simplifying their definition of this quantity doesn’t change it much. Maybe modifying their definition could improve it. There seems to be a real phenomenon at work here, but I don’t think we know exactly what it is!
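The lag at which one series is best correlated with another can be found by scanning over delays. Here is a rough sketch with hypothetical names and made-up data, in which the target copies the predictor two steps later:

```python
import math


def corr(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lag_profile(predictor, target, max_lag):
    """Correlation between predictor[t] and target[t + lag] for lag = 0..max_lag."""
    profile = {}
    for lag in range(max_lag + 1):
        x = predictor[:len(predictor) - lag]
        y = target[lag:]
        profile[lag] = corr(x, y)
    return profile


# Made-up series: the target repeats the predictor two steps later.
predictor = [0, 1, 0, 2, 0, 3, 1, 0, 2, 1, 0, 3]
target = [5, 5] + predictor[:-2]
profile = lag_profile(predictor, target, max_lag=4)
best_lag = max(profile, key=profile.get)
```

Applied to monthly average link strength and the Niño 3.4 index with `max_lag` of a year or more, the same scan would locate the peak described above.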
My talk has avoided discussing physical models of the ENSO, because I wanted to focus on very simple, general ideas from complex network theory. However, it seems obvious that really understanding the ENSO requires a lot of ideas from meteorology, oceanography, physics, and the like. I am not advocating a ‘purely network-based approach’.
Please Mister Professor Sir, might I inquire, whatever happened to John Dutton’s book The Ceaseless Wind? http://books.google.com/books?id=3ogL1o5DePIC&printsec=frontcover&dq=inauthor:%22John+A.+Dutton%22&hl=en&sa=X&ei=TDh7VJXwJ8qwogTj_IHIDg&ved=0CB8Q6AEwAA#v=onepage&q&f=false
Has anyone proved the matrix methods you advocate are more accurate than the nonlinear differential equations favored by Professor Dutton?
“the nonlinear differential equations favored by Professor Dutton?”
HenryB, the Azimuth Project is also working this angle.
Dear Web Hub Telescope
Wonderfulness! I’m looking forward to a detailed article comparing the classical methods described by Dutton with the modern approach espoused by Baez et al.
“Of course, it’s possible that the relation between average link strength and Niño is highly nonlinear, so a linear regression may not be appropriate.”
Is there something like “functional regression” which allows a more general functional relationship to emerge from the data with few initial assumptions (reminiscent of the way a functional-optimizing function can magically emerge in the calculus of variations)?
Oh, I guess you’re going to say that absent constraints, you’ll always find an infinite number of solutions which “explain” 100% of the observed behavior – which doesn’t help much. If so, is there a way to incorporate a heuristic metric into the objective function (perhaps a simplicity measure) to winnow the infinite solution set down to a smaller set of arguably-plausible solutions while keeping the initial search space quite general?
Yes, you can always explain a dataset perfectly by a sufficiently complex model, but that model will be mostly useless on new data. This is known as *overfitting*. The most reliable way to guard against this is to fit the model on only a part of the data and evaluate the model on the remaining unseen *held out set*. In this case you can tell by just eyeballing the scatter plots that there is no magic nonlinear relationship (except one that just wildly oscillates to fit individual points). There is just a general up and to the right trend plus a lot of noise. Most of the error comes from the noise, so fitting a smooth curve is unlikely to be a massive improvement over a straight line.
Thanks Daniel.
John, either (a) your brain locked up on the profundity of my question or (b) there was a glitch somewhere. Given your coherence w/ Domenico 4 mins later (and frankly, given the profundity of my question) my money’s on (b):)
In the scatter plot of the Simple linear models the click don’t work.
I don’t understand what is “other” in the Annual Global Temperature Anomalies graph, but the climatologists have not my same problem.
It is a clear, simple, and profound analysis of a Nino model; it is like reading a narrative.
Domenico wrote:
Whoops! It should be working now.
My talk will be given to experts on neural networks and machine learning, not climatologists. So, I should explain this.
An El Niño occurs when it gets quite hot in a certain patch of the Pacific. A La Niña occurs when it gets quite cold. The rest of the time is “other”.
(Later in my talk I give a more precise definition of El Niño, and there is a similar definition of La Niña… and when neither of those conditions hold, it’s “other.”)
I’m glad you liked the talk, and thanks for the help.
I find the bit about the ‘global warming pause’ distracting. I think the second graph would be a better starting point.
My plan is to hook people’s interest with something they’ve heard about. Some things I’ll say that aren’t on the slides are supposed to help smooth the transition. Namely: El Niño is a major form of ‘natural variability’, El Niños bring up heat from the ocean into the atmosphere, and the ‘global warming pause’ may end with the next big El Niño (say, one comparable to the 1997-1998 El Niño).
But I might jettison this stuff if I decide I need all the time I can get, or want to avoid questions from global warming nuts.
I decided to eliminate the ‘global warming pause’ stuff.
What are the reasons to consider a simple graph by defining a threshold and not a graph with weights? We can normalize the positive matrix $A$ to interpret it as a probability measure on the set $N \times N$, where $N$ is the set of nodes. Being symmetric, the projected measure on $N$ is the same for the two projections. This measure probably gives the same results.
I don’t really understand the reason for turning a weighted graph into a simple graph using a threshold, except perhaps that it’s easier to work with huge graphs than with huge weighted graphs, since there’s less data. Since I don’t know a really good reason for it, I don’t like it.
That is a good reason if you are planning to do complicated operations with the matrix, for example finding the inverse. But adding all the elements and then dividing each element by the sum is not a complicated operation. Is the data available? Maybe I can try to do it (with some help). If the maps do agree I would say that proves the good scientific intuition in defining the threshold; if not, I think there is something to be explained.
There is another thing: for each column $j$ let $s_j$ be the sum of the elements of the matrix $A$ in that column. Let $P$ be the matrix with elements $A_{ij}/s_j$. So the matrix $P$ is a stochastic matrix: for each column all the elements add up to 1. By Perron’s Theorem 1 is an eigenvalue and all other eigenvalues have norm less than 1.
The eigenvalue 1 is simple if the graph is connected. This should be the case. The corresponding eigenvector is an equilibrium measure for the stochastic process associated with $P$. The point is that this measure is precisely the one defined above.
Could there be a climatic interpretation of this? Flow of information?
I am very intrigued about the very low correlated white strip just south of the most correlated zone. I mean, most of the ocean is light blue and you have a clearly white strip zone just south of the red and green and dark blue zone.
I enjoyed very much.
Sorry here again the formula there was a missing }
.
Just in case: the matrix $P$ is formed by dividing each element by a column-dependent number, which is the sum of all the elements in the column.
John, I think that you should try to cobble together a more positive message in the introduction to the talk.
Take a few steps back, and consider the question of why this material — at a very general level — could potentially be of interest to (1) you, and (2) the audience at NIPS. What would be the abstract for this talk?
Here are some possible ingredients:
– New area of application for network theory
– New area of application for machine learning
– Application area represents a pressing human concern
– Azimuth project is searching for ways that mathematicians, scientists and programmers can contribute to the understanding of significant environmental problems
– Made a decision to investigate a more concrete problem
– In this talk, I will begin by giving background and context on the El Nino phenomenon and its physics; then discuss climate network structures that have been posited as indicators for the occurrence of El Nino events; then proceed to evaluate a specific paper which uses this framework, and makes specific testable hypotheses about the preconditions for the occurrence of an El Nino event.
I would also suggest a section that talks about the role of machine learning in this study.
Good Luck!
Just for everyone’s information, there’s a new article on ENSO and networks just published in Climate Dynamics, at http://link.springer.com/article/10.1007/s0038201422657?noaccess=true. I don’t subscribe, so I have seen nothing but the Abstract.