El Niño Project (Part 8)

So far we’ve rather exhaustively studied a paper by Ludescher et al that uses climate networks for El Niño prediction. This time I’d like to look at another paper:

• Y. Berezin, A. Gozolchiani, O. Guez and S. Havlin, Stability of climate networks with time, Scientific Reports 2 (2012).

Some of the authors are the same, and the way they define climate networks is very similar. But their goal here is different: they want to see how stable climate networks are over time. This matters, since the other paper wants to predict El Niños by detecting changes in climate networks.

They divide the world into 9 zones. For each zone they construct several climate networks. Each one is an array of numbers $W_{l r}^y$, one for each year $y$ and each pair of grid points $l, r$ in that zone. They call $W_{l r}^y$ a link strength: it’s a measure of how correlated the weather is at those two grid points during that year.

I’ll say more later about how they compute these link strengths. In Part 3 we explained one method for doing it. This paper uses a similar but subtly different method.

The paper’s first big claim is that $W_{l r}^y$ doesn’t change much from year to year, “in complete contrast” to the pattern of local daily air temperature and pressure fluctuations. In simple terms: the strength of the correlation between weather at two different points tends to be quite stable.

Moreover, the definition of link strength involves an adjustable time delay, $\tau$. We can measure the correlation between the weather at point $l$ at any given time and point $r$ at a time $\tau$ days later. The link strength is computed by taking a maximum over time delays $\tau$. Naively speaking, the value of $\tau$ that gives the maximum correlation is “how long it typically takes for weather at point $l$ to affect weather at point $r$”. Or the other way around, if $\tau$ is negative.

This is a naive way of explaining the idea, because I’m mixing up correlation with causation. But you get the idea, I hope.

Their second big claim is that when the link strength between two points $l$ and $r$ is big, the value of $\tau$ that gives the maximum correlation doesn’t change much from year to year. In simple terms: if the weather at two locations is strongly correlated, the amount of time it takes for weather at one point to reach the other point doesn’t change very much.

The data

How do Berezin et al define their climate network?

They use a dataset called “Reanalysis 2”. This is not exactly the same data set that Ludescher et al use: “Reanalysis 2” is a newer attempt to reanalyze and fix up the same pile of data. That’s a very interesting issue, but never mind that now!

Berezin et al use data for:

• the geopotential height for six different pressures

and

• the air temperature at those different heights

The geopotential height for some pressure says roughly how high you have to go for air to have that pressure. Here’s the geopotential height field, measured in meters, for the pressure of 500 millibars on one particular day of one particular year:

Berezin et al use daily values for this data for:

• locations world-wide on a grid with a resolution of 5° × 5°,

during:

• the years from 1948 to 2006.

They divide the globe into 9 zones, and separately study each zone:

So, they’ve got twelve different functions of space and time (two variables at each of six pressure levels), where space is a rectangle discretized using a 5° × 5° grid, and time is discretized in days. From each such function they build a ‘climate network’.

How do they do it?

The climate networks

Berezin et al‘s method of defining a climate network is similar to Ludescher et al‘s, but subtly different. Compare Part 3 if you want to think about this.

Let $\tilde{S}^y_l(t)$ be any one of their functions, evaluated at the grid point $l$ on day $t$ of year $y$.

Let $S_l^y(t)$ be $\tilde{S}^y_l(t)$ minus its climatological average. For example, if $t$ is June 1st and $y$ is 1970, we average the temperature at location $l$ over all June 1sts from 1948 to 2006, and subtract that from $\tilde{S}^y_l(t)$ to get $S^y_l(t)$. In other words: $\displaystyle{ S^y_l(t) = \tilde{S}^y_l(t) - \frac{1}{N} \sum_y \tilde{S}^y_l(t) }$

where $N$ is the number of years considered.
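The climatological-average subtraction above can be sketched in a few lines of code. This is a minimal illustration with synthetic data; the array shapes and variable names are my own, not from the paper:

```python
import numpy as np

# Hypothetical daily series at one grid point: S_tilde[y, d] is the raw
# value on day d of year y.  1948-2006 gives 59 years; we ignore leap days.
rng = np.random.default_rng(0)
N_years, n_days = 59, 365
S_tilde = rng.normal(size=(N_years, n_days))

# Climatological average: for each calendar day, average over all years.
climatology = S_tilde.mean(axis=0)   # shape (365,)

# Anomaly: subtract the climatological average for that calendar day.
S = S_tilde - climatology            # shape (59, 365)
```

By construction, each calendar day’s anomaly then averages to zero over the years, which is exactly what subtracting the climatological mean is supposed to achieve.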

For any function of time $f$, let $\langle f^y(t) \rangle$ be the average of the function over all days in year $y$. This is different from the ‘running average’ used by Ludescher et al, and I can’t even be 100% sure that Berezin et al mean what I just said: all they give is the notation $\langle f^y(t) \rangle$.

Let $l$ and $r$ be two grid points, and let $\tau$ be any number of days in the interval $[-\tau_{\mathrm{max}}, \tau_{\mathrm{max}}]$. Define the cross-covariance by: $\displaystyle{ \Big\langle \Big(f_l(t) - \langle f_l(t) \rangle\Big) \; \Big( f_r(t + \tau) - \langle f_r(t + \tau) \rangle \Big) \Big\rangle }$

where the outer angle brackets again denote an average over days $t$ in year $y$, so the result depends on $\tau$ but not on $t$.

I believe Berezin et al mean to consider this quantity, because they mention two grid points $l$ and $r$. Their notation omits the subscripts $l$ and $r$, so it is impossible to be completely sure what they mean! But what I wrote is the reasonable quantity to consider here, so I’ll assume this is what they meant.

They normalize this quantity and take its absolute value, forming: $\displaystyle{ X_{l r}^y(\tau) = \frac{\Big|\Big\langle\Big(f_l(t) - \langle f_l(t) \rangle\Big) \; \Big( f_r(t + \tau) - \langle f_r(t + \tau) \rangle \Big)\Big\rangle\Big|} {\sqrt{\Big\langle \Big(f_l(t) - \langle f_l(t)\rangle \Big)^2 \Big\rangle } \; \sqrt{\Big\langle \Big(f_r(t+\tau) - \langle f_r(t+\tau)\rangle\Big)^2 \Big\rangle } } }$

They then take the maximum value of $X_{l r}^y(\tau)$ over delays $\tau \in [-\tau_{\mathrm{max}}, \tau_{\mathrm{max}}]$, subtract its mean over delays in this range, and divide by the standard deviation. They write something like this: $\displaystyle{ W_{l r}^y = \frac{\mathrm{MAX}\Big( X_{l r}^y - \langle X_{l r}^y\rangle \Big) }{\mathrm{STD} X_{l r}^y} }$

and say that the maximum, mean and standard deviation are taken over the (not written) variable $\tau \in [-\tau_{\mathrm{max}}, \tau_{\mathrm{max}}]$.
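Putting the last few formulas together, here is a sketch of the whole link-strength computation for one pair of grid points, under my reading of Berezin et al’s definitions. The function name, the value of $\tau_{\mathrm{max}}$, and the synthetic test data are all my own choices, not the paper’s:

```python
import numpy as np

def link_strength(f_l, f_r, tau_max=100):
    """W for one pair of anomaly series: peak of the normalized
    cross-covariance over delays, relative to its mean and spread."""
    X = []
    for tau in range(-tau_max, tau_max + 1):
        # align the two series with a delay of tau days
        if tau >= 0:
            a, b = f_l[:len(f_l) - tau], f_r[tau:]
        else:
            a, b = f_l[-tau:], f_r[:len(f_r) + tau]
        a = a - a.mean()
        b = b - b.mean()
        # absolute value of the normalized cross-covariance
        X.append(abs(np.mean(a * b)) / (a.std() * b.std()))
    X = np.asarray(X)
    # MAX(X - <X>) / STD(X), with MAX, <.> and STD taken over tau
    return (X.max() - X.mean()) / X.std()

rng = np.random.default_rng(1)
f = rng.normal(size=365)
noise = rng.normal(size=365)

# A pair where r lags l by 5 days should have a sharp peak in X at
# tau = 5, hence a much larger link strength than an unrelated pair.
W_linked = link_strength(f, np.roll(f, 5))
W_noise = link_strength(f, noise)
```

Note how the definition rewards a sharp peak: a pair with one clear delay of maximal correlation scores much higher than a pair whose cross-covariance is uniformly mediocre over all delays.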

Each number $W_{l r}^y$ is called a link strength. For each year, the matrix of numbers $W_{l r}^y$ where $l$ and $r$ range over all grid points in our zone is called a climate network.

We can think of a climate network as a weighted complete graph with the grid points $l$ as nodes. Remember, an undirected graph is one without arrows on the edges. A complete graph is an undirected graph with one edge between any pair of nodes:

A weighted graph is an undirected graph where each edge is labelled by a number called its weight. But right now we’re also calling the weight the ‘link strength’.
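Concretely, a weighted complete graph on the grid points is nothing but a symmetric matrix of link strengths. A toy sketch (the numbers here are made up, purely to show the data structure):

```python
import numpy as np

# One year's climate network on 4 grid points, stored as a symmetric
# matrix W: entry W[l, r] is the link strength between points l and r.
n = 4
rng = np.random.default_rng(2)
W = rng.uniform(1, 5, size=(n, n))
W = (W + W.T) / 2          # undirected: same weight in both directions
np.fill_diagonal(W, 0.0)   # no self-links

# A typical network-theory quantity: the 'node strength' of each grid
# point, i.e. the total link strength attached to it.
node_strength = W.sum(axis=1)
```

Many of the quantities studied in network theory, like this node strength, are simple functions of the weight matrix.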

A lot of what’s usually called ‘network theory’ is the study of weighted graphs. You can learn about it here:

• Ernesto Estrada, The Structure of Complex Networks: Theory and Applications, Oxford U. Press, Oxford, 2011.

Suffice it to say that given a weighted graph, there are a lot of quantities you can compute from it, which are believed to tell us interesting things!

The conclusions

I will not delve into the real meat of the paper, namely what they actually do with their climate networks! The paper is free online, so you can read it yourself.

I will just quote their conclusions and show you a couple of graphs.

The conclusions touch on an issue that’s important for the network-based approach to El Niño prediction. If climate networks are ‘stable’, not changing much in time, why would we use them to predict a time-dependent phenomenon like the El Niño Southern Oscillation?

We have established the stability of the network of connections between the dynamics of climate variables (e.g. temperatures and geopotential heights) in different geographical regions. This stability stands in fierce contrast to the observed instability of the original climatological field pattern. Thus the coupling between different regions is, to a large extent, constant and predictable. The links in the climate network seem to encapsulate information that is missed in analysis of the original field.

The strength of the physical connection, $W_{l r}$, that each link in this network represents, changes only between 5% to 30% over time. A clear boundary between links that represent real physical dependence and links that emerge due to noise is shown to exist. The distinction is based on both the high link average strength $\overline{W_{l r}}$ and on the low variability of time delays $\mathrm{STD}(T_{l r})$.

Recent studies indicate that the strength of the links in the climate network changes during the El Niño Southern Oscillation and the North Atlantic Oscillation cycles. These changes are within the standard deviation of the strength of the links found here. Indeed in Fig. 3 it is clearly seen that the coefficient of variation of links in the El Niño basin (zone 9) is larger than other regions such as zone 1. Note that even in the El Niño basin the coefficient of variation is relatively small (less than 30%).

Beside the stability of single links, also the hierarchy of the link strengths in the climate network is preserved to a large extent. We have shown that this hierarchy is partially due to the two dimensional space in which the network is embedded, and partially due to pure physical coupling processes. Moreover the contribution of each of these effects, and the level of noise was explicitly estimated. The spatial effect is typically around 50% of the observed stability, and the noise reduces the stability value by typically 5%–10%.

The network structure was further shown to be consistent across different altitudes, and a monotonic relation between the altitude distance and the correspondence between the network structures is shown to exist. This yields another indication that the observed network structure represents effects of physical coupling.

The stability of the network and the contributions of different effects were summarized in specific relation to different geographical areas, and a clear distinction between equatorial and off–equatorial areas was observed. Generally, the network structure of equatorial regions is less stable and more fluctuative.

The stability and consistence of the network structure during time and across different altitudes stands in contrast to the known unstable variability of the daily anomalies of climate variables. This contrast indicates an analogy between the behavior of nodes in the climate network and the behavior of coupled chaotic oscillators. While the fluctuations of each coupled oscillators are highly erratic and unpredictable, the interactions between the oscillators is stable and can be predicted. The possible outreach of such an analogy lies in the search for known behavior patterns of coupled chaotic oscillators in the climate system. For example, existence of phase slips in coupled chaotic oscillators is one of the fingerprints for their cooperated behavior, which is evident in each of the individual oscillators. Some abrupt changes in climate variables, for example, might be related to phase slips, and can be understood better in this context.

On the basis of our measured coefficient of variation of single links (around 15%), and the significant overall network stability of 20–40%, one may speculatively assess the extent of climate change. However, for this assessment our current available data is too short and does not include enough time from periods before the temperature trends. An assessment of the relation between the network stability and climate change might be possible mainly through launching of global climate model “experiments” realizing other climate conditions, which we indeed intend to perform.

A further future outreach of our work can be a mapping between network features (such as network motifs) and known physical processes. Such a mapping was previously shown to exist between an autonomous cluster in the climate network and El Niño. Further structures without such a climate interpretation might point towards physical coupling processes which were not observed earlier.

(I have expanded some acronyms and deleted some reference numbers.)

Finally, here are two nice graphs showing the average link strength as a function of distance. The first is based on four climate networks for Zone 1, the southern half of South America:

The second is based on four climate networks for Zone 9, a big patch of the Pacific north of the Equator which roughly corresponds to the ‘El Niño basin’:

As we expect, temperatures and geopotential heights at two points become less correlated as the points get farther apart. But the rate at which the correlation drops off conveys interesting information! Graham Jones has made some interesting charts of this for the rectangle of the Pacific that Ludescher et al use for El Niño prediction, and I’ll show you those next time.
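A plot of average link strength versus distance, like the two above, can be sketched as follows: compute the great-circle distance for every pair of grid points, bin the pairs by distance, and average the link strengths within each bin. Everything here is synthetic (the grid coordinates, the weight matrix, and the 500 km bin width are my illustrative choices):

```python
import numpy as np

def great_circle_km(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance between two (lat, lon) points, in km."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlon = np.radians(lon2 - lon1)
    cos_angle = np.sin(p1) * np.sin(p2) + np.cos(p1) * np.cos(p2) * np.cos(dlon)
    return radius * np.arccos(np.clip(cos_angle, -1.0, 1.0))

rng = np.random.default_rng(3)
n = 20
lats = rng.uniform(-30, 0, size=n)    # toy grid points in one zone
lons = rng.uniform(280, 310, size=n)
W = rng.uniform(2, 8, size=(n, n))    # toy link strengths
W = (W + W.T) / 2

# Collect (distance, strength) for each unordered pair of grid points.
d, w = [], []
for l in range(n):
    for r in range(l + 1, n):
        d.append(great_circle_km(lats[l], lons[l], lats[r], lons[r]))
        w.append(W[l, r])
d, w = np.array(d), np.array(w)

# Bin by distance (500 km bins) and average link strength in each bin.
bins = np.arange(0.0, d.max() + 500.0, 500.0)
idx = np.digitize(d, bins)
avg_strength = [w[idx == i].mean() for i in np.unique(idx)]
```

Plotting `avg_strength` against the bin midpoints gives a curve of the same kind as the graphs above.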

The series so far

El Niño project (part 1): basic introduction to El Niño and our project here.

El Niño project (part 2): introduction to the physics of El Niño.

El Niño project (part 3): summary of the work of Ludescher et al.

El Niño project (part 4): how Graham Jones replicated the work by Ludescher et al, using software written in R.

El Niño project (part 5): how to download R and use it to get files of climate data.

El Niño project (part 6): Steve Wenner’s statistical analysis of the work of Ludescher et al.

El Niño project (part 7): the definition of El Niño.

El Niño project (part 8): Berezin et al on the stability of climate networks.

2 Responses to El Niño Project (Part 8)

1. arch1 says:

Very interesting, thanks. Just eyeballing the two graphs, it is tempting to guess that in general, the avg link strengths drop to some minimal baseline value of around 3.5 at about 3k Km separation, with the Zone 9 height isobars being the anomalies. I wonder how tau varies with separation (I know, I should go look:-)

2. domenico says:

It seems to be a graph with a discrete temporal evolution: there are some edges between spatial nodes (the grid, or graph), and some edges connecting the same graph at different times.
It seems a common idea in experimental science that if the graph contains some edges with a great weight (correlation), then there may be a function connecting some past nodes, making prediction of an event possible; so a study of the weights is enough to guess the domain of the function used in the prediction, if experts suggest which variables belong at each node.
