Regime Shift?

25 November, 2015

There’s no reason that the climate needs to change gradually. Recently scientists have become interested in regime shifts, which are abrupt, substantial and lasting changes in the state of a complex system.

Rasha Kamel of the Azimuth Project pointed us to a report in Science Daily which says:

Planet Earth experienced a global climate shift in the late 1980s on an unprecedented scale, fueled by anthropogenic warming and a volcanic eruption, according to new research. Scientists say that a major step change, or ‘regime shift,’ in Earth’s biophysical systems, from the upper atmosphere to the depths of the ocean and from the Arctic to Antarctica, was centered around 1987, and was sparked by the El Chichón volcanic eruption in Mexico five years earlier.

As always, it’s good to drill down through the science reporters’ summaries to the actual papers.  So I read this one:

• Philip C. Reid et al, Global impacts of the 1980s regime shift on the Earth’s climate and systems, Global Change Biology, 2015.

The authors of this paper analyzed 72 time series of climate and ecological data to search for such a regime shift, and found one around 1987. If such a thing really happened, this could be very important.

Here are some of the data they looked at:

Click to enlarge them—they’re pretty interesting! Vertical lines denote regime shift years, colored in different ways: 1984 blue, 1985 green, 1986 orange, 1987 red, 1988 brown, 1989 purple and so on. You can see that lots are red.

The paper has a lot of interesting and informed speculations about the cause of this shift—so give it a look.  For now I just want to tackle an important question of a more technical nature: how did they search for regime shifts?

They used the ‘STARS’ method, which stands for Sequential t-Test Analysis of Regime Shifts. They explain:

The STARS method (Rodionov, 2004; Rodionov & Overland, 2005) tests whether the end of one period (regime) of a certain length is different from a subsequent period (new regime). The cumulative sum of normalized deviations from the hypothetical mean level of the new regime is calculated, and then compared with the mean level of the preceding regime. A shift year is detected if the difference in the mean levels is statistically significant according to a Student’s t-test.

In his third paper, Rodionov (2006) shows how autocorrelation can be accounted for. From each year of the time series (except edge years), the rules are applied backwards and forwards to test that year as a potential shift year. The method is, therefore, a running procedure applied on sequences of years within the time series. The multiple STARS method used here repeats the procedure for 20 test-period lengths ranging from 6 to 25 years that are, for simplicity (after testing many variations), of the same length on either side of the regime shift.
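To make this concrete, here is a minimal sketch in Python of the basic ingredient: a two-sample t-test comparing a fixed-length window before each candidate year with the window after it. This is not Rodionov's actual STARS algorithm: it omits the sequential cumulative-sum step, the autocorrelation correction and the multiple test-period lengths, and the function and parameter names are my own.

```python
import numpy as np
from scipy.stats import ttest_ind

def candidate_shift_years(x, regime_length=10, alpha=0.05):
    """Scan an annual time series x for candidate regime shifts: for each year,
    compare the `regime_length` values before it with the `regime_length`
    values from it onward using a two-sample t-test, and flag the year if the
    difference in means is significant at level alpha."""
    shifts = []
    for t in range(regime_length, len(x) - regime_length + 1):
        before = x[t - regime_length:t]
        after = x[t:t + regime_length]
        stat, p = ttest_ind(after, before, equal_var=True)
        if p < alpha:
            shifts.append(t)
    return shifts
```

Repeating this scan for test-period lengths from 6 to 25 years, as the paper describes, just means calling the function with different values of regime_length.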

Elsewhere I read that the STARS method is ‘too sensitive’. Could it be due to limitations of the ‘statistical significance’ idea involved in Student’s t-test?

You can download software that implements the STARS method here. The method is explained in the papers by Rodionov.

Do you know about this stuff?  If so, I’d like to hear your views on this paper and the STARS method.

Fires in Indonesia

2 November, 2015

I lived in Singapore for two years, and I go back to work there every summer. I love Southeast Asia, its beautiful landscapes, its friendly people, and its huge biological and cultural diversity. It’s a magical place.

But in 2013 there was a horrible haze from fires in nearby Sumatra. And this year it’s even worse. It makes me want to cry, thinking about how millions of people all over this region are being choked as the rain forest burns.

This part of the world has a dry season from May to October and then a wet season. In the dry season, Indonesian farmers slash down jungle growth, burn it, and plant crops. That is nothing new.

But now, palm oil plantations run by big companies do this on a massive scale. Jungles are disappearing at an astonishing rate. Some of this is illegal, but corrupt government officials are paid to look the other way. Whenever we buy palm oil—in soap, cookies, bread, margarine, detergents, and many other products—we become part of the problem.

This year the fires are worse. One reason is that we’re having an El Niño. That typically means more rain in California—which we desperately need. But it means less rain in Southeast Asia.

This summer it was very dry in Singapore. Then, in September, the haze started. We got used to rarely seeing the sun—only yellow-brown light filtering through the smoke. When it stinks outside, you try to stay indoors.

When I left on September 19th, the PSI index of air pollution had risen above 200, which is ‘very unhealthy’. Singapore had offered troops to help fight the fires, but Indonesia turned down the offer, saying they could handle the situation themselves. That was completely false: thousands of fires were burning out of control in Sumatra, Borneo and other Indonesian islands.

I believe the Indonesian government just didn’t want foreign troops on their land. Satellites could detect the many hot spots where fires were burning. But outrageously, the government refused to say who owned those lands.

A few days after I left, the PSI index in Singapore had shot above 300, which is ‘hazardous’. But in parts of Borneo the PSI had reached 1,986. The only name for that is hell.

By now Indonesia has accepted help from Singapore. Thanks to changing winds, the PSI in Singapore has been slowly dropping throughout October. In the last few days the rainy season has begun. Each time the rain clears the air, Singaporeans can see something beautiful and almost forgotten: a blue sky.

Rain is also helping in Borneo. But the hellish fires continue. There have been over 100,000 individual fires—mostly in Sumatra, Borneo and Papua. In many places, peat in the ground has caught on fire! It’s very hard to put out a peat fire.

If you care about the Earth, this is very disheartening. These fires have been putting over 15 million tons of carbon dioxide into the air per day – more than the whole US economy! And so far this year they’ve put out 1.5 billion tons of CO2. That’s more than Germany’s carbon emissions for the whole year—in fact, even more than Japan’s. How can we make progress on reducing carbon emissions with this going on?

For you and me, the first thing is to stop buying products with palm oil. The problem is largely one of government corruption driven by money from palm oil plantations. But the real heart of the problem lies in Indonesia. Luckily Widodo, the president of this country, may be part of the solution. But the solution will be difficult.

Quoting National Public Radio:

Widodo is Indonesia’s first president with a track record of efficient local governance in running two large cities. Strong action on the haze issue could help fulfill the promise of reform that motivated Indonesian voters to put him in office in October 2014.

The president has deployed thousands of firefighters and accepted international assistance. He has ordered a moratorium on new licenses to use peat land and ordered law enforcers to prosecute people and companies who clear land by burning forests.

“It must be stopped, we mustn’t allow our tropical rainforests to disappear because of monoculture plantations like oil palms,” Widodo said early in his administration.

Land recently burned and planted with palm trees is now under police investigation in Kalimantan [the Indonesian part of Borneo].

The problem of Indonesia’s illegal forest fires is so complex that it’s very hard to say exactly who is responsible for causing it.

Indonesia’s government has blamed both big palm oil companies and small freeholders. Poynton [executive director of the Forest Trust] says the culprits are often mid-sized companies with strong ties to local politicians. He describes them as lawless middlemen who pay local farmers to burn forests and plant oil palms, often on other companies’ concessions.

“There are these sort of low-level, Mafioso-type guys that basically say, ‘You get in there and clear the land, and I’ll then finance you to establish a palm oil plantation,'” he says.

The problem is exacerbated by ingrained government corruption, in which politicians grant land use permits for forests and peat lands to agribusiness in exchange for financial and political support.

“The disaster is not in the fires,” says independent Jakarta-based commentator Wimar Witoelar. “It’s in the way that past Indonesian governments have colluded with big palm oil businesses to make the peat lands a recipe for disaster.”

The quote is from here:

• Anthony Kuhn, As Indonesia’s annual fires rage, plenty of blame but no responsibility.

For how to avoid using palm oil, see for example:

• Lael Goodman, How many products with palm oil do I use in a day?

First, avoid processed foods. That’s smart for other reasons too.

Second, avoid stuff that contains stearic acid, sodium palmitate, sodium laureth sulfate, cetyl alcohol, glyceryl stearate and related compounds—various forms of artificial grease that are often made from palm oil. It takes work to avoid all this stuff, but at least be aware of it. These chemicals are not made in laboratories from pure carbon, hydrogen, oxygen and nitrogen! The raw ingredients often come from palm plantations, huge monocultures that are replacing the wonderful diversity of rainforest life.

For more nuanced suggestions, see the comments below. Right now I’m just so disgusted that I want to avoid palm oil.

For data on the carbon emissions of this and other fires, see:

• Global fire emissions data.

1997 was the last really big El Niño.

This shows a man in Malaysia in September. Click on the pictures for more details. The picture at top shows a woman named Gaye Thavisin in Indonesia—perhaps in Kalimantan, the Indonesian half of Borneo, the third largest island in the world. Here is a bit of her story:

The Jungle River Cruise is run by Kalimantan Tour Destinations, a foreign-owned company set up by two women pioneering the introduction of ecotourism into a part of Central Kalimantan that to date has virtually no tourism.

Inspired by the untapped potential of Central Kalimantan’s mighty rivers, Gaye Thavisin and Lorna Dowson-Collins converted a traditional Kalimantan barge into a comfortable cruise boat with five double cabins, an inside sitting area and an upper viewing deck, bringing the first jungle cruises to the area.

Originally Lorna Dowson-Collins worked in Central Kalimantan with a local NGO on a sustainable livelihoods programme. The future livelihoods of the local people were under threat as logging left the land devastated with poor soils and no forest to fend from.

Kalimantan was teeming with the potential of her people and their fascinating culture, with beautiful forests of diverse flora and fauna, including the iconic orang-utan, and her mighty rivers providing access to these wonderful treasures.

An idea for a social enterprise emerged, which involved building a boat to journey guests to inaccessible places and provide comfortable accommodation.

Gaye Thavisin, an Australian expatriate, for 4 years operated an attractive, new hotel 36 km out of Palangkaraya in Kalimantan. Gaye was passionate about developing the tourism potential of Central Kalimantan and was also looking at the idea of boats. With her contract at the hotel coming to an end, the Jungle Cruise began to take shape!

Melting Permafrost (Part 4)

4 March, 2015


Russian scientists have recently found more new craters in Siberia, apparently formed by explosions of methane. Three were found last summer. They looked for more using satellite photos… and found more!

“What I think is happening here is, the permafrost has been acting as a cap or seal on the ground, through which gas can’t permeate,” says Paul Overduin, a permafrost expert at the Alfred Wegener Institute in Germany. “And it reaches a particular temperature where there’s not enough ice in it to act that way anymore. And then gas can rush out.”

It’s rather dramatic. Some Russian villagers have even claimed to see flashes in the distance when these explosions occur. But how bad is it?

The Siberian Times

An English-language newspaper called The Siberian Times has a good article about these craters, which I’ll quote extensively:

• Anna Liesowska, Dozens of new craters suspected in northern Russia, The Siberian Times, 23 February 2015.

B1 – famous Yamal hole in 30 kilometers from Bovanenkovo, spotted in 2014 by helicopter pilots. Picture: Marya Zulinova, Yamal regional government press service.

Respected Moscow scientist Professor Vasily Bogoyavlensky has called for ‘urgent’ investigation of the new phenomenon amid safety fears.

Until now, only three large craters were known about in northern Russia with several scientific sources speculating last year that heating from above the surface due to unusually warm climatic conditions, and from below, due to geological fault lines, led to a huge release of gas hydrates, so causing the formation of these craters in Arctic regions.

Two of the newly-discovered large craters—also known as funnels to scientists—have turned into lakes, revealed Professor Bogoyavlensky, deputy director of the Moscow-based Oil and Gas Research Institute, part of the Russian Academy of Sciences.

Examination using satellite images has helped Russian experts understand that the craters are more widespread than was first realised, with one large hole surrounded by as many as 20 mini-craters, The Siberian Times can reveal.

Four Arctic craters: B1 – famous Yamal hole in 30 kilometers from Bovanenkovo, B2 – recently detected crater in 10 kilometers to the south from Bovanenkovo, B3 – crater located in 90 kilometers from Antipayuta village, B4 – crater located near Nosok village, on the north of Krasnoyarsk region, near Taimyr Peninsula. Picture: Vasily Bogoyavlensky.

‘We know now of seven craters in the Arctic area,’ he said. ‘Five are directly on the Yamal peninsula, one in Yamal Autonomous district, and one is on the north of the Krasnoyarsk region, near the Taimyr peninsula.

‘We have exact locations for only four of them. The other three were spotted by reindeer herders. But I am sure that there are more craters on Yamal, we just need to search for them.

‘I would compare this with mushrooms: when you find one mushroom, be sure there are few more around. I suppose there could be 20 to 30 craters more.’

He is anxious to investigate the craters further because of serious concerns for safety in these regions.

The study of satellite images showed that near the famous hole, located in 30 kilometres from Bovanenkovo are two potentially dangerous objects, where the gas emission can occur at any moment.

Satellite image of the site before the forming of the Yamal hole (B1). K1 and the red outline show the hillock (pingo) formed before the gas emission. Yellow outlines show the potentially dangerous objects. Picture: Vasily Bogoyavlensky.

He warned: ‘These objects need to be studied, but it is rather dangerous for the researchers. We know that there can occur a series of gas emissions over an extended period of time, but we do not know exactly when they might happen.

‘For example, you all remember the magnificent shots of the Yamal crater in winter, made during the latest expedition in November 2014. But do you know that Vladimir Pushkarev, director of the Russian Centre of Arctic Exploration, was the first man in the world who went down the crater of gas emission?

‘More than this, it was very risky, because no one could guarantee there would not be new emissions.’

Professor Bogoyavlensky told The Siberian Times: ‘One of the most interesting objects here is the crater that we mark as B2, located 10 kilometres to the south of Bovanenkovo. On the satellite image you can see that it is one big lake surrounded by more than 20 small craters filled with water.

‘Studying the satellite images we found out that initially there were no craters nor a lake. Some craters appeared, then more. Then, I suppose that the craters filled with water and turned to several lakes, then merged into one large lake, 50 by 100 metres in diameter.

‘This big lake is surrounded by the network of more than 20 ‘baby’ craters now filled with water and I suppose that new ones could appear last summer or even now. We now counting them and making a catalogue. Some of them are very small, no more than 2 metres in diameter.’

‘We have not been at the spot yet,’ he said. ‘Probably some local reindeer herders were there, but so far no scientists.’

He explained: ‘After studying this object I am pretty sure that there was a series of gas emissions over an extended period of time. Sadly, we do not know, when exactly these emissions occur, i.e. mostly in summer, or in winter too. We see only the results of this emissions.’

The object B2 is now attracting special attention from the researchers as they seek to understand and explain the phenomenon. This is only 10km from Bovanenkovo, a major gas field, developed by Gazprom, in the Yamalo-Nenets Autonomous Okrug. Yet older satellite images do not show the existence of a lake, nor any craters, in this location.

Not only the new craters constantly forming on Yamal show that the process of gas emission is ongoing actively.

Professor Bogoyavlensky shows the picture of one of the Yamal lakes, taken by him from the helicopter and points on the whitish haze on its surface.

Yamal lake with traces of gas emissions. Picture: Vasily Bogoyavlensky.

He commented: ‘This haze that you see on the surface shows that gas seeps that go from the bottom of the lake to the surface. We call this process ‘degassing’.

‘We do not know, if there was a crater previously and then turned to lake, or the lake formed during some other process. More important is that the gases from within are actively seeping through this lake.

‘Degassing was revealed on the territory of Yamal Autonomous District about 45 years ago, but now we think that it can give us some clues about the formation of the craters and gas emissions. Anyway, we must research this phenomenon urgently, to prevent possible disasters.’

Professor Bogoyavlensky stressed: ‘For now, we can speak only about the results of our work in the laboratory, using the images from space.

‘No one knows what is happening in these craters at the moment. We plan a new expedition. Also we want to put not less than four seismic stations in Yamal district, so they can fix small earthquakes, that occur when the crater appears.

‘In two cases locals told us that they felt earth tremors. The nearest seismic station was yet too far to register these tremors.

‘I think that at the moment we know enough about the crater B1. There were several expeditions, we took probes and made measurements. I believe that we need to visit the other craters, namely B2, B3 and B4, and then visit the rest three craters, when we will know their exact location. It will give us more information and will bring us closer to understanding the phenomenon.’

He urged: ‘It is important not to scare people, but to understand that it is a very serious problem and we must research this.’

In an article for Drilling and Oil magazine, Professor Bogoyavlensky said the parapet of these craters suggests an underground explosion.

‘The absence of charred rock and traces of significant erosion due to possible water leaks speaks in favour of mighty eruption (pneumatic exhaust) of gas from a shallow underground reservoir, which left no traces on soil which contained a high percentage of ice,’ he wrote.

‘In other words, it was a gas-explosive mechanism that worked there. A concentration of 5-to-16% of methane is explosive. The most explosive concentration is 9.5%.’

Gas probably concentrated underground in a cavity ‘which formed due to the gradual melting of buried ice’. Then ‘gas was replacing ice and water’.

‘Years of experience has shown that gas emissions can cause serious damage to drilling rigs, oil and gas fields and offshore pipelines,’ he said. ‘Yamal craters are inherently similar to pockmarks.

‘We cannot rule out new gas emissions in the Arctic and in some cases they can ignite.’

This was possible in the case of the crater found at Antipayuta, on the Yamal peninsula.

‘The Antipayuta residents told how they saw some flash. Probably the gas ignited when appeared the crater B4, near Taimyr peninsula. This shows us, that such explosion could be rather dangerous and destructive.

‘We need to answer now the basic questions: what areas and under what conditions are the most dangerous? These questions are important for safe operation of the northern cities and infrastructure of oil and gas complexes.’

Crater B3 located in 90 kilometres from Antipayuta village, Yamal district. Picture: local residents.

Crater B4 located near Nosok village, on the north of Krasnoyarsk region, near Taimyr Peninsula. Picture: local residents.

How bad is it?

Since methane is a powerful greenhouse gas, some people are getting nervous. If global warming releases the huge amounts of methane trapped under permafrost, will that create more global warming? Could we be getting into a runaway feedback loop?

The Washington Post has a good article telling us to pay attention, but not panic:

• Chris Mooney, Why you shouldn’t freak out about those mysterious Siberian craters, The Washington Post, 2 March 2015.

David Archer of the University of Chicago, a famous expert on climate change and the carbon cycle, took a look at these craters and did some quick calculations. He estimated that “it would take about 20,000,000 such eruptions within a few years to generate the standard Arctic Methane Apocalypse that people have been talking about.”

More importantly, people are measuring the amount of methane in the air. We know how it’s doing. For example, you can make graphs of methane concentration here:

• Earth System Research Laboratory, Global Monitoring Division, Data visualization.

Click on a northern station like Alert, the scary name of a military base and research station in Nunavut—the huge northern territory in Canada:

(Alert is on the very top, near Greenland.)

Choose Carbon cycle gases from the menu at right, and click on Time series. You’ll go to another page, and then choose Methane—the default choice is carbon dioxide. Go to the bottom of the page and click Submit and you’ll get a graph like this:

Methane there has risen from about 1750 nanomoles per mole in 1985 to about 1900 nanomoles per mole in 2015. That’s a big increase—but not a sign of incipient disaster.

A larger perspective might help. Apparently from 1750 to 2007 the atmospheric CO2 concentration increased by about 40%, while the methane concentration increased by about 160%. The amount of additional radiative forcing due to CO2 is about 1.6 watts per square meter, while for methane it’s about 0.5:

• Greenhouse gas: natural and anthropogenic sources, Wikipedia.
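As a rough sanity check on those numbers, here is a back-of-the-envelope calculation using the simplified radiative forcing expressions of Myhre et al. (1998), which the IPCC reports use. The concentrations below are round numbers chosen to match the percentage increases quoted above, not precise measurements, and I omit the methane-nitrous oxide overlap term, which pulls the methane figure down toward the quoted 0.5 watts per square meter.

```python
import numpy as np

# Preindustrial concentrations (rough round numbers) and the quoted increases:
C0, C = 278.0, 278.0 * 1.40    # CO2 in ppm, up about 40%
M0, M = 700.0, 700.0 * 2.60    # CH4 in ppb, up about 160%

# Simplified forcing expressions (Myhre et al. 1998):
dF_co2 = 5.35 * np.log(C / C0)               # about 1.8 W/m^2
dF_ch4 = 0.036 * (np.sqrt(M) - np.sqrt(M0))  # about 0.58 W/m^2 before subtracting
                                             # the CH4-N2O overlap term
print(dF_co2, dF_ch4)
```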

So, methane is significant, and increasing fast. But so far, CO2 is the 800-pound gorilla in the living room. Still, I’m glad Russian scientists are doing things like this:

The latest expedition to Yamal crater was initiated by the Russian Center of Arctic Exploration in early November 2014. The researchers were first in the world to enter this crater. Pictures: Vladimir Pushkarev/Russian Center of Arctic Exploration

Previous posts

For previous posts in this series, see:

• Melting Permafrost (Part 1).

• Melting Permafrost (Part 2).

• Melting Permafrost (Part 3).

Exploring Climate Data (Part 3)

9 February, 2015

post by Nadja Kutz

This blog article is about the temperature data used in the reports of the Intergovernmental panel on Climate Change (IPCC). I present the results of an investigation into the completeness of global land surface temperature records. There are noticeable gaps in the data records, but I leave discussion about the implications of these gaps to the readers.

The data used in the newest IPCC report, the Fifth Assessment Report (AR5), seems not yet to be available at the IPCC data distribution centre at the time of writing.

The temperature databases used for the previous report, AR4, are listed here on the website of the IPCC. These databases are:

• CRUTEM3 from the Climatic Research Unit (CRU),

• NCDC (probably using the data set GHCNM v3),

• GISTEMP from NASA’s Goddard Institute for Space Studies (GISS),

• the collection of Lugina et al.

The temperature collection CRUTEM3 was put together by the Climatic Research Unit (CRU) at the University of East Anglia. According to the CRU temperature page, the CRUTEM3 data, and in particular the CRUTEM3 land air temperature anomalies on a 5° × 5° grid-box basis, have now been superseded by the so-called CRUTEM4 collection.

Since the CRUTEM collection appeared to be an important data source for the IPCC, I started by investigating the land air temperature data collection CRUTEM4. In what follows, only the availability of so-called land air temperature measurements will be investigated. (The collections often also contain sea surface temperature (SST) measurements.)

Usually only ‘temperature grid data’ or other averaged data is used for the climate assessments. Here ‘grid’ means that data is averaged over regions that cover the earth in a grid. However, the data is originally generated by temperature measuring stations around the world. So, I was interested in this original data and its quality. For the CRUTEM collection the latest station data is called the CRUTEM4 station data collection.

I downloaded the station’s data file, which is a simple text file, from the bottom of the CRUTEM4 station data page. I noticed at first glance that there are big gaps in the file in some regions of the world. The file is huge, though: it contains monthly measurements starting in January 1701 and ending in 2011, and there are altogether 4634 stations. Quickly finding a gap in such a huge file was disconcerting enough that it persuaded my husband Tim Hoffmann to help me investigate this station data in a more accessible way, via a visualization.

The visualization takes a long time to load, and due to some unfortunate software configuration issues (not on our side) it sometimes doesn’t work at all. Please open it now in a separate tab while reading this article:

• Nadja Kutz and Tim Hoffmann, Temperature data from stations around the globe, collected by CRUTEM 4.

For those who are too lazy to explore the data themselves, or in case the visualization is not working, here are some screenshots from the visualization which documents the missing data in the CRUTEM4 dataset.

The images should speak for themselves. However, an additional explanation is provided after the images. One should in particular mention that it looks as if the deterioration of the CRUTEM4 data set has been greater in the years 2000-2009 than in the years 1980-2000.

Now you could say: okay, we know that there are budget cuts in the UK, and so probably the University of East Anglia was subject to those, but what about all these other collections in the world? This will be addressed after the images.

North America

Screenshots of station coverage for January 1980, January 2000 and January 2009.

Eurasia/Northern Africa

Screenshots of station coverage for January 1980, January 2000 and January 2009.

These screenshots cover various regions of the world for the month of January in the years 1980, 2000 and 2009. Each station is represented by a small rectangle around its coordinates. The color of a rectangle indicates the monthly temperature value for that station: blue is the coldest, red is the hottest. Black rectangles are what CRU calls ‘missing data’, denoted with -999 in the file. I prefer instead to call it ‘invalid’ data, in order to distinguish it from the missing data due to stations that have been closed down. In the visualization, closed-down stations are encoded by a transparent rectangle, and their markers are also present.
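If you want to count these invalid entries in the raw station file yourself, a few lines of Python suffice. This is only a sketch: it assumes, purely for illustration, that each data line consists of a year followed by 12 monthly values, with -999 marking invalid entries, and the file name is hypothetical; check the actual CRUTEM4 format description before trusting the counts.

```python
from collections import Counter

invalid_per_year = Counter()

with open("crutem4_stations.txt") as f:   # hypothetical file name
    for line in f:
        fields = line.split()
        # assume data lines are: year followed by 12 monthly values
        if len(fields) != 13 or not fields[0].isdigit():
            continue
        year = int(fields[0])
        if not 1700 <= year <= 2011:
            continue
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue                      # a header line that merely looked like data
        invalid_per_year[year] += sum(1 for v in values if v <= -999)

for year in sorted(invalid_per_year):
    print(year, invalid_per_year[year])
```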

We couldn’t find the reasons for this invalid data. At the end of the post John Baez has provided some more literature on this question. It is worth noting that satellites can replace surface measurements only to a certain degree, as was highlighted by Stefan Rahmstorf in a blog post on RealClimate:

the satellites cannot measure the near-surface temperatures but only those overhead at a certain altitude range in the troposphere. And secondly, there are a few question marks about the long-term stability of these measurements (temporal drift).

What about other collections?

Apart from the already mentioned collections, which were used in the IPCC’s AR4 report, there are some more institutional collections, and I also found some private weather collections. Among the private collections I haven’t found any that goes back in time as far as CRUTEM4. However, some of those private collections might be more complete in terms of actual data than the collections that reach further back in time.

After discussing our visualization on the Azimuth Forum it turned out that Nick Stokes, who runs the blog MOYHU in Australia, had the same idea as me—however, already in 2011: that year he visualized station data, using Google Earth and several different temperature collections.

If you have Google Earth installed then you can see his visualizations here:

• Nick Stokes, click here.

The link is from the documentation page of Nick Stokes’ website.

What are the major collections?

As far as we can tell, the major global collections of temperature data that go back to the 18th or 19th or at least early 20th century seem to be the following. First, there are the collections already mentioned, which are also used in the AR4 report:

• The CRUTEM collection from the University of East Anglia (UK).

• the GISTEMP collection from the Goddard Institute for Space Studies (GISS) at NASA (US).

• the collection of Lugina et al, which is a cooperative project involving NCDC/NOAA (US) (see also below), the University of Maryland (US), St. Petersburg State University (Russia) and the State Hydrological Institute, St. Petersburg, (Russia).

• the GHCN collection from NOAA.

Then there are these:

• the Berkeley Earth collection, called BEST

• The GSOD (Global Summary Of the Day) and Global Historical Climatology Network (GHCN) collections. Both of these are run by the National Climatic Data Center (NCDC) at the National Oceanic and Atmospheric Administration (NOAA) (US). It is not clear to me to what extent these two databases overlap with those of Lugina et al, which were made in cooperation with NCDC/NOAA. It is also not clear to me whether the GHCN collection was used for the AR4 report (it seems so). There is currently also a partially working visualization of the GSOD data here. The sparse data in specific regions (see images above) is also apparent in this visualization.

• There is a comparatively new initiative called International Surface Temperatures Initiative (ISTI) which gathers collections in a databank and seeks to provide temperature data “from hourly to century timescales”. As written on their blog, this data seems not to be quality controlled:

The ISTI dataset is not quality controlled, so, after re-reading section 3.3 of Lawrimore et al 2011, I implemented an extremely simple quality control scheme, MADQC.

What did you visualize?

As far as I understand, the visualization by Nick Stokes—which you just opened—represents the collection BEST (before 1850-2010), the collections GSOD (1921-2010) and GHCN v2 (before 1850-1990) from NOAA, and CRUTEM3 (before 1850-2000).

CRUTEM3 is also visualized in another way by Clive Best. In Clive Best’s visualization, however, it seems that apart from the station name one has no further access to other data, such as station temperatures. Moreover, it is not possible to set a recent time range, which is important for checking how much the dataset changed in recent times.

Unfortunately this limited ability to set a time range also holds for two visualizations by Nick Stokes here and here. In his first visualization, which is more exhaustive than the second, the following datasets are shown: GHCN v3 and an adjusted version of it (GADJ), a preliminary dataset from ISTI, BEST and CRUTEM 4. So his first visualization seems quite exhaustive also with respect to newer data. Unfortunately, as mentioned, setting the time range didn’t work properly (at least when I tested it). The same holds for his second visualization, of GHCN v3 data. So I was only able to trace the deterioration of recent data manually (for example, by clicking on individual stations).

Tim and I visualized CRUTEM4, that is, the updated version of CRUTEM3.

What did you not visualize?

Newer datasets after 2011/2012, for example from the aforementioned ISTI or from the private collections, are not covered in the two visualizations you just opened.

Moreover, in the visualizations mentioned here there is no coverage of the GISS collection, which however now uses NOAA’s GHCN v3 collection. The historical data of GISS could, however, be different from the other collections. The visualizations may also not cover the Lugina et al. collection, which was mentioned above in the context of the IPCC report. Lugina et al. could, however, be similar to GSOD (and GHCN) due to the cooperation. Moreover, GHCN v3 could be substantially more exhaustive than CRUTEM or GHCN v2 (as shown in Nick Stokes’ visualization). However, this last collection was—like CRUTEM4—released in the spring of 2011.

GHCN v3 is also represented in Nick Stokes’ visualizations (here and here). Upon manually investigating it, it didn’t seem to contain much crucial additional data not found in CRUTEM4. Since this manual exploration was not exhaustive, I may be wrong—but I don’t think so.

Hence, to our knowledge, in the two visualizations you just opened, quite a lot of the available data is visualized—and as it seems “almost all” (?) of the far-back-reaching original quality controlled global surface temperature data collections as of 2011 or 2012. If you know of other similar collections please let us know.

As mentioned above, private collections and in particular the ISTI collection may contain much more data. At the time of writing we don’t know to what extent those newer collections will be taken into account for the new IPCC reports, and in particular for the AR5 report. Moreover, it is not so clear how quality control will be ensured for those newer collections.

In conclusion, the previous IPCC reports seem to have been informed by the collections described here. Thus the coverage problems you see here need to be taken into account in discussions about the scientific basis of previous climate assessments.

Hopefully the visualizations from Nick Stokes and from Tim and me are ready for exploration! You can start to explore them yourself, and in particular see that the ‘deterioration of data’ is—just as in our CRUTEM4 visualization—also visible in Nick’s collections.

Note: I would like to thank people at the Azimuth Forum for pointing out references, and in particular Nick Stokes and Nathan Urban.

The effects of missing data

supplement by John Baez

There have always been fewer temperature recording stations in Arctic regions than other regions. The following paper initiated a controversy over how this fact affects our picture of the Earth’s climate:

• Kevin Cowtan and Robert G. Way, Coverage bias in the HadCRUT4 temperature series and its impact on recent temperature trends, Quarterly Journal of the Royal Meteorological Society, 2014.

Here is some discussion:

• Kevin Cowtan, Robert Way, and Dana Nuccitelli, Global warming since 1997 more than twice as fast as previously estimated, new study shows, Skeptical Science, 13 November 2013.

• Stefan Rahmstorf, Global warming since 1997 underestimated by half, RealClimate, 13 November 2013 in which it is highlighted that satellites can replace surface measurements only to a certain degree.

• Anthony Watts’ protest about Cowtan, Way and the Arctic, HotWhopper, 15 November 2013.

• Victor Venema, Temperature trend over last 15 years is twice as large as previously thought, Variable Variability, 13 November 2013.

However, these posts seem to say little about the increasing amount of ‘missing data’.

Networks in Climate Science

29 November, 2014

What follows is a draft of a talk I’ll be giving at the Neural Information Processing Seminar on December 10th. The actual talk may contain more stuff—for example, more work that Dara Shayda has done. But I’d love comments, so I’m posting this now and hoping you can help out.

You can click on any of the pictures to see where it came from or get more information.

Preliminary throat-clearing

I’m very flattered to be invited to speak here. I was probably invited because of my abstract mathematical work on networks and category theory. But when I got the invitation, instead of talking about something I understood, I thought I’d learn about something a bit more practical and talk about that. That was a bad idea. But I’ll try to make the best of it.

I’ve been trying to learn climate science. There’s a subject called ‘complex networks’ where people do statistical analyses of large graphs like the worldwide web or Facebook and draw conclusions from it. People are trying to apply these ideas to climate science. So that’s what I’ll talk about. I’ll be reviewing a lot of other people’s work, but also describing some work by a project I’m involved in, the Azimuth Project.

The Azimuth Project is an all-volunteer project involving scientists and programmers, many outside academia, who are concerned about environmental issues and want to use their skills to help. This talk is based on the work of many people in the Azimuth Project, including Jan Galkowski, Graham Jones, Nadja Kutz, Daniel Mahler, Blake Pollard, Paul Pukite, Dara Shayda, David Tanzer, David Tweed, Steve Wenner and others. Needless to say, I’m to blame for all the mistakes.

Climate variability and El Niño

Okay, let’s get started.

You’ve probably heard about the ‘global warming pause’. Is this a real thing? If so, is it due to ‘natural variability’, heat going into the deep oceans, some combination of both, a massive failure of our understanding of climate processes, or something else?

Here is a chart of global average air temperatures at sea level, put together by NASA’s Goddard Institute for Space Studies:

You can see a lot of fluctuations, including a big dip after 1940 and a tiny dip after 2000. That tiny dip is the so-called ‘global warming pause’. What causes these fluctuations? That’s a big, complicated question.

One cause of temperature fluctuations is a kind of cycle whose extremes are called El Niño and La Niña.

A lot of things happen during an El Niño. For example, in 1997 and 1998, a big El Niño, we saw all these events:

El Niño is part of an irregular cycle that happens every 3 to 7 years, called the El Niño Southern Oscillation or ENSO. Two strongly correlated signs of an El Niño are:

1) Increased sea surface temperatures in a patch of the Pacific called the Niño 3.4 region. The temperature anomaly in this region—how much warmer it is than usual for that time of year—is called the Niño 3.4 index.

2) A decrease in air pressures in the western side of the Pacific compared to those further east. This is measured by the Southern Oscillation Index or SOI.

You can see the correlation here:

El Niños are important because they can cause billions of dollars of economic damage. They also seem to bring heat stored in the deeper waters of the Pacific into the atmosphere. So, one reason for the ‘global warming pause’ may be that we haven’t had a strong El Niño since 1998. The global warming pause might end with the next El Niño. For a while it seemed we were due for a big one this fall, but that hasn’t happened.


The ENSO cycle is just one of many cycles involving teleconnections: strong correlations between weather at distant locations, typically thousands of kilometers. People have systematically looked for these teleconnections using principal component analysis of climate data, and also other techniques.

The ENSO cycle shows up automatically when you do this kind of study. It stands out as the biggest source of climate variability on time scales greater than a year and less than a decade. Some others include:

• The Pacific-North America Oscillation.
• The Pacific Decadal Oscillation.
• The North Atlantic Oscillation.
• The Arctic Oscillation.

For example, the Pacific Decadal Oscillation is a longer-period relative of the ENSO, centered in the north Pacific:

Complex network theory

Recently people have begun to study teleconnections using ideas from ‘complex network theory’.

What’s that? In complex network theory, people often start with a weighted graph: that is, a set N of nodes and for any pair of nodes i, j \in N, a weight A_{i j}, which can be any nonnegative real number.

Why is this called a weighted graph? It’s really just a matrix of nonnegative real numbers!

The reason is that we can turn any weighted graph into a graph by drawing an edge from node j to node i whenever A_{i j} >0. This is a directed graph, meaning that we should draw an arrow pointing from j to i. We could have an edge from i to j but not vice versa! Note that we can also have an edge from a node to itself.

Conversely, if we have any directed graph, we can turn it into a weighted graph by choosing the weight A_{i j} = 1 when there’s an edge from j to i, and A_{i j} = 0 otherwise.

For example, we can make a weighted graph where the nodes are web pages and A_{i j} is the number of links from the web page j to the web page i.

People in complex network theory like examples of this sort: large weighted graphs that describe connections between web pages, or people, or cities, or neurons, or other things. The goal, so far, is to compute numbers from weighted graphs in ways that describe interesting properties of these complex networks—and then formulate and test hypotheses about the complex networks we see in real life.

The El Niño basin

Here’s a very simple example of what we can do with a weighted graph. For any node i, we can sum up the weights of edges going into i:

\sum_{j \in N} A_{i j}

This is called the degree of the node i. For example, if lots of people have web pages with lots of links to yours, your webpage will have a high degree. If lots of people like you on Facebook, you will have a high degree.

So, the degree is some measure of how ‘important’ a node is.
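Here is a tiny concrete example in Python, using the convention above that A_{i j} is the weight of the edge from node j to node i; the numbers are made up.

```python
import numpy as np

# A[i, j] = weight of the edge from node j to node i
A = np.array([
    [0.0, 2.0, 1.0],
    [0.0, 0.0, 3.0],
    [4.0, 0.0, 0.0],
])

degree = A.sum(axis=1)   # degree of node i = sum over j of A[i, j]
print(degree)            # [3. 3. 4.]
```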

People have constructed climate networks where the nodes are locations on the Earth’s surface, and the weight A_{i j} measures how correlated the weather is at the ith and jth location. Then, the degree says how ‘important’ a given location is for the Earth’s climate—in some vague sense.

For example, in Complex networks in climate dynamics, Donges et al take surface air temperature data on a grid and compute the correlation between grid points.

More precisely, let T_i(t) be the temperature at the ith grid point at month t after the average for that month in all years under consideration has been subtracted off, to eliminate some seasonal variations. They compute the Pearson correlation A_{i j} of T_i(t) and T_j(t) for each pair of grid points i, j. The Pearson correlation is the simplest measure of linear correlation, normalized to range between -1 and 1.

We could construct a weighted graph this way, and it would be symmetric, or undirected:

A_{i j} = A_{j i}

However, Donges et al prefer to work with a graph rather than a weighted graph. So, they create a graph where there is an edge from i to j (and also from j to i) when |A_{i j}| exceeds a certain threshold, and no edge otherwise.

They can adjust this threshold so that any desired fraction of pairs i, j actually have an edge between them. After some experimentation they chose this fraction to be 0.5%.
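Here is a minimal sketch of this construction in Python. It follows the recipe just described (correlate anomaly series, then keep the strongest 0.5% of links), but it is my simplified reading of the procedure, not the code Donges et al used, and it assumes the seasonal cycle has already been subtracted.

```python
import numpy as np

def climate_network(T, link_fraction=0.005):
    """T has shape (n_points, n_months): temperature anomalies at each grid point.
    Returns a boolean adjacency matrix keeping only the pairs whose absolute
    Pearson correlation is in the top `link_fraction` of all pairs."""
    A = np.corrcoef(T)                                   # correlations between grid points
    strengths = np.abs(A[np.triu_indices_from(A, k=1)])  # all distinct pairs
    threshold = np.quantile(strengths, 1.0 - link_fraction)
    adjacency = np.abs(A) >= threshold
    np.fill_diagonal(adjacency, False)
    return adjacency

# The degree of each grid point is then just a row sum:
# degree = climate_network(T).sum(axis=1)
```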

A certain patch dominates the world! This is the El Niño basin. The Indian Ocean comes in second.

(Some details, which I may not say:

The Pearson correlation is the covariance

\Big\langle \left( T_i - \langle T_i \rangle \right) \left( T_j - \langle T_j \rangle \right) \Big\rangle

normalized by dividing by the standard deviation of T_i and the standard deviation of T_j.

The reddest shade of red in the above picture shows nodes that are connected to 5% or more of the other nodes. These nodes are connected to at least 10 times as many nodes as average.)

The Pearson correlation detects linear correlations. A more flexible measure is mutual information: how many bits of information knowing the temperature at time t at grid point i tells you about the temperature at the same time at grid point j.

Donges et al create a climate network this way as well, putting an edge between nodes if their mutual information exceeds a certain cutoff. They choose this cutoff so that 0.5% of node pairs have an edge between them, and get the following map:

The result is almost indistinguishable in the El Niño basin. So, this feature is not just an artifact of focusing on linear correlations.
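For concreteness, here is one crude way to estimate the mutual information between two temperature series, by binning them into a two-dimensional histogram. Donges et al presumably use a more careful estimator; the number of bins here is an arbitrary choice of mine.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of the mutual information I(X;Y) in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                    # joint distribution
    px = pxy.sum(axis=1, keepdims=True)          # marginal of x
    py = pxy.sum(axis=0, keepdims=True)          # marginal of y
    nonzero = pxy > 0
    return np.sum(pxy[nonzero] * np.log2(pxy[nonzero] / (px @ py)[nonzero]))
```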

El Niño breaks climate links

We can also look at how climate networks change with time—and in particular, how they are affected by El Niños. This is the subject of a 2008 paper by Tsonis and Swanson, Topology and predictability of El Niño and La Niña networks.

They create a climate network in a way that’s similar to the one I just described. The main differences are that they:

1. separately create climate networks for El Niño and La Niña time periods;

2. create a link between grid points when their Pearson correlation has absolute value greater than 0.5;

3. only use temperature data from November to March in each year, claiming that summertime introduces spurious links.

They get this map for La Niña conditions:

and this map for El Niño conditions:

They conclude that “El Niño breaks climate links”.

This may seem to contradict what I just said a minute ago. But it doesn’t! While the El Niño basin is a region where the surface air temperatures are highly correlated to temperatures at many other points, when an El Niño actually occurs it disrupts correlations between temperatures at different locations worldwide—and even in the El Niño basin!

For the rest of the talk I want to focus on a third claim: namely, that El Niños can be predicted by means of an increase in correlations between temperatures within the El Niño basin and temperatures outside this region. This claim was made in a recent paper by Ludescher et al. I want to examine it somewhat critically.

Predicting El Niños

People really want to predict El Niños, because they have huge effects on agriculture, especially around the Pacific ocean. However, it’s generally regarded as very hard to predict El Niños more than 6 months in advance. There is also a spring barrier: it’s harder to predict El Niños through the spring of any year.

It’s controversial how much of the unpredictability in the ENSO cycle is due to chaos intrinsic to the Pacific ocean system, and how much is due to noise from outside the system. Both may be involved.

There are many teams trying to predict El Niños, some using physical models of the Earth’s climate, and others using machine learning techniques. There is a kind of competition going on, which you can see at a National Oceanic and Atmospheric Administration website.

The most recent predictions give a sense of how hard this job is:

When the 3-month running average of the Niño 3.4 index exceeds 0.5°C for 5 months, we officially declare that there is an El Niño.
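That operational definition is easy to put into code. Here is a minimal sketch; the official criterion is phrased in terms of consecutive overlapping 3-month seasons, which comes to essentially the same running-mean test.

```python
import numpy as np

def el_nino_declared(nino34, threshold=0.5, run=5):
    """nino34: monthly Nino 3.4 anomalies (deg C).  Returns True if the 3-month
    running mean exceeds `threshold` for at least `run` consecutive months."""
    running = np.convolve(nino34, np.ones(3) / 3, mode="valid")
    count = 0
    for value in running:
        count = count + 1 if value > threshold else 0
        if count >= run:
            return True
    return False
```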

As you can see, it’s hard to be sure if there will be an El Niño early next year! However, the consensus forecast is yes, a weak El Niño. This is the best we can do, now. Right now multi-model ensembles have better predictive skill than any one model.

The work of Ludescher et al

The Azimuth Project has carefully examined a 2013 paper by Ludescher et al called Very early warning of next El Niño, which uses a climate network for El Niño prediction.

They build their climate network using correlations between daily surface air temperatures at points inside the El Niño basin and certain points outside this region, as shown here:

The red dots are the points in their version of the El Niño basin.

(Next I will describe Ludescher’s procedure. I may omit some details in the actual talk, but let me include them here.)

The main idea of Ludescher et al is to construct a climate network that is a weighted graph, and to say an El Niño will occur if the average weight of edges between points in the El Niño basin and points outside this basin exceeds a certain threshold.

As in the other papers I mentioned, Ludescher et al let T_i(t) be the surface air temperature at the ith grid point at time t minus the average temperature at that location at that time of year in all years under consideration, to eliminate the most obvious seasonal effects.

They consider a time-delayed covariance between temperatures at different grid points:

\langle T_i(t) T_j(t - \tau) \rangle - \langle T_i(t) \rangle \langle T_j(t - \tau) \rangle

where \tau is a time delay, and the angle brackets denote a running average over the last year, that is:

\displaystyle{ \langle f(t) \rangle = \frac{1}{365} \sum_{d = 0}^{364} f(t - d) }

where t is the time in days.

They normalize this to define a correlation C_{i,j}^t(\tau) that ranges from -1 to 1.

Next, for any pair of nodes i and j, and for each time t, they determine the maximum, the mean and the standard deviation of |C_{i,j}^t(\tau)|, as the delay \tau ranges from -200 to 200 days.

They define the link strength S_{i,j}(t) as the difference between the maximum and the mean value of |C_{i,j}^t(\tau)|, divided by its standard deviation.

Finally, they let S(t) be the average link strength, calculated by averaging S_{i j}(t) over all pairs i,j where i is a grid point inside their El Niño basin and j is a grid point outside this basin, but still in their larger rectangle.
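Here is a condensed sketch in Python of this link strength computation, for a single pair of grid points at a single time. It is my simplified reading of Ludescher et al's recipe, not the Azimuth Project's actual replication code, and it assumes the daily anomaly series have already been prepared as NumPy arrays.

```python
import numpy as np

def link_strength(Ti, Tj, t, max_delay=200, window=365):
    """S_ij(t): (max - mean) / std of |C_ij(tau)| over delays tau, where
    C_ij(tau) is the normalized covariance of Ti over the `window` days ending
    at day t with Tj over the `window` days ending at day t - tau.  The index t
    must leave at least window + max_delay days before it and max_delay days
    after it."""
    cors = []
    for tau in range(-max_delay, max_delay + 1):
        a = Ti[t - window + 1: t + 1]
        b = Tj[t - tau - window + 1: t - tau + 1]
        cov = np.mean(a * b) - np.mean(a) * np.mean(b)
        cors.append(abs(cov / (np.std(a) * np.std(b))))
    cors = np.array(cors)
    return (cors.max() - cors.mean()) / cors.std()
```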

Here is what they get:

The blue peaks are El Niños: episodes where the Niño 3.4 index is over 0.5°C for at least 5 months.

The red line is their ‘average link strength’. Whenever this exceeds a certain threshold \Theta = 2.82, and the Niño 3.4 index is not already over 0.5°C, they predict an El Niño will start in the following calendar year.

Ludescher et al chose their threshold for El Niño prediction by training their algorithm on climate data from 1948 to 1980, and tested it on data from 1981 to 2013. They claim that with this threshold, their El Niño predictions were correct 76% of the time, and their predictions of no El Niño were correct in 86% of all cases.

On this basis they claimed—when their paper was published in February 2014—that the Niño 3.4 index would exceed 0.5 by the end of 2014 with probability 3/4.

The latest data as of 1 December 2014 seems to say: yes, it happened!

Replication and critique

Graham Jones of the Azimuth Project wrote code implementing Ludescher et al’s algorithm, as best as we could understand it, and got results close to theirs, though not identical. The code is open-source; one goal of the Azimuth Project is to do science ‘in the open’.

More interesting than the small discrepancies between our calculation and theirs is the question of whether ‘average link strengths’ between points in the El Niño basin and points outside are really helpful in predicting El Niños.

Steve Wenner, a statistician helping the Azimuth Project, noted some ambiguities in Ludescher et al’s El Niño prediction rules and disambiguated them in a number of ways. For each way, he used Fisher’s exact test to compute the p-value of the null hypothesis that Ludescher et al’s El Niño prediction does not improve the odds that what they predict will occur.

The best he got (that is, the lowest p-value) was 0.03. This is just a bit more significant than the conventional 0.05 threshold for rejecting a null hypothesis.
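For readers unfamiliar with the test, here is what such a calculation looks like in Python. The 2×2 table below is purely illustrative; these are not Wenner's actual counts.

```python
from scipy.stats import fisher_exact

# Hypothetical counts: rows are "alarm raised" / "no alarm",
# columns are "El Nino began the next year" / "did not".
table = [[7, 3],
         [6, 18]]

odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(p_value)   # one-sided p-value for the null hypothesis that alarms don't help
```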

Do high average link strengths between points in the El Niño basin and points elsewhere in the Pacific really increase the chance that an El Niño is coming? It is hard to tell from the work of Ludescher et al.

One reason is that they treat El Niño as a binary condition, either on or off depending on whether the Niño 3.4 index for a given month exceeds 0.5 or not. This is not the usual definition of El Niño, but the real problem is that they are only making a single yes-or-no prediction each year for 65 years: does an El Niño occur during this year, or not? 31 of these years (1950-1980) are used for training their algorithm, leaving just 33 retrodictions and one actual prediction (1981-2013, and 2014).

So, there is a serious problem with small sample size.

We can learn a bit by taking a different approach, and simply running some linear regressions between the average link strength and the Niño 3.4 index for each month. There are 766 months from 1950 to 2013, so this gives us more data to look at. Of course, it’s possible that the relation between average link strength and Niño is highly nonlinear, so a linear regression may not be appropriate. But it is at least worth looking at!
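Here is roughly what such a lagged regression looks like in Python. The arrays are placeholders for monthly series you would have to load yourself, and this is a sketch of the idea rather than Mahler and Shayda's actual code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def lagged_r2(predictors, nino34, lag=6):
    """Regress Nino 3.4 at month t on predictor values at month t - lag and
    return R^2.  `predictors` has shape (n_months, n_features)."""
    X, y = predictors[:-lag], nino34[lag:]
    return LinearRegression().fit(X, y).score(X, y)

# With hypothetical monthly arrays `link_strength` and `nino34`:
# lagged_r2(link_strength.reshape(-1, 1), nino34)               # link strength alone
# lagged_r2(nino34.reshape(-1, 1), nino34)                      # Nino 3.4 against itself
# lagged_r2(np.column_stack([link_strength, nino34]), nino34)   # both together
```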

Daniel Mahler and Dara Shayda of the Azimuth Project did this and found the following interesting results.

Simple linear models

Here is a scatter plot showing the Niño 3.4 index as a function of the average link strength on the same month:

(Click on these scatter plots for more information.)

The coefficient of determination, R^2, is 0.0175. In simple terms, this means that the average link strength in a given month explains just 1.75% of the variance of the Niño 3.4 index. That’s quite low!

Here is a scatter plot showing the Niño 3.4 index as a function of the average link strength six months earlier:

Now R^2 is 0.088. So, the link strength explains 8.8% of the variance in the Niño 3.4 index 6 months later. This is still not much—but interestingly, it’s much more than when we try to relate them at the same moment in time! And the p-value is less than 2.2 \cdot 10^{-16}, so the effect is statistically significant.

Of course, we could also try to use Niño 3.4 to predict itself. Here is the Niño 3.4 index plotted against the Niño 3.4 index six months earlier:

Now R^2 = 0.162. So, this is better than using the average link strength!

That doesn’t sound good for average link strength. But now let’s try to predict Niño 3.4 using both itself and the average link strength 6 months earlier. Here is a scatter plot showing that:

Here the x axis is an optimally chosen linear combination of average link strength and Niño 3.4: one that maximizes R^2.

In this case we get R^2 = 0.22.


What can we conclude from this?

Using a linear model, the average link strength on a given month accounts for only 8% of the variance of Niño 3.4 index 6 months in the future. That sounds bad, and indeed it is.

However, there are more interesting things to say than this!

Both the Niño 3.4 index and the average link strength can be computed from the surface air temperature of the Pacific during some window in time. The Niño 3.4 index explains 16% of its own variance 6 months into the future; the average link strength explains 8%, and taken together they explain 22%. So, these two variables contain a fair amount of independent information about the Niño 3.4 index 6 months in the future.

Furthermore, they explain a surprisingly large amount of its variance for just 2 variables.

For comparison, Mahler used a random forest variant called ExtraTreesRegressor to predict the Niño 3.4 index 6 months into the future from much larger collections of data. Out of the 778 months available he trained the algorithm on the first 400 and tested it on the remaining 378.

The result: using a full world-wide grid of surface air temperature values at a given moment in time explains only 23% of the Niño 3.4 index 6 months into the future. A full grid of surface air pressure values does considerably better, but still explains only 34% of the variance. Using twelve months of the full grid of pressure values only gets around 37%.
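For concreteness, that kind of computation looks roughly like this with scikit-learn. The arrays are placeholders and the hyperparameters are guesses of mine, not Mahler's actual settings.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def extra_trees_r2(X, y, n_train=400):
    """X: one row per month (e.g. a flattened grid of temperatures or pressures);
    y: the Nino 3.4 index 6 months later.  Train on the first n_train months and
    return R^2 on the remaining ones."""
    model = ExtraTreesRegressor(n_estimators=200, random_state=0)
    model.fit(X[:n_train], y[:n_train])
    return model.score(X[n_train:], y[n_train:])
```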

From this viewpoint, explaining 22% of the variance with just two variables doesn’t look so bad!

Moreover, while the Niño 3.4 index is maximally correlated with itself at the same moment in time, for obvious reasons, the average link strength is maximally correlated with the Niño 3.4 index 10 months into the future:

(The lines here occur at monthly intervals.)

However, we have not tried to determine if the average link strength as Ludescher et al define it is optimal in this respect. Graham Jones has shown that simplifying their definition of this quantity doesn’t change it much. Maybe modifying their definition could improve it. There seems to be a real phenomenon at work here, but I don’t think we know exactly what it is!

My talk has avoided discussing physical models of the ENSO, because I wanted to focus on very simple, general ideas from complex network theory. However, it seems obvious that really understanding the ENSO requires a lot of ideas from meteorology, oceanography, physics, and the like. I am not advocating a ‘purely network-based approach’.

El Niño Project (Part 8)

14 October, 2014

So far we’ve rather exhaustively studied a paper by Ludescher et al which uses climate networks for El Niño prediction. This time I’d like to compare it with another paper:

• Y. Berezin, Avi Gozolchiani, O. Guez and Shlomo Havlin, Stability of climate networks with time, Scientific Reports 2 (2012).

Some of the authors are the same, and the way they define climate networks is very similar. But their goal here is different: they want to see how stable climate networks are over time. This is important, since the other paper wants to predict El Niños by changes in climate networks.

They divide the world into 9 zones:

For each zone they construct several climate networks. Each one is an array of numbers W_{l r}^y, one for each year y and each pair of grid points l, r in that zone. They call W_{l r}^y a link strength: it’s a measure of how correlated the weather is at those two grid points during that year.

I’ll say more later about how they compute these link strengths. In Part 3 we explained one method for doing it. This paper uses a similar but subtly different method.

The paper’s first big claim is that W_{l r}^y doesn’t change much from year to year, “in complete contrast” to the pattern of local daily air temperature and pressure fluctuations. In simple terms: the strength of the correlation between weather at two different points tends to be quite stable.

Moreover, the definition of link strength involves an adjustable time delay, \tau. We can measure the correlation between the weather at point l at any given time and point r at a time \tau days later. The link strength is computed by taking a maximum over time delays \tau. Naively speaking, the value of \tau that gives the maximum correlation is “how long it typically takes for weather at point l to affect weather at point r”. Or the other way around, if \tau is negative.

This is a naive way of explaining the idea, because I’m mixing up correlation with causation. But you get the idea, I hope.

Their second big claim is that when the link strength between two points l and r is big, the value of \tau that gives the maximum correlation doesn’t change much from year to year. In simple terms: if the weather at two locations is strongly correlated, the amount of time it takes for weather at one point to reach the other point doesn’t change very much.

The data

How do Berezin et al define their climate network?

They use data obtained from here:

NCEP-DOE Reanalysis 2.

This is not exactly the same data set that Ludescher et al use, namely:

NCEP/NCAR Reanalysis 1.

“Reanalysis 2” is a newer attempt to reanalyze and fix up the same pile of data. That’s a very interesting issue, but never mind that now!

Berezin et al use data for:

• the geopotential height for six different pressures


• the air temperature at those different heights

The geopotential height for some pressure says roughly how high you have to go for air to have that pressure. Click the link if you want a more precise definition! Here’s the geopotential height field for the pressure of 500 millibars on some particular day of some particular year:

The height is in meters.

Berezin et al use daily values of these fields for:

• locations world-wide on a grid with a resolution of 5° × 5°,


• the years from 1948 to 2006.

They divide the globe into 9 zones, and separately study each zone:

So, they’ve got twelve different functions of space and time (two variables at each of six pressure levels), where space is a rectangle discretized using a 5° × 5° grid, and time is discretized in days. From each such function they build a ‘climate network’.

How do they do it?

The climate networks

Berezin’s method of defining a climate network is similar to Ludescher et al‘s, but different. Compare Part 3 if you want to think about this.

Let \tilde{S}^y_l(t) be any one of their functions, evaluated at the grid point l on day t of year y.

Let S_l^y(t) be \tilde{S}^y_l(t) minus its climatological average. For example, if t is June 1st and y is 1970, we average the temperature at location l over all June 1sts from 1948 to 2006, and subtract that from \tilde{S}^y_l(t) to get S^y_l(t). In other words:

\displaystyle{  S^y_l(t) = \tilde{S}^y_l(t) - \frac{1}{N} \sum_y \tilde{S}^y_l(t)  }

where N is the number of years considered.
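Here is a minimal R sketch of that climatological-anomaly step for a single grid point; S_raw is a hypothetical matrix of daily values with one row per year and one column per day of the year:

```r
# Hypothetical input: S_raw, daily values at one grid point,
# with one row per year (1948-2006) and one column per calendar day
clim <- colMeans(S_raw)        # climatological average for each calendar day
S    <- sweep(S_raw, 2, clim)  # subtract the day-of-year mean from every year
```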

For any function of time f, let \langle f^y(t) \rangle be the average of the function over all days in year y. This is different from the ‘running average’ used by Ludescher et al, and I can’t even be 100% sure that Berezin et al mean what I just said: they use the notation \langle f^y(t) \rangle.

Let l and r be two grid points, and \tau any number of days in the interval [-\tau_{\mathrm{max}}, \tau_{\mathrm{max}}]. Define the cross-covariance function at time t by:

\Big(f_l(t) - \langle f_l(t) \rangle\Big) \; \Big( f_r(t + \tau) - \langle f_r(t + \tau) \rangle \Big)

I believe Berezin et al mean to consider this quantity, because they mention two grid points l and r. Their notation omits the subscripts l and r, so it is impossible to be completely sure what they mean! But what I wrote is the reasonable quantity to consider here, so I’ll assume this is what they meant.

They normalize this quantity and take its absolute value, forming:

\displaystyle{ X_{l r}^y(\tau) = \frac{\Big|\Big(f_l(t) - \langle f_l(t) \rangle\Big) \; \Big( f_r(t + \tau) - \langle f_r(t + \tau) \rangle \Big)\Big|}   {\sqrt{\Big\langle \Big(f_l(t)      - \langle f_l(t)\rangle \Big)^2 \Big\rangle  }  \; \sqrt{\Big\langle \Big(f_r(t+\tau) - \langle f_r(t+\tau)\rangle\Big)^2 \Big\rangle  } }  }

They then take the maximum value of X_{l r}^y(\tau) over delays \tau \in [-\tau_{\mathrm{max}}, \tau_{\mathrm{max}}], subtract its mean over delays in this range, and divide by the standard deviation. They write something like this:

\displaystyle{ W_{l r}^y = \frac{\mathrm{MAX}\Big( X_{l r}^y - \langle X_{l r}^y\rangle \Big) }{\mathrm{STD} X_{l r}^y} }

and say that the maximum, mean and standard deviation are taken over the (not written) variable \tau \in [-\tau_{\mathrm{max}}, \tau_{\mathrm{max}}].

Each number W_{l r}^y is called a link strength. For each year, the matrix of numbers W_{l r}^y where l and r range over all grid points in our zone is called a climate network.
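Putting the pieces together, here is how one could compute a single link strength in R under my reading of their definitions. This is my interpretation, not Berezin et al’s actual code: I use the sample correlation as a stand-in for their normalized cross-covariance, and the value of tau_max is an arbitrary choice for illustration:

```r
# f_l, f_r : daily anomaly series at grid points l and r for one year
#            (hypothetical vectors of equal length)
# tau_max  : maximal delay in days
link_strength <- function(f_l, f_r, tau_max = 100) {
  n    <- length(f_l)
  taus <- -tau_max:tau_max
  X <- sapply(taus, function(tau) {
    if (tau >= 0) {            # compare f_l now with f_r tau days later
      a <- f_l[1:(n - tau)]
      b <- f_r[(1 + tau):n]
    } else {                   # or the other way around for negative tau
      a <- f_l[(1 - tau):n]
      b <- f_r[1:(n + tau)]
    }
    abs(cor(a, b))             # |normalized cross-correlation| at delay tau
  })
  (max(X) - mean(X)) / sd(X)   # height of the peak relative to the background
}
```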

We can think of a climate network as a weighted complete graph with the grid points l as nodes. Remember, an undirected graph is one without arrows on the edges. A complete graph is an undirected graph with one edge between any pair of nodes:

A weighted graph is an undirected graph where each edge is labelled by a number called its weight. But right now we’re also calling the weight the ‘link strength’.

A lot of what’s usually called ‘network theory’ is the study of weighted graphs. You can learn about it here:

• Ernesto Estrada, The Structure of Complex Networks: Theory and Applications, Oxford U. Press, Oxford, 2011.

Suffice it to say that given a weighted graph, there are a lot of quantities you can compute from it, which are believed to tell us interesting things!
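If you want to play with such quantities yourself, the igraph package in R will happily treat a matrix of link strengths as a weighted graph. Here W is a hypothetical symmetric matrix of link strengths for one year:

```r
library(igraph)

# W: hypothetical symmetric matrix of link strengths for one year,
#    with rows and columns indexed by grid points
g <- graph_from_adjacency_matrix(W, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

strength(g)         # weighted degree of each node
mean(E(g)$weight)   # average link strength over the whole network
```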

The conclusions

I will not delve into the real meat of the paper, namely what they actually do with their climate networks! The paper is free online, so you can read it yourself.

I will just quote their conclusions and show you a couple of graphs.

The conclusions touch on an issue that’s important for the network-based approach to El Niño prediction. If climate networks are ‘stable’, not changing much in time, why would we use them to predict a time-dependent phenomenon like the El Niño Southern Oscillation?

We have established the stability of the network of connections between the dynamics of climate variables (e.g. temperatures and geopotential heights) in different geographical regions. This stability stands in fierce contrast to the observed instability of the original climatological field pattern. Thus the coupling between different regions is, to a large extent, constant and predictable. The links in the climate network seem to encapsulate information that is missed in analysis of the original field.

The strength of the physical connection, W_{l r}, that each link in this network represents, changes only between 5% to 30% over time. A clear boundary between links that represent real physical dependence and links that emerge due to noise is shown to exist. The distinction is based on both the high link average strength \overline{W_{l r}} and on the low variability of time delays \mathrm{STD}(T_{l r}).

Recent studies indicate that the strength of the links in the climate network changes during the El Niño Southern Oscillation and the North Atlantic Oscillation cycles. These changes are within the standard deviation of the strength of the links found here. Indeed in Fig. 3 it is clearly seen that the coefficient of variation of links in the El Niño basin (zone 9) is larger than other regions such as zone 1. Note that even in the El Niño basin the coefficient of variation is relatively small (less than 30%).

Beside the stability of single links, also the hierarchy of the link strengths in the climate network is preserved to a large extent. We have shown that this hierarchy is partially due to the two dimensional space in which the network is embedded, and partially due to pure physical coupling processes. Moreover the contribution of each of these effects, and the level of noise was explicitly estimated. The spatial effect is typically around 50% of the observed stability, and the noise reduces the stability value by typically 5%–10%.

The network structure was further shown to be consistent across different altitudes, and a monotonic relation between the altitude distance and the correspondence between the network structures is shown to exist. This yields another indication that the observed network structure represents effects of physical coupling.

The stability of the network and the contributions of different effects were summarized in specific relation to different geographical areas, and a clear distinction between equatorial and off–equatorial areas was observed. Generally, the network structure of equatorial regions is less stable and more fluctuative.

The stability and consistence of the network structure during time and across different altitudes stands in contrast to the known unstable variability of the daily anomalies of climate variables. This contrast indicates an analogy between the behavior of nodes in the climate network and the behavior of coupled chaotic oscillators. While the fluctuations of each coupled oscillators are highly erratic and unpredictable, the interactions between the oscillators is stable and can be predicted. The possible outreach of such an analogy lies in the search for known behavior patterns of coupled chaotic oscillators in the climate system. For example, existence of phase slips in coupled chaotic oscillators is one of the fingerprints for their cooperated behavior, which is evident in each of the individual oscillators. Some abrupt changes in climate variables, for example, might be related to phase slips, and can be understood better in this context.

On the basis of our measured coefficient of variation of single links (around 15%), and the significant overall network stability of 20–40%, one may speculatively assess the extent of climate change. However, for this assessment our current available data is too short and does not include enough time from periods before the temperature trends. An assessment of the relation between the network stability and climate change might be possible mainly through launching of global climate model “experiments” realizing other climate conditions, which we indeed intend to perform.

A further future outreach of our work can be a mapping between network features (such as network motifs) and known physical processes. Such a mapping was previously shown to exist between an autonomous cluster in the climate network and El Niño. Further structures without such a climate interpretation might point towards physical coupling processes which were not observed earlier.

(I have expanded some acronyms and deleted some reference numbers.)

Finally, here are two nice graphs showing the average link strength as a function of distance. The first is based on four climate networks for Zone 1, the southern half of South America:

The second is based on four climate networks for Zone 9, a big patch of the Pacific north of the Equator which roughly corresponds to the ‘El Niño basin’:

As we expect, temperatures and geopotential heights at two points become less correlated as the points get farther apart. But the rate at which the correlation drops off conveys interesting information! Graham Jones has made some interesting charts of this for the rectangle of the Pacific that Ludescher et al use for El Niño prediction, and I’ll show you those next time.
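Here is the kind of quick-and-dirty R computation one could use to make a plot like these: bin the links by the great-circle distance between their endpoints and average the link strengths in each bin. The matrix W and the coordinate vectors lat and lon are hypothetical placeholders, not the data behind the graphs above:

```r
# Hypothetical inputs: W (symmetric matrix of link strengths),
# lat, lon (latitude and longitude of each grid point, in degrees)
haversine <- function(lat1, lon1, lat2, lon2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(pmin(1, a)))   # great-circle distance in kilometres
}

pairs <- which(upper.tri(W), arr.ind = TRUE)   # each link counted once
d <- haversine(lat[pairs[, 1]], lon[pairs[, 1]],
               lat[pairs[, 2]], lon[pairs[, 2]])
w <- W[pairs]

bins <- cut(d, breaks = seq(0, max(d) + 500, by = 500))   # 500 km distance bins
plot(tapply(d, bins, mean), tapply(w, bins, mean),
     xlab = "distance (km)", ylab = "mean link strength")
```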

The series so far

El Niño project (part 1): basic introduction to El Niño and our project here.

El Niño project (part 2): introduction to the physics of El Niño.

El Niño project (part 3): summary of the work of Ludescher et al.

El Niño project (part 4): how Graham Jones replicated the work by Ludescher et al, using software written in R.

El Niño project (part 5): how to download R and use it to get files of climate data.

El Niño project (part 6): Steve Wenner’s statistical analysis of the work of Ludescher et al.

El Niño project (part 7): the definition of El Niño.

El Niño project (part 8): Berezin et al on the stability of climate networks.

Exploring Climate Data (Part 2)

16 September, 2014

guest post by Blake Pollard

I have been learning to make animations using R. This is an animation of the profile of the surface air temperature at the equator. So, the x axis here is the longitude, approximately from 120° E to 280° E. I pulled the data from the region that Graham Jones specified in his code on github: it’s the equatorial line in the region that Ludescher et al used:

For this animation I tried to show the 1997-1998 El Niño. Typically the Pacific is much cooler near South America, due to the upwelling of deep cold water:

(Click for more information.) That part of the Pacific gets even cooler during La Niña:

But it warms up during El Niños:

You can see that in the surface air temperature during the 1997-1998 El Niño, although by summer of 1998 things seem to be getting back to normal:

I want to practice making animations like this. I could make a much prettier and better-labelled animation that ran all the way from 1948 to today, but I wanted to think a little about what exactly is best to plot if we want to use it as an aid to understanding some of this El Niño business.
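For anyone who wants to try the same thing, here is a bare-bones sketch using the animation package (which needs ImageMagick installed to write the GIF). The matrix temp and the vector lons are hypothetical placeholders: one row of temp per day, one column per longitude point along the equatorial strip:

```r
library(animation)

# Hypothetical inputs:
#   temp : matrix of surface air temperatures, one row per day,
#          one column per longitude point along the equator (120E-280E)
#   lons : the corresponding vector of longitudes
saveGIF({
  for (t in 1:nrow(temp)) {
    plot(lons, temp[t, ], type = "l", ylim = range(temp),
         xlab = "longitude (degrees E)", ylab = "surface air temperature (K)",
         main = paste("day", t))
  }
}, movie.name = "equator_temp.gif", interval = 0.1)
```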

