Bleaching of the Great Barrier Reef

22 April, 2016

The chatter of gossip distracts us from the really big story, the Anthropocene: the new geological era we are bringing about. Here’s something that should be dominating the headlines: Most of the Great Barrier Reef, the world’s largest coral reef system, now looks like a ghostly graveyard.

Most corals are colonies of tiny genetically identical animals called polyps. Over centuries, their skeletons build up reefs, which are havens for many kinds of sea life. Some polyps catch their own food using stingers. But most get their food by symbiosis! They cooperate with single-celled organism called zooxanthellae. Zooxanthellae get energy from the sun’s light. They actually live inside the polyps, and provide them with food. Most of the color of a coral reef comes from these zooxanthellae.

When a polyp is stressed, the zooxanthellae living inside it may decide to leave. This can happen when the sea water gets too hot. Without its zooxanthellae, the polyp is transparent and the coral’s white skeleton is revealed—as you see here. We say the coral is bleached.

After they bleach, the polyps begin to starve. If conditions return to normal fast enough, the zooxanthellae may come back. If they don’t, the coral will die.

The Great Barrier Reef, off the northeast coast of Australia, contains over 2,900 reefs and 900 islands. It’s huge: 2,300 kilometers long, with an area of about 340,000 square kilometers. It can be seen from outer space!

With global warming, this reef has been starting to bleach. Parts of it bleached in 1998 and again in 2002. But this year, with a big El Niño pushing world temperatures to new record highs, is the worst.

Scientists have being flying over the Great Barrier Reef to study the damage, and divers have looked at some of the reefs in detail. Of the 522 reefs surveyed in the northern sector, over 80% are severely bleached and less than 1% are not bleached at all. The damage is less further south where the water is cooler—but most of the reefs are in the north:

The top expert on coral reefs in Australia, Terry Hughes, wrote:

I showed the results of aerial surveys of bleaching on the Great Barrier Reef to my students. And then we wept.

Imagine devoting your life to studying and trying to protect coral reefs, and then seeing this.

Some of the bleached reefs may recover. But as oceans continue to warm, the prospects look bleak. The last big El Niño was in 1998. With a lot of hard followup work, scientists showed that in the end, 16% of the world’s corals died in that event.

This year is quite a bit hotter.

So, global warming is not a problem for the future: it’s a problem now. It’s not good enough to cut carbon emissions eventually. We’ve got to get serious now.

I need to recommit myself to this. For example, I need to stop flying around to conferences. I’ve cut back, but I need to do much better. Future generations, living in the damaged world we’re creating, will not have much sympathy for our excuses.

New IPCC Report (Part 5)

14 April, 2014

guest post by Steve Easterbrook

(5) Current rates of ocean acidification are unprecedented.

The IPCC report says:

The pH of seawater has decreased by 0.1 since the beginning of the industrial era, corresponding to a 26% increase in hydrogen ion concentration. […] It is virtually certain that the increased storage of carbon by the ocean will increase acidification in the future, continuing the observed trends of the past decades. […] Estimates of future atmospheric and oceanic carbon dioxide concentrations indicate that, by the end of this century, the average surface ocean pH could be lower than it has been for more than 50 million years.

(Fig SPM.7c) CMIP5 multi-model simulated time series from 1950 to 2100 for global mean ocean surface pH. Time series of projections and a measure of uncertainty (shading) are shown for scenarios RCP2.6 (blue) and RCP8.5 (red). Black (grey shading) is the modelled historical evolution using historical reconstructed forcings

(Fig SPM.7c) CMIP5 multi-model simulated time series from 1950 to 2100 for global mean ocean surface pH. Time series of projections and a measure of uncertainty (shading) are shown for scenarios RCP2.6 (blue) and RCP8.5 (red). Black (grey shading) is the modelled historical evolution using historical reconstructed forcings. [The numbers indicate the number of models used in each ensemble.]

Ocean acidification has sometimes been ignored in discussions about climate change, but it is a much simpler process, and is much easier to calculate (notice the uncertainty range on the graph above is much smaller than most of the other graphs). This graph shows the projected acidification in the best and worst case scenarios (RCP2.6 and RCP8.5). Recall that RCP8.5 is the “business as usual” future.

Note that this doesn’t mean the ocean will become acid. The ocean has always been slightly alkaline—well above the neutral value of pH 7. So “acidification” refers to a drop in pH, rather than a drop below pH 7. As this continues, the ocean becomes steadily less alkaline. Unfortunately, as the pH drops, the ocean stops being supersaturated for calcium carbonate. If it’s no longer supersaturated, anything made of calcium carbonate starts dissolving. Corals and shellfish can no longer form. If you kill these off, the entire ocean food chain is affected. Here’s what the IPCC report says:

Surface waters are projected to become seasonally corrosive to aragonite in parts of the Arctic and in some coastal upwelling systems within a decade, and in parts of the Southern Ocean within 1–3 decades in most scenarios. Aragonite, a less stable form of calcium carbonate, undersaturation becomes widespread in these regions at atmospheric CO2 levels of 500–600 ppm.

You can download all of Climate Change 2013: The Physical Science Basis here. Click below to read any part of this series:

  1. The warming is unequivocal.
  2. Humans caused the majority of it.
  3. The warming is largely irreversible.
  4. Most of the heat is going into the oceans.
  5. Current rates of ocean acidification are unprecedented.
  6. We have to choose which future we want very soon.
  7. To stay below 2°C of warming, the world must become carbon negative.
  8. To stay below 2°C of warming, most fossil fuels must stay buried in the ground.

Climate Change 2013: The Physical Science Basis is also available chapter by chapter here:

  1. Front Matter
  2. Summary for Policymakers
  3. Technical Summary
    1. Supplementary Material


  1. Introduction
  2. Observations: Atmosphere and Surface
    1. Supplementary Material
  3. Observations: Ocean
  4. Observations: Cryosphere
    1. Supplementary Material
  5. Information from Paleoclimate Archives
  6. Carbon and Other Biogeochemical Cycles
    1. Supplementary Material
  7. Clouds and Aerosols

    1. Supplementary Material
  8. Anthropogenic and Natural Radiative Forcing
    1. Supplementary Material
  9. Evaluation of Climate Models
  10. Detection and Attribution of Climate Change: from Global to Regional
    1. Supplementary Material
  11. Near-term Climate Change: Projections and Predictability
  12. Long-term Climate Change: Projections, Commitments and Irreversibility
  13. Sea Level Change
    1. Supplementary Material
  14. Climate Phenomena and their Relevance for Future Regional Climate Change
    1. Supplementary Material


  1. Annex I: Atlas of Global and Regional Climate Projections
    1. Supplementary Material: RCP2.6, RCP4.5, RCP6.0, RCP8.5
  2. Annex II: Climate System Scenario Tables
  3. Annex III: Glossary
  4. Annex IV: Acronyms
  5. Annex V: Contributors to the WGI Fifth Assessment Report
  6. Annex VI: Expert Reviewers of the WGI Fifth Assessment Report

New IPCC Report (Part 4)

11 April, 2014

guest post by Steve Easterbrook

(4) Most of the heat is going into the oceans

The oceans have a huge thermal mass compared to the atmosphere and land surface. They act as the planet’s heat storage and transportation system, as the ocean currents redistribute the heat. This is important because if we look at the global surface temperature as an indication of warming, we’re only getting some of the picture. The oceans act as a huge storage heater, and will continue to warm up the lower atmosphere (no matter what changes we make to the atmosphere in the future).

(Box 3.1 Fig 1) Plot of energy accumulation in ZJ (1 ZJ = 1021 J) within distinct components of Earth’s climate system relative to 1971 and from 1971–2010 unless otherwise indicated. See text for data sources. Ocean warming (heat content change) dominates, with the upper ocean (light blue, above 700 m) contributing more than the deep ocean (dark blue, below 700 m; including below 2000 m estimates starting from 1992). Ice melt (light grey; for glaciers and ice caps, Greenland and Antarctic ice sheet estimates starting from 1992, and Arctic sea ice estimate from 1979–2008); continental (land) warming (orange); and atmospheric warming (purple; estimate starting from 1979) make smaller contributions. Uncertainty in the ocean estimate also dominates the total uncertainty (dot-dashed lines about the error from all five components at 90% confidence intervals).

(Box 3.1 Fig 1) Plot of energy accumulation in zettajoules within distinct components of Earth’s climate system relative to 1971 and from 1971–2010 unless otherwise indicated. Ocean warming (heat content change) dominates, with the upper ocean (light blue, above 700 m) contributing more than the deep ocean (dark blue, below 700 m; including below 2000 m estimates starting from 1992). Ice melt (light grey; for glaciers and ice caps, Greenland and Antarctic ice sheet estimates starting from 1992, and Arctic sea ice estimate from 1979–2008); continental (land) warming (orange); and atmospheric warming (purple; estimate starting from 1979) make smaller contributions. Uncertainty in the ocean estimate also dominates the total uncertainty (dot-dashed lines about the error from all five components at 90% confidence intervals).

Note the relationship between this figure (which shows where the heat goes) and the figure from Part 2 that showed change in cumulative energy budget from different sources:

The Earth's energy budget from 1970 to 2011. Cumulative energy flux (in zettajoules) into the Earth system from well-mixed and short-lived greenhouse gases, solar forcing, changes in tropospheric aerosol forcing, volcanic forcing and surface albedo, (relative to 1860–1879) are shown by the coloured lines and these are added to give the cumulative energy inflow (black; including black carbon on snow and combined contrails and contrail induced cirrus, not shown separately).

(Box 13.1 fig 1) The Earth’s energy budget from 1970 to 2011. Cumulative energy flux (in zettajoules) into the Earth system from well-mixed and short-lived greenhouse gases, solar forcing, changes in tropospheric aerosol forcing, volcanic forcing and surface albedo, (relative to 1860–1879) are shown by the coloured lines and these are added to give the cumulative energy inflow (black; including black carbon on snow and combined contrails and contrail induced cirrus, not shown separately).

Both graphs show zettajoules accumulating over about the same period (1970-2011). But the graph from Part 1 has a cumulative total just short of 800 zettajoules by the end of the period, while today’s new graph shows the earth storing “only” about 300 zettajoules of this. Where did the remaining energy go? Because the earth’s temperature rose during this period, it also lost increasingly more energy back into space. When greenhouse gases trap heat, the earth’s temperature keeps rising until outgoing energy and incoming energy are in balance again.

You can download all of Climate Change 2013: The Physical Science Basis here. Click below to read any part of this series:

  1. The warming is unequivocal.
  2. Humans caused the majority of it.
  3. The warming is largely irreversible.
  4. Most of the heat is going into the oceans.
  5. Current rates of ocean acidification are unprecedented.
  6. We have to choose which future we want very soon.
  7. To stay below 2°C of warming, the world must become carbon negative.
  8. To stay below 2°C of warming, most fossil fuels must stay buried in the ground.

Climate Change 2013: The Physical Science Basis is also available chapter by chapter here:

  1. Front Matter
  2. Summary for Policymakers
  3. Technical Summary
    1. Supplementary Material


  1. Introduction
  2. Observations: Atmosphere and Surface
    1. Supplementary Material
  3. Observations: Ocean
  4. Observations: Cryosphere
    1. Supplementary Material
  5. Information from Paleoclimate Archives
  6. Carbon and Other Biogeochemical Cycles
    1. Supplementary Material
  7. Clouds and Aerosols

    1. Supplementary Material
  8. Anthropogenic and Natural Radiative Forcing
    1. Supplementary Material
  9. Evaluation of Climate Models
  10. Detection and Attribution of Climate Change: from Global to Regional
    1. Supplementary Material
  11. Near-term Climate Change: Projections and Predictability
  12. Long-term Climate Change: Projections, Commitments and Irreversibility
  13. Sea Level Change
    1. Supplementary Material
  14. Climate Phenomena and their Relevance for Future Regional Climate Change
    1. Supplementary Material


  1. Annex I: Atlas of Global and Regional Climate Projections
    1. Supplementary Material: RCP2.6, RCP4.5, RCP6.0, RCP8.5
  2. Annex II: Climate System Scenario Tables
  3. Annex III: Glossary
  4. Annex IV: Acronyms
  5. Annex V: Contributors to the WGI Fifth Assessment Report
  6. Annex VI: Expert Reviewers of the WGI Fifth Assessment Report

Life’s Struggle to Survive

19 December, 2013

Here’s the talk I gave at the SETI Institute:

When pondering the number of extraterrestrial civilizations, it is worth noting that even after it got started, the success of life on Earth was not a foregone conclusion. In this talk, I recount some thrilling episodes from the history of our planet, some well-documented but others merely theorized: our collision with the planet Theia, the oxygen catastrophe, the snowball Earth events, the Permian-Triassic mass extinction event, the asteroid that hit Chicxulub, and more, including the massive environmental changes we are causing now. All of these hold lessons for what may happen on other planets!

To watch the talk, click on the video above. To see
slides of the talk, click here!

Here’s a mistake in my talk that doesn’t appear in the slides: I suggested that Theia started at the Lagrange point in Earth’s orbit. After my talk, an expert said that at that time, the Solar System had lots of objects with orbits of high eccentricity, and Theia was probably one of these. He said the Lagrange point theory is an idiosyncratic theory, not widely accepted, that somehow found its way onto Wikipedia.

Another issue was brought up in the questions. In a paper in Science, Sherwood and Huber argued that:

Any exceedence of 35 °C for extended periods should
induce hyperthermia in humans and other mammals, as dissipation of metabolic heat becomes impossible. While this never happens now, it would begin to occur with global-mean warming of about 7 °C, calling the habitability of some regions into question. With 11-12 °C warming, such regions would spread to encompass the majority of the human population as currently distributed. Eventual warmings of 12 °C are
possible from fossil fuel burning.

However, the Paleocene-Eocene Thermal Maximum seems to have been even hotter:

So, the question is: where did mammals live during this period, which mammals went extinct, if any, and does the survival of other mammals call into question Sherwood and Huber’s conclusion?

Monte Carlo Methods in Climate Science

23 July, 2013

joint with David Tweed

One way the Azimuth Project can help save the planet is to get bright young students interested in ecology, climate science, green technology, and stuff like that. So, we are writing an article for Math Horizons, an American magazine for undergraduate math majors. This blog article is a draft of that. You can also see it in PDF form here.

We’d really like to hear your comments! There are severe limits on including more detail, since the article should be easy to read and short. So please don’t ask us to explain more stuff: we’re most interested to know if you sincerely don’t understand something, or feel that students would have trouble understanding something. For comparison, you can see sample Math Horizons articles here.


They look placid lapping against the beach on a calm day, but the oceans are actually quite dynamic. The ocean currents act as ‘conveyor belts’, transporting heat both vertically between the water’s surface and the depths and laterally from one area of the globe to another. This effect is so significant that the temperature and precipitation patterns can change dramatically when currents do.

For example: shortly after the last ice age, northern Europe experienced a shocking change in climate from 10,800 to 9,500 BC. At the start of this period temperatures plummeted in a matter of decades. It became 7° Celsius colder, and glaciers started forming in England! The cold spell lasted for over a thousand years, but it ended as suddenly as it had begun.

Why? The most popular theory is that that a huge lake in North America formed by melting glaciers burst its bank—and in a massive torrent lasting for years, the water from this lake rushed out to the northern Atlantic ocean. By floating atop the denser salt water, this fresh water blocked a major current: the Atlantic Meridional Overturning Circulation. This current brings warm water north and helps keep northern Europe warm. So, when iit shut down, northern Europe was plunged into a deep freeze.

Right now global warming is causing ice sheets in Greenland to melt and release fresh water into the North Atlantic. Could this shut down the Atlantic Meridional Overturning Circulation and make the climate of Northern Europe much colder? In 2010, Keller and Urban [KU] tackled this question using a simple climate model, historical data, probability theory, and lots of computing power. Their goal was to understand the spectrum of possible futures compatible with what we know today.

Let us look at some of the ideas underlying their work.

Box models

The earth’s physical behaviour, including the climate is far too complex to simulate from the bottom up using basic physical principles, at least for now. The most detailed models today can take days to run on very powerful computers. So to make reasonable predictions on a laptop in a tractable time-frame, geophysical modellers use some tricks.

First, it is possible to split geophysical phenomena into ‘boxes’ containing strongly related things. For example: atmospheric gases, particulate levels and clouds all affect each other strongly; likewise the heat content, currents and salinity of the oceans all interact strongly. However, the interactions between the atmosphere and the oceans are weaker, and we can approximately describe them using just a few settings, such as the amount of atmospheric CO2 entering or leaving the oceans. Clearly these interactions must be consistent—for example, the amount of CO2 leaving the atmosphere box must equal the amount entering the ocean box—but breaking a complicated system into parts lets different specialists focus on different aspects; then we can combine these parts and get an approximate model of entire planet. The box model used by Keller and Urban is shown in Figure 1.

1. The box model used by Keller and Urban.

Second, it turn out that simple but effective box models can be distilled from the complicated physics in terms of forcings and feedbacks. Essentially a forcing is a measured input to the system, such as solar radiation or CO2 released by burning fossil fuels. As an analogy, consider a child on a swing: the adult’s push every so often is a forcing. Similarly a feedback describes how the current ‘box variables’ influence future ones. In the swing analogy, one feedback is how the velocity will influence the future height. Specifying feedbacks typically uses knowledge of the detailed low-level physics to derive simple, tractable functional relationships between groups of large-scale observables, a bit like how we derive the physics of a gas by thinking about collisions of lots of particles.

However, it is often not feasible to get actual settings for the parameters in our model starting from first principles. In other words, often we can get the general form of the equations in our model, but they contain a lot of constants that we can estimate only by looking at historical data.

Probability modeling

Suppose we have a box model that depends on some settings S. For example, in Keller and Urban’s model, S is a list of 18 numbers. To keep things simple, suppose the settings are element of some finite set. Suppose we also have huge hard disc full of historical measurements, and we want to use this to find the best estimate of S. Because our data is full of ‘noise’ from other, unmodeled phenomena we generally cannot unambiguously deduce a single set of settings. Instead we have to look at things in terms of probabilities. More precisely, we need to study the probability that S take some value s given that the measurements take some value. Let’s call the measurements M, and again let’s keep things simple by saying M takes values in some finite set of possible measurements.

The probability that S = s given that M takes some value m is called the conditional probability P(S=s | M=m). How can we compute this conditional probability? This is a somewhat tricky problem.

One thing we can more easily do is repeatedly run our model with randomly chosen settings and see what measurements it predicts. By doing this, we can compute the probability that given setting values S = s, the model predicts measurements M=m. This again is a conditional probability, but now it is called P(M=m|S=s).

This is not what we want: it’s backwards! But here Bayes’ rule comes to the rescue, relating what we want to what we can more easily compute:

\displaystyle{ P(S = s | M = m) = P(M = m| S = s) \frac{P(S = s)}{P(M = m)} }

Here P(S = s) is the probability that the settings take a specific value s, and similarly for P(M = m). Bayes’ rule is quite easy to prove, and it is actually a general rule that applies to any random variables, not just the settings and the measurements in our problem [Y]. It underpins most methods of figuring out hidden quantities from observed ones. For this reason, it is widely used in modern statistics and data analysis [K].

How does Bayes’ rule help us here? When we repeatedly run our model with randomly chosen settings, we have control over P(S = s). As mentioned, we can compute P(M=m| S=s). Finally, P(M = m) is independent of our choice of settings. So, we can use Bayes’ rule to compute P(S = s | M = m) up to a constant factor. And since probabilities must sum to 1, we can figure out this constant.

This lets us do many things. It lets us find the most likely values of the settings for our model, given our hard disc full of observed data. It also lets us find the probability that the settings lie within some set. This is important: if we’re facing the possibility of a climate disaster, we don’t just want to know the most likely outcome. We would like to know to know that with 95% probability, the outcome will lie in some range.

An example

Let us look at an example much simpler than that considered by Keller and Urban. Suppose our measurements are real numbers m_0,\dots, m_T related by

m_{t+1} = s m_t - m_{t-1} + N_t

Here s, a real constant, is our ‘setting’, while N_t is some ‘noise’: an independent Gaussian random variable for each time t, each with mean zero and some fixed standard deviation. Then the measurements m_t will have roughly sinusoidal behavior but with irregularity added by the noise at each time step, as illustrated in Figure 2.

2. The example system: red are predicted measurements for a given value of the settings, green is another simulation for the same s value and blue is a simulation for a slightly different s.

Note how there is no clear signal from either the curves or the differences that the green curve is at the correct setting value while the blue one has the wrong one: the noise makes it nontrivial to estimate s. This is a baby version of the problem faced by Keller and Urban.

Markov Chain Monte Carlo

Having glibly said that we can compute the conditional probability P(M=m | S=s), how do we actually do this? The simplest way would be to run our model many, many times with the settings set at S=s and determine the fraction of times it predicts measurements equal to m. This gives us an estimate of P(M=m | S=s). Then we can use Bayes’ rule to work out P(M=m|S=s), at least up to a constant factor.

Doing all this by hand would be incredibly time consuming and error prone, so computers are used for this task. In our example, we do this in Figure 3. As we keep running our model over and over, the curve showing P(M=m |S=s) as a function of s settles down to the right answer.

3. The estimates of P(M=m | S=s) as a function of s using uniform sampling, ending up with 480 samples at each point.


However, this is computationally inefficient, as shown in the probability distribution for small numbers of samples. This has quite a few ‘kinks’, which only disappear later. The problem is that there are lots of possible choices of s to try. And this is for a very simple model!

When dealing with the 18 settings involved in the model of Keller and Urban, trying every combination would take far too long. A way to avoid this is Markov Chain Monte Carlo sampling. Monte Carlo is famous for its casinos, so a ‘Monte Carlo’ algorithm is one that uses randomness. A ‘Markov chain’ is a random walk: for example, where you repeatedly flip a coin and take one step right when you get heads, and one step right when you get tails. So, in Markov Chain Monte Carlo, we perform a random walk through the collection of all possible settings, collecting samples.

The key to making this work is that at each step on the walk a proposed modification s' to the current settings s is generated randomly—but it may be rejected if it does not seem to improve the estimates. The essence of the rule is:

The modification s \mapsto s' is randomly accepted with a probability equal to the ratio

\displaystyle{ \frac{P(M=m | S=s')}{ P(M=m | S=s)} }

Otherwise the walk stays at the current position.

If the modification is better, so that the ratio is greater than 1, the new state is always accepted. With some additional tricks—such as discarding the very beginning of the walk—this gives a set of samples from which can be used to compute P(M=m | S=s). Then we can compute P(S = s | M = m) using Bayes’ rule.

Figure 4 shows the results of using the Markov Chain Monte Carlo procedure to figure out P(S= s| M= m) in our example.

4. The estimates of P(S = s|M = m) curves using Markov Chain Monte Carlo, showing the current distribution estimate at increasing intervals. The red line shows the current position of the random walk. Again the kinks are almost gone in the final distribution.


Note that the final distribution has only peformed about 66 thousand simulations in total, while the full sampling peformed over 1.5 million. The key advantage of Markov Chain Monte Carlo is that it avoids performing many simulations in areas where the probability is low, as we can see from the way the walk path remains under the big peak in the probability density almost all the time. What is more impressive is that it achieves this without any global view of the probability density, just by looking at how P(M=m | S=s) changes when we make small changes in the settings. This becomes even more important as we move to dealing with systems with many more dimensions and settings, where it proves very effective at finding regions of high probability density whatever their shape.

Why is it worth doing so much work to estimate the probability distribution for settings for a climate model? One reason is that we can then estimate probabilities of future events, such as the collapse of the Atlantic Meridional Ocean Current. And what’s the answer? According to Keller and Urban’s calculation, this current will likely weaken by about a fifth in the 21st century, but a complete collapse is unlikely before 2300. This claim needs to be checked in many ways—for example, using more detailed models. But the importance of the issue is clear, and we hope we have made the importance of good mathematical ideas for climate science clear as well.

Exploring the topic

The Azimuth Project is a group of scientists, engineers and computer programmers interested in questions like this [A]. If you have questions, or want to help out, just email us. Versions of the computer programs we used in this paper will be made available here in a while.

Here are some projects you can try, perhaps with the help of Kruschke’s textbook [K]:

• There are other ways to do setting estimation using time series: compare some to MCMC in terms of accuracy and robustness.

• We’ve seen a 1-dimensional system with one setting. Simulate some multi-dimensional and multi-setting systems. What new issues arise?

Acknowledgements. We thank Nathan Urban and other
members of the Azimuth Project for many helpful discussions.


[A] Azimuth Project,

[KU] Klaus Keller and Nathan Urban, Probabilistic hindcasts and projections of the coupled climate, carbon cycle and Atlantic meridional overturning circulation system: a Bayesian fusion of century-scale measurements with a simple model, Tellus A 62 (2010), 737–750. Also available free online.

[K] John K. Kruschke, Doing Bayesian Data Analysis: A Tutorial with R and BUGS, Academic Press, New York, 2010.

[Y] Eliezer S. Yudkowsky, An intuitive explanation of Bayes’ theorem.

Petri Net Programming (Part 2)

20 December, 2012

guest post by David A. Tanzer

An introduction to stochastic Petri nets

In the previous article, I explored a simple computational model called Petri nets. They are used to model reaction networks, and have applications in a wide variety of fields, including population ecology, gene regulatory networks, and chemical reaction networks. I presented a simulator program for Petri nets, but it had an important limitation: the model and the simulator contain no notion of the rates of the reactions. But these rates critically determine the character of the dynamics of network.

Here I will introduce the topic of ‘stochastic Petri nets,’ which extends the basic model to include reaction dynamics. Stochastic means random, and it is presumed that there is an underlying random process that drives the reaction events. This topic is rich in both its mathematical foundations and its practical applications. A direct application of the theory yields the rate equation for chemical reactions, which is a cornerstone of chemical reaction theory. The theory also gives algorithms for analyzing and simulating Petri nets.

We are now entering the ‘business’ of software development for applications to science. The business logic here is nothing but math and science itself. Our study of this logic is not an academic exercise that is tangential to the implementation effort. Rather, it is the first phase of a complete software development process for scientific programming applications.

The end goals of this series are to develop working code to analyze and simulate Petri nets, and to apply these tools to informative case studies. But we have some work to do en route, because we need to truly understand the models in order to properly interpret the algorithms. The key questions here are when, why, and to what extent the algorithms give results that are empirically predictive. We will therefore be embarking on some exploratory adventures into the relevant theoretical foundations.

The overarching subject area to which stochastic Petri nets belong has been described as stochastic mechanics in the network theory series here on Azimuth. The theme development here will partly parallel that of the network theory series, but with a different focus, since I am addressing a computationally oriented reader. For an excellent text on the foundations and applications of stochastic mechanics, see:

• Darren Wilkinson, Stochastic Modelling for Systems Biology, Chapman and Hall/CRC Press, Boca Raton, Florida, 2011.

Review of basic Petri nets

A Petri net is a graph with two kinds of nodes: species and transitions. The net is populated with a collection of ‘tokens’ that represent individual entities. Each token is attached to one of the species nodes, and this attachment indicates the type of the token. We may therefore view a species node as a container that holds all of the tokens of a given type.

The transitions represent conversion reactions between the tokens. Each transition is ‘wired’ to a collection of input species-containers, and to a collection of output containers. When it ‘fires’, it removes one token from each input container, and deposits one token to each output container.

Here is the example we gave, for a simplistic model of the formation and dissociation of H2O molecules:

The circles are for species, and the boxes are for transitions.

The transition combine takes in two H tokens and one O token, and outputs one H2O token. The reverse transition is split, which takes in one H2O, and outputs two H’s and one O.

An important application of Petri nets is to the modeling of biochemical reaction networks, which include the gene regulatory networks. Since genes and enzymes are molecules, and their binding interactions are chemical reactions, the Petri net model is directly applicable. For example, consider a transition that inputs one gene G, one enzyme E, and outputs the molecular form G • E in which E is bound to a particular site on G.

Applications of Petri nets may differ widely in terms of the population sizes involved in the model. In general chemistry reactions, the populations are measured in units of moles (where a mole is ‘Avogadro’s number’ 6.022 · 1023 entities). In gene regulatory networks, on the other hand, there may only be a handful of genes and enzymes involved in a reaction.

This difference in scale leads to a qualitative difference in the modelling. With small population sizes, the stochastic effects will predominate, but with large populations, a continuous, deterministic, average-based approximation can be used.

Representing Petri nets by reaction formulas

Petri nets can also be represented by formulas used for chemical reaction networks. Here is the formula for the Petri net shown above:

H2O ↔ H + H + O

or the more compact:

H2O ↔ 2 H + O

The double arrow is a compact designation for two separate reactions, which happen to be opposites of each other.

By the way, this reaction is not physically realistic, because one doesn’t find isolated H and O atoms traveling around and meeting up to form water molecules. This is the actual reaction pair that predominates in water:

2 H2O ↔ OH + H3O+

Here, a hydrogen nucleus H+, with one unit of positive charge, gets removed from one of the H2O molecules, leaving behind the hydroxide ion OH. In the same stroke, this H+ gets re-attached to the other H2O molecule, which thereby becomes a hydronium ion, H3O+.

For a more detailed example, consider this reaction chain, which is of concern to the ocean environment:

CO2 + H2O ↔ H2CO3 ↔ H+ + HCO3

This shows the formation of carbonic acid, namely H2CO3, from water and carbon dioxide. The next reaction represents the splitting of carbonic acid into a hydrogen ion and a negatively charged bicarbonate ion, HCO3. There is a further reaction, in which a bicarbonate ion further ionizes into an H+ and a doubly negative carbonate ion CO32-. As the diagram indicates, for each of these reactions, a reverse reaction is also present. For a more detailed description of this reaction network, see:

• Stephen E. Bialkowski, Carbon dioxide and carbonic acid.

Increased levels of CO2 in the atmosphere will change the balance of these reactions, leading to a higher concentration of hydrogen ions in the water, i.e., a more acidic ocean. This is of concern because the metabolic processes of aquatic organisms is sensitive to the pH level of the water. The ultimate concern is that entire food chains could be disrupted, if some of the organisms cannot survive in a higher pH environment. See the Wikipedia page on ocean acidification for more information.

Exercise. Draw Petri net diagrams for these reaction networks.

Motivation for the study of Petri net dynamics

The relative rates of the various reactions in a network critically determine the qualitative dynamics of the network as a whole. This is because the reactions are ‘competing’ with each other, and so their relative rates determine the direction in which the state of the system is changing. For instance, if molecules are breaking down faster then they are being formed, then the system is moving towards full dissociation. When the rates are equal, the processes balance out, and the system is in an equilibrium state. Then, there are only temporary fluctuations around the equilibrium conditions.

The rate of the reactions will depend on the number of tokens present in the system. For example, if any of the input tokens are zero, then the transition can’t fire, and so its rate must be zero. More generally, when there are few input tokens available, there will be fewer reaction events, and so the firing rates will be lower.

Given a specification for the rates in a reaction network, we can then pose the following kinds of questions about its dynamics:

• Does the network have an equilibrium state?

• If so, what are the concentrations of the species at equilibrium?

• How quickly does it approach the equilibrium?

• At the equilibrium state, there will still be temporary fluctuations around the equilibrium concentrations. What are the variances of these fluctuations?

• Are there modes in which the network will oscillate between states?

This is the grail we seek.

Aside from actually performing empirical experiments, such questions can be addressed either analytically or through simulation methods. In either case, our first step is to define a theoretical model for the dynamics of a Petri net.

Stochastic Petri nets

A stochastic Petri net (with kinetics) is a Petri net that is augmented with a specification for the reaction dynamics. It is defined by the following:

• An underlying Petri net, which consists of species, transitions, an input map, and an output map. These maps assign to each transition a multiset of species. (Multiset means that duplicates are allowed.) Recall that the state of the net is defined by a marking function, that maps each species to its population count.

• A rate constant that is associated with each transition.

• A kinetic model, that gives the expected firing rate for each transition as a function of the current marking. Normally, this kinetic function will include the rate constant as a multiplicative factor.

A further ‘sanity constraint’ can be put on the kinetic function for a transition: it should give a positive value if and only if all of its inputs are positive.

• A stochastic model, which defines the probability distribution of the time intervals between firing events. This specific distribution of the firing intervals for a transition will be a function of the expected firing rate in the current marking.

This definition is based on the standard treatments found, for example in:

• M. Ajmone Marsan, Stochastic Petri nets: an elementary introduction, in Advances in Petri Nets, Springer, Berlin, 1989, 1–23.

or Wilkinson’s book mentioned above. I have also added an explicit mention of the kinetic model, based on the ‘kinetics’ described in here:

• Martin Feinberg, Lectures on chemical reaction networks.

There is an implied random process that drives the reaction events. A classical random process is given by a container with ‘particles’ that are randomly traveling around, bouncing off the walls, and colliding with each other. This is the general idea behind Brownian motion. It is called a random process because the outcome results from an ‘experiment’ that is not fully determined by the input specification. In this experiment, you pour in the ingredients (particles of different types), set the temperature (the distributions of the velocities), give it a stir, and then see what happens. The outcome consists of the paths taken by each of the particles.

In an important limiting case, the stochastic behavior becomes deterministic, and the population sizes become continuous. To see this, consider a graph of population sizes over time. With larger population sizes, the relative jumps caused by the firing of individual transitions become smaller, and graphs look more like continuous curves. In the limit, we obtain an approximation for high population counts, in which the graphs are continuous curves, and the concentrations are treated as continuous magnitudes. In a similar way, a pitcher of sugar can be approximately viewed as a continuous fluid.

This simplification permits the application of continuous mathematics to study of reaction network processes. It leads to the basic rate equation for reaction networks, which specifies the direction of change of the system as a function of the current state of the system.

In this article we will be exploring this continuous deterministic formulation of Petri nets, under what is known as the mass action kinetics. This kinetics is one implementation of the general specification of a kinetic model, as defined above. This means that it will define the expected firing rate of each transition, in a given marking of the net. The probabilistic variations in the spacing of the reactions—around the mean given by the expected firing rate—is part of the stochastic dynamics, and will be addressed in a subsequent article.

The mass-action kinetics

Under the mass action kinetics, the expected firing rate of a transition is proportional to the product of the concentrations of its input species. For instance, if the reaction were A + C → D, then the firing rate would be proportional to the concentration of A times the concentration of C, and if the reaction were A + A → D, it would be proportional to the square of the concentration of A.

This principle is explained by Feinberg as follows:

For the reaction A+C → D, an occurrence requires that a molecule of A meet a molecule of C in the reaction, and we take the probability of such an encounter to be proportional to the product [of the concentrations of A and C]. Although we do not presume that every such encounter yields a molecule of D, we nevertheless take the occurrence rate of A+C → D to be governed by [the product of the concentrations].

For an in-depth proof of the mass action law, see this article:

• Daniel Gillespie, A rigorous definition of the chemical master equation, 1992.

Note that we can easily pass back and forth between speaking of the population counts for the species, and the concentrations of the species, which is just the population count divided by the total volume V of the system. The mass action law applies to both cases, the only difference being that the constant factors of (1/V) used for concentrations will get absorbed into the rate constants.

The mass action kinetics is a basic law of empirical chemistry. But there are limits to its validity. First, as indicated in the proof in the Gillespie, the mass action law rests on the assumptions that the system is well-stirred and in thermal equilibrium. Further limits are discussed here:

• Georg Job and Regina Ruffler, Physical Chemistry (first five chapters), Section 5.2, 2010.

They write:

…precise measurements show that the relation above is not strictly adhered to. At higher concentrations, values depart quite noticeably from this relation. If we gradually move to lower concentrations, the differences become smaller. The equation here expresses a so-called “limiting law“ which strictly applies only when c → 0.

In practice, this relation serves as a useful approximation up to rather high concentrations. In the case of electrically neutral substances, deviations are only noticeable above 100 mol m−3. For ions, deviations become observable above 1 mol m−3, but they are so small that they are easily neglected if accuracy is not of prime concern.

Why would the mass action kinetics break down at high concentrations? According to the book quoted, it is due to “molecular and ionic interactions.” I haven’t yet found a more detailed explanation, but here is my supposition about what is meant by molecular interactions in this context. Doubling the number of A molecules doubles the number of expected collisions between A and C molecules, but it also reduces the probability that any given A and C molecules that are within reacting distance will actually react. The reaction probability is reduced because the A molecules are ‘competing’ for reactions with the C molecules. With more A molecules, it becomes more likely that a C molecule will simultaneously be within reacting distance of several A molecules; each of these A molecules reduces the probability that the other A molecules will react with the C molecule. This is most pronounced when the concentrations in a gas get high enough that the molecules start to pack together to form a liquid.

The equilibrium relation for a pair of opposite reactions

Suppose we have two opposite reactions:

T: A + B \stackrel{u}{\longrightarrow} C + D

T': C + D \stackrel{v}{\longrightarrow} A + B

Since the reactions have exactly opposite effects on the population sizes, in order for the population sizes to be in a stable equilibrium, the expected firing rates of T and T' must be equal:

\mathrm{rate}(T') = \mathrm{rate}(T)

By mass action kinetics:

\mathrm{rate}(T) = u [A] [B]

\mathrm{rate}(T') = v [C] [D]

where [X] means the concentration of X.

Hence at equilibrium:

u [A] [B] = v [C] [D]


\displaystyle{ \frac{[A][B]}{[C][D]} = \frac{v}{u} = K }

where K is the equilibrium constant for the reaction pair.

Equilibrium solution for the formation and dissociation of a diatomic molecule

Let A be some type of atom, and let D = A2 be the diatomic form of A. Then consider the opposite reactions:

A + A \stackrel{u}{\longrightarrow} D

D \stackrel{v}{\longrightarrow} A + A

From the preceding analysis, at equilibrium the following relation holds:

u [A]^2 = v [D]

Let N(A) and N(B) be the population counts for A and B, and let

N = N(A) + 2 N(D)

be the total number of units of A in the system, whether they be in the form of atoms or diatoms.

The value of N is an invariant property of the system. The reactions cannot change it, because they are just shuffling the units of A from one arrangement to the other. By way of contrast, N(A) is not an invariant quantity.

Dividing this equation by the total volume V, we get:

[N] = [A] + 2 [D]

where [N] is the concentration of the units of A.

Given a fixed value for [N] and the rate constants u and v, we can then solve for the concentrations at equilibrium:

\displaystyle{u [A]^2 = v [D] = v ([N] - [A]) / 2 }

\displaystyle{2 u [A]^2 + v [A] - v [N] = 0 }

\displaystyle{[A] = (-v \pm \sqrt{v^2 + 8 u v [N]}) / 4 u }

Since [A] can’t be negative, only the positive square root is valid.

Here is the solution for the case where u = v = 1:

\displaystyle{[A] = (\sqrt{8 [N] + 1} - 1) / 4 }

\displaystyle{[D] = ([N] - [A]) / 2 }


We’ve covered a lot of ground, starting with the introduction of the stochastic Petri net model, followed by a general discussion of reaction network dynamics, the mass action laws, and calculating equilibrium solutions for simple reaction networks.

We still have a number of topics to cover on our journey into the foundations, before being able to write informed programs to solve problems with stochastic Petri nets. Upcoming topics are (1) the deterministic rate equation for general reaction networks and its application to finding equilibrium solutions, and (2) an exploration of the stochastic dynamics of a Petri net. These are the themes that will support our upcoming software development.


12 March, 2011

I hope everyone reading this, and everyone they know, is okay…

Stories, anyone?

Check out this animation from NOAA, the National Oceanic and Atmospheric Administration:

The tsunami was unnoticeable here in Singapore. It was just 10 centimeters tall when it hit the North Maluku islands in Indonesia, and we’re protected from the open Pacific by lots of Indonesian islands.

Of course, this “protection” has its own dangers, since Indonesia is geologically active: since I’ve lived here there have been two volcanic eruptions in Java, and an earthquake in western Sumatra created a tsunami that killed over 282 people in the Mentawai islands. An earthquake in eastern Sumatra could cause a tsunami here, perhaps—Sumatra is visible from tall buildings downtown. But today things are fine, here.

They’re worse in California!—though as you might expect, some there took advantage of the tsunami for surfing.

This Week’s Finds (Week 307)

14 December, 2010

I’d like to take a break from interviews and explain some stuff I’m learning about. I’m eager to tell you about some papers in the book Tim Palmer helped edit, Stochastic Physics and Climate Modelling. But those papers are highly theoretical, and theories aren’t very interesting until you know what they’re theories of. So today I’ll talk about "El Niño", which is part of a very interesting climate cycle. Next time I’ll get into more of the math.

I hadn’t originally planned to get into so much detail on the El Niño, but this cycle is a big deal in southern California. In the city of Riverside, where I live, it’s very dry. There is a small river, but it’s just a trickle of water most of the time: there’s a lot less "river" than "side". It almost never rains between March and December. Sometimes, during a "La Niña", it doesn’t even rain in the winter! But then sometimes we have an "El Niño" and get huge floods in the winter. At this point, the tiny stream that gives Riverside its name swells to a huge raging torrent. The difference is very dramatic.

So, I’ve always wanted to understand how the El Niño cycle works — but whenever I tried to read an explanation, I couldn’t follow it!

I finally broke that mental block when I read some stuff on William Kessler‘s website. He’s an expert on the El Niño phenomenon who works at the Pacific Marine Environmental Laboratory. One thing I like about his explanations is that he says what we do know about the El Niño, and also what we don’t know. We don’t know what triggers it!

In fact, Kessler says the El Niño would make a great research topic for a smart young scientist. In an email to me, which he has allowed me to quote, he said:

We understand lots of details but the big picture remains mysterious. And I enjoyed your interview with Tim Palmer because it brought out a lot of the sources of uncertainty in present-generation climate modeling. However, with El Niño, the mystery is beyond Tim’s discussion of the difficulties of climate modeling. We do not know whether the tropical climate system on El Niño timescales is stable (in which case El Niño needs an external trigger, of which there are many candidates) or unstable. In the 80s and 90s we developed simple "toy" models that convinced the community that the system was unstable and El Niño could be expected to arise naturally within the tropical climate system. Now that is in doubt, and we are faced with a fundamental uncertainty about the very nature of the beast. Since none of us old farts has any new ideas (I just came back from a conference that reviewed this stuff), this is a fruitful field for a smart young person.

So, I hope some smart young person reads this and dives into working on El Niño!

But let’s start at the beginning. Why did I have so much trouble understanding explanations of the El Niño? Well, first of all, I’m an old fart. Second, most people are bad at explaining stuff: they skip steps, use jargon they haven’t defined, and so on. But third, climate cycles are hard to explain. There’s a lot about them we don’t understand — as Kessler’s email points out. And they also involve a kind of "cyclic causality" that’s a bit tough to mentally process.

At least where I come from, people find it easy to understand linear chains of causality, like "A causes B, which causes C". For example: why is the king’s throne made of gold? Because the king told his minister "I want a throne of gold!" And the minister told the servant, "Make a throne of gold!" And the servant made the king a throne of gold.

Now that’s what I call an explanation! It’s incredibly satisfying, at least if you don’t wonder why the king wanted a throne of gold in the first place. It’s easy to remember, because it sounds like a story. We hear a lot of stories like this when we’re children, so we’re used to them. My example sounds like the beginning of a fairy tale, where the action is initiated by a "prime mover": the decree of a king.

There’s something a bit trickier about cyclic causality, like "A causes B, which causes C, which causes A." It may sound like a sneaky trick: we consider "circular reasoning" a bad thing. Sometimes it is a sneaky trick. But sometimes this is how things really work!

Why does big business have such influence in American politics? Because big business hires lots of lobbyists, who talk to the politicians, and even give them money. Why are they allowed to do this? Because big business has such influence in American politics. That’s an example of a "vicious circle". You might like to cut it off — but like a snake holding its tail in its mouth, it’s hard to know where to start.

Of course, not all circles are "vicious". Many are "virtuous".

But the really tricky thing is how a circle can sometimes reverse direction. In academia we worry about this a lot: we say a university can either "ratchet up" or "ratchet down". A good university attracts good students and good professors, who bring in more grant money, and all this makes it even better… while a bad university tends to get even worse, for all the same reasons. But sometimes a good university goes bad, or vice versa. Explaining that transition can be hard.

It’s also hard to explain why a La Niña switches to an El Niño, or vice versa. Indeed, it seems scientists still don’t understand this. They have some models that simulate this process, but there are still lots of mysteries. And even if they get models that work perfectly, they still may not be able to tell a good story about it. Wind and water are ultimately described by partial differential equations, not fairy tales.

But anyway, let me tell you a story about how it works. I’m just learning this stuff, so take it with a grain of salt…

The "El Niño/Southern Oscillation" or "ENSO" is the largest form of variability in the Earth’s climate on times scales greater than a year and less than a decade. It occurs across the tropical Pacific Ocean every 3 to 7 years, and on average every 4 years. It can cause extreme weather such as floods and droughts in many regions of the world. Countries dependent on agriculture and fishing, especially those bordering the Pacific Ocean, are the most affected.

And here’s a cute little animation of it produced by the Australian Bureau of Meteorology:

Let me tell you first about La Niña, and then El Niño. If you keep glancing back at this little animation, I promise you can understand everything I’ll say.

Winds called trade winds blow west across the tropical Pacific. During La Niña years, water at the ocean’s surface moves west with these winds, warming up in the sunlight as it goes. So, warm water collects at the ocean’s surface in the western Pacific. This creates more clouds and rainstorms in Asia. Meanwhile, since surface water is being dragged west by the wind, cold water from below gets pulled up to take its place in the eastern Pacific, off the coast of South America.

I hope this makes sense so far. But there’s another aspect to the story. Because the ocean’s surface is warmer in the western Pacific, it heats the air and makes it rise. So, wind blows west to fill the "gap" left by rising air. This strengthens the westward-blowing trade winds.

So, it’s a kind of feedback loop: the oceans being warmer in the western Pacific helps the trade winds blow west, and that makes the western oceans even warmer.

Get it? This should all make sense so far, except for one thing. There’s one big question, and I hope you’re asking it. Namely:

Why do the trade winds blow west?

If I don’t answer this, my story so far would work just as well if I switched the words "west" and "east". That wouldn’t necessarily mean my story was wrong. It might just mean that there were two equally good options: a La Niña phase where the trade winds blow west, and another phase — say, El Niño — where they blow east! From everything I’ve said so far, the world could be permanently stuck in one of these phases. Or, maybe it could randomly flip between these two phases for some reason.

Something roughly like this last choice is actually true. But it’s not so simple: there’s not a complete symmetry between west and east.

Why not? Mainly because the Earth is turning to the east.

Air near the equator warms up and rises, so new air from more northern or southern regions moves in to take its place. But because the Earth is fatter at the equator, the equator is moving faster to the east. So, the new air from other places is moving less quickly by comparison… so as seen by someone standing on the equator, it blows west. This is an example of the Coriolis effect:

By the way: in case this stuff wasn’t tricky enough already, a wind that blows to the west is called an easterly, because it blows from the east! That’s what happens when you put sailors in charge of scientific terminology. So the westward-blowing trade winds are called "northeasterly trades" and "southeasterly trades" in the picture above. But don’t let that confuse you.

(I also tend to think of Asia as the "Far East" and California as the "West Coast", so I always need to keep reminding myself that Asia is in the west Pacific, while California is in the east Pacific. But don’t let that confuse you either! Just repeat after me until it makes perfect sense: "The easterlies blow west from West Coast to Far East".)

Okay: silly terminology aside, I hope everything makes perfect sense so far. The trade winds have a good intrinsic reason to blow west, but in the La Niña phase they’re also part of a feedback loop where they make the western Pacific warmer… which in turn helps the trade winds blow west.

But then comes an El Niño! Now for some reason the westward winds weaken. This lets the built-up warm water in the western Pacific slosh back east. And with weaker westward winds, less cold water is pulled up to the surface in the east. So, the eastern Pacific warms up. This makes for more clouds and rain in the eastern Pacific — that’s when we get floods in Southern California. And with the ocean warmer in the eastern Pacific, hot air rises there, which tends to counteract the westward winds even more!

In other words, all the feedbacks reverse themselves.

But note: the trade winds never mainly blow east. During an El Niño they still blow west, just a bit less. So, the climate is not flip-flopping between two symmetrical alternatives. It’s flip-flopping between two asymmetrical alternatives.

I hope all this makes sense… except for one thing. There’s another big question, and I hope you’re asking it. Namely:

Why do the westward trade winds weaken?

We could also ask the same question about the start of the La Niña phase: why do the westward trade winds get stronger?

The short answer is that nobody knows. Or at least there’s no one story that everyone agrees on. There are actually several stories… and perhaps more than one of them is true. But now let me just show you the data:

The top graph shows variations in the water temperature of the tropical Eastern Pacific ocean. When it’s hot we have El Niños: those are the red hills in the top graph. The blue valleys are La Niñas. Note that it’s possible to have two El Niños in a row without an intervening La Niña, or vice versa!

The bottom graph shows the "Southern Oscillation Index" or "SOI". This is the air pressure in Tahiti minus the air pressure in Darwin, Australia. You can see those locations here:

So, when the SOI is high, the air pressure is higher in the east Pacific than in the west Pacific. This is what we expect in an La Niña: that’s why the westward trade winds are strong then! Conversely, the SOI is low in the El Niño phase. This variation in the SOI is called the Southern Oscillation.

If you look at the graphs above, you’ll see how one looks almost like an upside-down version of the other. So, El Niño/La Niña cycle is tightly linked to the Southern Oscillation.

Another thing you’ll see from is that ENSO cycle is far from perfectly periodic! Here’s a graph of the Southern Oscillation Index going back a lot further:

This graph was made by William Kessler. His explanations of the ENSO cycle are the first ones I really understood:

My own explanation here is a slow-motion, watered-down version of his. Any mistakes are, of course, mine. To conclude, I want to quote his discussion of theories about why an El Niño starts, and why it ends. As you’ll see, this part is a bit more technical. It involves three concepts I haven’t explained yet:

  • The "thermocline" is the border between the warmer surface water in the ocean and the cold deep water, 100 to 200 meters below the surface. During the La Niña phase, warm water is blown to the western Pacific, and cold water is pulled up to the surface of the eastern Pacific. So, the thermocline is deeper in the west than the east:

    When an El Niño occurs, the thermocline flattens out:

  • "Oceanic Rossby waves" are very low-frequency waves in the ocean’s surface and thermocline. At the ocean’s surface they are only 5 centimeters high, but hundreds of kilometers across. They move at about 10 centimeters/second, requiring months to years to cross the ocean! The surface waves are mirrored by waves in the thermocline, which are much larger, 10-50 meters in height. When the surface goes up, the thermocline goes down.
  • The "Madden-Julian Oscillation" or "MJO" is the largest form of variability in the tropical atmosphere on time scales of 30-90 days. It’s a pulse that moves east across the Indian Ocean and Pacific ocean at 4-8 meters/second. It manifests itself as patches of anomalously high rainfall and also anomalously low rainfall. Strong Madden-Julian Oscillations are often seen 6-12 months before an El Niño starts.

With this bit of background, let’s read what Kessler wrote:

There are two main theories at present. The first is that the event is initiated by the reflection from the western boundary of the Pacific of an oceanic Rossby wave (type of low-frequency planetary wave that moves only west). The reflected wave is supposed to lower the thermocline in the west-central Pacific and thereby warm the SST [sea surface temperature] by reducing the efficiency of upwelling to cool the surface. Then that makes winds blow towards the (slightly) warmer water and really start the event. The nice part about this theory is that the Rossby waves can be observed for months before the reflection, which implies that El Niño is predictable.

The other idea is that the trigger is essentially random. The tropical convection (organized largescale thunderstorm activity) in the rising air tends to occur in bursts that last for about a month, and these bursts propagate out of the Indian Ocean (known as the Madden-Julian Oscillation). Since the storms are geostrophic (rotating according to the turning of the earth, which means they rotate clockwise in the southern hemisphere and counter-clockwise in the north), storm winds on the equator always blow towards the east. If the storms are strong enough, or last long enough, then those eastward winds may be enought to start the sloshing. But specific Madden-Julian Oscillation events are not predictable much in advance (just as specific weather events are not predictable in advance), and so to the extent that this is the main element, then El Niño will not be predictable.

In my opinion both these two processes can be important in different El Niños. Some models that did not have the MJO storms were successful in predicting the events of 1986-87 and 1991-92. That suggests that the Rossby wave part was a main influence at that time. But those same models have failed to predict the events since then, and the westerlies have appeared to come from nowhere. It is also quite possible that these two general sets of ideas are incomplete, and that there are other causes entirely. The fact that we have very intermittent skill at predicting the major turns of the ENSO cycle (as opposed to the very good forecasts that can be made once an event has begun) suggests that there remain important elements that are await explanation.

Next time I’ll talk a bit about mathematical models of the ENSO and another climate cycle — but please keep in mind that these cycles are still far from fully understood!

To hate is to study, to study is to understand, to understand is to appreciate, to appreciate is to love. So maybe I’ll end up loving your theory. – John Archibald Wheeler

The Azolla Event

18 November, 2010

My friend Bruce Smith just pointed out something I’d never heard of:

Azolla event, Wikipedia.

As you may recall, the dinosaurs were wiped out by an asteroid about 65 million years ago. Then came the Cenozoic Era: first the Paleocene, then the Eocene, and so on. Back in those days, the Earth was very warm compared to now:

Paleontologists call the peak of high temperatures the “Eocene optimum”. Back then, it was about 12 °C warmer on average. The polar regions were much warmer than today, perhaps as mild as the modern-day Pacific Northwest. In fact, giant turtles and alligators thrived north of the Arctic circle!

(“Optimum?” Yes: as if the arguments over global warming weren’t confusing enough already, paleontologists use the term “optimum” for any peak of high temperatures. I think that’s a bit silly. If you were a turtle north of the Arctic circle, it was indeed jolly optimal. But what matters now is not that certain temperature levels are inherently good or bad, but that the temperature is increasing too fast for life to easily adapt.)

Why did it get colder? This is a fascinating and important puzzle. And here’s one puzzle piece I’d never heard about. I don’t know how widely accepted this story is, but here’s how it goes:

In the early Eocene, the Arctic Ocean was almost entirely surrounded by land:

A surface layer of less salty water formed from inflowing rivers, and around 49 million years ago, vast blooms of freshwater fern Azolla began to grow in the Arctic Ocean. Apparently this stuff grows like crazy. And as bits of it died, it sank to the sea floor. This went on for about 800,000 years, and formed a layer 8 up to meters thick. And some scientists speculate that this process sucked up enough carbon dioxide to significantly chill the planet. Some say CO2 concentrations fell from 3500 ppm in the early Eocene to 650 ppm at around the time of this event!

I don’t understand much about this — I just wanted to mention it. After all, right now people are thinking about fertilizing the ocean to artificially create blooms of phytoplankton that’ll soak up CO2 and fall to the ocean floor. But if you want to read a well-informed blog article on this topic, try:

• Ole Nielsen, The Azolla event (dramatic bloom 49 million years ago).

By the way, there’s a nice graph of carbon dioxide concentrations here… inferred from boron isotope measurements:

• P. N. Pearson and M. R. Palmer, Atmospheric carbon dioxide concentrations over the past 60 million years, Nature 406 (2000), 695–699.

This Week’s Finds (Week 305)

5 November, 2010

Nathan Urban has been telling us about a paper where he estimated the probability that global warming will shut down a major current in the Atlantic Ocean:

• Nathan M. Urban and Klaus Keller, Probabilistic hindcasts and projections of the coupled climate, carbon cycle and Atlantic meridional overturning circulation system: a Bayesian fusion of century-scale observations with a simple model, Tellus A, July 16, 2010.

We left off last time with a cliff-hanger: I didn’t let him tell us what the probability is! Since you must have been clutching your chair ever since, you’ll be relieved to hear that the answer is coming now, in the final episode of this interview.

But it’s also very interesting how he and Klaus Keller got their answer. As you’ll see, there’s some beautiful math involved. So let’s get started…

JB: Last time you told us roughly how your climate model works. This time I’d like to ask you about the rest of your paper, leading up to your estimate of the probability that the Atlantic Meridional Overturning Current (or "AMOC") will collapse. But before we get into that, I’d like to ask some very general questions.

For starters, why are scientists worried that the AMOC might collapse?

Last time I mentioned the Younger Dryas event, a time when Europe became drastically colder for about 1300 years, starting around 10,800 BC. Lots of scientists think this event was caused by a collapse of the AMOC. And lots of them believe it was caused by huge amounts of fresh water pouring into the north Atlantic from an enormous glacial lake. But nothing quite like that is happening now! So if the AMOC collapses in the next few centuries, the cause would have to be a bit different.

NU: In order for the AMOC to collapse, the overturning circulation has to weaken. The overturning is driven by the sinking of cold and salty, and therefore dense, water in the north Atlantic. Anything that affects the density structure of the ocean can alter the overturning.

As you say, during the Younger Dryas, it is thought that a lot of fresh water suddenly poured into the Atlantic from the draining of a glacial lake. This lessened the density of the surface waters and reduced the rate at which they sank, shutting down the overturning.

Since there aren’t any large glacial lakes left that could abruptly drain into the ocean, the AMOC won’t shut down in the same way it previously did. But it’s still possible that climate change could cause it to shut down. The surface waters from the north Atlantic can still freshen (and become less dense), either due to the addition of fresh water from melting polar ice and snow, or due to increased precipitation to the northern latitudes. In addition, they can simply become warmer, which also makes them less dense, reducing their sinking rate and weakening the overturning.

In combination, these three factors (warming, increased precipitation, meltwater) can theoretically shut down the AMOC if they are strong enough. This will probably not be as abrupt or extreme an event as the Younger Dryas, but it can still persistently alter the regional climate.

JB: I’m trying to keep our readers in suspense for a bit longer, but I don’t think it’s giving away too much to say that when you run your model, sometimes the AMOC shuts down, or at least slows down. Can you say anything about how this tends to happen, when it does? In your model, that is. Can you tell if it’s mainly warming, or increased precipitation, or meltwater?

NU: The short answer is "mainly warming, probably". The long answer:

I haven’t done experiments with the box model myself to determine this, but I can quote from the Zickfeld et al. paper where this model was published. It says, for their baseline collapse experiment,

In the box model the initial weakening of the overturning circulation is mainly due to thermal forcing […] This effect is amplified by a negative feedback on salinity, since a weaker circulation implies reduced salt advection towards the northern latitudes.

Even if they turn off all the freshwater input, they find substantial weakening of the AMOC from warming alone.

Freshwater could potentially become the dominant effect on the AMOC if more freshwater is added than in the paper’s baseline experiment. The paper did report computer experiments with different freshwater inputs, but upon skimming it, I can’t immediately tell whether the thermal effect loses its dominance.

These experiments have also been performed using more complex climate models. This paper reports that in all the models they studied, the AMOC weakening is caused more by changes in surface heat flux than by changes in surface water flux:

• J. M. Gregory et al., A model intercomparison of changes in the Atlantic thermohaline circulation in response to increasing atmospheric CO2 concentration, Geophysical Research Letters 32 (2005), L12703.

However, that paper studied "best-estimate" freshwater fluxes, not the fluxes on the high end of what’s possible, so I don’t know whether thermal effects would still dominate if the freshwater input ends up being large. There are papers that suggest freshwater input from Greenland, at least, won’t be a dominant factor any time soon:

• J. H. Jungclaus et al., Will Greenland melting halt the thermohaline circulation?, Geophysical Research Letters 33 (2006), L17708.

• E. Driesschaert et al., Modeling the influence of Greenland ice sheet melting on the Atlantic meridional overturning circulation during the next millennia, Geophysical Research Letters 34 (2007), L10707.

I’m not sure what the situation is for precipitation, but I don’t think that would be much larger than the meltwater flux. In summary, it’s probably the thermal effects that dominate, both in complex and simpler models.

Note that in our version of the box model, the precipitation and meltwater fluxes are combined into one number, the "North Atlantic hydrological sensitivity", so we can’t distinguish between those sources of water. This number is treated as uncertain in our analysis, lying within a range of possible values determined from the hydrologic changes predicted by complex models. The Zickfeld et al. paper experimented with separating them into the two individual contributions, but my version of the model doesn’t do that.

JB: Okay. Now back to what you and Klaus Keller actually did in your paper. You have a climate model with a bunch of adjustable knobs, or parameters. Some of these parameters you take as "known" from previous research. Others are more uncertain, and that’s where the Bayesian reasoning comes in. Very roughly, you use some data to guess the probability that the right settings of these knobs lie within any given range.

How many parameters do you treat as uncertain?

NU: 18 parameters in total. 7 model parameters that control dynamics, 4 initial conditions, and 7 parameters describing error statistics.

JB: What are a few of these parameters? Maybe you can tell us about some of the most important ones — or ones that are easy to understand.

NU: I’ve mentioned these briefly in "week304" in the model description. The AMOC-related parameter is the hydrologic sensitivity I described above, controlling the flux of fresh water into the North Atlantic.

There are three climate related parameters:

• the climate sensitivity (the equilibrium warming expected in response to doubled CO2),

• the ocean heat vertical diffusivity (controlling the rate at which oceans absorb heat from the atmosphere), and

• "aerosol scaling", a factor that multiplies the strength of the aerosol-induced cooling effect, mostly due to uncertainties in aerosol-cloud interactions.

I discussed these in "week302" in the part about total feedback estimates.

There are also three carbon cycle related parameters:

• the heterotrophic respiration sensitivity (describing how quickly dead plants decay when it gets warmer),

• CO2 fertilization (how much faster plants grow in CO2-elevated conditions), and

• the ocean carbon vertical diffusivity (the rate at which the oceans absorb CO2 from the atmosphere).

The initial conditions describe what the global temperature, CO2 level, etc. were at the start of my model simulations, in 1850. The statistical parameters describe the variance and autocorrelation of the residual error between the observations and the model, due to measurement error, natural variability, and model error.

JB: Could you say a bit about the data you use to estimate these uncertain parameters? I see you use a number of data sets.

NU: We use global mean surface temperature and ocean heat content to constrain the three climate parameters. We use atmospheric CO2 concentration and some ocean flux measurements to constrain the carbon parameters. We use measurements of the AMOC strength to constrain the AMOC parameter. These are all time series data, mostly global averages — except the AMOC strength, which is an Atlantic-specific quantity defined at a particular latitude.

The temperature data are taken by surface weather stations and are for the years 1850-2009. The ocean heat data are taken by shipboard sampling, 1953-1996. The atmospheric CO2 concentrations are measured from the Mauna Loa volcano in Hawaii, 1959-2009. There are also some ice core measurements of trapped CO2 at Law Dome, Antarctica, dated to 1854-1953. The air-sea CO2 fluxes, for the 1980s and 1990s, are derived from measurements of dissolved inorganic carbon in the ocean, combined with measurements of manmade chlorofluorocarbon to date the water masses in which the carbon resides. (The dates tell you when the carbon entered the ocean.)

The AMOC strength is reconstructed from station measurements of poleward water circulation over an east-west section of the Atlantic Ocean, near 25 °N latitude. Pairs of stations measure the northward velocity of water, inferred from the ocean bottom pressure differences between northward and southward station pairs. The velocities across the Atlantic are combined with vertical density profiles to determine an overall rate of poleward water mass transport. We use seven AMOC strength estimates measured sparsely between the years 1957 and 2004.

JB: So then you start the Bayesian procedure. You take your model, start it off with your 18 parameters chosen somehow or other, run it from 1850 to now, and see how well it matches all this data you just described. Then you tweak the parameters a bit — last time we called that "turning the knobs" — and run the model again. And then you do this again and again, lots of times. The goal is to calculate the probability that the right settings of these knobs lie within any given range.

Is that about right?

NU: Yes, that’s right.

JB: About how many times did you actually run the model? Is the sort of thing you can do on your laptop overnight, or is it a mammoth task?

NU: I ran the model a million times. This took about two days on a single CPU. Some of my colleagues later ported the model from Matlab to Fortran, and now I can do a million runs in half an hour on my laptop.

JB: Cool! So if I understand correctly, you generated a million lists of 18 numbers: those uncertain parameters you just mentioned.

Or in other words: you created a cloud of points: a million points in an 18-dimensional space. Each point is a choice of those 18 parameters. And the density of this cloud near any point should be proportional to the probability that the parameters have those values.

That’s the goal, anyway: getting this cloud to approximate the right probability density on your 18-dimensional space. To get this to happen, you used the Markov chain Monte Carlo procedure we discussed last time.

Could you say in a bit more detail how you did this, exactly?

NU: There are two steps. One is to write down a formula for the probability of the parameters (the "Bayesian posterior distribution"). The second is to draw random samples from that probability distribution using Markov chain Monte Carlo (MCMC).

Call the parameter vector θ and the data vector y. The Bayesian posterior distribution p(θ|y) is a function of θ which says how probable θ is, given the data y that you’ve observed. The little bar (|) indicates conditional probability: p(θ|y) is the probability of θ, assuming that you know y happened.

The posterior factorizes into two parts, the likelihood and the prior. The prior, p(θ) says how probable you think a particular 18-dimensional vector of parameters is, before you’ve seen the data you’re using. It encodes your "prior knowledge" about the problem, unconditional on the data you’re using.

The likelihood, p(y|θ), says how likely it is for the observed data to arise from a model run using some particular vector of parameters. It describes your data generating process: assuming you know what the parameters are, how likely are you to see data that looks like what you actually measured? (The posterior is the reverse of this: how probable are the parameters, assuming the data you’ve observed?)

Bayes’s theorem simply says that the posterior is proportional to the product of these two pieces:

p(θ|y) ∝ p(y|θ) × p(θ)

If I know the two pieces, I multiply them together and use MCMC to sample from that probability distribution.

Where do the pieces come from? For the prior, we assumed bounded uniform distributions on all but one parameter. Such priors express the belief that each parameter lies within some range we deemed reasonable, but we are agnostic about whether one value within that range is more probable than any other. The exception is the climate sensitivity parameter. We have prior evidence from computer models and paleoclimate data that the climate sensitivity is most likely around 2 or 3 °C, albeit with significant uncertainties. We encoded this belief using a "diffuse" Cauchy distribution peaked in this range, but allowing substantial probability to be outside it, so as to not prematurely exclude too much of the parameter range based on possibly overconfident prior beliefs. We assume the priors on all the parameters are independent of each other, so the prior for all of them is the product of the prior for each of them.

For the likelihood, we assumed a normal (Gaussian) distribution for the residual error (the scatter of the data about the model prediction). The simplest such distribution is the independent and identically distributed ("iid") normal distribution, which says that all the data points have the same error and the errors at each data point are independent of each other. Neither of these assumptions is true. The errors are not identical, since they get bigger farther in the past, when we measured data with less precision than we do today. And they’re not independent, because if one year is warmer than the model predicts, the next year likely to be also warmer than the model predicts. There are various possible reasons for this: chaotic variability, time lags in the system due to finite heat capacity, and so on.

In this analysis, we kept the identical-error assumption for simplicity, even though it’s not correct. I think this is justifiable, because the strongest constraints on the parameters come from the most recent data, when the largest climate and carbon cycle changes have occurred. That is, the early data are already relatively uninformative, so if their errors get bigger, it doesn’t affect the answer much.

We rejected the independent-error assumption, since there is very strong autocorrelation (serial dependence) in the data, and ignoring autocorrelation is known to lead to overconfidence. When the errors are correlated, it’s harder to distinguish between a short-term random fluctuation and a true trend, so you should be more uncertain about your conclusions. To deal with this, we assumed that the errors obey a correlated autoregressive "red noise" process instead of an uncorrelated "white noise" process. In the likelihood, we converted the red-noise errors to white noise via a "whitening" process, assuming we know how much correlation is present. (We’re allowed to do that in the likelihood, because it gives the probability of the data assuming we know what all the parameters are, and the autocorrelation is one of the parameters.) The equations are given in the paper.

Finally, this gives us the formula for our posterior distribution.

JB: Great! There’s a lot of technical material here, so I have many questions, but let’s go through the whole story first, and come back to those.

NU: Okay. Next comes step two, which is to draw random samples from the posterior probability distribution via MCMC.

To do this, we use the famous Metropolis algorithm, which was invented by a physicist of that name, along with others, to do computations in statistical physics. It’s a very simple algorithm which takes a "random walk" through parameter space.

You start out with some guess for the parameters. You randomly perturb your guess to a nearby point in parameter space, which you are going to propose to move to. If the new point is more probable than the point you were at (according to the Bayesian posterior distribution), then accept it as a new random sample. If the proposed point is less probable than the point you’re at, then you randomly accept the new point with a certain probability. Otherwise you reject the move, staying where you are, treating the old point as a duplicate random sample.

The acceptance probability is equal to the ratio of the posterior distribution at the new point to the posterior distribution at the old point. If the point you’re proposing to move to is, say, 5 times less probable than the point you are at now, then there’s a 20% chance you should move there, and a 80% chance that you should stay where you are.

If you iterate this method of proposing new "jumps" through parameter space, followed by the Metropolis accept/reject procedure, you can prove that you will eventually end up with a long list of (correlated) random samples from the Bayesian posterior distribution.

JB: Okay. Now let me ask a few questions, just to help all our readers get up to speed on some jargon.

Lots of people have heard of a "normal distribution" or "Gaussian", because it’s become sort of the default choice for probability distributions. It looks like a bell curve:

When people don’t know the probability distribution of something — like the tail lengths of newts or the IQ’s of politicians — they often assume it’s a Gaussian.

But I bet fewer of our readers have heard of a "Cauchy distribution". What’s the point of that? Why did you choose that for your prior probability distribution of the climate sensitivity?

NU: There is a long-running debate about the "upper tail" of the climate sensitivity distribution. High climate sensitivities correspond to large amounts of warming. As you can imagine, policy decisions depend a lot on how likely we think these extreme outcomes could be, i.e., how quickly the "upper tail" of the probability distribution drops to zero.

A Gaussian distribution has tails that drop off exponentially quickly, so very high sensitivities will never get any significant weight. If we used it for our prior, then we’d almost automatically get a "thin tailed" posterior, no matter what the data say. We didn’t want to put that in by assumption and automatically conclude that high sensitivities should get no weight, regardless of what the data say. So we used a weaker assumption, which is a "heavy tailed" prior distribution. With this prior, the probability of large amounts of warming drops off more slowly, as a power law, instead of exponentially fast. If the data strongly rule out high warming, we can get a thin tailed posterior, but if they don’t, it will be heavy tailed. The Cauchy distribution, a limiting case of the "Student t" distribution that students of statistics may have heard of, is one of the most conservative choices for a heavy-tailed prior. Probability drops off so slowly at its tails that its variance is infinite.

JB: The issue of "fat tails" is also important in the stock market, where big crashes happen more frequently than you might guess with a Gaussian distribution. After the recent economic crisis I saw a lot of financiers walking around with their tails between their legs, wishing their tails had been fatter.

I’d also like to ask about "white noise" versus "red noise". "White noise" is a mathematical description of a situation where some quantity fluctuates randomly with time in a way so that it’s value at any time is completely uncorrelated with its value at any other time. If you graph an example of white noise, it looks really spiky:

If you play it as a sound, it sounds like hissy static — quite unpleasant. If you could play it in the form of light, it would look white, hence the name.

"Red noise" is less wild. Its value at any time is still random, but it’s correlated to the values at earlier or later times, in a specific way. So it looks less spiky:

and it sounds less high-pitched, more like a steady rainfall. Since it’s stronger at low frequencies, it would look more red if you could play it in the form of light — hence the name "red noise".

If understand correctly, you’re assuming that some aspects of the climate are noisy, but in a red noise kind of way, when you’re computing p(y|θ): the likelihood that your data takes on the value y, given your climate model with some specific choice of parameters θ.

Is that right? You’re assuming this about all your data: the temperature data from weather stations, the ocean heat data are from shipboard samples, the atmospheric CO2 concentrations at Mauna Loa volcano in Hawaii, the ice core measurements of trapped CO2, the air-sea CO2 fluxes, and also the AMOC strength? Red, red, red — all red noise?

NU: I think the red noise you’re talking about refers to a specific type of autocorrelated noise ("Brownian motion"), with a power spectrum that is inversely proportional to the square of frequency. I’m using "red noise" more generically to speak of any autocorrelated process that is stronger at low frequencies. Specifically, the process we use is a first-order autoregressive, or "AR(1)", process. It has a more complicated spectrum than Brownian motion.

JB: Right, I was talking about "red noise" of a specific mathematically nice sort, but that’s probably less convenient for you. AR(1) sounds easier for computers to generate.

NU: It’s not only easier for computers, but closer to the spectrum we see in our analysis.

Note that when I talk about error I mean "residual error", which is the difference between the observations and the model prediction. If the residual error is correlated in time, that doesn’t necessarily reflect true red noise in the climate system. It could also represent correlated errors in measurement over time, or systematic errors in the model. I am not attempting to distinguish between all these sources of error. I’m just lumping them all together into one total error process, and assuming it has a simple statistical form.

We assume the residual errors in the annual surface temperature, ocean heat, and instrumental CO2 time series are AR(1). The ice core CO2, air-sea CO2 flux, and AMOC strength data are sparse, and we can’t really hope to estimate the correlation between them, so we assume their residual errors are uncorrelated.

Speaking of correlation, I’ve been talking about "autocorrelation", which is correlation within one data set between one time and another. It’s also possible for the errors in different data sets to be correlated with each other ("cross correlation"). We assumed there is no cross correlation (and residual analysis suggests only weak correlation between data sets).

JB: I have a few more technical questions, but I bet most of our readers are eager to know: so, what next?

You use all these nifty mathematical methods to work out p(θ|y), the probability that your 18 parameters have any specific value given your data. And now I guess you want to figure out the probability that the Atlantic Meridional Overturning Current, or AMOC, will collapse by some date or other.

How do you do this? I guess most people want to know the answer more than the method, but they’ll just have to wait a few more minutes.

NU: That’s easy. After MCMC, we have a million runs of the model, sampled in proportion how well the model fits historic data. There will be lots of runs that agree well with the data, and a few that agree less well. All we do now is extend each of those runs into the future, using an assumed scenario for what CO2 emissions and other radiative forcings will do in the future. To find out the probability that the AMOC will collapse by some date, conditional on the assumptions we’ve made, we just count what fraction of the runs have an AMOC strength of zero in whatever year we care about.

JB: Okay, that’s simple enough. What scenario, or scenarios, did you consider?

NU: We considered a worst-case "business as usual" scenario in which we continue to burn fossil fuels at an accelerating rate until we start to run out of them, and eventually burn the maximum amount of fossil fuels we think there might be remaining (about 5000 gigatons worth of of carbon, compared to the roughly 500 gigatons we’ve emitted so far). This assumes we get desperate for cheap energy and extract all the hard-to-get fossil resources in oil shales and tar sands, all the remaining coal, etc. It doesn’t necessarily preclude the use of non-fossil energy; it just assumes that our appetite for energy grows so rapidly that there’s no incentive to slow down fossil fuel extraction. We used a simple economic model to estimate how fast we might do this, if the world economy continues to grow at a similar rate to the last few decades.

JB: And now for the big question: what did you find? How likely is it that the AMOC will collapse, according to your model? Of course it depends how far into the future you look.

NU: We find a negligible probability that the AMOC will collapse this century. The odds start to increase around 2150, rising to about a 10% chance by 2200, and a 35% chance by 2300, the last year considered in our scenario.

JB: I guess one can take this as good news or really scary news, depending on how much you care about folks who are alive in 2300. But I have some more questions. First, what’s a "negligible probability"?

NU: In this case, it’s less than 1 in 3000. For computational reasons, we only ran 3000 of the million samples forward into the future. There were no samples in this smaller selection that had the AMOC collapsed in 2100. The probability rises to 1 in 3000 in the year 2130 (the first time I see a collapse in this smaller selection), and 1% in 2152. You should take these numbers with a grain of salt. It’s these rare "tail-area events" that are most sensitive to modeling assumptions.

JB: Okay. And second, don’t the extrapolations become more unreliable as you keep marching further into the future? You need to model not only climate physics but also the world economy. In this calculation, how many gigatons of carbon dioxide per year are you assuming will be emitted in 2300? I’m just curious. In 1998 it was about 27.6 gigatons. By 2008, it was about 30.4.

NU: Yes, the uncertainty grows with time (and this is reflected in our projections). And in considering a fixed emissions scenario, we’ve ignored the economic uncertainty, which, so far out into the future, is even larger than the climate uncertainty. Here we’re concentrating on just the climate uncertainty, and are hoping to get an idea of bounds, so we used something close to a worst-case economic scenario. In this scenario carbon emissions peak around 2150 at about 23 gigatons carbon per year (84 gigatons CO2). By 2300 they’ve tapered off to about 4 GtC (15 GtCO2).

Actual future emissions may be less than this, if we act to reduce them, or there are fewer economically extractable fossil resources than we assume, or the economy takes a prolonged downturn, etc. Actually, it’s not completely an economic worst case; it’s possible that the world economy could grow even faster than we assume. And it’s not the worst case scenario from a climate perspective, either. For example, we don’t model potential carbon emissions from permafrost or methane clathrates. It’s also possible that climate sensitivity could be higher than what we find in our analysis.

JB: Why even bother projecting so far out into the future, if it’s so uncertain?

NU: The main reason is because it takes a while for the AMOC to weaken, so if we’re interested in what it would take to make it collapse, we have to run the projections out a few centuries. But another motivation for writing this paper is policy related, having to do with the concept of "climate commitment" or "triggering". Even if it takes a few centuries for the AMOC to collapse, it may take less time than that to reach a "point of no return", where a future collapse has already been unavoidably "triggered". Again, to investigate this question, we have to run the projections out far enough to get the AMOC to collapse.

We define "the point of no return" to be a point in time which, if CO2 emissions were immediately reduced to zero and kept there forever, the AMOC would still collapse by the year 2300 (an arbitrary date chosen for illustrative purposes). This is possible because even if we stop emitting new CO2, existing CO2 concentrations, and therefore temperatures, will remain high for a long time (see "week303").

In reality, humans wouldn’t be able to reduce emissions instantly to zero, so the actual "point of no return" would likely be earlier than what we find in our study. We couldn’t economically reduce emissions fast enough to avoid triggering an AMOC collapse. (In this study we ignore the possibility of negative carbon emissions, that is, capturing CO2 directly from the atmosphere and sequestering it for a long period of time. We’re also ignoring the possibility of climate geoengineering, which is global cooling designed to cancel out greenhouse warming.)

So what do we find? Although we calculate a negligible probability that the AMOC will collapse by the end of this century, the probability that, in this century, we will commit later generations to a collapse (by 2300) is almost 5%. The probabilities of "triggering" rise rapidly, to almost 20% by 2150 and about 33% by 2200, even though the probability of experiencing a collapse by those dates is about 1% and 10%, respectively. You can see it in this figure from our paper:

The take-home message is that while most climate projections are currently run out to 2100, we shouldn’t fixate only on what might happen to people this century. We should consider what climate changes our choices in this century, and beyond, are committing future generations to experiencing.

JB: That’s a good point!

I’d like to thank you right now for a wonderful interview, that really taught me — and I hope our readers — a huge amount about climate change and climate modelling. I think we’ve basically reached the end here, but as the lights dim and the audience files out, I’d like to ask just a few more technical questions.

One of them was raised by David Tweed. He pointed out that while you’re "training" your model on climate data from the last 150 years or so, you’re using it to predict the future in a world that will be different in various ways: a lot more CO2 in the atmosphere, hotter, and so on. So, you’re extrapolating rather than interpolating, and that’s a lot harder. It seems especially hard if the collapse of the AMOC is a kind of "tipping point" — if it suddenly snaps off at some point, instead of linearly decreasing as some parameter changes.

This raises the question: why should we trust your model, or any model of this sort, to make such extrapolations correctly? In the discussion after that comment, I think you said that ultimately it boils down to

1) whether you think you have the physics right,


2) whether you think the parameters change over time.

That makes sense. So my question is: what are some of the best ways people could build on the work you’ve done, and make more reliable predictions about the AMOC? There’s a lot at stake here!

NU: Our paper is certainly an early step in making probabilistic AMOC projections, with room for improvement. I view the main points as (1) estimating how large the climate-related uncertainties may be within a given model, and (2) illustrating the difference between experiencing, and committing to, a climate change. It’s certainly not an end-all "prediction" of what will happen 300 years from now, taking into account all possible model limitations, economic uncertainties, etc.

To answer your question, the general ways to improve predictions are to improve the models, and/or improve the data constraints. I’ll discuss both.

Although I’ve argued that our simple box model reasonably reproduces the dynamics of the more complex model it was designed to approximate, that complex model itself isn’t the best model available for the AMOC. The problem with using complex climate models is that it’s computationally impossible to run them millions of times. My solution is to work with "statistical emulators", which are tools for building fast approximations to slow models. The idea is to run the complex model a few times at different points in its parameter space, and then statistically interpolate the resulting outputs to predict what the model would have output at nearby points. This works if the model output is a smooth enough function of the parameters, and there are enough carefully-chosen "training" points.

From an oceanographic standpoint, even current complex models are probably not wholly adequate (see the discussion at the end of "week304"). There is some debate about whether the AMOC becomes more stable as the resolution of the model increases. On the other hand, people still have trouble getting the AMOC in models, and the related climate changes, to behave as abruptly as they apparently did during the Younger Dryas. I think the range of current models is probably in the right ballpark, but there is plenty of room for improvement. Model developers continue to refine their models, and ultimately, the reliability of any projection is constrained by the quality of models available.

Another way to improve predictions is to improve the data constraints. It’s impossible to go back in time and take better historic data, although with things like ice cores, it is possible to dig up new cores to analyze. It’s also possible to improve some historic "data products". For example, the ocean heat data is subject to a lot of interpolation of sparse measurements in the deep ocean, and one could potentially improve the interpolation procedure without going back in time and taking more data. There are also various corrections being applied for known biases in the data-gathering instruments and procedures, and it’s possible those could be improved too.

Alternatively, we can simply wait. Wait for new and more precise data to become available.

But when I say "improve the data constraints", I’m mostly talking about adding more of them, that I simply didn’t include in the analysis, or looking at existing data in more detail (like spatial patterns instead of global averages). For example, the ocean heat data mostly serves to constrain the vertical mixing parameter, controlling how quickly heat penetrates into the deep ocean. But we can also look at the penetration of chemicals in the ocean (such carbon from fossil fuels, or chlorofluorocarbons). This is also informative about how quickly water masses mix down to the ocean depths, and indirectly informative about how fast heat mixes. I can’t do that with my simple model (which doesn’t have the ocean circulation of any of these chemicals in it), but I can with more complex models.

As another example, I could constrain the climate sensitivity parameter better with paleoclimate data, or more resolved spatial data (to try to, e.g., pick up the spatial fingerprint of industrial aerosols in the temperature data), or by looking at data sets informative about particular feedbacks (such as water vapor), or at satellite radiation budget data.

There is a lot of room for reducing uncertainties by looking at more and more data sets. However, this presents its own problems. Not only is this simply harder to do, but it runs more directly into limitations in the models and data. For example, if I look at what ocean temperature data implies about a model’s vertical mixing parameter, and what ocean chemical data imply, I might find that they imply two inconsistent values for the parameter! Or that those data imply a different mixing than is implied by AMOC strength measurements. This can happen if there are flaws in the model (or in the data). We have some evidence from other work that there are circumstances in which this can happen:

• A. Schmittner, N. M. Urban, K. Keller and D. Matthews, Using tracer observations to reduce the uncertainty of ocean diapycnal mixing and climate-carbon cycle projections, Global Biogeochemical Cycles 23 (2009), GB4009.

• M. Goes, N. M. Urban, R. Tonkonojenkov, M. Haran, and K. Keller, The skill of different ocean tracers in reducing uncertainties about projections of the Atlantic meridional overturning circulation, Journal of Geophysical Research — Oceans, in press (2010).

How to deal with this, if and when it happens, is an open research challenge. To an extent it depends on expert judgment about which model features and data sets are "trustworthy". Some say that expert judgment renders conclusions subjective and unscientific, but as a scientist, I say that such judgments are always applied! You always weigh how much you trust your theories and your data when deciding what to conclude about them.

In my response I’ve so far ignored the part about parameters changing in time. I think the hydrological sensitivity (North Atlantic freshwater input as a function of temperature) can change with time, and this could be improved by using a better climate model that includes ice and precipitation dynamics. Feedbacks can fluctuate in time, but I think it’s okay to treat them as a constant for long term projections. Some of these parameters can also be spatially dependent (e.g., the respiration sensitivity in the carbon cycle). I think treating them all as constant is a decent first approximation for the sorts of generic questions we’re asking in the paper. Also, all the parameter estimation methods I’ve described only work with static parameters. For time varying parameters, you need to get into state estimation methods like Kalman or particle filters.

JB: I also have another technical question, which is about the Markov chain Monte Carlo procedure. You generate your cloud of points in 18-dimensional space by a procedure where you keep either jumping randomly to a nearby point, or staying put, according to that decision procedure you described. Eventually this cloud fills out to a good approximation of the probability distribution you want. But, how long is "eventually"? You said you generated a million points. But how do you know that’s enough?

NU: This is something of an art. Although there is an asymptotic convergence theorem, there is no general way of knowing whether you’ve reached convergence. First you check to see whether your chains "look right". Are they sweeping across the full range of parameter space where you expect significant probability? Are they able to complete many sweeps (thoroughly exploring parameter space)? Is the Metropolis test accepting a reasonable fraction of proposed moves? Do you have enough effective samples in your Markov chain? (MCMC generates correlated random samples, so there are fewer "effectively independent" samples in the chain than there are total samples.) Then you can do consistency checks: start the chains at several different locations in parameter space, and see if they all converge to similar distributions.

If the posterior distribution shows, or is expected to show, a lot of correlation between parameters, you have to be more careful to ensure convergence. You want to propose moves that carry you along the "principal components" of the distribution, so you don’t waste time trying to jump away from the high probability directions. (Roughly, if your posterior density is concentrated on some low dimensional manifold, you want to construct your way of moving around parameter space to stay near that manifold.) You also have to be careful if you see, or expect, multimodality (multiple peaks in the probability distribution). It can be hard for MCMC to move from one mode to another through a low-probability "wasteland"; it won’t be inclined to jump across it. There are more advanced algorithms you can use in such situations, if you suspect you have multimodality. Otherwise, you might discover later that you only sampled one peak, and never noticed that there were others.

JB: Did you do some of these things when testing out the model in your paper? Do you have any intuition for the "shape" of the probability distribution in 18-dimensional space that lies at the heart of your model? For example: do you know if it has one peak, or several?

NU: I’m pretty confident that the MCMC in our analysis is correctly sampling the shape of the probability distribution. I ran lots and lots of analyses, starting the chain in different ways, tweaking the proposal distribution (jumping rule), looking at different priors, different model structures, different data, and so on.

It’s hard to "see" what an 18-dimensional function looks like, but we have 1-dimensional and 2-dimensional projections of it in our paper:

I don’t believe that it has multiple peaks, and I don’t expect it to. Multiple peaks usually show up when the model behavior is non-monotonic as a function of the parameters. This can happen in really nonlinear systems (an with threshold systems like the AMOC), but during the historic period I’m calibrating the model to, I see no evidence of this in the model.

There are correlations between parameters, so there are certain "directions" in parameter space that the posterior distribution is oriented along. And the distribution is not Gaussian. There is evidence of skew, and nonlinear correlations between parameters. Such correlations appear when the data are insufficient to completely identify the parameters (i.e., different combinations of parameters can produce similar model output). This is discussed in more detail in another of our papers:

• Nathan M. Urban and Klaus Keller, Complementary observational constraints on climate sensitivity, Geophysical Research Letters 36 (2009), L04708.

In a Gaussian distribution, the distribution of any pair of parameters will look ellipsoidal, but our distribution has some "banana" or "boomerang" shaped pairwise correlations. This is common, for example, when the model output is a function of the product of two parameters.

JB: Okay. It’s great that we got a chance to explore some of the probability theory and statistics underlying your work. It’s exciting for me to see these ideas being used to tackle a big real-life problem. Thanks again for a great interview.

Maturity is the capacity to endure uncertainty. – John Finley