Energy Return on Energy Invested

27 October, 2010

The Azimuth Project wiki has been up and running for exactly one month!

We’ve built up a nice bunch of articles sketching some of the biggest environmental problems we face today — and some ideas for dealing with them. I invite you to look these over and improve them! It’s very easy to do.

I also invite you to join us at the Azimuth Forum, where we are deciding the fate of humanity (or something like that). We need your help!

In the weeks to come I want to tell you what we’ve learned so far. I especially want to talk about various plans of action that people have formulated to tackle global warming. Even sitting here in the comfort of this cozy blog, you can help us compare and criticize these plans.

But I also want to tell you about some interesting concepts. And the first is EROEI, or “Energy Return On Energy Invested”. The Azimuth Project entry on this concept was largely written by David Tweed. Three cheers for David Tweed!

It also had help from Eric Forgy, Graham Jones and David Pollard, and a major contribution from Anonymous Coward. I’ll shorten it and amp it up for the purposes of this blog. I know you’re here to be entertained.

The Idea

You’ve probably heard the saying “it takes money to make money”. Similarly, it takes energy to make energy. More precisely, it takes useful energy to make useful energy.

Energy Returned On Energy Invested or EROEI captures this idea: it’s simply the ratio of “useful energy acquired” to “useful energy expended”. Note that money does not enter into this concept. The difficult and often heated debate arises when we try to decide which inputs and outputs count as “useful”.

There are other names for this concept and closely related concepts. “Energy profit ratio”, “surplus energy”, “energy gain”, “EROI” and “EROEI” all describe virtually the same idea: how much energy we get out per unit of energy put in. See:

• Nate Hagens, A Net Energy Parable: Why is ERoEI Important?, The Oil Drum, 2006.

The concept of “energy yield ratio” is also very similar, but tends to be used in slightly different ways. See the Azimuth Project article for more.

Details

The definition of EROEI for a process of “extracting energy” is the useful acquired energy divided by the useful energy expended. The “useful” tag denotes energy which is usable by human beings now. For example: a supernova wastes a lot of energy in the process of making uranium and blasting it out into space. But that was done long before we came along, so it makes no sense to include it in the EROEI inputs.

In practice, people include inputs and outputs that aren’t strictly “energies”, but rather “substances from which energy can be extracted”. For example: one could look at the EROEI of growing trees for fuel, where the wood produced is counted as an output according to the energy extractable by burning.

In general, having a high EROEI value counts as “good”. Indeed, when the EROEI drops below 1, more energy is being used in the extraction process than is being output at the end! But because it only considers energy issues (and not resource scarcity, scalability, pollution, etc.), EROEI should only be one input into our process of deciding on technologies and actions.

When it comes to computing EROEI, the hard part is deciding which inputs and outputs should be included in the ratio — particularly since this involves considering which other competing processes are genuinely viable.

Another complication is that while various forms of energy can generally be converted to each other, this will incur losses due to conversion inefficiencies. So, you can’t look at two schemes with the same useful energy inputs that produce different kinds of energy — e.g., electricity and heat — and declare the one with the higher EROEI as more suitable.

Examples

To see some of the difficulties in calculating an EROEI, let’s imagine growing a crop of grass and then fermenting it to produce a liquid fuel. The most obvious inputs and outputs are:

“Energy” outputs:

1. The liquid fuel itself. This is unarguably useful output “energy”.

2. There may be excess heat produced by the fermentation process. Whether this is useful is debatable since the energy is of high entropy and produced at plants located away from energy consumers.

3. The remaining biomass may be suitable for burning. Again the usefulness is debatable, since the biomass may be better used for fertilising the fields used to grow the crop. Even if this isn’t the case, the biomass may require yet more energy to collect into a dry, burnable state.

“Energy” inputs:

1. Sunlight. Except in exceptional circumstances, there is no other use for sunlight falling on fields, so this does not count as a useful input.

2. Artificial fertilizer. This requires energy to produce and could be used for growing food or other crops, so it definitely counts as a useful energy input.

3. Energy used by motorized vehicles, both during farming and transportation to the biomass plant. For the same reasons as fertilizer, this counts as a useful energy input.

4. Mechanical energy used to extract liquid fuel after fermentation and clear waste products from the apparatus. Again a useful energy input.

Thus one computation of EROEI would count output 1 and inputs 2, 3 and 4.

However, suppose that the grass crop is genuinely being grown for other reasons — e.g., as part of a crop rotation scheme — and the plant is sufficiently small that the excess heat can be used fully by the plant for staff heating. Then you could argue that the EROEI should count outputs 1 and 2 and count inputs 3 and 4. So, to determine the EROEI you need to decide which alternative uses are genuinely viable.
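If you like seeing this concretely, here is a minimal sketch of the two computations just described. All the numbers are invented for illustration; only the structure of the calculation matters.

    # Hypothetical energy flows for the grass-to-fuel example, in arbitrary
    # energy units per year. All numbers are made up for illustration.
    liquid_fuel = 100.0  # output 1: energy content of the liquid fuel
    excess_heat = 15.0   # output 2: heat from fermentation
    fertilizer  = 30.0   # input 2: energy embodied in artificial fertilizer
    vehicles    = 20.0   # input 3: energy for farming and transport
    extraction  = 10.0   # input 4: mechanical energy for extraction and cleanup

    # First boundary choice: count output 1 and inputs 2, 3 and 4.
    eroei_narrow = liquid_fuel / (fertilizer + vehicles + extraction)

    # Second boundary choice: the crop is grown anyway and the excess heat
    # is fully used, so count outputs 1 and 2 and inputs 3 and 4.
    eroei_broad = (liquid_fuel + excess_heat) / (vehicles + extraction)

    print(eroei_narrow)  # 100/60 ≈ 1.7
    print(eroei_broad)   # 115/30 ≈ 3.8

Same process, same physics, but the choice of system boundary moves the EROEI by more than a factor of two. That is one reason published EROEI figures for the same technology can differ so much.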

Note also that this EROEI calculation is purely about energy! It does not reflect issues such as whether the land usage is sustainable, possible soil depletion/erosion, scarcity of mineral inputs for artificial fertilizer, etc.

Comparison

Okay, but enough of these nuances and caveats. Important as they are, I know what you really want: a list of different forms of energy and their EROEI’s!

Natural gas: 10:1
Coal: 50:1
Oil (Ghawar supergiant field): 100:1
Oil (global average): 19:1
Tar sands: 5.2:1 to 5.8:1
Oil shale: 1.5:1 to 4:1

Wind: 18:1
Hydro: 11:1 to 267:1
Waves: 15:1
Tides: ~ 6:1
Geothermal power: 2:1 to 13:1
Solar photovoltaic power: 3.75:1 to 10:1
Solar thermal: 1.6:1

Nuclear power: 1.1:1 to 15:1

Biodiesel: 1.9:1 to 9:1
Ethanol: 0.5:1 to 8:1

This list comes from:

• Richard Heinberg, Searching for a Miracle: ‘Net Energy’ Limits & the Fate of Industrial Society.

You can read this report for more details on how he computed these numbers. If you’re like me, you’ll take a perverse interest in forms of energy production with the lowest EROEIs. For example, what idiot would make ethanol in a way that yields only half as much useful energy as it takes to make the stuff?

The US government, that’s who: the powerful corn lobby has been getting subsidies for some highly inefficient forms of biofuel! But things vary a lot from place to place: corn grows better in the heart of the corn belt (like Iowa) than near the edges (like Texas). So, the production of a bushel of corn in Iowa costs 43 megajoules of energy on average, while in Texas it costs 71 megajoules.

Similarly, ethanol from sugar cane in Brazil has an EROEI of 8:1 to 10:1, but when made from Louisiana sugar cane in the United States, the EROEI is closer to 1:1.

“Solar thermal” also comes out looking bad in the table above, with an EROEI of just 1.6:1. But what’s “solar thermal”? Heinberg has a section on “active” or “concentrating” solar thermal power, where you focus sunlight to heat a liquid to drive a turbine. He also has one on “passive” solar, where you heat your house, or water, by sun falling on it. But he doesn’t give EROEI’s in either of these sections — unlike the sections on other forms of energy. So I can’t see where this figure of 1.6 is coming from.

Anyway, there’s a lot to think about here. Each one of the numbers listed above could serve as the starting-point for a fascinating discussion! Let’s start…


Information Geometry (Part 3)

25 October, 2010

So far in this series of posts I’ve been explaining a paper by Gavin Crooks. Now I want to go ahead and explain a little research of my own.

I’m not claiming my results are new — indeed I have no idea whether they are, and I’d like to hear from any experts who might know. I’m just claiming that this is some work I did last weekend.

People sometimes worry that if they explain their ideas before publishing them, someone will ‘steal’ them. But I think this overestimates the value of ideas, at least in esoteric fields like mathematical physics. The problem is not people stealing your ideas: the hard part is giving them away. And let’s face it, people in love with math and physics will do research unless you actively stop them. I’m reminded of this scene from the Marx Brothers movie where Harpo and Chico, playing wandering musicians, walk into a hotel and offer to play:

Groucho: What do you fellows get an hour?

Chico: Oh, for playing we getta ten dollars an hour.

Groucho: I see…What do you get for not playing?

Chico: Twelve dollars an hour.

Groucho: Well, clip me off a piece of that.

Chico: Now, for rehearsing we make special rate. Thatsa fifteen dollars an hour.

Groucho: That’s for rehearsing?

Chico: Thatsa for rehearsing.

Groucho: And what do you get for not rehearsing?

Chico: You couldn’t afford it.

So, I’m just rehearsing in public here — but of course I hope to write a paper about this stuff someday, once I get enough material.

Remember where we were. We had considered a manifold — let’s finally give it a name, say M — that parametrizes Gibbs states of some physical system. By Gibbs state, I mean a state that maximizes entropy subject to constraints on the expected values of some observables. And we had seen that in favorable cases, we get a Riemannian metric on M! It looks like this:

g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle)  (X_j - \langle X_j \rangle)  \rangle

where X_i are our observables, and the angle bracket means ‘expected value’.

All this applies to both classical and quantum mechanics. Crooks wrote down a beautiful formula for this metric in the classical case. But since I’m at the Centre for Quantum Technologies, not the Centre for Classical Technologies, I redid his calculation in the quantum case. The big difference is that in quantum mechanics, observables don’t commute! But in the calculations I did, that didn’t seem to matter much — mainly because I took a lot of traces, which imposes a kind of commutativity:

\mathrm{tr}(AB) = \mathrm{tr}(BA)

In fact, if I’d wanted to show off, I could have done the classical and quantum cases simultaneously by replacing all operators by elements of any von Neumann algebra equipped with a trace. Don’t worry about this much: it’s just a general formalism for treating classical and quantum mechanics on an equal footing. One example is the algebra of bounded operators on a Hilbert space, with the usual concept of trace. Then we’re doing quantum mechanics as usual. But another example is the algebra of suitably nice functions on a suitably nice space, where taking the trace of a function means integrating it. And then we’re doing classical mechanics!

For example, I showed you how to derive a beautiful formula for the metric I wrote down a minute ago:

g_{ij} = \mathrm{Re} \, \mathrm{tr}(\rho \; \frac{\partial \mathrm{ln} \rho }{\partial \lambda^i} \; \frac{\partial \mathrm{ln} \rho }{\partial \lambda^j} )

But if we want to do the classical version, we can say Hey, presto! and write it down like this:

g_{ij} = \int_\Omega p(\omega) \; \frac{\partial \mathrm{ln} p(\omega) }{\partial \lambda^i} \; \frac{\partial \mathrm{ln} p(\omega) }{\partial \lambda^j} \; d \omega

What did I do just now? I changed the trace to an integral over some space \Omega. I rewrote \rho as p to make you think ‘probability distribution’. And I don’t need to take the real part anymore, since everything is already real when we’re doing classical mechanics. Now this metric is the Fisher information metric that statisticians know and love!
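If you want to see this classical formula in action, here is a small numerical sketch of mine (not from Crooks’ paper): for the family of Gaussians with mean \mu and standard deviation \sigma, the Fisher information metric is known in closed form (g_{\mu\mu} = 1/\sigma^2, g_{\sigma\sigma} = 2/\sigma^2, g_{\mu\sigma} = 0), and we can check the integral formula against it.

    import numpy as np

    # Fisher information metric for the Gaussian family p(x | mu, sigma),
    # by numerically integrating g_ij = \int p (d_i ln p)(d_j ln p) dx
    # and comparing with the closed form diag(1/sigma^2, 2/sigma^2).

    def log_p(x, mu, sigma):
        return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

    def fisher_metric(mu, sigma, eps=1e-5):
        x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
        dx = x[1] - x[0]
        p = np.exp(log_p(x, mu, sigma))
        # derivatives of ln p with respect to mu and sigma, by central differences
        d_mu    = (log_p(x, mu + eps, sigma) - log_p(x, mu - eps, sigma)) / (2 * eps)
        d_sigma = (log_p(x, mu, sigma + eps) - log_p(x, mu, sigma - eps)) / (2 * eps)
        grads = [d_mu, d_sigma]
        return np.array([[np.sum(p * gi * gj) * dx for gj in grads] for gi in grads])

    print(fisher_metric(0.0, 2.0))
    # ≈ [[0.25, 0.00],     that is, 1/sigma^2
    #    [0.00, 0.50]]     and 2/sigma^2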

In what follows, I’ll keep talking about the quantum case, but in the back of my mind I’ll be using von Neumann algebras, so everything will apply to the classical case too.

So what am I going to do? I’m going to fix a big problem with the story I’ve told so far.

Here’s the problem: so far we’ve only studied a special case of the Fisher information metric. We’ve been assuming our states are Gibbs states, parametrized by the expectation values of some observables X_1, \dots, X_n. Our manifold M was really just some open subset of \mathbb{R}^n: a point in here was a list of expectation values.

But people like to work a lot more generally. We could look at any smooth function \rho from a smooth manifold M to the set of density matrices for some quantum system. We can still write down the metric

g_{ij} = \mathrm{Re} \, \mathrm{tr}(\rho \; \frac{\partial \mathrm{ln} \rho }{\partial \lambda^i} \; \frac{\partial \mathrm{ln} \rho }{\partial \lambda^j} )

in this more general situation. Nobody can stop us! But it would be better if we could derive this formula, as before, starting from a formula like the one we had before:

g_{ij} = \mathrm{Re} \langle \, (X_i - \langle X_i \rangle) \, (X_j - \langle X_j \rangle)  \, \rangle

The challenge is that now we don’t have observables X_i to start with. All we have is a smooth function \rho from some manifold to some set of states. How can we pull observables out of thin air?

Well, you may remember that last time we had

\rho = \frac{1}{Z} e^{-\lambda^i X_i}

where \lambda^i were some functions on our manifold and

Z = \mathrm{tr}(e^{-\lambda^i X_i})

was the partition function. Let’s copy this idea.

So, we’ll start with our density matrix \rho, but then write it as

\rho = \frac{1}{Z} e^{-A}

where A is some self-adjoint operator and

Z = \mathrm{tr} (e^{-A})

(Note that A, like \rho, is really an operator-valued function on M. So, I should write something like A(x) to denote its value at a particular point x \in M, but I won’t usually do that. As usual, I expect some intelligence on your part!)

Now we can repeat some calculations I did last time. As before, let’s take the logarithm of \rho:

\mathrm{ln} \, \rho = -A - \mathrm{ln}\,  Z

and then differentiate it. Suppose \lambda^i are local coordinates near some point of M. Then

\frac{\partial}{\partial \lambda^i} \mathrm{ln}\, \rho = - \frac{\partial}{\partial \lambda^i} A - \frac{1}{Z} \frac{\partial }{\partial \lambda^i} Z

Last time we had nice formulas for both terms on the right-hand side above. To get similar formulas now, let’s define operators

X_i = \frac{\partial}{\partial \lambda^i} A

This gives a nice name to the first term on the right-hand side above. What about the second term? We can calculate it out:

\frac{1}{Z} \frac{\partial }{\partial \lambda^i} Z = \frac{1}{Z}  \frac{\partial }{\partial \lambda^i} \mathrm{tr}(e^{-A}) = \frac{1}{Z}  \mathrm{tr}(\frac{\partial }{\partial \lambda^i} e^{-A}) = - \frac{1}{Z}  \mathrm{tr}(e^{-A} \frac{\partial}{\partial \lambda^i} A)

where in the last step we use the chain rule, together with the cyclic property of the trace (that’s what lets us differentiate e^{-A} under the trace as if A commuted with its derivative). Next, use the definition of \rho and X_i, and get:

\frac{1}{Z} \frac{\partial }{\partial \lambda^i} Z = - \mathrm{tr}(\rho X_i) = - \langle X_i \rangle

This is just what we got last time! Ain’t it fun to calculate when it all works out so nicely?

So, putting both terms together, we see

\frac{\partial}{\partial \lambda^i} \mathrm{ln} \rho = - X_i + \langle X_i \rangle

or better:

X_i - \langle X_i \rangle = -\frac{\partial}{\partial \lambda^i} \mathrm{ln} \rho

This is a nice formula for the ‘fluctuation’ of the observables X_i, meaning how much they differ from their expected values. And it looks exactly like the formula we had last time! The difference is that last time we started out assuming we had a bunch of observables, X_i, and defined \rho to be the state maximizing the entropy subject to constraints on the expectation values of all these observables. Now we’re starting with \rho and working backwards.

From here on out, it’s easy. As before, we can define g_{ij} to be the real part of the covariance matrix:

g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle)  (X_j - \langle X_j \rangle)  \rangle

Using the formula

X_i - \langle X_i \rangle = -\frac{\partial}{\partial \lambda^i} \mathrm{ln} \rho

we get

g_{ij} = \mathrm{Re} \langle \frac{\partial \mathrm{ln} \rho}{\partial \lambda^i} \; \frac{\partial \mathrm{ln} \rho}{\partial \lambda^j} \rangle

or

g_{ij} = \mathrm{Re}\,\mathrm{tr}(\rho \; \frac{\partial \mathrm{ln} \rho}{\partial \lambda^i} \; \frac{\partial \mathrm{ln} \rho}{\partial \lambda^j})

Voilà!

When this matrix is positive definite at every point, we get a Riemannian metric on M. Last time I said this is what people call the ‘Bures metric’ — though frankly, now that I examine the formulas, I’m not so sure. But in the classical case, it’s called the Fisher information metric.
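If you’d like to see this formula computing something, here is a little numerical sketch of my own (a toy check, not anything from the text above): take a qubit Gibbs state \rho = e^{-A}/Z with A = \lambda^1 \sigma_x + \lambda^2 \sigma_z, and evaluate the metric by finite differences.

    import numpy as np
    from scipy.linalg import expm, logm

    # Pauli matrices
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)

    def rho(lam):
        """Gibbs state rho = exp(-A)/Z with A = lam[0]*sx + lam[1]*sz."""
        M = expm(-(lam[0] * sx + lam[1] * sz))
        return M / np.trace(M)

    def metric(lam, eps=1e-5):
        """g_ij = Re tr(rho d_i(ln rho) d_j(ln rho)), via central differences."""
        r = rho(lam)
        dlog = []
        for i in range(2):
            e = np.zeros(2)
            e[i] = eps
            dlog.append((logm(rho(lam + e)) - logm(rho(lam - e))) / (2 * eps))
        return np.array([[np.trace(r @ dlog[i] @ dlog[j]).real
                          for j in range(2)] for i in range(2)])

    g = metric(np.array([0.3, 0.7]))
    print(g)                      # a real symmetric 2x2 matrix
    print(np.linalg.eigvalsh(g))  # positive eigenvalues: a genuine metric

Note the output really is symmetric: taking the real part of the trace is what makes g_{ij} = g_{ji}, even though the two derivatives of \mathrm{ln}\, \rho don’t commute.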

Differential geometers like to use \partial_i as a shorthand for \frac{\partial}{\partial \lambda^i}, so they’d write down our metric in a prettier way:

g_{ij} = \mathrm{Re} \, \mathrm{tr}(\rho \; \partial_i (\mathrm{ln} \, \rho) \; \partial_j (\mathrm{ln} \, \rho) )

Differential geometers like coordinate-free formulas, so let’s also give a coordinate-free formula for our metric. Suppose x \in M is a point in our manifold, and suppose v,w are tangent vectors to this point. Then

g(v,w) = \mathrm{Re} \, \langle v(\mathrm{ln}\, \rho) \; w(\mathrm{ln} \,\rho) \rangle \; = \; \mathrm{Re} \,\mathrm{tr}(\rho \; v(\mathrm{ln}\, \rho) \; w(\mathrm{ln}\, \rho))

Here \mathrm{ln}\, \rho is a smooth operator-valued function on M, and v(\mathrm{ln}\, \rho) means the derivative of this function in the v direction at the point x.

So, this is all very nice. To conclude, two more points: a technical one, and a more important philosophical one.

First, the technical point. When I said \rho could be any smooth function from a smooth manifold to some set of states, I was actually lying. That’s an important pedagogical technique: the brazen lie.

We can’t really take the logarithm of every density matrix. Remember, we take the log of a density matrix by taking the log of all its eigenvalues. These eigenvalues are ≥ 0, but if one of them is zero, we’re in trouble! The logarithm of zero is undefined.

On the other hand, there’s no problem taking the logarithm of our density-matrix-valued function \rho when it’s positive definite at each point of M. You see, a density matrix is positive definite iff its eigenvalues are all > 0. In this case it has a unique self-adjoint logarithm.

So, we must assume \rho is positive definite. But what’s the physical significance of this ‘positive definiteness’ condition? Well, any density matrix can be diagonalized using some orthonormal basis. It can then be seen as a probabilistic mixture — not a quantum superposition! — of pure states taken from this basis. Its eigenvalues are the probabilities of finding the mixed state to be in one of these pure states. So, saying that its eigenvalues are all > 0 amounts to saying that all the pure states in this orthonormal basis show up with nonzero probability! Intuitively, this means our mixed state is ‘really mixed’. For example, it can’t be a pure state. In math jargon, it means our mixed state is in the interior of the convex set of mixed states.

Second, the philosophical point. Instead of starting with the density matrix \rho, I took A as fundamental. But different choices of A give the same \rho. After all,

\rho = \frac{1}{Z} e^{-A}

where we cleverly divide by the normalization factor

Z = \mathrm{tr} (e^{-A})

to get \mathrm{tr} \, \rho = 1. So, if we multiply e^{-A} by any positive constant, or indeed any positive function on our manifold M, \rho will remain unchanged!

So we have added a little extra information when switching from \rho to A. You can think of this as ‘gauge freedom’, because I’m saying we can do any transformation like

A \mapsto A + f

where

f: M \to \mathbb{R}

is a smooth function. This doesn’t change \rho, so arguably it doesn’t change the ‘physics’ of what I’m doing. It does change Z. It also changes the observables

X_i = \frac{\partial}{\partial \lambda^i} A

But it doesn’t change their ‘fluctuations’

X_i - \langle X_i \rangle

so it doesn’t change the metric g_{ij}.
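Here’s the one-line check, in case you want it spelled out. Since f is a real-valued function on M, at each point \partial_i f is just a number (times the identity operator), so its expected value is itself. Thus

A \mapsto A + f \quad \Rightarrow \quad X_i \mapsto X_i + \partial_i f, \qquad \langle X_i \rangle \mapsto \langle X_i \rangle + \partial_i f

and the two shifts cancel in the fluctuation X_i - \langle X_i \rangle, leaving g_{ij} untouched.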

This gauge freedom is interesting, and I want to understand it better. It’s related to something very simple yet mysterious. In statistical mechanics the partition function Z begins life as ‘just a normalizing factor’. If you change the physics so that Z gets multiplied by some number, the Gibbs state doesn’t change. But then the partition function takes on an incredibly significant role as something whose logarithm you differentiate to get lots of physically interesting information! So in some sense the partition function doesn’t matter much… but changes in the partition function matter a lot.

This is just like the split personality of phases in quantum mechanics. On the one hand they ‘don’t matter’: you can multiply a unit vector by any phase and the pure state it defines doesn’t change. But on the other hand, changes in phase matter a lot.

Indeed the analogy here is quite deep: it’s the analogy between probabilities in statistical mechanics and amplitudes in quantum mechanics, the analogy between \mathrm{exp}(-\beta H) in statistical mechanics and \mathrm{exp}(-i t H / \hbar) in quantum mechanics, and so on. This is part of a bigger story about ‘rigs’ which I told back in the Winter 2007 quantum gravity seminar, especially in week13. So, it’s fun to see it showing up yet again… even though I don’t completely understand it here.

[Note: in the original version of this post, I omitted the real part in my definition g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle)  (X_j - \langle X_j \rangle)  \rangle , giving a ‘Riemannian metric’ that was neither real nor symmetric in the quantum case. Most of the comments below are based on that original version, not the new fixed one.]


Information Geometry (Part 2)

23 October, 2010

Last time I provided some background to this paper:

• Gavin E. Crooks, Measuring thermodynamic length.

Now I’ll tell you a bit about what it actually says!

Remember the story so far: we’ve got a physical system that’s in a state of maximum entropy. I didn’t emphasize this yet, but that happens whenever our system is in thermodynamic equilibrium. An example would be a box of gas inside a piston. Suppose you choose any number for the energy of the gas and any number for its volume. Then there’s a unique state of the gas that maximizes its entropy, given the constraint that on average, its energy and volume have the values you’ve chosen. And this describes what the gas will be like in equilibrium!

Remember, by ‘state’ I mean mixed state: it’s a probabilistic description. And I say the energy and volume have the chosen values only on average, because there will be random fluctuations. Indeed, if you look carefully at the head of the piston, you’ll see it quivering: the volume of the gas only equals the volume you’ve specified on average. Same for the energy.

More generally: imagine picking any list of numbers, and finding the maximum entropy state where some chosen observables have these numbers as their average values. Then there will be fluctuations in the values of these observables — thermal fluctuations, but also possibly quantum fluctuations. So, you’ll get a probability distribution on the space of possible values of your chosen observables. You should visualize this probability distribution as a little fuzzy cloud centered at the average value!

To a first approximation, this cloud will be shaped like a little ellipsoid. And if you can pick the average value of your observables to be whatever you like, you’ll get lots of little ellipsoids this way, one centered at each point. And the cool idea is to imagine the space of possible values of your observables as having a weirdly warped geometry, such that relative to this geometry, these ellipsoids are actually spheres.

This weirdly warped geometry is an example of an ‘information geometry’: a geometry that’s defined using the concept of information. This shouldn’t be surprising: after all, we’re talking about maximum entropy, and entropy is related to information. But I want to gradually make this idea more precise.

Bring on the math!

We’ve got a bunch of observables X_1, \dots , X_n, and we’re assuming that for any list of numbers x_1, \dots , x_n, the system has a unique maximal-entropy state \rho for which the expected value of the observable X_i is x_i:

\langle X_i \rangle = x_i

This state \rho is called the Gibbs state and I told you how to find it when it exists. In fact it may not exist for every list of numbers x_1, \dots , x_n, but we’ll be perfectly happy if it does for all choices of

x = (x_1, \dots, x_n)

lying in some open subset of \mathbb{R}^n

By the way, I should really call this Gibbs state \rho(x) or something to indicate how it depends on x, but I won’t usually do that. I expect some intelligence on your part!

Now at each point x we can define a covariance matrix

\langle (X_i - \langle X_i \rangle)  (X_j - \langle X_j \rangle)  \rangle

If we take its real part, we get a symmetric matrix:

g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle)  (X_j - \langle X_j \rangle)  \rangle

It’s also nonnegative — that’s easy to see, since the variance of a probability distribution can’t be negative. When we’re lucky this matrix will be positive definite. When we’re even luckier, it will depend smoothly on x. In this case, g_{ij} will define a Riemannian metric on our open set.

So far this is all review of last time. Sorry: I seem to have reached the age where I can’t say anything interesting without warming up for about 15 minutes first. It’s like when my mom tells me about an exciting event that happened to her: she starts by saying “Well, I woke up, and it was cloudy out…”

But now I want to give you an explicit formula for the metric g_{ij}, and then rewrite it in a way that’ll work even when the state \rho is not a maximal-entropy state. And this formula will then be the general definition of the ‘Fisher information metric’ (if we’re doing classical mechanics), or a quantum version thereof (if we’re doing quantum mechanics).

Crooks does the classical case — so let’s do the quantum case, okay? Last time I claimed that in the quantum case, our maximum-entropy state is the Gibbs state

\rho = \frac{1}{Z} e^{-\lambda^i X_i}

where \lambda^i are the ‘conjugate variables’ of the observables X_i, we’re using the Einstein summation convention to sum over repeated indices that show up once upstairs and once downstairs, and Z is the partition function

Z = \mathrm{tr} (e^{-\lambda^i X_i})

(To be honest: last time I wrote the indices on the conjugate variables \lambda^i as subscripts rather than superscripts, because I didn’t want some poor schlep out there to think that \lambda^1, \dots , \lambda^n were the powers of some number \lambda. But now I’m assuming you’re all grown up and ready to juggle indices! We’re doing Riemannian geometry, after all.)

Also last time I claimed that it’s tremendously fun and enlightening to take the derivative of the logarithm of Z. The reason is that it gives you the mean values of your observables:

\langle X_i \rangle = - \frac{\partial}{\partial \lambda^i} \ln Z

But now let’s take the derivative of the logarithm of \rho. Remember, \rho is an operator — in fact a density matrix. But we can take its logarithm as explained last time, and the usual rules apply, so starting from

\rho = \frac{1}{Z} e^{-\lambda^i X_i}

we get

\mathrm{ln}\, \rho = - \lambda^i X_i - \mathrm{ln} \,Z

Next, let’s differentiate both sides with respect to \lambda^i. Why? Well, from what I just said, you should be itching to differentiate \mathrm{ln}\, Z. So let’s give in to that temptation:

\frac{\partial  }{\partial \lambda^i} \mathrm{ln}  \rho = -X_i + \langle X_i \rangle

Hey! Now we’ve got a formula for the ‘fluctuation’ of the observable X_i — that is, how much it differs from its mean value:

X_i - \langle X_i \rangle = - \frac{\partial \mathrm{ln} \rho }{\partial \lambda^i}

This is incredibly cool! I should have learned this formula decades ago, but somehow I just bumped into it now. I knew of course that \mathrm{ln} \, \rho shows up in the formula for the entropy:

S(\rho) = -\mathrm{tr} ( \rho \; \mathrm{ln} \, \rho)

But I never had the brains to think about \mathrm{ln}\, \rho all by itself. So I’m really excited to discover that it’s an interesting entity in its own right — and fun to differentiate, just like \mathrm{ln}\, Z.

Now we get our cool formula for g_{ij}. Remember, it’s defined by

g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle)  (X_j - \langle X_j \rangle)  \rangle

But now that we know

X_i - \langle X_i \rangle = -\frac{\partial \mathrm{ln} \rho }{\partial \lambda^i}

we get the formula we were looking for:

g_{ij} = \mathrm{Re} \langle \frac{\partial \mathrm{ln} \rho }{\partial \lambda^i} \; \frac{\partial \mathrm{ln} \rho }{\partial \lambda^j}  \rangle

Beautiful, eh? And of course the expected value of any observable A in the state \rho is

\langle A \rangle = \mathrm{tr}(\rho A)

so we can also write the covariance matrix like this:

g_{ij} = \mathrm{Re}\, \mathrm{tr}(\rho \; \frac{\partial \mathrm{ln} \rho }{\partial \lambda^i} \; \frac{\partial \mathrm{ln} \rho }{\partial \lambda^j} )

Lo and behold! This formula makes sense whenever \rho is any density matrix depending smoothly on some parameters \lambda^i. We don’t need it to be a Gibbs state! So, we can work more generally.

Indeed, whenever we have any smooth function from a manifold to the space of density matrices for some Hilbert space, we can define g_{ij} by the above formula! And when it’s positive definite, we get a Riemannian metric on our manifold: the Bures information metric.

The classical analogue is the somewhat better-known ‘Fisher information metric’. When we go from quantum to classical, operators become functions and traces become integrals. There’s nothing complex anymore, so taking the real part becomes unnecessary. So the Fisher information metric looks like this:

g_{ij} = \int_\Omega p(\omega) \; \frac{\partial \mathrm{ln} p(\omega) }{\partial \lambda^i} \; \frac{\partial \mathrm{ln} p(\omega) }{\partial \lambda^j} \; d \omega

Here I’m assuming we’ve got a smooth function p from some manifold M to the space of probability distributions on some measure space (\Omega, d\omega). Working in local coordinates \lambda^i on our manifold M, the above formula defines a Riemannian metric on M, at least when g_{ij} is positive definite. And that’s the Fisher information metric!

Crooks says more: he describes an experiment that would let you measure the length of a path with respect to the Fisher information metric — at least in the case where the state \rho(x) is the Gibbs state with \langle X_i \rangle = x_i. And that explains why he calls it ‘thermodynamic length’.

There’s a lot more to say about this, and also about another question: What use is the Fisher information metric in the general case where the states \rho(x) aren’t Gibbs states?

But it’s dinnertime, so I’ll stop here.

[Note: in the original version of this post, I omitted the real part in my definition g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle)  (X_j - \langle X_j \rangle)  \rangle , giving a ‘Riemannian metric’ that was neither real nor symmetric in the quantum case. Most of the comments below are based on that original version, not the new fixed one.]


Information Geometry (Part 1)

22 October, 2010

I’d like to provide a bit of background to this interesting paper:

• Gavin E. Crooks, Measuring thermodynamic length.

which was pointed out by John F in our discussion of entropy and uncertainty.

The idea here should work for either classical or quantum statistical mechanics. The paper describes the classical version, so just for a change of pace let me describe the quantum version.

First a lightning review of quantum statistical mechanics. Suppose you have a quantum system with some Hilbert space. When you know as much as possible about your system, then you describe it by a unit vector in this Hilbert space, and you say your system is in a pure state. Sometimes people just call a pure state a ‘state’. But that can be confusing, because in statistical mechanics you also need more general ‘mixed states’ where you don’t know as much as possible. A mixed state is described by a density matrix, meaning a positive operator \rho with trace equal to 1:

\mathrm{tr}(\rho) = 1

The idea is that any observable is described by a self-adjoint operator A, and the expected value of this observable in the mixed state \rho is

\langle A \rangle = \mathrm{tr}(\rho A)

The entropy of a mixed state is defined by

S(\rho) = -\mathrm{tr}(\rho \; \mathrm{ln} \, \rho)

where we take the logarithm of the density matrix just by taking the log of each of its eigenvalues, while keeping the same eigenvectors. This formula for entropy should remind you of the one that Gibbs and Shannon used — the one I explained a while back.

Back then I told you about the ‘Gibbs ensemble’: the mixed state that maximizes entropy subject to the constraint that some observable have a given value. We can do the same thing in quantum mechanics, and we can even do it for a bunch of observables at once. Suppose we have some observables X_1, \dots, X_n and we want to find the mixed state \rho that maximizes entropy subject to these constraints:

\langle X_i \rangle = x_i

for some numbers x_i. Then a little exercise in Lagrange multipliers shows that the answer is the Gibbs state:

\rho = \frac{1}{Z} \mathrm{exp}(-\lambda_1 X_1 - \cdots - \lambda_n X_n)

Huh?

This answer needs some explanation. First of all, the numbers \lambda_1, \dots, \lambda_n are called Lagrange multipliers. You have to choose them right to get

\langle X_i \rangle = x_i

So, in favorable cases, they will be functions of the numbers x_i. And when you’re really lucky, you can solve for the numbers x_i in terms of the numbers \lambda_i. We call \lambda_i the conjugate variable of the observable X_i. For example, the conjugate variable of energy is inverse temperature!

Second of all, we take the exponential of a self-adjoint operator just as we took the logarithm of a density matrix: just take the exponential of each eigenvalue.

(At least this works when our self-adjoint operator has only eigenvalues in its spectrum, not any continuous spectrum. Otherwise we need to get serious and use the functional calculus. Luckily, if your system’s Hilbert space is finite-dimensional, you can ignore this parenthetical remark!)
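In finite dimensions this recipe is easy to implement. Here’s a tiny sketch of mine (not from the post) of applying a function to a self-adjoint matrix by acting on its eigenvalues:

    import numpy as np

    def apply_to_eigenvalues(f, A):
        """Apply f to a self-adjoint matrix A by acting on its eigenvalues."""
        vals, vecs = np.linalg.eigh(A)          # A = V diag(vals) V^dagger
        return vecs @ np.diag(f(vals)) @ vecs.conj().T

    A = np.array([[1.0, 0.5], [0.5, 2.0]])      # a self-adjoint matrix
    expA = apply_to_eigenvalues(np.exp, A)      # matrix exponential
    logA = apply_to_eigenvalues(np.log, expA)   # ...and back again
    print(np.allclose(logA, A))                 # True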

But third: what’s that number Z? It begins life as a humble normalizing factor. Its job is to make sure \rho has trace equal to 1:

Z = \mathrm{tr}(\mathrm{exp}(-\lambda_1 X_1 - \cdots - \lambda_n X_n))

However, once you get going, it becomes incredibly important! It’s called the partition function of your system.

As an example of what it’s good for, it turns out you can compute the numbers x_i as follows:

x_i = - \frac{\partial}{\partial \lambda_i} \mathrm{ln} Z

In other words, you can compute the expected values of the observables X_i by differentiating the log of the partition function:

\langle X_i \rangle = - \frac{\partial}{\partial \lambda_i} \mathrm{ln} Z

Or in still other words,

\langle X_i \rangle = - \frac{1}{Z} \; \frac{\partial Z}{\partial \lambda_i}

To believe this you just have to take the equations I’ve given you so far and mess around — there’s really no substitute for doing it yourself. I’ve done it fifty times, and every time I feel smarter.
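If you’d like a hint for the messing around, here’s the skeleton of the computation (the cyclic property of the trace is what lets you differentiate under the trace as if the X_i commuted):

\frac{\partial}{\partial \lambda_i} \mathrm{ln}\, Z = \frac{1}{Z} \, \mathrm{tr}\left(\frac{\partial}{\partial \lambda_i} \mathrm{exp}(-\lambda_1 X_1 - \cdots - \lambda_n X_n)\right) = -\frac{1}{Z} \, \mathrm{tr}(X_i \, \mathrm{exp}(-\lambda_1 X_1 - \cdots - \lambda_n X_n)) = -\mathrm{tr}(\rho X_i) = -\langle X_i \rangle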

But we can go further: after the ‘expected value’ or ‘mean’ of an observable comes its variance, which is the square of its standard deviation:

(\Delta A)^2 = \langle A^2 \rangle - \langle A \rangle^2

This measures the size of fluctuations around the mean. And in the Gibbs state, we can compute the variance of the observable X_i as the second derivative of the log of the partition function:

\langle X_i^2 \rangle - \langle X_i \rangle^2 =  \frac{\partial^2}{\partial \lambda_i^2} \mathrm{ln} Z

Again: calculate and see.

But when we’ve got lots of observables, there’s something better than the variance of each one. There’s the covariance matrix of the whole lot of them! Each observable X_i fluctuates around its mean value x_i… but these fluctuations are not independent! They’re correlated, and the covariance matrix says how.

All this is very visual, at least for me. If you imagine the fluctuations as forming a blurry patch near the point (x_1, \dots, x_n), this patch will be ellipsoidal in shape, at least when all our random fluctuations are Gaussian. And then the shape of this ellipsoid is precisely captured by the covariance matrix! In particular, the eigenvectors of the covariance matrix will point along the principal axes of this ellipsoid, and the eigenvalues will say how stretched out the ellipsoid is in each direction!

To understand the covariance matrix, it may help to start by rewriting the variance of a single observable A as

(\Delta A)^2 = \langle (A - \langle A \rangle)^2 \rangle

That’s a lot of angle brackets, but the meaning should be clear. First we look at the difference between our observable and its mean value, namely

A - \langle A \rangle

Then we square this, to get something that’s big and positive whenever our observable is far from its mean. Then we take the mean value of that, to get an idea of how far our observable is from the mean on average.

We can use the same trick to define the covariance of a bunch of observables X_i. We get an n \times n matrix called the covariance matrix, whose entry in the ith row and jth column is

\langle (X_i - \langle X_i \rangle)  (X_j - \langle X_j \rangle)  \rangle

If you think about it, you can see that this will measure correlations in the fluctuations of your observables.

An interesting difference between classical and quantum mechanics shows up here. In classical mechanics the covariance matrix is always symmetric — but not in quantum mechanics! You see, in classical mechanics, whenever we have two observables A and B, we have

\langle A B \rangle = \langle B A \rangle

since observables commute. But in quantum mechanics this is not true! For example, consider the position q and momentum p of a particle. In units where \hbar = 1, we have

q p = p q + i

so taking expectation values we get

\langle q p \rangle = \langle p q \rangle + i

So, it’s easy to get a non-symmetric covariance matrix when our observables X_i don’t commute. However, the real part of the covariance matrix is symmetric, even in quantum mechanics. So let’s define

g_{ij} =  \mathrm{Re}  \langle (X_i - \langle X_i \rangle)  (X_j - \langle X_j \rangle)  \rangle

You can check that the matrix entries here are the second derivatives of the logarithm of the partition function:

g_{ij} = \frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \mathrm{ln} Z

And now for the cool part: this is where information geometry comes in! Suppose that for any choice of values x_i we have a Gibbs state with

\langle X_i \rangle = x_i

Then for each point

x = (x_1, \dots , x_n) \in \mathbb{R}^n

we have a matrix

g_{ij} =  \mathrm{Re}  \langle (X_i - \langle X_i \rangle)  (X_j - \langle X_j \rangle)  \rangle = \frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \mathrm{ln} Z

And this matrix is not only symmetric, it’s also positive. And when it’s positive definite we can think of it as an inner product on the tangent space of the point x. In other words, we get a Riemannian metric on \mathbb{R}^n. This is called the Fisher information metric.

I hope you can see through the jargon to the simple idea. We’ve got a space. Each point x in this space describes the maximum-entropy state of a quantum system for which our observables have specified mean values. But in each of these states, the observables are random variables. They don’t just sit at their mean value, they fluctuate! You can picture these fluctuations as forming a little smeared-out blob in our space. To a first approximation, this blob is an ellipsoid. And if we think of this ellipsoid as a ‘unit ball’, it gives us a standard for measuring the length of any little vector sticking out of our point. In other words, we’ve got a Riemannian metric: the Fisher information metric!

Now if you look at the Wikipedia article you’ll see a more general but to me somewhat scarier definition of the Fisher information metric. This applies whenever we’ve got a manifold whose points label arbitrary mixed states of a system. But Crooks shows this definition reduces to his — the one I just described — when our manifold is \mathbb{R}^n and it’s parametrizing Gibbs states in the way we’ve just seen.

More precisely: both Crooks and the Wikipedia article describe the classical story, but it parallels the quantum story I’ve been telling… and I think the quantum version is well-known. I believe the quantum version of the Fisher information metric is sometimes called the Bures metric, though I’m a bit confused about what the Bures metric actually is.

[Note: in the original version of this post, I omitted the real part in my definition g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle)  (X_j - \langle X_j \rangle)  \rangle , giving a ‘Riemannian metric’ that was neither real nor symmetric in the quantum case. Most of the comments below are based on that original version, not the new fixed one.]


The Art of Math

22 October, 2010

On this blog I’m focusing on quantum technology and “how scientists can save the planet”. But I have a previous life, when I worked on n-categories and quantum gravity — and I still have three grad students doing theses on those topics: John Huerta, Chris Rogers and Christopher Walker. They’re all finishing up next spring — you should hire them!

Unfortunately, it’s a bit hard to explain their work in simple terms.

Fortunately, Sophie Hebden has written an article that does a great job!

You can comment over there (that is, on the n-Café), if you want.


Entropy and Uncertainty

19 October, 2010

I was going to write about a talk at the CQT, but I found a preprint lying on a table in the lecture hall, and it was so cool I’ll write about that instead:

• Mario Berta, Matthias Christandl, Roger Colbeck, Joseph M. Renes, Renato Renner, The uncertainty principle in the presence of quantum memory, Nature Physics, July 25, 2010.

Actually I won’t talk about the paper per se, since it’s better if I tell you a more basic result that I first learned from reading this paper: the entropic uncertainty principle!

Everyone loves the concept of entropy, and everyone loves the uncertainty principle. Even folks who don’t understand ’em still love ’em. They just sound so mysterious and spooky and dark. I love ’em too. So, it’s nice to see a mathematical relation between them.

I explained entropy back here, so let me say a word about the uncertainty principle. It’s a limitation on how accurately you can measure two things at once in quantum mechanics. Sometimes you can only know a lot about one thing if you don’t know much about the other. This happens when those two things “fail to commute”.

Mathematically, the usual uncertainty principle says this:

\Delta A \cdot \Delta B \ge \frac{1}{2} |\langle [A,B] \rangle |

In plain English: the uncertainty in A times the uncertainty in B is bigger than the absolute value of the expected value of their commutator

[A,B] = A B - B A

Whoops! That started off as plain English, but it degenerated into plain gibberish near the end… which is probably why most people don’t understand the uncertainty principle. I don’t think I’m gonna cure that today, but let me just nail down the math a bit.

Suppose A and B are observables — and to keep things really simple, by observable I’ll just mean a self-adjoint n \times n matrix. Suppose \psi is a state: that is, a unit vector in \mathbb{C}^n. Then the expected value of A in the state \psi is the average answer you get when you measure that observable in that state. Mathematically it’s equal to

\langle A \rangle = \langle \psi, A \psi \rangle

Sorry, there are a lot of angle brackets running around here: the ones at right stand for the inner product in \mathbb{C}^n, which I’m assuming you understand, while the ones at left are being defined by this equation. They’re just a shorthand.

Once we can compute averages, we can compute standard deviations, so we define the standard deviation of an observable A in the state \psi to be \Delta A where

(\Delta A)^2 = \langle A^2 \rangle - \langle A \rangle^2

Got it? Just like in probability theory. So now I hope you know what every symbol here means:

\Delta A \cdot \Delta B \ge \frac{1}{2} |\langle [A,B] \rangle |

and if you’re a certain sort of person you can have fun going home and proving this. Hint: it takes an inequality to prove an inequality. Other hint: what’s the most important inequality in the universe?

But now for the fun part: entropy!

Whenever you have an observable A and a state \psi, you get a probability distribution: the distribution of outcomes when you measure that observable in that state. And this probability distribution has an entropy! Let’s call the entropy S(A). I’ll define it a bit more carefully later.

But the point is: this entropy is really a very nice way to think about our uncertainty, or ignorance, of the observable A. It’s better, in many ways, than the standard deviation. For example, it doesn’t change if we multiply A by 2. The standard deviation doubles, but we’re not twice as ignorant!

Entropy is invariant under lots of transformations of our observables. So we should want an uncertainty principle that only involves entropy. And here it is, the entropic uncertainty principle:

S(A) + S(B) \ge \mathrm{log} \, \frac{1}{c}

Here c is defined as follows. To keep things simple, suppose that A is nondegenerate, meaning that all its eigenvalues are distinct. If it’s not, we can tweak it a tiny bit and it will be. Let its eigenvectors be called \phi_i. Similarly, suppose B is nondegenerate and call its eigenvectors \chi_j. Then we let

c = \mathrm{max}_{i,j} |\langle \phi_i, \chi_j \rangle|^2

Note that this becomes 1 when there’s an eigenvector of A that’s also an eigenvector of B. In this case it’s possible to find a state where we know both observables precisely, and in this case also

\mathrm{log}\, \frac{1}{c} = 0

And that makes sense: in this case S(A) + S(B), which measures our ignorance of both observables, is indeed zero.

But if there’s no eigenvector of A that’s also an eigenvector of B, then c is smaller than 1, so

\mathrm{log} \, \frac{1}{c} > 0

so the entropic uncertainty principle says we really must have some ignorance about either A or B (or both).

So the entropic uncertainty principle makes intuitive sense. But let me define the entropy S(A), to make the principle precise. If \phi_i are the eigenvectors of A, the probabilities of getting various outcomes when we measure A in the state \psi are

p_i = |\langle \phi_i, \psi \rangle|^2

So, we define the entropy by

S(A) = - \sum_i p_i \; \mathrm{log}\, p_i

Here you can use any base for your logarithm, as long as you’re consistent. Mathematicians and physicists use e, while computer scientists, who prefer integers, settle for the best known integer approximation: 2.

Just kidding! Darn — now I’ve insulted all the computer scientists. I hope none of them reads this.
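Here’s a quick numerical sanity check of the entropic uncertainty principle, a toy example of my own: take A = \sigma_z and B = \sigma_x on a qubit. Their eigenbases are mutually unbiased, so c = 1/2 and the bound says S(A) + S(B) \ge \mathrm{log}\, 2.

    import numpy as np

    def measurement_entropy(eigvecs, psi):
        """Entropy of the outcome distribution when measuring in this eigenbasis."""
        p = np.abs(eigvecs.conj().T @ psi) ** 2   # p_i = |<phi_i, psi>|^2
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    basis_z = np.eye(2, dtype=complex)                                 # eigenbasis of sigma_z
    basis_x = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # eigenbasis of sigma_x

    c = np.max(np.abs(basis_z.conj().T @ basis_x) ** 2)                # = 1/2 here

    rng = np.random.default_rng(0)
    for _ in range(5):
        v = rng.normal(size=2) + 1j * rng.normal(size=2)
        psi = v / np.linalg.norm(v)                                    # random pure state
        total = measurement_entropy(basis_z, psi) + measurement_entropy(basis_x, psi)
        print(total >= np.log(1 / c) - 1e-12)                          # always True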

Who came up with this entropic uncertainty principle? I’m not an expert on this, so I’ll probably get this wrong, but I gather it came from an idea of Deutsch:

• David Deutsch, Uncertainty in quantum measurements, Phys. Rev. Lett. 50 (1983), 631-633.

Then it got improved and formulated as a conjecture by Kraus:

• K. Kraus, Complementary observables and uncertainty relations, Phys. Rev. D 35 (1987), 3070-3075.

and then that conjecture was proved here:

• H. Maassen and J. B. Uffink, Generalized entropic uncertainty relations, Phys. Rev. Lett. 60 (1988), 1103-1106.

The paper I found in the lecture hall proves a more refined version where the system being measured — let’s call it X — is entangled with the observer’s memory apparatus — let’s call it O. In this situation they show

S(A|O) + S(B|O) \ge S(X|O) + \mathrm{log} \, \frac{1}{c}

where I’m using a concept of “conditional entropy”: the entropy of something given something else. Here’s their abstract:

The uncertainty principle, originally formulated by Heisenberg, clearly illustrates the difference between classical and quantum mechanics. The principle bounds the uncertainties about the outcomes of two incompatible measurements, such as position and momentum, on a particle. It implies that one cannot predict the outcomes for both possible choices of measurement to arbitrary precision, even if information about the preparation of the particle is available in a classical memory. However, if the particle is prepared entangled with a quantum memory, a device that might be available in the not-too-distant future, it is possible to predict the outcomes for both measurement choices precisely. Here, we extend the uncertainty principle to incorporate this case, providing a lower bound on the uncertainties, which depends on the amount of entanglement between the particle and the quantum memory. We detail the application of our result to witnessing entanglement and to quantum key distribution.

By the way, on a really trivial note…

My wisecrack about 2 being the best known integer approximation to e made me wonder: since 3 is actually closer to e, are there some applications where ternary digits would theoretically be better than binary ones? I’ve heard of "trits" but I don’t actually know any applications where they’re optimal.

Oh — here’s one.


This Week’s Finds (Week 304)

15 October, 2010

About 10,800 BC, something dramatic happened.

The last glacial period seemed to be ending quite nicely, things had warmed up a lot — but then, suddenly, the temperature in Europe dropped about 7 °C! In Greenland, it dropped about twice that much. In England it got so cold that glaciers started forming! In the Netherlands, in winter, temperatures regularly fell below -20 °C. Throughout much of Europe trees retreated, replaced by alpine landscapes and tundra. The climate was affected as far away as Syria, where drought punished the ancient settlement of Abu Hureyra. But it doesn’t seem to have been a world-wide event.

This cold spell lasted for about 1300 years. And then, just as suddenly as it began, it ended! Around 9,500 BC, the temperature in Europe bounced back.

This episode is called the Younger Dryas, after a certain wildflower that enjoys cold weather, whose pollen is common in this period.

What caused the Younger Dryas? Could it happen again? An event like this could wreak havoc, so it’s important to know. Alas, as so often in science, the answer to these questions is "we’re not sure, but…."

We’re not sure, but the most popular theory is that a huge lake in Canada, formed by melting glaciers, broke its icy banks and flooded out into the Saint Lawrence River. This lake is called Lake Agassiz. At its maximum, it held more water than all the lakes in the world now put together.

In a massive torrent lasting for years, the water from this lake rushed out to the Labrador Sea. By floating atop the denser salt water, this fresh water blocked a major current that flows in the Atlantic: the Atlantic Meridional Overturning Circulation, or AMOC. This current brings warm water north and helps keep northern Europe warm. So, northern Europe was plunged into a deep freeze!

That’s the theory, anyway.

Could something like this happen again? There are no glacial lakes waiting to burst their banks, but the concentration of fresh water in the northern Atlantic has been increasing, and ocean temperatures are changing too, so some scientists are concerned. The problem is, we don’t really know what it takes to shut down the Atlantic Meridional Overturning Circulation!

To make progress on this kind of question, we need a lot of insight, but we also need some mathematical models. And that’s what Nathan Urban will tell us about now. First we’ll talk in general about climate models, Bayesian reasoning, and Monte Carlo methods. We’ll even talk about the general problem of using simple models to study complex phenomena. And then he’ll walk us step by step through the particular model that he and a coauthor have used to study this question: will the AMOC run amok?

Sorry, I couldn’t resist that. It’s not so much "running amok" that the AMOC might do, it’s more like "fizzling out". But accuracy should never stand in the way of a good pun.

On with the show:

JB: Welcome back! Last time we were talking about the new work you’re starting at Princeton. You said you’re interested in the assessment of climate policy in the presence of uncertainties and "learning" – where new facts come along that revise our understanding of what’s going on. Could you say a bit about your methodology? Or, if you’re not far enough along on this work, maybe you could talk about the methodology of some other paper in this line of research.

NU: To continue the direction of discussion, I’ll respond by talking about the methodology of a few papers along the lines of what I hope to work on here at Princeton, rather than about my past papers on uncertainty quantification. They are Keller and McInerney on learning rates:

• Klaus Keller and David McInerney, The dynamics of learning about a climate threshold, Climate Dynamics 30 (2008), 321-332.

Keller and coauthors on learning and economic policy:

• Klaus Keller, Benjamin M. Bolker and David F. Bradford, Uncertain climate thresholds and optimal economic growth, Journal of Environmental Economics and Management 48 (2004), 723-741.

and Oppenheimer et al. on "negative" learning (what happens when science converges to the wrong answer):

• Michael Oppenheimer, Brian C. O’Neill and Mort Webster, Negative learning, Climatic Change 89 (2008), 155-172.

The general theme of this kind of work is to statistically compare a climate model to observed data in order to understand what model behavior is allowed by existing data constraints. Then, having quantified the range of possibilities, plug this uncertainty analysis into an economic-climate model (or "integrated assessment model"), and have it determine the economically "optimal" course of action.

So: start with a climate model. There is a hierarchy of such models, ranging from simple impulse-response or "box" models to complex atmosphere-ocean general circulation models. I often use the simple models, because they’re computationally efficient and it is therefore feasible to explore their full range of uncertainties. I’m moving toward more complex models, which requires fancier statistics to extract information from a limited set of time-consuming simulations.

Given a model, then apply a Monte Carlo analysis of its parameter space. Climate models cannot simulate the entire Earth from first principles. They have to make approximations, and those approximations involve free parameters whose values must be fit to data (or calculated from specialized models). For example, a simple model cannot explicitly describe all the possible feedback interactions that are present in the climate system. It might lump them all together into a single, tunable "climate sensitivity" parameter. The Monte Carlo analysis runs the model many thousands of times at different parameter settings, and then compares the model output to past data in order to see which parameter settings are plausible and which are not. I use Bayesian statistical inference, in combination with Markov chain Monte Carlo, to quantify the degree of "plausibility" (i.e., probability) of each parameter setting.

With probability weights for the model’s parameter settings, it is now possible to weight the probability of possible future outcomes predicted by the model. This describes, conditional on the model and data used, the uncertainty about the future climate.
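[To give a flavor of what Bayesian inference with Markov chain Monte Carlo means in practice, here is a toy sketch: not Nathan’s actual code or model, just the bare skeleton of a Metropolis sampler calibrating one model parameter against noisy observations. Every number and name in it is invented for illustration.

    import numpy as np

    # Toy setup: a "model" with one tunable parameter (think: climate
    # sensitivity), synthetic observations, and a Gaussian error model.
    def model(sensitivity, forcing):
        return sensitivity * forcing                 # stand-in for a real model

    forcing = np.linspace(0, 5, 20)
    true_sensitivity, noise = 0.8, 0.3
    rng = np.random.default_rng(42)
    obs = model(true_sensitivity, forcing) + rng.normal(0, noise, forcing.size)

    def log_posterior(s):
        if not 0 < s < 10:                           # flat prior on (0, 10)
            return -np.inf
        resid = obs - model(s, forcing)
        return -0.5 * np.sum((resid / noise) ** 2)   # Gaussian log likelihood

    # Metropolis algorithm: propose a random step, accept with probability
    # min(1, posterior ratio). The chain's histogram then approximates the
    # posterior distribution of the parameter.
    samples, s = [], 1.0
    for _ in range(20000):
        s_new = s + rng.normal(0, 0.1)
        if np.log(rng.uniform()) < log_posterior(s_new) - log_posterior(s):
            s = s_new
        samples.append(s)

    print(np.mean(samples[5000:]), np.std(samples[5000:]))  # close to 0.8, plus spread

The probability weights come out implicitly: parameter settings that fit the data well get visited more often by the chain.]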

JB: Okay. I think I roughly understand this. But you’re using jargon that may cause some readers’ eyes to glaze over. And that would be unfortunate, because this jargon is necessary to talk about some very cool ideas. So, I’d like to ask what some phrases mean, and beg you to explain them in ways that everyone can understand.

To help out — and maybe give our readers the pleasure of watching me flounder around — I’ll provide my own quick attempts at explanation. Then you can say how close I came to understanding you.

First of all, what’s an "impulse-response model"? When I think of "impulse response" I think of, say, tapping on a wineglass and listening to the ringing sound it makes, or delivering a pulse of voltage to an electrical circuit and watching what it does. And the mathematician in me knows that this kind of situation can be modelled using certain familiar kinds of math. But you might be applying that math to climate change: for example, how the atmosphere responds when you pump some carbon dioxide into it. Is that about right?

NU: Yes. (Physics readers will know "impulse response" as "Green’s functions", by the way).

The idea is that you have a complicated computer model of a physical system whose dynamics you want to represent as a simple model, for computational convenience. In my case, I’m working with a computer model of the carbon cycle which takes CO2 emissions as input and predicts how much CO2 is left in the air after natural sources and sinks operate on what’s there. It’s possible to explicitly model most of the relevant physical and biogeochemical processes, but it takes a long time for such a computer simulation to run. Too long to explore how it behaves under many different conditions, which is what I want to do.

How do you build a simple model that acts like a more complicated one? One way is to study the complex model’s "impulse response" — in this case, how it behaves in response to an instantaneous "pulse" of carbon to the atmosphere. In general, the CO2 in the atmosphere will suddenly jump up, and then gradually relax back toward its original concentration as natural sinks remove some of that carbon from the atmosphere. The curve showing how the concentration decreases over time is the "impulse response". You derive it by telling your complex computer simulation that a big pulse of carbon was added to the air, and recording what it predicts will happen to CO2 over time.

The trick in impulse response theory is to treat an arbitrary CO2 emissions trajectory as the sum of a bunch of impulses of different sizes, one right after another. So, if emissions are 1, 3, and 7 units of carbon in years 1, 2, and 3, then you can think of that as a 1-unit pulse of carbon in year one, plus a 3-unit pulse in year 2, plus a 7-unit pulse in year 3.

The crucial assumption you make at this point is that you can treat the response of the complex model to this series of impulses as the sum of scaled, time-shifted copies of the "impulse response" curve that you worked out for a single pulse. Therefore, just by running the model in response to a single unit pulse, you can work out what the model would predict for any emissions trajectory, by adding up its response to a bunch of individual pulses. The impulse response model makes its prediction by summing up lots of copies of the impulse response curve, with different sizes and at different times. (Technically, this is a convolution of the impulse response curve, or Green’s function, with the emissions trajectory curve.)
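
Here is a minimal sketch of that convolution in Python, with a made-up single-exponential impulse response standing in for the actual carbon-cycle curve:

```python
import numpy as np

# Toy impulse response: the fraction of a 1-unit carbon pulse still
# airborne t years after emission. A constant plus one exponential is
# invented for illustration; real carbon-cycle responses are sums of
# several such terms with fitted coefficients.
t = np.arange(100)
impulse_response = 0.3 + 0.7 * np.exp(-t / 30.0)

# The emissions trajectory from the text: 1, 3, and 7 units of carbon
# in years 1, 2, and 3, and nothing afterwards.
emissions = np.zeros(100)
emissions[:3] = [1.0, 3.0, 7.0]

# The prediction is the convolution: each year's pulse contributes a
# scaled, time-shifted copy of the impulse response curve.
co2_anomaly = np.convolve(emissions, impulse_response)[:100]
print(co2_anomaly[:5])
```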

JB: Okay. Next, what’s a "box model"? I had to look that up, and after some floundering around I bumped into a Wikipedia article that mentioned "black box models" and "white box models".

A black box model is where you’ve got a system, and all you pay attention to is its input and output — in other words, what you do to it, and what it does to you, not what’s going on "inside". A white box model, or "glass box model", lets you see what’s going on inside but not directly tinker with it, except via your input.

Is this at all close? I don’t feel very confident that I’ve understood what a "box model" is.

NU: No, box models are the sorts of things you find in "systems dynamics" theory, where you have "stocks" of a substance and "flows" of it in and out. In the carbon cycle, the "boxes" (or stocks) could be "carbon stored in wood", "carbon stored in soil", "carbon stored in the surface ocean", etc. The flows are the sources and sinks of carbon. In an ocean model, boxes could be "the heat stored in the North Atlantic", "the heat stored in the deep ocean", etc., and flows of heat between them.

Box models are a way of spatially averaging over a lot of processes that are too complicated or time-consuming to treat in detail. They’re another way of producing simplified models from more complex ones, like impulse response theory, but without the linearity assumption. For example, one could replace a three dimensional circulation model of the ocean with a couple of "big boxes of water connected by pipes". Of course, you have to then verify that your simplified model is a "good enough" representation of whatever aspect of the more complex model that you’re interested in.
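
As a toy illustration, here is a two-box "stocks and flows" model of ocean heat, with a surface box connected to a deep box by a "pipe" (all parameter values invented for the example):

```python
# Toy two-box ocean: a surface box and a deep box exchange heat
# through a "pipe" at a rate proportional to their temperature
# difference. All numbers are invented for illustration.
c_surface, c_deep = 1.0, 10.0   # heat capacities of the two stocks
k_pipe = 0.05                   # heat flow coefficient (per year)
forcing = 0.02                  # steady heating applied to the surface

t_surface = t_deep = 0.0
for year in range(500):         # one-year Euler steps
    flow = k_pipe * (t_surface - t_deep)   # heat flowing down the pipe
    t_surface += (forcing - flow) / c_surface
    t_deep += flow / c_deep

print(t_surface, t_deep)        # the big deep box warms much more slowly
```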

JB: Okay, sure — I know a bit about these "box models", but not that name. In fact the engineers who use "bond graphs" to depict complex physical systems made of interacting parts like to emphasize the analogy between electrical circuits and hydraulic systems with water flowing through pipes. So I think box models fit into the bond graph formalism pretty nicely. I’ll have to think about that more.

Anyway: next you mentioned taking a model and doing a "Monte Carlo analysis of its parameter space". This time you explained what you meant, but I’ll still go over it.

Any model has a bunch of adjustable parameters in it, for example the "climate sensitivity", which in a simple model just means how much warmer it gets per doubling of atmospheric carbon dioxide. We can think of these adjustable parameters as knobs we’re allowed to turn. The problem is that we don’t know the best settings of these knobs! And even worse, there are lots of allowed settings.

In a Monte Carlo analysis we randomly turn these knobs to some setting, run our model, and see how well it does — presumably by comparing its results to the "right answer" in some situation where we already know the right answer. Then we keep repeating this process. We turn the knobs again and again, and accumulate information, and try to use this to guess what the right knob settings are.

More precisely: we try to guess the probability that the correct knob settings lie within any given range! We don’t try to guess their one "true" setting, because we can’t be sure what that is, and it would be silly to pretend otherwise. So instead, we work out probabilities.

Is this roughly right?

NU: Yes, that’s right.

JB: Okay. That was the rough version of the story. But then you said something a lot more specific. You say you "use Bayesian statistical inference, in combination with Markov chain Monte Carlo, to quantify the degree of "plausibility" (or probability) of each parameter setting."

So, I’ve got a couple more questions. What’s "Markov chain Monte Carlo"? I guess it’s some specific way of turning those knobs over and over again.

NU: Yes. For physicists, it’s a "random walk" way of turning the knobs: you start out at the current knob settings, and tweak each one just a little bit away from where they currently are. In the most common Markov chain Monte Carlo (MCMC) algorithm, if the new setting takes you to a more plausible setting of the knobs, you keep that setting. If the new setting produces an outcome that is less plausible, then you might keep the new setting (with probability equal to the ratio of the new plausibility to the old), or you might stay at the existing setting and try again with a new tweak. The MCMC algorithm is designed so that the sequence of knob settings produced will sample randomly from the probability distribution you’re interested in.
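
Here is a minimal sketch of that algorithm for a single knob, fit to synthetic data (a real analysis would have several knobs, and a climate model in place of the straight line):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "observations": a linear trend plus noise, standing in for
# real data; the straight line stands in for a real climate model.
x = np.arange(50.0)
data = 0.02 * x + rng.normal(0.0, 0.1, size=50)

def log_plausibility(knob):
    """Log-probability of the data given this knob setting, assuming
    Gaussian measurement error with standard deviation 0.1 (and a
    flat prior on the knob)."""
    residual = data - knob * x
    return -0.5 * np.sum((residual / 0.1) ** 2)

# Metropolis random walk: tweak the knob slightly; always keep a more
# plausible setting; keep a less plausible one with probability equal
# to the ratio of new to old plausibility.
current, samples = 0.0, []
for step in range(20000):
    proposal = current + rng.normal(0.0, 0.001)
    if np.log(rng.uniform()) < log_plausibility(proposal) - log_plausibility(current):
        current = proposal
    samples.append(current)

burned_in = samples[5000:]   # discard the walk's initial transient
print(np.mean(burned_in), np.std(burned_in))   # posterior mean and spread
```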

JB: And what’s "Bayesian statistical inference"? I’m sorry, I know this subject deserves a semester-long graduate course. But like a bad science journalist, I will ask you to distill it down to a few sentences! Sometime I’ll do a whole series of This Week’s Finds about statistical inference, but not now.

NU: I can distill it to one sentence: in this context, it’s a branch of statistics which allows you to assign probabilities to different settings of model parameters, based on how well those settings cause the model to reproduce the observed data.

The more common "frequentist" approach to statistics doesn’t allow you to assign probabilities to model parameters. It has a different take on probability. As a Bayesian, you assume the observed data is known and talk about probabilities of hypotheses (here, model parameters). As a frequentist, you assume the hypothesis is known (hypothetically), and talk about probabilities of data that could result from it. They differ fundamentally in what you treat as known (data, or hypothesis) and what probabilities are applied to (hypothesis, or data).
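
For readers who like a formula, Bayes’ theorem is the bridge between the two views. Writing θ for the parameter settings and D for the observed data, the posterior probability of the parameters is the likelihood of the data times a prior, normalized:

$$P(\theta \mid D) \;=\; \frac{P(D \mid \theta)\, P(\theta)}{P(D)}$$

The frequentist works with P(D | θ) alone; the Bayesian uses it, together with the prior P(θ), to get probabilities for the hypotheses themselves.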

JB: Okay, and one final question: sometimes you say "plausibility" and sometimes you say "probability". Are you trying to distinguish these, or say they’re the same?

NU: I am using "probability" as a technical term which quantifies how "plausible" a hypothesis is. Maybe I should just stick to "probability".

JB: Great. Thanks for suffering through that dissection of what you said.

I think I can summarize, in a sloppy way, as follows. You take a model with a bunch of adjustable knobs, and you use some data to guess the probability that the right settings of these knobs lie within any given range. Then, you can use this model to make predictions. But these predictions are only probabilistic.

Okay, then what?

NU: This is the basic uncertainty analysis. There are several things that one can do with it. One is to look at learning rates. You can generate "hypothetical data" that we might observe in the future, by taking a model prediction and adding some "observation noise" to it. (This presumes that the model is perfect, which is not the case, but it represents a lower bound on uncertainty.) Then feed the hypothetical data back into the uncertainty analysis to calculate how much our uncertainty in the future could be reduced as a result of "observing" this "new" data. See Keller and McInerney for an example.
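
To give the flavor of the arithmetic, here is a sketch under a linear-Gaussian idealization that real analyses don’t make (each hypothetical yearly observation is assumed to measure the parameter directly with Gaussian noise), so treat the numbers as illustrative only:

```python
import numpy as np

# Toy learning-rate calculation: how fast does the uncertainty in one
# parameter shrink as hypothetical future observations accumulate?
sigma_prior = 1.5   # prior standard deviation of the parameter
sigma_obs = 3.0     # noise on each hypothetical yearly observation

for years in [0, 5, 10, 20, 50]:
    # In the Gaussian case, precisions (inverse variances) simply add.
    precision = 1.0 / sigma_prior**2 + years / sigma_obs**2
    print(years, "years of new data -> posterior sd =", 1.0 / np.sqrt(precision))
```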

Another thing to do is decision making under uncertainty. For this, you need an economic integrated assessment model (or some other kind of policy model). Such a model typically has a simple description of the world economy connected to a simple description of the global climate: the world population and the economy grow at a certain rate which is tied to the energy sector, policies to reduce fossil carbon emissions have economic costs, fossil carbon emissions influence the climate, and climate change has economic costs. Different models are more or less explicit about these components (is the economy treated as a global aggregate or broken up into regional economies, how realistic is the climate model, how detailed is the energy sector model, etc.).

If you feed some policy (a course of emissions reductions over time) into such a model, it will calculate the implied emissions pathway and emissions abatement costs, as well as the implied climate change and economic damages. The net costs or benefits of this policy can be compared with a "business as usual" scenario with no emissions reductions. The net benefit is converted from "dollars" to "utility" (accounting for things like the concept that a dollar is worth more to a poor person than to a rich one), and some discounting factor is applied (to downweight the value of future utility relative to present). This gives "the (discounted) utility of the proposed policy".

So far this has not taken uncertainty into account. In reality, we’re not sure what kind of climate change will result from a given emissions trajectory. (There is also economic uncertainty, such as how much it really costs to reduce emissions, but I’ll concentrate on the climate uncertainty.) The uncertainty analysis I’ve described can give probability weights to different climate change scenarios. You can then take a weighted average over all these scenarios to compute the "expected" utility of a proposed policy.
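
Here is a minimal sketch of that probability-weighted average for one candidate policy (scenario weights and benefit streams are all invented for illustration; a real integrated assessment model computes these internally):

```python
import numpy as np

# Toy expected-utility calculation. Each climate scenario gets a
# probability weight from the uncertainty analysis and a stream of
# net benefits relative to business as usual (abatement costs up
# front, avoided damages later). All numbers are invented, and
# dollars are treated as utility for simplicity.
years = np.arange(100)
discount = 1.0 / 1.03 ** years            # 3% per year discounting

scenarios = {                             # weight, net benefit per year
    "low sensitivity":  (0.3, np.full(100, -0.1)),
    "mid sensitivity":  (0.5, np.linspace(-0.1, 0.3, 100)),
    "high sensitivity": (0.2, np.linspace(-0.1, 1.0, 100)),
}

expected_utility = sum(w * np.sum(discount * benefits)
                       for w, benefits in scenarios.values())
print(expected_utility)   # compare across policies and pick the maximum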

Finally, you optimize over all possible abatement policies to find the one that has the maximum expected discounted utility. See Keller et al. for a simple conceptual example of this applied to a learning scenario, and this book for a deeper discussion:

• William Nordhaus, A Question of Balance, Yale U. Press, New Haven, 2008.

It is now possible to start elaborating on this theme. For instance, in the future learning problem, you can modify the "hypothetical data" to deviate from what your climate model predicts, in order to consider what would happen if the model is wrong and we observe something "unexpected". Then you can put that into an integrated assessment model to study how much being wrong would cost us, and how fast we need to learn that we’re wrong in order to change course, policy-wise. See that paper by Oppenheimer et al. for an example.

JB: Thanks for that tour of ideas! It sounds fascinating, important, and complex.

Now I’d like to move on to talking about a specific paper of yours. It’s this one:

• Nathan Urban and Klaus Keller, Probabilistic hindcasts and projections of the coupled climate, carbon cycle, and Atlantic meridional overturning circulation system: A Bayesian fusion of century-scale observations with a simple model, Tellus A 62 (2010), 737-750.

Before I ask you about the paper, let me start with something far more basic: what the heck is the "Atlantic meridional overturning circulation" or "AMOC"?

I know it has something to do with ocean currents, and how warm water moves north near the surface of the Atlantic and then gets cold, plunges down, and goes back south. Isn’t this related to the "Gulf Stream", that warm current that supposedly keeps Europe warmer than it otherwise would be?

NU: Your first sentence pretty much sums up the basic dynamics: the warm water from the tropics cools in the North Atlantic, sinks (because it’s colder and denser), and returns south as deep water. As the water cools, the heat it releases to the atmosphere warms the region.

This is the "overturning circulation". But it’s not synonymous with the Gulf Stream. The Gulf Stream is a mostly wind-driven phenomenon, not a density driven current. The "AMOC" has both wind driven and density driven components; the latter is sometimes referred to as the "thermohaline circulation" (THC), since both heat and salinity are involved. I haven’t gotten into salinity yet, but it also influences the density structure of the ocean, and you can read Stefan Rahmstorf’s review articles for more (read the parts on non-linear behavior):

• Stefan Rahmstorf, The thermohaline ocean circulation: a brief fact sheet.

• Stefan Rahmstorf, Thermohaline ocean circulation, in Encyclopedia of Quaternary Sciences, edited by S. A. Elias, Elsevier, Amsterdam, 2006.

JB: Next, why are people worrying about the AMOC? I know some scientists have argued that shortly after the last ice age, the AMOC stalled out due to lots of fresh water from Lake Agassiz, a huge lake that used to exist in what’s now Canada, formed by melting glaciers. The idea, I think, was that this event temporarily killed the Gulf Stream and made temperatures in Europe drop enormously.

Do most people believe that story these days?

NU: You’re speaking of the "Younger Dryas" abrupt cooling event around 11 to 13 thousand years ago. The theory is that a large pulse of fresh water from Lake Agassiz lessened the salinity in the Atlantic and made it harder for water to sink, thus shutting down the overturning circulation and decreasing its release of heat in the North Atlantic. This is still a popular theory, but geologists have had trouble tracing the path of a sufficiently large supply of fresh water, at the right place, and the right time, to shut down the AMOC. There was a paper earlier this year claiming to have finally done this:

• Julian B. Murton, Mark D. Bateman, Scott R. Dallimore, James T. Teller and Zhirong Yang, Identification of Younger Dryas outburst flood path from Lake Agassiz to the Arctic Ocean, Nature 464 (2010), 740-743.

but I haven’t read it yet.

The worry is that this could happen again — not because of a giant lake draining into the Atlantic, but because of warming (and the resulting changes in precipitation) altering the thermal and salinity structure of the ocean. It is believed that a resulting shutdown of the AMOC would cause the North Atlantic region to cool, but there is still debate over what it would take to cause it to shut down. It’s also debated whether this is one of the climate "tipping points" that people talk about — whether a certain amount of warming would trigger a shutdown, and whether that shutdown would be "irreversible" (or difficult to reverse) or "abrupt".

Cooling Europe may not be a bad thing in a warming world. In fact, Europe might not actually cool in response to an AMOC shutdown; it might just warm more slowly. The problem is if the cooling is abrupt (and hard to adapt to), or prolonged (permanently shifting climate patterns relative to the rest of the world). Perhaps worse than the direct temperature change could be the impacts on agriculture or ocean ecosystems, resulting from major reorganizations of regional precipitation or ocean circulation patterns.

JB: So, part of your paper consists of modelling the AMOC and how it interacts with the climate and the carbon cycle. Let’s go through this step by step.

First: how do you model the climate? You say you use "the DOECLIM physical climate component of the ACC2 model, which is an energy balance model of the atmosphere coupled to a one-dimensional diffusive ocean model". I guess these are well-known ideas in your world. But I don’t even know what the acronyms stand for! Could you walk us through these ideas in a gentle way?

NU: Don’t worry about the acronyms; they’re just names people have given to particular models.

The ACC2 model is a computer model of both the climate and the carbon cycle. The climate part of our model is called DOECLIM, which I’ve used to replace the original climate component of ACC2. An "energy balance model" is the simplest possible climate model, and is a form of "box model" that I mentioned above. It treats the Earth as a big heat sink that you dump energy into (e.g., by adding greenhouse gases). Given the laws of thermodynamics, you can compute how much temperature change you get from a given amount of heat input.

This energy balance model of the atmosphere is "zero dimensional", which means that it treats the Earth as a featureless sphere, and doesn’t attempt to keep track of how heat flows or temperature changes at different locations. There is no three dimensional circulation of the atmosphere or anything like that. The atmosphere is just a "lump of heat-absorbing material".

The atmospheric "box of heat" is connected to two other boxes, which are land and ocean. In DOECLIM, "land" is just another featureless lump of material, with a different heat capacity than air. The "ocean" is more complicated. Instead of a uniform box of water with a single temperature, the ocean is "one dimensional", meaning that it has depth, and temperature is allowed to vary with depth. Heat penetrates from the surface into the deep ocean by a diffusion process, which is intended to mimic the actual circulation-driven penetration of heat into the ocean. It’s worth treating the ocean in more detail since oceans are the Earth’s major heat sink, and therefore control how quickly the planet can change temperature.

The three parameters in the DOECLIM model which we treat as uncertain are the climate (temperature) sensitivity to CO2, the vertical mixing rate of heat into the ocean, and the strength of the "aerosol indirect effect" (what kind of cooling effect industrial aerosols in the atmosphere create due to their influence on cloud behavior).

JB: Okay, that’s clear enough. But at this point I have to raise an issue about models in general. As you know, a lot of climate skeptics like to complain about the fallibility of models. They would surely become even more skeptical upon hearing that you’re treating the Earth as a featureless sphere with the same temperature throughout at any given time — and treating the temperature of ocean water as depending only on the depth, not the location. Why are you simplifying things so much? How could your results possibly be relevant to the real world?

Of course, as a mathematical physicist, I know the appeal of simple models. I also know the appeal of reducing the number of dimensions. I spent plenty of time studying quantum gravity in the wholly unrealistic case of a universe with one less dimension than our real world! Reducing the number of dimensions makes the math a lot simpler. And simplified models give us a lot of insight which — with luck — we can draw upon when tackling the really hard real-world problems. But we have to be careful: they can also lead us astray.

How do you think about results obtained from simplified climate models? Are they just mathematical warmup exercises? That would be fine — I have no problem with that, as long as we’re clear about it. Or are you hoping that they give approximately correct answers?

NU: I use simple models because they’re fast and it’s easier to expose and explore their assumptions. My attitude toward simple models is a little of both the points of view you suggest: partly proof of concept, but also hopefully approximately correct, for the questions I’m asking. Let me first argue for the latter perspective.

If you’re using a zero dimensional model, you can really only hope to answer "zero dimensional questions", i.e. about the globally averaged climate. Once you’ve simplified your question by averaging over a lot of the complexity of the data, you can hope that a simple model can reproduce the remaining dynamics. But you shouldn’t just hope. When using simple models, it’s important to test the predictions of their components against more complex models and against observed data.

You can show, for example, that as far as global average surface temperature is concerned, even simpler energy balance models than DOECLIM (e.g., without a 1D ocean) can do a decent job of reproducing the behavior of more complex models. See, e.g.:

• Isaac M. Held, Michael Winton, Ken Takahashi, Thomas Delworth, Fanrong Zeng and Geoffrey K. Vallis, Probing the fast and slow components of global warming by returning abruptly to preindustrial forcing, Journal of Climate 23 (2010), 2418-2427.

for a recent study. The differences between complex models can be captured merely by retuning the "effective parameters" of the simple model. For example, many of the complexities of different feedback effects can be captured by a tunable climate sensitivity parameter in the simple model, representing the total feedback. By turning this sensitivity "knob" in the simple model, you can get it to behave like complex models which have different feedbacks in them.

There is a long history in climate science of using simple models as "mechanistic emulators" of more complex models. The idea is to put just enough physics into the simple model to get it to reproduce some specific averaged behavior of the complex model, but no more. The classic "mechanistic emulator" used by the Intergovernmental Panel on Climate Change is called MAGICC. BERN-CC is another model frequently used by the IPCC for carbon cycle scenario analysis — that is, converting CO2 emissions scenarios to atmospheric CO2 concentrations. A simple model that people can play around with themselves on the Web may be found here:

• Ben Matthews, Chooseclimate.

Obviously a simple model cannot reproduce all the behavior of a more complex model. But if you can provide evidence that it reproduces the behavior you’re interested in for a particular problem, it is arguably at least as "approximately correct" as the more complex model you validate it against, for that specific problem. (Whether the more complex model is an "approximately correct" representation of the real world is a separate question!)

In fact, simple models are arguably more useful than more complex ones for certain applications. The problem with complex models is, well, their complexity. They make a lot of assumptions, and it’s hard to test all of them. Simpler models make fewer assumptions, so you can test more of them, and look at the sensitivity of your conclusions to your assumptions.

If I take all the complex models used by the IPCC, they will have a range of different climate sensitivities. But what if the actual climate sensitivity is above or below that range, because all the complex models have limitations? I can’t easily explore that possibility in a complex model, because "climate sensitivity" isn’t a knob I can turn. It’s an emergent property of many different physical processes. If I want to change the model’s climate sensitivity, I might have to rewrite the cloud physics module to obey different dynamical equations, or something complicated like that — and I still won’t be able to produce a specific sensitivity. But in a simple model, "climate sensitivity" is a "knob", and I can turn it to any desired value above, below, or within the IPCC range to see what happens.

After that defense of simple models, there are obviously large caveats. Even if you can show that a simple model can reproduce the behavior of a more complex one, you can only test it under a limited range of assumptions about model parameters, forcings, etc. It’s possible to push a simple model too far, into a regime where it stops reproducing what a more complex model would do. Simple models can also neglect relevant feedbacks and other processes. For example, in the model I use, global warming can shut down the AMOC, but changes in the AMOC don’t feed back to cool the global temperature. But the cooling from an AMOC weakening should itself slow further AMOC weakening due to global warming. The AMOC model we use is designed to partly compensate for the lack of explicit feedback of ocean heat transport on the temperature forcing, but it’s still an approximation.

In our paper we discuss what we think are the most important caveats of our simple analysis. Ultimately we need to be able to do this sort of analysis with more complex models as well, to see how robust our conclusions are to model complexity and structural assumptions. I am working in that direction now, but the complexities involved might be the subject of another interview!

JB: I’d be very happy to do another interview with you. But you’re probably eager to finish this one first. So we should march on.

But I can’t resist one more comment. You say that models even simpler than DOECLIM can emulate the behavior of more complex models. And then you add, parenthetically, "whether the more complex model is an ‘approximately correct’ representation of the real world is a separate question!" But I think that latter question is the one that ordinary people find most urgent. They won’t be reassured to know that simple models do a good job of mimicking more complicated models. They want to know how well these models mimic reality!

But maybe we’ll get to that when we talk about the Markov chain Monte Carlo procedure and how you use that to estimate the probability that the "knobs" (that is, parameters) in your model are set correctly? Presumably in that process we learn a bit about how well the model matches real-world data?

If so, we can go on talking about the model now, and come back to this point in due time.

NU: The model’s ability to represent the real world is the most important question. But it’s not one I can hope to fully answer with a simple model. In general, you won’t expect a model to exactly reproduce the data. Partly this is due to model imperfections, but partly it’s due to random "natural variability" in the system. (And also, of course, to measurement error.) Natural variability is usually related to chaotic or otherwise unpredictable atmosphere-ocean interactions, e.g. at the scale of weather events, El Niño, etc. Even a perfect model can’t be expected to predict those. With a simple model it’s really hard to tell how much of the discrepancy between model and data is due to model structural flaws, and how much is attributable to expected "random fluctuations", because simple models are too simple to generate their own "natural variability".

To really judge how well models are doing, you have to use a complex model and see how much of the discrepancy can be accounted for by the natural variability it predicts. You also have to get into a lot of detail about the quality of the observations, which means looking at spatial patterns and not just global averages. This is the sort of thing done in model validation studies, "detection and attribution" studies, and observation system papers. But it’s beyond the scope of our paper. That’s why I said the best I can do is to use simple models that perform as well as complex models for limited problems. They will of course suffer any limitations of the complex models to which they’re tuned, and if you want to read about those, you should read those modeling papers.

As far as what I can do with a simple model, yes, the Bayesian probability calculation using MCMC is a form of data-model comparison, in that it gives higher weight to model parameter settings that fit the data better. But it’s not exactly a form of "model checking", because Bayesian probability weighting is a relative procedure. It will be quite happy to assign high probability to parameter settings that fit the data terribly, as long as they still fit better than all the other parameter settings. A Bayesian probability isn’t an absolute measure of model quality, and so it can’t be used to check models. This is where classical statistical measures of "goodness of fit" can be helpful. For a philosophical discussion, see:

• Andrew Gelman and Cosma Rohilla Shalizi, Philosophy and the practice of Bayesian statistics, available as arXiv:1006.3868.

That being said, you do learn about model fit during the MCMC procedure in its attempt to sample highly probable parameter settings. When you get to the best fitting parameters, you look at the difference between the model fit and the observations to get an idea of what the "residual error" is — that is, everything that your model wasn’t able to predict.

I should add that complex models disagree more about the strength of the AMOC than they do about more commonly discussed climate variables, such as surface temperature. This can be seen in Figure 10.15 of the IPCC AR4 WG1 report: there is a cluster of models that all tend to agree with the observed AMOC strength, but there are also some models that don’t. Some of those that don’t are known to have relatively poor physical modeling of the overturning circulation, so this is to be expected (i.e., the figure looks like a worse indictment of the models than it really is). But there is still disagreement between some of the "higher quality" models. Part of the problem is that we have poor historical observations of the AMOC, so it’s sometimes hard to tell what needs fixing in the models.

Since the complex models don’t all agree about the current state of the AMOC, one can (and should) question using a simple AMOC model which has been tuned to a particular complex model. Other complex models will predict something altogether different. (And in fact, the model that our simple model was tuned to is also simpler than the IPCC AR4 models.) In our analysis we try to get around this model uncertainty by including some tunable parameters that control both the initial strength of the AMOC and how quickly it weakens. By altering those parameters, we try to span the range of possible outcomes predicted by complex models, allowing the parameters to take on whatever range of values is compatible with the (noisy) observations. This, at a minimum, leads to significant uncertainty in what the AMOC will do.

I’m okay with the idea of uncertainty — that is, after all, what my research is about. But ultimately, even projections with wide error bars still have to be taken with a grain of salt, if the most advanced models still don’t entirely agree on simple questions like the current strength of the AMOC.

JB: Okay, thanks. Clearly the question of how well your model matches reality is vastly more complicated than what you started out trying to tell me: namely, what your model is. Let’s get back to that.

To recap, your model consists of three interacting parts: a model of the climate, a model of the carbon cycle, and a model of the Atlantic meridional overturning circulation (or "AMOC"). The climate model, called "DOECLIM", itself consists of three interacting parts:

• the "land" (modeled as a "box of heat"),

• the "atmosphere" (modeled as a "box of heat")
• the "ocean" (modelled as a one-dimensional object, so that temperature varies with depth)

Next: how do you model the carbon cycle?

NU: We use a model called NICCS (nonlinear impulse-response model of the coupled carbon-cycle climate system). This model started out as an impulse response model, but because of nonlinearities in the carbon cycle, it was augmented by some box model components. NICCS takes fossil carbon emissions to the air as input, and calculates how that carbon ends up being partitioned between the atmosphere, land (vegetation and soil), and ocean.

For the ocean, it has an impulse response model of the vertical advective/diffusive transport of carbon in the ocean. This is supplemented by a differential equation that models nonlinear ocean carbonate buffering chemistry. It doesn’t have any explicit treatment of ocean biology. For the terrestrial biosphere, it has a box model of the carbon cycle. There are four boxes, each containing some amount of carbon. They are "woody vegetation", "leafy vegetation", "detritus" (decomposing organic matter), and "humus" (more stable organic soil carbon). The box model has some equations describing how quickly carbon gets transported between these boxes (or back to the atmosphere).
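
For flavor, here is a minimal sketch of a four-box terrestrial carbon model of this general kind (structure, stocks, and turnover times invented for illustration; NICCS’s actual equations and parameter values differ):

```python
import numpy as np

# Toy four-box terrestrial carbon model: each box loses carbon at a
# rate set by its turnover time, and fixed fractions of each outflow
# are routed to other boxes; the remainder returns to the atmosphere.
# All numbers are invented for illustration.
boxes = ["woody", "leafy", "detritus", "humus"]
stocks = np.array([300.0, 100.0, 120.0, 1000.0])   # carbon stocks (GtC)
turnover = np.array([40.0, 2.0, 5.0, 100.0])       # turnover times (years)

# Fraction of each box's outflow delivered to each box
# (rows: from, columns: to).
transfer = np.array([
    [0.0, 0.0, 0.6, 0.0],   # wood -> detritus
    [0.0, 0.0, 0.5, 0.0],   # leaves -> detritus
    [0.0, 0.0, 0.0, 0.3],   # detritus -> humus
    [0.0, 0.0, 0.0, 0.0],   # humus -> atmosphere only
])
uptake = np.array([1.5, 1.5, 0.0, 0.0])   # photosynthesis input (GtC/yr)

for year in range(100):                    # one-year Euler steps
    outflow = stocks / turnover
    stocks += uptake + transfer.T @ outflow - outflow

print(dict(zip(boxes, stocks.round(1))))
```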

In addition to carbon emissions, both the land and ocean modules take global temperature as an input. (So, there should be a red arrow pointing to the "ocean" too — this is a mistake in the figure.) This is because there are temperature-dependent feedbacks in the carbon cycle. In the ocean, temperature determines how readily CO2 will dissolve in water. On land, temperature influences how quickly organic matter in soil decays ("heterotrophic respiration"). There are also purely carbon cycle feedbacks, such as the buffering chemistry mentioned above, and also "CO2 fertilization", which quantifies how plants can grow better under elevated levels of atmospheric CO2.

The NICCS model also originally contained an impulse response model of the climate (temperature as a function of CO2), but we removed that and replaced it with DOECLIM. The NICCS model itself is tuned to reproduce the behavior of a more complex Earth system model. The key three uncertain parameters treated in our analysis control the soil respiration temperature feedback, the CO2 fertilization feedback, and the vertical mixing rate of carbon into the ocean.

JB: Okay. Finally, how do you model the AMOC?

NU: This is another box model. There is a classic 1961 paper by Stommel:

• Henry Stommel, Thermohaline convection with two stable regimes of flow, Tellus 13 (1961), 224-230.

which models the overturning circulation using two boxes of water, one representing water at high latitudes and one at low latitudes. The boxes contain heat and salt. Together, temperature and salinity determine water density, and density differences drive the flow of water between boxes.

It has been shown that such box models can have interesting nonlinear dynamics, exhibiting both hysteresis and threshold behavior. Hysteresis means that if you warm the climate and then cool it back down to its original temperature, the AMOC doesn’t return to its original state. Threshold behavior means that the system exhibits multiple stable states (such as an ocean circulation with or without overturning), and you can pass a "tipping point" beyond which the system flips from one stable equilibrium to another. Ultimately, this kind of dynamics means that it can be hard to return the AMOC to its historic state if it shuts down from anthropogenic climate change.
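
Here is a minimal sketch of that hysteresis in a Stommel-style two-box reduction (toy parameter values; this is the textbook model, not the four-box model used in the paper, which is described below):

```python
import numpy as np

# Stommel-style two-box overturning: the flow q is set by the density
# contrast from a fixed temperature difference and an evolving
# salinity difference delta_s; freshwater forcing fw erodes the
# salinity contrast. All parameter values are toys.
alpha_dt, beta, k = 0.2, 0.8, 1.0

def equilibrate(fw, delta_s, steps=20000, dt=0.01):
    """Integrate d(delta_s)/dt = fw - |q|*delta_s to equilibrium."""
    for _ in range(steps):
        q = k * (alpha_dt - beta * delta_s)
        delta_s += dt * (fw - abs(q) * delta_s)
    return delta_s

forcings = np.linspace(0.0, 0.05, 11)
delta_s, results = 0.0, {}
for label, ramp in [("up", forcings), ("down", forcings[::-1])]:
    qs = []
    for fw in ramp:                      # carry the state along the ramp
        delta_s = equilibrate(fw, delta_s)
        qs.append(k * (alpha_dt - beta * delta_s))
    results[label] = np.round(qs, 3)

print(results["up"])     # the overturning collapses past a threshold...
print(results["down"])   # ...and fails to recover as forcing is removed
```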

The extent to which the real AMOC exhibits hysteresis and threshold behavior remains an open question. The model we use in our paper is a box model that has this kind of nonlinearity in it:

• Kirsten Zickfeld, Thomas Slawig and Stefan Rahmstorf, A low-order model for the response of the Atlantic thermohaline circulation to climate change, Ocean Dynamics 54 (2004), 8-26.

Instead of Stommel’s two boxes, this model uses four boxes.

It has three surface water boxes (north, south, and tropics), and one box for an underlying pool of deep water. Each box has its own temperature and salinity, and flow is driven by density gradients between them. Each box also has a "relaxation temperature" it tries to restore itself to when perturbed; these parameters are set in a way that attempts to compensate for the lack of explicit feedback on global temperature. The model’s parameters are tuned to match the output of an intermediate complexity climate model.

The input to the model is a change in global temperature (temperature anomaly). This is rescaled to produce different temperature anomalies over each of the three surface boxes (accounting for the fact that different latitudes are expected to warm at different rates). There are similar scalings to determine how much freshwater input, from both precipitation changes and meltwater, is expected in each of the surface boxes due to a temperature change.

The main uncertain parameter is the "hydrological sensitivity" of the North Atlantic surface box, controlling how much freshwater goes into that region in a warming scenario. This is the main effect by which the AMOC can weaken. Actually, anything that changes the density of water alters the AMOC, so the overturning can weaken due to salinity changes from freshwater input, or from direct temperature changes in the surface waters. However, the former is more uncertain than the latter, so we focus on freshwater in our uncertainty analysis.

JB: Great! I see you’re emphasizing the uncertain parameters; we’ll talk more later about how you estimate these parameters, though you’ve already sort of sketched the idea.

So: you’ve described to me the three components of your model: the climate, the carbon cycle and the Atlantic meridional overturning current (AMOC). I guess to complete the description of your model, you should say how these components interact — right?

NU: Right. There is a two-way coupling between the climate module (DOECLIM) and the carbon cycle module (NICCS). The global temperature from the climate module is fed into the carbon cycle module to predict temperature-dependent feedbacks. The atmospheric CO2 predicted by the carbon cycle module is fed into the climate module to predict temperature from its greenhouse effect. There is a one-way coupling between the climate module and the AMOC module. Global temperature alters the overturning circulation, but changes in the AMOC do not themselves alter global temperature.

There is no coupling between the AMOC module and the carbon cycle module, although there technically should be: both the overturning circulation and the uptake of carbon by the oceans depend on ocean vertical mixing processes. Similarly, the climate and carbon cycle modules have their own independent parameters controlling the vertical mixing of heat and carbon, respectively, in the ocean. In reality these mixing rates are related to each other. In this sense, the modules are not fully coupled, insofar as they have independent representations of physical processes that are not really independent of each other. This is discussed in our caveats.

JB: There’s one other thing that’s puzzling me. The climate model treats the "ocean" as a single entity whose temperature varies with depth but not location. The AMOC model involves four "boxes" of water: north, south, tropical, and deep ocean water, each with its own temperature. That seems a bit schizophrenic, if you know what I mean. How are these temperatures related in your model?

You say "there is a one-way coupling between the climate module and the AMOC module." Does the ocean temperature in the climate model affect the temperatures of the four boxes of water in the AMOC model? And if so, how?

NU: The surface temperature in the climate model affects the temperatures of the individual surface boxes in the AMOC model. The climate model works only with globally averaged temperature. To convert a (change in) global temperature to (changes in) the temperatures of the surface boxes of the AMOC model, there is a "pattern scaling" coefficient which converts global temperature (anomaly) to temperature (anomaly) in a particular box.

That is, if the climate model predicts a 1 degree warming globally, that might be more or less than 1 °C of warming in the North Atlantic, tropics, etc. For example, we generally expect to see "polar amplification" where the high northern latitudes warm more quickly than the global average. These latitudinal scaling coefficients are derived from the output of a more complex climate model under a particular warming scenario, and are assumed to be constant (independent of warming scenario).

The temperature from the climate model which is fed into the AMOC model is the global (land+ocean) average surface temperature, not the DOECLIM sea surface temperature alone. This is because the pattern scaling coefficients in the AMOC model were derived relative to global temperature, not sea surface temperature.

JB: Okay. That’s a bit complicated, but I guess some sort of consistency is built in, which prevents the climate model and the AMOC model from disagreeing about the ocean temperature. That’s what I was worrying about.

Thanks for leading us through this model. I think this level of detail is just enough to get a sense for how it works. And now that I know roughly what your model is, I’m eager to see how you used it and what results you got!

But I’m afraid many of our readers may be nearing the saturation point. After all, I’ve been talking with you for days, with plenty of time to mull it over, while they will probably read this interview in one solid blast! So, I think we should quit here and continue in the next episode.

So, everyone: I’m afraid you’ll just have to wait, clutching your chair in suspense, for the answer to the big question: will the AMOC get turned off, or not? Or really: how likely is such an event, according to this simple model?


…we’re entering dangerous territory and provoking an ornery beast. Our climate system has proven that it can do very strange things.

Wallace S. Broecker

