The Azolla Event

18 November, 2010

My friend Bruce Smith just pointed out something I’d never heard of:

Azolla event, Wikipedia.

As you may recall, the dinosaurs were wiped out by an asteroid about 65 million years ago. Then came the Cenozoic Era: first the Paleocene, then the Eocene, and so on. Back in those days, the Earth was very warm compared to now:



Paleontologists call the peak of high temperatures the “Eocene optimum”. Back then, it was about 12 °C warmer on average. The polar regions were much warmer than today, perhaps as mild as the modern-day Pacific Northwest. In fact, giant turtles and alligators thrived north of the Arctic circle!

(“Optimum?” Yes: as if the arguments over global warming weren’t confusing enough already, paleontologists use the term “optimum” for any peak of high temperatures. I think that’s a bit silly. If you were a turtle north of the Arctic circle, it was indeed jolly optimal. But what matters now is not that certain temperature levels are inherently good or bad, but that the temperature is increasing too fast for life to easily adapt.)

Why did it get colder? This is a fascinating and important puzzle. And here’s one puzzle piece I’d never heard about. I don’t know how widely accepted this story is, but here’s how it goes:

In the early Eocene, the Arctic Ocean was almost entirely surrounded by land:



A surface layer of less salty water formed from inflowing rivers, and around 49 million years ago, vast blooms of the freshwater fern Azolla began to grow in the Arctic Ocean. Apparently this stuff grows like crazy. And as bits of it died, it sank to the sea floor. This went on for about 800,000 years, and formed a layer up to 8 meters thick. And some scientists speculate that this process sucked up enough carbon dioxide to significantly chill the planet. Some say CO2 concentrations fell from 3500 ppm in the early Eocene to 650 ppm at around the time of this event!

I don’t understand much about this — I just wanted to mention it. After all, right now people are thinking about fertilizing the ocean to artificially create blooms of phytoplankton that’ll soak up CO2 and fall to the ocean floor. But if you want to read a well-informed blog article on this topic, try:

• Ole Nielsen, The Azolla event (dramatic bloom 49 million years ago).

By the way, there’s a nice graph of carbon dioxide concentrations here… inferred from boron isotope measurements:

• P. N. Pearson and M. R. Palmer, Atmospheric carbon dioxide concentrations over the past 60 million years, Nature 406 (6797): 695–699.


Stabilization Wedges (Part 1)

16 November, 2010

Okay, let’s look at some plans for combating global warming! And let’s start with this paper:

• Stephen Pacala and Robert Socolow, Stabilization wedges: solving the climate problem for the next 50 years using current technologies, Science 305 (2004), 968-972.

I won’t try to summarize it all today, just a bit.

Stephen Pacala and Robert Socolow wrote this now-famous paper in 2004. Back then we were emitting about 6.2 gigatons of carbon per year, there were 375 ppm of carbon dioxide in the atmosphere, and many proposals to limit global warming urged that we keep the concentration below 500 ppm. Their paper outlined some strategies for keeping it below 500 ppm.

They estimated that to do this, it would be enough to hold emissions flat at 7 gigatons of carbon per year for 50 years, and then lower them to nothing. On the other hand, in a “business as usual” scenario, they estimated that emissions would ramp up to 14 gigatons per year by 2054. That’s 7 too many.

So, to keep emissions flat it would be enough to find 7 ways to reduce carbon emissions, each one of which ramps up linearly to the point of reducing carbon emissions by 1 gigaton/year in 2054. They called these stabilization wedges, because if you graph them, they look like wedges:



Their paper listed 15 possible stabilization wedges, each one with the potential to reduce carbon emissions by 1 gigaton/year by 2054. This is a nice way to start thinking about a very big problem, so many people have adopted it and modified it and criticized it in various ways, which I hope to discuss later. Right now I’ll only tell you about the original paper.
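
By the way, the arithmetic behind a wedge is simple: since each wedge ramps linearly from zero up to 1 gigaton of carbon per year over 50 years, the total emissions it avoids is the area of a triangle, 25 gigatons of carbon. Here’s a quick check, using only numbers already quoted above:

# Back-of-envelope check of the wedge arithmetic in Pacala and Socolow.
years = 50                    # 2004 to 2054
rate_in_2054 = 1.0            # each wedge avoids 1 GtC/year by 2054
carbon_per_wedge = 0.5 * years * rate_in_2054    # area of the triangle
wedges_needed = 14 - 7                           # business-as-usual minus flat emissions, in GtC/year

print(carbon_per_wedge)                  # 25.0 GtC avoided per wedge over 50 years
print(wedges_needed * carbon_per_wedge)  # 175.0 GtC avoided by the full stabilization triangle

So the full stabilization triangle of 7 wedges avoids about 175 gigatons of carbon over the 50 years.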

But before I even list any of their stabilization wedges, I should emphasize: stabilizing emissions at 7 gigatons is not enough to stay below 500 ppm forever! Carbon dioxide stays in the atmosphere a very long time. So, as Pacala and Socolow note:

Stabilization at any level requires that net emissions do not simply remain constant, but eventually drop to zero. For example, in one simple model that begins with the stabilization triangle but looks beyond 2054, 500-ppm stabilization is achieved by 50 years of flat emissions, followed by a linear decline of about two-thirds in the following 50 years, and a very slow decline thereafter that matches the declining ocean sink. To develop the revolutionary technologies required for such large emissions reductions in the second half of the century, enhanced research and development would have to begin immediately.

What’s the “declining ocean sink”? Right now the ocean is absorbing a lot of CO2, temporarily saving us from the full brunt of our carbon emissions — while coral reefs, shellfish and certain forms of plankton suffer from increased acidity. But this won’t go on forever; the ocean has limited capacity.

Pacala and Socolow consider several categories of climate wedges:

• efficiency and conservation
• shifting from coal to gas
• carbon capture and storage
• nuclear fission
• renewable energy sources
• forests and agriculture

Today let me just go through the first category. Here they list four wedges:

1. Efficient vehicles: increase the fuel economy for 2 billion cars from 30 to 60 miles per gallon. Or, for those of you who don’t have the incredible good luck of living in the USA: increase it from 13 to 26 kilometers per liter. When they wrote their paper, there were 500 million cars on the planet, with an average fuel efficiency of 13 kilometers/liter. They expected the number of cars to quadruple by 2054. To achieve this wedge, we’d need the average fuel efficiency to double. (There’s a rough back-of-envelope check of this wedge right after this list.)

2. Reduced use of vehicles: decrease car travel for 2 billion 30-mpg cars from 10,000 to 5000 miles per year. In other words: decreasing the average travel from 16,000 to 8000 kilometers per year. (Clearly this wedge and the previous one are not additive: if we do them both, we don’t save 2 gigatons of carbon per year.)

3. Efficient buildings: cut carbon emissions by one-fourth in buildings and appliances. This could be done by following “known and established approaches” to energy efficient space heating and cooling, water heating, lighting, and refrigeration. Half the potential savings are in the buildings in developing countries.

4. Efficient coal plants: raise the efficiency of coal power plants to 60%. In 2004, when they wrote their paper, “coal plants, operating on average at 32% efficiency, produced about one fourth of all carbon emissions: 1.7 GtC/year out of 6.2 GtC/year.” They expected coal power plants to double their output by 2054. To achieve this wedge, we’d need their average efficiency to reach 60%.
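
Here’s that rough back-of-envelope check of the “efficient vehicles” wedge. The carbon content of gasoline, roughly 0.63 kilograms of carbon per liter, is my own round figure rather than a number from the paper, so treat this as an order-of-magnitude check only:

# Rough sanity check of the "efficient vehicles" wedge (my numbers, not the paper's own calculation).
cars = 2e9                       # cars assumed to be on the road in 2054
km_per_year = 16000              # about 10,000 miles per car per year
old_kmpl, new_kmpl = 13.0, 26.0  # fuel economy before and after, in km per liter

litres_saved = km_per_year / old_kmpl - km_per_year / new_kmpl   # per car, per year
kg_carbon_per_litre = 0.63       # assumed: ~2.3 kg CO2 per liter of gasoline, times 12/44

gtc_saved = cars * litres_saved * kg_carbon_per_litre / 1e12     # kilograms -> gigatons
print(round(gtc_saved, 2))       # about 0.8 GtC/year

That lands around 0.8 rather than exactly 1 gigaton of carbon per year, which is close enough given my crude carbon-intensity figure: the point is just that the order of magnitude works out.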

There are a lot of questions to ask! Which do you think are the most easily achieved of these wedges? What are the biggest problems with their reasoning so far? And so on…

I would love any interesting information you have on: 1) ways to make vehicles more efficient, 2) ways to coax people to drive less, 3) ways to make buildings more efficient, and 4) ways to make coal power plants more efficient. Please post it here, with references if you can!

I’ll conclude for now with a couple of tiny points. First, they seem to vacillate a bit between saying there were 6.2 and 7 gigatons of carbon emitted in 2004, which is a bit odd, but perhaps just a way of giving the world a bit of slack before levelling off emissions at 7 GtC/year. I guess it’s not really a big deal.

Second, they aren’t idiots: despite the above graph, they don’t really think carbon emissions will increase linearly in a business-as-usual scenario. This is just a deliberate simplification on their part. They also show this supposedly more accurate graph:



They say the top curve is “a representative business as usual emissions path” for global carbon emissions in the form of CO2 from fossil fuel combustion and cement manufacture, assuming 1.5% per year growth starting from 7.0 GtC/year in 2004. Note this ignores carbon emissions from deforestation, other greenhouse gases, etc. This curve is growing exponentially, not linearly.

Similarly, the bottom curve isn’t flat: it slopes down. They say the bottom curve is a “CO2 emissions path consistent with atmospheric CO2 stabilization at 500 ppm by 2125 akin to the Wigley, Richels, and Edmonds (WRE) family of stabilization curves described in [11], modified as described in Section 1 of the SOM text.”

Here reference [11] is:

• T. M. L. Wigley, in The Carbon Cycle, eds. T. M. L. Wigley and D. S. Schimel, Cambridge U. Press, Cambridge, 2000, pp. 258–276.

and the “SOM text” is the supporting online material for their paper, which unfortunately doesn’t seem to be available for free.


Our Future

11 November, 2010

I want to start talking about plans for cutting back carbon emissions, and some scenarios for what may happen, depending on what we do. We’ve got to figure this stuff out!

You’ve probably heard of 350.org, the grassroots organization that’s trying to cut CO2 levels from their current level of about 390 parts per million back down to 350. That’s a noble goal. However, even stabilizing at some much higher level will require a massive effort, given how long CO2 stays in the atmosphere:



In a famous 2004 paper, Pacala and Socolow estimated that in a “business-as-usual” scenario, carbon emissions would rise to 14 gigatons per year by 2054… while to keep CO2 below 500 ppm, they’d need to be held to 7 gigatons/year.

Alas, we’ve already gone up to 8 gigatons of carbon per year! How can we possibly keep things from getting much worse? Pacala and Socolow listed 15 measures, each of which could cut 1 gigaton of carbon per year:



(Click for a bigger image.)

Each one of these measures is big. For example, if you like nuclear power: build 700 gigawatts of nuclear power plants, doubling what we have now. But if you prefer wind: build turbines with 2000 gigawatts of peak capacity, 50 times what we have now. Or: build photovoltaic solar power plants with 2000 gigawatts of peak capacity, 700 times what we have now!

Now imagine doing lots of these things…

What if we do nothing? Some MIT scientists estimate that in a business-as-usual scenario, by 2095 there will be about 890 parts per million of CO2 in the atmosphere, and a 90% chance of a temperature increase between 3.5 and 7.3 degrees Celsius. Pick your scenario! The Stern Review on the Economics of Climate Change has a chart of the choices:



(Again, click for a bigger image.)

Of course the Stern Review has its detractors. I’m not claiming any of these issues are settled: I’m just trying to get the discussion started here. In the weeks to come, I want to go through plans and assessments in more detail, to compare them and try to find the truth.

Here are some assessments and projections I want us to discuss:

• Intergovernmental Panel on Climate Change, Fourth Assessment Report, Climate Change 2007.

• The Dutch Government, Assessing an IPCC Assessment.

• The Copenhagen Diagnosis. Summary on the Azimuth Project.

• National Research Council, Climate Stabilization Targets: Emissions, Concentrations, and Impacts over Decades to Millennia. Summary on the Azimuth Project.

• K. Anderson and A. Bows, Reframing the climate change challenge in light of post-2000 emission trends. Summary on the Azimuth Project.

• William D. Nordhaus, A Question of Balance: Weighing the Options on Global Warming Policies.

• The Stern Review on the Economics of Climate Change.

And here are some “plans of action”:

• The Kyoto Protocol.

• World Nuclear Association, Nuclear Century Outlook. Summary and critique on the Azimuth Project.

• Mark Z. Jacobson and Mark A. Delucchi, A path to sustainable energy: how to get all energy from wind, water and solar power by 2030. Summary and critique on the Azimuth Project.

• Joe Romm, How the world can (and will) stabilize at 350 to 450 ppm: The full global warming solution. Summary on the Azimuth Project.

• Stephen Pacala and Robert Socolow, Stabilization wedges: solving the climate problem for the next 50 years using current technologies. Summary on the Azimuth Project.

• New Economics Foundation, The Great Transition: A Tale of How it Turned Out Right. Summary on the Azimuth Project.

• The Union of Concerned Scientists, Climate 2030: A National Blueprint for a Clean Energy Economy.

• The Scottish Government, Renewables Action Plan.

• Bjørn Lomborg and the Copenhagen Business School, Smart Solutions to Climate Change.

As you can see, there’s already a bit about some of these on the Azimuth Project. I want more.

What are the most important things I’m missing on this list? I want broad assessments and projections of the world-wide situation on carbon emissions and energy, and even better, global plans of action. I want us to go through these, compare them, and try to understand where we stand.


This Week’s Finds (Week 305)

5 November, 2010

Nathan Urban has been telling us about a paper where he estimated the probability that global warming will shut down a major current in the Atlantic Ocean:

• Nathan M. Urban and Klaus Keller, Probabilistic hindcasts and projections of the coupled climate, carbon cycle and Atlantic meridional overturning circulation system: a Bayesian fusion of century-scale observations with a simple model, Tellus A, July 16, 2010.

We left off last time with a cliff-hanger: I didn’t let him tell us what the probability is! Since you must have been clutching your chair ever since, you’ll be relieved to hear that the answer is coming now, in the final episode of this interview.

But it’s also very interesting how he and Klaus Keller got their answer. As you’ll see, there’s some beautiful math involved. So let’s get started…

JB: Last time you told us roughly how your climate model works. This time I’d like to ask you about the rest of your paper, leading up to your estimate of the probability that the Atlantic Meridional Overturning Current (or "AMOC") will collapse. But before we get into that, I’d like to ask some very general questions.

For starters, why are scientists worried that the AMOC might collapse?

Last time I mentioned the Younger Dryas event, a time when Europe became drastically colder for about 1300 years, starting around 10,800 BC. Lots of scientists think this event was caused by a collapse of the AMOC. And lots of them believe it was caused by huge amounts of fresh water pouring into the north Atlantic from an enormous glacial lake. But nothing quite like that is happening now! So if the AMOC collapses in the next few centuries, the cause would have to be a bit different.

NU: In order for the AMOC to collapse, the overturning circulation has to weaken. The overturning is driven by the sinking of cold and salty, and therefore dense, water in the north Atlantic. Anything that affects the density structure of the ocean can alter the overturning.

As you say, during the Younger Dryas, it is thought that a lot of fresh water suddenly poured into the Atlantic from the draining of a glacial lake. This lessened the density of the surface waters and reduced the rate at which they sank, shutting down the overturning.

Since there aren’t any large glacial lakes left that could abruptly drain into the ocean, the AMOC won’t shut down in the same way it previously did. But it’s still possible that climate change could cause it to shut down. The surface waters from the north Atlantic can still freshen (and become less dense), either due to the addition of fresh water from melting polar ice and snow, or due to increased precipitation to the northern latitudes. In addition, they can simply become warmer, which also makes them less dense, reducing their sinking rate and weakening the overturning.

In combination, these three factors (warming, increased precipitation, meltwater) can theoretically shut down the AMOC if they are strong enough. This will probably not be as abrupt or extreme an event as the Younger Dryas, but it can still persistently alter the regional climate.

JB: I’m trying to keep our readers in suspense for a bit longer, but I don’t think it’s giving away too much to say that when you run your model, sometimes the AMOC shuts down, or at least slows down. Can you say anything about how this tends to happen, when it does? In your model, that is. Can you tell if it’s mainly warming, or increased precipitation, or meltwater?

NU: The short answer is "mainly warming, probably". The long answer:

I haven’t done experiments with the box model myself to determine this, but I can quote from the Zickfeld et al. paper where this model was published. It says, for their baseline collapse experiment,

In the box model the initial weakening of the overturning circulation is mainly due to thermal forcing […] This effect is amplified by a negative feedback on salinity, since a weaker circulation implies reduced salt advection towards the northern latitudes.

Even if they turn off all the freshwater input, they find substantial weakening of the AMOC from warming alone.

Freshwater could potentially become the dominant effect on the AMOC if more freshwater is added than in the paper’s baseline experiment. The paper did report computer experiments with different freshwater inputs, but upon skimming it, I can’t immediately tell whether the thermal effect loses its dominance.

These experiments have also been performed using more complex climate models. This paper reports that in all the models they studied, the AMOC weakening is caused more by changes in surface heat flux than by changes in surface water flux:

• J. M. Gregory et al., A model intercomparison of changes in the Atlantic thermohaline circulation in response to increasing atmospheric CO2 concentration, Geophysical Research Letters 32 (2005), L12703.

However, that paper studied "best-estimate" freshwater fluxes, not the fluxes on the high end of what’s possible, so I don’t know whether thermal effects would still dominate if the freshwater input ends up being large. There are papers that suggest freshwater input from Greenland, at least, won’t be a dominant factor any time soon:

• J. H. Jungclaus et al., Will Greenland melting halt the thermohaline circulation?, Geophysical Research Letters 33 (2006), L17708.

• E. Driesschaert et al., Modeling the influence of Greenland ice sheet melting on the Atlantic meridional overturning circulation during the next millennia, Geophysical Research Letters 34 (2007), L10707.

I’m not sure what the situation is for precipitation, but I don’t think that would be much larger than the meltwater flux. In summary, it’s probably the thermal effects that dominate, both in complex and simpler models.

Note that in our version of the box model, the precipitation and meltwater fluxes are combined into one number, the "North Atlantic hydrological sensitivity", so we can’t distinguish between those sources of water. This number is treated as uncertain in our analysis, lying within a range of possible values determined from the hydrologic changes predicted by complex models. The Zickfeld et al. paper experimented with separating them into the two individual contributions, but my version of the model doesn’t do that.

JB: Okay. Now back to what you and Klaus Keller actually did in your paper. You have a climate model with a bunch of adjustable knobs, or parameters. Some of these parameters you take as "known" from previous research. Others are more uncertain, and that’s where the Bayesian reasoning comes in. Very roughly, you use some data to guess the probability that the right settings of these knobs lie within any given range.

How many parameters do you treat as uncertain?

NU: 18 parameters in total. 7 model parameters that control dynamics, 4 initial conditions, and 7 parameters describing error statistics.

JB: What are a few of these parameters? Maybe you can tell us about some of the most important ones — or ones that are easy to understand.

NU: I’ve mentioned these briefly in "week304" in the model description. The AMOC-related parameter is the hydrologic sensitivity I described above, controlling the flux of fresh water into the North Atlantic.

There are three climate related parameters:

• the climate sensitivity (the equilibrium warming expected in response to doubled CO2),

• the ocean heat vertical diffusivity (controlling the rate at which oceans absorb heat from the atmosphere), and

• "aerosol scaling", a factor that multiplies the strength of the aerosol-induced cooling effect, mostly due to uncertainties in aerosol-cloud interactions.

I discussed these in "week302" in the part about total feedback estimates.

There are also three carbon cycle related parameters:

• the heterotrophic respiration sensitivity (describing how quickly dead plants decay when it gets warmer),

• CO2 fertilization (how much faster plants grow in CO2-elevated conditions), and

• the ocean carbon vertical diffusivity (the rate at which the oceans absorb CO2 from the atmosphere).

The initial conditions describe what the global temperature, CO2 level, etc. were at the start of my model simulations, in 1850. The statistical parameters describe the variance and autocorrelation of the residual error between the observations and the model, due to measurement error, natural variability, and model error.

JB: Could you say a bit about the data you use to estimate these uncertain parameters? I see you use a number of data sets.

NU: We use global mean surface temperature and ocean heat content to constrain the three climate parameters. We use atmospheric CO2 concentration and some ocean flux measurements to constrain the carbon parameters. We use measurements of the AMOC strength to constrain the AMOC parameter. These are all time series data, mostly global averages — except the AMOC strength, which is an Atlantic-specific quantity defined at a particular latitude.

The temperature data are taken by surface weather stations and are for the years 1850-2009. The ocean heat data are taken by shipboard sampling, 1953-1996. The atmospheric CO2 concentrations are measured from the Mauna Loa volcano in Hawaii, 1959-2009. There are also some ice core measurements of trapped CO2 at Law Dome, Antarctica, dated to 1854-1953. The air-sea CO2 fluxes, for the 1980s and 1990s, are derived from measurements of dissolved inorganic carbon in the ocean, combined with measurements of manmade chlorofluorocarbon to date the water masses in which the carbon resides. (The dates tell you when the carbon entered the ocean.)

The AMOC strength is reconstructed from station measurements of poleward water circulation over an east-west section of the Atlantic Ocean, near 25 °N latitude. Pairs of stations measure the northward velocity of water, inferred from the ocean bottom pressure differences between northward and southward station pairs. The velocities across the Atlantic are combined with vertical density profiles to determine an overall rate of poleward water mass transport. We use seven AMOC strength estimates measured sparsely between the years 1957 and 2004.

JB: So then you start the Bayesian procedure. You take your model, start it off with your 18 parameters chosen somehow or other, run it from 1850 to now, and see how well it matches all this data you just described. Then you tweak the parameters a bit — last time we called that "turning the knobs" — and run the model again. And then you do this again and again, lots of times. The goal is to calculate the probability that the right settings of these knobs lie within any given range.

Is that about right?

NU: Yes, that’s right.

JB: About how many times did you actually run the model? Is the sort of thing you can do on your laptop overnight, or is it a mammoth task?

NU: I ran the model a million times. This took about two days on a single CPU. Some of my colleagues later ported the model from Matlab to Fortran, and now I can do a million runs in half an hour on my laptop.

JB: Cool! So if I understand correctly, you generated a million lists of 18 numbers: those uncertain parameters you just mentioned.

Or in other words: you created a cloud of points: a million points in an 18-dimensional space. Each point is a choice of those 18 parameters. And the density of this cloud near any point should be proportional to the probability that the parameters have those values.

That’s the goal, anyway: getting this cloud to approximate the right probability density on your 18-dimensional space. To get this to happen, you used the Markov chain Monte Carlo procedure we discussed last time.

Could you say in a bit more detail how you did this, exactly?

NU: There are two steps. One is to write down a formula for the probability of the parameters (the "Bayesian posterior distribution"). The second is to draw random samples from that probability distribution using Markov chain Monte Carlo (MCMC).

Call the parameter vector θ and the data vector y. The Bayesian posterior distribution p(θ|y) is a function of θ which says how probable θ is, given the data y that you’ve observed. The little bar (|) indicates conditional probability: p(θ|y) is the probability of θ, assuming that you know y happened.

The posterior factorizes into two parts, the likelihood and the prior. The prior, p(θ), says how probable you think a particular 18-dimensional vector of parameters is, before you’ve seen the data you’re using. It encodes your "prior knowledge" about the problem, unconditional on that data.

The likelihood, p(y|θ), says how likely it is for the observed data to arise from a model run using some particular vector of parameters. It describes your data generating process: assuming you know what the parameters are, how likely are you to see data that looks like what you actually measured? (The posterior is the reverse of this: how probable are the parameters, assuming the data you’ve observed?)

Bayes’s theorem simply says that the posterior is proportional to the product of these two pieces:

p(θ|y) ∝ p(y|θ) × p(θ)

If I know the two pieces, I multiply them together and use MCMC to sample from that probability distribution.

Where do the pieces come from? For the prior, we assumed bounded uniform distributions on all but one parameter. Such priors express the belief that each parameter lies within some range we deemed reasonable, but we are agnostic about whether one value within that range is more probable than any other. The exception is the climate sensitivity parameter. We have prior evidence from computer models and paleoclimate data that the climate sensitivity is most likely around 2 or 3 °C, albeit with significant uncertainties. We encoded this belief using a "diffuse" Cauchy distribution peaked in this range, but allowing substantial probability to be outside it, so as to not prematurely exclude too much of the parameter range based on possibly overconfident prior beliefs. We assume the priors on all the parameters are independent of each other, so the prior for all of them is the product of the prior for each of them.

For the likelihood, we assumed a normal (Gaussian) distribution for the residual error (the scatter of the data about the model prediction). The simplest such distribution is the independent and identically distributed ("iid") normal distribution, which says that all the data points have the same error and the errors at each data point are independent of each other. Neither of these assumptions is true. The errors are not identical, since they get bigger farther in the past, when we measured data with less precision than we do today. And they’re not independent, because if one year is warmer than the model predicts, the next year is likely to be warmer than the model predicts, too. There are various possible reasons for this: chaotic variability, time lags in the system due to finite heat capacity, and so on.

In this analysis, we kept the identical-error assumption for simplicity, even though it’s not correct. I think this is justifiable, because the strongest constraints on the parameters come from the most recent data, when the largest climate and carbon cycle changes have occurred. That is, the early data are already relatively uninformative, so if their errors get bigger, it doesn’t affect the answer much.

We rejected the independent-error assumption, since there is very strong autocorrelation (serial dependence) in the data, and ignoring autocorrelation is known to lead to overconfidence. When the errors are correlated, it’s harder to distinguish between a short-term random fluctuation and a true trend, so you should be more uncertain about your conclusions. To deal with this, we assumed that the errors obey a correlated autoregressive "red noise" process instead of an uncorrelated "white noise" process. In the likelihood, we converted the red-noise errors to white noise via a "whitening" process, assuming we know how much correlation is present. (We’re allowed to do that in the likelihood, because it gives the probability of the data assuming we know what all the parameters are, and the autocorrelation is one of the parameters.) The equations are given in the paper.

Finally, this gives us the formula for our posterior distribution.
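
To make the "whitening" step concrete: if the residual errors follow an AR(1) process with known lag-1 autocorrelation ρ and innovation standard deviation σ, you can transform them into independent Gaussian variables and then use an ordinary Gaussian likelihood. Here is a minimal sketch of that textbook transformation; the paper gives its own equations, which may differ in the details:

import numpy as np

def ar1_log_likelihood(residuals, rho, sigma):
    # residuals: observations minus model predictions (a 1-D time series)
    # rho:       lag-1 autocorrelation of the AR(1) error process
    # sigma:     standard deviation of the white-noise innovations
    r = np.asarray(residuals, dtype=float)
    white = np.empty_like(r)
    white[0] = np.sqrt(1.0 - rho**2) * r[0]   # stationary starting point
    white[1:] = r[1:] - rho * r[:-1]          # whitened: independent, each with variance sigma^2
    n = r.size
    return (-0.5 * np.sum(white**2) / sigma**2
            - 0.5 * n * np.log(2.0 * np.pi * sigma**2)
            + 0.5 * np.log(1.0 - rho**2))     # normalization term for the first point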

JB: Great! There’s a lot of technical material here, so I have many questions, but let’s go through the whole story first, and come back to those.

NU: Okay. Next comes step two, which is to draw random samples from the posterior probability distribution via MCMC.

To do this, we use the famous Metropolis algorithm, which was invented by a physicist of that name, along with others, to do computations in statistical physics. It’s a very simple algorithm which takes a "random walk" through parameter space.

You start out with some guess for the parameters. You randomly perturb your guess to a nearby point in parameter space, which you are going to propose to move to. If the new point is more probable than the point you were at (according to the Bayesian posterior distribution), then accept it as a new random sample. If the proposed point is less probable than the point you’re at, then you randomly accept the new point with a certain probability. Otherwise you reject the move, staying where you are, treating the old point as a duplicate random sample.

The acceptance probability is equal to the ratio of the posterior distribution at the new point to the posterior distribution at the old point. If the point you’re proposing to move to is, say, 5 times less probable than the point you are at now, then there’s a 20% chance you should move there, and an 80% chance that you should stay where you are.

If you iterate this method of proposing new "jumps" through parameter space, followed by the Metropolis accept/reject procedure, you can prove that you will eventually end up with a long list of (correlated) random samples from the Bayesian posterior distribution.
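
For concreteness, here is that random-walk Metropolis recipe in a few lines of code. It’s the generic algorithm just described, not the code actually used in the paper:

import numpy as np

def metropolis(log_posterior, theta0, step_sizes, n_samples, rng=None):
    # log_posterior: function returning log p(theta | y), up to an additive constant
    # theta0:        starting point in parameter space (1-D array)
    # step_sizes:    scale of the Gaussian proposal in each dimension
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta)
    samples = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        proposal = theta + step_sizes * rng.standard_normal(theta.size)
        logp_new = log_posterior(proposal)
        # Accept with probability min(1, p_new / p_old); work in logs to avoid overflow.
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new   # accept the move
        samples[i] = theta                     # if rejected, keep the old point as a duplicate sample
    return samples

In the 5-to-1 example above, logp_new - logp = log(1/5) ≈ -1.6, and exp(-1.6) ≈ 0.2, so the proposed move gets accepted about 20% of the time.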

JB: Okay. Now let me ask a few questions, just to help all our readers get up to speed on some jargon.

Lots of people have heard of a "normal distribution" or "Gaussian", because it’s become sort of the default choice for probability distributions. It looks like a bell curve:

When people don’t know the probability distribution of something — like the tail lengths of newts or the IQs of politicians — they often assume it’s a Gaussian.

But I bet fewer of our readers have heard of a "Cauchy distribution". What’s the point of that? Why did you choose that for your prior probability distribution of the climate sensitivity?

NU: There is a long-running debate about the "upper tail" of the climate sensitivity distribution. High climate sensitivities correspond to large amounts of warming. As you can imagine, policy decisions depend a lot on how likely we think these extreme outcomes could be, i.e., how quickly the "upper tail" of the probability distribution drops to zero.

A Gaussian distribution has tails that drop off exponentially quickly, so very high sensitivities will never get any significant weight. If we used it for our prior, then we’d almost automatically get a "thin tailed" posterior, no matter what the data say. We didn’t want to put that in by assumption and automatically conclude that high sensitivities should get no weight, regardless of what the data say. So we used a weaker assumption, which is a "heavy tailed" prior distribution. With this prior, the probability of large amounts of warming drops off more slowly, as a power law, instead of exponentially fast. If the data strongly rule out high warming, we can get a thin tailed posterior, but if they don’t, it will be heavy tailed. The Cauchy distribution, a limiting case of the "Student t" distribution that students of statistics may have heard of, is one of the most conservative choices for a heavy-tailed prior. Probability drops off so slowly at its tails that its variance is infinite.

JB: The issue of "fat tails" is also important in the stock market, where big crashes happen more frequently than you might guess with a Gaussian distribution. After the recent economic crisis I saw a lot of financiers walking around with their tails between their legs, wishing their tails had been fatter.

I’d also like to ask about "white noise" versus "red noise". "White noise" is a mathematical description of a situation where some quantity fluctuates randomly with time in such a way that its value at any time is completely uncorrelated with its value at any other time. If you graph an example of white noise, it looks really spiky:



If you play it as a sound, it sounds like hissy static — quite unpleasant. If you could play it in the form of light, it would look white, hence the name.

"Red noise" is less wild. Its value at any time is still random, but it’s correlated to the values at earlier or later times, in a specific way. So it looks less spiky:



and it sounds less high-pitched, more like a steady rainfall. Since it’s stronger at low frequencies, it would look more red if you could play it in the form of light — hence the name "red noise".

If I understand correctly, you’re assuming that some aspects of the climate are noisy, but in a red noise kind of way, when you’re computing p(y|θ): the likelihood that your data takes on the value y, given your climate model with some specific choice of parameters θ.

Is that right? You’re assuming this about all your data: the temperature data from weather stations, the ocean heat data from shipboard samples, the atmospheric CO2 concentrations at Mauna Loa volcano in Hawaii, the ice core measurements of trapped CO2, the air-sea CO2 fluxes, and also the AMOC strength? Red, red, red — all red noise?

NU: I think the red noise you’re talking about refers to a specific type of autocorrelated noise ("Brownian motion"), with a power spectrum that is inversely proportional to the square of frequency. I’m using "red noise" more generically to speak of any autocorrelated process that is stronger at low frequencies. Specifically, the process we use is a first-order autoregressive, or "AR(1)", process. It has a more complicated spectrum than Brownian motion.

JB: Right, I was talking about "red noise" of a specific mathematically nice sort, but that’s probably less convenient for you. AR(1) sounds easier for computers to generate.

NU: It’s not only easier for computers, but closer to the spectrum we see in our analysis.

Note that when I talk about error I mean "residual error", which is the difference between the observations and the model prediction. If the residual error is correlated in time, that doesn’t necessarily reflect true red noise in the climate system. It could also represent correlated errors in measurement over time, or systematic errors in the model. I am not attempting to distinguish between all these sources of error. I’m just lumping them all together into one total error process, and assuming it has a simple statistical form.

We assume the residual errors in the annual surface temperature, ocean heat, and instrumental CO2 time series are AR(1). The ice core CO2, air-sea CO2 flux, and AMOC strength data are sparse, and we can’t really hope to estimate the correlation between them, so we assume their residual errors are uncorrelated.

Speaking of correlation, I’ve been talking about "autocorrelation", which is correlation within one data set between one time and another. It’s also possible for the errors in different data sets to be correlated with each other ("cross correlation"). We assumed there is no cross correlation (and residual analysis suggests only weak correlation between data sets).

JB: I have a few more technical questions, but I bet most of our readers are eager to know: so, what next?

You use all these nifty mathematical methods to work out p(θ|y), the probability that your 18 parameters have any specific value given your data. And now I guess you want to figure out the probability that the Atlantic Meridional Overturning Current, or AMOC, will collapse by some date or other.

How do you do this? I guess most people want to know the answer more than the method, but they’ll just have to wait a few more minutes.

NU: That’s easy. After MCMC, we have a million runs of the model, sampled in proportion to how well the model fits historic data. There will be lots of runs that agree well with the data, and a few that agree less well. All we do now is extend each of those runs into the future, using an assumed scenario for what CO2 emissions and other radiative forcings will do in the future. To find out the probability that the AMOC will collapse by some date, conditional on the assumptions we’ve made, we just count what fraction of the runs have an AMOC strength of zero in whatever year we care about.
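
In code, that counting step is about as simple as it sounds. Here’s a sketch with made-up variable names, since the real runs come out of the calibrated model:

import numpy as np

def collapse_probability(amoc, years, year):
    # amoc:  array of shape (n_runs, n_years), projected AMOC strength for each posterior sample
    # years: 1-D array of the calendar years corresponding to the columns of `amoc`
    # Returns the fraction of runs in which the AMOC strength has hit zero by the given year.
    last = np.searchsorted(years, year)
    collapsed = (amoc[:, :last + 1] <= 0.0).any(axis=1)
    return collapsed.mean()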

JB: Okay, that’s simple enough. What scenario, or scenarios, did you consider?

NU: We considered a worst-case "business as usual" scenario in which we continue to burn fossil fuels at an accelerating rate until we start to run out of them, and eventually burn the maximum amount of fossil fuels we think there might be remaining (about 5000 gigatons worth of carbon, compared to the roughly 500 gigatons we’ve emitted so far). This assumes we get desperate for cheap energy and extract all the hard-to-get fossil resources in oil shales and tar sands, all the remaining coal, etc. It doesn’t necessarily preclude the use of non-fossil energy; it just assumes that our appetite for energy grows so rapidly that there’s no incentive to slow down fossil fuel extraction. We used a simple economic model to estimate how fast we might do this, if the world economy continues to grow at a similar rate to the last few decades.

JB: And now for the big question: what did you find? How likely is it that the AMOC will collapse, according to your model? Of course it depends how far into the future you look.

NU: We find a negligible probability that the AMOC will collapse this century. The odds start to increase around 2150, rising to about a 10% chance by 2200, and a 35% chance by 2300, the last year considered in our scenario.

JB: I guess one can take this as good news or really scary news, depending on how much you care about folks who are alive in 2300. But I have some more questions. First, what’s a "negligible probability"?

NU: In this case, it’s less than 1 in 3000. For computational reasons, we only ran 3000 of the million samples forward into the future. There were no samples in this smaller selection that had the AMOC collapsed in 2100. The probability rises to 1 in 3000 in the year 2130 (the first time I see a collapse in this smaller selection), and 1% in 2152. You should take these numbers with a grain of salt. It’s these rare "tail-area events" that are most sensitive to modeling assumptions.

JB: Okay. And second, don’t the extrapolations become more unreliable as you keep marching further into the future? You need to model not only climate physics but also the world economy. In this calculation, how many gigatons of carbon dioxide per year are you assuming will be emitted in 2300? I’m just curious. In 1998 it was about 27.6 gigatons. By 2008, it was about 30.4.

NU: Yes, the uncertainty grows with time (and this is reflected in our projections). And in considering a fixed emissions scenario, we’ve ignored the economic uncertainty, which, so far out into the future, is even larger than the climate uncertainty. Here we’re concentrating on just the climate uncertainty, and are hoping to get an idea of bounds, so we used something close to a worst-case economic scenario. In this scenario carbon emissions peak around 2150 at about 23 gigatons carbon per year (84 gigatons CO2). By 2300 they’ve tapered off to about 4 GtC (15 GtCO2).

Actual future emissions may be less than this, if we act to reduce them, or there are fewer economically extractable fossil resources than we assume, or the economy takes a prolonged downturn, etc. Actually, it’s not completely an economic worst case; it’s possible that the world economy could grow even faster than we assume. And it’s not the worst case scenario from a climate perspective, either. For example, we don’t model potential carbon emissions from permafrost or methane clathrates. It’s also possible that climate sensitivity could be higher than what we find in our analysis.
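
A note on units, since the conversation switches between them: emissions are quoted sometimes in gigatons of carbon (GtC) and sometimes in gigatons of carbon dioxide (GtCO2), and the conversion factor is just the ratio of molecular weights, 44/12 ≈ 3.67. For example:

GTC_TO_GTCO2 = 44.0 / 12.0            # each ton of carbon becomes 44/12 tons of CO2 when burned

print(round(23 * GTC_TO_GTCO2))       # 23 GtC/year is about 84 GtCO2/year (the 2150 peak above)
print(round(4 * GTC_TO_GTCO2))        # 4 GtC/year is about 15 GtCO2/year (the 2300 figure)
print(round(30.4 / GTC_TO_GTCO2, 1))  # 30.4 GtCO2 emitted in 2008 is about 8.3 GtC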

JB: Why even bother projecting so far out into the future, if it’s so uncertain?

NU: The main reason is because it takes a while for the AMOC to weaken, so if we’re interested in what it would take to make it collapse, we have to run the projections out a few centuries. But another motivation for writing this paper is policy related, having to do with the concept of "climate commitment" or "triggering". Even if it takes a few centuries for the AMOC to collapse, it may take less time than that to reach a "point of no return", where a future collapse has already been unavoidably "triggered". Again, to investigate this question, we have to run the projections out far enough to get the AMOC to collapse.

We define "the point of no return" to be a point in time which, if CO2 emissions were immediately reduced to zero and kept there forever, the AMOC would still collapse by the year 2300 (an arbitrary date chosen for illustrative purposes). This is possible because even if we stop emitting new CO2, existing CO2 concentrations, and therefore temperatures, will remain high for a long time (see "week303").

In reality, humans wouldn’t be able to reduce emissions instantly to zero, so the actual "point of no return" would likely be earlier than what we find in our study. We couldn’t economically reduce emissions fast enough to avoid triggering an AMOC collapse. (In this study we ignore the possibility of negative carbon emissions, that is, capturing CO2 directly from the atmosphere and sequestering it for a long period of time. We’re also ignoring the possibility of climate geoengineering, which is global cooling designed to cancel out greenhouse warming.)

So what do we find? Although we calculate a negligible probability that the AMOC will collapse by the end of this century, the probability that, in this century, we will commit later generations to a collapse (by 2300) is almost 5%. The probabilities of "triggering" rise rapidly, to almost 20% by 2150 and about 33% by 2200, even though the probability of experiencing a collapse by those dates is about 1% and 10%, respectively. You can see it in this figure from our paper:



The take-home message is that while most climate projections are currently run out to 2100, we shouldn’t fixate only on what might happen to people this century. We should consider what climate changes our choices in this century, and beyond, are committing future generations to experiencing.

JB: That’s a good point!

I’d like to thank you right now for a wonderful interview, that really taught me — and I hope our readers — a huge amount about climate change and climate modelling. I think we’ve basically reached the end here, but as the lights dim and the audience files out, I’d like to ask just a few more technical questions.

One of them was raised by David Tweed. He pointed out that while you’re "training" your model on climate data from the last 150 years or so, you’re using it to predict the future in a world that will be different in various ways: a lot more CO2 in the atmosphere, hotter, and so on. So, you’re extrapolating rather than interpolating, and that’s a lot harder. It seems especially hard if the collapse of the AMOC is a kind of "tipping point" — if it suddenly snaps off at some point, instead of linearly decreasing as some parameter changes.

This raises the question: why should we trust your model, or any model of this sort, to make such extrapolations correctly? In the discussion after that comment, I think you said that ultimately it boils down to

1) whether you think you have the physics right,

and

2) whether you think the parameters change over time.

That makes sense. So my question is: what are some of the best ways people could build on the work you’ve done, and make more reliable predictions about the AMOC? There’s a lot at stake here!

NU: Our paper is certainly an early step in making probabilistic AMOC projections, with room for improvement. I view the main points as (1) estimating how large the climate-related uncertainties may be within a given model, and (2) illustrating the difference between experiencing, and committing to, a climate change. It’s certainly not an end-all "prediction" of what will happen 300 years from now, taking into account all possible model limitations, economic uncertainties, etc.

To answer your question, the general ways to improve predictions are to improve the models, and/or improve the data constraints. I’ll discuss both.

Although I’ve argued that our simple box model reasonably reproduces the dynamics of the more complex model it was designed to approximate, that complex model itself isn’t the best model available for the AMOC. The problem with using complex climate models is that it’s computationally impossible to run them millions of times. My solution is to work with "statistical emulators", which are tools for building fast approximations to slow models. The idea is to run the complex model a few times at different points in its parameter space, and then statistically interpolate the resulting outputs to predict what the model would have output at nearby points. This works if the model output is a smooth enough function of the parameters, and there are enough carefully-chosen "training" points.

From an oceanographic standpoint, even current complex models are probably not wholly adequate (see the discussion at the end of "week304"). There is some debate about whether the AMOC becomes more stable as the resolution of the model increases. On the other hand, people still have trouble getting the AMOC in models, and the related climate changes, to behave as abruptly as they apparently did during the Younger Dryas. I think the range of current models is probably in the right ballpark, but there is plenty of room for improvement. Model developers continue to refine their models, and ultimately, the reliability of any projection is constrained by the quality of models available.

Another way to improve predictions is to improve the data constraints. It’s impossible to go back in time and take better historic data, although with things like ice cores, it is possible to dig up new cores to analyze. It’s also possible to improve some historic "data products". For example, the ocean heat data is subject to a lot of interpolation of sparse measurements in the deep ocean, and one could potentially improve the interpolation procedure without going back in time and taking more data. There are also various corrections being applied for known biases in the data-gathering instruments and procedures, and it’s possible those could be improved too.

Alternatively, we can simply wait. Wait for new and more precise data to become available.

But when I say "improve the data constraints", I’m mostly talking about adding more of them, that I simply didn’t include in the analysis, or looking at existing data in more detail (like spatial patterns instead of global averages). For example, the ocean heat data mostly serves to constrain the vertical mixing parameter, controlling how quickly heat penetrates into the deep ocean. But we can also look at the penetration of chemicals in the ocean (such carbon from fossil fuels, or chlorofluorocarbons). This is also informative about how quickly water masses mix down to the ocean depths, and indirectly informative about how fast heat mixes. I can’t do that with my simple model (which doesn’t have the ocean circulation of any of these chemicals in it), but I can with more complex models.

As another example, I could constrain the climate sensitivity parameter better with paleoclimate data, or more resolved spatial data (to try to, e.g., pick up the spatial fingerprint of industrial aerosols in the temperature data), or by looking at data sets informative about particular feedbacks (such as water vapor), or at satellite radiation budget data.

There is a lot of room for reducing uncertainties by looking at more and more data sets. However, this presents its own problems. Not only is this simply harder to do, but it runs more directly into limitations in the models and data. For example, if I look at what ocean temperature data implies about a model’s vertical mixing parameter, and what ocean chemical data imply, I might find that they imply two inconsistent values for the parameter! Or that those data imply a different mixing than is implied by AMOC strength measurements. This can happen if there are flaws in the model (or in the data). We have some evidence from other work that there are circumstances in which this can happen:

• A. Schmittner, N. M. Urban, K. Keller and D. Matthews, Using tracer observations to reduce the uncertainty of ocean diapycnal mixing and climate-carbon cycle projections, Global Biogeochemical Cycles 23 (2009), GB4009.

• M. Goes, N. M. Urban, R. Tonkonojenkov, M. Haran, and K. Keller, The skill of different ocean tracers in reducing uncertainties about projections of the Atlantic meridional overturning circulation, Journal of Geophysical Research — Oceans, in press (2010).

How to deal with this, if and when it happens, is an open research challenge. To an extent it depends on expert judgment about which model features and data sets are "trustworthy". Some say that expert judgment renders conclusions subjective and unscientific, but as a scientist, I say that such judgments are always applied! You always weigh how much you trust your theories and your data when deciding what to conclude about them.

In my response I’ve so far ignored the part about parameters changing in time. I think the hydrological sensitivity (North Atlantic freshwater input as a function of temperature) can change with time, and this could be improved by using a better climate model that includes ice and precipitation dynamics. Feedbacks can fluctuate in time, but I think it’s okay to treat them as a constant for long term projections. Some of these parameters can also be spatially dependent (e.g., the respiration sensitivity in the carbon cycle). I think treating them all as constant is a decent first approximation for the sorts of generic questions we’re asking in the paper. Also, all the parameter estimation methods I’ve described only work with static parameters. For time varying parameters, you need to get into state estimation methods like Kalman or particle filters.
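
To make the "statistical emulator" idea mentioned above a bit more concrete: one common choice is a Gaussian process fit to a modest number of runs of the slow model, which can then predict (with an error bar) what the model would have output at parameter settings that were never actually run. This is only a generic sketch with a toy stand-in for the slow model, not the authors’ setup:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_model(theta):
    # Stand-in for a slow climate model: takes a parameter vector, returns a scalar output.
    return np.sin(theta[0]) + 0.5 * theta[1] ** 2   # toy example only

# Run the slow model at a small number of design points in parameter space...
rng = np.random.default_rng(0)
design = rng.uniform(-2, 2, size=(30, 2))
outputs = np.array([expensive_model(t) for t in design])

# ...fit a Gaussian process to those (input, output) pairs...
emulator = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
emulator.fit(design, outputs)

# ...and now the emulator cheaply predicts the model output, with an uncertainty estimate,
# at parameter settings where the slow model was never actually run.
prediction, std = emulator.predict(np.array([[0.3, -1.2]]), return_std=True)
print(prediction[0], std[0])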

JB: I also have another technical question, which is about the Markov chain Monte Carlo procedure. You generate your cloud of points in 18-dimensional space by a procedure where you keep either jumping randomly to a nearby point, or staying put, according to that decision procedure you described. Eventually this cloud fills out to a good approximation of the probability distribution you want. But, how long is "eventually"? You said you generated a million points. But how do you know that’s enough?

NU: This is something of an art. Although there is an asymptotic convergence theorem, there is no general way of knowing whether you’ve reached convergence. First you check to see whether your chains "look right". Are they sweeping across the full range of parameter space where you expect significant probability? Are they able to complete many sweeps (thoroughly exploring parameter space)? Is the Metropolis test accepting a reasonable fraction of proposed moves? Do you have enough effective samples in your Markov chain? (MCMC generates correlated random samples, so there are fewer "effectively independent" samples in the chain than there are total samples.) Then you can do consistency checks: start the chains at several different locations in parameter space, and see if they all converge to similar distributions.

If the posterior distribution shows, or is expected to show, a lot of correlation between parameters, you have to be more careful to ensure convergence. You want to propose moves that carry you along the "principal components" of the distribution, so you don’t waste time trying to jump away from the high probability directions. (Roughly, if your posterior density is concentrated on some low dimensional manifold, you want to construct your way of moving around parameter space to stay near that manifold.) You also have to be careful if you see, or expect, multimodality (multiple peaks in the probability distribution). It can be hard for MCMC to move from one mode to another through a low-probability "wasteland"; it won’t be inclined to jump across it. There are more advanced algorithms you can use in such situations, if you suspect you have multimodality. Otherwise, you might discover later that you only sampled one peak, and never noticed that there were others.
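
One widely used version of that multiple-chains consistency check is the Gelman-Rubin statistic, which compares the spread between independently started chains to the spread within each chain; values close to 1 suggest the chains have converged to the same distribution. Here is a minimal sketch for a single parameter (a standard diagnostic, not necessarily the exact checks used here):

import numpy as np

def gelman_rubin(chains):
    # chains: array of shape (n_chains, n_samples) for one parameter,
    #         with each chain started from a different point in parameter space
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = np.mean(np.var(chains, axis=1, ddof=1))        # average within-chain variance
    between = n * np.var(np.mean(chains, axis=1), ddof=1)   # variance between the chain means
    pooled = (n - 1) / n * within + between / n             # pooled estimate of the posterior variance
    return np.sqrt(pooled / within)                         # values near 1 suggest convergence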

JB: Did you do some of these things when testing out the model in your paper? Do you have any intuition for the "shape" of the probability distribution in 18-dimensional space that lies at the heart of your model? For example: do you know if it has one peak, or several?

NU: I’m pretty confident that the MCMC in our analysis is correctly sampling the shape of the probability distribution. I ran lots and lots of analyses, starting the chain in different ways, tweaking the proposal distribution (jumping rule), looking at different priors, different model structures, different data, and so on.

It’s hard to "see" what an 18-dimensional function looks like, but we have 1-dimensional and 2-dimensional projections of it in our paper:





I don’t believe that it has multiple peaks, and I don’t expect it to. Multiple peaks usually show up when the model behavior is non-monotonic as a function of the parameters. This can happen in really nonlinear systems (and with threshold systems like the AMOC), but during the historic period I’m calibrating the model to, I see no evidence of this in the model.

There are correlations between parameters, so there are certain "directions" in parameter space that the posterior distribution is oriented along. And the distribution is not Gaussian. There is evidence of skew, and nonlinear correlations between parameters. Such correlations appear when the data are insufficient to completely identify the parameters (i.e., different combinations of parameters can produce similar model output). This is discussed in more detail in another of our papers:

• Nathan M. Urban and Klaus Keller, Complementary observational constraints on climate sensitivity, Geophysical Research Letters 36 (2009), L04708.

In a Gaussian distribution, the distribution of any pair of parameters will look ellipsoidal, but our distribution has some "banana" or "boomerang" shaped pairwise correlations. This is common, for example, when the model output is a function of the product of two parameters.

JB: Okay. It’s great that we got a chance to explore some of the probability theory and statistics underlying your work. It’s exciting for me to see these ideas being used to tackle a big real-life problem. Thanks again for a great interview.


Maturity is the capacity to endure uncertainty. – John Finley


2010 Singapore Energy Lecture

2 November, 2010

Yesterday morning Lisa and I took a bus downtown to see the Prime Minister of Singapore, Lee Hsien Loong, give the Singapore Energy Lecture. It was part of a big annual event, the Singapore International Energy Week. We met our friend Walter Blackstock, had a last cup of coffee, and filed into a banquet room, along with about 800 businessmen, to see what the Prime Minister had to say.

His lecture was clear and precise — very different from the rhetoric-filled talk I’m used to hearing from American politicians when it comes to energy. He began by discussing the twin problems of peak oil and global warming.

‘Peak oil’ refers to the idea that oil production, having risen, is bound to eventually fall; of course this idea gains teeth when one believes it will fall rather soon. Mr Lee gave some evidence that oil production will fall in the next few decades, but then pointed out that similar predictions had been made before, and had turned out to be wrong. He concluded in an agnostic vein, and added that there are still huge supplies of coal, which become more useful as gasification technologies improve. What interested me was his use of the term ‘peak oil’, which I’ve never heard from the lips of an American president. But of course, I’ve never seen one speak at an energy conference.

He then noted that even if there are plenty of fossil fuels, burning them leads to global warming. He mentioned the 2010 United Nations Climate Change Conference, which will be held in Cancún, Mexico, from 29 November to 10 December. He said that if an agreement is reached, Singapore would abide by it and impose a price on carbon. He said:

“If there’s a global regime to curb carbon emissions, that means that Singapore will have to reduce our own emissions more sharply than we are doing now, in order to comply with international obligations, and we would have to make the carbon price explicit to send the right price signals.”

Increases in efficiency would not be enough, he noted, because of the ‘rebound effect’: more efficient energy usage just makes people use more energy. He also said:

“At present, we don’t have a carbon tax, but we calculate a shadow price for carbon in our cost-benefit analysis so that Government policies and decision making can be better-informed and rational.”

On the other hand, he said, if an agreement is not reached, uncertainty will continue to prevail about when the problem of global warming will finally be addressed. Given this, he said, Singapore supports the goal of reaching an agreement. He explicitly noted that this was a ‘commons’ problem: every individual nation stands to benefit by being the only one who continues to burn lots of carbon, but if every nation does this, the world will be harmed.

He was not optimistic about an agreement being reached in Cancún; he mentioned how Obama had begun his term in office strongly supporting climate change legislation, but was unable to stick with this intention. Nonetheless, he seemed to indicate that a price on carbon was inevitable in the long term.

He discussed three main ways to reduce carbon usage:

1) switching to sustainable forms of energy production like solar, wind and hydropower,

2) technologies such as carbon sequestration and nuclear power, and

3) conservation.

He said that Singapore is “an alternative energy-disadvantaged country”, so option 1) is limited. He explicitly mentioned that most sustainable forms of energy have a low ‘power density’, and again his correct use of jargon pleased me. He said that even if every building in Singapore was covered with solar cells, that would only generate 10% of the necessary power.

He said that the use of nuclear power was an option one cannot afford to dismiss:

“There is often strong resistance in countries – from the green movement, from populations who have witnessed accidents like Chernobyl, and are fearful and anxious about their safety. But if we look at this rationally, without nuclear energy, the world cannot make sufficient progress in dealing with global warming”.

He pointed out that America is beginning to build more nuclear plants, and that Angela Merkel, despite great pressure from the Greens, had put off the closure of such plants in Germany. He said that more plants would eventually be built in Germany, even though it’s “unspeakable” now.

He argued that it is important to start moving forward on this issue, even in a small state like Singapore where any nuclear plant would necessarily be close to densely populated areas. The crucial first step, he said, is to develop the technical know-how and the necessary “culture of safety”. When the moderator asked him whether nuclear power would be introduced during his time in office, he replied:

“I would say possibly during my lifetime.”

He also spent a lot of time discussing option 3), energy conservation. He said Singapore has a pilot program for a “smart grid” that lets households see how they’re using electrical power, and that if this turns out to increase their efficiency, it will be adopted more widely.

All in all, an interesting talk.


Information Geometry (Part 5)

2 November, 2010

I’m trying to understand the Fisher information metric and how it’s related to Öttinger’s formalism for ‘dissipative mechanics’ — that is, mechanics including friction. They involve similar physics, and they involve similar math, but it’s not quite clear how they fit together.

I think it will help to do an example. The harmonic oscillator is a trusty workhorse throughout physics, so let’s do that.

So: suppose you have a rock hanging on a spring, and it can bounce up and down. Suppose it’s in thermal equilibrium with its environment. It will wiggle up and down ever so slightly, thanks to thermal fluctuations. The hotter it is, the more it wiggles. These vibrations are random, so its position and momentum at any given moment can be treated as random variables.

If we take quantum mechanics into account, there’s an extra source of randomness: quantum fluctuations. Now there will be fluctuations even at zero temperature. Ultimately this is due to the uncertainty principle. Indeed, if you know the position for sure, you can’t know the momentum at all!

Let’s see how the position, momentum and energy of our rock will fluctuate given that we know all three of these quantities on average. The fluctuations will form a little fuzzy blob, roughly ellipsoidal in shape, in the 3-dimensional space whose coordinates are position, momentum and energy:

Yeah, I know you’re sick of this picture, but this time it’s for real: I want to calculate what this ellipsoid actually looks like! I’m not promising I’ll do it — I may get stuck, or bored — but at least I’ll try.

Before I start the calculation, let’s guess the answer. A harmonic oscillator has a position q and momentum p, and its energy is

H = \frac{1}{2}(q^2 + p^2)

Here I’m working in units where lots of things equal 1, to keep things simple.

You’ll notice that this energy has rotational symmetry in the position-momentum plane. This is ultimately what makes the harmonic oscillator such a beloved physical system. So, we might naively guess that our little ellipsoid will have rotational symmetry as well, like this:

or this:

Here I’m using the x and y coordinates for position and momentum, while the z coordinate stands for energy. So in these examples the position and momentum fluctuations are the same size, while the energy fluctuations, drawn in the vertical direction, might be bigger or smaller.

Unfortunately, this guess really is naive. After all, there are lots of these ellipsoids, one centered at each point in position-momentum-energy space. Remember the rules of the game! You give me any point in this space. I take the coordinates of this point as the mean values of position, momentum and energy, and I find the maximum-entropy state with these mean values. Then I work out the fluctuations in this state, and draw them as an ellipsoid.

If you pick a point where position and momentum have mean value zero, you haven’t broken the rotational symmetry of the problem. So, my ellipsoid must be rotationally symmetric. But if you pick some other mean value for position and momentum, all bets are off!

Fortunately, this naive guess is actually right: all the ellipsoids are rotationally symmetric — even the ones centered at nonzero values of position and momentum! We’ll see why soon. And if you’ve been following this series of posts, you’ll know what this implies: the “Fisher information metric” g on position-momentum-energy space has rotational symmetry about any vertical axis. (Again, I’m using the vertical direction for energy.) So, if we slice this space with any horizontal plane, the metric on this plane must be the plane’s usual metric times a constant:

g = \mathrm{constant} \, (dq^2 + dp^2)

Why? Because only the usual metric on the plane, or any multiple of it, has ordinary rotations around every point as symmetries.

So, roughly speaking, we’re recovering the ‘obvious’ geometry of the position-momentum plane from the Fisher information metric. We’re recovering ‘ordinary’ geometry from information geometry!

But this should not be terribly surprising, since we used the harmonic oscillator Hamiltonian

H = \frac{1}{2}(q^2 + p^2)

as an input to our game. It’s mainly just a confirmation that things are working as we’d hope.

There’s more, though. Last time I realized that because observables in quantum mechanics don’t commute, the Fisher information metric has a curious skew-symmetric partner called \omega. So, we should also study this in our example. And when we do, we’ll see that restricted to any horizontal plane in position-momentum-energy space, we get

\omega = \mathrm{constant} \, (dq \, dp - dp \, dq)

This looks like a mutant version of the Fisher information metric

g = \mathrm{constant} \, (dq^2 + dp^2)

and if you know your geometry, you’ll know it’s the usual ‘symplectic structure’ on the position-momentum plane — at least, times some constant.

All this is very reminiscent of Öttinger’s work on dissipative mechanics. But we’ll also see something else: while the constant in g depends on the energy — that is, on which horizontal plane we take — the constant in \omega does not!

Why? It’s perfectly sensible. The metric g on our horizontal plane keeps track of fluctuations in position and momentum. Thermal fluctuations get bigger when it’s hotter — and to boost the average energy of our oscillator, we must heat it up. So, as we increase the energy, moving our horizontal plane further up in position-momentum-energy space, the metric on the plane gets bigger! In other words, our ellipsoids get a fat cross-section at high energies.

On the other hand, the symplectic structure \omega arises from the fact that position q and momentum p don’t commute in quantum mechanics. They obey Heisenberg’s ‘canonical commutation relation’:

q p - p q = i

This relation doesn’t involve energy, so \omega will be the same on every horizontal plane. And it turns out this relation implies

\omega  = \mathrm{constant} \, (dq \, dp - dp \, dq)

for some constant we’ll compute later.

Okay, that’s the basic idea. Now let’s actually do some computations. For starters, let’s see why all our ellipsoids have rotational symmetry!

To do this, we need to understand a bit about the mixed state \rho that maximizes entropy given certain mean values of position, momentum and energy. So, let’s choose the numbers we want for these mean values (also known as ‘expected values’ or ‘expectation values’):

\langle H \rangle = E

\langle q \rangle = q_0

\langle p \rangle = p_0

I hope this isn’t too confusing: H, p, q are our observables which are operators, while E, p_0, q_0 are the mean values we have chosen for them. The state \rho depends on E, p_0 and q_0.

We’re doing quantum mechanics, so position q and momentum p are both self-adjoint operators on the Hilbert space L^2(\mathbb{R}):

(q\psi)(x) = x \psi(x)

(p\psi)(x) = - i \frac{d \psi}{dx}(x)

Indeed all our observables, including the Hamiltonian

H = \frac{1}{2} (p^2 + q^2)

are self-adjoint operators on this Hilbert space, and the state \rho is a density matrix on this space, meaning a positive self-adjoint operator with trace 1.

Now: how do we compute \rho? It’s a Lagrange multiplier problem: maximizing some function given some constraints. And it’s well-known that when you solve this problem, you get

\rho = \frac{1}{Z} e^{-(\lambda^1 q + \lambda^2 p + \lambda^3 H)}

where \lambda^1, \lambda^2, \lambda^3 are three numbers we yet have to find, and Z is a normalizing factor called the partition function:

Z = \mathrm{tr} (e^{-(\lambda^1 q + \lambda^2 p + \lambda^3 H)} )

Now let’s look at a special case. If we choose \lambda^1 = \lambda^2 = 0, we’re back to a simpler and more famous problem, namely maximizing entropy subject to a constraint only on energy! The solution is then

\rho = \frac{1}{Z} e^{-\beta H} , \qquad Z = \mathrm{tr} (e^{- \beta H} )

Here I’m using the letter \beta instead of \lambda^3 because this is traditional. This quantity has an important physical meaning! It’s the reciprocal of temperature in units where Boltzmann’s constant is 1.

Anyway, back to our special case! In this special case it’s easy to explicitly calculate \rho and Z. Indeed, people have known how ever since Planck put the ‘quantum’ in quantum mechanics! He figured out how black-body radiation works. A box of hot radiation is just a big bunch of harmonic oscillators in thermal equilibrium. You can work out its partition function by multiplying the partition function of each one.

So, it would be great to reduce our general problem to this special case. To do this, let’s rewrite

\rho = \frac{1}{Z} e^{-(\lambda^1 q + \lambda^2 p + \lambda^3 H)}

in terms of some new variables, like this:

\rho = \frac{1}{Z} e^{-\beta(H - f q - g p)}

where now

Z = \mathrm{tr} (e^{-\beta(H - f q - g p)} )

Think about it! Now our problem is just like an oscillator with a modified Hamiltonian

H' = H - f q - g p

What does this mean, physically? Well, if you push on something with a force f, its potential energy will pick up a term - f q. So, the first two terms are just the Hamiltonian for a harmonic oscillator with an extra force pushing on it!

I don’t know a nice interpretation for the - g p term. We could say that besides the extra force equal to f, we also have an extra ‘gorce’ equal to g. I don’t know what that means. Luckily, I don’t need to! Mathematically, our whole problem is invariant under rotations in the position-momentum plane, so whatever works for q must also work for p.

Now here’s the cool part. We can complete the square:

\begin{aligned} H' & = \frac{1}{2} (q^2 + p^2) -  f q - g p \\ &= \frac{1}{2}(q^2 - 2 q f + f^2) + \frac{1}{2}(p^2 - 2 p g + g^2) - \frac{1}{2}(g^2 + f^2)  \\ &= \frac{1}{2}((q - f)^2 + (p - g)^2)  - \frac{1}{2}(g^2 + f^2)  \end{aligned}

so if we define ‘translated’ position and momentum operators:

q' = q - f, \qquad p' = p - g

we have

H' = \frac{1}{2}({q'}^2 + {p'}^2) -  \frac{1}{2}(g^2 + f^2)

So: apart from a constant, H' is just the harmonic oscillator Hamiltonian in terms of ‘translated’ position and momentum operators!

In other words: we’re studying a strange variant of the harmonic oscillator, where we are pushing on it with an extra force and also an extra ‘gorce’. But this strange variant is exactly the same as the usual harmonic oscillator, except that we’re working in translated coordinates on position-momentum space, and subtracting a constant from the Hamiltonian.

These are pretty minor differences. So, we’ve succeeded in reducing our problem to the problem of a harmonic oscillator in thermal equilibrium at some temperature!

This makes it easy to calculate

Z = \mathrm{tr} (e^{-\beta(H - f q - g p)} ) = \mathrm{tr}(e^{-\beta H'})

By our formula for H', this is just

Z = e^{\frac{\beta}{2}(g^2 + f^2)} \; \mathrm{tr} (e^{-\frac{\beta}{2}({q'}^2 + {p'}^2)})

And the second factor here equals the partition function for the good old harmonic oscillator:

Z = e^{\frac{\beta}{2}(g^2 + f^2)} \; \mathrm{tr} (e^{-\beta H})

So now we’re back to a textbook problem. The eigenvalues of the harmonic oscillator Hamiltonian are

n + \frac{1}{2}

where

n = 0,1,2,3, \dots

So, the eigenvalues of e^{-\beta H} are just

e^{-\beta(n + \frac{1}{2})}

and to take the trace of this operator, we sum up these eigenvalues:

\mathrm{tr}(e^{-\beta H}) = \sum_{n = 0}^\infty e^{-\beta (n + \frac{1}{2})} = \frac{e^{-\beta/2}}{1 - e^{-\beta}}

So:

Z = e^{\frac{\beta}{2}(g^2 + f^2)} \; \frac{e^{-\beta/2}}{1 - e^{-\beta}}
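If you don’t feel like trusting that geometric series, here’s a quick numerical sanity check (nothing deep, just arithmetic):

import numpy as np

beta = 0.7
lhs = sum(np.exp(-beta * (n + 0.5)) for n in range(2000))   # truncated sum over the spectrum
rhs = np.exp(-beta / 2) / (1 - np.exp(-beta))               # the closed form
print(lhs, rhs)                                             # the two agree to many decimal places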

We can now compute the Fisher information metric using this formula:

g_{ij} = \frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z

if we remember how our new variables are related to the \lambda^i:

\lambda^1 = -\beta f , \qquad \lambda^2 = -\beta g, \qquad \lambda^3 = \beta

It’s just calculus! But I’m feeling a bit tired, so I’ll leave this pleasure to you.

For now, I’d rather go back to our basic intuition about how the Fisher information metric describes fluctuations of observables. Mathematically, this means it’s the real part of the covariance matrix

g_{ij} = \mathrm{Re} \langle \, (X_i - \langle X_i \rangle) \, (X_j - \langle X_j \rangle)  \, \rangle

where for us

X_1 = q, \qquad X_2 = p, \qquad X_3 = H

Here we are taking expected values using the mixed state \rho. We’ve seen this mixed state is just like the maximum-entropy state of a harmonic oscillator at fixed temperature — except for two caveats: we’re working in translated coordinates on position-momentum space, and subtracting a constant from the Hamiltonian. But neither of these two caveats affects the fluctuations (X_i - \langle X_i \rangle) or the covariance matrix.

So, as indeed we’ve already seen, g_{ij} has rotational symmetry in the 1-2 plane. Thus, we’ll completely know it once we know g_{11} = g_{22} and g_{33}; the other components are zero for symmetry reasons. g_{11} will equal the variance of position for a harmonic oscillator at a given temperature, while g_{33} will equal the variance of its energy. We can work these out or look them up.

I won’t do that now: I’m after insight, not formulas. For physical reasons, it’s obvious that g_{11} must diminish with diminishing energy — but not go to zero. Why? Well, as the temperature approaches zero, a harmonic oscillator in thermal equilibrium approaches its state of least energy: the so-called ‘ground state’. In its ground state, the standard deviations of position and momentum are as small as allowed by the Heisenberg uncertainty principle:

\Delta p \Delta q  \ge \frac{1}{2}

and they’re equal, so

g_{11} = (\Delta q)^2 = \frac{1}{2}.
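In case you want to see where that \frac{1}{2} comes from, here’s the standard ladder operator argument, together with the finite-temperature value you’d find in a statistical mechanics text. Writing

a = \frac{1}{\sqrt{2}}(q + i p), \qquad q = \frac{1}{\sqrt{2}}(a + a^\dagger), \qquad [a, a^\dagger] = 1

the ground state satisfies a|0\rangle = 0, so

\langle 0 | q^2 | 0 \rangle = \frac{1}{2} \langle 0 | (a + a^\dagger)^2 | 0 \rangle = \frac{1}{2} \langle 0 | a a^\dagger | 0 \rangle = \frac{1}{2}

At inverse temperature \beta, the thermal occupation number is \langle a^\dagger a \rangle = 1/(e^\beta - 1), and the same calculation gives

(\Delta q)^2 = \frac{1}{e^\beta - 1} + \frac{1}{2} = \frac{1}{2} \coth \frac{\beta}{2}

which decreases to \frac{1}{2} as \beta \to \infty, and grows without bound as the temperature rises.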

That’s enough about the metric. Now, what about the metric’s skew-symmetric partner? This is:

\omega_{ij} = \mathrm{Im} \langle \, (X_i - \langle X_i \rangle) \, (X_j - \langle X_j \rangle)  \, \rangle

Last time we saw that \omega is all about expected values of commutators:

\omega_{ij} = \frac{1}{2i} \langle [X_i, X_j] \rangle

and this makes it easy to compute. For example,

[X_1, X_2] = q p - p q = i

so

\omega_{12} = \frac{1}{2}

Of course

\omega_{11} = \omega_{22} = 0

by skew-symmetry, so we know the restriction of \omega to any horizontal plane. We can also work out other components, like \omega_{13}, but I don’t want to. I’d rather just state this:

Summary: Restricted to any horizontal plane in the position-momentum-energy space, the Fisher information metric for the harmonic oscillator is

g = \mathrm{constant} (dq_0^2 + dp_0^2)

with a constant depending on the temperature, equalling \frac{1}{2} in the zero-temperature limit, and increasing as the temperature rises. Restricted to the same plane, the Fisher information metric’s skew-symmetric partner is

\omega = \frac{1}{2} dq_0 \wedge dp_0

(Remember, the mean values q_0, p_0, E_0 are the coordinates on position-momentum-energy space. We could also use coordinates f, g, \beta or f, g and temperature. In the chatty intro to this article you saw formulas like those above but without the subscripts; that’s before I got serious about using q and p to mean operators.)

And now for the moral. Actually I have two: a physics moral and a math moral.

First, what is the physical meaning of g or \omega when restricted to a plane of constant E_0, or if you prefer, a plane of constant temperature?

Physics Moral: Restricted to a constant-temperature plane, g is the covariance matrix for our observables. It is temperature-dependent. In the zero-temperature limit, the thermal fluctuations go away and g depends only on quantum fluctuations in the ground state. On the other hand, \omega restricted to a constant-temperature plane describes Heisenberg uncertainty relations for noncommuting observables. In our example, it is temperature-independent.

Second, what does this have to do with Kähler geometry? Remember, the complex plane has a complex-valued metric on it, called a Kähler structure. Its real part is a Riemannian metric, and its imaginary part is a symplectic structure. We can think of the complex plane as the position-momentum plane for a point particle. Then the symplectic structure is the basic ingredient needed for Hamiltonian mechanics, while the Riemannian structure is the basic ingredient needed for the harmonic oscillator Hamiltonian.

Math Moral: In the example we considered, \omega restricted to a constant-temperature plane is equal to \frac{1}{2} the usual symplectic structure on the complex plane. On the other hand, g restricted to a constant-temperature plane is a multiple of the usual Riemannian metric on the complex plane — but this multiple is \frac{1}{2} only when the temperature is zero! So, only at temperature zero are g and \omega the real and imaginary parts of a Kähler structure.

It will be interesting to see how much of this stuff is true more generally. The harmonic oscillator is much nicer than your average physical system, so it can be misleading, but I think some of the morals we’ve seen here can be generalized.

Some other time I may say more about how all this is related to Öttinger’s formalism, but the quick point is that he too has mixed states, and a symmetric g, and a skew-symmetric \omega. So it’s nice to see if they match up in an example.

Finally, two footnotes on terminology:

β: In fact, this quantity \beta = 1/kT is so important it deserves a better name than ‘reciprocal of temperature’. How about ‘coolness’? An important lesson from statistical mechanics is that coolness is more fundamental than temperature. This makes some facts more plausible. For example, if you say “you can never reach absolute zero,” it sounds very odd, since you can get as close as you like, and it’s even possible to get negative temperatures — but temperature zero remains tantalizingly out of reach. But “you can never attain infinite coolness” — now that makes sense.

Gorce: I apologize to Richard Feynman for stealing the word ‘gorce’ and using it a different way. Does anyone have a good intuition for what’s going on when you apply my sort of ‘gorce’ to a point particle? You need to think about velocity-dependent potentials, of that I’m sure. In the presence of a velocity-dependent potential, momentum is not just mass times velocity. Which is good: if it were, we could never have a system where the mean value of both q and p stayed constant over time!


Information Geometry (Part 4)

29 October, 2010

Before moving on, I’d like to clear up a mistake I’d been making in all my previous posts on this subject.

(By now I’ve tried to fix those posts, because people often get information from the web in a hasty way, and I don’t want my mistake to spread. But you’ll still see traces of my mistake infecting the comments on those posts.)

So what’s the mistake? It’s embarrassingly simple, but also simple to fix. A Riemannian metric must be symmetric:

g_{ij} = g_{ji}

Now, I had defined the Fisher information metric to be the so-called ‘covariance matrix’:

g_{ij} = \langle (X_i - \langle X_i \rangle) \;(X_j- \langle X_j \rangle)\rangle

where X_i are some observable-valued functions on a manifold M, and the angle brackets mean “expectation value”, computed using a mixed state \rho that also depends on the point in M.

The covariance matrix is symmetric in classical mechanics, since then observables commute, so:

\langle AB \rangle = \langle BA \rangle

But it’s not symmetric in quantum mechanics! After all, suppose q is the position operator for a particle, and p is the momentum operator. Then according to Heisenberg

qp = pq + i

in units where Planck’s constant is 1. Taking expectation values, we get:

\langle qp \rangle = \langle pq \rangle + i

and in particular:

\langle qp \rangle \ne \langle pq \rangle

We can use this to get examples where g_{ij} is not symmetric.

However, it turns out that the real part of the covariance matrix is symmetric, even in quantum mechanics — and that’s what we should use as our Fisher information metric.

Why is the real part of the covariance matrix symmetric, even in quantum mechanics? Well, suppose \rho is any density matrix, and A and B are any observables. Then by definition

\langle AB \rangle = \mathrm{tr} (\rho AB)

so taking the complex conjugate of both sides

\langle AB\rangle^*  = \mathrm{tr}(\rho AB)^* = \mathrm{tr}((\rho A B)^*) = \mathrm{tr}(B^* A^* \rho^*)

where I’m using an asterisk both for the complex conjugate of a number and the adjoint of an operator. But our observables are self-adjoint, and so is our density matrix, so we get

\mathrm{tr}(B^* A^* \rho^*) = \mathrm{tr}(B A \rho) = \mathrm{tr}(\rho B A) = \langle B A \rangle

where in the second step we used the cyclic property of the trace. In short:

\langle AB\rangle^* = \langle BA \rangle

If we take real parts, we get something symmetric:

\mathrm{Re} \langle AB\rangle =  \mathrm{Re} \langle BA \rangle

So, if we redefine the Fisher information metric to be the real part of the covariance matrix:

g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle) \; (X_j- \langle X_j \rangle)\rangle

then it’s symmetric, as it should be.
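If you like checking such things numerically, here’s a throwaway sanity check (not part of any derivation): build a random density matrix and two random self-adjoint matrices, and verify that the real part of \langle A B \rangle is symmetric while the imaginary part is antisymmetric.

import numpy as np

rng = np.random.default_rng(0)
d = 4

def random_hermitian():
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

# random density matrix: positive, self-adjoint, trace 1
R = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = R @ R.conj().T
rho /= np.trace(rho).real

A, B = random_hermitian(), random_hermitian()
AB = np.trace(rho @ A @ B)    # <AB>
BA = np.trace(rho @ B @ A)    # <BA>

print(np.isclose(AB.real, BA.real))     # True: real parts agree
print(np.isclose(AB.imag, -BA.imag))    # True: imaginary parts differ by a sign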

Last time I mentioned a general setup using von Neumann algebras, that handles the classical and quantum situations simultaneously. That applies here! Taking the real part has no effect in classical mechanics, so we don’t need it there — but it doesn’t hurt, either.

Taking the real part never has any effect when i = j, either, since the expected value of the square of an observable is a nonnegative number:

\langle (X_i - \langle X_i \rangle)^2 \rangle \ge 0

This has two nice consequences.

First, we get

g_{ii} = \langle (X_i - \langle X_i \rangle)^2 \rangle  \ge 0

and since this is true in any coordinate system, our would-be metric g is indeed nonnegative. It’ll be an honest Riemannian metric whenever it’s positive definite.

Second, suppose we’re working in the special case discussed in Part 2, where our manifold is an open subset of \mathbb{R}^n, and \rho at the point x \in \mathbb{R}^n is the Gibbs state with \langle X_i \rangle = x_i. Then all the usual rules of statistical mechanics apply. So, we can compute the variance of the observable X_i using the partition function Z:

\langle (X_i - \langle X_i \rangle)^2 \rangle = \frac{\partial^2}{\partial \lambda_i^2} \ln Z

In other words,

g_{ii} =  \frac{\partial^2}{\partial \lambda_i^2} \ln Z

But since this is true in any coordinate system, we must have

g_{ij} =  \frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \ln Z

(Here I’m using a little math trick: two symmetric bilinear forms whose diagonal entries agree in any basis must be equal. We’ve already seen that the left side is symmetric, and the right side is symmetric by a famous fact about mixed partial derivatives.)
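(If that trick is unfamiliar, it’s just the polarization identity: a symmetric bilinear form is determined by its values on the diagonal, since

g(v, w) = \frac{1}{2} \big( g(v + w, v + w) - g(v, v) - g(w, w) \big)

so two symmetric forms whose diagonal values agree on every vector must agree everywhere.)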

However, I’m pretty sure this cute formula

g_{ij} =  \frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \ln Z

only holds in the special case I’m talking about now, where points in \mathbb{R}^n are parametrizing Gibbs states in the obvious way. In general we must use

g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle)(X_j- \langle X_j \rangle)\rangle

or equivalently,

g_{ij} = \mathrm{Re} \, \mathrm{tr} (\rho \; \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j})
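Here’s a quick classical example of the cute formula at work, included just for concreteness (a standard textbook computation). Take a single real variable x and maximize entropy subject to constraints on \langle x \rangle and \langle x^2 \rangle, so the Gibbs state is

\rho(x) = \frac{1}{Z} e^{-(\lambda_1 x + \lambda_2 x^2)}, \qquad Z = \int_{-\infty}^\infty e^{-\lambda_1 x - \lambda_2 x^2} \, dx = \sqrt{\frac{\pi}{\lambda_2}} \; e^{\lambda_1^2 / 4 \lambda_2}

for \lambda_2 > 0. Then

\ln Z = \frac{\lambda_1^2}{4 \lambda_2} + \frac{1}{2} \ln \frac{\pi}{\lambda_2}

so

\frac{\partial^2}{\partial \lambda_1^2} \ln Z = \frac{1}{2 \lambda_2}

which is exactly the variance of this Gaussian, since completing the square shows it has \sigma^2 = 1/2\lambda_2. So in the classical, commuting case the covariance matrix and the second derivatives of \ln Z really do agree.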

Okay. So much for cleaning up Last Week’s Mess. Here’s something new. We’ve seen that whenever A and B are observables (that is, self-adjoint),

\langle AB\rangle^* = \langle BA \rangle

We got something symmetric by taking the real part:

\mathrm{Re} \langle AB\rangle =  \mathrm{Re} \langle BA \rangle

Indeed,

\mathrm{Re} \langle AB \rangle = \frac{1}{2} \langle AB + BA \rangle

But by the same reasoning, we get something antisymmetric by taking the imaginary part:

\mathrm{Im} \langle AB\rangle =  - \mathrm{Im} \langle BA \rangle

and indeed,

\mathrm{Im} \langle AB \rangle = \frac{1}{2i} \langle AB - BA \rangle

Commutators like AB-BA are important in quantum mechanics, so maybe we shouldn’t just throw out the imaginary part of the covariance matrix in our desperate search for a Riemannian metric! Besides the symmetric tensor on our manifold M:

g_{ij} = \mathrm{Re} \, \mathrm{tr} (\rho \; \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j})

we can also define a skew-symmetric tensor:

\omega_{ij} = \mathrm{Im} \,  \mathrm{tr} (\rho \; \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j})

This will vanish in the classical case, but not in the quantum case!

If you’ve studied enough geometry, you should now be reminded of things like ‘Kähler manifolds’ and ‘almost Kähler manifolds’. A Kähler manifold is a manifold that’s equipped with a symmetric tensor g and a skew-symmetric tensor \omega which fit together in the best possible way. An almost Kähler manifold is something similar, but not quite as nice. We should probably see examples of these arising in information geometry! And that could be pretty interesting.

But in general, if we start with any old manifold M together with a function \rho taking values in mixed states, we seem to be making M into something even less nice. It gets a symmetric bilinear form g on each tangent space, and a skew-symmetric bilinear form \omega, and they vary smoothly from point to point… but they might be degenerate, and I don’t see any reason for them to ‘fit together’ in the nice way we need for a Kähler or almost Kähler manifold.

However, I still think something interesting might be going on here. For one thing, there are other situations in physics where a space of states is equipped with a symmetric g and a skew-symmetric \omega. They show up in ‘dissipative mechanics’ — the study of systems whose entropy increases.

To conclude, let me remind you of some things I said in week295 of This Week’s Finds. This is a huge digression from information geometry, but I’d like to lay out the puzzle pieces in public view, in case it helps anyone get some good ideas.

I wrote:

• Hans Christian Öttinger, Beyond Equilibrium Thermodynamics, Wiley, 2005.

I thank Arnold Neumaier for pointing out this book! It considers a fascinating generalization of Hamiltonian mechanics that applies to systems with dissipation: for example, electrical circuits with resistors, or mechanical systems with friction.

In ordinary Hamiltonian mechanics the space of states is a manifold and time evolution is a flow on this manifold determined by a smooth function called the Hamiltonian, which describes the energy of any state. In this generalization the space of states is still a manifold, but now time evolution is determined by two smooth functions: the energy and the entropy! In ordinary Hamiltonian mechanics, energy is automatically conserved. In this generalization that’s also true, but energy can go into the form of heat… and entropy automatically increases!

Mathematically, the idea goes like this. We start with a Poisson manifold, but in addition to the skew-symmetric Poisson bracket {F,G} of smooth functions on some manifold, we also have a symmetric bilinear bracket [F,G] obeying the Leibniz law

[F,GH] = [F,G]H + G[F,H]

and this positivity condition:

[F,F] ≥ 0

The time evolution of any function is given by a generalization of Hamilton’s equations:

dF/dt = {H,F} + [S,F]

where H is a function called the "energy" or "Hamiltonian", and S is a function called the "entropy". The first term on the right is the usual one. The new second term describes dissipation: as we shall see, it pushes the state towards increasing entropy.

If we require that

[H,F] = {S,F} = 0

for every function F, then we get conservation of energy, as usual in Hamiltonian mechanics:

dH/dt = {H,H} + [S,H] = 0

But we also get the second law of thermodynamics:

dS/dt = {H,S} + [S,S] ≥ 0

Entropy always increases!

Öttinger calls this framework “GENERIC” – an annoying acronym for “General Equation for the NonEquilibrium Reversible-Irreversible Coupling”. There are lots of papers about it. But I’m wondering if any geometers have looked into it!

If we didn’t need the equations [H,F] = {S,F} = 0, we could easily get the necessary brackets starting with a Kähler manifold. The imaginary part of the Kähler structure is a symplectic structure, say ω, so we can define

{F,G} = ω(dF,dG)

as usual to get Poisson brackets. The real part of the Kähler structure is a Riemannian structure, say g, so we can define

[F,G] = g(dF,dG)

This satisfies

[F,GH] = [F,G]H + G[F,H]

and

[F,F] ≥ 0

Don’t be fooled: this stuff is not rocket science. In particular, the inequality above has a simple meaning: when we move in the direction of the gradient of F, the function F increases. So adding the second term to Hamilton’s equations has the effect of pushing the system towards increasing entropy.

Note that I’m being a tad unorthodox by letting ω and g eat cotangent vectors instead of tangent vectors – but that’s no big deal. The big deal is this: if we start with a Kähler manifold and define brackets this way, we don’t get [H,F] = 0 or {S,F} = 0 for all functions F unless H and S are constant! That’s no good for applications to physics. To get around this problem, we would need to consider some sort of degenerate Kähler structure – one where ω and g are degenerate bilinear forms on the cotangent space.

Has anyone thought about such things? They remind me a little of "Dirac structures" and "generalized complex geometry" – but I don’t know enough about those subjects to know if they’re relevant here.

This GENERIC framework suggests that energy and entropy should be viewed as two parts of a single entity – maybe even its real and imaginary parts! And that in turn reminds me of other strange things, like the idea of using complex-valued Hamiltonians to describe dissipative systems, or the idea of “inverse temperature as imaginary time”. I can’t tell yet if there’s a big idea lurking here, or just a mess….
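To make the GENERIC structure a bit more concrete, here is a minimal numerical toy of my own (not an example from Öttinger’s book, and not part of the quoted This Week’s Finds text above). In coordinates, the bracket equations become dx/dt = L ∇H + M ∇S, up to sign conventions, where L is antisymmetric, M is symmetric and positive semidefinite, and the degeneracy conditions become L ∇S = 0 and M ∇H = 0. The toy below is a damped oscillator carrying an extra ‘internal energy’ variable: the oscillation decays, the lost energy shows up as heat, total energy stays numerically almost constant, and entropy only goes up.

import numpy as np

# State x = (q, p, e): position, momentum, internal energy.
def H(x):                        # total energy
    q, p, e = x
    return 0.5 * (q**2 + p**2) + e

def S(x):                        # entropy depends only on the internal energy
    q, p, e = x
    return np.log(e)             # so dS/de = 1/e plays the role of 1/T

def grad(f, x, h=1e-6):          # crude central-difference gradient
    g = np.zeros(3)
    for i in range(3):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2 * h)
    return g

gamma = 0.3                      # friction strength

L = np.array([[0.0, 1.0, 0.0],   # antisymmetric Poisson part: dq/dt = p, dp/dt = -q
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])  # note L @ grad(S) = 0, since grad S = (0, 0, 1/e)

def M(x):                        # symmetric, positive semidefinite friction part
    q, p, e = x
    v = np.array([0.0, 1.0, -p]) # chosen orthogonal to grad H = (q, p, 1), so M @ grad H = 0
    return gamma * e * np.outer(v, v)   # e = 1/S'(e) is the temperature here

x = np.array([1.0, 0.0, 1.0])
dt = 1e-3
E0, S0 = H(x), S(x)
for step in range(20000):        # plain Euler steps of dx/dt = L grad H + M grad S
    x = x + dt * (L @ grad(H, x) + M(x) @ grad(S, x))
print("energy drift:", H(x) - E0)      # tiny (just Euler integration error)
print("entropy gain:", S(x) - S0)      # positive: the second law in action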

