Mathematics of the Environment (Part 7)

13 November, 2012

Last time we saw how the ice albedo effect could, in theory, make the Earth’s climate have two stable states. In a very simple model, we saw that a hot Earth might stay hot since it’s dark and absorbs lots of sunlight, while a cold Earth might stay cold—since it’s icy and white and reflects most of the sunlight.

If you haven’t tried it yet, make sure to play around with this program pioneered by Lesley de Cruz and then implemented in Java by Allan Erskine:

Temperature dynamics.

The explanations were all given in Part 6 so I won’t repeat them here!

This week, we’ll see how noise affects this simple climate model. We’ll borrow lots of material from here:

Glyn Adgie and Tim van Beek, Increasing the signal-to-noise ratio with more noise.

And we’ll use software written by these guys together with Allan Erskine and Jim Stuttard. The power of the Azimuth Project knows no bounds!

Stochastic differential equations

The Milankovich cycles are periodic changes in how the Earth orbits the Sun. One question is: can these changes can be responsible for the ice ages? On the first sight it seems impossible, because the changes are simply too small. But it turns out that we can find a dynamical system where a small periodic external force is actually strengthened by random ‘noise’ in the system. This phenomenon has been dubbed ‘stochastic resonance’ and has been proposed as an explanation for the ice ages. It also shows up in many other phenomena:

• Roberto Benzi, Stochastic resonance: from climate to biology.

But to understand it, we need to think a little about stochastic differential equations.

A lot of systems can be described by ordinary differential equations:

\displaystyle{ \frac{d x}{d t} = f(x, t)}

If f is nice, the time evolution of the system will be a nice smooth function x(t), like the trajectory of a thrown stone. But there are situations where we have some kind of noise, a chaotic, fluctuating influence, that we would like to take into account. This could be, for example, turbulence in the air flow around a rocket. Or, in our case, short term fluctuations of the weather of the earth. If we take this into account, we get an equation of the form

\displaystyle{ \frac{d x}{d t} = f(x, t) + w(t) }

where the w is a ‘random function’ which models the noise. Typically this noise is just a simplified way to take into account rapidly changing fine-grained aspects of the system at hand. This way we do not have to explicitly model these aspects, which is often impossible.

White noise

We’ll look at a model of this sort:

\displaystyle{ \frac{d x}{d t} = f(x, t) + w(t) }

where w(t) is ‘white noise’. But what’s that?

Very naively speaking, white noise is a random function that typically looks very wild, like this:

White noise actually has a sound, too: it sounds like this! The idea is that you can take a random function like the one graphed above, and use it to drive the speakers of your computer, to produce sound waves of an equally wild sort. And it sounds like static.

However, all this is naive. Why? The concept of a ‘random function’ is not terribly hard to define, at least if you’ve taken the usual year-long course on real analysis that we force on our grad students at U.C. Riverside: it’s a probability measure on some space of functions. But white noise is a bit too spiky and singular too be a random function: it’s a random distribution.

Distributions were envisioned by Dirac but formalized later by Laurent Schwarz and others. A distribution D(t) is a bit like a function, but often distributions are too nasty to have well-defined values at points! Instead, all that makes sense are expressions like this:

\displaystyle{ \int_\infty^\infty D(t) f(t) \, d t}

where we multiply the distribution by any compactly supported smooth function f(t) and then integrate it. Indeed, we specify a distribution just by saying what all these integrals equal. For example, the Dirac delta \delta(t) is a distribution defined by

\displaystyle{ \int_\infty^\infty \delta(t) f(t) \, d t = f(0) }

If you try to imagine the Dirac delta as a function, you run into a paradox: it should be zero everywhere except at t = 0, but its integral should equal 1. So, if you try to graph it, the region under the graph should be an infinitely skinny infinitely tall spike whose area is 1:

But that’s impossible, at least in the standard framework of mathematics—so such a function does not really exist!

Similarly, white noise w(t) is too spiky to be an honest random function, but if we multiply it by any compactly supported smooth function f(t) and integrate, we get a random variable

\displaystyle{ \int_\infty^\infty w(t) f(t) \, d t }

whose probability distribution is a Gaussian with mean zero and standard deviation equal to

\displaystyle{ \sqrt{ \int_\infty^\infty f(t)^2 \, d t} }

(Note: the word ‘distribution’ has a completely different meaning when it shows up in the phrase probability distribution! I’m assuming you’re comfortable with that meaning already.)

Indeed, the above formulas make sense and are true, not just when f is compactly supported and smooth, but whenever

\displaystyle{ \sqrt{ \int_\infty^\infty f(t)^2 \, d t} } < \infty

If you know about Gaussians and you know about this sort of integral, which shows up all over math and physics, you’ll realize that white noise is an extremely natural concept!

Brownian motion

While white noise is not a random function, its integral

\displaystyle{ W(s) = \int_0^s w(s) \, ds }

turns out to be a well-defined random function. It has a lot of names, including: the Wiener process, Brownian motion, red noise and—as a kind of off-color joke—brown noise.

The capital W here stands for Wiener: that is, Norbert Wiener, the famously absent-minded MIT professor who studied this random function… and also invented cybernetics:

Brownian noise sounds like this.

Puzzle: How does it sound different from white noise, and why?

It looks like this:

Here we are zooming in on closer and closer, while rescaling the vertical axis as well. We see that Brownian noise is self-similar: if W(t) is Brownian noise, so is W(c t)/\sqrt{c} for all c > 0.

More importantly for us, Brownian noise is the solution of this differential equation:

\displaystyle{ \frac{d W}{d t} = w(t) }

where w(t) is white noise. This is essentially true by definition, but making it rigorous takes some work. More fancy stochastic differential equations

\displaystyle{ \frac{d x}{d t} = f(x, t) + w(t) }

take even more work to rigorously formulate and solve. You can read about them here:

Stochastic differential equation, Azimuth Library.

It’s actually much easier to explain the difference equations we use to approximately solve these stochastic differential equation on the computer. Suppose we discretize time into steps like this:

t_i = i \Delta t

where \Delta t > 0 is some fixed number, our ‘time step’. Then we can define

x(t_{i+1}) = x(t_i) + f(x(t_i), t_i) \; \Delta t + w_i

where w_i are independent Gaussian random variables with mean zero and standard deviation

\sqrt{ \Delta t}

The square root in this formula comes from the definition I gave of white noise.

If we use a random number generator to crank out random numbers w_i distributed in this way, we can write a program to work out the numbers x(t_i) if we are given some initial value x(0). And if the time step \Delta t is small enough, we can hope to get a ‘good approximation to the true solution’. Of course defining what we mean by a ‘good approximation’ is tricky here… but I think it’s more important to just plunge in and see what happens, to get a feel for what’s going on here.

Stochastic resonance

Let’s do an example of this equation:

\displaystyle{ \frac{d x}{d t} = f(x, t) + w(t) }

which exhibits ‘stochastic resonance’. Namely:

\displaystyle{ \frac{d x}{d t} = x(t) - x(t)^3 + A \;  \sin(t)  + \sqrt{2D} w(t) }

Here A and D are constants we get to choose. If we set them both to zero we get:

\displaystyle{ \frac{d x}{d t} = x(t) - x(t)^3 }

This has stable equilibrium solutions at x = \pm 1 and an unstable equilibrium in between at x = 0. So, this is a bistable model similar to the one we’ve been studying, but mathematically simpler!

Then we can add an oscillating time-dependent term:

\displaystyle{ \frac{d x}{d t} = x(t) - x(t)^3 + A \;  \sin(t) }

which wiggles the system back and forth. This can make it jump from one equilibrium to another.

And then we can add on noise:

\displaystyle{ \frac{d x}{d t} = x(t) - x(t)^3 + A \;  \sin(t)  + \sqrt{2D} w(t) }

Let’s see what the solutions look like!

In the following graphs, the green curve is A \sin(t),, while the red curve is x(t). Here is a simulation with a low level of noise:

low noise level

As you can see, within the time of the simulation there is no transition from the stable state at 1 to the one at -1. If we were doing a climate model, this would be like the Earth staying in the warm state.

Here is a simulation with a high noise level:

high noise level

The solution x(t) jumps around wildly. By inspecting the graph with your eyes only, you don’t see any pattern in it, do you?

But finally, here is a simulation where the noise level is not too small, and not too big:

high noise level

Here we see the noise helping the solution x(t) hops from around x = -1 to x = 1, or vice versa. The solution x(t) is not at all ‘periodic’—it’s quite random. But still, it tends to hop back and forth thanks to the combination of the sinusoidal term A \sin(t) and the noise.

Glyn Adgie, Allan Erskine, Jim Stuttard and Tim van Beek have created an online model that lets you solve this stochastic differential equation for different values of A and D. You can try it here:

Stochastic resonance example.

You can change the values using the sliders under the graphic and see what happens. You can also choose different ‘random seeds’, which means that the random numbers used in the simulation will be different.

To read more about stochastic resonance, go here:

Stochastic resonance, Azimuth Library.

In future weeks I hope to say more about the actual evidence that stochastic resonance plays a role in our glacial cycles! It would also be great to go back to our climate model from last time and add noise. We’re working on that.


Mathematics of the Environment (Part 6)

10 November, 2012

Last time we saw a ‘bistable’ climate model, where the temperatures compatible with a given amount of sunshine can form an S-shaped curve like this:

The horizontal axis is insolation, the vertical is temperature. Between the green and the red lines the Earth can have 3 temperatures compatible with a given insolation. For example, the black vertical line intersects the S-shaped curve in three points. So we get three possible solutions: a hot Earth, a cold Earth, and an intermediate Earth.

But last time I claimed the intermediate Earth was unstable, so there are just two stable solutions. So, we say this model is bistable. This is like a simple light switch, which has two stable positions but also an intermediate unstable position halfway in between.

(Have you ever enjoyed putting a light switch into this intermediate position? If not, you must not be a physicist.)

Why is the intermediate equilibrium unstable? It seems plausible from the light switch example, but to be sure, we need to go back and study the original equation:

\displaystyle{ C \frac{d T}{d t} = - A - B T + Q c(T(t)) }

We need see what happens when we push T slightly away from one of its equilibrium values. We could do this analytically or numerically.

Luckily, Allan Erskine has made a wonderful program that lets us study it numerically: check it out!

Temperature dynamics.

Here’s you’ll see a bunch of graphs of temperature as a function of time, T(t). To spice things up, Allan has made the insolation a function of time, which starts out big for some interval [0,\tau] and then drops to its usual value Q = 341.5. So, these graphs are solutions of

\displaystyle{ C \frac{d T}{d t} = - A - B T + Q(t) c(T(t)) }

where Q(t) is a step function with

Q(t) = \left\{ \begin{array}{ccl} Q + X & \mathrm{for} & 0 \le t \le \tau \\  Q & \mathrm{for} & t > \tau  \end{array} \right.

The different graphs show solutions with different initial conditions, ranging from hot to cold. Using sliders on the bottom, you can adjust:

• the coalbedo transition rate \gamma,

• the amount X of extra insolation,

• the time \tau at which the extra insolation ends.

I urge you to start by setting \tau to its maximum value. That will make the insolation be constant as a function of time. Then you if \gamma and X are big enough, you’ll get bistability. For example:

I get this with \gamma about 0.08, X about 28.5. You can see a hot stable equilibrium, a cold one, and a solution that hesitates between the two for quite a while before going up to the hot one. This intermediate solution must be starting out very slightly above the unstable equilibrium.

When X is zero, there’s only one equilibrium solution: the cold Earth.

I can’t make X so big that the hot Earth is the only equilibrium, but it’s possible according to our model: I’ll need to change the software a bit to let us make the insolation bigger.

All sorts of more interesting things happen when we move \tau down from its maximum value. I hope you play with the parameters and see what happens. But essentially, what happens is that the hot Earth is only stable before t = \tau, since we need the extra insolation to make that happen. After that, the Earth is fated to go to a cold state.

Needless to say, these results should not be trusted when it comes to the actual climate of our actual planet! More about that later.

We can also check the bistability in a more analytical way. We get an equilibrium solution of

\displaystyle{ C \frac{d T}{d t} = - A - B T + Q c(T(t)) }

whenever we find a number T obeying this equation:

- A - B T + Q c(T) = 0

We can show that for certain values of \gamma and Q, we get solutions for three different temperatures T. It’s easy to see that - A - B T + Q c(T) is positive for very small T: if the Earth were extremely cold, the Sun would warm it up. Similarly, this quantity is negative for very large T: the Earth would cool down if it were very hot. So, the reason

- A - B T + Q c(T) = 0

has three solutions is that it starts out positive, then goes down below zero, then goes up above zero, and then goes down below zero again. So, for the intermediate point at which it’s zero, we have

\displaystyle{\frac{d}{dT}( -A - B T + Q c(T)) > 0  }

That means that if it starts out slightly warmer than this value of T, the temperature will increase—so this solution is unstable. For the hot and cold solutions, we get

\displaystyle{ \frac{d}{dT}(-A - B T + Q c(T)) < 0  }

so these equilibria are stable.

A moral

What morals can we extract from this model?

As far as climate science goes, one moral is that it pays to spend some time making sure we understand simple models before we dive into more complicated ones. Right now we’re looking at a very simple one, but we’re already seeing some interesting phenomena. The kind of model we’re looking at now is called a Budyko-Sellers model. These have been studied since the late 1960′s:

• M. I. Budyko, On the origin of glacial epochs (in Russian), Meteor. Gidrol. 2 (1968), 3-8.

• M. I. Budyko, The effect of solar radiation variations on the climate of the earth, Tellus 21 (1969), 611-619.

• William D. Sellers, A global climatic model based on the energy balance of the earth-atmosphere system, J. Appl. Meteor. 8 (1969), 392-400.

• Carl Crafoord and Erland Källén, A note on the condition for existence of more than one steady state solution in Budyko-Sellers type models, J. Atmos. Sci. 35 (1978), 1123-1125.

• Gerald R. North, David Pollard and Bruce Wielicki, Variational formulation of Budyko-Sellers climate models, J. Atmos. Sci. 36 (1979), 255-259.

I should talk more about some slightly more complex models someday.

It also pays to compare our models to reality! For example, the graphs we’ve seen show some remarkably hot and cold temperatures for the Earth. That’s a bit unnerving. Let’s investigate. Suppose we set \gamma = 0 on our slider. Then the coalbedo of the Earth becomes independent of temperature: it’s 0.525, halfway between its icy and ice-free values. Then, when the insolation takes its actual value of 342.5 watts per square meter, the model says the Earth’s temperature is very chilly: about -20 °C!

Does that mean the model is fundamentally flawed? Maybe not! After all, it’s based on very light-colored Earth. Suppose we use the actual albedo of the Earth. Of course that’s hard to define, much less determine. But let’s just look up some average value of the Earth’s albedo: supposedly it’s about 0.3. That gives a coalbedo of c = 0.7. If we plug that in our formula:

\displaystyle{ Q = \frac{ A + B T } {c} }

we get 11 °C. That’s not too far from the Earth’s actual average temperature, namely about 15 °C. So the chilly temperature of -20 °C seems to come from an Earth that’s a lot lighter in color than ours.

Our model includes the greenhouse effect, since the coeficients A and B were determined by satellite measurements of how much radiation actually escapes the Earth’s atmosphere and shoots out into space. As a further check to our model, we can look at an even simpler zero-dimensional energy balance model: a completely black Earth with no greenhouse effect. We discussed that earlier.

As he explains, this model gives the Earth a temperature of 6 °C. He also shows that in this model, lowering the albedo to a realistic value of 0.3 lowers the temperature to a chilly -18 ° C. To get from that to something like our Earth, we must take the greenhouse effect into account.

This sort of fiddling around is the sort of thing we must do to study the flaws and virtues of a climate model. Of course, any realistic climate model is vastly more sophisticated than the little toy we’ve been looking at, so the ‘fiddling around’ must also be more sophisticated. With a more sophisticated model, we can also be more demanding. For example, when I said 11 °C is “is not too far from the Earth’s actual average temperature, namely about 15 °C”, I was being very blasé about what’s actually a big discrepancy. I only took that attitude because the calculations we’re doing now are very preliminary.


Mathematics of the Environment (Part 5)

6 November, 2012

We saw last time that the Earth’s temperature seems to have been getting colder but also more erratic for the last 30 million years or so. Here’s the last 5 million again:

People think these glacial cycles are due to variations in the Earth’s orbit, but as we’ll see later, those cause quite small changes in ‘insolation’—roughly, the amount of sunshine hitting the Earth (as a function of time and location). So, M. I. Budyko, an expert on the glacial cycles, wrote:

Thus, the present thermal regime and glaciations of the Earth prove to be characterized by high instability. Comparatively small changes of radiation—only by 1.0-1.5%—are sufficient for the development of ice cover on the land and oceans that reaches temperate latitudes.

How can small changes in the amount of sunlight hitting the Earth, or other parameters, create big changes in the Earth’s temperature? The obvious answer is positive feedback: some sort of amplifying effect.

But what could it be? Do we know feedback mechanisms that can amplify small changes in temperature? Yes. Here are a few obvious ones:

Water vapor feedback. When it gets warmer, more water evaporates, and the air becomes more humid. But water vapor is a greenhouse gas, which causes additional warming. Conversely, when the Earth cools down, the air becomes drier, so the greenhouse effect becomes weaker, which tends to cool things down.

Ice albedo feedback. Snow and ice reflect more light than liquid oceans or soil. When the Earth warms up, snow and ice melt, so the Earth becomes darker, absorbs more light, and tends to get get even warmer. Conversely, when the Earth cools down, more snow and ice form, so the Earth becomes lighter, absorbs less light, and tends to get even cooler.

Carbon dioxide solubility feedback. Cold water can hold more carbon dioxide than warm water: that’s why opening a warm can of soda can be so explosive. So, when the Earth’s oceans warm up, they release carbon dioxide. But carbon dioxide is a greenhouse gas, which causes additional warming. Conversely, when the oceaans cool down, they absorb more carbon dioxide, so the greenhouse effect becomes weaker, which tends to cool things down.

Of course, there are also negative feedbacks: otherwise the climate would be utterly unstable! There are also complicated feedbacks whose overall effect is harder to evaluate:

Planck feedback. A hotter world radiates more heat, which cools it down. This is the big negative feedback that keeps all the positive feedbacks from making the Earth insanely hot or insanely cold.

Cloud feedback. A warmer Earth has more clouds, which reflect more light but also increase the greenhouse effect.

Lapse rate feedback. An increased greenhouse effect changes the vertical temperature profile of the atmosphere, which has effects of its own—but this works differently near the poles and near the equator.

See also “week302″ of This Week’s Finds, where Nathan Urban tells us more about feedbacks and how big they’re likely to be.

Understanding all these feedbacks, and which ones are important for the glacial cycles we see, is a complicated business. Instead of diving straight into this, let’s try something much simpler. Let’s just think about how the ice albedo effect could, in theory, make the Earth bistable.

To do this, let’s look at the very simplest model in this great not-yet-published book:

• Gerald R. North, Simple Models of Global Climate.

This is a zero-dimensional energy balance model, meaning that it only involves the average temperature of the earth, the average solar radiation coming in, and the average infrared radiation going out.

The average temperature will be T, measured in Celsius. We’ll assume the Earth radiates power square meter equal to

\displaystyle{ A + B T }

where A = 218 watts/meter2 and B = 1.90 watts/meter2 per degree Celsius. This is a linear approximation taken from satellite data on our Earth. In reality, the power emitted grows faster than linearly with temperature.

We’ll assume the Earth absorbs solar energy power per square meter equal to

Q c(T)

Here:

Q is the average insolation: that is, the amount of solar power per square meter hitting the top of the Earth’s atmosphere, averaged over location and time of year. In reality Q is about 341.5 watts/meter2. This is one quarter of the solar constant, meaning the solar power per square meter that would hit a panel hovering in space above the Earth’s atmosphere and facing directly at the Sun. (Why a quarter? We’ve seen why: it’s because the area of a sphere is 4 \pi r^2 while the area of a circle is just \pi r^2.)

c(T) is the coalbedo: the fraction of solar power that gets absorbed. The coalbedo depends on the temperature; we’ll have to say how.

Given all this, we get

\displaystyle{ C \frac{d T}{d t} = - A - B T + Q c(T(t)) }

where C is Earth’s heat capacity in joules per degree per square meter. Of course this is a funny thing, because heat energy is stored not only at the surface but also in the air and/or water, and the details vary a lot depending on where we are. But if we consider a uniform planet with dry air and no ocean, North says we may roughly take C equal to about half the heat capacity at constant pressure of the column of dry air over a square meter, namely 5 million joules per degree Celsius.

The easiest thing to do is find equilibrium solutions, meaning solutions where \frac{d T}{d t} = 0, so that

A + B T = Q c(T)

Now C doesn’t matter anymore! We’d like to solve for T as a function of the insolation Q, but it’s easier to solve for Q as a function of T:

\displaystyle{ Q = \frac{ A + B T } {c(T)} }

To go further, we need to guess some formula for the coalbedo c(T). The coalbedo, remember, is the fraction of sunlight that gets absorbed when it hits the Earth. It’s 1 minus the albedo, which is the fraction that gets reflected. Here’s a little chart of albedos:

If you get mixed up between albedo and coalbedo, just remember: coal has a high coalbedo.

Since we’re trying to keep things very simple right not, not model nature in all its glorious complexity, let’s just say the average albedo of the Earth is 0.65 when it’s very cold and there’s lots of snow. So, let

c_i = 1  - 0.65 =  0.35

be the ‘icy’ coalbedo, good for very low temperatures. Similarly, let’s say the average albedo drops to 0.3 when its very hot and the Earth is darker. So, let

c_f = 1 - 0.3 = 0.7

be the ‘ice-free’ coalbedo, good for high temperatures when the Earth is darker.

Then, we need a function of temperature that interpolates between c_i and c_f. Let’s try this:

c(T) = c_i + \frac{1}{2} (c_f-c_i) (1 + \tanh(\gamma T))

If you’re not a fan of the hyperbolic tangent function \tanh, this may seem scary. But don’t be intimidated!

The function \frac{1}{2}(1 + \tanh(\gamma T)) is just a function that goes smoothly from 0 at low temperatures to 1 at high temperatures. This ensures that the coalbedo is near its icy value c_i at low temperatures, and near its ice-free value c_f at high temperatures. But the fun part here is \gamma, a parameter that says how rapidly the coalbedo rises as the Earth gets warmer. Depending on this, we’ll get different effects!

The function c(T) rises fastest at T = 0, since that’s where \tanh (\gamma T) has the biggest slope. We’re just lucky that in Celsius T = 0 is the melting point of ice, so this makes a bit of sense.

Now Allan Erskine‘s programming magic comes into play! I’m very fortunate that the Azimuth Project has attracted some programmers who can make nice software for me to show you. Unfortunately his software doesn’t work on this blog—yet!—so please hop over here to see it in action:

Temperature versus insolation.

You can slide a slider to adjust the parameter \gamma to various values between 0 and 1.

In the little graph at right, you can see how the coalbedo c(T) changes as a function of the temperature T. In this graph the temperature ranges from -50 °C and 50 °C; the graph depends on what value of \gamma you choose with slider.

In the big graph at left, you can see how the insolation Q required to yield a given temperature T between -50 °C and 50 °C. As we’ve seen, it’s easiest to graph Q as a function of T:

\displaystyle{ Q = \frac{ A + B T } {c_i + \frac{1}{2} (c_f-c_i) (1 + \tanh(\gamma T))} }

Solving for T here is hard, but we can just flip the graph over to see what equilibrium temperatures T are allowed for a given insolation Q between 200 and 500 watts per square mater.

The exciting thing is that when \gamma gets big enough, three different temperatures are compatible with the same amount of insolation! This means the Earth can be hot, cold or something intermediate even when the amount of sunlight hitting it is fixed. The intermediate state is unstable, it turns out—we’ll see why later. Only the hot and cold states are stable. So, we say the Earth is bistable in this simplified model.

Can you see how big \gamma needs to be for this bistability to kick in? It’s certainly there when \gamma = 0.05, since then we get a graph like this:

When the insolation is less than about 385 W/m2 there’s only a cold state. When it hits 385 W/m2, as shown by the green line, suddenly there are two possible temperatures: a cold one and a much hotter one. When the insolation is higher, as shown by the black line, there are three possible temperatures: a cold one, and unstable intermediate one, and a hot one. And when the insolation gets above 465 W/m2, as shown by the red line, there’s only a hot state!

Mathematically, this model illustrates catastrophe theory. As we slowly turn up \gamma, we get different curves showing how temperature is a function of insolation… until suddenly the curve isn’t the graph of a function anymore: it becomes infinitely steep at one point! After that, we get bistability:


\gamma = 0.00

\gamma = 0.01

\gamma = 0.02

\gamma = 0.03

\gamma = 0.04

\gamma = 0.05

This is called a cusp catastrophe, and you can visualize these curves as slices of a surface in 3d, which looks roughly like this picture:



from here:

• Wolfram Mathworld, Cusp catastrophe. (Includes Mathematica package.)

The cusp catastrophe is ‘structurally stable’, meaning that small perturbations don’t change its qualitative behavior. In other words, whenever you have a smooth graph of a function that gets steeper and steeper until it ‘overhangs’ and ceases to be the graph of a function, it looks like this cusp catastrophe. This statement is quite vague as I’ve just said it— but it’s made 100% precise in catastrophe theory.

Structural stability is a useful concept, because it focuses our attention on robust features of models: features that don’t go away if the model is slightly wrong, as it always is.

There are lots more things to say, but the most urgent question to answer is this: why is the intermediate state unstable when it exists? Why are the other two equilibria stable? We’ll talk about that next time!


Mathematics of the Environment (Part 3)

13 October, 2012

This week I’ll release these notes before my seminar, so my students (and all of you, too) can read them ahead of time. The reason is that I’m pushing into topics I don’t understand as well as I’d like. So, my notes wrestle with some ideas in too much detail to cover in class—and I’m hoping some students will look at these notes ahead of time to prepare. Also, I’d appreciate your comments!

This week I’ll borrow material shamelessly from here:

• Seymour L. Hess, Introduction to Theoretical Meteorology, Henry Holt and Company, New York, 1959.

It’s an old book: for example, it talks about working out the area under a curve using a gadget called a planimeter, which is what people did before computers.

It also talks about how people measured the solar constant (roughly, the brightness of the Sun) before we could easily put satellites up above the Earth’s atmosphere! And it doesn’t mention global warming.

But despite or perhaps even because of these quaint features, it’s simple and clear. In case it’s not obvious yet, I’m teaching this quarter’s seminar in order to learn stuff. So, I’ll sometimes talk about old work… but if you catch me saying things that are seriously wrong (as opposed to merely primitive), please let me know.

The plan

Last time we considered a simple model Earth, a blackbody at uniform constant temperature absorbing sunlight and re-emitting the same amount of power in the form of blackbody radiation. We worked out that its temperature would be 6 °C, which is not bad. But then we took into account the fact that the Earth is not black. We got a temperature of -18 °C, which is too cold. The reason is that we haven’t yet equipped our model Earth with an atmosphere! So today let’s try that.

At this point things get a lot more complicated, even if we try a 1-dimensional model where the temperature, pressure and other features of the atmosphere only depend on altitude. So, I’ll only do what I can easily do. I’ll explain some basic laws governing radiation, and then sketch how people applied them to the Earth.

It’ll be good to start with a comment about what we did last time.

Kirchoff’s law of radiation

When we admitted the Earth wasn’t black, we said that it absorbed only about 70% of the radiation hitting it… but we still modeled it as emitting radiation just like a blackbody! Isn’t there something fishy about this?

Well, no. The Earth is mainly absorbing sunlight at visible frequencies, and at these frequencies it only absorbs about 70% of the radiation that hits it. But it mainly emits infrared light, and at these frequencies it acts like it’s almost black. These frequencies are almost completely different from those where absorption occurs.

But still, this issue is worth thinking about.

After all, emission and absorption are flip sides of the same coin. There’s a deep principle in physics, called reciprocity, which says that how X affects Y is not a separate question from how Y affects X. In fact, if you know the answer to one of these questions, you can figure out the answer to the other!

The first place most people see this principle is Newton’s third law of classical mechanics, saying that if X exerts a force on Y, Y exerts an equal and opposite force on X.

For example: if I punched your nose, your nose punched my fist just as hard, so you have no right to complain.

This law is still often stated in its old-fashioned form:

For every action there is an equal and opposite reaction.

I found this confusing as a student, because ‘force’ was part of the formal terminology of classical mechanics, but not ‘action’—at least not as used in this sentence!—and certainly not ‘reaction’. But as a statement of the basic intuition behind reciprocity, it’s got a certain charm.

In engineering, the principle of reciprocity is sometimes stated like this:

Reciprocity in linear systems is the principle that the response R_{ab} measured at a location a when the system is excited at a location b is exactly equal to R_{ba}, which is the response at location b when that same excitation is applied at a. This applies for all frequencies of the excitation.

Again this is a bit confusing, at least if you’re a mathematician who would like to know exactly how a ‘response’ or an ‘excitation’ is defined. It’s also disappointing to see the principle stated in a way that limits it to linear systems. Nonetheless it’s tremendously inspiring. What’s really going on here?

I don’t claim to have gotten to the bottom of it. My hunch is that to a large extent it will come down to the fact that mixed partial derivatives commute. If we’ve got a smooth function f of a bunch of variables x_1, \dots, x_n, and we set

\displaystyle{ R_{ab} = \frac{\partial^2 f}{\partial x_a \partial x_b} }

then

R_{ab} = R_{ba}

However, I haven’t gotten around to showing that reciprocity boils down to this in all the examples yet. Yet another unification project to add to my list!

Anyway: reciprocity has lots of interesting applications to electromagnetism. And that’s what we’re really talking about now. After all, light is electromagnetic radiation!

The simplest application is one we learn as children:

If I can see you, then you can see me.

or at least:

If light can go from X to Y in a static environment, it can also go from Y to X.

But we want something that sounds a bit different. Namely:

The tendency of a substance to absorb light at some frequency equals its tendency to emit light at that frequency.

This is too vague. We should make it precise, and in a minute I’ll try, but first let me motivate this idea with a thought experiment. Suppose we have a black rock and a white rock in a sealed mirrored container. Suppose they’re in thermal equilibrium at a very high temperature, so they’re glowing red-hot. So, there’s red light bouncing around the container. The black rock will absorb more of this light. But since they’re in thermal equilibrium, the black rock must also be emitting more of this light, or it would gain energy and get hotter than the white one. That would violate the zeroth law of thermodynamics, which implies that in thermal equilibrium, all the parts of a system must be at the same temperature.

More precisely, we have:

Kirchoff’s Law of Thermal Radiation. For any body in thermal equilibrium, its emissivity equals its absorptivity.

Let me explain. Suppose we have a surface made of some homogeneous isotropic material in thermal equilibrium at temperature T. If it’s perfectly black, we saw last time that it emits light with a monochromatic energy flux given by the Planck distribution:

\displaystyle{ f_{\lambda}(T) = \frac{2 hc^2}{\lambda^5} \frac{1}{ e^{\frac{hc}{\lambda k T}} - 1 } }

Here \lambda is the wavelength of light and the ‘monochromatic energy flux’ has units of power per area per wavelength.

But if our surface is not perfectly black, we have to multiply this by a fudge factor between 0 and 1 to get the right answer. This factor is called the emissivity of the substance. It can depend on the wavelength of the light quite a lot, and also on the surface’s temperature (since for example ice melts at high temperatures and gets darker). So, let’s call it e_\lambda(T).

We can also talk about the absorptivity of our surface, which is the fraction of light it absorbs. Again this depends on the wavelength of the light and the temperature of our surface. So, let’s call it a_\lambda(T).

Then Kirchoff’s law of thermal radiation says

e_\lambda(T) = a_\lambda(T)

So, for each frequency the emissivity must equal the absorptivity… but it’s still possible for the Earth to have an average emissivity near 1 at the wavelengths of infrared light and near 0.7 at the wavelengths of visible light. So there’s no paradox.

Puzzle 1. Is this law named after the same guy who discovered Kirchhoff’s laws governing electrical circuits?

Schwarzschild’s equation

Now let’s talk about light shining through the Earth’s atmosphere. Or more generally, light shining through a medium. What happens? It can get absorbed. It can get scattered, bouncing off in different directions. Light can also get emitted, especially if the medium is hot. The air in our atmosphere isn’t hot enough to emit a lot of visible light, but it definitely emits infrared light and microwaves.

It sounds complicated, and it is, but there are things we can say about it. Let me tell you about Schwarzschild’s equation.

Light comes in different wavelengths. So, can ask how much power per square meter this light carries per wavelength. We call this the monochromatic energy flux I_{\lambda}, since it depends on the wavelength \lambda. As mentioned last time, this has units W/m2μm, where μm stands for micrometers, a unit of wavelength.

However, because light gets absorbed, scattered and emitted the monochromatic energy flux is really a function I_{\lambda}(s), where s is the distance through the medium. Here I’m imagining an essentially one-dimensional situation, like a beam of sunlight coming down through the air when the Sun is directly overhead. We can generalize this later.

Let’s figure out the basic equation describing how I_{\lambda}(s) changes as a function of s. This is called the equation of radiative transfer, or Schwarzschild’s equation. It won’t tell us how different gases absorb different amounts of light of different frequencies—for that we need to do hard calculations, or experiments. But we can feed the results of these calculations into Schwarzschild’s equation.

For starters, let’s assume that light only gets absorbed but not emitted or scattered. Later we’ll include emission, which is very important for what we’re doing: the Earth’s atmosphere is warm enough to emit significant amounts of infrared light (though not hot enough to emit much visible light). Scattering is also very important, but it can be treated as a combination of absorption and emission.

For absorption only, we have the Beer–Lambert law:

\displaystyle{  \frac{d I_\lambda(s)}{d s} = - a_\lambda(s) I_\lambda(s)  }

In other words, the amount of radiation that gets absorbed per distance is proportional to the amount of radiation. However, the constant of proportionality a_\lambda (s) can depend on the frequency and the details of our medium at the position s. I don’t know the standard name for this constant a_\lambda (s), so let’s call it the absorption rate.

Puzzle 2. Assuming the Beer–Lambert law, show that the intensity of light at two positions s_1 and s_2 is related by

\displaystyle{ I_\lambda(s_2) = e^{-\tau} \; I_\lambda(s_1) }

where the optical depth \tau of the intervening medium is defined by

\displaystyle{ \tau = \int_{s_1}^{s_2} a_\lambda(s) \, ds }

So, a layer of stuff has optical depth equal to 1 if light shining through it has its intensity reduced by a factor of 1/e.

We can go a bit further if our medium is a rather thin gas like the air in our atmosphere. Then the absorption rate is given by

a_\lambda(s) = k_\lambda(s) \, \rho(s)

where \rho(s) is the density of the air at the position s and k_\lambda(s) is its absorption coefficient.

In other words, air absorbs light at a rate proportional to its density, but also depending on what it’s made of, which may vary with position. For example, both the density and the humidity of the atmosphere can depend on its altitude.

What about emission? Air doesn’t just absorb infrared light, it also emits significant amounts of it! As mentioned, a blackbody at temperature T emits light with a monochromatic energy flux given by the Planck distribution:

\displaystyle{ f_{\lambda}(T) = \frac{2 hc^2}{\lambda^5} \frac{1}{ e^{\frac{hc}{\lambda k T}} - 1 } }

But a gas like air is far from a blackbody, so we have to multiply this by a fudge factor. Luckily, thanks to Kirchoff’s law of radiation, this factor isn’t so fudgy: it’s just the absorption rate a_\lambda(s).

Here are we generalizing Kirchoff’s law from a surface to a column of air, but that’s okay because we can treat a column as a stack of surfaces; letting these become very thin we arrive at a differential formulation of the law that applies to absorption and emission rates instead of absorptivity and emissivity. (If you’re very sharp, you’ll remember that Kirchoff’s law applies to thermal equilibrium, and wonder about that. Air in the atmosphere isn’t in perfect thermal equilibrium, but it’s close enough for what we’re doing here.)

So, when we take absorption and also emission into account, Beer’s law gets another term:

\displaystyle{  \frac{d I_\lambda(s)}{d s} = - a_\lambda(s) I_\lambda(s) + a_\lambda(s) f_\lambda(T(s)) }

where T is the temperature of our gas at the position s. In other words:

\displaystyle{  \frac{d I_\lambda(s)}{d s} =  a_\lambda(s) ( f_\lambda(T) - I_\lambda(s))}

This is Schwarzschild’s equation.

Puzzle 3. Is this equation named after the same guy who discovered the Schwarzschild metric in general relativity, describing a spherically symmetric black hole?

Application to the atmosphere

In principle, we can use Schwarzschild’s equation to help work out how much sunlight of any frequency actually makes it through the atmosphere down to the Earth, and also how much infrared radiation makes it through the atmosphere out to space. But this is not a calculation I can do here today, because it’s very complicated.

If we actually measure what fraction of radiation of different frequencies makes it through the atmosphere, you’ll see why:


Everything here is a function of the wavelength, measured in micrometers. The smooth red curve is the Planck distribution for light coming from the Sun at a temperature of 5325 K. Most of it is visible light, with a wavelength between 0.4 and 0.7 micrometers. The jagged red region shows how much of this gets through—on a clear day, I assume—and you can see that most of it gets through. The smooth bluish curves are the Planck distributions for light coming from the Earth at various temperatures between 210 K and 310 K. Most of it is infrared light, and not much of it gets through.

This, in a nutshell, is what keeps the Earth warmer than the chilly -18 °C we got last time for an Earth with no atmosphere!

This is the greenhouse effect. As you can see, the absorption of infrared light is mainly due to water vapor, and then carbon dioxide, and then other lesser greenhouse gases, mainly methane, nitrous oxide. Oxygen and ozone also play a minor role, but ozone is more important in blocking ultraviolet light. Rayleigh scattering—the scattering of light by small particles, including molecules and atoms—is also important at short wavelengths, because its strength is proportional to 1/\lambda^4. This is why the sky is blue!

Here the wavelengths are measured in nanometers; there are 1000 nanometers in a micrometer. Rayleigh scattering continues to become more important in the ultraviolet.

But right now I want to talk about the infrared. As you can see, the all-important absorption of infrared radiation by water vapor and carbon dioxide is quite complicated. You need quantum mechanics to predict how this works from first principles. Tim van Beek gave a gentle introduction to some of the key ideas here:

• Tim van Beek, A quantum of warmth, Azimuth, 2 July 2011.

Someday it would be fun to get into the details. Not today, though!

You can see what’s going on a bit more clearly here:


The key fact is that infrared is almost completely absorbed for wavelengths between 5.5 and 7 micrometers, or over 14 micrometers. (A ‘micron’ is just an old name for a micrometer.)

The work of Simpson

The first person to give a reasonably successful explanation of how the power of radiation emitted by the Earth balances the power of the sunlight it absorbs was George Simpson. He did it in 1928:

• George C. Simpson, Further studies in terrestrial radiation, Mem. Roy. Meteor. Soc. 3 (1928), 1–26.

One year earlier, he had tried and failed to understand this problem using a ‘gray atmosphere’ model where the fraction of light that gets through was independent of its wavelength. If you’ve been paying attention, I think you can see why that didn’t work.

In 1928, since he didn’t have a computer, he made a simple model that treated emission of infrared radiation as follows.

He treated the atmosphere as made of layers of varying thickness, each layer containing 0.03 grams per centimeter2 of water vapor. The Earth’s surface radiates infrared almost as a black body. Part of the power is absorbed by the first layer above the surface, while some makes it through. The first layer then re-radiates at the same wavelengths at a rate determined by its temperature. Half this goes downward, while half goes up. Of the part going upward, some is absorbed by the next layer… and so on, up to the top layer. He took this top layer to end at the stratosphere, since the atmosphere is much drier in the stratosphere.

He did this all in a way that depends on the wavelength, but using a simplified model of how each of these layers absorbs infrared light. He assumed it was:

• completely opaque from 5.5 to 7 micrometers (due to water vapor),

• partly transparent from 7 to 8.5 micrometers (interpolating between opaque and transparent),

• completely transparent from 8.5 to 11 micrometers,

• partly transparent from 11 to 14 micrometers (interpolating between transparent and opaque),

• completely opaque above 14 micrometers (due to carbon dioxide and water vapor).

He got this result, at the latitude 50° on a clear day:


The upper smooth curve is the Planck distribution for a temperature of 280 K, corresponding to the ground. The lower smooth curve is the Planck distribution at 218 K, corresponding to the stratosphere. The shaded region is his calculation of the monochromatic flux emitted into space by the Earth. As you can see, it matches the Planck distribution for the stratosphere where the lower atmosphere is completely opaque in his model—between 5.5 and 7 micrometers, and over 14 micrometers. It matches the Planck distribution for the stratosphere where the lower atmosphere is completely transparent. Elsewhere, it interpolates between the two.

The area of this shaded region—calculated with a planimeter, perhaps?—is the total flux emitted into space.

This is just part of the story: he also took clouds into account, and he did different calculations at different latitudes. He got a reasonably good balance between the incoming and outgoing power. In short, he showed that an Earth with its observed temperatures is roughly compatible with his model of how the Earth absorbs and emits radiation. Note that this is just another way to tackle the problem of predicting the temperature given a model.

Also note that Simpson didn’t quite use the Schwarzschild equation. But I guess that in some sense he discretized it—right?

And so on

This was just the beginning of a series of more and more sophisticated models. I’m too tired to go on right now.

You’ll note one big thing we’ve omitted: any sort of calculation of how the pressure, temperature and humidity of the air varies with altitude! To the extent we talked about those at all, we treated them as inputs. But for a full-fledged one-dimensional model of the Earth’s atmosphere, we’d want to derive them from some principles. There are, after all, some important puzzles:

Puzzle 4. If hot air rises, why does the atmosphere generally get colder as you go upward, at least until you reach the stratosphere?

Puzzle 5. Why is there a tropopause? In other words, why is there a fairly sudden transition 10 kilometers up between the troposphere, where the air is moist, cooler the higher you go, and turbulent, and the stratosphere, where the air is drier, warmer the higher you go, and not turbulent?

There’s a limit to how much we can understand these puzzles using a 1-dimensional model, but we should at least try to make a model of a thin column of air with pressure, temperature and humidity varying as a function of altitude, with sunlight streaming downward and infrared radiation generally going up. If we can’t do that, we’ll never understand more complicated things, like the actual atmosphere.


Mathematics of the Environment (Part 2)

11 October, 2012

Here are some notes for the second session of my seminar. They are shamelessly borrowed from these sources:

• Tim van Beek, Putting the Earth In a Box, Azimuth, 19 June 2011.

Climate model, Azimuth Library.

Climate models

Though it’s not my central concern in this class, we should talk a little about climate models.

There are many levels of sophistication when it comes to climate models. It is wise to start with simple, not very realistic models before ascending to complicated, supposedly more realistic ones. This is true in every branch of math or physics: working with simple models gives you insights that are crucial for correctly handling more complicated models. You shouldn’t fly a fighter jet if you haven’t tried something simpler yet, like a bicycle: you’ll probably crash and burn.

As I mentioned last time, models in biology, ecology and climate science pose new challenges compared to models of the simpler systems that physicists like best. As Chris Lee emphasizes, biology inherently deals with ‘high data’ systems where the relevant information can rarely be captured in a few variables, or even a few field equations.

(Field theories involve infinitely many variables, but somehow the ones physicists like best allow us to make a small finite number of measurements and extract a prediction from them! It would be nice to understand this more formally. In quantum field theory, the ‘nice’ field theories are called ‘renormalizable’, but a similar issue shows up classically, as we’ll see in a second.)

The climate system is in part a system that feels like ‘physics’: the flow of air in the atmosphere and water in the ocean. But some of the equations here, for example the Navier–Stokes equations, are already ‘nasty’ by the standards of mathematical physics, since the existence of solutions over long periods of time has not been proved. This is related to ‘turbulence’, a process where information at one length scale can significantly affect information at another dramatically different length scale, making precise predictions difficult.

Climate prediction is, we hope and believe, somewhat insulated from the challenges of weather prediction: we can hope to know the average temperature of the Earth within a degree or two in 5 years even though we don’t know whether it will rain in Manhattan on October 8, 2017. But this hope is something that needs to be studied, not something we can take for granted.

On top of this, the climate is, quite crucially, a biological system. Plant and animal life really affects the climate, as well as being affected by it. So, for example, a really detailed climate model may have a portion specially devoted to the behavior of plankton in the Mediterranean. This means that climate models will never be as ‘neat and clean’ as physicists and mathematicians tend to want—at least, not if these models are trying to be truly realistic. And as I suggested last time, this general type of challenge—the challenge posed by biosystems too complex to precisely model—may ultimately push mathematics in very new directions.

I call this green mathematics, without claiming I know what it will be like. The term is mainly an incitement to think big. I wrote a little about it here.

However, being a bit of an old-fashioned mathematician myself, I’ll start by talking about some very simple climate models, gradually leading up to some interesting puzzles about the ‘ice ages’ or, more properly, ‘glacial cycles’ that have been pestering the Earth for the last 20 million years or so. First, though, let’s take a quick look at the hierarchy of different climate models.

Different kinds of climate models

Zero-dimensional models are like theories of classical mechanics instead of classical field theory. In other words, they only consider with globally averaged quantities, like the average temperature of the Earth, or perhaps regionally averaged quantities, like the average temperature of each ocean and each continent. This sounds silly, but it’s a great place to start. It amounts to dealing with finitely many variables depending on time:

(x_1(t), \dots x_n(t))

We might assume these obey a differential equation, which we can always make first-order by introducing extra variables:

\displaystyle{ \frac{d x_i}{d t} = f_i(t, x_1(t), \dots, x_n(t))  }

This kind of model is studied quite generally in the subject of dynamical systems theory.

In particular, energy balance models try to predict the average surface temperature of the Earth depending on the energy flow. Energy comes in from the Sun and is radiated to outer space by the Earth. What happens in between is modeled by averaged feedback equations.

The Earth has various approximately conserved quantities like the total amount of carbon, or oxygen, or nitrogen—radioactive decay creates and destroys these elements, but it’s pretty negligible in climate physics. So, these things move around from one form to another. We can imagine a model where some of our variables x_i(t) are the amounts of carbon in the air, or in the soil, or in the ocean—different ‘boxes’, abstractly speaking. It will flow from one box to another in a way that depends on various other variables in our model. This idea gives class of models called box models.

Here’s one described by Nathan Urban in “week304″ of This Week’s Finds:

I’m interested in box models because they’re a simple example of ‘networked systems’: we’ve got boxes hooked up by wires, or pipes, and we can imagine a big complicated model formed by gluing together smaller models, attaching the wires from one to the wires of another. We can use category theory to formalize this. In category theory we’d call these smaller models ‘morphisms’, and the process of gluing them together is called ‘composing’ them. I’ll talk about this a lot more someday.

One-dimensional models treat temperature and perhaps other quantities as a function of one spatial coordinate (in addition to time): for example, the altitude. This lets us include one dimensional processes of heat transport in the model, like radiation and (a very simplified model of) convection.

Two-dimensional models treat temperature and other quantities as a function of two spatial coordinates (and time): for example, altitude and latitude. Alternatively, we could treat the atmosphere as a thin layer and think of temperature at some fixed altitude as a function of latitude and longitude!

Three-dimensional models treat temperature and other quantities as a function of all three spatial coordinates. At this point we can, if we like, use the full-fledged Navier–Stokes equations to describe the motion of air in the atmosphere and water in the ocean. Needless to say, these models can become very complex and computation-intensive, depending on how many effects we want to take into account and at what resolution we wish to model the atmosphere and ocean.

General circulation models or GCMs try to model the circulation of the atmosphere and/or ocean.

Atmospheric GCMs or AGCMs model the atmosphere and typically contain a land-surface model, while imposing some boundary conditions describing sea surface temperatures. Oceanic GCMs or OGCMs model the ocean (with fluxes from the atmosphere imposed) and may or may not contain a sea ice model. Coupled atmosphere–ocean GCMs or AOGCMs do both atmosphere and ocean. These the basis for detailed predictions of future climate, such as are discussed by the Intergovernmental Panel on Climate Change, or IPCC.

• Backing down a bit, we can consider Earth models of intermediate complexity or EMICs. These might have a 3-dimensional atmosphere and a 2-dimensional ‘slab ocean’, or a 3d ocean and an energy-moisture balance atmosphere.

• Alternatively, we can consider regional circulation models or RCMs. These are limited-area models that can be run at higher resolution than the GCMs and are thus able to better represent fine-grained phenomena, including processes resulting from finer-scale topographic and land-surface features. Typically the regional atmospheric model is run while receiving lateral boundary condition inputs from a relatively-coarse resolution atmospheric analysis model or from the output of a GCM. As Michael Knap pointed out in class, there’s again something from network theory going on here: we are ‘gluing in’ the RCM into a ‘hole’ cut out of a GCM.

Modern GCMs as used in the 2007 IPCC report tended to run around 100-kilometer resolution. Individual clouds can only start to be resolved at about 10 kilometers or below. One way to deal with this is to take the output of higher resolution regional climate models and use it to adjust parameters, etcetera, in GCMs.

The hierarchy of climate models

The climate scientist Isaac Held has a great article about the hierarchy of climate models:

• Isaac Held, The gap between simulation and understanding in climate modeling, Bulletin of the American Meteorological Society (November 2005), 1609–1614.

In it, he writes:

The importance of such a hierarchy for climate modeling and studies of atmospheric and oceanic dynamics has often been emphasized. See, for example, Schneider and Dickinson (1974), and, especially, Hoskins (1983). But, despite notable exceptions in a few subfields, climate theory has not, in my opinion, been very successful at hierarchy construction. I do not mean to imply that important work has not been performed, of course, but only that the gap between comprehensive climate models and more idealized models has not been successfully closed.

Consider, by analogy, another field that must deal with exceedingly complex systems—molecular biology. How is it that biologists have made such dramatic and steady progress in sorting out the human genome and the interactions of the thousands of proteins of which we are constructed? Without doubt, one key has been that nature has provided us with a hierarchy of biological systems of increasing complexity that are amenable to experimental manipulation, ranging from bacteria to fruit fly to mouse to man. Furthermore, the nature of evolution assures us that much of what we learn from simpler organisms is directly relevant to deciphering the workings of their more complex relatives. What good fortune for biologists to be presented with precisely the kind of hierarchy needed to understand a complex system! Imagine how much progress would have been made if they were limited to studying man alone.

Unfortunately, Nature has not provided us with simpler climate systems that form such a beautiful hierarchy. Planetary atmospheres provide insights into the range of behaviors that are possible, but the known planetary atmospheres are few, and each has its own idiosyncrasies. Their study has connected to terrestrial climate theory on occasion, but the influence has not been systematic. Laboratory simulations of rotating and/or convecting fluids remain valuable and underutilized, but they cannot address our most complex problems. We are left with the necessity of constructing our own hierarchies of climate models.

Because nature has provided the biological hierarchy, it is much easier to focus the attention of biologists on a few representatives of the key evolutionary steps toward greater complexity. And, such a focus is central to success. If every molecular biologist had simply studied his or her own favorite bacterium or insect, rather than focusing so intensively on E. coli or Drosophila melanogaster, it is safe to assume that progress would have been far less rapid.

It is emblematic of our problem that studying the biological hierarchy is experimental science, while constructing and studying climate hierarchies is theoretical science. A biologist need not convince her colleagues that the model organism she is advocating for intensive study is well designed or well posed, but only that it fills an important niche in the hierarchy of complexity and that it is convenient for study. Climate theorists are faced with the difficult task of both constructing a hierarchy of models and somehow focusing the attention of the community on a few of these models so that our efforts accumulate efficiently. Even if one believes that one has defined the E. coli of climate models, it is difficult to energize (and fund) a significant number of researchers to take this model seriously and devote years to its study.

And yet, despite the extra burden of trying to create a consensus as to what the appropriate climate model hierarchies are, the construction of such hierarchies must, I believe, be a central goal of climate theory in the twenty-first century. There are no alternatives if we want to understand the climate system and our
comprehensive climate models. Our understanding will be embedded within these hierarchies.

It is possible that mathematicians, with a lot of training from climate scientists, have the sort of patience and delight in ‘study for study’s sake’ to study this hierarchy of models. Here’s one that Held calls ‘the fruit fly of climate models’:

For more, see:

• Isaac Held, The fruit fly of climate models.

The very simplest model

The very simplest model is a zero-dimensional energy balance model. In this model we treat the Earth as having just one degree of freedom—its temperature—and we treat it as a blackbody in equilibrium with the radiation coming from the Sun.

A black body is an object that perfectly absorbs and therefore also perfectly emits all electromagnetic radiation at all frequencies. Real bodies don’t have this property; instead, they absorb radiation at certain frequencies better than others, and some not at all. But there are materials that do come rather close to a black body. Usually one adds another assumption to the characterization of an ideal black body: namely, that the radiation is independent of the direction.

When the black body has a certain temperature T, it will emit electromagnetic radiation, so it will send out a certain amount of energy per second for every square meter of surface area. We will call this the energy flux and denote this as f. The SI unit for f is W/m2: that is, watts per square meter. Here the watt is a unit of energy per time.

Electromagnetic radiation comes in different wavelengths. So, can ask how much energy flux our black body emits per change in wavelength. This depends on the wavelength. We will call this the monochromatic energy flux f_{\lambda}. The SI unit for f_{\lambda} is W/m2μm, where μm stands for micrometer: a millionth of a meter, which is a unit of wavelength. We call f_\lambda the ‘monochromatic’ energy flux because it gives a number for any fixed wavelength \lambda. When we integrate the monochromatic energy flux over all wavelengths, we get the energy flux f.

Max Planck was able to calculate f_{\lambda} for a blackbody at temperature T, but only by inventing a bit of quantum mechanics. His result is called the Planck distribution: if

\displaystyle{ f_{\lambda}(T) = \frac{2 hc^2}{\lambda^5} \frac{1}{ e^{\frac{hc}{\lambda k T}} - 1 } }

where h is Planck’s constant, c is the speed of light, and k is Boltzmann’s constant. Deriving this would be tons of fun, but also a huge digression from the point of this class.

You can integrate f_\lambda over all wavelengths \lambda to get the total energy flux—that is, the total power per square meter emitted by a blackbody. The answer is surprisingly simple: if the total energy flux is defined by

\displaystyle{f = \int_0^\infty f_{\lambda}(T) \, d \lambda }

then in fact we can do the integral and get

f = \sigma \; T^4

for some constant \sigma. This fact is called the Stefan–Boltzmann law, and \sigma is called the Stefan-Boltzmann constant:

\displaystyle{ \sigma=\frac{2\pi^5 k^4}{15c^2h^3} \approx 5.67 \times 10^{-8}\, \frac{\mathrm{W}}{\mathrm{m}^2 \mathrm{K}^4} }

Using this formula, we can assign to every energy flux f a black body temperature T, which is the temperature that an ideal black body would need to have to emit f.

Let’s use this to calculate the temperature of the Earth in this simple model! A planet like Earth gets energy from the Sun and loses energy by radiating to space. Since the Earth sits in empty space, these two processes are the only relevant ones that describe the energy flow.

The sunshine near Earth carries an energy flux of about 1370 watts per square meter. If the temperature of the Earth is constant, as much energy is coming in as going out. So, we might try to balance the incoming energy flux with the outgoing flux of a blackbody at temperature T:

\displaystyle{ 1370 \, \textrm{W}/\textrm{m}^2 = \sigma T^4 }

and then solve for T:

\displaystyle{ T = \left(\frac{1370 \textrm{W}/\textrm{m}^2}{\sigma}\right)^{1/4} }

We’re making a big mistake here. Do you see what it is? But let’s go ahead and see what we get. As mentioned, the Stefan–Boltzmann constant has a value of

\displaystyle{ \sigma \approx 5.67 \times 10^{-8} \, \frac{\mathrm{W}}{\mathrm{m}^2 \mathrm{K}^4}  }

so we get

\displaystyle{ T = \left(\frac{1370}{5.67 \times 10^{-8}} \right)^{1/4} \mathrm{K} }  \approx (2.4 \cdot 10^9)^{1/4} \mathrm{K} \approx 394 \mathrm{K}

This is much too hot! Remember, this temperature is in kelvin, so we need to subtract 273 to get Celsius. Doing so, we get a temperature of 121 °C. This is above the boiling point of water!

Do you see what we did wrong? We neglected a phenomenon known as night. The Earth emits infrared radiation in all directions, but it only absorbs sunlight on the daytime side. Our calculation would be correct if the Earth were a flat disk of perfectly black stuff facing the Sun and perfectly insulated on the back so that it could only emit infrared radiation over the same area that absorbs sunlight! But in fact emission takes place over a larger area than absorption. This makes the Earth cooler.

To get the right answer, we need to take into account the fact that the Earth is round. But just for fun, let’s see how well a flat Earth theory does. A few climate skeptics may even believe this theory. Suppose the Earth were a flat disk of radius r, made of black stuff facing the Sun but not insulated on back. Then it would absorb power equal to

1370 \cdot \pi r^2

since the area of the disk is \pi r^2, but it would emit power equal to

\sigma T^4 \cdot 2 \pi r^2

since it emits from both the front and back. Setting these equal, we now get

\displaystyle{ \frac{1370}{2} \textrm{W}/\textrm{m}^2 = \sigma T^4 }

or

\displaystyle{ T = \left(\frac{1370 \textrm{W}/\textrm{m}^2}{2 \sigma}\right)^{1/4} }

This reduces the temperature by a factor of 2^{-1/4} \approx 0.84 from our previous estimate. So now the temperature works out to be less:

0.84 \cdot 394 \mathrm{K} \approx 331 \mathrm{K}

But this is still too hot! It’s 58 °C, or 136 °F for you Americans out there who don’t have a good intuition for Celsius.

So, a flat black Earth facing the Sun would be a very hot Earth.

But now let’s stop goofing around and do the calculation with a round Earth. Now it absorbs a beam of sunlight with area equal to its cross-section, a circle of area \pi r^2. But it emits infrared over its whole area of 4 \pi r^2: four times as much. So now we get

\displaystyle{ T = \left(\frac{1370 \textrm{W}/\textrm{m}^2}{4 \sigma}\right)^{1/4} }

so the temperature is reduced by a further factor of 2^{-1/4}. We get

0.84 \cdot 331 \mathrm{K} \approx 279 \mathrm{K}

That’s 6 °C. Not bad for a crude approximation! Amusingly, it’s crucial that the area of a sphere is 4 times the area of a circle of the same radius. The question if there is some deeper reason for this simple relation was posed as a geometry puzzle here on Azimuth.

I hope my clowning around hasn’t distracted you from the main point. On average our simplified blackbody Earth absorbs 1370/4 = 342.5 watts of solar power per square meter. So, that’s how much infrared radiation it has to emit. If you can imagine how much heat a 60-watt bulb puts out when it’s surrounded by black paper, we’re saying our simplified Earth emits about 6 times that heat per square meter.

The second simplest climate model

The next step is to take into account the ‘albedo’ of the Earth. The albedo is the fraction of radiation that is instantly reflected without being absorbed. The albedo of a surface does depend on the material of the surface, and in particular on the wavelength of the radiation, of course. But in a first approximation for the average albedo of earth we can take:

\mathrm{albedo}_{\mathrm{Earth}} = 0.3

This means that 30% of the radiation is instantly reflected and only 70% contributes to heating earth. So, instead of getting heated by an average of 342.5 watts per square meter of sunlight, let’s assume it’s heated by

0.7 \times 342.5 \approx 240

watts per square meter. Now we get a temperature of

\displaystyle{ T = \left(\frac{240}{5.67 \times 10^{-8}} \right)^{1/4} K }  \approx (4.2 \cdot 10^9)^{1/4} K \approx 255 K

This is -18 °C. The average temperature of earth is actually estimated to be considerably warmer: about +15 °C. This should not be a surprise: after all, 70% of the planet is covered by liquid water! This is an indication that the average temperature is most probably not below the freezing point of water.

So, our new ‘improved’ calculation gives a worse agreement with reality. The actual Earth is roughly 33 kelvin warmer than our model Earth! What’s wrong?

The main explanation for the discrepancy seems to be: our model Earth doesn’t have an atmosphere yet! Thanks in part to greenhouse gases like water vapor and carbon dioxide, sunlight at visible frequencies can get into the atmosphere more easily than infrared radiation can get out. This warms the Earth. This, in a nutshell, is why dumping a lot of extra carbon dioxide into the air can change our climate. But of course we’ll need to turn to more detailed models, or experimental data, to see how strong this effect is.

Besides the greenhouse effect, there are many other things our ultra-simplified model leaves out: everything associated to the atmosphere and oceans, such as weather, clouds, the altitude-dependence of the temperature of the atmosphere… and also the way the albedo of the Earth depends on location and even on temperature and other factors. There is much much more to say about all this… but not today!


The Mathematical Origin of Irreversibility

8 October, 2012

guest post by Matteo Smerlak

Introduction

Thermodynamical dissipation and adaptive evolution are two faces of the same Markovian coin!

Consider this. The Second Law of Thermodynamics states that the entropy of an isolated thermodynamic system can never decrease; Landauer’s principle maintains that the erasure of information inevitably causes dissipation; Fisher’s fundamental theorem of natural selection asserts that any fitness difference within a population leads to adaptation in an evolution process governed by natural selection. Diverse as they are, these statements have two common characteristics:

1. they express the irreversibility of certain natural phenomena, and

2. the dynamical processes underlying these phenomena involve an element of randomness.

Doesn’t this suggest to you the following question: Could it be that thermal phenomena, forgetful information processing and adaptive evolution are governed by the same stochastic mechanism?

The answer is—yes! The key to this rather profound connection resides in a universal property of Markov processes discovered recently in the context of non-equilibrium statistical mechanics, and known as the ‘fluctuation theorem’. Typically stated in terms of ‘dissipated work’ or ‘entropy production’, this result can be seen as an extension of the Second Law of Thermodynamics to small systems, where thermal fluctuations cannot be neglected. But it is actually much more than this: it is the mathematical underpinning of irreversibility itself, be it thermodynamical, evolutionary, or else. To make this point clear, let me start by giving a general formulation of the fluctuation theorem that makes no reference to physics concepts such as ‘heat’ or ‘work’.

The mathematical fact

Consider a system randomly jumping between states a, b,\dots with (possibly time-dependent) transition rates \gamma_{a b}(t) where a is the state prior to the jump, while b is the state after the jump. I’ll assume that this dynamics defines a (continuous-time) Markov process, namely that the numbers \gamma_{a b} are the matrix entries of an infinitesimal stochastic matrix, which means that its off-diagonal entries are non-negative and that its columns sum up to zero.

Now, each possible history \omega=(\omega_t)_{0\leq t\leq T} of this process can be characterized by the sequence of occupied states a_{j} and by the times \tau_{j} at which the transitions a_{j-1}\longrightarrow a_{j} occur (0\leq j\leq N):

\omega=(\omega_{0}=a_{0}\overset{\tau_{0}}{\longrightarrow} a_{1} \overset{\tau_{1}}{\longrightarrow}\cdots \overset{\tau_{N}}{\longrightarrow} a_{N}=\omega_{T}).

Define the skewness \sigma_{j}(\tau_{j}) of each of these transitions to be the logarithmic ratio of transition rates:

\displaystyle{\sigma_{j}(\tau_{j}):=\ln\frac{\gamma_{a_{j}a_{j-1}}(\tau_{j})}{\gamma_{a_{j-1}a_{j}}(\tau_{j})}}

Also define the self-information of the system in state a at time t by:

i_a(t):= -\ln\pi_{a}(t)

where \pi_{a}(t) is the probability that the system is in state a at time t, given some prescribed initial distribution \pi_{a}(0). This quantity is also sometimes called the surprisal, as it measures the ‘surprise’ of finding out that the system is in state a at time t.

Then the following identity—the detailed fluctuation theorem—holds:

\mathrm{Prob}[\Delta i-\Sigma=-A] = e^{-A}\;\mathrm{Prob}[\Delta i-\Sigma=A]

where

\displaystyle{\Sigma:=\sum_{j}\sigma_{j}(\tau_{j})}

is the cumulative skewness along a trajectory of the system, and

\Delta i= i_{a_N}(T)-i_{a_0}(0)

is the variation of self-information between the end points of this trajectory.

This identity has an immediate consequence: if \langle\,\cdot\,\rangle denotes the average over all realizations of the process, then we have the integral fluctuation theorem:

\langle e^{-\Delta i+\Sigma}\rangle=1,

which, by the convexity of the exponential and Jensen’s inequality, implies:

\langle \Delta i\rangle=\Delta S\geq\langle\Sigma\rangle.

In short: the mean variation of self-information, aka the variation of Shannon entropy

\displaystyle{ S(t):= \sum_{a}\pi_{a}(t)i_a(t) }

is bounded from below by the mean cumulative skewness of the underlying stochastic trajectory.

This is the fundamental mathematical fact underlying irreversibility. To unravel its physical and biological consequences, it suffices to consider the origin and interpretation of the ‘skewness’ term in different contexts. (By the way, people usually call \Sigma the ‘entropy production’ or ‘dissipation function’—but how tautological is that?)

The physical and biological consequences

Consider first the standard stochastic-thermodynamic scenario where a physical system is kept in contact with a thermal reservoir at inverse temperature \beta and undergoes thermally induced transitions between states a, b,\dots. By virtue of the detailed balance condition:

\displaystyle{ e^{-\beta E_{a}(t)}\gamma_{a b}(t)=e^{-\beta E_{b}(t)}\gamma_{b a}(t),}

the skewness \sigma_{j}(\tau_{j}) of each such transition is \beta times the energy difference between the states a_{j} and a_{j-1}, namely the heat received from the reservoir during the transition. Hence, the mean cumulative skewness \langle \Sigma\rangle is nothing but \beta\langle Q\rangle, with Q the total heat received by the system along the process. It follows from the detailed fluctuation theorem that

\langle e^{-\Delta i+\beta Q}\rangle=1

and therefore

\Delta S\geq\beta\langle Q\rangle

which is of course Clausius’ inequality. In a computational context where the control parameter is the entropy variation itself (such as in a bit-erasure protocol, where \Delta S=-\ln 2), this inequality in turn expresses Landauer’s principle: it impossible to decrease the self-information of the system’s state without dissipating a minimal amount of heat into the environment (in this case -Q \geq k T\ln2, the ‘Landauer bound’). More general situations (several types of reservoirs, Maxwell-demon-like feedback controls) can be treated along the same lines, and the various forms of the Second Law derived from the detailed fluctuation theorem.

Now, many would agree that evolutionary dynamics is a wholly different business from thermodynamics; in particular, notions such as ‘heat’ or ‘temperature’ are clearly irrelevant to Darwinian evolution. However, the stochastic framework of Markov processes is relevant to describe the genetic evolution of a population, and this fact alone has important consequences. As a simple example, consider the time evolution of mutant fixations x_{a} in a population, with a ranging over the possible genotypes. In a ‘symmetric mutation scheme’, which I understand is biological parlance for ‘reversible Markov process’, meaning one that obeys detailed balance, the ratio between the a\mapsto b and b\mapsto a transition rates is completely determined by the fitnesses f_{a} and f_b of a and b, according to

\displaystyle{\frac{\gamma_{a b}}{\gamma_{b a}} =\left(\frac{f_{b}}{f_{a}}\right)^{\nu} }

where \nu is a model-dependent function of the effective population size [Sella2005]. Along a given history of mutant fixations, the cumulated skewness \Sigma is therefore given by minus the fitness flux:

\displaystyle{\Phi=\nu\sum_{j}(\ln f_{a_j}-\ln f_{a_{j-1}}).}

The integral fluctuation theorem then becomes the fitness flux theorem:

\displaystyle{ \langle e^{-\Delta i -\Phi}\rangle=1}

discussed recently by Mustonen and Lässig [Mustonen2010] and implying Fisher’s fundamental theorem of natural selection as a special case. (Incidentally, the ‘fitness flux theorem’ derived in this reference is more general than this; for instance, it does not rely on the ‘symmetric mutation scheme’ assumption above.) The ensuing inequality

\langle \Phi\rangle\geq-\Delta S

shows that a positive fitness flux is “an almost universal evolutionary principle of biological systems” [Mustonen2010], with negative contributions limited to time intervals with a systematic loss of adaptation (\Delta S > 0). This statement may well be the closest thing to a version of the Second Law of Thermodynamics applying to evolutionary dynamics.

It is really quite remarkable that thermodynamical dissipation and Darwinian evolution can be reduced to the same stochastic mechanism, and that notions such as ‘fitness flux’ and ‘heat’ can arise as two faces of the same mathematical coin, namely the ‘skewness’ of Markovian transitions. After all, the phenomenon of life is in itself a direct challenge to thermodynamics, isn’t it? When thermal phenomena tend to increase the world’s disorder, life strives to bring about and maintain exquisitely fine spatial and chemical structures—which is why Schrödinger famously proposed to define life as negative entropy. Could there be a more striking confirmation of his intuition—and a reconciliation of evolution and thermodynamics in the same go—than the fundamental inequality of adaptive evolution \langle\Phi\rangle\geq-\Delta S?

Surely the detailed fluctuation theorem for Markov processes has other applications, pertaining neither to thermodynamics nor adaptive evolution. Can you think of any?

Proof of the fluctuation theorem

I am a physicist, but knowing that many readers of John’s blog are mathematicians, I’ll do my best to frame—and prove—the FT as an actual theorem.

Let (\Omega,\mathcal{T},p) be a probability space and (\,\cdot\,)^{\dagger}=\Omega\to \Omega a measurable involution of \Omega. Denote p^{\dagger} the pushforward probability measure through this involution, and

\displaystyle{ R=\ln \frac{d p}{d p^\dagger} }

the logarithm of the corresponding Radon-Nikodym derivative (we assume p^\dagger and p are mutually absolutely continuous). Then the following lemmas are true, with (1)\Rightarrow(2)\Rightarrow(3):

Lemma 1. The detailed fluctuation relation:

\forall A\in\mathbb{R} \quad  p\big(R^{-1}(-A) \big)=e^{-A}p \big(R^{-1}(A) \big)

Lemma 2. The integral fluctuation relation:

\displaystyle{\int_{\Omega} d p(\omega)\,e^{-R(\omega)}=1 }

Lemma 3. The positivity of the Kullback-Leibler divergence:

D(p\,\Vert\, p^{\dagger}):=\int_{\Omega} d p(\omega)\,R(\omega)\geq 0.

These are basic facts which anyone can show: (2)\Rightarrow(3) by Jensen’s inequality, (1)\Rightarrow(2) trivially, and (1) follows from R(\omega^{\dagger})=-R(\omega) and the change of variables theorem, as follows,

\begin{array}{ccl} \displaystyle{ \int_{R^{-1}(-A)} d p(\omega)} &=& \displaystyle{ \int_{R^{-1}(A)}d p^{\dagger}(\omega) } \\ \\ &=& \displaystyle{ \int_{R^{-1}(A)} d p(\omega)\, e^{-R(\omega)} } \\ \\ &=& \displaystyle{ e^{-A} \int_{R^{-1}(A)} d p(\omega)} .\end{array}

But here is the beauty: if

(\Omega,\mathcal{T},p) is actually a Markov process defined over some time interval [0,T] and valued in some (say discrete) state space \Sigma, with the instantaneous probability \pi_{a}(t)=p\big(\{\omega_{t}=a\} \big) of each state a\in\Sigma satisfying the master equation (aka Kolmogorov equation)

\displaystyle{ \frac{d\pi_{a}(t)}{dt}=\sum_{b\neq a}\Big(\gamma_{b a}(t)\pi_{a}(t)-\gamma_{a b}(t)\pi_{b}(t)\Big),}

and

• the dagger involution is time-reversal, that is \omega^{\dagger}_{t}:=\omega_{T-t},

then for a given path

\displaystyle{\omega=(\omega_{0}=a_{0}\overset{\tau_{0}}{\longrightarrow} a_{1} \overset{\tau_{1}}{\longrightarrow}\cdots \overset{\tau_{N}}{\longrightarrow} a_{N}=\omega_{T})\in\Omega}

the logarithmic ratio R(\omega) decomposes into ‘variation of self-information’ and ‘cumulative skewness’ along \omega:

\displaystyle{ R(\omega)=\underbrace{\Big(\ln\pi_{a_0}(0)-\ln\pi_{a_N}(T) \Big)}_{\Delta i(\omega)}-\underbrace{\sum_{j=1}^{N}\ln\frac{\gamma_{a_{j}a_{j-1}}(\tau_{j})}{\gamma_{a_{j-1}a_{j}}(\tau_{j})}}_{\Sigma(\omega)}.}

This is easy to see if one writes the probability of a path explicitly as

\displaystyle{p(\omega)=\pi_{a_{0}}(0)\left[\prod_{j=1}^{N}\phi_{a_{j-1}}(\tau_{j-1},\tau_{j})\gamma_{a_{j-1}a_{j}}(\tau_{j})\right]\phi_{a_{N}}(\tau_{N},T)}

where

\displaystyle{ \phi_{a}(\tau,\tau')=\phi_{a}(\tau',\tau)=\exp\Big(-\sum_{b\neq a}\int_{\tau}^{\tau'}dt\, \gamma_{a b}(t)\Big)}

is the probability that the process remains in the state a between the times \tau and \tau'. It follows from the above lemma that

Theorem. Let (\Omega,\mathcal{T},p) be a Markov process and let i,\Sigma:\Omega\rightarrow \mathbb{R} be defined as above. Then we have

1. The detailed fluctuation theorem:

\forall A\in\mathbb{R}, p\big((\Delta i-\Sigma)^{-1}(-A) \big)=e^{-A}p \big((\Delta i-\Sigma)^{-1}(A) \big)

2. The integral fluctuation theorem:

\int_{\Omega} d p(\omega)\,e^{-\Delta i(\omega)+\Sigma(\omega)}=1

3. The ‘Second Law’ inequality:

\displaystyle{ \Delta S:=\int_{\Omega} d p(\omega)\,\Delta i(\omega)\geq \int_{\Omega} d p(\omega)\,\Sigma(\omega)}

The same theorem can be formulated for other kinds of Markov processes as well, including diffusion processes (in which case it follows from the Girsanov theorem).

References

Landauer’s principle was introduced here:

• [Landauer1961] R. Landauer, Irreversibility and heat generation in the computing process}, IBM Journal of Research and Development 5, (1961) 183–191.

and is now being verified experimentally by various groups worldwide.

The ‘fundamental theorem of natural selection’ was derived by Fisher in his book:

• [Fisher1930] R. Fisher, The Genetical Theory of Natural Selection, Clarendon Press, Oxford, 1930.

His derivation has long been considered obscure, even perhaps wrong, but apparently the theorem is now well accepted. I believe the first Markovian models of genetic evolution appeared here:

• [Fisher1922] R. A. Fisher, On the dominance ratio, Proc. Roy. Soc. Edinb. 42 (1922), 321–341.

• [Wright1931] S. Wright, Evolution in Mendelian populations, Genetics 16 (1931), 97–159.

Fluctuation theorems are reviewed here:

• [Sevick2008] E. Sevick, R. Prabhakar, S. R. Williams, and D. J. Searles, Fluctuation theorems, Ann. Rev. Phys. Chem. 59 (2008), 603–633.

Two of the key ideas for the ‘detailed fluctuation theorem’ discussed here are due to Crooks:

• [Crooks1999] Gavin Crooks, The entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences, Phys. Rev. E 60 (1999), 2721–2726.

who identified (E_{a}(\tau_{j})-E_{a}(\tau_{j-1})) as heat, and Seifert:

• [Seifert2005] Udo Seifert, Entropy production along a stochastic trajectory and an integral fluctuation theorem, Phys. Rev. Lett. 95 (2005), 4.

who understood the relevance of the self-information in this context.

The connection between statistical physics and evolutionary biology is discussed here:

• [Sella2005] G. Sella and A.E. Hirsh, The application of statistical physics to evolutionary biology, Proc. Nat. Acad. Sci. USA 102 (2005), 9541–9546.

and the ‘fitness flux theorem’ is derived in

• [Mustonen2010] V. Mustonen and M. Lässig, Fitness flux and ubiquity of adaptive evolution, Proc. Nat. Acad. Sci. USA 107 (2010), 4248–4253.

Schrödinger’s famous discussion of the physical nature of life was published here:

• [Schrödinger1944] E. Schrödinger, What is Life?, Cambridge University Press, Cambridge, 1944.


Time Crystals

26 September, 2012

When water freezes and forms a crystal, it creates a periodic pattern in space. Could there be something that crystallizes to form a periodic pattern in time? Frank Wilczek, who won the Nobel Prize for helping explain why quarks and gluons trapped inside a proton or neutron act like freely moving particles when you examine them very close up, dreamt up this idea and called it a time crystal:

• Frank Wilczek, Classical time crystals.

• Frank Wilczek, Quantum time crystals.

‘Time crystals’ sound like something from Greg Egan’s Orthogonal trilogy, set in a universe where there’s no fundamental distinction between time and space. But Wilczek wanted to realize these in our universe.

Of course, it’s easy to make a system that behaves in an approximately periodic way while it slowly runs down: that’s how a clock works: tick tock, tick tock, tick tock… But a system that keeps ‘ticking away’ without using up any resource or running down would be a strange new thing. There’s no telling what weird stuff we might do with it.

It’s also interesting because physicists love symmetry. In ordinary physics there are two very important symmetries: spatial translation symmetry, and time translation symmetry. Spatial translation symmetry says that if you move an experiment any amount to the left or right, it works the same way. Time translation symmetry says that if you do an experiment any amount of time earlier or later, it works the same way.

Crystals are fascinating because they ‘spontaneously break’ spatial translation symmetry. Take a liquid, cool it until it freezes, and it forms a crystal which does not look the same if you move it any amount to the right or left. It only looks the same if you move it certain discrete steps to the right or left!

The idea of a ‘time crystal’ is that it’s a system that spontaneously breaks time translation symmetry.

Given how much physicists have studied time translation symmetry and spontaneous symmetry breaking, it’s sort of shocking that nobody before 2012 wrote about this possibility. Or maybe someone did—but I haven’t heard about it.

It takes real creativity to think of an idea so radical yet so simple. But Wilczek is famously creative. For example, he came up with anyons: a new kind of particle, neither boson nor fermion, that lives in a universe where space is 2-dimensional. And now we can make those in the lab.

Unfortunately, Wilczek didn’t know how to make a time crystal. But now a team including Xiang Zhang (seated) and Tongcang Li (standing) at U.C. Berkeley have a plan for how to do it.

Actually they propose a ring-shaped system that’s periodic in time and also in space, as shown in the picture. They call it a space-time crystal:

Here we propose a space-time crystal of trapped ions and a method to realize it experimentally by confining ions in a ring-shaped trapping potential with a static magnetic field. The ions spontaneously form a spatial ring crystal due to Coulomb repulsion. This ion crystal can rotate persistently at the lowest quantum energy state in magnetic fields with fractional fluxes. The persistent rotation of trapped ions produces the temporal order, leading to the formation of a space-time crystal. We show that these space-time crystals are robust for direct experimental observation. The proposed space-time crystals of trapped ions provide a new dimension for exploring many-body physics and emerging properties of matter.

The new paper is here:

• Tongcang Li, Zhe-Xuan Gong, Zhang-Qi Yin, H. T. Quan, Xiaobo Yin, Peng Zhang, L.-M. Duan and Xiang Zhang, Space-time crystals of trapped ions.

Alas, the press release put out by Lawrence Berkeley National Laboratory is very misleading. It describes the space-time crystal as a clock that

will theoretically persist even after the rest of our universe reaches entropy, thermodynamic equilibrium or “heat-death”.

NO!

First of all, ‘reaching entropy’ doesn’t mean anything. More importantly, as time goes by and things fall apart, this space-time crystal, assuming anyone can actually make it, will also fall apart.

I know what the person talking to the reporter was trying to say: the cool thing about this setup is that it gives a system that’s truly time-periodic, not gradually using up some resource and running down like an ordinary clock. But nonphysicist readers, seeing an article entitled ‘A Clock that Will Last Forever’, may be fooled into thinking this setup is, umm, a clock that will last forever. It’s not.

If this setup were the whole universe, it might keep ticking away forever. But in fact it’s just a small, carefully crafted portion of our universe, and it interacts with the rest of our universe, so it will gradually fall apart when everything else does… or in fact much sooner: as soon as the scientists running it turn off the experiment.

Classifying space-time crystals

What could we do with space-time crystals? It’s way too early to tell, at least for me. But since I’m a mathematician, I’d be happy to classify them. Over on Google+, William Rutiser asked if there are 4d analogs of the 3d crystallographic groups. And the answer is yes! Mathematicians with too much time on their hands have classified the analogues of crystallographic groups in 4 dimensions:

Space group: classification in small dimensions, Wikipedia.

In general these groups are called space groups (see the article for the definition). In 1 dimension there are just two, namely the symmetry groups of this:

— o — o — o — o — o — o —

and this:

— > — > — > — > — > — > —

In 2 dimensions there are 17 and they’re called wallpaper groups. In 3 dimensions there are 230 and they are called crystallographic groups. In 4 dimensions there are 4894, in 5 dimensions there are… hey, Wikipedia leaves this space blank in their table!… and in 6 dimensions there are 28,934,974.

So, there is in principle quite a large subject to study here, if people can figure out how to build a variety of space-time crystals.

There’s already book on this, if you’re interested:

• Harold Brown, Rolf Bulow, Joachim Neubuser, Hans Wondratschek and Hans Zassenhaus, Crystallographic Groups of Four-Dimensional Space, Wiley Monographs in Crystallography, 1978.


A Course on Quantum Techniques for Stochastic Mechanics

18 September, 2012

Jacob Biamonte and I have come out with a draft of a book!

A course on quantum techniques for stochastic mechanics.

It’s based on the first 24 network theory posts on this blog. It owes a lot to everyone here, and the acknowledgements just scratch the surface of that indebtedness. At some later time I’d like to go through the posts and find the top twenty people who need to be thanked. But I’m leaving Singapore on Friday, going back to California to teach at U.C. Riverside, so I’ve been rushing to get something out before then.

If you see typos or other problems, please let us know!
We’ve reorganized the original blog articles and polished them up a bit, but we plan to do more before publishing these notes as a book.

I’m looking forward to teaching a seminar called Mathematics of the Environment when I get back to U.C. Riverside, and with luck I’ll put some notes from that on the blog here. I will also be trying to round up a team of grad students to work on network theory.

The next big topics in the network theory series will be electrical circuits and Bayesian networks. I’m beginning to see how these fit together with stochastic Petri nets in a unified framework, but I’ll need to talk and write about it to fill in all the details.

You can get a sense of what this course is about by reading this:

Foreword

This course is about a curious relation between two ways of describing situations that change randomly with the passage of time. The old way is probability theory and the new way is quantum theory

Quantum theory is based, not on probabilities, but on amplitudes. We can use amplitudes to compute probabilities. However, the relation between them is nonlinear: we take the absolute value of an amplitude and square it to get a probability. It thus seems odd to treat amplitudes as directly analogous to probabilities. Nonetheless, if we do this, some good things happen. In particular, we can take techniques devised in quantum theory and apply them to probability theory. This gives new insights into old problems.

There is, in fact, a subject eager to be born, which is mathematically very much like quantum mechanics, but which features probabilities in the same equations where quantum mechanics features amplitudes. We call this subject stochastic mechanics

Plan of the course

In Section 1 we introduce the basic object of study here: a ‘stochastic Petri net’. A stochastic Petri net describes in a very general way how collections of things of different kinds can randomly interact and turn into other things. If we consider large numbers of things, we obtain a simplified deterministic model called the ‘rate equation’, discussed in Section 2. More fundamental, however, is the ‘master equation’, introduced in Section 3. This describes how the probability of having various numbers of things of various kinds changes with time.

In Section 4 we consider a very simple stochastic Petri net and notice that in this case, we can solve the master equation using techniques taken from quantum mechanics. In Section 5 we sketch how to generalize this: for any stochastic Petri net, we can write down an operator called a ‘Hamiltonian’ built from ‘creation and annihilation operators’, which describes the rate of change of the probability of having various numbers of things. In Section 6 we illustrate this with an example taken from population biology. In this example the rate equation is just the logistic equation, one of the simplest models in population biology. The master equation describes reproduction and competition of organisms in a stochastic way.

In Section 7 we sketch how time evolution as described by the master equation can be written as a sum over Feynman diagrams. We do not develop this in detail, but illustrate it with a predator–prey model from population biology. In the process, we give a slicker way of writing down the Hamiltonian for any stochastic Petri net.

In Section 8 we enter into a main theme of this course: the study of equilibrium solutions of the master and rate equations. We present the Anderson–Craciun–Kurtz theorem, which shows how to get equilibrium solutions of the master equation from equilibrium solutions of the rate equation, at least if a certain technical condition holds. Brendan Fong has translated Anderson, Craciun and Kurtz’s original proof into the language of annihilation and creation operators, and we give Fong’s proof here. In this language, it turns out that the equilibrium solutions are mathematically just like ‘coherent states’ in quantum mechanics.

In Section 9 we give an example of the Anderson–Craciun–Kurtz theorem coming from a simple reversible reaction in chemistry. This example leads to a puzzle that is resolved by discovering that the presence of ‘conserved quantities’—quantities that do not change with time—let us construct many equilibrium solutions of the rate equation other than those given by the Anderson–Craciun–Kurtz theorem.

Conserved quantities are very important in quantum mechanics, and they are related to symmetries by a result called Noether’s theorem. In Section 10 we describe a version of Noether’s theorem for stochastic mechanics, which we proved with the help of Brendan Fong. This applies, not just to systems described by stochastic Petri nets, but a much more general class of processes called ‘Markov processes’. In the analogy to quantum mechanics, Markov processes are analogous to arbitrary quantum systems whose time evolution is given by a Hamiltonian. Stochastic Petri nets are analogous to a special case of these: the case where the Hamiltonian is built from annihilation and creation operators. In Section 11 we state the analogy between quantum mechanics and stochastic mechanics more precisely, and with more attention to mathematical rigor. This allows us to set the quantum and stochastic versions of Noether’s theorem side by side and compare them in Section 12.

In Section 13 we take a break from the heavy abstractions and look at a fun example from chemistry, in which a highly symmetrical molecule randomly hops between states. These states can be seen as vertices of a graph, with the transitions as edges. In this particular example we get a famous graph with 20 vertices and 30 edges, called the ‘Desargues graph’.

In Section 14 we note that the Hamiltonian in this example is a ‘graph Laplacian’, and, following a computation done by Greg Egan, we work out the eigenvectors and eigenvalues of this Hamiltonian explicitly. One reason graph Laplacians are interesting is that we can use them as Hamiltonians to describe time evolution in both stochastic and quantum mechanics. Operators with this special property are called ‘Dirichlet operators’, and we discuss them in Section 15. As we explain, they also describe electrical circuits made of resistors. Thus, in a peculiar way, the intersection of quantum mechanics and stochastic mechanics is the study of electrical circuits made of resistors!

In Section 16, we study the eigenvectors and eigenvalues of an arbitrary Dirichlet operator. We introduce a famous result called the Perron–Frobenius theorem for this purpose. However, we also see that the Perron–Frobenius theorem is important for understanding the equilibria of Markov processes. This becomes important later when we prove the ‘deficiency zero theorem’.

We introduce the deficiency zero theorem in Section 17. This result, proved by the chemists Feinberg, Horn and Jackson, gives equilibrium solutions for the rate equation for a large class of stochastic Petri nets. Moreover, these equilibria obey the extra condition that lets us apply the Anderson–Craciun–Kurtz theorem and obtain equlibrium solutions of the master equations as well. However, the deficiency zero theorem is best stated, not in terms of stochastic Petri nets, but in terms of another, equivalent, formalism: ‘chemical reaction networks’. So, we explain chemical reaction networks here, and use them heavily throughout the rest of the course. However, because they are applicable to such a large range of problems, we call them simply ‘reaction networks’. Like stochastic Petri nets, they describe how collections of things of different kinds randomly interact and turn into other things.

In Section 18 we consider a simple example of the deficiency zero theorem taken from chemistry: a diatomic gas. In Section 19 we apply the Anderson–Craciun–Kurtz theorem to the same example.

In Section 20 we begin the final phase of the course: proving the deficiency zero theorem, or at least a portion of it. In this section we discuss the concept of ‘deficiency’, which had been introduced before, but not really explained: the definition that makes the deficiency easy to compute is not the one that says what this concept really means. In Section 21 we show how to rewrite the rate equation of a stochastic Petri net—or equivalently, of a reaction network—in terms of a Markov process. This is surprising because the rate equation is nonlinear, while the equation describing a Markov process is linear in the probabilities involved. The trick is to use a nonlinear operation called ‘matrix exponentiation’. In Section 22 we study equilibria for Markov processes. Then, finally, in Section 23, we use these equilbria to obtain equilibrium solutions of the rate equation, completing our treatment of the deficiency zero theorem.


More Second Laws of Thermodynamics

24 August, 2012

Oscar Dahlsten is visiting the Centre for Quantum Technologies, so we’re continuing some conversations about entropy that we started last year, back when the Entropy Club was active. But now Jamie Vicary and Brendan Fong are involved in the conversations.

I was surprised when Oscar told me that for a large class of random processes, the usual second law of thermodynamics is just one of infinitely many laws saying that various kinds of disorder increase. I’m annoyed that nobody ever told me about this before! It’s as if they told me about conservation of energy but not conservation of schmenergy, and phlenergy, and zenergy

So I need to tell you about this. You may not understand it, but at least I can say I tried. I don’t want you blaming me for concealing all these extra second laws of thermodynamics!

Here’s the basic idea. Not all random processes are guaranteed to make entropy increase. But a bunch of them always make probability distributions flatter in a certain precise sense. This makes the entropy of the probability distribution increase. But when you make a probability distribution flatter in this sense, a bunch of other quantities increase too! For example, besides the usual entropy, there are infinitely many other kinds of entropy, called ‘Rényi entropies’, one for each number between 0 and ∞. And a doubly stochastic operator makes all the Rényi entropies increase! This fact is a special case of Theorem 10 here:

• Tim van Erven and Peter Harremoës, Rényi divergence and majorization.

Let me state this fact precisely, and then say a word about how this is related to quantum theory and ‘the collapse of the wavefunction’.

To keep things simple let’s talk about probability distributions on a finite set, though Erven and Harremoës generalize it all to a measure space.

How do we make precise the concept that one probability distribution is flatter than another? You know it when you see it, at least some of the time. For example, suppose I have some system in thermal equilibrium at some temperature, and the probabilities of it being in various states look like this:

Then say I triple the temperature. The probabilities flatten out:

But how can we make this concept precise in a completely general way? We can do it using the concept of ‘majorization’. If one probability distribution is less flat than another, people say it ‘majorizes’ that other one.

Here’s the definition. Say we have two probability distributions p and q on the same set. For each one, list the probabilities in decreasing order:

p_1 \ge p_2 \ge \cdots \ge p_n

q_1 \ge q_2 \ge \cdots \ge q_n

Then we say p majorizes q if

p_1 + \cdots + p_k \ge q_1 + \cdots + q_k

for all 1 \le k \le n. So, the idea is that the biggest probabilities in the distribution p add up to more than the corresponding biggest ones in q.

In 1960, Alfred Rényi defined a generalization of the usual Shannon entropy that depends on a parameter \beta. If p is a probability distribution on a finite set, its Rényi entropy of order \beta is defined to be

\displaystyle{ H_\beta(p) = \frac{1}{1 - \beta} \ln \sum_i p_i^\beta }

where 0 \le \beta < \infty. Well, to be honest: if \beta is 0, 1, or \infty we have to define this by taking a limit where we let \beta creep up to that value. But the limit exists, and when \beta = 1 we get the usual Shannon entropy

\displaystyle{ H_1(p) = - \sum_i p_i \ln(p_i) }

As I explained a while ago, Rényi entropies are important ways of measuring biodiversity. But here’s what I learned just now, from the paper by Erven and Harremoës:

Theorem 1. If a probability distribution p majorizes a probability distribution q, its Rényi entropies are smaller:

\displaystyle{ H_\beta(p) \le H_\beta(q) }

for all 0 \le \beta < \infty.

And here’s what makes this fact so nice. If you do something to a classical system in a way that might involve some randomness, we can describe your action using a stochastic matrix. An n \times n matrix T is called stochastic if whenever p \in \mathbb{R}^n is a probability distribution, so is T p. This is equivalent to saying:

• the matrix entries of T are all \ge 0, and

• each column of T sums to 1.

If T is stochastic, it’s not necessarily true that the entropy of T p is greater than or equal to that of p, not even for the Shannon entropy.

Puzzle 1. Find a counterexample.

However, entropy does increase if we use specially nice stochastic matrices called ‘doubly stochastic’ matrices. People say a matrix T doubly stochastic if it’s stochastic and it maps the probability distribution

\displaystyle{ p_0 = (\frac{1}{n}, \dots, \frac{1}{n}) }

to itself. This is the most spread-out probability distribution of all: every other probability distribution majorizes this one.

Why do they call such matrices ‘doubly’ stochastic? Well, if you’ve got a stochastic matrix, each column sums to one. But a stochastic operator is doubly stochastic if and only if each row sums to 1 as well.

Here’s a really cool fact:

Theorem 2. If T is doubly stochastic, p majorizes T p for any probability distribution p \in \mathbb{R}^n. Conversely, if a probability distribution p majorizes a probability distribution q, then q = T p for some doubly stochastic matrix T.

Taken together, Theorems 1 and 2 say that doubly stochastic transformations increase entropy… but not just Shannon entropy! They increase all the different Rényi entropies, as well. So if time evolution is described by a doubly stochastic matrix, we get lots of ‘second laws of thermodynamics’, saying that all these different kinds of entropy increase!

Finally, what does all this have to do with quantum mechanics, and collapsing the wavefunction? There are different things to say, but this is the simplest:

Theorem 3. Given two probability distributions p, q \in \mathbb{R}^n, then p majorizes q if and only there exists a self-adjoint matrix D with eigenvalues p_i and diagonal entries q_i.

The matrix D will be a density matrix: a self-adjoint matrix with positive eigenvalues and trace equal to 1. We use such matrices to describe mixed states in quantum mechanics.

Theorem 3 gives a precise sense in which preparing a quantum system in some state, letting time evolve, and then measuring it ‘increases randomness’.

How? Well, suppose we have a quantum system whose Hilbert space is \mathbb{C}^n. If we prepare the system in a mixture of the standard basis states with probabilities p_i, we can describe it with a diagonal density matrix D_0. Then suppose we wait a while and some unitary time evolution occurs. The system is now described by a new density matrix

D = U D_0 \, U^{-1}

where U is some unitary operator. If we then do a measurement to see which of the standard basis states our system now lies in, we’ll get the different possible results with probabilities q_i, the diagonal entries of D. But the eigenvalues of D will still be the numbers p_i. So, by the theorem, p majorizes q!

So, not only Shannon entropy but also all the Rényi entropies will increase!

Of course, there are some big physics questions lurking here. Like: what about the real world? In the real world, do lots of different kinds of entropy tend to increase, or just some?

Of course, there’s a huge famous old problem about how reversible time evolution can be compatible with any sort of law saying that entropy must always increase! Still, there are some arguments, going back to Boltzmann’s H-theorem, which show entropy increases under some extra conditions. So then we can ask if other kinds of entropy, like Rényi entropy, increase as well. This will be true whenever we can argue that time evolution is described by doubly stochastic matrices. Theorem 3 gives a partial answer, but there’s probably much more to say.

I don’t have much more to say right now, though. I’ll just point out that while doubly stochastic matrices map the ‘maximally smeared-out’ probability distribution

\displaystyle{ p_0 = (\frac{1}{n}, \dots, \frac{1}{n}) }

to itself, a lot of this theory generalizes to stochastic matrices that map exactly one other probability distribution to itself. We need to work with relative Rényi entropy instead of Rényi entropy, and so on, but I don’t think these adjustments are really a big deal. And there are nice theorems that let you know when a stochastic matrix maps exactly one probability distribution to itself, based on the Perron–Frobenius theorem.

References

I already gave you a reference for Theorem 1, namely the paper by Erven and Harremoës, though I don’t think they were the first to prove this particular result: they generalize it quite a lot.

What about Theorem 2? It goes back at least to here:

• Barry C. Arnold, Majorization and the Lorenz Order: A Brief Introduction, Springer Lecture Notes in Statistics 43, Springer, Berlin, 1987.

The partial order on probability distributions given by majorization is also called the ‘Lorenz order’, but mainly when we consider probability distributions on infinite sets. This name presumably comes from the Lorenz curve, a measure of income inequality. This curve shows for the bottom x% of households, what percentage y% of the total income they have:

Puzzle 2. If you’ve got two different probability distributions of incomes, and one majorizes the other, how are their Lorenz curves related?

When we generalize majorization by letting some other probability distribution take the place of

\displaystyle{ p_0 = (\frac{1}{n}, \dots, \frac{1}{n}) }

it seems people call it the ‘Markov order’. Here’s a really fascinating paper on that, which I’m just barely beginning to understand:

• A. N. Gorban, P. A. Gorban and G. Judge, Entropy: the Markov ordering approach, Entropy 12 (2010), 1145–1193.

What about Theorem 3? Apparently it goes back to here:

• A. Uhlmann, Wiss. Z. Karl-Marx-Univ. Leipzig 20 (1971), 633.

though I only know this thanks to a more recent paper:

• Michael A. Nielsen, Conditions for a class of entanglement transformations, Phys. Rev. Lett. 83 (1999), 436–439.

By the way, Nielsen’s paper contains another very nice result about majorization! Suppose you have states \psi and \phi of a 2-part quantum system. You can trace out one part and get density matrices describing mixed states of the other part, say D_\psi and D_\phi. Then Nielsen shows you can get from \psi to \phi using ‘local operations and classical communication’ if and only if D_\phi majorizes D_\psi. Note that things are going backwards here compared to how they’ve been going in the rest of this post: if we can get from \psi to \phi, then all forms of entropy go down when we go from D_\psi to D_\phi! This ‘anti-second-law’ behavior is confusing at first, but familiar to me by now.

When I first learned all this stuff, I naturally thought of the following question—maybe you did too, just now. If p, q \in \mathbb{R}^n are probability distributions and

\displaystyle{ H_\beta(p) \le H_\beta(q) }

for all 0 \le \beta < \infty, is it true that p majorizes q?

Apparently the answer must be no, because Klimesh has gone to quite a bit of work to obtain a weaker conclusion: not that p majorizes q, but that p \otimes r majorizes q \otimes r for some probability distribution r \in \mathbb{R}^m. He calls this catalytic majorization, with r serving as a ‘catalyst’:

• Matthew Klimesh, Inequalities that collectively completely characterizes the catalytic majorization relation.

I thank Vlatko Vedral here at the CQT for pointing this out!

Finally, here is a good general introduction to majorization, pointed out by Vasileios Anagnostopoulos:

• T. Ando, Majorization, doubly stochastic matrices, and comparison of eigenvalues, Linear Algebra and its Applications 118 (1989), 163-–248.


Network Theory (Part 20)

6 August, 2012

guest post by Jacob Biamonte

We’re in the middle of a battle: in addition to our typical man versus equation scenario, it’s a battle between two theories. For those good patrons following the network theory series, you know the two opposing forces well. It’s our old friends, at it again:

Stochastic Mechanics vs Quantum Mechanics!

Today we’re reporting live from a crossroads, and we’re facing a skirmish that gives rise to what some might consider a paradox. Let me sketch the main thesis before we get our hands dirty with the gory details.

First I need to tell you that the battle takes place at the intersection of stochastic and quantum mechanics. We recall from Part 16 that there is a class of operators called ‘Dirichlet operators’ that are valid Hamiltonians for both stochastic and quantum mechanics. In other words, you can use them to generate time evolution both for old-fashioned random processes and for quantum processes!

Staying inside this class allows the theories to fight it out on the same turf. We will be considering a special subclass of Dirichlet operators, which we call ‘irreducible Dirichlet operators’. These are the ones where starting in any state in our favorite basis of states, we have a nonzero chance of winding up in any other. When considering this subclass, we found something interesting:

Thesis. Let H be an irreducible Dirichlet operator with n eigenstates. In stochastic mechanics, there is only one valid state that is an eigenvector of H: the unique so-called ‘Perron–Frobenius state’. The other n-1 eigenvectors are forbidden states of a stochastic system: the stochastic system is either in the Perron–Frobenius state, or in a superposition of at least two eigensvectors. In quantum mechanics, all n eigenstates of H are valid states.

This might sound like a riddle, but today as we’ll prove, riddle or not, it’s a fact. If it makes sense, well that’s another issue. As John might have said, it’s like a bone kicked down from the gods up above: we can either choose to chew on it, or let it be. Today we are going to do a bit of chewing.

One of the many problems with this post is that John had a nut loose on his keyboard. It was not broken! I’m saying he wrote enough blog posts on this stuff to turn them into a book. I’m supposed to be compiling the blog articles into a massive LaTeX file, but I wrote this instead.

Another problem is that this post somehow seems to use just about everything said before, so I’m going to have to do my best to make things self-contained. Please bear with me as I try to recap what’s been done. For those of you familiar with the series, a good portion of the background for what we’ll cover today can be found in Part 12 and Part 16.

At the intersection of two theories

As John has mentioned in his recent talks, the typical view of how quantum mechanics and probability theory come into contact looks like this:

The idea is that quantum theory generalizes classical probability theory by considering observables that don’t commute.

That’s perfectly valid, but we’ve been exploring an alternative view in this series. Here quantum theory doesn’t subsume probability theory, but they intersect:

What goes in the middle you might ask? As odd as it might sound at first, John showed in Part 16 that electrical circuits made of resistors constitute the intersection!

For example, a circuit like this:

gives rise to a Hamiltonian H that’s good both for stochastic mechanics and quantum mechanics. Indeed, he found that the power dissipated by a circuit made of resistors is related to the familiar quantum theory concept known as the expectation value of the Hamiltonian!

\textrm{power} = -2 \langle \psi, H \psi \rangle

Oh—and you might think we made a mistake and wrote our Ω (ohm) symbols upside down. We didn’t. It happens that ℧ is the symbol for a ‘mho’—a unit of conductance that’s the reciprocal of an ohm. Check out Part 16 for the details.

Stochastic mechanics versus quantum mechanics

Let’s recall how states, time evolution, symmetries and observables work in the two theories. Today we’ll fix a basis for our vector space of states, and we’ll assume it’s finite-dimensional so that all vectors have n components over either the complex numbers \mathbb{C} or the reals \mathbb{R}. In other words, we’ll treat our space as either \mathbb{C}^n or \mathbb{R}^n. In this fashion, linear operators that map such spaces to themselves will be represented as square matrices.

Vectors will be written as \psi_i where the index i runs from 1 to n, and we think of each choice of the index as a state of our system—but since we’ll be using that word in other ways too, let’s call it a configuration. It’s just a basic way our system can be.

States

Besides the configurations i = 1,\dots, n, we have more general states that tell us the probability or amplitude of finding our system in one of these configurations:

Stochastic states are n-tuples of nonnegative real numbers:

\psi_i \in \mathbb{R}^+

The probability of finding the system in the ith configuration is defined to be \psi_i. For these probabilities to sum to one, \psi_i needs to be normalized like this:

\displaystyle{ \sum_i \psi_i = 1 }

or in the notation we’re using in these articles:

\displaystyle{ \langle \psi \rangle = 1 }

where we define

\displaystyle{ \langle \psi \rangle = \sum_i \psi_i }

Quantum states are n-tuples of complex numbers:

\psi_i \in \mathbb{C}

The probability of finding a state in the ith configuration is defined to be |\psi(x)|^2. For these probabilities to sum to one, \psi needs to be normalized like this:

\displaystyle{ \sum_i |\psi_i|^2 = 1 }

or in other words

\langle \psi,  \psi \rangle = 1

where the inner product of two vectors \psi and \phi is defined by

\displaystyle{ \langle \psi, \phi \rangle = \sum_i \overline{\psi}_i \phi_i }

Now, the usual way to turn a quantum state \psi into a stochastic state is to take the absolute value of each number \psi_i and then square it. However, if the numbers \psi_i happen to be nonnegative, we can also turn \psi into a stochastic state simply by multiplying it by a number to ensure \langle \psi \rangle = 1.

This is very unorthodox, but it lets us evolve the same vector \psi either stochastically or quantum-mechanically, using the recipes I’ll describe next. In physics jargon these correspond to evolution in ‘real time’ and ‘imaginary time’. But don’t ask me which is which: from a quantum viewpoint stochastic mechanics uses imaginary time, but from a stochastic viewpoint it’s the other way around!

Time evolution

Time evolution works similarly in stochastic and quantum mechanics, but with a few big differences:

• In stochastic mechanics the state changes in time according to the master equation:

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }

which has the solution

\psi(t) = \exp(t H) \psi(0)

• In quantum mechanics the state changes in time according to Schrödinger’s equation:

\displaystyle{ \frac{d}{d t} \psi(t) = -i H \psi(t) }

which has the solution

\psi(t) = \exp(-i t H) \psi(0)

The operator H is called the Hamiltonian. The properties it must have depend on whether we’re doing stochastic mechanics or quantum mechanics:

• We need H to be infinitesimal stochastic for time evolution given by \exp(-i t H) to send stochastic states to stochastic states. In other words, we need that (i) its columns sum to zero and (ii) its off-diagonal entries are real and nonnegative:

\displaystyle{ \sum_i H_{i j}=0 }

i\neq j \Rightarrow H_{i j}\geq 0

• We need H to be self-adjoint for time evolution given by \exp(-t H) to send quantum states to quantum states. So, we need

H = H^\dagger

where we recall that the adjoint of a matrix is the conjugate of its transpose:

\displaystyle{ (H^\dagger)_{i j} := \overline{H}_{j i} }

We are concerned with the case where the operator H generates both a valid quantum evolution and also a valid stochastic one:

H is a Dirichlet operator if it’s both self-adjoint and infinitesimal stochastic.

We will soon go further and zoom in on this intersection! But first let’s finish our review.

Symmetries

As John explained in Part 12, besides states and observables we need symmetries, which are transformations that map states to states. These include the evolution operators which we only briefly discussed in the preceding subsection.

• A linear map U that sends quantum states to quantum states is called an isometry, and isometries are characterized by this property:

U^\dagger U = 1

• A linear map U that sends stochastic states to stochastic states is called a stochastic operator, and stochastic operators are characterized by these properties:

\displaystyle{ \sum_i U_{i j} = 1 }

and

U_{i j} \geq 0

A notable difference here is that in our finite-dimensional situation, isometries are always invertible, but stochastic operators may not be! If U is an n \times n matrix that’s an isometry, U^\dagger is its inverse. So, we also have

U U^\dagger = 1

and we say U is unitary. But if U is stochastic, it may not have an inverse—and even if it does, its inverse is rarely stochastic. This explains why in stochastic mechanics time evolution is often not reversible, while in quantum mechanics it always is.

Puzzle 1. Suppose U is a stochastic n \times n matrix whose inverse is stochastic. What are the possibilities for U?

It is quite hard for an operator to be a symmetry in both stochastic and quantum mechanics, especially in our finite-dimensional situation:

Puzzle 2. Suppose U is an n \times n matrix that is both stochastic and unitary. What are the possibilities for U?

Observables

‘Observables’ are real-valued quantities that can be measured, or predicted, given a specific theory.

• In quantum mechanics, an observable is given by a self-adjoint matrix O, and the expected value of the observable O in the quantum state \psi is

\displaystyle{ \langle \psi , O \psi \rangle = \sum_{i,j} \overline{\psi}_i O_{i j} \psi_j }

• In stochastic mechanics, an observable O has a value O_i in each configuration i, and the expected value of the observable O in the stochastic state \psi is

\displaystyle{ \langle O \psi \rangle = \sum_i O_i \psi_i }

We can turn an observable in stochastic mechanics into an observable in quantum mechanics by making a diagonal matrix whose diagonal entries are the numbers O_i.

From graphs to matrices

Back in Part 16, John explained how a graph with positive numbers on its edges gives rise to a Hamiltonian in both quantum and stochastic mechanics—in other words, a Dirichlet operator.

Here’s how this works. We’ll consider simple graphs: graphs without arrows on their edges, with at most one edge from one vertex to another, with no edges from a vertex to itself. And we’ll only look at graphs with finitely many vertices and edges. We’ll assume each edge is labelled by a positive number, like this:

If our graph has n vertices, we can create an n \times n matrix A where A_{i j} is the number labelling the edge from i to j, if there is such an edge, and 0 if there’s not. This matrix is symmetric, with real entries, so it’s self-adjoint. So A is a valid Hamiltonian in quantum mechanics.

How about stochastic mechanics? Remember that a Hamiltonian H in stochastic mechanics needs to be ‘infinitesimal stochastic’. So, its off-diagonal entries must be nonnegative, which is indeed true for our A, but also the sums of its columns must be zero, which is not true when our A is nonzero.

But now comes the best news you’ve heard all day: we can improve A to a stochastic operator in a way that is completely determined by A itself! This is done by subtracting a diagonal matrix L whose entries are the sums of the columns of A:

L_{i i} = \sum_i A_{i j}

i \ne j \Rightarrow L_{i j} = 0

It’s easy to check that

H = A - L

is still self-adjoint, but now also infinitesimal stochastic. So, it’s a Dirichlet operator: a good Hamiltonian for both stochastic and quantum mechanics!

In Part 16, we saw a bit more: every Dirichlet operator arises this way. It’s easy to see. You just take your Dirichlet operator and make a graph with one edge for each nonzero off-diagonal entry. Then you label the edge with this entry.

So, Dirichlet operators are essentially the same as finite simple graphs with edges labelled by positive numbers.

Now, a simple graph can consist of many separate ‘pieces’, called components. Then there’s no way for a particle hopping along the edges to get from one component to another, either in stochastic or quantum mechanics. So we might as well focus our attention on graphs with just one component. These graphs are called ‘connected’. In other words:

Definition. A simple graph is connected if it is nonempty and there is a path of edges connecting any vertex to any other.

Our goal today is to understand more about Dirichlet operators coming from connected graphs. For this we need to learn the Perron–Frobenius theorem. But let’s start with something easier.

Perron’s theorem

In quantum mechanics it’s good to think about observables that have positive expected values:

\langle \psi, O \psi \rangle > 0

for every quantum state \psi \in \mathbb{C}^n. These are called positive definite. But in stochastic mechanics it’s good to think about matrices that are positive in a more naive sense:

Definition. An n \times n real matrix T is positive if all its entries are positive:

T_{i j} > 0

for all 1 \le i, j \le n.

Similarly:

Definition. A vector \psi \in \mathbb{R}^n is positive if all its components are positive:

\psi_i > 0

for all 1 \le i \le n.

We’ll also define nonnegative matrices and vectors in the same way, replacing > 0 by \ge 0. A good example of a nonnegative vector is a stochastic state.

In 1907, Perron proved the following fundamental result about positive matrices:

Perron’s Theorem. Given a positive square matrix T, there is a positive real number r, called the Perron–Frobenius eigenvalue of T, such that r is an eigenvalue of T and any other eigenvalue \lambda of T has |\lambda| < r. Moreover, there is a positive vector \psi \in \mathbb{R}^n with T \psi = r \psi. Any other vector with this property is a scalar multiple of \psi. Furthermore, any nonnegative vector that is an eigenvector of T must be a scalar multiple of \psi.

In other words, if T is positive, it has a unique eigenvalue with the largest absolute value. This eigenvalue is positive. Up to a constant factor, it has an unique eigenvector. We can choose this eigenvector to be positive. And then, up to a constant factor, it’s the only nonnegative eigenvector of T.

From matrices to graphs

The conclusions of Perron’s theorem don’t hold for matrices that are merely nonnegative. For example, these matrices

\left( \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right) , \qquad \left( \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right)

are nonnegative, but they violate lots of the conclusions of Perron’s theorem.

Nonetheless, in 1912 Frobenius published an impressive generalization of Perron’s result. In its strongest form, it doesn’t apply to all nonnegative matrices; only to those that are ‘irreducible’. So, let us define those.

We’ve seen how to build a matrix from a graph. Now we need to build a graph from a matrix! Suppose we have an n \times n matrix T. Then we can build a graph G_T with n vertices where there is an edge from the ith vertex to the jth vertex if and only if T_{i j} \ne 0.

But watch out: this is a different kind of graph! It’s a directed graph, meaning the edges have directions, there’s at most one edge going from any vertex to any vertex, and we do allow an edge going from a vertex to itself. There’s a stronger concept of ‘connectivity’ for these graphs:

Definition. A directed graph is strongly connected if there is a directed path of edges going from any vertex to any other vertex.

So, you have to be able to walk along edges from any vertex to any other vertex, but always following the direction of the edges! Using this idea we define irreducible matrices:

Definition. A nonnegative square matrix T is irreducible if its graph G_T is strongly connected.

The Perron–Frobenius theorem

Now we are ready to state:

The Perron-Frobenius Theorem. Given an irreducible nonnegative square matrix T, there is a positive real number r, called the Perron–Frobenius eigenvalue of T, such that r is an eigenvalue of T and any other eigenvalue \lambda of T has |\lambda| \le r. Moreover, there is a positive vector \psi \in \mathbb{R}^n with T\psi = r \psi. Any other vector with this property is a scalar multiple of \psi. Furthermore, any nonnegative vector that is an eigenvector of T must be a scalar multiple of \psi.

The only conclusion of this theorem that’s weaker than those of Perron’s theorem is that there may be other eigenvalues with |\lambda| = r. For example, this matrix is irreducible and nonnegative:

\left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right)

Its Perron–Frobenius eigenvalue is 1, but it also has -1 as an eigenvalue. In general, Perron-Frobenius theory says quite a lot about the other eigenvalues on the circle |\lambda| = r, but we won’t need that fancy stuff here.

Perron–Frobenius theory is useful in many ways, from highbrow math to ranking football teams. We’ll need it not just today but also later in this series. There are many books and other sources of information for those that want to take a closer look at this subject. If you’re interested, you can search online or take a look at these:

• Dimitrious Noutsos, Perron Frobenius theory and some extensions, 2008. (Includes proofs of the basic theorems.)

• V. S. Sunder, Perron Frobenius theory, 18 December 2009. (Includes applications to graph theory, Markov chains and von Neumann algebras.)

• Stephen Boyd, Lecture 17: Perron Frobenius theory, Winter 2008-2009. (Includes a max-min characterization of the Perron–Frobenius eigenvalue and applications to Markov chains, economics, population growth and power control.)

I have not taken a look myself, but if anyone is interested and can read German, the original work appears here:

• Oskar Perron, Zur Theorie der Matrizen, Math. Ann. 64 (1907), 248–263.

• Georg Frobenius, Über Matrizen aus nicht negativen Elementen, S.-B. Preuss Acad. Wiss. Berlin (1912), 456–477.

And, of course, there’s this:

• Wikipedia, Perron–Frobenius theorem.

It’s got a lot of information.

Irreducible Dirichlet operators

Now comes the payoff. We saw how to get a Dirichlet operator H from any finite simple graph with edges labelled by positive numbers. Now let’s apply Perron–Frobenius theory to prove our thesis.

Unfortunately, the matrix H is rarely nonnegative. If you remember how we built it, you’ll see its off-diagonal entries will always be nonnegative… but its diagonal entries can be negative.

Luckily, we can fix this just by adding a big enough multiple of the identity matrix to H! The result is a nonnegative matrix

T = H + c I

where c > 0 is some large number. This matrix T has the same eigenvectors as H. The off-diagonal matrix entries of T are the same as those of A, so T_{i j} is nonzero for i \ne j exactly when the graph we started with has an edge from i to j. So, for i \ne j, the graph G_T will have an directed edge going from i to j precisely when our original graph had an edge from i to j. And that means that if our original graph was connected, G_T will be strongly connected. Thus, by definition, the matrix T is irreducible!

Since T is nonnegative and irreducible, the Perron–Frobenius theorem swings into action and we conclude:

Lemma. Suppose H is the Dirichlet operator coming from a connected finite simple graph with edges labelled by positive numbers. Then the eigenvalues of H are real. Let \lambda be the largest eigenvalue. Then there is a positive vector \psi \in \mathbb{R}^n with H\psi = \lambda \psi. Any other vector with this property is a scalar multiple of \psi. Furthermore, any nonnegative vector that is an eigenvector of H must be a scalar multiple of \psi.

Proof. The eigenvalues of H are real since H is self-adjoint. Notice that if r is the Perron–Frobenius eigenvalue of T = H + c I and

T \psi = r \psi

then

H \psi = (r - c)\psi

By the Perron–Frobenius theorem the number r is positive, and it has the largest absolute value of any eigenvalue of T. Thanks to the subtraction, the eigenvalue r - c may not have the largest absolute value of any eigenvalue of H. It is, however, the largest eigenvalue of H, so we take this as our \lambda. The rest follows from the Perron–Frobenius theorem.   █

But in fact we can improve this result, since the largest eigenvalue \lambda is just zero. Let’s also make up a definition, to make our result sound more slick:

Definition. A Dirichlet operator is irreducible if it comes from a connected finite simple graph with edges labelled by positive numbers.

This meshes nicely with our earlier definition of irreducibility for nonnegative matrices. Now:

Theorem. Suppose H is an irreducible Dirichlet operator. Then H has zero as its largest real eigenvalue. There is a positive vector \psi \in \mathbb{R}^n with H\psi = 0. Any other vector with this property is a scalar multiple of \psi. Furthermore, any nonnegative vector that is an eigenvector of H must be a scalar multiple of \psi.

Proof. Choose \lambda as in the Lemma, so that H\psi = \lambda \psi. Since \psi is positive we can normalize it to be a stochastic state:

\displaystyle{ \sum_i \psi_i = 1 }

Since H is a Dirichlet operator, \exp(t H) sends stochastic states to stochastic states, so

\displaystyle{ \sum_i (\exp(t H) \psi)_i = 1 }

for all t \ge 0. On the other hand,

\displaystyle{ \sum_i (\exp(t H)\psi)_i = \sum_i e^{t \lambda} \psi_i = e^{t \lambda} }

so we must have \lambda = 0.   █

What’s the point of all this? One point is that there’s a unique stochastic state \psi that’s an equilibrium state: since H \psi = 0, it doesn’t change with time. It’s also globally stable: since all the other eigenvalues of H are negative, all other stochastic states converge to this one as time goes forward.

An example

There are many examples of irreducible Dirichlet operators. For instance, in Part 15 we talked about graph Laplacians. The Laplacian of a connected simple graph is always irreducible. But let us try a different sort of example, coming from the picture of the resistors we saw earlier:

Let’s create a matrix A whose entry A_{i j} is the number labelling the edge from i to j if there is such an edge, and zero otherwise:

A = \left(  \begin{array}{ccccc}   0 & 2 & 1 & 0 & 1 \\   2 & 0 & 0 & 1 & 1 \\   1 & 0 & 0 & 2 & 1 \\   0 & 1 & 2 & 0 & 1 \\   1 & 1 & 1 & 1 & 0  \end{array}  \right)

Remember how the game works. The matrix A is already a valid Hamiltonian for quantum mechanics, since it’s self adjoint. However, to get a valid Hamiltonian for both stochastic and quantum mechanics—in other words, a Dirichlet operator—we subtract the diagonal matrix L whose entries are the sums of the columns of A. In this example it just so happens that the column sums are all 4, so L = 4 I, and our Dirichlet operator is

H = A - 4 I = \left(  \begin{array}{rrrrr}   -4 & 2 & 1 & 0 & 1 \\   2 & -4 & 0 & 1 & 1 \\   1 & 0 & -4 & 2 & 1 \\   0 & 1 & 2 & -4 & 1 \\   1 & 1 & 1 & 1 & -4  \end{array}  \right)

We’ve set up this example so it’s easy to see that the vector \psi = (1,1,1,1,1) has

H \psi = 0

So, this is the unique eigenvector for the eigenvalue 0. We can use Mathematica to calculate the remaining eigenvalues of H. The set of eigenvalues is

\{0, -7, -8, -8, -3 \}

As we expect from our theorem, the largest real eigenvalue is 0. By design, the eigenstate associated to this eigenvalue is

| v_0 \rangle = (1, 1, 1, 1, 1)

(This funny notation for vectors is common in quantum mechanics, so don’t worry about it.) All the other eigenvectors fail to be nonnegative, as predicted by the theorem. They are:

| v_1 \rangle = (1, -1, -1, 1, 0)
| v_2 \rangle = (-1, 0, -1, 0, 2)
| v_3 \rangle = (-1, 1, -1, 1, 0)
| v_4 \rangle = (-1, -1, 1, 1, 0)

To compare the quantum and stochastic states, consider first |v_0\rangle. This is the only eigenvector that can be normalized to a stochastic state. Remember, a stochastic state must have nonnegative components. This rules out |v_1\rangle through to |v_4\rangle as valid stochastic states, no matter how we normalize them! However, these are allowed as states in quantum mechanics, once we normalize them correctly. For a stochastic system to be in a state other than the Perron–Frobenius state, it must be a linear combination of least two eigenstates. For instance,

\psi_a = (1-a)|v_0\rangle + a |v_1\rangle

can be normalized to give stochastic state only if 0 \leq a \leq \frac{1}{2}.

And, it’s easy to see that it works this way for any irreducible Dirichlet operator, thanks to our theorem. So, our thesis has been proved true!

Puzzles on irreducibility

Let us conclude with a couple more puzzles. There are lots of ways to characterize irreducible nonnegative matrices; we don’t need to mention graphs. Here’s one:

Puzzle 3. Let T be a nonnegative n \times n matrix. Show that T is irreducible if and only if for all i,j \ge 0, (A^m)_{i j} > 0 for some natural number m.

You may be confused because today we explained the usual concept of irreducibility for nonnegative matrices, but also defined a concept of irreducibility for Dirichlet operators. Luckily there’s no conflict: Dirichlet operators aren’t nonnegative matrices, but if we add a big multiple of the identity to a Dirichlet operator it becomes a nonnegative matrix, and then:

Puzzle 4. Show that a Dirichlet operator H is irreducible if and only if the nonnegative operator H + c I (where c is any sufficiently large constant) is irreducible.

Irreducibility is also related to the nonexistence of interesting conserved quantities. In Part 11 we saw a version of Noether’s Theorem for stochastic mechanics. Remember that an observable O in stochastic mechanics assigns a number O_i to each configuration i = 1, \dots, n. We can make a diagonal matrix with O_i as its diagonal entries, and by abuse of language we call this O as well. Then we say O is a conserved quantity for the Hamiltonian H if the commutator [O,H] = O H - H O vanishes.

Puzzle 5. Let H be a Dirichlet operator. Show that H is irreducible if and only if every conserved quantity O for H is a constant, meaning that for some c \in \mathbb{R} we have O_i = c for all i. (Hint: examine the proof of Noether’s theorem.)

In fact this works more generally:

Puzzle 6. Let H be an infinitesimal stochastic matrix. Show that H + c I is an irreducible nonnegative matrix for all sufficiently large c if and only if every conserved quantity O for H is a constant.


Follow

Get every new post delivered to your Inbox.

Join 784 other followers