Network Theory (Part 11)

4 October, 2011

jointly written with Brendan Fong

Noether proved lots of theorems, but when people talk about Noether’s theorem, they always seem to mean her result linking symmetries to conserved quantities. Her original result applied to classical mechanics, but today we’d like to present a version that applies to ‘stochastic mechanics’—or in other words, Markov processes.

What’s a Markov process? We’ll say more in a minute—but in plain English, it’s a physical system where something hops around randomly from state to state, where its probability of hopping anywhere depends only on where it is now, not its past history. Markov processes include, as a special case, the stochastic Petri nets we’ve been talking about.

Our stochastic version of Noether’s theorem is modeled after a well-known quantum version. It’s yet another example of how we can exploit the analogy between stochastic mechanics and quantum mechanics. But for now we’ll just present the stochastic version. Next time we’ll compare it to the quantum one.

Markov processes

We should and probably will be more general, but let’s start by considering a finite set of states, say X. To describe a Markov process we then need a matrix of real numbers H = (H_{i j})_{i, j \in X}. The idea is this: suppose right now our system is in the state i. Then the probability of being in some state j changes as time goes by—and H_{i j} is defined to be the time derivative of this probability right now.

So, if \psi_i(t) is the probability of being in the state i at time t, we want the master equation to hold:

\displaystyle{ \frac{d}{d t} \psi_i(t) = \sum_{j \in X} H_{i j} \psi_j(t) }

This motivates the definition of ‘infinitesimal stochastic’, which we recall from Part 5:

Definition. Given a finite set X, a matrix of real numbers H = (H_{i j})_{i, j \in X} is infinitesimal stochastic if

i \ne j \implies H_{i j} \ge 0

and

\displaystyle{ \sum_{i \in X} H_{i j} = 0 }

for all j \in X.

The inequality says that if we start in the state i, the probability of being found in some other state, which starts at 0, can’t go down, at least initially. The equation says that the probability of being somewhere or other doesn’t change. Together, these facts imply that:

H_{i i} \le 0

That makes sense: the probability of being in the state i, which starts at 1, can’t go up, at least initially.

Using the magic of matrix multiplication, we can rewrite the master equation as follows:

\displaystyle{\frac{d}{d t} \psi(t) = H \psi(t) }

and we can solve it like this:

\psi(t) = \exp(t H) \psi(0)

If H is an infinitesimal stochastic operator, we will call \exp(t H) a Markov process, and H its Hamiltonian.

(Actually, most people call \exp(t H) a Markov semigroup, and reserve the term Markov process for another way of looking at the same idea. So, be careful.)

Noether’s theorem is about ‘conserved quantities’, that is, observables whose expected values don’t change with time. To understand this theorem, you need to know a bit about observables. In stochastic mechanics an observable is simply a function assigning a number O_i to each state i \in X.

However, in quantum mechanics we often think of observables as matrices, so it’s nice to do that here, too. It’s easy: we just create a matrix whose diagonal entries are the values of the function O. And just to confuse you, we’ll also call this matrix O. So:

O_{i j} = \left\{ \begin{array}{ccl}  O_i & \textrm{if} & i = j \\ 0 & \textrm{if} & i \ne j  \end{array} \right.

One advantage of this trick is that it lets us ask whether an observable commutes with the Hamiltonian. Remember, the commutator of matrices is defined by

[O,H] = O H - H O

Noether’s theorem will say that [O,H] = 0 if and only if O is ‘conserved’ in some sense. What sense? First, recall that a stochastic state is just our fancy name for a probability distribution \psi on the set X. Second, the expected value of an observable O in the stochastic state \psi is defined to be

\displaystyle{ \sum_{i \in X} O_i \psi_i }

In Part 5 we introduced the notation

\displaystyle{ \int \phi = \sum_{i \in X} \phi_i }

for any function \phi on X. The reason is that later, when we generalize X from a finite set to a measure space, the sum at right will become an integral over X. Indeed, a sum is just a special sort of integral!

Using this notation and the magic of matrix multiplication, we can write the expected value of O in the stochastic state \psi as

\int O \psi

We can calculate how this changes in time if \psi obeys the master equation… and we can write the answer using the commutator [O,H]:

Lemma. Suppose H is an infinitesimal stochastic operator and O is an observable. If \psi(t) obeys the master equation, then

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int [O,H] \psi(t) }

Proof. Using the master equation we have

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int O \frac{d}{d t} \psi(t) = \int O H \psi(t) } \qquad \qquad \qquad \; (1)

But since H is infinitesimal stochastic,

\displaystyle{ \sum_{i \in X} H_{i j} = 0  }

so for any function \phi on X we have

\displaystyle{ \int H \phi = \sum_{i, j \in X} H_{i j} \phi_j = 0 }

and in particular

\int H O \psi(t) = 0 \qquad \qquad \qquad \qquad \qquad \qquad (2)

Since [O,H] = O H - H O , we conclude from (1) and (2) that

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int [O,H] \psi(t) }

as desired.   █

The commutator doesn’t look like it’s doing much here, since we also have

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int O H \psi(t) }

which is even simpler. But the commutator will become useful when we get to Noether’s theorem!
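We can sanity-check the Lemma numerically by comparing a finite-difference derivative of the expected value against \int [O,H] \psi. This sketch uses a made-up 3-state Hamiltonian and observable:

```python
import numpy as np
from scipy.linalg import expm

# Made-up 3-state Hamiltonian (columns sum to zero) and diagonal observable O.
H = np.array([[-1.0,  0.5,  0.0],
              [ 1.0, -0.5,  2.0],
              [ 0.0,  0.0, -2.0]])
O = np.diag([0.0, 1.0, 2.0])

psi = np.array([0.2, 0.3, 0.5])               # a stochastic state
comm = O @ H - H @ O                          # the commutator [O, H]

def expected_O(t):
    # expected value of O at time t: sum_i O_i psi_i(t)
    return (O @ expm(t * H) @ psi).sum()

# Finite-difference d/dt of the expected value at t = 0 ...
h = 1e-6
numeric = (expected_O(h) - expected_O(-h)) / (2 * h)
# ... matches the 'integral' (i.e. sum of entries) of [O,H] psi.
assert np.isclose(numeric, (comm @ psi).sum(), atol=1e-5)
```

Behind the scenes this works because the \int H O \psi term contributed by the commutator sums to zero, just as in the proof.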

Noether’s theorem

Here’s a version of Noether’s theorem for Markov processes. It says an observable commutes with the Hamiltonian iff the expected values of that observable and its square don’t change as time passes:

Theorem. Suppose H is an infinitesimal stochastic operator and O is an observable. Then

[O,H] =0

if and only if

\displaystyle{ \frac{d}{d t} \int O\psi(t) = 0 }

and

\displaystyle{ \frac{d}{d t} \int O^2\psi(t) = 0 }

for all \psi(t) obeying the master equation.

If you know Noether’s theorem from quantum mechanics, you might be surprised that in this version we need not only the observable but also its square to have an unchanging expected value! We’ll explain this, but first let’s prove the theorem.

Proof. The easy part is showing that if [O,H]=0 then \frac{d}{d t} \int O\psi(t) = 0 and \frac{d}{d t} \int O^2\psi(t) = 0. In fact there’s nothing special about these two powers of O; we’ll show that

\displaystyle{ \frac{d}{d t} \int O^n \psi(t) = 0 }

for all n. The point is that since H commutes with O, it commutes with all powers of O:

[O^n, H] = 0

So, applying the Lemma to the observable O^n, we see

\displaystyle{ \frac{d}{d t} \int O^n \psi(t) =  \int [O^n, H] \psi(t) = 0 }

The backward direction is a bit trickier. We now assume that

\displaystyle{ \frac{d}{d t} \int O\psi(t) = \frac{d}{d t} \int O^2\psi(t) = 0 }

for all solutions \psi(t) of the master equation. This implies

\int O H\psi(t) = \int O^2 H\psi(t) = 0

or since this holds for all solutions,

\displaystyle{ \sum_{i \in X} O_i H_{i j} = \sum_{i \in X} O_i^2 H_{i j} = 0 } \qquad \qquad \qquad \qquad \qquad \qquad (3)

We wish to show that [O,H]= 0.

First, recall that we can think of O as a diagonal matrix with:

O_{i j} = \left\{ \begin{array}{ccl}  O_i & \textrm{if} & i = j \\ 0 & \textrm{if} & i \ne j  \end{array} \right.

So, we have

\begin{array}{ccl} [O,H]_{i j} &=& \displaystyle{ \sum_{k \in X} (O_{i k}H_{k j} - H_{i k} O_{k j}) } \\ \\ &=& O_i H_{i j} - H_{i j}O_j \\ \\ &=& (O_i-O_j)H_{i j} \end{array}

To show this is zero for each pair of elements i, j \in X, it suffices to show that when H_{i j} \ne 0, then O_j = O_i. That is, we need to show that if the system can move from state j to state i, then the observable takes the same value on these two states.

In fact, it’s enough to show that this sum is zero for any j \in X:

\displaystyle{ \sum_{i \in X} (O_j-O_i)^2 H_{i j} }

Why? When i = j, O_j-O_i = 0, so that term in the sum vanishes. But when i \ne j, (O_j-O_i)^2 and H_{i j} are both non-negative—the latter because H is infinitesimal stochastic. So if they sum to zero, they must each be individually zero. Thus for all i \ne j, we have (O_j-O_i)^2H_{i j}=0. But this means that either O_i = O_j or H_{i j} = 0, which is what we need to show.

So, let’s take that sum and expand it:

\displaystyle{ \sum_{i \in X} (O_j-O_i)^2 H_{i j} = \sum_i (O_j^2 H_{i j}- 2O_j O_i H_{i j} +O_i^2 H_{i j}) }

which in turn equals

\displaystyle{  O_j^2\sum_i H_{i j} - 2O_j \sum_i O_i H_{i j} + \sum_i O_i^2 H_{i j} }

The three terms here are each zero: the first because H is infinitesimal stochastic, and the latter two by equation (3). So, we’re done!   █
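The ‘forward direction’ is easy to illustrate numerically: take a block-diagonal Hamiltonian and an observable that is constant on each block, so that [O,H] = 0, and watch the expected values of O and O^2 stay put. All numbers here are made up:

```python
import numpy as np
from scipy.linalg import expm

# Block-diagonal Hamiltonian: states {0,1} and {2,3} don't communicate.
H = np.array([[-1.0,  2.0,  0.0,  0.0],
              [ 1.0, -2.0,  0.0,  0.0],
              [ 0.0,  0.0, -3.0,  1.0],
              [ 0.0,  0.0,  3.0, -1.0]])
# An observable constant on each block commutes with H ...
O = np.diag([5.0, 5.0, 7.0, 7.0])
assert np.allclose(O @ H - H @ O, 0)

# ... so, by the theorem, the expected values of O and O^2 are conserved.
psi = np.array([0.1, 0.2, 0.3, 0.4])
for t in [0.5, 2.0, 7.0]:
    psi_t = expm(t * H) @ psi
    assert np.isclose((O @ psi_t).sum(), (O @ psi).sum())
    assert np.isclose((O @ O @ psi_t).sum(), (O @ O @ psi).sum())
```

This matches the picture in the proof: O commutes with H precisely when O takes the same value on any two states the system can hop between.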

Markov chains

So that’s the proof… but why do we need both O and its square to have an expected value that doesn’t change with time to conclude [O,H] = 0? There’s an easy counterexample if we leave out the condition involving O^2. However, the underlying idea is clearer if we work with Markov chains instead of Markov processes.

In a Markov process, time passes by continuously. In a Markov chain, time comes in discrete steps! We get a Markov process by forming \exp(t H) where H is an infinitesimal stochastic operator. We get a Markov chain by forming the operator U, U^2, U^3, \dots where U is a ‘stochastic operator’. Remember:

Definition. Given a finite set X, a matrix of real numbers U = (U_{i j})_{i, j \in X} is stochastic if

U_{i j} \ge 0

for all i, j \in X and

\displaystyle{ \sum_{i \in X} U_{i j} = 1 }

for all j \in X.

The idea is that U describes a random hop, with U_{i j} being the probability of hopping to the state i if you start at the state j. These probabilities are nonnegative and sum to 1.

Any stochastic operator gives rise to a Markov chain U, U^2, U^3, \dots . And in case it’s not clear, that’s how we’re defining a Markov chain: the sequence of powers of a stochastic operator. There are other definitions, but they’re equivalent.

We can draw a Markov chain by drawing a bunch of states and arrows labelled by transition probabilities, which are the matrix elements U_{i j}.

Here is Noether’s theorem for Markov chains:

Theorem. Suppose U is a stochastic operator and O is an observable. Then

[O,U] =0

if and only if

\displaystyle{  \int O U \psi = \int O \psi }

and

\displaystyle{ \int O^2 U \psi = \int O^2 \psi }

for all stochastic states \psi.

In other words, an observable commutes with U iff the expected values of that observable and its square don’t change when we evolve our state one time step using U.

You can probably prove this theorem by copying the proof for Markov processes:

Puzzle. Prove Noether’s theorem for Markov chains.

But let’s see why we need the condition on the square of the observable! That’s the intriguing part. Here’s a nice little Markov chain with three states, 0, 1 and 2. State 1 has a 50% chance of hopping to state 0 and a 50% chance of hopping to state 2; the other two states just sit there. Now, consider the observable O with

O_i = i

It’s easy to check that the expected value of this observable doesn’t change with time:

\displaystyle{  \int O U \psi = \int O \psi }

for all \psi. The reason, in plain English, is this. Nothing at all happens if you start at states 0 or 2: you just sit there, so the expected value of O doesn’t change. If you start at state 1, the observable equals 1. You then have a 50% chance of going to a state where the observable equals 0 and a 50% chance of going to a state where it equals 2, so its expected value doesn’t change: it still equals 1.

On the other hand, we do not have [O,U] = 0 in this example, because we can hop between states where O takes different values. Furthermore,

\displaystyle{  \int O^2 U \psi \ne \int O^2 \psi }

After all, if you start at state 1, O^2 equals 1 there. You then have a 50% chance of going to a state where O^2 equals 0 and a 50% chance of going to a state where it equals 4, so its expected value changes!

So, that’s why \int O U \psi = \int O \psi for all \psi is not enough to guarantee [O,U] = 0. The same sort of counterexample works for Markov processes, too.
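Here’s that counterexample in code, with the three-state chain encoded as a stochastic matrix whose columns are the outgoing probabilities:

```python
import numpy as np

# The three-state chain: state 1 hops to 0 or 2 with probability 1/2 each,
# while states 0 and 2 just sit there.
U = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.5, 1.0]])
o = np.array([0.0, 1.0, 2.0])                 # the observable O_i = i

# The expected value of O is conserved for every state psi:
# sum_i O_i (U psi)_i = (o @ U) @ psi, so it's enough that o @ U == o.
assert np.allclose(o @ U, o)

# But the expected value of O^2 is not: starting at state 1,
# it jumps from 1 to (0 + 4)/2 = 2.
psi = np.array([0.0, 1.0, 0.0])
assert np.isclose((o**2) @ psi, 1.0)
assert np.isclose((o**2) @ (U @ psi), 2.0)

# And indeed O does not commute with U.
O = np.diag(o)
assert not np.allclose(O @ U - U @ O, 0)
```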

Finally, we should add that there’s nothing terribly sacred about the square of the observable. For example, we have:

Theorem. Suppose H is an infinitesimal stochastic operator and O is an observable. Then

[O,H] =0

if and only if

\displaystyle{ \frac{d}{d t} \int f(O) \psi(t) = 0 }

for all smooth f: \mathbb{R} \to \mathbb{R} and all \psi(t) obeying the master equation.

Theorem. Suppose U is a stochastic operator and O is an observable. Then

[O,U] =0

if and only if

\displaystyle{  \int f(O) U \psi = \int f(O) \psi }

for all smooth f: \mathbb{R} \to \mathbb{R} and all stochastic states \psi.

These make the ‘forward direction’ of Noether’s theorem stronger… and in fact, the forward direction, while easier, is probably more useful! However, if we ever use Noether’s theorem in the ‘reverse direction’, it might be easier to check a condition involving only O and its square.


Network Theory (Part 10)

16 September, 2011

Last time Brendan showed us a proof of the Anderson-Craciun-Kurtz theorem on equilibrium states. Today we’ll look at an example that illustrates this theorem. This example brings up an interesting ‘paradox’—or at least a puzzle. Resolving this will get us ready to think about a version of Noether’s theorem relating conserved quantities and symmetries. Next time Brendan will state and prove a version of Noether’s theorem that applies to stochastic Petri nets.

A reversible reaction

In chemistry a type of atom, molecule, or ion is called a chemical species, or species for short. Since we’re applying our ideas to both chemistry and biology, it’s nice that ‘species’ is also used for a type of organism in biology. This stochastic Petri net describes the simplest reversible reaction of all, involving two species:

We have species 1 turning into species 2 with rate constant \beta, and species 2 turning back into species 1 with rate constant \gamma. So, the rate equation is:

\begin{array}{ccc} \displaystyle{ \frac{d x_1}{d t} } &=& -\beta x_1 + \gamma x_2  \\  \qquad \mathbf{} \\ \displaystyle{ \frac{d x_2}{d t} } &=&  \beta x_1 - \gamma x_2  \end{array}

where x_1 and x_2 are the amounts of species 1 and 2, respectively.

Equilibrium solutions of the rate equation

Let’s look for equilibrium solutions of the rate equation, meaning solutions where the amount of each species doesn’t change with time. Equilibrium occurs when each species is getting created at the same rate at which it’s getting destroyed.

So, let’s see when

\displaystyle{ \frac{d x_1}{d t} = \frac{d x_2}{d t} = 0 }

Clearly this happens precisely when

\beta x_1 = \gamma x_2

This says the rate at which 1’s are turning into 2’s equals the rate at which 2’s are turning back into 1’s. That makes perfect sense.
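If you want to watch the approach to equilibrium, you can integrate the rate equation numerically. The rate constants and initial amounts below are arbitrary:

```python
import numpy as np
from scipy.integrate import odeint

beta, gamma = 2.0, 1.0                        # arbitrary rate constants

def rate_eq(x, t):
    x1, x2 = x
    flow = beta * x1 - gamma * x2             # net rate of species 1 -> 2
    return [-flow, flow]

# Integrate from an arbitrary starting point and check that the solution
# settles down to the equilibrium condition beta x1 = gamma x2.
times = np.linspace(0.0, 20.0, 201)
x = odeint(rate_eq, [3.0, 0.0], times)
x1, x2 = x[-1]
assert np.isclose(beta * x1, gamma * x2, rtol=1e-4)
assert np.isclose(x1 + x2, 3.0)               # total amount is conserved
```

Note the second assertion: the total amount x_1 + x_2 never changes, a conserved quantity that will matter later in this post.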

Complex balanced equilibrium

In general, a chemical reaction involves a bunch of species turning into a bunch of species. Since ‘bunch’ is not a very dignified term, a bunch of species is usually called a complex. We saw last time that it’s very interesting to study a strong version of equilibrium: complex balanced equilibrium, in which each complex is being created at the same rate at which it’s getting destroyed.

However, in the Petri net we’re studying today, all the complexes being produced or destroyed consist of a single species. In this situation, any equilibrium solution is automatically complex balanced. This is great, because it means we can apply the Anderson-Craciun-Kurtz theorem from last time! This says how to get from a complex balanced equilibrium solution of the rate equation to an equilibrium solution of the master equation.

First remember what the master equation says. Let \psi_{n_1, n_2}(t) be the probability that we have n_1 things of species 1 and n_2 things of species 2 at time t. We summarize all this information in a formal power series:

\Psi(t) = \sum_{n_1, n_2 = 0}^\infty \psi_{n_1, n_2}(t) z_1^{n_1} z_2^{n_2}

Then the master equation says

\frac{d}{d t} \Psi(t) = H \Psi (t)

where following the general rules laid down in Part 8,

\begin{array}{ccl} H &=& \beta (a_2^\dagger - a_1^\dagger) a_1 + \gamma (a_1^\dagger - a_2^\dagger )a_2 \end{array}

This may look scary, but the annihilation operator a_i and the creation operator a_i^\dagger are just funny ways of writing the partial derivative \partial/\partial z_i and multiplication by z_i, so

\begin{array}{ccl} H &=& \displaystyle{ \beta (z_2 - z_1) \frac{\partial}{\partial z_1} + \gamma (z_1 - z_2) \frac{\partial}{\partial z_2} } \end{array}

or if you prefer,

\begin{array}{ccl} H &=& \displaystyle{ (z_2 - z_1) \, (\beta \frac{\partial}{\partial z_1} - \gamma \frac{\partial}{\partial z_2}) } \end{array}

The first term describes species 1 turning into species 2. The second describes species 2 turning back into species 1.

Now, the Anderson-Craciun-Kurtz theorem says that whenever (x_1,x_2) is a complex balanced solution of the rate equation, this recipe gives an equilibrium solution of the master equation:

\displaystyle{ \Psi = \frac{e^{x_1 z_1 + x_2 z_2}}{e^{x_1 + x_2}} }

In other words: whenever \beta x_1 = \gamma x_2, we have

H\Psi = 0

Let’s check this! For starters, the constant in the denominator of \Psi doesn’t matter here, since H is linear. It’s just a normalizing constant, put in to make sure that our probabilities \psi_{n_1, n_2} sum to 1. So, we just need to check that

\displaystyle{ (z_2 - z_1) (\beta \frac{\partial}{\partial z_1} - \gamma \frac{\partial}{\partial z_2}) e^{x_1 z_1 + x_2 z_2} = 0 }

If we do the derivatives on the left hand side, it’s clear we want

\displaystyle{ (z_2 - z_1) (\beta x_1 - \gamma x_2) e^{x_1 z_1 + x_2 z_2} = 0 }

And this is indeed true when \beta x_1 = \gamma x_2 .

So, the theorem works as advertised. And now we can work out the probability \psi_{n_1,n_2} of having n_1 things of species 1 and n_2 of species 2 in our equilibrium state \Psi. To do this, we just expand the function \Psi as a power series and look at the coefficient of z_1^{n_1} z_2^{n_2}. We have

\Psi = \displaystyle{ \frac{e^{x_1 z_1 + x_2 z_2}}{e^{x_1 + x_2}} } = \displaystyle{ \frac{1}{e^{x_1} e^{x_2}} \sum_{n_1,n_2}^\infty \frac{(x_1 z_1)^{n_1}}{n_1!} \frac{(x_2 z_2)^{n_2}}{n_2!} }

so we get

\psi_{n_1,n_2} = \displaystyle{ \frac{1}{e^{x_1}} \frac{x_1^{n_1}}{n_1!} \; \cdot \; \frac{1}{e^{x_2}} \frac{x_2^{n_2}}{n_2!} }

This is just a product of two independent Poisson distributions!
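This is worth verifying by brute force. Since both transitions conserve n_1 + n_2, we can truncate the state space to n_1 + n_2 \le N without introducing any error, build the Hamiltonian as a matrix, and check that the product of Poisson distributions is annihilated by it. The constants below are arbitrary, subject to \beta x_1 = \gamma x_2:

```python
import numpy as np
from math import factorial, exp

beta, gamma = 2.0, 1.0
x1, x2 = 1.0, 2.0                              # satisfies beta x1 = gamma x2
N = 30                                         # keep states with n1 + n2 <= N

states = [(n1, n2) for n1 in range(N + 1) for n2 in range(N + 1 - n1)]
index = {s: k for k, s in enumerate(states)}

# Hamiltonian of the master equation: species 1 -> 2 at rate beta*n1,
# species 2 -> 1 at rate gamma*n2.  Both conserve n1 + n2, so the
# truncation is exact.
H = np.zeros((len(states), len(states)))
for (n1, n2), j in index.items():
    H[j, j] = -(beta * n1 + gamma * n2)
    if n1 > 0:
        H[index[(n1 - 1, n2 + 1)], j] = beta * n1
    if n2 > 0:
        H[index[(n1 + 1, n2 - 1)], j] = gamma * n2

# The product of two Poisson distributions with means x1 and x2 ...
psi = np.array([exp(-x1) * x1**n1 / factorial(n1) *
                exp(-x2) * x2**n2 / factorial(n2) for (n1, n2) in states])

# ... is an equilibrium solution: H psi = 0.
assert np.allclose(H @ psi, 0, atol=1e-12)
```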

In case you forget, a Poisson distribution describes the probability of k events occurring in some interval of time, when the events occur at a fixed average rate, independently of the time since the last event. If the expected number of events is \lambda, the Poisson distribution is

\displaystyle{ \frac{1}{e^\lambda} \frac{\lambda^k}{k!} }

For large \lambda it looks almost like a Gaussian, but for small \lambda it becomes very lopsided.

Anyway: we’ve seen that in our equilibrium state, the number of things of species i = 1,2 is given by a Poisson distribution with mean x_i. That’s very nice and simple… but the amazing thing is that these distributions are independent.

Mathematically, this means we just multiply them to get the probability of finding n_1 things of species 1 and n_2 of species 2. But it also means that knowing how many things there are of one species says nothing about the number of the other.

But something seems odd here. One transition in our Petri net consumes a 1 and produces a 2, while the other consumes a 2 and produces a 1. The total number of particles in the system never changes. The more 1’s there are, the fewer 2’s there should be. But we just said knowing how many 1’s we have tells us nothing about how many 2’s we have!

At first this seems like a paradox. Have we made a mistake? Not exactly. But we’re neglecting something.

Conserved quantities

Namely: the equilibrium solutions of the master equation we’ve found so far are not the only ones! There are other solutions that fit our intuitions better.

Suppose we take any of our equilibrium solutions \Psi and change it like this: set the probability \psi_{n_1,n_2} equal to 0 unless

n_1 + n_2 = n

but otherwise leave it unchanged. Of course the probabilities no longer sum to 1, but we can rescale them so they do.

The result is a new equilibrium solution, say \Psi_n. Why? Because, as we’ve already seen, no transitions will carry us from one value of n_1 + n_2 to another. And in this new solution, the number of 1’s is clearly not independent from the number of 2’s. The bigger one is, the smaller the other is.

Puzzle 1. Show that this new solution \Psi_n depends only on n and the ratio x_1/x_2 = \gamma/\beta, not on anything more about the values of x_1 and x_2 in the original solution

\Psi = \displaystyle{ \frac{e^{x_1 z_1 + x_2 z_2}}{e^{x_1 + x_2}} }

Puzzle 2. What is this new solution like when \beta = \gamma? (This particular choice makes the problem symmetrical when we interchange species 1 and 2.)

What’s happening here is that this particular stochastic Petri net has a ‘conserved quantity’: the total number of things never changes with time. So, we can take any equilibrium solution of the master equation and—in the language of quantum mechanics—‘project down to the subspace’ where this conserved quantity takes a definite value, and get a new equilibrium solution. In the language of probability theory, we say it a bit differently: we’re ‘conditioning on’ the conserved quantity taking a definite value. But the idea is the same.
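This projection trick is easy to check numerically for our reversible reaction: build the truncated Hamiltonian as before (all constants arbitrary, subject to \beta x_1 = \gamma x_2), zero out every component with n_1 + n_2 \ne n, renormalize, and confirm the result is again an equilibrium:

```python
import numpy as np
from math import factorial, exp

beta, gamma = 2.0, 1.0
x1, x2 = 1.0, 2.0                              # satisfies beta x1 = gamma x2
N = 20

states = [(n1, n2) for n1 in range(N + 1) for n2 in range(N + 1 - n1)]
index = {s: k for k, s in enumerate(states)}

# Same Hamiltonian as before: 1 -> 2 at rate beta*n1, 2 -> 1 at rate gamma*n2.
H = np.zeros((len(states), len(states)))
for (n1, n2), j in index.items():
    H[j, j] = -(beta * n1 + gamma * n2)
    if n1 > 0:
        H[index[(n1 - 1, n2 + 1)], j] = beta * n1
    if n2 > 0:
        H[index[(n1 + 1, n2 - 1)], j] = gamma * n2

# The product-Poisson equilibrium state ...
psi = np.array([exp(-x1) * x1**n1 / factorial(n1) *
                exp(-x2) * x2**n2 / factorial(n2) for (n1, n2) in states])

# ... conditioned on the conserved quantity n1 + n2 = n: zero out all
# other components and renormalize.  The result is again an equilibrium.
n = 5
psi_n = np.where([n1 + n2 == n for (n1, n2) in states], psi, 0.0)
psi_n /= psi_n.sum()
assert np.allclose(H @ psi_n, 0, atol=1e-12)
```

The check works because H never connects states with different values of n_1 + n_2, so the projection commutes with H.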

This important feature of conserved quantities suggests that we should try to invent a new version of Noether’s theorem. This theorem links conserved quantities and symmetries of the Hamiltonian.

There are already a couple versions of Noether’s theorem for classical mechanics, and for quantum mechanics… but now we want a version for stochastic mechanics. And indeed one exists, and it’s relevant to what we’re doing here. We’ll show it to you next time.


Network Theory (Part 8)

9 September, 2011

Summer vacation is over. Time to get back to work!

This month, before he goes to Oxford to begin a master’s program in Mathematics and the Foundations of Computer Science, Brendan Fong is visiting the Centre for Quantum Technologies and working with me on stochastic Petri nets. He’s proved two interesting results, which he wants to explain.

To understand what he’s done, you need to know how to get the rate equation and the master equation from a stochastic Petri net. We’ve almost seen how. But it’s been a long time since the last article in this series, so today I’ll start with some review. And at the end, just for fun, I’ll say a bit more about how Feynman diagrams show up in this theory.

Since I’m an experienced teacher, I’ll assume you’ve forgotten everything I ever said.

(This has some advantages. I can change some of my earlier terminology—improve it a bit here and there—and you won’t even notice.)

Stochastic Petri nets

Definition. A Petri net consists of a set S of species and a set T of transitions, together with a function

i : S \times T \to \mathbb{N}

saying how many things of each species appear in the input for each transition, and a function

o: S \times T \to \mathbb{N}

saying how many things of each species appear in the output.

We can draw pictures of Petri nets. For example, here’s a Petri net with two species and three transitions:

It should be clear that the transition ‘predation’ has one wolf and one rabbit as input, and two wolves as output.

A ‘stochastic’ Petri net goes further: it also says the rate at which each transition occurs.

Definition. A stochastic Petri net is a Petri net together with a function

r: T \to [0,\infty)

giving a rate constant for each transition.
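If you like seeing definitions as code, here is one way to encode a stochastic Petri net, with the input and output functions i, o : S \times T \to \mathbb{N} stored as dictionaries. The class and field names are my own invention, not standard:

```python
from dataclasses import dataclass

@dataclass
class StochasticPetriNet:
    species: list
    transitions: list
    inputs: dict      # inputs[(s, t)]  = how many s's transition t consumes
    outputs: dict     # outputs[(s, t)] = how many s's transition t produces
    rate: dict        # rate[t] = rate constant of transition t

# The rabbit/wolf net described in the text (rate constants made up):
net = StochasticPetriNet(
    species=['rabbit', 'wolf'],
    transitions=['birth', 'predation', 'death'],
    inputs={('rabbit', 'birth'): 1,
            ('rabbit', 'predation'): 1, ('wolf', 'predation'): 1,
            ('wolf', 'death'): 1},
    outputs={('rabbit', 'birth'): 2,
             ('wolf', 'predation'): 2},
    rate={'birth': 1.0, 'predation': 0.1, 'death': 0.5},
)

# Predation has one wolf and one rabbit as input, and two wolves as output;
# pairs missing from the dictionaries default to 0.
assert net.inputs[('rabbit', 'predation')] == 1
assert net.outputs.get(('wolf', 'predation'), 0) == 2
assert net.outputs.get(('rabbit', 'predation'), 0) == 0
```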

Master equation versus rate equation

Starting from any stochastic Petri net, we can get two things. First:

• The master equation. This says how the probability that we have a given number of things of each species changes with time.

• The rate equation. This says how the expected number of things of each species changes with time.

The master equation is stochastic: it describes how probabilities change with time. The rate equation is deterministic.

The master equation is more fundamental. It’s like the equations of quantum electrodynamics, which describe the amplitudes for creating and annihilating individual photons. The rate equation is less fundamental. It’s like the classical Maxwell equations, which describe changes in the electromagnetic field in a deterministic way. The classical Maxwell equations are an approximation to quantum electrodynamics. This approximation gets good in the limit where there are lots of photons all piling on top of each other to form nice waves.

Similarly, the rate equation can be derived from the master equation in the limit where the number of things of each species becomes large, and the fluctuations in these numbers become negligible.

But I won’t do this derivation today! Nor will I probe more deeply into the analogy with quantum field theory, even though that’s my ultimate goal. Today I’m content to remind you what the master equation and rate equation are.

The rate equation is simpler, so let’s do that first.

The rate equation

Suppose we have a stochastic Petri net with k different species. Let x_i be the number of things of the ith species. Then the rate equation looks like this:

\displaystyle{ \frac{d x_i}{d t} = ??? }

It’s really a bunch of equations, one for each 1 \le i \le k. But what is the right-hand side?

The right-hand side is a sum of terms, one for each transition in our Petri net. So, let’s start by assuming our Petri net has just one transition.

Suppose the ith species appears as input to this transition m_i times, and as output n_i times. Then the rate equation is

\displaystyle{ \frac{d x_i}{d t} = r (n_i - m_i) x_1^{m_1} \cdots x_k^{m_k} }

where r is the rate constant for this transition.

That’s really all there is to it! But we can make it look nicer. Let’s make up a vector

x = (x_1, \dots , x_k) \in [0,\infty)^k

that says how many things there are of each species. Similarly let’s make up an input vector

m = (m_1, \dots, m_k) \in \mathbb{N}^k

and an output vector

n = (n_1, \dots, n_k) \in \mathbb{N}^k

for our transition. To be cute, let’s also define

\displaystyle{ x^m = x_1^{m_1} \cdots x_k^{m_k} }

Then we can write the rate equation for a single transition like this:

\displaystyle{ \frac{d x}{d t} = r (n-m) x^m }

Next let’s do a general stochastic Petri net, with lots of transitions. Let’s write T for the set of transitions and r(\tau) for the rate constant of the transition \tau \in T. Let n(\tau) and m(\tau) be the input and output vectors of the transition \tau. Then the rate equation is:

\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T} r(\tau) (n(\tau) - m(\tau)) x^{m(\tau)} }

For example, consider our rabbits and wolves:

Suppose

• the rate constant for ‘birth’ is \beta,

• the rate constant for ‘predation’ is \gamma,

• the rate constant for ‘death’ is \delta.

Let x_1(t) be the number of rabbits and x_2(t) the number of wolves at time t. Then the rate equation looks like this:

\displaystyle{ \frac{d x_1}{d t} = \beta x_1 - \gamma x_1 x_2 }

\displaystyle{ \frac{d x_2}{d t} = \gamma x_1 x_2 - \delta x_2 }

If you stare at this, and think about it, it should make perfect sense. If it doesn’t, go back and read Part 2.
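Here is a sketch that computes the right-hand side of the general rate equation from the list of transitions and checks it against the rabbit-and-wolf equations written out above. The rate constants are arbitrary:

```python
import numpy as np

# Transitions of the rabbit/wolf net as (rate, input vector m, output vector n),
# with coordinates (rabbits, wolves).
beta, gamma, delta = 1.0, 0.1, 0.5
transitions = [
    (beta,  np.array([1, 0]), np.array([2, 0])),   # birth
    (gamma, np.array([1, 1]), np.array([0, 2])),   # predation
    (delta, np.array([0, 1]), np.array([0, 0])),   # death
]

def rate_equation(x):
    # dx/dt = sum over transitions of r (n - m) x^m
    return sum(r * (n - m) * np.prod(x**m) for r, m, n in transitions)

# Check against the equations written out by hand above.
x = np.array([10.0, 3.0])
x1, x2 = x
expected = np.array([beta * x1 - gamma * x1 * x2,
                     gamma * x1 * x2 - delta * x2])
assert np.allclose(rate_equation(x), expected)
```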

The master equation

Now let’s do something new. In Part 6 I explained how to write down the master equation for a stochastic Petri net with just one species. Now let’s generalize that. Luckily, the ideas are exactly the same.

So, suppose we have a stochastic Petri net with k different species. Let \psi_{n_1, \dots, n_k} be the probability that we have n_1 things of the first species, n_2 of the second species, and so on. The master equation will say how all these probabilities change with time.

To keep the notation clean, let’s introduce a vector

n = (n_1, \dots, n_k) \in \mathbb{N}^k

and let

\psi_n = \psi_{n_1, \dots, n_k}

Then, let’s take all these probabilities and cook up a formal power series that has them as coefficients: as we’ve seen, this is a powerful trick. To do this, we’ll bring in some variables z_1, \dots, z_k and write

\displaystyle{ z^n = z_1^{n_1} \cdots z_k^{n_k} }

as a convenient abbreviation. Then any formal power series in these variables looks like this:

\displaystyle{ \Psi = \sum_{n \in \mathbb{N}^k} \psi_n z^n }

We call \Psi a state if the probabilities sum to 1 as they should:

\displaystyle{ \sum_n \psi_n = 1 }

The simplest example of a state is a monomial:

\displaystyle{  z^n = z_1^{n_1} \cdots z_k^{n_k} }

This is a state where we are 100% sure that there are n_1 things of the first species, n_2 of the second species, and so on. We call such a state a pure state, since physicists use this term to describe a state where we know for sure exactly what’s going on. Sometimes a general state, one that might not be pure, is called mixed.

The master equation says how a state evolves in time. It looks like this:

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

So, I just need to tell you what H is!

It’s called the Hamiltonian. It’s a linear operator built from special operators that annihilate and create things of various species. Namely, for each species 1 \le i \le k we have an annihilation operator:

\displaystyle{ a_i \Psi = \frac{\partial}{\partial z_i} \Psi }

and a creation operator:

\displaystyle{ a_i^\dagger \Psi = z_i \Psi }

How do we build H from these? Suppose we’ve got a stochastic Petri net whose set of transitions is T. As before, write r(\tau) for the rate constant of the transition \tau \in T, and let n(\tau) and m(\tau) be the input and output vectors of this transition. Then:

\displaystyle{ H = \sum_{\tau \in T} r(\tau) \, ({a^\dagger}^{n(\tau)} - {a^\dagger}^{m(\tau)}) \, a^{m(\tau)}  }

where as usual we’ve introduced some shorthand notation to keep from going insane. For example:

a^{m(\tau)} = a_1^{m_1(\tau)} \cdots  a_k^{m_k(\tau)}

and

{a^\dagger}^{m(\tau)} = {a_1^\dagger }^{m_1(\tau)} \cdots  {a_k^\dagger}^{m_k(\tau)}

Now, it’s not surprising that each transition \tau contributes a term to H. It’s also not surprising that this term is proportional to the rate constant r(\tau). The only tricky thing is the expression

\displaystyle{ ({a^\dagger}^{n(\tau)} - {a^\dagger}^{m(\tau)})\, a^{m(\tau)} }

How can we understand it? The basic idea is this. We’ve got two terms here. The first term:

\displaystyle{  {a^\dagger}^{n(\tau)} a^{m(\tau)} }

describes how m_i(\tau) things of the ith species get annihilated, and n_i(\tau) things of the ith species get created. Of course this happens thanks to our transition \tau. The second term:

\displaystyle{  - {a^\dagger}^{m(\tau)} a^{m(\tau)} }

is a bit harder to understand, but it says how the probability that nothing happens—that we remain in the same pure state—decreases as time passes. Again this happens due to our transition \tau.

In fact, the second term must take precisely the form it does to ensure ‘conservation of total probability’. In other words: if the probabilities \psi_n sum to 1 at time zero, we want these probabilities to still sum to 1 at any later time. And for this, we need that second term to be what it is! In Part 6 we saw this in the special case where there’s only one species. The general case works the same way.

Let’s look at an example. Consider our rabbits and wolves yet again:

and again suppose the rate constants for birth, predation and death are \beta, \gamma and \delta, respectively. We have

\displaystyle{ \Psi = \sum_n \psi_n z^n }

where

\displaystyle{ z^n = z_1^{n_1} z_2^{n_2} }

and \psi_n = \psi_{n_1, n_2} is the probability of having n_1 rabbits and n_2 wolves. These probabilities evolve according to the equation

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

where the Hamiltonian is

H = \beta B + \gamma C + \delta D

and B, C and D are operators describing birth, predation and death, respectively. (B stands for birth, D stands for death… and you can call predation ‘consumption’ if you want something that starts with C. Besides, ‘consumer’ is a nice euphemism for ‘predator’.) What are these operators? Just follow the rules I described:

\displaystyle{ B = {a_1^\dagger}^2 a_1 - a_1^\dagger a_1 }

\displaystyle{ C = {a_2^\dagger}^2 a_1 a_2 - a_1^\dagger a_2^\dagger a_1 a_2 }

\displaystyle{ D = a_2 -  a_2^\dagger a_2 }

In each case, the first term is easy to understand:

• Birth annihilates one rabbit and creates two rabbits.

• Predation annihilates one rabbit and one wolf and creates two wolves.

• Death annihilates one wolf.

The second term is trickier, but I told you how it works.

Feynman diagrams

How do we solve the master equation? If we don’t worry about mathematical rigor too much, it’s easy. The solution of

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

should be

\displaystyle{ \Psi(t) = e^{t H} \Psi(0) }

and we can hope that

\displaystyle{ e^{t H} = 1 + t H + \frac{(t H)^2}{2!} + \cdots }

so that

\displaystyle{ \Psi(t) = \Psi(0) + t H \Psi(0) + \frac{t^2}{2!} H^2 \Psi(0) + \cdots }

Of course there’s always the question of whether this power series converges. In many contexts it doesn’t, but that’s not necessarily a disaster: the series can still be asymptotic to the right answer, or even better, Borel summable to the right answer.
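For a Markov process on a small state space we can just sum the series numerically. Here’s a quick sketch (a two-state toy process of our own invention, not from the text) showing the truncated series conserving total probability and approaching equilibrium:

```python
import numpy as np

# Toy two-state Markov process: hop 0 -> 1 at rate 2, hop 1 -> 0 at rate 1.
# Columns of H sum to zero, so H is infinitesimal stochastic.
H = np.array([[-2.0, 1.0],
              [ 2.0, -1.0]])

def evolve(H, psi0, t, terms=40):
    """Approximate exp(tH) psi0 by summing the power series term by term."""
    psi = psi0.astype(float).copy()
    term = psi0.astype(float).copy()
    for k in range(1, terms):
        term = (t / k) * (H @ term)
        psi = psi + term
    return psi

psi = evolve(H, np.array([1.0, 0.0]), t=1.0)
# Total probability stays 1, and psi approaches the equilibrium (1/3, 2/3)
# as t grows.
```

For this H the exact answer is \psi_0(t) = 1/3 + (2/3) e^{-3t}, so the truncated series is easy to check.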

But let’s not worry about these subtleties yet! Let’s just imagine our rabbits and wolves, with Hamiltonian

H = \beta B + \gamma C + \delta D

Now, imagine working out

\Psi(t) = \displaystyle{ \Psi(0) + t H \Psi(0) + \frac{t^2}{2!} H^2 \Psi(0) + \frac{t^3}{3!} H^3 \Psi(0) + \cdots }

We’ll get lots of terms involving products of B, C and D hitting our original state \Psi(0). And we can draw these as diagrams! For example, suppose we start with one rabbit and one wolf. Then

\Psi(0) = z_1 z_2

And suppose we want to compute

\displaystyle{ H^3 \Psi(0) = (\beta B + \gamma C + \delta D)^3 \Psi(0) }

as part of the task of computing \Psi(t). Then we’ll get lots of terms: 27, in fact, though many will turn out to be zero. Let’s take one of these terms, for example the one proportional to:

D C B \Psi(0)

We can draw this as a sum of Feynman diagrams, including this:

In this diagram, we start with one rabbit and one wolf at top. As we read the diagram from top to bottom, first a rabbit is born (B), then predation occurs (C), and finally a wolf dies (D). The end result is again a rabbit and a wolf.

This is just one of four Feynman diagrams we should draw in our sum for D C B \Psi(0), since either of the two rabbits could have been eaten, and either wolf could have died. So, the end result of computing

\displaystyle{ H^3 \Psi(0) }

will involve a lot of Feynman diagrams… and of course computing

\displaystyle{ \Psi(t) = \Psi(0) + t H \Psi(0) + \frac{t^2}{2!} H^2 \Psi(0) + \frac{t^3}{3!} H^3 \Psi(0) + \cdots }

will involve even more, even if we get tired and give up after the first few terms. So, this Feynman diagram business may seem quite tedious… and it may not be obvious how it helps.

But it does, sometimes!

Now is not the time for me to describe ‘practical’ benefits of Feynman diagrams. Instead, I’ll just point out one conceptual benefit. We started with what seemed like a purely computational chore, namely computing

\displaystyle{ \Psi(t) = \Psi(0) + t H \Psi(0) + \frac{t^2}{2!} H^2 \Psi(0) + \cdots }

But then we saw—at least roughly—how this series has a clear meaning! It can be written as a sum over diagrams, each of which represents a possible history of rabbits and wolves. So, it’s what physicists call a ‘sum over histories’.

Feynman invented the idea of a sum over histories in the context of quantum field theory. At the time this idea seemed quite mind-blowing, for various reasons. First, it involved elementary particles instead of everyday things like rabbits and wolves. Second, it involved complex ‘amplitudes’ instead of real probabilities. Third, it actually involved integrals instead of sums. And fourth, a lot of these integrals diverged, giving infinite answers that needed to be ‘cured’ somehow!

Now we’re seeing a sum over histories in a more down-to-earth context without all these complications. A lot of the underlying math is analogous… but now there’s nothing mind-blowing about it: it’s quite easy to understand. So, we can use this analogy to demystify quantum field theory a bit. On the other hand, thanks to this analogy, all sorts of clever ideas invented by quantum field theorists will turn out to have applications to biology and chemistry! So it’s a double win.


Eddy Who?

25 August, 2011

Or: A very short introduction to turbulence

guest post by Tim van Beek

Have a look at this picture:

Then look at this one:

Do they look similar?

They should! They are both examples of a Kelvin-Helmholtz instability.

The first graphic is a picture of billow clouds (the fancier name is altostratus undulatus clouds):

The picture is taken from:

• C. Donald Ahrens: Meteorology Today, 9th edition, Brooks/Cole, Kentucky, 2009.

The second graphic:

shows a lab experiment and is taken from:

• G. L. Brown and A. Roshko, Density effects and large structure in turbulent mixing layers (available online), Journal of Fluid Mechanics 64 (1974), 775-816.

Isn’t it strange that clouds in the sky would show the same pattern as some gases in a small laboratory experiment? The reason for this is not quite understood today. In this post, I would like to talk a little bit about what is known.

Matter that tries to get out of its own way

Fluids like water and air can be well described by Newton’s laws of classical mechanics. When you start learning about classical mechanics, you consider discrete masses, most of the time. Billiard balls, for example. But it is possible to formulate Newton’s laws of motion for fluids by treating them as ‘infinitely many infinitesimally small’ billiard balls, all pushing and rubbing against each other and therefore trying to get out of the way of each other.

If we do this, we get some equations describing fluid flow: the Navier-Stokes equations.

The Navier-Stokes equations are a complicated set of nonlinear partial differential equations. A lot of mathematical questions about them are still unanswered, like: under what conditions is there a smooth solution to these equations? If you can answer that question, you will win a million dollars from the Clay Mathematics Institute.

If you completed the standard curriculum of physics as I did, chances are that you never attended a class on fluid dynamics. At least I never did. When you take a first look at the field, you will notice: the literature about the Navier-Stokes equations alone is huge! Not to mention all the special aspects of numerical simulations, special aspects of the climate and so on.

So it is nice to find a pedagogical introduction to the mathematical theory, aimed at people who have some background knowledge in partial differential equations:

• C. Foias, R. Rosa, O. Manley and R. Temam, Navier-Stokes Equations and Turbulence, Cambridge U. Press, Cambridge, 2001.

So, there is a lot of fun to be had for the mathematically inclined. But today I would like to talk about an aspect of fluid flows that also has a tremendous practical importance, especially for the climate of the Earth: turbulence!

There is no precise definition of turbulence, but people know it when they see it. A fluid can flow in layers, with one layer above the other, each maybe slightly slower or faster than its neighbors. Material of one layer hardly mixes with material of another layer. These flows are called laminar flows. When a laminar flow gets faster and faster, it turns into a turbulent flow at some point:

This is a fluid flow inside a circular pipe, with a layer of some darker fluid in the middle.

As a first guess we could say that a characteristic property of turbulent flow is the presence of circular flows, commonly called eddies.

Tempests in Teapots

A funny aspect of the Navier-Stokes equations is that they don’t come with any recommended length scale. Properties of the fluid flow like velocity and pressure are modelled as smooth functions of continuous time and space. Of course we know that this model does not work on an atomic length scale, where we have to consider individual atoms. Pressure and velocity of a fluid flow don’t make any sense on a length scale smaller than the average distance between the electrons and the atomic nucleus.

We know this, but this fact is not part of the model given by the Navier-Stokes equations!

But let us look at bigger length scales. An interesting feature of the solutions of the Navier-Stokes equations is that there are fluid flows that stretch over hundreds of meters that look like fluid flows that stretch over centimeters only. And it is really astonishing that this phenomenon can be observed in nature. This is another example of the unreasonable effectiveness of mathematical models.

You have seen an example of this in the introduction already. That was a boundary layer instability. Here is a full blown turbulent example:

The last two pictures are from the book:

• Arkady Tsinober: An Informal Conceptual Introduction to Turbulence, 2nd edition, Springer, Fluid Mechanics and Its Applications Volume 92, Berlin, 2009.

This is a nice introduction to the subject, especially if you are more interested in phenomenology than mathematical details.

Maybe you noticed the “Reynolds number” in the label text of the last picture. What is that?

People in business administration like management ratios; they throw all the confusing information they have about a company into a big mixer and extract one or two numbers that tell them where they stand, like business volume and earnings. People in hydrodynamics are somewhat similar; they define all kinds of “numbers” that condense a lot of information about fluid flows.

A CEO would want to know if the earnings of his company are positive or negative. We would like to know a number that tells us if a fluid flow is laminar or turbulent. Luckily, such a number already exists. It is the Reynolds number! A low number indicates a laminar flow, a high number a turbulent flow. Like the calculation of the revenue of a company, the calculation of the Reynolds number of a given fluid flow is not an exact science. Instead there is some measure of estimation necessary. The definition involves, for example, a “characteristic length scale”. This is a fuzzy concept that usually involves some object that interacts with – in our case – the fluid flow. The characteristic length scale in this case is the physical dimension of the object. While there is usually no objectively correct way to assing a “characteristic length” to a three dimensional object, this concept allows us nevertheless to distinguish the scattering of water waves on a ocean liner (length scale ≈ 103 meter) from their scattering on a peanut (length scale ≈ 10-2 meter).

The following graphic shows laminar and turbulent flows and their characteristic Reynolds numbers:

schematic display of laminar and turbulent flow for different Reynolds numbers

This graphic is from the book

• Thomas Bohr, Mogens H. Jensen, Giovanni Paladin and Angelo Vulpiani, Dynamical Systems Approach to Turbulence, Cambridge U. Press, Cambridge, 1998

But let us leave the Reynolds number for now and turn to one of its ingredients: viscosity. Understanding viscosity is important for understanding how eddies in a fluid flow are connected to energy dissipation.

Eddies dissipate kinetic energy

“Eddies,” said Ford, “in the space-time continuum.”

“Ah,” nodded Arthur, “is he? Is he?” He pushed his hands into the pocket of his dressing gown and looked knowledgeably into the distance.

“What?” said Ford.

“Er, who,” said Arthur, “is Eddy, then, exactly, then?”

– from Douglas Adams: Life, the Universe and Everything

A fluid flow can be pictured as consisting of a lot of small fluid packages that move alongside each other. In many situations, there will be some friction between these packages. In the case of fluids, this friction is called viscosity.

It is an empirical fact that at small velocities fluid flows are laminar: there are layers upon layers, with one layer moving at a constant speed, and almost no mixing. At the boundaries, the fluid will attach to the surrounding material, and the relative fluid flow will be zero. If you picture such a flow between a plate that is at rest, and a plate that is moving forward, you will see that due to friction between the layers a force needs to be exerted to keep the moving plate moving:

In the simplest approximation, you will have to exert some force F per unit area A on the moving plate in order to sustain a fluid velocity that increases linearly along the y-axis, with constant gradient \partial u / \partial y. The constant of proportionality is called the viscosity \mu:

\displaystyle{ \frac{F}{A} = \mu \frac{\partial u}{\partial y}}

More friction means a bigger viscosity: honey has a bigger viscosity than water.
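Plugging illustrative numbers into the formula above makes the contrast vivid (the viscosity values are rough order-of-magnitude figures of our own, not from the text):

```python
def shear_stress(mu, velocity_gradient):
    """F/A = mu * du/dy: force per unit area needed to sustain the shear."""
    return mu * velocity_gradient

# Velocity rising by 1 m/s across a 1 mm gap gives du/dy = 1000 1/s.
grad = 1.0 / 1.0e-3
stress_water = shear_stress(1.0e-3, grad)   # ~ 1 Pa     (mu of water ~ 1e-3 Pa s)
stress_honey = shear_stress(10.0, grad)     # ~ 10,000 Pa (mu of honey ~ 10 Pa s)
```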

If you stir honey, the fluid flow will come to a halt rather fast. The energy that you put in to start the fluid flow is turned to heat by dissipation. This mechanism is of course related to friction and therefore to viscosity.

It is possible to formulate an exact formula for this dissipation process using the Navier-Stokes equations. It is not hard to prove it, but I will only explain the gadgets involved.

A fluid flow in three dimensions can be described by stating the velocity of the fluid flow at each time t and each point \vec{x} \in \mathbb{R}^3 (I’m not specifying the region of the fluid flow or any boundary or initial conditions). Let’s call the velocity \vec{u}(t, \vec{x}).

Let’s assume that the fluid has a constant density \rho. Such a fluid is called incompressible. For convenience let us assume that the density is 1: \rho = 1. Then the kinetic energy E(\vec{u}) of a fluid flow at a fixed time t is given by

\displaystyle{ E(\vec{u}) = \int \| \vec{u}(t, \vec{x}) \|^2 \; d^3 x }

Let’s just assume that this integral is finite for the moment. This is the first gadget we need.

The second gadget is called the enstrophy \epsilon of the fluid flow. This is a measure of how many eddies there are, and how strong they are. It is the integral

\displaystyle{\epsilon = \int \| \nabla \times \vec{u} \|^2 \; d^3 x }

where \nabla \times denotes the curl of the fluid velocity. The faster the fluid rotates, the bigger the curl is.

(The math geeks will notice that the vector fields \vec{u} that have a finite kinetic energy and a finite enstrophy are precisely the elements of the Sobolev space H^1(\mathbb{R}^3).)

Here is the relationship between the decay of the kinetic energy and the enstrophy, which is a consequence of the Navier-Stokes equations (and suitable boundary conditions):

\displaystyle{\frac{d}{d t} E = - \mu \epsilon}

This equation says that the energy decays with time, and it decays faster if there is a higher viscosity, and if there are more and stronger eddies.
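To get a feel for the two integrals involved, here is a small numerical sketch. The velocity field is a 2D toy example of our own choosing (so the curl has only a z-component), and the integrals become sums over a grid:

```python
import numpy as np

# Toy 2D velocity field u = (sin x cos y, -cos x sin y) on [0, 2*pi)^2.
# Its curl has only a z-component: 2 sin x sin y.
n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
ux = np.sin(X) * np.cos(Y)
uy = -np.cos(X) * np.sin(Y)

dA = (2.0 * np.pi / n) ** 2
energy = np.sum(ux**2 + uy**2) * dA                    # the kinetic energy integral
curl_z = np.gradient(uy, x, axis=0) - np.gradient(ux, x, axis=1)
enstrophy = np.sum(curl_z**2) * dA                     # the enstrophy integral
# analytically: energy = 2*pi^2 and enstrophy = 4*pi^2 for this field
```

The finite-difference curl is only approximate, but it reproduces the analytic enstrophy to a fraction of a percent on this grid.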

If you are interested in the mathematically precise derivation of this equation, you can look it up in the book I already mentioned:

• C. Foias, R. Rosa, O. Manley and R. Temam, Navier-Stokes Equations and Turbulence, Cambridge U. Press, Cambridge, 2001.

This connection between eddies and dissipation could indicate that there is also a connection between eddies and some maximum entropy principle. Since eddies maximize dissipation, natural fluid flows should somehow tend towards the production of eddies. It would be interesting to know more about this!

In this post we have seen eddies at different length scales. There are buzzwords in meteorology for this:

You have seen eddies at the ‘microscale’ (left) and at the ‘mesoscale’ (middle). A blog post about eddies should of course mention the most famous eddy of the last decade, which formed at the ‘synoptic scale’:

Do you recognize it? That was Hurricane Katrina.

It is obviously important to understand disasters like this one on the synoptic scale. This is an active topic of ongoing research, both in meteorology and in climate science.


This Week’s Finds (Week 318)

21 August, 2011

The Earth’s climate is complicated. How well do we understand how it will react to the drastic increase in atmospheric carbon dioxide that we’re imposing on it? One reason it’s hard to be 100% sure is that the truly dramatic changes most scientists expect lie mostly in the future. There’s a lot of important evidence we’ll get only when it’s too late.

Luckily, the Earth’s past also shows signs of dramatic climate change: for example, the glacial cycles I began discussing last time in "week317". These cycles make an interesting test of how well we understand climate change. Of course, their mechanism is very different from that of human-caused global warming, so we might understand one but not the other. Indeed, earlier episodes like the Paleocene-Eocene Thermal Maximum might shed more light on what we’re doing to the Earth now! But still, the glacial cycles are an impressive instance of dramatic climate change, which we’d do well to understand.

As I hinted last week, a lot of scientists believe that the Earth’s glacial cycles are related to cyclic changes in the Earth’s orbit and the tilt of its axis. Since one of the first scientists to carefully study this issue was Milutin Milankovitch, these are called Milankovitch cycles. The three major types of Milankovitch cycle are:

• changes in the eccentricity of the Earth’s orbit – that is, how much the orbit deviates from being a circle:




(changes greatly exaggerated)

• changes in the obliquity, or tilt of the Earth’s axis:



• precession, meaning changes in the direction of the Earth’s axis relative to the fixed stars:



Now, the first important thing to realize is this: it’s not obvious that Milankovitch cycles can cause glacial cycles. During a glacial period, the Earth is about 5°C cooler than it is now. But the Milankovitch cycles barely affect the overall annual amount of solar radiation hitting the Earth!

This fact is clear for precession or changes in obliquity, since these just involve the tilt of the Earth’s axis, and the Earth is nearly a sphere. The amount of Sun hitting a sphere doesn’t depend on how the sphere is ’tilted’.

For changes in the eccentricity of the Earth’s orbit, this fact is a bit less obvious. After all, when the orbit is more eccentric, the Earth gets closer to the Sun sometimes, but farther at other times. So you need to actually sit down and do some math to figure out the net effect. Luckily, Greg Egan did this for us—I’ll show you his calculation at the end of this article. It turns out that when the Earth’s orbit is at its most eccentric, it gets very, very slightly more energy from the Sun each year: 0.167% more than when its orbit is at its least eccentric.

So, there are interesting puzzles involved in the Milankovitch cycles. They don’t affect the total amount of radiation that hits the Earth each year—not much, anyway—but they do cause substantial changes in the amount of radiation that hits the Earth at various different latitudes in various different seasons. We need to understand what such changes might do.

James Croll was one of the first to think about this, back around 1875. He decided that what really matters is the amount of sunlight hitting the far northern latitudes in winter. When this was low, he claimed, glaciers would tend to form and an ice age would start. But later, in the 1920s, Milankovitch made the opposite claim: what really matters is the amount of sunlight hitting the far northern latitudes in summer. When this was low, an ice age would start.

If we take a quick look at the data, we see that the truth is not obvious:


I like this graph because it’s pretty… but I wish the vertical axes were labelled. We will see some more precise graphs in future weeks.

Nonetheless, this graph gives some idea of what’s going on. Precession, obliquity and eccentricity vary in complex but still predictable ways. From this you can compute the amount of solar energy that hits the surface of the Earth’s atmosphere on July 1st at a latitude of 65° N. That’s the yellow curve. People believe this quantity has some relation to the Earth’s temperature, as shown by the black curve at bottom. However, the relation is far from clear!

Indeed, if you only look at this graph, you might easily decide that Milankovitch cycles are not important in causing glacial cycles. But people have analyzed temperature proxies over long spans of time, and found evidence for cyclic changes at periods that match those of the Milankovitch cycles. Here’s a classic paper on this subject:

• J. D. Hays, J. Imbrie, and N. J. Shackleton, Variations in the earth’s orbit: pacemaker of the Ice Ages, Science 194 (1976), 1121-1132.

They selected two sediment cores from the Indian ocean, which contain sediments deposited over the last 450,000 years. They measured:

1) Ts, an estimate of summer sea-surface temperatures at the core site, derived from a statistical analysis of tiny organisms called radiolarians found in the sediments.

2) δ18O, the excess of the heavy isotope of oxygen in tiny organisms called foraminifera also found in the sediments.

3) The percentage of radiolarians that are Cycladophora davisiana—a certain species not used in the estimation of Ts.

Identical samples were analyzed for the three variables at 10-centimeter intervals throughout each core. Then they took a Fourier transform of this data to see at which frequencies these variables wiggle the most! When we take the Fourier transform of a function and then square it, the result is called the power spectrum. So, they actually graphed the power spectra for these three variables:

The top graph shows the power spectra for Ts, δ18O, and the percentage of Cycladophora davisiana. The second one shows the spectra after a bit of extra messing around. Either way, there seem to be peaks at periods of 19, 23, 42 and roughly 100 thousand years. However the last number is quite fuzzy: if you look, you’ll see the three different power spectra have peaks at periods of 94, 106 and 122 thousand years.
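If you’d like to play with this technique yourself, here is a toy version in Python (entirely synthetic data of our own making, not the actual core measurements):

```python
import numpy as np

# Synthetic 'sediment record': two cycles, 23 kyr and 41 kyr, plus noise,
# sampled every 3 kyr over 450 kyr.
rng = np.random.default_rng(0)
t = np.arange(0.0, 450.0, 3.0)                    # time in kyr
signal = (np.sin(2 * np.pi * t / 23.0)
          + 0.5 * np.sin(2 * np.pi * t / 41.0)
          + 0.2 * rng.standard_normal(t.size))

power = np.abs(np.fft.rfft(signal)) ** 2          # the power spectrum
freqs = np.fft.rfftfreq(t.size, d=3.0)            # frequencies in cycles per kyr
peak_period = 1.0 / freqs[np.argmax(power[1:]) + 1]
# the strongest peak sits near the 23 kyr cycle we put in
```

Even with noise added, the dominant period is recovered from the location of the tallest spectral peak, which is essentially what Hays, Imbrie and Shackleton did with their core data.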

So, some sort of cycles seem to be occurring. This is far from the only piece of evidence, but it’s a famous one.

Now let’s go over the three major forms of Milankovitch cycle, and keep our eye out for cycles that take place every 19, 23, 42 or roughly 100 thousand years!

Eccentricity

The Earth’s orbit is an ellipse, and the eccentricity of this ellipse says how far it is from being circular. But the eccentricity of the Earth’s orbit slowly changes: it varies from being nearly circular, with an eccentricity of 0.005, to being more strongly elliptical, with an eccentricity of 0.058. The mean eccentricity is 0.028. There are several periodic components to these variations. The strongest occurs with a period of 413,000 years, and changes the eccentricity by ±0.012. Two other components have periods of 95,000 and 123,000 years.

The eccentricity affects the percentage difference in incoming solar radiation between the perihelion, the point where the Earth is closest to the Sun, and the aphelion, when it is farthest from the Sun. This works as follows. The percentage difference between the Earth’s distance from the Sun at perihelion and aphelion is twice the eccentricity, and the percentage change in incoming solar radiation is about twice that. The first fact follows from the definition of eccentricity, while the second follows from differentiating the inverse-square relationship between brightness and distance.

Right now the eccentricity is 0.0167, or 1.67%. Thus, the distance from the Earth to Sun varies 3.34% over the course of a year. This in turn gives an annual variation in incoming solar radiation of about 6.68%. Note that this is not the cause of the seasons: those arise due to the Earth’s tilt, and occur at different times in the northern and southern hemispheres.
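The arithmetic in the last two paragraphs is easy to check; this snippet also compares the linearized estimate with the exact inverse-square ratio between perihelion and aphelion:

```python
e = 0.0167                                  # Earth's current orbital eccentricity
dist_swing = 2.0 * e                        # perihelion-aphelion distance swing: 3.34%
flux_swing_linear = 4.0 * e                 # linearized flux swing: 6.68%
flux_swing_exact = ((1.0 + e) / (1.0 - e)) ** 2 - 1.0
# the exact inverse-square ratio gives about 6.9%, so the linear
# approximation is already quite good at this eccentricity
```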

Obliquity

The angle of the Earth’s axial tilt with respect to the plane of its orbit, called the obliquity, varies between 22.1° and 24.5° in a roughly periodic way, with a period of 41,000 years. When the obliquity is high, the strength of seasonal variations is stronger.

Right now the obliquity is 23.44°, roughly halfway between its extreme values. It is decreasing, and will reach its minimum value around the year 10,000 CE.

Precession

The slow turning in the direction of the Earth’s axis of rotation relative to the fixed stars, called precession, has a period of roughly 23,000 years. As precession occurs, the seasons drift in and out of phase with the perihelion and aphelion of the Earth’s orbit.

Right now the perihelion occurs during the southern hemisphere’s summer, while the aphelion is reached during the southern winter. This tends to make the southern hemisphere seasons more extreme than the northern hemisphere seasons.

The gradual precession of the Earth is not due to the same physical mechanism as the wobbling of a top. That sort of wobbling does occur, but it has a period of only 427 days. The 23,000-year precession is due to tidal interactions between the Earth, Sun and Moon. For details, see:

• John Baez, The wobbling of the Earth and other curiosities.

In the real world, most things get more complicated the more carefully you look at them. For example, precession actually has several periodic components. According to André Berger, a top expert on changes in the Earth’s orbit, the four biggest components have these periods:

• 23,700 years

• 22,400 years

• 18,980 years

• 19,160 years

in order of decreasing strength. But in geology, these tend to show up either as a single peak around the mean value of 21,000 years, or two peaks at periods of 23,000 and 19,000 years.

To add to the fun, the three effects I’ve listed—changes in eccentricity, changes in obliquity, and precession—are not independent. According to Berger, cycles in eccentricity arise from ‘beats’ between different precession cycles:

• The 95,000-year eccentricity cycle arises from a beat between the 23,700-year and 19,000-year precession cycles.

• The 123,000-year eccentricity cycle arises from a beat between the 22,400-year and 18,980-year precession cycles.
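A ‘beat’ between two cycles with periods P₁ > P₂ has period 1/(1/P₂ − 1/P₁). A quick check (our own snippet) shows that the precession periods listed above really do combine to give roughly the eccentricity periods:

```python
def beat_period(p1, p2):
    """Period of the beat between two cycles with periods p1 > p2 > 0."""
    return 1.0 / (1.0 / p2 - 1.0 / p1)

b95 = beat_period(23700.0, 19000.0)    # roughly 95,800 years
b123 = beat_period(22400.0, 18980.0)   # roughly 124,300 years
```

Both come out within a few percent of the 95,000-year and 123,000-year eccentricity cycles.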

We should delve into all this stuff more deeply someday. For now, let me just refer you to this classic review paper:

• André Berger, Pleistocene climatic variability at astronomical frequencies, Quaternary International 2 (1989), 1-14.

Later, as I get up to speed, I’ll talk about more modern work.

Paleontology versus astronomy

So now we can compare the data from ocean sediments to the Milankovitch cycles as computed in astronomy:

• The roughly 19,000-year cycle in ocean sediments may come from the 18,980-year and 19,160-year precession cycles.

• The roughly 23,000-year cycle in ocean sediments may come from the 23,700-year precession cycle.

• The roughly 42,000-year cycle in ocean sediments may come from the 41,000-year obliquity cycle.

• The roughly 100,000-year cycle in ocean sediments may come from the 95,000-year and 123,000-year eccentricity cycles.

Again, the last one looks the most fuzzy. As we saw, different kinds of sediments seem to indicate cycles of 94, 106 and 122 thousand years. At least two of these periods match eccentricity cycles fairly well. But a detailed analysis would be required to distinguish between real effects and coincidences in this subject!

The effect of eccentricity

I bet some of you are hungry for some actual math. As I mentioned, it takes some work to see how changes in the eccentricity of the Earth’s orbit affect the annual average of sunlight hitting the top of the Earth’s atmosphere. Luckily Greg Egan has done this work for us. While the result is surely not new, his approach makes nice use of the fact that both gravity and solar radiation obey an inverse-square law. That’s pretty cool.

Here is his calculation:

The angular velocity of a planet is

\displaystyle{\frac{d \theta}{d t} = \frac{J}{m r^2} }

where J is the constant orbital angular momentum of the planet and m is its mass. Thus the radiant energy delivered per unit time to the planet is

\displaystyle{ \frac{d U}{d t} = \frac{C}{r^2}}

for some constant C. It follows that the energy delivered per unit of angular progress around the orbit is

\displaystyle{ \frac{d U}{d \theta} = \frac{C}{r^2} \frac{d t}{d \theta} = \frac{C m}{J} }

So, the total energy delivered in one period will be

\displaystyle{ U=\frac{2\pi C m}{J} }

How can we relate the orbital angular momentum J to the shape of the orbit? Equate the total energy of the planet, kinetic \frac{1}{2}m v^2 plus potential -\frac{G M m}{r}, at its aphelion r_1 and perihelion r_2, using v=\frac{J}{m r} to express the velocity at those two points in terms of the distance. Solving for J, we get:

\displaystyle{J = m \sqrt{\frac{2 G M r_1 r_2}{r_1+r_2}} = m b \sqrt{\frac{G M}{a}}}

where

\displaystyle{ a=\frac{1}{2} (r_1+r_2)}

is the semi-major axis of the orbit and

\displaystyle{ b=\sqrt{r_1 r_2} }

is the semi-minor axis. But we can also relate J to the period of the orbit, T, by integrating the rate at which orbital area is swept out by the planet:

\displaystyle{\frac{1}{2}  r^2 \frac{d \theta}{d t} = \frac{J}{2 m} }

over one orbit. Since the area of an ellipse is \pi a b, this gives us:

\displaystyle{ J = \frac{2 \pi a b m}{T} }

Equating these two expressions for J shows that the period is:

\displaystyle{ T = 2 \pi \sqrt{\frac{a^3}{G M}}}

So the period depends only on the semi-major axis; for a fixed value of a, it’s independent of the eccentricity.
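As a sanity check on this formula, we can plug in standard values for the Sun’s gravitational parameter and the Earth’s semi-major axis (our own numbers, not from the text):

```python
import math

GM_sun = 1.32712e20     # m^3/s^2, standard gravitational parameter of the Sun
a = 1.49598e11          # m, semi-major axis of the Earth's orbit
T = 2.0 * math.pi * math.sqrt(a**3 / GM_sun)
days = T / 86400.0      # about 365 days, whatever the eccentricity
```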

As the eccentricity of the Earth’s orbit changes, the orbital period T, and hence the semi-major axis a, remain almost constant. So, we have:

\displaystyle{ U=\frac{2\pi C m}{J} = \frac{2\pi C}{b} \sqrt{\frac{a}{G M}} }

Expressing the semi-minor axis in terms of the semi-major axis and the eccentricity, b^2 = a^2 (1-e^2), we get:

\displaystyle{U=\frac{2\pi C}{\sqrt{G M a (1-e^2)}}}

So to second order in e, we have:

\displaystyle{U = \frac{\pi C}{\sqrt{G M a}} (2+e^2) }

The expressions simplify if we consider the average rate of energy delivery over an orbit, which makes all the grungy constants related to gravitational dynamics go away:

\displaystyle{\frac{U}{T} = \frac{C}{a^2 \sqrt{1-e^2}} }

or to second order in e:

\displaystyle{ \frac{U}{T} = \frac{C}{a^2} (1+\frac{1}{2} e^2) }

We can now work out how much the actual changes in the Earth’s orbit affect the amount of solar radiation it gets! The eccentricity of the Earth’s orbit varies between 0.005 and 0.058. The total energy the Earth gets each year from solar radiation is proportional to

\displaystyle{ \frac{1}{\sqrt{1-e^2}} }

where e is the eccentricity. When the eccentricity is at its lowest value, e = 0.005, we get

\displaystyle{ \frac{1}{\sqrt{1-e^2}} = 1.0000125 }

When the eccentricity is at its highest value, e = 0.058, we get

\displaystyle{\frac{1}{\sqrt{1-e^2}} = 1.00168626 }

So, the change is about

\displaystyle{1.00168626/1.0000125 = 1.00167373 }

In other words, a change of merely 0.167%.

That’s very small! And the effect on the Earth’s temperature would naively be even less!

Naively, we can treat the Earth as a greybody: an ideal object whose tendency to absorb or emit radiation is the same at all wavelengths and temperatures. Since the temperature of a greybody is proportional to the fourth root of the power it receives, a 0.167% change in solar energy received per year corresponds to a percentage change in temperature roughly one fourth as big. That’s a 0.042% change in temperature. If we imagine starting with an Earth like ours, with an average temperature of roughly 290 kelvin, that’s a change of just 0.12 kelvin!
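Putting the numbers together, as a direct transcription of the estimate above:

```python
def annual_flux_factor(e):
    """Relative annual solar energy received, as a function of eccentricity e."""
    return 1.0 / (1.0 - e**2) ** 0.5

change = annual_flux_factor(0.058) / annual_flux_factor(0.005) - 1.0
delta_T = 290.0 * change / 4.0   # grey-body: temperature goes as the 1/4 power of flux
# change is about 0.00167 (0.167%), and delta_T about 0.12 kelvin
```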

The upshot seems to be this: in a naive model without any amplifying effects, changes in the eccentricity of the Earth’s orbit would cause temperature changes of just 0.12 °C!

This is much less than the roughly 5 °C change we see between glacial and interglacial periods. So, if changes in eccentricity are important in glacial cycles, we have some explaining to do. Possible explanations include season-dependent phenomena and climate feedback effects. Probably both are very important!

Next time I’ll start talking about some theories of how Milankovitch cycles might cause the glacial cycles. I thank Frederik De Roo, Martin Gisser and Cameron Smith for suggesting improvements to this issue before its release, over on the Azimuth Forum. Please join us over there.


Little did I suspect, at the time I made this resolution, that it would become a path so entangled that fully twenty years would elapse before I could get out of it. – James Croll, on his decision to study the cause of the glacial cycles


A Quantum of Warmth

2 July, 2011

guest post by Tim van Beek

The Case of the Missing 33 Kelvin, Continued

Last time, when we talked about putting the Earth in a box, we saw that a simple back-of-the-envelope calculation of the energy balance and resulting black body temperature of the earth comes surprisingly close to the right answer.

But there was a gap: the black body temperature calculated with a zero-dimensional energy balance model is about 33 kelvin lower than the estimated average surface temperature on Earth.

In other words, this simplified model predicts an Earth that’s 33 °C colder than it really is!

In such a situation, as theoretical physicists, we start by taking a bow, patting ourselves on the back, and congratulating ourselves on a successful first approximation.

Then we look for the next most important effect that we need to include in our model.

This effect needs to:

1) have a steady and continuous influence over thousands of years,

2) have a global impact,

3) be rather strong, because heating the planet Earth by an average of 33 kelvin takes a lot of power.

The simplest explanation would of course be that there is something fundamentally wrong with our back-of-the-envelope calculation.

One possibility, as Itai Bar-Natan mentioned last time, is geothermal energy. It certainly matches point 1, maybe matches point 2, but it is hard to guess if it matches point 3. As John pointed out, we can check the Earth’s energy budget on Wikipedia. This suggests that the geothermal heating is very small. Should we trust Wikipedia? I don’t know. We should check it out!

But I will not do that today. Instead I would like to talk about the most prominent explanation:

Most of you will of course have heard about the effect that climate scientists talk about, which is often—but confusingly—called the ‘greenhouse effect’, or ‘back radiation’. However, the term that is most accurate is downward longwave radiation (DLR), so I would like to use that instead.

In order to assess if this is a viable explanation of the missing 33 kelvin, we will first have to understand the effect better. So this is what I will talk about today.

In order to get a better understanding, we will have to peek into our simple model’s box and figure out what is going on in there in more detail.

Peeking into the Box: Surface and Atmosphere

To get a better approximation, instead of treating the whole earth as a black body, we will have to split up the system into the Earth itself, and its atmosphere. For the surface of the Earth it is still a good approximation to say that it is a black body.

The atmosphere is more complicated. As a next approximation, I would like to pretend that the atmosphere is a body of its own, hovering above the surface of the Earth as a separate system. So we will ignore that there are several different layers in the atmosphere doing different things, including interactions with the surface. Well, we are not going to ignore the interaction with the surface completely, as you will see.

Since one can quickly get lost in details when discussing the atmosphere, I’m going to cheat and look up the overall average effects in an introductory meteorology textbook:

• C. Donald Ahrens: Meteorology Today, 9th edition, Brooks/Cole, Florence, Kentucky, 2009.

Here is what atmosphere and Earth’s surface do to the incoming radiation from the Sun (from page 48):

Of 100 units of inbound solar energy flux, 30 are reflected or scattered back to space without a contribution to the energy balance of the Earth. This corresponds to an overall average albedo of 0.3 for the Earth.

The next graphic shows the most important processes of heat and mass transport caused by the remaining 70 units of energy flux, with their overall average effect (from page 49):

Maybe you have some questions about this graphic; I certainly do.

Conduction and Convection?

Introductory classes for partial differential equations sometimes start with the one dimensional heat equation. This equation describes the temperature distribution of a rod of metal that is heated on one end and kept cool on the other. The kind of heat transfer occurring here is called conduction. The atoms or molecules stay where they are and transfer energy by interacting with their neighbors.

However, heat transfer by conduction is negligible for gases like the atmosphere. Why is it there in the graphic? The answer may be that conduction is still important for boundary layers. Or maybe the author wanted to include it to avoid the question “why is conduction not in the graphic?” I don’t know. But I’ll trust that the number associated with the “convection and conduction” part is correct, for now.
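For readers who haven’t met the heat equation before, here is a minimal finite-difference sketch of the 1D conduction problem just described: a rod held at 100° on one end and 0° on the other. The diffusivity, grid size and step count are arbitrary illustrative choices, not values from any real material:

```python
# Explicit finite-difference sketch of the 1D heat equation
#   dT/dt = alpha * d^2 T / dx^2
# for a rod heated at one end and kept cool at the other.
alpha = 1e-4               # thermal diffusivity (arbitrary units)
n, L = 21, 1.0             # grid points, rod length
dx = L / (n - 1)
dt = 0.4 * dx**2 / alpha   # stable: dt <= 0.5 * dx^2 / alpha

T = [0.0] * n
T[0] = 100.0               # hot end (held fixed); cold end stays at 0

for _ in range(5000):
    new = T[:]
    for i in range(1, n - 1):
        new[i] = T[i] + alpha * dt / dx**2 * (T[i+1] - 2*T[i] + T[i-1])
    T = new

# After many steps the profile approaches the linear steady state,
# so the midpoint temperature approaches 50.
print(T[n // 2])
```

The point of the sketch is only to show the mechanism: each grid point exchanges energy with its neighbors, which is exactly the “atoms stay put and interact with their neighbors” picture of conduction.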

What is Latent Heat?

There is a label “latent heat” on the left part of the atmosphere: latent heat is energy input that does not result in a temperature increase, or energy output that does not result in a temperature decrease. This can happen when a component of the system changes phase. For example, when liquid water at 0°C freezes, it turns into ice at 0°C while losing energy to its environment. But the temperature of the whole system stays at 0°C.

The human body uses this effect, too, when it cools itself by sweating. This cooling works as long as liquid water turns into water vapor, withdrawing energy from the skin in the process.

The picture above shows a forest with water in three phases: vapor (invisible), liquid droplets (dispersed in the air) and snow. As the Sun sets, some of the water vapor will eventually condense, and liquid water will turn into ice, releasing energy to the environment. During these phase changes there will be energy loss without a temperature decrease of the water.

Downward Longwave Radiation

When there is a lot of light there are also dark shadows. — main character in Johann Wolfgang von Goethe’s Götz von Berlichingen

Last time we pretended that the Earth as a whole behaves like a black body.

Now that we split up the Earth into surface and atmosphere, you may notice that:

a) a lot of sunlight passes through the atmosphere and reaches the surface, and

b) there is a lot of energy flowing downwards from the atmosphere to the surface in form of infrared radiation. This is called downward longwave radiation.

Observation a) shows that the atmosphere does not act like a black body at all. Instead, it has a nonzero transmittance, which means that not all incoming radiation is absorbed.

Observation b) shows that assuming that the black body temperature of the Earth is equal to the average surface temperature could go wrong, because—from the viewpoint of the surface—there is an additional inbound energy flux from the atmosphere.

The reason for both observations is that the atmosphere consists of various gases, like O2, N2, H2O (water vapor) and CO2. Any gas molecule can absorb and emit radiation only at certain frequencies, which are called its emission spectrum. This fact led to the development of quantum mechanics, which can be used to calculate the emission spectrum of any molecule.

Molecules and Degrees of Freedom

When a photon hits a molecule, the molecule can absorb the photon and gain energy in three main ways:

• One of its electrons can climb to a higher energy level.

• The molecule can vibrate more strongly.

• The molecule can rotate more rapidly.

To get a first impression of the energy levels involved in these three processes, let’s have a look at this graphic:

This is taken from the book

* Sune Svanberg, _Atomic and Molecular Spectroscopy: Basic Aspects and Practical Applications_, 4th edition, Advanced Texts in Physics, Springer, Berlin, 2004.

The y-axis shows the energy difference in ‘eV’, or ‘electron volts’. An electron volt is the amount of energy an electron gains or loses when it moves through an electric potential difference of one volt.

According to quantum mechanics, a molecule can emit and absorb only photons whose energy matches a difference between the discrete energy levels in the graphic, for any one of the three processes.

It is possible to use the characteristic absorption and emission properties of molecules of different chemical species to analyze the chemical composition of an unknown probe of gases (and other materials, too). These methods are usually called names involving the word ‘spectroscopy’. For example, infrared spectroscopy involves methods that examine what happens to infrared radiation when you send it to your probe.

By the way, Wikipedia has a funny animated picture of the different vibrational modes of a molecule on the page about infrared spectroscopy.

But why does so much of radiation from the Sun pass through the atmosphere, while a lot of infrared radiation emitted by the Earth instead bounces back to the surface? The answer to this puzzle involves a specific property of certain components of the atmosphere.

Can You See an Electron Hopping?

Here is a nice overview of the spectrum of electromagnetic radiation:

The energy E and the wavelength \lambda of a photon have a very simple relationship:

\displaystyle{ E = \frac{c \; h}{\lambda}}

where h is Planck’s constant and c is the speed of light. In short, photons with longer wavelengths have less energy.

Planck’s constant is

h \approx 4 \times 10^{-15} \; eV \times s

while the speed of light is

c \approx 3 \times 10^{8} \; m/s

Plugging these into the formula we get that a photon with an energy of one electron volt has a wavelength of about 1.2 micrometers, which is just outside the visible range, a bit towards the infrared direction. The visible range corresponds to 1.6 to 3.4 electron volts. If you want, you can scroll up to the graphic with the energy levels and calculate which processes will result in which kind of radiation.
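The energy-to-wavelength conversion is easy to check with the approximate values of h and c given above:

```python
# Photon wavelength from energy: lambda = h*c / E, with h in eV*s,
# using the approximate constants quoted in the text.
h = 4.14e-15      # Planck's constant in eV*s
c = 3.0e8         # speed of light in m/s

def wavelength_m(energy_eV):
    return h * c / energy_eV

print(wavelength_m(1.0) * 1e6)   # about 1.24 micrometers: near infrared
print(wavelength_m(1.6) * 1e6)   # about 0.78 micrometers: red end of visible
print(wavelength_m(3.4) * 1e6)   # about 0.37 micrometers: violet end
```

Note the inverse relationship: higher-energy photons (electronic transitions) sit at the violet and ultraviolet end, while lower-energy photons (vibrations and rotations) sit in the infrared.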

Electrons that take a step down the orbital ladder in an atom emit a photon. Depending on the atom and the kind of transition, some of those photons will be in the visible range, and some will be in the ultraviolet.

There is no Infrared from the Sun (?)

From the Planck distribution, we can determine that the Sun and Earth, which are approximately black bodies, emit radiation mostly at very different wavelengths:

This graphic is sometimes called ‘twin peak graph’.

Oversimplifying, we could say: The Earth emits infrared radiation; the Sun emits almost no infrared. So, if you find infrared radiation on earth, you can be sure that it did not come from the Sun.

The problem with this statement is that, strictly speaking, the Sun does emit radiation at wavelengths in the infrared range. This is the reason why people have come up with the term near-infrared radiation, defined as wavelengths between 0.85 and 5.0 micrometers. Radiation with longer wavelengths is called far infrared. With these definitions we can say that the Sun radiates in the near-infrared range, and the Earth does not.
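The locations of the two peaks can be estimated with Wien’s displacement law, lambda_peak = b/T, a standard result not derived in this post; the surface temperatures below are rough textbook values, not figures from the text:

```python
# Wien's displacement law: lambda_peak = b / T,
# with the Wien constant b = 2.898e-3 m*K.
b = 2.898e-3

sun_peak = b / 5800.0     # Sun's surface, roughly 5800 K
earth_peak = b / 288.0    # Earth's surface, roughly 288 K

print(sun_peak * 1e6)     # about 0.5 micrometers: visible light
print(earth_peak * 1e6)   # about 10 micrometers: far infrared
```

The factor of twenty between the two peak wavelengths is what separates the ‘twin peaks’ so cleanly, and it is why far-infrared radiation measured at the surface can be attributed to the Earth and its atmosphere rather than the Sun.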

Only certain components of the atmosphere emit and absorb radiation in the infrared part. These are called—somewhat misleadingly—greenhouse gases. I would like to call them ‘infrared-active gases’ instead, but unfortunately the ‘greenhouse gas’ misnomer is very popular. Two prominent ones are H2O and CO2:

The atmospheric window at 8 to 12 μm is quite transparent, which means that this radiation passes from the surface through the atmosphere into space without much ado. Therefore, this window is used by satellites to estimate the surface temperature.

Since most radiation coming from the Earth is infrared, and only some constituents of the atmosphere react to it—excluding the major ones—a small amount of, say, CO2 could have a lot of influence on the energy balance. Like being the only one in a group of hundreds with a boom box. But we should check that more thoroughly.

Can a Cold Body Warm a Warmer Body?

Downward longwave radiation warms the surface, but the atmosphere is colder than the surface, so how can radiation from the colder atmosphere result in a higher surface temperature? Doesn’t that violate the second law of thermodynamics?

The answer is: no, it does not. It turns out that others have already taken pains to explain this on the blogosphere, so I’d like to point you there instead of trying to do a better job here:

• Roy Spencer, Yes, Virginia, cooler objects can make warmer objects even warmer still, 23 July 2010.

• The Science of Doom, The amazing case of “back-radiation”, 27 July 2010.

It’s the Numbers, Stupid!

Shut up and calculate! — Leitmotiv of several prominent physicists after becoming exhausted by philosophical discussions about the interpretation of quantum mechanics.

Maybe by now we have succeeded in convincing the imaginary advisory board of the zero-dimensional energy balance model project that there really is an effect like ‘downward longwave radiation’. It certainly should be there if quantum mechanics is right. But I have not explained yet how big it is. According to the book Meteorology Today, it is big. But maybe the people who contributed to the graphic got fooled somehow, and there really is a different explanation for the case of the missing 33 kelvin.

What do you think?

When we dip our toes into a new topic, it is important to keep simple yet fundamental questions like this in mind, and keep asking them.

In this case we are lucky: it is possible to measure the amount of downward longwave radiation. There are a lot of field studies, and the results have been incorporated in global climate models. But we will have to defer this story to another day.


Is Life Improbable?

31 May, 2011

Mine? Yes. And maybe you’ve wondered just how improbable your life is. But that’s not really the question today…

Here at the Centre for Quantum Technologies, Dagomir Kaszlikowski asked me to give a talk on this paper:

• John Baez, Is life improbable?, Foundations of Physics 19 (1989), 91-95.

This was the second paper I wrote, right after my undergraduate thesis. Nobody ever seemed to care about it, so it’s strange—but nice—to finally be giving a talk on it.

My paper does not try to settle the question its title asks. Rather, it tries to refute the argument here:

• Eugene P. Wigner, The probability of the existence of a self-reproducing unit, Symmetries and Reflections, Indiana University Press, Bloomington, 1967, pp. 200-208.

According to Wigner, his argument

purports to show that, according to standard quantum mechanical theory, the probability is zero for the existence of self-reproducing states, i.e., organisms.

Given how famous Eugene Wigner is (he won a Nobel prize, after all) and how earth-shattering his result would be if true, it’s surprising how little criticism his paper has received. David Bohm mentioned it approvingly in 1969. In 1974 Hubert Yockey cited it saying

for all physics has to offer, life should never have appeared and if it ever did it would soon die out.

As you’d expect, there are some websites mentioning Wigner’s argument as evidence that some supernatural phenomenon is required to keep life going. Wigner himself believed it was impossible to formulate quantum theory in a fully consistent way without referring to consciousness. Since I don’t believe either of these claims, I think it’s good to understand the flaw in Wigner’s argument.

So, let me start by explaining his argument. Very roughly, it purports to show that if there are many more ways a chunk of matter can be ‘dead’ than ‘living’, the chance is zero that we can choose some definition of ‘living’ and a suitable ‘nutrient’ state such that every ‘living’ chunk of matter can interact with this ‘nutrient’ state to produce two ‘living’ chunks.

In making this precise, Wigner considers more than just two chunks of matter: he also allows there to be an ‘environment’. So, he considers a quantum system made of three parts, and described by a Hilbert space

H = H_1 \otimes H_1 \otimes H_2

Here the first H_1 corresponds to a chunk of matter. The second H_1 corresponds to another chunk of matter. The space H_2 corresponds to the ‘environment’. Suppose we wait for a certain amount of time and see what the system does; this will be described by some unitary operator

S: H \to H

Wigner asks: if we pick this operator S in a random way, what’s the probability that there’s some n-dimensional subspace of ‘living organism’ states in H_1, and some ‘nutrient plus environment’ state in H_1 \otimes H_2, such that the time evolution sends any living organism together with the nutrient plus environment to two living organisms and some state of the environment?

A bit more precisely: suppose we pick S in a random way. Then what’s the probability that there exists an n-dimensional subspace

V \subseteq H_1

and a state

w \in H_1 \otimes H_2

such that S maps every vector in V \otimes \langle w \rangle to a vector in V \otimes V \otimes H_2? Here \langle w \rangle means the 1-dimensional subspace spanned by the vector w.

And his answer is: if

\mathrm{dim}(H_1) \gg n

then this probability is zero.
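Wigner’s claim can be illustrated numerically, though of course a simulation is a sketch, not a proof. Fix the tensor decomposition, draw a random unitary, and check whether it maps a ‘living organism’ state together with a ‘nutrient plus environment’ state into the required subspace. The tiny dimensions and the choice of V below are my own illustrative choices:

```python
import numpy as np

# Fixed decomposition H = H1 (x) H1 (x) H2, with dim H1 = dim H2 = 2,
# and a 1-dimensional 'living organism' subspace V = span{e0} in H1.
# For a randomly chosen unitary S, the image of V (x) <w> essentially
# never lies inside V (x) V (x) H2.
rng = np.random.default_rng(0)

d1, d2 = 2, 2
d = d1 * d1 * d2

# Haar-like random unitary via QR decomposition of a complex Gaussian.
Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
S, _ = np.linalg.qr(Z)

v = np.array([1.0, 0.0])                        # organism state spanning V
w = rng.normal(size=d1 * d2) + 1j * rng.normal(size=d1 * d2)
w /= np.linalg.norm(w)                          # nutrient + environment state

psi = np.kron(v, w)                             # state in V (x) <w>
out = S @ psi

# In the standard basis, V (x) V (x) H2 = span{e0 (x) e0 (x) e_k} is
# exactly the first d2 coordinates. Any amplitude beyond them means
# S fails to keep the organism 'alive and reproducing'.
residual = np.linalg.norm(out[d2:])
print(residual > 1e-6)   # almost surely True for a random S
```

Running this with different seeds always gives a sizable residual, matching Wigner’s point: for a decomposition fixed in advance, a random S has no reason to respect the ‘reproduction’ subspace. Baez’s theorems below concern what happens when the decomposition is chosen afterwards.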

You may need to reread the last few paragraphs a couple times to understand Wigner’s question, and his answer. In case you’re still confused, I should say that V \subseteq H_1 is what I’m calling the space of ‘living organism’ states of our chunk of matter, while w \in H_1 \otimes H_2 is the ‘nutrient plus environment’ state.

Now, Wigner did not give a rigorous proof of his claim, nor did he say exactly what he meant by ‘probability’: he didn’t specify a probability measure on the space of unitary operators on H. But if we use the obvious choice (called ‘normalized Haar measure’) his argument can most likely be turned into a proof.

So, I don’t want to argue with his math. I want to argue with his interpretation of the math. He concludes that

the chances are nil for the existence of a set of ‘living’ states for which one can find a nutrient of such nature that interaction always leads to multiplication.

The problem is that he fixed the decomposition of the Hilbert space H as a tensor product

H = H_1 \otimes H_1 \otimes H_2

before choosing the time evolution operator S. There is no good reason to do that. It only makes sense to split up a physical system into parts this way after we have some idea of what the dynamics is. An abstract Hilbert space doesn’t come with a favored decomposition as a tensor product into three parts!

If we let ourselves pick this decomposition after picking the operator S, the story changes completely. My paper shows:

Theorem 1. Let H, H_1 and H_2 be finite-dimensional Hilbert spaces with H \cong H_1 \otimes H_1 \otimes H_2. Suppose S : H \to H is any unitary operator, suppose V is any subspace of H_1, and suppose w is any unit vector in H_1 \otimes H_2. Then there is a unitary isomorphism

U: H \to H_1 \otimes H_1 \otimes H_2

such that if we identify H with H_1 \otimes H_1 \otimes H_2 using U, the operator S maps V \otimes \langle w \rangle into V \otimes V \otimes H_2.

In other words, if we allow ourselves to pick the decomposition after picking S, we can always find a ‘living organism’ subspace of any dimension we like, together with a ‘nutrient plus environment’ state that allows our living organism to reproduce.

However, if you look at the proof in my paper, you’ll see it’s based on a kind of cheap trick (as I forthrightly admit). Namely, I pick the ‘nutrient plus environment’ state to lie in V \otimes H_2, so the nutrient actually consists of another organism!

This goes to show that you have to be very careful about theorems like this. To prove that life is improbable, you need to find some necessary conditions for what counts as life, and show that these are improbable (in some sense, and of course it matters a lot what that sense is). Refuting such an argument does not prove that life is probable: for that you need some sufficient conditions for what counts as life. And either way, if you prove a theorem using a ‘cheap trick’, it probably hasn’t gotten to grips with the real issues.

I also show that as the dimension of H approaches infinity, the probability approaches 1 that we can get reproduction with a 1-dimensional ‘living organism’ subspace and a ‘nutrient plus environment’ state that lies in the orthogonal complement of V \otimes H_2. In other words, the ‘nutrient’ is not just another organism sitting there all ready to go!

More precisely:

Theorem 2. Let H, H_1 and H_2 be finite-dimensional Hilbert spaces with \mathrm{dim}(H) = \mathrm{dim}(H_1)^2 \cdot \mathrm{dim}(H_2). Let \mathbf{S'} be the set of unitary operators S: H \to H with the following property: there’s a unit vector v \in H_1, a unit vector w \in V^\perp \otimes H_2, and a unitary isomorphism

U: H \to H_1 \otimes H_1 \otimes H_2

such that if we identify H with H_1 \otimes H_1 \otimes H_2 using U, the operator S maps v \otimes w into \langle v\rangle \otimes \langle v \rangle \otimes H_2. Then the normalized Haar measure of \mathbf{S'} approaches 1 as \mathrm{dim}(H) \to \infty.

Here V^\perp is the orthogonal complement of V \subseteq H_1; that is, the space of all vectors perpendicular to V.

I won’t include the proofs of these theorems, since you can see them in my paper.

Just to be clear: I certainly don’t think these theorems prove that life is probable! You can’t have theorems without definitions, and I think that coming up with a good general definition of ‘life’, or even supposedly simpler concepts like ‘entity’ and ‘reproduction’, is extremely tough. The formalism discussed here is oversimplified for dozens of reasons, a few of which are listed at the end of my paper. So far we’re only in the first fumbling stages of addressing some very hard questions.

All my theorems do is point out that Wigner’s argument has a major flaw: he’s choosing a way to divide the world into chunks of matter and the environment before choosing his laws of physics. This doesn’t make much sense, and reversing the order dramatically changes the conclusions.

By the way: I just started looking for post-1989 discussions of Wigner’s paper. So far I haven’t found any interesting ones. Here’s a more recent paper that’s somewhat related, which doesn’t mention Wigner’s work:

• Indranil Chakrabarty and Prashant, Non existence of quantum mechanical self replicating machine, 2005.

The considerations here seem more closely related to the Wootters–Zurek no-cloning theorem.

