Symmetry and The Fourth Dimension (Part 3)

22 July, 2012

So, where were we? I got a bit distracted by other things. So let me review, but also push ahead a bit further. This time we’ll see what a ‘Coxeter diagram’ is. Later we’ll use Coxeter diagrams as ways to describe lots of wonderful shapes.

Platonic solids and their Coxeter complexes

In Part 1 we started by looking at the five Platonic solids:

We saw that each Platonic solid has a bunch of symmetries where we reflect it across planes called mirrors, which all intersect at the center of the solid. If we take a sphere and slice it with these mirrors, it gets chopped up into triangles, and we get a pattern called a Coxeter complex.

Let’s pick a random Platonic solid and see how it works. For example, the dodecahedron:

gives this Coxeter complex:

In simple terms: we puff up our Platonic solid into a nice round sphere, and then chop its faces into triangles drawn on this sphere. That’s how it always works. And you’ll notice that each triangle has:

• one corner at the center of a face of our Platonic solid,

• one corner at the center of an edge of our Platonic solid,

• one corner at a vertex of our Platonic solid.

So any triangle determines a vertex, edge and face of our Platonic solid! But we don’t get just any old vertex, edge and face this way. The vertex has to lie on the edge, and the edge has to lie on the face.

That’s how it always works—check and see! So, let’s make up a definition:

Definition. Given a Platonic solid, a flag is a vertex, edge and face where the vertex lies on the edge and the edge lies on the face.

In fact, a Platonic solid always gives a Coxeter complex with one triangle for each flag.

Coxeter groups

Next, in Part 2, we looked at operations that flip these triangles around. I showed you how it works for the cube. I was too lazy to puff it up into a sphere, but I chopped its surface into triangles, one for each flag:

This is the lazy man’s way to draw the Coxeter complex of the cube.

Then I studied operations that flip triangles over to triangles that touch them. One operation, called V, changes which vertex of the cube our triangle contains. For example, if we start with this black triangle:

the operation V changes it to this blue one:

Another operation, called E, changes which edge of the cube our triangle touches:

And a third, called F, changes which face of the cube our triangle lies on:

Using these operations, we can get to any triangle starting from any other. But the cool part is that these operations obey some equations! For any Platonic solid, they always obey these equations:

V^2 = E^2 = F^2 = 1

These say flipping twice the same way gets you back where you started. But there are also three equations that are more interesting. Some of these depend on which Platonic solid we’re looking at. They can be seen using pretty pictures:

• For the cube we get this relation:

(VE)^4 = 1

from this picture:

Get it? We flip our triangle to change which vertex of the cube it contains, then flip it to change which edge of the cube it touches… and after 8 flips we’re back to where we started, so:

VEVEVEVE = 1

or for short:

(VE)^4 = 1

We get a 4 in the exponent here because each face of the cube has 4 edges and 4 vertices. So, you can easily work out how this relation goes for any Platonic solid.

• For the cube we also get this relation:

(VF)^2 = 1

from this picture:

We get a 2 in the exponent here because each edge of the cube contains 2 vertices and touches 2 faces. This is true no matter what Platonic solid we have!

• And finally, for the cube we get this relation:

(EF)^3 = 1

from this picture:

We get a 3 in the exponent here because each vertex of the cube touches 3 edges and 3 faces. So, you can easily work out how this relation goes for any Platonic solid, too.

The operations V, E, F, together with these relations, generate something called the Coxeter group of the Platonic solid. To know exactly what I mean here you need to know a wee bit of group theory, which is the study of symmetry. But even if you don’t, I hope you get the idea: the Coxeter group consists of all the ways we can flip one triangle in our Coxeter complex over to another by a sequence of V, E, and F flips.
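If you'd like to see this concretely, the flags of the cube and the flips V, E and F are easy to play with on a computer. Here's a minimal sketch in Python (the combinatorial encoding of vertices, edges and faces is our own choice, not anything standard): it builds all 48 flags of the cube and checks the relations above.

```python
from itertools import product

# Vertices: 3-bit tuples. Edges: frozensets of 2 adjacent vertices.
# Faces: frozensets of the 4 vertices sharing a fixed coordinate value.
vertices = list(product((0, 1), repeat=3))
edges = [frozenset((v, w)) for i, v in enumerate(vertices)
         for w in vertices[i + 1:]
         if sum(a != b for a, b in zip(v, w)) == 1]
faces = [frozenset(v for v in vertices if v[axis] == val)
         for axis in range(3) for val in (0, 1)]

# A flag is an incident (vertex, edge, face) triple:
flags = [(v, e, f) for v in vertices for e in edges for f in faces
         if v in e and e <= f]
assert len(flags) == 48          # one triangle of the Coxeter complex per flag

def V(flag):                     # change the vertex, keep the edge and face
    v, e, f = flag
    (w,) = e - {v}
    return (w, e, f)

def E(flag):                     # change the edge, keep the vertex and face
    v, e, f = flag
    (e2,) = [d for d in edges if v in d and d <= f and d != e]
    return (v, e2, f)

def F(flag):                     # change the face, keep the vertex and edge
    v, e, f = flag
    (f2,) = [g for g in faces if e <= g and g != f]
    return (v, e, f2)

def power(ops, flag, n):         # apply the sequence of flips `ops`, n times
    for _ in range(n):
        for op in ops:
            flag = op(flag)
    return flag

start = flags[0]
assert V(V(start)) == start and E(E(start)) == start and F(F(start)) == start
assert power([V, E], start, 4) == start    # (VE)^4 = 1
assert power([V, F], start, 2) == start    # (VF)^2 = 1
assert power([E, F], start, 3) == start    # (EF)^3 = 1
```

The same script works for any Platonic solid once you change the incidence data for its vertices, edges and faces.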

Coxeter diagrams

Coxeter diagrams start out being a cute way to record the equations we just saw. Later, they’ll become an amazingly powerful system for creating and classifying highly symmetrical structures: Platonic solids in all dimensions, the solids we get by truncating these in various ways, and more.

Today I’ll draw these diagrams a bit differently than the ways you usually see in books. That’s okay: there are lots of different ways to draw them, depending on what you’re using them for.

In my current way of doing things, the Coxeter diagram of the cube looks like this:

V—4—E—3—F

There’s an edge from the letter V to the letter E, labelled by the number 4. This means

(VE)^4 = 1

And there’s an edge from the letter E to the letter F, labelled by the number 3. This means

(EF)^3 = 1

But there’s no edge from the V to the F. This means

(VF)^2 = 1

In other words, we make up an extra rule: if an edge would be labelled by the number 2, we leave it invisible. The reason is that relations of this sort are ‘boring’: they show up a lot. For example, we’ve seen that (VF)^2 = 1 for all Platonic solids.

We also don’t bother to record these relations:

V^2 = E^2 = F^2 = 1

because these too are ‘boring’. We’ll assume relations of this sort always hold.

Now, I’ll leave you with a puzzle:

Puzzle 1. What are the Coxeter diagrams of the five Platonic solids? What patterns do you see in these?

Coxeter diagrams of regular tilings

But so you don’t think I’m wimping out, I’ll do an example myself. The triangular tiling of the plane is not a Platonic solid:

But it’s what you’d get if you tried to build a Platonic solid where 6 equilateral triangles meet at each vertex! And we can still define a Coxeter complex for it. It looks like this:

There’s one triangle here for each flag in the triangular tiling. And if you stare at this picture, you can read off the equations for the Coxeter group, just as we did for the cube. We have the boring equations

V^2 = E^2 = F^2 = 1

but also:

(VE)^3 = 1

since each face of the triangular tiling has 3 vertices and 3 edges, and:

(VF)^2 = 1

since each edge of this tiling contains 2 vertices and touches 2 faces, and:

(EF)^6 = 1

since each vertex of this tiling touches 6 edges and 6 faces.

So, here’s the Coxeter diagram of the triangular tiling:

V—3—E—6—F

Whoops, now I can’t resist giving you another puzzle! There are two more regular tilings of the plane, so you should try those:

Puzzle 2. What are the Coxeter diagrams of the square tiling:

and the hexagonal tiling:

?


Disease-Spreading Zombies

20 July, 2012

Are you a disease-spreading zombie?

You may have read about the fungus that can infect an ant and turn it into a zombie, making it climb up the stem of a plant and hang onto it, then die and release spores from a stalk that grows out of its head.

But this isn’t the only parasite that controls the behavior of its host.

If you ever got sick, had diarrhea, and thought hard about why, you’ll understand what I mean. You were helping spread the disease… especially if you were poor and didn’t have a toilet. This is why improved sanitation actually reduces the virulence of some diseases: it’s no longer such a good strategy for bacteria to cause diarrhea, so they evolve away from it!

There are plenty of other examples. Lots of diseases make you sneeze or cough, spreading the germs to other people. The rabies virus drives dogs crazy and makes them want to bite. There’s a parasitic flatworm that makes ants want to climb to the top of a blade of grass, lock their jaws onto it and wait there until they get eaten by a sheep! But the protozoan Toxoplasma gondii is more mysterious.

It causes a disease called toxoplasmosis. You can get it from cats, you can get it from eating infected meat, and you can even inherit it from your mother.

Lots of people have it: somewhere between 1/3 and 1/2 of everyone in the world!

A while back, the Czech scientist Jaroslav Flegr did some experiments. He found that people who tested positive for this parasite have slower reaction times. But even more interestingly, he claims that men with the parasite are more introverted, suspicious, oblivious to other people’s opinions of them, and inclined to disregard rules… while infected women are more outgoing, trusting, image-conscious, and rule-abiding than uninfected women!

What could explain this?

The disease is carried by both cats and mice. Cats catch it by eating mice. The disease causes behavior changes in mice: they seem to become more anxious and run around more. This may increase their chance of getting eaten by a cat and passing on the disease. But we are genetically similar to mice… so we too may become more anxious when we’re infected with this disease. And men and women may act differently when they’re anxious.

It’s just a theory so far. Nonetheless, I won’t be surprised to hear there are parasites that affect our behavior in subtle ways. I don’t know if viruses or bacteria are sophisticated enough to trigger changes in behavior more subtle than diarrhea… but there are always lots of bacteria in your body, about 10 times as many as actual human cells. Many of these belong to unidentified species. And as long as they don’t cause obvious pathologies, doctors have had little reason to study them.

As for viruses, don’t forget that about 8% of your DNA is made of viruses that once copied themselves into your ancestors’ genome. They’re called endogenous retroviruses, and I find them very spooky and fascinating. Once they get embedded in our DNA, they can’t always get back out: a lot of them are defective, containing deletions or nonsense mutations. But some may still be able to get back out. And there are hints that some are implicated in certain kinds of cancer and autoimmune disease.

Even more intriguingly, a 2004 study reported that antibodies to endogenous retroviruses were more common in people with schizophrenia! And the cerebrospinal fluid of people who’d recently developed schizophrenia contained levels of a key enzyme used by retroviruses, reverse transcriptase, four times higher than in control subjects.

So it’s possible—just possible—that some viruses, either free-living or built into our DNA, may change our behavior in subtle ways that increase their chance of spreading.

For more on Jaroslav Flegr’s research, read this fascinating article:

• Kathleen McAuliffe, How your cat is making you crazy, The Atlantic, March 2012.

Among other things you’ll read about the parasitologists Glenn McConkey and Joanne Webster, who have shown that Toxoplasma gondii has two genes that allow it to crank up production of the neurotransmitter dopamine in the host’s brain. It seems this makes rats feel pleasure when they smell a cat!

(Do you like cats? Hmm.)

Of course, in business and politics we see many examples of ‘parasites’ that hijack organizations and change these organizations’ behavior to benefit themselves. It’s not nice. But it’s natural.

So even if you aren’t a disease-spreading zombie, it’s quite possible you’re dealing with them on a regular basis.


Network Theory (Part 19)

18 July, 2012

joint with Jacob Biamonte

It’s time to resume the network theory series! We’re writing a little book based on some of these posts, so we want to finish up our discussion of stochastic Petri nets and chemical reaction networks. But it’s been a long time, so we’ll assume you forgot everything we’ve said before, and make this post as self-contained as possible.

Last time we started looking at a simple example: a diatomic gas.

A diatomic molecule of this gas can break apart into two atoms:

A_2 \to A + A

and conversely, two atoms can combine to form a diatomic molecule:

A + A \to A_2

We can draw both these reactions using a chemical reaction network:

where we’re writing B instead of A_2 to abstract away some detail that’s just distracting here.

Last time we looked at the rate equation for this chemical reaction network, and found equilibrium solutions of that equation. Now let’s look at the master equation, and find equilibrium solutions of that. This will serve as a review of three big theorems.

The master equation

We’ll start from scratch. The master equation is all about how atoms or molecules or rabbits or wolves or other things interact randomly and turn into other things. So, let’s write \psi_{m,n} for the probability that we have m atoms of A and n molecules of B in our container. These probabilities are functions of time, and the master equation will say how they change.

First we need to pick a rate constant for each reaction. Let’s say the rate constant for this reaction is some number \alpha > 0:

B \to A + A

while the rate constant for the reverse reaction is some number \beta > 0:

A + A \to B

Before we make it pretty using the ideas we’ve been explaining all along, the master equation says

\begin{array}{ccr} \displaystyle{ \frac{d}{d t} \psi_{m,n} (t)} &=&  \alpha (n+1) \, \psi_{m-2,n+1}(t) \\ \\   &&- \; \alpha n \, \psi_{m,n}(t) \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1}(t) \\ \\ && - \;\beta m(m-1) \, \psi_{m,n}(t) \\ \\   \end{array}

where we define \psi_{i,j} to be zero if either i or j is negative.

Yuck!

Normally we don’t show you such nasty equations. Indeed the whole point of our work has been to demonstrate that by packaging the equations in a better way, we can understand them using high-level concepts instead of mucking around with millions of scribbled symbols. But we thought we’d show you what’s secretly lying behind our beautiful abstract formalism, just once.

Each term has a meaning. For example, the third one:

\beta (m+2)(m+1)\psi_{m+2,n-1}(t)

means that the reaction A + A \to B will tend to increase the probability of there being m atoms of A and n molecules of B if we start with m+2 atoms of A and n-1 molecules of B. This reaction can happen in (m+2)(m+1) ways. And it happens at a probabilistic rate proportional to the rate constant for this reaction, \beta.

We won’t go through the rest of the terms. It’s a good exercise to do so, but there could easily be a typo in the formula, since it’s so long and messy. So let us know if you find one!
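One way to hunt for typos in a formula like this is to check numerically that it conserves total probability: summing the right-hand side over all m and n should give exactly zero. Here's a quick sketch of such a check, with arbitrary rate constants and an arbitrary truncation of the infinite state space:

```python
import numpy as np

alpha, beta, N = 1.0, 2.0, 10   # arbitrary rate constants and truncation size

def psi_dot(psi):
    """Right-hand side of the master equation on the truncated (m, n) grid."""
    d = np.zeros_like(psi)
    for m in range(N + 1):
        for n in range(N + 1):
            if m >= 2 and n + 1 <= N:
                d[m, n] += alpha * (n + 1) * psi[m - 2, n + 1]
            d[m, n] -= alpha * n * psi[m, n]
            if m + 2 <= N and n >= 1:
                d[m, n] += beta * (m + 2) * (m + 1) * psi[m + 2, n - 1]
            d[m, n] -= beta * m * (m - 1) * psi[m, n]
    return d

# A random distribution supported away from the truncation boundary, so
# that no probability can flow off the edge of the grid:
rng = np.random.default_rng(0)
psi = np.zeros((N + 1, N + 1))
psi[:N - 1, 1:N] = rng.random((N - 1, N - 1))
psi /= psi.sum()

# The reactions only move probability around, so the total rate of change
# of probability must vanish:
assert abs(psi_dot(psi).sum()) < 1e-9
```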

To simplify this mess, the key trick is to introduce a generating function that summarizes all the probabilities in a single power series:

\Psi = \sum_{m,n \ge 0} \psi_{m,n} y^m \, z^n

It’s a power series in two variables, y and z, since we have two chemical species: As and Bs.

Using this trick, the master equation looks like

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

where the Hamiltonian H is a sum of terms, one for each reaction. This Hamiltonian is built from operators that annihilate and create As and Bs. The annihilation and creation operators for A atoms are:

\displaystyle{ a = \frac{\partial}{\partial y} , \qquad a^\dagger = y }

The annihilation operator differentiates our power series with respect to the variable y. The creation operator multiplies it by that variable. Similarly, the annihilation and creation operators for B molecules are:

\displaystyle{ b = \frac{\partial}{\partial z} , \qquad b^\dagger = z }

In Part 8 we explained a recipe that lets us stare at our chemical reaction network and write down this Hamiltonian:

H = \alpha ({a^\dagger}^2 b - b^\dagger b) + \beta (b^\dagger a^2 - {a^\dagger}^2 a^2)

As promised, there’s one term for each reaction. But each term is itself a sum of two: one that increases the probability that our container of chemicals will be in a new state, and another that decreases the probability that it’s in its original state. We get a total of four terms, which correspond to the four terms in our previous way of writing the master equation.

Puzzle 1. Show that this new way of writing the master equation is equivalent to the previous one.

Equilibrium solutions

Now we will look for all equilibrium solutions of the master equation: in other words, solutions that don’t change with time. So, we’re trying to solve

H \Psi = 0

Given the rather complicated form of the Hamiltonian, this seems tough. The challenge looks more concrete but even more scary if we go back to our original formulation. We’re looking for probabilities \psi_{m,n}, nonnegative numbers that sum to one, such that

\begin{array}{ccr} 0 &=&  \alpha (n+1) \, \psi_{m-2,n+1} \\ \\   &&- \; \alpha n \, \psi_{m,n} \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\   \end{array}

for all m and n.

This equation is horrid! But the good news is that it’s linear, so a linear combination of solutions is again a solution. This lets us simplify the problem using a conserved quantity.

Clearly, there’s a quantity that the reactions here don’t change:

What’s that? It’s the number of As plus twice the number of Bs. After all, a B can turn into two As, or vice versa.

(Of course the secret reason is that B is a diatomic molecule made of two As. But you’d be able to follow the logic here even if you didn’t know that, just by looking at the chemical reaction network… and sometimes this more abstract approach is handy! Indeed, the way chemists first discovered that certain molecules are made of certain atoms is by seeing which reactions were possible and which weren’t.)

Suppose we start in a situation where we know for sure that the number of As plus twice the number of Bs equals some number k:

\psi_{m,n} = 0  \;  \textrm{unless} \; m+2n = k

Then we know \Psi is initially of the form

\displaystyle{ \Psi = \sum_{m+2n = k} \psi_{m,n} \, y^m z^n  }

But since the number of As plus twice the number of Bs is conserved, if \Psi obeys the master equation it will continue to be of this form!

Put a fancier way, we know that if a solution of the master equation starts in this subspace:

\displaystyle{ L_k =  \{ \Psi: \; \Psi = \sum_{m+2n = k} \psi_{m,n} y^m z^n \; \textrm{for some} \; \psi_{m,n} \}  }

it will stay in this subspace. So, because the master equation is linear, we can take any solution \Psi and write it as a linear combination of solutions \Psi_k, one in each subspace L_k.

In particular, we can do this for an equilibrium solution \Psi. And then all the solutions \Psi_k are also equilibrium solutions: they’re linearly independent, so if one of them changed with time, \Psi would too.

This means we can just look for equilibrium solutions in the subspaces L_k. If we find these, we can get all equilibrium solutions by taking linear combinations.

Once we’ve noticed that, our horrid equation makes a bit more sense:

\begin{array}{ccr} 0 &=&  \alpha (n+1) \, \psi_{m-2,n+1} \\ \\   &&- \; \alpha n \, \psi_{m,n} \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\   \end{array}

Note that if the pair of subscripts m, n obey m + 2n = k, the same is true for the other pairs of subscripts here! So our equation relates the values of \psi_{m,n} for all the points (m,n) with integer coordinates lying on this line segment:

m+2n = k , \qquad m ,n \ge 0

You should be visualizing something like this:

If you think about it a minute, you’ll see that if we know \psi_{m,n} at two points on such a line, we can keep using our equation to recursively work out all the rest. So, there are at most two linearly independent equilibrium solutions of the master equation in each subspace L_k.

Why at most two? Why not two? Well, we have to be a bit careful about what happens at the ends of the line segment: remember that \psi_{m,n} is defined to be zero when m or n becomes negative. If we think very hard about this, we’ll see there’s just one linearly independent equilibrium solution of the master equation in each subspace L_k. But this is the sort of nitty-gritty calculation that’s not fun to watch someone else do, so we won’t bore you with that.
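If you'd rather let a computer do that nitty-gritty calculation, you can restrict the equation to the points on the line segment m + 2n = k and compute the kernel of the resulting matrix. A sketch, with arbitrary choices of \alpha, \beta and k:

```python
import numpy as np

alpha, beta, k = 1.0, 2.0, 12      # arbitrary rates and conserved value

# The integer points on the segment m + 2n = k, with m, n >= 0:
states = [(k - 2 * n, n) for n in range(k // 2 + 1)]
index = {s: i for i, s in enumerate(states)}

# Row i of M is the equilibrium equation for the state (m, n) = states[i]:
M = np.zeros((len(states), len(states)))
for (m, n), i in index.items():
    if (m - 2, n + 1) in index:
        M[i, index[(m - 2, n + 1)]] += alpha * (n + 1)
    M[i, i] -= alpha * n
    if (m + 2, n - 1) in index:
        M[i, index[(m + 2, n - 1)]] += beta * (m + 2) * (m + 1)
    M[i, i] -= beta * m * (m - 1)

# The space of equilibrium solutions in L_k is the kernel of M, and it's
# one-dimensional, just as claimed:
kernel_dim = len(states) - np.linalg.matrix_rank(M)
assert kernel_dim == 1
```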

Soon we’ll move on to a more high-level approach to this problem. But first, one remark. Our horrid equation is like a fancy version of the usual discretized form of the equation

\displaystyle {\frac{d^2 \psi}{d x^2} = 0 }

namely:

\psi_{n-1} - 2 \psi_{n} + \psi_{n+1} = 0

And this makes sense, since we get

\displaystyle {\frac{d^2 \psi}{d x^2} = 0 }

by taking the heat equation:

\displaystyle \frac{\partial \psi}{\partial t} = {\frac{\partial^2 \psi}{\partial x^2} }

and assuming \psi doesn’t depend on time. So what we’re doing is a lot like looking for equilibrium solutions of the heat equation.

The heat equation describes how heat smears out as little particles of heat randomly move around. True, there don’t really exist ‘little particles of heat’, but this equation also describes the diffusion of any other kind of particles as they randomly move around undergoing Brownian motion. Similarly, our master equation describes a random walk on this line segment:

m+2n = k , \qquad m , n \ge 0

or more precisely, the points on this segment with integer coordinates. The equilibrium solutions arise when the probabilities \psi_{m,n} have diffused as much as possible.

If you think about it this way, it should be physically obvious that there’s just one linearly independent equilibrium solution of the master equation for each value of k.

There’s a general moral here, too, which we’re seeing in a special case: the master equation for a chemical reaction network really describes a bunch of random walks, one for each allowed value of the conserved quantities that can be built as linear combinations of number operators. In our case we have one such conserved quantity, but in general there may be more (or none).

Furthermore, these ‘random walks’ are what we’ve been calling Markov processes.
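To see this random-walk picture in action, we can truncate the state space, evolve the master equation numerically, and watch the conserved quantity stay put while the probability diffuses. A sketch (the rate constants, the truncation and the initial state are arbitrary choices of ours):

```python
import numpy as np

alpha, beta, N = 1.0, 2.0, 10
states = [(m, n) for m in range(N + 1) for n in range(N + 1)]
idx = {s: i for i, s in enumerate(states)}

# Build the Hamiltonian as a matrix: each reaction moves probability from
# one state to another, preserving m + 2n.
H = np.zeros((len(states), len(states)))
for (m, n), j in idx.items():
    if n >= 1 and (m + 2, n - 1) in idx:       # B -> A + A, at rate alpha*n
        H[idx[(m + 2, n - 1)], j] += alpha * n
        H[j, j] -= alpha * n
    if m >= 2 and (m - 2, n + 1) in idx:       # A + A -> B, at rate beta*m*(m-1)
        H[idx[(m - 2, n + 1)], j] += beta * m * (m - 1)
        H[j, j] -= beta * m * (m - 1)

psi = np.zeros(len(states))
psi[idx[(4, 2)]] = 1.0            # start with 4 As and 2 Bs, so m + 2n = 8

O = np.array([m + 2 * n for m, n in states], dtype=float)
dt = 0.001
for _ in range(1000):             # crude Euler integration up to t = 1
    psi = psi + dt * (H @ psi)

assert abs(psi.sum() - 1) < 1e-9          # total probability is conserved
assert abs(O @ psi - 8) < 1e-9            # so is the expected value of m + 2n
assert psi[idx[(4, 2)]] < 1.0             # but probability has spread out
```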

Noether’s theorem

We simplified our task of finding equilibrium solutions of the master equation by finding a conserved quantity. The idea of simplifying problems using conserved quantities is fundamental to physics: this is why physicists are so enamored with quantities like energy, momentum, angular momentum and so on.

Nowadays physicists often use ‘Noether’s theorem’ to get conserved quantities from symmetries. There’s a very simple version of Noether’s theorem for quantum mechanics, but in Part 11 we saw a version for stochastic mechanics, and it’s that version that is relevant now. Here’s a paper which explains it in detail:

• John Baez and Brendan Fong, Noether’s theorem for Markov processes.

We don’t really need Noether’s theorem now, since we found the conserved quantity and exploited it without even noticing the symmetry. Nonetheless it’s interesting to see how it relates to what we’re doing.

For the reaction we’re looking at now, the idea is that the subspaces L_k are eigenspaces of an operator that commutes with the Hamiltonian H. It follows from standard math that a solution of the master equation that starts in one of these subspaces, stays in that subspace.

What is this operator? It’s built from ‘number operators’. The number operator for As is

N_A = a^\dagger a

and the number operator for Bs is

N_B = b^\dagger b

A little calculation shows

N_A \,y^m z^n = m \, y^m z^n, \quad \qquad  N_B\, y^m z^n = n \,y^m z^n

so the eigenvalue of N_A is the number of As, while the eigenvalue of N_B is the number of Bs. This is why they’re called number operators.

As a consequence, the eigenvalue of the operator N_A + 2N_B is the number of As plus twice the number of Bs:

(N_A + 2N_B) \, y^m z^n = (m + 2n) \, y^m z^n

Let’s call this operator O, since it’s so important:

O = N_A + 2N_B

If you think about it, the spaces L_k we saw a minute ago are precisely the eigenspaces of this operator:

L_k = \{ \Psi : \; O \Psi = k \Psi \}

As we’ve seen, solutions of the master equation that start in one of these eigenspaces will stay there. This lets us take some techniques that are very familiar in quantum mechanics, and apply them to this stochastic situation.

First of all, time evolution as described by the master equation is given by the operators \exp(t H). In other words,

\displaystyle{ \frac{d}{d t} \Psi(t) } = H \Psi(t) \; \textrm{and} \;  \Psi(0) = \Phi \quad  \Rightarrow \quad \Psi(t) = \exp(t H) \Phi

But if you start in some eigenspace of O, you stay there. Thus if \Phi is an eigenvector of O, so is \exp(t H) \Phi, with the same eigenvalue. In other words,

O \Phi = k \Phi

implies

O \exp(t H) \Phi = k \exp(t H) \Phi = \exp(t H) O \Phi

But since we can choose a basis consisting of eigenvectors of O, we must have

O \exp(t H) = \exp(t H) O

or, throwing caution to the winds and differentiating:

O H = H O

So, as we’d expect from Noether’s theorem, our conserved quantity commutes with the Hamiltonian! This in turn implies that H commutes with any polynomial in O, which in turn suggests that

\exp(s O) H = H \exp(s O)

and also

\exp(s O) \exp(t H) = \exp(t H) \exp(s O)

The last equation says that O generates a 1-parameter family of ‘symmetries’: operators \exp(s O) that commute with time evolution. But what do these symmetries actually do? Since

O y^m z^n = (m + 2n) y^m z^n

we have

\exp(s O) y^m z^n = e^{s(m + 2n)}\, y^m z^n

So, this symmetry takes any probability distribution \psi_{m,n} and multiplies it by e^{s(m + 2n)}.

In other words, our symmetry multiplies the relative probability of finding our container of gas in a given state by a factor of e^s for each A atom, and by a factor of e^{2s} for each B molecule. It might not seem obvious that this operation commutes with time evolution! However, experts on chemical reaction theory are familiar with this fact.

Finally, a couple of technical points. Starting where we said “throwing caution to the winds”, our treatment has not been rigorous, since O and H are unbounded operators, and these must be handled with caution. Nonetheless, all the commutation relations we wrote down are true.

The operators \exp(s O) are unbounded for positive s. They’re bounded for negative s, so they give a one-parameter semigroup of bounded operators. But they’re not stochastic operators: even for s negative, they don’t map probability distributions to probability distributions. However, they do map any nonzero vector \Psi with \psi_{m,n} \ge 0 to a vector \exp(s O) \Psi with the same properties. So, we can just normalize this vector and get a probability distribution. The need for this normalization is why we spoke of relative probabilities.

The Anderson–Craciun–Kurtz theorem

Now we’ll actually find all equilibrium solutions of the master equation in closed form. To understand this final section, you really do need to remember some things we’ve discussed earlier. Last time we considered the same chemical reaction network we’re studying today, but we looked at its rate equation, which looks like this:

\displaystyle{ \frac{d}{d t} x_1 =  2 \alpha x_2 - 2 \beta x_1^2}

\displaystyle{ \frac{d}{d t} x_2 = - \alpha x_2 + \beta x_1^2 }

This describes how the number of As and Bs changes in the limit where there are lots of them and we can treat them as varying continuously, in a deterministic way. The number of As is x_1, and the number of Bs is x_2.

We saw that the quantity

x_1 + 2 x_2

is conserved, just as today we’ve seen that N_A + 2 N_B is conserved. We saw that the rate equation has one equilibrium solution for each choice of x_1 + 2 x_2. And we saw that these equilibrium solutions obey

\displaystyle{ \frac{x_1^2}{x_2} = \frac{\alpha}{\beta} }

The Anderson–Craciun–Kurtz theorem, introduced in Part 9, is a powerful result that gets equilibrium solutions of the master equation from equilibrium solutions of the rate equation. It only applies to equilibrium solutions that are ‘complex balanced’, but that’s okay:

Puzzle 2. Show that the equilibrium solutions of the rate equation for the chemical reaction network

are complex balanced.

So, given any equilibrium solution (x_1,x_2) of our rate equation, we can hit it with the Anderson-Craciun-Kurtz theorem and get an equilibrium solution of the master equation! And it looks like this:

\displaystyle{  \Psi = e^{-(x_1 + x_2)} \, \sum_{m,n \ge 0} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }

In this solution, the probability distribution

\displaystyle{ \psi_{m,n} = e^{-(x_1 + x_2)} \,  \frac{x_1^m x_2^n} {m! n! } }

is a product of Poisson distributions. The factor in front is there to make the numbers \psi_{m,n} add up to one. And remember, x_1, x_2 are any nonnegative numbers with

\displaystyle{ \frac{x_1^2}{x_2} = \frac{\alpha}{\beta} }

So from all we’ve said, the above formula is an explicit closed-form solution of the horrid equation

\begin{array}{ccr} 0 &=&  \alpha (n+1) \, \psi_{m-2,n+1} \\ \\   &&- \; \alpha n \, \psi_{m,n} \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\   \end{array}

That’s pretty nice. We found some solutions without ever doing any nasty calculations.
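If you're suspicious, it's also easy to check numerically that the Poisson-product formula kills every term of that horrid equation. A sketch, with arbitrarily chosen rate constants:

```python
from math import exp, factorial

alpha, beta = 1.0, 2.0            # arbitrary rate constants
x1 = 0.7                          # arbitrary equilibrium value for x_1
x2 = beta * x1**2 / alpha         # chosen so that x1^2 / x2 = alpha / beta

def psi(m, n):
    """The Poisson-product probabilities, zero for negative indices."""
    if m < 0 or n < 0:
        return 0.0
    return exp(-(x1 + x2)) * x1**m * x2**n / (factorial(m) * factorial(n))

# Plug psi into the right-hand side of the horrid equation and check that
# it vanishes, term by term, for a range of m and n:
for m in range(10):
    for n in range(10):
        residual = (alpha * (n + 1) * psi(m - 2, n + 1)
                    - alpha * n * psi(m, n)
                    + beta * (m + 2) * (m + 1) * psi(m + 2, n - 1)
                    - beta * m * (m - 1) * psi(m, n))
        assert abs(residual) < 1e-12
```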

But we’ve really done better than getting some equilibrium solutions of the master equation. By restricting attention to n,m with m+2n = k, our formula for \psi_{m,n} gives an equilibrium solution that lives in the eigenspace L_k:

\displaystyle{  \Psi_k = e^{-(x_1 + x_2)} \, \sum_{m+2n =k} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }

And by what we’ve said, linear combinations of these give all equilibrium solutions of the master equation.

And we got them with very little work! Despite all the fancy talk in today’s post, we essentially just took the equilibrium solutions of the rate equation and plugged them into a straightforward formula to get equilibrium solutions of the master equation. This is why the Anderson–Craciun–Kurtz theorem is so nice. And of course we’re looking at a very simple reaction network: for more complicated ones it becomes even better to use this theorem to avoid painful calculations.

We could go further. For example, we could study nonequilibrium solutions using Feynman diagrams like this:

But instead, we will leave off with two more puzzles. We introduced some symmetries, but we haven’t really explored them yet:

Puzzle 3. What do the symmetries associated to the conserved quantity O do to the equilibrium solutions of the master equation given by

\displaystyle{  \Psi = e^{-(x_1 + x_2)} \, \sum_{m,n \ge 0} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }

where (x_1,x_2) is an equilibrium solution of the rate equation? In other words, what is the significance of the one-parameter family of solutions

\exp(s O) \Psi ?

Also, we used a conceptual argument to check that H commutes with O, but it’s good to know that we can check this sort of thing directly:

Puzzle 4. Compute the commutator

[H, O] = H O - O H

and show it vanishes.


The Mathematics of Biodiversity (Part 8)

14 July, 2012

Last time I mentioned that estimating entropy from real-world data is important not just for measuring biodiversity, but also for another area of biology: neurobiology!

When you look at something, neurons in your eye start firing. But how, exactly, is their firing related to what you see? Questions like this are hard! Answering them—‘cracking the neural code’—is a big challenge. To make progress, neuroscientists are using information theory. But as I explained last time, estimating information from experimental data is tricky.

Romain Brasselet, now a postdoc at the Max Planck Institute for Biological Cybernetics at Tübingen, is working on these topics. He sent me a nice email explaining this area.

This is a bit of a digression, but the Mathematics of Biodiversity program in Barcelona has been extraordinarily multidisciplinary, with category theorists rubbing shoulders with ecologists, immunologists and geneticists. One of the common themes is entropy and its role in biology, so I think it’s worth posting Romain’s comments here. This is what he has to say…

Information in neurobiology

I will try to explain why neurobiologists today are very interested in reliable estimates of entropy and information, and what techniques we use to obtain them.

The activity of sensory as well as more central neurons is known to be modulated by external stimulations. In 1926, in a seminal paper, Adrian observed that neurons in the sciatic nerve of the frog fire action potentials (or spikes) when some muscle in the hindlimb is stretched. In addition, he observed that the frequency of the spikes increases with the amplitude of the stretching.

• E. D. Adrian, The impulses produced by sensory nerve endings (1926).

For another very nice example: in 1962, Hubel and Wiesel found neurons in the cat visual cortex whose activity depends on the orientation of a visual stimulus, a simple black line on a white background: some neurons fire preferentially for one orientation of the line. (Hubel and Wiesel were awarded the 1981 Nobel Prize in Physiology or Medicine for this work.) This incidentally led to the concept of “receptive field”, which is of tremendous importance in neurobiology—but though it’s fascinating, it’s a different topic.

Good: we are now able to define what makes a neuron tick. The problem is that neural activity is often very “noisy”: when the exact same stimulus is presented many times, the responses appear very different from trial to trial. Even careful observation cannot necessarily reveal correlations between the stimulations and the neural activity. So we would like a measure that captures the statistical dependence between the stimulation and the response of a neuron: can we say something about the stimulation just by observing the neuron’s response? That, after all, is essentially the task of the brain. In particular, we want a fundamental measure that does not rely on any assumptions about how the brain works. Information theory provides the tools to do exactly this, and that is why we like to use it: we often try to measure the mutual information between stimuli and responses.

To my knowledge, the first paper using information theory in neuroscience was by MacKay and McCulloch in 1952:

• Donald M. MacKay and Warren S. McCulloch, The limiting information capacity of a neuronal link, Bulletin of Mathematical Biophysics 14 (1952), 127–135.

But information theory was not used in neuroscience much until the early 90’s. It started again with a paper by Bialek et al. in 1991:

• W. Bialek, F. Rieke, R. R. de Ruyter van Steveninck and D. Warland, Reading a neural code, Science 252 (1991), 1854–1857.

However, when applying information-theoretic methods to biological data, we often have only a limited sampling of the neural response: we are usually very happy when we have 50 trials for a given stimulus. Why is this limited sample a problem?

During the major part of the 20th century, following Adrian’s finding, the paradigm for the neural code was the frequency of the spikes or, equivalently, the number of spikes in a window of time. But in the early 90’s, it was observed that the exact timing of spikes is (in some cases) reliable across trials. So instead of considering the neural response as a single number (the number of spikes), the temporal patterns of spikes started to be taken into account. But time is continuous, so to be able to do actual computations, time was discretized and a neural response became a binary string.

Now, if you consider relevant time-scales (say, a 100 millisecond time window divided into 1 millisecond bins, with a firing frequency of about 50 spikes per second), then your response space is huge, and estimates of information based on only 50 trials are not reliable anymore. That’s why a lot of effort has gone into overcoming the limited sampling bias.
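To get a feel for the numbers, here’s a quick back-of-the-envelope calculation in Python (a sketch using the figures from the text: a 100 ms window, 1 ms bins, a 50 Hz firing rate, and an arbitrary cutoff of “at most twice the expected number of spikes”):

```python
from math import comb

bins = 100            # a 100 ms window chopped into 1 ms bins
rate_hz = 50          # typical firing frequency
expected_spikes = rate_hz * bins // 1000   # about 5 spikes per window

# Every response is a binary string of length 100, so in principle:
total_words = 2 ** bins

# Even counting only the "plausible" words, with at most 10 spikes
# (twice the expected number), the response space is astronomical:
plausible_words = sum(comb(bins, k) for k in range(11))

print(f"all words:       {total_words:.2e}")
print(f"plausible words: {plausible_words:.2e}")
print(f"fraction seen with 50 trials: {50 / plausible_words:.1e}")
```

With over 10¹³ plausible responses and only 50 trials, the empirical histogram is almost entirely zeros, which is exactly why naive information estimates break down.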

Now, getting at the techniques developed in this field, John already mentioned the work by Liam Paninski, but here are other very interesting references:

• Stefano Panzeri and Alessandro Treves, Analytical estimates of limited sampling biases in different information measures, Network: Computation in Neural Systems 7 (1996), 87–107.

They computed the first-order bias of the information (related to the Miller–Madow correction) and then used a Bayesian technique to estimate the number of responses not included in the sample but that would be in an infinite sample (a goal similar to that of Good’s rule of thumb).

• S.P. Strong, R. Koberle, R.R. de Ruyter van Steveninck, and W. Bialek, Entropy and information in neural spike trains, Phys. Rev. Lett. 80 (1998), 197–200.

The entropy (or if you prefer, information) estimate can be expanded in a power series in 1/N around the true value, where N is the sample size. By computing the estimate for various values of N and fitting the results with a parabola in 1/N, it is possible to extrapolate to the value of the entropy as N \rightarrow \infty.
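Here’s a toy version of that extrapolation (a sketch, not the authors’ code; the three-outcome distribution and the sample sizes are made up for illustration). Since the distribution is known, we can compute the exact expected value of the naive entropy estimate at several sample sizes N, fit a parabola in 1/N, and read off the intercept at 1/N = 0:

```python
import numpy as np
from math import log2, factorial

# A known distribution on three outcomes (assumed for illustration):
p = [0.5, 0.3, 0.2]
H_true = -sum(pi * log2(pi) for pi in p)

def plugin(counts, N):
    """Naive ('plug-in') entropy estimate, in bits, from observed counts."""
    return -sum(c / N * log2(c / N) for c in counts if c > 0)

def expected_plugin(N):
    """Exact expectation of the plug-in estimate over all samples of size N."""
    total = 0.0
    for c0 in range(N + 1):
        for c1 in range(N + 1 - c0):
            c2 = N - c0 - c1
            coef = factorial(N) // (factorial(c0) * factorial(c1) * factorial(c2))
            prob = coef * p[0]**c0 * p[1]**c1 * p[2]**c2
            total += prob * plugin((c0, c1, c2), N)
    return total

Ns = [8, 16, 24]
xs = [1 / N for N in Ns]
ys = [expected_plugin(N) for N in Ns]

coeffs = np.polyfit(xs, ys, 2)   # fit a + b/N + c/N^2 (highest power first)
extrapolated = coeffs[-1]        # the fitted value at 1/N = 0

print(f"true entropy:         {H_true:.4f}")
print(f"naive at N = 24:      {ys[-1]:.4f}")
print(f"extrapolated N -> oo: {extrapolated:.4f}")
```

The naive estimates sit below the true entropy, but the extrapolated intercept lands much closer to it: the fit absorbs the 1/N and 1/N² parts of the bias.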

These approaches are also well-known:

• Ilya Nemenman, Fariel Shafee and William Bialek, Entropy and inference, revisited, 2002.

• Alexander Kraskov, Harald Stögbauer and Peter Grassberger, Estimating mutual information, Phys. Rev. E. 69 (2004), 066138.

Actually, Stefano Panzeri has quite a few impressive papers about this problem, and recently with colleagues he has made public a free Matlab toolbox for information theory (www.ibtb.org) implementing various correction methods.

Finally, the work by Jonathan Victor is worth mentioning, since he provided (to my knowledge again) the first estimate of mutual information using geometry. This is of particular interest with respect to the work by Christina Cobbold and Tom Leinster on measures of biodiversity that take the distance between species into account:

• J. D. Victor and K. P. Purpura, Nature and precision of temporal coding in visual cortex: a metric-space analysis, Journal of Neurophysiology 76 (1996), 1310–1326.

He introduced a distance between sequences of spikes and from this, derived a lower bound on mutual information.

• Jonathan D. Victor, Binless strategies for estimation of information from neural data, Phys. Rev. E. 66 (2002), 051903.

Taking inspiration from work by Kozachenko and Leonenko, he obtained an estimate of the information based on the distances between the closest responses.

Without getting too technical, that’s what we do in neuroscience about the limited sampling bias. The incentive is that obtaining reliable estimates is crucial to understanding the ‘neural code’, the holy grail of computational neuroscientists.


The Mathematics of Biodiversity (Part 7)

12 July, 2012

How ignorant are you?

Do you know?

Do you know how much you don’t know?

It seems hard to accurately estimate your lack of knowledge. It even seems hard to say precisely how hard it is. But the cool thing is, we can actually extract an interesting math question from this problem. And one answer to this question leads to the following conclusion:

There’s no unbiased way to estimate how ignorant you are.

But the devil is in the details. So let’s see the details!

The Shannon entropy of a probability distribution is a way of measuring how ignorant we are when this probability distribution describes our knowledge.

For example, suppose all we care about is whether this ancient Roman coin will land heads up or tails up:

If we know there’s a 50% chance of it landing heads up, that’s a Shannon entropy of 1 bit: we’re missing one bit of information.

But suppose for some reason we know for sure it’s going to land heads up. For example, suppose we know the guy on this coin is the emperor Pupienus Maximus, an egomaniac who had lead put on the back of all coins bearing his likeness, so his face would never hit the dirt! Then the Shannon entropy is 0: we know what’s going to happen when we toss this coin.

Or suppose we know there’s a 90% chance it will land heads up, and a 10% chance it will land tails up. Then the Shannon entropy is somewhere in between. We can calculate it like this:

- 0.9 \log_2 (0.9) - 0.1 \log_2 (0.1) = 0.46899...

so that’s how many bits of information we’re missing.
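In code, these three cases look like this (a quick sketch in Python):

```python
from math import log2

def binary_entropy(p):
    """Shannon entropy, in bits, of a coin that lands heads up with probability p."""
    if p in (0.0, 1.0):
        return 0.0                  # no uncertainty at all
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.5))   # fair coin: 1 bit
print(binary_entropy(1.0))   # Pupienus Maximus's coin: 0 bits
print(binary_entropy(0.9))   # 0.4689...
```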

But now suppose we have no idea. Suppose we just start flipping the coin over and over, and seeing what happens. Can we estimate the Shannon entropy?

Here’s a naive way to do it. First, use your experimental data to estimate the probability that the coin lands heads-up. Then, stick that probability into the formula for Shannon entropy. For example, say we flip the coin 3 times and it lands heads-up once. Then we can estimate the probability of it landing heads-up as 1/3, and tails-up as 2/3. So we can estimate that the Shannon entropy is

\displaystyle{ - \frac{1}{3} \log_2 (\frac{1}{3}) -\frac{2}{3} \log_2 (\frac{2}{3}) = 0.918... }
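As code, the naive recipe looks like this (a sketch; the list of flips is made up):

```python
from collections import Counter
from math import log2

def naive_entropy(samples):
    """The naive ('plug-in') estimate: stick the empirical frequencies
    into the Shannon entropy formula."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

# Three flips, landing heads-up once:
print(naive_entropy(['H', 'T', 'T']))   # 0.918...
```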

But it turns out that this approach systematically underestimates the Shannon entropy!

Say we have a coin that lands heads up a certain fraction of the time, say p. And say we play this game: we flip our coin n times, see what we get, and estimate the Shannon entropy using the simple recipe I just illustrated.

Of course, our estimate will depend on the luck of the game. But on average, it will be less than the actual Shannon entropy, which is

- p \log_2 (p) - (1-p) \log_2 (1-p)

We can prove this mathematically. But it shouldn’t be surprising. After all, if n = 1, we’re playing a game where we flip the coin just once. And with this game, our naive estimate of the Shannon entropy will always be zero! Each time we play the game, the coin will either land heads up 100% of the time, or tails up 100% of the time!

If we play the game with more coin flips, the error gets less severe. In fact it approaches zero as the number of coin flips gets ever larger, that is, as n \to \infty. The case where you flip the coin just once is an extreme case—but extreme cases can be good to think about, because they can indicate what may happen in less extreme cases.
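We can watch this happen by computing the average naive estimate exactly: for a fair coin (true entropy: 1 bit), just sum over all possible outcomes of n flips, weighted by their probabilities. A sketch:

```python
from math import comb, log2

def h(q):
    """Entropy, in bits, of a coin with heads probability q."""
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def mean_naive_estimate(n, p=0.5):
    """Exact average of the naive entropy estimate over all outcomes of n flips."""
    return sum(comb(n, m) * p**m * (1 - p)**(n - m) * h(m / n)
               for m in range(n + 1))

for n in (1, 2, 4, 16, 64):
    print(n, round(mean_naive_estimate(n), 4))
```

For n = 1 the average estimate is exactly zero, and as n grows it creeps up toward 1 bit, always from below.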

One moral here is that naively generalizing on the basis of limited data can make you feel more sure you know what’s going on than you actually are.

I hope you knew that already!

But we can also say, in a more technical way, that the naive way of estimating Shannon entropy is a biased estimator: the average value of the estimator is different from the value of the quantity being estimated.

Here’s an example of an unbiased estimator. Say you’re trying to estimate the probability that the coin will land heads up. You flip it n times and see that it lands heads up m times. You estimate that the probability is m/n. That’s the obvious thing to do, and it turns out to be unbiased.

Statisticians like to think about estimators, and being unbiased is one way an estimator can be ‘good’. Beware: it’s not the only way! There are estimators that are unbiased, but whose standard deviation is so huge that they’re almost useless. It can be better to have an estimate of something that’s more accurate, even though on average it’s a bit too low. So sometimes, a biased estimator can be more useful than an unbiased estimator.

Nonetheless, my ears perked up when Lou Jost mentioned that there is no unbiased estimator for Shannon entropy. In rough terms, the moral is that:

There’s no unbiased way to estimate how ignorant you are.

I think this is important. For example, it’s important because Shannon entropy is also used as a measure of biodiversity. Instead of flipping a coin repeatedly and seeing which side lands up, now we go out and collect plants or animals, and see which species we find. The relative abundance of different species defines a probability distribution on the set of species. In this language, the moral is:

There’s no unbiased way to estimate biodiversity.

But of course, this doesn’t mean we should give up. We may just have to settle for an estimator that’s a bit biased! And people have spent a bunch of time looking for estimators that are less biased than the naive one I just described.

By the way, equating ‘biodiversity’ with ‘Shannon entropy’ is sloppy: there are many measures of biodiversity. The Shannon entropy is just a special case of the Rényi entropy, which depends on a parameter q: we get Shannon entropy when q = 1.

As q gets smaller, the Rényi entropy gets more and more sensitive to rare species—or shifting back to the language of probability theory, rare events. It’s the rare events that make Shannon entropy hard to estimate, so I imagine there should be theorems about estimators for Rényi entropy, which say it gets harder to estimate as q gets smaller. Do you know such theorems?

Also, I should add that biodiversity is better captured by the ‘Hill numbers’, which are functions of the Rényi entropy, than by the Rényi entropy itself. (See here for the formulas.) Since these functions are nonlinear, the lack of an unbiased estimator for Rényi entropy doesn’t instantly imply the same for the Hill numbers. So there are also some obvious questions about unbiased estimators for Hill numbers. Do you know answers to those?
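To make the q-dependence concrete, here’s a sketch computing Rényi entropies and the corresponding Hill numbers for a hypothetical four-species community with one rare species (the abundances are made up):

```python
import numpy as np

def renyi_entropy(p, q):
    """Rényi entropy (natural log); q = 1 is the Shannon limit."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if abs(q - 1) < 1e-12:
        return float(-np.sum(p * np.log(p)))      # Shannon entropy
    return float(np.log(np.sum(p ** q)) / (1 - q))

def hill_number(p, q):
    """'Effective number of species': the exponential of the Rényi entropy."""
    return float(np.exp(renyi_entropy(p, q)))

community = [0.50, 0.30, 0.19, 0.01]   # one rare species
for q in (0.0, 0.5, 1.0, 2.0):
    print(q, round(hill_number(community, q), 3))
# q = 0 just counts species (4); larger q discounts the rare one
```

The Hill numbers decrease as q grows: the rare species counts fully at q = 0 and hardly at all at q = 2, which is the sensitivity the text describes.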

Here are some papers on estimators for entropy. Most of these focus on estimating the Shannon entropy of a probability distribution on a finite set.

This old classic has a proof that the ‘naive’ estimator of Shannon entropy is biased, and estimates on the bias:

• Bernard Harris, The statistical estimation of entropy in the non-parametric case, Army Research Office, 1975.

He shows the bias goes to zero as we increase the number of samples: the number I was calling n in my coin flip example. In fact he shows the bias goes to zero like O(1/n). This is big O notation, which means that as n \to +\infty, the bias is bounded by some constant times 1/n. This constant depends on the size of our finite set—or, if you want to do better, the class number, which is the number of elements on which our probability distribution is nonzero.

Using this idea, he shows that you can find a less biased estimator if you have a probability distribution p_i on a finite set and you know that exactly k of these probabilities are nonzero. To do this, just take the ‘naive’ estimator I described earlier and add (k-1)/(2n). This is called the Miller–Madow bias correction. The bias of this improved estimator goes to zero like O(1/n^2).
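In code, the correction is a one-line addition to the naive estimate (a sketch; the samples are made up, and note that the (k-1)/(2n) correction is stated for natural-log entropy, so I compute entropy in nats here):

```python
from collections import Counter
from math import log

def naive_entropy(samples):
    """Plug-in entropy estimate in nats (natural log)."""
    n = len(samples)
    return -sum((c / n) * log(c / n) for c in Counter(samples).values())

def miller_madow(samples, k):
    """Naive estimate plus the Miller-Madow correction (k-1)/(2n).

    k is the number of outcomes with nonzero probability, which you must
    know ahead of time.  (For entropy in bits, divide the correction by ln 2.)"""
    n = len(samples)
    return naive_entropy(samples) + (k - 1) / (2 * n)

samples = ['a', 'a', 'b', 'c', 'a', 'b']
print(naive_entropy(samples))          # 1.011...
print(miller_madow(samples, k=3))      # naive + 2/12
```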

The problem is that in practice you don’t know ahead of time how many probabilities are nonzero! In applications to biodiversity this would amount to knowing ahead of time how many species exist, before you go out looking for them.

But what about the theorem that there’s no unbiased estimator for Shannon entropy? The best reference I’ve found is this:

• Liam Paninski, Estimation of entropy and mutual information, Neural Computation 15 (2003), 1191–1254.

In Proposition 8 of Appendix A, Paninski gives a quick proof that there is no unbiased estimator of Shannon entropy for probability distributions on a finite set. But his paper goes far beyond this. Indeed, it seems like a pretty definitive modern discussion of the whole subject of estimating entropy. Interestingly, this subject is dominated by neurobiologists studying entropy of signals in the brain! So, lots of his examples involve brain signals.

Another overview, with tons of references, is this:

• J. Beirlant, E. J. Dudewicz, L. Györfi, and E. C. van der Meulen, Nonparametric entropy estimation: an overview.

This paper focuses on the situation where we don’t know ahead of time how many probabilities are nonzero:

• Anne Chao and T.-J. Shen, Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample, Environmental and Ecological Statistics 10 (2003), 429–443.

In 2003 there was a conference on the problem of estimating entropy, whose webpage has useful information. As you can see, it was dominated by neurobiologists:

Estimation of entropy and information of undersampled probability distributions: theory, algorithms, and applications to the neural code, Whistler, British Columbia, Canada, 12 December 2003.

By the way, I was very confused for a while, because these guys claim to have found an unbiased estimator of Shannon entropy:

• Stephen Montgomery-Smith and Thomas Schürmann, Unbiased estimators for entropy and class number.

However, their way of estimating entropy has a funny property: in the language of biodiversity, it’s only well-defined if our samples include at least one species of each organism. So, we cannot compute this estimate for an arbitrary list of n samples. This means it’s not an estimator in the usual sense—the sense that Paninski is using! So it doesn’t really contradict Paninski’s result.

To wrap up, let me state Paninski’s result in a mathematically precise way. Suppose p is a probability distribution on a finite set X. Suppose S is any number we can compute from p: that is, any real-valued function on the set of probability distributions. We’ll be interested in the case where S is the Shannon entropy:

\displaystyle{ S = -\sum_{x \in X} p(x) \, \log p(x) }

Here we can use whatever base for the logarithm we like: earlier I was using base 2, but that’s not sacred. Define an estimator to be any function

\hat{S}: X^n \to \mathbb{R}

The idea is that given n samples from the set X, meaning points x_1, \dots, x_n \in X, the estimator gives a number \hat{S}(x_1, \dots, x_n). This number is supposed to estimate some feature of the probability distribution p: for example, its entropy.

If the samples are independent and distributed according to the distribution p, the sample mean of the estimator will be

\displaystyle{ \langle \hat{S} \rangle = \sum_{x_1, \dots, x_n \in X} \hat{S}(x_1, \dots, x_n) \, p(x_1) \cdots p(x_n) }

The bias of the estimator is the difference between the sample mean of the estimator and actual value of S:

\langle \hat{S} \rangle - S

The estimator \hat{S} is unbiased if this bias is zero for all p.

Proposition 8 of Paninski’s paper says there exists no unbiased estimator for entropy! The proof is very short…

Okay, that’s all for today.

I’m back in Singapore now; I learned so much at the Mathematics of Biodiversity conference that there’s no way I’ll be able to tell you all that information. I’ll try to write a few more blog posts, but please be aware that my posts so far give a hopelessly biased and idiosyncratic view of the conference, which would be almost unrecognizable to most of the participants. There are a lot of important themes I haven’t touched on at all… while this business of entropy estimation barely came up: I just find it interesting!

If more of you blogged more, we wouldn’t have this problem.


The Mathematics of Biodiversity (Part 6)

6 July, 2012

Here are two fun botany stories I learned today from Lou Jost.

The decline and fall of the Roman Empire

I thought Latin was a long-dead language… except in Finland, where 75,000 people regularly listen to the news in Latin. That’s cool, but surely the last time someone seriously needed to write in Latin was at least a century ago… right?

No! Until the beginning of 2012, botanists reporting new species were required to do so in Latin.

Like this:

Arbor ad 8 alta, raminculis sparse pilosis, trichomatis 2-2.5 mm longis. Folia persistentia; laminae anisophyllae, foliis majoribus ellipticus, 12-23.5 cm longis, 6-13 cm latis, minoribus orbicularis, ca 8.5 cm longis, 7.5 cm latis, apice acuminato et caudato, acuminibus 1.5-2 cm longis, basi rotundata ad obtusam, margine integra, supra sericea, trichomatis 2.5-4 mm longis, appressis, pagina inferiore sericea ad pilosam, trichomatis 2-3 mm longis; petioli 4-7 mm longi. Inflorescentia terminalis vel axillaris, cymosa, 8-10 cm latis. Flores bisexuales; calyx tubularis, ca. 6 mm longus, 10-costatus; corolla alba, tubularis, 5-lobata; stamina 5, filis 8-10 mm longis, pubescentia ad insertionem.

The International Botanical Congress finally voted last year to drop this requirement. So, the busy people who are discovering about 2000 species of plants, algae and fungi each year no longer need to file their reports in the language of the Roman Empire.

Orchid Fever

The first person who publishes a paper on a new species of plant gets to name it. Sometimes the competition is fierce, as for the magnificent orchid shown above, Phragmipedium kovachii.

Apparently one guy beat another, his archenemy, by publishing an article just a few days earlier. But the other guy took his revenge by getting the first guy arrested for illegally taking an endangered orchid out of Peru. The first guy wound up getting two years’ probation and a $1,000 fine.

But, he got his name on the orchid!

I believe the full story appears here:

• Eric Hansen, Orchid Fever: A Horticultural Tale of Love, Lust, and Lunacy, Vintage Books, New York, 2001.

You can read a summary here.

Ecominga

By the way, Lou Jost is not only a great discoverer of new orchid species and a biologist deeply devoted to understanding the mathematics of biodiversity. He also runs a foundation called Ecominga, which maintains a number of nature reserves in Ecuador, devoted to preserving the amazing biodiversity of the Upper Pastaza Watershed. This area contains over 190 species of plants not found anywhere else in the world, as well as spectacled bears, mountain tapirs, and an enormous variety of birds.

The forests here are being cut down… but Ecominga has bought thousands of hectares in key locations, and is protecting them. They need money to pay the locals who patrol and run the reserves. It’s not a lot of money in the grand scheme of things—a few thousand dollars a month. So if you’re interested, go to the Ecominga website, check out the information and reports and pictures, and think about giving them some help! Or for that matter, contact me and I’ll put you in touch with him.


The Mathematics of Biodiversity (Part 5)

3 July, 2012

I’d be happy to get your feedback on these slides of the talk I’m giving the day after tomorrow:

• John Baez, Diversity, entropy and thermodynamics, 6 July 2012, Exploratory Conference on the Mathematics of Biodiversity, Centre de Recerca Matemàtica, Barcelona.

Abstract: As is well known, some popular measures of biodiversity are formally identical to measures of entropy developed by Shannon, Rényi and others. This fact is part of a larger analogy between thermodynamics and the mathematics of biodiversity, which we explore here. Any probability distribution can be extended to a 1-parameter family of probability distributions where the parameter has the physical meaning of ‘temperature’. This allows us to introduce thermodynamic concepts such as energy, entropy, free energy and the partition function in any situation where a probability distribution is present—for example, the probability distribution describing the relative abundances of different species in an ecosystem. The Rényi entropy of this probability distribution is closely related to the change in free energy with temperature. We give one application of thermodynamic ideas to population dynamics, coming from the work of Marc Harper: as a population approaches an ‘evolutionary optimum’, the amount of Shannon information it has ‘left to learn’ is nonincreasing. This fact is closely related to the Second Law of Thermodynamics.

This talk is rather different from the one I’d envisaged giving! There was a lot of interest in my work on Rényi entropy and thermodynamics, because Rényi entropies—and their exponentials, called the Hill numbers—are an important measure of biodiversity. So, I decided to spend a lot of time talking about that.

