Symmetry and the Fourth Dimension (Part 4)

26 July, 2012

Last time I posed a puzzle: figure out the Coxeter diagrams of the Platonic solids.

When you do this, it’s hard to help noticing a cool fact: if you take a Platonic solid and draw a dot in the center of each face, these dots are the vertices of another Platonic solid, called its dual. And if we do this again, we get back the same Platonic solid that we started with! These two solids have very similar Coxeter diagrams.

For example, starting with the cube, we get the octahedron:

Starting with the octahedron, we get back the cube:

These pictures were made by Frederick Goodman, and if you go to his webpage you can see how all 5 Platonic solids work, and you can download his free book, which includes a good elementary introduction to group theory:

• Frederick M. Goodman, Algebra: Abstract and Concrete, SemiSimple Press, Iowa City, 2012.

When we take the dual of a Platonic solid, or any other polyhedron, we replace:

• each vertex by a face,
• each edge by an edge,
• each face by a vertex.

So, it should not be surprising that in the Coxeter diagram, which records information about vertices, edges and faces, we just switch the letters V and F.

Here’s the story in detail.

Tetrahedron


The tetrahedron is its own dual, and its Coxeter diagram

V—3—E—3—F

doesn’t change when we switch the letters V and F. Remember, this diagram means that the tetrahedron has:

• 3 vertices and 3 edges around each face,
• 3 edges and 3 faces around each vertex.

Cube and Octahedron


The dual of the cube is the octahedron, and vice versa. The Coxeter diagram of the cube is:

V—4—E—3—F

because the cube has:

• 4 vertices and 4 edges around each face,
• 3 edges and 3 faces around each vertex.

On the other hand, the Coxeter diagram of the octahedron is:

V—3—E—4—F

because it has:

• 3 vertices and 3 edges around each face,
• 4 edges and 4 faces around each vertex.

If we switch the letters V and F in one of these Coxeter diagrams, we get the other one… drawn backwards, but that doesn’t count in this game.

Dodecahedron and Icosahedron


The dual of the dodecahedron is the icosahedron, and vice versa. The Coxeter diagram of the dodecahedron is:

V—5—E—3—F

because it has:

• 5 vertices and 5 edges around each face,
• 3 edges and 3 faces around each vertex.

The Coxeter diagram of the icosahedron is:

V—3—E—5—F

because it has:

• 3 vertices and 3 edges around each face,
• 5 edges and 5 faces around each vertex.

Again, you can get from either of these two Coxeter diagrams to the other by switching V and F. That’s duality.
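
If you like seeing such things in code, here's a tiny sketch in Python. Encoding the diagram V—p—E—q—F as the ordered pair (p, q) is just shorthand for this post, not standard notation:

# Taking the dual polyhedron switches V and F,
# which reverses the pair (p, q).

def dual(diagram):
    p, q = diagram
    return (q, p)

solids = {
    (3, 3): 'tetrahedron',
    (4, 3): 'cube',
    (3, 4): 'octahedron',
    (5, 3): 'dodecahedron',
    (3, 5): 'icosahedron',
}

for diagram, name in solids.items():
    print(name, '->', solids[dual(diagram)])

Running this prints each solid next to its dual: the tetrahedron pairs with itself, the cube with the octahedron, and the dodecahedron with the icosahedron.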

The numbers

But now let’s think a bit about a deeper pattern lurking around here.

Puzzle. If we take the Coxeter diagrams we’ve just seen:

V—3—E—5—F
V—3—E—4—F
V—3—E—3—F
V—4—E—3—F
V—5—E—3—F

and strip off everything but the numbers, we get these ordered pairs:

(3,5),   (3,4),   (3,3),   (4,3),   (5,3)

Why do these pairs and only these pairs give Platonic solids? I’ve listed them in a cute way just for fun, but that’s not the point.

There could be a number of perfectly correct ways to tackle this puzzle. But I have one in mind, so maybe I should give you a couple of clues to nudge you toward my way of thinking—though I’d be happy to hear other ways, too!

First, what about this pair?

(3,6)

Well, last time we looked at the corresponding Coxeter diagram:

V—3—E—6—F

and we saw it doesn’t come from a Platonic solid. Instead, it comes from this tiling of the plane:

What I’m looking for is an equation or something like that, which holds only for the pairs of numbers that give Platonic solids. And it should work for some good reason, not by coincidence!

One more clue. If your equation, or whatever it is, allows extra solutions like (2,n) or (n,2), don’t be discouraged! There are weird degenerate Platonic solids called hosohedra, with just two vertices, like this:

You can’t make the faces flat, but you can still draw it on a sphere, and in some ways that’s more important. The Coxeter diagram for this guy is:

V—2—E—3—F

And each hosohedron has a dual, called a dihedron, with just two faces, like this:

The Coxeter diagram for this is:

V—3—E—2—F

So, if your answer to the puzzle allows for hosohedra and dihedra, it’s not actually bad. As you proceed deeper and deeper into this subject, you realize more and more that hosohedra and dihedra are important, even though they’re not polyhedra in the usual sense.


Symmetry and the Fourth Dimension (Part 3)

22 July, 2012

So, where were we? I got a bit distracted by other things. So let me review, but also push ahead a bit further. This time we’ll see what a ‘Coxeter diagram’ is. Later we’ll use Coxeter diagrams as ways to describe lots of wonderful shapes.

Platonic solids and their Coxeter complexes

In Part 1 we started by looking at the five Platonic solids:

We saw that each Platonic solid has a bunch of symmetries where we reflect it across planes called mirrors, which all intersect at the center of the solid. If we take a sphere and slice it with these mirrors, it gets chopped up into triangles, and we get a pattern called a Coxeter complex.

Let’s pick a random Platonic solid and see how it works. For example, the dodecahedron:

gives this Coxeter complex:

In simple terms: we puff up our Platonic solid into a nice round sphere, and then chop its faces into triangles drawn on this sphere. That’s how it always works. And you’ll notice that each triangle has:

• one corner at the center of a face of our Platonic solid,

• one corner at the center of an edge of our Platonic solid,

• one corner at a vertex of our Platonic solid.

So any triangle determines a vertex, edge and face of our Platonic solid! But we don’t get just any old vertex, edge and face this way. The vertex has to lie on the edge, and the edge has to lie on the face.

That’s how it always works—check and see! So, let’s make up a definition:

Definition. Given a Platonic solid, a flag is a vertex, edge and face where the vertex lies on the edge and the edge lies on the face.

In fact, a Platonic solid always gives a Coxeter complex with one triangle for each flag.

Coxeter groups

Next, in Part 2, we looked at operations that flip these triangles around. I showed you how it works for the cube. I was too lazy to puff it up into a sphere, but I chopped its surface into triangles, one for each flag:

This is the lazy man’s way to draw the Coxeter complex of the cube.

Then I studied operations that flip triangles over to triangles that touch them. One operation, called V, changes which vertex of the cube our triangle contains. For example, if we start with this black triangle:

the operation V changes it to this blue one:

Another operation, called E, changes which edge of the cube our triangle touches:

And a third, called F, changes which face of the cube our triangle lies on:

Using these operations, we can get to any triangle starting from any other. But the cool part is that these operations obey some equations! For any Platonic solid, they always obey these equations:

V² = E² = F² = 1

These say flipping twice the same way gets you back where you started. But there are also three equations that are more interesting. Some of these depend on which Platonic solid we’re looking at. They can be seen using pretty pictures:

• For the cube we get this relation:

(VE)⁴ = 1

from this picture:

Get it? We flip our triangle to change which vertex of the cube it contains, then flip it to change which edge of the cube it touches… and after 8 flips we’re back to where we started, so:

VEVEVEVE = 1

or for short:

(VE)⁴ = 1

We get a 4 in the exponent here because each face of the cube has 4 edges and 4 vertices. So, you can easily work out how this relation goes for any Platonic solid.

• For the cube we also get this relation:

(VF)² = 1

from this picture:

We get a 2 in the exponent here because each edge of the cube contains 2 vertices and touches 2 faces. This is true no matter what Platonic solid we have!

• And finally, for the cube we get this relation:

(EF)³ = 1

from this picture:

We get a 3 in the exponent here because each vertex of the cube touches 3 edges and 3 faces. So, you can easily work out how this relation goes for any Platonic solid, too.

The operations V, E, F, together with these relations, generate something called the Coxeter group of the Platonic solid. To know exactly what I mean here you need to know a wee bit of group theory, which is the study of symmetry. But even if you don’t, I hope you get the idea: the Coxeter group consists of all the ways we can flip one triangle in our Coxeter complex over to another by a sequence of V, E, and F flips.

Coxeter diagrams

Coxeter diagrams start out being a cute way to record the equations we just saw. Later, they’ll become an amazingly powerful system for creating and classifying highly symmetrical structures: Platonic solids in all dimensions, the solids we get by truncating these in various ways, and more.

Today I’ll draw these diagrams a bit differently than the ways you usually see in books. That’s okay: there are lots of different ways to draw them, depending on what you’re using them for.

In my current way of doing things, the Coxeter diagram of the cube looks like this:

V—4—E—3—F

There’s an edge from the letter V to the letter E, labelled by the number 4. This means

(VE)⁴ = 1

And there’s an edge from the letter E to the letter F, labelled by the number 3. This means

(EF)³ = 1

But there’s no edge from the V to the F. This means

(VF)² = 1

In other words, we make up an extra rule: if an edge would be labelled by the number 2, we leave it invisible. The reason is that relations of this sort are ‘boring’: they show up a lot. For example, we’ve seen that (VF)² = 1 for all Platonic solids.

We also don’t bother to record these relations:

V² = E² = F² = 1

because these too are ‘boring’. We’ll assume relations of this sort always hold.
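
By the way, you don't have to take these relations on faith: you can check them numerically. Here's a little Python sketch using one reflection matrix per mirror. The particular normal vectors are my own choice; all that matters is that the mirrors meet at the angles the cube's diagram demands:

import numpy as np

# Normals of three mirrors of the cube's symmetry group, chosen so
# consecutive mirrors meet at 45° and 60°, and the outer two at 90°.
nV = np.array([0.0, 0.0, 1.0])
nE = np.array([0.0, 1.0, -1.0])
nF = np.array([1.0, -1.0, 0.0])

def reflection(v):
    """Matrix reflecting across the plane through 0 with normal v."""
    v = v / np.linalg.norm(v)
    return np.eye(3) - 2.0 * np.outer(v, v)

V, E, F = reflection(nV), reflection(nE), reflection(nF)
I = np.eye(3)

assert np.allclose(V @ V, I) and np.allclose(E @ E, I) and np.allclose(F @ F, I)
assert np.allclose(np.linalg.matrix_power(V @ E, 4), I)   # (VE)^4 = 1
assert np.allclose(np.linalg.matrix_power(E @ F, 3), I)   # (EF)^3 = 1
assert np.allclose(np.linalg.matrix_power(V @ F, 2), I)   # (VF)^2 = 1
print('all the cube relations hold')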

Now, I’ll leave you with a puzzle:

Puzzle 1. What are the Coxeter diagrams of the five Platonic solids? What patterns do you see in these?

Coxeter diagrams of regular tilings

But so you don’t think I’m wimping out, I’ll do an example myself. The triangular tiling of the plane is not a Platonic solid:

But it’s what you’d get if you tried to build a Platonic solid where 6 equilateral triangles meet at each vertex! And we can still define a Coxeter complex for it. It looks like this:

There’s one triangle here for each flag in the triangular tiling. And if you stare at this picture, you can read off the equations for the Coxeter group, just as we did for the cube. We have the boring equations

V² = E² = F² = 1

but also:

(VE)³ = 1

since each face of the triangular tiling has 3 vertices and 3 edges, and:

(VF)² = 1

since each edge of this tiling contains 2 vertices and touches 2 faces, and:

(EF)⁶ = 1

since each vertex of this tiling touches 6 edges and 6 faces.

So, here’s the Coxeter diagram of the triangular tiling:

V—3—E—6—F

Whoops, now I can’t resist giving you another puzzle! There are two more regular tilings of the plane, so you should try those:

Puzzle 2. What are the Coxeter diagrams of the square tiling:

and the hexagonal tiling:

?


Network Theory (Part 19)

18 July, 2012

joint with Jacob Biamonte

It’s time to resume the network theory series! We’re writing a little book based on some of these posts, so we want to finish up our discussion of stochastic Petri nets and chemical reaction networks. But it’s been a long time, so we’ll assume you forgot everything we’ve said before, and make this post as self-contained as possible.

Last time we started looking at a simple example: a diatomic gas.

A diatomic molecule of this gas can break apart into two atoms:

A_2 \to A + A

and conversely, two atoms can combine to form a diatomic molecule:

A + A \to A_2

We can draw both these reactions using a chemical reaction network:

where we’re writing B instead of A_2 to abstract away some detail that’s just distracting here.

Last time we looked at the rate equation for this chemical reaction network, and found equilibrium solutions of that equation. Now let’s look at the master equation, and find equilibrium solutions of that. This will serve as a review of three big theorems.

The master equation

We’ll start from scratch. The master equation is all about how atoms or molecules or rabbits or wolves or other things interact randomly and turn into other things. So, let’s write \psi_{m,n} for the probability that we have m atoms of A and n molecules of B in our container. These probabilities are functions of time, and the master equation will say how they change.

First we need to pick a rate constant for each reaction. Let’s say the rate constant for this reaction is some number \alpha > 0:

B \to A + A

while the rate constant for the reverse reaction is some number \beta > 0:

A + A \to B

Before we make it pretty using the ideas we’ve been explaining all along, the master equation says

\begin{array}{ccr} \displaystyle{ \frac{d}{d t} \psi_{m,n} (t)} &=&  \alpha (n+1) \, \psi_{m-2,n+1}(t) \\ \\   &&- \; \alpha n \, \psi_{m,n}(t) \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1}(t) \\ \\ && - \;\beta m(m-1) \, \psi_{m,n}(t) \\ \\   \end{array}

where we define \psi_{i,j} to be zero if either i or j is negative.

Yuck!

Normally we don’t show you such nasty equations. Indeed the whole point of our work has been to demonstrate that by packaging the equations in a better way, we can understand them using high-level concepts instead of mucking around with millions of scribbled symbols. But we thought we’d show you what’s secretly lying behind our beautiful abstract formalism, just once.

Each term has a meaning. For example, the third one:

\beta (m+2)(m+1)\psi_{m+2,n-1}(t)

means that the reaction A + A \to B will tend to increase the probability of there being m atoms of A and n molecules of B if we start with m+2 atoms of A and n-1 molecules of B. This reaction can happen in (m+2)(m+1) ways. And it happens at a probabilistic rate proportional to the rate constant for this reaction, \beta.

We won’t go through the rest of the terms. It’s a good exercise to do so, but there could easily be a typo in the formula, since it’s so long and messy. So let us know if you find one!
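
If you'd rather let a computer check the bookkeeping, here's a minimal Python sketch that implements this equation on a truncated grid of states and integrates it. The rate constants and initial state are made up for illustration; since (as we'll see below) the reactions conserve m + 2n, starting with 10 As and 5 Bs keeps all the probability safely inside the grid:

import numpy as np
from scipy.integrate import solve_ivp

alpha, beta = 1.0, 2.0   # made-up rate constants
M, N = 30, 15            # truncate at m < M atoms of A, n < N molecules of B

def master_rhs(t, psi_flat):
    psi = psi_flat.reshape(M, N)
    def p(m, n):  # psi_{m,n}, defined to be zero outside the grid
        return psi[m, n] if 0 <= m < M and 0 <= n < N else 0.0
    d = np.zeros((M, N))
    for m in range(M):
        for n in range(N):
            d[m, n] = (alpha * (n + 1) * p(m - 2, n + 1)
                       - alpha * n * p(m, n)
                       + beta * (m + 2) * (m + 1) * p(m + 2, n - 1)
                       - beta * m * (m - 1) * p(m, n))
    return d.flatten()

psi0 = np.zeros((M, N))
psi0[10, 5] = 1.0        # start with 10 As and 5 Bs for sure
sol = solve_ivp(master_rhs, (0.0, 10.0), psi0.flatten(), method='LSODA')
print('total probability:', sol.y[:, -1].sum())   # ≈ 1: probability is conserved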

To simplify this mess, the key trick is to introduce a generating function that summarizes all the probabilities in a single power series:

\Psi = \sum_{m,n \ge 0} \psi_{m,n} y^m \, z^n

It’s a power series in two variables, y and z, since we have two chemical species: As and Bs.

Using this trick, the master equation looks like

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

where the Hamiltonian H is a sum of terms, one for each reaction. This Hamiltonian is built from operators that annihilate and create As and Bs. The annihilation and creation operators for A atoms are:

\displaystyle{ a = \frac{\partial}{\partial y} , \qquad a^\dagger = y }

The annihilation operator differentiates our power series with respect to the variable y. The creation operator multiplies it by that variable. Similarly, the annihilation and creation operators for B molecules are:

\displaystyle{ b = \frac{\partial}{\partial z} , \qquad b^\dagger = z }

In Part 8 we explained a recipe that lets us stare at our chemical reaction network and write down this Hamiltonian:

H = \alpha ({a^\dagger}^2 b - b^\dagger b) + \beta (b^\dagger a^2 - {a^\dagger}^2 a^2)

As promised, there’s one term for each reaction. But each term is itself a sum of two: one that increases the probability that our container of chemicals will be in a new state, and another that decreases the probability that it’s in its original state. We get a total of four terms, which correspond to the four terms in our previous way of writing the master equation.

Puzzle 1. Show that this new way of writing the master equation is equivalent to the previous one.

Equilibrium solutions

Now we will look for all equilibrium solutions of the master equation: in other words, solutions that don’t change with time. So, we’re trying to solve

H \Psi = 0

Given the rather complicated form of the Hamiltonian, this seems tough. The challenge looks more concrete but even more scary if we go back to our original formulation. We’re looking for probabilities \psi_{m,n}, nonnegative numbers that sum to one, such that

\begin{array}{ccr} 0 &=&  \alpha (n+1) \, \psi_{m-2,n+1} \\ \\   &&- \; \alpha n \, \psi_{m,n} \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\   \end{array}

for all m and n.

This equation is horrid! But the good news is that it’s linear, so a linear combination of solutions is again a solution. This lets us simplify the problem using a conserved quantity.

Clearly, there’s a quantity that the reactions here don’t change:

What’s that? It’s the number of As plus twice the number of Bs. After all, a B can turn into two As, or vice versa.

(Of course the secret reason is that B is a diatomic molecule made of two As. But you’d be able to follow the logic here even if you didn’t know that, just by looking at the chemical reaction network… and sometimes this more abstract approach is handy! Indeed, the way chemists first discovered that certain molecules are made of certain atoms is by seeing which reactions were possible and which weren’t.)

Suppose we start in a situation where we know for sure that the number of As plus twice the number of Bs equals some number k:

\psi_{m,n} = 0  \;  \textrm{unless} \; m+2n = k

Then we know \Psi is initially of the form

\displaystyle{ \Psi = \sum_{m+2n = k} \psi_{m,n} \, y^m z^n  }

But since the number of As plus twice the number of Bs is conserved, if \Psi obeys the master equation it will continue to be of this form!

Put a fancier way, we know that if a solution of the master equation starts in this subspace:

\displaystyle{ L_k =  \{ \Psi: \; \Psi = \sum_{m+2n = k} \psi_{m,n} y^m z^n \; \textrm{for some} \; \psi_{m,n} \}  }

it will stay in this subspace. So, because the master equation is linear, we can take any solution \Psi and write it as a linear combination of solutions \Psi_k, one in each subspace L_k.

In particular, we can do this for an equilibrium solution \Psi. And then all the solutions \Psi_k are also equilibrium solutions: they’re linearly independent, so if one of them changed with time, \Psi would too.

This means we can just look for equilibrium solutions in the subspaces L_k. If we find these, we can get all equilibrium solutions by taking linear combinations.

Once we’ve noticed that, our horrid equation makes a bit more sense:

\begin{array}{ccr} 0 &=&  \alpha (n+1) \, \psi_{m-2,n+1} \\ \\   &&- \; \alpha n \, \psi_{m,n} \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\   \end{array}

Note that if the pair of subscripts m, n obey m + 2n = k, the same is true for the other pairs of subscripts here! So our equation relates the values of \psi_{m,n} for all the points (m,n) with integer coordinates lying on this line segment:

m+2n = k , \qquad m ,n \ge 0

You should be visualizing something like this:

If you think about it a minute, you’ll see that if we know \psi_{m,n} at two points on such a line, we can keep using our equation to recursively work out all the rest. So, there are at most two linearly independent equilibrium solutions of the master equation in each subspace L_k.

Why at most two? Why not two? Well, we have to be a bit careful about what happens at the ends of the line segment: remember that \psi_{m,n} is defined to be zero when m or n becomes negative. If we think very hard about this, we’ll see there’s just one linearly independent equilibrium solution of the master equation in each subspace L_k. But this is the sort of nitty-gritty calculation that’s not fun to watch someone else do, so we won’t bore you with that.

Soon we’ll move on to a more high-level approach to this problem. But first, one remark. Our horrid equation is like a fancy version of the usual discretized form of the equation

\displaystyle {\frac{d^2 \psi}{d x^2} = 0 }

namely:

\psi_{n-1} - 2 \psi_{n} + \psi_{n+1} = 0

And this makes sense, since we get

\displaystyle {\frac{d^2 \psi}{d x^2} = 0 }

by taking the heat equation:

\displaystyle \frac{\partial \psi}{\partial t} = {\frac{\partial^2 \psi}{\partial x^2} }

and assuming \psi doesn’t depend on time. So what we’re doing is a lot like looking for equilibrium solutions of the heat equation.

The heat equation describes how heat smears out as little particles of heat randomly move around. True, there don’t really exist ‘little particles of heat’, but this equation also describes the diffusion of any other kind of particles as they randomly move around undergoing Brownian motion. Similarly, our master equation describes a random walk on this line segment:

m+2n = k , \qquad m , n \ge 0

or more precisely, the points on this segment with integer coordinates. The equilibrium solutions arise when the probabilities \psi_{m,n} have diffused as much as possible.

If you think about it this way, it should be physically obvious that there’s just one linearly independent equilibrium solution of the master equation for each value of k.

There’s a general moral here, too, which we’re seeing in a special case: the master equation for a chemical reaction network really describes a bunch of random walks, one for each allowed value of the conserved quantities that can be built as linear combinations of number operators. In our case we have one such conserved quantity, but in general there may be more (or none).

Furthermore, these ‘random walks’ are what we’ve been calling Markov processes.

Noether’s theorem

We simplified our task of finding equilibrium solutions of the master equation by finding a conserved quantity. The idea of simplifying problems using conserved quantities is fundamental to physics: this is why physicists are so enamored with quantities like energy, momentum, angular momentum and so on.

Nowadays physicists often use ‘Noether’s theorem’ to get conserved quantities from symmetries. There’s a very simple version of Noether’s theorem for quantum mechanics, but in Part 11 we saw a version for stochastic mechanics, and it’s that version that is relevant now. Here’s a paper which explains it in detail:

• John Baez and Brendan Fong, Noether’s theorem for Markov processes.

We don’t really need Noether’s theorem now, since we found the conserved quantity and exploited it without even noticing the symmetry. Nonetheless it’s interesting to see how it relates to what we’re doing.

For the reaction we’re looking at now, the idea is that the subspaces L_k are eigenspaces of an operator that commutes with the Hamiltonian H. It follows from standard math that a solution of the master equation that starts in one of these subspaces, stays in that subspace.

What is this operator? It’s built from ‘number operators’. The number operator for As is

N_A = a^\dagger a

and the number operator for Bs is

N_B = b^\dagger b

A little calculation shows

N_A \,y^m z^n = m \, y^m z^n, \quad \qquad  N_B\, y^m z^n = n \,y^m z^n

so the eigenvalue of N_A is the number of As, while the eigenvalue of N_B is the number of Bs. This is why they’re called number operators.

As a consequence, the eigenvalue of the operator N_A + 2N_B is the number of As plus twice the number of Bs:

(N_A + 2N_B) \, y^m z^n = (m + 2n) \, y^m z^n

Let’s call this operator O, since it’s so important:

O = N_A + 2N_B

If you think about it, the spaces L_k we saw a minute ago are precisely the eigenspaces of this operator:

L_k = \{ \Psi : \; O \Psi = k \Psi \}
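
Here's a quick symbolic check of these eigenvalue equations, with the number operators implemented directly as ‘multiply and differentiate’. The sample exponents are arbitrary:

import sympy as sp

y, z = sp.symbols('y z')
m, n = 4, 7   # any sample exponents will do

def N_A(Psi):   # a† a = y ∂/∂y
    return y * sp.diff(Psi, y)

def N_B(Psi):   # b† b = z ∂/∂z
    return z * sp.diff(Psi, z)

Psi = y**m * z**n
assert sp.simplify(N_A(Psi) - m * Psi) == 0
assert sp.simplify(N_B(Psi) - n * Psi) == 0
assert sp.simplify(N_A(Psi) + 2 * N_B(Psi) - (m + 2*n) * Psi) == 0
print('O y^m z^n = (m + 2n) y^m z^n, as claimed')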

As we’ve seen, solutions of the master equation that start in one of these eigenspaces will stay there. This lets us take some techniques that are very familiar in quantum mechanics, and apply them to this stochastic situation.

First of all, time evolution as described by the master equation is given by the operators \exp(t H). In other words,

\displaystyle{ \frac{d}{d t} \Psi(t) } = H \Psi(t) \; \textrm{and} \;  \Psi(0) = \Phi \quad  \Rightarrow \quad \Psi(t) = \exp(t H) \Phi

But if you start in some eigenspace of O, you stay there. Thus if \Phi is an eigenvector of O, so is \exp(t H) \Phi, with the same eigenvalue. In other words,

O \Phi = k \Phi

implies

O \exp(t H) \Phi = k \exp(t H) \Phi = \exp(t H) O \Phi

But since we can choose a basis consisting of eigenvectors of O, we must have

O \exp(t H) = \exp(t H) O

or, throwing caution to the winds and differentiating:

O H = H O

So, as we’d expect from Noether’s theorem, our conserved quantity commutes with the Hamiltonian! This in turn implies that H commutes with any polynomial in O, which in turn suggests that

\exp(s O) H = H \exp(s O)

and also

\exp(s O) \exp(t H) = \exp(t H) \exp(s O)

The last equation says that O generates a 1-parameter family of ‘symmetries’: operators \exp(s O) that commute with time evolution. But what do these symmetries actually do? Since

O y^m z^n = (m + 2n) y^m z^n

we have

\exp(s O) y^m z^n = e^{s(m + 2n)}\, y^m z^n

So, this symmetry takes any probability distribution \psi_{m,n} and multiplies it by e^{s(m + 2n)}.

In other words, our symmetry multiplies the relative probability of finding our container of gas in a given state by a factor of e^s for each A atom, and by a factor of e^{2s} for each B molecule. It might not seem obvious that this operation commutes with time evolution! However, experts on chemical reaction theory are familiar with this fact.

Finally, a couple of technical points. Starting where we said “throwing caution to the winds”, our treatment has not been rigorous, since O and H are unbounded operators, and these must be handled with caution. Nonetheless, all the commutation relations we wrote down are true.

The operators \exp(s O) are unbounded for positive s. They’re bounded for negative s, so they give a one-parameter semigroup of bounded operators. But they’re not stochastic operators: even for s negative, they don’t map probability distributions to probability distributions. However, they do map any nonzero vector \Psi with \psi_{m,n} \ge 0 to a vector \exp(s O) \Psi with the same properties. So, we can just normalize this vector and get a probability distribution. The need for this normalization is why we spoke of relative probabilities.

The Anderson–Craciun–Kurtz theorem

Now we’ll actually find all equilibrium solutions of the master equation in closed form. To understand this final section, you really do need to remember some things we’ve discussed earlier. Last time we considered the same chemical reaction network we’re studying today, but we looked at its rate equation, which looks like this:

\displaystyle{ \frac{d}{d t} x_1 =  2 \alpha x_2 - 2 \beta x_1^2}

\displaystyle{ \frac{d}{d t} x_2 = - \alpha x_2 + \beta x_1^2 }

This describes how the number of As and Bs changes in the limit where there are lots of them and we can treat them as varying continuously, in a deterministic way. The number of As is x_1, and the number of Bs is x_2.

We saw that the quantity

x_1 + 2 x_2

is conserved, just as today we’ve seen that N_A + 2 N_B is conserved. We saw that the rate equation has one equilibrium solution for each choice of x_1 + 2 x_2. And we saw that these equilibrium solutions obey

\displaystyle{ \frac{x_1^2}{x_2} = \frac{\alpha}{\beta} }

The Anderson–Craciun–Kurtz theorem, introduced in Part 9, is a powerful result that gets equilibrium solutions of the master equation from equilibrium solutions of the rate equation. It only applies to equilibrium solutions that are ‘complex balanced’, but that’s okay:

Puzzle 2. Show that the equilibrium solutions of the rate equation for the chemical reaction network

are complex balanced.

So, given any equilibrium solution (x_1,x_2) of our rate equation, we can hit it with the Anderson–Craciun–Kurtz theorem and get an equilibrium solution of the master equation! And it looks like this:

\displaystyle{  \Psi = e^{-(x_1 + x_2)} \, \sum_{m,n \ge 0} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }

In this solution, the probability distribution

\displaystyle{ \psi_{m,n} = e^{-(x_1 + x_2)} \,  \frac{x_1^m x_2^n} {m! n! } }

is a product of Poisson distributions. The factor in front is there to make the numbers \psi_{m,n} add up to one. And remember, x_1, x_2 are any nonnegative numbers with

\displaystyle{ \frac{x_1^2}{x_2} = \frac{\alpha}{\beta} }

So from all we’ve said, the above formula is an explicit closed-form solution of the horrid equation

\begin{array}{ccr} 0 &=&  \alpha (n+1) \, \psi_{m-2,n+1} \\ \\   &&- \; \alpha n \, \psi_{m,n} \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\   \end{array}

That’s pretty nice. We found some solutions without ever doing any nasty calculations.
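
In fact, if you don't want to take our word for it, a few lines of Python will confirm that the Poisson product really does make all four terms cancel. The rate constants and x_1 here are arbitrary choices; x_2 is then forced by the equilibrium condition:

from math import exp, factorial

alpha, beta = 1.0, 2.0
x1 = 1.5                     # any nonnegative number
x2 = beta * x1**2 / alpha    # chosen so that x1^2 / x2 = alpha / beta

def psi(m, n):
    if m < 0 or n < 0:
        return 0.0
    return exp(-(x1 + x2)) * x1**m * x2**n / (factorial(m) * factorial(n))

for m in range(20):
    for n in range(20):
        residual = (alpha * (n + 1) * psi(m - 2, n + 1)
                    - alpha * n * psi(m, n)
                    + beta * (m + 2) * (m + 1) * psi(m + 2, n - 1)
                    - beta * m * (m - 1) * psi(m, n))
        assert abs(residual) < 1e-12
print('the product of Poisson distributions is an equilibrium solution')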

But we’ve really done better than getting some equilibrium solutions of the master equation. By restricting attention to n,m with m+2n = k, our formula for \psi_{m,n} gives an equilibrium solution that lives in the eigenspace L_k:

\displaystyle{  \Psi_k = e^{-(x_1 + x_2)} \, \sum_{m+2n =k} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }

And by what we’ve said, linear combinations of these give all equilibrium solutions of the master equation.

And we got them with very little work! Despite all the fancy talk in today’s post, we essentially just took the equilibrium solutions of the rate equation and plugged them into a straightforward formula to get equilibrium solutions of the master equation. This is why the Anderson–Craciun–Kurtz theorem is so nice. And of course we’re looking at a very simple reaction network: for more complicated ones it becomes even better to use this theorem to avoid painful calculations.

We could go further. For example, we could study nonequilibrium solutions using Feynman diagrams like this:

But instead, we will leave off with two more puzzles. We introduced some symmetries, but we haven’t really explored them yet:

Puzzle 3. What do the symmetries associated to the conserved quantity O do to the equilibrium solutions of the master equation given by

\displaystyle{  \Psi = e^{-(x_1 + x_2)} \, \sum_{m,n \ge 0} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }

where (x_1,x_2) is an equilibrium solution of the rate equation? In other words, what is the significance of the one-parameter family of solutions

\exp(s O) \Psi ?

Also, we used a conceptual argument to check that H commutes with O, but it’s good to know that we can check this sort of thing directly:

Puzzle 4. Compute the commutator

[H, O] = H O - O H

and show it vanishes.


The Mathematics of Biodiversity (Part 5)

3 July, 2012

I’d be happy to get your feedback on these slides of the talk I’m giving the day after tomorrow:

• John Baez, Diversity, entropy and thermodynamics, 6 July 2012, Exploratory Conference on the Mathematics of Biodiversity, Centre de Recerca Matemàtica, Barcelona.

Abstract: As is well known, some popular measures of biodiversity are formally identical to measures of entropy developed by Shannon, Rényi and others. This fact is part of a larger analogy between thermodynamics and the mathematics of biodiversity, which we explore here. Any probability distribution can be extended to a 1-parameter family of probability distributions where the parameter has the physical meaning of ‘temperature’. This allows us to introduce thermodynamic concepts such as energy, entropy, free energy and the partition function in any situation where a probability distribution is present—for example, the probability distribution describing the relative abundances of different species in an ecosystem. The Rényi entropy of this probability distribution is closely related to the change in free energy with temperature. We give one application of thermodynamic ideas to population dynamics, coming from the work of Marc Harper: as a population approaches an ‘evolutionary optimum’, the amount of Shannon information it has ‘left to learn’ is nonincreasing. This fact is closely related to the Second Law of Thermodynamics.

This talk is rather different than the one I’d envisaged giving! There was a lot of interest in my work on Rényi entropy and thermodynamics, because Rényi entropies—and their exponentials, called the Hill numbers—are an important measure of biodiversity. So, I decided to spend a lot of time talking about that.


The Mathematics of Biodiversity (Part 4)

2 July, 2012


Today the conference part of this program is starting:

Research Program on the Mathematics of Biodiversity, June-July 2012, Centre de Recerca Matemàtica, Barcelona, Spain. Organized by Ben Allen, Silvia Cuadrado, Tom Leinster, Richard Reeve and John Woolliams.

Lou Jost kicked off the proceedings with an impassioned call to think harder about fundamental concepts:

• Lou Jost, Why biologists should care about the mathematics of biodiversity.

Then Tom Leinster gave an introduction to some of these concepts, and Lou explained how they show up in ecology, genetics, economics and physics.

Suppose we have n different species on an island. Suppose a fraction p_i of the organisms belong to the ith species. So,

\displaystyle{ \sum_{i=1}^n p_i = 1}

and mathematically we can treat these numbers as probabilities.

People have many ways to compute the ‘biodiversity’ from these numbers. Some of these can be wildly misleading when applied incorrectly, and this has led to shocking errors. For example, in genetics, a commonly used formula for determining when plants or animals on a bunch of islands will split into separate species is completely wrong.

In fact, if we’re not careful, some measures of biodiversity can fool us into thinking we’re saving most of the biodiversity when we’re actually losing almost all of it!

One good example involves measures of similarity between tropical butterflies in the canopy (the top of the forest) and the understory (the bottom). According to Lou Jost, some published studies say the similarity is about 95%. That sounds like the two communities are almost the same. However, almost no butterflies living in the canopy live in the understory, and vice versa! The problem is that mathematics is being used inappropriately.

Here are four famous measures of biodiversity:

• Species richness. This is just the number of species:

n

• Shannon entropy. This is the expected amount of information you gain when someone tells you which species an organism belongs to:

\displaystyle{ - \sum_{i=1}^n p_i \ln(p_i) }

• The inverse Simpson index. This is the reciprocal of the probability that two randomly chosen organisms belong to the same species:

\displaystyle{ 1 \big/ \sum_{i=1}^n p_i^2 }

The probability that two organisms belong to the same species is called the Simpson index:

\displaystyle{ \sum_{i=1}^n p_i^2 }

This is used in economics as a measure of the concentration of wealth, where p_i is the fraction of wealth owned by the ith individual. Be careful: there’s a lot of different jargon in different fields, so it’s easy to get confused at first! For example, the probability that two organisms belong to different species is often called the Gini–Simpson index:

\displaystyle{ 1 - \sum_{i=1}^n p_i^2 }

It was introduced by the statistician Corrado Gini a century ago, in 1912, and by the ecologist Edward H. Simpson in 1949. It’s also called the heterozygosity in genetics.

• The Berger–Parker index. This is the fraction of organisms that belong to the most common species:

\mathrm{max} \, p_i

So, unlike the other measures I’ve listed, this quantity tends to go down when biodiversity goes up. To fix this we could take its reciprocal, as we did with the Simpson index.

What a mess, eh? But here’s some good news: all these quantities are functions of a single quantity, the Rényi entropy:

\displaystyle{ H_q(p) = \frac{1}{1 -q} \ln \sum_{i=1}^n p_i^q  }

for various values of the parameter q.

I’ve written about the Rényi entropies and their role in thermodynamics before on this blog. I’ll also talk about it later in this conference, and I’ll show you my slides. So, I won’t repeat that story here. Suffice it to say that Rényi entropies are fascinating but still a bit mysterious to me.

But one of Lou Jost’s main points is that we can make bad mistakes if we work with Rényi entropies when we should be working with their exponentials, which are called Hill numbers and denoted by a D, for ‘diversity’:

\displaystyle{ {}^qD(p) = e^{H_q(p)} =   \left(\sum_{i=1}^n p_i^q \right)^{\frac{1}{1-q}}  }

These were introduced by M. O. Hill in 1973. One reason they’re good is that they are effective numbers. This means that if all the species are equally common, the Hill number equals the number of species, regardless of q:

p_i = \frac{1}{n} \; \Longrightarrow \; {}^qD(p) = n

So, they’re a way of measuring an ‘effective’ number of species in situations where species are not all equally common.

A closely related fact is that the Hill numbers obey the replication principle. This means that if we have probability distributions on two finite sets, each with Hill number X for some choice of q, and we combine them with equal weights to get a probability distribution on the disjoint union of those sets, the resulting distribution has Hill number 2X.

Another good fact is that the Hill numbers are as large as possible when all the probabilities p_i are equal. They’re as small as possible, namely 1, when one of the p_i equals 1 and the rest are zero.

Let’s see how all the measures of biodiversity I listed are either Hill numbers or can easily be converted to Hill numbers. We’ll also see that at q = 0, the Hill number treats all species that are present in an equal way, regardless of their abundance. As q increases, it counts more abundant species more heavily, since we’re raising the probabilities p_i to a bigger power. And when q = \infty, we only care about the most abundant species: none of the others matter at all!

Here goes:

• The species richness is the limit of the Hill numbers as q \to 0 from above:

\displaystyle{ \lim_{q \to 0^+} {}^qD (p) = n }

So, we can just call this {}^0D(p).

• The exponential of the Shannon entropy is the limit of the Hill numbers as q \to 1:

\displaystyle{ \lim_{q \to 1} {}^qD(p) = \exp\left(- \sum_{i=1}^n p_i \ln(p_i)\right) }

So, we can just call this {}^1D(p).

• The inverse Simpson index is the Hill number at q = 2:

\displaystyle{  {}^2D(p) =  1 \big/ \sum_{i=1}^n p_i^2 }

• The reciprocal of the Berger–Parker index is the limit of Hill numbers as q \to +\infty:

\displaystyle{ \lim_{q \to +\infty} {}^qD(p) = 1 \big/ \mathrm{max} \, p_i }

so we can call this quantity {}^\infty D(p).
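
If you want to experiment with these facts, here's a small Python sketch of the Hill numbers, with the limits q = 1 and q = ∞ handled separately. The abundance distribution at the end is made up:

import numpy as np

def hill(p, q):
    """Hill number {}^qD(p) of an abundance distribution p."""
    p = np.array([x for x in p if x > 0], dtype=float)   # species present
    if q == 1:
        return np.exp(-np.sum(p * np.log(p)))   # exp of Shannon entropy
    if q == np.inf:
        return 1.0 / p.max()                    # reciprocal Berger–Parker index
    return np.sum(p**q) ** (1.0 / (1.0 - q))

p = [0.5, 0.3, 0.1, 0.1]
print(hill(p, 0))            # species richness: 4
print(hill(p, 2))            # inverse Simpson index: 1/0.36
print(hill(p, np.inf))       # 1 / max p_i = 2
print(hill([0.25] * 4, 7))   # 4: uniform distributions give n for every q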

These facts mean that understanding Hill numbers will help us understand lots of measures of biodiversity! And the good properties of Hill numbers will help us avoid dangerous mistakes.

For mathematicians, a good challenge is to find theorems uniquely characterizing the Hill numbers… preferably with assumptions that biologists will accept as plausible facts about ‘diversity’. Some theorems like this already exist for specific choices of q, but it would be better to characterize the function {}^q D for all values of q in one blow. Tom Leinster is working on such a theorem now.

Another important task is to generalize Hill numbers to take into account things like:

• ‘distances’ between species, measured either genetically, phylogenetically or functionally,

• ‘values’ for species, measured economically or in any other way.

There’s a lot of work on this, and many of the talks at this conference will discuss these generalizations.


The Mathematics of Biodiversity (Part 3)

27 June, 2012

We tend to think of biodiversity as a good thing, but sometimes it’s deadly. Yesterday Andrei Korobeinikov gave a talk on ‘Viral evolution within a host’, which was mainly about AIDS.

The virus that causes this disease, HIV, can reproduce very fast. In an untreated patient near death there are between 10¹⁰ and 10¹² new virions per day! Remember, a virion is an individual virus particle. The virus also has a high mutation rate: about 3 × 10⁻⁵ mutations per generation for each base—that is, each molecule of A, T, C, or G in the RNA of the virus. That may not seem like a lot, but if you multiply it by 10¹² you’ll see that a huge number of new variations of each base arise within the body of a single patient.

So, evolution is at work within you as you die.

And in fact, many scientists believe that the diversity of the virus eventually overwhelms your immune system! Although it’s apparently not quite certain, it seems that while the body generates B cells and T cells to attack different variants of HIV as they arise, they eventually can’t keep up with the sheer number of variants.

Of course, the fact that the HIV virus attacks the immune system makes the disease even worse. Here in blue you see the number of T cells per cubic millimeter of blood, and in red you see the number of virions per cubic centimeter of blood for a typical untreated patient:

Mathematicians and physicists have looked at some very simple models to get a qualitative understanding of these issues. One famous paper that started this off is:

• Lev S. Tsimring, Herbert Levine and David A. Kessler, RNA virus evolution via a fitness-space model, Phys. Rev. Lett. 76 (1996), 4440–4443.

The idea here is to say that at any time t the viruses have a probability density p(r,t) of having fitness r. In fact the different genotypes of the virus form a cloud in a higher-dimensional space, but these authors treat that space as 1-dimensional, with fitness as its one coordinate, just to keep things simple. They then write down an equation for how the population density changes with time:

\displaystyle{\frac{\partial }{\partial t}p(r,t) = (r - \langle r \rangle)\, p(r,t) + D \frac{\partial^2 }{\partial r^2}p(r,t) - \frac{\partial}{\partial r}(v_{\mathrm{drift}}\, p(r,t)) }

This is a replication-mutation-drift equation. If we just had

\displaystyle{\frac{\partial }{\partial t}p(r,t) = (r - \langle r \rangle)\, p(r,t) }

this would be a version of the replicator equation, which I explained recently in Information Geometry (Part 9). Here

\displaystyle{ \langle r \rangle = \int_0^\infty r p(r,t) dr }

is the mean fitness, and the replicator equation says that the fraction of organisms of a given type grows at a rate proportional to how much their fitness exceeds the mean fitness: that’s where the (r - \langle r \rangle) comes from.

If we just had

\displaystyle{\frac{\partial }{\partial t}p(r,t) =  D \frac{\partial^2 }{\partial r^2}p(r,t) }

this would be the heat equation, which describes diffusion occurring at a rate D. This models the mutation of the virus, though not in a very realistic way.

If we just had

\displaystyle{\frac{\partial}{\partial t} p(r,t) =  - \frac{\partial}{\partial r}(v_{\mathrm{drift}} \, p(r,t)) }

the fitness of the virus would increase at a rate equal to the drift velocity v_{\mathrm{drift}}.

If we include both the diffusion and drift terms:

\displaystyle{\frac{\partial }{\partial t} p(r,t) =  D \frac{\partial^2 }{\partial r^2}p(r,t) - \frac{\partial}{\partial r}(v_{\mathrm{drift}} \, p(r,t)) }

we get the Fokker–Planck equation. This is a famous model of something that’s spreading while also drifting along at a constant velocity: for example, a drop of ink in moving water. Its solutions look like this:

Here we start with stuff concentrated at one point, and it spreads out into a Gaussian while drifting along.
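
In fact, if the stuff starts out concentrated at the point r = 0 and the drift velocity is constant, the solution being drawn here has a well-known closed form: a Gaussian whose center moves at the drift velocity while its width grows like the square root of time:

\displaystyle{ p(r,t) = \frac{1}{\sqrt{4 \pi D t}} \, \exp\left( -\frac{(r - v_{\mathrm{drift}} t)^2}{4 D t} \right) }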

By the way, watch out: what biologists call ‘genetic drift’ is actually a form of diffusion, not what physicists call ‘drift’.

More recently, people have looked at another very simple model. You can read about it here:

• Martin A. Nowak and R. M. May, Virus Dynamics, Oxford University Press, Oxford, 2000.

In this model the variables are:

• the number of healthy human cells of some type, \mathrm{H}(t)

• the number of infected human cells of that type, \mathrm{I}(t)

• the number of virions, \mathrm{V}(t)

These are my names for variables, not theirs. It’s just a sick joke that these letters spell out ‘HIV’.

Chemists like to describe how molecules react and turn into other molecules using ‘chemical reaction networks’. You’ve seen these if you’ve taken chemistry, but I’ve been explaining more about the math of these starting in Network Theory (Part 17). We can also use them here! Though May and Nowak probably didn’t put it this way, we can consider a chemical reaction network with the following 6 reactions:

• the production of a healthy cell:

\longrightarrow \mathrm{H}

• the infection of a healthy cell by a virion:

\mathrm{H} + \mathrm{V} \longrightarrow \mathrm{I}

• the production of a virion by an infected cell:

\mathrm{I} \longrightarrow \mathrm{I} + \mathrm{V}

• the death of a healthy cell:

\mathrm{H} \longrightarrow

• the death of an infected cell:

\mathrm{I} \longrightarrow

• the death of a virion:

\mathrm{V} \longrightarrow

Using a standard recipe which I explained, we can get from this chemical reaction network to some ‘rate equations’ saying how the number of healthy cells, infected cells and virions changes with time:

\displaystyle{ \frac{d\mathrm{H}}{dt}  =   \alpha - \beta \mathrm{H}\mathrm{V} - \gamma \mathrm{H} }

\displaystyle{ \frac{d\mathrm{I}}{dt}  = \beta \mathrm{H}\mathrm{V} - \delta \mathrm{I} }

\displaystyle{ \frac{d\mathrm{V}}{dt}  = - \beta \mathrm{H}\mathrm{V} + \epsilon \mathrm{I} - \zeta \mathrm{V} }

The Greek letters are constants called ‘rate constants’, and there’s one for each of the 6 reactions. The equations we get this way are exactly those described by Nowak and May!
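
If you want to see what these equations actually do, here's a minimal Python sketch that integrates them. The rate constants and initial conditions are invented for illustration, not fitted to any data:

import numpy as np
from scipy.integrate import solve_ivp

# one made-up rate constant per reaction, in the order listed above
alpha, beta, gamma, delta, epsilon, zeta = 10.0, 0.01, 0.1, 0.5, 5.0, 3.0

def rates(t, state):
    H, I, V = state
    return [alpha - beta*H*V - gamma*H,
            beta*H*V - delta*I,
            -beta*H*V + epsilon*I - zeta*V]

sol = solve_ivp(rates, (0.0, 100.0), [100.0, 0.0, 1.0])
H, I, V = sol.y[:, -1]
print(f'H = {H:.2f}, I = {I:.2f}, V = {V:.2f}')   # the late-time state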

What Andrei Korobeinikov is doing is unifying the ideas behind the two models I’ve described here. Alas, I don’t have the energy to explain how. Indeed, I don’t even have the energy to explain what the models I’ve described actually predict. Sad, but true.

I don’t see anything online about Korobeinikov’s new work, but you can read some of his earlier work here:

• Andrei Korobeinikov, Global properties of basic virus dynamics models.

• Suzanne M. O’Regan, Thomas C. Kelly, Andrei Korobeinikov, Michael J. A. O’Callaghan and Alexei V. Pokrovskii, Lyapunov functions for SIR and SIRS epidemic models, Appl. Math. Lett. 23 (2010), 446-448.

The SIR and SIRS models are models of disease that also arise from chemical reaction networks. I explained them back in Network Theory (Part 3). That was before I introduced the terminology of chemical reaction networks… back then I was talking about ‘stochastic Petri nets’, which are an entirely equivalent formalism. Here’s the stochastic Petri net for the SIRS model:

Puzzle: Draw the stochastic Petri net for the HIV model discussed above. It should have 3 yellow circles and 6 aqua squares.


Information Geometry (Part 13)

26 June, 2012

Last time I gave a sketchy overview of evolutionary game theory. Now let’s get serious.

I’ll start by explaining ‘Nash equilibria’ for 2-person games. These are situations where neither player can profit by changing what they’re doing. Then I’ll introduce ‘mixed strategies’, where the players can choose among several strategies with different probabilities. Then I’ll introduce evolutionary game theory, where we think of each strategy as a species, and its probability as the fraction of organisms that belong to that species.

Back in Part 9, I told you about the ‘replicator equation’, which says how these fractions change with time thanks to natural selection. Now we’ll see how this leads to the idea of an ‘evolutionarily stable strategy’. And finally, we’ll see that when evolution takes us toward such a stable strategy, the amount of information the organisms have ‘left to learn’ keeps decreasing!

Nash equilibria

We can describe a certain kind of two-person game using a payoff matrix, which is an n \times n matrix A_{ij} of real numbers. We think of A_{ij} as the payoff that either player gets if they choose strategy i and their opponent chooses strategy j.

Note that in this kind of game, there’s no significant difference between the ‘first player’ and the ‘second player’: either player wins an amount A_{ij} if they choose strategy i and their opponent chooses strategy j. So, this kind of game is called symmetric even though the matrix A_{ij} may not be symmetric. Indeed, it’s common for this matrix to be antisymmetric, meaning A_{ij} = - A_{ji}, since in this case what one player wins, the other loses. Games with this extra property are called zero-sum games. But we won’t limit ourselves to those!

We say a strategy i is a symmetric Nash equilibrium if

A_{ii} \ge A_{ji}

for all j. This means that if both players use strategy i, neither gains anything by switching to another strategy.

For example, suppose our matrix is

\left( \begin{array}{rr} -1 & -12 \\  0 & -3 \end{array} \right)

Then we’ve got the Prisoner’s Dilemma exactly as described last time! Here strategy 1 is cooperate and strategy 2 is defect. If a player cooperates and so does his opponent, he wins

A_{11} = -1

meaning he gets one month in jail. We include a minus sign because ‘winning a month in jail’ is not a good thing. If the player cooperates but his opponent defects, he gets a whole year in jail:

A_{12} = -12

If he defects but his opponent cooperates, he doesn’t go to jail at all:

A_{21} = 0

And if they both defect, they both get three months in jail:

A_{22} = -3

You can see that defecting is a Nash equilibrium, since

A_{22} \ge A_{12}

So, oddly, if our prisoners know game theory and believe Nash equilibria are best, they’ll both be worse off than if they cooperate and don’t betray each other.
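
Checking for symmetric Nash equilibria is easy to automate. Here's a quick Python sketch using the payoff matrix above:

import numpy as np

A = np.array([[-1, -12],
              [ 0,  -3]])   # the Prisoner's Dilemma payoffs

# strategy i is a symmetric Nash equilibrium if A[i,i] >= A[j,i] for all j
for i in range(len(A)):
    if all(A[i, i] >= A[j, i] for j in range(len(A))):
        print('strategy', i + 1, 'is a symmetric Nash equilibrium')

This prints only ‘strategy 2’: defecting is the lone symmetric Nash equilibrium, just as we found by hand.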


Nash equilibria for mixed strategies

So far we’ve been assuming that with 100% certainty, each player chooses one strategy i = 1,2,3,\dots, n. Since we’ll be considering more general strategies in a minute, let’s call these pure strategies.

Now let’s throw some probability theory into the stew! Let’s allow the players to pick different pure strategies with different probabilities. So, we define a mixed strategy to be a probability distribution on the set of pure strategies. In other words, it’s a list of n nonnegative numbers

p_i \ge 0

that sum to one:

\displaystyle{ \sum_{i=1}^n p_i = 1 }

Say I choose the mixed strategy p while you, my opponent, choose the mixed strategy q. Say our choices are made independently. Then the probability that I choose the pure strategy i while you choose j is

p_i q_j

so the expected value of my winnings is

\displaystyle{ \sum_{i,j = 1}^n p_i A_{ij} q_j }

or using vector notation

p \cdot A q

where the dot is the usual dot product on \mathbb{R}^n.

We can easily adapt the concept of Nash equilibrium to mixed strategies. A mixed strategy q is a symmetric Nash equilibrium if for any other mixed strategy p,

q \cdot A q \ge  p \cdot A q

This means that if both you and I are playing the mixed strategy q, I can’t improve my expected winnings by unilaterally switching to the mixed strategy p. And neither can you, because the game is symmetric!

If this were a course on game theory, I would now do some examples. But it’s not, so I’ll just send you to page 6 of Sandholm’s paper: he looks at some famous games like ‘hawks and doves’ and ‘rock paper scissors’.

Evolutionarily stable strategies

We’re finally ready to discuss evolutionarily stable strategies. To do this, let’s reinterpret the ‘pure strategies’ i = 1,2,3, \dots n as species. Here I don’t necessarily mean species in the classic biological sense: I just mean different kinds of self-replicating entities, or replicators. For example, they could be different alleles of the same gene.

Similarly, we’ll reinterpret the ‘mixed strategy’ p as describing a mixed population of replicators, where the fraction of replicators belonging to the ith species is p_i. These numbers are still probabilities: p_i is the probability that a randomly chosen replicator will belong to the ith species.

We’ll reinterpret the payoff matrix A_{ij} as a fitness matrix. In our earlier discussion of the replicator equation, we assumed that the population P_i of the ith species grew according to the replicator equation

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) P_i }

where the fitness function f_i is any smooth function of the populations of each kind of replicator.

But in evolutionary game theory it’s common to start by looking at a simple special case where

\displaystyle{f_i(P_1, \dots, P_n)   = \sum_{j=1}^n A_{ij} p_j }

where

\displaystyle{ p_j = \frac{P_j}{\sum_k P_k} }

is the fraction of replicators who belong to the jth species.

What does this mean? The idea is that we have a well-mixed population of game players—or replicators. Each one has its own pure strategy—or species. Each one randomly roams around and ‘plays games’ with each other replicator it meets. It gets to reproduce at a rate proportional to its expected winnings.

This is unrealistic in all sorts of ways, but it’s mathematically cute, and it’s been studied a lot, so it’s good to know about. Today I’ll explain evolutionarily stable strategies only in this special case. Later I’ll go back to the general case.

Suppose that we select a sample of replicators from the overall population. What is the mean fitness of the replicators in this sample? For this, we need to know the probability that a replicator from this sample belongs to the ith species. Say it’s q_i. Then the mean fitness of our sample is

\displaystyle{ \sum_{i,j=1}^n q_i A_{ij} p_j }

This is just a weighted average of the fitnesses in our earlier formula. But using the magic of vectors, we can write this sum as

q \cdot A p

We already saw this type of expression in the last section! It’s my expected winnings if I play the mixed strategy q and you play the mixed strategy p.

John Maynard Smith defined q to be an evolutionarily stable strategy if, when we add a small population of ‘invaders’ distributed according to any other probability distribution p, the original population is more fit than the invaders.

In simple terms: a small ‘invading’ population will do worse than the population as a whole.

Mathematically, this means:

q \cdot A ((1-\epsilon)q + \epsilon p) >  p \cdot A ((1-\epsilon)q + \epsilon p)

for all mixed strategies p \ne q and all sufficiently small \epsilon > 0. Here

(1-\epsilon)q + \epsilon p

is the population we get by replacing an \epsilon-sized portion of our original population by invaders.

Puzzle: Show that q is an evolutionarily stable strategy if and only if these two conditions hold for all mixed strategies p:

q \cdot A q \ge p \cdot A q

and also, for all q \ne p,

q \cdot A q = p \cdot A q \; \implies \; q \cdot A p > p \cdot A p

The first condition says that q is a symmetric Nash equilibrium. In other words, the invaders can’t on average be better playing against the original population than members of the original population are. The second says that if the invaders are just as good at playing against the original population, they must be worse at playing against each other! The combination of these conditions means the invaders won’t take over.
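
Here's a crude numerical spot-check of these two conditions, testing a candidate q against randomly sampled mixed strategies. The hawk–dove payoff matrix at the end is my own example, with V = 2 and C = 4, so the unique evolutionarily stable strategy mixes the two pure strategies equally:

import numpy as np

rng = np.random.default_rng(0)

def looks_like_ess(A, q, trials=10000, tol=1e-9):
    """Spot-check the two ESS conditions against random mixed strategies."""
    for _ in range(trials):
        p = rng.dirichlet(np.ones(len(q)))
        if np.allclose(p, q):
            continue
        if p @ A @ q > q @ A @ q + tol:
            return False            # q is not a symmetric Nash equilibrium
        if abs(p @ A @ q - q @ A @ q) < tol and not q @ A @ p > p @ A @ p:
            return False            # ties must do worse against each other
    return True

A = np.array([[-1.0, 2.0],          # hawk vs hawk, hawk vs dove
              [ 0.0, 1.0]])         # dove vs hawk, dove vs dove
print(looks_like_ess(A, np.array([0.5, 0.5])))   # True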

Again, I should do some examples… but instead I’ll refer you to page 9 of Sandholm’s paper, and also these course notes:

• Samuel Alizon and Daniel Cownden, Evolutionary games and evolutionarily stable strategies.

• Samuel Alizon and Daniel Cownden, Replicator dynamics.

The decrease of relative information

Now comes the punchline… but with a slight surprise twist at the end. Last time we let

P = (P_1, \dots , P_n)

be a population that evolves with time according to the replicator equation, and we let p be the corresponding probability distribution. We supposed q was some fixed probability distribution. We saw that the relative information

I(q,p) = \displaystyle{ \sum_i \ln \left(\frac{q_i}{ p_i }\right) q_i }

obeys

\displaystyle{ \frac{d}{dt} I(q,p) = (p - q) \cdot f(P) }

where f(P) is the vector of fitness functions. So, this relative information can never increase if

(p - q) \cdot f(P) \le 0

for all P.

We can adapt this to the special case we’re looking at now. Remember, right now we’re assuming

\displaystyle{f_i(P_1, \dots, P_n)   = \sum_{j=1}^n A_{ij} p_j }

so

f(P) = A p

Thus, the relative information will never increase if

(p - q) \cdot A p \le 0

or in other words,

q \cdot A p \ge p \cdot A p  \qquad \qquad \qquad \qquad \qquad \qquad  (1)

Now, this looks very similar to the conditions for an evolutionary stable strategy as stated in the Puzzle above. But it’s not the same! That’s the surprise twist.

Remember, the Puzzle says that q is an evolutionarily stable state if for all mixed strategies p we have

q \cdot A q \ge p \cdot A q  \qquad \qquad \qquad \qquad \qquad \qquad (2)

and also

q \cdot A q = p \cdot A q \; \implies \; q \cdot A p > p \cdot A p  \qquad \; (3)

Note that condition (1), the one we want, is neither condition (2) nor condition (3)! This drove me crazy for almost a day.

I kept thinking I’d made a mistake, like mixing up p and q somewhere. You’ve got to mind your p’s and q’s in this game!

But the solution turned out to be this. After Maynard Smith came up with his definition of ‘evolutionarily stable state’, another guy came up with a different definition:

• Bernhard Thomas, On evolutionarily stable sets, J. Math. Biology 22 (1985), 105–115.

For him, an evolutionarily stable strategy obeys

q \cdot A q \ge p \cdot A q  \qquad \qquad \qquad \qquad \qquad \qquad (2)

and also

q \cdot A p \ge p \cdot A p  \qquad \qquad \qquad \qquad \qquad \qquad  (1)

Condition (1) is stronger than condition (3), so he renamed Maynard Smith’s evolutionarily stable strategies weakly evolutionarily stable strategies. And condition (1) guarantees that the relative information I(q,p) can never increase. So, now we’re happy.

Except for one thing: why should we switch from Maynard Smith’s perfectly sensible concept of evolutionarily stable state to this new stronger one? I don’t really know, except that

• it’s not much stronger

and

• it lets us prove the theorem we want!

So, it’s a small mystery for me to mull over. If you have any good ideas, let me know.
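
To see the theorem in action, here's a final Python sketch. It runs the replicator equation for the same hawk–dove game as in the earlier sketch, whose mixed strategy q = (1/2, 1/2) satisfies Thomas's stronger condition (1), and checks that the relative information I(q,p) really is nonincreasing. The payoff matrix and initial population are my own example:

import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[-1.0, 2.0],
              [ 0.0, 1.0]])    # hawk-dove game; its ESS is q = (1/2, 1/2)
q = np.array([0.5, 0.5])

def replicator(t, p):
    f = A @ p                  # fitness of each strategy: f(P) = Ap
    return p * (f - p @ f)     # replicator equation for the fractions p_i

sol = solve_ivp(replicator, (0.0, 20.0), [0.9, 0.1], dense_output=True)
ps = sol.sol(np.linspace(0.0, 20.0, 200))
info = [np.sum(q * np.log(q / p)) for p in ps.T]   # I(q,p) along the flow
assert all(a >= b - 1e-6 for a, b in zip(info, info[1:]))
print('I(q,p) is nonincreasing along the replicator flow')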

