Network Theory (Part 10)

16 September, 2011

Last time Brendan showed us a proof of the Anderson-Craciun-Kurtz theorem on equilibrium states. Today we’ll look at an example that illustrates this theorem. This example brings up an interesting ‘paradox’—or at least a puzzle. Resolving this will get us ready to think about a version of Noether’s theorem relating conserved quantities and symmetries. Next time Brendan will state and prove a version of Noether’s theorem that applies to stochastic Petri nets.

A reversible reaction

In chemistry a type of atom, molecule, or ion is called a chemical species, or species for short. Since we’re applying our ideas to both chemistry and biology, it’s nice that ‘species’ is also used for a type of organism in biology. This stochastic Petri net describes the simplest reversible reaction of all, involving two species:

We have species 1 turning into species 2 with rate constant \beta, and species 2 turning back into species 1 with rate constant \gamma. So, the rate equation is:

\begin{array}{ccc} \displaystyle{ \frac{d x_1}{d t} } &=& -\beta x_1 + \gamma x_2  \\  \qquad \mathbf{} \\ \displaystyle{ \frac{d x_2}{d t} } &=&  \beta x_1 - \gamma x_2  \end{array}

where x_1 and x_2 are the amounts of species 1 and 2, respectively.
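If you like checking such things numerically, here's a quick sketch in Python that integrates this rate equation with Euler steps. The rate constants and initial amounts are made up for illustration, not taken from the text:

```python
# Euler integration of the rate equation
#   dx1/dt = -beta*x1 + gamma*x2,  dx2/dt = beta*x1 - gamma*x2
# with made-up rate constants and initial amounts.

beta, gamma = 2.0, 1.0
x1, x2 = 5.0, 1.0          # initial amounts of species 1 and 2

dt = 0.001
for _ in range(20000):     # integrate out to t = 20
    dx1 = -beta * x1 + gamma * x2
    x1, x2 = x1 + dx1 * dt, x2 - dx1 * dt   # note dx2/dt = -dx1/dt

# The total x1 + x2 is conserved, and in the long run the flows balance:
assert abs(x1 + x2 - 6.0) < 1e-9
assert abs(beta * x1 - gamma * x2) < 1e-6
```

The solution relaxes exponentially toward a state with \beta x_1 = \gamma x_2, which is the equilibrium condition we derive next.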

Equilibrium solutions of the rate equation

Let’s look for equilibrium solutions of the rate equation, meaning solutions where the amount of each species doesn’t change with time. Equilibrium occurs when each species is getting created at the same rate at which it’s getting destroyed.

So, let’s see when

\displaystyle{ \frac{d x_1}{d t} = \frac{d x_2}{d t} = 0 }

Clearly this happens precisely when

\beta x_1 = \gamma x_2

This says the rate at which 1’s are turning into 2’s equals the rate at which 2’s are turning back into 1’s. That makes perfect sense.

Complex balanced equilibrium

In general, a chemical reaction involves a bunch of species turning into a bunch of species. Since ‘bunch’ is not a very dignified term, a bunch of species is usually called a complex. We saw last time that it’s very interesting to study a strong version of equilibrium: complex balanced equilibrium, in which each complex is being created at the same rate at which it’s getting destroyed.

However, in the Petri net we’re studying today, all the complexes being produced or destroyed consist of a single species. In this situation, any equilibrium solution is automatically complex balanced. This is great, because it means we can apply the Anderson-Craciun-Kurtz theorem from last time! This says how to get from a complex balanced equilibrium solution of the rate equation to an equilibrium solution of the master equation.

First remember what the master equation says. Let \psi_{n_1, n_2}(t) be the probability that we have n_1 things of species 1 and n_2 things of species 2 at time t. We summarize all this information in a formal power series:

\Psi(t) = \sum_{n_1, n_2 = 0}^\infty \psi_{n_1, n_2}(t) z_1^{n_1} z_2^{n_2}

Then the master equation says

\frac{d}{d t} \Psi(t) = H \Psi (t)

where following the general rules laid down in Part 8,

\begin{array}{ccl} H &=& \beta (a_2^\dagger - a_1^\dagger) a_1 + \gamma (a_1^\dagger - a_2^\dagger )a_2 \end{array}

This may look scary, but the annihilation operator a_i and the creation operator a_i^\dagger are just funny ways of writing the partial derivative \partial/\partial z_i and multiplication by z_i, so

\begin{array}{ccl} H &=& \displaystyle{ \beta (z_2 - z_1) \frac{\partial}{\partial z_1} + \gamma (z_1 - z_2) \frac{\partial}{\partial z_2} } \end{array}

or if you prefer,

\begin{array}{ccl} H &=& \displaystyle{ (z_2 - z_1) \, (\beta \frac{\partial}{\partial z_1} - \gamma \frac{\partial}{\partial z_2}) } \end{array}

The first term describes species 1 turning into species 2. The second describes species 2 turning back into species 1.

Now, the Anderson-Craciun-Kurtz theorem says that whenever (x_1,x_2) is a complex balanced solution of the rate equation, this recipe gives an equilibrium solution of the master equation:

\displaystyle{ \Psi = \frac{e^{x_1 z_1 + x_2 z_2}}{e^{x_1 + x_2}} }

In other words: whenever \beta x_1 = \gamma x_2, we have

H\Psi = 0

Let’s check this! For starters, the constant in the denominator of \Psi doesn’t matter here, since H is linear. It’s just a normalizing constant, put in to make sure that our probabilities \psi_{n_1, n_2} sum to 1. So, we just need to check that

\displaystyle{ (z_2 - z_1) (\beta \frac{\partial}{\partial z_1} - \gamma \frac{\partial}{\partial z_2}) e^{x_1 z_1 + x_2 z_2} = 0 }

If we do the derivatives on the left hand side, it’s clear we want

\displaystyle{ (z_2 - z_1) (\beta x_1 - \gamma x_2) e^{x_1 z_1 + x_2 z_2} = 0 }

And this is indeed true when \beta x_1 = \gamma x_2 .

So, the theorem works as advertised. And now we can work out the probability \psi_{n_1,n_2} of having n_1 things of species 1 and n_2 of species 2 in our equilibrium state \Psi. To do this, we just expand the function \Psi as a power series and look at the coefficient of z_1^{n_1} z_2^{n_2}. We have

\Psi = \displaystyle{ \frac{e^{x_1 z_1 + x_2 z_2}}{e^{x_1 + x_2}} } = \displaystyle{ \frac{1}{e^{x_1} e^{x_2}} \sum_{n_1, n_2 = 0}^\infty \frac{(x_1 z_1)^{n_1}}{n_1!} \frac{(x_2 z_2)^{n_2}}{n_2!} }

so we get

\psi_{n_1,n_2} = \displaystyle{ \frac{1}{e^{x_1}} \frac{x_1^{n_1}}{n_1!} \; \cdot \; \frac{1}{e^{x_2}} \frac{x_2^{n_2}}{n_2!} }

This is just a product of two independent Poisson distributions!

In case you forget, a Poisson distribution gives the probability of k events occurring in some interval of time, assuming they occur at a fixed average rate, independently of the time since the last event. If the expected number of events is \lambda, the Poisson distribution is

\displaystyle{ \frac{1}{e^\lambda} \frac{\lambda^k}{k!} }

and it looks like this for various values of \lambda:

It looks almost like a Gaussian when \lambda is large, but when \lambda is small it becomes very lopsided.

Anyway: we’ve seen that in our equilibrium state, the number of things of species i = 1,2 is given by a Poisson distribution with mean x_i. That’s very nice and simple… but the amazing thing is that these distributions are independent.

Mathematically, this means we just multiply them to get the probability of finding n_1 things of species 1 and n_2 of species 2. But it also means that knowing how many things there are of one species says nothing about the number of the other.
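If you want to see this factorization concretely, here's a sketch in Python, with illustrative values of x_1 and x_2 chosen so that \beta x_1 = \gamma x_2 holds. It expands \Psi and compares its coefficients to a product of Poisson distributions:

```python
import math

# Illustrative equilibrium amounts: with beta = 1 and gamma = 2,
# beta*x1 = gamma*x2 holds for these values.
x1, x2 = 2.0, 1.0

def poisson(lam, k):
    """Poisson probability of k events when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def psi(n1, n2):
    """Coefficient of z1^n1 z2^n2 in e^(x1 z1 + x2 z2) / e^(x1 + x2)."""
    return (x1**n1 / math.factorial(n1)) \
         * (x2**n2 / math.factorial(n2)) * math.exp(-(x1 + x2))

# The joint distribution factors as a product of independent Poissons...
for n1 in range(6):
    for n2 in range(6):
        assert abs(psi(n1, n2) - poisson(x1, n1) * poisson(x2, n2)) < 1e-15

# ...and it is normalized: the probabilities sum to 1.
total = sum(psi(a, b) for a in range(40) for b in range(40))
assert abs(total - 1.0) < 1e-9
```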

But something seems odd here. One transition in our Petri net consumes a 1 and produces a 2, while the other consumes a 2 and produces a 1. The total number of particles in the system never changes. The more 1’s there are, the fewer 2’s there should be. But we just said knowing how many 1’s we have tells us nothing about how many 2’s we have!

At first this seems like a paradox. Have we made a mistake? Not exactly. But we’re neglecting something.

Conserved quantities

Namely: the equilibrium solutions of the master equation we’ve found so far are not the only ones! There are other solutions that fit our intuitions better.

Suppose we take any of our equilibrium solutions \Psi and change it like this: set the probability \psi_{n_1,n_2} equal to 0 unless

n_1 + n_2 = n

but otherwise leave it unchanged. Of course the probabilities no longer sum to 1, but we can rescale them so they do.

The result is a new equilibrium solution, say \Psi_n. Why? Because, as we’ve already seen, no transitions will carry us from one value of n_1 + n_2 to another. And in this new solution, the number of 1’s is clearly not independent from the number of 2’s. The bigger one is, the smaller the other is.
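Here's a sketch in Python, again with illustrative values obeying \beta x_1 = \gamma x_2, that carries out this projection and checks that the result is still an equilibrium. It uses the master equation for this net written out in components, which follows from the Hamiltonian above:

```python
import math

# Illustrative values with beta*x1 = gamma*x2, so Psi is an equilibrium.
beta, gamma = 1.0, 2.0
x1, x2 = 2.0, 1.0
n = 6                      # the conserved total n1 + n2 we condition on

# Restrict the coherent-state coefficients to the shell n1 + n2 = n
# and renormalize; psi_n[k] is the probability of k 1's and (n-k) 2's.
raw = [x1**k * x2**(n - k) / (math.factorial(k) * math.factorial(n - k))
       for k in range(n + 1)]
total = sum(raw)
psi_n = [p / total for p in raw]

def dpsi_dt(k):
    """Time derivative of psi_{n1,n2} with n1 = k, n2 = n - k, from the
    master equation in components:
      d/dt psi_{n1,n2} = beta*(n1+1)*psi_{n1+1,n2-1}
                       + gamma*(n2+1)*psi_{n1-1,n2+1}
                       - (beta*n1 + gamma*n2)*psi_{n1,n2}.
    Both transitions preserve n1 + n2, so the shell is invariant."""
    n1, n2 = k, n - k
    rate = -(beta * n1 + gamma * n2) * psi_n[k]
    if k + 1 <= n:
        rate += beta * (n1 + 1) * psi_n[k + 1]    # a 1 turned into a 2
    if k - 1 >= 0:
        rate += gamma * (n2 + 1) * psi_n[k - 1]   # a 2 turned into a 1
    return rate

# The projected, renormalized state is still an equilibrium solution.
assert all(abs(dpsi_dt(k)) < 1e-12 for k in range(n + 1))
```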

Puzzle 1. Show that this new solution \Psi_n depends only on n and the ratio x_1/x_2 = \gamma/\beta, not on anything more about the values of x_1 and x_2 in the original solution

\Psi = \displaystyle{ \frac{e^{x_1 z_1 + x_2 z_2}}{e^{x_1 + x_2}} }

Puzzle 2. What is this new solution like when \beta = \gamma? (This particular choice makes the problem symmetrical when we interchange species 1 and 2.)

What’s happening here is that this particular stochastic Petri net has a ‘conserved quantity’: the total number of things never changes with time. So, we can take any equilibrium solution of the master equation and—in the language of quantum mechanics—‘project down to the subspace’ where this conserved quantity takes a definite value, and get a new equilibrium solution. In the language of probability theory, we say it a bit differently: we’re ‘conditioning on’ the conserved quantity taking a definite value. But the idea is the same.

This important feature of conserved quantities suggests that we should try to invent a new version of Noether’s theorem. This theorem links conserved quantities and symmetries of the Hamiltonian.

There are already a couple versions of Noether’s theorem for classical mechanics, and for quantum mechanics… but now we want a version for stochastic mechanics. And indeed one exists, and it’s relevant to what we’re doing here. We’ll show it to you next time.


Network Theory (Part 9)

13 September, 2011

jointly written with Brendan Fong

Last time we reviewed the rate equation and the master equation. Both of them describe processes where things of various kinds can react and turn into other things. But:

• In the rate equation, we assume the number of things varies continuously and is known precisely.

• In the master equation, we assume the number of things varies discretely and is known only probabilistically.

This should remind you of the difference between classical mechanics and quantum mechanics. But the master equation is not quantum, it’s stochastic: it involves probabilities, but there’s no uncertainty principle going on.

Still, a lot of the math is similar.

Now, given an equilibrium solution to the rate equation—one that doesn’t change with time—we’ll try to find a solution to the master equation with the same property. We won’t always succeed—but we often can! The theorem saying how was proved here:

• D. F. Anderson, G. Craciun and T. G. Kurtz, Product-form stationary distributions for deficiency zero chemical reaction networks.

To emphasize the analogy to quantum mechanics, we’ll translate their proof into the language of annihilation and creation operators. In particular, our equilibrium solution of the master equation is just like what people call a ‘coherent state’ in quantum mechanics.

So, if you know about quantum mechanics and coherent states, you should be happy. But if you don’t, fear not!—we’re not assuming you do.

The rate equation

To construct our equilibrium solution of the master equation, we need a special type of solution to our rate equation. We call this type a ‘complex balanced solution’. This means that not only is the net rate of production of each species zero, but the net rate of production of each possible bunch of species is zero.

Before we make this more precise, let’s remind ourselves of the basic setup.

We’ll consider a stochastic Petri net with a finite set S of species and a finite set T of transitions. For convenience let’s take S = \{1,\dots, k\}, so our species are numbered from 1 to k. Then each transition \tau has an input vector m(\tau) \in \mathbb{N}^k and output vector n(\tau) \in \mathbb{N}^k. These say how many things of each species go in, and how many go out. Each transition also has rate constant r(\tau) \in [0,\infty), which says how rapidly it happens.

The rate equation concerns a vector x(t) \in [0,\infty)^k whose ith component is the number of things of the ith species at time t. Note: we’re assuming this number of things varies continuously and is known precisely! This should remind you of classical mechanics. So, we’ll call x(t), or indeed any vector in [0,\infty)^k, a classical state.

The rate equation says how the classical state x(t) changes with time:

\displaystyle{  \frac{d x}{d t} = \sum_{\tau \in T} r(\tau)\, (n(\tau)-m(\tau)) \, x^{m(\tau)} }

You may wonder what x^{m(\tau)} means: after all, we’re taking a vector to a vector power! It’s just an abbreviation, which we’ve seen plenty of times before. If x \in \mathbb{R}^k is a list of numbers and m \in \mathbb{N}^k is a list of natural numbers, we define

x^m = x_1^{m_1} \cdots x_k^{m_k}

We’ll also use this notation when x is a list of operators.

Complex balance

The vectors m(\tau) and n(\tau) are examples of what chemists call complexes. A complex is a bunch of things of each species. For example, if the set S consists of three species, the complex (1,0,5) is a bunch consisting of one thing of the first species, none of the second species, and five of the third species.

For our Petri net, the set of complexes is the set \mathbb{N}^k, and the complexes of particular interest are the input complex m(\tau) and the output complex n(\tau) of each transition \tau.

We say a classical state c \in [0,\infty)^k is complex balanced if for all complexes \kappa \in \mathbb{N}^k we have

\displaystyle{ \sum_{\{\tau : m(\tau) = \kappa\}} r(\tau) c^{m(\tau)} =\sum_{\{\tau : n(\tau) = \kappa\}} r(\tau) c^{m(\tau)}  }

The left hand side of this equation, which sums over the transitions with input complex \kappa, gives the rate of consumption of the complex \kappa. The right hand side, which sums over the transitions with output complex \kappa, gives the rate of production of \kappa . So, this equation requires that the net rate of production of the complex \kappa is zero in the classical state c .

Puzzle. Show that if a classical state c is complex balanced, and we set x(t) = c for all t, then x(t) is a solution of the rate equation.

Since x(t) doesn’t change with time here, we call it an equilibrium solution of the rate equation. Since x(t) = c is complex balanced, we call it a complex balanced equilibrium solution.
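You can check the Puzzle’s claim numerically for a small example. Here's a sketch in Python; the net (a reversible reaction where two things of species 1 become one thing of species 2, and back) and its rate constants are made up for illustration:

```python
# A made-up net: transition tau1 turns two X1's into one X2 with rate
# constant 1.0; tau2 does the reverse with rate constant 4.0.
# Each transition is (rate, input vector m, output vector n).
transitions = [
    (1.0, (2, 0), (0, 1)),   # 2 X1 -> X2
    (4.0, (0, 1), (2, 0)),   # X2 -> 2 X1
]

def power(x, m):
    """The multi-index power x^m = x1^m1 * ... * xk^mk."""
    p = 1.0
    for xi, mi in zip(x, m):
        p *= xi ** mi
    return p

def rate_rhs(x):
    """Right-hand side of the rate equation: sum of r * (n - m) * x^m."""
    rhs = [0.0] * len(x)
    for r, m, n in transitions:
        flow = r * power(x, m)
        for i in range(len(x)):
            rhs[i] += flow * (n[i] - m[i])
    return rhs

def is_complex_balanced(c, tol=1e-12):
    """Check each complex is produced and consumed at equal rates."""
    complexes = {m for _, m, _ in transitions} | {n for _, _, n in transitions}
    for kappa in complexes:
        consumed = sum(r * power(c, m) for r, m, n in transitions if m == kappa)
        produced = sum(r * power(c, m) for r, m, n in transitions if n == kappa)
        if abs(consumed - produced) > tol:
            return False
    return True

c = (2.0, 1.0)       # 1.0 * c1^2 = 4.0 * c2, so c is complex balanced
assert is_complex_balanced(c)
assert all(abs(v) < 1e-12 for v in rate_rhs(c))   # hence an equilibrium
```

Of course this only illustrates the claim for one net; the Puzzle asks for the general argument.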

The master equation

We’ve seen that any complex balanced classical state gives an equilibrium solution of the rate equation. The Anderson–Craciun–Kurtz theorem says that it also gives an equilibrium solution of the master equation.

The master equation concerns a formal power series

\displaystyle{ \Psi(t) = \sum_{n \in \mathbb{N}^k} \psi_n(t) z^n }

where

z^n = z_1^{n_1} \cdots z_k^{n_k}

and

\psi_n(t) = \psi_{n_1, \dots,n_k}(t)

is the probability that at time t we have n_1 things of the first species, n_2 of the second species, and so on.

Note: now we’re assuming this number of things varies discretely and is known only probabilistically! So, we’ll call \Psi(t), or indeed any formal power series where the coefficients are probabilities summing to 1, a stochastic state. Earlier we just called it a ‘state’, but that would get confusing now: we’ve got classical states and stochastic states, and we’re trying to relate them.

The master equation says how the stochastic state \Psi(t) changes with time:

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

where the Hamiltonian H is:

\displaystyle{ H = \sum_{\tau \in T} r(\tau) \, \left({a^\dagger}^{n(\tau)} - {a^\dagger}^{m(\tau)} \right) \, a^{m(\tau)} }

The notation here is designed to neatly summarize some big products of annihilation and creation operators. For any vector n \in \mathbb{N}^k, we have

a^n = a_1^{n_1} \cdots  a_k^{n_k}

and

\displaystyle{  {a^\dagger}^n = {a_1^\dagger }^{n_1} \cdots  {a_k^\dagger}^{n_k} }

Coherent states

Now suppose c \in [0,\infty)^k is a complex balanced equilibrium solution of the rate equation. We want to get an equilibrium solution of the master equation. How do we do it?

For any c \in [0,\infty)^k there is a stochastic state called a coherent state, defined by

\displaystyle{ \Psi_c = \frac{e^{c z}}{e^c} }

Here we are using some very terse abbreviations. Namely, we are defining

e^{c} = e^{c_1} \cdots e^{c_k}

and

e^{c z} = e^{c_1 z_1} \cdots e^{c_k z_k}

Equivalently,

\displaystyle{ e^{c z} = \sum_{n \in \mathbb{N}^k} \frac{c^n}{n!}z^n }

where c^n and z^n are defined as products in our usual way, and

n! = n_1! \, \cdots \, n_k!

Either way, if you unravel the abbreviations, here’s what you get:

\displaystyle{  \Psi_c = e^{-(c_1 + \cdots + c_k)} \, \sum_{n \in \mathbb{N}^k} \frac{c_1^{n_1} \cdots c_k^{n_k}} {n_1! \, \cdots \, n_k! } \, z_1^{n_1} \cdots z_k^{n_k} }

Maybe now you see why we like the abbreviations.

The name ‘coherent state’ comes from quantum mechanics. In quantum mechanics, we think of a coherent state \Psi_c as the ‘quantum state’ that best approximates the classical state c. But we’re not doing quantum mechanics now, we’re doing probability theory. \Psi_c isn’t a ‘quantum state’, it’s a stochastic state.

In probability theory, people like Poisson distributions. In the state \Psi_c, the probability of having n_i things of the ith species is equal to

\displaystyle{  e^{-c_i} \, \frac{c_i^{n_i}}{n_i!} }

This is precisely the definition of a Poisson distribution with mean equal to c_i. We can multiply a bunch of factors like this, one for each species, to get

\displaystyle{ e^{-c} \frac{c^n}{n!} }

This is the probability of having n_1 things of the first species, n_2 things of the second, and so on, in the state \Psi_c. So, the state \Psi_c is a product of independent Poisson distributions. In particular, knowing how many things there are of one species says nothing at all about how many things there are of any other species!

It is remarkable that such a simple state can give an equilibrium solution of the master equation, even for very complicated stochastic Petri nets. But it’s true—at least if c is complex balanced.

The Anderson–Craciun–Kurtz theorem

Now we’re ready to state and prove the big result:

Theorem (Anderson–Craciun–Kurtz). Suppose c \in [0,\infty)^k is a complex balanced equilibrium solution of the rate equation. Then H \Psi_c = 0.

It follows that \Psi_c is an equilibrium solution of the master equation. In other words, if we take \Psi(t) = \Psi_c for all times t, the master equation holds:

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

since both sides are zero.

Proof. To prove the Anderson–Craciun–Kurtz theorem, we just need to show that H \Psi_c = 0. Since \Psi_c is a constant times e^{c z}, it suffices to show H e^{c z} = 0. Remember that

\displaystyle{ H e^{c z} =  \sum_{\tau \in T} r(\tau) \left( {a^\dagger}^{n(\tau)} -{a^\dagger}^{m(\tau)} \right) \, a^{m(\tau)} \, e^{c z} }

Since the annihilation operator a_i is given by differentiation with respect to z_i, while the creation operator a^\dagger_i is just multiplying by z_i, we have:

\displaystyle{ H e^{c z} = \sum_{\tau \in T} r(\tau) \, c^{m(\tau)} \left( z^{n(\tau)} - z^{m(\tau)} \right) e^{c z} }

Expanding out e^{c z} we get:

\displaystyle{ H e^{c z} = \sum_{i \in \mathbb{N}^k} \sum_{\tau \in T} r(\tau)c^{m(\tau)}\left(z^{n(\tau)}\frac{c^i}{i!}z^i - z^{m(\tau)}\frac{c^i}{i!}z^i\right) }

Shifting indices and defining negative powers to be zero:

\displaystyle{ H e^{c z}  = \sum_{i \in \mathbb{N}^k} \sum_{\tau \in T} r(\tau)c^{m(\tau)}\left(\frac{c^{i-n(\tau)}}{(i-n(\tau))!}z^i - \frac{c^{i-m(\tau)}}{(i-m(\tau))!}z^i\right) }

So, to show H e^{c z} = 0, we need to show this:

\displaystyle{ \sum_{i \in \mathbb{N}^k} \sum_{\tau \in T} r(\tau)c^{m(\tau)}\frac{c^{i-n(\tau)}}{(i-n(\tau))!}z^i =\sum_{i \in \mathbb{N}^k} \sum_{\tau \in T} r(\tau)c^{m(\tau)}\frac{c^{i-m(\tau)}}{(i-m(\tau))!}z^i }

We do this by splitting the sum over T according to output and then input complexes, making use of the complex balanced condition:

\displaystyle{ \sum_{i \in \mathbb{N}^k} \sum_{\kappa \in \mathbb{N}^k} \sum_{\{\tau : n(\tau)=\kappa\}}  r(\tau)c^{m(\tau)}\frac{c^{i-n(\tau)}}{(i-n(\tau))!} \, z^i  = }

\displaystyle{ \sum_{i \in \mathbb{N}^k} \sum_{\kappa \in \mathbb{N}^k} \frac{c^{i-\kappa}}{(i-\kappa)!}\, z^i \sum_{\{\tau : n(\tau) = \kappa\}}  r(\tau)c^{m(\tau)} = }

\displaystyle{\sum_{i \in \mathbb{N}^k} \sum_{\kappa \in \mathbb{N}^k} \frac{c^{i-\kappa}}{(i-\kappa)!}\, z^i \sum_{\{\tau : m(\tau) = \kappa\}}  r(\tau)c^{m(\tau)}  = }

\displaystyle{ \sum_{i \in \mathbb{N}^k} \sum_{\kappa \in \mathbb{N}^k} \sum_{\{\tau : m(\tau) = \kappa\}}  r(\tau)c^{m(\tau)}\frac{c^{i-m(\tau)}}{(i-m(\tau))!}\, z^i }

This completes the proof! It’s just algebra, but it seems a bit magical, so we’re trying to understand it better.

I hope you see how amazing this result is. If you know quantum mechanics and coherent states you’ll understand what I mean. A coherent state is the ‘best quantum approximation’ to a classical state, but we don’t expect this quantum state to be exactly time-independent when the corresponding classical state is, except in very special cases, like when the Hamiltonian is quadratic in the creation and annihilation operators. Here we are getting a result like that much more generally… but only given the ‘complex balanced’ condition.
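If you'd like to check the theorem numerically, here's a sketch in Python for a small made-up net, the reversible reaction 2 X_1 ⇄ X_2. Reading off the coefficient of z^i in H \Psi_c, the {a^\dagger}^{n} a^{m} term of each transition moves weight from the coefficient at i + m - n down to i, with a componentwise falling-factorial factor coming from a^m; the -{a^\dagger}^{m} a^{m} term acts at i itself:

```python
import math

# A made-up reversible net: 2 X1 <-> X2, with rate constants chosen
# so that c = (2, 1) is complex balanced (1.0 * 2^2 = 4.0 * 1).
transitions = [
    (1.0, (2, 0), (0, 1)),   # 2 X1 -> X2
    (4.0, (0, 1), (2, 0)),   # X2 -> 2 X1
]
c = (2.0, 1.0)

def coeff(i):
    """Coherent-state coefficient c^i / i!  (normalizing constant
    dropped, which is harmless since H is linear)."""
    p = 1.0
    for ci, ii in zip(c, i):
        p *= ci ** ii / math.factorial(ii)
    return p

def falling(j, m):
    """Componentwise falling factorial j(j-1)...(j-m+1): the numerical
    factor a^m produces on z^j. Zero automatically if some j_s < m_s."""
    p = 1
    for js, ms in zip(j, m):
        for t in range(ms):
            p *= js - t
    return p

def H_coeff(i):
    """Coefficient of z^i in H Psi_c."""
    total = 0.0
    for r, m, n in transitions:
        j = tuple(a + b - d for a, b, d in zip(i, m, n))
        if all(a >= 0 for a in j):
            total += r * falling(j, m) * coeff(j)   # a†^n a^m term
        total -= r * falling(i, m) * coeff(i)       # -a†^m a^m term
    return total

# H Psi_c vanishes coefficient by coefficient, as the theorem says.
assert all(abs(H_coeff((a, b))) < 1e-12 for a in range(8) for b in range(8))
```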

An example

We’ve already seen one example of the theorem, back in Part 7. We had this stochastic Petri net:

We saw that the rate equation is just the logistic equation, familiar from population biology. The equilibrium solution is complex balanced, because pairs of amoebas are getting created at the same rate as they’re getting destroyed, and single amoebas are getting created at the same rate as they’re getting destroyed.

So, the Anderson–Craciun–Kurtz theorem guarantees that there’s an equilibrium solution of the master equation where the number of amoebas is distributed according to a Poisson distribution. And, we actually checked that this was true!

Next time we’ll look at another example.


Network Theory (Part 8)

9 September, 2011

Summer vacation is over. Time to get back to work!

This month, before he goes to Oxford to begin a master’s program in Mathematics and the Foundations of Computer Science, Brendan Fong is visiting the Centre for Quantum Technologies and working with me on stochastic Petri nets. He’s proved two interesting results, which he wants to explain.

To understand what he’s done, you need to know how to get the rate equation and the master equation from a stochastic Petri net. We’ve almost seen how. But it’s been a long time since the last article in this series, so today I’ll start with some review. And at the end, just for fun, I’ll say a bit more about how Feynman diagrams show up in this theory.

Since I’m an experienced teacher, I’ll assume you’ve forgotten everything I ever said.

(This has some advantages. I can change some of my earlier terminology—improve it a bit here and there—and you won’t even notice.)

Stochastic Petri nets

Definition. A Petri net consists of a set S of species and a set T of transitions, together with a function

i : S \times T \to \mathbb{N}

saying how many things of each species appear in the input for each transition, and a function

o: S \times T \to \mathbb{N}

saying how many things of each species appear in the output.

We can draw pictures of Petri nets. For example, here’s a Petri net with two species and three transitions:

It should be clear that the transition ‘predation’ has one wolf and one rabbit as input, and two wolves as output.

A ‘stochastic’ Petri net goes further: it also says the rate at which each transition occurs.

Definition. A stochastic Petri net is a Petri net together with a function

r: T \to [0,\infty)

giving a rate constant for each transition.

Master equation versus rate equation

Starting from any stochastic Petri net, we can get two things. First:

• The master equation. This says how the probability that we have a given number of things of each species changes with time.

• The rate equation. This says how the expected number of things of each species changes with time.

The master equation is stochastic: it describes how probabilities change with time. The rate equation is deterministic.

The master equation is more fundamental. It’s like the equations of quantum electrodynamics, which describe the amplitudes for creating and annihilating individual photons. The rate equation is less fundamental. It’s like the classical Maxwell equations, which describe changes in the electromagnetic field in a deterministic way. The classical Maxwell equations are an approximation to quantum electrodynamics. This approximation gets good in the limit where there are lots of photons all piling on top of each other to form nice waves.

Similarly, the rate equation can be derived from the master equation in the limit where the number of things of each species become large, and the fluctuations in these numbers become negligible.

But I won’t do this derivation today! Nor will I probe more deeply into the analogy with quantum field theory, even though that’s my ultimate goal. Today I’m content to remind you what the master equation and rate equation are.

The rate equation is simpler, so let’s do that first.

The rate equation

Suppose we have a stochastic Petri net with k different species. Let x_i be the number of things of the ith species. Then the rate equation looks like this:

\displaystyle{ \frac{d x_i}{d t} = ??? }

It’s really a bunch of equations, one for each 1 \le i \le k. But what is the right-hand side?

The right-hand side is a sum of terms, one for each transition in our Petri net. So, let’s start by assuming our Petri net has just one transition.

Suppose the ith species appears as input to this transition m_i times, and as output n_i times. Then the rate equation is

\displaystyle{ \frac{d x_i}{d t} = r (n_i - m_i) x_1^{m_1} \cdots x_k^{m_k} }

where r is the rate constant for this transition.

That’s really all there is to it! But we can make it look nicer. Let’s make up a vector

x = (x_1, \dots , x_k) \in [0,\infty)^k

that says how many things there are of each species. Similarly let’s make up an input vector

m = (m_1, \dots, m_k) \in \mathbb{N}^k

and an output vector

n = (n_1, \dots, n_k) \in \mathbb{N}^k

for our transition. To be cute, let’s also define

\displaystyle{ x^m = x_1^{m_1} \cdots x_k^{m_k} }

Then we can write the rate equation for a single transition like this:

\displaystyle{ \frac{d x}{d t} = r (n-m) x^m }

Next let’s do a general stochastic Petri net, with lots of transitions. Let’s write T for the set of transitions and r(\tau) for the rate constant of the transition \tau \in T. Let n(\tau) and m(\tau) be the input and output vectors of the transition \tau. Then the rate equation is:

\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T} r(\tau) (n(\tau) - m(\tau)) x^{m(\tau)} }

For example, consider our rabbits and wolves:

Suppose

• the rate constant for ‘birth’ is \beta,

• the rate constant for ‘predation’ is \gamma,

• the rate constant for ‘death’ is \delta.

Let x_1(t) be the number of rabbits and x_2(t) the number of wolves at time t. Then the rate equation looks like this:

\displaystyle{ \frac{d x_1}{d t} = \beta x_1 - \gamma x_1 x_2 }

\displaystyle{ \frac{d x_2}{d t} = \gamma x_1 x_2 - \delta x_2 }

If you stare at this, and think about it, it should make perfect sense. If it doesn’t, go back and read Part 2.
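The general rate equation is easy to implement. Here's a sketch in Python (the rate constants are made up for illustration) that builds the right-hand side from the transition data and checks it against the rabbit and wolf equations displayed above:

```python
def rate_rhs(x, transitions):
    """Right-hand side of the general rate equation:
    sum over transitions of r * (n - m) * x^m."""
    rhs = [0.0] * len(x)
    for r, m, n in transitions:
        flow = r
        for xi, mi in zip(x, m):
            flow *= xi ** mi               # the monomial x^m
        for i in range(len(x)):
            rhs[i] += flow * (n[i] - m[i])
    return rhs

beta, gamma, delta = 0.5, 0.1, 0.2         # made-up rate constants
rabbit_wolf = [
    (beta,  (1, 0), (2, 0)),   # birth:     rabbit -> 2 rabbits
    (gamma, (1, 1), (0, 2)),   # predation: rabbit + wolf -> 2 wolves
    (delta, (0, 1), (0, 0)),   # death:     wolf -> nothing
]

x1, x2 = 10.0, 5.0
rhs = rate_rhs((x1, x2), rabbit_wolf)

# This reproduces the displayed equations:
#   dx1/dt = beta*x1 - gamma*x1*x2,  dx2/dt = gamma*x1*x2 - delta*x2
assert abs(rhs[0] - (beta*x1 - gamma*x1*x2)) < 1e-12
assert abs(rhs[1] - (gamma*x1*x2 - delta*x2)) < 1e-12
```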

The master equation

Now let’s do something new. In Part 6 I explained how to write down the master equation for a stochastic Petri net with just one species. Now let’s generalize that. Luckily, the ideas are exactly the same.

So, suppose we have a stochastic Petri net with k different species. Let \psi_{n_1, \dots, n_k} be the probability that we have n_1 things of the first species, n_2 of the second species, and so on. The master equation will say how all these probabilities change with time.

To keep the notation clean, let’s introduce a vector

n = (n_1, \dots, n_k) \in \mathbb{N}^k

and let

\psi_n = \psi_{n_1, \dots, n_k}

Then, let’s take all these probabilities and cook up a formal power series that has them as coefficients: as we’ve seen, this is a powerful trick. To do this, we’ll bring in some variables z_1, \dots, z_k and write

\displaystyle{ z^n = z_1^{n_1} \cdots z_k^{n_k} }

as a convenient abbreviation. Then any formal power series in these variables looks like this:

\displaystyle{ \Psi = \sum_{n \in \mathbb{N}^k} \psi_n z^n }

We call \Psi a state if the probabilities sum to 1 as they should:

\displaystyle{ \sum_n \psi_n = 1 }

The simplest example of a state is a monomial:

\displaystyle{  z^n = z_1^{n_1} \cdots z_k^{n_k} }

This is a state where we are 100% sure that there are n_1 things of the first species, n_2 of the second species, and so on. We call such a state a pure state, since physicists use this term to describe a state where we know for sure exactly what’s going on. Sometimes a general state, one that might not be pure, is called mixed.

The master equation says how a state evolves in time. It looks like this:

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

So, I just need to tell you what H is!

It’s called the Hamiltonian. It’s a linear operator built from special operators that annihilate and create things of various species. Namely, for each species 1 \le i \le k we have an annihilation operator:

\displaystyle{ a_i \Psi = \frac{\partial}{\partial z_i} \Psi }

and a creation operator:

\displaystyle{ a_i^\dagger \Psi = z_i \Psi }

How do we build H from these? Suppose we’ve got a stochastic Petri net whose set of transitions is T. As before, write r(\tau) for the rate constant of the transition \tau \in T, and let n(\tau) and m(\tau) be the input and output vectors of this transition. Then:

\displaystyle{ H = \sum_{\tau \in T} r(\tau) \, ({a^\dagger}^{n(\tau)} - {a^\dagger}^{m(\tau)}) \, a^{m(\tau)}  }

where as usual we’ve introduced some shorthand notations to keep from going insane. For example:

a^{m(\tau)} = a_1^{m_1(\tau)} \cdots  a_k^{m_k(\tau)}

and

{a^\dagger}^{m(\tau)} = {a_1^\dagger }^{m_1(\tau)} \cdots  {a_k^\dagger}^{m_k(\tau)}

Now, it’s not surprising that each transition \tau contributes a term to H. It’s also not surprising that this term is proportional to the rate constant r(\tau). The only tricky thing is the expression

\displaystyle{ ({a^\dagger}^{n(\tau)} - {a^\dagger}^{m(\tau)})\, a^{m(\tau)} }

How can we understand it? The basic idea is this. We’ve got two terms here. The first term:

\displaystyle{  {a^\dagger}^{n(\tau)} a^{m(\tau)} }

describes how m_i(\tau) things of the ith species get annihilated, and n_i(\tau) things of the ith species get created. Of course this happens thanks to our transition \tau. The second term:

\displaystyle{  - {a^\dagger}^{m(\tau)} a^{m(\tau)} }

is a bit harder to understand, but it says how the probability that nothing happens—that we remain in the same pure state—decreases as time passes. Again this happens due to our transition \tau.

In fact, the second term must take precisely the form it does to ensure ‘conservation of total probability’. In other words: if the probabilities \psi_n sum to 1 at time zero, we want these probabilities to still sum to 1 at any later time. And for this, we need that second term to be what it is! In Part 6 we saw this in the special case where there’s only one species. The general case works the same way.
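Here's a small numerical check of this claim, using the rabbit and wolf net with made-up rate constants: applying H to any pure state z^j gives a power series whose coefficients sum to zero, which is exactly what conservation of total probability requires.

```python
# Probability conservation, concretely: apply H (rabbit/wolf net, with
# made-up rate constants) to a pure state z^j and check that the
# coefficients of the result sum to zero.

beta, gamma, delta = 0.5, 0.1, 0.2
transitions = [
    (beta,  (1, 0), (2, 0)),   # birth:     rabbit -> 2 rabbits
    (gamma, (1, 1), (0, 2)),   # predation: rabbit + wolf -> 2 wolves
    (delta, (0, 1), (0, 0)),   # death:     wolf -> nothing
]

def falling(j, m):
    """Numerical factor a^m produces on z^j (componentwise falling
    factorial); zero automatically if the transition can't fire."""
    p = 1
    for js, ms in zip(j, m):
        for t in range(ms):
            p *= js - t
    return p

def H_pure(j):
    """H applied to the pure state z^j, as {occupation tuple: coefficient}."""
    out = {}
    for r, m, n in transitions:
        w = r * falling(j, m)
        if w == 0:
            continue
        target = tuple(a - b + d for a, b, d in zip(j, m, n))
        out[target] = out.get(target, 0.0) + w   # the a†^n a^m term
        out[j] = out.get(j, 0.0) - w             # the -a†^m a^m term
    return out

# The coefficients of H z^j always sum to zero:
for j in [(1, 1), (3, 2), (0, 4), (5, 0)]:
    assert abs(sum(H_pure(j).values())) < 1e-12
```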

Let’s look at an example. Consider our rabbits and wolves yet again:

and again suppose the rate constants for birth, predation and death are \beta, \gamma and \delta, respectively. We have

\displaystyle{ \Psi = \sum_n \psi_n z^n }

where

\displaystyle{ z^n = z_1^{n_1} z_2^{n_2} }

and \psi_n = \psi_{n_1, n_2} is the probability of having n_1 rabbits and n_2 wolves. These probabilities evolve according to the equation

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

where the Hamiltonian is

H = \beta B + \gamma C + \delta D

and B, C and D are operators describing birth, predation and death, respectively. (B stands for birth, D stands for death… and you can call predation ‘consumption’ if you want something that starts with C. Besides, ‘consumer’ is a nice euphemism for ‘predator’.) What are these operators? Just follow the rules I described:

\displaystyle{ B = {a_1^\dagger}^2 a_1 - a_1^\dagger a_1 }

\displaystyle{ C = {a_2^\dagger}^2 a_1 a_2 - a_1^\dagger a_2^\dagger a_1 a_2 }

\displaystyle{ D = a_2 -  a_2^\dagger a_2 }

In each case, the first term is easy to understand:

• Birth annihilates one rabbit and creates two rabbits.

• Predation annihilates one rabbit and one wolf and creates two wolves.

• Death annihilates one wolf.

The second term is trickier, but I told you how it works.
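These operators are concrete enough to check by computer. Here's a minimal sketch (assuming NumPy) that builds B, C and D as matrices on a truncated state space, using the representation where a z^n = n z^{n-1} and a^† z^n = z^{n+1}, and verifies conservation of total probability: each column of H sums to zero, at least for states far enough from the truncation boundary.

```python
import numpy as np

N = 6  # truncation: at most N rabbits and N wolves

def ops(N):
    """Annihilation a and creation adag on span{z^0, ..., z^N},
    with a z^n = n z^(n-1) and adag z^n = z^(n+1)."""
    a = np.zeros((N + 1, N + 1))
    adag = np.zeros((N + 1, N + 1))
    for n in range(1, N + 1):
        a[n - 1, n] = n
        adag[n, n - 1] = 1.0
    return a, adag

a, adag = ops(N)
I = np.eye(N + 1)
# species 1 = rabbits, species 2 = wolves, on the tensor product space
a1, c1 = np.kron(a, I), np.kron(adag, I)
a2, c2 = np.kron(I, a), np.kron(I, adag)

B = c1 @ c1 @ a1 - c1 @ a1                  # birth
C = c2 @ c2 @ a1 @ a2 - c1 @ c2 @ a1 @ a2   # predation
D = a2 - c2 @ a2                            # death

beta, gamma, delta = 1.0, 0.5, 0.3
H = beta * B + gamma * C + delta * D

# Column (n1, n2) of H sums to zero whenever no transition leaves the
# truncated space, i.e. n1, n2 <= N - 1:
colsums = H.sum(axis=0).reshape(N + 1, N + 1)
print(np.abs(colsums[:N, :N]).max())  # ~0: probability is conserved
```

The rate constants here are arbitrary sample values; any positive choices give the same column-sum check.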

Feynman diagrams

How do we solve the master equation? If we don’t worry about mathematical rigor too much, it’s easy. The solution of

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

should be

\displaystyle{ \Psi(t) = e^{t H} \Psi(0) }

and we can hope that

\displaystyle{ e^{t H} = 1 + t H + \frac{(t H)^2}{2!} + \cdots }

so that

\displaystyle{ \Psi(t) = \Psi(0) + t H \Psi(0) + \frac{t^2}{2!} H^2 \Psi(0) + \cdots }

Of course there’s always the question of whether this power series converges. In many contexts it doesn’t, but that’s not necessarily a disaster: the series can still be asymptotic to the right answer, or even better, Borel summable to the right answer.
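For a finite-dimensional Hamiltonian there's no such subtlety: the series converges. Here's a quick numerical sketch (assuming NumPy) that sums the truncated series and checks the semigroup law \exp((s+t)H) = \exp(sH)\exp(tH):

```python
import numpy as np

def exp_series(H, t, terms=40):
    """Truncated power series 1 + tH + (tH)^2/2! + ... for exp(tH)."""
    out, term = np.eye(len(H)), np.eye(len(H))
    for k in range(1, terms):
        term = term @ (t * H) / k
        out = out + term
    return out

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 4))  # a small arbitrary 'Hamiltonian'

# the semigroup law exp((s+t)H) = exp(sH) exp(tH) holds numerically:
lhs = exp_series(H, 0.6)
rhs = exp_series(H, 0.3) @ exp_series(H, 0.3)
print(np.abs(lhs - rhs).max())  # tiny
```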

But let’s not worry about these subtleties yet! Let’s just imagine our rabbits and wolves, with Hamiltonian

H = \beta B + \gamma C + \delta D

Now, imagine working out

\Psi(t) = \displaystyle{ \Psi(0) + t H \Psi(0) + \frac{t^2}{2!} H^2 \Psi(0) + \frac{t^3}{3!} H^3 \Psi(0) + \cdots }

We’ll get lots of terms involving products of B, C and D hitting our original state \Psi(0). And we can draw these as diagrams! For example, suppose we start with one rabbit and one wolf. Then

\Psi(0) = z_1 z_2

And suppose we want to compute

\displaystyle{ H^3 \Psi(0) = (\beta B + \gamma C + \delta D)^3 \Psi(0) }

as part of the task of computing \Psi(t). Then we’ll get lots of terms: 27, in fact, though many will turn out to be zero. Let’s take one of these terms, for example the one proportional to:

D C B \Psi(0)

We can draw this as a sum of Feynman diagrams, including this:

In this diagram, we start with one rabbit and one wolf at the top. As we read the diagram from top to bottom, first a rabbit is born (B), then predation occurs (C), and finally a wolf dies (D). The end result is again a rabbit and a wolf.

This is just one of four Feynman diagrams we should draw in our sum for D C B \Psi(0), since either of the two rabbits could have been eaten, and either wolf could have died. So, the end result of computing

\displaystyle{ H^3 \Psi(0) }

will involve a lot of Feynman diagrams… and of course computing

\displaystyle{ \Psi(t) = \Psi(0) + t H \Psi(0) + \frac{t^2}{2!} H^2 \Psi(0) + \frac{t^3}{3!} H^3 \Psi(0) + \cdots }

will involve even more, even if we get tired and give up after the first few terms. So, this Feynman diagram business may seem quite tedious… and it may not be obvious how it helps.
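Still, the counting above can be checked by computer. In the representation where a_i is ∂/∂z_i and a_i^† is multiplication by z_i, a short SymPy sketch computes D C B Ψ(0); keeping only the first ('something actually happens') term of each operator, the coefficient of z_1 z_2 comes out to 4, matching the four Feynman diagrams:

```python
import sympy as sp

z1, z2 = sp.symbols('z1 z2')
a1 = lambda f: sp.diff(f, z1)   # annihilate a rabbit
a2 = lambda f: sp.diff(f, z2)   # annihilate a wolf

# the full operators B, C, D as in the text:
B = lambda f: z1**2 * a1(f) - z1 * a1(f)
C = lambda f: z2**2 * a1(a2(f)) - z1 * z2 * a1(a2(f))
D = lambda f: a2(f) - z2 * a2(f)

psi0 = z1 * z2  # start with one rabbit and one wolf
print(sp.expand(D(C(B(psi0)))))

# keeping only each operator's first term counts the histories where
# all three transitions really occur:
Bp = lambda f: z1**2 * a1(f)
Cp = lambda f: z2**2 * a1(a2(f))
Dp = lambda f: a2(f)
print(sp.expand(Dp(Cp(Bp(psi0)))))  # 4*z1*z2: four Feynman diagrams
```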

But it does, sometimes!

Now is not the time for me to describe ‘practical’ benefits of Feynman diagrams. Instead, I’ll just point out one conceptual benefit. We started with what seemed like a purely computational chore, namely computing

\displaystyle{ \Psi(t) = \Psi(0) + t H \Psi(0) + \frac{t^2}{2!} H^2 \Psi(0) + \cdots }

But then we saw—at least roughly—how this series has a clear meaning! It can be written as a sum over diagrams, each of which represents a possible history of rabbits and wolves. So, it’s what physicists call a ‘sum over histories’.

Feynman invented the idea of a sum over histories in the context of quantum field theory. At the time this idea seemed quite mind-blowing, for various reasons. First, it involved elementary particles instead of everyday things like rabbits and wolves. Second, it involved complex ‘amplitudes’ instead of real probabilities. Third, it actually involved integrals instead of sums. And fourth, a lot of these integrals diverged, giving infinite answers that needed to be ‘cured’ somehow!

Now we’re seeing a sum over histories in a more down-to-earth context without all these complications. A lot of the underlying math is analogous… but now there’s nothing mind-blowing about it: it’s quite easy to understand. So, we can use this analogy to demystify quantum field theory a bit. On the other hand, thanks to this analogy, all sorts of clever ideas invented by quantum field theorists will turn out to have applications to biology and chemistry! So it’s a double win.


Rationality in Humans and Monkeys

15 July, 2011

Cosma Shalizi wrote a great review of this book:

• David Easley and Jon Kleinberg, Networks, Crowds and Markets: Reasoning about a Highly Connected World, Cambridge University Press, Cambridge, 2010.

Apparently this is one of the first systematic textbooks on network science, which Shalizi defines as:

the study of networks of semiautonomous but interdependent units and of the way those networks shape both the behavior of individuals and the large-scale patterns that emerge from small-scale interactions.

This is not quite the same as what I’ve been calling network theory, but I’d like to see how they fit together.

Shalizi’s review includes a great putdown, not of the book’s authors, but of the limitations of a certain concept of ‘rationality’ that’s widely used in economics:

What game theorists somewhat disturbingly call rationality is assumed throughout—in other words, game players are assumed to be hedonistic yet infinitely calculating sociopaths endowed with supernatural computing abilities.

Clearly we have to go beyond these simplifying assumptions. There’s a lot of work being done on this. One important approach is to go out and see what people actually do in various situations. And another is to compare it to what monkeys will do in the same situations!

Monkey money

Here’s a video by Laurie Santos, who has done just that:

First she taught capuchin monkeys how to use money. Then, she discovered that they make the same mistakes with money that people do!

For example, they make different decisions in what mathematically might seem like the same situation, depending on how it’s framed.

Suppose I give you $1000, and then ask which game would you rather play:

1) a game where I give you either $1000 more or nothing more, with equal odds.

2) a game where I always give you $500 more.

Most people prefer game 2), even though the average, or expected amount of money collected is the same in both games. We say such people are risk averse. Someone who loves to gamble might prefer game 1).

Like people, most capuchin monkeys chose game 2), although Santos used grapes rather than money in this particular experiment.

So, like people, it seems monkeys are risk averse. This is not a ‘mistake’: there are good reasons to be risk averse.

On the other hand, suppose I give you $2000 — twice as much as before! Feel all those crisp bills… think about all the good stuff you can buy. Now, which game would you rather play:

1′) a game where I either take away $1000 or nothing, with equal odds.

2′) a game where I always take away $500.

Most people prefer game 1′). The strange thing is that mathematically, the overall situation is isomorphic to the previous one. It’s just been framed in a different way. The first situation seems to be about ‘maximizing gains’. The second seems to be about ‘minimizing losses’. In the second situation, people are more likely to accept risk, in the hopes that with some chance they won’t lose anything. This is called loss aversion.
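The claim that the two situations are mathematically isomorphic is easy to make precise: the distributions of final wealth agree exactly. A tiny check in plain Python:

```python
from fractions import Fraction

half = Fraction(1, 2)

# maps of {final wealth: probability} for each choice
framing_A = {
    'risky': {2000: half, 1000: half},   # start $1000; game 1: +$1000 or +$0
    'safe':  {1500: Fraction(1)},        # start $1000; game 2: +$500
}
framing_B = {
    'risky': {1000: half, 2000: half},   # start $2000; game 1': -$1000 or -$0
    'safe':  {1500: Fraction(1)},        # start $2000; game 2': -$500
}

print(framing_A == framing_B)  # True: same outcomes, different framing
```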

Monkeys, too, prefer game 1′).

This suggests that loss aversion is at least 35 million years old. It’s survived a long process of evolution! To me that suggests that while ‘irrational’, it’s probably a useful heuristic in most situations that commonly arise in primate societies.

Laurie Santos has a slightly different take on it. She says:

It was Camus who once said that man is the only species who refused to be what he really is. But the irony is that it might only be by recognizing our limitations that we can really actually overcome them.

Does economics elude mathematical reasoning?

For yet another approach to the enormous project of reconciling economics with the reality of human behavior, see:

• Yanis Varoufakis, Foundations of Economics: A beginner’s companion, Routledge, London, 1998.

• Yanis Varoufakis, Joseph Halevi and Nicholas J. Theocarakis, Modern Political Economics: Making Sense of the Post-2008 World, Routledge, London, 2011.

For the introduction of the first book go here. The second is described on the Routledge website:

The book is divided into two parts. The first part delves into every major economic theory, from Aristotle to the present, with a determination to discover clues of what went wrong in 2008. The main finding is that all economic theory is inherently flawed. Any system of ideas whose purpose is to describe capitalism in mathematical or engineering terms leads to inevitable logical inconsistency; an inherent error that stands between us and a decent grasp of capitalist reality. The only scientific truth about capitalism is its radical indeterminacy, a condition which makes it impossible to use science’s tools (e.g. calculus and statistics) to second-guess it. The second part casts an attentive eye on the post-war era; on the breeding ground of the Crash of 2008. It distinguishes between two major post-war phases: The Global Plan (1947-1971) and the Global Minotaur (1971-2008).

The emphasis is mine here, not because I’m sure it’s true, but because of how important it could be. It seems quite plausible to me. People seem able to squirm out of any mathematical framework we set up to describe them, short of the laws of physics. Still, I’d like to see the book’s argument. If there’s a ‘logical inconsistency’ in something, I want to actually see it.

You can get a bit more feeling for the second book in a blog post by Varoufakis. Among other things, he makes a point reminiscent of one that Phil Henshaw has repeatedly hammered home here:

Imagine a theorist that tries to explain complex evolving ecosystems by means of engineering models. What would result but incongruity and a mindset bent on misunderstanding the essence of the explanandum; a flight from that which craves explanation?

(By the way: I thank Mike Stay and Miguel Carrión-Álvarez for pointing out some items that appear in this blog entry. They did this on Google Plus. More on that later, maybe.)


Operads and the Tree of Life

6 July, 2011

This week Lisa and I are visiting her 90-year-old mother in Montréal. Friday I’m giving a talk at the Université du Québec à Montréal. The main person I know there is André Joyal, an expert on category theory and algebraic topology. So, I decided to give a talk explaining how some ideas from these supposedly ‘pure’ branches of math show up in biology.

My talk is called ‘Operads and the Tree of Life’.

Trees

In biology, trees are very important:

So are trees of a more abstract sort: phylogenetic trees describe the history of evolution. The biggest phylogenetic tree is the ‘Tree of Life’. It includes all the organisms on our planet, alive now or anytime in the past. Here’s a rough sketch of this enormous tree:

Its structure is far from fully understood. So, biologists typically study smaller phylogenetic trees, like this tree of dog-like species made by Elaine Ostrander:

Abstracting still further, we can also think of a tree as a kind of purely mathematical structure, like this:

Trees are important in combinatorics, but also in algebraic topology. The reason is that in algebraic topology we get pushed into studying spaces equipped with enormous numbers of operations. We’d get hopelessly lost without a good way of drawing these operations. We can draw an operation f with n inputs and one output as a little tree like this:

We can also draw the various ways of composing these operations. Composing them is just like building a big tree out of little trees!

An operation with n inputs and one output is called an n-ary operation. In the late 1960s, various mathematicians including Boardman and Vogt realized that spaces with tons of n-ary operations were crucial to algebraic topology. To handle all these operations, Peter May invented the concept of an operad. This formalizes the way operations can be drawn as trees. By now operads are a standard tool, not just in topology, but also in algebraic geometry, string theory and many other subjects.

But how do operads show up in biology?

When attending a talk by Susan Holmes, I noticed that her work on phylogenetic trees was closely related to a certain operad. And when I discussed her work here, James Griffin pointed out that this operad can be built using a slight variant of a famous construction due to Boardman and Vogt: their so-called ‘W construction’!

I liked the idea that trees and operads in topology might be related to phylogenetic trees. And thinking further, I found that the relation was real, and far from a coincidence. In fact, phylogenetic trees can be seen as operations in a certain operad… and this operad is closely related to the way computational biologists model DNA evolution as a branching sort of random walk.

That’s what I’d like to explain now.

I’ll be a bit sketchy, because I’d rather get across the basic ideas than the technicalities. I could even be wrong about some fine points, and I’d be glad to talk about those in the comments. But the overall picture is solid.

Phylogenetic trees

Let’s ponder the mathematical structure of a phylogenetic tree. First, it’s a tree: a connected graph with no circuits. Second, it’s a rooted tree, meaning it has one vertex designated as the root. And third, the leaves are labelled.

I should explain the third part! For any rooted tree, the vertices with just one edge coming out of them are called leaves. If the root is drawn at the bottom of the tree, the leaves are usually drawn at the top. In biology, the leaves are labelled by names of species: these labels matter. In mathematics, we can label the leaves by numbers 1, 2, \dots, n, where n is the number of leaves.

Summarizing all this, we can say a phylogenetic tree should at least be a leaf-labelled rooted tree.

That’s not all there is to it. But first, a comment. When you see a phylogenetic tree drawn by a biologist, it’ll pretty much always be a binary tree, meaning that as we move up any edge, away from the root, it either branches into two new edges or ends in a leaf. The reason is that while species often split into two as they evolve, it is less likely for a species to split into three or more new species all at once.

So, the phylogenetic trees we see in biology are usually leaf-labeled rooted binary trees. However, we often want to guess such a tree from some data. In this game, trees that aren’t binary become important too!

Why? Well, here another fact comes into play. In a phylogenetic tree, typically each edge can be labeled with a number saying how much evolution occurred along that edge. But as this number goes to zero, we get a tree that’s not binary anymore. So, we think of non-binary trees as conceptually useful ‘borderline cases’ between binary trees.

So, it’s good to think about phylogenetic trees that aren’t necessarily binary… and have edges labelled by numbers. Let’s make this into a formal definition:

Definition A phylogenetic tree is a leaf-labeled rooted tree where each edge not touching a leaf is labeled by a positive real number called its length.

By the way, I’m not claiming that biologists actually use this definition.

I’ll write \mathrm{Phyl}_n for the set of phylogenetic trees with n leaves. This becomes a topological space in a fairly obvious way, where we can trace out a continuous path by continuously varying the edge lengths of a tree. But when some edge lengths approach zero, our graph converges to one where the vertices at the ends of these edges ‘fuse into one’, leaving us with a graph with fewer vertices.

Here’s an example for you to check your understanding of what I just said. With the topology I’m talking about, there’s a continuous path in \mathrm{Phyl}_4 that looks like this:

These trees are upside-down, but don’t worry about that. You can imagine this path as a process where biologists slowly change their minds about a phylogenetic tree as new data dribbles in. As they change their minds, the tree changes shape in a continuous way.

For more on the space of phylogenetic trees, see:

• Louis Billera, Susan Holmes and Karen Vogtmann, Geometry of the space of phylogenetic trees, Advances in Applied Mathematics 27 (2001), 733-767.

Operads

How are phylogenetic trees related to operads? I have three things to say about this. First, they are the operations of an operad:

Theorem 1. There is an operad called the phylogenetic operad, or \mathrm{Phyl}, whose space of n-ary operations is \mathrm{Phyl}_n.

If you don’t know what an operad is, I’d better tell you now. They come in different flavors, and technically I’ll be using ‘symmetric topological operads’. But instead of giving the full definition, which you can find on the nLab, I think it’s better if I sketch some of the key points.

For starters, an operad O consists of a topological space O_n for each n = 0,1,2,3 \dots. The points in O_n are called the n-ary operations of O. You can visualize an n-ary operation f \in O_n as a black box with n input wires and one output wire:

Of course, this also looks like a tree.

We can permute the inputs of an n-ary operation and get a new n-ary operation, so we have an action of the permutation group S_n on O_n. You visualize this as permuting input wires:

More importantly, we can compose operations! If we have an n-ary operation f, and n more operations, say g_1, \dots, g_n, we can compose f with all the rest and get an operation called

f \circ (g_1, \dots, g_n)

Here’s how you should imagine it:

Composition and permutation must obey some laws, all of which are completely plausible if you draw them as pictures. For example, the associative law makes a composite of composites like this well-defined:

Now, these pictures look a lot like trees. So it shouldn’t come as a shock that phylogenetic trees are the operations of some operad \mathrm{Phyl}. But let’s sketch why it’s true.

First, we can permute the ‘inputs’—meaning the labels on the leaves—of any phylogenetic tree and get a new phylogenetic tree. This is obvious.

Second, and more importantly, we can ‘compose’ phylogenetic trees. How do we do this? Simple: we glue the roots of a bunch of phylogenetic trees to the leaves of another and get a new one!

More precisely, suppose we have a phylogenetic tree with n leaves, say f. And suppose we have n more, say g_1, \dots, g_n. Then we can glue the roots of g_1, \dots, g_n to the leaves of f to get a new phylogenetic tree called

f \circ (g_1, \dots, g_n)

Third and finally, all the operad laws hold. Since these laws all look obvious when you draw them using pictures, this is really easy to show.
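Here's a minimal sketch of this grafting operation in plain Python (the class names are my own invention, not a standard library). Composition glues the root of g_i onto the leaf of f labelled i, relabelling the leaves of the composite from left to right:

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Leaf:
    label: int                 # leaves are labelled 1, ..., n

@dataclass
class Node:
    length: float              # length of the edge below this vertex
    children: List['Tree']

Tree = Union[Leaf, Node]

def n_leaves(t: Tree) -> int:
    return 1 if isinstance(t, Leaf) else sum(n_leaves(c) for c in t.children)

def relabel(t: Tree, offset: int) -> Tree:
    if isinstance(t, Leaf):
        return Leaf(t.label + offset)
    return Node(t.length, [relabel(c, offset) for c in t.children])

def compose(f: Tree, gs: List[Tree]) -> Tree:
    """Glue the root of gs[i-1] onto the leaf of f labelled i."""
    offsets = [0]
    for g in gs:
        offsets.append(offsets[-1] + n_leaves(g))
    def go(t: Tree) -> Tree:
        if isinstance(t, Leaf):
            return relabel(gs[t.label - 1], offsets[t.label - 1])
        return Node(t.length, [go(c) for c in t.children])
    return go(f)

# graft a 2-leaf tree and a single leaf onto a binary tree with 2 leaves:
f = Node(1.0, [Leaf(1), Leaf(2)])
h = compose(f, [Node(0.5, [Leaf(1), Leaf(2)]), Leaf(1)])
print(n_leaves(h))  # 3
```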

If you’ve been paying careful attention, you should be worrying about something now. In operad theory, we think of an operation f \in O_n as having n inputs and one output. For example, this guy has 3 inputs and one output:

But in biology, we think of a phylogenetic tree as having one input and n outputs. We start with one species (or other grouping of organisms) at the bottom of the tree, let it evolve and branch, and wind up with n of them!

In other words, operad theorists read a tree from top to bottom, while biologists read it from bottom to top.

Luckily, this isn’t a serious problem. Mathematicians often use a formal trick where they take an operation with n inputs and one output and think of it as having one input and n outputs. They use the prefix ‘co-‘ to indicate this formal trick.

So, we could say that phylogenetic trees stand for ‘co-operations’ rather than operations. Soon this trick will come in handy. But not just yet!

The W construction

Boardman and Vogt had an important construction for getting new operads from old, called the ‘W construction’. Roughly speaking, if you start with an operad O, this gives a new operad \mathrm{W}(O) whose operations are leaf-labelled rooted trees where:

1) all vertices except leaves are labelled by operations of O, and a vertex with n input edges must be labelled by an n-ary operation of O,

and

2) all edges except those touching the leaves are labelled by numbers in (0,1].

If you think about it, the operations of \mathrm{W}(O) are strikingly similar to phylogenetic trees, except that:

1) in phylogenetic trees the vertices don’t seem to be labelled by operations of an operad,

and

2) we use arbitrary nonnegative numbers to label edges, instead of numbers in (0,1].

The second point is a real difference, but it doesn’t matter much: if Boardman and Vogt had used nonnegative numbers instead of numbers in (0,1] to label edges in the W construction, it would have worked just as well. Technically, they’d get a ‘weakly equivalent’ operad.

The first point is not a real difference. You see, there’s an operad called \mathrm{Comm} which has exactly one operation of each arity. So, labelling vertices by operations of \mathrm{Comm} is a completely trivial process.

As a result, we conclude:

Theorem 2. The phylogenetic operad is weakly equivalent to \mathrm{W}(\mathrm{Comm}).

If you’re not an operadchik (that’s what experts on operads are called), you may be wondering what \mathrm{Comm} stands for. The point is that operads have ‘algebras’, where the abstract operations of the operad are realized as actual operations on some topological space. And the algebras of \mathrm{Comm} are precisely commutative topological monoids: that is, topological spaces equipped with a commutative associative product!

Branching Markov processes and evolution

By now, if you haven’t fallen asleep, you should be brimming with questions, such as:

1) What does it mean that phylogenetic trees are the operations of some operad \mathrm{Phyl}? Why should we care?

2) What does it mean to apply the W construction to the operad \mathrm{Comm}? What’s the significance of doing this?

3) What does it mean that \mathrm{Phyl} is weakly equivalent to \mathrm{W}(\mathrm{Comm})? You can see the definition of weak equivalence here, but it’s pretty technical, so it needs some explanation.

The answers to questions 2) and 3) take us quickly into fairly deep waters of category theory and algebraic topology—deep, that is, if you’ve never tried to navigate them. However, these waters are well-trawled by numerous experts, and I have little to say about questions 2) and 3) that they don’t already know. So given how long this talk already is, I’ll instead try to answer question 1). This is where some ideas from biology come into play.

I’ll summarize my answer in a theorem, and then explain what the theorem means:

Theorem 3. Given any continuous-time Markov process on a finite set X, the vector space V whose basis is X naturally becomes a coalgebra of the phylogenetic operad.

Impressive, eh? But this theorem is really just saying that biologists are already secretly using the phylogenetic operad.

Biologists who try to infer phylogenetic trees from present-day genetic data often use simple models where the genotype of each species follows a ‘random walk’. Also, species branch in two at various times. These models are called Markov models.

The simplest Markov model for DNA evolution is the Jukes–Cantor model. Consider a genome of fixed length: that is, one or more pieces of DNA having a total of N base pairs. For example, this tiny genome has N = 4 base pairs, just enough to illustrate the 4 possible choices, which are called A, T, C and G:

Since there are 4 possible choices for each base pair, there are 4^N possible genotypes with N base pairs. In the human genome, N is about 3 \times 10^9. So, there are about

4^{3 \times 10^9} \approx 10^{1,800,000,000}

genotypes of this length. That’s a lot!

As time passes, the Jukes–Cantor model says that the human genome randomly walks through this enormous set of possibilities, with each base pair having the same rate of randomly flipping to any other base pair.

Biologists have studied many ways to make this model more realistic, but in any Markov model of DNA evolution we’ll typically have some finite set X of possible genotypes, together with some random walk on this set. The term ‘random walk’ is a bit imprecise, though: what I really mean is a ‘continuous-time Markov process’. So let me define that.

Fix a finite set X. For each time t \in [0,\infty) and pair of points i, j in X, a continuous-time Markov process gives a number T_{ij}(t) \in [0,1]: the probability that a walk starting at the point j at time zero will be at the point i at time t. We can think of these numbers as forming an X \times X square matrix T(t) at each time t. We demand that four properties hold:

1) T(t) depends continuously on t.

2) For all s, t we have T(s) T(t) = T(s + t).

3) T(0) is the identity matrix.

4) For all j and t we have:

\sum_{i \in X} T_{i j}(t) = 1.

All these properties make a lot of sense if you think a bit, though condition 2) says that the random walk does not change character with the passage of time, which would be false given external events like, say, ice ages. As far as math jargon goes, conditions 1)-3) say that T is a continuous one-parameter semigroup, while condition 4) together with the fact that T_{ij}(t) \in [0,1] says that at each time, T(t) is a stochastic matrix.
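Here's a sketch (assuming NumPy and SciPy) using the Jukes–Cantor model: the generator Q has each off-diagonal rate equal to some μ, the semigroup is T(t) = \exp(tQ), and properties 2)–4) can be checked numerically:

```python
import numpy as np
from scipy.linalg import expm

mu = 0.3  # rate of flipping to each other base (an arbitrary sample value)
# X = {A, T, C, G}: off-diagonal entries mu, diagonal entries -3*mu,
# so each column of Q sums to zero
Q = mu * (np.ones((4, 4)) - 4 * np.eye(4))

def T(t):
    return expm(t * Q)

s, t = 0.4, 1.1
print(np.allclose(T(s) @ T(t), T(s + t)))   # property 2: semigroup law
print(np.allclose(T(0.0), np.eye(4)))       # property 3: identity at t = 0
print(np.allclose(T(t).sum(axis=0), 1.0))   # property 4: columns sum to 1
```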

Let V be the vector space whose basis is X. To avoid getting confused, let’s write e_i for the basis vector corresponding to i \in X. Any probability distribution on X gives a vector in V. Why? Because it gives a probability \psi_i for each i \in X, and we can think of these as the components of a vector \psi \in V.

Similarly, for any time t \in [0,\infty), we can think of the matrix T(t) as a linear operator

T(t) : V \to V

So, if we start with some probability distribution \psi of genotypes, and let them evolve for a time t according to our continuous-time Markov process, by the end the probability distribution will be T(t) \psi.

But species do more than evolve this way: they also branch! A phylogenetic tree describes a way for species to evolve and branch.

So, you might hope that any phylogenetic tree f \in \mathrm{Phyl}_n gives a ‘co-operation’ that takes one probability distribution \psi \in V as input and returns n probability distributions as output.

That’s true. But these n probability distributions will be correlated, so it’s better to think of them as a single probability distribution on the set X^n. This can be seen as a vector in the vector space V^{\otimes n}, the tensor product of n copies of V.

So, any phylogenetic tree f \in \mathrm{Phyl}_n gives a linear operator from V to V^{\otimes n}. We’ll call it

T(f) : V \to V^{\otimes n}

because we’ll build it starting from the Markov process T.

Here’s a sketch of how we build it—I’ll give a more precise account in the next and final section. A phylogenetic tree is made of a bunch of vertices and edges. So, I just need to give you an operator for each vertex and each edge, and you can compose them and tensor them to get the operator T(f):

1) For each vertex with one edge coming in and n coming out:

we need an operator

V \to V^{\otimes n}

that describes what happens when one species branches into n species. This operator takes the probability distribution we put in and makes n identical and perfectly correlated copies. To define this operator, we use the fact that the vector space V has a basis e_i labelled by the genotypes i \in X. Here’s how the operator is defined:

e_i \mapsto e_i \otimes \cdots \otimes e_i \in V^{\otimes n}

2) For each edge of length t, we need an operator that describes a random walk of length t. This operator is provided by our continuous-time Markov process: it’s

T(t) : V \to V

And that’s it! By combining these two kinds of operators, one for ‘branching’ and one for ‘random walking’, we get a systematic way to take any phylogenetic tree f \in \mathrm{Phyl}_n and get an operator

T(f) : V \to V^{\otimes n}

In fact, these operators T(f) obey just the right axioms to make V into what’s called a ‘coalgebra’ of the phylogenetic operad. But to see this—that is, to prove Theorem 3—it helps to use a bit more operad technology.
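For the simplest nontrivial tree, a root edge of length t followed by a branching into two leaves, the recipe above gives T(f) = \Delta \circ T(t), where \Delta is the duplicating operator from step 1). Here's a sketch (assuming NumPy and SciPy, with the Jukes–Cantor process standing in for the Markov process):

```python
import numpy as np
from scipy.linalg import expm

n = 4   # genotypes X = {A, T, C, G}
mu = 0.3
Q = mu * (np.ones((n, n)) - n * np.eye(n))  # Jukes-Cantor generator

# Delta: V -> V (x) V, e_i |-> e_i (x) e_i, as an (n*n) x n matrix
Delta = np.zeros((n * n, n))
for i in range(n):
    Delta[i * n + i, i] = 1.0

def T_edge(t):
    return expm(t * Q)

def T_f(t):
    """Evolve for time t along the root edge, then branch in two."""
    return Delta @ T_edge(t)

psi = np.array([1.0, 0.0, 0.0, 0.0])  # start surely in the first genotype
out = T_f(1.0) @ psi                  # joint distribution on X x X

print(out.sum())                      # ~1: still a probability distribution
M = out.reshape(n, n)
print(np.allclose(M, np.diag(np.diag(M))))  # True: the two copies are
                                            # perfectly correlated
```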

The proof

I haven’t even defined coalgebras of operads yet. And I don’t think I’ll bother. Why not? Well, while the proof of Theorem 3 is fundamentally trivial, it’s sufficiently sophisticated that only operadchiks would enjoy it without a lengthy warmup. And you’re probably getting tired by now.

So, to most of you reading this: bye! It was nice seeing you! And I hope you sensed the real point of this talk:

Some of the beautiful structures used in algebraic topology are also lurking in biology. These structures may or may not be useful in biology… but we’ll never know if we don’t notice them and say what they are! So, it makes sense for mathematicians to spend some time looking for them.

Now, let me sketch a proof of Theorem 3. It follows from a more general theorem:

Theorem 4. Suppose V is an object in some symmetric monoidal topological category C. Suppose that V is equipped with an action of the additive monoid [0,\infty). Suppose also that V is a cocommutative coalgebra. Then V naturally becomes a coalgebra of the phylogenetic operad.

How does this imply Theorem 3? In Theorem 3, C is the category of finite-dimensional real vector spaces. The action of [0,\infty) on V is the continuous-time Markov process. And V becomes a cocommutative coalgebra because it’s a vector space with a distinguished basis, namely the finite set X. This makes V into a cocommutative coalgebra in the usual way, where the comultiplication:

\Delta: V \to V \otimes V

‘duplicates’ basis vectors:

\Delta : e_i \mapsto e_i \otimes e_i

while the counit:

\epsilon : V \to \mathbb{R}

‘deletes’ them:

\epsilon : e_i \mapsto 1

These correspond to species splitting in two and species going extinct, respectively. (Biologists trying to infer phylogenetic trees often ignore extinction, but it’s mathematically and biologically natural to include it.) So, all the requirements are met to apply Theorem 4 and make V into a coalgebra of the phylogenetic operad.

But how do we prove Theorem 4? It follows immediately from Theorem 5:

Theorem 5. The phylogenetic operad \mathrm{Phyl} is the coproduct of the operad \mathrm{Comm} and the additive monoid [0,\infty), viewed as an operad with only 1-ary operations.

Given how coproducts work, this means that an algebra of both \mathrm{Comm} and [0,\infty) is automatically an algebra of \mathrm{Phyl}. In other words, any commutative algebra with an action of [0,\infty) is an algebra of \mathrm{Phyl}. Dualizing, it follows that any cocommutative coalgebra with an action of [0,\infty) is a coalgebra of \mathrm{Phyl}. And that’s Theorem 4!

But why is Theorem 5 true? First of all, I should emphasize that the idea of using it was suggested by Tom Leinster in our last blog conversation on the phylogenetic operad. And in fact, Tom proved a result very similar to Theorem 5 here:

• Tom Leinster, Coproducts of operads, and the W-construction, 14 September 2000.

He gives an explicit description of the coproduct of an operad O and a monoid, viewed as an operad with only unary operations. He works with non-symmetric, non-topological operads, but his ideas also work for symmetric, topological ones. Applying his ideas to the coproduct of \mathrm{Comm} and [0,\infty), we see that we get the phylogenetic operad!

And so, phylogenetic trees turn out to be related to coproducts of operads. Who’d have thought it? But we really don’t have as many fundamentally different ideas as you might think: it’s hard to have new ideas. So if you see biologists and algebraic topologists both drawing pictures of trees, you should expect that they’re related.


Networks and Population Biology (Part 4)

6 May, 2011

Today was the last day of the tutorials on discrete mathematics and probability in networks and population biology. Persi Diaconis gave two talks, one on ‘exponential families’ of random graphs and one on ‘exchangeability’. Since there’s way too much to summarize, I’ll focus on explaining ideas from the first talk, leaving you to read about the second here:

• Persi Diaconis and Svante Janson, Graph limits and exchangeable random graphs.

Susan Holmes also gave two talks. The first was on metagenomics and the human microbiome—very cool stuff. Did you know that your body contains 100 trillion bacteria, and only 10 trillion human cells? And you’ve got 1000 species of bacteria in your gut? Statistical ecologists are getting very interested in this.

Her second talk was about doing statistics when you’ve got lots of data of different kinds that need to be integrated: numbers, graphs and trees, images, spatial information, and so on. This is clearly the wave of the future. You can see the slides for this talk here:

• Susan Holmes, Heterogeneous data challenge: combining complex data.

The basic idea of Persi Diaconis’ talk was simple and shocking. Suppose you choose a random graph in the most obvious way designed to heighten the chance that it contains a triangle, or some other figure. Then in fact all you’ve done is change the chance that there’s an edge between any given pair of vertices!

But to make this precise—to make it even true—we need to say what the rules are.

For starters, let me point you back to part 2 for Persi’s definitions of ‘graph’ and ‘graph homomorphism’. If we fix a finite set \{1,\dots, n\}, there will be a big set \mathcal{G}_n of graphs with exactly these vertices. To define a kind of ‘random graph’, we first pick a probability measure on each set \mathcal{G}_n. Then, we demand that these probability measures converge in a certain sense as n \to \infty.

However, we can often describe random graphs in a more intuitive way! For example, the simplest random graphs are the Erdős–Rényi random graphs. These depend on a parameter p \in [0,1]. The idea here is that we take our set of vertices and for each pair we flip a coin that lands heads up with probability p. If it lands heads up, we stick in an edge between those vertices; otherwise not. So, the presence or absence of each edge is an independent random variable.
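To make this recipe completely concrete, here is a minimal sketch (my own illustration, not from the talk) of sampling an Erdős–Rényi random graph: one independent coin flip per pair of vertices.

```python
import random

def erdos_renyi(n, p, seed=None):
    """Sample an Erdős–Rényi random graph on vertices 0, ..., n-1:
    each of the n*(n-1)/2 possible edges is present independently
    with probability p.  Returns the edge set as pairs (i, j), i < j."""
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:   # flip a coin that lands heads with probability p
                edges.add((i, j))
    return edges

# A graph like the one pictured below: 100 vertices, 1% edge probability.
g = erdos_renyi(100, 0.01, seed=1)
```

Since every edge is an independent coin flip, the expected number of edges is simply p times the number of vertex pairs.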

Here’s a picture of an Erdős–Rényi random graph drawn by von Frisch, with a 1% chance of an edge between any two vertices. But it’s been drawn in a way so that the best-connected vertices are near the middle, so it doesn’t look as random as it is:

People have studied the Erdős–Rényi random graphs very intensively, so now people are eager to study random graphs with more interesting correlations. For example, consider the graph where we draw an edge between any two people who are friends. If you’re my friend and I’m friends with someone else, that improves the chances that you’re friends with them! In other words, friends tend to form ‘triangles’. But in an Erdős–Rényi random graph there’s no effect like that.

‘Exponential families’ of random graphs seem like a way around this problem. The idea here is to pick a specific collection of graphs H_1, \dots, H_k and say how commonly we want these to appear in our random graph. If we only use one graph H_1, and we take this to be two vertices connected by an edge, we’ll get an Erdős–Rényi random graph. But, if we also want our graph to contain a lot of triangles, we can pick H_2 to be a triangle.

More precisely, remember from part 2 that t(H_i,G) is the fraction of functions from the vertices of H_i to the vertices of G that are actually graph homomorphisms. This is the smart way to keep track of how often H_i shows up inside G. So, we pick some numbers \beta_1, \dots , \beta_k and define a probability measure on \mathcal{G}_n as follows: the probability of any particular graph G \in \mathcal{G}_n should be proportional to

\displaystyle{\exp \left( \beta_1 \, t(H_1, G) + \cdots + \beta_k \, t(H_k, G) \right)}

If you’re a physicist you’ll call this a ‘Gibbs state’, and you’ll know this is the way to get a probability distribution that maximizes entropy while holding the expected values of t(H_i, G) constant. Statisticians like to call the whole family of Gibbs states, as we vary the numbers \beta_i, an ‘exponential family’. But the cool part, for me, is that we can apply ideas from physics—namely, statistical mechanics—to graph theory.
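For small graphs we can compute the homomorphism density t(H, G) and the resulting unnormalized Gibbs weight by brute force. This is just an illustrative sketch (the function names and edge-set representation are my own), and it is hopelessly slow for big graphs, since it enumerates all |V(G)|^{|V(H)|} maps:

```python
import math
from itertools import product

def hom_density(H, nH, G, nG):
    """t(H, G): the fraction of all maps from the nH vertices of H to
    the nG vertices of G that are graph homomorphisms, i.e. carry
    every edge of H to an edge of G.  A graph is a set of pairs
    (i, j) with i < j on the vertex set {0, ..., n-1}."""
    G_edges = {frozenset(e) for e in G}
    homs = sum(
        1
        for f in product(range(nG), repeat=nH)
        if all(frozenset((f[i], f[j])) in G_edges for (i, j) in H)
    )
    return homs / nG**nH

def gibbs_weight(G, nG, Hs, betas):
    """Unnormalized probability of G in the exponential family
    determined by the graphs Hs (paired with their vertex counts)
    and the coupling constants betas."""
    return math.exp(sum(b * hom_density(H, nH, G, nG)
                        for (H, nH), b in zip(Hs, betas)))

edge = ({(0, 1)}, 2)                        # H_1: a single edge
triangle = ({(0, 1), (1, 2), (0, 2)}, 3)    # H_2: a triangle
```

For example, taking G itself to be a triangle, t(edge, G) = 6/9 = 2/3 and t(triangle, G) = 6/27 = 2/9: only the injective maps are homomorphisms.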

So far we’ve got a probability measure on our set \mathcal{G}_n of graphs with n vertices. These probability measures converge in a certain sense as n \to \infty. But Diaconis and Chatterjee proved a shocking theorem: for almost all choices of the graphs H_i and numbers \beta_i > 0, these probability measures converge to an Erdős–Rényi random graph! And in the other cases, they converge to a probabilistic mixture of Erdős–Rényi random graphs.

In short, as long as the numbers \beta_i are positive, exponential families don’t buy us much. We could just work with Erdős–Rényi random graphs, or probabilistic mixtures of these. The exponential families are still very interesting to study, but they don’t take us into truly new territory.

The theorem is here:

• Sourav Chatterjee and Persi Diaconis, Estimating and understanding random graph models.

To reach new territory, we can try letting some \beta_i be negative. The paper talks about this too. Here many questions remain open!


Networks and Population Biology (Part 3)

5 May, 2011

I think today I’ll focus on one aspect of the talks Susan Holmes gave today: the space of phylogenetic trees. Her talks were full of interesting applications to genetics, but I’m afraid my summary will drift off into a mathematical daydream inspired by what she said! Luckily you can see her actual talk uncontaminated by my reveries here:

• Susan Holmes, Treespace: distances between trees.

It’s based on this paper:

• Louis Billera, Susan Holmes and Karen Vogtmann, Geometry of the space of phylogenetic trees, Advances in Applied Mathematics 27 (2001), 733-767.

As I mentioned last time, a phylogenetic tree is something like this:

More mathematically, what we see here is a tree (a connected graph with no circuits), with a distinguished vertex called the root, and n vertices of degree 1, called leaves, that are labeled with elements from some n-element set. We shall call such a thing a leaf-labeled rooted tree.

Now, the tree above is actually a binary tree, meaning that as we move up an edge, away from the root, it either branches into two new edges or ends in a leaf. (More precisely: each vertex that doesn’t have degree 1 has degree 3.) This makes sense in biology because while species often split into two as they evolve, it is less likely for a species to split into three all at once.

So, the phylogenetic trees we see in biology are usually leaf-labeled rooted binary trees. However, we often want to guess such a tree from some data. In this game, trees that aren’t binary become important too!

Why? Well, each edge of the tree can be labeled with a number saying how much evolution occurred along that edge: for example, how many DNA base pairs changed. But as this number goes to zero, we get a tree that’s not binary anymore. So, we think of non-binary trees as conceptually useful ‘intermediate cases’ between binary trees.

This idea immediately leads us to consider a topological space consisting of phylogenetic trees which are not necessarily binary. And at this point in the lecture I drifted off into a daydream about ‘operads’, which are a nice piece of mathematics that’s closely connected to this idea.

So, I will deviate slightly from Holmes and define a phylogenetic tree to be a leaf-labeled rooted tree where each edge is labeled by a number called its length. This length must be positive for every edge except the edge incident to the root; for that edge any nonnegative length is allowed.

Let’s write \mathrm{Phyl}_n for the set of phylogenetic trees with n leaves. This becomes a topological space in a fairly obvious way. For example, there’s a continuous path in \mathrm{Phyl}_3 that looks like this:

Moreover we have this fact:

Theorem. There is a topological operad called the phylogenetic operad, or \mathrm{Phyl}, whose space of n-ary operations is \mathrm{Phyl}_n for n \ge 1 and the empty set for n = 0.

If you don’t know what an operad is, don’t be scared. This mainly just means that you can glue a bunch of phylogenetic trees to the top of another one and get a new phylogenetic tree! More precisely, suppose you have a phylogenetic tree with n leaves, say T. And suppose you have n more, say T_1, \dots, T_n. Then you can glue the roots of T_1, \dots, T_n to the leaves of T to get a new phylogenetic tree called T \circ (T_1, \dots, T_n). Furthermore, this gluing operation obeys some rules which look incredibly intimidating when you write them out using symbols, but pathetically obvious when you draw them using pictures of trees. And these rules are the definition of an operad.
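To make the gluing concrete, here is a small sketch in Python. The representation is my own invention, not anything from the paper: each subtree records the length of the edge beneath it, leaves carry labels, and composition glues the root edge of each T_i onto the corresponding leaf edge of T, adding the two lengths, and relabels leaves so the result is again leaf-labeled by 0, 1, 2, ….

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Leaf:
    label: int     # which leaf this is, numbered 0, ..., n-1
    length: float  # length of the edge ending at this leaf

@dataclass
class Node:
    children: List["Tree"]
    length: float  # length of the edge below this node (may be 0 at the root)

Tree = Union[Leaf, Node]

def n_leaves(t: Tree) -> int:
    return 1 if isinstance(t, Leaf) else sum(n_leaves(c) for c in t.children)

def shift_labels(t: Tree, offset: int) -> Tree:
    if isinstance(t, Leaf):
        return Leaf(t.label + offset, t.length)
    return Node([shift_labels(c, offset) for c in t.children], t.length)

def compose(T: Tree, subtrees: List[Tree]) -> Tree:
    """The operad composition T o (T_1, ..., T_n): glue the root of
    subtrees[i] onto leaf i of T.  The leaf edge of T and the root
    edge of the subtree merge into a single edge whose length is the
    sum of the two lengths."""
    offsets, total = [], 0
    for s in subtrees:
        offsets.append(total)
        total += n_leaves(s)

    def go(t: Tree) -> Tree:
        if isinstance(t, Leaf):
            s = shift_labels(subtrees[t.label], offsets[t.label])
            if isinstance(s, Leaf):
                return Leaf(s.label, s.length + t.length)
            return Node(s.children, s.length + t.length)
        return Node([go(c) for c in t.children], t.length)

    return go(T)
```

Drawing the trees makes it obvious that this gluing is associative and compatible with relabelings, which is exactly what the operad axioms demand.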

I would like to know if mathematicians have studied the operad \mathrm{Phyl}. It’s closely related to Stasheff’s associahedron operad, but importantly different. Operads have ‘algebras’, and the algebras of the associahedron operad are topological spaces with a product that’s ‘associative up to coherent homotopy’. I believe algebras of the phylogenetic operad are topological spaces with a commutative product that’s associative up to coherent homotopy. Has someone studied these?

In their paper Holmes and her coauthors discuss the associahedron in relation to their own work, but they don’t mention operads. I’ve found another paper that mentions ‘the space of phylogenetic trees’:

• David Speyer and Bernd Sturmfels, The tropical Grassmannian, Adv. Geom. 4 (2004), 389–411.

but they don’t seem to study the operad aspect either.

Perhaps one reason is that Holmes and her coauthors deliberately decide to ignore the lengths of the edges incident to the leaves. So, they get a space of phylogenetic trees with n leaves whose product with (0,\infty)^n is the space I’m calling \mathrm{Phyl}_n. As they mention, this simplifies the geometry a bit. However, it’s not so nice if you want an operad that accurately describes how you can build a big phylogenetic tree from smaller ones.

They don’t care about operads; they do some wonderful things with the geometry of their space of phylogenetic trees. They construct a natural metric on it, and show it’s a CAT(0) space in the sense of Gromov. This means that triangles in this space are at least as skinny as the corresponding triangles in Euclidean space—more like triangles in hyperbolic space:

They study geodesics in this space—even though it’s not a manifold, but something more singular. And so on!

There’s a lot of great geometry here. But for Holmes, all this is just preparation for doing some genomics— for example, designing statistical tests to measure how reliable the phylogenetic trees guessed from data actually are. And for that aspect, try this:

• Susan Holmes, Statistical approach to tests involving phylogenies, in O. Gascuel, editor, Mathematics of Evolution and Phylogeny, Oxford U. Press, Oxford, 2007.

