Networks and Population Biology (Part 2)

4 May, 2011

Yesterday I was too sick to attend this conference:

Tutorials on discrete mathematics and probability in networks and population biology, Institute of Mathematical Sciences, National University of Singapore, 2-6 May 2011. Organized by Andrew Barbour, Malwina Luczak, Gesine Reinert and Rongfeng Sun.

But today I bounced back, and I want to tell you about the first lectures by Persi Diaconis and his wife Susan Holmes, both statisticians at Stanford University.

Since Persi Diaconis is one of those people who make it really obvious that being a mathematician is more fun than any other profession, I should say a bit about him. He left home at the age of 14 to become a professional stage magician, and only returned to school so that he could learn all the math needed to understand Feller’s famous book An Introduction to Probability Theory and Its Applications. Since then he has worked on the mathematics of shuffling cards and the physics of flipping coins. He also works on random matrix theory, the statistics of Riemann zeta zeroes, applications of Hopf algebras to probability theory, and all sorts of other things that don’t sound fun unless you know enough to realize that yes, they are.

In my last blog post on this conference Tim van Beek asked if Persi could really flip a coin and have it land with whatever side up he wanted. So, I asked him about that. He said yes, he could. He didn’t demonstrate it at the time, since we were just about to go to a talk. But he said you can see videos of other people doing it on the web.

Unfortunately, the videos I’ve found so far mainly convince me, not that there are people who have mastered the physical skill of controlling a coin flip, but that there are lots of tricksters out there! Here are two:

• Derren Brown, The system – coin toss.

• EricSurf6, How to win a coin toss.

Can you figure out how they do it? If you don’t get the second one, you can watch this.

So, I have to wonder: can Persi really flip a coin and have it land the way he wants, or is he just fooling people into thinking he can? Mind you, I wouldn’t consider the latter a bad thing, at least insofar as he’s still a practicing stage magician: a basic part of the craft is trickery of this sort. I should ask if he’s sworn the Magician’s Oath:

“As a magician I promise never to reveal the secret of any illusion to a non-magician, unless that one swears to uphold the Magician’s Oath in turn. I promise never to perform any illusion for any non-magician without first practicing the effect until I can perform it well enough to maintain the illusion of magic.”

Anyway, he is giving a series of talks on “Graph limits and exponential models”. There are lots of very large graphs in biology, so he plans to talk about ways to study large graphs by taking limits where the graphs become infinite. He began by describing a result so beautiful that I would like to state it here.

‘Graph’ means many things, but in his talk a graph is simple (no self-loops or multiple edges between vertices), undirected and labeled. In other words, something like this:

A graph homomorphism

\phi : G \to H

is a map sending vertices of G to vertices of H, such that whenever vertices i,j of G have an edge between them, the vertices \phi(i), \phi(j) have an edge between them in H.

Given this, we can define t(G,H) to be the fraction of the functions from vertices of G to vertices of H that are actually graph homomorphisms.
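For small graphs we can compute t(G,H) by brute force, just enumerating all maps and counting the homomorphisms. Here’s a sketch in Python (the graph encoding and the function name are my own choices):

```python
from itertools import product

def t_density(G, H):
    """Fraction of all maps V(G) -> V(H) that are graph homomorphisms.

    A graph is a pair (number_of_vertices, edges), where each edge is a
    pair {i, j} of distinct vertices (simple, undirected, no loops)."""
    nG, EG = G
    nH, EH = H
    homs = 0
    for phi in product(range(nH), repeat=nG):
        # phi[i] is where vertex i of G gets sent in H
        if all(frozenset((phi[i], phi[j])) in EH for (i, j) in EG):
            homs += 1
    return homs / nH**nG

# Example: G is a single edge, H is a triangle.
edge = (2, [(0, 1)])
triangle = (3, {frozenset(e) for e in [(0, 1), (1, 2), (0, 2)]})
print(t_density(edge, triangle))   # 6 homomorphisms out of 9 maps: 2/3
```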

The famous graph theorist László Lovász said that a sequence of graphs G_n converges if t(G_n, H) converges for all graphs H.

Using this definition, we have:

Theorem (Aldous-Hoover, Lovász et al). There is a compact metric space X such that each graph gives a point in X, and a sequence of graphs G_n converges iff

G_n \to x

for some point x \in X.

The space X is called the space of graph limits, and the important thing is that one can construct it rather explicitly! Persi explained how. For a written explanation, see:

• Miklós Rácz, Dense graph limits, 31 October 2010.

After a break, Susan Holmes began her series of talks on “Phylogeny, trees and the human microbiome”. I can’t easily summarize all she said, but here are the slides for the first talk:

• Susan Holmes, Molecular evolution: phylogenetic tree building.

Her basic plan today was to teach us a little genomics, a little statistics, and a little bit about Markov chains, so that we could start to understand how people attempt to reconstruct phylogenetic trees from DNA data.

A phylogenetic tree is something like this:

or something less grand where, for example, you only consider species of frogs, or even strains of HIV in a single patient. These days there are programs to generate such trees given copies of the ‘same gene’ in the different organisms you’re considering. These programs have names like PhyML, FastML, RAxML and MrBayes. In case you’re wondering, ‘ML’ stands for ‘maximum likelihood’, a standard technique in statistics, while Bayes was the originator of some very important ideas in statistical reasoning.

But even though there are programs to do these things, there are still lots of fascinating problems to solve in this area! And indeed, even understanding the programs is a bit of a challenge.

Since we were talking about the genetic code here recently, I’ll wrap up by mentioning one thing I learned about this from Susan’s talk.

It’s common to describe the accumulation of changes in DNA using substitution models. These are continuous time Markov chains where each base pair has a given probability of mutating to another one. Often people assume this probability for each base pair is independent of what its neighbors do. The last assumption is known to be false, but that’s another story for another day! What I wanted to say is that there are two famous models. The simplest is the Jukes-Cantor model, where each of the four bases—A, T, C, and G—has an equal probability of mutating into any other. But a somewhat better model is the Kimura model, where the transitions

A ↔ G
C ↔ T

have a different rate of occurring than all the remaining possibilities. If you look at the pictures of the A, T, C, and G molecules here you’ll instantly see why: A and G are both purines, with a double-ring structure, while C and T are both pyrimidines, with a single ring.
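In both models, the substitution probabilities after a time t come from the matrix exponential of a rate matrix: P(t) = e^{tQ}. Here’s a quick sketch (the rate values are arbitrary, and I use scipy for the exponential):

```python
import numpy as np
from scipy.linalg import expm

BASES = ['A', 'C', 'G', 'T']   # A, G are purines; C, T are pyrimidines

def kimura_rate_matrix(a, b):
    """Kimura 2-parameter rate matrix: rate 'a' for transitions
    (A<->G, C<->T), rate 'b' for the four transversion pairs.
    Setting a == b recovers the Jukes-Cantor model."""
    purines = {'A', 'G'}
    Q = np.zeros((4, 4))
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i == j:
                continue
            same_class = (x in purines) == (y in purines)
            Q[i, j] = a if same_class else b
        Q[i, i] = -Q[i].sum()   # rows of a rate matrix sum to zero
    return Q

Q = kimura_rate_matrix(a=2.0, b=0.5)
P = expm(0.1 * Q)   # substitution probabilities after time t = 0.1
# Rows of P sum to 1, and A -> G is more likely than A -> C.
```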

Since I’m busy learning about continuous-time Markov chains, it’s nice to see more good examples!


Networks and Population Biology (Part 1)

2 May, 2011

There are some tutorials starting today here:

Tutorials on discrete mathematics and probability in networks and population biology, Institute of Mathematical Sciences, National University of Singapore, 2-6 May 2011. Organized by Andrew Barbour, Malwina Luczak, Gesine Reinert and Rongfeng Sun.

Rick Durrett is speaking on “Cancer modelling”. For his slides, see here, here and here. But here’s a quick taste:

Back in 1954, Armitage and Doll noticed that log-log plots of cancer incidence as a function of age are close to linear, except for breast cancer, which slows down in older women. They suggested an explanation: a chain of independent random events has to occur before cancer can start. A simple model based on a Markov process gives a simple formula for how many events it must take—see the first batch of slides for details. This work was the first of a series of ever more sophisticated multi-stage models of carcinogenesis.
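A quick way to see the Armitage–Doll prediction: if cancer requires k independent rare events, each occurring at exponential rate λ, the waiting time for the kth event is Erlang-distributed, and for small λt the incidence grows like t^{k-1}, so the log-log plot has slope about k − 1. A numerical sketch (the numbers here are invented purely for illustration):

```python
import math

def erlang_pdf(t, k, lam):
    """Density of the sum of k independent Exponential(lam) waiting times."""
    return lam**k * t**(k - 1) * math.exp(-lam * t) / math.factorial(k - 1)

# The log-log slope of incidence at early ages should be about k - 1.
k, lam = 6, 0.001          # six rare events, at a made-up rate
t1, t2 = 20.0, 40.0
slope = (math.log(erlang_pdf(t2, k, lam)) - math.log(erlang_pdf(t1, k, lam))) \
        / (math.log(t2) - math.log(t1))
print(round(slope, 2))     # close to k - 1 = 5
```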

One of the first models Durrett explained was the Moran process: a stochastic model of a finite population of constant size in which things of two types, say A and B, are competing for dominance. I believe this model can be described by a stochastic Petri net with two states, A and B, and two transitions:

A + B \to A + A

and

A + B \to B + B

Since I like stochastic Petri nets, I’d love to add this to my collection.
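Since both transitions fire at a rate proportional to the number of A-B pairs, at each jump one individual simply converts to the other type, and the total population never changes. Here is a minimal simulation (the rates and the neutral 50/50 setup are my own choices for illustration):

```python
import random

def moran_step(nA, nB, rng=random):
    """One jump of the net: A+B -> A+A and A+B -> B+B fire at equal rates,
    so each jump converts one individual and nA + nB stays constant."""
    if rng.random() < 0.5:
        return nA + 1, nB - 1   # A + B -> A + A
    return nA - 1, nB + 1       # A + B -> B + B

def run_to_fixation(nA, nB, rng=random):
    """Jump until one type has taken over; return the final number of A's."""
    total = nA + nB
    while 0 < nA < total:
        nA, nB = moran_step(nA, nB, rng)
    return nA

random.seed(1)
N = 20
wins = sum(run_to_fixation(10, 10) == N for _ in range(500))
print(wins / 500)   # with equal rates, A should fix with probability 10/N = 0.5
```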

Chris Cannings is talking about “Evolutionary conflict theory” and the concept of ‘evolutionarily stable strategies’ for two-party games. Here’s the basic idea, in a nutshell.

Suppose a population of animals roams around randomly and whenever two meet, they engage in some sort of conflict… or more generally, any sort of ‘game’. Suppose each can choose from some set S of strategies. Suppose that if one chooses strategy i \in S and the other chooses strategy j \in S, the expected ‘payoff’ to the one is A_{ij}, while for the other it’s A_{ji}.

More generally, the animals might choose their strategies probabilistically. If the first chooses the ith strategy with probability \psi_i, and the second chooses it with probability \phi_i, then the expected payoff to the first player is

\langle \psi , A \phi \rangle

where the angle brackets are the usual inner product in L^2(S). I’m saying this in an overly fancy way, and making it look like quantum mechanics, in the hope that some bright kid out there will get some new ideas. But it’s not rocket science; the angle bracket is just a notation for this sum:

\langle \psi , A \phi \rangle = \sum_{i, j \in S} \psi_i A_{ij} \phi_j
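In code this pairing is just a matrix sandwich. A tiny sketch (the payoff matrix and the strategies here are made up for illustration):

```python
import numpy as np

# A hypothetical 2-strategy game: A[i, j] is the payoff to a player
# choosing strategy i against an opponent choosing strategy j.
A = np.array([[0.0, 3.0],
              [1.0, 2.0]])

psi = np.array([0.5, 0.5])   # first player's mixed strategy
phi = np.array([0.2, 0.8])   # second player's mixed strategy

# <psi, A phi> = sum over i, j of psi_i A_ij phi_j
payoff = psi @ A @ phi
print(payoff)   # 2.1
```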

Let me tell you what it means for a probabilistic strategy \psi to be ‘evolutionarily stable’. Suppose we have a ‘resident’ population of animals with strategy \psi and we add a few ‘invaders’ with some other strategy, say \phi. Say the fraction of animals who are invaders is some small number \epsilon, while the fraction of residents is 1 - \epsilon.

If a resident plays the game against a randomly chosen animal, its expected payoff will be

\langle \psi , A (\epsilon \phi + (1 - \epsilon) \psi) \rangle

Indeed, it’s just as if the resident was playing the game against an animal with probabilistic strategy \epsilon \phi + (1 - \epsilon) \psi! On the other hand, if an invader plays the game against a randomly chosen animal, its expected payoff will be

\langle \phi , A (\epsilon \phi + (1 - \epsilon) \psi) \rangle

The strategy \psi is evolutionarily stable if the residents do better:

\langle \psi , A (\epsilon \phi + (1 - \epsilon) \psi) \rangle \ge \langle \phi , A (\epsilon \phi + (1 - \epsilon) \psi) \rangle

for all probability distributions \phi and sufficiently small \epsilon > 0.
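For a two-strategy game we can check this inequality directly, scanning over a grid of invader strategies at some small fixed ε. A sketch (the function, grid, and the Hawk-Dove payoffs below are my own illustration; for Hawk-Dove with value V = 2 and cost C = 4, the known mixed ESS plays Hawk with probability V/C = 1/2):

```python
import numpy as np

def is_ess(psi, A, eps=0.01, n_grid=101, tol=1e-12):
    """Check <psi, A m> >= <phi, A m> with m = eps*phi + (1-eps)*psi,
    over a grid of invader strategies phi (2 strategies only)."""
    for p in np.linspace(0.0, 1.0, n_grid):
        phi = np.array([p, 1.0 - p])
        m = eps * phi + (1.0 - eps) * psi
        if psi @ A @ m < phi @ A @ m - tol:
            return False
    return True

# Hawk-Dove payoffs with V = 2, C = 4: rows/columns are (Hawk, Dove).
A = np.array([[-1.0, 2.0],
              [ 0.0, 1.0]])
print(is_ess(np.array([0.5, 0.5]), A))   # the mixed ESS passes
print(is_ess(np.array([1.0, 0.0]), A))   # pure Hawk does not
```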

Cannings showed us how to manipulate this condition in various ways and prove lots of nice theorems. His slides will appear online later, and then I’ll include a link to them. Naturally, I’m hoping we’ll see that a dynamical model, where animals with greater payoff get to reproduce more, has the evolutionarily stable strategies as stable equilibria. And I’m hoping that some model of this sort can be described using a stochastic Petri net—though I’m not sure I see how.

On another note, I was happy to see Persi Diaconis and meet his wife Susan Holmes. Both will be speaking later in the week. Holmes is a statistician who specializes in “large, messy datasets” from biology. Lately she’s been studying ant networks! Using sophisticated image analysis to track individual ants over long periods of time, she and her coauthors have built up networks showing who meets whom in an ant colony. They’ve found, for example, that some harvester ants interact with many more of their fellows than the average ant. However, this seems to be due to their location rather than any innate proclivity. They’re the ants who hang out near the entrance of the nest!

That’s my impression from a short conversation, anyway. I should read her brand-new paper:

• Noa Pinter-Wollman, Roy Wollman, Adam Guetz, Susan Holmes and Deborah M. Gordon, The effect of individual variation on the structure and function of interaction networks in harvester ants, Journal of the Royal Society Interface, 13 April 2011.

She said this is a good book to read:

• Deborah M. Gordon, Ant Encounters: Interaction Networks and Colony Behavior, Princeton U. Press, Princeton, New Jersey, 2010.

There are also lots of papers available at Gordon’s website.


Network Theory (Part 7)

27 April, 2011

guest post by Jacob Biamonte

This post is part of a series on what John and I like to call Petri net field theory. Stochastic Petri nets can be used to model everything from vending machines to chemical reactions. Chemists have proven some powerful theorems about when these systems have equilibrium states. We’re trying to bind these old ideas into our fancy framework, in hopes that quantum field theory techniques could also be useful in this deep subject. We’ll describe the general theory later; today we’ll do an example from population biology.

Those of you following this series should know that I’m the calculation bunny for this project, with John playing the role of the wolf. If I don’t work quickly, drawing diagrams and trying to keep up with John’s space-bending quasar of information, I’ll be eaten alive! It’s no joke, so please try to respond and pretend to enjoy anything you read here. This will keep me alive for longer. If I did not take notes during our meetings, lots of this stuff would never have made it here, so I hope you enjoy it.

Amoeba reproduction and competition

Here’s a stochastic Petri net:

It shows a world with one state, amoeba, and two transitions:

reproduction, where one amoeba turns into two. Let’s call the rate constant for this transition \alpha.

competition, where two amoebas battle for resources and only one survives. Let’s call the rate constant for this transition \beta.

We are going to analyse this example in several ways. First we’ll study the deterministic dynamics it describes: we’ll look at its rate equation, which turns out to be the logistic equation, familiar in population biology. Then we’ll study the stochastic dynamics, meaning its master equation. That’s where the ideas from quantum field theory come in.

The rate equation

If P(t) is the population of amoebas at time t, we can follow the rules explained in Part 3 and crank out this rate equation:

\displaystyle{ \frac{d P}{d t} = \alpha P - \beta P^2}

We can rewrite this as

\displaystyle{\frac{d P }{d t}= k P(1-\frac{P}{Q}) }

where

\displaystyle{ Q = \frac{\alpha}{\beta} , \qquad k = \alpha}

What’s the meaning of Q and k?

Q is the carrying capacity, that is, the maximum sustainable population the environment can support.

k is the growth rate describing the approximately exponential growth of population when P(t) is small.

It’s a rare treat to find such an important differential equation that can be solved by analytical methods. Let’s enjoy solving it.

We start by separating variables and integrating both sides:

\displaystyle{\int \frac{d P}{P (1-P/Q)} = \int k d t}

We need to use partial fractions on the left side above, resulting in

\displaystyle{\int \frac{d P}{P}} + \displaystyle{\int \frac{d P}{Q-P} } = \displaystyle{\int k d t}

and so we pick up a constant of integration C. Letting

A= \pm e^{-C}

we can rearrange things as

\displaystyle{\frac{Q-P}{P}=A e^{-k t} }

so the population as a function of time becomes

\displaystyle{ P(t) = \frac{Q}{1+A e^{-k t}}}

At t=0 we can determine A uniquely. We write P_0 := P(0) and A becomes

\displaystyle{ A = \frac{Q-P_0}{P_0}}
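Since we have a closed form, it’s easy to check it against a direct numerical integration of the rate equation. A quick sketch (the parameter values are my own choices):

```python
import numpy as np

def logistic(t, P0, k=1.0, Q=1.0):
    """Closed-form solution P(t) = Q / (1 + A e^{-kt}), A = (Q - P0)/P0."""
    A = (Q - P0) / P0
    return Q / (1.0 + A * np.exp(-k * t))

# Sanity check against a crude Euler integration of dP/dt = k P (1 - P/Q).
k, Q, P0, dt = 1.0, 1.0, 0.1, 1e-4
P = P0
for _ in range(int(10 / dt)):        # integrate out to t = 10
    P += dt * k * P * (1 - P / Q)
print(P, logistic(10.0, P0))         # both should be close to Q = 1
```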

The model now becomes very intuitive. Let’s set Q = k=1 and make a plot for various values of A:

We arrive at three distinct cases:

equilibrium (A=0). The horizontal blue line corresponds to the case where the initial population P_0 exactly equals the carrying capacity. In this case the population is constant.

dieoff (A < 0). The three decaying curves above the horizontal blue line correspond to cases where initial population is higher than the carrying capacity. The population dies off over time and approaches the carrying capacity.

growth (A > 0). The four increasing curves below the horizontal blue line represent cases where the initial population is lower than the carrying capacity. Now the population grows over time and approaches the carrying capacity.

The master equation

Next, let us follow the rules explained in Part 6 to write down the master equation for our example. Remember, now we write:

\displaystyle{\Psi(t) = \sum_{n = 0}^\infty \psi_n(t) z^n }

where \psi_n(t) is the probability of having n amoebas at time t, and z is a formal variable. The master equation says

\displaystyle{\frac{d}{d t} \Psi(t) = H \Psi(t)}

where H is an operator on formal power series called the Hamiltonian. To get the Hamiltonian we take each transition in our Petri net and build an operator from creation and annihilation operators, as follows. Reproduction works like this:

while competition works like this:

Here a is the annihilation operator, a^\dagger is the creation operator and N = a^\dagger a is the number operator. Last time John explained precisely how the N‘s arise. So the theory is already in place, and we arrive at this Hamiltonian:

H = \alpha (a^\dagger a^\dagger a - N) \;\; + \;\; \beta(a^\dagger a a - N(N-1))

Remember, \alpha is the rate constant for reproduction, while \beta is the rate constant for competition.

The master equation can be solved: it’s equivalent to

\frac{d}{d t}(e^{-t H}\Psi(t))=0

so that e^{-t H}\Psi(t) is constant, and so

\Psi(t) = e^{t H}\Psi(0)

and that’s it! We can calculate the time evolution starting from any initial probability distribution of populations. Maybe everyone is already used to this, but I find it rather remarkable.
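We can watch this happen numerically by truncating the space of formal power series at some maximum population D, so that a, a† and H become finite matrices acting on coefficient vectors. This is only an approximation, since states near the cutoff leak a tiny bit of probability, and the cutoff and rate constants below are my own choices:

```python
import numpy as np
from scipy.linalg import expm

D = 60                       # truncate at D amoebas (an approximation)
ns = np.arange(D + 1)

a = np.zeros((D + 1, D + 1));    a[ns[:-1], ns[1:]] = ns[1:]   # a z^n = n z^(n-1)
adag = np.zeros((D + 1, D + 1)); adag[ns[1:], ns[:-1]] = 1.0   # a† z^n = z^(n+1)
N = adag @ a                                                   # number operator

alpha, beta = 1.0, 0.3
H = alpha * (adag @ adag @ a - N) + beta * (adag @ a @ a - N @ (N - np.eye(D + 1)))

psi0 = np.zeros(D + 1); psi0[1] = 1.0    # start with exactly one amoeba: Psi = z
psi = expm(2.0 * H) @ psi0               # Psi(2) = e^{2H} Psi(0)

print(psi.sum())    # ~1: probability is conserved (up to truncation error)
print(ns @ psi)     # expected number of amoebas at t = 2
```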

Here’s how it works. We pick a population, say n amoebas at t=0. This would mean \Psi(0) = z^n. We then evolve this state using e^{t H}. We expand this operator as

\begin{array}{ccl} e^{t H} &=&\displaystyle{ \sum_{n=0}^\infty \frac{t^n H^n}{n!} }  \\  \\  &=& \displaystyle{ 1 + t H + \frac{1}{2}t^2 H^2 + \cdots }\end{array}

This operator contains the full information for the evolution of the system. It contains the histories of all possible amoeba populations—an amoeba mosaic if you will. From this, we can construct amoeba Feynman diagrams.

To do this, we work out each of the H^n terms in the expansion above. The first-order terms correspond to the Hamiltonian acting once. These are proportional to either \alpha or \beta. The second-order terms correspond to the Hamiltonian acting twice. These are proportional to either \alpha^2, \alpha\beta or \beta^2. And so on.

This is where things start to get interesting! To illustrate how it works, we will consider two possibilities for the second-order terms:

1) We start with a lone amoeba, so \Psi(0) = z. It reproduces and splits into two. In the battle of the century, the resulting amoebas compete and one dies. At the end we have:

\frac{\alpha \beta}{2}  (a^\dagger a a)(a^\dagger a^\dagger a) z

We can draw this as a Feynman diagram:

You might find this tale grim, and you may not like the odds either. It’s true, the odds could be better, but people are worse off than amoebas! The great Japanese swordsman Miyamoto Musashi quoted the survival odds of fair sword duels as 1/3, seeing that 1/3 of the time both participants die. A remedy is to cheat, but these amoebas are competing honestly.

2) We start with two amoebas, so the initial state is \Psi(0) = z^2. One of these amoebas splits into two. One of these then gets into an argument with the original amoeba over the Azimuth blog. The amoeba who solved all John’s puzzles survives. At the end we have

\frac{\alpha \beta}{2} (a^\dagger a a)(a^\dagger a^\dagger a) z^2

with corresponding Feynman diagram:

This should give an idea of how this all works. The exponential of the Hamiltonian gives all possible histories, and each of these can be translated into a Feynman diagram. In a future blog entry, we might explain this theory in detail.

An equilibrium state

We’ve seen the equilibrium solution for the rate equation; now let’s look for equilibrium solutions of the master equation. This paper:

• D. F. Anderson, G. Craciun and T.G. Kurtz, Product-form stationary distributions for deficiency zero chemical reaction networks, arXiv:0803.3042.

proves that for a large class of stochastic Petri nets, there exists an equilibrium solution of the master equation where the number of things in each state is distributed according to a Poisson distribution. Even more remarkably, these probability distributions are independent, so knowing how many things are in one state tells you nothing about how many are in another!

Here’s a nice quote from this paper:

The surprising aspect of the deficiency zero theorem is that the assumptions of the theorem are completely related to the network of the system whereas the conclusions of the theorem are related to the dynamical properties of the system.

The ‘deficiency zero theorem’ is a result of Feinberg, which says that for a large class of stochastic Petri nets, the rate equation has a unique equilibrium solution. Anderson showed how to use this fact to get equilibrium solutions of the master equation!

We will consider this in future posts. For now, we need to talk a bit about ‘coherent states’.

These are all over the place in quantum theory. Legend (or at least Wikipedia) has it that Erwin Schrödinger himself discovered coherent states when he was looking for states of a quantum system that look ‘as classical as possible’. Suppose you have a quantum harmonic oscillator. Then the uncertainty principle says that

\Delta p \Delta q \ge \hbar/2

where \Delta p is the uncertainty in the momentum and \Delta q is the uncertainty in position. Suppose we want to make \Delta p \Delta q as small as possible, and suppose we also want \Delta p = \Delta q. Then we need our particle to be in a ‘coherent state’. That’s the definition. For the quantum harmonic oscillator, there’s a way to write quantum states as formal power series

\displaystyle{ \Psi = \sum_{n = 0}^\infty \psi_n z^n}

where \psi_n is the amplitude for having n quanta of energy. A coherent state then looks like this:

\displaystyle{ \Psi = e^{c z} = \sum_{n = 0}^\infty \frac{c^n}{n!} z^n}

where c can be any complex number. Here we have omitted a constant factor necessary to normalize the state.

We can also use coherent states in classical stochastic systems like collections of amoebas! Now the coefficient of z^n tells us the probability of having n amoebas, so c had better be real. And probabilities should sum to 1, so we really should normalize \Psi as follows:

\displaystyle{  \Psi = \frac{e^{c z}}{e^c} = e^{-c} \sum_{n = 0}^\infty \frac{c^n}{n!} z^n }

Now, the probability distribution

\displaystyle{\psi_n = e^{-c} \; \frac{c^n}{n!}}

is called a Poisson distribution. So, for starters you can think of a ‘coherent state’ as an over-educated way of talking about a Poisson distribution.

Let’s work out the expected number of amoebas in this Poisson distribution. In the answers to the puzzles in Part 6, we started using this abbreviation:

\displaystyle{ \sum \Psi = \sum_{n = 0}^\infty \psi_n }

We also saw that the expected number of amoebas in the probability distribution \Psi is

\displaystyle{  \sum N \Psi }

What does this equal? Remember that N = a^\dagger a. The annihilation operator a is just \frac{d}{d z}, so

\displaystyle{ a \Psi = c \Psi}

and we get

\displaystyle{ \sum N \Psi = \sum a^\dagger a \Psi = c \sum a^\dagger \Psi }

But we saw in Part 5 that a^\dagger is stochastic, meaning

\displaystyle{  \sum a^\dagger \Psi = \sum \Psi }

for any \Psi. Furthermore, our \Psi here has

\displaystyle{ \sum \Psi = 1}

since it’s a probability distribution. So:

\displaystyle{  \sum N \Psi = c \sum a^\dagger \Psi = c \sum \Psi = c}

The expected number of amoebas is just c.

Puzzle 1. This calculation must be wrong if c is negative: there can’t be a negative number of amoebas. What goes wrong then?

Puzzle 2. Use the same tricks to calculate the standard deviation of the number of amoebas in the Poisson distribution \Psi.

Now let’s return to our problem and consider the initial amoeba state

\displaystyle{ \Psi = e^{c z}}

Here we aren’t bothering to normalize it, because we’re going to look for equilibrium solutions to the master equation, meaning solutions where \Psi(t) doesn’t change with time. So, we want to solve

\displaystyle{  H \Psi = 0}

Since this equation is linear, the normalization of \Psi doesn’t really matter.

Remember,

\displaystyle{  H\Psi = \alpha (a^\dagger a^\dagger a - N)\Psi + \beta(a^\dagger a a - N(N-1)) \Psi }

Let’s work this out. First consider the two \alpha terms:

\displaystyle{ a^\dagger a^\dagger a \Psi = c z^2 \Psi }

and

\displaystyle{ -N \Psi = -a^\dagger a\Psi = -c z \Psi}

Likewise for the \beta terms we find

\displaystyle{ a^\dagger a a\Psi=c^2 z \Psi}

and

\displaystyle{ -N(N-1)\Psi = -a^\dagger a^\dagger a a \Psi = -c^2 z^2\Psi }

Here I’m using something John showed in Part 6: the product a^\dagger a^\dagger a a equals the ‘falling power’ N(N-1).

The sum of all four terms must vanish. This happens whenever

\displaystyle{ \alpha(c z^2 - c z)+\beta(c^2 z-c^2 z^2) = 0}

which is satisfied for

\displaystyle{ c= \frac{\alpha}{\beta}}

Yippee! We’ve found an equilibrium solution, since we found a value for c that makes H \Psi = 0. Even better, we’ve seen that the expected number of amoebas in this equilibrium state is

\displaystyle{ \frac{\alpha}{\beta}}

This is just the same as the equilibrium population we saw for the rate equation—that is, the carrying capacity for the logistic equation! That’s pretty cool, but it’s no coincidence: in fact, Anderson proved it works like this for lots of stochastic Petri nets.
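We can check this numerically by truncating the space of formal power series at a maximum population D, turning a, a† and H into finite matrices; the cutoff and rates below are my own choices, and the truncation makes the check approximate:

```python
import numpy as np
from math import factorial

D = 80
ns = np.arange(D + 1)
a = np.zeros((D + 1, D + 1));    a[ns[:-1], ns[1:]] = ns[1:]   # a z^n = n z^(n-1)
adag = np.zeros((D + 1, D + 1)); adag[ns[1:], ns[:-1]] = 1.0   # a† z^n = z^(n+1)
N = adag @ a

alpha, beta = 1.0, 0.5
H = alpha * (adag @ adag @ a - N) + beta * (adag @ a @ a - N @ (N - np.eye(D + 1)))

c = alpha / beta   # the claimed equilibrium value
coherent = np.array([np.exp(-c) * c**n / factorial(n) for n in range(D + 1)])

print(np.abs(H @ coherent).max())   # essentially zero: H Psi = 0
```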

I’m not sure what’s up next or what’s in store, since I’m blogging at gun point from inside a rabbit cage:

Give me all your blog posts, get out of that rabbit cage and reach for the sky!

I’d imagine we’re going to work out the theory behind this example and prove the existence of equilibrium solutions for master equations more generally. One idea John had was to have me start a night shift—that way you’ll get Azimuth posts 24 hours a day.


Network Theory (Part 6)

16 April, 2011

Now for the fun part. Let’s see how tricks from quantum theory can be used to describe random processes. I’ll try to make this post self-contained. So, even if you skipped a bunch of the previous ones, this should make sense.

You’ll need to know a bit of math: calculus, a tiny bit of probability theory, and linear operators on vector spaces. You don’t need to know quantum theory, though you’ll have more fun if you do. What we’re doing here is very similar… but also strangely different—for reasons I explained last time.

Rabbits and quantum mechanics

Suppose we have a population of rabbits in a cage and we’d like to describe its growth in a stochastic way, using probability theory. Let \psi_n be the probability of having n rabbits. We can borrow a trick from quantum theory, and summarize all these probabilities in a formal power series like this:

\Psi = \sum_{n = 0}^\infty \psi_n z^n

The variable z doesn’t mean anything in particular, and we don’t care if the power series converges. See, in math ‘formal’ means “it’s only symbols on the page, just follow the rules”. It’s like if someone says a party is ‘formal’, so you need to wear a white tie: you’re not supposed to ask what the tie means.

However, there’s a good reason for this trick. We can define two operators on formal power series, called the annihilation operator:

a \Psi = \frac{d}{d z} \Psi

and the creation operator:

a^\dagger \Psi = z \Psi

They’re just differentiation and multiplication by z, respectively. So, for example, suppose we start out being 100% sure we have n rabbits for some particular number n. Then \psi_n = 1, while all the other probabilities are 0, so:

\Psi = z^n

If we then apply the creation operator, we obtain

a^\dagger \Psi = z^{n+1}

Voilà! One more rabbit!

The annihilation operator is more subtle. If we start out with n rabbits:

\Psi = z^n

and then apply the annihilation operator, we obtain

a \Psi = n z^{n-1}

What does this mean? The z^{n-1} means we have one fewer rabbit than before. But what about the factor of n? It means there were n different ways we could pick a rabbit and make it disappear! This should seem a bit mysterious, for various reasons… but we’ll see how it works soon enough.

The creation and annihilation operators don’t commute:

(a a^\dagger - a^\dagger a) \Psi = \frac{d}{d z} (z \Psi) - z \frac{d}{d z} \Psi = \Psi

so for short we say:

a a^\dagger - a^\dagger a = 1

or even shorter:

[a, a^\dagger] = 1

where the commutator of two operators is

[S,T] = S T - T S

The noncommutativity of operators is often claimed to be a special feature of quantum physics, and the creation and annihilation operators are fundamental to understanding the quantum harmonic oscillator. There, instead of rabbits, we’re studying quanta of energy, which are peculiarly abstract entities obeying rather counterintuitive laws. So, it’s cool that the same math applies to purely classical entities, like rabbits!

In particular, the equation [a, a^\dagger] = 1 just says that there’s one more way to put a rabbit in a cage of rabbits, and then take one out, than to take one out and then put one in.
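These formal rules are easy to put on a computer: just represent \Psi by its list of coefficients \psi_n. Here’s a minimal sketch (the encoding and function names are my own):

```python
def annihilate(psi):
    """a = d/dz on coefficient lists, where psi[n] is the coefficient of z^n."""
    return [n * psi[n] for n in range(1, len(psi))]

def create(psi):
    """a† = multiplication by z: shift every coefficient up one slot."""
    return [0] + list(psi)

# Psi = z^3: being 100% sure we have 3 rabbits.
psi = [0, 0, 0, 1]
print(create(psi))       # z^4: one more rabbit
print(annihilate(psi))   # 3 z^2: three ways to remove a rabbit

# [a, a†] = 1: the two orders of operations differ by the identity.
phi = [0.5, 0.25, 0.25]
lhs = annihilate(create(phi))    # a a† Psi
rhs = create(annihilate(phi))    # a† a Psi
print([x - y for x, y in zip(lhs, rhs)])   # equals phi again
```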

But how do we actually use this setup? We want to describe how the probabilities \psi_n change with time, so we write

\Psi(t) = \sum_{n = 0}^\infty \psi_n(t) z^n

Then, we write down an equation describing the rate of change of \Psi:

\frac{d}{d t} \Psi(t) = H \Psi(t)

Here H is an operator called the Hamiltonian, and the equation is called the master equation. The details of the Hamiltonian depend on our problem! But we can often write it down using creation and annihilation operators. Let’s do some examples, and then I’ll tell you the general rule.

Catching rabbits

Last time I told you what happens when we stand in a river and catch fish as they randomly swim past. Let me remind you of how that works. But today let’s use rabbits.

So, suppose an inexhaustible supply of rabbits are randomly roaming around a huge field, and each time a rabbit enters a certain area, we catch it and add it to our population of caged rabbits. Suppose that on average we catch one rabbit per unit time. Suppose the chance of catching a rabbit during any interval of time is independent of what happened before. What is the Hamiltonian describing the probability distribution of caged rabbits, as a function of time?

There’s an obvious dumb guess: the creation operator! However, we saw last time that this doesn’t work, and we saw how to fix it. The right answer is

H = a^\dagger - 1

To see why, suppose for example that at some time t we have n rabbits, so:

\Psi(t) = z^n

Then the master equation says that at this moment,

\frac{d}{d t} \Psi(t) = (a^\dagger - 1) \Psi(t) =  z^{n+1} - z^n

Since \Psi = \sum_{n = 0}^\infty \psi_n(t) z^n, this implies that the coefficients of our formal power series are changing like this:

\frac{d}{d t} \psi_{n+1}(t) = 1
\frac{d}{d t} \psi_{n}(t) = -1

while all the rest have zero derivative at this moment. And that’s exactly right! See, \psi_{n+1}(t) is the probability of having one more rabbit, and this is going up at rate 1. Meanwhile, \psi_n(t) is the probability of having n rabbits, and this is going down at the same rate.
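On lists of coefficients this Hamiltonian is a one-liner, and we can watch the claimed rates appear. A sketch (the coefficient-list encoding is my own):

```python
def H_catch(psi):
    """H = a† - 1 acting on coefficient lists of z^n."""
    created = [0] + list(psi)    # a† Psi = z Psi
    padded = list(psi) + [0]     # Psi itself, padded to the same length
    return [c - p for c, p in zip(created, padded)]

# With Psi = z^4 (surely 4 rabbits), the master equation says
# dψ5/dt = 1 and dψ4/dt = -1, with all other derivatives zero:
print(H_catch([0, 0, 0, 0, 1]))   # [0, 0, 0, 0, -1, 1]
```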

Puzzle 1. Show that with this Hamiltonian and any initial conditions, the master equation predicts that the expected number of rabbits grows linearly.

Dying rabbits

Don’t worry: no rabbits are actually injured in the research that Jacob Biamonte is doing here at the Centre for Quantum Technologies. He’s keeping them well cared for in a big room on the 6th floor. This is just a thought experiment.

Suppose a mean nasty guy had a population of rabbits in a cage and didn’t feed them at all. Suppose that each rabbit has a unit probability of dying per unit time. And as always, suppose the probability of this happening in any interval of time is independent of what happens before that time.

What is the Hamiltonian? Again there’s a dumb guess: the annihilation operator! And again this guess is wrong, but it’s not far off. As before, the right answer includes a ‘correction term’:

H = a - N

This time the correction term is famous in its own right. It’s called the number operator:

N = a^\dagger a

The reason is that if we start with n rabbits, and apply this operator, it amounts to multiplication by n:

N z^n = z \frac{d}{d z} z^n = n z^n

Let’s see why this guess is right. Again, suppose that at some particular time t we have n rabbits, so

\Psi(t) = z^n

Then the master equation says that at this time

\frac{d}{d t} \Psi(t) = (a - N) \Psi(t) = n z^{n-1} - n z^n

So, our probabilities are changing like this:

\frac{d}{d t} \psi_{n-1}(t) = n
\frac{d}{d t} \psi_n(t) = -n

while the rest have zero derivative. And this is good! We’re starting with n rabbits, and each has a unit probability per unit time of dying. So, the chance of having one less should be going up at rate n. And the chance of having the same number we started with should be going down at the same rate.

Puzzle 2. Show that with this Hamiltonian and any initial conditions, the master equation predicts that the expected number of rabbits decays exponentially.
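
Here's the same sort of numerical sketch for the dying rabbits, again a check rather than a solution: start with exactly 10 rabbits and compare the mean at time t with 10 e^{-t}.

```python
# A numerical check of Puzzle 2: integrate d/dt psi = (a - N) psi
# by forward Euler, starting with exactly N0 rabbits.  No truncation
# is needed, since the population can only go down.
import math

N0 = 10
DT = 0.001
T = 1.0

psi = [0.0] * (N0 + 1)
psi[N0] = 1.0  # start knowing for sure there are N0 rabbits

for _ in range(round(T / DT)):
    dpsi = [0.0] * (N0 + 1)
    for n in range(1, N0 + 1):
        dpsi[n - 1] += n * psi[n]  # each of n rabbits dies at unit rate
        dpsi[n] -= n * psi[n]
    psi = [p + DT * d for p, d in zip(psi, dpsi)]

mean = sum(n * p for n, p in enumerate(psi))
print(abs(mean - N0 * math.exp(-T)) < 0.01)  # True: exponential decay
```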

Breeding rabbits

Suppose we have a strange breed of rabbits that reproduce asexually. Suppose that each rabbit has a unit probability per unit time of having a baby rabbit, thus effectively duplicating itself.

As you can see from the cryptic picture above, this ‘duplication’ process takes one rabbit as input and has two rabbits as output. So, if you’ve been paying attention, you should be ready with a dumb guess for the Hamiltonian: a^\dagger a^\dagger a. This operator annihilates one rabbit and then creates two!

But you should also suspect that this dumb guess will need a ‘correction term’. And you’re right! As always, the correction term makes the probability of things staying the same go down at exactly the rate that the probability of things changing goes up.

You should guess the correction term… but I’ll just tell you:

H = a^\dagger a^\dagger a - N

We can check this in the usual way, by seeing what it does when we have n rabbits:

H z^n =  z^2 \frac{d}{d z} z^n - n z^n = n z^{n+1} - n z^n

That’s good: since there are n rabbits, the rate of rabbit duplication is n. This is the rate at which the probability of having one more rabbit goes up… and also the rate at which the probability of having n rabbits goes down.

Puzzle 3. Show that with this Hamiltonian and any initial conditions, the master equation predicts that the expected number of rabbits grows exponentially.

Dueling rabbits

Let’s do some stranger examples, just so you can see the general pattern.

Here each pair of rabbits has a unit probability per unit time of fighting a duel with only one survivor. You might guess the Hamiltonian a^\dagger a a, but in fact:

H = a^\dagger a a - N(N-1)

Let’s see why this is right! Let’s see what it does when we have n rabbits:

H z^n = z \frac{d^2}{d z^2} z^n - n(n-1)z^n = n(n-1) z^{n-1} - n(n-1)z^n

That’s good: since there are n(n-1) ordered pairs of rabbits, the rate at which duels take place is n(n-1). This is the rate at which the probability of having one less rabbit goes up… and also the rate at which the probability of having n rabbits goes down.

(If you prefer unordered pairs of rabbits, just divide the Hamiltonian by 2. We should talk about this more, but not now.)

Brawling rabbits

Now each triple of rabbits has a unit probability per unit time of getting into a fight with only one survivor! I don’t know the technical term for a three-way fight, but perhaps it counts as a small ‘brawl’ or ‘melee’. In fact the Wikipedia article for ‘melee’ shows three rabbits in suits of armor, fighting it out:

Now the Hamiltonian is:

H = a^\dagger a^3 - N(N-1)(N-2)

You can check that:

H z^n = n(n-1)(n-2) z^{n-2} - n(n-1)(n-2) z^n

and this is good, because n(n-1)(n-2) is the number of ordered triples of rabbits. You can see how this number shows up from the math, too:

a^3 z^n = \frac{d^3}{d z^3} z^n = n(n-1)(n-2) z^{n-3}

The general rule

Suppose we have a process taking k rabbits as input and having j rabbits as output:

I hope you can guess the Hamiltonian I’ll use for this:

H = {a^{\dagger}}^j a^k - N(N-1) \cdots (N-k+1)

This works because

a^k z^n = \frac{d^k}{d z^k} z^n = n(n-1) \cdots (n-k+1) z^{n-k}

so that if we apply our Hamiltonian to n rabbits, we get

H z^n =  n(n-1) \cdots (n-k+1) (z^{n+j-k} - z^n)

See? As the probability of having n+j-k rabbits goes up, the probability of having n rabbits goes down, at an equal rate. This sort of balance is necessary for H to be a sensible Hamiltonian in this sort of stochastic theory (an ‘infinitesimal stochastic operator’, to be precise). And the rate is exactly the number of ordered k-tuples taken from a collection of n rabbits. This is called the kth falling power of n, and written as follows:

n^{\underline{k}} = n(n-1) \cdots (n-k+1)

Since we can apply functions to operators as well as numbers, we can write our Hamiltonian as:

H = {a^{\dagger}}^j a^k - N^{\underline{k}}
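
To see the general rule in action, here's a little Python sketch of my own, with an arbitrary truncation at n_max to get a finite matrix. It builds the matrix of H for a transition with k inputs and j outputs, and checks in the dueling-rabbits case that each column sums to zero and the off-diagonal entries are nonnegative—the conditions for an infinitesimal stochastic operator.

```python
def falling(n, k):
    """The falling power n(n-1)...(n-k+1)."""
    out = 1
    for i in range(k):
        out *= n - i
    return out

def hamiltonian(j, k, n_max):
    """Truncated matrix of H = (a^dagger)^j a^k minus the kth falling
    power of N.  H[m][n] is the coefficient of z^m in H z^n."""
    H = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    for n in range(n_max + 1):
        rate = falling(n, k)          # number of ordered k-tuples of rabbits
        if 0 <= n + j - k <= n_max:
            H[n + j - k][n] += rate   # probability of n+j-k rabbits goes up...
        H[n][n] -= rate               # ...as probability of n rabbits goes down
    return H

# Dueling rabbits: k = 2 rabbits in, j = 1 rabbit out.
H = hamiltonian(j=1, k=2, n_max=20)
print(all(sum(H[m][n] for m in range(21)) == 0 for n in range(21)))
print(all(H[m][n] >= 0 for m in range(21) for n in range(21) if m != n))
```

Both checks print True. (For a transition with more outputs than inputs, the truncation chops probability off the top columns, so the column-sum check would only work for columns safely below the cutoff; the dueling case avoids that issue.)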

Kissing rabbits

Let’s do one more example just to test our understanding. This time each pair of rabbits has a unit probability per unit time of bumping into one another, exchanging a friendly kiss and walking off. This shouldn’t affect the rabbit population at all! But let’s follow the rules and see what they say.

According to our rules, the Hamiltonian should be:

H = {a^{\dagger}}^2 a^2 - N(N-1)

However,

{a^{\dagger}}^2 a^2 z^n = z^2 \frac{d^2}{dz^2} z^n = n(n-1) z^n = N(N-1) z^n

and since the z^n form a ‘basis’ for the formal power series, we see that:

{a^{\dagger}}^2 a^2 = N(N-1)

so in fact:

H = 0

That’s good: if the Hamiltonian is zero, the master equation will say

\frac{d}{d t} \Psi(t) = 0

so the population, or more precisely the probability of having any given number of rabbits, will be constant.

There’s another nice little lesson here. Copying the calculation we just did, it’s easy to see that:

{a^{\dagger}}^k a^k = N^{\underline{k}}

This is a cute formula for falling powers of the number operator in terms of annihilation and creation operators. It means that for the general transition we saw before:

we can write the Hamiltonian in two equivalent ways:

H = {a^{\dagger}}^j a^k - N^{\underline{k}} =  {a^{\dagger}}^j a^k - {a^{\dagger}}^k a^k
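
If you don't trust the calculation, you can also check this identity numerically, by writing a and a^\dagger as matrices acting on truncated power series. Here's a Python sketch:

```python
N_MAX = 12  # truncate formal power series at degree N_MAX

def a_matrix():
    """Annihilation operator a = d/dz: sends z^n to n z^(n-1)."""
    A = [[0] * (N_MAX + 1) for _ in range(N_MAX + 1)]
    for n in range(1, N_MAX + 1):
        A[n - 1][n] = n
    return A

def adag_matrix():
    """Creation operator a^dagger = multiplication by z: sends z^n to
    z^(n+1); the top degree falls off the end of the truncation."""
    A = [[0] * (N_MAX + 1) for _ in range(N_MAX + 1)]
    for n in range(N_MAX):
        A[n + 1][n] = 1
    return A

def matmul(A, B):
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

# Check (a^dagger)^3 a^3 = N(N-1)(N-2) on z^n for n up to 9,
# staying safely below the truncation.
A, Adag = a_matrix(), adag_matrix()
M = matmul(matmul(matmul(Adag, Adag), Adag), matmul(matmul(A, A), A))
ok = all(M[n][n] == n * (n - 1) * (n - 2) for n in range(10))
print(ok)  # True
```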

Okay, that’s it for now! We can, and will, generalize all this stuff to stochastic Petri nets where there are things of many different kinds—not just rabbits. And we’ll see that the master equation we get matches the answer to the puzzle in Part 4. That’s pretty easy. But first, we’ll have a guest post by Jacob Biamonte, who will explain a more realistic example from population biology.


Network Theory (Part 5)

11 April, 2011

Last time we saw clues that stochastic Petri nets are a lot like quantum field theory, but with probabilities replacing amplitudes. There’s a powerful analogy at work here, which can help us a lot. So, this time I want to make that analogy precise.

But first, let me quickly sketch why it could be worthwhile.

A Poisson process

Consider this stochastic Petri net with rate constant r:

It describes an inexhaustible supply of fish swimming down a river, and getting caught when they run into a fisherman’s net. In any short time \Delta t there’s a chance of about r \Delta t of a fish getting caught. There’s also a chance of two or more fish getting caught, but this becomes negligible by comparison as \Delta t \to 0. Moreover, the chance of a fish getting caught during this interval of time is independent of what happens before or afterwards. This sort of process is called a Poisson process.

Problem. Suppose we start out knowing for sure there are no fish in the fisherman’s net. What’s the probability that he has caught n fish at time t?

At any time there will be some probability of having caught n fish; let’s call this probability \psi(n,t), or \psi(n) for short. We can summarize all these probabilities in a single power series, called a generating function:

\Psi(t) = \sum_{n=0}^\infty \psi(n,t) \, z^n

Here z is a formal variable—don’t ask what it means, for now it’s just a trick. In quantum theory we use this trick when talking about collections of photons rather than fish, but then the numbers \psi(n,t) are complex ‘amplitudes’. Now they are real probabilities, but we can still copy what the physicists do, and use this trick to rewrite the master equation as follows:

\frac{d}{d t} \Psi(t) = H \Psi(t)

This describes how the probability of having caught any given number of fish changes with time.

What’s the operator H? Well, in quantum theory we describe the creation of photons using a certain operator on power series called the creation operator:

a^\dagger \Psi = z \Psi

We can try to apply this to our fish. If at some time we’re 100% sure we have n fish, we have

\Psi = z^n

so applying the creation operator gives

a^\dagger \Psi = z^{n+1}

One more fish! That’s good. So, an obvious wild guess is

H = r a^\dagger

where r is the rate at which we’re catching fish. Let’s see how well this guess works.

If you know how to exponentiate operators, you know how to solve this equation:

\frac{d}{d t} \Psi(t) = H \Psi(t)

It’s easy:

\Psi(t) = \mathrm{exp}(t H) \Psi(0)

Since we start out knowing there are no fish in the net, we have

\Psi(0) = 1

so with our guess for H we get

\Psi(t) = \mathrm{exp}(r t a^\dagger) 1

But a^\dagger is the operator of multiplication by z, so \mathrm{exp}(r t a^\dagger) is multiplication by e^{r t z}, so

\Psi(t) = e^{r t z} = \sum_{n = 0}^\infty \frac{(r t)^n}{n!} \, z^n

So, if our guess is right, the probability of having caught n fish at time t is

\frac{(r t)^n}{n!}

Unfortunately, this can’t be right, because these probabilities don’t sum to 1! Instead their sum is

\sum_{n=0}^\infty \frac{(r t)^n}{n!} = e^{r t}

We can try to wriggle out of the mess we’re in by dividing our answer by this fudge factor. It sounds like a desperate measure, but we’ve got to try something!

This amounts to guessing that the probability of having caught n fish by time t is

\frac{(r t)^n}{n!} \, e^{-r t }

And this is right! This is called the Poisson distribution: it’s famous for being precisely the answer to the problem we’re facing.
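
You can also check this claim by brute-force simulation. Here's a Python sketch of my own that simulates the fish-catching process directly—catches separated by independent exponential waiting times—and compares the fraction of runs with n = 2 fish at time t against (r t)^n e^{-r t}/n!:

```python
import math
import random

random.seed(1)
r, t = 1.0, 2.0     # rate constant and observation time
trials = 100_000

def catches_by_time_t():
    """One run of the Poisson process: gaps between catches are
    independent exponential(r) random variables."""
    total, n = random.expovariate(r), 0
    while total < t:
        n += 1
        total += random.expovariate(r)
    return n

count = sum(1 for _ in range(trials) if catches_by_time_t() == 2)

empirical = count / trials
poisson = (r * t) ** 2 * math.exp(-r * t) / math.factorial(2)
print(abs(empirical - poisson) < 0.01)  # True, up to sampling error
```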

So on the one hand our wild guess about H was wrong, but on the other hand it was not so far off. We can fix it as follows:

H = r (a^\dagger - 1)

The extra -1 gives us the fudge factor we need.

So, a wild guess corrected by an ad hoc procedure seems to have worked! But what’s really going on?

What’s really going on is that a^\dagger, or any multiple of this, is not a legitimate Hamiltonian for a master equation: if we define a time evolution operator \exp(t H) using a Hamiltonian like this, probabilities won’t sum to 1! But a^\dagger - 1 is okay. So, we need to think about which Hamiltonians are okay.

In quantum theory, self-adjoint Hamiltonians are okay. But in probability theory, we need some other kind of Hamiltonian. Let’s figure it out.

Probability versus quantum theory

Suppose we have a system of any kind: physical, chemical, biological, economic, whatever. The system can be in different states. In the simplest sort of model, we say there’s some set X of states, and say that at any moment in time the system is definitely in one of these states. But I want to compare two other options:

• In a probabilistic model, we may instead say that the system has a probability \psi(x) of being in any state x \in X. These probabilities are nonnegative real numbers with

\sum_{x \in X} \psi(x) = 1

• In a quantum model, we may instead say that the system has an amplitude \psi(x) of being in any state x \in X. These amplitudes are complex numbers with

\sum_{x \in X} | \psi(x) |^2 = 1

Probabilities and amplitudes are similar yet strangely different. Of course given an amplitude we can get a probability by taking its absolute value and squaring it. This is a vital bridge from quantum theory to probability theory. Today, however, I don’t want to focus on the bridges, but rather the parallels between these theories.

We often want to replace the sums above by integrals. For that we need to replace our set X by a measure space, which is a set equipped with enough structure that you can integrate real or complex functions defined on it. Well, at least you can integrate so-called ‘integrable’ functions—but I’ll neglect all issues of analytical rigor here. Then:

• In a probabilistic model, the system has a probability distribution \psi : X \to \mathbb{R}, which obeys \psi \ge 0 and

\int_X \psi(x) \, d x = 1

• In a quantum model, the system has a wavefunction \psi : X \to \mathbb{C}, which obeys

\int_X | \psi(x) |^2 \, d x= 1

In probability theory, we integrate \psi over a set S \subset X to find out the probability that our system’s state is in this set. In quantum theory we integrate |\psi|^2 over the set to answer the same question.

We don’t need to think about sums over sets and integrals over measure spaces separately: there’s a way to make any set X into a measure space such that by definition,

\int_X \psi(x) \, dx = \sum_{x \in X} \psi(x)

In short, integrals are more general than sums! So, I’ll mainly talk about integrals, until the very end.

In probability theory, we want our probability distributions to be vectors in some vector space. Ditto for wave functions in quantum theory! So, we make up some vector spaces:

• In probability theory, the probability distribution \psi is a vector in the space

L^1(X) = \{ \psi: X \to \mathbb{C} \; : \; \int_X |\psi(x)| \, d x < \infty \}

• In quantum theory, the wavefunction \psi is a vector in the space

L^2(X) = \{ \psi: X \to \mathbb{C} \; : \; \int_X |\psi(x)|^2 \, d x < \infty \}

You may wonder why I defined L^1(X) to consist of complex functions when probability distributions are real. I’m just struggling to make the analogy seem as strong as possible. In fact probability distributions are not just real but nonnegative. We need to say this somewhere… but we can, if we like, start by saying they’re complex-valued functions, but then whisper that they must in fact be nonnegative (and thus real). It’s not the most elegant solution, but that’s what I’ll do for now.

Now:

• The main thing we can do with elements of L^1(X), besides what we can do with vectors in any vector space, is integrate one. This gives a linear map:

\int : L^1(X) \to \mathbb{C}

• The main thing we can do with elements of L^2(X), besides the things we can do with vectors in any vector space, is take the inner product of two:

\langle \psi, \phi \rangle = \int_X \overline{\psi}(x) \phi(x) \, d x

This gives a map that’s linear in one slot and conjugate-linear in the other:

\langle - , - \rangle :  L^2(X) \times L^2(X) \to \mathbb{C}

First came probability theory with L^1(X); then came quantum theory with L^2(X). Naive extrapolation would say it’s about time for someone to invent an even more bizarre theory of reality based on L^3(X). In this, you’d have to integrate the product of three wavefunctions to get a number! The math of L^p spaces is already well-developed, so give it a try if you want. I’ll stick to L^1 and L^2 today.

Stochastic versus unitary operators

Now let’s think about time evolution:

• In probability theory, the passage of time is described by a map sending probability distributions to probability distributions. This is described using a stochastic operator

U : L^1(X) \to L^1(X)

meaning a linear operator such that

\int U \psi = \int \psi

and

\psi \ge 0 \quad \implies \quad U \psi \ge 0

• In quantum theory the passage of time is described by a map sending wavefunctions to wavefunctions. This is described using an isometry

U : L^2(X) \to L^2(X)

meaning a linear operator such that

\langle U \psi , U \phi \rangle = \langle \psi , \phi \rangle

In quantum theory we usually want time evolution to be reversible, so we focus on isometries that have inverses: these are called unitary operators. In probability theory we often consider stochastic operators that are not invertible.

Infinitesimal stochastic versus self-adjoint operators

Sometimes it’s nice to think of time coming in discrete steps. But in theories where we treat time as continuous, to describe time evolution we usually need to solve a differential equation. This is true in both probability theory and quantum theory:

• In probability theory we often describe time evolution using a differential equation called the master equation:

\frac{d}{d t} \psi(t) = H \psi(t)

whose solution is

\psi(t) = \exp(t H)\psi(0)

• In quantum theory we often describe time evolution using a differential equation called Schrödinger’s equation:

i \frac{d}{d t} \psi(t) = H \psi(t)

whose solution is

\psi(t) = \exp(-i t H)\psi(0)

In fact the appearance of i in the quantum case is purely conventional; we could drop it to make the analogy better, but then we’d have to work with ‘skew-adjoint’ operators instead of self-adjoint ones in what follows.

Let’s guess what properties an operator H should have to make \exp(-i t H) unitary for all t. We start by assuming it’s an isometry:

\langle \exp(-i t H) \psi, \exp(-i t H) \phi \rangle = \langle \psi, \phi \rangle

Then we differentiate this with respect to t and set t = 0, getting

\langle -iH \psi, \phi \rangle +  \langle \psi, -iH \phi \rangle = 0

or in other words

\langle H \psi, \phi \rangle = \langle \psi, H \phi \rangle

Physicists call an operator obeying this condition self-adjoint. Mathematicians know there’s more to it, but today is not the day to discuss such subtleties, intriguing though they be. All that matters now is that there is, indeed, a correspondence between self-adjoint operators and well-behaved ‘one-parameter unitary groups’ \exp(-i t H). This is called Stone’s Theorem.

But now let’s copy this argument to guess what properties an operator H must have to make \exp(t H) stochastic. We start by assuming \exp(t H) is stochastic, so

\int \exp(t H) \psi = \int \psi

and

\psi \ge 0 \quad \implies \quad \exp(t H) \psi \ge 0

We can differentiate the first equation with respect to t and set t = 0, getting

\int H \psi = 0

for all \psi.

But what about the second condition,

\psi \ge 0 \quad \implies \quad \exp(t H) \psi \ge 0?

It seems easier to deal with this in the special case when integrals over X reduce to sums. So let’s suppose that happens… and let’s start by seeing what the first condition says in this case.

In this case, L^1(X) has a basis of ‘Kronecker delta functions’: the Kronecker delta function \delta_i vanishes everywhere except at one point i \in X, where it equals 1. Using this basis, we can write any operator on L^1(X) as a matrix.

As a warmup, let’s see what it means for an operator

U: L^1(X) \to L^1(X)

to be stochastic in this case. We’ll take the conditions

\int U \psi = \int \psi

and

\psi \ge 0 \quad \implies \quad U \psi \ge 0

and rewrite them using matrices. For both, it’s enough to consider the case where \psi is a Kronecker delta, say \delta_j.

In these terms, the first condition says

\sum_{i \in X} U_{i j} = 1

for each column j. The second says

U_{i j} \ge 0

for all i, j. So, in this case, a stochastic operator is just a square matrix where each column sums to 1 and all the entries are nonnegative. (Such matrices are often called left stochastic.)

Next, let’s see what we need for an operator H to have the property that \exp(t H) is stochastic for all t \ge 0. It’s enough to assume t is very small, which lets us use the approximation

\exp(t H) = 1 + t H + \cdots

and work to first order in t. Saying that each column of this matrix sums to 1 then amounts to

\sum_{i \in X} \left( \delta_{i j} + t H_{i j} + \cdots \right)  = 1

which requires

\sum_{i \in X} H_{i j} = 0

Saying that each entry is nonnegative amounts to

\delta_{i j} + t H_{i j} + \cdots  \ge 0

When i = j this will be automatic when t is small enough, so the meat of this condition is

H_{i j} \ge 0 \; \mathrm{if} \; i \ne j

So, let’s say H is an infinitesimal stochastic matrix if its columns sum to zero and its off-diagonal entries are nonnegative.
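
Here's a small Python sanity check of this (a sketch, using a crude truncated Taylor series for the exponential): take a 2×2 matrix whose columns sum to zero and whose off-diagonal entries are nonnegative, exponentiate it, and verify that the result is stochastic.

```python
def expm(H, t, terms=30):
    """exp(tH) for a small matrix, by truncated Taylor series.
    Fine for a quick check; don't use this for serious numerics."""
    size = len(H)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term <- term * (tH) / k, the next Taylor term
        term = [[sum(term[i][m] * t * H[m][j] for m in range(size)) / k
                 for j in range(size)] for i in range(size)]
        result = [[result[i][j] + term[i][j] for j in range(size)]
                  for i in range(size)]
    return result

# An infinitesimal stochastic matrix: columns sum to zero,
# off-diagonal entries nonnegative.
H = [[-1.0, 2.0],
     [1.0, -2.0]]

U = expm(H, t=0.7)
cols_sum_to_one = all(abs(U[0][j] + U[1][j] - 1.0) < 1e-9 for j in range(2))
entries_nonneg = all(U[i][j] >= 0 for i in range(2) for j in range(2))
print(cols_sum_to_one and entries_nonneg)  # True: exp(tH) is stochastic
```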

I don’t love this terminology: do you know a better one? There should be some standard term. People here say they’ve seen the term ‘stochastic Hamiltonian’. The idea behind my term is that any infinitesimal stochastic operator should be the infinitesimal generator of a stochastic process.

In other words, when we get the details straightened out, any 1-parameter family of stochastic operators

U(t) : L^1(X) \to L^1(X) \qquad   t \ge 0

obeying

U(0) = I

U(t) U(s) = U(t+s)

and continuity:

t_i \to t \quad \implies \quad U(t_i) \psi \to U(t)\psi

should be of the form

U(t) = \exp(t H)

for a unique ‘infinitesimal stochastic operator’ H.

When X is a finite set, this is true—and an infinitesimal stochastic operator is just a square matrix whose columns sum to zero and whose off-diagonal entries are nonnegative. But do you know a theorem characterizing infinitesimal stochastic operators for general measure spaces X? Someone must have worked it out.

Luckily, for our work on stochastic Petri nets, we only need to understand the case where X is a countable set and our integrals are really just sums. This should be almost like the case where X is a finite set—but we’ll need to take care that all our sums converge.

The moral

Now we can see why a Hamiltonian like a^\dagger is no good, while a^\dagger - 1 is good. (I’ll ignore the rate constant r since it’s irrelevant here.) The first one is not infinitesimal stochastic, while the second one is!

In this example, our set of states is the natural numbers:

X = \mathbb{N}

The probability distribution

\psi : \mathbb{N} \to \mathbb{C}

tells us the probability of having caught any specific number of fish.

The creation operator is not infinitesimal stochastic: in fact, it’s stochastic! Why? Well, when we apply the creation operator, what was the probability of having n fish now becomes the probability of having n+1 fish. So, the probabilities remain nonnegative, and their sum over all n is unchanged. Those two conditions are all we need for a stochastic operator.

Using our fancy abstract notation, these conditions say:

\int a^\dagger \psi = \int \psi

and

\psi \ge 0 \; \implies \; a^\dagger \psi \ge 0

So, precisely by virtue of being stochastic, the creation operator fails to be infinitesimal stochastic:

\int a^\dagger \psi \ne 0

Thus it’s a bad Hamiltonian for our stochastic Petri net.

On the other hand, a^\dagger - 1 is infinitesimal stochastic. Its off-diagonal entries are the same as those of a^\dagger, so they’re nonnegative. Moreover:

\int (a^\dagger - 1) \psi = 0

precisely because

\int a^\dagger \psi = \int \psi

You may be thinking: all this fancy math just to understand a single stochastic Petri net, the simplest one of all!

But next time I’ll explain a general recipe which will let you write down the Hamiltonian for any stochastic Petri net. The lessons we’ve learned today will make this much easier. And pondering the analogy between probability theory and quantum theory will also be good for our bigger project of unifying the applications of network diagrams to dozens of different subjects.


Network Theory (Part 4)

6 April, 2011

Last time I explained the rate equation of a stochastic Petri net. But now let’s get serious: let’s see what’s really stochastic— that is, random—about a stochastic Petri net. For this we need to forget the rate equation (temporarily) and learn about the ‘master equation’. This is where ideas from quantum field theory start showing up!

A Petri net has a bunch of states and a bunch of transitions. Here’s an example we’ve already seen, from chemistry:

The states are in yellow, the transitions in blue. A labelling of our Petri net is a way of putting some number of things in each state. We can draw these things as little black dots:

In this example there are only 0 or 1 things in each state: we’ve got one atom of carbon, one molecule of oxygen, one molecule of sodium hydroxide, one molecule of hydrochloric acid, and nothing else. But in general, we can have any natural number of things in each state.

In a stochastic Petri net, the transitions occur randomly as time passes. For example, as time passes we could see a sequence of transitions like this:

Each time a transition occurs, the number of things in each state changes in an obvious way.

The Master Equation

Now, I said the transitions occur ‘randomly’, but that doesn’t mean there’s no rhyme or reason to them! The miracle of probability theory is that it lets us state precise laws about random events. The law governing the random behavior of a stochastic Petri net is called the ‘master equation’.

In a stochastic Petri net, each transition has a rate constant, a nonnegative real number. Roughly speaking, this determines the probability of that transition.

A bit more precisely: suppose we have a Petri net that is labelled in some way at some moment. Then the probability that a given transition occurs in a short time \Delta t is approximately:

• the rate constant for that transition, times

• the time \Delta t, times

• the number of ways the transition can occur.

More precisely still: this formula is correct up to terms of order (\Delta t)^2. So, taking the limit as \Delta t \to 0, we get a differential equation describing precisely how the probability of the Petri net having a given labelling changes with time! And this is the master equation.

Now, you might be impatient to actually see the master equation, but that would be rash. The true master doesn’t need to see the master equation. It sounds like a Zen proverb, but it’s true. The raw beginner in mathematics wants to see the solutions of an equation. The more advanced student is content to prove that the solution exists. But the master is content to prove that the equation exists.

A bit more seriously: what matters is understanding the rules that inevitably lead to some equation: actually writing it down is then straightforward.

And you see, there’s something I haven’t explained yet: “the number of ways the transition can occur”. This involves a bit of counting. Consider, for example, this Petri net:

Suppose there are 10 rabbits and 5 wolves.

• How many ways can the birth transition occur? Since birth takes one rabbit as input, it can occur in 10 ways.

• How many ways can predation occur? Since predation takes one rabbit and one wolf as inputs, it can occur in 10 × 5 = 50 ways.

• How many ways can death occur? Since death takes one wolf as input, it can occur in 5 ways.

Or consider this one:

Suppose there are 10 hydrogen atoms and 5 oxygen atoms. How many ways can they form a water molecule? There are 10 ways to pick the first hydrogen, 9 ways to pick the second hydrogen, and 5 ways to pick the oxygen. So, there are

10 \times 9 \times 5 = 450

ways.

Note that we’re treating the hydrogen atoms as distinguishable, so there are 10 \times 9 ways to pick them, not \frac{10 \times 9}{2} = \binom{10}{2}. In general, the number of ways to choose M distinguishable things from a collection of L is the falling power

L^{\underline{M}} = L \cdot (L - 1) \cdots (L - M + 1)

where there are M factors in the product, but each is 1 less than the preceding one—hence the term ‘falling’.
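
In code, the falling power and the water-molecule count look like this (a trivial sketch, but it makes the counting concrete):

```python
def falling(L, M):
    """The falling power L(L-1)...(L-M+1): the number of ways to pick
    M distinguishable things, in order, from a collection of L."""
    out = 1
    for i in range(M):
        out *= L - i
    return out

# 10 hydrogens, 5 oxygens: pick 2 hydrogens in order, then 1 oxygen.
print(falling(10, 2) * falling(5, 1))  # 450
```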

Okay, now I’ve given you all the raw ingredients to work out the master equation for any stochastic Petri net. The previous paragraph was a big fat hint. One more nudge and you’re on your own:

Puzzle. Suppose we have a stochastic Petri net with k states and just one transition, whose rate constant is r. Suppose the ith state appears m_i times as the input of this transition and n_i times as the output. A labelling of this stochastic Petri net is a k-tuple of natural numbers \ell = (\ell_1, \dots, \ell_k) saying how many things are in each state. Let \psi_\ell(t) be the probability that the labelling is \ell at time t. Then the master equation looks like this:

\frac{d}{d t} \psi_{\ell}(t)  = \sum_{\ell'} H_{\ell \ell'} \psi_{\ell'}(t)

for some matrix of real numbers H_{\ell \ell'}. What is this matrix?

You can write down a formula for this matrix using what I’ve told you. And then, if you have a stochastic Petri net with more transitions, you can just compute the matrix for each transition using this formula, and add them all up.

Someday I’ll tell you the answer to this puzzle, but I want to get there by a strange route: I want to guess the master equation using ideas from quantum field theory!

Some clues

Why?

Well, if we think about a stochastic Petri net whose labelling undergoes random transitions as I’ve described, you’ll see that any possible ‘history’ for the labelling can be drawn in a way that looks like a Feynman diagram. In quantum field theory, Feynman diagrams show how things interact and turn into other things. But that’s what stochastic Petri nets do, too!

For example, if our Petri net looks like this:

then a typical history can be drawn like this:

Some rabbits and wolves come in on top. They undergo some transitions as time passes, and go out on the bottom. The vertical coordinate is time, while the horizontal coordinate doesn’t really mean anything: it just makes the diagram easier to draw.

If we ignore all the artistry that makes it cute, this Feynman diagram is just a graph with states as edges and transitions as vertices. Each transition occurs at a specific time.

We can use these Feynman diagrams to compute the probability that if we start it off with some labelling at time t_1, our stochastic Petri net will wind up with some other labelling at time t_2. To do this, we just take a sum over Feynman diagrams that start and end with the given labellings. For each Feynman diagram, we integrate over all possible times at which the transitions occur. And what do we integrate? Just the product of the rate constants for those transitions!

That was a bit of a mouthful, and it doesn’t really matter if you followed it in detail. What matters is that it sounds a lot like stuff you learn when you study quantum field theory!

That’s one clue that something cool is going on here. Another is the master equation itself:

\frac{d}{dt} \psi_{\ell}(t) = \sum_{\ell'} H_{\ell \ell'} \psi_{\ell'}(t)

This looks a lot like Schrödinger’s equation, the basic equation describing how a quantum system changes with the passage of time.

We can make it look even more like Schrödinger’s equation if we create a vector space with the labellings \ell as a basis. The numbers \psi_\ell(t) will be the components of some vector \psi(t) in this vector space. The numbers H_{\ell \ell'} will be the matrix entries of some operator H on that vector space. And the master equation becomes:

\frac{d}{d t} \psi(t) = H \psi(t)

Compare Schrödinger’s equation:

i \frac{d}{d t} \psi(t) = H \psi(t)

The only visible difference is that factor of i!

But of course this is linked to another big difference: in the master equation \psi describes probabilities, so it’s a vector in a real vector space. In quantum theory \psi describes probability amplitudes, so it’s a vector in a complex Hilbert space.

Apart from this huge difference, everything is a lot like quantum field theory. In particular, our vector space is a lot like the Fock space one sees in quantum field theory. Suppose we have a quantum particle that can be in k different states. Then its Fock space is the Hilbert space we use to describe an arbitrary collection of such particles. It has an orthonormal basis denoted

| \ell_1 \cdots \ell_k \rangle

where \ell_1, \dots, \ell_k are natural numbers saying how many particles there are in each state. So, any vector in Fock space looks like this:

\psi = \sum_{\ell_1, \dots, \ell_k} \; \psi_{\ell_1 , \dots, \ell_k} \,  | \ell_1 \cdots \ell_k \rangle

But if we write the whole list \ell_1, \dots, \ell_k simply as \ell, this becomes

\psi = \sum_{\ell} \psi_{\ell}   | \ell \rangle

This is almost like what we’ve been doing with Petri nets!—except I hadn’t gotten around to giving names to the basis vectors.

In quantum field theory class, I learned lots of interesting operators on Fock space: annihilation and creation operators, number operators, and so on. So, when I bumped into this master equation

\frac{d}{d t} \psi(t) = H \psi(t)

it seemed natural to take the operator H and write it in terms of these. There was an obvious first guess, which didn’t quite work… but thinking a bit harder eventually led to the right answer. Later, it turned out people had already thought about similar things. So, I want to explain this.

When I first started working on this stuff, I was focused on the difference between collections of indistinguishable things, like bosons or fermions, and collections of distinguishable things, like rabbits or wolves.

But with the benefit of hindsight, it’s even more important to think about the difference between ordinary quantum theory, which is all about probability amplitudes, and the game we’re playing now, which is all about probabilities. So, next time I’ll explain how we need to modify quantum theory so that it’s about probabilities. This will make it easier to guess a nice formula for H.


Network Theory (Part 3)

3 April, 2011

As we saw last time, a Petri net is a picture that shows different kinds of things and processes that turn bunches of things into other bunches of things, like this:

The kinds of things are called states and the processes are called transitions. We see such transitions in chemistry:

H + OH → H2O

and population biology:

amoeba → amoeba + amoeba

and the study of infectious diseases:

infected + susceptible → infected + infected

and many other situations.

A “stochastic” Petri net says the rate at which each transition occurs. We can think of these transitions as occurring randomly at a certain rate—and then we get a stochastic process described by something called the “master equation”. But for starters, we’ve been thinking about the limit where there are very many things in each state. Then the randomness washes out, and the expected number of things in each state changes deterministically in a manner described by the “rate equation”.

It’s time to explain the general recipe for getting this rate equation! It looks complicated at first glance, so I’ll briefly state it, then illustrate it with tons of examples, and then state it again.

One nice thing about stochastic Petri nets is that they let you dabble in many sciences. Last time we got a tiny taste of how they show up in population biology. This time we’ll look at chemistry and models of infectious diseases. I won’t dig very deep, but take my word for it: you can do a lot with stochastic Petri nets in these subjects! I’ll give some references in case you want to learn more.

Rate equations: the general recipe

Here’s the recipe, really quick:

A stochastic Petri net has a set of states and a set of transitions. Let’s concentrate our attention on a particular transition. Then the ith state will appear m_i times as the input to that transition, and n_i times as the output. Our transition also has a reaction rate 0 \le r < \infty.

The rate equation answers this question:

\frac{d x_i}{d t} = ???

where x_i(t) is the number of things in the ith state at time t. The answer is a sum of terms, one for each transition. Each term works the same way. For the transition we’re looking at, it’s

r (n_i - m_i) x_1^{m_1} \cdots x_k^{m_k}

The factor of (n_i - m_i) shows up because our transition destroys m_i things in the ith state and creates n_i of them. The big product over all states, x_1^{m_1} \cdots x_k^{m_k} , shows up because our transition occurs at a rate proportional to the product of the numbers of things it takes as inputs. The constant of proportionality is the reaction rate r.

The formation of water (1)

But let’s do an example. Here’s a naive model for the formation of water from atomic hydrogen and oxygen:

This Petri net has just one transition: two hydrogen atoms and an oxygen atom collide simultaneously and form a molecule of water. That’s not really how it goes… but what if it were? Let’s use [\mathrm{H}] for the number of hydrogen atoms, and so on, and let the reaction rate be \alpha. Then we get this rate equation:

\begin{array}{ccl}  \frac{d [\mathrm{H}]}{d t} &=& - 2 \alpha [\mathrm{H}]^2 [\mathrm{O}]   \\  \\  \frac{d [\mathrm{O}]}{d t} &=& - \alpha [\mathrm{H}]^2 [\mathrm{O}]   \\  \\  \frac{d [\mathrm{H}_2\mathrm{O}]}{d t} &=& \alpha [\mathrm{H}]^2 [\mathrm{O}] \end{array}

See how it works? The reaction occurs at a rate proportional to the product of the numbers of things that appear as inputs: two H’s and one O. The constant of proportionality is the rate constant \alpha. So, the reaction occurs at a rate equal to \alpha [\mathrm{H}]^2 [\mathrm{O}] . Then:

• Since two hydrogen atoms get used up in this reaction, we get a factor of -2 in the first equation.

• Since one oxygen atom gets used up, we get a factor of -1 in the second equation.

• Since one water molecule is formed, we get a factor of +1 in the third equation.
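To see the recipe working mechanically, here's a small Python sketch—mine, with made-up numbers—for this single transition. The states are ordered (H, O, H₂O), so the input counts are m = (2, 1, 0) and the output counts are n = (0, 0, 1):

```python
# The recipe from the text, applied to the naive 2 H + O -> H2O
# transition.  The function and the sample numbers are my own sketch.
def rate_equation_terms(r, m, n, x):
    """d(x_i)/dt contributed by one transition:
    r * (n_i - m_i) * x_1^{m_1} * ... * x_k^{m_k}."""
    flux = r
    for x_i, m_i in zip(x, m):
        flux *= x_i ** m_i
    return [(n_i - m_i) * flux for m_i, n_i in zip(m, n)]

alpha = 1.0
m = [2, 1, 0]                  # inputs:  two H atoms, one O atom
n = [0, 0, 1]                  # outputs: one water molecule
x = [3.0, 5.0, 0.0]            # made-up amounts: [H] = 3, [O] = 5

print(rate_equation_terms(alpha, m, n, x))   # → [-90.0, -45.0, 45.0]
```

The -2, -1, +1 pattern from the three bullet points shows up as the coefficients n_i - m_i, all multiplying the same flux \alpha [\mathrm{H}]^2 [\mathrm{O}] = 45.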

The formation of water (2)

Let me do another example, just so chemists don’t think I’m an absolute ninny. Chemical reactions rarely proceed by having three things collide simultaneously—it’s too unlikely. So, for the formation of water from atomic hydrogen and oxygen, there will typically be an intermediate step. Maybe something like this:

Here OH is called a ‘hydroxyl radical’. I’m not sure this is the most likely pathway, but never mind—it’s a good excuse to work out another rate equation. If the first reaction has rate constant \alpha and the second has rate constant \beta, here’s what we get:

\begin{array}{ccl}  \frac{d [\mathrm{H}]}{d t} &=& - \alpha [\mathrm{H}] [\mathrm{O}] - \beta [\mathrm{H}] [\mathrm{OH}]   \\  \\  \frac{d [\mathrm{OH}]}{d t} &=&  \alpha [\mathrm{H}] [\mathrm{O}] -  \beta [\mathrm{H}] [\mathrm{OH}]   \\  \\  \frac{d [\mathrm{O}]}{d t} &=& - \alpha [\mathrm{H}] [\mathrm{O}]   \\  \\  \frac{d [\mathrm{H}_2\mathrm{O}]}{d t} &=&   \beta [\mathrm{H}] [\mathrm{OH}]  \end{array}

See how it works? Each reaction occurs at a rate proportional to the product of the numbers of things that appear as inputs. We get minus signs when a reaction destroys one thing of a given kind, and plus signs when it creates one. We don’t get factors of 2 as we did last time, because now no reaction creates or destroys two of anything.
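As a quick sanity check—one not spelled out above, but easy to verify—these equations conserve the total number of hydrogen atoms, whether free or bound:

\frac{d}{d t} \left( [\mathrm{H}] + [\mathrm{OH}] + 2 [\mathrm{H}_2\mathrm{O}] \right) = \left( - \alpha [\mathrm{H}] [\mathrm{O}] - \beta [\mathrm{H}] [\mathrm{OH}] \right) + \left( \alpha [\mathrm{H}] [\mathrm{O}] - \beta [\mathrm{H}] [\mathrm{OH}] \right) + 2 \beta [\mathrm{H}] [\mathrm{OH}] = 0

The \alpha terms cancel between the first two equations, and the two -\beta terms cancel against the +2\beta from the water equation. The same bookkeeping shows the total number of oxygen atoms, [\mathrm{OH}] + [\mathrm{O}] + [\mathrm{H}_2\mathrm{O}], is conserved too.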

The dissociation of water (1)

In chemistry every reaction comes with a reverse reaction. So, if hydrogen and oxygen atoms can combine to form water, a water molecule can also ‘dissociate’ into hydrogen and oxygen atoms. The rate constants for the reverse reaction can be different than for the original reaction… and all these rate constants depend on the temperature. At room temperature, the rate constant for hydrogen and oxygen to form water is a lot higher than the rate constant for the reverse reaction. That’s why we see a lot of water, and not many lone hydrogen or oxygen atoms. But at sufficiently high temperatures, the rate constants change, and water molecules become more eager to dissociate.

Calculating these rate constants is a big subject. I’m just starting to read this book, which looked like the easiest one on the library shelf:

• S. R. Logan, Chemical Reaction Kinetics, Longman, Essex, 1996.

But let’s not delve into these mysteries today. Let’s just take our naive Petri net for the formation of water and turn around all the arrows, to get the reverse reaction:

If the reaction rate is \alpha, here’s the rate equation:

\begin{array}{ccl}  \frac{d [\mathrm{H}]}{d t} &=& 2 \alpha [\mathrm{H}_2\mathrm{O}]   \\  \\  \frac{d [\mathrm{O}]}{d t} &=& \alpha [\mathrm{H}_2 \mathrm{O}]   \\  \\  \frac{d [\mathrm{H}_2\mathrm{O}]}{d t} &=& - \alpha [\mathrm{H}_2 \mathrm{O}]   \end{array}

See how it works? The reaction occurs at a rate proportional to [\mathrm{H}_2\mathrm{O}], since it has just a single water molecule as input. That’s where the \alpha [\mathrm{H}_2\mathrm{O}] comes from. Then:

• Since two hydrogen atoms get formed in this reaction, we get a factor of +2 in the first equation.

• Since one oxygen atom gets formed, we get a factor of +1 in the second equation.

• Since one water molecule gets used up, we get a factor of -1 in the third equation.

The dissociation of water (2)

Of course, we can also look at the reverse of the more realistic reaction involving a hydroxyl radical as an intermediate. Again, we just turn around the arrows in the Petri net we had:

Now the rate equation looks like this:

\begin{array}{ccl}  \frac{d [\mathrm{H}]}{d t} &=& + \alpha [\mathrm{OH}] + \beta [\mathrm{H}_2\mathrm{O}]  \\  \\  \frac{d [\mathrm{OH}]}{d t} &=& - \alpha [\mathrm{OH}] + \beta [\mathrm{H}_2 \mathrm{O}]  \\  \\  \frac{d [\mathrm{O}]}{d t} &=& + \alpha [\mathrm{OH}]  \\  \\  \frac{d [\mathrm{H}_2\mathrm{O}]}{d t} &=& - \beta [\mathrm{H}_2\mathrm{O}]  \end{array}

Do you see why? Test your understanding of the general recipe.

By the way: if you’re a category theorist, when I said “turn around all the arrows” you probably thought “opposite category”. And you’d be right! A Petri net is just a way of presenting a strict symmetric monoidal category that’s freely generated by some objects (the states) and some morphisms (the transitions). When we turn around all the arrows in our Petri net, we’re getting a presentation of the opposite symmetric monoidal category. For more details, try:

• Vladimiro Sassone, On the category of Petri net computations, 6th International Conference on Theory and Practice of Software Development, Proceedings of TAPSOFT ’95, Lecture Notes in Computer Science 915, Springer, Berlin, pp. 334-348.

After I explain how stochastic Petri nets are related to quantum field theory, I hope to say more about this category theory business. But if you don’t understand it, don’t worry about it now—let’s do a few more examples.

The SI model

The SI model is an extremely simple model of an infectious disease. We can describe it using this Petri net:

There are two states: susceptible and infected. And there’s a transition called infection, where an infected person meets a susceptible person and infects them.

Suppose S is the number of susceptible people and I the number of infected ones. If the rate constant for infection is \beta, the rate equation is

\begin{array}{ccl}  \frac{d S}{d t} &=& - \beta S I \\ \\  \frac{d I}{d t} &=&  \beta S I   \end{array}

Do you see why?

By the way, it’s easy to solve these equations exactly. The total number of people doesn’t change, so S + I is a conserved quantity. Use this to get rid of one of the variables. You’ll get a version of the famous logistic equation, so the fraction of people infected must grow sort of like this:

Puzzle. Is there a stochastic Petri net with just one state whose rate equation is the logistic equation:

\frac{d P}{d t} = \alpha P - \beta P^2 \; ?

The SIR model

The SI model is just a warmup for the more interesting SIR model, which was invented by Kermack and McKendrick in 1927:

• W. O. Kermack and A. G. McKendrick, A Contribution to the mathematical theory of epidemics, Proc. Roy. Soc. Lond. A 115 (1927), 700-721.

The SIR model has an extra state, called resistant, and an extra transition, called recovery, where an infected person gets better and develops resistance to the disease:

If the rate constant for infection is \beta and the rate constant for recovery is \alpha, the rate equation for this stochastic Petri net is:

\begin{array}{ccl}  \frac{d S}{d t} &=& - \beta S I \\  \\  \frac{d I}{d t} &=&  \beta S I - \alpha I \\   \\  \frac{d R}{d t} &=&  \alpha I  \end{array}

See why?

I don’t know a ‘closed-form’ solution to these equations. But Kermack and McKendrick found an approximate solution in their original paper. They used this to model the death rate from bubonic plague during an outbreak in Bombay that started in 1905, and they got pretty good agreement. Nowadays, of course, we can solve these equations numerically on the computer.
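In that spirit, here's a quick numerical sketch of my own that integrates the SIR rate equations by Euler's method. The rate constants and initial populations below are invented purely for illustration:

```python
# Euler integration of the SIR rate equations.  The rate constants
# and initial populations here are made up for illustration.
beta, alpha = 0.001, 0.1         # infection and recovery rate constants
S, I, R = 999.0, 1.0, 0.0        # one infected person among a thousand
dt, steps = 0.01, 100_000        # run up to t = 1000

for _ in range(steps):
    dS = -beta * S * I
    dI = beta * S * I - alpha * I
    dR = alpha * I
    S, I, R = S + dS * dt, I + dI * dt, R + dR * dt

print(S, I, R)       # the epidemic burns out: I ends up near 0
print(S + I + R)     # the total population stays (essentially) 1000
```

Since each transition moves people between states without creating or destroying them, S + I + R is conserved, just as in the SI model.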

The SIRS model

There’s an even more interesting model of infectious disease called the SIRS model. This has one more transition, called losing resistance, where a resistant person can go back to being susceptible. Here’s the Petri net:

Puzzle. If the rate constants for recovery, infection and loss of resistance are \alpha, \beta, and \gamma, write down the rate equations for this stochastic Petri net.

In the SIRS model we see something new: cyclic behavior! Say you start with a few infected people and a lot of susceptible ones. Then lots of people get infected… then lots get resistant… and then, much later, if you set the rate constants right, they lose their resistance and they’re ready to get sick all over again! You can sort of see it from the Petri net, which looks like a cycle.

I learned about the SI, SIR and SIRS models from this great book:

• Marc Mangel, The Theoretical Biologist’s Toolbox: Quantitative Methods for Ecology and Evolutionary Biology, Cambridge U. Press, Cambridge, 2006.

For more models of this type, see:

• Compartmental models in epidemiology, Wikipedia.

A ‘compartmental model’ is closely related to a stochastic Petri net, but beware: the pictures in this article are not really Petri nets!

The general recipe revisited

Now let me remind you of the general recipe and polish it up a bit. So, suppose we have a stochastic Petri net with k states. Let x_i be the number of things in the ith state. Then the rate equation looks like:

\frac{d x_i}{d t} = ???

It’s really a bunch of equations, one for each 1 \le i \le k. But what is the right-hand side?

The right-hand side is a sum of terms, one for each transition in our Petri net. So, let’s assume our Petri net has just one transition! (If there are more, consider one at a time, and add up the results.)

Suppose the ith state appears as input to this transition m_i times, and as output n_i times. Then the rate equation is

\frac{d x_i}{d t} = r (n_i - m_i) x_1^{m_1} \cdots x_k^{m_k}

where r is the rate constant for this transition.

That’s really all there is to it! But subscripts make my eyes hurt more and more as I get older—this is the real reason for using index-free notation, despite any sophisticated rationales you may have heard—so let’s define a vector

x = (x_1, \dots , x_k)

that keeps track of how many things there are in each state. Similarly let’s make up an input vector:

m = (m_1, \dots, m_k)

and an output vector:

n = (n_1, \dots, n_k)

for our transition. And a bit more unconventionally, let’s define

x^m = x_1^{m_1} \cdots x_k^{m_k}

Then we can write the rate equation for a single transition as

\frac{d x}{d t} = r (n-m) x^m

This looks a lot nicer!

Indeed, this emboldens me to consider a general stochastic Petri net with lots of transitions, each with their own rate constant. Let’s write T for the set of transitions and r(\tau) for the rate constant of the transition \tau \in T. Let n(\tau) and m(\tau) be the input and output vectors of the transition \tau. Then the rate equation for our stochastic Petri net is

\frac{d x}{d t} = \sum_{\tau \in T} r(\tau) (n(\tau) - m(\tau)) x^{m(\tau)}

That’s the fully general recipe in a nutshell. I’m not sure yet how helpful this notation will be, but it’s here whenever we want it.
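The formula translates almost line by line into code. Here's a sketch of my own, where each transition \tau is a triple (r, m, n); as a check, it reproduces the SIR rate equation from earlier, with made-up numbers:

```python
# The general recipe in code: the right-hand side of the rate equation
# is a sum over transitions tau of r(tau) (n(tau) - m(tau)) x^{m(tau)}.
# This is my own transcription of the formula above.
def rate_equation_rhs(transitions, x):
    k = len(x)
    rhs = [0.0] * k
    for r, m, n in transitions:
        flux = r                       # build up r * x^m
        for x_i, m_i in zip(x, m):
            flux *= x_i ** m_i
        for i in range(k):
            rhs[i] += (n[i] - m[i]) * flux
    return rhs

# Check against the SIR model, states ordered (S, I, R):
sir = [
    (0.001, [1, 1, 0], [0, 2, 0]),     # infection: S + I -> I + I
    (0.1,   [0, 1, 0], [0, 0, 1]),     # recovery:  I -> R
]
print(rate_equation_rhs(sir, [999.0, 1.0, 0.0]))
# matches -beta*S*I, beta*S*I - alpha*I, alpha*I at these values
```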

Next time we’ll get to the really interesting part, where ideas from quantum theory enter the game! We’ll see how things in different states randomly transform into each other via the transitions in our Petri net. And someday we’ll check that the expected number of things in each state evolves according to the rate equation we just wrote down… at least in the limit where there are lots of things in each state.

