Coherence for Solutions of the Master Equation

10 July, 2013

guest post by Arjun Jain

I am a master’s student in the physics department of the Indian Institute of Technology Roorkee. I’m originally from Delhi. For some time now, I’ve been wanting to go into mathematical physics, and I hope to do a PhD in it. Apart from maths and physics, I am also quite passionate about art and music.

Right now I am visiting John Baez at the Centre for Quantum Technologies, and we’re working on chemical reaction networks. This post can be considered as an annotation to the last paragraph of John’s paper, Quantum Techniques for Reaction Networks, where he raises the question of when a solution to the master equation that starts as a coherent state will remain coherent for all times. Remember, the ‘master equation’ describes the random evolution of collections of classical particles, and a ‘coherent state’ is one where the probability distribution of particles of each type is a Poisson distribution.

If you’ve been following the network theory series on this blog, you’ll know these concepts, and you’ll know the Anderson-Craciun-Kurtz theorem gives many examples of coherent states that remain coherent. However, all these are equilibrium solutions of the master equation: they don’t change with time. Moreover they are complex balanced equilibria: the rate at which any complex is produced equals the rate at which it is consumed.

There are also non-equilibrium examples where coherent states remain coherent. But they seem rather rare, and I would like to explain why. So, I will give a necessary condition for it to happen. I’ll give the proof first, and then discuss some simple examples. We will see that while the condition is necessary, it is not sufficient.

First, recall the setup. If you’ve been following the network theory series, you can skip the next section.

Reaction networks

Definition. A reaction network consists of:

• a finite set S of species,

• a finite set K of complexes, where a complex is a finite sum of species, or in other words, an element of \mathbb{N}^S,

• a graph with K as its set of vertices and some set T of edges.

You should have in mind something like this:

where our set of species is S = \{A,B,C,D,E\}, the complexes are things like A + E, and the arrows are the elements of T, called transitions or reactions. So, we have functions

s , t : T \to K

saying the source and target of each transition.

Next:

Definition. A stochastic reaction network is a reaction network together with a function r: T \to (0,\infty) assigning a rate constant to each reaction.

From this we can write down the master equation, which describes how a stochastic state evolves in time:

\displaystyle{ \frac{d}{dt} \Psi(t) = H \Psi(t) }

Here \Psi(t) is a vector in the stochastic Fock space, which is the space of formal power series in a bunch of variables, one for each species, and H is an operator on this space, called the Hamiltonian.

From now on I’ll number the species with numbers from 1 to k, so

S = \{1, \dots, k\}

Then the stochastic Fock space consists of real formal power series in variables that I’ll call z_1, \dots, z_k. We can write any of these power series as

\displaystyle{\Psi = \sum_{\ell \in \mathbb{N}^k} \psi_\ell z^\ell }

where

z^\ell = z_1^{\ell_1} \cdots z_k^{\ell_k}

We have annihilation and creation operators on the stochastic Fock space:

\displaystyle{ a_i \Psi = \frac{\partial}{\partial z_i} \Psi }

\displaystyle{ a_i^\dagger \Psi = z_i \Psi }

and the Hamiltonian is built from these as follows:

\displaystyle{ H = \sum_{\tau \in T} r(\tau) \, ({a^\dagger}^{t(\tau)} - {a^\dagger}^{s(\tau)}) \, a^{s(\tau)} }

John explained this here (using slightly different notation), so I won’t go into much detail now, but I’ll say what all the symbols mean. Remember that the source of a transition \tau is a complex, or list of natural numbers:

s(\tau) = (s_1(\tau), \dots, s_k(\tau))

So, the power a^{s(\tau)} is really an abbreviation for a big product of annihilation operators, like this:

\displaystyle{ a^{s(\tau)} = a_1^{s_1(\tau)} \cdots a_k^{s_k(\tau)} }

This describes the annihilation of all the inputs to the transition \tau. Similarly, we define

\displaystyle{ {a^\dagger}^{s(\tau)} = {a_1^\dagger}^{s_1(\tau)} \cdots {a_k^\dagger}^{s_k(\tau)} }

and

\displaystyle{ {a^\dagger}^{t(\tau)} = {a_1^\dagger}^{t_1(\tau)} \cdots {a_k^\dagger}^{t_k(\tau)} }
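
To make this concrete, here is a minimal SymPy sketch (an illustration of mine, not part of the original post) that builds H for a toy network with a single species and one transition 2A → 3A with rate constant r, and applies it to a basis vector z^n. The function names are invented for this example.

import sympy as sp

z, r = sp.symbols('z r', positive=True)
n = 4   # try the Hamiltonian on the basis vector z^n

def annihilate(power, Psi):
    # a^power acts as d^power/dz^power
    return sp.diff(Psi, z, power)

def H(Psi, source=2, target=3, rate=r):
    # H = rate * (a†^target - a†^source) a^source, where a† is multiplication by z
    return rate * (z**target - z**source) * annihilate(source, Psi)

print(sp.expand(H(z**n)))   # r*n*(n-1)*(z**(n+1) - z**n), here 12*r*z**5 - 12*r*z**4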

The result

Here’s the result:

Theorem. If a solution \Psi(t) of the master equation is a coherent state for all times t \ge 0, then \Psi(0) must be complex balanced except for complexes of degree 0 or 1.

This requires some explanation.

First, saying that \Psi(t) is a coherent state means that it is an eigenvector of all the annihilation operators. Concretely this means

\Psi (t) = \displaystyle{\frac{e^{c(t) \cdot z}}{e^{c_1(t) + \cdots + c_k(t)}}}

where

c(t) = (c_1(t), \dots, c_k(t)) \in [0,\infty)^k

and

z = (z_1, \dots, z_k)

It will be helpful to write

\mathbf{1}= (1,1,1,...)

so we can write

\Psi (t) = \displaystyle{ e^{c(t) \cdot (z - \mathbf{1})} }

Second, we say that a complex has degree d if it is a sum of exactly d species. For example, in this reaction network:

the complexes A + C and B + E have degree 2, while the rest have degree 1. We use the word ‘degree’ because each complex \ell gives a monomial

z^\ell = z_1^{\ell_1} \cdots z_k^{\ell_k}

and the degree of the complex is the degree of this monomial, namely

\ell_1 + \cdots + \ell_k

Third and finally, we say a solution \Psi(t) of the master equation is complex balanced for a specific complex \ell if the total rate at which that complex is produced equals the total rate at which it’s destroyed.

Now we are ready to prove the theorem:

Proof. Consider the master equation

\displaystyle { \frac{d \Psi (t)}{d t} = H \Psi (t) }

Assume that \Psi(t) is a coherent state for all t \ge 0. This means

\Psi (t) = \displaystyle{ e^{c(t) \cdot (z - \mathbf{1})} }

For convenience, we write c(t) simply as c, and similarly for the components c_i. Then we have

\displaystyle{ \frac{d\Psi(t)}{dt}  = (\dot{c} \cdot (z - \mathbf{1})) \, e^{c \cdot (z - \mathbf{1})}   }

On the other hand, the master equation gives

\begin{array}{ccl} \displaystyle {\frac{d\Psi(t)}{dt}} &=&  \displaystyle{ \sum_{\tau \in T} r(\tau) \, ({a^\dagger}^{t(\tau)} - {a^\dagger}^{s(\tau)}) \, a^{s(\tau)} e^{c \cdot (z - \mathbf{1})} } \\  \\  &=& \displaystyle{\sum_{\tau \in T} c^{s(\tau)} r(\tau) \, ({z}^{t(\tau)} - {z}^{s(\tau)}) e^{c \cdot (z - \mathbf{1})} } \end{array}

So,

\displaystyle{ (\dot{c} \cdot (z - \mathbf{1})) \, e^{c \cdot (z - \mathbf{1})}  =\sum_{\tau \in T} c^{s(\tau)} r(\tau) \, ({z}^{t(\tau)} - {z}^{s(\tau)}) e^{c \cdot (z - \mathbf{1})} }

As a result, we get

\displaystyle{  \dot{c}\cdot z -\dot{c}\cdot\mathbf{1} =  \sum_{\tau \in T} c^{s(\tau)} r(\tau) \, ({z}^{t(\tau)} - {z}^{s(\tau)})  }.

Comparing the coefficients of all z^\ell, we obtain the following. For \ell = 0, which is the only complex of degree zero, we get

\displaystyle { \sum_{\tau: t(\tau)=0} r(\tau) c^{s(\tau)} - \sum_{\tau\;:\; s(\tau)= 0} r(\tau) c^{s(\tau)} = -\dot{c}\cdot\mathbf{1} }

For the complexes \ell of degree one, we get these equations:

\displaystyle { \sum_{\tau\;:\; t(\tau)=(1,0,0,\dots)} r(\tau) c^{s(\tau)} - \sum_{\tau \;:\;s(\tau)=(1,0,0,\dots)} r(\tau) c^{s(\tau)}= \dot{c_1} }

\displaystyle { \sum_{\tau\; :\; t(\tau)=(0,1,0,\dots)} r(\tau) c^{s(\tau)} - \sum_{\tau\;:\; s(\tau)=(0,1,0,\dots)} r(\tau) c^{s(\tau)} = \dot{c_2} }

and so on. For all the remaining complexes \ell we have

\displaystyle { \sum_{\tau\;:\; t(\tau)=\ell} r(\tau) c^{s(\tau)} = \sum_{\tau \;:\; s(\tau)=\ell} r(\tau) c^{s(\tau)} }.

This says that the total rate at which this complex is produced equals the total rate at which it’s destroyed. So, our solution of the master equation is complex balanced for all complexes \ell of degree greater than one. This is our necessary condition.                                                                                   █

To illustrate the theorem, I’ll consider three simple examples. The third example shows that the condition in the theorem, though necessary, is not sufficient. Note that our proof also gives a necessary and sufficient condition for a coherent state to remain coherent: namely, that all the equations we listed hold, not just initially but for all times. But this condition seems a bit complicated.

Introducing amoebae into a Petri dish

Suppose that there is an inexhaustible supply of amoebae, randomly floating around in a huge pond. Each time an amoeba comes into our collection area, we catch it and add it to the population of amoebae in the Petri dish. Suppose that the rate constant for this process is 3.

So, the Hamiltonian is 3(a^\dagger -1). If we start with a coherent state, say

\displaystyle { \Psi(0)=\frac{e^{cz}}{e^c} }

then

\displaystyle { \Psi(t) = e^{3(a^\dagger -1)t} \; \frac{e^{cz}}{e^c}  = \frac{e^{(c+3t)z}}{e^{c+3t}} }

which is coherent at all times.

We can see that the condition of the theorem is satisfied, as all the complexes in the reaction network have degree 0 or 1.
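
As a quick sanity check, here is a tiny SymPy computation (mine, just for illustration) verifying that this \Psi(t) solves the master equation: for this network H is simply multiplication by 3(z - 1).

import sympy as sp

z, c, t = sp.symbols('z c t', positive=True)

Psi = sp.exp((c + 3*t) * (z - 1))   # the coherent state with mean c + 3t
H_Psi = 3 * (z - 1) * Psi           # H = 3(a† - 1) acts as multiplication by 3(z - 1)

print(sp.simplify(sp.diff(Psi, t) - H_Psi))   # 0: the master equation holds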

Amoebae reproducing and competing

This example shows a Petri dish with one species, amoebae, and two transitions: fission and competition. We suppose that the rate constant for fission is 2, while that for competition is 1. The Hamiltonian is then

H= 2({a^\dagger}^2-a^\dagger)a + (a^\dagger-{a^\dagger}^2)a^2

If we start off with the coherent state

\displaystyle{\Psi(0) = \frac{e^{2z}}{e^2}}

we find that

\displaystyle{ H \Psi(0) = \left( 2(z^2-z) \, 2 + (z-z^2) \, 4 \right) \Psi(0) = 0 }

so

\displaystyle {\Psi(t)= e^{tH} \, \Psi(0) = \Psi(0)}

which is coherent. It should be noted that the chosen initial state

\displaystyle{ \frac{e^{2z}}{e^2}}

was a complex balanced equilibrium solution. So, the Anderson–Craciun–Kurtz Theorem applies to this case.

Amoebae reproducing, competing, and being introduced

This is a combination of the previous two examples, where apart from ongoing reproduction and competition, amoebae are being introduced into the dish with a rate constant 3.

As in the above examples, we might think that coherent states could remain coherent forever here too. Let’s check that.

Assuming that this was true, if

\displaystyle{\Psi(t) = \frac{e^{c(t)z}}{e^{c(t)}} }

then c(t) would have to satisfy the following:

\dot{c}(t) = c(t)^2 + 3 -2c(t)

and

c(t)^2=2c(t)

Using the second equation, we get

\dot{c}(t) = 3 \Rightarrow c = 3t+ c_0

But this is certainly not a solution of the second equation. So, here we find that initially coherent states do not remain coherent for all times.

However, if we choose

\displaystyle{\Psi(0) = \frac{e^{2z}}{e^2}}

then this coherent state is complex balanced at time zero except for complexes of degree 0 or 1: it was complex balanced in the previous example, and the only new feature of this example is that single amoebae are being introduced—and the complexes involved in that transition have degree 0 and 1. So, the condition of the theorem does hold.

So, the condition in the theorem is necessary but not sufficient. However, it is easy to check, and we can use it to show that in many cases, coherent states must cease to be coherent.
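
To see the loss of coherence explicitly in this last example, one can integrate the master equation numerically on a truncated Fock space and watch the Fano factor (variance divided by mean) drift away from 1, the value it takes for every Poisson distribution. Here is a rough NumPy/SciPy sketch along those lines; the truncation size and evolution time are arbitrary choices of mine.

import numpy as np
from scipy.linalg import expm
from scipy.stats import poisson

N = 80                        # truncate the Fock space at populations 0..N
G = np.zeros((N + 1, N + 1))  # generator of the master equation: d(psi)/dt = G psi
for n in range(N + 1):
    if n >= 1:
        G[n, n - 1] += 3.0 + 2.0 * (n - 1)   # introduction (rate 3) and fission (rate 2)
    G[n, n] -= 3.0 + 2.0 * n
    if n + 1 <= N:
        G[n, n + 1] += (n + 1) * n           # competition (rate 1): n+1 -> n
    G[n, n] -= n * (n - 1)

ns = np.arange(N + 1)
psi0 = poisson.pmf(ns, 2.0)
psi0 /= psi0.sum()            # the coherent state with c = 2, truncated and renormalized

psi = expm(2.0 * G) @ psi0    # evolve to t = 2
psi /= psi.sum()              # absorb the tiny probability leak at the truncation boundary
mean = ns @ psi
var = ns**2 @ psi - mean**2
print(mean, var, var / mean)  # for a Poisson distribution the last number would be 1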


The Large-Number Limit for Reaction Networks (Part 2)

6 July, 2013

I’ve been talking a lot about ‘stochastic mechanics’, which is like quantum mechanics but with probabilities replacing amplitudes. In Part 1 of this mini-series I started telling you about the ‘large-number limit’ in stochastic mechanics. It turns out this is mathematically analogous to the ‘classical limit’ of quantum mechanics, where Planck’s constant \hbar goes to zero.

There’s a lot more I need to say about this, and lots more I need to figure out. But here’s one rather easy thing.

In quantum mechanics, ‘coherent states’ are a special class of quantum states that are very easy to calculate with. In a certain precise sense they are the best quantum approximations to classical states. This makes them good tools for studying the classical limit of quantum mechanics. As \hbar \to 0, they reduce to classical states where, for example, a particle has a definite position and momentum.

We can borrow this strategy to study the large-number limit of stochastic mechanics. We’ve run into coherent states before in our discussions here. Now let’s see how they work in the large-number limit!

Coherent states

For starters, let’s recall what coherent states are. We’ve got k different kinds of particles, and we call each kind a species. We describe the probability that we have some number of particles of each kind using a ‘stochastic state’. For starters, this is a formal power series in variables z_1, \dots, z_k. We write it as

\displaystyle{\Psi = \sum_{\ell \in \mathbb{N}^k} \psi_\ell z^\ell }

where z^\ell is an abbreviation for

z_1^{\ell_1} \cdots z_k^{\ell_k}

But for \Psi to be a stochastic state the numbers \psi_\ell need to be probabilities, so we require that

\psi_\ell \ge 0

and

\displaystyle{ \sum_{\ell \in \mathbb{N}^k} \psi_\ell = 1}

Sums of coefficients like this show up so often that it’s good to have an abbreviation for them:

\displaystyle{ \langle \Psi \rangle =  \sum_{\ell \in \mathbb{N}^k} \psi_\ell}

Now, a coherent state is a stochastic state where the numbers of particles of each species are independent random variables, and the number of the ith species is distributed according to a Poisson distribution.

Since we can pick the means of these Poisson distributions to be whatever we want, we get a coherent state \Psi_c for each list of numbers c \in [0,\infty)^k:

\displaystyle{ \Psi_c = \frac{e^{c \cdot z}}{e^c} }

Here I’m using another abbreviation:

e^{c} = e^{c_1 + \cdots + c_k}

If you calculate a bit, you’ll see

\displaystyle{  \Psi_c = e^{-(c_1 + \cdots + c_k)} \, \sum_{n \in \mathbb{N}^k} \frac{c_1^{n_1} \cdots c_k^{n_k}} {n_1! \, \cdots \, n_k! } \, z_1^{n_1} \cdots z_k^{n_k} }

Thus, the probability of having n_i things of the ith species is equal to

\displaystyle{  e^{-c_i} \, \frac{c_i^{n_i}}{n_i!} }

This is precisely the definition of a Poisson distribution with mean equal to c_i.

What are the main properties of coherent states? For starters, they are indeed states:

\langle \Psi_c \rangle = 1

More interestingly, they are eigenvectors of the annihilation operators

a_i = \displaystyle{ \frac{\partial}{\partial z_i} }

since when you differentiate an exponential you get back an exponential:

\begin{array}{ccl} a_i \Psi_c &=&  \displaystyle{ \frac{\partial}{\partial z_i} \frac{e^{c \cdot z}}{e^c} } \\ \\   &=& c_i \Psi_c \end{array}

We can use this fact to check that in this coherent state, the mean number of particles of the ith species really is c_i. For this, we introduce the number operator

N_i = a_i^\dagger a_i

where a_i^\dagger is the creation operator:

(a_i^\dagger \Psi)(z) = z_i \Psi(z)

The number operator has the property that

\langle N_i \Psi \rangle

is the mean number of particles of the ith species. If we calculate this for our coherent state \Psi_c, we get

\begin{array}{ccl} \langle a_i^\dagger a_i \Psi_c \rangle &=& c_i \langle a_i^\dagger \Psi_c \rangle \\  \\ &=& c_i \langle \Psi_c \rangle \\ \\ &=& c_i \end{array}

Here in the second step we used the general rule

\langle a_i^\dagger \Phi \rangle = \langle \Phi \rangle

which is easy to check.

Rescaling

Now let’s see how coherent states work in the large-numbers limit. For this, let’s use the rescaled annihilation, creation and number operators from Part 1. They look like this:

A_i = \hbar \, a_i

C_i = a_i^\dagger

\widetilde{N}_i = C_i A_i

Since

\widetilde{N}_i = \hbar N_i

the point is that the rescaled number operator counts particles not one at a time, but in bunches of size 1/\hbar. For example, if \hbar is the reciprocal of Avogadro’s number, we are counting particles in ‘moles’. So, \hbar \to 0 corresponds to a large-number limit.

To flesh out this idea some more, let’s define rescaled coherent states:

\widetilde{\Psi}_c = \Psi_{c/\hbar}

These are eigenvectors of the rescaled annihilation operators:

\begin{array}{ccl} A_i \widetilde{\Psi}_c &=& \hbar a_i \Psi_{c/\hbar}  \\  \\  &=& c_i \Psi_{c/\hbar} \\ \\  &=& c_i \widetilde{\Psi}_c  \end{array}

This in turn means that

\begin{array}{ccl} \langle \widetilde{N}_i \widetilde{\Psi}_c \rangle &=& \langle C_i A_i \widetilde{\Psi}_c \rangle \\  \\  &=& c_i \langle  C_i \widetilde{\Psi}_c \rangle \\  \\ &=& c_i \langle \widetilde{\Psi}_c \rangle \\ \\ &=& c_i \end{array}

Here we used the general rule

\langle C_i \Phi \rangle = \langle \Phi \rangle

which holds because the ‘rescaled’ creation operator C_i is really just the usual creation operator, which obeys this rule.

What’s the point of all this fiddling around? Simply this. The equation

\langle \widetilde{N}_i \widetilde{\Psi}_c \rangle = c_i

says the expected number of particles of the ith species in the state \widetilde{\Psi}_c is c_i, if we count these particles not one at a time, but in bunches of size 1/\hbar.

A simple test

As a simple test of this idea, let’s check that as \hbar \to 0, the standard deviation of the number of particles in the state \widetilde{\Psi}_c goes to zero… where we count these particles using the rescaled number operator.

The variance of the rescaled number operator is, by definition,

\langle \widetilde{N}_i^2 \widetilde{\Psi}_c \rangle -   \langle \widetilde{N}_i \widetilde{\Psi}_c \rangle^2

and the standard deviation is the square root of the variance.

We already know the mean of the rescaled number operator:

\langle \widetilde{N}_i \widetilde{\Psi}_c \rangle = c_i

So, the main thing we need to calculate is the mean of its square:

\langle \widetilde{N}_i^2 \widetilde{\Psi}_c \rangle

For this we will use the commutation relation derived last time:

[A_i , C_i] = \hbar

This implies

\begin{array}{ccl} \widetilde{N}_i^2 &=& C_i A_i C_i A_i \\  \\  &=&  C_i (C_i A_i + \hbar) A_i \\ \\  &=&  C_i^2 A_i^2 + \hbar C_i A_i \end{array}

so

\begin{array}{ccl} \langle \widetilde{N}_i^2\widetilde{\Psi}_c \rangle &=& \langle (C_i^2 A_i^2 + \hbar C_i A_i) \widetilde{\Psi}_c \rangle \\   \\  &=&  c_i^2 + \hbar c_i  \end{array}

where we used our friends

A_i \widetilde{\Psi}_c = c_i \widetilde{\Psi}_c

and

\langle C_i \Phi \rangle = \langle \Phi \rangle

So, the variance of the rescaled number of particles is

\begin{array}{ccl} \langle \widetilde{N}_i^2 \widetilde{\Psi}_c \rangle  -   \langle \widetilde{N}_i \widetilde{\Psi}_c \rangle^2  &=& c_i^2 + \hbar c_i - c_i^2 \\  \\  &=& \hbar c_i \end{array}

and the standard deviation is

(\hbar c_i)^{1/2}

Good, it goes to zero as \hbar \to 0! And the square root is just what you’d expect if you’ve thought about stuff like random walks or the central limit theorem.
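
Here is a tiny numerical check of that \hbar c formula (my own sanity check, nothing more): treat the coherent state as a Poisson distribution with mean c/\hbar and rescale the counts by \hbar.

from scipy.stats import poisson

c = 5.0
for hbar in (1e-1, 1e-2, 1e-3):
    lam = c / hbar                        # the Poisson mean of the coherent state
    var_rescaled = hbar**2 * poisson.var(lam)
    print(hbar, var_rescaled, hbar * c)   # the last two columns agree, and both shrink with hbar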

A puzzle

I feel sure that in any coherent state, not only the variance but also all the higher moments of the rescaled number operators go to zero as \hbar \to 0. Can you prove this?

Here I mean the moments after the mean has been subtracted. The pth moment is then

\langle (\widetilde{N}_i - c_i)^p \; \widetilde{\Psi}_c \rangle

I want this to go to zero as \hbar \to 0.

Here’s a clue that should help. First, there’s a textbook formula for the higher moments of Poisson distributions without the mean subtracted. If I understand it correctly, it gives this:

\displaystyle{ \langle N_i^m \; \Psi_c \rangle = \sum_{j = 1}^m {c_i}^j \; \left\{ \begin{array}{c} m \\ j \end{array} \right\} }

Here

\displaystyle{ \left\{ \begin{array}{c} m \\ j \end{array} \right\} }

is the number of ways to partition an m-element set into j nonempty subsets. This is called Stirling’s number of the second kind. This suggests that there’s some fascinating combinatorics involving coherent states. That’s exactly the kind of thing I enjoy, so I would like to understand this formula someday… but not today! I just want something to go to zero!

If I rescale the above formula, I seem to get

\begin{array}{ccl} \langle \widetilde{N}_i^m \; \widetilde{\Psi}_c \rangle &=& \hbar^m \langle N_i^m \Psi_{c/\hbar} \rangle \\ \\ &=& \hbar^m \; \displaystyle{ \sum_{j = 1}^m \left(\frac{c_i}{\hbar}\right)^j \left\{ \begin{array}{c} m \\ j \end{array} \right\} } \end{array}

We could plug this formula into

\langle (\widetilde{N}_i - c_i)^p \; \widetilde{\Psi}_c \rangle =  \displaystyle{ \sum_{m = 0}^p \, \binom{p}{m} \; \langle \widetilde{N}_i^m \;  \widetilde{\Psi}_c \rangle \, (-c_i)^{p - m} }

and then try to show the result goes to zero as \hbar \to 0. But I don’t have the energy to do that… not right now, anyway!

Maybe you do. Or maybe you can think of a better approach to solving this problem. The answer must be well-known, since the large-number limit of a Poisson distribution is a very important thing.
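
If you would rather let a computer grind through that plug-in, here is a SymPy sketch of mine that takes the Stirling-number formula above at face value, assembles the centered moments, and confirms that each one is a positive power of \hbar times something bounded, hence goes to zero:

import sympy as sp
from sympy.functions.combinatorial.numbers import stirling

hbar, c = sp.symbols('hbar c', positive=True)

def raw_moment(m, lam):
    # <N^m> for a Poisson distribution with mean lam, via Stirling numbers of the second kind
    return 1 if m == 0 else sum(lam**j * stirling(m, j) for j in range(1, m + 1))

def centered_moment(p):
    # <(hbar*N - c)^p> where N is Poisson-distributed with mean c/hbar
    lam = c / hbar
    return sp.expand(sum(sp.binomial(p, m) * hbar**m * raw_moment(m, lam) * (-c)**(p - m)
                         for m in range(p + 1)))

for p in range(2, 7):
    mom = centered_moment(p)
    print(p, sp.factor(mom), sp.limit(mom, hbar, 0))   # the limit is 0 in each case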


Relative Entropy (Part 2)

2 July, 2013

In the first part of this mini-series, I described how various ideas important in probability theory arise naturally when you start doing linear algebra using only the nonnegative real numbers.

But after writing it, I got an email from a rather famous physicist saying he got “lost at line two”. So, you’ll be happy to hear that the first part is not a prerequisite for the remaining parts! I wrote it just to intimidate that guy.

Tobias Fritz and I have proved a theorem characterizing the concept of relative entropy, which is also known as ‘relative information’, ‘information gain’ or—most terrifying and least helpful of all—‘Kullback-Leibler divergence’. In this second part I’ll introduce two key players in this theorem. The first, \mathrm{FinStat}, is a category where:

• an object consists of a system with finitely many states, and a probability distribution on those states

and

• a morphism consists of a deterministic ‘measurement process’ mapping states of one system to states of another, together with a ‘hypothesis’ that lets the observer guess a probability distribution of states of the system being measured, based on what they observe.

The second, \mathrm{FP}, is a subcategory of \mathrm{FinStat}. It has all the same objects, but only morphisms where the hypothesis is ‘optimal’. This means that if the observer measures the system many times, and uses the probability distribution of their observations together with their hypothesis to guess the probability distribution of states of the system, they get the correct answer (in the limit of many measurements).

In this part all I will really do is explain precisely what \mathrm{FinStat} and \mathrm{FP} are. But to whet your appetite, let me explain how we can use them to give a new characterization of relative entropy!

Suppose we have any morphism in \mathrm{FinStat}. In other words: suppose we have a deterministic measurement process, together with a hypothesis that lets the observer guess a probability distribution of states of the system being measured, based on what they observe.

Then we have two probability distributions on the states of the system being measured! First, the ‘true’ probability distribution. Second, the probability that the observer will guess based on their observations.

Whenever we have two probability distributions on the same set, we can compute the entropy of the first relative to the second. This describes how surprised you’ll be if you discover the probability distribution is really the first, when you thought it was the second.

So: any morphism in \mathrm{FinStat} will have a relative entropy. It will describe how surprised the observer will be when they discover the true probability distribution, given what they had guessed.

But this amount of surprise will be zero if their hypothesis was ‘optimal’ in the sense I described. So, the relative entropy will vanish on morphisms in \mathrm{FP}.
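
In formulas, the relative entropy of a distribution p with respect to a distribution r on the same finite set is \sum_x p_x \ln(p_x/r_x). Here is a small NumPy function for it, together with a made-up example showing that it vanishes exactly when the guess matches the truth and is positive otherwise:

import numpy as np

def relative_entropy(p, r):
    # S(p || r) = sum_x p_x * log(p_x / r_x), with the convention 0 log 0 = 0
    p, r = np.asarray(p, float), np.asarray(r, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / r[mask])))

p = [0.2, 0.3, 0.5]                             # the true distribution
print(relative_entropy(p, p))                   # 0.0: no surprise at all
print(relative_entropy(p, [0.25, 0.25, 0.5]))   # > 0: the observer will be somewhat surprised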

Our theorem says this fact almost characterizes the concept of relative entropy! More precisely, it says that any convex-linear lower semicontinuous functor

F : \mathrm{FinStat} \to [0,\infty]

that vanishes on the subcategory \mathrm{FP} must equal some constant times the relative entropy.

Don’t be scared! This should not make sense to you yet, since I haven’t said how I’m thinking of [0,+\infty] as a category, nor what a ‘convex-linear lower semicontinuous functor’ is, nor how relative entropy gives one. I will explain all that later. I just want you to get a vague idea of where I’m going.

Now let me explain the categories \mathrm{FinStat} and \mathrm{FP}. We need to warm up a bit first.

FinStoch

A stochastic map f : X \leadsto Y is different from an ordinary function, because instead of assigning a unique element of Y to each element of X, it assigns a probability distribution on Y to each element of X. So you should imagine it as being like a function ‘with random noise added’, so that f(x) is not a specific element of Y, but instead has a probability of taking on different values. This is why I’m using a weird wiggly arrow to denote a stochastic map.

More formally:

Definition. Given finite sets X and Y, a stochastic map f : X \leadsto Y assigns a real number f_{yx} to each pair x \in X, y \in Y, such that fixing any element x, the numbers f_{yx} form a probability distribution on Y. We call f_{yx} the probability of y given x.

In more detail:

f_{yx} \ge 0 for all x \in X, y \in Y.

and

\displaystyle{ \sum_{y \in Y} f_{yx} = 1} for all x \in X.

Note that we can think of f : X \leadsto Y as a Y \times X-shaped matrix of numbers. A matrix obeying the two properties above is called stochastic. This viewpoint is nice because it reduces the problem of composing stochastic maps to matrix multiplication. It’s easy to check that multiplying two stochastic matrices gives a stochastic matrix. So, composing stochastic maps gives a stochastic map.
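
A short NumPy check of that last claim, with matrices I made up: writing stochastic maps as column-stochastic matrices, composition is matrix multiplication and the columns of the composite still sum to 1.

import numpy as np

f = np.array([[0.9, 0.2],    # a stochastic map X -> Y: each column sums to 1
              [0.1, 0.8]])
g = np.array([[0.5, 0.0],    # a stochastic map Y -> Z
              [0.3, 0.6],
              [0.2, 0.4]])

gf = g @ f                   # the composite stochastic map X -> Z
print(gf)
print(gf.sum(axis=0))        # [1. 1.]: each column still sums to 1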

We thus get a category:

Definition. Let \mathrm{FinStoch} be the category of finite sets and stochastic maps between them.

In case you’re wondering why I’m restricting attention to finite sets, it’s merely because I want to keep things simple. I don’t want to worry about whether sums or integrals converge.

FinProb

Now take your favorite 1-element set and call it 1. A function p : 1 \to X is just a point of X. But a stochastic map p : 1 \leadsto X is something more interesting: it’s a probability distribution on X.

Why? Because it gives a probability distribution on X for each element of 1, but that set has just one element.

Last time I introduced the rather long-winded phrase finite probability measure space to mean a finite set with a probability distribution on it. But now we’ve seen a very quick way to describe such a thing within \mathrm{FinStoch}:

And this gives a quick way to think about a measure-preserving function between finite probability measure spaces! It’s just a commutative triangle like this:

Note that the horizontal arrow f:  X \to Y is not wiggly. The straight arrow means it’s an honest function, not a stochastic map. But a function is a special case of a stochastic map! So it makes sense to compose a straight arrow with a wiggly arrow—and the result is, in general, a wiggly arrow. So, it makes sense to demand that this triangle commutes, and this says that the function f: X \to Y is measure-preserving.

Let me work through the details, in case they’re not clear.

First: how is a function a special case of a stochastic map? Here’s how. If we start with a function f: X \to Y, we get a matrix of numbers

f_{yx} = \delta_{y,f(x)}

where \delta is the Kronecker delta. So, each element x \in X gives a probability distribution that’s zero except at f(x).

Given this, we can work out what this commuting triangle really says:

If we use p_x to stand for the probability distribution that p: 1 \leadsto X puts on X, and similarly for q_y, the commuting triangle says

\displaystyle{ q_y = \sum_{x \in X} \delta_{y,f(x)} p_x}

or in other words:

\displaystyle{ q_y = \sum_{x \in X : f(x) = y} p_x }

or if you like:

\displaystyle{ q_y = \sum_{x \in f^{-1}(y)} p_x }

In this situation people say q is p pushed forward along f, and they say f is a measure-preserving function.

So, we’ve used \mathrm{FinStoch} to describe another important category:

Definition. Let \mathrm{FinProb} be the category of finite probability measure spaces and measure-preserving functions between them.

I can’t resist mentioning another variation:

A commuting triangle like this is a measure-preserving stochastic map. In other words, p gives a probability measure on X, q gives a probability measure on Y, and f: X \leadsto Y is a stochastic map with

\displaystyle{ q_y = \sum_{x \in X} f_{yx} p_x }

FinStat

The category we really need for relative entropy is a bit more subtle. An object is a finite probability measure space:

but a morphism looks like this:

The whole diagram doesn’t commute, but the two equations I wrote down hold. The first equation says that f: X \to Y is a measure-preserving function. In other words, this triangle, which we’ve seen before, commutes:

The second equation says that f \circ s is the identity, or in math jargon, s is a section for f.

But what does that really mean?

The idea is that X is the set of ‘states’ of some system, while Y is a set of possible ‘observations’ you might make. The function f is a ‘measurement process’. You ‘measure’ the system using f, and if the system is in the state x you get the observation f(x). The probability distribution p says the probability that the system is in any given state, while q says the probability that you get any given observation when you do your measurement.

Note: we are assuming for now that there’s no random noise in the observation process! That’s why f is a function instead of a stochastic map.

But what about s? That’s the fun part: s describes your ‘hypothesis’ about the system’s state given a particular measurement! If you measure the system and get a result y \in Y, you guess it’s in the state x with probability s_{xy}.

And we don’t want this hypothesis to be really dumb: that’s what

f \circ s = 1_Y

says. You see, this equation says that

\sum_{x \in X} \delta_{y', f(x)} s_{xy} = \delta_{y' y}

or in other words:

\sum_{x \in f^{-1}(y')} s_{xy} = \delta_{y' y}

If you think about it, this implies s_{xy} = 0 unless f(x) = y.

So, if you make an observation y, you will guess the system is in state x with probability zero unless f(x) = y. In short, you won’t make a really dumb guess about the system’s state.
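
In matrix terms this is easy to play with. Below is a toy example of mine: f becomes a 0/1 column-stochastic matrix with entries \delta_{y, f(x)}, the hypothesis s is a column-stochastic matrix supported on the fibres of f, and the ‘not really dumb’ condition f \circ s = 1_Y is just a matrix equation.

import numpy as np

# X = {0, 1, 2} and Y = {0, 1}; the measurement sends states 0, 1 to observation 0 and state 2 to 1
f = np.array([[1, 1, 0],
              [0, 0, 1]], dtype=float)

# a hypothesis: given observation y, guess a state lying in f^{-1}(y)
s = np.array([[0.7, 0.0],
              [0.3, 0.0],
              [0.0, 1.0]])

print(f @ s)    # the 2x2 identity, so f∘s = 1_Y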

Here’s how we compose morphisms:

We get a measure-preserving function g \circ f : X \to Z and a stochastic map going back, s \circ t : Z \leadsto X. You can check that these obey the required equations:

g \circ f \circ p = r

g \circ f \circ s \circ t = 1_Z

So, we get a category:

Definition. Let \mathrm{FinStat} be the category where an object is a finite probability measure space:

a morphism is a diagram obeying these equations:

and composition is defined as above.

FP

As we’ve just seen, a morphism in \mathrm{FinStat} consists of a ‘measurement process’ f and a ‘hypothesis’ s:

But sometimes we’re lucky and our hypothesis is optimal, in the sense that

s \circ q = p

Conceptually, this says that if you take the probability distribution q on our observations and use it to guess a probability distribution for the system’s state using our hypothesis s, you get the correct answer: p.

Mathematically, it says that this diagram commutes:

In other words, s is a measure-preserving stochastic map.

There’s a subcategory of \mathrm{FinStat} with all the same objects, but only these ‘optimal’ morphisms. It’s important, but the name we have for it is not very exciting:

Definition. Let \mathrm{FP} be the subcategory of \mathrm{FinStat} where an object is a finite probability measure space

and a morphism is a diagram obeying these equations:

Why do we call this category \mathrm{FP}? Because it’s a close relative of \mathrm{FinProb}, where a morphism, you’ll remember, looks like this:

The point is that for a morphism in \mathrm{FP}, the conditions on s are so strong that they completely determine it unless there are observations that happen with probability zero—that is, unless there are y \in Y with q_y = 0. To see this, note that

s \circ q = p

actually says

\sum_{y \in Y} s_{xy} q_y = p_x

for any choice of x \in X. But we’ve already seen s_{xy} = 0 unless f(x) = y, so the sum has just one term, and the equation says

s_{x,f(x)} q_{f(x)} = p_x

We can solve this for s_{x,f(x)}, so s is completely determined… unless q_{f(x)} = 0.

This covers the case when y = f(x). We also can’t figure out s_{x,y} if y isn’t in the image of f.

So, to be utterly precise, s is determined by p, q and f unless there’s an element y \in Y that has q_y = 0. Except for this special case, a morphism in \mathrm{FP} is just a morphism in \mathrm{FinProb}. But in this special case, a morphism in \mathrm{FP} has a little extra information: an arbitrary probability distribution on the inverse image of each point y with this property.

In short, \mathrm{FP} is the same as \mathrm{FinProb} except that our observer’s ‘optimal hypothesis’ must provide a guess about the state of the system given an observation, even in cases of observations that occur with probability zero.
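
Continuing the toy example above: away from observations of probability zero, the optimal hypothesis is forced to be s_{x, f(x)} = p_x / q_{f(x)}, which is just Bayesian inversion of f along p. A short NumPy sketch (mine) that builds it and checks both conditions:

import numpy as np

f = np.array([[1, 1, 0],       # the deterministic measurement X -> Y from before
              [0, 0, 1]], dtype=float)
p = np.array([0.2, 0.3, 0.5])  # the true distribution on X
q = f @ p                      # its pushforward along f: [0.5, 0.5]

# the optimal hypothesis: s[x, y] = p[x] / q[y] whenever f(x) = y (assuming q[y] > 0)
s = (f.T * p[:, None]) / q[None, :]

print(f @ s)   # 1_Y: it is a morphism in FinStat
print(s @ q)   # equals p: the hypothesis is optimal, so it is a morphism in FP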

I’m going into these nitpicky details for two reasons. First, we’ll need \mathrm{FP} for our characterization of relative entropy. But second, Tom Leinster already ran into this category in his work on entropy and category theory! He discussed it here:

• Tom Leinster, An operadic introduction to entropy.

Despite the common theme of entropy, he arrived at it from a very different starting-point.

Conclusion

So, I hope that next time I can show you something like this:

and you’ll say “Oh, that’s a probability distribution on the states of some system!” Intuitively, you should think of the wiggly arrow p as picking out a ‘random element’ of the set X.

I hope I can show you this:

and you’ll say “Oh, that’s a deterministic measurement process, sending a probability distribution on the states of the measured system to a probability distribution on observations!”

I hope I can show you this:

and you’ll say “Oh, that’s a deterministic measurement process, together with a hypothesis about the system’s state, given what is observed!”

And I hope I can show you this:

and you’ll say “Oh, that’s a deterministic measurement process, together with an optimal hypothesis about the system’s state, given what is observed!”

I don’t count on it… but I can hope.

Postscript

And speaking of unrealistic hopes, if I were really optimistic I would hope you noticed that \mathrm{FinStoch} and \mathrm{FinProb}, which underlie the more fancy categories I’ve discussed today, were themselves constructed starting from linear algebra over the nonnegative numbers [0,\infty) in Part 1. That ‘foundational’ work is not really needed for what we’re doing now. However, I like the fact that we’re ultimately getting the concept of relative entropy starting from very little: just linear algebra, using only nonnegative numbers!


The Large-Number Limit for Reaction Networks (Part 1)

1 July, 2013

Waiting for the other shoe to drop.

This is a figure of speech that means ‘waiting for the inevitable consequence of what’s come so far’. Do you know where it comes from? You have to imagine yourself in an apartment on the floor below someone who is taking off their shoes. When you hear one, you know the next is coming.

There’s even an old comedy routine about this:

A guest who checked into an inn one night was warned to be quiet because the guest in the room next to his was a light sleeper. As he undressed for bed, he dropped one shoe, which, sure enough, awakened the other guest. He managed to get the other shoe off in silence, and got into bed. An hour later, he heard a pounding on the wall and a shout: “When are you going to drop the other shoe?”

When we were working on math together, James Dolan liked to say “the other shoe has dropped” whenever an inevitable consequence of some previous realization became clear. There’s also the mostly British phrase the penny has dropped. You say this when someone finally realizes the situation they’re in.

But sometimes one realization comes after another, in a long sequence. Then it feels like it’s raining shoes!

I guess that’s a rather strained metaphor. Perhaps falling like dominoes is better for these long chains of realizations.

This is how I’ve felt in my recent research on the interplay between quantum mechanics, stochastic mechanics, statistical mechanics and extremal principles like the principle of least action. The basics of these subjects should be completely figured out by now, but they aren’t—and a lot of what’s known, nobody bothered to tell most of us.

So, I was surprised to rediscover that the Maxwell relations in thermodynamics are formally identical to Hamilton’s equations in classical mechanics… though in retrospect it’s obvious. Thermodynamics obeys the principle of maximum entropy, while classical mechanics obeys the principle of least action. Wherever there’s an extremal principle, symplectic geometry, and equations like Hamilton’s equations, are sure to follow.

I was surprised to discover (or maybe rediscover, I’m not sure yet) that just as statistical mechanics is governed by the principle of maximum entropy, quantum mechanics is governed by a principle of maximum ‘quantropy’. The analogy between statistical mechanics and quantum mechanics has been known at least since Feynman and Schwinger. But this basic aspect was never explained to me!

I was also surprised to rediscover that simply by replacing amplitudes by probabilities in the formalism of quantum field theory, we get a nice formalism for studying stochastic many-body systems. This formalism happens to perfectly match the ‘stochastic Petri nets’ and ‘reaction networks’ already used in subjects from population biology to epidemiology to chemistry. But now we can systematically borrow tools from quantum field theory! All the tricks that particle physicists like—annihilation and creation operators, coherent states and so on—can be applied to problems like the battle between the AIDS virus and human white blood cells.

And, perhaps because I’m a bit slow on the uptake, I was surprised when yet another shoe came crashing to the floor the other day.

Because quantum field theory has, at least formally, a nice limit where Planck’s constant goes to zero, the same is true for stochastic Petri nets and reaction networks!

In quantum field theory, we call this the ‘classical limit’. For example, if you have a really huge number of photons all in the same state, quantum effects sometimes become negligible, and we can describe them using the classical equations describing electromagnetism: the classical Maxwell equations. In stochastic situations, it makes more sense to call this limit the ‘large-number limit’: the main point is that there are lots of particles in each state.

In quantum mechanics, different observables don’t commute, so the so-called commutator matters a lot:

[A,B] = AB - BA

These commutators tend to be proportional to Planck’s constant. So in the limit where Planck’s constant \hbar goes to zero, observables commute… but commutators continue to have a ghostly existence, in the form of the Poisson bracket:

\displaystyle{ \{A,B\} = \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A,B] }

Poisson brackets are a key part of symplectic geometry—the geometry of classical mechanics. So, this sort of geometry naturally shows up in the study of stochastic Petri nets!

Let me sketch how it works. I’ll start with a section reviewing stuff you should already know if you’ve been following the network theory series.

The stochastic Fock space

Suppose we have some finite set S. We call its elements species, since we think of them as different kinds of things—e.g., kinds of chemicals, or kinds of organisms.

To describe the probability of having any number of things of each kind, we need the stochastic Fock space. This is the space of real formal power series in a bunch of variables, one for each element of S. It won’t hurt to simply say

S = \{1, \dots, k \}

Then the stochastic Fock space is

\mathbb{R}[[z_1, \dots, z_k ]]

this being math jargon for the space of formal power series with real coefficients in some variables z_1, \dots, z_k, one for each element of S.

We write

n = (n_1, \dots, n_k) \in \mathbb{N}^S

and use this abbreviation:

z^n = z_1^{n_1} \cdots z_k^{n_k}

We use z^n to describe a state where we have n_1 things of the first species, n_2 of the second species, and so on.

More generally, a stochastic state is an element \Psi of the stochastic Fock space with

\displaystyle{ \Psi = \sum_{n \in \mathbb{N}^k} \psi_n \, z^n }

where

\psi_n \ge 0

and

\displaystyle{ \sum_{n  \in \mathbb{N}^k} \psi_n = 1 }

We use \Psi to describe a state where \psi_n is the probability of having n_1 things of the first species, n_2 of the second species, and so on.

The stochastic Fock space has some important operators on it: the annihilation operators given by

\displaystyle{ a_i \Psi = \frac{\partial}{\partial z_i} \Psi }

and the creation operators given by

\displaystyle{ a_i^\dagger \Psi = z_i \Psi }

From these we can define the number operators:

N_i = a_i^\dagger a_i

Part of the point is that

N_i z^n = n_i z^n

This says the stochastic state z^n is an eigenstate of all the number operators, with eigenvalues saying how many things there are of each species.

The annihilation, creation, and number operators obey some famous commutation relations, which are easy to check for yourself:

[a_i, a_j] = 0

[a_i^\dagger, a_j^\dagger] = 0

[a_i, a_j^\dagger] = \delta_{i j}

[N_i, N_j ] = 0

[N_i , a_j^\dagger] = \delta_{i j} a_j^\dagger

[N_i , a_j] = - \delta_{i j} a_j

The last two have easy interpretations. The first of these two implies

N_i a_i^\dagger \Psi = a_i^\dagger (N_i + 1) \Psi

This says that if we start in some state \Psi, create a thing of type i, and then count the things of that type, we get one more than if we counted the number of things before creating one. Similarly,

N_i a_i \Psi = a_i (N_i - 1) \Psi

says that if we annihilate a thing of type i and then count the things of that type, we get one less than if we counted the number of things before annihilating one.
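
If you would like a machine to check one of these relations, here is a small SymPy verification (mine) of [a_i, a_j^\dagger] = \delta_{i j} acting on a monomial in two variables:

import sympy as sp

z1, z2 = sp.symbols('z1 z2', positive=True)
Psi = z1**3 * z2**5                 # an arbitrary basis vector

a1 = lambda f: sp.diff(f, z1)       # annihilation of species 1
c1 = lambda f: z1 * f               # creation of species 1
c2 = lambda f: z2 * f               # creation of species 2

print(sp.simplify(a1(c1(Psi)) - c1(a1(Psi)) - Psi))   # 0, i.e. [a_1, a_1†] = 1
print(sp.simplify(a1(c2(Psi)) - c2(a1(Psi))))         # 0, i.e. [a_1, a_2†] = 0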

Introducing Planck’s constant

Now let’s introduce an extra parameter into this setup. To indicate the connection to quantum physics, I’ll call it \hbar, which is the usual symbol for Planck’s constant. However, I want to emphasize that we’re not doing quantum physics here! We’ll see that the limit where \hbar \to 0 is very interesting, but it will correspond to a limit where there are many things of each kind.

We’ll start by defining

A_i = \hbar \, a_i

and

C_i = a_i^\dagger

Here A stands for ‘annihilate’ and C stands for ‘create’. Think of A as a rescaled annihilation operator. Using this we can define a rescaled number operator:

\widetilde{N}_i = C_i A_i

So, we have

\widetilde{N}_i = \hbar N_i

and this explains the meaning of the parameter \hbar. The idea is that instead of counting things one at time, we count them in bunches of size 1/\hbar.

For example, suppose \hbar = 1/12. Then we’re counting things in dozens! If we have a state \Psi with

N_i \Psi = 36 \Psi

then there are 36 things of the ith kind. But this implies

\widetilde{N}_i \Psi = 3 \Psi

so there are 3 dozen things of the ith kind.

Chemists don’t count in dozens; they count things in big bunches called moles. A mole is approximately the number of carbon atoms in 12 grams: Avogadro’s number, 6.02 \times 10^{23}. When you count things by moles, you’re taking \hbar to be 1.66 \times 10^{-24}, the reciprocal of Avogadro’s number.

So, while in quantum mechanics Planck’s constant is ‘the quantum of action’, a unit of action, here it’s ‘the quantum of quantity’: the amount that corresponds to one thing.

We can easily work out the commutation relations of our new rescaled operators:

[A_i, A_j] = 0

[C_i, C_j] = 0

[A_i, C_j] = \hbar \, \delta_{i j}

[\widetilde{N}_i, \widetilde{N}_j ] = 0

[\widetilde{N}_i , C_j] = \hbar \,  \delta_{i j} C_j

[\widetilde{N}_i , A_j] = - \hbar \, \delta_{i j} A_j

These are just what you see in quantum mechanics! The commutators are all proportional to \hbar.

Again, we can understand what these relations mean if we think a bit. For example, the commutation relation for \widetilde{N}_i and C_i says

\widetilde{N}_i C_i \Psi = C_i (\widetilde{N}_i + \hbar) \Psi

This says that if we start in some state \Psi, create a thing of type i, and then count the things of that type, we get \hbar more than if we counted the number of things before creating one. This is because we are counting things not one at a time, but in bunches of size 1/\hbar.

You may be wondering why I defined the rescaled annihilation operator to be \hbar times the original annihilation operator:

A_i = \hbar \, a_i

but left the creation operator unchanged:

C_i = a_i^\dagger

I’m wondering that too! I’m not sure I’m doing things the best way yet. I’ve also tried another more symmetrical scheme, taking A_i = \sqrt{\hbar} \, a_i and C_i = \sqrt{\hbar} \, a_i^\dagger. This gives the same commutation relations, but certain other formulas become more unpleasant. I’ll explain that some other day.

Next, we can take the limit as \hbar \to 0 and define Poisson brackets of operators by

\displaystyle{ \{A,B\} = \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A,B] }

To make this rigorous it’s best to proceed algebraically. For this we treat \hbar as a formal variable rather than a specific number. So, our number system becomes \mathbb{R}[\hbar], the algebra of polynomials in \hbar. We define the Weyl algebra to be the algebra over \mathbb{R}[\hbar] generated by elements A_i and C_i obeying

[A_i, A_j] = 0

[C_i, C_j] = 0

[A_i, C_j] = \hbar \, \delta_{i j}

We can set \hbar = 0 in this formalism; then the Weyl algebra reduces to the algebra of polynomials in the variables A_i and C_i. This algebra is commutative! But we can define a Poisson bracket on this algebra by

\displaystyle{ \{A,B\} = \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A,B] }

It takes a bit of work to explain to algebraists exactly what’s going on in this formula, because it involves an interplay between the algebra of polynomials in A_i and C_i, which is commutative, and the Weyl algebra, which is not. I’ll be glad to explain the details if you want. But if you’re a physicist, you can just follow your nose and figure out what the formula gives. For example:

\begin{array}{ccl}   \{A_i, C_j\} &=& \displaystyle{ \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A_i, C_j] } \\  \\  &=& \displaystyle{ \lim_{\hbar \to 0} \; \frac{1}{\hbar} \, \hbar \, \delta_{i j} }  \\  \\  &=& \delta_{i j} \end{array}

Similarly, we have:

\{ A_i, A_j \} = 0

\{ C_i, C_j \} = 0

\{ A_i, C_j \} = \delta_{i j}

\{ \widetilde{N}_i, \widetilde{N}_j \}  = 0

\{ \widetilde{N}_i , C_j \} = \delta_{i j} C_j

\{ \widetilde{N}_i , A_j \} = - \delta_{i j} A_j

I should probably use different symbols for A_i, C_i and \widetilde{N}_i after we’ve set \hbar = 0, since they’re really different now, but I don’t have the patience to make up more names for things!

Now, we can think of A_i and C_i as coordinate functions on a 2k-dimensional vector space, and all the polynomials in A_i and C_i as functions on this space. This space is what physicists would call a ‘phase space’: they use this kind of space to describe the position and momentum of a particle, though here we are using it in a different way. Mathematicians would call it a ‘symplectic vector space’, because it’s equipped with a special structure, called a symplectic structure, that lets us define Poisson brackets of smooth functions on this space. We won’t need to get into that now, but it’s important—and it makes me happy to see it here.

More

There’s a lot more to do, but not today. My main goal is to understand, in a really elegant way, how the master equation for a stochastic Petri net reduces to the rate equation in the large-number limit. What we’ve done so far is start thinking of this as a \hbar \to 0 limit. This should let us borrow ideas about classical limits in quantum mechanics, and apply them to stochastic mechanics.

Stay tuned!


Relative Entropy (Part 1)

20 June, 2013

I’m trying to finish off a paper that Tobias Fritz and I have been working on, which gives a category-theoretic (and Bayesian!) characterization of relative entropy. It’s a kind of sequel to our paper with Tom Leinster, in which we characterized entropy.

That earlier paper was developed in conversations on the n-Category Café. It was a lot of fun; I sort of miss that style of working. Also, to get warmed up, I need to think through some things I’ve thought about before. So, I might as well write them down here.

The idea

There are many categories related to probability theory, and they’re related in many ways. Last summer—on the 24th of August 2012, according to my notes here—Jamie Vicary, Brendan Fong and I worked through a bunch of these relationships. I need to write them down now, even if they’re not all vitally important to my paper with Tobias. They’re sort of buzzing around my brain like flies.

(Tobias knows this stuff too, and this is how we think about probability theory, but we weren’t planning to stick it in our paper. Maybe we should.)

Let’s restrict attention to probability measures on finite sets, and related structures. We could study these questions more generally, and we should, but not today. What we’ll do is give a unified purely algebraic description of:

• finite sets

• measures on finite sets

• probability measures on finite sets

and various kinds of maps between these:

• functions

• bijections

• measure-preserving functions

• stochastic maps

Finitely generated free [0,∞)-modules

People often do linear algebra over a field, which is—roughly speaking—a number system where you can add, subtract, multiply and divide. But algebraists have long realized that a lot of linear algebra still works with a commutative ring, where you can’t necessarily divide. It gets more complicated, but also a lot more interesting.

But in fact, a lot still works with a commutative rig, where we can’t necessarily subtract either! Something I keep telling everyone is that linear algebra over rigs is a good idea for studying things like probability theory, thermodynamics, and the principle of least action.

Today we’ll start with the rig of nonnegative real numbers with their usual addition and multiplication; let’s call this [0,\infty) . The idea is that measure theory, and probability theory, are closely related to linear algebra over this rig.

Let C be the category with finitely generated free [0,\infty)-modules as objects, and module homomorphisms as morphisms. I’ll call these morphisms maps.

Puzzle. Do we need to say ‘free’ here? Are there finitely generated modules over [0,\infty) that aren’t free?

Every finitely generated free [0,\infty)-module is isomorphic to [0,\infty)^S for some finite set S . In other words, it’s isomorphic to [0,\infty)^n for some n = 0, 1, 2, \dots . So, C is equivalent to the category where objects are natural numbers, a morphism from m to n is an n \times m matrix of numbers in [0,\infty) , and composition is done by matrix multiplication. I’ll also call this equivalent category C.

We can take tensor products of finitely generated free modules, and this makes C into a symmetric monoidal †-category. This means we can draw maps using string diagrams in the usual way. However, I’m feeling lazy so I’ll often write equations when I could be drawing diagrams.

One of the rules of the game is that all these equations will make sense in any symmetric monoidal †-category. So we could, if we wanted, generalize ideas from probability theory this way. If you want to do this, you’ll need to know that [0,\infty) is the unit for the tensor product in C. We’ll be seeing this guy [0,\infty) a lot. So if you want to generalize, replace C by any symmetric monoidal †-category, and replace [0,\infty) by the unit for the tensor product.

Finite sets

There’s a way to see the category of finite sets lurking in C, which we can borrow from this paper:

• Bob Coecke, Dusko Pavlovic and Jamie Vicary, A new description of orthogonal bases.

For any finite set S , we get a free finitely generated [0,\infty) -module, namely [0,\infty)^S . This comes with some structure:

• a multiplication m: [0,\infty)^S \otimes [0,\infty)^S \to [0,\infty)^S , coming from pointwise multiplication of [0,\infty) -valued functions on S

• the unit for this multiplication, an element of [0,\infty)^S, which we can write as a morphism i: [0,\infty) \to [0,\infty)^S

• a comultiplication, obtained by taking the diagonal map \Delta : S \to S \times S and promoting it to a linear map \Delta : [0,\infty)^S \to [0, \infty)^S \otimes [0,\infty)^S

• a counit for this comultiplication, obtained by taking the unique map to the terminal set ! : S \to 1 and promoting it to a linear map e: [0,\infty)^S \to [0,\infty)

These morphisms m, i, \Delta, e make

x = [0,\infty)^S

into a commutative Frobenius algebra in C . That’s a thing where the unit, counit, multiplication and comultiplication obey these laws:

(I drew these back when I was feeling less lazy.) This Frobenius algebra is also ‘special’, meaning it obeys this:

And it’s also a †-Frobenius algebra, meaning that the counit and comultiplication are obtained from the unit and multiplication by ‘flipping’ them using the †-category structure. (If we think of a morphism in C as a matrix, its dagger is its transpose.)

Conversely, suppose we have any special commutative †-Frobenius algebra x . Then using the ideas in the paper by Coecke, Pavlovic and Vicary we can recover a basis for x , consisting of the vectors e_i \in x with

\Delta(e_i) = e_i \otimes e_i

This basis forms a set S such that

x \cong [0,\infty)^S

for some specified isomorphism in C. Furthermore, this is an isomorphism of special commutative †-Frobenius algebras!

In case you’re wondering, these vectors e_i correspond to the functions on S that are zero everywhere except at one point i \in S, where they equal 1.

In short, a special commutative †-Frobenius algebra in C is just a fancy way of talking about a finite set. This may seem silly, but it’s a way to start describing probability theory using linear algebra very much as we do with quantum theory. This analogy between quantum theory and probability theory is so interesting that it deserves a book.
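
For a 2-element set everything here is a small matrix, and a few lines of NumPy (my own sketch) make the ‘special’ law m \circ \Delta = 1 and the unit and counit laws visible: basis vectors of [0,\infty)^S become standard basis vectors, and tensor products become Kronecker products.

import numpy as np

# S = {0, 1}: the module [0,inf)^S is 2-dimensional and its tensor square is 4-dimensional
Delta = np.array([[1, 0],                  # comultiplication: e_i |-> e_i ⊗ e_i
                  [0, 0],
                  [0, 0],
                  [0, 1]], dtype=float)
m   = Delta.T                              # multiplication (pointwise product), the dagger of Delta
eps = np.array([[1, 1]], dtype=float)      # counit: summation against counting measure
i   = np.array([[1], [1]], dtype=float)    # unit: the constant function 1

print(m @ Delta)                           # the identity: the Frobenius algebra is 'special'
print(m @ np.kron(i, np.eye(2)))           # the identity: the unit law
print(np.kron(eps, np.eye(2)) @ Delta)     # the identity: the counit law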

Functions and bijections

Now suppose we have two special commutative †-Frobenius algebras in C, say x and y .

Suppose f : x \to y is a Frobenius algebra homomorphism: that is, a map preserving all the structure—the unit, counit, multiplication and comultiplication. Then it comes from an isomorphism of finite sets. This lets us find \mathrm{FinSet}_0 , the groupoid of finite sets and bijections, inside C.

Alternatively, suppose f : x \to y is just a coalgebra homomorphism: that is, a map preserving just the counit and comultiplication. Then it comes from an arbitrary function between finite sets. This lets us find \mathrm{FinSet}, the category of finite sets and functions, inside C .

But what if f preserves just the counit? This sounds like a dry, formal question. But it’s not: the answer is something useful, a ‘stochastic map’.

Stochastic maps

A stochastic map from a finite set S to a finite set T is a map sending each point of S to a probability measure on T .

We can think of this as a T \times S -shaped matrix of numbers in [0,\infty) , where a given column gives the probability that a given point in S goes to any point in T . The sum of the numbers in each column will be 1. And conversely, any T \times S -shaped matrix of numbers in [0,\infty) , where each column sums to 1, gives a stochastic map from S to T .

But now let’s describe this idea using the category C. We’ve seen a finite set is the same as a special commutative †-Frobenius algebra. So, say we have two of these, x and y . Our matrix of numbers in [0,\infty) is just a map

f: x \to y

So, we just need a way to state the condition that each column in the matrix sums to 1. And this condition simply says that f preserves the counit:

\epsilon_y \circ f = \epsilon_x

where \epsilon_x : x \to [0,\infty) is the counit for x , and similarly for \epsilon_y .

To understand this, note that if we use the canonical isomorphism

x \cong [0,\infty)^S

the counit \epsilon_x can be seen as the map

[0,\infty)^S \to [0,\infty)

that takes any S -tuple of numbers and sums them up. In other words, it’s integration with respect to counting measure. So, the equation

\epsilon_y \circ f = \epsilon_x

says that if we take any S -tuple of numbers, multiply it by the matrix f , and then sum up the entries of the resulting T -tuple, it’s the same as if we summed up the original S -tuple. But this says precisely that each column of the matrix f sums to 1.

So, we can use our formalism to describe \mathrm{FinStoch}, the category with finite sets as objects and stochastic maps as morphisms. We’ve seen this category is equivalent to the category with special commutative †-Frobenius algebras in C as objects and counit-preserving maps as morphisms.

Finite measure spaces

Now let’s use our formalism to describe finite measure spaces—by which, beware, I mean finite sets equipped with measures! To do this, we’ll use a special commutative †-Frobenius algebra x in C together with any map

\mu: [0,\infty) \to x

Starting from these, we get a specified isomorphism

x \cong [0,\infty)^S

and \mu sends the number 1 to a vector in [0,\infty)^S : that is, a function on S taking values in [0,\infty) . Multiplying this function by counting measure, we get a measure on S .
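For instance (a made-up example): if S = \{1,2,3\} and \mu(1) = 2 e_1 + 3 e_2 + 0 \, e_3 , then multiplying by counting measure gives the measure on S that assigns 2 to the first point, 3 to the second, and 0 to the third.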

Puzzle. How can we describe this measure without the annoying use of counting measure?

Conversely, any measure on a finite set gives a special commutative †-Frobenius algebra x in C equipped with a map from [0,\infty) .

So, we can say a finite measure space is a special commutative †-Frobenius algebra in C equipped with a map

\mu: [0,\infty) \to x

And given two of these,

\mu: [0,\infty) \to x , \qquad  \nu: [0,\infty) \to y

and a coalgebra morphism

f : x \to y

obeying this equation

f \circ \mu  = \nu

then we get a measure-preserving function between finite measure spaces! If you’re a category theorist, you’ll draw this equation as a commutative triangle:

Conversely, any measure-preserving function between finite measure spaces obeys this equation. So, we get an algebraic way of describing the category \mathrm{FinMeas} , with finite measure spaces as objects and measure-preserving maps as morphisms.

Finite probability measure spaces

I’m mainly interested in probability measures. So suppose x is a special commutative †-Frobenius algebra in C equipped with a map

\mu: [0,\infty) \to x

We’ve seen this gives a finite measure space. But this is a probability measure space if and only if

e \circ \mu = 1

where

e : x \to [0,\infty)

is the counit for x . The equation simply says the total integral of our measure \mu is 1.
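For example (again a toy example of my own): with S = \{1,2,3\} and

\mu(1) = \tfrac{1}{2} e_1 + \tfrac{1}{3} e_2 + \tfrac{1}{6} e_3

we get e(\mu(1)) = \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{6} = 1 , so this finite measure space is a probability measure space.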

So, we get a way of describing the category \mathrm{FinProb} , which has finite probability measure spaces as objects and measure-preserving maps as morphisms. Given finite probability measure spaces described this way:

\mu: [0,\infty) \to x , \qquad  \nu: [0,\infty) \to y

a measure-preserving function is a coalgebra morphism

f : x \to y

such that the obvious triangle commutes:

f \circ \mu  = \nu

Measure-preserving stochastic maps

Say we have two finite measure spaces. Then we can ask whether a stochastic map from one to the other is measure-preserving. And we can answer this question in the language of C .

Remember, a finite measure space is a special commutative †-Frobenius algebra x in C together with a map

\mu: [0,\infty) \to x

Say we have another one:

\nu: [0,\infty) \to y

A stochastic map is just a map

f: x \to y

that preserves the counit:

\epsilon_y \circ f = \epsilon_x

But it’s a measure-preserving stochastic map if also

f \circ \mu  = \nu
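Here is a small numerical sketch of both conditions at once, in the same illustrative Python style as before (again, my own toy example, not anything from the papers):

import numpy as np

f = np.array([[0.5, 0.0],
              [0.5, 1.0/3.0],
              [0.0, 2.0/3.0]])   # stochastic map: columns sum to 1

mu = np.array([0.4, 0.6])        # a measure on the 2-element set S
nu = np.array([0.2, 0.4, 0.4])   # a measure on the 3-element set T

print(np.allclose(f.sum(axis=0), 1))  # True: f preserves the counit
print(np.allclose(f @ mu, nu))        # True: f sends mu to nu, so f is measure-preserving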

Next…

There’s a lot more to say; I haven’t gotten anywhere near what Tobias and I are doing! But it’s pleasant to have this basic stuff written down.


Quantum Techniques for Reaction Networks

11 June, 2013

Fans of the network theory series might like to look at this paper:

• John Baez, Quantum techniques for reaction networks.

I would certainly appreciate comments and corrections.

This paper tackles a basic question we never got around to discussing: how the probabilistic description of a system where bunches of things randomly interact and turn into other bunches of things can reduce to a deterministic description in the limit where there are lots of things!

Mathematically, such systems are given by ‘stochastic Petri nets’, or if you prefer, ‘stochastic reaction networks’. These are just two equivalent pictures of the same thing. For example, we could describe some chemical reactions using this Petri net:

but chemists would use this reaction network:

C + O2 → CO2
CO2 + NaOH → NaHCO3
NaHCO3 + HCl → H2O + NaCl + CO2

Making either of them ‘stochastic’ merely means that we specify a ‘rate constant’ for each reaction, saying how probable it is.

For any such system we get a ‘master equation’ describing how the probability of having any number of things of each kind changes with time. In the class I taught on this last quarter, the students and I figured out how to derive from this an equation saying how the expected number of things of each kind changes with time. Later I figured out a much slicker argument… but either way, we get this result:

Theorem. For any stochastic reaction network and any stochastic state \Psi(t) evolving in time according to the master equation, we have

\displaystyle{ \frac{d}{dt} \langle N \Psi(t) \rangle } =  \displaystyle{\sum_{\tau \in T}} \, r(\tau) \,  (t(\tau) - s(\tau)) \;  \left\langle N^{\underline{s(\tau)}}\, \Psi(t) \right\rangle

assuming the derivative exists.

Of course this will make no sense yet if you haven’t been following the network theory series! But I explain all the notation in the paper, so don’t be scared. The main point is that \langle N \Psi(t) \rangle is a vector listing the expected number of things of each kind at time t. The equation above says how this changes with time… but it closely resembles the ‘rate equation’, which describes the evolution of chemical systems in a deterministic way.
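As a quick illustration (my own toy example, not one from the paper): take the network with a single reaction A + B \to C with rate constant r, and number the species so that A, B, C are species 1, 2, 3. Then s(\tau) = (1,1,0) and t(\tau) = (0,0,1), the falling power N^{\underline{s(\tau)}} is just N_1 N_2, and the theorem says

\displaystyle{ \frac{d}{dt} \langle N_1 \Psi(t) \rangle = \frac{d}{dt} \langle N_2 \Psi(t) \rangle = - r \left\langle N_1 N_2 \, \Psi(t) \right\rangle , \qquad \frac{d}{dt} \langle N_3 \Psi(t) \rangle = r \left\langle N_1 N_2 \, \Psi(t) \right\rangle }

so on average the reaction destroys an A and a B and creates a C at a rate proportional to the expected number of pairs consisting of one A and one B.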

And indeed, the next big theorem says that the master equation actually implies the rate equation when the probability of having various numbers of things of each kind is given by a product of independent Poisson distributions. In this case \Psi(t) is what people in quantum physics call a ‘coherent state’. So:

Theorem. Given any stochastic reaction network, let \Psi(t) be a mixed state evolving in time according to the master equation. If \Psi(t) is a coherent state when t = t_0, then \langle N \Psi(t) \rangle obeys the rate equation when t = t_0.
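For comparison, the rate equation here is the equation for concentrations x \in [0,\infty)^k given (as I understand the conventions of the paper) by

\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T} r(\tau) \, (t(\tau) - s(\tau)) \, x^{s(\tau)} }

which looks just like the previous theorem, but with the falling power \left\langle N^{\underline{s(\tau)}} \Psi \right\rangle replaced by the ordinary power x^{s(\tau)} of the means. For a coherent state the falling factorial moments of a product of independent Poisson distributions are exactly these powers of the means, which is why the two equations agree at t = t_0.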

In most cases, this only applies exactly at one moment of time: later \Psi(t) will cease to be a coherent state. Then we must resort to the previous theorem to see how the expected number of things of each kind changes with time.

But sometimes our state \Psi(t) will stay coherent forever! For one case where this happens, see the companion paper, which I blogged about a little while ago:

• John Baez and Brendan Fong, Quantum techniques for studying equilibrium in reaction networks.

We wrote this first, but logically it comes after the one I just finished now!

All this material will get folded into the book I’m writing with Jacob Biamonte. There are just a few remaining loose ends that need to be tied up.


Quantum Techniques for Studying Equilibrium in Reaction Networks

16 May, 2013

 

The summer before last, I invited Brendan Fong to Singapore to work with me on my new ‘network theory’ project. He quickly came up with a nice new proof of a result about mathematical chemistry. We blogged about it, and I added it to my book, but then he became a grad student at Oxford and got distracted by other kinds of networks—namely, Bayesian networks.

So, we’ve just now finally written up this result as a self-contained paper:

• John Baez and Brendan Fong, Quantum techniques for studying equilibrium in reaction networks.

Check it out and let us know if you spot mistakes or stuff that’s not clear!

The idea, in brief, is to use math from quantum field theory to give a somewhat new proof of the Anderson–Craciun–Kurtz theorem.

This remarkable result says that in many cases, we can start with an equilibrium solution of the ‘rate equation’ which describes the behavior of chemical reactions in a deterministic way in the limit of large numbers of molecules, and get an equilibrium solution of the ‘master equation’ which describes chemical reactions probabilistically for any number of molecules.

The trick, in our approach, is to start with a chemical reaction network, which is something like this:

and use it to write down a Hamiltonian describing the time evolution of the probability that you have various numbers of each kind of molecule: A, B, C, D, E, … Using ideas from quantum mechanics, we can write this Hamiltonian in terms of annihilation and creation operators—even though our problem involves probability theory, not quantum mechanics! Then we can write down the equilibrium solution as a ‘coherent state’. In quantum mechanics, that’s a quantum state that approximates a classical one as well as possible.
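In case you haven’t seen one: for a single species, a coherent state in the conventions of this series is (as far as I recall them)

\displaystyle{ \Psi_c = \frac{e^{c z}}{e^c} = \sum_{n=0}^\infty \frac{e^{-c} c^n}{n!} \, z^n }

so the probability of having n molecules is a Poisson distribution with mean c, and a \Psi_c = c \, \Psi_c where a is the annihilation operator. With several species one takes a product of such factors, one for each species.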


All this is part of a larger plan to take tricks from quantum mechanics and apply them to ‘stochastic mechanics’, simply by working with real numbers representing probabilities instead of complex numbers representing amplitudes!

I should add that Brendan’s work on Bayesian networks is also very cool, and I plan to talk about it here and even work it into the grand network theory project I have in mind. But this may take quite a long time, so for now you should read his paper:

• Brendan Fong, Causal theories: a categorical perspective on Bayesian networks.

