## The Large-Number Limit for Reaction Networks (Part 1)

1 July, 2013

Waiting for the other shoe to drop.

This is a figure of speech that means ‘waiting for the inevitable consequence of what’s come so far’. Do you know where it comes from? You have to imagine yourself in an apartment on the floor below someone who is taking off their shoes. When you hear one, you know the next is coming.

A guest who checked into an inn one night was warned to be quiet because the guest in the room next to his was a light sleeper. As he undressed for bed, he dropped one shoe, which, sure enough, awakened the other guest. He managed to get the other shoe off in silence, and got into bed. An hour later, he heard a pounding on the wall and a shout: “When are you going to drop the other shoe?”

When we were working on math together, James Dolan liked to say “the other shoe has dropped” whenever an inevitable consequence of some previous realization became clear. There’s also the mostly British phrase the penny has dropped. You say this when someone finally realizes the situation they’re in.

But sometimes one realization comes after another, in a long sequence. Then it feels like it’s raining shoes!

I guess that’s a rather strained metaphor. Perhaps falling like dominoes is better for these long chains of realizations.

This is how I’ve felt in my recent research on the interplay between quantum mechanics, stochastic mechanics, statistical mechanics and extremal principles like the principle of least action. The basics of these subjects should be completely figured out by now, but they aren’t—and a lot of what’s known, nobody bothered to tell most of us.

So, I was surprised to rediscover that the Maxwell relations in thermodynamics are formally identical to Hamilton’s equations in classical mechanics… though in retrospect it’s obvious. Thermodynamics obeys the principle of maximum entropy, while classical mechanics obeys the principle of least action. Wherever there’s an extremal principle, symplectic geometry, and equations like Hamilton’s equations, are sure to follow.

I was surprised to discover (or maybe rediscover, I’m not sure yet) that just as statistical mechanics is governed by the principle of maximum entropy, quantum mechanics is governed by a principle of maximum ‘quantropy’. The analogy between statistical mechanics and quantum mechanics has been known at least since Feynman and Schwinger. But this basic aspect was never explained to me!

I was also surprised to rediscover that simply by replacing amplitudes by probabilities in the formalism of quantum field theory, we get a nice formalism for studying stochastic many-body systems. This formalism happens to perfectly match the ‘stochastic Petri nets’ and ‘reaction networks’ already used in subjects from population biology to epidemiology to chemistry. But now we can systematically borrow tools from quantum field theory! All the tricks that particle physicists like—annihilation and creation operators, coherent states and so on—can be applied to problems like the battle between the AIDS virus and human white blood cells.

And, perhaps because I’m a bit slow on the uptake, I was surprised when yet another shoe came crashing to the floor the other day.

Because quantum field theory has, at least formally, a nice limit where Planck’s constant goes to zero, the same is true for stochastic Petri nets and reaction networks!

In quantum field theory, we call this the ‘classical limit’. For example, if you have a really huge number of photons all in the same state, quantum effects sometimes become negligible, and we can describe them using the classical equations describing electromagnetism: the classical Maxwell equations. In stochastic situations, it makes more sense to call this limit the ‘large-number limit’: the main point is that there are lots of particles in each state.

In quantum mechanics, different observables don’t commute, so the so-called commutator matters a lot:

$[A,B] = AB - BA$

These commutators tend to be proportional to Planck’s constant. So in the limit where Planck’s constant $\hbar$ goes to zero, observables commute… but commutators continue to have a ghostly existence, in the form of the Poisson bracket:

$\displaystyle{ \{A,B\} = \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A,B] }$

Poisson brackets are a key part of symplectic geometry—the geometry of classical mechanics. So, this sort of geometry naturally shows up in the study of stochastic Petri nets!

Let me sketch how it works. I’ll start with a section reviewing stuff you should already know if you’ve been following the network theory series.

### The stochastic Fock space

Suppose we have some finite set $S$. We call its elements species, since we think of them as different kinds of things—e.g., kinds of chemicals, or kinds of organisms.

To describe the probability of having any number of things of each kind, we need the stochastic Fock space. This is the space of real formal power series in a bunch of variables, one for each element of $S.$ It won’t hurt to simply say

$S = \{1, \dots, k \}$

Then the stochastic Fock space is

$\mathbb{R}[[z_1, \dots, z_k ]]$

this being math jargon for the space of formal power series with real coefficients in some variables $z_1, \dots, z_k,$ one for each element of $S.$

We write

$n = (n_1, \dots, n_k) \in \mathbb{N}^S$

and use this abbreviation:

$z^n = z_1^{n_1} \cdots z_k^{n_k}$

We use $z^n$ to describe a state where we have $n_1$ things of the first species, $n_2$ of the second species, and so on.

More generally, a stochastic state is an element $\Psi$ of the stochastic Fock space with

$\displaystyle{ \Psi = \sum_{n \in \mathbb{N}^k} \psi_n \, z^n }$

where

$\psi_n \ge 0$

and

$\displaystyle{ \sum_{n \in \mathbb{N}^k} \psi_n = 1 }$

We use $\Psi$ to describe a state where $\psi_n$ is the probability of having $n_1$ things of the first species, $n_2$ of the second species, and so on.

The stochastic Fock space has some important operators on it: the annihilation operators given by

$\displaystyle{ a_i \Psi = \frac{\partial}{\partial z_i} \Psi }$

and the creation operators given by

$\displaystyle{ a_i^\dagger \Psi = z_i \Psi }$

From these we can define the number operators:

$N_i = a_i^\dagger a_i$

Part of the point is that

$N_i z^n = n_i z^n$

This says the stochastic state $z^n$ is an eigenstate of all the number operators, with eigenvalues saying how many things there are of each species.

The annihilation, creation, and number operators obey some famous commutation relations, which are easy to check for yourself:

$[a_i, a_j] = 0$

$[a_i^\dagger, a_j^\dagger] = 0$

$[a_i, a_j^\dagger] = \delta_{i j}$

$[N_i, N_j ] = 0$

$[N_i , a_j^\dagger] = \delta_{i j} a_j^\dagger$

$[N_i , a_j] = - \delta_{i j} a_j$

The last two have easy interpretations. The first of these two implies

$N_i a_i^\dagger \Psi = a_i^\dagger (N_i + 1) \Psi$

This says that if we start in some state $\Psi,$ create a thing of type $i,$ and then count the things of that type, we get one more than if we counted the number of things before creating one. Similarly,

$N_i a_i \Psi = a_i (N_i - 1) \Psi$

says that if we annihilate a thing of type $i$ and then count the things of that type, we get one less than if we counted the number of things before annihilating one.
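To make these operators concrete, here is a minimal Python sketch, assuming we only care about power series with finitely many nonzero terms. It stores $\Psi$ as a dictionary from exponent tuples $n$ to coefficients $\psi_n.$ The function names `annihilate`, `create`, `number` and `subtract` are made up for illustration, and species are indexed from 0 rather than 1.

```python
from fractions import Fraction

# Psi is stored as a dict {n: psi_n}, where n = (n_1, ..., n_k) is a tuple.

def annihilate(i, psi):
    """a_i = d/dz_i: sends z^n to n_i z^(n - e_i)."""
    out = {}
    for n, c in psi.items():
        if n[i] > 0:
            m = n[:i] + (n[i] - 1,) + n[i + 1:]
            out[m] = out.get(m, 0) + n[i] * c
    return out

def create(i, psi):
    """a_i^dagger = multiplication by z_i: sends z^n to z^(n + e_i)."""
    out = {}
    for n, c in psi.items():
        m = n[:i] + (n[i] + 1,) + n[i + 1:]
        out[m] = out.get(m, 0) + c
    return out

def number(i, psi):
    """N_i = a_i^dagger a_i."""
    return create(i, annihilate(i, psi))

def subtract(psi, phi):
    """psi - phi, dropping terms whose coefficient is zero."""
    out = dict(psi)
    for n, c in phi.items():
        out[n] = out.get(n, 0) - c
    return {n: c for n, c in out.items() if c != 0}

# z^n with n = (3, 2): three things of species 0, two of species 1.
z_n = {(3, 2): Fraction(1)}
# A stochastic state: probability 1/2 each for n = (3, 2) and n = (0, 1).
psi = {(3, 2): Fraction(1, 2), (0, 1): Fraction(1, 2)}

eigen = number(0, z_n)  # N_0 z^n = 3 z^n
commutator = subtract(annihilate(0, create(0, psi)),
                      create(0, annihilate(0, psi)))  # [a_0, a_0^dagger] psi
lhs = number(0, annihilate(0, psi))                   # N_0 a_0 psi
rhs = subtract(annihilate(0, number(0, psi)),
               annihilate(0, psi))                    # a_0 (N_0 - 1) psi

print(eigen, commutator == psi, lhs == rhs)
```

The three checks correspond to $N_i z^n = n_i z^n,$ to $[a_i, a_i^\dagger] = 1,$ and to $N_i a_i \Psi = a_i (N_i - 1) \Psi.$ Exact fractions are used so that equality tests are not spoiled by rounding.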

### Introducing Planck’s constant

Now let’s introduce an extra parameter into this setup. To indicate the connection to quantum physics, I’ll call it $\hbar,$ which is the usual symbol for Planck’s constant. However, I want to emphasize that we’re not doing quantum physics here! We’ll see that the limit where $\hbar \to 0$ is very interesting, but it will correspond to a limit where there are many things of each kind.

We’ll start by defining

$A_i = \hbar \, a_i$

and

$C_i = a_i^\dagger$

Here $A$ stands for ‘annihilate’ and $C$ stands for ‘create’. Think of $A$ as a rescaled annihilation operator. Using this we can define a rescaled number operator:

$\widetilde{N}_i = C_i A_i$

So, we have

$\widetilde{N}_i = \hbar N_i$

and this explains the meaning of the parameter $\hbar.$ The idea is that instead of counting things one at a time, we count them in bunches of size $1/\hbar.$

For example, suppose $\hbar = 1/12.$ Then we’re counting things in dozens! If we have a state $\Psi$ with

$N_i \Psi = 36 \Psi$

then there are 36 things of the ith kind. But this implies

$\widetilde{N}_i \Psi = 3 \Psi$

so there are 3 dozen things of the ith kind.

Chemists don’t count in dozens; they count things in big bunches called moles. A mole is approximately the number of carbon atoms in 12 grams: Avogadro’s number, $6.02 \times 10^{23}.$ When you count things by moles, you’re taking $\hbar$ to be $1.66 \times 10^{-24},$ the reciprocal of Avogadro’s number.

So, while in quantum mechanics Planck’s constant is ‘the quantum of action’, a unit of action, here it’s ‘the quantum of quantity’: the amount that corresponds to one thing.

We can easily work out the commutation relations of our new rescaled operators:

$[A_i, A_j] = 0$

$[C_i, C_j] = 0$

$[A_i, C_j] = \hbar \, \delta_{i j}$

$[\widetilde{N}_i, \widetilde{N}_j ] = 0$

$[\widetilde{N}_i , C_j] = \hbar \, \delta_{i j} C_j$

$[\widetilde{N}_i , A_j] = - \hbar \, \delta_{i j} A_j$

These are just what you see in quantum mechanics! The commutators are all proportional to $\hbar.$

Again, we can understand what these relations mean if we think a bit. For example, the commutation relation for $\widetilde{N}_i$ and $C_i$ says

$\widetilde{N}_i C_i \Psi = C_i (\widetilde{N}_i + \hbar) \Psi$

This says that if we start in some state $\Psi,$ create a thing of type $i,$ and then count the things of that type, we get $\hbar$ more than if we counted the number of things before creating one. This is because we are counting things not one at a time, but in bunches of size $1/\hbar.$
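Here is a small Python sketch of the rescaled operators for a single species, assuming $\hbar = 1/12$ as in the dozens example. A state is stored as a dictionary from the number $n$ to the coefficient of $z^n.$ The names `A`, `C`, `N_tilde` and `plus_hbar` are illustrative choices, not notation from any library.

```python
from fractions import Fraction

HBAR = Fraction(1, 12)  # illustrative choice: count things in dozens

def A(psi, hbar=HBAR):
    """Rescaled annihilation operator A = hbar * d/dz."""
    return {n - 1: hbar * n * c for n, c in psi.items() if n > 0}

def C(psi):
    """Creation operator C = multiplication by z."""
    return {n + 1: c for n, c in psi.items()}

def N_tilde(psi, hbar=HBAR):
    """Rescaled number operator: C A = hbar * N."""
    return C(A(psi, hbar))

def plus_hbar(psi, hbar=HBAR):
    """Apply the operator (N_tilde + hbar) to psi."""
    out = dict(N_tilde(psi, hbar))
    for n, c in psi.items():
        out[n] = out.get(n, 0) + hbar * c
    return out

psi = {36: Fraction(1)}   # the state z^36: exactly 36 things
dozens = N_tilde(psi)     # same state, rescaled by the eigenvalue 3

lhs = N_tilde(C(psi))     # create one thing, then count in dozens
rhs = C(plus_hbar(psi))   # count in dozens, add hbar, then create

print(dozens, lhs == rhs)
```

Creating one thing raises the rescaled count from 3 to 3 + 1/12, which is the content of the relation between $\widetilde{N}_i$ and $C_i$ for this state.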

You may be wondering why I defined the rescaled annihilation operator to be $\hbar$ times the original annihilation operator:

$A_i = \hbar \, a_i$

but left the creation operator unchanged:

$C_i = a_i^\dagger$

I’m wondering that too! I’m not sure I’m doing things the best way yet. I’ve also tried another more symmetrical scheme, taking $A_i = \sqrt{\hbar} \, a_i$ and $C_i = \sqrt{\hbar} \, a_i^\dagger.$ This gives the same commutation relations, but certain other formulas become more unpleasant. I’ll explain that some other day.

Next, we can take the limit as $\hbar \to 0$ and define Poisson brackets of operators by

$\displaystyle{ \{A,B\} = \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A,B] }$

To make this rigorous it’s best to proceed algebraically. For this we treat $\hbar$ as a formal variable rather than a specific number. So, our number system becomes $\mathbb{R}[\hbar],$ the algebra of polynomials in $\hbar$. We define the Weyl algebra to be the algebra over $\mathbb{R}[\hbar]$ generated by elements $A_i$ and $C_i$ obeying

$[A_i, A_j] = 0$

$[C_i, C_j] = 0$

$[A_i, C_j] = \hbar \, \delta_{i j}$

We can set $\hbar = 0$ in this formalism; then the Weyl algebra reduces to the algebra of polynomials in the variables $A_i$ and $C_i.$ This algebra is commutative! But we can define a Poisson bracket on this algebra by

$\displaystyle{ \{A,B\} = \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A,B] }$

It takes a bit of work to explain to algebraists exactly what’s going on in this formula, because it involves an interplay between the algebra of polynomials in $A_i$ and $C_i,$ which is commutative, and the Weyl algebra, which is not. I’ll be glad to explain the details if you want. But if you’re a physicist, you can just follow your nose and figure out what the formula gives. For example:

$\begin{array}{ccl} \{A_i, C_j\} &=& \displaystyle{ \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A_i, C_j] } \\ \\ &=& \displaystyle{ \lim_{\hbar \to 0} \; \frac{1}{\hbar} \, \hbar \, \delta_{i j} } \\ \\ &=& \delta_{i j} \end{array}$

Similarly, we have:

$\{ A_i, A_j \} = 0$

$\{ C_i, C_j \} = 0$

$\{ A_i, C_j \} = \delta_{i j}$

$\{ \widetilde{N}_i, \widetilde{N}_j \} = 0$

$\{ \widetilde{N}_i , C_j \} = \delta_{i j} C_j$

$\{ \widetilde{N}_i , A_j \} = - \delta_{i j} A_j$
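If you would rather check these brackets by machine, here is a minimal Python sketch for a single species. It assumes the standard canonical formula $\{f,g\} = \frac{\partial f}{\partial A} \frac{\partial g}{\partial C} - \frac{\partial f}{\partial C} \frac{\partial g}{\partial A},$ which is not spelled out in this post but agrees with the brackets listed above on the generators. A polynomial is stored as a dictionary sending a pair of exponents $(a, c)$ to the coefficient of $A^a C^c$; all function names are made up for illustration.

```python
def d_dA(f):
    """Partial derivative with respect to A."""
    return {(a - 1, c): a * k for (a, c), k in f.items() if a > 0}

def d_dC(f):
    """Partial derivative with respect to C."""
    return {(a, c - 1): c * k for (a, c), k in f.items() if c > 0}

def mul(f, g):
    """Product of two commuting polynomials."""
    out = {}
    for (a1, c1), k1 in f.items():
        for (a2, c2), k2 in g.items():
            key = (a1 + a2, c1 + c2)
            out[key] = out.get(key, 0) + k1 * k2
    return {m: k for m, k in out.items() if k != 0}

def sub(f, g):
    """Difference f - g, dropping zero terms."""
    out = dict(f)
    for m, k in g.items():
        out[m] = out.get(m, 0) - k
    return {m: k for m, k in out.items() if k != 0}

def poisson(f, g):
    """Assumed canonical bracket: f_A g_C - f_C g_A."""
    return sub(mul(d_dA(f), d_dC(g)), mul(d_dC(f), d_dA(g)))

A = {(1, 0): 1}   # the coordinate function A
C = {(0, 1): 1}   # the coordinate function C
N = mul(C, A)     # the rescaled number function C A at hbar = 0

print(poisson(A, C))  # the constant 1
print(poisson(N, C))  # C
print(poisson(N, A))  # -A
print(poisson(N, N))  # 0, stored as the empty dict
```

This only handles one species and polynomial functions, but it is enough to reproduce the single-species versions of the six relations.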

I should probably use different symbols for $A_i, C_i$ and $\widetilde{N}_i$ after we’ve set $\hbar = 0,$ since they’re really different now, but I don’t have the patience to make up more names for things!

Now, we can think of $A_i$ and $C_i$ as coordinate functions on a 2k-dimensional vector space, and all the polynomials in $A_i$ and $C_i$ as functions on this space. This space is what physicists would call a ‘phase space’: they use this kind of space to describe the position and momentum of a particle, though here we are using it in a different way. Mathematicians would call it a ‘symplectic vector space’, because it’s equipped with a special structure, called a symplectic structure, that lets us define Poisson brackets of smooth functions on this space. We won’t need to get into that now, but it’s important—and it makes me happy to see it here.

### More

There’s a lot more to do, but not today. My main goal is to understand, in a really elegant way, how the master equation for a stochastic Petri net reduces to the rate equation in the large-number limit. What we’ve done so far is start thinking of this as a $\hbar \to 0$ limit. This should let us borrow ideas about classical limits in quantum mechanics, and apply them to stochastic mechanics.

Stay tuned!

## Relative Entropy (Part 1)

20 June, 2013

I’m trying to finish off a paper that Tobias Fritz and I have been working on, which gives a category-theoretic (and Bayesian!) characterization of relative entropy. It’s a kind of sequel to our paper with Tom Leinster, in which we characterized entropy.

That earlier paper was developed in conversations on the n-Category Café. It was a lot of fun; I sort of miss that style of working. Also, to get warmed up, I need to think through some things I’ve thought about before. So, I might as well write them down here.

### The idea

There are many categories related to probability theory, and they’re related in many ways. Last summer—on the 24th of August 2012, according to my notes here—Jamie Vicary, Brendan Fong and I worked through a bunch of these relationships. I need to write them down now, even if they’re not all vitally important to my paper with Tobias. They’re sort of buzzing around my brain like flies.

(Tobias knows this stuff too, and this is how we think about probability theory, but we weren’t planning to stick it in our paper. Maybe we should.)

Let’s restrict attention to probability measures on finite sets, and related structures. We could study these questions more generally, and we should, but not today. What we’ll do is give a unified purely algebraic description of:

• finite sets

• measures on finite sets

• probability measures on finite sets

and various kinds of maps between these:

• functions

• bijections

• measure-preserving functions

• stochastic maps

### Finitely generated free [0,∞)-modules

People often do linear algebra over a field, which is—roughly speaking—a number system where you can add, subtract, multiply and divide. But algebraists have long realized that a lot of linear algebra still works with a commutative ring, where you can’t necessarily divide. It gets more complicated, but also a lot more interesting.

But in fact, a lot still works with a commutative rig, where we can’t necessarily subtract either! Something I keep telling everyone is that linear algebra over rigs is a good idea for studying things like probability theory, thermodynamics, and the principle of least action.

Today we’ll start with the rig of nonnegative real numbers with their usual addition and multiplication; let’s call this $[0,\infty) .$ The idea is that measure theory, and probability theory, are closely related to linear algebra over this rig.

Let $C$ be the category with finitely generated free $[0,\infty)$-modules as objects, and module homomorphisms as morphisms. I’ll call these morphisms maps.

Puzzle. Do we need to say ‘free’ here? Are there finitely generated modules over $[0,\infty)$ that aren’t free?

Every finitely generated free $[0,\infty)$-module is isomorphic to $[0,\infty)^S$ for some finite set $S .$ In other words, it’s isomorphic to $[0,\infty)^n$ for some $n = 0, 1, 2, \dots .$ So, $C$ is equivalent to the category where objects are natural numbers, a morphism from $m$ to $n$ is an $m \times n$ matrix of numbers in $[0,\infty) ,$ and composition is done by matrix multiplication. I’ll also call this equivalent category $C.$

We can take tensor products of finitely generated free modules, and this makes $C$ into a symmetric monoidal †-category. This means we can draw maps using string diagrams in the usual way. However, I’m feeling lazy so I’ll often write equations when I could be drawing diagrams.

One of the rules of the game is that all these equations will make sense in any symmetric monoidal †-category. So we could, if we wanted, generalize ideas from probability theory this way. If you want to do this, you’ll need to know that $[0,\infty)$ is the unit for the tensor product in $C.$ We’ll be seeing this guy $[0,\infty)$ a lot. So if you want to generalize, replace $C$ by any symmetric monoidal †-category, and replace $[0,\infty)$ by the unit for the tensor product.

### Finite sets

There’s a way to see the category of finite sets lurking in $C,$ which we can borrow from this paper:

• Bob Coecke, Dusko Pavlovic and Jamie Vicary, A new description of orthogonal bases.

For any finite set $S ,$ we get a free finitely generated $[0,\infty)$-module, namely $[0,\infty)^S .$ This comes with some structure:

• a multiplication $m: [0,\infty)^S \otimes [0,\infty)^S \to [0,\infty)^S ,$ coming from pointwise multiplication of $[0,\infty)$-valued functions on $S$

• the unit for this multiplication, an element of $[0,\infty)^S,$ which we can write as a morphism $i: [0,\infty) \to [0,\infty)^S$

• a comultiplication, obtained by taking the diagonal map $\Delta : S \to S \times S$ and promoting it to a linear map $\Delta : [0,\infty)^S \to [0, \infty)^S \otimes [0,\infty)^S$

• a counit for this comultiplication, obtained by taking the unique map to the terminal set $! : S \to 1$ and promoting it to a linear map $e: [0,\infty)^S \to [0,\infty)$

These morphisms $m, i, \Delta, e$ make

$x = [0,\infty)^S$

into a commutative Frobenius algebra in $C .$ That’s a thing where the unit, counit, multiplication and comultiplication obey these laws:

(I drew these back when I was feeling less lazy.) This Frobenius algebra is also ‘special’, meaning it obeys this:

And it’s also a †-Frobenius algebra, meaning that the counit and comultiplication are obtained from the unit and multiplication by ‘flipping’ them using the †-category structure. (If we think of a morphism in $C$ as a matrix, its dagger is its transpose.)

Conversely, suppose we have any special commutative †-Frobenius algebra $x .$ Then using the ideas in the paper by Coecke, Pavlovic and Vicary we can recover a basis for $x ,$ consisting of the vectors $e_i \in x$ with

$\Delta(e_i) = e_i \otimes e_i$

This basis forms a set $S$ such that

$x \cong [0,\infty)^S$

for some specified isomorphism in $C.$ Furthermore, this is an isomorphism of special commutative †-Frobenius algebras!

In case you’re wondering, these vectors $e_i$ correspond to the functions on $S$ that are zero everywhere except at one point $i \in S,$ where they equal 1.

In short, a special commutative †-Frobenius algebra in $C$ is just a fancy way of talking about a finite set. This may seem silly, but it’s a way to start describing probability theory using linear algebra very much as we do with quantum theory. This analogy between quantum theory and probability theory is so interesting that it deserves a book.
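Here is a rough Python sketch of this structure on $[0,\infty)^S,$ assuming $S = \{0, \dots, n-1\},$ a vector stored as a list, and an element of the tensor square stored as an $n \times n$ array. It takes ‘special’ to mean that comultiplying and then multiplying gives back the original vector, which is the usual definition but is not written out as an equation in this post. The function names are invented for illustration.

```python
def comultiply(v):
    """Delta: sends e_i to e_i (x) e_i, so v goes to a diagonal array."""
    n = len(v)
    return [[v[i] if i == j else 0 for j in range(n)] for i in range(n)]

def multiply(w):
    """m: sends e_i (x) e_j to e_i if i == j and to 0 otherwise."""
    return [w[i][i] for i in range(len(w))]

def tensor(u, v):
    """The element u (x) v of the tensor square."""
    return [[a * b for b in v] for a in u]

def counit(v):
    """e: sum the entries of v."""
    return sum(v)

def is_copyable(v):
    """Does Delta(v) equal v (x) v?"""
    return comultiply(v) == tensor(v, v)

v = [2, 0, 5]
print(multiply(comultiply(v)) == v)        # the 'special' law on v
print(multiply(tensor([2, 3], [5, 7])))    # pointwise product of functions
print(is_copyable([0, 1, 0]), is_copyable([1, 1, 0]))
```

Note that the zero vector also passes the `is_copyable` test, so the basis consists of the nonzero vectors obeying $\Delta(e_i) = e_i \otimes e_i.$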

### Functions and bijections

Now suppose we have two special commutative †-Frobenius algebras in $C$, say $x$ and $y .$

Suppose $f : x \to y$ is a Frobenius algebra homomorphism: that is, a map preserving all the structure—the unit, counit, multiplication and comultiplication. Then it comes from an isomorphism of finite sets. This lets us find $\mathrm{FinSet}_0 ,$ the groupoid of finite sets and bijections, inside $C.$

Alternatively, suppose $f : x \to y$ is just a coalgebra homomorphism: that is, a map preserving just the counit and comultiplication. Then it comes from an arbitrary function between finite sets. This lets us find $\mathrm{FinSet} ,$ the category of finite sets and functions, inside $C .$

But what if $f$ preserves just the counit? This sounds like a dry, formal question. But it’s not: the answer is something useful, a ‘stochastic map’.

### Stochastic maps

A stochastic map from a finite set $S$ to a finite set $T$ is a map sending each point of $S$ to a probability measure on $T .$

We can think of this as a $T \times S$-shaped matrix of numbers in $[0,\infty) ,$ where a given column gives the probability that a given point in $S$ goes to any point in $T .$ The sum of the numbers in each column will be 1. And conversely, any $T \times S$-shaped matrix of numbers in $[0,\infty) ,$ where each column sums to 1, gives a stochastic map from $S$ to $T .$

But now let’s describe this idea using the category $C.$ We’ve seen a finite set is the same as a special commutative †-Frobenius algebra. So, say we have two of these, $x$ and $y .$ Our matrix of numbers in $[0,\infty)$ is just a map

$f: x \to y$

So, we just need a way to state the condition that each column in the matrix sums to 1. And this condition simply says that $f$ preserves the counit:

$\epsilon_y \circ f = \epsilon_x$

where $\epsilon_x : x \to [0,\infty)$ is the counit for $x ,$ and similarly for $\epsilon_y .$

To understand this, note that if we use the canonical isomorphism

$x \cong [0,\infty)^S$

the counit $\epsilon_x$ can be seen as the map

$[0,\infty)^S \to [0,\infty)$

that takes any $S$-tuple of numbers and sums them up. In other words, it’s integration with respect to counting measure. So, the equation

$\epsilon_y \circ f = \epsilon_x$

says that if we take any $S$-tuple of numbers, multiply it by the matrix $f ,$ and then sum up the entries of the resulting $T$-tuple, it’s the same as if we summed up the original $S$-tuple. But this says precisely that each column of the matrix $f$ sums to 1.

So, we can use our formalism to describe $\mathrm{FinStoch},$ the category with finite sets as objects and stochastic maps as morphisms. We’ve seen this category is equivalent to the category with special commutative †-Frobenius algebras in $C$ as objects and counit-preserving maps as morphisms.
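The equivalence between ‘each column sums to 1’ and ‘$f$ preserves the counit’ is easy to test numerically. Below is a minimal Python sketch, assuming a map is stored as a $T \times S$ matrix given by a list of rows. The matrices `f` and `g` are hypothetical examples, and the function names are invented for illustration.

```python
from fractions import Fraction as F

def apply_map(matrix, vector):
    """Multiply a T x S matrix (a list of rows) by an S-tuple."""
    return [sum(entry * v for entry, v in zip(row, vector)) for row in matrix]

def counit(vector):
    """Sum the entries: integration with respect to counting measure."""
    return sum(vector)

def is_stochastic(matrix):
    """Nonnegative entries, and every column sums to 1."""
    nonnegative = all(entry >= 0 for row in matrix for entry in row)
    return nonnegative and all(sum(col) == 1 for col in zip(*matrix))

# A hypothetical stochastic map from S = {0, 1} to T = {0, 1, 2}.
f = [[F(1, 2), F(0)],
     [F(1, 4), F(1, 3)],
     [F(1, 4), F(2, 3)]]

# A map that is not stochastic: its first column sums to 2.
g = [[F(1), F(0)],
     [F(1), F(1)]]

v = [F(3), F(5)]
print(is_stochastic(f), counit(apply_map(f, v)) == counit(v))
print(is_stochastic(g), counit(apply_map(g, v)) == counit(v))
```

Checking the equation on the basis vectors of $[0,\infty)^S$ is the same as checking the column sums one at a time.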

### Finite measure spaces

Now let’s use our formalism to describe finite measure spaces (by which, beware, I mean finite sets equipped with measures)! To do this, we’ll use a special commutative †-Frobenius algebra $x$ in $C$ together with any map

$\mu: [0,\infty) \to x$

Starting from these, we get a specified isomorphism

$x \cong [0,\infty)^S$

and $\mu$ sends the number 1 to a vector in $[0,\infty)^S$: that is, a function on $S$ taking values in $[0,\infty) .$ Multiplying this function by counting measure, we get a measure on $S .$

Puzzle. How can we describe this measure without the annoying use of counting measure?

Conversely, any measure on a finite set gives a special commutative †-Frobenius algebra $x$ in $C$ equipped with a map from $[0,\infty) .$

So, we can say a finite measure space is a special commutative †-Frobenius algebra in $C$ equipped with a map

$\mu: [0,\infty) \to x$

And given two of these,

$\mu: [0,\infty) \to x , \qquad \nu: [0,\infty) \to y$

and a coalgebra morphism

$f : x \to y$

obeying this equation

$f \circ \mu = \nu$

then we get a measure-preserving function between finite measure spaces! If you’re a category theorist, you’ll draw this equation as a commutative triangle:

Conversely, any measure-preserving function between finite measure spaces obeys this equation. So, we get an algebraic way of describing the category $\mathrm{FinMeas} ,$ with finite measure spaces as objects and measure-preserving maps as morphisms.

### Finite probability measure spaces

I’m mainly interested in probability measures. So suppose $x$ is a special commutative †-Frobenius algebra in $C$ equipped with a map

$\mu: [0,\infty) \to x$

We’ve seen this gives a finite measure space. But this is a probability measure space if and only if

$e \circ \mu = 1$

where

$e : x \to [0,\infty)$

is the counit for $x .$ The equation simply says the total integral of our measure $\mu$ is 1.

So, we get a way of describing the category $\mathrm{FinProb} ,$ which has finite probability measure spaces as objects and measure-preserving maps as morphisms. Given finite probability measure spaces described this way:

$\mu: [0,\infty) \to x , \qquad \nu: [0,\infty) \to y$

a measure-preserving function is a coalgebra morphism

$f : x \to y$

such that the obvious triangle commutes:

$f \circ \mu = \nu$

### Measure-preserving stochastic maps

Say we have two finite measure spaces. Then we can ask whether a stochastic map from one to the other is measure-preserving. And we can answer this question in the language of $C .$

Remember, a finite measure space is a special commutative †-Frobenius algebra $x$ in $C$ together with a map

$\mu: [0,\infty) \to x$

Say we have another one:

$\nu: [0,\infty) \to y$

A stochastic map is just a map

$f: x \to y$

that preserves the counit:

$\epsilon_y \circ f = \epsilon_x$

But it’s a measure-preserving stochastic map if also

$f \circ \mu = \nu$
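As a sanity check, here is a small Python sketch of these two conditions, assuming a measure on a finite set is stored as a list of nonnegative numbers and a map as a list of rows. The particular matrix `f`, the measure `mu` and the helper names are hypothetical.

```python
from fractions import Fraction as F

def apply_map(matrix, vector):
    """Multiply a T x S matrix (a list of rows) by an S-tuple."""
    return [sum(entry * v for entry, v in zip(row, vector)) for row in matrix]

def preserves_counit(matrix):
    """Every column sums to 1."""
    return all(sum(column) == 1 for column in zip(*matrix))

def is_measure_preserving(matrix, mu, nu):
    """Does f composed with mu equal nu?"""
    return apply_map(matrix, mu) == nu

def matrix_of_function(images, size_t):
    """Promote a function S -> T, given as a list of images, to a 0/1 matrix."""
    return [[1 if images[s] == t else 0 for s in range(len(images))]
            for t in range(size_t)]

# A hypothetical stochastic map from S = {0, 1} to T = {0, 1, 2}.
f = [[F(1, 2), F(0)],
     [F(1, 4), F(1, 3)],
     [F(1, 4), F(2, 3)]]

mu = [F(2), F(3)]       # a measure on S with total mass 5
nu = apply_map(f, mu)   # the measure on T that makes f measure-preserving

# An ordinary function, sending both points of S to the point 1 of T.
g = matrix_of_function([1, 1], 3)

print(nu, sum(nu) == sum(mu))
print(preserves_counit(g), apply_map(g, mu))
```

Because `f` preserves the counit, the total mass of `nu` equals that of `mu`. In particular, a stochastic map sends probability measures to probability measures.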

### Next…

There’s a lot more to say; I haven’t gotten anywhere near what Tobias and I are doing! But it’s pleasant to have this basic stuff written down.

## Quantum Techniques for Reaction Networks

11 June, 2013

Fans of the network theory series might like to look at this paper:

• John Baez, Quantum techniques for reaction networks.

and I would certainly appreciate comments and corrections.

This paper tackles a basic question we never got around to discussing: how the probabilistic description of a system where bunches of things randomly interact and turn into other bunches of things can reduce to a deterministic description in the limit where there are lots of things!

Mathematically, such systems are given by ‘stochastic Petri nets’, or if you prefer, ‘stochastic reaction networks’. These are just two equivalent pictures of the same thing. For example, we could describe some chemical reactions using this Petri net:

but chemists would use this reaction network:

C + O2 → CO2
CO2 + NaOH → NaHCO3
NaHCO3 + HCl → H2O + NaCl + CO2

Making either of them ‘stochastic’ merely means that we specify a ‘rate constant’ for each reaction, saying how probable it is.

For any such system we get a ‘master equation’ describing how the probability of having any number of things of each kind changes with time. In the class I taught on this last quarter, the students and I figured out how to derive from this an equation saying how the expected number of things of each kind changes with time. Later I figured out a much slicker argument… but either way, we get this result:

Theorem. For any stochastic reaction network and any stochastic state $\Psi(t)$ evolving in time according to the master equation, we have

$\displaystyle{ \frac{d}{dt} \langle N \Psi(t) \rangle } = \displaystyle{\sum_{\tau \in T}} \, r(\tau) \, (t(\tau) - s(\tau)) \; \left\langle N^{\underline{s(\tau)}}\, \Psi(t) \right\rangle$

assuming the derivative exists.

Of course this will make no sense yet if you haven’t been following the network theory series! But I explain all the notation in the paper, so don’t be scared. The main point is that $\langle N \Psi(t) \rangle$ is a vector listing the expected number of things of each kind at time $t.$ The equation above says how this changes with time… but it closely resembles the ‘rate equation’, which describes the evolution of chemical systems in a deterministic way.

And indeed, the next big theorem says that the master equation actually implies the rate equation when the probability of having various numbers of things of each kind is given by a product of independent Poisson distributions. In this case $\Psi(t)$ is what people in quantum physics call a ‘coherent state’. So:

Theorem. Given any stochastic reaction network, let $\Psi(t)$ be a stochastic state evolving in time according to the master equation. If $\Psi(t)$ is a coherent state when $t = t_0,$ then $\langle N \Psi(t) \rangle$ obeys the rate equation when $t = t_0.$

In most cases, this only applies exactly at one moment of time: later $\Psi(t)$ will cease to be a coherent state. Then we must resort to the previous theorem to see how the expected number of things of each kind changes with time.
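One standard fact about Poisson distributions helps explain why coherent states are special here, though it is not spelled out in this post. Assuming $N^{\underline{s}}$ denotes the falling power $N(N-1)\cdots(N-s+1),$ its expected value in a Poisson distribution with mean $x$ is exactly $x^s.$ The Python sketch below checks this numerically for a single species, using a truncated sum; the cutoff `n_max` and the function names are illustrative assumptions.

```python
import math

def poisson_pmf(x, n_max):
    """Probabilities e^{-x} x^n / n! for n = 0, ..., n_max (truncated)."""
    probs = []
    p = math.exp(-x)
    for n in range(n_max + 1):
        probs.append(p)
        p *= x / (n + 1)
    return probs

def falling_power(n, s):
    """n (n-1) ... (n-s+1), with the empty product equal to 1."""
    result = 1
    for j in range(s):
        result *= n - j
    return result

def expected_falling_power(probs, s):
    """Expected value of the s-th falling power of the number of things."""
    return sum(p * falling_power(n, s) for n, p in enumerate(probs))

x = 2.5
probs = poisson_pmf(x, 60)  # the tail beyond n = 60 is negligible for x = 2.5

for s in range(4):
    print(s, expected_falling_power(probs, s), x ** s)
```

So in a coherent state the expected falling powers in the first theorem become plain monomials in the expected numbers of things, which is the form that appears in a deterministic rate equation.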

But sometimes our state $\Psi(t)$ will stay coherent forever! For one case where this happens, see the companion paper, which I blogged about a little while ago:

• John Baez and Brendan Fong, Quantum techniques for studying equilibrium in reaction networks.

We wrote this first, but logically it comes after the one I just finished now!

All this material will get folded into the book I’m writing with Jacob Biamonte. There are just a few remaining loose ends that need to be tied up.

## Quantum Techniques for Studying Equilibrium in Reaction Networks

16 May, 2013

The summer before last, I invited Brendan Fong to Singapore to work with me on my new ‘network theory’ project. He quickly came up with a nice new proof of a result about mathematical chemistry. We blogged about it, and I added it to my book, but then he became a grad student at Oxford and got distracted by other kinds of networks—namely, Bayesian networks.

So, we’ve just now finally written up this result as a self-contained paper:

• John Baez and Brendan Fong, Quantum techniques for studying equilibrium in reaction networks.

Check it out and let us know if you spot mistakes or stuff that’s not clear!

The idea, in brief, is to use math from quantum field theory to give a somewhat new proof of the Anderson–Craciun–Kurtz theorem.

This remarkable result says that in many cases, we can start with an equilibrium solution of the ‘rate equation’, which describes the behavior of chemical reactions in a deterministic way in the limit of a large number of molecules, and get an equilibrium solution of the ‘master equation’, which describes chemical reactions probabilistically for any number of molecules.

The trick, in our approach, is to start with a chemical reaction network, which is something like this:

and use it to write down a Hamiltonian describing the time evolution of the probability that you have various numbers of each kind of molecule: A, B, C, D, E, … Using ideas from quantum mechanics, we can write this Hamiltonian in terms of annihilation and creation operators—even though our problem involves probability theory, not quantum mechanics! Then we can write down the equilibrium solution as a ‘coherent state’. In quantum mechanics, that’s a quantum state that approximates a classical one as well as possible.

All this is part of a larger plan to take tricks from quantum mechanics and apply them to ‘stochastic mechanics’, simply by working with real numbers representing probabilities instead of complex numbers representing amplitudes!

I should add that Brendan’s work on Bayesian networks is also very cool, and I plan to talk about it here and even work it into the grand network theory project I have in mind. But this may take quite a long time, so for now you should read his paper:

• Brendan Fong, Causal theories: a categorical perspective on Bayesian networks.

## Game Theory (Part 11)

6 February, 2013

Here’s a game. I flip a fair coin. If it lands heads up, I give you $1. If it lands tails up, I give you nothing.

How much should you pay to play this game?

This is not a mathematics question, because it asks what you “should” do. This could depend on many things that aren’t stated in the question.

Nonetheless, mathematicians have a way they like to answer this question. They do it by computing the so-called ‘expected value’ of your payoff. With probability 1/2 you get 1 dollar; with probability 1/2 you get 0 dollars. So, the expected value is defined to be

$\displaystyle{ \frac{1}{2} \times 1 + \frac{1}{2} \times 0 = \frac{1}{2} }$

Don’t be fooled by the word ‘expected’: mathematicians use words in funny ways. I’m not saying you should expect to get 1/2 a dollar each time you play this game: obviously you don’t! It means that you get 1/2 a dollar ‘on average’. More precisely: if you play the game lots of times, say a million times, there’s a high probability that you’ll get fairly close to 1/2 a million dollars. (We could make this more precise and prove it, but that would be quite a digression right now.)

So, if you have lots of money and lots of time, you could pay up to 1/2 a dollar to play this game, over and over, and still make money on average. If you pay exactly 1/2 a dollar you won’t make money on average, but you won’t lose it either, on average.

### Expected values

Let’s make the idea precise:

Definition. Suppose $X$ is a finite set and $p$ is a probability distribution on that set. Suppose $f: X \to \mathbb{R}$ is a function from $X$ to $\mathbb{R}.$ Then the expected value of $f$ with respect to $p$ is defined to be

$\displaystyle{ \langle f \rangle = \sum_{i \in X} p_i f(i) }$

The idea here is that we are averaging the different values $f(i)$ of the function $f$, but we count $f(i)$ more when the probability of the event $i$ is bigger. We pronounce $\langle f \rangle$ like this: “the expected value of $f$”.

### Examples

Example 1. Suppose you enter a lottery where you have a 0.01% chance of winning $100 and a 99.99% chance of winning nothing. What is the expected value of your payoff?

With probability 0.0001 you win $100. With probability .9999 you win zero dollars. So, your expected payoff is

$\displaystyle{ 0.0001 \times 100 + .9999 \times 0 = 0.01 }$

dollars. So: if you play this game over and over, you expect that on average you will win a penny per game.

But usually you have to pay to enter a lottery! This changes everything. Let’s see how:

Example 2. Suppose you pay $5 to enter a lottery. Suppose you have a 0.01% chance of winning $100 and a 99.99% chance of winning nothing. What is the expected value of your payoff, including your winnings but also the money you paid?

With probability 0.0001 you win $100, but pay $5, so your payoff is $95 in this case. With probability .9999 you win nothing, but pay $5, so your payoff is -$5 in this case. So, your expected payoff is

$\displaystyle{ 0.0001 \times 95 - .9999 \times 5 = - 4.99 }$

dollars. In simple terms: if we play this game over and over, we expect that on average we will lose $4.99 per play.

Example 3. Suppose you pay $5 to play a game where you flip a coin 5 times. Suppose the coin is fair and the flips are independent. If the coin lands heads up every time, you win $100. Otherwise you win nothing. What is the expected value of your payoff, including your winnings but also the money you paid?

Since the coin flips are fair and independent, the probability that it lands heads up every time is

$\displaystyle{ \frac{1}{2^5} = \frac{1}{32} }$

So, when we count the $5 you pay to play, with probability 1/32 your payoff is $95, and with probability (1 – 1/32) = 31/32 your payoff is -$5. The expected value of your payoff is thus

$\displaystyle{ \frac{1}{32} \times 95 - \frac{31}{32} \times 5 = -1.875 }$

dollars.
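If you’d like to check these three examples by machine, here is a minimal Python sketch of the definition of expected value. The function name `expected_value` and the event labels are my own choices for illustration, and I use exact fractions so that rounding doesn’t muddy the answers.

```python
from fractions import Fraction

def expected_value(p, f):
    """Sum of p[i] * f[i] over a finite set of events i."""
    # p maps each event to its probability; f maps each event to its payoff.
    assert sum(p.values()) == 1, "probabilities must add up to 1"
    return sum(p[i] * f[i] for i in p)

# Example 1: a lottery ticket that costs nothing.
p_lottery = {"win": Fraction(1, 10000), "lose": Fraction(9999, 10000)}
free_lottery = expected_value(p_lottery, {"win": 100, "lose": 0})

# Example 2: the same lottery, but with a $5 entry fee.
paid_lottery = expected_value(p_lottery, {"win": 95, "lose": -5})

# Example 3: pay $5, win $100 only if five fair flips all land heads up.
p_flips = {"all heads": Fraction(1, 32), "otherwise": Fraction(31, 32)}
five_heads = expected_value(p_flips, {"all heads": 95, "otherwise": -5})

print(float(free_lottery), float(paid_lottery), float(five_heads))
```

The payoffs here already include the entry fee, just as in the worked examples, so the only thing the function does is the weighted sum.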

### Risk aversion and risk tolerance

Soon we’ll start talking about games where players use ‘mixed strategies’, meaning that they randomly make their choices according to some probability distribution. To keep the math simple, we will assume our ‘rational agents’ want to maximize the expected value of their payoff.

But it’s important to remember that life is not really so simple, especially if payoffs are measured in dollars. Rational agents may have good reasons to do something else!

For example, suppose some evil fiend says they’ve kidnapped my wife and they’ll kill her unless I give them a dollar. Suppose I only have 99 cents. But suppose they offer me a chance to play this game: I flip a fair coin, and if it lands heads up, I get $1. If it lands tails up, I get nothing.

How much would I pay to play this game?

Assuming I had no way to call the police, etcetera, I would pay all my 99 cents to play this game. After all, if I don’t play it, my wife will die! But if I do play it, I would at least have a 50% chance of saving her.

The point here is that my happiness, or utility, is not proportional to my amount of money. If I have less than $1, I’m really miserable. If I have $1 or more, I’m much better off.

There are many other reasons why people might be willing to pay more or less to play a game than the expected value of its monetary payoff. Some people are risk tolerant: they are willing to accept higher risks to get a chance at higher payoffs. Others are risk averse: they would prefer to have a high probability of getting a payoff even if it’s not so big. See:

Risk aversion, Wikipedia.

In class I asked all the students: would you like to play the following game? I’ll flip a fair coin. Then I’ll double your quiz score for today if it comes up heads, but give you a zero for your quiz score if it comes up tails.

Suppose your quiz score is $Q.$ If you get heads, I’ll give you $Q$ more points. If you get tails, I’ll take away $Q$ points. So the expected value of the payoff for this game, measured in points, is

$\displaystyle{ \frac{1}{2} \times Q - \frac{1}{2} \times Q = 0 }$

So, if the expected value is what matters to you, you’ll be right on the brink of wanting to play this game: it doesn’t help you, and it doesn’t hurt you.

But in reality, different people will make different decisions. I polled the students, using our electronic clicker system, and 46% said they wanted to play this game. 54% said they did not.

Then I changed the game. I said that I would roll a fair 6-sided die. If a 6 came up, I would multiply their quiz score by 6. Otherwise I would set their quiz score to zero.

If your quiz score is $Q$, your payoff if you win will be $5 Q$, since I’m multiplying your score by 6. If you lose, your payoff will be $-Q.$ So, the expected value of your payoff is still zero:

$\displaystyle{ \frac{1}{6} \times 5Q - \frac{5}{6} \times Q = 0 }$

But now the stakes are higher, in a certain sense. You can win more, but it’s less likely.
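Here is a small Python sketch comparing the two quiz games. Payoffs are measured in units of $Q$. The expected values come straight from the formulas above; the standard deviation is my own choice of one possible way to put a number on ‘higher stakes’, not something the argument depends on.

```python
from fractions import Fraction
from math import sqrt

def mean_and_spread(outcomes):
    """outcomes: list of (probability, payoff in units of Q)."""
    mean = sum(p * x for p, x in outcomes)
    variance = sum(p * (x - mean) ** 2 for p, x in outcomes)
    # Standard deviation: an assumed measure of how spread out the payoffs are.
    return mean, sqrt(variance)

coin_game = [(Fraction(1, 2), 1), (Fraction(1, 2), -1)]   # double or nothing
die_game = [(Fraction(1, 6), 5), (Fraction(5, 6), -1)]    # times six or nothing

coin_mean, coin_spread = mean_and_spread(coin_game)
die_mean, die_spread = mean_and_spread(die_game)

print(coin_mean, coin_spread)
print(die_mean, die_spread)
```

Both games have expected payoff zero, but the payoffs of the die game are more spread out, which is one way to see why it might feel riskier.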

Only 30% of students wanted to play this new game, while 70% said they would not.

I got the students who wanted to play to hand in slips of paper with their names on them. I put them in a hat and had a student randomly choose one. The winner got to play this game. He rolled a 1. So, his quiz score for the day went to zero.

Ouch!

Here is a famous beggar in San Francisco:

## Game Theory (Part 10)

5 February, 2013

Last time we solved some probability puzzles involving coin flips. This time we’ll look at puzzles involving cards.

### Permutations

Example 1. How many ways are there to order 3 cards: a jack (J), a queen (Q), and a king (K)?

By ‘order them’ I mean put one on top, then one in the middle, then one on the bottom. There are three choices for the first card: it can be J, Q, or K. That leaves two choices for what the second card can be, and just one for the third. So, there are

$3 \times 2 \times 1 = 6$

ways to order the cards.
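With only three cards we can simply list every ordering. Here is a short Python sketch that does this with the standard `itertools.permutations`; the variable names are mine.

```python
from itertools import permutations

cards = ["J", "Q", "K"]

# Each tuple is one ordering: (top card, middle card, bottom card).
orderings = list(permutations(cards))

for order in orderings:
    print(order)
print(len(orderings))
```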

Example 2. How many ways are there to order all 52 cards in an ordinary deck?

By the same reasoning, the answer is

$52 \times 51 \times 50 \times \cdots \times 2 \times 1$

This is a huge number. We call it 52 factorial, or 52! for short. I guess the exclamation mark emphasizes how huge this number is. In fact

$52! \approx 8.06 \times 10^{67}$

This is smaller than the number of atoms in the observable universe, which is about $10^{80}.$ But it’s much bigger than the number of galaxies in the observable universe, which is about $10^{11}$, or even the number of stars in the observable universe, which is roughly $10^{22}.$ It’s impressive that we can hold such a big number in our hand… in the form of possible ways to order a deck of cards!
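If you want to see all the digits of 52!, Python can compute it exactly, since its integers can be as large as you like. This is just a quick sketch using the standard `math.factorial`; the comparison at the end uses the rough figures for stars and atoms quoted above.

```python
from math import factorial

deck_orderings = factorial(52)    # exact integer, no rounding

print(deck_orderings)             # all the digits
print(f"{deck_orderings:.3e}")    # scientific notation, for comparison

# Bigger than the rough number of stars, smaller than the rough number of atoms.
print(10**22 < deck_orderings < 10**80)
```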

### A well-shuffled deck

Definition 1. We say a deck is well-shuffled if each of the possible ways of ordering the cards in the deck has the same probability.

Example 3. If a deck of cards is well-shuffled, what’s the probability that it’s in this order?

Since all orders have the same probability, and there are $52!$ of them, the probability that they’re in any particular order is

$\displaystyle{ \frac{1}{52!} \approx 1.24 \times 10^{-68} }$

### A hand from a well-shuffled deck

Suppose you take the top $k$ cards from a well-shuffled deck of $n$ cards. You’ll get a subset of cards—though card players call this a hand of cards instead of a subset. And, there are $n$ choose $k$ possible hands you could get! Remember from last time:

Definition 2. The binomial coefficient

$\displaystyle{ \binom{n}{k} = \frac{n(n-1)(n-2) \cdots (n-k+1)}{k(k-1)(k-2) \cdots 1}}$

called $n$ choose $k$ is the number of ways of choosing a subset of $k$ things from a set of $n$ things.

I guess card-players call a set a ‘deck’, and a subset a ‘hand’. But now we can write a cool new formula for $n$ choose $k.$ Just multiply the top and bottom of that big fraction by

$\displaystyle{ (n-k)(n-k-1) \cdots 1}$

We get

$\begin{array}{ccl} \displaystyle{ \binom{n}{k}} &=& \displaystyle{ \frac{n(n-1)(n-2) \cdots 1}{(k(k-1)(k-2) \cdots 1)((n-k)(n-k-1) \cdots 1)} } \\ &=& \displaystyle{ \frac{n!}{k! (n-k)!} } \end{array}$
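As a sanity check, here is a Python sketch that computes $n$ choose $k$ both ways, with the product formula and with the factorial formula, and compares them to the standard library’s `math.comb` (available in Python 3.8 and later). The two helper function names are made up for this illustration.

```python
from math import comb, factorial

def choose_by_product(n, k):
    """n(n-1)...(n-k+1) divided by k(k-1)...1."""
    top = 1
    for j in range(k):
        top *= n - j
    return top // factorial(k)

def choose_by_factorials(n, k):
    """n! divided by k! (n-k)!."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Compare both formulas with the built-in for all small n and k.
formulas_agree = all(
    choose_by_product(n, k) == choose_by_factorials(n, k) == comb(n, k)
    for n in range(20)
    for k in range(n + 1)
)
print(formulas_agree)
```

Of course checking small cases is not a proof, but it’s a cheap way to catch a slip in the algebra.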

I won’t do it here, but here’s something you can prove using stuff I’ve told you. Suppose you have a well-shuffled deck of $n$ cards and you draw a hand of $k$ cards. Then each of these hands is equally probable!

Using this we can solve lots of puzzles.

Example 4. If you draw a hand of 5 cards from a well-shuffled standard deck, what’s the probability that you get the 10, jack, queen, king and ace of spades?

Since I’m claiming that all hands are equally probable, we just need to count the number of hands, and take the reciprocal of that.

There are

$\displaystyle{ \binom{52}{5} = \frac{52 \times 51 \times 50 \times 49 \times 48}{5 \times 4 \times 3 \times 2 \times 1} }$

5-card hands drawn from a 52-card deck. So, the probability of getting any particular hand is

$\displaystyle{ \frac{1}{\binom{52}{5}} = \frac{5 \times 4 \times 3 \times 2 \times 1}{52 \times 51 \times 50 \times 49 \times 48} }$

We can simplify this a bit since 50 is 5 × 10 and 48 is twice 4 × 3 × 2 × 1. So, the probability is

$\displaystyle{ \frac{1}{52 \times 51 \times 10 \times 49 \times 2} = \frac{1}{2598960} \approx 3.85 \cdot 10^{-7}}$

### A royal flush

The hand we just saw:

{10♠, J♠, Q♠, K♠, A♠}

is an example of a ‘royal flush’… the best kind of hand in poker!

Definition 3. A straight is a hand of five cards that can be arranged in a consecutive sequence, for example:

{7♥, 8♣, 9♠, 10♠, J♦}

Definition 4. A straight flush is a straight whose cards are all of the same suit, for example:

{7♣, 8♣, 9♣, 10♣, J♣}

Definition 5. A royal flush is a straight flush where the cards go from 10 to ace, for example:

{10♠, J♠, Q♠, K♠, A♠}

Example 5. If you draw a 5-card hand from a standard deck, what is the probability that it is a royal flush?

We have seen that each 5-card hand has probability

$\displaystyle{ \frac{1}{\binom{52}{5}} = \frac{1}{2598960} }$

There are just 4 royal flushes, one for each suit. So, the probability of getting a royal flush is

$\displaystyle{ \frac{4}{\binom{52}{5}} = \frac{1}{649740} \approx 0.000154\%}$
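Here is the same arithmetic as a short Python sketch, using exact fractions so we can see the simplified form. It assumes, as in the text, that every 5-card hand is equally probable.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 5)                    # number of 5-card hands

p_particular_hand = Fraction(1, hands)
p_royal_flush = Fraction(4, hands)     # one royal flush per suit

print(hands)
print(p_particular_hand, float(p_particular_hand))
print(p_royal_flush, float(p_royal_flush))
```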

### Puzzles

Suppose you have a well-shuffled standard deck of 52 cards, and you draw a hand of 5 cards.

Puzzle 1. What is the probability that the hand is a straight flush?

Puzzle 2. What is the probability that the hand is a straight flush but not a royal flush?

Puzzle 3. What is the probability that the hand is a straight?

Puzzle 4. What is the probability that the hand is a straight but not a straight flush?

## Game Theory (Part 9)

5 February, 2013

Last time we talked about independence of a pair of events, but we can easily go on and talk about independence of a longer sequence of events. For example, suppose we have three coins. Suppose:

• the 1st coin has probability $p_H$ of landing heads up and $p_T$ of landing tails up;
• the 2nd coin has probability $q_H$ of landing heads up and $q_T$ of landing tails up;
• the 3rd coin has probability $r_H$ of landing heads up and $r_T$ of landing tails up.

Suppose we flip all of these coins: the 1st, then the 2nd, then the 3rd. What’s the probability that we get this sequence of results:

$(H, T, T)$

If the coin flips are independent, the probability is just this product:

$p_H \, q_T \, r_T$

See the pattern? We just multiply the probabilities. And there’s nothing special about coins here, or the number three. We could flip a coin, roll a die, pick a card, and see if it’s raining outside.

For example, what’s the probability that we get heads with our coin, the number 6 on our die, an ace of spades with our cards, and it’s raining? If these events are independent, we just calculate:

the probability that we get heads, times
the probability that we roll a 6, times
the probability that we get an ace of spades, times
the probability that it’s raining outside.

Let’s solve some puzzles using this idea!

### Three flips of a fair coin

Example 1. Suppose you have a fair coin: this means it has a 50% chance of landing heads up and a 50% chance of landing tails up. Suppose you flip it three times and these flips are independent. What is the probability that it lands heads up, then tails up, then heads up?

$(H, T, H)$

Since the flips are independent this is

$p_{(H,T,H)} = p_H \, p_T \, p_H$

Since the coin is fair we have

$\displaystyle{ p_H = p_T = \frac{1}{2} }$

so

$\displaystyle{ p_H p_T p_H = \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{8} }$

So the answer is 1/8, or 12.5%.

Example 2. In the same situation, what’s the probability that the coin lands heads up exactly twice?

There are 2 × 2 × 2 = 8 events that can happen:

$(H,H,H)$
$(H,H,T), \; (H,T,H), \; (T,H,H)$
$(H,T,T), \; (T,H,T), \; (T,T,H)$
$(T,T,T)$

We can work out the probability of each of these events. For example, we’ve already seen that $(H,T,H)$ is

$\displaystyle{ p_{(H,T,H)} = p_H p_T p_H = \frac{1}{8} }$

since the coin is fair and the flips are independent. In fact, all 8 probabilities work out the same way. We always get 1/8. In other words, each of the 8 events is equally likely!

But we’re interested in the probability that we get exactly two heads. That’s the probability of this subset:

$S = \{ (T,H,H), (H,T,H), (H,H,T) \}$

Using the rule we saw in Part 7, this probability is

$\displaystyle{ p(S) = p_{(T,H,H)} + p_{(H,T,H)} + p_{(H,H,T)} = 3 \times \frac{1}{8} }$

So the answer is 3/8, or 37.5%.

I could have done this a lot faster. I could say “there are 8 events that can happen, each equally likely, and three that give us two heads, so the probability is 3/8.” But I wanted to show you how we’re just following rules we’ve already seen!
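The slow way of doing this is easy to hand to a computer. Here is a Python sketch that lists all 8 events, multiplies probabilities for each one (assuming independent flips of a fair coin), and adds up the events with exactly two heads. The helper name `prob` is my own.

```python
from fractions import Fraction
from itertools import product

p = {"H": Fraction(1, 2), "T": Fraction(1, 2)}   # a fair coin

# All 2 x 2 x 2 = 8 sequences of three flips.
events = list(product("HT", repeat=3))

def prob(event):
    """Probability of one sequence of independent flips."""
    result = Fraction(1)
    for flip in event:
        result *= p[flip]
    return result

# The subset S of events with exactly two heads.
S = [event for event in events if event.count("H") == 2]
p_S = sum(prob(event) for event in S)

print(len(events), len(S), p_S)
```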

### Three flips of a very unfair coin

Example 3. Now suppose we have an unfair coin with a 90% chance of landing heads up and 10% chance of landing tails up! What’s the probability that if we flip it three times, it lands heads up exactly twice? Again let’s assume the coin flips are independent.

Most of the calculation works exactly the same way, but now our coin has

$\displaystyle{ p_H = 0.9, \quad p_T = 0.1 }$

We’re interested in the events where the coin comes up heads twice, so we look at this subset:

$S = \{ (T,H,H), (H,T,H), (H,H,T) \}$

The probability of this subset is

$\begin{array}{ccl} p(S) &=& p_{(T,H,H)} + p_{(H,T,H)} + p_{(H,H,T)} \\ &=& p_T \, p_H \, p_H + p_H \, p_T \, p_H + p_H \, p_H \, p_T \\ &=& 3 p_T p_H^2 \\ &=& 3 \times 0.1 \times 0.9^2 \\ &=& 0.3 \times 0.81 \\ &=& 0.243 \end{array}$

So now the probability is just 24.3%.
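The same brute-force idea works for an unfair coin: the events are no longer equally likely, so we really do need to multiply and add. Here is a Python sketch; the function name `prob_of_heads_count` is hypothetical, and it assumes the flips are independent.

```python
from fractions import Fraction
from itertools import product

def prob_of_heads_count(p_heads, flips, heads_wanted):
    """Probability of exactly `heads_wanted` heads in `flips` independent flips."""
    p = {"H": p_heads, "T": 1 - p_heads}
    total = Fraction(0)
    for event in product("HT", repeat=flips):
        if event.count("H") == heads_wanted:
            weight = Fraction(1)
            for flip in event:
                weight *= p[flip]    # independence: multiply the probabilities
            total += weight
    return total

unfair = prob_of_heads_count(Fraction(9, 10), 3, 2)
fair = prob_of_heads_count(Fraction(1, 2), 3, 2)

print(unfair, float(unfair))
print(fair, float(fair))
```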

### Six flips of a fair coin

Example 4. Suppose you have a fair coin. Suppose you flip it six times and these flips are independent. What is the probability that it lands heads up exactly twice?

We did a similar problem already, where we flipped the coin three times. Go back and look at that if you forget! The answer to that problem was

$\displaystyle{ 3 \times \frac{1}{8} }$

Why? Here’s why: there were 3 ways to get two heads when you flipped 3 coins, and each of these events had probability

$\displaystyle{ \left(\frac{1}{2}\right)^3 = \frac{1}{8} }$

We can do our new problem the same way. Count the number of ways to get two heads when we flip six coins. Then multiply this by

$\displaystyle{ \left(\frac{1}{2}\right)^6 = \frac{1}{64} }$

The hard part is to count how many ways we can get two heads when we flip six coins. To get good at probabilities, we have to get good at counting. It’s boring to list all the events we’re trying to count:

(H,H,T,T,T,T), (H,T,H,T,T,T), (H,T,T,H,T,T), …

So let’s try to come up with a better idea.

We have to pick 2 out of our 6 flips to be H’s. How many ways are there to do this?

There are 6 ways to pick one of the flips and draw a red H on it, and then 5 ways left over to pick another and draw a blue H on it… letting the rest be T’s. For example, writing the colors as subscripts:

$(T, H_{\textrm{red}}, T, T, H_{\textrm{blue}}, T)$

So, we’ve got 6 × 5 = 30 choices. But we don’t really care which H is red and which H is blue—that’s just a trick to help us solve the problem. For example, we don’t want to count

$(T, H_{\textrm{red}}, T, T, H_{\textrm{blue}}, T)$

as different from

$(T, H_{\textrm{blue}}, T, T, H_{\textrm{red}}, T)$

So, there aren’t really 30 ways to get two heads. There are only half as many! There are 15 ways.

So, the probability of getting two heads when we flip the coin six times is

$\displaystyle{ 15 \times \frac{1}{64} = \frac{15}{64} \approx .234 }$

where the squiggle means ‘approximately’. So: about 23.4%.
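The red-and-blue trick can be acted out in a few lines of Python. This sketch counts the ordered picks of two positions (where the red H goes, then where the blue H goes), and then forgets the colors by turning each pick into an unordered set. The variable names are mine.

```python
from fractions import Fraction
from itertools import permutations

positions = range(6)

# Ordered picks: (position of the red H, position of the blue H).
colored_picks = list(permutations(positions, 2))

# Forget the colors: only the set of two positions matters.
uncolored_picks = {frozenset(pick) for pick in colored_picks}

probability = Fraction(len(uncolored_picks), 2**6)

print(len(colored_picks), len(uncolored_picks), probability)
```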

### Binomial coefficients

Now for some jargon, which will help when we do harder problems like this. We say there are 6 choose 2 ways to choose 2 out of 6 things, and we write this as

$\displaystyle{ \binom{6}{2} }$

This sort of number is called a binomial coefficient.

We’ve just shown that

$\displaystyle{ \binom{6}{2} = \frac{6 \times 5}{2 \times 1} = 15 }$

Why write it like this funky fraction: $\frac{6 \times 5}{2 \times 1}$? Because it’ll help us see the pattern for doing harder problems like this!

### Nine flips of a fair coin

If we flip a fair coin 9 times, and the flips are independent, what’s the probability that we get heads exactly 6 times?

This works just like the last problem, only the numbers are bigger. So, I’ll do it faster!

When we flip the coin 9 times there are $2^9$ possible events that can happen. Each of these is equally likely if it’s a fair coin and the flips are independent. So each has probability

$\displaystyle{ \frac{1}{2^9} }$

To get the answer, we need to multiply this by the number of ways we can get heads exactly 6 times. This number is called ‘9 choose 6’ or

$\displaystyle{ \binom{9}{6} }$

for short. It’s the number of ways we can choose 6 things out of a collection of 9.

So we just need to know: what’s 9 choose 6? Choosing which 6 flips are heads is the same as choosing which 3 flips are tails, so we can count the ways to choose the 3 tails, working as before. There are 9 ways to pick one of the flips and draw a red T on it, then 8 ways left to pick another and draw a blue T on it, and 7 ways left to pick a third and draw a green T on it. That sounds like 9 × 8 × 7.

But we’ve overcounted! After all, we don’t care about the colors. We don’t care about the difference between this:

$(H, T_{\textrm{red}}, H, H, T_{\textrm{blue}}, H, H, T_{\textrm{green}}, H)$

and this:

$(H, T_{\textrm{blue}}, H, H, T_{\textrm{green}}, H, H, T_{\textrm{red}}, H)$

In fact we’ve counted each possibility 6 times! Why six? The first T could be red, green or blue—that’s 3 choices. But then the second T could be either of the 2 remaining colors… and for the third, we just have 1 choice. So there are 3 × 2 × 1 = 6 ways to permute the colors.

So, the actual number of ways to get 6 heads out of 9 coin flips is

$\displaystyle{ \frac{9 \times 8 \times 7}{3 \times 2 \times 1} }$

In other words:

$\displaystyle{ \binom{9}{6} = \frac{9 \times 8 \times 7}{3 \times 2 \times 1} }$

To get the answer to our actual problem, remember we need to multiply $1/2^9$ by this. So the answer is

$\displaystyle{ \frac{1}{2^9} \times \binom{9}{6} }$

If you’re a pure mathematician, you can say you’re done now. But normal people won’t understand this answer, so let’s calculate it out. I hope you know the powers of two up to $2^{10}$: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024. So:

$\displaystyle{ 2^9 = 512 }$

I hope you can also do basic arithmetic like this:

$\displaystyle{ \binom{9}{6} = \frac{9 \times 8 \times 7}{3 \times 2 \times 1} = 84}$

So, the probability of getting 6 heads when you do 9 independent flips of a fair coin is

$\displaystyle{ \frac{1}{2^9} \times \binom{9}{6} = \frac{84}{512} = 0.1640625 }$

or about 16.4%. I broke down and used a calculator at the last step. We’re becoming serious nerds here.
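If you’d rather not trust a calculator, here is the last step as a Python sketch with exact fractions. It uses the standard `math.comb` (Python 3.8 or later) for the binomial coefficient and assumes a fair coin with independent flips.

```python
from fractions import Fraction
from math import comb

ways = comb(9, 6)                      # ways to get exactly 6 heads in 9 flips
probability = Fraction(ways, 2**9)     # each of the 2^9 events is equally likely

print(ways, probability, float(probability))

# Choosing the 6 heads is the same as choosing the 3 tails.
print(comb(9, 6) == comb(9, 3))
```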

Okay, that’s enough for now. We’ve been counting how many ways we can get a certain number of heads from a certain number of coin flips. What we’re really doing is taking a set of coin flips, say $n$ of them, and choosing a subset of $k$ of them to be heads. So, we say

Definition. The binomial coefficient

$\displaystyle{ \binom{n}{k} }$

called $n$ choose $k,$ is the number of ways of choosing a subset of $k$ things from a set of $n$ things.

We have seen in some examples that

$\displaystyle{ \binom{n}{k} = \frac{n(n-1)(n-2) \cdots (n-k+1)}{k(k-1)(k-2) \cdots 1} }$

Here there’s a product of $k$ consecutive numbers on top, and $k$ on bottom too. We didn’t prove this is true in general, but it’s not hard to see, using the tricks we’ve used already.
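We didn’t prove the formula, but we can at least test it against the definition. This Python sketch counts subsets directly with `itertools.combinations` and compares the count to the product formula for all small cases. The function names are my own, and a check like this is evidence, not a proof.

```python
from itertools import combinations
from math import factorial

def choose(n, k):
    """Product formula: n(n-1)...(n-k+1) divided by k(k-1)...1."""
    top = 1
    for j in range(k):
        top *= n - j
    return top // factorial(k)

def count_subsets(n, k):
    """Count the k-element subsets of an n-element set by listing them."""
    return sum(1 for _ in combinations(range(n), k))

formula_matches = all(
    choose(n, k) == count_subsets(n, k)
    for n in range(10)
    for k in range(n + 1)
)
print(formula_matches, choose(9, 6), count_subsets(9, 6))
```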

## Game Theory (Part 8)

28 January, 2013

Last time we learned some rules for calculating probabilities. But we need a few more rules to get very far.

For example:

We say a coin is fair if it has probability 1/2 of landing heads up and probability 1/2 of landing tails up. What is the probability that if we flip two fair coins, both will land heads up?

Since each coin could land heads up or tails up, there are 4 events to consider here:

$(H,H), (H,T),$
$(T,H), (T,T)$

It seems plausible that each should be equally likely. If so, each has probability 1/4. So then the answer to our question would be 1/4.

But this is plausible only because we’re assuming that what one coin does doesn’t affect what the other one does! In other words, we’re assuming the two coin flips are ‘independent’.

If the coins were connected in some sneaky way, maybe each time one landed heads up, the other would land tails up. Then the answer to our question would be zero. Of course this seems silly. But it’s good to be very clear about this issue… because sometimes one event does affect another!

For example, suppose there’s a 5% probability of rain each day in the winter in Riverside. What’s the probability that it rains two days in a row? Remember that 5% is 0.05. So, you might guess the answer is

$0.05 \times 0.05 = 0.0025$

But this is wrong, because if it rains one day, that increases the probability that it will rain the next day. In other words, these events aren’t independent.

But if two events are independent, there’s an easy way to figure out the probability that they both happen: just multiply their probabilities! For example, if the chance that it will rain today in Riverside is 5% and the chance that it will rain tomorrow in Singapore is 60%, the chance that both these things will happen is

$0.05 \times 0.6 = 0.03$

or 3%, if these events are independent. I could try to persuade you that this is a good rule, and maybe I will… but for now let’s just state it in a general way.

### Independence

So, let’s make a precise definition out of all this! Suppose we have two sets of events, $X$ and $Y.$ Remember that $X \times Y$, the Cartesian product of the sets $X$ and $Y$, is the set of all ordered pairs $(i,j)$ where $i \in X$ and $j \in Y$:

$X \times Y = \{ (i,j) : \; i \in X, j \in Y \}$

So, an event in $X \times Y$ consists of an event in $X$ and an event in $Y$. For example, if

$X = \{ \textrm{rain today}, \textrm{no rain today} \}$

and

$Y = \{ \textrm{rain tomorrow}, \textrm{no rain tomorrow} \}$

then

$X \times Y = \begin{array}{l} \{ \textrm{(rain today, rain tomorrow)}, \\ \textrm{(no rain today, rain tomorrow)}, \\ \textrm{(rain today, no rain tomorrow)}, \\ \textrm{(no rain today, no rain tomorrow)} \} \end{array}$

Now we can define ‘independence’. It’s a rule for getting a probability distribution on $X \times Y$ from probability distributions on $X$ and $Y$:

Definition. Suppose $p$ is a probability distribution on a set of events $X,$ and $q$ is a probability distribution on a set of events $Y.$ If these events are independent, we use the probability distribution $r$ on $X \times Y$ given by

$r_{(i,j)} = p_i q_j$

People often call this probability distribution $p \times q$ instead of $r$.

### Examples

Example 1. Suppose we have a fair coin. This means we have a set of events

$X = \{H, T \}$

and a probability distribution $p$ with

$\displaystyle{ p_H = p_T = \frac{1}{2} }$

Now suppose we flip it twice. We get a set of four events:

$X \times X = \{(H,H), (H,T), (T,H), (T,T)\}$

Suppose the two coin flips are independent. Then we describe the pair of coin flips using the probability measure $r = p \times p$ on $X \times X,$ with

$\displaystyle{ r_{(H,H)} = p_H p_H = \frac{1}{4} }$

$\displaystyle{ r_{(H,T)} = p_H p_T = \frac{1}{4} }$

$\displaystyle{ r_{(T,H)} = p_T p_H = \frac{1}{4} }$

$\displaystyle{ r_{(T,T)} = p_T p_T = \frac{1}{4} }$

So, each of the four events—“heads, heads” and so on—has probability 1/4. This is fairly boring: you should have known this already!

But now we can do a harder example:

Example 2. Suppose we have an unfair coin that has a 60% chance of landing heads up and a 40% chance of landing tails up. Now we have a new probability distribution on $X,$ say $q$:

$\displaystyle{ q_H = .6, \quad q_T = .4 }$

Now say we flip this coin twice. What are the probabilities of the four different events that can happen? Let’s assume the two coin flips are independent. This means we should describe the pair of coin flips with a probability measure $s = q \times q$ on $X \times X.$ This tells us the answer to our question. We can work it out:

$\displaystyle{ s_{(H,H)} = q_H q_H = 0.6 \times 0.6 = 0.36 }$

$\displaystyle{ s_{(H,T)} = q_H q_T = 0.6 \times 0.4 = 0.24 }$

$\displaystyle{ s_{(T,H)} = q_T q_H = 0.4 \times 0.6 = 0.24 }$

$\displaystyle{ s_{(T,T)} = q_T q_T = 0.4 \times 0.4 = 0.16 }$
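The rule $r_{(i,j)} = p_i q_j$ is easy to turn into code. Here is a Python sketch that builds the product distribution as a dictionary keyed by ordered pairs and applies it to the unfair coin from this example. The function name `product_distribution` is hypothetical, and the rule only describes the pair of flips if they are independent.

```python
from fractions import Fraction

def product_distribution(p, q):
    """Distribution on ordered pairs (i, j) with probability p[i] * q[j]."""
    return {(i, j): p[i] * q[j] for i in p for j in q}

# The unfair coin: 60% heads, 40% tails.
q = {"H": Fraction(3, 5), "T": Fraction(2, 5)}

s = product_distribution(q, q)

for event, probability in s.items():
    print(event, float(probability))

# The four probabilities should still add up to 1.
print(sum(s.values()))
```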

Puzzle 1. In this situation what is the probability that when we flip the coin twice it comes up heads exactly once?

Puzzle 2. In this situation what is the probability that when we flip the coin twice it comes up heads at least once?

For these puzzles you need to use what I told you in the section on ‘Probabilities of subsets’ near the end of Part 7.

Puzzle 3. Now suppose we have one fair coin and one coin that has a 60% chance of landing heads up. The first one is described by the probability distribution $p,$ while the second is described by $q.$ How likely is it that the first lands heads up and the second lands tails up? We can answer questions like this if the coin flips are independent. We do this by multiplying $p$ and $q$ to get a probability measure $t = p \times q$ on $X \times X.$ Remember the rule for how to do this:

$t_{(i,j)} = p_i q_j$

where each of $i$ and $j$ can be either $H$ or $T.$

What are these probabilities:

$\displaystyle{ t_{(H,H)} = ? }$

$\displaystyle{ t_{(H,T)} = ? }$

$\displaystyle{ t_{(T,H)} = ? }$

$\displaystyle{ t_{(T,T)} = ? }$

Puzzle 4. In this situation what is the probability that exactly one coin lands heads up?

Puzzle 5. In this situation what is the probability that at least one coin lands heads up?

Next time we’ll go a lot further…

## Game Theory (Part 7)

26 January, 2013

We need to learn a little probability theory to go further in our work on game theory.

We’ll start with some finite set $X$ of ‘events’. The idea is that these are things that can happen—for example, choices you could make while playing a game. A ‘probability distribution’ on this set assigns to each event a number called a ‘probability’—which says, roughly speaking, how likely that event is. If we’ve got some event $i,$ we’ll call its probability $p_i.$

For example, suppose we’re interested in whether it will rain today or not. Then we might look at a set of two events:

$X = \{\textrm{rain}, \textrm{no rain} \}$

If the weatherman says the chance of rain is 20%, then

$p_{\textrm{rain} } = 0.2$

since 20% is just a fancy way of saying 0.2. The chance of no rain will then be 80%, or 0.8, since the probabilities should add up to 1:

$p_{\textrm{no rain}} = 0.8$

Let’s make this precise with an official definition:

Definition. Given a finite set $X$ of events, a probability distribution $p$ assigns a real number $p_i$ called a probability to each event $i \in X,$ such that:

1) $0 \le p_i \le 1$

and

2) $\displaystyle{ \sum_{i \in X} p_i = 1}$
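Here is one way to write this definition as a check in Python. It is only a sketch: I represent a probability distribution as a dictionary from events to exact fractions, and the function name `is_probability_distribution` is my own invention.

```python
from fractions import Fraction

def is_probability_distribution(p):
    """Check rules 1) and 2) for a dictionary mapping events to numbers."""
    in_range = all(0 <= p_i <= 1 for p_i in p.values())   # rule 1)
    sums_to_one = sum(p.values()) == 1                    # rule 2)
    return in_range and sums_to_one

weather = {"rain": Fraction(1, 5), "no rain": Fraction(4, 5)}
too_small = {"rain": Fraction(1, 5), "no rain": Fraction(1, 5)}
out_of_range = {"rain": Fraction(3, 2), "no rain": Fraction(-1, 2)}

print(is_probability_distribution(weather))
print(is_probability_distribution(too_small))
print(is_probability_distribution(out_of_range))
```

The last example shows why we need both rules: its numbers add up to 1, but they are not between 0 and 1.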

Note that this official definition doesn’t say what an event really is, and it doesn’t say what probabilities really mean. But that’s how it should be! As usual with math definitions, the words in boldface could be replaced by any other words and the definition would still do its main job, which is to let us prove theorems involving these words. If we wanted, we could call an event a doohickey, and call a probability a schnoofus. All our theorems would still be true.

Of course we hope our theorems will be useful in real world applications. And in these applications, the probabilities $p_i$ will be some way of measuring ‘how likely’ events are. But it’s actually quite hard to say precisely what probabilities really mean! People have been arguing about this for centuries. So it’s good that we separate this hard task from our definition above, which is quite simple and 100% precise.

Why is it hard to say what probabilities really are? Well, what does it mean to say “the probability of rain is 20%”? Suppose you see a weather report and read this. What does it mean?

A student suggests: “it means that if you looked at a lot of similar days, it would rain on 20% of them.”

Yes, that’s pretty good. But what counts as a “similar day”? How similar does it have to be? Does everyone have to wear the same clothes? No, that probably doesn’t matter, because presumably it doesn’t affect the weather. But what does affect the weather? A lot of things! Do all those things have to be exactly the same for it to count as a similar day?

And what counts as a “lot” of days? How many do we need?

And it won’t rain on exactly 20% of those days. How close do we need to get?

Imagine I have a coin and I claim it lands heads up 50% of the time. Say I flip it 10 times and it lands heads up every time. Does that mean I was wrong? Not necessarily. It’s possible that the coin will do this. It’s just not very probable.

But look: now we’re using the word ‘probable’, which is the word we’re trying to understand! It’s getting sort of circular: we’re saying a coin has a 50% probability of landing heads up if when you flip it a lot of times, it probably lands heads up close to 50% of the time. That’s not very helpful if you don’t already have some idea what ‘probability’ means.

For all these reasons, and many more, it’s tricky to say exactly what probabilities really mean. People have made a lot of progress on this question, but we will sidestep it and focus on learning to calculate with probabilities.

If you want to dig in a bit deeper, try this:

Probability interpretations, Wikipedia.

### Equally likely events

As I’ve tried to convince you, it can be hard to figure out the probabilities of events. But it’s easy if we assume all the events are equally likely.

Suppose we have a set $X$ consisting of $n$ events. And suppose that all the probabilities $p_i$ are equal: say for some constant $c$ we have

$p_i = c$

for all $i \in X.$ Then by rule 2) above,

$\displaystyle{ 1 = \sum_{i \in X} p_i = \sum_{i \in X} c = n c }$

since we’re just adding the number $c$ to itself $n$ times. So,

$\displaystyle{ c = \frac{1}{n} }$

and thus

$\displaystyle{ p_i = \frac{1}{n} }$

for all $i \in X$.

I made this look harder than it really is. I was just trying to show you that it follows from the definitions, not any intuition. But it’s obvious: if you have $n$ events that are equally likely, each one has probability $1/n.$
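In code, the equally likely case is a one-liner. Here is a Python sketch; the function name `uniform_distribution` and the six-sided die are assumptions I’m adding for illustration.

```python
from fractions import Fraction

def uniform_distribution(events):
    """Give each of the n events the same probability 1/n."""
    events = list(events)
    n = len(events)
    return {i: Fraction(1, n) for i in events}

coin = uniform_distribution(["H", "T"])
die = uniform_distribution(range(1, 7))   # a six-sided die, for illustration

print(coin)
print(die[6], sum(die.values()))
```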

Example 1. Suppose we have a coin that can land either heads up or tails up—let’s ignore the possibility that it lands on its edge! Then

$X = \{ H, T\}$

If we assume these two events are equally probable, we must have

$\displaystyle{ p_H = p_T = \frac{1}{2} }$

Note I said “if we assume” these two events are equally probable. I didn’t say they actually are! Are they? Suppose we take a penny and flip it a zillion times. Will it land heads up almost exactly half a zillion times?

Probably not! The treasury isn’t interested in making pennies that do this. They’re interested in making the head look like Lincoln, and the tail look like the Lincoln national monument:

Or at least they used to. Since the two sides are different, there’s no reason they should have the exact same probability of landing on top.

In fact nobody seems to have measured the difference between the probabilities of heads and tails for flipped pennies. For hand-flipped pennies, it seems whichever side starts on top has a roughly 51% chance of landing on top! But if you spin a penny, it’s much more likely to land tails up:

The coin flip: a fundamentally unfair proposition?, Coding the Wheel.

Example 2. Suppose we have a standard deck of cards, well-shuffled, and assume that when I draw a card from this deck, each card is equally likely to be chosen. What is the probability that I draw the ace of spades?

If there’s no joker in the deck, there are 52 cards, so the answer is 1/52.

Let me remind you how a deck of cards works: I wouldn’t want someone to fail the course because they never played cards! Here are the 52 cards in a standard deck:

As you can see, they come in 4 kinds, called suits. The suits are:

• clubs: ♣

• spades: ♠

• diamonds: ♦

• hearts: ♥

Two suits are black and two are red. Each suit has 13 cards in it, for a total of 4 × 13 = 52. The cards in each suit are numbered from 1 to 13, except for four exceptions. They go like this:

A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K

A stands for ‘ace’, J for ‘jack’, Q for ‘queen’ and K for ‘king’.

### Probabilities of subsets

If we know a probability distribution on a finite set $X$, we can define the probability that an event in some subset $S \subseteq X$ will occur. We define this to be

$\displaystyle{p(S) = \sum_{i \in S} p_i }$

For example, I usually have one of three things for breakfast:

$X = \{ \textrm{oatmeal}, \textrm{waffles}, \textrm{eggs} \}$

I have an 86% chance of eating oatmeal for breakfast, a 10% chance of eating waffles, and a 4% chance of eating eggs and toast. What’s the probability that I will eat oatmeal or waffles? These choices form the subset

$S = \{ \textrm{oatmeal}, \textrm{waffles} \}$

and the probability for this subset is

$p(S) = p_{\textrm{oatmeal}} + p_{\textrm{waffles}} = 0.86 + 0.1 = 0.96$

Here’s an example from cards:

Example 3. Suppose we have a standard deck of cards, well-shuffled, and assume that when I draw a card from this deck, each card is equally likely to be chosen. What is the probability that I draw a card in the suit of hearts?

Since there are 13 cards in the suit of hearts, each with probability 1/52, we add up their probabilities and get

$\displaystyle{ 13 \times \frac{1}{52} = \frac{1}{4} }$

This should make sense, since there are 4 suits, and as many cards in each suit.
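The subset formula is easy to check by machine. Here is a minimal Python sketch, assuming a 52-card deck with no joker where every card is equally likely; the names `deck`, `p` and `prob` are just illustrative choices, not anything from the course.

```python
from fractions import Fraction

SUITS = ["clubs", "spades", "diamonds", "hearts"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

# The set X of events: one event per card (assumes no joker in the deck).
deck = [(rank, suit) for suit in SUITS for rank in RANKS]

# Equally likely events: each of the n cards gets probability 1/n.
p = {card: Fraction(1, len(deck)) for card in deck}

def prob(subset, dist):
    """Probability of a subset S: the sum of p_i over all i in S."""
    return sum((dist[event] for event in subset), Fraction(0))

hearts = [card for card in deck if card[1] == "hearts"]
print(prob(hearts, p))             # 1/4
print(prob([("A", "spades")], p))  # 1/52
```

Using `Fraction` instead of floating point keeps the arithmetic exact, so the probabilities really do add up to 1.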

### Card tricks

This is just a fun digression. The deck of cards involves some weird numerology. For starters, it has 52 cards. That’s a strange number! Where else have you seen this number?

A student says: “It’s the number of weeks in a year.”

Right! And these 52 cards are grouped in 4 suits. What does the year have 4 of?

A student says: “Seasons!”

Right! And we have 52 = 4 × 13. So what are there 13 of?

A student says: “Weeks in a season!”

Right! I have no idea if this is a coincidence or not. And have you ever added up the values of all the cards in a suit, where we count the ace as 1, and so on? We get

1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13

And what’s that equal to?

After a long pause, a student says “91.”

Yes, that’s a really strange number. But let’s say we total up the values of all the cards in the deck, not just one suit. What do we get?

A student says “We get 4 × 91… or 364.”

Right. Three-hundred and sixty-four. Almost the number of days in a year.

“So add one more: the joker! Then you get 365!”

Right, maybe that’s why they put an extra card called the joker in the deck:

One extra card for one extra day, joker-day… April Fool’s Day! That brings the total up to 365.

Again, I have no idea if this is a coincidence or not. But the people who invented the Tarot deck were pretty weird—they packed it with symbolism—so maybe the ordinary cards were designed this way on purpose too.

Puzzle. What are the prime factors of the number 91? You should know by now… and you should know what they have to do with the calendar!

## Petri Net Programming (Part 2)

20 December, 2012

guest post by David A. Tanzer

### An introduction to stochastic Petri nets

In the previous article, I explored a simple computational model called Petri nets. They are used to model reaction networks, and have applications in a wide variety of fields, including population ecology, gene regulatory networks, and chemical reaction networks. I presented a simulator program for Petri nets, but it had an important limitation: the model and the simulator contain no notion of the rates of the reactions. But these rates critically determine the character of the dynamics of the network.

Here I will introduce the topic of ‘stochastic Petri nets,’ which extends the basic model to include reaction dynamics. Stochastic means random, and it is presumed that there is an underlying random process that drives the reaction events. This topic is rich in both its mathematical foundations and its practical applications. A direct application of the theory yields the rate equation for chemical reactions, which is a cornerstone of chemical reaction theory. The theory also gives algorithms for analyzing and simulating Petri nets.

We are now entering the ‘business’ of software development for applications to science. The business logic here is nothing but math and science itself. Our study of this logic is not an academic exercise that is tangential to the implementation effort. Rather, it is the first phase of a complete software development process for scientific programming applications.

The end goals of this series are to develop working code to analyze and simulate Petri nets, and to apply these tools to informative case studies. But we have some work to do en route, because we need to truly understand the models in order to properly interpret the algorithms. The key questions here are when, why, and to what extent the algorithms give results that are empirically predictive. We will therefore be embarking on some exploratory adventures into the relevant theoretical foundations.

The overarching subject area to which stochastic Petri nets belong has been described as stochastic mechanics in the network theory series here on Azimuth. The theme development here will partly parallel that of the network theory series, but with a different focus, since I am addressing a computationally oriented reader. For an excellent text on the foundations and applications of stochastic mechanics, see:

• Darren Wilkinson, Stochastic Modelling for Systems Biology, Chapman and Hall/CRC Press, Boca Raton, Florida, 2011.

### Review of basic Petri nets

A Petri net is a graph with two kinds of nodes: species and transitions. The net is populated with a collection of ‘tokens’ that represent individual entities. Each token is attached to one of the species nodes, and this attachment indicates the type of the token. We may therefore view a species node as a container that holds all of the tokens of a given type.

The transitions represent conversion reactions between the tokens. Each transition is ‘wired’ to a collection of input species-containers, and to a collection of output containers. When it ‘fires’, it removes one token from each input container, and deposits one token to each output container.

Here is the example we gave, for a simplistic model of the formation and dissociation of H2O molecules:

The circles are for species, and the boxes are for transitions.

The transition combine takes in two H tokens and one O token, and outputs one H2O token. The reverse transition is split, which takes in one H2O, and outputs two H’s and one O.
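As a reminder of how firing works, here is a minimal Python sketch of this net. It is not the simulator from the previous article: the representation of a transition as a pair of multisets, and the names `TRANSITIONS`, `can_fire` and `fire`, are assumptions made for illustration.

```python
from collections import Counter

# Each transition is a pair (inputs, outputs), both multisets of species.
# Hypothetical representation chosen for this sketch.
TRANSITIONS = {
    "combine": (Counter({"H": 2, "O": 1}), Counter({"H2O": 1})),
    "split": (Counter({"H2O": 1}), Counter({"H": 2, "O": 1})),
}

def can_fire(marking, name):
    """A transition is enabled when every input container holds enough tokens."""
    inputs, _ = TRANSITIONS[name]
    return all(marking[species] >= count for species, count in inputs.items())

def fire(marking, name):
    """Return the new marking after firing the named transition once."""
    if not can_fire(marking, name):
        raise ValueError(f"transition {name!r} is not enabled")
    inputs, outputs = TRANSITIONS[name]
    new_marking = Counter(marking)
    new_marking.subtract(inputs)   # remove the input tokens
    new_marking.update(outputs)    # deposit the output tokens
    return new_marking

marking = Counter({"H": 5, "O": 3, "H2O": 4})
after = fire(marking, "combine")
print(dict(after))  # {'H': 3, 'O': 2, 'H2O': 5}
```

Firing `split` on the result brings the marking back to where it started, since the two transitions are exact opposites.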

An important application of Petri nets is to the modeling of biochemical reaction networks, which include the gene regulatory networks. Since genes and enzymes are molecules, and their binding interactions are chemical reactions, the Petri net model is directly applicable. For example, consider a transition that inputs one gene G, one enzyme E, and outputs the molecular form G • E in which E is bound to a particular site on G.

Applications of Petri nets may differ widely in terms of the population sizes involved in the model. In general chemistry reactions, the populations are measured in units of moles (where a mole is ‘Avogadro’s number’ $6.022 \cdot 10^{23}$ entities). In gene regulatory networks, on the other hand, there may only be a handful of genes and enzymes involved in a reaction.

This difference in scale leads to a qualitative difference in the modelling. With small population sizes, the stochastic effects will predominate, but with large populations, a continuous, deterministic, average-based approximation can be used.

### Representing Petri nets by reaction formulas

Petri nets can also be represented by formulas used for chemical reaction networks. Here is the formula for the Petri net shown above:

H2O ↔ H + H + O

or the more compact:

H2O ↔ 2 H + O

The double arrow is a compact designation for two separate reactions, which happen to be opposites of each other.

By the way, this reaction is not physically realistic, because one doesn’t find isolated H and O atoms traveling around and meeting up to form water molecules. This is the actual reaction pair that predominates in water:

2 H2O ↔ OH- + H3O+

Here, a hydrogen nucleus H+, with one unit of positive charge, gets removed from one of the H2O molecules, leaving behind the hydroxide ion OH-. In the same stroke, this H+ gets re-attached to the other H2O molecule, which thereby becomes a hydronium ion, H3O+.

For a more detailed example, consider this reaction chain, which is of concern to the ocean environment:

CO2 + H2O ↔ H2CO3 ↔ H+ + HCO3-

This shows the formation of carbonic acid, namely H2CO3, from water and carbon dioxide. The next reaction represents the splitting of carbonic acid into a hydrogen ion and a negatively charged bicarbonate ion, HCO3-. There is a further reaction, in which a bicarbonate ion further ionizes into an H+ and a doubly negative carbonate ion $\mathrm{CO}_3^{2-}$. As the diagram indicates, for each of these reactions, a reverse reaction is also present. For a more detailed description of this reaction network, see:

• Stephen E. Bialkowski, Carbon dioxide and carbonic acid.

Increased levels of CO2 in the atmosphere will change the balance of these reactions, leading to a higher concentration of hydrogen ions in the water, i.e., a more acidic ocean. This is of concern because the metabolic processes of aquatic organisms are sensitive to the pH level of the water. The ultimate concern is that entire food chains could be disrupted, if some of the organisms cannot survive in a lower pH environment. See the Wikipedia page on ocean acidification for more information.

Exercise. Draw Petri net diagrams for these reaction networks.

### Motivation for the study of Petri net dynamics

The relative rates of the various reactions in a network critically determine the qualitative dynamics of the network as a whole. This is because the reactions are ‘competing’ with each other, and so their relative rates determine the direction in which the state of the system is changing. For instance, if molecules are breaking down faster than they are being formed, then the system is moving towards full dissociation. When the rates are equal, the processes balance out, and the system is in an equilibrium state. Then, there are only temporary fluctuations around the equilibrium conditions.

The rates of the reactions will depend on the number of tokens present in the system. For example, if the token count for any of a transition’s input species is zero, then the transition can’t fire, and so its rate must be zero. More generally, when there are few input tokens available, there will be fewer reaction events, and so the firing rates will be lower.

Given a specification for the rates in a reaction network, we can then pose the following kinds of questions about its dynamics:

• Does the network have an equilibrium state?

• If so, what are the concentrations of the species at equilibrium?

• How quickly does it approach the equilibrium?

• At the equilibrium state, there will still be temporary fluctuations around the equilibrium concentrations. What are the variances of these fluctuations?

• Are there modes in which the network will oscillate between states?

This is the grail we seek.

Aside from actually performing empirical experiments, such questions can be addressed either analytically or through simulation methods. In either case, our first step is to define a theoretical model for the dynamics of a Petri net.

### Stochastic Petri nets

A stochastic Petri net (with kinetics) is a Petri net that is augmented with a specification for the reaction dynamics. It is defined by the following:

• An underlying Petri net, which consists of species, transitions, an input map, and an output map. These maps assign to each transition a multiset of species. (Multiset means that duplicates are allowed.) Recall that the state of the net is defined by a marking function, that maps each species to its population count.

• A rate constant that is associated with each transition.

• A kinetic model, that gives the expected firing rate for each transition as a function of the current marking. Normally, this kinetic function will include the rate constant as a multiplicative factor.

A further ‘sanity constraint’ can be put on the kinetic function for a transition: it should give a positive value if and only if all of its inputs are positive.

• A stochastic model, which defines the probability distribution of the time intervals between firing events. This specific distribution of the firing intervals for a transition will be a function of the expected firing rate in the current marking.

This definition is based on the standard treatments found, for example in:

• M. Ajmone Marsan, Stochastic Petri nets: an elementary introduction, in Advances in Petri Nets, Springer, Berlin, 1989, 1–23.

or Wilkinson’s book mentioned above. I have also added an explicit mention of the kinetic model, based on the ‘kinetics’ described here:

• Martin Feinberg, Lectures on chemical reaction networks.

There is an implied random process that drives the reaction events. A classical random process is given by a container with ‘particles’ that are randomly traveling around, bouncing off the walls, and colliding with each other. This is the general idea behind Brownian motion. It is called a random process because the outcome results from an ‘experiment’ that is not fully determined by the input specification. In this experiment, you pour in the ingredients (particles of different types), set the temperature (the distributions of the velocities), give it a stir, and then see what happens. The outcome consists of the paths taken by each of the particles.

In an important limiting case, the stochastic behavior becomes deterministic, and the population sizes become continuous. To see this, consider a graph of population sizes over time. With larger population sizes, the relative jumps caused by the firing of individual transitions become smaller, and graphs look more like continuous curves. In the limit, we obtain an approximation for high population counts, in which the graphs are continuous curves, and the concentrations are treated as continuous magnitudes. In a similar way, a pitcher of sugar can be approximately viewed as a continuous fluid.

This simplification permits the application of continuous mathematics to the study of reaction network processes. It leads to the basic rate equation for reaction networks, which specifies the direction of change of the system as a function of the current state of the system.

In this article we will be exploring this continuous deterministic formulation of Petri nets, under what is known as the mass action kinetics. This kinetics is one implementation of the general specification of a kinetic model, as defined above. This means that it will define the expected firing rate of each transition, in a given marking of the net. The probabilistic variations in the spacing of the reactions, around the mean given by the expected firing rate, are part of the stochastic dynamics, and will be addressed in a subsequent article.

### The mass-action kinetics

Under the mass action kinetics, the expected firing rate of a transition is proportional to the product of the concentrations of its input species. For instance, if the reaction were A + C → D, then the firing rate would be proportional to the concentration of A times the concentration of C, and if the reaction were A + A → D, it would be proportional to the square of the concentration of A.

This principle is explained by Feinberg as follows:

For the reaction A+C → D, an occurrence requires that a molecule of A meet a molecule of C in the reaction, and we take the probability of such an encounter to be proportional to the product [of the concentrations of A and C]. Although we do not presume that every such encounter yields a molecule of D, we nevertheless take the occurrence rate of A+C → D to be governed by [the product of the concentrations].

For an in-depth proof of the mass action law, see this article:

• Daniel Gillespie, A rigorous derivation of the chemical master equation, 1992.

Note that we can easily pass back and forth between speaking of the population counts for the species, and the concentrations of the species, which is just the population count divided by the total volume V of the system. The mass action law applies to both cases, the only difference being that the constant factors of (1/V) used for concentrations will get absorbed into the rate constants.
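To make the mass action rule concrete, here is a minimal Python sketch of a kinetic function. It assumes the inputs of a transition are given as a multiset of species and that concentrations are stored in a dictionary; the function name `mass_action_rate` and the sample numbers are made up for illustration.

```python
from collections import Counter

def mass_action_rate(rate_constant, inputs, concentrations):
    """Expected firing rate under mass action kinetics.

    inputs is a multiset of input species, so A + A -> D is Counter({"A": 2})
    and contributes the square of the concentration of A.
    """
    rate = rate_constant
    for species, multiplicity in inputs.items():
        rate *= concentrations[species] ** multiplicity
    return rate

# Illustrative concentrations and rate constant (not from any real system).
conc = {"A": 2.0, "C": 3.0, "D": 0.5}

print(mass_action_rate(1.5, Counter({"A": 1, "C": 1}), conc))  # A + C -> D: 9.0
print(mass_action_rate(1.5, Counter({"A": 2}), conc))          # A + A -> D: 6.0
```

Note that the rate comes out as zero whenever some input concentration is zero, which matches the ‘sanity constraint’ on kinetic functions mentioned earlier.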

The mass action kinetics is a basic law of empirical chemistry. But there are limits to its validity. First, as indicated in the proof in Gillespie’s article, the mass action law rests on the assumptions that the system is well-stirred and in thermal equilibrium. Further limits are discussed here:

• Georg Job and Regina Ruffler, Physical Chemistry (first five chapters), Section 5.2, 2010.

They write:

…precise measurements show that the relation above is not strictly adhered to. At higher concentrations, values depart quite noticeably from this relation. If we gradually move to lower concentrations, the differences become smaller. The equation here expresses a so-called “limiting law” which strictly applies only when c → 0.

In practice, this relation serves as a useful approximation up to rather high concentrations. In the case of electrically neutral substances, deviations are only noticeable above 100 mol m−3. For ions, deviations become observable above 1 mol m−3, but they are so small that they are easily neglected if accuracy is not of prime concern.

Why would the mass action kinetics break down at high concentrations? According to the book quoted, it is due to “molecular and ionic interactions.” I haven’t yet found a more detailed explanation, but here is my supposition about what is meant by molecular interactions in this context. Doubling the number of A molecules doubles the number of expected collisions between A and C molecules, but it also reduces the probability that any given A and C molecules that are within reacting distance will actually react. The reaction probability is reduced because the A molecules are ‘competing’ for reactions with the C molecules. With more A molecules, it becomes more likely that a C molecule will simultaneously be within reacting distance of several A molecules; each of these A molecules reduces the probability that the other A molecules will react with the C molecule. This is most pronounced when the concentrations in a gas get high enough that the molecules start to pack together to form a liquid.

### The equilibrium relation for a pair of opposite reactions

Suppose we have two opposite reactions:

$T: A + B \stackrel{u}{\longrightarrow} C + D$

$T': C + D \stackrel{v}{\longrightarrow} A + B$

Since the reactions have exactly opposite effects on the population sizes, in order for the population sizes to be in a stable equilibrium, the expected firing rates of $T$ and $T'$ must be equal:

$\mathrm{rate}(T') = \mathrm{rate}(T)$

By mass action kinetics:

$\mathrm{rate}(T) = u [A] [B]$

$\mathrm{rate}(T') = v [C] [D]$

where $[X]$ means the concentration of $X.$

Hence at equilibrium:

$u [A] [B] = v [C] [D]$

So:

$\displaystyle{ \frac{[A][B]}{[C][D]} = \frac{v}{u} = K }$

where $K$ is the equilibrium constant for the reaction pair.
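One way to see this relation at work is to let the concentrations evolve until they settle down, and then compare the ratio with $v/u$. The Python sketch below does this with a crude fixed-step (Euler) integration. It assumes that each concentration simply changes at the net rate $u [A] [B] - v [C] [D]$, which is a special case of the rate equation that has not been derived in this article; the step size, rate constants and initial concentrations are arbitrary choices for illustration.

```python
def simulate(u, v, a, b, c, d, dt=0.001, steps=20000):
    """Euler integration of the pair A + B -> C + D (rate constant u)
    and C + D -> A + B (rate constant v), under mass action kinetics."""
    for _ in range(steps):
        # Net rate of the forward reaction minus the reverse reaction.
        net = u * a * b - v * c * d
        a -= net * dt
        b -= net * dt
        c += net * dt
        d += net * dt
    return a, b, c, d

u, v = 2.0, 1.0
a, b, c, d = simulate(u, v, a=1.0, b=2.0, c=0.0, d=0.0)

ratio = (a * b) / (c * d)
print(ratio, v / u)  # the two numbers should agree closely
```

The quantities $[A] + [C]$ and $[B] + [C]$ stay constant along the way, since every firing of $T$ or $T'$ trades one A and one B for one C and one D, or vice versa.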

### Equilibrium solution for the formation and dissociation of a diatomic molecule

Let A be some type of atom, and let D = A2 be the diatomic form of A. Then consider the opposite reactions:

$A + A \stackrel{u}{\longrightarrow} D$

$D \stackrel{v}{\longrightarrow} A + A$

From the preceding analysis, at equilibrium the following relation holds:

$u [A]^2 = v [D]$

Let $N(A)$ and $N(D)$ be the population counts for A and D, and let

$N = N(A) + 2 N(D)$

be the total number of units of A in the system, whether they be in the form of atoms or diatoms.

The value of $N$ is an invariant property of the system. The reactions cannot change it, because they are just shuffling the units of A from one arrangement to the other. By way of contrast, $N(A)$ is not an invariant quantity.

Dividing this equation by the total volume $V$, we get:

$[N] = [A] + 2 [D]$

where $[N]$ is the concentration of the units of A.

Given a fixed value for $[N]$ and the rate constants $u$ and $v$, we can then solve for the concentrations at equilibrium:

$\displaystyle{u [A]^2 = v [D] = v ([N] - [A]) / 2 }$

$\displaystyle{2 u [A]^2 + v [A] - v [N] = 0 }$

$\displaystyle{[A] = \frac{-v \pm \sqrt{v^2 + 8 u v [N]}}{4 u} }$

Since $[A]$ can’t be negative, only the positive square root is valid.

Here is the solution for the case where $u = v = 1$:

$\displaystyle{[A] = (\sqrt{8 [N] + 1} - 1) / 4 }$

$\displaystyle{[D] = ([N] - [A]) / 2 }$
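The closed-form solution is easy to check numerically. Here is a minimal Python sketch that evaluates the positive root and confirms the equilibrium relation $u [A]^2 = v [D]$; the function name `diatomic_equilibrium` and the sample values are illustrative assumptions.

```python
import math

def diatomic_equilibrium(u, v, n_total):
    """Equilibrium concentrations ([A], [D]) for A + A <-> D,
    given rate constants u, v and the total concentration [N] = [A] + 2 [D]."""
    # Positive root of 2 u [A]^2 + v [A] - v [N] = 0.
    a = (-v + math.sqrt(v * v + 8.0 * u * v * n_total)) / (4.0 * u)
    d = (n_total - a) / 2.0
    return a, d

# The case u = v = 1 with [N] = 3: sqrt(8 * 3 + 1) = 5, so [A] = (5 - 1) / 4.
a, d = diatomic_equilibrium(1.0, 1.0, 3.0)
print(a, d)  # 1.0 1.0

# Check the equilibrium relation for another, arbitrary choice of constants.
a2, d2 = diatomic_equilibrium(2.0, 3.0, 5.0)
print(abs(2.0 * a2 ** 2 - 3.0 * d2) < 1e-12)  # True
```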

### Conclusion

We’ve covered a lot of ground, starting with the introduction of the stochastic Petri net model, followed by a general discussion of reaction network dynamics, the mass action laws, and calculating equilibrium solutions for simple reaction networks.

We still have a number of topics to cover on our journey into the foundations, before being able to write informed programs to solve problems with stochastic Petri nets. Upcoming topics are (1) the deterministic rate equation for general reaction networks and its application to finding equilibrium solutions, and (2) an exploration of the stochastic dynamics of a Petri net. These are the themes that will support our upcoming software development.