## Petri Net Programming (Part 2)

20 December, 2012

guest post by David A. Tanzer

### An introduction to stochastic Petri nets

In the previous article, I explored a simple computational model called Petri nets. They are used to model reaction networks, and have applications in a wide variety of fields, including population ecology, gene regulatory networks, and chemical reaction networks. I presented a simulator program for Petri nets, but it had an important limitation: the model and the simulator contain no notion of the rates of the reactions. But these rates critically determine the character of the dynamics of network.

Here I will introduce the topic of ‘stochastic Petri nets,’ which extends the basic model to include reaction dynamics. Stochastic means random, and it is presumed that there is an underlying random process that drives the reaction events. This topic is rich in both its mathematical foundations and its practical applications. A direct application of the theory yields the rate equation for chemical reactions, which is a cornerstone of chemical reaction theory. The theory also gives algorithms for analyzing and simulating Petri nets.

We are now entering the ‘business’ of software development for applications to science. The business logic here is nothing but math and science itself. Our study of this logic is not an academic exercise that is tangential to the implementation effort. Rather, it is the first phase of a complete software development process for scientific programming applications.

The end goals of this series are to develop working code to analyze and simulate Petri nets, and to apply these tools to informative case studies. But we have some work to do en route, because we need to truly understand the models in order to properly interpret the algorithms. The key questions here are when, why, and to what extent the algorithms give results that are empirically predictive. We will therefore be embarking on some exploratory adventures into the relevant theoretical foundations.

The overarching subject area to which stochastic Petri nets belong has been described as stochastic mechanics in the network theory series here on Azimuth. The theme development here will partly parallel that of the network theory series, but with a different focus, since I am addressing a computationally oriented reader. For an excellent text on the foundations and applications of stochastic mechanics, see:

• Darren Wilkinson, Stochastic Modelling for Systems Biology, Chapman and Hall/CRC Press, Boca Raton, Florida, 2011.

### Review of basic Petri nets

A Petri net is a graph with two kinds of nodes: species and transitions. The net is populated with a collection of ‘tokens’ that represent individual entities. Each token is attached to one of the species nodes, and this attachment indicates the type of the token. We may therefore view a species node as a container that holds all of the tokens of a given type.

The transitions represent conversion reactions between the tokens. Each transition is ‘wired’ to a collection of input species-containers, and to a collection of output containers. When it ‘fires’, it removes one token from each input container, and deposits one token to each output container.

Here is the example we gave, for a simplistic model of the formation and dissociation of H2O molecules:

The circles are for species, and the boxes are for transitions.

The transition combine takes in two H tokens and one O token, and outputs one H2O token. The reverse transition is split, which takes in one H2O, and outputs two H’s and one O.

An important application of Petri nets is to the modeling of biochemical reaction networks, which include the gene regulatory networks. Since genes and enzymes are molecules, and their binding interactions are chemical reactions, the Petri net model is directly applicable. For example, consider a transition that inputs one gene G, one enzyme E, and outputs the molecular form G • E in which E is bound to a particular site on G.

Applications of Petri nets may differ widely in terms of the population sizes involved in the model. In general chemistry reactions, the populations are measured in units of moles (where a mole is ‘Avogadro’s number’ 6.022 · 1023 entities). In gene regulatory networks, on the other hand, there may only be a handful of genes and enzymes involved in a reaction.

This difference in scale leads to a qualitative difference in the modelling. With small population sizes, the stochastic effects will predominate, but with large populations, a continuous, deterministic, average-based approximation can be used.

### Representing Petri nets by reaction formulas

Petri nets can also be represented by formulas used for chemical reaction networks. Here is the formula for the Petri net shown above:

H2O ↔ H + H + O

or the more compact:

H2O ↔ 2 H + O

The double arrow is a compact designation for two separate reactions, which happen to be opposites of each other.

By the way, this reaction is not physically realistic, because one doesn’t find isolated H and O atoms traveling around and meeting up to form water molecules. This is the actual reaction pair that predominates in water:

2 H2O ↔ OH- + H3O+

Here, a hydrogen nucleus H+, with one unit of positive charge, gets removed from one of the H2O molecules, leaving behind the hydroxide ion OH-. In the same stroke, this H+ gets re-attached to the other H2O molecule, which thereby becomes a hydronium ion, H3O+.

For a more detailed example, consider this reaction chain, which is of concern to the ocean environment:

CO2 + H2O ↔ H2CO3 ↔ H+ + HCO3-

This shows the formation of carbonic acid, namely H2CO3, from water and carbon dioxide. The next reaction represents the splitting of carbonic acid into a hydrogen ion and a negatively charged bicarbonate ion, HCO3-. There is a further reaction, in which a bicarbonate ion further ionizes into an H+ and a doubly negative carbonate ion CO32-. As the diagram indicates, for each of these reactions, a reverse reaction is also present. For a more detailed description of this reaction network, see:

• Stephen E. Bialkowski, Carbon dioxide and carbonic acid.

Increased levels of CO2 in the atmosphere will change the balance of these reactions, leading to a higher concentration of hydrogen ions in the water, i.e., a more acidic ocean. This is of concern because the metabolic processes of aquatic organisms is sensitive to the pH level of the water. The ultimate concern is that entire food chains could be disrupted, if some of the organisms cannot survive in a higher pH environment. See the Wikipedia page on ocean acidification for more information.

Exercise. Draw Petri net diagrams for these reaction networks.

### Motivation for the study of Petri net dynamics

The relative rates of the various reactions in a network critically determine the qualitative dynamics of the network as a whole. This is because the reactions are ‘competing’ with each other, and so their relative rates determine the direction in which the state of the system is changing. For instance, if molecules are breaking down faster then they are being formed, then the system is moving towards full dissociation. When the rates are equal, the processes balance out, and the system is in an equilibrium state. Then, there are only temporary fluctuations around the equilibrium conditions.

The rate of the reactions will depend on the number of tokens present in the system. For example, if any of the input tokens are zero, then the transition can’t fire, and so its rate must be zero. More generally, when there are few input tokens available, there will be fewer reaction events, and so the firing rates will be lower.

Given a specification for the rates in a reaction network, we can then pose the following kinds of questions about its dynamics:

• Does the network have an equilibrium state?

• If so, what are the concentrations of the species at equilibrium?

• How quickly does it approach the equilibrium?

• At the equilibrium state, there will still be temporary fluctuations around the equilibrium concentrations. What are the variances of these fluctuations?

• Are there modes in which the network will oscillate between states?

This is the grail we seek.

Aside from actually performing empirical experiments, such questions can be addressed either analytically or through simulation methods. In either case, our first step is to define a theoretical model for the dynamics of a Petri net.

### Stochastic Petri nets

A stochastic Petri net (with kinetics) is a Petri net that is augmented with a specification for the reaction dynamics. It is defined by the following:

• An underlying Petri net, which consists of species, transitions, an input map, and an output map. These maps assign to each transition a multiset of species. (Multiset means that duplicates are allowed.) Recall that the state of the net is defined by a marking function, that maps each species to its population count.

• A rate constant that is associated with each transition.

• A kinetic model, that gives the expected firing rate for each transition as a function of the current marking. Normally, this kinetic function will include the rate constant as a multiplicative factor.

A further ‘sanity constraint’ can be put on the kinetic function for a transition: it should give a positive value if and only if all of its inputs are positive.

• A stochastic model, which defines the probability distribution of the time intervals between firing events. This specific distribution of the firing intervals for a transition will be a function of the expected firing rate in the current marking.

This definition is based on the standard treatments found, for example in:

• M. Ajmone Marsan, Stochastic Petri nets: an elementary introduction, in Advances in Petri Nets, Springer, Berlin, 1989, 1–23.

or Wilkinson’s book mentioned above. I have also added an explicit mention of the kinetic model, based on the ‘kinetics’ described in here:

• Martin Feinberg, Lectures on chemical reaction networks.

There is an implied random process that drives the reaction events. A classical random process is given by a container with ‘particles’ that are randomly traveling around, bouncing off the walls, and colliding with each other. This is the general idea behind Brownian motion. It is called a random process because the outcome results from an ‘experiment’ that is not fully determined by the input specification. In this experiment, you pour in the ingredients (particles of different types), set the temperature (the distributions of the velocities), give it a stir, and then see what happens. The outcome consists of the paths taken by each of the particles.

In an important limiting case, the stochastic behavior becomes deterministic, and the population sizes become continuous. To see this, consider a graph of population sizes over time. With larger population sizes, the relative jumps caused by the firing of individual transitions become smaller, and graphs look more like continuous curves. In the limit, we obtain an approximation for high population counts, in which the graphs are continuous curves, and the concentrations are treated as continuous magnitudes. In a similar way, a pitcher of sugar can be approximately viewed as a continuous fluid.

This simplification permits the application of continuous mathematics to study of reaction network processes. It leads to the basic rate equation for reaction networks, which specifies the direction of change of the system as a function of the current state of the system.

In this article we will be exploring this continuous deterministic formulation of Petri nets, under what is known as the mass action kinetics. This kinetics is one implementation of the general specification of a kinetic model, as defined above. This means that it will define the expected firing rate of each transition, in a given marking of the net. The probabilistic variations in the spacing of the reactions—around the mean given by the expected firing rate—is part of the stochastic dynamics, and will be addressed in a subsequent article.

### The mass-action kinetics

Under the mass action kinetics, the expected firing rate of a transition is proportional to the product of the concentrations of its input species. For instance, if the reaction were A + C → D, then the firing rate would be proportional to the concentration of A times the concentration of C, and if the reaction were A + A → D, it would be proportional to the square of the concentration of A.

This principle is explained by Feinberg as follows:

For the reaction A+C → D, an occurrence requires that a molecule of A meet a molecule of C in the reaction, and we take the probability of such an encounter to be proportional to the product [of the concentrations of A and C]. Although we do not presume that every such encounter yields a molecule of D, we nevertheless take the occurrence rate of A+C → D to be governed by [the product of the concentrations].

For an in-depth proof of the mass action law, see this article:

• Daniel Gillespie, A rigorous definition of the chemical master equation, 1992.

Note that we can easily pass back and forth between speaking of the population counts for the species, and the concentrations of the species, which is just the population count divided by the total volume V of the system. The mass action law applies to both cases, the only difference being that the constant factors of (1/V) used for concentrations will get absorbed into the rate constants.

The mass action kinetics is a basic law of empirical chemistry. But there are limits to its validity. First, as indicated in the proof in the Gillespie, the mass action law rests on the assumptions that the system is well-stirred and in thermal equilibrium. Further limits are discussed here:

• Georg Job and Regina Ruffler, Physical Chemistry (first five chapters), Section 5.2, 2010.

They write:

…precise measurements show that the relation above is not strictly adhered to. At higher concentrations, values depart quite noticeably from this relation. If we gradually move to lower concentrations, the differences become smaller. The equation here expresses a so-called “limiting law“ which strictly applies only when c → 0.

In practice, this relation serves as a useful approximation up to rather high concentrations. In the case of electrically neutral substances, deviations are only noticeable above 100 mol m−3. For ions, deviations become observable above 1 mol m−3, but they are so small that they are easily neglected if accuracy is not of prime concern.

Why would the mass action kinetics break down at high concentrations? According to the book quoted, it is due to “molecular and ionic interactions.” I haven’t yet found a more detailed explanation, but here is my supposition about what is meant by molecular interactions in this context. Doubling the number of A molecules doubles the number of expected collisions between A and C molecules, but it also reduces the probability that any given A and C molecules that are within reacting distance will actually react. The reaction probability is reduced because the A molecules are ‘competing’ for reactions with the C molecules. With more A molecules, it becomes more likely that a C molecule will simultaneously be within reacting distance of several A molecules; each of these A molecules reduces the probability that the other A molecules will react with the C molecule. This is most pronounced when the concentrations in a gas get high enough that the molecules start to pack together to form a liquid.

### The equilibrium relation for a pair of opposite reactions

Suppose we have two opposite reactions:

$T: A + B \stackrel{u}{\longrightarrow} C + D$

$T': C + D \stackrel{v}{\longrightarrow} A + B$

Since the reactions have exactly opposite effects on the population sizes, in order for the population sizes to be in a stable equilibrium, the expected firing rates of $T$ and $T'$ must be equal:

$\mathrm{rate}(T') = \mathrm{rate}(T)$

By mass action kinetics:

$\mathrm{rate}(T) = u [A] [B]$

$\mathrm{rate}(T') = v [C] [D]$

where $[X]$ means the concentration of $X.$

Hence at equilibrium:

$u [A] [B] = v [C] [D]$

So:

$\displaystyle{ \frac{[A][B]}{[C][D]} = \frac{v}{u} = K }$

where $K$ is the equilibrium constant for the reaction pair.

### Equilibrium solution for the formation and dissociation of a diatomic molecule

Let A be some type of atom, and let D = A2 be the diatomic form of A. Then consider the opposite reactions:

$A + A \stackrel{u}{\longrightarrow} D$

$D \stackrel{v}{\longrightarrow} A + A$

From the preceding analysis, at equilibrium the following relation holds:

$u [A]^2 = v [D]$

Let $N(A)$ and $N(B)$ be the population counts for A and B, and let

$N = N(A) + 2 N(D)$

be the total number of units of A in the system, whether they be in the form of atoms or diatoms.

The value of $N$ is an invariant property of the system. The reactions cannot change it, because they are just shuffling the units of A from one arrangement to the other. By way of contrast, $N(A)$ is not an invariant quantity.

Dividing this equation by the total volume $V$, we get:

$[N] = [A] + 2 [D]$

where $[N]$ is the concentration of the units of A.

Given a fixed value for $[N]$ and the rate constants $u$ and $v$, we can then solve for the concentrations at equilibrium:

$\displaystyle{u [A]^2 = v [D] = v ([N] - [A]) / 2 }$

$\displaystyle{2 u [A]^2 + v [A] - v [N] = 0 }$

$\displaystyle{[A] = (-v \pm \sqrt{v^2 + 8 u v [N]}) / 4 u }$

Since $[A]$ can’t be negative, only the positive square root is valid.

Here is the solution for the case where $u = v = 1$:

$\displaystyle{[A] = (\sqrt{8 [N] + 1} - 1) / 4 }$

$\displaystyle{[D] = ([N] - [A]) / 2 }$

### Conclusion

We’ve covered a lot of ground, starting with the introduction of the stochastic Petri net model, followed by a general discussion of reaction network dynamics, the mass action laws, and calculating equilibrium solutions for simple reaction networks.

We still have a number of topics to cover on our journey into the foundations, before being able to write informed programs to solve problems with stochastic Petri nets. Upcoming topics are (1) the deterministic rate equation for general reaction networks and its application to finding equilibrium solutions, and (2) an exploration of the stochastic dynamics of a Petri net. These are the themes that will support our upcoming software development.

## Network Theory (Part 25)

3 November, 2012

In parts 2-24 of this network theory series, we’ve been talking about Petri nets and reaction networks. These parts are now getting turned into a book. You can see a draft here:

• John Baez and Jacob Biamonte, A course on quantum techniques for stochastic mechanics.

There’s a lot more to network theory than this. But before I dive into the next big topic, I want to mention a few more odds and ends about Petri nets and reaction networks. For example, their connection to logic and computation!

As we’ve seen, a stochastic Petri net can be used to describe a bunch of chemical reactions with certain reaction rates. We could try to use these reactions to build a ‘chemical computer’. But how powerful can such a computer be?

I don’t know the answer. But before people got interested in stochastic Petri nets, computer scientists spent quite some time studying plain old Petri nets, which don’t include the information about reaction rates. They used these as simple models of computation. And since computer scientists like to know which questions are decidable by means of an algorithm and which aren’t, they proved some interesting theorems about decidability for Petri nets.

Let me talk about: ‘reachability’: the question of which collections of molecules can turn into which other collections, given a fixed set of chemical reactions. For example, suppose you have these chemical reactions:

C + O2 → CO2

CO2 + NaOH → NaHCO3

NaHCO3 + HCl → H2O + NaCl + CO2

Can you use these to turn

C + O2 + NaOH + HCl

into

CO2 + H2O + NaCl ?

It’s not too hard to settle this particular question—we’ll do it soon. But settling all possible such questions turns out to be very hard

### Reachability

Remember:

Definition. A Petri net consists of a set $S$ of species and a set $T$ of transitions, together with a function

$i : S \times T \to \mathbb{N}$

saying how many copies of each state shows up as input for each transition, and a function

$o: S \times T \to \mathbb{N}$

saying how many times it shows up as output.

Today we’ll assume both $S$ and $T$ are finite.

Jacob and I like to draw the species as yellow circles and the transitions as aqua boxes, in a charmingly garish color scheme chosen by Dave Tweed. So, the chemical reactions I mentioned before:

C + O2 → CO2

CO2 + NaOH → NaHCO3

NaHCO3 + HCl → H2O + NaCl + CO2

give this Petri net:

A ‘complex’ is, roughly, a way of putting dots in the yellow circles. In chemistry this says how many molecules we have of each kind. Here’s an example:

This complex happens to have just zero or one dot in each circle, but that’s not required: we could have any number of dots in each circle. So, mathematically, a complex is a finite linear combination of species, with natural numbers as coefficients. In other words, it’s an element of $\mathbb{N}^S.$ In this particular example it’s

C + O2 + NaOH + HCl

Given two complexes, we say one is reachable from another if, loosely speaking, we can get to it from the other by a finite sequence of transitions. For example, earlier on I asked if we can get from the complex I just mentioned to the complex

CO2 + H2O + NaCl

which we can draw like this:

And the answer is yes, we can do it with this sequence of transitions:

This settles the question I asked earlier.

So in chemistry, reachability is all about whether it’s possible to use certain chemical reactions to turn one collection of molecules into another using a certain set of reactions. I hope this is clear enough; I could formalize it further but it seems unnecessary. If you have questions, ask me or read this:

Petri net: execution semantics, Wikipedia.

### The reachability problem

Now the reachability problem asks: given a Petri net and two complexes, is one reachable from the other?

If the answer is ‘yes’, of course you can show that by an exhaustive search of all possibilities. But if the answer is ‘no’, how can you be sure? It’s not obvious, in general. Back in the 1970′s, computer scientists felt this problem should be decidable by some algorithm… but they had a lot of trouble finding such an algorithm.

In 1976, Richard J. Lipton showed that if such an algorithm existed, it would need to take at least an exponential amount of memory space and an exponential amount of time to run:

• Richard J. Lipton, The reachability problem requires exponential space, Technical Report 62, Yale University, 1976.

This means that most computer scientists would consider any algorithm to solve the reachability problem ‘infeasible’, since they like polynomial time algorithms.

On the bright side, it means that Petri nets might be fairly powerful when viewed as computers themselves! After all, for a universal Turing machine, the analogue of the reachability problem is undecidable. So if the reachability problem for Petri nets were decidable, they couldn’t be universal computers. But if it were decidable but hard, Petri nets might be fairly powerful—though still not universal—computers.

In 1977, at the ACM Symposium on the Theory of Computing, two researchers presented a proof that reachability problem was decidable:

• S. Sacerdote and R. Tenney, The decidability of the reachability problem for vector addition systems, Conference Record of the Ninth Annual ACM Symposium on Theory of Computing, 2-4 May 1977, Boulder, Colorado, USA, ACM, 1977, pp. 61–76.

• James L. Peterson, Petri Net Theory and the Modeling of Systems, Prentice–Hall, New Jersey, 1981.

This is a very nice introduction to early work on Petri nets and decidability. Peterson had an interesting idea, too:

There would seem to be a very useful connection between Petri nets and Presburger arithmetic.

He gave some evidence, and suggested using this to settle the decidability of the reachability problem. I found that intriguing! Let me explain why.

Presburger arithmetic is a simple set of axioms for the arithmetic of natural numbers, much weaker than Peano arithmetic or even Robinson arithmetic. Unlike those other systems, Presburger arithmetic doesn’t mention multiplication. And unlike those other systems, you can write an algorithm that decides whether any given statement in Presburger arithmetic is provable.

However, any such algorithm must be very slow! In 1974, Fischer and Rabin showed that any decision algorithm for Presburger arithmetic has a worst-case runtime of at least

$2^{2^{c n}}$

for some constant $c,$ where $n$ is the length of the statement. So we say the complexity is at least doubly exponential. That’s much worse than exponential! On the other hand, an algorithm with a triply exponential run time was found by Oppen in 1978.

I hope you see why this is intriguing. Provability is a lot like reachability, since in a proof you’re trying to reach the conclusion starting from the assumptions using certain rules. Like Presburger arithmetic, Petri nets are all about addition, since they consists of transitions going between linear combinations like this:

6 CO2 + 6 H2O → C6H12O6 + 6 O2

That’s why the old literature calls Petri nets vector addition systems. And finally, the difficulty of deciding provability in Presburger arithmetic smells a bit like the difficulty of deciding reachability in Petri nets.

So, I was eager to learn what happened after Peterson wrote his book.

For starters, in 1981, the very year Peterson’s book came out, Ernst Mayr showed that the reachability problem for Petri nets is decidable:

• Ernst Mayr, Persistence of vector replacement systems is decidable, Acta Informatica 15 (1981), 309–318.

As you can see from the title, Mayr actually proved some other property was decidable. However, it follows that reachability is decidable, and Mayr pointed this out in his paper. In fact the decidability of reachability for Petri nets is equivalent to lots of other interesting questions. You can see a bunch here:

• Javier Esparza and Mogens Nielsen, Decidability issues for Petri nets—a survey, Bulletin of the European Association for Theoretical Computer Science 52 (1994), 245–262.

Mayr’s algorithm was complicated. Worse still, it seems to take a hugely long time to run. It seems nobody knows an explicit bound on its runtime. The runtim might even grow faster than any primitive recursive function. The Ackermann function and the closely related Ackermann numbers are famous examples of functions that grow more rapidly than any primitive recursive function. If you don’t know about these, now is the time to learn!

Remember that we can define multiplication by iterating addition:

$n \times m = n + n + n + \cdots + n$

where add $n$ to itself $m$ times. Then we can define exponentiation by iterating multiplication:

$n \uparrow m = n \times n \times n \times \cdots \times n$

where we multiply $n$ by itself $m$ times. Here I’m using Knuth’s up-arrow notation. Then we can define tetration by iterating exponentiation:

$n \uparrow^2 m = n \uparrow (n \uparrow (n \uparrow \cdots \uparrow n)))$

Then we can define an operation $\uparrow^3$ by iterating tetration, and so on. All these functions are primitive recursive. But the $n$th Ackermann number is not; it’s defined to be

$n \uparrow^n n$

This grows at an insanely rapid rate:

$1 \uparrow 1 = 1$

$2 \uparrow^2 2 = 4$

$3 \uparrow^3 3 = 3^{3^{3^{.^{.^{.}}}}}$

where we have a stack of $3^{3^3}$ threes—or in other words, $3^{7625597484987}$ threes! When we get to $4 \uparrow^4 4,$ my mind boggles. I wish it didn’t, but it does.

In 1998 someone came up with a faster algorithm:

• Zakaria Bouziane, A primitive recursive algorithm for the general Petri net reachability problem, in 39th Annual Symposium on Foundations of Computer Science, IEEE, 1998, pp. 130-136.

Bouziane claimed this algorithm is doubly exponential in space and time. That’s very slow, but not insanely slow.

However, it seems that Bouziane made a mistake:

• Petr Jančar, Bouziane’s transformation of the Petri net reachability problem and incorrectness of the related algorithm, Information and Computation, 206 (2008), 1259–1263.

So: if I tell you some chemicals and a bunch of reactions involving these chemicals, you can decide when some combination of these chemicals can turn into another combination. But it may take a long time to decide this. And we don’t know exactly how long: just more than ‘exponentially long’!

What about the connection to Presburger arithmetic? This title suggests that it exists:

• Jérôme Leroux, The general vector addition system reachability problem by Presburger inductive separators, 2008.

But I don’t understand the paper well enough to be sure. Can someone say more?

Also, does anyone know more about the computational power of Petri nets? They’re not universal computers, but is there a good way to say how powerful they are? Does the fact that it takes a long time to settle the reachability question really imply that they have a lot of computational power?

### Symmetric monoidal categories

Next let me explain the secret reason I’m so fascinated by this. This section is mainly for people who like category theory.

As I mentioned once before, a Petri net is actually nothing but a presentation of a symmetric monoidal category that’s free on some set of objects and some set of morphisms going between tensor products of those objects:

Vladimiro Sassone, On the category of Petri net computations, 6th International Conference on Theory and Practice of Software Development, Proceedings of TAPSOFT ’95, Lecture Notes in Computer Science 915, Springer, Berlin, pp. 334-348.

In chemistry we write the tensor product additively, but we could also write it as $\otimes$. Then the reachability problem consists of questions of this general type:

Suppose we have a symmetric monoidal category freely generated by objects $A, B, C$ and morphisms

$e: A \to B \otimes C$

$f: A \otimes A \otimes B \to A \otimes C$

$g: A \otimes B \otimes C \to A \otimes B \otimes B$

$h : B \otimes A \otimes A \to B$

Is there a morphism from $B \otimes A \otimes A$ to $A \otimes A$?

This is reminiscent of the word problem for groups and other problems where we are given a presentation of an algebraic structure and have to decide if two elements are equal… but now, instead of asking whether two elements are equal we are asking if there is a morphism from one object to another. So, it is fascinating that this problem is decidable—unlike the word problem for groups—but still very hard to decide.

Just in case you want to see a more formal statement, let me finish off by giving you that:

Reachability problem. Given a symmetric monoidal category $C$ freely generated by a finite set of objects and a finite set of morphisms between tensor products of these objects, and given two objects $x,y \in C,$ is there a morphism $f: x \to y$?

Theorem (Lipton, Mayr). There is an algorithm that decides the reachability problem. However, for any such algorithm, for any $c > 0,$ the worst-case run-time exceeds $2^{c n}$ where $n$ is the size of the problem: the sum of the number of generating objects, the number of factors in the sources and targets of all the generating morphisms, and the number of factors in the objects $x,y \in C$ for which the reachability problem is posed.

## Petri Net Programming

1 October, 2012

guest post by David Tanzer

Petri nets are a simple model of computation with a range of modelling applications that include chemical reaction networks, manufacturing processes, and population biology dynamics. They are graphs through which entities flow and have their types transformed. They also have far-reaching mathematical properties which are the subject of an extensive literature. See the network theory series here on Azimuth for a panoramic and well-diagrammed introduction to the theory and applications of Petri nets.

In this article, I will introduce Petri nets and give an expository presentation of a program to simulate them. I am hoping that this will be of interest to anyone who wants to see how scientific concepts can be formulated and actualized in computer software. Here the simulator program will be applied to a “toy model” of a chemical reaction network for the synthesis and dissociation of H2O molecules. The source code for this program should give the reader some “clay” to work with.

This material will prepare you for the further topic of stochastic Petri nets, where the sequencing of the Petri net is probabilistically controlled.

### Definition of Petri nets

A Petri net is a diagram with two kinds of nodes: container nodes, called species (or “places”, “states”), which can hold zero or more tokens, and transition nodes, which are “wired” to some containers called its inputs and some called its outputs. A transition can have multiple input or output connections to a single container.

When a transition fires, it removes one token from each input container, and adds one token to each output container. When there are multiple inputs or outputs to the same container, then that many tokens get removed or added.

The total state of the Petri net is described by a labelling function which maps each container to the number of tokens it holds. A transition is enabled to fire in a labelling if there are enough tokens at its input containers. If no transitions are enabled, the it is halted.

The sequencing is non-deterministic, because in a given labelling there may be several transitions that are enabled. Dataflow arises whenever one transition sends tokens to a container that is read by another transition.

Petri nets represent entity conversion networks, which consist of entities of different species, along with reactions that convert input entities to output entities. Each entity is symbolized by a token in the net, and all the tokens for a species are grouped into an associated container. The conversion of entities is represented by the transitions that transform tokens.

### Example 1: Disease processes

Here’s an example discussed earlier on Azimuth. It describes the virus that causes AIDS. The species are healthy cell, infected cell, and virion (the technical term for an individual virus). The transitions are for infection, production of healthy cells, reproduction of virions within an infected cell, death of healthy cells, death of infected cells, and death of virions.

Here the species are yellow circles and the transitions are aqua squares. Note that there are three transitions called “death” and two called “production.” They are disambiguated by a Greek letter suffix.

production ($\alpha$) describes the birth of one healthy cell, so it has no input and one healthy as output.

death ($\gamma$) has one healthy as input and no output.

death ($\delta$) has one infected as input and no output.

death ($\zeta$) has one virion as input and no output.

infection ($\beta$) takes one healthy and one virion as input, and has one infected cell as output.

production ($\epsilon$) describes the reproduction of the virus in an infected cell, so it has one infected as input, and one infected and one virion as output.

### Example 2: Population dynamics

Here the tokens represent organisms, and the species are biological species. This simple example involving two species, wolf and rabbit:

There are three transitions: birth, which inputs one rabbit and outputs rabbit + rabbit like asexual reproduction), predation, which converts rabbit plus wolf to wolf plus wolf, and death, which inputs one wolf and outputs nothing.

### Example 3: Chemical reaction networks

Here the entities are chemical units: molecules, isolated atoms, or ions. Each chemical unit is represented by a token in the net, and a container holds all of the tokens for a chemical species. Chemical reactions are then modeled as transitions that consume input tokens and generate output tokens.

We will be using the following simplified model for water formation and dissociation:

• The species are H, O, and H2O.

• Transition combine inputs two H atoms and one O atom, and outputs an H2O molecule.

• Transition split is the reverse process, which inputs one H2 and outputs two H and one O.

Note that the model is intended to show the idea of chemical reaction Petri nets, so it is not physically realistic. The actual reactions involve H2 and O2, and there are intermediate transitions as well. For more details, see Part 3 of the network theory series.

### A Petri net simulator

The following Python script will simulate a Petri net, using parameters that describe the species, the transitions, and the initial labelling. It will run the net for a specified number of steps. At each step it chooses randomly among the enabled transitions, fires it, and prints the new labelling on the console.

Here is the first Petri net simulator.

#### Running the script

The model parameters are already coded into the script. So let’s give it a whirl:

python petri1.py

This produced the output:

    H, O, H2O, Transition
5, 3, 4, split
7, 4, 3, split
9, 5, 2, combine
7, 4, 3, combine
5, 3, 4, split
7, 4, 3, split
9, 5, 2, split
11, 6, 1, combine
9, 5, 2, split
11, 6, 1, split
13, 7, 0, ...


We started out in a state with 5 H’s, 3 O’s, and 4 H2O’s, then a split took place, which increased H by 2, increased O by 1, and decreased H2O by one, then…

Running it again gives a different execution sequence.

#### Software structures used in the program

Before performing a full dissection of the code, I’d like to make a digression to discuss some of the basic software constructs that are used in this program. This is directed in particular to those of you who are coming from the science side, and are interested in learning more about programming.

This program exercises a few of the basic mechanisms of object-oriented and functional programming. The Petri net logic is bundled into the definition of a single class called PetriNet, and each PetriNet object is an instance of the class. This logic is grouped into methods, which are functions whose first parameter is bound to an instance of the class.

For example, here is the method IsHalted of the class PetriNet:

    def IsHalted(this):
return len(this.EnabledTransitions()) == 0


The first parameter, conventionally called “this” or “self,” refers to the PetriNet class instance. len returns the length of a list, and EnabledTransitions is a method that returns the list of transitions enabled in the current labelling.

Here is the syntax for using the method:

    if petriNetInstance.IsHalted(): ...


If it were the case that IsHalted took additional parameters, the definition would look like this:

    def IsHalted(this, arg1, ..., argn):


and the call would look like this:

    if petriNetInstance.IsHalted(arg1, ..., argn)


Here is the definition of EnabledTransitions, which shows a basic element of functional programming:

    def EnabledTransitions(this):
return filter(lambda transition: transition.IsEnabled(this.labelling), this.transitions)


A PetriNet instance holds this list of Transition objects, called this.transitions. The expression “lambda: transition …” is an anonymous function that maps a Transition object to the boolean result returned by calling the IsEnabled method on that transition, given the current labelling this.labelling. The filter function takes an anonymous boolean-valued function and a list of objects, and returns the sublist consisting of those objects that satisfy this predicate.

The program also gives an example of class inheritance, because the class PetriNet inherits all fields and methods from a base class called PetriNetBase.

#### Python as a language for pedagogical and scientific programming

We continue with our digression, to talk now about the choice of language. With something as fundamental to our thinking as a language, it’s easy to see how the topic of language choice tends to become partisan. Imagine, for example, a debate between different nationalities, about which language was more beautiful, logical, etc. Here the issue is put into perspective by David Tweed from the Azimuth project:

Programming languages are a lot like “motor vehicles”: a family car has different trade-offs to a sports car to a small van to a big truck to a motorbike to …. Each of these has their own niche where they make the most sense to use.

The Petri net simulator which I wrote in Python, and which will soon to be dissected, could have been equivalently written in any modern object-oriented programming language, such as C++, Java or C#. I chose Python for the following reasons. First, it is a scripting language. This means that the casual user need not get involved with a compilation step that is preparatory to running the program. Second, it does provide the abstraction capabilities needed to express the Petri net model. Third, I like the way that the syntax rolls out. This syntax has been described as executable pseudo-code. Finally, the language has a medium-sized user-base and support community.

Python is good for proof of concept programs, and it works well as a pedagogical programming language. Yet it is also practical. It has been widely adopted in the field of scientific programming. David Tweed explains it in these terms:

I think a big part of Python’s use comes from an the way that a lot of scientific programming is as much about “management” — parsing files of prexisting structured data, iterating over data, controlling network connections, calling C code, launching subprograms, etc — as much as “numeric computation”. This is where Python is particularly good, and it’s now acquired the NumPy & SciPy extensions to speed up the numerical programming elements, but it’s primarily the higher level elements that make it attractive in science.

Because variables in Python are neither declared in the program text, nor inferred by the byte-code compiler, the types of the variables are known only at run time. This has a negative impact on performance for data intensive calculations: rather than having the compiler generate the right code for the data type that is being processed, the types need to be checked at run time. The NumPy and SciPy libraries address this by providing bulk array operations, e.g., addition of two matrices, in a native code library that is integrated with the Python runtime environment. If this framework does not suffice for your high-performance numerical application, then you will have to turn to other languages, notably, C++.

All this being said, it is now time to return to the main topic of this article, which is the content of Petri net programming.

#### Top-level structure of the Petri net simulator script

At a high level, the script constructs a Petri net, constructs the initial labelling, and then runs the simulation for a given number of steps.

First construct the Petri net:


# combine: 2H + 1O -> 1H2O
combineSpec = ("combine", [["H",2],["O",1]], [["H2O",1]])

# split: 1H2O -> 2H + 1O
splitSpec = ("split", [["H2O",1]], [["H",2],["O",1]])

petriNet = PetriNet(
["H","O","H2O"],         # species
[combineSpec,splitSpec]  # transitions
)


Then establish the initial labelling:

    initialLabelling = {"H":5, "O":3, "H2O":4}


Then run it for twenty steps:

    steps = 20
petriNet.RunSimulation(steps, initialLabelling)


#### Program code

Species will be represented simply by their names, i.e., as strings. PetriNet and Transition will be defined as object classes. Each PetriNet instance holds a list of species names, a list of transition names, a list of Transition objects, and the current labelling. The labelling is a dictionary from species name to token count. Each Transition object contains an input dictionary and an input dictionary. The input dictionary map the name of a species to the number of times that the transition takes it as input, and similarly for the output dictionary.

##### Class PetriNet

The PetriNet class has a top-level method called RunSimulation, which makes repeated calls to FireOneRule. FireOneRule obtains the list of enabled transitions, chooses one randomly, and fires it. This is accomplished by the method SelectRandom, which uses a random integer between 1 and N to choose a transition from the list of enabled transitions.

    class PetriNet(PetriNetBase):
def RunSimulation(this, iterations, initialLabelling):
this.PrintHeader()  # prints e.g. "H, O, H2O"
this.labelling = initialLabelling
this.PrintLabelling() # prints e.g. "3, 5, 2"

for i in range(iterations):
if this.IsHalted():
print "halted"
return
else:
this.FireOneRule()
this.PrintLabelling();
print "iterations completed"

def EnabledTransitions(this):
return filter(lambda transition: transition.IsEnabled(this.labelling), this.transitions)

def IsHalted(this):
return len(this.EnabledTransitions()) == 0

def FireOneRule(this):
this.SelectRandom(this.EnabledTransitions()).Fire (this.labelling)

def SelectRandom(this, items):
randomIndex = randrange(len(items))
return items[randomIndex]

##### Class Transition

The Transition class exposes two key methods. IsEnabled takes a labeling as parameter, and returns a boolean saying whether the transition can fire. This is determined by comparing the input map for the transition with the token counts in the labeling, to see if there is sufficient tokens for it to fire. The Fire method takes a labelling in, and updates the counts in it to reflect the action of removing input tokens and creating output tokens.

    class Transition:
# Fields:
# transitionName
# inputMap: speciesName -&gt; inputCount
# outputMap: speciesName -&gt; outputCount

# constructor
def __init__(this, transitionName):
this.transitionName = transitionName
this.inputMap = {}
this.outputMap = {}

def IsEnabled(this, labelling):
for inputSpecies in this.inputMap.keys():
if labelling[inputSpecies] &lt; this.inputMap[inputSpecies]:
return False  # not enough tokens
return True # good to go

def Fire(this, labelling):
print this.transitionName
for inputName in this.inputMap.keys():
labelling[inputName] = labelling[inputName] - this.inputMap[inputName]
for outputName in this.outputMap.keys():
labelling[outputName] = labelling[outputName] + this.outputMap[outputName]

##### Class PetriNetBase

Notice that the class line for PetriNet declares that it inherits from a base class PetriNetBase. The base class contains utility methods that support PetriNet: PrintHeader, PrintLabelling, SelectRandom, and the constructor, which converts the transition specifications into Transition objects.

    class PetriNetBase:
# Fields:
# speciesNames
# Transition list
# labelling: speciesName -&gt; token count

# constructor
def __init__(this, speciesNames, transitionSpecs):
this.speciesNames = speciesNames
this.transitions = this.BuildTransitions(transitionSpecs)

def BuildTransitions(this, transitionSpecs):
transitions = []
for (transitionName, inputSpecs, outputSpecs) in transitionSpecs:
transition = Transition(transitionName)
for degreeSpec in inputSpecs:
this.SetDegree(transition.inputMap, degreeSpec)
for degreeSpec in outputSpecs:
this.SetDegree(transition.outputMap, degreeSpec)
transitions.append(transition)
return transitions

def SetDegree(this, dictionary, degreeSpec):
speciesName = degreeSpec[0]
if len(degreeSpec) == 2:
degree = degreeSpec[1]
else:
degree = 1
dictionary[speciesName] = degree

print string.join(this.speciesNames, ", ") + ", Transition"

def PrintLabelling(this):
for speciesName in this.speciesNames:
print str(this.labelling[speciesName]) + ",",


### Summary

We’ve learned about an important model for computation, called Petri nets, and seen how it can be used to model a general class of entity conversion networks, which include chemical reaction networks as a major case. Then we wrote a program to simulate them, and applied it to a simple model for formation and dissociation of water molecules.

This is a good beginning, but observe the following limitation of our current program: it just randomly picks a rule. When we run the program, the simulation makes a kind of random walk, back and forth, between the states of full synthesis and full dissociation. But in a real chemical system, the rates at which the transitions fires are _probabilistically determined_, and depend, among other things, on the temperature. With a high probability for formation, and a low probability for dissociation, we would expect the system to reach an equilibrium state in which H2O is the predominant “token” in the system. The relative concentration of the various chemical species would be determined by the relative firing rates of the various transitions.

This gives motivation for the next article that I am writing, on stochastic Petri nets.

### Programming exercises

1. Extend the constructor for PetriNet to accept a dictionary from transitionName to a number, which will give the relative probability of that transition firing. Modify the firing rule logic to use these values. This is a step in the direction of the stochastic Petri nets covered in the next article.

2. Make a prediction about the overall evolution of the system, given fixed probabilities for synthesis and dissociation, and then run the program to see if your prediction is confirmed.

3. If you like, improve the usability of the script, by passing the model parameters from the command line. Use sys.argv and eval. You can use single quotes, and pass a string like “{‘H’:5, ‘O’:3, ‘H2O’:4}” from the command line.

### Acknowledgements

Thanks to David Tweed and Ken Webb for reviewing the code and coming up with improvements to the specification syntax for the Petri nets. Thanks to Jacob Biamonte for the diagrams, and for energetically reviewing the article. John Baez contributed the three splendid examples of Petri nets. He also encouraged me along the way, and provided good support as an editor.

### Appendix: Notes on the language interpreter

The sample program is written in Python, which is a low-fuss scripting language with abstraction capabilities. The language is well-suited for proof-of-concept programming, and it has a medium-sized user base and support community. The main website is www.python.org.

You have a few distributions to choose from:

• If you are on a Linux or Mac type of system, it is likely to already be installed. Just open a shell and type “python.” Otherwise, use the package manager to install it.

• In Windows, you can use the version from the python.org web site. Alternatively, install cygwin, and in the installer, select Python, along with any of your other favorite Linux-style packages. Then you can partially pretend that you are working on a Linux type of system.

• The Enthought Python distribution comes packaged with a suite of Python-based open-source scientific and numeric computing packages. This includes ipython, scipy, numpy and matplotlib.

The program is distributed as a self-contained script, so in Linux and cygwin you can just execute ./petri1.py. When running this way under cygwin, you can either refer to the cygwin version of the Python executable, or to any of the native Windows versions of interpreter. You just need to adjust the first line of the script to point to the python executable.

## Azimuth News (Part 2)

28 September, 2012

Last week I finished a draft of a book and left Singapore, returning to my home in Riverside, California. It’s strange and interesting, leaving the humid tropics for the dry chaparral landscape I know so well.

Now I’m back to my former life as a math professor at the University of California. I’ll be going back to the Centre for Quantum Technology next summer, and summers after that, too. But life feels different now: a 2-year period of no teaching allowed me to change my research direction, but now it’s time to teach people what I’ve learned!

It also happens to be a time when the Azimuth Project is about to do a lot of interesting things. So, let me tell you some news!

### Programming with Petri nets

The Azimuth Project has a bunch of new members, who are bringing with them new expertise and lots of energy. One of them is David Tanzer, who was an undergraduate math major at U. Penn, and got a Ph.D. in computer science at NYU. Now he’s a software developer, and he lives in Brooklyn, New York.

He writes:

My areas of interest include:

• Queryable encyclopedias

• Machine representation of scientific theories

• Machine representation of conflicts between contending theories

• Social and technical structures to support group problem-solving activities

• Balkan music, Afro-Latin rhythms, and jazz guitar

To me, the most meaningful applications of science are to the myriad of problems that beset the human race. So the Aziumuth Project is a good focal point for me.

And on Azimuth, he’s starting to write some articles on ‘programming with Petri nets’. We’ve talked about them a lot in the network theory series:

They’re a very general modelling tool in chemistry, biology and computer science, precisely the sort of tool we need for a deep understanding of the complex systems that keep our living planet going—though, let’s be perfectly clear about this, just one of many such tools, and one of the simplest. But as mathematical physicists, Jacob Biamonte and I have studied Petri nets in a highly theoretical way, somewhat neglecting the all-important problem of how you write programs that simulate Petri nets!

Such programs are commercially available, but it’s good to see how to write them yourself, and that’s what David Tanzer will tell us. He’ll use the language Python to write these programs in a nice modern object-oriented way. So, if you like coding, this is where the rubber meets the road.

I’m no expert on programming, but it seems the modularity of Python code nicely matches the modularity of Petri nets. This is something I’d like to get into more deeply someday, in my own effete theoretical way. I think the category-theoretic foundations of computer languages like Python are worth understanding, perhaps more interesting in fact than purely functional languages like Haskell, which are better understood. And I think they’ll turn out to be nicely related to the category-theoretic foundations of Petri nets and other networks I’m going to tell you about!

And I believe this will be important if we want to develop ‘ecotechnology’, where our machines and even our programming methodologies borrow ingenuity and wisdom from biological processes… and learn to blend with nature instead of fighting it.

### Petri nets, systems biology, and beyond

Another new member of the Azimuth Project is Ken Webb. He has a BA in Cognitive Science from Carleton University in Ottawa, and an MSc in Evolutionary and Adaptive Systems from The University of Sussex in Brighton. Since then he’s worked for many years as a software developer and consultant, using many different languages and approaches.

He writes:

Things that I’m interested in include:

• networks of all types, hierarchical organization of network nodes, and practical applications

• climate change, and “saving the planet”

• programming code that anyone can run in their browser, and that anyone can edit and extend in their browser

• approaches to software development that allow independently-developed apps to work together

• the relationship between computer-science object-oriented (OO) concepts and math concepts

• how everything is connected

I’ve been paying attention to the Azimuth Project because it parallels my own interests, but with a more math focus (math is not one of my strong points). As learning exercises, I’ve reimplemented a few of the applications mentioned on Azimuth pages. Some of my online workbooks (blog-like entries that are my way of taking active notes) were based on content at the Azimuth Project.

He’s started building a Petri net modeling and simulation tool called Xholon. It’s written in Java and can be run online using Java Web Start (JNLP). Using this tool you can completely specify Petri net models using XML. You can see more details, and examples, on his Azimuth page. If I were smarter, or had more spare time, I would have already figured out how to include examples that actually run in an interactive way in blog articles here! But more on that later.

Soon I hope Ken will finish a blog entry in which he discusses how Petri nets fit into a bigger setup that can also describe ‘containers’, where molecules are held in ‘membranes’ and these membranes can allow chosen molecules through, and also split or merge—more like biology than inorganic chemistry. His outline is very ambitious:

This tutorial works through one simple example to demonstrate the commonality/continuity between a large number of different ways that people use to understand the structure and behavior of the world around us. These include chemical reaction networks, Petri nets, differential equations, agent-based modeling, mind maps, membrane computing, Unified Modeling Language, Systems Biology Markup Language, and Systems Biology Graphical Notation. The intended audience includes scientists, engineers, programmers, and other technically literate nonexperts. No math knowledge is required.

### The Azimuth Server

With help from Glyn Adgie and Allan Erskine, Jim Stuttard has been setting up a server for Azimuth. All these folks are programmers, and Jim Stuttard, in particular, was a systems consultant and software applications programmer in C, C++ and Java until 2001. But he’s really interested in formal methods, and now he programs in Haskell.

I won’t say anything about the Azimuth server, since I’ll get it wrong, it’s not quite ready yet, and Jim wisely prefers to get it working a bit more before he talks about it. But you can get a feeling for what’s coming by going here.

### How to find out more

You can follow what we’re doing by visiting the Azimuth Forum. Most of our conversations there are open to the world, but some can only be seen if you become a member. This is easy to do, except for one little thing.

Nobody, nobody , seems capable of reading the directions where I say, in boldface for easy visibility:

Use your whole real name as username. Spaces and capital letters are good. So, for example, a username like ‘Tim van Beek’ is good, ‘timvanbeek’ not so good, and ‘Tim’ or ‘tvb’ won’t be allowed.

The main point is that we want people involved with the Azimuth Project to have clear identities. The second, more minor point is that our software is not braindead, so you can choose a username that’s your actual name, like

Tim van Beek

instead of having to choose something silly like

timvanbeek

or

tim_van_beek

But never mind me: I’m just a crotchety old curmudgeon. Come join the fun and help us save the planet by developing software that explains climate science, biology, and ecology—and, just maybe, speeds up the development of green mathematics and ecotechnology!

## A Course on Quantum Techniques for Stochastic Mechanics

18 September, 2012

Jacob Biamonte and I have come out with a draft of a book!

It’s based on the first 24 network theory posts on this blog. It owes a lot to everyone here, and the acknowledgements just scratch the surface of that indebtedness. At some later time I’d like to go through the posts and find the top twenty people who need to be thanked. But I’m leaving Singapore on Friday, going back to California to teach at U.C. Riverside, so I’ve been rushing to get something out before then.

If you see typos or other problems, please let us know!
We’ve reorganized the original blog articles and polished them up a bit, but we plan to do more before publishing these notes as a book.

I’m looking forward to teaching a seminar called Mathematics of the Environment when I get back to U.C. Riverside, and with luck I’ll put some notes from that on the blog here. I will also be trying to round up a team of grad students to work on network theory.

The next big topics in the network theory series will be electrical circuits and Bayesian networks. I’m beginning to see how these fit together with stochastic Petri nets in a unified framework, but I’ll need to talk and write about it to fill in all the details.

You can get a sense of what this course is about by reading this:

### Foreword

This course is about a curious relation between two ways of describing situations that change randomly with the passage of time. The old way is probability theory and the new way is quantum theory

Quantum theory is based, not on probabilities, but on amplitudes. We can use amplitudes to compute probabilities. However, the relation between them is nonlinear: we take the absolute value of an amplitude and square it to get a probability. It thus seems odd to treat amplitudes as directly analogous to probabilities. Nonetheless, if we do this, some good things happen. In particular, we can take techniques devised in quantum theory and apply them to probability theory. This gives new insights into old problems.

There is, in fact, a subject eager to be born, which is mathematically very much like quantum mechanics, but which features probabilities in the same equations where quantum mechanics features amplitudes. We call this subject stochastic mechanics

#### Plan of the course

In Section 1 we introduce the basic object of study here: a ‘stochastic Petri net’. A stochastic Petri net describes in a very general way how collections of things of different kinds can randomly interact and turn into other things. If we consider large numbers of things, we obtain a simplified deterministic model called the ‘rate equation’, discussed in Section 2. More fundamental, however, is the ‘master equation’, introduced in Section 3. This describes how the probability of having various numbers of things of various kinds changes with time.

In Section 4 we consider a very simple stochastic Petri net and notice that in this case, we can solve the master equation using techniques taken from quantum mechanics. In Section 5 we sketch how to generalize this: for any stochastic Petri net, we can write down an operator called a ‘Hamiltonian’ built from ‘creation and annihilation operators’, which describes the rate of change of the probability of having various numbers of things. In Section 6 we illustrate this with an example taken from population biology. In this example the rate equation is just the logistic equation, one of the simplest models in population biology. The master equation describes reproduction and competition of organisms in a stochastic way.

In Section 7 we sketch how time evolution as described by the master equation can be written as a sum over Feynman diagrams. We do not develop this in detail, but illustrate it with a predator–prey model from population biology. In the process, we give a slicker way of writing down the Hamiltonian for any stochastic Petri net.

In Section 8 we enter into a main theme of this course: the study of equilibrium solutions of the master and rate equations. We present the Anderson–Craciun–Kurtz theorem, which shows how to get equilibrium solutions of the master equation from equilibrium solutions of the rate equation, at least if a certain technical condition holds. Brendan Fong has translated Anderson, Craciun and Kurtz’s original proof into the language of annihilation and creation operators, and we give Fong’s proof here. In this language, it turns out that the equilibrium solutions are mathematically just like ‘coherent states’ in quantum mechanics.

In Section 9 we give an example of the Anderson–Craciun–Kurtz theorem coming from a simple reversible reaction in chemistry. This example leads to a puzzle that is resolved by discovering that the presence of ‘conserved quantities’—quantities that do not change with time—let us construct many equilibrium solutions of the rate equation other than those given by the Anderson–Craciun–Kurtz theorem.

Conserved quantities are very important in quantum mechanics, and they are related to symmetries by a result called Noether’s theorem. In Section 10 we describe a version of Noether’s theorem for stochastic mechanics, which we proved with the help of Brendan Fong. This applies, not just to systems described by stochastic Petri nets, but a much more general class of processes called ‘Markov processes’. In the analogy to quantum mechanics, Markov processes are analogous to arbitrary quantum systems whose time evolution is given by a Hamiltonian. Stochastic Petri nets are analogous to a special case of these: the case where the Hamiltonian is built from annihilation and creation operators. In Section 11 we state the analogy between quantum mechanics and stochastic mechanics more precisely, and with more attention to mathematical rigor. This allows us to set the quantum and stochastic versions of Noether’s theorem side by side and compare them in Section 12.

In Section 13 we take a break from the heavy abstractions and look at a fun example from chemistry, in which a highly symmetrical molecule randomly hops between states. These states can be seen as vertices of a graph, with the transitions as edges. In this particular example we get a famous graph with 20 vertices and 30 edges, called the ‘Desargues graph’.

In Section 14 we note that the Hamiltonian in this example is a ‘graph Laplacian’, and, following a computation done by Greg Egan, we work out the eigenvectors and eigenvalues of this Hamiltonian explicitly. One reason graph Laplacians are interesting is that we can use them as Hamiltonians to describe time evolution in both stochastic and quantum mechanics. Operators with this special property are called ‘Dirichlet operators’, and we discuss them in Section 15. As we explain, they also describe electrical circuits made of resistors. Thus, in a peculiar way, the intersection of quantum mechanics and stochastic mechanics is the study of electrical circuits made of resistors!

In Section 16, we study the eigenvectors and eigenvalues of an arbitrary Dirichlet operator. We introduce a famous result called the Perron–Frobenius theorem for this purpose. However, we also see that the Perron–Frobenius theorem is important for understanding the equilibria of Markov processes. This becomes important later when we prove the ‘deficiency zero theorem’.

We introduce the deficiency zero theorem in Section 17. This result, proved by the chemists Feinberg, Horn and Jackson, gives equilibrium solutions for the rate equation for a large class of stochastic Petri nets. Moreover, these equilibria obey the extra condition that lets us apply the Anderson–Craciun–Kurtz theorem and obtain equlibrium solutions of the master equations as well. However, the deficiency zero theorem is best stated, not in terms of stochastic Petri nets, but in terms of another, equivalent, formalism: ‘chemical reaction networks’. So, we explain chemical reaction networks here, and use them heavily throughout the rest of the course. However, because they are applicable to such a large range of problems, we call them simply ‘reaction networks’. Like stochastic Petri nets, they describe how collections of things of different kinds randomly interact and turn into other things.

In Section 18 we consider a simple example of the deficiency zero theorem taken from chemistry: a diatomic gas. In Section 19 we apply the Anderson–Craciun–Kurtz theorem to the same example.

In Section 20 we begin the final phase of the course: proving the deficiency zero theorem, or at least a portion of it. In this section we discuss the concept of ‘deficiency’, which had been introduced before, but not really explained: the definition that makes the deficiency easy to compute is not the one that says what this concept really means. In Section 21 we show how to rewrite the rate equation of a stochastic Petri net—or equivalently, of a reaction network—in terms of a Markov process. This is surprising because the rate equation is nonlinear, while the equation describing a Markov process is linear in the probabilities involved. The trick is to use a nonlinear operation called ‘matrix exponentiation’. In Section 22 we study equilibria for Markov processes. Then, finally, in Section 23, we use these equilbria to obtain equilibrium solutions of the rate equation, completing our treatment of the deficiency zero theorem.

## Network Theory (Part 24)

23 August, 2012

Now we’ve reached the climax of our story so far: we’re ready to prove the deficiency zero theorem. First let’s talk about it informally a bit. Then we’ll review the notation, and then—hang on to your seat!—we’ll give the proof.

The crucial trick is to relate a bunch of chemical reactions, described by a ‘reaction network’ like this:

to a simpler problem where a system randomly hops between states arranged in the same pattern:

This is sort of amazing, because we’ve thrown out lots of detail. It’s also amazing because this simpler problem is linear. In the original problem, the chance that a reaction turns a B + E into a D is proportional to the number of B’s times the number of E’s. That’s nonlinear! But in the simplified problem, the chance that your system hops from state 4 to state 3 is just proportional to the probability that it’s in state 4 to begin with. That’s linear.

The wonderful thing is that, at least under some conditions, we can find equilibrium solutions of our original problem starting from equilibrium solutions of the simpler problem.

Let’s roughly sketch how it works, and where we are so far. Our simplified problem is described by an equation like this:

$\displaystyle{ \frac{d}{d t} \psi = H \psi }$

where $\psi$ is a function that the probability of being in each state, and $H$ describes the probability per time of hopping from one state to another. We can easily understand quite a lot about the equilibrium solutions, where $\psi$ doesn’t change at all:

$H \psi = 0$

because this is a linear equation. We did this in Part 23. Of course, when I say ‘easily’, that’s a relative thing: we needed to use the Perron–Frobenius theorem, which Jacob introduced in Part 20. But that’s a well-known theorem in linear algebra, and it’s easy to apply here.

In Part 22, we saw that the original problem was described by an equation like this, called the ‘rate equation’:

$\displaystyle{ \frac{d x}{d t} = Y H x^Y }$

Here $x$ is a vector whose entries describe the amount of each kind of chemical: the amount of A’s, the amount of B’s, and so on. The matrix $H$ is the same as in the simplified problem, but $Y$ is a matrix that says how many times each chemical shows up in each spot in our reaction network:

The key thing to notice is $x^Y,$ where we take a vector and raise it to the power of a matrix. We explained this operation back in Part 22. It’s this operation that says how many B + E pairs we have, for example, given the number of B’s and the number of E’s. It’s this that makes the rate equation nonlinear.

Now, we’re looking for equilibrium solutions of the rate equation, where the rate of change is zero:

$Y H x^Y = 0$

But in fact we’ll do even better! We’ll find solutions of this:

$H x^Y = 0$

And we’ll get these by taking our solutions of this:

$H \psi = 0$

$\psi = x^Y$

while $\psi$ remains a solution of $H \psi = 0.$

But: how do we do this ‘adjusting’? That’s the crux of the whole business! That’s what we’ll do today.

Remember, $\psi$ is a function that gives a probability for each ‘state’, or numbered box here:

The picture here consists of two pieces, called ‘connected components’: the piece containing boxes 0 and 1, and the piece containing boxes 2, 3 and 4. It turns out that we can multiply $\psi$ by a function that’s constant on each connected component, and if we had $H \psi = 0$ to begin with, that will still be true afterward. The reason is that there’s no way for $\psi$ to ‘leak across’ from one component to another. It’s like having water in separate buckets. You can increase the amount of water in one bucket, and decrease it another, and as long as the water’s surface remains flat in each bucket, the whole situation remains in equilibrium.

That’s sort of obvious. What’s not obvious is that we can adjust $\psi$ this way so as to ensure

$\psi = x^Y$

for some $x.$

And indeed, it’s not always true! It’s only true if our reaction network obeys a special condition. It needs to have ‘deficiency zero’. We defined this concept back in Part 21, but now we’ll finally use it for something. It turns out to be precisely the right condition to guarantee we can tweak any function on our set of states, multiplying it by a function that’s constant on each connected component, and get a new function $\psi$ with

$\psi = x^Y$

When all is said and done, that is the key to the deficiency zero theorem.

### Review

The battle is almost upon us—we’ve got one last chance to review our notation. We start with a stochastic reaction network:

This consists of:

• finite sets of transitions $T,$ complexes $K$ and species $S,$

• a map $r: T \to (0,\infty)$ giving a rate constant for each transition,

source and target maps $s,t : T \to K$ saying where each transition starts and ends,

• a one-to-one map $Y : K \to \mathbb{N}^S$ saying how each complex is made of species.

Then we extend $s, t$ and $Y$ to linear maps:

Then we put inner products on these vector spaces as described last time, which lets us ‘turn around’ the maps $s$ and $t$ by taking their adjoints:

$s^\dagger, t^\dagger : \mathbb{R}^K \to \mathbb{R}^T$

More surprisingly, we can ‘turn around’ $Y$ and get a nonlinear map using ‘matrix exponentiation’:

$\begin{array}{ccc} \mathbb{R}^S &\to& \mathbb{R}^K \\ x &\mapsto& x^Y \end{array}$

This is most easily understood by thinking of $x$ as a row vector and $Y$ as a matrix:

Remember, complexes are made out of species. The matrix entry $Y_{i j}$ says how many things of the $j$th species there are in a complex of the $i$th kind. If $\psi \in \mathbb{R}^K$ says how many complexes there are of each kind, $Y \psi \in \mathbb{R}^S$ says how many things there are of each species. Conversely, if $x \in \mathbb{R}^S$ says how many things there are of each species, $x^Y \in \mathbb{R}^K$ says how many ways we can build each kind of complex from them.

So, we get these maps:

Next, the boundary operator

$\partial : \mathbb{R}^T \to \mathbb{R}^K$

describes how each transition causes a change in complexes:

$\partial = t - s$

As we saw last time, there is a Hamiltonian

$H : \mathbb{R}^K \to \mathbb{R}^K$

describing a Markov processes on the set of complexes, given by

$H = \partial s^\dagger$

But the star of the show is the rate equation. This describes how the number of things of each species changes with time. We write these numbers in a list and get a vector $x \in \mathbb{R}^S$ with nonnegative components. The rate equation says:

$\displaystyle{ \frac{d x}{d t} = Y H x^Y }$

We can read this as follows:

$x$ says how many things of each species we have now.

$x^Y$ says how many complexes of each kind we can build from these species.

$s^\dagger x^Y$ says how many transitions of each kind can originate starting from these complexes, with each transition weighted by its rate.

$H x^Y = \partial s^\dagger x^Y$ is the rate of change of the number of complexes of each kind, due to these transitions.

$Y H x^Y$ is the rate of change of the number of things of each species.

### The zero deficiency theorem

We are looking for equilibrium solutions of the rate equation, where the number of things of each species doesn’t change at all:

$Y H x^Y = 0$

In fact we will find complex balanced equilibrium solutions, where even the number of complexes of each kind doesn’t change:

$H x^Y = 0$

More precisely, we have:

Deficiency Zero Theorem (Child’s Version). Suppose we have a reaction network obeying these two conditions:

1. It is weakly reversible, meaning that whenever there’s a transition from one complex $\kappa$ to another $\kappa',$ there’s a directed path of transitions going back from $\kappa'$ to $\kappa.$

2. It has deficiency zero, meaning $\mathrm{im} \partial \cap \mathrm{ker} Y = \{ 0 \}$.

Then for any choice of rate constants there exists a complex balanced equilibrium solution of the rate equation where all species are present in nonzero amounts. In other words, there exists $x \in \mathbb{R}^S$ with all components positive and such that:

$H x^Y = 0$

Proof. Because our reaction network is weakly reversible, the theorems in Part 23 show there exists $\psi \in (0,\infty)^K$ with

$H \psi = 0$

This $\psi$ may not be of the form $x^Y,$ but we shall adjust $\psi$ so that it becomes of this form, while still remaining a solution of $H \psi = 0$latex . To do this, we need a couple of lemmas:

Lemma 1. $\mathrm{ker} \partial^\dagger + \mathrm{im} Y^\dagger = \mathbb{R}^K.$

Proof. We need to use a few facts from linear algebra. If $V$ is a finite-dimensional vector space with inner product, the orthogonal complement $L^\perp$ of a subspace $L \subseteq V$ consists of vectors that are orthogonal to everything in $L$:

$L^\perp = \{ v \in V : \quad \forall w \in L \quad \langle v, w \rangle = 0 \}$

We have

$(L \cap M)^\perp = L^\perp + M^\perp$

where $L$ and $M$ are subspaces of $V$ and $+$ denotes the sum of two subspaces: that is, the smallest subspace containing both. Also, if $T: V \to W$ is a linear map between finite-dimensional vector spaces with inner product, we have

$(\mathrm{ker} T)^\perp = \mathrm{im} T^\dagger$

and

$(\mathrm{im} T)^\perp = \mathrm{ker} T^\dagger$

Now, because our reaction network has deficiency zero, we know that

$\mathrm{im} \partial \cap \mathrm{ker} Y = \{ 0 \}$

Taking the orthogonal complement of both sides, we get

$(\mathrm{im} \partial \cap \mathrm{ker} Y)^\perp = \mathbb{R}^K$

and using the rules we mentioned, we obtain

$\mathrm{ker} \partial^\dagger + \mathrm{im} Y^\dagger = \mathbb{R}^K$

as desired.   █

Now, given a vector $\phi$ in $\mathbb{R}^K$ or $\mathbb{R}^S$ with all positive components, we can define the logarithm of such a vector, component-wise:

$(\ln \phi)_i = \ln (\phi_i)$

Similarly, for any vector $\phi$ in either of these spaces, we can define its exponential in a component-wise way:

$(\exp \phi)_i = \exp(\phi_i)$

These operations are inverse to each other. Moreover:

Lemma 2. The nonlinear operator

$\begin{array}{ccc} \mathbb{R}^S &\to& \mathbb{R}^K \\ x &\mapsto& x^Y \end{array}$

is related to the linear operator

$\begin{array}{ccc} \mathbb{R}^S &\to& \mathbb{R}^K \\ x &\mapsto& Y^\dagger x \end{array}$

by the formula

$x^Y = \exp(Y^\dagger \ln x )$

which holds for all $x \in (0,\infty)^S.$

Proof. A straightforward calculation. By the way, this formula would look a bit nicer if we treated $\ln x$ as a row vector and multiplied it on the right by $Y$: then we would have

$x^Y = \exp((\ln x) Y)$

The problem is that we are following the usual convention of multiplying vectors by matrices on the left, yet writing the matrix on the right in $x^Y.$ Taking the transpose $Y^\dagger$ of the matrix $Y$ serves to compensate for this.   █

Now, given our vector $\psi \in (0,\infty)^K$ with $H \psi = 0,$ we can take its logarithm and get $\ln \psi \in \mathbb{R}^K.$ Lemma 1 says that

$\mathbb{R}^K = \mathrm{ker} \partial^\dagger + \mathrm{im} Y^\dagger$

so we can write

$\ln \psi = \alpha + Y^\dagger \beta$

where $\alpha \in \mathrm{ker} \partial^\dagger$ and $\beta \in \mathbb{R}^S.$ Moreover, we can write

$\beta = \ln x$

for some $x \in (0,\infty)^S,$ so that

$\ln \psi = \alpha + Y^\dagger (\ln x)$

Exponentiating both sides component-wise, we get

$\psi = \exp(\alpha) \; \exp(Y^\dagger (\ln x))$

where at right we are taking the component-wise product of vectors. Thanks to Lemma 2, we conclude that

$\psi = \exp(\alpha) x^Y$

So, we have taken $\psi$ and almost written it in the form $x^Y$—but not quite! We can adjust $\psi$ to make it be of this form:

$\exp(-\alpha) \psi = x^Y$

Clearly all the components of $\exp(-\alpha) \psi$ are positive, since the same is true for both $\psi$ and $\exp(-\alpha).$ So, the only remaining task is to check that

$H(\exp(-\alpha) \psi) = 0$

We do this using two lemmas:

Lemma 3. If $H \psi = 0$ and $\alpha \in \mathrm{ker} \partial^\dagger,$ then $H(\exp(-\alpha) \psi) = 0.$

Proof. It is enough to check that multiplication by $\exp(-\alpha)$ commutes with the Hamiltonian $H,$ since then

$H(\exp(-\alpha) \psi) = \exp(-\alpha) H \psi = 0$

Recall from Part 23 that $H$ is the Hamiltonian of a Markov process associated to this ‘graph with rates’:

As noted here:

• John Baez and Brendan Fong, A Noether theorem for Markov processes.

multiplication by some function on $K$ commutes with $H$ if and only if that function is constant on each connected component of this graph. Such functions are called conserved quantities.

So, it suffices to show that $\exp(-\alpha)$ is constant on each connected component. For this, it is enough to show that $\alpha$ itself is constant on each connected component. But this will follow from the next lemma, since $\alpha \in \mathrm{ker} \partial^\dagger.$   █

Lemma 4. A function $\alpha \in \mathbb{R}^K$ is a conserved quantity iff $\partial^\dagger \alpha = 0 .$ In other words, $\alpha$ is constant on each connected component of the graph $s, t: T \to K$ iff $\partial^\dagger \alpha = 0$.

Proof. Suppose $\partial^\dagger \alpha = 0,$ or in other words, $\alpha \in \mathrm{ker} \partial^\dagger,$ or in still other words, $\alpha \in (\mathrm{im} \partial)^\perp.$ To show that $\alpha$ is constant on each connected component, it suffices to show that whenever we have two complexes connected by a transition, like this:

$\tau: \kappa \to \kappa'$

then $\alpha$ takes the same value at both these complexes:

$\alpha_\kappa = \alpha_{\kappa'}$

To see this, note

$\partial \tau = t(\tau) - s(\tau) = \kappa' - \kappa$

and since $\alpha \in (\mathrm{im} \partial)^\perp,$ we conclude

$\langle \alpha, \kappa' - \kappa \rangle = 0$

But calculating this inner product, we see

$\alpha_{\kappa'} - \alpha_{\kappa} = 0$

as desired.

For the converse, we simply turn the argument around: if $\alpha$ is constant on each connected component, we see $\langle \alpha, \kappa' - \kappa \rangle = 0$ whenever there is a transition $\tau : \kappa \to \kappa'.$ It follows that $\langle \alpha, \partial \tau \rangle = 0$ for every transition $\tau,$ so $\alpha \in (\mathrm{im} \partial)^\perp .$

And thus concludes the proof of the lemma!   █

And thus concludes the proof of the theorem!   █

And thus concludes this post!

## Network Theory (Part 23)

21 August, 2012

We’ve been looking at reaction networks, and we’re getting ready to find equilibrium solutions of the equations they give. To do this, we’ll need to connect them to another kind of network we’ve studied. A reaction network is something like this:

It’s a bunch of complexes, which are sums of basic building-blocks called species, together with arrows called transitions going between the complexes. If we know a number for each transition describing the rate at which it occurs, we get an equation called the ‘rate equation’. This describes how the amount of each species changes with time. We’ve been talking about this equation ever since the start of this series! Last time, we wrote it down in a new very compact form:

$\displaystyle{ \frac{d x}{d t} = Y H x^Y }$

Here $x$ is a vector whose components are the amounts of each species, while $H$ and $Y$ are certain matrices.

But now suppose we forget how each complex is made of species! Suppose we just think of them as abstract things in their own right, like numbered boxes:

We can use these boxes to describe states of some system. The arrows still describe transitions, but now we think of these as ways for the system to hop from one state to another. Say we know a number for each transition describing the probability per time at which it occurs:

Then we get a ‘Markov process’—or in other words, a random walk where our system hops from one state to another. If $\psi$ is the probability distribution saying how likely the system is to be in each state, this Markov process is described by this equation:

$\displaystyle{ \frac{d \psi}{d t} = H \psi }$

This is simpler than the rate equation, because it’s linear. But the matrix $H$ is the same—we’ll see that explicitly later on today.

What’s the point? Well, our ultimate goal is to prove the deficiency zero theorem, which gives equilibrium solutions of the rate equation. That means finding $x$ with

$Y H x^Y = 0$

Today we’ll find all equilibria for the Markov process, meaning all $\psi$ with

$H \psi = 0$

Then next time we’ll show some of these have the form

$\psi = x^Y$

So, we’ll get

$H x^Y = 0$

and thus

$Y H x^Y = 0$

as desired!

So, let’s get to to work.

### The Markov process of a graph with rates

We’ve been looking at stochastic reaction networks, which are things like this:

However, we can build a Markov process starting from just part of this information:

Let’s call this thing a ‘graph with rates’, for lack of a better name. We’ve been calling the things in $K$ ‘complexes’, but now we’ll think of them as ‘states’. So:

Definition. A graph with rates consists of:

• a finite set of states $K,$

• a finite set of transitions $T,$

• a map $r: T \to (0,\infty)$ giving a rate constant for each transition,

source and target maps $s,t : T \to K$ saying where each transition starts and ends.

Starting from this, we can get a Markov process describing how a probability distribution $\psi$ on our set of states will change with time. As usual, this Markov process is described by a master equation:

$\displaystyle{ \frac{d \psi}{d t} = H \psi }$

for some Hamiltonian:

$H : \mathbb{R}^K \to \mathbb{R}^K$

What is this Hamiltonian, exactly? Let’s think of it as a matrix where $H_{i j}$ is the probability per time for our system to hop from the state $j$ to the state $i.$ This looks backwards, but don’t blame me—blame the guys who invented the usual conventions for matrix algebra. Clearly if $i \ne j$ this probability per time should be the sum of the rate constants of all transitions going from $j$ to $i$:

$\displaystyle{ i \ne j \quad \Rightarrow \quad H_{i j} = \sum_{\tau: j \to i} r(\tau) }$

where we write $\tau: j \to i$ when $\tau$ is a transition with source $j$ and target $i.$

Now, we saw in Part 11 that for a probability distribution to remain a probability distribution as it evolves in time according to the master equation, we need $H$ to be infinitesimal stochastic: its off-diagonal entries must be nonnegative, and the sum of the entries in each column must be zero.

The first condition holds already, and the second one tells us what the diagonal entries must be. So, we’re basically done describing $H.$ But we can summarize it this way:

Puzzle 1. Think of $\mathbb{R}^K$ as the vector space consisting of finite linear combinations of elements $\kappa \in K.$ Then show

$\displaystyle{ H \kappa = \sum_{s(\tau) = \kappa} r(\tau) (t(\tau) - s(\tau)) }$

### Equilibrium solutions of the master equation

Now we’ll classify equilibrium solutions of the master equation, meaning $\psi \in \mathbb{R}^K$ with

$H \psi = 0$

We’ll do only do this when our graph with rates is ‘weakly reversible’. This concept doesn’t actually depend on the rates, so let’s be general and say:

Definition. A graph is weakly reversible if for every edge $\tau : i \to j,$ there is directed path going back from $j$ to $i,$ meaning that we have edges

$\tau_1 : j \to j_1 , \quad \tau_2 : j_1 \to j_2 , \quad \dots, \quad \tau_n: j_{n-1} \to i$

This graph with rates is not weakly reversible:

but this one is:

The good thing about the weakly reversible case is that we get one equilibrium solution of the master equation for each component of our graph, and all equilibrium solutions are linear combinations of these. This is not true in general! For example, this guy is not weakly reversible:

It has only one component, but the master equation has two linearly independent equilibrium solutions: one that vanishes except at the state 0, and one that vanishes except at the state 2.

The idea of a ‘component’ is supposed to be fairly intuitive—our graph falls apart into pieces called components—but we should make it precise. As explained in Part 21, the graphs we’re using here are directed multigraphs, meaning things like

$s, t : E \to V$

where $E$ is the set of edges (our transitions) and $V$ is the set of vertices (our states). There are actually two famous concepts of ‘component’ for graphs of this sort: ‘strongly connected’ components and ‘connected’ components. We only need connected components, but let me explain both concepts, in a futile attempt to slake your insatiable thirst for knowledge.

Two vertices $i$ and $j$ of a graph lie in the same strongly connected component iff you can find a directed path of edges from $i$ to $j$ and also one from $j$ back to $i.$

Remember, a directed path from $i$ to $j$ looks like this:

$i \to a \to b \to c \to j$

Here’s a path from $x$ to $y$ that is not directed:

$i \to a \leftarrow b \to c \to j$

and I hope you can write down the obvious but tedious definition of an ‘undirected path’, meaning a path made of edges that don’t necessarily point in the correct direction. Given that, we say two vertices $i$ and $j$ lie in the same connected component iff you can find an undirected path going from $i$ to $j.$ In this case, there will automatically also be an undirected path going from $j$ to $i.$

For example, $i$ and $j$ lie in the same connected component here, but not the same strongly connected component:

$i \to a \leftarrow b \to c \to j$

Here’s a graph with one connected component and 3 strongly connected components, which are marked in blue:

For the theory we’re looking at now, we only care about connected components, not strongly connected components! However:

Puzzle 2. Show that for weakly reversible graph, the connected components are the same as the strongly connected components.

With these definitions out of way, we can state today’s big theorem:

Theorem. Suppose $H$ is the Hamiltonian of a weakly reversible graph with rates:

Then for each connected component $C \subseteq K,$ there exists a unique probability distribution $\psi_C \in \mathbb{R}^K$ that is positive on that component, zero elsewhere, and is an equilibrium solution of the master equation:

$H \psi_C = 0$

Moreover, these probability distributions $\psi_C$ form a basis for the space of equilibrium solutions of the master equation. So, the dimension of this space is the number of components of $K.$

Proof. We start by assuming our graph has one connected component. We use the Perron–Frobenius theorem, as explained in Part 20. This applies to ‘nonnegative’ matrices, meaning those whose entries are all nonnegative. That is not true of $H$ itself, but only its diagonal entries can be negative, so if we choose a large enough number $c > 0,$ $H + c I$ will be nonnegative.

Since our graph is weakly reversible and has one connected component, it follows straight from the definitions that the operator $H + c I$ will also be ‘irreducible’ in the sense of Part 20. The Perron–Frobenius theorem then swings into action, and we instantly conclude several things.

First, $H + c I$ has a positive real eigenvalue $r$ such that any other eigenvalue, possibly complex, has absolute value $\le r.$ Second, there is an eigenvector $\psi$ with eigenvalue $r$ and all positive components. Third, any other eigenvector with eigenvalue $r$ is a scalar multiple of $\psi.$

Subtracting $c,$ it follows that $\lambda = r - c$ is the eigenvalue of $H$ with the largest real part. We have $H \psi = \lambda \psi,$ and any other vector with this property is a scalar multiple of $\psi.$

We can show that in fact $\lambda = 0.$ To do this we copy an argument from Part 20. First, since $\psi$ is positive we can normalize it to be a probability distribution:

$\displaystyle{ \sum_{i \in K} \psi_i = 1 }$

Since $H$ is infinitesimal stochastic, $\exp(t H)$ sends probability distributions to probability distributions:

$\displaystyle{ \sum_{i \in K} (\exp(t H) \psi)_i = 1 }$

for all $t \ge 0.$ On the other hand,

$\displaystyle{ \sum_{i \in K} (\exp(t H)\psi)_i = \sum_{i \in K} e^{t \lambda} \psi_i = e^{t \lambda} }$

so we must have $\lambda = 0.$

We conclude that when our graph has one connected component, there is a probability distribution $\psi \in \mathbb{R}^K$ that is positive everywhere and has $H \psi = 0.$ Moreover, any $\phi \in \mathbb{R}^K$ with $H \phi = 0$ is a scalar multiple of $\psi.$

When $K$ has several components, the matrix $H$ is block diagonal, with one block for each component. So, we can run the above argument on each component $C \subseteq K$ and get a probability distribution $\psi_C \in \mathbb{R}^K$ that is positive on $C.$ We can then check that $H \psi_C = 0$ and that every $\phi \in \mathbb{R}^K$ with $H \phi = 0$ can be expressed as a linear combination of these probability distributions $\psi_C$ in a unique way.   █

This result must be absurdly familiar to people who study Markov processes, but I haven’t bothered to look up a reference yet. Do you happen to know a good one? I’d like to see one that generalizes this theorem to graphs that aren’t weakly reversible. I think I see how it goes. We don’t need that generalization right now, but it would be good to have around.

### The Hamiltonian, revisited

One last small piece of business: last time I showed you a very slick formula for the Hamiltonian $H.$ I’d like to prove it agrees with the formula I gave this time.

We extend $s$ and $t$ to linear maps between vector spaces:

We define the boundary operator just as we did last time:

$\partial = t - s$

Then we put an inner product on the vector spaces $\mathbb{R}^T$ and $\mathbb{R}^K.$ So, for $\mathbb{R}^K$ we let the elements of $K$ be an orthonormal basis, but for $\mathbb{R}^T$ we define the inner product in a more clever way involving the rate constants:

$\displaystyle{ \langle \tau, \tau' \rangle = \frac{1}{r(\tau)} \delta_{\tau, \tau'} }$

where $\tau, \tau' \in T.$ This lets us define adjoints of the maps $s, t$ and $\partial,$ via formulas like this:

$\langle s^\dagger \phi, \psi \rangle = \langle \phi, s \psi \rangle$

Then:

Theorem. The Hamiltonian for a graph with rates is given by

$H = \partial s^\dagger$

Proof. It suffices to check that this formula agrees with the formula for $H$ given in Puzzle 1:

$\displaystyle{ H \kappa = \sum_{s(\tau) = \kappa} r(\tau) (t(\tau) - s(\tau)) }$

Here we are using the complex $\kappa \in K$ as a name for one of the standard basis vectors of $\mathbb{R}^K.$ Similarly shall we write things like $\tau$ or $\tau'$ for basis vectors of $\mathbb{R}^T.$

First, we claim that

$\displaystyle{ s^\dagger \kappa = \sum_{\tau: \; s(\tau) = \kappa} r(\tau) \, \tau }$

To prove this it’s enough to check that taking the inner products of either sides with any basis vector $\tau',$ we get results that agree. On the one hand:

$\begin{array}{ccl} \langle \tau' , s^\dagger \kappa \rangle &=& \langle s \tau', \kappa \rangle \\ \\ &=& \delta_{s(\tau'), \kappa} \end{array}$

On the other hand:

$\begin{array}{ccl} \displaystyle{ \langle \tau', \sum_{\tau: \; s(\tau) = \kappa} r(\tau) \, \tau \rangle } &=& \sum_{\tau: \; s(\tau) = \kappa} r(\tau) \, \langle \tau', \tau \rangle \\ \\ &=& \displaystyle{ \sum_{\tau: \; s(\tau) = \kappa} \delta_{\tau', \tau} } \\ \\ &=& \delta_{s(\tau'), \kappa} \end{array}$

where the factor of $1/r(\tau)$ in the inner product on $\mathbb{R}^T$ cancels the visible factor of $r(\tau).$ So indeed the results match.

Using this formula for $s^\dagger \kappa$ we now see that

$\begin{array}{ccl} H \kappa &=& \partial s^\dagger \kappa \\ \\ &=& \partial \displaystyle{ \sum_{\tau: \; s(\tau) = \kappa} r(\tau) \, \tau } \\ \\ &=& \displaystyle{ \sum_{\tau: \; s(\tau) = \kappa} r(\tau) \, (t(\tau) - s(\tau)) } \end{array}$

which is precisely what we want.   █

I hope you see through the formulas to their intuitive meaning. As usual, the formulas are just a way of precisely saying something that makes plenty of sense. If $\kappa$ is some state of our Markov process, $s^\dagger \kappa$ is the sum of all transitions starting at this state, weighted by their rates. Applying $\partial$ to a transition tells us what change in state it causes. So $\partial s^\dagger \kappa$ tells us the rate at which things change when we start in the state $\kappa.$ That’s why $\partial s^\dagger$ is the Hamiltonian for our Markov process. After all, the Hamiltonian tells us how things change:

$\displaystyle{ \frac{d \psi}{d t} = H \psi }$

Okay, we’ve got all the machinery in place. Next time we’ll prove the deficiency zero theorem!

## Network Theory (Part 22)

16 August, 2012

Okay, now let’s dig deeper into the proof of the deficiency zero theorem. We’re only going to prove a baby version, at first. Later we can enhance it:

Deficiency Zero Theorem (Baby Version). Suppose we have a weakly reversible reaction network with deficiency zero. Then for any choice of rate constants there exists an equilibrium solution of the rate equation where all species are present in nonzero amounts.

The first step is to write down the rate equation in a new, more conceptual way. It’s incredibly cool. You’ve probably seen Schrödinger’s equation, which describes the motion of a quantum particle:

$\displaystyle { \frac{d \psi}{d t} = -i H \psi }$

If you’ve been following this series, you’ve probably also seen the master equation, which describes the motion of a ‘stochastic’ particle:

$\displaystyle { \frac{d \psi}{d t} = H \psi }$

A ‘stochastic’ particle is one that’s carrying out a random walk, and now $\psi$ describes its probability to be somewhere, instead of its amplitude.

Today we’ll see that the rate equation for a reaction network looks somewhat similar:

$\displaystyle { \frac{d x}{d t} = Y H x^Y }$

where $Y$ is some matrix, and $x^Y$ is defined using a new thing called ‘matrix exponentiation’, which makes the equation nonlinear!

If you’re reading this you probably know how to multiply a vector by a matrix. But if you’re like me, you’ve never seen anyone take a vector and raise to the power of some matrix! I’ll explain it, don’t worry… right now I’m just trying to get you intrigued. It’s not complicated, but it’s exciting how this unusual operation shows up naturally in chemistry. That’s just what I’m looking for these days: new math ideas that show up in practical subjects like chemistry, and new ways that math can help people in these subjects.

Since we’re looking for an equilibrium solution of the rate equation, we actually want to solve

$\displaystyle { \frac{d x}{d t} = 0 }$

or in other words

$Y H x^Y = 0$

In fact we will do better: we will find a solution of

$H x^Y = 0$

And we’ll do this in two stages:

• First we’ll find all solutions of

$H \psi = 0$

This equation is linear, so it’s easy to understand.

• Then, among these solutions $\psi,$ we’ll find one that also obeys

$\psi = x^Y$

This is a nonlinear problem involving matrix exponentiation, but still, we can do it, using a clever trick called ‘logarithms’.

Putting the pieces together, we get our solution of

$H x^Y = 0$

and thus our equilibrium solution of the rate equation:

$\displaystyle { \frac{d x}{d t} = Y H x^Y = 0 }$

That’s a rough outline of the plan. But now let’s get started, because the details are actually fascinating. Today I’ll just show you how to rewrite the rate equation in this new way.

### The rate equation

Remember how the rate equation goes. We start with a stochastic reaction network, meaning a little diagram like this:

This contains quite a bit of information:

• a finite set $T$ of transitions,

• a finite set $K$ of complexes,

• a finite set $S$ of species,

• a map $r: T \to (0,\infty)$ giving a rate constant for each transition,

source and target maps $s,t : T \to K$ saying where each transition starts and ends,

• a one-to-one map $Y : K \to \mathbb{N}^S$ saying how each complex is made of species.

Given all this, the rate equation says how the amount of each species changes with time. We describe these amounts with a vector $x \in [0,\infty)^S.$ So, we want a differential equation filling in the question marks here:

$\displaystyle{ \frac{d x}{d t} = ??? }$

Now last time, we started by thinking of $K$ as a subset of $\mathbb{N}^S,$ and thus of the vector space $\mathbb{R}^S.$ Back then, we wrote the rate equation as follows:

$\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T} r(\tau) \; \left(t(\tau) - s(\tau)\right) \; x^{s(\tau)} }$

where vector exponentiation is defined by

$x^s = x_1^{s_1} \cdots x_k^{s_k}$

when $x$ and $s$ are vectors in $\mathbb{R}^S.$

However, we’ve now switched to thinking of our set of complexes $K$ as a set in its own right that is mapped into $\mathbb{N}^S$ by $Y.$ This is good for lots of reasons, like defining the concept of ‘deficiency’, which we did last time. But it means the rate equation above doesn’t quite parse anymore! Things like $s(\tau)$ and $t(\tau)$ live in $K$; we need to explicitly convert them into elements of $\mathbb{R}^S$ using $Y$ for our equation to make sense!

So now we have to write the rate equation like this:

$\displaystyle{ \frac{d x}{d t} = Y \sum_{\tau \in T} r(\tau) \; \left(t(\tau) - s(\tau)\right) \; x^{Y s(\tau)} }$

This looks more ugly, but if you’ve got even one mathematical bone in your body, you can already see vague glimmerings of how we’ll rewrite this the way we want:

$\displaystyle { \frac{d x}{d t} = Y H x^Y }$

Here’s how.

First, we extend our maps $s, t$ and $Y$ to linear maps between vector spaces:

Then, we put an inner product on the vector spaces $\mathbb{R}^T,$ $\mathbb{R}^K$ and $\mathbb{R}^S.$ For $\mathbb{R}^K$ we do this in the most obvious way, by letting the complexes be an orthonormal basis. So, given two complexes $\kappa, \kappa',$ we define their inner product by

$\langle \kappa, \kappa' \rangle = \delta_{\kappa, \kappa'}$

We do the same for $\mathbb{R}^S.$ But for $\mathbb{R}^T$ we define the inner product in a more clever way involving the rate constants. If $\tau, \tau' \in T$ are two transitions, we define their inner product by:

$\langle \tau, \tau' \rangle = \frac{1}{r(\tau)} \delta_{\tau, \tau'}$

This will seem perfectly natural when we continue our study of circuits made of electrical resistors, and if you’re very clever you can already see it lurking in Part 16. But never mind.

Having put inner products on these three vector spaces, we can take the adjoints of the linear maps between them, to get linear maps going back the other way:

These are defined in the usual way—though we’re using daggers here they way physicists do, where mathematicians would prefer to see stars! For example, $s^\dagger : \mathbb{R}^K \to \mathbb{R}^T$ is defined by the relation

$\langle s^\dagger \phi, \psi \rangle = \langle \phi, s \psi \rangle$

and so on.

Next, we set up a random walk on the set of complexes. Remember, our reaction network is a graph with complexes as vertices and transitions as edges, like this:

Each transition $\tau$ has a number attached to it: the rate constant $r(\tau).$ So, we can randomly hop from complex to complex along these transitions, with probabilities per unit time described by these numbers. The probability of being at some particular complex will then be described by a function

$\psi : K \to \mathbb{R}$

which also depends on time, and changes according to the equation

$\displaystyle { \frac{d \psi}{d t} = H \psi }$

for some Hamiltonian

$H : \mathbb{R}^K \to \mathbb{R}^K$

I defined this Hamiltonian back in Part 15, but now I see a slicker way to write it:

$H = (t - s) s^\dagger$

I’ll justify this next time. For now, the main point is that with this Hamiltonian, the rate equation is equivalent to this:

$\displaystyle{ \frac{d x}{d t} = Y H x^Y }$

The only thing I haven’t defined yet is the funny exponential $x^Y.$ That’s what makes the equation nonlinear. We’re taking a vector to the power of a matrix and getting a vector. This sounds weird—but it actually makes sense!

It only makes sense because we have chosen bases for our vector spaces. To understand it, let’s number our species $1, \dots, k$ as we’ve been doing all along, and number our complexes $1, \dots, \ell.$ Our linear map $Y : \mathbb{R}^K \to \mathbb{R}^S$ then becomes a $k \times \ell$ matrix of natural numbers. Its entries say how many times each species shows up in each complex:

$Y = \left( \begin{array}{cccc} Y_{11} & Y_{12} & \cdots & Y_{1 \ell} \\ Y_{21} & Y_{22} & \cdots & Y_{2 \ell} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{k1} & Y_{k2} & \cdots & Y_{k \ell} \end{array} \right)$

The entry $Y_{i j}$ says how many times the $i$th species shows up in the $j$th complex.

Now, let’s be a bit daring and think of the vector $x \in \mathbb{R}^S$ as a row vector with $k$ entries:

$x = \left(x_1 , x_2 , \dots , x_k \right)$

Then we can multiply $x$ on the right by the matrix $Y$ and get a vector in $\mathbb{R}^K$:

$\begin{array}{ccl} x Y &=& ( x_1 , x_2, \dots, x_k) \; \left( \begin{array}{cccc} Y_{11} & Y_{12} & \cdots & Y_{1 \ell} \\ Y_{21} & Y_{22} & \cdots & Y_{2 \ell} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{k1} & Y_{k2} & \cdots & Y_{k \ell} \end{array} \right) \\ \\ &=& \left( x_1 Y_{11} + \cdots + x_k Y_{k1}, \; \dots, \; x_1 Y_{1 \ell} + \cdots + x_k Y_{k \ell} \right) \end{array}$

So far, no big deal. But now you’re ready to see the definition of $x^Y,$ which is very similar:

It’s exactly the same, but with multiplication replacing addition, and exponentiation replacing multiplication! Apparently my class on matrices stopped too soon: we learned about matrix multiplication, but matrix exponentiation is also worthwhile.

What’s the point of it? Well, suppose you have a certain number of hydrogen molecules, a certain number of oxygen molecules, a certain number of water molecules, and so on—a certain number of things of each species. You can list these numbers and get a vector $x \in \mathbb{R}^S.$ Then the components of $x^Y$ describe how many ways you can build up each complex from the things you have. For example,

$x_1^{Y_{11}} x_2^{Y_{21}} \cdots x_k^{Y_{k1}}$

say roughly how many ways you can build complex 1 by picking $Y_{11}$ things of species 1, $Y_{21}$ things of species 2, and so on.

Why ‘roughly’? Well, we’re pretending we can pick the same thing twice. So if we have 4 water molecules and we need to pick 3, this formula gives $4^3.$ The right answer is $4 \times 3 \times 2.$ To get this answer we’d need to use the ‘falling power’ $4^{\underline{3}} = 4 \times 3 \times 2,$ as explained in Part 4. But the rate equation describes chemistry in the limit where we have lots of things of each species. In this limit, the ordinary power becomes a good approximation.

Puzzle. In this post we’ve seen a vector raised to a matrix power, which is a vector, and also a vector raised to a vector power, which is a number. How are they related?

Theorem. The rate equation:

$\displaystyle{ \frac{d x}{d t} = Y \sum_{\tau \in T} r(\tau) \; \left(t(\tau) - s(\tau)\right) \; x^{Y s(\tau)} }$

is equivalent to this equation:

$\displaystyle{ \frac{d x}{d t} = Y (t - s) s^\dagger x^Y }$

or in other words:

$\displaystyle{ \frac{d x}{d t} = Y H x^Y }$

Proof. It’s enough to show

$\displaystyle{ (t - s) s^\dagger x^Y = \sum_{\tau \in T} r(\tau) \; \left(t(\tau) - s(\tau)\right) \; x^{Y s(\tau)} }$

So, we’ll compute $(t - s) s^\dagger x^Y,$ and think about the meaning of each quantity we get en route.

We start with $x \in \mathbb{R}^S.$ This is a list of numbers saying how many things of each species we have: our raw ingredients, as it were. Then we compute

$x^Y = (x_1^{Y_{11}} \cdots x_k^{Y_{k1}} , \dots, x_1^{Y_{1 \ell}} \cdots x_k^{Y_{k \ell}} )$

This is a vector in $\mathbb{R}^K.$ It’s a list of numbers saying how many ways we can build each complex starting from our raw ingredients.

Alternatively, we can write this vector $x^Y$ as a sum over basis vectors:

$x^Y = \sum_{\kappa \in K} x_1^{Y_{1\kappa}} \cdots x_k^{Y_{k\kappa}} \; \kappa$

Next let’s apply $s^\dagger$ to this. We claim that

$\displaystyle{ s^\dagger \kappa = \sum_{\tau \; : \; s(\tau) = \kappa} r(\tau) \; \tau }$

In other words, we claim $s^\dagger \kappa$ is the sum of all the transitions having $\kappa$ as their source, weighted by their rate constants! To prove this claim, it’s enough to take the inner product of each side with any transition $\tau',$ and check that we get the same answer. For the left side we get

$\langle s^\dagger \kappa, \tau' \rangle = \langle \kappa, s(\tau') \rangle = \delta_{\kappa, s (\tau') }$

To compute the right side, we need to use the cleverly chosen inner product on $\mathbb{R}^T.$ Here we get

$\displaystyle{ \left\langle \sum_{\tau \; : \; s(\tau) = \kappa} r(\tau) \tau, \; \tau' \right\rangle = \sum_{\tau \; : \; s(\tau) = \kappa} \delta_{\tau, \tau'} = \delta_{\kappa, s(\tau')} }$

In the first step here, the factor of $1 /r(\tau)$ in the cleverly chosen inner product canceled the visible factor of $r(\tau).$ For the second step, you just need to think for half a minute—or ten, depending on how much coffee you’ve had.

Either way, we we conclude that indeed

$\displaystyle{ s^\dagger \kappa = \sum_{\tau \; : \; s(\tau) = \kappa} r(\tau) \tau }$

Next let’s combine this with our formula for $x^Y$:

$\displaystyle { x^Y = \sum_{\kappa \in K} x_1^{Y_{1\kappa}} \cdots x_k^{Y_{k\kappa}} \; \kappa }$

We get this:

$\displaystyle { s^\dagger x^Y = \sum_{\kappa, \tau \; : \; s(\tau) = \kappa} r(\tau) \; x_1^{Y_{1\kappa}} \cdots x_k^{Y_{k\kappa}} \; \tau }$

In other words, $s^\dagger x^Y$ is a linear combination of transitions, where each one is weighted both by the rate it happens and how many ways it can happen starting with our raw ingredients.

Our goal is to compute $(t - s)s^\dagger x^Y.$ We’re almost there. Remember, $s$ says which complex is the input of a given transition, and $t$ says which complex is the output. So, $(t - s) s^\dagger x^Y$ says the total rate at which complexes are created and/or destroyed starting with the species in $x$ as our raw ingredients.

That sounds good. But let’s just pedantically check that everything works. Applying $t - s$ to both sides of our last equation, we get

$\displaystyle{ (t - s) s^\dagger x^Y = \sum_{\kappa, \tau \; : \; s(\tau) = \kappa} r(\tau) \; x_1^{Y_{1\kappa}} \cdots x_k^{Y_{k\kappa}} \; \left( t(\tau) - s(\tau)\right) }$

Remember, our goal was to prove that this equals

$\displaystyle{ \sum_{\tau \in T} r(\tau) \; \left(t(\tau) - s(\tau)\right) \; x^{Y s(\tau)} }$

But if you stare at these a while and think, you’ll see they’re equal.   █

It took me a couple of weeks to really understand this, so I’ll be happy if it takes you just a few days. It seems peculiar at first but ultimately it all makes sense. The interesting subtlety is that we use the linear map called ‘multiplying by $Y$’:

$\begin{array}{ccc} \mathbb{R}^K &\to& \mathbb{R}^S \\ \psi &\mapsto& Y \psi \end{array}$

to take a bunch of complexes and work out the species they contain, while we use the nonlinear map called ‘raising to the $Y$th power’:

$\begin{array}{ccc} \mathbb{R}^S &\to& \mathbb{R}^K \\ x &\mapsto& x^Y \end{array}$

to take a bunch of species and work out how many ways we can build each complex from them. There is much more to say about this: for example, these maps arise from a pair of what category theorists call ‘adjoint functors’. But I’m worn out and you probably are too, if you’re still here at all.

### References

I found this thesis to be the most helpful reference when I was trying to understand the proof of the deficiency zero theorem:

• Jonathan M. Guberman, Mass Action Reaction Networks and the Deficiency Zero Theorem, B.A. thesis, Department of Mathematics, Harvard University, 2003.

I urge you to check it out. In particular, Section 3 and Appendix A discuss matrix exponentiation. Has anyone discussed this before?

Here’s another good modern treatment of the deficiency zero theorem:

• Jeremy Gunawardena, Chemical reaction network theory for in silico biologists, 2003.

The theorem was first proved here:

• Martin Feinberg, Chemical reaction network structure and the stability of complex isothermal reactors: I. The deficiency zero and deficiency one theorems, Chemical Engineering Science 42 (1987), 2229-2268.

However, Feinberg’s treatment here relies heavily on this paper:

• F. Horn and R. Jackson, General mass action kinetics, Archive for Rational Mechanics and Analysis 47 (1972), 81-116.

(Does anyone work on ‘irrational mechanics’?) These lectures are also very helpful:

• Martin Feinberg, Lectures on reaction networks, 1979.

If you’ve seen other proofs, let us know.

## Network Theory (Part 21)

14 August, 2012

Recently we’ve been talking about ‘reaction networks’, like this:

The letters are names of chemicals, and the arrows are chemical reactions. If we know how fast each reaction goes, we can write down a ‘rate equation’ describing how the amount of each chemical changes with time.

In Part 17 we met the deficiency zero theorem. This is a powerful tool for finding equilibrium solutions of the rate equation: in other words, solutions where the amounts of the various chemicals don’t change at all. To apply this theorem, two conditions must hold. Both are quite easy to check:

• Your reaction network needs to be ‘weakly reversible’: if you have a reaction that takes one bunch of chemicals to another, there’s a series of reactions that takes that other bunch back to the one you started with.

• A number called the ‘deficiency’ that you can compute from your reaction network needs to be zero.

The first condition makes a lot of sense, intuitively: you won’t get an equilibrium with all the different chemicals present if some chemicals can turn into others but not the other way around. But the second condition, and indeed the concept of ‘deficiency’, seems mysterious.

Luckily, when you work through the proof of the deficiency zero theorem, the mystery evaporates. It turns out that there are two equivalent ways to define the deficiency of a reaction network. One makes it easy to compute, and that’s the definition people usually give. But the other explains why it’s important.

In fact the whole proof of the deficiency zero theorem is full of great insights, so I want to show it to you. This will be the most intense part of the network theory series so far, but also the climax of its first phase. After a little wrapping up, we will then turn to another kind of network: electrical circuits!

Today I’ll just unfold the concept of ‘deficiency’ so we see what it means. Next time I’ll show you a slick way to write the rate equation, which is crucial to proving the deficiency zero theorem. Then we’ll start the proof.

### Reaction networks

Let’s recall what a reaction network is, and set up our notation. In chemistry we consider a finite set of ‘species’ like C, O2, H2O and so on… and then consider reactions like

CH4 + 2O2 $\longrightarrow$ CO2 + 2H2O

On each side of this reaction we have a finite linear combination of species, with natural numbers as coefficients. Chemists call such a thing a complex.

So, given any finite collection of species, say $S,$ let’s write $\mathbb{N}^S$ to mean the set of finite linear combinations of elements of $S,$ with natural numbers as coefficients. The complexes appearing in our reactions will form a subset of this, say $K.$

We’ll also consider a finite collection of reactions—or in the language of Petri nets, ‘transitions’. Let’s call this $T$. Each transition goes from some complex to some other complex: if we want a reaction to be reversible we’ll explicitly include another reaction going the other way. So, given a transition $\tau \in T$ it will always go from some complex called its source $s(\tau)$ to some complex called its target $t(\tau).$

All this data, put together, is a reaction network:

Definition. A reaction network $(S,\; s,t : T \to K)$ consists of:

• a finite set $S$ of species,

• a finite set $T$ of transitions,

• a finite set $K \subset \mathbb{N}^S$ of complexes,

source and target maps $s,t : T \to K.$

We can draw a reaction network as a graph with complexes as vertices and transitions as edges:

The set of species here is $S = \{A,B,C,D,E\},$ and the set of complexes is $K = \{A,B,D,A+C,B+E\}.$

But to write down the ‘rate equation’ describing our chemical reactions, we need a bit more information: constants $r(\tau)$ saying the rate at which each transition $\tau$ occurs. So, we define:

Definition. A stochastic reaction network is a reaction network together with a map $r: T \to (0,\infty)$ assigning a rate constant to each transition.

Let me remind you how the rate equation works. At any time we have some amount $x_i \in [0,\infty)$ of each species $i.$ These numbers are the components of a vector $x \in \mathbb{R}^S,$ which of course depends on time. The rate equation says how this vector changes:

$\displaystyle{ \frac{d x_i}{d t} = \sum_{\tau \in T} r(\tau) \left(t_i(\tau) - s_i(\tau)\right) x^{s(\tau)} }$

Here I’m writing $s_i(\tau)$ for the $i$th component of the vector $s(\tau),$ and similarly for $t_i(\tau).$ I should remind you what $x^{s(\tau)}$ means, since here we are raising a vector to a vector power, which is a bit unusual. So, suppose we have any vector

$x = (x_1, \dots, x_k)$

and we raise it to the power of

$s = (s_1, \dots, s_k)$

Then by definition we get

$\displaystyle{ x^s = x_1^{s_1} \cdots x_k^{s_k} }$

Given this, I hope the rate equation makes intuitive sense! There’s one term for each transition $\tau.$ The factor of $t_i(\tau) - s_i(\tau)$ shows up because our transition destroys $s_i(\tau)$ things of the $i$th species and creates $t_i(\tau)$ of them. The big product

$\displaystyle{ x^{s(\tau)} = x_1^{s_1(\tau)} \cdots x_k^{s_k(\tau)} }$

shows up because our transition occurs at a rate proportional to the product of the numbers of things it takes as inputs. The constant of proportionality is the reaction rate $r(\tau).$

The deficiency zero theorem says lots of things, but in the next few episodes we’ll prove a weak version, like this:

Deficiency Zero Theorem (Baby Version). Suppose we have a weakly reversible reaction network with deficiency zero. Then for any choice of rate constants there exists an equilibrium solution $x \in (0,\infty)^S$ of the rate equation. In other words:

$\displaystyle{ \sum_{\tau \in T} r(\tau) \left(t_i(\tau) - s_i(\tau)\right) x^{s(\tau)} = 0}$

An important feature of this result is that all the components of the vector $x$ are positive. In other words, there’s actually some chemical of each species present!

But what do the hypotheses in this theorem mean?

A reaction network is weakly reversible if for any transition $\tau \in T$ going from a complex $\kappa$ to a complex $\kappa',$ there is a sequence of transitions going from $\kappa'$ back to $\kappa.$ But what about ‘deficiency zero’? As I mentioned, this requires more work to understand.

So, let’s dive in!

### Deficiency

In modern math, we like to take all the stuff we’re thinking about and compress it into a diagram with a few objects and maps going between these objects. So, to understand the deficiency zero theorem, I wanted to isolate the crucial maps. For starters, there’s an obvious map

$Y : K \to \mathbb{N}^S$

sending each complex to the linear combination of species that it is.

Indeed, we can change viewpoints a bit and think of $K$ as an abstract set equipped with this map saying how each complex is made of species. From now on this is what we’ll do. Then all the information in a stochastic reaction network sits in this diagram:

This is fundamental to everything we’ll do for the next few episodes, so take a minute to lock it into your brain.

We’ll do lots of different things with this diagram. For example, we often want to use ideas from linear algebra, and then we want our maps to be linear. For example, $Y$ extends uniquely to a linear map

$Y : \mathbb{R}^K \to \mathbb{R}^S$

sending real linear combinations of complexes to real linear combinations of species. Reusing the name $Y$ here won’t cause confusion. We can also extend $r,$ $s,$ and $t$ to linear maps in a unique way, getting a little diagram like this:

Linear algebra lets us talk about differences of complexes. Each transition $\tau$ gives a vector

$\partial(\tau) = t(\tau) - s(\tau) \in \mathbb{R}^K$

saying the change in complexes that it causes. And we can extend $\partial$ uniquely to a linear map

$\partial : \mathbb{R}^T \to \mathbb{R}^K$

defined on linear combinations of transitions. Mathematicians would call $\partial$ a boundary operator.

So, we have a little sequence of linear maps

$\mathbb{R}^T \stackrel{\partial}{\to} \mathbb{R}^K \stackrel{Y}{\to} \mathbb{R}^S$

This turns a transition into a change in complexes, and then a change in species.

If you know fancy math you’ll be wondering if this sequence is a ‘chain complex’, which is a fancy way of saying that $Y \partial = 0.$ The answer is no. This equation means that every linear combination of reactions leaves the amount of all species unchanged. Or equivalently: every reaction leaves the amount of all species unchanged. This only happens in very silly examples.

Nonetheless, it’s possible for a linear combination of reactions to leave the amount of all species unchanged.

For example, this will happen if we have a linear combination of reactions that leaves the amount of all complexes unchanged. But this sufficient condition is not necessary. And this leads us to the concept of ‘deficiency zero’:

Definition. A reaction network has deficiency zero if any linear combination of reactions that leaves the amount of every species unchanged also leaves the amount of every complex unchanged.

In short, a reaction network has deficiency zero iff

$Y (\partial \rho) = 0 \implies \partial \rho = 0$

for every $\rho \in \mathbb{R}^T.$ In other words—using some basic concepts from linear algebra—a reaction network has deficiency zero iff $Y$ is one-to-one when restricted to the image of $\partial.$ Remember, the image of $\partial$ is

$\mathrm{im} \partial = \{ \partial \rho \; : \; \rho \in \mathbb{R}^T \}$

Roughly speaking, this consists of all changes in complexes that can occur due to reactions.

In still other words, a reaction network has deficiency zero if 0 is the only vector in both the image of $\partial$ and the kernel of $Y:$

$\mathrm{im} \partial \cap \mathrm{ker} Y = \{ 0 \}$

Remember, the kernel of $Y$ is

$\mathrm{ker} Y = \{ \phi \in \mathbb{R}^K : Y \phi = 0 \}$

Roughly speaking, this consists of all changes in complexes that don’t cause changes in species. So, ‘deficiency zero’ roughly says that if a reaction causes a change in complexes, it causes a change in species.

(All this ‘roughly speaking’ stuff is because in reality I should be talking about linear combinations of transitions, complexes and species. But it’s a bit distracting to keep saying that when I’m trying to get across the basic ideas!)

Now we’re ready to understand deficiencies other than zero, at least a little. They’re defined like this:

Definition. The deficiency of a reaction network is the dimension of $\mathrm{im} \partial \cap \mathrm{ker} Y.$

### How to compute the deficiency

You can compute the deficiency of a reaction network just by looking at it. However, it takes a little training. First, remember that a reaction network can be drawn as a graph with complexes as vertices and transitions as edges, like this:

There are three important numbers we can extract from this graph:

• We can count the number of vertices in this graph; let’s call that $|K|,$ since it’s just the number of complexes.

• We can count the number of pieces or ‘components’ of this graph; let’s call that $\# \mathrm{components}$ for now.

• We can also count the dimension of the image of $Y \partial.$ This space, $\mathrm{im} Y \partial,$ is called the stochiometric subspace: vectors in here are changes in species that can be accomplished by transitions in our reaction network, or linear combinations of transitions.

These three numbers, all rather easy to compute, let us calculate the deficiency:

Theorem. The deficiency of a reaction network equals

$|K| - \# \mathrm{components} - \mathrm{dim} \left( \mathrm{im} Y \partial \right)$

Proof. By definition we have

$\mathrm{deficiency} = \dim \left( \mathrm{im} \partial \cap \mathrm{ker} Y \right)$

but another way to say this is

$\mathrm{deficiency} = \mathrm{dim} (\mathrm{ker} Y|_{\mathrm{im} \partial})$

where we are restricting $Y$ to the subspace $\mathrm{im} \partial,$ and taking the dimension of the kernel of that.

The rank-nullity theorem says that whenever you have a linear map $T: V \to W$ between finite-dimensional vector spaces, then

$\mathrm{dim} \left(\mathrm{ker} T \right) \; = \; \mathrm{dim}\left(\mathrm{dom} T\right) \; - \; \mathrm{dim} \left(\mathrm{im} T\right)$

where $\mathrm{dom} T$ is the domain of $T,$ namely the vector space $V.$ It follows that

$\mathrm{dim}(\mathrm{ker} Y|_{\mathrm{im} \partial}) = \mathrm{dim}(\mathrm{dom} Y|_{\mathrm{im} \partial}) - \mathrm{dim}(\mathrm{im} Y|_{\mathrm{im} \partial})$

The domain of $Y|_{\mathrm{im} \partial}$ is just $\mathrm{im} \partial,$ while its image equals $\mathrm{im} Y \partial,$ so

$\mathrm{deficiency} = \mathrm{dim}(\mathrm{im} \partial) - \mathrm{dim}(\mathrm{im} Y \partial)$

The theorem then follows from this:

Lemma. $\mathrm{dim} (\mathrm{im} \partial) = |K| - \# \mathrm{components}$

Proof. In fact this holds whenever we have a finite set of complexes and a finite set of transitions going between them. We get a diagram

We can extend the source and target functions to linear maps as usual:

and then we can define $\partial = t - s.$ We claim that

$\mathrm{dim} (\mathrm{im} \partial) = |K| - \# \mathrm{components}$

where $\# \mathrm{components}$ is the number of connected components of the graph with $K$ as vertices and $T$ as edges.

This is easiest to see using an inductive argument where we start by throwing out all the edges of our graph and then put them back in one at a time. When our graph has no edges, $\partial = 0$ and the number of components is $|K|,$ so our claim holds: both sides of the equation above are zero. Then, each time we put in an edge, there are two choices: either it goes between two different components of the graph we have built so far, or it doesn’t. If goes between two different components, it increases $\mathrm{dim} (\mathrm{im} \partial)$ by 1 and decreases the number of components by 1, so our equation continues to hold. If it doesn’t, neither of these quantities change, so our equation again continues to hold.   █

### Examples

This reaction network is not weakly reversible, since we can get from $B + E$ and $A + C$ to $D$ but not back. It becomes weakly reversible if we throw in another transition:

Taking a reaction network and throwing in the reverse of an existing transition never changes the number of complexes, the number of components, or the dimension of the stoichiometric subspace. So, the deficiency of the reaction network remains unchanged. We computed it back in Part 17. For either reaction network above:

• the number of complexes is 5:

$|K| = |\{A,B,D, A+C, B+E\}| = 5$

• the number of components is 2:

$\# \mathrm{components} = 2$

• the dimension of the stoichiometric subspace is 3. For this we go through each transition, see what change in species it causes, and take the vector space spanned by these changes. Then we find a basis for this space and count it:

$\mathrm{im} Y \partial =$

$\langle B - A, A - B, D - A - C, D - (B+E) , (B+E)-(A+C) \rangle$

$= \langle B -A, D - A - C, D - A - E \rangle$

so

$\mathrm{dim} \left(\mathrm{im} Y \partial \right) = 3$

As a result, the deficiency is zero:

$\begin{array}{ccl} \mathrm{deficiency} &=& |K| - \# \mathrm{components} - \mathrm{dim} \left( \mathrm{im} Y \partial \right) \\ &=& 5 - 2 - 3 \\ &=& 0 \end{array}$

Now let’s throw in another complex and some more transitions:

Now:

• the number of complexes increases by 1:

$|K| = |\{A,B,D,E, A+C, B+E\}| = 6$

• the number of components is unchanged:

$\# \mathrm{components} = 2$

• the dimension of the stoichiometric subspace increases by 1. We never need to include reverses of transitions when computing this:

$\mathrm{im} Y \partial =$

$\langle B-A, E-B, D - (A+C),$
$D - (B+E) , (B+E)-(A+C) \rangle$

$= \langle B -A, E-B, D - A - C, D - B - E \rangle$

so

$\mathrm{dim} \left(\mathrm{im} Y \partial \right) = 4$

As a result, the deficiency is still zero:

$\begin{array}{ccl} \mathrm{deficiency} &=& |K| - \# \mathrm{components} - \mathrm{dim} \left( \mathrm{im} Y \partial \right) \\ &=& 6 - 2 - 4 \\ &=& 0 \end{array}$

Do all reaction networks have deficiency zero? That would be nice. Let’s try one more example:

• the number of complexes is the same as in our last example:

$|K| = |\{A,B,D,E, A+C, B+E\}| = 6$

• the number of components is also the same:

$\# \mathrm{components} = 2$

• the dimension of the stoichiometric subspace is also the same:

$\mathrm{im} Y \partial =$

$\langle B-A, D - (A+C), D - (B+E) ,$
$(B+E)-(A+C), (B+E) - B \rangle$

$= \langle B -A, D - A - C, D - B , E\rangle$

so

$\mathrm{dim} \left(\mathrm{im} Y \partial \right) = 4$

So the deficiency is still zero:

$\begin{array}{ccl} \mathrm{deficiency} &=& |K| - \# \mathrm{components} - \mathrm{dim} \left( \mathrm{im} Y \partial \right) \\ &=& 6 - 2 - 4 \\ &=& 0 \end{array}$

It’s sure easy to find examples with deficiency zero!

Puzzle 1. Can you find an example where the deficiency is not zero?

Puzzle 2. If you can’t find an example, prove the deficiency is always zero. If you can, find 1) the smallest example and 2) the smallest example that actually arises in chemistry.

Note that not all reaction networks can actually arise in chemistry. For example, the transition $A \to A + A$ would violate conservation of mass. Nonetheless a reaction network like this might be useful in a very simple model of amoeba reproduction, one that doesn’t take limited resources into account.

### Different kinds of graphs

I’ll conclude with some technical remarks that only a mathematician could love; feel free to skip them if you’re not in the mood. As you’ve already seen, it’s important that a reaction network can be drawn as a graph:

But there are many kinds of graph. What kind of graph is this, exactly?

As I mentioned in Part 17, it’s a directed multigraph, meaning that the edges have arrows on them, more than one edge can go from one vertex to another, and we can have any number of edges going from a vertex to itself. Not all those features are present in this example, but they’re certainly allowed by our definition of reaction network!

After all, we’ve seen that a stochastic reaction network amounts to a little diagram like this:

If we throw out the rate constants, we’re left with a reaction network. So, a reaction network is just a diagram like this:

If we throw out the information about how complexes are made of species, we’re left with an even smaller diagram:

And this precisely matches the slick definition of a directed multigraph: it’s a set $E$ of edges, a set $V$ of vertices, and functions $s,t : E \to V$ saying where each edge starts and where it ends: its source and target.

Since this series is about networks, we should expect to run into many kinds of graphs. While their diversity is a bit annoying at first, we must learn to love it, at least if we’re mathematicians and want everything to be completely precise.

There are at least 23 = 8 kinds of graphs, depending on our answers to three questions:

• Do the edges have arrows on them?

• Can more than one edge can go between a specific pair of vertices?

and

• Can an edge can go from a vertex to itself?

We get directed multigraphs if we answer YES, YES and YES to these questions. Since they allow all three features, directed multigraphs are very important. For example, a category is a directed multigraph equipped with some extra structure. Also, what mathematicians call a quiver is just another name for a directed multigraph.

We’ve met two other kinds of graph so far:

• In Part 15 and Part 16 we described circuits made of resistors—or, in other words, Dirichlet operators—using ‘simple graphs’. We get simple graphs when we answer NO, NO and NO to the three questions above. The slick definition of a simple graph is that it’s a set $V$ of vertices together with a subset $E$ of the collection of 2-element subsets of $V$.

• In Part 20 we described Markov processes on finite sets—or, in other words, infinitesimal stochastic operators—using ‘directed graphs’. We get directed graphs when we answer YES, NO and YES to the three questions. The slick definition of a directed graph is that it’s a set $V$ of vertices together with a subset $E$ of the ordered pairs of $V$: $E \subseteq V \times V$.

There is a lot to say about this business, but for now I’ll just note that you can use directed multigraphs with edges labelled by positive numbers to describe Markov processes, just as we used directed graphs. You don’t get anything more general, though! After all, if we have multiple edges from one vertex to another, we can replace them by a single edge as long as we add up their rate constants. And an edge from a vertex to itself has no effect at all.

In short, both for Markov processes and reaction networks, we can take ‘graph’ to mean either ‘directed graph’ or ‘directed multigraph’, as long as we make some minor adjustments. In what follows, we’ll use directed multigraphs for both Markov processes and reaction networks. It will work a bit better if we ever get around to explaining how category theory fits into this story.

## Network Theory (Part 19)

18 July, 2012

joint with Jacob Biamonte

It’s time to resume the network theory series! We’re writing a little book based on some of these posts, so we want to finish up our discussion of stochastic Petri nets and chemical reaction networks. But it’s been a long time, so we’ll assume you forgot everything we’ve said before, and make this post as self-contained as possible.

Last time we started looking at a simple example: a diatomic gas.

A diatomic molecule of this gas can break apart into two atoms:

$A_2 \to A + A$

and conversely, two atoms can combine to form a diatomic molecule:

$A + A \to A_2$

We can draw both these reactions using a chemical reaction network:

where we’re writing $B$ instead of $A_2$ to abstract away some detail that’s just distracting here.

Last time we looked at the rate equation for this chemical reaction network, and found equilibrium solutions of that equation. Now let’s look at the master equation, and find equilibrium solutions of that. This will serve as a review of three big theorems.

### The master equation

We’ll start from scratch. The master equation is all about how atoms or molecules or rabbits or wolves or other things interact randomly and turn into other things. So, let’s write $\psi_{m,n}$ for the probability that we have $m$ atoms of $A$ and $n$ molecule of $B$ in our container. These probabilities are functions of time, and master equation will say how they change.

First we need to pick a rate constant for each reaction. Let’s say the rate constant for this reaction is some number $\alpha > 0$:

$B \to A + A$

while the rate constant for the reverse reaction is some number $\beta > 0$:

$A + A \to B$

Before we make it pretty using the ideas we’ve been explaining all along, the master equation says

$\begin{array}{ccr} \displaystyle{ \frac{d}{d t} \psi_{m,n} (t)} &=& \alpha (n+1) \, \psi_{m-2,n+1}(t) \\ \\ &&- \; \alpha n \, \psi_{m,n}(t) \\ \\ && + \beta (m+2)(m+1) \, \psi_{m+2,n-1}(t) \\ \\ && - \;\beta m(m-1) \, \psi_{m,n}(t) \\ \\ \end{array}$

where we define $\psi_{i,j}$ to be zero if either $i$ or $j$ is negative.

Yuck!

Normally we don’t show you such nasty equations. Indeed the whole point of our work has been to demonstrate that by packaging the equations in a better way, we can understand them using high-level concepts instead of mucking around with millions of scribbled symbols. But we thought we’d show you what’s secretly lying behind our beautiful abstract formalism, just once.

Each term has a meaning. For example, the third one:

$\beta (m+2)(m+1)\psi_{m+2,n-1}(t)$

means that the reaction $A + A \to B$ will tend to increase the probability of there being $m$ atoms of $A$ and $n$ molecules of $B$ if we start with $m+2$ atoms of $A$ and $n-1$ molecules of $B.$ This reaction can happen in $(m+2)(m+1)$ ways. And it happens at a probabilistic rate proportional to the rate constant for this reaction, $\beta.$

We won’t go through the rest of the terms. It’s a good exercise to do so, but there could easily be a typo in the formula, since it’s so long and messy. So let us know if you find one!

To simplify this mess, the key trick is to introduce a generating function that summarizes all the probabilities in a single power series:

$\Psi = \sum_{m,n \ge 0} \psi_{m,n} y^m \, z^n$

It’s a power series in two variables, $y$ and $z,$ since we have two chemical species: $A$s and $B$s.

Using this trick, the master equation looks like

$\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }$

where the Hamiltonian $H$ is a sum of terms, one for each reaction. This Hamiltonian is built from operators that annihilate and create $A$s and $B$s. The annihilation and creation operators for $A$ atoms are:

$\displaystyle{ a = \frac{\partial}{\partial y} , \qquad a^\dagger = y }$

The annihilation operator differentiates our power series with respect to the variable $y.$ The creation operator multiplies it by that variable. Similarly, the annihilation and creation operators for $B$ molecules are:

$\displaystyle{ b = \frac{\partial}{\partial z} , \qquad b^\dagger = z }$

In Part 8 we explained a recipe that lets us stare at our chemical reaction network and write down this Hamiltonian:

$H = \alpha ({a^\dagger}^2 b - b^\dagger b) + \beta (b^\dagger a^2 - {a^\dagger}^2 a^2)$

As promised, there’s one term for each reaction. But each term is itself a sum of two: one that increases the probability that our container of chemicals will be in a new state, and another that decreases the probability that it’s in its original state. We get a total of four terms, which correspond to the four terms in our previous way of writing the master equation.

Puzzle 1. Show that this new way of writing the master equation is equivalent to the previous one.

### Equilibrium solutions

Now we will look for all equilibrium solutions of the master equation: in other words, solutions that don’t change with time. So, we’re trying to solve

$H \Psi = 0$

Given the rather complicated form of the Hamiltonian, this seems tough. The challenge looks more concrete but even more scary if we go back to our original formulation. We’re looking for probabilities $\psi_{m,n},$ nonnegative numbers that sum to one, such that

$\begin{array}{ccr} 0 &=& \alpha (n+1) \, \psi_{m-2,n+1} \\ \\ &&- \; \alpha n \, \psi_{m,n} \\ \\ && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\ \end{array}$

for all $m$ and $n.$

This equation is horrid! But the good news is that it’s linear, so a linear combination of solutions is again a solution. This lets us simplify the problem using a conserved quantity.

Clearly, there’s a quantity that the reactions here don’t change:

What’s that? It’s the number of $A$s plus twice the number of $B$s. After all, a $B$ can turn into two $A$s, or vice versa.

(Of course the secret reason is that $B$ is a diatomic molecule made of two $A$s. But you’d be able to follow the logic here even if you didn’t know that, just by looking at the chemical reaction network… and sometimes this more abstract approach is handy! Indeed, the way chemists first discovered that certain molecules are made of certain atoms is by seeing which reactions were possible and which weren’t.)

Suppose we start in a situation where we know for sure that the number of $B$s plus twice the number of $A$s equals some number $k$:

$\psi_{m,n} = 0 \; \textrm{unless} \; m+2n = k$

Then we know $\Psi$ is initially of the form

$\displaystyle{ \Psi = \sum_{m+2n = k} \psi_{m,n} \, y^m z^n }$

But since the number of $A$s plus twice the number of $B$s is conserved, if $\Psi$ obeys the master equation it will continue to be of this form!

Put a fancier way, we know that if a solution of the master equation starts in this subspace:

$\displaystyle{ L_k = \{ \Psi: \; \Psi = \sum_{m+2n = k} \psi_{m,n} y^m z^n \; \textrm{for some} \; \psi_{m,n} \} }$

it will stay in this subspace. So, because the master equation is linear, we can take any solution $\Psi$ and write it as a linear combination of solutions $\Psi_k,$ one in each subspace $L_k.$

In particular, we can do this for an equilibrium solution $\Psi.$ And then all the solutions $\Psi_k$ are also equilibrium solutions: they’re linearly independent, so if one of them changed with time, $\Psi$ would too.

This means we can just look for equilibrium solutions in the subspaces $L_k.$ If we find these, we can get all equilibrium solutions by taking linear combinations.

Once we’ve noticed that, our horrid equation makes a bit more sense:

$\begin{array}{ccr} 0 &=& \alpha (n+1) \, \psi_{m-2,n+1} \\ \\ &&- \; \alpha n \, \psi_{m,n} \\ \\ && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\ \end{array}$

Note that if the pair of subscripts $m, n$ obey $m + 2n = k,$ the same is true for the other pairs of subscripts here! So our equation relates the values of $\psi_{m,n}$ for all the points $(m,n)$ with integer coordinates lying on this line segment:

$m+2n = k , \qquad m ,n \ge 0$

You should be visualizing something like this:

If you think about it a minute, you’ll see that if we know $\psi_{m,n}$ at two points on such a line, we can keep using our equation to recursively work out all the rest. So, there are at most two linearly independent equilibrium solutions of the master equation in each subspace $L_k.$

Why at most two? Why not two? Well, we have to be a bit careful about what happens at the ends of the line segment: remember that $\psi_{m,n}$ is defined to be zero when $m$ or $n$ becomes negative. If we think very hard about this, we’ll see there’s just one linearly independent equilibrium solution of the master equation in each subspace $L_k.$ But this is the sort of nitty-gritty calculation that’s not fun to watch someone else do, so we won’t bore you with that.

Soon we’ll move on to a more high-level approach to this problem. But first, one remark. Our horrid equation is like a fancy version of the usual discretized form of the equation

$\displaystyle {\frac{d^2 \psi}{d x^2} = 0 }$

namely:

$\psi_{n-1} - 2 \psi_{n} + \psi_{n+1} = 0$

And this makes sense, since we get

$\displaystyle {\frac{d^2 \psi}{d x^2} = 0 }$

by taking the heat equation:

$\displaystyle \frac{\partial \psi}{\partial t} = {\frac{\partial^2 \psi}{\partial x^2} }$

and assuming $\psi$ doesn’t depend on time. So what we’re doing is a lot like looking for equilibrium solutions of the heat equation.

The heat equation describes how heat smears out as little particles of heat randomly move around. True, there don’t really exist ‘little particles of heat’, but this equation also describes the diffusion of any other kind of particles as they randomly move around undergoing Brownian motion. Similarly, our master equation describes a random walk on this line segment:

$m+2n = k , \qquad m , n \ge 0$

or more precisely, the points on this segment with integer coordinates. The equilibrium solutions arise when the probabilities $\psi_{m,n}$ have diffused as much as possible.

If you think about it this way, it should be physically obvious that there’s just one linearly independent equilibrium solution of the master equation for each value of $k.$

There’s a general moral here, too, which we’re seeing in a special case: the master equation for a chemical reaction network really describes a bunch of random walks, one for each allowed value of the conserved quantities that can be built as linear combinations of number operators. In our case we have one such conserved quantity, but in general there may be more (or none).

Furthermore, these ‘random walks’ are what we’ve been calling Markov processes.

### Noether’s theorem

We simplified our task of finding equilibrium solutions of the master equation by finding a conserved quantity. The idea of simplifying problems using conserved quantities is fundamental to physics: this is why physicists are so enamored with quantities like energy, momentum, angular momentum and so on.

Nowadays physicists often use ‘Noether’s theorem’ to get conserved quantities from symmetries. There’s a very simple version of Noether’s theorem for quantum mechanics, but in Part 11 we saw a version for stochastic mechanics, and it’s that version that is relevant now. Here’s a paper which explains it in detail:

• John Baez and Brendan Fong, Noether’s theorem for Markov processes.

We don’t really need Noether’s theorem now, since we found the conserved quantity and exploited it without even noticing the symmetry. Nonetheless it’s interesting to see how it relates to what we’re doing.

For the reaction we’re looking at now, the idea is that the subspaces $L_k$ are eigenspaces of an operator that commutes with the Hamiltonian $H.$ It follows from standard math that a solution of the master equation that starts in one of these subspaces, stays in that subspace.

What is this operator? It’s built from ‘number operators’. The number operator for $A$s is

$N_A = a^\dagger a$

and the number operator for $B$s is

$N_B = b^\dagger b$

A little calculation shows

$N_A \,y^m z^n = m \, y^m z^n, \quad \qquad N_B\, y^m z^n = n \,y^m z^n$

so the eigenvalue of $N_A$ is the number of $A$s, while the eigenvalue of $N_B$ is the number of $B$s. This is why they’re called number operators.

As a consequence, the eigenvalue of the operator $N_A + 2N_B$ is the number of $A$s plus twice the number of $B$s:

$(N_A + 2N_B) \, y^m z^n = (m + 2n) \, y^m z^n$

Let’s call this operator $O,$ since it’s so important:

$O = N_A + 2N_B$

If you think about it, the spaces $L_k$ we saw a minute ago are precisely the eigenspaces of this operator:

$L_k = \{ \Psi : \; O \Psi = k \Psi \}$

As we’ve seen, solutions of the master equation that start in one of these eigenspaces will stay there. This lets take some techniques that are very familiar in quantum mechanics, and apply them to this stochastic situation.

First of all, time evolution as described by the master equation is given by the operators $\exp(t H).$ In other words,

$\displaystyle{ \frac{d}{d t} \Psi(t) } = H \Psi(t) \; \textrm{and} \; \Psi(0) = \Phi \quad \Rightarrow \quad \Psi(t) = \exp(t H) \Phi$

But if you start in some eigenspace of $O,$ you stay there. Thus if $\Phi$ is an eigenvector of $O,$ so is $\exp(t H) \Phi,$ with the same eigenvalue. In other words,

$O \Phi = k \Phi$

implies

$O \exp(t H) \Phi = k \exp(t H) \Phi = \exp(t H) O \Phi$

But since we can choose a basis consisting of eigenvectors of $O,$ we must have

$O \exp(t H) = \exp(t H) O$

or, throwing caution to the winds and differentiating:

$O H = H O$

So, as we’d expect from Noether’s theorem, our conserved quantity commutes with the Hamiltonian! This in turn implies that $H$ commutes with any polynomial in $O,$ which in turn suggests that

$\exp(s O) H = H \exp(s O)$

and also

$\exp(s O) \exp(t H) = \exp(t H) \exp(s O)$

The last equation says that $O$ generates a 1-parameter family of ‘symmetries’: operators $\exp(s O)$ that commute with time evolution. But what do these symmetries actually do? Since

$O y^m z^n = (m + 2n) y^m z^n$

we have

$\exp(s O) y^m z^n = e^{s(m + 2n)}\, y^m z^n$

So, this symmetry takes any probability distribution $\psi_{m,n}$ and multiplies it by $e^{s(m + 2n)}.$

In other words, our symmetry multiplies the relative probability of finding our container of gas in a given state by a factor of $e^s$ for each $A$ atom, and by a factor of $e^{2s}$ for each $B$ molecule. It might not seem obvious that this operation commutes with time evolution! However, experts on chemical reaction theory are familiar with this fact.

Finally, a couple of technical points. Starting where we said “throwing caution to the winds”, our treatment has not been rigorous, since $O$ and $H$ are unbounded operators, and these must be handled with caution. Nonetheless, all the commutation relations we wrote down are true.

The operators $\exp(s O)$ are unbounded for positive $s.$ They’re bounded for negative $s,$ so they give a one-parameter semigroup of bounded operators. But they’re not stochastic operators: even for $s$ negative, they don’t map probability distributions to probability distributions. However, they do map any nonzero vector $\Psi$ with $\psi_{m,n} \ge 0$ to a vector $\exp(s O) \Psi$ with the same properties. So, we can just normalize this vector and get a probability distribution. The need for this normalization is why we spoke of relative probabilities.

### The Anderson–Craciun–Kurtz theorem

Now we’ll actually find all equilibrium solutions of the master equation in closed form. To understand this final section, you really do need to remember some things we’ve discussed earlier. Last time we considered the same chemical reaction network we’re studying today, but we looked at its rate equation, which looks like this:

$\displaystyle{ \frac{d}{d t} x_1 = 2 \alpha x_2 - 2 \beta x_1^2}$

$\displaystyle{ \frac{d}{d t} x_2 = - \alpha x_2 + \beta x_1^2 }$

This describes how the number of $A$s and $B$s changes in the limit where there are lots of them and we can treat them as varying continuously, in a deterministic way. The number of $A$s is $x_1,$ and the number of $B$s is $x_2.$

We saw that the quantity

$x_1 + 2 x_2$

is conserved, just as today we’ve seen that $N_A + 2 N_B$ is conserved. We saw that the rate equation has one equilibrium solution for each choice of $x_1 + 2 x_2.$ And we saw that these equilibrium solutions obey

$\displaystyle{ \frac{x_1^2}{x_2} = \frac{\alpha}{\beta} }$

The Anderson–Craciun–Kurtz theorem, introduced in Part 9, is a powerful result that gets equilibrium solution of the master equation from equilibrium solutions of the rate equation. It only applies to equilibrium solutions that are ‘complex balanced’, but that’s okay:

Puzzle 2. Show that the equilibrium solutions of the rate equation for the chemical reaction network

are complex balanced.

So, given any equilibrium solution $(x_1,x_2)$ of our rate equation, we can hit it with the Anderson-Craciun-Kurtz theorem and get an equilibrium solution of the master equation! And it looks like this:

$\displaystyle{ \Psi = e^{-(x_1 + x_2)} \, \sum_{m,n \ge 0} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }$

In this solution, the probability distribution

$\displaystyle{ \psi_{m,n} = e^{-(x_1 + x_2)} \, \frac{x_1^m x_2^n} {m! n! } }$

is a product of Poisson distributions. The factor in front is there to make the numbers $\psi_{m,n}$ add up to one. And remember, $x_1, x_2$ are any nonnegative numbers with

$\displaystyle{ \frac{x_1^2}{x_2} = \frac{\alpha}{\beta} }$

So from all we’ve said, the above formula is an explicit closed-form solution of the horrid equation

$\begin{array}{ccr} 0 &=& \alpha (n+1) \, \psi_{m-2,n+1} \\ \\ &&- \; \alpha n \, \psi_{m,n} \\ \\ && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\ \end{array}$

That’s pretty nice. We found some solutions without ever doing any nasty calculations.

But we’ve really done better than getting some equilibrium solutions of the master equation. By restricting attention to $n,m$ with $m+2n = k,$ our formula for $\psi_{m,n}$ gives an equilibrium solution that lives in the eigenspace $L_k$:

$\displaystyle{ \Psi_k = e^{-(x_1 + x_2)} \, \sum_{m+2n =k} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }$

And by what we’ve said, linear combinations of these give all equilibrium solutions of the master equation.

And we got them with very little work! Despite all the fancy talk in today’s post, we essentially just took the equilibrium solutions of the rate equation and plugged them into a straightforward formula to get equilibrium solutions of the master equation. This is why the Anderson–Craciun–Kurtz theorem is so nice. And of course we’re looking at a very simple reaction network: for more complicated ones it becomes even better to use this theorem to avoid painful calculations.

We could go further. For example, we could study nonequilibrium solutions using Feynman diagrams like this:

But instead, we will leave off with two more puzzles. We introduced some symmetries, but we haven’t really explored them yet:

Puzzle 3. What do the symmetries associated to the conserved quantity $O$ do to the equilibrium solutions of the master equation given by

$\displaystyle{ \Psi = e^{-(x_1 + x_2)} \, \sum_{m,n \ge 0} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }$

where $(x_1,x_2)$ is an equilibrium solution of the rate equation? In other words, what is the significance of the one-parameter family of solutions

$\exp(s O) \Psi ?$

Also, we used a conceptual argument to check that $H$ commutes with $O,$ but it’s good to know that we can check this sort of thing directly:

Puzzle 4. Compute the commutator

$[H, O] = H O - O H$

and show it vanishes.