## Lyapunov Functions for Complex-Balanced Systems

7 January, 2014

guest post by Manoj Gopalkrishnan

A few weeks back, I promised to tell you more about a long-standing open problem in reaction networks, the ‘global attractor conjecture’. I am not going to quite get there today, but we shall take one step in that direction.

Today’s plan is to help you make friends with a very useful function we will call the ‘free energy’ which comes up all the time in the study of chemical reaction networks. We will see that for complex-balanced systems, the free energy function decreases along trajectories of the rate equation. I’m going to explain this statement, and give you most of the proof!

The point of doing all this work is that we will then be able to invoke Lyapunov’s theorem which implies stability of the dynamics. In Greek mythology, Sisyphus was cursed to roll a boulder up a hill only to have it roll down again, so that he had to keep repeating the task for all eternity. When I think of an unstable equilibrium, I imagine a boulder delicately balanced on top of a hill, which will fall off if given the slightest push:

or, more abstractly:

On the other hand, I picture a stable equilibrium as a pebble at the very bottom of a hill. Whichever way a perturbation takes it is up, so it will roll down again to the bottom:

Lyapunov’s theorem guarantees stability provided we can exhibit a nice enough function $V$ that decreases along trajectories. ‘Nice enough’ means that, viewing $V$ as a height function for the hill, the equilibrium configuration should be at the bottom, and every direction from there should be up. If Sisyphus had dug a pit at the top of the hill for the boulder to rest in, Lyapunov’s theorem would have applied, and he could have gone home to rest. The moral of the story is that it pays to learn dynamical systems theory!

Because of the connection to Lyapunov’s theorem, such functions that decrease along trajectories are also called Lyapunov functions. A similar situation is seen in Boltzmann’s H-theorem, and hence such functions are sometimes called H-functions by physicists.

Another reason for me to talk about these ideas now is that I have posted a new article on the arXiv:

• Manoj Gopalkrishnan, On the Lyapunov function for complex-balanced mass-action systems.

The free energy function in chemical reaction networks goes back at least to 1972, to this paper:

• Friedrich Horn and Roy Jackson, General mass action kinetics, Arch. Rational Mech. Analysis 49 (1972), 81–116.

Many of us credit Horn and Jackson’s paper with starting the mathematical study of reaction networks. My paper is an exposition of the main result of Horn and Jackson, with a shorter and simpler proof. The gain comes because Horn and Jackson proved all their results from scratch, whereas I’m using some easy results from graph theory, and the log-sum inequality.

We shall be talking about reaction networks. Remember the idea from the network theory series. We have a set $S$ whose elements are called species, for example

$S = \{ \mathrm{H}_2\mathrm{O}, \mathrm{H}^+, \mathrm{OH}^- \}$

A complex is a vector of natural numbers saying how many items of each species we have. For example, we could have a complex $(2,3,1).$ But chemists would usually write this as

$2 \mathrm{H}_2\mathrm{O} + 3 \mathrm{H}^+ + \mathrm{OH}^-$

A reaction network is a set $S$ of species and a set $T$ of transitions or reactions, where each transition $\tau \in T$ goes from some complex $m(\tau)$ to some complex $n(\tau).$ For example, we could have a transition $\tau$ with

$m(\tau) = \mathrm{H}_2\mathrm{O}$

and

$n(\tau) = \mathrm{H}^+ + \mathrm{OH}^-$

In this situation chemists usually write

$\mathrm{H}_2\mathrm{O} \to \mathrm{H}^+ + \mathrm{OH}^-$

but we want names like $\tau$ for our transitions, so we might write

$\tau : \mathrm{H}_2\mathrm{O} \to \mathrm{H}^+ + \mathrm{OH}^-$

or

$\mathrm{H}_2\mathrm{O} \stackrel{\tau}{\longrightarrow} \mathrm{H}^+ + \mathrm{OH}^-$

As John explained in Part 3 of the network theory series, chemists like to work with a vector of nonnegative real numbers $x(t)$ saying the concentration of each species at time $t.$ If we know a rate constant $r(\tau) > 0$ for each transition $\tau,$ we can write down an equation saying how these concentrations change with time:

$\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T} r(\tau) (n(\tau) - m(\tau)) x^{m(\tau)} }$

This is called the rate equation. It’s really a system of ODEs describing how the concentration of each species change with time. Here an expression like $x^m$ is shorthand for the monomial ${x_1}^{m_1} \cdots {x_k}^{m_k}.$

John and Brendan talked about complex balance in Part 9. I’m going to recall this definition, from a slightly different point of view that will be helpful for the result we are trying to prove.

We can draw a reaction network as a graph! The vertices of this graph are all the complexes $m(\tau), n(\tau)$ where $\tau \in T.$ The edges are all the transitions $\tau\in T.$ We think of each edge $\tau$ as directed, going from $m(\tau)$ to $n(\tau).$

We will call the map that sends each transition $\tau$ to the positive real number $r(\tau) x^{m(\tau)}$ the flow $f_x(\tau)$ on this graph. The rate equation can be rewritten very simply in terms of this flow as:

$\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T}(n(\tau) - m(\tau)) \, f_x(\tau) }$

where the right-hand side is now a linear expression in the flow $f_x.$

Flows of water, or electric current, obey a version of Kirchhoff’s current law. Such flows are called conservative flows. The following two lemmas from graph theory are immediate for conservative flows:

Lemma 1. If f is a conservative flow then the net flow across every cut is zero.

A cut is a way of chopping the graph in two, like this:

It’s easy to prove Lemma 1 by induction, moving one vertex across the cut at a time.

Lemma 2. If a conservative flow exists then every edge $\tau\in T$ is part of a directed cycle.

Why is Lemma 2 true? Suppose there exists an edge $\tau : m \to n$ that is not part of any directed cycle. We will exhibit a cut with non-zero net flow. By Lemma 1, this will imply that the flow is not conservative.

One side of the cut will consist of all vertices from which $m$ is reachable by a directed path in the reaction network. The other side of the cut contains at least $n,$ since $m$ is not reachable from $n,$ by the assumption that $\tau$ is not part of a directed cycle. There is flow going from left to right of the cut, across the transition $\tau.$ Since there can be no flow coming back, this cut has nonzero net flow, and we’re done.     ▮

Now, back to the rate equation! We can ask if the flow $f_x$ is conservative. That is, we can ask if, for every complex $n$:

$\displaystyle{ \sum_{\tau : m \to n} f_x(m,n) = \sum_{\tau : n \to p} f_x(n,p). }$

In words, we are asking if the sum of the flow through all transitions coming in to $n$ equals the sum of the flow through all transitions going out of $n.$ If this condition is satisfied at a vector of concentrations $x = \alpha,$ so that the flow $f_\alpha$ is conservative, then we call $\alpha$ a point of complex balance. If in addition, every component of $\alpha$ is strictly positive, then we say that the system is complex balanced.

Clearly if $\alpha$ is a point of complex balance, it’s an equilibrium solution of the rate equation. In other words, $x(t) = \alpha$ is a solution of the rate equation, where $x(t)$ never changes.

I’m using ‘equilibrium’ the way mathematicians do. But I should warn you that chemists use ‘equilibrium’ to mean something more than merely a solution that doesn’t change with time. They often also mean it’s a point of complex balance, or even more. People actually get into arguments about this at conferences.

Complex balance implies more than mere equilibrium. For starters, if a reaction network is such that every edge belongs to a directed cycle, then one says that the reaction network is weakly reversible. So Lemmas 1 and 2 establish that complex-balanced systems must be weakly reversible!

From here on, we fix a complex-balanced system, with $\alpha$ a strictly positive point of complex balance.

Definition. The free energy function is the function

$g_\alpha(x) = \sum_{s\in S} x_s \log x_s - x_s - x_s \log \alpha_s$

where the sum is over all species in $S.$

The whole point of defining the function this way is because it is the unique function, up to an additive constant, whose partial derivative with respect to $x_s$ is $\log x_s/\alpha_s.$ This is important enough that we write it as a lemma. To state it in a pithy way, it is helpful to introduce vector notation for division and logarithms. If $x$ and $y$ are two vectors, we will understand $x/y$ to mean the vector $z$ such that $z_s = x_s/ y_s$ coordinate-wise. Similarly $\log x$ is defined in a coordinate-wise sense as the vector with coordinates $(\log x)_s = \log x_s.$

Lemma 3. The gradient $\nabla g_\alpha(x)$ of $g_\alpha(x)$ equals $\log(x/\alpha).$

We’re ready to state our main theorem!

Theorem. Fix a trajectory $x(t)$ of the rate equation. Then $g_\alpha(x(t))$ is a decreasing function of time $t.$ Further, it is strictly decreasing unless $x(t)$ is an equilibrium solution of the rate equation.

I find precise mathematical statements reassuring. You can often make up your mind about the truth value from a few examples. Very often, though not always, a few well-chosen examples are all you need to get the general idea for the proof. Such is the case for the above theorem. There are three key examples: the two-cycle, the three-cycle, and the figure-eight.

The two-cycle. The two-cycle is this reaction network:

It has two complexes $m$ and $n$ and two transitions $\tau_1 = m\to n$ and $\tau_2 = n\to m,$ with rates $r_1 = r(\tau_1)$ and $r_2 = r(\tau_2)$ respectively.

Fix a solution $x(t)$ of the rate equation. Then the flow from $m$ to $n$ equals $r_1 x^m$ and the backward flow equals $r_2 x^n.$ The condition for $f_\alpha$ to be a conservative flow requires that $f_\alpha = r_1 \alpha^m = r_2 \alpha^n.$ This is one binomial equation in at least one variable, and clearly has a solution in the positive reals. We have just shown that every two-cycle is complex balanced.

The derivative $d g_\alpha(x(t))/d t$ can now be computed by the chain rule, using Lemma 3. It works out to $f_\alpha$ times

$\displaystyle{ \left((x/\alpha)^m - (x/\alpha)^n\right) \, \log\frac{(x/\alpha)^n}{(x/\alpha)^m} }$

This is never positive, and it’s zero if and only if

$(x/\alpha)^m = (x/\alpha)^n$

Why is this? Simply because the logarithm of something greater than 1 is positive, while the log of something less than 1 is negative, so that the sign of $(x/\alpha)^m - (x/\alpha)^n$ is always opposite the sign of $\log \frac{(x/\alpha)^n}{(x/\alpha)^m}.$ We have verified our theorem for this example.

(Note that $(x/\alpha)^m = (x/\alpha)^n$ occurs when $x = \alpha,$ but also at other points: in this example, there is a whole hypersurface consisting of points of complex balance.)

In fact, this simple calculation achieves much more.

Definition. A reaction network is reversible if for every transition $\tau : m \to n$ there is a transition $\tau' : m \to n$ going back, called the reverse of $\tau.$ Suppose we have a reversible reaction network and a vector of concentrations $\alpha$ such that the flow along each edge equals that along the edge going back:

$f_\alpha(\tau) = f_\alpha(\tau')$

whenever $\tau'$ is the reverse $\tau.$ Then we say the reaction network is detailed balanced, and $\alpha$ is a point of detailed balance.

For a detailed-balanced system, the time derivative of $g_\alpha$ is a sum over the contributions of pairs consisting of an edge and its reverse. Hence, the two-cycle calculation shows that the theorem holds for all detailed balanced systems!

This linearity trick is going to prove very valuable. It will allow us to treat the general case of complex balanced systems one cycle at a time. The proof for a single cycle is essentially contained in the example of a three-cycle, which we treat next:

The three-cycle. The three-cycle is this reaction network:

We assume that the system is complex balanced, so that

$f_\alpha(m_1\to m_2) = f_\alpha(m_2\to m_3) = f_\alpha(m_3\to m_1)$

Let us call this nonnegative number $f_\alpha.$ A small calculation employing the chain rule shows that $d g_\alpha(x(t))/d t$ equals $f_\alpha$ times

$\displaystyle{ (x/\alpha)^{m_1}\, \log\frac{(x/\alpha)^{m_2}}{(x/\alpha)^{m_1}} \; + }$

$\displaystyle{ (x/\alpha)^{m_2} \, \log\frac{(x/\alpha)^{m_3}}{(x/\alpha)^{m_2}} \; + }$

$\displaystyle{ (x/\alpha)^{m_3}\, \log\frac{(x/\alpha)^{m_1}}{(x/\alpha)^{m_3}} }$

We need to think about the sign of this quantity:

Lemma 3. Let $a,b,c$ be positive numbers. Then $a \log b/a + b\log c/b + c\log a/c$ is less than or equal to zero, with equality precisely when $a=b=c.$

The proof is a direct application of the log sum inequality. In fact, this holds not just for three numbers, but for any finite list of numbers. Indeed, that is precisely how one obtains the proof for cycles of arbitrary length. Even the two-cycle proof is a special case! If you are wondering how the log sum inequality is proved, it is an application of Jensen’s inequality, that workhorse of convex analysis.

The three-cycle calculation extends to a proof for the theorem so long as there is no directed edge that is shared between two directed cycles. When there are such edges, we need to argue that the flows $f_\alpha$ and $f_x$ can be split between the cycles sharing that edge in a consistent manner, so that the cycles can be analyzed independently. We will need the following simple lemma about conservative flows from graph theory. We will apply this lemma to the flow $f_\alpha.$

Lemma 4. Let $f$ be a conservative flow on a graph $G.$ Then there exist directed cycles $C_1, C_2,\dots, C_k$ in $G,$ and nonnegative real ‘flows’ $f_1,f_2,\dots,f_k \in [0,\infty]$ such that for each directed edge $e$ in $G,$ the flow $f(e)$ equals the sum of $f_i$ over $i$ such the cycle $C_i$ contains the edge $e.$

Intuitively, this lemma says that conservative flows come from constant flows on the directed cycles of the graph. How does one show this lemma? I’m sure there are several proofs, and I hope some of you can share some of the really neat ones with me. The one I employed was algorithmic. The idea is to pick a cycle, any cycle, and subtract the maximum constant flow that this cycle allows, and repeat. This is most easily understood by looking at the example of the figure-eight:

The figure-eight. This reaction network consists of two three-cycles sharing an edge:

Here’s the proof for Lemma 4. Let $f$ be a conservative flow on this graph. We want to exhibit cycles and flows on this graph according to Lemma 4. We arbitrarily pick any cycle in the graph. For example, in the figure-eight, suppose we pick the cycle $u_0\to u_1\to u_2\to u_0.$ We pick an edge in this cycle on which the flow is minimum. In this case, $f(u_0\to u_1) = f(u_2\to u_0)$ is the minimum. We define a remainder flow by subtracting from $f$ this constant flow which was restricted to one cycle. So the remainder flow is the same as $f$ on edges that don’t belong to the picked cycle. For edges that belong to the cycle, the remainder flow is $f$ minus the minimum of $f$ on this cycle. We observe that this remainder flow satisfies the conditions of Lemma 4 on a graph with strictly fewer edges. Continuing in this way, since the lemma is trivially true for the empty graph, we are done by infinite descent.

Now that we know how to split the flow $f_\alpha$ across cycles, we can figure out how to split the rates across the different cycles. This will tell us how to split the flow $f_x$ across cycles. Again, this is best illustrated by an example.

The figure-eight. Again, this reaction network looks like

Suppose as in Lemma 4, we obtain the cycles

$C_1 = u_0\to u_1\to u_2\to u_0$

with constant flow $f_\alpha^1$

and

$C_2 = u_3\to u_1\to u_2\to u_3$ with constant flow $f_\alpha^2$ such that

$f_\alpha^1 + f_\alpha^2 = f_\alpha(u_1\to u_2)$

Here’s the picture:

Then we obtain rates $r^1(u_1\to u_2)$ and $r^2(u_1\to u_2)$ by solving the equations

$f^1_\alpha = r^1(u_1\to u_2) \alpha^{u_1}$

$f^2_\alpha = r^2(u_1\to u_2) \alpha^{u_2}$

Using these rates, we can define non-constant flows $f^1_x$ on $C_1$ and $f^2_x$ on $C_2$ by the usual formulas:

$f^1_x(u_1\to u_2) = r^1(u_1\to u_2) x^{u_1}$

and similarly for $f^2_x.$ In particular, this gives us

$f^1_x(u_1\to u_2)/f^1_\alpha = (x/\alpha)^{u_1}$

and similarly for $f^2_x.$

Using this, we obtain the proof of the Theorem! The time derivative of $g_\alpha$ along a trajectory has a contribution from each cycle $C$ as in Lemma 4, where each cycle is treated as a separate system with the new rates $r^C,$ and the new flows $f^C_\alpha$ and $f^C_x.$ So, we’ve reduced the problem to the case of a cycle, which we’ve already done.

Let’s review what happened. The time derivative of the function $g_\alpha$ has a very nice form, which is linear in the flow $f_x.$ The reaction network can be broken up into cycles. Th e conservative flow $f_\alpha$ for a complex balanced system can be split into conservative flows on cycles by Lemma 4. This informs us how to split the non-conservative flow $f_x$ across cycles. By linearity of the time derivative, we can separately treat the case for every cycle. For each cycle, we get an expression to which the log sum inequality applies, giving us the final result that $g_\alpha$ decreases along trajectories of the rate equation.

Now that we have a Lyapunov function, we will put it to use to obtain some nice theorems about the dynamics, and finally state the global attractor conjecture. All that and more, in the next blog post!

## Water

29 November, 2013

Over a year ago, I wrote here about ice. It has 16 known forms with different crystal geometries. The most common form on Earth, hexagonal ice I, is a surprisingly subtle blend of order and randomness:

Liquid water is even more complicated. It’s mainly a bunch of molecules like this jostling around:

The two hydrogens are tightly attached to the oxygen. But accidents do happen. On average, for every 555 million molecules of water, one is split into a negatively charged OH⁻ and a positively charged H⁺. And this actually matters a lot, in chemistry. It’s the reason we say water has pH 7.

Why? By definition, pH 7 means that for every liter of water, there’s 10-7 moles of H⁺. That’s where the 7 comes from. But there’s 55.5 moles of water in every liter, at least when the water is cold so its density is almost 1 kilogram/liter. So, do the math and you see one that for 555 million molecules of water, there’s only one H⁺.

Acids have a lot more. For example, lemon juice has one H⁺ per 8800 water molecules.

But let’s think about this H⁺ thing. What is it, really? It’s a hydrogen atom missing its electron: a proton, all by itself!

But what happens when you’ve got a lone proton in water? It doesn’t just sit there. It quickly attaches to a water molecule, forming H₃O⁺. This is called a hydronium ion, and it looks like this:

But hydronium is still positively charged, so it will attract electrons in other water molecules! Different things can happen. Here you see a hydronium ion water molecule surrounded by three water molecules in a symmetrical way:

This is called an Eigen cation, with chemical formula H₉O₄⁺. I believe it’s named after the Nobel-prize-winning chemist Manfred Eigen—not his grandfather Günther, the mathematician of ‘eigenvector’ fame.

And here you see a hydronium ion at lower right, attracted to water molecule at left:

The is a Zundel cation, with chemical formula H₅O₂⁺. It’s named after Georg Zundel, the German expert on hydrogen bonds. The H⁺ in the middle looks more tightly connected to the water at right than the water at left. But it should be completely symmetrical—at least, that’s the theory of how a Zundel cation works.

But the Eigen and Zundel cations are still positively charged, so they attract more water molecules, making bigger and bigger structures. Nowadays chemists are studying these using computer simulations, and comparing the results to experiments. In 2010, Evgenii Stoyanov, Irina Stoyanova and Christopher Reed used infrared spectroscopy to argue that a lone proton often attaches itself to 6 water molecules, forming H⁺(H₂O)₆, or H₁₃O₆⁺, like this:

As you can see, this forms when each hydrogen in a Zundel cation attracts an extra water molecule.

Even this larger structure attracts more water molecules:

But the positive charge, they claim, stays roughly within the dotted line.

Wait. Didn’t I say the lone proton was right in the middle? Isn’t that what the picture shows—the H in the middle?

Well, the picture is a bit misleading! First, everything is wiggling around a lot. And second, quantum mechanics says we don’t know the position of that proton precisely! Instead, it’s a ‘probability cloud’ smeared over a large region, ending roughly at the dashed line. (You can’t say precisely where a cloud ends.)

It seems that something about these subtleties makes the distance between the two oxygen nuclei at the center is surprisingly large. In an ordinary water molecule, the distance between the hydrogen and oxygen is a bit less than 100 pm—that’s 100 picometers, or 100 × 10-12 meters, or one angstrom (Å) in chemist’s units:

In ordinary ice, there are also weaker bonds called hydrogen bonds that attach neighboring water molecules. These bonds are a bit longer, as shown in this picture by Stephen Lower, who also drew that great picture of ice:

But the distance between the two central oxygens in H₁₃O₆⁺ is about 2.57 angstroms, or twice 1.28:

Stoyanov, Stoyanova and Reed put the exclamation mark here. I guess the big distance came as a big surprise!

I should emphasize that this work is new and still controversial. There’s some evidence, which I don’t understand, that 20 is a ‘magic number’: a lone proton is happiest when accompanied by 20 water molecules, forming H⁺(H₂O)₂₀. One possibility is that the proton is surrounded by a symmetrical cage of 20 water molecules shaped like a dodecahedron! But in 2005, a team of scientists did computer simulations and arrived at a different geometry, like this:

This is not symmetrical: there’s a Zundel cation highlighted at right, together with 20 water molecules.

Of course, in reality a number of different structures may predominate, in a rapidly changing and random way. Computer simulations should eventually let us figure this out. We’ve known the relevant laws of nature for over 80 years. But running them on a computer is not easy! Kieron Taylor did his PhD work on simulating water, and he wrote:

It’s a most vexatious substance to simulate in useful time scales. Including the proton exchange or even flexible multipoles requires immense computation.

It would be very interesting if the computational complexity of water were higher, in some precise sense, than many other liquids. It’s weird in other ways. It takes a lot of energy to heat water, it expands when it freezes, and its molecules have a large ‘dipole moment’—meaning the electric charge is distributed in a very lopsided way, thanks to the ‘Mickey Mouse’ way the two H’s are attached to the O.

I’ve been talking about the fate of the H⁺ when a water molecule splits into H⁺ and OH⁻. I should add that in heavy water, H⁺ could be something other than a lone proton. It could be a deuteron: a proton and a neutron stuck together. Or it could be a triton: a proton and two neutrons. For this reason, while most chemists call H⁺ simply a ‘proton’, the pedantically precise ones call it a hydron, which covers all the possibilities!

But what about the OH⁻? This is called a hydroxide ion:

But this, too, attracts other water molecules. First it grabs one and forms a bihydroxide ion, which is a chain like this:

H—O—H—O—H

with chemical formula H₃O₂⁻. And then the bihydroxide ion attracts other water molecules, perhaps like this:

Again, this is a guess—and certainly a simplified picture of a dynamic, quantum-mechanical system.

### References and digressions

For more, see:

• Evgenii S. Stoyanov, Irina V. Stoyanova, Christopher A. Reed, The unique nature of H⁺ in water, Chemical Science 2 (2011), 462–472.

Abstract: The H⁺(aq) ion in ionized strong aqueous acids is an unexpectedly unique H₁₃O₆⁺ entity, unlike those in gas phase H⁺(H₂O)n clusters or typical crystalline acid hydrates. IR spectroscopy indicates that the core structure has neither H₉O₄⁺ Eigen-like nor typical H₅O₂⁺ Zundel-like character. Rather, extensive delocalization of the positive charge leads to a H₁₃O₆⁺ ion having an unexpectedly long central OO separation of 2.57 Å and four conjugated OO separations of 2.7 Å. These dimensions are in conflict with the shorter OO separations found in structures calculated by theory. Ultrafast dynamic properties of the five H atoms involved in these H-bonds lead to a substantial collapse of normal IR vibrations and the appearance of a continuous broad absorption (cba) across the entire IR spectrum. This cba is distinguishable from the broad IR bands associated with typical low-barrier H-bonds. The solvation shell outside of the H₁₃O₆⁺ ion defines the boundary of positive charge delocalization. At low acid concentrations, the H₁₃O₆⁺ ion is a constituent part of an ion pair that has contact with the first hydration shell of the conjugate base anion. At higher concentrations, or with weaker acids, one or two H₂O molecules of H₁₃O₆⁺ cation are shared with the hydration shell of the anion. Even the strongest acids show evidence of ion pairing.﻿

Unfortunately this paper is not free, and my university doesn’t even subscribe to this journal. But I just discovered that Evgenii Stoyanov and Irina Stoyanova are here at U. C. Riverside! So, I may ask them some questions.

This picture:

came from here:

• Srinivasan S. Iyengar, Matt K. Petersen, Tyler J. F. Day, Christian J. Burnham, Virginia E. Teige and Gregory A. Voth, The properties of ion-water clusters. I. The protonated 21-water cluster, J. Chem. Phys. 123 (2005), 084309.

Abstract. The ab initio atom-centered density-matrix propagation approach and the multistate empirical valence bond method have been employed to study the structure, dynamics, and rovibrational spectrum of a hydrated proton in the “magic” 21 water cluster. In addition to the conclusion that the hydrated proton tends to reside on the surface of the cluster, with the lone pair on the protonated oxygen pointing “outwards,” it is also found that dynamical effects play an important role in determining the vibrational properties of such clusters. This result is used to analyze and complement recent experimental and theoretical studies.

This paper is free online! We live in a semi-barbaric age where science is probing the finest details of matter, space and time—but many of the discoveries, paid for by taxes levied on the hard-working poor, are snatched, hidden, and sold by profiteers. Luckily, a revolution is afoot…

There are other things in ‘pure water’ beside what I’ve mentioned. For example, there are some lone electrons! Since these are light, quantum mechanics says their probability cloud spreads out to be quite big. This picture by Michael Tauber shows what you should imagine:

He says:

Schematic representation of molecules in the first and second coordination shells around the solvated electron. First shell molecules are shown hydrogen bonded to the electron. Hydrogen bonds between molecules of 1st and 2nd shells are disrupted.

## Autocatalysis in Reaction Networks

11 October, 2013

guest post by Manoj Gopalkrishnan

Since this is my first time writing a blog post here, let me start with a word of introduction. I am a computer scientist at the Tata Institute of Fundamental Research, broadly interested in connections between Biology and Computer Science, with a particular interest in reaction networks. I first started thinking about them during my Ph.D. at the Laboratory for Molecular Science. My fascination with them has been predominantly mathematical. As a graduate student, I encountered an area with rich connections between combinatorics and dynamics, and surprisingly easy-to-state and compelling unsolved conjectures, and got hooked.

There is a story about Richard Feynman that he used to take bets with mathematicians. If any mathematician could make Feynman understand a mathematical statement, then Feynman would guess whether or not the statement was true. Of course, Feynman was in a habit of winning these bets, which allowed him to make the boast that mathematics, especially in its obsession for proof, was essentially irrelevant, since a relative novice like himself could after a moment’s thought guess at the truth of these mathematical statements. I have always felt Feynman’s claim to be unjust, but have often wondered what mathematical statement I would put to him so that his chances of winning were no better than random.

Today I want to tell you of a result about reaction networks that I have recently discovered with Abhishek Deshpande. The statement seems like a fine candidate to throw at Feynman because until we proved it, I would not have bet either way about its truth. Even after we obtained a short and elementary proof, I do not completely ‘see’ why it must be true. I am hoping some of you will be able to demystify it for me. So, I’m just going to introduce enough terms to be able to make the statement of our result, and let you think about how to prove it.

John and his colleagues have been talking about reaction networks as Petri nets in the network theory series on this blog. As discussed in part 2 of that series, a Petri net is a diagram like this:

Following John’s terminology, I will call the aqua squares ‘transitions’ and the yellow circles ‘species’. If we have some number #rabbit of rabbits and some number #wolf of wolves, we draw #rabbit many black dots called ‘tokens’ inside the yellow circle for rabbit, and #wolf tokens inside the yellow circle for wolf, like this:

Here #rabbit = 4 and #wolf = 3. The predation transition consumes one ‘rabbit’ token and one ‘wolf’ token, and produces two ‘wolf’ tokens, taking us here:

John explained in parts 2 and 3 how one can put rates on different transitions. For today I am only going to be concerned with ‘reachability:’ what token states are reachable from what other token states. John talked about this idea in part 25.

By a complex I will mean a population vector: a snapshot of the number of tokens in each species. In the example above, (#rabbit, #wolf) is a complex. If $y, y'$ are two complexes, then we write

$y \to y'$

if we can get from $y$ to $y'$ by a single transition in our Petri net. For example, we just saw that

$(4,3)\to (3,4)$

via the predation transition.

Reachability, denoted $\to^*$, is the transitive closure of the relation $\to$. So $y\to^* y'$ (read $y'$ is reachable from $y$) iff there are complexes

$y=y_0,y_1,y_2,\dots,y_k =y'$

such that

$y_0\to y_1\to\cdots\to y_{k-1}\to y_k.$

For example, here $(5,1) \to^* (1, 5)$ by repeated predation.

I am very interested in switches. After all, a computer is essentially a box of switches! You can build computers by connecting switches together. In fact, that’s how early computers like the Z3 were built. The CMOS gates at the heart of modern computers are essentially switches. By analogy, the study of switches in reaction networks may help us understand biochemical circuits.

A siphon is a set of species that is ‘switch-offable’. That is, if there are no tokens in the siphon states, then they will remain absent in future. Equivalently, the only reactions that can produce tokens in the siphon states are those that require tokens from the siphon states before they can fire. Note that no matter how many rabbits there are, if there are no wolves, there will continue to be no wolves. So {wolf} is a siphon. Similarly, {rabbit} is a siphon, as is the union {rabbit, wolf}. However, when Hydrogen and Oxygen form Water, {Water} is not a siphon.

For another example, consider this Petri net:

The set {HCl, NaCl} is a siphon. However, there is a conservation law: whenever an HCl token is destroyed, an NaCl token is created, so that #HCl + #NaCl is invariant. If both HCl and NaCl were present to begin with, the complexes where both are absent are not reachable. In this sense, this siphon is not ‘really’ switch-offable. As a first pass at capturing this idea, we will introduce the notion of ‘critical set’.

A conservation law is a linear expression involving numbers of tokens that is invariant under every transition in the Petri net. A conservation law is positive if all the coefficients are non-negative. A critical set of states is a set that does not contain the support of a positive conservation law.

For example, the support of the positive conservation law #HCl + #NaCl is {HCl, NaCl}, and hence no set containing this set is critical. Thus {HCl, NaCl} is a siphon, but not critical. On the other hand, the set {NaCl} is critical but not a siphon. {HCl} is a critical siphon. And in our other example, {Wolf, Rabbit} is a critical siphon.

Of particular interest to us will be minimal critical siphons, the minimal sets among critical siphons. Consider this example:

Here we have two transitions:

$X \to 2Y$

and

$2X \to Y$

The set $\{X,Y\}$ is a critical siphon. But so is the smaller set $\{X\}.$ So, $\{X,Y\}$ is not minimal.

We define a self-replicable set to be a set $A$ of species such that there exist complexes $y$ and $y'$ with $y\to^* y'$ such that for all $i \in A$ we have

$y'_i > y_i$

So, there are transitions that accomplish the job of creating more tokens for all the species in $A.$ In other words: these species can ‘replicate themselves’.

We define a drainable set by changing the $>$ to a $<$. So, there are transitions that accomplish the job of reducing the number of tokens for all the species in $A.$ These species can ‘drain away’.

Now here comes the statement:

Every minimal critical siphon is either drainable or self-replicable!

We prove it in this paper:

• Abhishek Deshpande and Manoj Gopalkrishnan, Autocatalysis in reaction networks.

But first note that the statement becomes false if the critical siphon is not minimal. Look at this example again:

The set $\{X,Y\}$ is a critical siphon. However $\{X,Y\}$ is neither self-replicable (since every reaction destroys $X$) nor drainable (since every reaction produces $Y$). But we’ve already seen that $\{X,Y\}$ is not minimal. It has a critical subsiphon, namely $\{X\}.$ This one is minimal—and it obeys our theorem, because it is drainable.

Checking these statements is a good way to make sure you understand the concepts! I know I’ve introduced a lot of terminology here, and it takes a while to absorb.

Anyway: our proof that every minimal critical siphon is either drainable or self-replicable makes use of a fun result about matrices. Consider a real square matrix with a sign pattern like this:

$\left( \begin{array}{cccc} <0 & >0 & \cdots & > 0 \\ >0 & <0 & \cdots &> 0 \\ \vdots & \vdots & <0 &> 0 \\ >0 & >0 & \cdots & <0 \end{array} \right)$

If the matrix is full-rank then there is a positive linear combination of the rows of the matrix so that all the entries are nonzero and have the same sign. In fact, we prove something stronger in Theorem 5.9 of our paper. At first, we thought this statement about matrices should be equivalent to one of the many well-known alternative statements of Farkas’ lemma, like Gordan’s theorem.

However, we could not find a way to make this work, so we ended up proving it by a different technique. Later, my colleague Jaikumar Radhakrishnan came up with a clever proof that uses Farkas’ lemma twice. However, so far we have not obtained the stronger result in Theorem 5.9 with this proof technique.

My interest in the result that every minimal critical siphon is either drainable or self-replicable is not purely aesthetic (though aesthetics is a big part of it). There is a research community of folks who are thinking of reaction networks as a programming language, and synthesizing molecular systems that exhibit sophisticated dynamical behavior as per specification:

Networks that exhibit some kind of catalytic behavior are a recurring theme among such systems, and even more so in biochemical circuits.

Here is an example of catalytic behavior:

$A + C \to B + C$

The ‘catalyst’ $C$ helps transform $A$ to $B.$ In the absence of $C,$ the reaction is turned off. Hence, catalysts are switches in chemical circuits! From this point of view, it is hardly surprising that they are required for the synthesis of complex behaviors.

In information processing, one needs amplification to make sure that a signal can propagate through a circuit without being overwhelmed by errors. Here is a chemical counterpart to such amplification:

$A + C \to 2C$

Here the catalyst $C$ catalyzes its own production: it is an ‘autocatalyst’, or a self-replicating species. By analogy, autocatalysis is key for scaling synthetic molecular systems.

Our work deals with these notions on a network level. We generalize the notion of catalysis in two ways. First, we allow a catalyst to be a set of species instead of a single species; second, its absence can turn off a reaction pathway instead of a single reaction. We propose the notion of self-replicable siphons as a generalization of the notion of autocatalysis. In particular, ‘weakly reversible’ networks have critical siphons precisely when they exhibit autocatalytic behavior. I was led to this work when I noticed the manifestation of this last statement in many examples.

Another hope I have is that perhaps one can study the dynamics of each minimal critical siphon of a reaction network separately, and then somehow be able to answer interesting questions about the dynamics of the entire network, by stitching together what we know for each minimal critical siphon. On the synthesis side, perhaps this could lead to a programming language to synthesize a reaction network that will achieve a specified dynamics. If any of this works out, it would be really cool! I think of how abelian group theory (and more broadly, the theory of abelian categories, which includes categories of vector bundles) benefits from a fundamental theorem that lets you break a finite abelian group into parts that are easy to study—or how number theory benefits from a special case, the fundamental theorem of arithmetic. John has also pointed out that reaction networks are really presentations of symmetric monoidal categories, so perhaps this could point the way to a Fundamental Theorem for Symmetric Monoidal Categories.

And then there is the Global Attractor Conjecture, a
long-standing open problem concerning the long-term behavior of solutions to the rate equations. Now that is a whole story by itself, and will have to wait for another day.

## Coherence for Solutions of the Master Equation

10 July, 2013

guest post by Arjun Jain

I am a master’s student in the physics department of the Indian Institute of Technology Roorkee. I’m originally from Delhi. Since some time now, I’ve been wanting to go into Mathematical Physics. I hope to do a PhD in that. Apart from maths and physics, I am also quite passionate about art and music.

Right now I am visiting John Baez at the Centre for Quantum Technologies, and we’re working on chemical reaction networks. This post can be considered as an annotation to the last paragraph of John’s paper, Quantum Techniques for Reaction Networks, where he raises the question of when a solution to the master equation that starts as a coherent state will remain coherent for all times. Remember, the ‘master equation’ describes the random evolution of collections of classical particles, and a ‘coherent state’ is one where the probability distribution of particles of each type is a Poisson distribution.

If you’ve been following the network theory series on this blog, you’ll know these concepts, and you’ll know the Anderson-Craciun-Kurtz theorem gives many examples of coherent states that remain coherent. However, all these are equilibrium solutions of the master equation: they don’t change with time. Moreover they are complex balanced equilibria: the rate at which any complex is produced equals the rate at which it is consumed.

There are also non-equilibrium examples where coherent states remain coherent. But they seem rather rare, and I would like to explain why. So, I will give a necessary condition for it to happen. I’ll give the proof first, and then discuss some simple examples. We will see that while the condition is necessary, it is not sufficient.

First, recall the setup. If you’ve been following the network theory series, you can skip the next section.

### Reaction networks

Definition. A reaction network consists of:

• a finite set $S$ of species,

• a finite set $K$ of complexes, where a complex is a finite sum of species, or in other words, an element of $\mathbb{N}^S,$

• a graph with $K$ as its set of vertices and some set $T$ of edges.

You should have in mind something like this:

where our set of species is $S = \{A,B,C,D,E\},$ the complexes are things like $A + E,$ and the arrows are the elements of $T,$ called transitions or reactions. So, we have functions

$s , t : T \to K$

saying the source and target of each transition.

Next:

Definition. A stochastic reaction network is a reaction network together with a function $r: T \to (0,\infty)$ assigning a rate constant to each reaction.

From this we can write down the master equation, which describes how a stochastic state evolves in time:

$\displaystyle{ \frac{d}{dt} \Psi(t) = H \Psi(t) }$

Here $\Psi(t)$ is a vector in the stochastic Fock space, which is the space of formal power series in a bunch of variables, one for each species, and $H$ is an operator on this space, called the Hamiltonian.

From now on I’ll number the species with numbers from $1$ to $k,$ so

$S = \{1, \dots, k\}$

Then the stochastic Fock space consists of real formal power series in variables that I’ll call $z_1, \dots, z_k.$ We can write any of these power series as

$\displaystyle{\Psi = \sum_{\ell \in \mathbb{N}^k} \psi_\ell z^\ell }$

where

$z^\ell = z_1^{\ell_1} \cdots z_k^{\ell_k}$

We have annihilation and creation operators on the stochastic Fock space:

$\displaystyle{ a_i \Psi = \frac{\partial}{\partial z_i} \Psi }$

$\displaystyle{ a_i^\dagger \Psi = z_i \Psi }$

and the Hamiltonian is built from these as follows:

$\displaystyle{ H = \sum_{\tau \in T} r(\tau) \, ({a^\dagger}^{t(\tau)} - {a^\dagger}^{s(\tau)}) \, a^{s(\tau)} }$

John explained this here (using slightly different notation), so I won’t go into much detail now, but I’ll say what all the symbols mean. Remember that the source of a transition $\tau$ is a complex, or list of natural numbers:

$s(\tau) = (s_1(\tau), \dots, s_k(\tau))$

So, the power $a^{s(\tau)}$ is really an abbreviation for a big product of annihilation operators, like this:

$\displaystyle{ a^{s(\tau)} = a_1^{s_1(\tau)} \cdots a_k^{s_k(\tau)} }$

This describes the annihilation of all the inputs to the transition $\tau.$ Similarly, we define

$\displaystyle{ {a^\dagger}^{s(\tau)} = {a_1^\dagger}^{s_1(\tau)} \cdots {a_k^\dagger}^{s_k(\tau)} }$

and

$\displaystyle{ {a^\dagger}^{t(\tau)} = {a_1^\dagger}^{t_1(\tau)} \cdots {a_k^\dagger}^{t_k(\tau)} }$

### The result

Here’s the result:

Theorem. If a solution $\Psi(t)$ of the master equation is a coherent state for all times $t \ge 0,$ then $\Psi(0)$ must be complex balanced except for complexes of degree 0 or 1.

This requires some explanation.

First, saying that $\Psi(t)$ is a coherent state means that it is an eigenvector of all the annihilation operators. Concretely this means

$\Psi (t) = \displaystyle{\frac{e^{c(t) \cdot z}}{e^{c_1(t) + \cdots + c_k(t)}}}$

where

$c(t) = (c_1(t), \dots, c_k(t)) \in [0,\infty)^k$

and

$z = (z_1, \dots, z_k)$

It will be helpful to write

$\mathbf{1}= (1,1,1,...)$

so we can write

$\Psi (t) = \displaystyle{ e^{c(t) \cdot (z - \mathbf{1})} }$

Second, we say that a complex has degree $d$ if it is a sum of exactly $d$ species. For example, in this reaction network:

the complexes $A + C$ and $B + E$ have degree 2, while the rest have degree 1. We use the word ‘degree’ because each complex $\ell$ gives a monomial

$z^\ell = z_1^{\ell_1} \cdots z_k^{\ell_k}$

and the degree of the complex is the degree of this monomial, namely

$\ell_1 + \cdots + \ell_k$

Third and finally, we say a solution $\Psi(t)$ of the master equation is complex balanced for a specific complex $\ell$ if the total rate at which that complex is produced equals the total rate at which it’s destroyed.

Now we are ready to prove the theorem:

Proof. Consider the master equation

$\displaystyle { \frac{d \Psi (t)}{d t} = H \psi (t) }$

Assume that $\Psi(t)$ is a coherent state for all $t \ge 0.$ This means

$\Psi (t) = \displaystyle{ e^{c(t) \cdot (z - \mathbf{1})} }$

For convenience, we write $c(t)$ simply as $c,$ and similarly for the components $c_i$. Then we have

$\displaystyle{ \frac{d\Psi(t)}{dt} = (\dot{c} \cdot (z - \mathbf{1})) \, e^{c \cdot (z - \mathbf{1})} }$

On the other hand, the master equation gives

$\begin{array}{ccl} \displaystyle {\frac{d\Psi(t)}{dt}} &=& \displaystyle{ \sum_{\tau \in T} r(\tau) \, ({a^\dagger}^{t(\tau)} - {a^\dagger}^{s(\tau)}) \, a^{s(\tau)} e^{c \cdot (z - \mathbf{1})} } \\ \\ &=& \displaystyle{\sum_{\tau \in T} c^{t(\tau)} r(\tau) \, ({z}^{t(\tau)} - {z}^{s(\tau)}) e^{c \cdot (z - \mathbf{1})} } \end{array}$

So,

$\displaystyle{ (\dot{c} \cdot (z - \mathbf{1})) \, e^{c \cdot (z - \mathbf{1})} =\sum_{\tau \in T} c^{t(\tau)} r(\tau) \, ({z}^{t(\tau)} - {z}^{s(\tau)}) e^{c \cdot (z - \mathbf{1})} }$

As a result, we get

$\displaystyle{ \dot{c}\cdot z -\dot{c}\cdot\mathbf{1} = \sum_{\tau \in T} c^{s(\tau)} r(\tau) \, ({z}^{t(\tau)} - {z}^{s(\tau)}) }.$

Comparing the coefficients of all $z^\ell,$ we obtain the following. For $\ell = 0,$ which is the only complex of degree zero, we get

$\displaystyle { \sum_{\tau: t(\tau)=0} r(\tau) c^{s(\tau)} - \sum_{\tau\;:\; s(\tau)= 0} r(\tau) c^{s(\tau)} = -\dot{c}\cdot\mathbf{1} }$

For the complexes $\ell$ of degree one, we get these equations:

$\displaystyle { \sum_{\tau\;:\; t(\tau)=(1,0,0,\dots)} r(\tau) c^{s(\tau)} - \sum_{\tau \;:\;s(\tau)=(1,0,0,\dots)} r(\tau) c^{s(\tau)}= \dot{c_1} }$

$\displaystyle { \sum_{\tau\; :\; t(\tau)=(0,1,0,\dots)} r(\tau) c^{s(\tau)} - \sum_{\tau\;:\; s(\tau)=(0,1,0,\dots)} r(\tau) c^{s(\tau)} = \dot{c_2} }$

and so on. For all the remaining complexes $\ell$ we have

$\displaystyle { \sum_{\tau\;:\; t(\tau)=\ell} r(\tau) c^{s(\tau)} = \sum_{\tau \;:\; s(\tau)=\ell} r(\tau) c^{s(\tau)} }.$

This says that the total rate at which this complex is produced equals the total rate at which it’s destroyed. So, our solution of the master equation is complex balanced for all complexes $\ell$ of degree greater than one. This is our necessary condition.                                                                                   █

To illustrate the theorem, I’ll consider three simple examples. The third example shows that the condition in the theorem, though necessary, is not sufficient. Note that our proof also gives a necessary and sufficient condition for a coherent state to remain coherent: namely, that all the equations we listed hold, not just initially but for all times. But this condition seems a bit complicated.

### Introducing amoebae into a Petri dish

Suppose that there is an inexhaustible supply of amoebae, randomly floating around in a huge pond. Each time an amoeba comes into our collection area, we catch it and add it to the population of amoebae in the Petri dish. Suppose that the rate constant for this process is 3.

So, the Hamiltonian is $3(a^\dagger -1).$ If we start with a coherent state, say

$\displaystyle { \Psi(0)=\frac{e^{cz}}{e^c} }$

then

$\displaystyle { \Psi(t) = e^{3(a^\dagger -1)t} \; \frac{e^{cz}}{e^c} = \frac{e^{(c+3t)z}}{e^{c+3t}} }$

which is coherent at all times.

We can see that the condition of the theorem is satisfied, as all the complexes in the reaction network have degree 0 or 1.

### Amoebae reproducing and competing

This example shows a Petri dish with one species, amoebae, and two transitions: fission and competition. We suppose that the rate constant for fission is 2, while that for competition is 1. The Hamiltonian is then

$H= 2({a^\dagger}^2-a^\dagger)a + (a^\dagger-{a^\dagger}^2)a^2$

If we start off with the coherent state

$\displaystyle{\Psi(0) = \frac{e^{2z}}{e^2}}$

we find that

$\displaystyle {\Psi(t)=e^{2(z^2-z)2+(z-z^2)4} \; \Psi(0)}=\Psi(0)$

which is coherent. It should be noted that the chosen initial state

$\displaystyle{ \frac{e^{2z}}{e^2}}$

was a complex balanced equilibrium solution. So, the Anderson–Craciun–Kurtz Theorem applies to this case.

### Amoebae reproducing, competing, and being introduced

This is a combination of the previous two examples, where apart from ongoing reproduction and competition, amoebae are being introduced into the dish with a rate constant 3.

As in the above examples, we might think that coherent states could remain coherent forever here too. Let’s check that.

Assuming that this was true, if

$\displaystyle{\Psi(t) = \frac{e^{c(t)z}}{e^{c(t)}} }$

then $c(t)$ would have to satisfy the following:

$\dot{c}(t) = c(t)^2 + 3 -2c(t)$

and

$c(t)^2=2c(t)$

Using the second equation, we get

$\dot{c}(t) = 3 \Rightarrow c = 3t+ c_0$

But this is certainly not a solution of the second equation. So, here we find that initially coherent states do not remain remain coherent for all times.

However, if we choose

$\displaystyle{\Psi(0) = \frac{e^{2z}}{e^2}}$

then this coherent state is complex balanced except for complexes of degree 1, since it was in the previous example, and the only new feature of this example, at time zero, is that single amoebas are being introduced—and these are complexes of degree 1. So, the condition of the theorem does hold.

So, the condition in the theorem is necessary but not sufficient. However, it is easy to check, and we can use it to show that in many cases, coherent states must cease to be coherent.

## The Large-Number Limit for Reaction Networks (Part 1)

1 July, 2013

Waiting for the other shoe to drop.

This is a figure of speech that means ‘waiting for the inevitable consequence of what’s come so far’. Do you know where it comes from? You have to imagine yourself in an apartment on the floor below someone who is taking off their shoes. When you hear one, you know the next is coming.

A guest who checked into an inn one night was warned to be quiet because the guest in the room next to his was a light sleeper. As he undressed for bed, he dropped one shoe, which, sure enough, awakened the other guest. He managed to get the other shoe off in silence, and got into bed. An hour later, he heard a pounding on the wall and a shout: “When are you going to drop the other shoe?”

When we were working on math together, James Dolan liked to say “the other shoe has dropped” whenever an inevitable consequence of some previous realization became clear. There’s also the mostly British phrase the penny has dropped. You say this when someone finally realizes the situation they’re in.

But sometimes one realization comes after another, in a long sequence. Then it feels like it’s raining shoes!

I guess that’s a rather strained metaphor. Perhaps falling like dominoes is better for these long chains of realizations.

This is how I’ve felt in my recent research on the interplay between quantum mechanics, stochastic mechanics, statistical mechanics and extremal principles like the principle of least action. The basics of these subjects should be completely figured out by now, but they aren’t—and a lot of what’s known, nobody bothered to tell most of us.

So, I was surprised to rediscover that the Maxwell relations in thermodynamics are formally identical to Hamilton’s equations in classical mechanics… though in retrospect it’s obvious. Thermodynamics obeys the principle of maximum entropy, while classical mechanics obeys the principle of least action. Wherever there’s an extremal principle, symplectic geometry, and equations like Hamilton’s equations, are sure to follow.

I was surprised to discover (or maybe rediscover, I’m not sure yet) that just as statistical mechanics is governed by the principle of maximum entropy, quantum mechanics is governed by a principle of maximum ‘quantropy’. The analogy between statistical mechanics and quantum mechanics has been known at least since Feynman and Schwinger. But this basic aspect was never explained to me!

I was also surprised to rediscover that simply by replacing amplitudes by probabilities in the formalism of quantum field theory, we get a nice formalism for studying stochastic many-body systems. This formalism happens to perfectly match the ‘stochastic Petri nets’ and ‘reaction networks’ already used in subjects from population biology to epidemiology to chemistry. But now we can systematically borrow tools from quantum field theory! All the tricks that particle physicists like—annihilation and creation operators, coherent states and so on—can be applied to problems like the battle between the AIDS virus and human white blood cells.

And, perhaps because I’m a bit slow on the uptake, I was surprised when yet another shoe came crashing to the floor the other day.

Because quantum field theory has, at least formally, a nice limit where Planck’s constant goes to zero, the same is true for for stochastic Petri nets and reaction networks!

In quantum field theory, we call this the ‘classical limit’. For example, if you have a really huge number of photons all in the same state, quantum effects sometimes become negligible, and we can describe them using the classical equations describing electromagnetism: the classical Maxwell equations. In stochastic situations, it makes more sense to call this limit the ‘large-number limit’: the main point is that there are lots of particles in each state.

In quantum mechanics, different observables don’t commute, so the so-called commutator matters a lot:

$[A,B] = AB - BA$

These commutators tend to be proportional to Planck’s constant. So in the limit where Planck’s constant $\hbar$ goes to zero, observables commute… but commutators continue to have a ghostly existence, in the form of Poisson bracket:

$\displaystyle{ \{A,B\} = \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A,B] }$

Poisson brackets are a key part of symplectic geometry—the geometry of classical mechanics. So, this sort of geometry naturally shows up in the study of stochastic Petri nets!

Let me sketch how it works. I’ll start with a section reviewing stuff you should already know if you’ve been following the network theory series.

### The stochastic Fock space

Suppose we have some finite set $S$. We call its elements species, since we think of them as different kinds of things—e.g., kinds of chemicals, or kinds of organisms.

To describe the probability of having any number of things of each kind, we need the stochastic Fock space. This is the space of real formal power series in a bunch of variables, one for each element of $S.$ It won’t hurt to simply say

$S = \{1, \dots, k \}$

Then the stochastic Fock space is

$\mathbb{R}[[z_1, \dots, z_k ]]$

this being math jargon for the space of formal power series with real coefficients in some variables $z_1, \dots, z_k,$ one for each element of $S.$

We write

$n = (n_1, \dots, n_k) \in \mathbb{N}^S$

and use this abbreviation:

$z^n = z_1^{n_1} \cdots z_k^{n_k}$

We use $z^n$ to describe a state where we have $n_1$ things of the first species, $n_2$ of the second species, and so on.

More generally, a stochastic state is an element $\Psi$ of the stochastic Fock space with

$\displaystyle{ \Psi = \sum_{n \in \mathbb{N}^k} \psi_n \, z^n }$

where

$\psi_n \ge 0$

and

$\displaystyle{ \sum_{n \in \mathbb{N}^k} \psi_n = 1 }$

We use $\Psi$ to describe a state where $\psi_n$ is the probability of having $n_1$ things of the first species, $n_2$ of the second species, and so on.

The stochastic Fock space has some important operators on it: the annihilation operators given by

$\displaystyle{ a_i \Psi = \frac{\partial}{\partial z_i} \Psi }$

and the creation operators given by

$\displaystyle{ a_i^\dagger \Psi = z_i \Psi }$

From these we can define the number operators:

$N_i = a_i^\dagger a_i$

Part of the point is that

$N_i z^n = n_i z^n$

This says the stochastic state $z^n$ is an eigenstate of all the number operators, with eigenvalues saying how many things there are of each species.

The annihilation, creation, and number operators obey some famous commutation relations, which are easy to check for yourself:

$[a_i, a_j] = 0$

$[a_i^\dagger, a_j^\dagger] = 0$

$[a_i, a_j^\dagger] = \delta_{i j}$

$[N_i, N_j ] = 0$

$[N_i , a_j^\dagger] = \delta_{i j} a_j^\dagger$

$[N_i , a_j] = - \delta_{i j} a_j^\dagger$

The last two have easy interpretations. The first of these two implies

$N_i a_i^\dagger \Psi = a_i^\dagger (N_i + 1) \Psi$

This says that if we start in some state $\Psi,$ create a thing of type $i,$ and then count the things of that type, we get one more than if we counted the number of things before creating one. Similarly,

$N_i a_i \Psi = a_i (N_i - 1) \Psi$

says that if we annihilate a thing of type $i$ and then count the things of that type, we get one less than if we counted the number of things before annihilating one.

### Introducing Planck’s constant

Now let’s introduce an extra parameter into this setup. To indicate the connection to quantum physics, I’ll call it $\hbar,$ which is the usual symbol for Planck’s constant. However, I want to emphasize that we’re not doing quantum physics here! We’ll see that the limit where $\hbar \to 0$ is very interesting, but it will correspond to a limit where there are many things of each kind.

We’ll start by defining

$A_i = \hbar \, a_i$

and

$C_i = a_i^\dagger$

Here $A$ stands for ‘annihilate’ and $C$ stands for ‘create’. Think of $A$ as a rescaled annihilation operator. Using this we can define a rescaled number operator:

$\widetilde{N}_i = C_i A_i$

So, we have

$\widetilde{N}_i = \hbar N_i$

and this explains the meaning of the parameter $\hbar.$ The idea is that instead of counting things one at time, we count them in bunches of size $1/\hbar.$

For example, suppose $\hbar = 1/12.$ Then we’re counting things in dozens! If we have a state $\Psi$ with

$N_i \Psi = 36 \Psi$

then there are 36 things of the ith kind. But this implies

$\widetilde{N}_i \Psi = 3 \Psi$

so there are 3 dozen things of the ith kind.

Chemists don’t count in dozens; they count things in big bunches called moles. A mole is approximately the number of carbon atoms in 12 grams: Avogadro’s number, 6.02 × 1023. When you count things by moles, you’re taking $\hbar$ to be 1.66 × 10-24, the reciprocal of Avogadro’s number.

So, while in quantum mechanics Planck’s constant is ‘the quantum of action’, a unit of action, here it’s ‘the quantum of quantity’: the amount that corresponds to one thing.

We can easily work out the commutation relations of our new rescaled operators:

$[A_i, A_j] = 0$

$[C_i, C_j] = 0$

$[A_i, C_j] = \hbar \, \delta_{i j}$

$[\widetilde{N}_i, \widetilde{N}_j ] = 0$

$[\widetilde{N}_i , C_j] = \hbar \, \delta_{i j} C_j$

$[\widetilde{N}_i , A_j] = - \hbar \, \delta_{i j} A_j$

These are just what you see in quantum mechanics! The commutators are all proportional to $\hbar.$

Again, we can understand what these relations mean if we think a bit. For example, the commutation relation for $\widetilde{N}_i$ and $C_i$ says

$N_i C_i \Psi = C_i (N_i + \hbar) \Psi$

This says that if we start in some state $\Psi,$ create a thing of type $i,$ and then count the things of that type, we get $\hbar$ more than if we counted the number of things before creating one. This is because we are counting things not one at a time, but in bunches of size $1/\hbar.$

You may be wondering why I defined the rescaled annihilation operator to be $\hbar$ times the original annihilation operator:

$A_i = \hbar \, a_i$

but left the creation operator unchanged:

$C_i = a_i^\dagger$

I’m wondering that too! I’m not sure I’m doing things the best way yet. I’ve also tried another more symmetrical scheme, taking $A_k = \sqrt{\hbar} \, a_k$ and $C_k = \sqrt{\hbar} a_k^\dagger.$ This gives the same commutation relations, but certain other formulas become more unpleasant. I’ll explain that some other day.

Next, we can take the limit as $\hbar \to 0$ and define Poisson brackets of operators by

$\displaystyle{ \{A,B\} = \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A,B] }$

To make this rigorous it’s best to proceed algebraically. For this we treat $\hbar$ as a formal variable rather than a specific number. So, our number system becomes $\mathbb{R}[\hbar],$ the algebra of polynomials in $\hbar$. We define the Weyl algebra to be the algebra over $\mathbb{R}[\hbar]$ generated by elements $A_i$ and $C_i$ obeying

$[A_i, A_j] = 0$

$[C_i, C_j] = 0$

$[A_i, C_j] = \hbar \, \delta_{i j}$

We can set $\hbar = 0$ in this formalism; then the Weyl algebra reduces to the algebra of polynomials in the variables $A_i$ and $C_i.$ This algebra is commutative! But we can define a Poisson bracket on this algebra by

$\displaystyle{ \{A,B\} = \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A,B] }$

It takes a bit of work to explain to algebraists exactly what’s going on in this formula, because it involves an interplay between the algebra of polynomials in $A_i$ and $C_i,$ which is commutative, and the Weyl algebra, which is not. I’ll be glad to explain the details if you want. But if you’re a physicist, you can just follow your nose and figure out what the formula gives. For example:

$\begin{array}{ccl} \{A_i, C_j\} &=& \displaystyle{ \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A_i, C_j] } \\ \\ &=& \displaystyle{ \lim_{\hbar \to 0} \; \frac{1}{\hbar} \, \hbar \, \delta_{i j} } \\ \\ &=& \delta_{i j} \end{array}$

Similarly, we have:

$\{ A_i, A_j \} = 0$

$\{ C_i, C_j \} = 0$

$\{ A_i, C_j \} = \delta_{i j}$

$\{ \widetilde{N}_i, \widetilde{N}_j \} = 0$

$\{ \widetilde{N}_i , C_j \} = \delta_{i j} C_j$

$\{ \widetilde{N}_i , A_j \} = - \delta_{i j} A_j$

I should probably use different symbols for $A_i, C_i$ and $\widetilde{N}_i$ after we’ve set $\hbar = 0,$ since they’re really different now, but I don’t have the patience to make up more names for things!

Now, we can think of $A_i$ and $C_i$ as coordinate functions on a 2k-dimensional vector space, and all the polynomials in $A_i$ and $C_i$ as functions on this space. This space is what physicists would call a ‘phase space’: they use this kind of space to describe the position and momentum of a particle, though here we are using it in a different way. Mathematicians would call it a ‘symplectic vector space’, because it’s equipped with a special structure, called a symplectic structure, that lets us define Poisson brackets of smooth functions on this space. We won’t need to get into that now, but it’s important—and it makes me happy to see it here.

### More

There’s a lot more to do, but not today. My main goal is to understand, in a really elegant way, how the master equation for a stochastic Petri net reduces to the rate equation in the large-number limit. What we’ve done so far is start thinking of this as a $\hbar \to 0$ limit. This should let us borrow ideas about classical limits in quantum mechanics, and apply them to stochastic mechanics.

Stay tuned!

## Quantum Techniques for Reaction Networks

11 June, 2013

Fans of the network theory series might like to look at this paper:

• John Baez, Quantum techniques for reaction networks.

and I would certainly appreciate comments and corrections.

This paper tackles a basic question we never got around to discussing: how the probabilistic description of a system where bunches of things randomly interact and turn into other bunches of things can reduce to a deterministic description in the limit where there are lots of things!

Mathematically, such systems are given by ‘stochastic Petri nets’, or if you prefer, ‘stochastic reaction networks’. These are just two equivalent pictures of the same thing. For example, we could describe some chemical reactions using this Petri net:

but chemists would use this reaction network:

C + O2 → CO2
CO2 + NaOH → NaHCO3
NaHCO3 + HCl → H2O + NaCl + CO2

Making either of them ‘stochastic’ merely means that we specify a ‘rate constant’ for each reaction, saying how probable it is.

For any such system we get a ‘master equation’ describing how the probability of having any number of things of each kind changes with time. In the class I taught on this last quarter, the students and I figured out how to derive from this an equation saying how the expected number of things of each kind changes with time. Later I figured out a much slicker argument… but either way, we get this result:

Theorem. For any stochastic reaction network and any stochastic state $\Psi(t)$ evolving in time according to the master equation, then

$\displaystyle{ \frac{d}{dt} \langle N \Psi(t) \rangle } = \displaystyle{\sum_{\tau \in T}} \, r(\tau) \, (s(\tau) - t(\tau)) \; \left\langle N^{\underline{s(\tau)}}\, \Psi(t) \right\rangle$

assuming the derivative exists.

Of course this will make no sense yet if you haven’t been following the network theory series! But I explain all the notation in the paper, so don’t be scared. The main point is that $\langle N \Psi(t) \rangle$ is a vector listing the expected number of things of each kind at time $t.$ The equation above says how this changes with time… but it closely resembles the ‘rate equation’, which describes the evolution of chemical systems in a deterministic way.

And indeed, the next big theorem says that the master equation actually implies the rate equation when the probability of having various numbers of things of each kind is given by a product of independent Poisson distributions. In this case $\Psi(t)$ is what people in quantum physics call a ‘coherent state’. So:

Theorem. Given any stochastic reaction network, let
$\Psi(t)$ be a mixed state evolving in time according to the master equation. If $\Psi(t)$ is a coherent state when $t = t_0,$ then $\langle N \Psi(t) \rangle$ obeys the rate equation when $t = t_0.$

In most cases, this only applies exactly at one moment of time: later $\Psi(t)$ will cease to be a coherent state. Then we must resort to the previous theorem to see how the expected number of things of each kind changes with time.

But sometimes our state $\Psi(t)$ will stay coherent forever! For one case where this happens, see the companion paper, which I blogged about a little while ago:

• John Baez and Brendan Fong, Quantum techniques for studying equilibrium in reaction networks.

We wrote this first, but logically it comes after the one I just finished now!

All this material will get folded into the book I’m writing with Jacob Biamonte. There are just a few remaining loose ends that need to be tied up.

## Quantum Techniques for Studying Equilibrium in Reaction Networks

16 May, 2013

The summer before last, I invited Brendan Fong to Singapore to work with me on my new ‘network theory’ project. He quickly came up with a nice new proof of a result about mathematical chemistry. We blogged about it, and I added it to my book, but then he became a grad student at Oxford and got distracted by other kinds of networks—namely, Bayesian networks.

So, we’ve just now finally written up this result as a self-contained paper:

• John Baez and Brendan Fong, Quantum techniques for studying equilibrium in reaction networks.

Check it out and let us know if you spot mistakes or stuff that’s not clear!

The idea, in brief, is to use math from quantum field theory to give a somewhat new proof of the Anderson–Craciun–Kurtz theorem.

This remarkable result says that in many cases, we can start with an equilibrium solution of the ‘rate equation’ which describes the behavior of chemical reactions in a deterministic way in the limit of a large numbers of molecules, and get an equilibrium solution of the ‘master equation’ which describes chemical reactions probabilistically for any number of molecules.

The trick, in our approach, is to start with a chemical reaction network, which is something like this:

and use it to write down a Hamiltonian describing the time evolution of the probability that you have various numbers of each kind of molecule: A, B, C, D, E, … Using ideas from quantum mechanics, we can write this Hamiltonian in terms of annihilation and creation operators—even though our problem involves probability theory, not quantum mechanics! Then we can write down the equilibrium solution as a ‘coherent state’. In quantum mechanics, that’s a quantum state that approximates a classical one as well as possible.

All this is part of a larger plan to take tricks from quantum mechanics and apply them to ‘stochastic mechanics’, simply by working with real numbers representing probabilities instead of complex numbers representing amplitudes!

I should add that Brendan’s work on Bayesian networks is also very cool, and I plan to talk about it here and even work it into the grand network theory project I have in mind. But this may take quite a long time, so for now you should read his paper:

• Brendan Fong, Causal theories: a categorical perspective on Bayesian networks.

## Network Theory (Part 29)

23 April, 2013

I’m talking about electrical circuits, but I’m interested in them as models of more general physical systems. Last time we started seeing how this works. We developed an analogy between electrical circuits and physical systems made of masses and springs, with friction:

 Electronics Mechanics charge: $Q$ position: $q$ current: $I = \dot{Q}$ velocity: $v = \dot{q}$ flux linkage: $\lambda$ momentum: $p$ voltage: $V = \dot{\lambda}$ force: $F = \dot{p}$ inductance: $L$ mass: $m$ resistance: $R$ damping coefficient: $r$ inverse capacitance: $1/C$ spring constant: $k$

But this is just the first of a large set of analogies. Let me list some, so you can see how wide-ranging they are!

### More analogies

People in system dynamics often use effort as a term to stand for anything analogous to force or voltage, and flow as a general term to stand for anything analogous to velocity or electric current. They call these variables $e$ and $f.$

To me it’s important that force is the time derivative of momentum, and velocity is the time derivative of position. Following physicists, I write momentum as $p$ and position as $q.$ So, I’ll usually write effort as $\dot{p}$ and flow as $\dot{q}$.

Of course, ‘position’ is a term special to mechanics; it’s nice to have a general term for the thing whose time derivative is flow, that applies to any context. People in systems dynamics seem to use displacement as that general term.

It would also be nice to have a general term for the thing whose time derivative is effort… but I don’t know one. So, I’ll use the word momentum.

Now let’s see the analogies! Let’s see how displacement $q$, flow $\dot{q},$ momentum $p$ and effort $\dot{p}$ show up in several subjects:

 displacement:    $q$ flow:      $\dot q$ momentum:      $p$ effort:           $\dot p$ Mechanics: translation position velocity momentum force Mechanics: rotation angle angular velocity angular momentum torque Electronics charge current flux linkage voltage Hydraulics volume flow pressure momentum pressure Thermal Physics entropy entropy flow temperature momentum temperature Chemistry moles molar flow chemical momentum chemical potential

We’d been considering mechanics of systems that move along a line, via translation, but we can also consider mechanics for systems that turn round and round, via rotation. So, there are two rows for mechanics here.

There’s a row for electronics, and then a row for hydraulics, which is closely analogous. In this analogy, a pipe is like a wire. The flow of water plays the role of current. Water pressure plays the role of electrostatic potential. The difference in water pressure between two ends of a pipe is like the voltage across a wire. When water flows through a pipe, the power equals the flow times this pressure difference—just as in an electrical circuit the power is the current times the voltage across the wire.

A resistor is like a narrowed pipe:

An inductor is like a heavy turbine placed inside a pipe: this makes the water tend to keep flowing at the same rate it’s already flowing! In other words, it provides a kind of ‘inertia’ analogous
to mass.

A capacitor is like a tank with pipes coming in from both ends, and a rubber sheet dividing it in two lengthwise:

When studying electrical circuits as a kid, I was shocked when I first learned that capacitors don’t let the electrons through: it didn’t seem likely you could do anything useful with something like that! But of course you can. Similarly, this gizmo doesn’t let the water through.

A voltage source is like a compressor set up to maintain a specified pressure difference between the input and output:

Similarly, a current source is like a pump set up to maintain a specified flow.

Finally, just as voltage is the time derivative of a fairly obscure quantity called ‘flux linkage’, pressure is the time derivative of an even more obscure quantity which has no standard name. I’m calling it ‘pressure momentum’, thanks to the analogy

momentum: force :: pressure momentum: pressure

Just as pressure has units of force per area, pressure momentum has units of momentum per area!

People invented this analogy back when they were first struggling to understand electricity, before electrons had been observed:

Hydraulic analogy, Wikipedia.

The famous electrical engineer Oliver Heaviside pooh-poohed this analogy, calling it the “drain-pipe theory”. I think he was making fun of William Henry Preece. Preece was another electrical engineer, who liked the hydraulic analogy and disliked Heaviside’s fancy math. In his inaugural speech as president of the Institution of Electrical Engineers in 1893, Preece proclaimed:

True theory does not require the abstruse language of mathematics to make it clear and to render it acceptable. All that is solid and substantial in science and usefully applied in practice, have been made clear by relegating mathematic symbols to their proper store place—the study.

According to the judgement of history, Heaviside made more progress in understanding electromagnetism than Preece. But there’s still a nice analogy between electronics and hydraulics. And I’ll eventually use the abstruse language of mathematics to make it very precise!

But now let’s move on to the row called ‘thermal physics’. We could also call this ‘thermodynamics’. It works like this. Say you have a physical system in thermal equilibrium and all you can do is heat it up or cool it down ‘reversibly’—that is, while keeping it in thermal equilibrium all along. For example, imagine a box of gas that you can heat up or cool down. If you put a tiny amount $dE$ of energy into the system in the form of heat, then its entropy increases by a tiny amount $dS.$ And they’re related by this equation:

$dE = TdS$

where $T$ is the temperature.

Another way to say this is

$\displaystyle{ \frac{dE}{dt} = T \frac{dS}{dt} }$

where $t$ is time. On the left we have the power put into the system in the form of heat. But since power should be ‘effort’ times ‘flow’, on the right we should have ‘effort’ times ‘flow’. It makes some sense to call $dS/dt$ the ‘entropy flow’. So temperature, $T,$ must play the role of ‘effort’.

This is a bit weird. I don’t usually think of temperature as a form of ‘effort’ analogous to force or torque. Stranger still, our analogy says that ‘effort’ should be the time derivative of some kind of ‘momentum’, So, we need to introduce temperature momentum: the integral of temperature over time. I’ve never seen people talk about this concept, so it makes me a bit nervous.

But when we have a more complicated physical system like a piston full of gas in thermal equilibrium, we can see the analogy working. Now we have

$dE = TdS - PdV$

The change in energy $dE$ of our gas now has two parts. There’s the change in heat energy $TdS$, which we saw already. But now there’s also the change in energy due to compressing the piston! When we change the volume of the gas by a tiny amount $dV,$ we put in energy $-PdV.$

Now look back at the first chart I drew! It says that pressure is a form of ‘effort’, while volume is a form of ‘displacement’. If you believe that, the equation above should help convince you that temperature is also a form of effort, while entropy is a form of displacement.

But what about the minus sign? That’s no big deal: it’s the result of some arbitrary conventions. $P$ is defined to be the outward pressure of the gas on our piston. If this is positive, reducing the volume of the gas takes a positive amount of energy, so we need to stick in a minus sign. I could eliminate this minus sign by changing some conventions—but if I did, the chemistry professors at UCR would haul me away and increase my heat energy by burning me at the stake.

Speaking of chemistry: here’s how the chemistry row in the analogy chart works. Suppose we have a piston full of gas made of different kinds of molecules, and there can be chemical reactions that change one kind into another. Now our equation gets fancier:

$\displaystyle{ dE = TdS - PdV + \sum_i \mu_i dN_i }$

Here $N_i$ is the number of molecules of the ith kind, while $\mu_i$ is a quantity called a chemical potential. The chemical potential simply says how much energy it takes to increase the number of molecules of a given kind. So, we see that chemical potential is another form of effort, while number of molecules is another form of displacement.

But chemists are too busy to count molecules one at a time, so they count them in big bunches called ‘moles’. A mole is the number of atoms in 12 grams of carbon-12. That’s roughly

602,214,150,000,000,000,000,000

atoms. This is called Avogadro’s constant. If we used 1 gram of hydrogen, we’d get a very close number called ‘Avogadro’s number’, which leads to lots of jokes:

(He must be desperate because he looks so weird… sort of like a mole!)

So, instead of saying that the displacement in chemistry is called ‘number of molecules’, you’ll sound more like an expert if you say ‘moles’. And the corresponding flow is called molar flow.

The truly obscure quantity in this row of the chart is the one whose time derivative is chemical potential! I’m calling it chemical momentum simply because I don’t know another name.

Why are linear and angular momentum so famous compared to pressure momentum, temperature momentum and chemical momentum?

I suspect it’s because the laws of physics are symmetrical
under translations and rotations. When the assumptions of Noether’s theorem hold, this guarantees that the total momentum and angular momentum of a closed system are conserved. Apparently the laws of physics lack the symmetries that would make the other kinds of momentum be conserved.

This suggests that we should dig deeper and try to understand more deeply how this chart is connected to ideas in classical mechanics, like Noether’s theorem or symplectic geometry. I will try to do that sometime later in this series.

More generally, we should try to understand what gives rise to a row in this analogy chart. Are there are lots of rows I haven’t talked about yet, or just a few? There are probably lots. But are there lots of practically important rows that I haven’t talked about—ones that can serve as the basis for new kinds of engineering? Or does something about the structure of the physical world limit the number of such rows?

### Mildly defective analogies

Engineers care a lot about dimensional analysis. So, they often make a big deal about the fact that while effort and flow have different dimensions in different rows of the analogy chart, the following four things are always true:

$pq$ has dimensions of action (= energy × time)
$\dot{p} q$ has dimensions of energy
$p \dot{q}$ has dimensions of energy
$\dot{p} \dot{q}$ has dimensions of power (= energy / time)

In fact any one of these things implies all the rest.

These facts are important when designing ‘mixed systems’, which combine different rows in the chart. For example, in mechatronics, we combine mechanical and electronic elements in a single circuit! And in a hydroelectric dam, power is converted from hydraulic to mechanical and then electric form:

One goal of network theory should be to develop a unified language for studying mixed systems! Engineers have already done most of the hard work. And they’ve realized that thanks to conservation of energy, working with pairs of flow and effort variables whose product has dimensions of power is very convenient. It makes it easy to track the flow of energy through these systems.

However, people have tried to extend the analogy chart to include ‘mildly defective’ examples where effort times flow doesn’t have dimensions of power. The two most popular are these:

 displacement:    $q$ flow:      $\dot q$ momentum:      $p$ effort:           $\dot p$ Heat flow heat heat flow temperature momentum temperature Economics inventory product flow economic momentum product price

The heat flow analogy comes up because people like to think of heat flow as analogous to electrical current, and temperature as analogous to voltage. Why? Because an insulated wall acts a bit like a resistor! The current flowing through a resistor is a function the voltage across it. Similarly, the heat flowing through an insulated wall is about proportional to the difference in temperature between the inside and the outside.

However, there’s a difference. Current times voltage has dimensions of power. Heat flow times temperature does not have dimensions of power. In fact, heat flow by itself already has dimensions of power! So, engineers feel somewhat guilty about this analogy.

Being a mathematical physicist, a possible way out presents itself to me: use units where temperature is dimensionless! In fact such units are pretty popular in some circles. But I don’t know if this solution is a real one, or whether it causes some sort of trouble.

In the economic example, ‘energy’ has been replaced by ‘money’. So other words, ‘inventory’ times ‘product price’ has units of money. And so does ‘product flow’ times ‘economic momentum’! I’d never heard of economic momentum before I started studying these analogies, but I didn’t make up that term. It’s the thing whose time derivative is ‘product price’. Apparently economists have noticed a tendency for rising prices to keep rising, and falling prices to keep falling… a tendency toward ‘conservation of momentum’ that doesn’t fit into their models of rational behavior.

I’m suspicious of any attempt to make economics seem like physics. Unlike elementary particles or rocks, people don’t seem to be very well modelled by simple differential equations. However, some economists have used the above analogy to model economic systems. And I can’t help but find that interesting—even if intellectually dubious when taken too seriously.

### An auto-analogy

Beside the analogy I’ve already described between electronics and mechanics, there’s another one, called ‘Firestone’s analogy’:

• F.A. Firestone, A new analogy between mechanical and electrical systems, Journal of the Acoustical Society of America 4 (1933), 249–267.

Alain Bossavit pointed this out in the comments to Part 27. The idea is to treat current as analogous to force instead of velocity… and treat voltage as analogous to velocity instead of force!

In other words, switch your $p$’s and $q$’s:

 Electronics Mechanics          (usual analogy) Mechanics      (Firestone’s analogy) charge position: $q$ momentum: $p$ current velocity: $\dot{q}$ force: $\dot{p}$ flux linkage momentum: $p$ position: $q$ voltage force: $\dot{p}$ velocity: $\dot{q}$

This new analogy is not ‘mildly defective’: the product of effort and flow variables still has dimensions of power. But why bother with another analogy?

It may be helpful to recall this circuit from last time:

It’s described by this differential equation:

$L \ddot{Q} + R \dot{Q} + C^{-1} Q = V$

We used the ‘usual analogy’ to translate it into classical mechanics problem, and we got a problem where an object of mass $L$ is hanging from a spring with spring constant $1/C$ and damping coefficient $R,$ and feeling an additional external force $F:$

$m \ddot{q} + r \dot{q} + k q = F$

And that’s fine. But there’s an intuitive sense in which all three forces are acting ‘in parallel’ on the mass, rather than in series. In other words, all side by side, instead of one after the other.

Using Firestone’s analogy, we get a different classical mechanics problem, where the three forces are acting in series. The spring is connected to source of friction, which in turn is connected to an external force.

This may seem a bit mysterious. But instead of trying to explain it, I’ll urge you to read his paper, which is short and clearly written. I instead want to make a somewhat different point, which is that we can take a mechanical system, convert it to an electrical one following the usual analogy, and then convert back to a mechanical one using Firestone’s analogy. This gives us an ‘auto-analogy’ between mechanics and itself, which switches $p$ and $q.$

And although I haven’t been able to figure out why from Firestone’s paper, I have other reasons for feeling sure this auto-analogy should contain a minus sign. For example:

$p \mapsto q, \qquad q \mapsto -p$

In other words, it should correspond to a 90° rotation in the $(p,q)$ plane. There’s nothing sacred about whether we rotate clockwise or counterclockwise; we can equally well do this:

$p \mapsto -q, \qquad q \mapsto p$

But we need the minus sign to get a so-called symplectic transformation of the $(p,q)$ plane. And from my experience with classical mechanics, I’m pretty sure we want that. If I’m wrong, please let me know!

I have a feeling we should revisit this issue when we get more deeply into the symplectic aspects of circuit theory. So, I won’t go on now.

### References

The analogies I’ve been talking about are studied in a branch of engineering called system dynamics. You can read more about it here:

• Dean C. Karnopp, Donald L. Margolis and Ronald C. Rosenberg, System Dynamics: a Unified Approach, Wiley, New York, 1990.

• Forbes T. Brown, Engineering System Dynamics: a Unified Graph-Centered Approach, CRC Press, Boca Raton, 2007.

• Francois E. Cellier, Continuous System Modelling, Springer, Berlin, 1991.

System dynamics already uses lots of diagrams of networks. One of my goals in weeks to come is to explain the category theory lurking behind these diagrams.

## Petri Net Programming (Part 2)

20 December, 2012

guest post by David A. Tanzer

### An introduction to stochastic Petri nets

In the previous article, I explored a simple computational model called Petri nets. They are used to model reaction networks, and have applications in a wide variety of fields, including population ecology, gene regulatory networks, and chemical reaction networks. I presented a simulator program for Petri nets, but it had an important limitation: the model and the simulator contain no notion of the rates of the reactions. But these rates critically determine the character of the dynamics of network.

Here I will introduce the topic of ‘stochastic Petri nets,’ which extends the basic model to include reaction dynamics. Stochastic means random, and it is presumed that there is an underlying random process that drives the reaction events. This topic is rich in both its mathematical foundations and its practical applications. A direct application of the theory yields the rate equation for chemical reactions, which is a cornerstone of chemical reaction theory. The theory also gives algorithms for analyzing and simulating Petri nets.

We are now entering the ‘business’ of software development for applications to science. The business logic here is nothing but math and science itself. Our study of this logic is not an academic exercise that is tangential to the implementation effort. Rather, it is the first phase of a complete software development process for scientific programming applications.

The end goals of this series are to develop working code to analyze and simulate Petri nets, and to apply these tools to informative case studies. But we have some work to do en route, because we need to truly understand the models in order to properly interpret the algorithms. The key questions here are when, why, and to what extent the algorithms give results that are empirically predictive. We will therefore be embarking on some exploratory adventures into the relevant theoretical foundations.

The overarching subject area to which stochastic Petri nets belong has been described as stochastic mechanics in the network theory series here on Azimuth. The theme development here will partly parallel that of the network theory series, but with a different focus, since I am addressing a computationally oriented reader. For an excellent text on the foundations and applications of stochastic mechanics, see:

• Darren Wilkinson, Stochastic Modelling for Systems Biology, Chapman and Hall/CRC Press, Boca Raton, Florida, 2011.

### Review of basic Petri nets

A Petri net is a graph with two kinds of nodes: species and transitions. The net is populated with a collection of ‘tokens’ that represent individual entities. Each token is attached to one of the species nodes, and this attachment indicates the type of the token. We may therefore view a species node as a container that holds all of the tokens of a given type.

The transitions represent conversion reactions between the tokens. Each transition is ‘wired’ to a collection of input species-containers, and to a collection of output containers. When it ‘fires’, it removes one token from each input container, and deposits one token to each output container.

Here is the example we gave, for a simplistic model of the formation and dissociation of H2O molecules:

The circles are for species, and the boxes are for transitions.

The transition combine takes in two H tokens and one O token, and outputs one H2O token. The reverse transition is split, which takes in one H2O, and outputs two H’s and one O.

An important application of Petri nets is to the modeling of biochemical reaction networks, which include the gene regulatory networks. Since genes and enzymes are molecules, and their binding interactions are chemical reactions, the Petri net model is directly applicable. For example, consider a transition that inputs one gene G, one enzyme E, and outputs the molecular form G • E in which E is bound to a particular site on G.

Applications of Petri nets may differ widely in terms of the population sizes involved in the model. In general chemistry reactions, the populations are measured in units of moles (where a mole is ‘Avogadro’s number’ 6.022 · 1023 entities). In gene regulatory networks, on the other hand, there may only be a handful of genes and enzymes involved in a reaction.

This difference in scale leads to a qualitative difference in the modelling. With small population sizes, the stochastic effects will predominate, but with large populations, a continuous, deterministic, average-based approximation can be used.

### Representing Petri nets by reaction formulas

Petri nets can also be represented by formulas used for chemical reaction networks. Here is the formula for the Petri net shown above:

H2O ↔ H + H + O

or the more compact:

H2O ↔ 2 H + O

The double arrow is a compact designation for two separate reactions, which happen to be opposites of each other.

By the way, this reaction is not physically realistic, because one doesn’t find isolated H and O atoms traveling around and meeting up to form water molecules. This is the actual reaction pair that predominates in water:

2 H2O ↔ OH- + H3O+

Here, a hydrogen nucleus H+, with one unit of positive charge, gets removed from one of the H2O molecules, leaving behind the hydroxide ion OH-. In the same stroke, this H+ gets re-attached to the other H2O molecule, which thereby becomes a hydronium ion, H3O+.

For a more detailed example, consider this reaction chain, which is of concern to the ocean environment:

CO2 + H2O ↔ H2CO3 ↔ H+ + HCO3-

This shows the formation of carbonic acid, namely H2CO3, from water and carbon dioxide. The next reaction represents the splitting of carbonic acid into a hydrogen ion and a negatively charged bicarbonate ion, HCO3-. There is a further reaction, in which a bicarbonate ion further ionizes into an H+ and a doubly negative carbonate ion CO32-. As the diagram indicates, for each of these reactions, a reverse reaction is also present. For a more detailed description of this reaction network, see:

• Stephen E. Bialkowski, Carbon dioxide and carbonic acid.

Increased levels of CO2 in the atmosphere will change the balance of these reactions, leading to a higher concentration of hydrogen ions in the water, i.e., a more acidic ocean. This is of concern because the metabolic processes of aquatic organisms is sensitive to the pH level of the water. The ultimate concern is that entire food chains could be disrupted, if some of the organisms cannot survive in a higher pH environment. See the Wikipedia page on ocean acidification for more information.

Exercise. Draw Petri net diagrams for these reaction networks.

### Motivation for the study of Petri net dynamics

The relative rates of the various reactions in a network critically determine the qualitative dynamics of the network as a whole. This is because the reactions are ‘competing’ with each other, and so their relative rates determine the direction in which the state of the system is changing. For instance, if molecules are breaking down faster then they are being formed, then the system is moving towards full dissociation. When the rates are equal, the processes balance out, and the system is in an equilibrium state. Then, there are only temporary fluctuations around the equilibrium conditions.

The rate of the reactions will depend on the number of tokens present in the system. For example, if any of the input tokens are zero, then the transition can’t fire, and so its rate must be zero. More generally, when there are few input tokens available, there will be fewer reaction events, and so the firing rates will be lower.

Given a specification for the rates in a reaction network, we can then pose the following kinds of questions about its dynamics:

• Does the network have an equilibrium state?

• If so, what are the concentrations of the species at equilibrium?

• How quickly does it approach the equilibrium?

• At the equilibrium state, there will still be temporary fluctuations around the equilibrium concentrations. What are the variances of these fluctuations?

• Are there modes in which the network will oscillate between states?

This is the grail we seek.

Aside from actually performing empirical experiments, such questions can be addressed either analytically or through simulation methods. In either case, our first step is to define a theoretical model for the dynamics of a Petri net.

### Stochastic Petri nets

A stochastic Petri net (with kinetics) is a Petri net that is augmented with a specification for the reaction dynamics. It is defined by the following:

• An underlying Petri net, which consists of species, transitions, an input map, and an output map. These maps assign to each transition a multiset of species. (Multiset means that duplicates are allowed.) Recall that the state of the net is defined by a marking function, that maps each species to its population count.

• A rate constant that is associated with each transition.

• A kinetic model, that gives the expected firing rate for each transition as a function of the current marking. Normally, this kinetic function will include the rate constant as a multiplicative factor.

A further ‘sanity constraint’ can be put on the kinetic function for a transition: it should give a positive value if and only if all of its inputs are positive.

• A stochastic model, which defines the probability distribution of the time intervals between firing events. This specific distribution of the firing intervals for a transition will be a function of the expected firing rate in the current marking.

This definition is based on the standard treatments found, for example in:

• M. Ajmone Marsan, Stochastic Petri nets: an elementary introduction, in Advances in Petri Nets, Springer, Berlin, 1989, 1–23.

or Wilkinson’s book mentioned above. I have also added an explicit mention of the kinetic model, based on the ‘kinetics’ described in here:

• Martin Feinberg, Lectures on chemical reaction networks.

There is an implied random process that drives the reaction events. A classical random process is given by a container with ‘particles’ that are randomly traveling around, bouncing off the walls, and colliding with each other. This is the general idea behind Brownian motion. It is called a random process because the outcome results from an ‘experiment’ that is not fully determined by the input specification. In this experiment, you pour in the ingredients (particles of different types), set the temperature (the distributions of the velocities), give it a stir, and then see what happens. The outcome consists of the paths taken by each of the particles.

In an important limiting case, the stochastic behavior becomes deterministic, and the population sizes become continuous. To see this, consider a graph of population sizes over time. With larger population sizes, the relative jumps caused by the firing of individual transitions become smaller, and graphs look more like continuous curves. In the limit, we obtain an approximation for high population counts, in which the graphs are continuous curves, and the concentrations are treated as continuous magnitudes. In a similar way, a pitcher of sugar can be approximately viewed as a continuous fluid.

This simplification permits the application of continuous mathematics to study of reaction network processes. It leads to the basic rate equation for reaction networks, which specifies the direction of change of the system as a function of the current state of the system.

In this article we will be exploring this continuous deterministic formulation of Petri nets, under what is known as the mass action kinetics. This kinetics is one implementation of the general specification of a kinetic model, as defined above. This means that it will define the expected firing rate of each transition, in a given marking of the net. The probabilistic variations in the spacing of the reactions—around the mean given by the expected firing rate—is part of the stochastic dynamics, and will be addressed in a subsequent article.

### The mass-action kinetics

Under the mass action kinetics, the expected firing rate of a transition is proportional to the product of the concentrations of its input species. For instance, if the reaction were A + C → D, then the firing rate would be proportional to the concentration of A times the concentration of C, and if the reaction were A + A → D, it would be proportional to the square of the concentration of A.

This principle is explained by Feinberg as follows:

For the reaction A+C → D, an occurrence requires that a molecule of A meet a molecule of C in the reaction, and we take the probability of such an encounter to be proportional to the product [of the concentrations of A and C]. Although we do not presume that every such encounter yields a molecule of D, we nevertheless take the occurrence rate of A+C → D to be governed by [the product of the concentrations].

For an in-depth proof of the mass action law, see this article:

• Daniel Gillespie, A rigorous definition of the chemical master equation, 1992.

Note that we can easily pass back and forth between speaking of the population counts for the species, and the concentrations of the species, which is just the population count divided by the total volume V of the system. The mass action law applies to both cases, the only difference being that the constant factors of (1/V) used for concentrations will get absorbed into the rate constants.

The mass action kinetics is a basic law of empirical chemistry. But there are limits to its validity. First, as indicated in the proof in the Gillespie, the mass action law rests on the assumptions that the system is well-stirred and in thermal equilibrium. Further limits are discussed here:

• Georg Job and Regina Ruffler, Physical Chemistry (first five chapters), Section 5.2, 2010.

They write:

…precise measurements show that the relation above is not strictly adhered to. At higher concentrations, values depart quite noticeably from this relation. If we gradually move to lower concentrations, the differences become smaller. The equation here expresses a so-called “limiting law“ which strictly applies only when c → 0.

In practice, this relation serves as a useful approximation up to rather high concentrations. In the case of electrically neutral substances, deviations are only noticeable above 100 mol m−3. For ions, deviations become observable above 1 mol m−3, but they are so small that they are easily neglected if accuracy is not of prime concern.

Why would the mass action kinetics break down at high concentrations? According to the book quoted, it is due to “molecular and ionic interactions.” I haven’t yet found a more detailed explanation, but here is my supposition about what is meant by molecular interactions in this context. Doubling the number of A molecules doubles the number of expected collisions between A and C molecules, but it also reduces the probability that any given A and C molecules that are within reacting distance will actually react. The reaction probability is reduced because the A molecules are ‘competing’ for reactions with the C molecules. With more A molecules, it becomes more likely that a C molecule will simultaneously be within reacting distance of several A molecules; each of these A molecules reduces the probability that the other A molecules will react with the C molecule. This is most pronounced when the concentrations in a gas get high enough that the molecules start to pack together to form a liquid.

### The equilibrium relation for a pair of opposite reactions

Suppose we have two opposite reactions:

$T: A + B \stackrel{u}{\longrightarrow} C + D$

$T': C + D \stackrel{v}{\longrightarrow} A + B$

Since the reactions have exactly opposite effects on the population sizes, in order for the population sizes to be in a stable equilibrium, the expected firing rates of $T$ and $T'$ must be equal:

$\mathrm{rate}(T') = \mathrm{rate}(T)$

By mass action kinetics:

$\mathrm{rate}(T) = u [A] [B]$

$\mathrm{rate}(T') = v [C] [D]$

where $[X]$ means the concentration of $X.$

Hence at equilibrium:

$u [A] [B] = v [C] [D]$

So:

$\displaystyle{ \frac{[A][B]}{[C][D]} = \frac{v}{u} = K }$

where $K$ is the equilibrium constant for the reaction pair.

### Equilibrium solution for the formation and dissociation of a diatomic molecule

Let A be some type of atom, and let D = A2 be the diatomic form of A. Then consider the opposite reactions:

$A + A \stackrel{u}{\longrightarrow} D$

$D \stackrel{v}{\longrightarrow} A + A$

From the preceding analysis, at equilibrium the following relation holds:

$u [A]^2 = v [D]$

Let $N(A)$ and $N(B)$ be the population counts for A and B, and let

$N = N(A) + 2 N(D)$

be the total number of units of A in the system, whether they be in the form of atoms or diatoms.

The value of $N$ is an invariant property of the system. The reactions cannot change it, because they are just shuffling the units of A from one arrangement to the other. By way of contrast, $N(A)$ is not an invariant quantity.

Dividing this equation by the total volume $V$, we get:

$[N] = [A] + 2 [D]$

where $[N]$ is the concentration of the units of A.

Given a fixed value for $[N]$ and the rate constants $u$ and $v$, we can then solve for the concentrations at equilibrium:

$\displaystyle{u [A]^2 = v [D] = v ([N] - [A]) / 2 }$

$\displaystyle{2 u [A]^2 + v [A] - v [N] = 0 }$

$\displaystyle{[A] = (-v \pm \sqrt{v^2 + 8 u v [N]}) / 4 u }$

Since $[A]$ can’t be negative, only the positive square root is valid.

Here is the solution for the case where $u = v = 1$:

$\displaystyle{[A] = (\sqrt{8 [N] + 1} - 1) / 4 }$

$\displaystyle{[D] = ([N] - [A]) / 2 }$

### Conclusion

We’ve covered a lot of ground, starting with the introduction of the stochastic Petri net model, followed by a general discussion of reaction network dynamics, the mass action laws, and calculating equilibrium solutions for simple reaction networks.

We still have a number of topics to cover on our journey into the foundations, before being able to write informed programs to solve problems with stochastic Petri nets. Upcoming topics are (1) the deterministic rate equation for general reaction networks and its application to finding equilibrium solutions, and (2) an exploration of the stochastic dynamics of a Petri net. These are the themes that will support our upcoming software development.

## Network Theory (Part 25)

3 November, 2012

In parts 2-24 of this network theory series, we’ve been talking about Petri nets and reaction networks. These parts are now getting turned into a book. You can see a draft here:

• John Baez and Jacob Biamonte, A course on quantum techniques for stochastic mechanics.

There’s a lot more to network theory than this. But before I dive into the next big topic, I want to mention a few more odds and ends about Petri nets and reaction networks. For example, their connection to logic and computation!

As we’ve seen, a stochastic Petri net can be used to describe a bunch of chemical reactions with certain reaction rates. We could try to use these reactions to build a ‘chemical computer’. But how powerful can such a computer be?

I don’t know the answer. But before people got interested in stochastic Petri nets, computer scientists spent quite some time studying plain old Petri nets, which don’t include the information about reaction rates. They used these as simple models of computation. And since computer scientists like to know which questions are decidable by means of an algorithm and which aren’t, they proved some interesting theorems about decidability for Petri nets.

Let me talk about: ‘reachability’: the question of which collections of molecules can turn into which other collections, given a fixed set of chemical reactions. For example, suppose you have these chemical reactions:

C + O2 → CO2

CO2 + NaOH → NaHCO3

NaHCO3 + HCl → H2O + NaCl + CO2

Can you use these to turn

C + O2 + NaOH + HCl

into

CO2 + H2O + NaCl ?

It’s not too hard to settle this particular question—we’ll do it soon. But settling all possible such questions turns out to be very hard

### Reachability

Remember:

Definition. A Petri net consists of a set $S$ of species and a set $T$ of transitions, together with a function

$i : S \times T \to \mathbb{N}$

saying how many copies of each state shows up as input for each transition, and a function

$o: S \times T \to \mathbb{N}$

saying how many times it shows up as output.

Today we’ll assume both $S$ and $T$ are finite.

Jacob and I like to draw the species as yellow circles and the transitions as aqua boxes, in a charmingly garish color scheme chosen by Dave Tweed. So, the chemical reactions I mentioned before:

C + O2 → CO2

CO2 + NaOH → NaHCO3

NaHCO3 + HCl → H2O + NaCl + CO2

give this Petri net:

A ‘complex’ is, roughly, a way of putting dots in the yellow circles. In chemistry this says how many molecules we have of each kind. Here’s an example:

This complex happens to have just zero or one dot in each circle, but that’s not required: we could have any number of dots in each circle. So, mathematically, a complex is a finite linear combination of species, with natural numbers as coefficients. In other words, it’s an element of $\mathbb{N}^S.$ In this particular example it’s

C + O2 + NaOH + HCl

Given two complexes, we say one is reachable from another if, loosely speaking, we can get to it from the other by a finite sequence of transitions. For example, earlier on I asked if we can get from the complex I just mentioned to the complex

CO2 + H2O + NaCl

which we can draw like this:

And the answer is yes, we can do it with this sequence of transitions:

This settles the question I asked earlier.

So in chemistry, reachability is all about whether it’s possible to use certain chemical reactions to turn one collection of molecules into another using a certain set of reactions. I hope this is clear enough; I could formalize it further but it seems unnecessary. If you have questions, ask me or read this:

Petri net: execution semantics, Wikipedia.

### The reachability problem

Now the reachability problem asks: given a Petri net and two complexes, is one reachable from the other?

If the answer is ‘yes’, of course you can show that by an exhaustive search of all possibilities. But if the answer is ‘no’, how can you be sure? It’s not obvious, in general. Back in the 1970′s, computer scientists felt this problem should be decidable by some algorithm… but they had a lot of trouble finding such an algorithm.

In 1976, Richard J. Lipton showed that if such an algorithm existed, it would need to take at least an exponential amount of memory space and an exponential amount of time to run:

• Richard J. Lipton, The reachability problem requires exponential space, Technical Report 62, Yale University, 1976.

This means that most computer scientists would consider any algorithm to solve the reachability problem ‘infeasible’, since they like polynomial time algorithms.

On the bright side, it means that Petri nets might be fairly powerful when viewed as computers themselves! After all, for a universal Turing machine, the analogue of the reachability problem is undecidable. So if the reachability problem for Petri nets were decidable, they couldn’t be universal computers. But if it were decidable but hard, Petri nets might be fairly powerful—though still not universal—computers.

In 1977, at the ACM Symposium on the Theory of Computing, two researchers presented a proof that reachability problem was decidable:

• S. Sacerdote and R. Tenney, The decidability of the reachability problem for vector addition systems, Conference Record of the Ninth Annual ACM Symposium on Theory of Computing, 2-4 May 1977, Boulder, Colorado, USA, ACM, 1977, pp. 61–76.

• James L. Peterson, Petri Net Theory and the Modeling of Systems, Prentice–Hall, New Jersey, 1981.

This is a very nice introduction to early work on Petri nets and decidability. Peterson had an interesting idea, too:

There would seem to be a very useful connection between Petri nets and Presburger arithmetic.

He gave some evidence, and suggested using this to settle the decidability of the reachability problem. I found that intriguing! Let me explain why.

Presburger arithmetic is a simple set of axioms for the arithmetic of natural numbers, much weaker than Peano arithmetic or even Robinson arithmetic. Unlike those other systems, Presburger arithmetic doesn’t mention multiplication. And unlike those other systems, you can write an algorithm that decides whether any given statement in Presburger arithmetic is provable.

However, any such algorithm must be very slow! In 1974, Fischer and Rabin showed that any decision algorithm for Presburger arithmetic has a worst-case runtime of at least

$2^{2^{c n}}$

for some constant $c,$ where $n$ is the length of the statement. So we say the complexity is at least doubly exponential. That’s much worse than exponential! On the other hand, an algorithm with a triply exponential run time was found by Oppen in 1978.

I hope you see why this is intriguing. Provability is a lot like reachability, since in a proof you’re trying to reach the conclusion starting from the assumptions using certain rules. Like Presburger arithmetic, Petri nets are all about addition, since they consists of transitions going between linear combinations like this:

6 CO2 + 6 H2O → C6H12O6 + 6 O2

That’s why the old literature calls Petri nets vector addition systems. And finally, the difficulty of deciding provability in Presburger arithmetic smells a bit like the difficulty of deciding reachability in Petri nets.

So, I was eager to learn what happened after Peterson wrote his book.

For starters, in 1981, the very year Peterson’s book came out, Ernst Mayr showed that the reachability problem for Petri nets is decidable:

• Ernst Mayr, Persistence of vector replacement systems is decidable, Acta Informatica 15 (1981), 309–318.

As you can see from the title, Mayr actually proved some other property was decidable. However, it follows that reachability is decidable, and Mayr pointed this out in his paper. In fact the decidability of reachability for Petri nets is equivalent to lots of other interesting questions. You can see a bunch here:

• Javier Esparza and Mogens Nielsen, Decidability issues for Petri nets—a survey, Bulletin of the European Association for Theoretical Computer Science 52 (1994), 245–262.

Mayr’s algorithm was complicated. Worse still, it seems to take a hugely long time to run. It seems nobody knows an explicit bound on its runtime. The runtim might even grow faster than any primitive recursive function. The Ackermann function and the closely related Ackermann numbers are famous examples of functions that grow more rapidly than any primitive recursive function. If you don’t know about these, now is the time to learn!

Remember that we can define multiplication by iterating addition:

$n \times m = n + n + n + \cdots + n$

where add $n$ to itself $m$ times. Then we can define exponentiation by iterating multiplication:

$n \uparrow m = n \times n \times n \times \cdots \times n$

where we multiply $n$ by itself $m$ times. Here I’m using Knuth’s up-arrow notation. Then we can define tetration by iterating exponentiation:

$n \uparrow^2 m = n \uparrow (n \uparrow (n \uparrow \cdots \uparrow n)))$

Then we can define an operation $\uparrow^3$ by iterating tetration, and so on. All these functions are primitive recursive. But the $n$th Ackermann number is not; it’s defined to be

$n \uparrow^n n$

This grows at an insanely rapid rate:

$1 \uparrow 1 = 1$

$2 \uparrow^2 2 = 4$

$3 \uparrow^3 3 = 3^{3^{3^{.^{.^{.}}}}}$

where we have a stack of $3^{3^3}$ threes—or in other words, $3^{7625597484987}$ threes! When we get to $4 \uparrow^4 4,$ my mind boggles. I wish it didn’t, but it does.

In 1998 someone came up with a faster algorithm:

• Zakaria Bouziane, A primitive recursive algorithm for the general Petri net reachability problem, in 39th Annual Symposium on Foundations of Computer Science, IEEE, 1998, pp. 130-136.

Bouziane claimed this algorithm is doubly exponential in space and time. That’s very slow, but not insanely slow.

However, it seems that Bouziane made a mistake:

• Petr Jančar, Bouziane’s transformation of the Petri net reachability problem and incorrectness of the related algorithm, Information and Computation, 206 (2008), 1259–1263.

So: if I tell you some chemicals and a bunch of reactions involving these chemicals, you can decide when some combination of these chemicals can turn into another combination. But it may take a long time to decide this. And we don’t know exactly how long: just more than ‘exponentially long’!

What about the connection to Presburger arithmetic? This title suggests that it exists:

• Jérôme Leroux, The general vector addition system reachability problem by Presburger inductive separators, 2008.

But I don’t understand the paper well enough to be sure. Can someone say more?

Also, does anyone know more about the computational power of Petri nets? They’re not universal computers, but is there a good way to say how powerful they are? Does the fact that it takes a long time to settle the reachability question really imply that they have a lot of computational power?

### Symmetric monoidal categories

Next let me explain the secret reason I’m so fascinated by this. This section is mainly for people who like category theory.

As I mentioned once before, a Petri net is actually nothing but a presentation of a symmetric monoidal category that’s free on some set of objects and some set of morphisms going between tensor products of those objects:

Vladimiro Sassone, On the category of Petri net computations, 6th International Conference on Theory and Practice of Software Development, Proceedings of TAPSOFT ’95, Lecture Notes in Computer Science 915, Springer, Berlin, pp. 334-348.

In chemistry we write the tensor product additively, but we could also write it as $\otimes$. Then the reachability problem consists of questions of this general type:

Suppose we have a symmetric monoidal category freely generated by objects $A, B, C$ and morphisms

$e: A \to B \otimes C$

$f: A \otimes A \otimes B \to A \otimes C$

$g: A \otimes B \otimes C \to A \otimes B \otimes B$

$h : B \otimes A \otimes A \to B$

Is there a morphism from $B \otimes A \otimes A$ to $A \otimes A$?

This is reminiscent of the word problem for groups and other problems where we are given a presentation of an algebraic structure and have to decide if two elements are equal… but now, instead of asking whether two elements are equal we are asking if there is a morphism from one object to another. So, it is fascinating that this problem is decidable—unlike the word problem for groups—but still very hard to decide.

Just in case you want to see a more formal statement, let me finish off by giving you that:

Reachability problem. Given a symmetric monoidal category $C$ freely generated by a finite set of objects and a finite set of morphisms between tensor products of these objects, and given two objects $x,y \in C,$ is there a morphism $f: x \to y$?

Theorem (Lipton, Mayr). There is an algorithm that decides the reachability problem. However, for any such algorithm, for any $c > 0,$ the worst-case run-time exceeds $2^{c n}$ where $n$ is the size of the problem: the sum of the number of generating objects, the number of factors in the sources and targets of all the generating morphisms, and the number of factors in the objects $x,y \in C$ for which the reachability problem is posed.