## Network Theory I

2 March, 2014

Here’s a video of a talk I gave last Tuesday—part of a series. You can see the slides here:

One reason I’m glad I gave this talk is because afterwards Jamie Vicary pointed out some very interesting consequences of the relations among signal-flow diagrams listed in my talk. It turns out they imply equations familiar from the theory of complementarity in categorical quantum mechanics!

This is the kind of mathematical surprise that makes life worthwhile for me. It seemed utterly shocking at first, but I think I’ve figured out why it happens. Now is not the time to explain… but I’ll have to do it soon, both here and in the paper that Jason Eberle are writing about control theory.

• Brendan Fong, A compositional approach to control theory.

## Network Theory Overview

22 February, 2014

Here’s a video of a talk I gave yesterday, made by Brendan Fong. You can see the slides here—and then click the items in blue, and the pictures, for more information!

The idea: nature and the world of human technology are full of networks! People like to draw diagrams of networks. Mathematical physicists know that in principle these diagrams can be understood using category theory. But why should physicists have all the fun? This is the century of understanding living systems and adapting to life on a finite planet. Math isn’t the main thing we need for this, but it’s got to be part of the solution… so one thing we should do is develop a unified and powerful theory of networks.

We are still far from doing this. In this overview, I briefly described three parts of the jigsaw puzzle, and invited everyone to help fit them together:

• electrical circuits and signal-flow graphs.

• stochastic Petri nets, chemical reaction networks and Feynman diagrams.

• Bayesian networks, information and entropy.

In my talks coming up, I’ll go into more detail on each of these.﻿ With luck, you’ll be able to see videos here.

But if you’re near Oxford, you might as well actually attend! You can see dates, times, locations, my slides, and the talks themselves as they show up by going here.

## Network Theory Talks at Oxford

7 February, 2014

I’m giving some talks at Oxford:

### Network Theory

Nature and the world of human technology are full of networks. People like to draw diagrams of networks: flow charts, electrical circuit diagrams, signal-flow graphs, Bayesian networks, Feynman diagrams and the like. Mathematically minded people know that in principle these diagrams fit into a common framework: category theory. But we are still far from a unified theory of networks. After an overview, we will look at three portions of the jigsaw puzzle in three separate talks:

I. Electrical circuits and signal-flow graphs.

II. Stochastic Petri nets, chemical reaction networks and Feynman diagrams.

III. Bayesian networks, information and entropy.

If you’re nearby I hope you can come! All these talks will take place in Lecture Theatre B in the Computer Science Department—see the map below. Here are the times:

• Friday 21 February 2014, 2 pm: Network Theory: overview. See the slides or watch a video.

• Tuesday 25 February, 3:30 pm: Network Theory I: electrical circuits and signal-flow graphs. See the slides or watch a video.

• Tuesday 4 March, 3:30 pm: Network Theory II: stochastic Petri nets, chemical reaction networks and Feynman diagrams. See the slides.

• Tuesday 11 March, 3:30 pm: Network Theory III: Bayesian networks, information and entropy. See the slides.

The first talk will be part of the OASIS series, meaning the “Oxford Advanced Seminar on Informatic Structures”.

I thank Samson Abramsky, Bob Coecke and Jamie Vicary of the Computer Science Department for inviting me, and Ulrike Tillmann and Minhyong Kim of the Mathematical Institute for helping me get set up. I also thank all the people who helped do the work I’ll be talking about, most notably Jacob Biamonte, Jason Erbele, Brendan Fong, Tobias Fritz, Tom Leinster, Tu Pham, and Franciscus Rebro.

Ulrike Tillmann has also kindly invited me to give a topology seminar:

### Operads and the Tree of Life

Trees are not just combinatorial structures: they are also biological structures, both in the obvious way but also in the study of evolution. Starting from DNA samples from living species, biologists use increasingly sophisticated mathematical techniques to reconstruct the most likely “phylogenetic tree” describing how these species evolved from earlier ones. In their work on this subject, they have encountered an interesting example of an operad, which is obtained by applying a variant of the Boardmann–Vogt “W construction” to the operad for commutative monoids. The operations in this operad are labelled trees of a certain sort, and it plays a universal role in the study of stochastic processes that involve branching. It also shows up in tropical algebra. This talk is based on work in progress with Nina Otter.

I’m not sure exactly where this will take place, but surely somewhere in the Mathematical Institute building:

• Monday 24 February, 3:30 pm, Operads and the Tree of Life. See the slides.

The Computer Science Department is shown in the map here:

The Mathematical Institute is a bit to the west:

## Lyapunov Functions for Complex-Balanced Systems

7 January, 2014

guest post by Manoj Gopalkrishnan

A few weeks back, I promised to tell you more about a long-standing open problem in reaction networks, the ‘global attractor conjecture’. I am not going to quite get there today, but we shall take one step in that direction.

Today’s plan is to help you make friends with a very useful function we will call the ‘free energy’ which comes up all the time in the study of chemical reaction networks. We will see that for complex-balanced systems, the free energy function decreases along trajectories of the rate equation. I’m going to explain this statement, and give you most of the proof!

The point of doing all this work is that we will then be able to invoke Lyapunov’s theorem which implies stability of the dynamics. In Greek mythology, Sisyphus was cursed to roll a boulder up a hill only to have it roll down again, so that he had to keep repeating the task for all eternity. When I think of an unstable equilibrium, I imagine a boulder delicately balanced on top of a hill, which will fall off if given the slightest push:

or, more abstractly:

On the other hand, I picture a stable equilibrium as a pebble at the very bottom of a hill. Whichever way a perturbation takes it is up, so it will roll down again to the bottom:

Lyapunov’s theorem guarantees stability provided we can exhibit a nice enough function $V$ that decreases along trajectories. ‘Nice enough’ means that, viewing $V$ as a height function for the hill, the equilibrium configuration should be at the bottom, and every direction from there should be up. If Sisyphus had dug a pit at the top of the hill for the boulder to rest in, Lyapunov’s theorem would have applied, and he could have gone home to rest. The moral of the story is that it pays to learn dynamical systems theory!

Because of the connection to Lyapunov’s theorem, such functions that decrease along trajectories are also called Lyapunov functions. A similar situation is seen in Boltzmann’s H-theorem, and hence such functions are sometimes called H-functions by physicists.

Another reason for me to talk about these ideas now is that I have posted a new article on the arXiv:

• Manoj Gopalkrishnan, On the Lyapunov function for complex-balanced mass-action systems.

The free energy function in chemical reaction networks goes back at least to 1972, to this paper:

• Friedrich Horn and Roy Jackson, General mass action kinetics, Arch. Rational Mech. Analysis 49 (1972), 81–116.

Many of us credit Horn and Jackson’s paper with starting the mathematical study of reaction networks. My paper is an exposition of the main result of Horn and Jackson, with a shorter and simpler proof. The gain comes because Horn and Jackson proved all their results from scratch, whereas I’m using some easy results from graph theory, and the log-sum inequality.

We shall be talking about reaction networks. Remember the idea from the network theory series. We have a set $S$ whose elements are called species, for example

$S = \{ \mathrm{H}_2\mathrm{O}, \mathrm{H}^+, \mathrm{OH}^- \}$

A complex is a vector of natural numbers saying how many items of each species we have. For example, we could have a complex $(2,3,1).$ But chemists would usually write this as

$2 \mathrm{H}_2\mathrm{O} + 3 \mathrm{H}^+ + \mathrm{OH}^-$

A reaction network is a set $S$ of species and a set $T$ of transitions or reactions, where each transition $\tau \in T$ goes from some complex $m(\tau)$ to some complex $n(\tau).$ For example, we could have a transition $\tau$ with

$m(\tau) = \mathrm{H}_2\mathrm{O}$

and

$n(\tau) = \mathrm{H}^+ + \mathrm{OH}^-$

In this situation chemists usually write

$\mathrm{H}_2\mathrm{O} \to \mathrm{H}^+ + \mathrm{OH}^-$

but we want names like $\tau$ for our transitions, so we might write

$\tau : \mathrm{H}_2\mathrm{O} \to \mathrm{H}^+ + \mathrm{OH}^-$

or

$\mathrm{H}_2\mathrm{O} \stackrel{\tau}{\longrightarrow} \mathrm{H}^+ + \mathrm{OH}^-$

As John explained in Part 3 of the network theory series, chemists like to work with a vector of nonnegative real numbers $x(t)$ saying the concentration of each species at time $t.$ If we know a rate constant $r(\tau) > 0$ for each transition $\tau,$ we can write down an equation saying how these concentrations change with time:

$\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T} r(\tau) (n(\tau) - m(\tau)) x^{m(\tau)} }$

This is called the rate equation. It’s really a system of ODEs describing how the concentration of each species change with time. Here an expression like $x^m$ is shorthand for the monomial ${x_1}^{m_1} \cdots {x_k}^{m_k}.$

John and Brendan talked about complex balance in Part 9. I’m going to recall this definition, from a slightly different point of view that will be helpful for the result we are trying to prove.

We can draw a reaction network as a graph! The vertices of this graph are all the complexes $m(\tau), n(\tau)$ where $\tau \in T.$ The edges are all the transitions $\tau\in T.$ We think of each edge $\tau$ as directed, going from $m(\tau)$ to $n(\tau).$

We will call the map that sends each transition $\tau$ to the positive real number $r(\tau) x^{m(\tau)}$ the flow $f_x(\tau)$ on this graph. The rate equation can be rewritten very simply in terms of this flow as:

$\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T}(n(\tau) - m(\tau)) \, f_x(\tau) }$

where the right-hand side is now a linear expression in the flow $f_x.$

Flows of water, or electric current, obey a version of Kirchhoff’s current law. Such flows are called conservative flows. The following two lemmas from graph theory are immediate for conservative flows:

Lemma 1. If f is a conservative flow then the net flow across every cut is zero.

A cut is a way of chopping the graph in two, like this:

It’s easy to prove Lemma 1 by induction, moving one vertex across the cut at a time.

Lemma 2. If a conservative flow exists then every edge $\tau\in T$ is part of a directed cycle.

Why is Lemma 2 true? Suppose there exists an edge $\tau : m \to n$ that is not part of any directed cycle. We will exhibit a cut with non-zero net flow. By Lemma 1, this will imply that the flow is not conservative.

One side of the cut will consist of all vertices from which $m$ is reachable by a directed path in the reaction network. The other side of the cut contains at least $n,$ since $m$ is not reachable from $n,$ by the assumption that $\tau$ is not part of a directed cycle. There is flow going from left to right of the cut, across the transition $\tau.$ Since there can be no flow coming back, this cut has nonzero net flow, and we’re done.     ▮

Now, back to the rate equation! We can ask if the flow $f_x$ is conservative. That is, we can ask if, for every complex $n$:

$\displaystyle{ \sum_{\tau : m \to n} f_x(m,n) = \sum_{\tau : n \to p} f_x(n,p). }$

In words, we are asking if the sum of the flow through all transitions coming in to $n$ equals the sum of the flow through all transitions going out of $n.$ If this condition is satisfied at a vector of concentrations $x = \alpha,$ so that the flow $f_\alpha$ is conservative, then we call $\alpha$ a point of complex balance. If in addition, every component of $\alpha$ is strictly positive, then we say that the system is complex balanced.

Clearly if $\alpha$ is a point of complex balance, it’s an equilibrium solution of the rate equation. In other words, $x(t) = \alpha$ is a solution of the rate equation, where $x(t)$ never changes.

I’m using ‘equilibrium’ the way mathematicians do. But I should warn you that chemists use ‘equilibrium’ to mean something more than merely a solution that doesn’t change with time. They often also mean it’s a point of complex balance, or even more. People actually get into arguments about this at conferences.

Complex balance implies more than mere equilibrium. For starters, if a reaction network is such that every edge belongs to a directed cycle, then one says that the reaction network is weakly reversible. So Lemmas 1 and 2 establish that complex-balanced systems must be weakly reversible!

From here on, we fix a complex-balanced system, with $\alpha$ a strictly positive point of complex balance.

Definition. The free energy function is the function

$g_\alpha(x) = \sum_{s\in S} x_s \log x_s - x_s - x_s \log \alpha_s$

where the sum is over all species in $S.$

The whole point of defining the function this way is because it is the unique function, up to an additive constant, whose partial derivative with respect to $x_s$ is $\log x_s/\alpha_s.$ This is important enough that we write it as a lemma. To state it in a pithy way, it is helpful to introduce vector notation for division and logarithms. If $x$ and $y$ are two vectors, we will understand $x/y$ to mean the vector $z$ such that $z_s = x_s/ y_s$ coordinate-wise. Similarly $\log x$ is defined in a coordinate-wise sense as the vector with coordinates $(\log x)_s = \log x_s.$

Lemma 3. The gradient $\nabla g_\alpha(x)$ of $g_\alpha(x)$ equals $\log(x/\alpha).$

We’re ready to state our main theorem!

Theorem. Fix a trajectory $x(t)$ of the rate equation. Then $g_\alpha(x(t))$ is a decreasing function of time $t.$ Further, it is strictly decreasing unless $x(t)$ is an equilibrium solution of the rate equation.

I find precise mathematical statements reassuring. You can often make up your mind about the truth value from a few examples. Very often, though not always, a few well-chosen examples are all you need to get the general idea for the proof. Such is the case for the above theorem. There are three key examples: the two-cycle, the three-cycle, and the figure-eight.

The two-cycle. The two-cycle is this reaction network:

It has two complexes $m$ and $n$ and two transitions $\tau_1 = m\to n$ and $\tau_2 = n\to m,$ with rates $r_1 = r(\tau_1)$ and $r_2 = r(\tau_2)$ respectively.

Fix a solution $x(t)$ of the rate equation. Then the flow from $m$ to $n$ equals $r_1 x^m$ and the backward flow equals $r_2 x^n.$ The condition for $f_\alpha$ to be a conservative flow requires that $f_\alpha = r_1 \alpha^m = r_2 \alpha^n.$ This is one binomial equation in at least one variable, and clearly has a solution in the positive reals. We have just shown that every two-cycle is complex balanced.

The derivative $d g_\alpha(x(t))/d t$ can now be computed by the chain rule, using Lemma 3. It works out to $f_\alpha$ times

$\displaystyle{ \left((x/\alpha)^m - (x/\alpha)^n\right) \, \log\frac{(x/\alpha)^n}{(x/\alpha)^m} }$

This is never positive, and it’s zero if and only if

$(x/\alpha)^m = (x/\alpha)^n$

Why is this? Simply because the logarithm of something greater than 1 is positive, while the log of something less than 1 is negative, so that the sign of $(x/\alpha)^m - (x/\alpha)^n$ is always opposite the sign of $\log \frac{(x/\alpha)^n}{(x/\alpha)^m}.$ We have verified our theorem for this example.

(Note that $(x/\alpha)^m = (x/\alpha)^n$ occurs when $x = \alpha,$ but also at other points: in this example, there is a whole hypersurface consisting of points of complex balance.)

In fact, this simple calculation achieves much more.

Definition. A reaction network is reversible if for every transition $\tau : m \to n$ there is a transition $\tau' : m \to n$ going back, called the reverse of $\tau.$ Suppose we have a reversible reaction network and a vector of concentrations $\alpha$ such that the flow along each edge equals that along the edge going back:

$f_\alpha(\tau) = f_\alpha(\tau')$

whenever $\tau'$ is the reverse $\tau.$ Then we say the reaction network is detailed balanced, and $\alpha$ is a point of detailed balance.

For a detailed-balanced system, the time derivative of $g_\alpha$ is a sum over the contributions of pairs consisting of an edge and its reverse. Hence, the two-cycle calculation shows that the theorem holds for all detailed balanced systems!

This linearity trick is going to prove very valuable. It will allow us to treat the general case of complex balanced systems one cycle at a time. The proof for a single cycle is essentially contained in the example of a three-cycle, which we treat next:

The three-cycle. The three-cycle is this reaction network:

We assume that the system is complex balanced, so that

$f_\alpha(m_1\to m_2) = f_\alpha(m_2\to m_3) = f_\alpha(m_3\to m_1)$

Let us call this nonnegative number $f_\alpha.$ A small calculation employing the chain rule shows that $d g_\alpha(x(t))/d t$ equals $f_\alpha$ times

$\displaystyle{ (x/\alpha)^{m_1}\, \log\frac{(x/\alpha)^{m_2}}{(x/\alpha)^{m_1}} \; + }$

$\displaystyle{ (x/\alpha)^{m_2} \, \log\frac{(x/\alpha)^{m_3}}{(x/\alpha)^{m_2}} \; + }$

$\displaystyle{ (x/\alpha)^{m_3}\, \log\frac{(x/\alpha)^{m_1}}{(x/\alpha)^{m_3}} }$

We need to think about the sign of this quantity:

Lemma 3. Let $a,b,c$ be positive numbers. Then $a \log b/a + b\log c/b + c\log a/c$ is less than or equal to zero, with equality precisely when $a=b=c.$

The proof is a direct application of the log sum inequality. In fact, this holds not just for three numbers, but for any finite list of numbers. Indeed, that is precisely how one obtains the proof for cycles of arbitrary length. Even the two-cycle proof is a special case! If you are wondering how the log sum inequality is proved, it is an application of Jensen’s inequality, that workhorse of convex analysis.

The three-cycle calculation extends to a proof for the theorem so long as there is no directed edge that is shared between two directed cycles. When there are such edges, we need to argue that the flows $f_\alpha$ and $f_x$ can be split between the cycles sharing that edge in a consistent manner, so that the cycles can be analyzed independently. We will need the following simple lemma about conservative flows from graph theory. We will apply this lemma to the flow $f_\alpha.$

Lemma 4. Let $f$ be a conservative flow on a graph $G.$ Then there exist directed cycles $C_1, C_2,\dots, C_k$ in $G,$ and nonnegative real ‘flows’ $f_1,f_2,\dots,f_k \in [0,\infty]$ such that for each directed edge $e$ in $G,$ the flow $f(e)$ equals the sum of $f_i$ over $i$ such the cycle $C_i$ contains the edge $e.$

Intuitively, this lemma says that conservative flows come from constant flows on the directed cycles of the graph. How does one show this lemma? I’m sure there are several proofs, and I hope some of you can share some of the really neat ones with me. The one I employed was algorithmic. The idea is to pick a cycle, any cycle, and subtract the maximum constant flow that this cycle allows, and repeat. This is most easily understood by looking at the example of the figure-eight:

The figure-eight. This reaction network consists of two three-cycles sharing an edge:

Here’s the proof for Lemma 4. Let $f$ be a conservative flow on this graph. We want to exhibit cycles and flows on this graph according to Lemma 4. We arbitrarily pick any cycle in the graph. For example, in the figure-eight, suppose we pick the cycle $u_0\to u_1\to u_2\to u_0.$ We pick an edge in this cycle on which the flow is minimum. In this case, $f(u_0\to u_1) = f(u_2\to u_0)$ is the minimum. We define a remainder flow by subtracting from $f$ this constant flow which was restricted to one cycle. So the remainder flow is the same as $f$ on edges that don’t belong to the picked cycle. For edges that belong to the cycle, the remainder flow is $f$ minus the minimum of $f$ on this cycle. We observe that this remainder flow satisfies the conditions of Lemma 4 on a graph with strictly fewer edges. Continuing in this way, since the lemma is trivially true for the empty graph, we are done by infinite descent.

Now that we know how to split the flow $f_\alpha$ across cycles, we can figure out how to split the rates across the different cycles. This will tell us how to split the flow $f_x$ across cycles. Again, this is best illustrated by an example.

The figure-eight. Again, this reaction network looks like

Suppose as in Lemma 4, we obtain the cycles

$C_1 = u_0\to u_1\to u_2\to u_0$

with constant flow $f_\alpha^1$

and

$C_2 = u_3\to u_1\to u_2\to u_3$ with constant flow $f_\alpha^2$ such that

$f_\alpha^1 + f_\alpha^2 = f_\alpha(u_1\to u_2)$

Here’s the picture:

Then we obtain rates $r^1(u_1\to u_2)$ and $r^2(u_1\to u_2)$ by solving the equations

$f^1_\alpha = r^1(u_1\to u_2) \alpha^{u_1}$

$f^2_\alpha = r^2(u_1\to u_2) \alpha^{u_2}$

Using these rates, we can define non-constant flows $f^1_x$ on $C_1$ and $f^2_x$ on $C_2$ by the usual formulas:

$f^1_x(u_1\to u_2) = r^1(u_1\to u_2) x^{u_1}$

and similarly for $f^2_x.$ In particular, this gives us

$f^1_x(u_1\to u_2)/f^1_\alpha = (x/\alpha)^{u_1}$

and similarly for $f^2_x.$

Using this, we obtain the proof of the Theorem! The time derivative of $g_\alpha$ along a trajectory has a contribution from each cycle $C$ as in Lemma 4, where each cycle is treated as a separate system with the new rates $r^C,$ and the new flows $f^C_\alpha$ and $f^C_x.$ So, we’ve reduced the problem to the case of a cycle, which we’ve already done.

Let’s review what happened. The time derivative of the function $g_\alpha$ has a very nice form, which is linear in the flow $f_x.$ The reaction network can be broken up into cycles. Th e conservative flow $f_\alpha$ for a complex balanced system can be split into conservative flows on cycles by Lemma 4. This informs us how to split the non-conservative flow $f_x$ across cycles. By linearity of the time derivative, we can separately treat the case for every cycle. For each cycle, we get an expression to which the log sum inequality applies, giving us the final result that $g_\alpha$ decreases along trajectories of the rate equation.

Now that we have a Lyapunov function, we will put it to use to obtain some nice theorems about the dynamics, and finally state the global attractor conjecture. All that and more, in the next blog post!

## Autocatalysis in Reaction Networks

11 October, 2013

guest post by Manoj Gopalkrishnan

Since this is my first time writing a blog post here, let me start with a word of introduction. I am a computer scientist at the Tata Institute of Fundamental Research, broadly interested in connections between Biology and Computer Science, with a particular interest in reaction networks. I first started thinking about them during my Ph.D. at the Laboratory for Molecular Science. My fascination with them has been predominantly mathematical. As a graduate student, I encountered an area with rich connections between combinatorics and dynamics, and surprisingly easy-to-state and compelling unsolved conjectures, and got hooked.

There is a story about Richard Feynman that he used to take bets with mathematicians. If any mathematician could make Feynman understand a mathematical statement, then Feynman would guess whether or not the statement was true. Of course, Feynman was in a habit of winning these bets, which allowed him to make the boast that mathematics, especially in its obsession for proof, was essentially irrelevant, since a relative novice like himself could after a moment’s thought guess at the truth of these mathematical statements. I have always felt Feynman’s claim to be unjust, but have often wondered what mathematical statement I would put to him so that his chances of winning were no better than random.

Today I want to tell you of a result about reaction networks that I have recently discovered with Abhishek Deshpande. The statement seems like a fine candidate to throw at Feynman because until we proved it, I would not have bet either way about its truth. Even after we obtained a short and elementary proof, I do not completely ‘see’ why it must be true. I am hoping some of you will be able to demystify it for me. So, I’m just going to introduce enough terms to be able to make the statement of our result, and let you think about how to prove it.

John and his colleagues have been talking about reaction networks as Petri nets in the network theory series on this blog. As discussed in part 2 of that series, a Petri net is a diagram like this:

Following John’s terminology, I will call the aqua squares ‘transitions’ and the yellow circles ‘species’. If we have some number #rabbit of rabbits and some number #wolf of wolves, we draw #rabbit many black dots called ‘tokens’ inside the yellow circle for rabbit, and #wolf tokens inside the yellow circle for wolf, like this:

Here #rabbit = 4 and #wolf = 3. The predation transition consumes one ‘rabbit’ token and one ‘wolf’ token, and produces two ‘wolf’ tokens, taking us here:

John explained in parts 2 and 3 how one can put rates on different transitions. For today I am only going to be concerned with ‘reachability:’ what token states are reachable from what other token states. John talked about this idea in part 25.

By a complex I will mean a population vector: a snapshot of the number of tokens in each species. In the example above, (#rabbit, #wolf) is a complex. If $y, y'$ are two complexes, then we write

$y \to y'$

if we can get from $y$ to $y'$ by a single transition in our Petri net. For example, we just saw that

$(4,3)\to (3,4)$

via the predation transition.

Reachability, denoted $\to^*$, is the transitive closure of the relation $\to$. So $y\to^* y'$ (read $y'$ is reachable from $y$) iff there are complexes

$y=y_0,y_1,y_2,\dots,y_k =y'$

such that

$y_0\to y_1\to\cdots\to y_{k-1}\to y_k.$

For example, here $(5,1) \to^* (1, 5)$ by repeated predation.

I am very interested in switches. After all, a computer is essentially a box of switches! You can build computers by connecting switches together. In fact, that’s how early computers like the Z3 were built. The CMOS gates at the heart of modern computers are essentially switches. By analogy, the study of switches in reaction networks may help us understand biochemical circuits.

A siphon is a set of species that is ‘switch-offable’. That is, if there are no tokens in the siphon states, then they will remain absent in future. Equivalently, the only reactions that can produce tokens in the siphon states are those that require tokens from the siphon states before they can fire. Note that no matter how many rabbits there are, if there are no wolves, there will continue to be no wolves. So {wolf} is a siphon. Similarly, {rabbit} is a siphon, as is the union {rabbit, wolf}. However, when Hydrogen and Oxygen form Water, {Water} is not a siphon.

For another example, consider this Petri net:

The set {HCl, NaCl} is a siphon. However, there is a conservation law: whenever an HCl token is destroyed, an NaCl token is created, so that #HCl + #NaCl is invariant. If both HCl and NaCl were present to begin with, the complexes where both are absent are not reachable. In this sense, this siphon is not ‘really’ switch-offable. As a first pass at capturing this idea, we will introduce the notion of ‘critical set’.

A conservation law is a linear expression involving numbers of tokens that is invariant under every transition in the Petri net. A conservation law is positive if all the coefficients are non-negative. A critical set of states is a set that does not contain the support of a positive conservation law.

For example, the support of the positive conservation law #HCl + #NaCl is {HCl, NaCl}, and hence no set containing this set is critical. Thus {HCl, NaCl} is a siphon, but not critical. On the other hand, the set {NaCl} is critical but not a siphon. {HCl} is a critical siphon. And in our other example, {Wolf, Rabbit} is a critical siphon.

Of particular interest to us will be minimal critical siphons, the minimal sets among critical siphons. Consider this example:

Here we have two transitions:

$X \to 2Y$

and

$2X \to Y$

The set $\{X,Y\}$ is a critical siphon. But so is the smaller set $\{X\}.$ So, $\{X,Y\}$ is not minimal.

We define a self-replicable set to be a set $A$ of species such that there exist complexes $y$ and $y'$ with $y\to^* y'$ such that for all $i \in A$ we have

$y'_i > y_i$

So, there are transitions that accomplish the job of creating more tokens for all the species in $A.$ In other words: these species can ‘replicate themselves’.

We define a drainable set by changing the $>$ to a $<$. So, there are transitions that accomplish the job of reducing the number of tokens for all the species in $A.$ These species can ‘drain away’.

Now here comes the statement:

Every minimal critical siphon is either drainable or self-replicable!

We prove it in this paper:

• Abhishek Deshpande and Manoj Gopalkrishnan, Autocatalysis in reaction networks.

But first note that the statement becomes false if the critical siphon is not minimal. Look at this example again:

The set $\{X,Y\}$ is a critical siphon. However $\{X,Y\}$ is neither self-replicable (since every reaction destroys $X$) nor drainable (since every reaction produces $Y$). But we’ve already seen that $\{X,Y\}$ is not minimal. It has a critical subsiphon, namely $\{X\}.$ This one is minimal—and it obeys our theorem, because it is drainable.

Checking these statements is a good way to make sure you understand the concepts! I know I’ve introduced a lot of terminology here, and it takes a while to absorb.

Anyway: our proof that every minimal critical siphon is either drainable or self-replicable makes use of a fun result about matrices. Consider a real square matrix with a sign pattern like this:

$\left( \begin{array}{cccc} <0 & >0 & \cdots & > 0 \\ >0 & <0 & \cdots &> 0 \\ \vdots & \vdots & <0 &> 0 \\ >0 & >0 & \cdots & <0 \end{array} \right)$

If the matrix is full-rank then there is a positive linear combination of the rows of the matrix so that all the entries are nonzero and have the same sign. In fact, we prove something stronger in Theorem 5.9 of our paper. At first, we thought this statement about matrices should be equivalent to one of the many well-known alternative statements of Farkas’ lemma, like Gordan’s theorem.

However, we could not find a way to make this work, so we ended up proving it by a different technique. Later, my colleague Jaikumar Radhakrishnan came up with a clever proof that uses Farkas’ lemma twice. However, so far we have not obtained the stronger result in Theorem 5.9 with this proof technique.

My interest in the result that every minimal critical siphon is either drainable or self-replicable is not purely aesthetic (though aesthetics is a big part of it). There is a research community of folks who are thinking of reaction networks as a programming language, and synthesizing molecular systems that exhibit sophisticated dynamical behavior as per specification:

Networks that exhibit some kind of catalytic behavior are a recurring theme among such systems, and even more so in biochemical circuits.

Here is an example of catalytic behavior:

$A + C \to B + C$

The ‘catalyst’ $C$ helps transform $A$ to $B.$ In the absence of $C,$ the reaction is turned off. Hence, catalysts are switches in chemical circuits! From this point of view, it is hardly surprising that they are required for the synthesis of complex behaviors.

In information processing, one needs amplification to make sure that a signal can propagate through a circuit without being overwhelmed by errors. Here is a chemical counterpart to such amplification:

$A + C \to 2C$

Here the catalyst $C$ catalyzes its own production: it is an ‘autocatalyst’, or a self-replicating species. By analogy, autocatalysis is key for scaling synthetic molecular systems.

Our work deals with these notions on a network level. We generalize the notion of catalysis in two ways. First, we allow a catalyst to be a set of species instead of a single species; second, its absence can turn off a reaction pathway instead of a single reaction. We propose the notion of self-replicable siphons as a generalization of the notion of autocatalysis. In particular, ‘weakly reversible’ networks have critical siphons precisely when they exhibit autocatalytic behavior. I was led to this work when I noticed the manifestation of this last statement in many examples.

Another hope I have is that perhaps one can study the dynamics of each minimal critical siphon of a reaction network separately, and then somehow be able to answer interesting questions about the dynamics of the entire network, by stitching together what we know for each minimal critical siphon. On the synthesis side, perhaps this could lead to a programming language to synthesize a reaction network that will achieve a specified dynamics. If any of this works out, it would be really cool! I think of how abelian group theory (and more broadly, the theory of abelian categories, which includes categories of vector bundles) benefits from a fundamental theorem that lets you break a finite abelian group into parts that are easy to study—or how number theory benefits from a special case, the fundamental theorem of arithmetic. John has also pointed out that reaction networks are really presentations of symmetric monoidal categories, so perhaps this could point the way to a Fundamental Theorem for Symmetric Monoidal Categories.

And then there is the Global Attractor Conjecture, a
long-standing open problem concerning the long-term behavior of solutions to the rate equations. Now that is a whole story by itself, and will have to wait for another day.

## Quantum Network Theory (Part 2)

13 August, 2013

guest post by Tomi Johnson

Last time I told you how a random walk called the ‘uniform escape walk’ could be used to analyze a network. In particular, Google uses it to rank nodes. For the case of an undirected network, the steady state of this random walk tells us the degrees of the nodes—that is, how many edges come out of each node.

Now I’m going to prove this to you. I’ll also exploit the connection between this random walk and a quantum walk, also introduced last time. In particular, I’ll connect the properties of this quantum walk to the degrees of a network by exploiting its relationship with the random walk.

This is pretty useful, considering how tricky these quantum walks can be. As the parts of the world that we model using quantum mechanics get bigger and have more complicated structures, like biological network, we need all the help in understanding quantum walks that we can get. So I’d better start!

### Flashback

Starting with any (simple, connected) graph, we can get an old-fashioned ‘stochastic’ random walk on this graph, but also a quantum walk. The first is the uniform escape stochastic walk, where the walker has an equal probability per time of walking along any edge leaving the node they are standing at. The second is the related quantum walk we’re going to study now. These two walks are generated by two matrices, which we called $S$ and $Q.$ The good thing is that these matrices are similar, in the technical sense.

We studied this last time, and everything we learned is summarized here:

where:

$G$ is a simple graph that specifies

$A$ the adjacency matrix (the generator of a quantum walk) with elements $A_{i j}$ equal to unity if nodes $i$ and $j$ are connected, and zero otherwise ($A_{i i} = 0$), which subtracted from

$D$ the diagonal matrix of degrees $D_{i i} = \sum_j A_{i j}$ gives

$L = D - A$ the symmetric Laplacian (generator of stochastic and quantum walks), which when normalized by $D$ returns both

$S = L D^{-1}$ the generator of the uniform escape stochastic walk and

$Q = D^{-1/2} L D^{-1/2}$ the quantum walk generator to which it is similar!

Now I hope you remember where we are. Next I’ll talk you through the mathematics of the uniform escape stochastic walk $S$ and how it connects to the degrees of the nodes in the large-time limit. Then I’ll show you how this helps us solve aspects of the quantum walk generated by $Q.$

### Stochastic walk

The uniform escape stochastic walk generated by $S$ is popular because it has a really useful stationary state.

To recap from Part 20 of the network theory series, a stationary state of a stochastic walk is one that does not change in time. By the master equation

$\displaystyle{ \frac{d}{d t} \psi(t) = -S \psi(t)}$

the stationary state must be an eigenvector of $S$ with eigenvalue $0.$

A fantastic pair of theorems hold:

• There is always a unique (up to multiplication by a positive number) positive eigenvector $\pi$ of $S$ with eigenvalue $0.$ That is, there is a unique stationary state $\pi.$

• Regardless of the initial state $\psi(0),$ any solution of the master equation approaches this stationary state $\pi$ in the large-time limit:

$\displaystyle{ \lim_{t \rightarrow \infty} \psi(t) = \pi }$

To find this unique stationary state, consider the Laplacian $L,$ which is both infinitesimal stochastic and symmetric. Among other things, this means the rows of $L$ sum to zero:

$\displaystyle{ \sum_j L_{i j} = 0 }$

Thus, the ‘all ones’ vector $\mathbf{1}$ is an eigenvector of $L$ with zero eigenvalue:

$L \mathbf{1} = 0$

Inserting the identity $I = D^{-1} D$ into this equation we then find $D \mathbf{1}$ is a zero eigenvector of $S$:

$L \mathbf{1} = ( L D^{-1} ) ( D \mathbf{1} ) = S ( D \mathbf{1} ) = 0$

Therefore we just need to normalize this to get the large-time stationary state of the walk:

$\displaystyle{ \pi = \frac{D \mathbf{1}}{\sum_i D_{i i}} }$

If we write $i$ for the basis vector that is zero except at the ith node of our graph, and 1 at that node, the inner product $\langle i , \pi \rangle$ is large-time probability of finding a walker at that node. The equation above implies this is proportional to the degree $D_{i i}$ of node $i.$

We can check this for the following graph:

We find that $\pi$ is

$\displaystyle{ \left( \begin{matrix} 1/6 \\ 1/6 \\ 1/4 \\ 1/4 \\ 1/6 \end{matrix} \right) }$

which implies large-time probability $1/6$ for nodes $1,$ $2$ and $5,$ and $1/4$ for nodes $3$ and $4.$ Comparing this to the original graph, this exactly reflects the arrangement of degrees, as we knew it must.

Math works!

### The quantum walk

Next up is the quantum walk generated by $Q.$ Not a lot is known about quantum walks on networks of arbitrary geometry, but below we’ll see some analytical results are obtained by exploiting the similarity of $S$ and $Q.$

Where to start? Well, let’s start at the bottom, what quantum physicists call the ground state. In contrast to stochastic walks, for a quantum walk every eigenvector $\phi_k$ of $Q$ is a stationary state of the quantum walk. (In Puzzle 5, at the bottom of this page, I ask you to prove this). The stationary state $\phi_0$ is of particular interest physically and mathematically. Physically, since eigenvectors of the $Q$ correspond to states of well-defined energy equal to the associated eigenvalue, $\phi_0$ is the state of lowest energy, energy zero, hence the name ‘ground state’. (In Puzzle 3, I ask you to prove that all eigenvalues of $Q$ are non-negative, so zero really does correspond to the ground state.)

Mathematically, the relationship between eigenvectors implied by the similarity of $S$ and $Q$ means

$\phi_0 \propto D^{-1/2} \pi \propto D^{1/2} \mathbf{1}$

So in the ground state, the probability of our quantum walker being found at node $i$ is

$| \langle i , \phi_0 \rangle |^2 \propto | \langle i , D^{1/2} \rangle \mathbf{1} |^2 = D_{i i}$

Amazingly, this probability is proportional to the degree and so is exactly the same as $\langle i , \pi \rangle,$ the probability in the stationary state $\pi$ of the stochastic walk!

In short: a zero energy quantum walk $Q$ leads to exactly the same distribution of the walker over the nodes as in the large-time limit of the uniform escape stochastic walk $S.$ The classically important notion of degree distribution also plays a role in quantum walks!

This is already pretty exciting. What else can we say? If you are someone who feels faint at the sight of quantum mechanics, well done for getting this far, but watch out for what’s coming next.

What if the walker starts in some other initial state? Is there some quantum walk analogue of the unique large-time state of a stochastic walk?

In fact, the quantum walk in general does not converge to a stationary state. But there is a probability distribution that can be thought to characterize the quantum walk in the same way as the large-time state characterizes the stochastic walk. It’s the large-time average probability vector $P.$

If you didn’t know the time that had passed since the beginning of a quantum walk, then the best estimate for the probability of your measuring the walker to be at node $i$ would be the large-time average probability

$\displaystyle{ \langle i , P \rangle = \lim_{T \rightarrow \infty} \frac{1}{T} \int_0^T | \psi_i (t) |^2 d t }$

There’s a bit that we can do to simplify this expression. As usual in quantum mechanics, let’s start with the trick of diagonalizing $Q.$ This amounts to writing

$\displaystyle{ Q= \sum_k \epsilon_k \Phi_k }$

where $\Phi_k$ are projectors onto the eigenvectors $\phi_k$ of $Q,$ and $\epsilon_k$ are the corresponding eigenvalues of $Q.$ If we insert this equation into

$\psi(t) = e^{-Q t} \psi(0)$

we get

$\displaystyle{ \psi(t) = \sum_k e^{-\epsilon_k t} \Phi_k \psi(0) }$

and thus

$\displaystyle{ \langle i , P \rangle = \lim_{T \rightarrow \infty} \frac{1}{T} \int_0^T | \sum_k e^{-i \epsilon_k t} \langle i, \Phi_k \psi (0) \rangle |^2 d t }$

Due to the integral over all time, the interference between terms corresponding to different eigenvalues averages to zero, leaving:

$\displaystyle{ \langle i , P \rangle = \sum_k | \langle i, \Phi_k \psi(0) \rangle |^2 }$

The large-time average probability is then the sum of terms contributed by the projections of the initial state onto each eigenspace.

So we have a distribution that characterizes a quantum walk for a general initial state, but it’s a complicated beast. What can we say about it?

Our best hope of understanding the large-time average probability is through the term $| \langle i, \Phi_0 \psi (0) \rangle |^2$ associated with the zero energy eigenspace, since we know everything about this space.

For example, we know the zero energy eigenspace is one-dimensional and spanned by the eigenvector $\phi_0.$ This means that the projector is just the usual outer product

$\Phi_0 = | \phi_0 \rangle \langle \phi_0 | = \phi_0 \phi_0^\dagger$

where we have normalized $\phi_0$ according to the inner product $\langle \phi_0, \phi_0\rangle = 1.$ (If you’re wondering why I’m using all these angled brackets, well, they’re a notation named after Dirac that is adored by quantum physicists.)

The zero eigenspace contribution to the large-time average probability then breaks nicely into two:

$\begin{array}{ccl} | \langle i, \Phi_0 \psi (0) \rangle |^2 &=& | \langle i, \phi_0\rangle \; \langle \phi_0, \psi (0) \rangle |^2 \\ \\ &=& | \langle i, \phi_0\rangle |^2 \; | \langle \phi_0 , \psi (0) \rangle |^2 \\ \\ &=& \langle i , \pi \rangle \; | \langle \phi_0 , \psi (0) \rangle |^2 \end{array}$

This is just the product of two probabilities:

• first, the probability $\langle i , \pi \rangle$ for a quantum state in the zero energy eigenspace to be at node $i,$ as we found above,

and

• second, the probability $| \langle \phi_0, \psi (0)\rangle |^2$ of being in this eigenspace to begin with. (Remember, in quantum mechanics the probability of measuring the system to have an energy is the modulus squared of the projection of the state onto the associated eigenspace, which for the one-dimensional zero energy eigenspace means just the inner product with the ground state.)

This is all we need to say something interesting about the large-time average probability for all states. We’ve basically shown that we can break the large-time probability vector $P$ into a sum of two normalized probability vectors:

$P = (1- \eta) \pi + \eta \Omega$

the first $\pi$ being the stochastic stationary state associated with the zero energy eigenspace, and the second $\Omega$ associated with the higher energy eigenspaces, with

$\displaystyle{ \langle i , \Omega \rangle = \frac{ \sum_{k\neq 0} | \langle i, \Phi_k \psi (0) \rangle |^2 }{ \eta} }$

The weight of each term is governed by the parameter

$\eta = 1 - | \langle \phi_0, \psi (0)\rangle |^2$

which you could think of as the quantumness of the result. This is one minus the probability of the walker being in the zero energy eigenspace, or equivalently the probability of the walker being outside the zero energy eigenspace.

So even if we don’t know $\Omega,$ we know its importance is controlled by a parameter $\eta$ that governs how close the large-time average distribution $P$ of the quantum walk is to the corresponding stochastic stationary distribution $\pi.$

What do we mean by ‘close’? Find out for yourself:

Puzzle 1. Show, using a triangle inequality, that the trace distance between the two characteristic stochastic and quantum distributions $\{ \langle i , P \rangle \}_i$ and $\{ \langle i , \pi \rangle \}_i$ is upper-bounded by $2 \eta.$

Can we say anything physical about when the quantumness $\eta$ is big or small?

Because the eigenvalues of $Q$ have a physical interpretation in terms of energy, the answer is yes. The quantumness $\eta$ is the probability of being outside the zero energy state. Call the next lowest eigenvalue $\Delta = \min_{k \neq 0} \epsilon_k$ the energy gap. If the quantum walk is not in the zero energy eigenspace then it must be in an eigenspace of energy greater or equal to $\Delta.$ Therefore the expected energy $E$ of the quantum walker must bound the quantumness $E \ge \eta \Delta.$

This tells us that a quantum walk with a low energy is similar to a stochastic walk in the large-time limit. We already knew this was exactly true in the zero energy limit, but this result goes further.

So little is known about quantum walks on networks of arbitrary geometry that we were very pleased to find this result. It says there is a special case in which the walk is characterized by the degree distribution of the network, and a clear physical parameter that bounds how far the walk is from this special case.

Also, in finding it we learned that the difficulties of the initial state dependence, enhanced by the lack of convergence to a stationary state, could be overcome for a quantum walk, and that the relationships between quantum and stochastic walks extend beyond those with shared generators.

### What next?

That’s all for the latest bit of idea sharing at the interface between stochastic and quantum systems.

Other questions we have include: What holds analytically about the form of the quantum correction? Numerically it is known that the so-called quantum correction $\Omega$ tends to enhance the probability of being found on nodes of low degree compared to $\pi.$ Can someone explain why? What happens if a small amount of stochastic noise is added to a quantum walk? Or a lot of noise?

It’s difficult to know who is best placed to answer these questions: experts in quantum physics, graph theory, complex networks or stochastic processes? I suspect it’ll take a bit of help from everyone.

A couple of textbooks with comprehensive sections on non-negative matrices and continuous-time stochastic processes are:

• Peter Lancaster and Miron Tismenetsky, The Theory of Matrices: with Applications, 2nd edition, Academic Press, San Diego, 1985.

• James R. Norris, Markov Chains, Cambridge University Press, Cambridge, 1997.

There is, of course, the book that arose from the Azimuth network theory series, which considers several relationships between quantum and stochastic processes on networks:

• John Baez and Jacob Biamonte, A Course on Quantum Techniques for Stochastic Mechanics, 2012.

Another couple of books on complex networks are:

• Mark Newman, Networks: An Introduction, Oxford University Press, Oxford, 2010.

• Ernesto Estrada, The Structure of Complex Networks: Theory and Applications, Oxford University Press, Oxford, 2011. Note that the first chapter is available free online.

There are plenty more useful references in our article on this topic:

• Mauro Faccin, Tomi Johnson, Jacob Biamonte, Sabre Kais and Piotr Migdał, Degree distribution in quantum walks on complex networks.

### Puzzles for the enthusiastic

Sadly I didn’t have space to show proofs of all the theorems I used. So here are a few puzzles that guide you to doing the proofs for yourself:

#### Stochastic walks and stationary states

Puzzle 2. (For the hard core.) Prove there is always a unique positive eigenvector for a stochastic walk generated by $S.$ You’ll need the assumption that the graph $G$ is connected. It’s not simple, and you’ll probably need help from a book, perhaps one of those above by Lancaster and Tismenetsky, and Norris.

Puzzle 3. Show that the eigenvalues of $S$ (and therefore $Q$) are non-negative. A good way to start this proof is to apply the Perron-Frobenius theorem to the non-negative matrix $M = - S + I \max_i S_{i i}.$ This implies that $M$ has a positive eigenvalue $r$ equal to its spectral radius

$r = \max_k | \lambda_k |$

where $\lambda_k$ are the eigenvalues of $M,$ and the associated eigenvector $v$ is positive. Since $S = - M + I \max_i S_{i i},$ it follows that $S$ shares the eigenvectors of $M$ and the associated eigenvalues are related by inverted translation:

$\epsilon_k = - \lambda_k + \max_i S_{i i}$

Puzzle 4. Prove that regardless of the initial state $\psi(0),$ the zero eigenvector $\pi$ is obtained in the large-time limit $\lim_{t \rightarrow \infty} \psi(t) = \pi$ of the walk generated by $S.$ This breaks down into two parts:

(a) Using the approach from Puzzle 5, to show that $S v = \epsilon_v v,$ the positivity of $v$ and the infinitesimal stochastic property $\sum_i S_{i j} = 0$ imply that $\epsilon_v = \epsilon_0 = 0$ and thus $v = \pi$ is actually the unique zero eigenvector and stationary state of $S$ (its uniqueness follows from puzzle 4, you don’t need to re-prove it).

(b) By inserting the decomposition $S = \sum_k \epsilon_k \Pi_k$ into $e^{-S t}$ and using the result of puzzle 5, complete the proof.

(Though I ask you to use the diagonalizability of $S,$ the main results still hold if the generator is irreducible but not diagonalizable.)

#### Quantum walks

Here are a couple of extra puzzles for those of you interested in quantum mechanics:

Puzzle 5. In quantum mechanics, probabilities are given by the moduli squared of amplitudes, so multiplying a state by a number of modulus unity has no physical effect. By inserting

$\displaystyle{ Q= \sum_k \epsilon_k \Phi_k }$

into the quantum time evolution matrix $e^{-Q t},$ show that if

$\psi(0) = \phi_k$

then

$\psi(t) = e^{ - i \epsilon_k t} \psi(0)$

hence $\phi_k$ is a stationary state in the quantum sense, as probabilities don’t change in time.

Puzzle 6. By expanding the initial state $\psi(0)$ in terms of the complete orthogonal basis vectors $\phi_k$ show that for a quantum walk $\psi(t)$ never converges to a stationary state unless it began in one.

## Quantum Network Theory (Part 1)

5 August, 2013

guest post by Tomi Johnson

If you were to randomly click a hyperlink on this web page and keep doing so on each page that followed, where would you end up?

As an esteemed user of Azimuth, I’d like to think you browse more intelligently, but the above is the question Google asks when deciding how to rank the world’s web pages.

Recently, together with the team (Mauro Faccin, Jacob Biamonte and Piotr Migdał) at the ISI Foundation in Turin, we attended a workshop in which several of the attendees were asking a similar question with a twist. “What if you, the web surfer, behaved quantum mechanically?”

Now don’t panic! I have no reason to think you might enter a superposition of locations or tunnel through a wall. This merely forms part of a recent drive towards understanding the role that network science can play in quantum physics.

As we’ll find, playing with quantum networks is fun. It could also become a necessity. The size of natural systems in which quantum effects have been identified has grown steadily over the past few years. For example, attention has recently turned to explaining the remarkable efficiency of light-harvesting complexes, comprising tens of molecules and thousands of atoms, using quantum mechanics. If this expansion continues, perhaps quantum physicists will have to embrace the concepts of complex networks.

To begin studying quantum complex networks, we found a revealing toy model. Let me tell you about it. Like all good stories, it has a beginning, a middle and an end. In this part, I’ll tell you the beginning and the middle. I’ll introduce the stochastic walk describing the randomly clicking web surfer mentioned above and a corresponding quantum walk. In part 2 the story ends with the bounding of the difference between the two walks in terms of the energy of the walker.

But for now I’ll start by introducing you to a graph, this time representing the internet!

If this taster gets you interested, there are more details available here:

• Mauro Faccin, Tomi Johnson, Jacob Biamonte, Sabre Kais and Piotr Migdał, Degree distribution in quantum walks on complex networks, arXiv:1305.6078 (2013).

### What does the internet look like from above?

As we all know, the idea of the internet is to connect computers to each other. What do these connections look like when abstracted as a network, with each computer a node and each connection an edge?

The internet on a local scale, such as in your house or office, might look something like this:

with several devices connected to a central hub. Each hub connects to other hubs, and so the internet on a slightly larger scale might look something like this:

What about the full global, not local, structure of the internet? To answer this question, researchers have developed representations of the whole internet, such as this one:

While such representations might be awe inspiring, how can we make any sense of them? Or are they merely excellent desktop wallpapers and new-age artworks?

In terms of complex network theory, there’s actually a lot that can be said that is not immediately obvious from the above representation.

For example, we find something very interesting if we plot the number of web pages with different incoming links (called degree) on a log-log axis. What is found for the African web is the following:

This shows that very few pages are linked to by a very large number others, while a very large number of pages receive very few links. More precisely, what this shows is a power law distribution, the signature of which is a straight line on a log-log axis.

In fact, power law distributions arise in a diverse number of real world networks, human-built networks such as the internet and naturally occurring networks. It is often discussed alongside the concept of the preferential attachment; highly connected nodes seem to accumulate connections more quickly. We all know of a successful blog whose success had led to an increased presence and more success. That’s an example of preferential attachment.

It’s clear then that degree is an important concept in network theory, and its distribution across the nodes a useful characteristic of a network. Degree gives one indication of how important a node is in a network.

And this is where stochastic walks come in. Google, who are in the business of ranking the importance of nodes (web pages) in a network (the web), use (up to a small modification) the idealized model of a stochastic walker (web surfer) who randomly hops to connected nodes (follows one of the links on a page). This is called the uniform escape model, since the total rate of leaving any node is set to be the same for all nodes. Leaving the walker to wander for a long while, Google then takes the probability of the walker being on a node to rank the importance of that node. In the case that the network is undirected (all links are reciprocated) this long-time probability, and therefore the rank of the node, is proportional to the degree of the node.

So node degrees and the uniform escape model play an important role in the fields of complex networks and stochastic walks. But can they tell us anything about the much more poorly understood topics of quantum networks and quantum walks? In fact, yes, and demonstrating that to you is the purpose of this pair of articles.

Before we move on to the interesting bit, the math, it’s worth just listing a few properties of quantum walks that make them hard to analyze, and explaining why they are poorly understood. These are the difficulties we will show how to overcome below.

No convergence. In a stochastic walk, if you leave the walker to wander for a long time, eventually the probability of finding a walker at a node converges to a constant value. In a quantum walk, this doesn’t happen, so the walk can’t be characterized so easily by its long-time properties.

Dependence on initial states. In some stochastic walks the long-time properties of the walk are independent of the initial state. It is possible to characterize the stochastic walk without referring to the initialization of the walker. Such a characterization is not so easy in quantum walks, since their evolution always depends on the initialization of the walker. Is it even possible then to say something useful that applies to all initializations?

Stochastic and quantum generators differ. Those of you familiar with the network theory series know that some generators produce both stochastic and quantum walks (see part 16 for more details). However, most stochastic walk generators, including that for the uniform escape model, do not generate quantum walks and vice versa. How do we then compare stochastic and quantum walks when their generators differ?

With the task outlined, let’s get started!

### Graphs and walks

In the next couple of sections I’m going to explain the diagram below to you. If you’ve been following the network theory series, in particular part 20, you’ll find parts of it familiar. But as it’s been a while since the last post covering this topic, let’s start with the basics.

A simple graph $G$ can be used to define both stochastic and quantum walks. A simple graph is something like this:

where there is at most one edge between any two nodes, there are no edges from a node to itself and all edges are undirected. To avoid complications, let’s stick to simple graphs with a finite number $n$ of nodes. Let’s also assume you can get from every node to every other node via some combination of edges i.e. the graph is connected.

In the particular example above the graph represents a network of $n = 5$ nodes, where nodes 3 and 4 have degree (number of edges) 3, and nodes 1, 2 and 5 have degree 2.

Every simple graph defines a matrix $A,$ called the adjacency matrix. For a network with $n$ nodes, this matrix is of size $n \times n,$ and each element $A_{i j}$ is unity if there is an edge between nodes $i$ and $j$, and zero otherwise (let’s use this basis for the rest of this post). For the graph drawn above the adjacency matrix is

$\left( \begin{matrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{matrix} \right)$

By construction, every adjacency matrix is symmetric:

$A =A^T$

(the $T$ means the transposition of the elements in the node basis) and further, because each $A$ is real, it is self-adjoint:

$A=A^\dagger$

(the $\dagger$ means conjugate transpose).

This is nice, since (as seen in parts 16 and 20) a self-adjoint matrix generates a continuous-time quantum walk.

To recap from the series, a quantum walk is an evolution arising from a quantum walker moving on a network.

A state of a quantum walk is represented by a size $n$ complex column vector $\psi$. Each element $\langle i , \psi \rangle$ of this vector is the so-called amplitude associated with node $i$ and the probability of the walker being found on that node (if measured) is the modulus of the amplitude squared $|\langle i , \psi \rangle|^2.$ Here $i$ is the standard basis vector with a single non-zero $i$th entry equal to unity, and $\langle u , v \rangle = u^\dagger v$ is the usual inner product.

A quantum walk evolves in time according to the Schrödinger equation

$\displaystyle{ \frac{d}{d t} \psi(t)= - i H \psi(t) }$

where $H$ is called the Hamiltonian. If the initial state is $\psi(0)$ then the solution is written as

$\psi(t) = \exp(- i t H) \psi(0)$

The probabilities $| \langle i , \psi (t) \rangle |^2$ are guaranteed to be correctly normalized when the Hamiltonian $H$ is self-adjoint.

There are other matrices that are defined by the graph. Perhaps the most familiar is the Laplacian, which has recently been a topic on this blog (see parts 15, 16 and 20 of the series, and this recent post).

The Laplacian $L$ is the $n \times n$ matrix

$L = D - A$

where the degree matrix $D$ is an $n \times n$ diagonal matrix with elements given by the degrees

$\displaystyle{ D_{i i}=\sum_{j} A_{i j} }$

For the graph drawn above, the degree matrix and Laplacian are:

$\left( \begin{matrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 2 \end{matrix} \right) \qquad \mathrm{and} \qquad \left( \begin{matrix} 2 & -1 & 0 & -1 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 3 & -1 & -1 \\ -1 & 0 & -1 & 3 & -1 \\ 0 & 0 & -1 & -1 & 2 \end{matrix} \right)$

The Laplacian is self-adjoint and generates a quantum walk.

The Laplacian has another property; it is infinitesimal stochastic. This means that its off diagonal elements are non-positive and its columns sum to zero. This is interesting because an infinitesimal stochastic matrix generates a continuous-time stochastic walk.

To recap from the series, a stochastic walk is an evolution arising from a stochastic walker moving on a network.

A state of a stochastic walk is represented by a size $n$ non-negative column vector $\psi$. Each element $\langle i , \psi \rangle$ of this vector is the probability of the walker being found on node $i.$

A stochastic walk evolves in time according to the master equation

$\displaystyle{ \frac{d}{d t} \psi(t)= - H \psi(t) }$

where $H$ is called the stochastic Hamiltonian. If the initial state is $\psi(0)$ then the solution is written

$\psi(t) = \exp(- t H) \psi(0)$

The probabilities $\langle i , \psi (t) \rangle$ are guaranteed to be non-negative and correctly normalized when the stochastic Hamiltonian $H$ is infinitesimal stochastic.

So far, I have just presented what has been covered on Azimuth previously. However, to analyze the important uniform escape model we need to go beyond the class of (Dirichlet) generators that produce both quantum and stochastic walks. Further, we have to somehow find a related quantum walk. We’ll see below that both tasks are achieved by considering the normalized Laplacians: one generating the uniform escape stochastic walk and the other a related quantum walk.

### Normalized Laplacians

The two normalized Laplacians are:

• the asymmetric normalized Laplacian $S = L D^{-1}$ (that generates the uniform escape Stochastic walk) and

• the symmetric normalized Laplacian $Q = D^{-1/2} L D^{-1/2}$ (that generates a Quantum walk).

For the graph drawn above the asymmetric normalized Laplacian $S$ is

$\left( \begin{matrix} 1 & -1/2 & 0 & -1/3 & 0 \\ -1/2 & 1 & -1/3 & 0 & 0 \\ 0 & -1/2 & 1 & -1/3 & -1/2 \\ -1/2 & 0 & -1/3 & 1 & -1/2 \\ 0 & 0 & -1/3 & -1/3 & 1 \end{matrix} \right)$

The identical diagonal elements indicates that the total rates of leaving each node are identical, and the equality within each column of the other non-zero elements indicates that the walker is equally likely to hop to any node connected to its current node. This is the uniform escape model!

For the same graph the symmetric normalized Laplacian $Q$ is

$\left( \begin{matrix} 1 & -1/2 & 0 & -1/\sqrt{6} & 0 \\ -1/2 & 1 & -1/\sqrt{6} & 0 & 0 \\ 0 & -1/\sqrt{6} & 1 & -1/3 & -1/\sqrt{6} \\ -1/\sqrt{6} & 0 & -1/3 & 1 & -1/\sqrt{6} \\ 0 & 0 & -1/\sqrt{6} & -1/\sqrt{6} & 1 \end{matrix} \right)$

That the diagonal elements are identical in the quantum case indicates that all nodes are of equal energy, this is type of quantum walk usually considered.

Puzzle 1. Show that in general $S$ is infinitesimal stochastic but not self-adjoint.

Puzzle 2. Show that in general $Q$ is self-adjoint but not infinitesimal stochastic.

So a graph defines two matrices: one $S$ that generates a stochastic walk, and one $Q$ that generates a quantum walk. The natural question to ask is whether these walks are related. The answer is that they are!

Underpinning this relationship is the mathematical property that $S$ and $Q$ are similar. They are related by the following similarity transformation

$S = D^{1/2} Q D^{-1/2}$

which means that any eigenvector $\phi_k$ of $Q$ associated to eigenvalue $\epsilon_k$ gives a vector

$\pi_k \propto D^{1/2} \phi_k$

that is an eigenvector of $S$ with the same eigenvalue! To show this, insert the identity $I = D^{-1/2} D^{1/2}$ into

$Q \phi_k = \epsilon_k \phi_k$

and multiply from the left with $D^{1/2}$ to obtain

\begin{aligned} (D^{1/2} Q D^{-1/2} ) (D^{1/2} \phi_k) &= \epsilon_k ( D^{1/2} \phi_k ) \\ S \pi_k &= \epsilon_k \pi_k \end{aligned}

The same works in the opposite direction. Any eigenvector $\pi_k$ of $S$ gives an eigenvector

$\phi_k \propto D^{-1/2} \pi_k$

of $Q$ with the same eigenvalue $\epsilon_k.$

The mathematics is particularly nice because $Q$ is self-adjoint. A self-adjoint matrix is diagonalizable, and has real eigenvalues and orthogonal eigenvectors.

As a result, the symmetric normalized Laplacian can be decomposed as

$Q = \sum_k \epsilon_k \Phi_k$

where $\epsilon_k$ is real and $\Phi_k$ are orthogonal projectors. Each $\Phi_k$ acts as the identity only on vectors in the space spanned by $\phi_k$ and as zero on all others, such that

$\Phi_k \Phi_\ell = \delta_{k \ell} \Phi_k.$

Multiplying from the left by $D^{1/2}$ and the right by $D^{-1/2}$ results in a similar decomposition for $S$:

$S = \sum_k \epsilon_k \Pi_k$

with orthogonal projectors

$\Pi_k = D^{1/2} \Phi_k D^{-1/2}$

I promised above that I would explain the following diagram:

Let’s summarize what it represents now:

$G$ is a simple graph that specifies

$A$ the adjacency matrix (generator of a quantum walk), which subtracted from

$D$ the diagonal matrix of the degrees gives

$L$ the symmetric Laplacian (generator of stochastic and quantum walks), which when normalized by $D$ returns both

$S$ the generator of the uniform escape stochastic walk and

$Q$ the quantum walk generator to which it is similar!

### What next?

Sadly, this is where we’ll finish for now.

We have all the ingredients necessary to study the walks generated by the normalized Laplacians and exploit the relationship between them.

Next time, in part 2, I’ll talk you through the mathematics of the uniform escape stochastic walk $S$ and how it connects to the degrees of the nodes in the long-time limit. Then I’ll show you how this helps us solve aspects of the quantum walk generated by $Q.$

### In other news

Before I leave you, let me tell you about a workshop the ISI team recently attended (in fact helped organize) at the Institute of Quantum Computing, on the topic of quantum computation and complex networks. Needless to say, there were talks on papers related to quantum mechanics and networks!

Some researchers at the workshop gave exciting talks based on numerical examinations of what happens if a quantum walk is used instead of a stochastic walk to rank the nodes of a network:

• Giuseppe Davide Paparo and Miguel Angel Martín-Delgado, Google in a quantum network, Sci. Rep. 2 (2012), 444.

• Eduardo Sánchez-Burillo, Jordi Duch, Jesús Gómez-Gardenes and David Zueco, Quantum navigation and ranking in complex networks, Sci. Rep. 2 (2012), 605.

Others attending the workshop have numerically examined what happens when using quantum computers to represent the stationary state of a stochastic process:

• Silvano Garnerone, Paolo Zanardi and Daniel A. Lidar, Adiabatic quantum algorithm for search engine ranking, Phys. Rev. Lett. 108 (2012), 230506.

It was a fun workshop and we plan to organize/attend more in the future!

## Quantum Techniques for Reaction Networks

11 June, 2013

Fans of the network theory series might like to look at this paper:

• John Baez, Quantum techniques for reaction networks.

and I would certainly appreciate comments and corrections.

This paper tackles a basic question we never got around to discussing: how the probabilistic description of a system where bunches of things randomly interact and turn into other bunches of things can reduce to a deterministic description in the limit where there are lots of things!

Mathematically, such systems are given by ‘stochastic Petri nets’, or if you prefer, ‘stochastic reaction networks’. These are just two equivalent pictures of the same thing. For example, we could describe some chemical reactions using this Petri net:

but chemists would use this reaction network:

C + O2 → CO2
CO2 + NaOH → NaHCO3
NaHCO3 + HCl → H2O + NaCl + CO2

Making either of them ‘stochastic’ merely means that we specify a ‘rate constant’ for each reaction, saying how probable it is.

For any such system we get a ‘master equation’ describing how the probability of having any number of things of each kind changes with time. In the class I taught on this last quarter, the students and I figured out how to derive from this an equation saying how the expected number of things of each kind changes with time. Later I figured out a much slicker argument… but either way, we get this result:

Theorem. For any stochastic reaction network and any stochastic state $\Psi(t)$ evolving in time according to the master equation, then

$\displaystyle{ \frac{d}{dt} \langle N \Psi(t) \rangle } = \displaystyle{\sum_{\tau \in T}} \, r(\tau) \, (s(\tau) - t(\tau)) \; \left\langle N^{\underline{s(\tau)}}\, \Psi(t) \right\rangle$

assuming the derivative exists.

Of course this will make no sense yet if you haven’t been following the network theory series! But I explain all the notation in the paper, so don’t be scared. The main point is that $\langle N \Psi(t) \rangle$ is a vector listing the expected number of things of each kind at time $t.$ The equation above says how this changes with time… but it closely resembles the ‘rate equation’, which describes the evolution of chemical systems in a deterministic way.

And indeed, the next big theorem says that the master equation actually implies the rate equation when the probability of having various numbers of things of each kind is given by a product of independent Poisson distributions. In this case $\Psi(t)$ is what people in quantum physics call a ‘coherent state’. So:

Theorem. Given any stochastic reaction network, let
$\Psi(t)$ be a mixed state evolving in time according to the master equation. If $\Psi(t)$ is a coherent state when $t = t_0,$ then $\langle N \Psi(t) \rangle$ obeys the rate equation when $t = t_0.$

In most cases, this only applies exactly at one moment of time: later $\Psi(t)$ will cease to be a coherent state. Then we must resort to the previous theorem to see how the expected number of things of each kind changes with time.

But sometimes our state $\Psi(t)$ will stay coherent forever! For one case where this happens, see the companion paper, which I blogged about a little while ago:

• John Baez and Brendan Fong, Quantum techniques for studying equilibrium in reaction networks.

We wrote this first, but logically it comes after the one I just finished now!

All this material will get folded into the book I’m writing with Jacob Biamonte. There are just a few remaining loose ends that need to be tied up.

## Network Theory (Part 29)

23 April, 2013

I’m talking about electrical circuits, but I’m interested in them as models of more general physical systems. Last time we started seeing how this works. We developed an analogy between electrical circuits and physical systems made of masses and springs, with friction:

 Electronics Mechanics charge: $Q$ position: $q$ current: $I = \dot{Q}$ velocity: $v = \dot{q}$ flux linkage: $\lambda$ momentum: $p$ voltage: $V = \dot{\lambda}$ force: $F = \dot{p}$ inductance: $L$ mass: $m$ resistance: $R$ damping coefficient: $r$ inverse capacitance: $1/C$ spring constant: $k$

But this is just the first of a large set of analogies. Let me list some, so you can see how wide-ranging they are!

### More analogies

People in system dynamics often use effort as a term to stand for anything analogous to force or voltage, and flow as a general term to stand for anything analogous to velocity or electric current. They call these variables $e$ and $f.$

To me it’s important that force is the time derivative of momentum, and velocity is the time derivative of position. Following physicists, I write momentum as $p$ and position as $q.$ So, I’ll usually write effort as $\dot{p}$ and flow as $\dot{q}$.

Of course, ‘position’ is a term special to mechanics; it’s nice to have a general term for the thing whose time derivative is flow, that applies to any context. People in systems dynamics seem to use displacement as that general term.

It would also be nice to have a general term for the thing whose time derivative is effort… but I don’t know one. So, I’ll use the word momentum.

Now let’s see the analogies! Let’s see how displacement $q$, flow $\dot{q},$ momentum $p$ and effort $\dot{p}$ show up in several subjects:

 displacement:    $q$ flow:      $\dot q$ momentum:      $p$ effort:           $\dot p$ Mechanics: translation position velocity momentum force Mechanics: rotation angle angular velocity angular momentum torque Electronics charge current flux linkage voltage Hydraulics volume flow pressure momentum pressure Thermal Physics entropy entropy flow temperature momentum temperature Chemistry moles molar flow chemical momentum chemical potential

We’d been considering mechanics of systems that move along a line, via translation, but we can also consider mechanics for systems that turn round and round, via rotation. So, there are two rows for mechanics here.

There’s a row for electronics, and then a row for hydraulics, which is closely analogous. In this analogy, a pipe is like a wire. The flow of water plays the role of current. Water pressure plays the role of electrostatic potential. The difference in water pressure between two ends of a pipe is like the voltage across a wire. When water flows through a pipe, the power equals the flow times this pressure difference—just as in an electrical circuit the power is the current times the voltage across the wire.

A resistor is like a narrowed pipe:

An inductor is like a heavy turbine placed inside a pipe: this makes the water tend to keep flowing at the same rate it’s already flowing! In other words, it provides a kind of ‘inertia’ analogous
to mass.

A capacitor is like a tank with pipes coming in from both ends, and a rubber sheet dividing it in two lengthwise:

When studying electrical circuits as a kid, I was shocked when I first learned that capacitors don’t let the electrons through: it didn’t seem likely you could do anything useful with something like that! But of course you can. Similarly, this gizmo doesn’t let the water through.

A voltage source is like a compressor set up to maintain a specified pressure difference between the input and output:

Similarly, a current source is like a pump set up to maintain a specified flow.

Finally, just as voltage is the time derivative of a fairly obscure quantity called ‘flux linkage’, pressure is the time derivative of an even more obscure quantity which has no standard name. I’m calling it ‘pressure momentum’, thanks to the analogy

momentum: force :: pressure momentum: pressure

Just as pressure has units of force per area, pressure momentum has units of momentum per area!

People invented this analogy back when they were first struggling to understand electricity, before electrons had been observed:

Hydraulic analogy, Wikipedia.

The famous electrical engineer Oliver Heaviside pooh-poohed this analogy, calling it the “drain-pipe theory”. I think he was making fun of William Henry Preece. Preece was another electrical engineer, who liked the hydraulic analogy and disliked Heaviside’s fancy math. In his inaugural speech as president of the Institution of Electrical Engineers in 1893, Preece proclaimed:

True theory does not require the abstruse language of mathematics to make it clear and to render it acceptable. All that is solid and substantial in science and usefully applied in practice, have been made clear by relegating mathematic symbols to their proper store place—the study.

According to the judgement of history, Heaviside made more progress in understanding electromagnetism than Preece. But there’s still a nice analogy between electronics and hydraulics. And I’ll eventually use the abstruse language of mathematics to make it very precise!

But now let’s move on to the row called ‘thermal physics’. We could also call this ‘thermodynamics’. It works like this. Say you have a physical system in thermal equilibrium and all you can do is heat it up or cool it down ‘reversibly’—that is, while keeping it in thermal equilibrium all along. For example, imagine a box of gas that you can heat up or cool down. If you put a tiny amount $dE$ of energy into the system in the form of heat, then its entropy increases by a tiny amount $dS.$ And they’re related by this equation:

$dE = TdS$

where $T$ is the temperature.

Another way to say this is

$\displaystyle{ \frac{dE}{dt} = T \frac{dS}{dt} }$

where $t$ is time. On the left we have the power put into the system in the form of heat. But since power should be ‘effort’ times ‘flow’, on the right we should have ‘effort’ times ‘flow’. It makes some sense to call $dS/dt$ the ‘entropy flow’. So temperature, $T,$ must play the role of ‘effort’.

This is a bit weird. I don’t usually think of temperature as a form of ‘effort’ analogous to force or torque. Stranger still, our analogy says that ‘effort’ should be the time derivative of some kind of ‘momentum’, So, we need to introduce temperature momentum: the integral of temperature over time. I’ve never seen people talk about this concept, so it makes me a bit nervous.

But when we have a more complicated physical system like a piston full of gas in thermal equilibrium, we can see the analogy working. Now we have

$dE = TdS - PdV$

The change in energy $dE$ of our gas now has two parts. There’s the change in heat energy $TdS$, which we saw already. But now there’s also the change in energy due to compressing the piston! When we change the volume of the gas by a tiny amount $dV,$ we put in energy $-PdV.$

Now look back at the first chart I drew! It says that pressure is a form of ‘effort’, while volume is a form of ‘displacement’. If you believe that, the equation above should help convince you that temperature is also a form of effort, while entropy is a form of displacement.

But what about the minus sign? That’s no big deal: it’s the result of some arbitrary conventions. $P$ is defined to be the outward pressure of the gas on our piston. If this is positive, reducing the volume of the gas takes a positive amount of energy, so we need to stick in a minus sign. I could eliminate this minus sign by changing some conventions—but if I did, the chemistry professors at UCR would haul me away and increase my heat energy by burning me at the stake.

Speaking of chemistry: here’s how the chemistry row in the analogy chart works. Suppose we have a piston full of gas made of different kinds of molecules, and there can be chemical reactions that change one kind into another. Now our equation gets fancier:

$\displaystyle{ dE = TdS - PdV + \sum_i \mu_i dN_i }$

Here $N_i$ is the number of molecules of the ith kind, while $\mu_i$ is a quantity called a chemical potential. The chemical potential simply says how much energy it takes to increase the number of molecules of a given kind. So, we see that chemical potential is another form of effort, while number of molecules is another form of displacement.

But chemists are too busy to count molecules one at a time, so they count them in big bunches called ‘moles’. A mole is the number of atoms in 12 grams of carbon-12. That’s roughly

602,214,150,000,000,000,000,000

atoms. This is called Avogadro’s constant. If we used 1 gram of hydrogen, we’d get a very close number called ‘Avogadro’s number’, which leads to lots of jokes:

(He must be desperate because he looks so weird… sort of like a mole!)

So, instead of saying that the displacement in chemistry is called ‘number of molecules’, you’ll sound more like an expert if you say ‘moles’. And the corresponding flow is called molar flow.

The truly obscure quantity in this row of the chart is the one whose time derivative is chemical potential! I’m calling it chemical momentum simply because I don’t know another name.

Why are linear and angular momentum so famous compared to pressure momentum, temperature momentum and chemical momentum?

I suspect it’s because the laws of physics are symmetrical
under translations and rotations. When the assumptions of Noether’s theorem hold, this guarantees that the total momentum and angular momentum of a closed system are conserved. Apparently the laws of physics lack the symmetries that would make the other kinds of momentum be conserved.

This suggests that we should dig deeper and try to understand more deeply how this chart is connected to ideas in classical mechanics, like Noether’s theorem or symplectic geometry. I will try to do that sometime later in this series.

More generally, we should try to understand what gives rise to a row in this analogy chart. Are there are lots of rows I haven’t talked about yet, or just a few? There are probably lots. But are there lots of practically important rows that I haven’t talked about—ones that can serve as the basis for new kinds of engineering? Or does something about the structure of the physical world limit the number of such rows?

### Mildly defective analogies

Engineers care a lot about dimensional analysis. So, they often make a big deal about the fact that while effort and flow have different dimensions in different rows of the analogy chart, the following four things are always true:

$pq$ has dimensions of action (= energy × time)
$\dot{p} q$ has dimensions of energy
$p \dot{q}$ has dimensions of energy
$\dot{p} \dot{q}$ has dimensions of power (= energy / time)

In fact any one of these things implies all the rest.

These facts are important when designing ‘mixed systems’, which combine different rows in the chart. For example, in mechatronics, we combine mechanical and electronic elements in a single circuit! And in a hydroelectric dam, power is converted from hydraulic to mechanical and then electric form:

One goal of network theory should be to develop a unified language for studying mixed systems! Engineers have already done most of the hard work. And they’ve realized that thanks to conservation of energy, working with pairs of flow and effort variables whose product has dimensions of power is very convenient. It makes it easy to track the flow of energy through these systems.

However, people have tried to extend the analogy chart to include ‘mildly defective’ examples where effort times flow doesn’t have dimensions of power. The two most popular are these:

 displacement:    $q$ flow:      $\dot q$ momentum:      $p$ effort:           $\dot p$ Heat flow heat heat flow temperature momentum temperature Economics inventory product flow economic momentum product price

The heat flow analogy comes up because people like to think of heat flow as analogous to electrical current, and temperature as analogous to voltage. Why? Because an insulated wall acts a bit like a resistor! The current flowing through a resistor is a function the voltage across it. Similarly, the heat flowing through an insulated wall is about proportional to the difference in temperature between the inside and the outside.

However, there’s a difference. Current times voltage has dimensions of power. Heat flow times temperature does not have dimensions of power. In fact, heat flow by itself already has dimensions of power! So, engineers feel somewhat guilty about this analogy.

Being a mathematical physicist, a possible way out presents itself to me: use units where temperature is dimensionless! In fact such units are pretty popular in some circles. But I don’t know if this solution is a real one, or whether it causes some sort of trouble.

In the economic example, ‘energy’ has been replaced by ‘money’. So other words, ‘inventory’ times ‘product price’ has units of money. And so does ‘product flow’ times ‘economic momentum’! I’d never heard of economic momentum before I started studying these analogies, but I didn’t make up that term. It’s the thing whose time derivative is ‘product price’. Apparently economists have noticed a tendency for rising prices to keep rising, and falling prices to keep falling… a tendency toward ‘conservation of momentum’ that doesn’t fit into their models of rational behavior.

I’m suspicious of any attempt to make economics seem like physics. Unlike elementary particles or rocks, people don’t seem to be very well modelled by simple differential equations. However, some economists have used the above analogy to model economic systems. And I can’t help but find that interesting—even if intellectually dubious when taken too seriously.

### An auto-analogy

Beside the analogy I’ve already described between electronics and mechanics, there’s another one, called ‘Firestone’s analogy’:

• F.A. Firestone, A new analogy between mechanical and electrical systems, Journal of the Acoustical Society of America 4 (1933), 249–267.

Alain Bossavit pointed this out in the comments to Part 27. The idea is to treat current as analogous to force instead of velocity… and treat voltage as analogous to velocity instead of force!

In other words, switch your $p$’s and $q$’s:

 Electronics Mechanics          (usual analogy) Mechanics      (Firestone’s analogy) charge position: $q$ momentum: $p$ current velocity: $\dot{q}$ force: $\dot{p}$ flux linkage momentum: $p$ position: $q$ voltage force: $\dot{p}$ velocity: $\dot{q}$

This new analogy is not ‘mildly defective’: the product of effort and flow variables still has dimensions of power. But why bother with another analogy?

It may be helpful to recall this circuit from last time:

It’s described by this differential equation:

$L \ddot{Q} + R \dot{Q} + C^{-1} Q = V$

We used the ‘usual analogy’ to translate it into classical mechanics problem, and we got a problem where an object of mass $L$ is hanging from a spring with spring constant $1/C$ and damping coefficient $R,$ and feeling an additional external force $F:$

$m \ddot{q} + r \dot{q} + k q = F$

And that’s fine. But there’s an intuitive sense in which all three forces are acting ‘in parallel’ on the mass, rather than in series. In other words, all side by side, instead of one after the other.

Using Firestone’s analogy, we get a different classical mechanics problem, where the three forces are acting in series. The spring is connected to source of friction, which in turn is connected to an external force.

This may seem a bit mysterious. But instead of trying to explain it, I’ll urge you to read his paper, which is short and clearly written. I instead want to make a somewhat different point, which is that we can take a mechanical system, convert it to an electrical one following the usual analogy, and then convert back to a mechanical one using Firestone’s analogy. This gives us an ‘auto-analogy’ between mechanics and itself, which switches $p$ and $q.$

And although I haven’t been able to figure out why from Firestone’s paper, I have other reasons for feeling sure this auto-analogy should contain a minus sign. For example:

$p \mapsto q, \qquad q \mapsto -p$

In other words, it should correspond to a 90° rotation in the $(p,q)$ plane. There’s nothing sacred about whether we rotate clockwise or counterclockwise; we can equally well do this:

$p \mapsto -q, \qquad q \mapsto p$

But we need the minus sign to get a so-called symplectic transformation of the $(p,q)$ plane. And from my experience with classical mechanics, I’m pretty sure we want that. If I’m wrong, please let me know!

I have a feeling we should revisit this issue when we get more deeply into the symplectic aspects of circuit theory. So, I won’t go on now.

### References

The analogies I’ve been talking about are studied in a branch of engineering called system dynamics. You can read more about it here:

• Dean C. Karnopp, Donald L. Margolis and Ronald C. Rosenberg, System Dynamics: a Unified Approach, Wiley, New York, 1990.

• Forbes T. Brown, Engineering System Dynamics: a Unified Graph-Centered Approach, CRC Press, Boca Raton, 2007.

• Francois E. Cellier, Continuous System Modelling, Springer, Berlin, 1991.

System dynamics already uses lots of diagrams of networks. One of my goals in weeks to come is to explain the category theory lurking behind these diagrams.

## Petri Net Programming (Part 3)

19 April, 2013

guest post by David Tanzer

### The role of differential equations

Last time we looked at stochastic Petri nets, which use a random event model for the reactions. Individual entities are represented by tokens that flow through the network. When the token counts get large, we observed that they can be approximated by continuous quantities, which opens the door to the application of continuous mathematics to the analysis of network dynamics.

A key result of this approach is the “rate equation,” which gives a law of motion for the expected sizes of the populations. Equilibrium can then be obtained by solving for zero motion. The rate equations are applied in chemistry, where they give the rates of change of the concentrations of the various species in a reaction.

But before discussing the rate equation, here I will talk about the mathematical form of this law of motion, which consists of differential equations. This form is naturally associated with deterministic systems involving continuous magnitudes. This includes the equations of motion for the sine wave:

and the graceful ellipses that are traced out by the orbits of the planets around the sun:

This post provides some mathematical context to programmers who have not worked on scientific applications. My goal is to get as many of you on board as possible, before setting sail with Petri net programming.

### Three approaches to equations: theoretical, formula-based, and computational

Let’s first consider the major approaches to equations in general. We’ll illustrate with a Diophantine equation

$x^9 + y^9 + z^9 = 2$

where $x, y$ and $z$ are integer variables.

In the theoretical approach (aka “qualitative analysis”), we start with the meaning of the equation and then proceed to reason about its solutions. Here are some simple consequences of this equation. They can’t all be zero, can’t all be positive, can’t all be negative, can’t all be even, and can’t all be odd.

In the formula-based approach, we seek formulas to describe the solutions. Here is an example of a formula (which does not solve our equation):

$\{(x,y,z) | x = n^3, y = 2n - 4, z = 4 n | 1 \leq n \leq 5 \}$

Such formulas are nice to have, but the pursuit of them is diabolically difficult. In fact, for Diophantine equations, even the question of whether an arbitrarily chosen equation has any solutions whatsoever has been proven to be algorithmically undecidable.

Finally, in the computational approach, we seek algorithms to enumerate or numerically approximate the solutions to the equations.

### The three approaches to differential equations

Let’s apply the preceding classification to differential equations.

#### Theoretical approach

A differential equation is one that constrains the rates at which the variables are changing. This can include constraints on the rates at which the rates are changing (second-order equations), etc. The equation is ordinary if there is a single independent variable, such as time, otherwise it is partial.

Consider the equation stating that a variable increases at a rate equal to its current value. The bigger it gets, the faster it increases. Given a starting value, this determines a process — the solution to the equation — which here is exponential growth.

Let $X(t)$ be the value at time $t,$ and let’s initialize it to 1 at time 0. So we have:

$X(0) = 1$

$X'(t) = X(t)$

These are first-order equations, because the derivative is applied at most once to any variable. They are linear equations, because the terms on each side of the equations are linear combinations of either individual variables or derivatives (in this case all of the coefficients are 1). Note also that a system of differential equations may in general have zero, one, or multiple solutions. This example belongs to a class of equations which are proven to have a unique solution for each initial condition.

You could imagine more complex systems of equations, involving multiple dependent variables, all still depending on time. That includes the rate equations for a Petri net, which have one dependent variable for each of the population sizes. The ideas for such systems are an extension of the ideas for a single-variable system. Then, a state of the system is a vector of values, with one component for each of the dependent variables. For first-order systems, such as the rate equations, where the derivatives appear on the left-hand sides, the equations determine, for each possible state of the system, a “direction” and rate of change for the state of the system.

Now here is a simple illustration of what I called the theoretical approach. Can $X(t)$ ever become negative? No, because it starts out positive at time 0, and in order to later become negative, it must be decreasing at a time $t_1$ when it is still positive. That is to say, $X(t_1) > 0$, and $X'(t_1) < 0$. But that contradicts the assumption $X'(t) = X(t)$. The general lesson here is that we don’t need a solution formula in order to make such inferences.

For the rate equations, the theoretical approach leads to substantial theorems about the existence and structure of equilibrium solutions.

#### Formula-based approach

It is natural to look for concise formulas to solve our equations, but the results of this overall quest are largely negative. The exponential differential equation cannot be solved by any formula that involves a finite combination of simple operations. So the solution function must be treated as a new primitive, and given a name, say $\exp(t)$. But even when we extend our language to include this new symbol, there are many differential equations that remain beyond the reach of finite formulas. So an endless collection of primitive functions is called for. (As standard practice, we always include $exp(t),$ and its complex extensions to the trigonometric functions, as primitives in our toolbox.)

But the hard mathematical reality does not end here, because even when solution formulas do exist, finding them may call for an ace detective. Only for certain classes of differential equations, such as the linear ones, do we have systematic solution methods.

The picture changes, however, if we let the formulas contain an infinite number of operations. Then the arithmetic operators give a far-reaching base for defining new functions. In fact, as you can verify, the power series

$X(t) = 1 + t + t^2/2! + t^3/3! + ...$

which we view as an “infinite polynomial” over the time parameter t, exactly satisfies our equations for exponential motion, $X(0) = 1$ and $X'(t) = X(t).$ This power series therefore defines $\exp(t).$ By the way, applying it to the input 1 produces a definition for the transcendental number $e$:

$e = X(1) = 1 + 1 + 1/2 + 1/6 + 1/24 + 1/120 + ... \approx 2.71828$

#### Computational approach

Let’s leave aside our troubles with formulas, and consider the computational approach. For broad classes of differential equations, there are approximation algorithms that be successfully applied.

For starters, any power series that satisfies a differential equation may work for a simple approximation method. If a series is known to converge over some range of inputs, then one can approximate the value at those points by stopping the computation after a finite number of terms.

But the standard methods work directly with the equations, provided that they can be put into the right form. The simplest one is called Euler’s method. It works over a sampling grid of points separated by some small number $\epsilon$. Let’s take the case where we have a first-order equation in explicit form, which means that $X'(t) = f(X(t))$ for some function $f.$

We begin with the initial value $X(0)$. Applying $f,$ we get $X'(0) = f(X(0)).$ Then for the interval from 0 to $\epsilon$,we use a linear approximation for $X(t)$, by assuming that the derivative remains constant at $X'(0).$ That gives $X(\epsilon) = X(0) + \epsilon \cdot X'(0).$ Next, $X'(\epsilon) = f(X(\epsilon)),$ and $X(2 \epsilon) = X(\epsilon) + \epsilon \cdot X'(\epsilon),$ etc. Formally,

$X(0) = \textrm{initial}$

$X((n+1) \epsilon) = X(n \epsilon) + \epsilon f(X(n \epsilon))$

Applying this to our exponential equation, where $f(X(t)) = X(t),$ we get:

$X(0) = 1$

$X((n+1) \epsilon) = X(n \epsilon) + \epsilon X(n \epsilon) = X(n \epsilon) (1 + \epsilon)$

Hence:

$X(n \epsilon) = (1 + \epsilon) ^ n$

So the approximation method gives a discrete exponential growth, which converges to a continuous exponential in the limit as the mesh size goes to zero.

Note, the case we just considered has more generality than might appear at first, because (1) the ideas here are easily extended to systems of explicit first order equations, and (2) higher-order equations that are “explicit” in an extended sense—meaning that the highest-order derivative is expressed as a function of time, of the variable, and of the lower-order derivatives—can be converted into an equivalent system of explicit first-order equations.

### The challenging world of differential equations

So, is our cup half-empty or half-full? We have no toolbox of primitive formulas for building the solutions to all differential equations by finite compositions. And even for those which can be solved by formulas, there is no general method for finding the solutions. That is how the cookie crumbles. But on the positive side, there is an array of theoretical tools for analyzing and solving important classes of differential equations, and numerical methods can be applied in many cases.

The study of differential equations leads to some challenging problems, such as the Navier-Stokes equations, which describe the flow of fluids.

These are partial differential equations involving flow velocity, pressure, density and external forces (such as gravity), all of which vary over space and time. There are non-linear (multiplicative) interactions between these variables and their spatial and temporal derivatives, which leads to complexity in the solutions.

At high flow rates, this complexity can produce chaotic solutions, which involve complex behavior at a wide range of resolution scales. This is turbulence. Here is an insightful portrait of turbulence, by Leonardo da Vinci, whose studies in turbulence date back to the 15th Century.

Turbulence, which has been described by Richard Feynman as the most important unsolved problem of classical physics, also presents a mathematical puzzle. The general existence of solutions to the Navier-Stokes equations remains unsettled. This is one of the “Millennium Prize Problems”, for which a one million dollar prize is offered: in three dimensions, given initial values for the velocity and scalar fields, does there exist a solution that is smooth and everywhere defined? There are also complications with grid-based numerical methods, which will fail to produce globally accurate results if the solutions contain details at a smaller scale than the grid mesh. So the ubiquitous phenomenon of turbulence, which is so basic to the movements of the atmosphere and the seas, remains an open case.

But fortunately we have enough traction with differential equations to proceed directly with the rate equations for Petri nets. There we will find illuminating equations, which are the subject of both major theorems and open problems. They are non-linear and intractable by formula-based methods, yet, as we will see, they are well handled by numerical methods.