Quantum Network Theory (Part 1)

5 August, 2013

guest post by Tomi Johnson

If you were to randomly click a hyperlink on this web page and keep doing so on each page that followed, where would you end up?

As an esteemed user of Azimuth, I’d like to think you browse more intelligently, but the above is the question Google asks when deciding how to rank the world’s web pages.

Recently, together with the team (Mauro Faccin, Jacob Biamonte and Piotr Migdał) at the ISI Foundation in Turin, we attended a workshop in which several of the attendees were asking a similar question with a twist. “What if you, the web surfer, behaved quantum mechanically?”

Now don’t panic! I have no reason to think you might enter a superposition of locations or tunnel through a wall. This merely forms part of a recent drive towards understanding the role that network science can play in quantum physics.

As we’ll find, playing with quantum networks is fun. It could also become a necessity. The size of natural systems in which quantum effects have been identified has grown steadily over the past few years. For example, attention has recently turned to explaining the remarkable efficiency of light-harvesting complexes, comprising tens of molecules and thousands of atoms, using quantum mechanics. If this expansion continues, perhaps quantum physicists will have to embrace the concepts of complex networks.

To begin studying quantum complex networks, we found a revealing toy model. Let me tell you about it. Like all good stories, it has a beginning, a middle and an end. In this part, I’ll tell you the beginning and the middle. I’ll introduce the stochastic walk describing the randomly clicking web surfer mentioned above and a corresponding quantum walk. In part 2 the story ends with the bounding of the difference between the two walks in terms of the energy of the walker.

But for now I’ll start by introducing you to a graph, this time representing the internet!

If this taster gets you interested, there are more details available here:

• Mauro Faccin, Tomi Johnson, Jacob Biamonte, Sabre Kais and Piotr Migdał, Degree distribution in quantum walks on complex networks, arXiv:1305.6078 (2013).

What does the internet look like from above?

As we all know, the idea of the internet is to connect computers to each other. What do these connections look like when abstracted as a network, with each computer a node and each connection an edge?

The internet on a local scale, such as in your house or office, might look something like this:


Local network

with several devices connected to a central hub. Each hub connects to other hubs, and so the internet on a slightly larger scale might look something like this:


Regional network

What about the full global, not local, structure of the internet? To answer this question, researchers have developed representations of the whole internet, such as this one:


Global network

While such representations might be awe inspiring, how can we make any sense of them? Or are they merely excellent desktop wallpapers and new-age artworks?

In terms of complex network theory, there’s actually a lot that can be said that is not immediately obvious from the above representation.

For example, we find something very interesting if we plot the number of web pages with different incoming links (called degree) on a log-log axis. What is found for the African web is the following:

Power law degree distribution

This shows that very few pages are linked to by a very large number others, while a very large number of pages receive very few links. More precisely, what this shows is a power law distribution, the signature of which is a straight line on a log-log axis.

In fact, power law distributions arise in a diverse number of real world networks, human-built networks such as the internet and naturally occurring networks. It is often discussed alongside the concept of the preferential attachment; highly connected nodes seem to accumulate connections more quickly. We all know of a successful blog whose success had led to an increased presence and more success. That’s an example of preferential attachment.

It’s clear then that degree is an important concept in network theory, and its distribution across the nodes a useful characteristic of a network. Degree gives one indication of how important a node is in a network.

And this is where stochastic walks come in. Google, who are in the business of ranking the importance of nodes (web pages) in a network (the web), use (up to a small modification) the idealized model of a stochastic walker (web surfer) who randomly hops to connected nodes (follows one of the links on a page). This is called the uniform escape model, since the total rate of leaving any node is set to be the same for all nodes. Leaving the walker to wander for a long while, Google then takes the probability of the walker being on a node to rank the importance of that node. In the case that the network is undirected (all links are reciprocated) this long-time probability, and therefore the rank of the node, is proportional to the degree of the node.

So node degrees and the uniform escape model play an important role in the fields of complex networks and stochastic walks. But can they tell us anything about the much more poorly understood topics of quantum networks and quantum walks? In fact, yes, and demonstrating that to you is the purpose of this pair of articles.

Before we move on to the interesting bit, the math, it’s worth just listing a few properties of quantum walks that make them hard to analyze, and explaining why they are poorly understood. These are the difficulties we will show how to overcome below.

No convergence. In a stochastic walk, if you leave the walker to wander for a long time, eventually the probability of finding a walker at a node converges to a constant value. In a quantum walk, this doesn’t happen, so the walk can’t be characterized so easily by its long-time properties.

Dependence on initial states. In some stochastic walks the long-time properties of the walk are independent of the initial state. It is possible to characterize the stochastic walk without referring to the initialization of the walker. Such a characterization is not so easy in quantum walks, since their evolution always depends on the initialization of the walker. Is it even possible then to say something useful that applies to all initializations?

Stochastic and quantum generators differ. Those of you familiar with the network theory series know that some generators produce both stochastic and quantum walks (see part 16 for more details). However, most stochastic walk generators, including that for the uniform escape model, do not generate quantum walks and vice versa. How do we then compare stochastic and quantum walks when their generators differ?

With the task outlined, let’s get started!

Graphs and walks

In the next couple of sections I’m going to explain the diagram below to you. If you’ve been following the network theory series, in particular part 20, you’ll find parts of it familiar. But as it’s been a while since the last post covering this topic, let’s start with the basics.

Diagram outlining the main concepts

A simple graph G can be used to define both stochastic and quantum walks. A simple graph is something like this:

Illustration of a simple graph

where there is at most one edge between any two nodes, there are no edges from a node to itself and all edges are undirected. To avoid complications, let’s stick to simple graphs with a finite number n of nodes. Let’s also assume you can get from every node to every other node via some combination of edges i.e. the graph is connected.

In the particular example above the graph represents a network of n = 5 nodes, where nodes 3 and 4 have degree (number of edges) 3, and nodes 1, 2 and 5 have degree 2.

Every simple graph defines a matrix A, called the adjacency matrix. For a network with n nodes, this matrix is of size n \times n, and each element A_{i j} is unity if there is an edge between nodes i and j, and zero otherwise (let’s use this basis for the rest of this post). For the graph drawn above the adjacency matrix is

\left( \begin{matrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{matrix} \right)

By construction, every adjacency matrix is symmetric:

A =A^T

(the T means the transposition of the elements in the node basis) and further, because each A is real, it is self-adjoint:


(the \dagger means conjugate transpose).

This is nice, since (as seen in parts 16 and 20) a self-adjoint matrix generates a continuous-time quantum walk.

To recap from the series, a quantum walk is an evolution arising from a quantum walker moving on a network.

A state of a quantum walk is represented by a size n complex column vector \psi. Each element \langle i , \psi \rangle of this vector is the so-called amplitude associated with node i and the probability of the walker being found on that node (if measured) is the modulus of the amplitude squared |\langle i , \psi \rangle|^2. Here i is the standard basis vector with a single non-zero ith entry equal to unity, and \langle u , v \rangle = u^\dagger v is the usual inner product.

A quantum walk evolves in time according to the Schrödinger equation

\displaystyle{ \frac{d}{d t} \psi(t)= - i H \psi(t) }

where H is called the Hamiltonian. If the initial state is \psi(0) then the solution is written as

\psi(t) = \exp(- i t H) \psi(0)

The probabilities | \langle i , \psi (t) \rangle |^2 are guaranteed to be correctly normalized when the Hamiltonian H is self-adjoint.

There are other matrices that are defined by the graph. Perhaps the most familiar is the Laplacian, which has recently been a topic on this blog (see parts 15, 16 and 20 of the series, and this recent post).

The Laplacian L is the n \times n matrix

L = D - A

where the degree matrix D is an n \times n diagonal matrix with elements given by the degrees

\displaystyle{ D_{i i}=\sum_{j} A_{i j} }

For the graph drawn above, the degree matrix and Laplacian are:

\left( \begin{matrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 2 \end{matrix} \right)  \qquad \mathrm{and} \qquad \left( \begin{matrix} 2 & -1 & 0 & -1 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 3 & -1 & -1 \\ -1 & 0 & -1 & 3 & -1 \\ 0 & 0 & -1 & -1 & 2 \end{matrix} \right)

The Laplacian is self-adjoint and generates a quantum walk.

The Laplacian has another property; it is infinitesimal stochastic. This means that its off diagonal elements are non-positive and its columns sum to zero. This is interesting because an infinitesimal stochastic matrix generates a continuous-time stochastic walk.

To recap from the series, a stochastic walk is an evolution arising from a stochastic walker moving on a network.

A state of a stochastic walk is represented by a size n non-negative column vector \psi. Each element \langle i , \psi \rangle of this vector is the probability of the walker being found on node i.

A stochastic walk evolves in time according to the master equation

\displaystyle{ \frac{d}{d t} \psi(t)= - H \psi(t) }

where H is called the stochastic Hamiltonian. If the initial state is \psi(0) then the solution is written

\psi(t) = \exp(- t H) \psi(0)

The probabilities \langle i , \psi (t) \rangle are guaranteed to be non-negative and correctly normalized when the stochastic Hamiltonian H is infinitesimal stochastic.

So far, I have just presented what has been covered on Azimuth previously. However, to analyze the important uniform escape model we need to go beyond the class of (Dirichlet) generators that produce both quantum and stochastic walks. Further, we have to somehow find a related quantum walk. We’ll see below that both tasks are achieved by considering the normalized Laplacians: one generating the uniform escape stochastic walk and the other a related quantum walk.

Normalized Laplacians

The two normalized Laplacians are:

• the asymmetric normalized Laplacian S = L D^{-1} (that generates the uniform escape Stochastic walk) and

• the symmetric normalized Laplacian Q = D^{-1/2} L D^{-1/2} (that generates a Quantum walk).

For the graph drawn above the asymmetric normalized Laplacian S is

\left( \begin{matrix} 1 & -1/2 & 0 & -1/3 & 0 \\ -1/2 & 1 & -1/3 & 0 & 0 \\ 0 & -1/2 & 1 & -1/3 & -1/2 \\ -1/2 & 0 & -1/3 & 1 & -1/2 \\ 0 & 0 & -1/3 & -1/3 & 1 \end{matrix} \right)

The identical diagonal elements indicates that the total rates of leaving each node are identical, and the equality within each column of the other non-zero elements indicates that the walker is equally likely to hop to any node connected to its current node. This is the uniform escape model!

For the same graph the symmetric normalized Laplacian Q is

\left( \begin{matrix} 1 & -1/2 & 0 & -1/\sqrt{6}  & 0 \\ -1/2 & 1 & -1/\sqrt{6}  & 0 & 0 \\ 0 & -1/\sqrt{6}  & 1 & -1/3 & -1/\sqrt{6}  \\ -1/\sqrt{6} & 0 & -1/3 & 1 & -1/\sqrt{6}  \\ 0 & 0 & -1/\sqrt{6}  & -1/\sqrt{6}  & 1 \end{matrix} \right)

That the diagonal elements are identical in the quantum case indicates that all nodes are of equal energy, this is type of quantum walk usually considered.

Puzzle 1. Show that in general S is infinitesimal stochastic but not self-adjoint.

Puzzle 2. Show that in general Q is self-adjoint but not infinitesimal stochastic.

So a graph defines two matrices: one S that generates a stochastic walk, and one Q that generates a quantum walk. The natural question to ask is whether these walks are related. The answer is that they are!

Underpinning this relationship is the mathematical property that S and Q are similar. They are related by the following similarity transformation

S = D^{1/2} Q D^{-1/2}

which means that any eigenvector \phi_k of Q associated to eigenvalue \epsilon_k gives a vector

\pi_k \propto D^{1/2} \phi_k

that is an eigenvector of S with the same eigenvalue! To show this, insert the identity I = D^{-1/2} D^{1/2} into

Q \phi_k = \epsilon_k \phi_k

and multiply from the left with D^{1/2} to obtain

\begin{aligned} (D^{1/2}  Q D^{-1/2} ) (D^{1/2} \phi_k) &= \epsilon_k ( D^{1/2} \phi_k ) \\ S \pi_k &= \epsilon_k \pi_k  \end{aligned}

The same works in the opposite direction. Any eigenvector \pi_k of S gives an eigenvector

\phi_k \propto D^{-1/2} \pi_k

of Q with the same eigenvalue \epsilon_k.

The mathematics is particularly nice because Q is self-adjoint. A self-adjoint matrix is diagonalizable, and has real eigenvalues and orthogonal eigenvectors.

As a result, the symmetric normalized Laplacian can be decomposed as

Q = \sum_k \epsilon_k \Phi_k

where \epsilon_k is real and \Phi_k are orthogonal projectors. Each \Phi_k acts as the identity only on vectors in the space spanned by \phi_k and as zero on all others, such that

\Phi_k \Phi_\ell = \delta_{k \ell} \Phi_k.

Multiplying from the left by D^{1/2} and the right by D^{-1/2} results in a similar decomposition for S:

S = \sum_k \epsilon_k \Pi_k

with orthogonal projectors

\Pi_k  = D^{1/2} \Phi_k D^{-1/2}

I promised above that I would explain the following diagram:

Diagram outlining the main concepts (again)

Let’s summarize what it represents now:

G is a simple graph that specifies

A the adjacency matrix (generator of a quantum walk), which subtracted from

D the diagonal matrix of the degrees gives

L the symmetric Laplacian (generator of stochastic and quantum walks), which when normalized by D returns both

S the generator of the uniform escape stochastic walk and

Q the quantum walk generator to which it is similar!

What next?

Sadly, this is where we’ll finish for now.

We have all the ingredients necessary to study the walks generated by the normalized Laplacians and exploit the relationship between them.

Next time, in part 2, I’ll talk you through the mathematics of the uniform escape stochastic walk S and how it connects to the degrees of the nodes in the long-time limit. Then I’ll show you how this helps us solve aspects of the quantum walk generated by Q.

In other news

Before I leave you, let me tell you about a workshop the ISI team recently attended (in fact helped organize) at the Institute of Quantum Computing, on the topic of quantum computation and complex networks. Needless to say, there were talks on papers related to quantum mechanics and networks!

Some researchers at the workshop gave exciting talks based on numerical examinations of what happens if a quantum walk is used instead of a stochastic walk to rank the nodes of a network:

• Giuseppe Davide Paparo and Miguel Angel Martín-Delgado, Google in a quantum network, Sci. Rep. 2 (2012), 444.

• Eduardo Sánchez-Burillo, Jordi Duch, Jesús Gómez-Gardenes and David Zueco, Quantum navigation and ranking in complex networks, Sci. Rep. 2 (2012), 605.

Others attending the workshop have numerically examined what happens when using quantum computers to represent the stationary state of a stochastic process:

• Silvano Garnerone, Paolo Zanardi and Daniel A. Lidar, Adiabatic quantum algorithm for search engine ranking, Phys. Rev. Lett. 108 (2012), 230506.

It was a fun workshop and we plan to organize/attend more in the future!

Quantum Techniques for Reaction Networks

11 June, 2013

Fans of the network theory series might like to look at this paper:

• John Baez, Quantum techniques for reaction networks.

and I would certainly appreciate comments and corrections.

This paper tackles a basic question we never got around to discussing: how the probabilistic description of a system where bunches of things randomly interact and turn into other bunches of things can reduce to a deterministic description in the limit where there are lots of things!

Mathematically, such systems are given by ‘stochastic Petri nets’, or if you prefer, ‘stochastic reaction networks’. These are just two equivalent pictures of the same thing. For example, we could describe some chemical reactions using this Petri net:

but chemists would use this reaction network:

C + O2 → CO2
CO2 + NaOH → NaHCO3
NaHCO3 + HCl → H2O + NaCl + CO2

Making either of them ‘stochastic’ merely means that we specify a ‘rate constant’ for each reaction, saying how probable it is.

For any such system we get a ‘master equation’ describing how the probability of having any number of things of each kind changes with time. In the class I taught on this last quarter, the students and I figured out how to derive from this an equation saying how the expected number of things of each kind changes with time. Later I figured out a much slicker argument… but either way, we get this result:

Theorem. For any stochastic reaction network and any stochastic state \Psi(t) evolving in time according to the master equation, then

\displaystyle{ \frac{d}{dt} \langle N \Psi(t) \rangle } =  \displaystyle{\sum_{\tau \in T}} \, r(\tau) \,  (s(\tau) - t(\tau)) \;  \left\langle N^{\underline{s(\tau)}}\, \Psi(t) \right\rangle

assuming the derivative exists.

Of course this will make no sense yet if you haven’t been following the network theory series! But I explain all the notation in the paper, so don’t be scared. The main point is that \langle N \Psi(t) \rangle is a vector listing the expected number of things of each kind at time t. The equation above says how this changes with time… but it closely resembles the ‘rate equation’, which describes the evolution of chemical systems in a deterministic way.

And indeed, the next big theorem says that the master equation actually implies the rate equation when the probability of having various numbers of things of each kind is given by a product of independent Poisson distributions. In this case \Psi(t) is what people in quantum physics call a ‘coherent state’. So:

Theorem. Given any stochastic reaction network, let
\Psi(t) be a mixed state evolving in time according to the master equation. If \Psi(t) is a coherent state when t = t_0, then \langle N \Psi(t) \rangle obeys the rate equation when t = t_0.

In most cases, this only applies exactly at one moment of time: later \Psi(t) will cease to be a coherent state. Then we must resort to the previous theorem to see how the expected number of things of each kind changes with time.

But sometimes our state \Psi(t) will stay coherent forever! For one case where this happens, see the companion paper, which I blogged about a little while ago:

• John Baez and Brendan Fong, Quantum techniques for studying equilibrium in reaction networks.

We wrote this first, but logically it comes after the one I just finished now!

All this material will get folded into the book I’m writing with Jacob Biamonte. There are just a few remaining loose ends that need to be tied up.

Network Theory (Part 29)

23 April, 2013

I’m talking about electrical circuits, but I’m interested in them as models of more general physical systems. Last time we started seeing how this works. We developed an analogy between electrical circuits and physical systems made of masses and springs, with friction:

Electronics Mechanics
charge: Q position: q
current: I = \dot{Q} velocity: v = \dot{q}
flux linkage: \lambda momentum: p
voltage: V = \dot{\lambda} force: F = \dot{p}
inductance: L mass: m
resistance: R damping coefficient: r  
inverse capacitance: 1/C   spring constant: k

But this is just the first of a large set of analogies. Let me list some, so you can see how wide-ranging they are!

More analogies

People in system dynamics often use effort as a term to stand for anything analogous to force or voltage, and flow as a general term to stand for anything analogous to velocity or electric current. They call these variables e and f.

To me it’s important that force is the time derivative of momentum, and velocity is the time derivative of position. Following physicists, I write momentum as p and position as q. So, I’ll usually write effort as \dot{p} and flow as \dot{q}.

Of course, ‘position’ is a term special to mechanics; it’s nice to have a general term for the thing whose time derivative is flow, that applies to any context. People in systems dynamics seem to use displacement as that general term.

It would also be nice to have a general term for the thing whose time derivative is effort… but I don’t know one. So, I’ll use the word momentum.

Now let’s see the analogies! Let’s see how displacement q, flow \dot{q}, momentum p and effort \dot{p} show up in several subjects:

displacement:    q flow:      \dot q momentum:      p effort:           \dot p
Mechanics: translation position velocity momentum force
Mechanics: rotation angle angular velocity angular momentum torque
Electronics charge current flux linkage voltage
Hydraulics volume flow pressure momentum pressure
Thermal Physics entropy entropy flow temperature momentum temperature
Chemistry moles molar flow chemical momentum chemical potential

We’d been considering mechanics of systems that move along a line, via translation, but we can also consider mechanics for systems that turn round and round, via rotation. So, there are two rows for mechanics here.

There’s a row for electronics, and then a row for hydraulics, which is closely analogous. In this analogy, a pipe is like a wire. The flow of water plays the role of current. Water pressure plays the role of electrostatic potential. The difference in water pressure between two ends of a pipe is like the voltage across a wire. When water flows through a pipe, the power equals the flow times this pressure difference—just as in an electrical circuit the power is the current times the voltage across the wire.

A resistor is like a narrowed pipe:

An inductor is like a heavy turbine placed inside a pipe: this makes the water tend to keep flowing at the same rate it’s already flowing! In other words, it provides a kind of ‘inertia’ analogous
to mass.

A capacitor is like a tank with pipes coming in from both ends, and a rubber sheet dividing it in two lengthwise:

When studying electrical circuits as a kid, I was shocked when I first learned that capacitors don’t let the electrons through: it didn’t seem likely you could do anything useful with something like that! But of course you can. Similarly, this gizmo doesn’t let the water through.

A voltage source is like a compressor set up to maintain a specified pressure difference between the input and output:

Similarly, a current source is like a pump set up to maintain a specified flow.

Finally, just as voltage is the time derivative of a fairly obscure quantity called ‘flux linkage’, pressure is the time derivative of an even more obscure quantity which has no standard name. I’m calling it ‘pressure momentum’, thanks to the analogy

momentum: force :: pressure momentum: pressure

Just as pressure has units of force per area, pressure momentum has units of momentum per area!

People invented this analogy back when they were first struggling to understand electricity, before electrons had been observed:

Hydraulic analogy, Wikipedia.

The famous electrical engineer Oliver Heaviside pooh-poohed this analogy, calling it the “drain-pipe theory”. I think he was making fun of William Henry Preece. Preece was another electrical engineer, who liked the hydraulic analogy and disliked Heaviside’s fancy math. In his inaugural speech as president of the Institution of Electrical Engineers in 1893, Preece proclaimed:

True theory does not require the abstruse language of mathematics to make it clear and to render it acceptable. All that is solid and substantial in science and usefully applied in practice, have been made clear by relegating mathematic symbols to their proper store place—the study.

According to the judgement of history, Heaviside made more progress in understanding electromagnetism than Preece. But there’s still a nice analogy between electronics and hydraulics. And I’ll eventually use the abstruse language of mathematics to make it very precise!

But now let’s move on to the row called ‘thermal physics’. We could also call this ‘thermodynamics’. It works like this. Say you have a physical system in thermal equilibrium and all you can do is heat it up or cool it down ‘reversibly’—that is, while keeping it in thermal equilibrium all along. For example, imagine a box of gas that you can heat up or cool down. If you put a tiny amount dE of energy into the system in the form of heat, then its entropy increases by a tiny amount dS. And they’re related by this equation:

dE = TdS

where T is the temperature.

Another way to say this is

\displaystyle{ \frac{dE}{dt} = T \frac{dS}{dt} }

where t is time. On the left we have the power put into the system in the form of heat. But since power should be ‘effort’ times ‘flow’, on the right we should have ‘effort’ times ‘flow’. It makes some sense to call dS/dt the ‘entropy flow’. So temperature, T, must play the role of ‘effort’.

This is a bit weird. I don’t usually think of temperature as a form of ‘effort’ analogous to force or torque. Stranger still, our analogy says that ‘effort’ should be the time derivative of some kind of ‘momentum’, So, we need to introduce temperature momentum: the integral of temperature over time. I’ve never seen people talk about this concept, so it makes me a bit nervous.

But when we have a more complicated physical system like a piston full of gas in thermal equilibrium, we can see the analogy working. Now we have

dE = TdS - PdV

The change in energy dE of our gas now has two parts. There’s the change in heat energy TdS, which we saw already. But now there’s also the change in energy due to compressing the piston! When we change the volume of the gas by a tiny amount dV, we put in energy -PdV.

Now look back at the first chart I drew! It says that pressure is a form of ‘effort’, while volume is a form of ‘displacement’. If you believe that, the equation above should help convince you that temperature is also a form of effort, while entropy is a form of displacement.

But what about the minus sign? That’s no big deal: it’s the result of some arbitrary conventions. P is defined to be the outward pressure of the gas on our piston. If this is positive, reducing the volume of the gas takes a positive amount of energy, so we need to stick in a minus sign. I could eliminate this minus sign by changing some conventions—but if I did, the chemistry professors at UCR would haul me away and increase my heat energy by burning me at the stake.

Speaking of chemistry: here’s how the chemistry row in the analogy chart works. Suppose we have a piston full of gas made of different kinds of molecules, and there can be chemical reactions that change one kind into another. Now our equation gets fancier:

\displaystyle{ dE = TdS - PdV + \sum_i  \mu_i dN_i }

Here N_i is the number of molecules of the ith kind, while \mu_i is a quantity called a chemical potential. The chemical potential simply says how much energy it takes to increase the number of molecules of a given kind. So, we see that chemical potential is another form of effort, while number of molecules is another form of displacement.

But chemists are too busy to count molecules one at a time, so they count them in big bunches called ‘moles’. A mole is the number of atoms in 12 grams of carbon-12. That’s roughly


atoms. This is called Avogadro’s constant. If we used 1 gram of hydrogen, we’d get a very close number called ‘Avogadro’s number’, which leads to lots of jokes:

(He must be desperate because he looks so weird… sort of like a mole!)

So, instead of saying that the displacement in chemistry is called ‘number of molecules’, you’ll sound more like an expert if you say ‘moles’. And the corresponding flow is called molar flow.

The truly obscure quantity in this row of the chart is the one whose time derivative is chemical potential! I’m calling it chemical momentum simply because I don’t know another name.

Why are linear and angular momentum so famous compared to pressure momentum, temperature momentum and chemical momentum?

I suspect it’s because the laws of physics are symmetrical
under translations and rotations. When the assumptions of Noether’s theorem hold, this guarantees that the total momentum and angular momentum of a closed system are conserved. Apparently the laws of physics lack the symmetries that would make the other kinds of momentum be conserved.

This suggests that we should dig deeper and try to understand more deeply how this chart is connected to ideas in classical mechanics, like Noether’s theorem or symplectic geometry. I will try to do that sometime later in this series.

More generally, we should try to understand what gives rise to a row in this analogy chart. Are there are lots of rows I haven’t talked about yet, or just a few? There are probably lots. But are there lots of practically important rows that I haven’t talked about—ones that can serve as the basis for new kinds of engineering? Or does something about the structure of the physical world limit the number of such rows?

Mildly defective analogies

Engineers care a lot about dimensional analysis. So, they often make a big deal about the fact that while effort and flow have different dimensions in different rows of the analogy chart, the following four things are always true:

pq has dimensions of action (= energy × time)
\dot{p} q has dimensions of energy
p \dot{q} has dimensions of energy
\dot{p} \dot{q} has dimensions of power (= energy / time)

In fact any one of these things implies all the rest.

These facts are important when designing ‘mixed systems’, which combine different rows in the chart. For example, in mechatronics, we combine mechanical and electronic elements in a single circuit! And in a hydroelectric dam, power is converted from hydraulic to mechanical and then electric form:

One goal of network theory should be to develop a unified language for studying mixed systems! Engineers have already done most of the hard work. And they’ve realized that thanks to conservation of energy, working with pairs of flow and effort variables whose product has dimensions of power is very convenient. It makes it easy to track the flow of energy through these systems.

However, people have tried to extend the analogy chart to include ‘mildly defective’ examples where effort times flow doesn’t have dimensions of power. The two most popular are these:

displacement:    q flow:      \dot q momentum:      p effort:           \dot p
Heat flow heat heat flow temperature momentum temperature
Economics inventory product flow economic momentum product price

The heat flow analogy comes up because people like to think of heat flow as analogous to electrical current, and temperature as analogous to voltage. Why? Because an insulated wall acts a bit like a resistor! The current flowing through a resistor is a function the voltage across it. Similarly, the heat flowing through an insulated wall is about proportional to the difference in temperature between the inside and the outside.

However, there’s a difference. Current times voltage has dimensions of power. Heat flow times temperature does not have dimensions of power. In fact, heat flow by itself already has dimensions of power! So, engineers feel somewhat guilty about this analogy.

Being a mathematical physicist, a possible way out presents itself to me: use units where temperature is dimensionless! In fact such units are pretty popular in some circles. But I don’t know if this solution is a real one, or whether it causes some sort of trouble.

In the economic example, ‘energy’ has been replaced by ‘money’. So other words, ‘inventory’ times ‘product price’ has units of money. And so does ‘product flow’ times ‘economic momentum’! I’d never heard of economic momentum before I started studying these analogies, but I didn’t make up that term. It’s the thing whose time derivative is ‘product price’. Apparently economists have noticed a tendency for rising prices to keep rising, and falling prices to keep falling… a tendency toward ‘conservation of momentum’ that doesn’t fit into their models of rational behavior.

I’m suspicious of any attempt to make economics seem like physics. Unlike elementary particles or rocks, people don’t seem to be very well modelled by simple differential equations. However, some economists have used the above analogy to model economic systems. And I can’t help but find that interesting—even if intellectually dubious when taken too seriously.

An auto-analogy

Beside the analogy I’ve already described between electronics and mechanics, there’s another one, called ‘Firestone’s analogy’:

• F.A. Firestone, A new analogy between mechanical and electrical systems, Journal of the Acoustical Society of America 4 (1933), 249–267.

Alain Bossavit pointed this out in the comments to Part 27. The idea is to treat current as analogous to force instead of velocity… and treat voltage as analogous to velocity instead of force!

In other words, switch your p’s and q’s:

Electronics Mechanics          (usual analogy) Mechanics      (Firestone’s analogy)
charge position: q momentum: p
current velocity: \dot{q} force: \dot{p}
flux linkage momentum: p position: q
voltage force: \dot{p} velocity: \dot{q}

This new analogy is not ‘mildly defective’: the product of effort and flow variables still has dimensions of power. But why bother with another analogy?

It may be helpful to recall this circuit from last time:

It’s described by this differential equation:

L \ddot{Q} + R \dot{Q} + C^{-1} Q = V

We used the ‘usual analogy’ to translate it into classical mechanics problem, and we got a problem where an object of mass L is hanging from a spring with spring constant 1/C and damping coefficient R, and feeling an additional external force F:

m \ddot{q} + r \dot{q} + k q = F

And that’s fine. But there’s an intuitive sense in which all three forces are acting ‘in parallel’ on the mass, rather than in series. In other words, all side by side, instead of one after the other.

Using Firestone’s analogy, we get a different classical mechanics problem, where the three forces are acting in series. The spring is connected to source of friction, which in turn is connected to an external force.

This may seem a bit mysterious. But instead of trying to explain it, I’ll urge you to read his paper, which is short and clearly written. I instead want to make a somewhat different point, which is that we can take a mechanical system, convert it to an electrical one following the usual analogy, and then convert back to a mechanical one using Firestone’s analogy. This gives us an ‘auto-analogy’ between mechanics and itself, which switches p and q.

And although I haven’t been able to figure out why from Firestone’s paper, I have other reasons for feeling sure this auto-analogy should contain a minus sign. For example:

p \mapsto q, \qquad q \mapsto -p

In other words, it should correspond to a 90° rotation in the (p,q) plane. There’s nothing sacred about whether we rotate clockwise or counterclockwise; we can equally well do this:

p \mapsto -q, \qquad q \mapsto p

But we need the minus sign to get a so-called symplectic transformation of the (p,q) plane. And from my experience with classical mechanics, I’m pretty sure we want that. If I’m wrong, please let me know!

I have a feeling we should revisit this issue when we get more deeply into the symplectic aspects of circuit theory. So, I won’t go on now.


The analogies I’ve been talking about are studied in a branch of engineering called system dynamics. You can read more about it here:

• Dean C. Karnopp, Donald L. Margolis and Ronald C. Rosenberg, System Dynamics: a Unified Approach, Wiley, New York, 1990.

• Forbes T. Brown, Engineering System Dynamics: a Unified Graph-Centered Approach, CRC Press, Boca Raton, 2007.

• Francois E. Cellier, Continuous System Modelling, Springer, Berlin, 1991.

System dynamics already uses lots of diagrams of networks. One of my goals in weeks to come is to explain the category theory lurking behind these diagrams.

Petri Net Programming (Part 3)

19 April, 2013

guest post by David Tanzer

The role of differential equations

Last time we looked at stochastic Petri nets, which use a random event model for the reactions. Individual entities are represented by tokens that flow through the network. When the token counts get large, we observed that they can be approximated by continuous quantities, which opens the door to the application of continuous mathematics to the analysis of network dynamics.

A key result of this approach is the “rate equation,” which gives a law of motion for the expected sizes of the populations. Equilibrium can then be obtained by solving for zero motion. The rate equations are applied in chemistry, where they give the rates of change of the concentrations of the various species in a reaction.

But before discussing the rate equation, here I will talk about the mathematical form of this law of motion, which consists of differential equations. This form is naturally associated with deterministic systems involving continuous magnitudes. This includes the equations of motion for the sine wave:

and the graceful ellipses that are traced out by the orbits of the planets around the sun:

This post provides some mathematical context to programmers who have not worked on scientific applications. My goal is to get as many of you on board as possible, before setting sail with Petri net programming.

Three approaches to equations: theoretical, formula-based, and computational

Let’s first consider the major approaches to equations in general. We’ll illustrate with a Diophantine equation

x^9 + y^9 + z^9 = 2

where x, y and z are integer variables.

In the theoretical approach (aka “qualitative analysis”), we start with the meaning of the equation and then proceed to reason about its solutions. Here are some simple consequences of this equation. They can’t all be zero, can’t all be positive, can’t all be negative, can’t all be even, and can’t all be odd.

In the formula-based approach, we seek formulas to describe the solutions. Here is an example of a formula (which does not solve our equation):

\{(x,y,z) | x = n^3, y = 2n - 4, z = 4 n | 1 \leq n \leq 5 \}

Such formulas are nice to have, but the pursuit of them is diabolically difficult. In fact, for Diophantine equations, even the question of whether an arbitrarily chosen equation has any solutions whatsoever has been proven to be algorithmically undecidable.

Finally, in the computational approach, we seek algorithms to enumerate or numerically approximate the solutions to the equations.

The three approaches to differential equations

Let’s apply the preceding classification to differential equations.

Theoretical approach

A differential equation is one that constrains the rates at which the variables are changing. This can include constraints on the rates at which the rates are changing (second-order equations), etc. The equation is ordinary if there is a single independent variable, such as time, otherwise it is partial.

Consider the equation stating that a variable increases at a rate equal to its current value. The bigger it gets, the faster it increases. Given a starting value, this determines a process — the solution to the equation — which here is exponential growth.

Let X(t) be the value at time t, and let’s initialize it to 1 at time 0. So we have:

X(0) = 1

X'(t) = X(t)

These are first-order equations, because the derivative is applied at most once to any variable. They are linear equations, because the terms on each side of the equations are linear combinations of either individual variables or derivatives (in this case all of the coefficients are 1). Note also that a system of differential equations may in general have zero, one, or multiple solutions. This example belongs to a class of equations which are proven to have a unique solution for each initial condition.

You could imagine more complex systems of equations, involving multiple dependent variables, all still depending on time. That includes the rate equations for a Petri net, which have one dependent variable for each of the population sizes. The ideas for such systems are an extension of the ideas for a single-variable system. Then, a state of the system is a vector of values, with one component for each of the dependent variables. For first-order systems, such as the rate equations, where the derivatives appear on the left-hand sides, the equations determine, for each possible state of the system, a “direction” and rate of change for the state of the system.

Now here is a simple illustration of what I called the theoretical approach. Can X(t) ever become negative? No, because it starts out positive at time 0, and in order to later become negative, it must be decreasing at a time t_1 when it is still positive. That is to say, X(t_1) > 0, and X'(t_1) < 0. But that contradicts the assumption X'(t) = X(t). The general lesson here is that we don’t need a solution formula in order to make such inferences.

For the rate equations, the theoretical approach leads to substantial theorems about the existence and structure of equilibrium solutions.

Formula-based approach

It is natural to look for concise formulas to solve our equations, but the results of this overall quest are largely negative. The exponential differential equation cannot be solved by any formula that involves a finite combination of simple operations. So the solution function must be treated as a new primitive, and given a name, say \exp(t). But even when we extend our language to include this new symbol, there are many differential equations that remain beyond the reach of finite formulas. So an endless collection of primitive functions is called for. (As standard practice, we always include exp(t), and its complex extensions to the trigonometric functions, as primitives in our toolbox.)

But the hard mathematical reality does not end here, because even when solution formulas do exist, finding them may call for an ace detective. Only for certain classes of differential equations, such as the linear ones, do we have systematic solution methods.

The picture changes, however, if we let the formulas contain an infinite number of operations. Then the arithmetic operators give a far-reaching base for defining new functions. In fact, as you can verify, the power series

X(t) = 1 + t + t^2/2! + t^3/3! + ...

which we view as an “infinite polynomial” over the time parameter t, exactly satisfies our equations for exponential motion, X(0) = 1 and X'(t) = X(t). This power series therefore defines \exp(t). By the way, applying it to the input 1 produces a definition for the transcendental number e:

e = X(1) = 1 + 1 + 1/2 + 1/6 + 1/24 + 1/120 + ... \approx 2.71828

Computational approach

Let’s leave aside our troubles with formulas, and consider the computational approach. For broad classes of differential equations, there are approximation algorithms that be successfully applied.

For starters, any power series that satisfies a differential equation may work for a simple approximation method. If a series is known to converge over some range of inputs, then one can approximate the value at those points by stopping the computation after a finite number of terms.

But the standard methods work directly with the equations, provided that they can be put into the right form. The simplest one is called Euler’s method. It works over a sampling grid of points separated by some small number \epsilon. Let’s take the case where we have a first-order equation in explicit form, which means that X'(t) = f(X(t)) for some function f.

We begin with the initial value X(0). Applying f, we get X'(0) = f(X(0)). Then for the interval from 0 to \epsilon,we use a linear approximation for X(t), by assuming that the derivative remains constant at X'(0). That gives X(\epsilon) = X(0) + \epsilon \cdot X'(0). Next, X'(\epsilon) = f(X(\epsilon)), and X(2 \epsilon) = X(\epsilon) + \epsilon \cdot X'(\epsilon), etc. Formally,

X(0) = \textrm{initial}

X((n+1) \epsilon) = X(n \epsilon) + \epsilon f(X(n \epsilon))

Applying this to our exponential equation, where f(X(t)) = X(t), we get:

X(0) = 1

X((n+1) \epsilon) = X(n \epsilon) + \epsilon X(n \epsilon) = X(n \epsilon) (1 + \epsilon)


X(n \epsilon) = (1 + \epsilon) ^ n

So the approximation method gives a discrete exponential growth, which converges to a continuous exponential in the limit as the mesh size goes to zero.

Note, the case we just considered has more generality than might appear at first, because (1) the ideas here are easily extended to systems of explicit first order equations, and (2) higher-order equations that are “explicit” in an extended sense—meaning that the highest-order derivative is expressed as a function of time, of the variable, and of the lower-order derivatives—can be converted into an equivalent system of explicit first-order equations.

The challenging world of differential equations

So, is our cup half-empty or half-full? We have no toolbox of primitive formulas for building the solutions to all differential equations by finite compositions. And even for those which can be solved by formulas, there is no general method for finding the solutions. That is how the cookie crumbles. But on the positive side, there is an array of theoretical tools for analyzing and solving important classes of differential equations, and numerical methods can be applied in many cases.

The study of differential equations leads to some challenging problems, such as the Navier-Stokes equations, which describe the flow of fluids.

These are partial differential equations involving flow velocity, pressure, density and external forces (such as gravity), all of which vary over space and time. There are non-linear (multiplicative) interactions between these variables and their spatial and temporal derivatives, which leads to complexity in the solutions.

At high flow rates, this complexity can produce chaotic solutions, which involve complex behavior at a wide range of resolution scales. This is turbulence. Here is an insightful portrait of turbulence, by Leonardo da Vinci, whose studies in turbulence date back to the 15th Century.

Turbulence, which has been described by Richard Feynman as the most important unsolved problem of classical physics, also presents a mathematical puzzle. The general existence of solutions to the Navier-Stokes equations remains unsettled. This is one of the “Millennium Prize Problems”, for which a one million dollar prize is offered: in three dimensions, given initial values for the velocity and scalar fields, does there exist a solution that is smooth and everywhere defined? There are also complications with grid-based numerical methods, which will fail to produce globally accurate results if the solutions contain details at a smaller scale than the grid mesh. So the ubiquitous phenomenon of turbulence, which is so basic to the movements of the atmosphere and the seas, remains an open case.

But fortunately we have enough traction with differential equations to proceed directly with the rate equations for Petri nets. There we will find illuminating equations, which are the subject of both major theorems and open problems. They are non-linear and intractable by formula-based methods, yet, as we will see, they are well handled by numerical methods.

Network Theory (Part 28)

10 April, 2013

Last time I left you with some puzzles. One was to use the laws of electrical circuits to work out what this one does:

If we do this puzzle, and keep our eyes open, we’ll see an analogy between electrical circuits and classical mechanics! And this is the first of a huge set of analogies. The same math shows up in many different subjects, whenever we study complex systems made of interacting parts. So, it should become part of any general theory of networks.

This simple circuit is very famous: it’s called a series RLC circuit, because it has a resistor of resistance R, an inductor of inductance L, and a capacitor of capacitance C, all hooked up ‘in series’, meaning one after another. But understand this circuit, it’s good to start with an even simpler one, where we leave out the voltage source:

This has three edges, so reading from top to bottom there are 3 voltages V_1, V_2, V_3, and 3 currents I_1, I_2, I_3, one for each edge. The white and black dots are called ‘nodes’, and the white ones are called ‘terminals’: current can flow in or out of those.

The voltages and currents obey a bunch of equations:

• Kirchhoff’s current law says the current flowing into each node that’s not a terminal equals the current flowing out:

I_1 = I_2 = I_3

• Kirchhoff’s voltage law says there are potentials \phi_0, \phi_1, \phi_2, \phi_3, one for each node, such that:

V_1 = \phi_0 - \phi_1

V_2 = \phi_1 - \phi_2

V_3 = \phi_2 - \phi_3

In this particular problem, Kirchhoff’s voltage law doesn’t say much, since we can always find potentials obeying this, given the voltages. But in other problems it can be important. And even here it suggests that the sum V_1 + V_2 + V_3 will be important; this is the ‘total voltage across the circuit’.

Next, we get one equation for each circuit element:

• The law for a resistor says:

V_1 = R I_1

The law for a inductor says:

\displaystyle{ V_2 = L \frac{d I_2}{d t} }

The law for a capacitor says:

\displaystyle{ I_3 = C \frac{d V_3}{d t} }

These are all our equations. What should we do with them? Since I_1 = I_2 = I_3, it makes sense to call all these currents simply I and solve for each voltage in terms of this. Here’s what we get:

V_1 = R I

\displaystyle{ V_2 = L \frac{d I}{d t} }

\displaystyle {V_3 = C^{-1} \int I \, dt }

So, if we know the current flowing through the circuit we can work out the voltage across each circuit element!

Well, not quite: in the case of the capacitor we only know it up to a constant, since there’s a constant of integration. This may seem like a minor objection, but it’s worth taking seriously. The point is that the charge on the capacitor’s plate is proportional to the voltage across the capacitor:

\displaystyle{V_3 = C^{-1} Q }

When electrons move on or off the plate, this charge changes, and we get a current:

\displaystyle{I = \frac{d Q}{d t} }

So, we can work out the time derivative of V_3 from the current I, but to work out V_3 itself we need the charge Q.

Treat these as definitions if you like, but they’re physical facts too! And they let us rewrite our trio of equations:

V_1 = R I

\displaystyle{ V_2 = L \frac{d I}{d t} }

\displaystyle{V_3 = C^{-1} \int I \, dt }

in terms of the charge, as follows:

V_1 = R \dot{Q}

V_2 = L \ddot{Q}

V_3 = C^{-1} Q

Then if we add these three equations, we get

V_1 + V_2 + V_3 = L \ddot Q + R \dot Q + C^{-1} Q

So, if we define the total voltage by

V = V_1 + V_2 + V_3 = \phi_0 - \phi_3

we get

L \ddot Q + R \dot Q + C^{-1} Q = V

And this is great!

Why? Because this equation is famous! If you’re a mathematician, you know it as the most general second-order linear ordinary differential equation with constant coefficients. But if you’re a physicist, you know it as the damped driven oscillator.

The analogy between electronics and mechanics

Here’s an example of a damped driven oscillator:

We’ve got an object hanging from a spring with some friction, and an external force pulling it down. Here the external force is gravity, so it’s constant in time, but we can imagine fancier situations where it’s not. So in a general damped driven oscillator:

• the object has mass m (and the spring is massless),

• the spring constant is k (this says how strong the spring force is),

• the damping coefficient is r (this says how much friction there is),

• the external force is F (in general a function of time).

Then Newton’s law says

m \ddot{q} + r \dot{q} + k q = F

And apart from the use of different letters, this is exactly like the equation for our circuit! Remember, that was

L \ddot Q + R \dot Q + C^{-1} Q = V

So, we get a wonderful analogy relating electronics and mechanics! It goes like this:

Electronics Mechanics
charge: Q position: q
current: I = \dot{Q} velocity: v = \dot{q}
voltage: V force: F
inductance: L mass: m
resistance: R damping coefficient: r  
inverse capacitance: 1/C   spring constant: k

If you understand mechanics, you can use this to get intuition about electronics… or vice versa. I’m more comfortable with mechanics, so when I see this circuit:

I imagine a current of electrons whizzing along, ‘forced’ by the voltage across the circuit, getting slowed by the ‘friction’ of the resistor, wanting to continue their motion thanks to the inertia or ‘mass’ of the inductor, and getting stuck on the plate of the capacitor, where their mutual repulsion pushes back against the flow of current—just like a spring fights back when you pull on it! This lets me know how the circuit will behave: I can use my mechanical intuition.

The only mildly annoying thing is that the inverse of the capacitance C is like the spring constant k. But this makes perfect sense. A capacitor is like a spring: you ‘pull’ on it with voltage and it ‘stretches’ by building up electric charge on its plate. If its capacitance is high, it’s like a easily stretchable spring. But this means the corresponding spring constant is low.

Besides letting us transfer intuition and techniques, the other great thing about analogies is that they suggest ways of extending themselves. For example, we’ve seen that current is the time derivative of charge. But if we hadn’t, we could still have guessed it, because current is like velocity, which is the time derivative of something important.

Similarly, force is analogous to voltage. But force is the time derivative of momentum! We don’t have momentum on our chart. Our chart is also missing the thing whose time derivative is voltage. This thing is called flux linkage, and sometimes denotes \lambda. So we should add this, and momentum, to our chart:

Electronics Mechanics
charge: Q position: q
current: I = \dot{Q} velocity: v = \dot{q}
flux linkage: \lambda momentum: p
voltage: V = \dot{\lambda} force: F = \dot{p}
inductance: L mass: m
resistance: R damping coefficient: r  
inverse capacitance: 1/C   spring constant: k

Fourier transforms

But before I get carried away talking about analogies, let’s try to solve the equation for our circuit:

L \ddot Q + R \dot Q + C^{-1} Q = V

This instantly tells us the voltage V as a function of time if we know the charge Q as a function of time. So, ‘solving’ it means figuring out Q if we know V. You may not care about Q—it’s the charge of the electrons stuck on the capacitor—but you should certainly care about the current I = \dot{Q}, and figuring out Q will get you that.

Besides, we’ll learn something good from solving this equation.

We could solve it using either the Laplace transform or the Fourier transform. They’re very similar. For some reason electrical engineers prefer the Laplace transform—does anyone know why? But I think the Fourier transform is conceptually preferable, slightly, so I’ll use that.

The idea is to write any function of time as a linear combination of oscillating functions \exp(i\omega t) with different frequencies \omega. More precisely, we write our function f as an integral

\displaystyle{ f(t) = \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^\infty \hat{f}(\omega) e^{i\omega t} \, d\omega }

Here the function \hat{f} is called the Fourier transform of f, and it’s given by

\displaystyle{ \hat{f}(\omega) = \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^\infty f(t) e^{-i\omega t} \, dt }

There is a lot one could say about this, but all I need right now is that differentiating a function has the effect of multiplying its Fourier transform by i\omega. To see this, we simply take the Fourier transform of \dot{f}:

\begin{array}{ccl}  \hat{\dot{f}}(\omega) &=& \displaystyle{  \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^\infty \frac{df(t)}{dt} \, e^{-i\omega t} \, dt } \\  \\  &=& \displaystyle{ -\frac{1}{\sqrt{2 \pi}} \int_{-\infty}^\infty f(t) \frac{d}{dt} e^{-i\omega t} \, dt } \\  \\  &=& \displaystyle{ i\omega \; \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^\infty f(t) e^{-i\omega t} \, dt } \\  \\  &=& i\omega \hat{f}(\omega) \end{array}

where in the second step we integrate by parts. So,

\hat{\dot{f}}(\omega) = i\omega \hat{f}(\omega)

The Fourier transform is linear, too, so we can start with our differential equation:

L \ddot Q + R \dot Q + C^{-1} Q = V

and take the Fourier transform of each term, getting

\displaystyle{ \left((i\omega)^2 L + (i\omega) R + C^{-1}\right) \hat{Q}(\omega) = \hat{V}(\omega) }

We can now solve for the charge in a completely painless way:

\displaystyle{  \hat{Q}(\omega) =  \frac{1}{((i\omega)^2 L + (i\omega) R + C^{-1})} \, \hat{V}(\omega) }

Well, we actually solved for \hat{Q} in terms of \hat{V}. But if we’re good at taking Fourier transforms, this is good enough. And it has a deep inner meaning.

To see its inner meaning, note that the Fourier transform of an oscillating function \exp(i \omega_0 t) is a delta function at the frequency \omega = \omega_0. This says that this oscillating function is purely of frequency \omega_0, like a laser beam of one pure color, or a sound of one pure pitch.

Actually there’s a little fudge factor due to how I defined the Fourier transform: if

f(t) = e^{i\omega_0 t}


\displaystyle{ \hat{f}(\omega) = \sqrt{2 \pi} \, \delta(\omega - \omega_0) }

But it’s no big deal. (You can define your Fourier transform so the 2\pi doesn’t show up here, but it’s bound to show up somewhere.)

Also, you may wonder how the complex numbers got into the game. What would it mean to say the voltage is \exp(i \omega t)? The answer is: don’t worry, everything in sight is linear, so we can take the real or imaginary part of any equation and get one that makes physical sense.

Anyway, what does our relation

\displaystyle{  \hat{Q}(\omega) =  \frac{1}{((i\omega)^2 L + (i\omega) R + C^{-1})} \hat{V}(\omega) }

mean? It means that if we put an oscillating voltage of frequency \omega_0 across our circuit, like this:

V(t) = e^{i \omega_0 t}

then we’ll get an oscillating charge at the same frequency, like this:

\displaystyle{  Q(t) =  \frac{1}{((i\omega_0)^2 L + (i\omega_0) R + C^{-1})}  e^{i \omega_0 t}  }

To see this, just use the fact that the Fourier transform of \exp(i \omega_0 t) is essentially a delta function at \omega_0, and juggle the equations appropriately!

But the magnitude and phase of this oscillating charge Q(t) depends on the function

\displaystyle{ \frac{1}{((i\omega_0)^2 L + (i\omega_0) R + C^{-1})}  }

For example, Q(t) will be big when \omega_0 is near a pole of this function! We can use this to study the resonant frequency of our circuit.

The same idea works for many more complicated circuits, and other things too. The function up there is an example of a transfer function: it describes the response of a linear, time-invariant system to an input of a given frequency. Here the ‘input’ is the voltage and the ‘response’ is the charge.


Taking this idea to its logical conclusion, we can see inductors and capacitors as being resistors with a frequency-dependent, complex-valued resistance! This generalized resistance is called ‘impedance. Let’s see how it works.

Suppose we have an electrical circuit. Consider any edge e of this circuit:

• If our edge e is labelled by a resistor of resistance R:


V_e = R I_e

Taking Fourier transforms, we get

\hat{V}_e = R \hat{I}_e

so nothing interesting here: our resistor acts like a resistor of resistance R no matter what the frequency of the voltage and current are!

• If our edge e is labelled by an inductor of inductance L:


\displaystyle{ V_e = L \frac{d I_e}{d t} }

Taking Fourier transforms, we get

\hat{V}_e = (i\omega L) \hat{I}_e

This is interesting: our inductor acts like a resistor of resistance i \omega L when the frequency of the current and voltage is \omega. So, we say the ‘impedance’ of the inductor is i \omega L.

• If our edge e is labelled by a capacitor of capacitance C:

we have

\displaystyle{ I_e = C \frac{d V_e}{d t} }

Taking Fourier transforms, we get

\hat{I}_e = (i\omega C) \hat{V}_e


\displaystyle{ \hat{V}_e = \frac{1}{i \omega C} \hat{I_e} }

So, our capacitor acts like a resistor of resistance 1/(i \omega C) when the frequency of the current and voltage is \omega. We say the ‘impedance’ of the capacitor is 1/(i \omega L).

It doesn’t make sense to talk about the impedance of a voltage source or current source, since these circuit elements don’t give a linear relation between voltage and current. But whenever an element is linear and its properties don’t change with time, the Fourier transformed voltage will be some function of frequency times the Fourier transformed current. And in this case, we call that function the impedance of the element. The symbol for impedance is Z, so we have

\hat{V}_e(\omega) = Z(\omega) \hat{I}_e(\omega)


\hat{V}_e = Z \hat{I}_e

for short.

The big picture

In case you’re getting lost in the details, here are the big lessons for today:

• There’s a detailed analogy between electronics and mechanics, which we’ll later extend to many other systems.

• The study of linear time-independent elements can be reduced to the study of resistors if we generalize resistance to impedance by letting it be a complex-valued function instead of a real number.

One thing we’re doing is preparing for a general study of linear time-independent open systems. We’ll use linear algebra, but the field—the number system in our linear algebra—will consist of complex-valued functions, rather than real numbers.


Let’s not forget our original problem:

This is closely related to the problem we just solved. All the equations we derived still hold! But if you do the math, or use some intuition, you’ll see the voltage source ensures that the voltage we’ve been calling V is a constant. So, the current I flowing around the wire obeys the same equation we got before:

L \ddot Q + R \dot Q + C^{-1} Q = V

where \dot Q = I. The only difference is that now V is constant.

Puzzle. Solve this equation for Q(t).

There are lots of ways to do this. You could use a Fourier transform, which would give a satisfying sense of completion to this blog article. Or, you could do it some other way.

Network Theory (Part 27)

3 April, 2013

This quarter my graduate seminar at UCR will be about network theory. I have a few students starting work on this, so it seems like a good chance to think harder about the foundations of the subject. I’ve decided that bicategories of spans play a basic role, so I want to talk about those.

If you haven’t read the series up to now, don’t worry! Nothing I do for a while will rely on that earlier stuff. I want a fresh start. But just for a minute, I want to talk about the big picture: how the new stuff will relate to the old stuff.

So far this series has been talking about three closely related kinds of networks:

Markov processes
stochastic Petri nets
stochastic reaction networks

but there are many other kinds of networks, and I want to bring some more into play:

circuit diagrams
bond graphs
signal-flow graphs

These come from the world of control theory and engineering—especially electrical engineering, but also mechanical, hydraulic and other kinds of engineering.

My goal is not to tour different formalisms, but to integrate them into a single framework, so we can easily take ideas and theorems from one discipline and apply them to another.

For example, in Part 16 we saw that a special class of Markov processes can also be seen as a special class of circuit diagrams: namely, electrical circuits made of resistors. Also, in Part 17 we saw that stochastic Petri nets and stochastic reaction networks are just two different ways of talking about the same thing. This allows us to take results from chemistry—where they like stochastic reaction networks, which they call ‘chemical reaction networks’—and apply them to epidemiology, where they like stochastic Petri nets, which they call ‘compartmental models’.

As you can see, fighting through the thicket of terminology is half the battle here! The problem is that people in different applied subjects keep reinventing the same mathematics, using terminologies specific to their own interests… making it harder to see how generally applicable their work actually is. But we can’t blame them for doing this. It’s the job of mathematicians to step in, learn all this stuff, and extract the general ideas.

We can see a similar thing happening when writing was invented in ancient Mesopotamia, around 3000 BC. Different trades invented their own numbering systems! A base-60 system, the S system, was used to count most discrete objects, such as sheep or people. But for ‘rations’ such as cheese or fish, they used a base 120 system, the B system. Another system, the ŠE system, was used to measure quantities of grain. There were about a dozen such systems! Only later did they get standardized.

Circuit diagrams

But enough chit-chat; let’s get to work. I want to talk about circuit diagrams—diagrams of electrical circuits. They can get really complicated:

This is a 10-watt audio amplifier with bass boost. It looks quite intimidating. But I’ll start with a simple class of circuit diagrams, made of just a few kinds of parts:

• resistors,
• inductors,
• capacitors,
• voltage sources

and maybe some others later on. I’ll explain how you can translate any such diagram into a system of differential equations that describes how the voltages and currents along the wires change with time.

This is something you’d learn in a basic course on electrical engineering, at least back in the old days before analogue circuits had been largely replaced by digital ones. But my goal is different. I’m not mainly interested in electrical circuits per se: to me the important thing is how circuit diagrams provide a pictorial way of reasoning about differential equations… and how we can use the differential equations to describe many kinds of systems, not just electrical circuits.

So, I won’t spend much time explaining why electrical circuits do what they do—see the links for that. I’ll focus on the math of circuit diagrams, and how they apply to many different subjects, not just electrical circuits.

Let’s start with an example:

This describes a current flowing around a loop of wire with 4 elements on it: a resistor, an inductor, a capacitor, and a voltage source—for example, a battery. Each of these elements is designated by a cute symbol, and each has a real number associated to it:

• This is a resistor:

and it comes with a number R, called its resistance.

• This is an inductor:

and it comes with a number L, called its inductance.

• This is a capacitor:

and it comes with a number C, called its capacitance.

• This is a voltage source:

and it comes with a number V, called its voltage.

You may wonder why inductance got called L instead of I. Well, it’s probably because I stands for ‘current’. And then you’ll ask why current is called I instead of C. I don’t know: maybe because C stands for ‘capacitance’. If every word started with its own unique letter, we wouldn’t have these problems. But then we wouldn’t need words.

Here’s another example:

This example has two new features. First, it has places where wires meet, drawn as black dots. These dots are often called nodes, or sometimes vertices. Since ‘vertex’ starts with V and so does ‘voltage’, let’s call the dots ‘nodes’. Roughly speaking, a graph is a thing with nodes and edges, like this:

This suggests that in our circuit, the wires with elements on them should be seen as edges of a graph. Or perhaps just the wires should be seen as edges, and the elements should be seen as nodes! This is an example of a ‘design decision’ we have to make when formalizing the theory of circuit diagrams. There are also various different precise definitions of ‘graph’, and we need to try to choose the best one.

A second new feature of this example is that it has some white dots called terminals, where wires end. Mathematically these terminals are also vertices in our graph, but they play a special role: they are places where we are allowed to connect this circuit to another circuit. You’ll notice this circuit doesn’t have a voltage source. So, it’s like piece of electrical equipment without its own battery. We need to plug it in for it to do anything interesting!

This is very important. Big complicated electrical circuits are often made by hooking together smaller ones. The pieces are best thought of as ‘open systems’: that is, physical systems that interact with the outside world. Traditionally, a lot of physics focuses on ‘closed systems’, which don’t interact with the outside the world—the part of the world we aren’t modeling. But network theory is all about how we can connect open systems together to form larger open systems (or closed systems). And this is one place where category shows up. As we’ll see, we can think of an open system as a ‘morphism’ going from some inputs to some outputs, and we can ‘compose’ morphisms to get new morphisms by hooking them together.

Differential equations from circuit diagrams

Let me sketch how to get a bunch of ordinary differential equations from a circuit diagram. These equations will say what the circuit does.

We start with a graph having some set N of nodes and some set E of edges. To say how much current is flowing along each edge it will be helpful to give each edge a direction, like this:

So, define a graph to consist of two functions

s,t : E \to N

Then each edge e will have some vertex s(e) as its source, or starting-point, and some vertex t(e) as its target, or endpoint:

(This kind of graph is often called a directed multigraph or quiver, to distinguish it from other kinds, but I’ll just say ‘graph’.)

Next, each edge is labelled by one of four elements: resistor, capacitor, inductor or voltage source. It’s also labelled by a real number, which we call the resistance, capacitance, inductance or voltage of that element. We will make this part prettier later on, so we can easily introduce more kinds of elements without any trouble.

Finally, we specify a subset T \subseteq N and call these nodes terminals.

Our goal now is to write down some ordinary differential equations that say how a bunch of variables change with time. These variables come in two kinds:

• Each edge e has a current running along it, which is a function of time denoted I_e. So, for each e \in E we have a function

I_e : \mathbb{R} \to \mathbb{R}

• Each edge e also has a voltage across it, which is a function of time denoted V_e. So, for each e \in E we have a function

V_e : \mathbb{R} \to \mathbb{R}

We now write down a bunch of equations obeyed by these currents and voltages. First there are some equations called Kirchhoff’s laws:

Kirchhoff’s current law says that for each node that is not a terminal, the total current flowing into that node equals the total current flowing out. In other words:

\displaystyle{ \sum_{e: t(e) = n} I_e = \sum_{e: s(e) = n} I_e }

for each node n \in N - T. We don’t impose Kirchhoff’s current law at terminals, because we want to allow current to flow in or out there!

Kirchhoff’s voltage law says that we can choose for each node a potential \phi_n, which is a function of time:

\phi_n : \mathbb{R} \to \mathbb{R}

such that

V_e = \phi_{s(e)} - \phi_{t(e)}

for each e \in E. In other words, the voltage across each edge is the difference of potentials at the two ends of this edge. This is a slightly nonstandard way to state Kirchhoff’s voltage law, but it’s equivalent to the usual one.

In addition to Kirchhoff’s laws, there’s an equation for each edge, relating the current and voltage on that edge. The details of this equation depends on the element labelling that edge, so we consider the four cases in turn:

• If our edge e is labelled by a resistor of resistance R:

we write the equation

V_e = R I_e

This is called Ohm’s law.

• If our edge e is labelled by an inductor of inductance L:

we write the equation

\displaystyle{ V_e = L \frac{d I_e}{d t} }

I don’t know a name for this equation, but you can read about it here.

• If our edge e is labelled by a capacitor of capacitance C:

we write the equation

\displaystyle{ I_e = C \frac{d V_e}{d t} }

I don’t know a name for this equation, but you can read about it here.

• If our edge e is labelled by a voltage source of voltage V:

we write the equation

V_e = V

This explains the term ‘voltage source’.


Next time we’ll look at some examples and see how we can start polishing up this formalism into something more pretty. But you can get to work now:

Puzzle 1. Starting from the rules above, write down and simplify the equations for this circuit:

Puzzle 2. Do the same for this circuit:

Puzzle 3. If we added a fifth kind of element, our rules for getting equations from circuit diagrams would have more symmetry between voltages and currents. What is this extra element?

Network Theory for Economists

15 January, 2013

Tomorrow I’m giving a talk in the econometrics seminar at U.C. Riverside. I was invited to speak on my work on network theory, so I don’t feel too bad about the fact that I’ll be saying only a little about economics and practically nothing about econometrics. Still, I’ve tried to slant the talk in a way that emphasizes possible applications to economics and game theory. Here are the slides:

Network Theory.

For long-time readers here the fun comes near the end. I explain how reaction networks can be used to describe evolutionary games. I point out that in certain classes of evolutionary games, evolution tends to increase ‘fitness’, and/or lead the players to a ‘Nash equilibrium’. For precise theorems you’ll have to click the links in my talk and read the references!

I conclude with an example: a game with three strategies and 7 Nash equilibria. Here evolution makes the proportion of these three strategies follow these flow lines, at least in the limit of large numbers of players:

This picture is from William Sandholm’s nice expository paper:

• William H. Sandholm, Evolutionary game theory, 2007.

I mentioned it before in Information Geometry (Part 12), en route to showing a proof that some quantity always decreases in a class of evolutionary games. Sometime I want to tell the whole story linking:

reaction networks
evolutionary games
the 2nd law of thermodynamics


Fisher’s fundamental theorem of natural selection.

But not today! Think of these talk slides as a little appetizer.

Network Theory (Part 26)

15 January, 2013

Last time I described the reachability problem for reaction networks, which has the nice feature of connecting chemistry, category theory, and computation.

Near the end I raised a question. Luca Cardelli, who works on biology and computation at Microsoft, answered it.

His answer is interesting enough that I thought you should all read it. I imagined an arbitrary chemical reaction network and said:

We could try to use these reactions to build a ‘chemical computer’. But how powerful can such a computer be? I don’t know the answer.

Cardelli replied:

The answer to that question is known. Stochastic Petri Nets (and equivalently chemical reaction networks) are not Turing-powerful in the strict sense, essentially because of all the decidable properties of Petri Nets. However, they can approximate a Turing machine up to any fixed error bound, so they are ‘almost’ Turing-complete, or ‘Turing-complete-up-to-epsilon’. The error bound can be fixed independently of the length of the computation (which, being a Turing machine, is not going to be known ahead of time); in practice, that means progressively slowing down the computation to make it more accurate over time and to remain below the global error bound.

Note also that polymerization is a chemical operation that goes beyond the power of Stochastic Petri Nets and plain chemical reaction networks: if you can form unbounded polymers (like, e.g., DNA), you can use them as registers or tapes and obtain full Turing completeness, chemically (or, you might say ‘biochemically’ because that’s where the most interesting polymers are found). An unbounded polymer corresponds to an infinite set of reactions (a small set of reactions for each polymer length), i.e. to an ‘actually infinite program’ in the language of simple reaction networks. Infinite programs of course are no good for any notion of Turing computation, so you need to use a more powerful language for describing polymerization, that is, a language that has the equivalent of molecular binding/unbinding as a primitive. That kind of language can be found in Process Algebra.

So, in addition to the


connection, there is another connection between


And he provided a list of references. The correct answer to my question appeared first here:

• D. Soloveichik, M. Cook, E. Winfree and J. Bruck, Computation with finite stochastic chemical reaction networks, Natural Computing 7 (2008), 615–633.

contradicting an earlier claim here:

• M.O. Magnasco, Chemical kinetics is Turing universal, Phys. Rev. Lett. 78 (1997), 1190–1193.

Further work can be found here:

• Matthew Cook, David Soloveichik, Erik Winfree and Jehoshua Bruck Programmability of chemical reaction networks, in Algorithmic Bioprocesses, ed. Luca Cardelli, Springer, Berlin, 543–584, 2009.

and some of Cardelli’s own work is here:

• Gianluigi Zavattaro and Luca Cardelli, Termination problems in chemical kinetics, in CONCUR 2008 – Concurrency Theory, eds. Gianluigi Zavattaro and Luca Cardelli, Lecture Notes in Computer Science 5201, Springer, Berlin, 2008, pp. 477–491.

• Luca Cardelli and Gianluigi Zavattaro, Turing universality of the biochemical ground form, Math. Struct. Comp. Sci. 20 (2010), 45–73.

You can find lots more interesting work on Cardelli’s homepage.

Petri Net Programming (Part 2)

20 December, 2012

guest post by David A. Tanzer

An introduction to stochastic Petri nets

In the previous article, I explored a simple computational model called Petri nets. They are used to model reaction networks, and have applications in a wide variety of fields, including population ecology, gene regulatory networks, and chemical reaction networks. I presented a simulator program for Petri nets, but it had an important limitation: the model and the simulator contain no notion of the rates of the reactions. But these rates critically determine the character of the dynamics of network.

Here I will introduce the topic of ‘stochastic Petri nets,’ which extends the basic model to include reaction dynamics. Stochastic means random, and it is presumed that there is an underlying random process that drives the reaction events. This topic is rich in both its mathematical foundations and its practical applications. A direct application of the theory yields the rate equation for chemical reactions, which is a cornerstone of chemical reaction theory. The theory also gives algorithms for analyzing and simulating Petri nets.

We are now entering the ‘business’ of software development for applications to science. The business logic here is nothing but math and science itself. Our study of this logic is not an academic exercise that is tangential to the implementation effort. Rather, it is the first phase of a complete software development process for scientific programming applications.

The end goals of this series are to develop working code to analyze and simulate Petri nets, and to apply these tools to informative case studies. But we have some work to do en route, because we need to truly understand the models in order to properly interpret the algorithms. The key questions here are when, why, and to what extent the algorithms give results that are empirically predictive. We will therefore be embarking on some exploratory adventures into the relevant theoretical foundations.

The overarching subject area to which stochastic Petri nets belong has been described as stochastic mechanics in the network theory series here on Azimuth. The theme development here will partly parallel that of the network theory series, but with a different focus, since I am addressing a computationally oriented reader. For an excellent text on the foundations and applications of stochastic mechanics, see:

• Darren Wilkinson, Stochastic Modelling for Systems Biology, Chapman and Hall/CRC Press, Boca Raton, Florida, 2011.

Review of basic Petri nets

A Petri net is a graph with two kinds of nodes: species and transitions. The net is populated with a collection of ‘tokens’ that represent individual entities. Each token is attached to one of the species nodes, and this attachment indicates the type of the token. We may therefore view a species node as a container that holds all of the tokens of a given type.

The transitions represent conversion reactions between the tokens. Each transition is ‘wired’ to a collection of input species-containers, and to a collection of output containers. When it ‘fires’, it removes one token from each input container, and deposits one token to each output container.

Here is the example we gave, for a simplistic model of the formation and dissociation of H2O molecules:

The circles are for species, and the boxes are for transitions.

The transition combine takes in two H tokens and one O token, and outputs one H2O token. The reverse transition is split, which takes in one H2O, and outputs two H’s and one O.

An important application of Petri nets is to the modeling of biochemical reaction networks, which include the gene regulatory networks. Since genes and enzymes are molecules, and their binding interactions are chemical reactions, the Petri net model is directly applicable. For example, consider a transition that inputs one gene G, one enzyme E, and outputs the molecular form G • E in which E is bound to a particular site on G.

Applications of Petri nets may differ widely in terms of the population sizes involved in the model. In general chemistry reactions, the populations are measured in units of moles (where a mole is ‘Avogadro’s number’ 6.022 · 1023 entities). In gene regulatory networks, on the other hand, there may only be a handful of genes and enzymes involved in a reaction.

This difference in scale leads to a qualitative difference in the modelling. With small population sizes, the stochastic effects will predominate, but with large populations, a continuous, deterministic, average-based approximation can be used.

Representing Petri nets by reaction formulas

Petri nets can also be represented by formulas used for chemical reaction networks. Here is the formula for the Petri net shown above:

H2O ↔ H + H + O

or the more compact:

H2O ↔ 2 H + O

The double arrow is a compact designation for two separate reactions, which happen to be opposites of each other.

By the way, this reaction is not physically realistic, because one doesn’t find isolated H and O atoms traveling around and meeting up to form water molecules. This is the actual reaction pair that predominates in water:

2 H2O ↔ OH- + H3O+

Here, a hydrogen nucleus H+, with one unit of positive charge, gets removed from one of the H2O molecules, leaving behind the hydroxide ion OH-. In the same stroke, this H+ gets re-attached to the other H2O molecule, which thereby becomes a hydronium ion, H3O+.

For a more detailed example, consider this reaction chain, which is of concern to the ocean environment:

CO2 + H2O ↔ H2CO3 ↔ H+ + HCO3-

This shows the formation of carbonic acid, namely H2CO3, from water and carbon dioxide. The next reaction represents the splitting of carbonic acid into a hydrogen ion and a negatively charged bicarbonate ion, HCO3-. There is a further reaction, in which a bicarbonate ion further ionizes into an H+ and a doubly negative carbonate ion CO32-. As the diagram indicates, for each of these reactions, a reverse reaction is also present. For a more detailed description of this reaction network, see:

• Stephen E. Bialkowski, Carbon dioxide and carbonic acid.

Increased levels of CO2 in the atmosphere will change the balance of these reactions, leading to a higher concentration of hydrogen ions in the water, i.e., a more acidic ocean. This is of concern because the metabolic processes of aquatic organisms is sensitive to the pH level of the water. The ultimate concern is that entire food chains could be disrupted, if some of the organisms cannot survive in a higher pH environment. See the Wikipedia page on ocean acidification for more information.

Exercise. Draw Petri net diagrams for these reaction networks.

Motivation for the study of Petri net dynamics

The relative rates of the various reactions in a network critically determine the qualitative dynamics of the network as a whole. This is because the reactions are ‘competing’ with each other, and so their relative rates determine the direction in which the state of the system is changing. For instance, if molecules are breaking down faster then they are being formed, then the system is moving towards full dissociation. When the rates are equal, the processes balance out, and the system is in an equilibrium state. Then, there are only temporary fluctuations around the equilibrium conditions.

The rate of the reactions will depend on the number of tokens present in the system. For example, if any of the input tokens are zero, then the transition can’t fire, and so its rate must be zero. More generally, when there are few input tokens available, there will be fewer reaction events, and so the firing rates will be lower.

Given a specification for the rates in a reaction network, we can then pose the following kinds of questions about its dynamics:

• Does the network have an equilibrium state?

• If so, what are the concentrations of the species at equilibrium?

• How quickly does it approach the equilibrium?

• At the equilibrium state, there will still be temporary fluctuations around the equilibrium concentrations. What are the variances of these fluctuations?

• Are there modes in which the network will oscillate between states?

This is the grail we seek.

Aside from actually performing empirical experiments, such questions can be addressed either analytically or through simulation methods. In either case, our first step is to define a theoretical model for the dynamics of a Petri net.

Stochastic Petri nets

A stochastic Petri net (with kinetics) is a Petri net that is augmented with a specification for the reaction dynamics. It is defined by the following:

• An underlying Petri net, which consists of species, transitions, an input map, and an output map. These maps assign to each transition a multiset of species. (Multiset means that duplicates are allowed.) Recall that the state of the net is defined by a marking function, that maps each species to its population count.

• A rate constant that is associated with each transition.

• A kinetic model, that gives the expected firing rate for each transition as a function of the current marking. Normally, this kinetic function will include the rate constant as a multiplicative factor.

A further ‘sanity constraint’ can be put on the kinetic function for a transition: it should give a positive value if and only if all of its inputs are positive.

• A stochastic model, which defines the probability distribution of the time intervals between firing events. This specific distribution of the firing intervals for a transition will be a function of the expected firing rate in the current marking.

This definition is based on the standard treatments found, for example in:

• M. Ajmone Marsan, Stochastic Petri nets: an elementary introduction, in Advances in Petri Nets, Springer, Berlin, 1989, 1–23.

or Wilkinson’s book mentioned above. I have also added an explicit mention of the kinetic model, based on the ‘kinetics’ described in here:

• Martin Feinberg, Lectures on chemical reaction networks.

There is an implied random process that drives the reaction events. A classical random process is given by a container with ‘particles’ that are randomly traveling around, bouncing off the walls, and colliding with each other. This is the general idea behind Brownian motion. It is called a random process because the outcome results from an ‘experiment’ that is not fully determined by the input specification. In this experiment, you pour in the ingredients (particles of different types), set the temperature (the distributions of the velocities), give it a stir, and then see what happens. The outcome consists of the paths taken by each of the particles.

In an important limiting case, the stochastic behavior becomes deterministic, and the population sizes become continuous. To see this, consider a graph of population sizes over time. With larger population sizes, the relative jumps caused by the firing of individual transitions become smaller, and graphs look more like continuous curves. In the limit, we obtain an approximation for high population counts, in which the graphs are continuous curves, and the concentrations are treated as continuous magnitudes. In a similar way, a pitcher of sugar can be approximately viewed as a continuous fluid.

This simplification permits the application of continuous mathematics to study of reaction network processes. It leads to the basic rate equation for reaction networks, which specifies the direction of change of the system as a function of the current state of the system.

In this article we will be exploring this continuous deterministic formulation of Petri nets, under what is known as the mass action kinetics. This kinetics is one implementation of the general specification of a kinetic model, as defined above. This means that it will define the expected firing rate of each transition, in a given marking of the net. The probabilistic variations in the spacing of the reactions—around the mean given by the expected firing rate—is part of the stochastic dynamics, and will be addressed in a subsequent article.

The mass-action kinetics

Under the mass action kinetics, the expected firing rate of a transition is proportional to the product of the concentrations of its input species. For instance, if the reaction were A + C → D, then the firing rate would be proportional to the concentration of A times the concentration of C, and if the reaction were A + A → D, it would be proportional to the square of the concentration of A.

This principle is explained by Feinberg as follows:

For the reaction A+C → D, an occurrence requires that a molecule of A meet a molecule of C in the reaction, and we take the probability of such an encounter to be proportional to the product [of the concentrations of A and C]. Although we do not presume that every such encounter yields a molecule of D, we nevertheless take the occurrence rate of A+C → D to be governed by [the product of the concentrations].

For an in-depth proof of the mass action law, see this article:

• Daniel Gillespie, A rigorous definition of the chemical master equation, 1992.

Note that we can easily pass back and forth between speaking of the population counts for the species, and the concentrations of the species, which is just the population count divided by the total volume V of the system. The mass action law applies to both cases, the only difference being that the constant factors of (1/V) used for concentrations will get absorbed into the rate constants.

The mass action kinetics is a basic law of empirical chemistry. But there are limits to its validity. First, as indicated in the proof in the Gillespie, the mass action law rests on the assumptions that the system is well-stirred and in thermal equilibrium. Further limits are discussed here:

• Georg Job and Regina Ruffler, Physical Chemistry (first five chapters), Section 5.2, 2010.

They write:

…precise measurements show that the relation above is not strictly adhered to. At higher concentrations, values depart quite noticeably from this relation. If we gradually move to lower concentrations, the differences become smaller. The equation here expresses a so-called “limiting law“ which strictly applies only when c → 0.

In practice, this relation serves as a useful approximation up to rather high concentrations. In the case of electrically neutral substances, deviations are only noticeable above 100 mol m−3. For ions, deviations become observable above 1 mol m−3, but they are so small that they are easily neglected if accuracy is not of prime concern.

Why would the mass action kinetics break down at high concentrations? According to the book quoted, it is due to “molecular and ionic interactions.” I haven’t yet found a more detailed explanation, but here is my supposition about what is meant by molecular interactions in this context. Doubling the number of A molecules doubles the number of expected collisions between A and C molecules, but it also reduces the probability that any given A and C molecules that are within reacting distance will actually react. The reaction probability is reduced because the A molecules are ‘competing’ for reactions with the C molecules. With more A molecules, it becomes more likely that a C molecule will simultaneously be within reacting distance of several A molecules; each of these A molecules reduces the probability that the other A molecules will react with the C molecule. This is most pronounced when the concentrations in a gas get high enough that the molecules start to pack together to form a liquid.

The equilibrium relation for a pair of opposite reactions

Suppose we have two opposite reactions:

T: A + B \stackrel{u}{\longrightarrow} C + D

T': C + D \stackrel{v}{\longrightarrow} A + B

Since the reactions have exactly opposite effects on the population sizes, in order for the population sizes to be in a stable equilibrium, the expected firing rates of T and T' must be equal:

\mathrm{rate}(T') = \mathrm{rate}(T)

By mass action kinetics:

\mathrm{rate}(T) = u [A] [B]

\mathrm{rate}(T') = v [C] [D]

where [X] means the concentration of X.

Hence at equilibrium:

u [A] [B] = v [C] [D]


\displaystyle{ \frac{[A][B]}{[C][D]} = \frac{v}{u} = K }

where K is the equilibrium constant for the reaction pair.

Equilibrium solution for the formation and dissociation of a diatomic molecule

Let A be some type of atom, and let D = A2 be the diatomic form of A. Then consider the opposite reactions:

A + A \stackrel{u}{\longrightarrow} D

D \stackrel{v}{\longrightarrow} A + A

From the preceding analysis, at equilibrium the following relation holds:

u [A]^2 = v [D]

Let N(A) and N(B) be the population counts for A and B, and let

N = N(A) + 2 N(D)

be the total number of units of A in the system, whether they be in the form of atoms or diatoms.

The value of N is an invariant property of the system. The reactions cannot change it, because they are just shuffling the units of A from one arrangement to the other. By way of contrast, N(A) is not an invariant quantity.

Dividing this equation by the total volume V, we get:

[N] = [A] + 2 [D]

where [N] is the concentration of the units of A.

Given a fixed value for [N] and the rate constants u and v, we can then solve for the concentrations at equilibrium:

\displaystyle{u [A]^2 = v [D] = v ([N] - [A]) / 2 }

\displaystyle{2 u [A]^2 + v [A] - v [N] = 0 }

\displaystyle{[A] = (-v \pm \sqrt{v^2 + 8 u v [N]}) / 4 u }

Since [A] can’t be negative, only the positive square root is valid.

Here is the solution for the case where u = v = 1:

\displaystyle{[A] = (\sqrt{8 [N] + 1} - 1) / 4 }

\displaystyle{[D] = ([N] - [A]) / 2 }


We’ve covered a lot of ground, starting with the introduction of the stochastic Petri net model, followed by a general discussion of reaction network dynamics, the mass action laws, and calculating equilibrium solutions for simple reaction networks.

We still have a number of topics to cover on our journey into the foundations, before being able to write informed programs to solve problems with stochastic Petri nets. Upcoming topics are (1) the deterministic rate equation for general reaction networks and its application to finding equilibrium solutions, and (2) an exploration of the stochastic dynamics of a Petri net. These are the themes that will support our upcoming software development.

Network Theory (Part 25)

3 November, 2012

In parts 2-24 of this network theory series, we’ve been talking about Petri nets and reaction networks. These parts are now getting turned into a book. You can see a draft here:

• John Baez and Jacob Biamonte, A course on quantum techniques for stochastic mechanics.

There’s a lot more to network theory than this. But before I dive into the next big topic, I want to mention a few more odds and ends about Petri nets and reaction networks. For example, their connection to logic and computation!

As we’ve seen, a stochastic Petri net can be used to describe a bunch of chemical reactions with certain reaction rates. We could try to use these reactions to build a ‘chemical computer’. But how powerful can such a computer be?

I don’t know the answer. But before people got interested in stochastic Petri nets, computer scientists spent quite some time studying plain old Petri nets, which don’t include the information about reaction rates. They used these as simple models of computation. And since computer scientists like to know which questions are decidable by means of an algorithm and which aren’t, they proved some interesting theorems about decidability for Petri nets.

Let me talk about: ‘reachability’: the question of which collections of molecules can turn into which other collections, given a fixed set of chemical reactions. For example, suppose you have these chemical reactions:

C + O2 → CO2

CO2 + NaOH → NaHCO3

NaHCO3 + HCl → H2O + NaCl + CO2

Can you use these to turn

C + O2 + NaOH + HCl


CO2 + H2O + NaCl ?

It’s not too hard to settle this particular question—we’ll do it soon. But settling all possible such questions turns out to be very hard



Definition. A Petri net consists of a set S of species and a set T of transitions, together with a function

i : S \times T \to \mathbb{N}

saying how many copies of each state shows up as input for each transition, and a function

o: S \times T \to \mathbb{N}

saying how many times it shows up as output.

Today we’ll assume both S and T are finite.

Jacob and I like to draw the species as yellow circles and the transitions as aqua boxes, in a charmingly garish color scheme chosen by Dave Tweed. So, the chemical reactions I mentioned before:

C + O2 → CO2

CO2 + NaOH → NaHCO3

NaHCO3 + HCl → H2O + NaCl + CO2

give this Petri net:

A ‘complex’ is, roughly, a way of putting dots in the yellow circles. In chemistry this says how many molecules we have of each kind. Here’s an example:

This complex happens to have just zero or one dot in each circle, but that’s not required: we could have any number of dots in each circle. So, mathematically, a complex is a finite linear combination of species, with natural numbers as coefficients. In other words, it’s an element of \mathbb{N}^S. In this particular example it’s

C + O2 + NaOH + HCl

Given two complexes, we say one is reachable from another if, loosely speaking, we can get to it from the other by a finite sequence of transitions. For example, earlier on I asked if we can get from the complex I just mentioned to the complex

CO2 + H2O + NaCl

which we can draw like this:

And the answer is yes, we can do it with this sequence of transitions:




This settles the question I asked earlier.

So in chemistry, reachability is all about whether it’s possible to use certain chemical reactions to turn one collection of molecules into another using a certain set of reactions. I hope this is clear enough; I could formalize it further but it seems unnecessary. If you have questions, ask me or read this:

Petri net: execution semantics, Wikipedia.

The reachability problem

Now the reachability problem asks: given a Petri net and two complexes, is one reachable from the other?

If the answer is ‘yes’, of course you can show that by an exhaustive search of all possibilities. But if the answer is ‘no’, how can you be sure? It’s not obvious, in general. Back in the 1970′s, computer scientists felt this problem should be decidable by some algorithm… but they had a lot of trouble finding such an algorithm.

In 1976, Richard J. Lipton showed that if such an algorithm existed, it would need to take at least an exponential amount of memory space and an exponential amount of time to run:

• Richard J. Lipton, The reachability problem requires exponential space, Technical Report 62, Yale University, 1976.

This means that most computer scientists would consider any algorithm to solve the reachability problem ‘infeasible’, since they like polynomial time algorithms.

On the bright side, it means that Petri nets might be fairly powerful when viewed as computers themselves! After all, for a universal Turing machine, the analogue of the reachability problem is undecidable. So if the reachability problem for Petri nets were decidable, they couldn’t be universal computers. But if it were decidable but hard, Petri nets might be fairly powerful—though still not universal—computers.

In 1977, at the ACM Symposium on the Theory of Computing, two researchers presented a proof that reachability problem was decidable:

• S. Sacerdote and R. Tenney, The decidability of the reachability problem for vector addition systems, Conference Record of the Ninth Annual ACM Symposium on Theory of Computing, 2-4 May 1977, Boulder, Colorado, USA, ACM, 1977, pp. 61–76.

However, it turned out to be flawed! I read about this episode here:

• James L. Peterson, Petri Net Theory and the Modeling of Systems, Prentice–Hall, New Jersey, 1981.

This is a very nice introduction to early work on Petri nets and decidability. Peterson had an interesting idea, too:

There would seem to be a very useful connection between Petri nets and Presburger arithmetic.

He gave some evidence, and suggested using this to settle the decidability of the reachability problem. I found that intriguing! Let me explain why.

Presburger arithmetic is a simple set of axioms for the arithmetic of natural numbers, much weaker than Peano arithmetic or even Robinson arithmetic. Unlike those other systems, Presburger arithmetic doesn’t mention multiplication. And unlike those other systems, you can write an algorithm that decides whether any given statement in Presburger arithmetic is provable.

However, any such algorithm must be very slow! In 1974, Fischer and Rabin showed that any decision algorithm for Presburger arithmetic has a worst-case runtime of at least

2^{2^{c n}}

for some constant c, where n is the length of the statement. So we say the complexity is at least doubly exponential. That’s much worse than exponential! On the other hand, an algorithm with a triply exponential run time was found by Oppen in 1978.

I hope you see why this is intriguing. Provability is a lot like reachability, since in a proof you’re trying to reach the conclusion starting from the assumptions using certain rules. Like Presburger arithmetic, Petri nets are all about addition, since they consists of transitions going between linear combinations like this:

6 CO2 + 6 H2O → C6H12O6 + 6 O2

That’s why the old literature calls Petri nets vector addition systems. And finally, the difficulty of deciding provability in Presburger arithmetic smells a bit like the difficulty of deciding reachability in Petri nets.

So, I was eager to learn what happened after Peterson wrote his book.

For starters, in 1981, the very year Peterson’s book came out, Ernst Mayr showed that the reachability problem for Petri nets is decidable:

• Ernst Mayr, Persistence of vector replacement systems is decidable, Acta Informatica 15 (1981), 309–318.

As you can see from the title, Mayr actually proved some other property was decidable. However, it follows that reachability is decidable, and Mayr pointed this out in his paper. In fact the decidability of reachability for Petri nets is equivalent to lots of other interesting questions. You can see a bunch here:

• Javier Esparza and Mogens Nielsen, Decidability issues for Petri nets—a survey, Bulletin of the European Association for Theoretical Computer Science 52 (1994), 245–262.

Mayr’s algorithm was complicated. Worse still, it seems to take a hugely long time to run. It seems nobody knows an explicit bound on its runtime. The runtim might even grow faster than any primitive recursive function. The Ackermann function and the closely related Ackermann numbers are famous examples of functions that grow more rapidly than any primitive recursive function. If you don’t know about these, now is the time to learn!

Remember that we can define multiplication by iterating addition:

n \times m = n + n + n + \cdots + n

where add n to itself m times. Then we can define exponentiation by iterating multiplication:

n \uparrow m = n \times n \times n \times \cdots \times n

where we multiply n by itself m times. Here I’m using Knuth’s up-arrow notation. Then we can define tetration by iterating exponentiation:

n \uparrow^2 m = n \uparrow (n \uparrow (n \uparrow \cdots \uparrow n)))

Then we can define an operation \uparrow^3 by iterating tetration, and so on. All these functions are primitive recursive. But the nth Ackermann number is not; it’s defined to be

n \uparrow^n n

This grows at an insanely rapid rate:

1 \uparrow 1 = 1

2 \uparrow^2 2 = 4

3 \uparrow^3 3 = 3^{3^{3^{.^{.^{.}}}}}

where we have a stack of 3^{3^3} threes—or in other words, 3^{7625597484987} threes! When we get to 4 \uparrow^4 4, my mind boggles. I wish it didn’t, but it does.

In 1998 someone came up with a faster algorithm:

• Zakaria Bouziane, A primitive recursive algorithm for the general Petri net reachability problem, in 39th Annual Symposium on Foundations of Computer Science, IEEE, 1998, pp. 130-136.

Bouziane claimed this algorithm is doubly exponential in space and time. That’s very slow, but not insanely slow.

However, it seems that Bouziane made a mistake:

• Petr Jančar, Bouziane’s transformation of the Petri net reachability problem and incorrectness of the related algorithm, Information and Computation, 206 (2008), 1259–1263.

So: if I tell you some chemicals and a bunch of reactions involving these chemicals, you can decide when some combination of these chemicals can turn into another combination. But it may take a long time to decide this. And we don’t know exactly how long: just more than ‘exponentially long’!

What about the connection to Presburger arithmetic? This title suggests that it exists:

• Jérôme Leroux, The general vector addition system reachability problem by Presburger inductive separators, 2008.

But I don’t understand the paper well enough to be sure. Can someone say more?

Also, does anyone know more about the computational power of Petri nets? They’re not universal computers, but is there a good way to say how powerful they are? Does the fact that it takes a long time to settle the reachability question really imply that they have a lot of computational power?

Symmetric monoidal categories

Next let me explain the secret reason I’m so fascinated by this. This section is mainly for people who like category theory.

As I mentioned once before, a Petri net is actually nothing but a presentation of a symmetric monoidal category that’s free on some set of objects and some set of morphisms going between tensor products of those objects:

Vladimiro Sassone, On the category of Petri net computations, 6th International Conference on Theory and Practice of Software Development, Proceedings of TAPSOFT ’95, Lecture Notes in Computer Science 915, Springer, Berlin, pp. 334-348.

In chemistry we write the tensor product additively, but we could also write it as \otimes. Then the reachability problem consists of questions of this general type:

Suppose we have a symmetric monoidal category freely generated by objects A, B, C and morphisms

e: A \to B \otimes C

f: A \otimes A \otimes B \to A \otimes C

g: A \otimes B \otimes C \to A \otimes B \otimes B

h : B \otimes A \otimes A \to B

Is there a morphism from B \otimes A \otimes A to A \otimes A?

This is reminiscent of the word problem for groups and other problems where we are given a presentation of an algebraic structure and have to decide if two elements are equal… but now, instead of asking whether two elements are equal we are asking if there is a morphism from one object to another. So, it is fascinating that this problem is decidable—unlike the word problem for groups—but still very hard to decide.

Just in case you want to see a more formal statement, let me finish off by giving you that:

Reachability problem. Given a symmetric monoidal category C freely generated by a finite set of objects and a finite set of morphisms between tensor products of these objects, and given two objects x,y \in C, is there a morphism f: x \to y?

Theorem (Lipton, Mayr). There is an algorithm that decides the reachability problem. However, for any such algorithm, for any c > 0, the worst-case run-time exceeds 2^{c n} where n is the size of the problem: the sum of the number of generating objects, the number of factors in the sources and targets of all the generating morphisms, and the number of factors in the objects x,y \in C for which the reachability problem is posed.


Get every new post delivered to your Inbox.

Join 2,716 other followers