The Large-Number Limit for Reaction Networks (Part 3)

8 October, 2014

joint with Arjun Jain

We used to talk about reaction networks quite a lot here. When Arjun Jain was visiting the CQT, we made a lot of progress understanding how the master equation reduces to the rate equation in the limit where there are very large numbers of things of each kind. But we never told you the end of the story, and by now it’s been such a long time that you may need a reminder of some basic facts!

So…

The rate equation treats the number of things of each kind as continuous—a nonnegative real number—and says how it changes in a deterministic way.

The master equation treats the number of things of each kind as discrete—a nonnegative integer—and says how it changes in a probabilistic way.

You can think of the master equation as the ‘true’ description, and the rate equation as an approximation that’s good in some limit where there are large numbers of molecules — or more precisely, where the probability distribution of having some number of molecules of each kind is sharply peaked near some large value.

You may remember that in the master equation, the state of a chemical system is described by a vector \psi in a kind of ‘Fock space’, while time evolution is described with the help of an operator on this space, called the ‘Hamiltonian’ H:

\displaystyle{ \frac{d}{dt} \psi(t) = H \psi(t) }

The Hamiltonian is built from annihilation and creation operators, so all of this looks very much like quantum field theory. The details are here, and we won’t try to review them all:

• John Baez and Jacob Biamonte, Quantum Techniques for Stochastic Mechanics.

The point is this: the ‘large-number limit’ where the master equation reduces to the rate equation smells a lot like the ‘classical limit’ of quantum field theory, where the description of light in terms of photons reduces to the good old Maxwell equations. So, maybe we can understand the large-number limit by borrowing techniques from quantum field theory!

How do we take the classical limit of quantum electromagnetism and get the classical Maxwell equations? For simplicity let’s ignore charged particles and consider the ‘free electromagnetic field': just photons, described by the quantum version of Maxwell’s equations. When we take the classical limit we let Planck’s constant \hbar go to zero: that much is obvious. However, that’s not all! The energy of each photon is proportional to \hbar, so to take the classical limit and get a solution of the classical Maxwell’s equations with nonzero energy we also need to increase the number of photons. We cleverly do this in such a way that the total energy remains constant as \hbar \to 0.

So, in quantum electromagnetism the classical limit is also a large-number limit!

That’s a good sign. It suggests the same math will also apply to our reaction network problem.

But then we hit an apparent roadblock. What’s the analogue of Planck’s constant in chemical reaction networks? What should go to zero?

We told you the answer to that puzzle a while ago: it’s the reciprocal of Avogadro’s number!

You see, chemists measure numbers of molecules in ‘moles’. There’s a large number of molecules in each mole: Avogadro’s number. If we let the reciprocal of Avogadro’s number go to zero, we are taking a limit where chemistry becomes ‘continuous’ and the discreteness of molecules goes away. Of course this is just a mathematical trick, but it’s a very useful one.

So, we got around that roadblock. And then something nice happened.

When taking the classical limit of quantum electromagnetism, we focus attention on certain quantum states that are the ‘best approximations’ to classical states. These are called ‘coherent states’, and it’s very easy to study how the behave as we simultaneously let \hbar \to 0 and let the expected number of photons go to infinity.

And the nice thing is that these coherent states are also important in chemistry! But because chemistry involves probabilities rather than amplitudes, they have a different name: ‘Poisson distributions’. On this blog, Brendan Fong used them to give a new proof of a great result in mathematical chemistry, the Anderson–Craciun–Kurtz theorem.

So, we have most of the concepts and tools in place, and we can tackle the large-number limit using quantum techniques.

You can review the details here:

The large-number limit for reaction networks (part 1).

The large-number limit for reaction networks (part 2) .

So, after a quick refresher on the notation, we’ll plunge right in.

As you’ll see, we solve the problem except for one important technical detail: passing a derivative through a limit! This means our main result is not a theorem. Rather, it’s an idea for how to prove a theorem. Or if we act like physicists, we can call it a theorem.

Review of notation

The rate equation says

\displaystyle{\frac{d}{dt}{x}(t) = \sum_{\tau \in T} r(\tau) (t(\tau)-s(\tau)) {x}(t)^{s(\tau)} }

where:

x(t) \in \mathbb{R}^k is a vector describing concentrations of k different species at time t. In chemistry these species could be different kinds of molecules.

• Each \tau \in T is a transition, or in chemistry, a reaction.

s(\tau) \in \mathbb{N}^k is a vector of natural numbers saying how many items of each species appear as inputs to the reaction \tau. This is called the source of the reaction.

t(\tau) \in \mathbb{N}^k is a vector of natural numbers saying how many items of each species appear as outputs of the reaction \tau. This is called the target of the reaction. So, t(\tau) - s(\tau) says the net change of the number of items of each species in the reaction \tau.

• The rate at which the reaction \tau occurs is proportional to the rate constant r(\tau) times the number

x(t)^{s(\tau)}

Here we are raising a vector to a vector power and getting a number, using this rule:

x^r = x_1^{r_1} \cdots x_k^{r_k}

where r is any vector of natural numbers (r_1, \dots, r_k), and x is any vector of nonnegative real numbers. From now on we’ll call a vector of natural numbers a multi-index.

In this paper:

• John Baez, Quantum techniques for reaction networks.

it was shown that the master equation implies

\displaystyle{\frac{d}{dt}\langle N_i \psi(t)\rangle = \sum_{\tau \in T} r(\tau) (t_i(\tau)-s_i(\tau)) \; \langle N^{\, \underline{s(\tau)}} \, \psi(t)\rangle }

Here:

\psi(t) is the stochastic state saying the probability of having any particular number of items of each species at each time t. We won’t review the precise details of how this work; for that reread the relevant bit of Part 8.

N_i is the ith number operator, defined using annihilation and creation operators as in quantum mechanics:

N_i = a_i^\dagger a_i

For the annihilation and creation operators, see Part 8.

\langle N_i \psi(t) \rangle is the expected number of items of the ith species at time t.

• Similarly, \langle N^{\underline{s(\tau)}} \psi(t)\rangle is the expected value of a certain product of operators. For any multi-index r we define the falling power

N_i^{\underline{r}_i} = N_i (N_i - 1) (N_i - 2) \cdots (N_i - r_i +1)

and then we define

N^{\underline{r}} = N_1^{\underline{r_1}} \cdots N_k^{\underline{r_k}}

The large-number limit

Okay. Even if you don’t understand any of what we just said, you’ll see the master and rate equation look similar. The master equation implies this:

\displaystyle{\frac{d}{dt}\langle N_i \psi(t)\rangle = \sum_{\tau \in T} r(\tau) (t_i(\tau)-s_i(\tau)) \; \langle N^{\, \underline{s(\tau)}} \, \psi(t)\rangle }

while the rate equation says this:

\displaystyle{\frac{d}{dt}{x}(t) = \sum_{\tau \in T} r(\tau) (t(\tau)-s(\tau)) \; {x}(t)^{s(\tau)} }

So, we will try to get from the first to the second second with the help of a ‘large-number limit’.

We start with a few definitions. We introduce an adjustable dimensionless quantity which we call \hbar. This is just a positive number, which has nothing to do with quantum theory except that we’re using a mathematical analogy to quantum mechanics to motivate everything we’re doing.

Definition. The rescaled number operators are defined as \widetilde{N}_i = \hbar N_i. This can be thought of as a rescaling of the number of objects, so that instead of counting objects individually, we count them in bunches of size 1/\hbar.

Definition. For any multi-index r we define the rescaled falling power of the number operator N_i by:

\widetilde{N}_i^{\underline{r_i}} = \widetilde{N}_i (\widetilde{N}_i - \hbar)(\widetilde{N}_i-2\hbar)\cdots (\widetilde{N}_i-r_i\hbar+\hbar)

and also define

\widetilde{N}^{\underline{r}} = \widetilde{N}_1^{\underline{r_1}} \; \cdots \;\widetilde{N}_k^{\underline{r_k}}

for any multi-index r.

Using these, we get the following equation:

\displaystyle{\frac{1}{\hbar}\frac{d}{dt} \langle \widetilde{N}_i \psi(t)\rangle = \sum_{\tau \in T} r(\tau) (t_i(\tau)-s_i(\tau)) \; \langle \widetilde{N}^{\underline{s(\tau)}} \psi(t)\rangle \; \frac{1}{\hbar^{|s(\tau)|}} }

where for any multi-index r we set

|r| = r_1 + \cdots + r_k

This suggests a way to rescale the rate constants in the master equation:

Definition. The rescaled rate constants are

\displaystyle{\widetilde{r}(\tau) = \frac{r(\tau)}{\hbar^{|s(\tau)|-1}}}

From here onwards, we change our viewpoint. We consider the rescaled rate constants \widetilde{r}(\tau) to be fixed, instead of the original rate constants r(\tau). So, as we decrease \hbar, we are studying situations where the original rate constants change to ensure that the rescaled rate constants stays fixed!

So, we switch to working with a rescaled master equation:

Definition. The rescaled master equation is:

\displaystyle{\frac{d}{dt} \langle \widetilde{N}_i\widetilde{\psi}(t)\rangle = \sum_{\tau \in T} \widetilde{r}(\tau) (t_i(\tau)-s_i(\tau)) \; \langle \widetilde{N}^{\underline{s(\tau)}} \widetilde{\psi}(t)\rangle }

This is really a one-parameter family of equations, depending on \hbar\in (0,\infty). We write a solution of the rescaled master equation as \widetilde{\psi}(t), but it is really one solution for each value of \hbar.

Following the same procedure as above, we can rescale the rate equation, using the same definition of the rescaled rate constants:

Definition. The rescaled number of objects of the i\mathrm{th} species is defined as \widetilde{x_i}=\hbar x_i, where x_i is the original number of objects of the i\mathrm{th} species. Here again, we are counting in bunches of 1/\hbar.

Using this to rescale the rate equation, we get

Definition. The rescaled rate equation is

\displaystyle{\frac{d}{dt}\widetilde{x}(t) = \sum_{\tau \in T} \widetilde{r}(\tau) (t(\tau)-s(\tau))\; \widetilde{x}(t)^{s(\tau)} }

where

\widetilde{x}(t)=(\widetilde{x_1}(t),\widetilde{x_2}(t),\dots, \widetilde{x_k}(t))

Therefore, to go from the rescaled master equation to the rescaled rate equation, we require that

\langle\widetilde{N}^{\underline{r}}\, \widetilde{\psi}(t)\rangle \to \langle\widetilde{N}\widetilde{\psi}(t)\rangle^r

as \hbar \to 0. If this holds, we can identify \langle\widetilde{N}\widetilde{\psi}(t)\rangle with \widetilde{x}(t) and get the rate equation from the master equation!

To this end, we introduce the following crucial idea:

Definition. A semiclassical family of states, \widetilde{\psi}, is defined as a one-parameter family of states depending on \hbar \in (0,\infty) such that for some \widetilde{c} \in [0,\infty)^k we have

\langle\widetilde{N}^r\widetilde{\psi}\rangle \to \widetilde{c}^{\, r}

for every r\in \mathbb{N}^k as \hbar \to 0.

In particular, this implies

\langle\widetilde{N}_i\widetilde{\psi}\rangle \to \widetilde{c}_i

for every index i.

Intuitively, a semiclassical family is a family of probability distributions that becomes more and more sharply peaked with a larger and larger mean as \hbar decreases. We would like to show that in this limit, the rescaled master equation gives the rescaled rate equation.

We make this precise in the following propositions.

Proposition 1. If \widetilde{\psi} is a semiclassical family as defined above, then in the \hbar \to 0 limit, we have \langle\widetilde{N}^{\underline{r}}\widetilde{\psi}\rangle \to \widetilde{c}^{\; r} as well.

Proof. For each index i,

\displaystyle{ \langle\widetilde{N}_i^{\; \underline{r_i}}\, \widetilde{\psi}\rangle = \displaystyle{ \langle \widetilde{N}_i (\widetilde{N}_i - \hbar)(\widetilde{N}_i-2\hbar)\cdots(\widetilde{N}_i-r_i\hbar+\hbar)\,\widetilde{\psi}\rangle} }

\displaystyle{ = \Big\langle\Big(\widetilde{N}_i^r + \hbar\frac{r_i(r_i-1)}{2}\widetilde{N}_i^{r_i-1}+\cdots + \hbar^{r_i-1}(r_i-1)!\Big)\,\widetilde{\psi}\Big\rangle }

By the definition of a semiclassical family,

\displaystyle{ \lim_{\hbar\to 0} \langle\Big(\widetilde{N}_i^{r_i} + \hbar\frac{r_i(r_i-1)}{2}\widetilde{N}_i^{r_i-1}+ \cdots + \hbar^{r_i-1}(r_i-1)!\Big)\;\widetilde{\psi}\rangle} = \widetilde{c}_i^{\; r_i}

since every term but the first approaches zero. Thus, we have

\displaystyle{ \lim_{\hbar \to 0} \langle\widetilde{N}_i^{\; \underline{r_i}}\, \widetilde{\psi}\rangle =   \widetilde{c}_i^{\; r_i} }

A similar but more elaborate calculation shows that

\displaystyle{ \lim_{\hbar \to 0} \langle\widetilde{N}_1^{\, \underline{r_1}} \cdots\widetilde{N}_k^{\, \underline{r_k}} \, \widetilde{\psi}\rangle = \lim_{\hbar \to 0} \langle\widetilde{N}_1^{\, r_1}\cdots \widetilde{N}_k^{\, r_k} \widetilde{\psi}\rangle= \lim_{\hbar \to 0}\langle\widetilde{N}^{\, r} \, \widetilde{\psi}\rangle }

or in other words

\langle\widetilde{N}^{\, \underline{r}}\,\widetilde{\psi}\rangle \to \widetilde{c}^{\, r}

as \hbar \to 0.   █

Proposition 2. If \widetilde{\psi} is a semiclassical family of states, then

\displaystyle{  \langle (\widetilde{N}-\widetilde{c})^{r}\, \widetilde{\psi}\rangle \to 0 }

for any multi-index r.

Proof. Consider the r_i\mathrm{th} centered moment of the i\mathrm{th} number operator:

\displaystyle{\langle(\widetilde{N}_i-\widetilde{c}_i)^{r_i}\widetilde{\psi}\rangle = \sum_{p =0}^{r_i} {r_i \choose p}\langle\widetilde{N}_i^p\widetilde{\psi}\rangle(-\widetilde{c}_i)^{r_i-p} }

Taking the limit as \hbar goes to zero, this becomes

\begin{array}{ccl} \displaystyle{ \lim_{\hbar \to 0}\sum_{p =0}^{r_i} {r_i \choose p}\langle\widetilde{N}_i^p\widetilde{\psi}\rangle(-\widetilde{c}_i)^{r_i-p} } &=& \displaystyle{ \sum_{p =0}^{r_i} {r_i \choose p}(\widetilde{c}_i)^p(-\widetilde{c}_i)^{r_i-p} } \\ \\  &=& (\widetilde{c}_i-\widetilde{c}_i)^{r_i} \\ \\  &=& 0 \end{array}

For a general multi-index r we can prove the same sort of thing with a more elaborate calculation. First note that

\langle (\widetilde{N}-\widetilde{c})^{r}\widetilde{\psi}\rangle=\langle(\widetilde{N_1}-\widetilde{c_1})^{r_1} \cdots (\widetilde{N_k}-\widetilde{c_k})^{r_k})\widetilde{\psi}\rangle

The right-hand side can be expanded as

\displaystyle{ \langle(\sum_{p_1 =0}^{r_1} {r_1 \choose p_1}\widetilde{N}_1^{p_1}(-\widetilde{c}_1)^{r_1-p_1} ) \cdots (\sum_{p_k =0}^{r_k} {r_k \choose p_k}\widetilde{N}_k^{p_k}(-\widetilde{c}_k)^{r_k-p_k} )\widetilde{\psi} }\rangle

We can write this more tersely as

\displaystyle{ \sum_{p=0}^r} {r\choose p} \langle \widetilde{N}^p\widetilde{\psi}\rangle (-\widetilde{c})^{r-p}

where for any multi-index r we define

\displaystyle{{\sum_{p=0}^{r}}= \sum_{p_1 =0}^{r_1} \cdots  \sum_{p_k =0}^{r_k}  }

and for any multi-indices r, p we define

\displaystyle{ {r \choose p}={r_1 \choose p_1}{r_2 \choose p_2}\cdots {r_k \choose p_k}}

Now using the definition of a semiclassical state, we see

\displaystyle{ \lim_{\hbar \to 0} \sum_{p=0}^r} {r\choose p} \langle \widetilde{N}^p\widetilde{\psi}\rangle (-\widetilde{c})^{r-p}= \displaystyle{ \sum_{p=0}^r} {r\choose p} (\widetilde{c})^{p} (-\widetilde{c})^{r-p}

But this equals zero, as the last expression, expanded, is

\displaystyle{ (\widetilde{c})^r \left( \sum_{p_1=0}^{r_1} {r_1\choose p_1} (-1)^{r_1-p_1}\right) \cdots \left( \sum_{p_k=0}^{r_k} {r_k\choose p_k} (-1)^{r_k-p_k} \right) }

where each individual sum is zero.   █

Here is the theorem that would finish the job if we could give a fully rigorous proof:

“Theorem.” If \widetilde{\psi}(t) is a solution of the rescaled master equation and also a semiclassical family for the time interval [t_0,t_1], then \widetilde{x}(t) = \langle \widetilde{N} \widetilde{\psi}(t) \rangle is a solution of the rescaled rate equation for t \in [t_0,t_1].

Proof sketch. We sketch a proof that relies on the assumption that we can pass the \hbar \to 0 limit through a time derivative. Of course, to make this rigorous, we would need to justify this. Perhaps it is true only in certain cases.

Assuming that we can pass the limit through the derivative:

\displaystyle{\lim_{\hbar \to 0}\frac{d}{dt} \langle \widetilde{N}\widetilde{\psi}(t)\rangle = \lim_{\hbar \to 0} \sum_{\tau \in T} \widetilde{r}(\tau) (t(\tau)-s(\tau))\langle \widetilde{N}^{\, \underline{s(\tau)}} \, \widetilde{\psi}(t)\rangle }

and thus

\displaystyle{\frac{d}{dt}\lim_{\hbar \to 0} \langle \widetilde{N}\widetilde{\psi}(t)\rangle = \sum_{\tau \in T} \widetilde{r}(\tau) (t(\tau)-s(\tau)) \lim_{\hbar \to 0}\langle \widetilde{N}^{\, \underline{s(\tau)}} \, \widetilde{\psi}(t)\rangle }

and thus

\displaystyle{\frac{d}{dt}\widetilde{x}(t) = \sum_{\tau \in T} \widetilde{r}(\tau) (t(\tau)-s(\tau))\widetilde{x}^{\, s(\tau)} }.

As expected, we obtain the rescaled rate equation.   █

Another question is this: if we start with a semiclassical family of states as our initial data, does it remain semiclassical as we evolve it in time? This will probably be true only in certain cases.

An example: rescaled coherent states

The best-behaved semiclassical states are the coherent states.
Consider the family of coherent states

\displaystyle{\widetilde{\psi}_{\widetilde{c}} = \frac{e^{(\widetilde{c}/\hbar) z}}{e^{\widetilde{c}/\hbar}}}

using the notation developed in the earlier mentioned paper. In that paper it was shown that for any multi-index m and any coherent state \Psi we have

\langle N^{\underline{m}}\Psi\rangle = \langle N\Psi \rangle^m

Using this result for \widetilde{\psi}_{\widetilde{c}} we get

\displaystyle{\langle \widetilde{N}^{\underline{m}}\widetilde{\psi}_{\widetilde{c}}\rangle~=~\hbar^{|m|}\langle N^{\underline{m}}\widetilde{\psi}_{\widetilde{c}}\rangle~=~\hbar^{|m|}\langle N\widetilde{\psi}_{\widetilde{c}}\rangle^m~=~\hbar^{|m|}\frac{\widetilde{c}^m}{\hbar^{|m|}}~=~\widetilde{c}^m}

Since \langle\widetilde{N}^{\underline{m}}\widetilde{\psi}_{\widetilde{c}}\rangle equals \langle \widetilde{N}^{m} \widetilde{\psi}_{\widetilde{c}}\rangle plus terms of order \hbar, as \hbar \to 0 we have

\langle\widetilde{N}^{\underline{m}}\widetilde{\psi}_{\widetilde{c}}\rangle~\to~\langle\widetilde{N}^{m}\widetilde{\psi}_{\widetilde{c}}\rangle=\widetilde{c}^{m}

showing that our chosen \widetilde{\psi}_{\widetilde{c}} is indeed a semiclassical family.


Network Theory (Part 30)

3 October, 2014

The network theory series is back! You may have thought it died out, but in fact it’s just getting started. Over the last year my grad students have made huge strides in working out the math of networks. Now it’s time to explain what they’ve done.

In the last three episodes I explained how electrical circuits made of resistors, inductors and capacitors are a great metaphor for many kinds of complex systems made of interacting parts. And it’s not just a metaphor; there are mathematically rigorous analogies—in other words, isomorphisms—between such electrical circuits and various other kinds of ‘passive linear networks’.

I showed you a chart of these analogies last time:

displacement:    q flow:      \dot q momentum:      p effort:           \dot p
Mechanics: translation position velocity momentum force
Mechanics: rotation angle angular velocity angular momentum torque
Electronics charge current flux linkage voltage
Hydraulics volume flow pressure momentum pressure
Thermal Physics entropy entropy flow temperature momentum temperature
Chemistry moles molar flow chemical momentum chemical potential

But what do I mean by a ‘passive linear network’? Let me explain this very roughly at first, since we’ll be painfully precise later on.

Right now by ‘network’ I mean a graph with gizmos called ‘components’ on the edges. For example:

In a network there is some kind of ‘flow’ running along each edge, and also some kind of ‘effort’ across that edge. For example, in electronics the flow is electrical current and the effort is voltage. The chart shows the meaning of flow and effort in other examples.

‘Passivity’ means roughly that none of the components put out energy that didn’t earlier come in. For example, resistors lose energy (which goes into heat, which we’re not counting). Capacitors can store energy and later release it. So, resistors and capacitors are passive—and so are inductors. But batteries and current sources actually put out energy, so we won’t allow them in our networks yet. For now, we’re just studying how passive components respond to a source of flow or effort.

For some subtleties that show up when you try to make the concept of passivity precise, try:

Passivity (engineering), Wikipedia.

Finally, ‘linearity’ means that the flow along each edge of our network is linearly related to the effort across that edge. Here are the key examples:

• For electrical resistors, linearity is captured by Ohm’s law. If an edge e in our network is labelled by a resistor of resistance R, usually drawn like this:

then Ohm’s law says:

V = R I

where V is the voltage across that edge and I is the current along that edge.

• If our edge e is labelled by an inductor of inductance L:

we have

\displaystyle{ V = L \frac{d I}{d t} }

Here we need to think of the voltage and current as functions of time.

• If our edge e is labelled by a capacitor of capacitance C:

we write the equation

\displaystyle{ I = C \frac{d V}{d t} }

where again we think of the voltage and current as functions of time.

Both linearity and passivity are simplifying assumptions that we eventually want to drop. If we include batteries or current sources, we’re dropping passivity. And if include transistors, we’re dropping linearity. Obviously both these are important!

However, there is a lot to do even with these simplifying assumptions. And now it’s time to get started!

In what follows, I will not use the terms ‘flow’ and ‘effort’, which are chosen to be neutral and subject-independent. Instead, I’ll use the vocabulary from electronics, e.g. ‘current’ and ‘voltage’. The reason is that we’ve all heard of resistors, capacitors, Ohm’s law and Kirchhoff’s laws, and while these have analogues in every row of the chart, it seems pointless to make up weird new ‘neutral’ terms for all these concepts.

But don’t be fooled by the jargon! We’re not merely studying electrical circuits. We’ll be studying passive linear networks in full generality… with the help of category theory.

Linear passive networks as morphisms

To get going, let’s think about circuits made of resistors. We can do this without harm, because we’ll later include capacitors and inductors using a simple effortless trick. Namely, we’ll generalize the ‘resistance’ of a resistor, which is a real number, to something called ‘impedance’, which is an element of some larger field. Everything will be so abstract that replacing resistances with impedances will be as easy as snapping our fingers.

Right now I want to define a category where the morphisms are circuits made of resistors. Any morphism will go from some ‘inputs’ to some ‘outputs’, like this:

So a morphism is roughly a graph with edges labelled by numbers called ‘resistances’, with some special nodes called ‘inputs’ and some special nodes called ‘outputs’.

What can do with morphisms? Compose them! So, suppose we have a second morphism whose inputs match the outputs of the first:

Then we can compose them, attaching the outputs of the first to the inputs of the second. We get this morphism as the result:

So, composing morphisms is a way to build big electrical circuits—or other ‘linear passive networks’—out of little ones.

This seems pretty simple, but let’s try to formalize it and see why we have a category. In fact it takes a bit of thought. To check that we get a category, we need to check that composition is associative:

(fg)h = f(gh)

and that each object x has an identity morphism 1_x : x \to x that does what an identity should:

f 1_x = f

1_x g = g

All these equations seem obviously true in our example… until you try to prove them.

You might think an identity morphism should be a bunch of straight pieces of wire—a bunch of edges each with an input node and an output node—but that doesn’t quite work, since sticking an extra edge onto a graph gives a new graph with an extra edge!

Also, we are composing circuits by ‘sticking them together’. This process is formalized in category theory using a pushout, and pushouts are only defined ‘up to canonical isomorphism’. The very simplest example is the disjoint union of two sets. We all know what it means, but if you examine it carefully, you’ll see it’s only defined up to canonical isomorphism, because it involves a choice of how we make the two sets disjoint, and this choice is somewhat arbitrary.

All this means the category we’re after is a bit subtler than you might at first expect; in fact, it’s most naturally thought of as a bicategory, meaning roughly that all the equations above hold only ‘up to canonical isomorphism’.

So, we proceed like this.

First we define a concept of ‘labelled graph’, where (for now) only the edges are labelled. We do this because we want our circuits to have edges labelled by ‘resistances’, which are real numbers. But we do it in greater generality because later we’ll want the edges to be labelled by ‘impedances’, which are elements of some other field. And since we’re studying electrical circuits just as examples of networks, later still we will probably want graphs whose edges are labelled in still other ways.

So:

Definition. A graph consists a finite set E of edges, a finite set N of nodes, and two functions

s,t : E \to N

Thus each edge e will have some node s(e) as its source and some node t(e) as its target:

Definition. Given a set L, we define an L-labelled graph to be a graph together with a function r : E \to L. This assigns to each edge e \in E its label r(e) \in L. We call L the label set.

We use the letter r because for circuits of resistors we will take the label set to be

L = (0,\infty) \subset \mathbb{R}

the positive real numbers, and r(e) will be the resistance of the edge e. For circuits that also contain inductors and capacitors we will take the label set to be the positive elements of some larger field… but more about that later!

Now we want to give our L-labelled graph a set of nodes called ‘inputs’ and a set of nodes called ‘outputs’. You might think the set of inputs should be disjoint from the set of outputs, but that’s a tactical error! It turns out an identity morphism in our category should have the inputs being exactly the same as the outputs… and no edges at all:

To handle this nicely, we need to make a category of L-labelled graphs. This works in the obvious way, if you’re used to this stuff. A morphism from one L-labelled graph to another sends edges to edges, nodes to nodes, and preserves everything in sight:

Definition. Given L-graphs \Gamma = (E,N,s,t,r) and \Gamma' = (E',N',s',t',r'), a morphism of L-labelled graphs from \Gamma to \Gamma' is a pair of functions

\epsilon: E \to E'

\nu : N \to N'

such that the following diagrams commute:

There is a category L\mathrm{Graph} where the objects are L-labelled graphs and the morphisms are as we’ve just defined them.

Warning: the morphisms in L\mathrm{Graph} are not the morphisms of the kind we really want, the ones that look like this:

They are just a step along the way. A morphism of the kind we really want is a diagram like this in L\mathrm{Graph}:

where \Gamma is an L-labelled graph and I, O are L-labelled graphs with no edges!

You see, if I and O have no edges, all they have is nodes. We call the nodes of I the inputs and those of O the outputs. The morphisms i: I \to \Gamma and o : O \to \Gamma say how these nodes are included in \Gamma. \Gamma is our circuit made of resistors.

In general, any diagram shaped like this is called a cospan:

If we turned the arrows around it would be called a span. Cospans are good whenever you a thing with an ‘input end’ and an ‘output end’, and you want to describe how the ends are included in the thing. So, they’re precisely what we need for describing a circuit made of resistors, like this:

This makes us want to cook up a category L\mathrm{Circ} where:

• an object I is an L-labelled graph with no edges. We can alternatively think of it as a finite set: a set of nodes.

• a morphism from I to O is a cospan of L-labelled graphs:

We still need to say how to compose these morphisms. We know it will amount to attaching the outputs of one circuit to the inputs of the next—that’s all there is to it! But we need to make this precise and prove we get a category. And as I’ve hinted, we will actually get something bigger and better: a bicategory! This will come as no surprise to if you’re familiar with span and cospan bicategories–but it may induce a heart attack otherwise.

This bicategory can then be ‘watered down’ to give our category L\mathrm{Circ}. And when we take

L = (0,\infty)

we’ll get the category where morphisms are circuits made of resistors! We’ll call this \mathrm{ResCirc}.

Then I’ll explain what we can do with this category! There’s no end of things we could do with it. But the main thing Brendan does is study the ‘black-boxing’ operation, where we take a circuit, forget its inner details, and only keep track of what it does. This turns out to be quite interesting.

References

I thank Brendan Fong for drawing some of the pictures of circuits here. For the details of what I’m starting to explain here, read our paper:

• John Baez and Brendan Fong, A compositional framework for passive linear networks.

You can learn more about the underlying ideas here:

• Dean C. Karnopp, Donald L. Margolis and Ronald C. Rosenberg, System Dynamics: a Unified Approach, Wiley, New York, 1990.

• Forbes T. Brown, Engineering System Dynamics: a Unified Graph-Centered Approach, CRC Press, Boca Raton, 2007.

• Francois E. Cellier, Continuous System Modelling, Springer, Berlin, 1991.


Network Theory News

28 September, 2014

 

You may be wondering, somewhere deep in the back of your mind, what happened to the Network Theory series on this blog. It’s nowhere near done! I plan to revive it, since soon I’ll be teaching a seminar on network theory at U.C. Riverside. It will start in October and go on at least until the end of March.

Here’s a version of the advertisement I sent to the math grad students:

Network Theory Seminar

I’ll be running a seminar on Network Theory on Mondays from 3:10 to 4:30 pm in Surge 268 starting on October 6th.

Network theory uses the tools of modern math—categories, operads and more—to study complex systems made of interacting parts. The idea of this seminar is to start from scratch, explain what my grad students have been doing, and outline new projects that they and other students might want to work on. A lot has happened since I left town in January.

I hope to see you there!

If you want more detail, here is a sketch of what’s been happening.

1) Franciscus Rebro has been working on “cospans”, a very general way to treat a physical system with inputs and outputs and treat it as a morphism in category. This underlies all the other projects.

2) Brendan Fong, a student of mine at Oxford, is working on a category where the morphisms are electrical circuits, and composing morphisms is sticking together circuits:

• Brendan Fong, A compositional approach to control theory.

• John Baez and Brendan Fong, A compositional framework for passive linear networks.

The first paper here is an outline of the project; the second one actually carries out the project, or at least a large chunk of it.

3) Blake Pollard, a student in the physics department, has been studying Markov processes. In a Markov process, things randomly hop from one vertex of a graph to another along edges. Blake has created a category where morphisms are ‘open’ Markov process, in which things flow in and out of certain special vertices called ‘terminals’.

4) Jason Erbele has been working on categories in control theory, the branch of math used to study physical systems that interact with the outside world via inputs and outputs. After finishing this paper:

• John Baez and Brendan Fong, Categories in control.

he’s been taking more concept from control theory and formalizing them using categories.

5) Oh yeah, and what about me? I gave a series of lectures on network theory at Oxford, and you can see videos of them here:

• John Baez, Network Theory.

Jacob Biamonte and I have also more or less finished a book on chemical reaction networks and Petri nets:

• John Baez and Jacob Biamonte, Quantum Techniques for Stochastic Mechanics.

But I won’t be talking about that; I want to talk about the new work my students are doing!


Quantum Frontiers in Network Science

6 May, 2014

guest post by Jacob Biamonte

There’s going to be a workshop on quantum network theory in Berkeley this June. The event is being organized by some of my collaborators and will be a satellite of the biggest annual network science conference, NetSci.

A theme of the Network Theory series here on Azimuth has been to merge ideas appearing in quantum theory with other disciplines. Remember the first post by John which outlined the goal of a general theory of networks? Well, everyone’s been chipping away at this stuff for a few years now and I think you’ll agree that this workshop seems like an excellent way to push these topics even further, particularly as they apply to complex networks.

The event is being organized by Mauro Faccin, Filippo Radicchi and Zoltán Zimborás. You might recall when Tomi Johnson first explained to us some ideas connecting quantum physics with the concepts of complex networks (see Quantum Network Theory Part 1 and Part 2). Tomi’s going to be speaking at this event. I understand there is even still a little bit of space left to contribute talks and/or to attend. I suspect that those interested can sort this out by emailing the organizers or just follow the instructions to submit an abstract.

They have named their event Quantum Frontiers in Network Science or QNET for short. Here’s their call.

Quantum Frontiers in Network Science

This year the biggest annual network science conference, NetSci will take place in Berkeley California on 2-6 June. We are organizing a one-day Satellite Workshop on Quantum Frontiers in Network Science (QNET).

quantum netsci2014

A grand challenge in contemporary complex network science is to reconcile the staple “statistical mechanics based approach” with a theory based on quantum physics. When considering networks where quantum coherence effects play a non-trivial role, the predictive power of complex network science has been shown to break down. A new theory is now being developed which is based on quantum theory, from first principles. Network theory is a diverse subject which developed independently in several disciplines to rely on graphs with additional structure to model complex systems. Network science has of course played a significant role in quantum theory, for example in topics such as tensor network states, chiral quantum walks on complex networks, categorical tensor networks, and categorical models of quantum circuits, to name only a few. However, the ideas of complex network science are only now starting to be united with modern quantum theory. From this respect, one aim of the workshop is to put in contact two big and generally not very well connected scientific communities: statistical and quantum physicists.

The topic of network science underwent a revolution when it was realized that systems such as social or transport networks could be interrelated through common network properties, but what are the relevant properties to consider when facing quantum systems? This question is particularly timely as there has been a recent push towards studying increasingly larger quantum mechanical systems, where the analysis is only beginning to undergo a shift towards embracing the concepts of complex networks.

brain network

For example, theoretical and experimental attention has turned to explaining transport in photosynthetic complexes comprising tens to hundreds of molecules and thousands of atoms using quantum mechanics. Likewise, in condensed matter physics using the language of “chiral quantum walks”, the topological structure of the interconnections comprising complex materials strongly affects their transport properties.

An ultimate goal is a mathematical theory and formal description which pinpoints the similarities and differences between the use of networks throughout the quantum sciences. This would give rise to a theory of networks augmenting the current statistical mechanics approach to complex network structure, evolution, and process with a new theory based on quantum mechanics.

Topics of special interest to the satellite include

• Quantum transport and chiral quantum walks on complex networks
• Detecting community structure in quantum systems
• Tensor algebra and multiplex networks
• Quantum information measures (such as entropy) applied to complex networks
• Quantum critical phenomena in complex networks
• Quantum models of network growth
• Quantum techniques for reaction networks
• Quantum algorithms for problems in complex network science
• Foundations of quantum theory in relation to complex networks and processes thereon
• Quantum inspired mathematics as a foundation for network science

Info

QNET will be held at the NetSci Conference venue at the Clark Kerr Campus of the University of California, on June 2nd in the morning (8am-1pm).

Links

• Main conference page: NetSci2014
Call for abstracts and the program

It sounds interesting! You’ll notice that the list of topics seems reminiscent of some of the things we’ve been talking about right here on Azimuth! A general theme of the Network Theory Series has been geared towards developing frameworks to describe networked systems through a common language and then to map the use of tools and results across disciplines. It seems like a great place to talk about these ideas. Oh, and here’s a current list of the speakers:

Leonardo Banchi (UCL, London)
Ginestra Bianconi (London)
Silvano Garnerone (IQC, Waterloo)
Laetitia Gauvin (ISI Foundation)
Marco Javarone (Sassari)
Tomi Johnson (Oxford)

and again, the organizers are

Mauro Faccin (ISI Foundation)
Filippo Radicchi (Indiana University)
Zoltán Zimborás (UCL)

From the call, we can notice that a central discussion topic at QNET will be about contrasting stochastic and quantum mechanics. Here on Azimuth we like this stuff. You might remember that stochastic mechanics was formulated in the network theory series to mathematically resemble quantum theory (see e.g. Part 12). This formalism was then employed to produce several results, including a stochastic version of Noether’s theorem by John and Brendan in Parts 11 and 13—recently Ville has also written Noether’s Theorem: Quantum vs Stochastic. Several other results were produced by relating quantum field theory to Petri nets from population biology and to chemical reaction networks in chemistry (see the Network Theory homepage). It seems to me that people attending QNET will be interested in these sorts of things, as well as other related topics.

One of the features of complex network science is that it is often numerically based and geared directly towards interesting real-world applications. I suspect some interesting results should stem from the discussions that will take place at this workshop.

By the way, here’s a view of downtown San Francisco at dusk from Berkeley Hills California from the NetSci homepage:

San Francisco

Noether’s Theorem: Quantum vs Stochastic

3 May, 2014

guest post by Ville Bergholm

In 1915 Emmy Noether discovered an important connection between the symmetries of a system and its conserved quantities. Her result has become a staple of modern physics and is known as Noether’s theorem.

Photo of Emmy Noether

The theorem and its generalizations have found particularly wide use in quantum theory. Those of you following the Network Theory series here on Azimuth might recall Part 11 where John Baez and Brendan Fong proved a version of Noether’s theorem for stochastic systems. Their result is now published here:

• John Baez and Brendan Fong, A Noether theorem for stochastic mechanics, J. Math. Phys. 54:013301 (2013).

One goal of the network theory series here on Azimuth has been to merge ideas appearing in quantum theory with other disciplines. John and Brendan proved their stochastic version of Noether’s theorem by exploiting ‘stochastic mechanics’ which was formulated in the network theory series to mathematically resemble quantum theory. Their result, which we will outline below, was different than what would be expected in quantum theory, so it is interesting to try to figure out why.

Recently Jacob Biamonte, Mauro Faccin and myself have been working to try to get to the bottom of these differences. What we’ve done is prove a version of Noether’s theorem for Dirichlet operators. As you may recall from Parts 16 and 20 of the network theory series, these are the operators that generate both stochastic and quantum processes. In the language of the series, they lie in the intersection of stochastic and quantum mechanics. So, they are a subclass of the infinitesimal stochastic operators considered in John and Brendan’s work.

The extra structure of Dirichlet operators—compared with the wider class of infinitesimal stochastic operators—provided a handle for us to dig a little deeper into understanding the intersection of these two theories. By the end of this article, astute readers will be able to prove that Dirichlet operators generate doubly stochastic processes.

Before we get into the details of our proof, let’s recall first how conservation laws work in quantum mechanics, and then contrast this with what John and Brendan discovered for stochastic systems. (For a more detailed comparison between the stochastic and quantum versions of the theorem, see Part 13 of the network theory series.)

The quantum case

I’ll assume you’re familiar with quantum theory, but let’s start with a few reminders.

In standard quantum theory, when we have a closed system with n states, the unitary time evolution of a state |\psi(t)\rangle is generated by a self-adjoint n \times n matrix H called the Hamiltonian. In other words, |\psi(t)\rangle satisfies Schrödinger’s equation:

i \hbar \displaystyle{\frac{d}{d t}} |\psi(t) \rangle = H |\psi(t) \rangle.

The state of a system starting off at time zero in the state |\psi_0 \rangle and evolving for a time t is then given by

|\psi(t) \rangle = e^{-i t H}|\psi_0 \rangle.

The observable properties of a quantum system are associated with self-adjoint operators. In the state |\psi \rangle, the expected value of the observable associated to a self-adjoint operator O is

\langle O \rangle_{\psi} = \langle \psi | O | \psi \rangle

This expected value is constant in time for all states if and only if O commutes with the Hamiltonian H:

[O, H] = 0 \quad \iff \quad \displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0 \quad \forall \: |\psi_0 \rangle, \forall t.

In this case we say O is a ‘conserved quantity’. The fact that we have two equivalent conditions for this is a quantum version of Noether’s theorem!

The stochastic case

In stochastic mechanics, the story changes a bit. Now a state |\psi(t)\rangle is a probability distribution: a vector with n nonnegative components that sum to 1. Schrödinger’s equation gets replaced by the master equation:

\displaystyle{\frac{d}{d t}} |\psi(t) \rangle = H |\psi(t) \rangle

If we start with a probability distribution |\psi_0 \rangle at time zero and evolve it according to this equation, at any later time have

|\psi(t)\rangle = e^{t H} |\psi_0 \rangle.

We want this always be a probability distribution. To ensure that this is so, the Hamiltonian H must be infinitesimal stochastic: that is, a real-valued n \times n matrix where the off-diagonal entries are nonnegative and the entries of each column sum to zero. It no longer needs to be self-adjoint!

When H is infinitesimal stochastic, the operators e^{t H} map the set of probability distributions to itself whenever t \ge 0, and we call this family of operators a continuous-time Markov process, or more precisely a Markov semigroup.

In stochastic mechanics, we say an observable O is a real diagonal n \times n matrix, and its expected value is given by

\langle O\rangle_{\psi} = \langle \hat{O} | \psi \rangle

where \hat{O} is the vector built from the diagonal entries of O. More concretely,

\langle O\rangle_{\psi} = \displaystyle{ \sum_i O_{i i} \psi_i }

where \psi_i is the ith component of the vector |\psi\rangle.

Here is a version of Noether’s theorem for stochastic mechanics:

Noether’s Theorem for Markov Processes (Baez–Fong). Suppose H is an infinitesimal stochastic operator and O is an observable. Then

[O,H] =0

if and only if

\displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0

and

\displaystyle{\frac{d}{d t}}\langle O^2 \rangle_{\psi(t)} = 0

for all t \ge 0 and all \psi(t) obeying the master equation.   █

So, just as in quantum mechanics, whenever [O,H]=0 the expected value of O will be conserved:

\displaystyle{\frac{d}{d t}} \langle O\rangle_{\psi(t)} = 0

for any \psi_0 and all t \ge 0. However, John and Brendan saw that—unlike in quantum mechanics—you need more than just the expectation value of the observable O to be constant to obtain the equation [O,H]=0. You really need both

\displaystyle{\frac{d}{d t}} \langle O\rangle_{\psi(t)} = 0

together with

\displaystyle{\frac{d}{d t}} \langle O^2\rangle_{\psi(t)} = 0

for all initial data \psi_0 to be sure that [O,H]=0.

So it’s a bit subtle, but symmetries and conserved quantities have a rather different relationship than they do in quantum theory.

A Noether theorem for Dirichlet operators

But what if the infinitesimal generator of our Markov semigroup is also self-adjoint? In other words, what if H is both an infinitesimal stochastic matrix but also its own transpose: H = H^\top? Then it’s called a Dirichlet operator… and we found that in this case, we get a stochastic version of Noether’s theorem that more closely resembles the usual quantum one:

Noether’s Theorem for Dirichlet Operators. If H is a Dirichlet operator and O is an observable, then

[O, H] = 0 \quad \iff \quad \displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0 \quad \forall \: |\psi_0 \rangle, \forall t \ge 0

Proof. The \Rightarrow direction is easy to show, and it follows from John and Brendan’s theorem. The point is to show the \Leftarrow direction. Since H is self-adjoint, we may use a spectral decomposition:

H = \displaystyle{ \sum_k E_k |\phi_k \rangle \langle \phi_k |}

where \phi_k are an orthonormal basis of eigenvectors, and E_k are the corresponding eigenvalues. We then have:

\displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = \langle \hat{O} | H e^{t H} |\psi_0 \rangle = 0 \quad \forall \: |\psi_0 \rangle, \forall t \ge 0

\iff \quad \langle \hat{O}| H e^{t H} = 0 \quad \forall t \ge 0

\iff \quad \sum_k \langle \hat{O} | \phi_k \rangle E_k e^{t E_k} \langle \phi_k| = 0 \quad \forall t \ge 0

\iff \quad \langle \hat{O} | \phi_k \rangle E_k e^{t E_k} = 0 \quad \forall t \ge 0

\iff \quad |\hat{O} \rangle \in \mathrm{Span}\{|\phi_k \rangle \, : \; E_k = 0\} = \ker \: H,

where the third equivalence is due to the vectors |\phi_k \rangle being linearly independent. For any infinitesimal stochastic operator H the corresponding transition graph consists of m connected components iff we can reorder (permute) the states of the system such that H becomes block-diagonal with m blocks. Now it is easy to see that the kernel of H is spanned by m eigenvectors, one for each block. Since H is also symmetric, the elements of each such vector can be chosen to be ones within the block and zeros outside it. Consequently

|\hat{O} \rangle \in \ker \: H

implies that we can choose the basis of eigenvectors of O to be the vectors |\phi_k \rangle, which implies

[O, H] = 0

Alternatively,

|\hat{O} \rangle \in \ker \, H

implies that

|\hat{O^2} \rangle \in \ker \: H \; \iff \; \cdots \; \iff \; \displaystyle{\frac{d}{d t}} \langle O^2 \rangle_{\psi(t)} = 0 \; \forall \: |\psi_0 \rangle, \forall t \ge 0,

where we have used the above sequence of equivalences backwards. Now, using John and Brendan’s original proof, we can obtain [O, H] = 0.   █

In summary, by restricting ourselves to the intersection of quantum and stochastic generators, we have found a version of Noether’s theorem for stochastic mechanics that looks formally just like the quantum version! However, this simplification comes at a cost. We find that the only observables O whose expected value remains constant with time are those of the very restricted type described above, where the observable has the same value in every state in a connected component.

Puzzles

Suppose we have a graph whose graph Laplacian matrix H generates a Markov semigroup as follows:

U(t) = e^{t H}

Puzzle 1. Suppose that also H = H^\top, so that H is a Dirichlet operator and hence i H generates a 1-parameter unitary group. Show that the indegree and outdegree of any node of our graph must be equal. Graphs with this property are called balanced.

Puzzle 2. Suppose that U(t) = e^{t H} is doubly stochastic Markov semigroup, meaning that for all t \ge 0 each row and each column of U(t) sums to 1:

\displaystyle{ \sum_i U(t)_{i j} = \sum_j U(t)_{i j} = 1 }

and all the matrix entries are nonnegative. Show that the Hamiltonian H obeys

\displaystyle{\sum_i H_{i j} = \sum_j H_{i j} = 0 }

and all the off-diagonal entries of H are nonnegative. Show the converse is also true.

Puzzle 3. Prove that any doubly stochastic Markov semigroup U(t) is of the form e^{t H} where H is the graph Laplacian of a balanced graph.

Puzzle 4. Let O(t) be a possibly time-dependent observable, and write \langle O(t) \rangle_{\psi(t)} for its expected value with respect to some initial state \psi_0 evolving according to the master equation. Show that

\displaystyle{ \frac{d}{d t}\langle O(t)\rangle_{\psi(t)} = \left\langle [O(t), H] \right\rangle_{\psi(t)} + \left\langle \frac{\partial O(t)}{\partial t}\right\rangle_{\psi(t)} }

This is a stochastic version of the Ehrenfest theorem.


Programming with Chemical Reaction Networks

23 March, 2014

 

There will be a 5-day workshop on Programming with Chemical Reaction Networks: Mathematical Foundation at BIRS from Sunday, June 8 to Friday June 13, 2014 It’s being organized by

Anne Condon (University of British Columbia)
David Doty (California Institute of Technology)
Chris Thachuk (University of Oxford).

BIRS is the Banff International Research Station, in the mountains west of Calgary, in Alberta, Canada.

Description

Here’s the workshop proposal on the BIRS website. It’s a pretty interesting proposal, especially if you’ve already read Luca Cardelli’s description of computing with chemical reaction networks, at the end of our series of posts on chemical reaction networks. The references include a lot of cool papers, so I’ve created links to those to help you get ahold of them.

This workshop will explore three of the most important research themes concerning stochastic chemical reaction networks (CRNs). Below we motivate each theme and highlight key questions that the workshop will address. Our main objective is to bring together distinct research communities in order to consider new problems that could not be fully appreciated in isolation. It is also our aim to determine commonalities between different disciplines and bodies of research. For example, research into population protocols, vector addition systems, and Petri networks provide a rich body of theoretical results that may already address contemporary problems arising in the study of CRNs.

Computational power of CRNs

Before designing robust and practical systems, it is useful to know the limits to computing with a chemical soup. Some interesting theoretical results are already known for stochastic chemical reaction networks. The computational power of CRNs depend upon a number of factors, including: (i) is the computation deterministic, or probabilistic, and (ii) does the CRN have an initial context — certain species, independent of the input, that are initially present in some exact, constant count.

In general, CRNs with a constant number of species (independent of the input length) are capable of Turing universal computation [17], if the input is represented by the exact (unary) count of one molecular species, some small probability of error is permitted and an initial context in the form of a single-copy leader molecule is used.

Could the same result hold in the absence of an initial context? In a surprising result based on the distributed computing model of population protocols, it has been shown that if a computation must be error-free, then deterministic computation with CRNs having an initial context is limited to computing semilinear predicates [1], later extended to functions outputting natural numbers encoded by molecular counts [5].

Furthermore, any semilinear predicate or function can be computed by that class of CRNs in expected time polylogarithmic in the input length. Building on this result, it was recently shown that by incurring an expected time linear in the input length, the same result holds for “leaderless” CRNs [8] — CRNs with no initial context. Can this result be improved to sub-linear expected time? Which class of functions can be computed deterministically by a CRN without an initial context in expected time polylogarithmic in the input length?

While (restricted) CRNs are Turing-universal, current results use space proportional to the computation time. Using a non-uniform construction, where the number of species is proportional to the input length and each initial species is present in some constant count, it is known that any S(n) space-bounded computation can be computed by a logically-reversible tagged CRN, within a reaction volume of size poly(S(n)) [18]. Tagged CRNs were introduced to model explicitly the fuel molecules in physical realizations of CRNs such as DNA strand displacement systems [6] that are necessary to supply matter and energy for implementing reactions such as X → X + Y that violate conservation of mass and/or energy.

Thus, for space-bounded computation, there exist CRNs that are time-efficient or are space-efficient. Does there exist time- and space-efficient CRNs to compute any space-bounded function?

Designing and verifying robust CRNs

While CRNs provide a concise model of chemistry, their physical realizations are often more complicated and more granular. How can one be sure they accurately implement the intended network behaviour? Probabilistic model checking has already been employed to find and correct inconsistencies between CRNs and their DNA strand displacement system (DSD) implementations [9]. However, at present, model checking of arbitrary CRNs is only capable of verifying the correctness of very small systems. Indeed, verification of these types of systems is a difficult problem: probabilistic state reachability is undecidable [17, 20] and general state reachability is EXPSPACE-hard [4].

How can larger systems be verified? A deeper understanding of CRN behaviour may simplify the process of model checking. As a motivating example, there has been recent progress towards verifying that certain DSD implementations correctly simulate underlying CRNs [16, 7, 10]. This is an important step to ensuring correctness, prior to experiments. However, DSDs can also suffer from other errors when implementing CRNs, such as spurious hybridization or strand displacement. Can DSDs and more generally CRNs be designed to be robust to such predictable errors? Can error correcting codes and redundant circuit designs used in traditional computing be leveraged in these chemical computers? Many other problems arise when implementing CRNs. Currently, unique types of fuel molecules must be designed for every reaction type. This complicates the engineering process significantly. Can a universal type of fuel be designed to smartly implement any reaction?

Energy efficient computing with CRNs

Rolf Landauer showed that logically irreversible computation — computation as modeled by a standard Turing machine — dissipates an amount of energy proportional to the number of bits of information lost, such as previous state information, and therefore cannot be energy efficient [11]. However, Charles Bennett showed that, in principle, energy efficient computation is possible, by proposing a universal Turing machine to perform logically-reversible computation and identified nucleic acids (RNA/DNA) as a potential medium to realize logically-reversible computation in a physical system [2].

There have been examples of logically-reversible DNA strand displacement systems — a physical realization of CRNs — that are, in theory, capable of complex computation [12, 19]. Are these systems energy efficient in a physical sense? How can this argument be made formally to satisfy both the computer science and the physics communities? Is a physical experiment feasible, or are these results merely theoretical footnotes?

References

[1] D. Angluin, J. Aspnes, and D. Eisenstat. Stably computable predicates are semilinear. In PODC, pages 292–299, 2006.

[2] C. H. Bennett. Logical reversibility of computation. IBM Journal of Research and Development, 17 (6):525–532, 1973.

[3] L. Cardelli and A. Csikasz-Nagy. The cell cycle switch computes approximate majority. Scientific Reports, 2, 2012.

[4] E. Cardoza, R. Lipton, A. R. Meyer. Exponential space complete problems for Petri nets and commutative semigroups (Preliminary Report). Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, pages 507–54, 1976.

[5] H. L. Chen, D. Doty, and D. Soloveichik. Deterministic function computation with chemical reaction networks. DNA Computing and Molecular Programming, pages 25–42, 2012.

[6] A. Condon, A. J. Hu, J. Manuch, and C. Thachuk. Less haste, less waste: on recycling and its limits in strand displacement systems. Journal of the Royal Society: Interface Focus, 2 (4):512–521, 2012.

[7] Q. Dong. A bisimulation approach to verification of molecular implementations of formal chemical reaction network. Master’s thesis. SUNY Stony Brook, 2012.

[8] D. Doty and M. Hajiaghayi. Leaderless deterministic chemical reaction networks. In Proceedings of the 19th International Meeting on DNA Computing and Molecular Programming, 2013.

[9] M. R. Lakin, D. Parker, L. Cardelli, M. Kwiatkowska, and A. Phillips. Design and analysis of DNA strand displacement devices using probabilistic model checking. Journal of The Royal Society Interface, 2012.

[10] M. R. Lakin, D. Stefanovic and A. Phillips. Modular Verification of Two-domain DNA Strand Displacement Networks via Serializability Analysis. In Proceedings of the 19th Annual conference on DNA computing, 2013.

[11] R. Landauer. Irreversibility and heat generation in the computing process. IBM Journal of research and development, 5 (3):183–191, 1961.

[12] L. Qian, D. Soloveichik, and E. Winfree. Efficient Turing-universal computation with DNA polymers (extended abstract) . In Proceedings of the 16th Annual conference on DNA computing, pages 123–140, 2010.

[13] L. Qian and E. Winfree. Scaling up digital circuit computation with DNA strand displacement cascades. Science, 332 (6034):1196–1201, 2011.

[14] L. Qian, E. Winfree, and J. Bruck. Neural network computation with DNA strand displacement cascades. Nature, 475 (7356):368–372, 2011.

[15] G. Seelig, D. Soloveichik, D.Y. Zhang, and E. Winfree. Enzyme-free nucleic acid logic circuits. Science, 314 (5805):1585–1588, 2006.

[16] S. W. Shin. Compiling and verifying DNA-based chemical reaction network implementations. Master’s thesis. California Insitute of Technology, 2011.

[17] D. Soloveichik, M. Cook, E. Winfree, and J. Bruck. Computation with finite stochastic chemical reaction networks. Natural Computing, 7 (4):615–633, 2008.

[18] C. Thachuk. Space and energy efficient molecular programming. PhD thesis, University of British Columbia, 2012.

[19] C. Thachuk and A. Condon. Space and energy efficient computation with DNA strand displacement systems. In Proceedings of the 18th Annual International Conference on DNA computing and Molecular Programming, 2012.

[20] G. Zavattaro and L. Cardelli. Termination Problems in Chemical Kinetics. In Proceedings of the 2008 Conference on Concurrency Theory, pages 477–491, 2008.


Networks of Dynamical Systems

18 March, 2014

guest post by Eugene Lerman

Hi, I’m Eugene Lerman. I met John back in the mid 1980s when John and I were grad students at MIT. John was doing mathematical physics and I was studying symplectic geometry. We never talked about networks. Now I teach in the math department at the University of Illinois at Urbana, and we occasionally talk about networks on his blog.

A few years ago a friend of mine who studies locomotion in humans and other primates asked me if I knew of any math that could be useful to him.

I remember coming across an expository paper on ‘coupled cell networks':

• Martin Golubitsky and Ian Stewart, Nonlinear dynamics of networks: the groupoid formalism, Bull. Amer. Math. Soc. 43 (2006), 305–364.

In this paper, Golubitsky and Stewart used the study of animal gaits and models for the hypothetical neural networks called ‘central pattern generators’ that give rise to these gaits to motivate the study of networks of ordinary differential equations with symmetry. In particular they were interested in ‘synchrony’. When a horse trots, or canters, or gallops, its limbs move in an appropriate pattern, with different pairs of legs moving in synchrony:


They explained that synchrony (and the patterns) could arise when the differential equations have finite group symmetries. They also proposed several systems of symmetric ordinary differential equations that could generate the appropriate patterns.

Later on Golubitsky and Stewart noticed that there are systems of ODEs that have no group symmetries but whose solutions nonetheless exhibit certain synchrony. They found an explanation: these ODEs were ‘groupoid invariant’. I thought that it would be fun to understand what ‘groupoid invariant’ meant and why such invariance leads to synchrony.

I talked my colleague Lee DeVille into joining me on this adventure. At the time Lee had just arrived at Urbana after a postdoc at NYU. After a few years of thinking about these networks Lee and I realized that strictly speaking one doesn’t really need groupoids for these synchrony results and it’s better to think of the social life of networks instead. Here is what we figured out—a full and much too precise story is here:

• Eugene Lerman and Lee DeVille, Dynamics on networks of manifolds.

Let’s start with an example of a class of ODEs with a mysterious property:

Example. Consider this ordinary differential equation for a function \vec{x} : \mathbb{R} \to {\mathbb{R}}^3

\begin{array}{rcl}  \dot{x}_1&=& f(x_1,x_2)\\  \dot{x}_2&=& f(x_2,x_1)\\  \dot{x}_3&=& f(x_3, x_2)  \end{array}

for some function f:{\mathbb{R}}^2 \to {\mathbb{R}}. It is easy to see that a function x(t) solving

\displaystyle{  \dot{x} = f(x,x)  }

gives a solution of these equations if we set

\vec{x}(t) = (x(t),x(t),x(t))

You can think of the differential equations in this example as describing the dynamics of a complex system built out of three interacting subsystems. Then any solution of the form

\vec{x}(t) = (x(t),x(t),x(t))

may be thought of as a synchronization: the three subsystems are evolving in lockstep.

One can also view the result geometrically: the diagonal

\displaystyle{  \Delta = \{(x_1,x_2, x_3)\in {\mathbb{R}}^3 \mid x_1 =x_2 = x_3\}  }

is an invariant subsystem of the continuous-time dynamical system defined by the differential equations. Remarkably enough, such a subsystem exists for any choice of a function f.

Where does such a synchronization or invariant subsystem come from? There is no apparent symmetry of {\mathbb{R}}^3 that preserves the differential equations and fixes the diagonal \Delta, and thus could account for this invariant subsystem. It turns out that what matters is the structure of the mutual dependencies of the three subsystems making up the big system. That is, the evolution of x_1 depends only on x_1 and x_2, the evolution of x_2 depends only on x_2 and x_3, and the evolution of x_3 depends only on x_3 and x_2.

These dependencies can be conveniently pictured as a directed graph:

The graph G has no symmetries. Nonetheless, the existence of the invariant subsystem living on the diagonal \Delta can be deduced from certain properties of the graph G. The key is the existence of a surjective map of graphs

\varphi :G\to G'

from our graph G to a graph G' with exactly one node, call it a, and one arrow. It is also crucial that \varphi has the following lifting property: there is a unique way to lift the one and only arrow of G' to an arrow of G once we specify the target node of the lift.

We now formally define the notion of a ‘network of phase spaces’ and of a continuous-time dynamical system on such a network. Equivalently, we define a ‘network of continuous-time dynamical systems’. We start with a directed graph

G=\{G_1\rightrightarrows G_0\}

Here G_1 is the set of edges, G_0 is the set of nodes, and the two arrows assign to an edge its source and target, respectively. To each node we attach a phase space (or more formally a manifold, perhaps with boundary or corners). Here ‘attach’ means that we choose a function {\mathcal P}:G_0 \to {\mathsf{PhaseSpace}}; it assigns to each node a\in G_0 a phase space {\mathcal P}(a).

In our running example, to each node of the graph G we attach the real line {\mathbb{R}}. (If we think of the set G_0 as a discrete category and {\mathsf{PhaseSpace}} as a category of manifolds and smooth maps, then {\mathcal P} is simply a functor.)

Thus a network of phase spaces is a pair (G,{\mathcal P}), where G is a directed graph and {\mathcal P} is an assignment of phase spaces to the nodes of G.

We think of the collection \{{\mathcal P}(a)\}_{a\in G_0} as the collection of phase spaces of the subsystems constituting the network (G, {\mathcal P}). We refer to {\mathcal P} as a phase space function. Since the state of the network should be determined completely and uniquely by the states of its subsystems, it is reasonable to take the total phase space of the network to be the product

\displaystyle{  {\mathbb{P}}(G, {\mathcal P}):= \bigsqcap_{a\in G_0} {\mathcal P}(a).  }

In the example the total phase space of the network (G,{\mathcal P}) is {\mathbb{R}}^3, while the phase space of the network (G', {\mathcal P}') is the real line {\mathbb{R}}.

Finally we need to interpret the arrows. An arrow b\xrightarrow{\gamma}a in a graph G should encode the fact that the dynamics of the subsystem associated to the node a depends on the states of the subsystem associated to the node b. To make this precise requires the notion of an ‘open system’, or ‘control system’, which we define below. It also requires a way to associate an open system to the set of arrows coming into a node/vertex a.

To a first approximation an open system (or control system, I use the two terms interchangeably) is a system of ODEs depending on parameters. I like to think of a control system geometrically: a control system on a phase space M controlled by the the phase space U is a map

F: U\times M \to TM

where TM is the tangent bundle of the space M, so that for all (u,m)\in U\times M, F(u,m) is a vector tangent to M at the point m. Given phase spaces U and M the set of all corresponding control systems forms a vector space which we denote by

\displaystyle{ \mathsf{Ctrl}(U\times M \to M)}

(More generally one can talk about the space of control systems associated with a surjective submersion Q\to M. For us, submersions of the form U\times M \to M are general enough.)

To encode the incoming arrows, we introduce the input tree I(a) (this is a very short tree, a corolla if you like). The input tree of a node a of a graph G is a directed graph whose arrows are precisely the arrows of G coming into the vertex a, but any two parallel arrows of G with target a will have disjoint sources in I(a). In the example the input tree I of the one node of a of G' is the tree

There is always a map of graphs \xi:I(a) \to G. For instance for the input tree in the example we just discussed, \xi is the map

Consequently if (G,{\mathcal P}) is a network and I(a) is an input tree of a node of G, then (I(a), {\mathcal P}\circ \xi) is also a network. This allows us to talk about the phase space {\mathbb{P}} I(a) of an input tree. In our running example,

{\mathbb{P}} I(a) = {\mathbb{R}}^2

Given a network (G,{\mathcal P}), there is a vector space \mathsf{Ctrl}({\mathbb{P}} I(a)\to {\mathbb{P}} a) of open systems associated with every node a of G.

In our running example, the vector space associated to the one node a of (G',{\mathcal P}') is

\mathsf{Ctrl}({\mathbb{R}}^2, {\mathbb{R}})  \simeq C^\infty({\mathbb{R}}^2, {\mathbb{R}})

In the same example, the network (G,{\mathcal P}) has three nodes and we associate the same vector space C^\infty({\mathbb{R}}^2, {\mathbb{R}}) to each one of them.

We then construct an interconnection map

\displaystyle{  {\mathcal{I}}: \bigsqcap_{a\in G_0} \mathsf{Ctrl}({\mathbb{P}} I(a)\to {\mathbb{P}} a) \to \Gamma (T{\mathbb{P}}(G, {\mathcal P})) }

from the product of spaces of all control systems to the space of vector fields

\Gamma (T{\mathbb{P}} (G, {\mathcal P}))

on the total phase space of the network. (We use the standard notation to denote the tangent bundle of a manifold R by TR and the space of vector fields on R by \Gamma (TR)). In our running example the interconnection map for the network (G',{\mathcal P}') is the map

\displaystyle{  {\mathcal{I}}: C^\infty({\mathbb{R}}^2, {\mathbb{R}}) \to C^\infty({\mathbb{R}}, {\mathbb{R}}), \quad f(x,u) \mapsto f(x,x).  }

The interconnection map for the network (G,{\mathcal P}) is the map

\displaystyle{  {\mathcal{I}}: C^\infty({\mathbb{R}}^2, {\mathbb{R}})^3 \to C^\infty({\mathbb{R}}^3, {\mathbb{R}}^3)}

given by

\displaystyle{  ({\mathscr{I}}(f_1,f_2, f_3))\,(x_1,x_2, x_3) = (f_1(x_1,x_2), f_2(x_2,x_1),  f_3(x_3,x_2)).  }

To summarize: a dynamical system on a network of phase spaces is the data (G, {\mathcal P}, (w_a)_{a\in G_0} ) where G=\{G_1\rightrightarrows G_0\} is a directed graph, {\mathcal P}:{\mathcal P}:G_0\to {\mathsf{PhaseSpace}} is a phase space function and (w_a)_{a\in G_0} is a collection of open systems compatible with the graph and the phase space function. The corresponding vector field on the total space of the network is obtained by interconnecting the open systems.

Dynamical systems on networks can be made into the objects of a category. Carrying this out gives us a way to associate maps of dynamical systems to combinatorial data.

The first step is to form the category of networks of phase spaces, which we call {\mathsf{Graph}}/{\mathsf{PhaseSpace}}. In this category, by definition, a morphism from a network (G,{\mathcal P}) to a network (G', {\mathcal P}') is a map of directed graphs \varphi:G\to G' which is compatible with the phase space functions:

\displaystyle{  {\mathcal P}'\circ \varphi = {\mathcal P}.  }

Using the universal properties of products it is easy to show that a map of networks \varphi: (G,{\mathcal P})\to (G',{\mathcal P}') defines a map {\mathbb{P}}\varphi of total phase spaces in the opposite direction:

\displaystyle{  {\mathbb{P}} \varphi: {\mathbb{P}} G' \to {\mathbb{P}} G.  }

In the category theory language the total phase space assignment extends to a contravariant functor

\displaystyle{ {\mathbb{P}}:  {({\mathsf{Graph}}/{\mathsf{Region}})}^{\mbox{\sf {\tiny {op}}}} \to  {\mathsf{Region}}.  }

We call this functor the total phase space functor. In our running example, the map

{\mathbb{P}}\varphi:{\mathbb{R}} = {\mathbb{P}}(G',{\mathcal P}') \to  {\mathbb{R}}^3 = {\mathbb{P}} (G,{\mathcal P})

is given by

\displaystyle{  {\mathbb{P}} \varphi (x) = (x,x,x).  }

Continuous-time dynamical systems also form a category, which we denote by \mathsf{DS}. The objects of this category are pairs consisting of a phase space and a vector field on this phase space. A morphism in this category is a smooth map of phase spaces that intertwines the two vector fields. That is:

\displaystyle{  \mathrm{Hom}_\mathsf{DS} ((M,X), (N,Y))   = \{f:M\to N \mid Df \circ X = Y\circ f\}  }

for any pair of objects (M,X), (N,Y) in \mathsf{DS}.

In general, morphisms in this category are difficult to determine explicitly. For example if (M, X) = ((a,b), \frac{d}{dt}) then a morphism from (M,X) to some dynamical system (N,Y) is simply a piece of an integral curve of the vector field Y defined on an interval (a,b). And if (M, X) = (S^1, \frac{d}{d\theta}) is the constant vector field on the circle then a morphism from (M,X) to (N,Y) is a periodic orbit of Y. Proving that a given dynamical system has a periodic orbit is usually hard.

Consequently, given a map of networks

\varphi:(G,{\mathcal P})\to (G',{\mathcal P}')

and a collection of open systems

\{w'_{a'}\}_{a'\in G'_0}

on (G',{\mathcal P}') we expect it to be very difficult if not impossible to find a collection of open systems \{w_a\}_{a\in G_0} so that

\displaystyle{  {\mathbb{P}} \varphi: ({\mathbb{P}} G', {\mathscr{I}}' (w'))\to ({\mathbb{P}} G, {\mathscr{I}} (w))  }

is a map of dynamical systems.

It is therefore somewhat surprising that there is a class of maps of graphs for which the above problem has an easy solution! The graph maps of this class are known by several different names. Following Boldi and Vigna we refer to them as graph fibrations. Note that despite what the name suggests, graph fibrations in general are not required to be surjective on nodes or edges. For example, the inclusion

is a graph fibration. We say that a map of networks

\varphi:(G,{\mathcal P})\to (G',{\mathcal P}')

is a fibration of networks if \varphi:G\to G' is a graph fibration. With some work one can show that a fibration of networks induces a pullback map

\displaystyle{  \varphi^*: \bigsqcap_{b\in G_0'} \mathsf{Ctrl}({\mathbb{P}} I(b)\to {\mathbb{P} b) \to  \bigsqcap_{a\in G_0} \mathsf{Ctrl}({\mathbb{P}}} I(a)\to {\mathbb{P}} a)  }

on the sets of tuples of the associated open systems. The main result of Dynamics on networks of manifolds is a proof that for a fibration of networks \varphi:(G,{\mathcal P})\to (G',{\mathcal P}') the maps \varphi^*, {\mathbb{P}} \varphi and the two interconnection maps {\mathcal{I}} and {\mathcal{I}}' are compatible:

Theorem. Let \varphi:(G,{\mathcal P})\to (G',{\mathcal P}') be a fibration of networks of manifolds. Then the pullback map

\displaystyle{ \varphi^*: \mathsf{Ctrl}(G',{\mathcal P}')\to \mathsf{Ctrl}(G,{\mathcal P})  }

is compatible with the interconnection maps

\displaystyle{  {\mathcal{I}}': \mathsf{Ctrl}(G',{\mathcal P}')) \to \Gamma (T{\mathbb{P}} G') }

and

\displaystyle{  {\mathcal{I}}:  (\mathsf{Ctrl}(G,{\mathcal P})) \to \Gamma (T{\mathbb{P}} G)  }

Namely for any collection w'\in \mathsf{Ctrl}(G',{\mathcal P}') of open systems on the network (G', {\mathcal P}') the diagram

commutes. In other words,

\displaystyle{ {\mathbb{P}} \varphi: ({\mathbb{P}}  (G',{\mathcal P}'), {\mathscr{I}}' (w'))\to ({\mathbb{P}} (G,  {\mathcal P}), {\mathscr{I}} (\varphi^* w')) }

is a map of continuous-time dynamical systems, a morphism in \mathsf{DS}.

At this point the pure mathematician in me is quite happy: I have a theorem! Better yet, the theorem solves the puzzle at the beginning of this post. But if you have read this far, you may well be wondering: “Ok, you told us about your theorem. Very nice. Now what?”

There is plenty to do. On the practical side of things, the continuous-time dynamical systems are much too limited for contemporary engineers. Most of the engineers I know care a lot more about hybrid systems. These kinds of systems exhibit both continuous time behavior and abrupt transitions, hence the name. For example, anti-lock breaks on a car is just that kind of a system — if a sensor detects that the car is skidding and the foot break is pressed, it starts pulsing the breaks. This is not your father’s continuous time dynamical system! Hybrid dynamical systems are very hard to understand. They have been all the rage in the engineering literature for the last 10-15 years. Sadly, most pure mathematicians never heard of them. It would be interesting to extend the theorem I am bragging about to hybrid systems.

Here is another problem that may be worth thinking about: how much of the theorem holds up to numerical simulation? My feeling is that any explicit numerical integration method will behave well. Implicit methods I am not sure about.

And finally a more general issue. John has been talking about networks quite a bit on this blog. But his networks and my networks look like very different mathematical structures. What do they have in common besides nodes and arrows? What mathematical structure are they glimpses of?


Follow

Get every new post delivered to your Inbox.

Join 2,877 other followers