Network Theory (Part 24)

23 August, 2012

Now we’ve reached the climax of our story so far: we’re ready to prove the deficiency zero theorem. First let’s talk about it informally a bit. Then we’ll review the notation, and then—hang on to your seat!—we’ll give the proof.

The crucial trick is to relate a bunch of chemical reactions, described by a ‘reaction network’ like this:

to a simpler problem where a system randomly hops between states arranged in the same pattern:

This is sort of amazing, because we’ve thrown out lots of detail. It’s also amazing because this simpler problem is linear. In the original problem, the chance that a reaction turns a B + E into a D is proportional to the number of B’s times the number of E’s. That’s nonlinear! But in the simplified problem, the chance that your system hops from state 4 to state 3 is just proportional to the probability that it’s in state 4 to begin with. That’s linear.

The wonderful thing is that, at least under some conditions, we can find equilibrium solutions of our original problem starting from equilibrium solutions of the simpler problem.

Let’s roughly sketch how it works, and where we are so far. Our simplified problem is described by an equation like this:

\displaystyle{ \frac{d}{d t} \psi = H \psi }

where \psi is a function that gives the probability of being in each state, and H describes the probability per time of hopping from one state to another. We can easily understand quite a lot about the equilibrium solutions, where \psi doesn’t change at all:

H \psi = 0

because this is a linear equation. We did this in Part 23. Of course, when I say ‘easily’, that’s a relative thing: we needed to use the Perron–Frobenius theorem, which Jacob introduced in Part 20. But that’s a well-known theorem in linear algebra, and it’s easy to apply here.

In Part 22, we saw that the original problem was described by an equation like this, called the ‘rate equation’:

\displaystyle{ \frac{d x}{d t} = Y H x^Y  }

Here x is a vector whose entries describe the amount of each kind of chemical: the amount of A’s, the amount of B’s, and so on. The matrix H is the same as in the simplified problem, but Y is a matrix that says how many times each chemical shows up in each spot in our reaction network:

The key thing to notice is x^Y, where we take a vector and raise it to the power of a matrix. We explained this operation back in Part 22. It’s this operation that says how many B + E pairs we have, for example, given the number of B’s and the number of E’s. It’s this that makes the rate equation nonlinear.
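If you like seeing such operations in code, here is a minimal sketch in Python. The complexes, species and amounts below are made up for illustration:

```python
import numpy as np

def vector_power(x, Y):
    """Raise the vector x to the power of the matrix Y:
    (x^Y)_i = prod_j x_j ** Y[i, j]."""
    return np.prod(x ** Y, axis=1)

# Y[i, j] = number of things of the j-th species in the i-th complex.
# Hypothetical complexes B + E and D, built from species (B, D, E):
Y = np.array([[1, 0, 1],    # B + E
              [0, 1, 0]])   # D
x = np.array([3.0, 5.0, 2.0])   # amounts of B, D, E
vector_power(x, Y)  # array([6., 5.]): 3 * 2 ways to pair a B with an E, and 5 D's
```

The entry for B + E is the product of the number of B’s and the number of E’s: exactly the nonlinearity in the rate equation.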

Now, we’re looking for equilibrium solutions of the rate equation, where the rate of change is zero:

Y H x^Y = 0

But in fact we’ll do even better! We’ll find solutions of this:

H x^Y = 0

And we’ll get these by taking our solutions of this:

H \psi = 0

and adjusting them so that

\psi = x^Y

while \psi remains a solution of H \psi = 0.

But: how do we do this ‘adjusting’? That’s the crux of the whole business! That’s what we’ll do today.

Remember, \psi is a function that gives a probability for each ‘state’, or numbered box here:

The picture here consists of two pieces, called ‘connected components’: the piece containing boxes 0 and 1, and the piece containing boxes 2, 3 and 4. It turns out that we can multiply \psi by a function that’s constant on each connected component, and if we had H \psi = 0 to begin with, that will still be true afterward. The reason is that there’s no way for \psi to ‘leak across’ from one component to another. It’s like having water in separate buckets. You can increase the amount of water in one bucket, and decrease it in another, and as long as the water’s surface remains flat in each bucket, the whole situation remains in equilibrium.
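The bucket analogy is easy to check numerically. Here is a Python sketch with a made-up block-diagonal Hamiltonian: multiplying an equilibrium \psi by a function that’s constant on each connected component gives another equilibrium.

```python
import numpy as np

# A Hamiltonian with two connected components (rates chosen arbitrarily).
H1 = np.array([[-2.0,  1.0],
               [ 2.0, -1.0]])           # states 0, 1
H2 = np.array([[-1.0,  0.0,  1.0],
               [ 1.0, -1.0,  0.0],
               [ 0.0,  1.0, -1.0]])     # states 2, 3, 4 in a cycle
H = np.block([[H1, np.zeros((2, 3))],
              [np.zeros((3, 2)), H2]])

psi = np.array([1.0, 2.0, 1.0, 1.0, 1.0])     # an equilibrium: H @ psi = 0
scale = np.array([7.0, 7.0, 0.5, 0.5, 0.5])   # constant on each component
assert np.allclose(H @ psi, 0)
assert np.allclose(H @ (scale * psi), 0)      # still an equilibrium
```

Changing the water level bucket by bucket, so to speak, never disturbs the equilibrium.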

That’s sort of obvious. What’s not obvious is that we can adjust \psi this way so as to ensure

\psi = x^Y

for some x.

And indeed, it’s not always true! It’s only true if our reaction network obeys a special condition. It needs to have ‘deficiency zero’. We defined this concept back in Part 21, but now we’ll finally use it for something. It turns out to be precisely the right condition to guarantee we can tweak any function on our set of states, multiplying it by a function that’s constant on each connected component, and get a new function \psi with

\psi = x^Y

When all is said and done, that is the key to the deficiency zero theorem.

Review

The battle is almost upon us—we’ve got one last chance to review our notation. We start with a stochastic reaction network:

This consists of:

• finite sets of transitions T, complexes K and species S,

• a map r: T \to (0,\infty) giving a rate constant for each transition,

• source and target maps s,t : T \to K saying where each transition starts and ends,

• a one-to-one map Y : K \to \mathbb{N}^S saying how each complex is made of species.

Then we extend s, t and Y to linear maps:

Then we put inner products on these vector spaces as described last time, which lets us ‘turn around’ the maps s and t by taking their adjoints:

s^\dagger, t^\dagger : \mathbb{R}^K \to \mathbb{R}^T

More surprisingly, we can ‘turn around’ Y and get a nonlinear map using ‘matrix exponentiation’:

\begin{array}{ccc} \mathbb{R}^S &\to& \mathbb{R}^K \\                               x     &\mapsto&   x^Y \end{array}

This is most easily understood by thinking of x as a row vector and Y as a matrix:

Remember, complexes are made out of species. The matrix entry Y_{i j} says how many things of the jth species there are in a complex of the ith kind. If \psi \in \mathbb{R}^K says how many complexes there are of each kind, Y \psi \in \mathbb{R}^S says how many things there are of each species. Conversely, if x \in \mathbb{R}^S says how many things there are of each species, x^Y \in \mathbb{R}^K says how many ways we can build each kind of complex from them.

So, we get these maps:

Next, the boundary operator

\partial : \mathbb{R}^T \to \mathbb{R}^K

describes how each transition causes a change in complexes:

\partial = t - s

As we saw last time, there is a Hamiltonian

H : \mathbb{R}^K \to \mathbb{R}^K

describing a Markov process on the set of complexes, given by

H = \partial s^\dagger

But the star of the show is the rate equation. This describes how the number of things of each species changes with time. We write these numbers in a list and get a vector x \in \mathbb{R}^S with nonnegative components. The rate equation says:

\displaystyle{ \frac{d x}{d t} = Y H x^Y }

We can read this as follows:

x says how many things of each species we have now.

x^Y says how many complexes of each kind we can build from these species.

s^\dagger x^Y says how many transitions of each kind originate from these complexes, with each transition weighted by its rate.

H x^Y = \partial s^\dagger x^Y is the rate of change of the number of complexes of each kind, due to these transitions.

Y H x^Y is the rate of change of the number of things of each species.
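To make this chain of maps concrete, here is a Python sketch for a hypothetical network A + B ⇄ C, with invented rate constants:

```python
import numpy as np

# Hypothetical network A + B <--> C.
# Complexes: kappa_1 = A + B, kappa_2 = C; species order (A, B, C).
Y = np.array([[1, 1, 0],
              [0, 0, 1]])
r = np.array([2.0, 3.0])          # rate constants for tau_1, tau_2
# H kappa = sum over transitions starting at kappa of r(tau)(t(tau) - s(tau)):
H = np.array([[-r[0],  r[1]],
              [ r[0], -r[1]]])

def rate(x):
    """Right-hand side of the rate equation dx/dt = Y H x^Y."""
    xY = np.prod(x ** Y, axis=1)   # x^Y in R^K
    return Y.T @ (H @ xY)          # Y applied as a map R^K -> R^S

x = np.array([1.0, 1.0, 1.0])
rate(x)  # array([ 1.,  1., -1.]): net flow from C back to A + B at these amounts
```

The result is just the usual mass-action kinetics: the amount of A changes at the rate -r_1 x_A x_B + r_2 x_C, and so on.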

The deficiency zero theorem

We are looking for equilibrium solutions of the rate equation, where the number of things of each species doesn’t change at all:

Y H x^Y = 0

In fact we will find complex balanced equilibrium solutions, where even the number of complexes of each kind doesn’t change:

H x^Y = 0

More precisely, we have:

Deficiency Zero Theorem (Child’s Version). Suppose we have a reaction network obeying these two conditions:

1. It is weakly reversible, meaning that whenever there’s a transition from one complex \kappa to another \kappa', there’s a directed path of transitions going back from \kappa' to \kappa.

2. It has deficiency zero, meaning \mathrm{im} \partial  \cap \mathrm{ker} Y = \{ 0 \} .

Then for any choice of rate constants there exists a complex balanced equilibrium solution of the rate equation where all species are present in nonzero amounts. In other words, there exists x \in \mathbb{R}^S with all components positive and such that:

H x^Y = 0

Proof. Because our reaction network is weakly reversible, the theorems in Part 23 show there exists \psi \in (0,\infty)^K with

H \psi = 0

This \psi may not be of the form x^Y, but we shall adjust \psi so that it becomes of this form, while still remaining a solution of H \psi = 0. To do this, we need a couple of lemmas:

Lemma 1. \mathrm{ker} \partial^\dagger + \mathrm{im} Y^\dagger = \mathbb{R}^K.

Proof. We need to use a few facts from linear algebra. If V is a finite-dimensional vector space with inner product, the orthogonal complement L^\perp of a subspace L \subseteq V consists of vectors that are orthogonal to everything in L:

L^\perp = \{ v \in V : \quad \forall w \in L \quad \langle v, w \rangle = 0 \}

We have

(L \cap M)^\perp = L^\perp + M^\perp

where L and M are subspaces of V and + denotes the sum of two subspaces: that is, the smallest subspace containing both. Also, if T: V \to W is a linear map between finite-dimensional vector spaces with inner product, we have

(\mathrm{ker} T)^\perp = \mathrm{im} T^\dagger

and

(\mathrm{im} T)^\perp = \mathrm{ker} T^\dagger

Now, because our reaction network has deficiency zero, we know that

\mathrm{im} \partial \cap \mathrm{ker} Y = \{ 0 \}

Taking the orthogonal complement of both sides, we get

(\mathrm{im} \partial \cap \mathrm{ker} Y)^\perp = \mathbb{R}^K

and using the rules we mentioned, we obtain

\mathrm{ker} \partial^\dagger + \mathrm{im} Y^\dagger = \mathbb{R}^K

as desired.   █

Now, given a vector \phi in \mathbb{R}^K or \mathbb{R}^S with all positive components, we can define the logarithm of such a vector, component-wise:

(\ln \phi)_i = \ln (\phi_i)

Similarly, for any vector \phi in either of these spaces, we can define its exponential in a component-wise way:

(\exp \phi)_i = \exp(\phi_i)

These operations are inverse to each other. Moreover:

Lemma 2. The nonlinear operator

\begin{array}{ccc} \mathbb{R}^S &\to& \mathbb{R}^K \\                               x     &\mapsto&   x^Y \end{array}

is related to the linear operator

\begin{array}{ccc} \mathbb{R}^S &\to& \mathbb{R}^K \\                               x     &\mapsto&   Y^\dagger x \end{array}

by the formula

x^Y = \exp(Y^\dagger \ln x )

which holds for all x \in (0,\infty)^S.

Proof. A straightforward calculation. By the way, this formula would look a bit nicer if we treated \ln x as a row vector and multiplied it on the right by Y: then we would have

x^Y = \exp((\ln x) Y)

The problem is that we are following the usual convention of multiplying vectors by matrices on the left, yet writing the matrix on the right in x^Y. Taking the transpose Y^\dagger of the matrix Y serves to compensate for this.   █
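Skeptics can check Lemma 2 numerically. Here is a quick Python sketch with an arbitrary made-up matrix Y, indexed as in the text: Y[i, j] is the number of things of the j-th species in the i-th complex.

```python
import numpy as np

Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [2, 1, 0]])
x = np.array([3.0, 5.0, 2.0])     # positive amounts of each species

direct = np.prod(x ** Y, axis=1)   # (x^Y)_i = prod_j x_j^{Y_ij}
via_log = np.exp(Y @ np.log(x))    # exp(Y† ln x)
assert np.allclose(direct, via_log)
```

Taking logarithms turns component-wise products of powers into a linear operation, which is the whole point of the lemma.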

Now, given our vector \psi \in (0,\infty)^K with H \psi = 0, we can take its logarithm and get \ln \psi \in \mathbb{R}^K. Lemma 1 says that

\mathbb{R}^K = \mathrm{ker} \partial^\dagger + \mathrm{im} Y^\dagger

so we can write

\ln \psi =  \alpha + Y^\dagger \beta

where \alpha \in \mathrm{ker} \partial^\dagger and \beta \in \mathbb{R}^S. Moreover, we can write

\beta = \ln x

for some x \in (0,\infty)^S, so that

\ln \psi = \alpha + Y^\dagger (\ln x)

Exponentiating both sides component-wise, we get

\psi  =   \exp(\alpha) \; \exp(Y^\dagger (\ln x))

where at right we are taking the component-wise product of vectors. Thanks to Lemma 2, we conclude that

\psi = \exp(\alpha) x^Y

So, we have taken \psi and almost written it in the form x^Y—but not quite! We can adjust \psi to make it be of this form:

\exp(-\alpha) \psi = x^Y

Clearly all the components of \exp(-\alpha) \psi are positive, since the same is true for both \psi and \exp(-\alpha). So, the only remaining task is to check that

H(\exp(-\alpha) \psi) = 0

We do this using two lemmas:

Lemma 3. If H \psi = 0 and \alpha \in \mathrm{ker} \partial^\dagger, then H(\exp(-\alpha) \psi) = 0.

Proof. It is enough to check that multiplication by \exp(-\alpha) commutes with the Hamiltonian H, since then

H(\exp(-\alpha) \psi) = \exp(-\alpha) H \psi = 0

Recall from Part 23 that H is the Hamiltonian of a Markov process associated to this ‘graph with rates’:

As noted here:

• John Baez and Brendan Fong, A Noether theorem for Markov processes.

multiplication by some function on K commutes with H if and only if that function is constant on each connected component of this graph. Such functions are called conserved quantities.

So, it suffices to show that \exp(-\alpha) is constant on each connected component. For this, it is enough to show that \alpha itself is constant on each connected component. But this will follow from the next lemma, since \alpha \in \mathrm{ker} \partial^\dagger.   █

Lemma 4. A function \alpha \in \mathbb{R}^K is a conserved quantity iff \partial^\dagger \alpha = 0 . In other words, \alpha is constant on each connected component of the graph s, t: T \to K iff \partial^\dagger \alpha = 0 .

Proof. Suppose \partial^\dagger \alpha = 0, or in other words, \alpha \in \mathrm{ker} \partial^\dagger, or in still other words, \alpha \in (\mathrm{im} \partial)^\perp. To show that \alpha is constant on each connected component, it suffices to show that whenever we have two complexes connected by a transition, like this:

\tau: \kappa \to \kappa'

then \alpha takes the same value at both these complexes:

\alpha_\kappa = \alpha_{\kappa'}

To see this, note

\partial \tau = t(\tau) - s(\tau) = \kappa' - \kappa

and since \alpha \in (\mathrm{im} \partial)^\perp, we conclude

\langle \alpha, \kappa' - \kappa \rangle = 0

But calculating this inner product, we see

\alpha_{\kappa'} - \alpha_{\kappa} = 0

as desired.

For the converse, we simply turn the argument around: if \alpha is constant on each connected component, we see \langle \alpha, \kappa' - \kappa \rangle = 0 whenever there is a transition \tau : \kappa \to \kappa'. It follows that \langle \alpha, \partial \tau \rangle = 0 for every transition \tau, so \alpha \in (\mathrm{im} \partial)^\perp .

And thus concludes the proof of the lemma!   █
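Lemma 4 is also easy to check numerically. In this Python sketch (states, transitions and rates all made up), the matrix of \partial^\dagger is \mathrm{diag}(r) \, \partial^T, thanks to our weighted inner product on \mathbb{R}^T:

```python
import numpy as np

# Components {0, 1} and {2, 3, 4}; transitions as (source, target).
transitions = [(0, 1), (1, 0), (2, 3), (3, 4), (4, 2)]
r = np.array([1.0, 2.0, 1.0, 1.0, 3.0])
n = 5
D = np.zeros((n, len(transitions)))            # matrix of ∂ = t - s
for tau, (s, t) in enumerate(transitions):
    D[t, tau] += 1
    D[s, tau] -= 1

d_dagger = np.diag(r) @ D.T     # adjoint w.r.t. <tau, tau'> = (1/r) delta

alpha = np.array([7.0, 7.0, -1.0, -1.0, -1.0])   # constant on each component
assert np.allclose(d_dagger @ alpha, 0)
beta = np.array([1.0, 0.0, 0.0, 0.0, 0.0])       # not constant on {0, 1}
assert not np.allclose(d_dagger @ beta, 0)
```

Each entry of \partial^\dagger \alpha is proportional to \alpha_{t(\tau)} - \alpha_{s(\tau)}, so it vanishes exactly when \alpha takes the same value at the two ends of every transition.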

And thus concludes the proof of the theorem!   █

And thus concludes this post!


Network Theory (Part 23)

21 August, 2012

We’ve been looking at reaction networks, and we’re getting ready to find equilibrium solutions of the equations they give. To do this, we’ll need to connect them to another kind of network we’ve studied. A reaction network is something like this:

It’s a bunch of complexes, which are sums of basic building-blocks called species, together with arrows called transitions going between the complexes. If we know a number for each transition describing the rate at which it occurs, we get an equation called the ‘rate equation’. This describes how the amount of each species changes with time. We’ve been talking about this equation ever since the start of this series! Last time, we wrote it down in a new very compact form:

\displaystyle{ \frac{d x}{d t} = Y H x^Y  }

Here x is a vector whose components are the amounts of each species, while H and Y are certain matrices.

But now suppose we forget how each complex is made of species! Suppose we just think of them as abstract things in their own right, like numbered boxes:

We can use these boxes to describe states of some system. The arrows still describe transitions, but now we think of these as ways for the system to hop from one state to another. Say we know a number for each transition describing the probability per time at which it occurs:

Then we get a ‘Markov process’—or in other words, a random walk where our system hops from one state to another. If \psi is the probability distribution saying how likely the system is to be in each state, this Markov process is described by this equation:

\displaystyle{ \frac{d \psi}{d t} = H \psi  }

This is simpler than the rate equation, because it’s linear. But the matrix H is the same—we’ll see that explicitly later on today.

What’s the point? Well, our ultimate goal is to prove the deficiency zero theorem, which gives equilibrium solutions of the rate equation. That means finding x with

Y H x^Y = 0

Today we’ll find all equilibria for the Markov process, meaning all \psi with

H \psi = 0

Then next time we’ll show some of these have the form

\psi = x^Y

So, we’ll get

H x^Y = 0

and thus

Y H x^Y = 0

as desired!

So, let’s get to work.

The Markov process of a graph with rates

We’ve been looking at stochastic reaction networks, which are things like this:

However, we can build a Markov process starting from just part of this information:

Let’s call this thing a ‘graph with rates’, for lack of a better name. We’ve been calling the things in K ‘complexes’, but now we’ll think of them as ‘states’. So:

Definition. A graph with rates consists of:

• a finite set of states K,

• a finite set of transitions T,

• a map r: T \to (0,\infty) giving a rate constant for each transition,

• source and target maps s,t : T \to K saying where each transition starts and ends.

Starting from this, we can get a Markov process describing how a probability distribution \psi on our set of states will change with time. As usual, this Markov process is described by a master equation:

\displaystyle{ \frac{d \psi}{d t} = H \psi }

for some Hamiltonian:

H : \mathbb{R}^K \to \mathbb{R}^K

What is this Hamiltonian, exactly? Let’s think of it as a matrix where H_{i j} is the probability per time for our system to hop from the state j to the state i. This looks backwards, but don’t blame me—blame the guys who invented the usual conventions for matrix algebra. Clearly if i \ne j this probability per time should be the sum of the rate constants of all transitions going from j to i:

\displaystyle{ i \ne j \quad \Rightarrow \quad H_{i j} =  \sum_{\tau: j \to i} r(\tau) }

where we write \tau: j \to i when \tau is a transition with source j and target i.

Now, we saw in Part 11 that for a probability distribution to remain a probability distribution as it evolves in time according to the master equation, we need H to be infinitesimal stochastic: its off-diagonal entries must be nonnegative, and the sum of the entries in each column must be zero.

The first condition holds already, and the second one tells us what the diagonal entries must be. So, we’re basically done describing H. But we can summarize it this way:

Puzzle 1. Think of \mathbb{R}^K as the vector space consisting of finite linear combinations of elements \kappa \in K. Then show

\displaystyle{  H \kappa = \sum_{s(\tau) = \kappa} r(\tau) (t(\tau) - s(\tau)) }
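If you want to check your answer to the puzzle numerically, this formula for H translates directly into code. Here is a Python sketch with a made-up graph with rates:

```python
import numpy as np

# Transitions as (source, target, rate); states numbered 0..2.
transitions = [(0, 1, 2.0), (1, 2, 1.0), (2, 0, 3.0)]
n = 3
H = np.zeros((n, n))
for s, t, rate in transitions:
    # H kappa = sum_{s(tau) = kappa} r(tau) (t(tau) - s(tau)):
    # column s gains rate at row t and loses it at row s.
    H[t, s] += rate
    H[s, s] -= rate

assert np.allclose(H.sum(axis=0), 0)          # columns sum to zero
assert np.all(H - np.diag(np.diag(H)) >= 0)   # off-diagonal entries >= 0
```

The two assertions at the end check that H is infinitesimal stochastic, as described just below.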

Equilibrium solutions of the master equation

Now we’ll classify equilibrium solutions of the master equation, meaning \psi \in \mathbb{R}^K with

H \psi = 0

We’ll only do this when our graph with rates is ‘weakly reversible’. This concept doesn’t actually depend on the rates, so let’s be general and say:

Definition. A graph is weakly reversible if for every edge \tau : i \to j, there is a directed path going back from j to i, meaning that we have edges

\tau_1 : j \to j_1 , \quad \tau_2 : j_1 \to j_2 , \quad \dots, \quad \tau_n: j_{n-1} \to i

This graph with rates is not weakly reversible:

but this one is:

The good thing about the weakly reversible case is that we get one equilibrium solution of the master equation for each component of our graph, and all equilibrium solutions are linear combinations of these. This is not true in general! For example, this guy is not weakly reversible:

It has only one component, but the master equation has two linearly independent equilibrium solutions: one that vanishes except at the state 0, and one that vanishes except at the state 2.

The idea of a ‘component’ is supposed to be fairly intuitive—our graph falls apart into pieces called components—but we should make it precise. As explained in Part 21, the graphs we’re using here are directed multigraphs, meaning things like

s, t : E \to V

where E is the set of edges (our transitions) and V is the set of vertices (our states). There are actually two famous concepts of ‘component’ for graphs of this sort: ‘strongly connected’ components and ‘connected’ components. We only need connected components, but let me explain both concepts, in a futile attempt to slake your insatiable thirst for knowledge.

Two vertices i and j of a graph lie in the same strongly connected component iff you can find a directed path of edges from i to j and also one from j back to i.

Remember, a directed path from i to j looks like this:

i \to a \to b \to c \to j

Here’s a path from i to j that is not directed:

i \to a \leftarrow b \to c \to j

and I hope you can write down the obvious but tedious definition of an ‘undirected path’, meaning a path made of edges that don’t necessarily point in the correct direction. Given that, we say two vertices i and j lie in the same connected component iff you can find an undirected path going from i to j. In this case, there will automatically also be an undirected path going from j to i.

For example, i and j lie in the same connected component here, but not the same strongly connected component:

i \to a \leftarrow b \to c \to j

Here’s a graph with one connected component and 3 strongly connected components, which are marked in blue:

For the theory we’re looking at now, we only care about connected components, not strongly connected components! However:

Puzzle 2. Show that for a weakly reversible graph, the connected components are the same as the strongly connected components.

With these definitions out of the way, we can state today’s big theorem:

Theorem. Suppose H is the Hamiltonian of a weakly reversible graph with rates:

Then for each connected component C \subseteq K, there exists a unique probability distribution \psi_C \in \mathbb{R}^K that is positive on that component, zero elsewhere, and is an equilibrium solution of the master equation:

H \psi_C = 0

Moreover, these probability distributions \psi_C form a basis for the space of equilibrium solutions of the master equation. So, the dimension of this space is the number of connected components of K.

Proof. We start by assuming our graph has one connected component. We use the Perron–Frobenius theorem, as explained in Part 20. This applies to ‘nonnegative’ matrices, meaning those whose entries are all nonnegative. That is not true of H itself, but only its diagonal entries can be negative, so if we choose a large enough number c > 0, H + c I will be nonnegative.

Since our graph is weakly reversible and has one connected component, it follows straight from the definitions that the operator H + c I will also be ‘irreducible’ in the sense of Part 20. The Perron–Frobenius theorem then swings into action, and we instantly conclude several things.

First, H + c I has a positive real eigenvalue r such that any other eigenvalue, possibly complex, has absolute value \le r. Second, there is an eigenvector \psi with eigenvalue r and all positive components. Third, any other eigenvector with eigenvalue r is a scalar multiple of \psi.

Subtracting c, it follows that \lambda = r - c is the eigenvalue of H with the largest real part. We have H \psi = \lambda \psi, and any other vector with this property is a scalar multiple of \psi.

We can show that in fact \lambda = 0. To do this we copy an argument from Part 20. First, since \psi is positive we can normalize it to be a probability distribution:

\displaystyle{ \sum_{i \in K} \psi_i = 1 }

Since H is infinitesimal stochastic, \exp(t H) sends probability distributions to probability distributions:

\displaystyle{ \sum_{i \in K} (\exp(t H) \psi)_i = 1 }

for all t \ge 0. On the other hand,

\displaystyle{ \sum_{i \in K} (\exp(t H)\psi)_i = \sum_{i \in K} e^{t \lambda} \psi_i = e^{t \lambda} }

so we must have \lambda = 0.

We conclude that when our graph has one connected component, there is a probability distribution \psi \in \mathbb{R}^K that is positive everywhere and has H \psi = 0. Moreover, any \phi \in \mathbb{R}^K with H \phi = 0 is a scalar multiple of \psi.

When K has several components, the matrix H is block diagonal, with one block for each component. So, we can run the above argument on each component C \subseteq K and get a probability distribution \psi_C \in \mathbb{R}^K that is positive on C. We can then check that H \psi_C = 0 and that every \phi \in \mathbb{R}^K with H \phi = 0 can be expressed as a linear combination of these probability distributions \psi_C in a unique way.   █
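Numerically, finding the equilibrium distribution for one component just means computing the kernel of H. Here is a Python sketch for a made-up weakly reversible cycle of three states:

```python
import numpy as np

# Cycle 0 -> 1 -> 2 -> 0 with rates 2, 1, 3 (arbitrary illustrations).
H = np.array([[-2.0,  0.0,  3.0],
              [ 2.0, -1.0,  0.0],
              [ 0.0,  1.0, -3.0]])

# The kernel of H via SVD: the right singular vector for singular value 0.
_, _, Vt = np.linalg.svd(H)
psi = Vt[-1]
psi = psi / psi.sum()          # normalize to a probability distribution
assert np.allclose(H @ psi, 0)
assert np.all(psi > 0)         # positive on the whole component
```

Since the graph is weakly reversible with one connected component, the kernel is one-dimensional and spanned by a positive vector, just as the theorem promises.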

This result must be absurdly familiar to people who study Markov processes, but I haven’t bothered to look up a reference yet. Do you happen to know a good one? I’d like to see one that generalizes this theorem to graphs that aren’t weakly reversible. I think I see how it goes. We don’t need that generalization right now, but it would be good to have around.

The Hamiltonian, revisited

One last small piece of business: last time I showed you a very slick formula for the Hamiltonian H. I’d like to prove it agrees with the formula I gave this time.

We start with any graph with rates:

We extend s and t to linear maps between vector spaces:

We define the boundary operator just as we did last time:

\partial = t - s

Then we put an inner product on the vector spaces \mathbb{R}^T and \mathbb{R}^K. So, for \mathbb{R}^K we let the elements of K be an orthonormal basis, but for \mathbb{R}^T we define the inner product in a more clever way involving the rate constants:

\displaystyle{ \langle \tau, \tau' \rangle = \frac{1}{r(\tau)} \delta_{\tau, \tau'} }

where \tau, \tau' \in T. This lets us define adjoints of the maps s, t and \partial, via formulas like this:

\langle s^\dagger \phi, \psi \rangle = \langle \phi, s \psi \rangle

Then:

Theorem. The Hamiltonian for a graph with rates is given by

H = \partial s^\dagger

Proof. It suffices to check that this formula agrees with the formula for H given in Puzzle 1:

\displaystyle{   H \kappa = \sum_{s(\tau) = \kappa} r(\tau) (t(\tau) - s(\tau)) }

Here we are using the complex \kappa \in K as a name for one of the standard basis vectors of \mathbb{R}^K. Similarly, we shall write things like \tau or \tau' for basis vectors of \mathbb{R}^T.

First, we claim that

\displaystyle{ s^\dagger \kappa = \sum_{\tau: \; s(\tau) = \kappa} r(\tau) \, \tau }

To prove this it’s enough to check that taking the inner product of either side with any basis vector \tau', we get results that agree. On the one hand:

\begin{array}{ccl}  \langle \tau' , s^\dagger \kappa \rangle &=&   \langle s \tau', \kappa \rangle \\  \\  &=& \delta_{s(\tau'), \kappa}    \end{array}

On the other hand:

\begin{array}{ccl} \displaystyle{ \langle \tau', \sum_{\tau: \; s(\tau) = \kappa} r(\tau) \, \tau \rangle } &=&  \sum_{\tau: \; s(\tau) = \kappa} r(\tau) \, \langle \tau', \tau \rangle   \\  \\  &=& \displaystyle{ \sum_{\tau: \; s(\tau) = \kappa} \delta_{\tau', \tau} }   \\  \\  &=&   \delta_{s(\tau'), \kappa}   \end{array}

where the factor of 1/r(\tau) in the inner product on \mathbb{R}^T cancels the visible factor of r(\tau). So indeed the results match.

Using this formula for s^\dagger \kappa we now see that

\begin{array}{ccl}  H \kappa &=& \partial s^\dagger \kappa   \\  \\  &=& \partial \displaystyle{ \sum_{\tau: \; s(\tau) = \kappa} r(\tau) \, \tau }    \\  \\  &=& \displaystyle{ \sum_{\tau: \; s(\tau) = \kappa} r(\tau) \, (t(\tau) - s(\tau)) }  \end{array}

which is precisely what we want.   █
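Here is a numerical check of this theorem, in Python, for a made-up cycle of three states. With our weighted inner product, the matrix of s^\dagger is \mathrm{diag}(r) S^T, where S is the matrix of s:

```python
import numpy as np

# Transitions tau: 0 -> 1, 1 -> 2, 2 -> 0 with rates r (arbitrary).
r = np.array([2.0, 1.0, 3.0])
S = np.array([[1, 0, 0],     # S[k, tau] = 1 if s(tau) = k
              [0, 1, 0],
              [0, 0, 1]])
T = np.array([[0, 0, 1],     # T[k, tau] = 1 if t(tau) = k
              [1, 0, 0],
              [0, 1, 0]])

# With <tau, tau'> = (1/r(tau)) delta, the adjoint of s is diag(r) S^T:
s_dagger = np.diag(r) @ S.T
H = (T - S) @ s_dagger       # H = ∂ s†

# Compare with the Puzzle 1 formula, built transition by transition:
H_puzzle = np.zeros((3, 3))
for tau, rate in enumerate(r):
    k, k2 = S[:, tau].argmax(), T[:, tau].argmax()
    H_puzzle[k2, k] += rate
    H_puzzle[k, k] -= rate
assert np.allclose(H, H_puzzle)
```

The factor of 1/r(\tau) in the inner product is what puts the factor of r(\tau) into s^\dagger, exactly as in the proof.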

I hope you see through the formulas to their intuitive meaning. As usual, the formulas are just a way of precisely saying something that makes plenty of sense. If \kappa is some state of our Markov process, s^\dagger \kappa is the sum of all transitions starting at this state, weighted by their rates. Applying \partial to a transition tells us what change in state it causes. So \partial s^\dagger \kappa tells us the rate at which things change when we start in the state \kappa. That’s why \partial s^\dagger is the Hamiltonian for our Markov process. After all, the Hamiltonian tells us how things change:

\displaystyle{ \frac{d \psi}{d t} = H \psi }

Okay, we’ve got all the machinery in place. Next time we’ll prove the deficiency zero theorem!


Network Theory (Part 22)

16 August, 2012

Okay, now let’s dig deeper into the proof of the deficiency zero theorem. We’re only going to prove a baby version, at first. Later we can enhance it:

Deficiency Zero Theorem (Baby Version). Suppose we have a weakly reversible reaction network with deficiency zero. Then for any choice of rate constants there exists an equilibrium solution of the rate equation where all species are present in nonzero amounts.

The first step is to write down the rate equation in a new, more conceptual way. It’s incredibly cool. You’ve probably seen Schrödinger’s equation, which describes the motion of a quantum particle:

\displaystyle { \frac{d \psi}{d t} = -i H \psi }

If you’ve been following this series, you’ve probably also seen the master equation, which describes the motion of a ‘stochastic’ particle:

\displaystyle { \frac{d \psi}{d t} = H \psi }

A ‘stochastic’ particle is one that’s carrying out a random walk, and now \psi describes its probability to be somewhere, instead of its amplitude.

Today we’ll see that the rate equation for a reaction network looks somewhat similar:

\displaystyle { \frac{d x}{d t} =  Y H x^Y }

where Y is some matrix, and x^Y is defined using a new thing called ‘matrix exponentiation’, which makes the equation nonlinear!

If you’re reading this you probably know how to multiply a vector by a matrix. But if you’re like me, you’ve never seen anyone take a vector and raise it to the power of some matrix! I’ll explain it, don’t worry… right now I’m just trying to get you intrigued. It’s not complicated, but it’s exciting how this unusual operation shows up naturally in chemistry. That’s just what I’m looking for these days: new math ideas that show up in practical subjects like chemistry, and new ways that math can help people in these subjects.

Since we’re looking for an equilibrium solution of the rate equation, we actually want to solve

\displaystyle { \frac{d x}{d t} =  0 }

or in other words

Y H x^Y = 0

In fact we will do better: we will find a solution of

H x^Y = 0

And we’ll do this in two stages:

• First we’ll find all solutions of

H \psi = 0

This equation is linear, so it’s easy to understand.

• Then, among these solutions \psi, we’ll find one that also obeys

\psi = x^Y

This is a nonlinear problem involving matrix exponentiation, but still, we can do it, using a clever trick called ‘logarithms’.

Putting the pieces together, we get our solution of

H x^Y = 0

and thus our equilibrium solution of the rate equation:

\displaystyle { \frac{d x}{d t} = Y H x^Y = 0 }

That’s a rough outline of the plan. But now let’s get started, because the details are actually fascinating. Today I’ll just show you how to rewrite the rate equation in this new way.

The rate equation

Remember how the rate equation goes. We start with a stochastic reaction network, meaning a little diagram like this:

This contains quite a bit of information:

• a finite set T of transitions,

• a finite set K of complexes,

• a finite set S of species,

• a map r: T \to (0,\infty) giving a rate constant for each transition,

• source and target maps s,t : T \to K saying where each transition starts and ends,

• a one-to-one map Y : K \to \mathbb{N}^S saying how each complex is made of species.

Given all this, the rate equation says how the amount of each species changes with time. We describe these amounts with a vector x \in [0,\infty)^S. So, we want a differential equation filling in the question marks here:

\displaystyle{ \frac{d x}{d t} = ??? }

Now last time, we started by thinking of K as a subset of \mathbb{N}^S, and thus of the vector space \mathbb{R}^S. Back then, we wrote the rate equation as follows:

\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T} r(\tau) \; \left(t(\tau) - s(\tau)\right) \; x^{s(\tau)} }

where vector exponentiation is defined by

x^s = x_1^{s_1} \cdots x_k^{s_k}

when x and s are vectors in \mathbb{R}^S.

However, we’ve now switched to thinking of our set of complexes K as a set in its own right that is mapped into \mathbb{N}^S by Y. This is good for lots of reasons, like defining the concept of ‘deficiency’, which we did last time. But it means the rate equation above doesn’t quite parse anymore! Things like s(\tau) and t(\tau) live in K; we need to explicitly convert them into elements of \mathbb{R}^S using Y for our equation to make sense!

So now we have to write the rate equation like this:

\displaystyle{ \frac{d x}{d t} = Y \sum_{\tau \in T} r(\tau) \;  \left(t(\tau) - s(\tau)\right) \; x^{Y s(\tau)} }

This looks more ugly, but if you’ve got even one mathematical bone in your body, you can already see vague glimmerings of how we’ll rewrite this the way we want:

\displaystyle { \frac{d x}{d t} =  Y H x^Y }

Here’s how.

First, we extend our maps s, t and Y to linear maps between vector spaces:

Then, we put an inner product on the vector spaces \mathbb{R}^T, \mathbb{R}^K and \mathbb{R}^S. For \mathbb{R}^K we do this in the most obvious way, by letting the complexes be an orthonormal basis. So, given two complexes \kappa, \kappa', we define their inner product by

\langle \kappa, \kappa' \rangle = \delta_{\kappa, \kappa'}

We do the same for \mathbb{R}^S. But for \mathbb{R}^T we define the inner product in a more clever way involving the rate constants. If \tau, \tau' \in T are two transitions, we define their inner product by:

\langle \tau, \tau' \rangle = \frac{1}{r(\tau)} \delta_{\tau, \tau'}

This will seem perfectly natural when we continue our study of circuits made of electrical resistors, and if you’re very clever you can already see it lurking in Part 16. But never mind.

Having put inner products on these three vector spaces, we can take the adjoints of the linear maps between them, to get linear maps going back the other way:

These are defined in the usual way—though we’re using daggers here the way physicists do, where mathematicians would prefer to see stars! For example, s^\dagger : \mathbb{R}^K \to \mathbb{R}^T is defined by the relation

\langle s^\dagger \phi, \psi \rangle = \langle \phi, s \psi \rangle

and so on.

Next, we set up a random walk on the set of complexes. Remember, our reaction network is a graph with complexes as vertices and transitions as edges, like this:

Each transition \tau has a number attached to it: the rate constant r(\tau). So, we can randomly hop from complex to complex along these transitions, with probabilities per unit time described by these numbers. The probability of being at some particular complex will then be described by a function

\psi : K \to \mathbb{R}

which also depends on time, and changes according to the equation

\displaystyle { \frac{d \psi}{d t} = H \psi }

for some Hamiltonian

H : \mathbb{R}^K \to \mathbb{R}^K

I defined this Hamiltonian back in Part 15, but now I see a slicker way to write it:

H = (t - s) s^\dagger

I’ll justify this next time. For now, the main point is that with this Hamiltonian, the rate equation is equivalent to this:

\displaystyle{ \frac{d x}{d t} = Y H x^Y }

The only thing I haven’t defined yet is the funny exponential x^Y. That’s what makes the equation nonlinear. We’re taking a vector to the power of a matrix and getting a vector. This sounds weird—but it actually makes sense!

It only makes sense because we have chosen bases for our vector spaces. To understand it, let’s number our species 1, \dots, k as we’ve been doing all along, and number our complexes 1, \dots, \ell. Our linear map Y : \mathbb{R}^K \to \mathbb{R}^S then becomes a k \times \ell matrix of natural numbers. Its entries say how many times each species shows up in each complex:

Y = \left( \begin{array}{cccc}   Y_{11} & Y_{12}  & \cdots & Y_{1 \ell} \\  Y_{21} & Y_{22}  & \cdots & Y_{2 \ell} \\  \vdots  & \vdots   & \ddots & \vdots \\  Y_{k1} & Y_{k2}  & \cdots & Y_{k \ell} \end{array} \right)

The entry Y_{i j} says how many times the ith species shows up in the jth complex.

Now, let’s be a bit daring and think of the vector x \in \mathbb{R}^S as a row vector with k entries:

x = \left(x_1 , x_2 ,  \dots ,  x_k \right)

Then we can multiply x on the right by the matrix Y and get a vector in \mathbb{R}^K:

\begin{array}{ccl} x Y &=& ( x_1 , x_2, \dots, x_k) \;   \left( \begin{array}{cccc}   Y_{11} & Y_{12}  & \cdots & Y_{1 \ell} \\  Y_{21} & Y_{22}  & \cdots & Y_{2 \ell} \\  \vdots  & \vdots   & \ddots & \vdots \\  Y_{k1} & Y_{k2}  & \cdots & Y_{k \ell} \end{array} \right)   \\  \\  &=& \left( x_1 Y_{11} + \cdots + x_k Y_{k1}, \; \dots, \; x_1 Y_{1 \ell} + \cdots + x_k Y_{k \ell} \right)   \end{array}

So far, no big deal. But now you’re ready to see the definition of x^Y, which is very similar:

x^Y = (x_1^{Y_{11}} \cdots  x_k^{Y_{k1}} ,  \dots,  x_1^{Y_{1 \ell}} \cdots x_k^{Y_{k \ell}} )

It’s exactly the same, but with multiplication replacing addition, and exponentiation replacing multiplication! Apparently my class on matrices stopped too soon: we learned about matrix multiplication, but matrix exponentiation is also worthwhile.

What’s the point of it? Well, suppose you have a certain number of hydrogen molecules, a certain number of oxygen molecules, a certain number of water molecules, and so on—a certain number of things of each species. You can list these numbers and get a vector x \in \mathbb{R}^S. Then the components of x^Y describe how many ways you can build up each complex from the things you have. For example,

x_1^{Y_{11}} x_2^{Y_{21}} \cdots  x_k^{Y_{k1}}

says roughly how many ways you can build complex 1 by picking Y_{11} things of species 1, Y_{21} things of species 2, and so on.

Why ‘roughly’? Well, we’re pretending we can pick the same thing twice. So if we have 4 water molecules and we need to pick 3, this formula gives 4^3. The right answer is 4 \times 3 \times 2. To get this answer we’d need to use the ‘falling power’ 4^{\underline{3}} = 4 \times 3 \times 2, as explained in Part 4. But the rate equation describes chemistry in the limit where we have lots of things of each species. In this limit, the ordinary power becomes a good approximation.
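If you like to see this numerically, here is a minimal NumPy sketch with a made-up matrix Y and vector x. The only real content is the one-liner computing x^Y: take each component of x to the power given by the corresponding column entry of Y, then multiply down each column.

```python
import numpy as np

# Made-up example: k = 2 species, l = 3 complexes.
# Y[i, j] = number of times species i appears in complex j.
Y = np.array([[1, 0, 2],
              [0, 1, 1]])
x = np.array([3.0, 4.0])   # amounts of each species

# x^Y: like the row vector x times the matrix Y, but with multiplication
# replacing addition and exponentiation replacing multiplication.
x_to_Y = np.prod(x[:, None] ** Y, axis=0)

# The components are 3^1 * 4^0, 3^0 * 4^1 and 3^2 * 4^1.
print(x_to_Y)
```

Broadcasting does the work: `x[:, None] ** Y` is the k × ℓ array of powers x_i^{Y_{ij}}, and the product over axis 0 multiplies down each column.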

Puzzle. In this post we’ve seen a vector raised to a matrix power, which is a vector, and also a vector raised to a vector power, which is a number. How are they related?

There’s more to say about this, which I’d be glad to explain if you’re interested. But let’s get to the punchline:

Theorem. The rate equation:

\displaystyle{ \frac{d x}{d t} = Y \sum_{\tau \in T} r(\tau) \; \left(t(\tau) - s(\tau)\right) \; x^{Y s(\tau)} }

is equivalent to this equation:

\displaystyle{ \frac{d x}{d t} = Y (t - s) s^\dagger x^Y }

or in other words:

\displaystyle{ \frac{d x}{d t} = Y H x^Y }

Proof. It’s enough to show

\displaystyle{ (t - s) s^\dagger x^Y = \sum_{\tau \in T} r(\tau) \; \left(t(\tau) - s(\tau)\right) \; x^{Y s(\tau)} }

So, we’ll compute (t - s) s^\dagger x^Y, and think about the meaning of each quantity we get en route.

We start with x \in \mathbb{R}^S. This is a list of numbers saying how many things of each species we have: our raw ingredients, as it were. Then we compute

x^Y = (x_1^{Y_{11}} \cdots  x_k^{Y_{k1}} ,  \dots,  x_1^{Y_{1 \ell}} \cdots x_k^{Y_{k \ell}} )

This is a vector in \mathbb{R}^K. It’s a list of numbers saying how many ways we can build each complex starting from our raw ingredients.

Alternatively, we can write this vector x^Y as a sum over basis vectors:

x^Y = \sum_{\kappa \in K} x_1^{Y_{1\kappa}} \cdots  x_k^{Y_{k\kappa}} \; \kappa

Next let’s apply s^\dagger to this. We claim that

\displaystyle{ s^\dagger \kappa = \sum_{\tau \; : \; s(\tau) = \kappa} r(\tau) \; \tau }

In other words, we claim s^\dagger \kappa is the sum of all the transitions having \kappa as their source, weighted by their rate constants! To prove this claim, it’s enough to take the inner product of each side with any transition \tau', and check that we get the same answer. For the left side we get

\langle s^\dagger \kappa, \tau' \rangle = \langle \kappa, s(\tau') \rangle = \delta_{\kappa, s (\tau') }

To compute the right side, we need to use the cleverly chosen inner product on \mathbb{R}^T. Here we get

\displaystyle{ \left\langle \sum_{\tau \; : \; s(\tau) = \kappa} r(\tau) \tau, \; \tau' \right\rangle =  \sum_{\tau \; : \; s(\tau) = \kappa} \delta_{\tau, \tau'} = \delta_{\kappa, s(\tau')}  }

In the first step here, the factor of 1 /r(\tau) in the cleverly chosen inner product canceled the visible factor of r(\tau). For the second step, you just need to think for half a minute—or ten, depending on how much coffee you’ve had.

Either way, we conclude that indeed

\displaystyle{ s^\dagger \kappa = \sum_{\tau \; : \; s(\tau) = \kappa} r(\tau) \tau }
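In matrix terms, this formula says s^\dagger = \mathrm{diag}(r) \, s^{\top}. Here is a quick NumPy check on a made-up three-complex network: we build this candidate matrix for s^\dagger and verify the defining adjoint relation, remembering that the inner product on \mathbb{R}^T carries the 1/r(\tau) weights.

```python
import numpy as np

# Made-up network: complexes 0,1,2; transitions 0->1, 1->0, 1->2.
rates = np.array([2.0, 1.0, 3.0])
src = [0, 1, 1]
K, T = 3, 3
S = np.zeros((K, T))
for tau in range(T):
    S[src[tau], tau] = 1.0     # s as a linear map R^T -> R^K

# Claimed adjoint: s^dagger(kappa) = sum of transitions out of kappa,
# weighted by their rate constants -- as a matrix, diag(r) @ S.T.
s_dag = np.diag(rates) @ S.T

# Check <s^dagger phi, psi>_T = <phi, s psi>_K for random vectors,
# using the weighted inner product <tau, tau'> = delta / r(tau) on R^T.
rng = np.random.default_rng(0)
phi, psi = rng.normal(size=K), rng.normal(size=T)
lhs = (s_dag @ phi) @ np.diag(1.0 / rates) @ psi
rhs = phi @ (S @ psi)
assert np.isclose(lhs, rhs)
```

The 1/r(\tau) in the inner product cancels the r(\tau) in `s_dag`, exactly as in the computation above.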

Next let’s combine this with our formula for x^Y:

\displaystyle { x^Y = \sum_{\kappa \in K} x_1^{Y_{1\kappa}} \cdots  x_k^{Y_{k\kappa}} \; \kappa }

We get this:

\displaystyle { s^\dagger x^Y = \sum_{\kappa, \tau \; : \; s(\tau) = \kappa} r(\tau) \; x_1^{Y_{1\kappa}} \cdots  x_k^{Y_{k\kappa}} \; \tau }

In other words, s^\dagger x^Y is a linear combination of transitions, where each one is weighted both by the rate it happens and how many ways it can happen starting with our raw ingredients.

Our goal is to compute (t - s)s^\dagger x^Y. We’re almost there. Remember, s says which complex is the input of a given transition, and t says which complex is the output. So, (t - s) s^\dagger x^Y says the total rate at which complexes are created and/or destroyed starting with the species in x as our raw ingredients.

That sounds good. But let’s just pedantically check that everything works. Applying t - s to both sides of our last equation, we get

\displaystyle{ (t - s) s^\dagger x^Y = \sum_{\kappa, \tau \; : \; s(\tau) = \kappa} r(\tau) \; x_1^{Y_{1\kappa}} \cdots  x_k^{Y_{k\kappa}} \; \left( t(\tau) - s(\tau)\right) }

Remember, our goal was to prove that this equals

\displaystyle{ \sum_{\tau \in T} r(\tau) \; \left(t(\tau) - s(\tau)\right) \; x^{Y s(\tau)} }

But if you stare at these a while and think, you’ll see they’re equal.   █
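If staring isn’t your thing, here is a numerical sanity check of the whole theorem on a toy network with made-up data: two species, three complexes A, B and A+B, and three transitions. We compute (t - s) s^\dagger x^Y and the direct sum over transitions, and confirm they agree.

```python
import numpy as np

# Toy stochastic reaction network (all data made up):
# species {1,2}; complexes 0 = A, 1 = B, 2 = A+B;
# transitions 0->1 (rate 2), 1->0 (rate 1), 1->2 (rate 3).
Y = np.array([[1, 0, 1],
              [0, 1, 1]])          # species x complexes
rates = np.array([2.0, 1.0, 3.0])
src, tgt = [0, 1, 1], [1, 0, 2]
K, T = 3, 3
S = np.zeros((K, T)); Tm = np.zeros((K, T))
for tau in range(T):
    S[src[tau], tau] = 1.0
    Tm[tgt[tau], tau] = 1.0

x = np.array([3.0, 4.0])
x_to_Y = np.prod(x[:, None] ** Y, axis=0)

# Left side: (t - s) s^dagger x^Y, with s^dagger = diag(r) @ S.T.
lhs = (Tm - S) @ (np.diag(rates) @ S.T) @ x_to_Y

# Right side: sum over transitions of r(tau) (t(tau) - s(tau)) x^{Y s(tau)}.
# Note x^{Y s(tau)} is just the s(tau)-th component of x^Y.
rhs = sum(rates[tau] * (Tm[:, tau] - S[:, tau]) * x_to_Y[src[tau]]
          for tau in range(T))

assert np.allclose(lhs, rhs)
print(lhs)
```

The key bookkeeping fact is that Y s(\tau) is just the s(\tau)th column of Y, so x^{Y s(\tau)} is the s(\tau)th component of the vector x^Y.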

It took me a couple of weeks to really understand this, so I’ll be happy if it takes you just a few days. It seems peculiar at first but ultimately it all makes sense. The interesting subtlety is that we use the linear map called ‘multiplying by Y’:

\begin{array}{ccc} \mathbb{R}^K &\to& \mathbb{R}^S \\                               \psi     &\mapsto&   Y \psi \end{array}

to take a bunch of complexes and work out the species they contain, while we use the nonlinear map called ‘raising to the Yth power’:

\begin{array}{ccc} \mathbb{R}^S &\to& \mathbb{R}^K \\                               x     &\mapsto&   x^Y \end{array}

to take a bunch of species and work out how many ways we can build each complex from them. There is much more to say about this: for example, these maps arise from a pair of what category theorists call ‘adjoint functors’. But I’m worn out and you probably are too, if you’re still here at all.

References

I found this thesis to be the most helpful reference when I was trying to understand the proof of the deficiency zero theorem:

• Jonathan M. Guberman, Mass Action Reaction Networks and the Deficiency Zero Theorem, B.A. thesis, Department of Mathematics, Harvard University, 2003.

I urge you to check it out. In particular, Section 3 and Appendix A discuss matrix exponentiation. Has anyone discussed this before?

Here’s another good modern treatment of the deficiency zero theorem:

• Jeremy Gunawardena, Chemical reaction network theory for in silico biologists, 2003.

The theorem was first proved here:

• Martin Feinberg, Chemical reaction network structure and the stability of complex isothermal reactors: I. The deficiency zero and deficiency one theorems, Chemical Engineering Science 42 (1987), 2229-2268.

However, Feinberg’s treatment here relies heavily on this paper:

• F. Horn and R. Jackson, General mass action kinetics, Archive for Rational Mechanics and Analysis 47 (1972), 81-116.

(Does anyone work on ‘irrational mechanics’?) These lectures are also very helpful:

• Martin Feinberg, Lectures on reaction networks, 1979.

If you’ve seen other proofs, let us know.


Network Theory (Part 21)

14 August, 2012

Recently we’ve been talking about ‘reaction networks’, like this:

The letters are names of chemicals, and the arrows are chemical reactions. If we know how fast each reaction goes, we can write down a ‘rate equation’ describing how the amount of each chemical changes with time.

In Part 17 we met the deficiency zero theorem. This is a powerful tool for finding equilibrium solutions of the rate equation: in other words, solutions where the amounts of the various chemicals don’t change at all. To apply this theorem, two conditions must hold. Both are quite easy to check:

• Your reaction network needs to be ‘weakly reversible’: if you have a reaction that takes one bunch of chemicals to another, there’s a series of reactions that takes that other bunch back to the one you started with.

• A number called the ‘deficiency’ that you can compute from your reaction network needs to be zero.

The first condition makes a lot of sense, intuitively: you won’t get an equilibrium with all the different chemicals present if some chemicals can turn into others but not the other way around. But the second condition, and indeed the concept of ‘deficiency’, seems mysterious.

Luckily, when you work through the proof of the deficiency zero theorem, the mystery evaporates. It turns out that there are two equivalent ways to define the deficiency of a reaction network. One makes it easy to compute, and that’s the definition people usually give. But the other explains why it’s important.

In fact the whole proof of the deficiency zero theorem is full of great insights, so I want to show it to you. This will be the most intense part of the network theory series so far, but also the climax of its first phase. After a little wrapping up, we will then turn to another kind of network: electrical circuits!

Today I’ll just unfold the concept of ‘deficiency’ so we see what it means. Next time I’ll show you a slick way to write the rate equation, which is crucial to proving the deficiency zero theorem. Then we’ll start the proof.

Reaction networks

Let’s recall what a reaction network is, and set up our notation. In chemistry we consider a finite set of ‘species’ like C, O2, H2O and so on… and then consider reactions like

CH4 + 2O2 \longrightarrow CO2 + 2H2O

On each side of this reaction we have a finite linear combination of species, with natural numbers as coefficients. Chemists call such a thing a complex.

So, given any finite collection of species, say S, let’s write \mathbb{N}^S to mean the set of finite linear combinations of elements of S, with natural numbers as coefficients. The complexes appearing in our reactions will form a subset of this, say K.

We’ll also consider a finite collection of reactions—or in the language of Petri nets, ‘transitions’. Let’s call this T. Each transition goes from some complex to some other complex: if we want a reaction to be reversible we’ll explicitly include another reaction going the other way. So, given a transition \tau \in T it will always go from some complex called its source s(\tau) to some complex called its target t(\tau).

All this data, put together, is a reaction network:

Definition. A reaction network (S,\; s,t : T \to K) consists of:

• a finite set S of species,

• a finite set T of transitions,

• a finite set K \subset \mathbb{N}^S of complexes,

• source and target maps s,t : T \to K.

We can draw a reaction network as a graph with complexes as vertices and transitions as edges:

The set of species here is S = \{A,B,C,D,E\}, and the set of complexes is K = \{A,B,D,A+C,B+E\}.

But to write down the ‘rate equation’ describing our chemical reactions, we need a bit more information: constants r(\tau) saying the rate at which each transition \tau occurs. So, we define:

Definition. A stochastic reaction network is a reaction network together with a map r: T \to (0,\infty) assigning a rate constant to each transition.

Let me remind you how the rate equation works. At any time we have some amount x_i \in [0,\infty) of each species i. These numbers are the components of a vector x \in \mathbb{R}^S, which of course depends on time. The rate equation says how this vector changes:

\displaystyle{ \frac{d x_i}{d t} = \sum_{\tau \in T} r(\tau) \left(t_i(\tau) - s_i(\tau)\right)  x^{s(\tau)} }

Here I’m writing s_i(\tau) for the ith component of the vector s(\tau), and similarly for t_i(\tau). I should remind you what x^{s(\tau)} means, since here we are raising a vector to a vector power, which is a bit unusual. So, suppose we have any vector

x = (x_1, \dots, x_k)

and we raise it to the power of

s = (s_1, \dots, s_k)

Then by definition we get

\displaystyle{ x^s = x_1^{s_1} \cdots x_k^{s_k} }

Given this, I hope the rate equation makes intuitive sense! There’s one term for each transition \tau. The factor of t_i(\tau) - s_i(\tau) shows up because our transition destroys s_i(\tau) things of the ith species and creates t_i(\tau) of them. The big product

\displaystyle{ x^{s(\tau)} = x_1^{s_1(\tau)} \cdots x_k^{s_k(\tau)} }

shows up because our transition occurs at a rate proportional to the product of the numbers of things it takes as inputs. The constant of proportionality is the reaction rate r(\tau).

The deficiency zero theorem says lots of things, but in the next few episodes we’ll prove a weak version, like this:

Deficiency Zero Theorem (Baby Version). Suppose we have a weakly reversible reaction network with deficiency zero. Then for any choice of rate constants there exists an equilibrium solution x \in (0,\infty)^S of the rate equation. In other words:

\displaystyle{ \sum_{\tau \in T} r(\tau) \left(t_i(\tau) - s_i(\tau)\right)  x^{s(\tau)} = 0}

An important feature of this result is that all the components of the vector x are positive. In other words, there’s actually some chemical of each species present!

But what do the hypotheses in this theorem mean?

A reaction network is weakly reversible if for any transition \tau \in T going from a complex \kappa to a complex \kappa', there is a sequence of transitions going from \kappa' back to \kappa. But what about ‘deficiency zero’? As I mentioned, this requires more work to understand.

So, let’s dive in!

Deficiency

In modern math, we like to take all the stuff we’re thinking about and compress it into a diagram with a few objects and maps going between these objects. So, to understand the deficiency zero theorem, I wanted to isolate the crucial maps. For starters, there’s an obvious map

Y : K \to \mathbb{N}^S

sending each complex to the linear combination of species that it is.

Indeed, we can change viewpoints a bit and think of K as an abstract set equipped with this map saying how each complex is made of species. From now on this is what we’ll do. Then all the information in a stochastic reaction network sits in this diagram:

This is fundamental to everything we’ll do for the next few episodes, so take a minute to lock it into your brain.

We’ll do lots of different things with this diagram. For example, we often want to use ideas from linear algebra, and then we want our maps to be linear. For example, Y extends uniquely to a linear map

Y : \mathbb{R}^K \to \mathbb{R}^S

sending real linear combinations of complexes to real linear combinations of species. Reusing the name Y here won’t cause confusion. We can also extend r, s, and t to linear maps in a unique way, getting a little diagram like this:

Linear algebra lets us talk about differences of complexes. Each transition \tau gives a vector

\partial(\tau) = t(\tau) - s(\tau) \in \mathbb{R}^K

saying the change in complexes that it causes. And we can extend \partial uniquely to a linear map

\partial : \mathbb{R}^T \to \mathbb{R}^K

defined on linear combinations of transitions. Mathematicians would call \partial a boundary operator.

So, we have a little sequence of linear maps

\mathbb{R}^T \stackrel{\partial}{\to} \mathbb{R}^K \stackrel{Y}{\to} \mathbb{R}^S

This turns a transition into a change in complexes, and then a change in species.

If you know fancy math you’ll be wondering if this sequence is a ‘chain complex’, which is a fancy way of saying that Y \partial = 0. The answer is no. This equation means that every linear combination of reactions leaves the amount of all species unchanged. Or equivalently: every reaction leaves the amount of all species unchanged. This only happens in very silly examples.

Nonetheless, it’s possible for a linear combination of reactions to leave the amount of all species unchanged.

For example, this will happen if we have a linear combination of reactions that leaves the amount of all complexes unchanged. But this sufficient condition is not necessary. And this leads us to the concept of ‘deficiency zero’:

Definition. A reaction network has deficiency zero if any linear combination of reactions that leaves the amount of every species unchanged also leaves the amount of every complex unchanged.

In short, a reaction network has deficiency zero iff

Y (\partial \rho) = 0 \implies \partial \rho = 0

for every \rho \in \mathbb{R}^T. In other words—using some basic concepts from linear algebra—a reaction network has deficiency zero iff Y is one-to-one when restricted to the image of \partial. Remember, the image of \partial is

\mathrm{im} \partial = \{ \partial \rho \; : \; \rho \in \mathbb{R}^T \}

Roughly speaking, this consists of all changes in complexes that can occur due to reactions.

In still other words, a reaction network has deficiency zero if 0 is the only vector in both the image of \partial and the kernel of Y:

\mathrm{im} \partial \cap \mathrm{ker} Y = \{ 0 \}

Remember, the kernel of Y is

\mathrm{ker} Y = \{ \phi \in \mathbb{R}^K : Y \phi = 0 \}

Roughly speaking, this consists of all changes in complexes that don’t cause changes in species. So, ‘deficiency zero’ roughly says that if a reaction causes a change in complexes, it causes a change in species.

(All this ‘roughly speaking’ stuff is because in reality I should be talking about linear combinations of transitions, complexes and species. But it’s a bit distracting to keep saying that when I’m trying to get across the basic ideas!)

Now we’re ready to understand deficiencies other than zero, at least a little. They’re defined like this:

Definition. The deficiency of a reaction network is the dimension of \mathrm{im} \partial \cap \mathrm{ker} Y.

How to compute the deficiency

You can compute the deficiency of a reaction network just by looking at it. However, it takes a little training. First, remember that a reaction network can be drawn as a graph with complexes as vertices and transitions as edges, like this:

There are three important numbers we can extract from this graph:

• We can count the number of vertices in this graph; let’s call that |K|, since it’s just the number of complexes.

• We can count the number of pieces or ‘components’ of this graph; let’s call that \# \mathrm{components} for now.

• We can also count the dimension of the image of Y \partial. This space, \mathrm{im} Y \partial, is called the stoichiometric subspace: vectors in here are changes in species that can be accomplished by transitions in our reaction network, or linear combinations of transitions.

These three numbers, all rather easy to compute, let us calculate the deficiency:

Theorem. The deficiency of a reaction network equals

|K| - \# \mathrm{components} - \mathrm{dim} \left( \mathrm{im} Y \partial \right)

Proof. By definition we have

\mathrm{deficiency} = \dim \left( \mathrm{im} \partial \cap \mathrm{ker} Y \right)

but another way to say this is

\mathrm{deficiency} = \mathrm{dim} (\mathrm{ker} Y|_{\mathrm{im} \partial})

where we are restricting Y to the subspace \mathrm{im} \partial, and taking the dimension of the kernel of that.

The rank-nullity theorem says that whenever you have a linear map T: V \to W between finite-dimensional vector spaces, then

\mathrm{dim} \left(\mathrm{ker} T \right) \; = \; \mathrm{dim}\left(\mathrm{dom} T\right) \; - \; \mathrm{dim} \left(\mathrm{im} T\right)

where \mathrm{dom} T is the domain of T, namely the vector space V. It follows that

\mathrm{dim}(\mathrm{ker} Y|_{\mathrm{im} \partial}) = \mathrm{dim}(\mathrm{dom}  Y|_{\mathrm{im} \partial}) - \mathrm{dim}(\mathrm{im}  Y|_{\mathrm{im} \partial})

The domain of Y|_{\mathrm{im} \partial} is just \mathrm{im} \partial, while its image equals \mathrm{im} Y \partial, so

\mathrm{deficiency} = \mathrm{dim}(\mathrm{im} \partial) - \mathrm{dim}(\mathrm{im} Y \partial)

The theorem then follows from this:

Lemma. \mathrm{dim} (\mathrm{im} \partial) = |K| - \# \mathrm{components}

Proof. In fact this holds whenever we have a finite set of complexes and a finite set of transitions going between them. We get a diagram

We can extend the source and target functions to linear maps as usual:

and then we can define \partial = t - s. We claim that

\mathrm{dim} (\mathrm{im} \partial) = |K| - \# \mathrm{components}

where \# \mathrm{components} is the number of connected components of the graph with K as vertices and T as edges.

This is easiest to see using an inductive argument where we start by throwing out all the edges of our graph and then put them back in one at a time. When our graph has no edges, \partial = 0 and the number of components is |K|, so our claim holds: both sides of the equation above are zero. Then, each time we put in an edge, there are two choices: either it goes between two different components of the graph we have built so far, or it doesn’t. If it goes between two different components, it increases \mathrm{dim} (\mathrm{im} \partial) by 1 and decreases the number of components by 1, so our equation continues to hold. If it doesn’t, neither of these quantities changes, so our equation again continues to hold.   █
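The lemma is easy to test numerically: build the boundary operator \partial = t - s as a matrix for a made-up graph and compare its rank with |K| minus the number of components.

```python
import numpy as np

# A made-up graph: vertices 0..4, edges 0->1, 1->2, 2->0, 3->4.
# It has two components: {0,1,2} and {3,4}.
K = 5
edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
bd = np.zeros((K, len(edges)))
for tau, (u, v) in enumerate(edges):
    bd[u, tau] -= 1.0            # source contributes -1
    bd[v, tau] += 1.0            # target contributes +1

n_components = 2                 # read off the graph by eye
assert np.linalg.matrix_rank(bd) == K - n_components
print(np.linalg.matrix_rank(bd))
```

Note how the 3-cycle 0 → 1 → 2 → 0 costs a dimension: its three columns sum to zero, just as the inductive step predicts for an edge that doesn’t join two components.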

Examples

This reaction network is not weakly reversible, since we can get from B + E and A + C to D but not back. It becomes weakly reversible if we throw in another transition:

Taking a reaction network and throwing in the reverse of an existing transition never changes the number of complexes, the number of components, or the dimension of the stoichiometric subspace. So, the deficiency of the reaction network remains unchanged. We computed it back in Part 17. For either reaction network above:

• the number of complexes is 5:

|K| = |\{A,B,D, A+C, B+E\}| = 5

• the number of components is 2:

\# \mathrm{components} = 2

• the dimension of the stoichiometric subspace is 3. For this we go through each transition, see what change in species it causes, and take the vector space spanned by these changes. Then we find a basis for this space and count it:

\mathrm{im} Y \partial =

\langle B - A, A - B, D - A - C, D - (B+E) ,  (B+E)-(A+C) \rangle

= \langle B -A, D - A - C, D - A - E \rangle

so

\mathrm{dim} \left(\mathrm{im} Y \partial \right) = 3

As a result, the deficiency is zero:

\begin{array}{ccl} \mathrm{deficiency}   &=& |K| - \# \mathrm{components} - \mathrm{dim} \left( \mathrm{im} Y \partial \right) \\  &=& 5 - 2 - 3 \\  &=&  0 \end{array}
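The same arithmetic is pleasant to automate. Here is a NumPy sketch of this example—species A, B, C, D, E, complexes A, B, D, A+C, B+E, and the five transitions listed above—that computes the deficiency from the formula and cross-checks it against the definition via rank-nullity.

```python
import numpy as np

# The example network: complexes indexed 0:A, 1:B, 2:D, 3:A+C, 4:B+E;
# transitions A->B, B->A, A+C->D, B+E->D, A+C->B+E.
Y = np.array([[1, 0, 0, 1, 0],    # A
              [0, 1, 0, 0, 1],    # B
              [0, 0, 0, 1, 0],    # C
              [0, 0, 1, 0, 0],    # D
              [0, 0, 0, 0, 1]])   # E
src, tgt = [0, 1, 3, 4, 3], [1, 0, 2, 2, 4]
K, T = 5, 5
S = np.zeros((K, T)); Tm = np.zeros((K, T))
for tau in range(T):
    S[src[tau], tau] = 1.0
    Tm[tgt[tau], tau] = 1.0
bd = Tm - S                       # the boundary operator

n_components = 2                  # {A, B} and {D, A+C, B+E}, read off the graph
stoich_dim = np.linalg.matrix_rank(Y @ bd)
deficiency = K - n_components - stoich_dim
print(deficiency)                 # 0

# Cross-check with the definition: dim(im bd ∩ ker Y)
# = dim(im bd) - dim(im Y bd), by rank-nullity for Y restricted to im bd.
assert deficiency == np.linalg.matrix_rank(bd) - stoich_dim
```

The cross-check is exactly the proof above in miniature: `matrix_rank(bd)` is |K| minus the number of components, by the lemma.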

Now let’s throw in another complex and some more transitions:

Now:

• the number of complexes increases by 1:

|K| = |\{A,B,D,E, A+C, B+E\}| = 6

• the number of components is unchanged:

\# \mathrm{components} = 2

• the dimension of the stoichiometric subspace increases by 1. We never need to include reverses of transitions when computing this:

\mathrm{im} Y \partial =

\langle B-A, E-B,  D - (A+C),
D - (B+E) , (B+E)-(A+C) \rangle

= \langle B -A, E-B, D - A - C, D - B - E \rangle

so

\mathrm{dim} \left(\mathrm{im} Y \partial \right) = 4

As a result, the deficiency is still zero:

\begin{array}{ccl} \mathrm{deficiency}   &=& |K| - \# \mathrm{components} - \mathrm{dim} \left( \mathrm{im} Y \partial \right) \\  &=& 6 - 2 - 4 \\  &=&  0 \end{array}

Do all reaction networks have deficiency zero? That would be nice. Let’s try one more example:

• the number of complexes is the same as in our last example:

|K| = |\{A,B,D,E, A+C, B+E\}| = 6

• the number of components is also the same:

\# \mathrm{components} = 2

• the dimension of the stoichiometric subspace is also the same:

\mathrm{im} Y \partial =

\langle B-A,  D - (A+C), D - (B+E) ,
(B+E)-(A+C), (B+E) - B \rangle

= \langle B -A, D - A - C, D - B , E\rangle

so

\mathrm{dim} \left(\mathrm{im} Y \partial \right) = 4

So the deficiency is still zero:

\begin{array}{ccl} \mathrm{deficiency}   &=& |K| - \# \mathrm{components} - \mathrm{dim} \left( \mathrm{im} Y \partial \right) \\  &=& 6 - 2 - 4 \\  &=&  0 \end{array}

It’s sure easy to find examples with deficiency zero!

Puzzle 1. Can you find an example where the deficiency is not zero?

Puzzle 2. If you can’t find an example, prove the deficiency is always zero. If you can, find 1) the smallest example and 2) the smallest example that actually arises in chemistry.

Note that not all reaction networks can actually arise in chemistry. For example, the transition A \to A + A would violate conservation of mass. Nonetheless a reaction network like this might be useful in a very simple model of amoeba reproduction, one that doesn’t take limited resources into account.

Different kinds of graphs

I’ll conclude with some technical remarks that only a mathematician could love; feel free to skip them if you’re not in the mood. As you’ve already seen, it’s important that a reaction network can be drawn as a graph:

But there are many kinds of graph. What kind of graph is this, exactly?

As I mentioned in Part 17, it’s a directed multigraph, meaning that the edges have arrows on them, more than one edge can go from one vertex to another, and we can have any number of edges going from a vertex to itself. Not all those features are present in this example, but they’re certainly allowed by our definition of reaction network!

After all, we’ve seen that a stochastic reaction network amounts to a little diagram like this:

If we throw out the rate constants, we’re left with a reaction network. So, a reaction network is just a diagram like this:

If we throw out the information about how complexes are made of species, we’re left with an even smaller diagram:

And this precisely matches the slick definition of a directed multigraph: it’s a set E of edges, a set V of vertices, and functions s,t : E \to V saying where each edge starts and where it ends: its source and target.

Since this series is about networks, we should expect to run into many kinds of graphs. While their diversity is a bit annoying at first, we must learn to love it, at least if we’re mathematicians and want everything to be completely precise.

There are at least 2^3 = 8 kinds of graphs, depending on our answers to three questions:

• Do the edges have arrows on them?

• Can more than one edge go between a specific pair of vertices?

and

• Can an edge go from a vertex to itself?

We get directed multigraphs if we answer YES, YES and YES to these questions. Since they allow all three features, directed multigraphs are very important. For example, a category is a directed multigraph equipped with some extra structure. Also, what mathematicians call a quiver is just another name for a directed multigraph.

We’ve met two other kinds of graph so far:

• In Part 15 and Part 16 we described circuits made of resistors—or, in other words, Dirichlet operators—using ‘simple graphs’. We get simple graphs when we answer NO, NO and NO to the three questions above. The slick definition of a simple graph is that it’s a set V of vertices together with a subset E of the collection of 2-element subsets of V.

• In Part 20 we described Markov processes on finite sets—or, in other words, infinitesimal stochastic operators—using ‘directed graphs’. We get directed graphs when we answer YES, NO and YES to the three questions. The slick definition of a directed graph is that it’s a set V of vertices together with a subset E of the ordered pairs of V: E \subseteq V \times V.

There is a lot to say about this business, but for now I’ll just note that you can use directed multigraphs with edges labelled by positive numbers to describe Markov processes, just as we used directed graphs. You don’t get anything more general, though! After all, if we have multiple edges from one vertex to another, we can replace them by a single edge as long as we add up their rate constants. And an edge from a vertex to itself has no effect at all.
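This reduction from a rate-labelled directed multigraph to a Markov process can be sketched in a few lines of Python. Here is a minimal sketch; the vertex count and edge list are made up purely for illustration:

```python
def rate_matrix(n, edges):
    """Collapse a directed multigraph with positive rate labels into an
    infinitesimal stochastic matrix: parallel edges add their rates,
    and self-loops are dropped."""
    H = [[0.0] * n for _ in range(n)]
    for src, tgt, rate in edges:
        if src == tgt:
            continue                     # a self-loop has no effect
        H[tgt][src] += rate              # hopping rate from src to tgt
    for j in range(n):
        H[j][j] = -sum(H[i][j] for i in range(n))   # columns sum to zero
    return H

# two parallel edges from vertex 0 to vertex 1 merge: 0.5 + 1.5 = 2.0,
# and the self-loop at vertex 1 is simply discarded
H = rate_matrix(2, [(0, 1, 0.5), (0, 1, 1.5), (1, 1, 7.0)])
```

The columns summing to zero is exactly the 'infinitesimal stochastic' condition on the resulting operator.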

In short, both for Markov processes and reaction networks, we can take ‘graph’ to mean either ‘directed graph’ or ‘directed multigraph’, as long as we make some minor adjustments. In what follows, we’ll use directed multigraphs for both Markov processes and reaction networks. It will work a bit better if we ever get around to explaining how category theory fits into this story.


Network Theory (Part 19)

18 July, 2012

joint with Jacob Biamonte

It’s time to resume the network theory series! We’re writing a little book based on some of these posts, so we want to finish up our discussion of stochastic Petri nets and chemical reaction networks. But it’s been a long time, so we’ll assume you forgot everything we’ve said before, and make this post as self-contained as possible.

Last time we started looking at a simple example: a diatomic gas.

A diatomic molecule of this gas can break apart into two atoms:

A_2 \to A + A

and conversely, two atoms can combine to form a diatomic molecule:

A + A \to A_2

We can draw both these reactions using a chemical reaction network:

where we’re writing B instead of A_2 to abstract away some detail that’s just distracting here.

Last time we looked at the rate equation for this chemical reaction network, and found equilibrium solutions of that equation. Now let’s look at the master equation, and find equilibrium solutions of that. This will serve as a review of three big theorems.

The master equation

We’ll start from scratch. The master equation is all about how atoms or molecules or rabbits or wolves or other things interact randomly and turn into other things. So, let’s write \psi_{m,n} for the probability that we have m atoms of A and n molecules of B in our container. These probabilities are functions of time, and the master equation will say how they change.

First we need to pick a rate constant for each reaction. Let’s say the rate constant for this reaction is some number \alpha > 0:

B \to A + A

while the rate constant for the reverse reaction is some number \beta > 0:

A + A \to B

Before we make it pretty using the ideas we’ve been explaining all along, the master equation says

\begin{array}{ccr} \displaystyle{ \frac{d}{d t} \psi_{m,n} (t)} &=&  \alpha (n+1) \, \psi_{m-2,n+1}(t) \\ \\   &&- \; \alpha n \, \psi_{m,n}(t) \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1}(t) \\ \\ && - \;\beta m(m-1) \, \psi_{m,n}(t) \\ \\   \end{array}

where we define \psi_{i,j} to be zero if either i or j is negative.

Yuck!

Normally we don’t show you such nasty equations. Indeed the whole point of our work has been to demonstrate that by packaging the equations in a better way, we can understand them using high-level concepts instead of mucking around with millions of scribbled symbols. But we thought we’d show you what’s secretly lying behind our beautiful abstract formalism, just once.

Each term has a meaning. For example, the third one:

\beta (m+2)(m+1)\psi_{m+2,n-1}(t)

means that the reaction A + A \to B will tend to increase the probability of there being m atoms of A and n molecules of B if we start with m+2 atoms of A and n-1 molecules of B. This reaction can happen in (m+2)(m+1) ways. And it happens at a probabilistic rate proportional to the rate constant for this reaction, \beta.

We won’t go through the rest of the terms. It’s a good exercise to do so, but there could easily be a typo in the formula, since it’s so long and messy. So let us know if you find one!
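One way to hunt for typos in a formula like this is to code it up and check that total probability is conserved. Here is a sketch in Python; the truncation of the state space and the rate constants are arbitrary choices made just for the test:

```python
def master_rhs(psi, alpha, beta):
    """d/dt of psi[m][n], the probability of m A's and n B's, for the
    reactions B -> A + A (rate alpha) and A + A -> B (rate beta).
    Entries outside the truncated grid count as zero."""
    M, N = len(psi), len(psi[0])
    def p(m, n):
        return psi[m][n] if 0 <= m < M and 0 <= n < N else 0.0
    return [[alpha * (n + 1) * p(m - 2, n + 1)
             - alpha * n * p(m, n)
             + beta * (m + 2) * (m + 1) * p(m + 2, n - 1)
             - beta * m * (m - 1) * p(m, n)
             for n in range(N)] for m in range(M)]

# a distribution supported on the line m + 2n = 4, which the reactions preserve
psi = [[0.0] * 3 for _ in range(5)]
psi[4][0] = 0.5          # four A's, no B's
psi[0][2] = 0.5          # no A's, two B's
d = master_rhs(psi, alpha=1.0, beta=2.0)
total_drift = sum(map(sum, d))    # should vanish: probability is conserved
```

Since the support stays inside the truncated grid here, the drift of the total probability comes out exactly zero, as it must for any infinitesimal stochastic operator.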

To simplify this mess, the key trick is to introduce a generating function that summarizes all the probabilities in a single power series:

\Psi = \sum_{m,n \ge 0} \psi_{m,n} y^m \, z^n

It’s a power series in two variables, y and z, since we have two chemical species: As and Bs.

Using this trick, the master equation looks like

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

where the Hamiltonian H is a sum of terms, one for each reaction. This Hamiltonian is built from operators that annihilate and create As and Bs. The annihilation and creation operators for A atoms are:

\displaystyle{ a = \frac{\partial}{\partial y} , \qquad a^\dagger = y }

The annihilation operator differentiates our power series with respect to the variable y. The creation operator multiplies it by that variable. Similarly, the annihilation and creation operators for B molecules are:

\displaystyle{ b = \frac{\partial}{\partial z} , \qquad b^\dagger = z }

In Part 8 we explained a recipe that lets us stare at our chemical reaction network and write down this Hamiltonian:

H = \alpha ({a^\dagger}^2 b - b^\dagger b) + \beta (b^\dagger a^2 - {a^\dagger}^2 a^2)

As promised, there’s one term for each reaction. But each term is itself a sum of two: one that increases the probability that our container of chemicals will be in a new state, and another that decreases the probability that it’s in its original state. We get a total of four terms, which correspond to the four terms in our previous way of writing the master equation.

Puzzle 1. Show that this new way of writing the master equation is equivalent to the previous one.
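As a warm-up for the puzzle (not a full solution), one can apply H to a single monomial y^m z^n symbolically and compare the result with the terms of the explicit master equation. A sketch using sympy, for one sample monomial:

```python
import sympy as sp

y, z = sp.symbols('y z')
alpha, beta = sp.symbols('alpha beta', positive=True)

def a(f):  return sp.diff(f, y)   # annihilate an A
def ad(f): return y * f           # create an A
def b(f):  return sp.diff(f, z)   # annihilate a B
def bd(f): return z * f           # create a B

def H(f):
    return (alpha * (ad(ad(b(f))) - bd(b(f)))
            + beta * (bd(a(a(f))) - ad(ad(a(a(f))))))

m, n = 3, 2
f = y**m * z**n
# H should move probability (m, n) -> (m+2, n-1) at rate alpha*n,
# and (m, n) -> (m-2, n+1) at rate beta*m*(m-1):
expected = (alpha * n * (y**(m + 2) * z**(n - 1) - f)
            + beta * m * (m - 1) * (y**(m - 2) * z**(n + 1) - f))
residual = sp.expand(H(f) - expected)
```

The residual vanishes, which is the coefficient-by-coefficient content of the equivalence.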

Equilibrium solutions

Now we will look for all equilibrium solutions of the master equation: in other words, solutions that don’t change with time. So, we’re trying to solve

H \Psi = 0

Given the rather complicated form of the Hamiltonian, this seems tough. The challenge looks more concrete but even more scary if we go back to our original formulation. We’re looking for probabilities \psi_{m,n}, nonnegative numbers that sum to one, such that

\begin{array}{ccr} 0 &=&  \alpha (n+1) \, \psi_{m-2,n+1} \\ \\   &&- \; \alpha n \, \psi_{m,n} \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\   \end{array}

for all m and n.

This equation is horrid! But the good news is that it’s linear, so a linear combination of solutions is again a solution. This lets us simplify the problem using a conserved quantity.

Clearly, there’s a quantity that the reactions here don’t change:

What’s that? It’s the number of As plus twice the number of Bs. After all, a B can turn into two As, or vice versa.

(Of course the secret reason is that B is a diatomic molecule made of two As. But you’d be able to follow the logic here even if you didn’t know that, just by looking at the chemical reaction network… and sometimes this more abstract approach is handy! Indeed, the way chemists first discovered that certain molecules are made of certain atoms is by seeing which reactions were possible and which weren’t.)

Suppose we start in a situation where we know for sure that the number of As plus twice the number of Bs equals some number k:

\psi_{m,n} = 0  \;  \textrm{unless} \; m+2n = k

Then we know \Psi is initially of the form

\displaystyle{ \Psi = \sum_{m+2n = k} \psi_{m,n} \, y^m z^n  }

But since the number of As plus twice the number of Bs is conserved, if \Psi obeys the master equation it will continue to be of this form!

Put a fancier way, we know that if a solution of the master equation starts in this subspace:

\displaystyle{ L_k =  \{ \Psi: \; \Psi = \sum_{m+2n = k} \psi_{m,n} y^m z^n \; \textrm{for some} \; \psi_{m,n} \}  }

it will stay in this subspace. So, because the master equation is linear, we can take any solution \Psi and write it as a linear combination of solutions \Psi_k, one in each subspace L_k.

In particular, we can do this for an equilibrium solution \Psi. And then all the solutions \Psi_k are also equilibrium solutions: they’re linearly independent, so if one of them changed with time, \Psi would too.

This means we can just look for equilibrium solutions in the subspaces L_k. If we find these, we can get all equilibrium solutions by taking linear combinations.

Once we’ve noticed that, our horrid equation makes a bit more sense:

\begin{array}{ccr} 0 &=&  \alpha (n+1) \, \psi_{m-2,n+1} \\ \\   &&- \; \alpha n \, \psi_{m,n} \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\   \end{array}

Note that if the pair of subscripts m, n obey m + 2n = k, the same is true for the other pairs of subscripts here! So our equation relates the values of \psi_{m,n} for all the points (m,n) with integer coordinates lying on this line segment:

m+2n = k , \qquad m ,n \ge 0

You should be visualizing something like this:

If you think about it a minute, you’ll see that if we know \psi_{m,n} at two points on such a line, we can keep using our equation to recursively work out all the rest. So, there are at most two linearly independent equilibrium solutions of the master equation in each subspace L_k.

Why at most two? Why not two? Well, we have to be a bit careful about what happens at the ends of the line segment: remember that \psi_{m,n} is defined to be zero when m or n becomes negative. If we think very hard about this, we’ll see there’s just one linearly independent equilibrium solution of the master equation in each subspace L_k. But this is the sort of nitty-gritty calculation that’s not fun to watch someone else do, so we won’t bore you with that.
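If you’d rather trust a computer than think very hard, you can restrict H to the finitely many states on one of these line segments and compute its null space. A numpy sketch; the value of k and the rate constants are arbitrary:

```python
import numpy as np

def restricted_H(k, alpha, beta):
    """H restricted to the subspace L_k, in the basis of states
    (m, n) = (k - 2n, n) for n = 0, ..., floor(k/2)."""
    states = [(k - 2 * n, n) for n in range(k // 2 + 1)]
    index = {s: i for i, s in enumerate(states)}
    H = np.zeros((len(states), len(states)))
    for m, n in states:
        if n > 0:                        # B -> A + A, at rate alpha * n
            H[index[(m + 2, n - 1)], index[(m, n)]] += alpha * n
            H[index[(m, n)], index[(m, n)]] -= alpha * n
        if m > 1:                        # A + A -> B, at rate beta * m * (m - 1)
            H[index[(m - 2, n + 1)], index[(m, n)]] += beta * m * (m - 1)
            H[index[(m, n)], index[(m, n)]] -= beta * m * (m - 1)
    return H

H = restricted_H(k=6, alpha=1.0, beta=2.0)
nullity = int(sum(s < 1e-10 for s in np.linalg.svd(H)[1]))
```

The nullity comes out to 1: one linearly independent equilibrium in each subspace, matching the claim above.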

Soon we’ll move on to a more high-level approach to this problem. But first, one remark. Our horrid equation is like a fancy version of the usual discretized form of the equation

\displaystyle {\frac{d^2 \psi}{d x^2} = 0 }

namely:

\psi_{n-1} - 2 \psi_{n} + \psi_{n+1} = 0

And this makes sense, since we get

\displaystyle {\frac{d^2 \psi}{d x^2} = 0 }

by taking the heat equation:

\displaystyle \frac{\partial \psi}{\partial t} = {\frac{\partial^2 \psi}{\partial x^2} }

and assuming \psi doesn’t depend on time. So what we’re doing is a lot like looking for equilibrium solutions of the heat equation.

The heat equation describes how heat smears out as little particles of heat randomly move around. True, there don’t really exist ‘little particles of heat’, but this equation also describes the diffusion of any other kind of particles as they randomly move around undergoing Brownian motion. Similarly, our master equation describes a random walk on this line segment:

m+2n = k , \qquad m , n \ge 0

or more precisely, the points on this segment with integer coordinates. The equilibrium solutions arise when the probabilities \psi_{m,n} have diffused as much as possible.

If you think about it this way, it should be physically obvious that there’s just one linearly independent equilibrium solution of the master equation for each value of k.

There’s a general moral here, too, which we’re seeing in a special case: the master equation for a chemical reaction network really describes a bunch of random walks, one for each allowed value of the conserved quantities that can be built as linear combinations of number operators. In our case we have one such conserved quantity, but in general there may be more (or none).

Furthermore, these ‘random walks’ are what we’ve been calling Markov processes.

Noether’s theorem

We simplified our task of finding equilibrium solutions of the master equation by finding a conserved quantity. The idea of simplifying problems using conserved quantities is fundamental to physics: this is why physicists are so enamored with quantities like energy, momentum, angular momentum and so on.

Nowadays physicists often use ‘Noether’s theorem’ to get conserved quantities from symmetries. There’s a very simple version of Noether’s theorem for quantum mechanics, but in Part 11 we saw a version for stochastic mechanics, and it’s that version that is relevant now. Here’s a paper which explains it in detail:

• John Baez and Brendan Fong, Noether’s theorem for Markov processes.

We don’t really need Noether’s theorem now, since we found the conserved quantity and exploited it without even noticing the symmetry. Nonetheless it’s interesting to see how it relates to what we’re doing.

For the reaction we’re looking at now, the idea is that the subspaces L_k are eigenspaces of an operator that commutes with the Hamiltonian H. It follows from standard math that a solution of the master equation that starts in one of these subspaces, stays in that subspace.

What is this operator? It’s built from ‘number operators’. The number operator for As is

N_A = a^\dagger a

and the number operator for Bs is

N_B = b^\dagger b

A little calculation shows

N_A \,y^m z^n = m \, y^m z^n, \quad \qquad  N_B\, y^m z^n = n \,y^m z^n

so the eigenvalue of N_A is the number of As, while the eigenvalue of N_B is the number of Bs. This is why they’re called number operators.

As a consequence, the eigenvalue of the operator N_A + 2N_B is the number of As plus twice the number of Bs:

(N_A + 2N_B) \, y^m z^n = (m + 2n) \, y^m z^n

Let’s call this operator O, since it’s so important:

O = N_A + 2N_B

If you think about it, the spaces L_k we saw a minute ago are precisely the eigenspaces of this operator:

L_k = \{ \Psi : \; O \Psi = k \Psi \}
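These eigenvalue equations are quick to verify symbolically, for instance with sympy, on a sample monomial:

```python
import sympy as sp

y, z = sp.symbols('y z')
m, n = 5, 3
f = y**m * z**n

N_A = y * sp.diff(f, y)       # a† a applied to f
N_B = z * sp.diff(f, z)       # b† b applied to f
O_f = N_A + 2 * N_B           # O = N_A + 2 N_B applied to f
```

Indeed N_A f = m f, N_B f = n f and O f = (m + 2n) f, so y^m z^n lies in L_{m+2n}.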

As we’ve seen, solutions of the master equation that start in one of these eigenspaces will stay there. This lets us take some techniques that are very familiar in quantum mechanics, and apply them to this stochastic situation.

First of all, time evolution as described by the master equation is given by the operators \exp(t H). In other words,

\displaystyle{ \frac{d}{d t} \Psi(t) } = H \Psi(t) \; \textrm{and} \;  \Psi(0) = \Phi \quad  \Rightarrow \quad \Psi(t) = \exp(t H) \Phi

But if you start in some eigenspace of O, you stay there. Thus if \Phi is an eigenvector of O, so is \exp(t H) \Phi, with the same eigenvalue. In other words,

O \Phi = k \Phi

implies

O \exp(t H) \Phi = k \exp(t H) \Phi = \exp(t H) O \Phi

But since we can choose a basis consisting of eigenvectors of O, we must have

O \exp(t H) = \exp(t H) O

or, throwing caution to the winds and differentiating:

O H = H O

So, as we’d expect from Noether’s theorem, our conserved quantity commutes with the Hamiltonian! This in turn implies that H commutes with any polynomial in O, which in turn suggests that

\exp(s O) H = H \exp(s O)

and also

\exp(s O) \exp(t H) = \exp(t H) \exp(s O)

The last equation says that O generates a 1-parameter family of ‘symmetries’: operators \exp(s O) that commute with time evolution. But what do these symmetries actually do? Since

O y^m z^n = (m + 2n) y^m z^n

we have

\exp(s O) y^m z^n = e^{s(m + 2n)}\, y^m z^n

So, this symmetry takes any probability distribution \psi_{m,n} and multiplies it by e^{s(m + 2n)}.

In other words, our symmetry multiplies the relative probability of finding our container of gas in a given state by a factor of e^s for each A atom, and by a factor of e^{2s} for each B molecule. It might not seem obvious that this operation commutes with time evolution! However, experts on chemical reaction theory are familiar with this fact.

Finally, a couple of technical points. Starting where we said “throwing caution to the winds”, our treatment has not been rigorous, since O and H are unbounded operators, and these must be handled with caution. Nonetheless, all the commutation relations we wrote down are true.

The operators \exp(s O) are unbounded for positive s. They’re bounded for negative s, so they give a one-parameter semigroup of bounded operators. But they’re not stochastic operators: even for s negative, they don’t map probability distributions to probability distributions. However, they do map any nonzero vector \Psi with \psi_{m,n} \ge 0 to a vector \exp(s O) \Psi with the same properties. So, we can just normalize this vector and get a probability distribution. The need for this normalization is why we spoke of relative probabilities.

The Anderson–Craciun–Kurtz theorem

Now we’ll actually find all equilibrium solutions of the master equation in closed form. To understand this final section, you really do need to remember some things we’ve discussed earlier. Last time we considered the same chemical reaction network we’re studying today, but we looked at its rate equation, which looks like this:

\displaystyle{ \frac{d}{d t} x_1 =  2 \alpha x_2 - 2 \beta x_1^2}

\displaystyle{ \frac{d}{d t} x_2 = - \alpha x_2 + \beta x_1^2 }

This describes how the number of As and Bs changes in the limit where there are lots of them and we can treat them as varying continuously, in a deterministic way. The number of As is x_1, and the number of Bs is x_2.

We saw that the quantity

x_1 + 2 x_2

is conserved, just as today we’ve seen that N_A + 2 N_B is conserved. We saw that the rate equation has one equilibrium solution for each choice of x_1 + 2 x_2. And we saw that these equilibrium solutions obey

\displaystyle{ \frac{x_1^2}{x_2} = \frac{\alpha}{\beta} }
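Concretely: substituting x_2 = (\beta/\alpha) x_1^2 into x_1 + 2 x_2 = k gives a quadratic for x_1, so for each value of the conserved quantity the equilibrium is computable in closed form. A small sketch, with made-up numbers:

```python
from math import sqrt

def equilibrium(alpha, beta, k):
    """Equilibrium (x1, x2) of the rate equation with x1 + 2*x2 = k:
    the nonnegative root of 2*(beta/alpha)*x1**2 + x1 - k = 0."""
    c = 2 * beta / alpha
    x1 = (-1 + sqrt(1 + 4 * c * k)) / (2 * c)
    return x1, beta * x1**2 / alpha

x1, x2 = equilibrium(alpha=1.0, beta=2.0, k=10.0)
```

The result satisfies both the conservation law and the equilibrium condition x_1^2/x_2 = \alpha/\beta.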

The Anderson–Craciun–Kurtz theorem, introduced in Part 9, is a powerful result that gets equilibrium solutions of the master equation from equilibrium solutions of the rate equation. It only applies to equilibrium solutions that are ‘complex balanced’, but that’s okay:

Puzzle 2. Show that the equilibrium solutions of the rate equation for the chemical reaction network

are complex balanced.

So, given any equilibrium solution (x_1,x_2) of our rate equation, we can hit it with the Anderson-Craciun-Kurtz theorem and get an equilibrium solution of the master equation! And it looks like this:

\displaystyle{  \Psi = e^{-(x_1 + x_2)} \, \sum_{m,n \ge 0} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }

In this solution, the probability distribution

\displaystyle{ \psi_{m,n} = e^{-(x_1 + x_2)} \,  \frac{x_1^m x_2^n} {m! n! } }

is a product of Poisson distributions. The factor in front is there to make the numbers \psi_{m,n} add up to one. And remember, x_1, x_2 are any nonnegative numbers with

\displaystyle{ \frac{x_1^2}{x_2} = \frac{\alpha}{\beta} }

So from all we’ve said, the above formula is an explicit closed-form solution of the horrid equation

\begin{array}{ccr} 0 &=&  \alpha (n+1) \, \psi_{m-2,n+1} \\ \\   &&- \; \alpha n \, \psi_{m,n} \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\   \end{array}

That’s pretty nice. We found some solutions without ever doing any nasty calculations.
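Indeed, plugging the product-Poisson formula into the horrid equation, each positive term cancels a negative one, thanks to x_1^2/x_2 = \alpha/\beta. Here is a numerical check of that cancellation, for one arbitrary choice of rate constants:

```python
from math import exp, factorial

alpha, beta = 1.0, 2.0
x1 = 1.0                          # any x1 >= 0 works...
x2 = beta * x1**2 / alpha         # ...provided x1**2 / x2 = alpha / beta

def psi(m, n):
    """Product of Poisson distributions with means x1 and x2."""
    if m < 0 or n < 0:
        return 0.0
    return exp(-(x1 + x2)) * x1**m * x2**n / (factorial(m) * factorial(n))

# the right-hand side of the 'horrid equation', for a grid of (m, n)
max_residual = max(
    abs(alpha * (n + 1) * psi(m - 2, n + 1)
        - alpha * n * psi(m, n)
        + beta * (m + 2) * (m + 1) * psi(m + 2, n - 1)
        - beta * m * (m - 1) * psi(m, n))
    for m in range(8) for n in range(8))
```

Every residual vanishes up to floating-point roundoff.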

But we’ve really done better than getting some equilibrium solutions of the master equation. By restricting attention to n,m with m+2n = k, our formula for \psi_{m,n} gives an equilibrium solution that lives in the eigenspace L_k:

\displaystyle{  \Psi_k = e^{-(x_1 + x_2)} \, \sum_{m+2n =k} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }

And by what we’ve said, linear combinations of these give all equilibrium solutions of the master equation.

And we got them with very little work! Despite all the fancy talk in today’s post, we essentially just took the equilibrium solutions of the rate equation and plugged them into a straightforward formula to get equilibrium solutions of the master equation. This is why the Anderson–Craciun–Kurtz theorem is so nice. And of course we’re looking at a very simple reaction network: for more complicated ones it becomes even better to use this theorem to avoid painful calculations.

We could go further. For example, we could study nonequilibrium solutions using Feynman diagrams like this:

But instead, we will leave off with two more puzzles. We introduced some symmetries, but we haven’t really explored them yet:

Puzzle 3. What do the symmetries associated to the conserved quantity O do to the equilibrium solutions of the master equation given by

\displaystyle{  \Psi = e^{-(x_1 + x_2)} \, \sum_{m,n \ge 0} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }

where (x_1,x_2) is an equilibrium solution of the rate equation? In other words, what is the significance of the one-parameter family of solutions

\exp(s O) \Psi ?

Also, we used a conceptual argument to check that H commutes with O, but it’s good to know that we can check this sort of thing directly:

Puzzle 4. Compute the commutator

[H, O] = H O - O H

and show it vanishes.
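Before (or after) grinding through the algebra, a symbolic spot-check on a sample monomial is reassuring. This isn't a proof, just a sanity check with sympy:

```python
import sympy as sp

y, z = sp.symbols('y z')
alpha, beta = sp.symbols('alpha beta', positive=True)

def H(f):
    """The Hamiltonian alpha*(a†² b - b† b) + beta*(b† a² - a†² a²)."""
    aaf = sp.diff(f, y, 2)
    return (alpha * (y**2 * sp.diff(f, z) - z * sp.diff(f, z))
            + beta * (z * aaf - y**2 * aaf))

def O(f):
    """The conserved quantity N_A + 2 N_B."""
    return y * sp.diff(f, y) + 2 * z * sp.diff(f, z)

f = y**4 * z**3                       # a sample monomial
commutator = sp.expand(H(O(f)) - O(H(f)))
```

The commutator vanishes on the monomial, as it should: H maps each eigenspace of O to itself.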


Metallic Hydrogen

7 May, 2012

 

See that gray stuff inside Jupiter? It’s metallic hydrogen—according to theory, anyway.

But how much do you need to squeeze hydrogen before the H₂ molecules break down, the individual atoms form a crystal, and electrons start roaming freely so the stuff starts conducting electricity and becomes shiny—in short, becomes a metal?

In 1935 the famous physicist Eugene Wigner and a coauthor predicted that 250,000 times normal Earth atmospheric pressure would do the job. But experimenters have now squeezed hydrogen to 3.6 million atmospheres and it’s still not conducting electricity! Here’s a news report:

Probing hydrogen under extreme conditions, PhysOrg, 13 April 2012.

and here’s the original article, which unfortunately ain’t free:

• Chang-Sheng Zha, Zhenxian Liu, and Russell J. Hemley, Synchrotron infrared measurements of dense hydrogen to 360 GPa, Phys. Rev. Lett. 108 (2012), 146402.

Three phases of highly compressed solid hydrogen are known, with phase I starting at 1 million atmospheres and phase III kicking in around 1.5 million. I would love to know more about these! Do you know where to find out? Some people also think there’s a liquid metallic phase, and a superconducting liquid metallic phase. In fact there are claims that liquid metallic hydrogen has already been seen:

• W.J. Nellis, S.T. Weir and A.C. Mitchell, Metallization of fluid hydrogen at 140 GPa (1.4 Mbar) by shock compression, Shock Waves 9 (1999), 301–305.

1.4 Mbar, or megabar, is about 1.4 million atmospheres of pressure. Here’s the abstract:

Abstract. Shock compression was used to produce the first observation of a metallic state of condensed hydrogen. The conditions of metallization are a pressure of 140 GPa (1.4 Mbar), 0.6 g/cm³ (ninefold compression of initial liquid-H₂ density), and 3000 K. The relatively modest temperature generated by a reverberating shock wave produced the metallic state in a warm fluid at a lower pressure than expected previously for the crystallographically ordered solid at low temperatures. The relatively large sample diameter of 25 mm permitted measurement of electrical conductivity. The experimental technique and data analysis are described.

Apparently the electric resistivity of fluid metallic hydrogen is about the same as that of the fluid alkali metals cesium and rubidium at 2000 kelvin, right when they undergo the same sort of nonmetal-metal transition. Wow! So does that mean that at 2000 kelvin but at lower pressures, these elements don’t act like metals? I hadn’t known that!

Another reason this is interesting is that if you look at hydrogen on the periodic table, you’ll see it can’t make up its mind whether it’s an alkali metal—since its outer shell has just one electron in it—or a halogen—since its outer shell is just one electron short of being full! You could say compressing hydrogen until it becomes metallic is like trying to get it to break down and admit it’s an alkali metal.

Apparently the metal-nonmetal transitions for liquid cesium, rubidium and hydrogen all happen when the stuff gets squashed so much that the distance between atoms goes down to about 0.3 times the size of these atoms in vacuum… where by ‘size’ I mean the Bohr radius of the outermost shell.

How did Huntington and Wigner get their original calculation so wrong? I don’t know! Their original paper is here:

• Eugene Wigner and Hillard Bell Huntington, On the possibility of a metallic modification of hydrogen J. Chem. Phys. 3 (1935), 764-771.

It’s not free; I guess the American Institute of Physics is still trying to milk it for all it’s worth. One interesting thing is that they assumed the crystal structure of metallic hydrogen would be a ‘body-centered cubic’… it’s rather hard to compute these things from scratch without computers. But this more recent paper claims that a diamond cubic is energetically favored at 3 million atmospheres:

Crystal structure of atomic hydrogen, Phys. Rev. Lett. 70 (1993), 1952–1955.

I explained the diamond cubic crystal structure in my recent post about ice. Remember, it looks like this:

Since the body-centered cubic is one of the crystal lattices I didn’t talk about in that post, let me tell you about it now. It’s built of cells that look like this:

… which explains its name. In the same style of drawing, the face-centered cubic looks like this:

In my post about ice, I mentioned that if you pack equal-sized spheres with centers at points in the face-centered cubic lattice, you get the maximum density possible, namely about 74%. The body-centered cubic does slightly worse, about 68%.
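Both densities are one-liners once you know where the spheres touch: along a face diagonal for the face-centered cubic, along a body diagonal for the body-centered cubic. A quick check:

```python
from math import pi, sqrt

a = 1.0   # side of the conventional cubic cell

# FCC: 4 spheres per cell; spheres touch along a face diagonal, a*sqrt(2) = 4r
r_fcc = a * sqrt(2) / 4
fcc_density = 4 * (4 / 3) * pi * r_fcc**3 / a**3   # pi/(3*sqrt(2)) ~ 0.7405

# BCC: 2 spheres per cell; spheres touch along a body diagonal, a*sqrt(3) = 4r
r_bcc = a * sqrt(3) / 4
bcc_density = 2 * (4 / 3) * pi * r_bcc**3 / a**3   # pi*sqrt(3)/8 ~ 0.6802
```

So the face-centered cubic wins, about 74% to 68%.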

So, I always thought of it as a kind of a second-best thing. But apparently it’s the best in some ‘sampling’ problems where you’re trying to estimate a function on space by measuring it only at points in your lattice! That’s because its dual is the face-centered cubic.

Eh? Well, the dual L^* of a lattice L in a vector space V consists of all vectors k in the dual vector space V^* such that k(x) is an integer for all points x in L. The dual of a lattice is again a lattice, and taking the dual twice gets you back where you started. Since the Fourier transform of a function on a vector space is a function on the dual vector space, I don’t find it surprising that this is related to sampling problems. I don’t understand the details, but I bet I could find them here:

• Alireza Entezari, Ramsay Dyer and Torsten Möller, From sphere packing to the theory of optimal lattice sampling.
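The duality itself is easy to check numerically: take basis vectors for the body-centered cubic lattice as the columns of a matrix, compute the dual basis as the columns of the inverse transpose, and observe that the result generates a face-centered cubic lattice. A numpy sketch:

```python
import numpy as np

# Columns: a basis for the BCC lattice whose conventional cube has side 1.
B = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5]])

# Requiring k(x) to be an integer for all lattice points x forces the
# dual basis vectors to be the columns of the inverse transpose.
D = np.linalg.inv(B).T

# The dual basis vectors are integer vectors with even coordinate sums:
# exactly the FCC lattice whose conventional cube has side 2.
integer_entries = np.allclose(D, np.round(D))
even_sums = all(round(v.sum()) % 2 == 0 for v in D.T)
```

The covolume also matches: |det D| = 2, and the side-2 FCC conventional cell has volume 8 with 4 lattice points per cell.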

So: from the core of Jupiter to Fourier transforms! Yet again, everything seems to be related.


Ice

15 April, 2012

Water is complicated and fascinating stuff! There are at least sixteen known crystal phases of ice, many shown in this diagram based on the work of Martin Chaplin:

(Click for a larger version.)

For example, ordinary ice is called ice Ih, because it has hexagonal crystals. But if you cool it below -37 °C, scientists believe it will gradually turn into a cubic form, called ice Ic. In this form of ice the water molecules are arranged like carbons in a diamond! And apparently some of it can be found in ice crystals in the stratosphere.

But if you wait longer, ice below -37 °C will turn into ice XI. The transformation process is slow, but ice XI has been found in Antarctic ice that’s 100 to 10,000 years old. This ice is ferroelectric, meaning that it spontaneously becomes electrically polarized, just like a ferromagnet spontaneously magnetizes.

That’s the usual textbook story, anyway. The true story may be even more complicated:

• Simon Hadlington, A question mark over cubic ice’s existence, Phys.org, 9 January 2012.

But now a team led by Benjamin Murray at the University of Leeds has carried out work that suggests that cubic ice may not in fact exist. The researchers searched for cubic ice by suspending water droplets in oil and gradually cooling them to -40 °C while observing the X-ray diffraction pattern of the resulting crystals. “We modelled the diffraction pattern we obtained and compared it to perfect cubic and perfect hexagonal ice, and it was clearly neither of them,” Murray says. “Nor is it merely a mixture of the two. Rather it is something quite distinct.”

Analysis of the diffraction data shows that in the ice crystals the stacking of the atomic layers is disordered. “The crystals that form have randomly stacked layers of cubic and hexagonal sequences,” Murray says. “As each new layer is added, there is a 50% probability of it being either hexagonal or cubic.” The result is a novel, metastable form of ice with a stacking-disordered structure.

Re-examination of what had previously been identified as cubic ice suggests that this was stacking-disordered structures too, Murray says. “Cubic ice may not exist.”

Even plain old ice Ih is surprisingly tricky stuff. The crystal structure is subtle, first worked out by Linus Pauling in 1935. Here’s a nice picture by Steve Lower:

There’s a lot of fun math here. First focus on the oxygen atoms, in red. What sort of pattern do they form?

Close-packings

To warm up for that question, it really helps to start by watching this video:

It’s not flashy, but it helped me understand something that had been bothering me for decades! This is truly beautiful stuff: part of the geometry of nature.

But I’ll explain ice Ih from scratch—and while I’m at it, ice Ic.

First we need to think about packing balls. Say you’re trying to pack equal-sized balls as densely as possible. You put down a layer in a hexagonal way, like the balls labelled ‘a’ here:

Then you put down a second hexagonal layer of balls. You might as well put them in the spots labelled ‘b’. There’s another choice, but the symmetry of the situation means it doesn’t matter which you pick!

The fun starts at the third layer. If you want, you can put these balls in the positions labelled ‘a’, directly over the balls in the first layer. And you can go on like this forever, like this:

-a-b-a-b-a-b-a-b-a-b-a-

This gives you the hexagonal close packing.

But there’s another choice for where to put the balls in the third layer… and now it does matter which choice you pick! The other choice is to put these balls in the spots labelled ‘c’:

And then you can go on forever like this:

-a-b-c-a-b-c-a-b-c-a-b-c-

This gives you the cubic close packing.

There’s also an uncountable infinity of other patterns that all give you equally dense packings. For example:

-a-b-a-c-b-a-c-b-c-b-a-c-

You just can’t repeat the same letter twice in a row!
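Here’s a toy sketch that generates such a random stacking sequence: at each layer there are exactly two allowed letters, and we pick one uniformly at random—which is also one way to read Murray’s “50% probability” remark quoted above.

```python
import random

def random_stacking(length, seed=None):
    """A random close-packing stacking sequence over the layers a, b, c:
    each new layer is one of the two letters differing from the last."""
    rng = random.Random(seed)
    layers = [rng.choice('abc')]
    while len(layers) < length:
        layers.append(rng.choice([x for x in 'abc' if x != layers[-1]]))
    return '-'.join(layers)

seq = random_stacking(12, seed=1)
```

Every sequence this produces is a legal stacking, since no letter ever repeats.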

To review, here’s the difference between the hexagonal and cubic close packings:

But what’s ‘cubic’ about the cubic close packing? Well, look at this. At left we see a bit of a hexagonal close packing, while at right we see a bit of a cubic close packing:

The dashed white circle at right shows what you shouldn’t do for a cubic close packing. But the red lines at right show why it’s called ‘cubic’! If you grab that cube and pull it out, it looks like this:

As you can see, there’s a ball at each corner of a cube, but also one at the middle of each face. That’s why this pattern is also called a face-centered cubic.

If we shrink the balls a bit, the face-centered cubic looks like this:

What used to confuse me so much is that the cubic close packing looks very different from different angles. For example, when you see it like this, the cubical symmetry is hard to spot:

But it’s there! If you stack some balls in the hexagonal close packing, they look different:

And to my eye, they look uglier—they’re less symmetrical.

Anyway, perhaps now we’re in a better position to understand what Benjamin Murray meant when he said:

As each new layer is added, there is a 50% probability of it being either hexagonal or cubic.

I’m guessing that means that at each layer we randomly make either of the two choices I mentioned. I’m not completely sure. My guess makes the most sense if the ice is growing by accretion one layer at a time.

Diamonds

But there’s more to ice than close-packed spheres. If we ignore the hydrogen atoms and focus only on the oxygens, the two kinds of ice we’re talking about—the hexagonal ice Ih and the cubic ice Ic, assuming it exists—are just like two kinds of diamond!

The oxygens in ice Ic are arranged in the same pattern as carbons in an ordinary diamond:

This pattern is called a diamond cubic. We can get it by taking the cubic close packing, making the balls smaller, and then putting in new ones, each at the center of a tetrahedron formed by the old ones. I hope you can see this. There’s a ball at each corner of the cube and one at the center of each face, just as we want for a face-centered cubic. But then there are 4 more!

But what about good old ice Ih? Here the oxygens are arranged in the same pattern as carbons in a hexagonal diamond.

You’ve heard about diamonds, but you might not have heard about hexagonal diamond, also known as lonsdaleite. It forms when graphite is crushed… typically by a meteor impact! It’s also been made in the lab.

Its crystal structure looks like this:

We get this pattern by taking a hexagonal close packing, making the balls smaller, and then putting in new ones, each at the center of a tetrahedron formed by the old ones. Again, I hope you can see this!

Entropy

Now we understand how the oxygens are arranged in good old ice Ih. They’re the red balls here, and the pattern is exactly like lonsdaleite, though viewed from a confusingly different angle:

But what about the hydrogens? They’re very interesting: they’re arranged in a somewhat random way!

Pick any oxygen near the middle of the picture above. It has 4 bonds to other oxygens, shown in green. But only 2 of these bonds have a hydrogen near your oxygen! The other 2 bonds have hydrogens far away, at the other end.

There are many ways for this to happen, and they’re all allowed—so ice is like a jigsaw puzzle that you can put together in lots of ways!

So, even though ice is a crystal, it’s disordered. This gives it entropy. Figuring out how much entropy is a nice math puzzle: it’s about \ln(3/2) times Boltzmann’s constant per water molecule. Why?

Rahul Siddharthan explained this to me over on Google+. Here’s the story.

The oxygen atoms form a bipartite lattice: in other words, they can be divided into two sets, with all the neighbors of an oxygen atom from one set lying in the other set. You can see this if you look.

Focus attention on the oxygen atoms in one set: there are N/2 of them. Each has 4 hydrogen bonds, with two hydrogens close to it and two far away. This means there are

\binom{4}{2} = 6

allowed configurations of hydrogens for this oxygen atom. Thus there are 6^{N/2} configurations that satisfy these N/2 atoms.

But now consider the remaining N/2 oxygen atoms. In general they won’t be ‘satisfied’: they won’t have precisely two hydrogen atoms near them. For each of those, there are

2^4 = 16

possible placements of the hydrogen atoms along their hydrogen bonds, of which 6 are allowed. So, naively, we would expect the total number of configurations to be

6^{N/2} (6/16)^{N/2} = (3/2)^N

Using Boltzmann’s ideas on entropy, we conclude that

S = Nk\ln(3/2)

where k is Boltzmann’s constant. This gives an entropy of 3.37 joules per mole per kelvin, close to the measured value. But this estimate is ‘naive’ because it assumes the 6 out of 16 hydrogen configurations for the oxygen atoms in the second set can be chosen independently, which is false. More sophisticated methods give better approximations to the exact number of configurations, and results even closer to the measured value.
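The naive counting argument above is easy to check numerically. Here’s a small Python sketch; the numbers come straight from the derivation:

```python
from math import comb, log

R = 8.314462618  # gas constant in J/(mol K): Boltzmann's constant per mole

# Each 'satisfied' oxygen picks 2 of its 4 hydrogen bonds to hold a
# nearby hydrogen:
allowed = comb(4, 2)   # 6 allowed configurations
total = 2 ** 4         # 16 possible hydrogen placements

# Naive count of configurations: 6^(N/2) * (6/16)^(N/2) = W^N with W = 3/2
W = (allowed * allowed / total) ** 0.5
print(W)               # 1.5

# Residual molar entropy S = R ln(3/2)
print(round(R * log(W), 2))  # 3.37 J/(mol K)
```

As noted above, this simple estimate already lands close to the measured value.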

By the way: I’ve been trying to avoid unnecessary jargon, but this randomness in ice Ih has such a cool name I can’t resist mentioning it. It’s called proton disorder, since the hydrogens I’ve been talking about are really just hydrogen nuclei, or protons. The electrons, which form the bonds, are smeared all over thanks to the wonders of quantum mechanics.

Higher pressures

If I were going to live forever, I’d definitely enjoy studying and explaining all sixteen known forms of ice. For example, I’d tell you the story of what happens to ordinary ice as we slowly increase the pressure, moving up this diagram until we hit ice XI:

Heck, I’d like to know what happens at every stage as we crush water down to neutronium! Unfortunately I’m mortal, with lots to do. So I recommend these:

• Martin Chaplin, Water phase diagram.

• Norman Anderson, The many phases of ice.

But there’s a bit of news I can’t resist mentioning…

Researchers at Sandia Labs in Albuquerque, New Mexico have been using the Z Machine to study ice. Here it is in operation, shooting out sparks:

Click for an even more impressive image of the whole thing.

How does it work? It fires a very powerful electrical current—about 20 million amps—into an array of thin, parallel tungsten wires. For a very short time, it uses a power of 290 terawatts. That’s 80 times the world’s total electrical power output! The current vaporizes the wires, and they turn into tubes of plasma. At the same time, the current creates a powerful magnetic field. This radially compresses the plasma tubes at speeds of nearly 100,000 kilometers per hour. And this lets the mad scientists who run this machine study materials at extremely high pressures.

Back in 2007, the Z Machine made ice VII, a cubic crystalline form of ice that coexists with liquid water and ice VI at 82 Celsius and about 20,000 atmospheres. Its crystal structure is theorized to look like this:


But now the Z Machine is making far more compressed forms of ice. Above 1 million atmospheres, ice X is stable… and above 8 million atmospheres, a hexagonal ferroelectric form called ice XI comes into play. This has a density over 2.5 times that of ordinary water!

Recent experiments with the Z Machine seem to show water is 30% less compressible than had been thought under conditions like those at the center of Neptune. These experiments seem to match new theoretical calculations, too:

• Experiments may force revision of astrophysical models: ice giant planets have more water volume than believed, Science Daily, 19 March 2012.


While it’s named after the Roman god of the seas, and it’s nice and blue, Neptune’s upper atmosphere is very dry. However, there seems to be water down below, and it has a heart of ice. A mix of water, ammonia and methane ice surrounding a rocky core, to be precise! But the pressure down there is about 8 million atmospheres, so there’s probably ice X and ice XI. And if that ice is less compressible than people thought, it’ll force some changes in our understanding of this planet.

Not that this matters much for most purposes. But it’s cool.

And if you don’t love math, this might be a good place to stop reading.

A little more math

Okay… if you’re still reading, you must want more! And having wasted the day on this post, I might as well explain a bit of the math of the crystal structures I described. I’ve been meaning to do this someday; it might as well be now.

A lattice in n-dimensional Euclidean space is a subset consisting of all integer linear combinations of n linearly independent basis vectors. The centers of the balls in the cubic close packing form a lattice called the face-centered cubic or A3 lattice. It’s most elegant to think of these as the vectors (a,b,c,d) in 4-dimensional space having integer components and lying in the hyperplane

a + b + c + d = 0

That way, you can see this lattice has the group of permutations of 4 letters as symmetries. Not coincidentally, this is the rotational symmetry group of the cube.

In this description, you can take the three basis vectors to be

\displaystyle{ (1,-1,0,0), \quad (0,1,-1,0), \quad (0,0,1,-1) }

Note that they all have the same length and each lies at a 120° angle from the previous one, while the first and last are at right angles.

We can also get the A3 lattice by taking the center of each red cube in a 3d red-and-black checkerboard. That means taking all vectors (a,b,c) in 3-dimensional space with integer coefficients that sum to an even number. In this description, you can take the three basis vectors to be

\displaystyle{ (1,-1,0), \quad (0,1,-1), \quad (-1,-1,0) }

This seems like an ugly choice, but note: they all have the same length and each lies at a 120° angle from the previous one, while the first and last are at right angles. So, we know it’s the A3 lattice!
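If you don’t trust the arithmetic, here’s a quick Python check of the claims about this basis: all three vectors have the same squared length, consecutive ones meet at 120°, and the first and last are at right angles.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def angle_deg(u, v):
    # Angle between u and v in degrees, rounded to one decimal place
    cos = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return round(math.degrees(math.acos(cos)), 1)

# The 'ugly' basis for A3 in its checkerboard description:
b1, b2, b3 = (1, -1, 0), (0, 1, -1), (-1, -1, 0)

print([dot(b, b) for b in (b1, b2, b3)])     # [2, 2, 2]: same length
print(angle_deg(b1, b2), angle_deg(b2, b3))  # 120.0 120.0
print(angle_deg(b1, b3))                     # 90.0
```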

We see more symmetry if we look at all the shortest nonzero vectors in this description of the lattice:

\displaystyle{ (\pm 1, \pm 1, 0), \quad (0, \pm 1, \pm 1), \quad (\pm 1, 0, \pm 1) }

These form the 12 midpoints of the edges of a cube, and thus the corners of a cuboctahedron:
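A brute-force Python enumeration confirms there are exactly 12 shortest nonzero vectors in this description of the lattice:

```python
from itertools import product

# Integer vectors with even coordinate sum and squared length 2:
shortest = [v for v in product((-1, 0, 1), repeat=3)
            if sum(v) % 2 == 0 and sum(x * x for x in v) == 2]

print(len(shortest))  # 12: each ball touches 12 others
```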

So, in the cubic close packing each ball touches 12 others, centered at the vertices of a cuboctahedron, as shown at top here:

Below we see that in the hexagonal close packing each ball also touches twelve others, but centered at the vertices of a mutant cuboctahedron whose top has been twisted relative to its bottom. This shape is called a triangular orthobicupola.

I should talk more about the lattice associated to the hexagonal close packing, but I don’t understand it well enough. Instead I’ll wrap up by explaining the math of the diamond cubic:

The diamond cubic is not a lattice. We can get it by taking the union of two copies of the A3 lattice: the original lattice and a translated copy. For example, we can start with all vectors (a,b,c) with integer coefficients summing to an even number, and then throw in all vectors (a + \frac{1}{2}, b + \frac{1}{2}, c + \frac{1}{2}). This is called the D3+ pattern.
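One way to see the two interpenetrating copies is to list the points of D3+ in a single conventional cubic cell. In the even-sum description the conventional cube has side 2, and a short Python sketch counts 4 points per cell for the face-centered cubic and 8 for the diamond cubic:

```python
from itertools import product

# D3 lattice points in one conventional cell [0,2)^3: integer vectors
# whose coordinates sum to an even number.
fcc = [v for v in product(range(2), repeat=3) if sum(v) % 2 == 0]

# D3+ = D3 together with a copy shifted by (1/2, 1/2, 1/2).
shifted = [tuple(x + 0.5 for x in v) for v in fcc]
diamond = fcc + shifted

print(len(fcc))      # 4 points per cell: face-centered cubic
print(len(diamond))  # 8 points per cell: diamond cubic
```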

The A3 lattice gives a dense-as-possible packing of balls, the cubic close packing, with density

\displaystyle{ \frac{\pi}{3 \sqrt{2}} \sim 0.74 }

The D3+ pattern, on the other hand, gives a way to pack spheres with a density of merely

\displaystyle{ \frac{\pi \sqrt{3}} {16} \sim 0.34 }

This is why ordinary ice actually becomes denser when it melts. It’s not packed in the diamond cubic pattern: that would be ice Ic. Ordinary ice is packed in a similar pattern built starting from the hexagonal close packing! But the hexagonal close packing looks just like the cubic close packing if you only look at two layers… so this similar pattern gives a packing of balls with the same density as the diamond cubic.
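As a sanity check, here are the two densities computed in Python, with the diamond value re-derived from the conventional cell (8 balls per cell of side a, nearest-neighbor distance a√3/4, so ball radius a√3/8):

```python
from math import pi, sqrt

print(round(pi / (3 * sqrt(2)), 4))  # 0.7405: cubic close packing
print(round(pi * sqrt(3) / 16, 4))   # 0.3401: diamond cubic

# Re-derive the diamond density from the conventional cell, taking a = 1:
r = sqrt(3) / 8
print(round(8 * (4 / 3) * pi * r**3, 4))  # 0.3401 again
```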

Finally, for a tiny taste of some more abstract math: in any dimension you can define a checkerboard lattice called Dn, consisting of all n-tuples of integers that sum to an even integer. Then you can define a set called Dn+ by taking the union of two copies of the Dn lattice: the original and another shifted by the vector (\frac{1}{2}, \dots, \frac{1}{2}).

Dn+ is only a lattice when the dimension n is even! When the dimension is a multiple of 4, it’s an integral lattice, meaning that the dot product of any two vectors in the lattice is an integer. It’s also unimodular, meaning that the volume of the unit cell is 1. And when the dimension is a multiple of 8, it’s also even, meaning that the dot product of any vector with itself is even.
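These divisibility conditions all come down to the ‘glue’ vector (1/2, …, 1/2): its dot product with itself is n/4, while its dot product with any Dn vector is half an even integer, hence automatically an integer. A short Python check with exact rational arithmetic:

```python
from fractions import Fraction

for n in (2, 4, 8):
    g = [Fraction(1, 2)] * n           # the glue vector (1/2, ..., 1/2)
    norm = sum(x * x for x in g)       # equals n/4
    integral = norm.denominator == 1   # integer iff n is a multiple of 4
    even = integral and norm % 2 == 0  # even iff n is a multiple of 8
    print(n, norm, integral, even)
# 2 1/2 False False
# 4 1 True False
# 8 2 True True
```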

Now, D8+ is very famous: it’s the only even unimodular lattice in 8 dimensions, and it’s usually called E8. In week193, I showed you that in the packing of balls based on this lattice, each ball touches 240 others. It’s extremely beautiful.

But these days I’m trying to stay more focused on the so-called ‘real world’ of chemistry, biology, ecology and the like. This has a beauty of its own. In 3 dimensions, D3 = A3 is the face-centered cubic lattice, and D3+ is the diamond cubic. So, in a way, diamonds come as close to E8 as possible in our 3-dimensional world.

The diamond cubic may seem mathematically less thrilling than E8: it’s not even a lattice. Ordinary ice is even more messy. But it has a different charm—the charm of lying at the borderline of simplicity and complexity—and the advantage of manifesting itself in the universe we easily see around us, with a rich network of relationships to everything else we see.

