Okay, now let’s dig deeper into the proof of the deficiency zero theorem. We’re only going to prove a baby version, at first. Later we can enhance it:

**Deficiency Zero Theorem (Baby Version).** Suppose we have a weakly reversible reaction network with deficiency zero. Then for any choice of rate constants there exists an equilibrium solution of the rate equation where all species are present in nonzero amounts.

The first step is to write down the rate equation in a new, more conceptual way. It’s incredibly cool. You’ve probably seen Schrödinger’s equation, which describes the motion of a quantum particle:

$$ \frac{d}{dt} \psi = -i H \psi $$

If you’ve been following this series, you’ve probably also seen the master equation, which describes the motion of a ‘stochastic’ particle:

$$ \frac{d}{dt} \psi = H \psi $$

A ‘stochastic’ particle is one that’s carrying out a random walk, and now $\psi$ describes its probability of being somewhere, instead of its amplitude.

Today we’ll see that the rate equation for a reaction network looks somewhat similar:

$$ \frac{d}{dt} x = Y H x^Y $$

where $Y H$ is some matrix, and $x^Y$ is defined using a new thing called ‘matrix exponentiation’, which makes the equation *nonlinear!*

If you’re reading this you probably know how to multiply a vector by a matrix. But if you’re like me, you’ve never seen anyone take a vector $x$ and raise it to the power of some matrix $Y$! I’ll explain it, don’t worry… right now I’m just trying to get you intrigued. It’s not complicated, but it’s exciting how this unusual operation shows up naturally in chemistry. That’s just what I’m looking for these days: new math ideas that show up in practical subjects like chemistry, and new ways that math can help people in these subjects.

Since we’re looking for an *equilibrium* solution of the rate equation, we actually want to solve

$$ \frac{dx}{dt} = 0 $$

or in other words

$$ Y H x^Y = 0 $$

In fact we will do better: we will find a solution of

$$ H x^Y = 0 $$

And we’ll do this in two stages:

• First we’ll find all solutions of

$$ H \psi = 0 $$

This equation is *linear*, so it’s easy to understand.

• Then, among these solutions we’ll find one that also obeys

$$ \psi = x^Y $$

This is a *nonlinear* problem involving matrix exponentiation, but still, we can do it, using a clever trick called ‘logarithms’.

Putting the pieces together, we get our solution of

$$ H x^Y = 0 $$

and thus our equilibrium solution of the rate equation:

$$ \frac{dx}{dt} = Y H x^Y = 0 $$

That’s a rough outline of the plan. But now let’s get started, because the details are actually fascinating. Today I’ll just show you how to rewrite the rate equation in this new way.

### The rate equation

Remember how the rate equation goes. We start with a **stochastic reaction network**, meaning a little diagram like this:

This contains quite a bit of information:

• a finite set $T$ of **transitions**,

• a finite set $K$ of **complexes**,

• a finite set $S$ of **species**,

• a map $r : T \to (0,\infty)$ giving a **rate constant** for each transition,

• **source** and **target** maps $s, t : T \to K$ saying where each transition starts and ends,

• a one-to-one map $Y : K \to \mathbb{N}^S$ saying how each complex is made of species.

Given all this, the rate equation says how the amount of each species changes with time. We describe these amounts with a time-dependent vector $x \in [0,\infty)^S$. So, we want a differential equation filling in the question marks here:

$$ \frac{dx}{dt} = \; ??? $$

Now last time, we started by thinking of $K$ as a subset of $\mathbb{N}^S$, and thus of the vector space $\mathbb{R}^S$. Back then, we wrote the rate equation as follows:

$$ \frac{dx}{dt} = \sum_{\tau \in T} r(\tau) \, \big( t(\tau) - s(\tau) \big) \, x^{s(\tau)} $$

where vector exponentiation is defined by

$$ x^y = x_1^{y_1} \cdots x_k^{y_k} $$

when $x$ and $y$ are vectors in $\mathbb{R}^S$.
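If you like to compute, vector exponentiation is a one-liner. Here’s a minimal Python sketch (the name `vec_pow` is made up, just for illustration):

```python
# Vector exponentiation as defined above: x^y = x_1^{y_1} * ... * x_k^{y_k}.
# A minimal sketch; `vec_pow` is a made-up name, not from any library.
def vec_pow(x, y):
    """Raise the vector x to the vector power y, returning a number."""
    result = 1.0
    for xi, yi in zip(x, y):
        result *= xi ** yi
    return result

# With x = (2, 3) and y = (1, 2) we get 2^1 * 3^2 = 18:
print(vec_pow([2.0, 3.0], [1, 2]))  # → 18.0
```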

However, we’ve now switched to thinking of our set $K$ of complexes as a set in its own right, which is mapped into $\mathbb{N}^S$ by $Y$. This is good for lots of reasons, like defining the concept of ‘deficiency’, which we did last time. But it means the rate equation above doesn’t quite parse anymore! Things like $s(\tau)$ and $t(\tau)$ live in $\mathbb{R}^K$; we need to explicitly convert them into elements of $\mathbb{R}^S$ using $Y$ for our equation to make sense!

So now we have to write the **rate equation** like this:

$$ \frac{dx}{dt} = \sum_{\tau \in T} r(\tau) \, \big( Y t(\tau) - Y s(\tau) \big) \, x^{Y s(\tau)} $$

This looks more ugly, but if you’ve got even one mathematical bone in your body, you can already see vague glimmerings of how we’ll rewrite this the way we want:

$$ \frac{dx}{dt} = Y H x^Y $$
Here’s how.

First, we extend our maps $s$, $t$ and $Y$ to linear maps between vector spaces:

$$ s, t : \mathbb{R}^T \to \mathbb{R}^K, \qquad Y : \mathbb{R}^K \to \mathbb{R}^S $$

Then, we put an inner product on the vector spaces $\mathbb{R}^T$, $\mathbb{R}^K$ and $\mathbb{R}^S$. For $\mathbb{R}^K$ we do this in the most obvious way, by letting the complexes be an orthonormal basis. So, given two complexes $\kappa, \kappa' \in K$, we define their inner product by

$$ \langle \kappa, \kappa' \rangle = \delta_{\kappa \kappa'} $$

We do the same for $\mathbb{R}^S$. But for $\mathbb{R}^T$ we define the inner product in a more clever way involving the rate constants. If $\tau, \tau' \in T$ are two transitions, we define their inner product by:

$$ \langle \tau, \tau' \rangle = \frac{1}{r(\tau)} \, \delta_{\tau \tau'} $$
This will seem perfectly natural when we continue our study of circuits made of electrical resistors, and if you’re very clever you can already see it lurking in Part 16. But never mind.

Having put inner products on these three vector spaces, we can take the adjoints of the linear maps between them, to get linear maps going back the other way:

$$ s^\dagger, t^\dagger : \mathbb{R}^K \to \mathbb{R}^T, \qquad Y^\dagger : \mathbb{R}^S \to \mathbb{R}^K $$

These are defined in the usual way, though we’re using daggers here the way physicists do, where mathematicians would prefer to see stars! For example, $s^\dagger$ is defined by the relation

$$ \langle s^\dagger \phi, \psi \rangle = \langle \phi, s \psi \rangle $$

and so on.
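To see what this adjoint looks like concretely, here’s a small numerical sketch in Python with a made-up network of two complexes and two parallel transitions. In matrix terms, $s$ is a 0/1 matrix, and with respect to the inner products above its adjoint works out to $s^\dagger = \mathrm{diag}(r)\, s^\top$, which we can check against the defining relation directly:

```python
# Checking the adjoint relation <s†φ, ψ>_T = <φ, sψ>_K for a made-up network
# with two complexes and two parallel transitions τ1, τ2 : κ1 → κ2.
# ℝ^T carries the weighted inner product <τ, τ'> = δ_{ττ'}/r(τ); with respect
# to it, the adjoint of the 0/1 source matrix s is s† = diag(r) sᵀ.
r = [2.0, 5.0]          # rate constants r(τ1), r(τ2)
s = [[1, 1],            # s[κ][τ] = 1 if transition τ has source κ
     [0, 0]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def s_dagger(phi):      # s†φ = diag(r) sᵀ φ
    st = [[s[j][i] for j in range(2)] for i in range(2)]  # transpose of s
    return [r[i] * x for i, x in enumerate(mat_vec(st, phi))]

def inner_K(a, b):      # the complexes form an orthonormal basis
    return sum(ai * bi for ai, bi in zip(a, b))

def inner_T(a, b):      # <τ, τ'> = δ_{ττ'}/r(τ)
    return sum(ai * bi / ri for ai, bi, ri in zip(a, b, r))

phi, psi = [1.0, -3.0], [0.5, 2.0]   # arbitrary vectors in ℝ^K and ℝ^T
print(inner_T(s_dagger(phi), psi))   # → 2.5
print(inner_K(phi, mat_vec(s, psi))) # → 2.5
```

The factor of $1/r(\tau)$ in the inner product on $\mathbb{R}^T$ is exactly what makes the visible factor of $r(\tau)$ appear in $s^\dagger$.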

Next, we set up a random walk on the set of complexes. Remember, our reaction network is a graph with complexes as vertices and transitions as edges, like this:

Each transition $\tau$ has a number attached to it: the rate constant $r(\tau)$. So, we can randomly hop from complex to complex along these transitions, with probabilities per unit time described by these numbers. The probability of being at some particular complex will then be described by a function

$$ \psi : K \to [0,\infty) $$

which also depends on time, and changes according to the equation

$$ \frac{d\psi}{dt} = H \psi $$

for some **Hamiltonian**

$$ H : \mathbb{R}^K \to \mathbb{R}^K $$

I defined this Hamiltonian back in Part 15, but now I see a slicker way to write it:

$$ H = (t - s) \, s^\dagger $$

I’ll justify this next time. For now, the main point is that with this Hamiltonian, the rate equation is equivalent to this:

$$ \frac{dx}{dt} = Y H x^Y $$
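By the way, this Hamiltonian is easy to play with numerically. Here’s a minimal Python sketch with a made-up reversible pair of transitions, using the fact (derivable from the inner products above) that $s^\dagger = \mathrm{diag}(r)\, s^\top$ as a matrix; it checks that $H = (t-s)s^\dagger$ comes out infinitesimal stochastic, with nonnegative off-diagonal entries and columns summing to zero, as the Hamiltonian of a Markov process should be:

```python
# Building H = (t - s)s† for a made-up network with complexes κ1, κ2 and
# transitions τ1 : κ1 → κ2, τ2 : κ2 → κ1, with rate constants r = (2, 5).
# With the inner products above, s† = diag(r) sᵀ as a matrix.
r = [2.0, 5.0]
s = [[1, 0],            # s[κ][τ] = 1 if transition τ has source κ
     [0, 1]]
t = [[0, 1],            # t[κ][τ] = 1 if transition τ has target κ
     [1, 0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

s_dag = [[r[i] * s[j][i] for j in range(2)] for i in range(2)]       # diag(r) sᵀ
t_minus_s = [[t[i][j] - s[i][j] for j in range(2)] for i in range(2)]
H = mat_mul(t_minus_s, s_dag)

print(H)                              # → [[-2.0, 5.0], [2.0, -5.0]]
print([sum(col) for col in zip(*H)])  # each column sums to zero → [0.0, 0.0]
```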
The only thing I haven’t defined yet is the funny exponential $x^Y$. That’s what makes the equation nonlinear. We’re taking a *vector* to the power of a *matrix* and getting a *vector*. This sounds weird, but it actually makes sense!

It only makes sense because we have chosen bases for our vector spaces. To understand it, let’s number our species $1, \dots, k$ as we’ve been doing all along, and number our complexes $1, \dots, l$. Our linear map $Y : \mathbb{R}^K \to \mathbb{R}^S$ then becomes a $k \times l$ matrix of natural numbers. Its entries say how many times each species shows up in each complex:

$$ Y = ( Y_{ij} ) $$

The entry $Y_{ij}$ says how many times the $i$th species shows up in the $j$th complex.

Now, let’s be a bit daring and think of the vector $x \in \mathbb{R}^S$ as a row vector with $k$ entries:

$$ x = (x_1, \dots, x_k) $$

Then we can multiply $x$ on the *right* by the matrix $Y$ and get a vector $xY$ in $\mathbb{R}^K$:

$$ xY = \Big( \sum_{i=1}^k x_i Y_{i1}, \; \dots, \; \sum_{i=1}^k x_i Y_{il} \Big) $$

So far, no big deal. But now you’re ready to see the definition of $x^Y$, which is very similar:

$$ x^Y = \Big( \prod_{i=1}^k x_i^{Y_{i1}}, \; \dots, \; \prod_{i=1}^k x_i^{Y_{il}} \Big) $$
It’s exactly the same, but with *multiplication* replacing *addition*, and *exponentiation* replacing *multiplication!* Apparently my class on matrices stopped too soon: we learned about matrix multiplication, but matrix exponentiation is also worthwhile.
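Here’s the parallel between the two operations as a Python sketch, on a made-up example where complex 1 is $A + B$ and complex 2 is $2B$ (the function names are just for illustration):

```python
# (xY)_j sums the products x_i * Y_ij, while (x^Y)_j multiplies the powers
# x_i ** Y_ij: the same pattern, with + replaced by * and * replaced by **.
def right_mul(x, Y):
    """Multiply the row vector x on the right by the matrix Y."""
    return [sum(x[i] * Y[i][j] for i in range(len(x)))
            for j in range(len(Y[0]))]

def mat_pow(x, Y):
    """Raise the row vector x to the matrix power Y."""
    result = []
    for j in range(len(Y[0])):
        prod = 1.0
        for i in range(len(x)):
            prod *= x[i] ** Y[i][j]
        result.append(prod)
    return result

# Made-up example: complex 1 = A + B, complex 2 = 2B, with x = (x_A, x_B) = (3, 2).
Y = [[1, 0],    # species A: once in complex 1, absent from complex 2
     [1, 2]]    # species B: once in complex 1, twice in complex 2
x = [3.0, 2.0]
print(right_mul(x, Y))  # → [5.0, 4.0]
print(mat_pow(x, Y))    # → [6.0, 4.0]  (3·2 ways to form A + B, 2² to form 2B)
```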

What’s the point of it? Well, suppose you have a certain number of hydrogen molecules, a certain number of oxygen molecules, a certain number of water molecules, and so on: a certain number of things of each species. You can list these numbers and get a vector $x \in \mathbb{N}^S$. Then the components of $x^Y$ describe how many ways you can build up each complex from the things you have. For example,

$$ (x^Y)_1 = x_1^{Y_{11}} \, x_2^{Y_{21}} \cdots x_k^{Y_{k1}} $$

says roughly how many ways you can build complex 1 by picking $Y_{11}$ things of species 1, $Y_{21}$ things of species 2, and so on.

Why ‘roughly’? Well, we’re pretending we can pick the same thing twice. So if we have 4 water molecules and we need to pick 3, this formula gives $4^3 = 64$. The right answer is $4 \cdot 3 \cdot 2 = 24$. To get this answer we’d need to use the ‘falling power’ $4^{\underline{3}} = 4 \cdot 3 \cdot 2$, as explained in Part 4. But the rate equation describes chemistry in the limit where we have lots of things of each species. In this limit, the ordinary power becomes a good approximation.
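The counting claim is easy to check numerically, and the same sketch shows the ratio of falling power to ordinary power approaching 1 in the large-number limit:

```python
# Ordinary power vs falling power: picking 3 of 4 water molecules *with*
# repetition gives 4^3 = 64 ordered choices; without repetition the falling
# power 4·3·2 = 24 is the right count.
def falling_power(n, k):
    result = 1
    for i in range(k):
        result *= n - i
    return result

print(4 ** 3, falling_power(4, 3))          # → 64 24

# As n grows with k fixed, the ratio tends to 1: the 'large numbers' limit
# where the rate equation becomes a good approximation.
print(falling_power(1000, 3) / 1000 ** 3)   # → 0.997002
```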

**Puzzle.** In this post we’ve seen a vector raised to a matrix power, which is a vector, and also a vector raised to a vector power, which is a number. How are they related?

There’s more to say about this, which I’d be glad to explain if you’re interested. But let’s get to the punchline:

**Theorem.** The rate equation:

$$ \frac{dx}{dt} = \sum_{\tau \in T} r(\tau) \, \big( Y t(\tau) - Y s(\tau) \big) \, x^{Y s(\tau)} $$

is equivalent to this equation:

$$ \frac{dx}{dt} = Y (t - s) \, s^\dagger x^Y $$

or in other words:

$$ \frac{dx}{dt} = Y H x^Y $$

*Proof.* It’s enough to show

$$ (t - s) \, s^\dagger x^Y = \sum_{\tau \in T} r(\tau) \, \big( t(\tau) - s(\tau) \big) \, x^{Y s(\tau)} $$

since applying $Y$ to both sides then turns the right side into the right side of the rate equation. So, we’ll compute $Y (t - s) \, s^\dagger x^Y$ and think about the meaning of each quantity we get *en route*.

We start with $x$. This is a list of numbers saying how many things of each species we have: our raw ingredients, as it were. Then we compute

$$ x^Y $$

This is a vector in $\mathbb{R}^K$. It’s a list of numbers saying how many ways we can build each complex starting from our raw ingredients.

Alternatively, we can write this vector as a sum over basis vectors:

$$ x^Y = \sum_{\kappa \in K} x^{Y \kappa} \, \kappa $$

Next let’s apply $s^\dagger$ to this. We claim that

$$ s^\dagger \kappa = \sum_{\{\tau \, : \, s(\tau) = \kappa\}} r(\tau) \, \tau $$

In other words, we claim $s^\dagger \kappa$ is the sum of all the transitions having $\kappa$ as their source, *weighted by their rate constants!* To prove this claim, it’s enough to take the inner product of each side with any transition $\tau'$ and check that we get the same answer. For the left side we get

$$ \langle s^\dagger \kappa, \tau' \rangle = \langle \kappa, s \tau' \rangle = \delta_{\kappa, \, s(\tau')} $$

To compute the right side, we need to use the cleverly chosen inner product on $\mathbb{R}^T$. Here we get

$$ \Big\langle \sum_{\{\tau \, : \, s(\tau) = \kappa\}} r(\tau) \, \tau, \; \tau' \Big\rangle \; = \sum_{\{\tau \, : \, s(\tau) = \kappa\}} \delta_{\tau \tau'} \; = \; \delta_{\kappa, \, s(\tau')} $$

In the first step here, the factor of $1/r(\tau)$ in the cleverly chosen inner product canceled the visible factor of $r(\tau)$. For the second step, you just need to think for half a minute, or ten, depending on how much coffee you’ve had.

Either way, we conclude that indeed

$$ s^\dagger \kappa = \sum_{\{\tau \, : \, s(\tau) = \kappa\}} r(\tau) \, \tau $$

Next let’s combine this with our formula for $x^Y$:

$$ x^Y = \sum_{\kappa \in K} x^{Y \kappa} \, \kappa $$

We get this:

$$ s^\dagger x^Y = \sum_{\tau \in T} r(\tau) \, x^{Y s(\tau)} \, \tau $$

In other words, $s^\dagger x^Y$ is a linear combination of transitions, where each one is weighted both by *the rate at which it happens* and *how many ways it can happen* starting with our raw ingredients.

Our goal is to compute $Y (t - s) \, s^\dagger x^Y$. We’re almost there. Remember, $s$ says which complex is the input of a given transition, and $t$ says which complex is the output. So, $(t - s) \, s^\dagger x^Y$ says the total rate at which complexes are created and/or destroyed starting with the species in $x$ as our raw ingredients.

That sounds good. But let’s just pedantically check that everything works. Applying $t - s$ to both sides of our last equation, we get

$$ (t - s) \, s^\dagger x^Y = \sum_{\tau \in T} r(\tau) \, x^{Y s(\tau)} \, \big( t(\tau) - s(\tau) \big) $$

Remember, our goal was to prove that this equals

$$ \sum_{\tau \in T} r(\tau) \, \big( t(\tau) - s(\tau) \big) \, x^{Y s(\tau)} $$
But if you stare at these a while and think, you’ll see they’re equal. █
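If you’d like to see the theorem in action, here’s an end-to-end numerical check in Python on a made-up network: species $A$ and $B$, complexes $\kappa_1 = A + B$ and $\kappa_2 = 2B$, and transitions $\tau_1 : \kappa_1 \to \kappa_2$, $\tau_2 : \kappa_2 \to \kappa_1$ with rate constants 2 and 5. We compute $dx/dt$ both from the summed form of the rate equation and as $Y (t - s) s^\dagger x^Y$, using $s^\dagger = \mathrm{diag}(r)\, s^\top$:

```python
# End-to-end check that the summed rate equation equals Y (t - s) s† x^Y
# on a made-up network: species A, B; complexes κ1 = A + B, κ2 = 2B;
# transitions τ1 : κ1 → κ2 and τ2 : κ2 → κ1 with rate constants r = (2, 5).
r = [2.0, 5.0]
s = [[1, 0], [0, 1]]     # s[κ][τ]: source complex of each transition
t = [[0, 1], [1, 0]]     # t[κ][τ]: target complex of each transition
Y = [[1, 0],             # Y[species][complex]
     [1, 2]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def col(M, j):
    return [M[i][j] for i in range(len(M))]

def vec_pow(x, y):                       # x^y = x_1^{y_1} * ... * x_k^{y_k}
    p = 1.0
    for xi, yi in zip(x, y):
        p *= xi ** yi
    return p

def rate_eq_sum(x):                      # sum-over-transitions form
    dx = [0.0, 0.0]
    for tau in range(2):
        Ys = mat_vec(Y, col(s, tau))     # Y s(τ)
        Yt = mat_vec(Y, col(t, tau))     # Y t(τ)
        w = r[tau] * vec_pow(x, Ys)      # r(τ) x^{Y s(τ)}
        for i in range(2):
            dx[i] += w * (Yt[i] - Ys[i])
    return dx

def rate_eq_YH(x):                       # Y (t - s) s† x^Y form
    s_dag = [[r[i] * s[j][i] for j in range(2)] for i in range(2)]  # diag(r) sᵀ
    xY = [vec_pow(x, col(Y, j)) for j in range(2)]                  # x^Y
    v = mat_vec(s_dag, xY)                                          # s† x^Y
    Hv = [a - b for a, b in zip(mat_vec(t, v), mat_vec(s, v))]      # (t - s) s† x^Y
    return mat_vec(Y, Hv)

x = [3.0, 2.0]
print(rate_eq_sum(x))   # → [8.0, -8.0]
print(rate_eq_YH(x))    # → [8.0, -8.0]
```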

It took me a couple of weeks to really understand this, so I’ll be happy if it takes you just a few days. It seems peculiar at first but ultimately it all makes sense. The interesting subtlety is that we use the linear map called ‘multiplying by $Y$’:

$$ \psi \in \mathbb{R}^K \; \mapsto \; Y \psi \in \mathbb{R}^S $$

to take a bunch of complexes and work out the species they contain, while we use the *nonlinear* map called ‘raising to the $Y$th power’:

$$ x \in \mathbb{R}^S \; \mapsto \; x^Y \in \mathbb{R}^K $$

to take a bunch of species and work out how many ways we can build each complex from them. There is much more to say about this: for example, these maps arise from a pair of what category theorists call ‘adjoint functors’. But I’m worn out and you probably are too, if you’re still here at all.

### References

I found this thesis to be the most helpful reference when I was trying to understand the proof of the deficiency zero theorem:

• Jonathan M. Guberman, *Mass Action Reaction Networks and the Deficiency Zero Theorem*, B.A. thesis, Department of Mathematics, Harvard University, 2003.

I urge you to check it out. In particular, Section 3 and Appendix A discuss matrix exponentiation. Has anyone discussed this before?

Here’s another good modern treatment of the deficiency zero theorem:

• Jeremy Gunawardena, Chemical reaction network theory for *in silico* biologists, 2003.

The theorem was first proved here:

• Martin Feinberg, Chemical reaction network structure and the stability of complex isothermal reactors: I. The deficiency zero and deficiency one theorems, *Chemical Engineering Science* **42** (1987), 2229-2268.

However, Feinberg’s treatment here relies heavily on this paper:

• F. Horn and R. Jackson, General mass action kinetics, *Archive for Rational Mechanics and Analysis* **47** (1972), 81-116.

(Does anyone work on ‘irrational mechanics’?) These lectures are also very helpful:

• Martin Feinberg, Lectures on reaction networks, 1979.

If you’ve seen other proofs, let us know.

As to the puzzle, the vector raised to a matrix power appears to be a vector of vectors raised to vector powers, namely, the base vector raised to each column of the matrix in turn. Is that all you’re getting at, or is there something deeper?

That’s good for starters, but there’s something else to say, which is even ‘better’.

Still, you get the prize for being the only person in the world to have read this post. I found this post the most exciting one to write so far, but that’s always dangerous. In school, I’ve noticed that when the teacher gets interested in what they’re saying, the class gets lost or bored. So my usual rule for teaching is: don’t focus on what you’re saying, focus on how you’re saying it and how people react. But I broke that rule here because I’d just understood a bunch of new stuff.

Okay, well, I’ll take another stab at it then. If you think of vectors as column vectors (i.e., as $n \times 1$ matrices) then the two definitions coincide (assuming you identify $1 \times 1$ matrices with numbers), so that raising a vector to a vector power is a special case of raising a vector to a matrix power. Am I getting warmer or colder?

And I enjoyed this post very much. I just need to chew on it a while longer. Hopefully I’ll be able to find the time.

Speaking of which, I have a (possibly silly, ill-posed, not well-thought-out) question. I haven’t been able to keep up with this series as closely as I’d like, so it may have been discussed in one of the earlier posts, but here goes. If I think of the stochastic reaction network as a stochastic Petri net, is the adjoint $Y^\dagger$ of the map $Y$ introduced here related to some kind of Fockification (maybe something akin to the functor $\Gamma$ in Reed and Simon that I vaguely recall from another life) that gives rise to the appropriate powers of raising and lowering operators we saw so much of in the earlier posts? Hopefully that makes some kind of sense… If not, I’ll think on it some more and try to ask a better question.

On further reflection, that question is overly convoluted. So, let me try a simpler version: Where are the raising and lowering operators hiding in this formalism? Is it another map that isn’t present here or do they arise from one of the maps you’ve introduced (or perhaps a composition of some of the maps you’ve introduced)?

I’ll make the trivial observation that raising a vector to a vector power can be regarded as taking the “projection”, along that basis direction, of raising the vector to the power of a matrix having that vector as one of its columns. So in particular, one can imagine a “spectrum” of exponentiations ranging from one along a single basis dimension (i.e., one resulting complex), through subsets of the resulting complexes (using a sub-matrix formed from some of the columns), ending up at the exponential using the full matrix (i.e., accounting for all the complexes generatable).

At this point, my mind reaches for the standard tools for understanding linear algebra, namely the singular value decomposition. This describes how any linear transformation given by a matrix can be decomposed into a sum of its actions on an orthogonal basis. If we were to look for some similar decomposition of the matrix that would still tell us something about the result of the nonlinear vector exponential…?

Continuing the thought in my (currently awaiting approval) comment: if we get an SVD-like decomposition where we’ve absorbed the diagonal term of the SVD into the s (say), then the contribution to the first component is , the contribution to the second component is , etc. So “having computed the s”, the th component of is . Not sure if this actually helps understand possible behaviours much better though…

Dan wrote:

You’re hot!

That’s the answer I was fishing for: when you raise a vector to a vector power and get a number, it’s a special case of raising a $1 \times n$ matrix to the power of an $n \times m$ matrix and getting a $1 \times m$ matrix. Just take $m = 1$.

It’s just like how when you take the dot product of two vectors and get a number, you can see it as a special case of matrix multiplication. Think of the first vector as a row vector: a $1 \times n$ matrix. Think of the second vector as a column vector: an $n \times 1$ matrix. Multiply these matrices and presto, you get a $1 \times 1$ matrix, or in other words a number.

Of course later, when we get sophisticated, we like to think of column vectors and row vectors as living in dual vector spaces, not the same vector space, and then this operation is called, not the ‘dot product’, but the pairing between a vector and a dual vector.

But never mind: all that sophisticated stuff is designed to ultimately make things basis-independent, but matrix exponentiation definitely *does* depend on a choice of basis. That’s okay, because our vectors here are really functions on a finite set: the set of complexes, say, or the set of species. So, we have god-given bases, and we get to use them.

Dan wrote:

If you watched the earlier episodes of this show, you know that raising and lowering operators, and Fock space, are extremely important in studying the master equation for a stochastic reaction network (aka stochastic Petri net).

But now we’re talking about the rate equation. This is like the ‘classical limit’ of the master equation, or more precisely its ‘large number limit’.

The master equation is all about the probability that we have some number of things of each species. These numbers are natural numbers, so they jump discretely… but we only know the *probability* that these numbers are this or that, and these probabilities vary smoothly with the passage of time.

The rate equation is all about the number of things of each species. Now we act like we know these numbers for sure, and that they’re positive real numbers, and that they vary smoothly with the passage of time.

That’s all fun… and we can rigorously derive the rate equation from the master equation in a certain limit where there are large numbers of things.

But in Part 23 we’ll see an amazing extra twist! In our quest to prove the deficiency zero theorem, we’ll introduce *another* master equation, which describes a Markov process on the set of complexes (which are bunches of things of our various species). The dynamics of this Markov process are in general different from those of the rate equation… but we’ll get equilibrium solutions of the rate equation from equilibria of this Markov process!

Operators like $s^\dagger$ and $t^\dagger$ play a huge role in constructing the Hamiltonian for this Markov process on complexes. But they are not, as far as I can tell, directly related to operators on the Fock space we saw earlier.

But of course, this is the sort of thing I’m still busy trying to understand.

By the way, any Markov process gives rise to a stochastic Petri net of a special sort, where each transition has just one input and one output. The space of states for the stochastic Petri net is the ‘Fock space’ of that for the Markov process. In other words, the space of states for the Markov process is the ‘single-particle space’ of the space of states for the stochastic Petri net. And here’s what you’ll like: time evolution for the Petri net is obtained by applying Reed and Simon’s $\Gamma$, or at least its stochastic analogue, to time evolution for the Markov process!

I’ve been meaning to talk about this for a long time. But this is *not* one of the games I’m playing right now.

Thank you for the detailed reply to a very vague question. It hit exactly on what was confusing me, namely, that the $H$ you introduced above is (in retrospect, obviously) different from the Hamiltonian that gives rise to the master equation that the rate equation is a large number approximation for.

But of course, the danger of a detailed reply is that it often raises other questions that can get you off topic. So, rather than ask an off topic question, I’ll just put in a request: I’d love to see the rigorous proof showing the rate equation as the limit of the Master equation. Defining , using some notation from a ways back, it’s straightforward algebra to show that the Master equation implies something very much like the rate equation, only with falling powers in place of ordinary powers. And intuitively, for large N those are the same. But from there, I’m not sure what sort of convergence to try and whether I need to make additional assumptions, like possibly a condition on the variance of ?

Dan wrote:

Yes, that’s something that’s missing in these posts, which really should be included. I haven’t actually written up such a proof, and I haven’t seen one anywhere though it seems like an obvious thing for people to have studied. You sketched the basic idea quite nicely—and based on that idea, my ‘analysis brain’ feels completely confident it can jump in and supply all the necessary epsilons and deltas, and hypotheses, that will lead to a nice theorem. Getting the strongest possible theorem would be hard! Getting a theorem that illustrates the idea should not be hard. But I won’t try to do it now… that’s more like a blog post on its own, and one that will take about a week to write.

A crucial point is that the solution of the master equation will only be close to that of the rate equation if the ‘spread’ of the probability distributions of the numbers of particles of each species is sufficiently small, as well as their means being large. However, as time passes, the spread will tend to grow. So, the rate equation will only be a good approximation to the master equation for a certain amount of time, not forever… except in certain very special cases.

For example, there are stochastic reaction networks whose rate equations have periodic solutions which never approach any sort of equilibrium. However, the solutions of the corresponding master equation will probably ‘smear out’ and approach an equilibrium, at least in many cases.

You certainly need something like that, but probably more. I did a bunch of calculations at one point and it seems I needed not just bounds on the variance but also the higher moments. Basically the trick is just to calculate the difference between what you want and what you’ve got, and see what’s required to make that be small!

I read it! I just don’t have anything all that interesting to say.

(I’m trying to arrange things so that I’ll have time to work on the spatial version of stochastic mechanics later in the fall.)

I read it too — But not carefully enough to understand what it is that I don’t understand.

Okay. If you have any questions, don’t be shy. The series is building up to a climax: in Part 24 I’ll prove the deficiency zero theorem, or at least the part of it that I want to prove. But as usual in math, this climax involves putting together lots of ideas that were discussed earlier… and as usual in math, most people will be left scratching their heads and wondering what just happened. At least in a blog, unlike a textbook, one can ask questions or complain.

Puzzle: if I see things correctly, as you say, a vector raised to a vector power gives you a number, which is similar to a dot product, while a vector raised to a matrix power gives you a vector, which is similar to a vector (external) product, since the dimensionality can be whatever you want. But I think there is something more going on here.

In fact, let’s assume that you choose a square matrix, so that the initial vector and the final vector both live in the same space; then I’d guess you may find a bunch of matrices which leave the initial vector unchanged up to an overall constant. This is like an eigenstate/eigenvector of the matrix power operator, and the whole story looks like a generalization of the usual ‘linear’ version of it, very common in linear algebra and quantum mechanics. You can find your projectors, orthogonal spaces, and so on, and even do a little group theory using matrices that will switch around components of the initial vector and so on.

I’m too tired right now to actually try, but it seems very intriguing, since the generators of the typical groups (rotations etc.) must have the same algebra by definition but, I think, will have a different form, since the way the algebra acts on them is not the usual matrix multiplication but this power operation. It would be fun to see how these generators relate to the ‘linear’ ones.

In some sense the S-matrix in particle physics represents all the possible interactions between an initial and a final state, and now I’m starting to think about whether this formulation could be useful in physics in general, or at least in cosmology before the last scattering or so, when all the fundamental interactions may have looked more like chemistry.

Maybe I’m really too tired hahaha; anyhow, I hope I’m not seeing ghosts here, and thanks a lot for the very cool post!

Ruggero wrote:

Indeed this ‘generalization’ is closely related to the usual linear story. In the climactic moment of Part 24, I will suddenly pull out this formula:

$$ x^Y = \exp \big( Y^\dagger \ln x \big) $$

and use it to reduce a problem involving matrix exponentiation to a problem involving ordinary matrix multiplication.

I’ll explain this formula when I need it, but in fact I’ve already begun to explain it down here in the comments. The above formula would look nicer if I let myself multiply a vector by a matrix on the right:

$$ x^Y = \exp \big( (\ln x) \, Y \big) $$

and it would look even nicer, perhaps, if I took the componentwise logarithm of each side:

$$ \ln \big( x^Y \big) = (\ln x) \, Y $$

Here $\exp$ and $\ln$ are applied componentwise. Anyway, this means that tools developed for matrix multiplication can be applied to matrix exponentiation.
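The componentwise-logarithm identity is easy to check numerically. A quick sketch, with a hypothetical $2 \times 2$ matrix $Y$:

```python
# Checking ln(x^Y) = (ln x) Y componentwise, for a hypothetical matrix Y.
# The logarithm turns matrix exponentiation into ordinary matrix multiplication.
import math

Y = [[1, 0],
     [1, 2]]
x = [3.0, 2.0]

xY = [x[0] ** Y[0][j] * x[1] ** Y[1][j] for j in range(2)]    # x^Y = (6, 4)
log_xY = [math.log(c) for c in xY]                            # ln(x^Y)
lnx_Y = [sum(math.log(x[i]) * Y[i][j] for i in range(2))      # (ln x) Y
         for j in range(2)]

print(all(abs(a - b) < 1e-12 for a, b in zip(log_xY, lnx_Y)))  # → True
```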

Over on Google+, I admitted that with matrix exponentiation as defined here, needn’t be an identity matrix.

Dayal D. Purohit replied:

I responded, a wee bit tetchily:

In this article, if we reversed the directions of $s$ and $t$, then wouldn’t the new rate equation even more resemble the master equation?

Also, am I right in concluding that we rewrote the rate equation to have a complex-based definition rather than a transition-based definition, as before, perhaps because the ACK and deficiency zero theorems have conditions involving complexes in their statements?