Lyapunov Functions for Complex-Balanced Systems

guest post by Manoj Gopalkrishnan

A few weeks back, I promised to tell you more about a long-standing open problem in reaction networks, the ‘global attractor conjecture’. I am not going to quite get there today, but we shall take one step in that direction.

Today’s plan is to help you make friends with a very useful function we will call the ‘free energy’ which comes up all the time in the study of chemical reaction networks. We will see that for complex-balanced systems, the free energy function decreases along trajectories of the rate equation. I’m going to explain this statement, and give you most of the proof!

The point of doing all this work is that we will then be able to invoke Lyapunov’s theorem which implies stability of the dynamics. In Greek mythology, Sisyphus was cursed to roll a boulder up a hill only to have it roll down again, so that he had to keep repeating the task for all eternity. When I think of an unstable equilibrium, I imagine a boulder delicately balanced on top of a hill, which will fall off if given the slightest push:

or, more abstractly:

On the other hand, I picture a stable equilibrium as a pebble at the very bottom of a hill. Whichever way a perturbation takes it is up, so it will roll down again to the bottom:

Lyapunov’s theorem guarantees stability provided we can exhibit a nice enough function V that decreases along trajectories. ‘Nice enough’ means that, viewing V as a height function for the hill, the equilibrium configuration should be at the bottom, and every direction from there should be up. If Sisyphus had dug a pit at the top of the hill for the boulder to rest in, Lyapunov’s theorem would have applied, and he could have gone home to rest. The moral of the story is that it pays to learn dynamical systems theory!

Because of the connection to Lyapunov’s theorem, such functions that decrease along trajectories are also called Lyapunov functions. A similar situation is seen in Boltzmann’s H-theorem, and hence such functions are sometimes called H-functions by physicists.

Another reason for me to talk about these ideas now is that I have posted a new article on the arXiv:

• Manoj Gopalkrishnan, On the Lyapunov function for complex-balanced mass-action systems.

The free energy function in chemical reaction networks goes back at least to 1972, to this paper:

• Friedrich Horn and Roy Jackson, General mass action kinetics, Arch. Rational Mech. Analysis 49 (1972), 81–116.

Many of us credit Horn and Jackson’s paper with starting the mathematical study of reaction networks. My paper is an exposition of the main result of Horn and Jackson, with a shorter and simpler proof. The gain comes because Horn and Jackson proved all their results from scratch, whereas I’m using some easy results from graph theory, and the log-sum inequality.

We shall be talking about reaction networks. Remember the idea from the network theory series. We have a set S whose elements are called species, for example

S = \{ \mathrm{H}_2\mathrm{O}, \mathrm{H}^+, \mathrm{OH}^- \}

A complex is a vector of natural numbers saying how many items of each species we have. For example, we could have a complex (2,3,1). But chemists would usually write this as

2 \mathrm{H}_2\mathrm{O} + 3 \mathrm{H}^+ + \mathrm{OH}^-

A reaction network is a set S of species and a set T of transitions or reactions, where each transition \tau \in T goes from some complex m(\tau) to some complex n(\tau). For example, we could have a transition \tau with

m(\tau) = \mathrm{H}_2\mathrm{O}

and

n(\tau) = \mathrm{H}^+ + \mathrm{OH}^-

In this situation chemists usually write

\mathrm{H}_2\mathrm{O} \to \mathrm{H}^+ + \mathrm{OH}^-

but we want names like \tau for our transitions, so we might write

\tau : \mathrm{H}_2\mathrm{O} \to \mathrm{H}^+ + \mathrm{OH}^-

or

\mathrm{H}_2\mathrm{O} \stackrel{\tau}{\longrightarrow} \mathrm{H}^+ + \mathrm{OH}^-

As John explained in Part 3 of the network theory series, chemists like to work with a vector of nonnegative real numbers x(t) saying the concentration of each species at time t. If we know a rate constant r(\tau) > 0 for each transition \tau, we can write down an equation saying how these concentrations change with time:

\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T} r(\tau) (n(\tau) - m(\tau)) x^{m(\tau)} }

This is called the rate equation. It’s really a system of ODEs describing how the concentration of each species change with time. Here an expression like x^m is shorthand for the monomial {x_1}^{m_1} \cdots {x_k}^{m_k}.

John and Brendan talked about complex balance in Part 9. I’m going to recall this definition, from a slightly different point of view that will be helpful for the result we are trying to prove.

We can draw a reaction network as a graph! The vertices of this graph are all the complexes m(\tau), n(\tau) where \tau \in T. The edges are all the transitions \tau\in T. We think of each edge \tau as directed, going from m(\tau) to n(\tau).

We will call the map that sends each transition \tau to the positive real number r(\tau) x^{m(\tau)} the flow f_x(\tau) on this graph. The rate equation can be rewritten very simply in terms of this flow as:

\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T}(n(\tau) - m(\tau)) \, f_x(\tau) }

where the right-hand side is now a linear expression in the flow f_x.

Flows of water, or electric current, obey a version of Kirchhoff’s current law. Such flows are called conservative flows. The following two lemmas from graph theory are immediate for conservative flows:

Lemma 1. If f is a conservative flow then the net flow across every cut is zero.

A cut is a way of chopping the graph in two, like this:


It’s easy to prove Lemma 1 by induction, moving one vertex across the cut at a time.

Lemma 2. If a conservative flow exists then every edge \tau\in T is part of a directed cycle.

Why is Lemma 2 true? Suppose there exists an edge \tau : m \to n that is not part of any directed cycle. We will exhibit a cut with non-zero net flow. By Lemma 1, this will imply that the flow is not conservative.

One side of the cut will consist of all vertices from which m is reachable by a directed path in the reaction network. The other side of the cut contains at least n, since m is not reachable from n, by the assumption that \tau is not part of a directed cycle. There is flow going from left to right of the cut, across the transition \tau. Since there can be no flow coming back, this cut has nonzero net flow, and we’re done.     ▮

Now, back to the rate equation! We can ask if the flow f_x is conservative. That is, we can ask if, for every complex n:

\displaystyle{ \sum_{\tau : m \to n} f_x(m,n) = \sum_{\tau : n \to p} f_x(n,p). }

In words, we are asking if the sum of the flow through all transitions coming in to n equals the sum of the flow through all transitions going out of n. If this condition is satisfied at a vector of concentrations x = \alpha, so that the flow f_\alpha is conservative, then we call \alpha a point of complex balance. If in addition, every component of \alpha is strictly positive, then we say that the system is complex balanced.

Clearly if \alpha is a point of complex balance, it’s an equilibrium solution of the rate equation. In other words, x(t) = \alpha is a solution of the rate equation, where x(t) never changes.

I’m using ‘equilibrium’ the way mathematicians do. But I should warn you that chemists use ‘equilibrium’ to mean something more than merely a solution that doesn’t change with time. They often also mean it’s a point of complex balance, or even more. People actually get into arguments about this at conferences.

Complex balance implies more than mere equilibrium. For starters, if a reaction network is such that every edge belongs to a directed cycle, then one says that the reaction network is weakly reversible. So Lemmas 1 and 2 establish that complex-balanced systems must be weakly reversible!

From here on, we fix a complex-balanced system, with \alpha a strictly positive point of complex balance.

Definition. The free energy function is the function

g_\alpha(x) = \sum_{s\in S} x_s \log x_s - x_s - x_s \log \alpha_s

where the sum is over all species in S.

The whole point of defining the function this way is because it is the unique function, up to an additive constant, whose partial derivative with respect to x_s is \log x_s/\alpha_s. This is important enough that we write it as a lemma. To state it in a pithy way, it is helpful to introduce vector notation for division and logarithms. If x and y are two vectors, we will understand x/y to mean the vector z such that z_s = x_s/ y_s coordinate-wise. Similarly \log x is defined in a coordinate-wise sense as the vector with coordinates (\log x)_s = \log x_s.

Lemma 3. The gradient \nabla g_\alpha(x) of g_\alpha(x) equals \log(x/\alpha).

We’re ready to state our main theorem!

Theorem. Fix a trajectory x(t) of the rate equation. Then g_\alpha(x(t)) is a decreasing function of time t. Further, it is strictly decreasing unless x(t) is an equilibrium solution of the rate equation.

I find precise mathematical statements reassuring. You can often make up your mind about the truth value from a few examples. Very often, though not always, a few well-chosen examples are all you need to get the general idea for the proof. Such is the case for the above theorem. There are three key examples: the two-cycle, the three-cycle, and the figure-eight.

The two-cycle. The two-cycle is this reaction network:

It has two complexes m and n and two transitions \tau_1 = m\to n and \tau_2 = n\to m, with rates r_1 = r(\tau_1) and r_2 = r(\tau_2) respectively.

Fix a solution x(t) of the rate equation. Then the flow from m to n equals r_1 x^m and the backward flow equals r_2 x^n. The condition for f_\alpha to be a conservative flow requires that f_\alpha = r_1 \alpha^m = r_2 \alpha^n. This is one binomial equation in at least one variable, and clearly has a solution in the positive reals. We have just shown that every two-cycle is complex balanced.

The derivative d g_\alpha(x(t))/d t can now be computed by the chain rule, using Lemma 3. It works out to f_\alpha times

\displaystyle{ \left((x/\alpha)^m - (x/\alpha)^n\right) \, \log\frac{(x/\alpha)^n}{(x/\alpha)^m} }

This is never positive, and it’s zero if and only if

(x/\alpha)^m = (x/\alpha)^n

Why is this? Simply because the logarithm of something greater than 1 is positive, while the log of something less than 1 is negative, so that the sign of (x/\alpha)^m - (x/\alpha)^n is always opposite the sign of \log \frac{(x/\alpha)^n}{(x/\alpha)^m}. We have verified our theorem for this example.

(Note that (x/\alpha)^m = (x/\alpha)^n occurs when x = \alpha, but also at other points: in this example, there is a whole hypersurface consisting of points of complex balance.)

In fact, this simple calculation achieves much more.

Definition. A reaction network is reversible if for every transition \tau  : m \to n there is a transition \tau' : m \to n going back, called the reverse of \tau. Suppose we have a reversible reaction network and a vector of concentrations \alpha such that the flow along each edge equals that along the edge going back:

f_\alpha(\tau) = f_\alpha(\tau')

whenever \tau' is the reverse \tau. Then we say the reaction network is detailed balanced, and \alpha is a point of detailed balance.

For a detailed-balanced system, the time derivative of g_\alpha is a sum over the contributions of pairs consisting of an edge and its reverse. Hence, the two-cycle calculation shows that the theorem holds for all detailed balanced systems!

This linearity trick is going to prove very valuable. It will allow us to treat the general case of complex balanced systems one cycle at a time. The proof for a single cycle is essentially contained in the example of a three-cycle, which we treat next:

The three-cycle. The three-cycle is this reaction network:

We assume that the system is complex balanced, so that

f_\alpha(m_1\to m_2) = f_\alpha(m_2\to m_3) = f_\alpha(m_3\to m_1)

Let us call this nonnegative number f_\alpha. A small calculation employing the chain rule shows that d g_\alpha(x(t))/d t equals f_\alpha times

\displaystyle{ (x/\alpha)^{m_1}\, \log\frac{(x/\alpha)^{m_2}}{(x/\alpha)^{m_1}} \; + }

\displaystyle{ (x/\alpha)^{m_2} \, \log\frac{(x/\alpha)^{m_3}}{(x/\alpha)^{m_2}} \; + }

\displaystyle{  (x/\alpha)^{m_3}\, \log\frac{(x/\alpha)^{m_1}}{(x/\alpha)^{m_3}} }

We need to think about the sign of this quantity:

Lemma 3. Let a,b,c be positive numbers. Then a \log b/a + b\log c/b + c\log a/c is less than or equal to zero, with equality precisely when a=b=c.

The proof is a direct application of the log sum inequality. In fact, this holds not just for three numbers, but for any finite list of numbers. Indeed, that is precisely how one obtains the proof for cycles of arbitrary length. Even the two-cycle proof is a special case! If you are wondering how the log sum inequality is proved, it is an application of Jensen’s inequality, that workhorse of convex analysis.

The three-cycle calculation extends to a proof for the theorem so long as there is no directed edge that is shared between two directed cycles. When there are such edges, we need to argue that the flows f_\alpha and f_x can be split between the cycles sharing that edge in a consistent manner, so that the cycles can be analyzed independently. We will need the following simple lemma about conservative flows from graph theory. We will apply this lemma to the flow f_\alpha.

Lemma 4. Let f be a conservative flow on a graph G. Then there exist directed cycles C_1, C_2,\dots, C_k in G, and nonnegative real ‘flows’ f_1,f_2,\dots,f_k \in [0,\infty] such that for each directed edge e in G, the flow f(e) equals the sum of f_i over i such the cycle C_i contains the edge e.

Intuitively, this lemma says that conservative flows come from constant flows on the directed cycles of the graph. How does one show this lemma? I’m sure there are several proofs, and I hope some of you can share some of the really neat ones with me. The one I employed was algorithmic. The idea is to pick a cycle, any cycle, and subtract the maximum constant flow that this cycle allows, and repeat. This is most easily understood by looking at the example of the figure-eight:

The figure-eight. This reaction network consists of two three-cycles sharing an edge:

Here’s the proof for Lemma 4. Let f be a conservative flow on this graph. We want to exhibit cycles and flows on this graph according to Lemma 4. We arbitrarily pick any cycle in the graph. For example, in the figure-eight, suppose we pick the cycle u_0\to u_1\to u_2\to u_0. We pick an edge in this cycle on which the flow is minimum. In this case, f(u_0\to u_1) = f(u_2\to u_0) is the minimum. We define a remainder flow by subtracting from f this constant flow which was restricted to one cycle. So the remainder flow is the same as f on edges that don’t belong to the picked cycle. For edges that belong to the cycle, the remainder flow is f minus the minimum of f on this cycle. We observe that this remainder flow satisfies the conditions of Lemma 4 on a graph with strictly fewer edges. Continuing in this way, since the lemma is trivially true for the empty graph, we are done by infinite descent.

Now that we know how to split the flow f_\alpha across cycles, we can figure out how to split the rates across the different cycles. This will tell us how to split the flow f_x across cycles. Again, this is best illustrated by an example.

The figure-eight. Again, this reaction network looks like

Suppose as in Lemma 4, we obtain the cycles

C_1 = u_0\to u_1\to u_2\to u_0

with constant flow f_\alpha^1

and

C_2 = u_3\to u_1\to u_2\to u_3 with constant flow f_\alpha^2 such that

f_\alpha^1 + f_\alpha^2 = f_\alpha(u_1\to u_2)

Here’s the picture:

Then we obtain rates r^1(u_1\to u_2) and r^2(u_1\to u_2) by solving the equations

f^1_\alpha = r^1(u_1\to u_2) \alpha^{u_1}

f^2_\alpha = r^2(u_1\to u_2) \alpha^{u_2}

Using these rates, we can define non-constant flows f^1_x on C_1 and f^2_x on C_2 by the usual formulas:

f^1_x(u_1\to u_2) = r^1(u_1\to u_2) x^{u_1}

and similarly for f^2_x. In particular, this gives us

f^1_x(u_1\to u_2)/f^1_\alpha = (x/\alpha)^{u_1}

and similarly for f^2_x.

Using this, we obtain the proof of the Theorem! The time derivative of g_\alpha along a trajectory has a contribution from each cycle C as in Lemma 4, where each cycle is treated as a separate system with the new rates r^C, and the new flows f^C_\alpha and f^C_x. So, we’ve reduced the problem to the case of a cycle, which we’ve already done.

Let’s review what happened. The time derivative of the function g_\alpha has a very nice form, which is linear in the flow f_x. The reaction network can be broken up into cycles. Th e conservative flow f_\alpha for a complex balanced system can be split into conservative flows on cycles by Lemma 4. This informs us how to split the non-conservative flow f_x across cycles. By linearity of the time derivative, we can separately treat the case for every cycle. For each cycle, we get an expression to which the log sum inequality applies, giving us the final result that g_\alpha decreases along trajectories of the rate equation.

Now that we have a Lyapunov function, we will put it to use to obtain some nice theorems about the dynamics, and finally state the global attractor conjecture. All that and more, in the next blog post!

32 Responses to Lyapunov Functions for Complex-Balanced Systems

  1. John Baez says:

    Great article! I’ve got a number of questions. The first is, how does your proof simplify the arguments used, say, here:

    • Jonathan M. Guberman, Mass Action Reaction Networks and the Deficiency Zero Theorem, B.A. thesis, Department of Mathematics, Harvard University, 2003.

    This thesis is mainly a review of known stuff, so I imagine his proof that free energy is a Lyapunov function is similar to Horn and Jackson’s; I’m just citing this because it’s a nice self-contained treatment of the deficiency zero theorem, and some things seem to have been cleaned up.

    He defines the free energy function on page 28 of the PDF file (which is numbered page 26—don’t you just hate that?). Starting on page 30 he proves stuff about it for “cyclic systems”, and starting on page 34 he talks about a “generalization to non-cyclic systems”, saying at one point

    The following lemma is the fundamental result that allows the decomposition of general mass action systems into cyclic systems; it states that any system can be broken down into a cyclic and a noncyclic component, and that the noncyclic component is in some way “smaller” than the original system, allowing this process to be iterated such that it does not continue indefinitely.

    This sounds like your argument using “infinite descent”.

    Your argument looks simpler and shorter to me, but not having read this part of Glauberman’s thesis in detail I’m curious what your main simplifications actually were!

    • Hi John,

      Great article!

      Thanks! And thanks for your help in editing this document!

      Thanks for pointing me to Guberman’s B. A. thesis, I hadn’t looked at it before. The observation that the analysis for a complex balanced system can be broken down into analysis on cycles was one of the key ideas in Horn and Jackson’s paper! Perhaps I should have stressed this more.

      My simplifications are:
      1. Lemma 7.2 in Guberman’s thesis. Horn and Jackson prove a similar lemma in their Appendix. I prove this by invoking the log-sum inequality, a trick which was actually first pointed out to me by my colleague Pranab Sen.
      2. Writing the decomposition lemmas in the language of cuts and flows makes some steps more transparent. For example, I am able to state Lemma 4 for graphs, and then apply it to reaction networks.

  2. David Tanzer says:

    Thanks for this fine article.

    You wrote: “For a detail-balanced system, the time derivative of g_a is a sum over the contributions of pairs consisting of an edge and its reverse.” Can you write out this statement more formally. Thanks.

  3. Hi David,

    Thanks, and also thank you for the useful comments at the draft stage.

    By the chain rule, the time derivative of g_\alpha equals:

    \displaystyle\frac{dg_\alpha(x(t))}{dt} = \log(x/\alpha) . \frac{dx}{dt}

    We wrote the rate equation in terms of the flow as:

    \displaystyle \frac{dx}{dt} = \sum_{\tau\in T} (m(\tau) - n(\tau)) f_x(\tau).

    So we can write

    \displaystyle\frac{dg_\alpha(x(t))}{dt} = \sum_{\tau\in T} \left(\log(x/\alpha) (m(\tau) - n(\tau))\right)f_x(\tau).

    Now it is up to us how we want to collect terms in this sum. If the system is detailed balanced, we collect a transition and its reverse together. If the system is complex-balanced, we decompose into cycles, also decomposing the flows appropriately, and collect terms for each cycle. Hope that helped!

  4. John Baez says:

    Here’s another question. You mention how chemists tend to mean a lot more by ‘equilibrium’ than merely time-independence. It seems not just complex balance but detailed balance is often considered a fundamental law for chemical systems in equilibrium. For example, Wikipedia says:

    In 1901, Rudolf Wegscheider introduced the principle of detailed balance for chemical kinetics. In particular, he demonstrated that the irreversible cycles A_1 \to A_2 \to ... \to A_n \to A_1 are impossible and found explicitly the relations between kinetic constants that follow from the principle of detailed balance.

    So, I wonder what conditions on a reaction network imply that every point of complex balance is a point of detailed balance? The simplest most obvious guess is that the reaction network be reversible! This is obviously necessary, but is it sufficient?

  5. Yes, John, in many communities “equilibrium” automatically means “detailed-balanced.”

    No it is not sufficient. I think it was Bernd Sturmfels who pointed out to me that there are reversible reaction networks that are complex balanced but not detailed balanced! And these are not hard to find, you should be able to find a 3-cycle on two species that satisfies this property.

    • John Baez says:

      Okay, thanks. This raises an interesting question, then. Since in chemistry detailed balance seems to be a “law of nature” for systems in equilibrium, there should be some property of “realistic” reaction networks that guarantees the existence of points of detailed balance. What is this property?

      Maybe this property should guarantee the existence of at least one point of detailed balance per conservation class. Maybe it should guarantee that every point of complex balance is a point of detailed balance. Or maybe it should be weaker.

      Looking at the counterexample you mention, and seeing what if anything is “unrealistic” about it, may help us understand this issue—if nobody has figured it out already.

      Getting some conditions that pick out “realistic” reaction networks could be interesting, for many reasons. For example, maybe it’s easier to prove the Global Attractor Conjecture for “realistic” networks.

      • Hi John,

        Next time I will show that:

        • If a reaction network is complex-balanced, then every non-negative equilibrium is complex-balanced. Further, within every conservation class, there is precisely one positive equilibrium — which of course is complex-balanced by what I said above.

        • If a reaction network is detailed-balanced, then every non-negative equilibrium is detailed-balanced. Further, within every conservation class, there is precisely one positive equilibrium.

        Detailed-balance is a manifestation of “microscopic reversibility.” By Noether’s theorem, this leads to a conserved quantity, i.e., energy. Indeed, if we assign an energy to each species, then detailed-balance is the condition that there are no energy cycles. We present this idea in our paper:
        On the Mathematics of the Law of Mass Action. In other words, detailed balance == first law of thermodynamics.

        Indeed it is easier to prove the Global Attractor Conjecture for “realistic” networks! We do this also in our paper linked above! We introduce the notion of “Atomic” reaction networks, where there are some elementary species called atoms, and all other species are made out of atoms in a unique way. Further, reactions preserve atoms. For such networks that satisfy detailed balance, we are able to show the global attractor conjecture.

        The autocatalysis result I spoke about in my last post is in fact a generalization of this initial result. The sequence of results goes thus: all atomic systems are “prime” – they generate prime ideals in some appropriate sense. All prime systems are non-catalytic. All non-catalytic systems are non-autocatalytic. We can prove that non-autocatalytic systems are precisely the ones without critical siphons, thus obtaining the Global Attractor Conjecture for all non-autocatalytic complex-balanced systems. These ideas are developed in the three papers:

        On the Mathematics of the Law of Mass Action

        Catalysis in Reaction Networks Bull. Math Biol. 2011, 73:2962-2982,

        Autocatalysis in Reaction Networks

      • John Baez says:

        Great! I’ll need to read these 3 papers. They sound very interesting.

        The results in your paper On the mathematics of the law of mass action seem very important. I need to read this paper and thoroughly understand it. But the term ‘event-system’ is a bit off-putting to me. I need to make sure I can translate results about event-systems into results about chemical reaction networks. Maybe you can help me.

        Is a ‘finite, physical event-system’ the same as what I’d call a ‘reaction network’ with some finite set of species, finite set of transitions (=reactions), and a positive rate constant for each transition?

      • John Baez says:

        I’m really glad your paper On the mathematics of the law of mass action introduces the concept of a reaction network where each species has an energy. This indeed sounds like just the right idea if you want detailed balance!

        Has someone studied how this concept of energy is related to the concept of ‘free energy’ discussed in this post? There should be a theorem for this class of reaction networks saying

        F = E - T S

        If nobody has proved it yet, we should do it now!

        I’ll say a bit about this below, in the special case of reversible reaction networks where each transition has just a single species as input and a single species as output.

  6. domenico says:

    Great article!

    I have a problem; the mathematical description of the chemical reaction is true for great number of molecules, so that if it used a little number of molecules, or if the concentration of the gas is low, then the reaction network have not fixed the rate constants (it seem, to me, that there are rate constant fluctuations); it seem, to me, that it is not like a cross section that have a fixed value, because of – here – there is a probability of the trajectories crossing.

    If the rate constants are not fixed, then there is a fluctuation of the chemical reactions.

    But if this is true for low concentrations, then this is true ever (there is not a clear separation between low density and high density, with ever a little reaction fluctuations).

    • John Baez says:

      With low numbers of molecules you don’t want to use the rate equation described here. You want to use the master equation, which was introduced in Part 4 of the network theory series. The difference between the two was explained in Part 2

      But if this is true for low concentrations, then this is true ever (there is not a clear separation between low density and high density, with ever a little reaction fluctuations).

      The rate equation is only an approximation to the master equation, which is itself only an approximation to the actual laws of quantum mechanics governing chemistry, which are themselves only approximations to quantum field theory, etc.

      Arjun Jain and I have a partially completed proof that the master equation reduces to the rate equation in a certain limit. I need to post that here! The remaining step in the proof is to rigorously justify passing a limit through a derivative.

  7. arch1 says:

    I’m at most partly getting this (not your fault, it’s a great writeup:-), so the following question might not be sensible: The main theorem seems to imply that an arbitrary trajectory converges to an arbitrary strictly positive point of complex balance. But can’t there be many of the latter for a given system, and if so how can a fixed trajectory converge to all of them at once?

    • John Baez says:

      arch1 wrote:

      the main theorem seems to imply that an arbitrary trajectory converges to an arbitrary strictly positive point of complex balance.

      It definitely doesn’t say or imply that—that’s the Global Attractor Conjecture, which is the biggest open question in reaction network theory! Manoj will talk about that more in his next post.

      But you shouldn’t feel bad about making this slip, because Manoj said that some famous researchers in this subject made the same mistake. I forget the details—I hope he can tell us this story at some point.

      But can’t there be many of the latter for a given system, and if so how can a fixed trajectory converge to all of them at once?

      In chemistry we have lots of conserved quantities, like the total amount of hydrogen, or oxygen, etc. Chemical reactions don’t change these. There is often one equilibrium for each choice of the values of these conserved quantities.

      So, if we start with half a pound of hydrogen and two pounds of oxygen, we expect our chemical system to approach the equilibrium that has half a pound of hydrogen and two pounds of oxygen.

      Taking these ideas and turning them into theorems—that’s where the fun starts. I suspect Manoj will also talk about this. But not everything we expect has been proved.

      • What John said :-)

        Yes in fact Horn and Jackson also thought they had proved global convergence. This was the one small blemish in their otherwise extraordinary 1972 paper. Horn realized the mistake, and published a retraction two years later. We’ll see what the catch is next time. If you’re seen the autocatalysis post, then I can mention that it has to do with critical siphons.

  8. John Baez says:

    A reaction network where every transition has just one species as input and one as output is the same as a continuous-time Markov chain, something like this:

    We’ve got some finite set of states X and a matrix H_{ij} describing the probabilistic rate for a state to jump to the state i given that it’s in the state j. If the probability for the system to be found in any state is given by the distribution

    p(t) : X \to \mathbb{R}

    at time t, it evolves according to the master equation:

    \displaystyle{   \frac{d}{dt} p(t) = H p(t) }

    For probabilities to stay positive and for total probability to be conserved, we require that H be infinitesimal stochastic, meaning that

    i \ne j \; \Rightarrow \; H_{i j} \ge 0

    and

    \displaystyle{ \sum_{i \in X}  H_{i j} = 0 }

    for all j \in X.

    When pondering ‘free energy’ and other thermodynamic notions for reaction networks, it’s good to start with this well-understood special case.

    Kolmogorov came up with a criterion for a continuous-time Markov chain to be reversible. In the reversible case there exists an equilibrium state p that obeys detailed balance. By equilibrium I simply mean that p doesn’t change with time:

    H p = 0

    but detailed balance says more:

    H_{i j} p_j = H_{j i} p_i

    In other words, the probability per time for the state to hop from i to j equals the probability per time for it to hop from j to i.

    All this is just setting up notation and recalling standard stuff. Next I’ll bring in a concept from thermodynamics: namely, entropy!

  9. John Baez says:

    Okay, suppose we have a reversible continuous-time Markov chain and p is an equilibrium probability distribution obeying the detailed balance condition.

    To keep things simple, let’s also suppose the Markov chain is irreducible, so there’s just one equilibrium. (The general case can be broken up into irreducible pieces.)

    In this situation every probability distribution q(t) will approach the equilibrium p as it evolves in time according to the master equation:

    \displaystyle{ \lim_{t \to +\infty} q(t) = p }

    There are many Lyapunov functions, meaning functions F of a probability distribution q(t) that always increase (or if you prefer to flip the sign, decrease) as time goes on:

    \displaystyle{ \frac{d}{dt} q(t) = H q(t) \; \Rightarrow \; \frac{d}{dt} F(q(t)) \ge 0 }

    and reach their maximum (or if you prefer, minimum) only at the equilibrium p.

    Many of these Lyapunov functions have been studied in detail. A systematic review is here:

    • A. N. Gorban, P. A. Gorban and G. Judge, Entropy: the Markov ordering approach, Entropy 12 (2010), 1145–1193.

    For example, when the equilibrium p : X \to \mathbb{R} is a constant function, one of these Lyapunov functions is the Shannon entropy

    \displaystyle{ S(q) = - \sum_i q_i \ln q_i }

    So, we get back the second law of thermodynamics!

    \displaystyle{ \frac{d}{dt} q(t) = H q(t) \; \Rightarrow \;  \frac{d}{d t} S(q(t)) \ge 0 }

    But other Lyapunov functions give other ‘second laws’. For example, not only does Shannon entropy increase, so do all the Rényi entropies: these are a family of entropy functions which include Shannon entropy as a special case. I discussed this here:

    More second laws of thermodynamics, Azimuth.

    When p is not constant, we can reduce to the case where it is constant by a kind of transformation. So, all the formulas look a bit more fancy in this more general case, but the ideas are just the same. For example, instead of ordinary entropy being our Lyapunov function, we can use the relative entropy

    \displaystyle{ S(q, p) = -\sum_{i \in X} q_i \ln \frac{q_i}{p_i} }

    In other words:

    \displaystyle{ \frac{d}{dt} q(t) = H q(t) \; \Rightarrow \; \frac{d}{dt} S(q(t),p) \ge 0 }

    Now, I claim that the decrease of free energy over time can be seen quite nicely in this general picture. However, it takes a little work!

    After all, free energy is not quite the same as entropy. There’s a formula relating entropy, energy and free energy. But so far I haven’t brought energy into the picture. And this is what I want to do next. But it’s getting late, so I’ll stop here.

    (I imagine plenty of experts know this whole story already. However, I’ve never seen it spelled out simply all in one place; I’ve been trying to pick it up here and there and assemble it all in my head. So, I feel the need to talk about it.)

  10. John Baez says:

    So suppose we have a reversible continuous-time Markov chain and there’s a unique probability distribution p obeying the detailed balance condition. This implies p is an equilibrium, but a lot more too, as described above.

    Now let’s bring in energy. The idea is to pick a real number \beta, the inverse temperature or coolness, and write the probabilities p_i as a Boltzmann distribution, familiar from statistical mechanics:

    \displaystyle{ p_i = \frac{e^{-\beta E_i}}{Z(\beta)} }

    Here E_i is a real number called the energy of the state i \in X, while Z(\beta) is a number chosen to make the probabilities sum to one, as they must:

    Z(\beta) = \sum_{i \in X} e^{-\beta E_i}

    This number Z(\beta) is called the partition function.

    No matter what our probabilities p_i are, and no matter what \beta \in \mathbb{R} we pick, we can find energies and a partition function that makes

    \displaystyle{ p_i = \frac{e^{-\beta E_i}}{Z(\beta)} }

    The energies are not unique: we can add the same constant to all the E_i, and multiply Z(\beta) by some number, without changing the probabilities p_i. But this is the only ambiguity: the energies are well-defined up to an additive constant, which is what you expect in physics.

    Speaking of ambiguities, our choice of inverse temperature \beta was arbitrary as long as it’s not zero. If we multiply \beta by some number and divide all the energies by that same number, the numbers \exp(-\beta E_i) and thus the probabilities p_i don’t change. This just says that our units of temperature are arbitrary as long as we correspondingly change our units of energy.

    (I’m assuming Boltzmann’s constant is 1, so our units of temperature and units of energy are locked together.)

    So, we’ve introduced concepts of energy and temperature into an arbitrary reversible continuous-time Markov chain that has a unique equilibrium p obeying the detailed balance condition.

    In my last comment I introduced entropy. Given energy and entropy and temperature, we can define free energy. I’ll do that next.

  11. John Baez says:

    Hmm, this may be more problematic than I thought, but let’s try.

    In thermodynamics, free energy can be defined by

    F = \langle E \rangle - T S

    where \langle E \rangle is the expected energy, T is temperature and S is entropy.

    Let’s say we’ve got a reversible continuous-time Markov chain that has a unique equilibrium probability distribution p obeying the detailed balance condition. In this situation we can take any other probability distribution q and try to compute its free energy using the formula above.

    It’s straightforward to define the entropy of q:

    \displaystyle{ S(q) = - \sum_{i \in X} q_i \ln q_i }

    We can also compute its expected energy

    \displaystyle{ \langle E(q) \rangle = \sum_{i \in X} E_i q_i }

    where we define the energies E_i as in my previous comment, using

    \displaystyle{ p_i = \frac{e^{-\beta E_i}}{Z(\beta)} }

    where

    \displaystyle{  Z(\beta) = \sum_{i \in X} e^{-\beta E_i}  }

    As mentioned earlier, these equations only define the energies E_i up to an additive constant. Since I can’t think of anything better to do, let’s choose that constant so that Z(\beta) = 1 for our chosen inverse temperature \beta. Then we have

    \displaystyle{ p_i = e^{-\beta E_i} }

    so

    E_i = - T \ln p_i

    where the temperature T is defined by

    \displaystyle{ \beta = \frac{1}{T} }

    Doing this, we get

    \displaystyle{ \langle E(q) \rangle = \sum_{i \in X} E_i q_i = - T \sum_{i \in X} q_i \ln p_i  }

    so apparently the free energy of q is

    \displaystyle{ F(q) = \langle E(q) \rangle - T S(q) =  T \sum_{i \in X} q_i \ln q_i - q_i \ln p_i }

    I say ‘apparently’ because the temperature T has nothing intrinsic to do with q; it’s the temperature we arbitrarily assigned to the equilibrium state p.

    This makes me nervous, but at least we can do something now: we can compare this formula for free energy with the one Manoj gave in his post!

  12. John Baez says:

    Manoj defined a concept of free energy for reaction networks by

    \displaystyle{ g_\alpha(x) = \sum_{s\in S} x_s \ln x_s - x_s - x_s \ln \alpha_s }

    Here S is the set of species, \alpha_s is the number of items of items of species s in the complex balanced equilibrium \alpha, and x_s is the number of items of this species at some other point x.

    A reaction network where every transition has just one species as input and one as output can be reintepreted as a continuous-time Markov chain. Translating to the Markov chain notation I’ve been using in previous comments, we get a concept of free energy

    \displaystyle{ G(q) = \sum_{i \in X} q_i \ln q_i - q_i - q_i \ln p_i }

    This is very similar to the quantity I called free energy in my last comment:

    \displaystyle{ F(q) = T \sum_{i \in X} q_i \ln q_i - q_i \ln p_i }

    I claim they’re the same for all practical purposes. First, I introduced the constant T in an ad hoc way, and it could be anything, so it would be fine to set T = 1 if all we want is some Lyapunov function. Then we get

    \displaystyle{ F(q) = \sum_{i \in X} q_i \ln q_i - q_i \ln p_i }

    On the other hand, in our Markov process

    \displaystyle{ \sum_{i \in X} q_i }

    is constant as a function of time: 1 if we treat the q_i as probabilities, as I’ve been doing, or the total population of items, if we take the reaction network stance. Either way, it’s just some constant C. So,

    \displaystyle{ G(q) = - C + \sum_{i \in X} q_i \ln q_i - q_i \ln p_i }

    and thus

    G(q) = F(q) - C .

    So, for a very special kind of reaction network, we’ve managed to use standard ideas from thermodynamics to understand the free energy function Manoj was discussing. And so it’s natural to try to generalize this argument to more reaction networks!

  13. John Baez says:

    I won’t try the generalization to more general reaction networks now. I just want to mention another way to think about this free energy function for reversible Markov processes.

    We’ve seen that setting the temperature T to 1, we have

    \displaystyle{ F(q) = \sum_{i \in X} q_i \ln q_i - q_i \ln p_i    }

    But if we rewrite this as

    \displaystyle{ F(q) = \sum_{i \in X} q_i \ln \left( \frac{q_i}{p_i} \right) }

    we recognize it as the relative information of q relative to p, or minus the relative entropy

    \displaystyle{ S(q,p) = -\sum_{i \in X} q_i \ln \left( \frac{q_i}{p_i} \right) }

    I’ve already mentioned that this is a Lyapunov function; now we’re seeing it in a somewhat new light!

  14. arch1 says:

    Manoj, I can’t quite make sense of the proof of Lemma 4 unless I replace “We observe that this remainder flow satisfies the conditions of Lemma 4 on a graph with strictly fewer edges” with something like “We observe that *if* this remainder flow (on a graph with strictly fewer edges) satisfies the conditions of Lemma 4, then so does f.” Does the latter express your intended meaning, or am I confused?

    • @arch1 no, I think I said what I intended to say. Perhaps it will be helpful if I break things down further.

      What do I mean by the conditions of Lemma 4? I mean that the remainder flow is conservative. That’s all! And that is easy to verify.

      Now because the remainder flow is conservative, if Lemma 4 were true for the remainder flow, I would be done. But this process has reduced the problem of proving Lemma 4 to the problem of proving Lemma 4 on a smaller graph. Now I can apply infinite descent, which is a fancy name for mathematical induction run backwards.

  15. […] hat tip to the Azimuth Project and thanks to Manoj Gopalkrishnan for this interesting […]

  16. First a disclaimer: I’m not sure how to post nice looking mathematics here.

    I think it is worth noting that Martin Feinberg proved things in a different way than did Horn and Jackson. Marty posted some nice lecture notes (from a series of lectures in Wisconsin in 1979) that can be found at: http://crnt.engineering.osu.edu/LecturesOnReactionNetworks

    In particular, his proof that the function being discussed here is, in fact, a Lyapunov function is quite nice. Let me redo it here as it is quite nice, though I will change the proof slightly by dropping the notion of “complex-space,” which Marty makes use of in his notes. I like this proof as it really shows (to me at least) where the complex balance condition comes in.

    **********

    Let \{\mathcal S,\mathcal C,\mathcal R\} be a deterministically modeled chemical reaction system with mass-action kinetics. Suppose that there are precisely N species. We denote the kth reaction by y_k \to y_k', and denote the span of the reaction vectors by S = \text{span}\{y_k'-y_k\}. The ODE governing the dynamics of the system is

    \displaystyle{ \dot x(t) = \sum_k \kappa_k x(t)^{y_k}(y_k' - y_k).}

    Assume that the system is complex-balanced with complex-balanced equilibrium \overline c \in \mathbb{R}^N_{>0}. This means that for each \eta \in \mathcal C,

    \displaystyle{ \sum_{k: \eta = y_k} \kappa_k (\overline c)^{y_k} = \sum_{k: \eta = y_k'} \kappa_k (\overline c)^{y_k}, }

    where the sum on the left is over all reactions for which \eta is the source complex, and the sum on the right is over all reactions for which \eta is the product complex.

    Now define the function g by

    \displaystyle{ g(x) = \sum_{i=1}^N x_i \left(\ln(x_i) - \ln \overline (c_i) -1 \right) + \overline c_i. }

    The fact that V is a Lyapunov function for the system is captured in the following result.

    Theorem. Suppose that x\in \mathbb{R}^N_{>0} with x - \overline c \in S. Then

    \displaystyle{ \nabla g(x) \cdot \sum_k  \kappa_k x^{y_k} (y_k' - y_k) = \sum_k \kappa_k x^{y_k} (y_k'-y_k)\cdot (\ln(x) - \ln(\overline c)) \le 0,}

    with equality if and only if x = \overline c.

    Proof. Note that

    \displaystyle{ \sum_k \kappa_k x^{y_k} (y_k'- y_k) \cdot (\ln x - \ln \overline c) = }
    \displaystyle{ \sum_k \kappa_k (\overline c)^{y_k}  \left( \frac{x}{\overline c} \right)^{y_k}  \left(\ln \left\{ \left(\frac{x}{\overline c}\right)^{y_k'}\right\} -        \ln \left\{ \left(\frac{x}{\overline c}\right)^{y_k}\right\}  \right). }

    Using that for any real numbers a,b \in \mathbb{R} we have e^a(b-a) \le e^b - e^a with equality if and only if a = b (consider secant lines of e^x), we have

    \displaystyle{ \sum_k \kappa_k (\overline c)^{y_k} ( \frac{x}{\overline c})^{y_k} \left(  \ln \left\{ (\frac{x}{\overline c} )^{y_k'} \right\} -   \ln \left\{ (\frac{x}{\overline c} )^{y_k} \right\} \right) }
    \displaystyle{ \le \sum_k \kappa_k  (\overline c)^{y_k} \left( (\frac{x}{\overline c})^{y_k'} -                         (\frac{x}{\overline c})^{y_k} \right) }

    \displaystyle{ = \sum_{\eta \in \mathcal{C}} \left( \sum_{k: \eta = y_k'} \kappa_k (\overline c)^{y_k} \left(\frac{x}{\overline c}\right)^{y_k'} -  \sum_{k: \eta = y_k}  \kappa_k (\overline c)^{y_k}  \left(\frac{x}{\overline c}\right)^{y_k}  \right) }

    = \displaystyle{ \sum_{\eta \in \mathcal{C}}  (\frac{x}{\overline c})^{\eta}  \left( \sum_{k: \eta = y_k'} \kappa_k (\overline c)^{y_k}  -     \sum_{k: \eta = y_k}  \kappa_k (\overline c)^{y_k} \right) }
    = 0

    where the final equality holds by the condition above on complex-balancing.

    Thus, we have a strict inequality unless

    \displaystyle{ (y_k' - y_k) \cdot \left( \ln(x) - \ln(\overline c)\right) = 0,}

    for all k. That is, we have a strict inequality unless

    \displaystyle{	\ln(x) - \ln(\overline c) \in S^{\perp}. }

    Following precisely the argument on page 4–33 of Feinberg’s notes, we now note that if both

    x - \overline c\in S

    and

    \ln(x) - \ln(\overline c) \in S^{\perp}

    hold, then

    \displaystyle{ 0 = (x-\overline c) \cdot \left(\ln(x) - \ln(\overline c)\right) = \sum_{i=1}^N (x_i-\overline c_i)(\ln(x_i) -\ln(\overline c_i)), }

    which, by the monotonicity of the \ln function, can only happen if x_i = \overline c_i for all i.

  17. Oh dear, that did not work. I have simply posted the note here:

    http://www.math.wisc.edu/~anderson/CRNT_Lyapunov.pdf

    • John Baez says:

      I’ve fixed the LaTeX in your comment, but it’s nice to have a PDF version too.

      I’m glad to see you here! We’ve spent a lot of time here discussing your work on stationary solutions of the master equation for complex-balanced systems.

      Thanks for posting this comment! It’s great to see another proof of this important result.

    • John Baez says:

      For more remarks comparing different paper on Lyapunov functions for chemical reaction networks and evolutionary games, go here. I will try to summarize all these ideas in a blog post at some point, but right now we’ve got two parallel conversations going on: one featuring chemical reaction network theorists and one featuring evolutionary game theorists… both talking about free energy as a Lyapunov function.

  18. Now Tobias Fritz and I have finally finished our paper on this subject:

    A Bayesian characterization of relative entropy.

You can use HTML in your comments. You can also use LaTeX, like this: $latex E = m c^2 $. The word 'latex' comes right after the first dollar sign, with a space after it.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 2,797 other followers