Bio-Inspired Information Theory

31 January, 2014

 

There will be a 5-day workshop on Biological and Bio-Inspired Information Theory at BIRS from Sunday the 26th to Friday the 31st of October, 2014. It’s being organized by

Toby Berger (University of Virginia)
Andrew Eckford (York University)
Peter Thomas (Case Western Reserve University)

BIRS is the Banff International Research Station, a conference venue in a rather wild part of Alberta, in the mountains west of Calgary.

Description

Here’s the workshop proposal on the BIRS website:

Currently, research in the community is organized into three streams:

• Information theory and biochemistry (including information theory and intercellular communication);

• Information theory and neuroscience; and

• Information-theoretic analysis of biologically-inspired communication systems (including nano-networking and design of biologically implemented information processing networks).

We propose a BIRS workshop to explore these streams, focusing on mathematical open problems that cut across the streams. The main objectives of this workshop would be: to bring together the most prominent researchers in this field; to discuss and review the current state of mathematical research in this field; to promote cross-pollination among the various streams of research to find common problems; and to collectively identify key future directions or open problems that would bring the greatest impact to this field over the next few years.

Expected impact

A BIRS workshop involving the field’s leading researchers would allow a review of the current state of the art, and would promote cross-pollination among these three streams of research. We expect to have these leading researchers in attendance. For example, Prof. Toby Berger (U. Virginia), a widely recognized pioneer in this field and a recipient of the Shannon Award (the top prize awarded by the IEEE Information Theory Society), is one of the co-organizers of the workshop. Moreover, we have approached many of the field’s most prominent mathematicians and scientists: a complete list is found elsewhere in this proposal, but among the most prominent confirmed participants are: Prof. Tadashi Nakano (Osaka U.), one of the earliest researchers in engineered molecular communication; Dr. Thomas D. Schneider (NIH – National Cancer Institute), inventor of the sequence logo and prominent researcher in genetic information theory; and Profs. William Bialek (Princeton U.) and Naftali Tishby (Hebrew U.), prominent experts on information theory in neural coding.

Although the focus of our workshop is on mathematical fundamentals, our list of expected participants includes a few experimental scientists, e.g. Raymond Cheong and Andre Levchenko (both from Johns Hopkins U.), in addition to mathematical scientists. This is because quantitative application of information theoretic analysis to biological systems typically requires empirical estimation of joint probability distributions for multiple input and output variables, often posing daunting data collection challenges which pioneered the use of high-throughput experimental methods to collect large data sets quantifying the input/output relationships for a specific biochemical signaling pathway). We believe a blended approach, emphasizing mathematics but including experimental perspectives, will enhance the impact of our workshop and increase the usefulness to our participants.

Given that publications in these research areas have achieved prominence in the past few years, the time is right for a general meeting among the key researchers to review the state of the field and develop future directions. Thus, our proposed workshop is timely and would be expected to have a tremendous impact on the field over the next several years.


The Rarest Things in the Universe

27 January, 2014

guest post by Leonard Adleman

About 50 years ago Kolomogorov assigned to each finite binary string \sigma a non-negative integer that measured that string’s ‘descriptive complexity’. Informally, K(\sigma) is the length (in binary) of the shortest (Turing machine) program that with the empty string as input, outputs \sigma and halts. A related measure of descriptive complexity is M(\sigma)=\frac{K(\sigma)}{|\sigma|}, where |\sigma| denotes the length of \sigma. A simple string like:

\sigma_0=\overbrace{111\ldots 111}^{1,000,000}

can be produced by a very short program; hence M(\sigma_0) is near 0. But if \sigma is a ‘random string’ (e.g. obtained by flipping coins), then, with high probability, it cannot be produced by a program significantly shorter than \sigma itself; hence M(\sigma) will be near 1.

If I asked you to produce (using a computer) a thousand strings of length one million with M near 1, it would be easy to do; just flip a lot of coins. If I asked you to produce a thousand strings with M near 0, that would also be easy. For example, you could start with a short random string and repeat it a lot. Actually, if I chose my favorite \alpha\in [0,1] and wanted a thousand strings of length one million with M near \alpha, then a mix of the preceding approaches can be used to produce them. So strings with a desired M are not rare.

Now let’s consider ‘deep strings’*. I will be informal here, but the underlying theory can be found in my Time space and randomness.

For all binary strings \sigma, we assign a value that measures the ‘depth’ of \sigma. D(\sigma) is obtained by considering both the size and the number of steps used by each program that with the empty string as input, outputs \sigma and halts**. D(\sigma) has the following properties:

  • If there is no short program to produce \sigma, then D(\sigma) is small.
  • If there is a short program to produce \sigma and it uses few steps, then D(\sigma) is small.
  • If there is a short program to produce \sigma, but all short programs to produce \sigma use lots of steps, then D(\sigma) is large. Roughly speaking, the more steps small programs use to produce \sigma, the larger D(\sigma) will be.

Informally, we call a string with large D ‘deep’ and one with small D ‘shallow’. A few examples may help.

Consider a string \sigma obtained by flipping coins. With high probability there is no short program to produce \sigma, hence \sigma is shallow.

Now consider \sigma_0 above. Since there is a short program to produce \sigma_0 and that program uses few steps, \sigma_0 is shallow.

Now treat \sigma_0 as a number in binary (i.e. 2^{1,000,000}-1) and consider the prime factorization. The fundamental theorem tells us it exists and will be about one million bits long. But, unless 2^{1,000,000}-1 is somehow special (e.g. a prime times a very smooth number), its prime factorization may be very very deep. A short program can generate the prime factorization (just generate one million 1s with a short program and then give it to a short factoring program). But if it turns out that factoring can’t be done in polynomial time, then perhaps all short programs that generate the prime factorization use a huge number of steps. So the prime factorization would have a very large D. Conceivably, since steps on a computer use time and energy, the prime factorization can never be realized. It is not a long string (only one million bits) but it may exist only in theory and not in reality.

If I asked you to produce even one string of length one million with D near that of the prime factorization of 2^{1,000,000}-1, could you do it? I would not know how and I suspect that as a practical matter it cannot be done. So strings with such a D are rare.

Here is a string that does exist in our universe and that (I suspect) is quite deep:

46817226351072265620777670675006972301618979214252832875068976303839400413682313
921168154465151768472420980044715745858522803980473207943564433*5277396428112339
17558838216073534609312522896254707972010583175760467054896492872702786549764052
643493511382273226052631979775533936351462037464331880467187717179256707148303247

In fact, this string is the prime factorization of 2^{1061}-1. We should not expect it to be as deep as the prime factorization of 2^{1,000,000}-1, but we should still expect it to have considerable depth. There is even some experimental evidence that supports this view. How did I get this prime factorization? I got it from: Factorization of a 1061-bit number by the Special Number Field Sieve, by Greg Childers. So it was easy to get. Well, easy for me anyway; not so easy for Childers. He reports that the factorization took about `3 CPU-centuries’ using the Special Number Field Sieve. Other than by factoring 2^{1061}-1, how could you ever come to write this pair of primes? I venture to say that since the Special Number Field Sieve is the fastest known algorithm for factoring numbers of this kind, no method available today could have written these primes (for the first time) using fewer steps (and hence less time/energy).

The situation might be compared to that of atomic physics. I am not a physicist, but I suppose it is possible to theorize about an atomic nucleus with a million protons. But what if I want to create one? It appears that producing transuranic elements takes huge amounts of time/energy and the greater the number of protons, the more time/energy it takes. It is even conceivable (to me at least) that there is not enough time/energy available (at least on earth) to actually produce one. Like the prime factorization of 2^{1,000,000}-1, it may exist in theory but not in reality. On the other hand, physicists from Russia and America, using lots of time/energy, have created an atomic nucleus with 118 protons called Ununoctium. Ununoctium is analogous to Childers’ prime factorization; both exist in reality; both were very costly to create.

But my focus here is neither ontology nor physics; it is money. Recently Bitcoins and Bitcoin-like things have captured the world’s attention. I suspect that this kind of currency will revolutionize the nature of economies, and consequently the nature of governance. Personally, I am agnostic on whether this is a good thing or a bad thing. But, in any case, I am wondering if ‘depth’ might form part of a theoretical basis for understanding new currencies. I must confess to knowing few details about Bitcoins; consequently, I will start from first principles and consider some of the properties of currency:

  • Mass: A currency may have mass or may be massless. U.S. dollars and gold have mass. Bitcoins are massless. The efficiencies of the Internet market require a massless currency. Imagine running eBay or Amazon using gold. U.S. dollars have mass, so we do not actually use them for transactions on the Internet, rather we use credit card numbers to create massless IOUs that promise to deliver (but, in fact, seldom actually deliver) U.S. dollars in the future. These IOUs are essentially a massless currency.
  • Production: Who can produce or destroy the currency? This has been an important political, legal and economic issue for millennia. In the west, coins began to appear at about the time Solon ruled Athens. Solon realized that he could manipulate the value of coins. And so he did. Solon’s lesson has not been lost on governments ever since. In its simplest form: if you owe K dollars, you print K dollars, and voila! no more debt. Creditors don’t like debtors to do that, so they want to be paid in a currency that no one can produce more of. Gold comes close. You can produce more gold but you have to spend a lot of time/energy in a mine to do so. Production also includes counterfeiting. Counterfeiting comes in at least two important forms: de novo-counterfeiting (build a press) and duplication-counterfeiting (get a xerox machine). For now, all massless currencies are in digital form and are vulnerable to duplication-counterfeiting since computers make it cheap and easy to create perfect copies of digital notes (a unit of a currency will be called a ‘note’). As a result, massless currencies will typically be associated with systems that mitigate this threat. Perhaps the elaborate ledgers implemented by the creators of Bitcoin are an attempt to deal with the threat of duplication-counterfeiting.
  • Abundance: As is often said, money can be used to ‘lubricate the economy’. To accomplish this a currency has to be sufficiently abundant. For example, Ununoctium, might be attractive to creditors because, compared to gold, it is far more costly in time/energy to produce more. However, it is not desirable for lubricating the economy because it is so costly to produce that less than 100 atoms have ever been created.

The transition from mass to masslessness will lead to currencies with uses and properties we do not normally associate with money. For example, using the techniques of secret-sharing, it becomes possible to create digital currencies where a single note can be jointly owned by 1000 individuals; any 501 of whom could cooperate to spend it, while any 500 of whom would be unable to do so.

What is the perfect currency? This is probably the wrong question, rather we should ask what properties may currency have, in theory and in reality. Let’s consider massless currencies.

Is there a massless currency similar to the U.S. dollar? I think yes. For example, the Government could simply publish a set of numbers and declare the numbers to be currency. Put another way, make the U.S. dollar smaller and smaller to decrease mass until asymptotically all that is left is the serial number. With regard to abundance, like the U.S. dollar, the Government is free to determining the number of notes available. With regard to production, as with the U.S. dollar, the Government can print more by simply publishing new numbers to be added to the set (or destroy some by declaring that some numbers have been removed from the set). With regard to counterfeiting, the U.S. dollar has some advantages. The mass of the U.S. dollar turns counterfeiting from a digital problem to a physical one. This provides the Government with the ability to change the U.S. dollar physically to defeat technology that might arise to produce faithful (either de novo or duplication) counterfeits.

Is there a massless currency similar to gold? I think yes. I think this is what Bitcoin-like currencies are all about. They are ‘deep-currencies’, sets of deep strings. With regard to abundance, they are superior to gold. The total number of deep strings in the set can be chosen at initiation by the creator of the currency. The abundance of gold on the other hand has already been set by nature (and, at least as currently used, gold is not sufficiently abundant to lubricate the world economy). With regard to production, as with gold, making notes requires time/energy. With regard to counterfeiting, gold has an advantage. Counterfeit gold is an absurdity, since all gold (by the ounce, not numismatically), no matter how it arises, is the same atomically and is perceived to have the same value. On the other hand, as stated above, massless currencies are vulnerable to duplication-counterfeiting. Interestingly, deep-currencies may be resistant to de novo-counterfeiting, since the creator of the deep-currency is free to choose the depth of the notes, and consequently the cost of producing new notes.

The value of a massless currency in our information based world is clear. Deep-currencies such as Bitcoin offer an attractive approach. But there is an interesting issue that may soon arise. The issue stems from the fact that the production of each note in currencies like Bitcoin requires a large investment of time/energy, and as with gold, this deprives governments of the prerogative to print money. Creditors may like this, but governments will not. What will governments do? Perhaps they will want to create a ‘Dual-currency’. A Dual-currency should be massless. It should come with a secret key. If you do not possess the secret key, then, like gold, it should be very costly to produce a new note, but if you possess the secret key, then, like the U.S. dollar, it should be inexpensive to produce a new note. Is a Dual-currency possible? Here is an example of an approach I call aurum:

  • Generate a pair of RSA keys: a public key \langle E,N\rangle, and a secret key \langle D,N\rangle. Publish the public key \langle E,N\rangle.
  • Declare that notes are exactly those integers G such that 2\leq G\leq N-1 and (the least positive residue of) G^E\mbox{Mod}(N) is less than or equal to \frac{N}{10^9}.

So, choosing a random integer V such that 2\leq V\leq N-1, and then computing V^E\mbox{Mod}(N) has about one chance in a billion of producing a note. Hence the expected number of modular exponentiations to produce a note is about one billion. On the other hand, those who possess the secret key can chose an integer W such that 2\leq W\leq \frac{N}{10^9} and calculate W^D\mbox{Mod}(N) to produce a note after just one modular exponentiation. There are many bells and whistles that can be accommodated with aurum, but here is what is ‘really’ going on. The depth of a string \sigma is obtained by considering programs running with the empty string as input, but we can consider a more general concept: ‘relative depth’. Given a pair of strings \sigma and \tau, the depth of \sigma relative to \tau is obtained by considering programs running with \tau as the input. Hence depth as we have been discussing it is the same as depth relative to the empty string. In the example above, we have made `dual-strings’; strings that are deep relative to the public key, but shallow relative to the secret key.

One of the interesting phenomena of theoretical computer science is that you can sometimes turn bad news into good news. If factoring is hard, then we are deprived of the ability to do something we (at least number theorists) would like to do. Bad news. But surprisingly, we acquire public-key cryptography and the ability to preserve our privacy. Good news. Similarly, strings that are very hard to produce seem useless, but Bitcoins have revealed that such strings can provide a new and useful form of currency. Now that we are aware that deep strings can have value, I expect that clever people will find many new uses for them.

After thinking about deep strings for many years, I see them everywhere – I invite you to do the same. I will finish with one of my favorite observations. The watchmaker analogy is well known and is frequently used in arguing the existence of a creator. If you stumble upon a watch, you recognize from its complexity that it did not form by chance, and you conclude that there must have been a watchmaker. A human is more complex than a watch, so there must have been a ‘human-maker’ – a creator. An alternative view is that the complexity you recognize is actually ‘depth’ and the conclusion you should reach is that there must have been a computational process sustained through a great many steps. In the case of humans, the process is 3.6 billion years of evolution and the depth can be read in the genome. The watch is deep as well, but much of its depth is acquired from the human genome relative to which it is not nearly so deep.

*It has been brought to my attention that others have considered similar concepts including Charles Bennett, Murray Gell-Mann, and Luis Antunes, Lance Fortnow, Dieter Van Melkebeek, and N. Variyam Vinodchandran. Indeed, the latter group has even used the term ‘deep string’, hence it is to them we owe the name.

**For all \sigma\in\{0,1\}^{*}:

  • T(\sigma) denotes the set of Turing machine programs that with the empty string as input, output \sigma and halt.
  • E(\sigma) denotes \min\{\max\{|P|,\log_{2}(\vec{P})\}|P\in T(\sigma)\}, where |P| denotes the length of P, and \vec{P} denotes the number of steps used by P with the empty string as input.

It may be convenient for the reader to assume that D(\sigma) is approximately \frac{E(\sigma)}{K(\sigma)}; however, the proper theory of deep strings extends the notion of depth to sets of strings and accommodates the use of randomness in computation. Depth in the context of quantum computation may also be of interest.


Relative Entropy in Evolutionary Dynamics

22 January, 2014

guest post by Marc Harper

In John’s information geometry series, he mentioned some of my work in evolutionary dynamics. Today I’m going to tell you about some exciting extensions!

The replicator equation

First a little refresher. For a population of n replicating types, such as individuals with different eye colors or a gene with n distinct alleles, the ‘replicator equation’ expresses the main idea of natural selection: the relative rate of growth of each type should be proportional to the difference between the fitness of the type and the mean fitness in the population.

To see why this equation should be true, let P_i be the population of individuals of the ith type, which we allow to be any nonnegative real number. We can list all these numbers and get a vector:

P = (P_1, \dots, P_n)

The Lotka–Volterra equation is a very general rule for how these numbers can change with time:

\displaystyle{ \frac{d P_i}{d t} = f_i(P) P_i }

Each population grows at a rate proportional to itself, where the ‘constant of proportionality’, f_i(P), is not necessarily constant: it can be any real-valued function of P. This function is called the fitness of the ith type. Taken all together, these functions f_i are called the fitness landscape.

Let p_i be the fraction of individuals who are of the ith type:

\displaystyle{ p_i = \frac{P_i}{\sum_{i =1}^n P_i } }

These numbers p_i are between 0 and 1, and they add up to 1. So, we can also think of them as probabilities: p_i is the probability that a randomly chosen individual is of the ith type. This is how probability theory, and eventually entropy, gets into the game.

Again, we can bundle these numbers into a vector:

p = (p_1, \dots, p_n)

which we call the population distribution. It turns out that the Lotka–Volterra equation implies the replicator equation:

\displaystyle{ \frac{d p_i}{d t} = \left( f_i(P) - \langle f(P) \rangle \right) \, p_i }

where

\displaystyle{ \langle f(P) \rangle = \sum_{i =1}^n  f_i(P)  p_i  }

is the mean fitness of all the individuals. You can see the proof in Part 9 of the information geometry series.

By the way: if each fitness f_i(P) only depends on the fraction of individuals of each type, not the total numbers, we can write the replicator equation in a simpler way:

\displaystyle{ \frac{d p_i}{d t} = \left( f_i(p) - \langle f(p) \rangle \right) \, p_i }

From now on, when talking about this equation, that’s what I’ll do.

Anyway, the take-home message is this: the replicator equation says the fraction of individuals of any type changes at a rate proportional to fitness of that type minus the mean fitness.

Now, it has been known since the late 1970s or early 1980s, thanks to the work of Akin, Bomze, Hofbauer, Shahshahani, and others, that the replicator equation has some very interesting properties. For one thing, it often makes ‘relative entropy’ decrease. For another, it’s often an example of ‘gradient flow’. Let’s look at both of these in turn, and then talk about some new generalizations of these facts.

Relative entropy as a Lyapunov function

I mentioned that we can think of a population distribution as a probability distribution. This lets us take ideas from probability theory and even information theory and apply them to evolutionary dynamics! For example, given two population distributions p and q, the information of q relative to p is

I(q,p) = \displaystyle{ \sum_i q_i \ln \left(\frac{q_i}{p_i }\right)}

This measures how much information you gain if you have a hypothesis about some state of affairs given by the probability distribution p, and then someone tells you “no, the best hypothesis is q!”

It may seem weird to treat a population distribution as a hypothesis, but this turns out to be a good idea. Evolution can then be seen as a learning process: a process of improving the hypothesis.

We can make this precise by seeing how the relative information changes with the passage of time. Suppose we have two population distributions q and p. Suppose q is fixed, while p evolves in time according to the replicator equation. Then

\displaystyle{  \frac{d}{d t} I(q,p)  =  \sum_i f_i(P) (p_i - q_i) }

For the proof, see Part 11 of the information geometry series.

So, the information of q relative to p will decrease as p evolves according to the replicator equation if

\displaystyle{  \sum_i f_i(P) (p_i - q_i) } \le 0

If q makes this true for all p, we say q is an evolutionarily stable state. For some reasons why, see Part 13.

What matters now is that when q is an evolutionarily stable state, I(q,p) says how much information the population has ‘left to learn’—and we’re seeing that this always decreases. Moreover, it turns out that we always have

I(q,p) \ge 0

and I(q,p) = 0 precisely when p = q.

People summarize all this by saying that relative information is a ‘Lyapunov function’. Very roughly, a Lyapunov function is something that decreases with the passage of time, and is zero only at the unique stable state. To be a bit more precise, suppose we have a differential equation like

\displaystyle{  \frac{d}{d t} x(t) = v(x(t)) }

where x(t) \in \mathbb{R}^n and v is some smooth vector field on \mathbb{R}^n. Then a smooth function

V : \mathbb{R}^n \to \mathbb{R}

is a Lyapunov function if

V(x) \ge 0 for all x

V(x) = 0 iff x is some particular point x_0

and

\displaystyle{ \frac{d}{d t} V(x(t)) \le 0 } for every solution of our differential equation.

In this situation, the point x_0 is a stable equilibrium for our differential equation: this is Lyapunov’s theorem.

The replicator equation as a gradient flow equation

The basic idea of Lyapunov’s theorem is that when a ball likes to roll downhill and the landscape has just one bottom point, that point will be the unique stable equilibrium for the ball.

The idea of gradient flow is similar, but different: sometimes things like to roll downhill as efficiently as possible: they move in the exactly the best direction to make some quantity smaller! Under certain conditions, the replicator equation is an example of this phenomenon.

Let’s fill in some details. For starters, suppose we have some function

F : \mathbb{R}^n \to \mathbb{R}

Think of V as ‘height’. Then the gradient flow equation says how a point x(t) \in \mathbb{R}^n will move if it’s always trying its very best to go downhill:

\displaystyle{ \frac{d}{d t} x(t) = - \nabla V(x(t)) }

Here \nabla is the usual gradient in Euclidean space:

\displaystyle{ \nabla V = \left(\partial_1 V, \dots, \partial_n V \right)  }

where \partial_i is short for the partial derivative with respect to the ith coordinate.

The interesting thing is that under certain conditions, the replicator equation is an example of a gradient flow equation… but typically not one where \nabla is the usual gradient in Euclidean space. Instead, it’s the gradient on some other space, the space of all population distributions, which has a non-Euclidean geometry!

The space of all population distributions is a simplex:

\{ p \in \mathbb{R}^n : \; p_i \ge 0, \; \sum_{i = 1}^n p_i = 1 \} .

For example, it’s an equilateral triangle when n = 3. The equilateral triangle looks flat, but if we measure distances another way it becomes round, exactly like a portion of a sphere, and that’s the non-Euclidean geometry we need!

In fact this trick works in any dimension. The idea is to give the simplex a special Riemannian metric, the ‘Fisher information metric’. The usual metric on Euclidean space is

\delta_{i j} = \left\{\begin{array}{ccl} 1 & \mathrm{ if } & i = j \\                                       0 &\mathrm{ if } & i \ne j \end{array} \right.

This simply says that two standard basis vectors like (0,1,0,0) and (0,0,1,0) have dot product zero if the 1’s are in different places, and one if they’re in the same place. The Fisher information metric is a bit more complicated:

\displaystyle{ g_{i j} = \frac{\delta_{i j}}{p_i} }

As before, g_{i j} is a formula for the dot product of the ith and jth standard basis vectors, but now it depends on where you are in the simplex of population distributions.

We saw how this formula arises from information theory back in Part 7. I won’t repeat the calculation, but the idea is this. Fix a population distribution p and consider the information of another one, say q, relative to this. We get I(q,p). If q = p this is zero:

\displaystyle{ \left. I(q,p)\right|_{q = p} = 0 }

and this point is a local minimum for the relative information. So, the first derivative of I(q,p) as we change q must be zero:

\displaystyle{ \left. \frac{\partial}{\partial q_i} I(q,p) \right|_{q = p} = 0 }

But the second derivatives are not zero. In fact, since we’re at a local minimum, it should not be surprising that we get a positive definite matrix of second derivatives:

\displaystyle{  g_{i j} = \left. \frac{\partial^2}{\partial q_i \partial q_j} I(q,p) \right|_{q = p} = 0 }

And, this is the Fisher information metric! So, the Fisher information metric is a way of taking dot products between vectors in the simplex of population distribution that’s based on the concept of relative information.

This is not the place to explain Riemannian geometry, but any metric gives a way to measure angles and distances, and thus a way to define the gradient of a function. After all, the gradient of a function should point at right angles to the level sets of that function, and its length should equal the slope of that function:

So, if we change our way of measuring angles and distances, we get a new concept of gradient! The ith component of this new gradient vector field turns out to b

(\nabla_g V)^i = g^{i j} \partial_j V

where g^{i j} is the inverse of the matrix g_{i j}, and we sum over the repeated index j. As a sanity check, make sure you see why this is the usual Euclidean gradient when g_{i j} = \delta_{i j}.

Now suppose the fitness landscape is the good old Euclidean gradient of some function. Then it turns out that the replicator equation is a special case of gradient flow on the space of population distributions… but where we use the Fisher information metric to define our concept of gradient!

To get a feel for this, it’s good to start with the Lotka–Volterra equation, which describes how the total number of individuals of each type changes. Suppose the fitness landscape is the Euclidean gradient of some function V:

\displaystyle{ f_i(P) = \frac{\partial V}{\partial P_i} }

Then the Lotka–Volterra equation becomes this:

\displaystyle{ \frac{d P_i}{d t} = \frac{\partial V}{\partial P_i} \, P_i }

This doesn’t look like the gradient flow equation, thanks to that annoying P_i on the right-hand side! It certainly ain’t the gradient flow coming from the function V and the usual Euclidean gradient. However, it is gradient flow coming from V and some other metric on the space

\{ P \in \mathbb{R}^n : \; P_i \ge 0 \}

For a proof, and the formula for this other metric, see Section 3.7 in this survey:

• Marc Harper, Information geometry and evolutionary game theory.

Now let’s turn to the replicator equation:

\displaystyle{ \frac{d p_i}{d t} = \left( f_i(p)  - \langle f(p) \rangle \right) \, p_i }

Again, if the fitness landscape is a Euclidean gradient, we can rewrite the replicator equation as a gradient flow equation… but again, not with respect to the Euclidean metric. This time we need to use the Fisher information metric! I sketch a proof in my paper above.

In fact, both these results were first worked out by Shahshahani:

• Siavash Shahshahani, A New Mathematical Framework for the Study of Linkage and Selection, Memoirs of the AMS 17, 1979.

New directions

All this is just the beginning! The ideas I just explained are unified in information geometry, where distance-like quantities such as the relative entropy and the Fisher information metric are studied. From here it’s a short walk to a very nice version of Fisher’s fundamental theorem of natural selection, which is familiar to researchers both in evolutionary dynamics and in information geometry.

You can see some very nice versions of this story for maximum likelihood estimators and linear programming here:

• Akio Fujiwara and Shun-ichi Amari, Gradient systems in view of information geometry, Physica D: Nonlinear Phenomena 80 (1995), 317–327.

Indeed, this seems to be the first paper discussing the similarities between evolutionary game theory and information geometry.

Dash Fryer (at Pomona College) and I have generalized this story in several interesting ways.

First, there are two famous ways to generalize the usual formula for entropy: Tsallis entropy and Rényi entropy, both of which involve a parameter q. There are Tsallis and Rényi versions of relative entropy and the Fisher information metric as well. Everything I just explained about:

• conditions under which relative entropy is a Lyapunov function for the replicator equation, and

• conditions under which the replicator equation is a special case of gradient flow

generalize to these cases! However, these generalized entropies give modified versions of the replicator equation. When we set q=1 we get back the usual story. See

• Marc Harper, Escort evolutionary game theory.

My initial interest in these alternate entropies was mostly mathematical—what is so special about the corresponding geometries?—but now researchers are starting to find populations that evolve according to these kinds of modified population dynamics! For example:

• A. Hernando et al, The workings of the Maximum Entropy Principle in collective human behavior.

There’s an interesting special case worth some attention. Lots of people fret about the relative entropy not being a distance function obeying the axioms that mathematicians like: for example, it doesn’t obey the triangle inequality. Many describe the relative entropy as a distance-like function, and this is often a valid interpretation contextually. On the other hand, the q=0 relative entropy is one-half the Euclidean distance squared! In this case the modified version of the replicator equation looks like this:

\displaystyle{ \frac{d p_i}{d t} = f_i(p) - \frac{1}{n} \sum_{j = 1}^n f_j(p) }

This equation is called the projection dynamic.

Later, I showed that there is a reasonable definition of relative entropy for a much larger family of geometries that satisfies a similar distance minimization property.

In a different direction, Dash showed that you can change the way that selection acts by using a variety of alternative ‘incentives’, extending the story to some other well-known equations describing evolutionary dynamics. By replacing the terms x_i f_i(x) in the replicator equation with a variety of other functions, called incentives, we can generate many commonly studied models of evolutionary dynamics. For instance if we exponentiate the fitness landscape (to make it always positive), we get what is commonly known as the logit dynamic. This amounts to changing the fitness landscape as follows:

\displaystyle{ f_i \mapsto \frac{x_i e^{\beta f_i}}{\sum_j{x_j e^{\beta f_j}}} }

where \beta is known as an inverse temperature in statistical thermodynamics and as an intensity of selection in evolutionary dynamics. There are lots of modified versions of the replicator equation, like the best-reply and projection dynamics, more common in economic applications of evolutionary game theory, and they can all be captured in this family. (There are also other ways to simultaneously capture such families, such as Bill Sandholm’s revision protocols, which were introduced earlier in his exploration of the foundations of game dynamics.)

Dash showed that there is a natural generalization of evolutionarily stable states to ‘incentive stable states’, and that for incentive stable states, the relative entropy is decreasing to zero when the trajectories get near the equilibrium. For the logit and projection dynamics, incentive stable states are simply evolutionarily stable states, and this happens frequently, but not always.

The third generalization is to look at different ‘time-scales’—that is, different ways of describing time! We can make up the symbol \mathbb{T} for a general choice of ‘time-scale’. So far I’ve been treating time as a real number, so

\mathbb{T} = \mathbb{R}

But we can also treat time as coming in discrete evenly spaced steps, which amounts to treating time as an integer:

\mathbb{T} = \mathbb{Z}

More generally, we can make the steps have duration h, where h is any positive real number:

\mathbb{T} = h\mathbb{Z}

There is a nice way to simultaneously describe the cases \mathbb{T} = \mathbb{R} and \mathbb{T} = h\mathbb{Z} using the time-scale calculus and time-scale derivatives. For the time-scale \mathbb{T} = \mathbb{R} the time-scale derivative is just the ordinary derivative. For the time-scale \mathbb{T} = h\mathbb{Z}, the time-scale derivative is given by the difference quotient from first year calculus:

\displaystyle{ f^{\Delta}(z) = \frac{f(z+h) - f(z)}{h} }

and using this as a substitute for the derivative gives difference equations like a discrete-time version of the replicator equation. There are many other choices of time-scale, such as the quantum time-scale given by \mathbb{T} = q^{\mathbb{Z}}, in which case the time-scale derivative is called the q-derivative, but that’s a tale for another time. In any case, the fact that the successive relative entropies are decreasing can be simply state by saying they have negative \mathbb{T} = h\mathbb{Z} time-scale derivative. The continuous case we started with corresponds to \mathbb{T} = \mathbb{R}.

Remarkably, Dash and I were able to show that you can combine all three of these generalizations into one theorem, and even allow for multiple interacting populations! This produces some really neat population trajectories, such as the following two populations with three types, with fitness functions corresponding to the rock-paper-scissors game. On top we have the replicator equation, which goes along with the Fisher information metric; on the bottom we have the logit dynamic, which goes along with the Euclidean metric on the simplex:

From our theorem, it follows that the relative entropy (ordinary relative entropy on top, the q = 0 entropy on bottom) converges to zero along the population trajectories.

The final form of the theorem is loosely as follows. Pick a Riemannian geometry given by a metric g (obeying some mild conditions) and an incentive for each population, as well as a time scale (\mathbb{R} or h \mathbb{Z}) for every population. This gives an evolutionary dynamic with a natural generalization of evolutionarily stable states, and a suitable version of the relative entropy. Then, if there is an evolutionarily stable state in the interior of the simplex, the time-scale derivative of sum of the relative entropies for each population will decrease as the trajectories converge to the stable state!

When there isn’t such a stable state, we still get some interesting population dynamics, like the following:


See this paper for details:

• Marc Harper and Dashiell E. A. Fryer, Stability of evolutionary dynamics on time scales.

Next time we’ll see how to make the main idea work in finite populations, without derivatives or deterministic trajectories!


Wormholes and Entanglement

20 January, 2014

 

An apparent contradiction in what most physicists believe about black holes—the ‘firewall problem’—is making some very good physicists reach for some very crazy-sounding ideas to find a way out. In particular, Maldacena and Susskind have come up with the idea that any pair of quantum-entangled particles is actually connected by a wormhole.

Entanglement is a spooky way for far-away particles to be correlated, but you can’t use it communicate faster than light. It’s been seen in the lab, but it’s only possible thanks to quantum mechanics. The first people to make a fuss over entanglement were Einstein, Podolsky and Rosen, back in 1935.

A wormhole is a spooky way for far-away regions of space to be connected by a kind of ‘tunnel’—but you probably can’t use it to communicate faster than light. Nobody has ever seen one, but they’re theoretically possible thanks to general relativity. The first people to make a fuss over wormholes were Einstein and Rosen, back in 1935.

So, superficially, it makes sense that there should be a connection between wormholes and entanglement. But when you learn enough physics, you’ll see that Maldacena and Susskind’s proposal sounds completely hare-brained.

But when you learn more physics—maybe more than enough?—you might decide there’s some merit to this idea after all. At the Centre for Quantum Technologies last summer, Jamie Vicary and I noticed some interesting connections between wormholes and quantum entanglement. We now have a paper out!

In it, we study quantum gravity in a universe where space is just 2-dimensional, not 3-dimensional like ours. It’s not realistic, but it has one huge advantage: there’s a working theory of what quantum gravity could be like when space is 2-dimensional, so you can calculate stuff!

So, we calculate what happens when a wormhole forms, and we show the ends look like a particle and its antiparticle (this was already known), and we note that this particle-antiparticle pair is entangled. In fact it’s completely entangled: any piece of information you might want to know about one can also be found in the other.

However, in a sense that Jamie and I make precise, this entanglement is ‘fake’. The reason is that the two ends of the wormhole are not independent things. They’re just two views of the same thing… and, technically, it doesn’t count as entanglement when something is ‘entangled with itself’. This fact is crucial to how Maldacena and Susskind want to get around the firewall problem.

For more details, try this:

Wormholes and entanglement, The n-Category Café.

This has links to other stuff, including our paper, but also some blog articles explaining the firewall problem, the paper by Maldacena and Susskind, and the original Einstein–Podolsky–Rosen and Einstein–Rosen papers (in English).

Since this quantum gravity stuff is more suited to the n-Category Café than here, I won’t enable comments here. If you want to talk, please go there. Sorry!


Unreliable Biomedical Research

13 January, 2014

An American drug company, Amgen, that tried to replicate 53 landmark studies in cancer was able to reproduce the original results in only 6 cases—even though they worked with the original researchers!

That’s not all. Scientists at the pharmaceutical company Bayer were able to reproduce the published results in just a quarter of 67 studies!

How could things be so bad? The picture here shows two reasons:

If most interesting hypotheses are false, a lot of positive results will be ‘false positives’. Negative results may be more reliable. But few people publish negative results, so we miss out on those!

And then there’s wishful thinking, sloppiness and downright fraud. Read this Economist article for more on the problems—and how to fix them:

Trouble at the lab, Economist, 18 October 2013.

That’s where I got the picture above.


Geometry Puzzles

12 January, 2014

Here are four puzzles about areas, in approximate order of increasing difficulty.

Mysteries of the equilateral triangle


Puzzle: Show the area of the orange circle equals the total area of the two blue regions.

In case you’re wondering, the picture above shows an equilateral triangle with a small circle inscribed in it and a big circle circumscribed around it.

The puzzle is easy if you think about it the right way. I never knew this cool fact until last month, when I read this fun free book:

• Brian McCartin, Mysteries of the equilateral triangle.

I’ve given talks on the numbers 5, 8, and 24. I’ve imagined writing a book that had chapters on all my favorite natural numbers, starting with Chapter 0. But it would be very hard to write Chapter 3, since this number has too many interesting properties! McCartin’s book covers a lot of them.

Square the lune to the tune of Claire de Lune


Puzzle: Show the crescent has the same area as the triangle.

Greek mathematicians really wanted to square the circle, by which I mean: use straightedge and compass to first draw a circle and then construct a square with the same area.

In 440 BC, Hippocrates of Chios figured out how to square the above crescent-shaped region, which lies between the circle centered at O and the smaller circle centered at D. So, this region is called the Lune of Hippocrates.

I’ve heard it said that this result gave some Greek geometers hope that the circle could be squared. I’m not sure this is true; Hippocrates himself was probably too smart to be fooled. But in any event, it would have been a false hope. Much later, around 1885, Lindemann and Weierstrass proved that squaring the circle was impossible.

Any crescent-shaped region formed by two circular arcs is called a lune. It’s widely believed that there are only 5 squarable lunes. In other words, there are only 5 shapes of lune constructible by straightedge and compass whose area equals that of a square constructible using straightedge and compass. (Obviously these lunes can come in many different sizes.)

Hippocrates discovered three squarable lunes. Two more were discovered by Martin Johan Wallenius in 1766. A proof that these are the only squarable lunes was given by Tchebatorew and Dorodnow, and summarized by the famous topologist Postnikov:

• M. M. Postnikov, The problem of squarable Lunes, translated from the Russian by Abe Shenitzer, American Mathematical Monthly, 107 (Aug.-Sep. 2000), 645–651.

However, there may be a loophole in this proof: Will Jagy claims that these Russians assumed without proof that the ratio of two angles involved in the construction of the lune is rational. With this assumption, finding a squarable lune amounts to finding a rational number u and a constructible number \sin a such that

\sin ( u a) = \sqrt{u} \sin a

Puzzle: Why should u be rational? Do you know the true state of the art here?

(My puzzles include hard questions I don’t know the answer to, and this is one.)

For a nice discussion of the 5 squarable lunes, with pictures, see this page:

The five squarable lunes, MathPages.

Twice in a blue lune


Puzzle: Show the area of this right triangle equals the total area inside the blue lunes. The outside of each lune is a semicircle. The inside of each lune is part of the circle containing the points A, B, and C.

The circle through all 3 corners of a triangle is called its circumcircle. You can construct the circumcircle using a straightedge and compass, if you want.

Again this is a famous old problem. The two blue lunes here are called the Lunes of Alhazen. This problem was later posed and solve by Leonardo da Vinci!

This puzzle is a lot of fun, so I urge you not to give up—but if you do, you can see da Vinci’s solution here.

The arbelos


Puzzle: show the area of the green shape equals the area of the circle.

The green shape is called an arbelos, which means ‘shoemaker’s knife’ in Greek, since it looks a bit like that. It shows up in Propositions 4-8 of the Book of Lemmas. This book goes back at least to Thābit ibn Qurra, a mathematician, astronomer and physician who live in Baghdad from 826 to 901 AD. Ibn Qurra said the book was written by Archimedes! But nobody is sure.

So, when you do this puzzle, you may be matching wits with Archimedes. But you’ll certainly be sharing some thoughts with Thābit ibn Qurra.

Math has been called the world’s longest conversation. Perhaps this is an exaggeration: people have been passing on stories for a long time. But Babylonians were computing the square root of two back in 1700 BC, probably using techniques that are still interesting today… so it really is remarkable how old mathematics still makes good sense, unlike the old creation myths.

For more on the arbelos try:

Arbelos, Wikipedia.

and

• Thomas Schoch, Arbelos references, 2013.

For the 15 propositions in the Book of Lemmas, see:

Book of Lemmas, Wikipedia.

Two semicircles is half a circle


Puzzle: show the total area of the two semicircles is half the area of the large circle.

Amazingly, it seems this fact was noticed only in 2011:

• Andrew K. Jobbings, Two semicircles fill half a circle, The Mathematical Gazette 95 (Nov. 2011), 538–540.

However, Andrew Jobbings is a genius when it comes to ‘recreational mathematics’. For more of his work, check out this page:

• Andrew Jobbings, Arbelos.

I learned about this puzzle from Alexander Bogomolny, who has a wonderful website full of Javascript geometry demonstrations:

• Alexander Bogomolny, A property of semicircles.

You’ll get a scary warning asking “Do you want to run this application?” Say yes. You’ll get an applet that lets you slide the point where the semicircles touch: no matter where it is, the semicircles have the same total area! Click “hint” and you’ll get a hint. If you’re still stuck, and too impatient to solve the puzzle yourself, scroll down and see a proof!

However, that proof is long and confusing. With geometry I like demonstrations where after some thought you can simply see that the result is true. For this puzzle, such a demonstration was provided by Greg Egan.

Click here if you give up on this puzzle and want to see Egan’s solution. You may need to ponder it a bit, but when you’re done you should say “yes, this result is clear!

As a separate hint: the answers to this puzzle and the previous one are similar, in a very nice way!


Lyapunov Functions for Complex-Balanced Systems

7 January, 2014

guest post by Manoj Gopalkrishnan

A few weeks back, I promised to tell you more about a long-standing open problem in reaction networks, the ‘global attractor conjecture’. I am not going to quite get there today, but we shall take one step in that direction.

Today’s plan is to help you make friends with a very useful function we will call the ‘free energy’ which comes up all the time in the study of chemical reaction networks. We will see that for complex-balanced systems, the free energy function decreases along trajectories of the rate equation. I’m going to explain this statement, and give you most of the proof!

The point of doing all this work is that we will then be able to invoke Lyapunov’s theorem which implies stability of the dynamics. In Greek mythology, Sisyphus was cursed to roll a boulder up a hill only to have it roll down again, so that he had to keep repeating the task for all eternity. When I think of an unstable equilibrium, I imagine a boulder delicately balanced on top of a hill, which will fall off if given the slightest push:

or, more abstractly:

On the other hand, I picture a stable equilibrium as a pebble at the very bottom of a hill. Whichever way a perturbation takes it is up, so it will roll down again to the bottom:

Lyapunov’s theorem guarantees stability provided we can exhibit a nice enough function V that decreases along trajectories. ‘Nice enough’ means that, viewing V as a height function for the hill, the equilibrium configuration should be at the bottom, and every direction from there should be up. If Sisyphus had dug a pit at the top of the hill for the boulder to rest in, Lyapunov’s theorem would have applied, and he could have gone home to rest. The moral of the story is that it pays to learn dynamical systems theory!

Because of the connection to Lyapunov’s theorem, such functions that decrease along trajectories are also called Lyapunov functions. A similar situation is seen in Boltzmann’s H-theorem, and hence such functions are sometimes called H-functions by physicists.

Another reason for me to talk about these ideas now is that I have posted a new article on the arXiv:

• Manoj Gopalkrishnan, On the Lyapunov function for complex-balanced mass-action systems.

The free energy function in chemical reaction networks goes back at least to 1972, to this paper:

• Friedrich Horn and Roy Jackson, General mass action kinetics, Arch. Rational Mech. Analysis 49 (1972), 81–116.

Many of us credit Horn and Jackson’s paper with starting the mathematical study of reaction networks. My paper is an exposition of the main result of Horn and Jackson, with a shorter and simpler proof. The gain comes because Horn and Jackson proved all their results from scratch, whereas I’m using some easy results from graph theory, and the log-sum inequality.

We shall be talking about reaction networks. Remember the idea from the network theory series. We have a set S whose elements are called species, for example

S = \{ \mathrm{H}_2\mathrm{O}, \mathrm{H}^+, \mathrm{OH}^- \}

A complex is a vector of natural numbers saying how many items of each species we have. For example, we could have a complex (2,3,1). But chemists would usually write this as

2 \mathrm{H}_2\mathrm{O} + 3 \mathrm{H}^+ + \mathrm{OH}^-

A reaction network is a set S of species and a set T of transitions or reactions, where each transition \tau \in T goes from some complex m(\tau) to some complex n(\tau). For example, we could have a transition \tau with

m(\tau) = \mathrm{H}_2\mathrm{O}

and

n(\tau) = \mathrm{H}^+ + \mathrm{OH}^-

In this situation chemists usually write

\mathrm{H}_2\mathrm{O} \to \mathrm{H}^+ + \mathrm{OH}^-

but we want names like \tau for our transitions, so we might write

\tau : \mathrm{H}_2\mathrm{O} \to \mathrm{H}^+ + \mathrm{OH}^-

or

\mathrm{H}_2\mathrm{O} \stackrel{\tau}{\longrightarrow} \mathrm{H}^+ + \mathrm{OH}^-

As John explained in Part 3 of the network theory series, chemists like to work with a vector of nonnegative real numbers x(t) saying the concentration of each species at time t. If we know a rate constant r(\tau) > 0 for each transition \tau, we can write down an equation saying how these concentrations change with time:

\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T} r(\tau) (n(\tau) - m(\tau)) x^{m(\tau)} }

This is called the rate equation. It’s really a system of ODEs describing how the concentration of each species change with time. Here an expression like x^m is shorthand for the monomial {x_1}^{m_1} \cdots {x_k}^{m_k}.

John and Brendan talked about complex balance in Part 9. I’m going to recall this definition, from a slightly different point of view that will be helpful for the result we are trying to prove.

We can draw a reaction network as a graph! The vertices of this graph are all the complexes m(\tau), n(\tau) where \tau \in T. The edges are all the transitions \tau\in T. We think of each edge \tau as directed, going from m(\tau) to n(\tau).

We will call the map that sends each transition \tau to the positive real number r(\tau) x^{m(\tau)} the flow f_x(\tau) on this graph. The rate equation can be rewritten very simply in terms of this flow as:

\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T}(n(\tau) - m(\tau)) \, f_x(\tau) }

where the right-hand side is now a linear expression in the flow f_x.

Flows of water, or electric current, obey a version of Kirchhoff’s current law. Such flows are called conservative flows. The following two lemmas from graph theory are immediate for conservative flows:

Lemma 1. If f is a conservative flow then the net flow across every cut is zero.

A cut is a way of chopping the graph in two, like this:


It’s easy to prove Lemma 1 by induction, moving one vertex across the cut at a time.

Lemma 2. If a conservative flow exists then every edge \tau\in T is part of a directed cycle.

Why is Lemma 2 true? Suppose there exists an edge \tau : m \to n that is not part of any directed cycle. We will exhibit a cut with non-zero net flow. By Lemma 1, this will imply that the flow is not conservative.

One side of the cut will consist of all vertices from which m is reachable by a directed path in the reaction network. The other side of the cut contains at least n, since m is not reachable from n, by the assumption that \tau is not part of a directed cycle. There is flow going from left to right of the cut, across the transition \tau. Since there can be no flow coming back, this cut has nonzero net flow, and we’re done.     ▮

Now, back to the rate equation! We can ask if the flow f_x is conservative. That is, we can ask if, for every complex n:

\displaystyle{ \sum_{\tau : m \to n} f_x(m,n) = \sum_{\tau : n \to p} f_x(n,p). }

In words, we are asking if the sum of the flow through all transitions coming in to n equals the sum of the flow through all transitions going out of n. If this condition is satisfied at a vector of concentrations x = \alpha, so that the flow f_\alpha is conservative, then we call \alpha a point of complex balance. If in addition, every component of \alpha is strictly positive, then we say that the system is complex balanced.

Clearly if \alpha is a point of complex balance, it’s an equilibrium solution of the rate equation. In other words, x(t) = \alpha is a solution of the rate equation, where x(t) never changes.

I’m using ‘equilibrium’ the way mathematicians do. But I should warn you that chemists use ‘equilibrium’ to mean something more than merely a solution that doesn’t change with time. They often also mean it’s a point of complex balance, or even more. People actually get into arguments about this at conferences.

Complex balance implies more than mere equilibrium. For starters, if a reaction network is such that every edge belongs to a directed cycle, then one says that the reaction network is weakly reversible. So Lemmas 1 and 2 establish that complex-balanced systems must be weakly reversible!

From here on, we fix a complex-balanced system, with \alpha a strictly positive point of complex balance.

Definition. The free energy function is the function

g_\alpha(x) = \sum_{s\in S} x_s \log x_s - x_s - x_s \log \alpha_s

where the sum is over all species in S.

The whole point of defining the function this way is because it is the unique function, up to an additive constant, whose partial derivative with respect to x_s is \log x_s/\alpha_s. This is important enough that we write it as a lemma. To state it in a pithy way, it is helpful to introduce vector notation for division and logarithms. If x and y are two vectors, we will understand x/y to mean the vector z such that z_s = x_s/ y_s coordinate-wise. Similarly \log x is defined in a coordinate-wise sense as the vector with coordinates (\log x)_s = \log x_s.

Lemma 3. The gradient \nabla g_\alpha(x) of g_\alpha(x) equals \log(x/\alpha).

We’re ready to state our main theorem!

Theorem. Fix a trajectory x(t) of the rate equation. Then g_\alpha(x(t)) is a decreasing function of time t. Further, it is strictly decreasing unless x(t) is an equilibrium solution of the rate equation.

I find precise mathematical statements reassuring. You can often make up your mind about the truth value from a few examples. Very often, though not always, a few well-chosen examples are all you need to get the general idea for the proof. Such is the case for the above theorem. There are three key examples: the two-cycle, the three-cycle, and the figure-eight.

The two-cycle. The two-cycle is this reaction network:

It has two complexes m and n and two transitions \tau_1 = m\to n and \tau_2 = n\to m, with rates r_1 = r(\tau_1) and r_2 = r(\tau_2) respectively.

Fix a solution x(t) of the rate equation. Then the flow from m to n equals r_1 x^m and the backward flow equals r_2 x^n. The condition for f_\alpha to be a conservative flow requires that f_\alpha = r_1 \alpha^m = r_2 \alpha^n. This is one binomial equation in at least one variable, and clearly has a solution in the positive reals. We have just shown that every two-cycle is complex balanced.

The derivative d g_\alpha(x(t))/d t can now be computed by the chain rule, using Lemma 3. It works out to f_\alpha times

\displaystyle{ \left((x/\alpha)^m - (x/\alpha)^n\right) \, \log\frac{(x/\alpha)^n}{(x/\alpha)^m} }

This is never positive, and it’s zero if and only if

(x/\alpha)^m = (x/\alpha)^n

Why is this? Simply because the logarithm of something greater than 1 is positive, while the log of something less than 1 is negative, so that the sign of (x/\alpha)^m - (x/\alpha)^n is always opposite the sign of \log \frac{(x/\alpha)^n}{(x/\alpha)^m}. We have verified our theorem for this example.

(Note that (x/\alpha)^m = (x/\alpha)^n occurs when x = \alpha, but also at other points: in this example, there is a whole hypersurface consisting of points of complex balance.)

In fact, this simple calculation achieves much more.

Definition. A reaction network is reversible if for every transition \tau  : m \to n there is a transition \tau' : m \to n going back, called the reverse of \tau. Suppose we have a reversible reaction network and a vector of concentrations \alpha such that the flow along each edge equals that along the edge going back:

f_\alpha(\tau) = f_\alpha(\tau')

whenever \tau' is the reverse \tau. Then we say the reaction network is detailed balanced, and \alpha is a point of detailed balance.

For a detailed-balanced system, the time derivative of g_\alpha is a sum over the contributions of pairs consisting of an edge and its reverse. Hence, the two-cycle calculation shows that the theorem holds for all detailed balanced systems!

This linearity trick is going to prove very valuable. It will allow us to treat the general case of complex balanced systems one cycle at a time. The proof for a single cycle is essentially contained in the example of a three-cycle, which we treat next:

The three-cycle. The three-cycle is this reaction network:

We assume that the system is complex balanced, so that

f_\alpha(m_1\to m_2) = f_\alpha(m_2\to m_3) = f_\alpha(m_3\to m_1)

Let us call this nonnegative number f_\alpha. A small calculation employing the chain rule shows that d g_\alpha(x(t))/d t equals f_\alpha times

\displaystyle{ (x/\alpha)^{m_1}\, \log\frac{(x/\alpha)^{m_2}}{(x/\alpha)^{m_1}} \; + }

\displaystyle{ (x/\alpha)^{m_2} \, \log\frac{(x/\alpha)^{m_3}}{(x/\alpha)^{m_2}} \; + }

\displaystyle{  (x/\alpha)^{m_3}\, \log\frac{(x/\alpha)^{m_1}}{(x/\alpha)^{m_3}} }

We need to think about the sign of this quantity:

Lemma 3. Let a,b,c be positive numbers. Then a \log b/a + b\log c/b + c\log a/c is less than or equal to zero, with equality precisely when a=b=c.

The proof is a direct application of the log sum inequality. In fact, this holds not just for three numbers, but for any finite list of numbers. Indeed, that is precisely how one obtains the proof for cycles of arbitrary length. Even the two-cycle proof is a special case! If you are wondering how the log sum inequality is proved, it is an application of Jensen’s inequality, that workhorse of convex analysis.

The three-cycle calculation extends to a proof for the theorem so long as there is no directed edge that is shared between two directed cycles. When there are such edges, we need to argue that the flows f_\alpha and f_x can be split between the cycles sharing that edge in a consistent manner, so that the cycles can be analyzed independently. We will need the following simple lemma about conservative flows from graph theory. We will apply this lemma to the flow f_\alpha.

Lemma 4. Let f be a conservative flow on a graph G. Then there exist directed cycles C_1, C_2,\dots, C_k in G, and nonnegative real ‘flows’ f_1,f_2,\dots,f_k \in [0,\infty] such that for each directed edge e in G, the flow f(e) equals the sum of f_i over i such the cycle C_i contains the edge e.

Intuitively, this lemma says that conservative flows come from constant flows on the directed cycles of the graph. How does one show this lemma? I’m sure there are several proofs, and I hope some of you can share some of the really neat ones with me. The one I employed was algorithmic. The idea is to pick a cycle, any cycle, and subtract the maximum constant flow that this cycle allows, and repeat. This is most easily understood by looking at the example of the figure-eight:

The figure-eight. This reaction network consists of two three-cycles sharing an edge:

Here’s the proof for Lemma 4. Let f be a conservative flow on this graph. We want to exhibit cycles and flows on this graph according to Lemma 4. We arbitrarily pick any cycle in the graph. For example, in the figure-eight, suppose we pick the cycle u_0\to u_1\to u_2\to u_0. We pick an edge in this cycle on which the flow is minimum. In this case, f(u_0\to u_1) = f(u_2\to u_0) is the minimum. We define a remainder flow by subtracting from f this constant flow which was restricted to one cycle. So the remainder flow is the same as f on edges that don’t belong to the picked cycle. For edges that belong to the cycle, the remainder flow is f minus the minimum of f on this cycle. We observe that this remainder flow satisfies the conditions of Lemma 4 on a graph with strictly fewer edges. Continuing in this way, since the lemma is trivially true for the empty graph, we are done by infinite descent.

Now that we know how to split the flow f_\alpha across cycles, we can figure out how to split the rates across the different cycles. This will tell us how to split the flow f_x across cycles. Again, this is best illustrated by an example.

The figure-eight. Again, this reaction network looks like

Suppose as in Lemma 4, we obtain the cycles

C_1 = u_0\to u_1\to u_2\to u_0

with constant flow f_\alpha^1

and

C_2 = u_3\to u_1\to u_2\to u_3 with constant flow f_\alpha^2 such that

f_\alpha^1 + f_\alpha^2 = f_\alpha(u_1\to u_2)

Here’s the picture:

Then we obtain rates r^1(u_1\to u_2) and r^2(u_1\to u_2) by solving the equations

f^1_\alpha = r^1(u_1\to u_2) \alpha^{u_1}

f^2_\alpha = r^2(u_1\to u_2) \alpha^{u_2}

Using these rates, we can define non-constant flows f^1_x on C_1 and f^2_x on C_2 by the usual formulas:

f^1_x(u_1\to u_2) = r^1(u_1\to u_2) x^{u_1}

and similarly for f^2_x. In particular, this gives us

f^1_x(u_1\to u_2)/f^1_\alpha = (x/\alpha)^{u_1}

and similarly for f^2_x.

Using this, we obtain the proof of the Theorem! The time derivative of g_\alpha along a trajectory has a contribution from each cycle C as in Lemma 4, where each cycle is treated as a separate system with the new rates r^C, and the new flows f^C_\alpha and f^C_x. So, we’ve reduced the problem to the case of a cycle, which we’ve already done.

Let’s review what happened. The time derivative of the function g_\alpha has a very nice form, which is linear in the flow f_x. The reaction network can be broken up into cycles. Th e conservative flow f_\alpha for a complex balanced system can be split into conservative flows on cycles by Lemma 4. This informs us how to split the non-conservative flow f_x across cycles. By linearity of the time derivative, we can separately treat the case for every cycle. For each cycle, we get an expression to which the log sum inequality applies, giving us the final result that g_\alpha decreases along trajectories of the rate equation.

Now that we have a Lyapunov function, we will put it to use to obtain some nice theorems about the dynamics, and finally state the global attractor conjecture. All that and more, in the next blog post!


Follow

Get every new post delivered to your Inbox.

Join 3,095 other followers