Lyapunov Functions for Complex-Balanced Systems

7 January, 2014

guest post by Manoj Gopalkrishnan

A few weeks back, I promised to tell you more about a long-standing open problem in reaction networks, the ‘global attractor conjecture’. I am not going to quite get there today, but we shall take one step in that direction.

Today’s plan is to help you make friends with a very useful function we will call the ‘free energy’ which comes up all the time in the study of chemical reaction networks. We will see that for complex-balanced systems, the free energy function decreases along trajectories of the rate equation. I’m going to explain this statement, and give you most of the proof!

The point of doing all this work is that we will then be able to invoke Lyapunov’s theorem which implies stability of the dynamics. In Greek mythology, Sisyphus was cursed to roll a boulder up a hill only to have it roll down again, so that he had to keep repeating the task for all eternity. When I think of an unstable equilibrium, I imagine a boulder delicately balanced on top of a hill, which will fall off if given the slightest push:

or, more abstractly:

On the other hand, I picture a stable equilibrium as a pebble at the very bottom of a hill. Whichever way a perturbation takes it is up, so it will roll down again to the bottom:

Lyapunov’s theorem guarantees stability provided we can exhibit a nice enough function V that decreases along trajectories. ‘Nice enough’ means that, viewing V as a height function for the hill, the equilibrium configuration should be at the bottom, and every direction from there should be up. If Sisyphus had dug a pit at the top of the hill for the boulder to rest in, Lyapunov’s theorem would have applied, and he could have gone home to rest. The moral of the story is that it pays to learn dynamical systems theory!

Because of the connection to Lyapunov’s theorem, such functions that decrease along trajectories are also called Lyapunov functions. A similar situation is seen in Boltzmann’s H-theorem, and hence such functions are sometimes called H-functions by physicists.

Another reason for me to talk about these ideas now is that I have posted a new article on the arXiv:

• Manoj Gopalkrishnan, On the Lyapunov function for complex-balanced mass-action systems.

The free energy function in chemical reaction networks goes back at least to 1972, to this paper:

• Friedrich Horn and Roy Jackson, General mass action kinetics, Arch. Rational Mech. Analysis 49 (1972), 81–116.

Many of us credit Horn and Jackson’s paper with starting the mathematical study of reaction networks. My paper is an exposition of the main result of Horn and Jackson, with a shorter and simpler proof. The gain comes because Horn and Jackson proved all their results from scratch, whereas I’m using some easy results from graph theory, and the log-sum inequality.

We shall be talking about reaction networks. Remember the idea from the network theory series. We have a set S whose elements are called species, for example

S = \{ \mathrm{H}_2\mathrm{O}, \mathrm{H}^+, \mathrm{OH}^- \}

A complex is a vector of natural numbers saying how many items of each species we have. For example, we could have a complex (2,3,1). But chemists would usually write this as

2 \mathrm{H}_2\mathrm{O} + 3 \mathrm{H}^+ + \mathrm{OH}^-

A reaction network is a set S of species and a set T of transitions or reactions, where each transition \tau \in T goes from some complex m(\tau) to some complex n(\tau). For example, we could have a transition \tau with

m(\tau) = \mathrm{H}_2\mathrm{O}

and

n(\tau) = \mathrm{H}^+ + \mathrm{OH}^-

In this situation chemists usually write

\mathrm{H}_2\mathrm{O} \to \mathrm{H}^+ + \mathrm{OH}^-

but we want names like \tau for our transitions, so we might write

\tau : \mathrm{H}_2\mathrm{O} \to \mathrm{H}^+ + \mathrm{OH}^-

or

\mathrm{H}_2\mathrm{O} \stackrel{\tau}{\longrightarrow} \mathrm{H}^+ + \mathrm{OH}^-

As John explained in Part 3 of the network theory series, chemists like to work with a vector of nonnegative real numbers x(t) saying the concentration of each species at time t. If we know a rate constant r(\tau) > 0 for each transition \tau, we can write down an equation saying how these concentrations change with time:

\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T} r(\tau) (n(\tau) - m(\tau)) x^{m(\tau)} }

This is called the rate equation. It’s really a system of ODEs describing how the concentration of each species changes with time. Here an expression like x^m is shorthand for the monomial {x_1}^{m_1} \cdots {x_k}^{m_k}.
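To make this concrete, here is a minimal sketch in Python of the right-hand side of the rate equation. It uses the water example above, but the rate constants and the test concentrations are invented for illustration.

import numpy as np

# Species indices: 0 = H2O, 1 = H+, 2 = OH-
# Each transition tau is a triple (m, n, r): source complex m(tau),
# target complex n(tau), and rate constant r(tau).
transitions = [
    (np.array([1, 0, 0]), np.array([0, 1, 1]), 2.0),  # H2O -> H+ + OH-   (hypothetical rate)
    (np.array([0, 1, 1]), np.array([1, 0, 0]), 1.0),  # H+ + OH- -> H2O   (hypothetical rate)
]

def rate_equation_rhs(x, transitions):
    # dx/dt = sum over tau of r(tau) (n(tau) - m(tau)) x^m(tau)
    dxdt = np.zeros_like(x, dtype=float)
    for m, n, r in transitions:
        flow = r * np.prod(x ** m)          # the rate constant times the monomial x^m
        dxdt += (n - m) * flow
    return dxdt

x = np.array([1.0, 0.2, 0.3])               # some concentrations
print(rate_equation_rhs(x, transitions))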

John and Brendan talked about complex balance in Part 9. I’m going to recall this definition, from a slightly different point of view that will be helpful for the result we are trying to prove.

We can draw a reaction network as a graph! The vertices of this graph are all the complexes m(\tau), n(\tau) where \tau \in T. The edges are all the transitions \tau\in T. We think of each edge \tau as directed, going from m(\tau) to n(\tau).

We call the map f_x that sends each transition \tau to the positive real number f_x(\tau) = r(\tau) \, x^{m(\tau)} the flow on this graph. The rate equation can be rewritten very simply in terms of this flow as:

\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T}(n(\tau) - m(\tau)) \, f_x(\tau) }

where the right-hand side is now a linear expression in the flow f_x.

Flows of water, or electric current, obey a version of Kirchhoff’s current law. Such flows are called conservative flows. The following two lemmas from graph theory are immediate for conservative flows:

Lemma 1. If f is a conservative flow then the net flow across every cut is zero.

A cut is a way of chopping the graph in two, like this:


It’s easy to prove Lemma 1 by induction, moving one vertex across the cut at a time.

Lemma 2. If a conservative flow exists then every edge \tau\in T is part of a directed cycle.

Why is Lemma 2 true? Suppose there exists an edge \tau : m \to n that is not part of any directed cycle. We will exhibit a cut with non-zero net flow. By Lemma 1, this will imply that the flow is not conservative.

One side of the cut will consist of all vertices from which m is reachable by a directed path in the reaction network. The other side of the cut contains at least n, since m is not reachable from n, by the assumption that \tau is not part of a directed cycle. There is flow going from left to right of the cut, across the transition \tau. Since there can be no flow coming back, this cut has nonzero net flow, and we’re done.     ▮

Now, back to the rate equation! We can ask if the flow f_x is conservative. That is, we can ask if, for every complex n:

\displaystyle{ \sum_{\tau : m \to n} f_x(\tau) = \sum_{\tau : n \to p} f_x(\tau). }

In words, we are asking if the sum of the flow through all transitions coming in to n equals the sum of the flow through all transitions going out of n. If this condition is satisfied at a vector of concentrations x = \alpha, so that the flow f_\alpha is conservative, then we call \alpha a point of complex balance. If in addition, every component of \alpha is strictly positive, then we say that the system is complex balanced.
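Continuing the little sketch above, checking complex balance at a candidate point \alpha amounts to summing the flows into and out of each complex. The reactions and the point \alpha below are the same hypothetical ones as before.

import numpy as np

transitions = [
    (np.array([1, 0, 0]), np.array([0, 1, 1]), 2.0),  # H2O -> H+ + OH-
    (np.array([0, 1, 1]), np.array([1, 0, 0]), 1.0),  # H+ + OH- -> H2O
]

def is_complex_balanced(alpha, transitions, tol=1e-9):
    # The flow f_alpha is conservative iff, at every complex, inflow equals outflow.
    net = {}                                          # complex -> outflow minus inflow
    for m, n, r in transitions:
        f = r * np.prod(alpha ** m)                   # f_alpha(tau)
        net[tuple(m)] = net.get(tuple(m), 0.0) + f    # flow out of m
        net[tuple(n)] = net.get(tuple(n), 0.0) - f    # flow into n
    return all(abs(v) < tol for v in net.values())

# Complex balance here means 2.0 * alpha_0 = 1.0 * alpha_1 * alpha_2, for example:
alpha = np.array([1.0, 2.0, 1.0])
print(is_complex_balanced(alpha, transitions))        # True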

Clearly if \alpha is a point of complex balance, it’s an equilibrium solution of the rate equation. In other words, x(t) = \alpha is a solution of the rate equation, where x(t) never changes.

I’m using ‘equilibrium’ the way mathematicians do. But I should warn you that chemists use ‘equilibrium’ to mean something more than merely a solution that doesn’t change with time. They often also mean it’s a point of complex balance, or even more. People actually get into arguments about this at conferences.

Complex balance implies more than mere equilibrium. For starters, if a reaction network is such that every edge belongs to a directed cycle, then one says that the reaction network is weakly reversible. So Lemmas 1 and 2 establish that complex-balanced systems must be weakly reversible!

From here on, we fix a complex-balanced system, with \alpha a strictly positive point of complex balance.

Definition. The free energy function is the function

g_\alpha(x) = \sum_{s\in S} \left( x_s \log x_s - x_s - x_s \log \alpha_s \right)

where the sum is over all species in S.

The whole point of defining the function this way is that it is the unique function, up to an additive constant, whose partial derivative with respect to x_s is \log(x_s/\alpha_s). This is important enough that we record it as a lemma. To state it in a pithy way, it is helpful to introduce vector notation for division and logarithms. If x and y are two vectors, we understand x/y to mean the vector with coordinates (x/y)_s = x_s/y_s. Similarly, \log x is defined coordinate-wise as the vector with coordinates (\log x)_s = \log x_s.

Lemma 3. The gradient \nabla g_\alpha(x) of g_\alpha(x) equals \log(x/\alpha).
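Here is a quick numerical sanity check of Lemma 3, a sketch with an arbitrary made-up \alpha and x: a finite-difference gradient of g_\alpha agrees with \log(x/\alpha).

import numpy as np

def g(x, alpha):
    # g_alpha(x) = sum over species of (x_s log x_s - x_s - x_s log alpha_s)
    return np.sum(x * np.log(x) - x - x * np.log(alpha))

alpha = np.array([1.0, 2.0, 1.0])
x = np.array([0.7, 1.3, 0.4])
eps = 1e-6
fd_grad = np.array([(g(x + eps * e, alpha) - g(x - eps * e, alpha)) / (2 * eps)
                    for e in np.eye(len(x))])
print(fd_grad)                              # finite-difference gradient...
print(np.log(x / alpha))                    # ...matches log(x/alpha), as in Lemma 3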

We’re ready to state our main theorem!

Theorem. Fix a trajectory x(t) of the rate equation. Then g_\alpha(x(t)) is a nonincreasing function of time t. Further, it is strictly decreasing unless x(t) is an equilibrium solution of the rate equation.

I find precise mathematical statements reassuring. You can often make up your mind about the truth value from a few examples. Very often, though not always, a few well-chosen examples are all you need to get the general idea for the proof. Such is the case for the above theorem. There are three key examples: the two-cycle, the three-cycle, and the figure-eight.

The two-cycle. The two-cycle is this reaction network:

It has two complexes m and n and two transitions \tau_1 : m\to n and \tau_2 : n\to m, with rates r_1 = r(\tau_1) and r_2 = r(\tau_2) respectively.

Fix a solution x(t) of the rate equation. Then the flow from m to n equals r_1 x^m and the backward flow equals r_2 x^n. The condition for f_\alpha to be conservative is that r_1 \alpha^m = r_2 \alpha^n; call this common value f_\alpha. This is a single equation between two monomials in the concentrations \alpha_s, and it clearly has a strictly positive solution. We have just shown that every two-cycle is complex balanced.

The derivative d g_\alpha(x(t))/d t can now be computed by the chain rule, using Lemma 3. It works out to f_\alpha times

\displaystyle{ \left((x/\alpha)^m - (x/\alpha)^n\right) \, \log\frac{(x/\alpha)^n}{(x/\alpha)^m} }

This is never positive, and it’s zero if and only if

(x/\alpha)^m = (x/\alpha)^n

Why is this? Simply because the logarithm of something greater than 1 is positive, while the log of something less than 1 is negative, so that the sign of (x/\alpha)^m - (x/\alpha)^n is always opposite the sign of \log \frac{(x/\alpha)^n}{(x/\alpha)^m}. We have verified our theorem for this example.

(Note that (x/\alpha)^m = (x/\alpha)^n occurs when x = \alpha, but also at other points: in this example, there is a whole hypersurface consisting of points of complex balance.)
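To see the theorem in action for the two-cycle, here is a sketch with invented rate constants and a crude Euler integration of the rate equation; g_\alpha never increases along the computed trajectory.

import numpy as np

m, n = np.array([1, 0, 0]), np.array([0, 1, 1])     # the two complexes
r1, r2 = 2.0, 1.0                                   # hypothetical rate constants
alpha = np.array([1.0, 2.0, 1.0])                   # complex balanced: r1*alpha^m = r2*alpha^n

def g(x):
    return np.sum(x * np.log(x) - x - x * np.log(alpha))

def rhs(x):
    return (n - m) * r1 * np.prod(x ** m) + (m - n) * r2 * np.prod(x ** n)

x = np.array([1.5, 0.5, 0.8])                       # initial concentrations
values = []
for _ in range(5000):
    values.append(g(x))
    x = x + 0.001 * rhs(x)                          # forward Euler step
print(all(b <= a + 1e-12 for a, b in zip(values, values[1:])))   # True: g_alpha is nonincreasing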

In fact, this simple calculation achieves much more.

Definition. A reaction network is reversible if for every transition \tau : m \to n there is a transition \tau' : n \to m going back, called the reverse of \tau. Suppose we have a reversible reaction network and a vector of concentrations \alpha such that the flow along each edge equals that along the edge going back:

f_\alpha(\tau) = f_\alpha(\tau')

whenever \tau' is the reverse of \tau. Then we say the reaction network is detailed balanced, and \alpha is a point of detailed balance.

For a detailed-balanced system, the time derivative of g_\alpha is a sum over the contributions of pairs consisting of an edge and its reverse. Hence, the two-cycle calculation shows that the theorem holds for all detailed balanced systems!

This linearity trick is going to prove very valuable. It will allow us to treat the general case of complex balanced systems one cycle at a time. The proof for a single cycle is essentially contained in the example of a three-cycle, which we treat next:

The three-cycle. The three-cycle is this reaction network:

We assume that the system is complex balanced, so that

f_\alpha(m_1\to m_2) = f_\alpha(m_2\to m_3) = f_\alpha(m_3\to m_1)

Let us call this positive number f_\alpha. A small calculation employing the chain rule shows that d g_\alpha(x(t))/d t equals f_\alpha times

\displaystyle{ (x/\alpha)^{m_1}\, \log\frac{(x/\alpha)^{m_2}}{(x/\alpha)^{m_1}} \; + }

\displaystyle{ (x/\alpha)^{m_2} \, \log\frac{(x/\alpha)^{m_3}}{(x/\alpha)^{m_2}} \; + }

\displaystyle{  (x/\alpha)^{m_3}\, \log\frac{(x/\alpha)^{m_1}}{(x/\alpha)^{m_3}} }

We need to think about the sign of this quantity:

Lemma 4. Let a,b,c be positive numbers. Then a \log(b/a) + b\log(c/b) + c\log(a/c) is less than or equal to zero, with equality precisely when a=b=c.

The proof is a direct application of the log sum inequality. In fact, this holds not just for three numbers, but for any finite list of numbers. Indeed, that is precisely how one obtains the proof for cycles of arbitrary length. Even the two-cycle proof is a special case! If you are wondering how the log sum inequality is proved, it is an application of Jensen’s inequality, that workhorse of convex analysis.
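If you want evidence for the cyclic inequality before proving it, here is a quick numerical check on random positive triples (the tolerance is just for floating-point roundoff):

import numpy as np

rng = np.random.default_rng(0)
for _ in range(100000):
    a, b, c = rng.uniform(0.01, 10.0, size=3)
    assert a * np.log(b / a) + b * np.log(c / b) + c * np.log(a / c) <= 1e-9
print('inequality held on all samples')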

The three-cycle calculation extends to a proof of the theorem as long as no directed edge is shared between two directed cycles. When there are such shared edges, we need to argue that the flows f_\alpha and f_x can be split between the cycles sharing an edge in a consistent manner, so that the cycles can be analyzed independently. We will need the following simple lemma about conservative flows from graph theory. We will apply this lemma to the flow f_\alpha.

Lemma 5. Let f be a conservative flow on a graph G. Then there exist directed cycles C_1, C_2,\dots, C_k in G, and nonnegative real ‘flows’ f_1,f_2,\dots,f_k, such that for each directed edge e in G, the flow f(e) equals the sum of f_i over all i such that the cycle C_i contains the edge e.

Intuitively, this lemma says that conservative flows come from constant flows on the directed cycles of the graph. How does one show this lemma? I’m sure there are several proofs, and I hope some of you can share some of the really neat ones with me. The one I employed was algorithmic. The idea is to pick a cycle, any cycle, and subtract the maximum constant flow that this cycle allows, and repeat. This is most easily understood by looking at the example of the figure-eight:

The figure-eight. This reaction network consists of two three-cycles sharing an edge:

Here’s the proof for Lemma 4. Let f be a conservative flow on this graph. We want to exhibit cycles and flows on this graph according to Lemma 4. We arbitrarily pick any cycle in the graph. For example, in the figure-eight, suppose we pick the cycle u_0\to u_1\to u_2\to u_0. We pick an edge in this cycle on which the flow is minimum. In this case, f(u_0\to u_1) = f(u_2\to u_0) is the minimum. We define a remainder flow by subtracting from f this constant flow which was restricted to one cycle. So the remainder flow is the same as f on edges that don’t belong to the picked cycle. For edges that belong to the cycle, the remainder flow is f minus the minimum of f on this cycle. We observe that this remainder flow satisfies the conditions of Lemma 4 on a graph with strictly fewer edges. Continuing in this way, since the lemma is trivially true for the empty graph, we are done by infinite descent.

Now that we know how to split the flow f_\alpha across cycles, we can figure out how to split the rates across the different cycles. This will tell us how to split the flow f_x across cycles. Again, this is best illustrated by an example.

The figure-eight. Again, this reaction network looks like

Suppose, as in Lemma 5, we obtain the cycles

C_1 = u_0\to u_1\to u_2\to u_0 with constant flow f_\alpha^1

and

C_2 = u_3\to u_1\to u_2\to u_3 with constant flow f_\alpha^2

such that

f_\alpha^1 + f_\alpha^2 = f_\alpha(u_1\to u_2)

Here’s the picture:

Then we obtain rates r^1(u_1\to u_2) and r^2(u_1\to u_2) by solving the equations

f^1_\alpha = r^1(u_1\to u_2) \alpha^{u_1}

f^2_\alpha = r^2(u_1\to u_2) \alpha^{u_1}

Using these rates, we can define non-constant flows f^1_x on C_1 and f^2_x on C_2 by the usual formulas:

f^1_x(u_1\to u_2) = r^1(u_1\to u_2) x^{u_1}

and similarly for f^2_x. In particular, this gives us

f^1_x(u_1\to u_2)/f^1_\alpha = (x/\alpha)^{u_1}

and similarly for f^2_x.

Using this, we obtain the proof of the Theorem! The time derivative of g_\alpha along a trajectory has a contribution from each cycle C as in Lemma 5, where each cycle is treated as a separate system with the new rates r^C, and the new flows f^C_\alpha and f^C_x. So, we’ve reduced the problem to the case of a cycle, which we’ve already done.

Let’s review what happened. The time derivative of the function g_\alpha has a very nice form, which is linear in the flow f_x. The reaction network can be broken up into cycles. Th e conservative flow f_\alpha for a complex balanced system can be split into conservative flows on cycles by Lemma 4. This informs us how to split the non-conservative flow f_x across cycles. By linearity of the time derivative, we can separately treat the case for every cycle. For each cycle, we get an expression to which the log sum inequality applies, giving us the final result that g_\alpha decreases along trajectories of the rate equation.

Now that we have a Lyapunov function, we will put it to use to obtain some nice theorems about the dynamics, and finally state the global attractor conjecture. All that and more, in the next blog post!


The Pentagram of Venus

4 January, 2014

 

This image, made by Greg Egan, shows the orbit of Venus.

Look down on the plane of the Solar System from above the Earth. Track the Earth so it always appears directly below you, but don’t turn along with it. With the passage of each year, you will see the Sun go around the Earth. As the Sun goes around the Earth 8 times, Venus goes around the Sun 13 times, and traces out the pretty curve shown here.

It’s called the pentagram of Venus, because it has 5 ‘lobes’ where Venus makes its closest approach to Earth. At each closest approach, Venus move backwards compared to its usual motion across the sky: this is called retrograde motion.

Actually, what I just said is only approximately true. The Earth orbits the Sun once every

365.256

days. Venus orbits the Sun once every

224.701

days. So, Venus orbits the Sun in

224.701 / 365.256 ≈ 0.615187

Earth years. And here’s the cool coincidence:

8/13 ≈ 0.615385

That’s pretty close! So in 8 Earth years, Venus goes around the Sun almost 13 times. Actually, it goes around 13.004 times.

During this 8-year cycle, Venus gets as close as possible to the Earth about

13 – 8 = 5

times. And each time it does, Venus moves to a new lobe of the pentagram of Venus! This new lobe is

8 – 5 = 3

steps ahead of the last one. Check to make sure:

That’s why they call it the pentagram of Venus!



When Venus gets as close as possible to us, we see it directly in front of the Sun. This is called an inferior conjunction. Astronomers have names for all of these things:

So, every 8 years there are about 5 inferior conjunctions of Venus.

Puzzle 1: Suppose the Earth orbits the Sun n times while another planet, closer to the Sun, orbits it m times. Under what conditions does the ‘generalized pentagram’ have k = m – n lobes? (The pentagram of Venus has 5 = 13 – 8 lobes.)

Puzzle 2: Under what conditions does the planet move forward j = n – k steps each time it reaches a new lobe? (Venus moves ahead 3 = 8 – 5 steps each time.)

Now, I’m sure you’ve noticed that these numbers:

3, 5, 8, 13

are consecutive Fibonacci numbers.

Puzzle 3: Is this just a coincidence?

As you may have heard, ratios of consecutive Fibonacci numbers give the best approximations to the golden ratio φ = (√5 – 1)/2. This number actually plays a role in celestial mechanics: the Kolmogorov–Arnol’d–Moser theorem says two systems vibrating with frequencies having a ratio equal to φ are especially stable against disruption by resonances, because this number is hard to approximate well by rationals. But the Venus/Earth period ratio 0.615187 is actually closer to the rational number 8/13 ≈ 0.615385 than φ ≈ 0.618034. So if this period ratio is trying to avoid rational numbers by being equal to φ, it’s not doing a great job!

It’s all rather tricky, because sometimes rational numbers cause destabilizing resonances, as we see in the gaps of Saturn’s rings:


whereas other times rational numbers stabilize orbits, as with the moons of Jupiter:


I’ve never understood this, and I’m afraid no amount of words will help me: I’ll need to dig into the math.

Given my fascination with rolling circles and the number 5, I can’t believe that I learned about the pentagram of Venus only recently! It’s been known at least for centuries, perhaps millennia. Here’s a figure from James Ferguson’s 1799 book Astronomy Explained Upon Sir Isaac Newton’s Principles:


Naturally, some people get too excited about all this stuff—the combination of Venus, Fibonacci numbers, the golden ratio, and a ‘pentagram’ overloads their tiny brains. Some claim the pentagram got its origin from this astronomical phenomenon. I doubt we’ll ever know. Some get excited about the fact that a Latin name for the planet Venus is Lucifer. Lucifer, pentagrams… get it?

I got the above picture from here:

• Venus and the pentagram, Grand Lodge of British Columbia and Yukon.

This website is defending the Freemasons against accusations of Satanism!

On a sweeter note, the pentagram of Venus is also called the rose of Venus. You can buy a pendant in this pattern:

It’s pretty—but according to the advertisement, that’s not all! It’s also “an energetic tool that creates a harmonising field of Negative Ion around our body to support and balance our own magnetic field and aura.”

In The Da Vinci Code, someone claims that Venus traces “a perfect pentacle across the ecliptic sky every 8 years.”

But it’s not perfect! Every 8 years, Venus goes around the Sun 13.004 times. So the whole pattern keeps shifting. It makes a full turn about once every 160 years. You can see this slippage using this nice applet, especially if you crank up the speed:

• Steven Deutch, The (almost) Venus-Earth pentagram.

Also, the orbits of Earth and Venus aren’t perfect circles!

But still, it’s fun. The universe is full of mathematical beauty. It seems we need to get closer and closer to the fundamental laws of nature to make the math and the universe match more and more accurately. Maybe that’s what ‘fundamental laws’ means. But the universe is also richly packed with beautiful approximate mathematical patterns, stacked on top of each other in a dizzying way.

 


2014 on Azimuth

31 December, 2013

 

Happy New Year! We’ve got some fun guest posts lined up for next year, including:

Marc Harper, Relative entropy in evolutionary dynamics.

Marc Harper uses ideas from information theory in his work on bioinformatics and evolutionary game theory. This article explains some of his new work. And as a warmup, it explains how relative entropy can serve as a Lyapunov function in evolution!

This includes answering the question:

“What is a Lyapunov function, and why should I care?”

The brief answer, in case you’re eager to know, is this. A Lyapunov function is something that always increases—or always decreases—as time goes on. Examples include entropy and free energy. So, a Lyapunov function can be a way of making the 2nd law of thermodynamics mathematically precise! It’s also a way to show things are approaching equilibrium.

The overall goal here is applying entropy and information theory to better understand the behavior of biological and ecological systems. And in April 2015, Marc Harper and I are helping run a workshop on this topic! We’re doing this with John Harte, an ecologist who uses maximum entropy methods to predict the distribution, abundance and energy usage of species. It should be really interesting!

But back to blog articles:

Manoj Gopalkrishnan, Lyapunov functions for complex-balanced systems.

Manoj Gopalkrishnan is a mathematician at the Tata Institute of Fundamental Research in Mumbai who works on problems coming from chemistry and biology. This post will explain his recent paper on a Lyapunov function for chemical reactions. This function is closely related to free energy, a concept from thermodynamics. So again, one of the overall goals is to apply entropy to better understand living systems.

Since some evolutionary games are isomorphic to chemical reaction networks, this post should be connected to Marc’s. But there’s some mental work left to make the connection—for me, at least. It should be really cool when it all fits together!

Alastair Jamieson-Lane, Stochastic cross impact balance analysis.

Alastair Jamieson-Lane is a mathematician in the master’s program at the University of British Columbia. Very roughly, this post is about a method for determining which economic scenarios are more likely. The likely scenarios get fed into things like the IPCC climate models, so this is important.

This blog article has an interesting origin. Vanessa Schweizer has a bachelor’s degree in physics, a master’s in environmental studies, and a PhD in engineering and public policy. She now works at the University of Waterloo on long-term decision-making problems.

A while back, I met Vanessa at a workshop called What Is Climate Change and What To Do About It?, at the Balsillie School of International Affairs, which is in Waterloo. She described her work with Alastair Jamieson-Lane and the physicist Matteo Smerlak on stochastic cross impact balance analysis. It sounded really interesting, something I’d like to work on. So I solicited some blog articles from them. I hope this is just the first!

So: Happy New Year, and good reading!

Also: we’re always looking for good guest posts here on Azimuth, and we have a system for helping you write them. So, if you know something interesting about environmental or energy issues, ecology, biology or chemistry, consider giving it a try!

If you read some posts here, especially guest posts, you’ll get an idea of what we’re looking for. David Tanzer, a software developer in New York who is very active in the Azimuth Project these days, made an organized list of Azimuth blog posts here:

Azimuth Blog Overview.

You can see the guest posts listed by author. This overview is also great for catching up on old posts!


Logic, Probability and Reflection

26 December, 2013

 

Last week I attended the Machine Intelligence Research Institute’s sixth Workshop on Logic, Probability, and Reflection. This one was in Berkeley, where the institute has its headquarters.

You may know this institute under their previous name: the Singularity Institute. It seems to be the brainchild of Eliezer Yudkowsky, a well-known advocate of ‘friendly artificial intelligence’, whom I interviewed in week311, week312 and week313 of This Week’s Finds. He takes an approach to artificial intelligence that’s heavily influenced by mathematical logic, and I got invited to the workshop because I blogged about a paper he wrote with Mihaly Barasz, Paul Christiano and Marcello Herreshoff on probability theory and logic.

I only have the energy to lay the groundwork for a good explanation of what happened in the workshop. So, after you read my post, please read this:

• Benja Fallenstein, Results from MIRI’s December workshop, Less Wrong, 28 December 2013.

The workshop had two main themes, so let me tell you what they were.

Scientific induction in mathematics

The first theme is related to that paper I just mentioned. How should a rational agent assign probabilities to statements in mathematics? Of course an omniscient being could assign

probability 1 to every mathematical statement that’s provable,

probability 0 to every statement whose negation is provable,

and

to every statement that is neither provable nor disprovable.

But a real-world rational agent will never have time to check all proofs, so there will always be lots of statements it’s not sure about. Actual mathematicians always have conjectures, like the Twin Prime Conjecture, that we consider plausible even though nobody has proved them. And whenever we do research, we’re constantly estimating how likely it is for statements to be true, and changing our estimates as new evidence comes in. In other words, we use scientific induction in mathematics.

How could we automate this? Most of us don’t consciously assign numerical probabilities to mathematical statements. But maybe an AI mathematician should. If so, what rules should it follow?

It’s natural to try a version of Solomonoff induction, where our probability estimate, before any evidence comes in, favors statements that are simple. However, this runs up against problems. If you’re interested in learning more about this, try:

• Jeremy Hahn, Scientific induction in probabilistic mathematics.

It’s a summary of ideas people came up with during the workshop. I would like to explain them sometime, but for now I should move on.

The Löbian obstacle

The second main theme was the ‘Löbian obstacle’. Löb’s theorem is the flip side of Gödel’s first incompleteness theorem, less famous but just as shocking. It seems to put limitations on how much a perfectly rational being can trust itself.

Since it’s the day after Christmas, let’s ease our way into these deep waters with the Santa Claus paradox, also known as Curry’s paradox.

If you have a child who is worried that Santa Claus might not exist, you can reassure them using this sentence:

If this sentence is true, Santa Claus exists.

Call it P, for short.

Assume, for the sake of argument, that P is true. Then what it says is true: “If P is true, Santa Claus exists.” And we’re assuming P is true. So, Santa Claus exists.

So, we’ve proved that if P is true, Santa Claus exists.

But that’s just what P says!

So, P is true.

So, Santa Claus exists!

There must be something wrong about this argument, even if Santa Claus does exist, because if it were valid you could use it to prove anything at all. The self-reference is obviously suspicious. The sentence in question is a variant of the Liar Paradox:

This sentence is false.

since we can rewrite the Liar Paradox as

If this sentence is true, 0 = 1.

and then replace “0=1” by any false statement you like.

However, Gödel figured out a way to squeeze solid insights from these dubious self-referential sentences. He did this by creating a statement in the language of arithmetic, referring to nothing but numbers, which nonetheless manages to effectively say

This sentence is unprovable.

If it were provable, you’d get a contradiction! So, either arithmetic is inconsistent or this sentence is unprovable. But if it’s unprovable, it’s true. So, there are true but unprovable statements in arithmetic… unless arithmetic is inconsistent! This discovery shook the world of mathematics.

Here I’m being quite sloppy, just to get the idea across.

For one thing, when I’m saying ‘provable’, I mean provable given some specific axioms for arithmetic, like the Peano axioms. If we change our axioms, different statements will be provable.

For another, the concept of ‘true’ statements in arithmetic is often shunned by logicians. That may sound shocking, but there are many reasons for this: for example, Tarski showed that the truth of statements about arithmetic is undefinable in arithmetic. ‘Provability’ is much easier to deal with.

So, a better way of thinking about Gödel’s result is that he constructed a statement that is neither provable nor disprovable from Peano’s axioms of arithmetic, unless those axioms are inconsistent (in which case we can prove everything, but it’s all worthless).

Furthermore, this result applies not just to Peano’s axioms but to any stronger set of axioms, as long as you can write a computer program to list those axioms.

In 1952, the logician Leon Henkin flipped Gödel’s idea around and asked about a sentence in the language of arithmetic that says:

This sentence is provable.

He asked: is this provable or not? The answer is much less obvious than for Gödel’s sentence. Play around with it and see what I mean.

But in 1954, Martin Hugo Löb showed that Henkin’s sentence is provable!

And Henkin noticed something amazing: Löb’s proof shows much more.

At this point it pays to become a bit more precise. Let us write \mathrm{PA} \vdash P to mean the statement P is provable from the Peano axioms of arithmetic. Gödel figured out how to encode statements in arithmetic as numbers, so let’s write \# P for the Gödel number of any statement P. And Gödel figured out how to write a statement in arithmetic, say

\mathrm{Provable}(n)

which says that the statement with Gödel number n is provable using the Peano axioms.

Using this terminology, what Henkin originally did was find a number n such that the sentence

\mathrm{Provable}(n)

has Gödel number n. So, this sentence says

This sentence is provable from the Peano axioms of arithmetic.

What Löb did was show

\mathrm{PA} \vdash \mathrm{Provable}(n)

In other words, he showed that Henkin’s sentence really is provable from the Peano axioms!

What Henkin then did is prove that for any sentence P in the language of arithmetic, if

\mathrm{PA} \vdash \mathrm{Provable}(\# P) \implies P

then

\mathrm{PA} \vdash P

In other words, suppose we can prove that the provability of P implies P. Then we can prove P!
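In the notation of provability logic, where \Box P abbreviates \mathrm{Provable}(\# P), Löb’s theorem is often stated as a single schema that is itself provable in Peano arithmetic:

\Box(\Box P \to P) \to \Box P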

At first this merely sounds nightmarishly complicated. But if you think about it long enough, you’ll see it’s downright terrifying! For example, suppose P is some famous open question in arithmetic, like the Twin Prime Conjecture. You might hope to prove

The provability of the Twin Prime Conjecture implies the Twin Prime Conjecture.

Indeed, that seems like a perfectly reasonable thing to want. But it turns out that proving this is as hard as proving the Twin Prime Conjecture! Why? Because if we can prove the boldface sentence above, Löb and Henkin’s work instantly gives us a proof of the Twin Prime Conjecture!

What does all this have to do with artificial intelligence?

Well, what I just said is true not only for Peano arithmetic, but any set of axioms including Peano arithmetic that a computer program can list. Suppose your highly logical AI mathematician has some such set of axioms, say \mathrm{AI}. You might want it to trust itself. In other words, you might want

\mathrm{AI} \vdash \mathrm{Provable}(\# P) \implies P

for every sentence P. This says, roughly, that the AI can prove that whatever it can prove is actually true.

But then Löb’s theorem would kick in and give

\mathrm{AI} \vdash P

for every sentence P. And this would be disastrous: our AI would be inconsistent, because it could prove everything!

This is just the beginning of the problems. It gets more involved when we consider AI’s that spawn new AI’s and want to trust them. For more see:

• Eliezer Yudkowsky and Marcello Herreshoff, Tiling agents for self-modifying AI, and the Löbian obstacle.

At workshop various people made progress on this issue, which is recorded in these summaries:

• Eliezer Yudkowsky, The procrastination paradox.

Abstract. A theorem by Marcello Herreshoff, Benja Fallenstein, and Stuart Armstrong shows that if there exists an infinite series of theories T_i extending \mathrm{PA} where each T_i proves the soundness of T_{i+1}, then all the T_i must have only nonstandard models. We call this the Procrastination Theorem for reasons which will become apparent.

• Benja Fallenstein, An infinitely descending sequence of sound theories each proving the next consistent.

Here Fallenstein constructs a different sequence of theories T_i extending Peano arithmetic such that each T_i proves the consistency of T_{i+1}, and all the theories are sound for \Pi_1 sentences—that is, sentences with only one \forall quantifier outside the rest of the stuff.

The following summaries would take more work to explain:

• Nate Soares, Fallenstein’s monster.

• Nisan Stiennon, Recursively-defined logical theories are well-defined.

• Benja Fallenstein, The 5-and-10 problem and the tiling agents formalism.

• Benja Fallenstein, Decreasing mathematical strength in one formalization of parametric polymorphism.

Again: before reading any of these summaries, I urge you to read Benja Fallenstein’s post, which will help you understand them!


Relative Entropy (Part 3)

25 December, 2013

Holidays are great. There’s nothing I need to do! Everybody is celebrating! So, I can finally get some real work done.

In the last couple of days I’ve finished a paper with Jamie Vicary on wormholes and entanglement… subject to his approval and corrections. More on that later. And now I’ve returned to working on a paper with Tobias Fritz where we give a Bayesian characterization of the concept of ‘relative entropy’. This summer I wrote two blog articles about this paper:

Relative Entropy (Part 1): how various structures important in probability theory arise naturally when you do linear algebra using only the nonnegative real numbers.

Relative Entropy (Part 2): a category related to statistical inference, \mathrm{FinStat}, and how relative entropy defines a functor on this category.

But then Tobias Fritz noticed a big problem. Our characterization of relative entropy was inspired by this paper:

• D. Petz, Characterization of the relative entropy of states of matrix algebras, Acta Math. Hungar. 59 (1992), 449–455.

Here Petz sought to characterize relative entropy both in the ‘classical’ case we are concerned with and in the more general ‘quantum’ setting. Our original goal was merely to express his results in a more category-theoretic framework! Unfortunately Petz’s proof contained a significant flaw. Tobias noticed this and spent a lot of time fixing it, with no help from me.

Our paper is now self-contained, and considerably longer. My job now is to polish it up and make it pretty. What follows is the introduction, which should explain the basic ideas.

A Bayesian characterization of relative entropy

This paper gives a new characterization of the concept of relative entropy, also known as ‘relative information’, ‘information gain’ or ‘Kullback-Leibler divergence’. Whenever we have two probability distributions p and q on the same finite set X, we can define the entropy of q relative to p:

S(q,p) = \sum_{x\in X} q_x \ln\left( \frac{q_x}{p_x} \right)

Here we set

q_x \ln\left( \frac{q_x}{p_x} \right)

equal to \infty when p_x = 0, unless q_x is also zero, in which case we set it equal to 0. Relative entropy thus takes values in [0,\infty].

Intuitively speaking, S(q,p) is the expected amount of information gained when we discover the probability distribution is really q, when we had thought it was p. We should think of p as a ‘prior’ and q as a ‘posterior’. When we take p to be the uniform distribution on X, relative entropy reduces to the ordinary Shannon entropy, up to an additive constant. The advantage of relative entropy is that it makes the role of the prior explicit.
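Here is a small sketch of this definition in Python, with the conventions for zero probabilities made explicit; the example distributions are invented.

import math

def relative_entropy(q, p):
    # S(q,p) = sum over x of q_x ln(q_x / p_x), with 0 ln(0/p_x) = 0
    # and q_x ln(q_x / 0) = infinity for q_x > 0.
    total = 0.0
    for qx, px in zip(q, p):
        if qx == 0.0:
            continue                 # contributes 0
        if px == 0.0:
            return math.inf          # contributes infinity
        total += qx * math.log(qx / px)
    return total

print(relative_entropy([0.5, 0.5], [0.9, 0.1]))   # information gained updating the prior (0.9, 0.1) to (0.5, 0.5)
print(relative_entropy([0.5, 0.5], [0.5, 0.5]))   # 0: nothing learned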

Since Bayesian probability theory emphasizes the role of the prior, relative entropy naturally lends itself to a Bayesian interpretation: it measures how much information we gain given a certain prior. Our goal here is to make this precise in a mathematical characterization of relative entropy. We do this using a category \mathrm{FinStat} where:

• an object (X,q) consists of a finite set X and a probability distribution x \mapsto q_x on that set;

• a morphism (f,s) : (X,q) \to (Y,r) consists of a measure-preserving function f from X to Y, together with a probability distribution x \mapsto s_{x y} on X for each element y \in Y, with the property that s_{xy} = 0 unless f(x) = y.

We can think of an object of \mathrm{FinStat} as a system with some finite set of states together with a probability distribution on its states. A morphism

(f,s) : (X,q) \to (Y,r)

then consists of two parts. First, there is a deterministic measurement process f : X \to Y mapping states of the system being measured, X, to states of the measurement apparatus, Y. The condition that f be measure-preserving says that, after the measurement, the probability that the apparatus be in any state y \in Y is the sum of the probabilities of all states of X leading to that outcome:

\displaystyle{  r_y = \sum_{x \in f^{-1}(y)} q_x }

Second, there is a hypothesis s: an assumption about the probability s_{xy} that the system being measured is in the state x \in X given any measurement outcome y \in Y.

Suppose we have any morphism

(f,s) : (X,q) \to (Y,r)

in \mathrm{FinStat}. From this we obtain two probability distributions on the states of the system being measured. First, we have the probability distribution p given by

\displaystyle{   p_x = \sum_{y \in Y} s_{xy} r_y } \qquad \qquad (1)

This is our prior, given our hypothesis and the probability distribution of measurement results. Second we have the ‘true’ probability distribution q, which would be the posterior if we updated our prior using complete direct knowledge of the system being measured.

It follows that any morphism in \mathrm{FinStat} has a relative entropy S(q,p) associated to it. This is the expected amount of information we gain when we update our prior p to the posterior q.

In fact, this way of assigning relative entropies to morphisms defines a functor

F_0 : \mathrm{FinStat} \to [0,\infty]

where we use [0,\infty] to denote the category with one object, the numbers 0 \le x \le \infty as morphisms, and addition as composition. More precisely, if

(f,s) : (X,q) \to (Y,r)

is any morphism in \mathrm{FinStat}, we define

F_0(f,s) = S(q,p)

where the prior p is defined as in equation (1).

The fact that F_0 is a functor is nontrivial and rather interesting. It says that given any composable pair of measurement processes:

(X,q) \stackrel{(f,s)}{\longrightarrow} (Y,r) \stackrel{(g,t)}{\longrightarrow} (Z,u)

the relative entropy of their composite is the sum of the relative entropies of the two parts:

F_0((g,t) \circ (f,s)) = F_0(g,t) + F_0(f,s) .
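To see what this composition law says concretely, here is a sketch that builds a composable pair of morphisms in \mathrm{FinStat} with invented numbers and checks the additivity. I compose the hypotheses as Markov kernels, which is how I would expect composition in \mathrm{FinStat} to work; the sets, maps and probabilities below are all made up for illustration.

import numpy as np

def rel_entropy(q, p):
    # S(q,p) with the usual conventions for zeros.
    q, p = np.asarray(q, float), np.asarray(p, float)
    if np.any((p == 0) & (q > 0)):
        return np.inf
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# A morphism (f,s): (X,q) -> (Y,r) and another (g,t): (Y,r) -> (Z,u), where
# X = {0,1,2,3}, Y = {0,1}, Z = {0}, f(0)=f(1)=0, f(2)=f(3)=1, and g sends everything to 0.
q = np.array([0.1, 0.3, 0.2, 0.4])
r = np.array([0.4, 0.6])                  # pushforward of q along f, so f is measure-preserving
u = np.array([1.0])                       # pushforward of r along g

# Hypotheses: column y of s is a probability distribution on X supported on f^{-1}(y),
# and column z of t is a probability distribution on Y supported on g^{-1}(z).
s = np.array([[0.5, 0.0],
              [0.5, 0.0],
              [0.0, 0.3],
              [0.0, 0.7]])
t = np.array([[0.6],
              [0.4]])

def F0(q, r, s):
    # Relative entropy of a morphism: S(q, p) with prior p_x = sum_y s_{xy} r_y, as in equation (1).
    return rel_entropy(q, s @ r)

lhs = F0(q, u, s @ t)                     # the composite morphism, hypotheses composed as kernels
rhs = F0(q, r, s) + F0(r, u, t)
print(lhs, rhs)                           # both about 0.135: relative entropies add under composition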

We prove that F_0 is a functor. However, we go much further: we characterize relative entropy by saying that up to a constant multiple, F_0 is the unique functor from \mathrm{FinStat} to [0,\infty] obeying three reasonable conditions.

The first condition is that F_0 vanishes on morphisms (f,s) : (X,q) \to (Y,r) where the hypothesis s is optimal. By this, we mean that Equation (1) gives a prior p equal to the ‘true’ probability distribution q on the states of the system being measured.

The second condition is that F_0 is lower semicontinuous. The set P(X) of probability distributions on a finite set X naturally has the topology of an (n-1)-simplex when X has n elements. The set [0,\infty] has an obvious topology where it’s homeomorphic to a closed interval. However, with these topologies, the relative entropy does not define a continuous function

\begin{array}{rcl}         S : P(X) \times P(X) &\to& [0,\infty]  \\                                            (q,p) &\mapsto & S(q,p) .  \end{array}

The problem is that

\displaystyle{ S(q,p) = \sum_{x\in X} q_x \ln\left( \frac{q_x}{p_x} \right) }

and we define q_x \ln(q_x/p_x) to be \infty when p_x = 0 and q_x > 0 but 0 when p_x = q_x = 0. So, it turns out that S is only lower semicontinuous, meaning that if p^i , q^i are sequences of probability distributions on X with p^i \to p and q^i \to q then

S(q,p) \le \liminf_{i \to \infty} S(q^i, p^i)

We give the set of morphisms in \mathrm{FinStat} its most obvious topology, and show that with this topology, F_0 maps morphisms to morphisms in a lower semicontinuous way.

The third condition is that F_0 is convex linear. We describe how to take convex linear combinations of morphisms in \mathrm{FinStat}, and then the functor F_0 is convex linear in the sense that it maps any convex linear combination of morphisms in \mathrm{FinStat} to the corresponding convex linear combination of numbers in [0,\infty]. Intuitively, this means that if we take a coin with probability P of landing heads up, and flip it to decide whether to perform one measurement process or another, the expected information gained is P times the expected information gain of the first process plus 1-P times the expected information gain of the second process.

Here, then, is our main theorem:

Theorem. Any lower semicontinuous, convex-linear functor

F : \mathrm{FinStat} \to [0,\infty]

that vanishes on every morphism with an optimal hypothesis must equal some constant times the relative entropy. In other words, there exists some constant c \in [0,\infty] such that

F(f,s) = c F_0(f,s)

for any morphism (f,s) : (X,p) \to (Y,q) in \mathrm{FinStat}.

Remarks

If you’re a maniacally thorough reader of this blog, with a photographic memory, you’ll recall that our theorem now says ‘lower semicontinuous’, where in Part 2 of this series I’d originally said ‘continuous’.

I’ve fixed that blog article now… but it was Tobias who noticed this mistake. In the process of fixing our proof to address this issue, he eventually noticed that the proof of Petz’s theorem, which we’d been planning to use in our work, was also flawed.

Now I just need to finish polishing the rest of the paper!


Who is Bankrolling the Climate Change Counter-Movement?

23 December, 2013

It’s mostly secret. A new refereed paper by Robert Brulle of Drexel University looks into it.

“The climate change countermovement has had a real political and ecological impact on the failure of the world to act on the issue of global warming,” said Brulle. “Like a play on Broadway, the countermovement has stars in the spotlight—often prominent contrarian scientists or conservative politicians—but behind the stars is an organizational structure of directors, script writers and producers, in the form of conservative foundations. If you want to understand what’s driving this movement, you have to look at what’s going on behind the scenes.”

So he looked, and he found this:

• The biggest known funders of organizations downplaying the importance of man-made climate change are foundations such as the Searle Freedom Trust, the John William Pope Foundation, the Howard Charitable Foundation and the Sarah Scaife Foundation.

• Koch and ExxonMobil have pulled back from publicly visible funding. From 2003 to 2007, the Koch Affiliated Foundations and the ExxonMobil Foundation were heavily involved in funding the climate change countermovement. But since 2008, they are no longer making publicly traceable contributions.

• Funding has shifted to pass through untraceable sources. As traceable funding drops, the amount of funding given to the countermovement by the Donors Trust has risen dramatically. Donors Trust is a donor-directed foundation whose funders cannot be traced. This one foundation now provides about 25% of all traceable foundation funding used by organizations engaged in promoting systematic denial of human-caused climate change.

• Most funding for denial efforts is untraceable. Despite extensive digging, only a small part of the hundreds of millions in contributions to climate change denying organizations can be found out from public records. A group of 91 climate change denial organizations has a total income of $900 million per year—but only $64 million in identifiable foundation support!

All this is from the original paper:

• Robert J. Brulle, Institutionalizing delay: foundation funding and the creation of U.S. climate change counter-movement organizations, Climatic Change, 2013.

and this summary:

• Not just Koch brothers: new study reveals funders behind climate change denial effort, ScienceDaily, 20 December 2013.

Ironically, the original paper would probably be hidden behind a paywall if someone hadn’t liberated it. Get your copy now, while you can!

You can also get 120 pages of details—names and dollar amounts—here:

• Robert J. Brulle, Supplementary online material.

Here is Brulle’s pie chart of who is doing the (traceable!) funding—click to enlarge:

And here is his chart of the organizations who are getting funded:

Here is his graph showing the rise of Donors Trust:

And here is his picture of the social network involved in climate change denial. He calls this the Climate Change Counter Movement. You really need to enlarge this one to see anything:


Life’s Struggle to Survive

19 December, 2013

Here’s the talk I gave at the SETI Institute:

When pondering the number of extraterrestrial civilizations, it is worth noting that even after it got started, the success of life on Earth was not a foregone conclusion. In this talk, I recount some thrilling episodes from the history of our planet, some well-documented but others merely theorized: our collision with the planet Theia, the oxygen catastrophe, the snowball Earth events, the Permian-Triassic mass extinction event, the asteroid that hit Chicxulub, and more, including the massive environmental changes we are causing now. All of these hold lessons for what may happen on other planets!

To watch the talk, click on the video above. To see slides of the talk, click here!

Here’s a mistake in my talk that doesn’t appear in the slides: I suggested that Theia started at the Lagrange point in Earth’s orbit. After my talk, an expert said that at that time, the Solar System had lots of objects with orbits of high eccentricity, and Theia was probably one of these. He said the Lagrange point theory is an idiosyncratic theory, not widely accepted, that somehow found its way onto Wikipedia.

Another issue was brought up in the questions. In a paper in the Proceedings of the National Academy of Sciences, Sherwood and Huber argued that:

Any exceedence of 35 °C for extended periods should induce hyperthermia in humans and other mammals, as dissipation of metabolic heat becomes impossible. While this never happens now, it would begin to occur with global-mean warming of about 7 °C, calling the habitability of some regions into question. With 11-12 °C warming, such regions would spread to encompass the majority of the human population as currently distributed. Eventual warmings of 12 °C are possible from fossil fuel burning.

However, the Paleocene-Eocene Thermal Maximum seems to have been even hotter:

So, the question is: where did mammals live during this period, which mammals went extinct, if any, and does the survival of other mammals call into question Sherwood and Huber’s conclusion?

