## Information Geometry (Part 16)

1 February, 2017

This week I’m giving a talk on biology and information:

• John Baez, Biology as information dynamics, talk for Biological Complexity: Can it be Quantified?, a workshop at the Beyond Center, 2 February 2017.

While preparing this talk, I discovered a cool fact. I doubt it’s new, but I haven’t exactly seen it elsewhere. I came up with it while trying to give a precise and general statement of ‘Fisher’s fundamental theorem of natural selection’. I won’t start by explaining that theorem, since my version looks rather different than Fisher’s, and I came up with mine precisely because I had trouble understanding his. I’ll say a bit more about this at the end.

Here’s my version:

The square of the rate at which a population learns information is the variance of its fitness.

This is a nice advertisement for the virtues of diversity: more variance means faster learning. But it requires some explanation!

### The setup

Let’s start by assuming we have $n$ different kinds of self-replicating entities with populations $P_1, \dots, P_n.$ As usual, these could be all sorts of things:

• molecules of different chemicals
• organisms belonging to different species
• genes of different alleles
• restaurants belonging to different chains
• people with different beliefs
• game-players with different strategies
• etc.

I’ll call them replicators of different species.

Let’s suppose each population $P_i$ is a function of time that grows at a rate equal to this population times its ‘fitness’. I explained the resulting equation back in Part 9, but it’s pretty simple:

$\displaystyle{ \frac{d}{d t} P_i(t) = f_i(P_1(t), \dots, P_n(t)) \, P_i(t) }$

Here $f_i$ is a completely arbitrary smooth function of all the populations! We call it the fitness of the ith species.

This equation is important, so we want a short way to write it. I’ll often write $f_i(P_1(t), \dots, P_n(t))$ simply as $f_i,$ and $P_i(t)$ simply as $P_i.$ With these abbreviations, which any red-blooded physicist would take for granted, our equation becomes simply this:

$\displaystyle{ \frac{dP_i}{d t} = f_i \, P_i }$

Next, let $p_i(t)$ be the probability that a randomly chosen organism is of the ith species:

$\displaystyle{ p_i(t) = \frac{P_i(t)}{\sum_j P_j(t)} }$

Starting from our equation describing how the populations evolve, we can figure out how these probabilities evolve. The answer is called the replicator equation:

$\displaystyle{ \frac{d}{d t} p_i(t) = ( f_i - \langle f \rangle ) \, p_i(t) }$

Here $\langle f \rangle$ is the average fitness of all the replicators, or mean fitness:

$\displaystyle{ \langle f \rangle = \sum_j f_j(P_1(t), \dots, P_n(t)) \, p_j(t) }$

In what follows I’ll abbreviate the replicator equation as follows:

$\displaystyle{ \frac{dp_i}{d t} = ( f_i - \langle f \rangle ) \, p_i }$
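As a quick sanity check, here's a minimal numerical sketch in Python. The fitnesses here are made-up constants and the function names are my own, but it shows how the probabilities $p_i$ drift toward the fitter species under this dynamics:

```python
import numpy as np

def simulate(P0, fitness, dt=1e-4, steps=30000):
    """Euler-integrate dP_i/dt = f_i(P) P_i and return the final populations."""
    P = np.array(P0, dtype=float)
    for _ in range(steps):
        P += dt * fitness(P) * P
    return P

# Made-up example: constant fitnesses, species 2 replicates twice as fast.
fitness = lambda P: np.array([1.0, 2.0])
P = simulate([1.0, 1.0], fitness)
p = P / P.sum()        # probabilities p_i = P_i / sum_j P_j
```

After a while the fitter species dominates the probability distribution, just as the replicator equation predicts.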

### The result

Okay, now let’s figure out how fast the probability distribution

$p(t) = (p_1(t), \dots, p_n(t))$

changes with time. For this we need to choose a way to measure the length of the vector

$\displaystyle{ \frac{dp}{dt} = (\frac{d}{dt} p_1(t), \dots, \frac{d}{dt} p_n(t)) }$

And here information geometry comes to the rescue! We can use the Fisher information metric, which is a Riemannian metric on the space of probability distributions.

I’ve talked about the Fisher information metric in many ways in this series. The most important fact is that as a probability distribution $p(t)$ changes with time, its speed

$\displaystyle{ \left\| \frac{dp}{dt} \right\|}$

as measured using the Fisher information metric can be seen as the rate at which information is learned. I’ll explain that later. Right now I just want a simple formula for the Fisher information metric. Suppose $v$ and $w$ are two tangent vectors to the point $p$ in the space of probability distributions. Then the Fisher information metric is given as follows:

$\displaystyle{ \langle v, w \rangle = \sum_i \frac{1}{p_i} \, v_i w_i }$

Using this we can calculate the speed at which $p(t)$ moves when it obeys the replicator equation. Actually the square of the speed is simpler:

$\begin{array}{ccl} \displaystyle{ \left\| \frac{dp}{dt} \right\|^2 } &=& \displaystyle{ \sum_i \frac{1}{p_i} \left( \frac{dp_i}{dt} \right)^2 } \\ \\ &=& \displaystyle{ \sum_i \frac{1}{p_i} \left( ( f_i - \langle f \rangle ) \, p_i \right)^2 } \\ \\ &=& \displaystyle{ \sum_i ( f_i - \langle f \rangle )^2 p_i } \end{array}$

The answer has a nice meaning, too! It’s just the variance of the fitness: that is, the square of its standard deviation.

So, if you’re willing to buy my claim that the speed $\|dp/dt\|$ is the rate at which our population learns new information, then we’ve seen that the square of the rate at which a population learns information is the variance of its fitness!
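If you don't trust the algebra above, here's a tiny numerical check (with arbitrary made-up numbers) that the Fisher-metric squared speed of $p$ under the replicator equation equals the variance of the fitness:

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])          # a probability distribution
f = np.array([1.0, 4.0, 2.0])          # fitnesses (arbitrary)
mean_f = np.dot(f, p)                  # mean fitness <f>

dp = (f - mean_f) * p                  # dp_i/dt from the replicator equation
speed_sq = np.sum(dp**2 / p)           # squared speed in the Fisher metric
variance = np.dot((f - mean_f)**2, p)  # variance of the fitness
```

The two numbers agree to machine precision, since the calculation above is an exact identity.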

### Fisher’s fundamental theorem

Now, how is this related to Fisher’s fundamental theorem of natural selection? First of all, what is Fisher’s fundamental theorem? Here’s what Wikipedia says about it:

It uses some mathematical notation but is not a theorem in the mathematical sense.

It states:

“The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.”

Or in more modern terminology:

“The rate of increase in the mean fitness of any organism at any time ascribable to natural selection acting through changes in gene frequencies is exactly equal to its genetic variance in fitness at that time”.

Largely as a result of Fisher’s feud with the American geneticist Sewall Wright about adaptive landscapes, the theorem was widely misunderstood to mean that the average fitness of a population would always increase, even though models showed this not to be the case. In 1972, George R. Price showed that Fisher’s theorem was indeed correct (and that Fisher’s proof was also correct, given a typo or two), but did not find it to be of great significance. The sophistication that Price pointed out, and that had made understanding difficult, is that the theorem gives a formula for part of the change in gene frequency, and not for all of it. This is a part that can be said to be due to natural selection.

Price’s paper is here:

• George R. Price, Fisher’s ‘fundamental theorem’ made clear, Annals of Human Genetics 36 (1972), 129–140.

I don’t find it very clear, perhaps because I didn’t spend enough time on it. But I think I get the idea.

My result is a theorem in the mathematical sense, though quite an easy one. I assume a population distribution evolves according to the replicator equation and derive an equation whose right-hand side matches that of Fisher’s original equation: the variance of the fitness.

But my left-hand side is different: it’s the square of the speed of the corresponding probability distribution, where speed is measured using the ‘Fisher information metric’. This metric was discovered by the same guy, Ronald Fisher, but I don’t think he used it in his work on the fundamental theorem!

Something a bit similar to my statement appears as Theorem 2 of this paper:

• Marc Harper, Information geometry and evolutionary game theory.

and for that theorem he cites:

• Josef Hofbauer and Karl Sigmund, Evolutionary Games and Population Dynamics, Cambridge University Press, Cambridge, 1998.

However, his Theorem 2 really concerns the rate of increase of fitness, like Fisher’s fundamental theorem. Moreover, he assumes that the probability distribution $p(t)$ flows along the gradient of a function, and I’m not assuming that. Indeed, my version applies to situations where the probability distribution moves round and round in periodic orbits!

### Relative information and the Fisher information metric

The key to generalizing Fisher’s fundamental theorem is thus to focus on the speed at which $p(t)$ moves, rather than the increase in fitness. Why do I call this speed the ‘rate at which the population learns information’? It’s because we’re measuring this speed using the Fisher information metric, which is closely connected to relative information, also known as relative entropy or the Kullback–Leibler divergence.

I explained this back in Part 7, but that explanation seems hopelessly technical to me now, so here’s a faster one, which I created while preparing my talk.

The information of a probability distribution $q$ relative to a probability distribution $p$ is

$\displaystyle{ I(q,p) = \sum_{i =1}^n q_i \log\left(\frac{q_i}{p_i}\right) }$

It says how much information you learn if you start with a hypothesis $p$ saying that the probability of the ith situation was $p_i,$ and then update this to a new hypothesis $q.$
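In code, relative information is a one-liner. Here's a small sketch (the function name is mine), using natural logarithms so the answer comes out in nats:

```python
import numpy as np

def relative_information(q, p):
    """I(q, p) = sum_i q_i log(q_i / p_i), in nats."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    return np.sum(q * np.log(q / p))

# Updating from a uniform prior to a sharper hypothesis:
I = relative_information([0.7, 0.2, 0.1], [1/3, 1/3, 1/3])
```

Relative information is always nonnegative, and zero exactly when $q = p$.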

Now suppose you have a hypothesis that’s changing with time in a smooth way, given by a time-dependent probability $p(t).$ Then a calculation shows that

$\displaystyle{ \left.\frac{d}{dt} I(p(t),p(t_0)) \right|_{t = t_0} = 0 }$

for all times $t_0$. This seems paradoxical at first. I like to jokingly put it this way:

To first order, you’re never learning anything.

However, as long as the velocity $\frac{d}{dt}p(t_0)$ is nonzero, we have

$\displaystyle{ \left.\frac{d^2}{dt^2} I(p(t),p(t_0)) \right|_{t = t_0} > 0 }$

so we can say

To second order, you’re always learning something… unless your opinions are fixed.

This lets us define a ‘rate of learning’—that is, a ‘speed’ at which the probability distribution $p(t)$ moves. And this is precisely the speed given by the Fisher information metric!

In other words:

$\displaystyle{ \left\|\frac{dp}{dt}(t_0)\right\|^2 = \left.\frac{d^2}{dt^2} I(p(t),p(t_0)) \right|_{t = t_0} }$

where the length is given by the Fisher information metric. Indeed, this formula can be used to define the Fisher information metric. From this definition we can easily work out the concrete formula I gave earlier.
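We can watch all this happen numerically. Here's a sketch using a hypothetical smooth path of probability distributions (my own made-up example): the first derivative of the relative information vanishes at $t_0$, while the second central difference recovers the Fisher-metric squared speed:

```python
import numpy as np

def I(q, p):                          # relative information, as above
    return np.sum(q * np.log(q / p))

def path(t):                          # a made-up smooth path of distributions
    return np.array([(1 + t) / 3, 1 / 3, (1 - t) / 3])

t0, h = 0.1, 1e-4
p0 = path(t0)

# Second derivative of I(p(t), p(t0)) at t = t0, by central differences.
# (The first derivative vanishes there, as claimed above.)
d2I = (I(path(t0 + h), p0) - 2 * I(p0, p0) + I(path(t0 - h), p0)) / h**2

# Fisher-metric squared speed; this path is linear in t, so dp/dt is exact:
v = np.array([1 / 3, 0.0, -1 / 3])
speed_sq = np.sum(v**2 / p0)
```

The two quantities agree to several digits, and the forward difference $(I(p(t_0+h), p(t_0)) - 0)/h$ is tiny: to first order, you're never learning anything.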

In summary: as a probability distribution moves around, the relative information between the new probability distribution and the original one grows approximately as the square of time, not linearly. So, to talk about a ‘rate at which information is learned’, we need to use the above formula, involving a second time derivative. This rate is just the speed at which the probability distribution moves, measured using the Fisher information metric. And when we have a probability distribution describing how many replicators are of different species, and it’s evolving according to the replicator equation, this speed is also just the variance of the fitness!

## Biology as Information Dynamics (Part 1)

31 January, 2017

This is my talk for the workshop Biological Complexity: Can It Be Quantified?

• John Baez, Biology as information dynamics, 2 February 2017.

Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’—a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clean general formulation of Fisher’s fundamental theorem of natural selection.

• Marc Harper, The replicator equation as an inference dynamic.

• Marc Harper, Information geometry and evolutionary game theory.

• Barry Sinervo and Curt M. Lively, The rock-paper-scissors game and the evolution of alternative male strategies, Nature 380 (1996), 240–243.

• John Baez, Diversity, entropy and thermodynamics.

• John Baez, Information geometry.

The last reference contains proofs of the equations shown in red in my slides. In particular, Part 16 contains a proof of my updated version of Fisher’s fundamental theorem.

## Quantifying Biological Complexity

23 January, 2017

Next week I’m going to this workshop:

Biological Complexity: Can It Be Quantified?, 1–3 February 2017, Beyond Center for Fundamental Concepts in Science, Arizona State University, Tempe Arizona. Organized by Paul Davies.

I haven’t heard that any of it will be made publicly available, but I’ll see if there’s something I can show you. Here’s the schedule:

### Wednesday February 1st

9:00 – 9:30 am Paul Davies

Brief welcome address, outline of the subject and aims of the meeting

#### Session 1. Life: do we know it when we see it?

9:30 – 10:15 am: Chris McKay, “Mission to Enceladus”

10:15 – 10:45 am: Discussion

10:45 – 11:15 am: Tea/coffee break

11:15 – 12:00 pm: Kate Adamala, “Alive but not life”

12:00 – 12:30 pm: Discussion

12:30 – 2:00 pm: Lunch

#### Session 2. Quantifying life

2:00 – 2:45 pm: Lee Cronin, “The living and the dead: molecular signatures of life”

2:45 – 3:30 pm: Sara Walker, “Can we build a life meter?”

3:30 – 4:00 pm: Discussion

4:00 – 4:30 pm: Tea/coffee break

4:30 – 5:15 pm: Manfred Laubichler, “Complexity is smaller than you think”

5:15 – 5:30 pm: Discussion

#### The Beyond Annual Lecture

7:00 – 8:30 pm: Sean Carroll, “Our place in the universe”

### Thursday February 2nd

#### Session 3: Life, information and the second law of thermodynamics

9:00 – 9:45 am: James Crutchfield, “Vital bits: the fuel of life”

9:45 – 10:00 am: Discussion

10:00 – 10:45 am: John Baez, “Information and entropy in biology”

10:45 – 11:00 am: Discussion

11:00 – 11:30 am: Tea/coffee break

11:30 – 12:15 pm: Chris Adami, “What is biological information?”

12:15 – 12:30 pm: Discussion

12:30 – 2:00 pm: Lunch

#### Session 4: The emergence of agency

2:00 – 2:45 pm: Olaf Khang Witkowski, “When do autonomous agents act collectively?”

2:45 – 3:00 pm: Discussion

3:00 – 3:45 pm: William Marshall, “When macro beats micro”

3:45 – 4:00 pm: Discussion

4:00 – 4:30 pm: Tea/coffee break

4:30 – 5:15 pm: Alexander Boyd, “Biology’s demons”

5:15 – 5:30 pm: Discussion

### Friday February 3rd

#### Session 5: New physics?

9:00 – 9:45 am: Sean Carroll, “Laws of complexity, laws of life?”

9:45 – 10:00 am: Discussion

10:00 – 10:45 am: Andreas Wagner, “The arrival of the fittest”

10:45 – 11:00 am: Discussion

11:00 – 11:30 am: Tea/coffee break

11:30 – 12:30 pm: George Ellis, “Top-down causation demands new laws”

12:30 – 2:00 pm: Lunch

## Algorithmic Thermodynamics (Part 3)

15 November, 2016

This is my talk for the Santa Fe Institute workshop on Statistical Mechanics, Information Processing and Biology:

It’s about the link between computation and entropy. I take the idea of a Turing machine for granted, but starting with that I explain recursive functions, the Church-Turing thesis, Kolmogorov complexity, the relation between Kolmogorov complexity and Shannon entropy, the uncomputability of Kolmogorov complexity, the ‘complexity barrier’, Levin’s computable version of complexity, and finally my work with Mike Stay on algorithmic thermodynamics.

In my talk slides I mention the ‘complexity barrier’, and state this theorem:

Theorem. Choose your favorite set of axioms for math. If it’s finite and consistent, there exists C ≥ 0, the complexity barrier, such that for no natural number n can you prove the Kolmogorov complexity of n exceeds C.

For a sketch of the proof of this result, go here:

In my talk I showed a movie related to this: an animated video created in 2009 using a program less than 4 kilobytes long that runs on a Windows XP machine:

### For more

For more details, read our paper:

• John Baez and Mike Stay, Algorithmic thermodynamics, Math. Struct. Comp. Sci. 22 (2012), 771–787.

or these blog articles:

They all emphasize slightly different aspects!

## Information Processing and Biology

7 November, 2016

The Santa Fe Institute, in New Mexico, is a place for studying complex systems. I’ve never been there! Next week I’ll go there to give a colloquium on network theory, and also to participate in this workshop:

Statistical Mechanics, Information Processing and Biology, November 16–18, Santa Fe Institute. Organized by David Krakauer, Michael Lachmann, Manfred Laubichler, Peter Stadler, and David Wolpert.

Abstract. This workshop will address a fundamental question in theoretical biology: Does the relationship between statistical physics and the need of biological systems to process information underpin some of their deepest features? It recognizes that a core feature of biological systems is that they acquire, store and process information (i.e., perform computation). However to manipulate information in this way they require a steady flux of free energy from their environments. These two, inter-related attributes of biological systems are often taken for granted; they are not part of standard analyses of either the homeostasis or the evolution of biological systems. In this workshop we aim to fill in this major gap in our understanding of biological systems, by gaining deeper insight in the relation between the need for biological systems to process information and the free energy they need to pay for that processing.

The goal of this workshop is to address these issues by focusing on a set of three specific questions: 1) How has the fraction of free energy flux on earth that is used by biological computation changed with time? 2) What is the free energy cost of biological computation or functioning? 3) What is the free energy cost of the evolution of biological computation or functioning? In all of these cases we are interested in the fundamental limits that the laws of physics impose on various aspects of living systems as expressed by these three questions.

I think it’s not open to the public, but I will try to blog about it. The speakers include a lot of experts on information theory, statistical mechanics, and biology. Here they are:

Wednesday November 16: Chris Jarzynski, Seth Lloyd, Artemy Kolchinski, John Baez, Manfred Laubichler, Harold de Vladar, Sonja Prohaska, Chris Kempes.

Thursday November 17: Phil Ball, Matina C. Donaldson-Matasci, Sebastian Deffner, David Wolpert, Daniel Polani, Christoph Flamm, Massimiliano Esposito, Hildegard Meyer-Ortmanns, Blake Pollard, Mikhail Prokopenko, Peter Stadler, Ben Machta.

Friday November 18: Jim Crutchfield, Sara Walker, Hyunju Kim, Takahiro Sagawa, Michael Lachmann, Wojciech Zurek, Christian Van den Broeck, Susanne Still, Chris Stephens.

## Information Geometry (Part 15)

14 January, 2016

joint with Blake Pollard

Lately we’ve been thinking about open Markov processes. These are random processes where something can hop randomly from one state to another (that’s the ‘Markov process’ part) but also enter or leave the system (that’s the ‘open’ part).

The ultimate goal is to understand the nonequilibrium thermodynamics of open systems—systems where energy and maybe matter flows in and out. If we could understand this well enough, we could understand in detail how life works. That’s a difficult job! But one has to start somewhere, and this is one place to start.

We have a few papers on this subject:

• Blake Pollard, A Second Law for open Markov processes. (Blog article here.)

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

• Blake Pollard, Open Markov processes: A compositional perspective on non-equilibrium steady states in biology. (Blog article here.)

However, right now we just want to show you three closely connected results about how relative entropy changes in open Markov processes.

### Definitions

An open Markov process consists of a finite set $X$ of states, a subset $B \subseteq X$ of boundary states, and an infinitesimal stochastic operator $H: \mathbb{R}^X \to \mathbb{R}^X,$ meaning a linear operator with

$H_{ij} \geq 0 \ \ \text{for all} \ \ i \neq j$

and

$\sum_i H_{ij} = 0 \ \ \text{for all} \ \ j$

For each state $i \in X$ we introduce a population $p_i \in [0,\infty).$ We call the resulting function $p : X \to [0,\infty)$ the population distribution.

Populations evolve in time according to the open master equation:

$\displaystyle{ \frac{dp_i}{dt} = \sum_j H_{ij}p_j} \ \ \text{for all} \ \ i \in X-B$

$p_i(t) = b_i(t) \ \ \text{for all} \ \ i \in B$

So, the populations $p_i$ obey a linear differential equation at states $i$ that are not in the boundary, but they are specified ‘by the user’ to be chosen functions $b_i$ at the boundary states. The off-diagonal entries $H_{ij}$ for $i \neq j$ describe the rate at which population transitions from the $j$th to the $i$th state.
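Here's a minimal sketch of the open master equation in Python, with made-up rates: one boundary state whose population is held fixed by hand, and two interior states that obey the master equation. The system settles into a steady state driven by the boundary:

```python
import numpy as np

# Made-up rates. Columns sum to zero: H is infinitesimal stochastic.
H = np.array([[-1.0,  0.5,  0.0],
              [ 1.0, -1.5,  2.0],
              [ 0.0,  1.0, -2.0]])
assert np.allclose(H.sum(axis=0), 0)

boundary = [0]
p = np.array([2.0, 0.0, 0.0])       # state 0 is held at population 2
dt = 1e-3
for _ in range(60000):
    dp = H @ p
    dp[boundary] = 0.0              # boundary populations are set by hand
    p += dt * dp
```

At the end, $\frac{dp_i}{dt} = \sum_j H_{ij} p_j$ vanishes at the interior states but not at the boundary state, so this is a steady state rather than an equilibrium.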

A closed Markov process, or continuous-time discrete-state Markov chain, is an open Markov process whose boundary is empty. For a closed Markov process, the open master equation becomes the usual master equation:

$\displaystyle{ \frac{dp}{dt} = Hp }$

In a closed Markov process the total population is conserved:

$\displaystyle{ \frac{d}{dt} \sum_{i \in X} p_i = \sum_{i,j} H_{ij}p_j = 0 }$

This lets us normalize the initial total population to 1 and have it stay equal to 1. If we do this, we can talk about probabilities instead of populations. In an open Markov process, population can flow in and out at the boundary states.

For any pair of distinct states $i,j,$ $H_{ij}p_j$ is the flow of population from $j$ to $i.$ The net flux of population from the $j$th state to the $i$th state is the flow from $j$ to $i$ minus the flow from $i$ to $j$:

$J_{ij} = H_{ij}p_j - H_{ji}p_i$

A steady state is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an equilibrium. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. The idea is that population can flow in or out at the boundary states.

We say an equilibrium $p : X \to [0,\infty)$ of a Markov process is detailed balanced if all the net fluxes vanish:

$J_{ij} = 0 \ \ \text{for all} \ \ i,j \in X$

or in other words:

$H_{ij}p_j = H_{ji}p_i \ \ \text{for all} \ \ i,j \in X$

Given two population distributions $p, q : X \to [0,\infty)$ we can define the relative entropy

$\displaystyle{ I(p,q) = \sum_i p_i \ln \left( \frac{p_i}{q_i} \right)}$

When $q$ is a detailed balanced equilibrium solution of the master equation, the relative entropy can be seen as the ‘free energy’ of $p.$ For a precise statement, see Section 4 of Relative entropy in biological systems.

The Second Law of Thermodynamics implies that the free energy of a closed system tends to decrease with time, so for closed Markov processes we expect $I(p,q)$ to be nonincreasing. And this is true! But for open Markov processes, free energy can flow in from outside. This is just one of several nice results about how relative entropy changes with time.

### Results

Theorem 1. Consider an open Markov process with $X$ as its set of states and $B$ as the set of boundary states. Suppose $p(t)$ and $q(t)$ obey the open master equation, and let the quantities

$\displaystyle{ \frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_{j \in X} H_{ij}p_j }$

$\displaystyle{ \frac{Dq_i}{Dt} = \frac{dq_i}{dt} - \sum_{j \in X} H_{ij}q_j }$

measure how much the time derivatives of $p_i$ and $q_i$ fail to obey the master equation. Then we have

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q(t)) } &=& \displaystyle{ \sum_{i, j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)} \\ \\ && \; + \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} } \end{array}$

This result separates the change in relative entropy into two parts: an ‘internal’ part and a ‘boundary’ part.

It turns out the ‘internal’ part is always less than or equal to zero. So, from Theorem 1 we can deduce a version of the Second Law of Thermodynamics for open Markov processes:

Theorem 2. Given the conditions of Theorem 1, we have

$\displaystyle{ \frac{d}{dt} I(p(t),q(t)) \; \le \; \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} }$

Intuitively, this says that free energy can only increase if it comes in from the boundary!

There is another nice result that holds when $q$ is an equilibrium solution of the master equation. This idea seems to go back to Schnakenberg:

Theorem 3. Given the conditions of Theorem 1, suppose also that $q$ is an equilibrium solution of the master equation. Then we have

$\displaystyle{ \frac{d}{dt} I(p(t),q) = -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} \; + \; \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} }$

where

$J_{ij} = H_{ij}p_j - H_{ji}p_i$

is the net flux from $j$ to $i,$ while

$\displaystyle{ A_{ij} = \ln \left(\frac{p_j q_i}{p_i q_j} \right) }$

is the conjugate thermodynamic force.

The flux $J_{ij}$ has a nice meaning: it’s the net flow of population from $j$ to $i.$ The thermodynamic force is a bit subtler, but this theorem reveals its meaning: it says how much the population wants to flow from $j$ to $i.$

More precisely, up to that factor of $1/2,$ the thermodynamic force $A_{ij}$ says how much free energy loss is caused by net flux from $j$ to $i.$ There’s a nice analogy here to water losing potential energy as it flows downhill due to the force of gravity.
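Theorem 3 is easy to check numerically. Here's a sketch with arbitrary rates: a closed 3-state Markov process with a cyclic flow, so its equilibrium $q$ is not even detailed balanced, yet the formula holds exactly:

```python
import numpy as np

H = np.array([[-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 0.0,  1.0, -1.0]])   # cyclic flow among the three states
q = np.array([1/3, 1/3, 1/3])        # equilibrium: Hq = 0 (not detailed balanced)
assert np.allclose(H @ q, 0)

p = np.array([0.5, 0.3, 0.2])        # an arbitrary distribution

# Left side: d/dt I(p(t), q), using dp/dt = Hp and dI/dp_i = 1 + ln(p_i/q_i).
dIdt = np.sum((1 + np.log(p / q)) * (H @ p))

# Right side: -(1/2) sum_ij J_ij A_ij.
J = H * p - (H * p).T                        # J_ij = H_ij p_j - H_ji p_i
A = np.log(np.outer(q, p) / np.outer(p, q))  # A_ij = ln(p_j q_i / (p_i q_j))
rhs = -0.5 * np.sum(J * A)
```

Both sides come out equal to machine precision.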

### Proofs

Proof of Theorem 1. We begin by taking the time derivative of the relative information:

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q(t)) } &=& \displaystyle{ \sum_{i \in X} \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} + \frac{\partial I}{\partial q_i} \frac{dq_i}{dt} } \end{array}$

We can separate this into a sum over states $i \in X - B,$ for which the time derivatives of $p_i$ and $q_i$ are given by the master equation, and boundary states $i \in B,$ for which they are not:

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q(t)) } &=& \displaystyle{ \sum_{i \in X-B, \; j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j }\\ \\ && + \; \; \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} + \frac{\partial I}{\partial q_i} \frac{dq_i}{dt}} \end{array}$

For boundary states we have

$\displaystyle{ \frac{dp_i}{dt} = \frac{Dp_i}{Dt} + \sum_{j \in X} H_{ij}p_j }$

and similarly for the time derivative of $q_i.$ We thus obtain

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q(t)) } &=& \displaystyle{ \sum_{i,j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j }\\ \\ && + \; \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt}} \end{array}$

To evaluate the first sum, recall that

$\displaystyle{ I(p,q) = \sum_{i \in X} p_i \ln (\frac{p_i}{q_i})}$

so

$\displaystyle{\frac{\partial I}{\partial p_i}} =\displaystyle{1 + \ln (\frac{p_i}{q_i})} , \qquad \displaystyle{ \frac{\partial I}{\partial q_i}}= \displaystyle{- \frac{p_i}{q_i} }$

Thus, we have

$\displaystyle{ \sum_{i,j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j = \sum_{i,j\in X} (1 + \ln (\frac{p_i}{q_i})) H_{ij} p_j - \frac{p_i}{q_i} H_{ij} q_j }$

We can rewrite this as

$\displaystyle{ \sum_{i,j \in X} H_{ij} p_j \left( 1 + \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }$

Since $H_{ij}$ is infinitesimal stochastic we have $\sum_{i} H_{ij} = 0,$ so the first term drops out, and we are left with

$\displaystyle{ \sum_{i,j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }$

as desired.   █

Proof of Theorem 2. Thanks to Theorem 1, to prove

$\displaystyle{ \frac{d}{dt} I(p(t),q(t)) \; \le \; \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} }$

it suffices to show that

$\displaystyle{ \sum_{i,j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) \le 0 }$

or equivalently (recalling the proof of Theorem 1):

$\displaystyle{ \sum_{i,j} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) \le 0 }$

The last two terms on the left hand side cancel when $i = j.$ Thus, if we break the sum into an $i \ne j$ part and an $i = j$ part, the left side becomes

$\displaystyle{ \sum_{i \ne j} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) \; + \; \sum_j H_{jj} p_j \ln(\frac{p_j}{q_j}) }$

Next we can use the infinitesimal stochastic property of $H$ to write $H_{jj}$ as the sum of $-H_{ij}$ over $i$ not equal to $j,$ obtaining

$\displaystyle{ \sum_{i \ne j} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) - \sum_{i \ne j} H_{ij} p_j \ln(\frac{p_j}{q_j}) } =$

$\displaystyle{ \sum_{i \ne j} H_{ij} p_j \left( \ln(\frac{p_iq_j}{p_j q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) }$

Since $H_{ij} \ge 0$ when $i \ne j$ and $\ln(s) + 1 - s \le 0$ for all $s > 0,$ we conclude that this quantity is $\le 0.$   █
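Here's a quick numerical spot check of this inequality: for randomly generated infinitesimal stochastic operators and arbitrary positive distributions (all made up on the spot), the ‘internal’ term is never positive.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    n = 4
    H = rng.random((n, n))                # random nonnegative off-diagonal rates
    np.fill_diagonal(H, 0.0)
    np.fill_diagonal(H, -H.sum(axis=0))   # make columns sum to zero

    p = rng.random(n) + 0.1               # arbitrary positive distributions
    q = rng.random(n) + 0.1

    # sum_ij H_ij p_j ( ln(p_i/q_i) - p_i q_j / (p_j q_i) )
    internal = np.sum(H * p * (np.log(p / q)[:, None] - np.outer(p / q, q / p)))
    assert internal <= 1e-12
```

Of course this is no substitute for the proof, but it's reassuring.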

Proof of Theorem 3. Now suppose also that $q$ is an equilibrium solution of the master equation. Then $Dq_i/Dt = dq_i/dt = 0$ for all states $i,$ so by Theorem 1 we need to show

$\displaystyle{ \sum_{i, j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) \; = \; -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} }$

We also have $\sum_{j \in X} H_{ij} q_j = 0,$ so the second term in the sum at left vanishes, and it suffices to show

$\displaystyle{ \sum_{i, j \in X} H_{ij} p_j \ln(\frac{p_i}{q_i}) \; = \; - \frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} }$

By definition we have

$\displaystyle{ \frac{1}{2} \sum_{i,j} J_{ij} A_{ij}} = \displaystyle{ \frac{1}{2} \sum_{i,j} \left( H_{ij} p_j - H_{ji}p_i \right) \ln \left( \frac{p_j q_i}{p_i q_j} \right) }$

This in turn equals

$\displaystyle{ \frac{1}{2} \sum_{i,j} H_{ij}p_j \ln \left( \frac{p_j q_i}{p_i q_j} \right) - \frac{1}{2} \sum_{i,j} H_{ji}p_i \ln \left( \frac{p_j q_i}{p_i q_j} \right) }$

and we can switch the dummy indices $i,j$ in the second sum, obtaining

$\displaystyle{ \frac{1}{2} \sum_{i,j} H_{ij}p_j \ln \left( \frac{p_j q_i}{p_i q_j} \right) - \frac{1}{2} \sum_{i,j} H_{ij}p_j \ln \left( \frac{p_i q_j}{p_j q_i} \right) }$

or simply

$\displaystyle{ \sum_{i,j} H_{ij} p_j \ln \left( \frac{p_j q_i}{p_i q_j} \right) }$

But this is

$\displaystyle{ \sum_{i,j} H_{ij} p_j \left(\ln ( \frac{p_j}{q_j}) + \ln (\frac{q_i}{p_i}) \right) }$

and the first term vanishes because $H$ is infinitesimal stochastic: $\sum_i H_{ij} = 0.$ We thus have

$\displaystyle{ \frac{1}{2} \sum_{i,j} J_{ij} A_{ij}} = \sum_{i,j} H_{ij} p_j \ln (\frac{q_i}{p_i} )$

as desired.   █

## Information Geometry (Part 14)

11 January, 2016

joint with Blake Pollard

It’s been a long time since you’ve seen an installment of the information geometry series on this blog! Before I took a long break, I was explaining relative entropy and how it changes in evolutionary games. Much of what I said is summarized and carried further here:

• John Baez and Blake Pollard, Relative entropy in biological systems, Entropy 18 (2016), 46. (Blog article here.)

But now Blake has a new paper, and I want to talk about that:

• Blake Pollard, Open Markov processes: a compositional perspective on non-equilibrium steady states in biology, Entropy 18 (2016), 140.

I’ll focus on just one aspect: the principle of minimum entropy production. This is an exciting yet controversial principle in non-equilibrium thermodynamics. Blake examines it in a situation where we can tell exactly what’s happening.

Life exists away from equilibrium. Left isolated, systems will tend toward thermodynamic equilibrium. However, biology is about open systems: physical systems that exchange matter or energy with their surroundings. Open systems can be maintained away from equilibrium by this exchange. This leads to the idea of a non-equilibrium steady state—a state of an open system that doesn’t change, but is not in equilibrium.

A simple example is a pan of water sitting on a stove. Heat passes from the flame to the water and then to the air above. If the flame is very low, the water doesn’t boil and nothing moves. So, we have a steady state, at least approximately. But this is not an equilibrium, because there is a constant flow of energy through the water.

Of course in reality the water will be slowly evaporating, so we don’t really have a steady state. As always, models are approximations. If the water is evaporating slowly enough, it can be useful to approximate the situation with a non-equilibrium steady state.

There is much more to biology than steady states. However, to dip our toe into the chilly waters of non-equilibrium thermodynamics, it is nice to start with steady states. And already here there are puzzles left to solve.

### Minimum entropy production

Ilya Prigogine won the Nobel prize for his work on non-equilibrium thermodynamics. One reason is that he had an interesting idea about steady states. He claimed that under certain conditions, a non-equilibrium steady state will minimize entropy production!

There has been a lot of work trying to make the ‘principle of minimum entropy production’ precise and turn it into a theorem. In this book:

• G. Lebon and D. Jou, Understanding Non-equilibrium Thermodynamics, Springer, Berlin, 2008.

the authors give an argument for the principle of minimum entropy production based on four conditions:

• time-independent boundary conditions: the surroundings of the system don’t change with time.

• linear phenomenological laws: the laws governing the macroscopic behavior of the system are linear.

• constant phenomenological coefficients: the laws governing the macroscopic behavior of the system don’t change with time.

• symmetry of the phenomenological coefficients: since they are linear, the laws governing the macroscopic behavior of the system can be described by a linear operator, and we demand that in a suitable basis the matrix for this operator is symmetric: $T_{ij} = T_{ji}.$

The last condition is obviously the subtlest one; it’s sometimes called Onsager reciprocity, and people have spent a lot of time trying to derive it from other conditions.

However, Blake goes in a different direction. He considers a concrete class of open systems, a very large class called ‘open Markov processes’. These systems obey the first three conditions listed above, and the ‘detailed balanced’ open Markov processes also obey the last one. But Blake shows that minimum entropy production holds only approximately—with the approximation being good for steady states that are near equilibrium!

However, he shows that another minimum principle holds exactly, even for steady states that are far from equilibrium. He calls this the ‘principle of minimum dissipation’.

We actually discussed the principle of minimum dissipation in an earlier paper:

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

But one advantage of Blake’s new paper is that it presents the results with a minimum of category theory. Of course I love category theory, and I think it’s the right way to formalize open systems, but it can be intimidating.

Another good thing about Blake’s new paper is that it explicitly compares the principle of minimum entropy production to the principle of minimum dissipation. He shows they agree in a certain limit—namely, the limit where the system is close to equilibrium.

Let me explain this. I won’t include the nice example from biology that Blake discusses: a very simple model of membrane transport. For that, read his paper! I’ll just give the general results.

### The principle of minimum dissipation

An open Markov process consists of a finite set $X$ of states, a subset $B \subseteq X$ of boundary states, and an infinitesimal stochastic operator $H: \mathbb{R}^X \to \mathbb{R}^X,$ meaning a linear operator with

$H_{ij} \geq 0 \ \ \text{for all} \ \ i \neq j$

and

$\sum_i H_{ij} = 0 \ \ \text{for all} \ \ j$

I’ll explain these two conditions in a minute.

For each $i \in X$ we introduce a population $p_i \in [0,\infty).$ We call the resulting function $p : X \to [0,\infty)$ the population distribution. Populations evolve in time according to the open master equation:

$\displaystyle{ \frac{dp_i}{dt} = \sum_j H_{ij}p_j} \ \ \text{for all} \ \ i \in X-B$

$p_i(t) = b_i(t) \ \ \text{for all} \ \ i \in B$

So, the populations $p_i$ obey a linear differential equation at states $i$ that are not in the boundary, but they are specified ‘by the user’ to be chosen functions $b_i$ at the boundary states.
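If you like seeing equations in action, here is a minimal numerical sketch of the open master equation. Everything in it—the matrix $H$, the boundary populations, the time step—is made up for illustration, not taken from Blake’s paper:

```python
# Euler integration of the open master equation on a 3-state chain.
# States 0 and 2 are boundary states whose populations are clamped by
# the user; only the interior state 1 obeys dp_i/dt = sum_j H_ij p_j.

H = [[-1.0,  2.0,  0.0],   # infinitesimal stochastic: columns sum to zero,
     [ 1.0, -3.0,  1.0],   # off-diagonal entries are non-negative
     [ 0.0,  1.0, -1.0]]

boundary = {0: 1.0, 2: 0.5}   # user-chosen boundary populations b_i
p = [1.0, 0.0, 0.5]
dt = 0.001
for _ in range(20000):
    dp = [sum(H[i][j] * p[j] for j in range(3)) for i in range(3)]
    p = [p[i] + dt * dp[i] for i in range(3)]
    for i, b in boundary.items():   # the master equation does not hold here
        p[i] = b

# the interior population relaxes to a steady state: dp_1/dt = 0
rate = sum(H[1][j] * p[j] for j in range(3))
print(p[1], rate)   # ≈ 0.5 and ≈ 0
```

The clamping step is where population flows in or out at the boundary: the interior dynamics would not by itself keep $p_0$ and $p_2$ constant.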

The off-diagonal entries $H_{ij}, \ i \neq j$ are the rates at which population hops from the $j$th to the $i$th state. This lets us understand the definition of an infinitesimal stochastic operator. The first condition:

$H_{ij} \geq 0 \ \ \text{for all} \ \ i \neq j$

says that the rate for population to transition from one state to another is non-negative. The second:

$\sum_i H_{ij} = 0 \ \ \text{for all} \ \ j$

says that population is conserved, at least if there are no boundary states. Population can flow in or out at boundary states, since the master equation doesn’t hold there.

A steady state is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an equilibrium. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. Again, the reason is that population can flow in or out at the boundary.

We say an equilibrium $q : X \to [0,\infty)$ of a Markov process is detailed balanced if the rate at which population flows from the $i$th state to the $j$th state is equal to the rate at which it flows from the $j$th state to the $i$th:

$H_{ji}q_i = H_{ij}q_j \ \ \text{for all} \ \ i,j \in X$

Suppose we’ve got an open Markov process that has a detailed balanced equilibrium $q$. Then a non-equilibrium steady state $p$ will minimize a function called the ‘dissipation’, subject to constraints on its boundary populations. There’s a nice formula for the dissipation in terms of $p$ and $q.$

Definition. Given an open Markov process with detailed balanced equilibrium $q$ we define the dissipation for a population distribution $p$ to be

$\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left( \frac{p_j}{q_j} - \frac{p_i}{q_i} \right)^2 }$

This formula is a bit tricky, but you’ll notice it’s quadratic in $p$ and it vanishes when $p = q.$ So, it’s pretty nice.

Using this concept we can formulate a principle of minimum dissipation, and prove that non-equilibrium steady states obey this principle:

Definition. We say a population distribution $p: X \to \mathbb{R}$ obeys the principle of minimum dissipation with boundary population $b: X \to \mathbb{R}$ if $p$ minimizes $D(p)$ subject to the constraint that

$p_i = b_i \ \ \text{for all} \ \ i \in B$

Theorem 1. A population distribution $p$ is a steady state with $p_i = b_i$ for all boundary states $i$ if and only if $p$ obeys the principle of minimum dissipation with boundary population $b.$

Proof. This follows from Theorem 28 in A compositional framework for Markov processes.
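You can also check Theorem 1 numerically on a toy example. Here is one of my own devising (not from the paper): a 3-state open Markov process with a detailed balanced equilibrium $q$, where we scan over the interior population and confirm that the dissipation is minimized exactly at the steady state:

```python
# Numerical check of Theorem 1 on a made-up 3-state example: the steady
# state with given boundary populations minimizes the dissipation
# D(p) = (1/2) sum_ij H_ij q_j (p_j/q_j - p_i/q_i)^2.

H = [[-1.0,  2.0,  0.0],
     [ 1.0, -3.0,  1.0],
     [ 0.0,  1.0, -1.0]]
q = [1.0, 0.5, 0.5]           # detailed balanced: H_ji q_i = H_ij q_j

def dissipation(p):
    return 0.5 * sum(H[i][j] * q[j] * (p[j]/q[j] - p[i]/q[i])**2
                     for i in range(3) for j in range(3))

# boundary states 0 and 2 clamped at b_0 = 1, b_2 = 2; scan the interior p_1
best = min((dissipation([1.0, p1, 2.0]), p1)
            for p1 in [k / 1000 for k in range(1, 3000)])
print(best)   # minimum at p_1 = 1.0, exactly where dp_1/dt vanishes
```

At $p_1 = 1$ we indeed have $H_{10} p_0 + H_{11} p_1 + H_{12} p_2 = 1 - 3 + 2 = 0$, so the minimizer of the dissipation is the steady state, as the theorem says.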

### Minimum entropy production versus minimum dissipation

How does dissipation compare with entropy production? To answer this, first we must ask: what really is entropy production? And: how does the equilibrium state $q$ show up in the concept of entropy production?

The relative entropy of two population distributions $p,q$ is given by

$\displaystyle{ I(p,q) = \sum_i p_i \ln \left( \frac{p_i}{q_i} \right) }$

It is well known that for a closed Markov process with $q$ as a detailed balanced equilibrium, the relative entropy is monotonically decreasing with time. This is due to an annoying sign convention in the definition of relative entropy: while entropy is typically increasing, relative entropy typically decreases. We could fix this by putting a minus sign in the above formula or giving this quantity $I(p,q)$ some other name. A lot of people call it the Kullback–Leibler divergence, but I have taken to calling it relative information. For more, see:

• John Baez and Blake Pollard, Relative entropy in biological systems. (Blog article here.)

We say ‘relative entropy’ in the title, but then we explain why ‘relative information’ is a better name, and use that. More importantly, we explain why $I(p,q)$ has the physical meaning of free energy. Free energy tends to decrease, so everything is okay. For details, see Section 4.
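Here is a quick numerical illustration of the monotone decrease just mentioned, on a toy 2-state closed Markov process of my own (not from the paper):

```python
import math

# For a closed Markov process with detailed balanced equilibrium q,
# the relative information I(p,q) = sum_i p_i ln(p_i/q_i) decreases in time.

H = [[-1.0,  2.0],
     [ 1.0, -2.0]]             # 2-state closed process; columns sum to zero
q = [2/3, 1/3]                 # equilibrium: H q = 0

def I(p):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.1, 0.9]
dt = 0.01
values = []
for _ in range(500):
    values.append(I(p))
    dp = [sum(H[i][j] * p[j] for j in range(2)) for i in range(2)]
    p = [p[i] + dt * dp[i] for i in range(2)]

assert all(a >= b for a, b in zip(values, values[1:]))   # monotone decrease
print(values[0], values[-1])   # starts positive, ends near zero
```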

Blake has a nice formula for how fast $I(p,q)$ decreases:

Theorem 2. Consider an open Markov process with $X$ as its set of states and $B$ as the set of boundary states. Suppose $p(t)$ obeys the open master equation and $q$ is a detailed balanced equilibrium. For any boundary state $i \in B,$ let

$\displaystyle{ \frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_{j \in X} H_{ij}p_j }$

measure how much $p_i$ fails to obey the master equation. Then we have

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q) } &=& \displaystyle{ \sum_{i, j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)} \\ \\ && \; + \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} } \end{array}$

Moreover, the first term is less than or equal to zero.

Proof. For a self-contained proof, see Information geometry (part 15), which is coming up soon. It will be a special case of the theorems there.   █

Blake compares this result to previous work by Schnakenberg:

• J. Schnakenberg, Network theory of microscopic and macroscopic behavior of master equation systems, Rev. Mod. Phys. 48 (1976), 571–585.

The negative of Blake’s first term is this:

$\displaystyle{ K(p) = - \sum_{i, j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }$

Under certain circumstances, this equals what Schnakenberg calls the entropy production. But a better name for this quantity might be free energy loss, since for a closed Markov process that’s exactly what it is! In this case there are no boundary states, so the theorem above says $K(p)$ is the rate at which relative entropy—or in other words, free energy—decreases.

For an open Markov process, things are more complicated. The theorem above shows that free energy can also flow in or out at the boundary, thanks to the second term in the formula.
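To make the ‘free energy loss’ interpretation concrete, here is a small sanity check on a made-up 2-state closed Markov process: with no boundary states, $K(p)$ should equal minus the rate of change of relative information, and be nonnegative.

```python
import math

# Check that K(p) = -dI/dt for a closed Markov process, where
# I(p) = sum_i p_i ln(p_i/q_i) and q is a detailed balanced equilibrium.

H = [[-1.0,  2.0],
     [ 1.0, -2.0]]
q = [2/3, 1/3]                # equilibrium: H q = 0
p = [0.1, 0.9]

def K(p):
    return -sum(H[i][j] * p[j] *
                (math.log(p[i]/q[i]) - (p[i]*q[j])/(p[j]*q[i]))
                for i in range(2) for j in range(2))

dp = [sum(H[i][j] * p[j] for j in range(2)) for i in range(2)]
dI = sum(dp[i] * (math.log(p[i]/q[i]) + 1) for i in range(2))
print(K(p), -dI)    # both ≈ 4.91, and K(p) >= 0
```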

Anyway, the sensible thing is to compare a principle of ‘minimum free energy loss’ to the principle of minimum dissipation. The principle of minimum dissipation is true. How about the principle of minimum free energy loss? It turns out to be approximately true near equilibrium.

For this, consider the situation in which $p$ is near to the equilibrium distribution $q$ in the sense that

$\displaystyle{ \frac{p_i}{q_i} = 1 + \epsilon_i }$

for some small numbers $\epsilon_i.$ We collect these numbers in a vector called $\epsilon.$

Theorem 3. Consider an open Markov process with $X$ as its set of states and $B$ as the set of boundary states. Suppose $q$ is a detailed balanced equilibrium and let $p$ be arbitrary. Then

$K(p) = D(p) + O(\epsilon^2)$

where $K(p)$ is the free energy loss, $D(p)$ is the dissipation, $\epsilon_i$ is defined as above, and by $O(\epsilon^2)$ we mean a sum of terms of order $\epsilon_i^2.$

Proof. First take the free energy loss:

$\displaystyle{ K(p) = -\sum_{i, j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)}$

Expanding the logarithm to first order in $\epsilon,$ we get

$\displaystyle{ K(p) = -\sum_{i, j \in X} H_{ij} p_j \left( \frac{p_i}{q_i} - 1 - \frac{p_i q_j}{p_j q_i} \right) + O(\epsilon^2) }$

Since $H$ is infinitesimal stochastic, $\sum_i H_{ij} = 0,$ so the second term in the sum vanishes, leaving

$\displaystyle{ K(p) = -\sum_{i, j \in X} H_{ij} p_j \left( \frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i} \right) \; + O(\epsilon^2) }$

or

$\displaystyle{ K(p) = -\sum_{i, j \in X} \left( H_{ij} p_j \frac{p_i}{q_i} - H_{ij} q_j \frac{p_i}{q_i} \right) \; + O(\epsilon^2) }$

Since $q$ is an equilibrium we have $\sum_j H_{ij} q_j = 0,$ so now the last term in the sum vanishes, leaving

$\displaystyle{ K(p) = -\sum_{i, j \in X} H_{ij} \frac{p_i p_j}{q_i} \; + O(\epsilon^2) }$

Next, take the dissipation

$\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left( \frac{p_j}{q_j} - \frac{p_i}{q_i} \right)^2 }$

and expand the square, getting

$\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left( \frac{p_j^2}{q_j^2} - 2\frac{p_i p_j}{q_i q_j} + \frac{p_i^2}{q_i^2} \right) }$

Since $H$ is infinitesimal stochastic, $\sum_i H_{ij} = 0.$ The first term is just this times a function of $j,$ summed over $j,$ so it vanishes, leaving

$\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left(- 2\frac{p_i p_j}{q_i q_j} + \frac{p_i^2}{q_i^2} \right) }$

Since $q$ is an equilibrium, $\sum_j H_{ij} q_j = 0.$ The last term above is this times a function of $i,$ summed over $i,$ so it vanishes, leaving

$\displaystyle{ D(p) = - \sum_{i,j} H_{ij}q_j \frac{p_i p_j}{q_i q_j} = - \sum_{i,j} H_{ij} \frac{p_i p_j}{q_i} }$

This matches what we got for $K(p),$ up to terms of order $O(\epsilon^2).$   █
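You can watch Theorem 3 happen numerically. In this toy 2-state example (mine, not Blake’s), both $K(p)$ and $D(p)$ shrink like $\epsilon^2$ as $p$ approaches $q$, while their difference shrinks much faster:

```python
import math

# Near the detailed balanced equilibrium q, the free energy loss K(p)
# and the dissipation D(p) agree to leading order in epsilon.

H = [[-1.0,  2.0],
     [ 1.0, -2.0]]
q = [2/3, 1/3]

def K(p):
    return -sum(H[i][j] * p[j] *
                (math.log(p[i]/q[i]) - p[i]*q[j]/(p[j]*q[i]))
                for i in range(2) for j in range(2))

def D(p):
    return 0.5 * sum(H[i][j] * q[j] * (p[j]/q[j] - p[i]/q[i])**2
                     for i in range(2) for j in range(2))

for eps in [0.1, 0.01, 0.001]:
    p = [q[0] * (1 + eps), q[1] * (1 - eps)]   # p_i/q_i = 1 + eps_i
    print(eps, K(p), D(p), K(p) - D(p))
# K and D agree to leading order; their difference vanishes faster than either
```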

In short: detailed balanced open Markov processes are governed by the principle of minimum dissipation, not minimum entropy production. Minimum dissipation agrees with minimum entropy production only near equilibrium.

## Relative Entropy in Biological Systems

27 November, 2015

Here’s a paper for the proceedings of a workshop on Information and Entropy in Biological Systems this spring:

• John Baez and Blake Pollard, Relative entropy in biological systems, Entropy 18 (2016), 46.

We’d love any comments or questions you might have. I’m not happy with the title. In the paper we advocate using the term ‘relative information’ instead of ‘relative entropy’—yet the latter is much more widely used, so I feel we need it in the title to let people know what the paper is about!

Here’s the basic idea.

Life relies on nonequilibrium thermodynamics, since in thermal equilibrium there are no flows of free energy. Biological systems are also open systems, in the sense that both matter and energy flow in and out of them. Nonetheless, it is important in biology that systems can sometimes be treated as approximately closed, and sometimes approach equilibrium before being disrupted in one way or another. This can occur on a wide range of scales, from large ecosystems to within a single cell or organelle. Examples include:

• A population approaching an evolutionarily stable state.

• Random processes such as mutation, genetic drift, the diffusion of organisms in an environment or the diffusion of molecules in a liquid.

• A chemical reaction approaching equilibrium.

An interesting common feature of these processes is that as they occur, quantities mathematically akin to entropy tend to increase. Closely related quantities such as free energy tend to decrease. In this review, we explain some mathematical results that make this idea precise.

Most of these results involve a quantity that is variously known as ‘relative information’, ‘relative entropy’, ‘information gain’ or the ‘Kullback–Leibler divergence’. We’ll use the first term. Given two probability distributions $p$ and $q$ on a finite set $X$, their relative information, or more precisely the information of $p$ relative to $q$, is

$\displaystyle{ I(p\|q) = \sum_{i \in X} p_i \ln\left(\frac{p_i}{q_i}\right) }$

We use the word ‘information’ instead of ‘entropy’ because one expects entropy to increase with time, and the theorems we present will say that $I(p\|q)$ decreases with time under various conditions. The reason is that the Shannon entropy

$\displaystyle{ S(p) = -\sum_{i \in X} p_i \ln p_i }$

contains a minus sign that is missing from the definition of relative information.

Intuitively, $I(p\|q)$ is the amount of information gained when we start with a hypothesis given by some probability distribution $q$ and then learn the ‘true’ probability distribution $p$. For example, if we start with the hypothesis that a coin is fair and then are told that it landed heads up, the relative information is $\ln 2$, so we have gained 1 bit of information. If however we started with the hypothesis that the coin always lands heads up, we would have gained no information.
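The coin example is worth computing directly. The code below is just that calculation: one bit gained when a fair-coin prior is updated to certainty, none when we were already certain.

```python
import math

def relative_information(p, q):
    # I(p||q) = sum_i p_i ln(p_i/q_i); terms with p_i = 0 contribute nothing
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

heads = [1.0, 0.0]          # the 'true' distribution: it landed heads up
fair = [0.5, 0.5]           # prior: the coin is fair
sure_heads = [1.0, 0.0]     # prior: the coin always lands heads up

print(relative_information(heads, fair) / math.log(2))   # 1.0 bit
print(relative_information(heads, sure_heads))           # 0.0
```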

We put the word ‘true’ in quotes here, because the notion of a ‘true’ probability distribution, which subjective Bayesians reject, is not required to use relative information. A more cautious description of relative information is that it is a divergence: a way of measuring the difference between probability distributions that obeys

$I(p \| q) \ge 0$

and

$I(p \| q) = 0 \iff p = q$

but not necessarily the other axioms for a distance function, symmetry and the triangle inequality, which indeed fail for relative information.

There are many other divergences besides relative information, some of which we discuss in Section 6. However, relative information can be singled out by a number of characterizations, including one based on ideas from Bayesian inference. The relative information is also close to the expected number of extra bits required to code messages distributed according to the probability measure $p$ using a code optimized for messages distributed according to $q$.

In this review, we describe various ways in which a population or probability distribution evolves continuously according to some differential equation. For all these differential equations, we describe conditions under which relative information decreases. Briefly, the results are as follows. We hasten to reassure the reader that our paper explains all the jargon involved, and the proofs of the claims are given in full:

• In Section 2, we consider a very general form of the Lotka–Volterra equations, which are a commonly used model of population dynamics. Starting from the population $P_i$ of each type of replicating entity, we can define a probability distribution

$p_i = \displaystyle{\frac{P_i}{\sum_{j \in X} P_j }}$

which evolves according to a nonlinear equation called the replicator equation. We describe a necessary and sufficient condition under which $I(q\|p(t))$ is nonincreasing when $p(t)$ evolves according to the replicator equation while $q$ is held fixed.

• In Section 3, we consider a special case of the replicator equation that is widely studied in evolutionary game theory. In this case we can think of probability distributions as mixed strategies in a two-player game. When $q$ is a dominant strategy, $I(q\|p(t))$ can never increase when $p(t)$ evolves according to the replicator equation. We can think of $I(q\|p(t))$ as the information that the population has left to learn. Thus, evolution is analogous to a learning process—an analogy that in the field of artificial intelligence is exploited by evolutionary algorithms!

• In Section 4 we consider continuous-time, finite-state Markov processes. Here we have probability distributions on a finite set $X$ evolving according to a linear equation called the master equation. In this case $I(p(t)\|q(t))$ can never increase. Thus, if $q$ is a steady state solution of the master equation, both $I(p(t)\|q)$ and $I(q\|p(t))$ are nonincreasing. We can always write $q$ as the Boltzmann distribution for some energy function $E : X \to \mathbb{R}$, meaning that

$\displaystyle{ q_i = \frac{\exp(-E_i / k T)}{\displaystyle{\sum_{j \in X} \exp(-E_j / k T)}} }$

where $T$ is temperature and $k$ is Boltzmann’s constant. In this case, $I(p(t)\|q)$ is proportional to a difference of free energies:

$\displaystyle{ I(p(t)\|q) = \frac{F(p) - F(q)}{T} }$

Thus, the nonincreasing nature of $I(p(t)\|q)$ is a version of the Second Law of Thermodynamics.

• In Section 5, we consider chemical reactions and other processes described by reaction networks. In this context we have populations $P_i$ of entities of various kinds $i \in X$, and these populations evolve according to a nonlinear equation called the rate equation. We can generalize relative information from probability distributions to populations by setting

$\displaystyle{ I(P\|Q) = \sum_{i \in X} \left( P_i \ln\left(\frac{P_i}{Q_i}\right) - \left(P_i - Q_i\right) \right) }$

If $Q$ is a special sort of steady state solution of the rate equation, called a complex balanced equilibrium, $I(P(t)\|Q)$ can never increase when $P(t)$ evolves according to the rate equation.

• Finally, in Section 6, we consider a class of functions called $f$-divergences which include relative information as a special case. For any convex function $f : [0,\infty) \to [0,\infty)$, the f-divergence of two probability distributions $p, q : X \to [0,1]$ is given by

$\displaystyle{ I_f(p\|q) = \sum_{i \in X} q_i f\left(\frac{p_i}{q_i}\right)}$

Whenever $p(t)$ and $q(t)$ are probability distributions evolving according to the master equation of some Markov process, $I_f(p(t)\|q(t))$ is nonincreasing. The $f$-divergence is also well-defined for populations, and nonincreasing for two populations that both evolve according to the master equation.
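Here is a numerical spot check of that last claim, on a toy 2-state Markov process of my own, with the convex function $f(x) = (x-1)^2$:

```python
# When p(t) and q(t) both evolve by the master equation, the f-divergence
# I_f(p||q) = sum_i q_i f(p_i/q_i) is nonincreasing.

H = [[-1.0,  2.0],
     [ 1.0, -2.0]]            # infinitesimal stochastic: columns sum to zero

def step(p, dt=0.01):
    return [p[i] + dt * sum(H[i][j] * p[j] for j in range(2)) for i in range(2)]

def f_divergence(p, q, f=lambda x: (x - 1)**2):
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

p, q = [0.9, 0.1], [0.3, 0.7]
values = []
for _ in range(400):
    values.append(f_divergence(p, q))
    p, q = step(p), step(q)

assert all(a >= b for a, b in zip(values, values[1:]))   # monotone decrease
print(values[0], values[-1])
```

Here both distributions approach the same equilibrium, so the $f$-divergence not only decreases but tends to zero.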

## Information and Entropy in Biological Systems (Part 7)

6 June, 2015

In 1961, Rolf Landauer argued that the least possible amount of energy required to erase one bit of information stored in memory at temperature $T$ is $kT \ln 2,$ where $k$ is Boltzmann’s constant.

This is called the Landauer limit, and it came after many decades of arguments concerning Maxwell’s demon and the relation between information and entropy.
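To get a feel for how tiny this limit is, here it is worked out numerically at room temperature:

```python
import math

k = 1.380649e-23          # Boltzmann's constant, J/K
T = 300.0                 # room temperature, K
E = k * T * math.log(2)   # Landauer limit: minimum energy to erase one bit
print(E)                  # ≈ 2.87e-21 joules per bit
```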

In fact, these arguments are still not finished. For example, here’s an argument that the Landauer limit is not as solid as widely believed:

• John D. Norton, Waiting for Landauer, Studies in History and Philosophy of Modern Physics 42 (2011), 184–198.

But something like the Landauer limit almost surely holds under some conditions! And if it holds, it puts some limits on what organisms can do. That’s what David Wolpert spoke about at our workshop! You can see his slides here:

You can also watch a video:

## Information and Entropy in Biological Systems (Part 6)

1 June, 2015

The resounding lack of comment to this series of posts confirms my theory that a blog post that says “go somewhere else and read something” will never be popular. Even if it’s “go somewhere else and watch a video”, this is too much like saying

Hi! Want to talk? Okay, go into that other room and watch TV, then come back when you’re done and we’ll talk about it.

But no matter: our workshop on Information and Entropy in Biological Systems was really exciting! I want to make it available to the world as much as possible. I’m running around too much to create lovingly hand-crafted summaries of each talk—and I know you’re punishing me for that, with your silence. But I’ll keep on going, just to get the material out there.

Marc Harper spoke about information in evolutionary game theory, and we have a nice video of that. I’ve been excited about his work for quite a while, because it shows that the analogy between ‘evolution’ and ‘learning’ can be made mathematically precise. I summarized some of his ideas in my information geometry series, and I’ve also gotten him to write two articles for this blog:

• Marc Harper, Relative entropy in evolutionary dynamics, Azimuth, 22 January 2014.

• Marc Harper, Stationary stability in finite populations, Azimuth, 24 March 2015.

Here are the slides and video of his talk:

• Marc Harper, Information transport and evolutionary dynamics.