## Information Geometry (Part 16)

14 January, 2016

joint with Blake Pollard

Lately we’ve been thinking about open Markov processes. These are random processes where something can hop randomly from one state to another (that’s the ‘Markov process’ part) but also enter or leave the system (that’s the ‘open’ part).

The ultimate goal is to understand the nonequilibrium thermodynamics of open systems—systems where energy and maybe matter flows in and out. If we could understand this well enough, we could understand in detail how life works. That’s a difficult job! But one has to start somewhere, and this is one place to start.

We have a few papers on this subject:

• Blake Pollard, A Second Law for open Markov processes. (Blog article here.)

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

• Blake Pollard, Open Markov processes: A compositional perspective on non-equilibrium steady states in biology. (Blog article here.)

However, right now we just want to show you three closely connected results about how relative entropy changes in open Markov processes.

### Definitions

An open Markov process consists of a finite set $X$ of states, a subset $B \subseteq X$ of boundary states, and an infinitesimal stochastic operator $H: \mathbb{R}^X \to \mathbb{R}^X,$ meaning a linear operator with

$H_{ij} \geq 0 \ \ \text{for all} \ \ i \neq j$

and

$\sum_i H_{ij} = 0 \ \ \text{for all} \ \ j$

For each state $i \in X$ we introduce a population $p_i \in [0,\infty).$ We call the resulting function $p : X \to [0,\infty)$ the population distribution.

Populations evolve in time according to the open master equation:

$\displaystyle{ \frac{dp_i}{dt} = \sum_j H_{ij}p_j} \ \ \text{for all} \ \ i \in X-B$

$p_i(t) = b_i(t) \ \ \text{for all} \ \ i \in B$

So, the populations $p_i$ obey a linear differential equation at states $i$ that are not in the boundary, but they are specified ‘by the user’ to be chosen functions $b_i$ at the boundary states. The off-diagonal entry $H_{ij}$ for $i \neq j$ describe the rate at which population transitions from the $j$th to the $i$th state.

A closed Markov process, or continuous-time discrete-state Markov chain, is an open Markov process whose boundary is empty. For a closed Markov process, the open master equation becomes the usual master equation:

$\displaystyle{ \frac{dp}{dt} = Hp }$

In a closed Markov process the total population is conserved:

$\displaystyle{ \frac{d}{dt} \sum_{i \in X} p_i = \sum_{i,j} H_{ij}p_j = 0 }$

This lets us normalize the initial total population to 1 and have it stay equal to 1. If we do this, we can talk about probabilities instead of populations. In an open Markov process, population can flow in and out at the boundary states.

For any pair of distinct states $i,j,$ $H_{ij}p_j$ is the flow of population from $j$ to $i.$ The net flux of population from the $j$th state to the $i$th state is the flow from $j$ to $i$ minus the flow from $i$ to $j$:

$J_{ij} = H_{ij}p_j - H_{ji}p_i$

A steady state is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an equilibrium. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. The idea is that population can flow in or out at the boundary states.

We say an equilibrium $p : X \to [0,\infty)$ of a Markov process is detailed balanced if all the net fluxes vanish:

$J_{ij} = 0 \ \ \text{for all} \ \ i,j \in X$

or in other words:

$H_{ij}p_j = H_{ji}p_i \ \ \text{for all} \ \ i,j \in X$

Given two population distributions $p, q : X \to [0,\infty)$ we can define the relative entropy

$\displaystyle{ I(p,q) = \sum_i p_i \ln \left( \frac{p_i}{q_i} \right)}$

When $q$ is a detailed balanced equilibrium solution of the master equation, the relative entropy can be seen as the ‘free energy’ of $p.$ For a precise statement, see Section 4 of Relative entropy in biological systems.

The Second Law of Thermodynamics implies that the free energy of a closed system tends to decrease with time, so for closed Markov processes we expect $I(p,q)$ to be nonincreasing. And this is true! But for open Markov processes, free energy can flow in from outside. This is just one of several nice results about how relative entropy changes with time.

### Results

Theorem 1. Consider an open Markov process with $X$ as its set of states and $B$ as the set of boundary states. Suppose $p(t)$ and $q(t)$ obey the open master equation, and let the quantities

$\displaystyle{ \frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_{j \in X} H_{ij}p_j }$

$\displaystyle{ \frac{Dq_i}{Dt} = \frac{dq_i}{dt} - \sum_{j \in X} H_{ij}q_j }$

measure how much the time derivatives of $p_i$ and $q_i$ fail to obey the master equation. Then we have

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q(t)) } &=& \displaystyle{ \sum_{i, j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)} \\ \\ && \; + \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} } \end{array}$

This result separates the change in relative entropy change into two parts: an ‘internal’ part and a ‘boundary’ part.

It turns out the ‘internal’ part is always less than or equal to zero. So, from Theorem 1 we can deduce a version of the Second Law of Thermodynamics for open Markov processes:

Theorem 2. Given the conditions of Theorem 1, we have

$\displaystyle{ \frac{d}{dt} I(p(t),q(t)) \; \le \; \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} }$

Intuitively, this says that free energy can only increase if it comes in from the boundary!

There is another nice result that holds when $q$ is an equilibrium solution of the master equation. This idea seems to go back to Schnakenberg:

Theorem 3. Given the conditions of Theorem 1, suppose also that $q$ is an equilibrium solution of the master equation. Then we have

$\displaystyle{ \frac{d}{dt} I(p(t),q) = -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} \; + \; \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} }$

where

$J_{ij} = H_{ij}p_j - H_{ji}p_i$

is the net flux from $j$ to $i,$ while

$\displaystyle{ A_{ij} = \ln \left(\frac{p_j q_i}{p_i q_j} \right) }$

is the conjugate thermodynamic force.

The flux $J_{ij}$ has a nice meaning: it’s the net flow of population from $j$ to $i.$ The thermodynamic force is a bit subtler, but this theorem reveals its meaning: it says how much the population wants to flow from $j$ to $i.$

More precisely, up to that factor of $1/2,$ the thermodynamic force $A_{ij}$ says how much free energy loss is caused by net flux from $j$ to $i.$ There’s a nice analogy here to water losing potential energy as it flows downhill due to the force of gravity.

### Proofs

Proof of Theorem 1. We begin by taking the time derivative of the relative information:

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q(t)) } &=& \displaystyle{ \sum_{i \in X} \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} + \frac{\partial I}{\partial q_i} \frac{dq_i}{dt} } \end{array}$

We can separate this into a sum over states $i \in X - B,$ for which the time derivatives of $p_i$ and $q_i$ are given by the master equation, and boundary states $i \in B,$ for which they are not:

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q(t)) } &=& \displaystyle{ \sum_{i \in X-B, \; j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j }\\ \\ && + \; \; \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} + \frac{\partial I}{\partial q_i} \frac{dq_i}{dt}} \end{array}$

For boundary states we have

$\displaystyle{ \frac{dp_i}{dt} = \frac{Dp_i}{Dt} + \sum_{j \in X} H_{ij}p_j }$

and similarly for the time derivative of $q_i.$ We thus obtain

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q(t)) } &=& \displaystyle{ \sum_{i,j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j }\\ \\ && + \; \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt}} \end{array}$

To evaluate the first sum, recall that

$\displaystyle{ I(p,q) = \sum_{i \in X} p_i \ln (\frac{p_i}{q_i})}$

so

$\displaystyle{\frac{\partial I}{\partial p_i}} =\displaystyle{1 + \ln (\frac{p_i}{q_i})} , \qquad \displaystyle{ \frac{\partial I}{\partial q_i}}= \displaystyle{- \frac{p_i}{q_i} }$

Thus, we have

$\displaystyle{ \sum_{i,j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j = \sum_{i,j\in X} (1 + \ln (\frac{p_i}{q_i})) H_{ij} p_j - \frac{p_i}{q_i} H_{ij} q_j }$

We can rewrite this as

$\displaystyle{ \sum_{i,j \in X} H_{ij} p_j \left( 1 + \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }$

Since $H_{ij}$ is infinitesimal stochastic we have $\sum_{i} H_{ij} = 0,$ so the first term drops out, and we are left with

$\displaystyle{ \sum_{i,j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }$

as desired.   █

Proof of Theorem 2. Thanks to Theorem 1, to prove

$\displaystyle{ \frac{d}{dt} I(p(t),q(t)) \; \le \; \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} }$

it suffices to show that

$\displaystyle{ \sum_{i,j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) \le 0 }$

or equivalently (recalling the proof of Theorem 1):

$\displaystyle{ \sum_{i,j} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) \le 0 }$

The last two terms on the left hand side cancel when $i = j.$ Thus, if we break the sum into an $i \ne j$ part and an $i = j$ part, the left side becomes

$\displaystyle{ \sum_{i \ne j} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) \; + \; \sum_j H_{jj} p_j \ln(\frac{p_j}{q_j}) }$

Next we can use the infinitesimal stochastic property of $H$ to write $H_{jj}$ as the sum of $-H_{ij}$ over $i$ not equal to $j,$ obtaining

$\displaystyle{ \sum_{i \ne j} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) - \sum_{i \ne j} H_{ij} p_j \ln(\frac{p_j}{q_j}) } =$

$\displaystyle{ \sum_{i \ne j} H_{ij} p_j \left( \ln(\frac{p_iq_j}{p_j q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) }$

Since $H_{ij} \ge 0$ when $i \ne j$ and $\ln(s) + 1 - s \le 0$ for all $s > 0,$ we conclude that this quantity is $\le 0.$   █

Proof of Theorem 3. Now suppose also that $q$ is an equilibrium solution of the master equation. Then $Dq_i/Dt = dq_i/dt = 0$ for all states $i,$ so by Theorem 1 we need to show

$\displaystyle{ \sum_{i, j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) \; = \; -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} }$

We also have $\sum_{j \in X} H_{ij} q_j = 0,$ so the second
term in the sum at left vanishes, and it suffices to show

$\displaystyle{ \sum_{i, j \in X} H_{ij} p_j \ln(\frac{p_i}{q_i}) \; = \; - \frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} }$

By definition we have

$\displaystyle{ \frac{1}{2} \sum_{i,j} J_{ij} A_{ij}} = \displaystyle{ \frac{1}{2} \sum_{i,j} \left( H_{ij} p_j - H_{ji}p_i \right) \ln \left( \frac{p_j q_i}{p_i q_j} \right) }$

This in turn equals

$\displaystyle{ \frac{1}{2} \sum_{i,j} H_{ij}p_j \ln \left( \frac{p_j q_i}{p_i q_j} \right) - \frac{1}{2} \sum_{i,j} H_{ji}p_i \ln \left( \frac{p_j q_i}{p_i q_j} \right) }$

and we can switch the dummy indices $i,j$ in the second sum, obtaining

$\displaystyle{ \frac{1}{2} \sum_{i,j} H_{ij}p_j \ln \left( \frac{p_j q_i}{p_i q_j} \right) - \frac{1}{2} \sum_{i,j} H_{ij}p_j \ln \left( \frac{p_i q_j}{p_j q_i} \right) }$

or simply

$\displaystyle{ \sum_{i,j} H_{ij} p_j \ln \left( \frac{p_j q_i}{p_i q_j} \right) }$

But this is

$\displaystyle{ \sum_{i,j} H_{ij} p_j \left(\ln ( \frac{p_j}{q_j}) + \ln (\frac{q_i}{p_i}) \right) }$

and the first term vanishes because $H$ is infinitesimal stochastic: $\sum_i H_{ij} = 0.$ We thus have

$\displaystyle{ \frac{1}{2} \sum_{i,j} J_{ij} A_{ij}} = \sum_{i,j} H_{ij} p_j \ln (\frac{q_i}{p_i} )$

as desired.   █

## Information Geometry (Part 15)

11 January, 2016

It’s been a long time since you’ve seen an installment of the information geometry series on this blog! Before I took a long break, I was explaining relative entropy and how it changes in evolutionary games. Much of what I said is summarized and carried further here:

• John Baez and Blake Pollard, Relative entropy in biological systems, Entropy 18 (2016), 46. (Blog article here.)

But now Blake has a new paper, and I want to talk about that:

• Blake Pollard, Open Markov processes: a compositional perspective on non-equilibrium steady states in biology, Entropy 18 (2016), 140.

I’ll focus on just one aspect: the principle of minimum entropy production. This is an exciting yet controversial principle in non-equilibrium thermodynamics. Blake examines it in a situation where we can tell exactly what’s happening.

Life exists away from equilibrium. Left isolated, systems will tend toward thermodynamic equilibrium. However, biology is about open systems: physical systems that exchange matter or energy with their surroundings. Open systems can be maintained away from equilibrium by this exchange. This leads to the idea of a non-equilibrium steady state—a state of an open system that doesn’t change, but is not in equilibrium.

A simple example is a pan of water sitting on a stove. Heat passes from the flame to the water and then to the air above. If the flame is very low, the water doesn’t boil and nothing moves. So, we have a steady state, at least approximately. But this is not an equilibrium, because there is a constant flow of energy through the water.

Of course in reality the water will be slowly evaporating, so we don’t really have a steady state. As always, models are approximations. If the water is evaporating slowly enough, it can be useful to approximate the situation with a non-equilibrium steady state.

There is much more to biology than steady states. However, to dip our toe into the chilly waters of non-equilibrium thermodynamics, it is nice to start with steady states. And already here there are puzzles left to solve.

### Minimum entropy production

Ilya Prigogine won the Nobel prize for his work on non-equilibrium thermodynamics. One reason is that he had an interesting idea about steady states. He claimed that under certain conditions, a non-equilibrium steady state will minimize entropy production!

There has been a lot of work trying to make the ‘principle of minimum entropy production’ precise and turn it into a theorem. In this book:

• G. Lebon and D. Jou, Understanding Non-equilibrium Thermodynamics, Springer, Berlin, 2008.

the authors give an argument for the principle of minimum entropy production based on four conditions:

time-independent boundary conditions: the surroundings of the system don’t change with time.

linear phenomenological laws: the laws governing the macroscopic behavior of the system are linear.

constant phenomenological coefficients: the laws governing the macroscopic behavior of the system don’t change with time.

symmetry of the phenomenological coefficients: since they are linear, the laws governing the macroscopic behavior of the system can be described by a linear operator, and we demand that in a suitable basis the matrix for this operator is symmetric: $T_{ij} = T_{ji}.$

The last condition is obviously the subtlest one; it’s sometimes called Onsager reciprocity, and people have spent a lot of time trying to derive it from other conditions.

However, Blake goes in a different direction. He considers a concrete class of open systems, a very large class called ‘open Markov processes’. These systems obey the first three conditions listed above, and the ‘detailed balanced’ open Markov processes also obey the last one. But Blake shows that minimum entropy production holds only approximately—with the approximation being good for steady states that are near equilibrium!

However, he shows that another minimum principle holds exactly, even for steady states that are far from equilibrium. He calls this the ‘principle of minimum dissipation’.

We actually discussed the principle of minimum dissipation in an earlier paper:

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

But one advantage of Blake’s new paper is that it presents the results with a minimum of category theory. Of course I love category theory, and I think it’s the right way to formalize open systems, but it can be intimidating.

Another good thing about Blake’s new paper is that it explicitly compares the principle of minimum entropy to the principle of minimum dissipation. He shows they agree in a certain limit—namely, the limit where the system is close to equilibrium.

Let me explain this. I won’t include the nice example from biology that Blake discusses: a very simple model of membrane transport. For that, read his paper! I’ll just give the general results.

### The principle of minimum dissipation

An open Markov process consists of a finite set $X$ of states, a subset $B \subseteq X$ of boundary states, and an infinitesimal stochastic operator $H: \mathbb{R}^X \to \mathbb{R}^X,$ meaning a linear operator with

$H_{ij} \geq 0 \ \ \text{for all} \ \ i \neq j$

and

$\sum_i H_{ij} = 0 \ \ \text{for all} \ \ j$

I’ll explain these two conditions in a minute.

For each $i \in X$ we introduce a population $p_i \in [0,\infty).$ We call the resulting function $p : X \to [0,\infty)$ the population distribution. Populations evolve in time according to the open master equation:

$\displaystyle{ \frac{dp_i}{dt} = \sum_j H_{ij}p_j} \ \ \text{for all} \ \ i \in X-B$

$p_i(t) = b_i(t) \ \ \text{for all} \ \ i \in B$

So, the populations $p_i$ obey a linear differential equation at states $i$ that are not in the boundary, but they are specified ‘by the user’ to be chosen functions $b_i$ at the boundary states.

The off-diagonal entries $H_{ij}, \ i \neq j$ are the rates at which population hops from the $j$th to the $i$th state. This lets us understand the definition of an infinitesimal stochastic operator. The first condition:

$H_{ij} \geq 0 \ \ \text{for all} \ \ i \neq j$

says that the rate for population to transition from one state to another is non-negative. The second:

$\sum_i H_{ij} = 0 \ \ \text{for all} \ \ j$

says that population is conserved, at least if there are no boundary states. Population can flow in or out at boundary states, since the master equation doesn’t hold there.

A steady state is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an equilibrium. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. Again, the reason is that population can flow in or out at the boundary.

We say an equilibrium $q : X \to [0,\infty)$ of a Markov process is detailed balanced if the rate at which population flows from the $i$th state to the $j$th state is equal to the rate at which it flows from the $j$th state to the $i$th:

$H_{ji}q_i = H_{ij}q_j \ \ \text{for all} \ \ i,j \in X$

Suppose we’ve got an open Markov process that has a detailed balanced equilibrium $q$. Then a non-equilibrium steady state $p$ will minimize a function called the ‘dissipation’, subject to constraints on its boundary populations. There’s a nice formula for the dissipation in terms of $p$ and $q.$

Definition. Given an open Markov process with detailed balanced equilibrium $q$ we define the dissipation for a population distribution $p$ to be

$\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left( \frac{p_j}{q_j} - \frac{p_i}{q_i} \right)^2 }$

This formula is a bit tricky, but you’ll notice it’s quadratic in $p$ and it vanishes when $p = q.$ So, it’s pretty nice.

Using this concept we can formulate a principle of minimum dissipation, and prove that non-equilibrium steady states obey this principle:

Definition. We say a population distribution $p: X \to \mathbb{R}$ obeys the principle of minimum dissipation with boundary population $b: X \to \mathbb{R}$ if $p$ minimizes $D(p)$ subject to the constraint that

$p_i = b_i \ \ \text{for all} \ \ i \in B.$

Theorem 1. A population distribution $p$ is a steady state with $p_i = b_i$ for all boundary states $i$ if and only if $p$ obeys the principle of minimum dissipation with boundary population $b$.

Proof. This follows from Theorem 28 in A compositional framework for Markov processes.

### Minimum entropy production versus minimum dissipation

How does dissipation compare with entropy production? To answer this, first we must ask: what really is entropy production? And: how does the equilibrium state $q$ show up in the concept of entropy production?

The relative entropy of two population distributions $p,q$ is given by

$\displaystyle{ I(p,q) = \sum_i p_i \ln \left( \frac{p_i}{q_i} \right) }$

It is well known that for a closed Markov process with $q$ as a detailed balanced equilibrium, the relative entropy is monotonically decreasing with time. This is due to an annoying sign convention in the definition of relative entropy: while entropy is typically increasing, relative entropy typically decreases. We could fix this by putting a minus sign in the above formula or giving this quantity $I(p,q)$ some other name. A lot of people call it the Kullback–Leibler divergence, but I have taken to calling it relative information. For more, see:

• John Baez and Blake Pollard, Relative entropy in biological systems. (Blog article here.)

We say ‘relative entropy’ in the title, but then we explain why ‘relative information’ is a better name, and use that. More importantly, we explain why $I(p,q)$ has the physical meaning of free energy. Free energy tends to decrease, so everything is okay. For details, see Section 4.

Blake has a nice formula for how fast $I(p,q)$ decreases:

Theorem 2. Consider an open Markov process with $X$ as its set of states and $B$ as the set of boundary states. Suppose $p(t)$ obeys the open master equation and $q$ is a detailed balanced equilibrium. For any boundary state $i \in B,$ let

$\displaystyle{ \frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_{j \in X} H_{ij}p_j }$

measure how much $p_i$ fails to obey the master equation. Then we have

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q) } &=& \displaystyle{ \sum_{i, j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)} \\ \\ && \; + \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} } \end{array}$

Moreover, the first term is less than or equal to zero.

Proof. For a self-contained proof, see Information geometry (part 16), which is coming up soon. It will be a special case of the theorems there.   █

Blake compares this result to previous work by Schnakenberg:

• J. Schnakenberg, Network theory of microscopic and macroscopic behavior of master equation systems, Rev. Mod. Phys. 48 (1976), 571–585.

The negative of Blake’s first term is this:

$\displaystyle{ K(p) = - \sum_{i, j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }$

Under certain circumstances, this equals what Schnakenberg calls the entropy production. But a better name for this quantity might be free energy loss, since for a closed Markov process that’s exactly what it is! In this case there are no boundary states, so the theorem above says $K(p)$ is the rate at which relative entropy—or in other words, free energy—decreases.

For an open Markov process, things are more complicated. The theorem above shows that free energy can also flow in or out at the boundary, thanks to the second term in the formula.

Anyway, the sensible thing is to compare a principle of ‘minimum free energy loss’ to the principle of minimum dissipation. The principle of minimum dissipation is true. How about the principle of minimum free energy loss? It turns out to be approximately true near equilibrium.

For this, consider the situation in which $p$ is near to the equilibrium distribution $q$ in the sense that

$\displaystyle{ \frac{p_i}{q_i} = 1 + \epsilon_i }$

for some small numbers $\epsilon_i.$ We collect these numbers in a vector called $\epsilon.$

Theorem 3. Consider an open Markov process with $X$ as its set of states and $B$ as the set of boundary states. Suppose $q$ is a detailed balanced equilibrium and let $p$ be arbitrary. Then

$K(p) = D(p) + O(\epsilon^2)$

where $K(p)$ is the free energy loss, $D(p)$ is the dissipation, $\epsilon_i$ is defined as above, and by $O(\epsilon^2)$ we mean a sum of terms of order $\epsilon_i^2.$

Proof. First take the free energy loss:

$\displaystyle{ K(p) = -\sum_{i, j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)}$

Expanding the logarithm to first order in $\epsilon,$ we get

$\displaystyle{ K(p) = -\sum_{i, j \in X} H_{ij} p_j \left( \frac{p_i}{q_i} - 1 - \frac{p_i q_j}{p_j q_i} \right) + O(\epsilon^2) }$

Since $H$ is infinitesimal stochastic, $\sum_i H_{ij} = 0,$ so the second term in the sum vanishes, leaving

$\displaystyle{ K(p) = -\sum_{i, j \in X} H_{ij} p_j \left( \frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i} \right) \; + O(\epsilon^2) }$

or

$\displaystyle{ K(p) = -\sum_{i, j \in X} \left( H_{ij} p_j \frac{p_i}{q_i} - H_{ij} q_j \frac{p_i}{q_i} \right) \; + O(\epsilon^2) }$

Since $q$ is a equilibrium we have $\sum_j H_{ij} q_j = 0,$ so now the last term in the sum vanishes, leaving

$\displaystyle{ K(p) = -\sum_{i, j \in X} H_{ij} \frac{p_i p_j}{q_i} \; + O(\epsilon^2) }$

Next, take the dissipation

$\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left( \frac{p_j}{q_j} - \frac{p_i}{q_i} \right)^2 }$

and expand the square, getting

$\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left( \frac{p_j^2}{q_j^2} - 2\frac{p_i p_j}{q_i q_j} + \frac{p_i^2}{q_i^2} \right) }$

Since $H$ is infinitesimal stochastic, $\sum_i H_{ij} = 0.$ The first term is just this times a function of $j,$ summed over $j,$ so it vanishes, leaving

$\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left(- 2\frac{p_i p_j}{q_i q_j} + \frac{p_i^2}{q_i^2} \right) }$

Since $q$ is an equilibrium, $\sum_j H_{ij} q_j = 0.$ The last term above is this times a function of $i,$ summed over $i,$ so it vanishes, leaving

$\displaystyle{ D(p) = - \sum_{i,j} H_{ij}q_j \frac{p_i p_j}{q_i q_j} = - \sum_{i,j} H_{ij} \frac{p_i p_j}{q_i} }$

This matches what we got for $K(p),$ up to terms of order $O(\epsilon^2).$   █

In short: detailed balanced open Markov processes are governed by the principle of minimum dissipation, not minimum entropy production. Minimum dissipation agrees with minimum entropy production only near equilibrium.

## Relative Entropy in Biological Systems

27 November, 2015

Here’s a paper for the proceedings of a workshop on Information and Entropy in Biological System this spring:

• John Baez and Blake Pollard, Relative entropy in biological systems, with Blake S. Pollard, Entropy 18 (2016), 46.

We’d love any comments or questions you might have. I’m not happy with the title. In the paper we advocate using the term ‘relative information’ instead of ‘relative entropy’—yet the latter is much more widely used, so I feel we need it in the title to let people know what the paper is about!

Here’s the basic idea.

Life relies on nonequilibrium thermodynamics, since in thermal equilibrium there are no flows of free energy. Biological systems are also open systems, in the sense that both matter and energy flow in and out of them. Nonetheless, it is important in biology that systems can sometimes be treated as approximately closed, and sometimes approach equilibrium before being disrupted in one way or another. This can occur on a wide range of scales, from large ecosystems to within a single cell or organelle. Examples include:

• A population approaching an evolutionarily stable state.

• Random processes such as mutation, genetic drift, the diffusion of organisms in an environment or the diffusion of molecules in a liquid.

• A chemical reaction approaching equilibrium.

An interesting common feature of these processes is that as they occur, quantities mathematically akin to entropy tend to increase. Closely related quantities such as free energy tend to decrease. In this review, we explain some mathematical results that make this idea precise.

Most of these results involve a quantity that is variously known as ‘relative information’, ‘relative entropy’, ‘information gain’ or the ‘Kullback–Leibler divergence’. We’ll use the first term. Given two probability distributions $p$ and $q$ on a finite set $X$, their relative information, or more precisely the information of $p$ relative to $q$, is

$\displaystyle{ I(p\|q) = \sum_{i \in X} p_i \ln\left(\frac{p_i}{q_i}\right) }$

We use the word ‘information’ instead of ‘entropy’ because one expects entropy to increase with time, and the theorems we present will say that $I(p\|q)$ decreases with time under various conditions. The reason is that the Shannon entropy

$\displaystyle{ S(p) = -\sum_{i \in X} p_i \ln p_i }$

contains a minus sign that is missing from the definition of relative information.

Intuitively, $I(p\|q)$ is the amount of information gained when we start with a hypothesis given by some probability distribution $q$ and then learn the ‘true’ probability distribution $p$. For example, if we start with the hypothesis that a coin is fair and then are told that it landed heads up, the relative information is $\ln 2$, so we have gained 1 bit of information. If however we started with the hypothesis that the coin always lands heads up, we would have gained no information.

We put the word ‘true’ in quotes here, because the notion of a ‘true’ probability distribution, which subjective Bayesians reject, is not required to use relative information. A more cautious description of relative information is that it is a divergence: a way of measuring the difference between probability distributions that obeys

$I(p \| q) \ge 0$

and

$I(p \| q) = 0 \iff p = q$

but not necessarily the other axioms for a distance function, symmetry and the triangle inequality, which indeed fail for relative information.

There are many other divergences besides relative information, some of which we discuss in Section 6. However, relative information can be singled out by a number of characterizations, including one based on ideas from Bayesian inference. The relative information is also close to the expected number of extra bits required to code messages distributed according to the probability measure $p$ using a code optimized for messages distributed according to $q$.

In this review, we describe various ways in which a population or probability distribution evolves continuously according to some differential equation. For all these differential equations, I describe conditions under which relative information decreases. Briefly, the results are as follows. We hasten to reassure the reader that our paper explains all the jargon involved, and the proofs of the claims are given in full:

• In Section 2, we consider a very general form of the Lotka–Volterra equations, which are a commonly used model of population dynamics. Starting from the population $P_i$ of each type of replicating entity, we can define a probability distribution

$p_i = \displaystyle{\frac{P_i}{\sum_{i \in X} P_i }}$

which evolves according to a nonlinear equation called the replicator equation. We describe a necessary and sufficient condition under which $I(q\|p(t))$ is nonincreasing when $p(t)$ evolves according to the replicator equation while $q$ is held fixed.

• In Section 3, we consider a special case of the replicator equation that is widely studied in evolutionary game theory. In this case we can think of probability distributions as mixed strategies in a two-player game. When $q$ is a dominant strategy, $I(q\|p(t))$ can never increase when $p(t)$ evolves according to the replicator equation. We can think of $I(q\|p(t))$ as the information that the population has left to learn. Thus, evolution is analogous to a learning process—an analogy that in the field of artificial intelligence is exploited by evolutionary algorithms!

• In Section 4 we consider continuous-time, finite-state Markov processes. Here we have probability distributions on a finite set $X$ evolving according to a linear equation called the master equation. In this case $I(p(t)\|q(t))$ can never increase. Thus, if $q$ is a steady state solution of the master equation, both $I(p(t)\|q)$ and $I(q\|p(t))$ are nonincreasing. We can always write $q$ as the Boltzmann distribution for some energy function $E : X \to \mathbb{R}$, meaning that

$\displaystyle{ q_i = \frac{\exp(-E_i / k T)}{\displaystyle{\sum_{j \in X} \exp(-E_j / k T)}} }$

where $T$ is temperature and $k$ is Boltzmann’s constant. In this case, $I(p(t)\|q)$ is proportional to a difference of free energies:

$\displaystyle{ I(p(t)\|q) = \frac{F(p) - F(q)}{T} }$

Thus, the nonincreasing nature of $I(p(t)\|q)$ is a version of the Second Law of Thermodynamics.

• In Section 5, we consider chemical reactions and other processes described by reaction networks. In this context we have populations $P_i$ of entities of various kinds $i \in X$, and these populations evolve according to a nonlinear equation called the rate equation. We can generalize relative information from probability distributions to populations by setting

$\displaystyle{ I(P\|Q) = \sum_{i \in X} P_i \ln\left(\frac{P_i}{Q_i}\right) - \left(P_i - Q_i\right) }$

If $Q$ is a special sort of steady state solution of the rate equation, called a complex balanced equilibrium, $I(P(t)\|Q)$ can never increase when $P(t)$ evolves according to the rate equation.

• Finally, in Section 6, we consider a class of functions called $f$-divergences which include relative information as a special case. For any convex function $f : [0,\infty) \to [0,\infty)$, the f-divergence of two probability distributions $p, q : X \to [0,1]$ is given by

$\displaystyle{ I_f(p\|q) = \sum_{i \in X} q_i f\left(\frac{p_i}{q_i}\right)}$

Whenever $p(t)$ and $q(t)$ are probability distributions evolving according to the master equation of some Markov process, $I_f(p(t)\|q(t))$ is nonincreasing. The $f$-divergence is also well-defined for populations, and nonincreasing for two populations that both evolve according to the master equation.

## Information and Entropy in Biological Systems (Part 7)

6 June, 2015

In 1961, Rolf Landauer argued that that the least possible amount of energy required to erase one bit of information stored in memory at temperature $T$ is $kT \ln 2,$ where $k$ is Boltzmann’s constant.

This is called the Landauer limit, and it came after many decades of arguments concerning Maxwell’s demon and the relation between information and entropy.

In fact, these arguments are still not finished. For example, here’s an argument that the Landauer limit is not as solid as widely believed:

• John D. Norton, Waiting for Landauer, Studies in History and Philosophy of Modern Physics 42 (2011), 184–198.

But something like the Landauer limit almost surely holds under some conditions! And if it holds, it puts some limits on what organisms can do. That’s what David Wolpert spoke about at our workshop! You can see his slides here:

You can also watch a video:

## Information and Entropy in Biological Systems (Part 6)

1 June, 2015

The resounding lack of comment to this series of posts confirms my theory that a blog post that says “go somewhere else and read something” will never be popular. Even if it’s “go somewhere else and watch a video”, this is too much like saying

Hi! Want to talk? Okay, go into that other room and watch TV, then come back when you’re done and we’ll talk about it.

But no matter: our workshop on Information and Entropy in Biological Systems was really exciting! I want to make it available to the world as much as possible. I’m running around too much to create lovingly hand-crafted summaries of each talk—and I know you’re punishing me for that, with your silence. But I’ll keep on going, just to get the material out there.

Marc Harper spoke about information in evolutionary game theory, and we have a nice video of that. I’ve been excited about his work for quite a while, because it shows that the analogy between ‘evolution’ and ‘learning’ can be made mathematically precise. I summarized some of his ideas in my information geometry series, and I’ve also gotten him to write two articles for this blog:

• Marc Harper, Relative entropy in evolutionary dynamics, Azimuth, 22 January 2014.

• Marc Harper, Stationary stability in finite populations, Azimuth, 24 March 2015.

Here are the slides and video of his talk:

• Marc Harper, Information transport and evolutionary dynamics.

## Information and Entropy in Biological Systems (Part 5)

30 May, 2015

John Harte of U. C. Berkeley spoke about the maximum entropy method as a method of predicting patterns in ecology. Annette Ostling of the University of Michigan spoke about some competing theories, such as the ‘neutral model’ of biodiversity—a theory that sounds much too simple to be right, yet fits the data surprisingly well!

We managed to get a video of Ostling’s talk, but not Harte’s. Luckily, you can see the slides of both. You can also see a summary of Harte’s book Maximum Entropy and Ecology:

• John Baez, Maximum entropy and ecology, Azimuth, 21 February 2013.

Here are his talk slides and abstract:

Abstract. Constrained maximization of information entropy (MaxEnt) yields least-biased probability distributions. In statistical physics, this powerful inference method yields classical statistical mechanics/thermodynamics under the constraints imposed by conservation laws. I apply MaxEnt to macroecology, the study of the distribution, abundance, and energetics of species in ecosystems. With constraints derived from ratios of ecological state variables, I show that MaxEnt yields realistic abundance distributions, species-area relationships, spatial aggregation patterns, and body-size distributions over a wide range of taxonomic groups, habitats and spatial scales. I conclude with a brief summary of some of the major opportunities at the frontier of MaxEnt-based macroecological theory.

Here is a video of Ostling’s talk, as well as her slides and some papers she recommended:

• Annette Ostling, The neutral theory of biodiversity and other competitors to maximum entropy.

Abstract: I am a bit of the odd man out in that I will not talk that much about information and entropy, but instead about neutral theory and niche theory in ecology. My interest in coming to this workshop is in part out of an interest in what greater insights we can get into neutral models and stochastic population dynamics in general using entropy and information theory.

I will present the niche and neutral theories of the maintenance of diversity of competing species in ecology, and explain the dynamics included in neutral models in ecology. I will also briefly explain how one can derive a species abundance distribution from neutral models. I will present the view that neutral models have the potential to serve as more process-based null models than previously used in ecology for detecting the signature of niches and habitat filtering. However, tests of neutral theory in ecology have not as of yet been as useful as tests of neutral theory in evolutionary biology, because they leave open the possibility that pattern is influenced by “demographic complexity” rather than niches. I will mention briefly some of the work I’ve been doing to try to construct better tests of neutral theory.

Finally I’ll mention some connections that have been made so far between predictions of entropy theory and predictions of neutral theory in ecology and evolution.

These papers present interesting relations between ecology and statistical mechanics. Check out the nice ‘analogy chart’ in the second one!

• M. G. Bowler, Species abundance distributions, statistical mechanics and the priors of MaxEnt, Theoretical Population Biology 92 (2014), 69–77.

Abstract. The methods of Maximum Entropy have been deployed for some years to address the problem of species abundance distributions. In this approach, it is important to identify the correct weighting factors, or priors, to be applied before maximising the entropy function subject to constraints. The forms of such priors depend not only on the exact problem but can also depend on the way it is set up; priors are determined by the underlying dynamics of the complex system under consideration. The problem is one of statistical mechanics and it is the properties of the system that yield the correct MaxEnt priors, appropriate to the way the problem is framed. Here I calculate, in several different ways, the species abundance distribution resulting when individuals in a community are born and die independently. In
the usual formulation the prior distribution for the number of species over the number of individuals is 1/n; the problem can be reformulated in terms of the distribution of individuals over species classes, with a uniform prior. Results are obtained using master equations for the dynamics and separately through the combinatoric methods of elementary statistical mechanics; the MaxEnt priors then emerge a posteriori. The first object is to establish the log series species abundance distribution as the outcome of per capita guild dynamics. The second is to clarify the true nature and origin of priors in the language of MaxEnt. Finally, I consider how it may come about that the distribution is similar to log series in the event that filled niches dominate species abundance. For the general ecologist, there are two messages. First, that species abundance distributions are determined largely by population sorting through fractional processes (resulting in the 1/n factor) and secondly that useful information is likely to be found only in departures from the log series. For the MaxEnt practitioner, the message is that the prior with respect to which the entropy is to be maximised is determined by the nature of the problem and the way in which it is formulated.

• Guy Sella and Aaron E. Hirsh, The application of statistical physics to evolutionary biology, Proc. Nat. Acad. Sci. 102 (2005), 9541–9546.

A number of fundamental mathematical models of the evolutionary process exhibit dynamics that can be difficult to understand analytically. Here we show that a precise mathematical analogy can be drawn between certain evolutionary and thermodynamic systems, allowing application of the powerful machinery of statistical physics to analysis of a family of evolutionary models. Analytical results that follow directly from this approach include the steady-state distribution of fixed genotypes and the load in finite populations. The analogy with statistical physics also reveals that, contrary to a basic tenet of the nearly neutral theory of molecular evolution, the frequencies of adaptive and deleterious substitutions at steady state are equal. Finally, just as the free energy function quantitatively characterizes the balance between energy and entropy, a free fitness function provides an analytical expression for the balance between natural selection and stochastic drift.

## Information and Entropy in Biological Systems (Part 4)

21 May, 2015

I kicked off the workshop on Information and Entropy in Biological Systems with a broad overview of the many ways information theory and entropy get used in biology:

• John Baez, Information and entropy in biological systems.

Abstract. Information and entropy are being used in biology in many different ways: for example, to study biological communication systems, the ‘action-perception loop’, the thermodynamic foundations of biology, the structure of ecosystems, measures of biodiversity, and evolution. Can we unify these? To do this, we must learn to talk to each other. This will be easier if we share some basic concepts which I’ll sketch here.

The talk is full of links, in blue. If you click on these you can get more details. You can also watch a video of my talk: