*joint with Blake Pollard*

Lately we’ve been thinking about open Markov processes. These are random processes where something can hop randomly from one state to another (that’s the ‘Markov process’ part) but also enter or leave the system (that’s the ‘open’ part).

The ultimate goal is to understand the nonequilibrium thermodynamics of open systems—systems where energy and maybe matter flows in and out. If we could understand this well enough, we could understand in detail how *life* works. That’s a difficult job! But one has to start somewhere, and this is one place to start.

We have a few papers on this subject:

• Blake Pollard, A Second Law for open Markov processes. (Blog article here.)

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

• Blake Pollard, Open Markov processes: A compositional perspective on non-equilibrium steady states in biology. (Blog article here.)

However, right now we just want to show you three closely connected results about how relative entropy changes in open Markov processes.

### Definitions

An **open Markov process** consists of a finite set of **states**, a subset of **boundary states**, and an **infinitesimal stochastic** operator meaning a linear operator with

and

For each state we introduce a **population** We call the resulting function the **population distribution**.

Populations evolve in time according to the **open master equation**:

So, the populations obey a linear differential equation at states that are not in the boundary, but they are specified ‘by the user’ to be chosen functions at the boundary states. The off-diagonal entry for describe the rate at which population transitions from the th to the th state.

A **closed Markov process**, or continuous-time discrete-state Markov chain, is an open Markov process whose boundary is empty. For a closed Markov process, the open master equation becomes the usual **master equation**:

In a closed Markov process the total population is conserved:

This lets us normalize the initial total population to 1 and have it stay equal to 1. If we do this, we can talk about *probabilities* instead of populations. In an open Markov process, population can flow in and out at the boundary states.

For any pair of distinct states is the flow of population from to The **net flux** of population from the th state to the th state is the flow from to minus the flow from to :

A **steady state** is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an **equilibrium**. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. The idea is that population can flow in or out at the boundary states.

We say an equilibrium of a Markov process is **detailed balanced** if all the net fluxes vanish:

or in other words:

Given two population distributions we can define the **relative entropy**

When is a detailed balanced equilibrium solution of the master equation, the relative entropy can be seen as the ‘free energy’ of For a precise statement, see Section 4 of Relative entropy in biological systems.

The Second Law of Thermodynamics implies that the free energy of a closed system tends to decrease with time, so for *closed* Markov processes we expect to be nonincreasing. And this is true! But for *open* Markov processes, free energy can flow in from outside. This is just one of several nice results about how relative entropy changes with time.

### Results

**Theorem 1.** Consider an open Markov process with as its set of states and as the set of boundary states. Suppose and obey the open master equation, and let the quantities

measure how much the time derivatives of and fail to obey the master equation. Then we have

This result separates the change in relative entropy change into two parts: an ‘internal’ part and a ‘boundary’ part.

It turns out the ‘internal’ part is always less than or equal to zero. So, from Theorem 1 we can deduce a version of the Second Law of Thermodynamics for open Markov processes:

**Theorem 2.** Given the conditions of Theorem 1, we have

Intuitively, this says that free energy can only increase if it comes in from the boundary!

There is another nice result that holds when is an equilibrium solution of the master equation. This idea seems to go back to Schnakenberg:

**Theorem 3.** Given the conditions of Theorem 1, suppose also that is an equilibrium solution of the master equation. Then we have

where

is the **net flux** from to while

is the conjugate **thermodynamic force**.

The flux has a nice meaning: it’s the net flow of population from to The thermodynamic force is a bit subtler, but this theorem reveals its meaning: it says how much the population *wants* to flow from to

More precisely, up to that factor of the thermodynamic force says how much free energy loss is caused by net flux from to There’s a nice analogy here to water losing potential energy as it flows downhill due to the force of gravity.

### Proofs

**Proof of Theorem 1.** We begin by taking the time derivative of the relative information:

We can separate this into a sum over states for which the time derivatives of and are given by the master equation, and boundary states for which they are not:

For boundary states we have

and similarly for the time derivative of We thus obtain

To evaluate the first sum, recall that

so

Thus, we have

We can rewrite this as

Since is infinitesimal stochastic we have so the first term drops out, and we are left with

as desired. █

**Proof of Theorem 2.** Thanks to Theorem 1, to prove

it suffices to show that

or equivalently (recalling the proof of Theorem 1):

The last two terms on the left hand side cancel when Thus, if we break the sum into an part and an part, the left side becomes

Next we can use the infinitesimal stochastic property of to write as the sum of over not equal to obtaining

Since when and for all we conclude that this quantity is █

**Proof of Theorem 3.** Now suppose also that is an equilibrium solution of the master equation. Then for all states so by Theorem 1 we need to show

We also have so the second

term in the sum at left vanishes, and it suffices to show

By definition we have

This in turn equals

and we can switch the dummy indices in the second sum, obtaining

or simply

But this is

and the first term vanishes because is infinitesimal stochastic: We thus have

as desired. █

[…] For a self-contained proof, see Information geometry (part 16), which is coming up soon. It will be a special case of the theorems there. […]

Is the definition of I(p, q) above missing terms to account for them being unnormalised populations rather than probability distributions?

Perhaps. Maybe you noticed that in our paper Relative entropy in biological systems we added such extra terms to ensure that is a

divergenceeven for unnormalized populations: in other words, greater than or equal to zero, and vanishing only forI came up with this idea at approximately the same time that Blake was doing the calculations here. We should see if these extra terms help or hurt the calculations here. I’m quite happy with the calculations already, so I forgot to try redoing them with the extra terms. There’s no sacred reason that we

needto be a divergence.In an earlier paper (RELATIVE ENTROPY IN BIOLOGICAL SYSTEMS) there was a list of possible definitions of relative entropy. My favourite is the Tsallis divergence with alpha=1/2. This the same as the Hellinger distance (https://en.wikipedia.org/wiki/Hellinger_distance).

I worked through the proofs of the three theorems using the Hellinger distance instead of the Kullback–Leibler divergence, and it all seems simpler and nicer. Or I made a mistake.

Graham wrote:

You’re reading our minds! Many of those divergences were examples of a single concept, the ‘f-divergence’. Blake has recently reproved all three above theorems for a general f-divergence.

Cool!

Why is the Hellinger distance your favorite?

By the way, the Wikipedia article says it’s an example of an f-divergence, but I don’t see why: they write it in a form where this is unobvious. Maybe I just need more coffee.

Some famous divergences are not f-divergences, but nonlinear monotone functions of f-divergences.

I like the Hellinger distance H(,) because it is a metric, and it is simpler than the square root of the Jensen–Shannon divergence. Also it’s familiar to me. In the continuous case it is nice to have a distance that doesn’t depend on units of measurement. Eg if you model the heights of grey squirrels with g(x) and the heights of red squirrels with r(x), then you get the same value of H(r,g) whether x is measured in mm or m.

I should have said the

squaredHellinger distance is the same as the Tsallis divergence with alpha=1/2. Perhaps there’s a factor of two different too.I’m probably missing something really obvious here but:

You are consistently writing:

But if I sum over all the only index remaining would be j. So shouldn’t this be ?

Are there some nontrivial which correspond to a detailed balanced Markov process regardless of what populations we have? (I.e. )

The condition lets me think that this is not necessarily the case. The easiest example I could come up with is where and – this should fulfill detailed balance and, if it’s a closed Markov process, it would simply leave all the populations constant. For an open one the populations should be able to grow or shrink but all of them equally, I think?

Kram wrote:

Yes, sorry! I’ll fix that typo. (Typos tend to become ‘consistent errors’ when I do a lot of cut-and-paste.)

I’m having a bit of trouble reading and understanding your next question, so I’ll think about it after I fix this!

I’m sorry if my second question isn’t as coherent as it should be. Basically, with that detailed-balanced condition in place, the allowed choice of seems to critically depend on my set of . All I was wondering is whether there are non-trivial which describe detailed balanced processes regardless of my current

I

thinkthat I found a rather trivial detailed balanced example in and but that’s a boring case that never changes (it’s in equilibrium “from day one”) and it also depends on my populations.I

thinkthat I found a rather trivial detailed balanced example in and but that’s a boring case that never changes (it’s in equilibrium “from day one”) and it also depends on my populations.I’m not certain I understand the question, but hope this helps.

If , then is determined by whenever . So if has ‘enough’ non-zeroes, is uniquely determined (up to a constant scaling) by the detailed-balanced condition.

has enough non-zeroes if the process is irreducible. (https://en.wikipedia.org/wiki/Continuous-time_Markov_chain#Irreducibility)

Actually, I think that should answer it perfectly, once I have learned what ever else I’m doing wrong (see my reply below), thanks!

Let me expand on Graham’s answer. We start with an infinitesimal stochastic matrix

Draw a graph with one vertex for each state and one directed edge from each vertex to each vertex whenever A

directededge is an edge with an arrow on it, in this case pointing from toTwo vertices and are

strongly connectedif there’s a directed edge path from to and also one back from to Adirected edge pathis a path of edges where at each step you move in the direction of the arrow, not against it.A

strongly connected componentis a maximal set of strongly connected vertices. For example, here is a graph with 3 strongly connected components:Now here’s an answer to Kram’s puzzle:

Theorem.If the graph associated to has just a single strongly connected compponent, there is only one probability distribution obeyingand thus

I’m a bit confused about the stochastic operator. I must be really misunderstanding something here.

You defined which forces . So the are negative.

Furthermore you said (for the closed case) .

But for a normal Markov process, if I’m not mistaken, for this to be the case, , even if and , or rather:

And the equilibrium can be found by

If I try doing the same thing with your definition, I get a divergence even in the closed case.

The simplest examples I have are: and with (so in this simple case the equilibrium is reached after a single step with both states having a population equal to the average of the initial population, and the total population remains the same) but

Meanwhile, if I’m not mistaken, the flows J, J’ are:

and

which in both cases simply dictates that all the populations for each state must be equal for these systems to be in a steady state. But for the second matrix this is not a steady state! Instead, the vector with populations immediately becomes which then

isa steady state, for any a (and it lost a population of which should not have happened since I’m considering purely closed Matrix processes here)What am I missing?

is an infinitesimal stochastic matrix, or ‘rate matrix’. You seem to be thinking of a matrix of transition probabilities. The connection is that .

ah, ok, that makes sense. Thanks! So it’s essentially because we are talking of a continuous process rather than a discrete one.

Is the example H’ I wrote up a correct example then?

I just checked and does indeed converge against something plausible, namely which is rather logical: It’ll average together all populations and return a constant vector in agreement with the condition above.

Similarly, my “” above which is, indeed, a transition matrix and should thus rather be called would have the corresponding infinitesimal version which does behave as required.

I think that has cleared this up for me. Thanks!

Graham beat me to it, but the problem is that Kram was writing formulas suitable for a

discrete-timeMarkov process, while this article is about acontinuous-timeMarkov process.In the discrete-time case we update a probability distribution as follows:

and the matrix needs to be

stochastic: all its entries must be nonnegative, and its columns must sum to 1.In the continuous-time case, we evolve a probability distribution according to this differential equation, called the

master equation:and the matrix needs to be

infinitesimal stochastic: all its off-diagonal entries must be nonnegative, and its columns must sum to 0.The discrete-time and continuous-time cases work rather similarly, but there are differences… the most important of which being that they’re not the same thing!

Thanks for this particularly user-friendly writeup, at least as measured by how far I got before slowing down a lot:-)

1) It might be even friendlier if you just called H a “matrix”, at least to start with.

2) Does the master equation

implythe net flux equations involving the Jij, or are the latter just an added part of the definition of a Markov process?1) In my community—namely, fancy-schmancy pure mathematicians—calling an operator a ‘matrix’ is considered

declassé: we say a linear operator is a kind of function from a vector space to a vector space, whichcan be describedby a matrix if you so wish.But this is a rather snobbish community, and I should try to remember now and then what ordinary folks are like.

2) The master equation implies that the flow of population from the jth state to the ith is

but also, simultaneously, the flow from the ith state to the jth is

Thus, the

netflow from the jth state to the ith is the difference of these,So this definition of the net flow is a natural spinoff of the master equation, not some sort of extra requirement.

Thanks John. When I first saw the master equation I read it as saying only that ‘s value

influencesthe change in (to the degree specified by ), but not necessarily via a direct flow from to . I guess you’re saying that, unless these influences always take the form of direct pairwise flows, the bookkeeping can’t work out.The value of influences the rate of change of various s and also of itself, but the ‘infinitesimal stochastic’ condition says that these rates of change are such that the decrease of is exactly counterbalanced by the increase of the s it is influencing. So, we interpret this ‘influence’ as a flow from to You could try another interpretation, but I don’t see a reasonable way to do it.

OK, thanks John.

[…] For a self-contained proof, see Information Geometry (Part 15), which is coming up soon. It will be a special case of the theorems there. […]