joint with Blake Pollard
Lately we’ve been thinking about open Markov processes. These are random processes where something can hop randomly from one state to another (that’s the ‘Markov process’ part) but also enter or leave the system (that’s the ‘open’ part).
The ultimate goal is to understand the nonequilibrium thermodynamics of open systems—systems where energy and maybe matter flow in and out. If we could understand this well enough, we could understand in detail how life works. That’s a difficult job! But one has to start somewhere, and this is one place to start.
We have a few papers on this subject:
• Blake Pollard, A Second Law for open Markov processes. (Blog article here.)
• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)
• Blake Pollard, Open Markov processes: A compositional perspective on non-equilibrium steady states in biology. (Blog article here.)
However, right now we just want to show you three closely connected results about how relative entropy changes in open Markov processes.
Definitions
An open Markov process consists of a finite set $X$ of states, a subset $B \subseteq X$ of boundary states, and an infinitesimal stochastic operator $H : \mathbb{R}^X \to \mathbb{R}^X$, meaning a linear operator with

$$H_{ij} \geq 0 \quad \text{for all } i \neq j$$

and

$$\sum_{i \in X} H_{ij} = 0 \quad \text{for all } j \in X$$

For each state $i \in X$ we introduce a population $p_i \in [0,\infty)$. We call the resulting function $p : X \to [0,\infty)$ the population distribution.
Populations evolve in time according to the open master equation:

$$\frac{dp_i}{dt} = \sum_{j \in X} H_{ij} p_j \quad \text{for } i \in X - B$$

$$p_i(t) = b_i(t) \quad \text{for } i \in B$$

So, the populations obey a linear differential equation at states that are not in the boundary, but they are specified ‘by the user’ to be chosen functions $b_i(t)$ at the boundary states. The off-diagonal entries $H_{ij}$ with $i \neq j$ describe the rate at which population transitions from the $j$th to the $i$th state.
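To make this concrete, here is a minimal numerical sketch of the open master equation in Python; the 3-state process, its rates, and the boundary function b(t) below are made up purely for illustration, not taken from the papers above.

import numpy as np

# Hypothetical 3-state open Markov process: states {0, 1, 2}, boundary B = {0}.
# H is infinitesimal stochastic: off-diagonal entries >= 0, columns sum to 0.
H = np.array([[-2.0,  1.0,  0.0],
              [ 2.0, -3.0,  1.0],
              [ 0.0,  2.0, -1.0]])
assert np.allclose(H.sum(axis=0), 0.0)

def b(t):
    """User-chosen population at the single boundary state 0."""
    return 1.0 + 0.5 * np.sin(t)

# Forward-Euler integration of the open master equation:
#   dp_i/dt = sum_j H_ij p_j   for i not in B,
#   p_i(t)  = b_i(t)           for i in B.
dt, steps = 1e-3, 5000
p = np.array([b(0.0), 0.3, 0.7])
for n in range(steps):
    p = p + dt * (H @ p)       # evolve every state by the master equation...
    p[0] = b((n + 1) * dt)     # ...then overwrite the boundary state with b(t)

print("populations at t = 5:", p)

The non-boundary states follow the linear differential equation, while the boundary state is simply overwritten with the user-chosen function at each step.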
A closed Markov process, or continuous-time discrete-state Markov chain, is an open Markov process whose boundary is empty. For a closed Markov process, the open master equation becomes the usual master equation:

$$\frac{dp}{dt} = Hp$$
In a closed Markov process the total population is conserved:

$$\frac{d}{dt} \sum_{i \in X} p_i = \sum_{i,j \in X} H_{ij} p_j = 0$$
This lets us normalize the initial total population to 1 and have it stay equal to 1. If we do this, we can talk about probabilities instead of populations. In an open Markov process, population can flow in and out at the boundary states.
For any pair of distinct states $i, j$, the quantity $H_{ij} p_j$ is the flow of population from $j$ to $i$. The net flux of population from the $j$th state to the $i$th state is the flow from $j$ to $i$ minus the flow from $i$ to $j$:

$$J_{ij} = H_{ij} p_j - H_{ji} p_i$$
A steady state is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an equilibrium. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. The idea is that population can flow in or out at the boundary states.
We say an equilibrium $q : X \to [0,\infty)$ of a Markov process is detailed balanced if all the net fluxes vanish:

$$J_{ij} = 0 \quad \text{for all } i, j \in X$$

or in other words:

$$H_{ij} q_j = H_{ji} q_i \quad \text{for all } i, j \in X$$

Given two population distributions $p, q : X \to [0,\infty)$ we can define the relative entropy

$$I(p,q) = \sum_{i \in X} p_i \ln\left(\frac{p_i}{q_i}\right)$$

When $q$ is a detailed balanced equilibrium solution of the master equation, the relative entropy can be seen as the ‘free energy’ of $p$. For a precise statement, see Section 4 of Relative entropy in biological systems.
The Second Law of Thermodynamics implies that the free energy of a closed system tends to decrease with time, so for closed Markov processes we expect the relative entropy $I(p(t), q(t))$ to be nonincreasing. And this is true! But for open Markov processes, free energy can flow in from outside. This is just one of several nice results about how relative entropy changes with time.
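As a quick sanity check of this expectation, here is a small Python sketch with made-up rates: it evolves two population distributions by the same master equation and prints the relative entropy, which should come out nonincreasing.

import numpy as np

# Made-up closed Markov process on 3 states (columns of H sum to 0).
H = np.array([[-1.0,  2.0,  0.5],
              [ 1.0, -3.0,  0.5],
              [ 0.0,  1.0, -1.0]])
assert np.allclose(H.sum(axis=0), 0.0)

def relative_entropy(p, q):
    """I(p,q) = sum_i p_i ln(p_i / q_i)."""
    return float(np.sum(p * np.log(p / q)))

# Evolve two population distributions by the same master equation dp/dt = Hp;
# the printed values of I(p, q) should be nonincreasing.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
dt = 1e-3
for step in range(3001):
    if step % 1000 == 0:
        print(f"t = {step * dt:.1f}   I(p,q) = {relative_entropy(p, q):.6f}")
    p = p + dt * (H @ p)       # forward-Euler step of the master equation
    q = q + dt * (H @ q)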
Results
Theorem 1. Consider an open Markov process with $X$ as its set of states and $B$ as the set of boundary states. Suppose $p(t)$ and $q(t)$ obey the open master equation, and let the quantities

$$\frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_{j \in X} H_{ij} p_j$$

$$\frac{Dq_i}{Dt} = \frac{dq_i}{dt} - \sum_{j \in X} H_{ij} q_j$$

measure how much the time derivatives of $p_i$ and $q_i$ fail to obey the master equation. Then we have

$$\frac{d}{dt} I(p(t), q(t)) = \sum_{i,j \in X} H_{ij} p_j \left( \ln\left(\frac{p_i}{q_i}\right) - \frac{p_i q_j}{p_j q_i} \right) + \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right)$$
This result separates the change in relative entropy into two parts: an ‘internal’ part and a ‘boundary’ part.
It turns out the ‘internal’ part is always less than or equal to zero. So, from Theorem 1 we can deduce a version of the Second Law of Thermodynamics for open Markov processes:
Theorem 2. Given the conditions of Theorem 1, we have

$$\frac{d}{dt} I(p(t), q(t)) \leq \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right)$$

Intuitively, this says that free energy can only increase if it comes in from the boundary!
There is another nice result that holds when $q$ is an equilibrium solution of the master equation. This idea seems to go back to Schnakenberg:
Theorem 3. Given the conditions of Theorem 1, suppose also that $q$ is an equilibrium solution of the master equation. Then we have

$$\frac{d}{dt} I(p(t), q) = -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} + \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt}$$

where

$$J_{ij} = H_{ij} p_j - H_{ji} p_i$$

is the net flux from $j$ to $i$, while

$$A_{ij} = \ln\left(\frac{p_j q_i}{p_i q_j}\right)$$

is the conjugate thermodynamic force.
The flux $J_{ij}$ has a nice meaning: it’s the net flow of population from $j$ to $i$. The thermodynamic force is a bit subtler, but this theorem reveals its meaning: it says how much the population wants to flow from $j$ to $i$.

More precisely, up to that factor of $\frac{1}{2}$, the thermodynamic force $A_{ij}$ says how much free energy loss is caused by net flux from $j$ to $i$.
There’s a nice analogy here to water losing potential energy as it flows downhill due to the force of gravity.
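Here is an illustrative numerical spot-check of Theorem 3 in the closed case, where the boundary terms are absent. The rates below are made up, and the flux and force are taken in the forms $J_{ij} = H_{ij} p_j - H_{ji} p_i$ and $A_{ij} = \ln(p_j q_i / p_i q_j)$ stated above.

import numpy as np

# Build a made-up detailed balanced H with a chosen equilibrium q, pick a random
# population distribution p, and compare the two sides of Theorem 3 for a closed
# process (empty boundary, q held fixed).
rng = np.random.default_rng(0)
n = 4
q = rng.uniform(0.5, 2.0, n)                 # chosen equilibrium populations
S = rng.uniform(0.1, 1.0, (n, n))
S = S + S.T                                  # symmetric 'bare' rates
H = S * q[:, None]                           # H_ij = S_ij q_i, so H_ij q_j = H_ji q_i
np.fill_diagonal(H, 0.0)
np.fill_diagonal(H, -H.sum(axis=0))          # make the columns sum to zero
assert np.allclose(H @ q, 0.0)               # q really is an equilibrium

p = rng.uniform(0.5, 2.0, n)                 # an arbitrary population distribution

# Left side: dI(p,q)/dt along the master equation dp/dt = Hp, with q fixed.
lhs = np.sum((np.log(p / q) + 1.0) * (H @ p))

# Right side: -1/2 sum_ij J_ij A_ij with the flux and force as in Theorem 3.
J = H * p - (H * p).T                        # J_ij = H_ij p_j - H_ji p_i
A = np.log(np.outer(1.0 / p, p) * np.outer(q, 1.0 / q))   # A_ij = ln(p_j q_i / (p_i q_j))
rhs = -0.5 * np.sum(J * A)

print(lhs, rhs)                              # these should agree to rounding error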
Proofs
Proof of Theorem 1. We begin by taking the time derivative of the relative information:

$$\frac{d}{dt} I(p(t), q(t)) = \sum_{i \in X} \left( \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} + \frac{\partial I}{\partial q_i} \frac{dq_i}{dt} \right)$$

We can separate this into a sum over states $i \in X - B$, for which the time derivatives of $p_i$ and $q_i$ are given by the master equation, and boundary states $i \in B$, for which they are not:

$$\frac{d}{dt} I(p(t), q(t)) = \sum_{i \in X - B, \, j \in X} \left( \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j \right) + \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} + \frac{\partial I}{\partial q_i} \frac{dq_i}{dt} \right)$$

For boundary states we have

$$\frac{dp_i}{dt} = \frac{Dp_i}{Dt} + \sum_{j \in X} H_{ij} p_j$$

and similarly for the time derivative of $q_i$. We thus obtain

$$\frac{d}{dt} I(p(t), q(t)) = \sum_{i, j \in X} \left( \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j \right) + \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right)$$

To evaluate the first sum, recall that

$$I(p,q) = \sum_{i \in X} p_i \ln\left(\frac{p_i}{q_i}\right)$$

so

$$\frac{\partial I}{\partial p_i} = 1 + \ln\left(\frac{p_i}{q_i}\right), \qquad \frac{\partial I}{\partial q_i} = -\frac{p_i}{q_i}$$

Thus, we have

$$\sum_{i,j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j = \sum_{i,j \in X} \left(1 + \ln\left(\frac{p_i}{q_i}\right)\right) H_{ij} p_j - \frac{p_i}{q_i} H_{ij} q_j$$

We can rewrite this as

$$\sum_{i,j \in X} H_{ij} p_j \left(1 + \ln\left(\frac{p_i}{q_i}\right) - \frac{p_i q_j}{p_j q_i}\right)$$

Since $H$ is infinitesimal stochastic we have

$$\sum_{i \in X} H_{ij} = 0$$

so the first term drops out, and we are left with

$$\sum_{i,j \in X} H_{ij} p_j \left(\ln\left(\frac{p_i}{q_i}\right) - \frac{p_i q_j}{p_j q_i}\right)$$

as desired. █
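Since the identity in Theorem 1 holds for any differentiable $p(t)$ and $q(t)$ once $Dp_i/Dt$ and $Dq_i/Dt$ are defined as above, it can be checked numerically with arbitrary data. Here is a small Python sketch along those lines; the rates, populations, and boundary time derivatives are all made up.

import numpy as np

# Made-up data: 4 states, boundary B = {0}; arbitrary time derivatives at the
# boundary, master-equation time derivatives elsewhere.
rng = np.random.default_rng(1)
n, B = 4, [0]
H = rng.uniform(0.0, 1.0, (n, n))
np.fill_diagonal(H, 0.0)
np.fill_diagonal(H, -H.sum(axis=0))            # infinitesimal stochastic

p = rng.uniform(0.5, 2.0, n)
q = rng.uniform(0.5, 2.0, n)
dp = H @ p
dq = H @ q
dp[B] = rng.normal(size=len(B))                # boundary states evolve arbitrarily
dq[B] = rng.normal(size=len(B))

dI_dp = np.log(p / q) + 1.0                    # dI/dp_i = 1 + ln(p_i/q_i)
dI_dq = -p / q                                 # dI/dq_i = -p_i/q_i

# Left side: dI/dt by the chain rule.
lhs = np.sum(dI_dp * dp + dI_dq * dq)

# Right side: internal term plus boundary terms, with Dp_i/Dt = dp_i/dt - (Hp)_i.
M = np.log(p / q)[:, None] - np.outer(p / q, q / p)   # ln(p_i/q_i) - p_i q_j/(p_j q_i)
internal = np.sum((H * p) * M)
Dp = dp - H @ p
Dq = dq - H @ q
rhs = internal + np.sum(dI_dp[B] * Dp[B] + dI_dq[B] * Dq[B])

print(lhs, rhs)                                # these should agree to rounding error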
Proof of Theorem 2. Thanks to Theorem 1, to prove

$$\frac{d}{dt} I(p(t), q(t)) \leq \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right)$$

it suffices to show that

$$\sum_{i,j \in X} H_{ij} p_j \left(\ln\left(\frac{p_i}{q_i}\right) - \frac{p_i q_j}{p_j q_i}\right) \leq 0$$

or equivalently (recalling the proof of Theorem 1):

$$\sum_{i,j \in X} H_{ij} p_j \left(\ln\left(\frac{p_i}{q_i}\right) + 1 - \frac{p_i q_j}{p_j q_i}\right) \leq 0$$

The last two terms on the left hand side cancel when $i = j$. Thus, if we break the sum into an $i \neq j$ part and an $i = j$ part, the left side becomes

$$\sum_{i \neq j} H_{ij} p_j \left(\ln\left(\frac{p_i}{q_i}\right) + 1 - \frac{p_i q_j}{p_j q_i}\right) + \sum_j H_{jj} p_j \ln\left(\frac{p_j}{q_j}\right)$$

Next we can use the infinitesimal stochastic property of $H$ to write $H_{jj}$ as the sum of $-H_{ij}$ over $i$ not equal to $j$, obtaining

$$\sum_{i \neq j} H_{ij} p_j \left(\ln\left(\frac{p_i}{q_i}\right) + 1 - \frac{p_i q_j}{p_j q_i}\right) - \sum_{i \neq j} H_{ij} p_j \ln\left(\frac{p_j}{q_j}\right) = \sum_{i \neq j} H_{ij} p_j \left(\ln\left(\frac{p_i q_j}{p_j q_i}\right) + 1 - \frac{p_i q_j}{p_j q_i}\right)$$

Since $H_{ij} \geq 0$ when $i \neq j$ and

$$\ln(s) + 1 - s \leq 0 \quad \text{for all } s > 0,$$

we conclude that this quantity is $\leq 0$.  █
Proof of Theorem 3. Now suppose also that $q$ is an equilibrium solution of the master equation. Then

$$\frac{dq_i}{dt} = \sum_{j \in X} H_{ij} q_j = 0$$

for all states $i$, so by Theorem 1 we need to show

$$\sum_{i,j \in X} H_{ij} p_j \left(\ln\left(\frac{p_i}{q_i}\right) - \frac{p_i q_j}{p_j q_i}\right) = -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij}$$

We also have $\sum_{j \in X} H_{ij} q_j = 0$, so the second term in the sum at left vanishes, and it suffices to show

$$\sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{p_i}{q_i}\right) = -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij}$$

By definition we have

$$\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} = \frac{1}{2} \sum_{i,j \in X} \left(H_{ij} p_j - H_{ji} p_i\right) \ln\left(\frac{p_j q_i}{p_i q_j}\right)$$

This in turn equals

$$\frac{1}{2} \sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{p_j q_i}{p_i q_j}\right) - \frac{1}{2} \sum_{i,j \in X} H_{ji} p_i \ln\left(\frac{p_j q_i}{p_i q_j}\right)$$

and we can switch the dummy indices in the second sum, obtaining

$$\frac{1}{2} \sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{p_j q_i}{p_i q_j}\right) - \frac{1}{2} \sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{p_i q_j}{p_j q_i}\right)$$

or simply

$$\sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{p_j q_i}{p_i q_j}\right)$$

But this is

$$\sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{p_j}{q_j}\right) + \sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{q_i}{p_i}\right)$$

and the first term vanishes because $H$ is infinitesimal stochastic: $\sum_{i \in X} H_{ij} = 0$. We thus have

$$\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} = -\sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{p_i}{q_i}\right)$$

as desired. █
[…] For a self-contained proof, see Information geometry (part 16), which is coming up soon. It will be a special case of the theorems there. […]
Is the definition of I(p, q) above missing terms to account for them being unnormalised populations rather than probability distributions?
Perhaps. Maybe you noticed that in our paper Relative entropy in biological systems we added such extra terms to ensure that $I(p,q)$ is a divergence even for unnormalized populations: in other words, greater than or equal to zero, and vanishing only for $p = q$.

I came up with this idea at approximately the same time that Blake was doing the calculations here. We should see if these extra terms help or hurt the calculations here. I’m quite happy with the calculations already, so I forgot to try redoing them with the extra terms. There’s no sacred reason that we need $I(p,q)$ to be a divergence.
In an earlier paper (Relative entropy in biological systems) there was a list of possible definitions of relative entropy. My favourite is the Tsallis divergence with alpha = 1/2. This is the same as the Hellinger distance (https://en.wikipedia.org/wiki/Hellinger_distance).
I worked through the proofs of the three theorems using the Hellinger distance instead of the Kullback–Leibler divergence, and it all seems simpler and nicer. Or I made a mistake.
Graham wrote:
You’re reading our minds! Many of those divergences were examples of a single concept, the ‘f-divergence’. Blake has recently reproved all three above theorems for a general f-divergence.
Cool!
Why is the Hellinger distance your favorite?
By the way, the Wikipedia article says it’s an example of an f-divergence, but I don’t see why: they write it in a form where this is unobvious. Maybe I just need more coffee.
Some famous divergences are not f-divergences, but nonlinear monotone functions of f-divergences.
I like the Hellinger distance $H(\cdot,\cdot)$ because it is a metric, and it is simpler than the square root of the Jensen–Shannon divergence. Also it’s familiar to me. In the continuous case it is nice to have a distance that doesn’t depend on units of measurement. E.g., if you model the heights of grey squirrels with g(x) and the heights of red squirrels with r(x), then you get the same value of H(r,g) whether x is measured in mm or m.
I should have said the squared Hellinger distance is the same as the Tsallis divergence with alpha = 1/2. Perhaps they differ by a factor of two as well.
I’m probably missing something really obvious here but:

You are consistently writing $\sum_{i \in X} H_{ij} = 0$ for all $i \in X$. But if I sum over all $i$, the only index remaining would be j. So shouldn’t this be “for all $j \in X$”?

Are there some nontrivial $H$ which correspond to a detailed balanced Markov process regardless of what populations we have? (I.e. $H_{ij} q_j = H_{ji} q_i$ for every population distribution $q$.) The condition $H_{ij} q_j = H_{ji} q_i$ lets me think that this is not necessarily the case. The easiest example I could come up with is where $H_{ij} = H_{ji}$ and all the populations are equal – this should fulfill detailed balance and, if it’s a closed Markov process, it would simply leave all the populations constant. For an open one the populations should be able to grow or shrink but all of them equally, I think?
Kram wrote:
Yes, sorry! I’ll fix that typo. (Typos tend to become ‘consistent errors’ when I do a lot of cut-and-paste.)
I’m having a bit of trouble reading and understanding your next question, so I’ll think about it after I fix this!
I’m sorry if my second question isn’t as coherent as it should be. Basically, with that detailed-balanced condition in place, the allowed choice of $q$ seems to critically depend on my set of $H_{ij}$. All I was wondering is whether there are non-trivial $H$ which describe detailed balanced processes regardless of my current $q$.
I think that I found a rather trivial detailed balanced example in $H_{ij} = H_{ji}$ and $q_i = q_j$ for all $i, j$, but that’s a boring case that never changes (it’s in equilibrium “from day one”) and it also depends on my populations.
I’m not certain I understand the question, but hope this helps.

If $H_{ij} q_j = H_{ji} q_i$, then $q_i$ is determined by $q_j$ whenever $H_{ij} \neq 0$. So if $H$ has ‘enough’ non-zeroes, $q$ is uniquely determined (up to a constant scaling) by the detailed-balanced condition.

$H$ has enough non-zeroes if the process is irreducible. (https://en.wikipedia.org/wiki/Continuous-time_Markov_chain#Irreducibility)
Actually, I think that should answer it perfectly, once I have learned what ever else I’m doing wrong (see my reply below), thanks!
Let me expand on Graham’s answer. We start with an infinitesimal stochastic matrix $H$. Draw a graph with one vertex for each state $i \in X$ and one directed edge from each vertex $j$ to each vertex $i$ whenever $H_{ij} > 0$. A directed edge is an edge with an arrow on it, in this case pointing from $j$ to $i$.

Two vertices $i$ and $j$ are strongly connected if there’s a directed edge path from $i$ to $j$ and also one back from $j$ to $i$. A directed edge path is a path of edges where at each step you move in the direction of the arrow, not against it.

A strongly connected component is a maximal set of strongly connected vertices. For example, here is a graph with 3 strongly connected components:
Now here’s an answer to Kram’s puzzle:
Theorem. If the graph associated to $H$ has just a single strongly connected component, there is only one probability distribution $q$ obeying $Hq = 0$, and thus at most one detailed balanced equilibrium probability distribution.
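Here is a small Python sketch of this, using a made-up 3-state cycle: it counts the strongly connected components of the graph associated to $H$ (via scipy) and checks that the kernel of $H$ is one-dimensional, so the equilibrium probability distribution is unique.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Made-up 3-state cycle 0 -> 1 -> 2 -> 0, so the graph associated to H has a
# single strongly connected component.
H = np.array([[-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 0.0,  1.0, -1.0]])

# There is an edge from j to i whenever H_ij > 0; scipy places the edge a -> b
# at csgraph[a, b], so we transpose.  (A graph and its reverse have the same
# strongly connected components anyway.)
graph = csr_matrix((H > 0).astype(int)).T
n_scc, _ = connected_components(graph, directed=True, connection="strong")
print("strongly connected components:", n_scc)

# A single strongly connected component should go with a one-dimensional kernel,
# i.e. a unique probability distribution q with Hq = 0.
U, s, Vt = np.linalg.svd(H)
print("dimension of ker H:", int(np.sum(s < 1e-10)))
q = np.abs(Vt[-1])
print("equilibrium distribution:", q / q.sum())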
I’m a bit confused about the stochastic operator. I must be really misunderstanding something here.

You defined $\sum_{i \in X} H_{ij} = 0$, which forces $H_{jj} = -\sum_{i \neq j} H_{ij}$. So the diagonal entries $H_{jj}$ are negative.

Furthermore you said (for the closed case) $\frac{dp}{dt} = Hp$.
But for a normal Markov process, if I’m not mistaken, for this to be the case the columns should sum to 1 rather than 0. And the equilibrium can be found by applying the matrix over and over.
If I try doing the same thing with your definition, I get a divergence even in the closed case.
The simplest examples I have are:
and
with
(so in this simple case the equilibrium is reached after a single step with both states having a population equal to the average of the initial population, and the total population remains the same) but 
Meanwhile, if I’m not mistaken, the flows J, J’ are:
which in both cases simply dictates that all the populations for each state must be equal for these systems to be in a steady state. But for the second matrix this is not a steady state! Instead, the vector with populations
immediately becomes
which then is a steady state, for any a (and it lost a population of
which should not have happened since I’m considering purely closed Markov processes here)
What am I missing?
Ah, ok, that makes sense. Thanks! So it’s essentially because we are talking about a continuous process rather than a discrete one. $e^{tH} p$ does indeed converge to something plausible, namely a constant vector: it’ll average together all the populations, in agreement with the condition above.
Is the example H′ I wrote up a correct example then? I just checked and it works. Similarly, my “$H$” above, which is, indeed, a transition matrix and should thus rather be called $T$, would have the corresponding infinitesimal version $T - I$, which does behave as required.
I think that has cleared this up for me. Thanks!
Graham beat me to it, but the problem is that Kram was writing formulas suitable for a discrete-time Markov process, while this article is about a continuous-time Markov process.
In the discrete-time case we update a probability distribution as follows:

$$p(t+1) = T p(t)$$

and the matrix $T$ needs to be stochastic: all its entries must be nonnegative, and its columns must sum to 1.

In the continuous-time case, we evolve a probability distribution according to this differential equation, called the master equation:

$$\frac{dp(t)}{dt} = H p(t)$$

and the matrix $H$ needs to be infinitesimal stochastic: all its off-diagonal entries must be nonnegative, and its columns must sum to 0.
The discrete-time and continuous-time cases work rather similarly, but there are differences… the most important of which being that they’re not the same thing!
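One way to see the relationship numerically: if $H$ is infinitesimal stochastic, then $\exp(tH)$ is a stochastic matrix for every $t \geq 0$, so the continuous-time evolution can be viewed as a continuum of discrete-time updates. Here is a small Python check with made-up rates.

import numpy as np
from scipy.linalg import expm

# Made-up infinitesimal stochastic matrix (off-diagonal entries >= 0, columns sum to 0).
H = np.array([[-2.0,  1.0,  0.0],
              [ 2.0, -3.0,  1.0],
              [ 0.0,  2.0, -1.0]])
assert np.allclose(H.sum(axis=0), 0.0)

# exp(tH) should be a stochastic matrix for every t >= 0: nonnegative entries,
# columns summing to 1, so p(t) = exp(tH) p(0) stays a probability distribution.
for t in [0.1, 1.0, 10.0]:
    U = expm(t * H)
    print(f"t = {t:5.1f}  entries >= 0: {bool((U >= -1e-12).all())}  "
          f"columns sum to 1: {np.allclose(U.sum(axis=0), 1.0)}")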
Thanks for this particularly user-friendly writeup, at least as measured by how far I got before slowing down a lot:-)
1) It might be even friendlier if you just called H a “matrix”, at least to start with.
2) Does the master equation imply the net flux equations involving the $J_{ij}$, or are the latter just an added part of the definition of a Markov process?
1) In my community—namely, fancy-schmancy pure mathematicians—calling an operator a ‘matrix’ is considered déclassé: we say a linear operator is a kind of function from a vector space to a vector space, which can be described by a matrix if you so wish.
But this is a rather snobbish community, and I should try to remember now and then what ordinary folks are like.
2) The master equation implies that the flow of population from the $j$th state to the $i$th is

$$H_{ij} p_j$$

but also, simultaneously, the flow from the $i$th state to the $j$th is

$$H_{ji} p_i$$

Thus, the net flow from the $j$th state to the $i$th is the difference of these,

$$J_{ij} = H_{ij} p_j - H_{ji} p_i$$

So this definition of the net flow is a natural spinoff of the master equation, not some sort of extra requirement.
Thanks John. When I first saw the master equation I read it as saying only that $p_j$’s value influences the change in $p_i$ (to the degree specified by $H_{ij}$), but not necessarily via a direct flow from $j$ to $i$. I guess you’re saying that, unless these influences always take the form of direct pairwise flows, the bookkeeping can’t work out.
The value of $p_j$ influences the rate of change of various $p_i$’s and also of $p_j$ itself, but the ‘infinitesimal stochastic’ condition says that these rates of change are such that the decrease of $p_j$ is exactly counterbalanced by the increase of the $p_i$’s it is influencing. So, we interpret this ‘influence’ as a flow from $j$ to $i$.
You could try another interpretation, but I don’t see a reasonable way to do it.
OK, thanks John.
[…] For a self-contained proof, see Information Geometry (Part 15), which is coming up soon. It will be a special case of the theorems there. […]