## Information Geometry (Part 15)

joint with Blake Pollard

Lately we’ve been thinking about open Markov processes. These are random processes where something can hop randomly from one state to another (that’s the ‘Markov process’ part) but also enter or leave the system (that’s the ‘open’ part).

The ultimate goal is to understand the nonequilibrium thermodynamics of open systems: systems where energy and perhaps matter flow in and out. If we could understand this well enough, we could understand in detail how life works. That’s a difficult job! But one has to start somewhere, and this is one place to start.

We have a few papers on this subject:

• Blake Pollard, A Second Law for open Markov processes. (Blog article here.)

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

• Blake Pollard, Open Markov processes: A compositional perspective on non-equilibrium steady states in biology. (Blog article here.)

However, right now we just want to show you three closely connected results about how relative entropy changes in open Markov processes.

### Definitions

An open Markov process consists of a finite set $X$ of states, a subset $B \subseteq X$ of boundary states, and an infinitesimal stochastic operator $H: \mathbb{R}^X \to \mathbb{R}^X,$ meaning a linear operator with

$H_{ij} \geq 0 \ \ \text{for all} \ \ i \neq j$

and

$\sum_i H_{ij} = 0 \ \ \text{for all} \ \ j$

For each state $i \in X$ we introduce a population $p_i \in [0,\infty).$ We call the resulting function $p : X \to [0,\infty)$ the population distribution.

Populations evolve in time according to the open master equation:

$\displaystyle{ \frac{dp_i}{dt} = \sum_j H_{ij}p_j} \ \ \text{for all} \ \ i \in X-B$

$p_i(t) = b_i(t) \ \ \text{for all} \ \ i \in B$

So, the populations $p_i$ obey a linear differential equation at states $i$ that are not in the boundary, but they are specified ‘by the user’ to be chosen functions $b_i$ at the boundary states. The off-diagonal entry $H_{ij}$ for $i \neq j$ describes the rate at which population moves from the $j$th state to the $i$th.
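
To make the definitions concrete, here is a minimal numerical sketch in Python. The 3-state rate matrix, the boundary set $B = \{0\}$, and the constant boundary value are all arbitrary choices made up for illustration; the code just integrates the open master equation with forward Euler steps, clamping the boundary state to its user-specified value.

```python
# Hypothetical 3-state open Markov process with boundary B = {0}.
# H is infinitesimal stochastic: nonnegative off-diagonal entries,
# and each column sums to zero.  The rates below are arbitrary.
H = [[-1.0,  0.5,  0.0],
     [ 1.0, -0.5,  2.0],
     [ 0.0,  0.0, -2.0]]

def euler_step(p, boundary, dt):
    """One forward-Euler step of the open master equation:
    interior states follow dp_i/dt = sum_j H_ij p_j,
    boundary states are clamped to the user-specified values."""
    n = len(p)
    return [boundary[i] if i in boundary
            else p[i] + dt * sum(H[i][j] * p[j] for j in range(n))
            for i in range(n)]

p = [1.0, 0.0, 1.0]
boundary = {0: 1.0}          # hold p_0 = 1 at all times
for _ in range(10000):       # integrate up to t = 10
    p = euler_step(p, boundary, 1e-3)
# p approaches the steady state (1, 2, 0): population flows in
# through the clamped boundary state and accumulates at state 1.
```

Note how total population is not conserved here: the boundary condition keeps injecting population, which is exactly the ‘open’ part of the process.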

A closed Markov process, or continuous-time discrete-state Markov chain, is an open Markov process whose boundary is empty. For a closed Markov process, the open master equation becomes the usual master equation:

$\displaystyle{ \frac{dp}{dt} = Hp }$

In a closed Markov process the total population is conserved:

$\displaystyle{ \frac{d}{dt} \sum_{i \in X} p_i = \sum_{i,j} H_{ij}p_j = 0 }$

This lets us normalize the initial total population to 1 and have it stay equal to 1. If we do this, we can talk about probabilities instead of populations. In an open Markov process, population can flow in and out at the boundary states.

For any pair of distinct states $i,j,$ $H_{ij}p_j$ is the flow of population from $j$ to $i.$ The net flux of population from the $j$th state to the $i$th state is the flow from $j$ to $i$ minus the flow from $i$ to $j$:

$J_{ij} = H_{ij}p_j - H_{ji}p_i$

A steady state is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an equilibrium. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. The idea is that population can flow in or out at the boundary states.

We say an equilibrium $p : X \to [0,\infty)$ of a Markov process is detailed balanced if all the net fluxes vanish:

$J_{ij} = 0 \ \ \text{for all} \ \ i,j \in X$

or in other words:

$H_{ij}p_j = H_{ji}p_i \ \ \text{for all} \ \ i,j \in X$
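
Birth–death chains (population hops only between neighbouring states, with all rates positive) always admit a detailed balanced equilibrium, which makes them a handy test case. The sketch below, with made-up rates `a` and `d`, builds such an equilibrium from the ratio condition $q_{i+1}/q_i = a_i/d_i$ and checks that every net flux $J_{ij}$ vanishes.

```python
# Hypothetical birth-death chain on states {0, 1, 2}: up-rates a[i]
# for i -> i+1 and down-rates d[i] for i+1 -> i.  Rates are arbitrary.
a = [2.0, 3.0]
d = [1.0, 6.0]
n = 3

# Build the infinitesimal stochastic matrix H.
H = [[0.0] * n for _ in range(n)]
for i in range(n - 1):
    H[i + 1][i] = a[i]        # flow i -> i+1
    H[i][i + 1] = d[i]        # flow i+1 -> i
for j in range(n):            # diagonal entries make columns sum to 0
    H[j][j] = -sum(H[i][j] for i in range(n) if i != j)

# Detailed balance H_{i+1,i} q_i = H_{i,i+1} q_{i+1} fixes the ratios
# q_{i+1}/q_i = a[i]/d[i], so we can build q directly:
q = [1.0]
for i in range(n - 1):
    q.append(q[-1] * a[i] / d[i])      # q = [1, 2, 1] for these rates

# Every net flux J_ij = H_ij q_j - H_ji q_i vanishes ...
J = [[H[i][j] * q[j] - H[j][i] * q[i] for j in range(n)] for i in range(n)]
# ... and in particular q is an equilibrium: Hq = 0.
Hq = [sum(H[i][j] * q[j] for j in range(n)) for i in range(n)]
```

Since $J_{ij} = 0$ is stronger than $Hq = 0$, detailed balanced equilibria are a special kind of equilibrium; the 3-cycle example further below shows an equilibrium that is not detailed balanced.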

Given two population distributions $p, q : X \to [0,\infty)$ we can define the relative entropy

$\displaystyle{ I(p,q) = \sum_i p_i \ln \left( \frac{p_i}{q_i} \right)}$

When $q$ is a detailed balanced equilibrium solution of the master equation, the relative entropy can be seen as the ‘free energy’ of $p.$ For a precise statement, see Section 4 of Relative entropy in biological systems.

The Second Law of Thermodynamics implies that the free energy of a closed system tends to decrease with time, so for closed Markov processes we expect $I(p,q)$ to be nonincreasing. And this is true! But for open Markov processes, free energy can flow in from outside. This is just one of several nice results about how relative entropy changes with time.
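
Here is a quick numerical illustration of that monotonicity, using an arbitrarily chosen two-state closed process: we evolve $p$ under the master equation and watch $I(p(t),q)$, where $q$ is the detailed balanced equilibrium.

```python
import math

def relative_entropy(p, q):
    """I(p,q) = sum_i p_i ln(p_i/q_i); a term with p_i = 0 contributes 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Closed 2-state process (arbitrary rates).  Its detailed balanced
# equilibrium is q = (2/3, 1/3), since H_10 q_0 = H_01 q_1.
H = [[-1.0,  2.0],
     [ 1.0, -2.0]]
q = [2/3, 1/3]

p = [0.1, 0.9]
dt = 1e-3
history = []
for _ in range(5000):
    history.append(relative_entropy(p, q))
    p = [p[i] + dt * (H[i][0] * p[0] + H[i][1] * p[1]) for i in range(2)]
# history is nonincreasing: free energy never goes up in a closed process.
```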

### Results

Theorem 1. Consider an open Markov process with $X$ as its set of states and $B$ as the set of boundary states. Suppose $p(t)$ and $q(t)$ obey the open master equation, and let the quantities

$\displaystyle{ \frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_{j \in X} H_{ij}p_j }$

$\displaystyle{ \frac{Dq_i}{Dt} = \frac{dq_i}{dt} - \sum_{j \in X} H_{ij}q_j }$

measure how much the time derivatives of $p_i$ and $q_i$ fail to obey the master equation. Then we have

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q(t)) } &=& \displaystyle{ \sum_{i, j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)} \\ \\ && \; + \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} } \end{array}$

This result separates the change in relative entropy into two parts: an ‘internal’ part and a ‘boundary’ part.

It turns out the ‘internal’ part is always less than or equal to zero. So, from Theorem 1 we can deduce a version of the Second Law of Thermodynamics for open Markov processes:

Theorem 2. Given the conditions of Theorem 1, we have

$\displaystyle{ \frac{d}{dt} I(p(t),q(t)) \; \le \; \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} }$

Intuitively, this says that free energy can only increase if it comes in from the boundary!
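
The inequality in Theorem 2 boils down to the ‘internal’ sum being nonpositive, and that claim is easy to spot-check numerically. Here is a minimal sketch; the helper names and the random sampling scheme are just illustrative choices, not anything from the theorem itself.

```python
import math, random

random.seed(0)

def random_infinitesimal_stochastic(n):
    """Random matrix with nonnegative off-diagonal entries, columns summing to 0."""
    H = [[random.uniform(0.0, 2.0) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for j in range(n):
        H[j][j] = -sum(H[i][j] for i in range(n) if i != j)
    return H

def internal_part(H, p, q):
    """The 'internal' sum from Theorem 1:
    sum_{i,j} H_ij p_j ( ln(p_i/q_i) - p_i q_j / (p_j q_i) )."""
    n = len(p)
    return sum(H[i][j] * p[j] *
               (math.log(p[i] / q[i]) - p[i] * q[j] / (p[j] * q[i]))
               for i in range(n) for j in range(n))

# Theorem 2 predicts this is <= 0 for every H, p, q; try 200 random cases.
worst = max(internal_part(random_infinitesimal_stochastic(4),
                          [random.uniform(0.1, 5.0) for _ in range(4)],
                          [random.uniform(0.1, 5.0) for _ in range(4)])
            for _ in range(200))
```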

There is another nice result that holds when $q$ is an equilibrium solution of the master equation. This idea seems to go back to Schnakenberg:

Theorem 3. Given the conditions of Theorem 1, suppose also that $q$ is an equilibrium solution of the master equation. Then we have

$\displaystyle{ \frac{d}{dt} I(p(t),q) = -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} \; + \; \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} }$

where

$J_{ij} = H_{ij}p_j - H_{ji}p_i$

is the net flux from $j$ to $i,$ while

$\displaystyle{ A_{ij} = \ln \left(\frac{p_j q_i}{p_i q_j} \right) }$

is the conjugate thermodynamic force.

The flux $J_{ij}$ has a nice meaning: it’s the net flow of population from $j$ to $i.$ The thermodynamic force is a bit subtler, but this theorem reveals its meaning: it says how much the population wants to flow from $j$ to $i.$

More precisely, up to that factor of $1/2,$ the thermodynamic force $A_{ij}$ says how much free energy loss is caused by net flux from $j$ to $i.$ There’s a nice analogy here to water losing potential energy as it flows downhill due to the force of gravity.
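
To see Theorem 3 in action, here is a small sketch for a closed process chosen so that $q$ is an equilibrium but not detailed balanced: a 3-state cycle where population circulates. The boundary term then drops out, and $dI/dt$ should equal $-\frac{1}{2}\sum_{i,j} J_{ij} A_{ij}$. The specific rates and the initial $p$ are arbitrary.

```python
import math

# 3-state cycle with rate 1 for 0 -> 1 -> 2 -> 0 (an arbitrary choice).
# The uniform distribution q is an equilibrium (Hq = 0) but is NOT
# detailed balanced: population circulates around the cycle.
H = [[-1.0,  0.0,  1.0],
     [ 1.0, -1.0,  0.0],
     [ 0.0,  1.0, -1.0]]
q = [1/3, 1/3, 1/3]
p = [0.5, 0.3, 0.2]

def I(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Right-hand side of Theorem 3 (closed process, so no boundary term):
# -1/2 * sum_ij J_ij A_ij with J_ij = H_ij p_j - H_ji p_i and
# A_ij = ln(p_j q_i / (p_i q_j)).
rhs = -0.5 * sum((H[i][j] * p[j] - H[j][i] * p[i]) *
                 math.log((p[j] * q[i]) / (p[i] * q[j]))
                 for i in range(3) for j in range(3) if i != j)

# Left-hand side: dI/dt along the master equation, via a central
# finite difference in the flow direction Hp.
dt = 1e-6
def nudged(p, s):
    return [p[i] + s * sum(H[i][j] * p[j] for j in range(3)) for i in range(3)]
lhs = (I(nudged(p, dt), q) - I(nudged(p, -dt), q)) / (2 * dt)
```

The two sides agree, and both are negative: free energy is dissipated at a rate given by fluxes times their conjugate forces.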

### Proofs

Proof of Theorem 1. We begin by taking the time derivative of the relative information:

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q(t)) } &=& \displaystyle{ \sum_{i \in X} \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} + \frac{\partial I}{\partial q_i} \frac{dq_i}{dt} } \end{array}$

We can separate this into a sum over states $i \in X - B,$ for which the time derivatives of $p_i$ and $q_i$ are given by the master equation, and boundary states $i \in B,$ for which they are not:

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q(t)) } &=& \displaystyle{ \sum_{i \in X-B, \; j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j }\\ \\ && + \; \; \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} + \frac{\partial I}{\partial q_i} \frac{dq_i}{dt}} \end{array}$

For boundary states we have

$\displaystyle{ \frac{dp_i}{dt} = \frac{Dp_i}{Dt} + \sum_{j \in X} H_{ij}p_j }$

and similarly for the time derivative of $q_i.$ We thus obtain

$\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(p(t),q(t)) } &=& \displaystyle{ \sum_{i,j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j }\\ \\ && + \; \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt}} \end{array}$

To evaluate the first sum, recall that

$\displaystyle{ I(p,q) = \sum_{i \in X} p_i \ln (\frac{p_i}{q_i})}$

so

$\displaystyle{\frac{\partial I}{\partial p_i}} =\displaystyle{1 + \ln (\frac{p_i}{q_i})} , \qquad \displaystyle{ \frac{\partial I}{\partial q_i}}= \displaystyle{- \frac{p_i}{q_i} }$

Thus, we have

$\displaystyle{ \sum_{i,j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j = \sum_{i,j\in X} (1 + \ln (\frac{p_i}{q_i})) H_{ij} p_j - \frac{p_i}{q_i} H_{ij} q_j }$

We can rewrite this as

$\displaystyle{ \sum_{i,j \in X} H_{ij} p_j \left( 1 + \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }$

Since $H$ is infinitesimal stochastic we have $\sum_{i} H_{ij} = 0,$ so the first term drops out, and we are left with

$\displaystyle{ \sum_{i,j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }$

as desired.   █

Proof of Theorem 2. Thanks to Theorem 1, to prove

$\displaystyle{ \frac{d}{dt} I(p(t),q(t)) \; \le \; \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} }$

it suffices to show that

$\displaystyle{ \sum_{i,j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) \le 0 }$

or equivalently (recalling the proof of Theorem 1):

$\displaystyle{ \sum_{i,j} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) \le 0 }$

The last two terms on the left hand side cancel when $i = j.$ Thus, if we break the sum into an $i \ne j$ part and an $i = j$ part, the left side becomes

$\displaystyle{ \sum_{i \ne j} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) \; + \; \sum_j H_{jj} p_j \ln(\frac{p_j}{q_j}) }$

Next we can use the infinitesimal stochastic property of $H$ to write $H_{jj}$ as the sum of $-H_{ij}$ over $i$ not equal to $j,$ obtaining

$\displaystyle{ \sum_{i \ne j} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) - \sum_{i \ne j} H_{ij} p_j \ln(\frac{p_j}{q_j}) } =$

$\displaystyle{ \sum_{i \ne j} H_{ij} p_j \left( \ln(\frac{p_iq_j}{p_j q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) }$

Since $H_{ij} \ge 0$ when $i \ne j$ and $\ln(s) + 1 - s \le 0$ for all $s > 0,$ we conclude that this quantity is $\le 0.$   █

Proof of Theorem 3. Now suppose also that $q$ is an equilibrium solution of the master equation. Then $Dq_i/Dt = dq_i/dt = 0$ for all states $i,$ so by Theorem 1 we need to show

$\displaystyle{ \sum_{i, j \in X} H_{ij} p_j \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) \; = \; -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} }$

We also have $\sum_{j \in X} H_{ij} q_j = 0,$ so the second term in the sum at left vanishes, and it suffices to show

$\displaystyle{ \sum_{i, j \in X} H_{ij} p_j \ln(\frac{p_i}{q_i}) \; = \; - \frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} }$

By definition we have

$\displaystyle{ \frac{1}{2} \sum_{i,j} J_{ij} A_{ij}} = \displaystyle{ \frac{1}{2} \sum_{i,j} \left( H_{ij} p_j - H_{ji}p_i \right) \ln \left( \frac{p_j q_i}{p_i q_j} \right) }$

This in turn equals

$\displaystyle{ \frac{1}{2} \sum_{i,j} H_{ij}p_j \ln \left( \frac{p_j q_i}{p_i q_j} \right) - \frac{1}{2} \sum_{i,j} H_{ji}p_i \ln \left( \frac{p_j q_i}{p_i q_j} \right) }$

and we can switch the dummy indices $i,j$ in the second sum, obtaining

$\displaystyle{ \frac{1}{2} \sum_{i,j} H_{ij}p_j \ln \left( \frac{p_j q_i}{p_i q_j} \right) - \frac{1}{2} \sum_{i,j} H_{ij}p_j \ln \left( \frac{p_i q_j}{p_j q_i} \right) }$

or simply

$\displaystyle{ \sum_{i,j} H_{ij} p_j \ln \left( \frac{p_j q_i}{p_i q_j} \right) }$

But this is

$\displaystyle{ \sum_{i,j} H_{ij} p_j \left(\ln ( \frac{p_j}{q_j}) + \ln (\frac{q_i}{p_i}) \right) }$

and the first term vanishes because $H$ is infinitesimal stochastic: $\sum_i H_{ij} = 0.$ We thus have

$\displaystyle{ \frac{1}{2} \sum_{i,j} J_{ij} A_{ij}} = \sum_{i,j} H_{ij} p_j \ln (\frac{q_i}{p_i} )$

as desired.   █

### 22 Responses to Information Geometry (Part 15)

1. […] For a self-contained proof, see Information geometry (part 16), which is coming up soon. It will be a special case of the theorems there.   […]

2. benmoran says:

Is the definition of I(p, q) above missing terms to account for them being unnormalised populations rather than probability distributions?

• John Baez says:

Perhaps. Maybe you noticed that in our paper Relative entropy in biological systems we added such extra terms to ensure that $I(p,q)$ is a divergence even for unnormalized populations: in other words, greater than or equal to zero, and vanishing only for $p = q.$

I came up with this idea at approximately the same time that Blake was doing the calculations here. We should see if these extra terms help or hurt the calculations here. I’m quite happy with the calculations already, so I forgot to try redoing them with the extra terms. There’s no sacred reason that we need $I$ to be a divergence.

3. Graham says:

In an earlier paper (Relative entropy in biological systems) there was a list of possible definitions of relative entropy. My favourite is the Tsallis divergence with alpha=1/2. This is the same as the Hellinger distance (https://en.wikipedia.org/wiki/Hellinger_distance).

I worked through the proofs of the three theorems using the Hellinger distance instead of the Kullback–Leibler divergence, and it all seems simpler and nicer. Or I made a mistake.

• John Baez says:

Graham wrote:

In an earlier paper (Relative entropy in biological systems) there was a list of possible definitions of relative entropy.

You’re reading our minds! Many of those divergences were examples of a single concept, the ‘f-divergence’. Blake has recently reproved all three above theorems for a general f-divergence.

I worked through the proofs of the three theorems using the Hellinger distance instead of the Kullback–Leibler divergence, and it all seems simpler and nicer.

Cool!

Why is the Hellinger distance your favorite?

By the way, the Wikipedia article says it’s an example of an f-divergence, but I don’t see why: they write it in a form where this is unobvious. Maybe I just need more coffee.

Some famous divergences are not f-divergences, but nonlinear monotone functions of f-divergences.

• Graham says:

I like the Hellinger distance H(,) because it is a metric, and it is simpler than the square root of the Jensen–Shannon divergence. Also it’s familiar to me. In the continuous case it is nice to have a distance that doesn’t depend on units of measurement. Eg if you model the heights of grey squirrels with g(x) and the heights of red squirrels with r(x), then you get the same value of H(r,g) whether x is measured in mm or m.

I should have said the squared Hellinger distance is the same as the Tsallis divergence with alpha=1/2. Perhaps there’s a factor of two difference too.

4. I’m probably missing something really obvious here but:
You are consistently writing:
$\sum_i{H_{ij}}=0 \ \forall_i$
But if I sum over all $i$ the only index remaining would be j. So shouldn’t this be $\forall_j$ ?

Are there some nontrivial $H_{ij}$ which correspond to a detailed balanced Markov process regardless of what populations we have? (I.e. $\exists{H}|\forall p J=0$ )
The condition $J_{ij}=H_{ij}p_j-H_{ji}p_i=0$ lets me think that this is not necessarily the case. The easiest example I could come up with is where $H_{ij}=H_{ji}$ and $p_i = p_j = p = \text{const} \ \forall_{i,j}$ – this should fulfill detailed balance and, if it’s a closed Markov process, it would simply leave all the populations constant. For an open one the populations should be able to grow or shrink, but all of them equally, I think?

• John Baez says:

Kram wrote:

shouldn’t this be $\forall_j$?

Yes, sorry! I’ll fix that typo. (Typos tend to become ‘consistent errors’ when I do a lot of cut-and-paste.)

I’m having a bit of trouble reading and understanding your next question, so I’ll think about it after I fix this!

5. I’m sorry if my second question isn’t as coherent as it should be. Basically, with that detailed-balanced condition in place, the allowed choice of $H_{ij}$ seems to critically depend on my set of $p_i$. All I was wondering is whether there are non-trivial $H_{ij}$ which describe detailed balanced processes regardless of my current $p_i$

I think that I found a rather trivial detailed balanced example in $H_{ij} = H_{ji}$ and $p_i = p_j = const$ but that’s a boring case that never changes (it’s in equilibrium “from day one”) and it also depends on my populations.

• Graham says:

I’m not certain I understand the question, but hope this helps.

If $H_{ij}p_j = H_{ji}p_i$, then $p_j$ is determined by $p_i$ whenever $H_{ij} \neq 0$. So if $H$ has ‘enough’ non-zeroes, $p$ is uniquely determined (up to a constant scaling) by the detailed-balanced condition.
$H$ has enough non-zeroes if the process is irreducible. (https://en.wikipedia.org/wiki/Continuous-time_Markov_chain#Irreducibility)

• Actually, I think that should answer it perfectly, once I have learned what ever else I’m doing wrong (see my reply below), thanks!

• John Baez says:

Let me expand on Graham’s answer. We start with an infinitesimal stochastic matrix $H.$

Draw a graph with one vertex for each state $i,$ and one directed edge from each vertex $j$ to each vertex $i$ whenever $H_{ij} \ne 0.$ A directed edge is an edge with an arrow on it, in this case pointing from $j$ to $i.$

Two vertices $i$ and $j$ are strongly connected if there’s a directed edge path from $i$ to $j$ and also one back from $j$ to $i.$ A directed edge path is a path of edges where at each step you move in the direction of the arrow, not against it.

A strongly connected component is a maximal set of strongly connected vertices. For example, here is a graph with 3 strongly connected components:

Now here’s an answer to Kram’s puzzle:

Theorem. If the graph associated to $H$ has just a single strongly connected component, there is only one probability distribution obeying

$H p = 0$

and thus

$\displaystyle{ \frac{d p}{d t} = 0}$

• I’m a bit confused about the stochastic operator. I must be really misunderstanding something here.

You defined $H_{ij} \ge 0 \ \forall_{i \ne j} \ \text{and} \ \sum_i{H_{ij}} = 0 \ \forall_j$ which forces $H_{jj} = -\sum_{i \ne j}{H_{ij}}\le 0 \ \forall_j$. So the diagonal entries $H_{jj}$ are nonpositive.

Furthermore you said (for the closed case) $\frac{dp_i}{dt}=\sum_j H_{ij} p_j$.

But for a normal Markov process, if I’m not mistaken, for this to be the case we need $H_{ij} \ge 0 \ \forall_{ij}$, even if $i=j$, and $\sum_i{H_{ij}}=1$, or rather:

$p_i^{(n+1)}=\sum_j H_{ij} p_j^{(n)}$

And the equilibrium can be found by

$p_i^{(\infty)} =\lim_{n \to \infty} \sum_j (H^n)_{ij} p_j^{(0)}$

If I try doing the same thing with your definition, I get a divergence even in the closed case.

The simplest examples I have are: $H=\left[\begin{matrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \\ \end{matrix}\right]$ and $H'=\left[\begin{matrix} -1 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & -1 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & -1 \end{matrix}\right]$ with $H^\infty=H$ (so in this simple case the equilibrium is reached after a single step with both states having a population equal to the average of the initial population, and the total population remains the same) but $\not{\exists} \lim_{n \to \infty} H'^n$

Meanwhile, if I’m not mistaken, the flows J, J’ are:

$J=\frac{1}{2}\left[\begin{matrix} 0 & p_2-p_1 \\ p_1-p_2 & 0\end{matrix}\right]$ and $J'=\frac{1}{2}\left[\begin{matrix}0 & p_2-p_1 & p_3-p_1 \\p_1-p_2 & 0 & p_3-p_2\\p_1-p_3 & p_2-p_3 & 0\end{matrix}\right]$

which in both cases simply dictates that all the populations for each state must be equal for these systems to be in a steady state. But for the second matrix this is not a steady state! Instead, the vector with populations $\left[\begin{matrix} a \\ a \\ a \end{matrix}\right]$ immediately becomes $\left[\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}\right]$ which then is a steady state, for any a (and it lost a population of $3a$, which should not have happened since I’m considering purely closed Markov processes here)

What am I missing?

• Graham says:

$H$ is an infinitesimal stochastic matrix, or ‘rate matrix’. You seem to be thinking of a matrix $P$ of transition probabilities. The connection is that $P(t) = \exp(Ht)$.

Ah, OK, that makes sense. Thanks! So it’s essentially because we are talking about a continuous process rather than a discrete one.
Is the example $H'$ I wrote up a correct example then?
I just checked and $\lim_{t \to \infty} e^{H' t}$ does indeed converge to something plausible, namely $\frac{1}{3} \left[\begin{matrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{matrix}\right]$ which is rather logical: it’ll average together all populations and return a constant vector, in agreement with the condition $J=0$ above.

Similarly, my “$H$” above which is, indeed, a transition matrix and should thus rather be called $P$ would have the corresponding infinitesimal version $H = \left[\begin{matrix} -1 & 1 \\ 1 & -1 \end{matrix}\right]$ which does behave as required.

I think that has cleared this up for me. Thanks!

• John Baez says:

Graham beat me to it, but the problem is that Kram was writing formulas suitable for a discrete-time Markov process, while this article is about a continuous-time Markov process.

In the discrete-time case we update a probability distribution as follows:

$p_i(t+1) = \sum_{j} U_{ij} p_j(t)$

and the matrix $U$ needs to be stochastic: all its entries must be nonnegative, and its columns must sum to 1.

In the continuous-time case, we evolve a probability distribution according to this differential equation, called the master equation:

$\displaystyle{ \frac{dp_i(t)}{dt} = \sum_j H_{ij} p_j(t) }$

and the matrix $H$ needs to be infinitesimal stochastic: all its off-diagonal entries must be nonnegative, and its columns must sum to 0.

The discrete-time and continuous-time cases work rather similarly, but there are differences… the most important of which being that they’re not the same thing!
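
The dictionary between the two settings is the matrix exponential: $U = \exp(Ht)$ is the time-$t$ transition matrix. As a sanity check, here is a small sketch using a truncated Taylor series (fine for tiny, well-behaved matrices like these, though not a robust general-purpose method), applied to the 2-state rate matrix from the discussion above:

```python
import math

# The 2-state rate matrix from the discussion above: infinitesimal
# stochastic, with columns summing to 0.
H = [[-1.0,  1.0],
     [ 1.0, -1.0]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(M, t, terms=40):
    """exp(Mt) via the truncated Taylor series sum_k (Mt)^k / k!."""
    n = len(M)
    Mt = [[M[i][j] * t for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    power = [row[:] for row in result]                              # holds (Mt)^k / k!
    for k in range(1, terms):
        power = [[x / k for x in row] for row in mat_mul(power, Mt)]
        result = [[result[i][j] + power[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# P(t) = exp(Ht) is a genuinely stochastic matrix for every t >= 0:
# nonnegative entries, columns summing to 1.  For this H the exact
# answer is P(t)_00 = (1 + exp(-2t)) / 2.
P = expm(H, 1.0)
```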

6. arch1 says:

Thanks for this particularly user-friendly writeup, at least as measured by how far I got before slowing down a lot:-)
1) It might be even friendlier if you just called H a “matrix”, at least to start with.
2) Does the master equation imply the net flux equations involving the Jij, or are the latter just an added part of the definition of a Markov process?

• John Baez says:

1) In my community—namely, fancy-schmancy pure mathematicians—calling an operator a ‘matrix’ is considered déclassé: we say a linear operator is a kind of function from a vector space to a vector space, which can be described by a matrix if you so wish.

But this is a rather snobbish community, and I should try to remember now and then what ordinary folks are like.

2) The master equation implies that the flow of population from the jth state to the ith is

$H_{ij} p_j$

but also, simultaneously, the flow from the ith state to the jth is

$H_{ji} p_i$

Thus, the net flow from the jth state to the ith is the difference of these,

$J_{ij} = H_{ij} p_j - H_{ji} p_i$

So this definition of the net flow is a natural spinoff of the master equation, not some sort of extra requirement.

• arch1 says:

Thanks John. When I first saw the master equation I read it as saying only that $p_j$‘s value influences the change in $p_i$ (to the degree specified by $H_{ij}$), but not necessarily via a direct flow from $p_j$ to $p_i$. I guess you’re saying that, unless these influences always take the form of direct pairwise flows, the bookkeeping can’t work out.

• John Baez says:

The value of $p_j$ influences the rate of change of various $p_i$s and also of $p_j$ itself, but the ‘infinitesimal stochastic’ condition says that these rates of change are such that the decrease of $p_j$ is exactly counterbalanced by the increase of the $p_i$s it is influencing. So, we interpret this ‘influence’ as a flow from $j$ to $i.$ You could try another interpretation, but I don’t see a reasonable way to do it.

7. arch1 says:

OK, thanks John.

8. […] For a self-contained proof, see Information Geometry (Part 15), which is coming up soon. It will be a special case of the theorems there.   […]