A Second Law for Open Markov Processes

What comes to mind when you hear the term ‘random process’? Do you think of Brownian motion? Do you think of particles hopping around? Do you think of a drunkard staggering home?

Today I’m going to tell you about a version of the drunkard’s walk with a few modifications. Firstly, we don’t have just one drunkard: we can have any positive real number of drunkards. Secondly, our drunkards have no memory; where they go next doesn’t depend on where they’ve been. Thirdly, there are special places, such as entrances to bars, where drunkards magically appear and disappear.

The second condition says that our drunkards satisfy the Markov property, making their random walk into a Markov process. The third condition is really what I want to tell you about, because it makes our Markov process into a more general ‘open Markov process’.

There are a collection of places the drunkards can be, for example:

$V= \{ \text{bar},\text{sidewalk}, \text{street}, \text{taco truck}, \text{home} \}$

We call this set $V$ the set of states. There are certain probabilities associated with traveling between these places. We call these transition rates. For example it is more likely for a drunkard to go from the bar to the taco truck than to go from the bar to home so the transition rate between the bar and the taco truck should be greater than the transition rate from the bar to home. Sometimes you can’t get from one place to another without passing through intermediate places. In reality the drunkard can’t go directly from the bar to the taco truck: he or she has to go from the bar to sidewalk to the taco truck.

This information can all be summarized by drawing a directed graph where the positive numbers labelling the edges are the transition rates:

For simplicity we draw only three states: home, bar, taco truck. Drunkards go from home to the bar and back, but they never go straight from home to the taco truck.

We can keep track of where all of our drunkards are using a vector with 3 entries:

$\displaystyle{ p(t) = \left( \begin{array}{c} p_h(t) \\ p_b(t) \\ p_{tt}(t) \end{array} \right) \in \mathbb{R}^3 }$

We call this our population distribution. The first entry $p_h$ is the number of drunkards that are at home, the second $p_b$ is how many are at the bar, and the third $p_{tt}$ is how many are at the taco truck.

There is a set of coupled, linear, first-order differential equations we can write down using the information in our graph that tells us how the number of drunkards in each place change with time. This is called the master equation:

$\displaystyle{ \frac{d p}{d t} = H p }$

where $H$ is a 3×3 matrix which we call the Hamiltonian. The off-diagonal entries are nonnegative:

$H_{ij} \geq 0, i \neq j$

and the columns sum to zero:

$\sum_i H_{ij}=0$

We call a matrix satisfying these conditions infinitesimal stochastic. Stochastic matrices have columns that sum to one. If we take the exponential of an infinitesimal stochastic matrix we get one whose columns sum to one, hence the label ‘infinitesimal’.

The Hamiltonian for the graph above is

$H = \left( \begin{array}{ccc} -2 & 5 & 10 \\ 2 & -12 & 0 \\ 0 & 7 & -10 \end{array} \right)$

John has written a lot about Markov processes and infinitesimal stochastic Hamiltonians in previous posts.

Given two vectors $p,q \in \mathbb{R}^3$ describing the populations of drunkards which obey the same master equation, we can calculate the relative entropy of $p$ relative to $q$ :

$\displaystyle{ S(p,q) = \sum_{ i \in V} p_i \ln \left( \frac{p_i}{q_i} \right) }$

This is an example of a ‘divergence’. In statistics, a divergence a way of measuring the distance between probability distributions, which may not be symmetrical and may even not obey the triangle inequality.

The relative entropy is important because it decreases monotonically with time, making it a Lyapunov function for Markov processes. Indeed, it is a well known fact that

$\displaystyle{ \frac{dS(p(t),q(t) ) } {dt} \leq 0 }$

This is true for any two population distributions which evolve according to the same master equation, though you have to allow infinity as a possible value for the relative entropy and negative infinity for its time derivative.

Why is entropy decreasing? Doesn’t the Second Law of Thermodynamics say entropy increases?

Don’t worry: the reason is that I have not put a minus sign in my definition of relative entropy. Put one in if you like, and then it will increase. Sometimes without the minus sign it’s called the Kullback–Leibler divergence. This decreases with the passage of time, saying that any two population distributions $p(t)$ and $q(t)$ get ‘closer together’ as they get randomized with the passage of time.

That itself is a nice result, but I want to tell you what happens when you allow drunkards to appear and disappear at certain states. Drunkards appear at the bar once they’ve had enough to drink and once they are home for long enough they can disappear. The set of places where drunkards can appear or disappear $B$ is called the set of boundary states. So for the above process

$B = \{ \text{home},\text{bar} \}$

is the set of boundary states. This changes the way in which the population of drunkards changes with time!

The drunkards at the taco truck obey the master equation. For them,

$\displaystyle{ \frac{dp_{tt}}{dt} = 7p_b -10 p_{tt} }$

still holds. But because the populations can appear or disappear at the boundary states the master equation no longer holds at those states! Instead it is useful to define the flow of drunkards into the $i^{th}$ state by

$\displaystyle{ \frac{Dp_i}{Dt} = \frac{dp_i}{dt}-\sum_j H_{ij} p_j}$

This quantity describes by how much the rate of change of the populations at the boundary states differ from that given by the master equation.

The reason why we are interested in open Markov processes is because you can take two open Markov processes and glue them together along some subset of their boundary states to get a new open Markov process! This allows us to build up or break down complicated Markov processes using open Markov processes as the building blocks.

For example we can draw the graph corresponding to the drunkards’ walk again, only now we will distinguish boundary states from internal states by coloring internal states blue and having boundary states be white:

Consider another open Markov process with states

$V=\{ \text{home},\text{work},\text{bar} \}$

where

$B=\{ \text{home}, \text{bar}\}$

are the boundary states, leaving

$I=\{\text{work}\}$

as an internal state:

Since the boundary states of this process overlap with the boundary states of the first process we can compose the two to form a new Markov process:

Notice the boundary states are now internal states. I hope any Markov process that could approximately model your behavior has more interesting nodes! There is a nice way to figure out the Hamiltonian of the composite from the Hamiltonians of the pieces, but we will leave that for another time.

We can ask ourselves, how does relative entropy change with time in open Markov processes? You can read my paper for the details, but here is the punchline:

$\displaystyle{ \frac{dS(p(t),q(t) ) }{dt} \leq \sum_{i \in B} \frac{Dp_i}{Dt}\frac{\partial S}{\partial p_i} + \frac{Dq_i}{Dt}\frac{\partial S}{\partial q_i} }$

This is a version of the Second Law of Thermodynamics for open Markov processes.

It is important to notice that the sum is only over the boundary states! This inequality tells us that relative entropy still decreases inside our process, but depending on the flow of populations through the boundary states the relative entropy of the whole process could either increase or decrease! This inequality will be important when we study how the relative entropy changes in different parts of a bigger more complicated process.

That is all for now, but I leave it as an exercise for you to imagine a Markov process that describes your life. How many states does it have? What are the relative transition rates? Are there states you would like to spend more or less time in? Are there states somewhere you would like to visit?

Here is my paper, which proves the above inequality:

• Blake Pollard, A Second Law for open Markov processes, Open Systems and Information Dynamics 23 (2016), 1650006.

If you have comments or corrections, let me know!

This entry was posted on Saturday, November 15th, 2014 at 3:00 am and is filed under information and entropy, mathematics, physics. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

29 Responses to A Second Law for Open Markov Processes

enrique says:

15 November, 2014 at 5:45 pm

Reblogged this on prior probability and commented:
If you are fascinated by the concept of randomness and by Markov processes, there is a high probability that you will enjoy the post below …

Reply
Bruce Smith says:

15 November, 2014 at 7:11 pm

How are the diagonal entries of the Hamiltonian computed from the transition graph, in your first example? (I guessed “total incoming rate minus total outgoing rate”, but that doesn’t quite fit the numbers given.)

Reply
- Bruce Smith says:
  
  15 November, 2014 at 7:39 pm
  
  Ah, I see you answered this already by saying that the columns should sum to zero. Sorry, nevermind!
  
  Reply
  - Blake Pollard says:
    
    17 November, 2014 at 8:06 pm
    
    Hi Bruce, thanks for your question. I just wanted to mention that people in different fields often work with ‘row infinitesimal stochastic’ meaning that the rows sum to zero. Then when you take the exponential you get a stochastic matrix whose rows sum to one. People often use the terms left stochastic for when the columns sum to one, because you multiply on the left of your probability distribution. Similarly right stochastic is used to refer to row stochastic because you multiply that matrix on the right of your probability distribution. So there is plenty of room for confusion!
    
    Reply
domenico says:

17 November, 2014 at 11:53 am

I write some thing, but I don’t know if they are trivial.
The master equation can be transformed in higher order derivative
$\frac{d^2 p}{dt^2} = H \frac{d p}{dt} = H^2 p$
so that it is simple to obtain
$e^{\frac{d}{dt}} = e^{H}$
I think that it is ever possible to obtain a polynomial equation with degree equal to the dimension of the matrix H:
$\sum_i a_i H^i = 0$
where the coefficients are obtained from the coefficient the product:
$\prod_i (\lambda - \lambda_i) = 0$
where the parameters are the eigenvalues of the Hamiltonian (this is simple if it is possible the eigendecomposition of the Hamiltonian, so that the diagonalized polynomial have only eigenvalues).
So that exist a linear differential equation with the same solution of the eigendecomposition polynomial:
$\sum_i a_i \frac{d^i p}{dt^i} = \sum_i a_i H^i p = 0$
it is interesting (if it is all right) that a differential equation have the same dynamic of the master equation, and that there is a Second Law for a differential equation, and that the dynamic of a differential equation depends only on the eigenvalues of a Hamiltonian (the geometry, the form of the surface covered by possible the trajectories depends only on the eigenvalue).

Reply
- Blake Pollard says:
  
  17 November, 2014 at 8:11 pm
  
  I’m not sure I understand your comment. Let me just mention that it is true that we get solutions of the form $p(t) = p(0) e^{Ht}$ to the master equation. When you say you get a polynomial equation, $\sum_i a_i H^i$ , do you mean some numbers times different powers of $H$ ? So for a 3×3 Hamiltonian you mean an equation like $a_1H+a_2H^2+a_3H^3 = 0$ ?
  
  Reply
  - domenico says:
    
    17 November, 2014 at 10:16 pm
    
    Yes, a polynomial with Hamiltonian matrices instead of numbers.
    The exponential was an example, because each function of derivative is a function of Hamiltonian
    $f\left(\frac{d}{dt}\right) p = f\left(H \right) p$
    so that the exponential was the first example that came to my mind.
    
    Reply
John Baez says:

17 November, 2014 at 8:34 pm

Blake: I wrote a very simplified introduction to this blog article on Google+. I hope some people go from that to reading the actual blog article, because the intro just provides context; it doesn’t delve into what you’ve actually done!

Reply
nad says:

18 November, 2014 at 2:47 pm

John wrote on Google+:

Very roughly speaking, he shows that for these systems, entropy can only decrease if it flows out of the system.

Do you interpret an entropy change at the boundary ( $\frac{\partial S}{\partial p_i } \neq 0$ ) as an entropy which flows in/out of the system?

Reply
- Blake Pollard says:
  
  18 November, 2014 at 6:35 pm
  
  The term $\frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_j H_{ij} p_j$ is the ‘flow’ of probability ‘into’ the $i^{th}$ vertex. The term $\frac{\partial S}{\partial p_i}$ tells you how the relative entropy changes when you change $p_i$ . So the term $\frac{Dp_i}{Dt}\frac{\partial S}{\partial p_i}$ gives the rate at which relative entropy flows into the system at the $i^{th}$ vertex.
  
  Reply
  - nad says:
    
    18 November, 2014 at 7:22 pm
    
    Thanks for the answer.
    
    I have a little problem to see this “flow of probabilities” as such that is I miss somehow some words on what happens with respect to the “probabilities sum to one” feature, but then this is probably mentioned in your article. Furthermore I am not sure whether I would call a change in entropy multiplied by a “probability flow at the boundary” an “entropy flow rate”.
    
    Reply
    - Blake Pollard says:
      
      18 November, 2014 at 11:35 pm
      
      Your point about probabilities summing to one is a valid one. For open systems the $p_i$ ’s don’t sum to one! When considering open systems I should always be using the word ‘populations’ instead of ‘probabilities’. We called the term $\frac{Dp_i}{Dt}$ the ‘flow’ of population into the $i^{th}$ vertex, really it is the amount by which the rate of change of the population differs from that given by the master equation. So it tells you how ‘un-Markovian’ the populations at the boundaries are. The term $\frac{ \partial S}{\partial p_i}$ is the change is relative entropy when you change the population at the $i^{th}$ vertex. And the product of the two, $\frac{Dp_i}{Dt} \frac{\partial S}{p_i}$ is the rate of change of relative entropy of coming from the ‘non-Markovian’ behavior at the boundaries.
      
      Part of the fun of thinking about these open Markov processes is seeing how far you can get with populations that don’t sum to one. Some things change, such as the values which relative entropy can take, but other results for Markov processes still hold. Depending on what kind of question you want to ask you could think of a way to cook up probabilities from the populations at a particular moment in time.
  - John Baez says:
    
    19 November, 2014 at 11:34 pm
    
    John wrote:
    
    Very roughly speaking, he shows that for these systems, entropy can only decrease if it flows out of the system.
    
    Nad wrote:
    
    Do you interpret an entropy change at the boundary ( $\frac{\partial S}{\partial p_i \neq 0}$ ) as an entropy which flows in/out of the system?
    
    Very roughly, yes. If I had wanted to be precise, I would have said
    
    $\displaystyle{ \frac{\partial S}{\partial p_i} \neq 0 }$
    
    But I was writing for an impatient nontechnical audience. I don’t know a better one-sentence way to summarize Blake’s result.
    
    Reply
arch1 says:

18 November, 2014 at 6:23 pm

“Firstly, we don’t have just one drunkard: we can have any positive real number of drunkards.”

I always get stuck on the basics. How could one even *distinguish* (say) 10^(-30) drunkards from the same quantity of non-drunkards?

Reply
- Blake Pollard says:
  
  18 November, 2014 at 6:50 pm
  
  In the above discussion we were only modelling the behavior of drunkards. So there aren’t any non-drunkards to distinguish them from. For open Markov processes the appearance/disappearance of populations at the boundaries doesn’t say anything about where they came from or what they turned into. That kind of information has to be supplied by the outside world, which tells you how the populations behave at the boundaries.
  
  One way to think of this is that a Markov process is a chemical reaction with only one species. We could have a chemical reaction network with multiple species, but then the analogy with the above example can get confusing. A typical chemical reaction network says something like two A’s and a B turn into a C at some rate. It doesn’t say anything about where this happened.
  
  If we wanted to model the behavior of drunkards where we also included non-drunkards in the mix, then we could have our states be something like drunkards at the bar, non-drunkards at the bar, drunkards at home, non-drunkards at home, drunkards at the taco truck, non-drunkards at the taco truck. Then our transitions would be more like non-drunkard at the bar turns into drunkard at the bar or drunkard at the taco truck turns into non-drunkard at home and so on.
  
  The whole framework is really flexible. I don’t think any of that answers your question of how to really distinguish between two species. By giving them different labels in your reactions you are assuming that you can distinguish them somehow. In practice though this might be more difficult than distinguishing a drunkard from a non-drunkard.
  
  Reply
  - arch1 says:
    
    18 November, 2014 at 9:03 pm
    
    Blake, thanks much for clarifiying Markov processes vs the more general network model. OK time for me to start reading in earnest.
    
    Reply
John Baez says:

19 November, 2014 at 11:36 pm

Here is your 15 minutes of fame, Blake:

This shows the number of people viewing this site per hour for the last two days. Your post was noted on Hacker News a little while ago.

Reply
- Dr. P says:
  
  20 November, 2014 at 6:55 pm
  
  That’s a nice spike!
  
  Reply
WebHubTelescope says:

20 November, 2014 at 12:26 am

Markov diagrams are often used in modeling continuous failure rates in systems. Every transition points in an outward direction towards a failure, or greater entropy. Yet those same diagrams can be used to model maintenance, whereby the part is repaired and the arrow points in the other direction. That is one way of looking at it — that any closing of a loop regains order in the system.

Just a suggestion for an application!

Reply
Curious says:

21 November, 2014 at 6:24 am

Can you motivate why it is okay to think of the KL divergence as being relevant for a 2nd law analogy? It seems like the more natural comparison would have been with the entropy, and in that case, the entropy of the stationary distribution need not be greater than the original entropy.

Reply
- John Baez says:
  
  21 November, 2014 at 7:26 pm
  
  Up to some minor fudge factors, the entropy of a probability distribution $p$ is the Kullback-Leibler divergence of $p$ relative to the uniform distribution. Every Markov process with a finite set of states has at least one equilibium distribution. For Markov processes where the equilibrium distribution is the uniform distribution, entropy increases. This is the simplest version of the Second Law. Equivalently, the Kullback-Leibler divergence of $p(t)$ relative to the uniform distribution decreases.
  
  For Markov processes where the equilibrium distribution is some other distribution $q,$ the Kullback-Leibler divergence of $p(t)$ relative to $q$ decreases. This is the simplest version of the Second Law that applies to all Markov processes with a finite set of states.
  
  But in fact there’s a more powerful generalization, which is what Blake uses. For any Markov process, if we take two probability distributions $p$ and $q$ and evolve them forward in time, the Kullback-Leibler divergence of $p(t)$ relative to $q(t)$ decreases!
  
  This subsumes all the results I mentioned previously. It has the great advantage that we don’t need to know an equilibrium distribution to apply it. This, I claim, is the right way to think about the Second Law for Markov processes.
  
  Reply
  - Curious says:
    
    22 November, 2014 at 5:59 am
    
    Call me convinced.
    
    Reply
francesco says:

15 December, 2014 at 2:57 am

Very nice post! Landed here after googling “markov second law.” Not sure whether I am going to contravene any blogosphere etiquette now (in which case, I apologise in advance), but I would like to point out that a sort of information-theoretic “second-law–like” converse also hold, i.e., that any process that makes any ensemble of initial distributions less and less distinguishable over time is necessarily Markovian? I would be happy to discuss this further, in case you are interested.

Reply
- John Baez says:
  
  15 December, 2014 at 4:10 am
  
  Francesco wrote:
  
  I would like to point out that a sort of information-theoretic “second-law–like” converse also hold, i.e., that any process that makes any ensemble of initial distributions less and less distinguishable over time is necessarily Markovian?
  
  The question mark at the end confuses me. Are you pointing out a fact, or asking a question?
  
  Reply
  - francesco says:
    
    15 December, 2014 at 7:50 am
    
    Sorry, that’s meant to be a fact. :)
    (The question mark was left behind after reformulating my comment.)
    
    Reply
zhangwfjh says:

27 March, 2015 at 8:07 am

Very interesting post. I have a question. Why do we consider relative entropy for Markov processes rather than entropy? As far as I know, the entropy doesn’t always increase along the time evolution of Markov process, instead relative entropy does. How to explain the differences?

Reply
- John Baez says:
  
  27 March, 2015 at 4:06 pm
  
  Entropy is maximized by the “flat” or “constant” probability distribution: the one with
  
  $p_i = p_j$
  
  for all $i, j.$ So, it’s not surprising that entropy increases for any Markov process for which the flat probability distribution is an equilibrium state. This happens if and only if time evolution is given by doubly stochastic matrices: matrices of nonnegative numbers where the entries in any column and in any row sum to 1.
  
  But for most Markov processes, time evolution is not doubly stochastic. For most Markov processes, the flat probability distribution is not an equilibrium.
  
  So, for most Markov processes we need to use relative entropy. If we evolve a probability distribution according to a Markov process, its entropy relative to any equilibrium distribution will increase. (Blake says it decreases, but that’s just because he’s leaving the minus sign out of the definition of relative entropy. It’s just a different convention.)
  
  This is the best way to generalize the 2nd law from doubly stochastic Markov processes to general Markov processes. I have argued that relative entropy is more fundamental than entropy, because how much information you gain when you learn something depends on what you already knew. Note that the ordinary entropy is — up to a constant factor and additive constant — the same as entropy relative to the flat probability distribution. So even ordinary entropy is relative entropy in disguise.
  
  But the really nice thing is that when we use relative entropy, we don’t need to worry about the equilibrium state! If $p(t)$ and $q(t)$ are two probability distributions evolving in time according to a Markov process, the relative entropy $S(p(t), q(t))$ always increases! The case where $q(t)$ is an equilibrium solution — constant in time — is just a special case.
  
  Reply
  - zhangwfjh says:
    
    28 March, 2015 at 5:59 am
    
    Got it. Thank you very much.
    
    Reply
Information Geometry (Part 16) | Azimuth says:

14 January, 2016 at 7:04 am

Lately we’ve been thinking about open Markov processes. These are random processes where something can hop randomly from one state to another (that’s the ‘Markov process’ part) but also enter or leave the system (that’s the ‘open’ part).

The ultimate goal is to understand the nonequilibrium thermodynamics of open systems—systems where energy and maybe matter flows in and out. If we could understand this well enough, we could understand in detail how life works. That’s a difficult job! But one has to start somewhere, and this is one place to start.

We have a few papers on this subject:

• Blake Pollard, A Second Law for open Markov processes. (Blog article here.)

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

• Blake Pollard, Open Markov processes: A compositional perspective on non-equilibrium steady states in biology. (Blog article here.)

Reply

You can use Markdown or HTML in your comments. You can also use LaTeX, like this: $latex E = m c^2 $. The word 'latex' comes right after the first dollar sign, with a space after it. Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

	John Baez on Agent-Based Models (Part …
	Grandpa D on Agent-Based Models (Part …
	Anton Sherwood on Well Temperaments (Part 3…
	John Baez on Well Temperaments (Part 3…
	Anton Sherwood on Well Temperaments (Part 3…
	John Baez on Hexagonal Tiling Honeycomb
	Victor V Albert on Hexagonal Tiling Honeycomb
	Matias Schrauf on Agent-Based Models (Part …
	John Tromp on The Busy Beaver Game
	Wolfgang on Protonium

Azimuth