Coarse-Graining Open Markov Processes

Kenny Courser and I have been working hard on this paper for months:

• John Baez and Kenny Courser, Coarse-graining open Markov processes.

It may be almost done. So, it would be great if people here could take a look and comment on it! It’s a cool mix of probability theory and double categories. I’ve posted a similar but non-isomorphic article on the n-Category Café, where people know a lot about double categories. But maybe some of you here know more about Markov processes!

‘Coarse-graining’ is a standard method of extracting a simple Markov process from a more complicated one by identifying states. We extend coarse-graining to open Markov processes. An ‘open’ Markov process is one where probability can flow in or out of certain states called ‘inputs’ and ‘outputs’. One can build up an ordinary Markov process from smaller open pieces in two basic ways:

• composition, where we identify the outputs of one open Markov process with the inputs of another,

and

• tensoring, where we set two open Markov processes side by side.

A while back, Brendan Fong, Blake Pollard and I showed that these constructions make open Markov processes into the morphisms of a symmetric monoidal category:

A compositional framework for Markov processes, Azimuth, January 12, 2016.

Here Kenny and I go further by constructing a symmetric monoidal double category where the 2-morphisms include ways of coarse-graining open Markov processes. We also extend the previously defined ‘black-boxing’ functor from the category of open Markov processes to this double category.

But before you dive into the paper, let me explain all this stuff a bit more….

Very roughly speaking, a ‘Markov process’ is a stochastic model describing a sequence of transitions between states in which the probability of a transition depends only on the current state. But the only Markov processes we’ll talk about here are continuous-time Markov processes with a finite set of states. These can be drawn as labeled graphs:

where the number labeling each edge describes the probability per time of making a transition from one state to another.

An ‘open’ Markov process is a generalization in which probability can also flow in or out of certain states designated as ‘inputs’ and ‘outputs’:

Open Markov processes can be seen as morphisms in a category, since we can compose two open Markov processes by identifying the outputs of the first with the inputs of the second. Composition lets us build a Markov process from smaller open parts—or conversely, analyze the behavior of a Markov process in terms of its parts.

In this paper, Kenny and I extend the study of open Markov processes to include coarse-graining. ‘Coarse-graining’ is a widely studied method of simplifying a Markov process by mapping its set of states X onto some smaller set X' in a manner that respects the dynamics. Here we introduce coarse-graining for open Markov processes. And we show how to extend this notion to the case of maps p: X \to X' that are not surjective, obtaining a general concept of morphism between open Markov processes.

Since open Markov processes are already morphisms in a category, it is natural to treat morphisms between them as morphisms between morphisms, or ‘2-morphisms’. We can do this using double categories!

Double categories were first introduced by Ehresmann around 1963. Since then, they’ve been used in topology and other branches of pure math—but more recently they’ve been used to study open dynamical systems and open discrete-time Markov chains. So, it should not be surprising that they are also useful for open Markov processes.

A 2-morphism in a double category looks like this:

While a mere category has only objects and morphisms, here we have a few more types of things. We call A, B, C and D ‘objects’, f and g ‘vertical 1-morphisms’, M and N ‘horizontal 1-cells’, and \alpha a ‘2-morphism’. We can compose vertical 1-morphisms to get new vertical 1-morphisms and compose horizontal 1-cells to get new horizontal 1-cells. We can compose the 2-morphisms in two ways: horizontally by setting squares side by side, and vertically by setting one on top of the other. The ‘interchange law’ relates vertical and horizontal composition of 2-morphisms.

In a ‘strict’ double category all these forms of composition are associative. In a ‘pseudo’ double category, horizontal 1-cells compose in a weakly associative manner: that is, the associative law holds only up to an invertible 2-morphism, the ‘associator’, which obeys a coherence law. All this is just a sketch; for details on strict and pseudo double categories try the paper by Grandis and Paré.

Kenny and I construct a double category \mathbb{M}\mathbf{ark} with:

  1. finite sets as objects,
  2. maps between finite sets as vertical 1-morphisms,
  3. open Markov processes as horizontal 1-cells,
  4. morphisms between open Markov processes as 2-morphisms.

I won’t give the definition of item 4 here; you gotta read our paper for that! Composition of open Markov processes is only weakly associative, so \mathbb{M}\mathbf{ark} is a pseudo double category.

This is how our paper goes. In Section 2 we define open Markov processes and steady state solutions of the open master equation. In Section 3 we introduce coarse-graining first for Markov processes and then open Markov processes. In Section 4 we construct the double category \mathbb{M}\mathbf{ark} described above. We prove this is a symmetric monoidal double category in the sense defined by Mike Shulman. This captures the fact that we can not only compose open Markov processes but also ‘tensor’ them by setting them side by side.

For example, if we compose this open Markov process:

with the one I showed you before:

we get this open Markov process:

But if we tensor them, we get this:

As compared with an ordinary Markov process, the key new feature of an open Markov process is that probability can flow in or out. To describe this we need a generalization of the usual master equation for Markov processes, called the ‘open master equation’.

This is something that Brendan, Blake and I came up with earlier. In this equation, the probabilities at input and output states are arbitrary specified functions of time, while the probabilities at other states obey the usual master equation. As a result, the probabilities are not necessarily normalized. We interpret this by saying probability can flow either in or out at both the input and the output states.

If we fix constant probabilities at the inputs and outputs, there typically exist solutions of the open master equation with these boundary conditions that are constant as a function of time. These are called ‘steady states’. Often these are nonequilibrium steady states, meaning that there is a nonzero net flow of probabilities at the inputs and outputs. For example, probability can flow through an open Markov process at a constant rate in a nonequilibrium steady state. It’s like a bathtub where water is flowing in from the faucet, and flowing out of the drain, but the level of the water isn’t changing.
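Here’s a quick numerical sketch of this idea. Everything in it (the 3-state process, its rates, and the boundary probabilities) is made up for illustration, not taken from the paper. We hold the probabilities fixed at the input and output states and solve for the internal probabilities that make the master equation vanish there:

```python
import numpy as np

# A hypothetical 3-state open Markov process: state 0 is an input,
# state 2 is an output, state 1 is internal. H is its Hamiltonian
# (columns sum to zero, off-diagonal entries are the rates).
H = np.array([[-2.0,  1.0,  0.0],
              [ 2.0, -3.0,  1.0],
              [ 0.0,  2.0, -1.0]])

boundary = [0, 2]               # input and output: probabilities held fixed
internal = [1]
p_boundary = np.array([0.5, 0.1])

# In steady state the master equation gives (H p)_i = 0 at internal states,
# i.e. H[int, int] p_int = -H[int, bnd] p_bnd, a small linear system:
A = H[np.ix_(internal, internal)]
b = -H[np.ix_(internal, boundary)] @ p_boundary
p_internal = np.linalg.solve(A, b)

p = np.zeros(3)
p[boundary] = p_boundary
p[internal] = p_internal

# The net flows at the boundary states are (H p) restricted to those states;
# if these are nonzero we have a *nonequilibrium* steady state.
flows = (H @ p)[boundary]
```

In this toy example the flow out at one boundary state exactly balances the flow in at the other, so probability streams through the process at a constant rate: the bathtub picture above.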

Brendan, Blake and I studied the relation between probabilities and flows at the inputs and outputs that holds in steady state. We called the process of extracting this relation from an open Markov process ‘black-boxing’, since it gives a way to forget the internal workings of an open system and remember only its externally observable behavior. We showed that black-boxing is compatible with composition and tensoring. In other words, we showed that black-boxing is a symmetric monoidal functor.

In Section 5 of our new paper, Kenny and I show that black-boxing is compatible with morphisms between open Markov processes. To make this idea precise, we prove that black-boxing gives a map from the double category \mathbb{M}\mathbf{ark} to another double category, called \mathbb{L}\mathbf{inRel}, which has:

  1. finite-dimensional real vector spaces U,V,W,\dots as objects,
  2. linear maps f : V \to W as vertical 1-morphisms from V to W,
  3. linear relations R \subseteq V \oplus W as horizontal 1-cells from V to W,
  4. squares

    obeying (f \oplus g)R \subseteq S as 2-morphisms.

Here a ‘linear relation’ from a vector space V to a vector space W is a linear subspace R \subseteq V \oplus W. Linear relations can be composed in the usual way we compose relations. The double category \mathbb{L}\mathbf{inRel} becomes symmetric monoidal using direct sum as the tensor product, but unlike \mathbb{M}\mathbf{ark} it is a strict double category: that is, composition of linear relations is associative.
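For concreteness, here is one way to compose linear relations numerically. This is a hedged sketch: representing a relation as the column span of a matrix, and the function `compose`, are my own illustration, not anything from the paper. A pair (v,u) lies in the composite exactly when some w has (v,w) in the first relation and (w,u) in the second, which turns into a null-space computation:

```python
import numpy as np
from scipy.linalg import null_space

def compose(R, S, dims):
    """Compose linear relations given as column spans.
    R's columns span a subspace of V (+) W, S's span a subspace of W (+) U;
    dims = (dim V, dim W, dim U). Returns columns spanning the composite."""
    dV, dW, dU = dims
    R_V, R_W = R[:dV], R[dV:]          # split each generator into V and W parts
    S_W, S_U = S[:dW], S[dW:]
    # (v,u) is in the composite iff some coefficient vectors a, b
    # satisfy R_W a = S_W b:
    N = null_space(np.hstack([R_W, -S_W]))
    a, b = N[:R.shape[1]], N[R.shape[1]:]
    return np.vstack([R_V @ a, S_U @ b])

# Sanity check: composing the graphs of v -> 2v and w -> 3w
# gives the graph of v -> 6v.
R = np.array([[1.0], [2.0]])   # spans {(v, 2v)}
S = np.array([[1.0], [3.0]])   # spans {(w, 3w)}
C = compose(R, S, (1, 1, 1))
assert np.isclose(C[1, 0] / C[0, 0], 6.0)
```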

Our main result, Theorem 5.5, says that black-boxing gives a symmetric monoidal double functor

\blacksquare : \mathbb{M}\mathbf{ark} \to \mathbb{L}\mathbf{inRel}

As you’ll see if you check out our paper, there’s a lot of nontrivial content hidden in this short statement! The proof requires a lot of linear algebra and also a reasonable amount of category theory. For example, we needed this fact: if you’ve got a commutative cube in the category of finite sets:

and the top and bottom faces are pushouts, and the two left-most faces are pullbacks, and the two left-most arrows on the bottom face are monic, then the two right-most faces are pullbacks. I think it’s cool that this is relevant to Markov processes!

Finally, in Section 6 we state a conjecture. First we use a technique invented by Mike Shulman to construct symmetric monoidal bicategories \mathbf{Mark} and \mathbf{LinRel} from the symmetric monoidal double categories \mathbb{M}\mathbf{ark} and \mathbb{L}\mathbf{inRel}. We conjecture that our black-boxing double functor determines a functor between these symmetric monoidal bicategories. This has got to be true. However, double categories seem to be a simpler framework for coarse-graining open Markov processes.

Finally, let me talk a bit about some related work. As I already mentioned, Brendan, Blake and I constructed a symmetric monoidal category where the morphisms are open Markov processes. However, we formalized such Markov processes in a slightly different way than Kenny and I do now. We defined a Markov process to be one of the pictures I’ve been showing you: a directed multigraph where each edge is assigned a positive number called its ‘rate constant’. In other words, we defined it to be a diagram

where X is a finite set of vertices or ‘states’, E is a finite set of edges or ‘transitions’ between states, the functions s,t : E \to X give the source and target of each edge, and r : E \to (0,\infty) gives the rate constant for each transition. We explained how from this data one can extract a matrix of real numbers (H_{i j})_{i,j \in X} called the ‘Hamiltonian’ of the Markov process, with two properties that are familiar in this game:

H_{i j} \geq 0 if i \neq j,

\sum_{i \in X} H_{i j} = 0 for all j \in X.

A matrix with these properties is called ‘infinitesimal stochastic’, since these conditions are equivalent to \exp(t H) being stochastic for all t \ge 0.
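In code, extracting the Hamiltonian from such a diagram is a simple bookkeeping exercise. Here's a sketch; the helper function and the example states and rates are invented for illustration:

```python
import numpy as np

def hamiltonian(states, edges):
    """Build the Hamiltonian of a continuous-time Markov process from a
    list of (source, target, rate) transitions. Convention as in the post:
    H[i, j] for i != j is the rate of flow into i from j, and each diagonal
    entry is chosen so the column sums to zero."""
    idx = {x: k for k, x in enumerate(states)}
    H = np.zeros((len(states), len(states)))
    for s, t, r in edges:
        H[idx[t], idx[s]] += r   # off-diagonal: rate of the transition s -> t
        H[idx[s], idx[s]] -= r   # diagonal: total outflow from s
    return H

H = hamiltonian(['x', 'y', 'z'],
                [('x', 'y', 2.0), ('y', 'z', 1.0), ('z', 'x', 3.0)])

# H is 'infinitesimal stochastic': nonnegative off-diagonal, columns sum to 0.
assert (H - np.diag(np.diag(H)) >= 0).all()
assert np.allclose(H.sum(axis=0), 0)
```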

In our new paper, Kenny and I skip the directed multigraphs and work directly with the Hamiltonians! In other words, we define a Markov process to be a finite set X together with an infinitesimal stochastic matrix (H_{ij})_{i,j \in X}. This allows us to work more directly with the Hamiltonian and the all-important ‘master equation’

\displaystyle{    \frac{d p(t)}{d t} = H p(t)  }

which describes the evolution of a time-dependent probability distribution

p(t) : X \to \mathbb{R}
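Here’s a small numerical check of the master equation, using a made-up 2-state Hamiltonian and SciPy’s matrix exponential: the solution is p(t) = \exp(tH)\, p(0), total probability is conserved, and p(t) converges to a steady state.

```python
import numpy as np
from scipy.linalg import expm

# A toy 2-state Markov process: rate 2 from state 0 to 1, rate 1 back.
H = np.array([[-2.0,  1.0],
              [ 2.0, -1.0]])
p0 = np.array([1.0, 0.0])

# The master equation dp/dt = H p is solved by p(t) = exp(tH) p(0).
for t in [0.0, 0.5, 5.0]:
    p = expm(t * H) @ p0
    assert np.isclose(p.sum(), 1.0)   # exp(tH) is stochastic: probability conserved

# As t grows, p(t) approaches the steady state with H p = 0: here (1/3, 2/3).
p_inf = expm(100.0 * H) @ p0
assert np.allclose(p_inf, [1/3, 2/3])
```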

Clerc, Humphrey and Panangaden have constructed a bicategory with finite sets as objects, ‘open discrete labeled Markov processes’ as morphisms, and ‘simulations’ as 2-morphisms. They use the word ‘open’ in a pretty similar way to me. But their open discrete labeled Markov processes are also equipped with a set of ‘actions’ which represent interactions between the Markov process and the environment, such as an outside entity acting on a stochastic system. A ‘simulation’ is then a function between the state spaces that maps the inputs, outputs and set of actions of one open discrete labeled Markov process to the inputs, outputs and set of actions of another.

Another compositional framework for Markov processes was discussed by de Francesco Albasini, Sabadini and Walters. They constructed an algebra of ‘Markov automata’. A Markov automaton is a family of matrices with non-negative real coefficients indexed by elements of a binary product of sets, where one set represents the ‘signals on the left interface’ of the Markov automaton and the other the signals on the right interface.

So, double categories are gradually invading the theory of Markov processes… as part of the bigger trend toward applied category theory. They’re natural things; scientists should use them.

23 Responses to Coarse-Graining Open Markov Processes

  1. Kram1032 says:

    Just a heads up: That link to your paper currently yields a 404

  2. Jelle says:

    This is very cool John, but the paper link is broken

  3. ron davison says:

    And Engineers too!
    Great intro for non math linguistically trained individuals.
    I only have to go look up a few new definitions from math to fully understand. This seems to be relevant to lots of engineering in existence now. A wireless channel over time with intended and unintended inputs could only activate the sparse framework needed in the moment and turn off branches that are not needed in the moment. Once the inputs detect needed state machine subcategories, they can be activated to allow optimization in the moment. A moving car cell signal as it goes from one cell to another is another example, each cell tower and cell phone could have this sparse in the moment, yet robust and full featured as needed.
    This would maximize efficiency with little hit to performance.
    Self driving cars are another, with the constant identification, significant power draw, and non-negotiable requirements.
    On a country road the computation needed is much less than in a city rush hour comparison.
    Mark-on with the Markov multi category progress.
    Captain Obvious here…Has anyone have more than two Markov categories in a triple or higher #?
    Is it just a matter of extension if yes?

  4. Ammar Husain says:

    Started this yesterday after this post. https://github.com/Cobord/OpenMarkovTensorFlow. Will still need to implement composition of 1-morphisms and the 2-morphisms.

  5. John Baez says:

    First Kenny and I wrote a paper about this stuff:

    http://math.ucr.edu/home/baez/coarse_graining.pdf

    Then I coarse-grained that paper, summarizing it in a blog post:

    https://golem.ph.utexas.edu/category/2018/03/coarsegraining_open_markov_pro.html

    Then I coarse-grained that and wrote this Google+ post:

    https://tinyurl.com/baez-coarse

    And then I coarse-grained that and wrote this tweet:

    If I coarse-grain it any more it’ll shrink to a single period.

    • nad says:

      I don’t have the time to understand all this category theory, so I tried to make sense of what you are doing here by guessing from the example what you could have meant (this is in general usually faster). But this is of course also error prone, so one question came up:
Is it correct that if you would have taken the same s as on page 11 in your pdf then there would be a 7 instead of an 8 as a label between the nodes w and y in the second diagram in the image above (i.e. the twitter image)?

      • John Baez says:

        Sorry, I don’t know what “the same s as on page 11” means. I don’t see anything called s there. There’s something called S, which is the set of inputs.

        • nad says:

          With s I mean the very first letter in the text: s: \mathbb{R}^{X'} \rightarrow \mathbb{R}^{X} i.e. the alleged “stochastic section” of the “function” p. I am not sure whether I derived the correct infinitesimal matrix H from your diagram above, but in contrast to the H in the paper (p.10) the corresponding “lumped” H’ of the diagram would depend on the choice of the probabilities (or weights?) in s. I derive H’ by multiplying H with s (where I actually replaced 1/3 with P and 2/3 with (1-P) in order to understand the dependencies on the probabilities P better) and then add row 3 and 4 of H.

        • nad says:

          sorry a typo:
          and then add row 3 and 4 of H \rightarrow and then add row 3 and 4 of Hp.

        • nad says:

          sorry again typo:
          and then add row 3 and 4 of H \rightarrow and then add row 3 and 4 of Hs.
          Adding the rows is what I guessed could be the action of p_*

        • John Baez says:

          I’m having trouble finding time to look at the paper and do the (easy) calculations needed to check your hypotheses. But anyway, given finite sets X and Y, a map p : X \to Y, and a vector v \in \mathbb{R}^X, we have

          \displaystyle{ (p_* v)_i = \sum_{j \in X \textrm{ such that } p(j) = i} v_j }

          for any i \in Y. So, coarse-graining the Hamiltonian of a Markov process involves some summing.
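In code, this pushforward is just a sum over the fibers of p (a quick illustrative sketch, not from the paper):

```python
import numpy as np

def pushforward(p, v, Y_size):
    """Compute p_* v: sum the entries of v over each fiber of the map p,
    where p is given as a list sending each index j in X to p[j] in Y."""
    out = np.zeros(Y_size)
    for j, vj in enumerate(v):
        out[p[j]] += vj
    return out

# e.g. p sends states 0, 1 of X to state 0 of Y, and state 2 to state 1:
v = np.array([0.2, 0.3, 0.5])
print(pushforward([0, 0, 1], v, 2))   # [0.5, 0.5]
```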

        • Coarse-graining is an approximation as it reduces dimensionality in the flow diagram at the expense of fidelity.

          If the rate of 4 that describes the w2 to w1 transition was increased to a very large number, the coarse-grain solution would result in the w to y transition asymptotically converging to 9. This is a basic form of order-reduction for stiff problems.

          https://en.wikipedia.org/wiki/Model_order_reduction

        • nad says:

          @John
          OK thanks. Now I am more or less sure that I understand what you mean with the notation p_*. By the way is Kenny reading the comments to this blog post?
          As said I might have a different approach to derive a Hamiltonian from a diagram, but with the approach I sofar used, the Hamiltonian as derived with my approach would not be lumpable:
          H = \left( \begin{array}{ccccc} -5&0&0&0&0\\ 5&-11&0&0&0\\ 0&8&-9&4&0\\ 0&3&0&-10&0\\ 0&0&9&6&0 \end{array}\right)

          @Paul
I don’t see what this has to do with stiffness problems, but if 4 gets big then with the above Hamiltonian the w to y transition would converge to 9 only for a specific section.

        • John Baez says:

          I doubt Kenny is reading the comments here. I’ll tell him about these comments.

      • Kenny says:

        Hi Nad,

        The image in the tweet was taken from back when we were originally considering more general coarse-grainings of Markov processes. Later on, for category theoretic reasons, we decided to only work with Markov processes that were lumpable, since lumpability is precisely the condition needed for the Hamiltonian of the new Markov process to be independent of the choice of stochastic section s. As you noticed, the image in the tweet isn’t lumpable (at least not in the manner that is being visually suggested) since the rates of the edges going from both of the states w1 and w2 to the state y are not the same. They are in the example in the paper.

        It’s also perhaps worth noting that the edge with rate 4 going from the state w2 to the state w1 plays no role in the lumped Markov process, which intuitively makes sense: anything in either of the states w1 or w2 will transition to state y at the same rate in the case where the original Markov process is lumpable, and transitioning between states w1 and w2 will not affect this.

        • nad says:

          Hello Kenny

          thanks for replying. You wrote:

          Later on, for category theoretic reasons, we decided to only work with Markov processes that were lumpable, since lumpability is precisely the condition needed for the Hamiltonian of the new Markov process to be independent of the choice of stochastic section s.

          Frankly I have problems to see why this is a category theoretic question. In particular I haven’t sofar understood what this lumpability is needed for. I could have imagined e.g. that an internal lumpability might for example not change the relations between input and outputs of a Markov network, but I haven’t found a remark similar to that.

          I searched the text for all occasions of the word “lump”, but apart from the reference it didn’t appear in the text after theorem 3.10.

        • John Baez says:

          Nad wrote:

          I could have imagined e.g. that an internal lumpability might for example not change the relations between input and outputs of a Markov network, but I haven’t found a remark similar to that.

          It sounds like you’re talking about Lemma 5.3, which says that coarse-graining is compatible with black-boxing. You’ll note that the proof uses the equation

          p_* H = H' p_*

          In our paper we define ‘lumpability’ in Definition 3.4. Theorem 3.10 gives 3 equivalent conditions for lumpability. The best one is condition (ii):

          p_* H = H' p_*

          This is clearly equivalent to

          p_* \exp(tH) = \exp(tH') p_*  \textrm{ for all } t \in [0,\infty)

          This condition says that two things are the same:

          1) evolving a state in time using our original Markov process with Hamiltonian H and then coarse-graining it with p_*,

          2) coarse-graining a state with p_* and then evolving it in time using the coarse-grained Markov process with Hamiltonian H'.

          Since this is so nice, we build the equation p_* H = H' p_* into the definition of ‘morphism between open Markov processes’ in Definition 3.4. In the rest of the paper we study these morphisms and never mention the word ‘lump’.

          We need the condition p_* H = H' p_* to prove most of the interesting results in the paper. For example, Lemma 5.3.

          The literature on Markov processes talks about lumpability, but they usually use the equivalent condition (iii) as their definition of lumpability.
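Here’s a small numerical illustration of condition (ii) and its equivalence with the exponentiated version. The particular 3-state Hamiltonian and the map p below are made up, but chosen to be lumpable:

```python
import numpy as np
from scipy.linalg import expm

# p sends states 0 -> 0 and 1, 2 -> 1. P is the matrix of p_*
# (it sums coordinates over each fiber of p).
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])

# An infinitesimal stochastic H that is lumpable along p:
H = np.array([[-5.0,  1.0,  1.0],
              [ 2.0, -5.0,  6.0],
              [ 3.0,  4.0, -7.0]])

# The coarse-grained Hamiltonian H' on the two lumped states:
Hp = np.array([[-5.0,  1.0],
               [ 5.0, -1.0]])

# Condition (ii): p_* H = H' p_* ...
assert np.allclose(P @ H, Hp @ P)

# ... which is equivalent to p_* exp(tH) = exp(tH') p_* for all t >= 0:
for t in [0.1, 1.0, 10.0]:
    assert np.allclose(P @ expm(t * H), expm(t * Hp) @ P)
```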

        • nad says:

          John wrote:

          It sounds like you’re talking about Lemma 5.3, which says that coarse-graining is compatible with black-boxing.

          Well, as said I was looking for a remark like that, but I didn’t and unfortunately still don’t see that “coarse-graining is compatible with black-boxing” is explained in Lemma 5.3:

          Lemma 5.3: Given a 2-morphism [DIAGRAM] in Mark, there exists a unique 2-morphism [DIAGRAM] in LinRel.

          That is even if you tell me that “build the equation p_* H = H' p_* into the definition of ‘morphism between open Markov processes” I don’t see why Lemma 5.3. says “coarse-graining is compatible with black-boxing” and in particular what you mean with “compatible.” This seems to be hidden in those cryptic diagrams. But anyways I got now a rough idea of what you were doing here.

        • John Baez says:

          Thanks for being so publicly confused about our paper; having worked on it for a year the meaning of the formalism has become obvious to Kenny and me, but you’re reminding me that a few sentences here and there could help explain its meaning.

          Lemma 5.3 is about what happens when we apply the 2-functor \blacksquare, which is black-boxing, to 2-morphisms in Mark, which are morphisms between open Markov processes. The simplest morphisms between open Markov processes are coarse-grainings. So, the Lemma says what happens when we black-box a coarse-graining.

          The simplest case is when the maps f and g in the diagram are identity functions. This means that we’re leaving the input and output states alone. In this case, Lemma 5.3 says that the relation between input and output probabilities and flows in steady state are the same for the original open Markov process and the coarse-grained open Markov process.

          I thought that might be what you were trying to say here:

          I could have imagined e.g. that an internal lumpability might for example not change the relations between input and outputs of a Markov network.

          This is true if by “the relations between inputs and outputs” you mean “the relation between input and output probabilities and flows in steady state”. This relation is what black-boxing gives us.

          The double category formalism, with its supposedly cryptic diagrams, is just a way of drawing open Markov processes and the morphisms between them, e.g. coarse-grainings. With this style of picture, the coarse-grainings look like rectangles. The basic operations on coarse-grainings look like ways of sticking together rectangles to get bigger rectangles. The rules these operations obey turn into intuitive rules about sticking together rectangles.

          That’s the whole point of double categories. Double categories are “the algebra of rectangles”, just as categories are “the algebra of arrows”.

          We don’t explain this, because this paper was written for people who are comfortable with double categories. Explaining double categories would double the length of this paper.

          I said morphisms between open Markov processes look like rectangles. But the simplest morphisms between open Markov processes are coarse-grainings, so let me talk just about coarse-grainings. The two main operations on these are:

          1) We can compose coarse-grainings of open Markov processes “horizontally” by attaching the outputs of one to the inputs of another.

          2) We can compose coarse-grainings of open Markov processes “vertically” by doing first one and then another: coarse-graining and then coarse-graining some more.

We can stick together two rectangles either horizontally or vertically, and that’s how we draw operations 1) and 2). Some nontrivial laws governing these operations are then drawn as obvious-looking laws about sticking rectangles together.
