Fisher’s Fundamental Theorem (Part 3)

8 October, 2020

Last time we stated and proved a simple version of Fisher’s fundamental theorem of natural selection, which says that under some conditions, the rate of increase of the mean fitness equals the variance of the fitness. But the conditions we gave were very restrictive: namely, that the fitness of each species of replicator is constant, not depending on how many of these replicators there are, or any other replicators.

To broaden the scope of Fisher’s fundamental theorem we need to do one of two things:

1) change the left side of the equation: talk about some other quantity other than rate of change of mean fitness.

2) change the right side of the question: talk about some other quantity than the variance in fitness.

Or we could do both! People have spent a lot of time generalizing Fisher’s fundamental theorem. I don’t think there are, or should be, any hard rules on what counts as a generalization.

But today we’ll take alternative 1). We’ll show the square of something called the ‘Fisher speed’ always equals the variance in fitness. One nice thing about this result is that we can drop the restrictive condition I mentioned. Another nice thing is that the Fisher speed is a concept from information theory! It’s defined using the Fisher metric on the space of probability distributions.

And yes—that metric is named after the same guy who proved Fisher’s fundamental theorem! So, arguably, Fisher should have proved this generalization of Fisher’s fundamental theorem. But in fact it seems that I was the first to prove it, around February 1st, 2017. Some similar results were already known, and I will discuss those someday. But they’re a bit different.

A good way to think about the Fisher speed is that it’s ‘the rate at which information is being updated’. A population of replicators of different species gives a probability distribution. Like any probability distribution, this has information in it. As the populations of our replicators change, the Fisher speed says the rate at which this information is being updated. So, in simple terms, we’ll show

The square of the rate at which information is updated is equal to the variance in fitness.

This is quite a change from Fisher’s original idea, namely:

The rate of increase of mean fitness is equal to the variance in fitness.

But it has the advantage of always being true… as long the population dynamics are described by the general framework we introduced last time. So let me remind you of the general setup, and then prove the result!

The setup

We start out with population functions P_i \colon \mathbb{R} \to (0,\infty), one for each species of replicator i = 1,\dots,n, obeying the Lotka–Volterra equation

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) P_i }

for some differentiable functions f_i \colon (0,\infty) \to \mathbb{R} called fitness functions. The probability of a replicator being in the ith species is

\displaystyle{  p_i(t) = \frac{P_i(t)}{\sum_j P_j(t)} }

Using the Lotka–Volterra equation we showed last time that these probabilities obey the replicator equation

\displaystyle{ \dot{p}_i = \left( f_i(P) - \overline f(P) \right)  p_i }

Here P is short for the whole list of populations (P_1(t), \dots, P_n(t)), and

\displaystyle{ \overline f(P) = \sum_j f_j(P) p_j  }

is the mean fitness.

The Fisher metric

The space of probability distributions on the set \{1, \dots, n\} is called the (n-1)-simplex

\Delta^{n-1} = \{ (x_1, \dots, x_n) : \; x_i \ge 0, \; \displaystyle{ \sum_{i=1}^n x_i = 1 } \}

It’s called \Delta^{n-1} because it’s (n-1)-dimensional. When n = 3 it looks like the letter \Delta:

The Fisher metric is a Riemannian metric on the interior of the (n-1)-simplex. That is, given a point p in the interior of \Delta^{n-1} and two tangent vectors v,w at this point the Fisher metric gives a number

g(v,w) = \displaystyle{ \sum_{i=1}^n \frac{v_i w_i}{p_i}  }

Here we are describing the tangent vectors v,w as vectors in \mathbb{R}^n with the property that the sum of their components is zero: that’s what makes them tangent to the (n-1)-simplex. And we’re demanding that x be in the interior of the simplex to avoid dividing by zero, since on the boundary of the simplex we have p_i = 0 for at least one choice of $i.$

If we have a probability distribution p(t) moving around in the interior of the (n-1)-simplex as a function of time, its Fisher speed is

\displaystyle{ \sqrt{g(\dot{p}(t), \dot{p}(t))} = \sum_{i=1}^n \frac{\dot{p}_i(t)^2}{p_i(t)} }

if the derivative \dot{p}(t) exists. This is the usual formula for the speed of a curve moving in a Riemannian manifold, specialized to the case at hand.

Now we’ve got all the formulas we’ll need to prove the result we want. But for those who don’t already know and love it, it’s worthwhile saying a bit more about the Fisher metric.

The factor of 1/x_i in the Fisher metric changes the geometry of the simplex so that it becomes round, like a portion of a sphere:

But the reason the Fisher metric is important, I think, is its connection to relative information. Given two probability distributions p, q \in \Delta^{n-1}, the information of q relative to p is

\displaystyle{ I(q,p) = \sum_{i = 1}^n q_i \ln\left(\frac{q_i}{p_i}\right)   }

You can show this is the expected amount of information gained if p was your prior distribution and you receive information that causes you to update your prior to q. So, sometimes it’s called the information gain. It’s also called relative entropy or—my least favorite, since it sounds so mysterious—the Kullback–Leibler divergence.

Suppose p(t) is a smooth curve in the interior of the (n-1)-simplex. We can ask the rate at which information is gained as time passes. Perhaps surprisingly, a calculation gives

\displaystyle{ \frac{d}{dt} I(p(t), p(t_0))\Big|_{t = t_0} = 0 }

That is, in some sense ‘to first order’ no information is being gained at any moment t_0 \in \mathbb{R}. However, we have

\displaystyle{  \frac{d^2}{dt^2} I(p(t), p(t_0))\Big|_{t = t_0} =  g(\dot{p}(t_0), \dot{p}(t_0))}

So, the square of the Fisher speed has a nice interpretation in terms of relative entropy!

For a derivation of these last two equations, see Part 7 of my posts on information geometry. For more on the meaning of relative entropy, see Part 6.

The result

It’s now extremely easy to show what we want, but let me state it formally so all the assumptions are crystal clear.

Theorem. Suppose the functions P_i \colon \mathbb{R} \to (0,\infty) obey the Lotka–Volterra equations:

\displaystyle{ \dot P_i = f_i(P) P_i}

for some differentiable functions f_i \colon (0,\infty)^n \to \mathbb{R} called fitness functions. Define probabilities and the mean fitness as above, and define the variance of the fitness by

\displaystyle{ \mathrm{Var}(f(P)) =  \sum_j ( f_j(P) - \overline f(P))^2 \, p_j }

Then if none of the populations P_i are zero, the square of the Fisher speed of the probability distribution p(t) = (p_1(t), \dots , p_n(t)) is the variance of the fitness:

g(\dot{p}, \dot{p})  = \mathrm{Var}(f(P))

Proof. The proof is near-instantaneous. We take the square of the Fisher speed:

\displaystyle{ g(\dot{p}, \dot{p}) = \sum_{i=1}^n \frac{\dot{p}_i(t)^2}{p_i(t)} }

and plug in the replicator equation:

\displaystyle{ \dot{p}_i = (f_i(P) - \overline f(P)) p_i }

We obtain:

\begin{array}{ccl} \displaystyle{ g(\dot{p}, \dot{p})} &=& \displaystyle{ \sum_{i=1}^n \left( f_i(P) - \overline f(P) \right)^2 p_i } \\ \\ &=& \mathrm{Var}(f(P)) \end{array}

as desired.   █

It’s hard to imagine anything simpler than this. We see that given the Lotka–Volterra equation, what causes information to be updated is nothing more and nothing less than variance in fitness! But there are other variants of Fisher’s fundamental theorem worth discussing, so I’ll talk about those in future posts.


Markov Decision Processes

6 October, 2020

The National Institute of Mathematical and Biological Sciences is having an online seminar on ‘adaptive management’. It should be fun for people who want to understand Markov decision processes—like me!

NIMBioS Adaptive Management Webinar Series, 2020 October 26-29 (Monday-Thursday).

Adaptive management seeks to determine sound management strategies in the face of uncertainty concerning the behavior of the system being managed. Specifically, it attempts to find strategies for managing dynamic systems while learning the behavior of the system. These webinars review the key concept of a Markov Decision Process (MDP) and demonstrate how quantitative adaptive management strategies can be developed using MDPs. Additional conceptual, computational and application aspects will be discussed, including dynamic programming and Bayesian formalization of learning.

Here are the topics:

Session 1: Introduction to decision problems
Session 2: Introduction to Markov decision processes (MDPs)
Session 3: Solving Markov decision processes (MDPs)
Session 4: Modeling beliefs
Session 5: Conjugacy and discrete model adaptive management (AM)
Session 6: More on AM problems (Dirichlet/multinomial and Gaussian prior/likelihood)
Session 7: Partially observable Markov decision processes (POMDPs)
Session 8: Frontier topics (projection methods, approximate DP, communicating solutions)

 


Fock Space Techniques for Stochastic Physics

2 October, 2020

I’ve been fascinated for a long time about the relation between classical probability theory and quantum mechanics. This story took a strange new turn when people discovered that stochastic Petri nets, good for describing classical probabilistic models of interacting entities, can also be described using ideas from the quantum field theory!

I’ll be talking about this at the online category theory seminar at UNAM, the National Autonomous University of Mexico, on Wednesday October 7th at 18:00 UTC (11 am Pacific Time):

Fock space techniques for stochastic physics

Abstract. Some ideas from quantum theory are beginning to percolate back to classical probability theory. For example, the master equation for a chemical reaction network—also known as a stochastic Petri net—describes particle interactions in a stochastic rather than quantum way. If we look at this equation from the perspective of quantum theory, this formalism turns out to involve creation and annihilation operators, coherent states and other well-known ideas—but with a few big differences.

You can watch the talk here:

You can also see the slides of this talk. Click on any picture in the slides, or any text in blue, and get more information!

My students Joe Moeller and Jade Master will also be giving talks in this seminar—on Petri nets and structured cospans.

 


The Brownian Map

19 September, 2020

\phantom{x}

Nina Holden won the 2021 Maryam Mirzakhani New Frontiers Prize for her work on random surfaces and the mathematics of quantum gravity. I’d like to tell you what she did… but I’m so far behind I’ll just explain a bit of the background.

Suppose you randomly choose a triangulation of the sphere with n triangles. This is a purely combinatorial thing, but you can think of it as a metric space if each of the triangles is equilateral with all sides of length 1.

This is a distorted picture of what you might get, drawn by Jérémie Bettinelli:


The triangles are not drawn as equilateral, so we can fit this shape into 3d space. Visit Bettinelli’s page for images that you can rotate:

• Jérémie Bettinelli, Computer simulations of random maps.

I’ve described how to build a random space out of n triangles. In the limit n \to \infty, if you rescale the resulting space by a factor of n^{-1/4} so it doesn’t get bigger and bigger, it converges to a ‘random metric space’ with fascinating properties. It’s called the Brownian map.

This random metric space is on average so wrinkly and crinkly that ‘almost surely’—that is, with probability 1—its Hausdorff dimension is not 2 but 4. And yet it is almost surely homeomorphic to a sphere!

Rigorously proving this is hard: a mix of combinatorics, probability theory and geometry.

Ideas from physics are also important here. There’s a theory called
Liouville quantum gravity’ that describes these random 2-dimensional surfaces. So, physicists have ways of—nonrigorously—figuring out answers to some questions faster than the mathematicians!

A key step in understanding the Brownian map was this paper from 2013:

• Jean-François Le Gall, Uniqueness and universality of the Brownian map, Annals of Probability 41 (2013), 2880–2960.



The Brownian map is to surfaces what Brownian motion is to curves. For example, the Hausdorff dimension of Brownian motion is almost surely 2: twice the dimension of a smooth curve. For the Brownian map it’s almost surely 4, twice the dimension of a smooth surface.

Let me just say one more technical thing. There’s a ‘space of all compact metric spaces’, and the Brownian map is actually a probability measure on this space! It’s called the Gromov-Hausdorff space, and it itself is a metric space… but not compact. (So no, we don’t have a set containing itself as an element.)


There’s a lot more to say about this… but I haven’t gotten very close to understanding Nina Holden’s work yet. She wrote a 7-paper series leading up to this one:

• Nina Holden and Xin Sun, Convergence of uniform triangulations under the Cardy embedding.

They show that random triangulations of a disk can be chosen to a random metric on the disk which can also be obtained from Liouville quantum gravity.

This is a much easier place to start learning this general subject:

• Ewain Gwynne, Random surfaces and Liouville quantum gravity.

One reason I find all this interesting is that when I worked on ‘spin foam models’ of quantum gravity, we were trying to develop combinatorial theories of spacetime that had nice limits as the number of discrete building blocks approached infinity. We were studying theories much more complicated than random 2-dimensional triangulations, and it quickly became clear to me how much work it would be to carefully analyze these. So it’s interesting to see how mathematicians have entered this subject—starting with a much more tractable class of theories, which are already quite hard.

While the theory I just described gives random metric spaces whose Hausdorff dimension is twice their topological dimension, Liouville quantum gravity actually contains an adjustable parameter that lets you force these metric spaces to become less wrinkled, with lower Hausdorff dimension. Taming the behavior of random triangulations gets harder in higher dimensions. Renate Loll, Jan Ambjørn and others have argued that we need to work with Lorentzian rather than Riemannian geometries to get physically reasonable behavior. This approach to quantum gravity is called causal dynamical triangulations.


Open Markov Processes

4 July, 2020

I gave a talk on some work I did with Kenny Courser. You can see slides of the talk, and also a video and the papers it’s based on:

Coarse-graining open Markov processes.

Abstract. We illustrate some new paradigms in applied category theory with the example of coarse-graining open Markov processes. Coarse-graining is a standard method of extracting a simpler Markov process from a more complicated one by identifying states. Here we extend coarse-graining to ‘open’ Markov processes: that is, those where probability can flow in or out of certain states called ‘inputs’ and ‘outputs’. One can build up an ordinary Markov process from smaller open pieces in two basic ways: composition, where we identify the outputs of one open Markov process with the inputs of another, and tensoring, where we set two open Markov processes side by side. These constructions make open Markov processes into the morphisms of a symmetric monoidal category. But we can go further and construct a symmetric monoidal double category where the 2-morphisms include ways of coarse-graining open Markov processes. We can describe the behavior of open Markov processes using double functors out of this double category.

For more, look at these:

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

• John Baez and Kenny Courser, Coarse-graining open Markov processes. (Blog article here.)

• Kenny Courser, Open Systems: A Double Categorical Perspective.


Categorical Statistics Group

10 June, 2020

As a spinoff of the workshop Categorical Probability and Statistics, Oliver Shetler has organized a reading group on category theory applied to statistics. The first meeting is Saturday June 27th at 17:00 UTC.

You can sign up for the group here, and also read more about it there. We’re discussing the group on the Category Theory Community Server, so if you want to join the reading group should probably also join that.

Here is a reading list. I’m sure the group won’t cover all these papers—we’ll start with the first one and see how it goes from there. But it’s certainly helpful to have a list like this.

• McCullagh, What is a statistical model?

• Morse and Sacksteder, Statistical isomorphism.

• Simpson, Probability sheaves and the Giry monad.

• Fritz, A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics.

• Jacobs, Probabilities, distribution monads, and convex categories.

• Keimel, The monad of probability measures over compact ordered spaces and its Eilenberg-Moore algebras.

• McCullaugh, Di Nardo, Senato, Natural statistics for spectral samples.

• Perrone, Categorical Probability and Stochastic Dominance in Metric Spaces. (Ph.D. thesis)

• Patterson, The Algebra and Machine Representation of Statistical Models. (Ph.D. thesis)

• Tuyeras, A category theoretical argument for causal inference.

• Culbertson and Sturtz, A categorical foundation for Bayesian probability.

• Fong, Causal Theories: A Categorical Perspective on Bayesian Networks. (Masters thesis)

• Fritz and Perrone, A probability monad as the colimit of spaces of finite samples.

• Fritz and Perrone, Bimonoidal structure of probability monads.

• Fritz, A presentation of the category of stochastic matrices.

• Jacobs and Furber, Towards a categorical account of conditional probability.

• Bradley, At the Interface of Algebra and Statistics. (Ph.D. Thesis)

• Bradley, Stoudenmire and Terilla, Modeling sequences with quantum states.

• Jacobs, Categorical aspects of parameter learning.

• Jacobs, Parameters and parameterization in specification, using distributive categories.

• Parzygnat, Inverses, disintegrations, and Bayesian inversion in quantum Markov categories.

• Jacobs, A channel-based perspective on conjugate priors.


A Categorical View of Conditional Expectation

2 April, 2020

 \;

I always like to see categories combined with probability theory and analysis. So I’m glad Prakash Panangaden did that in his talk at the ACT@UCR seminar.

He gave his talk on Wednesday April 8th. Afterwards we had discussions at the Category Theory Community Server, here:

https://categorytheory.zulipchat.com/#narrow/stream/229966-ACT.40UCR-seminar/topic/April.208th.3A.20Prakash.20Panangaden

You can view or join the conversation there if you sign in.

You can see his slides here, or download a video here, or watch the video here:

A categorical view of conditional expectation

Abstract. This talk is a fragment from a larger work on approximating Markov processes. I will focus on a functorial definition of conditional expectation without talking about how it was used. We define categories of cones—which are abstract versions of the familiar cones in vector spaces—of measures and related categories cones of Lp functions. We will state a number of dualities and isomorphisms between these categories. Then we will define conditional expectation by exploiting these dualities: it will turn out that we can define conditional expectation with respect to certain morphisms. These generalize the standard notion of conditioning with respect to a sub-sigma algebra. Why did I use the plural? Because it turns out that there are two kinds of conditional expectation, one of which looks like a left adjoint (in the matrix sense not the categorical sense) and the other looks like a right adjoint. I will review concepts like image measure, Radon-Nikodym derivatives and the traditional definition of conditional expectation. This is joint work with Philippe Chaput, Vincent Danos and Gordon Plotkin.

For more, see:

• Philippe Chaput, Vincent Danos, Prakash Panangaden and Gordon Plotkin, Approximating Markov processes by averaging, in International Colloquium on Automata, Languages, and Programming, Springer, Berlin, 2009.


Beyond Classical Bayesian Networks

7 July, 2018

In the final installment of the Applied Category Theory seminar, two students explain a category-theoretic approach to Bayesian networks and their generalizations. Check it out:

• Pablo Andres-Martinez and Sophie Raynor, Beyond Classical Bayesian networks, The n-Category Café, 7 July 2018.

Pablo Andres-Martinez is a postdoc at the University of Edinburgh, working in the cool-sounding Centre for Doctoral Training in Pervasive Parallelism. Sophie Raynor works at Hoppinger. Their blog article discusses this paper:

• Joe Henson, Raymond Lal and Matthew F. Pusey, Theory-independent limits on correlations from generalized Bayesian networks, New Journal of Physics 16 (2014), 113043.


Compositionality — The Journal

6 May, 2018

A new journal! We’ve been working on it for a long time, but we finished sorting out some details at ACT2018, and now we’re ready to tell the world!

It’s free to read, free to publish in, and it’s about building big things from smaller parts. Here’s the top of the journal’s home page right now:

Here’s the official announcement:

We are pleased to announce the launch of Compositionality, a new diamond open-access journal for research using compositional ideas, most notably of a category-theoretic origin, in any discipline. Topics may concern foundational structures, an organizing principle, or a powerful tool. Example areas include but are not limited to: computation, logic, physics, chemistry, engineering, linguistics, and cognition. To learn more about the scope and editorial policies of the journal, please visit our website at http://www.compositionality-journal.org.

Compositionality is the culmination of a long-running discussion by many members of the extended category theory community, and the editorial policies, look, and mission of the journal have yet to be finalized. We would love to get your feedback about our ideas on the forum we have established for this purpose:

http://reddit.com/r/compositionality

Lastly, the journal is currently receiving applications to serve on the editorial board; submissions are due May 31 and will be evaluated by the members of our steering board: John Baez, Bob Coecke, Kathryn Hess, Steve Lack, and Valeria de Paiva.

https://tinyurl.com/call-for-editors

We will announce a call for submissions in mid-June.

We’re looking forward to your ideas and submissions!

Best regards,

Brendan Fong, Nina Otter, and Joshua Tan

http://www.compositionality-journal.org/


Coarse-Graining Open Markov Processes

4 March, 2018

Kenny Courser and I have been working hard on this paper for months:

• John Baez and Kenny Courser, Coarse-graining open Markov processes.

It may be almost done. So, it would be great if people here could take a look and comment on it! It’s a cool mix of probability theory and double categories. I’ve posted a similar but non-isomorphic article on the n-Category Café, where people know a lot about double categories. But maybe some of you here know more about Markov processes!

‘Coarse-graining’ is a standard method of extracting a simple Markov process from a more complicated one by identifying states. We extend coarse-graining to open Markov processes. An ‘open’ Markov process is one where probability can flow in or out of certain states called ‘inputs’ and ‘outputs’. One can build up an ordinary Markov process from smaller open pieces in two basic ways:

• composition, where we identify the outputs of one open Markov process with the inputs of another,

and

• tensoring, where we set two open Markov processes side by side.

A while back, Brendan Fong, Blake Pollard and I showed that these constructions make open Markov processes into the morphisms of a symmetric monoidal category:

A compositional framework for Markov processes, Azimuth, January 12, 2016.

Here Kenny and I go further by constructing a symmetric monoidal double category where the 2-morphisms include ways of coarse-graining open Markov processes. We also extend the previously defined ‘black-boxing’ functor from the category of open Markov processes to this double category.

But before you dive into the paper, let me explain all this stuff a bit more….

Very roughly speaking, a ‘Markov process’ is a stochastic model describing a sequence of transitions between states in which the probability of a transition depends only on the current state. But the only Markov processes talk about are continuous-time Markov processes with a finite set of states. These can be drawn as labeled graphs:

where the number labeling each edge describes the probability per time of making a transition from one state to another.

An ‘open’ Markov process is a generalization in which probability can also flow in or out of certain states designated as ‘inputs’ and outputs’:

Open Markov processes can be seen as morphisms in a category, since we can compose two open Markov processes by identifying the outputs of the first with the inputs of the second. Composition lets us build a Markov process from smaller open parts—or conversely, analyze the behavior of a Markov process in terms of its parts.

In this paper, Kenny extend the study of open Markov processes to include coarse-graining. ‘Coarse-graining’ is a widely studied method of simplifying a Markov process by mapping its set of states X onto some smaller set X' in a manner that respects the dynamics. Here we introduce coarse-graining for open Markov processes. And we show how to extend this notion to the case of maps p: X \to X' that are not surjective, obtaining a general concept of morphism between open Markov processes.

Since open Markov processes are already morphisms in a category, it is natural to treat morphisms between them as morphisms between morphisms, or ‘2-morphisms’. We can do this using double categories!

Double categories were first introduced by Ehresmann around 1963. Since then, they’ve used in topology and other branches of pure math—but more recently they’ve been used to study open dynamical systems and open discrete-time Markov chains. So, it should not be surprising that they are also useful for open Markov processes..

A 2-morphism in a double category looks like this:

While a mere category has only objects and morphisms, here we have a few more types of things. We call A, B, C and D ‘objects’, f and g ‘vertical 1-morphisms’, M and N ‘horizontal 1-cells’, and \alpha a ‘2-morphism’. We can compose vertical 1-morphisms to get new vertical 1-morphisms and compose horizontal 1-cells to get new horizontal 1-cells. We can compose the 2-morphisms in two ways: horizontally by setting squares side by side, and vertically by setting one on top of the other. The ‘interchange law’ relates vertical and horizontal composition of 2-morphisms.

In a ‘strict’ double category all these forms of composition are associative. In a ‘pseudo’ double category, horizontal 1-cells compose in a weakly associative manner: that is, the associative law holds only up to an invertible 2-morphism, the ‘associator’, which obeys a coherence law. All this is just a sketch; for details on strict and pseudo double categories try the paper by Grandis and Paré.

Kenny and I construct a double category \mathbb{M}\mathbf{ark} with:

  1. finite sets as objects,
  2. maps between finite sets as vertical 1-morphisms,
  3. open Markov processes as horizontal 1-cells,
  4. morphisms between open Markov processes as 2-morphisms.

I won’t give the definition of item 4 here; you gotta read our paper for that! Composition of open Markov processes is only weakly associative, so \mathbb{M}\mathbf{ark} is a pseudo double category.

This is how our paper goes. In Section 2 we define open Markov processes and steady state solutions of the open master equation. In Section 3 we introduce coarse-graining first for Markov processes and then open Markov processes. In Section 4 we construct the double category \mathbb{M}\mathbf{ark} described above. We prove this is a symmetric monoidal double category in the sense defined by Mike Shulman. This captures the fact that we can not only compose open Markov processes but also ‘tensor’ them by setting them side by side.

For example, if we compose this open Markov process:

with the one I showed you before:

we get this open Markov process:

But if we tensor them, we get this:

As compared with an ordinary Markov process, the key new feature of an open Markov process is that probability can flow in or out. To describe this we need a generalization of the usual master equation for Markov processes, called the ‘open master equation’.

This is something that Brendan, Blake and I came up with earlier. In this equation, the probabilities at input and output states are arbitrary specified functions of time, while the probabilities at other states obey the usual master equation. As a result, the probabilities are not necessarily normalized. We interpret this by saying probability can flow either in or out at both the input and the output states.

If we fix constant probabilities at the inputs and outputs, there typically exist solutions of the open master equation with these boundary conditions that are constant as a function of time. These are called ‘steady states’. Often these are nonequilibrium steady states, meaning that there is a nonzero net flow of probabilities at the inputs and outputs. For example, probability can flow through an open Markov process at a constant rate in a nonequilibrium steady state. It’s like a bathtub where water is flowing in from the faucet, and flowing out of the drain, but the level of the water isn’t changing.

Brendan, Blake and I studied the relation between probabilities and flows at the inputs and outputs that holds in steady state. We called the process of extracting this relation from an open Markov process ‘black-boxing’, since it gives a way to forget the internal workings of an open system and remember only its externally observable behavior. We showed that black-boxing is compatible with composition and tensoring. In other words, we showed that black-boxing is a symmetric monoidal functor.

In Section 5 of our new paper, Kenny and I show that black-boxing is compatible with morphisms between open Markov processes. To make this idea precise, we prove that black-boxing gives a map from the double category \mathbb{M}\mathbf{ark} to another double category, called \mathbb{L}\mathbf{inRel}, which has:

  1. finite-dimensional real vector spaces U,V,W,\dots as objects,
  2. linear maps f : V \to W as vertical 1-morphisms from V to W,
  3. linear relations R \subseteq V \oplus W as horizontal 1-cells from V to W,
  4. squares

    obeying (f \oplus g)R \subseteq S as 2-morphisms.

Here a ‘linear relation’ from a vector space V to a vector space W is a linear subspace R \subseteq V \oplus W. Linear relations can be composed in the usual way we compose relations. The double category \mathbb{L}\mathbf{inRel} becomes symmetric monoidal using direct sum as the tensor product, but unlike \mathbb{M}\mathbf{ark} it is a strict double category: that is, composition of linear relations is associative.

Our main result, Theorem 5.5, says that black-boxing gives a symmetric monoidal double functor

\blacksquare : \mathbb{M}\mathbf{ark} \to \mathbb{L}\mathbf{inRel}

As you’ll see if you check out our paper, there’s a lot of nontrivial content hidden in this short statement! The proof requires a lot of linear algebra and also a reasonable amount of category theory. For example, we needed this fact: if you’ve got a commutative cube in the category of finite sets:

and the top and bottom faces are pushouts, and the two left-most faces are pullbacks, and the two left-most arrows on the bottom face are monic, then the two right-most faces are pullbacks. I think it’s cool that this is relevant to Markov processes!

Finally, in Section 6 we state a conjecture. First we use a technique invented by Mike Shulman to construct symmetric monoidal bicategories \mathbf{Mark} and \mathbf{LinRel} from the symmetric monoidal double categories \mathbb{M}\mathbf{ark} and \mathbb{L}\mathbf{inRel}. We conjecture that our black-boxing double functor determines a functor between these symmetric monoidal bicategories. This has got to be true. However, double categories seem to be a simpler framework for coarse-graining open Markov processes.

Finally, let me talk a bit about some related work. As I already mentioned, Brendan, Blake and I constructed a symmetric monoidal category where the morphisms are open Markov processes. However, we formalized such Markov processes in a slightly different way than Kenny and I do now. We defined a Markov process to be one of the pictures I’ve been showing you: a directed multigraph where each edge is assigned a positive number called its ‘rate constant’. In other words, we defined it to be a diagram

where X is a finite set of vertices or ‘states’, E is a finite set of edges or ‘transitions’ between states, the functions s,t : E \to X give the source and target of each edge, and r : E \to (0,\infty) gives the rate constant for each transition. We explained how from this data one can extract a matrix of real numbers (H_{i j})_{i,j \in X} called the ‘Hamiltonian’ of the Markov process, with two properties that are familiar in this game:

H_{i j} \geq 0 if i \neq j,

\sum_{i \in X} H_{i j} = 0 for all j \in X.

A matrix with these properties is called ‘infinitesimal stochastic’, since these conditions are equivalent to \exp(t H) being stochastic for all t \ge 0.

In our new paper, Kenny and I skip the directed multigraphs and work directly with the Hamiltonians! In other words, we define a Markov process to be a finite set X together with an infinitesimal stochastic matrix (H_{ij})_{i,j \in X}. This allows us to work more directly with the Hamiltonian and the all-important ‘master equation’

\displaystyle{    \frac{d p(t)}{d t} = H p(t)  }

which describes the evolution of a time-dependent probability distribution

p(t) : X \to \mathbb{R}

Clerc, Humphrey and Panangaden have constructed a bicategory with finite sets as objects, ‘open discrete labeled Markov processes’ as morphisms, and ‘simulations’ as 2-morphisms. The use the word ‘open’ in a pretty similar way to me. But their open discrete labeled Markov processes are also equipped with a set of ‘actions’ which represent interactions between the Markov process and the environment, such as an outside entity acting on a stochastic system. A ‘simulation’ is then a function between the state spaces that map the inputs, outputs and set of actions of one open discrete labeled Markov process to the inputs, outputs and set of actions of another.

Another compositional framework for Markov processes was discussed by de Francesco Albasini, Sabadini and Walters. They constructed an algebra of ‘Markov automata’. A Markov automaton is a family of matrices with non-negative real coefficients that is indexed by elements of a binary product of sets, where one set represents a set of ‘signals on the left interface’ of the Markov automata and the other set analogously for the right interface.

So, double categories are gradually invading the theory of Markov processes… as part of the bigger trend toward applied category theory. They’re natural things; scientists should use them.