## Interview (Part 1)

18 March, 2016

Greg Bernhardt runs an excellent website for discussing physics, math and other topics, called Physics Forums. He recently interviewed me there. Since I used this opportunity to explain a bit about the Azimuth Project and network theory, I thought I’d reprint the interview here. Here is Part 1.

Give us some background on yourself.

I’m interested in all kinds of mathematics and physics, so I call myself a mathematical physicist. But I’m a math professor at the University of California in Riverside. I’ve taught here since 1989. My wife Lisa Raphals got a job here nine years later: among other things, she studies classical Chinese and Greek philosophy.

I got my bachelors’s degree in math at Princeton. I did my undergrad thesis on whether you can use a computer to solve Schrödinger’s equation to arbitrary accuracy. In the end, it became obvious that you can. I was really interested in mathematical logic, and I used some in my thesis—the theory of computable functions—but I decided it wasn’t very helpful in physics. When I read the magnificently poetic last chapter of Misner, Thorne and Wheeler’s Gravitation, I decided that quantum gravity was the problem to work on.

I went to math grad school at MIT, but I didn’t find anyone to work with on quantum gravity. So, I did my thesis on quantum field theory with Irving Segal. He was one of the founders of “constructive quantum field theory”, where you try to rigorously prove that quantum field theories make mathematical sense and obey certain axioms that they should. This was a hard subject, and I didn’t accomplish much, but I learned a lot.

I got a postdoc at Yale and switched to classical field theory, mainly because it was something I could do. On the side I was still trying to understand quantum gravity. String theory was bursting into prominence at the time, and my life would have been easier if I’d jumped onto that bandwagon. But I didn’t like it, because most of the work back then studied strings moving on a fixed “background” spacetime. Quantum gravity is supposed to be about how the geometry of spacetime is variable and quantum­-mechanical, so I didn’t want a theory of quantum gravity set on a pre­-existing background geometry!

I got a professorship at U.C. Riverside based on my work on classical field theory. But at a conference on that subject in Seattle, I heard Abhay Ashtekar, Chris Isham and Renate Loll give some really interesting talks on loop quantum gravity. I don’t know why they gave those talks at a conference on classical field theory. But I’m sure glad they did! I liked their work because it was background­-free and mathematically rigorous. So I started work on loop quantum gravity.

Like many other theories, quantum gravity is easier in low dimensions. I became interested in how category theory lets you formulate quantum gravity in a universe with just 3 spacetime dimensions. It amounts to a radical new conception of space, where the geometry is described in a thoroughly quantum­-mechanical way. Ultimately, space is a quantum superposition of “spin networks”, which are like Feynman diagrams. The idea is roughly that a spin network describes a virtual process where particles move around and interact. If we know how likely each of these processes is, we know the geometry of space.

A spin network

Loop quantum gravity tries to do the same thing for full­-fledged quantum gravity in 4 spacetime dimensions, but it doesn’t work as well. Then Louis Crane had an exciting idea: maybe 4­-dimensional quantum gravity needs a more sophisticated structure: a “2­-category”.

I had never heard of 2­-categories. Category theory is about things and processes that turn one thing into another. In a 2­-category we also have “meta-processes” that turn one process into another.

I became very excited about 2-­categories. At the time I was so dumb I didn’t consider the possibility of 3­-categories, and 4­-categories, and so on. To be precise, I was more of a mathematical physicist than a mathematician: I wasn’t trying to develop math for its own sake. Then someone named James Dolan told me about n­-categories! That was a real eye­-opener. He came to U.C. Riverside to work with me. So I started thinking about n­-categories in parallel with loop quantum gravity.

Dolan was technically my grad student, but I probably learned more from him than vice versa. In 1996 we wrote a paper called “Higher­-dimensional algebra and topological quantum field theory”, which might be my best paper. It’s full of grandiose guesses about n-­categories and their connections to other branches of math and physics. We had this vision of how everything fit together. It was so beautiful, with so much evidence supporting it, that we knew it had to be true. Unfortunately, at the time nobody had come up with a good definition of n­-category, except for n < 4. So we called our guesses “hypotheses” instead of “conjectures”. In math a conjecture should be something utterly precise: it’s either true or not, with no room for interpretation.

By now, I think everybody more or less believes our hypotheses. Some of the easier ones have already been turned into theorems. Jacob Lurie, a young hotshot at Harvard, improved the statement of one and wrote a 111-page outline of a proof. Unfortunately he still used some concepts that hadn’t been defined. People are working to fix that, and I feel sure they’ll succeed.

A foam of soap bubbles

Anyway, I kept trying to connect these ideas to quantum gravity. In 1997, I introduced “spin foams”. These are structures like spin networks, but with an extra dimension. Spin networks have vertices and edges. Spin foams also have 2­-dimensional faces: imagine a foam of soap bubbles.

The idea was to use spin foams to give a purely quantum­-mechanical description of the geometry of spacetime, just as spin networks describe the geometry of space. But mathematically, what we’re doing here is going from a category to a 2­-category.

By now, there are a number of different theories of quantum gravity based on spin foams. Unfortunately, it’s not clear that any of them really work. In 2002, Dan Christensen, Greg Egan and I did a bunch of supercomputer calculations to study this question. We showed that the most popular spin foam theory at the time gave dramatically different answers than people had hoped for. I think we more or less killed that theory.

That left me rather depressed. I don’t enjoy programming: indeed, Christensen and Egan did all the hard work of that sort on our paper. I didn’t want to spend years sifting through spin foam theories to find one that works. And most of all, I didn’t want to end up as an old man still not knowing if my work had been worthwhile! To me n­-category theory was clearly the math of the future—and it was easy for me to come up with cool new ideas in that subject. So, I quit quantum gravity and switched to n­-categories.

But this was very painful. Quantum gravity is a kind of “holy grail” in physics. When you work on that subject, you wind up talking to lots of people who believe that unifying quantum mechanics and general relativity is the most important thing in the world, and that nothing else could possibly be as interesting. You wind up believing it. It took me years to get out of that mindset.

Ironically, when I quit quantum gravity, I felt free to explore string theory. As a branch of math, it’s really wonderful. I started looking at how n­-categories apply to string theory. It turns out there’s a wonderful story here: briefly, particles are to categories as strings are to 2­-categories, and all the math of particles can be generalized to strings using this idea! I worked out a bit of this story with Urs Schreiber and John Huerta.

Around 2010, I felt I had to switch to working on environmental issues and math related to engineering and biology, for the sake of the planet. That was another painful renunciation. But luckily, Urs Schreiber and others are continuing to work on n­-categories and string theory, and doing it better than I ever could. So I don’t feel the need to work on those things anymore—indeed, it would be hard to keep up. I just follow along quietly from the sidelines.

It’s quite possible that we need a dozen more good ideas before we really get anywhere on quantum gravity. But I feel confident that n­-categories will have a role to play. So, I’m happy to have helped push that subject forward.

Your uncle, Albert Baez, was a physicist. How did he help develop your interests?

He had a huge effect on me. He’s mainly famous for being the father of the folk singer Joan Baez. But he started out in optics and helped invent the first X­-ray microscope. Later he became very involved in physics education, especially in what were then called third­-world countries. For example, in 1951 he helped set up a physics department at the University of Baghdad.

Albert V. Baez

When I was a kid he worked for UNESCO, so he’d come to Washington D.C. and stay with my parents, who lived nearby. Whenever he showed up, he would open his suitcase and pull out amazing gadgets: diffraction gratings, holograms, and things like that. And he would explain how they work! I decided physics was the coolest thing there is.

When I was eight, he gave me a copy of his book The New College Physics: A Spiral Approach. I immediately started trying to read it. The “spiral approach” is a great pedagogical principle: instead of explaining each topic just once, you should start off easy and then keep spiraling around from topic to topic, examining them in greater depth each time you revisit them. So he not only taught me physics, he taught me about how to learn and how to teach.

Later, when I was fifteen, I spent a couple weeks at his apartment in Berkeley. He showed me the Lawrence Hall of Science, which is where I got my first taste of programming—in BASIC, with programs stored on paper tape. This was in 1976. He also gave me a copy of The Feynman Lectures on Physics. And so, the following summer, when I was working at a state park building trails and such, I was also trying to learn quantum mechanics from the third volume of The Feynman Lectures. The other kids must have thought I was a complete geek—which of course I was.

Give us some insight on what your average work day is like.

During the school year I teach two or three days a week. On days when I teach, that’s the focus of my day: I try to prepare my classes starting at breakfast. Teaching is lots of fun for me. Right now I’m teaching two courses: an undergraduate course on game theory and a graduate course on category theory. I’m also running a seminar on category theory. In addition, I meet with my graduate students for a four­-hour session once a week: they show me what they’ve done, and we try to push our research projects forward.

On days when I don’t teach, I spend a lot of time writing. I love blogging, so I could easily do that all day, but I try to spend a lot of time writing actual papers. Any given paper starts out being tough to write, but near the end it practically writes itself. At the end, I have to tear myself away from it: I keep wanting to add more. At that stage, I feel an energetic glow at the end of a good day spent writing. Few things are so satisfying.

During the summer I don’t teach, so I can get a lot of writing done. I spent two years doing research at the Centre of Quantum Technologies, which is in Singapore, and since 2012 I’ve been working there during summers. Sometimes I bring my grad students, but mostly I just write.

I also spend plenty of time doing things with my wife, like talking, cooking, shopping, and working out at the gym. We like to watch TV shows in the evening, mainly mysteries and science fiction.

We also do a lot of gardening. When I was younger that seemed boring ­ but as you get older, subjective time speeds up, so you pay more attention to things like plants growing. There’s something tremendously satisfying about planting a small seedling, watching it grow into an orange tree, and eating its fruit for breakfast.

I love playing the piano and recording electronic music, but doing it well requires big blocks of time, which I don’t always have. Music is pure delight, and if I’m not listening to it I’m usually composing it in my mind.

If I gave in to my darkest urges and becames a decadent wastrel I might spend all day blogging, listening to music, recording music and working on pure math. But I need other things to stay sane.

What research are you working on at the moment?

Lately I’ve been trying to finish a paper called “Struggles with the Continuum”. It’s about the problems physics has with infinities, due to the assumption that spacetime is a continuum. At certain junctures this paper became psychologically difficult to write, since it’s supposed to include a summary of quantum field theory, which is complicated and sprawling subject. So, I’ve resorted to breaking this paper into blog articles and posting them on Physics Forums, just to motivate myself.

Purely for fun, I’ve been working with Greg Egan on some projects involving the octonions. The octonions are a number system where you can add, subtract, multiply and divide. Such number systems only exist in 1, 2, 4, and 8 dimensions: you’ve got the real numbers, which form a line, the complex numbers, which form a plane, the quaternions, which are 4­-dimensional, and the octonions, which are 8­dimensional. The octonions are the biggest, but also the weirdest. For example, multiplication of octonions violates the associative law: (xy)z is not equal to x(yz). So the octonions sound completely crazy at first, but they turn out to have fascinating connections to string theory and other things. They’re pretty addictive, and if became a decadent wastrel I would spend a lot more time on them.

The 240 unit integral octonions, projected onto a plane

There’s a concept of “integer” for the octonions, and integral octonions form a lattice, a repeating pattern of points, in 8 dimensions. This is called the E8 lattice. There’s another lattices that lives in in 24 dimensions, called the “Leech lattice”. Both are connected to string theory. Notice that 8+2 equals 10, the dimension superstrings like to live in, and 24+2 equals 26, the dimension bosonic strings like to live in. That’s not a coincidence! The 2 here comes from the 2­-dimensional world-sheet of the string.

Since 3×8 is 24, Egan and I became interested in how you could built the Leech lattice from 3 copies of the E8 lattice. People already knew a trick for doing it, but it took us a while to understand how it worked—and then Egan showed you could do this trick in exactly 17,280 ways! I want to write up the proof. There’s a lot of beautiful geometry here.

There’s something really exhilarating about struggling to reach the point where you have some insight into these structures and how they’re connected to physics.

My main work, though, involves using category theory to study networks. I’m interested in networks of all kinds, from electrical circuits to neural networks to “chemical reaction networks” and many more. Different branches of science and engineering focus on different kinds of networks. But there’s not enough communication between researchers in different subjects, so it’s up to mathematicians to set up a unified theory of networks.

I’ve got seven grad students working on this project—or actually eight, if you count Brendan Fong: I’ve been helping him on his dissertation, but he’s actually a student at Oxford.

Brendan was the first to join the project. I wanted him to work on electrical circuits, which are a nice familiar kind of network, a good starting point. But he went much deeper: he developed a general category­-theoretic framework for studying networks. We then applied it to electrical circuits, and other things as well.

Blake Pollard and Brendan Fong at the Centre for Quantum Technologies

Blake Pollard is a student of mine in the physics department here at U. C. Riverside. Together with Brendan and me, he developed a category­-theoretic approach to Markov processes: random processes where a system hops around between different states. We used Brendan’s general formalism to reduce Markov processes to electrical circuits. Now Blake is going further and using these ideas to tackle chemical reaction networks.

My other students are in the math department at U. C. Riverside. Jason Erbele is working on “control theory”, a branch of engineering where you try to design feedback loops to make sure processes run in a stable way. Control theory uses networks called “signal flow diagrams”, and Jason has worked out how to understand these using category theory.

Signal flow diagram for an inverted pendulum on a cart

Jason isn’t using Brendan’s framework: he’s using a different one, called PROPs, which were developed a long time ago for use in algebraic topology. My student Franciscus Rebro has been developing it further, for use in our project. It gives a nice way to describe networks in terms of their basic building blocks. It also illuminates the similarity between signal flow diagrams and Feynman diagrams! They’re very similar, but there’s a big difference: in signal flow diagrams the signals are classical, while Feynman diagrams are quantum­-mechanical.

My student Brandon Coya has been working on electrical circuits. He’s sort of continuing what Brendan started, and unifying Brendan’s formalism with PROPs.

My student Adam Yassine is starting to work on networks in classical mechanics. In classical mechanics you usually consider a single system: you write down the Hamiltonian, you get the equations of motion, and you try to solve them. He’s working on a setup where you can take lots of systems and hook them up into a network.

My students Kenny Courser and Daniel Cicala are digging deeper into another aspect of network theory. As I hinted earlier, a category is about things and processes that turn one thing into another. In a 2­-category we also have “meta-processes” that turn one process into another. We’re starting to bring 2­-categories into network theory.

For example, you can use categories to describe an electrical circuit as a process that turns some inputs into some outputs. You put some currents in one end and some currents come out the other end. But you can also use 2­-categories to describe “meta-processes” that turn one electrical circuit into another. An example of a meta-process would be a way of simplifying an electrical circuit, like replacing two resistors in series by a single resistor.

Ultimately I want to push these ideas in the direction of biochemistry. Biology seems complicated and “messy” to physicists and mathematicians, but I think there must be a beautiful logic to it. It’s full of networks, and these networks change with time. So, 2­-categories seem like a natural language for biology.

It won’t be easy to convince people of this, but that’s okay.

## Network Calculus

14 March, 2016

If anyone here can go to this talk and report back, I’d be most grateful! I just heard about it from Jamie Vicary.

Jens Schmitt, Network calculus, Thursday 17 March 2016, 10:00 am, Tony Hoare Room, Robert Hooke Building, the University of Oxford.

Abstract. This talk is about Network Calculus: a recent methodology to provide performance guarantees in concurrent programs, digital circuits, and communication networks. There is a deterministic and a stochastic version of network calculus, providing worst-case and probabilistic guarantees, respectively. The deterministic network calculus has been developed in the last 25 years and is somewhat a settled methodology; the stochastic network calculus is ten years younger and while some of the dust has settled there are still many open fundamental research questions. For both, the key innovation over existing methods is that they work with bounds on the arrival and service processes of the system under analysis. This often enables dealing with complex systems where conventional methods become intractable—of course at a loss of exact results, but still with rigorous performance bounds.

## Corelations in Network Theory

2 February, 2016

Category theory reduces a large chunk of math to the clever manipulation of arrows. One of the fun things about this is that you can often take a familiar mathematical construction, think of it category-theoretically, and just turn around all the arrows to get something new and interesting!

In math we love functions. If we have a function

$f: X \to Y$

we can formally turn around the arrow to think of $f$ as something going back from $Y$ back to $X$. But this something is usually not a function: it’s called a ‘cofunction’. A cofunction from $Y$ to $X$ is simply a function from $X$ to $Y.$

Cofunctions are somewhat interesting, but they’re really just functions viewed through a looking glass, so they don’t give much new—at least, not by themselves.

The game gets more interesting if we think of functions and cofunctions as special sorts of relations. A relation from $X$ to $Y$ is a subset

$R \subseteq X \times Y$

It’s a function when for each $x \in X$ there’s a unique $y \in Y$ with $(x,y) \in R.$ It’s a cofunction when for each $y \in Y$ there’s a unique $x \in x$ with $(x,y) \in R.$

Just as we can compose functions, we can compose relations. Relations have certain advantages over functions: for example, we can ‘turn around’ any relation $R$ from $X$ to $Y$ and get a relation $R^\dagger$ from $Y$ to $X:$

$R^\dagger = \{(y,x) : \; (x,y) \in R \}$

If we turn around a function we get a cofunction, and vice versa. But we can also do other fun things: for example, since both functions and cofunctions are relations, we can compose a function and a cofunction and get a relation.

Of course, relations also have certain disadvantages compared to functions. But it’s utterly clear by now that the category $\mathrm{FinRel},$ where the objects are finite sets and the morphisms are relations, is very important.

So far, so good. But what happens if we take the definition of ‘relation’ and turn all the arrows around?

There are actually several things I could mean by this question, some more interesting than others. But one of them gives a very interesting new concept: the concept of ‘corelation’. And two of my students have just written a very nice paper on corelations:

• Brandon Coya and Brendan Fong, Corelations are the prop for extraspecial commutative Frobenius monoids.

Here’s why this paper is important for network theory: corelations between finite sets are exactly what we need to describe electrical circuits made of ideal conductive wires! A corelation from a finite set $X$ to a finite set $Y$ can be drawn this way:

I have drawn more wires than strictly necessary: I’ve drawn a wire between two points whenever I want current to be able to flow between them. But there’s a reason I did this: a corelation from $X$ to $Y$ simply tells us when current can flow from one point in either of these sets to any other point in these sets.

Of course circuits made solely of conductive wires are not very exciting for electrical engineers. But in an earlier paper, Brendan introduced corelations as an important stepping-stone toward more general circuits:

• John Baez and Brendan Fong, A compositional framework for passive linear circuits. (Blog article here.)

The key point is simply that you use conductive wires to connect resistors, inductors, capacitors, batteries and the like and build interesting circuits—so if you don’t fully understand the math of conductive wires, you’re limited in your ability to understand circuits in general!

In their new paper, Brendan teamed up with Brandon Coya, and they figured out all the rules obeyed by the category $\mathrm{FinCorel},$ where the objects are finite sets and the morphisms are corelations. I’ll explain these rules later.

This sort of analysis had previously been done for $\mathrm{FinRel},$ and it turns out there’s a beautiful analogy between the two cases! Here is a chart displaying the analogy:

 Spans Cospans extra bicommutative bimonoids special commutative Frobenius monoids Relations Corelations extraspecial bicommutative bimonoids extraspecial commutative Frobenius monoids

I’m sure this will be cryptic to the nonmathematicians reading this, and even many mathematicians—but the paper explains what’s going on here.

I’ll actually say what an ‘extraspecial commutative Frobenius monoid’ is later in this post. This is a terse way of listing all the rules obeyed by corelations between finite sets—and thus, all the rules obeyed by conductive wires.

But first, let’s talk about something simpler.

### What is a corelation?

Just as we can define functions as relations of a special sort, we can also define relations in terms of functions. A relation from $X$ to $Y$ is a subset

$R \subseteq X \times Y$

but we can think of this as an equivalence class of one-to-one functions

$i: R \to X \times Y$

Why an equivalence class? The image of $i$ is our desired subset of $X \times Y.$ The set $R$ here could be replaced by any isomorphic set; its only role is to provide ‘names’ for the elements of $X \times Y$ that are in the image of $i.$

Now we have a relation described as an arrow, or really an equivalence class of arrows. Next, let’s turn the arrow around!

There are different things I might mean by that, but we want to do it cleverly. When we turn arrows around, the concept of product (for example, cartesian product $X \times Y$ of sets) turns into the concept of sum (for example, disjoint union $X + Y$ of sets). Similarly, the concept of monomorphism (such as a one-to-one function) turns into the concept of epimorphism (such as an onto function). If you don’t believe me, click on the links!

So, we should define a corelation from a set $X$ to a set $Y$ to be an equivalence class of onto functions

$p: X + Y \to C$

Why an equivalence class? The set $C$ here could be replaced by any isomorphic set; its only role is to provide ‘names’ for the sets of elements of $X + Y$ that get mapped to the same thing via $p.$

In simpler terms, a corelation from $X$ to a set $Y$ is just a partition of the disjoint union $X + Y.$ So, it looks like this:

If we like, we can then draw a line connecting any two points that lie in the same part of the partition:

These lines determine the corelation, so we can also draw a corelation this way:

This is why corelations describe circuits made solely of wires!

### The rules governing corelations

The main result in Brandon and Brendan’s paper is that $\mathrm{FinCorel}$ is equivalent to the PROP for extraspecial commutative Frobenius monoids. That’s a terse way of the laws governing $\mathrm{FinCorel}.$

Let me just show you the most important laws. In each of these law I’ll draw two circuits made of wires, and write an equals sign asserting that they give the same corelation from a set $X$ to a set $Y.$ The inputs $X$ of each circuit are on top, and the outputs $Y$ are at the bottom. I’ll draw 3-way junctions as little triangles, but don’t worry about that. When we compose two corelations we may get a wire left in mid-air, not connected to the inputs or outputs. We draw the end of the wire as a little circle.

There are some laws called the ‘commutative monoid’ laws:

and an upside-down version called the ‘cocommutative comonoid’ laws:

Then we have ‘Frobenius laws’:

and finally we have the ‘special’ and ‘extra’ laws:

All other laws can be derived from these in some systematic ways.

Commutative Frobenius monoids obey the commutative monoid laws, the cocommutative comonoid laws and the Frobenius laws. They play a fundamental role in 2d topological quantum field theory. Special Frobenius monoids are also well-known. But the ‘extra’ law, which says that a little piece of wire not connected to anything can be thrown away with no effect, is less well studied. Jason Erbele and I gave it this name in our work on control theory:

• John Baez and Jason Erbele, Categories in control. (Blog article here.)

### For more

David Ellerman has spent a lot of time studying what would happen to mathematics if we turned around a lot of arrows in a certain systematic way. In particular, just as the concept of relation would be replaced by the concept of corelation, the concept of subset would be replaced by the concept of partition. You can see how it fits together: just as a relation from $X$ to $Y$ is a subset of $X \times Y,$ a corelation from $X$ to $Y$ is a partition of $X + Y.$

There’s a lattice of subsets of a set:

In logic these subsets correspond to propositions, and the lattice operations are the logical operations ‘and’ and ‘or’. But there’s also a lattice of partitions of a set:

In Ellerman’s vision, this lattice of partitions gives a new kind of logic. You can read about it here:

• David Ellerman, Introduction to partition logic, Logic Journal of the Interest Group in Pure and Applied Logic 22 (2014), 94–125.

As mentioned, the main result in Brandon and Brendan’s paper is that $\mathrm{FinCorel}$ is equivalent to the PROP for extraspecial commutative Frobenius monoids. After they proved this, they noticed that the result has also been stated in other language and proved in other ways by two other authors:

• Fabio Zanasi, Interacting Hopf Algebras—the Theory of Linear Systems, PhD thesis, École Normale Supériere de Lyon, 2015.

• K. Dosen and Z. Petrić, Syntax for split preorders, Annals of Pure and Applied Logic 164 (2013), 443–481.

Unsurprisingly, I prefer Brendan and Brandon’s approach to deriving the result. But it’s nice to see different perspectives!

## The Internal Model Principle

27 January, 2016

“Every good key must be a model of the lock it opens.”

That sentence states an obvious fact, but perhaps also a profound insight if we interpret it generally enough.

That sentence is also the title of a paper:

• Daniel L. Scholten, Every good key must be a model of the lock it opens (the Conant & Ashby Theorem revisited), 2010.

Scholten gives a lot of examples, including these:

• A key is a model of a lock’s keyhole.

• A city street map is a model of the actual city streets

• A restaurant menu is a model of the food the restaurant prepares and sells.

• Honey bees use a kind of dance to model the location of a source of nectar.

• An understanding of some phenomenon (for example a physicist’s understanding of lightning) is a mental model of the actual phenomenon.

This line of thought has an interesting application to control theory. It suggests that to do the best job of regulating some system, a control apparatus should include a model of that system.

Indeed, much earlier, Conant and Ashby tried to turn this idea into a theorem, the ‘good regulator theorem’:

• Roger C. Conant and W. Ross Ashby, Every good regulator of a system must be a model of that system), International Journal of Systems Science 1 (1970), 89–97.

Scholten’s paper is heavily based on this earlier paper. He summarizes it as follows:

What all of this means, more or less, is that the pursuit of a goal by some dynamic agent (Regulator) in the face of a source of obstacles (System) places at least one particular and unavoidable demand on that agent, which is that the agent’s behaviors must be executed in such a reliable and predictable way that they can serve as a representation (Model) of that source of obstacles.

It’s not clear that this is true, but it’s an appealing thought.

A particularly self-referential example arises when the regulator is some organism and the System is the world it lives in, including itself. In this case, it seems the regulator should include a model of itself! This would lead, ultimately, to self-awareness.

It all sounds great. But Scholten raises an obvious question: if Conant and Ashby’s theorem is so great, why isn’t more well-known? Scholten puts it quite vividly:

Given the preponderance of control-models that are used by humans (the evidence for this preponderance will be surveyed in the latter part of the paper), and especially given the obvious need to regulate that system, one might guess that the C&A theorem would be at least as famous as, say, the Pythagorean Theorem ($a^2 + b^2 = c^2$), the Einstein mass-energy equivalence ($E = mc^2,$ which can be seen on T-shirts and bumper stickers), or the DNA double helix (which actually shows up in TV crime dramas and movies about super heroes). And yet, it would appear that relatively few lay-persons have ever even heard of C&A’s important prerequisite to successful regulation.

There could be various explanations. But here’s mine: when I tried to read Conant and Ashby’s paper, I got stuck. They use some very basic mathematical notation in nonstandard ways, and they don’t clearly state the hypotheses and conclusion of their theorem.

Luckily, the paper is short, and the argument, while mysterious, seems simple. So, I immediately felt I should be able to dream up the hypotheses, conclusion, and argument based on the hints given.

Scholten’s paper didn’t help much, since he says:

Throughout the following discussion I will assume that the reader has studied Conant & Ashby’s original paper, possesses the level of technical competence required to understand their proof, and is familiar with the components of the basic model that they used to prove their theorem [….]

However, I have a guess about the essential core of Conant and Ashby’s theorem. So, I’ll state that, and then say more about their setup.

Needless to say, I looked around to see if someone else had already done the work of figuring out what Conant and Ashby were saying. The best thing I found was this:

• B. A. Francis and W. M. Wonham, The internal model principle of control theory, Automatica 12 (1976) 457–465.

This paper works in a more specialized context: linear control theory. They’ve got a linear system or ‘plant’ responding to some input, a regulator or ‘compensator’ that is trying to make the plant behave in a desired way, and a ‘disturbance’ that affects the plant in some unwanted way. They prove that to perfectly correct for the disturbance, the compensator must contain an ‘internal model’ of the disturbance.

I’m probably stating this a bit incorrectly. This paper is much more technical, but it seems to be more careful in stating assumptions and conclusions. In particular, they seem to give a precise definition of an ‘internal model’. And I read elsewhere that the ‘internal model principle’ proved here has become a classic result in control theory!

This paper says that Conant and Ashby’s paper provided “plausibility arguments in favor of the internal model idea”. So, perhaps Conant and Ashby inspired Francis and Wonham, and were then largely forgotten.

### My guess

My guess is that Conant and Ashby’s theorem boils down to this:

Theorem. Let $R$ and $S$ be finite sets, and fix a probability distribution $p$ on $S$. Suppose $q$ is any probability distribution on $R \times S$ such that

$\displaystyle{ p(s) = \sum_{r \in R} q(r,s) \; \textrm{for all} \; s \in S}$

Let $H(p)$ be the Shannon entropy of $p$ and let $H(q)$ be the Shannon entropy of $q.$ Then

$H(q) \ge H(p)$

and equality is achieved if there is a function

$h: S \to R$

such that

$q(r,s) = \left\{\begin{array}{cc} p(s) & \textrm{if} \; r = h(s) \\ 0 & \textrm{otherwise} \end{array} \right.$       █

Note that this is not an ‘if and only if’.

The proof of this is pretty easy to anyone who knows a bit about probability theory and entropy. I can restate it using a bit of standard jargon, which may make it more obvious to experts. We’ve got an $S$-valued random variable, say $\textbf{s}.$ We want to extend it to an $R \times S$-valued random variable $(\textbf{r}, \textbf{s})$ whose entropy is small as possible. Then we can achieve this by choosing a function $h: S \to R,$ and letting $\textbf{s} = h(\textbf{r}).$

Here’s the point: if we make $\textbf{s}$ be a function of $\textbf{r},$ we aren’t adding any extra randomness, so the entropy doesn’t go up.

What in the world does this have to do with a good regulator containing a model of the system it’s regulating?

Well, I can’t explain that as well as I’d like—sorry. But the rough idea seems to be this. Suppose that $S$ is a system with a given random behavior, and $R$ is another system, the regulator. If we want the combination of the system and regulator to behave as ‘nonrandomly’ as possible, we can let the state of the regulator be a function of the state of the system.

This theorem is actually a ‘lemma’ in Conant and Ashby’s paper. Let’s look at their setup, and the ‘good regulator theorem’ as they actually state it.

### Their setup

Conant and Ashby consider five sets and three functions. In a picture:

The sets are these:

• A set $Z$ of possible outcomes.

• A goal: some subset $G \subseteq Z$ of good outcomes

• A set $D$ of disturbances, which I might prefer to call ‘inputs’.

• A set $S$ of states of some system that is affected by the disturbances.

• A set $R$ of states of some regulator that is also affected by the disturbances.

The functions are these:

• A function $\phi : D \to S$ saying how a disturbance determines a state of the system.

• A function $\rho: D \to R$ saying how a disturbance determines a state of the regulator.

• A function $\psi: S \times R \to Z$ saying how a state of the system and a state of the regulator determines an outcome.

Of course we want some conditions on these maps. What we want, I guess, is for the outcome to be good regardless of the disturbance. I might say that as follows: for every $d \in D$ we have

$\psi(\phi(d), \rho(d)) \in G$

Unfortunately Conant and Ashby say they want this:

$\rho \subset [\psi^{-1}(G)]\phi$

I can’t parse this: they’re using math notation in ways I don’t recognize. Can you figure out what they mean, and whether it matches my guess above?

Then, after a lot of examples and stuff, they state their theorem:

Theorem. The simplest optimal regulator $R$ of a reguland $S$ produces events $R$ which are related to events $S$ by a mapping $h: S \to R.$

Clearly I’ve skipped over too much! This barely makes any sense at all.

Unfortunately, looking at the text before the theorem, I don’t see these terms being explained. Furthermore, their ‘proof’ introduces extra assumptions that were not mentioned in the statement of the theorem. It begins:

The sets $R, S,$ and $Z$ and the mapping $\psi: R \times S \to Z$ are presumed given. We will assume that over the set $S$ there exists a probability distribution $p(S)$ which gives the relative frequencies of the events in $S.$ We will further assume that the behaviour of any particular regulator $R$ is specified by a conditional distribution $p(R|S)$ giving, for each event in $S,$ a distribution on the regulatory events in $R.$

Get it? Now they’re saying the state of the regulator $R$ depends on the state of the system $S$ via a conditional probability distribution $p(r|s)$ where $r \in R$ and $s \in S.$ It’s odd that they didn’t mention this earlier! Their picture made it look like the state of the regulator is determined by the ‘disturbance’ via the function $\rho: D \to R.$ But okay.

They’re also assuming there’s a probability distribution on $S.$ They use this and the above conditional probability distribution to get a probability distribution on $R.$

In fact, the set $D$ and the functions out of this set seem to play no role in their proof!

It’s unclear to me exactly what we’re given, what we get to choose, and what we’re trying to optimize. They do try to explain this. Here’s what they say:

Now $p(S)$ and $p(R|S)$ jointly determine $p(R,S)$ and hence $p(Z)$ and $H(Z),$ the entropy in the set of outcomes:

$\displaystyle{ H(Z) = - \sum_{z \in Z} p(z) \log (p(z)) }$

With $p(S)$ fixed, the class of optimal regulators therefore corresponds to the class of optimal distributions $p(R|S)$ for which $H(Z)$ is minimal. We will call this class of optimal distributions $\pi.$

I could write a little essay on why this makes me unhappy, but never mind. I’m used to the habit of using the same letter $p$ to stand for probability distributions on lots of different sets: folks let the argument of $p$ say which set they have in mind at any moment. So, they’re starting with a probability distribution on $S$ and a conditional probability distribution on $r \in R$ given $s \in S.$ They’re using these to determine probability distribution on $R \times S.$ Then, presumably using the map $\psi: S \times R \to Z,$ they get a probability distribution on $Z.$ $H(Z)$ is the entropy of the probability distribution on $Z,$ and for some reason they are trying to minimize this.

(Where did the subset $G \subseteq Z$ of ‘good’ outcomes go? Shouldn’t that play a role? Oh well.)

I believe the claim is that when this entropy is minimized, there’s a function $h : S \to R$ such that

$p(r|s) = 1 \; \textrm{if} \; r = h(s)$

This says that the state of the regulator should be completely determined by the the state of the system. And this, I believe, is what they mean by

Every good regulator of a system must be a model of that system.

I hope you understand: I’m not worrying about whether the setup is a good one, e.g. sufficiently general for real-world applications. I’m just trying to figure out what the setup actually is, what Conant and Ashby’s theorem actually says, and whether it’s true.

I think I’ve just made a lot of progress. Surely this was no fun to read. But it I found it useful to write it.

## Biology, Networks and Control Theory

13 September, 2015

The Institute for Mathematics and its Applications (or IMA, in Minneapolis, Minnesota), is teaming up with the Mathematical Biosciences Institute (or MBI, in Columbus, Ohio). They’re having a big program on control theory and networks:

### At the IMA

Here’s what’s happening at the Institute for Mathematics and its Applications:

Concepts and techniques from control theory are becoming increasingly interdisciplinary. At the same time, trends in modern control theory are influenced and inspired by other disciplines. As a result, the systems and control community is rapidly broadening its scope in a variety of directions. The IMA program is designed to encourage true interdisciplinary research and the cross fertilization of ideas. An important element for success is that ideas flow across disciplines in a timely manner and that the cross-fertilization takes place in unison.

Due to the usefulness of control, talent from control theory is drawn and often migrates to other important areas, such as biology, computer science, and biomedical research, to apply its mathematical tools and expertise. It is vital that while the links are strong, we bring together researchers who have successfully bridged into other disciplines to promote the role of control theory and to focus on the efforts of the controls community. An IMA investment in this area will be a catalyst for many advances and will provide the controls community with a cohesive research agenda.

In all topics of the program the need for research is pressing. For instance, viable implementations of control algorithms for smart grids are an urgent and clearly recognized need with considerable implications for the environment and quality of life. The mathematics of control will undoubtedly influence technology and vice-versa. The urgency for these new technologies suggests that the greatest impact of the program is to have it sooner rather than later.

First trimester (Fall 2015): Networks, whether social, biological, swarms of animals or vehicles, the Internet, etc., constitute an increasingly important subject in science and engineering. Their connectivity and feedback pathways affect robustness and functionality. Such concepts are at the core of a new and rapidly evolving frontier in the theory of dynamical systems and control. Embedded systems and networks are already pervasive in automotive, biological, aerospace, and telecommunications technologies and soon are expected to impact the power infrastructure (smart grids). In this new technological and scientific realm, the modeling and representation of systems, the role of feedback, and the value and cost of information need to be re-evaluated and understood. Traditional thinking that is relevant to a limited number of feedback loops with practically unlimited bandwidth is no longer applicable. Feedback control and stability of network dynamics is a relatively new endeavor. Analysis and control of network dynamics will occupy mostly the first trimester while applications to power networks will be a separate theme during the third trimester. The first trimester will be divided into three workshops on the topics of analysis of network dynamics and regulation, communication and cooperative control over networks, and a separate one on biological systems and networks.

The second trimester (Winter 2016) will have two workshops. The first will be on modeling and estimation (Workshop 4) and the second one on distributed parameter systems and partial differential equations (Workshop 5). The theme of Workshop 4 will be on structure and parsimony in models. The goal is to explore recent relevant theories and techniques that allow sparse representations, rank constrained optimization, and structural constraints in models and control designs. Our intent is to blend a group of researchers in the aforementioned topics with a select group of researchers with interests in a statistical viewpoint. Workshop 5 will focus on distributed systems and related computational issues. One of our aims is to bring control theorists with an interest in optimal control of distributed parameter systems together with mathematicians working on optimal transport theory (in essence an optimal control problem). The subject of optimal transport is rapidly developing with ramifications in probability and statistics (of essence in system modeling and hence of interest to participants in Workshop 4 as well) and in distributed control of PDE’s. Emphasis will also be placed on new tools and new mathematical developments (in PDE’s, computational methods, optimization). The workshops will be closely spaced to facilitate participation in more than one.

The third trimester (Spring 2016) will target applications where the mathematics of systems and control may soon prove to have a timely impact. From the invention of atomic force microscopy at the nanoscale to micro-mirror arrays for a next generation of telescopes, control has played a critical role in sensing and imaging of challenging new realms. At present, thanks to recent technological advances of AFM and optical tweezers, great strides are taking place making it possible to manipulate the biological transport of protein molecules as well as the control of individual atoms. Two intertwined subject areas, quantum and nano control and scientific instrumentation, are seen to blend together (Workshop 6) with partial focus on the role of feedback control and optimal filtering in achieving resolution and performance at such scales. A second theme (Workshop 7) will aim at control issues in distributed hybrid systems, at a macro scale, with a specific focus the “smart grid” and energy applications.

• Workshop 1, Distributed Control and Decision Making Over Networks, 28 September – 2 October 2015.

• Workshop 2, Analysis and Control of Network Dynamics, 19-23 October 2015.

• Workshop 3, Biological Systems and Networks, 11-16 November 2015.

• Workshop 4, Optimization and Parsimonious Modeling, 25-29 January 2016.

• Workshop 5, Computational Methods for Control of Infinite-dimensional Systems, 14-18 March 2016.

• Workshop 6, Quantum and Nano Control, 11-15 April 2016.

• Workshop 7, Control at Large Scales: Energy Markets and Responsive Grids, 9-13 March 2016.

### At the MBI

Here’s what’s going on at the Mathematical Biology Institute:

The MBI network program is part of a yearlong cooperative program with IMA.

Networks and deterministic and stochastic dynamical systems on networks are used as models in many areas of biology. This underscores the importance of developing tools to understand the interplay between network structures and dynamical processes, as well as how network dynamics can be controlled. The dynamics associated with such models are often different from what one might traditionally expect from a large system of equations, and these differences present the opportunity to develop exciting new theories and methods that should facilitate the analysis of specific models. Moreover, a nascent area of research is the dynamics of networks in which the networks themselves change in time, which occurs, for example, in plasticity in neuroscience and in up regulation and down regulation of enzymes in biochemical systems.

There are many areas in biology (including neuroscience, gene networks, and epidemiology) in which network analysis is now standard. Techniques from network science have yielded many biological insights in these fields and their study has yielded many theorems. Moreover, these areas continue to be exciting areas that contain both concrete and general mathematical problems. Workshop 1 explores the mathematics behind the applications in which restrictions on general coupled systems are important. Examples of such restrictions include symmetry, Boolean dynamics, and mass-action kinetics; and each of these special properties permits the proof of theorems about dynamics on these special networks.

Workshop 2 focuses on the interplay between stochastic and deterministic behavior in biological networks. An important related problem is to understand how stochasticity affects parameter estimation. Analyzing the relationship between stochastic changes, network structure, and network dynamics poses mathematical questions that are new, difficult, and fascinating.

In recent years, an increasing number of biological systems have been modeled using networks whose structure changes in time or which use multiple kinds of couplings between the same nodes or couplings that are not just pairwise. General theories such as groupoids and hypergraphs have been developed to handle the structure in some of these more general coupled systems, and specific application models have been studied by simulation. Workshop 3 will bring together theorists, modelers, and experimentalists to address the modeling of biological systems using new network structures and the analysis of such structures.

Biological systems use control to achieve desired dynamics and prevent undesirable behaviors. Consequently, the study of network control is important both to reveal naturally evolved control mechanisms that underlie the functioning of biological systems and to develop human-designed control interventions to recover lost function, mitigate failures, or repurpose biological networks. Workshop 4 will address the challenging subjects of control and observability of network dynamics.

#### Events

Workshop 1: Dynamics in Networks with Special Properties, 25-29 January, 2016.

Workshop 2: The Interplay of Stochastic and Deterministic Dynamics in Networks, 22-26 February, 2016.

Workshop 3: Generalized Network Structures and Dynamics, 21-15 March, 2016.

Workshop 4: Control and Observability of Network Dynamics, 11-15 April, 2016.

You can get more schedule information on these posters:

## A Compositional Framework for Markov Processes

4 September, 2015

This summer my students Brendan Fong and Blake Pollard visited me at the Centre for Quantum Technologies, and we figured out how to understand open continuous-time Markov chains! I think this is a nice step towards understanding the math of living systems.

Admittedly, it’s just a small first step. But I’m excited by this step, since Blake and I have been trying to get this stuff to work for a couple years, and it finally fell into place. And we think we know what to do next.

Here’s our paper:

• John C. Baez, Brendan Fong and Blake S. Pollard, A compositional framework for open Markov processes.

And here’s the basic idea…

### Open detailed balanced Markov processes

A continuous-time Markov chain is a way to specify the dynamics of a population which is spread across some finite set of states. Population can flow between the states. The larger the population of a state, the more rapidly population flows out of the state. Because of this property, under certain conditions the populations of the states tend toward an equilibrium where at any state the inflow of population is balanced by its outflow.

In applications to statistical mechanics, we are often interested in equilibria such that for any two states connected by an edge, say $i$ and $j,$ the flow from $i$ to $j$ equals the flow from $j$ to $i.$ A continuous-time Markov chain with a chosen equilibrium having this property is called ‘detailed balanced‘.

I’m getting tired of saying ‘continuous-time Markov chain’, so from now on I’ll just say ‘Markov process’, just because it’s shorter. Okay? That will let me say the next sentence without running out of breath:

Our paper is about open detailed balanced Markov processes.

Here’s an example:

The detailed balanced Markov process itself consists of a finite set of states together with a finite set of edges between them, with each state $i$ labelled by an equilibrium population $q_i >0,$ and each edge $e$ labelled by a rate constant $r_e > 0.$

These populations and rate constants are required to obey an equation called the ‘detailed balance condition’. This equation means that in equilibrium, the flow from $i$ to $j$ equal the flow from $j$ to $i.$ Do you see how it works in this example?

To get an ‘open’ detailed balanced Markov process, some states are designated as inputs or outputs. In general each state may be specified as both an input and an output, or as inputs and outputs multiple times. See how that’s happening in this example? It may seem weird, but it makes things work better.

People usually say Markov processes are all about how probabilities flow from one state to another. But we work with un-normalized probabilities, which we call ‘populations’, rather than probabilities that must sum to 1. The reason is that in an open Markov process, probability is not conserved: it can flow in or out at the inputs and outputs. We allow it to flow both in and out at both the input states and the output states.

Our most fundamental result is that there’s a category $\mathrm{DetBalMark}$ where a morphism is an open detailed balanced Markov process. We think of it as a morphism from its inputs to its outputs.

We compose morphisms in $\mathrm{DetBalMark}$ by identifying the output states of one open detailed balanced Markov process with the input states of another. The populations of identified states must match. For example, we may compose this morphism $N$:

with the previously shown morphism $M$ to get this morphism $M \circ N$:

And here’s our second most fundamental result: the category $\mathrm{DetBalMark}$ is actually a dagger compact category. This lets us do other stuff with open Markov processes. An important one is ‘tensoring’, which lets us take two open Markov processes like $M$ and $N$ above and set them side by side, giving $M \otimes N$:

The so-called compactness is also important. This means we can take some inputs of an open Markov process and turn them into outputs, or vice versa. For example, using the compactness of $\mathrm{DetBalMark}$ we can get this open Markov process from $M$:

In fact all the categories in our paper are dagger compact categories, and all our functors preserve this structure. Dagger compact categories are a well-known framework for describing systems with inputs and outputs, so this is good.

### The analogy to electrical circuits

In a detailed balanced Markov process, population can flow along edges. In the detailed balanced equilibrium, without any flow of population from outside, the flow along from state $i$ to state $j$ will be matched by the flow back from $j$ to $i.$ The populations need to take specific values for this to occur.

In an electrical circuit made of linear resistors, charge can flow along wires. In equilibrium, without any driving voltage from outside, the current along each wire will be zero. The potentials will be equal at every node.

This sets up an analogy between detailed balanced continuous-time Markov chains and electrical circuits made of linear resistors! I love analogy charts, so this makes me very happy:

 Circuits Detailed balanced Markov processes potential population current flow conductance rate constant power dissipation

This analogy is already well known. Schnakenberg used it in his book Thermodynamic Network Analysis of Biological Systems. So, our main goal is to formalize and exploit it. This analogy extends from systems in equilibrium to the more interesting case of nonequilibrium steady states, which are the main topic of our paper.

Earlier, Brendan and I introduced a way to ‘black box’ a circuit and define the relation it determines between potential-current pairs at the input and output terminals. This relation describes the circuit’s external behavior as seen by an observer who can only perform measurements at the terminals.

An important fact is that black boxing is ‘compositional’: if one builds a circuit from smaller pieces, the external behavior of the whole circuit can be determined from the external behaviors of the pieces. For category theorists, this means that black boxing is a functor!

Our new paper with Blake develops a similar ‘black box functor’ for detailed balanced Markov processes, and relates it to the earlier one for circuits.

When you black box a detailed balanced Markov process, you get the relation between population–flow pairs at the terminals. (By the ‘flow at a terminal’, we more precisely mean the net population outflow.) This relation holds not only in equilibrium, but also in any nonequilibrium steady state. Thus, black boxing an open detailed balanced Markov process gives its steady state dynamics as seen by an observer who can only measure populations and flows at the terminals.

### The principle of minimum dissipation

At least since the work of Prigogine, it’s been widely accepted that a large class of systems minimize entropy production in a nonequilibrium steady state. But people still fight about the the precise boundary of this class of systems, and even the meaning of this ‘principle of minimum entropy production’.

For detailed balanced open Markov processes, we show that a quantity we call the ‘dissipation’ is minimized in any steady state. This is a quadratic function of the populations and flows, analogous to the power dissipation of a circuit made of resistors. We make no claim that this quadratic function actually deserves to be called ‘entropy production’. Indeed, Schnakenberg has convincingly argued that they are only approximately equal.

But still, the ‘dissipation’ function is very natural and useful—and Prigogine’s so-called ‘entropy production’ is also a quadratic function.

### Black boxing

I’ve already mentioned the category $\mathrm{DetBalMark},$ where a morphism is an open detailed balanced Markov process. But our paper needs two more categories to tell its story! There’s the category of circuits, and the category of linear relations.

A morphism in the category $\mathrm{Circ}$ is an open electrical circuit made of resistors: that is, a graph with each edge labelled by a ‘conductance’ $c_e > 0,$ together with specified input and output nodes:

A morphism in the category $\mathrm{LinRel}$ is a linear relation $L : U \leadsto V$ between finite-dimensional real vector spaces $U$ and $V.$ This is nothing but a linear subspace $L \subseteq U \oplus V.$ Just as relations generalize functions, linear relations generalize linear functions!

In our previous paper, Brendan and I introduced these two categories and a functor between them, the ‘black box functor’:

$\blacksquare : \mathrm{Circ} \to \mathrm{LinRel}$

The idea is that any circuit determines a linear relation between the potentials and net current flows at the inputs and outputs. This relation describes the behavior of a circuit of resistors as seen from outside.

Our new paper introduces a black box functor for detailed balanced Markov processes:

$\square : \mathrm{DetBalMark} \to \mathrm{LinRel}$

We draw this functor as a white box merely to distinguish it from the other black box functor. The functor $\square$ maps any detailed balanced Markov process to the linear relation obeyed by populations and flows at the inputs and outputs in a steady state. In short, it describes the steady state behavior of the Markov process ‘as seen from outside’.

How do we manage to black box detailed balanced Markov processes? We do it using the analogy with circuits!

### The analogy becomes a functor

Every analogy wants to be a functor. So, we make the analogy between detailed balanced Markov processes and circuits precise by turning it into a functor:

$K : \mathrm{DetBalMark} \to \mathrm{Circ}$

This functor converts any open detailed balanced Markov process into an open electrical circuit made of resistors. This circuit is carefully chosen to reflect the steady-state behavior of the Markov process. Its underlying graph is the same as that of the Markov process. So, the ‘states’ of the Markov process are the same as the ‘nodes’ of the circuit.

Both the equilibrium populations at states of the Markov process and the rate constants labelling edges of the Markov process are used to compute the conductances of edges of this circuit. In the simple case where the Markov process has exactly one edge from any state $i$ to any state $j,$ the rule is this:

$C_{i j} = H_{i j} q_j$

where:

$q_j$ is the equilibrium population of the $j$th state of the Markov process,

$H_{i j}$ is the rate constant for the edge from the $j$th state to the $i$th state of the Markov process, and

$C_{i j}$ is the conductance (that is, the reciprocal of the resistance) of the wire from the $j$th node to the $i$th node of the resulting circuit.

The detailed balance condition for Markov processes says precisely that the matrix $C_{i j}$ is symmetric! This is just right for an electrical circuit made of resistors, since it means that the resistance of the wire from node $i$ to node $j$ equals the resistance of the same wire in the reverse direction, from node $j$ to node $i.$

### A triangle of functors

If you paid careful attention, you’ll have noticed that I’ve described a triangle of functors:

And if you know anything about how category theorists think, you’ll be wondering if this diagram commutes.

In fact, this triangle of functors does not commute! However, a general lesson of category theory is that we should only expect diagrams of functors to commute up to natural isomorphism, and this is what happens here:

The natural transformation $\alpha$ ‘corrects’ the black box functor for resistors to give the one for detailed balanced Markov processes.

The functors $\square$ and $\blacksquare \circ K$ are actually equal on objects. An object in $\mathrm{DetBalMark}$ is a finite set $X$ with each element $i \in X$ labelled a positive populations $q_i.$ Both functors map this object to the vector space $\mathbb{R}^X \oplus \mathbb{R}^X.$ For the functor $\square,$ we think of this as a space of population-flow pairs. For the functor $\blacksquare \circ K,$ we think of it as a space of potential-current pairs. The natural transformation $\alpha$ then gives a linear relation

$\alpha_{X,q} : \mathbb{R}^X \oplus \mathbb{R}^X \leadsto \mathbb{R}^X \oplus \mathbb{R}^X$

in fact an isomorphism of vector spaces, which converts potential-current pairs into population-flow pairs in a manner that depends on the $q_i.$ I’ll skip the formula; it’s in the paper.

But here’s the key point. The naturality of $\alpha$ actually allows us to reduce the problem of computing the functor $\square$ to the problem of computing $\blacksquare.$ Suppose

$M: (X,q) \to (Y,r)$

is any morphism in $\mathrm{DetBalMark}.$ The object $(X,q)$ is some finite set $X$ labelled by populations $q,$ and $(Y,r)$ is some finite set $Y$ labelled by populations $r.$ Then the naturality of $\alpha$ means that this square commutes:

Since $\alpha_{X,q}$ and $\alpha_{Y,r}$ are isomorphisms, we can solve for the functor $\square$ as follows:

$\square(M) = \alpha_Y \circ \blacksquare K(M) \circ \alpha_X^{-1}$

This equation has a clear intuitive meaning! It says that to compute the behavior of a detailed balanced Markov process, namely $\square(f),$ we convert it into a circuit made of resistors and compute the behavior of that, namely $\blacksquare K(f).$ This is not equal to the behavior of the Markov process, but we can compute that behavior by converting the input populations and flows into potentials and currents, feeding them into our circuit, and then converting the outputs back into populations and flows.

### What we really do

So that’s a sketch of what we do, and I hope you ask questions if it’s not clear. But I also hope you read our paper! Here’s what we actually do in there. After an introduction and summary of results:

• Section 3 defines open Markov processes and the open master equation.

• Section 4 introduces detailed balance for open Markov
processes.

• Section 5 recalls the principle of minimum power
for open circuits made of linear resistors, and explains how to black box them.

• Section 6 introduces the principle of minimum dissipation for open detailed balanced Markov processes, and describes how to black box these.

• Section 7 states the analogy between circuits and detailed balanced Markov processes in a formal way.

• Section 8 describes how to compose open Markov processes, making them into the morphisms of a category.

• Section 9 does the same for detailed balanced Markov processes.

• Section 10 describes the ‘black box functor’ that sends any open detailed balanced Markov process to the linear relation describing its external behavior, and recalls the similar functor for circuits.

• Section 11 makes the analogy between between open detailed balanced Markov processes and open circuits even more formal, by making it into a functor. We prove that together with the two black box functors, this forms a triangle that commutes up to natural isomorphism.

• Section 12 is about geometric aspects of this theory. We show that the linear relations in the image of these black box functors are Lagrangian relations between symplectic vector spaces. We also show that the master equation can be seen as a gradient flow equation.

• Section 13 is a summary of what we have learned.

Finally, Appendix A is a quick tutorial on decorated cospans. This is a key mathematical tool in our work, developed by Brendan in an earlier paper.

## Trends in Reaction Network Theory (Part 2)

1 July, 2015

Here in Copenhagen we’ll soon be having a bunch of interesting talks on chemical reaction networks:

Workshop on Mathematical Trends in Reaction Network Theory, 1-3 July 2015, Department of Mathematical Sciences, University of Copenhagen. Organized by Elisenda Feliu and Carsten Wiuf.

Looking through the abstracts, here are a couple that strike me.

First of all, Gheorghe Craciun claims to have proved the biggest open conjecture in this field: the Global Attractor Conjecture!

• Gheorge Craciun, Toric differential inclusions and a proof of the global attractor conjecture.

This famous old conjecture says that for a certain class of chemical reactions, the ones coming from ‘complex balanced reaction networks’, the chemicals will approach equilibrium no matter what their initial concentrations are. Here’s what Craciun says:

Abstract. In a groundbreaking 1972 paper Fritz Horn and Roy Jackson showed that a complex balanced mass-action system must have a unique locally stable equilibrium within any compatibility class. In 1974 Horn conjectured that this equilibrium is a global attractor, i.e., all solutions in the same compatibility class must converge to this equilibrium. Later, this claim was called the Global Attractor Conjecture, and it was shown that it has remarkable implications for the dynamics of large classes of polynomial and power-law dynamical systems, even if they are not derived from mass-action kinetics. Several special cases of this conjecture have been proved during the last decade. We describe a proof of the conjecture in full generality. In particular, it will follow that all detailed balanced mass action systems and all deficiency zero mass-action systems have the global attractor property. We will also discuss some implications for biochemical mechanisms that implement noise filtering and cellular homeostasis.

Manoj Gopalkrishnan wrote a great post explaining the concept of complex balanced reaction network here on Azimuth, so if you want to understand the conjecture you could start there.

Even better, Manoj is talking here about a way to do statistical inference with chemistry! His talk is called ‘Statistical inference with a chemical soup’:

Abstract. The goal is to design an “intelligent chemical soup” that can do statistical inference. This may have niche technological applications in medicine and biological research, as well as provide fundamental insight into the workings of biochemical reaction pathways. As a first step towards our goal, we describe a scheme that exploits the remarkable mathematical similarity between log-linear models in statistics and chemical reaction networks. We present a simple scheme that encodes the information in a log-linear model as a chemical reaction network. Observed data is encoded as initial concentrations, and the equilibria of the corresponding mass-action system yield the maximum likelihood estimators. The simplicity of our scheme suggests that molecular environments, especially within cells, may be particularly well suited to performing statistical computations.

It’s based on this paper:

• Manoj Gopalkrishnan, A scheme for molecular computation of maximum likelihood estimators for log-linear models.

I’m not sure, but this idea may exploit existing analogies between the approach to equilibrium in chemistry, the approach to equilibrium in evolutionary game theory, and statistical inference. You may have read Marc Harper’s post about that stuff!

David Doty is giving a broader review of ‘Computation by (not about) chemistry’:

Abstract. The model of chemical reaction networks (CRNs) is extensively used throughout the natural sciences as a descriptive language for existing chemicals. If we instead think of CRNs as a programming language for describing artificially engineered chemicals, what sorts of computations are possible for these chemicals to achieve? The answer depends crucially on several formal choices:

1) Do we treat matter as infinitely divisible (real-valued concentrations) or atomic (integer-valued counts)?

2) How do we represent the input and output of the computation (e.g., Boolean presence or absence of species, positive numbers directly represented by counts/concentrations, positive and negative numbers represented indirectly by the difference between counts/concentrations of a pair of species)?

3) Do we assume mass-action rate laws (reaction rates proportional to reactant counts/concentrations) or do we insist the system works correctly under a broader class of rate laws?

The talk will survey several recent results and techniques. A primary goal of the talk is to convey the “programming perspective”: rather than asking “What does chemistry do?”, we want to understand “What could chemistry do?” as well as “What can chemistry provably not do?”

I’m really interested in chemical reaction networks that appear in biological systems, and there will be lots of talks about that. For example, Ovidiu Radulescu will talk about ‘Taming the complexity of biochemical networks through model reduction and tropical geometry’. Model reduction is the process of simplifying complicated models while preserving at least some of their good features. Tropical geometry is a cool version of algebraic geometry that uses the real numbers with minimization as addition and addition as multiplication. This number system underlies the principle of least action, or the principle of maximum energy. Here is Radulescu’s abstract:

Abstract. Biochemical networks are used as models of cellular physiology with diverse applications in biology and medicine. In the absence of objective criteria to detect essential features and prune secondary details, networks generated from data are too big and therefore out of the applicability of many mathematical tools for studying their dynamics and behavior under perturbations. However, under circumstances that we can generically denote by multi-scaleness, large biochemical networks can be approximated by smaller and simpler networks. Model reduction is a way to find these simpler models that can be more easily analyzed. We discuss several model reduction methods for biochemical networks with polynomial or rational rate functions and propose as their common denominator the notion of tropical equilibration, meaning finite intersection of tropical varieties in algebraic geometry. Using tropical methods, one can strongly reduce the number of variables and parameters of biochemical network. For multi-scale networks, these reductions are computed symbolically on orders of magnitude of parameters and variables, and are valid in wide domains of parameter and phase spaces.

I’m talking about the analogy between probabilities and quantum amplitudes, and how this makes chemistry analogous to particle physics. You can see two versions of my talk here, but I’ll be giving the ‘more advanced’ version, which is new:

Abstract. Some ideas from quantum theory are just beginning to percolate back to classical probability theory. For example, the master equation for a chemical reaction network describes the interactions of molecules in a stochastic rather than quantum way. If we look at it from the perspective of quantum theory, this formalism turns out to involve creation and annihilation operators, coherent states and other well-known ideas—but with a few big differences.

Anyway, there are a lot more talks, but if I don’t have breakfast and walk over to the math department, I’ll miss those talks!