## Corelations in Network Theory

2 February, 2016

Category theory reduces a large chunk of math to the clever manipulation of arrows. One of the fun things about this is that you can often take a familiar mathematical construction, think of it category-theoretically, and just turn around all the arrows to get something new and interesting!

In math we love functions. If we have a function

$f: X \to Y$

we can formally turn around the arrow to think of $f$ as something going back from $Y$ back to $X$. But this something is usually not a function: it’s called a ‘cofunction’. A cofunction from $Y$ to $X$ is simply a function from $X$ to $Y.$

Cofunctions are somewhat interesting, but they’re really just functions viewed through a looking glass, so they don’t give much new—at least, not by themselves.

The game gets more interesting if we think of functions and cofunctions as special sorts of relations. A relation from $X$ to $Y$ is a subset

$R \subseteq X \times Y$

It’s a function when for each $x \in X$ there’s a unique $y \in Y$ with $(x,y) \in R.$ It’s a cofunction when for each $y \in Y$ there’s a unique $x \in X$ with $(x,y) \in R.$

Just as we can compose functions, we can compose relations. Relations have certain advantages over functions: for example, we can ‘turn around’ any relation $R$ from $X$ to $Y$ and get a relation $R^\dagger$ from $Y$ to $X:$

$R^\dagger = \{(y,x) : \; (x,y) \in R \}$

If we turn around a function we get a cofunction, and vice versa. But we can also do other fun things: for example, since both functions and cofunctions are relations, we can compose a function and a cofunction and get a relation.
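These definitions are easy to play with on a computer. Here is a minimal sketch in Python (the helper names are my own): a relation is a set of pairs, and we can test the function and cofunction conditions, turn a relation around, and compose a function with a cofunction to get a relation.

```python
def is_function(R, X, Y):
    # R is a function when each x in X relates to exactly one y in Y.
    return all(sum(1 for y in Y if (x, y) in R) == 1 for x in X)

def is_cofunction(R, X, Y):
    # R is a cofunction when each y in Y relates to exactly one x in X.
    return all(sum(1 for x in X if (x, y) in R) == 1 for y in Y)

def dagger(R):
    # Turn the relation around.
    return {(y, x) for (x, y) in R}

def compose(R, S):
    # Relational composition: (x, z) whenever some y has
    # (x, y) in R and (y, z) in S.
    return {(x, z) for (x, y) in R for (w, z) in S if y == w}

X, Y = {1, 2}, {'a', 'b'}
f = {(1, 'a'), (2, 'a')}               # a function from X to Y
print(is_function(f, X, Y))            # True
print(is_cofunction(dagger(f), Y, X))  # True: its dagger is a cofunction
print(sorted(compose(f, dagger(f))))   # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

Composing the function $f$ with its dagger relates every element of $X$ to every element of $X$: a relation that is neither a function nor a cofunction.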

Of course, relations also have certain disadvantages compared to functions. But it’s utterly clear by now that the category $\mathrm{FinRel},$ where the objects are finite sets and the morphisms are relations, is very important.

So far, so good. But what happens if we take the definition of ‘relation’ and turn all the arrows around?

There are actually several things I could mean by this question, some more interesting than others. But one of them gives a very interesting new concept: the concept of ‘corelation’. And two of my students have just written a very nice paper on corelations:

• Brandon Coya and Brendan Fong, Corelations are the prop for extraspecial commutative Frobenius monoids.

Here’s why this paper is important for network theory: corelations between finite sets are exactly what we need to describe electrical circuits made of ideal conductive wires! A corelation from a finite set $X$ to a finite set $Y$ can be drawn this way:

I have drawn more wires than strictly necessary: I’ve drawn a wire between two points whenever I want current to be able to flow between them. But there’s a reason I did this: a corelation from $X$ to $Y$ simply tells us when current can flow from one point in either of these sets to any other point in these sets.

Of course circuits made solely of conductive wires are not very exciting for electrical engineers. But in an earlier paper, Brendan introduced corelations as an important stepping-stone toward more general circuits:

• John Baez and Brendan Fong, A compositional framework for passive linear circuits. (Blog article here.)

The key point is simply that you use conductive wires to connect resistors, inductors, capacitors, batteries and the like and build interesting circuits—so if you don’t fully understand the math of conductive wires, you’re limited in your ability to understand circuits in general!

In their new paper, Brendan teamed up with Brandon Coya, and they figured out all the rules obeyed by the category $\mathrm{FinCorel},$ where the objects are finite sets and the morphisms are corelations. I’ll explain these rules later.

This sort of analysis had previously been done for $\mathrm{FinRel},$ and it turns out there’s a beautiful analogy between the two cases! Here is a chart displaying the analogy:

| Spans | Cospans |
|---|---|
| extra bicommutative bimonoids | special commutative Frobenius monoids |
| Relations | Corelations |
| extraspecial bicommutative bimonoids | extraspecial commutative Frobenius monoids |

I’m sure this will be cryptic to the nonmathematicians reading this, and even many mathematicians—but the paper explains what’s going on here.

I’ll actually say what an ‘extraspecial commutative Frobenius monoid’ is later in this post. This is a terse way of listing all the rules obeyed by corelations between finite sets—and thus, all the rules obeyed by conductive wires.

But first, let’s talk about something simpler.

### What is a corelation?

Just as we can define functions as relations of a special sort, we can also define relations in terms of functions. A relation from $X$ to $Y$ is a subset

$R \subseteq X \times Y$

but we can think of this as an equivalence class of one-to-one functions

$i: R \to X \times Y$

Why an equivalence class? The image of $i$ is our desired subset of $X \times Y.$ The set $R$ here could be replaced by any isomorphic set; its only role is to provide ‘names’ for the elements of $X \times Y$ that are in the image of $i.$

Now we have a relation described as an arrow, or really an equivalence class of arrows. Next, let’s turn the arrow around!

There are different things I might mean by that, but we want to do it cleverly. When we turn arrows around, the concept of product (for example, cartesian product $X \times Y$ of sets) turns into the concept of sum (for example, disjoint union $X + Y$ of sets). Similarly, the concept of monomorphism (such as a one-to-one function) turns into the concept of epimorphism (such as an onto function). If you don’t believe me, click on the links!

So, we should define a corelation from a set $X$ to a set $Y$ to be an equivalence class of onto functions

$p: X + Y \to C$

Why an equivalence class? The set $C$ here could be replaced by any isomorphic set; its only role is to provide ‘names’ for the sets of elements of $X + Y$ that get mapped to the same thing via $p.$

In simpler terms, a corelation from a set $X$ to a set $Y$ is just a partition of the disjoint union $X + Y.$ So, it looks like this:

If we like, we can then draw a line connecting any two points that lie in the same part of the partition:

These lines determine the corelation, so we can also draw a corelation this way:

This is why corelations describe circuits made solely of wires!
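Here is a minimal sketch of this idea in Python, with a representation of my own choosing: a corelation is a partition of the disjoint union, stored as a set of frozensets of tagged elements. Composing two corelations means gluing their partitions along the middle set with union-find, then forgetting the middle set, just as wiring two circuits together connects points that share a wire.

```python
def compose(P, Q):
    """Compose corelations P: X -> Y and Q: Y -> Z by gluing their
    partitions along Y with union-find, then forgetting Y."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        if parent[a] != a:
            parent[a] = find(parent[a])  # path compression
        return parent[a]

    # Merge everything lying in a common block of P or Q.
    for block in list(P) + list(Q):
        first = next(iter(block))
        for e in block:
            parent[find(e)] = find(first)

    # Keep only the X and Z elements, grouped by their glued class.
    classes = {}
    for e in list(parent):
        if e[0] in ('X', 'Z'):
            classes.setdefault(find(e), set()).add(e)
    return {frozenset(c) for c in classes.values()}

# Current can flow X1 -- Y1 and Y1 -- Z1, so composing connects X1 to Z1;
# the wire through Y2 is left dangling and disappears.
P = {frozenset({('X', 1), ('Y', 1)}), frozenset({('Y', 2)})}
Q = {frozenset({('Y', 1), ('Z', 1)}), frozenset({('Y', 2)})}
print(compose(P, Q))   # {frozenset({('X', 1), ('Z', 1)})}
```

Elements are tagged `('X', x)`, `('Y', y)`, `('Z', z)` to keep the disjoint union genuinely disjoint.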

### The rules governing corelations

The main result in Brandon and Brendan’s paper is that $\mathrm{FinCorel}$ is equivalent to the PROP for extraspecial commutative Frobenius monoids. That’s a terse way of stating the laws governing $\mathrm{FinCorel}.$

Let me just show you the most important laws. For each of these laws I’ll draw two circuits made of wires, and write an equals sign asserting that they give the same corelation from a set $X$ to a set $Y.$ The inputs $X$ of each circuit are on top, and the outputs $Y$ are at the bottom. I’ll draw 3-way junctions as little triangles, but don’t worry about that. When we compose two corelations we may get a wire left in mid-air, not connected to the inputs or outputs. We draw the end of such a wire as a little circle.

There are some laws called the ‘commutative monoid’ laws:

and an upside-down version called the ‘cocommutative comonoid’ laws:

Then we have ‘Frobenius laws’:

and finally we have the ‘special’ and ‘extra’ laws:

All other laws can be derived from these in a systematic way.

Commutative Frobenius monoids obey the commutative monoid laws, the cocommutative comonoid laws and the Frobenius laws. They play a fundamental role in 2d topological quantum field theory. Special Frobenius monoids are also well-known. But the ‘extra’ law, which says that a little piece of wire not connected to anything can be thrown away with no effect, is less well studied. Jason Erbele and I gave it this name in our work on control theory:

• John Baez and Jason Erbele, Categories in control. (Blog article here.)

### For more

David Ellerman has spent a lot of time studying what would happen to mathematics if we turned around a lot of arrows in a certain systematic way. In particular, just as the concept of relation would be replaced by the concept of corelation, the concept of subset would be replaced by the concept of partition. You can see how it fits together: just as a relation from $X$ to $Y$ is a subset of $X \times Y,$ a corelation from $X$ to $Y$ is a partition of $X + Y.$

There’s a lattice of subsets of a set:

In logic these subsets correspond to propositions, and the lattice operations are the logical operations ‘and’ and ‘or’. But there’s also a lattice of partitions of a set:

In Ellerman’s vision, this lattice of partitions gives a new kind of logic. You can read about it here:

• David Ellerman, Introduction to partition logic, Logic Journal of the Interest Group in Pure and Applied Logic 22 (2014), 94–125.

As mentioned, the main result in Brandon and Brendan’s paper is that $\mathrm{FinCorel}$ is equivalent to the PROP for extraspecial commutative Frobenius monoids. After they proved this, they noticed that the result has also been stated in other language and proved in other ways by two other authors:

• Fabio Zanasi, Interacting Hopf Algebras—the Theory of Linear Systems, PhD thesis, École Normale Supérieure de Lyon, 2015.

• K. Dosen and Z. Petrić, Syntax for split preorders, Annals of Pure and Applied Logic 164 (2013), 443–481.

Unsurprisingly, I prefer Brendan and Brandon’s approach to deriving the result. But it’s nice to see different perspectives!

## The Internal Model Principle

27 January, 2016

“Every good key must be a model of the lock it opens.”

That sentence states an obvious fact, but perhaps also a profound insight if we interpret it generally enough.

That sentence is also the title of a paper:

• Daniel L. Scholten, Every good key must be a model of the lock it opens (the Conant & Ashby Theorem revisited), 2010.

Scholten gives a lot of examples, including these:

• A key is a model of a lock’s keyhole.

• A city street map is a model of the actual city streets

• A restaurant menu is a model of the food the restaurant prepares and sells.

• Honey bees use a kind of dance to model the location of a source of nectar.

• An understanding of some phenomenon (for example a physicist’s understanding of lightning) is a mental model of the actual phenomenon.

This line of thought has an interesting application to control theory. It suggests that to do the best job of regulating some system, a control apparatus should include a model of that system.

Indeed, much earlier, Conant and Ashby tried to turn this idea into a theorem, the ‘good regulator theorem’:

• Roger C. Conant and W. Ross Ashby, Every good regulator of a system must be a model of that system, International Journal of Systems Science 1 (1970), 89–97.

Scholten’s paper is heavily based on this earlier paper. He summarizes it as follows:

What all of this means, more or less, is that the pursuit of a goal by some dynamic agent (Regulator) in the face of a source of obstacles (System) places at least one particular and unavoidable demand on that agent, which is that the agent’s behaviors must be executed in such a reliable and predictable way that they can serve as a representation (Model) of that source of obstacles.

It’s not clear that this is true, but it’s an appealing thought.

A particularly self-referential example arises when the regulator is some organism and the System is the world it lives in, including itself. In this case, it seems the regulator should include a model of itself! This would lead, ultimately, to self-awareness.

It all sounds great. But Scholten raises an obvious question: if Conant and Ashby’s theorem is so great, why isn’t it better known? Scholten puts it quite vividly:

Given the preponderance of control-models that are used by humans (the evidence for this preponderance will be surveyed in the latter part of the paper), and especially given the obvious need to regulate that system, one might guess that the C&A theorem would be at least as famous as, say, the Pythagorean Theorem ($a^2 + b^2 = c^2$), the Einstein mass-energy equivalence ($E = mc^2,$ which can be seen on T-shirts and bumper stickers), or the DNA double helix (which actually shows up in TV crime dramas and movies about super heroes). And yet, it would appear that relatively few lay-persons have ever even heard of C&A’s important prerequisite to successful regulation.

There could be various explanations. But here’s mine: when I tried to read Conant and Ashby’s paper, I got stuck. They use some very basic mathematical notation in nonstandard ways, and they don’t clearly state the hypotheses and conclusion of their theorem.

Luckily, the paper is short, and the argument, while mysterious, seems simple. So, I immediately felt I should be able to dream up the hypotheses, conclusion, and argument based on the hints given.

Scholten’s paper didn’t help much, since he says:

Throughout the following discussion I will assume that the reader has studied Conant & Ashby’s original paper, possesses the level of technical competence required to understand their proof, and is familiar with the components of the basic model that they used to prove their theorem [….]

However, I have a guess about the essential core of Conant and Ashby’s theorem. So, I’ll state that, and then say more about their setup.

Needless to say, I looked around to see if someone else had already done the work of figuring out what Conant and Ashby were saying. The best thing I found was this:

• B. A. Francis and W. M. Wonham, The internal model principle of control theory, Automatica 12 (1976) 457–465.

This paper works in a more specialized context: linear control theory. They’ve got a linear system or ‘plant’ responding to some input, a regulator or ‘compensator’ that is trying to make the plant behave in a desired way, and a ‘disturbance’ that affects the plant in some unwanted way. They prove that to perfectly correct for the disturbance, the compensator must contain an ‘internal model’ of the disturbance.

I’m probably stating this a bit incorrectly. This paper is much more technical, but it seems to be more careful in stating assumptions and conclusions. In particular, they seem to give a precise definition of an ‘internal model’. And I read elsewhere that the ‘internal model principle’ proved here has become a classic result in control theory!

This paper says that Conant and Ashby’s paper provided “plausibility arguments in favor of the internal model idea”. So, perhaps Conant and Ashby inspired Francis and Wonham, and were then largely forgotten.

### My guess

My guess is that Conant and Ashby’s theorem boils down to this:

Theorem. Let $R$ and $S$ be finite sets, and fix a probability distribution $p$ on $S$. Suppose $q$ is any probability distribution on $R \times S$ such that

$\displaystyle{ p(s) = \sum_{r \in R} q(r,s) \; \textrm{for all} \; s \in S}$

Let $H(p)$ be the Shannon entropy of $p$ and let $H(q)$ be the Shannon entropy of $q.$ Then

$H(q) \ge H(p)$

and equality is achieved if there is a function

$h: S \to R$

such that

$q(r,s) = \left\{\begin{array}{cc} p(s) & \textrm{if} \; r = h(s) \\ 0 & \textrm{otherwise} \end{array} \right.$       █

Note that this is not an ‘if and only if’.

The proof of this is pretty easy for anyone who knows a bit about probability theory and entropy. I can restate it using a bit of standard jargon, which may make it more obvious to experts. We’ve got an $S$-valued random variable, say $\textbf{s}.$ We want to extend it to an $R \times S$-valued random variable $(\textbf{r}, \textbf{s})$ whose entropy is as small as possible. We can achieve this by choosing a function $h: S \to R$ and letting $\textbf{r} = h(\textbf{s}).$

Here’s the point: if we make $\textbf{r}$ be a function of $\textbf{s},$ we aren’t adding any extra randomness, so the entropy doesn’t go up.
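The lemma is easy to check numerically. Here is a small sketch (the distributions are made-up examples): any joint distribution $q$ on $R \times S$ with marginal $p$ on $S$ has $H(q) \ge H(p),$ and a deterministic extension via a function $h: S \to R$ achieves equality.

```python
import math
import random

def H(dist):
    """Shannon entropy (in bits) of a distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

S = ['s1', 's2', 's3']
R = ['r1', 'r2']
p = {'s1': 0.5, 's2': 0.3, 's3': 0.2}

# A generic joint q on R x S with marginal p on S: split each p(s) randomly.
random.seed(0)
q = {}
for s in S:
    t = random.random()
    q[('r1', s)] = p[s] * t
    q[('r2', s)] = p[s] * (1 - t)
print(H(q) >= H(p))   # True: extending can only add entropy

# A deterministic extension via h: S -> R achieves equality.
h = {'s1': 'r1', 's2': 'r2', 's3': 'r1'}
q_det = {(r, s): (p[s] if r == h[s] else 0.0) for r in R for s in S}
print(abs(H(q_det) - H(p)) < 1e-12)   # True
```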

What in the world does this have to do with a good regulator containing a model of the system it’s regulating?

Well, I can’t explain that as well as I’d like—sorry. But the rough idea seems to be this. Suppose that $S$ is a system with a given random behavior, and $R$ is another system, the regulator. If we want the combination of the system and regulator to behave as ‘nonrandomly’ as possible, we can let the state of the regulator be a function of the state of the system.

This theorem is actually a ‘lemma’ in Conant and Ashby’s paper. Let’s look at their setup, and the ‘good regulator theorem’ as they actually state it.

### Their setup

Conant and Ashby consider five sets and three functions. In a picture:

The sets are these:

• A set $Z$ of possible outcomes.

• A goal: some subset $G \subseteq Z$ of good outcomes.

• A set $D$ of disturbances, which I might prefer to call ‘inputs’.

• A set $S$ of states of some system that is affected by the disturbances.

• A set $R$ of states of some regulator that is also affected by the disturbances.

The functions are these:

• A function $\phi : D \to S$ saying how a disturbance determines a state of the system.

• A function $\rho: D \to R$ saying how a disturbance determines a state of the regulator.

• A function $\psi: S \times R \to Z$ saying how a state of the system and a state of the regulator determine an outcome.

Of course we want some conditions on these maps. What we want, I guess, is for the outcome to be good regardless of the disturbance. I might say that as follows: for every $d \in D$ we have

$\psi(\phi(d), \rho(d)) \in G$
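To see what this condition demands, here is a toy instance in Python (all the sets and maps are hypothetical examples of mine, not from Conant and Ashby's paper). A brute-force search over regulator policies $\rho$ shows that the only one keeping the outcome in $G$ for every disturbance is the one that tracks the system's state:

```python
from itertools import product

D = ['d1', 'd2']           # disturbances
S = {'d1': 0, 'd2': 1}     # phi: D -> S, system state caused by each disturbance
R = [0, 1]                 # regulator states
G = {'ok'}                 # the goal: good outcomes

def psi(s, r):
    # Outcome is good exactly when the regulator matches the system.
    return 'ok' if r == s else 'bad'

# Search for regulator policies rho: D -> R keeping the outcome in G.
good_policies = [dict(zip(D, rs)) for rs in product(R, repeat=len(D))
                 if all(psi(S[d], dict(zip(D, rs))[d]) in G for d in D)]
print(good_policies)   # [{'d1': 0, 'd2': 1}] -- rho must mirror the system
```

In this tiny example the good regulator literally copies the system's state: a crude illustration of "the regulator must be a model of the system".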

Unfortunately Conant and Ashby say they want this:

$\rho \subset [\psi^{-1}(G)]\phi$

I can’t parse this: they’re using math notation in ways I don’t recognize. Can you figure out what they mean, and whether it matches my guess above?

Then, after a lot of examples and stuff, they state their theorem:

Theorem. The simplest optimal regulator $R$ of a reguland $S$ produces events $R$ which are related to events $S$ by a mapping $h: S \to R.$

Clearly I’ve skipped over too much! This barely makes any sense at all.

Unfortunately, looking at the text before the theorem, I don’t see these terms being explained. Furthermore, their ‘proof’ introduces extra assumptions that were not mentioned in the statement of the theorem. It begins:

The sets $R, S,$ and $Z$ and the mapping $\psi: R \times S \to Z$ are presumed given. We will assume that over the set $S$ there exists a probability distribution $p(S)$ which gives the relative frequencies of the events in $S.$ We will further assume that the behaviour of any particular regulator $R$ is specified by a conditional distribution $p(R|S)$ giving, for each event in $S,$ a distribution on the regulatory events in $R.$

Get it? Now they’re saying the state of the regulator $R$ depends on the state of the system $S$ via a conditional probability distribution $p(r|s)$ where $r \in R$ and $s \in S.$ It’s odd that they didn’t mention this earlier! Their picture made it look like the state of the regulator is determined by the ‘disturbance’ via the function $\rho: D \to R.$ But okay.

They’re also assuming there’s a probability distribution on $S.$ They use this and the above conditional probability distribution to get a probability distribution on $R.$

In fact, the set $D$ and the functions out of this set seem to play no role in their proof!

It’s unclear to me exactly what we’re given, what we get to choose, and what we’re trying to optimize. They do try to explain this. Here’s what they say:

Now $p(S)$ and $p(R|S)$ jointly determine $p(R,S)$ and hence $p(Z)$ and $H(Z),$ the entropy in the set of outcomes:

$\displaystyle{ H(Z) = - \sum_{z \in Z} p(z) \log (p(z)) }$

With $p(S)$ fixed, the class of optimal regulators therefore corresponds to the class of optimal distributions $p(R|S)$ for which $H(Z)$ is minimal. We will call this class of optimal distributions $\pi.$

I could write a little essay on why this makes me unhappy, but never mind. I’m used to the habit of using the same letter $p$ to stand for probability distributions on lots of different sets: folks let the argument of $p$ say which set they have in mind at any moment. So, they’re starting with a probability distribution on $S$ and a conditional probability distribution on $r \in R$ given $s \in S.$ They’re using these to determine a probability distribution on $R \times S.$ Then, presumably using the map $\psi: S \times R \to Z,$ they get a probability distribution on $Z.$ $H(Z)$ is the entropy of the probability distribution on $Z,$ and for some reason they are trying to minimize this.

(Where did the subset $G \subseteq Z$ of ‘good’ outcomes go? Shouldn’t that play a role? Oh well.)

I believe the claim is that when this entropy is minimized, there’s a function $h : S \to R$ such that

$p(r|s) = 1 \; \textrm{if} \; r = h(s)$

This says that the state of the regulator should be completely determined by the state of the system. And this, I believe, is what they mean by

Every good regulator of a system must be a model of that system.

I hope you understand: I’m not worrying about whether the setup is a good one, e.g. sufficiently general for real-world applications. I’m just trying to figure out what the setup actually is, what Conant and Ashby’s theorem actually says, and whether it’s true.

I think I’ve just made a lot of progress. Surely this was no fun to read. But I found it useful to write it.

## Operads and Phylogenetic Trees

21 December, 2015

A few years ago, after hearing Susan Holmes speak about the mathematics of phylogenetic trees, I became interested in their connection to algebraic topology. I wrote an article about this here:

• John Baez, Operads and the tree of life, 6 July 2011.

In trying to make the ideas precise I recruited the help of Nina Otter, who was then a graduate student at ETH Zürich. She came to Riverside and we started to work together.

Now Nina is a grad student at Oxford working on mathematical biology with Heather Harrington. I visited her last summer and we made more progress… but then she realized that our paper needed another big theorem, a result relating our topology on the space of phylogenetic trees to the topology described by Susan Holmes and her coauthors here:

• Louis J. Billera, Susan P. Holmes and Karen Vogtmann, Geometry of the space of phylogenetic trees, Advances in Applied Mathematics 27 (2001), 733–767.

It took another half year to finish things up. I could never have done this myself.

But now we’re done! Here’s our paper:

• John Baez and Nina Otter, Operads and phylogenetic trees.

### The basic idea

Trees are important, not only in mathematics, but also in biology. The most important is the ‘tree of life’ relating all organisms that have ever lived on Earth. Darwin drew this sketch of it in 1837:

He wrote about it in On the Origin of Species:

The affinities of all the beings of the same class have sometimes been represented by a great tree. I believe this simile largely speaks the truth. The green and budding twigs may represent existing species; and those produced during former years may represent the long succession of extinct species. At each period of growth all the growing twigs have tried to branch out on all sides, and to overtop and kill the surrounding twigs and branches, in the same manner as species and groups of species have at all times overmastered other species in the great battle for life.

Now we know that the tree of life is not really a tree in the mathematical sense. One reason is ‘endosymbiosis’: the incorporation of one organism together with its genetic material into another, as probably happened with the mitochondria in our cells and also the plastids that hold chlorophyll in plants. Another is ‘horizontal gene transfer’: the passing of genetic material from one organism to another, which happens frequently with bacteria. So, the tree of life is really a thicket:

But a tree is often a good approximation, especially for animals and plants in the last few hundred million years. Biologists who try to infer phylogenetic trees from present-day genetic data often use simple models where:

• the genotype of each species follows a random walk, but

• species branch in two at various times.

These are called ‘Markov models’. The simplest Markov model for DNA evolution is the Jukes–Cantor model. Consider one or more pieces of DNA having a total of n base pairs. We can think of this as a string of letters chosen from the set {A,T,C,G}:

…ATCGATTGAGCTCTAGCG…

As time passes, the Jukes–Cantor model says the DNA changes randomly, with each base pair having the same constant rate of randomly flipping to any other. So, we get a Markov process on the set

$X = \{\textrm{A,T,C,G}\}^n$

However, a species can also split in two. So, given current-day genetic data from various species, biologists try to infer the most probable tree where, starting from a common ancestor, the DNA in question undergoes a random walk most of the time but branches in two at certain times.
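This kind of model is easy to simulate. Here is a minimal sketch (rates, step sizes, and helper names are illustrative choices of mine, not from any particular paper): each base flips to any other base at a constant rate, discretized into small time steps, and a branching event just starts two independent copies from the same ancestor.

```python
import random

BASES = 'ATCG'

def evolve(dna, rate, dt, steps, rng):
    """Jukes-Cantor-style random walk: each base flips to one of the
    three other bases with probability rate*dt per small time step."""
    dna = list(dna)
    for _ in range(steps):
        for i in range(len(dna)):
            if rng.random() < rate * dt:
                dna[i] = rng.choice([b for b in BASES if b != dna[i]])
    return ''.join(dna)

rng = random.Random(1)
ancestor = 'ATCGATTGAGCTCTAGCG'
# A branching event: both descendants evolve independently from a copy.
child1 = evolve(ancestor, rate=0.1, dt=0.1, steps=50, rng=rng)
child2 = evolve(ancestor, rate=0.1, dt=0.1, steps=50, rng=rng)
print(child1)
print(child2)
```

Inferring the most probable tree is of course the hard inverse problem; this only runs the forward model.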

To formalize this, we can define a concept of ‘phylogenetic tree’. Our work is based on the definition of Billera, Holmes and Vogtmann, though we use a slightly different definition, for reasons that will soon become clear. For us, a phylogenetic tree is a rooted tree with leaves labelled by numbers $1,2, \dots, n$ and edges labelled by ‘times’ or, geometrically speaking, ‘lengths’ in $[0, \infty).$ We require that:

• the length of every edge is positive, except perhaps for ‘external edges’: that is, edges incident to the leaves or root;

• there are no 1-ary vertices.

For example, here is a phylogenetic tree with 5 leaves:

where $\ell_1, \dots, \ell_6 \ge 0$ but we demand that $\ell_7 > 0.$ We draw the vertices as dots. We do not count the leaves and the root as vertices, and we label the root with the number $0.$ We cannot collapse edges of length zero that end at leaves, since doing so would eliminate those leaves. Also note that the embedding of the tree in the plane is irrelevant, so this counts as the same phylogenetic tree:

In applications to biology, we are often interested in trees where the total distance from the root to the leaf is the same for every leaf, since all species have evolved for the same time from their common ancestor. These are mathematically interesting as well, because then the distance between any two leaves defines an ultrametric on the set of leaves. However, more general phylogenetic trees are also interesting—and they become essential when we construct an operad whose operations are phylogenetic trees.

Let $\textrm{Phyl}_n$ be the set of phylogenetic trees with n leaves. This has a natural topology, which we explain in Section 6 of our paper. For example, here is a continuous path in $\textrm{Phyl}_4$ where we only change the length of one internal edge, reducing it until it becomes zero and we can collapse it:

Phylogenetic trees reconstructed by biologists are typically binary. When a phylogenetic tree appears to have higher arity, sometimes we merely lack sufficient data to resolve a higher-arity branching into a number of binary ones. With the topology we are using on $\textrm{Phyl}_n,$ binary trees form an open dense set of $\textrm{Phyl}_n,$ except for $\textrm{Phyl}_1.$ However, trees of higher arity are still important, because paths, paths of paths, etc. in $\textrm{Phyl}_n$ are often forced to pass through trees of higher arity.

Billera, Holmes and Vogtmann focused their attention on the set $\mathcal{T}_n$ of phylogenetic trees where lengths of the external edges—edges incident to the root and leaves—are fixed to a constant value. They endow $\mathcal{T}_n$ with a metric, which induces a topology on $\mathcal{T}_n,$ and there is a homeomorphism

$\textrm{Phyl}_n \cong \mathcal{T}_n \times [0,\infty)^{n+1}$

where the data in $[0,\infty)^{n+1}$ describe the lengths of the external edges in a general phylogenetic tree.

In algebraic topology, trees are often used to describe the composition of n-ary operations. This is formalized in the theory of operads. An ‘operad’ is an algebraic structure where for each natural number $n=0,1,2,\dots$ we have a set $O_n$ whose elements are considered as abstract n-ary operations, not necessarily operating on anything yet. An element $f \in O_n$ can be depicted as a planar tree with one vertex and n labelled leaves:

We can compose these operations in a tree-like way to get new operations:

and an associative law holds, making this sort of composite unambiguous:

There are various kinds of operads, but in this paper our operads will always be ‘unital’, having an operation $1 \in O_1$ that acts as an identity for composition. They will also be ‘symmetric’, meaning there is an action of the symmetric group $S_n$ on each set $O_n,$ compatible with composition. Further, they will be ‘topological’, meaning that each set $O_n$ is a topological space, with composition and permutations acting as continuous maps.

In Section 5 we prove that there is an operad $\textrm{Phyl},$ the ‘phylogenetic operad’, whose space of n-ary operations is $\textrm{Phyl}_n.$ This raises a number of questions:

• What is the mathematical nature of this operad?

• How is it related to ‘Markov processes with branching’?

• How is it related to known operads in topology?

Briefly, the answer is that $\textrm{Phyl}$ is the coproduct of $\textrm{Com},$ the operad for commutative topological semigroups, and $[0,\infty),$ the operad having only unary operations, one for each $t \in [0,\infty).$ The first describes branching, the second describes Markov processes. Moreover, $\textrm{Phyl}$ is closely related to the Boardman–Vogt $W$ construction applied to $\textrm{Com}.$ This is a construction that Boardman and Vogt applied to another operad in order to obtain an operad whose algebras are loop spaces.

To understand all this in more detail, first recall that the raison d’être of operads is to have ‘algebras’. The most traditional sort of algebra of an operad $O$ is a topological space $X$ on which each operation $f \in O_n$ acts as a continuous map

$\alpha(f)\colon X^n \to X$

obeying some conditions: composition, the identity, and the permutation group actions are preserved, and $\alpha(f)$ depends continuously on $f.$ The idea is that the abstract operations in $O$ are realized as actual operations on the space $X.$

In this paper we instead need algebras of a linear sort. Such an algebra of $O$ is a finite-dimensional real vector space $V$ on which each operation $f \in O_n$ acts as a multilinear map

$\alpha(f)\colon V^n \to V$

obeying the same list of conditions. We can also think of $\alpha(f)$ as a linear map

$\alpha(f)\colon V^{\otimes n} \to V$

where $V^{\otimes n}$ is the nth tensor power of $V.$

We also need ‘coalgebras’ of operads. The point is that while ordinarily one thinks of an operation $f \in O_n$ as having n inputs and one output, a phylogenetic tree is better thought of as having one input and n outputs. A coalgebra of $O$ is a finite-dimensional real vector space $V$ on which each operation $f \in O_n$ acts as a linear map

$\alpha(f) \colon V \to V^{\otimes n}$

obeying the same conditions as an algebra, but ‘turned around’.

The main point of this paper is that the phylogenetic operad has interesting coalgebras, which correspond to how phylogenetic trees are actually used to describe branching Markov processes in biology. But to understand this, we need to start by looking at coalgebras of two operads from which the phylogenetic operad is built.

By abuse of notation, we will use $[0,\infty)$ as the name for the operad having only unary operations, one for each $t \in [0,\infty),$ with composition of operations given by addition. A coalgebra of $[0,\infty)$ is a finite-dimensional real vector space $V$ together with for each $t \in [0,\infty)$ a linear map

$\alpha(t) \colon V \to V$

such that:

$\alpha(s+t) = \alpha(s) \alpha(t)$ for all $s,t \in [0,\infty)$

$\alpha(0) = 1_V$

$\alpha(t)$ depends continuously on $t.$

Analysts usually call such a thing a ‘continuous one-parameter semigroup’ of operators on $V.$

Given a finite set $X,$ a ‘Markov process’ or ‘continuous-time Markov chain’ on $X$ is a continuous one-parameter semigroup of operators on $\mathbb{R}^X$ such that if $f \in \mathbb{R}^X$ is a probability distribution on $X,$ so is $\alpha(t) f$ for all $t \in [0,\infty).$ Equivalently, if we think of $\alpha(t)$ as an $X \times X$ matrix of real numbers, we demand that its entries be nonnegative and each column sum to 1. Such a matrix is called ‘stochastic’. If $X$ is a set of possible sequences of base pairs, a Markov process on $X$ describes the random changes of DNA with the passage of time. We shall see that any Markov process on $X$ makes $\mathbb{R}^X$ into a coalgebra of $[0,\infty).$
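To make this concrete, here is a small numerical sketch, with a made-up rate matrix rather than an example from the paper: exponentiating an ‘infinitesimal stochastic’ matrix $H$ (nonnegative off-diagonal entries, columns summing to zero) gives a one-parameter semigroup $\alpha(t) = \exp(tH)$ of stochastic matrices, satisfying the three coalgebra conditions above.

```python
import numpy as np
from scipy.linalg import expm

# A made-up "infinitesimal stochastic" rate matrix H on a 3-element set X:
# off-diagonal entries are nonnegative and each column sums to zero.
H = np.array([[-1.0,  0.5,  0.0],
              [ 1.0, -0.5,  2.0],
              [ 0.0,  0.0, -2.0]])

def alpha(t):
    """The semigroup element alpha(t) = exp(tH), a stochastic matrix."""
    return expm(t * H)

s, t = 0.3, 1.1
# alpha(s + t) = alpha(s) alpha(t): the semigroup law
assert np.allclose(alpha(s + t), alpha(s) @ alpha(t))
# alpha(0) = 1_V
assert np.allclose(alpha(0.0), np.eye(3))
# Each column of alpha(t) sums to 1, so probability distributions
# (as column vectors) are mapped to probability distributions.
assert np.allclose(alpha(t).sum(axis=0), np.ones(3))
```

Continuity in $t$ comes for free here, since the matrix exponential depends continuously on its argument.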

This handles the Markov process aspect of DNA evolution; what about the branching? For this we use $\textrm{Com},$ the unique operad with one n-ary operation for each n > 0. Algebras of $\textrm{Com}$ are not-necessarily-unital commutative algebras: there is only one way to multiply n elements for n > 0.

For us what matters most is that coalgebras of $\textrm{Com}$ are finite-dimensional cocommutative coalgebras, not necessarily with counit. If $X$ is a finite set, there is a cocommutative coalgebra whose underlying vector space is $\mathbb{R}^X.$ The unique n-ary operation of $\textrm{Com}$ acts as the linear map

$\displaystyle{ \Delta_n \colon \mathbb{R}^X \to \mathbb{R}^X \otimes \cdots \otimes \mathbb{R}^X \; \cong \;\mathbb{R}^{X^n} }$

where
$\Delta_n (f)(x_1, \dots, x_n) = \left\{ \begin{array}{cl} f(x) & \textrm{if } x_1 = \cdots = x_n = x \\ \\ 0 & \textrm{otherwise} \end{array} \right.$

This map describes the ‘n-fold duplication’ of a probability distribution $f$ on the set $X$ of possible genes. We use this map to say what takes place when a species branches:
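In coordinates the duplication map is easy to compute. Here is a minimal sketch (the function name is ours, not notation from the paper): $\Delta_n(f)$ is the ‘diagonal’ tensor in $\mathbb{R}^{X^n},$ nonzero only where all $n$ indices agree.

```python
import numpy as np

# Delta_n sends f in R^X to the "diagonal" tensor in R^{X^n}:
# nonzero only where all n indices agree, with value f(x) there.
def duplicate(f, n):
    out = np.zeros((len(f),) * n)
    for x in range(len(f)):
        out[(x,) * n] = f[x]
    return out

f = np.array([0.2, 0.5, 0.3])   # a probability distribution on X = {0, 1, 2}
# For n = 2 the result is just the diagonal matrix with f on the diagonal:
assert np.allclose(duplicate(f, 2), np.diag(f))
```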

Next, we wish to describe how to combine the operads $[0,\infty)$ and $\textrm{Com}$ to obtain the phylogenetic operad. Any pair of operads $O$ and $O'$ has a coproduct $O + O'.$ The definition of coproduct gives an easy way to understand the algebras of $O + O'.$ Such an algebra is simply an object that is both an algebra of $O$ and an algebra of $O',$ with no compatibility conditions imposed.

One can also give an explicit construction of $O + O'.$ Briefly, the n-ary operations of $O + O'$ are certain equivalence classes of trees with leaves labelled $\{1,\dots, n\}$ and all nodes, except for leaves, labelled by operations in $O$ or $O'.$ While this fact is surely known to experts on operads, it seems hard to find in the literature, so we prove this in Theorem 24.

Given this, it should come as no surprise that the operad $\textrm{Phyl}$ is the coproduct $\textrm{Com} + [0,\infty).$ In fact, we shall take this as a definition. Starting from this definition, we work backwards to show that the operations of $\textrm{Phyl}$ correspond to phylogenetic trees. We prove this in Theorem 28. The definition of coproduct determines a topology on the spaces $\textrm{Phyl}_n,$ and it is a nontrivial fact that with this topology we have $\textrm{Phyl}_n \cong \mathcal{T}_n \times [0,\infty)^{n+1}$ for n > 1, where $\mathcal{T}_n$ has the topology defined by Billera, Holmes and Vogtmann. We prove this in Theorem 34.

Using the definition of the phylogenetic operad as a coproduct, it is clear that given any Markov process on a finite set $X,$ the vector space $\mathbb{R}^X$ naturally becomes a coalgebra of this operad. The reason is that, as we have seen, $\mathbb{R}^X$ is automatically a coalgebra of $\textrm{Com},$ and the Markov process makes it into a coalgebra of $[0,\infty).$ Thus, by the universal property of a coproduct, it becomes a coalgebra of $\textrm{Phyl} \cong \textrm{Com} + [0,\infty).$ We prove this in Theorem 36.

Since operads arose in algebraic topology, it is interesting to consider how the phylogenetic operad connects to ideas from that subject. Boardman and Vogt defined a construction on operads, the ‘$W$ construction’, which when applied to the operad for spaces with an associative multiplication gives an operad for loop spaces. The operad $\textrm{Phyl}$ has an interesting relation to $W(\textrm{Com}).$ To see this, define addition on $[0,\infty]$ in the obvious way, where

$\infty + t = t + \infty = \infty$

Then $[0,\infty]$ becomes a commutative topological monoid, so we obtain an operad with only unary operations, one for each $t \in [0,\infty],$ where composition is addition. By abuse of notation, let us call this operad $[0,\infty].$

Boardman and Vogt’s $W$ construction involves trees with edges having lengths in $[0,1],$ but we can equivalently use $[0,\infty].$ Leinster has observed that for any nonsymmetric topological operad $O,$ Boardman and Vogt’s operad $W(O)$ is closely related to $O + [0,\infty].$ Here we make this observation precise in the symmetric case. Operations in $\textrm{Com} + [0,\infty]$ are just like phylogenetic trees except that edges may have length $\infty.$ Moreover, for any operad $O,$ the operad $W(O)$ is a non-unital suboperad of $O + [0,\infty].$ An operation of $O + [0,\infty]$ lies in $W(O)$ if and only if all the external edges of the corresponding tree have length $\infty.$ We prove this in Theorem 40.

Berger and Moerdijk showed that if $S_n$ acts freely on $O_n$ and $O_1$ is well-pointed, then $W(O)$ is a cofibrant replacement for $O.$ This is true for $O = \textrm{Assoc},$ the operad whose algebras are topological semigroups. This cofibrancy is why Boardman and Vogt could use $W(\textrm{Assoc})$ as an operad for loop spaces. But $S_n$ does not act freely on $\textrm{Com}_n,$ and $W(\textrm{Com})$ is not a cofibrant replacement for $\textrm{Com}.$ So, it is not an operad for infinite loop spaces.

Nonetheless, the larger operad $\textrm{Com} + [0,\infty],$ a compactification of $\textrm{Phyl} = \textrm{Com} + [0,\infty),$ is somewhat interesting. The reason is that any Markov process

$\alpha \colon [0,\infty) \to \mathrm{End}(\mathbb{R}^X)$

approaches a limit as $t \to \infty.$ Indeed, $\alpha$ extends uniquely to a homomorphism from the topological monoid $[0,\infty]$ to $\mathrm{End}(\mathbb{R}^X).$ Thus, given a Markov process on a finite set $X,$ the vector space $\mathbb{R}^X$ naturally becomes a coalgebra of $\textrm{Com} + [0,\infty].$ We prove this in Theorem 38.

## Biology, Networks and Control Theory

13 September, 2015

The Institute for Mathematics and its Applications (or IMA, in Minneapolis, Minnesota), is teaming up with the Mathematical Biosciences Institute (or MBI, in Columbus, Ohio). They’re having a big program on control theory and networks:

### At the IMA

Here’s what’s happening at the Institute for Mathematics and its Applications:

Concepts and techniques from control theory are becoming increasingly interdisciplinary. At the same time, trends in modern control theory are influenced and inspired by other disciplines. As a result, the systems and control community is rapidly broadening its scope in a variety of directions. The IMA program is designed to encourage true interdisciplinary research and the cross-fertilization of ideas. An important element for success is that ideas flow across disciplines in a timely manner and that the cross-fertilization takes place in unison.

Due to the usefulness of control, talent from control theory is drawn and often migrates to other important areas, such as biology, computer science, and biomedical research, to apply its mathematical tools and expertise. It is vital that while the links are strong, we bring together researchers who have successfully bridged into other disciplines to promote the role of control theory and to focus on the efforts of the controls community. An IMA investment in this area will be a catalyst for many advances and will provide the controls community with a cohesive research agenda.

In all topics of the program the need for research is pressing. For instance, viable implementations of control algorithms for smart grids are an urgent and clearly recognized need with considerable implications for the environment and quality of life. The mathematics of control will undoubtedly influence technology and vice-versa. The urgency for these new technologies suggests that the greatest impact of the program is to have it sooner rather than later.

First trimester (Fall 2015): Networks, whether social, biological, swarms of animals or vehicles, the Internet, etc., constitute an increasingly important subject in science and engineering. Their connectivity and feedback pathways affect robustness and functionality. Such concepts are at the core of a new and rapidly evolving frontier in the theory of dynamical systems and control. Embedded systems and networks are already pervasive in automotive, biological, aerospace, and telecommunications technologies and soon are expected to impact the power infrastructure (smart grids). In this new technological and scientific realm, the modeling and representation of systems, the role of feedback, and the value and cost of information need to be re-evaluated and understood. Traditional thinking that is relevant to a limited number of feedback loops with practically unlimited bandwidth is no longer applicable. Feedback control and stability of network dynamics is a relatively new endeavor. Analysis and control of network dynamics will occupy mostly the first trimester while applications to power networks will be a separate theme during the third trimester. The first trimester will be divided into three workshops on the topics of analysis of network dynamics and regulation, communication and cooperative control over networks, and a separate one on biological systems and networks.

The second trimester (Winter 2016) will have two workshops. The first will be on modeling and estimation (Workshop 4) and the second one on distributed parameter systems and partial differential equations (Workshop 5). The theme of Workshop 4 will be on structure and parsimony in models. The goal is to explore recent relevant theories and techniques that allow sparse representations, rank constrained optimization, and structural constraints in models and control designs. Our intent is to blend a group of researchers in the aforementioned topics with a select group of researchers with interests in a statistical viewpoint. Workshop 5 will focus on distributed systems and related computational issues. One of our aims is to bring control theorists with an interest in optimal control of distributed parameter systems together with mathematicians working on optimal transport theory (in essence an optimal control problem). The subject of optimal transport is rapidly developing with ramifications in probability and statistics (of essence in system modeling and hence of interest to participants in Workshop 4 as well) and in distributed control of PDE’s. Emphasis will also be placed on new tools and new mathematical developments (in PDE’s, computational methods, optimization). The workshops will be closely spaced to facilitate participation in more than one.

The third trimester (Spring 2016) will target applications where the mathematics of systems and control may soon prove to have a timely impact. From the invention of atomic force microscopy at the nanoscale to micro-mirror arrays for a next generation of telescopes, control has played a critical role in sensing and imaging of challenging new realms. At present, thanks to recent technological advances of AFM and optical tweezers, great strides are taking place making it possible to manipulate the biological transport of protein molecules as well as the control of individual atoms. Two intertwined subject areas, quantum and nano control and scientific instrumentation, are seen to blend together (Workshop 6) with partial focus on the role of feedback control and optimal filtering in achieving resolution and performance at such scales. A second theme (Workshop 7) will aim at control issues in distributed hybrid systems, at a macro scale, with a specific focus the “smart grid” and energy applications.

• Workshop 1, Distributed Control and Decision Making Over Networks, 28 September – 2 October 2015.

• Workshop 2, Analysis and Control of Network Dynamics, 19-23 October 2015.

• Workshop 3, Biological Systems and Networks, 11-16 November 2015.

• Workshop 4, Optimization and Parsimonious Modeling, 25-29 January 2016.

• Workshop 5, Computational Methods for Control of Infinite-dimensional Systems, 14-18 March 2016.

• Workshop 6, Quantum and Nano Control, 11-15 April 2016.

• Workshop 7, Control at Large Scales: Energy Markets and Responsive Grids, 9-13 May 2016.

### At the MBI

Here’s what’s going on at the Mathematical Biosciences Institute:

The MBI network program is part of a yearlong cooperative program with IMA.

Networks and deterministic and stochastic dynamical systems on networks are used as models in many areas of biology. This underscores the importance of developing tools to understand the interplay between network structures and dynamical processes, as well as how network dynamics can be controlled. The dynamics associated with such models are often different from what one might traditionally expect from a large system of equations, and these differences present the opportunity to develop exciting new theories and methods that should facilitate the analysis of specific models. Moreover, a nascent area of research is the dynamics of networks in which the networks themselves change in time, which occurs, for example, in plasticity in neuroscience and in up regulation and down regulation of enzymes in biochemical systems.

There are many areas in biology (including neuroscience, gene networks, and epidemiology) in which network analysis is now standard. Techniques from network science have yielded many biological insights in these fields, and their study has led to many theorems. Moreover, these areas continue to be exciting ones, containing both concrete and general mathematical problems. Workshop 1 explores the mathematics behind the applications in which restrictions on general coupled systems are important. Examples of such restrictions include symmetry, Boolean dynamics, and mass-action kinetics; and each of these special properties permits the proof of theorems about dynamics on these special networks.

Workshop 2 focuses on the interplay between stochastic and deterministic behavior in biological networks. An important related problem is to understand how stochasticity affects parameter estimation. Analyzing the relationship between stochastic changes, network structure, and network dynamics poses mathematical questions that are new, difficult, and fascinating.

In recent years, an increasing number of biological systems have been modeled using networks whose structure changes in time or which use multiple kinds of couplings between the same nodes or couplings that are not just pairwise. General theories such as groupoids and hypergraphs have been developed to handle the structure in some of these more general coupled systems, and specific application models have been studied by simulation. Workshop 3 will bring together theorists, modelers, and experimentalists to address the modeling of biological systems using new network structures and the analysis of such structures.

Biological systems use control to achieve desired dynamics and prevent undesirable behaviors. Consequently, the study of network control is important both to reveal naturally evolved control mechanisms that underlie the functioning of biological systems and to develop human-designed control interventions to recover lost function, mitigate failures, or repurpose biological networks. Workshop 4 will address the challenging subjects of control and observability of network dynamics.

#### Events

Workshop 1: Dynamics in Networks with Special Properties, 25-29 January, 2016.

Workshop 2: The Interplay of Stochastic and Deterministic Dynamics in Networks, 22-26 February, 2016.

Workshop 3: Generalized Network Structures and Dynamics, 21-25 March, 2016.

Workshop 4: Control and Observability of Network Dynamics, 11-15 April, 2016.

You can get more schedule information on these posters:

## A Compositional Framework for Markov Processes

4 September, 2015

This summer my students Brendan Fong and Blake Pollard visited me at the Centre for Quantum Technologies, and we figured out how to understand open continuous-time Markov chains! I think this is a nice step towards understanding the math of living systems.

Admittedly, it’s just a small first step. But I’m excited by this step, since Blake and I have been trying to get this stuff to work for a couple years, and it finally fell into place. And we think we know what to do next.

Here’s our paper:

• John C. Baez, Brendan Fong and Blake S. Pollard, A compositional framework for open Markov processes.

And here’s the basic idea…

### Open detailed balanced Markov processes

A continuous-time Markov chain is a way to specify the dynamics of a population which is spread across some finite set of states. Population can flow between the states. The larger the population of a state, the more rapidly population flows out of the state. Because of this property, under certain conditions the populations of the states tend toward an equilibrium where at any state the inflow of population is balanced by its outflow.

In applications to statistical mechanics, we are often interested in equilibria such that for any two states connected by an edge, say $i$ and $j,$ the flow from $i$ to $j$ equals the flow from $j$ to $i.$ A continuous-time Markov chain with a chosen equilibrium having this property is called ‘detailed balanced’.

I’m getting tired of saying ‘continuous-time Markov chain’, so from now on I’ll just say ‘Markov process’, because it’s shorter. Okay? That will let me say the next sentence without running out of breath:

Our paper is about open detailed balanced Markov processes.

Here’s an example:

The detailed balanced Markov process itself consists of a finite set of states together with a finite set of edges between them, with each state $i$ labelled by an equilibrium population $q_i >0,$ and each edge $e$ labelled by a rate constant $r_e > 0.$

These populations and rate constants are required to obey an equation called the ‘detailed balance condition’. This equation says that in equilibrium, the flow from $i$ to $j$ equals the flow from $j$ to $i.$ Do you see how it works in this example?

To get an ‘open’ detailed balanced Markov process, we designate some states as inputs or outputs. A state may be both an input and an output, and may appear as an input or output more than once. See how that’s happening in this example? It may seem weird, but it makes things work better.

People usually say Markov processes are all about how probabilities flow from one state to another. But we work with un-normalized probabilities, which we call ‘populations’, rather than probabilities that must sum to 1. The reason is that in an open Markov process, probability is not conserved: it can flow in or out at the inputs and outputs. We allow it to flow both in and out at both the input states and the output states.

Our most fundamental result is that there’s a category $\mathrm{DetBalMark}$ where a morphism is an open detailed balanced Markov process. We think of it as a morphism from its inputs to its outputs.

We compose morphisms in $\mathrm{DetBalMark}$ by identifying the output states of one open detailed balanced Markov process with the input states of another. The populations of identified states must match. For example, we may compose this morphism $N$:

with the previously shown morphism $M$ to get this morphism $M \circ N$:

And here’s our second most fundamental result: the category $\mathrm{DetBalMark}$ is actually a dagger compact category. This lets us do other stuff with open Markov processes. An important one is ‘tensoring’, which lets us take two open Markov processes like $M$ and $N$ above and set them side by side, giving $M \otimes N$:

The so-called compactness is also important. This means we can take some inputs of an open Markov process and turn them into outputs, or vice versa. For example, using the compactness of $\mathrm{DetBalMark}$ we can get this open Markov process from $M$:

In fact all the categories in our paper are dagger compact categories, and all our functors preserve this structure. Dagger compact categories are a well-known framework for describing systems with inputs and outputs, so this is good.

### The analogy to electrical circuits

In a detailed balanced Markov process, population can flow along edges. In the detailed balanced equilibrium, without any flow of population from outside, the flow along each edge from state $i$ to state $j$ will be matched by the flow back from $j$ to $i.$ The populations need to take specific values for this to occur.

In an electrical circuit made of linear resistors, charge can flow along wires. In equilibrium, without any driving voltage from outside, the current along each wire will be zero. The potentials will be equal at every node.

This sets up an analogy between detailed balanced continuous-time Markov chains and electrical circuits made of linear resistors! I love analogy charts, so this makes me very happy:

| Circuits | Detailed balanced Markov processes |
| --- | --- |
| potential | population |
| current | flow |
| conductance | rate constant |
| power | dissipation |

This analogy is already well known. Schnakenberg used it in his book Thermodynamic Network Analysis of Biological Systems. So, our main goal is to formalize and exploit it. This analogy extends from systems in equilibrium to the more interesting case of nonequilibrium steady states, which are the main topic of our paper.

Earlier, Brendan and I introduced a way to ‘black box’ a circuit and define the relation it determines between potential-current pairs at the input and output terminals. This relation describes the circuit’s external behavior as seen by an observer who can only perform measurements at the terminals.

An important fact is that black boxing is ‘compositional’: if one builds a circuit from smaller pieces, the external behavior of the whole circuit can be determined from the external behaviors of the pieces. For category theorists, this means that black boxing is a functor!

Our new paper with Blake develops a similar ‘black box functor’ for detailed balanced Markov processes, and relates it to the earlier one for circuits.

When you black box a detailed balanced Markov process, you get the relation between population–flow pairs at the terminals. (By the ‘flow at a terminal’, we more precisely mean the net population outflow.) This relation holds not only in equilibrium, but also in any nonequilibrium steady state. Thus, black boxing an open detailed balanced Markov process gives its steady state dynamics as seen by an observer who can only measure populations and flows at the terminals.

### The principle of minimum dissipation

At least since the work of Prigogine, it’s been widely accepted that a large class of systems minimize entropy production in a nonequilibrium steady state. But people still fight about the precise boundary of this class of systems, and even the meaning of this ‘principle of minimum entropy production’.

For detailed balanced open Markov processes, we show that a quantity we call the ‘dissipation’ is minimized in any steady state. This is a quadratic function of the populations and flows, analogous to the power dissipation of a circuit made of resistors. We make no claim that this quadratic function actually deserves to be called ‘entropy production’. Indeed, Schnakenberg has convincingly argued that they are only approximately equal.

But still, the ‘dissipation’ function is very natural and useful—and Prigogine’s so-called ‘entropy production’ is also a quadratic function.

### Black boxing

I’ve already mentioned the category $\mathrm{DetBalMark},$ where a morphism is an open detailed balanced Markov process. But our paper needs two more categories to tell its story! There’s the category of circuits, and the category of linear relations.

A morphism in the category $\mathrm{Circ}$ is an open electrical circuit made of resistors: that is, a graph with each edge labelled by a ‘conductance’ $c_e > 0,$ together with specified input and output nodes:

A morphism in the category $\mathrm{LinRel}$ is a linear relation $L : U \leadsto V$ between finite-dimensional real vector spaces $U$ and $V.$ This is nothing but a linear subspace $L \subseteq U \oplus V.$ Just as relations generalize functions, linear relations generalize linear functions!
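To see how linear relations compose, here is a small sketch using a representation of our own choosing (not code or notation from the paper): store a relation $L : \mathbb{R}^m \leadsto \mathbb{R}^n$ as the column span of an $(m+n) \times k$ matrix, and compose two relations by matching up their middle vectors.

```python
import numpy as np
from scipy.linalg import null_space

# A linear relation L : R^m -> R^n, i.e. a subspace of R^m (+) R^n, stored
# as the column span of an (m+n) x k matrix.  (Representation is ours.)
def compose(L, m, M, n):
    """Compose L : R^m -> R^n with M : R^n -> R^p.  A pair (u, w) lies in
    the composite iff (u, v) is in L and (v, w) is in M for some v."""
    A_u, A_v = L[:m], L[m:]
    B_v, B_w = M[:n], M[n:]
    # Coefficient pairs (a, b) whose middle vectors match: A_v a = B_v b.
    K = null_space(np.hstack([A_v, -B_v]))
    a, b = K[:L.shape[1]], K[L.shape[1]:]
    return np.vstack([A_u @ a, B_w @ b])

# The graph of x -> 2x is a relation R -> R; composing with the graph of
# y -> 3y gives the graph of x -> 6x:
graph2 = np.array([[1.0], [2.0]])
graph3 = np.array([[1.0], [3.0]])
C = compose(graph2, 1, graph3, 1)
assert np.allclose(C[1] / C[0], 6.0)
```

When the relations are graphs of linear maps, this reduces to ordinary composition of maps, as the example checks.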

In our previous paper, Brendan and I introduced these two categories and a functor between them, the ‘black box functor’:

$\blacksquare : \mathrm{Circ} \to \mathrm{LinRel}$

The idea is that any circuit determines a linear relation between the potentials and net current flows at the inputs and outputs. This relation describes the behavior of a circuit of resistors as seen from outside.

Our new paper introduces a black box functor for detailed balanced Markov processes:

$\square : \mathrm{DetBalMark} \to \mathrm{LinRel}$

We draw this functor as a white box merely to distinguish it from the other black box functor. The functor $\square$ maps any detailed balanced Markov process to the linear relation obeyed by populations and flows at the inputs and outputs in a steady state. In short, it describes the steady state behavior of the Markov process ‘as seen from outside’.

How do we manage to black box detailed balanced Markov processes? We do it using the analogy with circuits!

### The analogy becomes a functor

Every analogy wants to be a functor. So, we make the analogy between detailed balanced Markov processes and circuits precise by turning it into a functor:

$K : \mathrm{DetBalMark} \to \mathrm{Circ}$

This functor converts any open detailed balanced Markov process into an open electrical circuit made of resistors. This circuit is carefully chosen to reflect the steady-state behavior of the Markov process. Its underlying graph is the same as that of the Markov process. So, the ‘states’ of the Markov process are the same as the ‘nodes’ of the circuit.

Both the equilibrium populations at states of the Markov process and the rate constants labelling edges of the Markov process are used to compute the conductances of edges of this circuit. In the simple case where the Markov process has exactly one edge from any state $i$ to any state $j,$ the rule is this:

$C_{i j} = H_{i j} q_j$

where:

$q_j$ is the equilibrium population of the $j$th state of the Markov process,

$H_{i j}$ is the rate constant for the edge from the $j$th state to the $i$th state of the Markov process, and

$C_{i j}$ is the conductance (that is, the reciprocal of the resistance) of the wire from the $j$th node to the $i$th node of the resulting circuit.

The detailed balance condition for Markov processes says precisely that the matrix $C_{i j}$ is symmetric! This is just right for an electrical circuit made of resistors, since it means that the resistance of the wire from node $i$ to node $j$ equals the resistance of the same wire in the reverse direction, from node $j$ to node $i.$
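In code, the rule $C_{i j} = H_{i j} q_j$ and the symmetry it produces under detailed balance look like this (a toy example with numbers we made up):

```python
import numpy as np

# A toy detailed balanced Markov process on 3 states (numbers made up):
# H[i, j] is the rate constant for the edge from state j to state i, and
# q[j] is the equilibrium population of state j, chosen so that
# H[i, j] * q[j] == H[j, i] * q[i]  (the detailed balance condition).
q = np.array([1.0, 2.0, 4.0])
H = np.array([[0.0, 1.0, 0.0],
              [2.0, 0.0, 1.0],
              [0.0, 2.0, 0.0]])

C = H * q                    # broadcasting gives C[i, j] = H[i, j] * q[j]
assert np.allclose(C, C.T)   # detailed balance makes C symmetric
```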

### A triangle of functors

If you paid careful attention, you’ll have noticed that I’ve described a triangle of functors:

And if you know anything about how category theorists think, you’ll be wondering if this diagram commutes.

In fact, this triangle of functors does not commute! However, a general lesson of category theory is that we should only expect diagrams of functors to commute up to natural isomorphism, and this is what happens here:

The natural transformation $\alpha$ ‘corrects’ the black box functor for resistors to give the one for detailed balanced Markov processes.

The functors $\square$ and $\blacksquare \circ K$ are actually equal on objects. An object in $\mathrm{DetBalMark}$ is a finite set $X$ with each element $i \in X$ labelled by a positive population $q_i.$ Both functors map this object to the vector space $\mathbb{R}^X \oplus \mathbb{R}^X.$ For the functor $\square,$ we think of this as a space of population-flow pairs. For the functor $\blacksquare \circ K,$ we think of it as a space of potential-current pairs. The natural transformation $\alpha$ then gives a linear relation

$\alpha_{X,q} : \mathbb{R}^X \oplus \mathbb{R}^X \leadsto \mathbb{R}^X \oplus \mathbb{R}^X$

in fact an isomorphism of vector spaces, which converts potential-current pairs into population-flow pairs in a manner that depends on the $q_i.$ I’ll skip the formula; it’s in the paper.

But here’s the key point. The naturality of $\alpha$ actually allows us to reduce the problem of computing the functor $\square$ to the problem of computing $\blacksquare.$ Suppose

$M: (X,q) \to (Y,r)$

is any morphism in $\mathrm{DetBalMark}.$ The object $(X,q)$ is some finite set $X$ labelled by populations $q,$ and $(Y,r)$ is some finite set $Y$ labelled by populations $r.$ Then the naturality of $\alpha$ means that this square commutes:

Since $\alpha_{X,q}$ and $\alpha_{Y,r}$ are isomorphisms, we can solve for the functor $\square$ as follows:

$\square(M) = \alpha_{Y,r} \circ \blacksquare K(M) \circ \alpha_{X,q}^{-1}$

This equation has a clear intuitive meaning! It says that to compute the behavior of a detailed balanced Markov process, namely $\square(M),$ we convert it into a circuit made of resistors and compute the behavior of that, namely $\blacksquare K(M).$ This is not equal to the behavior of the Markov process, but we can compute that behavior by converting the input populations and flows into potentials and currents, feeding them into our circuit, and then converting the outputs back into populations and flows.

### What we really do

So that’s a sketch of what we do, and I hope you ask questions if it’s not clear. But I also hope you read our paper! Here’s what we actually do in there. After an introduction and summary of results:

• Section 3 defines open Markov processes and the open master equation.

• Section 4 introduces detailed balance for open Markov processes.

• Section 5 recalls the principle of minimum power for open circuits made of linear resistors, and explains how to black box them.

• Section 6 introduces the principle of minimum dissipation for open detailed balanced Markov processes, and describes how to black box these.

• Section 7 states the analogy between circuits and detailed balanced Markov processes in a formal way.

• Section 8 describes how to compose open Markov processes, making them into the morphisms of a category.

• Section 9 does the same for detailed balanced Markov processes.

• Section 10 describes the ‘black box functor’ that sends any open detailed balanced Markov process to the linear relation describing its external behavior, and recalls the similar functor for circuits.

• Section 11 makes the analogy between open detailed balanced Markov processes and open circuits even more formal, by making it into a functor. We prove that together with the two black box functors, this forms a triangle that commutes up to natural isomorphism.

• Section 12 is about geometric aspects of this theory. We show that the linear relations in the image of these black box functors are Lagrangian relations between symplectic vector spaces. We also show that the master equation can be seen as a gradient flow equation.

• Section 13 is a summary of what we have learned.

Finally, Appendix A is a quick tutorial on decorated cospans. This is a key mathematical tool in our work, developed by Brendan in an earlier paper.

## The Inverse Cube Force Law

30 August, 2015

Here you see three planets. The blue planet is orbiting the Sun in a realistic way: it’s going around an ellipse.

The other two are moving in and out just like the blue planet, so they all stay on the same circle. But they’re moving around this circle at different rates! The green planet is moving faster than the blue one: it completes 3 orbits each time the blue planet goes around once. The red planet isn’t going around at all: it only moves in and out.

What’s going on here?

In 1687, Isaac Newton published his Principia Mathematica. This book is famous, but in Propositions 43–45 of Book I he did something that people didn’t talk about much—until recently. He figured out what extra force, besides gravity, would make a planet move like one of these weird other planets. It turns out an extra force obeying an inverse cube law will do the job!

Let me make this more precise. We’re only interested in ‘central forces’ here. A central force is one that only pushes a particle towards or away from some chosen point, and only depends on the particle’s distance from that point. In Newton’s theory, gravity is a central force obeying an inverse square law:

$F(r) = - \displaystyle{ \frac{a}{r^2} }$

for some constant $a.$ But he considered adding an extra central force obeying an inverse cube law:

$F(r) = - \displaystyle{ \frac{a}{r^2} + \frac{b}{r^3} }$

He showed that if you do this, for any motion of a particle in the force of gravity you can find a motion of a particle in gravity plus this extra force, where the distance $r(t)$ is the same, but the angle $\theta(t)$ is not.

In fact Newton did more. He showed that if we start with any central force, adding an inverse cube force has this effect.

Newton’s theorem of revolving orbits, Wikipedia.

I haven’t fully understood all of this, but it instantly makes me think of three other things I know about the inverse cube force law, which are probably related. So maybe you can help me figure out the relationship.

The first, and simplest, is this. Suppose we have a particle in a central force. It will move in a plane, so we can use polar coordinates $r, \theta$ to describe its position. We can describe the force away from the origin as a function $F(r).$ Then the radial part of the particle’s motion obeys this equation:

$\displaystyle{ m \ddot r = F(r) + \frac{L^2}{mr^3} }$

where $L$ is the magnitude of the particle’s angular momentum.

So, angular momentum acts to provide a ‘fictitious force’ pushing the particle out, which one might call the centrifugal force. And this force obeys an inverse cube force law!

Furthermore, thanks to the formula above, it’s pretty obvious that if you change $L$ but also add a precisely compensating inverse cube force, the value of $\ddot r$ will be unchanged! So, we can set things up so that the particle’s radial motion will be unchanged. But its angular motion will be different, since it has a different angular momentum. This explains Newton’s observation.
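We can check this numerically. Here’s a sketch in toy units ($m = a = 1$ are my choices, purely for illustration): integrating the radial equation with angular momentum $L_2$ plus a compensating inverse cube force $b/r^3,$ where $b = (L_1^2 - L_2^2)/m,$ gives the same $r(t)$ as angular momentum $L_1$ with no extra force.

```python
# Sketch in toy units (m = a = 1 are my choices, not from the post).
# Radial equation: m r'' = F(r) + L^2/(m r^3), with gravity F(r) = -a/r^2.
# Changing L from L1 to L2 while adding an inverse cube force b/r^3 with
# b = (L1^2 - L2^2)/m leaves the radial motion r(t) unchanged.
m, a = 1.0, 1.0
L1, L2 = 1.0, 3.0
b = (L1**2 - L2**2) / m   # the compensating inverse cube coefficient

def accel(r, L, b):
    return (-a / r**2 + L**2 / (m * r**3) + b / r**3) / m

def radial_motion(L, b, r0=1.5, v0=0.0, dt=1e-4, steps=20000):
    """Leapfrog integration of the radial equation; returns the final r."""
    r, v = r0, v0 + 0.5 * dt * accel(r0, L, b)
    for _ in range(steps):
        r += dt * v
        v += dt * accel(r, L, b)
    return r

r_plain = radial_motion(L1, 0.0)   # angular momentum L1, gravity alone
r_comp = radial_motion(L2, b)      # angular momentum L2, gravity + inverse cube
print(abs(r_plain - r_comp))       # essentially zero
```

The two acceleration functions are algebraically identical, so the radial trajectories agree to floating-point accuracy, even though the angular motions would differ.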

It’s often handy to write a central force in terms of a potential:

$F(r) = -V'(r)$

Then we can make up an extra potential responsible for the centrifugal force, and combine it with the actual potential $V$ into a so-called effective potential:

$\displaystyle{ U(r) = V(r) + \frac{L^2}{2mr^2} }$

The particle’s radial motion then obeys a simple equation:

$m\ddot{r} = - U'(r)$

For a particle in gravity, where the force obeys an inverse square law and $V$ is proportional to $-1/r,$ the effective potential might look like this:

This is the graph of

$\displaystyle{ U(r) = -\frac{4}{r} + \frac{1}{r^2} }$

If you’re used to particles rolling around in potentials, you can easily see that a particle with not too much energy will move back and forth, never making it to $r = 0$ or $r = \infty.$ This corresponds to an elliptical orbit. Give it more energy and the particle can escape to infinity, but it will never hit the origin. The repulsive ‘centrifugal force’ always overwhelms the attraction of gravity near the origin, at least if the angular momentum is nonzero.

On the other hand, suppose we have a particle moving in an attractive inverse cube force! Then the potential is proportional to $1/r^2,$ so the effective potential is

$\displaystyle{ U(r) = \frac{c}{r^2} + \frac{L^2}{2mr^2} }$

where $c$ is negative for an attractive force. If this attractive force is big enough, namely

$\displaystyle{ c < -\frac{L^2}{2m} }$

then this force can exceed the centrifugal force, and the particle can fall in to $r = 0.$

If we keep track of the angular coordinate $\theta,$ we can see what’s really going on. The particle is spiraling in to its doom, hitting the origin in a finite amount of time!
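We can watch the doom arrive numerically. In toy units (my choices: $m = 1$ and net attractive coefficient $k = 1$), the radial equation for a particle whose attractive inverse cube force beats the centrifugal force is $\ddot r = -k/r^3,$ and starting from rest at $r = 1$ the exact solution is $r(t) = \sqrt{1 - kt^2},$ hitting the origin at $t = 1/\sqrt{k}$:

```python
# Sketch in toy units (my choice: m = 1, net attractive coefficient k = 1):
# the radial equation for a particle in an attractive inverse cube force
# that beats the centrifugal force is r'' = -k/r^3. Starting from rest at
# r = 1, the exact solution is r(t) = sqrt(1 - k t^2): doom at t = 1/sqrt(k).
k = 1.0
r, v, t, dt = 1.0, 0.0, 0.0, 1e-5
while r > 0.05:            # integrate until the particle is nearly at the origin
    v += dt * (-k / r**3)
    r += dt * v
    t += dt

t_exact = ((1.0 - 0.05**2) / k) ** 0.5   # exact time at which r reaches 0.05
print(t, t_exact)          # the two agree closely; doom arrives at t = 1
```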

This should remind you of a black hole, and indeed something similar happens there, but even more drastic:

For a nonrotating uncharged black hole, the effective potential has three terms. Like Newtonian gravity it has an attractive $-1/r$ term and a repulsive $1/r^2$ term. But it also has an attractive $-1/r^3$ term! In other words, it’s as if on top of Newtonian gravity, we had another attractive force obeying an inverse fourth power law! This overwhelms the others at short distances, so if you get too close to a black hole, you spiral in to your doom.

For example, a black hole can have an effective potential like this:

But back to inverse cube force laws! I know two more things about them. A while back I discussed how a particle in an inverse square force can be reinterpreted as a harmonic oscillator:

Planets in the fourth dimension, Azimuth.

There are many ways to think about this, and apparently the idea in some form goes all the way back to Newton! It involves a sneaky way to take a particle in a potential

$\displaystyle{ V(r) \propto r^{-1} }$

and think of it as moving around in the complex plane. Then if you square its position—thought of as a complex number—and cleverly reparametrize time, you get a particle moving in a potential

$\displaystyle{ V(r) \propto r^2 }$

This amazing trick can be generalized! A particle in a potential

$\displaystyle{ V(r) \propto r^p }$

can be transformed into a particle in a potential

$\displaystyle{ V(r) \propto r^q }$

if

$(p+2)(q+2) = 4$

A good description is here:

• Rachel W. Hall and Krešimir Josić, Planetary motion and the duality of force laws, SIAM Review 42 (2000), 115–124.

This trick transforms particles in $r^p$ potentials with $p$ ranging between $-2$ and $+\infty$ to $r^q$ potentials with $q$ ranging between $+\infty$ and $-2.$ It’s like a see-saw: when $p$ is small, $q$ is big, and vice versa.

But you’ll notice this trick doesn’t actually work at $p = -2,$ the case that corresponds to the inverse cube force law. The problem is that $p + 2 = 0$ in this case, so we can’t find $q$ with $(p+2)(q+2) = 4.$
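Just to see the see-saw in action, here’s the duality as a one-line function (a trivial illustration, nothing more):

```python
# The see-saw (p+2)(q+2) = 4, solved for q; it blows up at p = -2,
# which is exactly the inverse cube force law's potential.
def dual_power(p):
    """Potential r^p maps to r^q with (p+2)(q+2) = 4; undefined at p = -2."""
    return 4.0 / (p + 2.0) - 2.0

print(dual_power(-1.0))                 # Kepler r^-1  <->  harmonic r^2
print(dual_power(2.0))                  # and back again
print(dual_power(dual_power(0.5)))      # the duality is an involution
```

The first line recovers the famous Kepler/harmonic oscillator duality: $p = -1$ pairs with $q = 2.$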

So, the inverse cube force is special in three ways: it’s the one that you can add on to any force to get solutions with the same radial motion but different angular motion, it’s the one that naturally describes the ‘centrifugal force’, and it’s the one that doesn’t have a partner! We’ve seen how the first two ways are secretly the same. I don’t know about the third, but I’m hopeful.

### Quantum aspects

Finally, here’s a fourth way in which the inverse cube law is special. This shows up most visibly in quantum mechanics… and this is what got me interested in this business in the first place.

You see, I’m writing a paper called ‘Struggles with the continuum’, which discusses problems in analysis that arise when you try to make some of our favorite theories of physics make sense. The inverse square force law poses interesting problems of this sort, which I plan to discuss. But I started wanting to compare the inverse cube force law, just so people can see things that go wrong in this case, and not take our successes with the inverse square law for granted.

Unfortunately a huge digression on the inverse cube force law would be out of place in that paper. So, I’m offloading some of that material to here.

In quantum mechanics, a particle moving in an inverse cube force law has a Hamiltonian like this:

$H = -\nabla^2 + c r^{-2}$

The first term describes the kinetic energy, while the second describes the potential energy. I’m setting $\hbar = 1$ and $2m = 1$ to remove some clutter that doesn’t really affect the key issues.

To see how strange this Hamiltonian is, let me compare an easier case. If $p < 2,$ the Hamiltonian

$H = -\nabla^2 + c r^{-p}$

is essentially self-adjoint on $C_0^\infty(\mathbb{R}^3 - \{0\}),$ which is the space of compactly supported smooth functions on 3d Euclidean space minus the origin. What this means is that first of all, $H$ is defined on this domain: it maps functions in this domain to functions in $L^2(\mathbb{R}^3)$. But more importantly, it means we can uniquely extend $H$ from this domain to a self-adjoint operator on some larger domain. In quantum physics, we want our Hamiltonians to be self-adjoint. So, this fact is good.

Proving this fact is fairly hard! It uses something called the Kato–Lax–Milgram–Nelson theorem together with this beautiful inequality:

$\displaystyle{ \int_{\mathbb{R}^3} \frac{1}{4r^2} |\psi(x)|^2 \,d^3 x \le \int_{\mathbb{R}^3} |\nabla \psi(x)|^2 \,d^3 x }$

for any $\psi\in C_0^\infty(\mathbb{R}^3).$

If you think hard, you can see this inequality is actually a fact about the quantum mechanics of the inverse cube law! It says that if $c \ge -1/4,$ the energy of a quantum particle in the potential $c r^{-2}$ is bounded below. And in a sense, this inequality is optimal: if $c < -1/4$, the energy is not bounded below. This is the quantum version of how a classical particle can spiral in to its doom in an attractive inverse cube law, if it doesn’t have enough angular momentum. But it’s subtly and mysteriously different.
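If you don’t feel like proving the inequality, you can at least spot-check it numerically. Here’s a sketch using the spherically symmetric trial function $\psi(r) = e^{-r}$ (my choice; it’s not compactly supported, but the inequality extends to it by density). Both sides reduce to 1-dimensional integrals:

```python
import math

# Spot check of the inequality with the trial function psi(r) = exp(-r).
# In spherical coordinates, d^3x = 4 pi r^2 dr, so:
#   left side  = 4 pi * integral of e^{-2r}/(4 r^2) * r^2 dr = pi/2
#   right side = 4 pi * integral of e^{-2r} * r^2 dr         = pi

def trapezoid(f, a, b, n=200000):
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

lhs = math.pi * trapezoid(lambda r: math.exp(-2 * r), 0.0, 40.0)
rhs = 4 * math.pi * trapezoid(lambda r: math.exp(-2 * r) * r**2, 0.0, 40.0)
print(lhs, rhs)   # roughly pi/2 and pi, so the inequality holds
```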

You may wonder how this inequality is used to prove good things about potentials that are ‘less singular’ than the $c r^{-2}$ potential: that is, potentials $c r^{-p}$ with $p < 2.$ For that, you have to use some tricks that I don’t want to explain here. I also don’t want to prove this inequality, or explain why it’s optimal! You can find most of this in some old course notes of mine:

• John Baez, Quantum Theory and Analysis, 1989.

See especially section 15.

But it’s pretty easy to see how this inequality implies things about the expected energy of a quantum particle in the potential $c r^{-2}$. So let’s do that.

In this potential, the expected energy of a state $\psi$ is:

$\displaystyle{ \langle \psi, H \psi \rangle = \int_{\mathbb{R}^3} \overline\psi(x)\, (-\nabla^2 + c r^{-2})\psi(x) \, d^3 x }$

Doing an integration by parts, this gives:

$\displaystyle{ \langle \psi, H \psi \rangle = \int_{\mathbb{R}^3} |\nabla \psi(x)|^2 + cr^{-2} |\psi(x)|^2 \,d^3 x }$

The inequality I showed you says precisely that when $c = -1/4,$ this is greater than or equal to zero. So, the expected energy is actually nonnegative in this case! And making $c$ greater than $-1/4$ only makes the expected energy bigger.

Note that in classical mechanics, the energy of a particle in this potential ceases to be bounded below as soon as $c < 0.$ Quantum mechanics is different because of the uncertainty principle! To get a lot of negative potential energy, the particle’s wavefunction must be squished near the origin, but that gives it kinetic energy.

It turns out that the Hamiltonian for a quantum particle in an inverse cube force law has exquisitely subtle and tricky behavior. Many people have written about it, running into ‘paradoxes’ when they weren’t careful enough. Only rather recently have things been straightened out.

For starters, the Hamiltonian for this kind of particle

$H = -\nabla^2 + c r^{-2}$

has different behaviors depending on $c.$ Obviously the force is attractive when $c > 0$ and repulsive when $c < 0,$ but that’s not the only thing that matters! Here’s a summary:

$c \ge 3/4.$ In this case $H$ is essentially self-adjoint on $C_0^\infty(\mathbb{R}^3 - \{0\}).$ So, it admits a unique self-adjoint extension and there’s no ambiguity about this case.

$c < 3/4.$ In this case $H$ is not essentially self-adjoint on $C_0^\infty(\mathbb{R}^3 - \{0\}).$ In fact, it admits more than one self-adjoint extension! This means that we need extra input from physics to choose the Hamiltonian in this case. It turns out that we need to say what happens when the particle hits the singularity at $r = 0.$ This is a long and fascinating story that I just learned yesterday.

$c \ge -1/4.$ In this case the expected energy $\langle \psi, H \psi \rangle$ is bounded below for $\psi \in C_0^\infty(\mathbb{R}^3 - \{0\}).$ It turns out that whenever we have a Hamiltonian that is bounded below, even if there is not a unique self-adjoint extension, there exists a canonical ‘best choice’ of self-adjoint extension, called the Friedrichs extension. I explain this in my course notes.

$c < -1/4.$ In this case the expected energy is not bounded below, so we don’t have the Friedrichs extension to help us choose which self-adjoint extension is ‘best’.

To go all the way down this rabbit hole, I recommend these two papers:

• Sarang Gopalakrishnan, Self-Adjointness and the Renormalization of Singular Potentials, B.A. Thesis, Amherst College.

• D. M. Gitman, I. V. Tyutin and B. L. Voronov, Self-adjoint extensions and spectral analysis in the Calogero problem, Jour. Phys. A 43 (2010), 145205.

The first is good for a broad overview of problems associated to singular potentials such as the inverse cube force law; while there is attention to mathematical rigor, the focus is on physical insight. The second is good if you want—as I wanted—to really get to the bottom of the inverse cube force law in quantum mechanics. Both have lots of references.

Also, both point out a crucial fact I haven’t mentioned yet: in quantum mechanics the inverse cube force law is special because, naively at least, it has a kind of symmetry under rescaling! You can see this from the formula

$H = -\nabla^2 + cr^{-2}$

by noting that both the Laplacian and $r^{-2}$ have units of length$^{-2}$. So, they both transform in the same way under rescaling: if you take any smooth function $\psi$, apply $H$ and then expand the result by a factor of $k,$ you get $k^2$ times what you get if you do those operations in the other order.

In particular, this means that if you have a smooth eigenfunction of $H$ with eigenvalue $\lambda,$ you will also have one with eigenvalue $k^2 \lambda$ for any $k > 0.$ And if your original eigenfunction was normalizable, the new one will be too!
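You can check this scale covariance symbolically. Here’s a sketch using sympy, with a 1-dimensional radial model $H = -d^2/dr^2 + c/r^2$ standing in for the full Hamiltonian (my simplification), and the concrete test function $e^{-r^2}$ standing in for $\psi$ (any smooth function works the same way):

```python
import sympy as sp

# Radial model of the Hamiltonian (my 1d stand-in for -Laplacian + c r^-2),
# checked on the concrete test function exp(-r^2).
r, k = sp.symbols('r k', positive=True)
c = sp.symbols('c')

def H(f):
    return -sp.diff(f, r, 2) + c / r**2 * f

psi = sp.exp(-r**2)
expand_then_H = H(psi.subs(r, r / k))          # expand by k, then apply H
H_then_expand = H(psi).subs(r, r / k) / k**2   # apply H, expand, divide by k^2
print(sp.simplify(expand_then_H - H_then_expand))   # 0: exact scale covariance
```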

With some calculation you can show that when $c < -1/4,$ the Hamiltonian $H$ has a smooth normalizable eigenfunction with a negative eigenvalue. In fact it’s spherically symmetric, so finding it is not so terribly hard. But this instantly implies that $H$ has smooth normalizable eigenfunctions with any negative eigenvalue.

This implies various things, some terrifying. First of all, it means that $H$ is not bounded below, at least not on the space of smooth normalizable functions. A similar but more delicate scaling argument shows that it’s also not bounded below on $C_0^\infty(\mathbb{R}^3 - \{0\}),$ as I claimed earlier.

This is scary but not terrifying: it simply means that when $c < -1/4,$ the potential is too strongly negative for the Hamiltonian to be bounded below.

The terrifying part is this: we’re getting uncountably many normalizable eigenfunctions, all with different eigenvalues, one for each choice of $k.$ A self-adjoint operator on a countable-dimensional Hilbert space like $L^2(\mathbb{R}^3)$ can’t have uncountably many normalizable eigenvectors with different eigenvalues, since then they’d all be orthogonal to each other, and that’s too many orthogonal vectors to fit in a Hilbert space of countable dimension!

This sounds like a paradox, but it’s not. These functions are not all orthogonal, and they’re not all eigenfunctions of a self-adjoint operator. You see, the operator $H$ is not self-adjoint on the domain we’ve chosen, the space of all smooth functions in $L^2(\mathbb{R}^3).$ We can carefully choose a domain to get a self-adjoint operator… but it turns out there are many ways to do it.

Intriguingly, in most cases this choice breaks the naive dilation symmetry. So, we’re getting what physicists call an ‘anomaly’: a symmetry of a classical system that fails to give a symmetry of the corresponding quantum system.

Of course, if you’ve made it this far, you probably want to understand what the different choices of Hamiltonian for a particle in an inverse cube force law actually mean, physically. The idea seems to be that they say how the particle changes phase when it hits the singularity at $r = 0$ and bounces back out.

(Why does it bounce back out? Well, if it didn’t, time evolution would not be unitary, so it would not be described by a self-adjoint Hamiltonian! We could try to describe the physics of a quantum particle that does not come back out when it hits the singularity, and I believe people have tried, but this requires a different set of mathematical tools.)

For a detailed analysis of this, it seems one should take Schrödinger’s equation and do a separation of variables into the angular part and the radial part:

$\psi(r,\theta,\phi) = \Psi(r) \Phi(\theta,\phi)$

For each choice of $\ell = 0,1,2,\dots$ one gets a space of spherical harmonics that one can use for the angular part $\Phi.$ The interesting part is the radial part, $\Psi.$ Here it is helpful to make a change of variables

$u(r) = r\Psi(r)$

At least naively, Schrödinger’s equation for the particle in the $cr^{-2}$ potential then becomes

$\displaystyle{ \frac{d}{dt} u = -iH u }$

where

$\displaystyle{ H = -\frac{d^2}{dr^2} + \frac{c + \ell(\ell+1)}{r^2} }$

Beware: I keep calling all sorts of different but related Hamiltonians $H,$ and this one is for the radial part of the dynamics of a quantum particle in an inverse cube force. As we’ve seen before in the classical case, the centrifugal force and the inverse cube force join forces in an ‘effective potential’

$\displaystyle{ U(r) = kr^{-2} }$

where

$k = c + \ell(\ell+1)$

So, we have reduced the problem to that of a particle on the open half-line $(0,\infty)$ moving in the potential $kr^{-2}.$ The Hamiltonian for this problem:

$\displaystyle{ H = -\frac{d^2}{dr^2} + \frac{k}{r^2} }$

is called the Calogero Hamiltonian. Needless to say, it has fascinating and somewhat scary properties, since to make it into a bona fide self-adjoint operator, we must make some choice about what happens when the particle hits $r = 0.$ The formula above does not really specify the Hamiltonian.
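You can verify the spherically symmetric ($\ell = 0$) case of this reduction symbolically. Here’s a sketch in sympy, using the substitution $u(r) = r\Psi(r)$ and the radial Laplacian $\Psi'' + (2/r)\Psi'$:

```python
import sympy as sp

# Check of the reduction in the spherically symmetric (l = 0) case, using
# the substitution u(r) = r * Psi(r).
r, c = sp.symbols('r c', positive=True)
u = sp.Function('u')

Psi = u(r) / r
HPsi = -(sp.diff(Psi, r, 2) + 2 * sp.diff(Psi, r) / r) + c / r**2 * Psi
calogero = -sp.diff(u(r), r, 2) + c / r**2 * u(r)   # Calogero Hamiltonian on u
print(sp.simplify(r * HPsi - calogero))   # 0: the two agree
```

So $r \cdot (H\Psi)$ equals the Calogero Hamiltonian applied to $u,$ just as the formula above says (with $\ell = 0$).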

This is more or less where Gitman, Tyutin and Voronov begin their analysis, after a long and pleasant review of the problem. They describe all the possible choices of self-adjoint operator that are allowed. The answer depends on the values of $k,$ but very crudely, the choice says something like how the phase of your particle changes when it bounces off the singularity. Most choices break the dilation invariance of the problem. But intriguingly, some choices retain invariance under a discrete subgroup of dilations!

So, the rabbit hole of the inverse cube force law goes quite deep, and I expect I haven’t quite gotten to the bottom yet. The problem may seem pathological, verging on pointless. But the math is fascinating, and it’s a great testing-ground for ideas in quantum mechanics—very manageable compared to deeper subjects like quantum field theory, which are riddled with their own pathologies. Finally, the connection between the inverse cube force law and centrifugal force makes me think it’s not a mere curiosity.

### In four dimensions

It’s a bit odd to study the inverse cube force law in 3-dimensional space, since Newtonian gravity and the electrostatic force would actually obey an inverse cube law in 4-dimensional space. For the classical 2-body problem it doesn’t matter much whether you’re in 3d or 4d space, since the motion stays in a plane. But for the quantum 2-body problem it makes more of a difference!

Just for the record, let me say how the quantum 2-body problem works in 4 dimensions. As before, we can work in the center of mass frame and consider this Hamiltonian:

$H = -\nabla^2 + c r^{-2}$

And as before, the behavior of this Hamiltonian depends on $c.$ Here’s the story this time:

$c \ge 0.$ In this case $H$ is essentially self-adjoint on $C_0^\infty(\mathbb{R}^4 - \{0\}).$ So, it admits a unique self-adjoint extension and there’s no ambiguity about this case.

$c < 0.$ In this case $H$ is not essentially self-adjoint on $C_0^\infty(\mathbb{R}^4 - \{0\}).$

$c \ge -1.$ In this case the expected energy $\langle \psi, H \psi \rangle$ is bounded below for $\psi \in C_0^\infty(\mathbb{R}^4 - \{0\}).$ So, there exists a canonical ‘best choice’ of self-adjoint extension, called the Friedrichs extension.

$c < -1.$ In this case the expected energy is not bounded below, so we don’t have the Friedrichs extension to help us choose which self-adjoint extension is ‘best’.

I’ve been assured these are correct by Barry Simon, and a lot of this material will appear in Section 7.4 of his book:

• Barry Simon, A Comprehensive Course in Analysis, Part 4: Operator Theory, American Mathematical Society, Providence, RI, 2015.

• Barry Simon, Essential self-adjointness of Schrödinger operators with singular potentials, Arch. Rational Mech. Analysis 52 (1973), 44–48.

### Notes

The animation was made by ‘WillowW’ and placed on Wikicommons. It’s one of a number that appears in this Wikipedia article:

Newton’s theorem of revolving orbits, Wikipedia.

I made the graphs using the free online Desmos graphing calculator.

The picture of a spiral was made by ‘Anarkman’ and ‘Pbroks13’ and placed on Wikicommons; it appears in

Hyperbolic spiral, Wikipedia.

The hyperbolic spiral is one of three kinds of orbits that are possible in an inverse cube force law. They are vaguely analogous to ellipses, hyperbolas and parabolas, but there are actually no bound orbits except perfect circles. The three kinds are called Cotes’s spirals. In polar coordinates, they are:

• the epispiral:

$\displaystyle{ \frac{1}{r} = A \cos\left( k\theta + \varepsilon \right) }$

• the Poinsot spiral:

$\displaystyle{ \frac{1}{r} = A \cosh\left( k\theta + \varepsilon \right) }$

• the hyperbolic spiral:

$\displaystyle{ \frac{1}{r} = A \theta + \varepsilon }$
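All three families solve the Binet orbit equation specialized to an inverse cube force, which (in my notation, with $u = 1/r$) takes the form $u''(\theta) = A u$ for a constant $A$ depending on the force strength and angular momentum. A quick symbolic check:

```python
import sympy as sp

# Each spiral solves u''(theta) = A u for the right constant A (u = 1/r);
# this is the Binet orbit equation specialized to an inverse cube force.
theta, A, k, eps = sp.symbols('theta A k epsilon', positive=True)

spirals = [
    (sp.cos(k * theta + eps), -k**2),   # oscillatory case: the epispiral
    (sp.cosh(k * theta + eps), k**2),   # exponential-growth case
    (A * theta + eps, 0),               # borderline case, linear in theta
]
checks = [sp.simplify(sp.diff(u, theta, 2) - const * u) for u, const in spirals]
print(checks)   # [0, 0, 0]
```

The sign of $A$ is set by whether the inverse cube force loses to, beats, or exactly balances the centrifugal term, which is why there are exactly three families.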

## The Physics of Butterfly Wings

11 August, 2015

Some butterflies have shiny, vividly colored wings. From different angles you see different colors. This effect is called iridescence. How does it work?

It turns out these butterfly wings are made of very fancy materials! Light bounces around inside these materials in a tricky way. Sunlight of different colors winds up reflecting off these materials in different directions.

We’re starting to understand the materials and make similar substances in the lab. They’re called photonic crystals. They have amazing properties.

Here at the Centre for Quantum Technologies we have people studying exotic materials of many kinds. Next door, there’s a lab completely devoted to studying graphene: crystal sheets of carbon in which electrons can move as if they were massless particles! Graphene has a lot of potential for building new technologies—that’s why Singapore is pumping money into researching it.

Some physicists at MIT just showed that one of the materials in butterfly wings might act like a 3d form of graphene. In graphene, electrons can only move easily in 2 directions. In this new material, electrons could move in all 3 directions, acting as if they had no mass.

The pictures here show the microscopic structure of two materials found in butterfly wings:

The picture at left actually shows a sculpture made by the mathematical artist Bathsheba Grossman. But it’s a piece of a gyroid: a surface with a very complicated shape, which repeats forever in 3 directions. It’s called a minimal surface because you can’t shrink its area by tweaking it just a little. It divides space into two regions.

The gyroid was discovered in 1970 by a mathematician, Alan Schoen. It’s a triply periodic minimal surface, meaning one that repeats itself in 3 different directions in space, like a crystal.

Schoen was working for NASA, and his idea was to use the gyroid for building ultra-light, super-strong structures. But that didn’t happen. Research doesn’t move in predictable directions.

In 1983, people discovered that in some mixtures of oil and water, the oil naturally forms a gyroid. The sheets of oil try to minimize their area, so it’s not surprising that they form a minimal surface. Something else makes this surface be a gyroid—I’m not sure what.

Butterfly wings are made of a hard material called chitin. Around 2008, people discovered that the chitin in some iridescent butterfly wings is made in a gyroid pattern! The spacing in this pattern is very small, about one wavelength of visible light. This makes light move through this material in a complicated way, which depends on the light’s color and the direction it’s moving.

So: butterflies have naturally evolved a photonic crystal based on a gyroid!

The universe is awesome, but it’s not magic. A mathematical pattern is beautiful if it’s a simple solution to at least one simple problem. This is why beautiful patterns naturally bring themselves into existence: they’re the simplest ways for certain things to happen. Darwinian evolution helps out: it scans through trillions of possibilities and finds solutions to problems. So, we should expect life to be packed with mathematically beautiful patterns… and it is.

The picture at right above shows a ‘double gyroid’. Here it is again:

This is actually two interlocking surfaces, shown in red and blue. You can get them by writing the gyroid as a level surface:

$f(x,y,z) = 0$

and taking the two nearby surfaces

$f(x,y,z) = \pm c$

for some small value of $c.$
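If you want to play with this yourself, there is a standard trigonometric approximation to the gyroid as a level surface, $f(x,y,z) = \sin x \cos y + \sin y \cos z + \sin z \cos x$ (a well-known formula, though not one given above). A small sketch:

```python
import itertools, math

# The standard trigonometric approximation to the gyroid as a level surface
# (a well-known formula, not something special to butterfly wings):
def f(x, y, z):
    return (math.sin(x) * math.cos(y) + math.sin(y) * math.cos(z)
            + math.sin(z) * math.cos(x))

# Sample one periodic cell. f = 0 divides it into two regions of equal
# volume; the double gyroid is the pair of nearby level sets f = +c, f = -c.
n, c = 40, 0.9
grid = [2 * math.pi * i / n for i in range(n)]
vals = [f(x, y, z) for x, y, z in itertools.product(grid, grid, grid)]
pos = sum(v > 0 for v in vals) / len(vals)          # fraction on one side of f = 0
slab = sum(abs(v) < c for v in vals) / len(vals)    # fraction between the two sheets
print(pos, slab)   # pos is very close to 1/2, by the symmetry f(-x,-y,-z) = -f(x,y,z)
```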

It turns out that while they’re still growing, some butterflies have a double gyroid pattern in their wings. This turns into a single gyroid when they grow up!

The new research at MIT studied how an electron would move through a double gyroid pattern. They calculated its dispersion relation: how the speed of the electron would depend on its energy and the direction it’s moving.

An ordinary particle moves faster if it has more energy. But a massless particle, like a photon, moves at the same speed no matter what energy it has. The MIT team showed that an electron in a double gyroid pattern moves at a speed that doesn’t depend much on its energy. So, in some ways this electron acts like a massless particle.

But it’s quite different from a photon. It’s actually more like a neutrino! You see, unlike photons, electrons and neutrinos are spin-1/2 particles. Neutrinos are almost massless. A massless spin-1/2 particle can have a built-in handedness, spinning in only one direction around its axis of motion. Such a particle is called a Weyl spinor. The MIT team showed that an electron moving through a double gyroid acts approximately like a Weyl spinor!
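Here’s what the linear dispersion means concretely, in a minimal sketch (this is textbook physics, not the MIT team’s actual band calculation): the Weyl Hamiltonian $H(\mathbf{k}) = v\,\boldsymbol{\sigma}\cdot\mathbf{k}$ has energies $\pm v|\mathbf{k}|,$ linear in momentum, so the speed is the same at every energy:

```python
import numpy as np

# The Weyl Hamiltonian H(k) = v * sigma . k (standard physics, not the MIT
# team's band calculation). Its two energies are E = -v|k| and +v|k|,
# so the group speed |dE/dk| = v is the same at every energy.
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
v = 1.0

def weyl_energies(k):
    H = v * sum(ki * si for ki, si in zip(k, sigma))
    return np.linalg.eigvalsh(H)   # ascending: [-v|k|, +v|k|]

for k in ([0.1, 0.0, 0.0], [0.0, 0.3, 0.4], [1.0, 2.0, 2.0]):
    print(weyl_energies(k), np.linalg.norm(k))
```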

How does this work? Well, the key fact is that the double gyroid has a built-in handedness, or chirality. It comes in a left-handed and right-handed form. You can see the handedness quite clearly in Grossman’s sculpture of the ordinary gyroid:

Beware: nobody has actually made electrons act like Weyl spinors in the lab yet. The MIT team just found a way that should work. Someday someone will actually make it happen, probably in less than a decade. And later, someone will do amazing things with this ability. I don’t know what. Maybe the butterflies know!

### References and more

For a good introduction to the physics of gyroids, see:

• James A. Dolan, Bodo D. Wilts, Silvia Vignolini, Jeremy J. Baumberg, Ullrich Steiner and Timothy D. Wilkinson, Optical properties of gyroid structured materials: from photonic crystals to metamaterials, Advanced Optical Materials 3 (2015), 12–32.

For some of the history and math of gyroids, see Alan Schoen’s webpage:

• Alan Schoen, Triply-periodic minimal surfaces.

For more on gyroids in butterfly wings, see:

• K. Michielsen and D. G. Stavenga, Gyroid cuticular structures in butterfly wing scales: biological photonic crystals.

• Vinodkumar Saranathana et al, Structure, function, and self-assembly of single network gyroid (I4132) photonic crystals in butterfly wing scales, PNAS 107 (2010), 11676–11681.

The paper by Michielsen and Stavenga is free online! They say the famous ‘blue Morpho’ butterfly shown in the picture at the top of this article does not use a gyroid; it uses a “two-dimensional photonic crystal slab consisting of arrays of rectangles formed by lamellae and microribs.” But they find gyroids in four other species: Callophrys rubi, Cyanophrys remus, Parides sesostris and Teinopalpus imperialis. They compare tunnelling electron microscope pictures of slices of their iridescent patches with computer-generated slices of gyroids. The comparison looks pretty good to me:

For the evolution of iridescence, see:

• Melissa G. Meadows et al, Iridescence: views from many angles, J. Roy. Soc. Interface 6 (2009).

For the new research at MIT, see:

• Ling Lu, Liang Fu, John D. Joannopoulos and Marin Soljačić, Weyl points and line nodes in gapless gyroid photonic crystals.

• Ling Lu, Zhiyu Wang, Dexin Ye, Lixin Ran, Liang Fu, John D. Joannopoulos and Marin Soljačić, Experimental observation of Weyl points, Science 349 (2015), 622–624.

Again, the first is free online. There’s a lot of great math lurking inside, most of which is too mind-blowing to explain quickly. Let me just paraphrase the start of the paper, so at least experts can get the idea:

Two-dimensional (2d) electrons and photons at the energies and frequencies of Dirac points exhibit extraordinary features. As the best example, almost all the remarkable properties of graphene are tied to the massless Dirac fermions at its Fermi level. Topologically, Dirac cones are not only the critical points for 2d phase transitions but also the unique surface manifestation of a topologically gapped 3d bulk. In a similar way, it is expected that if a material could be found that exhibits a 3d linear dispersion relation, it would also display a wide range of interesting physics phenomena. The associated 3D linear point degeneracies are called “Weyl points”. In the past year, there have been a few studies of Weyl fermions in electronics. The associated Fermi-arc surface states, quantum Hall effect, novel transport properties and a realization of the Adler–Bell–Jackiw anomaly are also expected. However, no observation of Weyl points has been reported. Here, we present a theoretical discovery and detailed numerical investigation of frequency-isolated Weyl points in perturbed double-gyroid photonic crystals along with their complete phase diagrams and their topologically protected surface states.

Also a bit for the mathematicians:

Weyl points are topologically stable objects in the 3d Brillouin zone: they act as monopoles of Berry flux in momentum space, and hence are intimately related to the topological invariant known as the Chern number. The Chern number can be defined for a single bulk band or a set of bands, where the Chern numbers of the individual bands are summed, on any closed 2d surface in the 3d Brillouin zone. The difference of the Chern numbers defined on two surfaces, of all bands below the Weyl point frequencies, equals the sum of the chiralities of the Weyl points enclosed in between the two surfaces.

This is a mix of topology and physics jargon that may be hard for pure mathematicians to understand, but I’ll be glad to translate if there’s interest.

For starters, a ‘monopole of Berry flux in momentum space’ is a poetic way of talking about a twisted complex line bundle over the space of allowed energy-momenta of the electron in the double gyroid. We get a twist at every ‘Weyl point’, meaning a point where the dispersion relations look locally like those of a Weyl spinor when its energy-momentum is near zero. Near such a point, the dispersion relations are a Fourier-transformed version of the Weyl equation.