In October 2006, I wrote this in my online diary:

A long time ago on this diary, I mentioned my friend Bruce Smith’s nightmare scenario. In the quest for ever faster growth, corporations evolve toward ever faster exploitation of natural resources. The Earth is not enough. So, ultimately, they send out self-replicating von Neumann probes that eat up solar systems as they go, turning the planets into more probes. Different brands of probes will compete among each other, evolving toward ever faster expansion. Eventually, the winners will form a wave expanding outwards at nearly the speed of light—demolishing everything behind them, leaving only wreckage.

The scary part is that even if we don’t let this happen, some other civilization might.

The last point is the key one. Even if something is unlikely, in a sufficiently large universe it will happen, as long as it’s possible. And then it will perpetuate itself, as long as it’s evolutionarily fit. Our universe seems pretty darn big. So, even if a given strategy is hard to find, if it’s a winning strategy it will get played somewhere. So, even in this nightmare scenario of "spheres of von Neumann probes expanding at near lightspeed", we don’t need to worry about a bleak future for the universe as a whole, any more than we need to worry that viruses will completely kill off all higher life forms. Some fraction of civilizations will probably develop defenses in time to repel the onslaught of these expanding spheres.

It’s not something I stay awake worrying about, but it’s a depressingly plausible possibility. As you can see, I was trying to reassure myself that everything would be okay, or at least acceptable, in the long run.

Even earlier, S. Jay Olson and I wrote a paper together on how quantum gravity limits the accuracy with which we can measure distances. If you try to measure a distance too accurately, you’ll need to concentrate so much energy in such a small space that you’ll create a black hole!

That was in 2002. Later I lost touch with him. But now I’m happy to discover that he’s doing interesting work on quantum gravity and quantum information processing! He is now at Boise State University in Idaho, his home state.

But here’s the cool part: he’s also studying aggressively expanding civilizations.

What will happen if some civilizations start aggressively expanding through the Universe at a reasonable fraction of the speed of light? We don’t have to assume most of them do. Indeed, there can’t be too many, or they’d already be here! More precisely, the density of such civilizations must be low at the present time. The number of them could be infinite, since space is apparently infinite. But none have reached us. We may eventually become such a civilization, but we’re not one yet.

Each such civilization will form a growing ‘bubble’: an expanding sphere of influence. And occasionally, these bubbles will collide!

Here are some pictures from a simulation he did:

As he notes, the math of these bubbles has already been studied by researchers interested in inflationary cosmology, like Alan Guth. These folks have considered the possibility that in the very early Universe, most of space was filled with a ‘false vacuum’: a state of matter that resembles the actual vacuum, but has higher energy density.

A false vacuum could turn into the true vacuum, liberating energy in the form of particle-antiparticle pairs. However, it might not do this instantly! It might be ‘metastable’, like ball number 1 in this picture:

It might need a nudge to ‘roll over the hill’ (metaphorically) and down into the lower-energy state corresponding to the true vacuum, shown as ball number 3. Or, thanks to quantum mechanics, it might ‘tunnel’ through this hill.

The balls and the hill are just an analogy. What I mean is that the false vacuum might need to go through a stage of having even higher energy density before it could turn into the true vacuum. Random fluctuations, either quantum-mechanical or thermal, could make this happen. Such a random fluctuation could happen in one location, forming a ‘bubble’ of true vacuum that—under certain conditions—would rapidly expand.

It’s actually not very different from bubbles of steam forming in superheated water!

But here’s the really interesting point Jay Olson made in his first paper on this subject: research on bubbles in inflationary cosmology could actually be relevant to aggressively expanding civilizations!

Why? Just as a bubble of expanding true vacuum has different pressure than the false vacuum surrounding it, the same might be true for an aggressively expanding civilization. If they are serious about expanding rapidly, they may convert a lot of matter into radiation to power their expansion. And while energy is conserved in this process, the *pressure* of radiation in space is a lot bigger than the pressure of matter, which is almost zero.

General relativity says that energy density slows the expansion of the Universe. But it also says (and this is probably less well known among nonphysicists) that *pressure* has a similar effect. Also, as the Universe expands, the energy density and pressure of radiation drop at a different rate than the energy density of matter.
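To make the last two sentences concrete, here are the standard textbook equations for a homogeneous, isotropic (FRW) universe. They are general relativity background, not formulas quoted from Olson's papers. The acceleration equation shows pressure $p$ entering alongside energy density $\rho$, and the scaling laws show radiation thinning out one power of the scale factor $a$ faster than matter:

$$
\frac{\ddot a}{a} = -\frac{4\pi G}{3}\left(\rho + \frac{3p}{c^2}\right),
\qquad
p_{\text{matter}} \approx 0,
\qquad
p_{\text{rad}} = \frac{\rho_{\text{rad}}\,c^2}{3},
\qquad
p_{\Lambda} = -\rho_{\Lambda}\,c^2,
$$

$$
\rho_{\text{matter}} \propto a^{-3},
\qquad
\rho_{\text{rad}} \propto a^{-4},
\qquad
\rho_{\Lambda} = \text{const}.
$$

So converting matter of density $\rho$ into radiation leaves $\rho$ unchanged at that moment. However, it raises the combination $\rho + 3p/c^2$ from $\rho$ to $2\rho$, which doubles its decelerating effect. A cosmological constant contributes $\rho_\Lambda + 3p_\Lambda/c^2 = -2\rho_\Lambda$ and therefore drives acceleration.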

So, the expansion of the Universe itself, on a very large scale, could be affected by aggressively expanding civilizations!

The fun part is that Jay Olson actually studies this in a quantitative way, making some guesses about the numbers involved. Of course there’s a huge amount of uncertainty in all matters concerning aggressively expanding high-tech civilizations, so he actually considers a wide range of possible numbers. But if we assume a civilization turns a large fraction of matter into radiation, the effects could be significant!

The effect of the extra pressure due to radiation would be to temporarily slow the expansion of the Universe. But the expansion would not be stopped. The radiation will gradually thin out. So eventually, dark energy—which has negative pressure, and does not thin out as the Universe expands—will win. Then the Universe will expand exponentially, as it is already beginning to do now.

(Here I am ignoring speculative theories where dark energy has properties that change dramatically over time.)
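Here is a toy numerical illustration of the "radiation first, dark energy later" story. It is a minimal sketch, not Olson's model. It assumes a flat universe containing only radiation and a cosmological constant, with made-up density fractions at scale factor $a = 1$. It computes the deceleration parameter $q = -a\ddot a/\dot a^2 = \tfrac12 \sum_i \Omega_i (1 + 3w_i)$. Here $q > 0$ means the expansion is slowing down, but it does not mean the expansion is reversing.

```python
"""Toy sketch: deceleration parameter q(a) for a flat FRW universe
containing radiation and a cosmological constant.

Assumption (hypothetical numbers): at a = 1 the energy density is 90%
radiation and 10% dark energy. These are illustrative values only.
"""

# Equation-of-state parameters w = p / (rho c^2)
W_RADIATION = 1.0 / 3.0
W_LAMBDA = -1.0


def densities(a, rho_rad0, rho_lambda0):
    """Return (rho_rad, rho_lambda) at scale factor a."""
    rho_rad = rho_rad0 * a ** -4  # radiation thins out as a^-4
    rho_lambda = rho_lambda0      # dark energy density stays constant
    return rho_rad, rho_lambda


def deceleration_parameter(a, rho_rad0=0.9, rho_lambda0=0.1):
    """q = (1/2) * sum_i Omega_i * (1 + 3 w_i); q > 0 means decelerating."""
    rho_rad, rho_lambda = densities(a, rho_rad0, rho_lambda0)
    total = rho_rad + rho_lambda
    return 0.5 * (rho_rad * (1 + 3 * W_RADIATION)
                  + rho_lambda * (1 + 3 * W_LAMBDA)) / total


if __name__ == "__main__":
    for a in (1.0, 1.5, 2.0, 4.0, 10.0):
        print(f"a = {a:5.1f}   q = {deceleration_parameter(a):+.3f}")
```

With these numbers, $q$ changes sign where the radiation and dark energy densities are equal, at $a = 3^{1/2}$. After that point, dark energy wins and the expansion accelerates.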

Here are his papers on this subject. The abstracts sketch his results, but you have to look at the papers to see how nice they are. He’s thought quite carefully about these things.

• S. Jay Olson, Homogeneous cosmology with aggressively expanding civilizations, *Classical and Quantum Gravity* **32** (2015) 215025.

Abstract. In the context of a homogeneous universe, we note that the appearance of aggressively expanding advanced life is geometrically similar to the process of nucleation and bubble growth in a first-order cosmological phase transition. We exploit this similarity to describe the dynamics of life saturating the universe on a cosmic scale, adapting the phase transition model to incorporate probability distributions of expansion and resource consumption strategies. Through a series of numerical solutions spanning several orders of magnitude in the input assumption parameters, the resulting cosmological model is used to address basic questions related to the intergalactic spreading of life, dealing with issues such as timescales, observability, competition between strategies, and first-mover advantage. Finally, we examine physical effects on the universe itself, such as reheating and the backreaction on the evolution of the scale factor, if such life is able to control and convert a significant fraction of the available pressureless matter into radiation. We conclude that the existence of life, if certain advanced technologies are practical, could have a significant influence on the future large-scale evolution of the universe.

• S. Jay Olson, Estimates for the number of visible galaxy-spanning civilizations and the cosmological expansion of life.

Abstract. If advanced civilizations appear in the universe with a desire to expand, the entire universe can become saturated with life on a short timescale, even if such expanders appear but rarely. Our presence in an untouched Milky Way thus constrains the appearance rate of galaxy-spanning Kardashev type III (K3) civilizations, if it is assumed that some fraction of K3 civilizations will continue their expansion at intergalactic distances. We use this constraint to estimate the appearance rate of K3 civilizations for 81 cosmological scenarios by specifying the extent to which humanity could be a statistical outlier. We find that in nearly all plausible scenarios, the distance to the nearest visible K3 is cosmological. In searches where the observable range is limited, we also find that the most likely detections tend to be expanding civilizations who have entered the observable range from farther away. An observation of K3 clusters is thus more likely than isolated K3 galaxies.

• S. Jay Olson, On the visible size and geometry of aggressively expanding civilizations at cosmological distances.

Abstract. If a subset of advanced civilizations in the universe choose to rapidly expand into unoccupied space, these civilizations would have the opportunity to grow to a cosmological scale over the course of billions of years. If such life also makes observable changes to the galaxies they inhabit, then it is possible that vast domains of life-saturated galaxies could be visible from the Earth. Here, we describe the shape and angular size of these domains as viewed from the Earth, and calculate median visible sizes for a variety of scenarios. We also calculate the total fraction of the sky that should be covered by at least one domain. In each of the 27 scenarios we examine, the median angular size of the nearest domain is within an order of magnitude of a percent of the whole celestial sphere. Observing such a domain would likely require an analysis of galaxies on the order of a giga-lightyear from the Earth.

Here are the main assumptions in his first paper:

1. At early times (relative to the appearance of life), the universe is described by the standard cosmology – a benchmark Friedmann-Robertson-Walker (FRW) solution.

2. The limits of technology will allow for self-reproducing spacecraft, sustained relativistic travel over cosmological distances, and an efficient process to convert baryonic matter into radiation.

3. Control of resources in the universe will tend to be dominated by civilizations that adopt a strategy of aggressive expansion (defined as a frontier which expands at a large fraction of the speed of the individual spacecraft involved), rather than those expanding diffusively due to the conventional pressures of population dynamics.

4. The appearance of aggressively expanding life in the universe is a spatially random event and occurs at some specified, model-dependent rate.

5. Aggressive expanders will tend to expand in all directions unless constrained by the presence of other civilizations, will attempt to gain control of as much matter as is locally available for their use, and once established in a region of space, will consume mass as an energy source (converting it to radiation) at some specified, model-dependent rate.
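The nucleation picture behind assumptions 4 and 5 has a classic closed-form answer in the simplest setting. Here is a minimal sketch of that setting, not Olson's model: it ignores cosmic expansion (his assumption 1) and his probability distributions of strategies, and it assumes a constant nucleation rate per unit volume and a constant wall speed. Under those assumptions, the fraction of space untouched by any bubble follows the Kolmogorov-Johnson-Mehl-Avrami formula. The rate and speed values below are hypothetical.

```python
import math


def unoccupied_fraction(t, rate, speed):
    """Fraction of static space not yet inside any bubble at time t.

    Assumptions (a toy version of assumptions 4 and 5, ignoring cosmic
    expansion): bubbles nucleate uniformly at random with a constant `rate`
    per unit volume per unit time, only in unoccupied space, and their
    walls move outward at a constant `speed`. The
    Kolmogorov-Johnson-Mehl-Avrami result is f(t) = exp(-X_ext(t)), with
        X_ext(t) = integral_0^t rate * (4*pi/3) * speed**3 * (t - s)**3 ds
                 = pi * rate * speed**3 * t**4 / 3.
    """
    extended_fraction = math.pi * rate * speed ** 3 * t ** 4 / 3.0
    return math.exp(-extended_fraction)


def half_filled_time(rate, speed):
    """Time at which half of space lies inside some bubble."""
    return (3.0 * math.log(2.0) / (math.pi * rate * speed ** 3)) ** 0.25


if __name__ == "__main__":
    rate, speed = 1e-3, 1.0  # hypothetical units in which wall speed = 1
    t_half = half_filled_time(rate, speed)
    for t in (0.5 * t_half, t_half, 1.5 * t_half):
        f = unoccupied_fraction(t, rate, speed)
        print(f"t = {t:6.2f}   unoccupied fraction = {f:.3f}")
```

The $t^4$ in the exponent is the point: once bubbles begin to appear, space fills up abruptly. This matches the abstracts' remark that the universe can become saturated on a short timescale even if expanders are rare.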


In math we love functions. If we have a function

$$f : X \to Y,$$

we can formally turn around the arrow to think of $f$ as something going back from $Y$ to $X$. But this something is usually not a function: it’s called a ‘cofunction’. A **cofunction** from $X$ to $Y$ is simply a function from $Y$ to $X$.

Cofunctions are somewhat interesting, but they’re really just functions viewed through a looking glass, so they don’t give much new—at least, not by themselves.

The game gets more interesting if we think of functions and cofunctions as special sorts of relations. A **relation** from $X$ to $Y$ is a subset

$$R \subseteq X \times Y.$$

It’s a **function** when for each $x \in X$ there’s a unique $y \in Y$ with $(x,y) \in R$. It’s a **cofunction** when for each $y \in Y$ there’s a unique $x \in X$ with $(x,y) \in R$.
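These two definitions translate directly into code. This is a minimal sketch that assumes finite sets are Python sets and a relation is a set of pairs `(x, y)`. The helper names are hypothetical, not from any library.

```python
def is_function(rel, xs, ys):
    """True if each x in xs is related to exactly one y in ys."""
    return all(sum((x, y) in rel for y in ys) == 1 for x in xs)


def is_cofunction(rel, xs, ys):
    """True if each y in ys is related to exactly one x in xs."""
    return all(sum((x, y) in rel for x in xs) == 1 for y in ys)


if __name__ == "__main__":
    X, Y = {1, 2, 3}, {"a", "b"}
    f = {(1, "a"), (2, "a"), (3, "b")}  # a function; "a" has two preimages
    h = {(1, "a"), (2, "b")}            # a cofunction; 3 is related to nothing
    print(is_function(f, X, Y), is_cofunction(f, X, Y))
    print(is_function(h, X, Y), is_cofunction(h, X, Y))
```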

Just as we can compose functions, we can compose relations. Relations have certain advantages over functions: for example, we can ‘turn around’ any relation from $X$ to $Y$ and get a relation from $Y$ to $X$.

If we turn around a function we get a cofunction, and vice versa. But we can also do other fun things: for example, since both functions and cofunctions are relations, we can compose a function and a cofunction and get a relation.
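The two operations mentioned here, turning a relation around and composing relations, fit in a few lines. This is a minimal sketch with relations stored as sets of pairs. The helper names are hypothetical, and `compose(r, s)` uses the convention "first `r`, then `s`". In the example, composing a function with its turned-around cofunction gives the relation "has the same image", which is neither a function nor a cofunction.

```python
def converse(rel):
    """Turn a relation from X to Y around into a relation from Y to X."""
    return {(y, x) for (x, y) in rel}


def compose(r, s):
    """Compose r: X -> Y with s: Y -> Z (first r, then s)."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


if __name__ == "__main__":
    f = {(1, "a"), (2, "a"), (3, "b")}  # a function from {1,2,3} to {a,b}
    same_image = compose(f, converse(f))  # x ~ x' iff f(x) == f(x')
    print(sorted(same_image))
```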

Of course, relations also have certain *disadvantages* compared to functions. But it’s utterly clear by now that the category where the objects are finite sets and the morphisms are relations is very important.

So far, so good. But what happens if we take the definition of ‘relation’ and turn all the arrows around?

There are actually several things I could mean by this question, some more interesting than others. But one of them gives a very interesting new concept: the concept of ‘corelation’. And two of my students have just written a very nice paper on corelations:

• Brandon Coya and Brendan Fong, Corelations are the prop for extraspecial commutative Frobenius monoids.

Here’s why this paper is important for network theory: corelations between finite sets are exactly what we need to describe electrical circuits made of ideal conductive wires! A corelation from a finite set $X$ to a finite set $Y$ can be drawn this way:

I have drawn more wires than strictly necessary: I’ve drawn a wire between two points whenever I want current to be able to flow between them. But there’s a reason I did this: a corelation from $X$ to $Y$ simply tells us when current can flow from one point in either of these sets to any other point in these sets.

Of course circuits made solely of conductive wires are not very exciting for electrical engineers. But in an earlier paper, Brendan introduced corelations as an important stepping-stone toward more general circuits:

• John Baez and Brendan Fong, A compositional framework for passive linear circuits.

The key point is simply that you use conductive wires to connect resistors, inductors, capacitors, batteries and the like to build interesting circuits, so if you don’t fully understand the math of conductive wires, you’re limited in your ability to understand circuits in general!

In their new paper, Brendan teamed up with Brandon Coya, and they figured out all the rules obeyed by the category where the objects are finite sets and the morphisms are corelations. I’ll explain these rules later.

This sort of analysis had previously been done for the category of finite sets and relations, and it turns out there’s a beautiful analogy between the two cases! Here is a chart displaying the analogy:

| Spans | Cospans |
|---|---|
| extra bicommutative bimonoids | special commutative Frobenius monoids |
| Relations | Corelations |
| extraspecial bicommutative bimonoids | extraspecial commutative Frobenius monoids |

I’m sure this will be cryptic to the nonmathematicians reading this, and even many mathematicians—but the paper explains what’s going on here.

I’ll actually say what an ‘extraspecial commutative Frobenius monoid’ is later in this post. This is a terse way of listing all the rules obeyed by corelations between finite sets—and thus, all the rules obeyed by conductive wires.

But first, let’s talk about something simpler.

Just as we can define functions as relations of a special sort, we can also define relations in terms of functions. A relation from $X$ to $Y$ is a subset

$$R \subseteq X \times Y,$$

but we can think of this as an equivalence class of one-to-one functions

$$i : R \to X \times Y.$$

Why an equivalence class? The image of $i$ is our desired subset of $X \times Y$. The set $R$ here could be replaced by any isomorphic set; its only role is to provide ‘names’ for the elements of $X \times Y$ that are in the image of $i$.

Now we have a relation described as an arrow, or really an equivalence class of arrows. Next, let’s turn the arrow around!

There are different things I might mean by that, but we want to do it cleverly. When we turn arrows around, the concept of product (for example, cartesian product of sets) turns into the concept of sum (for example, disjoint union of sets). Similarly, the concept of monomorphism (such as a one-to-one function) turns into the concept of epimorphism (such as an onto function).

So, we should define a **corelation** from a set $X$ to a set $Y$ to be an equivalence class of onto functions

$$p : X + Y \to C.$$

Why an equivalence class? The set $C$ here could be replaced by any isomorphic set; its only role is to provide ‘names’ for the sets of elements of $X + Y$ that get mapped to the same thing via $p$.
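Here is a minimal Python sketch of this definition. It assumes a hypothetical encoding: elements of the disjoint union $X + Y$ are tagged pairs `("X", x)` and `("Y", y)`, and an onto function $p$ is a dict whose set of values plays the role of $C$. Taking the fibers of $p$ gives a partition of $X + Y$. Replacing $C$ by an isomorphic set leaves that partition unchanged, which is why only the equivalence class matters.

```python
def disjoint_union(xs, ys):
    """Tag elements so that X + Y keeps the copies of X and Y apart."""
    return {("X", x) for x in xs} | {("Y", y) for y in ys}


def partition_from_onto(p):
    """Fibers p^{-1}(c) of an onto function p: X + Y -> C given as a dict."""
    fibers = {}
    for element, name in p.items():
        fibers.setdefault(name, set()).add(element)
    return frozenset(frozenset(block) for block in fibers.values())


if __name__ == "__main__":
    # X = {0, 1}, Y = {0}: the tags keep ("X", 0) and ("Y", 0) distinct.
    p = {("X", 0): "red", ("X", 1): "red", ("Y", 0): "blue"}
    q = {("X", 0): 1, ("X", 1): 1, ("Y", 0): 2}  # same map, C relabeled
    print(partition_from_onto(p) == partition_from_onto(q))
```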

In simpler terms, a corelation from a set $X$ to a set $Y$ is just a partition of the disjoint union $X + Y$. So, it looks like this:

If we like, we can then draw a line connecting any two points that lie in the same part of the partition:

These lines determine the corelation, so we can also draw a corelation this way:

This is why corelations describe circuits made solely of wires!

The main result in Brandon and Brendan’s paper is that the category of finite sets and corelations is equivalent to the PROP for extraspecial commutative Frobenius monoids. That’s a terse way of listing the laws governing corelations between finite sets.

Let me just show you the most important laws. In each of these laws I’ll draw two circuits made of wires, and write an equals sign asserting that they give the same corelation from a set $X$ to a set $Y$. The inputs of each circuit are on top, and the outputs are at the bottom. I’ll draw 3-way junctions as little triangles, but don’t worry about that. When we compose two corelations we may get a wire left in mid-air, not connected to the inputs or outputs. We draw the end of the wire as a little circle.
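To see composition concretely, here is a minimal Python sketch. It assumes corelations are stored as partitions of tagged points, using a hypothetical encoding where points of $X$, $Y$, $Z$ carry the tags `"X"`, `"Y"`, `"Z"`. The code glues the two partitions along $Y$ with a union-find structure and then forgets $Y$. A block made only of $Y$ points, which is the wire left in mid-air, simply disappears. The helper name is illustrative, not taken from Coya and Fong's paper.

```python
def compose_corelations(first, second):
    """Compose corelations first: X -> Y and second: Y -> Z.

    `first` is a partition of points ("X", x), ("Y", y); `second` is a
    partition of points ("Y", y), ("Z", z). Blocks sharing a point of Y
    merge, then the Y points are forgotten.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    def union(u, v):
        parent[find(u)] = find(v)

    for block in list(first) + list(second):
        block = list(block)
        find(block[0])  # register singleton blocks too
        for other in block[1:]:
            union(block[0], other)

    merged = {}
    for point in list(parent):
        if point[0] != "Y":  # forget the middle set
            merged.setdefault(find(point), set()).add(point)
    return frozenset(frozenset(b) for b in merged.values())


if __name__ == "__main__":
    # A wire from nowhere to Y = {0}, then from Y = {0} to nowhere:
    unit = frozenset({frozenset({("Y", 0)})})
    counit = frozenset({frozenset({("Y", 0)})})
    print(compose_corelations(unit, counit))  # the mid-air wire is dropped
```

In this encoding, the example above is an instance of the ‘extra’ law: the isolated piece of wire vanishes. Splitting one wire into two and then merging them back gives the identity corelation, which is the ‘special’ law.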

There are some laws called the ‘commutative monoid’ laws:

and an upside-down version called the ‘cocommutative comonoid’ laws:

Then we have ‘Frobenius laws’:

and finally we have the ‘special’ and ‘extra’ laws:

All other laws can be derived from these in some systematic ways.

Commutative Frobenius monoids obey the commutative monoid laws, the cocommutative comonoid laws and the Frobenius laws. They play a fundamental role in 2d topological quantum field theory. Special Frobenius monoids are also well-known. But the ‘extra’ law, which says that a little piece of wire not connected to anything can be thrown away with no effect, is less well studied. Jason Erbele and I gave it this name in our work on control theory:

• John Baez and Jason Erbele, Categories in control.

David Ellerman has spent a lot of time studying what would happen to mathematics if we turned around a lot of arrows in a certain systematic way. In particular, just as the concept of relation would be replaced by the concept of corelation, the concept of subset would be replaced by the concept of partition. You can see how it fits together: just as a relation from $X$ to $Y$ is a subset of $X \times Y$, a corelation from $X$ to $Y$ is a partition of $X + Y$.

There’s a lattice of subsets of a set:

In logic these subsets correspond to propositions, and the lattice operations are the logical operations ‘and’ and ‘or’. But there’s also a lattice of partitions of a set:

In Ellerman’s vision, this lattice of partitions gives a new kind of logic. You can read about it here:

• David Ellerman, Introduction to partition logic, *Logic Journal of the Interest Group in Pure and Applied Logic* **22** (2014), 94–125.
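The two lattice operations on partitions are easy to compute. Here is a minimal sketch with partitions stored as frozensets of frozensets. Which operation is called ‘meet’ and which ‘join’ depends on whether the partial order puts finer or coarser partitions on top; conventions differ, and Ellerman's choice may not match the usual one. For that reason, the hypothetical helper names below describe what each operation does.

```python
def common_refinement(p, q):
    """Blocks are the nonempty intersections of a block of p with a block of q."""
    return frozenset(frozenset(a & b) for a in p for b in q if a & b)


def common_coarsening(p, q):
    """Finest partition coarser than both: glue together blocks that overlap."""
    merged = []  # pairwise disjoint blocks built so far
    for block in [set(b) for b in p] + [set(b) for b in q]:
        overlapping = [m for m in merged if m & block]
        for m in overlapping:
            block |= m
            merged.remove(m)
        merged.append(block)
    return frozenset(frozenset(b) for b in merged)


if __name__ == "__main__":
    p = frozenset({frozenset({1, 2}), frozenset({3}), frozenset({4})})
    q = frozenset({frozenset({1}), frozenset({2, 3}), frozenset({4})})
    print(sorted(map(sorted, common_refinement(p, q))))  # all singletons
    print(sorted(map(sorted, common_coarsening(p, q))))  # {1,2,3} and {4}
```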

As mentioned, the main result in Brandon and Brendan’s paper is that the category of finite sets and corelations is equivalent to the PROP for extraspecial commutative Frobenius monoids. After they proved this, they noticed that the result has also been stated in other language and proved in other ways by two other authors:

• Fabio Zanasi, *Interacting Hopf Algebras – the Theory of Linear Systems*, PhD thesis, École Normale Supérieure de Lyon, 2015.

• K. Došen and Z. Petrić, Syntax for split preorders, *Annals of Pure and Applied Logic* **164** (2013), 443–481.

Unsurprisingly, I prefer Brendan and Brandon’s approach to deriving the result. But it’s nice to see different perspectives!


It’s more impressive to see someone go native with a pack of hyenas:

• Marcus Baynes-Rock, *Among the Bone Eaters: Encounters with Hyenas in Harar*, Penn State University Press, 2015.

I’ve always been scared of hyenas, perhaps because they look ill-favored and ‘mean’ to me, or perhaps because their jaws have incredible bone-crushing force:

This is a spotted hyena, the species of hyena that Marcus Baynes-Rock befriended in the Ethiopian city of Harar. Their bite force has been estimated at 220 pounds!

(As a scientist I should say 985 newtons, but I have trouble imagining what it’s like to have teeth pressing into my flesh with a force of 985 newtons. If you don’t have a feeling for ‘pounds’, just imagine a 100-kilogram man standing on a hyena tooth that is pressing into your leg.)

So, you don’t want to annoy a hyena, or look too edible. However, the society of hyenas is founded on *friendship!* It’s the bonds of friendship that will make one hyena rush in to save another from an attacking lion. So, if you can figure out how to make hyenas *befriend* you, you’ve got some heavy-duty pals who will watch your back.

In Harar, people have been associating with spotted hyenas for a long time. At first they served as ‘trash collectors’, but later the association deepened. According to Wikipedia:

Written records indicate that spotted hyenas have been present in the walled Ethiopian city of Harar for at least 500 years, where they sanitise the city by feeding on its organic refuse.

The practice of regularly feeding them did not begin until the 1960s. The first to put it into practice was a farmer who began to feed hyenas in order to stop them attacking his livestock, with his descendants having continued the practice. Some of the hyena men give each hyena a name they respond to, and call to them using a “hyena dialect”, a mixture of English and Oromo. The hyena men feed the hyenas by mouth, using pieces of raw meat provided by spectators. Tourists usually organize to watch the spectacle through a guide for a negotiable rate. As of 2002, the practice is considered to be on the decline, with only two practicing hyena men left in Harar.

According to local folklore, the feeding of hyenas in Harar originated during a 19th-century famine, during which the starving hyenas began to attack livestock and humans. In one version of the story, a pure-hearted man dreamed of how the Hararis could placate the hyenas by feeding them porridge, and successfully put it into practice, while another credits the revelation to the town’s Muslim saints convening on a mountaintop. The anniversary of this pact is celebrated every year on the Day of Ashura, when the hyenas are provided with porridge prepared with pure butter. It is believed that during this occasion, the hyenas’ clan leaders taste the porridge before the others. Should the porridge not be to the lead hyenas’ liking, the other hyenas will not eat it, and those in charge of feeding them make the requested improvements. The manner in which the hyenas eat the porridge on this occasion are believed to have oracular significance; if the hyena eats more than half the porridge, then it is seen as portending a prosperous new year. Should the hyena refuse to eat the porridge or eat all of it, then the people will gather in shrines to pray, in order to avert famine or pestilence.

Marcus Baynes-Rock went to Harar to learn about this. He wound up becoming friends with a pack of hyenas:

He would play with them and run with them through the city streets at night. In the end he ‘went native’: he would even be startled, like the hyenas, when they came across a human being!

To get a feeling for this, I think you have to either read his book or listen to this:

• In a city that welcomes hyenas, an anthropologist makes friends, *Here and Now*, National Public Radio, 18 January 2016.

Nearer the beginning of this quest, he wrote this:

The Old Town of Harar in eastern Ethiopia is enclosed by a wall built 500 years ago to protect the town’s inhabitants from hostile neighbours after a religious conflict that destabilised the region. Historically, the gates would be opened every morning to admit outsiders into the town to buy and sell goods and perhaps worship at one of the dozens of mosques in the Muslim city. Only Muslims were allowed to enter. And each night, non-Hararis would be evicted from the town and the gates locked. So it is somewhat surprising that this endogamous, culturally exclusive society incorporated holes into its defensive wall, through which spotted hyenas from the surrounding hills could access the town at night.

Spotted hyenas could be considered the most hated mammal in Africa. Decried as ugly and awkward, associated with witches and sorcerers and seen as contaminating, spotted hyenas are a public relations challenge of the highest order. Yet in Harar, hyenas are not only allowed into the town to clean the streets of food scraps, they are deeply embedded in the traditions and beliefs of the townspeople. Sufism predominates in Harar and at last count there were 121 shrines in and near the town dedicated to the town’s saints. These saints are said to meet on Mt Hakim every Thursday to discuss any pressing issues facing the town and it is the hyenas who pass the information from the saints on to the townspeople via intermediaries who can understand hyena language. Etymologically, the Harari word for hyena, ‘waraba’ comes from ‘werabba’ which translates literally as ‘news man’. Hyenas are also believed to clear the streets of jinn, the unseen entities that are a constant presence for people in the town, and hyenas’ spirits are said to be like angels who fight with bad spirits to defend the souls of spiritually vulnerable people.

[…]

My current research in Harar is concerned with both sides of the relationship. First is the collection of stories, traditions, songs and proverbs of which there are many and trying to understand how the most hated mammal in Africa can be accommodated in an urban environment; to understand how a society can tolerate the presence of a potentially dangerous species. Second is to understand the hyenas themselves and their participation in the relationship. In other parts of Ethiopia, and even within walking distance of Harar, hyenas are dangerous animals and attacks on people are common. Yet, in the old town of Harar, attacks are unheard of and it is not unusual to see hyenas, in search of food scraps, wandering past perfectly edible people sleeping in the streets. This localised immunity from attack is reassuring for a researcher spending nights alone with the hyenas in Harar’s narrow streets and alleys.

But this sounds like it was written before he went native!

By the way: people have even applied network theory to friendships among spotted hyenas:

• Amiyaal Ilany, Andrew S. Booms and Kay E. Holekamp, Topological effects of network structure on long-term social network dynamics in a wild mammal, *Ecology Letters*, **18** (2015), 687–695.

The paper is not open-access, but there’s an easy-to-read summary here:

• Scientists puzzled by ‘social network’ of spotted hyenas, *Sci.news.com*, 18 May 2015.

The scientists collected more than 55,000 observations of social interactions of spotted hyenas (also known as laughing hyenas) over a 20 year period in Kenya, making this one of the largest studies to date of social network dynamics in any non-human species.

They found that cohesive clustering of the kind where an individual bonds with friends of friends, something scientists call ‘triadic closure,’ was the most consistent factor influencing the long-term dynamics of the social structure of these mammals.

Individual traits, such as sex and social rank, and environmental effects, such as the amount of rainfall and the abundance of prey, also matter, but the ability of individuals to form and maintain social bonds in triads was key.

“Cohesive clusters can facilitate efficient cooperation and hence maximize fitness, and so our study shows that hyenas exploit this advantage. Interestingly, clustering is something done in human societies, from hunter-gatherers to Facebook users,” said Dr Ilany, who is the lead author on the study published in the journal *Ecology Letters*.

Hyenas, which can live up to 22 years, typically live in large, stable groups known as clans, which can comprise more than 100 individuals.

According to the scientists, hyenas can discriminate maternal and paternal kin from unrelated hyenas and are selective in their social choices, tending to not form bonds with every hyena in the clan, rather preferring the friends of their friends.

They found that hyenas follow a complex set of rules when making social decisions. Males follow rigid rules in forming bonds, whereas females tend to change their preferences over time. For example, a female might care about social rank at one time, but then later choose based on rainfall amounts.

“In spotted hyenas, females are the dominant sex and so they can be very flexible in their social preferences. Females also remain in the same clan all their lives, so they may know the social environment better,” said study co-author Dr Kay Holekamp of Michigan State University.

“In contrast, males disperse to new clans after reaching puberty, and after they disperse they have virtually no social control because they are the lowest ranking individuals in the new clan, so we can speculate that perhaps this is why they are obliged to follow stricter social rules.”

If you like math, you might like this way of measuring ‘triadic closure’:

• Triadic closure, Wikipedia.
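One common way to quantify triadic closure is the global clustering coefficient, also called transitivity. It equals three times the number of triangles divided by the number of connected triples, which is the fraction of ‘friend of a friend’ pairs who are themselves friends. Here is a minimal sketch for a simple undirected graph given as an edge list. Ilany, Booms and Holekamp may well use a different measure in their paper, and the example graphs are hypothetical.

```python
from itertools import combinations


def transitivity(edges):
    """Global clustering coefficient: 3 * triangles / connected triples."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    closed = 0   # each triangle is counted once at each of its 3 corners
    triples = 0  # paths u - center - v, counted at their center
    for center, nbrs in neighbors.items():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in neighbors[a]:
                closed += 1
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    # Hypothetical clan: a, b, c are mutual friends; d knows only c.
    friendships = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
    print(transitivity(friendships))
```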



That sentence states an obvious fact, but perhaps also a profound insight if we interpret it generally enough.

That sentence is also the title of a paper:

• Daniel L. Scholten, Every good key must be a model of the lock it opens (the Conant & Ashby Theorem revisited), 2010.

Scholten gives a lot of examples, including these:

• A key is a model of a lock’s keyhole.

• A city street map is a model of the actual city streets.

• A restaurant menu is a model of the food the restaurant prepares and sells.

• Honey bees use a kind of dance to model the location of a source of nectar.

• An understanding of some phenomenon (for example a physicist’s understanding of lightning) is a mental model of the actual phenomenon.

This line of thought has an interesting application to control theory. It suggests that *to do the best job of regulating some system, a control apparatus should include a model of that system*.

Indeed, much earlier, Conant and Ashby tried to turn this idea into a theorem, the ‘good regulator theorem’:

• Roger C. Conant and W. Ross Ashby, Every good regulator of a system must be a model of that system, *International Journal of Systems Science* **1** (1970), 89–97.

Scholten’s paper is heavily based on this earlier paper. He summarizes it as follows:

What all of this means, more or less, is that the pursuit of a goal by some dynamic agent (Regulator) in the face of a source of obstacles (System) places at least one particular and unavoidable demand on that agent, which is that the agent’s behaviors must be executed in such a reliable and predictable way that they can serve as a representation (Model) of that source of obstacles.

It’s not clear that this is true, but it’s an appealing thought.

A particularly self-referential example arises when the regulator is some organism and the System is the world it lives in, *including itself*. In this case, it seems the regulator should include a model of *itself!* This would lead, ultimately, to self-awareness.

It all sounds great. But Scholten raises an obvious question: if Conant and Ashby’s theorem is so great, why isn’t it better known? Scholten puts it quite vividly:

Given the preponderance of control-models that are used by humans (the evidence for this preponderance will be surveyed in the latter part of the paper), and especially given the obvious need to regulate that system, one might guess that the C&A theorem would be at least as famous as, say, the Pythagorean Theorem ($a^2 + b^2 = c^2$), the Einstein mass-energy equivalence ($E = mc^2$, which can be seen on T-shirts and bumper stickers), or the DNA double helix (which actually shows up in TV crime dramas and movies about super heroes). And yet, it would appear that relatively few lay-persons have ever even heard of C&A’s important prerequisite to successful regulation.

There could be various explanations. But here’s mine: when I tried to read Conant and Ashby’s paper, I got stuck. They use some very basic mathematical notation in nonstandard ways, and they don’t clearly state the hypotheses and conclusion of their theorem.

Luckily, the paper is short, and the argument, while mysterious, seems simple. So, I immediately felt I should be able to *dream up* the hypotheses, conclusion, and argument based on the hints given.

Scholten’s paper didn’t help much, since he says:

Throughout the following discussion I will assume that the reader has studied Conant & Ashby’s original paper, possesses the level of technical competence required to understand their proof, and is familiar with the components of the basic model that they used to prove their theorem [….]

However, I have a guess about the essential core of Conant and Ashby’s theorem. So, I’ll state that, and then say more about their setup.

Needless to say, I looked around to see if someone else had already done the work of figuring out what Conant and Ashby were saying. The best thing I found was this:

• B. A. Francis and W. M. Wonham, The internal model principle of control theory, *Automatica* **12** (1976), 457–465.

This paper works in a more specialized context: linear control theory. They’ve got a linear system or ‘plant’ responding to some input, a regulator or ‘compensator’ that is trying to make the plant behave in a desired way, and a ‘disturbance’ that affects the plant in some unwanted way. They prove that to perfectly correct for the disturbance, the compensator must contain an ‘internal model’ of the disturbance.

I’m probably stating this a bit incorrectly. This paper is much more technical, but it seems to be more careful in stating assumptions and conclusions. In particular, they seem to give a precise definition of an ‘internal model’. And I read elsewhere that the ‘internal model principle’ proved here has become a classic result in control theory!

This paper says that Conant and Ashby’s paper provided “plausibility arguments in favor of the internal model idea”. So, perhaps Conant and Ashby inspired Francis and Wonham, and were then largely forgotten.

My guess is that Conant and Ashby’s theorem boils down to this:

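**Theorem.** Let $R$ and $S$ be finite sets, and fix a probability distribution $p$ on $S$. Suppose $q$ is any probability distribution on $R \times S$ such that

$$\sum_{r \in R} q(r,s) = p(s) \quad \text{for all } s \in S.$$

Let $H(p)$ be the Shannon entropy of $p$ and let $H(q)$ be the Shannon entropy of $q$. Then

$$H(q) \ge H(p)$$

and equality is achieved if there is a function

$$h: S \to R$$

such that

$$q(r,s) = \begin{cases} p(s) & \text{if } r = h(s) \\ 0 & \text{otherwise.} \end{cases}$$

█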

Note that this is not an ‘if and only if’.

The proof of this is pretty easy for anyone who knows a bit about probability theory and entropy. I can restate it using a bit of standard jargon, which may make it more obvious to experts. We’ve got an $S$-valued random variable, say $\mathbf{s}$. We want to extend it to an $(R \times S)$-valued random variable $(\mathbf{r}, \mathbf{s})$ whose entropy is as small as possible. Then we can achieve this by choosing a function $h: S \to R$ and letting $\mathbf{r} = h(\mathbf{s})$.

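Here’s the point: if we make $\mathbf{r}$ be a function of $\mathbf{s}$, we aren’t adding any extra randomness, so the entropy doesn’t go up. Indeed, $H(\mathbf{r},\mathbf{s}) = H(\mathbf{s}) + H(\mathbf{r}\,|\,\mathbf{s}) \ge H(\mathbf{s})$, and the conditional entropy $H(\mathbf{r}\,|\,\mathbf{s})$ vanishes when $\mathbf{r} = h(\mathbf{s})$.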
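Here is a quick numerical sanity check of the lemma, not a proof. The distribution $p$, the regulator states and the function $h$ are made-up examples: the code builds the deterministic extension $q(r,s) = p(s)$ when $r = h(s)$, plus a few randomly "smeared" extensions with the same marginal on $S$, and compares entropies.

```python
import math
import random


def entropy(dist):
    """Shannon entropy, in nats, of a dict {outcome: probability}."""
    return -sum(w * math.log(w) for w in dist.values() if w > 0)


def deterministic_extension(p, h):
    """q(r, s) = p(s) if r == h(s) and 0 otherwise (zero entries are omitted)."""
    return {(h(s), s): w for s, w in p.items()}


def random_extension(p, R, rng):
    """Another q on R x S with marginal p on S: split each p(s) randomly over R."""
    q = {}
    for s, w in p.items():
        weights = [rng.random() + 1e-9 for _ in R]  # strictly positive weights
        total = sum(weights)
        for r, x in zip(R, weights):
            q[(r, s)] = w * x / total
    return q


def marginal_on_S(q):
    """Sum q(r, s) over r."""
    m = {}
    for (_, s), w in q.items():
        m[s] = m.get(s, 0.0) + w
    return m


def h(s):
    """Hypothetical regulator rule h: S -> R."""
    return "r1" if s == "s1" else "r2"


p = {"s1": 0.5, "s2": 0.3, "s3": 0.2}  # hypothetical distribution on S
R = ["r1", "r2"]                         # hypothetical regulator states
rng = random.Random(0)

q_det = deterministic_extension(p, h)
others = [random_extension(p, R, rng) for _ in range(5)]
print(entropy(p), entropy(q_det))                 # the two entropies coincide
print([entropy(q) - entropy(p) for q in others])  # every difference is positive
```

Any extension that lets $\mathbf{r}$ depend randomly on $\mathbf{s}$ pays an entropy penalty of $H(\mathbf{r}\,|\,\mathbf{s}) > 0$, which is what the second line of output shows.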

What in the world does this have to do with a good regulator containing a model of the system it’s regulating?

Well, I can’t explain that as well as I’d like (sorry). But the rough idea seems to be this. Suppose that $S$ is a **system** with a given random behavior, and $R$ is another system, the **regulator**. If we want the combination of the system and regulator to behave as ‘nonrandomly’ as possible, we can let the state of the regulator be a function of the state of the system.

This theorem is actually a ‘lemma’ in Conant and Ashby’s paper. Let’s look at their setup, and the ‘good regulator theorem’ as they actually state it.

Conant and Ashby consider five sets and three functions. In a picture:

The sets are these:

• A set $Z$ of possible **outcomes**.

• A **goal**: some subset $G \subseteq Z$ of **good** outcomes.

• A set $D$ of **disturbances**, which I might prefer to call ‘inputs’.

• A set $S$ of states of some **system** that is affected by the disturbances.

• A set $R$ of states of some **regulator** that is also affected by the disturbances.

The functions are these:

• A function $\phi: D \to S$ saying how a disturbance determines a state of the system.

• A function $\rho: D \to R$ saying how a disturbance determines a state of the regulator.

• A function $\psi: S \times R \to Z$ saying how a state of the system and a state of the regulator determine an outcome.

Of course we want some conditions on these maps. What we want, *I guess*, is for the outcome to be good regardless of the disturbance. I might say that as follows: for every $d \in D$ we have

$$\psi(\phi(d), \rho(d)) \in G.$$

Unfortunately Conant and Ashby say they want this:

I can’t parse this: they’re using math notation in ways I don’t recognize. Can you figure out what they mean, and whether it matches my guess above?

Then, after a lot of examples and stuff, they state their theorem:

**Theorem.** The simplest optimal regulator $R$ of a reguland $S$ produces events $R$ which are related to events $S$ by a mapping $h: S \to R$.

Clearly I’ve skipped over too much! This barely makes any sense at all.

Unfortunately, looking at the text before the theorem, I don’t see these terms being explained. Furthermore, their ‘proof’ introduces extra assumptions that were not mentioned in the statement of the theorem. It begins:

The sets $R$, $S$ and $Z$ and the mapping $\psi$ are presumed given. We will assume that over the set $S$ there exists a probability distribution $p(S)$ which gives the relative frequencies of the events in $S$. We will further assume that the behaviour of any particular regulator $R$ is specified by a conditional distribution $p(R|S)$ giving, for each event in $S$, a distribution on the regulatory events in $R$.

Get it? Now they’re saying the state of the regulator depends on the state of the system via a conditional probability distribution $p(r|s)$ where $r \in R$ and $s \in S$. It’s odd that they didn’t mention this earlier! Their picture made it look like the state of the regulator is determined by the ‘disturbance’ via the function $\rho: D \to R$. But okay.

They’re also assuming there’s a probability distribution $p(s)$ on $S$. They use this and the above conditional probability distribution to get a probability distribution $p(r,s) = p(r|s)\,p(s)$ on $R \times S$.

In fact, the set $D$ and the functions out of this set seem to play no role in their proof!

It’s unclear to me exactly what we’re given, what we get to choose, and what we’re trying to optimize. They do try to explain this. Here’s what they say:

Now $p(S)$ and $p(R|S)$ jointly determine $p(R,S)$ and hence $p(Z)$ and $H(Z)$, the entropy in the set of outcomes:

$$H(Z) = -\sum_{z \in Z} p(z) \log p(z).$$

With $p(S)$ fixed, the class of optimal regulators therefore corresponds to the class of optimal distributions $p(R|S)$ for which $H(Z)$ is minimal. We will call this class of optimal distributions

I could write a little essay on why this makes me unhappy, but never mind. I’m used to the habit of using the same letter $p$ to stand for probability distributions on lots of different sets: folks let the argument of $p$ say which set they have in mind at any moment. So, they’re starting with a probability distribution $p(s)$ on $S$ and a conditional probability distribution $p(r|s)$ on $R$ given $s \in S$. They’re using these to determine a probability distribution $p(r,s)$ on $R \times S$. Then, presumably using the map $\psi$, they get a probability distribution $p(z)$ on $Z$. $H(Z)$ is the entropy of the probability distribution $p(z)$ on $Z$, and for some reason they are trying to minimize this.

(Where did the subset of ‘good’ outcomes go? Shouldn’t that play a role? Oh well.)

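I believe the claim is that when this entropy is minimized, there’s a function $h: S \to R$ such that

$$p(r|s) = \begin{cases} 1 & \text{if } r = h(s) \\ 0 & \text{otherwise.} \end{cases}$$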

This says that the state of the regulator should be completely determined by the state of the system. And this, I believe, is what they mean by

Every good regulator of a system must be a model of that system.

I hope you understand: I’m not worrying about whether the setup is a good one, e.g. sufficiently general for real-world applications. I’m just trying to figure out what the setup actually *is*, what Conant and Ashby’s theorem actually *says*, and whether it’s *true*.

I think I’ve just made a lot of progress. Surely this was no fun to read. But I found it useful to write it.

• Ken Caldeira, Stop Emissions!, *Technology Review*, January/February 2016, 41–43.

Let me quote a bit:

Many years ago, I protested at the gates of a nuclear power plant. For a long time, I believed it would be easy to get energy from biomass, wind, and solar. Small is beautiful. Distributed power, not centralized.

I wish I could still believe that.

My thinking changed when I worked with Marty Hoffert of New York University on research that was first published in *Nature* in 1998. It was the first peer-reviewed study that examined the amount of near-zero-emission energy we would need in order to solve the climate problem. Unfortunately, our conclusions still hold. We need massive deployment of affordable and dependable near-zero-emission energy, and we need a major research and development program to develop better energy and transportation systems.

It’s true that wind and solar power have been getting much more attractive in recent years. Both have gotten significantly cheaper. Even so, neither wind nor solar is dependable enough, and batteries do not yet exist that can store enough energy at affordable prices to get a modern industrial society through those times when the wind is not blowing and the sun is not shining.

Recent analyses suggest that wind and solar power, connected by a continental-scale electric grid and using natural-gas power plants to provide backup, could reduce greenhouse-gas emissions from electricity production by about two-thirds. But generating electricity is responsible for only about one-third of total global carbon dioxide emissions, which are increasing by more than 2 percent a year. So even if we had this better electric sector tomorrow, within a decade or two emissions would be back where they are today.

We need to bring much, much more to bear on the climate problem. It can’t be solved unless it is addressed as seriously as we address national security. The politicians who go to the Paris Climate Conference are making commitments that fall far short of what would be needed to substantially reduce climate risk.

## Daunting math

Four weeks ago, a hurricane-strength cyclone smashed into Yemen, in the Arabian Peninsula, for the first time in recorded history. Also this fall, a hurricane with the most powerful winds ever measured slammed into the Pacific coast of Mexico.

Unusually intense storms such as these are a predicted consequence of global warming, as are longer heat waves and droughts and many other negative weather-related events that we can expect to become more commonplace. Already, in the middle latitudes of the Northern Hemisphere, average temperatures are increasing at a rate that is equivalent to moving south about 10 meters (30 feet) each day. This rate is about 100 times faster than most climate change that we can observe in the geologic record, and it gravely threatens biodiversity in many parts of the world. We are already losing about two coral reefs each week, largely as a direct consequence of our greenhouse-gas emissions.

Recently, my colleagues and I studied what will happen in the long term if we continue pulling fossil carbon out of the ground and releasing it into the atmosphere. We found that it would take many thousands of years for the planet to recover from this insult. If we burn all available fossil-fuel resources and dump the resulting carbon dioxide waste in the sky, we can expect global average temperatures to be 9 °C (15 °F) warmer than today even 10,000 years into the future. We can expect sea levels to be about 60 meters (200 feet) higher than today. In much of the tropics, it is possible that mammals (including us) would not be able to survive outdoors in the daytime heat. Thus, it is essential to our long-term well-being that fossil-fuel carbon does not go into our atmosphere.

If we want to reduce the threat of climate change in the near future, there are actions to take now: reduce emissions of short-lived pollutants such as black carbon, cut emissions of methane from natural-gas fields and landfills, and so on. We need to slow and then reverse deforestation, adopt electric cars, and build solar, wind, and nuclear plants.

But while existing technologies can start us down the path, they can’t get us to our goal. Most analysts believe we should decarbonize electricity generation and use electricity for transportation, industry, and even home heating. (Using electricity for heating is wildly inefficient, but there may be no better solution in a carbon-constrained world.) This would require a system of electricity generation several times larger than the one we have now. Can we really use existing technology to scale up our system so dramatically while markedly reducing emissions from that sector?

Solar power is the only energy source that we know can power civilization indefinitely. Unfortunately, we do not have global-scale electricity grids that could wheel solar energy from day to night. At the scale of the regional electric grid, we do not have batteries that can balance daytime electricity generation with nighttime demand.

We should do what we know how to do. But all the while, we need to be thinking about what we don’t know how to do. We need to find better ways to generate, store, and transmit electricity. We also need better zero-carbon fuels for the parts of the economy that can’t be electrified. And most important, perhaps, we need better ways of using energy.

Energy is a means, not an end. We don’t want energy so much as we want what it makes possible: transportation, entertainment, shelter, and nutrition. Given United Nations estimates that the world will have at least 11 billion people by the end of this century (50 percent more than today), and given that we can expect developing economies to grow rapidly, demand for services that require energy is likely to increase by a factor of 10 or more over the next century. If we want to stabilize the climate, we need to reduce total emissions from today’s level by a factor of 10. Put another way, if we want to destroy neither our environment nor our economy, we need to reduce the emissions per energy service provided by a factor of 100. This requires something of an energy miracle.

The essay continues.

Near the end, he writes “despite all these reasons for despair, I’m hopeful”. He is hopeful that a collective change of heart is underway that will enable humanity to solve this problem. But he doesn’t claim to know any workable solution to the problem. In fact, he mostly lists reasons why various possible solutions won’t be enough.
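Two pieces of Caldeira’s arithmetic are easy to check. The sketch below takes his fractions at face value and assumes steady compound growth of exactly 2 percent a year in total emissions (the essay says “more than 2 percent”, so the real rebound would be somewhat faster).

```python
import math

# Factor of 100: energy services grow by a factor of 10 while total emissions must fall by 10.
service_growth = 10.0
required_cut = 10.0
cut_per_service = service_growth * required_cut
print(cut_per_service)  # 100.0

# Wind + solar grid: electricity emissions cut by two-thirds, electricity being one-third of the total.
share_removed = (2 / 3) * (1 / 3)  # fraction of total emissions removed, i.e. 2/9
growth = 0.02                      # assumed constant annual growth of total emissions
years_to_rebound = math.log(1 / (1 - share_removed)) / math.log(1 + growth)
print(f"{share_removed:.3f} of emissions removed; back to today's level after {years_to_rebound:.1f} years")
```

Removing about 22 percent of emissions buys roughly 12.7 years at 2 percent growth, consistent with his “within a decade or two”.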

I just learned about the Salar de Uyuni: the world’s largest salt flat, located in southwest Bolivia. It’s about 10,000 square kilometers in area!

It’s high up, near the crest of the Andes, 3,600 meters above sea level. Once there were permanent lakes here, but no more. This area is a transition zone: the eastern part gets rain in the summer, but clouds never make it past the western part, near the border with Chile. Further west comes the famously dry Atacama Desert.

The Salar de Uyuni is high, but still it lives up to the name ‘salt flat’: its salt crust varies in height by less than one meter over the entire area. It’s so flat that people use it for testing equipment that measures altitudes.

Why is it so flat? Because the dry crust covers a huge pool of brine that is still liquid! This brine is a saturated solution of sodium chloride, lithium chloride and magnesium chloride in water. As a result, Salar de Uyuni contains over half of the world’s lithium reserves!

In the rainy season, the Salar de Uyuni looks very different:

And when it’s wet, three different types of flamingos visit the Salar: the Chilean flamingo, the rare Andean flamingo, and the closely related but even rarer James flamingo, which for a while was thought to be extinct!

Flamingos eat algae that grow in the brine. This is why they’re pink! Newly hatched flamingos are gray or white. Their feathers become pink only thanks to carotene which they get from algae—or from crustaceans that in turn eat algae. Animals are not able to synthesize these molecules!

Carotene comes in different forms, but here is one of the most common: β-carotene. I like it because it’s perfectly symmetrical. It has a long chain of carbons with alternating single and double bonds. Electrons vibrating along this chain absorb blue light. So carotene has the opposite color: *orange!*

It’s not just flamingos that need carotene or related compounds. Humans need a chemical called retinal in order to see:

It looks roughly like half a carotene molecule, and like carotene, it’s good at absorbing light. Attached to a larger protein molecule called an opsin, retinal acts like a kind of antenna, catching particles of light. Humans can’t produce retinal without help from the foods we eat. Any chemical we can use to produce retinal is called ‘vitamin A’. So vitamin A isn’t one specific chemical: it’s a group. But beta carotene counts as a form of vitamin A.

Speaking of humans: people sometimes come to have fun in the Salar de Uyuni. There are hotels made of salt! And thanks to the featureless expanse of salt, you can take some amusing trick pictures:

Click on the pictures to find out more about them. For more on the Salar de Uyuni, try:

• Salar de Uyuni, Wikipedia.

**Puzzle:** What kinds of algae, and other organisms, live in the brine of the Salar de Uyuni when it rains? How do they survive when it dries out? There must be some very interesting adaptations going on.

If you put yeast cells in water containing a constant low concentration of glucose, they convert it into alcohol at a constant rate. But if you increase the concentration of glucose something funny happens. The alcohol output starts to oscillate!

It’s not that the yeast is doing something clever and complicated. If you break down the yeast cells, killing them, this effect still happens. People think these oscillations are inherent to the chemical reactions in glycolysis.

I learned this after writing Part 1, thanks to Alan Rendall. I first met Alan when we were both working on quantum gravity. But last summer I met him in Copenhagen, where we were both attending the workshop Trends in reaction network theory. It turned out that now he’s deep into the mathematics of biochemistry, especially chemical oscillations! He has a blog, and he’s written some great articles on glycolysis:

• Alan Rendall, Albert Goldbeter and glycolytic oscillations, *Hydrobates*, 21 January 2012.

• Alan Rendall, The Higgins–Selkov oscillator, *Hydrobates*, 14 May 2014.

In case you’re wondering, *Hydrobates* is the name of a kind of sea bird, the storm petrel. Alan is fond of sea birds. Since the ultimate goal of my work is to help our relationship with nature, this post is dedicated to the storm petrel:

Last time I gave a summary description of glycolysis:

glucose + 2 NAD^{+} + 2 ADP + 2 phosphate → 2 pyruvate + 2 NADH + 2 H^{+} + 2 ATP + 2 H_{2}O

The idea is that a single molecule of glucose:

gets split into two molecules of pyruvate:

The free energy released from this process is used to take two molecules of adenosine diphosphate or ADP:

and attach to each one phosphate group, typically found as phosphoric acid:

thus producing two molecules of adenosine triphosphate or ATP:

along with 2 molecules of water.

But in the process, something else happens too! 2 molecules of nicotinamide adenine dinucleotide NAD get reduced. That is, they change from the oxidized form called NAD^{+}:

to the reduced form called NADH, along with two protons: that is, 2 H^{+}.

**Puzzle 1.** Why does NAD^{+} have a little plus sign on it, despite the two O^{–}’s in the picture above?

Left alone in water, ATP spontaneously converts back to ADP and phosphate:

ATP + H_{2}O → ADP + phosphate

This process gives off 30.5 kilojoules of energy per mole. The cell harnesses this to do useful work by coupling this reaction to others. Thus, ATP serves as ‘energy currency’, and making it is the main point of glycolysis.

The cell can also use NADH to do interesting things. It generally has more free energy than NAD^{+}, so it can power things while turning back into NAD^{+}. Just how much more free energy it has depends a lot on conditions in the cell: for example, on the pH.

**Puzzle 2.** There is often roughly 700 times as much NAD^{+} as NADH in the cytoplasm of mammals. In these conditions, what is the free energy difference between NAD^{+} and NADH? I think this is something you’re supposed to be able to figure out.

Nothing in what I’ve said so far gives any clue about why glycolysis might exhibit *oscillations*. So, we have to dig deeper.

Glycolysis actually consists of 10 steps, each mediated by its own enzyme. Click on this picture to see all these steps:

If your eyes tend to glaze over when looking at this, don’t feel bad—so do mine. There’s a lot of information here. But if you look carefully, you’ll see that the 1st and 3rd stages of glycolysis actually convert 2 ATP’s to ADP, while the 7th and 10th convert 4 ADP’s to ATP. So, the early steps require free energy, while the later ones double this investment. As the saying goes, “it takes money to make money”.

This nuance makes it clear that if a cell starts with no ATP, it won’t be able to make ATP by glycolysis. And if it has just a small amount of ATP, it won’t be very good at making it this way.

In short, this affects the dynamics in an important way. But I don’t see how it could explain oscillations in how much ATP is manufactured from a constant supply of glucose!

We can look up the free energy changes for each of the 10 reactions in glycolysis. Here they are, named by the enzymes involved:

I got this from here:

• Leslie Frost, Glycolysis.

I think these are her notes on Chapter 14 of Voet, Voet, and Pratt’s *Fundamentals of Biochemistry*. But again, I don’t think these explain the oscillations. So we have to look elsewhere.

By some careful detective work—by replacing the input of glucose by an input of each of the intermediate products—biochemists figured out which step causes the oscillations. It’s the 3rd step, where fructose-6-phosphate is converted into fructose-1,6-bisphosphate, powered by the conversion of ATP into ADP. The enzyme responsible for this step is called phosphofructokinase or PFK. And it turns out that PFK works better when there is ADP around!

In short, the reaction network shown above is incomplete: ADP catalyzes its own formation in the 3rd step.

How does this lead to oscillations? The **Higgins–Selkov model** is a scenario for how it *might* happen. I’ll explain this model, offering no evidence that it’s correct. And then I’ll take you to a website where you can see this model in action!

Suppose that fructose-6-phosphate is being produced at a constant rate. And suppose there’s some other reaction, which we haven’t mentioned yet, that uses up ADP at a rate proportional to the amount of ADP present. Suppose also that it takes two ADP’s to catalyze the 3rd step. So, we have these reactions:

→ fructose-6-phosphate

fructose-6-phosphate + 2 ADP → 3 ADP

ADP →

Here the blanks mean ‘nothing’, or more precisely ‘we don’t care’. The fructose-6-phosphate is coming in from somewhere, but we don’t care where it’s coming from. The ADP is going away, but we don’t care where. We’re also ignoring the ATP that’s required for the second reaction, and the fructose-1,6-bisphosphate that’s produced by this reaction. All these features are irrelevant to the Higgins–Selkov model.

Now suppose there’s initially a lot of ADP around. Then the fructose-6-phosphate will quickly be used up, creating even more ADP, which speeds up this reaction still further.

But as this goes on, the amount of fructose-6-phosphate sitting around will drop. So, eventually the production of ADP will drop. Thus, since we’re positing a reaction that keeps using up ADP at a rate proportional to the amount present, the amount of ADP will start to drop.

Eventually there will be very little ADP. Then it will be very hard for fructose-6-phosphate to get used up. So, the amount of fructose-6-phosphate will start to build up!

Of course, whatever ADP is still left will help use up this fructose-6-phosphate and turn it into ADP. This will increase the amount of ADP. So eventually we will have a lot of ADP again.

We’re back where we started. And so, we’ve got a cycle!

Of course, this story doesn’t prove anything. We should really take our chemical reaction network and translate it into some differential equations for the amount of fructose-6-phosphate and the amount of ADP. In the Higgins–Selkov model people sometimes write just ‘S’ for fructose-6-phosphate and ‘P’ for ADP. (In case you’re wondering, S stands for ‘substrate’ and P stands for ‘product’.) So, our chemical reaction network becomes

→ S

S + 2P → 3P

P →

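and using the law of mass action we get these equations:

$$\frac{dS}{dt} = k_1 - k_2 S P^2$$

$$\frac{dP}{dt} = k_2 S P^2 - k_3 P$$

where $S(t)$ and $P(t)$ stand for how much S and P we have, respectively, and $k_1, k_2, k_3$ are some constants.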

Now we can solve these differential equations and see if we get oscillations. The answer depends on the constants and also perhaps the initial conditions.
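A minimal numerical sketch, assuming the mass-action equations $dS/dt = k_1 - k_2 S P^2$ and $dP/dt = k_2 S P^2 - k_3 P$ for the network above. The fixed point is $S^* = k_3^2/(k_1 k_2)$, $P^* = k_1/k_3$; the Jacobian there has determinant $k_2 k_1^2/k_3 > 0$ and trace $k_3 - k_2 k_1^2/k_3^2$, so the fixed point is unstable exactly when $k_1 < \sqrt{k_3^3/k_2}$. The constants are made up and the fourth-order Runge–Kutta integrator is hand-written; the code only checks the behavior near the fixed point and does not try to exhibit the limit cycle, whose existence depends on the parameters.

```python
import math


def rates(s, p, k1, k2, k3):
    """Mass-action right-hand side for  -> S,  S + 2P -> 3P,  P -> ."""
    autocatalysis = k2 * s * p * p  # rate of S + 2P -> 3P
    return k1 - autocatalysis, autocatalysis - k3 * p


def rk4_step(s, p, dt, k):
    """One classical fourth-order Runge-Kutta step for the two equations."""
    a = rates(s, p, *k)
    b = rates(s + 0.5 * dt * a[0], p + 0.5 * dt * a[1], *k)
    c = rates(s + 0.5 * dt * b[0], p + 0.5 * dt * b[1], *k)
    d = rates(s + dt * c[0], p + dt * c[1], *k)
    return (s + dt * (a[0] + 2 * b[0] + 2 * c[0] + d[0]) / 6,
            p + dt * (a[1] + 2 * b[1] + 2 * c[1] + d[1]) / 6)


def fixed_point(k1, k2, k3):
    """The unique stationary solution: S* = k3^2 / (k1 k2), P* = k1 / k3."""
    return k3 * k3 / (k1 * k2), k1 / k3


def trace_at_fixed_point(k1, k2, k3):
    """Trace of the Jacobian at the fixed point (its determinant k2 k1^2 / k3 is positive)."""
    s, p = fixed_point(k1, k2, k3)
    return -k2 * p * p + (2 * k2 * s * p - k3)


def run(k, s0, p0, t_end, dt=0.01):
    """Integrate from (s0, p0) up to time t_end and return the final state."""
    s, p = s0, p0
    for _ in range(int(round(t_end / dt))):
        s, p = rk4_step(s, p, dt, k)
    return s, p


def distance_from_fixed_point(k, s, p):
    fs, fp = fixed_point(*k)
    return math.hypot(s - fs, p - fp)


# Hypothetical constants (k1, k2, k3); with k2 = k3 = 1 the threshold sqrt(k3^3 / k2) is 1.
stable_k, unstable_k = (1.5, 1.0, 1.0), (0.9, 1.0, 1.0)
results = {}
for k in (stable_k, unstable_k):
    fs, fp = fixed_point(*k)
    s, p = run(k, fs + 1e-4, fp, t_end=50.0)  # start slightly off the fixed point
    results[k] = distance_from_fixed_point(k, s, p)
    print(k, trace_at_fixed_point(*k), results[k])
```

With the larger inflow the small perturbation dies away; with the smaller one it spirals outward, which is where oscillations can set in.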

To see what actually happens, try this website:

• Mike Martin, Glycolytic oscillations: the Higgins–Selkov model.

If you run it with the constants and initial conditions given to you, you’ll get oscillations. You’ll also get this vector field on the plane, showing how the system evolves in time:

This is called a phase portrait, and it’s a standard tool for studying first-order differential equations where two variables depend on time.

This particular phase portrait shows an unstable fixed point and a limit cycle. That’s jargon for saying that in these conditions, the system will tend to oscillate. But if you adjust the constants, the limit cycle will go away! The appearance or disappearance of a limit cycle like this is called a Hopf bifurcation.

For details, see:

• Alan Rendall, Dynamical systems, Chapter 11: Oscillations.

He shows that the Higgins–Selkov model has a unique stationary solution (i.e. fixed point), which he describes. Linearizing the equations at this point shows that it is stable when $k_1$ (the inflow of S) exceeds a certain value, and unstable when $k_1$ is below that value.

In the unstable case, if the solutions are all bounded as $t \to \infty$, there must be a periodic solution. In the course notes he shows this for a simpler model of glycolysis, the Schnakenberg model. In some separate notes he shows it for the Higgins–Selkov model, at least for certain values of the parameters:

• Alan Rendall, The Higgins–Selkov oscillator.

In December, the rover Curiosity reached some sand dunes on Mars, giving us the first views of these dunes taken from the ground instead of from above. It’s impressive how the dune seems to shoot straight up from the rocks here!

In fact this slope, the steep downwind slope of one of the “Bagnold Dunes” along the northwestern flank of Mount Sharp, is just about 27°. But mountaineers will confirm that slopes always look steeper than they are.

The wind makes this dune move about one meter per year.

For more, see:

• NASA, NASA Mars rover Curiosity reaches sand dunes, 10 December 2015.

• Jet Propulsion Laboratory, Mastcam telephoto of a Martian dune’s downwind face, 4 January 2016.

• Jet Propulsion Laboratory, Slip face on downwind side of ‘Namib’ sand dune on Mars, 6 January 2016.

Lately we’ve been thinking about open Markov processes. These are random processes where something can hop randomly from one state to another (that’s the ‘Markov process’ part) but also enter or leave the system (that’s the ‘open’ part).

The ultimate goal is to understand the nonequilibrium thermodynamics of open systems—systems where energy and maybe matter flows in and out. If we could understand this well enough, we could understand in detail how *life* works. That’s a difficult job! But one has to start somewhere, and this is one place to start.

We have a few papers on this subject:

• Blake Pollard, A Second Law for open Markov processes. (Blog article here.)

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

• Blake Pollard, Open Markov processes: A compositional perspective on non-equilibrium steady states in biology. (Blog article here.)

However, right now we just want to show you three closely connected results about how relative entropy changes in open Markov processes.

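An **open Markov process** consists of a finite set $X$ of **states**, a subset $B \subseteq X$ of **boundary states**, and an **infinitesimal stochastic** operator $H: \mathbb{R}^X \to \mathbb{R}^X$, meaning a linear operator with

$$H_{ij} \ge 0 \quad \text{for all } i \ne j$$

and

$$\sum_{i \in X} H_{ij} = 0 \quad \text{for all } j \in X.$$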

For each state $i \in X$ we introduce a **population** $p_i \in [0,\infty)$. We call the resulting function $p: X \to [0,\infty)$ the **population distribution**.

Populations evolve in time according to the **open master equation**:

$$\frac{dp_i}{dt} = \sum_{j \in X} H_{ij} p_j \quad \text{for } i \in X - B,$$

$$p_i(t) = b_i(t) \quad \text{for } i \in B.$$

So, the populations $p_i$ obey a linear differential equation at states $i$ that are not in the boundary, but they are specified ‘by the user’ to be chosen functions $b_i$ at the boundary states. The off-diagonal entries $H_{ij}$ for $i \ne j$ describe the rate at which population transitions from the $j$th to the $i$th state.

A **closed Markov process**, or continuous-time discrete-state Markov chain, is an open Markov process whose boundary is empty. For a closed Markov process, the open master equation becomes the usual **master equation**:

$$\frac{dp}{dt} = Hp$$

In a closed Markov process the total population is conserved:

$$\frac{d}{dt} \sum_{i \in X} p_i = \sum_{i,j \in X} H_{ij} p_j = 0.$$

This lets us normalize the initial total population to 1 and have it stay equal to 1. If we do this, we can talk about *probabilities* instead of populations. In an open Markov process, population can flow in and out at the boundary states.
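A minimal sketch of the closed master equation, where the 4-state size, the random rates and the explicit Euler integrator with step 0.001 are all arbitrary illustrative choices. It builds an infinitesimal stochastic $H$, integrates $dp/dt = Hp$, and checks that the total population stays equal to 1 up to rounding, because every column of $H$ sums to zero.

```python
import random


def random_infinitesimal_stochastic(n, rng):
    """Random n x n rate matrix: H[i][j] >= 0 for i != j, and every column sums to 0."""
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:
                H[i][j] = rng.random()  # rate of hopping from state j to state i
        H[j][j] = -sum(H[i][j] for i in range(n) if i != j)
    return H


def euler_step(H, p, dt):
    """One explicit Euler step of the master equation dp/dt = H p."""
    n = len(p)
    return [p[i] + dt * sum(H[i][j] * p[j] for j in range(n)) for i in range(n)]


rng = random.Random(42)
H = random_infinitesimal_stochastic(4, rng)
p = [0.4, 0.3, 0.2, 0.1]         # initial population, normalized to 1
for _ in range(2000):
    p = euler_step(H, p, 0.001)  # integrate up to t = 2
print(p, sum(p))                 # the total stays 1 up to rounding error
```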

For any pair of distinct states $i, j \in X$, $H_{ij} p_j$ is the flow of population from $j$ to $i$. The **net flux** of population from the $j$th state to the $i$th state is the flow from $j$ to $i$ minus the flow from $i$ to $j$:

$$J_{ij}(p) = H_{ij} p_j - H_{ji} p_i.$$

A **steady state** is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an **equilibrium**. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. The idea is that population can flow in or out at the boundary states.

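We say an equilibrium $p$ of a Markov process is **detailed balanced** if all the net fluxes vanish:

$$J_{ij}(p) = 0 \quad \text{for all } i, j \in X$$

or in other words:

$$H_{ij} p_j = H_{ji} p_i \quad \text{for all } i, j \in X.$$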

Given two population distributions $p, q: X \to [0,\infty)$ we can define the **relative entropy**

$$I(p,q) = \sum_{i \in X} p_i \ln\left(\frac{p_i}{q_i}\right).$$

When $q$ is a detailed balanced equilibrium solution of the master equation, the relative entropy can be seen as the ‘free energy’ of $p$. For a precise statement, see Section 4 of Relative entropy in biological systems.

The Second Law of Thermodynamics implies that the free energy of a closed system tends to decrease with time, so for *closed* Markov processes we expect $I(p(t),q)$ to be nonincreasing. And this is true! But for *open* Markov processes, free energy can flow in from outside. This is just one of several nice results about how relative entropy changes with time.
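The closed case is easy to watch numerically. This sketch assumes a random 5-state rate matrix and explicit Euler steps (both illustrative choices) and evolves two different initial distributions under the same master equation, tracking $I(p(t), q(t))$; Theorem 2 with an empty boundary says this is nonincreasing even when $q(t)$ is another time-dependent solution rather than an equilibrium. With a step small enough that $1 + \Delta t\, H_{jj} \ge 0$, each Euler step is itself a stochastic matrix, so the decrease holds step by step.

```python
import math
import random


def random_infinitesimal_stochastic(n, rng):
    """Random n x n rate matrix: H[i][j] >= 0 for i != j, and every column sums to 0."""
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:
                H[i][j] = rng.random()
        H[j][j] = -sum(H[i][j] for i in range(n) if i != j)
    return H


def euler_step(H, p, dt):
    """One explicit Euler step of dp/dt = H p, i.e. p -> (1 + dt H) p."""
    n = len(p)
    return [p[i] + dt * sum(H[i][j] * p[j] for j in range(n)) for i in range(n)]


def relative_entropy(p, q):
    """I(p, q) = sum_i p_i ln(p_i / q_i), assuming every q_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


rng = random.Random(7)
H = random_infinitesimal_stochastic(5, rng)
p = [0.6, 0.1, 0.1, 0.1, 0.1]  # two different initial distributions
q = [0.2, 0.2, 0.2, 0.2, 0.2]
dt = 0.001                     # small enough that 1 + dt * H[j][j] >= 0
history = [relative_entropy(p, q)]
for _ in range(3000):
    p, q = euler_step(H, p, dt), euler_step(H, q, dt)
    history.append(relative_entropy(p, q))
print(history[0], history[-1])  # the later value is smaller
```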

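**Theorem 1.** Consider an open Markov process with $X$ as its set of states and $B$ as the set of boundary states. Suppose $p(t)$ and $q(t)$ obey the open master equation, and let the quantities

$$\frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_{j \in X} H_{ij} p_j, \qquad \frac{Dq_i}{Dt} = \frac{dq_i}{dt} - \sum_{j \in X} H_{ij} q_j$$

measure how much the time derivatives of $p_i$ and $q_i$ fail to obey the master equation. Then we have

$$\frac{d}{dt} I(p(t),q(t)) = \sum_{i,j \in X} H_{ij} p_j \left(\ln\frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i}\right) + \sum_{i \in B} \left(\frac{\partial I}{\partial p_i}\frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i}\frac{Dq_i}{Dt}\right).$$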

This result separates the change in relative entropy into two parts: an ‘internal’ part and a ‘boundary’ part.

It turns out the ‘internal’ part is always less than or equal to zero. So, from Theorem 1 we can deduce a version of the Second Law of Thermodynamics for open Markov processes:

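**Theorem 2.** Given the conditions of Theorem 1, we have

$$\frac{d}{dt} I(p(t),q(t)) \le \sum_{i \in B} \left(\frac{\partial I}{\partial p_i}\frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i}\frac{Dq_i}{Dt}\right).$$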

Intuitively, this says that free energy can only increase if it comes in from the boundary!
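Theorems 1 and 2 are statements about finitely many numbers at a single instant, so they are easy to spot-check. The sketch below assumes random positive $p$ and $q$, a random infinitesimal stochastic $H$, a hypothetical boundary $B = \{0, 4\}$, and arbitrary values for $Dp_i/Dt$ and $Dq_i/Dt$ at the boundary. It computes $dI/dt$ directly by the chain rule, compares it with the internal-plus-boundary formula of Theorem 1, and records the largest internal part seen.

```python
import math
import random


def random_infinitesimal_stochastic(n, rng):
    """Random n x n rate matrix: H[i][j] >= 0 for i != j, and every column sums to 0."""
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:
                H[i][j] = rng.random()
        H[j][j] = -sum(H[i][j] for i in range(n) if i != j)
    return H


def time_derivative_of_I(H, p, q, boundary, Dp, Dq):
    """Chain rule: sum_i dI/dp_i * dp_i/dt + dI/dq_i * dq_i/dt under the open master equation."""
    n = len(p)
    total = 0.0
    for i in range(n):
        dp = sum(H[i][j] * p[j] for j in range(n)) + (Dp[i] if i in boundary else 0.0)
        dq = sum(H[i][j] * q[j] for j in range(n)) + (Dq[i] if i in boundary else 0.0)
        total += (1 + math.log(p[i] / q[i])) * dp - (p[i] / q[i]) * dq
    return total


def internal_part(H, p, q):
    """sum_{i,j} H_ij p_j (ln(p_i/q_i) - p_i q_j / (p_j q_i))."""
    n = len(p)
    return sum(H[i][j] * p[j] * (math.log(p[i] / q[i]) - p[i] * q[j] / (p[j] * q[i]))
               for i in range(n) for j in range(n))


def boundary_part(p, q, boundary, Dp, Dq):
    """sum over boundary states of dI/dp_i * Dp_i/Dt + dI/dq_i * Dq_i/Dt."""
    return sum((1 + math.log(p[i] / q[i])) * Dp[i] - (p[i] / q[i]) * Dq[i] for i in boundary)


rng = random.Random(3)
n, boundary = 5, {0, 4}  # hypothetical choice of boundary states
worst_gap, max_internal = 0.0, -math.inf
for _ in range(200):
    H = random_infinitesimal_stochastic(n, rng)
    p = [rng.uniform(0.1, 2.0) for _ in range(n)]    # populations need not sum to 1
    q = [rng.uniform(0.1, 2.0) for _ in range(n)]
    Dp = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # arbitrary boundary driving terms
    Dq = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    lhs = time_derivative_of_I(H, p, q, boundary, Dp, Dq)
    rhs = internal_part(H, p, q) + boundary_part(p, q, boundary, Dp, Dq)
    worst_gap = max(worst_gap, abs(lhs - rhs))
    max_internal = max(max_internal, internal_part(H, p, q))
print(worst_gap, max_internal)  # tiny gap (Theorem 1); internal part never positive (Theorem 2)
```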

There is another nice result that holds when $q$ is an equilibrium solution of the master equation. This idea seems to go back to Schnakenberg:

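**Theorem 3.** Given the conditions of Theorem 1, suppose also that $q$ is an equilibrium solution of the master equation. Then we have

$$\frac{d}{dt} I(p(t),q) = -\frac{1}{2}\sum_{i,j \in X} J_{ij}(p) A_{ij}(p,q) + \sum_{i \in B} \frac{\partial I}{\partial p_i}\frac{Dp_i}{Dt}$$

where

$$J_{ij}(p) = H_{ij} p_j - H_{ji} p_i$$

is the **net flux** from $j$ to $i$, while

$$A_{ij}(p,q) = \ln\left(\frac{p_j q_i}{p_i q_j}\right)$$

is the conjugate **thermodynamic force**.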

The flux $J_{ij}$ has a nice meaning: it’s the net flow of population from $j$ to $i$. The thermodynamic force is a bit subtler, but this theorem reveals its meaning: it says how much the population *wants* to flow from $j$ to $i$.

More precisely, up to that factor of $1/2$, the thermodynamic force $A_{ij}$ says how much free energy loss is caused by net flux from $j$ to $i$. There’s a nice analogy here to water losing potential energy as it flows downhill due to the force of gravity.

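Here is a numerical sketch of Theorem 3. It assumes NumPy, and the random four-state example is an illustration. The equilibrium $q$ is obtained as a null vector of a random infinitesimal stochastic $H$. It is therefore an equilibrium but typically *not* detailed balanced, which matches the theorem's hypotheses. The code compares the internal part of Theorem 1 with $-\tfrac{1}{2}\sum_{i,j} J_{ij} A_{ij}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Random infinitesimal stochastic H with all off-diagonal rates positive.
H = rng.uniform(0.1, 1.0, size=(n, n))
np.fill_diagonal(H, 0.0)
H[np.diag_indices(n)] = -H.sum(axis=0)

# Equilibrium q: the null vector of H (last right-singular vector), rescaled to be positive.
_, _, vh = np.linalg.svd(H)
q = vh[-1] / vh[-1].sum()

p = rng.uniform(0.5, 2.0, size=n)

flow = H * p[np.newaxis, :]                  # flow[i, j] = H_ij p_j
J = flow - flow.T                            # J_ij = H_ij p_j - H_ji p_i, net flux from j to i
A = np.log(np.outer(q, p) / np.outer(p, q))  # A_ij = ln(p_j q_i / (p_i q_j))

s = np.outer(p, q) / np.outer(q, p)          # s[i, j] = p_i q_j / (p_j q_i)
internal = np.sum(H * p[np.newaxis, :] * (np.log(p / q)[:, np.newaxis] - s))
schnakenberg = -0.5 * np.sum(J * A)

print(np.allclose(H @ q, 0.0), np.isclose(internal, schnakenberg))  # True True
```
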
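**Proof of Theorem 1.** We begin by taking the time derivative of the relative information:

$$ \frac{d}{dt} I(p(t),q(t)) = \sum_{i \in X} \left( \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} + \frac{\partial I}{\partial q_i} \frac{dq_i}{dt} \right). $$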

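We can separate this into a sum over states $i \in X - B$, for which the time derivatives of $p_i$ and $q_i$ are given by the master equation, and boundary states $i \in B$, for which they are not:

$$ \frac{d}{dt} I(p(t),q(t)) = \sum_{i \in X-B,\; j \in X} \left( \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j \right) + \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} + \frac{\partial I}{\partial q_i} \frac{dq_i}{dt} \right). $$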

For boundary states we have

$$ \frac{dp_i}{dt} = \frac{Dp_i}{Dt} + \sum_{j \in X} H_{ij} p_j $$

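and similarly for the time derivative of $q_i$. We thus obtain

$$ \frac{d}{dt} I(p(t),q(t)) = \sum_{i,j \in X} \left( \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j \right) + \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right). $$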

To evaluate the first sum, recall that

$$ I(p,q) = \sum_{i \in X} p_i \ln\left(\frac{p_i}{q_i}\right) $$

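so

$$ \frac{\partial I}{\partial p_i} = 1 + \ln\left(\frac{p_i}{q_i}\right), \qquad \frac{\partial I}{\partial q_i} = -\frac{p_i}{q_i}. $$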

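Thus, we have

$$ \sum_{i,j \in X} \left( \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j \right) = \sum_{i,j \in X} \left( \left(1 + \ln\left(\frac{p_i}{q_i}\right)\right) H_{ij} p_j - \frac{p_i}{q_i} H_{ij} q_j \right). $$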

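We can rewrite this as

$$ \sum_{i,j \in X} H_{ij} p_j \left( 1 + \ln\left(\frac{p_i}{q_i}\right) - \frac{p_i q_j}{p_j q_i} \right). $$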

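Since $H$ is infinitesimal stochastic we have $\sum_i H_{ij} = 0$, so the first term drops out, and we are left with

$$ \sum_{i,j \in X} H_{ij} p_j \left( \ln\left(\frac{p_i}{q_i}\right) - \frac{p_i q_j}{p_j q_i} \right) $$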

as desired. █

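**Proof of Theorem 2.** Thanks to Theorem 1, to prove

$$ \frac{d}{dt} I(p(t),q(t)) \le \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right) $$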

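it suffices to show that

$$ \sum_{i,j \in X} H_{ij} p_j \left( \ln\left(\frac{p_i}{q_i}\right) - \frac{p_i q_j}{p_j q_i} \right) \le 0 $$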

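or equivalently (recalling the proof of Theorem 1):

$$ \sum_{i,j \in X} H_{ij} p_j \left( \ln\left(\frac{p_i}{q_i}\right) + 1 - \frac{p_i q_j}{p_j q_i} \right) \le 0. $$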

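The last two terms on the left-hand side cancel when $i = j$. Thus, if we break the sum into an $i \ne j$ part and an $i = j$ part, the left side becomes

$$ \sum_{i \ne j} H_{ij} p_j \left( \ln\left(\frac{p_i}{q_i}\right) + 1 - \frac{p_i q_j}{p_j q_i} \right) + \sum_{j \in X} H_{jj} p_j \ln\left(\frac{p_j}{q_j}\right). $$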

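Next we can use the infinitesimal stochastic property of $H$ to write $H_{jj}$ as the sum of $-H_{ij}$ over $i$ not equal to $j$, obtaining

$$ \sum_{i \ne j} H_{ij} p_j \left( \ln\left(\frac{p_i}{q_i}\right) + 1 - \frac{p_i q_j}{p_j q_i} \right) - \sum_{i \ne j} H_{ij} p_j \ln\left(\frac{p_j}{q_j}\right) = \sum_{i \ne j} H_{ij} p_j \left( \ln\left(\frac{p_i q_j}{p_j q_i}\right) + 1 - \frac{p_i q_j}{p_j q_i} \right). $$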

Since $\ln(s) + 1 - s \le 0$ for all $s > 0$, while $H_{ij} \ge 0$ for all $i \ne j$ and $p_j \ge 0$, we conclude that this quantity is $\le 0$. █

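The key step of this proof is easy to spot-check: it rewrites the internal term as $\sum_{i \ne j} H_{ij} p_j (\ln s_{ij} + 1 - s_{ij})$ with $s_{ij} = p_i q_j / (p_j q_i)$. The sketch below assumes NumPy; the function names and random instances are illustrative. It compares both forms on random infinitesimal stochastic $H$ and random positive $p, q$, and records the largest value of the rewritten sum, which should never be positive.

```python
import numpy as np

def random_rate_matrix(rng, n):
    """Random infinitesimal stochastic H: off-diagonal rates in [0, 1), columns summing to zero."""
    H = rng.uniform(0.0, 1.0, size=(n, n))
    np.fill_diagonal(H, 0.0)
    H[np.diag_indices(n)] = -H.sum(axis=0)
    return H

def internal_term(H, p, q):
    """sum_{i,j} H_ij p_j (ln(p_i/q_i) - p_i q_j / (p_j q_i))."""
    s = np.outer(p, q) / np.outer(q, p)  # s[i, j] = p_i q_j / (p_j q_i)
    return float(np.sum(H * p[np.newaxis, :] * (np.log(p / q)[:, np.newaxis] - s)))

def rewritten_term(H, p, q):
    """The same quantity as sum_{i != j} H_ij p_j (ln s_ij + 1 - s_ij); every summand is <= 0."""
    s = np.outer(p, q) / np.outer(q, p)
    terms = H * p[np.newaxis, :] * (np.log(s) + 1.0 - s)
    np.fill_diagonal(terms, 0.0)  # drop the i = j terms (they are zero anyway, since s_jj = 1)
    return float(terms.sum())

rng = np.random.default_rng(2)
max_rel_gap, largest = 0.0, -np.inf
for _ in range(1000):
    H = random_rate_matrix(rng, 5)
    p = rng.uniform(0.1, 3.0, size=5)
    q = rng.uniform(0.1, 3.0, size=5)
    a, b = internal_term(H, p, q), rewritten_term(H, p, q)
    max_rel_gap = max(max_rel_gap, abs(a - b) / (1.0 + abs(a)))
    largest = max(largest, b)

print(max_rel_gap < 1e-9, largest <= 0.0)  # True True
```
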
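**Proof of Theorem 3.** Now suppose also that $q$ is an equilibrium solution of the master equation. Then $Dq_i/Dt = dq_i/dt = 0$ for all states $i$, so by Theorem 1 we need to show

$$ \sum_{i,j \in X} H_{ij} p_j \left( \ln\left(\frac{p_i}{q_i}\right) - \frac{p_i q_j}{p_j q_i} \right) = -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij}. $$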

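We also have $\sum_{j \in X} H_{ij} q_j = 0$, so the second term in the sum at left vanishes, and it suffices to show

$$ \sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{p_i}{q_i}\right) = -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij}. $$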

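By definition we have

$$ \frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} = \frac{1}{2} \sum_{i,j \in X} \left( H_{ij} p_j - H_{ji} p_i \right) \ln\left(\frac{p_j q_i}{p_i q_j}\right). $$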

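This in turn equals

$$ \frac{1}{2} \sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{p_j q_i}{p_i q_j}\right) - \frac{1}{2} \sum_{i,j \in X} H_{ji} p_i \ln\left(\frac{p_j q_i}{p_i q_j}\right) $$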

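and we can switch the dummy indices $i, j$ in the second sum, obtaining

$$ \frac{1}{2} \sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{p_j q_i}{p_i q_j}\right) - \frac{1}{2} \sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{p_i q_j}{p_j q_i}\right) $$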

or simply

$$ \sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{p_j q_i}{p_i q_j}\right). $$

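But this is

$$ \sum_{i,j \in X} H_{ij} p_j \left( \ln\left(\frac{p_j}{q_j}\right) + \ln\left(\frac{q_i}{p_i}\right) \right) $$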

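and the first term vanishes because $H$ is infinitesimal stochastic: $\sum_i H_{ij} = 0$. We thus have

$$ \frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} = \sum_{i,j \in X} H_{ij} p_j \ln\left(\frac{q_i}{p_i}\right) $$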

as desired. █


• John Baez and Blake Pollard, Relative entropy in biological systems. (Blog article here.)

But now Blake has a new paper, and I want to talk about that:

• Blake Pollard, Open Markov processes: A compositional perspective on non-equilibrium steady states in biology.

I’ll focus on just one aspect: the principle of minimum entropy production. This is an exciting yet controversial principle in non-equilibrium thermodynamics. Blake examines it in a situation where we can tell exactly what’s happening.

Life exists away from equilibrium. Left isolated, systems will tend toward thermodynamic equilibrium. However, biology is about **open systems**: physical systems that exchange matter or energy with their surroundings. Open systems can be maintained away from equilibrium by this exchange. This leads to the idea of a **non-equilibrium steady state**—a state of an open system that doesn’t change, but is not in equilibrium.

A simple example is a pan of water sitting on a stove. Heat passes from the flame to the water and then to the air above. If the flame is very low, the water doesn’t boil and nothing moves. So, we have a steady state, at least approximately. But this is not an equilibrium, because there is a constant flow of energy through the water.

Of course in reality the water will be slowly evaporating, so we don’t really have a steady state. As always, models are approximations. If the water is evaporating slowly enough, it can be useful to approximate the situation with a non-equilibrium steady state.

There is much more to biology than steady states. However, to dip our toe into the chilly waters of non-equilibrium thermodynamics, it is nice to start with steady states. And already here there are puzzles left to solve.

Ilya Prigogine won the Nobel prize for his work on non-equilibrium thermodynamics. One reason is that he had an interesting idea about steady states. He claimed that under certain conditions, a non-equilibrium steady state will *minimize entropy production!*

There has been a lot of work trying to make the ‘principle of minimum entropy production’ precise and turn it into a theorem. In this book:

• G. Lebon and D. Jou, *Understanding Non-equilibrium Thermodynamics*, Springer, Berlin, 2008.

the authors give an argument for the principle of minimum entropy production based on four conditions:

• **time-independent boundary conditions**: the surroundings of the system don’t change with time.

• **linear phenomenological laws**: the laws governing the macroscopic behavior of the system are linear.

• **constant phenomenological coefficients**: the laws governing the macroscopic behavior of the system don’t change with time.

• **symmetry of the phenomenological coefficients**: since they are linear, the laws governing the macroscopic behavior of the system can be described by a linear operator, and we demand that in a suitable basis the matrix $L_{ij}$ for this operator is symmetric:

$$ L_{ij} = L_{ji} $$

The last condition is obviously the subtlest one; it’s sometimes called **Onsager reciprocity**, and people have spent a lot of time trying to derive it from other conditions.

However, Blake goes in a different direction. He considers a concrete class of open systems, a very large class called ‘open Markov processes’. These systems obey the first three conditions listed above, and the ‘detailed balanced’ open Markov processes also obey the last one. But Blake shows that minimum entropy production holds only approximately—with the approximation being good for steady states that are *near equilibrium!*

However, he shows that another minimum principle holds exactly, even for steady states that are far from equilibrium. He calls this the ‘principle of minimum dissipation’.

We actually discussed the principle of minimum dissipation in an earlier paper:

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

But one advantage of Blake’s new paper is that it presents the results with a minimum of category theory. Of course I love category theory, and I think it’s the right way to formalize open systems, but it can be intimidating.

Another good thing about Blake’s new paper is that it explicitly compares the principle of minimum entropy production to the principle of minimum dissipation. He shows they agree in a certain limit: namely, the limit where the system is close to equilibrium.

Let me explain this. I won’t include the nice example from biology that Blake discusses: a very simple model of membrane transport. For that, read his paper! I’ll just give the general results.

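An **open Markov process** consists of a finite set $X$ of **states**, a subset $B \subseteq X$ of **boundary states**, and an **infinitesimal stochastic** operator $H \colon \mathbb{R}^X \to \mathbb{R}^X$, meaning a linear operator with

$$ H_{ij} \ge 0 \quad \text{for all } i \ne j $$

and

$$ \sum_{i \in X} H_{ij} = 0 \quad \text{for all } j \in X. $$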

I’ll explain these two conditions in a minute.

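For each $i \in X$ we introduce a **population** $p_i \in [0,\infty)$. We call the resulting function $p \colon X \to [0,\infty)$ the **population distribution**. Populations evolve in time according to the **open master equation**:

$$ \begin{aligned} \frac{dp_i}{dt} &= \sum_{j \in X} H_{ij} p_j \quad \text{for all } i \in X - B, \\ p_i(t) &= b_i(t) \quad \text{for all } i \in B. \end{aligned} $$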

So, the populations obey a linear differential equation at states that are not in the boundary, but they are specified ‘by the user’ to be chosen functions at the boundary states.

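The open master equation is easy to simulate. Below is a minimal forward-Euler sketch in Python. It assumes NumPy; the step size, the three-state chain and the helper names `euler_step` and `boundary_populations` are illustrative choices, not from the post. At each step the master equation updates every state, and then the boundary populations are overwritten with the prescribed values $b_i(t)$.

```python
import numpy as np

def euler_step(p, H, boundary, b, t, dt):
    """One forward-Euler step of the open master equation.

    Interior states follow dp_i/dt = sum_j H_ij p_j; boundary states are set to b(t + dt).
    """
    p_next = p + dt * (H @ p)
    p_next[boundary] = b(t + dt)
    return p_next

def boundary_populations(t):
    """Hypothetical constant boundary data: b_0 = 2 and b_2 = 1 at all times."""
    return np.array([2.0, 1.0])

# Chain 0 <-> 1 <-> 2 with every hop at rate 1; states 0 and 2 are boundary states.
H = np.array([[-1.0,  1.0,  0.0],
              [ 1.0, -2.0,  1.0],
              [ 0.0,  1.0, -1.0]])
boundary = [0, 2]

p, t, dt = np.array([2.0, 0.0, 1.0]), 0.0, 0.01
for _ in range(1000):
    p = euler_step(p, H, boundary, boundary_populations, t, dt)
    t += dt

print(p)  # approximately [2.0, 1.5, 1.0]
```

The interior population settles at $p_1 = (b_0 + b_2)/2 = 1.5$. This is a steady state but not an equilibrium: population keeps flowing in at state 0 and out at state 2, at a net rate of $H_{10} p_0 - H_{01} p_1 = 0.5$.
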
The off-diagonal entries $H_{ij}$, $i \ne j$, are the rates at which population hops from the $j$th to the $i$th state. This lets us understand the definition of an infinitesimal stochastic operator. The first condition:

$$ H_{ij} \ge 0 \quad \text{for all } i \ne j $$

says that the rate for population to transition from one state to another is non-negative. The second:

$$ \sum_{i \in X} H_{ij} = 0 \quad \text{for all } j \in X $$

says that population is conserved, at least if there are no boundary states. Population can flow in or out at boundary states, since the master equation doesn’t hold there.

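The two conditions can be checked mechanically. This sketch assumes NumPy, and the function name and tolerance are our own choices. Column sums are tested because $H_{ij}$ is the rate from $j$ to $i$, so conservation of population corresponds to $\sum_i H_{ij} = 0$.

```python
import numpy as np

def is_infinitesimal_stochastic(H: np.ndarray, tol: float = 1e-12) -> bool:
    """Check H_ij >= 0 for all i != j and sum_i H_ij = 0 for every column j."""
    off_diagonal = H[~np.eye(H.shape[0], dtype=bool)]
    columns_sum_to_zero = np.allclose(H.sum(axis=0), 0.0, atol=tol)
    return bool(np.all(off_diagonal >= 0.0) and columns_sum_to_zero)

H = np.array([[-1.0,  0.5],
              [ 1.0, -0.5]])
p = np.array([3.0, 7.0])

print(is_infinitesimal_stochastic(H))  # True
print((H @ p).sum())                   # 0.0: with no boundary, total population is conserved
print(is_infinitesimal_stochastic(np.array([[-1.0, -0.5], [1.0, 0.5]])))  # False: negative rate
```
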
A **steady state** is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an **equilibrium**. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. Again, the reason is that population can flow in or out at the boundary.

We say an equilibrium $q$ of a Markov process is **detailed balanced** if the rate at which population flows from the $i$th state to the $j$th state is equal to the rate at which it flows from the $j$th state to the $i$th:

$$ H_{ji} q_i = H_{ij} q_j \quad \text{for all } i,j \in X. $$

Suppose we’ve got an open Markov process that has a detailed balanced equilibrium $q$. Then a non-equilibrium steady state $p$ will minimize a function called the ‘dissipation’, subject to constraints on its boundary populations. There’s a nice formula for the dissipation in terms of $p$ and $q$.

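**Definition.** Given an open Markov process with detailed balanced equilibrium $q$ we define the **dissipation** for a population distribution $p$ to be

$$ D(p) = \frac{1}{2} \sum_{i,j \in X} H_{ij} q_j \left( \frac{p_j}{q_j} - \frac{p_i}{q_i} \right)^2. $$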

This formula is a bit tricky, but you’ll notice it’s quadratic in $p$ and it vanishes when $p = q$. So, it’s pretty nice.

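Here is the dissipation as a short Python function. It is a sketch that assumes NumPy and a detailed balanced equilibrium $q$ with positive entries, and the function name and two-state numbers are illustrative. It also shows two properties that follow from the formula. First, $D$ vanishes not only at $p = q$ but at any multiple $p = cq$. Second, $D(q + tv) = t^2 D(q + v)$.

```python
import numpy as np

def dissipation(H: np.ndarray, q: np.ndarray, p: np.ndarray) -> float:
    """D(p) = 1/2 * sum_{i,j} H_ij q_j (p_j/q_j - p_i/q_i)^2."""
    u = p / q                                              # u_i = p_i / q_i
    diff_sq = (u[np.newaxis, :] - u[:, np.newaxis]) ** 2   # diff_sq[i, j] = (u_j - u_i)^2
    return 0.5 * float(np.sum(H * q[np.newaxis, :] * diff_sq))

H = np.array([[-2.0,  1.0],
              [ 2.0, -1.0]])
q = np.array([1.0, 2.0])  # detailed balanced: H_01 q_1 = H_10 q_0 = 2
v = np.array([1.0, 0.0])

print(dissipation(H, q, q))          # 0.0
print(dissipation(H, q, 5.0 * q))    # 0.0: any multiple of q has zero dissipation
print(dissipation(H, q, q + v))      # 2.0
print(dissipation(H, q, q + 3 * v))  # 18.0 = 3**2 * 2.0
```
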
Using this concept we can formulate a principle of minimum dissipation, and prove that non-equilibrium steady states obey this principle:

**Definition.** We say a population distribution $p$ obeys the **principle of minimum dissipation** with boundary population $b$ if $p$ minimizes $D(p)$ subject to the constraint that

$$ p_i = b_i \quad \text{for all } i \in B. $$

**Theorem 1.** A population distribution $p$ is a steady state with $p_i = b_i$ for all boundary states $i$ if and only if $p$ obeys the principle of minimum dissipation with boundary population $b$.

**Proof**. This follows from Theorem 28 in A compositional framework for Markov processes.

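Theorem 1 can be illustrated numerically. The sketch below assumes NumPy; the helpers `detailed_balanced_H`, `dissipation` and `steady_state` and all the numbers are hypothetical. It builds a detailed balanced $H$ by choosing symmetric edge weights $W_{ij} = H_{ij} q_j$, and solves the open master equation for a steady state with fixed boundary populations. It then checks that random perturbations of the interior populations never lower the dissipation.

```python
import numpy as np

def detailed_balanced_H(W: np.ndarray, q: np.ndarray) -> np.ndarray:
    """H with H_ij = W_ij / q_j off the diagonal; W symmetric with zero diagonal makes q detailed balanced."""
    H = W / q[np.newaxis, :]
    H[np.diag_indices(len(q))] = -H.sum(axis=0)  # columns sum to zero
    return H

def dissipation(H, q, p):
    """D(p) = 1/2 sum_{i,j} H_ij q_j (p_j/q_j - p_i/q_i)^2."""
    u = p / q
    return 0.5 * float(np.sum(H * q[np.newaxis, :] * (u[np.newaxis, :] - u[:, np.newaxis]) ** 2))

def steady_state(H, boundary, b):
    """Solve sum_j H_ij p_j = 0 at interior states i, with p_i = b_i clamped on the boundary."""
    n = H.shape[0]
    interior = [i for i in range(n) if i not in boundary]
    p = np.zeros(n)
    p[boundary] = b
    rhs = -H[np.ix_(interior, boundary)] @ b
    p[interior] = np.linalg.solve(H[np.ix_(interior, interior)], rhs)
    return p, interior

q = np.array([0.1, 0.2, 0.3, 0.4])
W = np.array([[0.0, 1.0, 0.5, 0.0],
              [1.0, 0.0, 2.0, 0.3],
              [0.5, 2.0, 0.0, 1.5],
              [0.0, 0.3, 1.5, 0.0]])
H = detailed_balanced_H(W, q)
boundary, b = [0, 3], np.array([0.5, 0.1])

p_star, interior = steady_state(H, boundary, b)
D_star = dissipation(H, q, p_star)

# Perturb only interior populations, so the boundary constraint p_i = b_i still holds.
rng = np.random.default_rng(3)
perturbed = []
for _ in range(200):
    delta = np.zeros(len(q))
    delta[interior] = rng.normal(scale=0.1, size=len(interior))
    perturbed.append(dissipation(H, q, p_star + delta))

print(min(perturbed) >= D_star)  # True
```

$D$ is a sum of squares with nonnegative weights, so it is convex. This local check is therefore consistent with the steady state being the global constrained minimizer, as Theorem 1 asserts.
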
How does dissipation compare with entropy production? To answer this, first we must ask: what really is entropy production? And: how does the equilibrium state $q$ show up in the concept of entropy production?

The **relative entropy** of two population distributions $p, q$ is given by

$$ I(p,q) = \sum_{i \in X} p_i \ln\left(\frac{p_i}{q_i}\right). $$

It is well known that for a closed Markov process with $q$ as a detailed balanced equilibrium, the relative entropy $I(p(t),q)$ is monotonically *decreasing* with time. This is due to an annoying sign convention in the definition of relative entropy: while entropy is typically increasing, relative entropy typically decreases. We could fix this by putting a minus sign in the above formula or giving this quantity some other name. A lot of people call it the **Kullback–Leibler divergence**, but I have taken to calling it **relative information**. For more, see:

• John Baez and Blake Pollard, Relative entropy in biological systems. (Blog article here.)

We say ‘relative entropy’ in the title, but then we explain why ‘relative information’ is a better name, and use that. More importantly, we explain why $I(p,q)$ has the physical meaning of *free energy*. Free energy tends to decrease, so everything is okay. For details, see Section 4.

Blake has a nice formula for how fast $I(p(t),q)$ decreases:

**Theorem 2.** Consider an open Markov process with $X$ as its set of states and $B$ as the set of boundary states. Suppose $p(t)$ obeys the open master equation and $q$ is a detailed balanced equilibrium. For any boundary state $i \in B$ let

$$ \frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_{j \in X} H_{ij} p_j $$

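measure how much $p_i$ fails to obey the master equation. Then we have

$$ \frac{d}{dt} I(p(t),q) = \sum_{i,j \in X} H_{ij} p_j \left( \ln\left(\frac{p_i}{q_i}\right) - \frac{p_i q_j}{p_j q_i} \right) + \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt}. $$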

Moreover, the first term is less than or equal to zero.

**Proof.** For a self-contained proof, see Information geometry (part 16), which is coming up soon. It will be a special case of the theorems there. █

Blake compares this result to previous work by Schnakenberg:

• J. Schnakenberg, Network theory of microscopic and macroscopic behavior of master equation systems, *Rev. Mod. Phys.* **48** (1976), 571–585.

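The negative of Blake’s first term is this:

$$ K(p) = -\sum_{i,j \in X} H_{ij} p_j \left( \ln\left(\frac{p_i}{q_i}\right) - \frac{p_i q_j}{p_j q_i} \right). $$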

Under certain circumstances, this equals what Schnakenberg calls the **entropy production**. But a better name for this quantity might be **free energy loss**, since for a closed Markov process that’s exactly what it is! In this case there are no boundary states, so the theorem above says $K(p)$ is the rate at which relative entropy (or in other words, free energy) decreases.

For an open Markov process, things are more complicated. The theorem above shows that free energy can also flow in or out at the boundary, thanks to the second term in the formula.

Anyway, the sensible thing is to compare a principle of ‘minimum free energy loss’ to the principle of minimum dissipation. The principle of minimum dissipation is true. How about the principle of minimum free energy loss? It turns out to be approximately true near equilibrium.

For this, consider the situation in which $p$ is near to the equilibrium distribution $q$ in the sense that

$$ \frac{p_i}{q_i} = 1 + \epsilon_i $$

for some small numbers $\epsilon_i$. We collect these numbers in a vector called $\epsilon$.

**Theorem 3.** Consider an open Markov process with $X$ as its set of states and $B$ as the set of boundary states. Suppose $q$ is a detailed balanced equilibrium and let $p$ be arbitrary. Then

$$ K(p) = D(p) + O(\epsilon^2) $$

where $K(p)$ is the free energy loss, $D(p)$ is the dissipation, $\epsilon_i$ is defined as above, and by $O(\epsilon^2)$ we mean a sum of terms of order $\epsilon_i^2$.

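**Proof.** First take the free energy loss:

$$ K(p) = -\sum_{i,j \in X} H_{ij} p_j \left( \ln\left(\frac{p_i}{q_i}\right) - \frac{p_i q_j}{p_j q_i} \right). $$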

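Expanding the logarithm to first order in $\epsilon$, we get

$$ K(p) = -\sum_{i,j \in X} H_{ij} p_j \left( \frac{p_i}{q_i} - 1 - \frac{p_i q_j}{p_j q_i} \right) + O(\epsilon^2). $$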

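Since $H$ is infinitesimal stochastic, $\sum_i H_{ij} = 0$, so the second term in the sum vanishes, leaving

$$ K(p) = -\sum_{i,j \in X} H_{ij} p_j \left( \frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i} \right) + O(\epsilon^2) $$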

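or

$$ K(p) = -\sum_{i,j \in X} \left( H_{ij} p_j \frac{p_i}{q_i} - H_{ij} q_j \frac{p_i}{q_i} \right) + O(\epsilon^2). $$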

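Since $q$ is an equilibrium we have $\sum_j H_{ij} q_j = 0$, so now the last term in the sum vanishes, leaving

$$ K(p) = -\sum_{i,j \in X} H_{ij} \frac{p_i p_j}{q_i} + O(\epsilon^2). $$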

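Next, take the dissipation

$$ D(p) = \frac{1}{2} \sum_{i,j \in X} H_{ij} q_j \left( \frac{p_j}{q_j} - \frac{p_i}{q_i} \right)^2 $$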

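and expand the square, getting

$$ D(p) = \frac{1}{2} \sum_{i,j \in X} H_{ij} q_j \left( \frac{p_j^2}{q_j^2} - \frac{2 p_i p_j}{q_i q_j} + \frac{p_i^2}{q_i^2} \right). $$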

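Since $H$ is infinitesimal stochastic, $\sum_i H_{ij} = 0$. The first term is just this times a function of $j$, summed over $j$, so it vanishes, leaving

$$ D(p) = \frac{1}{2} \sum_{i,j \in X} H_{ij} q_j \left( - \frac{2 p_i p_j}{q_i q_j} + \frac{p_i^2}{q_i^2} \right). $$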

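Since $q$ is an equilibrium, $\sum_j H_{ij} q_j = 0$. The last term above is this times a function of $i$, summed over $i$, so it vanishes, leaving

$$ D(p) = -\sum_{i,j \in X} H_{ij} q_j \frac{p_i p_j}{q_i q_j} = -\sum_{i,j \in X} H_{ij} \frac{p_i p_j}{q_i}. $$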

This matches what we got for $K(p)$, up to terms of order $O(\epsilon^2)$. █

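Here is a numerical sketch of Theorem 3. It assumes NumPy; the three-state example, the perturbation direction `v` and the helper names are illustrative. It sets $p_i = q_i(1 + \epsilon v_i)$ for shrinking $\epsilon$ and prints the ratio $K(p)/D(p)$. Both quantities vanish at $p = q$, so a meaningful check is whether their ratio tends to 1.

```python
import numpy as np

def free_energy_loss(H, q, p):
    """K(p) = -sum_{i,j} H_ij p_j (ln(p_i/q_i) - p_i q_j / (p_j q_i))."""
    s = np.outer(p, q) / np.outer(q, p)  # s[i, j] = p_i q_j / (p_j q_i)
    return -float(np.sum(H * p[np.newaxis, :] * (np.log(p / q)[:, np.newaxis] - s)))

def dissipation(H, q, p):
    """D(p) = 1/2 sum_{i,j} H_ij q_j (p_j/q_j - p_i/q_i)^2."""
    u = p / q
    return 0.5 * float(np.sum(H * q[np.newaxis, :] * (u[np.newaxis, :] - u[:, np.newaxis]) ** 2))

# Detailed balanced example: H_ij q_j = W_ij with W symmetric.
q = np.array([0.2, 0.3, 0.5])
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 2.0],
              [0.5, 2.0, 0.0]])
H = W / q[np.newaxis, :]
H[np.diag_indices(3)] = -H.sum(axis=0)

v = np.array([1.0, -0.5, 0.3])  # direction of the perturbation away from q
for eps in (1e-1, 1e-2, 1e-3):
    p = q * (1.0 + eps * v)
    print(eps, free_energy_loss(H, q, p) / dissipation(H, q, p))
```

In this example, the distance between the ratio and 1 shrinks roughly tenfold each time $\epsilon$ does. This is consistent with $K$ and $D$ agreeing near equilibrium but not far from it.
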
In short: detailed balanced open Markov processes are governed by the principle of minimum dissipation, not minimum entropy production. *Minimum dissipation agrees with minimum entropy production only near equilibrium.*
