Relative Entropy in Biological Systems

27 November, 2015

Here’s a draft of a paper for the proceedings of a workshop on Information and Entropy in Biological Systems held this spring:

• John Baez and Blake Pollard, Relative Entropy in Biological Systems.

We’d love any comments or questions you might have. I’m not happy with the title. In the paper we advocate using the term ‘relative information’ instead of ‘relative entropy’—yet the latter is much more widely used, so I feel we need it in the title to let people know what the paper is about!

Here’s the basic idea.

Life relies on nonequilibrium thermodynamics, since in thermal equilibrium there are no flows of free energy. Biological systems are also open systems, in the sense that both matter and energy flow in and out of them. Nonetheless, it is important in biology that systems can sometimes be treated as approximately closed, and sometimes approach equilibrium before being disrupted in one way or another. This can occur on a wide range of scales, from large ecosystems to within a single cell or organelle. Examples include:

• A population approaching an evolutionarily stable state.

• Random processes such as mutation, genetic drift, the diffusion of organisms in an environment or the diffusion of molecules in a liquid.

• A chemical reaction approaching equilibrium.

An interesting common feature of these processes is that as they occur, quantities mathematically akin to entropy tend to increase. Closely related quantities such as free energy tend to decrease. In this review, we explain some mathematical results that make this idea precise.

Most of these results involve a quantity that is variously known as ‘relative information’, ‘relative entropy’, ‘information gain’ or the ‘Kullback–Leibler divergence’. We’ll use the first term. Given two probability distributions p and q on a finite set X, their relative information, or more precisely the information of p relative to q, is

\displaystyle{ I(p\|q) = \sum_{i \in X} p_i \ln\left(\frac{p_i}{q_i}\right) }

We use the word ‘information’ instead of ‘entropy’ because one expects entropy to increase with time, and the theorems we present will say that I(p\|q) decreases with time under various conditions. The reason is that the Shannon entropy

\displaystyle{ S(p) = -\sum_{i \in X} p_i \ln p_i }

contains a minus sign that is missing from the definition of relative information.

Intuitively, I(p\|q) is the amount of information gained when we start with a hypothesis given by some probability distribution q and then learn the ‘true’ probability distribution p. For example, if we start with the hypothesis that a coin is fair and then are told that it landed heads up, the relative information is \ln 2, so we have gained 1 bit of information. If however we started with the hypothesis that the coin always lands heads up, we would have gained no information.

We put the word ‘true’ in quotes here, because the notion of a ‘true’ probability distribution, which subjective Bayesians reject, is not required to use relative information. A more cautious description of relative information is that it is a divergence: a way of measuring the difference between probability distributions that obeys

I(p \| q) \ge 0


I(p \| q) = 0 \iff p = q

but not necessarily the other axioms for a distance function, symmetry and the triangle inequality, which indeed fail for relative information.

There are many other divergences besides relative information, some of which we discuss in Section 6. However, relative information can be singled out by a number of characterizations, including one based on ideas from Bayesian inference. The relative information is also close to the expected number of extra bits required to code messages distributed according to the probability measure p using a code optimized for messages distributed according to q.
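Before moving on, here is the coin example in a few lines of code. This is just a quick numerical sketch, not something from the paper; it computes relative information in nats and converts to bits:

```python
import numpy as np

def relative_information(p, q):
    """I(p||q) = sum_i p_i ln(p_i / q_i), in nats; terms with p_i = 0 contribute nothing."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

q_fair  = [0.5, 0.5]   # hypothesis: the coin is fair
q_heads = [1.0, 0.0]   # hypothesis: the coin always lands heads up
p       = [1.0, 0.0]   # what we learn: it landed heads up

print(relative_information(p, q_fair))              # ln 2, about 0.693 nats
print(relative_information(p, q_fair) / np.log(2))  # 1 bit of information gained
print(relative_information(p, q_heads))             # 0: we learn nothing
```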

In this review, we describe various ways in which a population or probability distribution evolves continuously according to some differential equation. For all these differential equations, we describe conditions under which relative information decreases. Briefly, the results are as follows. We hasten to reassure the reader that our paper explains all the jargon involved, and the proofs of the claims are given in full:

• In Section 2, we consider a very general form of the Lotka–Volterra equations, which are a commonly used model of population dynamics. Starting from the population P_i of each type of replicating entity, we can define a probability distribution

p_i = \frac{P_i}{\displaystyle{\sum_{j \in X} P_j }}

which evolves according to a nonlinear equation called the replicator equation. We describe a necessary and sufficient condition under which I(q\|p(t)) is nonincreasing when p(t) evolves according to the replicator equation while q is held fixed.

• In Section 3, we consider a special case of the replicator equation that is widely studied in evolutionary game theory. In this case we can think of probability distributions as mixed strategies in a two-player game. When q is a dominant strategy, I(q\|p(t)) can never increase when p(t) evolves according to the replicator equation. We can think of I(q\|p(t)) as the information that the population has left to learn. Thus, evolution is analogous to a learning process—an analogy that in the field of artificial intelligence is exploited by evolutionary algorithms.

• In Section 4 we consider continuous-time, finite-state Markov processes. Here we have probability distributions on a finite set X evolving according to a linear equation called the master equation. In this case I(p(t)\|q(t)) can never increase. Thus, if q is a steady state solution of the master equation, both I(p(t)\|q) and I(q\|p(t)) are nonincreasing. We can always write q as the Boltzmann distribution for some energy function E : X \to \mathbb{R}, meaning that

\displaystyle{ q_i = \frac{\exp(-E_i / k T)}{\displaystyle{\sum_{j \in X} \exp(-E_j / k T)}} }

where T is temperature and k is Boltzmann’s constant. In this case, I(p(t)\|q) is proportional to a difference of free energies:

\displaystyle{ I(p(t)\|q) = \frac{F(p) - F(q)}{T} }

Thus, the nonincreasing nature of I(p(t)\|q) is a version of the Second Law of Thermodynamics.

• In Section 5, we consider chemical reactions and other processes described by reaction networks. In this context we have populations P_i of entities of various kinds i \in X, and these populations evolve according to a nonlinear equation called the rate equation. We can generalize relative information from probability distributions to populations by setting

\displaystyle{ I(P\|Q) = \sum_{i \in X} \left[ P_i \ln\left(\frac{P_i}{Q_i}\right) - \left(P_i - Q_i\right) \right] }

If Q is a special sort of steady state solution of the rate equation, called a complex balanced equilibrium, I(P(t)\|Q) can never increase when P(t) evolves according to the rate equation.

• Finally, in Section 6, we consider a class of functions called f-divergences which include relative information as a special case. For any convex function f : [0,\infty) \to [0,\infty), the f-divergence of two probability distributions p, q : X \to [0,1] is given by

\displaystyle{ I_f(p\|q) = \sum_{i \in X} q_i f\left(\frac{p_i}{q_i}\right)}

Whenever p(t) and q(t) are probability distributions evolving according to the master equation of some Markov process, I_f(p(t)\|q(t)) is nonincreasing. The f-divergence is also well-defined for populations, and nonincreasing for two populations that both evolve according to the master equation.
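As a quick numerical check of the claims in Sections 4 and 6, here is a small sketch (not from the paper): we evolve two probability distributions on a randomly generated 4-state Markov process by crude Euler steps of the master equation and watch their relative information decrease. The generator and initial distributions are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Random infinitesimal generator: nonnegative off-diagonal rates, columns summing
# to zero, so dp/dt = H p is the master equation of a Markov process.
H = rng.random((n, n))
np.fill_diagonal(H, 0.0)
H -= np.diag(H.sum(axis=0))

def rel_info(p, q):
    return np.sum(p * np.log(p / q))

p = rng.random(n); p /= p.sum()
q = rng.random(n); q /= q.sum()

dt, steps = 0.01, 500
for step in range(steps + 1):
    if step % 100 == 0:
        print(f"t = {step * dt:4.2f}   I(p||q) = {rel_info(p, q):.6f}")
    p = p + dt * (H @ p)   # crude Euler step; fine for this small dt
    q = q + dt * (H @ q)
```

The same loop with the f-divergence of your favorite convex f in place of rel_info illustrates the more general claim of Section 6.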

Regime Shift?

25 November, 2015

There’s no reason that the climate needs to change gradually. Recently scientists have become interested in regime shifts, which are abrupt, substantial and lasting changes in the state of a complex system.

Rasha Kamel of the Azimuth Project pointed us to a report in Science Daily which says:

Planet Earth experienced a global climate shift in the late 1980s on an unprecedented scale, fueled by anthropogenic warming and a volcanic eruption, according to new research. Scientists say that a major step change, or ‘regime shift,’ in Earth’s biophysical systems, from the upper atmosphere to the depths of the ocean and from the Arctic to Antarctica, was centered around 1987, and was sparked by the El Chichón volcanic eruption in Mexico five years earlier.

As always, it’s good to drill down through the science reporters’ summaries to the actual papers.  So I read this one:

• Philip C. Reid et al, Global impacts of the 1980s regime shift on the Earth’s climate and systems, Global Change Biology, 2015.

The authors of this paper analyzed 72 time series of climate and ecological data to search for such a regime shift, and found one around 1987. If such a thing really happened, this could be very important.

Here are some of the data they looked at:

Click to enlarge them—they’re pretty interesting! Vertical lines denote regime shift years, colored in different ways: 1984 blue, 1985 green, 1986 orange, 1987 red, 1988 brown, 1989 purple and so on. You can see that lots are red.

The paper has a lot of interesting and informed speculations about the cause of this shift—so give it a look.  For now I just want to tackle an important question of a more technical nature: how did they search for regime shifts?

They used the ‘STARS’ method, which stands for Sequential t-Test Analysis of Regime Shifts. They explain:

The STARS method (Rodionov, 2004; Rodionov & Overland, 2005) tests whether the end of one period (regime) of a certain length is different from a subsequent period (new regime). The cumulative sum of normalized deviations from the hypothetical mean level of the new regime is calculated, and then compared with the mean level of the preceding regime. A shift year is detected if the difference in the mean levels is statistically significant according to a Student’s t-test.

In his third paper, Rodionov (2006) shows how autocorrelation can be accounted for. From each year of the time series (except edge years), the rules are applied backwards and forwards to test that year as a potential shift year. The method is, therefore, a running procedure applied on sequences of years within the time series. The multiple STARS method used here repeats the procedure for 20 test-period lengths ranging from 6 to 25 years that are, for simplicity (after testing many variations), of the same length on either side of the regime shift.
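To get a feel for the flavor of the method, here is a drastically simplified sketch. It is not Rodionov's actual algorithm (which is sequential, uses cumulative sums and corrects for autocorrelation); it just flags a year as a candidate shift when the means of fixed-length windows before and after it differ under Student's t-test. The data are synthetic, with a step change built in at 1987.

```python
import numpy as np
from scipy import stats

def candidate_shift_years(series, years, window=10, alpha=0.05):
    """Flag year k when the `window` values before it and the `window`
    values from it onward have significantly different means (Welch's t-test)."""
    shifts = []
    for k in range(window, len(series) - window):
        t, p = stats.ttest_ind(series[k - window:k], series[k:k + window],
                               equal_var=False)
        if p < alpha:
            shifts.append(int(years[k]))
    return shifts

rng = np.random.default_rng(0)
years = np.arange(1970, 2011)
series = rng.normal(0.0, 1.0, size=len(years)) + 2.0 * (years >= 1987)
print(candidate_shift_years(series, years))   # a cluster of years around 1987
```

Even this toy version flags a whole cluster of years around the true shift, which hints at why sensitivity is an issue.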

Elsewhere I read that the STARS method is ‘too sensitive’. Could it be due to limitations of the ‘statistical significance’ idea involved in Student’s t-test?

You can download software that implements the STARS method here. The method is explained in the papers by Rodionov.

Do you know about this stuff?  If so, I’d like to hear your views on this paper and the STARS method.

Tale of a Doomed Galaxy

8 November, 2015

Part 1

About 3 billion years ago, if there was intelligent life in the galaxy we call PG 1302-102, it should have known it was in serious trouble.

Our galaxy has a supermassive black hole in the middle. But that galaxy had two. One was about ten times as big as the other. Taken together, they weighed a billion times as much as our Sun.

They gradually spiraled in towards each other… and then, suddenly, one fine morning, they collided. The resulting explosion was 10 million times more powerful than a supernova—more powerful than anything astronomers here on Earth have ever seen! It was probably enough to wipe out all life in that galaxy.

We haven’t actually seen this yet. The light and gravitational waves from the disaster are still speeding towards us. They should reach us in roughly 100,000 years. We’re not sure when.

Right now, we see the smaller black hole still orbiting the big one, once every 5 years. In fact it’s orbiting once every 4 years! But thanks to the expansion of the universe, PG 1302-102 is moving away from us so fast that time on that distant galaxy looks significantly slowed down to us.

Orbiting once every 4 years: that doesn’t sound so fast. But the smaller black hole is about 2000 times more distant from its more massive companion than Pluto is from our Sun! So in fact it’s moving at very high speed – about 1% of the speed of light. We can actually see it getting redshifted and then blueshifted as it zips around. And it will continue to speed up as it spirals in.

What exactly will happen when these black holes collide? It’s too bad we won’t live to see it. We’re far enough that it will be perfectly safe to watch from here! But the human race knows enough about physics to say quite a lot about what it will be like. And we’ve built some amazing machines to detect the gravitational waves created by collisions like this—so as time goes on, we’ll know even more.

Part 2

Even before the black holes at the heart of PG 1302-102 collided, life in that galaxy would have had a quasar to contend with!

This is a picture of Centaurus A, a much closer galaxy with a quasar in it. A quasar is a huge black hole in the middle of a galaxy—a black hole that’s eating lots of stars, which rip apart and form a disk of hot gas as they spiral in. ‘Hot’ is an understatement, since this gas moves near the speed of light. It gets so hot that it pumps out intense jets of particles from its north and south poles. Some of these particles even make it to Earth.

Any solar system in Centaurus A that gets in the way of those jets is toast.

And these jets create lots of radiation, from radio waves to X-rays. That’s how we can see quasars from billions of light years away. Quasars are the brightest objects in the universe, except for short-lived catastrophic events like the black hole collisions and gamma-ray bursts from huge dying stars.

It’s hard to grasp the size and power of such things, but let’s try. You can’t see the black hole in the middle of this picture, but it weighs 55 million times as much as our Sun. The blue glow of the jets in this picture is actually X-rays. The jet at upper left is 13,000 light years long, made of particles moving at half the speed of light.

A typical quasar puts out a power of roughly 10^40 watts. They vary a lot, but let’s pick this number as our ‘standard quasar’.

But what does 10^40 watts actually mean? For comparison, the Sun puts out 4 x 10^26 watts. So, we’re talking about 25 trillion Suns. But even that’s too big a number to comprehend!

Maybe it would help to say that the whole Milky Way puts out 5 x 10^36 watts. So a single quasar, at the center of one galaxy, can have the power of 2000 galaxies like ours.

Or, we can work out how much energy would be produced if the entire mass of the Moon were converted into energy. I’m getting 6 x 10^39 joules. That’s a lot! But our standard quasar is putting out a bit more power than if it were converting one Moon into energy each second.
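Here is the arithmetic behind those comparisons, with rounded figures:

```python
quasar    = 1e40     # watts: our 'standard quasar'
sun       = 4e26     # watts
milky_way = 5e36     # watts
moon_mass = 7.3e22   # kg
c         = 3.0e8    # m/s, speed of light

print(quasar / sun)        # ~2.5e13: tens of trillions of Suns
print(quasar / milky_way)  # 2000: the power of 2000 galaxies like ours
print(moon_mass * c**2)    # ~6.6e39 joules: one Moon's worth of mass-energy
```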

But you can’t just turn matter completely into energy: you need an equal amount of antimatter, and there’s not that much around. A quasar gets its power the old-fashioned way: by letting things fall down. In this case, fall down into a black hole.

To power our standard quasar, 10 stars need to fall into the black hole every year. The biggest quasars eat 1000 stars a year. The black hole in our galaxy gets very little to eat, so we don’t have a quasar.

There are short-lived events much more powerful than a quasar. For example, a gamma-ray burst, formed as a hypergiant star collapses into a black hole. A powerful gamma-ray burst can put out 10^44 watts for a few seconds. That’s equal to 10,000 quasars! But quasars last a long, long time.

So this was life in PG 1302-102 before things got really intense – before its two black holes spiraled into each other and collided. What was that collision like? I’ll talk about that next time.

The above picture of Centaurus A was actually made from images taken by three separate telescopes. The orange glow is submillimeter radiation – between infrared and microwaves—detected by the Atacama Pathfinder Experiment (APEX) telescope in Chile. The blue glow is X-rays seen by the Chandra X-ray Observatory. The rest is a photo taken in visible light by the Wide Field Imager on the Max-Planck/ESO 2.2 meter telescope, also located in Chile. This shows the dust lanes in the galaxy and background stars.

Part 3

What happened at the instant the supermassive black holes in the galaxy PG 1302-102 finally collided?

We’re not sure yet, because the light and gravitational waves will take time to get here. But physicists are using computers to figure out what happens when black holes collide!

Here you see some results. The red blobs are the event horizons of two black holes.

First the black holes orbit each other, closer and closer, as they lose energy by emitting gravitational radiation. This is called the ‘inspiral’ phase.

Then comes the ‘plunge’ and ‘merger’. They plunge towards each other. A thin bridge forms between them, which you see here. Then they completely merge.

Finally you get a single black hole, which oscillates and then calms down. This is called the ‘ringdown’, because it’s like a bell ringing, loudly at first and then more quietly. But instead of emitting sound, it’s emitting gravitational waves—ripples in the shape of space!

In the top picture, the black holes have the same mass: one looks smaller, but that’s because it’s farther away. In the bottom picture, the black hole at left is twice as massive.

Here’s one cool discovery. An earlier paper had argued there could be two bridges, except in very symmetrical situations. If that were true, a black hole could have the topology of a torus for a little while. But these calculations showed that – at least in the cases they looked at—there’s just one bridge.

So, you can’t have black hole doughnuts. At least not yet.

These calculations were done using free software called SpEC. But before you try to run it at home: the team that puts out this software says:

Because of the steep learning curve and complexity of SpEC, new users are typically introduced to SpEC through a collaboration with experienced SpEC users.

It probably requires a lot of computer power, too. These calculations are very hard. We know the equations; they’re just tough to solve. The first complete simulation of an inspiral, merger and ringdown was done in 2005.

The reason people want to simulate colliding black holes is not mainly to create pretty pictures, or even understand what happens to the event horizon. It’s to understand the gravitational waves they will produce! People are building better and better gravitational wave detectors—more on that later—but we still haven’t seen gravitational waves. This is not surprising: they’re very weak. To find them, we need to filter out noise. So, we need to know what to look for.

The pictures are from here:

• Michael I. Cohen, Jeffrey D. Kaplan and Mark A. Scheel, On toroidal horizons in binary black hole inspirals, Phys. Rev. D 85 (2012), 024031.

Part 4

Let’s imagine an old, advanced civilization in the doomed galaxy PG 1302-102.

Long ago they had mastered space travel. Thus, they were able to survive when their galaxy collided with another—just as ours will collide with Andromeda four billion years from now. They had a lot of warning—and so do we. The picture here shows what Andromeda will look like 250 million years before it hits.

They knew everything we do about astronomy—and more. So they knew that when galaxies collide, almost all stars sail past each other unharmed. A few planets get knocked out of orbit. Colliding clouds of gas and dust form new stars, often blue giants that live short, dramatic lives, going supernova after just 10 million years.

All this could be handled by not being in the wrong place at the wrong time. They knew the real danger came from the sleeping monsters at the heart of the colliding galaxies.

Namely, the supermassive black holes!

Almost every galaxy has a huge black hole at its center. This black hole is quiet when not being fed. But when galaxies collide, lots of gas and dust and even stars get caught by the gravity and pulled in. This material forms a huge flat disk as it spirals down and heats up. The result is an active galactic nucleus.

In the worst case, the central black holes can eat thousands of stars a year. Then we get a quasar, which easily pumps out the power of 2000 ordinary galaxies.

Much of this power comes out in huge jets of X-rays. These jets keep growing, eventually stretching for hundreds of thousands of light years. The whole galaxy becomes bathed in X-rays—killing all life that’s not prepared.

Let’s imagine a civilization that was prepared. Natural selection has ways of weeding out civilizations that are bad at long-term planning. If you’re prepared, and you have the right technology, a quasar could actually be a good source of power.

But the quasar was just the start of the problem. The combined galaxy had two black holes at its center. The big one was at least 400 million times the mass of our Sun. The smaller one was about a tenth as big—but still huge.

They eventually met and started to orbit each other. By flinging stars out of the way, they gradually came closer. It was slow at first, but the closer they got, the faster they circled each other, and the more gravitational waves they pumped out. This carried away more energy—so they moved closer, and circled even faster, in a dance with an insane, deadly climax.

Right now—here on Earth, where it takes a long time for the news to reach us—we see that in 100,000 years the two black holes will spiral down completely, collide and merge. When this happens, a huge pulse of gravitational waves, electromagnetic radiation, matter and even antimatter will blast through the galaxy called PG 1302-102.

I don’t know exactly what this will be like. I haven’t found papers describing this kind of event in detail.

One expert told the New York Times that the energy of this explosion will equal 100 million supernovae. I don’t think he was exaggerating. A supernova is a giant star whose core collapses as it runs out of fuel, easily turning several Earth masses of hydrogen into iron before you can say “Jack Robinson”. When it does this, it can easily pump out 10^44 joules of energy. So, 100 million supernovae is 10^52 joules. By contrast, if we could convert all the mass of the black holes in PG 1302-102 into energy, we’d get about 10^56 joules. So, our expert was just saying that their merger will turn about 0.01% of their combined mass into energy. That seems quite reasonable to me.
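Here is that estimate spelled out in round numbers:

```python
supernova   = 1e44                # joules: a typical supernova
merger      = 1e8 * supernova     # '100 million supernovae' = 1e52 joules

m_sun       = 2e30                # kg
c           = 3e8                 # m/s
rest_energy = 1e9 * m_sun * c**2  # a billion solar masses converted entirely: ~2e56 J

print(merger / rest_energy)       # ~6e-5, i.e. roughly the 0.01% quoted above
```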

But I want to know what happens then! What will the explosion do to the galaxy? Most of the energy comes out as gravitational radiation. Gravitational waves don’t interact very strongly with matter. But when they’re this strong, who knows? And of course there will be plenty of ordinary radiation, as the accretion disk gets shredded and sucked into the new combined black hole.

The civilization I’m imagining was smart enough not to stick around. They decided to simply leave the galaxy.

After all, they could tell the disaster was coming, at least a million years in advance. Some may have decided to stay and rough it out, or die a noble death. But most left.

And then what?

It takes a long time to reach another galaxy. Right now, travelling at 1% the speed of light, it would take 250 million years to reach Andromeda from here.

But they wouldn’t have to go to another galaxy. They could just back off, wait for the fireworks to die down, and move back in.

So don’t feel bad for them. I imagine they’re doing fine.

By the way, the expert I mentioned is S. George Djorgovski of Caltech, mentioned here:

• Dennis Overbye, Black holes inch ahead to violent cosmic union, New York Times, 7 January 2015.

Part 5

When distant black holes collide, they emit a burst of gravitational radiation: a ripple in the shape of space, spreading out at the speed of light. Can we detect that here on Earth? We haven’t yet. But with luck we will soon, thanks to LIGO.

LIGO stands for Laser Interferometer Gravitational Wave Observatory. The idea is simple. You shine a laser beam down two very long tubes and let it bounce back and forth between mirrors at the ends. You use this to compare the lengths of these tubes. When a gravitational wave comes by, it stretches space in one direction and squashes it in another direction. So, we can detect it.

Sounds easy, eh? Not when you run the numbers! We’re trying to see gravitational waves that stretch space just a tiny bit: about one part in 10^23. At LIGO, the tubes are 4 kilometers long. So, we need to see their length change by an absurdly small amount: one-thousandth the diameter of a proton!

It’s amazing to me that people can even contemplate doing this, much less succeed. They use lots of tricks:

• They bounce the light back and forth many times, effectively increasing the length of the tubes to 1800 kilometers.

• There’s no air in the tubes—just a very good vacuum.

• They hang the mirrors on quartz fibers, making each mirror part of a pendulum with very little friction. This means it vibrates very well at one particular frequency, and very badly at frequencies far from that. This damps out the shaking of the ground, which is a real problem.

• This pendulum is hung on another pendulum.

• That pendulum is hung on a third pendulum.

• That pendulum is hung on a fourth pendulum.

• The whole chain of pendulums is sitting on a device that detects vibrations and moves in a way to counteract them, sort of like noise-cancelling headphones.

• There are 2 of these facilities, one in Livingston, Louisiana and another in Hanford, Washington. Only if both detect a gravitational wave do we get excited.

I visited the LIGO facility in Louisiana in 2006. It was really cool! Back then, the sensitivity was good enough to see collisions of black holes and neutron stars up to 50 million light years away.

Here I’m not talking about supermassive black holes like the ones in the doomed galaxy of my story here! I’m talking about the much more common black holes and neutron stars that form when stars go supernova. Sometimes a pair of stars orbiting each other will both blow up, and form two black holes—or two neutron stars, or a black hole and neutron star. And eventually these will spiral into each other and emit lots of gravitational waves right before they collide.

50 million light years is big enough that LIGO could see about half the galaxies in the Virgo Cluster. Unfortunately, with that many galaxies, we only expect to see one neutron star collision every 50 years or so.

They never saw anything. So they kept improving the machines, and now we’ve got Advanced LIGO! This should now be able to see collisions up to 225 million light years away… and after a while, three times further.

They turned it on September 18th. Soon we should see more than one gravitational wave burst each year.

In fact, there’s a rumor that they’ve already seen one! But they’re still testing the device, and there’s a team whose job is to inject fake signals, just to see if they’re detected. Davide Castelvecchi writes:

LIGO is almost unique among physics experiments in practising ‘blind injection’. A team of three collaboration members has the ability to simulate a detection by using actuators to move the mirrors. “Only they know if, and when, a certain type of signal has been injected,” says Laura Cadonati, a physicist at the Georgia Institute of Technology in Atlanta who leads the Advanced LIGO’s data-analysis team.

Two such exercises took place during earlier science runs of LIGO, one in 2007 and one in 2010. Harry Collins, a sociologist of science at Cardiff University, UK, was there to document them (and has written books about it). He says that the exercises can be valuable for rehearsing the analysis techniques that will be needed when a real event occurs. But the practice can also be a drain on the team’s energies. “Analysing one of these events can be enormously time consuming,” he says. “At some point, it damages their home life.”

The original blind-injection exercises took 18 months and 6 months respectively. The first one was discarded, but in the second case, the collaboration wrote a paper and held a vote to decide whether they would make an announcement. Only then did the blind-injection team ‘open the envelope’ and reveal that the events had been staged.

Aargh! The disappointment would be crushing.

But with luck, Advanced LIGO will soon detect real gravitational waves. And I hope life here in the Milky Way thrives for a long time – so that when the gravitational waves from the doomed galaxy PG 1302-102 reach us, hundreds of thousands of years in the future, we can study them in exquisite detail.

For Castelvecchi’s whole story, see:

• Davide Castelvecchi, Has giant LIGO experiment seen gravitational waves?, Nature, 30 September 2015.

For pictures of my visit to LIGO, see:

• John Baez, This week’s finds in mathematical physics (week 241), 20 November 2006.

For how Advanced LIGO works, see:

• The LIGO Scientific Collaboration, Advanced LIGO, 17 November 2014.


To see where the pictures are from, click on them. For more, try this:

• Ravi Mandalia, Black hole binary entangled by gravity progressing towards deadly merge.

The picture of Andromeda in the nighttime sky 3.75 billion years from now was made by NASA. You can see a whole series of these pictures here:

• NASA, NASA’s Hubble shows Milky Way is destined for head-on collision, 31 March 2012.

Let’s get ready! For starters, let’s deal with global warming.

Fires in Indonesia

2 November, 2015

I lived in Singapore for two years, and I go back to work there every summer. I love Southeast Asia, its beautiful landscapes, its friendly people, and its huge biological and cultural diversity. It’s a magical place.

But in 2013 there was a horrible haze from fires in nearby Sumatra. And this year it’s even worse. It makes me want to cry, thinking about how millions of people all over this region are being choked as the rain forest burns.

This part of the world has a dry season from May to October and then a wet season. In the dry season, Indonesian farmers slash down jungle growth, burn it, and plant crops. That is nothing new.

But now, palm oil plantations run by big companies do this on a massive scale. Jungles are disappearing at an astonishing rate. Some of this is illegal, but corrupt government officials are paid to look the other way. Whenever we buy palm oil—in soap, cookies, bread, margarine, detergents, and many other products—we become part of the problem.

This year the fires are worse. One reason is that we’re having an El Niño. That typically means more rain in California—which we desperately need. But it means less rain in Southeast Asia.

This summer it was very dry in Singapore. Then, in September, the haze started. We got used to rarely seeing the sun—only yellow-brown light filtering through the smoke. When it stinks outside, you try to stay indoors.

When I left on September 19th, the PSI index of air pollution had risen above 200, which is ‘very unhealthy’. Singapore had offered troops to help fight the fires, but Indonesia turned down the offer, saying they could handle the situation themselves. That was completely false: thousands of fires were burning out of control in Sumatra, Borneo and other Indonesian islands.

I believe the Indonesian government just didn’t want foreign troops on their land. Satellites could detect the many hot spots where fires were burning. But outrageously, the government refused to say who owned those lands.

A few days after I left, the PSI index in Singapore had shot above 300, which is ‘hazardous’. But in parts of Borneo the PSI had reached 1,986. The only name for that is hell.

By now Indonesia has accepted help from Singapore. Thanks to changing winds, the PSI in Singapore has been slowly dropping throughout October. In the last few days the rainy season has begun. Each time the rain clears the air, Singaporeans can see something beautiful and almost forgotten: a blue sky.

Rain is also helping in Borneo. But the hellish fires continue. There have been over 100,000 individual fires—mostly in Sumatra, Borneo and Papua. In many places, peat in the ground has caught on fire! It’s very hard to put out a peat fire.

If you care about the Earth, this is very disheartening. These fires have been putting over 15 million tons of carbon dioxide into the air per day – more than the whole US economy! And so far this year they’ve put out 1.5 billion tons of CO2. That’s more than Germany’s carbon emissions for the whole year—in fact, even more than Japan’s. How can we make progress on reducing carbon emissions with this going on?

For you and me, the first thing is to stop buying products with palm oil. The problem is largely one of government corruption driven by money from palm oil plantations. But the real heart of the problem lies in Indonesia. Luckily Indonesia’s president, Joko Widodo, may be part of the solution. But the solution will be difficult.

Quoting National Public Radio:

Widodo is Indonesia’s first president with a track record of efficient local governance in running two large cities. Strong action on the haze issue could help fulfill the promise of reform that motivated Indonesian voters to put him in office in October 2014.

The president has deployed thousands of firefighters and accepted international assistance. He has ordered a moratorium on new licenses to use peat land and ordered law enforcers to prosecute people and companies who clear land by burning forests.

“It must be stopped, we mustn’t allow our tropical rainforests to disappear because of monoculture plantations like oil palms,” Widodo said early in his administration.

Land recently burned and planted with palm trees is now under police investigation in Kalimantan [the Indonesian part of Borneo].

The problem of Indonesia’s illegal forest fires is so complex that it’s very hard to say exactly who is responsible for causing it.

Indonesia’s government has blamed both big palm oil companies and small freeholders. Poynton [executive director of the Forest Trust] says the culprits are often mid-sized companies with strong ties to local politicians. He describes them as lawless middlemen who pay local farmers to burn forests and plant oil palms, often on other companies’ concessions.

“There are these sort of low-level, Mafioso-type guys that basically say, ‘You get in there and clear the land, and I’ll then finance you to establish a palm oil plantation,'” he says.

The problem is exacerbated by ingrained government corruption, in which politicians grant land use permits for forests and peat lands to agribusiness in exchange for financial and political support.

“The disaster is not in the fires,” says independent Jakarta-based commentator Wimar Witoelar. “It’s in the way that past Indonesian governments have colluded with big palm oil businesses to make the peat lands a recipe for disaster.”

The quote is from here:

• Anthony Kuhn, As Indonesia’s annual fires rage, plenty of blame but no responsibility.

For how to avoid using palm oil, see for example:

• Lael Goodman, How many products with palm oil do I use in a day?

First, avoid processed foods. That’s smart for other reasons too.

Second, avoid stuff that contains stearic acid, sodium palmitate, sodium laureth sulfate, cetyl alcohol, glyceryl stearate and related compounds—various forms of artificial grease that are often made from palm oil. It takes work to avoid all this stuff, but at least be aware of it. These chemicals are not made in laboratories from pure carbon, hydrogen, oxygen and nitrogen! The raw ingredients often come from palm plantations, huge monocultures that are replacing the wonderful diversity of rainforest life.

For more nuanced suggestions, see the comments below. Right now I’m just so disgusted that I want to avoid palm oil.

For data on the carbon emissions of this and other fires, see:

• Global fire emissions data.

1997 was the last really big El Niño.

This shows a man in Malaysia in September. Click on the pictures for more details. The picture at top shows a woman named Gaye Thavisin in Indonesia—perhaps in Kalimantan, the Indonesian half of Borneo, the third largest island in the world. Here is a bit of her story:

The Jungle River Cruise is run by Kalimantan Tour Destinations, a foreign-owned company set up by two women pioneering the introduction of ecotourism into a part of Central Kalimantan that to date has virtually no tourism.

Inspired by the untapped potential of Central Kalimantan’s mighty rivers, Gaye Thavisin and Lorna Dowson-Collins converted a traditional Kalimantan barge into a comfortable cruise boat with five double cabins, an inside sitting area and an upper viewing deck, bringing the first jungle cruises to the area.

Originally Lorna Dowson-Collins worked in Central Kalimantan with a local NGO on a sustainable livelihoods programme. The future livelihoods of the local people were under threat as logging left the land devastated with poor soils and no forest to fend from.

Kalimantan was teeming with the potential of her people and their fascinating culture, with beautiful forests of diverse flora and fauna, including the iconic orang-utan, and her mighty rivers providing access to these wonderful treasures.

An idea for a social enterprise emerged, which involved building a boat to journey guests to inaccessible places and provide comfortable accommodation.

Gaye Thavisin, an Australian expatriate, for 4 years operated an attractive, new hotel 36 km out of Palangkaraya in Kalimantan. Gaye was passionate about developing the tourism potential of Central Kalimantan and was also looking at the idea of boats. With her contract at the hotel coming to an end, the Jungle Cruise began to take shape!

Biology, Networks and Control Theory

13 September, 2015

The Institute for Mathematics and its Applications (or IMA, in Minneapolis, Minnesota) is teaming up with the Mathematical Biosciences Institute (or MBI, in Columbus, Ohio). They’re having a big program on control theory and networks:

IMA-MBI coordinated program on network dynamics and control: Fall 2015 and Spring 2016.

MBI emphasis semester on dynamics of biologically inspired networks: Spring 2016.

At the IMA

Here’s what’s happening at the Institute for Mathematics and its Applications:

Concepts and techniques from control theory are becoming increasingly interdisciplinary. At the same time, trends in modern control theory are influenced and inspired by other disciplines. As a result, the systems and control community is rapidly broadening its scope in a variety of directions. The IMA program is designed to encourage true interdisciplinary research and the cross fertilization of ideas. An important element for success is that ideas flow across disciplines in a timely manner and that the cross-fertilization takes place in unison.

Due to the usefulness of control, talent from control theory is drawn and often migrates to other important areas, such as biology, computer science, and biomedical research, to apply its mathematical tools and expertise. It is vital that while the links are strong, we bring together researchers who have successfully bridged into other disciplines to promote the role of control theory and to focus on the efforts of the controls community. An IMA investment in this area will be a catalyst for many advances and will provide the controls community with a cohesive research agenda.

In all topics of the program the need for research is pressing. For instance, viable implementations of control algorithms for smart grids are an urgent and clearly recognized need with considerable implications for the environment and quality of life. The mathematics of control will undoubtedly influence technology and vice-versa. The urgency for these new technologies suggests that the greatest impact of the program is to have it sooner rather than later.

First trimester (Fall 2015): Networks, whether social, biological, swarms of animals or vehicles, the Internet, etc., constitute an increasingly important subject in science and engineering. Their connectivity and feedback pathways affect robustness and functionality. Such concepts are at the core of a new and rapidly evolving frontier in the theory of dynamical systems and control. Embedded systems and networks are already pervasive in automotive, biological, aerospace, and telecommunications technologies and soon are expected to impact the power infrastructure (smart grids). In this new technological and scientific realm, the modeling and representation of systems, the role of feedback, and the value and cost of information need to be re-evaluated and understood. Traditional thinking that is relevant to a limited number of feedback loops with practically unlimited bandwidth is no longer applicable. Feedback control and stability of network dynamics is a relatively new endeavor. Analysis and control of network dynamics will occupy mostly the first trimester while applications to power networks will be a separate theme during the third trimester. The first trimester will be divided into three workshops on the topics of analysis of network dynamics and regulation, communication and cooperative control over networks, and a separate one on biological systems and networks.

The second trimester (Winter 2016) will have two workshops. The first will be on modeling and estimation (Workshop 4) and the second one on distributed parameter systems and partial differential equations (Workshop 5). The theme of Workshop 4 will be on structure and parsimony in models. The goal is to explore recent relevant theories and techniques that allow sparse representations, rank constrained optimization, and structural constraints in models and control designs. Our intent is to blend a group of researchers in the aforementioned topics with a select group of researchers with interests in a statistical viewpoint. Workshop 5 will focus on distributed systems and related computational issues. One of our aims is to bring control theorists with an interest in optimal control of distributed parameter systems together with mathematicians working on optimal transport theory (in essence an optimal control problem). The subject of optimal transport is rapidly developing with ramifications in probability and statistics (of essence in system modeling and hence of interest to participants in Workshop 4 as well) and in distributed control of PDE’s. Emphasis will also be placed on new tools and new mathematical developments (in PDE’s, computational methods, optimization). The workshops will be closely spaced to facilitate participation in more than one.

The third trimester (Spring 2016) will target applications where the mathematics of systems and control may soon prove to have a timely impact. From the invention of atomic force microscopy at the nanoscale to micro-mirror arrays for a next generation of telescopes, control has played a critical role in sensing and imaging of challenging new realms. At present, thanks to recent technological advances of AFM and optical tweezers, great strides are taking place making it possible to manipulate the biological transport of protein molecules as well as the control of individual atoms. Two intertwined subject areas, quantum and nano control and scientific instrumentation, are seen to blend together (Workshop 6) with partial focus on the role of feedback control and optimal filtering in achieving resolution and performance at such scales. A second theme (Workshop 7) will aim at control issues in distributed hybrid systems, at a macro scale, with a specific focus on the “smart grid” and energy applications.

For more information on individual workshops, go here:

• Workshop 1, Distributed Control and Decision Making Over Networks, 28 September – 2 October 2015.

• Workshop 2, Analysis and Control of Network Dynamics, 19-23 October 2015.

• Workshop 3, Biological Systems and Networks, 11-16 November 2015.

• Workshop 4, Optimization and Parsimonious Modeling, 25-29 January 2016.

• Workshop 5, Computational Methods for Control of Infinite-dimensional Systems, 14-18 March 2016.

• Workshop 6, Quantum and Nano Control, 11-15 April 2016.

• Workshop 7, Control at Large Scales: Energy Markets and Responsive Grids, 9-13 March 2016.

At the MBI

Here’s what’s going on at the Mathematical Biosciences Institute:

The MBI network program is part of a yearlong cooperative program with IMA.

Networks and deterministic and stochastic dynamical systems on networks are used as models in many areas of biology. This underscores the importance of developing tools to understand the interplay between network structures and dynamical processes, as well as how network dynamics can be controlled. The dynamics associated with such models are often different from what one might traditionally expect from a large system of equations, and these differences present the opportunity to develop exciting new theories and methods that should facilitate the analysis of specific models. Moreover, a nascent area of research is the dynamics of networks in which the networks themselves change in time, which occurs, for example, in plasticity in neuroscience and in up regulation and down regulation of enzymes in biochemical systems.

There are many areas in biology (including neuroscience, gene networks, and epidemiology) in which network analysis is now standard. Techniques from network science have yielded many biological insights in these fields and their study has yielded many theorems. Moreover, these areas continue to be exciting areas that contain both concrete and general mathematical problems. Workshop 1 explores the mathematics behind the applications in which restrictions on general coupled systems are important. Examples of such restrictions include symmetry, Boolean dynamics, and mass-action kinetics; and each of these special properties permits the proof of theorems about dynamics on these special networks.

Workshop 2 focuses on the interplay between stochastic and deterministic behavior in biological networks. An important related problem is to understand how stochasticity affects parameter estimation. Analyzing the relationship between stochastic changes, network structure, and network dynamics poses mathematical questions that are new, difficult, and fascinating.

In recent years, an increasing number of biological systems have been modeled using networks whose structure changes in time or which use multiple kinds of couplings between the same nodes or couplings that are not just pairwise. General theories such as groupoids and hypergraphs have been developed to handle the structure in some of these more general coupled systems, and specific application models have been studied by simulation. Workshop 3 will bring together theorists, modelers, and experimentalists to address the modeling of biological systems using new network structures and the analysis of such structures.

Biological systems use control to achieve desired dynamics and prevent undesirable behaviors. Consequently, the study of network control is important both to reveal naturally evolved control mechanisms that underlie the functioning of biological systems and to develop human-designed control interventions to recover lost function, mitigate failures, or repurpose biological networks. Workshop 4 will address the challenging subjects of control and observability of network dynamics.


Workshop 1: Dynamics in Networks with Special Properties, 25-29 January, 2016.

Workshop 2: The Interplay of Stochastic and Deterministic Dynamics in Networks, 22-26 February, 2016.

Workshop 3: Generalized Network Structures and Dynamics, 21-25 March, 2016.

Workshop 4: Control and Observability of Network Dynamics, 11-15 April, 2016.

You can get more schedule information on these posters:

A Compositional Framework for Markov Processes

4 September, 2015


This summer my students Brendan Fong and Blake Pollard visited me at the Centre for Quantum Technologies, and we figured out how to understand open continuous-time Markov chains! I think this is a nice step towards understanding the math of living systems.

Admittedly, it’s just a small first step. But I’m excited by this step, since Blake and I have been trying to get this stuff to work for a couple years, and it finally fell into place. And we think we know what to do next.

Here’s our paper:

• John C. Baez, Brendan Fong and Blake S. Pollard, A compositional framework for open Markov processes.

And here’s the basic idea…

Open detailed balanced Markov processes

A continuous-time Markov chain is a way to specify the dynamics of a population which is spread across some finite set of states. Population can flow between the states. The larger the population of a state, the more rapidly population flows out of the state. Because of this property, under certain conditions the populations of the states tend toward an equilibrium where at any state the inflow of population is balanced by its outflow.

In applications to statistical mechanics, we are often interested in equilibria such that for any two states connected by an edge, say i and j, the flow from i to j equals the flow from j to i. A continuous-time Markov chain with a chosen equilibrium having this property is called ‘detailed balanced’.

I’m getting tired of saying ‘continuous-time Markov chain’, so from now on I’ll just say ‘Markov process’, because it’s shorter. Okay? That will let me say the next sentence without running out of breath:

Our paper is about open detailed balanced Markov processes.

Here’s an example:

The detailed balanced Markov process itself consists of a finite set of states together with a finite set of edges between them, with each state i labelled by an equilibrium population q_i >0, and each edge e labelled by a rate constant r_e > 0.

These populations and rate constants are required to obey an equation called the ‘detailed balance condition’. This equation means that in equilibrium, the flow from i to j equals the flow from j to i. Do you see how it works in this example?

To get an ‘open’ detailed balanced Markov process, some states are designated as inputs or outputs. In general each state may be specified as both an input and an output, or as inputs and outputs multiple times. See how that’s happening in this example? It may seem weird, but it makes things work better.

People usually say Markov processes are all about how probabilities flow from one state to another. But we work with un-normalized probabilities, which we call ‘populations’, rather than probabilities that must sum to 1. The reason is that in an open Markov process, probability is not conserved: it can flow in or out at the inputs and outputs. We allow it to flow both in and out at both the input states and the output states.

Our most fundamental result is that there’s a category \mathrm{DetBalMark} where a morphism is an open detailed balanced Markov process. We think of it as a morphism from its inputs to its outputs.

We compose morphisms in \mathrm{DetBalMark} by identifying the output states of one open detailed balanced Markov process with the input states of another. The populations of identified states must match. For example, we may compose this morphism N:

with the previously shown morphism M to get this morphism M \circ N:

And here’s our second most fundamental result: the category \mathrm{DetBalMark} is actually a dagger compact category. This lets us do other stuff with open Markov processes. An important one is ‘tensoring’, which lets us take two open Markov processes like M and N above and set them side by side, giving M \otimes N:

The so-called compactness is also important. This means we can take some inputs of an open Markov process and turn them into outputs, or vice versa. For example, using the compactness of \mathrm{DetBalMark} we can get this open Markov process from M:

In fact all the categories in our paper are dagger compact categories, and all our functors preserve this structure. Dagger compact categories are a well-known framework for describing systems with inputs and outputs, so this is good.

The analogy to electrical circuits

In a detailed balanced Markov process, population can flow along edges. In the detailed balanced equilibrium, without any flow of population from outside, the flow from state i to state j will be matched by the flow back from j to i. The populations need to take specific values for this to occur.

In an electrical circuit made of linear resistors, charge can flow along wires. In equilibrium, without any driving voltage from outside, the current along each wire will be zero. The potentials will be equal at every node.

This sets up an analogy between detailed balanced continuous-time Markov chains and electrical circuits made of linear resistors! I love analogy charts, so this makes me very happy:

Circuits          Detailed balanced Markov processes
potential         population
current           flow
conductance       rate constant
power             dissipation

This analogy is already well known. Schnakenberg used it in his book Thermodynamic Network Analysis of Biological Systems. So, our main goal is to formalize and exploit it. This analogy extends from systems in equilibrium to the more interesting case of nonequilibrium steady states, which are the main topic of our paper.

Earlier, Brendan and I introduced a way to ‘black box’ a circuit and define the relation it determines between potential-current pairs at the input and output terminals. This relation describes the circuit’s external behavior as seen by an observer who can only perform measurements at the terminals.

An important fact is that black boxing is ‘compositional’: if one builds a circuit from smaller pieces, the external behavior of the whole circuit can be determined from the external behaviors of the pieces. For category theorists, this means that black boxing is a functor!

Our new paper with Blake develops a similar ‘black box functor’ for detailed balanced Markov processes, and relates it to the earlier one for circuits.

When you black box a detailed balanced Markov process, you get the relation between population–flow pairs at the terminals. (By the ‘flow at a terminal’, we more precisely mean the net population outflow.) This relation holds not only in equilibrium, but also in any nonequilibrium steady state. Thus, black boxing an open detailed balanced Markov process gives its steady state dynamics as seen by an observer who can only measure populations and flows at the terminals.
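To make ‘steady state behavior as seen from outside’ concrete, here is a small numerical sketch: just the underlying linear algebra, not the categorical machinery of the paper. We clamp the populations at the terminals of a made-up three-state open Markov process, solve for the internal populations so the master equation vanishes at the internal states, and read off the flows at the terminals. All the rates are invented for illustration.

```python
import numpy as np

# Toy open Markov process: states 0 and 2 are terminals, state 1 is internal.
# H[i, j] is the rate constant for the transition j -> i; the diagonal makes
# each column sum to zero, so the master equation is dp/dt = H p.
rates = {(0, 1): 2.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 3.0}   # (to, from): rate
n, terminals, internal = 3, [0, 2], [1]

H = np.zeros((n, n))
for (i, j), r in rates.items():
    H[i, j] += r
    H[j, j] -= r

def black_box(terminal_pops):
    """Hold the terminal populations fixed, find the steady state, and return
    the net rate at which population arrives at each terminal (which is what
    must flow out through the boundary to keep those populations fixed)."""
    p = np.zeros(n)
    p[terminals] = terminal_pops
    # Solve (H p)_i = 0 at the internal states i:  H_II p_I = -H_IT p_T.
    A = H[np.ix_(internal, internal)]
    b = -H[np.ix_(internal, terminals)] @ p[terminals]
    p[internal] = np.linalg.solve(A, b)
    return p, (H @ p)[terminals]

p, flows = black_box([1.0, 2.0])
print("steady state populations:", p)
print("flows at the terminals:  ", flows)   # they sum to zero: what comes in goes out
```

Varying the clamped populations traces out the kind of linear relation between terminal populations and flows that black boxing is about.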

The principle of minimum dissipation

At least since the work of Prigogine, it’s been widely accepted that a large class of systems minimize entropy production in a nonequilibrium steady state. But people still fight about the precise boundary of this class of systems, and even the meaning of this ‘principle of minimum entropy production’.

For detailed balanced open Markov processes, we show that a quantity we call the ‘dissipation’ is minimized in any steady state. This is a quadratic function of the populations and flows, analogous to the power dissipation of a circuit made of resistors. We make no claim that this quadratic function actually deserves to be called ‘entropy production’. Indeed, Schnakenberg has convincingly argued that they are only approximately equal.

But still, the ‘dissipation’ function is very natural and useful—and Prigogine’s so-called ‘entropy production’ is also a quadratic function.

Black boxing

I’ve already mentioned the category \mathrm{DetBalMark}, where a morphism is an open detailed balanced Markov process. But our paper needs two more categories to tell its story! There’s the category of circuits, and the category of linear relations.

A morphism in the category \mathrm{Circ} is an open electrical circuit made of resistors: that is, a graph with each edge labelled by a ‘conductance’ c_e > 0, together with specified input and output nodes:

A morphism in the category \mathrm{LinRel} is a linear relation L : U \leadsto V between finite-dimensional real vector spaces U and V. This is nothing but a linear subspace L \subseteq U \oplus V. Just as relations generalize functions, linear relations generalize linear functions!

In our previous paper, Brendan and I introduced these two categories and a functor between them, the ‘black box functor’:

\blacksquare : \mathrm{Circ} \to \mathrm{LinRel}

The idea is that any circuit determines a linear relation between the potentials and net current flows at the inputs and outputs. This relation describes the behavior of a circuit of resistors as seen from outside.

Our new paper introduces a black box functor for detailed balanced Markov processes:

\square : \mathrm{DetBalMark} \to \mathrm{LinRel}

We draw this functor as a white box merely to distinguish it from the other black box functor. The functor \square maps any detailed balanced Markov process to the linear relation obeyed by populations and flows at the inputs and outputs in a steady state. In short, it describes the steady state behavior of the Markov process ‘as seen from outside’.

How do we manage to black box detailed balanced Markov processes? We do it using the analogy with circuits!

The analogy becomes a functor

Every analogy wants to be a functor. So, we make the analogy between detailed balanced Markov processes and circuits precise by turning it into a functor:

K : \mathrm{DetBalMark} \to \mathrm{Circ}

This functor converts any open detailed balanced Markov process into an open electrical circuit made of resistors. This circuit is carefully chosen to reflect the steady-state behavior of the Markov process. Its underlying graph is the same as that of the Markov process. So, the ‘states’ of the Markov process are the same as the ‘nodes’ of the circuit.

Both the equilibrium populations at states of the Markov process and the rate constants labelling edges of the Markov process are used to compute the conductances of edges of this circuit. In the simple case where the Markov process has exactly one edge from any state i to any state j, the rule is this:

C_{i j} = H_{i j} q_j

where

• q_j is the equilibrium population of the jth state of the Markov process,

• H_{i j} is the rate constant for the edge from the jth state to the ith state of the Markov process, and

• C_{i j} is the conductance (that is, the reciprocal of the resistance) of the wire from the jth node to the ith node of the resulting circuit.

The detailed balance condition for Markov processes says precisely that the matrix C_{i j} is symmetric! This is just right for an electrical circuit made of resistors, since it means that the resistance of the wire from node i to node j equals the resistance of the same wire in the reverse direction, from node j to node i.
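Here’s a tiny numerical illustration of this rule (my own sketch, with made-up rates and populations, not an example from the paper):

```python
# Build the conductances C_ij = H_ij * q_j from a detailed balanced rate
# matrix H and equilibrium populations q, and check that C is symmetric.
import numpy as np

q = np.array([1.0, 2.0, 4.0])                # hypothetical equilibrium populations

# Off-diagonal rates chosen to obey detailed balance: H_ij q_j = H_ji q_i,
# where H[i, j] is the rate constant for the edge from state j to state i.
H = np.zeros((3, 3))
H[0, 1], H[1, 0] = 2.0, 2.0 * q[1] / q[0]
H[1, 2], H[2, 1] = 0.5, 0.5 * q[2] / q[1]
np.fill_diagonal(H, -H.sum(axis=0))          # columns sum to zero (master equation)

print(np.allclose(H @ q, 0.0))               # True: q really is an equilibrium

C = H * q[np.newaxis, :]                     # C_ij = H_ij * q_j
print(np.allclose(C, C.T))                   # True: detailed balance makes C symmetric
```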

A triangle of functors

If you’ve been paying careful attention, you’ll have noticed that I’ve described a triangle of functors:

And if you know anything about how category theorists think, you’ll be wondering if this diagram commutes.

In fact, this triangle of functors does not commute! However, a general lesson of category theory is that we should only expect diagrams of functors to commute up to natural isomorphism, and this is what happens here:

The natural transformation \alpha ‘corrects’ the black box functor for resistors to give the one for detailed balanced Markov processes.

The functors \square and \blacksquare \circ K are actually equal on objects. An object in \mathrm{DetBalMark} is a finite set X with each element i \in X labelled by a positive population q_i. Both functors map this object to the vector space \mathbb{R}^X \oplus \mathbb{R}^X. For the functor \square, we think of this as a space of population–flow pairs. For the functor \blacksquare \circ K, we think of it as a space of potential–current pairs. The natural transformation \alpha then gives a linear relation

\alpha_{X,q} : \mathbb{R}^X \oplus \mathbb{R}^X \leadsto \mathbb{R}^X \oplus \mathbb{R}^X

in fact an isomorphism of vector spaces, which converts potential-current pairs into population-flow pairs in a manner that depends on the q_i. I’ll skip the formula; it’s in the paper.

But here’s the key point. The naturality of \alpha actually allows us to reduce the problem of computing the functor \square to the problem of computing \blacksquare. Suppose

M: (X,q) \to (Y,r)

is any morphism in \mathrm{DetBalMark}. The object (X,q) is some finite set X labelled by populations q, and (Y,r) is some finite set Y labelled by populations r. Then the naturality of \alpha means that this square commutes:
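In equational form, the commuting square says that composing along either path gives the same linear relation:

\alpha_{Y,r} \circ \blacksquare K(M) = \square(M) \circ \alpha_{X,q}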

Since \alpha_{X,q} and \alpha_{Y,r} are isomorphisms, we can solve for the functor \square as follows:

\square(M) = \alpha_{Y,r} \circ \blacksquare K(M) \circ \alpha_{X,q}^{-1}

This equation has a clear intuitive meaning! It says that to compute the behavior of a detailed balanced Markov process, namely \square(M), we convert it into a circuit made of resistors and compute the behavior of that, namely \blacksquare K(M). This is not equal to the behavior of the Markov process, but we can compute that behavior by converting the input populations and flows into potentials and currents, feeding them into our circuit, and then converting the outputs back into populations and flows.

What we really do

So that’s a sketch of what we do, and I hope you ask questions if it’s not clear. But I also hope you read our paper! Here’s what we actually do in there. After an introduction and summary of results:

• Section 3 defines open Markov processes and the open master equation.

• Section 4 introduces detailed balance for open Markov processes.

• Section 5 recalls the principle of minimum power for open circuits made of linear resistors, and explains how to black box them.

• Section 6 introduces the principle of minimum dissipation for open detailed balanced Markov processes, and describes how to black box these.

• Section 7 states the analogy between circuits and detailed balanced Markov processes in a formal way.

• Section 8 describes how to compose open Markov processes, making them into the morphisms of a category.

• Section 9 does the same for detailed balanced Markov processes.

• Section 10 describes the ‘black box functor’ that sends any open detailed balanced Markov process to the linear relation describing its external behavior, and recalls the similar functor for circuits.

• Section 11 makes the analogy between open detailed balanced Markov processes and open circuits even more formal, by making it into a functor. We prove that together with the two black box functors, this forms a triangle that commutes up to natural isomorphism.

• Section 12 is about geometric aspects of this theory. We show that the linear relations in the image of these black box functors are Lagrangian relations between symplectic vector spaces. We also show that the master equation can be seen as a gradient flow equation.

• Section 13 is a summary of what we have learned.

Finally, Appendix A is a quick tutorial on decorated cospans. This is a key mathematical tool in our work, developed by Brendan in an earlier paper.

The Inverse Cube Force Law

30 August, 2015

Here you see three planets. The blue planet is orbiting the Sun in a realistic way: it’s going around an ellipse.

The other two are moving in and out just like the blue planet, so they all stay on the same circle. But they’re moving around this circle at different rates! The green planet is moving faster than the blue one: it completes 3 orbits each time the blue planet goes around once. The red planet isn’t going around at all: it only moves in and out.

What’s going on here?

In 1687, Isaac Newton published his Principia Mathematica. This book is famous, but in Propositions 43–45 of Book I he did something that people didn’t talk about much—until recently. He figured out what extra force, besides gravity, would make a planet move like one of these weird other planets. It turns out an extra force obeying an inverse cube law will do the job!

Let me make this more precise. We’re only interested in ‘central forces’ here. A central force is one that only pushes a particle towards or away from some chosen point, and only depends on the particle’s distance from that point. In Newton’s theory, gravity is a central force obeying an inverse square law:

F(r) = - \displaystyle{ \frac{a}{r^2} }

for some constant a. But he considered adding an extra central force obeying an inverse cube law:

F(r) = - \displaystyle{ \frac{a}{r^2} + \frac{b}{r^3} }

He showed that if you do this, for any motion of a particle in the force of gravity you can find a motion of a particle in gravity plus this extra force, where the distance r(t) is the same, but the angle \theta(t) is not.

In fact Newton did more. He showed that if we start with any central force, adding an inverse cube force has this effect.

There’s a very long page about this on Wikipedia:

Newton’s theorem of revolving orbits, Wikipedia.

I haven’t fully understood all of this, but it instantly makes me think of three other things I know about the inverse cube force law, which are probably related. So maybe you can help me figure out the relationship.

The first, and simplest, is this. Suppose we have a particle in a central force. It will move in a plane, so we can use polar coordinates r, \theta to describe its position. We can describe the force away from the origin as a function F(r). Then the radial part of the particle’s motion obeys this equation:

\displaystyle{ m \ddot r = F(r) + \frac{L^2}{mr^3} }

where L is the magnitude of the particle’s angular momentum.

So, angular momentum acts to provide a ‘fictitious force’ pushing the particle out, which one might call the centrifugal force. And this force obeys an inverse cube force law!

Furthermore, thanks to the formula above, it’s pretty obvious that if you change L but also add a precisely compensating inverse cube force, the value of \ddot r will be unchanged! So, we can set things up so that the particle’s radial motion will be unchanged. But its angular motion will be different, since it has a different angular momentum. This explains Newton’s observation.
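To see Newton’s trick in action numerically, here’s a quick sketch of my own (hypothetical constants, SciPy for the integration): two particles start with the same radial data but different angular momenta L_1 and L_2, and the second also feels the compensating inverse cube force with b = (L_1^2 - L_2^2)/m, so their radial motions agree.

```python
# Changing L while adding the compensating inverse cube force b/r^3,
# with b = (L1^2 - L2^2)/m, leaves the radial motion r(t) unchanged.
import numpy as np
from scipy.integrate import solve_ivp

m, a = 1.0, 1.0                      # hypothetical mass and gravitational constant
L1, L2 = 1.0, 3.0                    # two different angular momenta
b = (L1**2 - L2**2) / m              # compensating inverse cube coefficient

def radial(t, y, L, b_extra):
    r, rdot = y
    F = -a / r**2 + b_extra / r**3           # central force: gravity + extra term
    return [rdot, (F + L**2 / (m * r**3)) / m]

y0 = [1.0, 0.2]                      # same initial r and dr/dt for both particles
ts = np.linspace(0.0, 5.0, 200)
sol1 = solve_ivp(radial, (0, 5), y0, t_eval=ts, args=(L1, 0.0), rtol=1e-9)
sol2 = solve_ivp(radial, (0, 5), y0, t_eval=ts, args=(L2, b), rtol=1e-9)
print(np.max(np.abs(sol1.y[0] - sol2.y[0])))   # ~0: identical radial motion
```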

It’s often handy to write a central force in terms of a potential:

F(r) = -V'(r)

Then we can make up an extra potential responsible for the centrifugal force, and combine it with the actual potential V into a so-called effective potential:

\displaystyle{ U(r) = V(r) + \frac{L^2}{2mr^2} }

The particle’s radial motion then obeys a simple equation:

m\ddot{r} = - U'(r)

For a particle in gravity, where the force obeys an inverse square law and V is proportional to -1/r, the effective potential might look like this:

This is the graph of

\displaystyle{ U(r) = -\frac{4}{r} + \frac{1}{r^2} }

If you’re used to particles rolling around in potentials, you can easily see that a particle with not too much energy will move back and forth, never making it to r = 0 or r = \infty. This corresponds to an elliptical orbit. Give it more energy and the particle can escape to infinity, but it will never hit the origin. The repulsive ‘centrifugal force’ always overwhelms the attraction of gravity near the origin, at least if the angular momentum is nonzero.
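For the potential graphed above, you can find the turning points of a bound orbit by solving U(r) = E, which is just a quadratic equation in r. Here’s a two-line check of my own, with a made-up bound energy:

```python
# Turning points of U(r) = -4/r + 1/r^2 at energy E: solve E r^2 + 4 r - 1 = 0.
import numpy as np

E = -3.0                                  # a hypothetical bound-state energy (E < 0)
print(sorted(np.roots([E, 4.0, -1.0])))   # [0.333..., 1.0]: r oscillates between these
```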

On the other hand, suppose we have a particle moving in an attractive inverse cube force! Then the potential is proportional to 1/r^2, so the effective potential is

\displaystyle{ U(r) = \frac{c}{r^2} + \frac{L^2}{2mr^2} }

where c is negative for an attractive force. If this attractive force is big enough, namely

\displaystyle{ c < -\frac{L^2}{2m} }

then this force can exceed the centrifugal force, and the particle can fall in to r = 0.

If we keep track of the angular coordinate \theta, we can see what’s really going on. The particle is spiraling in to its doom, hitting the origin in a finite amount of time!
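Here’s a quick way to see that the collapse happens in finite time, in the simplest case of zero total energy (a back-of-the-envelope calculation of my own). Writing the effective potential as U(r) = k/r^2 with k = c + L^2/(2m) < 0, conservation of energy for an infalling particle gives

\displaystyle{ \frac{1}{2} m \dot{r}^2 + \frac{k}{r^2} = 0, \qquad \dot{r} = -\frac{\sqrt{-2k/m}}{r} }

so

\displaystyle{ \frac{d}{dt}\left(r^2\right) = 2 r \dot{r} = -2\sqrt{-2k/m} }

Thus r^2 decreases linearly and hits zero at the finite time t = r(0)^2 / (2\sqrt{-2k/m}).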

This should remind you of a black hole, and indeed something similar happens there, but even more drastic:

Schwarzschild geodesics: effective radial potential energy, Wikipedia.

For a nonrotating uncharged black hole, the effective potential has three terms. Like Newtonian gravity it has an attractive -1/r term and a repulsive 1/r^2 term. But it also has an attractive -1/r^3 term! In other words, it’s as if on top of Newtonian gravity, we had another attractive force obeying an inverse fourth power law! This overwhelms the others at short distances, so if you get too close to a black hole, you spiral in to your doom.

For example, a black hole can have an effective potential like this:

But back to inverse cube force laws! I know two more things about them. A while back I discussed how a particle in an inverse square force can be reinterpreted as a harmonic oscillator:

Planets in the fourth dimension, Azimuth.

There are many ways to think about this, and apparently the idea in some form goes all the way back to Newton! It involves a sneaky way to take a particle in a potential

\displaystyle{ V(r) \propto r^{-1} }

and think of it as moving around in the complex plane. Then if you square its position—thought of as a complex number—and cleverly reparametrize time, you get a particle moving in a potential

\displaystyle{ V(r) \propto r^2 }

This amazing trick can be generalized! A particle in a potential

\displaystyle{ V(r) \propto r^p }

can be transformed to a particle in a potential

\displaystyle{ V(r) \propto r^q }

where

(p+2)(q+2) = 4

A good description is here:

• Rachel W. Hall and Krešimir Josić, Planetary motion and the duality of force laws, SIAM Review 42 (2000), 115–124.

This trick transforms particles in r^p potentials with p ranging between -2 and +\infty to r^q potentials with q ranging between +\infty and -2. It’s like a see-saw: when p is small, q is big, and vice versa.

But you’ll notice this trick doesn’t actually work at p = -2, the case that corresponds to the inverse cube force law. The problem is that p + 2 = 0 in this case, so we can’t find q with (p+2)(q+2) = 4.
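Just to make the see-saw concrete, here’s a tiny helper of my own (not from the Hall–Josić paper) computing the dual exponent:

```python
# The dual exponent q determined by (p + 2)(q + 2) = 4.
def dual_exponent(p: float) -> float:
    return 4.0 / (p + 2.0) - 2.0

print(dual_exponent(-1.0))   # 2.0: the Kepler potential is dual to the harmonic oscillator
print(dual_exponent(2.0))    # -1.0: and vice versa
print(dual_exponent(4.0))    # -1.333...
# dual_exponent(-2.0) would divide by zero: the inverse cube case has no partner.
```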

So, the inverse cube force is special in three ways: it’s the one that you can add on to any force to get solutions with the same radial motion but different angular motion, it’s the one that naturally describes the ‘centrifugal force’, and it’s the one that doesn’t have a partner! We’ve seen how the first two ways are secretly the same. I don’t know about the third, but I’m hopeful.

Quantum aspects

Finally, here’s a fourth way in which the inverse cube law is special. This shows up most visibly in quantum mechanics… and this is what got me interested in this business in the first place.

You see, I’m writing a paper called ‘Struggles with the continuum’, which discusses problems in analysis that arise when you try to make some of our favorite theories of physics make sense. The inverse square force law poses interesting problems of this sort, which I plan to discuss. But I started wanting to compare the inverse cube force law, just so people can see things that go wrong in this case, and not take our successes with the inverse square law for granted.

Unfortunately a huge digression on the inverse cube force law would be out of place in that paper. So, I’m offloading some of that material to here.

In quantum mechanics, a particle moving in an inverse cube force law has a Hamiltonian like this:

H = -\nabla^2 + c r^{-2}

The first term describes the kinetic energy, while the second describes the potential energy. I’m setting \hbar = 1 and 2m = 1 to remove some clutter that doesn’t really affect the key issues.

To see how strange this Hamiltonian is, let me compare it with an easier case. If p < 2, the Hamiltonian

H = -\nabla^2 + c r^{-p}

is essentially self-adjoint on C_0^\infty(\mathbb{R}^3 - \{0\}), which is the space of compactly supported smooth functions on 3d Euclidean space minus the origin. What this means is that first of all, H is defined on this domain: it maps functions in this domain to functions in L^2(\mathbb{R}^3). But more importantly, it means we can uniquely extend H from this domain to a self-adjoint operator on some larger domain. In quantum physics, we want our Hamiltonians to be self-adjoint. So, this fact is good.

Proving this fact is fairly hard! It uses something called the Kato–Lax–Milgram–Nelson theorem together with this beautiful inequality:

\displaystyle{ \int_{\mathbb{R}^3} \frac{1}{4r^2} |\psi(x)|^2 \,d^3 x \le \int_{\mathbb{R}^3} |\nabla \psi(x)|^2 \,d^3 x }

for any \psi\in C_0^\infty(\mathbb{R}^3).

If you think hard, you can see this inequality is actually a fact about the quantum mechanics of the inverse cube law! It says that if c \ge -1/4, the energy of a quantum particle in the potential c r^{-2} is bounded below. And in a sense, this inequality is optimal: if c < -1/4, the energy is not bounded below. This is the quantum version of how a classical particle can spiral in to its doom in an attractive inverse cube law, if it doesn’t have enough angular momentum. But it’s subtly and mysteriously different.

You may wonder how this inequality is used to prove good things about potentials that are ‘less singular’ than the c r^{-2} potential: that is, potentials c r^{-p} with p < 2. For that, you have to use some tricks that I don’t want to explain here. I also don’t want to prove this inequality, or explain why it’s optimal! You can find most of this in some old course notes of mine:

• John Baez, Quantum Theory and Analysis, 1989.

See especially section 15.

But it’s pretty easy to see how this inequality implies things about the expected energy of a quantum particle in the potential c r^{-2}. So let’s do that.

In this potential, the expected energy of a state \psi is:

\displaystyle{  \langle \psi, H \psi \rangle =   \int_{\mathbb{R}^3} \overline\psi(x)\, (-\nabla^2 + c r^{-2})\psi(x) \, d^3 x }

Doing an integration by parts, this gives:

\displaystyle{  \langle \psi, H \psi \rangle = \int_{\mathbb{R}^3} |\nabla \psi(x)|^2 + cr^{-2} |\psi(x)|^2 \,d^3 x }

The inequality I showed you says precisely that when c = -1/4, this is greater than or equal to zero. So, the expected energy is actually nonnegative in this case! And making c greater than -1/4 only makes the expected energy bigger.

Note that in classical mechanics, the energy of a particle in this potential ceases to be bounded below as soon as c < 0. Quantum mechanics is different because of the uncertainty principle! To get a lot of negative potential energy, the particle’s wavefunction must be squished near the origin, but that gives it kinetic energy.
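Just as a sanity check, here’s a quick numerical verification of the inequality for one trial function, \psi(r) = e^{-r} (my own sketch; this \psi isn’t compactly supported, but it decays fast enough for both integrals to converge). For a radial function, d^3 x = 4\pi r^2 \, dr and |\nabla \psi|^2 = \psi'(r)^2:

```python
# Check that  ∫ |ψ|²/(4r²) d³x  ≤  ∫ |∇ψ|² d³x  for ψ(r) = exp(-r).
import numpy as np
from scipy.integrate import quad

psi      = lambda r: np.exp(-r)
psiprime = lambda r: -np.exp(-r)

lhs = 4 * np.pi * quad(lambda r: psi(r)**2 / (4 * r**2) * r**2, 0, np.inf)[0]
rhs = 4 * np.pi * quad(lambda r: psiprime(r)**2 * r**2, 0, np.inf)[0]
print(lhs, rhs, lhs <= rhs)   # ≈ 1.571, 3.142, True
```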

It turns out that the Hamiltonian for a quantum particle in an inverse cube force law has exquisitely subtle and tricky behavior. Many people have written about it, running into ‘paradoxes’ when they weren’t careful enough. Only rather recently have things been straightened out.

For starters, the Hamiltonian for this kind of particle

H = -\nabla^2 + c r^{-2}

has different behaviors depending on c. Obviously the force is repulsive when c > 0 and attractive when c < 0, but that’s not the only thing that matters! Here’s a summary:

c \ge 3/4. In this case H is essentially self-adjoint on C_0^\infty(\mathbb{R}^3 - \{0\}). So, it admits a unique self-adjoint extension and there’s no ambiguity about this case.

c < 3/4. In this case H is not essentially self-adjoint on C_0^\infty(\mathbb{R}^3 - \{0\}). In fact, it admits more than one self-adjoint extension! This means that we need extra input from physics to choose the Hamiltonian in this case. It turns out that we need to say what happens when the particle hits the singularity at r = 0. This is a long and fascinating story that I just learned yesterday.

c \ge -1/4. In this case the expected energy \langle \psi, H \psi \rangle is bounded below for \psi \in C_0^\infty(\mathbb{R}^3 - \{0\}). It turns out that whenever we have a Hamiltonian that is bounded below, even if there is not a unique self-adjoint extension, there exists a canonical ‘best choice’ of self-adjoint extension, called the Friedrichs extension. I explain this in my course notes.

c < -1/4. In this case the expected energy is not bounded below, so we don’t have the Friedrichs extension to help us choose which self-adjoint extension is ‘best’.

To go all the way down this rabbit hole, I recommend these two papers:

• Sarang Gopalakrishnan, Self-Adjointness and the Renormalization of Singular Potentials, B.A. Thesis, Amherst College.

• D. M. Gitman, I. V. Tyutin and B. L. Voronov, Self-adjoint extensions and spectral analysis in the Calogero problem, Jour. Phys. A 43 (2010), 145205.

The first is good for a broad overview of problems associated to singular potentials such as the inverse cube force law; there is attention to mathematical rigor, but the focus is on physical insight. The second is good if you want—as I wanted—to really get to the bottom of the inverse cube force law in quantum mechanics. Both have lots of references.

Also, both point out a crucial fact I haven’t mentioned yet: in quantum mechanics the inverse cube force law is special because, naively at least, it has a kind of symmetry under rescaling! You can see this from the formula

H = -\nabla^2 + cr^{-2}

by noting that both the Laplacian and r^{-2} have units of length^{-2}. So, they both transform in the same way under rescaling: if you take any smooth function \psi, apply H and then expand the result by a factor of k, you get k^2 times what you get if you do those operations in the other order.

In particular, this means that if you have a smooth eigenfunction of H with eigenvalue \lambda, you will also have one with eigenvalue k^2 \lambda for any k > 0. And if your original eigenfunction was normalizable, so will be the new one!
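Here’s that scaling calculation spelled out, in case it helps. Setting \psi_k(x) = \psi(k x), the chain rule gives

\displaystyle{ -\nabla^2 \psi_k(x) = -k^2 (\nabla^2 \psi)(kx), \qquad \frac{c}{|x|^2}\, \psi_k(x) = k^2\, \frac{c}{|kx|^2}\, \psi(kx) }

so H\psi_k(x) = k^2 (H\psi)(kx). Thus if H\psi = \lambda \psi, then H\psi_k = k^2 \lambda \psi_k, and \psi_k is square-integrable whenever \psi is.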

With some calculation you can show that when c \le -1/4, the Hamiltonian H has a smooth normalizable eigenfunction with a negative eigenvalue. In fact it’s spherically symmetric, so finding it is not so terribly hard. But this instantly implies that H has smooth normalizable eigenfunctions with any negative eigenvalue.

This implies various things, some terrifying. First of all, it means that H is not bounded below, at least not on the space of smooth normalizable functions. A similar but more delicate scaling argument shows that it’s also not bounded below on C_0^\infty(\mathbb{R}^3 - \{0\}), as I claimed earlier.

This is scary but not terrifying: it simply means that when c \le -1/4, the potential is too strongly negative for the Hamiltonian to be bounded below.

The terrifying part is this: we’re getting uncountably many normalizable eigenfunctions, all with different eigenvalues, one for each choice of k. A self-adjoint operator on a countable-dimensional Hilbert space like L^2(\mathbb{R}^3) can’t have uncountably many normalizable eigenvectors with different eigenvalues, since then they’d all be orthogonal to each other, and that’s too many orthogonal vectors to fit in a Hilbert space of countable dimension!

This sounds like a paradox, but it’s not. These functions are not all orthogonal, and they’re not all eigenfunctions of a self-adjoint operator. You see, the operator H is not self-adjoint on the domain we’ve chosen, the space of all smooth functions in L^2(\mathbb{R}^3). We can carefully choose a domain to get a self-adjoint operator… but it turns out there are many ways to do it.

Intriguingly, in most cases this choice breaks the naive dilation symmetry. So, we’re getting what physicists call an ‘anomaly’: a symmetry of a classical system that fails to give a symmetry of the corresponding quantum system.

Of course, if you’ve made it this far, you probably want to understand what the different choices of Hamiltonian for a particle in an inverse cube force law actually mean, physically. The idea seems to be that they say how the particle changes phase when it hits the singularity at r = 0 and bounces back out.

(Why does it bounce back out? Well, if it didn’t, time evolution would not be unitary, so it would not be described by a self-adjoint Hamiltonian! We could try to describe the physics of a quantum particle that does not come back out when it hits the singularity, and I believe people have tried, but this requires a different set of mathematical tools.)

For a detailed analysis of this, it seems one should take Schrödinger’s equation and do a separation of variables into the angular part and the radial part:

\psi(r,\theta,\phi) = \Psi(r) \Phi(\theta,\phi)

For each choice of \ell = 0,1,2,\dots one gets a space of spherical harmonics that one can use for the angular part \Phi. The interesting part is the radial part, \Psi. Here it is helpful to make a change of variables

u(r) = r\Psi(r)

At least naively, Schrödinger’s equation for the particle in the cr^{-2} potential then becomes

\displaystyle{ \frac{d}{dt} u = -iH u }

where

\displaystyle{ H = -\frac{d^2}{dr^2} + \frac{c + \ell(\ell+1)}{r^2} }
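Here’s the standard calculation behind this, in case you want it. For \psi = \Psi(r) \Phi(\theta,\phi) with \Phi a spherical harmonic of degree \ell, the Laplacian gives

\displaystyle{ -\nabla^2 \psi = \left( -\frac{1}{r^2}\frac{d}{dr}\left(r^2 \frac{d\Psi}{dr}\right) + \frac{\ell(\ell+1)}{r^2}\,\Psi \right) \Phi }

and with u = r\Psi one checks that

\displaystyle{ \frac{1}{r^2}\frac{d}{dr}\left(r^2 \frac{d\Psi}{dr}\right) = \frac{u''}{r} }

Moreover \int_0^\infty |\Psi(r)|^2\, r^2\, dr = \int_0^\infty |u(r)|^2\, dr, so u is square-integrable on the half-line exactly when the radial part of \psi is normalizable. That’s how the c r^{-2} potential and the centrifugal term get combined in the 1-dimensional Hamiltonian above.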

Beware: I keep calling all sorts of different but related Hamiltonians H, and this one is for the radial part of the dynamics of a quantum particle in an inverse cube force. As we’ve seen before in the classical case, the centrifugal force and the inverse cube force join forces in an ‘effective potential’

\displaystyle{ U(r) = kr^{-2} }

where

k = c + \ell(\ell+1)

So, we have reduced the problem to that of a particle on the open half-line (0,\infty) moving in the potential kr^{-2}. The Hamiltonian for this problem:

\displaystyle{ H = -\frac{d^2}{dr^2} + \frac{k}{r^2} }

is called the Calogero Hamiltonian. Needless to say, it has fascinating and somewhat scary properties, since to make it into a bona fide self-adjoint operator, we must make some choice about what happens when the particle hits r = 0. The formula above does not really specify the Hamiltonian.

This is more or less where Gitman, Tyutin and Voronov begin their analysis, after a long and pleasant review of the problem. They describe all the possible choices of self-adjoint operator that are allowed. The answer depends on the values of k, but very crudely, the choice says something like how the phase of your particle changes when it bounces off the singularity. Most choices break the dilation invariance of the problem. But intriguingly, some choices retain invariance under a discrete subgroup of dilations!

So, the rabbit hole of the inverse cube force law goes quite deep, and I expect I haven’t quite gotten to the bottom yet. The problem may seem pathological, verging on pointless. But the math is fascinating, and it’s a great testing-ground for ideas in quantum mechanics—very manageable compared to deeper subjects like quantum field theory, which are riddled with their own pathologies. Finally, the connection between the inverse cube force law and centrifugal force makes me think it’s not a mere curiosity.

In four dimensions

It’s a bit odd to study the inverse cube force law in 3-dimensional space, since Newtonian gravity and the electrostatic force would actually obey an inverse cube law in 4-dimensional space. For the classical 2-body problem it doesn’t matter much whether you’re in 3d or 4d space, since the motion stays in a plane. But for the quantum 2-body problem it makes more of a difference!

Just for the record, let me say how the quantum 2-body problem works in 4 dimensions. As before, we can work in the center of mass frame and consider this Hamiltonian:

H = -\nabla^2 + c r^{-2}

And as before, the behavior of this Hamiltonian depends on c. Here’s the story this time:

c \ge 0. In this case H is essentially self-adjoint on C_0^\infty(\mathbb{R}^4 - \{0\}). So, it admits a unique self-adjoint extension and there’s no ambiguity about this case.

c < 0. In this case H is not essentially self-adjoint on C_0^\infty(\mathbb{R}^4 - \{0\}).

c \ge -1. In this case the expected energy \langle \psi, H \psi \rangle is bounded below for \psi \in C_0^\infty(\mathbb{R}^4 - \{0\}). So, there exists a canonical ‘best choice’ of self-adjoint extension, called the Friedrichs extension.

c < -1. In this case the expected energy is not bounded below, so we don’t have the Friedrichs extension to help us choose which self-adjoint extension is ‘best’.

I’ve been assured these are correct by Barry Simon, and a lot of this material will appear in Section 7.4 of his book:

• Barry Simon, A Comprehensive Course in Analysis, Part 4: Operator Theory, American Mathematical Society, Providence, RI, 2015.

See also:

• Barry Simon, Essential self-adjointness of Schrödinger operators with singular potentials, Arch. Rational Mech. Analysis 52 (1973), 44–48.


The animation was made by ‘WillowW’ and placed on Wikicommons. It’s one of a number that appears in this Wikipedia article:

Newton’s theorem of revolving orbits, Wikipedia.

I made the graphs using the free online Desmos graphing calculator.

The picture of a spiral was made by ‘Anarkman’ and ‘Pbroks13’ and placed on Wikicommons; it appears in

Hyperbolic spiral, Wikipedia.

The hyperbolic spiral is one of three kinds of orbits that are possible in an inverse cube force law. They are vaguely analogous to ellipses, hyperbolas and parabolas, but there are actually no bound orbits except perfect circles. The three kinds are called Cotes’s spirals. In polar coordinates, they are:

• the epispiral:

\displaystyle{ \frac{1}{r} = A \cos\left( k\theta + \varepsilon \right) }

• the Poinsot spiral:

\displaystyle{ \frac{1}{r} = A \cosh\left( k\theta + \varepsilon \right) }

• the hyperbolic spiral:

\displaystyle{ \frac{1}{r} = A \theta + \varepsilon }

