Biology, Networks and Control Theory

13 September, 2015

The Institute for Mathematics and its Applications (or IMA, in Minneapolis, Minnesota) is teaming up with the Mathematical Biosciences Institute (or MBI, in Columbus, Ohio). They’re having a big program on control theory and networks:

IMA-MBI coordinated program on network dynamics and control: Fall 2015 and Spring 2016.

MBI emphasis semester on dynamics of biologically inspired networks: Spring 2016.

At the IMA

Here’s what’s happening at the Institute for Mathematics and its Applications:

Concepts and techniques from control theory are becoming increasingly interdisciplinary. At the same time, trends in modern control theory are influenced and inspired by other disciplines. As a result, the systems and control community is rapidly broadening its scope in a variety of directions. The IMA program is designed to encourage true interdisciplinary research and the cross-fertilization of ideas. An important element for success is that ideas flow across disciplines in a timely manner and that the cross-fertilization takes place in unison.

Because control theory is so useful, talent from the field is often drawn to other important areas, such as biology, computer science, and biomedical research, to apply its mathematical tools and expertise. While these links are strong, it is vital that we bring together researchers who have successfully bridged into other disciplines, to promote the role of control theory and to focus the efforts of the controls community. An IMA investment in this area will be a catalyst for many advances and will provide the controls community with a cohesive research agenda.

In all topics of the program the need for research is pressing. For instance, viable implementations of control algorithms for smart grids are an urgent and clearly recognized need with considerable implications for the environment and quality of life. The mathematics of control will undoubtedly influence technology and vice-versa. The urgency for these new technologies suggests that the greatest impact of the program is to have it sooner rather than later.

First trimester (Fall 2015): Networks, whether social, biological, swarms of animals or vehicles, the Internet, etc., constitute an increasingly important subject in science and engineering. Their connectivity and feedback pathways affect robustness and functionality. Such concepts are at the core of a new and rapidly evolving frontier in the theory of dynamical systems and control. Embedded systems and networks are already pervasive in automotive, biological, aerospace, and telecommunications technologies and soon are expected to impact the power infrastructure (smart grids). In this new technological and scientific realm, the modeling and representation of systems, the role of feedback, and the value and cost of information need to be re-evaluated and understood. Traditional thinking that is relevant to a limited number of feedback loops with practically unlimited bandwidth is no longer applicable. Feedback control and stability of network dynamics is a relatively new endeavor. Analysis and control of network dynamics will occupy most of the first trimester, while applications to power networks will be a separate theme during the third trimester. The first trimester will be divided into three workshops: one on analysis of network dynamics and regulation, one on communication and cooperative control over networks, and one on biological systems and networks.

The second trimester (Winter 2016) will have two workshops. The first will be on modeling and estimation (Workshop 4) and the second one on distributed parameter systems and partial differential equations (Workshop 5). The theme of Workshop 4 will be on structure and parsimony in models. The goal is to explore recent relevant theories and techniques that allow sparse representations, rank constrained optimization, and structural constraints in models and control designs. Our intent is to blend a group of researchers in the aforementioned topics with a select group of researchers with interests in a statistical viewpoint. Workshop 5 will focus on distributed systems and related computational issues. One of our aims is to bring control theorists with an interest in optimal control of distributed parameter systems together with mathematicians working on optimal transport theory (in essence an optimal control problem). The subject of optimal transport is rapidly developing with ramifications in probability and statistics (of essence in system modeling and hence of interest to participants in Workshop 4 as well) and in distributed control of PDE’s. Emphasis will also be placed on new tools and new mathematical developments (in PDE’s, computational methods, optimization). The workshops will be closely spaced to facilitate participation in more than one.

The third trimester (Spring 2016) will target applications where the mathematics of systems and control may soon prove to have a timely impact. From the invention of atomic force microscopy at the nanoscale to micro-mirror arrays for a next generation of telescopes, control has played a critical role in sensing and imaging of challenging new realms. At present, thanks to recent technological advances in AFM and optical tweezers, great strides are being made toward manipulating the biological transport of protein molecules and controlling individual atoms. Two intertwined subject areas, quantum and nano control and scientific instrumentation, blend together in Workshop 6, with partial focus on the role of feedback control and optimal filtering in achieving resolution and performance at such scales. A second theme (Workshop 7) will aim at control issues in distributed hybrid systems, at a macro scale, with a specific focus on the “smart grid” and energy applications.

For more information on individual workshops, go here:

• Workshop 1, Distributed Control and Decision Making Over Networks, 28 September – 2 October 2015.

• Workshop 2, Analysis and Control of Network Dynamics, 19-23 October 2015.

• Workshop 3, Biological Systems and Networks, 11-16 November 2015.

• Workshop 4, Optimization and Parsimonious Modeling, 25-29 January 2016.

• Workshop 5, Computational Methods for Control of Infinite-dimensional Systems, 14-18 March 2016.

• Workshop 6, Quantum and Nano Control, 11-15 April 2016.

• Workshop 7, Control at Large Scales: Energy Markets and Responsive Grids, 9-13 May 2016.

At the MBI

Here’s what’s going on at the Mathematical Biosciences Institute:

The MBI network program is part of a yearlong cooperative program with IMA.

Networks and deterministic and stochastic dynamical systems on networks are used as models in many areas of biology. This underscores the importance of developing tools to understand the interplay between network structures and dynamical processes, as well as how network dynamics can be controlled. The dynamics associated with such models are often different from what one might traditionally expect from a large system of equations, and these differences present the opportunity to develop exciting new theories and methods that should facilitate the analysis of specific models. Moreover, a nascent area of research is the dynamics of networks in which the networks themselves change in time, which occurs, for example, in plasticity in neuroscience and in the upregulation and downregulation of enzymes in biochemical systems.

There are many areas in biology (including neuroscience, gene networks, and epidemiology) in which network analysis is now standard. Techniques from network science have yielded many biological insights in these fields, and the study of such networks has produced many theorems. Moreover, they continue to be exciting areas that contain both concrete and general mathematical problems. Workshop 1 explores the mathematics behind the applications in which restrictions on general coupled systems are important. Examples of such restrictions include symmetry, Boolean dynamics, and mass-action kinetics; and each of these special properties permits the proof of theorems about dynamics on these special networks.

Workshop 2 focuses on the interplay between stochastic and deterministic behavior in biological networks. An important related problem is to understand how stochasticity affects parameter estimation. Analyzing the relationship between stochastic changes, network structure, and network dynamics poses mathematical questions that are new, difficult, and fascinating.

In recent years, an increasing number of biological systems have been modeled using networks whose structure changes in time or which use multiple kinds of couplings between the same nodes or couplings that are not just pairwise. General theories such as groupoids and hypergraphs have been developed to handle the structure in some of these more general coupled systems, and specific application models have been studied by simulation. Workshop 3 will bring together theorists, modelers, and experimentalists to address the modeling of biological systems using new network structures and the analysis of such structures.

Biological systems use control to achieve desired dynamics and prevent undesirable behaviors. Consequently, the study of network control is important both to reveal naturally evolved control mechanisms that underlie the functioning of biological systems and to develop human-designed control interventions to recover lost function, mitigate failures, or repurpose biological networks. Workshop 4 will address the challenging subjects of control and observability of network dynamics.


Workshop 1: Dynamics in Networks with Special Properties, 25-29 January, 2016.

Workshop 2: The Interplay of Stochastic and Deterministic Dynamics in Networks, 22-26 February, 2016.

Workshop 3: Generalized Network Structures and Dynamics, 21-25 March, 2016.

Workshop 4: Control and Observability of Network Dynamics, 11-15 April, 2016.

You can get more schedule information on these posters:

A Compositional Framework for Markov Processes

4 September, 2015


This summer my students Brendan Fong and Blake Pollard visited me at the Centre for Quantum Technologies, and we figured out how to understand open continuous-time Markov chains! I think this is a nice step towards understanding the math of living systems.

Admittedly, it’s just a small first step. But I’m excited by this step, since Blake and I have been trying to get this stuff to work for a couple years, and it finally fell into place. And we think we know what to do next.

Here’s our paper:

• John C. Baez, Brendan Fong and Blake S. Pollard, A compositional framework for open Markov processes.

And here’s the basic idea…

Open detailed balanced Markov processes

A continuous-time Markov chain is a way to specify the dynamics of a population which is spread across some finite set of states. Population can flow between the states. The larger the population of a state, the more rapidly population flows out of the state. Because of this property, under certain conditions the populations of the states tend toward an equilibrium where at any state the inflow of population is balanced by its outflow.

In applications to statistical mechanics, we are often interested in equilibria such that for any two states connected by an edge, say i and j, the flow from i to j equals the flow from j to i. A continuous-time Markov chain with a chosen equilibrium having this property is called ‘detailed balanced’.

I’m getting tired of saying ‘continuous-time Markov chain’, so from now on I’ll just say ‘Markov process’, just because it’s shorter. Okay? That will let me say the next sentence without running out of breath:

Our paper is about open detailed balanced Markov processes.

Here’s an example:

The detailed balanced Markov process itself consists of a finite set of states together with a finite set of edges between them, with each state i labelled by an equilibrium population q_i >0, and each edge e labelled by a rate constant r_e > 0.

These populations and rate constants are required to obey an equation called the ‘detailed balance condition’. This equation means that in equilibrium, the flow from i to j equals the flow from j to i. Do you see how it works in this example?
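If you like seeing definitions as code, here is a minimal numerical sketch in Python. The rate constants and populations are toy numbers of my own, not from the paper; H[i, j] is the rate constant for the edge from state j to state i, matching the notation used later in this post:

```python
import numpy as np

# A toy detailed balanced Markov process with 3 states.
# H[i, j] is the rate constant on the edge from state j to state i.
H = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [0.5, 3.0, 0.0]])
q = np.array([2.0, 1.0, 1.0])   # equilibrium populations, q_i > 0

# Detailed balance: the flow H[i, j] * q[j] from j to i must equal
# the flow H[j, i] * q[i] from i to j, for every pair of states.
assert np.allclose(H * q, (H * q).T)

# Master equation dp/dt = G p, where G is H with the total outflow
# rates subtracted on the diagonal, so columns sum to zero.
G = H - np.diag(H.sum(axis=0))

p = np.array([4.0, 0.0, 0.0])   # an arbitrary initial population
for _ in range(20000):          # crude Euler integration
    p = p + 0.001 * (G @ p)
print(p)                        # approaches q = [2, 1, 1]
```

Total population is conserved here because no state is open to the outside; opening up inputs and outputs is exactly what breaks this conservation.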

To get an ‘open’ detailed balanced Markov process, some states are designated as inputs or outputs. In general each state may be specified as both an input and an output, or as inputs and outputs multiple times. See how that’s happening in this example? It may seem weird, but it makes things work better.

People usually say Markov processes are all about how probabilities flow from one state to another. But we work with un-normalized probabilities, which we call ‘populations’, rather than probabilities that must sum to 1. The reason is that in an open Markov process, probability is not conserved: it can flow in or out at the inputs and outputs. We allow it to flow both in and out at both the input states and the output states.

Our most fundamental result is that there’s a category \mathrm{DetBalMark} where a morphism is an open detailed balanced Markov process. We think of it as a morphism from its inputs to its outputs.

We compose morphisms in \mathrm{DetBalMark} by identifying the output states of one open detailed balanced Markov process with the input states of another. The populations of identified states must match. For example, we may compose this morphism N:

with the previously shown morphism M to get this morphism M \circ N:

And here’s our second most fundamental result: the category \mathrm{DetBalMark} is actually a dagger compact category. This lets us do other stuff with open Markov processes. An important one is ‘tensoring’, which lets us take two open Markov processes like M and N above and set them side by side, giving M \otimes N:

The so-called compactness is also important. This means we can take some inputs of an open Markov process and turn them into outputs, or vice versa. For example, using the compactness of \mathrm{DetBalMark} we can get this open Markov process from M:

In fact all the categories in our paper are dagger compact categories, and all our functors preserve this structure. Dagger compact categories are a well-known framework for describing systems with inputs and outputs, so this is good.

The analogy to electrical circuits

In a detailed balanced Markov process, population can flow along edges. In the detailed balanced equilibrium, without any flow of population from outside, the flow along each edge from state i to state j will be matched by the flow back from j to i. The populations need to take specific values for this to occur.

In an electrical circuit made of linear resistors, charge can flow along wires. In equilibrium, without any driving voltage from outside, the current along each wire will be zero. The potentials will be equal at every node.

This sets up an analogy between detailed balanced continuous-time Markov chains and electrical circuits made of linear resistors! I love analogy charts, so this makes me very happy:

Circuits                Detailed balanced Markov processes
potential               population
current                 flow
conductance             rate constant
power                   dissipation

This analogy is already well known. Schnakenberg used it in his book Thermodynamic Network Analysis of Biological Systems. So, our main goal is to formalize and exploit it. This analogy extends from systems in equilibrium to the more interesting case of nonequilibrium steady states, which are the main topic of our paper.

Earlier, Brendan and I introduced a way to ‘black box’ a circuit and define the relation it determines between potential-current pairs at the input and output terminals. This relation describes the circuit’s external behavior as seen by an observer who can only perform measurements at the terminals.

An important fact is that black boxing is ‘compositional’: if one builds a circuit from smaller pieces, the external behavior of the whole circuit can be determined from the external behaviors of the pieces. For category theorists, this means that black boxing is a functor!

Our new paper with Blake develops a similar ‘black box functor’ for detailed balanced Markov processes, and relates it to the earlier one for circuits.

When you black box a detailed balanced Markov process, you get the relation between population–flow pairs at the terminals. (By the ‘flow at a terminal’, we more precisely mean the net population outflow.) This relation holds not only in equilibrium, but also in any nonequilibrium steady state. Thus, black boxing an open detailed balanced Markov process gives its steady state dynamics as seen by an observer who can only measure populations and flows at the terminals.

The principle of minimum dissipation

At least since the work of Prigogine, it’s been widely accepted that a large class of systems minimize entropy production in a nonequilibrium steady state. But people still fight about the precise boundary of this class of systems, and even the meaning of this ‘principle of minimum entropy production’.

For detailed balanced open Markov processes, we show that a quantity we call the ‘dissipation’ is minimized in any steady state. This is a quadratic function of the populations and flows, analogous to the power dissipation of a circuit made of resistors. We make no claim that this quadratic function actually deserves to be called ‘entropy production’. Indeed, Schnakenberg has convincingly argued that they are only approximately equal.

But still, the ‘dissipation’ function is very natural and useful—and Prigogine’s so-called ‘entropy production’ is also a quadratic function.

Black boxing

I’ve already mentioned the category \mathrm{DetBalMark}, where a morphism is an open detailed balanced Markov process. But our paper needs two more categories to tell its story! There’s the category of circuits, and the category of linear relations.

A morphism in the category \mathrm{Circ} is an open electrical circuit made of resistors: that is, a graph with each edge labelled by a ‘conductance’ c_e > 0, together with specified input and output nodes:

A morphism in the category \mathrm{LinRel} is a linear relation L : U \leadsto V between finite-dimensional real vector spaces U and V. This is nothing but a linear subspace L \subseteq U \oplus V. Just as relations generalize functions, linear relations generalize linear functions!

In our previous paper, Brendan and I introduced these two categories and a functor between them, the ‘black box functor’:

\blacksquare : \mathrm{Circ} \to \mathrm{LinRel}

The idea is that any circuit determines a linear relation between the potentials and net current flows at the inputs and outputs. This relation describes the behavior of a circuit of resistors as seen from outside.

Our new paper introduces a black box functor for detailed balanced Markov processes:

\square : \mathrm{DetBalMark} \to \mathrm{LinRel}

We draw this functor as a white box merely to distinguish it from the other black box functor. The functor \square maps any detailed balanced Markov process to the linear relation obeyed by populations and flows at the inputs and outputs in a steady state. In short, it describes the steady state behavior of the Markov process ‘as seen from outside’.

How do we manage to black box detailed balanced Markov processes? We do it using the analogy with circuits!

The analogy becomes a functor

Every analogy wants to be a functor. So, we make the analogy between detailed balanced Markov processes and circuits precise by turning it into a functor:

K : \mathrm{DetBalMark} \to \mathrm{Circ}

This functor converts any open detailed balanced Markov process into an open electrical circuit made of resistors. This circuit is carefully chosen to reflect the steady-state behavior of the Markov process. Its underlying graph is the same as that of the Markov process. So, the ‘states’ of the Markov process are the same as the ‘nodes’ of the circuit.

Both the equilibrium populations at states of the Markov process and the rate constants labelling edges of the Markov process are used to compute the conductances of edges of this circuit. In the simple case where the Markov process has exactly one edge from any state i to any state j, the rule is this:

C_{i j} = H_{i j} q_j

where

• q_j is the equilibrium population of the jth state of the Markov process,

• H_{i j} is the rate constant for the edge from the jth state to the ith state of the Markov process, and

• C_{i j} is the conductance (that is, the reciprocal of the resistance) of the wire from the jth node to the ith node of the resulting circuit.

The detailed balance condition for Markov processes says precisely that the matrix C_{i j} is symmetric! This is just right for an electrical circuit made of resistors, since it means that the resistance of the wire from node i to node j equals the resistance of the same wire in the reverse direction, from node j to node i.
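In code, the rule is a one-liner, and the symmetry of C is easy to check. Continuing the toy numbers from the sketch above:

```python
import numpy as np

H = np.array([[0.0, 2.0, 1.0],   # rate constants, H[i, j] for edge j -> i
              [1.0, 0.0, 3.0],
              [0.5, 3.0, 0.0]])
q = np.array([2.0, 1.0, 1.0])    # equilibrium populations

C = H * q                        # C[i, j] = H[i, j] * q[j]
print(np.allclose(C, C.T))       # True: detailed balance makes C symmetric

# Perturb the populations so detailed balance fails, and C stops being
# symmetric: it can no longer be the conductance matrix of a circuit.
C_bad = H * np.array([2.0, 1.0, 1.5])
print(np.allclose(C_bad, C_bad.T))   # False
```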

A triangle of functors

If you paid careful attention, you’ll have noticed that I’ve described a triangle of functors:

And if you know anything about how category theorists think, you’ll be wondering if this diagram commutes.

In fact, this triangle of functors does not commute! However, a general lesson of category theory is that we should only expect diagrams of functors to commute up to natural isomorphism, and this is what happens here:

The natural transformation \alpha ‘corrects’ the black box functor for resistors to give the one for detailed balanced Markov processes.

The functors \square and \blacksquare \circ K are actually equal on objects. An object in \mathrm{DetBalMark} is a finite set X with each element i \in X labelled by a positive population q_i. Both functors map this object to the vector space \mathbb{R}^X \oplus \mathbb{R}^X. For the functor \square, we think of this as a space of population-flow pairs. For the functor \blacksquare \circ K, we think of it as a space of potential-current pairs. The natural transformation \alpha then gives a linear relation

\alpha_{X,q} : \mathbb{R}^X \oplus \mathbb{R}^X \leadsto \mathbb{R}^X \oplus \mathbb{R}^X

in fact an isomorphism of vector spaces, which converts potential-current pairs into population-flow pairs in a manner that depends on the q_i. I’ll skip the formula; it’s in the paper.

But here’s the key point. The naturality of \alpha actually allows us to reduce the problem of computing the functor \square to the problem of computing \blacksquare. Suppose

M: (X,q) \to (Y,r)

is any morphism in \mathrm{DetBalMark}. The object (X,q) is some finite set X labelled by populations q, and (Y,r) is some finite set Y labelled by populations r. Then the naturality of \alpha means that this square commutes:

Since \alpha_{X,q} and \alpha_{Y,r} are isomorphisms, we can solve for the functor \square as follows:

\square(M) = \alpha_{Y,r} \circ \blacksquare K(M) \circ \alpha_{X,q}^{-1}

This equation has a clear intuitive meaning! It says that to compute the behavior of a detailed balanced Markov process, namely \square(M), we convert it into a circuit made of resistors and compute the behavior of that, namely \blacksquare K(M). This is not equal to the behavior of the Markov process, but we can compute that behavior by converting the input populations and flows into potentials and currents, feeding them into our circuit, and then converting the outputs back into populations and flows.

What we really do

So that’s a sketch of what we do, and I hope you ask questions if it’s not clear. But I also hope you read our paper! Here’s what we actually do in there. After an introduction and summary of results:

• Section 3 defines open Markov processes and the open master equation.

• Section 4 introduces detailed balance for open Markov processes.

• Section 5 recalls the principle of minimum power for open circuits made of linear resistors, and explains how to black box them.

• Section 6 introduces the principle of minimum dissipation for open detailed balanced Markov processes, and describes how to black box these.

• Section 7 states the analogy between circuits and detailed balanced Markov processes in a formal way.

• Section 8 describes how to compose open Markov processes, making them into the morphisms of a category.

• Section 9 does the same for detailed balanced Markov processes.

• Section 10 describes the ‘black box functor’ that sends any open detailed balanced Markov process to the linear relation describing its external behavior, and recalls the similar functor for circuits.

• Section 11 makes the analogy between open detailed balanced Markov processes and open circuits even more formal, by making it into a functor. We prove that together with the two black box functors, this forms a triangle that commutes up to natural isomorphism.

• Section 12 is about geometric aspects of this theory. We show that the linear relations in the image of these black box functors are Lagrangian relations between symplectic vector spaces. We also show that the master equation can be seen as a gradient flow equation.

• Section 13 is a summary of what we have learned.

Finally, Appendix A is a quick tutorial on decorated cospans. This is a key mathematical tool in our work, developed by Brendan in an earlier paper.

The Inverse Cube Force Law

30 August, 2015

Here you see three planets. The blue planet is orbiting the Sun in a realistic way: it’s going around an ellipse.

The other two are moving in and out just like the blue planet, so they all stay on the same circle. But they’re moving around this circle at different rates! The green planet is moving faster than the blue one: it completes 3 orbits each time the blue planet goes around once. The red planet isn’t going around at all: it only moves in and out.

What’s going on here?

In 1687, Isaac Newton published his Principia Mathematica. This book is famous, but in Propositions 43–45 of Book I he did something that people didn’t talk about much—until recently. He figured out what extra force, besides gravity, would make a planet move like one of these weird other planets. It turns out an extra force obeying an inverse cube law will do the job!

Let me make this more precise. We’re only interested in ‘central forces’ here. A central force is one that only pushes a particle towards or away from some chosen point, and only depends on the particle’s distance from that point. In Newton’s theory, gravity is a central force obeying an inverse square law:

\displaystyle{ F(r) = -\frac{a}{r^2} }

for some constant a. But he considered adding an extra central force obeying an inverse cube law:

\displaystyle{ F(r) = -\frac{a}{r^2} + \frac{b}{r^3} }

He showed that if you do this, for any motion of a particle in the force of gravity you can find a motion of a particle in gravity plus this extra force, where the distance r(t) is the same, but the angle \theta(t) is not.

In fact Newton did more. He showed that if we start with any central force, adding an inverse cube force has this effect.

There’s a very long page about this on Wikipedia:

Newton’s theorem of revolving orbits, Wikipedia.

I haven’t fully understood all of this, but it instantly makes me think of three other things I know about the inverse cube force law, which are probably related. So maybe you can help me figure out the relationship.

The first, and simplest, is this. Suppose we have a particle in a central force. It will move in a plane, so we can use polar coordinates r, \theta to describe its position. We can describe the force away from the origin as a function F(r). Then the radial part of the particle’s motion obeys this equation:

\displaystyle{ m \ddot r = F(r) + \frac{L^2}{mr^3} }

where L is the magnitude of the particle’s angular momentum.

So, angular momentum acts to provide a ‘fictitious force’ pushing the particle out, which one might call the centrifugal force. And this force obeys an inverse cube force law!

Furthermore, thanks to the formula above, it’s pretty obvious that if you change L but also add a precisely compensating inverse cube force, the value of \ddot r will be unchanged! So, we can set things up so that the particle’s radial motion will be unchanged. But its angular motion will be different, since it has a different angular momentum. This explains Newton’s observation.
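If you want to check this numerically, here is a sketch. The mass, force strength, angular momenta and initial conditions (m, a, L1, L2, y0) are arbitrary choices of mine; the second planet gets 3 times the angular momentum of the first, plus the compensating inverse cube force b = (L_1^2 - L_2^2)/m, so it should mimic the green planet above, circling 3 times as fast with the same radial motion:

```python
import numpy as np
from scipy.integrate import solve_ivp

m, a = 1.0, 1.0           # mass, strength of the inverse square force
L1, L2 = 1.0, 3.0         # angular momenta of the two planets
b = (L1**2 - L2**2) / m   # compensating inverse cube coefficient

def rhs(t, y, L, b):
    # y = (r, rdot, theta); central force F(r) = -a/r^2 + b/r^3.
    r, rdot, theta = y
    F = -a / r**2 + b / r**3
    return [rdot, F / m + L**2 / (m**2 * r**3), L / (m * r**2)]

y0 = [1.0, 0.3, 0.0]      # identical initial radial data for both planets
t = np.linspace(0, 20, 2000)
blue  = solve_ivp(rhs, (0, 20), y0, t_eval=t, args=(L1, 0.0), rtol=1e-10)
green = solve_ivp(rhs, (0, 20), y0, t_eval=t, args=(L2, b),   rtol=1e-10)

print(np.max(np.abs(blue.y[0] - green.y[0])))      # ~0: same radial motion
print(np.max(np.abs(green.y[2] - 3 * blue.y[2])))  # ~0: 3x the angular motion
```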

It’s often handy to write a central force in terms of a potential:

F(r) = -V'(r)

Then we can make up an extra potential responsible for the centrifugal force, and combine it with the actual potential V into a so-called effective potential:

\displaystyle{ U(r) = V(r) + \frac{L^2}{2mr^2} }

The particle’s radial motion then obeys a simple equation:

m\ddot{r} = - U'(r)

For a particle in gravity, where the force obeys an inverse square law and V is proportional to -1/r, the effective potential might look like this:

This is the graph of

\displaystyle{ U(r) = -\frac{4}{r} + \frac{1}{r^2} }

If you’re used to particles rolling around in potentials, you can easily see that a particle with not too much energy will move back and forth, never making it to r = 0 or r = \infty. This corresponds to an elliptical orbit. Give it more energy and the particle can escape to infinity, but it will never hit the origin. The repulsive ‘centrifugal force’ always overwhelms the attraction of gravity near the origin, at least if the angular momentum is nonzero.
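Here is that statement as a five-line check, using the same example potential; the energy E = -2 is an arbitrary choice of mine between the minimum of U and zero:

```python
import numpy as np

U = lambda r: -4.0 / r + 1.0 / r**2   # the effective potential graphed above

E = -2.0                              # an energy between min(U) = -4 and 0
r = np.linspace(0.2, 5.0, 100000)
crossings = r[np.where(np.diff(np.sign(U(r) - E)))[0]]
print(crossings)   # two turning points (~0.29 and ~1.71): the particle
                   # oscillates between them, never reaching 0 or infinity
```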

On the other hand, suppose we have a particle moving in an attractive inverse cube force! Then the potential is proportional to 1/r^2, so the effective potential is

\displaystyle{ U(r) = \frac{c}{r^2} + \frac{L^2}{2mr^2} }

where c is negative for an attractive force. If this attractive force is big enough, namely

\displaystyle{ c < -\frac{L^2}{2m} }

then this force can exceed the centrifugal force, and the particle can fall in to r = 0.

If we keep track of the angular coordinate \theta, we can see what’s really going on. The particle is spiraling in to its doom, hitting the origin in a finite amount of time!
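Here is a sketch of the doom, in the same polar-coordinate style as before. The parameters are mine: with m = L = 1 the critical value is c = -L^2/2m = -1/2, and I take c just below it, which makes the inward spiral wind around many times:

```python
import numpy as np
from scipy.integrate import solve_ivp

m, L = 1.0, 1.0
c = -0.51                 # just below the critical value -L^2/(2m) = -0.5

def rhs(t, y):
    # V(r) = c/r^2 gives the radial force F = -V'(r) = 2c/r^3, so
    # m * rddot = 2c/r^3 + L^2/(m r^3).
    r, rdot, theta = y
    return [rdot, (2*c + L**2 / m) / (m * r**3), L / (m * r**2)]

def hit_origin(t, y):
    return y[0] - 1e-3    # stop when the particle gets very close to r = 0
hit_origin.terminal = True

sol = solve_ivp(rhs, (0, 20), [1.0, 0.0, 0.0], events=hit_origin,
                rtol=1e-10, atol=1e-12)
print(sol.t[-1])                    # reaches r ~ 0 at a finite time
print(sol.y[2][-1] / (2 * np.pi))   # ...after winding around several times
```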

This should remind you of a black hole, and indeed something similar happens there, but even more drastic:

Schwarzschild geodesics: effective radial potential energy, Wikipedia.

For a nonrotating uncharged black hole, the effective potential has three terms. Like Newtonian gravity it has an attractive -1/r term and a repulsive 1/r^2 term. But it also has an attractive term -1/r^3 term! In other words, it’s as if on top of Newtonian gravity, we had another attractive force obeying an inverse fourth power law! This overwhelms the others at short distances, so if you get too close to a black hole, you spiral in to your doom.

For example, a black hole can have an effective potential like this:

But back to inverse cube force laws! I know two more things about them. A while back I discussed how a particle in an inverse square force can be reinterpreted as a harmonic oscillator:

Planets in the fourth dimension, Azimuth.

There are many ways to think about this, and apparently the idea in some form goes all the way back to Newton! It involves a sneaky way to take a particle in a potential

\displaystyle{ V(r) \propto r^{-1} }

and think of it as moving around in the complex plane. Then if you square its position—thought of as a complex number—and cleverly reparametrize time, you get a particle moving in a potential

\displaystyle{ V(r) \propto r^2 }

This amazing trick can be generalized! A particle in a potential

\displaystyle{ V(r) \propto r^p }

can be transformed to a particle in a potential

\displaystyle{ V(r) \propto r^q }

as long as

(p+2)(q+2) = 4

A good description is here:

• Rachel W. Hall and Krešimir Josić, Planetary motion and the duality of force laws, SIAM Review 42 (2000), 115–124.

This trick transforms particles in r^p potentials with p ranging between -2 and +\infty to r^q potentials with q ranging between +\infty and -2. It’s like a see-saw: when p is small, q is big, and vice versa.
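The see-saw is a one-line function, with dual(dual(p)) = p wherever it is defined:

```python
def dual(p):
    """The exponent q paired with p by (p + 2) * (q + 2) = 4."""
    return 4.0 / (p + 2.0) - 2.0

print(dual(-1.0))    # 2.0: the Kepler potential pairs with the oscillator
print(dual(2.0))     # -1.0: and the pairing is an involution
print(dual(1e6))     # -1.999996...: huge p pairs with q just above -2
```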

But you’ll notice this trick doesn’t actually work at p = -2, the case that corresponds to the inverse cube force law. The problem is that p + 2 = 0 in this case, so we can’t find q with (p+2)(q+2) = 4.

So, the inverse cube force is special in three ways: it’s the one that you can add on to any force to get solutions with the same radial motion but different angular motion, it’s the one that naturally describes the ‘centrifugal force’, and it’s the one that doesn’t have a partner! We’ve seen how the first two ways are secretly the same. I don’t know about the third, but I’m hopeful.

Quantum aspects

Finally, here’s a fourth way in which the inverse cube law is special. This shows up most visibly in quantum mechanics… and this is what got me interested in this business in the first place.

You see, I’m writing a paper called ‘Struggles with the continuum’, which discusses problems in analysis that arise when you try to make some of our favorite theories of physics make sense. The inverse square force law poses interesting problems of this sort, which I plan to discuss. But I started wanting to compare the inverse cube force law, just so people can see things that go wrong in this case, and not take our successes with the inverse square law for granted.

Unfortunately a huge digression on the inverse cube force law would be out of place in that paper. So, I’m offloading some of that material to here.

In quantum mechanics, a particle moving in an inverse cube force law has a Hamiltonian like this:

H = -\nabla^2 + c r^{-2}

The first term describes the kinetic energy, while the second describes the potential energy. I’m setting \hbar = 1 and 2m = 1 to remove some clutter that doesn’t really affect the key issues.

To see how strange this Hamiltonian is, let me compare an easier case. If p < 2, the Hamiltonian

H = -\nabla^2 + c r^{-p}

is essentially self-adjoint on C_0^\infty(\mathbb{R}^3 - \{0\}), which is the space of compactly supported smooth functions on 3d Euclidean space minus the origin. What this means is that first of all, H is defined on this domain: it maps functions in this domain to functions in L^2(\mathbb{R}^3). But more importantly, it means we can uniquely extend H from this domain to a self-adjoint operator on some larger domain. In quantum physics, we want our Hamiltonians to be self-adjoint. So, this fact is good.

Proving this fact is fairly hard! It uses something called the Kato–Lax–Milgram–Nelson theorem together with this beautiful inequality:

\displaystyle{ \int_{\mathbb{R}^3} \frac{1}{4r^2} |\psi(x)|^2 \,d^3 x \le \int_{\mathbb{R}^3} |\nabla \psi(x)|^2 \,d^3 x }

for any \psi\in C_0^\infty(\mathbb{R}^3).

If you think hard, you can see this inequality is actually a fact about the quantum mechanics of the inverse cube law! It says that if c \ge -1/4, the energy of a quantum particle in the potential c r^{-2} is bounded below. And in a sense, this inequality is optimal: if c < -1/4, the energy is not bounded below. This is the quantum version of how a classical particle can spiral in to its doom in an attractive inverse cube law, if it doesn’t have enough angular momentum. But it’s subtly and mysteriously different.

You may wonder how this inequality is used to prove good things about potentials that are ‘less singular’ than the c r^{-2} potential: that is, potentials c r^{-p} with p < 2. For that, you have to use some tricks that I don’t want to explain here. I also don’t want to prove this inequality, or explain why it’s optimal! You can find most of this in some old course notes of mine:

• John Baez, Quantum Theory and Analysis, 1989.

See especially section 15.

But it’s pretty easy to see how this inequality implies things about the expected energy of a quantum particle in the potential c r^{-2}. So let’s do that.

In this potential, the expected energy of a state \psi is:

\displaystyle{  \langle \psi, H \psi \rangle =   \int_{\mathbb{R}^3} \overline\psi(x)\, (-\nabla^2 + c r^{-2})\psi(x) \, d^3 x }

Doing an integration by parts, this gives:

\displaystyle{  \langle \psi, H \psi \rangle = \int_{\mathbb{R}^3} |\nabla \psi(x)|^2 + cr^{-2} |\psi(x)|^2 \,d^3 x }

The inequality I showed you says precisely that when c = -1/4, this is greater than or equal to zero. So, the expected energy is actually nonnegative in this case! And making c greater than -1/4 only makes the expected energy bigger.

Note that in classical mechanics, the energy of a particle in this potential ceases to be bounded below as soon as c < 0. Quantum mechanics is different because of the uncertainty principle! To get a lot of negative potential energy, the particle’s wavefunction must be squished near the origin, but that gives it kinetic energy.
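Here is a quick numerical sanity check of all this at the borderline value c = -1/4, using a family of Gaussian trial functions, which are my own choice. For a radial function the two integrals reduce to one-dimensional ones. Squeezing the Gaussian near the origin gains negative potential energy, but the kinetic term always wins:

```python
import numpy as np
from scipy.integrate import quad

c = -0.25   # the borderline case c = -1/4

def expected_energy(s):
    """<psi, H psi> for the radial trial function psi(r) = exp(-s r^2),
    with H = -laplacian + c/r^2.  In spherical coordinates:
      kinetic   = 4 pi * integral of psi'(r)^2 r^2 dr
      potential = 4 pi * integral of c * psi(r)^2 dr"""
    psi  = lambda r: np.exp(-s * r**2)
    dpsi = lambda r: -2.0 * s * r * np.exp(-s * r**2)
    kin, _ = quad(lambda r: dpsi(r)**2 * r**2, 0, np.inf)
    pot, _ = quad(lambda r: c * psi(r)**2, 0, np.inf)
    return 4 * np.pi * (kin + pot)

for s in [0.5, 1.0, 5.0, 50.0]:
    print(s, expected_energy(s))   # positive for every squeeze factor s
```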

It turns out that the Hamiltonian for a quantum particle in an inverse cube force law has exquisitely subtle and tricky behavior. Many people have written about it, running into ‘paradoxes’ when they weren’t careful enough. Only rather recently have things been straightened out.

For starters, the Hamiltonian for this kind of particle

H = -\nabla^2 + c r^{-2}

has different behaviors depending on c. Obviously the force is attractive when c > 0 and repulsive when c < 0, but that’s not the only thing that matters! Here’s a summary:

c \ge 3/4. In this case H is essentially self-adjoint on C_0^\infty(\mathbb{R}^3 - \{0\}). So, it admits a unique self-adjoint extension and there’s no ambiguity about this case.

c < 3/4. In this case H is not essentially self-adjoint on C_0^\infty(\mathbb{R}^3 - \{0\}). In fact, it admits more than one self-adjoint extension! This means that we need extra input from physics to choose the Hamiltonian in this case. It turns out that we need to say what happens when the particle hits the singularity at r = 0. This is a long and fascinating story that I just learned yesterday.

c \ge -1/4. In this case the expected energy \langle \psi, H \psi \rangle is bounded below for \psi \in C_0^\infty(\mathbb{R}^3 - \{0\}). It turns out that whenever we have a Hamiltonian that is bounded below, even if there is not a unique self-adjoint extension, there exists a canonical ‘best choice’ of self-adjoint extension, called the Friedrichs extension. I explain this in my course notes.

c < -1/4. In this case the expected energy is not bounded below, so we don’t have the Friedrichs extension to help us choose which self-adjoint extension is ‘best’.

To go all the way down this rabbit hole, I recommend these two papers:

• Sarang Gopalakrishnan, Self-Adjointness and the Renormalization of Singular Potentials, B.A. Thesis, Amherst College.

• D. M. Gitman, I. V. Tyutin and B. L. Voronov, Self-adjoint extensions and spectral analysis in the Calogero problem, Jour. Phys. A 43 (2010), 145205.

The first is good for a broad overview of problems associated to singular potentials such as the inverse cube force law; there is attention to mathematical rigor, but the focus is on physical insight. The second is good if you want—as I wanted—to really get to the bottom of the inverse cube force law in quantum mechanics. Both have lots of references.

Also, both point out a crucial fact I haven’t mentioned yet: in quantum mechanics the inverse cube force law is special because, naively at least, it has a kind of symmetry under rescaling! You can see this from the formula

H = -\nabla^2 + cr^{-2}

by noting that both the Laplacian and r^{-2} have units of length^{-2}. So, they both transform in the same way under rescaling: if you take any smooth function \psi, apply H and then expand the result by a factor of k, you get k^2 times what you get if you do those operations in the other order.

In particular, this means that if you have a smooth eigenfunction of H with eigenvalue \lambda, you will also have one with eigenvalue k^2 \lambda for any k > 0. And if your original eigenfunction was normalizable, so will be the new one!

With some calculation you can show that when c < -1/4, the Hamiltonian H has a smooth normalizable eigenfunction with a negative eigenvalue. In fact it’s spherically symmetric, so finding it is not so terribly hard. But this instantly implies that H has smooth normalizable eigenfunctions with any negative eigenvalue.

This implies various things, some terrifying. First of all, it means that H is not bounded below, at least not on the space of smooth normalizable functions. A similar but more delicate scaling argument shows that it’s also not bounded below on C_0^\infty(\mathbb{R}^3 - \{0\}), as I claimed earlier.

This is scary but not terrifying: it simply means that when c < -1/4, the potential is too strongly negative for the Hamiltonian to be bounded below.

The terrifying part is this: we’re getting uncountably many normalizable eigenfunctions, all with different eigenvalues, one for each choice of k. A self-adjoint operator on a countable-dimensional Hilbert space like L^2(\mathbb{R}^3) can’t have uncountably many normalizable eigenvectors with different eigenvalues, since then they’d all be orthogonal to each other, and that’s too many orthogonal vectors to fit in a Hilbert space of countable dimension!

This sounds like a paradox, but it’s not. These functions are not all orthogonal, and they’re not all eigenfunctions of a self-adjoint operator. You see, the operator H is not self-adjoint on the domain we’ve chosen, the space of all smooth functions in L^2(\mathbb{R}^3). We can carefully choose a domain to get a self-adjoint operator… but it turns out there are many ways to do it.

Intriguingly, in most cases this choice breaks the naive dilation symmetry. So, we’re getting what physicists call an ‘anomaly’: a symmetry of a classical system that fails to give a symmetry of the corresponding quantum system.

Of course, if you’ve made it this far, you probably want to understand what the different choices of Hamiltonian for a particle in an inverse cube force law actually mean, physically. The idea seems to be that they say how the particle changes phase when it hits the singularity at r = 0 and bounces back out.

(Why does it bounce back out? Well, if it didn’t, time evolution would not be unitary, so it would not be described by a self-adjoint Hamiltonian! We could try to describe the physics of a quantum particle that does not come back out when it hits the singularity, and I believe people have tried, but this requires a different set of mathematical tools.)

For a detailed analysis of this, it seems one should take Schrödinger’s equation and do a separation of variables into the angular part and the radial part:

\psi(r,\theta,\phi) = \Psi(r) \Phi(\theta,\phi)

For each choice of \ell = 0,1,2,\dots one gets a space of spherical harmonics that one can use for the angular part \Phi. The interesting part is the radial part, \Psi. Here it is helpful to make a change of variables

u(r) = r\Psi(r)

At least naively, Schrödinger’s equation for the particle in the cr^{-2} potential then becomes

\displaystyle{ \frac{d}{dt} u = -iH u }

where

\displaystyle{ H = -\frac{d^2}{dr^2} + \frac{c + \ell(\ell+1)}{r^2} }

Beware: I keep calling all sorts of different but related Hamiltonians H, and this one is for the radial part of the dynamics of a quantum particle in an inverse cube force. As we’ve seen before in the classical case, the centrifugal force and the inverse cube force join forces in an ‘effective potential’

\displaystyle{ U(r) = kr^{-2} }

where

k = c + \ell(\ell+1)

So, we have reduced the problem to that of a particle on the open half-line (0,\infty) moving in the potential kr^{-2}. The Hamiltonian for this problem:

\displaystyle{ H = -\frac{d^2}{dr^2} + \frac{k}{r^2} }

is called the Calogero Hamiltonian. Needless to say, it has fascinating and somewhat scary properties, since to make it into a bona fide self-adjoint operator, we must make some choice about what happens when the particle hits r = 0. The formula above does not really specify the Hamiltonian.
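Here is the scaling argument from earlier in code, as a sketch; the trial function and the value k = -2 are my own choices. We take a fixed trial function u, rescale it as u_λ(r) = √λ u(λr), which preserves the L² norm, and watch the energy scale like λ² times a negative number:

```python
import numpy as np
from scipy.integrate import quad

k = -2.0   # any k < -1/4 would do; -2 makes the effect easy to see

def energy(lam):
    """<u, H u> for u(r) = sqrt(lam) * (lam r) * exp(-lam r), where
    H = -d^2/dr^2 + k/r^2 on the half-line (0, infinity)."""
    u  = lambda r: np.sqrt(lam) * (lam * r) * np.exp(-lam * r)
    du = lambda r: np.sqrt(lam) * lam * (1.0 - lam * r) * np.exp(-lam * r)
    kin, _ = quad(lambda r: du(r)**2, 0, np.inf)
    pot, _ = quad(lambda r: k * u(r)**2 / r**2, 0, np.inf)
    return kin + pot

for lam in [1.0, 10.0, 100.0]:
    print(lam, energy(lam))   # -0.75, -75, -7500: no lower bound in sight
```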

This is more or less where Gitman, Tyutin and Voronov begin their analysis, after a long and pleasant review of the problem. They describe all the possible choices of self-adjoint operator that are allowed. The answer depends on the values of k, but very crudely, the choice says something like how the phase of your particle changes when it bounces off the singularity. Most choices break the dilation invariance of the problem. But intriguingly, some choices retain invariance under a discrete subgroup of dilations!

So, the rabbit hole of the inverse cube force law goes quite deep, and I expect I haven’t quite gotten to the bottom yet. The problem may seem pathological, verging on pointless. But the math is fascinating, and it’s a great testing-ground for ideas in quantum mechanics—very manageable compared to deeper subjects like quantum field theory, which are riddled with their own pathologies. Finally, the connection between the inverse cube force law and centrifugal force makes me think it’s not a mere curiosity.

In four dimensions

It’s a bit odd to study the inverse cube force law in 3-dimensional space, since Newtonian gravity and the electrostatic force would actually obey an inverse cube law in 4-dimensional space. For the classical 2-body problem it doesn’t matter much whether you’re in 3d or 4d space, since the motion stays on a plane. But for the quantum 2-body problem it makes more of a difference!

Just for the record, let me say how the quantum 2-body problem works in 4 dimensions. As before, we can work in the center of mass frame and consider this Hamiltonian:

H = -\nabla^2 + c r^{-2}

And as before, the behavior of this Hamiltonian depends on c. Here’s the story this time:

c \ge 0. In this case H is essentially self-adjoint on C_0^\infty(\mathbb{R}^4 - \{0\}). So, it admits a unique self-adjoint extension and there’s no ambiguity about this case.

c < 0. In this case H is not essentially self-adjoint on C_0^\infty(\mathbb{R}^4 - \{0\}).

c \ge -1. In this case the expected energy \langle \psi, H \psi \rangle is bounded below for \psi \in C_0^\infty(\mathbb{R}^4 - \{0\}). So, there exists a canonical ‘best choice’ of self-adjoint extension, called the Friedrichs extension.

c < -1. In this case the expected energy is not bounded below, so we don’t have the Friedrichs extension to help us choose which self-adjoint extension is ‘best’.

I’ve been assured these are correct by Barry Simon, and a lot of this material will appear in Section 7.4 of his book:

• Barry Simon, A Comprehensive Course in Analysis, Part 4: Operator Theory, American Mathematical Society, Providence, RI, 2015.

See also:

• Barry Simon, Essential self-adjointness of Schrödinger operators with singular potentials, Arch. Rational Mech. Analysis 52 (1973), 44–48.


The animation was made by ‘WillowW’ and placed on Wikicommons. It’s one of a number that appears in this Wikipedia article:

Newton’s theorem of revolving orbits, Wikipedia.

I made the graphs using the free online Desmos graphing calculator.

The picture of a spiral was made by ‘Anarkman’ and ‘Pbroks13’ and placed on Wikicommons; it appears in

Hyperbolic spiral, Wikipedia.

The hyperbolic spiral is one of three kinds of orbits that are possible in an inverse cube force law. They are vaguely analogous to ellipses, hyperbolas and parabolas, but there are actually no bound orbits except perfect circles. The three kinds are called Cotes’s spirals. In polar coordinates, they are:

• the epispiral:

\displaystyle{ \frac{1}{r} = A \cos\left( k\theta + \varepsilon \right) }

• the hyperbolic spiral:

\displaystyle{ \frac{1}{r} = A \theta + \varepsilon }

• the Poinsot spiral:

\displaystyle{ \frac{1}{r} = A \cosh\left( k\theta + \varepsilon \right) }

The Physics of Butterfly Wings

11 August, 2015

Some butterflies have shiny, vividly colored wings. From different angles you see different colors. This effect is called iridescence. How does it work?

It turns out these butterfly wings are made of very fancy materials! Light bounces around inside these materials in a tricky way. Sunlight of different colors winds up reflecting off these materials in different directions.

We’re starting to understand the materials and make similar substances in the lab. They’re called photonic crystals. They have amazing properties.

Here at the Centre for Quantum Technologies we have people studying exotic materials of many kinds. Next door, there’s a lab completely devoted to studying graphene: crystal sheets of carbon in which electrons can move as if they were massless particles! Graphene has a lot of potential for building new technologies—that’s why Singapore is pumping money into researching it.

Some physicists at MIT just showed that one of the materials in butterfly wings might act like a 3d form of graphene. In graphene, electrons can only move easily in 2 directions. In this new material, electrons could move in all 3 directions, acting as if they had no mass.

The pictures here show the microscopic structure of two materials found in butterfly wings:

The picture at left actually shows a sculpture made by the mathematical artist Bathsheba Grossman. But it’s a piece of a gyroid: a surface with a very complicated shape, which repeats forever in 3 directions. It’s called a minimal surface because you can’t shrink its area by tweaking it just a little. It divides space into two regions.

The gyroid was discovered in 1970 by a mathematician, Alan Schoen. It’s a triply periodic minimal surface, meaning one that repeats itself in 3 different directions in space, like a crystal.

Schoen was working for NASA, and his idea was to use the gyroid for building ultra-light, super-strong structures. But that didn’t happen. Research doesn’t move in predictable directions.

In 1983, people discovered that in some mixtures of oil and water, the oil naturally forms a gyroid. The sheets of oil try to minimize their area, so it’s not surprising that they form a minimal surface. Something else makes this surface be a gyroid—I’m not sure what.

Butterfly wings are made of a hard material called chitin. Around 2008, people discovered that the chitin in some iridescent butterfly wings is made in a gyroid pattern! The spacing in this pattern is very small, about one wavelength of visible light. This makes light move through this material in a complicated way, which depends on the light’s color and the direction it’s moving.

So: butterflies have naturally evolved a photonic crystal based on a gyroid!

The universe is awesome, but it’s not magic. A mathematical pattern is beautiful if it’s a simple solution to at least one simple problem. This is why beautiful patterns naturally bring themselves into existence: they’re the simplest ways for certain things to happen. Darwinian evolution helps out: it scans through trillions of possibilities and finds solutions to problems. So, we should expect life to be packed with mathematically beautiful patterns… and it is.

The picture at right above shows a ‘double gyroid’. Here it is again:

This is actually two interlocking surfaces, shown in red and blue. You can get them by writing the gyroid as a level surface:

f(x,y,z) = 0

and taking the two nearby surfaces

f(x,y,z) = \pm c

for some small value of c.
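If you want to draw these yourself, there is a famous trigonometric approximation to the gyroid as a level surface: f(x,y,z) = sin x cos y + sin y cos z + sin z cos x. It’s only an approximation to the true minimal surface, but a good one. Here is a sketch that samples one cubic unit cell and marks the two sheets of the double gyroid:

```python
import numpy as np

def f(x, y, z):
    # The standard trigonometric approximation to the gyroid.
    return (np.sin(x) * np.cos(y) + np.sin(y) * np.cos(z)
            + np.sin(z) * np.cos(x))

n, c = 64, 0.3
t = np.linspace(0.0, 2.0 * np.pi, n)
X, Y, Z = np.meshgrid(t, t, t, indexing='ij')
F = f(X, Y, Z)

red  = np.abs(F - c) < 0.05   # grid points near the sheet f = +c
blue = np.abs(F + c) < 0.05   # grid points near the sheet f = -c
print(red.sum(), blue.sum())  # equal counts: the two sheets are congruent,
                              # swapped by the symmetry (x,y,z) -> (-x,-y,-z)

# For an actual picture, run a marching-cubes routine (for example
# skimage.measure.marching_cubes) on F - c and F + c.
```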

It turns out that while they’re still growing, some butterflies have a double gyroid pattern in their wings. This turns into a single gyroid when they grow up!

The new research at MIT studied how an electron would move through a double gyroid pattern. They calculated its dispersion relation: how the speed of the electron would depend on its energy and the direction it’s moving.

An ordinary particle moves faster if it has more energy. But a massless particle, like a photon, moves at the same speed no matter what energy it has. The MIT team showed that an electron in a double gyroid pattern moves at a speed that doesn’t depend much on its energy. So, in some ways this electron acts like a massless particle.

But it’s quite different from a photon. It’s actually more like a neutrino! You see, unlike photons, electrons and neutrinos are spin-1/2 particles. Neutrinos are almost massless. A massless spin-1/2 particle can have a built-in handedness, spinning in only one direction around its axis of motion. Such a particle is called a Weyl spinor. The MIT team showed that an electron moving through a double gyroid acts approximately like a Weyl spinor!
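To see concretely what ‘acts like a Weyl spinor’ means, here is the standard 2×2 Weyl Hamiltonian from textbook quantum mechanics. This is an illustration of the general idea, not the MIT team’s band-structure computation. Its energies are ±v|k|, linear in the momentum, so the speed is the same at every energy:

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def weyl_energies(k, v=1.0, chirality=+1):
    """Eigenvalues of H(k) = chirality * v * (sigma . k)."""
    H = chirality * v * (k[0] * sx + k[1] * sy + k[2] * sz)
    return np.linalg.eigvalsh(H)

direction = np.array([1.0, 2.0, 2.0]) / 3.0   # a unit vector
for scale in [0.1, 0.2, 0.4]:
    print(scale, weyl_energies(scale * direction))  # energies are +/- scale
```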

How does this work? Well, the key fact is that the double gyroid has a built-in handedness, or chirality. It comes in a left-handed and right-handed form. You can see the handedness quite clearly in Grossman’s sculpture of the ordinary gyroid:

Beware: nobody has actually made electrons act like Weyl spinors in the lab yet. The MIT team just found a way that should work. Someday someone will actually make it happen, probably in less than a decade. And later, someone will do amazing things with this ability. I don’t know what. Maybe the butterflies know!

References and more

For a good introduction to the physics of gyroids, see:

• James A. Dolan, Bodo D. Wilts, Silvia Vignolini, Jeremy J. Baumberg, Ullrich Steiner and Timothy D. Wilkinson, Optical properties of gyroid structured materials: from photonic crystals to metamaterials, Advanced Optical Materials 3 (2015), 12–32.

For some of the history and math of gyroids, see Alan Schoen’s webpage:

• Alan Schoen, Triply-periodic minimal surfaces.

For more on gyroids in butterfly wings, see:

• K. Michielsen and D. G. Stavenga, Gyroid cuticular structures in butterfly wing scales: biological photonic crystals.

• Vinodkumar Saranathan et al, Structure, function, and self-assembly of single network gyroid (I4132) photonic crystals in butterfly wing scales, PNAS 107 (2010), 11676–11681.

The paper by Michielsen and Stavenga is free online! They say the famous ‘blue Morpho’ butterfly shown in the picture at the top of this article does not use a gyroid; it uses a “two-dimensional photonic crystal slab consisting of arrays of rectangles formed by lamellae and microribs.” But they find gyroids in four other species: Callophrys rubi, Cyanophrys remus, Parides sesostris and Teinopalpus imperialis. They compare tunnelling electron microscope pictures of slices of their iridescent patches with computer-generated slices of gyroids. The comparison looks pretty good to me:

For the evolution of iridescence, see:

• Melissa G. Meadows et al, Iridescence: views from many angles, J. Roy. Soc. Interface 6 (2009).

For the new research at MIT, see:

• Ling Lu, Liang Fu, John D. Joannopoulos and Marin Soljačić, Weyl points and line nodes in gapless gyroid photonic crystals.

• Ling Lu, Zhiyu Wang, Dexin Ye, Lixin Ran, Liang Fu, John D. Joannopoulos and Marin Soljačić, Experimental observation of Weyl points, Science 349 (2015), 622–624.

Again, the first is free online. There’s a lot of great math lurking inside, most of which is too mind-blowing to explain quickly. Let me just paraphrase the start of the paper, so at least experts can get the idea:

Two-dimensional (2d) electrons and photons at the energies and frequencies of Dirac points exhibit extraordinary features. As the best example, almost all the remarkable properties of graphene are tied to the massless Dirac fermions at its Fermi level. Topologically, Dirac cones are not only the critical points for 2d phase transitions but also the unique surface manifestation of a topologically gapped 3d bulk. In a similar way, it is expected that if a material could be found that exhibits a 3d linear dispersion relation, it would also display a wide range of interesting physics phenomena. The associated 3D linear point degeneracies are called “Weyl points”. In the past year, there have been a few studies of Weyl fermions in electronics. The associated Fermi-arc surface states, quantum Hall effect, novel transport properties and a realization of the Adler–Bell–Jackiw anomaly are also expected. However, no observation of Weyl points has been reported. Here, we present a theoretical discovery and detailed numerical investigation of frequency-isolated Weyl points in perturbed double-gyroid photonic crystals along with their complete phase diagrams and their topologically protected surface states.

Also a bit for the mathematicians:

Weyl points are topologically stable objects in the 3d Brillouin zone: they act as monopoles of Berry flux in momentum space, and hence are intimately related to the topological invariant known as the Chern number. The Chern number can be defined for a single bulk band or a set of bands, where the Chern numbers of the individual bands are summed, on any closed 2d surface in the 3d Brillouin zone. The difference of the Chern numbers defined on two surfaces, of all bands below the Weyl point frequencies, equals the sum of the chiralities of the Weyl points enclosed in between the two surfaces.

This is a mix of topology and physics jargon that may be hard for pure mathematicians to understand, but I’ll be glad to translate if there’s interest.

For starters, a ‘monopole of Berry flux in momentum space’ is a poetic way of talking about a twisted complex line bundle over the space of allowed energy-momenta of the electron in the double gyroid. We get a twist at every ‘Weyl point’, meaning a point where the dispersion relations look locally like those of a Weyl spinor when its energy-momentum is near zero. Near such a point, the dispersion relations are a Fourier-transformed version of the Weyl equation.
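To make the ‘monopole of Berry flux’ a bit more concrete, here is a minimal numerical sketch (Python, assuming numpy; the Hamiltonian and grid are just illustrative). Take the simplest Weyl Hamiltonian H(k) = k_x \sigma_x + k_y \sigma_y + k_z \sigma_z, discretize a sphere around the Weyl point at k = 0, and add up the Berry phases around all the little plaquettes. The total flux, divided by 2\pi, comes out to \pm 1: the chirality of the enclosed Weyl point.

import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def lower_state(theta, phi):
    """Lower-band eigenvector of H(k) = k . sigma for k on the unit sphere."""
    k = np.array([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)])
    _, v = np.linalg.eigh(k[0] * sx + k[1] * sy + k[2] * sz)
    return v[:, 0]              # eigh sorts eigenvalues in ascending order

Nt, Np = 60, 60
thetas = np.linspace(0.0, np.pi, Nt + 1)
phis = np.linspace(0.0, 2 * np.pi, Np, endpoint=False)
states = [[lower_state(t, p) for p in phis] for t in thetas]

def link(u, v):
    """U(1) link variable between neighboring eigenstates."""
    z = np.vdot(u, v)
    return z / abs(z)

flux = 0.0
for i in range(Nt):
    for j in range(Np):
        jp = (j + 1) % Np       # phi wraps around the sphere
        w = (link(states[i][j], states[i + 1][j]) *
             link(states[i + 1][j], states[i + 1][jp]) *
             link(states[i + 1][jp], states[i][jp]) *
             link(states[i][jp], states[i][j]))
        flux += np.angle(w)     # Berry phase around one plaquette

print(flux / (2 * np.pi))       # close to +1 or -1, depending on conventions

Because the plaquette phases are built from gauge-invariant link variables, the arbitrary phases that numpy attaches to each eigenvector drop out, and the answer is an integer up to discretization error.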

The Game of Googol

20 July, 2015

Here’s a puzzle from a recent issue of Quanta, an online science magazine:

Puzzle 1: I write down two different numbers that are completely unknown to you, and hold one in my left hand and one in my right. You have absolutely no idea how I generated these two numbers. Which is larger?

You can point to one of my hands, and I will show you the number in it. Then you can decide to either select the number you have seen or switch to the number you have not seen, held in the other hand. Is there a strategy that will give you a greater than 50% chance of choosing the larger number, no matter which two numbers I write down?

At first it seems the answer is no. Whatever number you see, the other number could be larger or smaller. There’s no way to tell. So obviously you can’t get a better than 50% chance of picking the hand with the larger number—even if you’ve seen one of those numbers!

But “obviously” is not a proof. Sometimes “obvious” things are wrong!

It turns out that, amazingly, the answer to the puzzle is yes! You can find a strategy to do better than 50%. But the strategy uses randomness. So, this puzzle is a great illustration of the power of randomness.

If you want to solve it yourself, stop now or read Quanta magazine for some clues—they offered a small prize for the best answer:

• Pradeep Mutalik, Can information rise from randomness?, Quanta, 7 July 2015.

Greg Egan gave a nice solution in the comments to this magazine article, and I’ll reprint it below along with two followup puzzles. So don’t look down there unless you want a spoiler.

I should add: the most common mistake among educated readers seems to be assuming that the first player, the one who chooses the two numbers, chooses them according to some probability distribution. Don’t assume that. They are simply arbitrary numbers.

The history of this puzzle

I’d seen this puzzle before—do you know who invented it? On G+, Hans Havermann wrote:

I believe the origin of this puzzle goes back to (at least) John Fox and Gerald Marnie’s 1958 betting game ‘Googol’. Martin Gardner mentioned it in his February 1960 column in Scientific American. Wikipedia mentions it under the heading ‘Secretary problem’. Gardner suggested that a variant of the game was proposed by Arthur Cayley in 1875.

Actually the game of Googol is a generalization of the puzzle that we’ve been discussing. Martin Gardner explained it thus:

Ask someone to take as many slips of paper as he pleases, and on each slip write a different positive number. The numbers may range from small fractions of 1 to a number the size of a googol (1 followed by a hundred 0s) or even larger. These slips are turned face down and shuffled over the top of a table. One at a time you turn the slips face up. The aim is to stop turning when you come to the number that you guess to be the largest of the series. You cannot go back and pick a previously turned slip. If you turn over all the slips, then of course you must pick the last one turned.

So, the puzzle I just showed you is the special case when there are just 2 slips of paper. I seem to recall that Gardner incorrectly dismissed this case as trivial!
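By the way, the full game is easy to simulate. Here is a quick sketch (Python; the slip values and trial count are arbitrary) of the classic strategy: skip roughly n/e slips, then stop at the first number exceeding everything seen so far. Since the strategy only compares slips, only their relative ranks matter, and the success rate comes out near 1/e \approx 0.37:

import math
import random

def play(slips, skip):
    """Skip the first `skip` slips, then stop at the first record; else take the last."""
    best_seen = max(slips[:skip], default=float('-inf'))
    for x in slips[skip:]:
        if x > best_seen:
            return x
    return slips[-1]

n, trials = 20, 100_000
skip = round(n / math.e)            # the classic ~n/e cutoff
wins = 0
for _ in range(trials):
    slips = random.sample(range(10**9), n)   # n distinct numbers in random order
    wins += play(slips, skip) == max(slips)
print(wins / trials)                # roughly 0.37-0.38 for n = 20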

There’s been a lot of work on Googol. Julien Berestycki writes:

I heard about this puzzle a few years ago from Sasha Gnedin. He has a very nice paper about this:

• Alexander V. Gnedin, A solution to the game of Googol, Annals of Probability 22 (1994), 1588–1595.

One of the many beautiful ideas in this paper is that it asks what the best strategy is for the player who writes the numbers! It also cites a paper by Gnedin and Berezowskyi (of oligarchic fame).

Egan’s solution

Okay, here is Greg Egan’s solution, paraphrased a bit:

Pick some function f : \mathbb{R} \to \mathbb{R} such that:

\displaystyle{ \lim_{x \to -\infty} f(x) = 0 }

\displaystyle{ \lim_{x \to +\infty} f(x) = 1 }

f is strictly increasing: if x > y then f(x) > f(y)

There are lots of functions like this, for example

\displaystyle{f(x) = \frac{e^x}{e^x + 1} }

Next, pick one of the first player’s hands at random. If the number you are shown is a, compute f(a). Then generate a uniformly distributed random number z between 0 and 1. If z is less than or equal to f(a), guess that a is the larger number; but if z is greater than f(a), guess that the larger number is in the other hand.

The probability of guessing correctly can be calculated as the probability of seeing the larger number initially and then, correctly, sticking with it, plus the probability of seeing the smaller number initially and then, correctly, choosing the other hand.

Say the larger number is x and the smaller one is y. Then the probability of guessing correctly is

\frac{1}{2} f(x) + \frac{1}{2} (1 - f(y)) =  \frac{1}{2} + \frac{1}{2} (f(x) - f(y))

This is strictly greater than \frac{1}{2} since x > y so f(x) - f(y) > 0.

So, you have a more than 50% chance of winning! But as you play the game, there’s no way to tell how much more than 50% it is. If the numbers in the other player’s hands are both very large, both very small, or simply very close together, then f(x) - f(y) is tiny and your chance will be only slightly more than 50%.
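If you want to see this in action, here is a quick simulation (Python; the choice of f and the two numbers are arbitrary):

import math
import random

def f(x):
    """Any strictly increasing function from R onto (0, 1) will do."""
    return 1 / (1 + math.exp(-x))

def egan_choice(left, right):
    """Egan's strategy: peek at a random hand, stick with probability f(peeked)."""
    a, other = random.choice([(left, right), (right, left)])
    return a if random.random() <= f(a) else other

x, y = 0.5, -0.3                    # the first player's two numbers; any pair works
trials = 1_000_000
wins = sum(egan_choice(x, y) == max(x, y) for _ in range(trials))
print(wins / trials)                # about 0.5 + (f(0.5) - f(-0.3))/2, i.e. ~0.598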

Followup puzzles

Here are two more puzzles:

Puzzle 2: Prove that no deterministic strategy can guarantee you have a more than 50% chance of choosing the larger number.

Puzzle 3: There are perfectly specific but ‘algorithmically random’ sequences of bits, which can’t be predicted well by any program. If we use these to generate a uniform algorithmically random number between 0 and 1, and use the strategy Egan describes, will our chance of choosing the larger number be more than 50%, or not?

But watch out—here come Egan’s solutions to those!


Egan writes:

Here are my answers to your two puzzles on G+.

Puzzle 2: Prove that no deterministic strategy can guarantee you have a more than 50% chance of choosing the larger number.

Answer: If we adopt a deterministic strategy, that means there is a function S: \mathbb{R} \to \{0,1\} that tells us whether or not we stick with the number x when we see it. If S(x)=1 we stick with it, if S(x)=0 we swap it for the other number.

If the two numbers are x and y, with x > y, then the probability of success will be:

P = 0.5 + 0.5(S(x)-S(y))

This is exactly the same as the formula we obtained when we stuck with x with probability f(x), but we have specialised to functions S valued in \{0,1\}.

We can only guarantee a more than 50% chance of choosing the larger number if S is monotonically increasing everywhere, i.e. S(x) > S(y) whenever x > y. But this is impossible for a function valued in \{0,1\}. To prove this, define x_0 to be any number in [1,2] such that S(x_0)=0; such an x_0 must exist, otherwise S would be constant on [1,2] and hence not monotonically increasing. Similarly define x_1 to be any number in [-2,-1] such that S(x_1) = 1. We then have x_0 > x_1 but S(x_0) < S(x_1).

Puzzle 3: There are perfectly specific but ‘algorithmically random’ sequences of bits, which can’t be predicted well by any program. If we use these to generate a uniform algorithmically random number between 0 and 1, and use the strategy Egan describes, will our chance of choosing the larger number be more than 50%, or not?

Answer: As Philip Gibbs noted, a deterministic pseudo-random number generator is still deterministic. Using a specific sequence of algorithmically random bits

(b_1, b_2, \dots )

to construct a number z between 0 and 1 means z takes on the specific value:

\displaystyle{ z_0 = \sum_{i=1}^\infty b_i 2^{-i} }

So rather than sticking with x with probability f(x) for our monotonically increasing function f, we end up always sticking with x if z_0 \le f(x), and always swapping if z_0 > f(x). This is just using a function S:\mathbb{R} \to \{0,1\} as in Puzzle 2, with:

S(x) = 0 if x < f^{-1}(z_0)

S(x) = 1 if x \ge f^{-1}(z_0)

So all the same consequences as in Puzzle 2 apply, and we cannot guarantee a more than 50% chance of choosing the larger number.

Puzzle 3 emphasizes the huge gulf between ‘true randomness’, where we only have a probability distribution of numbers z, and the situation where we have a specific number z_0, generated by any means whatsoever.

We could generate z_0 using a pseudorandom number generator, radioactive decay of atoms, an oracle whose randomness is certified by all the Greek gods, or whatever. No matter how randomly z_0 is generated, once we have it, we know there exist choices for the first player that reduce us to pure chance: our probability of winning drops to exactly 50%!
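Here is a quick sketch of why (Python; f and z_0 are arbitrary choices). Once z_0 is fixed, ‘stick iff z_0 \le f(a)’ is a threshold rule with threshold f^{-1}(z_0), so a first player who knows z_0 can put both numbers on the same side of that threshold:

import math
import random

def f(x):
    return 1 / (1 + math.exp(-x))

z0 = 0.617                          # our fixed, pre-generated "random" number
t = math.log(z0 / (1 - z0))         # f^{-1}(z0): we stick iff the number we see >= t

x, y = t + 2.0, t + 1.0             # adversary: both numbers above the threshold
trials = 1_000_000
wins = 0
for _ in range(trials):
    a, other = random.choice([(x, y), (y, x)])
    chosen = a if z0 <= f(a) else other   # with both above t, we always stick
    wins += chosen == x
print(wins / trials)                # ~ 0.5: the advantage is gone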

This may seem weird at first, but if you think about simple games of luck you’ll see it’s completely ordinary. We can have a more than 50% chance of winning such a game even if, for any particular play we make, the other player has a move that ensures our defeat. That’s just how randomness works.

Trends in Reaction Network Theory (Part 2)

1 July, 2015

Here in Copenhagen we’ll soon be having a bunch of interesting talks on chemical reaction networks:

Workshop on Mathematical Trends in Reaction Network Theory, 1-3 July 2015, Department of Mathematical Sciences, University of Copenhagen. Organized by Elisenda Feliu and Carsten Wiuf.

Looking through the abstracts, here are a couple that strike me.

First of all, Gheorghe Craciun claims to have proved the biggest open conjecture in this field: the Global Attractor Conjecture!

• Gheorghe Craciun, Toric differential inclusions and a proof of the global attractor conjecture.

This famous old conjecture says that for a certain class of chemical reactions, the ones coming from ‘complex balanced reaction networks’, the chemicals will approach equilibrium no matter what their initial concentrations are. Here’s what Craciun says:

Abstract. In a groundbreaking 1972 paper Fritz Horn and Roy Jackson showed that a complex balanced mass-action system must have a unique locally stable equilibrium within any compatibility class. In 1974 Horn conjectured that this equilibrium is a global attractor, i.e., all solutions in the same compatibility class must converge to this equilibrium. Later, this claim was called the Global Attractor Conjecture, and it was shown that it has remarkable implications for the dynamics of large classes of polynomial and power-law dynamical systems, even if they are not derived from mass-action kinetics. Several special cases of this conjecture have been proved during the last decade. We describe a proof of the conjecture in full generality. In particular, it will follow that all detailed balanced mass action systems and all deficiency zero mass-action systems have the global attractor property. We will also discuss some implications for biochemical mechanisms that implement noise filtering and cellular homeostasis.

Manoj Gopalkrishnan wrote a great post explaining the concept of a complex balanced reaction network here on Azimuth, so if you want to understand the conjecture you could start there.

Even better, Manoj is talking here about a way to do statistical inference with chemistry! His talk is called ‘Statistical inference with a chemical soup’:

Abstract. The goal is to design an “intelligent chemical soup” that can do statistical inference. This may have niche technological applications in medicine and biological research, as well as provide fundamental insight into the workings of biochemical reaction pathways. As a first step towards our goal, we describe a scheme that exploits the remarkable mathematical similarity between log-linear models in statistics and chemical reaction networks. We present a simple scheme that encodes the information in a log-linear model as a chemical reaction network. Observed data is encoded as initial concentrations, and the equilibria of the corresponding mass-action system yield the maximum likelihood estimators. The simplicity of our scheme suggests that molecular environments, especially within cells, may be particularly well suited to performing statistical computations.

It’s based on this paper:

• Manoj Gopalkrishnan, A scheme for molecular computation of maximum likelihood estimators for log-linear models.
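I have not checked the paper’s details, but here is a toy computation in the spirit of this scheme (my own illustration, not necessarily the paper’s construction). Run the reversible reaction X_{11} + X_{22} \leftrightarrow X_{12} + X_{21} with equal rate constants, starting from an observed 2×2 table of counts. Mass-action dynamics preserves the row and column sums and equilibrates when x_{11} x_{22} = x_{12} x_{21}, and the table with the observed margins and odds ratio 1 is precisely the maximum likelihood estimate for the independence model (Python, assuming numpy):

import numpy as np

# Observed 2x2 table, flattened as (x11, x12, x21, x22)
x = np.array([30.0, 10.0, 5.0, 15.0])

def step(x, dt=1e-3):
    x11, x12, x21, x22 = x
    flux = x11 * x22 - x12 * x21    # forward minus reverse mass-action rate
    return x + dt * np.array([-flux, flux, flux, -flux])

for _ in range(10_000):             # crude forward-Euler integration
    x = step(x)

x11, x12, x21, x22 = x
print(x)                            # compare with (row sum)*(column sum)/total
print(x11 * x22 - x12 * x21)        # ~ 0 at equilibrium: odds ratio is 1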

I’m not sure, but this idea may exploit existing analogies between the approach to equilibrium in chemistry, the approach to equilibrium in evolutionary game theory, and statistical inference. You may have read Marc Harper’s post about that stuff!

David Doty is giving a broader review of ‘Computation by (not about) chemistry’:

Abstract. The model of chemical reaction networks (CRNs) is extensively used throughout the natural sciences as a descriptive language for existing chemicals. If we instead think of CRNs as a programming language for describing artificially engineered chemicals, what sorts of computations are possible for these chemicals to achieve? The answer depends crucially on several formal choices:

1) Do we treat matter as infinitely divisible (real-valued concentrations) or atomic (integer-valued counts)?

2) How do we represent the input and output of the computation (e.g., Boolean presence or absence of species, positive numbers directly represented by counts/concentrations, positive and negative numbers represented indirectly by the difference between counts/concentrations of a pair of species)?

3) Do we assume mass-action rate laws (reaction rates proportional to reactant counts/concentrations) or do we insist the system works correctly under a broader class of rate laws?

The talk will survey several recent results and techniques. A primary goal of the talk is to convey the “programming perspective”: rather than asking “What does chemistry do?”, we want to understand “What could chemistry do?” as well as “What can chemistry provably not do?”
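In case the phrase ‘mass-action rate laws’ in item 3 is unfamiliar: each reaction proceeds at a rate proportional to the product of its reactant concentrations. Here is a minimal sketch for a single reversible reaction, with made-up rate constants (Python, assuming numpy):

import numpy as np

# Toy network: A + B <-> C
k_fwd, k_rev = 2.0, 1.0

def rates(x):
    a, b, c = x
    flux = k_fwd * a * b - k_rev * c   # mass-action: rate ~ product of reactants
    return np.array([-flux, -flux, flux])

x = np.array([1.0, 0.8, 0.0])          # initial concentrations of A, B, C
dt = 1e-3
for _ in range(200_000):               # crude forward-Euler integration
    x = x + dt * rates(x)

a, b, c = x
print(x, c / (a * b))                  # at equilibrium c/(a*b) ~ k_fwd/k_rev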

I’m really interested in chemical reaction networks that appear in biological systems, and there will be lots of talks about that. For example, Ovidiu Radulescu will talk about ‘Taming the complexity of biochemical networks through model reduction and tropical geometry’. Model reduction is the process of simplifying complicated models while preserving at least some of their good features. Tropical geometry is a cool version of algebraic geometry that uses the real numbers, together with +∞, with minimization as addition and ordinary addition as multiplication. This number system underlies the principle of least action, or the principle of minimum energy; I’ve included a small computational sketch of it right after the abstract. Here is Radulescu’s abstract:

Abstract. Biochemical networks are used as models of cellular physiology with diverse applications in biology and medicine. In the absence of objective criteria to detect essential features and prune secondary details, networks generated from data are too big, and therefore beyond the reach of many mathematical tools for studying their dynamics and behavior under perturbations. However, under circumstances that we can generically denote by multi-scaleness, large biochemical networks can be approximated by smaller and simpler networks. Model reduction is a way to find these simpler models that can be more easily analyzed. We discuss several model reduction methods for biochemical networks with polynomial or rational rate functions and propose as their common denominator the notion of tropical equilibration, meaning finite intersection of tropical varieties in algebraic geometry. Using tropical methods, one can strongly reduce the number of variables and parameters of a biochemical network. For multi-scale networks, these reductions are computed symbolically on orders of magnitude of parameters and variables, and are valid in wide domains of parameter and phase spaces.
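Since tropical arithmetic may be unfamiliar, here is the promised sketch (Python; the example graph is mine). In the min-plus rig, ‘addition’ is min and ‘multiplication’ is ordinary +, so tropical matrix multiplication computes cheapest paths, with minimization built right into the algebra:

INF = float('inf')   # the additive identity: min(x, INF) = x

def trop_matmul(A, B):
    """Tropical matrix product: (AB)[i][j] = min over k of A[i][k] + B[k][j]."""
    return [[min(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Edge-weight matrix of a small directed graph (INF = no edge, 0 on the diagonal)
W = [[0, 3, INF],
     [INF, 0, 1],
     [2, INF, 0]]
print(trop_matmul(W, W))   # entry (i, j): cheapest walk from i to j in <= 2 steps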

I’m talking about the analogy between probabilities and quantum amplitudes, and how this makes chemistry analogous to particle physics. You can see two versions of my talk here, but I’ll be giving the ‘more advanced’ version, which is new:

Probabilities versus amplitudes.

Abstract. Some ideas from quantum theory are just beginning to percolate back to classical probability theory. For example, the master equation for a chemical reaction network describes the interactions of molecules in a stochastic rather than quantum way. If we look at it from the perspective of quantum theory, this formalism turns out to involve creation and annihilation operators, coherent states and other well-known ideas—but with a few big differences.

Anyway, there are a lot more talks, but if I don’t have breakfast and walk over to the math department, I’ll miss them!

You can learn more about individual talks in the comments here (see below) and also in Matteo Polettini’s blog:

• Matteo Polettini, Mathematical trends in reaction network theory: part 1 and part 2, Out of Equilibrium, 1 July 2015.

PROPs for Linear Systems

18 May, 2015

Eric Drexler likes to say: engineering is dual to science, because science tries to understand what the world does, while engineering is about getting the world to do what you want. I think we need a slightly less ‘coercive’, more ‘cooperative’ approach to the world in order to develop ‘ecotechnology’, but it’s still a useful distinction.

For example, classical mechanics is the study of what things do when they follow Newton’s laws. Control theory is the study of what you can get them to do.

Say you have an upside-down pendulum on a cart. Classical mechanics says what it will do. But control theory says: if you watch the pendulum and use what you see to move the cart back and forth correctly, you can make sure the pendulum doesn’t fall over!

Control theorists do their work with the help of ‘signal-flow diagrams’. For example, here is the signal-flow diagram for an inverted pendulum on a cart:

When I take a look at a diagram like this, I say to myself: that’s a string diagram for a morphism in a monoidal category! And it’s true. Jason Erbele and I wrote a paper explaining this. Independently, Bonchi, Sobociński and Zanasi did some closely related work:

• John Baez and Jason Erbele, Categories in control.

• Filippo Bonchi, Paweł Sobociński and Fabio Zanasi, Interacting Hopf algebras.

• Filippo Bonchi, Paweł Sobociński and Fabio Zanasi, A categorical semantics of signal flow graphs.

I’ll explain some of the ideas at the Turin meeting on the categorical foundations of network theory. But I also want to talk about this new paper that Simon Wadsley of Cambridge University wrote with my student Nick Woods:

• Simon Wadsley and Nick Woods, PROPs for linear systems.

This makes the picture neater and more general!

You see, Jason and I used signal flow diagrams to give a new description of the category of finite-dimensional vector spaces and linear maps. This category plays a big role in the control theory of linear systems. Bonchi, Sobociński and Zanasi gave a closely related description of an equivalent category, \mathrm{Mat}(k), where:

• objects are natural numbers, and

• a morphism f : m \to n is an n \times m matrix with entries in the field k,

and composition is given by matrix multiplication.

But Wadsley and Woods generalized all this work to cover \mathrm{Mat}(R) whenever R is a commutative rig. A rig is a ‘ring without negatives’—like the natural numbers. We can multiply matrices valued in any rig, and this includes some very useful examples… as I’ll explain later.
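To see what composition in \mathrm{Mat}(R) looks like concretely, here is a minimal sketch with the rig passed in as data (Python; the helper mat_mul and the example matrices are my illustration, not from the paper):

def mat_mul(A, B, add, mul, zero):
    """Compose morphisms in Mat(R): ordinary matrix multiplication,
    but with entries combined using the rig's add and mul.
    mat_mul(A, B) is the composite 'do B, then A'."""
    n, k, m = len(A), len(B), len(B[0])
    assert len(A[0]) == k
    C = [[zero] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = zero
            for l in range(k):
                acc = add(acc, mul(A[i][l], B[l][j]))
            C[i][j] = acc
    return C

# The natural numbers form a rig: no negatives are needed anywhere above.
A = [[1, 2], [0, 1]]
B = [[3, 0], [1, 1]]
print(mat_mul(A, B, add=lambda a, b: a + b, mul=lambda a, b: a * b, zero=0))

Plugging in a different rig, like the booleans below, changes what the same formula computes.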

Wadsley and Woods proved:

Theorem. Whenever R is a commutative rig, \mathrm{Mat}(R) is the PROP for bicommutative bimonoids over R.

This result is quick to state, but it takes a bit of explaining! So, let me start by bringing in some definitions.

Bicommutative bimonoids

We will work in any symmetric monoidal category, and draw morphisms as string diagrams.

A commutative monoid is an object equipped with a multiplication:

and a unit:

obeying these laws:

For example, suppose \mathrm{FinVect}_k is the symmetric monoidal category of finite-dimensional vector spaces over a field k, with direct sum as its tensor product. Then any object V \in \mathrm{FinVect}_k is a commutative monoid where the multiplication is addition:

(x,y) \mapsto x + y

and the unit is zero: that is, the unique map from the zero-dimensional vector space to V.

Turning all this upside down, a cocommutative comonoid has a comultiplication:

and a counit:

obeying these laws:

For example, consider our vector space V \in \mathrm{FinVect}_k again. It’s a cocommutative comonoid where the comultiplication is duplication:

x \mapsto (x,x)

and the counit is deletion: that is, the unique map from V to the zero-dimensional vector space.

Given an object that’s both a commutative monoid and a cocommutative comonoid, we say it’s a bicommutative bimonoid if these extra axioms hold:

You can check that these are true for our running example of a finite-dimensional vector space V. The most exciting one is the top one, which says that adding two vectors and then duplicating the result is the same as duplicating each one, then adding them appropriately.

Our example has some other properties, too! Each element c \in k defines a morphism from V to itself, namely scalar multiplication by c:

x \mapsto c x

We draw this as follows:

These morphisms are compatible with the ones so far:

Moreover, all the ‘rig operations’ in k—that is, addition, multiplication, 0 and 1, but not subtraction or division—can be recovered from what we have so far:

We summarize this by saying our vector space V is a bicommutative bimonoid ‘over k’.

More generally, suppose we have a bicommutative bimonoid A in a symmetric monoidal category. Let \mathrm{End}(A) be the set of bicommutative bimonoid homomorphisms from A to itself. This is actually a rig: there’s a way to add these homomorphisms, and also a way to ‘multiply’ them (namely, compose them).

Suppose R is any commutative rig. Then we say A is a bicommutative bimonoid over R if it’s equipped with a rig homomorphism

\Phi : R \to \mathrm{End}(A)

This is a way of summarizing the diagrams I just showed you! You see, each c \in R gives a morphism from A to itself, which we write as

The fact that this is a bicommutative bimonoid endomorphism says precisely this:

And the fact that \Phi is a rig homomorphism says precisely this:

So sometimes the right word is worth a dozen pictures!

What Jason and I showed is that for any field k, \mathrm{FinVect}_k is the free symmetric monoidal category on a bicommutative bimonoid over k. This means that the above rules, which are rules for manipulating signal flow diagrams, completely characterize the world of linear algebra!

Bonchi, Sobociński and Zanasi used ‘PROPs’ to prove a similar result where the field is replaced by a sufficiently nice commutative ring. And Wadsley and Woods used PROPs to generalize even further to the case of an arbitrary commutative rig!

But what are PROPs?


A PROP is a particularly tractable sort of symmetric monoidal category: a strict symmetric monoidal category where the objects are natural numbers and the tensor product of objects is given by ordinary addition. The symmetric monoidal category \mathrm{FinVect}_k is equivalent to the PROP \mathrm{Mat}(k), where a morphism f : m \to n is an n \times m matrix with entries in k, composition of morphisms is given by matrix multiplication, and the tensor product of morphisms is the direct sum of matrices.
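Concretely, here is a minimal numpy sketch (with arbitrary example matrices): composition in \mathrm{Mat}(k) is matrix multiplication, and the tensor product of morphisms is the block-diagonal direct sum:

import numpy as np

f = np.array([[1.0, 2.0]])      # a morphism 2 -> 1, i.e. a 1x2 matrix
g = np.array([[0.0], [3.0]])    # a morphism 1 -> 2, i.e. a 2x1 matrix

def direct_sum(A, B):
    """The tensor product of morphisms: a block-diagonal matrix."""
    n1, m1 = A.shape
    n2, m2 = B.shape
    C = np.zeros((n1 + n2, m1 + m2))
    C[:n1, :m1] = A
    C[n1:, m1:] = B
    return C

print(g @ f)                    # the composite 'f then g', a morphism 2 -> 2
print(direct_sum(f, g))         # f (+) g, a morphism 3 -> 3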

We can define a similar PROP \mathrm{Mat}(R) whenever R is a commutative rig, and Wadsley and Woods gave an elegant description of the ‘algebras’ of \mathrm{Mat}(R). Suppose C is a PROP and D is a strict symmetric monoidal category. Then the category of algebras of C in D is the category of strict symmetric monoidal functors F : C \to D and natural transformations between these.

If for every choice of D the category of algebras of C in D is equivalent to the category of algebraic structures of some kind in D, we say C is the PROP for structures of that kind. This explains the theorem Wadsley and Woods proved:

Theorem. Whenever R is a commutative rig, \mathrm{Mat}(R) is the PROP for bicommutative bimonoids over R.

The fact that an algebra of \mathrm{Mat}(R) is a bicommutative bimonoid is equivalent to all this stuff:

The fact that \Phi(c) is a bimonoid homomorphism for all c \in R is equivalent to this stuff:

And the fact that \Phi is a rig homomorphism is equivalent to this stuff:

This is a great result because it includes some nice new examples.

First, the commutative rig of natural numbers gives a PROP \mathrm{Mat}(\mathbb{N}). This is equivalent to the symmetric monoidal category \mathrm{FinSpan}, where morphisms are isomorphism classes of spans of finite sets, with disjoint union as the tensor product. Steve Lack had already shown that \mathrm{FinSpan} is the PROP for bicommutative bimonoids. But this also follows from the result of Wadsley and Woods, since every bicommutative bimonoid V is automatically equipped with a unique rig homomorphism

\Phi : \mathbb{N} \to \mathrm{End}(V)

Second, the commutative rig of booleans

\mathbb{B} = \{F,T\}

with ‘or’ as addition and ‘and’ as multiplication gives a PROP \mathrm{Mat}(\mathbb{B}). This is equivalent to the symmetric monoidal category \mathrm{FinRel} where morphisms are relations between finite sets, with disjoint union as the tensor product. Samuel Mimram had already shown that this is the PROP for special bicommutative bimonoids, meaning those where comultiplication followed by multiplication is the identity:

But again, this follows from the general result of Wadsley and Woods!
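To see \mathrm{Mat}(\mathbb{B}) and \mathrm{FinRel} matching up concretely: with ‘or’ as + and ‘and’ as ×, boolean matrix multiplication is exactly composition of relations. A tiny sketch (Python; the example relations are arbitrary):

def bool_compose(S, R):
    """(S o R)[i][j] = OR over k of (S[i][k] AND R[k][j]):
    the boolean matrix product is composition of relations."""
    return [[any(S[i][k] and R[k][j] for k in range(len(R)))
             for j in range(len(R[0]))] for i in range(len(S))]

R = [[True, False],    # R relates element 0 to 0 ...
     [True, True]]     # ... and element 1 to both 0 and 1
S = [[False, True],
     [True, True]]
print(bool_compose(S, R))   # [[True, True], [True, True]]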

Finally, taking the commutative ring of integers \mathbb{Z}, Wadsley and Woods showed that \mathrm{Mat}(\mathbb{Z}) is the PROP for bicommutative Hopf monoids. The key here is that scalar multiplication by -1 obeys the axioms for an antipode—the extra morphism that makes a bimonoid into a Hopf monoid. Here are those axioms:

More generally, whenever R is a commutative ring, the presence of -1 \in R guarantees that a bimonoid over R is automatically a Hopf monoid over R. So, when R is a commutative ring, Wadsley and Woods’ result implies that \mathrm{Mat}(R) is the PROP for Hopf monoids over R.

Earlier, in their paper on ‘interacting Hopf algebras’, Bonchi, Sobociński and Zanasi had given an elegant and very different proof that \mathrm{Mat}(R) is the PROP for Hopf monoids over R whenever R is a principal ideal domain. The advantage of their argument is that they build up the PROP for Hopf monoids over R from smaller pieces, using some ideas developed by Steve Lack. But the new argument by Wadsley and Woods has its own charm.

In short, we’re getting the diagrammatics of linear algebra worked out very nicely, providing a solid mathematical foundation for signal flow diagrams in control theory!

