Statistical Laws of Darwinian Evolution

18 April, 2016

guest post by Matteo Smerlak

Biologists like Steven J. Gould like to emphasize that evolution is unpredictable. They have a point: there is absolutely no way an alien visiting the Earth 400 million years ago could have said:

Hey, I know what’s gonna happen here. Some descendants of those ugly fish will grow wings and start flying in the air. Others will walk the surface of the Earth for a few million years, but they’ll get bored and they’ll eventually go back to the oceans; when they do, they’ll be able to chat across thousands of kilometers using ultrasound. Yet others will grow arms, legs, fur, they’ll climb trees and invent BBQ, and, sooner or later, they’ll start wondering “why all this?”.

Nor can we tell if, a week from now, the flu virus will mutate, become highly pathogenic and forever remove the furry creatures from the surface of the Earth.

Evolution isn’t gravity—we can’t tell in which directions things will fall down.

One reason we can’t predict the outcomes of evolution is that genomes evolve in a super-high dimensional combinatorial space, which a ginormous number of possible turns at every step. Another is that living organisms interact with one another in a massively non-linear way, with, feedback loops, tipping points and all that jazz.

Life’s a mess, if you want my physicist’s opinion.

But that doesn’t mean that nothing can be predicted. Think of statistics. Nobody can predict who I’ll vote for in the next election, but it’s easy to tell what the distribution of votes in the country will be like. Thus, for continuous variables which arise as sums of large numbers of independent components, the central limit theorem tells us that the distribution will always be approximately normal. Or take extreme events: the max of N independent random variables is distributed according to a member of a one-parameter family of so-called “extreme value distributions”: this is the content of the famous Fisher–Tippett–Gnedenko theorem.

So this is the problem I want to think about in this blog post: is evolution ruled by statistical laws? Or, in physics terms: does it exhibit some form of universality?

Fitness distributions are the thing

One lesson from statistical physics is that, to uncover universality, you need to focus on relevant variables. In the case of evolution, it was Darwin’s main contribution to figure out the main relevant variable: the average number of viable offspring, aka fitness, of an organism. Other features—physical strength, metabolic efficiency, you name it—matter only insofar as they are correlated with fitness. If we further assume that fitness is (approximately) heritable, meaning that descendants have the same fitness as their ancestors, we get a simple yet powerful dynamical principle called natural selection: in a given population, the lineage with the highest fitness eventually dominates, i.e. its fraction goes to one over time. This principle is very general: it applies to genes and species, but also to non-living entities such as algorithms, firms or language. The general relevance of natural selection as a evolutionary force is sometimes referred to as “Universal Darwinism”.

The general idea of natural selection is pictured below (reproduced from this paper):

It’s not hard to write down an equation which expresses natural selection in general terms. Consider an infinite population in which each lineage grows with some rate x. (This rate is called the log-fitness or Malthusian fitness to contrast it with the number of viable offspring w=e^{x\Delta t} with \Delta t the lifetime of a generation. It’s more convenient to use x than w in what follows, so we’ll just call x “fitness”). Then the distribution of fitness at time t satisfies the equation

\displaystyle{ \frac{\partial p_t(x)}{\partial t} =\left(x-\int d y\, y\, p_t(y)\right)p_t(x) }

whose explicit solution in terms of the initial fitness distribution p_0(x):

\displaystyle{ p_t(x)=\frac{e^{x t}p_0(x)}{\int d y\, e^{y t}p_0(y)} }

is called the Cramér transform of p_0(x) in large deviations theory. That is, viewed as a flow in the space of probability distributions, natural selection is nothing but a time-dependent exponential tilt. (These equations and the results below can be generalized to include the effect of mutations, which are critical to maintain variation in the population, but we’ll skip this here to focus on pure natural selection. See my paper referenced below for more information.)

An immediate consequence of these equations is that the mean fitness \mu_t=\int dx\, x\, p_t(x) grows monotonically in time, with a rate of growth given by the variance \sigma_t^2=\int dx\, (x-\mu_t)^2\, p_t(x):

\displaystyle{ \frac{d\mu_t}{dt}=\sigma_t^2\geq 0 }

The great geneticist Ronald Fisher (yes, the one in the extreme value theorem!) was very impressed with this relationship. He thought it amounted to an biological version of the second law of thermodynamics, writing in his 1930 monograph

Professor Eddington has recently remarked that “The law that entropy always increases—the second law of thermodynamics—holds, I think, the supreme position among the laws of nature”. It is not a little instructive that so similar a law should hold the supreme position among the biological sciences.

Unfortunately, this excitement hasn’t been shared by the biological community, notably because this Fisher “fundamental theorem of natural selection” isn’t predictive: the mean fitness \mu_t grows according to the fitness variance \sigma_t^2, but what determines the evolution of \sigma_t^2? I can’t use the identity above to predict the speed of evolution in any sense. Geneticists say it’s “dynamically insufficient”.

Two limit theorems

But the situation isn’t as bad as it looks. The evolution of p_t(x) may be decomposed into the evolution of its mean \mu_t, of its variance \sigma_t^2, and of its shape or type

\overline{p}_t(x)=\sigma_t p_t(\sigma_t x+\mu_t).

(We also call \overline{p}_t(x) the “standardized fitness distribution”.) With Ahmed Youssef we showed that:

• If p_0(x) is supported on the whole real line and decays at infinity as

-\ln\int_x^{\infty}p_0(y)d y\underset{x\to\infty}{\sim} x^{\alpha}

for some \alpha > 1, then \mu_t\sim t^{\overline{\alpha}-1}, \sigma_t^2\sim t^{\overline{\alpha}-2} and \overline{p}_t(x) converges to the standard normal distribution as t\to\infty. Here \overline{\alpha} is the conjugate exponent to \alpha, i.e. 1/\overline{\alpha}+1/\alpha=1.

• If p_0(x) has a finite right-end point x_+ with

p(x)\underset{x\to x_+}{\sim} (x_+-x)^\beta

for some \beta\geq0, then x_+-\mu_t\sim t^{-1}, \sigma_t^2\sim t^{-2} and \overline{p}_t(x) converges to the flipped gamma distribution

\displaystyle{ p^*_\beta(x)= \frac{(1+\beta)^{(1+\beta)/2}}{\Gamma(1+\beta)} \Theta[x-(1+\beta)^{1/2}] }

\displaystyle { e^{-(1+\beta)^{1/2}[(1+\beta)^{1/2}-x]}\Big[(1+\beta)^{1/2}-x\Big]^\beta }

Here and below the symbol \sim means “asymptotically equivalent up to a positive multiplicative constant”; \Theta(x) is the Heaviside step function. Note that p^*_\beta(x) becomes Gaussian in the limit \beta\to\infty, i.e. the attractors of cases 1 and 2 form a continuous line in the space of probability distributions; the other extreme case, \beta\to0, corresponds to a flipped exponential distribution.

The one-parameter family of attractors p_\beta^*(x) is plotted below:

These results achieve two things. First, they resolve the dynamical insufficiency of Fisher’s fundamental theorem by giving estimates of the speed of evolution in terms of the tail behavior of the initial fitness distribution. Second, they show that natural selection is indeed subject to a form of universality, whereby the relevant statistical structure turns out to be finite dimensional, with only a handful of “conserved quantities” (the \alpha and \beta exponents) controlling the late-time behavior of natural selection. This amounts to a large reduction in complexity and, concomitantly, an enhancement of predictive power.

(For the mathematically-oriented reader, the proof of the theorems above involves two steps: first, translate the selection equation into a equation for (cumulant) generating functions; second, use a suitable Tauberian theorem—the Kasahara theorem—to relate the behavior of generating functions at large values of their arguments to the tail behavior of p_0(x). Details in our paper.)

It’s useful to consider the convergence of fitness distributions to the attractors p_\beta^*(x) for 0\leq\beta\leq \infty in the skewness-kurtosis plane, i.e. in terms of the third and fourth cumulants of p_t(x).

The red curve is the family of attractors, with the normal at the bottom right and the flipped exponential at the top left, and the dots correspond to numerical simulations performed with the classical Wright–Fisher model and with a simple genetic algorithm solving a linear programming problem. The attractors attract!

Conclusion and a question

Statistics is useful because limit theorems (the central limit theorem, the extreme value theorem) exist. Without them, we wouldn’t be able to make any population-level prediction. Same with statistical physics: it only because matter consists of large numbers of atoms, and limit theorems hold (the H-theorem, the second law), that macroscopic physics is possible in the first place. I believe the same perspective is useful in evolutionary dynamics: it’s true that we can’t predict how many wings birds will have in ten million years, but we can tell what shape fitness distributions should have if natural selection is true.

I’ll close with an open question for you, the reader. In the central limit theorem as well as in the second law of thermodynamics, convergence is driven by a Lyapunov function, namely entropy. (In the case of the central limit theorem, it’s a relatively recent result by Arstein et al.: the entropy of the normalized sum of n i.i.d. random variables, when it’s finite, is a monotonically increasing function of n.) In the case of natural selection for unbounded fitness, it’s clear that entropy will also be eventually monotonically increasing—the normal is the distribution with largest entropy at fixed variance and mean.

Yet it turns out that, in our case, entropy isn’t monotonic at all times; in fact, the closer the initial distribution p_0(x) is to the normal distribution, the later the entropy of the standardized fitness distribution starts to increase. Or, equivalently, the closer the initial distribution p_0(x) to the normal, the later its relative entropy with respect to the normal. Why is this? And what’s the actual Lyapunov function for this process (i.e., what functional of the standardized fitness distribution is monotonic at all times under natural selection)?

In the plots above the blue, orange and green lines correspond respectively to

\displaystyle{ p_0(x)\propto e^{-x^2/2-x^4}, \quad p_0(x)\propto e^{-x^2/2-.01x^4}, \quad p_0(x)\propto e^{-x^2/2-.001x^4} }


• S. J. Gould, Wonderful Life: The Burgess Shale and the Nature of History, W. W. Norton & Co., New York, 1989.

• M. Smerlak and A. Youssef, Limiting fitness distributions in evolutionary dynamics, 2015.

• R. A. Fisher, The Genetical Theory of Natural Selection, Oxford University Press, Oxford, 1930.

• S. Artstein, K. Ball, F. Barthe and A. Naor, Solution of Shannon’s problem on the monotonicity of entropy, J. Am. Math. Soc. 17 (2004), 975–982.

Diamonds and Triamonds

11 April, 2016

The structure of a diamond crystal is fascinating. But there’s an equally fascinating form of carbon, called the triamond, that’s theoretically possible but never yet seen in nature. Here it is:


In the triamond, each carbon atom is bonded to three others at 120° angles, with one double bond and two single bonds. Its bonds lie in a plane, so we get a plane for each atom.

But here’s the tricky part: for any two neighboring atoms, these planes are different. In fact, if we draw the bond planes for all the atoms in the triamond, they come in four kinds, parallel to the faces of a regular tetrahedron!

If we discount the difference between single and double bonds, the triamond is highly symmetrical. There’s a symmetry carrying any atom and any of its bonds to any other atom and any of its bonds. However, the triamond has an inherent handedness, or chirality. It comes in two mirror-image forms.

A rather surprising thing about the triamond is that the smallest rings of atoms are 10-sided. Each atom lies in 15 of these 10-sided rings.

Some chemists have argued that the triamond should be ‘metastable’ at room temperature and pressure: that is, it should last for a while but eventually turn to graphite. Diamonds are also considered metastable, though I’ve never seen anyone pull an old diamond ring from their jewelry cabinet and discover to their shock that it’s turned to graphite. The big difference is that diamonds are formed naturally under high pressure—while triamonds, it seems, are not.

Nonetheless, the mathematics behind the triamond does find its way into nature. A while back I told you about a minimal surface called the ‘gyroid’, which is found in many places:

The physics of butterfly wings.

It turns out that the pattern of a gyroid is closely connected to the triamond! So, if you’re looking for a triamond-like pattern in nature, certain butterfly wings are your best bet:

• Matthias Weber, The gyroids: algorithmic geometry III, The Inner Frame, 23 October 2015.

Instead of trying to explain it here, I’ll refer you to the wonderful pictures at Weber’s blog.

Building the triamond

I want to tell you a way to build the triamond. I saw it here:

• Toshikazu Sunada, Crystals that nature might miss creating, Notices of the American Mathematical Society 55 (2008), 208–215.

This is the paper that got people excited about the triamond, though it was discovered much earlier by the crystallographer Fritz Laves back in 1932, and Coxeter named it the Laves graph.

To build the triamond, we can start with this graph:


It’s called \mathrm{K}_4, since it’s the complete graph on four vertices, meaning there’s one edge between each pair of vertices. The vertices correspond to four different kinds of atoms in the triamond: let’s call them red, green, yellow and blue. The edges of this graph have arrows on them, labelled with certain vectors

e_1, e_2, e_3, f_1, f_2, f_3 \in \mathbb{R}^3

Let’s not worry yet about what these vectors are. What really matters is this: to move from any atom in the triamond to any of its neighbors, you move along the vector labeling the edge between them… or its negative, if you’re moving against the arrow.

For example, suppose you’re at any red atom. It has 3 nearest neighbors, which are blue, green and yellow. To move to the blue neighbor you add f_1 to your position. To move to the green one you subtract e_2, since you’re moving against the arrow on the edge connecting blue and green. Similarly, to go to the yellow neighbor you subtract the vector f_3 from your position.

Thus, any path along the bonds of the triamond determines a path in the graph \mathrm{K}_4.

Conversely, if you pick an atom of some color in the triamond, any path in \mathrm{K}_4 starting from the vertex of that color determines a path in the triamond! However, going around a loop in \mathrm{K}_4 may not get you back to the atom you started with in the triamond.

Mathematicians summarize these facts by saying the triamond is a ‘covering space’ of the graph \mathrm{K}_4.

Now let’s see if you can figure out those vectors.

Puzzle 1. Find vectors e_1, e_2, e_3, f_1, f_2, f_3 \in \mathbb{R}^3 such that:

A) All these vectors have the same length.

B) The three vectors coming out of any vertex lie in a plane at 120° angles to each other:


For example, f_1, -e_2 and -f_3 lie in a plane at 120° angles to each other. We put in two minus signs because two arrows are pointing into the red vertex.

C) The four planes we get this way, one for each vertex, are parallel to the faces of a regular tetrahedron.

If you want, you can even add another constraint:

D) All the components of the vectors e_1, e_2, e_3, f_1, f_2, f_3 are integers.

Diamonds and hyperdiamonds

That’s the triamond. Compare the diamond:

Here each atom of carbon is connected to four others. This pattern is found not just in carbon but also other elements in the same column of the periodic table: silicon, germanium, and tin. They all like to hook up with four neighbors.

The pattern of atoms in a diamond is called the diamond cubic. It’s elegant but a bit tricky. Look at it carefully!

To build it, we start by putting an atom at each corner of a cube. Then we put an atom in the middle of each face of the cube. If we stopped there, we would have a face-centered cubic. But there are also four more carbons inside the cube—one at the center of each tetrahedron we’ve created.

If you look really carefully, you can see that the full pattern consists of two interpenetrating face-centered cubic lattices, one offset relative to the other along the cube’s main diagonal.

The face-centered cubic is the 3-dimensional version of a pattern that exists in any dimension: the Dn lattice. To build this, take an n-dimensional checkerboard and alternately color the hypercubes red and black. Then, put a point in the center of each black hypercube!

You can also get the Dn lattice by taking all n-tuples of integers that sum to an even integer. Requiring that they sum to something even is a way to pick out the black hypercubes.

The diamond is also an example of a pattern that exists in any dimension! I’ll call this the hyperdiamond, but mathematicians call it Dn+, because it’s the union of two copies of the Dn lattice. To build it, first take all n-tuples of integers that sum to an even integer. Then take all those points shifted by the vector (1/2, …, 1/2).

In any dimension, the volume of the unit cell of the hyperdiamond is 1, so mathematicians say it’s unimodular. But only in even dimensions is the sum or difference of any two points in the hyperdiamond again a point in the hyperdiamond. Mathematicians call a discrete set of points with this property a lattice.

If even dimensions are better than odd ones, how about dimensions that are multiples of 4? Then the hyperdiamond is better still: it’s an integral lattice, meaning that the dot product of any two vectors in the lattice is again an integer.

And in dimensions that are multiples of 8, the hyperdiamond is even better. It’s even, meaning that the dot product of any vector with itself is even.

In fact, even unimodular lattices are only possible in Euclidean space when the dimension is a multiple of 8. In 8 dimensions, the only even unimodular lattice is the 8-dimensional hyperdiamond, which is usually called the E8 lattice. The E8 lattice is one of my favorite entities, and I’ve written a lot about it in this series:

Integral octonions.

To me, the glittering beauty of diamonds is just a tiny hint of the overwhelming beauty of E8.

But let’s go back down to 3 dimensions. I’d like to describe the diamond rather explicitly, so we can see how a slight change produces the triamond.

It will be less stressful if we double the size of our diamond. So, let’s start with a face-centered cubic consisting of points whose coordinates are even integers summing to a multiple of 4. That consists of these points:

(0,0,0)   (2,2,0)   (2,0,2)   (0,2,2)

and all points obtained from these by adding multiples of 4 to any of the coordinates. To get the diamond, we take all these together with another face-centered cubic that’s been shifted by (1,1,1). That consists of these points:

(1,1,1)   (3,3,1)   (3,1,3)   (1,3,3)

and all points obtained by adding multiples of 4 to any of the coordinates.

The triamond is similar! Now we start with these points

(0,0,0)   (1,2,3)   (2,3,1)   (3,1,2)

and all the points obtain from these by adding multiples of 4 to any of the coordinates. To get the triamond, we take all these together with another copy of these points that’s been shifted by (2,2,2). That other copy consists of these points:

(2,2,2)   (3,0,1)   (0,1,3)   (1,3,0)

and all points obtained by adding multiples of 4 to any of the coordinates.

Unlike the diamond, the triamond has an inherent handedness, or chirality. You’ll note how we used the point (1,2,3) and took cyclic permutations of its coordinates to get more points. If we’d started with (3,2,1) we would have gotten the other, mirror-image version of the triamond.

Covering spaces

I mentioned that the triamond is a ‘covering space’ of the graph \mathrm{K}_4. More precisely, there’s a graph T whose vertices are the atoms of the triamond, and whose edges are the bonds of the triamond. There’s a map of graphs

p: T \to \mathrm{K}_4

This automatically means that every path in T is mapped to a path in \mathrm{K}_4. But what makes T a covering space of \mathrm{K}_4 is that any path in T comes from a path in \mathrm{K}_4, which is unique after we choose its starting point.

If you’re a high-powered mathematician you might wonder if T is the universal covering space of \mathrm{K}_4. It’s not, but it’s the universal abelian covering space.

What does this mean? Any path in \mathrm{K}_4 gives a sequence of vectors e_1, e_2, e_3, f_1, f_2, f_3 and their negatives. If we pick a starting point in the triamond, this sequence describes a unique path in the triamond. When does this path get you back where you started? The answer, I believe, is this: if and only if you can take your sequence, rewrite it using the commutative law, and cancel like terms to get zero. This is related to how adding vectors in \mathbb{R}^3 is a commutative operation.


For example, there’s a loop in \mathrm{K}_4 that goes “red, blue, green, red”. This gives the sequence of vectors

f_1, -e_3, e_2

We can turn this into an expression

f_1 - e_3 + e_2

However, we can’t simplify this to zero using just the commutative law and cancelling like terms. So, if we start at some red atom in the triamond and take the unique path that goes “red, blue, green, red”, we do not get back where we started!

Note that in this simplification process, we’re not allowed to use what the vectors “really are”. It’s a purely formal manipulation.

Puzzle 2. Describe a loop of length 10 in the triamond using this method. Check that you can simplify the corresponding expression to zero using the rules I described.

A similar story works for the diamond, but starting with a different graph:


The graph formed by a diamond’s atoms and the edges between them is the universal abelian cover of this little graph! This graph has 2 vertices because there are 2 kinds of atom in the diamond. It has 4 edges because each atom has 4 nearest neighbors.

Puzzle 3. What vectors should we use to label the edges of this graph, so that the vectors coming out of any vertex describe how to move from that kind of atom in the diamond to its 4 nearest neighbors?

There’s also a similar story for graphene, which is hexagonal array of carbon atoms in a plane:


Puzzle 4. What graph with edges labelled by vectors in \mathbb{R}^2 should we use to describe graphene?

I don’t know much about how this universal abelian cover trick generalizes to higher dimensions, though it’s easy to handle the case of a cubical lattice in any dimension.

Puzzle 5. I described higher-dimensional analogues of diamonds: are there higher-dimensional triamonds?


The Wikipedia article is good:

• Wikipedia, Laves graph.

They say this graph has many names: the K4 crystal, the (10,3)-a network, the srs net, the diamond twin, and of course the triamond. The name triamond is not very logical: while each carbon has 3 neighbors in the triamond, each carbon has not 2 but 4 neighbors in the diamond. So, perhaps the diamond should be called the ‘quadriamond’. In fact, the word ‘diamond’ has nothing to do with the prefix ‘di-‘ meaning ‘two’. It’s more closely related to the word ‘adamant’. Still, I like the word ‘triamond’.

This paper describes various attempts to find the Laves graph in chemistry:

• Stephen T. Hyde, Michael O’Keeffe, and Davide M. Proserpio, A short history of an elusive yet ubiquitous structure in chemistry, materials, and mathematics, Angew. Chem. Int. Ed. 47 (2008), 7996–8000.

This paper does some calculations arguing that the triamond is a metastable form of carbon:

• Masahiro Itoh et al, New metallic carbon crystal, Phys. Rev. Lett. 102 (2009), 055703.

Abstract. Recently, mathematical analysis clarified that sp2 hybridized carbon should have a three-dimensional crystal structure (\mathrm{K}_4) which can be regarded as a twin of the sp3 diamond crystal. In this study, various physical properties of the \mathrm{K}_4 carbon crystal, especially for the electronic properties, were evaluated by first principles calculations. Although the \mathrm{K}_4 crystal is in a metastable state, a possible pressure induced structural phase transition from graphite to \mathrm{K}_4 was suggested. Twisted π states across the Fermi level result in metallic properties in a new carbon crystal.

The picture of the \mathrm{K}_4 crystal was placed on Wikicommons by someone named ‘Workbit’, under a Creative Commons Attribution-Share Alike 4.0 International license. The picture of the tetrahedron was made using Robert Webb’s Stella software and placed on Wikicommons. The pictures of graphs come from Sunada’s paper, though I modified the picture of \mathrm{K}_4. The moving image of the diamond cubic was created by H.K.D.H. Bhadeshia and put into the public domain on Wikicommons. The picture of graphene was drawn by Dr. Thomas Szkopek and put into the public domain on Wikicommons.

Computing the Uncomputable

2 April, 2016

I love the more mind-blowing results of mathematical logic:

Surprises in logic.

Here’s a new one:

• Joel David Hamkins, Any function can be computable.

Let me try to explain it without assuming you’re an expert on mathematical logic. That may be hard, but I’ll give it a try. We need to start with some background.

First, you need to know that there are many different ‘models’ of arithmetic. If you write down the usual axioms for the natural numbers, the Peano axioms (or ‘PA’ for short), you can then look around for different structures that obey these axioms. These are called ‘models’ of PA.

One of them is what you think the natural numbers are. For you, the natural numbers are just 0, 1, 2, 3, …, with the usual way of adding and multiplying them. This is usually called the ‘standard model’ of PA. The numbers 0, 1, 2, 3, … are called the ‘standard’ natural numbers.

But there are also nonstandard models of arithmetic. These models contain extra numbers beside the standard ones! These are called ‘nonstandard’ natural numbers.

This takes a while to get used to. There are several layers of understanding to pass through.

For starters, you should think of these extra ‘nonstandard’ natural numbers as bigger than all the standard ones. So, imagine a whole bunch of extra numbers tacked on after the standard natural numbers, with the operations of addition and multiplication cleverly defined in such a way that all the usual axioms still hold.

You can’t just tack on finitely many extra numbers and get this to work. But there can be countably many, or uncountably many. There are infinitely many different ways to do this. They are all rather hard to describe.

To get a handle on them, it helps to realize this. Suppose you have a statement S in arithmetic that is neither provable nor disprovable from PA. Then S will hold in some models of arithmetic, while its negation not(S) will hold in some other models.

For example, the Gödel sentence G says “this sentence is not provable in PA”. If Peano arithmetic is consistent, neither G nor not(G) is provable in PA. So G holds in some models, while not(G) holds in others.

Thus, you can intuitively think of different models as “possible worlds”. If you have an undecidable statement, meaning one that you can’t prove or disprove in PA, then it holds in some worlds, while its negation holds in other worlds.

In the case of the Gödel sentence G, most mathematicians think G is “true”. Why the quotes? Truth is a slippery concept in logic—there’s no precise definition of what it means for a sentence in arithmetic to be “true”. All we can precisely define is:

1) whether or not a sentence is provable from some axioms


2) whether or not a sentence holds in some model.

Nonetheless, mathematicians are human, so they have beliefs about what’s true. Many mathematicians believe that G is true: indeed, in popular accounts one often hears that G is “true but unprovable in Peano arithmetic”. So, these mathematicians are inclined to say that any model where G doesn’t hold is nonstandard.

The result

Anyway, what is Joel David Hamkins’ result? It’s this:

There is a Turing machine T with the following property. For any function f from the natural numbers to the natural numbers, there is a model of PA such that in this model, if we give T any standard natural n as input, it halts and outputs f(n).

So, take f to be your favorite uncomputable function. Then there’s a model of arithmetic such that in this model, the Turing machine computes f, at least when you feed the machine standard numbers as inputs.

So, very very roughly, there’s a possible world in which your uncomputable function becomes computable!

But you have to be very careful about how you interpret this result.

The trick

What’s the trick? The proof is beautiful, but it would take real work to improve on Hamkins’ blog article, so please read that. I’ll just say that he makes extensive use of Rosser sentences, which say:

“For any proof of this sentence in theory T, there is a smaller proof of the negation of this sentence.”

Rosser sentences are already mind-blowing, but Hamkins uses an infinite sequence of such sentences and their negations, chosen in a way that depends on the function f, to cleverly craft a model of arithmetic in which the Turing machine T computes this function on standard inputs.

But what’s really going on? How can using a nonstandard model make an uncomputable function become computable for standard natural numbers? Shouldn’t nonstandard models agree with the standard one on this issue? After all, the only difference is that they have extra nonstandard numbers tacked on after all the standard ones! How can that make a Turing machine succeed in computing f on standard natural numbers?

I’m not 100% sure, but I think I know the answer. I hope some logicians will correct me if I’m wrong.

You have to read the result rather carefully:

There is a Turing machine T with the following property. For any function f from the natural numbers to the natural numbers, there is a model of PA such that in this model, if we give T any standard natural n as input, it halts and computes f(n).

When we say the Turing machine halts, we mean it halts after N steps for some natural number N. But this may not be a standard natural number! It’s a natural number in the model we’re talking about.

So, the Turing machine halts… but perhaps only after a nonstandard number of steps.

In short: you can compute the uncomputable, but only if you’re willing to wait long enough. You may need to wait a nonstandard amount of time.

It’s like that old Navy saying:


But the trick becomes more evident if you notice that one single Turing machine T computes different functions from the natural numbers to the natural numbers… in different models. That’s even weirder than computing an uncomputable function.

The only way to build a machine that computes n+1 in one model and n+2 in another to build a machine that doesn’t halt in a standard amount of time in either model. It only halts after a nonstandard amount of time. In one model, it halts and outputs n+1. In another, it halts and outputs n+2.

A scary possibility

To dig a bit deeper—and this is where it gets a bit scary—we have to admit that the standard model is a somewhat elusive thing. I certainly didn’t define it when I said this:

For you, the natural numbers are just 0, 1, 2, 3, …, with the usual way of adding and multiplying them. This is usually called the standard model of PA. The numbers 0, 1, 2, 3, … are called the ‘standard’ natural numbers.

The point is that “0, 1, 2, 3, …” here is vague. It makes sense if you already know what the standard natural numbers are. But if you don’t already know, those three dots aren’t going to tell you!

You might say the standard natural numbers are those of the form 1 + ··· + 1, where we add 1 to itself some finite number of times. But what does ‘finite number’ mean here? It means a standard natural number! So this is circular.

So, conceivably, the concept of ‘standard’ natural number, and the concept of ‘standard’ model of PA, are more subjective than most mathematicians think. Perhaps some of my ‘standard’ natural numbers are nonstandard for you!

I think most mathematicians would reject this possibility… but not all. Edward Nelson tackled it head-on in his marvelous book Internal Set Theory. He writes:

Perhaps it is fair to say that “finite” does not mean what we have always thought it to mean. What have we always thought it to mean? I used to think that I knew what I had always thought it to mean, but I no longer think so.

If we go down this road, Hamkins’ result takes on a different significance. It says that any subjectivity in the notion of ‘natural number’ may also infect what it means for a Turing machine to halt, and what function a Turing machine computes when it does halt.

Mathematics in Biochemical Kinetics and Engineering

23 March, 2016

Anyone who was interested in the Workshop on Mathematical Trends in Reaction Network Theory last summer in Copenhagen might be interested in this:

Mathematics in (bio)Chemical Kinetics and Engineering (MaCKiE 2017), Budapest, 25–27 May, 2017.

This conference is planned so that it starts right after another one: the 14th Joint European Thermodynamics Conference will be in Budapest from the 21st to the 25th.

Since its first event in 2002, the MaCKiE workshop is organized in every second year. The previous meetings were held in Ghent (Belgium), Chennai (India), Heidelberg (Germany), and Houston (USA). The meeting aims to bring together scientists interested in the application of advanced mathematical methods to describe kinetic phenomena, especially chemists, mathematicians, physicist, biologists, and engineers. The acronym MaCKiE naturally comes from the title of the conference, but is also part of the German name of Mack the Knife in Brecht and Weill’s Threepenny Opera, Mackie Messer, and is phonetically indistinguishable from “makkie” in Dutch, optimistically meaning “a cinch”.

Conference papers will be published in Reaction Kinetics, Mechanisms and Catalysis in early 2018.

The Involute of a Cubical Parabola

22 March, 2016

In his remarkable book The Theory of Singularities and its Applications, Vladimir Arnol’d claims that the symmetry group of the icosahedron is secretly lurking in the problem of finding the shortest path from one point in the plane to another while avoiding some obstacles that have smooth boundaries.

Arnol’d nicely expresses the awe mathematicians feel when they discover a phenomenon like this:

Thus the propagation of waves, on a 2-manifold with boundary, is controlled by an icosahedron hidden at an inflection point at the boundary. This icosahedron is hidden, and it is difficult to find it even if its existence is known.

I would like to understand this!

I think the easiest way for me to make progress is to solve this problem posed by Arnol’d:

Puzzle. Prove that the generic involute of a cubical parabola has a cusp of order 5/2 on the straight line tangent to the parabola at the inflection point.

There’s a lot of jargon here! Let me try to demystify it. (I don’t have the energy now to say how the symmetry group of the icosahedron gets into the picture, but it’s connected to the ‘5’ in the cusp of order 5/2.)

A cubical parabola is just a curve like y = x^3:


It’s a silly name. I guess y = x^3 looked at y = x^2 and said “I want to be a parabola too!”

The involute of a curve is what you get by attaching one end of a taut string to that curve and tracing the path of the string’s free end as you wind the string onto that curve. For example:

Here our original curve, in blue, is a catenary: the curve formed by a hanging chain. Its involute is shown in red.

There are a couple of confusing things about this picture if you’re just starting to learn about involutes. First, Sam Derbyshire, who made this picture, cleverly moved the end of the string attached to the catenary at the instant the other end hit the catenary! That allowed him to continue the involute past the moment it hits the catenary. The result is a famous curve called a tractrix.

Second, it seems that the end of the string attached to the catenary is ‘at infinity’, very far up.

But you don’t need to play either of these tricks if you’re trying to draw an involute. Take a point p on a curve C. Take a string of length \ell, nail down one end at p, and wind the string along C. Then the free end of your string traces out a curve D.

D is called an involute of C. It consists of all the points you can get to from p by a path of length \ell that doesn’t cross C.

So, Arnol’d’s puzzle concerns the involute of the curve y = x^3.

He wants you to nail down one end of the string at any ‘generic’ location. So, don’t nail it down at x = 0, y = 0, since that point is different from all the rest. That point is an inflection point, where the curve y = x^3 switches from curving down to curving up!

He wants you to wind the string along the curve y = x^3, forming an involute. And he wants you to see what the involute does when it crosses the line y = 0.

This is a bit tricky, since the region y \le x^3 is not convex. If you nail your string down at x = -1, y = -1, your string will have to start out above the curve y = x^3. But when the free end of your string crosses the line y = 0, the story changes. Now your string will need to go below the curve y = x^3.

It’s a bit hard to explain this both simply and accurately, but if you imagine drawing the involute with a piece of string, I think you’ll encounter the issue I’m talking about. I hope I understand it correctly!

Anyway, suppose you succeed in drawing the involute. What should you see?

Arnol’d says the involute should have a ‘cusp of order 5/2’ somewhere on the line y = 0.

A cusp of order 5/2 is a singularity in an otherwise smooth curve that looks like y^2 = x^5 in some coordinates. In a recent post I described various kinds of cusps, and in a comment I mentioned that the cusp of order 5/2 was called a rhamphoid cusp. Strangely, I wrote all that before knowing that Arnol’d places great significance on the cusp of order 5/2 in the involute of a cubical parabola!

Simon Burton drew some nice cusps of order 5/2. The curve y^2 = x^5 looks like this:

Rhamphoid Cusp

This is a more typical curve with a cusp of order 5/2:

(x-4y^2)^2 - (y+ 2x)^5 = 0

It looks like this:

Rhamphoid Cusp

It’s less symmetrical than the curve y^2 = x^5. Indeed, it looks like a bird’s beak: the word ‘rhamphoid’ means ‘beak-like’.
Arnol’d emphasizes that you should usually expect this sort of shape for a cusp of order 5/2:

It is easy to recognize this curve in experimental data, since after a generic diffeomorphism the curve consists of two branches that have equal curvatures at the common point, and hence are convex from the same side [….]

So, if we draw the involutes of a cubical parabola we should see something like this! And indeed, Marshall Hampton has made a great online program that draws these involutes. Here’s one:

Involute of cubical parabola - Marshall Hampton

The blue curve is the involute. It looks like it has a cusp of order 5/2 where it hits the line y = 0. It also has a less pointy cusp where it hits the red curve y = x^3. Like the cusp in the tractrix, this should be a cusp of order 3/2, also known as an ordinary cusp.


Regarding the easier puzzle I posed above, Arnol’d gives this hint:

HINT. The curvature centers of both branches of the involute, which meet at the point of the inflectional tangent, lie at the inflection point, hence both branches have the same convexity (they are both concave from the side of the inflection point of the boundary).

That’s not what I’d call crystal clear! However, I now understand what he means by the two ‘branches’ of the involute. They come from how you need to change the rules of the game as the free end of your string crosses the line y = 0. Remember, I wrote:

If you nail your string down at x = -1, y = -1, your string will have to start out above the curve y = x^3. But when the free end of your string crosses the line y = 0, the story changes. Now your string will need to go below the curve y = x^3.

When the rules of the game change, he claims there’s a cusp of order 5/2 in the involute.

I also think I finally understand the picture that Arnol’d uses to explain what’s going on:

It shows the curve y = x^3 in bold, and three involutes of this curve. One involute is not generic: it goes through the special point x = 0, y = 0. The other two are. They each have a cusp of order 5/2 where they hit the line y = 0, but also a cusp of order 3/2 where they hit the curve y = x^3. We can recognize the cusps of order 5/2, if we look carefully, by the fact that both branches are convex on the same side.

But again, the challenge is to prove that these involutes have cusps of order 5/2 where they hit the line y = 0. A cusp of order 7/2 would also have two branches that are convex on the same side!

Here’s one more hint. Wikipedia says that if we have a curve

C :\mathbb{R} \to \mathbb{R}^2

parametrized by arclength, so


for all s, then its involute is the curve

D :\mathbb{R} \to \mathbb{R}^2

given by

D(s) = C(s)- s C^\prime(s)

Strictly speaking, this must be an involute. And it must somehow handle the funny situations I described, where the involute fails to be smooth. I don’t know it does this.

Interview (Part 2)

21 March, 2016

Greg Bernhardt runs an excellent website for discussing physics, math and other topics, called Physics Forums. He recently interviewed me there. Since I used this opportunity to explain a bit about the Azimuth Project and network theory, I thought I’d reprint the interview here. Here is Part 2.


Tell us about your experience with past projects like “This Week’s Finds in Mathematical Physics”.

I was hired by U.C. Riverside back in 1989. I was lonely and bored, since Lisa was back on the other coast. So, I spent a lot of evenings on the computer.

We had the internet back then—this was shortly after stone tools were invented—but the world-wide web hadn’t caught on yet. So, I would read and write posts on “newsgroups” using a program called a “news server”. You have to imagine me sitting in front of an old green­-on­-black cathode ray tube monitor with a large floppy disk drive, firing up the old modem to hook up to the internet.

In 1993, I started writing a series of posts on the papers I’d read. I called it “This Week’s Finds in Mathematical Physics”, which was a big mistake, because I couldn’t really write one every week. After a while I started using it to explain lots of topics in math and physics. I wrote 300 issues. Then I quit in 2010, when I started taking climate change seriously.

Share with us a bit about your current projects like Azimuth and the n­-Café.

The n­-Category Café is a blog I started with Urs Schreiber and the philosopher David Corfield back in 2006, when all three of us realized that n­-categories are the big wave that math is riding right now. We have a bunch more bloggers on the team now. But the n­-Café lost some steam when I quit work in n­-categories and Urs started putting most of his energy into two related projects: a wiki called the nLab and a discussion group called the nForum.

In 2010, when I noticed that global warming was like a huge wave crashing down on our civilization, I started the Azimuth Project. The goal was to create a focal point for scientists and engineers interested in saving the planet. It consists of a team of people, a blog, a wiki and a discussion group. It was very productive for a while: we wrote a lot of educational articles on climate science and energy issues. But lately I’ve realized I’m better at abstract math. So, I’ve been putting more time into working with my grad students.

What about climate change has captured your interest?

That’s like asking: “What about that huge tsunami rushing toward us has captured your interest?”

Around 2004 I started hearing news that sent chills up my spine ­ and what really worried me is how few people were talking about this news, at least in the US.

I’m talking about how we’re pushing the Earth’s climate out of the glacial cycle we’ve been in for over a million years, into brand new territory. I’m talking about things like how it takes hundreds or thousands of years for CO2 to exit the atmosphere after it’s been put in. And I’m talking about how global warming is just part of a bigger phenomenon: the Anthropocene. That’s a new geological epoch, in which the biosphere is rapidly changing due to human influences. It’s not just the temperature:

• About 1/4 of all chemical energy produced by plants is now used by humans.

• The rate of species going extinct is 100­–1000 times the usual background rate.

• Populations of large ocean fish have declined 90% since 1950.

• Humans now take more nitrogen from the atmosphere and convert it into nitrates than all other processes combined.

8­-9 times as much phosphorus is flowing into oceans than the natural background rate.

This doesn’t necessarily spell the end of our civilization, but it is something that we’ll all have to deal with.

So, I felt the need to alert people and try to dream up strategies to do something. That’s why in 2010 I quit work on n­-categories and started the Azimuth Project.

Carbon Dioxide Variations

You have life experience on both US coasts. Which do you prefer and why?

There are some differences between the coasts, but they’re fairly minor. The West Coast is part of the Pacific Rim, so there’s more Asian influence here. The seasons are less pronounced here, because winds in the northern hemisphere blow from west to east, and the oceans serve as a temperature control system. Down south in Riverside it’s a semi­-desert, so we can eat breakfast in our back yard in January! But I live here not because I like the West Coast more. This just happens to be where my wife Lisa and I managed to get a job.

What I really like is getting out of the US and seeing the rest of the world. When you’re at cremation ritual in Bali, or a Hmong festival in Laos, the difference between regions of the US starts seeming pretty small.

But I wasn’t a born traveler. When I spent my first summer in England, I was very apprehensive about making a fool of myself. The British have different manners, and their old universities are full of arcane customs and subtle social distinctions that even the British find terrifying. But after a few summers there I got over it. First, all around the world, being American gives you a license to be clueless. If you behave any better than the worst stereotypes, people are impressed. Second, I spend most of my time with mathematicians, who are incredibly forgiving of bad social behavior as long as you know interesting theorems.

By now I’ve gotten to feel very comfortable in England. The last couple of years I’ve spent time at the quantum computation group at Oxford–the group run by Bob Coecke and Samson Abramsky. I like talking to Jamie Vicary about n­categories and physics, and also my old friend Minhyong Kim, who is a number theorist there.

I was also very apprehensive when I first visited Paris. Everyone talks about how the waiters are rude, and so on. But I think that’s an exaggeration. Yes, if you go to cafés packed with boorish tourists, the waiters will treat you like a boorish tourist—so don’t do that. If you go to quieter places and behave politely, most people are friendly. Luckily Lisa speaks French and has some friends in Paris; that opens up a lot of opportunities. I don’t speak French, so I always feel like a bit of an idiot, but I’ve learned to cope. I’ve spent a few summers there working with Paul­-André Melliès on category theory and logic.

Yau Ma Tei Market - Hong Kong

Yau Ma Tei Market – Hong Kong

I was also intimidated when I first spent a summer in Hong Kong—and even more so when I spent a summer in Shanghai. Lisa speaks Chinese too: she’s more cultured than me, and she drags me to interesting places. My first day walking around Shanghai left me completely exhausted: everything was new! Walking down the street you see people selling frogs in a bucket, strange fungi and herbs, then a little phone shop where telephone numbers with lots of 8’s cost more, and so on: it’s a kind of cognitive assault.

But again, I came to enjoy it. And coming back to California, everything seemed a bit boring. Why is there so much land that’s not being used? Where are all the people? Why is the food so bland?

I’ve spent the most time outside the US in Singapore. Again, that’s because my wife and I both got job offers there, not because it’s the best place in the world. Compared to China it’s rather sterile and manicured. But it’s still a fascinating place. They’ve pulled themselves up from a British colonial port town to a multi­cultural country that’s in some ways more technologically advanced than the US. The food is great: it’s a mix of Chinese, Indian, Malay and pretty much everything else. There’s essentially no crime: you can walk around in the darkest alley in the worst part of town at 3 am and still feel safe. It’s interesting to live in a country where people from very different cultures are learning to live together and prosper. The US considers itself a melting-pot, but in Singapore they have four national languages: English, Mandarin, Malay and Tamil.

Most of all, it’s great to live in places where the culture and politics is different than where I grew up. But I’m trying to travel less, because it’s bad for the planet.

You’ve gained some fame for your “crackpot index”. What were your motivations for developing it? Any new criteria you’d add?

After the internet first caught on, a bunch of us started using it to talk about physics on the usenet newsgroup sci.physics.

And then, all of a sudden, crackpots around the world started joining in!

Before this, I don’t think anybody realized how many people had their own personal theories of physics. You might have a crazy uncle who spent his time trying to refute special relativity, but you didn’t realize there were actually thousands of these crazy uncles.

As I’m sure you know here at Physics Forums, crackpots naturally tend to drive out more serious conversations. If you have some people talking about the laws of black hole thermodynamics, and some guy jumps in and says that the universe is a black hole, everyone will drop what they’re doing and argue with that guy. It’s irresistible. It reminds me of how when someone brings a baby to a party, everyone will start cooing to the baby. But it’s worse.

When physics crackpots started taking over the usenet newsgroup sci.physics, I discovered that they had a lot of features in common. The Crackpot Index summarizes these common features. Whenever I notice a new pattern, I add it.

For example: if someone starts comparing themselves to Galileo and says the physics establishment is going after them like the Inquisition, I guarantee you that they’re a crackpot. Their theories could be right—but unfortunately, they’ve got delusions of grandeur and a persecution complex.

It’s not being wrong that makes someone a crackpot. Being a full­-fledged crackpot is the endpoint of a tragic syndrome. Someone starts out being a bit too confident that they can revolutionize physics without learning it first. In fact, many young physicists go through this stage! But the good ones react to criticism by upping their game. The ones who become crackpots just brush it off. They come up with an idea that they think is great, and when nobody likes it, they don’t say “okay, I need to learn more.” Instead, they make up excuses: nobody understands me, maybe there’s a conspiracy at work, etc. The excuses get more complicated with each rebuff, and it gets harder and harder for them to back down and say “whoops, I was wrong”.

When I wrote the Crackpot Index, I thought crackpots were funny. Alexander Abian claimed all the world’s ills would be cured if we blew up the Moon. Archimedes Plutonium thinks the Universe is a giant plutonium atom. These ideas are funny. But now I realize how sad it is that someone can start with an passion for physics and end up in this kind of trap. They almost never escape.

Who are some of your math and physics heroes of the past and of today?

Wow, that’s a big question! I think every scientist needs to have heroes. I’ve had a lot.

Marie Curie

Marie Curie

When I was a kid, I was in love with Marie Curie. I wanted to marry a woman like her: someone who really cared about science. She overcame huge obstacles to get a degree in physics, discovered not one but two new elements, often doing experiments in her own kitchen—and won not one but two Nobel prizes. She was a tragic figure in many ways. Her beloved husband Pierre, a great physicist in his own right, slipped and was run over by a horse­-drawn cart, dying instantly when the wheels ran over his skull. She herself probably died from her experiments with radiation. But this made me love her all the more.

Later my big hero was Einstein. How could any physicist not have Einstein as a hero? First he came up with the idea that light comes in discrete quanta: photons. Then, two months later, he used Brownian motion to figure out the size of atoms. One month after that: special relativity, unifying space and time! Three months later, the equivalence between mass and energy. And all this was just a warmup for his truly magnificent theory of general relativity, explaining gravity as the curvature of space and time. He truly transformed our vision of the Universe. And then, in his later years, the noble and unsuccessful search for a unified field theory. As a friend of mine put it, what matters here is not that he failed: what matters is that he set physics a new goal, more ambitious than any goal it had before.

Later it was Feynman. As I mentioned, my uncle gave me Feynman’s Lectures on Physics. This is how I first learned Maxwell’s equations, special relativity, quantum mechanics. His way of explaining things with a minimum of jargon, getting straight to the heart of every issue, is something I really admire. Later I enjoyed his books like Surely You Must Be Joking. Still later I learned enough to be impressed by his work on QED.

But when you read his autobiographical books, you can see that he was a bit too obsessed with pretending to be a fun­-loving ordinary guy. A fun­-loving ordinary guy who just happens to be smarter than everyone else. In short, a self­-absorbed showoff. He could also be pretty mean to women—and in that respect, Einstein was even worse. So our heroes should not be admired uncritically.

Alexander Grothendieck

Alexander Grothendieck

A good example is Alexander Grothendieck. I guess he’s my main math hero these days. To solve concrete problems like the Weil conjectures, he avoided brute force techniques and instead developed revolutionary new concepts that gently dissolved those problems. And these new concepts turned out to be much more important than the problems that motivated him. I’m talking about abelian categories, schemes, topoi, stacks, things like that. Everyone who really wants to understand math at a deep level has got to learn these concepts. They’re beautiful and wonderfully simple—but not easy to master. You have to really change your world view to understand them, just like general relativity or quantum mechanics. You have to rewire your neurons.

At his peak, Grothendieck seemed almost superhuman. It seems he worked almost all day and all night, bouncing his ideas off the other amazing French algebraic geometers. Apparently 20,000 pages of his writings remain unpublished! But he became increasingly alienated from the mathematical establishment and eventually disappeared completely, hiding in a village near the Pyrenees.

Which groundbreaking advances in science and math are you most looking forward to?

I’d really like to see progress in figuring out the fundamental laws of physics. Ideally, I’d like to know the Theory of Everything. Of course, we don’t even know that there is one! There could be an endless succession of deeper and deeper realizations to be had about the laws of physics, with no final answer.

If we ever do discover the Theory of Everything, that won’t be the end of the story. It could be just the beginning. For example, next we could ask why this particular theory governs our Universe. Is it necessary, or contingent? People like to chat about this puzzle already, but I think it’s premature. I think we should find the Theory of Everything first.

Unfortunately, right now fundamental physics is in a phase of being “stuck”. I don’t expect to see the Theory of Everything in my lifetime. I’d be happy to see any progress at all! There are dozens of very basic things we don’t understand.

When it comes to math, I expect that people will have their hands full this century redoing the foundations using ∞-categories, and answering some of the questions that come up when you do this. The crowd working on “homotopy type theory” is making good progress–but so far they’re mainly thinking about ∞-groupoids, which are a very special sort of ∞-category. When we do all of math using ∞-categories, it will be a whole new ballgame.

And then there’s the question of whether humanity will figure out a way to keep from ruining the planet we live on. And the question of whether we’ll succeed in replacing ourselves with something more intelligent—or even wiser.

The Milky Way and Andromeda Nebula after their first collision, 4 billion years from now

The Milky Way and Andromeda Nebula after their first collision, 4 billion years from now

Here’s something cool: red dwarf stars will keep burning for 10 trillion years. If we, or any civilization, can settle down next to one of those, there will be plenty of time to figure things out. That’s what I hope for.

But some of my friends think that life always uses up resources as fast as possible. So one of my big questions is whether intelligent life will develop the patience to sit around and think interesting thoughts, or whether it will burn up red dwarf stars and every other source of energy as fast as it can, as we’re doing now with fossil fuels.

What does the future hold for John Baez? What are your goals?

What the future holds for me, primarily, is death.

That’s true of all of us—or at least most of us. While some hope that technology will bring immortality, or at least a much longer life, I bet most of us are headed for death fairly soon. So I try to make the most of the time I have.

I’m always re­-evaluating what I should do. I used to spend time thinking about quantum gravity and n­-categories. But quantum gravity feels stuck, and n­-category theory is shooting forward so fast that my help is no longer needed.

Climate change is hugely important, and nobody really knows what to do about it. Lots of people are trying lots of different things. Unfortunately I’m no better than the rest when it comes to the most obvious strategies—like politics, or climate science, or safer nuclear reactors, or better batteries and photocells.

The trick is finding things you can do better than other people. Right now for me that means thinking about networks and biology in a very abstract way. I’m inspired by this remark by Patten and Witkamp:

To understand ecosystems, ultimately will be to understand networks.

So that’s my goal for the next five years or so. It’s probably not be the best thing anyone can do to prepare for the Middle Anthropocene. But it may be the best thing I can do: use the math I know to help people understand the biosphere.

It may seem like I keep jumping around: from quantum gravity to n-categories to biology. But I keep wanting to think about networks, and how they change in time.

At some point I hope to retire and become a bit more of a self­-indulgent wastrel. I could write a fun book about group theory in geometry and physics, and a fun book about the octonions. I might even get around to spending more time on music!

John Baez in Namo Gorge, Gansu

John Baez

Interview (Part 1)

18 March, 2016

Greg Bernhardt runs an excellent website for discussing physics, math and other topics, called Physics Forums. He recently interviewed me there. Since I used this opportunity to explain a bit about the Azimuth Project and network theory, I thought I’d reprint the interview here. Here is Part 1.

Give us some background on yourself.

I’m interested in all kinds of mathematics and physics, so I call myself a mathematical physicist. But I’m a math professor at the University of California in Riverside. I’ve taught here since 1989. My wife Lisa Raphals got a job here nine years later: among other things, she studies classical Chinese and Greek philosophy.

I got my bachelors’s degree in math at Princeton. I did my undergrad thesis on whether you can use a computer to solve Schrödinger’s equation to arbitrary accuracy. In the end, it became obvious that you can. I was really interested in mathematical logic, and I used some in my thesis—the theory of computable functions—but I decided it wasn’t very helpful in physics. When I read the magnificently poetic last chapter of Misner, Thorne and Wheeler’s Gravitation, I decided that quantum gravity was the problem to work on.

I went to math grad school at MIT, but I didn’t find anyone to work with on quantum gravity. So, I did my thesis on quantum field theory with Irving Segal. He was one of the founders of “constructive quantum field theory”, where you try to rigorously prove that quantum field theories make mathematical sense and obey certain axioms that they should. This was a hard subject, and I didn’t accomplish much, but I learned a lot.

I got a postdoc at Yale and switched to classical field theory, mainly because it was something I could do. On the side I was still trying to understand quantum gravity. String theory was bursting into prominence at the time, and my life would have been easier if I’d jumped onto that bandwagon. But I didn’t like it, because most of the work back then studied strings moving on a fixed “background” spacetime. Quantum gravity is supposed to be about how the geometry of spacetime is variable and quantum­-mechanical, so I didn’t want a theory of quantum gravity set on a pre­-existing background geometry!

I got a professorship at U.C. Riverside based on my work on classical field theory. But at a conference on that subject in Seattle, I heard Abhay Ashtekar, Chris Isham and Renate Loll give some really interesting talks on loop quantum gravity. I don’t know why they gave those talks at a conference on classical field theory. But I’m sure glad they did! I liked their work because it was background­-free and mathematically rigorous. So I started work on loop quantum gravity.

Like many other theories, quantum gravity is easier in low dimensions. I became interested in how category theory lets you formulate quantum gravity in a universe with just 3 spacetime dimensions. It amounts to a radical new conception of space, where the geometry is described in a thoroughly quantum­-mechanical way. Ultimately, space is a quantum superposition of “spin networks”, which are like Feynman diagrams. The idea is roughly that a spin network describes a virtual process where particles move around and interact. If we know how likely each of these processes is, we know the geometry of space.

A spin network

A spin network

Loop quantum gravity tries to do the same thing for full­-fledged quantum gravity in 4 spacetime dimensions, but it doesn’t work as well. Then Louis Crane had an exciting idea: maybe 4­-dimensional quantum gravity needs a more sophisticated structure: a “2­-category”.

I had never heard of 2­-categories. Category theory is about things and processes that turn one thing into another. In a 2­-category we also have “meta-processes” that turn one process into another.

I became very excited about 2-­categories. At the time I was so dumb I didn’t consider the possibility of 3­-categories, and 4­-categories, and so on. To be precise, I was more of a mathematical physicist than a mathematician: I wasn’t trying to develop math for its own sake. Then someone named James Dolan told me about n­-categories! That was a real eye­-opener. He came to U.C. Riverside to work with me. So I started thinking about n­-categories in parallel with loop quantum gravity.

Dolan was technically my grad student, but I probably learned more from him than vice versa. In 1996 we wrote a paper called “Higher­-dimensional algebra and topological quantum field theory”, which might be my best paper. It’s full of grandiose guesses about n-­categories and their connections to other branches of math and physics. We had this vision of how everything fit together. It was so beautiful, with so much evidence supporting it, that we knew it had to be true. Unfortunately, at the time nobody had come up with a good definition of n­-category, except for n < 4. So we called our guesses “hypotheses” instead of “conjectures”. In math a conjecture should be something utterly precise: it’s either true or not, with no room for interpretation.

By now, I think everybody more or less believes our hypotheses. Some of the easier ones have already been turned into theorems. Jacob Lurie, a young hotshot at Harvard, improved the statement of one and wrote a 111-page outline of a proof. Unfortunately he still used some concepts that hadn’t been defined. People are working to fix that, and I feel sure they’ll succeed.

A foam of soap bubbles

A foam of soap bubbles

Anyway, I kept trying to connect these ideas to quantum gravity. In 1997, I introduced “spin foams”. These are structures like spin networks, but with an extra dimension. Spin networks have vertices and edges. Spin foams also have 2­-dimensional faces: imagine a foam of soap bubbles.

The idea was to use spin foams to give a purely quantum­-mechanical description of the geometry of spacetime, just as spin networks describe the geometry of space. But mathematically, what we’re doing here is going from a category to a 2­-category.

By now, there are a number of different theories of quantum gravity based on spin foams. Unfortunately, it’s not clear that any of them really work. In 2002, Dan Christensen, Greg Egan and I did a bunch of supercomputer calculations to study this question. We showed that the most popular spin foam theory at the time gave dramatically different answers than people had hoped for. I think we more or less killed that theory.

That left me rather depressed. I don’t enjoy programming: indeed, Christensen and Egan did all the hard work of that sort on our paper. I didn’t want to spend years sifting through spin foam theories to find one that works. And most of all, I didn’t want to end up as an old man still not knowing if my work had been worthwhile! To me n­-category theory was clearly the math of the future—and it was easy for me to come up with cool new ideas in that subject. So, I quit quantum gravity and switched to n­-categories.

But this was very painful. Quantum gravity is a kind of “holy grail” in physics. When you work on that subject, you wind up talking to lots of people who believe that unifying quantum mechanics and general relativity is the most important thing in the world, and that nothing else could possibly be as interesting. You wind up believing it. It took me years to get out of that mindset.

Ironically, when I quit quantum gravity, I felt free to explore string theory. As a branch of math, it’s really wonderful. I started looking at how n­-categories apply to string theory. It turns out there’s a wonderful story here: briefly, particles are to categories as strings are to 2­-categories, and all the math of particles can be generalized to strings using this idea! I worked out a bit of this story with Urs Schreiber and John Huerta.

Around 2010, I felt I had to switch to working on environmental issues and math related to engineering and biology, for the sake of the planet. That was another painful renunciation. But luckily, Urs Schreiber and others are continuing to work on n­-categories and string theory, and doing it better than I ever could. So I don’t feel the need to work on those things anymore—indeed, it would be hard to keep up. I just follow along quietly from the sidelines.

It’s quite possible that we need a dozen more good ideas before we really get anywhere on quantum gravity. But I feel confident that n­-categories will have a role to play. So, I’m happy to have helped push that subject forward.

Your uncle, Albert Baez, was a physicist. How did he help develop your interests?

He had a huge effect on me. He’s mainly famous for being the father of the folk singer Joan Baez. But he started out in optics and helped invent the first X­-ray microscope. Later he became very involved in physics education, especially in what were then called third­-world countries. For example, in 1951 he helped set up a physics department at the University of Baghdad.

Albert V. Baez

Albert V. Baez

When I was a kid he worked for UNESCO, so he’d come to Washington D.C. and stay with my parents, who lived nearby. Whenever he showed up, he would open his suitcase and pull out amazing gadgets: diffraction gratings, holograms, and things like that. And he would explain how they work! I decided physics was the coolest thing there is.

When I was eight, he gave me a copy of his book The New College Physics: A Spiral Approach. I immediately started trying to read it. The “spiral approach” is a great pedagogical principle: instead of explaining each topic just once, you should start off easy and then keep spiraling around from topic to topic, examining them in greater depth each time you revisit them. So he not only taught me physics, he taught me about how to learn and how to teach.

Later, when I was fifteen, I spent a couple weeks at his apartment in Berkeley. He showed me the Lawrence Hall of Science, which is where I got my first taste of programming—in BASIC, with programs stored on paper tape. This was in 1976. He also gave me a copy of The Feynman Lectures on Physics. And so, the following summer, when I was working at a state park building trails and such, I was also trying to learn quantum mechanics from the third volume of The Feynman Lectures. The other kids must have thought I was a complete geek—which of course I was.

Give us some insight on what your average work day is like.

During the school year I teach two or three days a week. On days when I teach, that’s the focus of my day: I try to prepare my classes starting at breakfast. Teaching is lots of fun for me. Right now I’m teaching two courses: an undergraduate course on game theory and a graduate course on category theory. I’m also running a seminar on category theory. In addition, I meet with my graduate students for a four­-hour session once a week: they show me what they’ve done, and we try to push our research projects forward.

On days when I don’t teach, I spend a lot of time writing. I love blogging, so I could easily do that all day, but I try to spend a lot of time writing actual papers. Any given paper starts out being tough to write, but near the end it practically writes itself. At the end, I have to tear myself away from it: I keep wanting to add more. At that stage, I feel an energetic glow at the end of a good day spent writing. Few things are so satisfying.

During the summer I don’t teach, so I can get a lot of writing done. I spent two years doing research at the Centre of Quantum Technologies, which is in Singapore, and since 2012 I’ve been working there during summers. Sometimes I bring my grad students, but mostly I just write.

I also spend plenty of time doing things with my wife, like talking, cooking, shopping, and working out at the gym. We like to watch TV shows in the evening, mainly mysteries and science fiction.

We also do a lot of gardening. When I was younger that seemed boring ­ but as you get older, subjective time speeds up, so you pay more attention to things like plants growing. There’s something tremendously satisfying about planting a small seedling, watching it grow into an orange tree, and eating its fruit for breakfast.

I love playing the piano and recording electronic music, but doing it well requires big blocks of time, which I don’t always have. Music is pure delight, and if I’m not listening to it I’m usually composing it in my mind.

If I gave in to my darkest urges and becames a decadent wastrel I might spend all day blogging, listening to music, recording music and working on pure math. But I need other things to stay sane.

What research are you working on at the moment?

Lately I’ve been trying to finish a paper called “Struggles with the Continuum”. It’s about the problems physics has with infinities, due to the assumption that spacetime is a continuum. At certain junctures this paper became psychologically difficult to write, since it’s supposed to include a summary of quantum field theory, which is complicated and sprawling subject. So, I’ve resorted to breaking this paper into blog articles and posting them on Physics Forums, just to motivate myself.

Purely for fun, I’ve been working with Greg Egan on some projects involving the octonions. The octonions are a number system where you can add, subtract, multiply and divide. Such number systems only exist in 1, 2, 4, and 8 dimensions: you’ve got the real numbers, which form a line, the complex numbers, which form a plane, the quaternions, which are 4­-dimensional, and the octonions, which are 8­dimensional. The octonions are the biggest, but also the weirdest. For example, multiplication of octonions violates the associative law: (xy)z is not equal to x(yz). So the octonions sound completely crazy at first, but they turn out to have fascinating connections to string theory and other things. They’re pretty addictive, and if became a decadent wastrel I would spend a lot more time on them.

The integral octonions of norm 1, projected onto a plane

The 240 unit integral octonions, projected onto a plane

There’s a concept of “integer” for the octonions, and integral octonions form a lattice, a repeating pattern of points, in 8 dimensions. This is called the E8 lattice. There’s another lattices that lives in in 24 dimensions, called the “Leech lattice”. Both are connected to string theory. Notice that 8+2 equals 10, the dimension superstrings like to live in, and 24+2 equals 26, the dimension bosonic strings like to live in. That’s not a coincidence! The 2 here comes from the 2­-dimensional world-sheet of the string.

Since 3×8 is 24, Egan and I became interested in how you could built the Leech lattice from 3 copies of the E8 lattice. People already knew a trick for doing it, but it took us a while to understand how it worked—and then Egan showed you could do this trick in exactly 17,280 ways! I want to write up the proof. There’s a lot of beautiful geometry here.

There’s something really exhilarating about struggling to reach the point where you have some insight into these structures and how they’re connected to physics.

My main work, though, involves using category theory to study networks. I’m interested in networks of all kinds, from electrical circuits to neural networks to “chemical reaction networks” and many more. Different branches of science and engineering focus on different kinds of networks. But there’s not enough communication between researchers in different subjects, so it’s up to mathematicians to set up a unified theory of networks.

I’ve got seven grad students working on this project—or actually eight, if you count Brendan Fong: I’ve been helping him on his dissertation, but he’s actually a student at Oxford.

Brendan was the first to join the project. I wanted him to work on electrical circuits, which are a nice familiar kind of network, a good starting point. But he went much deeper: he developed a general category­-theoretic framework for studying networks. We then applied it to electrical circuits, and other things as well.

Blake Pollard and Brendan Fong at the Centre for Quantum Technologies

Blake Pollard and Brendan Fong at the Centre for Quantum Technologies

Blake Pollard is a student of mine in the physics department here at U. C. Riverside. Together with Brendan and me, he developed a category­-theoretic approach to Markov processes: random processes where a system hops around between different states. We used Brendan’s general formalism to reduce Markov processes to electrical circuits. Now Blake is going further and using these ideas to tackle chemical reaction networks.

My other students are in the math department at U. C. Riverside. Jason Erbele is working on “control theory”, a branch of engineering where you try to design feedback loops to make sure processes run in a stable way. Control theory uses networks called “signal flow diagrams”, and Jason has worked out how to understand these using category theory.

Signal flow diagram

Signal flow diagram for an inverted pendulum on a cart

Jason isn’t using Brendan’s framework: he’s using a different one, called PROPs, which were developed a long time ago for use in algebraic topology. My student Franciscus Rebro has been developing it further, for use in our project. It gives a nice way to describe networks in terms of their basic building blocks. It also illuminates the similarity between signal flow diagrams and Feynman diagrams! They’re very similar, but there’s a big difference: in signal flow diagrams the signals are classical, while Feynman diagrams are quantum­-mechanical.

My student Brandon Coya has been working on electrical circuits. He’s sort of continuing what Brendan started, and unifying Brendan’s formalism with PROPs.

My student Adam Yassine is starting to work on networks in classical mechanics. In classical mechanics you usually consider a single system: you write down the Hamiltonian, you get the equations of motion, and you try to solve them. He’s working on a setup where you can take lots of systems and hook them up into a network.

My students Kenny Courser and Daniel Cicala are digging deeper into another aspect of network theory. As I hinted earlier, a category is about things and processes that turn one thing into another. In a 2­-category we also have “meta-processes” that turn one process into another. We’re starting to bring 2­-categories into network theory.

For example, you can use categories to describe an electrical circuit as a process that turns some inputs into some outputs. You put some currents in one end and some currents come out the other end. But you can also use 2­-categories to describe “meta-processes” that turn one electrical circuit into another. An example of a meta-process would be a way of simplifying an electrical circuit, like replacing two resistors in series by a single resistor.

Ultimately I want to push these ideas in the direction of biochemistry. Biology seems complicated and “messy” to physicists and mathematicians, but I think there must be a beautiful logic to it. It’s full of networks, and these networks change with time. So, 2­-categories seem like a natural language for biology.

It won’t be easy to convince people of this, but that’s okay.


Get every new post delivered to your Inbox.

Join 3,222 other followers