Struggles with the Continuum (Part 3)

12 September, 2016

In these posts, we’re seeing how our favorite theories of physics deal with the idea that space and time are a continuum, with points described as lists of real numbers. We’re not asking if this idea is true: there’s no clinching evidence to answer that question, so it’s too easy to let ones philosophical prejudices choose the answer. Instead, we’re looking to see what problems this idea causes, and how physicists have struggled to solve them.

We started with the Newtonian mechanics of point particles attracting each other with an inverse square force law. We found strange ‘runaway solutions’ where particles shoot to infinity in a finite amount of time by converting an infinite amount of potential energy into kinetic energy.

Then we added quantum mechanics, and we saw this problem went away, thanks to the uncertainty principle.

Now let’s take away quantum mechanics and add special relativity. Now our particles can’t go faster than light. Does this help?

Point particles interacting with the electromagnetic field

Special relativity prohibits instantaneous action at a distance. Thus, most physicists believe that special relativity requires that forces be carried by fields, with disturbances in these fields propagating no faster than the speed of light. The argument for this is not watertight, but we seem to actually see charged particles transmitting forces via a field, the electromagnetic field—that is, light. So, most work on relativistic interactions brings in fields.

Classically, charged point particles interacting with the electromagnetic field are described by two sets of equations: Maxwell’s equations and the Lorentz force law. The first are a set of differential equations involving:

• the electric field \vec E and mangetic field \vec B, which in special relativity are bundled together into the electromagnetic field F, and

• the electric charge density \rho and current density \vec \jmath, which are bundled into another field called the ‘four-current’ J.

By themselves, these equations are not enough to completely determine the future given initial conditions. In fact, you can choose \rho and \vec \jmath freely, subject to the conservation law

\displaystyle{ \frac{\partial \rho}{\partial t} + \nabla \cdot \vec \jmath = 0 }

For any such choice, there exists a solution of Maxwell’s equations for t \ge 0 given initial values for \vec E and \vec B that obey these equations at t = 0.

Thus, to determine the future given initial conditions, we also need equations that say what \rho and \vec{\jmath} will do. For a collection of charged point particles, they are determined by the curves in spacetime traced out by these particles. The Lorentz force law says that the force on a particle of charge e is

\vec{F} = e (\vec{E} + \vec{v} \times \vec{B})

where \vec v is the particle’s velocity and \vec{E} and \vec{B} are evaluated at the particle’s location. From this law we can compute the particle’s acceleration if we know its mass.

The trouble starts when we try to combine Maxwell’s equations and the Lorentz force law in a consistent way, with the goal being to predict the future behavior of the \vec{E} and \vec{B} fields, together with particles’ positions and velocities, given all these quantities at t = 0. Attempts to do this began in the late 1800s. The drama continues today, with no definitive resolution! You can find good accounts, available free online, by Feynman and by Janssen and Mecklenburg. Here we can only skim the surface.

The first sign of a difficulty is this: the charge density and current associated to a charged particle are singular, vanishing off the curve it traces out in spacetime but ‘infinite’ on this curve. For example, a charged particle at rest at the origin has

\rho(t,\vec x) = e \delta(\vec{x}), \qquad \vec{\jmath}(t,\vec{x}) = \vec{0}

where \delta is the Dirac delta and e is the particle’s charge. This in turn forces the electric field to be singular at the origin. The simplest solution of Maxwell’s equations consisent with this choice of \rho and \vec\jmath is

\displaystyle{ \vec{E}(t,\vec x) = \frac{e \hat{r}}{4 \pi \epsilon_0 r^2}, \qquad \vec{B}(t,\vec x) = 0 }

where \hat{r} is a unit vector pointing away from the origin and \epsilon_0 is a constant called the permittivity of free space.

In short, the electric field is ‘infinite’, or undefined, at the particle’s location. So, it is unclear how to define the ‘self-force’ exerted by the particle’s own electric field on itself. The formula for the electric field produced by a static point charge is really just our old friend, the inverse square law. Since we had previously ignored the force of a particle on itself, we might try to continue this tactic now. However, other problems intrude.

In relativistic electrodynamics, the electric field has energy density equal to

\displaystyle{ \frac{\epsilon_0}{2} |\vec{E}|^2 }

Thus, the total energy of the electric field of a point charge at rest is proportional to

\displaystyle{ \frac{\epsilon_0}{2} \int_{\mathbb{R}^3} |\vec{E}|^2 \, d^3 x =  \frac{e^2}{8 \pi \epsilon_0} \int_0^\infty \frac{1}{r^4} \, r^2 dr. }

But this integral diverges near r = 0, so the electric field of a charged particle has an infinite energy!

How, if at all, does this cause trouble when we try to unify Maxwell’s equations and the Lorentz force law? It helps to step back in history. In 1902, the physicist Max Abraham assumed that instead of a point, an electron is a sphere of radius R with charge evenly distributed on its surface. Then the energy of its electric field becomes finite, namely:

\displaystyle{  E = \frac{e^2}{8 \pi \epsilon_0} \int_{R}^\infty \frac{1}{r^4} \, r^2 dr = \frac{1}{2} \frac{e^2}{4 \pi \epsilon_0 R} }

where e is the electron’s charge.

Abraham also computed the extra momentum a moving electron of this sort acquires due to its electromagnetic field. He got it wrong because he didn’t understand Lorentz transformations. In 1904 Lorentz did the calculation right. Using the relationship between velocity, momentum and mass, we can derive from his result a formula for the ‘electromagnetic mass’ of the electron:

\displaystyle{  m = \frac{2}{3} \frac{e^2}{4 \pi \epsilon_0 R c^2} }

where c is the speed of light. We can think of this as the extra mass an electron acquires by carrying an electromagnetic field along with it.

Putting the last two equations together, these physicists obtained a remarkable result:

\displaystyle{  E = \frac{3}{4} mc^2 }

Then, in 1905, a fellow named Einstein came along and made it clear that the only reasonable relation between energy and mass is

E = mc^2

What had gone wrong?

In 1906, Poincaré figured out the problem. It is not a computational mistake, nor a failure to properly take special relativity into account. The problem is that like charges repel, so if the electron were a sphere of charge it would explode without something to hold it together. And that something—whatever it is—might have energy. But their calculation ignored that extra energy.

In short, the picture of the electron as a tiny sphere of charge, with nothing holding it together, is incomplete. And the calculation showing E = \frac{3}{4}mc^2, together with special relativity saying E = mc^2, shows that this incomplete picture is inconsistent. At the time, some physicists hoped that all the mass of the electron could be accounted for by the electromagnetic field. Their hopes were killed by this discrepancy.

Nonetheless it is interesting to take the energy E computed above, set it equal to m_e c^2 where m_e is the electron’s observed mass, and solve for the radius R. The answer is

\displaystyle{ R = \frac{1}{8 \pi \epsilon_0} \frac{e^2}{m_e c^2} } \approx 1.4 \times 10^{-15} \mathrm{ meters}

In the early 1900s, this would have been a remarkably tiny distance: 0.00003 times the Bohr radius of a hydrogen atom. By now we know this is roughly the radius of a proton. We know that electrons are not spheres of this size. So at present it makes more sense to treat the calculations so far as a prelude to some kind of limiting process where we take R \to 0. These calculations teach us two lessons.

First, the electromagnetic field energy approaches +\infty as we let R \to 0, so we will be hard pressed to take this limit and get a well-behaved physical theory. One approach is to give a charged particle its own ‘bare mass’ m_\mathrm{bare} in addition to the mass m_\mathrm{elec} arising from electromagnetic field energy, in a way that depends on R. Then as we take the R \to 0 limit we can let m_\mathrm{bare} \to -\infty in such a way that m_\mathrm{bare} + m_\mathrm{elec} approaches a chosen limit m, the physical mass of the point particle. This is an example of ‘renormalization’.

Second, it is wise to include conservation of energy-momentum as a requirement in addition to Maxwell’s equations and the Lorentz force law. Here is a more sophisticated way to phrase Poincaré’s realization. From the electromagnetic field one can compute a ‘stress-energy tensor’ T, which describes the flow of energy and momentum through spacetime. If all the energy and momentum of an object comes from its electromagnetic field, you can compute them by integrating T over the hypersurface t = 0. You can prove that the resulting 4-vector transforms correctly under Lorentz transformations if you assume the stress-energy tensor has vanishing divergence: \partial^\mu T_{\mu \nu} = 0. This equation says that energy and momentum are locally conserved. However, this equation fails to hold for a spherical shell of charge with no extra forces holding it together. In the absence of extra forces, it violates conservation of momentum for a charge to feel an electromagnetic force yet not accelerate.

So far we have only discussed the simplest situation: a single charged particle at rest, or moving at a constant velocity. To go further, we can try to compute the acceleration of a small charged sphere in an arbitrary electromagnetic field. Then, by taking the limit as the radius r of the sphere goes to zero, perhaps we can obtain the law of motion for a charged point particle.

In fact this whole program is fraught with difficulties, but physicists boldly go where mathematicians fear to tread, and in a rough way this program was carried out already by Abraham in 1905. His treatment of special relativistic effects was wrong, but these were easily corrected; the real difficulties lie elsewhere. In 1938 his calculations were carried out much more carefully—though still not rigorously—by Dirac. The resulting law of motion is thus called the ‘Abraham–Lorentz–Dirac force law’.

There are three key ways in which this law differs from our earlier naive statement of the Lorentz force law:

• We must decompose the electromagnetic field in two parts, the ‘external’ electromagnetic field F_\mathrm{ext} and the field produced by the particle:

F = F_\mathrm{ext} + F_\mathrm{ret}

Here F_\mathrm{ext} is a solution Maxwell equations with J = 0, while F_\mathrm{ret} is computed by convolving the particle’s 4-current J with a function called the ‘retarded Green’s function’. This breaks the time-reversal symmetry of the formalism so far, ensuring that radiation emitted by the particle moves outward as t increases. We then decree that the particle only feels a Lorentz force due to F_\mathrm{ext}, not F_\mathrm{ret}. This avoids the problem that F_\mathrm{ret} becomes infinite along the particle’s path as r \to 0.

• Maxwell’s equations say that an accelerating charged particle emits radiation, which carries energy-momentum. Conservation of energy-momentum implies that there is a compensating force on the charged particle. This is called the ‘radiation reaction’. So, in addition to the Lorentz force, there is a radiation reaction force.

• As we take the limit r \to 0, we must adjust the particle’s bare mass m_\mathrm{bare} in such a way that its physical mass m = m_\mathrm{bare} + m_\mathrm{elec} is held constant. This involves letting m_\mathrm{bare} \to -\infty as m_\mathrm{elec} \to +\infty.

It is easiest to describe the Abraham–Lorentz–Dirac force law using standard relativistic notation. So, we switch to units where c and 4 \pi \epsilon_0 equal 1, let x^\mu denote the spacetime coordinates of a point particle, and use a dot to denote the derivative with respect to proper time. Then the Abraham–Lorentz–Dirac force law says

m \ddot{x}^\mu = e F_{\mathrm{ext}}^{\mu \nu} \, \dot{x}_\nu \; - \; \frac{2}{3}e^2 \ddot{x}^\alpha \ddot{x}_\alpha \, \dot{x}^\mu \; + \; \frac{2}{3}e^2 \dddot{x}^\mu  .

The first term at right is the Lorentz force, which looks more elegant in this new notation. The second term is fairly intuitive: it acts to reduce the particle’s velocity at a rate proportional to its velocity (as one would expect from friction), but also proportional to the squared magnitude of its acceleration. This is the ‘radiation reaction’.

The last term, called the ‘Schott term’, is the most shocking. Unlike all familiar laws in classical mechanics, it involves the third derivative of the particle’s position!

This seems to shatter our original hope of predicting the electromagnetic field and the particle’s position and velocity given their initial values. Now it seems we need to specify the particle’s initial position, velocity and acceleration.

Furthermore, unlike Maxwell’s equations and the original Lorentz force law, the Abraham–Lorentz–Dirac force law is not symmetric under time reversal. If we take a solution and replace t with -t, the result is not a solution. Like the force of friction, radiation reaction acts to make a particle lose energy as it moves into the future, not the past.

The reason is that our assumptions have explicitly broken time symmetry. The splitting F = F_\mathrm{ext} + F_\mathrm{ret} says that a charged accelerating particle radiates into the future, creating the field F_\mathrm{ret}, and is affected only by the remaining electromagnetic field F_\mathrm{ext}.

Worse, the Abraham–Lorentz–Dirac force law has counterintuitive solutions. Suppose for example that F_\mathrm{ext} = 0. Besides the expected solutions where the particle’s velocity is constant, there are solutions for which the particle accelerates indefinitely, approaching the speed of light! These are called ‘runaway solutions’. In these runaway solutions, the acceleration as measured in the frame of reference of the particle grows exponentially with the passage of proper time.

So, the notion that special relativity might help us avoid the pathologies of Newtonian point particles interacting gravitationally—five-body solutions where particles shoot to infinity in finite time—is cruelly mocked by the Abraham–Lorentz–Dirac force law. Particles cannot move faster than light, but even a single particle can extract an arbitrary amount of energy-momentum from the electromagnetic field in its immediate vicinity and use this to propel itself forward at speeds approaching that of light. The energy stored in the field near the particle is sometimes called ‘Schott energy’.

Thanks to the Schott term in the Abraham–Lorentz–Dirac force law, the Schott energy can be converted into kinetic energy for the particle. The details of how this work are nicely discussed in a paper by Øyvind Grøn, so click the link and read that if you’re interested. I’ll just show you a picture from that paper:

Gron - Schott energy

So even one particle can do crazy things! But worse, suppose we generalize the framework to include more than one particle. The arguments for the Abraham–Lorentz–Dirac force law can be generalized to this case. The result is simply that each particle obeys this law with an external field F_\mathrm{ext} that includes the fields produced by all the other particles. But a problem appears when we use this law to compute the motion of two particles of opposite charge. To simplify the calculation, suppose they are located symmetrically with respect to the origin, with equal and opposite velocities and accelerations. Suppose the external field felt by each particle is solely the field created by the other particle. Since the particles have opposite charges, they should attract each other. However, one can prove they will never collide. In fact, if at any time they are moving towards each other, they will later turn around and move away from each other at ever-increasing speed!

This fact was discovered by C. Jayaratnam Eliezer in 1943. It is so counterintuitive that several proofs were required before physicists believed it.

None of these strange phenomena have ever been seen experimentally. Faced with this problem, physicists have naturally looked for ways out. First, why not simply cross out the \dddot{x}^\mu term in the Abraham–Lorentz–Dirac force? Unfortunately the resulting simplified equation

m \ddot{x}^\mu = e F_{\mathrm{ext}}^{\mu \nu} \, \dot{x}_\nu - \frac{2}{3}e^2 \ddot{x}^\alpha \ddot{x}_\alpha \, \dot{x}^\mu

has only trivial solutions. The reason is that with the particle’s path parametrized by proper time, the vector \dot{x}^\mu has constant length, so the vector \ddot{x}^\mu is orthogonal to \dot{x}^\mu . So is the vector F_{\mathrm{ext}}^{\mu \nu} \dot{x}_\nu, because F_{\mathrm{ext}} is an antisymmetric tensor. So, the last term must be zero, which implies \ddot{x} = 0, which in turn implies that all three terms must vanish.

Another possibility is that some assumption made in deriving the Abraham–Lorentz–Dirac force law is incorrect. Of course the theory is physically incorrect, in that it ignores quantum mechanics and other things, but that is not the issue. The issue here is one of mathematical physics, of trying to formulate a well-behaved classical theory that describes charged point particles interacting with the electromagnetic field. If we can prove this is impossible, we will have learned something. But perhaps there is a loophole. The original arguments for the Abraham–Lorentz–Dirac force law are by no means mathematically rigorous. They involve a delicate limiting procedure, and approximations that were believed, but not proved, to become perfectly accurate in the r \to 0 limit. Could these arguments conceal a mistake?

Calculations involving a spherical shell of charge has been improved by a series of authors, and nicely summarized by Fritz Rohrlich. In all these calculations, nonlinear powers of the acceleration and its time derivatives are neglected, and one hopes this is acceptable in the r \to 0 limit.

Dirac, struggling with renormalization in quantum field theory, took a different tack. Instead of considering a sphere of charge, he treated the electron as a point from the very start. However, he studied the flow of energy-momentum across the surface of a tube of radius r centered on the electron’s path. By computing this flow in the limit r \to 0, and using conservation of energy-momentum, he attempted to derive the force on the electron. He did not obtain a unique result, but the simplest choice gives the Abraham–Lorentz–Dirac equation. More complicated choices typically involve nonlinear powers of the acceleration and its time derivatives.

Since this work, many authors have tried to simplify Dirac’s rather complicated calculations and clarify his assumptions. This book is a good guide:

• Stephen Parrott, Relativistic Electrodynamics and Differential Geometry, Springer, Berlin, 1987.

But more recently, Jerzy Kijowski and some coauthors have made impressive progress in a series of papers that solve many of the problems we have described.

Kijowski’s key idea is to impose conditions on precisely how the electromagnetic field is allowed to behave near the path traced out by a charged point particle. He breaks the field into a ‘regular’ part and a ‘singular’ part:

F = F_\textrm{reg} + F_\textrm{sing}

Here F_\textrm{reg} is smooth everywhere, while F_\textrm{sing} is singular near the particle’s path, but only in a carefully prescribed way. Roughly, at each moment, in the particle’s instantaneous rest frame, the singular part of its electric field consists of the familiar part proportional to 1/r^2, together with a part proportional to 1/r^3 which depends on the particle’s acceleration. No other singularities are allowed!

On the one hand, this eliminates the ambiguities mentioned earlier: in the end, there are no ‘nonlinear powers of the acceleration and its time derivatives’ in Kijowski’s force law. On the other hand, this avoids breaking time reversal symmetry, as the earlier splitting F = F_\textrm{ext} + F_\textrm{ret} did.

Next, Kijowski defines the energy-momentum of a point particle to be m \dot{x}, where m is its physical mass. He defines the energy-momentum of the electromagnetic field to be just that due to F_\textrm{reg}, not F_\textrm{sing}. This amounts to eliminating the infinite ‘electromagnetic mass’ of the charged particle. He then shows that Maxwell’s equations and conservation of total energy-momentum imply an equation of motion for the particle!

This equation is very simple:

m \ddot{x}^\mu = e F_{\textrm{reg}}^{\mu \nu} \, \dot{x}_\nu

It is just the Lorentz force law! Since the troubling Schott term is gone, this is a second-order differential equation. So we can hope that to predict the future behavior of the electromagnetic field, together with the particle’s position and velocity, given all these quantities at t = 0.

And indeed this is true! In 1998, together with Gittel and Zeidler, Kijowski proved that initial data of this sort, obeying the careful restrictions on allowed singularities of the electromagnetic field, determine a unique solution of Maxwell’s equations and the Lorentz force law, at least for a short amount of time. Even better, all this remains true for any number of particles.

There are some obvious questions to ask about this new approach. In the Abraham–Lorentz–Dirac force law, the acceleration was an independent variable that needed to be specified at t = 0 along with position and momentum. This problem disappears in Kijowski’s approach. But how?

We mentioned that the singular part of the electromagnetic field, F_\textrm{sing}, depends on the particle’s acceleration. But more is true: the particle’s acceleration is completely determined by F_\textrm{sing}. So, the particle’s acceleration is not an independent variable because it is encoded into the electromagnetic field.

Another question is: where did the radiation reaction go? The answer is: we can see it if we go back and decompose the electromagnetic field as F_\textrm{ext} + F_\textrm{ret} as we had before. If we take the law

m \ddot{x}^\mu = e F_{\textrm{reg}}^{\mu \nu} \dot{x}_\nu

and rewrite it in terms of F_\textrm{ext}, we recover the original Abraham–Lorentz–Dirac law, including the radiation reaction term and Schott term.

Unfortunately, this means that ‘pathological’ solutions where particles extract arbitrary amounts of energy from the electromagnetic field are still possible. A related problem is that apparently nobody has yet proved solutions exist for all time. Perhaps a singularity worse than the allowed kind could develop in a finite amount of time—for example, when particles collide.

So, classical point particles interacting with the electromagnetic field still present serious challenges to the physicist and mathematician. When you have an infinitely small charged particle right next to its own infinitely strong electromagnetic field, trouble can break out very easily!

Particles without fields

Finally, I should also mention attempts, working within the framework of special relativity, to get rid of fields and have particles interact with each other directly. For example, in 1903 Schwarzschild introduced a framework in which charged particles exert an electromagnetic force on each other, with no mention of fields. In this setup, forces are transmitted not instantaneously but at the speed of light: the force on one particle at one spacetime point x depends on the motion of some other particle at spacetime point y only if the vector x - y is lightlike. Later Fokker and Tetrode derived this force law from a principle of least action. In 1949, Feynman and Wheeler checked that this formalism gives results compatible with the usual approach to electromagnetism using fields, except for several points:

• Each particle exerts forces only on other particles, so we avoid the thorny issue of how a point particle responds to the electromagnetic field produced by itself.

• There are no electromagnetic fields not produced by particles: for example, the theory does not describe the motion of a charged particle in an ‘external electromagnetic field’.

• The principle of least action guarantees that ‘if A affects B then B affects A‘. So, if a particle at x exerts a force on a particle at a point y in its future lightcone, the particle at y exerts a force on the particle at x in its past lightcone. This raises the issue of ‘reverse causality’, which Feynman and Wheeler address.

Besides the reverse causality issue, perhaps one reason this approach has not been more pursued is that it does not admit a Hamiltonian formulation in terms of particle positions and momenta. Indeed, there are a number of ‘no-go theorems’ for relativistic multiparticle Hamiltonians, saying that these can only describe noninteracting particles. So, most work that takes both quantum mechanics and special relativity into account uses fields.

Indeed, in quantum electrodynamics, even the charged point particles are replaced by fields—namely quantum fields! Next time we’ll see whether that helps.

Struggles with the Continuum (Part 2)

9 September, 2016

Last time we saw that that nobody yet knows if Newtonian gravity, applied to point particles, truly succeeds in predicting the future. To be precise: for four or more particles, nobody has proved that almost all initial conditions give a well-defined solution for all times!

The problem is related to the continuum nature of space: as particles get arbitrarily close to other, an infinite amount of potential energy can be converted to kinetic energy in a finite amount of time.

I left off by asking if this problem is solve by more sophisticated theories. For example, does the ‘speed limit’ imposed by special relativity help the situation? Or might quantum mechanics help, since it describes particles as ‘probability clouds’, and puts limits on how accurately we can simultaneously know both their position and momentum?

We begin with quantum mechanics, which indeed does help.

The quantum mechanics of charged particles

Few people spend much time thinking about ‘quantum celestial mechanics’—that is, quantum particles obeying Schrödinger’s equation, that attract each other gravitationally, obeying an inverse-square force law. But Newtonian gravity is a lot like the electrostatic force between charged particles. The main difference is a minus sign, which makes like masses attract, while like charges repel. In chemistry, people spend a lot of time thinking about charged particles obeying Schrödinger’s equation, attracting or repelling each other electrostatically. This approximation neglects magnetic fields, spin, and indeed anything related to the finiteness of the speed of light, but it’s good enough explain quite a bit about atoms and molecules.

In this approximation, a collection of charged particles is described by a wavefunction \psi, which is a complex-valued function of all the particles’ positions and also of time. The basic idea is that \psi obeys Schrödinger’s equation

\displaystyle{ \frac{d \psi}{dt} = - i H \psi}

where H is an operator called the Hamiltonian, and I’m working in units where \hbar = 1.

Does this equation succeeding in predicting \psi at a later time given \psi at time zero? To answer this, we must first decide what kind of function \psi should be, what concept of derivative applies to such funtions, and so on. These issues were worked out by von Neumann and others starting in the late 1920s. It required a lot of new mathematics. Skimming the surface, we can say this.

At any time, we want \psi to lie in the Hilbert space consisting of square-integrable functions of all the particle’s positions. We can then formally solve Schrödinger’s equation as

\psi(t) = \exp(-i t H) \psi(0)

where \psi(t) is the solution at time t. But for this to really work, we need H to be a self-adjoint operator on the chosen Hilbert space. The correct definition of ‘self-adjoint’ is a bit subtler than what most physicists learn in a first course on quantum mechanics. In particular, an operator can be superficially self-adjoint—the actual term for this is ‘symmetric’—but not truly self-adjoint.

In 1951, based on earlier work of Rellich, Kato proved that H is indeed self-adjoint for a collection of nonrelativistic quantum particles interacting via inverse-square forces. So, this simple model of chemistry works fine. We can also conclude that ‘celestial quantum mechanics’ would dodge the nasty problems that we saw in Newtonian gravity.

The reason, simply put, is the uncertainty principle.

In the classical case, bad things happen because the energy is not bounded below. A pair of classical particles attracting each other with an inverse square force law can have arbitrarily large negative energy, simply by being very close to each other. Since energy is conserved, if you have a way to make some particles get an arbitrarily large negative energy, you can balance the books by letting others get an arbitrarily large positive energy and shoot to infinity in a finite amount of time!

When we switch to quantum mechanics, the energy of any collection of particles becomes bounded below. The reason is that to make the potential energy of two particles large and negative, they must be very close. Thus, their difference in position must be very small. In particular, this difference must be accurately known! Thus, by the uncertainty principle, their difference in momentum must be very poorly known: at least one of its components must have a large standard deviation. This in turn means that the expected value of the kinetic energy must be large.

This must all be made quantitative, to prove that as particles get close, the uncertainty principle provides enough positive kinetic energy to counterbalance the negative potential energy. The Kato–Lax–Milgram–Nelson theorem, a refinement of the original Kato–Rellich theorem, is the key to understanding this issue. The Hamiltonian H for a collection of particles interacting by inverse square forces can be written as

H = K + V

where K is an operator for the kinetic energy and V is an operator for the potential energy. With some clever work one can prove that for any \epsilon > 0, there exists c > 0 such that if \psi is a smooth normalized wavefunction that vanishes at infinity and at points where particles collide, then

| \langle \psi , V \psi \rangle | \le \epsilon \langle \psi, K\psi \rangle + c.

Remember that \langle \psi , V \psi \rangle is the expected value of the potential energy, while \langle \psi, K \psi \rangle is the expected value of the kinetic energy. Thus, this inequality is a precise way of saying how kinetic energy triumphs over potential energy.

By taking \epsilon = 1, it follows that the Hamiltonian is bounded below on such
states \psi:

\langle \psi , H \psi \rangle \ge -c

But the fact that the inequality holds even for smaller values of \epsilon is the key to showing H is ‘essentially self-adjoint’. This means that while H is not self-adjoint when defined only on smooth wavefunctions that vanish at infinity and at points where particles collide, it has a unique self-adjoint extension to some larger domain. Thus, we can unambiguously take this extension to be the true Hamiltonian for this problem.

To understand what a great triumph this is, one needs to see what could have gone wrong! Suppose space had an extra dimension. In 3-dimensional space, Newtonian gravity obeys an inverse square force law because the area of a sphere is proportional to its radius squared. In 4-dimensional space, the force obeys an inverse cube law:

\displaystyle{ F = -\frac{Gm_1 m_2}{r^3} }

Using a cube instead of a square here makes the force stronger at short distances, with dramatic effects. For example, even for the classical 2-body problem, the equations of motion no longer ‘almost always’ have a well-defined solution for all times. For an open set of initial conditions, the particles spiral into each other in a finite amount of time!

Hyperbolic spiral - a fairly common orbit in an inverse cube force

Hyperbolic spiral – a fairly common orbit in an inverse cube force.

The quantum version of this theory is also problematic. The uncertainty principle is not enough to save the day. The inequalities above no longer hold: kinetic energy does not triumph over potential energy. The Hamiltonian is no longer essentially self-adjoint on the set of wavefunctions that I described.

In fact, this Hamiltonian has infinitely many self-adjoint extensions! Each one describes different physics: namely, a different choice of what happens when particles collide. Moreover, when G exceeds a certain critical value, the energy is no longer bounded below.

The same problems afflict quantum particles interacting by the electrostatic force in 4d space, as long as some of the particles have opposite charges. So, chemistry would be quite problematic in a world with four dimensions of space.

With more dimensions of space, the situation becomes even worse. In fact, this is part of a general pattern in mathematical physics: our struggles with the continuum tend to become worse in higher dimensions. String theory and M-theory may provide exceptions.

Next time we’ll look at what happens to point particles interacting electromagnetically when we take special relativity into account. After that, we’ll try to put special relativity and quantum mechanics together!

For more

For more on the inverse cube force law, see:

• John Baez, The inverse cube force law, Azimuth, 30 August 2015.

It turns out Newton made some fascinating discoveries about this law in his Principia; it has remarkable properties both classically and in quantum mechanics.

The hyperbolic spiral is one of 3 kinds of orbits possible in an inverse cube force; for the others see:

Cotes’s spiral, Wikipedia.

The picture of a hyperbolic spiral was drawn by Anarkman and Pbroks13 and placed on Wikicommons under a Creative Commons Attribution-Share Alike 3.0 Unported license.

Struggles with the Continuum (Part 1)

8 September, 2016

Is spacetime really a continuum? That is, can points of spacetime really be described—at least locally—by lists of four real numbers (t,x,y,z)? Or is this description, though immensely successful so far, just an approximation that breaks down at short distances?

Rather than trying to answer this hard question, let’s look back at the struggles with the continuum that mathematicians and physicists have had so far.

The worries go back at least to Zeno. Among other things, he argued that that an arrow can never reach its target:

That which is in locomotion must arrive at the half-way stage before it arrives at the goal.Aristotle summarizing Zeno

and Achilles can never catch up with a tortoise:

In a race, the quickest runner can never overtake the slowest, since the pursuer must first reach the point whence the pursued started, so that the slower must always hold a lead.Aristotle summarizing Zeno

These paradoxes can now be dismissed using our theory of real numbers. An interval of finite length can contain infinitely many points. In particular, a sum of infinitely many terms can still converge to a finite answer.

But the theory of real numbers is far from trivial. It became fully rigorous only considerably after the rise of Newtonian physics. At first, the practical tools of calculus seemed to require infinitesimals, which seemed logically suspect. Thanks to the work of Dedekind, Cauchy, Weierstrass, Cantor and others, a beautiful formalism was developed to handle real numbers, limits, and the concept of infinity in a precise axiomatic manner.

However, the logical problems are not gone. Gödel’s theorems hang like a dark cloud over the axioms of mathematics, assuring us that any consistent theory as strong as Peano arithmetic, or stronger, cannot prove itself consistent. Worse, it will leave some questions unsettled.

For example: how many real numbers are there? The continuum hypothesis proposes a conservative answer, but the usual axioms of set theory leaves this question open: there could vastly more real numbers than most people think. And the superficially plausible axiom of choice—which amounts to saying that the product of any collection of nonempty sets is nonempty—has scary consequences, like the existence of non-measurable subsets of the real line. This in turn leads to results like that of Banach and Tarski: one can partition a ball of unit radius into six disjoint subsets, and by rigid motions reassemble these subsets into two disjoint balls of unit radius. (Later it was shown that one can do the job with five, but no fewer.)

However, most mathematicians and physicists are inured to these logical problems. Few of us bother to learn about attempts to tackle them head-on, such as:

nonstandard analysis and synthetic differential geometry, which let us work consistently with infinitesimals,

constructivism, which avoids proof by contradiction: for example, one must ‘construct’ a mathematical object to prove that it exists,

finitism (which avoids infinities altogether),

ultrafinitism, which even denies the existence of very large numbers.

This sort of foundational work proceeds slowly, and is now deeply unfashionable. One reason is that it rarely seems to intrude in ‘real life’ (whatever that is). For example, it seems that no question about the experimental consequences of physical theories has an answer that depends on whether or not we assume the continuum hypothesis or the axiom of choice.

But even if we take a hard-headed practical attitude and leave logic to the logicians, our struggles with the continuum are not over. In fact, the infinitely divisible nature of the real line—the existence of arbitrarily small real numbers—is a serious challenge to almost all of the most widely used theories of physics.

Indeed, we have been unable to rigorously prove that most of these theories make sensible predictions in all circumstances, thanks to problems involving the continuum.

One might hope that a radical approach to the foundations of mathematics—such as those listed above—would allow avoid some of the problems I’ll be discussing. However, I know of no progress along these lines that would interest most physicists. Some of the ideas of constructivism have been embraced by topos theory, which also provides a foundation for calculus with infinitesimals using synthetic differential geometry. Topos theory and especially higher topos theory are becoming important in mathematical physics. They’re great! But as far as I know, they have not been used to solve the problems I want to discuss here.

Today I’ll talk about one of the first theories to use calculus: Newton’s theory of gravity.

Newtonian Gravity

In its simplest form, Newtonian gravity describes ideal point particles attracting each other with a force inversely proportional to the square of their distance. It is one of the early triumphs of modern physics. But what happens when these particles collide? Apparently the force between them becomes infinite. What does Newtonian gravity predict then?

Of course real planets are not points: when two planets come too close together, this idealization breaks down. Yet if we wish to study Newtonian gravity as a mathematical theory, we should consider this case. Part of working with a continuum is successfully dealing with such issues.

In fact, there is a well-defined ‘best way’ to continue the motion of two point masses through a collision. Their velocity becomes infinite at the moment of collision but is finite before and after. The total energy, momentum and angular momentum are unchanged by this event. So, a 2-body collision is not a serious problem. But what about a simultaneous collision of 3 or more bodies? This seems more difficult.

Worse than that, Xia proved in 1992 that with 5 or more particles, there are solutions where particles shoot off to infinity in a finite amount of time!

This sounds crazy at first, but it works like this: a pair of heavy particles orbit each other, another pair of heavy particles orbit each other, and these pairs toss a lighter particle back and forth. Xia and Saari’s nice expository article has a picture of the setup:


Each time the lighter particle gets thrown back and forth, the pairs move further apart from each other, while the two particles within each pair get closer together. And each time they toss the lighter particle back and forth, the two pairs move away from each other faster!

As the time t approaches a certain value t_0, the speed of these pairs approaches infinity, so they shoot off to infinity in opposite directions in a finite amount of time, and the lighter particle bounces back and forth an infinite number of times!

Of course this crazy behavior isn’t possible in the real world, but Newtonian physics has no ‘speed limit’, and we’re idealizing the particles as points. So, if two or more of them get arbitrarily close to each other, the potential energy they liberate can give some particles enough kinetic energy to zip off to infinity in a finite amount of time! After that time, the solution is undefined.

You can think of this as a modern reincarnation of Zeno’s paradox. Suppose you take a coin and put it heads up. Flip it over after 1/2 a second, then flip it over after 1/4 of a second, and so on. After one second, which side will be up? There is no well-defined answer. That may not bother us, since this is a contrived scenario that seems physically impossible. It’s a bit more bothersome that Newtonian gravity doesn’t tell us what happens to our particles when t = t_0.

Your might argue that collisions and these more exotic ‘noncollision singularities’ occur with probability zero, because they require finely tuned initial conditions. If so, perhaps we can safely ignore them!

This is a nice fallback position. But to a mathematician, this argument demands proof.

A bit more precisely, we would like to prove that the set of initial conditions for which two or more particles come arbitrarily close to each other within a finite time has ‘measure zero’. This would mean that ‘almost all’ solutions are well-defined for all times, in a very precise sense.

In 1977, Saari proved that this is true for 4 or fewer particles. However, to the best of my knowledge, the problem remains open for 5 or more particles. Thanks to previous work by Saari, we know that the set of initial conditions that lead to collisions has measure zero, regardless of the number of particles. So, the remaining problem is to prove that noncollision singularities occur with probability zero.

It is remarkable that even Newtonian gravity, often considered a prime example of determinism in physics, has not been proved to make definite predictions, not even ‘almost always’! In 1840, Laplace wrote:

We ought to regard the present state of the universe as the effect of its antecedent state and as the cause of the state that is to follow. An intelligence knowing all the forces acting in nature at a given instant, as well as the momentary positions of all things in the universe, would be able to comprehend in one single formula the motions of the largest bodies as well as the lightest atoms in the world, provided that its intellect were sufficiently powerful to subject all data to analysis; to it nothing would be uncertain, the future as well as the past would be present to its eyes. The perfection that the human mind has been able to give to astronomy affords but a feeble outline of such an intelligence.Laplace

However, this dream has not yet been realized for Newtonian gravity.

I expect that noncollision singularities will be proved to occur with probability zero. If so, the remaining question would why it takes so much work to prove this, and thus prove that Newtonian gravity makes definite predictions in almost all cases. Is this is a weakness in the theory, or just the way things go? Clearly it has something to do with three idealizations:

• point particles whose distance can be arbitrarily small,

• potential energies that can be arbitrariy large and negative,

• velocities that can be arbitrarily large.

These are connected: as the distance between point particles approaches zero, their potential energy approaches -\infty, and conservation of energy dictates that some velocities approach +\infty.

Does the situation improve when we go to more sophisticated theories? For example, does the ‘speed limit’ imposed by special relativity help the situation? Or might quantum mechanics help, since it describes particles as ‘probability clouds’, and puts limits on how accurately we can simultaneously know both their position and momentum?

Next time I’ll talk about quantum mechanics, which indeed does help.

Interview (Part 2)

21 March, 2016

Greg Bernhardt runs an excellent website for discussing physics, math and other topics, called Physics Forums. He recently interviewed me there. Since I used this opportunity to explain a bit about the Azimuth Project and network theory, I thought I’d reprint the interview here. Here is Part 2.


Tell us about your experience with past projects like “This Week’s Finds in Mathematical Physics”.

I was hired by U.C. Riverside back in 1989. I was lonely and bored, since Lisa was back on the other coast. So, I spent a lot of evenings on the computer.

We had the internet back then—this was shortly after stone tools were invented—but the world-wide web hadn’t caught on yet. So, I would read and write posts on “newsgroups” using a program called a “news server”. You have to imagine me sitting in front of an old green­-on­-black cathode ray tube monitor with a large floppy disk drive, firing up the old modem to hook up to the internet.

In 1993, I started writing a series of posts on the papers I’d read. I called it “This Week’s Finds in Mathematical Physics”, which was a big mistake, because I couldn’t really write one every week. After a while I started using it to explain lots of topics in math and physics. I wrote 300 issues. Then I quit in 2010, when I started taking climate change seriously.

Share with us a bit about your current projects like Azimuth and the n­-Café.

The n­-Category Café is a blog I started with Urs Schreiber and the philosopher David Corfield back in 2006, when all three of us realized that n­-categories are the big wave that math is riding right now. We have a bunch more bloggers on the team now. But the n­-Café lost some steam when I quit work in n­-categories and Urs started putting most of his energy into two related projects: a wiki called the nLab and a discussion group called the nForum.

In 2010, when I noticed that global warming was like a huge wave crashing down on our civilization, I started the Azimuth Project. The goal was to create a focal point for scientists and engineers interested in saving the planet. It consists of a team of people, a blog, a wiki and a discussion group. It was very productive for a while: we wrote a lot of educational articles on climate science and energy issues. But lately I’ve realized I’m better at abstract math. So, I’ve been putting more time into working with my grad students.

What about climate change has captured your interest?

That’s like asking: “What about that huge tsunami rushing toward us has captured your interest?”

Around 2004 I started hearing news that sent chills up my spine ­ and what really worried me is how few people were talking about this news, at least in the US.

I’m talking about how we’re pushing the Earth’s climate out of the glacial cycle we’ve been in for over a million years, into brand new territory. I’m talking about things like how it takes hundreds or thousands of years for CO2 to exit the atmosphere after it’s been put in. And I’m talking about how global warming is just part of a bigger phenomenon: the Anthropocene. That’s a new geological epoch, in which the biosphere is rapidly changing due to human influences. It’s not just the temperature:

• About 1/4 of all chemical energy produced by plants is now used by humans.

• The rate of species going extinct is 100­–1000 times the usual background rate.

• Populations of large ocean fish have declined 90% since 1950.

• Humans now take more nitrogen from the atmosphere and convert it into nitrates than all other processes combined.

8­-9 times as much phosphorus is flowing into oceans than the natural background rate.

This doesn’t necessarily spell the end of our civilization, but it is something that we’ll all have to deal with.

So, I felt the need to alert people and try to dream up strategies to do something. That’s why in 2010 I quit work on n­-categories and started the Azimuth Project.

Carbon Dioxide Variations

You have life experience on both US coasts. Which do you prefer and why?

There are some differences between the coasts, but they’re fairly minor. The West Coast is part of the Pacific Rim, so there’s more Asian influence here. The seasons are less pronounced here, because winds in the northern hemisphere blow from west to east, and the oceans serve as a temperature control system. Down south in Riverside it’s a semi­-desert, so we can eat breakfast in our back yard in January! But I live here not because I like the West Coast more. This just happens to be where my wife Lisa and I managed to get a job.

What I really like is getting out of the US and seeing the rest of the world. When you’re at cremation ritual in Bali, or a Hmong festival in Laos, the difference between regions of the US starts seeming pretty small.

But I wasn’t a born traveler. When I spent my first summer in England, I was very apprehensive about making a fool of myself. The British have different manners, and their old universities are full of arcane customs and subtle social distinctions that even the British find terrifying. But after a few summers there I got over it. First, all around the world, being American gives you a license to be clueless. If you behave any better than the worst stereotypes, people are impressed. Second, I spend most of my time with mathematicians, who are incredibly forgiving of bad social behavior as long as you know interesting theorems.

By now I’ve gotten to feel very comfortable in England. The last couple of years I’ve spent time at the quantum computation group at Oxford–the group run by Bob Coecke and Samson Abramsky. I like talking to Jamie Vicary about n­categories and physics, and also my old friend Minhyong Kim, who is a number theorist there.

I was also very apprehensive when I first visited Paris. Everyone talks about how the waiters are rude, and so on. But I think that’s an exaggeration. Yes, if you go to cafés packed with boorish tourists, the waiters will treat you like a boorish tourist—so don’t do that. If you go to quieter places and behave politely, most people are friendly. Luckily Lisa speaks French and has some friends in Paris; that opens up a lot of opportunities. I don’t speak French, so I always feel like a bit of an idiot, but I’ve learned to cope. I’ve spent a few summers there working with Paul­-André Melliès on category theory and logic.

Yau Ma Tei Market - Hong Kong

Yau Ma Tei Market – Hong Kong

I was also intimidated when I first spent a summer in Hong Kong—and even more so when I spent a summer in Shanghai. Lisa speaks Chinese too: she’s more cultured than me, and she drags me to interesting places. My first day walking around Shanghai left me completely exhausted: everything was new! Walking down the street you see people selling frogs in a bucket, strange fungi and herbs, then a little phone shop where telephone numbers with lots of 8’s cost more, and so on: it’s a kind of cognitive assault.

But again, I came to enjoy it. And coming back to California, everything seemed a bit boring. Why is there so much land that’s not being used? Where are all the people? Why is the food so bland?

I’ve spent the most time outside the US in Singapore. Again, that’s because my wife and I both got job offers there, not because it’s the best place in the world. Compared to China it’s rather sterile and manicured. But it’s still a fascinating place. They’ve pulled themselves up from a British colonial port town to a multi­cultural country that’s in some ways more technologically advanced than the US. The food is great: it’s a mix of Chinese, Indian, Malay and pretty much everything else. There’s essentially no crime: you can walk around in the darkest alley in the worst part of town at 3 am and still feel safe. It’s interesting to live in a country where people from very different cultures are learning to live together and prosper. The US considers itself a melting-pot, but in Singapore they have four national languages: English, Mandarin, Malay and Tamil.

Most of all, it’s great to live in places where the culture and politics is different than where I grew up. But I’m trying to travel less, because it’s bad for the planet.

You’ve gained some fame for your “crackpot index”. What were your motivations for developing it? Any new criteria you’d add?

After the internet first caught on, a bunch of us started using it to talk about physics on the usenet newsgroup sci.physics.

And then, all of a sudden, crackpots around the world started joining in!

Before this, I don’t think anybody realized how many people had their own personal theories of physics. You might have a crazy uncle who spent his time trying to refute special relativity, but you didn’t realize there were actually thousands of these crazy uncles.

As I’m sure you know here at Physics Forums, crackpots naturally tend to drive out more serious conversations. If you have some people talking about the laws of black hole thermodynamics, and some guy jumps in and says that the universe is a black hole, everyone will drop what they’re doing and argue with that guy. It’s irresistible. It reminds me of how when someone brings a baby to a party, everyone will start cooing to the baby. But it’s worse.

When physics crackpots started taking over the usenet newsgroup sci.physics, I discovered that they had a lot of features in common. The Crackpot Index summarizes these common features. Whenever I notice a new pattern, I add it.

For example: if someone starts comparing themselves to Galileo and says the physics establishment is going after them like the Inquisition, I guarantee you that they’re a crackpot. Their theories could be right—but unfortunately, they’ve got delusions of grandeur and a persecution complex.

It’s not being wrong that makes someone a crackpot. Being a full­-fledged crackpot is the endpoint of a tragic syndrome. Someone starts out being a bit too confident that they can revolutionize physics without learning it first. In fact, many young physicists go through this stage! But the good ones react to criticism by upping their game. The ones who become crackpots just brush it off. They come up with an idea that they think is great, and when nobody likes it, they don’t say “okay, I need to learn more.” Instead, they make up excuses: nobody understands me, maybe there’s a conspiracy at work, etc. The excuses get more complicated with each rebuff, and it gets harder and harder for them to back down and say “whoops, I was wrong”.

When I wrote the Crackpot Index, I thought crackpots were funny. Alexander Abian claimed all the world’s ills would be cured if we blew up the Moon. Archimedes Plutonium thinks the Universe is a giant plutonium atom. These ideas are funny. But now I realize how sad it is that someone can start with an passion for physics and end up in this kind of trap. They almost never escape.

Who are some of your math and physics heroes of the past and of today?

Wow, that’s a big question! I think every scientist needs to have heroes. I’ve had a lot.

Marie Curie

Marie Curie

When I was a kid, I was in love with Marie Curie. I wanted to marry a woman like her: someone who really cared about science. She overcame huge obstacles to get a degree in physics, discovered not one but two new elements, often doing experiments in her own kitchen—and won not one but two Nobel prizes. She was a tragic figure in many ways. Her beloved husband Pierre, a great physicist in his own right, slipped and was run over by a horse­-drawn cart, dying instantly when the wheels ran over his skull. She herself probably died from her experiments with radiation. But this made me love her all the more.

Later my big hero was Einstein. How could any physicist not have Einstein as a hero? First he came up with the idea that light comes in discrete quanta: photons. Then, two months later, he used Brownian motion to figure out the size of atoms. One month after that: special relativity, unifying space and time! Three months later, the equivalence between mass and energy. And all this was just a warmup for his truly magnificent theory of general relativity, explaining gravity as the curvature of space and time. He truly transformed our vision of the Universe. And then, in his later years, the noble and unsuccessful search for a unified field theory. As a friend of mine put it, what matters here is not that he failed: what matters is that he set physics a new goal, more ambitious than any goal it had before.

Later it was Feynman. As I mentioned, my uncle gave me Feynman’s Lectures on Physics. This is how I first learned Maxwell’s equations, special relativity, quantum mechanics. His way of explaining things with a minimum of jargon, getting straight to the heart of every issue, is something I really admire. Later I enjoyed his books like Surely You Must Be Joking. Still later I learned enough to be impressed by his work on QED.

But when you read his autobiographical books, you can see that he was a bit too obsessed with pretending to be a fun­-loving ordinary guy. A fun­-loving ordinary guy who just happens to be smarter than everyone else. In short, a self­-absorbed showoff. He could also be pretty mean to women—and in that respect, Einstein was even worse. So our heroes should not be admired uncritically.

Alexander Grothendieck

Alexander Grothendieck

A good example is Alexander Grothendieck. I guess he’s my main math hero these days. To solve concrete problems like the Weil conjectures, he avoided brute force techniques and instead developed revolutionary new concepts that gently dissolved those problems. And these new concepts turned out to be much more important than the problems that motivated him. I’m talking about abelian categories, schemes, topoi, stacks, things like that. Everyone who really wants to understand math at a deep level has got to learn these concepts. They’re beautiful and wonderfully simple—but not easy to master. You have to really change your world view to understand them, just like general relativity or quantum mechanics. You have to rewire your neurons.

At his peak, Grothendieck seemed almost superhuman. It seems he worked almost all day and all night, bouncing his ideas off the other amazing French algebraic geometers. Apparently 20,000 pages of his writings remain unpublished! But he became increasingly alienated from the mathematical establishment and eventually disappeared completely, hiding in a village near the Pyrenees.

Which groundbreaking advances in science and math are you most looking forward to?

I’d really like to see progress in figuring out the fundamental laws of physics. Ideally, I’d like to know the Theory of Everything. Of course, we don’t even know that there is one! There could be an endless succession of deeper and deeper realizations to be had about the laws of physics, with no final answer.

If we ever do discover the Theory of Everything, that won’t be the end of the story. It could be just the beginning. For example, next we could ask why this particular theory governs our Universe. Is it necessary, or contingent? People like to chat about this puzzle already, but I think it’s premature. I think we should find the Theory of Everything first.

Unfortunately, right now fundamental physics is in a phase of being “stuck”. I don’t expect to see the Theory of Everything in my lifetime. I’d be happy to see any progress at all! There are dozens of very basic things we don’t understand.

When it comes to math, I expect that people will have their hands full this century redoing the foundations using ∞-categories, and answering some of the questions that come up when you do this. The crowd working on “homotopy type theory” is making good progress–but so far they’re mainly thinking about ∞-groupoids, which are a very special sort of ∞-category. When we do all of math using ∞-categories, it will be a whole new ballgame.

And then there’s the question of whether humanity will figure out a way to keep from ruining the planet we live on. And the question of whether we’ll succeed in replacing ourselves with something more intelligent—or even wiser.

The Milky Way and Andromeda Nebula after their first collision, 4 billion years from now

The Milky Way and Andromeda Nebula after their first collision, 4 billion years from now

Here’s something cool: red dwarf stars will keep burning for 10 trillion years. If we, or any civilization, can settle down next to one of those, there will be plenty of time to figure things out. That’s what I hope for.

But some of my friends think that life always uses up resources as fast as possible. So one of my big questions is whether intelligent life will develop the patience to sit around and think interesting thoughts, or whether it will burn up red dwarf stars and every other source of energy as fast as it can, as we’re doing now with fossil fuels.

What does the future hold for John Baez? What are your goals?

What the future holds for me, primarily, is death.

That’s true of all of us—or at least most of us. While some hope that technology will bring immortality, or at least a much longer life, I bet most of us are headed for death fairly soon. So I try to make the most of the time I have.

I’m always re­-evaluating what I should do. I used to spend time thinking about quantum gravity and n­-categories. But quantum gravity feels stuck, and n­-category theory is shooting forward so fast that my help is no longer needed.

Climate change is hugely important, and nobody really knows what to do about it. Lots of people are trying lots of different things. Unfortunately I’m no better than the rest when it comes to the most obvious strategies—like politics, or climate science, or safer nuclear reactors, or better batteries and photocells.

The trick is finding things you can do better than other people. Right now for me that means thinking about networks and biology in a very abstract way. I’m inspired by this remark by Patten and Witkamp:

To understand ecosystems, ultimately will be to understand networks.

So that’s my goal for the next five years or so. It’s probably not be the best thing anyone can do to prepare for the Middle Anthropocene. But it may be the best thing I can do: use the math I know to help people understand the biosphere.

It may seem like I keep jumping around: from quantum gravity to n-categories to biology. But I keep wanting to think about networks, and how they change in time.

At some point I hope to retire and become a bit more of a self­-indulgent wastrel. I could write a fun book about group theory in geometry and physics, and a fun book about the octonions. I might even get around to spending more time on music!

John Baez in Namo Gorge, Gansu

John Baez

Interview (Part 1)

18 March, 2016

Greg Bernhardt runs an excellent website for discussing physics, math and other topics, called Physics Forums. He recently interviewed me there. Since I used this opportunity to explain a bit about the Azimuth Project and network theory, I thought I’d reprint the interview here. Here is Part 1.

Give us some background on yourself.

I’m interested in all kinds of mathematics and physics, so I call myself a mathematical physicist. But I’m a math professor at the University of California in Riverside. I’ve taught here since 1989. My wife Lisa Raphals got a job here nine years later: among other things, she studies classical Chinese and Greek philosophy.

I got my bachelors’s degree in math at Princeton. I did my undergrad thesis on whether you can use a computer to solve Schrödinger’s equation to arbitrary accuracy. In the end, it became obvious that you can. I was really interested in mathematical logic, and I used some in my thesis—the theory of computable functions—but I decided it wasn’t very helpful in physics. When I read the magnificently poetic last chapter of Misner, Thorne and Wheeler’s Gravitation, I decided that quantum gravity was the problem to work on.

I went to math grad school at MIT, but I didn’t find anyone to work with on quantum gravity. So, I did my thesis on quantum field theory with Irving Segal. He was one of the founders of “constructive quantum field theory”, where you try to rigorously prove that quantum field theories make mathematical sense and obey certain axioms that they should. This was a hard subject, and I didn’t accomplish much, but I learned a lot.

I got a postdoc at Yale and switched to classical field theory, mainly because it was something I could do. On the side I was still trying to understand quantum gravity. String theory was bursting into prominence at the time, and my life would have been easier if I’d jumped onto that bandwagon. But I didn’t like it, because most of the work back then studied strings moving on a fixed “background” spacetime. Quantum gravity is supposed to be about how the geometry of spacetime is variable and quantum­-mechanical, so I didn’t want a theory of quantum gravity set on a pre­-existing background geometry!

I got a professorship at U.C. Riverside based on my work on classical field theory. But at a conference on that subject in Seattle, I heard Abhay Ashtekar, Chris Isham and Renate Loll give some really interesting talks on loop quantum gravity. I don’t know why they gave those talks at a conference on classical field theory. But I’m sure glad they did! I liked their work because it was background­-free and mathematically rigorous. So I started work on loop quantum gravity.

Like many other theories, quantum gravity is easier in low dimensions. I became interested in how category theory lets you formulate quantum gravity in a universe with just 3 spacetime dimensions. It amounts to a radical new conception of space, where the geometry is described in a thoroughly quantum­-mechanical way. Ultimately, space is a quantum superposition of “spin networks”, which are like Feynman diagrams. The idea is roughly that a spin network describes a virtual process where particles move around and interact. If we know how likely each of these processes is, we know the geometry of space.

A spin network

A spin network

Loop quantum gravity tries to do the same thing for full­-fledged quantum gravity in 4 spacetime dimensions, but it doesn’t work as well. Then Louis Crane had an exciting idea: maybe 4­-dimensional quantum gravity needs a more sophisticated structure: a “2­-category”.

I had never heard of 2­-categories. Category theory is about things and processes that turn one thing into another. In a 2­-category we also have “meta-processes” that turn one process into another.

I became very excited about 2-­categories. At the time I was so dumb I didn’t consider the possibility of 3­-categories, and 4­-categories, and so on. To be precise, I was more of a mathematical physicist than a mathematician: I wasn’t trying to develop math for its own sake. Then someone named James Dolan told me about n­-categories! That was a real eye­-opener. He came to U.C. Riverside to work with me. So I started thinking about n­-categories in parallel with loop quantum gravity.

Dolan was technically my grad student, but I probably learned more from him than vice versa. In 1996 we wrote a paper called “Higher­-dimensional algebra and topological quantum field theory”, which might be my best paper. It’s full of grandiose guesses about n-­categories and their connections to other branches of math and physics. We had this vision of how everything fit together. It was so beautiful, with so much evidence supporting it, that we knew it had to be true. Unfortunately, at the time nobody had come up with a good definition of n­-category, except for n < 4. So we called our guesses “hypotheses” instead of “conjectures”. In math a conjecture should be something utterly precise: it’s either true or not, with no room for interpretation.

By now, I think everybody more or less believes our hypotheses. Some of the easier ones have already been turned into theorems. Jacob Lurie, a young hotshot at Harvard, improved the statement of one and wrote a 111-page outline of a proof. Unfortunately he still used some concepts that hadn’t been defined. People are working to fix that, and I feel sure they’ll succeed.

A foam of soap bubbles

A foam of soap bubbles

Anyway, I kept trying to connect these ideas to quantum gravity. In 1997, I introduced “spin foams”. These are structures like spin networks, but with an extra dimension. Spin networks have vertices and edges. Spin foams also have 2­-dimensional faces: imagine a foam of soap bubbles.

The idea was to use spin foams to give a purely quantum­-mechanical description of the geometry of spacetime, just as spin networks describe the geometry of space. But mathematically, what we’re doing here is going from a category to a 2­-category.

By now, there are a number of different theories of quantum gravity based on spin foams. Unfortunately, it’s not clear that any of them really work. In 2002, Dan Christensen, Greg Egan and I did a bunch of supercomputer calculations to study this question. We showed that the most popular spin foam theory at the time gave dramatically different answers than people had hoped for. I think we more or less killed that theory.

That left me rather depressed. I don’t enjoy programming: indeed, Christensen and Egan did all the hard work of that sort on our paper. I didn’t want to spend years sifting through spin foam theories to find one that works. And most of all, I didn’t want to end up as an old man still not knowing if my work had been worthwhile! To me n­-category theory was clearly the math of the future—and it was easy for me to come up with cool new ideas in that subject. So, I quit quantum gravity and switched to n­-categories.

But this was very painful. Quantum gravity is a kind of “holy grail” in physics. When you work on that subject, you wind up talking to lots of people who believe that unifying quantum mechanics and general relativity is the most important thing in the world, and that nothing else could possibly be as interesting. You wind up believing it. It took me years to get out of that mindset.

Ironically, when I quit quantum gravity, I felt free to explore string theory. As a branch of math, it’s really wonderful. I started looking at how n­-categories apply to string theory. It turns out there’s a wonderful story here: briefly, particles are to categories as strings are to 2­-categories, and all the math of particles can be generalized to strings using this idea! I worked out a bit of this story with Urs Schreiber and John Huerta.

Around 2010, I felt I had to switch to working on environmental issues and math related to engineering and biology, for the sake of the planet. That was another painful renunciation. But luckily, Urs Schreiber and others are continuing to work on n­-categories and string theory, and doing it better than I ever could. So I don’t feel the need to work on those things anymore—indeed, it would be hard to keep up. I just follow along quietly from the sidelines.

It’s quite possible that we need a dozen more good ideas before we really get anywhere on quantum gravity. But I feel confident that n­-categories will have a role to play. So, I’m happy to have helped push that subject forward.

Your uncle, Albert Baez, was a physicist. How did he help develop your interests?

He had a huge effect on me. He’s mainly famous for being the father of the folk singer Joan Baez. But he started out in optics and helped invent the first X­-ray microscope. Later he became very involved in physics education, especially in what were then called third­-world countries. For example, in 1951 he helped set up a physics department at the University of Baghdad.

Albert V. Baez

Albert V. Baez

When I was a kid he worked for UNESCO, so he’d come to Washington D.C. and stay with my parents, who lived nearby. Whenever he showed up, he would open his suitcase and pull out amazing gadgets: diffraction gratings, holograms, and things like that. And he would explain how they work! I decided physics was the coolest thing there is.

When I was eight, he gave me a copy of his book The New College Physics: A Spiral Approach. I immediately started trying to read it. The “spiral approach” is a great pedagogical principle: instead of explaining each topic just once, you should start off easy and then keep spiraling around from topic to topic, examining them in greater depth each time you revisit them. So he not only taught me physics, he taught me about how to learn and how to teach.

Later, when I was fifteen, I spent a couple weeks at his apartment in Berkeley. He showed me the Lawrence Hall of Science, which is where I got my first taste of programming—in BASIC, with programs stored on paper tape. This was in 1976. He also gave me a copy of The Feynman Lectures on Physics. And so, the following summer, when I was working at a state park building trails and such, I was also trying to learn quantum mechanics from the third volume of The Feynman Lectures. The other kids must have thought I was a complete geek—which of course I was.

Give us some insight on what your average work day is like.

During the school year I teach two or three days a week. On days when I teach, that’s the focus of my day: I try to prepare my classes starting at breakfast. Teaching is lots of fun for me. Right now I’m teaching two courses: an undergraduate course on game theory and a graduate course on category theory. I’m also running a seminar on category theory. In addition, I meet with my graduate students for a four­-hour session once a week: they show me what they’ve done, and we try to push our research projects forward.

On days when I don’t teach, I spend a lot of time writing. I love blogging, so I could easily do that all day, but I try to spend a lot of time writing actual papers. Any given paper starts out being tough to write, but near the end it practically writes itself. At the end, I have to tear myself away from it: I keep wanting to add more. At that stage, I feel an energetic glow at the end of a good day spent writing. Few things are so satisfying.

During the summer I don’t teach, so I can get a lot of writing done. I spent two years doing research at the Centre of Quantum Technologies, which is in Singapore, and since 2012 I’ve been working there during summers. Sometimes I bring my grad students, but mostly I just write.

I also spend plenty of time doing things with my wife, like talking, cooking, shopping, and working out at the gym. We like to watch TV shows in the evening, mainly mysteries and science fiction.

We also do a lot of gardening. When I was younger that seemed boring ­ but as you get older, subjective time speeds up, so you pay more attention to things like plants growing. There’s something tremendously satisfying about planting a small seedling, watching it grow into an orange tree, and eating its fruit for breakfast.

I love playing the piano and recording electronic music, but doing it well requires big blocks of time, which I don’t always have. Music is pure delight, and if I’m not listening to it I’m usually composing it in my mind.

If I gave in to my darkest urges and becames a decadent wastrel I might spend all day blogging, listening to music, recording music and working on pure math. But I need other things to stay sane.

What research are you working on at the moment?

Lately I’ve been trying to finish a paper called “Struggles with the Continuum”. It’s about the problems physics has with infinities, due to the assumption that spacetime is a continuum. At certain junctures this paper became psychologically difficult to write, since it’s supposed to include a summary of quantum field theory, which is complicated and sprawling subject. So, I’ve resorted to breaking this paper into blog articles and posting them on Physics Forums, just to motivate myself.

Purely for fun, I’ve been working with Greg Egan on some projects involving the octonions. The octonions are a number system where you can add, subtract, multiply and divide. Such number systems only exist in 1, 2, 4, and 8 dimensions: you’ve got the real numbers, which form a line, the complex numbers, which form a plane, the quaternions, which are 4­-dimensional, and the octonions, which are 8­dimensional. The octonions are the biggest, but also the weirdest. For example, multiplication of octonions violates the associative law: (xy)z is not equal to x(yz). So the octonions sound completely crazy at first, but they turn out to have fascinating connections to string theory and other things. They’re pretty addictive, and if became a decadent wastrel I would spend a lot more time on them.

The integral octonions of norm 1, projected onto a plane

The 240 unit integral octonions, projected onto a plane

There’s a concept of “integer” for the octonions, and integral octonions form a lattice, a repeating pattern of points, in 8 dimensions. This is called the E8 lattice. There’s another lattices that lives in in 24 dimensions, called the “Leech lattice”. Both are connected to string theory. Notice that 8+2 equals 10, the dimension superstrings like to live in, and 24+2 equals 26, the dimension bosonic strings like to live in. That’s not a coincidence! The 2 here comes from the 2­-dimensional world-sheet of the string.

Since 3×8 is 24, Egan and I became interested in how you could built the Leech lattice from 3 copies of the E8 lattice. People already knew a trick for doing it, but it took us a while to understand how it worked—and then Egan showed you could do this trick in exactly 17,280 ways! I want to write up the proof. There’s a lot of beautiful geometry here.

There’s something really exhilarating about struggling to reach the point where you have some insight into these structures and how they’re connected to physics.

My main work, though, involves using category theory to study networks. I’m interested in networks of all kinds, from electrical circuits to neural networks to “chemical reaction networks” and many more. Different branches of science and engineering focus on different kinds of networks. But there’s not enough communication between researchers in different subjects, so it’s up to mathematicians to set up a unified theory of networks.

I’ve got seven grad students working on this project—or actually eight, if you count Brendan Fong: I’ve been helping him on his dissertation, but he’s actually a student at Oxford.

Brendan was the first to join the project. I wanted him to work on electrical circuits, which are a nice familiar kind of network, a good starting point. But he went much deeper: he developed a general category­-theoretic framework for studying networks. We then applied it to electrical circuits, and other things as well.

Blake Pollard and Brendan Fong at the Centre for Quantum Technologies

Blake Pollard and Brendan Fong at the Centre for Quantum Technologies

Blake Pollard is a student of mine in the physics department here at U. C. Riverside. Together with Brendan and me, he developed a category­-theoretic approach to Markov processes: random processes where a system hops around between different states. We used Brendan’s general formalism to reduce Markov processes to electrical circuits. Now Blake is going further and using these ideas to tackle chemical reaction networks.

My other students are in the math department at U. C. Riverside. Jason Erbele is working on “control theory”, a branch of engineering where you try to design feedback loops to make sure processes run in a stable way. Control theory uses networks called “signal flow diagrams”, and Jason has worked out how to understand these using category theory.

Signal flow diagram

Signal flow diagram for an inverted pendulum on a cart

Jason isn’t using Brendan’s framework: he’s using a different one, called PROPs, which were developed a long time ago for use in algebraic topology. My student Franciscus Rebro has been developing it further, for use in our project. It gives a nice way to describe networks in terms of their basic building blocks. It also illuminates the similarity between signal flow diagrams and Feynman diagrams! They’re very similar, but there’s a big difference: in signal flow diagrams the signals are classical, while Feynman diagrams are quantum­-mechanical.

My student Brandon Coya has been working on electrical circuits. He’s sort of continuing what Brendan started, and unifying Brendan’s formalism with PROPs.

My student Adam Yassine is starting to work on networks in classical mechanics. In classical mechanics you usually consider a single system: you write down the Hamiltonian, you get the equations of motion, and you try to solve them. He’s working on a setup where you can take lots of systems and hook them up into a network.

My students Kenny Courser and Daniel Cicala are digging deeper into another aspect of network theory. As I hinted earlier, a category is about things and processes that turn one thing into another. In a 2­-category we also have “meta-processes” that turn one process into another. We’re starting to bring 2­-categories into network theory.

For example, you can use categories to describe an electrical circuit as a process that turns some inputs into some outputs. You put some currents in one end and some currents come out the other end. But you can also use 2­-categories to describe “meta-processes” that turn one electrical circuit into another. An example of a meta-process would be a way of simplifying an electrical circuit, like replacing two resistors in series by a single resistor.

Ultimately I want to push these ideas in the direction of biochemistry. Biology seems complicated and “messy” to physicists and mathematicians, but I think there must be a beautiful logic to it. It’s full of networks, and these networks change with time. So, 2­-categories seem like a natural language for biology.

It won’t be easy to convince people of this, but that’s okay.

Information Geometry (Part 16)

14 January, 2016

joint with Blake Pollard

Lately we’ve been thinking about open Markov processes. These are random processes where something can hop randomly from one state to another (that’s the ‘Markov process’ part) but also enter or leave the system (that’s the ‘open’ part).

The ultimate goal is to understand the nonequilibrium thermodynamics of open systems—systems where energy and maybe matter flows in and out. If we could understand this well enough, we could understand in detail how life works. That’s a difficult job! But one has to start somewhere, and this is one place to start.

We have a few papers on this subject:

• Blake Pollard, A Second Law for open Markov processes. (Blog article here.)

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

• Blake Pollard, Open Markov processes: A compositional perspective on non-equilibrium steady states in biology. (Blog article here.)

However, right now we just want to show you three closely connected results about how relative entropy changes in open Markov processes.


An open Markov process consists of a finite set X of states, a subset B \subseteq X of boundary states, and an infinitesimal stochastic operator H: \mathbb{R}^X \to \mathbb{R}^X, meaning a linear operator with

H_{ij} \geq 0 \ \  \text{for all} \ \ i \neq j


\sum_i H_{ij} = 0 \ \  \text{for all} \ \ j

For each state i \in X we introduce a population p_i  \in [0,\infty). We call the resulting function p : X \to [0,\infty) the population distribution.

Populations evolve in time according to the open master equation:

\displaystyle{ \frac{dp_i}{dt} = \sum_j H_{ij}p_j} \ \  \text{for all} \ \  i \in X-B

p_i(t) = b_i(t) \ \  \text{for all} \ \  i \in B

So, the populations p_i obey a linear differential equation at states i that are not in the boundary, but they are specified ‘by the user’ to be chosen functions b_i at the boundary states. The off-diagonal entry H_{ij} for i \neq j describe the rate at which population transitions from the jth to the ith state.

A closed Markov process, or continuous-time discrete-state Markov chain, is an open Markov process whose boundary is empty. For a closed Markov process, the open master equation becomes the usual master equation:

\displaystyle{  \frac{dp}{dt} = Hp }

In a closed Markov process the total population is conserved:

\displaystyle{ \frac{d}{dt} \sum_{i \in X} p_i = \sum_{i,j} H_{ij}p_j = 0 }

This lets us normalize the initial total population to 1 and have it stay equal to 1. If we do this, we can talk about probabilities instead of populations. In an open Markov process, population can flow in and out at the boundary states.

For any pair of distinct states i,j, H_{ij}p_j is the flow of population from j to i. The net flux of population from the jth state to the ith state is the flow from j to i minus the flow from i to j:

J_{ij} = H_{ij}p_j - H_{ji}p_i

A steady state is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an equilibrium. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. The idea is that population can flow in or out at the boundary states.

We say an equilibrium p : X \to [0,\infty) of a Markov process is detailed balanced if all the net fluxes vanish:

J_{ij} = 0 \ \  \text{for all} \ \ i,j \in X

or in other words:

H_{ij}p_j = H_{ji}p_i \ \  \text{for all} \ \ i,j \in X

Given two population distributions p, q : X \to [0,\infty) we can define the relative entropy

\displaystyle{  I(p,q) = \sum_i p_i \ln \left( \frac{p_i}{q_i} \right)}

When q is a detailed balanced equilibrium solution of the master equation, the relative entropy can be seen as the ‘free energy’ of p. For a precise statement, see Section 4 of Relative entropy in biological systems.

The Second Law of Thermodynamics implies that the free energy of a closed system tends to decrease with time, so for closed Markov processes we expect I(p,q) to be nonincreasing. And this is true! But for open Markov processes, free energy can flow in from outside. This is just one of several nice results about how relative entropy changes with time.


Theorem 1. Consider an open Markov process with X as its set of states and B as the set of boundary states. Suppose p(t) and q(t) obey the open master equation, and let the quantities

\displaystyle{ \frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_{j \in X} H_{ij}p_j }

\displaystyle{  \frac{Dq_i}{Dt} = \frac{dq_i}{dt} - \sum_{j \in X} H_{ij}q_j }

measure how much the time derivatives of p_i and q_i fail to obey the master equation. Then we have

\begin{array}{ccl}   \displaystyle{  \frac{d}{dt}  I(p(t),q(t)) } &=& \displaystyle{ \sum_{i, j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)} \\ \\ && \; + \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} +  \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} }  \end{array}

This result separates the change in relative entropy change into two parts: an ‘internal’ part and a ‘boundary’ part.

It turns out the ‘internal’ part is always less than or equal to zero. So, from Theorem 1 we can deduce a version of the Second Law of Thermodynamics for open Markov processes:

Theorem 2. Given the conditions of Theorem 1, we have

\displaystyle{  \frac{d}{dt}  I(p(t),q(t)) \; \le \;  \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} +  \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt}  }

Intuitively, this says that free energy can only increase if it comes in from the boundary!

There is another nice result that holds when q is an equilibrium solution of the master equation. This idea seems to go back to Schnakenberg:

Theorem 3. Given the conditions of Theorem 1, suppose also that q is an equilibrium solution of the master equation. Then we have

\displaystyle{  \frac{d}{dt}  I(p(t),q) =  -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} \; + \; \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} }


J_{ij} = H_{ij}p_j - H_{ji}p_i

is the net flux from j to i, while

\displaystyle{ A_{ij} = \ln \left(\frac{p_j q_i}{p_i q_j} \right) }

is the conjugate thermodynamic force.

The flux J_{ij} has a nice meaning: it’s the net flow of population from j to i. The thermodynamic force is a bit subtler, but this theorem reveals its meaning: it says how much the population wants to flow from j to i.

More precisely, up to that factor of 1/2, the thermodynamic force A_{ij} says how much free energy loss is caused by net flux from j to i. There’s a nice analogy here to water losing potential energy as it flows downhill due to the force of gravity.


Proof of Theorem 1. We begin by taking the time derivative of the relative information:

\begin{array}{ccl} \displaystyle{ \frac{d}{dt}  I(p(t),q(t)) } &=&  \displaystyle{  \sum_{i \in X} \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} +  \frac{\partial I}{\partial q_i} \frac{dq_i}{dt} } \end{array}

We can separate this into a sum over states i \in X - B, for which the time derivatives of p_i and q_i are given by the master equation, and boundary states i \in B, for which they are not:

\begin{array}{ccl} \displaystyle{ \frac{d}{dt}  I(p(t),q(t)) } &=&  \displaystyle{  \sum_{i \in X-B, \; j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j +                                               \frac{\partial I}{\partial q_i} H_{ij} q_j }\\  \\   && + \; \; \; \displaystyle{  \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} +  \frac{\partial I}{\partial q_i} \frac{dq_i}{dt}}   \end{array}

For boundary states we have

\displaystyle{ \frac{dp_i}{dt} = \frac{Dp_i}{Dt} + \sum_{j \in X} H_{ij}p_j }

and similarly for the time derivative of q_i. We thus obtain

\begin{array}{ccl}  \displaystyle{ \frac{d}{dt}  I(p(t),q(t)) } &=&  \displaystyle{  \sum_{i,j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j }\\  \\ && + \; \; \displaystyle{  \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} +  \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt}}   \end{array}

To evaluate the first sum, recall that

\displaystyle{   I(p,q) = \sum_{i \in X} p_i \ln (\frac{p_i}{q_i})}


\displaystyle{\frac{\partial I}{\partial p_i}} =\displaystyle{1 +  \ln (\frac{p_i}{q_i})} ,  \qquad \displaystyle{ \frac{\partial I}{\partial q_i}}=  \displaystyle{- \frac{p_i}{q_i}   }

Thus, we have

\displaystyle{ \sum_{i,j \in X}  \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j  =   \sum_{i,j\in X} (1 +  \ln (\frac{p_i}{q_i})) H_{ij} p_j - \frac{p_i}{q_i} H_{ij} q_j }

We can rewrite this as

\displaystyle{   \sum_{i,j \in X} H_{ij} p_j  \left( 1 + \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }

Since H_{ij} is infinitesimal stochastic we have \sum_{i} H_{ij} = 0, so the first term drops out, and we are left with

\displaystyle{   \sum_{i,j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }

as desired.   █

Proof of Theorem 2. Thanks to Theorem 1, to prove

\displaystyle{  \frac{d}{dt}  I(p(t),q(t)) \; \le \;  \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} +  \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt}  }

it suffices to show that

\displaystyle{   \sum_{i,j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) \le 0  }

or equivalently (recalling the proof of Theorem 1):

\displaystyle{ \sum_{i,j} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) \le 0 }

The last two terms on the left hand side cancel when i = j. Thus, if we break the sum into an i \ne j part and an i = j part, the left side becomes

\displaystyle{   \sum_{i \ne j} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) \; + \; \sum_j H_{jj} p_j \ln(\frac{p_j}{q_j}) }

Next we can use the infinitesimal stochastic property of H to write H_{jj} as the sum of -H_{ij} over i not equal to j, obtaining

\displaystyle{ \sum_{i \ne j} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) - \sum_{i \ne j} H_{ij} p_j \ln(\frac{p_j}{q_j}) } =

\displaystyle{ \sum_{i \ne j} H_{ij} p_j  \left( \ln(\frac{p_iq_j}{p_j q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) }

Since H_{ij} \ge 0 when i \ne j and \ln(s) + 1 - s \le 0 for all s > 0, we conclude that this quantity is \le 0.   █

Proof of Theorem 3. Now suppose also that q is an equilibrium solution of the master equation. Then Dq_i/Dt = dq_i/dt = 0 for all states i, so by Theorem 1 we need to show

\displaystyle{ \sum_{i, j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)  \; = \;  -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} }

We also have \sum_{j \in X} H_{ij} q_j = 0, so the second
term in the sum at left vanishes, and it suffices to show

\displaystyle{  \sum_{i, j \in X} H_{ij} p_j  \ln(\frac{p_i}{q_i}) \; = \;  - \frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} }

By definition we have

\displaystyle{  \frac{1}{2} \sum_{i,j} J_{ij} A_{ij}} =  \displaystyle{  \frac{1}{2} \sum_{i,j}  \left( H_{ij} p_j - H_{ji}p_i \right)   \ln \left( \frac{p_j q_i}{p_i q_j} \right) }

This in turn equals

\displaystyle{  \frac{1}{2} \sum_{i,j}  H_{ij}p_j    \ln \left( \frac{p_j q_i}{p_i q_j} \right) -   \frac{1}{2} \sum_{i,j}  H_{ji}p_i  \ln \left( \frac{p_j q_i}{p_i q_j} \right) }

and we can switch the dummy indices i,j in the second sum, obtaining

\displaystyle{  \frac{1}{2} \sum_{i,j}  H_{ij}p_j    \ln \left( \frac{p_j q_i}{p_i q_j} \right) -   \frac{1}{2} \sum_{i,j}  H_{ij}p_j    \ln \left( \frac{p_i q_j}{p_j q_i} \right) }

or simply

\displaystyle{ \sum_{i,j} H_{ij} p_j \ln \left( \frac{p_j q_i}{p_i q_j} \right) }

But this is

\displaystyle{  \sum_{i,j} H_{ij} p_j \left(\ln ( \frac{p_j}{q_j}) + \ln (\frac{q_i}{p_i}) \right) }

and the first term vanishes because H is infinitesimal stochastic: \sum_i H_{ij} = 0. We thus have

\displaystyle{  \frac{1}{2} \sum_{i,j} J_{ij} A_{ij}} = \sum_{i,j} H_{ij} p_j  \ln (\frac{q_i}{p_i} )

as desired.   █

Information Geometry (Part 15)

11 January, 2016

It’s been a long time since you’ve seen an installment of the information geometry series on this blog! Before I took a long break, I was explaining relative entropy and how it changes in evolutionary games. Much of what I said is summarized and carried further here:

• John Baez and Blake Pollard, Relative entropy in biological systems, Entropy 18 (2016), 46. (Blog article here.)

But now Blake has a new paper, and I want to talk about that:

• Blake Pollard, Open Markov processes: a compositional perspective on non-equilibrium steady states in biology, Entropy 18 (2016), 140.

I’ll focus on just one aspect: the principle of minimum entropy production. This is an exciting yet controversial principle in non-equilibrium thermodynamics. Blake examines it in a situation where we can tell exactly what’s happening.

Non-equilibrium steady states

Life exists away from equilibrium. Left isolated, systems will tend toward thermodynamic equilibrium. However, biology is about open systems: physical systems that exchange matter or energy with their surroundings. Open systems can be maintained away from equilibrium by this exchange. This leads to the idea of a non-equilibrium steady state—a state of an open system that doesn’t change, but is not in equilibrium.

A simple example is a pan of water sitting on a stove. Heat passes from the flame to the water and then to the air above. If the flame is very low, the water doesn’t boil and nothing moves. So, we have a steady state, at least approximately. But this is not an equilibrium, because there is a constant flow of energy through the water.

Of course in reality the water will be slowly evaporating, so we don’t really have a steady state. As always, models are approximations. If the water is evaporating slowly enough, it can be useful to approximate the situation with a non-equilibrium steady state.

There is much more to biology than steady states. However, to dip our toe into the chilly waters of non-equilibrium thermodynamics, it is nice to start with steady states. And already here there are puzzles left to solve.

Minimum entropy production

Ilya Prigogine won the Nobel prize for his work on non-equilibrium thermodynamics. One reason is that he had an interesting idea about steady states. He claimed that under certain conditions, a non-equilibrium steady state will minimize entropy production!

There has been a lot of work trying to make the ‘principle of minimum entropy production’ precise and turn it into a theorem. In this book:

• G. Lebon and D. Jou, Understanding Non-equilibrium Thermodynamics, Springer, Berlin, 2008.

the authors give an argument for the principle of minimum entropy production based on four conditions:

time-independent boundary conditions: the surroundings of the system don’t change with time.

linear phenomenological laws: the laws governing the macroscopic behavior of the system are linear.

constant phenomenological coefficients: the laws governing the macroscopic behavior of the system don’t change with time.

symmetry of the phenomenological coefficients: since they are linear, the laws governing the macroscopic behavior of the system can be described by a linear operator, and we demand that in a suitable basis the matrix for this operator is symmetric: T_{ij} = T_{ji}.

The last condition is obviously the subtlest one; it’s sometimes called Onsager reciprocity, and people have spent a lot of time trying to derive it from other conditions.

However, Blake goes in a different direction. He considers a concrete class of open systems, a very large class called ‘open Markov processes’. These systems obey the first three conditions listed above, and the ‘detailed balanced’ open Markov processes also obey the last one. But Blake shows that minimum entropy production holds only approximately—with the approximation being good for steady states that are near equilibrium!

However, he shows that another minimum principle holds exactly, even for steady states that are far from equilibrium. He calls this the ‘principle of minimum dissipation’.

We actually discussed the principle of minimum dissipation in an earlier paper:

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

But one advantage of Blake’s new paper is that it presents the results with a minimum of category theory. Of course I love category theory, and I think it’s the right way to formalize open systems, but it can be intimidating.

Another good thing about Blake’s new paper is that it explicitly compares the principle of minimum entropy to the principle of minimum dissipation. He shows they agree in a certain limit—namely, the limit where the system is close to equilibrium.

Let me explain this. I won’t include the nice example from biology that Blake discusses: a very simple model of membrane transport. For that, read his paper! I’ll just give the general results.

The principle of minimum dissipation

An open Markov process consists of a finite set X of states, a subset B \subseteq X of boundary states, and an infinitesimal stochastic operator H: \mathbb{R}^X \to \mathbb{R}^X, meaning a linear operator with

H_{ij} \geq 0 \ \  \text{for all} \ \ i \neq j


\sum_i H_{ij} = 0 \ \  \text{for all} \ \ j

I’ll explain these two conditions in a minute.

For each i \in X we introduce a population p_i  \in [0,\infty). We call the resulting function p : X \to [0,\infty) the population distribution. Populations evolve in time according to the open master equation:

\displaystyle{ \frac{dp_i}{dt} = \sum_j H_{ij}p_j} \ \  \text{for all} \ \ i \in X-B

p_i(t) = b_i(t) \ \  \text{for all} \ \ i \in B

So, the populations p_i obey a linear differential equation at states i that are not in the boundary, but they are specified ‘by the user’ to be chosen functions b_i at the boundary states.

The off-diagonal entries H_{ij}, \ i \neq j are the rates at which population hops from the jth to the ith state. This lets us understand the definition of an infinitesimal stochastic operator. The first condition:

H_{ij} \geq 0 \ \  \text{for all} \ \ i \neq j

says that the rate for population to transition from one state to another is non-negative. The second:

\sum_i H_{ij} = 0 \ \  \text{for all} \ \ j

says that population is conserved, at least if there are no boundary states. Population can flow in or out at boundary states, since the master equation doesn’t hold there.

A steady state is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an equilibrium. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. Again, the reason is that population can flow in or out at the boundary.

We say an equilibrium q : X \to [0,\infty) of a Markov process is detailed balanced if the rate at which population flows from the ith state to the jth state is equal to the rate at which it flows from the jth state to the ith:

H_{ji}q_i = H_{ij}q_j \ \  \text{for all} \ \ i,j \in X

Suppose we’ve got an open Markov process that has a detailed balanced equilibrium q. Then a non-equilibrium steady state p will minimize a function called the ‘dissipation’, subject to constraints on its boundary populations. There’s a nice formula for the dissipation in terms of p and q.

Definition. Given an open Markov process with detailed balanced equilibrium q we define the dissipation for a population distribution p to be

\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left( \frac{p_j}{q_j} - \frac{p_i}{q_i} \right)^2 }

This formula is a bit tricky, but you’ll notice it’s quadratic in p and it vanishes when p = q. So, it’s pretty nice.

Using this concept we can formulate a principle of minimum dissipation, and prove that non-equilibrium steady states obey this principle:

Definition. We say a population distribution p: X \to \mathbb{R} obeys the principle of minimum dissipation with boundary population b: X \to \mathbb{R} if p minimizes D(p) subject to the constraint that

p_i = b_i \ \  \text{for all} \ \ i \in B.

Theorem 1. A population distribution p is a steady state with p_i = b_i for all boundary states i if and only if p obeys the principle of minimum dissipation with boundary population b.

Proof. This follows from Theorem 28 in A compositional framework for Markov processes.

Minimum entropy production versus minimum dissipation

How does dissipation compare with entropy production? To answer this, first we must ask: what really is entropy production? And: how does the equilibrium state q show up in the concept of entropy production?

The relative entropy of two population distributions p,q is given by

\displaystyle{ I(p,q) = \sum_i p_i \ln \left( \frac{p_i}{q_i} \right) }

It is well known that for a closed Markov process with q as a detailed balanced equilibrium, the relative entropy is monotonically decreasing with time. This is due to an annoying sign convention in the definition of relative entropy: while entropy is typically increasing, relative entropy typically decreases. We could fix this by putting a minus sign in the above formula or giving this quantity I(p,q) some other name. A lot of people call it the Kullback–Leibler divergence, but I have taken to calling it relative information. For more, see:

• John Baez and Blake Pollard, Relative entropy in biological systems. (Blog article here.)

We say ‘relative entropy’ in the title, but then we explain why ‘relative information’ is a better name, and use that. More importantly, we explain why I(p,q) has the physical meaning of free energy. Free energy tends to decrease, so everything is okay. For details, see Section 4.

Blake has a nice formula for how fast I(p,q) decreases:

Theorem 2. Consider an open Markov process with X as its set of states and B as the set of boundary states. Suppose p(t) obeys the open master equation and q is a detailed balanced equilibrium. For any boundary state i \in B, let

\displaystyle{ \frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_{j \in X} H_{ij}p_j }

measure how much p_i fails to obey the master equation. Then we have

\begin{array}{ccl}   \displaystyle{  \frac{d}{dt}  I(p(t),q) } &=& \displaystyle{ \sum_{i, j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)} \\ \\ && \; + \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} }  \end{array}

Moreover, the first term is less than or equal to zero.

Proof. For a self-contained proof, see Information geometry (part 16), which is coming up soon. It will be a special case of the theorems there.   █

Blake compares this result to previous work by Schnakenberg:

• J. Schnakenberg, Network theory of microscopic and macroscopic behavior of master equation systems, Rev. Mod. Phys. 48 (1976), 571–585.

The negative of Blake’s first term is this:

\displaystyle{ K(p) = - \sum_{i, j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }

Under certain circumstances, this equals what Schnakenberg calls the entropy production. But a better name for this quantity might be free energy loss, since for a closed Markov process that’s exactly what it is! In this case there are no boundary states, so the theorem above says K(p) is the rate at which relative entropy—or in other words, free energy—decreases.

For an open Markov process, things are more complicated. The theorem above shows that free energy can also flow in or out at the boundary, thanks to the second term in the formula.

Anyway, the sensible thing is to compare a principle of ‘minimum free energy loss’ to the principle of minimum dissipation. The principle of minimum dissipation is true. How about the principle of minimum free energy loss? It turns out to be approximately true near equilibrium.

For this, consider the situation in which p is near to the equilibrium distribution q in the sense that

\displaystyle{ \frac{p_i}{q_i} = 1 + \epsilon_i }

for some small numbers \epsilon_i. We collect these numbers in a vector called \epsilon.

Theorem 3. Consider an open Markov process with X as its set of states and B as the set of boundary states. Suppose q is a detailed balanced equilibrium and let p be arbitrary. Then

K(p) = D(p) + O(\epsilon^2)

where K(p) is the free energy loss, D(p) is the dissipation, \epsilon_i is defined as above, and by O(\epsilon^2) we mean a sum of terms of order \epsilon_i^2.

Proof. First take the free energy loss:

\displaystyle{ K(p) = -\sum_{i, j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)}

Expanding the logarithm to first order in \epsilon, we get

\displaystyle{ K(p) =  -\sum_{i, j \in X} H_{ij} p_j  \left( \frac{p_i}{q_i} - 1 - \frac{p_i q_j}{p_j q_i} \right) + O(\epsilon^2) }

Since H is infinitesimal stochastic, \sum_i H_{ij} = 0, so the second term in the sum vanishes, leaving

\displaystyle{ K(p) =  -\sum_{i, j \in X} H_{ij} p_j  \left( \frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i} \right) \; + O(\epsilon^2) }


\displaystyle{ K(p) =  -\sum_{i, j \in X} \left( H_{ij} p_j  \frac{p_i}{q_i} - H_{ij} q_j \frac{p_i}{q_i} \right) \; + O(\epsilon^2) }

Since q is a equilibrium we have \sum_j H_{ij} q_j = 0, so now the last term in the sum vanishes, leaving

\displaystyle{ K(p) =  -\sum_{i, j \in X} H_{ij} \frac{p_i p_j}{q_i} \; + O(\epsilon^2) }

Next, take the dissipation

\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left( \frac{p_j}{q_j} - \frac{p_i}{q_i} \right)^2 }

and expand the square, getting

\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left( \frac{p_j^2}{q_j^2} - 2\frac{p_i p_j}{q_i q_j} +  \frac{p_i^2}{q_i^2} \right) }

Since H is infinitesimal stochastic, \sum_i H_{ij} = 0. The first term is just this times a function of j, summed over j, so it vanishes, leaving

\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left(- 2\frac{p_i p_j}{q_i q_j} +  \frac{p_i^2}{q_i^2} \right) }

Since q is an equilibrium, \sum_j H_{ij} q_j = 0. The last term above is this times a function of i, summed over i, so it vanishes, leaving

\displaystyle{ D(p) = - \sum_{i,j} H_{ij}q_j  \frac{p_i p_j}{q_i q_j} = - \sum_{i,j} H_{ij} \frac{p_i p_j}{q_i}  }

This matches what we got for K(p), up to terms of order O(\epsilon^2).   █

In short: detailed balanced open Markov processes are governed by the principle of minimum dissipation, not minimum entropy production. Minimum dissipation agrees with minimum entropy production only near equilibrium.