The Kuramoto–Sivashinsky Equation (Part 8)

5 October, 2022

Our paper on the Kuramoto–Sivashinsky equation is out!

• John Baez, Steve Huntsman and Cheyne Weis, The Kuramoto–Sivashinsky equation, Notices Amer. Math. Soc. 69 (2022), 1581–1583.

This equation displays chaos and an arrow of time—and we state a precise conjecture about it, which may be very hard to prove, but is easy to study numerically. Steve Huntsman has written MATLAB code to help you do this, and I can make that available if you’re interested.

So what’s the idea? The article is short so I’ll just include it here!

The Kuramoto–Sivashinsky equation

u_t = -u_{xx} - u_{xxxx} - u_x u

applies to a real-valued function u of time t \in \mathbb{R} and space x \in \mathbb{R}. This equation was introduced as a simple 1-dimensional model of instabilities in flames, but it turned out to be mathematically fascinating in its own right. One reason is that the Kuramoto–Sivashinsky equation is a simple model of Galilean-invariant chaos with an arrow of time.

We say this equation is ‘Galilean invariant’ because the Galilei group, the usual group of symmetries in Newtonian mechanics, acts on the set of its solutions. When space is 1-dimensional, this group is generated by translations in t and x, reflections in x, and Galilei boosts, which are transformations to moving coordinate systems:

(t,x) \mapsto (t,x-tv)

Translations act in the obvious way. Spatial reflections act as follows: if u(t,x) is a solution, so is -u(t,-x). Galilei boosts act in a more subtle way: if u(t,x) is a solution, so is u(t,x-tv) + v.

Figure 1 — A solution u(t,x) of the Kuramoto–Sivashinsky equation. The variable x ranges over the interval [0,32\pi] with its endpoints identified. Initial data are independent identically distributed random variables, one at each grid point, uniformly distributed in [-1,1].

We say the Kuramoto–Sivashinsky equation is ‘chaotic’ because the distance between nearby solutions, defined in a suitable way, can grow exponentially, making the long-term behavior of a solution hard to predict in detail. And finally, we say this equation has an ‘arrow of time’ because time reversal

(t,x) \mapsto (-t,x)

is not a symmetry of this equation. Indeed, in Figure 1 we see that starting from random initial conditions, manifestly time-asymmetric patterns emerge. As we move forward in time, it looks as if stripes are born and merge, but never die or split. Attempting to make this precise leads to an interesting conjecture, but first we need some background.

It is common to study solutions of the Kuramoto–Sivashinsky equation that are spatially periodic, so that u(t,x) = u(t,x+L) for some L. We can then treat space as a circle, the interval [0,L] with its endpoints identified. For these spatially periodic solutions, the integral \int_0^L u(t,x) \, dx does not change with time: every term on the right side of the equation is a total x-derivative (note u u_x = (u^2/2)_x), so each integrates to zero over a period. Applying a Galilei boost adds a constant to this integral. In what follows we restrict attention to solutions where this integral is zero. These are roughly the solutions where the stripes are at rest, on average.

We can learn a surprising amount about these solutions by looking at the linearized equation

u_t = -u_{xx} - u_{xxxx}

We can solve this using a Fourier series

\displaystyle{  u(t,x) = \sum_{0 \ne n \in \mathbb{Z}} \hat{u}_n(t) \, e^{ik_n x} }

where the frequency of the nth mode is k_n = 2\pi n/L. We obtain

\hat{u}_n(t) = \exp\left((k_n^2 - k_n^4) t\right) \, \hat{u}_n(0)

Thus the nth mode grows exponentially with time if and only if k_n^2 - k_n^4 > 0, which happens when 0 < |n| < L/2\pi. So, as we increase L, more and more modes grow exponentially. These appear to be the cause of chaos even in the nonlinear equation. Indeed, all solutions of the Kuramoto–Sivashinsky equation approach an attractive fixed point if L is small enough, but as we increase L we see increasingly complicated behavior, and a ‘transition to chaos via period doubling’, which has been analyzed in great detail. Interestingly, the nonlinear term stabilizes the exponentially growing modes: in the language of physics, it tends to transfer power from these modes to high-frequency modes, which decay exponentially.
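These growth rates are easy to tabulate. Here is a small Python sketch (the function names are my own, not from the paper) that computes the linearized growth rate k_n^2 - k_n^4 of each mode and counts how many modes grow exponentially for a given L:

```python
import numpy as np

def growth_rate(n, L):
    """Growth rate k_n^2 - k_n^4 of the nth Fourier mode of the
    linearized equation u_t = -u_xx - u_xxxx, with k_n = 2*pi*n/L."""
    k = 2 * np.pi * n / L
    return k**2 - k**4

def growing_modes(L):
    """Mode numbers n > 0 that grow exponentially, i.e. those with 0 < k_n < 1."""
    n = np.arange(1, int(L) + 1)
    return n[growth_rate(n, L) > 0]

for L in (2 * np.pi, 32 * np.pi):
    print(f"L = {L:.2f}: {len(growing_modes(L))} growing modes")
```

For L = 2\pi no mode grows, while for L = 32\pi (the domain in Figure 1) the modes n = 1, …, 15 grow, consistent with the count L/2\pi = 16 as the first non-growing mode.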

Proving this last fact is not easy. However, in 1992, Collet, Eckmann, Epstein and Stubbe did this in the process of showing that for any initial data in a Hilbert space called \dot{L}^2, which consists of functions u \colon [0,L] \to \mathbb{R} such that

\int_0^L |u(x)|^2 \, dx < \infty

and

\int_0^L u(x) \, dx = 0 ,

the Kuramoto–Sivashinsky equation has a unique solution for t \ge 0, in a suitable sense, and that the norm of this solution eventually becomes less than some constant times L^{8/5}. These authors also showed that any such solution eventually becomes infinitely differentiable, even analytic.

Shortly after this, Temam and Wang went further. They showed that all solutions of the Kuramoto–Sivashinsky equation with initial data in \dot{L}^2 approach a finite-dimensional submanifold of \dot{L}^2 as t \to +\infty. They also showed the dimension of this manifold is bounded by a constant times (\ln L)^{0.2} L^{1.64}. This manifold, called the inertial manifold, describes the ‘eventual behaviors’ of solutions.

Understanding the eventual behavior of solutions of the Kuramoto–Sivashinsky equation remains a huge challenge. What are the ‘stripes’ in these solutions? Can we define them in such a way that stripes are born and merge but never split or disappear, at least after a solution has had time to get close enough to the inertial manifold?

It is important to note that the visually evident stripes in Figure 1 are not regions where u exceeds some constant, nor regions where it is less than some constant. Instead, as x increases and (t,x) passes through a stripe, u(t,x) first becomes positive and then negative. Thus, in the middle of a stripe the derivative u_x(t,x) takes a negative value. One obvious thing to try, then, is to define a stripe to be a region where u_x(t,x) < c for some suitably chosen negative constant c. Unfortunately the boundaries of these regions, where u_x(t,x) = c, tend to be very rough. Thus, this definition gives many small evanescent ‘stripes’, so the conjecture that eventually stripes are born and merge but never die or split would be false with such a definition.

One way to solve this problem is to smooth u_x by convolving it with a normalized Gaussian function of x. A bit of experimentation suggests using a normalized Gaussian of standard deviation 2. Thus, let v equal u_x convolved with this Gaussian, and define a ‘stripe’ to be a region where v < 0. For the solution in Figure 1, these stripes are indicated in Figure 2. One stripe splits around t = 10, but after the solution nears the inertial manifold, stripes never die or split—at least in this example. We conjecture that this is true generically: that is, for all solutions in some open dense subset of the inertial manifold. Proving this seems quite challenging, but it might be a step toward rigorously relating the Kuramoto–Sivashinsky equation to a model where stochastically moving particles can appear or merge, but never disappear or split.

Figure 2 — The same solution of the Kuramoto–Sivashinsky equation, with stripes indicated.
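The smoothing recipe can be sketched numerically. The fragment below (a sketch in Python, not Steve Huntsman's MATLAB code; the profile u is a synthetic stand-in for one time slice of a real solution, built from the mode n = 11 closest to the fastest-growing wavenumber 1/\sqrt{2} when L = 32\pi) differentiates spectrally, convolves u_x with a normalized Gaussian of standard deviation 2, and counts the stripes where the smoothed derivative v is negative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

L = 32 * np.pi
N = 512                          # grid points on [0, L), endpoints identified
dx = L / N
x = np.arange(N) * dx

# Synthetic stand-in for one time slice u(t, .); real data would come
# from a Kuramoto-Sivashinsky solver.
u = np.sin(11 * 2 * np.pi * x / L) + 0.3 * np.sin(33 * 2 * np.pi * x / L)

# Spectral derivative u_x on the periodic grid.
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)
u_x = np.fft.ifft(1j * k * np.fft.fft(u)).real

# Convolve u_x with a normalized Gaussian of standard deviation 2
# (in units of x, i.e. 2/dx grid points), wrapping around the circle.
v = gaussian_filter1d(u_x, sigma=2 / dx, mode='wrap')

# A "stripe" is a maximal run of grid points where v < 0.
in_stripe = v < 0
transitions = int(np.count_nonzero(np.diff(in_stripe.astype(int))))
print("stripes:", transitions // 2)
```

The smoothing strongly damps the high-frequency harmonic, so the sign of v is governed by the dominant mode and the stripe count comes out clean; on rough data from an actual solution this damping is exactly what suppresses the small evanescent ‘stripes’.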

Numerical calculations also indicate that generically, solutions eventually have stripes with an average density that approaches about 0.1 as L \to +\infty. This is close to the inverse of the wavelength of the fastest-growing mode of the linearized equation, (2^{3/2} \pi)^{-1} \approx 0.1125. But we see no clear reason to think these numbers should be exactly equal. Even proving the existence of a limiting stripe density is an open problem! Thus, the Kuramoto–Sivashinsky equation continues to pose many mathematical challenges.


[1] P. Collet, J.-P. Eckmann, H. Epstein and J. Stubbe, A global attracting set for the Kuramoto–Sivashinsky equation, Commun. Math. Phys. 152 (1993), 203–214.

[2] P. Collet, J.-P. Eckmann, H. Epstein and J. Stubbe, Analyticity for the Kuramoto–Sivashinsky equation, Physica D 67 (1993), 321–326.

[3] R. A. Edson, J. E. Bunder, T. W. Mattner and A. J. Roberts, Lyapunov exponents of the Kuramoto–Sivashinsky PDE, The ANZIAM Journal 61 (2019), 270–285.

[4] D. T. Papageorgiou and Y. S. Smyrlis, The route to chaos for the Kuramoto–Sivashinsky equation, Theor. Comput. Fluid Dyn. 3 (1991), 15–42.

[5] M. Rost and J. Krug, A particle model for the Kuramoto–Sivashinsky equation, Physica D 88 (1995), 1–13.

[6] R. Temam and X. M. Wang, Estimates on the lowest dimension of inertial manifolds for the Kuramoto–Sivashinsky equation in the general case, Differential and Integral Equations 7 (1994), 1095–1108.

[7] Encyclopedia of Mathematics, Kuramoto–Sivashinsky equation.

The Algebra of Grand Unified Theories

12 September, 2022

In this long conversation, Timothy Nguyen and I talk about the fascinating mathematical patterns in the Standard Model that led people to grand unified theories:

For more details go here:

• John Baez and John Huerta, The algebra of grand unified theories, Bulletin of the American Mathematical Society 47 (2010), 483–552.

Categorical Semantics of Entropy

19 April, 2022

There will be a workshop on the categorical semantics of entropy at the CUNY Grad Center in Manhattan on Friday May 13th, organized by John Terilla. I was kindly invited to give an online tutorial beforehand on May 11, which I will give remotely to save carbon. Tai-Danae Bradley will also be giving a tutorial that day in person:

Tutorial: Categorical Semantics of Entropy, Wednesday 11 May 2022, 13:00–16:30 Eastern Time, Room 5209 at the CUNY Graduate Center and via Zoom. Organized by John Terilla. To attend, register here.

12:00-1:00 Eastern Daylight Time — Lunch in Room 5209.

1:00-2:30 — Shannon entropy from category theory, John Baez, University of California Riverside; Centre for Quantum Technologies (Singapore); Topos Institute.

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.

2:30-3:00 — Coffee break.

3:00-4:30 — Operads and entropy, Tai-Danae Bradley, The Master’s University; Sandbox AQ.

This talk will open with a basic introduction to operads and their representations, with the main example being the operad of probabilities. I’ll then give a light sketch of how this framework leads to a small, but interesting, connection between information theory, abstract algebra, and topology, namely a correspondence between Shannon entropy and derivations of the operad of probabilities.

Symposium on Categorical Semantics of Entropy, Friday 13 May 2022, 9:30-3:15 Eastern Daylight Time, Room 5209 at the CUNY Graduate Center and via Zoom. Organized by John Terilla. To attend, register here.

9:30-10:00 Eastern Daylight Time — Coffee and pastries in Room 5209.

10:00-10:45 — Operadic composition of thermodynamic systems, Owen Lynch, Utrecht University.

The maximum entropy principle is a fascinating and productive lens with which to view both thermodynamics and statistical mechanics. In this talk, we present a categorification of the maximum entropy principle, using convex spaces and operads. Along the way, we will discuss a variety of examples of the maximum entropy principle and show how each application can be captured using our framework. This approach shines a new light on old constructions. For instance, we will show how we can derive the canonical ensemble by attaching a probabilistic system to a heat bath. Finally, our approach to this categorification has applications beyond the maximum entropy principle, and we will give a hint of how to adapt this categorification to the formalization of the composition of other systems.

11:00-11:45 — Polynomial functors and Shannon entropy, David Spivak, MIT and the Topos Institute.

The category Poly of polynomial functors in one variable is extremely rich, brimming with categorical gadgets (e.g. eight monoidal products, two closures, limits, colimits, etc.) and applications including dynamical systems, databases, open games, and cellular automata. In this talk I’ll show that objects in Poly can be understood as empirical distributions. In part using the standard derivative of polynomials, we obtain a functor to Set × Set^op which encodes an invariant of a distribution as a pair of sets. This invariant is well-behaved in the sense that it is a distributive monoidal functor: it acts on both distributions and maps between them, and it preserves both the sum and the tensor product of distributions. The Shannon entropy of the original distribution is then calculated directly from the invariant, i.e. only in terms of the cardinalities of these two sets. Given the many applications of polynomial functors and of Shannon entropy, having this link between them has potential to create useful synergies, e.g. to notions of entropic causality or entropic learning in dynamical systems.

12:00-1:30 — Lunch in Room 5209

1:30-2:15 — Higher entropy, Tom Mainiero, Rutgers New High Energy Theory Center.

Is the frowzy state of your desk no longer as thrilling as it once was? Are numerical measures of information no longer able to satisfy your needs? There is a cure! In this talk we’ll learn about: the secret topological lives of multipartite measures and quantum states; how a homological probe of this geometry reveals correlated random variables; the sly decategorified involvement of Shannon, Tsallis, Rényi, and von Neumann in this larger geometric conspiracy; and the story of how Gelfand, Neumark, and Segal’s construction of von Neumann algebra representations can help us uncover this informatic ruse. So come to this talk, spice up your entropic life, and bring new meaning to your relationship with disarray.

2:30-3:15 — On characterizing classical and quantum entropy, Arthur Parzygnat, Institut des Hautes Études Scientifiques.

In 2011, Baez, Fritz, and Leinster proved that the Shannon entropy can be characterized as a functor by a few simple postulates. In 2014, Baez and Fritz extended this theorem to provide a Bayesian characterization of the classical relative entropy, also known as the Kullback–Leibler divergence. In 2017, Gagné and Panangaden extended the latter result to include standard Borel spaces. In 2020, I generalized the first result on Shannon entropy so that it includes the von Neumann (quantum) entropy. In 2021, I provided partial results indicating that the Umegaki relative entropy may also have a Bayesian characterization. My results in the quantum setting are special applications of the recent theory of quantum Bayesian inference, which is a non-commutative extension of classical Bayesian statistics based on category theory. In this talk, I will give an overview of these developments and their possible applications in quantum information theory.

Wine and cheese reception to follow, Room 5209.

Compositional Thermostatics (Part 4)

8 March, 2022

guest post by Owen Lynch

This is the fourth and final part of a blog series on this paper:

• John Baez, Owen Lynch and Joe Moeller, Compositional thermostatics.

In Part 1, we went over our definition of thermostatic system: it’s a convex space X of states and a concave function S \colon X \to [-\infty, \infty] saying the entropy of each state. We also gave examples of thermostatic systems.

In Part 2, we talked about what it means to compose thermostatic systems. It amounts to constrained maximization of the total entropy.

In Part 3 we laid down a categorical framework for composing systems when there are choices that have to be made for how the systems are composed. This framework has been around for a long time: operads and operad algebras.

In this post we will bring together all of these parts in a big synthesis to create an operad of all the ways of composing thermostatic systems, along with an operad algebra of thermostatic systems!

Recall that in order to compose thermostatic systems (X_1, S_1), \ldots, (X_n, S_n), we need to use a ‘parameterized constraint’, a convex subset

R \subseteq X_1 \times \cdots \times X_n \times Y,

where Y is some other convex set. We end up with a thermostatic system on Y, with S \colon Y \to [-\infty,\infty] defined by

S(y) = \sup_{(x_1,\ldots,x_n,y) \in R} S_1(x_1) + \cdots + S_n(x_n)
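Here is a toy, discretized illustration of this constrained maximization, with small finite state spaces, hypothetical concave entropy functions standing in for S_1 and S_2, and the relation R given by the constraint x_1 + x_2 = y:

```python
import itertools

# A discretized instance of S(y) = sup over (x1, x2, y) in R of S1(x1) + S2(x2),
# where R is cut out by the constraint x1 + x2 = y.
X1 = range(6)                     # states of system 1 (e.g. units of energy)
X2 = range(6)                     # states of system 2
S1 = lambda x: -(x - 2) ** 2      # hypothetical concave entropy functions
S2 = lambda x: -2 * (x - 3) ** 2

def S(y):
    """Entropy of the composite system at total y."""
    feasible = [S1(x1) + S2(x2)
                for x1, x2 in itertools.product(X1, X2)
                if x1 + x2 == y]
    return max(feasible) if feasible else float('-inf')

print([S(y) for y in range(11)])
```

For instance S(5) = 0, attained at (x_1, x_2) = (2, 3) where both individual entropies peak, exactly the constrained maximization the formula describes.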

In order to model this using operads and operad algebras, we will make an operad \mathcal{CR} which has convex sets as its types, and convex relations as its morphisms. Then we will make an operad algebra that assigns to any convex set X the set of concave functions

S \colon X \to [-\infty,\infty]

This operad algebra will describe how, given a relation R \subseteq X_1 \times \cdots \times X_n \times Y, we can ‘push forward’ entropy functions on X_1,\ldots,X_n to form an entropy function on Y.

The operad \mathcal{CR} is built using a construction from Part 3 that takes a symmetric monoidal category and produces an operad. The symmetric monoidal category that we start with is \mathsf{ConvRel}, which has convex sets as its objects and convex relations as its morphisms. It has \mathsf{Conv} (the category of convex sets and convex-linear functions) as a subcategory with all the same objects, and \mathsf{ConvRel} inherits its symmetric monoidal structure from \mathsf{Conv}.

Following the construction from Part 3, we see that we get an operad

\mathcal{CR} = \mathrm{Op}(\mathsf{ConvRel})

exactly as described before: namely it has convex sets as types, and

\mathcal{CR}(X_1,\ldots,X_n;Y) = \mathsf{ConvRel}(X_1 \times \cdots \times X_n, Y)

Next we want to make an operad algebra on \mathcal{CR}. To do this we use a lax symmetric monoidal functor \mathrm{Ent} from \mathsf{ConvRel} to \mathsf{Set}, defined as follows. On objects, \mathrm{Ent} sends any convex set X to the set of entropy functions on it:

\mathrm{Ent}(X) = \{ S \colon X \to [-\infty,\infty] \mid S \: \text{is concave} \}

On morphisms, \mathrm{Ent} sends any convex relation to the map that “pushes forward” an entropy function along that relation:

\mathrm{Ent}(R \subseteq X \times Y) = \left( S \mapsto \left( y \mapsto \sup_{(x,y) \in R} S(x) \right) \right)

And finally, the all-important laxator \epsilon produces an entropy function on X_1 \times X_2 by summing an entropy function on X_1 and an entropy function on X_2:

\epsilon_{X_1,X_2} = ((S_1,S_2) \mapsto S_1 + S_2)

The proof that all this indeed defines a lax symmetric monoidal functor can be found in our paper. The main point is that once we have proven this really is a lax symmetric monoidal functor, we can invoke the machinery of lax symmetric monoidal functors and operad algebras to prove that we get an operad algebra! This is very convenient, because proving that we have an operad algebra directly would be somewhat tedious.

We have now reached the technical high point of the paper, which is showing that this operad algebra exists and thus formalizing what it means to compose thermostatic systems. All that remains to do now is to show off a bunch of examples of composition, so that you can see how all this categorical machinery works in practice. In our paper we give many examples, but here let’s consider just one.

Consider the following setup with two ideal gases connected by a movable divider.

The state space of each individual ideal gas is \mathbb{R}^3_{> 0}, with coordinates (U,V,N) representing energy, volume, and number of particles respectively. Let (U_1, V_1, N_1) be the coordinates for the left-hand gas, and (U_2, V_2, N_2) be the coordinates for the right-hand gas. Then as the two gases move to thermodynamic equilibrium, the conserved quantities are U_1 + U_2, V_1 + V_2, N_1 and N_2. We picture this with the following diagram.

Ports on the inner circles represent variables for the ideal gases, and ports on the outer circle represent variables for the composed system. Wires represent relations between those variables. Thus, the entire diagram represents an operation in \mathcal{CR}, given by

U_1 + U_2 = U^e
V_1 + V_2 = V^e
N_1 = N_1^e
N_2 = N_2^e

We can then use the operad algebra to take entropy functions S_1,S_2 \colon \mathbb{R}^3_{> 0} \to [-\infty, \infty] on the two inner systems (the two ideal gases), and get an entropy function S^e \colon \mathbb{R}^4_{> 0} \to [-\infty,\infty] on the outer system.

As a consequence of this entropy maximization procedure, the inner states (U_1,V_1,N_1) and (U_2,V_2,N_2) are such that the temperature and pressure equilibrate between the two ideal gases. This is because constrained maximization with the constraint U_1 + U_2 = U^e leads to the following equations at a maximizer:

\displaystyle{ \frac{1}{T_1} = \frac{\partial S_1}{\partial U_1} = \frac{\partial S_2}{\partial U_2} = \frac{1}{T_2} }

(where T_1 and T_2 are the respective temperatures), and

\displaystyle{ \frac{p_1}{T_1} = \frac{\partial S_1}{\partial V_1} = \frac{\partial S_2}{\partial V_2} = \frac{p_2}{T_2} }

(where p_1 and p_2 are the respective pressures).

Thus we arrive at the expected conclusion, which is that temperature and pressure equalize when we maximize entropy under constraints on the total energy and volume.
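This equilibration is easy to check numerically. The sketch below is not from the paper: it uses a standard textbook form of the ideal gas entropy with all physical constants suppressed, and arbitrary choices of particle numbers and conserved totals. It maximizes S_1 + S_2 under the constraints and confirms that 1/T and p/T come out equal:

```python
import numpy as np
from scipy.optimize import minimize

def S(U, V, N):
    """Ideal-gas entropy up to additive constants (textbook form,
    constants suppressed -- an assumption for illustration)."""
    return N * np.log(V / N) + 1.5 * N * np.log(U / N)

N1, N2 = 1.0, 2.0          # particle numbers: held fixed by the relation
U_tot, V_tot = 9.0, 6.0    # conserved totals U1 + U2 and V1 + V2

def neg_total_entropy(z):
    U1, V1 = z
    return -(S(U1, V1, N1) + S(U_tot - U1, V_tot - V1, N2))

res = minimize(neg_total_entropy, x0=[4.0, 3.0],
               bounds=[(1e-6, U_tot - 1e-6), (1e-6, V_tot - 1e-6)])
U1, V1 = res.x
U2, V2 = U_tot - U1, V_tot - V1

T1, T2 = 2 * U1 / (3 * N1), 2 * U2 / (3 * N2)   # from 1/T = dS/dU
p1_T1, p2_T2 = N1 / V1, N2 / V2                 # from p/T = dS/dV
print(U1, V1, T1, T2, p1_T1, p2_T2)
```

The maximizer puts energy and volume in proportion to particle number (U_1 = 3, V_1 = 2 here), so the two temperatures agree and so do the two values of p/T, just as the partial-derivative equations predict.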

And that concludes this series of blog posts! For more examples of thermostatic composition, I invite you to read our paper, which treats some “thermostatic systems” that one does not normally think of in this way, such as heat baths and probabilistic systems! And if you find this stuff interesting, don’t hesitate to reach out to me! Just drop a comment here or email me at the address in the paper.

See all four parts of this series:

Part 1: thermostatic systems and convex sets.

Part 2: composing thermostatic systems.

Part 3: operads and their algebras.

Part 4: the operad for composing thermostatic systems.

Compositional Thermostatics (Part 3)

14 February, 2022

guest post by Owen Lynch

This is the third part (Part 1, Part 2) of a blog series on a paper that we wrote recently:

• John Baez, Owen Lynch and Joe Moeller, Compositional thermostatics.

In the previous two posts we talked about what a thermostatic system was, and how we think about composing them. In this post, we are going to back up from thermostatic systems a little bit, and talk about operads: a general framework for composing things! But we will not yet discuss how thermostatic systems use this framework—we’ll do that in the next post.

The basic idea behind this framework is the following. Suppose that we have a bunch of widgets, and we want to compose these widgets. If we are lucky, given two widgets there is a natural way of composing them. This is the case if the widgets are elements of a monoid; we simply use the monoid operation. This is also the case if the widgets are morphisms in a category; if the domain and codomain of two widgets match up, then they can be composed. More generally, n-morphisms in a higher category also come with natural notions of composition.

However, there is not always a canonical way of composing widgets. For instance, let R be a commutative ring, and let a and b be elements of R. Then there are many ways to compose them: we could add them, subtract them, multiply them, etc. In fact any element of the free commutative ring \mathbb{Z}[x,y] gives a way of composing a pair of elements in a commutative ring. For instance, x^2 + xy - y^2, when applied to a and b, gives a^2 + ab - b^2. Note that there is nothing special here about the fact that we started with two elements of R; we could start with as many elements of R as we liked, say, a_1,\ldots,a_n, and any element of \mathbb{Z}[x_1,\ldots,x_n] would give a ‘way of composing’ a_1,\ldots,a_n.
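As a concrete sketch, here is the polynomial x^2 + xy - y^2 from the paragraph above acting as a binary composition rule in the ring of integers (modelling the polynomial as a plain Python function; the function name is my own):

```python
# Any element of Z[x, y] gives one way of composing two elements a, b
# of a commutative ring.  Here we apply x^2 + x*y - y^2 in the ring Z.
def compose(a, b):
    """Composition rule given by the polynomial x^2 + xy - y^2."""
    return a * a + a * b - b * b

print(compose(3, 5))   # 9 + 15 - 25 = -1
```

Any other polynomial, in any number of variables, would give another such rule, which is exactly the flexibility the operadic framework is designed to organize.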

The reader familiar with universal algebra should recognize that this situation is very general: we could do the exact same thing with vector spaces, modules, groups, algebras, or any more exotic structures that support a notion of ‘free algebra on n variables’.

Let’s also discuss a less algebraic example. A point process X on a subset of Euclidean space A \subseteq \mathbb{R}^n can be described as an assignment of an \mathbb{N}-valued random variable X_U to each measurable set U \subseteq A that is countably additive under disjoint union of measurable sets.

The interpretation is that a point process gives a random collection of points in A, and X_U counts how many points fall in U. Moreover, this collection of points cannot have a limit point; there cannot be infinitely many points in any compact subset of A.

Now suppose that f \colon B \to A and g \colon C \to A are rigid embeddings such that f(B) \cap g(C) = \emptyset, and that X is a point process on B and Y is a point process on C. Then we can define a new point process Z on A (assuming that X and Y are independent) by letting

Z_U = X_{f^{-1}(U)}+ Y_{g^{-1}(U)}

This is the union of the point process X running in f(B) and the point process Y running in g(C).

Composing two point processes

The precise details here are not so important: what I want to display is the intuition that we are geometrically composing things that ‘live on’ a space. The embeddings f and g give us a way of gluing together a point process on B and a point process on C to get a point process on A. We could have picked something else that lives on a space, like a scalar/vector field, but I chose point processes because they are easy to visualize and composing them is fairly simple (when composing vector fields one has to be careful that they ‘match’ at the edges).
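A quick numerical sketch of this gluing, using Poisson point processes on two unit intervals and the hypothetical disjoint embeddings f(b) = b and g(c) = c + 1 into A = [0, 2) (the rates and the seed are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(42)

# Independent Poisson point processes X on B = [0, 1) and Y on C = [0, 1),
# glued into A = [0, 2) via the embeddings f(b) = b and g(c) = c + 1.
X = rng.uniform(0, 1, size=rng.poisson(5))   # points of X in B
Y = rng.uniform(0, 1, size=rng.poisson(7))   # points of Y in C
Z = np.concatenate([X, Y + 1])               # the glued process on A

def count(points, lo, hi):
    """The random variable Z_U for the window U = [lo, hi)."""
    return int(np.sum((points >= lo) & (points < hi)))

# Z_U = X_{f^{-1}(U)} + Y_{g^{-1}(U)}, and counts add over disjoint windows.
print(count(Z, 0, 2), count(Z, 0, 1) + count(Z, 1, 2))
```

Since f(B) and g(C) are disjoint, every point of Z in [0, 1) comes from X and every point in [1, 2) from Y, which is exactly the formula Z_U = X_{f^{-1}(U)} + Y_{g^{-1}(U)}.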


In all of the examples in the previous section, we have things that we want to compose, and ways of composing them. This situation is formalized by operads and operad algebras (which we will define very shortly). However, the confusing part is that the operad part corresponds to ‘ways of composing them’, and the operad algebra part corresponds to ‘things we want to compose’. Thus, the mathematics is somewhat ‘flipped’ from the way of thinking that comes naturally; we first think about the ways of composing things, and then we think about what things we want to compose, rather than first thinking about the things we want to compose and only later thinking about the ways of composing them!

Unfortunately, this is the logical way of presenting operads and operad algebras; we must define what an operad is before we can talk about their algebras, even if what we really care about is the algebras. Thus, without further ado, let us define what an operad is.

An operad \mathcal{O} consists of a collection \mathcal{O}_0 of types (which are abstract, just like the ‘objects’ in a category are abstract), and for every list of types X_1,\ldots,X_n,Y \in \mathcal{O}_0, a collection of operations \mathcal{O}(X_1,\ldots,X_n;Y).

These operations are the ‘ways of composing things’, but they themselves can be composed by ‘feeding into’ each other, in the following way.

Suppose that g \in \mathcal{O}(Y_1,\ldots,Y_n;Z) and for each i = 1,\ldots,n, f_i \in \mathcal{O}(X_{i,1},\ldots,X_{i,k_i};Y_i). Then we can make an operation

g(f_1,\ldots,f_n) \in \mathcal{O}(X_{1,1},\ldots,X_{1,k_1},\ldots,X_{n,1},\ldots,X_{n,k_n};Z)

We visualize operads by letting an operation be a circle that can take several inputs and produces a single output. Then composition of operations is given by attaching the output of circles to the input of other circles. Pictured below is the composition of a unary operator f_1, a nullary operator f_2, and a binary operator f_3 with a ternary operator g to create a ternary operator g(f_1,f_2,f_3).

One view of composition in an operad

Additionally, for every type X \in \mathcal{O}_0, there is an ‘identity operation’ 1_X \in \mathcal{O}(X;X) that satisfies for any g \in \mathcal{O}(X_1,\ldots,X_n;Y)

g(1_{X_1},\ldots,1_{X_n}) = g

and for any f \in \mathcal{O}(X;Y)

1_Y(f) = f

There is also an associativity law for composition that is a massive pain to write out explicitly, but is more or less exactly as one would expect. For unary operators f,g,h, it states

f(g(h)) = f(g)(h)

The last condition for being an operad is that if f \in \mathcal{O}(X_1,\ldots,X_n;Y) and \sigma \in S(n), the symmetric group on n elements, then we can apply \sigma to f to get

\sigma^\ast(f) \in \mathcal{O}(X_{\sigma(1)},\ldots,X_{\sigma(n)};Y).

We require that (\sigma \tau)^\ast(f) = \tau^\ast(\sigma^\ast(f)) if \sigma,\tau \in S(n), and there are also some conditions for how \sigma^\ast interacts with composition, which can be straightforwardly derived from the intuition that \sigma^\ast permutes the arguments of an operation.

Note that our definition of an operad is what might typically be known as a ‘symmetric, colored operad’, but as we will always be using symmetric, colored operads, we choose to simply drop the modifiers.

That was a long definition, so it is time for an example. This example corresponds to the first situation in the first section, where we wanted to compose ring elements.

Define \mathcal{R} to be an operad with one type, which we will call R \in \mathcal{R}_0, and let \mathcal{R}(R^n;R) = \mathbb{Z}[x_1,\ldots,x_n], where \mathcal{R}(R^n;R) is \mathcal{R}(R,\ldots,R;R) with R repeated n times.

Composition is simply polynomial substitution. That is, if

q(y_1,\ldots,y_n) \in \mathbb{Z}[y_1,\ldots,y_n] \cong \mathcal{R}(R^n;R)

and

p_i(x_{i,1},\ldots,x_{i,k_i}) \in \mathbb{Z}[x_{i,1},\ldots,x_{i,k_i}] \cong \mathcal{R}(R^{k_i};R)

then

q(p_1(x_{1,1},\ldots,x_{1,k_1}),\ldots,p_n(x_{n,1},\ldots,x_{n,k_n})) \in \mathcal{R}(R^{\sum_{i=1}^n k_i};R)

is the composite of p_1,\ldots,p_n,q. For instance, composing

x^2 \in \mathbb{Z}[x] \cong \mathcal{R}(R;R)

with

y+z \in \mathbb{Z}[y,z] \cong \mathcal{R}(R,R;R)

results in

(y+z)^2 \in \mathbb{Z}[y,z] \cong \mathcal{R}(R,R;R)

The reader is invited to supply details for identities and the symmetry operators.
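The substitution just described is easy to mimic in code. A minimal sketch, modelling the two polynomials from the example as Python functions and spot-checking that their composite agrees with (y+z)^2 pointwise on integers:

```python
# Operations of the operad R modelled as Python functions;
# composing them is substitution.
square = lambda x: x ** 2          # x^2, an operation in R(R; R)
add = lambda y, z: y + z           # y + z, an operation in R(R, R; R)

composite = lambda y, z: square(add(y, z))   # should equal (y + z)^2

print(all(composite(y, z) == (y + z) ** 2
          for y in range(-5, 6) for z in range(-5, 6)))
```

Two integer polynomials that agree on all integer inputs are equal, so this pointwise check really does pin down the composite as (y+z)^2.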

For the other example, define an operad \mathcal{P} by letting \mathcal{P}_0 be the set of compact subsets of \mathbb{R}^2 (we could consider something more exciting, but this works fine and is easy to visualize). An operation f \in \mathcal{P}(X_1,\ldots,X_n;Y) consists of disjoint embeddings f_1,\ldots,f_n, where f_i \colon X_i \to Y.

We can visualize such an operation as simply a shape with holes in it.

Shape with holes

Composition of such operations is just given by nesting the holes.

Composing by nesting

The outcome of the above composition is given by simply taking away the intermediate shapes (i.e. the big circle and the triangle).

The composed operation

Another source of examples for operads comes from the following construction. Suppose that (C,\otimes,1) is a symmetric monoidal category. Define \mathrm{Op}(C,\otimes,1) = \mathrm{Op}(C) by letting

\mathrm{Op}(C)_0 = C_0

where C_0 is the collection of objects in C, and

\mathrm{Op}(C)(X_1,\ldots,X_n;Y) = \mathrm{Hom}_C(X_1 \otimes \cdots \otimes X_n, Y)

To compose operations f_1,\ldots,f_n and g (assuming that the types are such that these are composable), we simply take g \circ (f_1 \otimes \ldots \otimes f_n). Moreover, the identity operation is simply the identity morphism, and the action of \sigma \in S(n) is given by the symmetric monoidal structure.

In fact, the second example that we talked about is an example of this construction! If we let C be the category where the objects are compact subsets of \mathbb{R}^2, with embeddings as the morphisms, and let the symmetric monoidal product be disjoint union, then it is not too hard to show that the operad we end up with is the same as the one we described above.

Perhaps the most important example of this construction is when it is applied to (\mathsf{Set}, \times, 1), because this is important in the next section! This operad has sets as its types, and an operation

f \in \mathrm{Op}(\mathsf{Set})(X_1,\ldots,X_n;Y)

is simply a function

f \colon X_1 \times \cdots \times X_n \to Y

Operad algebras

Although ‘operad algebra’ is the name that has stuck in the literature, I think a better term would be ‘operad action’, because the analogy to keep in mind is that of a group action. A group action allows a group to ‘act on’ elements of a set; an operad algebra similarly allows an operad to ‘act on’ elements of a set.

Moreover, a group action can be described as a functor from the 1-element category representing that group to \mathsf{Set}, and as we will see, an operad algebra can also be described as an ‘operad morphism’ from the operad to \mathrm{Op}(\mathsf{Set}), the operad just described in the last section.

In fact, this is how we will define an operad algebra; first we will define what an operad morphism is, and then we will define an operad algebra as an operad morphism to \mathrm{Op}(\mathsf{Set}).

An operad morphism F from an operad \mathcal{O} to an operad \mathcal{P} is exactly what one would expect: it consists of

• For every X_1,\ldots,X_n,Y \in \mathcal{O}_0, a map

F \colon \mathcal{O}(X_1,\ldots,X_n;Y) \to \mathcal{P}(F(X_1),\ldots,F(X_n);F(Y))

such that F commutes with all of the things an operad does, i.e. composition, identities, and the action of \sigma \in S(n).

Thus an operad morphism F from \mathcal{O} to \mathrm{Op}(\mathsf{Set}), also known as an operad algebra, consists of

• A set F(X) for every X \in \mathcal{O}_0
• A function F(f) \colon F(X_1) \times \cdots \times F(X_n) \to F(Y) for every operation f \in \mathcal{O}(X_1,\ldots,X_n;Y)

such that the assignment of sets and functions preserves identities, composition, and the action of \sigma \in S(n).

Without further ado, let’s look at the examples. From any ring A we can produce an algebra F_A of \mathcal{R}. We let F_A(R) = A (considered as a set), and for

p(x_1,\ldots,x_n) \in \mathbb{Z}[x_1,\ldots,x_n] = \mathcal{R}(R,\ldots,R;R)

we let

F_A(p)(a_1,\ldots,a_n) = p(a_1,\ldots,a_n)
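Concretely, this action can be sketched in a few lines of Python. Here I take A = \mathbb{Z}/5 and represent an integer polynomial by a dict from exponent tuples to coefficients; all names and conventions in this snippet are mine, purely for illustration.

```python
# A sketch of the algebra F_A: an integer polynomial p acts on any ring A
# by evaluation.  Here A = Z/5, and a polynomial in n variables is a dict
# {exponent_tuple: integer_coefficient}.

def act(p, *elements, modulus=5):
    """F_A(p)(a_1, ..., a_n) = p(a_1, ..., a_n), computed in Z/modulus."""
    total = 0
    for exponents, coeff in p.items():
        term = coeff
        for a, e in zip(elements, exponents):
            term *= pow(a, e, modulus)
        total += term
    return total % modulus

# p(x1, x2) = x1^2 + 3*x1*x2 + 2
p = {(2, 0): 1, (1, 1): 3, (0, 0): 2}
```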

We can also make an operad algebra of point processes, \mathrm{PP}, for \mathcal{P}. For A \in \mathcal{P}_0, we let \mathrm{PP}(A) be the set of point processes on A. If f \colon A_1 \sqcup \cdots \sqcup A_n \to B is an embedding, then we let \mathrm{PP}(f) be the map that sends point processes X_1,\ldots,X_n on A_1,\ldots,A_n respectively to the point process Y defined by

Y_U = (X_1)_{f^{-1}(U) \cap A_1} + \cdots + (X_n)_{f^{-1}(U) \cap A_n}

Finally, if (C,\otimes,1) is a symmetric monoidal category, there is a way to make an operad algebra of \mathrm{Op}(C) from a special type of functor F \colon C \to \mathsf{Set}. This is convenient, because it is often easier to prove that the functor satisfies the necessary properties than it is to prove that the algebra is in fact well-formed.

The special kind of functor we need is a lax symmetric monoidal functor. This is a functor F equipped with a natural transformation \tau_{A,B} \colon F(A) \times F(B) \to F(A \otimes B) that is well-behaved with respect to the associator, identity, and symmetric structure of (C, \otimes, 1). We call \tau the laxator, and formally speaking, a lax symmetric monoidal functor consists of a functor along with a laxator. I won’t go into detail about the whole construction that makes an operad algebra out of a lax symmetric monoidal functor, but the basic idea is that given an operation f \in \mathrm{Op}(C)(X,Y;Z) (which is a morphism f \colon X \otimes Y \to Z), we can construct a function F(X) \times F(Y) \to F(Z) by composing

\tau_{X,Y} \colon F(X) \times F(Y) \to F(X \otimes Y)


F(f) \colon F(X \otimes Y) \to F(Z)

This basic idea can be extended using associativity to produce a function F(X_1) \times \cdots \times F(X_n) \to F(Y) from an operation f \colon X_1 \otimes \cdots \otimes X_n \to Y.

As an example of this construction, consider point processes again. We can make a lax symmetric monoidal functor \mathrm{PP} by sending a compact set A to \mathrm{PP}(A), the set of point processes on A, and an embedding f \colon A \to B to the map \mathrm{PP}(f) that sends a point process X to the point process Y defined by

Y_U = X_{f^{-1}(U)}

The laxator \tau_{A,B} \colon \mathrm{PP}(A) \times \mathrm{PP}(B) \to \mathrm{PP}(A \sqcup B) sends a point process X on A and a point process Y on B to the point process Z on A \sqcup B defined by

Z_{U} = X_{U \cap A} + Y_{U \cap B}

The reader should inspect this definition and think about why it is equivalent to the earlier definition for the operad algebra of point processes.
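Here is a toy Python version of these maps (my own sketch: it works with a single realization of each point process, i.e. a finite counting measure stored as a list of points, rather than with random measures):

```python
# A sketch with one realization of each point process: a finite counting
# measure, i.e. a list of points.  X_U counts the points lying in U.

def count(points, region):
    """X_U for the counting measure given by `points`: #{x in points : x in U}."""
    return sum(1 for x in points if region(x))

def pushforward(f, points):
    """PP(f): push each point forward along the embedding f, so that
    the resulting realization Y satisfies Y_U = X_{f^{-1}(U)}."""
    return [f(x) for x in points]

def laxator(points_a, points_b):
    """tau_{A,B}: a pair of realizations on A and B gives a realization on
    the disjoint union A ⊔ B (points tagged 0 or 1), with
    Z_U = X_{U ∩ A} + Y_{U ∩ B}."""
    return [(0, a) for a in points_a] + [(1, b) for b in points_b]
```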


This was a long post, so I’m going to try to go over the main points so that you can organize what you just learned in some sort of coherent fashion.

First I talked about how situations frequently arise in which there isn’t a canonical way of ‘composing’ two things. The two examples that I gave were elements of a ring, and structures on spaces, specifically point processes.

I then talked about the formal way that we think about these situations. Namely, we organize the ‘ways of composing things’ into an operad, and then we organize the ‘things that we want to compose’ into an operad algebra. Along the way, I discussed a convenient way of making an operad out of a symmetric monoidal category, and an operad algebra out of a lax symmetric monoidal functor.

This construction will be important in the next post, when we make an operad of ‘ways of composing thermostatic systems’ and an operad algebra of thermostatic systems to go along with it.

See all four parts of this series:

Part 1: thermostatic systems and convex sets.

Part 2: composing thermostatic systems.

Part 3: operads and their algebras.

Part 4: the operad for composing thermostatic systems.

Compositional Thermostatics (Part 2)

7 February, 2022

guest post by Owen Lynch

In Part 1, John talked about a paper that we wrote recently:

• John Baez, Owen Lynch and Joe Moeller, Compositional thermostatics.

and he gave an overview of what a ‘thermostatic system’ is.

In this post, I want to talk about how to compose thermostatic systems. We will not yet use category theory, saving that for another post; instead we will give a ‘nuts-and-bolts’ approach, based on examples.

Suppose that we have two thermostatic systems and we put them in thermal contact, so that they can exchange heat energy. Then we predict that their temperatures should equalize. What does this mean precisely, and how do we derive this result?

Recall that a thermostatic system is given by a convex space X and a concave entropy function S \colon X \to [-\infty,\infty]. A ‘tank’ of constant heat capacity, whose state is solely determined by its energy, has state space X = \mathbb{R}_{> 0} and entropy function S(U) = C \log(U), where C is the heat capacity.

Now suppose that we have two tanks of heat capacity C_1 and C_2 respectively. As thermostatic systems, the state of both tanks is described by two energy variables, U_1 and U_2, and we have entropy functions

S_1(U_1) = C_1 \log(U_1)

S_2(U_2) = C_2 \log(U_2)

By conservation of energy, the total energy of both tanks must remain constant, so

U_1 + U_2 = U

for some U; equivalently

U_2 = U - U_1

The equilibrium state then has maximal total entropy subject to this constraint. That is, an equilibrium state (U_1^{\mathrm{eq}},U_2^{\mathrm{eq}}) must satisfy

S_1(U_1^{\mathrm{eq}}) + S_2(U_2^{\mathrm{eq}}) = \max_{U_1+U_2=U} S_1(U_1) + S_2(U_2)

We can now derive the condition of equal temperature from this condition. In thermodynamics, temperature is defined by

\displaystyle{ \frac{1}{T} = \frac{\partial S}{\partial U} }

The interested reader should calculate this for our entropy functions, and in doing this, see why we identify C with the heat capacity. Now, manipulating the condition of equilibrium, we get

\max_{U_1+U_2=U} S_1(U_1) + S_2(U_2) = \max_{U_1} S_1(U_1) + S_2(U-U_1)

As a function of U_1, the right hand side of this equation must have derivative equal to 0. Thus,

\displaystyle{ \frac{\partial}{\partial U_1} (S_1(U_1) + S_2(U-U_1)) = 0 }

Now, note that if U_2 = U - U_1, then

\displaystyle{  \frac{\partial}{\partial U_1} S_2(U-U_1) = -\frac{\partial}{\partial U_2} S_2(U_2) }

Thus, the condition of equilibrium is

\displaystyle{  \frac{\partial}{\partial U_1} S_1(U_1) = \frac{\partial}{\partial U_2} S_2(U_2) }

Using the fact that

\displaystyle{ \frac{1}{T_1} = \frac{\partial}{\partial U_1} S_1(U_1) , \qquad \frac{1}{T_2} = \frac{\partial}{\partial U_2} S_2(U_2) }

the above equation reduces to

\displaystyle{ \frac{1}{T_1} = \frac{1}{T_2} }

so we have our expected condition of temperature equilibration!

The result of composing several thermostatic systems should be a new thermostatic system. In the case above, the new thermostatic system is described by a single variable: the total energy of the system U = U_1 + U_2. The entropy function of this new thermostatic system is given by the constrained supremum:

S(U) = \max_{U = U_1 + U_2} S_1(U_1) + S_2(U_2)

The reader should verify that this ends up being the same as a system with heat capacity C_1 + C_2, i.e. with entropy function given by

S(U) = (C_1 + C_2) \log(U)
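A quick numerical check of this exercise (a sketch using a plain grid search; the parameter values and tolerances are arbitrary choices of mine): maximizing C_1 \log U_1 + C_2 \log(U - U_1) puts U_1 = C_1 U/(C_1 + C_2), so the temperatures U_1/C_1 and U_2/C_2 agree, and the maximum equals (C_1 + C_2)\log U plus a constant independent of U, which is harmless since entropy is only defined up to an additive constant.

```python
import math

def composed_entropy(U, C1, C2, steps=100000):
    """S(U) = max over U1 of C1*log(U1) + C2*log(U - U1), by grid search."""
    best, best_U1 = -math.inf, None
    for i in range(1, steps):
        U1 = U * i / steps
        s = C1 * math.log(U1) + C2 * math.log(U - U1)
        if s > best:
            best, best_U1 = s, U1
    return best, best_U1

# Two tanks with heat capacities 2 and 3, total energy 10:
C1, C2, U = 2.0, 3.0, 10.0
S, U1_star = composed_entropy(U, C1, C2)
```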

A very similar argument goes through when one has two systems that can exchange both heat and volume; both temperature and pressure are equalized as a consequence of entropy maximization. We end up with a system that is parameterized by total energy and total volume, and has an entropy function that is a function of those quantities.

The general procedure is the following. Suppose that we have n thermostatic systems, (X_1,S_1),\ldots,(X_n,S_n). Let Y be a convex space that we think of as describing the quantities that are conserved when we compose the n thermostatic systems (i.e., total energy, total volume, etc.). Each value of the conserved quantities y \in Y corresponds to many different possible values for x_1 \in X_1, \ldots, x_n \in X_n. We represent this with a relation

R \subseteq X_1 \times \cdots \times X_n \times Y

We then turn Y into a thermostatic system by using the entropy function

S(y) = \max_{R(x_1,\ldots,x_n,y)} S_1(x_1) + \ldots + S_n(x_n)

It turns out that if we require R to be a convex relation (that is, a convex subspace of X_1 \times \cdots \times X_n \times Y) then S as defined above ends up being a concave function, so (Y,S) is a true thermostatic system.

We will have to wait until a later post in the series to see exactly how we describe this procedure using category theory. For now, however, I want to talk about why this procedure makes sense.

In the statistical mechanical interpretation, entropy is related to the probability of observing a specific macrostate. As we scale the system, the theory of large deviations tells us that seeing any macrostate other than the most probable macrostate is highly unlikely. Thus, we can find the macrostate that we will observe in practice by finding the entropy maxima. For an exposition of this point of view, see this paper:

• Jeffrey Commons, Ying-Jen Yang and Hong Qian, Duality symmetry, two entropy functions, and an eigenvalue problem in Gibbs’ theory.

There is also a dynamical systems interpretation of entropy, where entropy serves as a Lyapunov function for a dynamical system. This is the viewpoint taken here:

• Wassim M. Haddad, A Dynamical Systems Theory of Thermodynamics, Princeton U. Press.

In each of these viewpoints, however, the maximization of entropy is not global, but rather constrained. The dynamical system only maximizes entropy along its orbit, and the statistical mechanical system maximizes entropy with respect to constraints on the probability distribution.

We can think of thermostatics as a ‘common refinement’ of both of these points of view. We are agnostic as to the mechanism by which constrained maximization of entropy takes place and we are simply interested in investigating its consequences. We expect that a careful formalization of either system should end up deriving something similar to our thermostatic theory in the limit.


The Kepler Problem (Part 4)

27 January, 2022

The Kepler problem is the study of a particle moving in an attractive inverse square force. In classical mechanics, this problem shows up when you study the motion of a planet around the Sun in the Solar System. In quantum mechanics, it shows up when you study the motion of an electron around a proton in a hydrogen atom.

In Part 2 we saw that the classical Kepler problem has, besides energy and the three components of angular momentum, three more conserved quantities: the components of the eccentricity vector!

This was discovered long ago, in 1710, by the physicist Jakob Hermann. But thanks to Noether, we now know that in classical mechanics, conserved quantities come from symmetries. In the Kepler problem, conservation of energy comes from time translation symmetry, while conservation of the angular momentum comes from rotation symmetry. Which symmetries give conservation of the eccentricity vector?

As we shall see, these symmetries are rotations in 4-dimensional space. These include the obvious rotations in 3-dimensional space which give angular momentum. The other 4-dimensional rotations act in a much less obvious way, and give the eccentricity vector.

In fact, we’ll see that the Kepler problem can be rephrased in terms of a free particle moving around on a sphere in 4-dimensional space. This is a nice explanation of the 4-dimensional rotation symmetry.

After that we’ll see a second way to rephrase the Kepler problem: in terms of a massless, relativistic free particle moving at the speed of light on a sphere in 4-dimensional space. Our first formulation will not involve relativity. This second will.

All this is very nice. You can read some fun explanations of the first formulation here:

• Greg Egan, The ellipse and the atom.

• John Baez, Planets in the fourth dimension.

But how could you guess this 4-dimensional rotation symmetry if you didn’t know about it already? One systematic approach uses Poisson brackets. I won’t explain these, just dive in and use them!

Remember, the particle in the Kepler problem has various observables, which are all ultimately functions of its position and momentum:

• position: \vec q

• momentum: \vec p

• energy: H = \tfrac{1}{2} p^2 - \tfrac{1}{q}

• angular momentum: \vec L = \vec q \times \vec p

• the eccentricity vector: \vec e = \vec p \times \vec L - \tfrac{\vec q}{q}

I’ll use conventions where the Poisson brackets of the components of position q_k and momentum p_\ell are taken to be

\{q_k,p_\ell\} = \delta_{k\ell}

From this, using the rules for Poisson brackets, we can calculate the Poisson brackets of everything else. For starters:

\{H, L_k\} = \{H,e_k\} = 0

These equations are utterly unsurprising, since they are equivalent to saying that angular momentum \vec L and the eccentricity vector \vec e are conserved. More interestingly, we have

\begin{array}{ccl} \{L_j, L_k\} &=& \epsilon_{jk\ell} L_\ell  \\ \{e_j, L_k\} &=& \epsilon_{jk\ell} e_\ell \\ \{e_j, e_k \} &=& -2H \epsilon_{jk\ell} L_\ell \end{array}

where all the indices go from 1 to 3, I’m summing over repeated indices even if they’re both subscripts, and \epsilon_{jk\ell} are the Levi–Civita symbols.

Now, the factor of -2H above is annoying. But on the region of phase space where H < 0—that is, the space of bound states, where the particle carries out an elliptical orbit—we can define a new vector to deal with this annoyance:

\displaystyle{    \vec M = \frac{\vec e}{\sqrt{-2H}} }

Now we easily get

\begin{array}{ccl} \{L_j, L_k\} &=& \epsilon_{jk\ell} L_\ell  \\ \{L_j, M_k\} &=& \epsilon_{jk\ell} M_\ell \\ \{M_j, M_k \} &=& \epsilon_{jk\ell} L_\ell \end{array}

This is nicer, but we can simplify it even more if we introduce some new vectors that are linear combinations of \vec L and \vec M, namely half their sum and half their difference:

\vec A = \tfrac{1}{2} (\vec L + \vec M), \qquad \vec B = \tfrac{1}{2}(\vec L - \vec M)

Then we get

\begin{array}{ccl} \{ A_j, A_k\} &=&  \epsilon_{jk\ell} A_\ell \\ \{ B_j, B_k\} &=&  \epsilon_{jk\ell} B_\ell  \\ \{ A_j, B_k\} &=& 0 \end{array}

So, the observables A_j and B_k contain the same information as the angular momentum and eccentricity vectors, but now they commute with each other!

What does this mean?

Well, when you’re first learning math the Levi–Civita symbols \epsilon_{jk\ell} may seem like just a way to summarize the funny rules for cross products in 3-dimensional space. But as you proceed, you ultimately learn that \mathbb{R}^3 with its cross product is the Lie algebra of the Lie group \mathrm{SO}(3) of rotations in 3-dimensional space. From this viewpoint, the Levi–Civita symbols are nothing but the structure constants for the Lie algebra \mathfrak{so}(3): that is, a way of describing the bracket operation in this Lie algebra in terms of basis vectors.

So, what we’ve got here are two commuting copies of \mathfrak{so}(3), one having the A_j as a basis and the other having the B_k as a basis, both with the Poisson bracket as their Lie bracket.

A better way to say the same thing is that we’ve got a single 6-dimensional Lie algebra

\mathfrak{so}(3) \oplus \mathfrak{so}(3)

having both the A_j and B_k as basis. But then comes the miracle:

\mathfrak{so}(3) \oplus \mathfrak{so}(3) \cong \mathfrak{so}(4)

The easiest way to see this is to realize that S^3, the unit sphere in 4 dimensions, is itself a Lie group with Lie algebra isomorphic to \mathfrak{so}(3). Namely, it’s the unit quaternions!—or if you prefer, the Lie group \mathrm{SU}(2). Like any Lie group it acts on itself via left and right translations, which commute. But these are actually ways of rotating S^3. So, you get a map of Lie algebras from \mathfrak{so}(3) \oplus \mathfrak{so}(3) to \mathfrak{so}(4), and you can check that this is an isomorphism.
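This isomorphism is easy to verify by direct computation. The Python sketch below (my own check, with scaling conventions chosen for convenience) builds the 4×4 matrices for left and right multiplication by the imaginary quaternions on \mathbb{R}^4 = \{a + bi + cj + dk\} and confirms that, suitably scaled, they give two commuting copies of \mathfrak{so}(3) consisting of antisymmetric matrices, i.e. elements of \mathfrak{so}(4):

```python
# 4x4 matrices (lists of rows) for quaternion left/right multiplication
# on the basis (1, i, j, k).

def matmul(P, Q):
    return [[sum(P[r][m] * Q[m][c] for m in range(4)) for c in range(4)]
            for r in range(4)]

def commutator(P, Q):
    PQ, QP = matmul(P, Q), matmul(Q, P)
    return [[PQ[r][c] - QP[r][c] for c in range(4)] for r in range(4)]

def scale(s, P):
    return [[s * x for x in row] for row in P]

# Left multiplication by i, j, k (columns are i*1, i*i, i*j, i*k, etc.):
Li = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]]
Lj = [[0, 0, -1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, -1, 0, 0]]
Lk = [[0, 0, 0, -1], [0, 0, -1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]

# Right multiplication by i, j, k (columns are 1*i, i*i, j*i, k*i, etc.):
Ri = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, -1, 0]]
Rj = [[0, 0, -1, 0], [0, 0, 0, -1], [1, 0, 0, 0], [0, 1, 0, 0]]
Rk = [[0, 0, 0, -1], [0, 0, 1, 0], [0, -1, 0, 0], [1, 0, 0, 0]]

# Two commuting copies of so(3) inside so(4):
A = [scale(0.5, M) for M in (Li, Lj, Lk)]
B = [scale(-0.5, M) for M in (Ri, Rj, Rk)]
```

The commutation [L_a, R_b] = 0 is just associativity of quaternion multiplication: a(xb) = (ax)b.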

So in this approach, the 4th dimension pops out of the fact that the Kepler problem has conserved quantities that give two commuting copies of \mathfrak{so}(3). By Noether’s theorem, it follows that conservation of angular momentum and the eccentricity vector must come from a hidden symmetry: symmetry under some group whose Lie algebra is \mathfrak{so}(4).

And indeed, it turns out that the group \mathrm{SO}(4) acts on the bound states of the Kepler problem in a way that commutes with time evolution!

But how can we understand this fact?

Historically, it seems that the first explanation was found in the quantum-mechanical context. In 1926, even before Schrödinger came up with his famous equation, Pauli used conservation of angular momentum and the eccentricity vector to determine the spectrum of hydrogen. But I believe he was using what we now call Lie algebra methods, not bringing in the group \mathrm{SO}(4).

In 1935, Vladimir Fock, famous for the ‘Fock space’ in quantum field theory, explained this 4-dimensional rotation symmetry by setting up an equivalence between hydrogen atom bound states and functions on the 3-sphere! In the following year, Valentine Bargmann, later famous for being Einstein’s assistant, connected Pauli and Fock’s work using group representation theory.

All this is quantum mechanics. It seems the first global discussion of this symmetry in the classical context was given by Bacry, Ruegg, and Souriau in 1966, leading to important work by Souriau and Moser in the early 1970s. Since then, much more has been done. You can learn about a lot of it from these two books, which are my constant companions these days:

• Victor Guillemin and Shlomo Sternberg, Variations on a Theme by Kepler, Providence, R.I., American Mathematical Society, 1990.

• Bruno Cordani, The Kepler Problem: Group Theoretical Aspects, Regularization and Quantization, with Application to the Study of Perturbations, Birkhäuser, Boston, 2002.

But let me try to summarize a bit of this material.

One way to understand the \mathrm{SO}(4) symmetry for bound states of the Kepler problem is the result of Hamilton that I explained last time: for a particle moving around an elliptical orbit in the Kepler problem, its momentum moves round and round in a circle.

I’ll call these circles Hamilton’s circles. Hamilton’s circles are not arbitrary circles in \mathbb{R}^3. Using the inverse of stereographic projection, we can map \mathbb{R}^3 to the unit 3-sphere:

\begin{array}{rccl} f \colon &\mathbb{R}^3 &\to & S^3    \subset \mathbb{R}^4  \\  \\ & \vec p    &\mapsto & \displaystyle{\left(\frac{p^2 - 1}{p^2 +1}, \frac{2 \vec p}{p^2 + 1}\right).} \end{array}

This map sends Hamilton’s circles in \mathbb{R}^3 to great circles in S^3. Furthermore, this construction gives all the great circles in S^3 except those that go through the north and south poles, (\pm 1, 0,0,0). These missing great circles correspond to periodic orbits in the Kepler problem where a particle starts with momentum zero, falls straight to the origin, and bounces back the way it came. If we include these degenerate orbits, every great circle on the unit 3-sphere is the path traced out by the momentum in some solution of the Kepler problem.

Let me reemphasize: in this picture, points of S^3 correspond not to positions but to momenta in the Kepler problem. As time passes, these points move along great circles in S^3... but not at constant speed.

How is their dynamics related to geodesic motion on the 3-sphere?
We can understand this as follows. In Part 2 we saw that

L^2 + M^2 =  - \frac{1}{2H}

and using the fact that \vec L \cdot \vec M = 0, an easy calculation gives

H \; = \; -\frac{1}{8A^2} \; = \; -\frac{1}{8B^2}

In the 3-sphere picture, the observables A_j become functions on the cotangent bundle T^\ast S^3. These functions are just the components of momentum for a particle on S^3, defined using a standard basis of right-invariant vector fields on S^3 \cong \mathrm{SU}(2). Similarly, the observables B_j are the components of momentum using a standard basis of left-invariant vector fields. It follows that

K = 8A^2 = 8B^2

is the Hamiltonian for a nonrelativistic free particle on S^3 with an appropriately chosen mass. Such a particle moves around a great circle on S^3 at constant speed. Since the Kepler Hamiltonian H is a function of K, particles governed by this Hamiltonian move along the same trajectories—but typically not at constant speed!

Both K and the Kepler Hamiltonian H = -1/K are well-defined smooth functions on the symplectic manifold that Souriau dubbed the Kepler manifold:

T^+ S^3 = \{ (x,p) : \; x \in S^3, \, p \in T^\ast_x S^3, \, p \ne 0 \}

This is the cotangent bundle of the 3-sphere with the zero cotangent vectors removed, so that H = -1/K is well-defined.

All this is great. But even better, there’s yet another picture of what’s going on, which brings relativity into the game!

We can also think of T^+ S^3 as a space of null geodesics in the Einstein universe: the manifold \mathbb{R} \times S^3 with the Lorentzian metric

dt^2 - ds^2

where dt^2 is the usual Riemannian metric on the real line (‘time’) and ds^2 is the usual metric on the unit sphere (‘space’). In this picture x \in S^3 describes the geodesic’s position at time zero, while the null cotangent vector p + \|p\| dt describes its 4-momentum at time zero. Beware: in this picture two geodesics count as distinct if we rescale p by any positive factor other than 1. But this is good: physically, it reflects the fact that in relativity, massless particles can have different 4-momentum even if they trace out the same path in spacetime.

In short, the Kepler manifold T^+ S^3 also serves as the classical phase space for a free massless spin-0 particle in the Einstein universe!

And here’s the cool part: the Hamiltonian for such a particle is

\sqrt{K} = \sqrt{-1/H}

So it’s a function of both the Hamiltonians we’ve seen before. Thus, time evolution given by this Hamiltonian carries particles around great circles on the 3-sphere… at constant speed, but at a different speed than the nonrelativistic free particle described by the Hamiltonian K.

In future episodes, I want to quantize this whole story. We’ll get some interesting perspectives on the quantum mechanics of the hydrogen atom.

The Kepler Problem (Part 3)

23 January, 2022

The Kepler problem studies a particle moving in an inverse square force, like a planet orbiting the Sun. Last time I talked about an extra conserved quantity associated to this problem, which keeps elliptical orbits from precessing or changing shape. This extra conserved quantity is sometimes called the Laplace–Runge–Lenz vector, but since it was first discovered by none of these people, I prefer to call it the ‘eccentricity vector’.

In 1847, Hamilton noticed a fascinating consequence of this extra conservation law. For a particle moving in an inverse square force, its momentum moves along a circle!

Greg Egan has given a beautiful geometrical argument for this fact:

• Greg Egan, The ellipse and the atom.

I will not try to outdo him; instead I’ll follow a more dry, calculational approach. One reason is that I’m trying to amass a little arsenal of formulas connected to the Kepler problem.

Let’s dive in. Remember from last time: we’re studying a particle whose position \vec q obeys

\ddot{\vec q} = - \frac{\vec q}{q^3}

Its momentum is

\vec p = \dot{\vec q}

Its momentum is not conserved. Its conserved quantities are energy:

H = \frac{1}{2} p^2 - \frac{1}{q}

the angular momentum vector:

\vec L = \vec q \times \vec p

and the eccentricity vector:

\vec e = \vec p \times \vec L - \frac{\vec q}{q}

Now for the cool part: we can show that

\displaystyle{ \left( \vec p - \frac{\vec L \times \vec e}{L^2} \right)^2 = \frac{1}{L^2} }

Thus, the momentum \vec p stays on a circle of radius 1/L centered at the point (\vec L \times \vec e)/L^2. And since \vec L and \vec e are conserved, this circle doesn’t change! Let’s call it Hamilton’s circle.

Now let’s actually do the calculations needed to show that the momentum stays on Hamilton’s circle. Since

\vec e = \vec p \times \vec L - \frac{\vec q}{q}

we have

\frac{\vec q}{q} = \vec p \times \vec L - \vec e

Taking the dot product of this vector with itself, which is 1, we get

\begin{array}{ccl}  1 &=& \frac{\vec q}{q} \cdot \frac{\vec q}{q}  \\ \\  &=& (\vec p \times \vec L - \vec e) \cdot (\vec p \times \vec L - \vec e) \\ \\  &=& (\vec p \times \vec L)^2 - 2 \vec e \cdot (\vec p \times \vec L) + e^2  \end{array}

Now, notice that \vec p and \vec L are orthogonal since \vec L = \vec q \times \vec p. Thus

(\vec p \times \vec L)^2 = p^2 L^2

I actually used this fact and explained it in more detail last time. Substituting this in, we get

1 = p^2 L^2 - 2 \vec e \cdot (\vec p \times \vec L) + e^2

Similarly, \vec e and \vec L are orthogonal! After all,

\vec e = \vec p \times \vec L - \frac{\vec q}{q}

The first term is orthogonal to \vec L since it’s the cross product of \vec L and some other vector. And the second term is orthogonal to \vec L since \vec L is the cross product of \vec q and some other vector! So, we have

(\vec L \times \vec e)^2 = L^2 e^2

and thus

\displaystyle { e^2 = \frac{(\vec L \times \vec e)^2}{L^2} }

Substituting this in, we get

\displaystyle { 1 = p^2 L^2 - 2 \vec e \cdot (\vec p \times \vec L) + \frac{(\vec L \times \vec e)^2}{L^2} }

Using the cyclic property of the scalar triple product, we can rewrite this as

\displaystyle { 1 = p^2 L^2 - 2 \vec p \cdot (\vec L \times \vec e) + \frac{(\vec L \times \vec e)^2}{L^2} }

This is nicer because it involves \vec L \times \vec e in two places. If we divide both sides by L^2 we get

\displaystyle { \frac{1}{L^2} = p^2 - \frac{2}{L^2} \; \vec p \cdot (\vec L \times \vec e) + \frac{(\vec L \times \vec e)^2}{L^4} }

And now for the final flourish! The right hand is the dot product of a vector with itself:

\displaystyle { \frac{1}{L^2} = \left(\vec p -  \frac{\vec L \times \vec e}{L^2}\right)^2 }

This is the equation for Hamilton’s circle!
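It’s fun to watch this happen numerically. Here is a Python sketch (my own; step sizes, initial conditions, and tolerances are arbitrary choices) that integrates \ddot{\vec q} = -\vec q/q^3 with a Runge–Kutta step and checks that the momentum stays on the circle of radius 1/L centered at (\vec L \times \vec e)/L^2:

```python
import math

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def norm(u):
    return math.sqrt(sum(x * x for x in u))

def eccentricity(q, p):
    """e = p x L - q/|q|, where L = q x p."""
    pxL = cross(p, cross(q, p))
    qn = norm(q)
    return [a - b / qn for a, b in zip(pxL, q)]

def accel(q):
    r3 = norm(q) ** 3
    return [-x / r3 for x in q]

def rk4_step(q, p, dt):
    """One 4th-order Runge-Kutta step for dq/dt = p, dp/dt = -q/q^3."""
    k1q, k1p = p, accel(q)
    q2 = [x + 0.5 * dt * k for x, k in zip(q, k1q)]
    p2 = [x + 0.5 * dt * k for x, k in zip(p, k1p)]
    k2q, k2p = p2, accel(q2)
    q3 = [x + 0.5 * dt * k for x, k in zip(q, k2q)]
    p3 = [x + 0.5 * dt * k for x, k in zip(p, k2p)]
    k3q, k3p = p3, accel(q3)
    q4 = [x + dt * k for x, k in zip(q, k3q)]
    p4 = [x + dt * k for x, k in zip(p, k3p)]
    k4q, k4p = p4, accel(q4)
    q = [x + dt * (a + 2*b + 2*c + d) / 6 for x, a, b, c, d in zip(q, k1q, k2q, k3q, k4q)]
    p = [x + dt * (a + 2*b + 2*c + d) / 6 for x, a, b, c, d in zip(p, k1p, k2p, k3p, k4p)]
    return q, p

# An elliptical orbit: at t = 0, L = (0, 0, 1.2) and e = (0.44, 0, 0).
q, p = [1.0, 0.0, 0.0], [0.0, 1.2, 0.0]
L = cross(q, p)
e = eccentricity(q, p)
center = [c / norm(L) ** 2 for c in cross(L, e)]   # (L x e)/L^2
radius = 1.0 / norm(L)
momenta = []
for _ in range(4000):
    q, p = rk4_step(q, p, 0.0025)
    momenta.append(p)
```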

Now, beware: the momentum \vec p doesn’t usually move at a constant rate along Hamilton’s circle, since that would force the particle’s orbit to itself be circular.

But on the bright side, the momentum moves along Hamilton’s circle regardless of whether the particle’s orbit is elliptical, parabolic or hyperbolic. And we can easily distinguish the three cases using Hamilton’s circle!

After all, the center of Hamilton’s circle is the point (\vec L \times \vec e)/L^2, and

(\vec L \times \vec e)^2 = L^2 e^2

so the distance of this center from the origin is

\displaystyle{ \sqrt{\frac{(\vec L \times \vec e)^2}{L^4}} = \sqrt{\frac{L^2 e^2}{L^4}} = \frac{e}{L} }

On the other hand, the radius of Hamilton’s circle is 1/L. So his circle encloses the origin, goes through the origin or does not enclose the origin depending on whether e < 1, e = 1 or e > 1. But we saw last time that these three cases correspond to elliptical, parabolic and hyperbolic orbits!


• If e < 1 the particle’s orbit is an ellipse and the origin lies inside Hamilton’s circle. The momentum goes round and round Hamilton’s circle as time passes.

• If e = 1 the particle’s orbit is a parabola and the origin lies exactly on Hamilton’s circle. The particle’s momentum approaches zero as time approaches \pm \infty, so its momentum goes around Hamilton’s circle exactly once as time passes.

• If e > 1 the particle’s orbit is a hyperbola and the origin lies outside Hamilton’s circle. The particle’s momentum approaches distinct nonzero values as time approaches \pm \infty, so its momentum goes around just a portion of Hamilton’s circle.

By the way, in general the curve traced out by the momentum vector of a particle is called a hodograph. So you can learn more about Hamilton’s circle with the help of that buzzword.

The Color of Infinite Temperature

16 January, 2022


This is the color of something infinitely hot. Of course you’d instantly be fried by gamma rays of arbitrarily high frequency, but this would be its spectrum in the visible range.

This is also the color of a typical neutron star. They’re so hot they look the same.

It’s also the color of the early Universe!

This was worked out by David Madore.

As a blackbody gets hotter and hotter, its spectrum approaches the classical Rayleigh–Jeans law. That is, its true spectrum as given by the Planck law approaches the classical prediction over a larger and larger range of frequencies.

So, for an extremely hot blackbody, the spectrum of light we can actually see with our eyes is governed by the Rayleigh–Jeans law. This law says the color doesn’t depend on the temperature: only the brightness does!

And this color is shown above.

This involves human perception, not just straight physics. So David Madore needed to work out the response of the human eye to the Rayleigh–Jeans spectrum — “by integrating the spectrum against the CIE XYZ matching functions and using the definition of the sRGB color space.”

The color he got is sRGB(148,177,255). And according to the experts who sip latte all day and make up names for colors, this color is called ‘Perano’.

Here is some background material Madore wrote on colors and visual perception. It doesn’t include the whole calculation that leads to this particular color, so somebody should check it, but it should help you understand how to convert the blackbody spectrum at a particular temperature into an sRGB color:

• David Madore, Colors and colorimetry.

In the comments you can see that Thomas Mansencal redid the calculation and got a slightly different color: sRGB(154,181,255). It looks quite similar to me:

The Kepler Problem (Part 2)

31 December, 2021

I’m working on a math project involving the periodic table of elements and the Kepler problem—that is, the problem of a particle moving in an inverse square force law. That’s one reason I’ve been blogging about chemistry lately! I hope to tell you all about this project sometime—but right now I just want to say some very basic stuff about the ‘eccentricity vector’.

This vector is a conserved quantity for the Kepler problem. It was named the ‘Runge–Lenz vector’ after Lenz used it in 1924 to study the hydrogen atom in the framework of the ‘old quantum mechanics’ of Bohr and Sommerfeld: Lenz cited Runge’s popular German textbook on vector analysis from 1919, which explains this vector. But Runge never claimed any originality: he attributed this vector to Gibbs, who wrote about it in his book on vector analysis in 1901!

Nowadays many people call it the ‘Laplace–Runge–Lenz vector’, honoring Laplace’s discussion of it in his famous treatise on celestial mechanics in 1799. But in fact this vector goes back at least to Jakob Hermann, who wrote about it in 1710, triggering further work on this topic by Johann Bernoulli in the same year.

Nobody has seen signs of this vector in work before Hermann. So, we might call it the Hermann–Bernoulli–Laplace–Gibbs–Runge–Lenz vector, or just the Hermann vector. But I prefer to call it the eccentricity vector, because for a particle moving in an inverse square force law, its magnitude is the eccentricity of the particle’s orbit!

Let’s suppose we have a particle whose position \vec q \in \mathbb{R}^3 obeys this version of the inverse square force law:

\ddot{\vec q} = - \frac{\vec q}{q^3}

where I remove the arrow from a vector when I want to talk about its magnitude. So, I’m setting the mass of this particle equal to 1, along with the constant determining the strength of the force. That’s because I want to keep the formulas clean! With these conventions, the momentum of the particle is

\vec p = \dot{\vec q}

For this system it’s well-known that the following energy is conserved:

H = \frac{1}{2} p^2 - \frac{1}{q}

as well as the angular momentum vector:

\vec L = \vec q \times \vec p

But the interesting thing for me today is the eccentricity vector:

\vec e = \vec p \times \vec L - \frac{\vec q}{q}

Let’s check that it’s conserved! Taking its time derivative,

\dot{\vec e} = \dot{\vec p} \times \vec L + \vec p \times \dot{\vec L} - \frac{\vec p}{q} + \frac{\dot q}{q^2} \,\vec q

But angular momentum is conserved so the second term vanishes, and

\dot q = \frac{d}{dt} \sqrt{\vec q \cdot \vec q} =  \frac{\vec p \cdot \vec q}{\sqrt{\vec q \cdot \vec q}} = \frac{\vec p \cdot \vec q}{q}

so we get

\dot{\vec e} = \dot{\vec p} \times \vec L - \frac{\vec p}{q} +  \frac{\vec p \cdot \vec q}{q^3}\, \vec q

But the inverse square force law says

\dot{\vec p} = - \frac{\vec q}{q^3}

so

\dot{\vec e} = - \frac{1}{q^3} \, \vec q \times \vec L - \frac{\vec p}{q} +  \frac{\vec p \cdot \vec q}{q^3}\, \vec q

How can we see that this vanishes? Mind you, there are various geometrical ways to think about this, but today I’m in the mood for checking that my skills in vector algebra are sufficient for a brute-force proof—and I want to record this proof so I can see it later!

To get anywhere we need to deal with the cross product in the above formula:

\vec q \times \vec L = \vec q \times (\vec q \times \vec p)

There’s a nice identity for the vector triple product:

\vec a \times (\vec b \times \vec c) = (\vec a \cdot \vec c) \vec b - (\vec a \cdot \vec b) \vec c

I could have fun talking about why this is true, but I won’t now! I’ll just use it:

\vec q \times \vec L = \vec q \times (\vec q \times \vec p) = (\vec q \cdot \vec p) \vec q - q^2 \, \vec p

and plug this into our formula

\dot{\vec e} = - \frac{1}{q^3} \, \vec q \times \vec L - \frac{\vec p}{q} +  \frac{\vec p \cdot \vec q}{q^3}\, \vec q

getting

\dot{\vec e} = -\frac{1}{q^3} \Big((\vec q \cdot \vec p) \vec q - q^2 \vec p \Big) - \frac{\vec p}{q} +  \frac{\vec p \cdot \vec q}{q^3}\, \vec q

But look—everything cancels! So

\dot{\vec e} = 0

and the eccentricity vector is conserved!
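Since the proof uses only vector algebra and the equation of motion, it’s satisfying to watch the conservation law hold numerically too. Here’s a small Python sketch (my addition, not part of the original post) that integrates the inverse square force law with a standard fourth-order Runge–Kutta step and checks that \vec e barely drifts:

```python
import numpy as np

def accel(q):
    # the inverse square force law, with mass and force constant set to 1
    return -q / np.linalg.norm(q) ** 3

def rk4_step(q, p, dt):
    # one classic Runge-Kutta step for the system dq/dt = p, dp/dt = accel(q)
    k1q, k1p = p, accel(q)
    k2q, k2p = p + 0.5 * dt * k1p, accel(q + 0.5 * dt * k1q)
    k3q, k3p = p + 0.5 * dt * k2p, accel(q + 0.5 * dt * k2q)
    k4q, k4p = p + dt * k3p, accel(q + dt * k3q)
    return (q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6,
            p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6)

def ecc(q, p):
    # the eccentricity vector e = p x L - q/|q|
    L = np.cross(q, p)
    return np.cross(p, L) - q / np.linalg.norm(q)

# an elliptical orbit: H = 0.72 - 1 < 0
q, p = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.2, 0.0])
e0 = ecc(q, p)
for _ in range(20000):        # integrate for 20 time units with dt = 0.001
    q, p = rk4_step(q, p, 1e-3)
drift = np.linalg.norm(ecc(q, p) - e0)
print(drift)  # tiny: numerical confirmation that e is conserved
```

With these initial conditions \vec e starts out as (0.44, 0, 0), pointing along the major axis toward perihelion, and it stays there to better than one part in a million over the whole integration.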

So, it seems that the inverse square force law has 7 conserved quantities: the energy H, the 3 components of the angular momentum \vec L, and the 3 components of the eccentricity vector \vec e. But they can’t all be independent, since the particle only has 6 degrees of freedom: 3 for position and 3 for momentum. There can be at most 5 independent conserved quantities, since something has to change. So there have to be at least two relations between the conserved quantities we’ve found.

The first of these relations is pretty obvious: \vec e and \vec L are at right angles, so

\vec e \cdot \vec L = 0

But wait, why are they at right angles? Because

\vec e = \vec p \times \vec L - \frac{\vec q}{q}

The first term is orthogonal to \vec L because it’s a cross product of \vec p and \vec L; the second is orthogonal to \vec L because \vec L is a cross product of \vec q and \vec p.
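Both the triple product identity and this orthogonality hold for every \vec q and \vec p whatsoever, so they’re easy to sanity-check numerically. A quick Python check (my addition, using NumPy):

```python
import numpy as np

rng = np.random.default_rng(42)
max_triple = 0.0  # worst error in a x (b x c) = (a.c) b - (a.b) c
max_dot = 0.0     # worst value of |e . L|
for _ in range(100):
    q, p = rng.normal(size=3), rng.normal(size=3)
    L = np.cross(q, p)
    e = np.cross(p, L) - q / np.linalg.norm(q)
    # vector triple product identity, with a = b = q and c = p
    triple_err = np.linalg.norm(
        np.cross(q, np.cross(q, p)) - ((q @ p) * q - (q @ q) * p))
    max_triple = max(max_triple, triple_err)
    # e is orthogonal to L for arbitrary q and p
    max_dot = max(max_dot, abs(e @ L))
print(max_triple, max_dot)  # both at the level of floating point roundoff
```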

The second relation is a lot less obvious, but also more interesting. Let’s take the dot product of \vec e with itself:

e^2 = \left(\vec p \times \vec L - \frac{\vec q}{q}\right) \cdot \left(\vec p \times \vec L - \frac{\vec q}{q}\right)

or in other words,

e^2 = (\vec p \times \vec L) \cdot (\vec p \times \vec L) - \frac{2}{q} \vec q \cdot (\vec p \times \vec L) + 1

But remember this nice cross product identity:

(\vec a \times \vec b) \cdot (\vec a \times \vec b) + (\vec a \cdot \vec b)^2 = a^2 b^2

Since \vec p and \vec L are at right angles, this gives

(\vec p \times \vec L) \cdot (\vec p \times \vec L) = p^2 L^2

so

e^2 = p^2 L^2 - \frac{2}{q} \vec q \cdot (\vec p \times \vec L) + 1

Then we can use the cyclic identity for the scalar triple product:

\vec a \cdot (\vec b \times \vec c) = \vec c \cdot (\vec a \times \vec b)

to rewrite this as

e^2 = p^2 L^2 - \frac{2}{q} \vec L \cdot (\vec q \times \vec p) + 1

or simply

e^2 = p^2 L^2 - \frac{2}{q} L^2 + 1

or even better,

e^2 = 2 \left(\frac{1}{2} p^2 - \frac{1}{q}\right) L^2 + 1

But this means that

e^2 = 2HL^2 + 1

which is our second relation between conserved quantities for the Kepler problem!

This relation makes a lot of sense if you know that e is the eccentricity of the orbit. Then it implies:

• if H > 0 then e > 1 and the orbit is a hyperbola.

• if H = 0 then e = 1 and the orbit is a parabola.

• if H < 0 then 0 < e < 1 and the orbit is an ellipse (or circle).
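Notice that the derivation of e^2 = 2HL^2 + 1 used only vector algebra, never the equation of motion, so it’s an identity holding for every \vec q and \vec p, not just along solutions. A quick numerical check in Python (my addition):

```python
import numpy as np

rng = np.random.default_rng(0)
worst = 0.0  # worst error in e^2 = 2 H L^2 + 1 over random phase space points
for _ in range(1000):
    q, p = rng.normal(size=3), rng.normal(size=3)
    L = np.cross(q, p)
    e = np.cross(p, L) - q / np.linalg.norm(q)
    H = 0.5 * (p @ p) - 1 / np.linalg.norm(q)
    worst = max(worst, abs(e @ e - (2 * H * (L @ L) + 1)))
print(worst)  # roundoff-sized: the relation is a pure identity
```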

But why is e the eccentricity? And why does the particle move in a hyperbola, parabola or ellipse in the first place? We can show both of these things by taking the dot product of \vec q and \vec e:

\begin{array}{ccl}   \vec q \cdot \vec e &=&   \vec q \cdot \left(\vec p \times \vec L - \frac{\vec q}{q} \right)  \\ \\    &=& \vec q \cdot (\vec p \times \vec L) - q   \end{array}

Using the cyclic property of the scalar triple product we can rewrite this as

\begin{array}{ccl}   \vec q \cdot \vec e &=&   \vec L \cdot (\vec q \times \vec p) - q  \\ \\  &=& L^2 - q  \end{array}

Now, we know that \vec q moves in the plane orthogonal to \vec L. In this plane, which contains the vector \vec e, the equation \vec q \cdot \vec e = L^2 - q defines a conic of eccentricity e. I won’t show this from scratch, but it may seem more familiar if we rotate the whole situation so this plane is the xy plane and \vec e points in the x direction. Then in polar coordinates this equation says

er \cos \theta = L^2 - r

or

r = \frac{L^2}{1 + e \cos \theta}

This is well-known, at least among students of physics who have solved the Kepler problem, to be the equation of a conic of eccentricity e.
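The identity \vec q \cdot \vec e = L^2 - q, and its polar form above, can likewise be checked numerically. Here’s a Python sketch (my addition), using \cos\theta = \vec q \cdot \vec e / (q e):

```python
import numpy as np

rng = np.random.default_rng(1)
worst_dot = 0.0    # worst error in q . e = L^2 - q
worst_polar = 0.0  # worst error in r (1 + e cos theta) = L^2
for _ in range(1000):
    q, p = rng.normal(size=3), rng.normal(size=3)
    r = np.linalg.norm(q)
    L = np.cross(q, p)
    e = np.cross(p, L) - q / r
    worst_dot = max(worst_dot, abs(q @ e - (L @ L - r)))
    # theta is the angle between q and e, measured from the direction of e
    emag = np.linalg.norm(e)
    cos_theta = (q @ e) / (r * emag)
    worst_polar = max(worst_polar, abs(r * (1 + emag * cos_theta) - L @ L))
print(worst_dot, worst_polar)
```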

Another thing that’s good to do is define a rescaled eccentricity vector. In the case of elliptical orbits, where H < 0, we define this by

\vec M = \frac{\vec e}{\sqrt{-2H}}

Then we can take our relation

e^2 = 2HL^2 + 1

and rewrite it as

1 = e^2 - 2H L^2

and then divide by -2H getting

- \frac{1}{2H} = \frac{e^2}{-2H} + L^2

or

- \frac{1}{2H} = L^2 + M^2

This suggests an interesting similarity between \vec L and \vec M, which turns out to be very important in a deeper understanding of the Kepler problem. And with more work, you can use this idea to show that -1/4H is the Hamiltonian for a free particle on the 3-sphere. But more about that some other time, I hope!
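Here’s one last numerical check of -1/(2H) = L^2 + M^2 (my addition, in Python); since \vec M is only defined for bound states, the sketch discards samples with H \ge 0:

```python
import numpy as np

rng = np.random.default_rng(2)
worst, checked = 0.0, 0
while checked < 100:
    # scale down p to make bound states (H < 0) common
    q, p = rng.normal(size=3), 0.5 * rng.normal(size=3)
    H = 0.5 * (p @ p) - 1 / np.linalg.norm(q)
    if H >= 0:
        continue  # M is only defined on bound states, where H < 0
    L = np.cross(q, p)
    e = np.cross(p, L) - q / np.linalg.norm(q)
    M = e / np.sqrt(-2 * H)   # the rescaled eccentricity vector
    worst = max(worst, abs(L @ L + M @ M - (-1 / (2 * H))))
    checked += 1
print(worst)  # roundoff-sized, as the relation predicts
```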

For now, you might try this:

• Wikipedia, Laplace–Runge–Lenz vector.

and of course this:

The Kepler problem (part 1).