Categorical Semantics of Entropy

19 April, 2022

There will be a workshop on the categorical semantics of entropy at the CUNY Grad Center in Manhattan on Friday May 13th, organized by John Terilla. I was kindly invited to give an online tutorial beforehand on May 11, which I will give remotely to save carbon. Tai-Danae Bradley will also be giving a tutorial that day in person:

Tutorial: Categorical Semantics of Entropy, Wednesday 11 May 2022, 13:00–16:30 Eastern Time, Room 5209 at the CUNY Graduate Center and via Zoom. Organized by John Terilla. To attend, register here.

12:00-1:00 Eastern Daylight Time — Lunch in Room 5209.

1:00-2:30 — Shannon entropy from category theory, John Baez, University of California Riverside; Centre for Quantum Technologies (Singapore); Topos Institute.

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.

2:30-3:00 — Coffee break.

3:00-4:30 — Operads and entropy, Tai-Danae Bradley, The Master’s University; Sandbox AQ.

This talk will open with a basic introduction to operads and their representations, with the main example being the operad of probabilities. I’ll then give a light sketch of how this framework leads to a small, but interesting, connection between information theory, abstract algebra, and topology, namely a correspondence between Shannon entropy and derivations of the operad of probabilities.

Symposium on Categorical Semantics of Entropy, Friday 13 May 2022, 9:30-3:15 Eastern Daylight Time, Room 5209 at the CUNY Graduate Center and via Zoom. Organized by John Terilla. To attend, register here.

9:30-10:00 Eastern Daylight Time — Coffee and pastries in Room 5209.

10:00-10:45 — Operadic composition of thermodynamic systems, Owen Lynch, Utrecht University.

The maximum entropy principle is a fascinating and productive lens with which to view both thermodynamics and statistical mechanics. In this talk, we present a categorification of the maximum entropy principle, using convex spaces and operads. Along the way, we will discuss a variety of examples of the maximum entropy principle and show how each application can be captured using our framework. This approach shines a new light on old constructions. For instance, we will show how we can derive the canonical ensemble by attaching a probabilistic system to a heat bath. Finally, our approach to this categorification has applications beyond the maximum entropy principle, and we will give an hint of how to adapt this categorification to the formalization of the composition of other systems.

11:00-11:45 — Polynomial functors and Shannon entropy, David Spivak, MIT and the Topos Institute.

The category Poly of polynomial functors in one variable is extremely rich, brimming with categorical gadgets (e.g. eight monoidal products, two closures, limits, colimits, etc.) and applications including dynamical systems, databases, open games, and cellular automata. In this talk I’ll show that objects in Poly can be understood as empirical distributions. In part using the standard derivative of polynomials, we obtain a functor to Set × Setop which encodes an invariant of a distribution as a pair of sets. This invariant is well-behaved in the sense that it is a distributive monoidal functor: it acts on both distributions and maps between them, and it preserves both the sum and the tensor product of distributions. The Shannon entropy of the original distribution is then calculated directly from the invariant, i.e. only in terms of the cardinalities of these two sets. Given the many applications of polynomial functors and of Shannon entropy, having this link between them has potential to create useful synergies, e.g. to notions of entropic causality or entropic learning in dynamical systems.

12:00-1:30 — Lunch in Room 5209

1:30-2:15 — Higher entropy, Tom Mainiero, Rutgers New High Energy Theory Center.

Is the frowzy state of your desk no longer as thrilling as it once was? Are numerical measures of information no longer able to satisfy your needs? There is a cure! In this talk we’ll learn about: the secret topological lives of multipartite measures and quantum states; how a homological probe of this geometry reveals correlated random variables; the sly decategorified involvement of Shannon, Tsallis, Réyni, and von Neumann in this larger geometric conspiracy; and the story of how Gelfand, Neumark, and Segal’s construction of von Neumann algebra representations can help us uncover this informatic ruse. So come to this talk, spice up your entropic life, and bring new meaning to your relationship with disarray.

2:30-3:15 — On characterizing classical and quantum entropy, Arthur Parzygnat, Institut des Hautes Études Scientifiques.

In 2011, Baez, Fritz, and Leinster proved that the Shannon entropy can be characterized as a functor by a few simple postulates. In 2014, Baez and Fritz extended this theorem to provide a Bayesian characterization of the classical relative entropy, also known as the Kullback–Leibler divergence. In 2017, Gagné and Panangaden extended the latter result to include standard Borel spaces. In 2020, I generalized the first result on Shannon entropy so that it includes the von Neumann (quantum) entropy. In 2021, I provided partial results indicating that the Umegaki relative entropy may also have a Bayesian characterization. My results in the quantum setting are special applications of the recent theory of quantum Bayesian inference, which is a non-commutative extension of classical Bayesian statistics based on category theory. In this talk, I will give an overview of these developments and their possible applications in quantum information theory.

Wine and cheese reception to follow, Room 5209.

Compositional Thermostatics (Part 4)

8 March, 2022

guest post by Owen Lynch

This is the fourth and final part of a blog series on this paper:

• John Baez, Owen Lynch and Joe Moeller, Compositional thermostatics.

In Part 1, we went over our definition of thermostatic system: it’s a convex space X of states and a concave function S \colon X \to [-\infty, \infty] saying the entropy of each state. We also gave examples of thermostatic systems.

In Part 2, we talked about what it means to compose thermostatic systems. It amounts to constrained maximization of the total entropy.

In Part 3 we laid down a categorical framework for composing systems when there are choices that have to be made for how the systems are composed. This framework has been around for a long time: operads and operad algebras.

In this post we will bring together all of these parts in a big synthesis to create an operad of all the ways of composing thermostatic systems, along with an operad algebra of thermostatic systems!

Recall that in order to compose thermostatic systems (X_1, S_1), \ldots, (X_n, S_n), we need to use a ‘parameterized constraint’, a convex subset

R \subseteq X_1 \times \cdots \times X_n \times Y,

where Y is some other convex set. We end up with a thermostatic system on Y, with S \colon Y \to [-\infty,\infty] defined by

S(y) = \sup_{(x_1,\ldots,x_n,y) \in R} S_1(x_1) + \cdots + S_n(x_n)

In order to model this using operads and operad algebras, we will make an operad \mathcal{CR} which has convex sets as its types, and convex relations as its morphisms. Then we will make an operad algebra that assigns to any convex set X the set of concave functions

S \colon X \to [-\infty,\infty]

This operad algebra will describe how, given a relation R \subseteq X_1 \times \cdots \times X_n \times Y, we can ‘push forward’ entropy functions on X_1,\ldots,X_n to form an entropy function on Y.

The operad \mathcal{CR} is built using a construction from Part 3 that takes a symmetric monoidal category and produces an operad. The symmetric monoidal category that we start with is \mathsf{ConvRel}, which has convex sets as its objects and convex relations as its morphisms. This symmetric monoidal category has \mathsf{Conv} (the category of convex sets and convex-linear functions) as a subcategory with all the same objects, and \mathsf{ConvRel} inherits a symmetric monoidal structure from the bigger category \mathsf{Conv}.

Following the construction from Part 3, we see that we get an operad

\mathcal{CR} = \mathrm{Op}(\mathsf{ConvRel})

exactly as described before: namely it has convex sets as types, and

\mathcal{CR}(X_1,\ldots,X_n;Y) = \mathsf{ConvRel}(X_1 \times \cdots \times X_n, Y)

Next we want to make an operad algebra on \mathcal{CR}. To do this we use a lax symmetric monoidal functor \mathrm{Ent} from \mathsf{ConvRel} to \mathsf{Set}, defined as follows. On objects, \mathrm{Ent} sends any convex set X to the set of entropy functions on it:

\mathrm{Ent}(X) = \{ S \colon X \to [-\infty,\infty] \mid S \: \text{is concave} \}

On morphisms, \mathrm{Ent} sends any convex relation to to the map that “pushes forward” an entropy function along that relation:

\mathrm{Ent}(R \subseteq X \times Y) = (y \mapsto \sup_{(x,y) \in R} S(x))

And finally, the all-important laxator \epsilon produces an entropy function on X_1 \times X_2 by summing an entropy function on X_1 and an entropy function on X_2:

\epsilon_{X_1,X_2} = ((S_1,S_2) \mapsto S_1 + S_2)

The proof that all this indeed defines a lax symmetric monoidal functor can be found in our paper. The main point is that once we have proven this really is a lax symmetric monoidal functor, we can invoke the machinery of lax symmetric monoidal functors and operad algebras to prove that we get an operad algebra! This is very convenient, because proving that we have an operad algebra directly would be somewhat tedious.

We have now reached the technical high point of the paper, which is showing that this operad algebra exists and thus formalizing what it means to compose thermostatic systems. All that remains to do now is to show off a bunch of examples of composition, so that you can see how all this categorical machinery works in practice. In our paper we give many examples, but here let’s consider just one.

Consider the following setup with two ideal gases connected by a movable divider.

The state space of each individual ideal gas is \mathbb{R}^3_{> 0}, with coordinates (U,V,N) representing energy, volume, and number of particles respectively. Let (U_1, V_1, N_1) be the coordinates for the left-hand gas, and (U_2, V_2, N_2) be the coordinates for the right-hand gas. Then as the two gases move to thermodynamic equilibrium, the conserved quantities are U_1 + U_2, V_1 + V_2, N_1 and N_2. We picture this with the following diagram.

Ports on the inner circles represent variables for the ideal gases, and ports on the outer circle represent variables for the composed system. Wires represent relations between those variables. Thus, the entire diagram represents an operation in \mathcal{CR}, given by

U_1 + U_2 = U^e
V_1 + V_2 = V^e
N_1 = N_1^e
N_2 = N_2^e

We can then use the operad algebra to take entropy functions S_1,S_2 \colon \mathbb{R}^3_{> 0} \to [-\infty, \infty] on the two inner systems (the two ideal gases), and get an entropy function S^e \colon \mathbb{R}^4_{> 0} \to [-\infty,\infty] on the outer system.

As a consequence of this entropy maximization procedure, the inner state (U_1,V_1,N_1), (U_2,V_2,N_2) are such that the temperature and pressure equilibriate between the two ideal gases. This is because constrained maximization with the constraint U_1 + U_2 = U^e leads to the following equations at a maximizer:

\displaystyle{ \frac{1}{T_1} = \frac{\partial S_1}{\partial U_1} = \frac{\partial S_2}{\partial U_2} = \frac{1}{T_2} }

(where T_1 and T_2 are the respective temperatures), and

\displaystyle{ \frac{p_1}{T_1} = \frac{\partial S_1}{\partial V_1} = \frac{\partial S_2}{\partial V_2} = \frac{p_2}{T_2} }

(where p_1 and p_2 are the respective pressures).

Thus we arrive at the expected conclusion, which is that temperature and pressure equalize when we maximize entropy under constraints on the total energy and volume.

And that concludes this series of blog posts! For more examples of thermostatic composition, I invite you to read our paper, which has some “thermostatic systems” that one does not normally see thought of in this way, such as heat baths and probabilistic systems! And if you find this stuff interesting, don’t hesitate to reach out to me! Just drop a comment here or email me at the address in the paper.

See all four parts of this series:

Part 1: thermostatic systems and convex sets.

Part 2: composing thermostatic systems.

Part 3: operads and their algebras.

Part 4: the operad for composing thermostatic systems.

Compositional Thermostatics (Part 3)

14 February, 2022

guest post by Owen Lynch

This is the third part (Part 1, Part 2) of a blog series on a paper that we wrote recently:

• John Baez, Owen Lynch and Joe Moeller, Compositional thermostatics.

In the previous two posts we talked about what a thermostatic system was, and how we think about composing them. In this post, we are going to back up from thermostatic systems a little bit, and talk about operads: a general framework for composing things! But we will not yet discuss how thermostatic systems use this framework—we’ll do that in the next post.

The basic idea behind this framework is the following. Suppose that we have a bunch of widgets, and we want to compose these widgets. If we are lucky, given two widgets there is a natural way of composing them. This is the case if the widgets are elements of a monoid; we simply use the monoid operation. This is also the case if the widgets are morphisms in a category; if the domain and codomain of two widgets match up, then they can be composed. More generally, n-morphisms in a higher category also have natural ways of composition.

However, there is not always a canonical way of composing widgets. For instance, let R be a commutative ring, and let a and b be elements of R. Then there are many ways to compose them: we could add them, subtract them, multiply them, etc. In fact any element of the free commutative ring \mathbb{Z}[x,y] gives a way of composing a pair of elements in a commutative ring. For instance, x^2 + xy - y^2, when applied to a and b, gives a^2 + ab - b^2. Note that there is nothing special here about the fact that we started with two elements of R; we could start with as many elements of R as we liked, say, a_1,\ldots,a_n, and any element of \mathbb{Z}[x_1,\ldots,x_n] would give a ‘way of composing’ a_1,\ldots,a_n.

The reader familiar with universal algebra should recognize that this situation is very general: we could do the exact same thing with vector spaces, modules, groups, algebras, or any more exotic structures that support a notion of ‘free algebra on n variables’.

Let’s also discuss a less algebraic example. A point process X on a subset of Euclidean space A \subseteq \mathbb{R}^n can be described as an assignment of a \mathbb{N}-valued random variable X_U to each measurable set U \subseteq A that is countably additive under disjoint union of measurable sets.

The interpretation is that a point process gives a random collection of points in A, and X_U counts how many points fall in U. Moreover, this collection of points cannot have a limit point; there cannot be infinitely many points in any compact subset of A.

Now suppose that f \colon B \to A and g \colon C \to A are rigid embeddings such that f(B) \cap g(C) = \emptyset, and that X is a point process on B and Y is a point process on C. Then we can define a new point process Z on A (assuming that X and Y are independent) by letting

Z_U = X_{f^{-1}(U)}+ Y_{g^{-1}(U)}

This is the union of the point process X running in f(B) and the point process Y running in g(C).

Composing two point processes

The precise details here are not so important: what I want to display is the intuition that we are geometrically composing things that ‘live on’ a space. The embeddings f and g give us a way of gluing together a point process on B and a point process on C to get a point process on A. We could have picked something else that lives on a space, like a scalar/vector field, but I chose point processes because they are easy to visualize and composing them is fairly simple (when composing vector fields one has to be careful that they ‘match’ at the edges).


In all of the examples in the previous section, we have things that we want to compose, and ways of composing them. This situation is formalized by operads and operad algebras (which we will define very shortly). However, the confusing part is that the operad part corresponds to'”ways of composing them’, and the operad algebra part corresponds to ‘things we want to compose’. Thus, the mathematics is somewhat ‘flipped’ from the way of thinking that comes naturally; we first think about the ways of composing things, and then we think about what things we want to compose, rather than first thinking about the things we want to compose and only later thinking about the ways of composing them!

Unfortunately, this is the logical way of presenting operads and operad algebras; we must define what an operad is before we can talk about their algebras, even if what we really care about is the algebras. Thus, without further ado, let us define what an operad is.

An operad \mathcal{O} consists of a collection \mathcal{O}_0 of types (which are abstract, just like the ‘objects’ in a category are abstract), and for every list of types X_1,\ldots,X_n,Y \in \mathcal{O}_0, a collection of operations \mathcal{O}(X_1,\ldots,X_n;Y).

These operations are the ‘ways of composing things’, but they themselves can be composed by ‘feeding into’ each other, in the following way.

Suppose that g \in \mathcal{O}(Y_1,\ldots,Y_n;Z) and for each i = 1,\ldots,n, f_i \in \mathcal{O}(X_{i,1},\ldots,X_{i,k_i};Y_i). Then we can make an operation

g(f_1,\ldots,f_n) \in \mathcal{O}(X_{1,1},\ldots,X_{1,k_1},\ldots,X_{n,1},\ldots,X_{n,k_n};Z)

We visualize operads by letting an operation be a circle that can take several inputs and produces a single output. Then composition of operations is given by attaching the output of circles to the input of other circles. Pictured below is the composition of a unary operator f_1, a nullary operator f_2, and a binary operator f_3 with a ternary operator g to create a ternary operator g(f_1,f_2,f_3).

This image has an empty alt attribute; its file name is bitmap.png
One view of composition in an operad

Additionally, for every type X \in \mathcal{O}_0, there is an ‘identity operation’ 1_X \in \mathcal{O}(X;X) that satisfies for any g \in \mathcal{O}(X_1,\ldots,X_n;Y)

g(1_{X_1},\ldots,1_{X_n}) = g

and for any f \in \mathcal{O}(X;Y)

1_Y(f) = f

There is also an associativity law for composition that is a massive pain to write out explicitly, but is more or less exactly as one would expect. For unary operators f,g,h, it states

f(g(h)) = f(g)(h)

The last condition for being an operad is that if f \in \mathcal{O}(X_1,\ldots,X_n;Y) and \sigma \in S(n), the symmetric group on n elements, then we can apply \sigma to f to get

\sigma^\ast(f) \in \mathcal{O}(X_{\sigma(1)},\ldots,X_{\sigma(n)};Y).

We require that (\sigma \tau)^\ast(f) = \tau^\ast(\sigma^\ast(f)) if \sigma,\tau \in S(n), and there are also some conditions for how \sigma^\ast interacts with composition, which can be straightforwardly derived from the intuition that \sigma^\ast permutes the arguments of an operation.

Note that our definition of an operad is what might typically be known as a ‘symmetric, colored operad’, but as we will always be using symmetric, colored operads, we choose to simply drop the modifiers.

That was a long definition, so it is time for an example. This example corresponds to the first situation in the first section, where we wanted to compose ring elements.

Define \mathcal{R} to be an operad with one type, which we will call R \in \mathcal{R}_0, and let \mathcal{R}(R^n;R) = \mathbb{Z}[x_1,\ldots,x_n], where \mathcal{R}(R^n;R) is \mathcal{R}(R,\ldots,R;R) with R repeated n times.

Composition is simply polynomial substitution. That is, if

q(y_1,\ldots,y_n) \in \mathbb{Z}[y_1,\ldots,y_n] \cong \mathcal{R}(R^n;R)


p_i(x_{i,1},\ldots,x_{i,k_i}) \in \mathbb{Z}[x_{i,1},\ldots,x_{i,k_i}] \cong \mathcal{R}(R^{k_i};R)


q(p_1(x_{1,1},\ldots,x_{1,k_1}),\ldots,p_n(x_{n,1},\ldots,x_{n,k_n})) \in \mathcal{R}(R^{\sum_{i=1}^n k_i};R)

is the composite of p_1,\ldots,p_n,q. For instance, composing

x^2 \in \mathbb{Z}[x] \cong \mathcal{R}(R;R)


y+z \in \mathbb{Z}[y,z] \cong \mathcal{R}(R,R;R)

results in

(y+z)^2 \in \mathbb{Z}[y,z] \cong \mathcal{R}(R,R;R)

The reader is invited to supply details for identities and the symmetry operators.

For the other example, define an operad \mathcal{P} by letting \mathcal{P}_0 be the set of compact subsets of \mathbb{R}^2 (we could consider something more exciting, but this works fine and is easy to visualize). An operation f \in \mathcal{P}(X_1,\ldots,X_n;Y) consists of disjoint embeddings f_1,\ldots,f_n, where f_i \colon X_i \to Y.

We can visualize such an operation as simply a shape with holes in it.

Shape with holes

Composition of such operations is just given by nesting the holes.

Composing by nesting

The outcome of the above composition is given by simply taking away the intermediate shapes (i.e. the big circle and the triangle).

The composed operation

Another source of examples for operads comes from the following construction. Suppose that (C,\otimes,1) is a symmetric monoidal category. Define \mathrm{Op}(C,\otimes,1) = \mathrm{Op}(C) by letting

\mathrm{Op}(C)_0 = C_0

where C_0 is the collection of objects in C, and

\mathrm{Op}(C)(X_1,\ldots,X_n;Y) = \mathrm{Hom}_C(X_1 \otimes \cdots \otimes X_n, Y)

To compose operations f_1,\ldots,f_n and g (assuming that the types are such that these are composable), we simply take g \circ (f_1 \otimes \ldots \otimes f_n). Moreover, the identity operation is simply the identity morphism, and the action of \sigma \in S(n) is given by the symmetric monoidal structure.

In fact, the second example that we talked about is an example of this construction! If we let C be the category where the objects are compact subsets of \mathbb{R}^2, with embeddings as the morphisms, and let the symmetric monoidal product be disjoint union, then it is not too hard to show that the operad we end up with is the same as the one we described above.

Perhaps the most important example of this construction is when it is applied to (\mathsf{Set}, \times, 1), because this is important in the next section! This operad has as types, sets, and an operation

f \in \mathrm{Op}(\mathsf{Set})(X_1,\ldots,X_n;Y)

is simply a function

f \colon X_1 \times \cdots \times X_n \to Y

Operad algebras

Although ‘operad algebra’ is the name that has stuck in the literature, I think a better term would be ‘operad action’, because the analogy to keep in mind is that of a group action. A group action allows a group to ‘act on’ elements of a set; an operad algebra similarly allows an operad to ‘act on’ elements of a set.

Moreover, a group action can be described as a functor from the 1-element category representing that group to \mathsf{Set}, and as we will see, an operad algebra can also be described as an ‘operad morphism’ from the operad to \mathrm{Op}(\mathsf{Set}), the operad just described in the last section.

In fact, this is how we will define an operad algebra; first we will define what an operad morphism is, and then we will define an operad algebra as an operad morphism to \mathrm{Op}(\mathsf{Set}).

An operad morphism F from an operad \mathcal{O} to an operad \mathcal{P} is exactly what one would expect: it consists of

• For every X_1,\ldots,X_n,Y \in \mathcal{O}_0, a map

F \colon \mathcal{O}(X_1,\ldots,X_n;Y) \to \mathcal{P}(F(X_1),\ldots,F(X_n);F(Y))

such that F commutes with all of the things an operad does, i.e. composition, identities, and the action of \sigma \in S(n).

Thus an operad morphism F from \mathcal{O} to \mathrm{Op}(\mathsf{Set}), also known as an operad algebra, consists of

• A set F(X) for every X \in \mathcal{O}_0
• A function F(f) \colon F(X_1) \times \cdots \times F(X_n) \to F(Y) for every operation f \in \mathcal{O}(X_1,\ldots,X_n;Y)

such that the assignment of sets and functions preserves identities, composition, and the action of \sigma \in S(n).

Without further ado, let’s look at the examples. From any ring A we can produce an algebra F_A of \mathcal{R}. We let F_A(R) = A (considered as a set), and for

p(x_1,\ldots,x_n) \in \mathbb{Z}[x_1,\ldots,x_n] = \mathcal{R}(X_1,\ldots,X_n;Y)

we let

F(p)(a_1,\ldots,a_n) = p(a_1,\ldots,a_n)

We can also make an operad algebra of point processes, \mathrm{PP}, for \mathcal{P}. For A \in \mathcal{P}_0, we let \mathrm{PP}(A) be the set of point processes on A. If f \colon A_1 \sqcup \cdots \sqcup A_n \to B is an embedding, then we let \mathrm{PP}(f) be the map that sends point processes X_1,\ldots,X_n on A_1,\ldots,A_n respectively to the point process Y defined by

Y_U = X_{f^{-1}(U) \cap A_1} + \cdots + X_{f^{-1}(U) \cap A_n}

Finally, if (C,\otimes,1) is a symmetric monoidal category, there is a way to make an operad algebra of \mathrm{Op}(C) from a special type of functor F \colon C \to \mathsf{Set}. This is convenient, because it is often easier to prove that the functor satisfies the necessary properties than it is to prove that the algebra is in fact well-formed.

The special kind of functor we need is a lax symmetric monoidal functor. This is a functor F equipped with a natural transformation \tau_{A,B} \colon F(A) \times F(B) \to F(A \otimes B) that is well-behaved with respect to the associator, identity, and symmetric structure of (C, \otimes, 1). We call \tau the laxator, and formally speaking, a lax symmetric monoidal functor consists of a functor along with a laxator. I won’t go into detail about the whole construction that makes an operad algebra out of a lax symmetric monoidal functor, but the basic idea is that given an operation f \in \mathrm{Op}(C)(X,Y;Z) (which is a morphism f \colon X \otimes Y \to Z), we can construct a function F(X) \times F(Y) \to F(Z) by composing

\tau_{X,Y} \colon F(X) \times F(Y) \to F(X \otimes Y)


F(f) \colon F(X \otimes Y) \to F(Z)

This basic idea can be extended using associativity to produce a function X_1 \times \cdots \times X_n \to Y from an operation f \colon X_1 \otimes \cdots \otimes X_n \to Y.

As an example of this construction, consider point processes again. We can make a lax symmetric monoidal functor \mathrm{PP} by sending a set A to \mathrm{PP}(A), the set of point processes on A, and an embedding f \colon A \to B to the map F(f) that sends a point process X to a point process Y defined by

Y_U = X_{f^{-1}(U)}

The laxator \tau_{A,B} \colon F(A) \times F(B) \to F(A \sqcup B) sends a point process X on A and a point process Y on B to a point process Z on a A \sqcup B defined by

Z_{U} = X_{U \cap A} + Y_{U \cap B}

The reader should inspect this definition and think about why it is equivalent to the earlier definition for the operad algebra of point processes.


This was a long post, so I’m going to try and go over the main points so that you can organize what you just learned in some sort of coherent fashion.

First I talked about how there frequently arises situations in which there isn’t a canonical way of ‘composing’ two things. The two examples that I gave were elements of a ring, and structures on spaces, specifically point processes.

I then talked about the formal way that we think about these situations. Namely, we organize the ‘ways of composing things’ into an operad, and then we organize the ‘things that we want to compose’ into an operad algebra. Along the way, I discussed a convenient way of making an operad out of a symmetric monoidal category, and an operad algebra out of a lax symmetric monoidal functor.

This construction will be important in the next post, when we make an operad of ‘ways of composing thermostatic systems’ and an operad algebra of thermostatic systems to go along with it.

See all four parts of this series:

Part 1: thermostatic systems and convex sets.

Part 2: composing thermostatic systems.

Part 3: operads and their algebras.

Part 4: the operad for composing thermostatic systems.

Compositional Thermostatics (Part 2)

7 February, 2022

guest post by Owen Lynch

In Part 1, John talked about a paper that we wrote recently:

• John Baez, Owen Lynch and Joe Moeller, Compositional thermostatics.

and he gave an overview of what a ‘thermostatic system’ is.

In this post, I want to talk about how to compose thermostatic systems. We will not yet use category theory, saving that for another post; instead we will give a ‘nuts-and-bolts’ approach, based on examples.

Suppose that we have two thermostatic systems and we put them in thermal contact, so that they can exchange heat energy. Then we predict that their temperatures should equalize. What does this mean precisely, and how do we derive this result?

Recall that a thermostatic system is given by a convex space X and a concave entropy function S \colon X \to [-\infty,\infty]. A ‘tank’ of constant heat capacity, whose state is solely determined by its energy, has state space X = \mathbb{R}_{> 0} and entropy function S(U) = C \log(U), where C is the heat capacity.

Now suppose that we have two tanks of heat capacity C_1 and C_2 respectively. As thermostatic systems, the state of both tanks is described by two energy variables, U_1 and U_2, and we have entropy functions

S_1(U_1) = C_1 \log(U_1)

S_2(U_2) = C_2 \log(U_2)

By conservation of energy, the total energy of both tanks must remain constant, so

U_1 + U_2 = U

for some U; equivalently

U_2 = U - U_1

The equilibrium state then has maximal total entropy subject to this constraint. That is, an equilibrium state (U_1^{\mathrm{eq}},U_2^{\mathrm{eq}}) must satisfy

S_1(U_1^{\mathrm{eq}}) + S_2(U_2^{\mathrm{eq}}) = \max_{U_1+U_2=U} S_1(U_1) + S_2(U_2)

We can now derive the condition of equal temperature from this condition. In thermodynamics, temperature is defined by

\displaystyle{ \frac{1}{T} = \frac{\partial S}{\partial U} }

The interested reader should calculate this for our entropy functions, and in doing this, see why we identify C with the heat capacity. Now, manipulating the condition of equilibrium, we get

\max_{U_1+U_2=U} S_1(U_1) + S_2(U_2) = \max_{U_1} S_1(U_1) + S_2(U-U_1)

As a function of U_1, the right hand side of this equation must have derivative equal to 0. Thus,

\displaystyle{ \frac{\partial}{\partial U_1} (S_1(U_1) + S_2(U-U_1)) = 0 }

Now, note that if U_2 = U - U_1, then

\displaystyle{  \frac{\partial}{\partial U_1} S(U-U_1) = -\frac{\partial}{\partial U_2} S(U_2) }

Thus, the condition of equilibrium is

\displaystyle{  \frac{\partial}{\partial U_1} S_1(U_1) = \frac{\partial}{\partial U_2} S_2(U_2) }

Using the fact that

\displaystyle{ \frac{1}{T_1} = \frac{\partial}{\partial U_1} S_1(U_1) , \qquad \frac{1}{T_2} = \frac{\partial}{\partial U_2} S_2(U_2) }

the above equation reduces to

\displaystyle{ \frac{1}{T_1} = \frac{1}{T_2} }

so we have our expected condition of temperature equilibriation!

The result of composing several thermostatic systems should be a new thermostatic system. In the case above, the new thermostatic system is described by a single variable: the total energy of the system U = U_1 + U_2. The entropy function of this new thermostatic system is given by the constrained supremum:

S(U) = \max_{U = U_1 + U_2} S_1(U_1) + S_2(U_2)

The reader should verify that this ends up being the same as a system with heat capacity C_1 + C_2, i.e. with entropy function given by

S(U) = (C_1 + C_2) \log(U)

A very similar argument goes through when one has two systems that can exchange both heat and volume; both temperature and pressure are equalized as a consequence of entropy maximization. We end up with a system that is parameterized by total energy and total volume, and has an entropy function that is a function of those quantities.

The general procedure is the following. Suppose that we have n thermostatic systems, (X_1,S_1),\ldots,(X_n,S_n). Let Y be a convex space, that we think of as describing the quantities that are conserved when we compose the n thermostatic systems (i.e., total energy, total volume, etc.). Each value of the conserved quantities y \in Y corresponds to many different possible values for x_1 \in X_1, \ldots x_n \in X_n. We represent this with a relation

R \subseteq X_1 \times \cdots \times X_n \times Y

We then turn Y into a thermostatic system by using the entropy function

S(y) = \max_{R(x_1,\ldots,x_n,y)} S_1(x_1) + \ldots + S_n(x_n)

It turns out that if we require R to be a convex relation (that is, a convex subspace of X_1 \times \cdots \times X_n \times Y) then S as defined above ends up being a concave function, so (Y,S) is a true thermostatic system.

We will have to wait until a later post in the series to see exactly how we describe this procedure using category theory. For now, however, I want to talk about why this procedure makes sense.

In the statistical mechanical interpretation, entropy is related to the probability of observing a specific macrostate. As we scale the system, the theory of large deviations tells us that seeing any macrostate other than the most probable macrostate is highly unlikely. Thus, we can find the macrostate that we will observe in practice by finding the entropy maxima. For an exposition of this point of view, see this paper:

• Jeffrey Commons, Ying-Jen Yang and Hong Qian, Duality symmetry, two entropy functions, and an eigenvalue problem in Gibbs’ theory.

There is also a dynamical systems interpretation of entropy, where entropy serves as a Lyapunov function for a dynamical system. This is the viewpoint taken here:

• Wassim M. Haddad, A Dynamical Systems Theory of Thermodynamics, Princeton U. Press.

In each of these viewpoints, however, the maximization of entropy is not global, but rather constrained. The dynamical system only maximizes entropy along its orbit, and the statistical mechanical system maximizes entropy with respect to constraints on the probability distribution.

We can think of thermostatics as a ‘common refinement’ of both of these points of view. We are agnostic as to the mechanism by which constrained maximization of entropy takes place and we are simply interested in investigating its consequences. We expect that a careful formalization of either system should end up deriving something similar to our thermostatic theory in the limit.

See all four parts of this series:

Part 1: thermostatic systems and convex sets.

Part 2: composing thermostatic systems.

Part 3: operads and their algebras.

Part 4: the operad for composing thermostatic systems.

The Kepler Problem (Part 4)

27 January, 2022

The Kepler problem is the study of a particle moving in an attractive inverse square force. In classical mechanics, this problem shows up when you study the motion of a planet around the Sun in the Solar System. In quantum mechanics, it shows up when you study the motion of an electron around a proton in a hydrogen atom.

In Part 2 we saw that the classical Kepler problem has, besides energy and the three components of angular momentum, three more conserved quantities: the components of the eccentricity vector!

This was discovered long ago, in 1710, by the physicist Jakob Hermann. But thanks to Noether, we now know that in classical mechanics, conserved quantities come from symmetries. In the Kepler problem, conservation of energy comes from time translation symmetry, while conservation of the angular momentum comes from rotation symmetry. Which symmetries give conservation of the eccentricity vector?

As we shall see, these symmetries are rotations in 4-dimensional space. These include the obvious rotations in 3-dimensional space which give angular momentum. The other 4-dimensional rotations act in a much less obvious way, and give the eccentricity vector.

In fact, we’ll see that the Kepler problem can be rephrased in terms of a free particle moving around on a sphere in 4-dimensional space. This is a nice explanation of the 4-dimensional rotation symmetry.

After that we’ll see a second way to rephrase the Kepler problem: in terms of a massless, relativistic free particle moving at the speed of light on a sphere in 4-dimensional space. Our first formulation will not involve relativity. This second will.

All this is very nice. You can read some fun explanations of the first formulation here:

• Greg Egan, The ellipse and the atom.

• John Baez, Planets in the fourth dimension.

But how could you guess this 4-dimensional rotation symmetry if you didn’t know about it already? One systematic approach uses Poisson brackets. I won’t explain these, just dive in and use them!

Remember, the particle in the Kepler problem has various observables, which are all ultimately functions of its position and momentum:

• position: \vec q

• momentum: \vec p

• energy: H = \tfrac{1}{2} p^2 - \tfrac{1}{q}

• angular momentum: \vec L = \vec q \times \vec p

• the eccentricity vector: \vec e = \vec p \times \vec L - \tfrac{\vec q}{q}

I’ll use conventions where the Poisson brackets of the components of position q_k and momentum p_\ell are taken to be

\{q_k,p_\ell\} = \delta_{jk}

From this, using the rules for Poisson brackets, we can calculate the Poisson brackets of everything else. For starters:

\{H, L_k\} = \{H,e_h\} = 0

These equations are utterly unsurprising, since they are equivalent to saying that angular momentum \vec L and the eccentricity vector \vec e are conserved. More interestingly, we have

\begin{array}{ccl} \{L_k, L_\ell\} &=& \epsilon_{jk\ell} L_\ell  \\ \{e_k, L_\ell\} &=& \epsilon_{jk\ell} e_\ell \\ \{e_k, e_\ell \} &=& -2H \epsilon_{jk\ell} L_\ell \end{array}

where all the indices go from 1 to 3, I’m summing over repeated indices even if they’re both subscripts, and \epsilon_{jk\ell} are the Levi–Civita symbols.

Now, the factor of -2H above is annoying. But on the region of phase space where H < 0—that is, the space of bound states, where the particle carries out an elliptical orbit—we can define a new vector to deal with this annoyance:

\displaystyle{    \vec M = \frac{\vec e}{\sqrt{-2H}} }

Now we easily get

\begin{array}{ccl} \{L_k, L_\ell\} &=& \epsilon_{jk\ell} L_\ell  \\ \{L_j, M_k\} &=& \epsilon_{jk\ell} M_\ell \\ \{M_j, M_k \} &=& \epsilon_{jk\ell} M_\ell \end{array}

This is nicer, but we can simplify it even more if we introduce some new vectors that are linear combinations of \vec L and \vec M, namely half their sum and half their difference:

\vec A = \tfrac{1}{2} (\vec L + \vec M), \qquad \vec B = \tfrac{1}{2}(\vec L - \vec M)

Then we get

\begin{array}{ccl} \{ A_j, A_k\} &=&  \epsilon_{jk\ell} A_\ell \\ \{ B_j, B_k\} &=&  \epsilon_{jk\ell} B_\ell  \\ \{ A_j, B_k\} &=& 0 \end{array}

So, the observables A_j and B_k contain the same information as the angular momentum and eccentricity vectors, but now they commute with each other!

What does this mean?

Well, when you’re first learning math the Levi–Civita symbols \epsilon_{jk\ell} may seem like just a way to summarize the funny rules for cross products in 3-dimensional space. But as you proceed, you ultimately learn that \mathbb{R}^3 with its cross product is the Lie algebra of the Lie group \mathrm{SO}(3) of rotations in 3-dimensional space. From this viewpoint, the Levi–Civita symbols are nothing but the structure constants for the Lie algebra \mathfrak{so}(3): that is, a way of describing the bracket operation in this Lie algebra in terms of basis vectors.

So, what we’ve got here are two commuting copies of \mathfrak{so}(3), one having the A_j as a basis and the other having the B_k as a basis, both with the Poisson bracket as their Lie bracket.

A better way to say the same thing is that we’ve got a single 6-dimensional Lie algebra

\mathfrak{so}(3) \oplus \mathfrak{so}(3)

having both the A_j and B_k as basis. But then comes the miracle:

\mathfrak{so}(3) \oplus \mathfrak{so}(3) \cong \mathfrak{so}(4)

The easiest way to see this is to realize that S^3, the unit sphere in 4 dimensions, is itself a Lie group with Lie algebra isomorphic to \mathfrak{so}(3). Namely, it’s the unit quaternions!—or if you prefer, the Lie group \mathrm{SU}(2). Like any Lie group it acts on itself via left and right translations, which commute. But these are actually ways of rotating S^3. So, you get a map of Lie algebras from \mathfrak{so}(3) \oplus \mathfrak{so}(3) to \mathfrak{so}(4), and you can check that this is an isomorphism.

So in this approach, the 4th dimension pops out of the fact that the Kepler problem has conserved quantities that give two commuting copies of \mathfrak{so}(3). By Noether’s theorem, it follows that conservation of angular momentum and the eccentricity vector must come from a hidden symmetry: symmetry under some group whose Lie algebra is \mathfrak{so}(4).

And indeed, it turns out that the group \mathrm{SO}(4) acts on the bound states of the Kepler problem in a way that commutes with time evolution!

But how can we understand this fact?

Historically, it seems that the first explanation was found in the quantum-mechanical context. In 1926, even before Schrödinger came up with his famous equation, Pauli used conservation of angular momentum and the eccentricity to determine the spectrum of hydrogen. But I believe he was using what we now call Lie algebra methods, not bringing in the group \mathrm{SO}(4).

In 1935, Vladimir Fock, famous for the ‘Fock space’ in quantum field theory, explained this 4-dimensional rotation symmetry by setting up an equivalence between hydrogen atom bound states and functions on the 3-sphere! In the following year, Valentine Bargmann, later famous for being Einstein’s assistant, connected Pauli and Fock’s work using group representation theory.

All this is quantum mechanics. It seems the first global discussion of this symmetry in the classical context was given by Bacry, Ruegg, and Souriau in 1966, leading to important work by Souriau and Moser in the early 1970s. Since then, much more has been done. You can learn about a lot of it from these two books, which are my constant companions these days:

• Victor Guillemin and Shlomo Sternberg, Variations on a Theme by Kepler, Providence, R.I., American Mathematical Society, 1990.

• Bruno Cordani, The Kepler Problem: Group Theoretical Aspects, Regularization and Quantization, with Application to the Study of Pertubation, Birkhäuser, Boston, 2002.

But let me try to summarize a bit of this material.

One way to understand the \mathrm{SO}(4) symmetry for bound states of the Kepler problem is the result of Hamilton that I explained last time: for a particle moving around an elliptical orbit in the Kepler problem, its momentum moves round and round in a circle.

I’ll call these circles Hamilton’s circles. Hamilton’s circles are not arbitrary circles in \mathbb{R}^3. Using the inverse of stereographic projection, we can map \mathbb{R}^3 to the unit 3-sphere:

\begin{array}{rccl} f \colon &\mathbb{R}^3 &\to & S^3    \subset \mathbb{R}^4  \\  \\ & \vec p    &\mapsto & \displaystyle{\left(\frac{p^2 - 1}{p^2 +1}, \frac{2 \vec p}{p^2 + 1}\right).} \end{array}

This map sends Hamilton’s circles in \mathbb{R}^3 to great circles in S^3. Furthermore, this construction gives all the great circles in S^3 except those that go through the north and south poles, (\pm 1, 0,0,0). These missing great circles correspond to periodic orbits in the Kepler problem where a particle starts with momentum zero, falls straight to the origin, and bounces back the way it came. If we include these degenerate orbits, every great circle on the unit 3-sphere is the path traced out by the momentum in some solution of the Kepler problem.

Let me reemphasize: in this picture, points of S^3 correspond not to positions but to momenta in the Kepler problem. As time passes, these points move along great circles in S^3... but not at constant speed.

How is their dynamics related to geodesic motion on the 3-sphere?
We can understand this as follows. In Part 2 we saw that

L^2 + M^2 =  - \frac{1}{2H}

and using the fact that \vec L \cdot \vec M = 0, an easy calculation gives

H \; = \; -\frac{1}{8A^2} \; = \; -\frac{1}{8B^2}

In the 3-sphere picture, the observables A_j become functions on the cotangent bundle T^\ast S^3. These functions are just the components of momentum for a particle on S^3, defined using a standard basis of right-invariant vector fields on S^3 \cong \mathrm{SU}(2). Similarly, the observables B_j are the components of momentum using a standard basis of left-invariant vector fields. It follows that

K = 8A^2 = 8B^2

is the Hamiltonian for a nonrelativistic free particle on S^3 with an appropriately chosen mass. Such a particle moves around a great circle on S^3 at constant speed. Since the Kepler Hamiltonian H is a function of K, particles governed by this Hamiltonian move along the same trajectories—but typically not at constant speed!

Both K and the Kepler Hamiltonian H = -1/K are well-defined smooth functions on the symplectic manifold that Souriau dubbed the Kepler manifold:

T^+ S^3 = \{ (x,p) : \; x \in S^3, \, p \in T_x S^3, \, p \ne 0 \}

This is the cotangent bundle of the 3-sphere with the zero cotangent vectors removed, so that H = -1/K is well-defined.

All this is great. But even better, there’s yet another picture of what’s going on, which brings relativity into the game!

We can also think of T^+ S^3 as a space of null geodesics in the Einstein universe: the manifold \mathbb{R} \times S^3 with the Lorentzian metric

dt^2 - ds^2

where dt^2 is the usual Riemannian metric on the real line (‘time’) and ds^2 is the usual metric on the unit sphere (‘space’). In this picture x \in S^3 describes the geodesic’s position at time zero, while the null cotangent vector p + \|p\| dt describes its 4-momentum at time zero. Beware: in this picture two geodesics count as distinct if we rescale p by any positive factor other than 1. But this is good: physically, it reflects the fact that in relativity, massless particles can have different 4-momentum even if they trace out the same path in spacetime.

In short, the Kepler manifold T^+ S^3 also serves as the classical phase space for a free massless spin-0 particle in the Einstein universe!

And here’s the cool part: the Hamiltonian for such a particle is

\sqrt{K} = \sqrt{-1/H}

So it’s a function of both the Hamiltonians we’ve seen before. Thus, time evolution given by this Hamiltonian carries particles around great circles on the 3-sphere… at constant speed, but at a different speed than the nonrelativistic free particle described by the Hamiltonian K.

In future episodes, I want to quantize this whole story. We’ll get some interesting outlooks on the quantum mechanics of the hydrogen atom.

The Kepler Problem (Part 3)

23 January, 2022

The Kepler problem studies a particle moving in an inverse square force, like a planet orbiting the Sun. Last time I talked about an extra conserved quantity associated to this problem, which keeps elliptical orbits from precessing or changing shape. This extra conserved quantity is sometimes called the Laplace–Runge–Lenz vector, but since it was first discovered by none of these people, I prefer to call it the ‘eccentricity vector’

In 1847, Hamilton noticed a fascinating consequence of this extra conservation law. For a particle moving in an inverse square force, its momentum moves along a circle!

Greg Egan has given a beautiful geometrical argument for this fact:

• Greg Egan, The ellipse and the atom.

I will not try to outdo him; instead I’ll follow a more dry, calculational approach. One reason is that I’m trying to amass a little arsenal of formulas connected to the Kepler problem.

Let’s dive in. Remember from last time: we’re studying a particle whose position \vec q obeys

\ddot{\vec q} = - \frac{\vec q}{q^3}

Its momentum is

\vec p = m \dot{\vec q}

Its momentum is not conserved. Its conserved quantities are energy:

H = \frac{1}{2} p^2 - \frac{1}{q}

the angular momentum vector:

\vec L = \vec q \times \vec p

and the eccentricity vector:

\vec e = \vec p \times \vec L - \frac{\vec q}{q}

Now for the cool part: we can show that

\displaystyle{ \left( \vec p - \frac{\vec L \times \vec e}{L^2} \right)^2 = \frac{1}{L^2} }

Thus, the momentum \vec p stays on a circle of radius 1/L centered at the point (\vec L \times \vec e)/L^2. And since \vec L and \vec e are conserved, this circle doesn’t change! Let’s call it Hamilton’s circle.

Now let’s actually do the calculations needed to show that the momentum stays on Hamilton’s circle. Since

\vec e = \vec p \times \vec L - \frac{\vec q}{q}

we have

\frac{\vec q}{q} = \vec p \times \vec L - \vec e

Taking the dot product of this vector with itself, which is 1, we get

\begin{array}{ccl}  1 &=& \frac{\vec q}{q} \cdot \frac{\vec q}{q}  \\ \\  &=& (\vec p \times \vec L - \vec e) \cdot (\vec p \times \vec L - \vec e) \\ \\  &=& (\vec p \times \vec L)^2 - 2 \vec e \cdot (\vec p \times \vec L) + e^2  \end{array}

Now, notice that \vec p and \vec L are orthogonal since \vec L = \vec q \times \vec p. Thus

(\vec p \times \vec L)^2 = p^2 L^2

I actually used this fact and explained it in more detail last time. Substituting this in, we get

1 = p^2 L^2 - 2 \vec e \cdot (\vec p \times \vec L) + e^2

Similarly, \vec e and \vec L are orthogonal! After all,

\vec e = \vec p \times \vec L - \frac{\vec q}{q}

The first term is orthogonal to \vec L since it’s the cross product of \vec L and some other vector. And the second term is orthogonal to \vec L since \vec L is the cross product of \vec q and some other vector! So, we have

(\vec L \times \vec e)^2 = L^2 e^2

and thus

\displaystyle { e^2 = \frac{(\vec L \times \vec e)^2}{L^2} }

Substituting this in, we get

\displaystyle { 1 = p^2 L^2 - 2 \vec e \cdot (\vec p \times \vec L) + \frac{(\vec L \times \vec e)^2}{L^2} }

Using the cyclic property of the scalar triple product, we can rewrite this as

\displaystyle { 1 = p^2 L^2 - 2 \vec p \cdot (\vec L \times \vec e) + \frac{(\vec L \times \vec e)^2}{L^2} }

This is nicer because it involves \vec L \times \vec e in two places. If we divide both sides by L^2 we get

\displaystyle { \frac{1}{L^2} = p^2 - \frac{2}{L^2} \; \vec p \cdot (\vec L \times \vec e) + \frac{(\vec L \times \vec e)^2}{L^4} }

And now for the final flourish! The right hand is the dot product of a vector with itself:

\displaystyle { \frac{1}{L^2} = \left(\vec p -  \frac{\vec L \times \vec e}{L^2}\right)^2 }

This is the equation for Hamilton’s circle!

Now, beware: the momentum \vec p doesn’t usually move at a constant rate along Hamilton’s circle, since that would force the particle’s orbit to itself be circular.

But on the bright side, the momentum moves along Hamilton’s circle regardless of whether the particle’s orbit is elliptical, parabolic or hyperbolic. And we can easily distinguish the three cases using Hamilton’s circle!

After all, the center of Hamilton’s circle is the point (\vec L \times \vec e)/L^2, and

(\vec L \times \vec e)^2 = L^2 e^2

so the distance of this center from the origin is

\displaystyle{ \sqrt{\frac{(\vec L \times \vec e)^2}{L^4}} = \sqrt{\frac{L^2 e^2}{L^4}} = \frac{e}{L} }

On the other hand, the radius of Hamilton’s circle is 1/L. So his circle encloses the origin, goes through the origin or does not enclose the origin depending on whether e < 1, e = 1 or e > 1. But we saw last time that these three cases correspond to elliptical, parabolic and hyperbolic orbits!


• If e < 1 the particle’s orbit is an ellipse and the origin lies inside Hamilton’s circle. The momentum goes round and round Hamilton’s circle as time passes.

• If e = 1 the particle’s orbit is a parabola and the origin lies exactly on Hamilton’s circle. The particle’s momentum approaches zero as time approaches \pm \infty, so its momentum goes around Hamilton’s circle exactly once as time passes.

• If e > 1 the particle’s orbit is a hyperbola and the origin lies outside Hamilton’s circle. The particle’s momentum approaches distinct nonzero values as time approaches \pm \infty, so its momentum goes around just a portion of Hamilton’s circle.

By the way, in general the curve traced out by the momentum vector of a particle is called a hodograph. So you can learn more about Hamilton’s circle with the help of that buzzword.

The Color of Infinite Temperature

16 January, 2022


This is the color of something infinitely hot. Of course you’d instantly be fried by gamma rays of arbitrarily high frequency, but this would be its spectrum in the visible range.

This is also the color of a typical neutron star. They’re so hot they look the same.

It’s also the color of the early Universe!

This was worked out by David Madore.

As a blackbody gets hotter and hotter, its spectrum approaches the classical Rayleigh–Jeans law. That is, its true spectrum as given by the Planck law approaches the classical prediction over a larger and larger range of frequencies.

So, for an extremely hot blackbody, the spectrum of light we can actually see with our eyes is governed by the Rayleigh–Jeans law. This law says the color doesn’t depend on the temperature: only the brightness does!

And this color is shown above.

This involves human perception, not just straight physics. So David Madore needed to work out the response of the human eye to the Rayleigh–Jeans spectrum — “by integrating the spectrum against the CIE XYZ matching functions and using the definition of the sRGB color space.”

The color he got is sRGB(148,177,255). And according to the experts who sip latte all day and make up names for colors, this color is called ‘Perano’.

Here is some background material Madore wrote on colors and visual perception. It doesn’t include the whole calculation that leads to this particular color, so somebody should check it, but it should help you understand how to convert the blackbody spectrum at a particular temperature into an sRGB color:

• David Madore, Colors and colorimetry.

In the comments you can see that Thomas Mansencal redid the calculation and got a slightly different color: sRGB(154,181,255). It looks quite similar to me:

The Kepler Problem (Part 2)

31 December, 2021

I’m working on a math project involving the periodic table of elements and the Kepler problem—that is, the problem of a particle moving in an inverse square force law. That’s one reason I’ve been blogging about chemistry lately! I hope to tell you all about this project sometime—but right now I just want to say some very basic stuff about the ‘eccentricity vector’.

This vector is a conserved quantity for the Kepler problem. It was named the ‘Runge–Lenz vector’ after Lenz used it in 1924 to study the hydrogen atom in the framework of the ‘old quantum mechanics’ of Bohr and Sommerfeld: Lenz cite Runge’s popular German textbook on vector analysis from 1919, which explains this vector. But Runge never claimed any originality: he attributed this vector to Gibbs, who wrote about it in his book on vector analysis in 1901!

Nowadays many people call it the ‘Laplace–Runge–Lenz vector’, honoring Laplace’s discussion of it in his famous treatise on celestial mechaics in 1799. But in fact this vector goes back at least to Jakob Hermann, who wrote about it in 1710, triggering further work on this topic by Johann Bernoulli in the same year.

Nobody has seen signs of this vector in work before Hermann. So, we might call it the Hermann–Bernoulli–Laplace–Gibbs–Runge–Lenz vector, or just the Hermann vector. But I prefer to call it the eccentricity vector, because for a particle in an inverse square law its magnitude is the eccentricity of that orbit!

Let’s suppose we have a particle whose position \vec q \in \mathbb{R}^3 obeys this version of the inverse square force law:

\ddot{\vec q} = - \frac{\vec q}{q^3}

where I remove the arrow from a vector when I want to talk about its magnitude. So, I’m setting the mass of this particle equal to 1, along with the constant saying the strength of the force. That’s because I want to keep the formulas clean! With these conventions, the momentum of the particle is

\vec p = \dot{\vec q}

For this system it’s well-known that the following energy is conserved:

H = \frac{1}{2} p^2 - \frac{1}{q}

as well as the angular momentum vector:

\vec L = \vec q \times \vec p

But the interesting thing for me today is the eccentricity vector:

\vec e = \vec p \times \vec L - \frac{\vec q}{q}

Let’s check that it’s conserved! Taking its time derivative,

\dot{\vec e} = \dot{\vec p} \times \vec L + \vec p \times \dot{\vec L} - \frac{\vec p}{q} + \frac{\dot q}{q^2} \,\vec q

But angular momentum is conserved so the second term vanishes, and

\dot q = \frac{d}{dt} \sqrt{\vec q \cdot \vec q} =  \frac{\vec p \cdot \vec q}{\sqrt{\vec q \cdot \vec q}} = \frac{\vec p \cdot \vec q}{q}

so we get

\dot{\vec e} = \dot{\vec p} \times \vec L - \frac{\vec p}{q} +  \frac{\vec p \cdot \vec q}{q^2}\, \vec q

But the inverse square force law says

\dot{\vec p} = - \frac{\vec q}{q^3}


\dot{\vec e} = - \frac{1}{q^3} \, \vec q \times \vec L - \frac{\vec p}{q} +  \frac{\vec p \cdot \vec q}{q^2}\, \vec q

How can we see that this vanishes? Mind you, there are various geometrical ways to think about this, but today I’m in the mood for checking that my skills in vector algebra are sufficient for a brute-force proof—and I want to record this proof so I can see it later!

To get anywhere we need to deal with the cross product in the above formula:

\vec q \times \vec L = \vec q \times (\vec q \times \vec p)

There’s a nice identity for the vector triple product:

\vec a \times (\vec b \times \vec c) = (\vec a \cdot \vec c) \vec b - (\vec a \cdot \vec b) \vec c

I could have fun talking about why this is true, but I won’t now! I’ll just use it:

\vec q \times \vec L = \vec q \times (\vec q \times \vec p) = (\vec q \cdot \vec p) \vec q - q^2 \, \vec p

and plug this into our formula

\dot{\vec e} = - \frac{1}{q^3} \, \vec q \times \vec L - \frac{\vec p}{q} +  \frac{\vec p \cdot \vec q}{q^2}\, \vec q


\dot{\vec e} = -\frac{1}{q^3} \Big((\vec q \cdot \vec p) \vec q - q^2 \vec p \Big) - \frac{\vec p}{q} +  \frac{\vec p \cdot \vec q}{q^3}\, \vec q

But look—everything cancels! So

\dot{\vec e} = 0

and the eccentricity vector is conserved!

So, it seems that the inverse square force law has 7 conserved quantities: the energy H, the 3 components of the angular momentum \vec L, and the 3 components of the eccentricity vector \vec e. But they can’t all be independent, since the particle only has 6 degrees of freedom: 3 for position and 3 for momentum. There can be at most 5 independent conserved quantities, since something has to change. So there have to be at least two relations betwen the conserved quantities we’ve found.

The first of these relations is pretty obvious: \vec e and \vec L are at right angles, so

\vec e \cdot \vec L = 0

But wait, why are they at right angles? Because

\vec e = \vec p \times \vec L - \frac{\vec q}{q}

The first term is orthogonal to \vec L because it’s a cross product of \vec p and \vec L; the second is orthogonal to \vec L because \vec L is a cross product of \vec q and \vec p.

The second relation is a lot less obvious, but also more interesting. Let’s take the dot product of \vec e with itself:

e^2 = \left(\vec p \times \vec L - \frac{\vec q}{q}\right) \cdot \left(\vec p \times \vec L - \frac{\vec q}{q}\right)

or in other words,

e^2 = (\vec p \times \vec L) \cdot (\vec p \times \vec L) - \frac{2}{q} \vec q \cdot (\vec p \times \vec L) + 1

But remember this nice cross product identity:

(\vec a \times \vec b) \cdot (\vec a \times \vec b) + (\vec a \cdot \vec b)^2 = a^2 b^2

Since \vec p and L are at right angles this gives

(\vec p \times \vec L) \cdot (\vec p \times \vec L) = p^2 L^2


e^2 = p^2 L^2 - \frac{2}{q} \vec q \cdot (\vec p \times \vec L) + 1

Then we can use the cyclic identity for the scalar triple product:

\vec a \cdot (\vec b \times \vec c) = \vec c \cdot (\vec a \times \vec b)

to rewrite this as

e^2 = p^2 L^2 - \frac{2}{q} \vec L \cdot (\vec q \times \vec p) + 1

or simply

e^2 = p^2 L^2 - \frac{2}{q} L^2 + 1

or even better,

e^2 = 2 \left(\frac{1}{2} p^2 - \frac{1}{q}\right) L^2 + 1

But this means that

e^2 = 2HL^2 + 1

which is our second relation between conserved quantities for the Kepler problem!

This relation makes a lot of sense if you know that e is the eccentricity of the orbit. Then it implies:

• if H > 0 then e > 1 and the orbit is a hyperbola.

• if H = 0 then e = 1 and the orbit is a parabola.

• if H < 0 then 0 < e < 1 and the orbit is an ellipse (or circle).

But why is e the eccentricity? And why does the particle move in a hyperbola, parabola or ellipse in the first place? We can show both of these things by taking the dot product of \vec q and \vec e:

\begin{array}{ccl}   \vec q \cdot \vec e &=&   \vec q \cdot \left(\vec p \times \vec L - \frac{\vec q}{q} \right)  \\ \\    &=& \vec q \cdot (\vec p \times \vec L) - q   \end{array}

Using the cyclic property of the scalar triple product we can rewrite this as

\begin{array}{ccl}   \vec q \cdot \vec e &=&   \vec L \cdot (\vec q \times \vec p) - q  \\ \\  &=& L^2 - q  \end{array}

Now, we know that \vec q moves in the plane orthogonal to \vec L. In this plane, which contains the vector \vec e, the equation \vec q \cdot \vec e = L^2 - q defines a conic of eccentricity e. I won’t show this from scratch, but it may seem more familiar if we rotate the whole situation so this plane is the xy plane and \vec e points in the x direction. Then in polar coordinates this equation says

er \cos \theta = L^2 - r


r = \frac{L^2}{1 + e \cos \theta}

This is well-known, at least among students of physics who have solved the Kepler problem, to be the equation of a conic of eccentricity e.

Another thing that’s good to do is define a rescaled eccentricity vector. In the case of elliptical orbits, where H < 0, we define this by

\vec M = \frac{\vec e}{\sqrt{-2H}}

Then we can take our relation

e^2 = 2HL^2 + 1

and rewrite it as

1 = e^2 - 2H L^2

and then divide by -2H getting

- \frac{1}{2H} = \frac{e^2}{-2H} + L^2


- \frac{1}{2H} = L^2 + M^2

This suggests an interesting similarity between \vec L and \vec M, which turns out to be very important in a deeper understanding of the Kepler problem. And with more work, you can use this idea to show that -1/4H is the Hamiltonian for a free particle on the 3-sphere. But more about that some other time, I hope!

For now, you might try this:

• Wikipedia, Laplace–Runge–Lenz vector.

and of course this:

The Kepler problem (part 1).

Photon-Photon Scattering

13 December, 2021

Light can bounce off light by exchanging virtual charged particles! This gives nonlinear corrections to Maxwell’s equations, even in the vacuum—but they’re only noticeable when the electric field is about 1018 volts/meter or more. This is an enormous electric field, able to accelerate a proton from rest to Large Hadron Collider energies in just 5 micrometers!

In 2017, light-on-light scattering was seen at the LHC when they shot lead ions past each other:

Direct evidence for light-by-light scattering at high energy had proven elusive for decades, until the Large Hadron Collider (LHC) began its second data-taking period (Run 2). Collisions of lead ions in the LHC provide a uniquely clean environment to study light-by-light scattering. Bunches of lead ions that are accelerated to very high energy are surrounded by an enormous flux of photons. Indeed, the coherent action from the large number of 82 protons in a lead atom with all the electrons stripped off (as is the case for the lead ions in the LHC) give rise to an electromagnetic field of up to 1025 volts per metre. When two lead ions pass close by each other at the centre of the ATLAS detector, but at a distance greater than twice the lead ion radius, those photons can still interact and scatter off one another without any further interaction between the lead ions, as the reach of the (much stronger) strong force is bound to the radius of a single proton. These interactions are known as ultra-peripheral collisions.

But now people want to see photon-photon scattering by shooting lasers at each other! One place they’ll try this is at the Extreme Light Infrastructure.

In 2019, a laser at the Extreme Light Infrastructure in Romania achieved a power of 10 petawatts for brief pulses — listen to the announcement for what means!

I think it reached an intensity of 1029 watts per square meter, but I’m not sure. If you know the intensity I in watts/square mete of a plane wave of light, you can compute the maximum strength E of its electric field (in volts/meter) by

I = \frac{1}{2} \varepsilon_0 c  E^2

where \varepsilon_0 is the permittivity of the vacuum and c is the speed of light. According to Dominik Wild, I = 1029 watts per square meter gives E \approx 1016 volts/meter. If so, this is about 1/100 the field strength needed to see strong nonlinear corrections to Maxwell’s equations.

In China, the Station of Extreme Light plans to build a laser that makes brief pulses of 100 petawatts. That’s 10,000 times the power of all the world’s electrical grids combined—for a very short time! They’re aiming for an intensity of 1028 watts/square meter:

• Edwin Cartlidge, Physicists are planning to build lasers so powerful they could rip apart empty space, Science, January 24, 2018.

The modification of Maxwell’s equations due to virtual particles was worked out by Heisenberg and Euler in 1936. (No, not that Euler.) They’re easiest to describe using a Lagrangian, but if we wrote out the equations we’d get Maxwell’s equations plus extra terms that are cubic in \mathbf{E} and \mathbf{B}.

For more, read these:

• Wikpedia, Schwinger limit.

• Wikpedia, Schwinger effect.

The Schwinger limit is the strength of the electric (or magnetic) field where nonlinearity due to virtual charrged particles becomes significant. They’re about

\displaystyle{ E_\text{c} = \frac{m_\text{e}^2 c^3}{e \hbar} \approx 1.32 \times 10^{18} \, \textrm{volts/meter} }

\displaystyle{ B_\text{c} = \frac{m_\text{e}^2 c^2}{e \hbar} \approx 4.41 \times 10^{9} \, \textrm{tesla} }

where e is the electron charge and \hbar is Planck’s constant. For more see page 38 here:

• David Delphenich, Nonlinear electrodynamics and QED.

The Schwinger effect is when a very large static electric field ‘sparks the vacuum’ and creates real particles. This may put an upper limit on many protons can be in an atomic nucleus, spelling an end to the periodic table.

Transition Metals

9 December, 2021

The transition metals are more complicated than lighter elements.


Because they’re the first whose electron wavefunctions are described by quadratic functions of x,y, and z — not just linear or constant. These are called ‘d orbitals’, and they look sort of like this:

More precisely: the wavefunctions of electrons in atoms depend on the distance r from the nucleus and also the angles \theta, \phi. The angular dependence is described by ‘spherical harmonics’, certain functions on the sphere. These are gotten by taking certain polynomials in x,y,z and restricting them to the unit sphere. Chemists have their own jargon for this:

• constant polynomial: s orbital

• linear polynomial: p orbital

• quadratic polynomial: d orbital

• cubic polynomial: f orbital

and so on.

To be even more precise, a spherical harmonic is an
eigenfunction of the Laplacian on the sphere. Any such function is the restriction to the sphere of some homogeneous polynomial in x,y,z whose Laplacian in 3d space is zero. This polynomial can be constant, linear, etc.

The dimension of the space of spherical harmonics goes like 1, 3, 5, 7,… as we increase the degree of the polynomial starting from 0:

• constant: 1

• linear: x, y, z

• quadratic: xy, xz, yz, x^2 - y^2, x^2 - z^2

etcetera. So, we get one s orbital, three p orbitals, five d orbitals and so on. Here I’ve arbitrarily chosen a basis of the space of quadratic polynomials with vanishing Laplacian, and I’m not claiming this matches the d orbitals in the pictures!

The transition metals are the first to use the d orbitals. This is why they’re so different than lighter elements.

Although there are 5 d orbitals, an electron occupying such an orbital can have spin up or down. This is why there are 10 transition metals per row!

This chart doesn’t show the last row of highly radioactive transition metals, just the ones you’re likely to see:

Look: 10 per row, all because there’s a 5d space of quadratic polynomials in x,y,z with vanishing Laplacian. Math becomes matter.

The Madelung rules

Can we understand why the first transition element, scandium, has 21 electrons? Yes, if we’re willing to use the ‘Madelung rules’ explained last time. Let me review them rapidly here.

You’ll notice this chart has axes called n and \ell.

As I just explained, the angular dependence of an orbital is determined by a homogeneous polynomial with vanishing Laplacian. In the above chart, the degree of this polynomial is called \ell. The space of such polynomials has dimension 2\ell + 1.

But an orbital has an additional radial dependence, described using a number called n. The math, which I won’t go into, requires that 0 \le \ell \le n. That gives the above chart its roughly triangular appearance.

The letters s, p, d, f are just chemistry jargon for \ell = 0,1,2,3.

Thanks to spin and the Pauli exclusion principle, we can pack at most 2(2\ell + 1) electrons into the orbitals with a given choice of n and \ell. This bunch of orbitals is called a ‘subshell’.

The Madelung rules say the order in which subshells get filled:

  1. Electrons are assigned to subshells in order of increasing values of n + \ell.
  2. For subshells with the same value of n + \ell, electrons are assigned first to the subshell with lower n.

So let’s see what happens. Only when we hit \ell = 2 will we get transition metals!

\boxed{n + \ell = 1}

n = 1, \ell = 0

This is called the 1s subshell, and we can put 2 electrons in here. First we get hydrogen with 1 electron, then helium with 2. At this point all the n = 1 subshells are full, so the ‘1st shell’ is complete, and helium is called a ‘noble gas’.

\boxed{n + \ell = 2}

n = 2, \ell = 0

This is called the 2s subshell, and we can put 2 more electrons in here. We get lithium with 3 electrons, and then beryllium with 4.

\boxed{n + \ell = 3}

n = 2, \ell = 1

This is called the 2p subshell, and we can put 6 more electrons in here. We get:

◦ boron with 5 electrons,
◦ carbon with 6,
◦ nitrogen with 7,
◦ oxygen with 8,
◦ fluorine with 9,
◦ neon with 10.

At this point all the n = 2 subshells are full, so the 2nd shell is complete and neon is another noble gas.

n = 3, \ell = 0

This is is called the 3s subshell, and we can put 2 more electrons in here. We get sodium with 11 electrons, and magnesium with 12.

\boxed{n + \ell = 4}

n = 3, \ell = 1

This is called the 4p subshell, and we can put 6 more electrons in here. We get:

◦ aluminum with 13 electrons,
◦ silicon with 14,
◦ phosphorus with 15,
◦ sulfur with 16,
◦ chlorine with 17,
◦ argon with 18.

At this point all the n = 3 subshells are full, so the 3rd shell is complete and argon is another noble gas.

n = 4, \ell = 0

This is called the 4s subshell, and we can put 2 more electrons in here. We get potassium with 19 electrons and calcium with 20.

\boxed{n + \ell = 5}

n = 3, \ell = 2

This is called the 3d subshell, and we can put 10 electrons in here. Since now we’ve finally hit \ell = 2, and thus a d subshell, these are transition metals! We get:

◦ scandium with 21 electrons,
◦ titanium with 22,
◦ vanadium with 23,
◦ chromium with 24,
◦ manganese with 25,
◦ iron with 26,
◦ cobalt with 27,
◦ nickel with 28,
◦ copper with 29,
◦ zinc with 30.

And the story continues—but at least we’ve seen why the first batch of transition elements starts where it does!

The scandal of scandium

For a strong attack on the Madelung rules, see:

• Eric Scerri, The problem with the Aufbau principle for finding electronic configurations, 24 June 2012.

But it’s important to realize that he’s attacking a version of the Madelung rules that is different, and stronger than the version stated above. My version only concerned atoms, not ions. The stronger version claims that you can use the Madelung rules not only to determine the ground state of an atom, but also those of the positive ions obtained by taking that atom and removing some electrons!

This stronger version breaks down if you consider scandium with one electron removed. As we’ve just seen, scandium has the electrons as in argon together with three more: two in the 4s orbital and one in the 3d orbital. This conforms to the Madelung rules.

But when you ionize scandium and remove one electron, it’s not the 3d electron that leaves—it’s one of the 4s electrons! This breaks the stronger version of the Madelung rules.

The weaker version of the Madelung rules also breaks down, but later in the transition metals. The first problem is with chromium, the second is with copper:

By the Madelung rules, chromium should have 2 electrons in the 4s shell and 4 in the 3d shell. But in fact it has just 1 in the 4s and 5 in the 3d.

The second is with copper. By the Madelung rules, this should have 2 electrons in the 4s shell and 9 in the 3d. But in fact it has just 1 in the 4s and 10 in the 3d.

There are also other breakdowns in heavier transition metals, listed here:

• Wikipedia, Aufbau principle: exceptions in the d block.

These subtleties can only be understood by digging a lot deeper into how the electrons in an atom interact with each other. That’s above my pay grade right now. If you know a good place to learn more about this, let me know! I’m only interested in atoms here, not molecules.

Oxidation states of transition metals

Transition metals get some of their special properties because the electrons in the d subshell are easily removed. For example, this is why the transition metals conduct electricity.

Also, when reacting chemically with other elements, they lose different numbers of electrons. The different possibilities are called ‘oxidation states’.

For example, scandium has all the electrons of argon (Ar) plus two in an s orbital and one in a d orbital. It can easily lose 3 electrons, giving an oxidation state called Sc3+. Titanium has one more electron, so it can lose 4 and form Ti4+. And so on:

This accounts for the most obvious pattern in the chart below: the diagonal lines sloping up.

The red dots are common oxidation states, while the white dots are rarer oxidation states. For example iron (Fe) can lose 2 electrons, 3 electrons, 4 electrons (more rarely), 5 electrons, or 6 electrons (more rarely).

The diagonal lines sloping up come from the simple fact that as we move through a group of transition metals, there are more and more electrons in the d subshell, so more can be easily be removed. But everything is complicated by the fact that electrons interact! So the trend doesn’t go on forever: manganese gives up 8 electrons but iron doesn’t easily give up 8, only at most 6. And there’s much more going on, too.

Note also that the two charts above don’t actually agree: the chart in color includes more rare oxidation states.


For a bit more, read:

• Wikipedia, Transition metals.

Oxidation states of transition metals, Chemistry LibreTexts.

The colored chart of oxidation states in this post is from Wikicommons,
made by Felix Wan, corrected to include the two most common oxidation
states of ruthenium. The black-and-white chart is from the Chemistry