There will be a workshop on the categorical semantics of entropy at the CUNY Grad Center in Manhattan on Friday May 13th, organized by John Terilla. I was kindly invited to give an online tutorial beforehand on May 11, which I will give remotely to save carbon. Tai-Danae Bradley will also be giving a tutorial that day in person:
12:00-1:00 Eastern Daylight Time — Lunch in Room 5209.
1:00-2:30 — Shannon entropy from category theory, John Baez, University of California Riverside; Centre for Quantum Technologies (Singapore); Topos Institute.
Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.
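To make the “information loss” idea concrete, here is a small Python sketch (my own illustration, not code from the talk): the entropy of a finite distribution, and the information lost by a map that merges outcomes.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (in bits) of a finite probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def pushforward(p, f, m):
    """Push a distribution on {0,...,n-1} forward along f into {0,...,m-1}."""
    q = [0.0] * m
    for i, x in enumerate(p):
        q[f(i)] += x
    return q

# A measure-preserving map that merges the first two outcomes:
p = [0.5, 0.25, 0.25]
q = pushforward(p, lambda i: 0 if i < 2 else 1, 2)

# "Information loss" of the map: entropy of the source minus entropy
# of the pushforward. It is always >= 0 for deterministic maps.
loss = shannon_entropy(p) - shannon_entropy(q)
```

Functoriality says the loss of a composite map is the sum of the losses of its factors; convex-linearity and continuity are the other two conditions in the characterization.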
2:30-3:00 — Coffee break.
3:00-4:30 — Operads and entropy, Tai-Danae Bradley, The Master’s University; Sandbox AQ.
This talk will open with a basic introduction to operads and their representations, with the main example being the operad of probabilities. I’ll then give a light sketch of how this framework leads to a small, but interesting, connection between information theory, abstract algebra, and topology, namely a correspondence between Shannon entropy and derivations of the operad of probabilities.
The maximum entropy principle is a fascinating and productive lens with which to view both thermodynamics and statistical mechanics. In this talk, we present a categorification of the maximum entropy principle, using convex spaces and operads. Along the way, we will discuss a variety of examples of the maximum entropy principle and show how each application can be captured using our framework. This approach shines a new light on old constructions. For instance, we will show how we can derive the canonical ensemble by attaching a probabilistic system to a heat bath. Finally, our approach to this categorification has applications beyond the maximum entropy principle, and we will give a hint of how to adapt this categorification to the formalization of the composition of other systems.
11:00-11:45 — Polynomial functors and Shannon entropy, David Spivak, MIT and the Topos Institute.
The category Poly of polynomial functors in one variable is extremely rich, brimming with categorical gadgets (e.g. eight monoidal products, two closures, limits, colimits, etc.) and applications including dynamical systems, databases, open games, and cellular automata. In this talk I’ll show that objects in Poly can be understood as empirical distributions. In part using the standard derivative of polynomials, we obtain a functor to Set × Setop which encodes an invariant of a distribution as a pair of sets. This invariant is well-behaved in the sense that it is a distributive monoidal functor: it acts on both distributions and maps between them, and it preserves both the sum and the tensor product of distributions. The Shannon entropy of the original distribution is then calculated directly from the invariant, i.e. only in terms of the cardinalities of these two sets. Given the many applications of polynomial functors and of Shannon entropy, having this link between them has the potential to create useful synergies, e.g. with notions of entropic causality or entropic learning in dynamical systems.
12:00-1:30 — Lunch in Room 5209
1:30-2:15 — Higher entropy, Tom Mainiero, Rutgers New High Energy Theory Center.
Is the frowzy state of your desk no longer as thrilling as it once was? Are numerical measures of information no longer able to satisfy your needs? There is a cure! In this talk we’ll learn about: the secret topological lives of multipartite measures and quantum states; how a homological probe of this geometry reveals correlated random variables; the sly decategorified involvement of Shannon, Tsallis, Rényi, and von Neumann in this larger geometric conspiracy; and the story of how Gelfand, Neumark, and Segal’s construction of von Neumann algebra representations can help us uncover this informatic ruse. So come to this talk, spice up your entropic life, and bring new meaning to your relationship with disarray.
2:30-3:15 — On characterizing classical and quantum entropy, Arthur Parzygnat, Institut des Hautes Études Scientifiques.
In 2011, Baez, Fritz, and Leinster proved that the Shannon entropy can be characterized as a functor by a few simple postulates. In 2014, Baez and Fritz extended this theorem to provide a Bayesian characterization of the classical relative entropy, also known as the Kullback–Leibler divergence. In 2017, Gagné and Panangaden extended the latter result to include standard Borel spaces. In 2020, I generalized the first result on Shannon entropy so that it includes the von Neumann (quantum) entropy. In 2021, I provided partial results indicating that the Umegaki relative entropy may also have a Bayesian characterization. My results in the quantum setting are special applications of the recent theory of quantum Bayesian inference, which is a non-commutative extension of classical Bayesian statistics based on category theory. In this talk, I will give an overview of these developments and their possible applications in quantum information theory.
In Part 1, we went over our definition of thermostatic system: it’s a convex space of states together with a concave function giving the entropy of each state. We also gave examples of thermostatic systems.
In Part 2, we talked about what it means to compose thermostatic systems. It amounts to constrained maximization of the total entropy.
In Part 3 we laid down a categorical framework for composing systems when there are choices that have to be made for how the systems are composed. This framework has been around for a long time: operads and operad algebras.
In this post we will bring together all of these parts in a big synthesis to create an operad of all the ways of composing thermostatic systems, along with an operad algebra of thermostatic systems!
Recall that in order to compose thermostatic systems we need to use a ‘parameterized constraint’: a convex subset
$$R \subseteq X \times Y,$$
where $Y$ is some other convex set. We end up with a thermostatic system on $Y$, with entropy function $S_Y$ defined by
$$S_Y(y) = \sup_{(x, y) \in R} S_X(x).$$
In order to model this using operads and operad algebras, we will make an operad which has convex sets as its types, and convex relations as its operations. Then we will make an operad algebra that assigns to any convex set $X$ the set of concave functions $S \colon X \to [-\infty, \infty]$.
This operad algebra will describe how, given a relation $R \subseteq X \times Y$, we can ‘push forward’ entropy functions on $X$ to form an entropy function on $Y$.
The operad is built using a construction from Part 3 that takes a symmetric monoidal category and produces an operad. The symmetric monoidal category that we start with has convex sets as its objects and convex relations as its morphisms. It contains the category of convex sets and convex-linear functions as a subcategory with all the same objects, and that subcategory inherits a symmetric monoidal structure from the bigger category.
Following the construction from Part 3, we see that we get an operad exactly as described before: namely, it has convex sets as types, and an operation with inputs $X_1, \ldots, X_n$ and output $Y$ is a convex relation $R \subseteq X_1 \times \cdots \times X_n \times Y$.
Next we want to make an operad algebra on this operad. To do this we use a lax symmetric monoidal functor from the category of convex sets and convex relations to the category of sets, defined as follows. On objects, it sends any convex set $X$ to the set of entropy functions on it: the concave functions $S \colon X \to [-\infty, \infty]$.
On morphisms, it sends any convex relation $R \subseteq X \times Y$ to the map that “pushes forward” an entropy function along that relation:
$$S \mapsto \Big( y \mapsto \sup_{(x, y) \in R} S(x) \Big).$$
And finally, the all-important laxator produces an entropy function on $X \times Y$ by summing an entropy function on $X$ and an entropy function on $Y$:
$$(S_X, S_Y) \mapsto \big( (x, y) \mapsto S_X(x) + S_Y(y) \big).$$
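In finite toy cases this pushforward is easy to compute directly. Here is a small Python sketch (my own illustration, not code from the paper): the relation is a set of pairs, and the pushed-forward entropy at a point is the supremum of the entropy over everything related to it.

```python
def push_entropy(S, R, X, y):
    """(R_* S)(y) = sup { S(x) : (x, y) in R }, or -inf if nothing relates to y."""
    vals = [S(x) for x in X if (x, y) in R]
    return max(vals) if vals else float("-inf")

# Toy example: relate each x in {0,1,2,3} to its parity.
X = [0, 1, 2, 3]
R = {(x, "even" if x % 2 == 0 else "odd") for x in X}
S = lambda x: -abs(x - 2)   # a toy "entropy", maximized at x = 2
```

Then `push_entropy(S, R, X, "even")` picks out the best even state (x = 2, value 0) and `push_entropy(S, R, X, "odd")` the best odd one (value −1).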
The proof that all this indeed defines a lax symmetric monoidal functor can be found in our paper. The main point is that once we have proven this really is a lax symmetric monoidal functor, we can invoke the machinery of lax symmetric monoidal functors and operad algebras to prove that we get an operad algebra! This is very convenient, because proving that we have an operad algebra directly would be somewhat tedious.
We have now reached the technical high point of the paper, which is showing that this operad algebra exists and thus formalizing what it means to compose thermostatic systems. All that remains to do now is to show off a bunch of examples of composition, so that you can see how all this categorical machinery works in practice. In our paper we give many examples, but here let’s consider just one.
Consider the following setup with two ideal gases connected by a movable divider.
The state space of each individual ideal gas is $\mathbb{R}_{>0}^3$, with coordinates $(U, V, N)$ representing energy, volume, and number of particles respectively. Let $(U_1, V_1, N_1)$ be the coordinates for the left-hand gas, and $(U_2, V_2, N_2)$ be the coordinates for the right-hand gas. Then as the two gases move to thermodynamic equilibrium, the conserved quantities are $U_1 + U_2$, $V_1 + V_2$, and the individual particle numbers $N_1$ and $N_2$. We picture this with the following diagram.
Ports on the inner circles represent variables for the ideal gases, and ports on the outer circle represent variables for the composed system. Wires represent relations between those variables. Thus, the entire diagram represents an operation in our operad: the convex relation between the inner variables and the outer variables given by these conservation laws.
We can then use the operad algebra to take entropy functions on the two inner systems (the two ideal gases), and get an entropy function on the outer system.
As a consequence of this entropy maximization procedure, the inner states are such that the temperature and pressure equilibrate between the two ideal gases. This is because constrained maximization of $S_1 + S_2$ subject to the constraints $U_1 + U_2 = U$ and $V_1 + V_2 = V$ leads to the following equations at a maximizer:
$$\frac{\partial S_1}{\partial U_1} = \frac{\partial S_2}{\partial U_2}$$
(where $T_i = (\partial S_i / \partial U_i)^{-1}$ are the respective temperatures, so $T_1 = T_2$), and
$$\frac{\partial S_1}{\partial V_1} = \frac{\partial S_2}{\partial V_2}$$
(where $p_i = T_i \, \partial S_i / \partial V_i$ are the respective pressures, so $p_1 = p_2$).
Thus we arrive at the expected conclusion, which is that temperature and pressure equalize when we maximize entropy under constraints on the total energy and volume.
And that concludes this series of blog posts! For more examples of thermostatic composition, I invite you to read our paper, which has some “thermostatic systems” that one does not normally see thought of in this way, such as heat baths and probabilistic systems! And if you find this stuff interesting, don’t hesitate to reach out to me! Just drop a comment here or email me at the address in the paper.
In the previous two posts we talked about what a thermostatic system was, and how we think about composing them. In this post, we are going to back up from thermostatic systems a little bit, and talk about operads: a general framework for composing things! But we will not yet discuss how thermostatic systems use this framework—we’ll do that in the next post.
The basic idea behind this framework is the following. Suppose that we have a bunch of widgets, and we want to compose these widgets. If we are lucky, given two widgets there is a natural way of composing them. This is the case if the widgets are elements of a monoid; we simply use the monoid operation. This is also the case if the widgets are morphisms in a category; if the domain and codomain of two widgets match up, then they can be composed. More generally, n-morphisms in a higher category also have natural ways of composing.
However, there is not always a canonical way of composing widgets. For instance, let $R$ be a commutative ring, and let $r$ and $s$ be elements of $R$. Then there are many ways to compose them: we could add them, subtract them, multiply them, etc. In fact, any element of the free commutative ring $\mathbb{Z}[x, y]$ gives a way of composing a pair of elements in a commutative ring: for instance, $x^2 + xy$, when applied to $r$ and $s$, gives $r^2 + rs$. Note that there is nothing special here about the fact that we started with two elements of $R$; we could start with as many elements of $R$ as we liked, say $r_1, \ldots, r_n$, and any element of $\mathbb{Z}[x_1, \ldots, x_n]$ would give a ‘way of composing’ $r_1, \ldots, r_n$.
The reader familiar with universal algebra should recognize that this situation is very general: we could do the exact same thing with vector spaces, modules, groups, algebras, or any more exotic structures that support a notion of ‘free algebra on $n$ variables’.
Let’s also discuss a less algebraic example. A point process on a subset $X$ of Euclidean space can be described as an assignment of an $\mathbb{N}$-valued random variable $N(A)$ to each measurable set $A \subseteq X$, countably additive under disjoint union of measurable sets.
The interpretation is that a point process gives a random collection of points in $X$, and $N(A)$ counts how many points fall in $A$. Moreover, this collection of points cannot have a limit point; there cannot be infinitely many points in any compact subset of $X$.
Now suppose that $f \colon U \to W$ and $g \colon V \to W$ are rigid embeddings such that $f(U) \cap g(V) = \emptyset$, and that $N_U$ is a point process on $U$ and $N_V$ is a point process on $V$. Then we can define a new point process $N_W$ on $W$ (assuming that $N_U$ and $N_V$ are independent) by letting
$$N_W(A) = N_U(f^{-1}(A)) + N_V(g^{-1}(A)).$$
This is the union of the point process $N_U$ running in $f(U)$ and the point process $N_V$ running in $g(V)$.
The precise details here are not so important: what I want to display is the intuition that we are geometrically composing things that ‘live on’ a space. The embeddings $f$ and $g$ give us a way of gluing together a point process on $U$ and a point process on $V$ to get a point process on $W$. We could have picked something else that lives on a space, like a scalar/vector field, but I chose point processes because they are easy to visualize and composing them is fairly simple (when composing vector fields one has to be careful that they ‘match’ at the edges).
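Here is a simulation sketch of this composition (my own illustration; I take the simplest case, Poisson processes on disjoint intervals of the line, sampled via exponential inter-arrival times):

```python
import random

def poisson_points(rate, lo, hi, rng):
    """Sample a 1-D Poisson process of the given intensity on [lo, hi)."""
    pts, t = [], lo
    while True:
        t += rng.expovariate(rate)
        if t >= hi:
            return pts
        pts.append(t)

rng = random.Random(0)
left = poisson_points(2.0, 0.0, 1.0, rng)    # point process on U = [0, 1)
right = poisson_points(3.0, 1.0, 2.0, rng)   # point process on V = [1, 2)

# The composed process on W = [0, 2) is just the union of the points;
# counts for the composed process are sums of the component counts.
combined = sorted(left + right)

def count(pts, a, b):
    return sum(a <= p < b for p in pts)
```

The additivity of `count` under this union is exactly the formula above, with the embeddings being the inclusions of the two intervals.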
In all of the examples in the previous section, we have things that we want to compose, and ways of composing them. This situation is formalized by operads and operad algebras (which we will define very shortly). However, the confusing part is that the operad part corresponds to ‘ways of composing them’, and the operad algebra part corresponds to ‘things we want to compose’. Thus, the mathematics is somewhat ‘flipped’ from the way of thinking that comes naturally: we first think about the ways of composing things, and then we think about what things we want to compose, rather than first thinking about the things we want to compose and only later thinking about the ways of composing them!
Unfortunately, this is the logical way of presenting operads and operad algebras; we must define what an operad is before we can talk about their algebras, even if what we really care about is the algebras. Thus, without further ado, let us define what an operad is.
An operad $\mathcal{O}$ consists of a collection of types (which are abstract, just like the ‘objects’ in a category are abstract), and for every list of types $X_1, \ldots, X_n, Y$, a collection of operations $\mathcal{O}(X_1, \ldots, X_n; Y)$.
These operations are the ‘ways of composing things’, but they themselves can be composed by ‘feeding into’ each other, in the following way.
Suppose that $\phi \in \mathcal{O}(Y_1, \ldots, Y_n; Z)$ and $\psi_i \in \mathcal{O}(X_{i,1}, \ldots, X_{i,m_i}; Y_i)$ for each $i = 1, \ldots, n$. Then we can make an operation
$$\phi \circ (\psi_1, \ldots, \psi_n) \in \mathcal{O}(X_{1,1}, \ldots, X_{n,m_n}; Z).$$
We visualize operads by letting an operation be a circle that takes several inputs and produces a single output. Then composition of operations is given by attaching the outputs of circles to the inputs of other circles. Pictured below is the composition of a unary operator, a nullary operator, and a binary operator with a ternary operator to create a ternary operator.
Additionally, for every type $X$ there is an ‘identity operation’ $1_X \in \mathcal{O}(X; X)$ that satisfies
$$\phi \circ (1_{X_1}, \ldots, 1_{X_n}) = \phi$$
for any $\phi \in \mathcal{O}(X_1, \ldots, X_n; Y)$,
and
$$1_Y \circ \phi = \phi$$
for any $\phi \in \mathcal{O}(X_1, \ldots, X_n; Y)$.
There is also an associativity law for composition that is a massive pain to write out explicitly, but is more or less exactly as one would expect. For unary operators it states
$$\phi \circ (\psi \circ \chi) = (\phi \circ \psi) \circ \chi.$$
The last condition for being an operad is that if
$$\phi \in \mathcal{O}(X_1, \ldots, X_n; Y)$$
and $\sigma \in S_n$, the symmetric group on $n$ elements, then we can apply $\sigma$ to $\phi$ to get
$$\phi \sigma \in \mathcal{O}(X_{\sigma(1)}, \ldots, X_{\sigma(n)}; Y).$$
We require that $(\phi \sigma) \tau = \phi (\sigma \tau)$ if $\sigma, \tau \in S_n$, and there are also some conditions for how this action interacts with composition, which can be straightforwardly derived from the intuition that $\sigma$ permutes the arguments of an operation.
Note that our definition of an operad is what might typically be known as a ‘symmetric, colored operad’, but as we will always be using symmetric, colored operads, we choose to simply drop the modifiers.
That was a long definition, so it is time for an example. This example corresponds to the first situation in the first section, where we wanted to compose ring elements.
Define $\mathcal{O}$ to be an operad with one type, which we will call $X$, and let $\mathcal{O}(X^n; X) = \mathbb{Z}[x_1, \ldots, x_n]$, where $X^n$ is $X$ repeated $n$ times.
Composition is simply polynomial substitution. That is, if $\phi \in \mathbb{Z}[x_1, \ldots, x_n]$ and $\psi_1, \ldots, \psi_n$ are polynomials in further variables, then the composite $\phi \circ (\psi_1, \ldots, \psi_n)$ is the polynomial obtained by substituting $\psi_i$ for $x_i$ in $\phi$.
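Here is a minimal executable sketch of this one-type operad (my own encoding, not from the post): an n-ary operation is just a function of n arguments, and operadic composition feeds the outputs of the inner operations into the outer one, exactly like polynomial substitution.

```python
def compose(f, gs):
    """Operadic composition: f is a (function, arity) pair; gs is a list of
    (function, arity) pairs, one for each input of f."""
    ffun, n = f
    assert len(gs) == n, "need one inner operation per input of f"
    def h(*args):
        out, i = [], 0
        for gfun, k in gs:          # split the arguments among the inner operations
            out.append(gfun(*args[i:i + k]))
            i += k
        return ffun(*out)
    return h, sum(k for _, k in gs)

add = (lambda x, y: x + y, 2)   # the polynomial x1 + x2
mul = (lambda x, y: x * y, 2)   # the polynomial x1 * x2

# Substituting x1*x2 and x3+x4 into x1+x2 gives the 4-ary (x1*x2) + (x3+x4):
h, arity = compose(add, [mul, add])
```

For instance `h(2, 3, 4, 5)` evaluates (2·3) + (4+5) = 15.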
The reader is invited to supply details for identities and the symmetry operators.
For the other example, define an operad by letting the types be the compact subsets of $\mathbb{R}^2$ (we could consider something more exciting, but this works fine and is easy to visualize). An operation from $U_1, \ldots, U_n$ to $V$ consists of disjoint embeddings $f_1, \ldots, f_n$, where $f_i \colon U_i \to V$.
We can visualize such an operation as simply a shape with holes in it.
Composition of such operations is just given by nesting the holes.
The outcome of the above composition is given by simply taking away the intermediate shapes (i.e. the big circle and the triangle).
Another source of examples for operads comes from the following construction. Suppose that $\mathcal{C}$ is a symmetric monoidal category. Define an operad by letting the types be the objects of $\mathcal{C}$, and letting the operations from $X_1, \ldots, X_n$ to $Y$ be the morphisms
$$X_1 \otimes \cdots \otimes X_n \to Y$$
in $\mathcal{C}$.
To compose operations $\phi$ and $\psi_1, \ldots, \psi_n$ (assuming that the types are such that these are composable), we simply take $\phi \circ (\psi_1 \otimes \cdots \otimes \psi_n)$. Moreover, the identity operation is simply the identity morphism, and the action of the symmetric groups is given by the symmetric monoidal structure.
In fact, the second example that we talked about is an example of this construction! If we let $\mathcal{C}$ be the category where the objects are compact subsets of $\mathbb{R}^2$, with embeddings as the morphisms, and let the symmetric monoidal product be disjoint union, then it is not too hard to show that the operad we end up with is the same as the one we described above.
Perhaps the most important example of this construction is when it is applied to the category of sets, with cartesian product as the monoidal structure, because this is important in the next section! This operad has sets as its types, and an operation from $X_1, \ldots, X_n$ to $Y$
is simply a function
$$X_1 \times \cdots \times X_n \to Y.$$
Although ‘operad algebra’ is the name that has stuck in the literature, I think a better term would be ‘operad action’, because the analogy to keep in mind is that of a group action. A group action allows a group to ‘act on’ elements of a set; an operad algebra similarly allows an operad to ‘act on’ elements of a set.
Moreover, a group action can be described as a functor from the one-object category representing that group to the category of sets, and as we will see, an operad algebra can also be described as an ‘operad morphism’ from the operad in question to the operad of sets just described in the last section.
In fact, this is how we will define an operad algebra; first we will define what an operad morphism is, and then we will define an operad algebra as an operad morphism to the operad of sets.
An operad morphism $F$ from an operad $\mathcal{O}$ to an operad $\mathcal{P}$ is exactly what one would expect: it consists of
• For every type $X$ of $\mathcal{O}$, a type $F(X)$ of $\mathcal{P}$.
• For every operation $\phi \in \mathcal{O}(X_1, \ldots, X_n; Y)$, an operation $F(\phi) \in \mathcal{P}(F(X_1), \ldots, F(X_n); F(Y))$,
such that $F$ commutes with all of the things an operad does, i.e. composition, identities, and the action of the symmetric groups.
Thus an operad morphism from $\mathcal{O}$ to the operad of sets, also known as an operad algebra of $\mathcal{O}$, consists of
• A set $F(X)$ for every type $X$ of $\mathcal{O}$.
• A function $F(\phi) \colon F(X_1) \times \cdots \times F(X_n) \to F(Y)$ for every operation $\phi \in \mathcal{O}(X_1, \ldots, X_n; Y)$,
such that the assignment of sets and functions preserves identities, composition, and the action of the symmetric groups.
Without further ado, let’s look at the examples. From any commutative ring $R$ we can produce an algebra of the operad of polynomials. We let $F(X) = R$ (considered as a set), and for a polynomial $\phi \in \mathbb{Z}[x_1, \ldots, x_n]$ we let $F(\phi)$ be the function that evaluates $\phi$:
$$F(\phi)(r_1, \ldots, r_n) = \phi(r_1, \ldots, r_n).$$
We can also make an operad algebra of point processes, for the operad of embeddings. For a compact set $U$ we let $F(U)$ be the set of point processes on $U$. If $\phi$ is an operation given by disjoint embeddings $f_i \colon U_i \to V$, then we let $F(\phi)$ be the map that sends point processes $N_1, \ldots, N_n$ on $U_1, \ldots, U_n$ respectively to the point process $N$ defined by
$$N(A) = \sum_{i=1}^n N_i(f_i^{-1}(A)).$$
Finally, if $\mathcal{C}$ is a symmetric monoidal category, there is a way to make an operad algebra of the operad coming from $\mathcal{C}$ out of a special type of functor $F \colon \mathcal{C} \to \mathsf{Set}$. This is convenient, because it is often easier to prove that the functor satisfies the necessary properties than it is to prove that the algebra is in fact well-formed.
The special kind of functor we need is a lax symmetric monoidal functor. This is a functor $F \colon \mathcal{C} \to \mathsf{Set}$ equipped with a natural transformation
$$\mu_{X,Y} \colon F(X) \times F(Y) \to F(X \otimes Y)$$
that is well-behaved with respect to the associator, identity, and symmetric structure of $\mathcal{C}$. We call $\mu$ the laxator, and formally speaking, a lax symmetric monoidal functor consists of a functor along with a laxator.
I won’t go into detail about the whole construction that makes an operad algebra out of a lax symmetric monoidal functor, but the basic idea is that given a binary operation $\phi$ (which is a morphism $X \otimes Y \to Z$), we can construct a function $F(X) \times F(Y) \to F(Z)$ by composing
$$F(X) \times F(Y) \xrightarrow{\mu_{X,Y}} F(X \otimes Y) \xrightarrow{F(\phi)} F(Z).$$
This basic idea can be extended using associativity to produce a function $F(X_1) \times \cdots \times F(X_n) \to F(Y)$ from an operation $X_1 \otimes \cdots \otimes X_n \to Y$.
As an example of this construction, consider point processes again. We can make a lax symmetric monoidal functor by sending a compact set $U$ to the set of point processes on $U$, and an embedding $f \colon U \to V$ to the map that sends a point process $N$ to the point process $f_* N$ defined by
$$(f_* N)(A) = N(f^{-1}(A)).$$
The laxator sends a point process $N_U$ on $U$ and a point process $N_V$ on $V$ to the point process on $U \sqcup V$ defined by
$$A \mapsto N_U(A \cap U) + N_V(A \cap V).$$
The reader should inspect this definition and think about why it is equivalent to the earlier definition for the operad algebra of point processes.
This was a long post, so I’m going to try and go over the main points so that you can organize what you just learned in some sort of coherent fashion.
First I talked about how situations frequently arise in which there isn’t a canonical way of ‘composing’ two things. The two examples that I gave were elements of a ring, and structures on spaces, specifically point processes.
I then talked about the formal way that we think about these situations. Namely, we organize the ‘ways of composing things’ into an operad, and then we organize the ‘things that we want to compose’ into an operad algebra. Along the way, I discussed a convenient way of making an operad out of a symmetric monoidal category, and an operad algebra out of a lax symmetric monoidal functor.
This construction will be important in the next post, when we make an operad of ‘ways of composing thermostatic systems’ and an operad algebra of thermostatic systems to go along with it.
See all four parts of this series:
• Part 1: thermostatic systems and convex sets.
• Part 2: composing thermostatic systems.
• Part 3: operads and their algebras.
• Part 4: the operad for composing thermostatic systems.
and he gave an overview of what a ‘thermostatic system’ is.
In this post, I want to talk about how to compose thermostatic systems. We will not yet use category theory, saving that for another post; instead we will give a ‘nuts-and-bolts’ approach, based on examples.
Suppose that we have two thermostatic systems and we put them in thermal contact, so that they can exchange heat energy. Then we predict that their temperatures should equalize. What does this mean precisely, and how do we derive this result?
Recall that a thermostatic system is given by a convex space $X$ and a concave entropy function $S \colon X \to [-\infty, \infty]$. A ‘tank’ of constant heat capacity, whose state is solely determined by its energy $U > 0$, has state space $\mathbb{R}_{>0}$ and entropy function $S(U) = C \log U$, where $C$ is the heat capacity.
Now suppose that we have two tanks, of heat capacity $C_1$ and $C_2$ respectively. As thermostatic systems, the state of both tanks together is described by two energy variables, $U_1$ and $U_2$, and we have entropy functions
$$S_1(U_1) = C_1 \log U_1, \qquad S_2(U_2) = C_2 \log U_2.$$
By conservation of energy, the total energy of both tanks must remain constant, so
$$U_1 + U_2 = U$$
for some constant $U$; equivalently, $U_2 = U - U_1$.
The equilibrium state then has maximal total entropy subject to this constraint. That is, an equilibrium value of $U_1$ must satisfy
$$S_1(U_1) + S_2(U - U_1) = \sup_{U_1'} \left( S_1(U_1') + S_2(U - U_1') \right).$$
We can now derive the condition of equal temperature from this condition. In thermodynamics, temperature is defined by
$$\frac{1}{T} = \frac{\partial S}{\partial U}.$$
The interested reader should calculate this for our entropy functions, and in doing so, see why we identify $C$ with the heat capacity. Now, at a maximizer, the expression $S_1(U_1) + S_2(U - U_1)$, as a function of $U_1$, must have derivative equal to $0$. Thus,
$$\frac{d}{d U_1} \left( S_1(U_1) + S_2(U - U_1) \right) = 0.$$
Now, note that if $U_2 = U - U_1$, then
$$\frac{d}{d U_1} S_2(U - U_1) = -\frac{\partial S_2}{\partial U_2}(U_2).$$
Thus, the condition of equilibrium is
$$\frac{\partial S_1}{\partial U_1}(U_1) = \frac{\partial S_2}{\partial U_2}(U_2).$$
Using the fact that
$$\frac{1}{T_i} = \frac{\partial S_i}{\partial U_i},$$
the above equation reduces to
$$T_1 = T_2,$$
so we have our expected condition of temperature equilibration!
The result of composing several thermostatic systems should be a new thermostatic system. In the case above, the new thermostatic system is described by a single variable: the total energy of the system, $U$. The entropy function of this new thermostatic system is given by the constrained supremum:
$$S(U) = \sup_{U_1 + U_2 = U} \left( S_1(U_1) + S_2(U_2) \right).$$
The reader should verify that this ends up being the same as a system with heat capacity $C_1 + C_2$, i.e. with entropy function given by $S(U) = (C_1 + C_2) \log U$, up to an additive constant.
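That verification is easy to do numerically. The sketch below (my own script, not from the post) grid-maximizes the total entropy of two tanks at fixed total energy, checks that the temperatures $T_i = U_i / C_i$ agree at the maximizer, and checks that the composite entropy grows like $(C_1 + C_2)\log U$.

```python
import math

C1, C2 = 2.0, 3.0   # heat capacities of the two tanks

def composite_entropy(U, steps=200_000):
    """Grid-maximize S1(U1) + S2(U - U1) over 0 < U1 < U."""
    best_S, best_U1 = float("-inf"), None
    for k in range(1, steps):
        U1 = U * k / steps
        S = C1 * math.log(U1) + C2 * math.log(U - U1)
        if S > best_S:
            best_S, best_U1 = S, U1
    return best_S, best_U1

S10, U1 = composite_entropy(10.0)
S20, _ = composite_entropy(20.0)

T1 = U1 / C1              # temperature of tank 1 at the maximizer
T2 = (10.0 - U1) / C2     # temperature of tank 2 at the maximizer
```

With these numbers the maximizer is at $U_1 = 4$, both temperatures equal 2, and doubling the total energy raises the composite entropy by exactly $(C_1 + C_2)\log 2$.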
A very similar argument goes through when one has two systems that can exchange both heat and volume; both temperature and pressure are equalized as a consequence of entropy maximization. We end up with a system that is parameterized by total energy and total volume, and has an entropy function that is a function of those quantities.
The general procedure is the following. Suppose that we have $n$ thermostatic systems, $(X_1, S_1), \ldots, (X_n, S_n)$. Let $Y$ be a convex space that we think of as describing the quantities that are conserved when we compose the thermostatic systems (i.e. total energy, total volume, etc.). Each value $y \in Y$ of the conserved quantities corresponds to many different possible values for $(x_1, \ldots, x_n) \in X_1 \times \cdots \times X_n$. We represent this with a relation
$$R \subseteq X_1 \times \cdots \times X_n \times Y.$$
We then turn $Y$ into a thermostatic system by using the entropy function
$$S(y) = \sup_{(x_1, \ldots, x_n, y) \in R} \left( S_1(x_1) + \cdots + S_n(x_n) \right).$$
It turns out that if we require $R$ to be a convex relation (that is, a convex subspace of $X_1 \times \cdots \times X_n \times Y$), then $S$ as defined above ends up being a concave function, so $(Y, S)$ is a true thermostatic system.
We will have to wait until a later post in the series to see exactly how we describe this procedure using category theory. For now, however, I want to talk about why this procedure makes sense.
In the statistical mechanical interpretation, entropy is related to the probability of observing a specific macrostate. As we scale the system, the theory of large deviations tells us that seeing any macrostate other than the most probable macrostate is highly unlikely. Thus, we can find the macrostate that we will observe in practice by finding the entropy maxima. For an exposition of this point of view, see this paper:
In each of these viewpoints, however, the maximization of entropy is not global, but rather constrained. The dynamical system only maximizes entropy along its orbit, and the statistical mechanical system maximizes entropy with respect to constraints on the probability distribution.
We can think of thermostatics as a ‘common refinement’ of both of these points of view. We are agnostic as to the mechanism by which constrained maximization of entropy takes place and we are simply interested in investigating its consequences. We expect that a careful formalization of either system should end up deriving something similar to our thermostatic theory in the limit.
The Kepler problem is the study of a particle moving in an attractive inverse square force. In classical mechanics, this problem shows up when you study the motion of a planet around the Sun in the Solar System. In quantum mechanics, it shows up when you study the motion of an electron around a proton in a hydrogen atom.
In Part 2 we saw that the classical Kepler problem has, besides energy and the three components of angular momentum, three more conserved quantities: the components of the eccentricity vector!
This was discovered long ago, in 1710, by the physicist Jakob Hermann. But thanks to Noether, we now know that in classical mechanics, conserved quantities come from symmetries. In the Kepler problem, conservation of energy comes from time translation symmetry, while conservation of the angular momentum comes from rotation symmetry. Which symmetries give conservation of the eccentricity vector?
As we shall see, these symmetries are rotations in 4-dimensional space. These include the obvious rotations in 3-dimensional space which give angular momentum. The other 4-dimensional rotations act in a much less obvious way, and give the eccentricity vector.
In fact, we’ll see that the Kepler problem can be rephrased in terms of a free particle moving around on a sphere in 4-dimensional space. This is a nice explanation of the 4-dimensional rotation symmetry.
After that we’ll see a second way to rephrase the Kepler problem: in terms of a massless, relativistic free particle moving at the speed of light on a sphere in 4-dimensional space. Our first formulation will not involve relativity; the second will.
All this is very nice. You can read some fun explanations of the first formulation here:
But how could you guess this 4-dimensional rotation symmetry if you didn’t know about it already? One systematic approach uses Poisson brackets. I won’t explain these, just dive in and use them!
Remember, the particle in the Kepler problem has various observables, which are all ultimately functions of its position $q$ and momentum $p$ (here I use units where the particle’s mass and the force constant are 1):
• angular momentum: $L = q \times p$,
• the eccentricity vector: $e = p \times L - \dfrac{q}{|q|}$.
I’ll use conventions where the Poisson brackets of the components of position and momentum are taken to be
$$\{q_i, p_j\} = \delta_{ij}, \qquad \{q_i, q_j\} = \{p_i, p_j\} = 0.$$
From this, using the rules for Poisson brackets, we can calculate the Poisson brackets of everything else. For starters:
$$\{H, L_i\} = 0, \qquad \{H, e_i\} = 0.$$
These equations are utterly unsurprising, since they are equivalent to saying that angular momentum and the eccentricity vector are conserved. More interestingly, we have
$$\{L_i, L_j\} = \epsilon_{ijk} L_k, \qquad \{L_i, e_j\} = \epsilon_{ijk} e_k, \qquad \{e_i, e_j\} = -2H \epsilon_{ijk} L_k,$$
where all the indices go from 1 to 3, I’m summing over repeated indices even if they’re both subscripts, and $\epsilon_{ijk}$ are the Levi–Civita symbols.
Now, the factor of $-2H$ above is annoying. But on the region of phase space where $H < 0$—that is, the space of bound states, where the particle carries out an elliptical orbit—we can define a new vector to deal with this annoyance:
$$M = \frac{e}{\sqrt{-2H}}.$$
Now we easily get
$$\{L_i, M_j\} = \epsilon_{ijk} M_k, \qquad \{M_i, M_j\} = \epsilon_{ijk} L_k.$$
This is nicer, but we can simplify it even more if we introduce some new vectors that are linear combinations of $L$ and $M$, namely half their sum and half their difference:
$$A = \frac{L + M}{2}, \qquad B = \frac{L - M}{2}.$$
Then we get
$$\{A_i, A_j\} = \epsilon_{ijk} A_k, \qquad \{B_i, B_j\} = \epsilon_{ijk} B_k, \qquad \{A_i, B_j\} = 0.$$
So, the observables $A_i$ and $B_i$ contain the same information as the angular momentum and eccentricity vectors, but now they commute with each other!
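These bracket relations are easy to spot-check numerically. The sketch below (my own script, in units where the mass and force constant are 1) computes Poisson brackets by central finite differences and verifies that the angular momentum components close under the bracket: $\{L_1, L_2\} = L_3$ and its cyclic cousins.

```python
def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def L(i):
    """The i-th component of angular momentum, L = q x p."""
    return lambda q, p: cross(q, p)[i]

def shift(v, i, d):
    w = list(v)
    w[i] += d
    return w

def poisson(f, g, q, p, h=1e-5):
    """{f, g} = sum_i df/dq_i dg/dp_i - df/dp_i dg/dq_i, via central differences."""
    s = 0.0
    for i in range(3):
        dfdq = (f(shift(q, i, h), p) - f(shift(q, i, -h), p)) / (2 * h)
        dgdp = (g(q, shift(p, i, h)) - g(q, shift(p, i, -h))) / (2 * h)
        dfdp = (f(q, shift(p, i, h)) - f(q, shift(p, i, -h))) / (2 * h)
        dgdq = (g(shift(q, i, h), p) - g(shift(q, i, -h), p)) / (2 * h)
        s += dfdq * dgdp - dfdp * dgdq
    return s

q, p = [1.0, 0.5, -0.3], [0.2, -0.7, 0.4]   # an arbitrary phase-space point
```

At this point `poisson(L(0), L(1), q, p)` agrees with `L(2)(q, p)`; extending the check to the rescaled eccentricity vector and to $A$ and $B$ is a pleasant exercise.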
What does this mean?
Well, when you’re first learning math the Levi–Civita symbols may seem like just a way to summarize the funny rules for cross products in 3-dimensional space. But as you proceed, you ultimately learn that $\mathbb{R}^3$ with its cross product is the Lie algebra $\mathfrak{so}(3)$ of the Lie group of rotations in 3-dimensional space. From this viewpoint, the Levi–Civita symbols are nothing but the structure constants for the Lie algebra $\mathfrak{so}(3)$: that is, a way of describing the bracket operation in this Lie algebra in terms of basis vectors.
So, what we’ve got here are two commuting copies of $\mathfrak{so}(3)$, one having the $A_i$ as a basis and the other having the $B_i$ as a basis, both with the Poisson bracket as their Lie bracket.
A better way to say the same thing is that we’ve got a single 6-dimensional Lie algebra
$$\mathfrak{so}(3) \oplus \mathfrak{so}(3)$$
having both the $A_i$ and the $B_i$ as basis. But then comes the miracle:
$$\mathfrak{so}(3) \oplus \mathfrak{so}(3) \cong \mathfrak{so}(4).$$
The easiest way to see this is to realize that $S^3$, the unit sphere in 4 dimensions, is itself a Lie group with Lie algebra isomorphic to $\mathfrak{so}(3)$. Namely, it’s the unit quaternions!—or if you prefer, the Lie group $\mathrm{SU}(2)$. Like any Lie group it acts on itself via left and right translations, which commute. But these are actually ways of rotating $S^3$. So, you get a map of Lie algebras from $\mathfrak{su}(2) \oplus \mathfrak{su}(2)$ to $\mathfrak{so}(4)$, and you can check that this is an isomorphism.
So in this approach, the 4th dimension pops out of the fact that the Kepler problem has conserved quantities that give two commuting copies of $\mathfrak{so}(3)$. By Noether’s theorem, it follows that conservation of angular momentum and the eccentricity vector must come from a hidden symmetry: symmetry under some group whose Lie algebra is $\mathfrak{so}(3) \oplus \mathfrak{so}(3) \cong \mathfrak{so}(4)$.
And indeed, it turns out that the group $\mathrm{SO}(4)$ acts on the bound states of the Kepler problem in a way that commutes with time evolution!
But how can we understand this fact?
Historically, it seems that the first explanation was found in the quantum-mechanical context. In 1926, even before Schrödinger came up with his famous equation, Pauli used conservation of angular momentum and the eccentricity vector to determine the spectrum of hydrogen. But I believe he was using what we now call Lie algebra methods, not bringing in the group $\mathrm{SO}(4)$.
In 1935, Vladimir Fock, famous for the ‘Fock space’ in quantum field theory, explained this 4-dimensional rotation symmetry by setting up an equivalence between hydrogen atom bound states and functions on the 3-sphere! In the following year, Valentine Bargmann, later famous for being Einstein’s assistant, connected Pauli and Fock’s work using group representation theory.
All this is quantum mechanics. It seems the first global discussion of this symmetry in the classical context was given by Bacry, Ruegg, and Souriau in 1966, leading to important work by Souriau and Moser in the early 1970s. Since then, much more has been done. You can learn about a lot of it from these two books, which are my constant companions these days:
• Victor Guillemin and Shlomo Sternberg, Variations on a Theme by Kepler, Providence, R.I., American Mathematical Society, 1990.
• Bruno Cordani, The Kepler Problem: Group Theoretical Aspects, Regularization and Quantization, with Application to the Study of Perturbations, Birkhäuser, Boston, 2002.
But let me try to summarize a bit of this material.
One way to understand the $\mathrm{SO}(4)$ symmetry for bound states of the Kepler problem uses the result of Hamilton that I explained last time: for a particle moving around an elliptical orbit in the Kepler problem, its momentum moves round and round in a circle.
I’ll call these circles Hamilton’s circles. Hamilton’s circles are not arbitrary circles in $\mathbb{R}^3$. Using the inverse of stereographic projection, we can map $\mathbb{R}^3$ to the unit 3-sphere:

$$\vec{p} \mapsto \left( \frac{2\vec{p}}{p^2 + 1}, \; \frac{p^2 - 1}{p^2 + 1} \right) \in S^3 \subset \mathbb{R}^4$$
This map sends Hamilton’s circles in $\mathbb{R}^3$ to great circles in $S^3$. Furthermore, this construction gives all the great circles in $S^3$ except those that go through the north and south poles $(0,0,0,\pm 1)$. These missing great circles correspond to periodic orbits in the Kepler problem where a particle starts with momentum zero, falls straight to the origin, and bounces back the way it came. If we include these degenerate orbits, every great circle on the unit 3-sphere is the path traced out by the momentum in some solution of the Kepler problem.
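This is easy to check numerically. Here’s a small Python sketch (mine, not from any of the references above): it builds Hamilton’s circle for a bound orbit of energy $H = -1/2$, in units where the mass and force constant are 1, maps it to the 3-sphere by inverse stereographic projection, and verifies that the image is a great circle. The projection convention and the particular orbit are my own choices.

```python
import math

# Inverse stereographic projection from momentum space R^3 to the unit
# 3-sphere in R^4.  Convention (an assumption of this sketch): we project
# from the north pole, so the last coordinate is (|p|^2 - 1)/(|p|^2 + 1).
def sigma_inv(p):
    n2 = sum(x * x for x in p)
    return [2 * p[0] / (n2 + 1), 2 * p[1] / (n2 + 1),
            2 * p[2] / (n2 + 1), (n2 - 1) / (n2 + 1)]

# A bound orbit with energy H = -1/2, in units where the mass and force
# constant are 1.  Take the angular momentum along the z-axis with
# L = sqrt(3)/2, so e^2 = 1 + 2 H L^2 = 1/4.  Hamilton's circle then has
# radius 1/L, is centered at (L x e)/L^2 (chosen along the x-axis here),
# and lies in the xy-plane of momentum space.
L = math.sqrt(3) / 2
e = 0.5
c, r = e / L, 1 / L

points = [sigma_inv([c + r * math.cos(t), r * math.sin(t), 0.0])
          for t in [2 * math.pi * k / 200 for k in range(200)]]

# Every image point lies on the unit 3-sphere...
on_sphere = all(abs(sum(x * x for x in q) - 1) < 1e-9 for q in points)

# ...and on a *great* circle: the whole image sits in the 2-plane through
# the origin cut out by x_3 = 0 and -c x_1 + x_4 = 0 (coordinates x_1..x_4).
great_circle = all(abs(q[2]) < 1e-9 and abs(-c * q[0] + q[3]) < 1e-9
                   for q in points)

print(on_sphere, great_circle)
```

For other energies one would first rescale the momenta; I’ve picked $H = -1/2$ so that no rescaling is needed.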
Let me reemphasize: in this picture, points of $S^3$ correspond not to positions but to momenta in the Kepler problem. As time passes, these points move along great circles in $S^3$, but not at constant speed.
How is their dynamics related to geodesic motion on the 3-sphere?
We can understand this as follows. In Part 2 we saw that

$$\{L_i, L_j\} = \epsilon_{ijk} L_k, \qquad \{L_i, M_j\} = \epsilon_{ijk} M_k, \qquad \{M_i, M_j\} = \epsilon_{ijk} L_k$$

and using the fact that

$$A_i = \tfrac{1}{2}(L_i + M_i), \qquad B_i = \tfrac{1}{2}(L_i - M_i)$$

an easy calculation gives

$$\{A_i, A_j\} = \epsilon_{ijk} A_k, \qquad \{B_i, B_j\} = \epsilon_{ijk} B_k, \qquad \{A_i, B_j\} = 0$$

In the 3-sphere picture, the observables $A_i$ become functions on the cotangent bundle $T^* S^3$. These functions are just the components of momentum for a particle on $S^3$, defined using a standard basis of right-invariant vector fields on $S^3$. Similarly, the observables $B_i$ are the components of momentum using a standard basis of left-invariant vector fields. It follows that

$$\tfrac{1}{2}(A^2 + B^2)$$

is the Hamiltonian for a nonrelativistic free particle on $S^3$, with an appropriately chosen mass. Such a particle moves around a great circle on $S^3$ at constant speed. Since the Kepler Hamiltonian is a function of $A^2 + B^2$, particles governed by this Hamiltonian move along the same trajectories—but typically not at constant speed!
Both $\tfrac{1}{2}(A^2 + B^2)$ and the Kepler Hamiltonian are well-defined smooth functions on the symplectic manifold that Souriau dubbed the Kepler manifold:

$$T^+ S^3 = \{ (x, p) \in T^* S^3 : p \neq 0 \}$$

This is the cotangent bundle of the 3-sphere with the zero cotangent vectors removed, so that functions like $1/(A^2 + B^2)$ are well-defined.
All this is great. But even better, there’s yet another picture of what’s going on, which brings relativity into the game!
We can also think of the Kepler manifold as a space of null geodesics in the Einstein universe: the manifold $\mathbb{R} \times S^3$ with the Lorentzian metric

$$dt^2 - ds^2$$

where $dt^2$ is the usual Riemannian metric on the real line (‘time’) and $ds^2$ is the usual metric on the unit 3-sphere (‘space’). In this picture, the point $x \in S^3$ describes the geodesic’s position at time zero, while the null cotangent vector $p$ describes its 4-momentum at time zero. Beware: in this picture two geodesics count as distinct if we rescale $p$ by any positive factor other than 1. But this is good: physically, it reflects the fact that in relativity, massless particles can have different 4-momentum even if they trace out the same path in spacetime.
In short, the Kepler manifold also serves as the classical phase space for a free massless spin-0 particle in the Einstein universe!
And here’s the cool part: the Hamiltonian for such a particle is $|p|$, the norm of its momentum.
So it’s a function of both the Hamiltonians we’ve seen before. Thus, time evolution given by this Hamiltonian carries particles around great circles on the 3-sphere… at constant speed, but at a different speed than the nonrelativistic free particle described by the Hamiltonian $\tfrac{1}{2}(A^2 + B^2)$.
In future episodes, I want to quantize this whole story. We’ll get some interesting perspectives on the quantum mechanics of the hydrogen atom.
The Kepler problem studies a particle moving in an inverse square force, like a planet orbiting the Sun. Last time I talked about an extra conserved quantity associated to this problem, which keeps elliptical orbits from precessing or changing shape. This extra conserved quantity is sometimes called the Laplace–Runge–Lenz vector, but since it was first discovered by none of these people, I prefer to call it the ‘eccentricity vector’ $\vec{e}$.
In 1847, Hamilton noticed a fascinating consequence of this extra conservation law. For a particle moving in an inverse square force, its momentum moves along a circle!
Greg Egan has given a beautiful geometrical argument for this fact:
I will not try to outdo him; instead I’ll follow a more dry, calculational approach. One reason is that I’m trying to amass a little arsenal of formulas connected to the Kepler problem.
Let’s dive in. Remember from last time: we’re studying a particle whose position $\vec{q}$ obeys

$$\ddot{\vec{q}} = - \frac{\vec{q}}{q^3}$$

Its momentum is

$$\vec{p} = \dot{\vec{q}}$$

Its momentum is not conserved. Its conserved quantities are energy:

$$H = \frac{p^2}{2} - \frac{1}{q}$$

the angular momentum vector:

$$\vec{L} = \vec{q} \times \vec{p}$$

and the eccentricity vector:

$$\vec{e} = \vec{p} \times \vec{L} - \frac{\vec{q}}{q}$$
Now for the cool part: we can show that

$$\left| \vec{p} - \frac{\vec{L} \times \vec{e}}{L^2} \right| = \frac{1}{L}$$

Thus, the momentum stays on a circle of radius $1/L$ centered at the point $\vec{L} \times \vec{e}/L^2$. And since $\vec{L}$ and $\vec{e}$ are conserved, this circle doesn’t change! Let’s call it Hamilton’s circle.
Now let’s actually do the calculations needed to show that the momentum stays on Hamilton’s circle. Since

$$\vec{e} = \vec{p} \times \vec{L} - \frac{\vec{q}}{q}$$

we have

$$\frac{\vec{q}}{q} = \vec{p} \times \vec{L} - \vec{e}$$

Taking the dot product of this vector with itself, which is 1, we get

$$1 = |\vec{p} \times \vec{L}|^2 - 2 (\vec{p} \times \vec{L}) \cdot \vec{e} + e^2$$

Now, notice that $\vec{p}$ and $\vec{L}$ are orthogonal, since $\vec{L} = \vec{q} \times \vec{p}$. Thus

$$|\vec{p} \times \vec{L}|^2 = p^2 L^2$$

I actually used this fact and explained it in more detail last time. Substituting it in, we get

$$1 = p^2 L^2 - 2 (\vec{p} \times \vec{L}) \cdot \vec{e} + e^2$$

Similarly, $\vec{e}$ and $\vec{L}$ are orthogonal! After all,

$$\vec{e} = \vec{p} \times \vec{L} - \frac{\vec{q}}{q}$$

The first term is orthogonal to $\vec{L}$ since it’s the cross product of $\vec{L}$ and some other vector. And the second term is orthogonal to $\vec{L}$ since $\vec{L}$ is the cross product of $\vec{q}$ and some other vector! So, we have

$$\vec{e} \cdot \vec{L} = 0$$

We’ll need this in a moment. Using the cyclic property of the scalar triple product, we can rewrite our equation as

$$1 = p^2 L^2 - 2 (\vec{e} \times \vec{p}) \cdot \vec{L} + e^2$$

This is nicer because it involves $\vec{L}$ in two places. If we divide both sides by $L^2$, we get

$$\frac{1}{L^2} = p^2 - \frac{2 (\vec{e} \times \vec{p}) \cdot \vec{L}}{L^2} + \frac{e^2}{L^2}$$

And now for the final flourish! Since $\vec{e} \cdot \vec{L} = 0$, we have $|\vec{L} \times \vec{e}|^2 = L^2 e^2$, so the right hand side is the dot product of a vector with itself:

$$\frac{1}{L^2} = \left| \vec{p} - \frac{\vec{L} \times \vec{e}}{L^2} \right|^2$$

This is the equation for Hamilton’s circle!
Now, beware: the momentum doesn’t usually move at a constant rate along Hamilton’s circle, since that would force the particle’s orbit to itself be circular.
But on the bright side, the momentum moves along Hamilton’s circle regardless of whether the particle’s orbit is elliptical, parabolic or hyperbolic. And we can easily distinguish the three cases using Hamilton’s circle!
After all, the center of Hamilton’s circle is the point $\vec{L} \times \vec{e}/L^2$, and since $\vec{e}$ and $\vec{L}$ are orthogonal, $|\vec{L} \times \vec{e}| = L e$,

so the distance of this center from the origin is

$$\frac{e}{L}$$

On the other hand, the radius of Hamilton’s circle is $1/L$. So his circle encloses the origin, goes through the origin, or does not enclose the origin depending on whether $e < 1$, $e = 1$ or $e > 1$. But we saw last time that these three cases correspond to elliptical, parabolic and hyperbolic orbits!
• If $e < 1$, the particle’s orbit is an ellipse and the origin lies inside Hamilton’s circle. The momentum goes round and round Hamilton’s circle as time passes.

• If $e = 1$, the particle’s orbit is a parabola and the origin lies exactly on Hamilton’s circle. The particle’s momentum approaches zero as time approaches $\pm \infty$, so its momentum goes around Hamilton’s circle exactly once as time passes.

• If $e > 1$, the particle’s orbit is a hyperbola and the origin lies outside Hamilton’s circle. The particle’s momentum approaches distinct nonzero values as time approaches $\pm \infty$, so its momentum goes around just a portion of Hamilton’s circle.
By the way, in general the curve traced out by the momentum vector of a particle is called a hodograph. So you can learn more about Hamilton’s circle with the help of that buzzword.
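Here’s a numerical illustration, a Python sketch of my own, in units where the particle’s mass and the force constant are 1: integrate the equations of motion and check that the momentum never leaves the circle of radius $1/L$ centered at $\vec{L} \times \vec{e}/L^2$, computed once from the initial data.

```python
import math

def add(u, v): return [a + b for a, b in zip(u, v)]
def scale(c, u): return [c * a for a in u]
def cross(u, v):
    return [u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0]]
def norm(u): return math.sqrt(sum(a * a for a in u))

def accel(q):
    # inverse square force law, in units where the mass and force constant are 1
    return scale(-1.0 / norm(q)**3, q)

def rk4_step(q, p, dt):
    # one 4th-order Runge-Kutta step for dq/dt = p, dp/dt = accel(q)
    k1q, k1p = p, accel(q)
    k2q, k2p = add(p, scale(dt/2, k1p)), accel(add(q, scale(dt/2, k1q)))
    k3q, k3p = add(p, scale(dt/2, k2p)), accel(add(q, scale(dt/2, k2q)))
    k4q, k4p = add(p, scale(dt, k3p)), accel(add(q, scale(dt, k3q)))
    wq = add(add(k1q, scale(2, k2q)), add(scale(2, k3q), k4q))
    wp = add(add(k1p, scale(2, k2p)), add(scale(2, k3p), k4p))
    return add(q, scale(dt/6, wq)), add(p, scale(dt/6, wp))

# an arbitrary bound orbit (energy < 0)
q, p = [1.0, 0.0, 0.0], [0.3, 1.1, 0.2]
Lv = cross(q, p)                                   # angular momentum
ev = add(cross(p, Lv), scale(-1.0 / norm(q), q))   # eccentricity vector
L = norm(Lv)
center = scale(1.0 / L**2, cross(Lv, ev))          # center of Hamilton's circle
radius = 1.0 / L

max_dev = 0.0
for _ in range(20000):
    q, p = rk4_step(q, p, 0.001)
    # the momentum's distance from the center should always equal the radius
    max_dev = max(max_dev, abs(norm(add(p, scale(-1.0, center))) - radius))

ok = max_dev < 1e-6
print(ok)
```

The deviation stays at the level of the integrator’s roundoff, for any bound initial conditions you care to try.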
As a blackbody gets hotter and hotter, its spectrum approaches the classical Rayleigh–Jeans law. That is, its true spectrum as given by the Planck law approaches the classical prediction over a larger and larger range of frequencies.
So, for an extremely hot blackbody, the spectrum of light we can actually see with our eyes is governed by the Rayleigh–Jeans law. This law says the color doesn’t depend on the temperature: only the brightness does!
And this color is shown above.
This involves human perception, not just straight physics. So David Madore needed to work out the response of the human eye to the Rayleigh–Jeans spectrum — “by integrating the spectrum against the CIE XYZ matching functions and using the definition of the sRGB color space.”
The color he got is sRGB(148,177,255). And according to the experts who sip latte all day and make up names for colors, this color is called ‘Perano’.
Here is some background material Madore wrote on colors and visual perception. It doesn’t include the whole calculation that leads to this particular color, so somebody should check it, but it should help you understand how to convert the blackbody spectrum at a particular temperature into an sRGB color:
I’m working on a math project involving the periodic table of elements and the Kepler problem—that is, the problem of a particle moving in an inverse square force law. That’s one reason I’ve been blogging about chemistry lately! I hope to tell you all about this project sometime—but right now I just want to say some very basic stuff about the ‘eccentricity vector’.
This vector is a conserved quantity for the Kepler problem. It was named the ‘Runge–Lenz vector’ after Lenz used it in 1924 to study the hydrogen atom in the framework of the ‘old quantum mechanics’ of Bohr and Sommerfeld: Lenz cited Runge’s popular German textbook on vector analysis from 1919, which explains this vector. But Runge never claimed any originality: he attributed this vector to Gibbs, who wrote about it in his book on vector analysis in 1901!
Nowadays many people call it the ‘Laplace–Runge–Lenz vector’, honoring Laplace’s discussion of it in his famous treatise on celestial mechanics in 1799. But in fact this vector goes back at least to Jakob Hermann, who wrote about it in 1710, triggering further work on this topic by Johann Bernoulli in the same year.
Nobody has seen signs of this vector in work before Hermann. So, we might call it the Hermann–Bernoulli–Laplace–Gibbs–Runge–Lenz vector, or just the Hermann vector. But I prefer to call it the eccentricity vector, because for a particle in an inverse square force law, its magnitude is the eccentricity of the particle’s orbit!
Let’s suppose we have a particle whose position $\vec{q}$ obeys this version of the inverse square force law:

$$\ddot{\vec{q}} = - \frac{\vec{q}}{q^3}$$

where I remove the arrow from a vector when I want to talk about its magnitude. So, I’m setting the mass of this particle equal to 1, along with the constant saying the strength of the force. That’s because I want to keep the formulas clean! With these conventions, the momentum of the particle is

$$\vec{p} = \dot{\vec{q}}$$
For this system it’s well-known that the following energy is conserved:

$$H = \frac{p^2}{2} - \frac{1}{q}$$

as well as the angular momentum vector:

$$\vec{L} = \vec{q} \times \vec{p}$$

But the interesting thing for me today is the eccentricity vector:

$$\vec{e} = \vec{p} \times \vec{L} - \frac{\vec{q}}{q}$$
Let’s check that it’s conserved! Taking its time derivative,

$$\dot{\vec{e}} = \dot{\vec{p}} \times \vec{L} + \vec{p} \times \dot{\vec{L}} - \frac{d}{dt} \frac{\vec{q}}{q}$$

But angular momentum is conserved so the second term vanishes, and

$$\frac{d}{dt} \frac{\vec{q}}{q} = \frac{\dot{\vec{q}}}{q} - \frac{(\vec{q} \cdot \dot{\vec{q}})\, \vec{q}}{q^3}$$

so we get

$$\dot{\vec{e}} = \dot{\vec{p}} \times \vec{L} - \frac{\dot{\vec{q}}}{q} + \frac{(\vec{q} \cdot \dot{\vec{q}})\, \vec{q}}{q^3}$$

But the inverse square force law says

$$\dot{\vec{p}} = - \frac{\vec{q}}{q^3}$$

so

$$\dot{\vec{e}} = - \frac{\vec{q} \times \vec{L}}{q^3} - \frac{\dot{\vec{q}}}{q} + \frac{(\vec{q} \cdot \dot{\vec{q}})\, \vec{q}}{q^3}$$

How can we see that this vanishes? Mind you, there are various geometrical ways to think about this, but today I’m in the mood for checking that my skills in vector algebra are sufficient for a brute-force proof—and I want to record this proof so I can see it later!

To get anywhere we need to deal with the cross product in the above formula:

$$\vec{q} \times \vec{L} = \vec{q} \times (\vec{q} \times \dot{\vec{q}}) = (\vec{q} \cdot \dot{\vec{q}})\, \vec{q} - q^2\, \dot{\vec{q}}$$

I could have fun talking about why this is true, but I won’t now! I’ll just use it:

$$\frac{\vec{q} \times \vec{L}}{q^3} = \frac{(\vec{q} \cdot \dot{\vec{q}})\, \vec{q}}{q^3} - \frac{\dot{\vec{q}}}{q}$$

and plug this into our formula for $\dot{\vec{e}}$:

$$\dot{\vec{e}} = - \frac{(\vec{q} \cdot \dot{\vec{q}})\, \vec{q}}{q^3} + \frac{\dot{\vec{q}}}{q} - \frac{\dot{\vec{q}}}{q} + \frac{(\vec{q} \cdot \dot{\vec{q}})\, \vec{q}}{q^3}$$

But look—everything cancels! So

$$\dot{\vec{e}} = 0$$
and the eccentricity vector is conserved!
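The cancellation can also be checked numerically. In this Python sketch (my own, with the $m = k = 1$ conventions above) we evaluate each term of $\dot{\vec{e}}$ at an arbitrary point in phase space and watch everything cancel:

```python
import math, random

def cross(u, v):
    return [u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0]]
def dot(u, v): return sum(a * b for a, b in zip(u, v))

random.seed(1)
q = [random.uniform(-2, 2) for _ in range(3)]   # arbitrary position
p = [random.uniform(-2, 2) for _ in range(3)]   # arbitrary momentum (= velocity, since m = 1)

r = math.sqrt(dot(q, q))
L = cross(q, p)                  # angular momentum; dL/dt = 0, so the p x dL/dt term drops out
pdot = [-x / r**3 for x in q]    # the inverse square force law
# d/dt (q/|q|) = p/|q| - (q.p) q / |q|^3
qhat_dot = [p[i] / r - dot(q, p) * q[i] / r**3 for i in range(3)]
# de/dt = (dp/dt) x L - d/dt (q/|q|)
t = cross(pdot, L)
edot = [t[i] - qhat_dot[i] for i in range(3)]

err = max(abs(x) for x in edot)
print(err < 1e-9)   # zero up to rounding error
```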
So, it seems that the inverse square force law has 7 conserved quantities: the energy $H$, the 3 components of the angular momentum $\vec{L}$, and the 3 components of the eccentricity vector $\vec{e}$. But they can’t all be independent, since the particle only has 6 degrees of freedom: 3 for position and 3 for momentum. There can be at most 5 independent conserved quantities, since something has to change. So there have to be at least two relations between the conserved quantities we’ve found.
The first of these relations is pretty obvious: $\vec{e}$ and $\vec{L}$ are at right angles, so

$$\vec{e} \cdot \vec{L} = 0$$

But wait, why are they at right angles? Because

$$\vec{e} = \vec{p} \times \vec{L} - \frac{\vec{q}}{q}$$

The first term is orthogonal to $\vec{L}$ because it’s a cross product of $\vec{p}$ and $\vec{L}$, and the second is orthogonal to $\vec{L}$ because $\vec{L}$ is a cross product of $\vec{q}$ and $\vec{p}$.
The second relation is a lot less obvious, but also more interesting. Let’s take the dot product of $\vec{e}$ with itself:

$$e^2 = |\vec{p} \times \vec{L}|^2 - \frac{2 (\vec{p} \times \vec{L}) \cdot \vec{q}}{q} + 1 = p^2 L^2 - \frac{2L^2}{q} + 1$$

using $(\vec{p} \times \vec{L}) \cdot \vec{q} = (\vec{q} \times \vec{p}) \cdot \vec{L} = L^2$. The right hand side is $2L^2 H + 1$, so

$$e^2 = 1 + 2 H L^2$$

which is our second relation between conserved quantities for the Kepler problem!
This relation makes a lot of sense if you know that $e$ is the eccentricity of the orbit. Then it implies:

• if $H > 0$ then $e > 1$ and the orbit is a hyperbola.

• if $H = 0$ then $e = 1$ and the orbit is a parabola.

• if $H < 0$ then $e < 1$ and the orbit is an ellipse (or circle).
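In fact both relations hold identically in $\vec{q}$ and $\vec{p}$, not just on solutions, so they’re easy to test with random phase space points. A Python sketch (mine, using the $m = k = 1$ conventions above):

```python
import math, random

def cross(u, v):
    return [u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0]]
def dot(u, v): return sum(a * b for a, b in zip(u, v))

random.seed(2)
ok = True
for _ in range(100):
    q = [random.uniform(-3, 3) for _ in range(3)]   # arbitrary position
    p = [random.uniform(-3, 3) for _ in range(3)]   # arbitrary momentum
    r = math.sqrt(dot(q, q))
    H = dot(p, p) / 2 - 1 / r                       # energy
    L = cross(q, p)                                 # angular momentum
    pxL = cross(p, L)
    e = [pxL[i] - q[i] / r for i in range(3)]       # eccentricity vector
    ok = ok and abs(dot(e, L)) < 1e-8                            # first relation
    ok = ok and abs(dot(e, e) - (1 + 2 * H * dot(L, L))) < 1e-8  # second relation

print(ok)
```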
But why is $e$ the eccentricity? And why does the particle move in a hyperbola, parabola or ellipse in the first place? We can show both of these things by taking the dot product of $\vec{e}$ and $\vec{q}$:

$$\vec{e} \cdot \vec{q} = (\vec{p} \times \vec{L}) \cdot \vec{q} - q$$

Using the cyclic property of the scalar triple product we can rewrite this as

$$\vec{e} \cdot \vec{q} = (\vec{q} \times \vec{p}) \cdot \vec{L} - q = L^2 - q$$

Now, we know that $\vec{q}$ moves in the plane orthogonal to $\vec{L}$. In this plane, which contains the vector $\vec{e}$, the equation $\vec{e} \cdot \vec{q} = L^2 - q$ defines a conic of eccentricity $e$. I won’t show this from scratch, but it may seem more familiar if we rotate the whole situation so this plane is the $xy$ plane and $\vec{e}$ points in the $x$ direction. Then in polar coordinates this equation says

$$e r \cos \theta = L^2 - r$$

or

$$r = \frac{L^2}{1 + e \cos \theta}$$

This is well-known, at least among students of physics who have solved the Kepler problem, to be the equation of a conic of eccentricity $e$.
Another thing that’s good to do is define a rescaled eccentricity vector. In the case of elliptical orbits, where $H < 0$, we define this by

$$\vec{M} = \frac{\vec{e}}{\sqrt{-2H}}$$

Then we can take our relation

$$e^2 = 1 + 2 H L^2$$

and rewrite it as

$$-2H (L^2 + M^2) = 1$$

and then divide by $-2H$, getting

$$L^2 + M^2 = -\frac{1}{2H}$$

This suggests an interesting similarity between $\vec{L}$ and $\vec{M}$, which turns out to be very important in a deeper understanding of the Kepler problem. And with more work, you can use this idea to show that $\tfrac{1}{2}(L^2 + M^2)$ is the Hamiltonian for a free particle on the 3-sphere. But more about that some other time, I hope!
Light can bounce off light by exchanging virtual charged particles! This gives nonlinear corrections to Maxwell’s equations, even in the vacuum—but they’re only noticeable when the electric field is about $10^{18}$ volts/meter or more. This is an enormous electric field, able to accelerate a proton from rest to Large Hadron Collider energies in just 5 micrometers!
Direct evidence for light-by-light scattering at high energy had proven elusive for decades, until the Large Hadron Collider (LHC) began its second data-taking period (Run 2). Collisions of lead ions in the LHC provide a uniquely clean environment to study light-by-light scattering. Bunches of lead ions that are accelerated to very high energy are surrounded by an enormous flux of photons. Indeed, the coherent action from the large number of 82 protons in a lead atom with all the electrons stripped off (as is the case for the lead ions in the LHC) gives rise to an electromagnetic field of up to $10^{25}$ volts per metre. When two lead ions pass close by each other at the centre of the ATLAS detector, but at a distance greater than twice the lead ion radius, those photons can still interact and scatter off one another without any further interaction between the lead ions, as the reach of the (much stronger) strong force is bound to the radius of a single proton. These interactions are known as ultra-peripheral collisions.
But now people want to see photon-photon scattering by shooting lasers at each other! One place they’ll try this is at the Extreme Light Infrastructure.
In 2019, a laser at the Extreme Light Infrastructure in Romania achieved a power of 10 petawatts for brief pulses — listen to the announcement for what that means!
I think it reached an intensity of $10^{29}$ watts per square meter, but I’m not sure. If you know the intensity $I$ in watts/square meter of a plane wave of light, you can compute the maximum strength of its electric field (in volts/meter) by

$$E_{\max} = \sqrt{\frac{2I}{\epsilon_0 c}}$$

where $\epsilon_0$ is the permittivity of the vacuum and $c$ is the speed of light. According to Dominik Wild, $I = 10^{29}$ watts per square meter gives $E_{\max} \approx 10^{16}$ volts/meter. If so, this is about 1/100 the field strength needed to see strong nonlinear corrections to Maxwell’s equations.
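Here’s that arithmetic as a quick Python sketch; the formula is the standard plane wave relation, the constants are the usual CODATA values, and the function name is my own:

```python
import math

epsilon_0 = 8.8541878128e-12   # vacuum permittivity, F/m
c = 299792458.0                # speed of light, m/s

def peak_field(intensity):
    """Peak electric field (V/m) of a plane wave of intensity I (W/m^2)."""
    return math.sqrt(2 * intensity / (epsilon_0 * c))

print(f"{peak_field(1e29):.2e}")   # 8.68e+15, i.e. roughly 10^16 volts/meter
```

So Wild’s figure checks out.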
In China, the Station of Extreme Light plans to build a laser that makes brief pulses of 100 petawatts. That’s 10,000 times the power of all the world’s electrical grids combined—for a very short time! They’re aiming for an intensity of $10^{28}$ watts/square meter:
The modification of Maxwell’s equations due to virtual particles was worked out by Heisenberg and Euler in 1936. (No, not that Euler.) They’re easiest to describe using a Lagrangian, but if we wrote out the equations we’d get Maxwell’s equations plus extra terms that are cubic in $\vec{E}$ and $\vec{B}$.
The Schwinger effect is when a very large static electric field ‘sparks the vacuum’ and creates real particles. This may put an upper limit on how many protons can be in an atomic nucleus, spelling an end to the periodic table.
Because they’re the first whose electron wavefunctions are described by quadratic functions of $x$, $y$ and $z$ — not just linear or constant. These are called ‘d orbitals’, and they look sort of like this:
More precisely: the wavefunctions of electrons in atoms depend on the distance $r$ from the nucleus and also the angles $\theta$ and $\phi$. The angular dependence is described by ‘spherical harmonics’, certain functions on the sphere. These are gotten by taking certain polynomials in $x$, $y$ and $z$ and restricting them to the unit sphere. Chemists have their own jargon for this:
• constant polynomial: s orbital
• linear polynomial: p orbital
• quadratic polynomial: d orbital
• cubic polynomial: f orbital
and so on.
To be even more precise, a spherical harmonic is an eigenfunction of the Laplacian on the sphere. Any such function is the restriction to the sphere of some homogeneous polynomial in $x$, $y$ and $z$ whose Laplacian in 3d space is zero. This polynomial can be constant, linear, etc.
The dimension of the space of spherical harmonics goes like 1, 3, 5, 7,… as we increase the degree of the polynomial starting from 0:

$$1$$

$$x, \quad y, \quad z$$

$$xy, \quad yz, \quad zx, \quad x^2 - y^2, \quad y^2 - z^2$$

etcetera. So, we get one s orbital, three p orbitals, five d orbitals and so on. Here I’ve arbitrarily chosen a basis of the space of quadratic polynomials with vanishing Laplacian, and I’m not claiming this matches the d orbitals in the pictures!
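You can watch the dimensions 1, 3, 5, 7, … emerge from the math. This Python sketch (my own construction) writes down the Laplacian as a matrix on homogeneous polynomials in $x$, $y$ and $z$ and computes the dimension of its kernel exactly, using rational arithmetic:

```python
from fractions import Fraction

def monomials(deg):
    # exponent triples (a, b, c) with a + b + c = deg
    return [(a, b, deg - a - b) for a in range(deg + 1) for b in range(deg - a + 1)]

def laplacian_matrix(deg):
    # matrix of the Laplacian from degree-deg to degree-(deg-2) homogeneous polynomials
    rows = monomials(deg - 2) if deg >= 2 else []
    cols = monomials(deg)
    idx = {m: i for i, m in enumerate(rows)}
    M = [[0] * len(cols) for _ in rows]
    for j, (a, b, c) in enumerate(cols):
        for k, coef in enumerate([a * (a - 1), b * (b - 1), c * (c - 1)]):
            if coef:
                m = [a, b, c]
                m[k] -= 2
                M[idx[tuple(m)]][j] = coef
    return M, len(cols)

def harmonic_dim(deg):
    # dimension of the kernel of the Laplacian = #columns - rank
    M, ncols = laplacian_matrix(deg)
    M = [[Fraction(x) for x in row] for row in M]
    rank, col = 0, 0
    while rank < len(M) and col < ncols:
        piv = next((i for i in range(rank, len(M)) if M[i][col] != 0), None)
        if piv is None:
            col += 1
            continue
        M[rank], M[piv] = M[piv], M[rank]
        for i in range(len(M)):
            if i != rank and M[i][col] != 0:
                f = M[i][col] / M[rank][col]
                M[i] = [x - f * y for x, y in zip(M[i], M[rank])]
        rank += 1
        col += 1
    return ncols - rank

dims = [harmonic_dim(l) for l in range(5)]
print(dims)   # [1, 3, 5, 7, 9]
```

One s, three p, five d, seven f, as promised.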
The transition metals are the first elements to use the d orbitals. This is why they’re so different from lighter elements.
Although there are 5 d orbitals, an electron occupying such an orbital can have spin up or down. This is why there are 10 transition metals per row!
This chart doesn’t show the last row of highly radioactive transition metals, just the ones you’re likely to see:
Look: 10 per row, all because there’s a 5d space of quadratic polynomials in with vanishing Laplacian. Math becomes matter.
The Madelung rules
Can we understand why the first transition element, scandium, has 21 electrons? Yes, if we’re willing to use the ‘Madelung rules’ explained last time. Let me review them rapidly here.
You’ll notice this chart has axes called $n$ and $\ell$.
As I just explained, the angular dependence of an orbital is determined by a homogeneous polynomial with vanishing Laplacian. In the above chart, the degree of this polynomial is called $\ell$. The space of such polynomials has dimension $2\ell + 1$.

But an orbital has an additional radial dependence, described using a number called $n$. The math, which I won’t go into, requires that $\ell \le n - 1$. That gives the above chart its roughly triangular appearance.
The letters s, p, d, f are just chemistry jargon for $\ell = 0, 1, 2, 3$.
Thanks to spin and the Pauli exclusion principle, we can pack at most $2(2\ell + 1)$ electrons into the orbitals with a given choice of $n$ and $\ell$. This bunch of orbitals is called a ‘subshell’.
The Madelung rules say the order in which subshells get filled:
Electrons are assigned to subshells in order of increasing values of $n + \ell$.

For subshells with the same value of $n + \ell$, electrons are assigned first to the subshell with lower $n$.
So let’s see what happens. Only when we hit $\ell = 2$ will we get transition metals!
• $n = 1$, $\ell = 0$. This is called the 1s subshell, and we can put 2 electrons in here. First we get hydrogen with 1 electron, then helium with 2. At this point all the subshells are full, so the ‘1st shell’ is complete, and helium is called a ‘noble gas’.
• $n = 2$, $\ell = 0$. This is called the 2s subshell, and we can put 2 more electrons in here. We get lithium with 3 electrons, and then beryllium with 4.
• $n = 2$, $\ell = 1$. This is called the 2p subshell, and we can put 6 more electrons in here. We get:
◦ boron with 5 electrons,
◦ carbon with 6,
◦ nitrogen with 7,
◦ oxygen with 8,
◦ fluorine with 9,
◦ neon with 10.
At this point all the subshells are full, so the 2nd shell is complete and neon is another noble gas.
• $n = 3$, $\ell = 0$. This is called the 3s subshell, and we can put 2 more electrons in here. We get sodium with 11 electrons, and magnesium with 12.
• $n = 3$, $\ell = 1$. This is called the 3p subshell, and we can put 6 more electrons in here. We get:
◦ aluminum with 13 electrons,
◦ silicon with 14,
◦ phosphorus with 15,
◦ sulfur with 16,
◦ chlorine with 17,
◦ argon with 18.
At this point all the subshells are full, so the 3rd shell is complete and argon is another noble gas.
• $n = 4$, $\ell = 0$. This is called the 4s subshell, and we can put 2 more electrons in here. We get potassium with 19 electrons and calcium with 20.
• $n = 3$, $\ell = 2$. This is called the 3d subshell, and we can put 10 electrons in here. Since now we’ve finally hit $\ell = 2$, and thus a d subshell, these are transition metals! We get:
◦ scandium with 21 electrons,
◦ titanium with 22,
◦ vanadium with 23,
◦ chromium with 24,
◦ manganese with 25,
◦ iron with 26,
◦ cobalt with 27,
◦ nickel with 28,
◦ copper with 29,
◦ zinc with 30.
And the story continues—but at least we’ve seen why the first batch of transition elements starts where it does!
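The bookkeeping above is easy to automate. Here’s a Python sketch of the two Madelung rules (my own; remember that it implements the rules, so it also reproduces their wrong predictions for chromium and copper discussed below):

```python
def madelung_order(max_n=7):
    # subshells (n, l) with l <= n - 1, sorted by n + l, then by n
    subshells = [(n, l) for n in range(1, max_n + 1) for l in range(n)]
    return sorted(subshells, key=lambda nl: (nl[0] + nl[1], nl[0]))

def configuration(num_electrons):
    # fill subshells in Madelung order; a subshell (n, l) holds 2(2l + 1) electrons
    letters = 'spdfghi'
    parts, left = [], num_electrons
    for n, l in madelung_order():
        if left <= 0:
            break
        k = min(left, 2 * (2 * l + 1))
        parts.append(f"{n}{letters[l]}{k}")
        left -= k
    return ' '.join(parts)

print(configuration(21))   # scandium: 1s2 2s2 2p6 3s2 3p6 4s2 3d1
```

Sure enough, the 21st electron is the first to land in a d subshell.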
But it’s important to realize that he’s attacking a version of the Madelung rules that is different from, and stronger than, the version stated above. My version only concerned atoms, not ions. The stronger version claims that you can use the Madelung rules not only to determine the ground state of an atom, but also those of the positive ions obtained by taking that atom and removing some electrons!
This stronger version breaks down if you consider scandium with one electron removed. As we’ve just seen, scandium has the same electrons as argon together with three more: two in the 4s orbital and one in the 3d orbital. This conforms to the Madelung rules.
But when you ionize scandium and remove one electron, it’s not the 3d electron that leaves—it’s one of the 4s electrons! This breaks the stronger version of the Madelung rules.
The weaker version of the Madelung rules also breaks down, but later in the transition metals. The first problem is with chromium, the second is with copper:
• By the Madelung rules, chromium should have 2 electrons in the 4s subshell and 4 in the 3d subshell. But in fact it has just 1 in the 4s and 5 in the 3d.

• By the Madelung rules, copper should have 2 electrons in the 4s subshell and 9 in the 3d. But in fact it has just 1 in the 4s and 10 in the 3d.
There are also other breakdowns in heavier transition metals, listed here:
These subtleties can only be understood by digging a lot deeper into how the electrons in an atom interact with each other. That’s above my pay grade right now. If you know a good place to learn more about this, let me know! I’m only interested in atoms here, not molecules.
Oxidation states of transition metals
Transition metals get some of their special properties because the electrons in the d subshell are easily removed. For example, this is why the transition metals conduct electricity.
Also, when reacting chemically with other elements, they lose different numbers of electrons. The different possibilities are called ‘oxidation states’.
For example, scandium has all the electrons of argon (Ar) plus two in an s orbital and one in a d orbital. It can easily lose 3 electrons, giving an oxidation state called Sc³⁺. Titanium has one more electron, so it can lose 4 and form Ti⁴⁺. And so on:
This accounts for the most obvious pattern in the chart below: the diagonal lines sloping up.
The red dots are common oxidation states, while the white dots are rarer oxidation states. For example iron (Fe) can lose 2 electrons, 3 electrons, 4 electrons (more rarely), 5 electrons, or 6 electrons (more rarely).
The diagonal lines sloping up come from the simple fact that as we move through a group of transition metals, there are more and more electrons in the d subshell, so more can easily be removed. But everything is complicated by the fact that electrons interact! So the trend doesn’t go on forever: manganese gives up as many as 7 electrons, but iron doesn’t easily give up 8, only at most 6. And there’s much more going on, too.
Note also that the two charts above don’t actually agree: the chart in color includes more rare oxidation states.
The colored chart of oxidation states in this post is from Wikicommons, made by Felix Wan, corrected to include the two most common oxidation states of ruthenium. The black-and-white chart is from the Chemistry