There will be a workshop on the categorical semantics of entropy at the CUNY Grad Center in Manhattan on Friday May 13th, organized by John Terilla. I was kindly invited to give an online tutorial beforehand on May 11, which I will give remotely to save carbon. Tai-Danae Bradley will also be giving a tutorial that day in person:

• Tutorial: Categorical Semantics of Entropy, Wednesday 11 May 2022, 13:00–16:30 Eastern Time, Room 5209 at the CUNY Graduate Center and via Zoom. Organized by John Terilla. To attend, register here.

12:00-1:00 Eastern Daylight Time — Lunch in Room 5209.

1:00-2:30 — Shannon entropy from category theory, John Baez, University of California Riverside; Centre for Quantum Technologies (Singapore); Topos Institute.

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.

2:30-3:00 — Coffee break.

3:00-4:30 — Operads and entropy, Tai-Danae Bradley, The Master’s University; Sandbox AQ.

This talk will open with a basic introduction to operads and their representations, with the main example being the operad of probabilities. I’ll then give a light sketch of how this framework leads to a small, but interesting, connection between information theory, abstract algebra, and topology, namely a correspondence between Shannon entropy and derivations of the operad of probabilities.

• Symposium on Categorical Semantics of Entropy, Friday 13 May 2022, 9:30-3:15 Eastern Daylight Time, Room 5209 at the CUNY Graduate Center and via Zoom. Organized by John Terilla. To attend, register here.

9:30-10:00 Eastern Daylight Time — Coffee and pastries in Room 5209.

The maximum entropy principle is a fascinating and productive lens with which to view both thermodynamics and statistical mechanics. In this talk, we present a categorification of the maximum entropy principle, using convex spaces and operads. Along the way, we will discuss a variety of examples of the maximum entropy principle and show how each application can be captured using our framework. This approach shines a new light on old constructions. For instance, we will show how we can derive the canonical ensemble by attaching a probabilistic system to a heat bath. Finally, our approach to this categorification has applications beyond the maximum entropy principle, and we will give a hint of how to adapt this categorification to the formalization of the composition of other systems.

11:00-11:45 — Polynomial functors and Shannon entropy, David Spivak, MIT and the Topos Institute.

The category Poly of polynomial functors in one variable is extremely rich, brimming with categorical gadgets (e.g. eight monoidal products, two closures, limits, colimits, etc.) and applications including dynamical systems, databases, open games, and cellular automata. In this talk I’ll show that objects in Poly can be understood as empirical distributions. In part using the standard derivative of polynomials, we obtain a functor to Set × Set^{op} which encodes an invariant of a distribution as a pair of sets. This invariant is well-behaved in the sense that it is a distributive monoidal functor: it acts on both distributions and maps between them, and it preserves both the sum and the tensor product of distributions. The Shannon entropy of the original distribution is then calculated directly from the invariant, i.e. only in terms of the cardinalities of these two sets. Given the many applications of polynomial functors and of Shannon entropy, having this link between them has potential to create useful synergies, e.g. to notions of entropic causality or entropic learning in dynamical systems.

12:00-1:30 — Lunch in Room 5209

1:30-2:15 — Higher entropy, Tom Mainiero, Rutgers New High Energy Theory Center.

Is the frowzy state of your desk no longer as thrilling as it once was? Are numerical measures of information no longer able to satisfy your needs? There is a cure! In this talk we’ll learn about: the secret topological lives of multipartite measures and quantum states; how a homological probe of this geometry reveals correlated random variables; the sly decategorified involvement of Shannon, Tsallis, Rényi, and von Neumann in this larger geometric conspiracy; and the story of how Gelfand, Neumark, and Segal’s construction of von Neumann algebra representations can help us uncover this informatic ruse. So come to this talk, spice up your entropic life, and bring new meaning to your relationship with disarray.

2:30-3:15 — On characterizing classical and quantum entropy, Arthur Parzygnat, Institut des Hautes Études Scientifiques.

In 2011, Baez, Fritz, and Leinster proved that the Shannon entropy can be characterized as a functor by a few simple postulates. In 2014, Baez and Fritz extended this theorem to provide a Bayesian characterization of the classical relative entropy, also known as the Kullback–Leibler divergence. In 2017, Gagné and Panangaden extended the latter result to include standard Borel spaces. In 2020, I generalized the first result on Shannon entropy so that it includes the von Neumann (quantum) entropy. In 2021, I provided partial results indicating that the Umegaki relative entropy may also have a Bayesian characterization. My results in the quantum setting are special applications of the recent theory of quantum Bayesian inference, which is a non-commutative extension of classical Bayesian statistics based on category theory. In this talk, I will give an overview of these developments and their possible applications in quantum information theory.

The Fifth International Conference on Applied Category Theory, ACT2022, will take place at the University of Strathclyde from 18 to 22 July 2022, preceded by the Adjoint School 2022 from 11 to 15 July. This conference follows previous events at Cambridge (UK), Cambridge (MA), Oxford and Leiden.

Applied category theory is important to a growing community of researchers who study computer science, logic, engineering, physics, biology, chemistry, social sciences, linguistics and other subjects using category-theoretic tools. The background and experience of our members is as varied as the systems being studied. The goal of the Applied Category Theory conference series is to bring researchers together, strengthen the applied category theory community, disseminate the latest results, and facilitate further development of the field.

Submissions

We accept submissions in English of original research papers, talks about work accepted/submitted/published elsewhere, and demonstrations of relevant software. Accepted original research papers will be published in a proceedings volume. The keynote addresses will be chosen from the accepted papers. The conference will include an industry showcase event and community meeting. We particularly encourage people from underrepresented groups to submit their work and the organizers are committed to non-discrimination, equity, and inclusion.

Submission formats

Extended Abstracts should be submitted describing the contribution and providing a basis for determining the topics and quality of the anticipated presentation (1-2 pages). These submissions will be adjudicated for inclusion as a talk at the conference. Such work should include references to any longer papers, preprints, or manuscripts providing additional details.

Conference Papers should present original, high-quality work in the style of a computer science conference paper (up to 14 pages, not counting the bibliography; detailed proofs may be included in an appendix for the convenience of the reviewers). Such submissions should not be an abridged version of an existing journal article (see item 1) although pre-submission arXiv preprints are permitted. These submissions will be adjudicated for both a talk and publication in the conference proceedings.

Software Demonstrations should be submitted in the format of an Extended Abstract (1-2 pages) giving the program committee enough information to assess the content of the demonstration. We are particularly interested in software that makes category theory research easier, or uses category theoretic ideas to improve software in other domains.

Extended abstracts and conference papers should be prepared with LaTeX. For conference papers please use the EPTCS style files available at

The following dates are all in 2022, and Anywhere On Earth.

• Submission Deadline: Monday 9 May
• Author Notification: Tuesday 7 June
• Camera-ready version due: Tuesday 28 June
• Adjoint School: Monday 11 to Friday 15 July
• Main Conference: Monday 18 to Friday 22 July

Conference format

We hope to run the conference as a hybrid event with talks recorded or streamed for remote participation. However, due to the state of the pandemic, the possibility of in-person attendance is not yet confirmed. Please be mindful of changing conditions when booking travel or hotel accommodations.

Financial support

Limited financial support will be available. Please contact the organisers for more information.

Program committee

• Jade Master, University of Strathclyde (Co-chair)
• Martha Lewis, University of Bristol (Co-chair)

The full program committee will be announced soon.

Organizing committee

• Jules Hedges, University of Strathclyde
• Jade Master, University of Strathclyde
• Fredrik Nordvall Forsberg, University of Strathclyde
• James Fairbanks, University of Florida

Steering committee

• John Baez, University of California, Riverside
• Bob Coecke, Cambridge Quantum
• Dorette Pronk, Dalhousie University
• David Spivak, Topos Institute

Perhaps the most interesting thing about this talk is that I sketch some work happening at the Topos Institute, where we are using techniques from category theory to design epidemiological models:

• Categories: the mathematics of connection

Abstract. As we move from the paradigm of modeling one single self-contained system at a time to modeling ‘open systems’ which interact with their — perhaps unmodeled — environment, category theory becomes a useful tool. It gives a mathematical language to describe the interface between an open system and its environment, the process of composing open systems along their interfaces, and how the behavior of a composite system relates to the behaviors of its parts. It is far from a silver bullet: at present, every successful application of category theory to open systems takes hard work. But I believe we are starting to see real progress.

In the previous two posts we talked about what a thermostatic system was, and how we think about composing them. In this post, we are going to back up from thermostatic systems a little bit, and talk about operads: a general framework for composing things! But we will not yet discuss how thermostatic systems use this framework—we’ll do that in the next post.

The basic idea behind this framework is the following. Suppose that we have a bunch of widgets, and we want to compose these widgets. If we are lucky, given two widgets there is a natural way of composing them. This is the case if the widgets are elements of a monoid; we simply use the monoid operation. This is also the case if the widgets are morphisms in a category; if the domain and codomain of two widgets match up, then they can be composed. More generally, n-morphisms in a higher category also have natural ways of composition.
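These two ‘canonical’ cases can be sketched in a few lines of code (my own illustration, not from the post): string concatenation plays the role of a monoid operation, and function composition plays the role of composition in a category.

```python
# Two cases where widgets have a canonical composition.

# Monoid case: strings form a monoid under concatenation.
def monoid_compose(a: str, b: str) -> str:
    return a + b

# Category case: functions compose when the codomain of one
# matches the domain of the next.
def category_compose(f, g):
    """Return the composite x -> g(f(x))."""
    return lambda x: g(f(x))

double_then_inc = category_compose(lambda x: 2 * x, lambda x: x + 1)

print(monoid_compose("ab", "cd"))  # abcd
print(double_then_inc(10))         # 21
```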

However, there is not always a canonical way of composing widgets. For instance, let $R$ be a commutative ring, and let $x$ and $y$ be elements of $R$. Then there are many ways to compose them: we could add them, subtract them, multiply them, etc. In fact any element of the free commutative ring $\mathbb{Z}[X, Y]$ gives a way of composing a pair of elements in a commutative ring. For instance, $X^2 + X Y$, when applied to $x$ and $y$, gives $x^2 + x y$. Note that there is nothing special here about the fact that we started with two elements of $R$; we could start with as many elements of $R$ as we liked, say $x_1, \ldots, x_n$, and any element of $\mathbb{Z}[X_1, \ldots, X_n]$ would give a ‘way of composing’ $x_1, \ldots, x_n$.
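To make the ring example concrete, here is a small sketch of my own (the helper `apply_polynomial` is hypothetical, not a standard API) in which each integer polynomial in $n$ variables gives one ‘way of composing’ $n$ ring elements, illustrated in the ring of integers.

```python
# Each polynomial with integer coefficients gives a "way of composing"
# ring elements. We represent a polynomial as a dict mapping an
# exponent tuple to its integer coefficient.

def apply_polynomial(coeffs, elements):
    """Evaluate the polynomial described by `coeffs` at `elements`."""
    total = 0
    for exponents, c in coeffs.items():
        term = c
        for x, e in zip(elements, exponents):
            term *= x ** e
        total += term
    return total

x, y = 4, 5
# Three different ways of composing x and y: x + y, x*y, x^2 - 3y.
print(apply_polynomial({(1, 0): 1, (0, 1): 1}, (x, y)))   # 9
print(apply_polynomial({(1, 1): 1}, (x, y)))              # 20
print(apply_polynomial({(2, 0): 1, (0, 1): -3}, (x, y)))  # 1
```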

The reader familiar with universal algebra should recognize that this situation is very general: we could do the exact same thing with vector spaces, modules, groups, algebras, or any more exotic structures that support a notion of ‘free algebra on a set of variables’.

Let’s also discuss a less algebraic example. A point process on a subset $X$ of Euclidean space can be described as an assignment of an $\mathbb{N}$-valued random variable $N(A)$ to each measurable set $A \subseteq X$, in a way that is countably additive under disjoint union of measurable sets.

The interpretation is that a point process gives a random collection of points in $X$, and $N(A)$ counts how many points fall in $A$. Moreover, this collection of points cannot have a limit point; there cannot be infinitely many points in any compact subset of $X$.

Now suppose that $f_1 \colon X_1 \to Z$ and $f_2 \colon X_2 \to Z$ are rigid embeddings such that $f_1(X_1) \cap f_2(X_2) = \emptyset$, and that $N_1$ is a point process on $X_1$ and $N_2$ is a point process on $X_2$. Then we can define a new point process $N$ on $Z$ (assuming that $N_1$ and $N_2$ are independent) by letting

$$N(A) = N_1(f_1^{-1}(A)) + N_2(f_2^{-1}(A)).$$

This is the union of the point process $N_1$ running in $f_1(X_1)$ and the point process $N_2$ running in $f_2(X_2)$.

The precise details here are not so important: what I want to display is the intuition that we are geometrically composing things that ‘live on’ a space. The embeddings $f_1$ and $f_2$ give us a way of gluing together a point process on $X_1$ and a point process on $X_2$ to get a point process on $Z$. We could have picked something else that lives on a space, like a scalar/vector field, but I chose point processes because they are easy to visualize and composing them is fairly simple (when composing vector fields one has to be careful that they ‘match’ at the edges).
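Here is a small simulation (my own sketch, with homogeneous Poisson processes standing in for general point processes) of gluing two independent point processes on disjoint intervals into one process on their union:

```python
import random

def sample_poisson_points(a, b, rate, rng):
    """Sample a homogeneous Poisson point process on [a, b) by
    accumulating exponentially distributed gaps between points."""
    points, t = [], a
    while True:
        t += rng.expovariate(rate)
        if t >= b:
            return points
        points.append(t)

rng = random.Random(1)
left = sample_poisson_points(0.0, 1.0, rate=5.0, rng=rng)   # process on [0, 1)
right = sample_poisson_points(2.0, 3.0, rate=5.0, rng=rng)  # process on [2, 3)

# The glued process on the disjoint union: counting points in a region
# of the big space restricts to counting in each piece.
glued = sorted(left + right)
count = lambda pts, a, b: sum(a <= p < b for p in pts)
assert count(glued, 0.0, 1.0) == len(left)
assert count(glued, 2.0, 3.0) == len(right)
print(len(left), len(right), len(glued))
```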

Operads

In all of the examples in the previous section, we have things that we want to compose, and ways of composing them. This situation is formalized by operads and operad algebras (which we will define very shortly). However, the confusing part is that the operad part corresponds to ‘ways of composing them’, and the operad algebra part corresponds to ‘things we want to compose’. Thus, the mathematics is somewhat ‘flipped’ from the way of thinking that comes naturally; we first think about the ways of composing things, and then we think about what things we want to compose, rather than first thinking about the things we want to compose and only later thinking about the ways of composing them!

Unfortunately, this is the logical way of presenting operads and operad algebras; we must define what an operad is before we can talk about their algebras, even if what we really care about is the algebras. Thus, without further ado, let us define what an operad is.

An operad $O$ consists of a collection $T$ of types (which are abstract, just like the ‘objects’ in a category are abstract), and for every list of types $t_1, \ldots, t_n, s \in T$ a collection of operations $O(t_1, \ldots, t_n; s)$.

These operations are the ‘ways of composing things’, but they themselves can be composed by ‘feeding into’ each other, in the following way.

Suppose that $f \in O(t_1, \ldots, t_n; s)$ and $g_i \in O(r_{i,1}, \ldots, r_{i,m_i}; t_i)$ for each $i = 1, \ldots, n$. Then we can make an operation

$$f \circ (g_1, \ldots, g_n) \in O(r_{1,1}, \ldots, r_{1,m_1}, \ldots, r_{n,1}, \ldots, r_{n,m_n}; s).$$

We visualize operads by letting an operation be a circle that can take several inputs and produces a single output. Then composition of operations is given by attaching the output of circles to the input of other circles. Pictured below is the composition of a unary operator, a nullary operator, and a binary operator with a ternary operator to create a ternary operator.

Additionally, for every type $t$ there is an ‘identity operation’ $1_t \in O(t; t)$ that satisfies

$$f \circ (1_{t_1}, \ldots, 1_{t_n}) = f$$

for any $f \in O(t_1, \ldots, t_n; s)$,

and

$$1_s \circ f = f$$

for any $f \in O(t_1, \ldots, t_n; s)$.

There is also an associativity law for composition that is a massive pain to write out explicitly, but is more or less exactly as one would expect. For unary operators it states

$$f \circ (g \circ h) = (f \circ g) \circ h.$$

The last condition for being an operad is that if $f \in O(t_1, \ldots, t_n; s)$ and $\sigma \in S_n$, the symmetric group on $n$ elements, then we can apply $\sigma$ to $f$ to get

$$f \sigma \in O(t_{\sigma(1)}, \ldots, t_{\sigma(n)}; s).$$

We require that $(f \sigma) \sigma' = f (\sigma \sigma')$ if $\sigma, \sigma' \in S_n$, and there are also some conditions for how the symmetric group actions interact with composition, which can be straightforwardly derived from the intuition that $\sigma$ permutes the arguments of an operation.

Note that our definition of an operad is what might typically be known as a ‘symmetric, colored operad’, but as we will always be using symmetric, colored operads, we choose to simply drop the modifiers.

That was a long definition, so it is time for an example. This example corresponds to the first situation in the first section, where we wanted to compose ring elements.

Define $\mathcal{P}$ to be an operad with one type, which we will call $\ast$, and let $\mathcal{P}(\ast, \ldots, \ast; \ast) = \mathbb{Z}[x_1, \ldots, x_n]$, where $\ast$ is repeated $n$ times before the semicolon.

Composition is simply polynomial substitution. That is, if

$$f \in \mathbb{Z}[x_1, \ldots, x_n]$$

and

$$g_i \in \mathbb{Z}[x_1, \ldots, x_{m_i}], \quad i = 1, \ldots, n,$$

then

$$f(g_1, \ldots, g_n),$$

with the variables of the $g_i$ renumbered so that different $g_i$ use distinct variables, is the composite of $f$ and $g_1, \ldots, g_n$. For instance, composing

$$f = x_1 x_2$$

and

$$g_1 = x_1 + x_2, \quad g_2 = x_1^2$$

results in

$$(x_1 + x_2) \, x_3^2.$$

The reader is invited to supply details for identities and the symmetry operators.
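Operadic composition by substitution can be mimicked in code (a sketch of my own, with operations modeled as plain Python functions rather than formal polynomials):

```python
def operad_compose(f, gs, arities):
    """Compose an operation f of arity len(gs) with operations gs,
    where gs[i] takes arities[i] arguments. The composite takes
    sum(arities) arguments, split in order among the gs."""
    def composite(*args):
        inner, i = [], 0
        for g, k in zip(gs, arities):
            inner.append(g(*args[i:i + k]))
            i += k
        return f(*inner)
    return composite

f = lambda a, b: a * b    # plays the role of x1 * x2
g1 = lambda a, b: a + b   # x1 + x2
g2 = lambda a: a ** 2     # x1 ** 2

h = operad_compose(f, [g1, g2], [2, 1])  # (x1 + x2) * x3 ** 2
print(h(1, 2, 3))  # (1 + 2) * 3 ** 2 = 27
```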

For the other example, define an operad $\mathcal{E}$ by letting the set of types be the set of compact subsets of $\mathbb{R}^2$ (we could consider something more exciting, but this works fine and is easy to visualize). An operation in $\mathcal{E}(X_1, \ldots, X_k; Y)$ consists of $k$ disjoint rigid embeddings $f_i \colon X_i \to Y$, where $i = 1, \ldots, k$.

We can visualize such an operation as simply a shape with holes in it.

Composition of such operations is just given by nesting the holes.

The outcome of the above composition is given by simply taking away the intermediate shapes (i.e. the big circle and the triangle).

Another source of examples for operads comes from the following construction. Suppose that $(C, \otimes)$ is a symmetric monoidal category. Define an operad $\mathrm{Op}(C)$ by letting

$$\mathrm{Op}(C)(x_1, \ldots, x_n; y) = \mathrm{Hom}_C(x_1 \otimes \cdots \otimes x_n, \, y),$$

where the types are $\mathrm{Ob}(C)$, the collection of objects in $C$, and $x_1, \ldots, x_n, y \in \mathrm{Ob}(C)$.

To compose operations $f$ and $g_1, \ldots, g_n$ (assuming that the types are such that these are composable), we simply take $f \circ (g_1 \otimes \cdots \otimes g_n)$. Moreover, the identity operation is simply the identity morphism, and the action of $S_n$ is given by the symmetric monoidal structure.

In fact, the second example that we talked about is an example of this construction! If we let $C$ be the category where the objects are compact subsets of $\mathbb{R}^2$, with embeddings as the morphisms, and let the symmetric monoidal product be disjoint union, then it is not too hard to show that the operad we end up with is the same as the one we described above.

Perhaps the most important example of this construction is when it is applied to $(\mathbf{Set}, \times)$, because this is important in the next section! This operad has sets as types, and an operation

$$f \in \mathrm{Op}(\mathbf{Set})(X_1, \ldots, X_n; Y)$$

is simply a function

$$f \colon X_1 \times \cdots \times X_n \to Y.$$

Operad algebras

Although ‘operad algebra’ is the name that has stuck in the literature, I think a better term would be ‘operad action’, because the analogy to keep in mind is that of a group action. A group action allows a group to ‘act on’ elements of a set; an operad algebra similarly allows an operad to ‘act on’ elements of a set.

Moreover, a group action can be described as a functor from the one-object category representing that group to $\mathbf{Set}$, and as we will see, an operad algebra can also be described as an ‘operad morphism’ from the operad to the operad $\mathrm{Op}(\mathbf{Set})$ just described in the last section.

In fact, this is how we will define an operad algebra; first we will define what an operad morphism is, and then we will define an operad algebra as an operad morphism to $\mathrm{Op}(\mathbf{Set})$.

An operad morphism $F$ from an operad $O$ to an operad $P$ is exactly what one would expect: it consists of

• For every type $t$ of $O$, a type $F(t)$ of $P$
• For every operation $f \in O(t_1, \ldots, t_n; s)$, an operation $F(f) \in P(F(t_1), \ldots, F(t_n); F(s))$

such that $F$ commutes with all of the things an operad does, i.e. composition, identities, and the action of the symmetric groups.

Thus an operad morphism from $O$ to $\mathrm{Op}(\mathbf{Set})$, also known as an operad algebra, consists of

• A set $F(t)$ for every type $t$ of $O$
• A function $F(f) \colon F(t_1) \times \cdots \times F(t_n) \to F(s)$ for every operation $f \in O(t_1, \ldots, t_n; s)$

such that the assignment of sets and functions preserves identities, composition, and the action of the symmetric groups.

Without further ado, let’s look at the examples. From any commutative ring $R$ we can produce an algebra of $\mathcal{P}$. We let $F(\ast) = R$ (considered as a set), and for

$$p \in \mathbb{Z}[x_1, \ldots, x_n]$$

we let

$$F(p) \colon R^n \to R, \qquad (a_1, \ldots, a_n) \mapsto p(a_1, \ldots, a_n).$$

We can also make an operad algebra of point processes, for $\mathcal{E}$. For a compact $X \subseteq \mathbb{R}^2$ we let $F(X)$ be the set of point processes on $X$. If $f = (f_1, \ldots, f_k)$ is an operation in $\mathcal{E}(X_1, \ldots, X_k; Y)$, where each $f_i \colon X_i \to Y$ is an embedding, then we let $F(f)$ be the map that sends point processes $N_1, \ldots, N_k$ on $X_1, \ldots, X_k$ respectively to the point process $N$ on $Y$ defined by

$$N(A) = \sum_{i=1}^k N_i(f_i^{-1}(A)).$$

Finally, if $(C, \otimes)$ is a symmetric monoidal category, there is a way to make an operad algebra of $\mathrm{Op}(C)$ from a special type of functor $F \colon C \to \mathbf{Set}$. This is convenient, because it is often easier to prove that the functor satisfies the necessary properties than it is to prove that the algebra is in fact well-formed.

The special kind of functor we need is a lax symmetric monoidal functor. This is a functor $F \colon C \to \mathbf{Set}$ equipped with a natural transformation

$$\mu_{x,y} \colon F(x) \times F(y) \to F(x \otimes y)$$

that is well-behaved with respect to the associator, identity, and symmetric structure of $C$. We call $\mu$ the laxator, and formally speaking, a lax symmetric monoidal functor consists of a functor along with a laxator.
I won’t go into detail about the whole construction that makes an operad algebra out of a lax symmetric monoidal functor, but the basic idea is that given an operation in $\mathrm{Op}(C)(x, y; z)$ (which is a morphism $f \colon x \otimes y \to z$), we can construct a function $F(x) \times F(y) \to F(z)$ by composing

$$\mu_{x,y} \colon F(x) \times F(y) \to F(x \otimes y)$$

with

$$F(f) \colon F(x \otimes y) \to F(z).$$

This basic idea can be extended using associativity to produce a function $F(x_1) \times \cdots \times F(x_n) \to F(y)$ from an operation in $\mathrm{Op}(C)(x_1, \ldots, x_n; y)$.

As an example of this construction, consider point processes again. We can make a lax symmetric monoidal functor by sending a compact set $X$ to the set of point processes on $X$, and an embedding $f \colon X \to Y$ to the map that sends a point process $N$ to the point process $N'$ defined by

$$N'(A) = N(f^{-1}(A)).$$

The laxator sends a point process $N_1$ on $X_1$ and a point process $N_2$ on $X_2$ to the point process $N$ on $X_1 \sqcup X_2$ defined by

$$N(A) = N_1(A \cap X_1) + N_2(A \cap X_2).$$

The reader should inspect this definition and think about why it is equivalent to the earlier definition for the operad algebra of point processes.

Summary

This was a long post, so I’m going to try and go over the main points so that you can organize what you just learned in some sort of coherent fashion.

First I talked about how there frequently arise situations in which there isn’t a canonical way of ‘composing’ two things. The two examples that I gave were elements of a ring, and structures on spaces, specifically point processes.

I then talked about the formal way that we think about these situations. Namely, we organize the ‘ways of composing things’ into an operad, and then we organize the ‘things that we want to compose’ into an operad algebra. Along the way, I discussed a convenient way of making an operad out of a symmetric monoidal category, and an operad algebra out of a lax symmetric monoidal functor.

This construction will be important in the next post, when we make an operad of ‘ways of composing thermostatic systems’ and an operad algebra of thermostatic systems to go along with it.

See all four parts of this series:
• Part 1: thermostatic systems and convex sets.
• Part 2: composing thermostatic systems.
• Part 3: operads and their algebras.
• Part 4: the operad for composing thermostatic systems.

and he gave an overview of what a ‘thermostatic system’ is.

In this post, I want to talk about how to compose thermostatic systems. We will not yet use category theory, saving that for another post; instead we will give a ‘nuts-and-bolts’ approach, based on examples.

Suppose that we have two thermostatic systems and we put them in thermal contact, so that they can exchange heat energy. Then we predict that their temperatures should equalize. What does this mean precisely, and how do we derive this result?

Recall that a thermostatic system is given by a convex space $X$ and a concave entropy function $S \colon X \to [-\infty, \infty]$. A ‘tank’ of constant heat capacity, whose state is solely determined by its energy $U$, has state space $X = \mathbb{R}_{>0}$ and entropy function $S(U) = C \log U$, where $C$ is the heat capacity.

Now suppose that we have two tanks of heat capacity $C_1$ and $C_2$ respectively. As thermostatic systems, the state of both tanks is described by two energy variables, $U_1$ and $U_2$, and we have entropy functions

$$S_1(U_1) = C_1 \log U_1, \qquad S_2(U_2) = C_2 \log U_2.$$

By conservation of energy, the total energy of both tanks must remain constant, so

$$U_1 + U_2 = U$$

for some fixed $U$; equivalently,

$$U_2 = U - U_1.$$

The equilibrium state then has maximal total entropy subject to this constraint. That is, an equilibrium state $(U_1, U_2)$ must satisfy

$$S_1(U_1) + S_2(U_2) = \sup_{U_1' + U_2' = U} \left( S_1(U_1') + S_2(U_2') \right).$$

We can now derive the condition of equal temperature from this condition. In thermodynamics, temperature is defined by

$$\frac{1}{T} = \frac{\partial S}{\partial U}.$$

The interested reader should calculate this for our entropy functions, and in doing this, see why we identify $C$ with the heat capacity. Now, manipulating the condition of equilibrium, we get

$$S_1(U_1) + S_2(U - U_1) = \sup_{U_1'} \left( S_1(U_1') + S_2(U - U_1') \right).$$

As a function of $U_1'$, the right hand side of this equation must have derivative equal to $0$ at the maximizer $U_1' = U_1$. Thus,

$$\frac{d S_1}{d U_1}(U_1) + \frac{d}{d U_1'} S_2(U - U_1') \Big|_{U_1' = U_1} = 0.$$

Now, note that if $U_2 = U - U_1$, then

$$\frac{d}{d U_1} S_2(U - U_1) = -\frac{d S_2}{d U_2}(U_2).$$

Thus, the condition of equilibrium is

$$\frac{d S_1}{d U_1}(U_1) = \frac{d S_2}{d U_2}(U_2).$$

Using the fact that

$$\frac{1}{T_i} = \frac{d S_i}{d U_i}(U_i),$$

the above equation reduces to

$$T_1 = T_2,$$

so we have our expected condition of temperature equilibration!
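As a numerical sanity check (my own sketch, assuming the tank entropy $S_i(U_i) = C_i \log U_i$ used above), we can maximize the total entropy by brute force and confirm that the temperatures $T_i = U_i / C_i$ agree at the optimum:

```python
import math

# Tanks with entropy S_i(U_i) = C_i * log(U_i); total energy U is conserved.
C1, C2, U = 2.0, 3.0, 10.0
total_entropy = lambda U1: C1 * math.log(U1) + C2 * math.log(U - U1)

# Brute-force the constrained entropy maximum over a fine grid.
U1_best = max((a / 100000 * U for a in range(1, 100000)), key=total_entropy)

# Temperatures from 1/T = dS/dU, which for these tanks gives T_i = U_i / C_i.
T1 = U1_best / C1
T2 = (U - U1_best) / C2
print(round(T1, 3), round(T2, 3))  # both 2.0: the temperatures equalize
```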

The result of composing several thermostatic systems should be a new thermostatic system. In the case above, the new thermostatic system is described by a single variable: the total energy of the system, $U = U_1 + U_2$. The entropy function of this new thermostatic system is given by the constrained supremum:

$$S(U) = \sup_{U_1 + U_2 = U} \left( S_1(U_1) + S_2(U_2) \right).$$

The reader should verify that this ends up being the same as a system with heat capacity $C_1 + C_2$, i.e. with entropy function given by

$$S(U) = (C_1 + C_2) \log U + \mathrm{const}.$$

A very similar argument goes through when one has two systems that can exchange both heat and volume; both temperature and pressure are equalized as a consequence of entropy maximization. We end up with a system that is parameterized by total energy and total volume, and has an entropy function that is a function of those quantities.

The general procedure is the following. Suppose that we have $n$ thermostatic systems, $(X_1, S_1), \ldots, (X_n, S_n)$. Let $Y$ be a convex space, that we think of as describing the quantities that are conserved when we compose the thermostatic systems (i.e., total energy, total volume, etc.). Each value $y \in Y$ of the conserved quantities corresponds to many different possible values for $(x_1, \ldots, x_n) \in X_1 \times \cdots \times X_n$. We represent this with a relation

$$R \subseteq X_1 \times \cdots \times X_n \times Y.$$

We then turn $Y$ into a thermostatic system by using the entropy function

$$S(y) = \sup_{(x_1, \ldots, x_n, \, y) \in R} \left( S_1(x_1) + \cdots + S_n(x_n) \right).$$

It turns out that if we require $R$ to be a convex relation (that is, a convex subspace of $X_1 \times \cdots \times X_n \times Y$) then $S$ as defined above ends up being a concave function, so $(Y, S)$ is a true thermostatic system.
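A brute-force version of this constrained supremum (my own sketch; the relation here is conservation of total energy, recovering the two-tank example) shows the composite entropy differing from $(C_1 + C_2)\log U$ by a constant:

```python
import math

def composite_entropy(S1, S2, U, grid=2000):
    """Constrained sup of S1(U1) + S2(U - U1) over the conservation relation."""
    return max(S1(a / grid * U) + S2(U - a / grid * U) for a in range(1, grid))

C1, C2 = 2.0, 3.0
S1 = lambda U: C1 * math.log(U)
S2 = lambda U: C2 * math.log(U)

# The composite should behave like a tank of heat capacity C1 + C2, i.e.
# S(U) = (C1 + C2) log U + const, with a constant independent of U.
const = lambda U: composite_entropy(S1, S2, U) - (C1 + C2) * math.log(U)
print(round(const(10.0), 3), round(const(20.0), 3))  # equal constants
```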

We will have to wait until a later post in the series to see exactly how we describe this procedure using category theory. For now, however, I want to talk about why this procedure makes sense.

In the statistical mechanical interpretation, entropy is related to the probability of observing a specific macrostate. As we scale the system, the theory of large deviations tells us that seeing any macrostate other than the most probable macrostate is highly unlikely. Thus, we can find the macrostate that we will observe in practice by finding the entropy maxima. For an exposition of this point of view, see this paper:

There is also a dynamical systems interpretation of entropy, where entropy serves as a Lyapunov function for a dynamical system. This is the viewpoint taken here:

• Wassim M. Haddad, A Dynamical Systems Theory of Thermodynamics, Princeton U. Press.

In each of these viewpoints, however, the maximization of entropy is not global, but rather constrained. The dynamical system only maximizes entropy along its orbit, and the statistical mechanical system maximizes entropy with respect to constraints on the probability distribution.

We can think of thermostatics as a ‘common refinement’ of both of these points of view. We are agnostic as to the mechanism by which constrained maximization of entropy takes place and we are simply interested in investigating its consequences. We expect that a careful formalization of either system should end up deriving something similar to our thermostatic theory in the limit.

In his book Ramanujan: Twelve Lectures on Subjects Suggested by His Life and Work, G. H. Hardy tells this famous story:

He could remember the idiosyncracies of numbers in an almost uncanny way. It was Littlewood who said every positive integer was one of Ramanujan’s personal friends. I remember once going to see him when he was lying ill at Putney. I had ridden in taxi-cab No. 1729, and remarked that the number seemed to be rather a dull one, and that I hoped it was not an unfavourable omen. “No,” he replied, “it is a very interesting number; it is the smallest number expressible as the sum of two cubes in two different ways.”

Namely,

$$1729 = 1^3 + 12^3 = 9^3 + 10^3.$$
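Ramanujan’s claim is easy to verify by exhaustive search (a quick check of my own):

```python
from itertools import combinations_with_replacement

# Collect sums of two positive cubes a^3 + b^3 with a <= b. Any number
# below 1729 with two representations would use cubes up to 12^3, so
# searching a, b < 30 is more than enough.
sums = {}
for a, b in combinations_with_replacement(range(1, 30), 2):
    sums.setdefault(a**3 + b**3, []).append((a, b))

# The smallest sum arising in two different ways:
taxicab = min(n for n, ways in sums.items() if len(ways) >= 2)
print(taxicab, sums[taxicab])  # 1729 [(1, 12), (9, 10)]
```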

But there’s more to this story than meets the eye.

First, it’s funny how this story becomes more dramatic with each retelling. In the foreword to Hardy’s book A Mathematician’s Apology, his friend C. P. Snow tells it thus:

Hardy used to visit him, as he lay dying in hospital at Putney. It was on one of those visits that there happened the incident of the taxicab number. Hardy had gone out to Putney by taxi, as usual his chosen method of conveyance. He went into the room where Ramanujan was lying. Hardy, always inept about introducing a conversation, said, probably without a greeting, and certainly as his first remark: “I thought the number of my taxicab was 1729. It seemed to me rather a dull number.” To which Ramanujan replied: “No, Hardy! No, Hardy! It is a very interesting number. It is the smallest number expressible as the sum of two cubes in two different ways.”

Here Hardy becomes “inept” and makes his comment “probably without a greeting, and certainly as his first remark”. Perhaps the ribbing of a friend who knew Hardy’s ways?

I think I’ve seen later versions where Hardy “burst into the room”.

But it’s common for legends to be embroidered with the passage of time. Here’s something more interesting. In Ono and Trebat-Leder’s paper The 1729 K3 surface, they write:

While this anecdote might give one the impression that Ramanujan came up with this amazing property of 1729 on the spot, he actually had written it down before even coming to England.

In fact they point out that Ramanujan wrote it down more than once!

Before he went to England, Ramanujan mainly published by posting puzzles to the questions section of the Journal of the Indian Mathematical Society. In 1913, in Question 441, he challenged the reader to prove a formula expressing a specific sort of perfect cube as a sum of three perfect cubes. If you keep simplifying this formula to see why it works, you eventually get

$$1^3 + 12^3 = 9^3 + 10^3.$$

In Ramanujan’s Notebooks, Part III, Bruce Berndt explains that Ramanujan developed a method for finding solutions of Euler’s diophantine equation

$$a^3 + b^3 = c^3 + d^3$$

in his “second notebook”. This is one of three notebooks Ramanujan left behind after his death—and the results in this one were written down before he first went to England. In Item 20(iii) he describes his method and lists many example solutions, the simplest being

$$1^3 + 12^3 = 9^3 + 10^3.$$

In 1915 Ramanujan posed another puzzle about writing a sixth power as a sum of three cubes, Question 661. And he posed a puzzle about writing $1$ as a sum of three cubes, Question 681.

Finally, four or five years later, Ramanujan revisited the equation in his so-called Lost Notebook. This was actually a pile of 138 loose unnumbered pages written by Ramanujan in the last two years of his life, 1919 and 1920. George Andrews found them in a box in Trinity College, Cambridge much later, in 1976.

Now the pages have been numbered, published and intensively studied: George Andrews and Bruce Berndt have written five books about them! Here is page 341 of Ramanujan’s Lost Notebook, where he came up with a method for finding an infinite family of integer solutions to the equation $a^3 + b^3 = c^3 + d^3$:

As you can see, one example is

$$1^3 + 12^3 = 9^3 + 10^3.$$

In Section 8.5 of George Andrews and Bruce Berndt’s book Ramanujan’s Lost Notebook: Part IV, they discuss Ramanujan’s method, calling it “truly remarkable”.

In short, Ramanujan was well aware of the special properties of the number 1729 before Hardy mentioned it. And something prompted Ramanujan to study the equation again near the end of his life, and find a new way to solve it.

Could it have been the taxicab incident??? Or did Hardy talk about the taxi after Ramanujan had just thought about the number 1729 yet again? In the latter case, it’s hardly a surprise that Ramanujan remembered it.

Thinking about this story, I’ve started wondering about what really happened here. First of all, as James Dolan pointed out to me, you don’t need to be a genius to notice that

$1729 = 1000 + 729 = 10^3 + 9^3$

and

$1729 = 1728 + 1 = 12^3 + 1^3$

Was Hardy, the great number theorist, so blind to the properties of numbers that he didn’t notice either of these ways of writing 1729 as a sum of two cubes? Base ten makes them very easy to spot if you know your cubes, and I’m sure Hardy knew $9^3 = 729$ and $12^3 = 1728$.
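By the way, it’s easy to check by brute force that 1729 really is the smallest number expressible as a sum of two positive cubes in two different ways. Here’s a quick sketch in plain Python (the search bound of 20 is an arbitrary choice, large enough to catch the answer):

```python
from collections import defaultdict

# Collect all ways of writing n = a^3 + b^3 with 1 <= a <= b <= 20.
ways = defaultdict(list)
for a in range(1, 21):
    for b in range(a, 21):
        ways[a**3 + b**3].append((a, b))

# The smallest sum of two cubes that arises in two different ways.
taxicab = min(n for n, pairs in ways.items() if len(pairs) >= 2)
print(taxicab, ways[taxicab])  # 1729 [(1, 12), (9, 10)]
```

The two pairs it finds are exactly the two decompositions of 1729 discussed above.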

Second of all, how often do number theorists come out and say that a number is uninteresting? Except in that joke about the “least uninteresting number”, I don’t think I’ve heard it happen.

My wife Lisa suggested an interesting possibility that would resolve all these puzzles:

Hardy either knew of Ramanujan’s work on this problem or noticed himself that 1729 had a special property. He wanted to cheer up his dear friend Ramanujan, who was lying deathly ill in the hospital. So he played the fool by walking in and saying that 1729 was “rather dull”.

I have no real evidence for this, and I’m not claiming it’s true. But I like how it flips the meaning of the story. And it’s not impossible. Hardy was, after all, a bit of a prankster: each time he sailed across the Atlantic he sent out a postcard saying he had proved the Riemann Hypothesis, just in case he drowned.

We could try to see if there really was a London taxi with number 1729 at that time. It would be delicious to discover that it was merely an invention of Hardy’s. But I don’t know if records of London taxi-cab numbers from around 1919 still exist.

Maybe I’ll let C. P. Snow have the last word. After telling his version of the incident with Hardy, Ramanujan and the taxicab, he writes:

This is the exchange as Hardy recorded it. It must be substantially accurate. He was the most honest of men; and further no one could possibly have invented it.

The Kepler problem is the study of a particle moving in an attractive inverse square force. In classical mechanics, this problem shows up when you study the motion of a planet around the Sun in the Solar System. In quantum mechanics, it shows up when you study the motion of an electron around a proton in a hydrogen atom.

In Part 2 we saw that the classical Kepler problem has, besides energy and the three components of angular momentum, three more conserved quantities: the components of the eccentricity vector!

This was discovered long ago, in 1710, by the physicist Jakob Hermann. But thanks to Noether, we now know that in classical mechanics, conserved quantities come from symmetries. In the Kepler problem, conservation of energy comes from time translation symmetry, while conservation of the angular momentum comes from rotation symmetry. Which symmetries give conservation of the eccentricity vector?

As we shall see, these symmetries are rotations in 4-dimensional space. These include the obvious rotations in 3-dimensional space which give angular momentum. The other 4-dimensional rotations act in a much less obvious way, and give the eccentricity vector.

In fact, we’ll see that the Kepler problem can be rephrased in terms of a free particle moving around on a sphere in 4-dimensional space. This is a nice explanation of the 4-dimensional rotation symmetry.

After that we’ll see a second way to rephrase the Kepler problem: in terms of a massless, relativistic free particle moving at the speed of light on a sphere in 4-dimensional space. Our first formulation will not involve relativity. This second will.

All this is very nice. You can read some fun explanations of the first formulation here:

But how could you guess this 4-dimensional rotation symmetry if you didn’t know about it already? One systematic approach uses Poisson brackets. I won’t explain these; I’ll just dive in and use them!

Remember, the particle in the Kepler problem has various observables, which are all ultimately functions of its position and momentum:

• position: $\vec q$

• momentum: $\vec p$

• energy: $H = \frac{1}{2}|\vec p|^2 - \frac{1}{|\vec q|}$

• angular momentum: $\vec L = \vec q \times \vec p$

• the eccentricity vector: $\vec e = \vec p \times \vec L - \frac{\vec q}{|\vec q|}$

I’ll use conventions where the Poisson brackets of the components of position and momentum are taken to be

$\{q_i, p_j\} = \delta_{ij}$
From this, using the rules for Poisson brackets, we can calculate the Poisson brackets of everything else. For starters:

$\{H, L_i\} = 0, \qquad \{H, e_i\} = 0$

These equations are utterly unsurprising, since they are equivalent to saying that angular momentum and the eccentricity vector are conserved. More interestingly, we have

$\{L_i, L_j\} = \epsilon_{ijk} L_k$

$\{L_i, e_j\} = \epsilon_{ijk} e_k$

$\{e_i, e_j\} = -2H \epsilon_{ijk} L_k$

where all the indices go from 1 to 3, I’m summing over repeated indices even if they’re both subscripts, and $\epsilon_{ijk}$ are the Levi–Civita symbols.
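These bracket relations can be spot-checked numerically. The sketch below (plain Python; the phase-space point and step size are arbitrary choices of mine, not from the original post) approximates Poisson brackets by central differences and verifies $\{e_1, e_2\} = -2H L_3$, assuming the conventions $H = \tfrac12 |\vec p|^2 - 1/|\vec q|$, $\vec L = \vec q \times \vec p$, $\vec e = \vec p \times \vec L - \vec q/|\vec q|$:

```python
# Numerical check of {e_1, e_2} = -2 H L_3 for the Kepler problem,
# with H = |p|^2/2 - 1/|q|, L = q x p, e = p x L - q/|q|.

def cross(u, v):
    return [u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0]]

def norm(u):
    return sum(x*x for x in u) ** 0.5

def H(q, p):
    return sum(x*x for x in p) / 2 - 1 / norm(q)

def L(q, p):
    return cross(q, p)

def e(q, p):
    pL = cross(p, L(q, p))
    n = norm(q)
    return [pL[i] - q[i] / n for i in range(3)]

def poisson(f, g, q, p, h=1e-5):
    """{f,g} = sum_i (df/dq_i dg/dp_i - df/dp_i dg/dq_i), by central differences."""
    total = 0.0
    for i in range(3):
        qp, qm = list(q), list(q)
        pp, pm = list(p), list(p)
        qp[i] += h; qm[i] -= h
        pp[i] += h; pm[i] -= h
        total += ((f(qp, p) - f(qm, p)) * (g(q, pp) - g(q, pm))
                - (f(q, pp) - f(q, pm)) * (g(qp, p) - g(qm, p))) / (4 * h * h)
    return total

q = [1.0, 0.3, -0.2]
p = [0.1, 0.8, 0.4]
lhs = poisson(lambda q, p: e(q, p)[0], lambda q, p: e(q, p)[1], q, p)
rhs = -2 * H(q, p) * L(q, p)[2]
print(abs(lhs - rhs) < 1e-6)  # True
```

The same scaffolding checks the other brackets if you swap in different observables.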

Now, the factor of $-2H$ above is annoying. But on the region of phase space where $H < 0$—that is, the space of bound states, where the particle carries out an elliptical orbit—we can define a new vector to deal with this annoyance:

$\vec M = \frac{\vec e}{\sqrt{-2H}}$

Now we easily get

$\{L_i, L_j\} = \epsilon_{ijk} L_k$

$\{L_i, M_j\} = \epsilon_{ijk} M_k$

$\{M_i, M_j\} = \epsilon_{ijk} L_k$

This is nicer, but we can simplify it even more if we introduce some new vectors that are linear combinations of $\vec L$ and $\vec M$, namely half their sum and half their difference:

$\vec A = \frac{\vec L + \vec M}{2}, \qquad \vec B = \frac{\vec L - \vec M}{2}$

Then we get

$\{A_i, A_j\} = \epsilon_{ijk} A_k$

$\{B_i, B_j\} = \epsilon_{ijk} B_k$

$\{A_i, B_j\} = 0$
So, the observables $A_i$ and $B_i$ contain the same information as the angular momentum and eccentricity vectors, but now they commute with each other!
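The vanishing bracket is worth writing out, since it is where the two families decouple. Writing $\vec A = \tfrac12(\vec L + \vec M)$ and $\vec B = \tfrac12(\vec L - \vec M)$, bilinearity and antisymmetry of the Poisson bracket give

```latex
\begin{aligned}
\{A_i, B_j\}
  &= \tfrac{1}{4}\,\{L_i + M_i,\; L_j - M_j\} \\
  &= \tfrac{1}{4}\bigl(\{L_i, L_j\} - \{L_i, M_j\} + \{M_i, L_j\} - \{M_i, M_j\}\bigr) \\
  &= \tfrac{1}{4}\bigl(\epsilon_{ijk} L_k - \epsilon_{ijk} M_k + \epsilon_{ijk} M_k - \epsilon_{ijk} L_k\bigr) = 0,
\end{aligned}
```

using $\{M_i, L_j\} = -\{L_j, M_i\} = -\epsilon_{jik} M_k = \epsilon_{ijk} M_k$.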

What does this mean?

Well, when you’re first learning math the Levi–Civita symbols may seem like just a way to summarize the funny rules for cross products in 3-dimensional space. But as you proceed, you ultimately learn that $\mathbb{R}^3$ with its cross product is the Lie algebra $\mathfrak{so}(3)$ of the Lie group $\mathrm{SO}(3)$ of rotations in 3-dimensional space. From this viewpoint, the Levi–Civita symbols are nothing but the structure constants for the Lie algebra $\mathfrak{so}(3)$: that is, a way of describing the bracket operation in this Lie algebra in terms of basis vectors.

So, what we’ve got here are two commuting copies of $\mathfrak{so}(3)$, one having the $A_i$ as a basis and the other having the $B_i$ as a basis, both with the Poisson bracket as their Lie bracket.

A better way to say the same thing is that we’ve got a single 6-dimensional Lie algebra

$\mathfrak{so}(3) \oplus \mathfrak{so}(3)$

having both the $A_i$ and $B_i$ as basis. But then comes the miracle:

$\mathfrak{so}(3) \oplus \mathfrak{so}(3) \cong \mathfrak{so}(4)$

The easiest way to see this is to realize that the unit sphere in 4 dimensions, $S^3$, is itself a Lie group with Lie algebra isomorphic to $\mathfrak{so}(3)$. Namely, it’s the unit quaternions!—or if you prefer, the Lie group $\mathrm{SU}(2)$. Like any Lie group it acts on itself via left and right translations, which commute. But these are actually ways of rotating $S^3$. So, you get a map of Lie algebras from $\mathfrak{so}(3) \oplus \mathfrak{so}(3)$ to $\mathfrak{so}(4)$, and you can check that this is an isomorphism.
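A small numerical illustration of the quaternion picture may help. The sketch below (plain Python; the sample quaternions and vector are arbitrary choices of mine) checks that left and right multiplication by unit quaternions commute with each other and preserve the norm on $\mathbb{R}^4$, so each acts by rotations:

```python
# Left and right multiplication by unit quaternions act as commuting
# rotations of R^4, illustrating so(3) + so(3) = so(4).

def qmul(a, b):
    """Hamilton product of quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def norm(q):
    return sum(c*c for c in q) ** 0.5

def unit(q):
    n = norm(q)
    return tuple(c / n for c in q)

g = unit((1.0, 2.0, -1.0, 0.5))   # a unit quaternion acting on the left
h = unit((0.3, -0.7, 2.0, 1.0))   # a unit quaternion acting on the right
v = (0.2, 1.0, -3.0, 0.4)         # an arbitrary vector in R^4

left_then_right = qmul(qmul(g, v), h)
right_then_left = qmul(g, qmul(v, h))

# The two actions commute (this is just associativity of multiplication)...
print(max(abs(a - b) for a, b in zip(left_then_right, right_then_left)) < 1e-12)
# ...and each preserves the norm, so both act by rotations of R^4.
print(abs(norm(qmul(g, v)) - norm(v)) < 1e-12)
print(abs(norm(qmul(v, h)) - norm(v)) < 1e-12)
```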

So in this approach, the 4th dimension pops out of the fact that the Kepler problem has conserved quantities that give two commuting copies of $\mathfrak{so}(3)$. By Noether’s theorem, it follows that conservation of angular momentum and the eccentricity vector must come from a hidden symmetry: symmetry under some group whose Lie algebra is $\mathfrak{so}(4)$.

And indeed, it turns out that the group $\mathrm{SO}(4)$ acts on the bound states of the Kepler problem in a way that commutes with time evolution!

But how can we understand this fact?

Historically, it seems that the first explanation was found in the quantum-mechanical context. In 1926, even before Schrödinger came up with his famous equation, Pauli used conservation of angular momentum and the eccentricity vector to determine the spectrum of hydrogen. But I believe he was using what we now call Lie algebra methods, not bringing in the group $\mathrm{SO}(4)$.

In 1935, Vladimir Fock, famous for the ‘Fock space’ in quantum field theory, explained this 4-dimensional rotation symmetry by setting up an equivalence between hydrogen atom bound states and functions on the 3-sphere! In the following year, Valentine Bargmann, later famous for being Einstein’s assistant, connected Pauli and Fock’s work using group representation theory.

All this is quantum mechanics. It seems the first global discussion of this symmetry in the classical context was given by Bacry, Ruegg, and Souriau in 1966, leading to important work by Souriau and Moser in the early 1970s. Since then, much more has been done. You can learn about a lot of it from these two books, which are my constant companions these days:

• Victor Guillemin and Shlomo Sternberg, Variations on a Theme by Kepler, Providence, R.I., American Mathematical Society, 1990.

• Bruno Cordani, The Kepler Problem: Group Theoretical Aspects, Regularization and Quantization, with Application to the Study of Perturbations, Birkhäuser, Boston, 2002.

But let me try to summarize a bit of this material.

One way to understand the $\mathrm{SO}(4)$ symmetry for bound states of the Kepler problem comes from the result of Hamilton that I explained last time: for a particle moving around an elliptical orbit in the Kepler problem, its momentum moves round and round in a circle.

I’ll call these circles Hamilton’s circles. Hamilton’s circles are not arbitrary circles in $\mathbb{R}^3$. Using the inverse of stereographic projection, we can map $\mathbb{R}^3$ to the unit 3-sphere $S^3 \subset \mathbb{R}^4$:

$\vec p \mapsto \left( \frac{|\vec p|^2 - 1}{|\vec p|^2 + 1}, \; \frac{2\vec p}{|\vec p|^2 + 1} \right)$

This map sends Hamilton’s circles in $\mathbb{R}^3$ to great circles in $S^3$. Furthermore, this construction gives all the great circles in $S^3$ except those that go through the north and south poles, $(\pm 1, 0, 0, 0)$. These missing great circles correspond to periodic orbits in the Kepler problem where a particle starts with momentum zero, falls straight to the origin, and bounces back the way it came. If we include these degenerate orbits, every great circle on the unit 3-sphere is the path traced out by the momentum in some solution of the Kepler problem.

Let me reemphasize: in this picture, points of $S^3$ correspond not to positions but to momenta in the Kepler problem. As time passes, these points move along great circles in $S^3$, but not at constant speed.

How is their dynamics related to geodesic motion on the 3-sphere?
We can understand this as follows. In Part 2 we saw that

$e^2 = 1 + 2HL^2$

and using the fact that $\vec M = \vec e/\sqrt{-2H}$, an easy calculation gives

$H = -\frac{1}{2(L^2 + M^2)}$
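The relation $H = -1/(2(L^2 + M^2))$ is easy to spot-check numerically. Here is a quick sketch in plain Python, assuming the conventions used earlier (units where $H = \tfrac12|\vec p|^2 - 1/|\vec q|$, $\vec L = \vec q \times \vec p$, $\vec e = \vec p \times \vec L - \vec q/|\vec q|$, $\vec M = \vec e/\sqrt{-2H}$); the sample phase-space point is an arbitrary bound state:

```python
# Check H = -1/(2(L^2 + M^2)) at a sample phase-space point with H < 0.

def cross(u, v):
    return [u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0]]

def dot(u, v):
    return sum(a*b for a, b in zip(u, v))

q = [1.0, 0.0, 0.0]
p = [0.2, 0.9, 0.1]

H = dot(p, p)/2 - 1/dot(q, q)**0.5                   # energy; negative, so a bound state
L = cross(q, p)                                      # angular momentum
pL = cross(p, L)
e = [pL[i] - q[i]/dot(q, q)**0.5 for i in range(3)]  # eccentricity vector
M = [ei / (-2*H)**0.5 for ei in e]                   # rescaled eccentricity vector

print(abs(H + 1/(2*(dot(L, L) + dot(M, M)))) < 1e-12)  # True
```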

In the 3-sphere picture, the observables $A_i$ become functions on the cotangent bundle $T^\ast S^3$. These functions are just the components of momentum for a particle on $S^3$, defined using a standard basis of right-invariant vector fields on $S^3$. Similarly, the observables $B_i$ are the components of momentum using a standard basis of left-invariant vector fields. It follows that

$K = 2(A^2 + B^2)$

is the Hamiltonian for a nonrelativistic free particle on $S^3$ with an appropriately chosen mass. Such a particle moves around a great circle on $S^3$ at constant speed. Since the Kepler Hamiltonian is a function of $K$, particles governed by this Hamiltonian move along the same trajectories—but typically not at constant speed!

Both $K$ and the Kepler Hamiltonian are well-defined smooth functions on the symplectic manifold that Souriau dubbed the Kepler manifold:

$T^+ S^3 = \{ (x, p) \in T^\ast S^3 : p \ne 0 \}$

This is the cotangent bundle of the 3-sphere with the zero cotangent vectors removed, so that $1/K$ is well-defined.

All this is great. But even better, there’s yet another picture of what’s going on, which brings relativity into the game!

We can also think of the Kepler manifold as a space of null geodesics in the Einstein universe: the manifold $\mathbb{R} \times S^3$ with the Lorentzian metric

$dt^2 - ds^2$

where $dt^2$ is the usual Riemannian metric on the real line (‘time’) and $ds^2$ is the usual metric on the unit sphere (‘space’). In this picture $x \in S^3$ describes the geodesic’s position at time zero, while the null cotangent vector determined by $p$ describes its 4-momentum at time zero. Beware: in this picture two geodesics count as distinct if we rescale $p$ by any positive factor other than 1. But this is good: physically, it reflects the fact that in relativity, massless particles can have different 4-momentum even if they trace out the same path in spacetime.

In short, the Kepler manifold also serves as the classical phase space for a free massless spin-0 particle in the Einstein universe!

And here’s the cool part: the Hamiltonian for such a particle is

$\sqrt{K}$

So it’s a function of both the Hamiltonians we’ve seen before. Thus, time evolution given by this Hamiltonian carries particles around great circles on the 3-sphere… at constant speed, but at a different speed than the nonrelativistic free particle described by the Hamiltonian $K$.

In future episodes, I want to quantize this whole story. We’ll get some interesting new perspectives on the quantum mechanics of the hydrogen atom.

It urges you — or your friends, or students — to apply for our free summer school in applied category theory run by the American Mathematical Society. It’s also a quick intro to some key ideas in applied category theory!

Applications are due Tuesday 2022 February 15 at 11:59 Eastern Time — go here for details. If you get in, you’ll get an all-expenses-paid trip to a conference center in upstate New York for a week in the summer. There will be a pool, bocci, lakes with canoes, woods to hike around in, campfires at night… and also whiteboards, meeting rooms, and coffee available 24 hours a day.

You can work with me on categories in chemistry, Nina on categories in the study of social networks, or Valeria on categories applied to concepts from computer science, like lenses.

There are also other programs to choose from. Read this, and click for more details:
