Shannon Entropy from Category Theory

22 April, 2022

I’m giving a talk at Categorical Semantics of Entropy on Wednesday May 11th, 2022. You can watch it live on Zoom if you register, or recorded later. Here’s the idea:

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.

You can see the slides now, here. I talk a bit about all these papers:

• John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss, 2011.

• Tom Leinster, An operadic introduction to entropy, 2011.

• John Baez and Tobias Fritz, A Bayesian characterization of relative entropy, 2014.

• Tom Leinster, A short characterization of relative entropy, 2017.

• Nicolas Gagné and Prakash Panangaden, A categorical characterization of relative entropy on standard Borel spaces, 2017.

• Tom Leinster, Entropy and Diversity: the Axiomatic Approach, 2020.

• Arthur Parzygnat, A functorial characterization of von Neumann entropy, 2020.

• Arthur Parzygnat, Towards a functorial description of quantum relative entropy, 2021.

• Tai-Danae Bradley, Entropy as a topological operad derivation, 2021.

Categorical Semantics of Entropy

19 April, 2022

There will be a workshop on the categorical semantics of entropy at the CUNY Grad Center in Manhattan on Friday May 13th, organized by John Terilla. I was kindly invited to give an online tutorial beforehand on May 11, which I will give remotely to save carbon. Tai-Danae Bradley will also be giving a tutorial that day in person:

Tutorial: Categorical Semantics of Entropy, Wednesday 11 May 2022, 13:00–16:30 Eastern Time, Room 5209 at the CUNY Graduate Center and via Zoom. Organized by John Terilla. To attend, register here.

12:00-1:00 Eastern Daylight Time — Lunch in Room 5209.

1:00-2:30 — Shannon entropy from category theory, John Baez, University of California Riverside; Centre for Quantum Technologies (Singapore); Topos Institute.

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.

2:30-3:00 — Coffee break.

3:00-4:30 — Operads and entropy, Tai-Danae Bradley, The Master’s University; Sandbox AQ.

This talk will open with a basic introduction to operads and their representations, with the main example being the operad of probabilities. I’ll then give a light sketch of how this framework leads to a small, but interesting, connection between information theory, abstract algebra, and topology, namely a correspondence between Shannon entropy and derivations of the operad of probabilities.

Symposium on Categorical Semantics of Entropy, Friday 13 May 2022, 9:30-3:15 Eastern Daylight Time, Room 5209 at the CUNY Graduate Center and via Zoom. Organized by John Terilla. To attend, register here.

9:30-10:00 Eastern Daylight Time — Coffee and pastries in Room 5209.

10:00-10:45 — Operadic composition of thermodynamic systems, Owen Lynch, Utrecht University.

The maximum entropy principle is a fascinating and productive lens with which to view both thermodynamics and statistical mechanics. In this talk, we present a categorification of the maximum entropy principle, using convex spaces and operads. Along the way, we will discuss a variety of examples of the maximum entropy principle and show how each application can be captured using our framework. This approach shines a new light on old constructions. For instance, we will show how we can derive the canonical ensemble by attaching a probabilistic system to a heat bath. Finally, our approach to this categorification has applications beyond the maximum entropy principle, and we will give a hint of how to adapt this categorification to the formalization of the composition of other systems.

11:00-11:45 — Polynomial functors and Shannon entropy, David Spivak, MIT and the Topos Institute.

The category Poly of polynomial functors in one variable is extremely rich, brimming with categorical gadgets (e.g. eight monoidal products, two closures, limits, colimits, etc.) and applications including dynamical systems, databases, open games, and cellular automata. In this talk I’ll show that objects in Poly can be understood as empirical distributions. In part using the standard derivative of polynomials, we obtain a functor to Set × Setop which encodes an invariant of a distribution as a pair of sets. This invariant is well-behaved in the sense that it is a distributive monoidal functor: it acts on both distributions and maps between them, and it preserves both the sum and the tensor product of distributions. The Shannon entropy of the original distribution is then calculated directly from the invariant, i.e. only in terms of the cardinalities of these two sets. Given the many applications of polynomial functors and of Shannon entropy, having this link between them has potential to create useful synergies, e.g. to notions of entropic causality or entropic learning in dynamical systems.

12:00-1:30 — Lunch in Room 5209

1:30-2:15 — Higher entropy, Tom Mainiero, Rutgers New High Energy Theory Center.

Is the frowzy state of your desk no longer as thrilling as it once was? Are numerical measures of information no longer able to satisfy your needs? There is a cure! In this talk we’ll learn about: the secret topological lives of multipartite measures and quantum states; how a homological probe of this geometry reveals correlated random variables; the sly decategorified involvement of Shannon, Tsallis, Rényi, and von Neumann in this larger geometric conspiracy; and the story of how Gelfand, Neumark, and Segal’s construction of von Neumann algebra representations can help us uncover this informatic ruse. So come to this talk, spice up your entropic life, and bring new meaning to your relationship with disarray.

2:30-3:15 — On characterizing classical and quantum entropy, Arthur Parzygnat, Institut des Hautes Études Scientifiques.

In 2011, Baez, Fritz, and Leinster proved that the Shannon entropy can be characterized as a functor by a few simple postulates. In 2014, Baez and Fritz extended this theorem to provide a Bayesian characterization of the classical relative entropy, also known as the Kullback–Leibler divergence. In 2017, Gagné and Panangaden extended the latter result to include standard Borel spaces. In 2020, I generalized the first result on Shannon entropy so that it includes the von Neumann (quantum) entropy. In 2021, I provided partial results indicating that the Umegaki relative entropy may also have a Bayesian characterization. My results in the quantum setting are special applications of the recent theory of quantum Bayesian inference, which is a non-commutative extension of classical Bayesian statistics based on category theory. In this talk, I will give an overview of these developments and their possible applications in quantum information theory.

Wine and cheese reception to follow, Room 5209.

Applied Category Theory 2022

25 February, 2022

The Fifth International Conference on Applied Category Theory, ACT2022, will take place at the University of Strathclyde from 18 to 22 July 2022, preceded by the Adjoint School 2022 from 11 to 15 July. This conference follows previous events at Cambridge (UK), Cambridge (MA), Oxford and Leiden.

Applied category theory is important to a growing community of researchers who study computer science, logic, engineering, physics, biology, chemistry, social sciences, linguistics and other subjects using category-theoretic tools. The background and experience of our members are as varied as the systems being studied. The goal of the Applied Category Theory conference series is to bring researchers together, strengthen the applied category theory community, disseminate the latest results, and facilitate further development of the field.

Submissions

We accept submissions in English of original research papers, talks about work accepted/submitted/published elsewhere, and demonstrations of relevant software. Accepted original research papers will be published in a proceedings volume. The keynote addresses will be chosen from the accepted papers. The conference will include an industry showcase event and community meeting. We particularly encourage people from underrepresented groups to submit their work and the organizers are committed to non-discrimination, equity, and inclusion.

Submission formats

Extended Abstracts should be submitted describing the contribution and providing a basis for determining the topics and quality of the anticipated presentation (1-2 pages). These submissions will be adjudicated for inclusion as a talk at the conference. Such work should include references to any longer papers, preprints, or manuscripts providing additional details.

Conference Papers should present original, high-quality work in the style of a computer science conference paper (up to 14 pages, not counting the bibliography; detailed proofs may be included in an appendix for the convenience of the reviewers). Such submissions should not be an abridged version of an existing journal article (see ‘Extended Abstracts’ above), although pre-submission arXiv preprints are permitted. These submissions will be adjudicated for both a talk and publication in the conference proceedings.

Software Demonstrations should be submitted in the format of an Extended Abstract (1-2 pages) giving the program committee enough information to assess the content of the demonstration. We are particularly interested in software that makes category theory research easier, or uses category theoretic ideas to improve software in other domains.

Extended abstracts and conference papers should be prepared with LaTeX. For conference papers please use the EPTCS style files available at

http://style.eptcs.org

The submission link is

https://easychair.org/conferences/?conf=act2022

Important dates

The following dates are all in 2022, and Anywhere On Earth.

• Submission Deadline: Monday 9 May
• Author Notification: Tuesday 7 June
• Camera-ready version due: Tuesday 28 June
• Adjoint School: Monday 11 to Friday 15 July
• Main Conference: Monday 18 to Friday 22 July

Conference format

We hope to run the conference as a hybrid event with talks recorded or streamed for remote participation. However, due to the state of the pandemic, the possibility of in-person attendance is not yet confirmed. Please be mindful of changing conditions when booking travel or hotel accommodations.

Program committee

• Jade Master, University of Strathclyde (Co-chair)
• Martha Lewis, University of Bristol (Co-chair)

The full program committee will be announced soon.

Organizing committee

• Jules Hedges, University of Strathclyde
• Jade Master, University of Strathclyde
• Fredrik Nordvall Forsberg, University of Strathclyde
• James Fairbanks, University of Florida

Steering committee

• John Baez, University of California, Riverside
• Bob Coecke, Cambridge Quantum
• Dorette Pronk, Dalhousie University
• David Spivak, Topos Institute

Categories: the Mathematics of Connection

17 February, 2022

I gave this talk at Mathematics of Collective Intelligence, a workshop organized by Jacob Foster at UCLA’s Institute for Pure and Applied Mathematics, or IPAM for short. There have been a lot of great talks here, all available online.

Perhaps the main interesting thing about this talk is that I sketch some work happening at the Topos Institute where we are using techniques from category theory to design epidemiological models:

Categories: the mathematics of connection

Abstract. As we move from the paradigm of modeling one single self-contained system at a time to modeling ‘open systems’ which interact with their — perhaps unmodeled — environment, category theory becomes a useful tool. It gives a mathematical language to describe the interface between an open system and its environment, the process of composing open systems along their interfaces, and how the behavior of a composite system relates to the behaviors of its parts. It is far from a silver bullet: at present, every successful application of category theory to open systems takes hard work. But I believe we are starting to see real progress.

You can see my slides or watch a video of my talk on the IPAM website or here:

For some other related talks, see:

To read more about my work on categories and open systems, go here:

Compositional Thermostatics (Part 3)

14 February, 2022

guest post by Owen Lynch

This is the third part (Part 1, Part 2) of a blog series on a paper that we wrote recently:

• John Baez, Owen Lynch and Joe Moeller, Compositional thermostatics.

In the previous two posts we talked about what a thermostatic system was, and how we think about composing them. In this post, we are going to back up from thermostatic systems a little bit, and talk about operads: a general framework for composing things! But we will not yet discuss how thermostatic systems use this framework—we’ll do that in the next post.

The basic idea behind this framework is the following. Suppose that we have a bunch of widgets, and we want to compose these widgets. If we are lucky, given two widgets there is a natural way of composing them. This is the case if the widgets are elements of a monoid; we simply use the monoid operation. This is also the case if the widgets are morphisms in a category; if the domain and codomain of two widgets match up, then they can be composed. More generally, n-morphisms in a higher category also have natural ways of composition.

However, there is not always a canonical way of composing widgets. For instance, let $R$ be a commutative ring, and let $a$ and $b$ be elements of $R.$ Then there are many ways to compose them: we could add them, subtract them, multiply them, etc. In fact any element of the free commutative ring $\mathbb{Z}[x,y]$ gives a way of composing a pair of elements in a commutative ring. For instance, $x^2 + xy - y^2,$ when applied to $a$ and $b,$ gives $a^2 + ab - b^2.$ Note that there is nothing special here about the fact that we started with two elements of $R;$ we could start with as many elements of $R$ as we liked, say, $a_1,\ldots,a_n,$ and any element of $\mathbb{Z}[x_1,\ldots,x_n]$ would give a ‘way of composing’ $a_1,\ldots,a_n.$
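To make this concrete, here is a minimal Python sketch of the idea (the dictionary encoding of polynomials is just an illustrative choice, with integers standing in for a general commutative ring):

```python
# A polynomial in Z[x_1,...,x_n], stored as {exponent_tuple: coefficient},
# evaluated on (a_1,...,a_n): one 'way of composing' n ring elements.

def evaluate(poly, args):
    total = 0
    for exponents, coeff in poly.items():
        term = coeff
        for a, e in zip(args, exponents):
            term *= a ** e
        total += term
    return total

# x^2 + x*y - y^2, applied to a = 2, b = 3:
p = {(2, 0): 1, (1, 1): 1, (0, 2): -1}
print(evaluate(p, (2, 3)))  # 4 + 6 - 9 = 1
```

Every such dictionary is a different ‘way of composing’ its arguments, which is exactly the point.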

The reader familiar with universal algebra should recognize that this situation is very general: we could do the exact same thing with vector spaces, modules, groups, algebras, or any more exotic structures that support a notion of ‘free algebra on $n$ variables’.

Let’s also discuss a less algebraic example. A point process $X$ on a subset of Euclidean space $A \subseteq \mathbb{R}^n$ can be described as an assignment, to each measurable set $U \subseteq A,$ of an $\mathbb{N}$-valued random variable $X_U,$ countably additive under disjoint unions of measurable sets.

The interpretation is that a point process gives a random collection of points in $A,$ and $X_U$ counts how many points fall in $U.$ Moreover, this collection of points cannot have a limit point; there cannot be infinitely many points in any compact subset of $A.$

Now suppose that $f \colon B \to A$ and $g \colon C \to A$ are rigid embeddings such that $f(B) \cap g(C) = \emptyset,$ and that $X$ is a point process on $B$ and $Y$ is a point process on $C.$ Then we can define a new point process $Z$ on $A$ (assuming that $X$ and $Y$ are independent) by letting

$Z_U = X_{f^{-1}(U)}+ Y_{g^{-1}(U)}$

This is the union of the point process $X$ running in $f(B)$ and the point process $Y$ running in $g(C).$
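A minimal deterministic sketch of this in Python (fixed sample points stand in for random draws of the processes; the embeddings $f$ and $g$ glue two unit intervals side by side):

```python
# Samples of point processes on B = [0,1) and C = [0,1), glued into
# A = [0,2) by the disjoint embeddings f(b) = b and g(c) = c + 1.

X_points = [0.1, 0.4, 0.9]   # points of the process X on B
Y_points = [0.2, 0.7]        # points of the process Y on C

f = lambda b: b         # embedding of B into A
g = lambda c: c + 1.0   # embedding of C into A

Z_points = [f(b) for b in X_points] + [g(c) for c in Y_points]

def count(points, lo, hi):
    """The counting variable evaluated on the interval U = [lo, hi)."""
    return sum(lo <= p < hi for p in points)

# Z_U = X_{f^{-1}(U)} + Y_{g^{-1}(U)} for U = [0.3, 1.5):
# f^{-1}(U) = [0.3, 1.0) in B and g^{-1}(U) = [0, 0.5) in C.
assert count(Z_points, 0.3, 1.5) == count(X_points, 0.3, 1.0) + count(Y_points, 0.0, 0.5)
print(count(Z_points, 0.3, 1.5))  # 3
```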

The precise details here are not so important: what I want to display is the intuition that we are geometrically composing things that ‘live on’ a space. The embeddings $f$ and $g$ give us a way of gluing together a point process on $B$ and a point process on $C$ to get a point process on $A.$ We could have picked something else that lives on a space, like a scalar/vector field, but I chose point processes because they are easy to visualize and composing them is fairly simple (when composing vector fields one has to be careful that they ‘match’ at the edges).

In all of the examples in the previous section, we have things that we want to compose, and ways of composing them. This situation is formalized by operads and operad algebras (which we will define very shortly). However, the confusing part is that the operad part corresponds to ‘ways of composing them’, and the operad algebra part corresponds to ‘things we want to compose’. Thus, the mathematics is somewhat ‘flipped’ from the way of thinking that comes naturally: rather than first thinking about the things we want to compose and only later thinking about the ways of composing them, we first think about the ways of composing things, and only then about what things we want to compose!

Unfortunately, this is the logical way of presenting operads and operad algebras; we must define what an operad is before we can talk about their algebras, even if what we really care about is the algebras. Thus, without further ado, let us define what an operad is.

An operad $\mathcal{O}$ consists of a collection $\mathcal{O}_0$ of types (which are abstract, just like the ‘objects’ in a category are abstract), and for every list of types $X_1,\ldots,X_n,Y \in \mathcal{O}_0,$ a collection of operations $\mathcal{O}(X_1,\ldots,X_n;Y).$

These operations are the ‘ways of composing things’, but they themselves can be composed by ‘feeding into’ each other, in the following way.

Suppose that $g \in \mathcal{O}(Y_1,\ldots,Y_n;Z)$ and for each $i = 1,\ldots,n,$ $f_i \in \mathcal{O}(X_{i,1},\ldots,X_{i,k_i};Y_i).$ Then we can make an operation

$g(f_1,\ldots,f_n) \in \mathcal{O}(X_{1,1},\ldots,X_{1,k_1},\ldots,X_{n,1},\ldots,X_{n,k_n};Z)$

We visualize operads by letting an operation be a circle that can take several inputs and produces a single output. Then composition of operations is given by attaching the output of circles to the input of other circles. Pictured below is the composition of a unary operator $f_1,$ a nullary operator $f_2,$ and a binary operator $f_3$ with a ternary operator $g$ to create a ternary operator $g(f_1,f_2,f_3).$

Additionally, for every type $X \in \mathcal{O}_0,$ there is an ‘identity operation’ $1_X \in \mathcal{O}(X;X)$ that satisfies for any $g \in \mathcal{O}(X_1,\ldots,X_n;Y)$

$g(1_{X_1},\ldots,1_{X_n}) = g$

and for any $f \in \mathcal{O}(X;Y)$

$1_Y(f) = f$

There is also an associativity law for composition that is a massive pain to write out explicitly, but is more or less exactly as one would expect. For unary operators $f,g,h,$ it states

$f(g(h)) = (f(g))(h)$

The last condition for being an operad is that if $f \in \mathcal{O}(X_1,\ldots,X_n;Y)$ and $\sigma \in S(n),$ the symmetric group on $n$ elements, then we can apply $\sigma$ to $f$ to get

$\sigma^\ast(f) \in \mathcal{O}(X_{\sigma(1)},\ldots,X_{\sigma(n)};Y).$

We require that $(\sigma \tau)^\ast(f) = \tau^\ast(\sigma^\ast(f))$ if $\sigma,\tau \in S(n),$ and there are also some conditions for how $\sigma^\ast$ interacts with composition, which can be straightforwardly derived from the intuition that $\sigma^\ast$ permutes the arguments of an operation.

Note that our definition of an operad is what might typically be known as a ‘symmetric, colored operad’, but as we will always be using symmetric, colored operads, we choose to simply drop the modifiers.

That was a long definition, so it is time for an example. This example corresponds to the first situation in the first section, where we wanted to compose ring elements.

Define $\mathcal{R}$ to be an operad with one type, which we will call $R \in \mathcal{R}_0,$ and let $\mathcal{R}(R^n;R) = \mathbb{Z}[x_1,\ldots,x_n],$ where $\mathcal{R}(R^n;R)$ is $\mathcal{R}(R,\ldots,R;R)$ with $R$ repeated $n$ times.

Composition is simply polynomial substitution. That is, if

$q(y_1,\ldots,y_n) \in \mathbb{Z}[y_1,\ldots,y_n] \cong \mathcal{R}(R^n;R)$

and

$p_i(x_{i,1},\ldots,x_{i,k_i}) \in \mathbb{Z}[x_{i,1},\ldots,x_{i,k_i}] \cong \mathcal{R}(R^{k_i};R)$

then

$q(p_1(x_{1,1},\ldots,x_{1,k_1}),\ldots,p_n(x_{n,1},\ldots,x_{n,k_n})) \in \mathcal{R}(R^{\sum_{i=1}^n k_i};R)$

is the composite of $p_1,\ldots,p_n,q.$ For instance, composing

$x^2 \in \mathbb{Z}[x] \cong \mathcal{R}(R;R)$

and

$y+z \in \mathbb{Z}[y,z] \cong \mathcal{R}(R,R;R)$

results in

$(y+z)^2 \in \mathbb{Z}[y,z] \cong \mathcal{R}(R,R;R)$

The reader is invited to supply details for identities and the symmetry operators.
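The substitution above can be checked mechanically; here is a Python sketch, assuming SymPy is available:

```python
from sympy import symbols, expand

x, y, z = symbols('x y z')

g = x**2      # a unary operation in R(R; R) = Z[x]
f = y + z     # a binary operation in R(R, R; R) = Z[y, z]

# Operadic composition in this operad is polynomial substitution:
composite = expand(g.subs(x, f))
print(composite)  # y**2 + 2*y*z + z**2
```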

For the other example, define an operad $\mathcal{P}$ by letting $\mathcal{P}_0$ be the set of compact subsets of $\mathbb{R}^2$ (we could consider something more exciting, but this works fine and is easy to visualize). An operation $f \in \mathcal{P}(X_1,\ldots,X_n;Y)$ consists of embeddings $f_1,\ldots,f_n$ with pairwise disjoint images, where $f_i \colon X_i \to Y.$

We can visualize such an operation as simply a shape with holes in it.

Composition of such operations is just given by nesting the holes.

The outcome of the above composition is given by simply taking away the intermediate shapes (i.e. the big circle and the triangle).

Another source of examples for operads comes from the following construction. Suppose that $(C,\otimes,1)$ is a symmetric monoidal category. Define $\mathrm{Op}(C,\otimes,1) = \mathrm{Op}(C)$ by letting

$\mathrm{Op}(C)_0 = C_0$

where $C_0$ is the collection of objects in $C,$ and

$\mathrm{Op}(C)(X_1,\ldots,X_n;Y) = \mathrm{Hom}_C(X_1 \otimes \cdots \otimes X_n, Y)$

To compose operations $f_1,\ldots,f_n$ and $g$ (assuming that the types are such that these are composable), we simply take $g \circ (f_1 \otimes \ldots \otimes f_n).$ Moreover, the identity operation is simply the identity morphism, and the action of $\sigma \in S(n)$ is given by the symmetric monoidal structure.

In fact, the second example that we talked about is an example of this construction! If we let $C$ be the category where the objects are compact subsets of $\mathbb{R}^2,$ with embeddings as the morphisms, and let the symmetric monoidal product be disjoint union, then it is not too hard to show that the operad we end up with is the same as the one we described above.

Perhaps the most important example of this construction, and the one we need in the next section, is the operad obtained from $(\mathsf{Set}, \times, 1).$ The types of this operad are sets, and an operation

$f \in \mathrm{Op}(\mathsf{Set})(X_1,\ldots,X_n;Y)$

is simply a function

$f \colon X_1 \times \cdots \times X_n \to Y$
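Concretely, operadic composition in $\mathrm{Op}(\mathsf{Set})$ can be sketched in Python (encoding each operation with its arity is just one illustrative convention), echoing the earlier picture of a unary $f_1,$ a nullary $f_2,$ and a binary $f_3$ feeding into a ternary $g$:

```python
# In Op(Set), an operation X_1,...,X_n -> Y is a function of n arguments;
# composition g(f_1,...,f_n) feeds the outputs of the f_i into g.

def compose(g, *fs):
    """Operadic composition in Op(Set): each (f_i, arity_i) consumes its own
    block of arguments, and g is applied to the list of results."""
    def composite(*args):
        results, i = [], 0
        for f, arity in fs:
            results.append(f(*args[i:i + arity]))
            i += arity
        return g(*results)
    return composite

g  = lambda a, b, c: a + b * c     # ternary
f1 = (lambda x: x + 1, 1)          # unary
f2 = (lambda: 10, 0)               # nullary
f3 = (lambda x, y: x * y, 2)       # binary

h = compose(g, f1, f2, f3)         # a ternary operation: 1 + 0 + 2 inputs
print(h(1, 2, 3))                  # g(f1(1), f2(), f3(2, 3)) = 2 + 10*6 = 62
```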

Although ‘operad algebra’ is the name that has stuck in the literature, I think a better term would be ‘operad action’, because the analogy to keep in mind is that of a group action. A group action allows a group to ‘act on’ elements of a set; an operad algebra similarly allows an operad to ‘act on’ elements of a set.

Moreover, a group action can be described as a functor from the 1-element category representing that group to $\mathsf{Set},$ and as we will see, an operad algebra can also be described as an ‘operad morphism’ from the operad to $\mathrm{Op}(\mathsf{Set}),$ the operad just described in the last section.

In fact, this is how we will define an operad algebra; first we will define what an operad morphism is, and then we will define an operad algebra as an operad morphism to $\mathrm{Op}(\mathsf{Set}).$

An operad morphism $F$ from an operad $\mathcal{O}$ to an operad $\mathcal{P}$ is exactly what one would expect: it consists of

• For every $X_1,\ldots,X_n,Y \in \mathcal{O}_0,$ a map

$F \colon \mathcal{O}(X_1,\ldots,X_n;Y) \to \mathcal{P}(F(X_1),\ldots,F(X_n);F(Y))$

such that $F$ commutes with all of the things an operad does, i.e. composition, identities, and the action of $\sigma \in S(n).$

Thus an operad morphism $F$ from $\mathcal{O}$ to $\mathrm{Op}(\mathsf{Set}),$ also known as an operad algebra, consists of

• A set $F(X)$ for every $X \in \mathcal{O}_0$
• A function $F(f) \colon F(X_1) \times \cdots \times F(X_n) \to F(Y)$ for every operation $f \in \mathcal{O}(X_1,\ldots,X_n;Y)$

such that the assignment of sets and functions preserves identities, composition, and the action of $\sigma \in S(n).$

Without further ado, let’s look at the examples. From any commutative ring $A$ we can produce an algebra $F_A$ of $\mathcal{R}.$ We let $F_A(R) = A$ (considered as a set), and for

$p(x_1,\ldots,x_n) \in \mathbb{Z}[x_1,\ldots,x_n] = \mathcal{R}(R^n;R)$

we let

$F_A(p)(a_1,\ldots,a_n) = p(a_1,\ldots,a_n)$
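As a concrete sketch in Python (with the commutative ring $A = \mathbb{Z}/7$ chosen purely for illustration):

```python
# A sketch of the operad algebra F_A for the ring A = Z/7: every polynomial
# over Z acts on tuples of elements of A by evaluation mod 7.

MOD = 7

def act(poly, args):
    """Evaluate a polynomial {exponent_tuple: coeff} on elements of Z/7."""
    total = 0
    for exponents, coeff in poly.items():
        term = coeff
        for a, e in zip(args, exponents):
            term *= a ** e
        total += term
    return total % MOD

p = {(2, 0): 1, (1, 1): 1, (0, 2): -1}   # x^2 + xy - y^2
print(act(p, (3, 5)))  # (9 + 15 - 25) mod 7 = -1 mod 7 = 6
```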

We can also make an operad algebra of point processes, $\mathrm{PP},$ for $\mathcal{P}.$ For $A \in \mathcal{P}_0,$ we let $\mathrm{PP}(A)$ be the set of point processes on $A.$ If $f \colon A_1 \sqcup \cdots \sqcup A_n \to B$ is an embedding, then we let $\mathrm{PP}(f)$ be the map that sends point processes $X_1,\ldots,X_n$ on $A_1,\ldots,A_n$ respectively to the point process $Y$ defined by

$Y_U = X_{f^{-1}(U) \cap A_1} + \cdots + X_{f^{-1}(U) \cap A_n}$

Finally, if $(C,\otimes,1)$ is a symmetric monoidal category, there is a way to make an operad algebra of $\mathrm{Op}(C)$ from a special type of functor $F \colon C \to \mathsf{Set}.$ This is convenient, because it is often easier to prove that the functor satisfies the necessary properties than it is to prove that the algebra is in fact well-formed.

The special kind of functor we need is a lax symmetric monoidal functor. This is a functor $F$ equipped with a natural transformation $\tau_{A,B} \colon F(A) \times F(B) \to F(A \otimes B)$ that is well-behaved with respect to the associator, identity, and symmetric structure of $(C, \otimes, 1).$ We call $\tau$ the laxator, and formally speaking, a lax symmetric monoidal functor consists of a functor along with a laxator. I won’t go into detail about the whole construction that makes an operad algebra out of a lax symmetric monoidal functor, but the basic idea is that given an operation $f \in \mathrm{Op}(C)(X,Y;Z)$ (which is a morphism $f \colon X \otimes Y \to Z$), we can construct a function $F(X) \times F(Y) \to F(Z)$ by composing

$\tau_{X,Y} \colon F(X) \times F(Y) \to F(X \otimes Y)$

with

$F(f) \colon F(X \otimes Y) \to F(Z)$

This basic idea can be extended using associativity to produce a function $F(X_1) \times \cdots \times F(X_n) \to F(Y)$ from an operation $f \colon X_1 \otimes \cdots \otimes X_n \to Y.$
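Here is a toy Python sketch of the recipe ‘laxator, then $F$ on morphisms’. The functor used is my own illustration, not one from the paper: the list functor on $(\mathsf{Set}, \times, 1)$ with the ‘zip’ laxator $\tau_{A,B} \colon F(A) \times F(B) \to F(A \times B)$ (ignoring the unit constraint, which for finite lists needs more care):

```python
# F(A) = lists of elements of A; tau zips two lists into a list of pairs.

def laxator(xs, ys):
    """tau_{A,B}: F(A) x F(B) -> F(A x B), pairing up entries."""
    return list(zip(xs, ys))

def F(f):
    """F on morphisms: apply f elementwise."""
    return lambda pairs: [f(*p) for p in pairs]

def algebra_action(f, xs, ys):
    """The induced function F(X) x F(Y) -> F(Z): first tau, then F(f)."""
    return F(f)(laxator(xs, ys))

add = lambda x, y: x + y          # an operation X x Y -> Z in Op(Set)
print(algebra_action(add, [1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```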

As an example of this construction, consider point processes again. We can make a lax symmetric monoidal functor $\mathrm{PP}$ by sending a set $A$ to $\mathrm{PP}(A),$ the set of point processes on $A,$ and an embedding $f \colon A \to B$ to the map $F(f)$ that sends a point process $X$ to a point process $Y$ defined by

$Y_U = X_{f^{-1}(U)}$

The laxator $\tau_{A,B} \colon F(A) \times F(B) \to F(A \sqcup B)$ sends a point process $X$ on $A$ and a point process $Y$ on $B$ to a point process $Z$ on $A \sqcup B$ defined by

$Z_{U} = X_{U \cap A} + Y_{U \cap B}$

The reader should inspect this definition and think about why it is equivalent to the earlier definition for the operad algebra of point processes.

Summary

This was a long post, so I’m going to try and go over the main points so that you can organize what you just learned in some sort of coherent fashion.

First I talked about how situations frequently arise in which there isn’t a canonical way of ‘composing’ two things. The two examples that I gave were elements of a ring, and structures on spaces, specifically point processes.

I then talked about the formal way that we think about these situations. Namely, we organize the ‘ways of composing things’ into an operad, and then we organize the ‘things that we want to compose’ into an operad algebra. Along the way, I discussed a convenient way of making an operad out of a symmetric monoidal category, and an operad algebra out of a lax symmetric monoidal functor.

This construction will be important in the next post, when we make an operad of ‘ways of composing thermostatic systems’ and an operad algebra of thermostatic systems to go along with it.

See all four parts of this series:

Part 1: thermostatic systems and convex sets.

Part 2: composing thermostatic systems.

Part 3: operads and their algebras.

Part 4: the operad for composing thermostatic systems.

Compositional Thermostatics (Part 2)

7 February, 2022

guest post by Owen Lynch

In Part 1, John talked about a paper that we wrote recently:

• John Baez, Owen Lynch and Joe Moeller, Compositional thermostatics.

and he gave an overview of what a ‘thermostatic system’ is.

In this post, I want to talk about how to compose thermostatic systems. We will not yet use category theory, saving that for another post; instead we will give a ‘nuts-and-bolts’ approach, based on examples.

Suppose that we have two thermostatic systems and we put them in thermal contact, so that they can exchange heat energy. Then we predict that their temperatures should equalize. What does this mean precisely, and how do we derive this result?

Recall that a thermostatic system is given by a convex space $X$ and a concave entropy function $S \colon X \to [-\infty,\infty].$ A ‘tank’ of constant heat capacity, whose state is solely determined by its energy, has state space $X = \mathbb{R}_{> 0}$ and entropy function $S(U) = C \log(U),$ where $C$ is the heat capacity.

Now suppose that we have two tanks, of heat capacity $C_1$ and $C_2$ respectively. As thermostatic systems, the tanks are described by energy variables $U_1$ and $U_2,$ and we have entropy functions

$S_1(U_1) = C_1 \log(U_1)$

$S_2(U_2) = C_2 \log(U_2)$

By conservation of energy, the total energy of both tanks must remain constant, so

$U_1 + U_2 = U$

for some $U;$ equivalently

$U_2 = U - U_1$

The equilibrium state then has maximal total entropy subject to this constraint. That is, an equilibrium state $(U_1^{\mathrm{eq}},U_2^{\mathrm{eq}})$ must satisfy

$S_1(U_1^{\mathrm{eq}}) + S_2(U_2^{\mathrm{eq}}) = \max_{U_1+U_2=U} S_1(U_1) + S_2(U_2)$

We can now derive the condition of equal temperature from this condition. In thermodynamics, temperature is defined by

$\displaystyle{ \frac{1}{T} = \frac{\partial S}{\partial U} }$

The interested reader should calculate this for our entropy functions, and in doing this, see why we identify $C$ with the heat capacity. Now, manipulating the condition of equilibrium, we get

$\max_{U_1+U_2=U} S_1(U_1) + S_2(U_2) = \max_{U_1} S_1(U_1) + S_2(U-U_1)$

As a function of $U_1,$ the right hand side of this equation must have derivative equal to $0.$ Thus,

$\displaystyle{ \frac{\partial}{\partial U_1} (S_1(U_1) + S_2(U-U_1)) = 0 }$

Now, note that if $U_2 = U - U_1,$ then

$\displaystyle{ \frac{\partial}{\partial U_1} S_2(U-U_1) = -\frac{\partial}{\partial U_2} S_2(U_2) }$

Thus, the condition of equilibrium is

$\displaystyle{ \frac{\partial}{\partial U_1} S_1(U_1) = \frac{\partial}{\partial U_2} S_2(U_2) }$

Using the fact that

$\displaystyle{ \frac{1}{T_1} = \frac{\partial}{\partial U_1} S_1(U_1) , \qquad \frac{1}{T_2} = \frac{\partial}{\partial U_2} S_2(U_2) }$

the above equation reduces to

$\displaystyle{ \frac{1}{T_1} = \frac{1}{T_2} }$

so we have our expected condition of temperature equilibration!

The result of composing several thermostatic systems should be a new thermostatic system. In the case above, the new thermostatic system is described by a single variable: the total energy of the system $U = U_1 + U_2.$ The entropy function of this new thermostatic system is given by the constrained supremum:

$S(U) = \max_{U = U_1 + U_2} S_1(U_1) + S_2(U_2)$

The reader should verify that, up to an additive constant, this ends up being the same as a system with heat capacity $C_1 + C_2,$ i.e. with entropy function given by

$S(U) = (C_1 + C_2) \log(U)$
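If you would rather check this numerically than by hand, here is a quick sketch; the heat capacities and the grid search are my own illustrative choices, not anything from the post:

```python
import numpy as np

C1, C2 = 2.0, 3.0   # illustrative heat capacities

def compose(U, n=200001):
    """Composite entropy S(U) = max over U1 of S1(U1) + S2(U - U1), by grid search."""
    U1 = np.linspace(U * 1e-6, U * (1 - 1e-6), n)
    total = C1 * np.log(U1) + C2 * np.log(U - U1)
    i = int(np.argmax(total))
    return float(total[i]), float(U1[i])

S5, U1_star = compose(5.0)
# temperatures T_i = U_i / C_i agree at the maximizing split
T1, T2 = U1_star / C1, (5.0 - U1_star) / C2
# S(U) - (C1 + C2) log(U) is the same constant for every U
consts = [compose(U)[0] - (C1 + C2) * np.log(U) for U in (1.0, 2.0, 5.0)]
```

The constant in `consts` is nonzero but independent of $U,$ which is what "the same entropy function up to an additive constant" means here.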

A very similar argument goes through when one has two systems that can exchange both heat and volume; both temperature and pressure are equalized as a consequence of entropy maximization. We end up with a system that is parameterized by total energy and total volume, and has an entropy function that is a function of those quantities.

The general procedure is the following. Suppose that we have $n$ thermostatic systems, $(X_1,S_1),\ldots,(X_n,S_n).$ Let $Y$ be a convex space that we think of as describing the quantities that are conserved when we compose the $n$ thermostatic systems (i.e., total energy, total volume, etc.). Each value of the conserved quantities $y \in Y$ corresponds to many different possible values for $x_1 \in X_1, \ldots, x_n \in X_n.$ We represent this with a relation

$R \subseteq X_1 \times \cdots \times X_n \times Y$

We then turn $Y$ into a thermostatic system by using the entropy function

$S(y) = \max_{R(x_1,\ldots,x_n,y)} S_1(x_1) + \cdots + S_n(x_n)$

It turns out that if we require $R$ to be a convex relation (that is, a convex subspace of $X_1 \times \cdots \times X_n \times Y$) then $S$ as defined above ends up being a concave function, so $(Y,S)$ is a true thermostatic system.
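Here is a discretized sketch of this recipe for two one-variable systems, with illustrative entropy functions of my own choosing and the relation $x_1 + x_2 = y$:

```python
import numpy as np

# Illustrative entropy functions (my own choices, not from the post)
S1 = lambda u: 2.0 * np.log(u)
S2 = lambda u: 3.0 * np.log(u)

grid = np.linspace(0.01, 9.99, 999)   # discretized state space, spacing 0.01

def S(y):
    """S(y) = max of S1(x1) + S2(x2) over the relation x1 + x2 = y."""
    x1 = grid[grid < y]
    return float(np.max(S1(x1) + S2(y - x1)))
```

Since this relation is convex, the composite $S$ comes out concave; for instance $S(3) \ge \tfrac{1}{2}(S(1) + S(5))$ numerically.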

We will have to wait until a later post in the series to see exactly how we describe this procedure using category theory. For now, however, I want to talk about why this procedure makes sense.

In the statistical mechanical interpretation, entropy is related to the probability of observing a specific macrostate. As we scale the system, the theory of large deviations tells us that seeing any macrostate other than the most probable macrostate is highly unlikely. Thus, we can find the macrostate that we will observe in practice by finding the entropy maxima. For an exposition of this point of view, see this paper:

• Jeffrey Commons, Ying-Jen Yang and Hong Qian, Duality symmetry, two entropy functions, and an eigenvalue problem in Gibbs’ theory.

There is also a dynamical systems interpretation of entropy, where entropy serves as a Lyapunov function for a dynamical system. This is the viewpoint taken here:

• Wassim M. Haddad, A Dynamical Systems Theory of Thermodynamics, Princeton U. Press.

In each of these viewpoints, however, the maximization of entropy is not global, but rather constrained. The dynamical system only maximizes entropy along its orbit, and the statistical mechanical system maximizes entropy with respect to constraints on the probability distribution.

We can think of thermostatics as a ‘common refinement’ of both of these points of view. We are agnostic as to the mechanism by which constrained maximization of entropy takes place and we are simply interested in investigating its consequences. We expect that a careful formalization of either system should end up deriving something similar to our thermostatic theory in the limit.

See all four parts of this series:

Part 1: thermostatic systems and convex sets.

Part 2: composing thermostatic systems.

Part 3: operads and their algebras.

Part 4: the operad for composing thermostatic systems.

Hardy, Ramanujan and Taxi No. 1729

30 January, 2022

In his book Ramanujan: Twelve Lectures on Subjects Suggested by His Life and Work, G. H. Hardy tells this famous story:

He could remember the idiosyncracies of numbers in an almost uncanny way. It was Littlewood who said every positive integer was one of Ramanujan’s personal friends. I remember once going to see him when he was lying ill at Putney. I had ridden in taxi-cab No. 1729, and remarked that the number seemed to be rather a dull one, and that I hoped it was not an unfavourable omen. “No,” he replied, “it is a very interesting number; it is the smallest number expressible as the sum of two cubes in two different ways.”

Namely,

$10^3 + 9^3 = 1000 + 729 = 1729 = 1728 + 1 = 12^3 + 1^3$
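A few lines of code confirm both the identity and the minimality claim by brute force. (The search bound of 30 is my own choice; it is safe, since any representation of a number up to 1729 uses cubes of numbers at most 12.)

```python
from collections import defaultdict

# Collect all ways of writing n = a^3 + b^3 with 1 <= a <= b < 30
sums = defaultdict(list)
for a in range(1, 30):
    for b in range(a, 30):
        sums[a**3 + b**3].append((a, b))

# Smallest number expressible as a sum of two cubes in two different ways
taxicab = min(n for n, ways in sums.items() if len(ways) >= 2)
# taxicab == 1729, with representations (1, 12) and (9, 10)
```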

But there’s more to this story than meets the eye.

First, it’s funny how this story becomes more dramatic with each retelling. In the foreword to Hardy’s book A Mathematician’s Apology, his friend C. P. Snow tells it thus:

Hardy used to visit him, as he lay dying in hospital at Putney. It was on one of those visits that there happened the incident of the taxicab number. Hardy had gone out to Putney by taxi, as usual his chosen method of conveyance. He went into the room where Ramanujan was lying. Hardy, always inept about introducing a conversation, said, probably without a greeting, and certainly as his first remark: “I thought the number of my taxicab was 1729. It seemed to me rather a dull number.” To which Ramanujan replied: “No, Hardy! No, Hardy! It is a very interesting number. It is the smallest number expressible as the sum of two cubes in two different ways.”

Here Hardy becomes “inept” and makes his comment “probably without a greeting, and certainly as his first remark”. Perhaps the ribbing of a friend who knew Hardy’s ways?

I think I’ve seen later versions where Hardy “burst into the room”.

But it’s common for legends to be embroidered with the passage of time. Here’s something more interesting. In Ono and Trebat-Leder’s paper The 1729 K3 surface, they write:

While this anecdote might give one the impression that Ramanujan came up with this amazing property of 1729 on the spot, he actually had written it down before even coming to England.

In fact they point out that Ramanujan wrote it down more than once!

Before he went to England, Ramanujan mainly published by posting puzzles to the questions section of the Journal of the Indian Mathematical Society. In 1913, in Question 441, he challenged the reader to prove a formula expressing a specific sort of perfect cube as a sum of three perfect cubes. If you keep simplifying this formula to see why it works, you eventually get

$12^3 = (-1)^3 + 10^3 + 9^3$

In Ramanujan’s Notebooks, Part III, Bruce Berndt explains that Ramanujan developed a method for finding solutions of Euler’s diophantine equation

$a^3 + b^3 = c^3 + d^3$

in his “second notebook”. This is one of three notebooks Ramanujan left behind after his death—and the results in this one were written down before he first went to England. In Item 20(iii) he describes his method and lists many example solutions, the simplest being

$1^3 + 12^3 = 9^3 + 10^3$

In 1915 Ramanujan posed another puzzle about writing a sixth power as a sum of three cubes, Question 661. And he posed a puzzle about writing $1$ as a sum of three cubes, Question 681.

Finally, four or five years later, Ramanujan revisited the equation $a^3 + b^3 = c^3 + d^3$ in his so-called Lost Notebook. This was actually a pile of 138 loose unnumbered pages written by Ramanujan in the last two years of his life, 1919 and 1920. George Andrews found them in a box in Trinity College, Cambridge much later, in 1976.

Now the pages have been numbered, published and intensively studied: George Andrews and Bruce Berndt have written five books about them! Here is page 341 of Ramanujan’s Lost Notebook, where he came up with a method for finding an infinite family of integer solutions to the equation $a^3 + b^3 = c^3 + d^3$:

As you can see, one example is

$9^3 + 10^3 = 12^3 + 1^3$

In Section 8.5 of George Andrews and Bruce Berndt’s book Ramanujan’s Lost Notebook: Part IV, they discuss Ramanujan’s method, calling it “truly remarkable”.

In short, Ramanujan was well aware of the special properties of the number 1729 before Hardy mentioned it. And something prompted Ramanujan to study the equation $a^3 + b^3 = c^3 + d^3$ again near the end of his life, and find a new way to solve it.

Could it have been the taxicab incident??? Or did Hardy talk about the taxi after Ramanujan had just thought about the number 1729 yet again? In the latter case, it’s hardly a surprise that Ramanujan remembered it.

Thinking about this story, I’ve started wondering about what really happened here. First of all, as James Dolan pointed out to me, you don’t need to be a genius to notice that

$1000 + 729 = 1728 + 1$

Was Hardy, the great number theorist, so blind to the properties of numbers that he didn’t notice either of these ways of writing 1729 as a sum of two cubes? Base ten makes them very easy to spot if you know your cubes, and I’m sure Hardy knew $9^3 = 729$ and $12^3 = 1728$.

Second of all, how often do number theorists come out and say that a number is uninteresting? Except in that joke about the “smallest uninteresting number”, I don’t think I’ve heard it happen.

My wife Lisa suggested an interesting possibility that would resolve all these puzzles:

Hardy either knew of Ramanujan’s work on this problem or noticed himself that 1729 had a special property. He wanted to cheer up his dear friend Ramanujan, who was lying deathly ill in the hospital. So he played the fool by walking in and saying that 1729 was “rather dull”.

I have no real evidence for this, and I’m not claiming it’s true. But I like how it flips the meaning of the story. And it’s not impossible. Hardy was, after all, a bit of a prankster: each time he sailed across the Atlantic he sent out a postcard saying he had proved the Riemann Hypothesis, just in case he drowned.

We could try to see if there really was a London taxi with number 1729 at that time. It would be delicious to discover that it was merely an invention of Hardy’s. But I don’t know if records of London taxi-cab numbers from around 1919 still exist.

Maybe I’ll let C. P. Snow have the last word. After telling his version of the incident with Hardy, Ramanujan and the taxicab, he writes:

This is the exchange as Hardy recorded it. It must be substantially accurate. He was the most honest of men; and further no one could possibly have invented it.

The Kepler Problem (Part 4)

27 January, 2022

The Kepler problem is the study of a particle moving in an attractive inverse square force. In classical mechanics, this problem shows up when you study the motion of a planet around the Sun in the Solar System. In quantum mechanics, it shows up when you study the motion of an electron around a proton in a hydrogen atom.

In Part 2 we saw that the classical Kepler problem has, besides energy and the three components of angular momentum, three more conserved quantities: the components of the eccentricity vector!

This was discovered long ago, in 1710, by the physicist Jakob Hermann. But thanks to Noether, we now know that in classical mechanics, conserved quantities come from symmetries. In the Kepler problem, conservation of energy comes from time translation symmetry, while conservation of the angular momentum comes from rotation symmetry. Which symmetries give conservation of the eccentricity vector?

As we shall see, these symmetries are rotations in 4-dimensional space. These include the obvious rotations in 3-dimensional space which give angular momentum. The other 4-dimensional rotations act in a much less obvious way, and give the eccentricity vector.

In fact, we’ll see that the Kepler problem can be rephrased in terms of a free particle moving around on a sphere in 4-dimensional space. This is a nice explanation of the 4-dimensional rotation symmetry.

After that we’ll see a second way to rephrase the Kepler problem: in terms of a massless, relativistic free particle moving at the speed of light on a sphere in 4-dimensional space. Our first formulation will not involve relativity. This second will.

All this is very nice. You can read some fun explanations of the first formulation here:

• Greg Egan, The ellipse and the atom.

• John Baez, Planets in the fourth dimension.

But how could you guess this 4-dimensional rotation symmetry if you didn’t know about it already? One systematic approach uses Poisson brackets. I won’t explain these, just dive in and use them!

Remember, the particle in the Kepler problem has various observables, which are all ultimately functions of its position and momentum:

• position: $\vec q$

• momentum: $\vec p$

• energy: $H = \tfrac{1}{2} p^2 - \tfrac{1}{q}$

• angular momentum: $\vec L = \vec q \times \vec p$

• the eccentricity vector: $\vec e = \vec p \times \vec L - \tfrac{\vec q}{q}$

I’ll use conventions where the Poisson brackets of the components of position $q_k$ and momentum $p_\ell$ are taken to be

$\{q_k,p_\ell\} = \delta_{k\ell}$

From this, using the rules for Poisson brackets, we can calculate the Poisson brackets of everything else. For starters:

$\{H, L_k\} = \{H,e_k\} = 0$

These equations are utterly unsurprising, since they are equivalent to saying that angular momentum $\vec L$ and the eccentricity vector $\vec e$ are conserved. More interestingly, we have

$\begin{array}{ccl} \{L_j, L_k\} &=& \epsilon_{jk\ell} L_\ell \\ \{e_j, L_k\} &=& \epsilon_{jk\ell} e_\ell \\ \{e_j, e_k \} &=& -2H \epsilon_{jk\ell} L_\ell \end{array}$

where all the indices go from 1 to 3, I’m summing over repeated indices even if they’re both subscripts, and $\epsilon_{jk\ell}$ are the Levi–Civita symbols.
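Readers who want to check these bracket relations without grinding through the algebra can let a computer do it. This is my own SymPy spot-check, evaluating each bracket at an arbitrary generic phase-space point:

```python
import sympy as sp

q = sp.Matrix(sp.symbols('q1 q2 q3'))
p = sp.Matrix(sp.symbols('p1 p2 p3'))

def pb(f, g):
    # canonical Poisson bracket, with {q_k, p_l} = delta_{kl}
    return sum(sp.diff(f, q[i]) * sp.diff(g, p[i])
               - sp.diff(f, p[i]) * sp.diff(g, q[i]) for i in range(3))

r = sp.sqrt(q.dot(q))
H = p.dot(p) / 2 - 1 / r        # energy
L = q.cross(p)                  # angular momentum
e = p.cross(L) - q / r          # eccentricity vector

# evaluate symbolic brackets at an arbitrary numerical point
point = dict(zip(list(q) + list(p), [0.3, 0.7, -0.2, 1.1, -0.4, 0.6]))
at = lambda expr: float(expr.subs(point))

conserved = max(abs(at(pb(H, L[k]))) + abs(at(pb(H, e[k]))) for k in range(3))
so3_check = abs(at(pb(L[0], L[1]) - L[2]))            # {L_1, L_2} = L_3
ee_check  = abs(at(pb(e[0], e[1]) + 2 * H * L[2]))    # {e_1, e_2} = -2H L_3
```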

Now, the factor of $-2H$ above is annoying. But on the region of phase space where $H < 0$—that is, the space of bound states, where the particle carries out an elliptical orbit—we can define a new vector to deal with this annoyance:

$\displaystyle{ \vec M = \frac{\vec e}{\sqrt{-2H}} }$

Now we easily get

$\begin{array}{ccl} \{L_j, L_k\} &=& \epsilon_{jk\ell} L_\ell \\ \{L_j, M_k\} &=& \epsilon_{jk\ell} M_\ell \\ \{M_j, M_k \} &=& \epsilon_{jk\ell} L_\ell \end{array}$

This is nicer, but we can simplify it even more if we introduce some new vectors that are linear combinations of $\vec L$ and $\vec M,$ namely half their sum and half their difference:

$\vec A = \tfrac{1}{2} (\vec L + \vec M), \qquad \vec B = \tfrac{1}{2}(\vec L - \vec M)$

Then we get

$\begin{array}{ccl} \{ A_j, A_k\} &=& \epsilon_{jk\ell} A_\ell \\ \{ B_j, B_k\} &=& \epsilon_{jk\ell} B_\ell \\ \{ A_j, B_k\} &=& 0 \end{array}$

So, the observables $A_j$ and $B_k$ contain the same information as the angular momentum and eccentricity vectors, but now they commute with each other!
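The same kind of SymPy spot-check works for the $A$ and $B$ brackets. The phase-space point below is my arbitrary choice, picked so that $H < 0,$ since $\vec M$ is only defined on bound states:

```python
import sympy as sp

q = sp.Matrix(sp.symbols('q1 q2 q3'))
p = sp.Matrix(sp.symbols('p1 p2 p3'))

def pb(f, g):
    # canonical Poisson bracket, with {q_k, p_l} = delta_{kl}
    return sum(sp.diff(f, q[i]) * sp.diff(g, p[i])
               - sp.diff(f, p[i]) * sp.diff(g, q[i]) for i in range(3))

r = sp.sqrt(q.dot(q))
H = p.dot(p) / 2 - 1 / r
L = q.cross(p)
e = p.cross(L) - q / r
M = e / sp.sqrt(-2 * H)          # defined only on bound states, H < 0
A = (L + M) / 2
B = (L - M) / 2

# an arbitrary bound-state point (H < 0 here)
point = dict(zip(list(q) + list(p), [0.8, 0.1, -0.3, 0.2, 0.5, -0.1]))
at = lambda expr: float(expr.subs(point))

ab_commute = max(abs(at(pb(A[j], B[k]))) for j in range(3) for k in range(3))
a_so3 = abs(at(pb(A[0], A[1]) - A[2]))   # {A_1, A_2} = A_3
b_so3 = abs(at(pb(B[0], B[1]) - B[2]))   # {B_1, B_2} = B_3
```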

What does this mean?

Well, when you’re first learning math the Levi–Civita symbols $\epsilon_{jk\ell}$ may seem like just a way to summarize the funny rules for cross products in 3-dimensional space. But as you proceed, you ultimately learn that $\mathbb{R}^3$ with its cross product is the Lie algebra of the Lie group $\mathrm{SO}(3)$ of rotations in 3-dimensional space. From this viewpoint, the Levi–Civita symbols are nothing but the structure constants for the Lie algebra $\mathfrak{so}(3):$ that is, a way of describing the bracket operation in this Lie algebra in terms of basis vectors.

So, what we’ve got here are two commuting copies of $\mathfrak{so}(3),$ one having the $A_j$ as a basis and the other having the $B_k$ as a basis, both with the Poisson bracket as their Lie bracket.

A better way to say the same thing is that we’ve got a single 6-dimensional Lie algebra

$\mathfrak{so}(3) \oplus \mathfrak{so}(3)$

having both the $A_j$ and $B_k$ as basis. But then comes the miracle:

$\mathfrak{so}(3) \oplus \mathfrak{so}(3) \cong \mathfrak{so}(4)$

The easiest way to see this is to realize that $S^3,$ the unit sphere in 4 dimensions, is itself a Lie group with Lie algebra isomorphic to $\mathfrak{so}(3).$ Namely, it’s the unit quaternions!—or if you prefer, the Lie group $\mathrm{SU}(2).$ Like any Lie group it acts on itself via left and right translations, which commute. But these are actually ways of rotating $S^3.$ So, you get a map of Lie algebras from $\mathfrak{so}(3) \oplus \mathfrak{so}(3)$ to $\mathfrak{so}(4),$ and you can check that this is an isomorphism.

So in this approach, the 4th dimension pops out of the fact that the Kepler problem has conserved quantities that give two commuting copies of $\mathfrak{so}(3).$ By Noether’s theorem, it follows that conservation of angular momentum and the eccentricity vector must come from a hidden symmetry: symmetry under some group whose Lie algebra is $\mathfrak{so}(4).$

And indeed, it turns out that the group $\mathrm{SO}(4)$ acts on the bound states of the Kepler problem in a way that commutes with time evolution!

But how can we understand this fact?

Historically, it seems that the first explanation was found in the quantum-mechanical context. In 1926, even before Schrödinger came up with his famous equation, Pauli used conservation of angular momentum and the eccentricity vector to determine the spectrum of hydrogen. But I believe he was using what we now call Lie algebra methods, not bringing in the group $\mathrm{SO}(4).$

In 1935, Vladimir Fock, famous for the ‘Fock space’ in quantum field theory, explained this 4-dimensional rotation symmetry by setting up an equivalence between hydrogen atom bound states and functions on the 3-sphere! In the following year, Valentine Bargmann, later famous for being Einstein’s assistant, connected Pauli and Fock’s work using group representation theory.

All this is quantum mechanics. It seems the first global discussion of this symmetry in the classical context was given by Bacry, Ruegg, and Souriau in 1966, leading to important work by Souriau and Moser in the early 1970s. Since then, much more has been done. You can learn about a lot of it from these two books, which are my constant companions these days:

• Victor Guillemin and Shlomo Sternberg, Variations on a Theme by Kepler, Providence, R.I., American Mathematical Society, 1990.

• Bruno Cordani, The Kepler Problem: Group Theoretical Aspects, Regularization and Quantization, with Application to the Study of Perturbations, Birkhäuser, Boston, 2002.

But let me try to summarize a bit of this material.

One way to understand the $\mathrm{SO}(4)$ symmetry for bound states of the Kepler problem is the result of Hamilton that I explained last time: for a particle moving around an elliptical orbit in the Kepler problem, its momentum moves round and round in a circle.

I’ll call these circles Hamilton’s circles. Hamilton’s circles are not arbitrary circles in $\mathbb{R}^3$. Using the inverse of stereographic projection, we can map $\mathbb{R}^3$ to the unit 3-sphere:

$\begin{array}{rccl} f \colon &\mathbb{R}^3 &\to & S^3 \subset \mathbb{R}^4 \\ \\ & \vec p &\mapsto & \displaystyle{\left(\frac{p^2 - 1}{p^2 +1}, \frac{2 \vec p}{p^2 + 1}\right).} \end{array}$

This map sends Hamilton’s circles in $\mathbb{R}^3$ to great circles in $S^3.$ Furthermore, this construction gives all the great circles in $S^3$ except those that go through the north and south poles, $(\pm 1, 0,0,0).$ These missing great circles correspond to periodic orbits in the Kepler problem where a particle starts with momentum zero, falls straight to the origin, and bounces back the way it came. If we include these degenerate orbits, every great circle on the unit 3-sphere is the path traced out by the momentum in some solution of the Kepler problem.
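It is easy to check in code that $f$ really lands on the unit 3-sphere, and sends $\vec p = 0$ to the pole $(-1,0,0,0)$; this is a quick sanity check of my own, not a computation from the post:

```python
import numpy as np

def to_sphere(p):
    """The inverse stereographic projection f above, mapping R^3 into S^3."""
    p2 = float(np.dot(p, p))
    return np.concatenate([[(p2 - 1) / (p2 + 1)], 2 * np.asarray(p) / (p2 + 1)])

rng = np.random.default_rng(0)
norms = [np.linalg.norm(to_sphere(rng.normal(size=3))) for _ in range(100)]
south = to_sphere(np.zeros(3))   # momentum zero maps to (-1, 0, 0, 0)
```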

Let me reemphasize: in this picture, points of $S^3$ correspond not to positions but to momenta in the Kepler problem. As time passes, these points move along great circles in $S^3...$ but not at constant speed.

How is their dynamics related to geodesic motion on the 3-sphere?
We can understand this as follows. In Part 2 we saw that

$L^2 + M^2 = - \frac{1}{2H}$

and using the fact that $\vec L \cdot \vec M = 0,$ an easy calculation gives

$H \; = \; -\frac{1}{8A^2} \; = \; -\frac{1}{8B^2}$
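To spell out the easy calculation: since $\vec L \cdot \vec M = 0,$ we have

$A^2 = \tfrac{1}{4}(\vec L + \vec M) \cdot (\vec L + \vec M) = \tfrac{1}{4}(L^2 + M^2) = -\frac{1}{8H}$

and likewise $B^2 = \tfrac{1}{4}(L^2 + M^2),$ so solving for $H$ gives the formula above.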

In the 3-sphere picture, the observables $A_j$ become functions on the cotangent bundle $T^\ast S^3$. These functions are just the components of momentum for a particle on $S^3$, defined using a standard basis of right-invariant vector fields on $S^3 \cong \mathrm{SU}(2).$ Similarly, the observables $B_j$ are the components of momentum using a standard basis of left-invariant vector fields. It follows that

$K = 8A^2 = 8B^2$

is the Hamiltonian for a nonrelativistic free particle on $S^3$ with an appropriately chosen mass. Such a particle moves around a great circle on $S^3$ at constant speed. Since the Kepler Hamiltonian $H$ is a function of $K$, particles governed by this Hamiltonian move along the same trajectories—but typically not at constant speed!

Both $K$ and the Kepler Hamiltonian $H = -1/K$ are well-defined smooth functions on the symplectic manifold that Souriau dubbed the Kepler manifold:

$T^+ S^3 = \{ (x,p) : \; x \in S^3, \, p \in T_x S^3, \, p \ne 0 \}$

This is the cotangent bundle of the 3-sphere with the zero cotangent vectors removed, so that $H = -1/K$ is well-defined.

All this is great. But even better, there’s yet another picture of what’s going on, which brings relativity into the game!

We can also think of $T^+ S^3$ as a space of null geodesics in the Einstein universe: the manifold $\mathbb{R} \times S^3$ with the Lorentzian metric

$dt^2 - ds^2$

where $dt^2$ is the usual Riemannian metric on the real line (‘time’) and $ds^2$ is the usual metric on the unit sphere (‘space’). In this picture $x \in S^3$ describes the geodesic’s position at time zero, while the null cotangent vector $p + \|p\| dt$ describes its 4-momentum at time zero. Beware: in this picture two geodesics count as distinct if we rescale $p$ by any positive factor other than 1. But this is good: physically, it reflects the fact that in relativity, massless particles can have different 4-momentum even if they trace out the same path in spacetime.

In short, the Kepler manifold $T^+ S^3$ also serves as the classical phase space for a free massless spin-0 particle in the Einstein universe!

And here’s the cool part: the Hamiltonian for such a particle is

$\sqrt{K} = \sqrt{-1/H}$

So it’s a function of both the Hamiltonians we’ve seen before. Thus, time evolution given by this Hamiltonian carries particles around great circles on the 3-sphere… at constant speed, but at a different speed than the nonrelativistic free particle described by the Hamiltonian $K.$

In future episodes, I want to quantize this whole story. We’ll get some interesting outlooks on the quantum mechanics of the hydrogen atom.

Learning Computer Science With Categories

26 January, 2022

The first book in Bob Coecke’s series on applied category theory is out, and the pdf is free—legally, even!—until 8 February 2022. Grab a copy now:

• Noson Yanofsky, Theoretical Computer Science for the Working Category Theorist, Cambridge U. Press, 2022.

There are already books on category theory for theoretical computer scientists. Why the reverse? Yanofsky explains:

There’s just one catch: you need to know category theory.

But it’s worth learning category theory, because it’s like a magic key to many subjects. It helps you learn more, faster.

26 January, 2022