Open Petri Nets (Part 2)

18 August, 2018

I’d like to continue talking about this paper:

• John Baez and Jade Master, Open Petri nets.

Last time I explained, in a sketchy way, the double category of open Petri nets. This time I’d like to describe a ‘semantics’ for open Petri nets.

In his famous thesis Functorial Semantics of Algebraic Theories, Lawvere introduced the idea that semantics, as a map from expressions to their meanings, should be a functor between categories. This has been generalized in many directions, and the same idea works for double categories. So, we describe our semantics for open Petri nets as a map

\blacksquare \colon \mathbb{O}\mathbf{pen}(\mathrm{Petri}) \to \mathbb{R}\mathbf{el}

from our double category of open Petri nets to a double category of relations. This map sends any open Petri net to its ‘reachability relation’.

In Petri net theory, a marking of a set X is a finite multisubset of X. We can think of this as a way of placing finitely ‘tokens’—little black dots—on the elements of X. A Petri net lets us start with some marking of its places and then repeatedly change the marking by moving tokens around, using the transitions. This is how Petri nets describe processes!

For example, here’s a Petri net from chemistry:

Here’s a marking of its places:

But using the transitions, we can repeatedly change the marking. We started with one atom of carbon, one molecule of oxygen, one molecule of sodium hydroxide and one molecule of hydrochloric acid. But they can turn into one molecule of carbon dioxide, one molecule of sodium hydroxide and one molecule of hydrochloric acid:

These can then turn into one molecule of sodium bicarbonate and one molecule of hydrochloric acid:

Then these can turn into one molecule of carbon dioxide, one molecule of water and one molecule of sodium chloride:

People say one marking is reachable from another if you can get it using a finite sequence of transitions in this manner. (Our paper explains this well-known notion more formally.) In this example every marking has 0 or 1 tokens in each place. But that’s not typical: in general we could have any natural number of tokens in each place, so long as the total number of tokens is finite.

Our paper adapts the concept of reachability to open Petri nets. Let \mathbb{N}[X] denote the set of markings of the set X. Given an open Petri net P \colon X \nrightarrow Y there is a reachability relation

\blacksquare P \subseteq \mathbb{N}[X] \times \mathbb{N}[Y]

This relation holds when we can take a given marking of X, feed those tokens into the Petri net P, move them around using transitions in P, and have them pop out and give a certain marking of Y, leaving no tokens behind.

For example, consider this open Petri net P \colon X \nrightarrow Y:

Here is a marking of X:

We can feed these tokens into P and move them around using transitions in P:

They can then pop out into Y, leaving none behind:

This gives a marking of Y that is ‘reachable’ from the original marking of X.

The main result of our paper is that the map sending an open Petri net P to its reachability relation \blacksquare P extends to a ‘lax double functor’

\blacksquare \colon \mathbb{O}\mathbf{pen}(\mathrm{Petri}) \to \mathbb{R}\mathbf{el}

where \mathbb{O}\mathbf{pen}(\mathrm{Petri}) is a double category having open Petri nets as horizontal 1-cells and \mathbb{R}\mathbf{el} is a double category having relations as horizontal 1-cells.

I can give you a bit more detail on those double categories, and also give you a clue about what ‘lax’ means, without it becoming too stressful.

Last time I said the double category \mathbb{O}\mathbf{pen}(\mathrm{Petri}) has:

• sets X, Y, Z, \dots as objects,

• functions f \colon X \to Y as vertical 1-morphisms,

• open Petri nets P \colon X \nrightarrow Y as horizontal 1-cells—they look like this:

• morphisms between open Petri nets as 2-morphisms—an example would be the visually obvious map from this open Petri net:

to this one:

What about \mathbb{R}\mathbf{el}? This double category has

• sets X, Y, Z, \dots as objects,

• functions f \colon X \to Y as vertical 1-morphisms,

• relations R \subseteq X \times Y as horizontal 1-cells from X to Y, and

• maps between relations as 2-morphisms. Here a map between relations is a square

that obeys

(f \times g) R \subseteq S

So, the idea of the reachability semantics is that it maps:

• any set X to the set \mathbb{N}[X] consisting of all markings of that set.

• any function f \colon X \to Y to the obvious function

\mathbb{N}(f) \colon \mathbb{N}[X] \to \mathbb{N}[Y]

(Yes, \mathbb{N} is a really a functor.)

• any open Petri net P \colon X \nrightarrow Y to its reachability relation

\blacksquare P \colon \mathbb{N}[X] \to \mathbb{N}[Y]

• any morphism between Petri nets to the obvious map between their reachability relations.

Especially if you draw some examples, all this seems quite reasonable and nice. But it’s important to note that \blacksquare is a lax double functor. This means that it does not send a composite open Petri net PQ to the composite of the reachability relations for P and Q. So, we do not have

\blacksquare Q \; \blacksquare P = \blacksquare (QP)

Instead, we just have

\blacksquare Q \; \blacksquare P \subseteq \blacksquare (QP)

It’s easy to see why. Take P \colon X \nrightarrow Y to be this open Petri net:

and take Q \colon Y \nrightarrow Z to be this one:

Then their composite QP \colon X \nrightarrow Y is this:

It’s easy to see that \blacksquare (QP) is a proper subset of \blacksquare Q \; \blacksquare P. In QP a token can move all the way from point 1 to point 5. But it does not do so by first moving through P and then moving through Q! It has to take a more complicated zig-zag path where it first leaves P and enters Q, then comes back into P, and then goes to Q.

In our paper, Jade and I conjecture that we get

\blacksquare Q \; \blacksquare P = \blacksquare (QP)

if we restrict the reachability semantics to a certain specific sub-double category of \mathbb{O}\mathbf{pen}(\mathrm{Petri}) consisting of ‘one-way’ open Petri nets.

Finally, besides showing that

\blacksquare \colon \mathbb{O}\mathbf{pen}(\mathrm{Petri}) \to \mathbb{R}\mathbf{el}

is a lax double functor, we also show that it’s symmetric monoidal. This means that the reachability semantics works as you’d expect when you run two open Petri nets ‘in parallel’.

In a way, the most important thing about our paper is that it illustrates some methods to study semantics for symmetric monoidal double categories. Kenny Courser and I will describe these methods more generally in our paper “Structured cospans.” They can be applied to timed Petri nets, colored Petri nets, and various other kinds of Petri nets. One can also develop a reachability semantics for open Petri nets that are glued together along transitions as well as places.

I hear that the company Statebox wants these and other generalizations. We aim to please—so we’d like to give it a try.

Next time I’ll wrap up this little series of posts by explaining how Petri nets give symmetric monoidal categories.

Part 1: the double category of open Petri nets.

Part 2: the reachability semantics for open Petri nets.

Part 3: the free symmetric monoidal category on a Petri net.

Open Petri Nets (Part 1)

15 August, 2018

Jade Master and I have just finished a paper on open Petri nets:

• John Baez and Jade Master, Open Petri nets.

Abstract. The reachability semantics for Petri nets can be studied using open Petri nets. For us an ‘open’ Petri net is one with certain places designated as inputs and outputs via a cospan of sets. We can compose open Petri nets by gluing the outputs of one to the inputs of another. Open Petri nets can be treated as morphisms of a category, which becomes symmetric monoidal under disjoint union. However, since the composite of open Petri nets is defined only up to isomorphism, it is better to treat them as morphisms of a symmetric monoidal double category \mathbb{O}\mathbf{pen}(\mathrm{Petri}). Various choices of semantics for open Petri nets can be described using symmetric monoidal double functors out of \mathbb{O}\mathbf{pen}(\mathrm{Petri}). Here we describe the reachability semantics, which assigns to each open Petri net the relation saying which markings of the outputs can be obtained from a given marking of the inputs via a sequence of transitions. We show this semantics gives a symmetric monoidal lax double functor from \mathbb{O}\mathbf{pen}(\mathrm{Petri}) to the double category of relations. A key step in the proof is to treat Petri nets as presentations of symmetric monoidal categories; for this we use the work of Meseguer, Montanari, Sassone and others.

I’m excited about this, especially because our friends at Statebox are planning to use open Petri nets in their software. They’ve recently come out with a paper too:

• Fabrizio Romano Genovese and Jelle Herold, Executions in (semi-)integer Petri nets are compact closed categories.

Petri nets are widely used to model open systems in subjects ranging from computer science to chemistry. There are various kinds of Petri net, and various ways to make them ‘open’, and my paper with Jade only handles the simplest. But our techniques are flexible, so they can be generalized.

What’s an open Petri net? For us, it’s a thing like this:

The yellow circles are called ‘places’ (or in chemistry, ‘species’). The aqua rectangles are called ‘transitions’ (or in chemistry, ‘reactions’). There can in general be lots of places and lots of transitions. The bold arrows from places to transitions and from transitions to places complete the structure of a Petri net. There are also arbitrary functions from sets X and Y into the set of places. This makes our Petri net into an ‘open’ Petri net.

We can think of open Petri nets as morphisms between finite sets. There’s a way to compose them! Suppose we have an open Petri net P from X to Y, where now I’ve given names to the points in these sets:

We write this as P \colon X \nrightarrow Y for short, where the funky arrow reminds us this isn’t a function between sets. Given another open Petri net Q \colon Y \nrightarrow Z, for example this:

the first step in composing P and Q is to put the pictures together:

At this point, if we ignore the sets X,Y,Z, we have a new Petri net whose set of places is the disjoint union of those for P and Q.

The second step is to identify a place of P with a place of Q whenever both are images of the same point in Y. We can then stop drawing everything involving Y, and get an open Petri net QP \colon X \nrightarrow Z, which looks like this:

Formalizing this simple construction leads us into a bit of higher category theory. The process of taking the disjoint union of two sets of places and then quotienting by an equivalence relation is a pushout. Pushouts are defined only up to canonical isomorphism: for example, the place labeled C in the last diagram above could equally well have been labeled D or E. This is why to get a category, with composition strictly associative, we need to use isomorphism classes of open Petri nets as morphisms. But there are advantages to avoiding this and working with open Petri nets themselves. Basically, it’s better to work with things than mere isomorphism classes of things! If we do this, we obtain not a category but a bicategory with open Petri nets as morphisms.

However, this bicategory is equipped with more structure. Besides composing open Petri nets, we can also ‘tensor’ them via disjoint union: this describes Petri nets being run in parallel rather than in series. The result is a symmetric monoidal bicategory. Unfortunately, the axioms for a symmetric monoidal bicategory are cumbersome to check directly. Double categories turn out to be more convenient.

Double categories were introduced in the 1960s by Charles Ehresmann. More recently they have found their way into applied mathematics. They been used to study various things, including open dynamical systems:

• Eugene Lerman and David Spivak, An algebra of open continuous time dynamical systems and networks.

open electrical circuits and chemical reaction networks:

• Kenny Courser, A bicategory of decorated cospans, Theory and Applications of Categories 32 (2017), 995–1027.

open discrete-time Markov chains:

• Florence Clerc, Harrison Humphrey and P. Panangaden, Bicategories of Markov processes, in Models, Algorithms, Logics and Tools, Lecture Notes in Computer Science 10460, Springer, Berlin, 2017, pp. 112–124.

and coarse-graining for open continuous-time Markov chains:

• John Baez and Kenny Courser, Coarse-graining open Markov processes. (Blog article here.)

As noted by Shulman, the easiest way to get a symmetric monoidal bicategory is often to first construct a symmetric monoidal double category:

• Mike Shulman, Constructing symmetric monoidal bicategories.

The theory of ‘structured cospans’ gives a systematic way to build symmetric monoidal double categories—Kenny Courser and I are writing a paper on this—and Jade and I use this to construct the symmetric monoidal double category of open Petri nets.

A 2-morphism in a double category can be drawn as a square like this:

We call X_1,X_2,Y_1 and Y_2 ‘objects’, f and g ‘vertical 1-morphisms’, M and N ‘horizontal 1-cells’, and \alpha a ‘2-morphism’. We can compose vertical 1-morphisms to get new vertical 1-morphisms and compose horizontal 1-cells to get new horizontal 1-cells. We can compose the 2-morphisms in two ways: horizontally and vertically. (This is just a quick sketch of the ideas, not the full definition.)

In our paper, Jade and I start by constructing a symmetric monoidal double category \mathbb{O}\mathbf{pen}(\textrm{Petri}) with:

• sets X, Y, Z, \dots as objects,

• functions f \colon X \to Y as vertical 1-morphisms,

• open Petri nets P \colon X \nrightarrow Y as horizontal 1-cells,

• morphisms between open Petri nets as 2-morphisms.

(Since composition of horizontal 1-cells is associative only up to an invertible 2-morphism, this is technically a pseudo double category.)

What are the morphisms between open Petri nets like? A simple example may be help give a feel for this. There is a morphism from this open Petri net:

to this one:

mapping both primed and unprimed symbols to unprimed ones. This describes a process of ‘simplifying’ an open Petri net. There are also morphisms that include simple open Petri nets in more complicated ones, etc.

This is just the start. Our real goal is to study the semantics of open Petri nets: that is, how they actually describe processes! And for that, we need to think about the free symmetric monoidal category on a Petri net. You can read more about those things in Part 2 and Part 3 of this series.

Part 1: the double category of open Petri nets.

Part 2: the reachability semantics for open Petri nets.

Part 3: the free symmetric monoidal category on a Petri net.

The Philosophy and Physics of Noether’s Theorems

11 August, 2018


I’ll be speaking at a conference celebrating the centenary of Emmy Noether’s work connecting symmetries and conservation laws:

The Philosophy and Physics of Noether’s Theorems, 5-6 October 2018, Fischer Hall, 1-4 Suffolk Street, London, UK. Organized by Bryan W. Roberts (LSE) and Nicholas Teh (Notre Dame).

They write:

2018 brings with it the centenary of a major milestone in mathematical physics: the publication of Amalie (“Emmy”) Noether’s theorems relating symmetry and physical quantities, which continue to be a font of inspiration for “symmetry arguments” in physics, and for the interpretation of symmetry within philosophy.

In order to celebrate Noether’s legacy, the University of Notre Dame and the LSE Centre for Philosophy of Natural and Social Sciences are co-organizing a conference that will bring together leading mathematicians, physicists, and philosophers of physics in order to discuss the enduring impact of Noether’s work.

There’s a registration fee, which you can see on the conference website, along with a map showing the conference location, a schedule of the talks, and other useful stuff.

Here are the speakers:

John Baez (UC Riverside)

Jeremy Butterfield (Cambridge)

Anne-Christine Davis (Cambridge)

Sebastian De Haro (Amsterdam and Cambridge)

Ruth Gregory (Durham)

Yvette Kosmann-Schwarzbach (Paris)

Peter Olver (UMN)

Sabrina Pasterski (Harvard)

Oliver Pooley (Oxford)

Tudor Ratiu (Shanghai Jiao Tong and Geneva)

Kasia Rejzner (York)

Robert Spekkens (Perimeter)

I’m looking forward to analyzing the basic assumptions behind various generalizations of Noether’s first theorem, the one that shows symmetries of a Lagrangian give conserved quantities. Having generalized it to Markov processes, I know there’s a lot more to what’s going on here than just the wonders of Lagrangian mechanics:

• John Baez and Brendan Fong, A Noether theorem for Markov processes, J. Math. Phys. 54 (2013), 013301. (Blog article here.)

I’ve been trying to get to the bottom of it ever since.

Compositionality: the Editorial Board

17 July, 2018

The editors of this journal have an announcement:

We are happy to announce the founding editorial board of Compositionality, featuring established researchers working across logic, computer science, physics, linguistics, coalgebra, and pure category theory (see the full list below). Our steering board considered many strong applications to our initial open call for editors, and it was not easy narrowing down to the final list, but we think that the quality of this editorial board and the general response bodes well for our growing research community.

In the meantime, we hope you will consider submitting something to our first issue. Look out in the coming weeks for the journal’s official open-for-submissions announcement.

The editorial board of Compositionality:

• Corina Cristea, University of Southampton, UK
• Ross Duncan, University of Strathclyde, UK
• Andrée Ehresmann, University of Picardie Jules Verne, France
• Tobias Fritz, Max Planck Institute, Germany
• Neil Ghani, University of Strathclyde, UK
• Dan Ghica, University of Birmingham, UK
• Jeremy Gibbons, University of Oxford, UK
• Nick Gurski, Case Western Reserve University, USA
• Helle Hvid Hansen, Delft University of Technology, Netherlands
• Chris Heunen, University of Edinburgh, UK
• Aleks Kissinger, Radboud University, Netherlands
• Joachim Kock, Universitat Autònoma de Barcelona, Spain
• Martha Lewis, University of Amsterdam, Netherlands
• Samuel Mimram, École Polytechnique, France
• Simona Paoli, University of Leicester, UK
• Dusko Pavlovic, University of Hawaii, USA
• Christian Retoré, Université de Montpellier, France
• Mehrnoosh Sadrzadeh, Queen Mary University, UK
• Peter Selinger, Dalhousie University, Canada
• Pawel Sobocinski, University of Southampton, UK
• David Spivak, MIT, USA
• Jamie Vicary, University of Birmingham, UK
• Simon Willerton, University of Sheffield, UK

Josh, Brendan, and Nina
Executive editors, Compositionality

Applied Category Theory Course: Collaborative Design

13 July, 2018

In my online course we’re reading the fourth chapter of Fong and Spivak’s book Seven Sketches. Chapter 4 is about collaborative design: building big projects from smaller parts. This is based on work by Andrea Censi:

• Andrea Censi, A mathematical theory of co-design.

The main mathematical content of this chapter is the theory of enriched profunctors. We’ll mainly talk about enriched profunctors between categories enriched in monoidal preorders. The picture above shows what one of these looks like!

Here are my lectures so far:

Lecture 55 – Chapter 4: Enriched Profunctors and Collaborative Design
Lecture 56 – Chapter 4: Feasibility Relations
Lecture 57 – Chapter 4: Feasibility Relations
Lecture 58 – Chapter 4: Composing Feasibility Relations
Lecture 59 – Chapter 4: Cost-Enriched Profunctors
Lecture 60 – Chapter 4: Closed Monoidal Preorders
Lecture 61 – Chapter 4: Closed Monoidal Preorders
Lecture 62 – Chapter 4: Constructing Enriched Categories
Lecture 63 – Chapter 4: Composing Enriched Profunctors
Lecture 64 – Chapter 4: The Category of Enriched Profunctors
Lecture 65 – Chapter 4: Collaborative Design
Lecture 66 – Chapter 4: Collaborative Design
Lecture 67 – Chapter 4: Feedback in Collaborative Design
Lecture 68 – Chapter 4: Feedback in Collaborative Design
Lecture 69 – Chapter 4: Feedback in Collaborative Design
Lecture 70 – Chapter 4: Tensoring Enriched Profunctors

Random Points on a Group

13 July, 2018

In Random Points on a Sphere (Part 1), we learned an interesting fact. You can take the unit sphere in \mathbb{R}^n, randomly choose two points on it, and compute their distance. This gives a random variable, whose moments you can calculate.

And now the interesting part: when n = 1, 2 or 4, and seemingly in no other cases, all the even moments are integers.

These are the dimensions in which the spheres are groups. We can prove that the even moments are integers because they are differences of dimensions of certain representations of these groups. Rogier Brussee and Allen Knutson pointed out that if we want to broaden our line of investigation, we can look at other groups. So that’s what I’ll do today.

If we take a representation of a compact Lie group G, we get a map from group into a space of square matrices. Since there is a standard metric on any space of square matrices, this lets us define the distance between two points on the group. This is different than the distance defined using the shortest geodesic in the group: instead, we’re taking a straight-line path in the larger space of matrices.

If we randomly choose two points on the group, we get a random variable, namely the distance between them. We can compute the moments of this random variable, and today I’ll prove that the even moments are all integers.

So, we get a sequence of integers from any representation \rho of any compact Lie group G. So far we’ve only studied groups that are spheres:

• The defining representation of \mathrm{O}(1) \cong S^0 on the real numbers \mathbb{R} gives the powers of 2.

• The defining representation of \mathrm{U}(1) \cong S^1 on the complex numbers \mathbb{C} gives the central binomial coefficients \binom{2n}{n}.

• The defining representation of \mathrm{Sp}(1) \cong S^3 on the quaternions \mathbb{H} gives the Catalan numbers.

It could be fun to work out these sequences for other examples. Our proof that the even moments are integers will give a way to calculate these sequences, not by doing integrals over the group, but by counting certain ‘random walks in the Weyl chamber’ of the group. Unfortunately, we need to count walks in a certain weighted way that makes things a bit tricky for me.

But let’s see why the even moments are integers!

If our group representation is real or quaternionic, we can either turn it into a complex representation or adapt my argument below. So, let’s do the complex case.

Let G be a compact Lie group with a unitary representation \rho on \mathbb{C}^n. This means we have a smooth map

\rho \colon G \to \mathrm{End}(\mathbb{C}^n)

where \mathrm{End}(\mathbb{C}^n) is the algebra of n \times n complex matrices, such that

\rho(1) = 1

\rho(gh) = \rho(g) \rho(h)


\rho(g) \rho(g)^\dagger = 1

where A^\dagger is the conjugate transpose of the matrix A.

To define a distance between points on G we’ll give \mathrm{End}(\mathbb{C}^n) its metric

\displaystyle{ d(A,B) = \sqrt{ \sum_{i,j} \left|A_{ij} - B_{ij}\right|^2} }

This clearly makes \mathrm{End}(\mathbb{C}^n) into a 2n^2-dimensional Euclidean space. But a better way to think about this metric is that it comes from the norm

\displaystyle{ \|A\|^2 = \mathrm{tr}(AA^\dagger) = \sum_{i,j} |A_{ij}|^2 }

where \mathrm{tr} is the trace, or sum of the diagonal entries. We have

d(A,B) = \|A - B\|

I want to think about the distance between two randomly chosen points in the group, where ‘randomly chosen’ means with respect to normalized Haar measure: the unique translation-invariant probability Borel measure on the group. But because this measure and also the distance function are translation-invariant, we can equally well think about the distance between the identity 1 and one randomly chosen point g in the group. So let’s work out this distance!

I really mean the distance between \rho(g) and \rho(1), so let’s compute that. Actually its square will be nicer, which is why we only consider even moments. We have

\begin{array}{ccl}  d(\rho(g),\rho(1))^2 &=& \|\rho(g) - \rho(1)\|^2  \\ \\  &=& \|\rho(g) - 1\|^2  \\  \\  &=& \mathrm{tr}\left((\rho(g) - 1)(\rho(g) - 1)^\dagger)\right) \\ \\  &=& \mathrm{tr}\left(\rho(g)\rho(g)^\dagger - \rho(g) - \rho(g)^\ast + 1\right) \\ \\  &=& \mathrm{tr}\left(2 - \rho(g) - \rho(g)^\dagger \right)   \end{array}

Now, any representation \sigma of G has a character

\chi_\sigma \colon G \to \mathbb{C}

defined by

\chi_\sigma(g) = \mathrm{tr}(\sigma(g))

and characters have many nice properties. So, we should rewrite the distance between g and the identity using characters. We have our representation \rho, whose character can be seen lurking in the formula we saw:

d(\rho(g),\rho(1))^2 = \mathrm{tr}\left(2 - \rho(g) - \rho(g)^\dagger \right)

But there’s another representation lurking here, the dual

\rho^\ast \colon G \to \mathrm{End}(\mathbb{C}^n)

given by

\rho^\ast(g)_{ij} = \overline{\rho(g)_{ij}}

This is a fairly lowbrow way of defining the dual representation, good only for unitary representations on \mathbb{C}^n, but it works well for us here, because it lets us instantly see

\mathrm{tr}(\rho(g)^\dagger) = \mathrm{tr}(\rho^\ast(g)) = \chi_{\rho^\ast}(g)

This is useful because it lets us write our distance squared

d(\rho(g),\rho(1))^2 = \mathrm{tr}\left(2 - \rho(g) - \rho(g)^\dagger \right)

in terms of characters:

d(\rho(g),\rho(1))^2 = 2n - \chi_\rho(g) - \chi_{\rho^\ast}(g)

So, the distance squared is an integral linear combination of characters. (The constant function 1 is the character of the 1-dimensional trivial representation.)

And this does the job: it shows that all the even moments of our distance squared function are integers!

Why? Because of these two facts:

1) If you take an integral linear combination of characters, and raise it to a power, you get another integral linear combination of characters.

2) If you take an integral linear combination of characters, and integrate it over G, you get an integer.

I feel like explaining these facts a bit further, because they’re part of a very beautiful branch of math, called character theory, which every mathematician should know. So here’s a quick intro to character theory for beginners. It’s not as elegant as I could make it; it’s not as simple as I could make it: I’ll try to strike a balance here.

There’s an abelian group R(G) consisting of formal differences of isomorphism classes of representations of G, mod the relation

[\rho] + [\sigma] = [\rho \oplus \sigma]

Elements of R(G) are called virtual representations of G. Unlike actual representations we can subtract them. We can also add them, and the above formula relates addition in R(G) to direct sums of representations.

We can also multiply them, by saying

[\rho] [\sigma] = [\rho \otimes \sigma]

and decreeing that multiplication distributes over addition and subtraction. This makes R(G) into a ring, called the representation ring of G.

There’s a map

\chi \colon R(G) \to C(G)

where C(G) is the ring of continuous complex-valued functions on G. This map sends each finite-dimensional representation \rho to its character \chi_\rho. This map is one-to-one because we know a representation up to isomorphism if we know its character. This map is also a ring homomorphism, since

\chi_{\rho \oplus \sigma} = \chi_\rho + \chi_\sigma


\chi_{\rho \otimes \sigma} = \chi_\rho \chi_\sigma

These facts are easy to check directly.

We can integrate continuous complex-valued functions on G, so we get a map

\displaystyle{\int} \colon C(G) \to \mathbb{C}

The first non-obvious fact in character theory is that we can compute inner products of characters as follows:

\displaystyle{\int} \overline{\chi_\sigma} \chi_\rho  =   \dim(\mathrm{hom}(\sigma,\rho))

where the expression at right is the dimension of the space of ‘intertwining operators’, or morphisms of representations, between the representation \sigma and the representation \rho.

What matters most for us now is that this inner product is an integer. In particular, if \chi_\rho is the character of any representation,

\displaystyle{\int} \chi_\rho

is an integer because we can take \sigma to be the trivial representation in the previous formula, giving \chi_\sigma = 1.

Thus, the map

R(G) \stackrel{\chi}{\longrightarrow} C(G) \stackrel{\int}{\longrightarrow} \mathbb{C}

actually takes values in \mathbb{Z}.

Now, our distance squared function

2n - \chi_\rho - \chi_{\rho^\ast} \in C(G)

is actually the image under \chi of an element of the representation ring, namely

2n - [\rho] - [\rho^\ast]

So the same is true for any of its powers—and when we integrate any of these powers we get an integer!

This stuff may seem abstract, but if you’re good at tensoring representations of some group, like \mathrm{SU}(3), you should be able to use it to compute the even moments of the distance function on this group more efficiently than using the brute-force direct approach. Instead of complicated integrals we wind up doing combinatorics.

I would like to know what sequence of integers we get for \mathrm{SU}(3). A much easier, less thrilling but still interesting example is \mathrm{SO}(3). This is the 3-dimensional real projective space \mathbb{R}\mathrm{P}^3, which we can think of as embedded in the 9-dimensional space of 3\times 3 real matrices. It’s sort of cool that I could now work out the even moments of the distance function on this space by hand! But I haven’t done it yet.

Random Points on a Sphere (Part 2)

12 July, 2018

This is the tale of a mathematical adventure. Last time our hardy band of explorers discovered that if you randomly choose two points on the unit sphere in 1-, 2- or 4-dimensional space and look at the probability distribution of their distances, then the even moments of this probability distribution are always integers. I gave a proof using some group representation theory.

On the other hand, with the help of Mathematica, Greg Egan showed that we can work out these moments for a sphere in any dimension by actually doing the bloody integrals.

He looked at the nth moment of the distance for two randomly chosen points in the unit sphere in \mathbb{R}^d, and he got

\displaystyle{ \text{moment}(d,n) = \frac{2^{d+n-2} \Gamma(\frac{d}{2}) \Gamma(\frac{1}{2} (d+n-1))}{\sqrt{\pi} \, \Gamma(d+ \frac{n}{2} - 1)} }

This looks pretty scary, but you can simplify it using the relation between the gamma function and factorials. Remember, for integers we have

\Gamma(n) = (n-1)!

We also need to know \Gamma at half-integers, which we can get knowing

\Gamma(\frac{1}{2}) = \sqrt{\pi}


\Gamma(x + 1) =  x \Gamma(x)

Using these we can express moment(d,n) in terms of factorials, but the details depend on whether d and n are even or odd.

I’m going to focus on the case where both the dimension d and the moment number n are even, so let

d = 2e, \; n = 2m

In this case we get

\text{moment}(2e,2m) = \displaystyle{\frac{\binom{2(e+m-1)} {m}}{\binom{e+m-1}{m}} }

Here ‘we’ means that Greg Egan did all the hard work:

From this formula

\text{moment}(2e,2m) = \displaystyle{\frac{\binom{2(e+m-1)} {m}}{\binom{e+m-1}{m}} }

you can show directly that the even moments in 4 dimensions are Catalan numbers:

\text{moment}(4,2m) = C_{m+1}

while in 2 dimensions they are binomial coefficients:

\mathrm{moment}(2,2m) = \displaystyle{ {2m \choose m} }

More precisely, they are ‘central’ binomial cofficients, forming the middle column of Pascal’s triangle:

1, 2, 6, 20, 70, 252,  924, 3432, 12870, 48620, \dots

So, it seems that with some real work one can get vastly more informative results than with my argument using group representation theory. The only thing you don’t get, so far, is an appealing explanation of why the even moments are integral in dimensions 1, 2 and 4.

The computational approach also opens up a huge new realm of questions! For example, are there any dimensions other than 1, 2 and 4 where the even moments are all integral?

I was especially curious about dimension 8, where the octonions live. Remember, 1, 2 and 4 are the dimensions of the associative normed division algebras, but there’s also a nonassociative normed division algebra in dimension 8: the octonions.

The d = 8 row seemed to have a fairly high fraction of integer entries:


I wondered if there were only finitely many entries in the 8th row that weren’t integers. Greg Egan did a calculation and replied:

The d=8 moments don’t seem to become all integers permanently at any point, but the non-integers become increasingly sparse.

He also got evidence suggesting that for any even dimension d, a large fraction of the even moments are integers. After some further conversation he found the nice way to think about this. Recall that

\text{moment}(2e,2m) = \displaystyle{\frac{\binom{2(e+m-1)} {m}}{\binom{e+m-1}{m}} }

If we let

r = e-1

then this moment is just

\text{moment}(2r+2,2m) = \displaystyle{\frac{\binom{2(m+r)}{m}}{\binom{m+r}{m}} }

so the question becomes: when is this an integer?

It’s good to think about this naively a bit. We can cancel out a bunch of stuff in that ratio of binomial coefficents and write it like this:

\displaystyle{ \text{moment}(2r+2,2m) = \frac{(2r+m+1) \cdots (2r+2m)}{(r+1) \cdots (r+m)} }

So when is this an integer? Let’s do the 8th moment in 4 dimensions:

\text{moment}(4,8) = \displaystyle{ \frac{7 \cdot 8 \cdot 9 \cdot 10 }{2 \cdot 3 \cdot 4 \cdot 5} }

This is an integer, namely the Catalan number 42: the Answer to the Ultimate Question of Life, the Universe, and Everything.  But apparently we had to be a bit ‘lucky’ to get an integer. For example, we needed the 10 on top to deal with the 5 on the bottom.

It seems plausible that our chances of getting an integer increase as the moment gets big compared to the dimension. For example, try the 4th moment in dimension 10:

\text{moment}(10,4) = \displaystyle{ \frac{11 \cdot 12}{5 \cdot 6}}

This not an integer, because we’re just not multiplying enough numbers to handle the prime 5 in the denominator. The 6th moment in dimension 10 is also not an integer. But if we try the 8th moment, we get lucky:

\text{moment}(10,8) = \displaystyle{ \frac{13 \cdot 14 \cdot 15 \cdot 16}{5 \cdot 6 \cdot 7 \cdot 8}}

This is an integer! We’ve got enough in the numerator to handle everything in the denominator.

Greg posted a question about this on MathOverflow:

• Greg Egan, When does doubling the size of a set multiply the number of subsets by an integer?, 9 July 2018.

He got a very nice answer from a mysterious figure named Lucia, who pointed out relevant results from this interesting paper:

• Carl Pomerance, Divisors of the middle binomial coefficient, American Mathematical Monthly 122 (2015), 636–644.

Using these, Lucia proved a result that implies the following:

Theorem. If we fix a sphere of some even dimension, and look at the even moments of the probability distribution of distances between randomly chosen points on that sphere, from the 2nd moment to the (2m)th, the fraction of these that are integers approaches 1 as m → ∞.

On the other hand, Lucia also believes Pomerance’s techniques can be used to prove a result that would imply this:

Conjecture. If we fix a sphere of some even dimension > 4, and consider the even moments of the probability distribution of distances between randomly chosen points on that sphere, infinitely many of these are not integers.

In summary: we’re seeing a more or less typical rabbit-hole in mathematics. We started by trying to understand how noncommutative quaternions are on average. We figured that out, but we got sidetracked by thinking about how far points on a sphere are on average. We started calculating, we got interested in moments of the probability distribution of distances, we noticed that the Catalan numbers show up, and we got pulled into some representation theory and number theory!

I wouldn’t say our results are earth-shaking, but we definitely had fun and learned a thing or two. One thing at least is clear. In pure math, at least, it pays to follow the ideas wherever they lead. Math isn’t really divided into different branches—it’s all connected!


Oh, and one more thing. Remember how this quest started with John D. Cook numerically computing the average of |xy - yx| over unit quaternions? Well, he went on and numerically computed the average of |(xy)z - x(yz)| over unit octonions!

• John D. Cook, How close is octonion multiplication to being associative?, 9 July 2018.

He showed the average is about 1.095, and he created this histogram:

Later, Greg Egan computed the exact value! It’s

\displaystyle{ \frac{147456}{42875 \pi} \approx 1.0947335878 \dots }

On Twitter, Christopher D. Long, whose handle is @octonion, pointed out the hidden beauty of this answer—it equals

\displaystyle{ \frac{2^{14}3^2}{5^3 7^3 \pi}    }

Nice! Here’s how Greg did this calculation:

• Greg Egan, The average associator, 12 July 2018.


If you want more details on the proof of this:

Theorem. If we fix a sphere of some even dimension, and look at the even moments of the probability distribution of distances between randomly chosen points on that sphere, from the 2nd moment to the (2m)th, the fraction of these that are integers approaches 1 as m → ∞.

you should read Greg Egan’s question on Mathoverflow, Lucia’s reply, and Pomerance’s paper. Here is Greg’s question:

For natural numbers m, r, consider the ratio of the number of subsets of size m taken from a set of size 2(m+r) to the number of subsets of the same size taken from a set of size m+r:

\displaystyle{ R(m,r)=\frac{\binom{2(m+r)}{m}}{\binom{m+r}{m}} }

For r=0 we have the central binomial coefficients, which of course are all integers:

\displaystyle{ R(m,0)=\binom{2m}{m} }

For r=1 we have the Catalan numbers, which again are integers:

\displaystyle{ R(m,1)=\frac{\binom{2(m+1)}{m}}{m+1}=\frac{(2(m+1))!}{m!(m+2)!(m+1)}}
            \displaystyle{ = \frac{(2(m+1))!}{(m+2)!(m+1)!}=C_{m+1}}

However, for any fixed r\ge 2, while R(m,r) seems to be mostly integral, it is not exclusively so. For example, with m ranging from 0 to 20000, the number of times R(m,r) is an integer for r= 2,3,4,5 are 19583, 19485, 18566, and 18312 respectively.

I am seeking general criteria for R(m,r) to be an integer.

Edited to add:

We can write:

\displaystyle{ R(m,r) = \prod_{k=1}^m{\frac{m+2r+k}{r+k}} }

So the denominator is the product of m consecutive numbers r+1, \ldots, m+r, while the numerator is the product of m consecutive numbers m+2r+1,\ldots,2m+2r. So there is a gap of r between the last of the numbers in the denominator and the first of the numbers in the numerator.

Lucia replied:

Put n=m+r, and then we can write R(m,r) more conveniently as

\displaystyle{ R(m,r) = \frac{(2n)!}{m! (n+r)!} \frac{m! r!}{n!} = \frac{\binom{2n}{n} }{\binom{n+r}{r}}. }

So the question essentially becomes one about which numbers n+k for k=1, \ldots, r divide the middle binomial coefficient \binom{2n}{n}. Obviously when k=1, n+1 always divides the middle binomial coefficient, but what about other values of k? This is treated in a lovely Monthly article of Pomerance:

• Carl Pomerance, Divisors of the middle binomial coefficient, American Mathematical Monthly 122 (2015), 636–644.

Pomerance shows that for any k \ge 2 there are infinitely many integers with n+k not dividing \binom{2n}{n}, but the set of integers n for which n+k does divide \binom{2n}{n} has density 1. So for any fixed r, for a density 1 set of values of n one has that (n+1), \ldots, (n+k) all divide \binom{2n}{n}, which means that their lcm must divide \binom{2n}{n}. But one can check without too much difficulty that the lcm of n+1, \ldots, n+k is a multiple of \binom{n+k}{k}, and so for fixed r one deduces that R(m,r) is an integer for a set of values m with density 1. (Actually, Pomerance mentions explicitly in (5) of his paper that (n+1)(n+2)\cdots (n+k) divides \binom{2n}{n} for a set of full density.)

I haven’t quite shown that R(m,r) is not an integer infinitely often for r\ge 2, but I think this can be deduced from Pomerance’s paper (by modifying his Theorem 1).

I highly recommend Pomerance’s paper—you don’t need to care much about which integers divide

\displaystyle{ \binom{2n}{n} }

to find it interesting, because it’s full of clever ideas and nice observations.