Open Petri Nets

15 August, 2018

Jade Master and I have just finished a paper on open Petri nets:

• John Baez and Jade Master, Open Petri nets

Abstract. The reachability semantics for Petri nets can be studied using open Petri nets. For us an ‘open’ Petri net is one with certain places designated as inputs and outputs via a cospan of sets. We can compose open Petri nets by gluing the outputs of one to the inputs of another. Open Petri nets can be treated as morphisms of a category, which becomes symmetric monoidal under disjoint union. However, since the composite of open Petri nets is defined only up to isomorphism, it is better to treat them as morphisms of a symmetric monoidal double category \mathbb{O}\mathbf{pen}(\mathrm{Petri}). Various choices of semantics for open Petri nets can be described using symmetric monoidal double functors out of \mathbb{O}\mathbf{pen}(\mathrm{Petri}). Here we describe the reachability semantics, which assigns to each open Petri net the relation saying which markings of the outputs can be obtained from a given marking of the inputs via a sequence of transitions. We show this semantics gives a symmetric monoidal lax double functor from \mathbb{O}\mathbf{pen}(\mathrm{Petri}) to the double category of relations. A key step in the proof is to treat Petri nets as presentations of symmetric monoidal categories; for this we use the work of Meseguer, Montanari, Sassone and others.

I’m excited about this, especially because our friends at Statebox are planning to use open Petri nets in their software. They’ve recently come out with a paper too:

• Fabrizio Romano Genovese and Jelle Herold, Executions in (semi-)integer Petri nets are compact closed categories.

Petri nets are widely used to model open systems in subjects ranging from computer science to chemistry. There are various kinds of Petri net, and various ways to make them ‘open’, and my paper with Jade only handles the simplest. But our techniques are flexible, so they can be generalized.

What’s an open Petri net? For us, it’s a thing like this:

The yellow circles are called ‘places’ (or in chemistry, ‘species’). The aqua rectangles are called ‘transitions’ (or in chemistry, ‘reactions’). There can in general be lots of places and lots of transitions. The bold arrows from places to transitions and from transitions to places complete the structure of a Petri net. There are also arbitrary functions from sets X and Y into the set of places. This makes our Petri net into an ‘open’ Petri net.

We can think of open Petri nets as morphisms between finite sets. There’s a way to compose them! Suppose we have an open Petri net P from X to Y, where now I’ve given names to the points in these sets:

We write this as P \colon X \nrightarrow Y for short, where the funky arrow reminds us this isn’t a function between sets. Given another open Petri net Q \colon Y \nrightarrow Z, for example this:

the first step in composing P and Q is to put the pictures together:

At this point, if we ignore the sets X,Y,Z, we have a new Petri net whose set of places is the disjoint union of those for P and Q.

The second step is to identify a place of P with a place of Q whenever both are images of the same point in Y. We can then stop drawing everything involving Y, and get an open Petri net QP \colon X \nrightarrow Z, which looks like this:

Formalizing this simple construction leads us into a bit of higher category theory. The process of taking the disjoint union of two sets of places and then quotienting by an equivalence relation is a pushout. Pushouts are defined only up to canonical isomorphism: for example, the place labeled C in the last diagram above could equally well have been labeled D or E. This is why to get a category, with composition strictly associative, we need to use isomorphism classes of open Petri nets as morphisms. But there are advantages to avoiding this and working with open Petri nets themselves. Basically, it’s better to work with things than mere isomorphism classes of things! If we do this, we obtain not a category but a bicategory with open Petri nets as morphisms.

However, this bicategory is equipped with more structure. Besides composing open Petri nets, we can also ‘tensor’ them via disjoint union: this describes Petri nets being run in parallel rather than in series. The result is a symmetric monoidal bicategory. Unfortunately, the axioms for a symmetric monoidal bicategory are cumbersome to check directly. Double categories turn out to be more convenient.

Double categories were introduced in the 1960s by Charles Ehresmann. More recently they have found their way into applied mathematics. They been used to study various things, including open dynamical systems:

• Eugene Lerman and David Spivak, An algebra of open continuous time dynamical systems and networks.

open electrical circuits and chemical reaction networks:

• Kenny Courser, A bicategory of decorated cospans, Theory and Applications of Categories 32 (2017), 995–1027.

open discrete-time Markov chains:

• Florence Clerc, Harrison Humphrey and P. Panangaden, Bicategories of Markov processes, in Models, Algorithms, Logics and Tools, Lecture Notes in Computer Science 10460, Springer, Berlin, 2017, pp. 112–124.

and coarse-graining for open continuous-time Markov chains:

• John Baez and Kenny Courser, Coarse-graining open Markov processes. (Blog article here.)

As noted by Shulman, the easiest way to get a symmetric monoidal bicategory is often to first construct a symmetric monoidal double category:

• Mike Shulman, Constructing symmetric monoidal bicategories.

The theory of ‘structured cospans’ gives a systematic way to build symmetric monoidal double categories—Kenny Courser and I are writing a paper on this—and Jade and I use this to construct the symmetric monoidal double category of open Petri nets.

A 2-morphism in a double category can be drawn as a square like this:

We call X_1,X_2,Y_1 and Y_2 ‘objects’, f and g ‘vertical 1-morphisms’, M and N ‘horizontal 1-cells’, and \alpha a ‘2-morphism’. We can compose vertical 1-morphisms to get new vertical 1-morphisms and compose horizontal 1-cells to get new horizontal 1-cells. We can compose the 2-morphisms in two ways: horizontally and vertically. (This is just a quick sketch of the ideas, not the full definition.)

In our paper, Jade and I start by constructing a symmetric monoidal double category \mathbb{O}\mathbf{pen}(\textrm{Petri}) with:

• sets X, Y, Z, \dots as objects,

• functions f \colon X \to Y as vertical 1-morphisms,

• open Petri nets P \colon X \nrightarrow Y as horizontal 1-cells,

• morphisms between open Petri nets as 2-morphisms.

(Since composition of horizontal 1-cells is associative only up to an invertible 2-morphism, this is technically a pseudo double category.)

What are the morphisms between open Petri nets like? A simple example may be help give a feel for this. There is a morphism from this open Petri net:

to this one:

mapping both primed and unprimed symbols to unprimed ones. This describes a process of ‘simplifying’ an open Petri net. There are also morphisms that include simple open Petri nets in more complicated ones, etc.

This is just the start. Our real goal is to study the semantics of open Petri nets: that is, how they actually describe processes! More on that later.

The Philosophy and Physics of Noether’s Theorems

11 August, 2018


I’ll be speaking at a conference celebrating the centenary of Emmy Noether’s work connecting symmetries and conservation laws:

The Philosophy and Physics of Noether’s Theorems, 5-6 October 2018, Fischer Hall, 1-4 Suffolk Street, London, UK. Organized by Bryan W. Roberts (LSE) and Nicholas Teh (Notre Dame).

They write:

2018 brings with it the centenary of a major milestone in mathematical physics: the publication of Amalie (“Emmy”) Noether’s theorems relating symmetry and physical quantities, which continue to be a font of inspiration for “symmetry arguments” in physics, and for the interpretation of symmetry within philosophy.

In order to celebrate Noether’s legacy, the University of Notre Dame and the LSE Centre for Philosophy of Natural and Social Sciences are co-organizing a conference that will bring together leading mathematicians, physicists, and philosophers of physics in order to discuss the enduring impact of Noether’s work.

There’s a registration fee, which you can see on the conference website, along with a map showing the conference location, a schedule of the talks, and other useful stuff.

Here are the speakers:

John Baez (UC Riverside)

Jeremy Butterfield (Cambridge)

Anne-Christine Davis (Cambridge)

Sebastian De Haro (Amsterdam and Cambridge)

Ruth Gregory (Durham)

Yvette Kosmann-Schwarzbach (Paris)

Peter Olver (UMN)

Sabrina Pasterski (Harvard)

Oliver Pooley (Oxford)

Tudor Ratiu (Shanghai Jiao Tong and Geneva)

Kasia Rejzner (York)

Robert Spekkens (Perimeter)

I’m looking forward to analyzing the basic assumptions behind various generalizations of Noether’s first theorem, the one that shows symmetries of a Lagrangian give conserved quantities. Having generalized it to Markov processes, I know there’s a lot more to what’s going on here than just the wonders of Lagrangian mechanics:

• John Baez and Brendan Fong, A Noether theorem for Markov processes, J. Math. Phys. 54 (2013), 013301. (Blog article here.)

I’ve been trying to get to the bottom of it ever since.

Compositionality: the Editorial Board

17 July, 2018

The editors of this journal have an announcement:

We are happy to announce the founding editorial board of Compositionality, featuring established researchers working across logic, computer science, physics, linguistics, coalgebra, and pure category theory (see the full list below). Our steering board considered many strong applications to our initial open call for editors, and it was not easy narrowing down to the final list, but we think that the quality of this editorial board and the general response bodes well for our growing research community.

In the meantime, we hope you will consider submitting something to our first issue. Look out in the coming weeks for the journal’s official open-for-submissions announcement.

The editorial board of Compositionality:

• Corina Cristea, University of Southampton, UK
• Ross Duncan, University of Strathclyde, UK
• Andrée Ehresmann, University of Picardie Jules Verne, France
• Tobias Fritz, Max Planck Institute, Germany
• Neil Ghani, University of Strathclyde, UK
• Dan Ghica, University of Birmingham, UK
• Jeremy Gibbons, University of Oxford, UK
• Nick Gurski, Case Western Reserve University, USA
• Helle Hvid Hansen, Delft University of Technology, Netherlands
• Chris Heunen, University of Edinburgh, UK
• Aleks Kissinger, Radboud University, Netherlands
• Joachim Kock, Universitat Autònoma de Barcelona, Spain
• Martha Lewis, University of Amsterdam, Netherlands
• Samuel Mimram, École Polytechnique, France
• Simona Paoli, University of Leicester, UK
• Dusko Pavlovic, University of Hawaii, USA
• Christian Retoré, Université de Montpellier, France
• Mehrnoosh Sadrzadeh, Queen Mary University, UK
• Peter Selinger, Dalhousie University, Canada
• Pawel Sobocinski, University of Southampton, UK
• David Spivak, MIT, USA
• Jamie Vicary, University of Birmingham, UK
• Simon Willerton, University of Sheffield, UK

Josh, Brendan, and Nina
Executive editors, Compositionality

Applied Category Theory Course: Collaborative Design

13 July, 2018

In my online course we’re reading the fourth chapter of Fong and Spivak’s book Seven Sketches. Chapter 4 is about collaborative design: building big projects from smaller parts. This is based on work by Andrea Censi:

• Andrea Censi, A mathematical theory of co-design.

The main mathematical content of this chapter is the theory of enriched profunctors. We’ll mainly talk about enriched profunctors between categories enriched in monoidal preorders. The picture above shows what one of these looks like!

Here are my lectures so far:

Lecture 55 – Chapter 4: Enriched Profunctors and Collaborative Design
Lecture 56 – Chapter 4: Feasibility Relations
Lecture 57 – Chapter 4: Feasibility Relations
Lecture 58 – Chapter 4: Composing Feasibility Relations
Lecture 59 – Chapter 4: Cost-Enriched Profunctors
Lecture 60 – Chapter 4: Closed Monoidal Preorders
Lecture 61 – Chapter 4: Closed Monoidal Preorders
Lecture 62 – Chapter 4: Constructing Enriched Categories
Lecture 63 – Chapter 4: Composing Enriched Profunctors
Lecture 64 – Chapter 4: The Category of Enriched Profunctors
Lecture 65 – Chapter 4: Collaborative Design
Lecture 66 – Chapter 4: Collaborative Design
Lecture 67 – Chapter 4: Feedback in Collaborative Design
Lecture 68 – Chapter 4: Feedback in Collaborative Design
Lecture 69 – Chapter 4: Feedback in Collaborative Design
Lecture 70 – Chapter 4: Tensoring Enriched Profunctors

Random Points on a Group

13 July, 2018

In Random Points on a Sphere (Part 1), we learned an interesting fact. You can take the unit sphere in \mathbb{R}^n, randomly choose two points on it, and compute their distance. This gives a random variable, whose moments you can calculate.

And now the interesting part: when n = 1, 2 or 4, and seemingly in no other cases, all the even moments are integers.

These are the dimensions in which the spheres are groups. We can prove that the even moments are integers because they are differences of dimensions of certain representations of these groups. Rogier Brussee and Allen Knutson pointed out that if we want to broaden our line of investigation, we can look at other groups. So that’s what I’ll do today.

If we take a representation of a compact Lie group G, we get a map from group into a space of square matrices. Since there is a standard metric on any space of square matrices, this lets us define the distance between two points on the group. This is different than the distance defined using the shortest geodesic in the group: instead, we’re taking a straight-line path in the larger space of matrices.

If we randomly choose two points on the group, we get a random variable, namely the distance between them. We can compute the moments of this random variable, and today I’ll prove that the even moments are all integers.

So, we get a sequence of integers from any representation \rho of any compact Lie group G. So far we’ve only studied groups that are spheres:

• The defining representation of \mathrm{O}(1) \cong S^0 on the real numbers \mathbb{R} gives the powers of 2.

• The defining representation of \mathrm{U}(1) \cong S^1 on the complex numbers \mathbb{C} gives the central binomial coefficients \binom{2n}{n}.

• The defining representation of \mathrm{Sp}(1) \cong S^3 on the quaternions \mathbb{H} gives the Catalan numbers.

It could be fun to work out these sequences for other examples. Our proof that the even moments are integers will give a way to calculate these sequences, not by doing integrals over the group, but by counting certain ‘random walks in the Weyl chamber’ of the group. Unfortunately, we need to count walks in a certain weighted way that makes things a bit tricky for me.

But let’s see why the even moments are integers!

If our group representation is real or quaternionic, we can either turn it into a complex representation or adapt my argument below. So, let’s do the complex case.

Let G be a compact Lie group with a unitary representation \rho on \mathbb{C}^n. This means we have a smooth map

\rho \colon G \to \mathrm{End}(\mathbb{C}^n)

where \mathrm{End}(\mathbb{C}^n) is the algebra of n \times n complex matrices, such that

\rho(1) = 1

\rho(gh) = \rho(g) \rho(h)


\rho(g) \rho(g)^\dagger = 1

where A^\dagger is the conjugate transpose of the matrix A.

To define a distance between points on G we’ll give \mathrm{End}(\mathbb{C}^n) its metric

\displaystyle{ d(A,B) = \sqrt{ \sum_{i,j} \left|A_{ij} - B_{ij}\right|^2} }

This clearly makes \mathrm{End}(\mathbb{C}^n) into a 2n^2-dimensional Euclidean space. But a better way to think about this metric is that it comes from the norm

\displaystyle{ \|A\|^2 = \mathrm{tr}(AA^\dagger) = \sum_{i,j} |A_{ij}|^2 }

where \mathrm{tr} is the trace, or sum of the diagonal entries. We have

d(A,B) = \|A - B\|

I want to think about the distance between two randomly chosen points in the group, where ‘randomly chosen’ means with respect to normalized Haar measure: the unique translation-invariant probability Borel measure on the group. But because this measure and also the distance function are translation-invariant, we can equally well think about the distance between the identity 1 and one randomly chosen point g in the group. So let’s work out this distance!

I really mean the distance between \rho(g) and \rho(1), so let’s compute that. Actually its square will be nicer, which is why we only consider even moments. We have

\begin{array}{ccl}  d(\rho(g),\rho(1))^2 &=& \|\rho(g) - \rho(1)\|^2  \\ \\  &=& \|\rho(g) - 1\|^2  \\  \\  &=& \mathrm{tr}\left((\rho(g) - 1)(\rho(g) - 1)^\dagger)\right) \\ \\  &=& \mathrm{tr}\left(\rho(g)\rho(g)^\dagger - \rho(g) - \rho(g)^\ast + 1\right) \\ \\  &=& \mathrm{tr}\left(2 - \rho(g) - \rho(g)^\dagger \right)   \end{array}

Now, any representation \sigma of G has a character

\chi_\sigma \colon G \to \mathbb{C}

defined by

\chi_\sigma(g) = \mathrm{tr}(\sigma(g))

and characters have many nice properties. So, we should rewrite the distance between g and the identity using characters. We have our representation \rho, whose character can be seen lurking in the formula we saw:

d(\rho(g),\rho(1))^2 = \mathrm{tr}\left(2 - \rho(g) - \rho(g)^\dagger \right)

But there’s another representation lurking here, the dual

\rho^\ast \colon G \to \mathrm{End}(\mathbb{C}^n)

given by

\rho^\ast(g)_{ij} = \overline{\rho(g)_{ij}}

This is a fairly lowbrow way of defining the dual representation, good only for unitary representations on \mathbb{C}^n, but it works well for us here, because it lets us instantly see

\mathrm{tr}(\rho(g)^\dagger) = \mathrm{tr}(\rho^\ast(g)) = \chi_{\rho^\ast}(g)

This is useful because it lets us write our distance squared

d(\rho(g),\rho(1))^2 = \mathrm{tr}\left(2 - \rho(g) - \rho(g)^\dagger \right)

in terms of characters:

d(\rho(g),\rho(1))^2 = 2n - \chi_\rho(g) - \chi_{\rho^\ast}(g)

So, the distance squared is an integral linear combination of characters. (The constant function 1 is the character of the 1-dimensional trivial representation.)

And this does the job: it shows that all the even moments of our distance squared function are integers!

Why? Because of these two facts:

1) If you take an integral linear combination of characters, and raise it to a power, you get another integral linear combination of characters.

2) If you take an integral linear combination of characters, and integrate it over G, you get an integer.

I feel like explaining these facts a bit further, because they’re part of a very beautiful branch of math, called character theory, which every mathematician should know. So here’s a quick intro to character theory for beginners. It’s not as elegant as I could make it; it’s not as simple as I could make it: I’ll try to strike a balance here.

There’s an abelian group R(G) consisting of formal differences of isomorphism classes of representations of G, mod the relation

[\rho] + [\sigma] = [\rho \oplus \sigma]

Elements of R(G) are called virtual representations of G. Unlike actual representations we can subtract them. We can also add them, and the above formula relates addition in R(G) to direct sums of representations.

We can also multiply them, by saying

[\rho] [\sigma] = [\rho \otimes \sigma]

and decreeing that multiplication distributes over addition and subtraction. This makes R(G) into a ring, called the representation ring of G.

There’s a map

\chi \colon R(G) \to C(G)

where C(G) is the ring of continuous complex-valued functions on G. This map sends each finite-dimensional representation \rho to its character \chi_\rho. This map is one-to-one because we know a representation up to isomorphism if we know its character. This map is also a ring homomorphism, since

\chi_{\rho \oplus \sigma} = \chi_\rho + \chi_\sigma


\chi_{\rho \otimes \sigma} = \chi_\rho \chi_\sigma

These facts are easy to check directly.

We can integrate continuous complex-valued functions on G, so we get a map

\displaystyle{\int} \colon C(G) \to \mathbb{C}

The first non-obvious fact in character theory is that we can compute inner products of characters as follows:

\displaystyle{\int} \overline{\chi_\sigma} \chi_\rho  =   \dim(\mathrm{hom}(\sigma,\rho))

where the expression at right is the dimension of the space of ‘intertwining operators’, or morphisms of representations, between the representation \sigma and the representation \rho.

What matters most for us now is that this inner product is an integer. In particular, if \chi_\rho is the character of any representation,

\displaystyle{\int} \chi_\rho

is an integer because we can take \sigma to be the trivial representation in the previous formula, giving \chi_\sigma = 1.

Thus, the map

R(G) \stackrel{\chi}{\longrightarrow} C(G) \stackrel{\int}{\longrightarrow} \mathbb{C}

actually takes values in \mathbb{Z}.

Now, our distance squared function

2n - \chi_\rho - \chi_{\rho^\ast} \in C(G)

is actually the image under \chi of an element of the representation ring, namely

2n - [\rho] - [\rho^\ast]

So the same is true for any of its powers—and when we integrate any of these powers we get an integer!

This stuff may seem abstract, but if you’re good at tensoring representations of some group, like \mathrm{SU}(3), you should be able to use it to compute the even moments of the distance function on this group more efficiently than using the brute-force direct approach. Instead of complicated integrals we wind up doing combinatorics.

I would like to know what sequence of integers we get for \mathrm{SU}(3). A much easier, less thrilling but still interesting example is \mathrm{SO}(3). This is the 3-dimensional real projective space \mathbb{R}\mathrm{P}^3, which we can think of as embedded in the 9-dimensional space of 3\times 3 real matrices. It’s sort of cool that I could now work out the even moments of the distance function on this space by hand! But I haven’t done it yet.

Random Points on a Sphere (Part 2)

12 July, 2018

This is the tale of a mathematical adventure. Last time our hardy band of explorers discovered that if you randomly choose two points on the unit sphere in 1-, 2- or 4-dimensional space and look at the probability distribution of their distances, then the even moments of this probability distribution are always integers. I gave a proof using some group representation theory.

On the other hand, with the help of Mathematica, Greg Egan showed that we can work out these moments for a sphere in any dimension by actually doing the bloody integrals.

He looked at the nth moment of the distance for two randomly chosen points in the unit sphere in \mathbb{R}^d, and he got

\displaystyle{ \text{moment}(d,n) = \frac{2^{d+n-2} \Gamma(\frac{d}{2}) \Gamma(\frac{1}{2} (d+n-1))}{\sqrt{\pi} \, \Gamma(d+ \frac{n}{2} - 1)} }

This looks pretty scary, but you can simplify it using the relation between the gamma function and factorials. Remember, for integers we have

\Gamma(n) = (n-1)!

We also need to know \Gamma at half-integers, which we can get knowing

\Gamma(\frac{1}{2}) = \sqrt{\pi}


\Gamma(x + 1) =  x \Gamma(x)

Using these we can express moment(d,n) in terms of factorials, but the details depend on whether d and n are even or odd.

I’m going to focus on the case where both the dimension d and the moment number n are even, so let

d = 2e, \; n = 2m

In this case we get

\text{moment}(2e,2m) = \displaystyle{\frac{\binom{2(e+m-1)} {m}}{\binom{e+m-1}{m}} }

Here ‘we’ means that Greg Egan did all the hard work:

From this formula

\text{moment}(2e,2m) = \displaystyle{\frac{\binom{2(e+m-1)} {m}}{\binom{e+m-1}{m}} }

you can show directly that the even moments in 4 dimensions are Catalan numbers:

\text{moment}(4,2m) = C_{m+1}

while in 2 dimensions they are binomial coefficients:

\mathrm{moment}(2,2m) = \displaystyle{ {2m \choose m} }

More precisely, they are ‘central’ binomial cofficients, forming the middle column of Pascal’s triangle:

1, 2, 6, 20, 70, 252,  924, 3432, 12870, 48620, \dots

So, it seems that with some real work one can get vastly more informative results than with my argument using group representation theory. The only thing you don’t get, so far, is an appealing explanation of why the even moments are integral in dimensions 1, 2 and 4.

The computational approach also opens up a huge new realm of questions! For example, are there any dimensions other than 1, 2 and 4 where the even moments are all integral?

I was especially curious about dimension 8, where the octonions live. Remember, 1, 2 and 4 are the dimensions of the associative normed division algebras, but there’s also a nonassociative normed division algebra in dimension 8: the octonions.

The d = 8 row seemed to have a fairly high fraction of integer entries:


I wondered if there were only finitely many entries in the 8th row that weren’t integers. Greg Egan did a calculation and replied:

The d=8 moments don’t seem to become all integers permanently at any point, but the non-integers become increasingly sparse.

He also got evidence suggesting that for any even dimension d, a large fraction of the even moments are integers. After some further conversation he found the nice way to think about this. Recall that

\text{moment}(2e,2m) = \displaystyle{\frac{\binom{2(e+m-1)} {m}}{\binom{e+m-1}{m}} }

If we let

r = e-1

then this moment is just

\text{moment}(2r+2,2m) = \displaystyle{\frac{\binom{2(m+r)}{m}}{\binom{m+r}{m}} }

so the question becomes: when is this an integer?

It’s good to think about this naively a bit. We can cancel out a bunch of stuff in that ratio of binomial coefficents and write it like this:

\displaystyle{ \text{moment}(2r+2,2m) = \frac{(2r+m+1) \cdots (2r+2m)}{(r+1) \cdots (r+m)} }

So when is this an integer? Let’s do the 8th moment in 4 dimensions:

\text{moment}(4,8) = \displaystyle{ \frac{7 \cdot 8 \cdot 9 \cdot 10 }{2 \cdot 3 \cdot 4 \cdot 5} }

This is an integer, namely the Catalan number 42: the Answer to the Ultimate Question of Life, the Universe, and Everything.  But apparently we had to be a bit ‘lucky’ to get an integer. For example, we needed the 10 on top to deal with the 5 on the bottom.

It seems plausible that our chances of getting an integer increase as the moment gets big compared to the dimension. For example, try the 4th moment in dimension 10:

\text{moment}(10,4) = \displaystyle{ \frac{11 \cdot 12}{5 \cdot 6}}

This not an integer, because we’re just not multiplying enough numbers to handle the prime 5 in the denominator. The 6th moment in dimension 10 is also not an integer. But if we try the 8th moment, we get lucky:

\text{moment}(10,8) = \displaystyle{ \frac{13 \cdot 14 \cdot 15 \cdot 16}{5 \cdot 6 \cdot 7 \cdot 8}}

This is an integer! We’ve got enough in the numerator to handle everything in the denominator.

Greg posted a question about this on MathOverflow:

• Greg Egan, When does doubling the size of a set multiply the number of subsets by an integer?, 9 July 2018.

He got a very nice answer from a mysterious figure named Lucia, who pointed out relevant results from this interesting paper:

• Carl Pomerance, Divisors of the middle binomial coefficient, American Mathematical Monthly 122 (2015), 636–644.

Using these, Lucia proved a result that implies the following:

Theorem. If we fix a sphere of some even dimension, and look at the even moments of the probability distribution of distances between randomly chosen points on that sphere, from the 2nd moment to the (2m)th, the fraction of these that are integers approaches 1 as m → ∞.

On the other hand, Lucia also believes Pomerance’s techniques can be used to prove a result that would imply this:

Conjecture. If we fix a sphere of some even dimension > 4, and consider the even moments of the probability distribution of distances between randomly chosen points on that sphere, infinitely many of these are not integers.

In summary: we’re seeing a more or less typical rabbit-hole in mathematics. We started by trying to understand how noncommutative quaternions are on average. We figured that out, but we got sidetracked by thinking about how far points on a sphere are on average. We started calculating, we got interested in moments of the probability distribution of distances, we noticed that the Catalan numbers show up, and we got pulled into some representation theory and number theory!

I wouldn’t say our results are earth-shaking, but we definitely had fun and learned a thing or two. One thing at least is clear. In pure math, at least, it pays to follow the ideas wherever they lead. Math isn’t really divided into different branches—it’s all connected!


Oh, and one more thing. Remember how this quest started with John D. Cook numerically computing the average of |xy - yx| over unit quaternions? Well, he went on and numerically computed the average of |(xy)z - x(yz)| over unit octonions!

• John D. Cook, How close is octonion multiplication to being associative?, 9 July 2018.

He showed the average is about 1.095, and he created this histogram:

Later, Greg Egan computed the exact value! It’s

\displaystyle{ \frac{147456}{42875 \pi} \approx 1.0947335878 \dots }

On Twitter, Christopher D. Long, whose handle is @octonion, pointed out the hidden beauty of this answer—it equals

\displaystyle{ \frac{2^{14}3^2}{5^3 7^3 \pi}    }

Nice! Here’s how Greg did this calculation:

• Greg Egan, The average associator, 12 July 2018.


If you want more details on the proof of this:

Theorem. If we fix a sphere of some even dimension, and look at the even moments of the probability distribution of distances between randomly chosen points on that sphere, from the 2nd moment to the (2m)th, the fraction of these that are integers approaches 1 as m → ∞.

you should read Greg Egan’s question on Mathoverflow, Lucia’s reply, and Pomerance’s paper. Here is Greg’s question:

For natural numbers m, r, consider the ratio of the number of subsets of size m taken from a set of size 2(m+r) to the number of subsets of the same size taken from a set of size m+r:

\displaystyle{ R(m,r)=\frac{\binom{2(m+r)}{m}}{\binom{m+r}{m}} }

For r=0 we have the central binomial coefficients, which of course are all integers:

\displaystyle{ R(m,0)=\binom{2m}{m} }

For r=1 we have the Catalan numbers, which again are integers:

\displaystyle{ R(m,1)=\frac{\binom{2(m+1)}{m}}{m+1}=\frac{(2(m+1))!}{m!(m+2)!(m+1)}}
            \displaystyle{ = \frac{(2(m+1))!}{(m+2)!(m+1)!}=C_{m+1}}

However, for any fixed r\ge 2, while R(m,r) seems to be mostly integral, it is not exclusively so. For example, with m ranging from 0 to 20000, the number of times R(m,r) is an integer for r= 2,3,4,5 are 19583, 19485, 18566, and 18312 respectively.

I am seeking general criteria for R(m,r) to be an integer.

Edited to add:

We can write:

\displaystyle{ R(m,r) = \prod_{k=1}^m{\frac{m+2r+k}{r+k}} }

So the denominator is the product of m consecutive numbers r+1, \ldots, m+r, while the numerator is the product of m consecutive numbers m+2r+1,\ldots,2m+2r. So there is a gap of r between the last of the numbers in the denominator and the first of the numbers in the numerator.

Lucia replied:

Put n=m+r, and then we can write R(m,r) more conveniently as

\displaystyle{ R(m,r) = \frac{(2n)!}{m! (n+r)!} \frac{m! r!}{n!} = \frac{\binom{2n}{n} }{\binom{n+r}{r}}. }

So the question essentially becomes one about which numbers n+k for k=1, \ldots, r divide the middle binomial coefficient \binom{2n}{n}. Obviously when k=1, n+1 always divides the middle binomial coefficient, but what about other values of k? This is treated in a lovely Monthly article of Pomerance:

• Carl Pomerance, Divisors of the middle binomial coefficient, American Mathematical Monthly 122 (2015), 636–644.

Pomerance shows that for any k \ge 2 there are infinitely many integers with n+k not dividing \binom{2n}{n}, but the set of integers n for which n+k does divide \binom{2n}{n} has density 1. So for any fixed r, for a density 1 set of values of n one has that (n+1), \ldots, (n+k) all divide \binom{2n}{n}, which means that their lcm must divide \binom{2n}{n}. But one can check without too much difficulty that the lcm of n+1, \ldots, n+k is a multiple of \binom{n+k}{k}, and so for fixed r one deduces that R(m,r) is an integer for a set of values m with density 1. (Actually, Pomerance mentions explicitly in (5) of his paper that (n+1)(n+2)\cdots (n+k) divides \binom{2n}{n} for a set of full density.)

I haven’t quite shown that R(m,r) is not an integer infinitely often for r\ge 2, but I think this can be deduced from Pomerance’s paper (by modifying his Theorem 1).

I highly recommend Pomerance’s paper—you don’t need to care much about which integers divide

\displaystyle{ \binom{2n}{n} }

to find it interesting, because it’s full of clever ideas and nice observations.

Random Points on a Sphere (Part 1)

10 July, 2018

John D. Cook, Greg Egan, Dan Piponi and I had a fun mathematical adventure on Twitter. It started when John Cook wrote a program to compute the probability distribution of distances |xy - yx| where x and y were two randomly chosen unit quaternions:

• John D. Cook, How far is xy from yx on average for quaternions?, 5 July 2018.

Three things to note before we move on:

• Click the pictures to see the source and get more information—I made none of them!

• We’ll be ‘randomly choosing’ lots of points on spheres of various dimensions. Whenever we do this, I mean that they’re chosen independently, and uniformly with respect to the unique rotation-invariant Borel measure that’s a probability measure on the sphere. In other words: nothing sneaky, just the most obvious symmetrical thing!

• We’ll be talking about lots of distances between points on the unit sphere in n dimensions. Whenever we do this, I mean the Euclidean distance in \mathbb{R}^n, not the length of the shortest path on the sphere connecting them.


If you look at the histogram above, you’ll see the length |xy - yx| is between 0 and 2. That’s good, since xy and yx are on the unit sphere in 4 dimensions. More interestingly, the mean looks bigger than 1. John Cook estimated it at 1.13.

Greg Egan went ahead and found that the mean is exactly

\displaystyle{\frac{32}{9 \pi}} \approx 1.13176848421 \dots

He did this by working out a formula for the probability distribution:

All this is great, but it made me wonder how surprised I should be. What’s the average distance between two points on the unit sphere in 4 dimensions, anyway?

Greg Egan worked this out too:

So, the mean distance |x - y| for two randomly chosen unit quaternions is

\displaystyle{\frac{64}{15 \pi}} \approx 1.35812218105\dots

The mean of |xy - yx| is smaller than this. In retrospect this makes sense, since I know what quaternionic commutators are like: for example the points x = \pm 1 at the ‘north and south poles’ of the unit sphere commute with everybody. However, we can now say the mean of |xy - yx| is exactly

\displaystyle{\frac{32}{9\pi} } \cdot  \frac{15 \pi}{64} = \frac{5}{6}

times the mean of |x - y|, and there’s no way I could have guessed that.

While trying to get a better intuition for this, I realized that as you go to higher and higher dimensions, and you standing at the north pole of the unit sphere, the chance that a randomly chosen other point is quite near the equator gets higher and higher! That’s how high dimensions work. So, the mean value of |x - y| should get closer and closer to \sqrt{2}. And indeed, Greg showed that this is true:

The graphs here show the probability distributions of distances for randomly chosen pairs of points on spheres of various dimensions. As the dimension increases, the probability distribution gets more sharply peaked, and the mean gets closer to \sqrt{2}.

Greg wrote:

Here’s the general formula for the distribution, with plots for n=2,…,10. The mean distance does tend to √2, and the mean of the squared distance is always exactly 2, so the variance tends to zero.

But now comes the surprising part.

Dan Piponi looked at the probability distribution of distances s = |x - y| in the 4-dimensional case:

P(s) = \displaystyle{\frac{s^2\sqrt{4 - s^2}}{\pi} }

and somehow noticed that its moments

\int_0^2 P(s) s^{n} \, dx

when n is even, are the Catalan numbers!

Now if you don’t know about moments of probability distributions you should go read about those, because they’re about a thousand times more important than anything you’ll learn here.

And if you don’t know about Catalan numbers, you should go read about those, because they’re about a thousand times more fun than anything you’ll learn here.

So, I’ll assume you know about those. How did Dan Piponi notice that the Catalan numbers

C_0 = 1, C_1 = 1, C_2 = 2, C_3 = 5, C_4 = 14, C_5 = 42, \dots

were the even moments of this probability distribution? Maybe it’s because he recently managed to get ahold of Richard Stanley’s book on Amazon for just $11 instead of its normal price of $77.

(I don’t know how that happened. Some people write 7’s that look like 1’s, but….)

Anyway, you’ll notice that this strange phenomenon is all about points on the unit sphere in 4 dimensions. It doesn’t seem to involve quaternions anymore! So I asked if something similar happens in other dimensions, maybe giving us other interesting sequences of integers.

Greg Egan figured it out, and got some striking results:

Here d is the dimension of the Euclidean space containing our unit sphere, and Egan is tabulating the nth moment of the probability distribution of distances between two randomly chosen points on that sphere. The gnarly formula on top is a general expression for this moment in terms of the gamma function.

The obvious interesting feature of this table is that only for d = 2 and d = 4 rows are all the entries integers.

But Dan made another great observation: Greg left out the rather trivial d = 1 row, and that all the entries of this row would be integers too! Even better, d = 1, 2, and 4 are the dimensions of the associative normed division algebras: the real numbers, the complex numbers and the quaternions!

This made me eager to find a proof that all the even moments of the probability distribution of distances between points on the unit sphere in \mathbb{R}^d are integers when \mathbb{R}^d is an associative normed division algebra.

The first step is to notice the significance of even moments.

First, we don’t need to choose both points on the sphere randomly: we can fix one and let the other vary. So, we can think of the distance

D(x) = |(x_1, \dots, x_d) - (1, \dots, 0)| = \sqrt{(x_1 - 1)^2 + x_2^2 + \cdots + x_d^2}

as a function on the sphere, or more generally a function of x \in \mathbb{R}^d. And when we do this we instantly notice that the square root is rather obnoxious, but all the even powers of the function D are polynomials on \mathbb{R}^d.

Then, we notice that restricting polynomials from Euclidean space to the sphere is how we get spherical harmonics, so this problem is connected to spherical harmonics and ‘harmonic analysis’. The nth moment of the probability distribution of distances between points on the unit sphere in \mathbb{R}^d is

\int_{S^{d-1}} D^n

where we are integrating with respect to the rotation-invariant probability measure on the sphere. We can rewrite this as an inner product in L^2(S^{d-1}), namely

\langle D^n , 1 \rangle

where 1 is the constant function equal to 1 on the whole sphere.

We’re looking at the even moments, so let n = 2m. Now, why should

\langle D^{2m} , 1 \rangle

be an integer when d = 1, 2 and 4? Well, these are the cases where the sphere S^{d-1} is a group! For d = 1,

S^0 \cong \mathbb{Z}/2

is the multiplicative group of unit real numbers, \{\pm 1\}. For d = 2,

S^1 \cong \mathrm{U}(1)

is the multiplicative group of unit complex numbers. And for d = 4,

S^3 \cong \mathrm{SU}(2)

is the multiplicative group of unit quaternions.

These are compact Lie groups, and L^2 of a compact Lie group is very nice. Any finite-dimensional representation \rho of a compact Lie group G gives a function \chi_\rho \in L^2(G) called its character, given by

\chi_\rho(g) = \mathrm{tr}(\rho(g))

And it’s well-known that for two representations \rho and \sigma, the inner product

\langle \chi_\rho, \chi_\sigma \rangle

is an integer! In fact it’s a natural number: just the dimension of the space of intertwining operators from \rho to \sigma. So, we should try to prove that

\langle D^{2m} , 1 \rangle

is an integer this way. The function 1 is the character of the trivial 1-dimensional representation, so we’re okay there. What about D^{2m}?

Well, there’s a way to take the mth tensor power \rho^{\otimes m} of a representation \rho: you just tensor the representation with itself m times. And then you can easily show

\chi_{\rho^{\otimes m}} = (\chi_\rho)^m

So, if we can show D^2 is the character of a representation, we’re done: D^{2m} will be one as well, and the inner product

\langle D^{2m}, 1 \rangle

will be an integer! Great plan!

Unfortunately, D^2 is not the character of a representation.

Unless \rho is the completely silly 0-dimensional representation we have

\chi_\rho(1) = \mathrm{tr}(\rho(1)) = \dim(\rho) > 0

where 1 is the identity element of G. But suppose we let D(g) be the distance of g from the identity element—the natural choice of ‘north pole’ when we make our sphere into a group. Then we have

D(1)^2 = 0

So D^2 can’t be a character. (It’s definitely not the character of the completely silly 0-dimensional representation: that’s zero.)

But there’s a well-known workaround. We can work with virtual representations, which are formal differences of representations, like this:

\delta = \rho - \sigma

The character of a virtual representation is defined in the obvious way

\chi_\delta = \chi_\rho - \chi_\sigma

Since the inner product of characters of two representations is a natural number, the inner product of characters of two virtual representations will be an integer. And we’ll be completely satisfied if we prove that

\langle D^{2m}, 1 \rangle

is an integer, since it’s obviously ≥ 0.

So, we just need to show that D^{2m} is the character of a virtual representation. This will easily follow if we can show D^2 itself is the character of a virtual representation: you can tensor virtual representations, and then their characters multiply.

So, let’s do it! I’ll just do the quaternionic case. I’m doing it right now, thinking out loud here. I figure I should start with a really easy representation, take its character, compare that to our function D^2, and then fix it by subtracting something.

Let \rho be the spin-1/2 representation of \mathrm{SU}(2), which just sends every matrix in \mathrm{SU}(2) to itself. Every matrix in \mathrm{SU}(2) is conjugate to one of the form

g = \left(\begin{array}{cc} \exp(i\theta) & 0 \\ 0 & \exp(-i\theta) \end{array}\right)

so we can just look at those, and we have

\chi_\rho(g) = \mathrm{tr}(\rho(g)) = \mathrm{tr}(g) = 2 \cos \theta

On the other hand, we can think of g as a unit quaternion, and then

g = \cos \theta + i \sin \theta

where now i stands for the quaternion of that name! So, its distance from 1 is

D(g) = |\cos \theta + i \sin \theta - 1|

and if we square this we get

D(g)^2 = (1 - \cos \theta)^2 + \sin^2 \theta = 2 - 2 \cos \theta

So, we’re pretty close:

D(g)^2 = 2 - \chi_{\rho}

In particular, this means D^2 is the character of the virtual representation

(1 \oplus 1) - \rho

where 1 is the 1d trivial rep and \rho is the spin-1/2 rep.

So we’re done!

At least we’re done showing the even moments of the distance between two randomly chosen points on the 3-sphere is an integer. The 1-sphere and 0-sphere cases are similar.

But course there’s another approach! We can just calculate the darn moments and see what we get. This leads to deeper puzzles, which we have not completely solved. But I’ll talk about these next time, in Part 2.