Category Theory Course

13 October, 2018

I’m teaching a course on category theory at U.C. Riverside, and since my website is still suffering from reduced functionality I’ll put the course notes here for now. I taught an introductory course on category theory in 2016, but this one is a bit more advanced.

The hand-written notes here are by Christian Williams. They are probably best seen as a reminder to myself as to what I’d like to include in a short book someday.

Lecture 1: What is pure mathematics all about? The importance of free structures.

Lecture 2: The natural numbers as a free structure. Adjoint functors.

Lecture 3: Adjoint functors in terms of unit and counit.

Lecture 4: 2-Categories. Adjunctions.

Lecture 5: 2-Categories and string diagrams. Composing adjunctions.

Lecture 6: The ‘main spine’ of mathematics. Getting a monad from an adjunction.

Lecture 7: Definition of a monad. Getting a monad from an adjunction. The augmented simplex category.

Lecture 8: The walking monad, the augmented simplex category and the simplex category.

Lecture 9: Simplicial abelian groups from simplicial sets. Chain complexes from simplicial abelian groups.

Lecture 10: The Dold-Thom theorem: the category of simplicial abelian groups is equivalent to the category of chain complexes of abelian groups. The homology of a chain complex.

Lecture 12: The bar construction: getting a simplicial object from an adjunction. The bar construction for G-sets, previewed.

Lecture 13: The adjunction between G-sets and sets.

Lecture 14: The bar construction for groups.

Lecture 15: The simplicial set \mathbb{E}G obtained by applying the bar construction to the one-point G-set, its geometric realization EG = |\mathbb{E}G|, and the free simplicial abelian group \mathbb{Z}[\mathbb{E}G].

Lecture 16: The chain complex C(G) coming from the simplicial abelian group \mathbb{Z}[\mathbb{E}G], its homology, and the definition of group cohomology H^n(G,A) with coefficients in a G-module.

Lecture 17: Extensions of groups. The Jordan-Hölder theorem. How an extension of a group G by an abelian group A gives an action of G on A and a 2-cocycle c \colon G^2 \to A.

Lecture 18: Classifying abelian extensions of groups. Direct products, semidirect products, central extensions and general abelian extensions. The groups of order 8 as abelian extensions.

Lecture 19: Group cohomology. The chain complex for the cohomology of G with coefficients in A, starting from the bar construction, and leading to the 2-cocycles used in classifying abelian extensions. The classification of extensions of G by A in terms of H^2(G,A).

Lebesgue Universal Covering Problem (Part 3)

7 October, 2018

Back in 2015, I reported some progress on this difficult problem in plane geometry. I’m happy to report some more.

First, remember the story. A subset of the plane has diameter 1 if the distance between any two points in this set is ≤ 1. A universal covering is a convex subset of the plane that can cover a translated, reflected and/or rotated version of every subset of the plane with diameter 1. In 1914, the famous mathematician Henri Lebesgue sent a letter to a fellow named Pál, challenging him to find the universal covering with the least area.

Pál worked on this problem, and 6 years later he published a paper on it. He found a very nice universal covering: a regular hexagon in which one can inscribe a circle of diameter 1. This has area

\displaystyle{ \frac{\sqrt{3}}{2} \approx 0.86602540 }
But he also found a universal covering with less area, by removing two triangles from this hexagon—for example, the triangles C1C2C3 and E1E2E3 here:

The resulting universal covering has area

\displaystyle{ 2 - \frac{2}{\sqrt{3}} \approx 0.84529946 }
In 1936, Sprague went on to prove that more area could be removed from another corner of Pál’s original hexagon, giving a universal covering of area

\displaystyle{ 0.844137708436 }
In 1992, Hansen took these reductions even further by removing two more pieces from Pál’s hexagon. Each piece is a thin sliver bounded by two straight lines and an arc. The first piece is tiny. The second is downright microscopic!

Hansen claimed the areas of these regions were 4 · 10^{-11} and 6 · 10^{-18}. This turned out to be wrong. The actual areas are 3.7507 · 10^{-11} and 8.4460 · 10^{-21}. The resulting universal covering had an area of

\displaystyle{ 0.844137708398 }
This tiny improvement over Sprague’s work led Klee and Wagon to write:

it does seem safe to guess that progress on [this problem], which has been painfully slow in the past, may be even more painfully slow in the future.

However, in 2015 Philip Gibbs found a way to remove about a million times more area than Hansen’s larger region: a whopping 2.233 · 10^{-5}. This gave a universal covering with area

\displaystyle{ 0.8441153 }
Karine Bagdasaryan and I helped Gibbs write up a rigorous proof of this result, and we published it here:

• John Baez, Karine Bagdasaryan and Philip Gibbs, The Lebesgue universal covering problem, Journal of Computational Geometry 6 (2015), 288–299.

Greg Egan played an instrumental role as well, catching various computational errors.

At the time Philip was sure he could remove even more area, at the expense of a more complicated proof. Since the proof was already quite complicated, we decided to stick with what we had.

But this week I met Philip at The philosophy and physics of Noether’s theorems, a wonderful workshop in London which deserves a full blog article of its own. It turns out that he has gone further: he claims to have found a vastly better universal covering, with area

\displaystyle{ 0.8440935944 }
This is an improvement of 2.178245 × 10^{-5} over our earlier work—roughly equal to our improvement over Hansen.

You can read his argument here:

• Philip Gibbs, An upper bound for Lebesgue’s universal covering problem, 22 January 2018.

I say ‘claims’ not because I doubt his result—he’s clearly a master at this kind of mathematics!—but because I haven’t checked it and it’s easy to make mistakes, for example mistakes in computing the areas of the shapes removed.

It seems we are closing in on the final result; however, Philip Gibbs believes there is still room for improvement, so I expect it will take at least a decade or two to solve this problem… unless, of course, some mathematicians start working on it full-time, which could speed things up considerably.

Riverside Math Workshop

6 October, 2018

We’re having a workshop with a bunch of cool math talks at U. C. Riverside, and you can register for it here:

Riverside Mathematics Workshop for Excellence and Diversity, Friday 19 October – Saturday 20 October, 2018. Organized by John Baez, Carl Mautner, José González and Chen Weitao.

This is the first of an annual series of workshops to showcase and celebrate excellence in research by women and other under-represented groups for the purpose of fostering and encouraging growth in the U.C. Riverside mathematical community.

After tea at 3:30 p.m. on Friday there will be two plenary talks, lasting until 5:00. Catherine Searle will talk on “Symmetries of spaces with lower curvature bounds”, and Edray Goins will give a talk called “Clocks, parking garages, and the solvability of the quintic: a friendly introduction to monodromy”. There will then be a banquet in the Alumni Center 6:30 – 8:30 p.m.

On Saturday there will be coffee and a poster session at 8:30 a.m., and then two parallel sessions on pure and applied mathematics, with talks at 9:30, 10:30, 11:30, 1:00 and 2:00. Check out the abstracts here!

(I’m especially interested in Christina Vasilakopoulou’s talk on Frobenius and Hopf monoids in enriched categories, but she’s my postdoc so I’m biased.)

Applied Category Theory 2019

2 October, 2018


animation by Marius Buliga

I’m helping organize ACT 2019, an applied category theory conference and school at Oxford, July 15-26, 2019.

More details will come later, but here’s the basic idea. If you’re a grad student interested in this subject, you should apply for the ‘school’. Not yet—we’ll let you know when.

Dear all,

As part of a new growing community in Applied Category Theory, now with a dedicated journal Compositionality, a traveling workshop series SYCO, a forthcoming Cambridge U. Press book series Reasoning with Categories, and several one-off events including at NIST, we launch an annual conference+school series named Applied Category Theory, the coming one being at Oxford, July 15-19 for the conference, and July 22-26 for the school. The dates are chosen such that CT 2019 (Edinburgh) and the ACT 2019 conference (Oxford) will be back-to-back, for those wishing to participate in both.

There already was a successful invitation-only pilot, ACT 2018, last year at the Lorentz Centre in Leiden, also in the format of school+workshop.

For the conference, for those who are familiar with the successful QPL conference series, we will follow a very similar format for the ACT conference. This means that we will accept both new papers which then will be published in a proceedings volume (most likely a Compositionality special Proceedings issue), as well as shorter abstracts of papers published elsewhere. There will be a thorough selection process, as typical in computer science conferences. The idea is that all the best work in applied category theory will be presented at the conference, and that acceptance is something that means something, just like in CS conferences. This is particularly important for young people as it will help them with their careers.

Expect a call for submissions soon, and start preparing your papers now!

The school in ACT 2018 was unique in that small groups of students worked closely with an experienced researcher (these were John Baez, Aleks Kissinger, Martha Lewis and Pawel Sobociński), and each group ended up producing a paper. We will continue with this format or a closely related one, with Jules Hedges and Daniel Cicala as organisers this year. As there were 80 applications last year for 16 slots, we may want to try to find a way to involve more students.

We are fortunate to have a number of private sector companies closely associated in some way or another, who will also participate, with Cambridge Quantum Computing Inc. and StateBox having already made major financial/logistic contributions.

On behalf of the ACT Steering Committee,

John Baez, Bob Coecke, David Spivak, Christina Vasilakopoulou

Patterns That Eventually Fail

20 September, 2018

Sometimes patterns can lead you astray. For example, it’s known that

\displaystyle{ \mathrm{li}(x) = \int_0^x \frac{dt}{\ln t} }

is a good approximation to \pi(x), the number of primes less than or equal to x. Numerical evidence suggests that \mathrm{li}(x) is always greater than \pi(x). For example,

\mathrm{li}(10^{12}) - \pi(10^{12}) = 38,263


\mathrm{li}(10^{24}) - \pi(10^{24}) = 17,146,907,278

But in 1914, Littlewood heroically showed that in fact, \mathrm{li}(x) - \pi(x) changes sign infinitely many times!

This raised the question: when does \pi(x) first exceed \mathrm{li}(x)? In 1933, Littlewood’s student Skewes showed, assuming the Riemann hypothesis, that it must do so for some x less than or equal to

\displaystyle{ 10^{10^{10^{34}}} }

Later, in 1955, Skewes showed without the Riemann hypothesis that \pi(x) must exceed \mathrm{li}(x) for some x smaller than

\displaystyle{ 10^{10^{10^{964}}} }

By now this bound has been improved enormously. We now know the two functions cross somewhere near 1.397 \times 10^{316}, but we don’t know if this is the first crossing!
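
At the scales we can actually reach, the numerical evidence above is easy to reproduce. Here is a minimal sketch in Python; the sieve helper, the trapezoidal step count, and the constant li(2) ≈ 1.04516 are my own choices for illustration:

```python
import math

def primes_upto(n):
    """Sieve of Eratosthenes; returns a bytearray with 1 at prime indices."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sieve

def li(x, steps=100000):
    """li(x): trapezoidal rule for the integral of 1/ln t on [2, x], plus li(2) ~ 1.04516."""
    h = (x - 2) / steps
    s = 0.5 * (1 / math.log(2) + 1 / math.log(x))
    for i in range(1, steps):
        s += 1 / math.log(2 + i * h)
    return s * h + 1.04516

sieve = primes_upto(10 ** 5)
for x in (10 ** 3, 10 ** 4, 10 ** 5):
    pi_x = sum(sieve[: x + 1])       # pi(x) = number of primes <= x
    print(x, pi_x, round(li(x), 2))
    assert li(x) > pi_x              # li(x) stays above pi(x) at these small scales
```

Of course, by Littlewood’s theorem this inequality must eventually flip, but no computation at this scale will show it.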

All this math is quite deep. Here is something less deep, but still fun.

You can show that

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, dt = \frac{\pi}{2} }

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, \frac{\sin \left(\frac{t}{101}\right)}{\frac{t}{101}} \, dt = \frac{\pi}{2} }

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, \frac{\sin \left(\frac{t}{101}\right)}{\frac{t}{101}} \, \frac{\sin \left(\frac{t}{201}\right)}{\frac{t}{201}} \, dt = \frac{\pi}{2} }

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, \frac{\sin \left(\frac{t}{101}\right)}{\frac{t}{101}} \, \frac{\sin \left(\frac{t}{201}\right)}{\frac{t}{201}} \, \frac{\sin \left(\frac{t}{301}\right)}{\frac{t}{301}} \, dt = \frac{\pi}{2} }

and so on.

It’s a nice pattern. But this pattern doesn’t go on forever! It lasts a very, very long time… but not forever.

More precisely, the identity

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, \frac{\sin \left(\frac{t}{101}\right)}{\frac{t}{101}} \, \frac{\sin \left(\frac{t}{201}\right)}{\frac{t}{201}} \cdots \, \frac{\sin \left(\frac{t}{100 n +1}\right)}{\frac{t}{100 n + 1}} \, dt = \frac{\pi}{2} }

holds when

n < 9.8 \cdot 10^{42}

but not for all n. At some point it stops working and never works again. In fact, it definitely fails for all

n > 7.4 \cdot 10^{43}

The explanation

The integrals here are a variant of the Borwein integrals:

\displaystyle{ \int_0^\infty \frac{\sin(x)}{x} \, dx= \frac{\pi}{2} }

\displaystyle{ \int_0^\infty \frac{\sin(x)}{x}\frac{\sin(x/3)}{x/3} \, dx = \frac{\pi}{2} }

\displaystyle{ \int_0^\infty \frac{\sin(x)}{x}\, \frac{\sin(x/3)}{x/3} \, \frac{\sin(x/5)}{x/5} \, dx = \frac{\pi}{2} }

where the pattern continues until

\displaystyle{ \int_0^\infty \frac{\sin(x)}{x} \, \frac{\sin(x/3)}{x/3}\cdots\frac{\sin(x/13)}{x/13} \, dx = \frac{\pi}{2} }

but then fails:

\displaystyle{\int_0^\infty \frac{\sin(x)}{x} \, \frac{\sin(x/3)}{x/3}\cdots \frac{\sin(x/15)}{x/15} \, dx \approx \frac \pi 2 - 2.31\times 10^{-11} }

I never understood this until I read Greg Egan’s explanation, based on the work of Hanspeter Schmid. It’s all about convolution, and Fourier transforms:

Suppose we have a rectangular pulse, centred on the origin, with a height of 1/2 and a half-width of 1.

Now, suppose we keep taking moving averages of this function, again and again, with the average computed in a window of half-width 1/3, then 1/5, then 1/7, 1/9, and so on.

There are a couple of features of the original pulse that will persist completely unchanged for the first few stages of this process, but then they will be abruptly lost at some point.

The first feature is that F(0) = 1/2. In the original pulse, the point (0,1/2) lies on a plateau, a perfectly constant segment with a half-width of 1. The process of repeatedly taking the moving average will nibble away at this plateau, shrinking its half-width by the half-width of the averaging window. So, once the sum of the windows’ half-widths exceeds 1, at 1/3+1/5+1/7+…+1/15, F(0) will suddenly fall below 1/2, but up until that step it will remain untouched.

In the animation below, the plateau where F(x)=1/2 is marked in red.

The second feature is that F(–1)=F(1)=1/4. In the original pulse, we have a step at –1 and 1, but if we define F here as the average of the left-hand and right-hand limits we get 1/4, and once we apply the first moving average we simply have 1/4 as the function’s value.

In this case, F(–1)=F(1)=1/4 will continue to hold so long as the points (–1,1/4) and (1,1/4) are surrounded by regions where the function has a suitable symmetry: it is equal to an odd function, offset and translated from the origin to these centres. So long as that’s true for a region wider than the averaging window being applied, the average at the centre will be unchanged.

The initial half-width of each of these symmetrical slopes is 2 (stretching from the opposite end of the plateau and an equal distance away along the x-axis), and as with the plateau, this is nibbled away each time we take another moving average. And in this case, the feature persists until 1/3+1/5+1/7+…+1/113, which is when the sum first exceeds 2.

In the animation, the yellow arrows mark the extent of the symmetrical slopes.
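
Both cutoffs (1/15 for the plateau, 1/113 for the symmetrical slopes) are easy to verify with exact arithmetic. Here is a quick sketch using Python’s fractions module:

```python
from fractions import Fraction

# running sum of the averaging windows' half-widths: 1/3 + 1/5 + 1/7 + ...
total = Fraction(0)
k = 1
first_over_1 = first_over_2 = None
while first_over_2 is None:
    k += 2                      # next odd denominator: 3, 5, 7, ...
    total += Fraction(1, k)
    if first_over_1 is None and total > 1:
        first_over_1 = k        # the plateau (F(0) = 1/2) is lost at this step
    if total > 2:
        first_over_2 = k        # the slopes (F(±1) = 1/4) are lost at this step
print(first_over_1, first_over_2)   # prints 15 113
```

These are exactly the denominators at which the Borwein pattern and Schmid’s longer pattern first break.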

OK, none of this is difficult to understand, but why should we care?

Because this is how Hanspeter Schmid explained the infamous Borwein integrals:

∫sin(t)/t dt = π/2
∫sin(t/3)/(t/3) × sin(t)/t dt = π/2
∫sin(t/5)/(t/5) × sin(t/3)/(t/3) × sin(t)/t dt = π/2

∫sin(t/13)/(t/13) × … × sin(t/3)/(t/3) × sin(t)/t dt = π/2

But then the pattern is broken:

∫sin(t/15)/(t/15) × … × sin(t/3)/(t/3) × sin(t)/t dt < π/2

Here these integrals are from t=0 to t=∞. And Schmid came up with an even more persistent pattern of his own:

∫2 cos(t) sin(t)/t dt = π/2
∫2 cos(t) sin(t/3)/(t/3) × sin(t)/t dt = π/2
∫2 cos(t) sin(t/5)/(t/5) × sin(t/3)/(t/3) × sin(t)/t dt = π/2

∫2 cos(t) sin(t/111)/(t/111) × … × sin(t/3)/(t/3) × sin(t)/t dt = π/2


∫2 cos(t) sin(t/113)/(t/113) × … × sin(t/3)/(t/3) × sin(t)/t dt < π/2

The first set of integrals, due to Borwein, correspond to taking the Fourier transforms of our sequence of ever-smoother pulses and then evaluating F(0). The Fourier transform of the sinc function:

sinc(w t) = sin(w t)/(w t)

is proportional to a rectangular pulse of half-width w, and the Fourier transform of a product of sinc functions is the convolution of their transforms, which in the case of a rectangular pulse just amounts to taking a moving average.

Schmid’s integrals come from adding a clever twist: the extra factor of 2 cos(t) shifts the integral from the zero-frequency Fourier component to the sum of its components at angular frequencies –1 and 1, and hence the result depends on F(–1)+F(1)=1/2, which as we have seen persists for much longer than F(0)=1/2.

• Hanspeter Schmid, Two curious integrals and a graphic proof, Elem. Math. 69 (2014) 11–17.

I asked Greg if we could generalize these results to give even longer sequences of identities that eventually fail, and he showed me how: you can just take the Borwein integrals and replace the numbers 1, 1/3, 1/5, 1/7, … by some sequence of positive numbers

1, a_1, a_2, a_3 \dots

The integral

\displaystyle{\int_0^\infty \frac{\sin(x)}{x} \, \frac{\sin(a_1 x)}{a_1 x} \, \frac{\sin(a_2 x)}{a_2 x} \cdots \frac{\sin(a_n x)}{a_n x} \, dx }

will then equal \pi/2 as long as a_1 + \cdots + a_n \le 1, but not when it exceeds 1. You can see a full explanation on Wikipedia:

• Wikipedia, Borwein integral: general formula.
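
For small sequences where the failure is large enough to measure, the cutoff can be seen numerically. Here is a rough sketch using plain trapezoidal quadrature; the truncation point X, the step size dx, and the sample values 0.6, 0.3, 0.7 are my own choices:

```python
import math

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(x) / x

def borwein(ais, X=1000.0, dx=0.005):
    """Trapezoidal estimate of the integral of sinc(x) * prod sinc(a*x) over [0, X]."""
    n = int(X / dx)
    def f(x):
        val = sinc(x)
        for a in ais:
            val *= sinc(a * x)
        return val
    total = 0.5 * (f(0.0) + f(X))
    for i in range(1, n):
        total += f(i * dx)
    return total * dx

ok  = borwein([0.6, 0.3])   # 0.6 + 0.3 <= 1: integral equals pi/2
bad = borwein([0.6, 0.7])   # 0.6 + 0.7 >  1: integral falls below pi/2
print(ok, bad)
```

The first value agrees with π/2 to several decimal places; the second falls visibly short, as the general formula predicts.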

As an example, I chose the integral

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, \frac{\sin \left(\frac{t}{101}\right)}{\frac{t}{101}} \, \frac{\sin \left(\frac{t}{201}\right)}{\frac{t}{201}} \cdots \, \frac{\sin \left(\frac{t}{100 n +1}\right)}{\frac{t}{100 n + 1}} \, dt  }

which equals \pi/2 if and only if

\displaystyle{ \sum_{k=1}^n \frac{1}{100 k + 1} \le 1  }

Thus, since 1/(100 k + 1) < 1/(100 k), the identity holds if

\displaystyle{ \sum_{k=1}^n \frac{1}{100 k} \le 1  }

We also know that

\displaystyle{ \sum_{k=1}^n \frac{1}{k} \le 1 + \ln n }

so the identity holds if

\displaystyle{ \frac{1}{100} (1 + \ln n) \le 1 }

that is, if

\ln n \le 99

or

n \le e^{99} \approx 9.8 \cdot 10^{42}

On the other hand, the identity fails if

\displaystyle{ \sum_{k=1}^n \frac{1}{100 k + 1} > 1  }

and since 1/(100 k + 1) \ge 1/(101 k), it fails whenever

\displaystyle{ \sum_{k=1}^n \frac{1}{101 k} > 1  }

We also know that

\displaystyle{ \sum_{k=1}^n \frac{1}{k} \ge \ln n }

so the identity fails if

\displaystyle{ \frac{1}{101} \ln n > 1 }

that is, if

\displaystyle{ \ln n > 101}

or

\displaystyle{n > e^{101} \approx 7.4 \cdot 10^{43} }
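
The comparisons used in these estimates can be spot-checked numerically for moderate n (the function name s and the sample values of n are my own choices):

```python
import math

def s(n):
    """The sum 1/101 + 1/201 + ... + 1/(100n+1)."""
    return sum(1.0 / (100 * k + 1) for k in range(1, n + 1))

for n in (10, 1000, 100000):
    H = sum(1.0 / k for k in range(1, n + 1))     # harmonic number H_n
    assert H / 101 <= s(n) < H / 100              # the sandwich around s(n)
    assert math.log(n) < H <= 1 + math.log(n)     # standard bounds on H_n
print("bounds verified")
```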

With a little work one could sharpen these estimates considerably, though it would take more work to find the exact value of n at which

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, \frac{\sin \left(\frac{t}{101}\right)}{\frac{t}{101}} \, \frac{\sin \left(\frac{t}{201}\right)}{\frac{t}{201}} \cdots \, \frac{\sin \left(\frac{t}{100 n +1}\right)}{\frac{t}{100 n + 1}} \, dt = \frac{\pi}{2} }

first fails.

What is Applied Category Theory?

18 September, 2018

Tai-Danae Bradley has a new free “booklet” on applied category theory. It was inspired by the workshop Applied Category Theory 2018, which she attended, and I think it makes a great complement to Fong and Spivak’s book Seven Sketches and my online course based on that book:

• Tai-Danae Bradley, What is Applied Category Theory?

Abstract. This is a collection of introductory, expository notes on applied category theory, inspired by the 2018 Applied Category Theory Workshop, and in these notes we take a leisurely stroll through two themes (functorial semantics and compositionality), two constructions (monoidal categories and decorated cospans) and two examples (chemical reaction networks and natural language processing) within the field.

Check it out!

The 5/8 Theorem

16 September, 2018

This is a well-known, easy group theory result that I just learned. I would like to explain it more slowly and gently, and I hope more memorably, than I’ve seen it done.

It’s called the 5/8 theorem. Randomly choose two elements of a finite group. What’s the probability that they commute? If it exceeds 62.5%, the group must be abelian!

This was probably known for a long time, but the first known proof appears in a paper by Erdős and Turán.

It’s fun to lead up to this proof by looking for groups that are “as commutative as possible without being abelian”. This phrase could mean different things. One interpretation is that we’re trying to maximize the probability that two randomly chosen elements commute. But there are two simpler interpretations, which will actually help us prove the 5/8 theorem.

How big can the center be?

How big can the center of a finite group be, compared to the whole group? If a group G is abelian, its center, say Z, is all of G. But let’s assume G is not abelian. How big can |Z|/|G| be?

Since the center is a subgroup of G, we know by Lagrange’s theorem that |G|/|Z| is an integer. To make |Z|/|G| big we need this integer to be small. How small can it be?

It can’t be 1, since then |Z| = |G| and G would be abelian. Can it be 2?

No! This would force G to be abelian, leading to a contradiction! The reason is that the center is always a normal subgroup of G, so G/Z is a group of size |G/Z| = |G|/|Z|. If this is 2 then G/Z has to be \mathbb{Z}/2. But this is generated by one element, so G must be generated by its center together with one element. This one element commutes with everything in the center, obviously… but that means G is abelian: a contradiction!

For the same reason, |G|/|Z| can’t be 3. The only group with 3 elements is \mathbb{Z}/3, which is generated by one element. So the same argument leads to a contradiction: G is generated by its center and one element, which commutes with everything in the center, so G is abelian.

So let’s try |G|/|Z| = 4. There are two groups with 4 elements: \mathbb{Z}/4 and \mathbb{Z}/2 \times \mathbb{Z}/2. The second, called the Klein four-group, is not generated by one element. It’s generated by two elements! So it offers some hope.

If you haven’t studied much group theory, you could be pessimistic. After all, \mathbb{Z}/2 \times \mathbb{Z}/2 is still abelian! So you might think this: “If G/Z \cong \mathbb{Z}/2 \times \mathbb{Z}/2, the group G is generated by its center and two elements which commute with each other, so it’s abelian.”

But that’s false: even if two elements of G/Z commute with each other, this does not imply that the elements of G mapping to these elements commute.

This is a fun subject to study, but the best way for us to see this right now is to actually find a nonabelian group G with G/Z \cong \mathbb{Z}/2 \times \mathbb{Z}/2. The smallest possible example would have Z \cong \mathbb{Z}/2 and thus 8 elements, and indeed this works!

Namely, we’ll take G to be the 8-element quaternion group

Q = \{ \pm 1, \pm i, \pm j, \pm k \}


i^2 = j^2 = k^2 = -1
i j = k, \quad j k = i, \quad k i = j
j i = -k, \quad k j = -i, \quad i k = -j

and multiplication by -1 works just as you’d expect, e.g.

(-1)^2 = 1

You can think of these 8 guys as the unit quaternions lying on the 4 coordinate axes. They’re the vertices of a 4-dimensional analogue of the octahedron. Here’s a picture by David A. Richter, where the 8 vertices are projected down from 4 dimensions to the vertices of a cube:

The center of Q is Z = \{ \pm 1 \}, and the quotient Q/Z is the Klein four-group, since if we mod out by \pm 1 we get the group

\{1, i, j, k\}


i^2 = j^2 = k^2 = 1
i j = k, \quad j k = i, \quad k i = j
j i = k, \quad k j = i, \quad i k = j

So, we’ve found a nonabelian finite group with 1/4 of its elements lying in the center, and this is the maximum possible fraction!

How big can the centralizer be?

Here’s another way to ask how commutative a finite group G can be, without being abelian. Any element g \in G has a centralizer C(g), consisting of all elements that commute with g.

How big can C(g) be? If g is in the center of G, then C(g) is all of G. So let’s assume g is not in the center, and ask how big the fraction |C(g)|/|G| can be.

In other words: how large can the fraction of elements of G that commute with g be, without it being everything?

It’s easy to check that the centralizer C(g) is a subgroup of G. So, again using Lagrange’s theorem, we know |G|/|C(g)| is an integer. To make the fraction |C(g)|/|G| big, we want this integer to be small. If it’s 1, everything commutes with g. So the first real option is 2.

Can we find an element of a finite group that commutes with exactly 1/2 the elements of that group?

Yes! One example is our friend the quaternion group Q. Each non-central element commutes with exactly half the elements. For example, i commutes only with its own powers: 1, i, -1, -i.

So we’ve found a finite group with a non-central element that commutes with 1/2 the elements in the group, and this is the maximum possible fraction!

What’s the maximum probability for two elements to commute?

Now let’s tackle the original question. Suppose G is a nonabelian group. How can we maximize the probability for two randomly chosen elements of G to commute?

Say we randomly pick two elements g,h \in G. Then there are two cases. If g is in the center of G it commutes with h with probability 1. But if g is not in the center, we’ve just seen it commutes with h with probability at most 1/2.

So, to get an upper bound on the probability that our pair of elements commutes, we should make the center Z \subset G as large as possible. We’ve seen that |Z|/|G| is at most 1/4. So let’s use that.

Then with probability 1/4, g commutes with all the elements of G, while with probability 3/4 it commutes with 1/2 the elements of G.

So, the probability that g commutes with h is at most

\frac{1}{4} \cdot 1 + \frac{3}{4} \cdot \frac{1}{2} = \frac{2}{8} + \frac{3}{8} = \frac{5}{8}

Even better, all these bounds are attained by the quaternion group Q. 1/4 of its elements are in the center, while every element not in the center commutes with 1/2 of the elements! So, the probability that two elements in this group commute is 5/8.

So we’ve proved the 5/8 theorem and shown we can’t improve this constant.
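
As a sanity check, one can count commuting pairs in Q by brute force. Here is a small sketch, with the quaternion units encoded as sign/letter pairs (an encoding of my own choosing):

```python
from itertools import product

# multiplication of the units i, j, k, recorded as (sign, unit)
UNIT_MUL = {
    ('i', 'i'): (-1, '1'), ('j', 'j'): (-1, '1'), ('k', 'k'): (-1, '1'),
    ('i', 'j'): (1, 'k'),  ('j', 'k'): (1, 'i'),  ('k', 'i'): (1, 'j'),
    ('j', 'i'): (-1, 'k'), ('k', 'j'): (-1, 'i'), ('i', 'k'): (-1, 'j'),
}

def mul(a, b):
    """Multiply two quaternion-group elements, each a (sign, unit) pair."""
    (sa, ua), (sb, ub) = a, b
    if ua == '1':
        return (sa * sb, ub)
    if ub == '1':
        return (sa * sb, ua)
    s, u = UNIT_MUL[(ua, ub)]
    return (sa * sb * s, u)

Q = [(s, u) for s in (1, -1) for u in '1ijk']
commuting = sum(1 for a, b in product(Q, Q) if mul(a, b) == mul(b, a))
print(commuting, len(Q) ** 2)   # 40 of 64 ordered pairs commute, i.e. 5/8
```

The count splits exactly as in the argument above: the 2 central elements commute with all 8 elements, and the 6 non-central ones commute with 4 each, giving 2·8 + 6·4 = 40 pairs.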

Further thoughts

I find it very pleasant that the quaternion group is “as commutative as possible without being abelian” in three different ways. But I shouldn’t overstate its importance!

I don’t know the proof, but the website groupprops says the following are equivalent for a finite group G:

• The probability that two elements commute is 5/8.

• The inner automorphism group of G has 4 elements.

• The inner automorphism group of G is \mathbb{Z}/2 \times \mathbb{Z}/2.

Examining the argument I gave, it seems the probability 5/8 can only be attained if

• |Z|/|G| = 1/4

• |C(g)|/|G| = 1/2 for every g \notin Z.

So apparently any finite group with inner automorphism group \mathbb{Z}/2 \times \mathbb{Z}/2 must have these other two properties as well!

There are lots of groups with inner automorphism group \mathbb{Z}/2 \times \mathbb{Z}/2. Besides the quaternion group, there’s one other 8-element group with this property: the group of rotations and reflections of the square, also known as the dihedral group of order 8. And there are six 16-element groups with this property: they’re called the groups of Hall–Senior class two. And I expect that as we go to higher powers of two, there will be vast numbers of groups with this property.

You see, the number of nonisomorphic groups of order 2^n grows alarmingly fast. There’s 1 group of order 2, 2 of order 4, 5 of order 8, 14 of order 16, 51 of order 32, 267 of order 64… but 49,487,365,422 of order 1024. Indeed, it seems ‘almost all’ finite groups have order a power of two, in a certain asymptotic sense. For example, 99% of the roughly 50 billion groups of order ≤ 2000 have order 1024.

Thus, if people trying to classify groups are like taxonomists, groups of order a power of 2 are like insects.

In 1964, the amusingly named pair of authors Marshall Hall Jr. and James K. Senior classified all groups of order 2^n for n \le 6. They developed some powerful general ideas in the process, like isoclinism. I don’t want to explain it here, but it involves the quotient G/Z that I’ve been talking about. So, though I don’t understand much about this, I’m not completely surprised to read that any group of order 2^n has commuting probability 5/8 iff it has ‘Hall–Senior class two’.

There’s much more to say. For example, we can define the probability that two elements commute not just for finite groups but also compact topological groups, since these come with a god-given probability measure, called Haar measure. And here again, if the group is nonabelian, the maximum possible probability for two elements to commute is 5/8!

There are also many other generalizations. For example Guralnick and Wilson proved:

• If the probability that two randomly chosen elements of G generate a solvable group is greater than 11/30 then G itself is solvable.

• If the probability that two randomly chosen elements of G generate a nilpotent group is greater than 1/2 then G is nilpotent.

• If the probability that two randomly chosen elements of G generate a group of odd order is greater than 11/30 then G itself has odd order.

The constants are optimal in each case.

I’ll just finish with two questions I don’t know the answer to:

• For exactly what set of numbers p \in (0,1] can we find a finite group where the probability that two randomly chosen elements commute is p? If we call this set S we’ve seen

S \subseteq (0,5/8] \cup \{1\}

But does S contain every rational number in the interval (0,5/8], or just some? Just some, in fact—but which ones? It should be possible to make some progress on this by examining my proof of the 5/8 theorem, but I haven’t tried at all. I leave it to you!

• For what properties P of a finite group is there a theorem of this form: “if the probability of two randomly chosen elements generating a subgroup of G with property P exceeds some value p, then G must itself have property P”? Is there some logical form a property can have, that will guarantee the existence of a result like this?


Here is a nice discussion, where I learned some of the facts I mentioned, including the proof I gave:

• MathOverflow, 5/8 bound in group theory.

Here is an elementary reference, free online if you jump through some hoops, which includes the proof for compact topological groups, and other bits of wisdom:

• W. H. Gustafson, What is the probability that two group elements commute?, American Mathematical Monthly 80 (1973), 1031–1034.

For example, if G is finite simple and nonabelian, the probability that two elements commute is at most 1/12, a bound attained by \mathrm{A}_5.

Here’s another elementary article:

• Desmond MacHale, How commutative can a non-commutative group be?, The Mathematical Gazette 58 (1974), 199–202.

If you get completely stuck on the first question, you can look here for some hints on what values the probability that two randomly chosen elements commute can take… but not a complete solution!

The 5/8 theorem seems to have first appeared here:

• P. Erdős and P. Turán, On some problems of a statistical group-theory, IV, Acta Math. Acad. Sci. Hung. 19 (1968) 413–435.