Patterns That Eventually Fail

20 September, 2018

Sometimes patterns can lead you astray. For example, it’s known that

\displaystyle{ \mathrm{li}(x) = \int_0^x \frac{dt}{\ln t} }

is a good approximation to \pi(x), the number of primes less than or equal to x. Numerical evidence suggests that \mathrm{li}(x) is always greater than \pi(x). For example,

\mathrm{li}(10^{12}) - \pi(10^{12}) = 38,263


\mathrm{li}(10^{24}) - \pi(10^{24}) = 17,146,907,278

But in 1914, Littlewood heroically showed that in fact, \mathrm{li}(x) - \pi(x) changes sign infinitely many times!

This raised the question: when does \pi(x) first exceed \mathrm{li}(x)? In 1933, Littlewood’s student Skewes showed, assuming the Riemann hypothesis, that it must do so for some x less than or equal to

\displaystyle{ 10^{10^{10^{34}}} }

Later, in 1955, Skewes showed without the Riemann hypothesis that \pi(x) must exceed \mathrm{li}(x) for some x smaller than

\displaystyle{ 10^{10^{10^{964}}} }

By now this bound has been improved enormously. We now know the two functions cross somewhere near 1.397 \times 10^{316}, but we don’t know if this is the first crossing!

All this math is quite deep. Here is something less deep, but still fun.

You can show that

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, dt = \frac{\pi}{2} }

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, \frac{\sin \left(\frac{t}{101}\right)}{\frac{t}{101}} \, dt = \frac{\pi}{2} }

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, \frac{\sin \left(\frac{t}{101}\right)}{\frac{t}{101}} \, \frac{\sin \left(\frac{t}{201}\right)}{\frac{t}{201}} \, dt = \frac{\pi}{2} }

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, \frac{\sin \left(\frac{t}{101}\right)}{\frac{t}{101}} \, \frac{\sin \left(\frac{t}{201}\right)}{\frac{t}{201}} \, \frac{\sin \left(\frac{t}{301}\right)}{\frac{t}{301}} \, dt = \frac{\pi}{2} }

and so on.

It’s a nice pattern. But this pattern doesn’t go on forever! It lasts a very, very long time… but not forever.

More precisely, the identity

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, \frac{\sin \left(\frac{t}{101}\right)}{\frac{t}{101}} \, \frac{\sin \left(\frac{t}{201}\right)}{\frac{t}{201}} \cdots \, \frac{\sin \left(\frac{t}{100 n +1}\right)}{\frac{t}{100 n + 1}} \, dt = \frac{\pi}{2} }

holds when

n < 9.8 \cdot 10^{42}

but not for all n. At some point it stops working and never works again. In fact, it definitely fails for all

n > 7.4 \cdot 10^{43}

The explanation

The integrals here are a variant of the Borwein integrals:

\displaystyle{ \int_0^\infty \frac{\sin(x)}{x} \, dx= \frac{\pi}{2} }

\displaystyle{ \int_0^\infty \frac{\sin(x)}{x}\frac{\sin(x/3)}{x/3} \, dx = \frac{\pi}{2} }

\displaystyle{ \int_0^\infty \frac{\sin(x)}{x}\, \frac{\sin(x/3)}{x/3} \, \frac{\sin(x/5)}{x/5} \, dx = \frac{\pi}{2} }

where the pattern continues until

\displaystyle{ \int_0^\infty \frac{\sin(x)}{x} \, \frac{\sin(x/3)}{x/3}\cdots\frac{\sin(x/13)}{x/13} \, dx = \frac{\pi}{2} }

but then fails:

\displaystyle{\int_0^\infty \frac{\sin(x)}{x} \, \frac{\sin(x/3)}{x/3}\cdots \frac{\sin(x/15)}{x/15} \, dx \approx \frac \pi 2 - 2.31\times 10^{-11} }

I never understood this until I read Greg Egan’s explanation, based on the work of Hanspeter Schmid. It’s all about convolution, and Fourier transforms:

Suppose we have a rectangular pulse, centred on the origin, with a height of 1/2 and a half-width of 1.

Now, suppose we keep taking moving averages of this function, again and again, with the average computed in a window of half-width 1/3, then 1/5, then 1/7, 1/9, and so on.

There are a couple of features of the original pulse that will persist completely unchanged for the first few stages of this process, but then they will be abruptly lost at some point.

The first feature is that F(0) = 1/2. In the original pulse, the point (0,1/2) lies on a plateau, a perfectly constant segment with a half-width of 1. The process of repeatedly taking the moving average will nibble away at this plateau, shrinking its half-width by the half-width of the averaging window. So, once the sum of the windows’ half-widths exceeds 1, at 1/3+1/5+1/7+…+1/15, F(0) will suddenly fall below 1/2, but up until that step it will remain untouched.

In the animation below, the plateau where F(x)=1/2 is marked in red.

The second feature is that F(–1)=F(1)=1/4. In the original pulse, we have a step at –1 and 1, but if we define F here as the average of the left-hand and right-hand limits we get 1/4, and once we apply the first moving average we simply have 1/4 as the function’s value.

In this case, F(–1)=F(1)=1/4 will continue to hold so long as the points (–1,1/4) and (1,1/4) are surrounded by regions where the function has a suitable symmetry: it is equal to an odd function, offset and translated from the origin to these centres. So long as that’s true for a region wider than the averaging window being applied, the average at the centre will be unchanged.

The initial half-width of each of these symmetrical slopes is 2 (stretching from the opposite end of the plateau and an equal distance away along the x-axis), and as with the plateau, this is nibbled away each time we take another moving average. And in this case, the feature persists until 1/3+1/5+1/7+…+1/113, which is when the sum first exceeds 2.

In the animation, the yellow arrows mark the extent of the symmetrical slopes.

OK, none of this is difficult to understand, but why should we care?

Because this is how Hanspeter Schmid explained the infamous Borwein integrals:

∫sin(t)/t dt = π/2
∫sin(t/3)/(t/3) × sin(t)/t dt = π/2
∫sin(t/5)/(t/5) × sin(t/3)/(t/3) × sin(t)/t dt = π/2

∫sin(t/13)/(t/13) × … × sin(t/3)/(t/3) × sin(t)/t dt = π/2

But then the pattern is broken:

∫sin(t/15)/(t/15) × … × sin(t/3)/(t/3) × sin(t)/t dt < π/2

Here these integrals are from t=0 to t=∞. And Schmid came up with an even more persistent pattern of his own:

∫2 cos(t) sin(t)/t dt = π/2
∫2 cos(t) sin(t/3)/(t/3) × sin(t)/t dt = π/2
∫2 cos(t) sin(t/5)/(t/5) × sin(t/3)/(t/3) × sin(t)/t dt = π/2

∫2 cos(t) sin(t/111)/(t/111) × … × sin(t/3)/(t/3) × sin(t)/t dt = π/2


∫2 cos(t) sin(t/113)/(t/113) × … × sin(t/3)/(t/3) × sin(t)/t dt < π/2

The first set of integrals, due to Borwein, correspond to taking the Fourier transforms of our sequence of ever-smoother pulses and then evaluating F(0). The Fourier transform of the sinc function:

sinc(w t) = sin(w t)/(w t)

is proportional to a rectangular pulse of half-width w, and the Fourier transform of a product of sinc functions is the convolution of their transforms, which in the case of a rectangular pulse just amounts to taking a moving average.

Schmid’s integrals come from adding a clever twist: the extra factor of 2 cos(t) shifts the integral from the zero-frequency Fourier component to the sum of its components at angular frequencies –1 and 1, and hence the result depends on F(–1)+F(1)=1/2, which as we have seen persists for much longer than F(0)=1/2.

• Hanspeter Schmid, Two curious integrals and a graphic proof, Elem. Math. 69 (2014) 11–17.

I asked Greg if we could generalize these results to give even longer sequences of identities that eventually fail, and he showed me how: you can just take the Borwein integrals and replace the numbers 1, 1/3, 1/5, 1/7, … by some sequence of positive numbers

1, a_1, a_2, a_3 \dots

The integral

\displaystyle{\int_0^\infty \frac{\sin(x)}{x} \, \frac{\sin(a_1 x)}{a_1 x} \, \frac{\sin(a_2 x)}{a_2 x} \cdots \frac{\sin(a_n x)}{a_n x} \, dx }

will then equal \pi/2 as long as a_1 + \cdots + a_n \le 1, but not when it exceeds 1. You can see a full explanation on Wikipedia:

• Wikipedia, Borwein integral: general formula.

As an example, I chose the integral

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, \frac{\sin \left(\frac{t}{101}\right)}{\frac{t}{101}} \, \frac{\sin \left(\frac{t}{201}\right)}{\frac{t}{201}} \cdots \, \frac{\sin \left(\frac{t}{100 n +1}\right)}{\frac{t}{100 n + 1}} \, dt  }

which equals \pi/2 if and only if

\displaystyle{ \sum_{k=1}^n \frac{1}{100 k + 1} \le 1  }

Thus, the identity holds if

\displaystyle{ \sum_{k=1}^n \frac{1}{100 k} \le 1  }


\displaystyle{ \sum_{k=1}^n \frac{1}{k} \le 1 + \ln n }

so the identity holds if

\displaystyle{ \frac{1}{100} (1 + \ln n) \le 1 }


\ln n \le 99


n \le e^{99} \approx 9.8 \cdot 10^{42}

On the other hand, the identity fails if

\displaystyle{ \sum_{k=1}^n \frac{1}{100 k + 1} > 1  }

so it fails if

\displaystyle{ \sum_{k=1}^n \frac{1}{101 k} > 1  }


\displaystyle{ \sum_{k=1}^n \frac{1}{k} \ge \ln n }

so the identity fails if

\displaystyle{ \frac{1}{101} \ln n > 1 }


\displaystyle{ \ln n > 101}


\displaystyle{n > e^{101} \approx 7.4 \cdot 10^{43} }

With a little work one could sharpen these estimates considerably, though it would take more work to find the exact value of n at which

\displaystyle{ \int_0^\infty \frac{\sin t}{t} \, \frac{\sin \left(\frac{t}{101}\right)}{\frac{t}{101}} \, \frac{\sin \left(\frac{t}{201}\right)}{\frac{t}{201}} \cdots \, \frac{\sin \left(\frac{t}{100 n +1}\right)}{\frac{t}{100 n + 1}} \, dt = \frac{\pi}{2} }

first fails.

What is Applied Category Theory?

18 September, 2018

Tai-Danae Bradley has a new free “booklet” on applied category theory. It was inspired by the workshop Applied Category Theory 2018, which she attended, and I think it makes a great complement to Fong and Spivak’s book Seven Sketches and my online course based on that book:

• Tai-Danae Bradley, What is Applied Category Theory?

Abstract. This is a collection of introductory, expository notes on applied category theory, inspired by the 2018 Applied Category Theory Workshop, and in these notes we take a leisurely stroll through two themes (functorial semantics and compositionality), two constructions (monoidal categories and decorated cospans) and two examples (chemical reaction networks and natural language processing) within the field.

Check it out!

The 5/8 Theorem

16 September, 2018

This is a well-known, easy group theory result that I just learned. I would like to explain it more slowly and gently, and I hope memorably, than I’ve seen it done.

It’s called the 5/8 theorem. Randomly choose two elements of a finite group. What’s the probability that they commute? If it exceeds 62.5%, the group must be abelian!

This was probably known for a long time, but the first known proof appears in a paper by Erdös and Turan.

It’s fun to lead up to this proof by looking for groups that are “as commutative as possible without being abelian”. This phrase could mean different things. One interpretation is that we’re trying to maximize the probability that two randomly chosen elements commute. But there are two simpler interpretations, which will actually help us prove the 5/8 theorem.

How big can the center be?

How big can the center of a finite group be, compared to the whole group? If a group G is abelian, its center, say Z, is all of G. But let’s assume G is not abelian. How big can |Z|/|G| be?

Since the center is a subgroup of G, we know by Lagrange’s theorem that |G|/|Z| is an integer. To make |Z|/|G| big we need this integer to be small. How small can it be?

It can’t be 1, since then |Z| = |G| and G would be abelian. Can it be 2?

No! This would force G to be abelian, leading to a contradiction! The reason is that the center is always a normal subgroup of G, so G/Z is a group of size |G/Z| = |G|/|Z|. If this is 2 then G/Z has to be \mathbb{Z}/2. But this is generated by one element, so G must be generated by its center together with one element. This one element commutes with everything in the center, obviously… but that means G is abelian: a contradiction!

For the same reason, |Z|/|G| can’t be 3. The only group with 3 elements is \mathbb{Z}/3, which is generated by one element. So the same argument leads to a contradiction: G is generated by its center and one element, which commutes with everything in the center, so G is abelian.

So let’s try |Z|/|G| = 4. There are two groups with 4 elements: \mathbb{Z}/4 and \mathbb{Z}/2 \times \mathbb{Z}/2. The second, called the Klein four-group, is not generated by one element. It’s generated by two elements! So it offers some hope.

If you haven’t studied much group theory, you could be pessimistic. After all, \mathbb{Z}/2 \times \mathbb{Z}/2 is still abelian! So you might think this: “If G/Z \cong \mathbb{Z}/2 \times \mathbb{Z}/2, the group G is generated by its center and two elements which commute with each other, so it’s abelian.”

But that’s false: even if two elements of G/Z commute with each other, this does not imply that the elements of G mapping to these elements commute.

This is a fun subject to study, but best way for us to see this right now is to actually find a nonabelian group G with G/Z \cong \mathbb{Z}/2 \times \mathbb{Z}/2. The smallest possible example would have \mathbb{Z}/2, and indeed this works!

Namely, we’ll take G to be the 8-element quaternion group

Q = \{ \pm 1, \pm i, \pm j, \pm k \}


i^2 = j^2 = k^2 = -1
i j = k, \quad j k = i, \quad k i = j
j i = -k, \quad k j = -i, \quad i k = -j

and multiplication by -1 works just as you’d expect, e.g.

(-1)^2 = 1

You can think of these 8 guys as the unit quaternions lying on the 4 coordinate axes. They’re the vertices of a 4-dimensional analogue of the octahedron. Here’s a picture by David A. Richter, where the 8 vertices are projected down from 4 dimensions to the vertices of a cube:

The center of Q is Z = \{ \pm 1 \}, and the quotient Q/Z is the Klein four-group, since if we mod out by \pm 1 we get the group

\{1, i, j, k\}


i^2 = j^2 = k^2 = 1
i j = k, \quad j k = i, \quad k i = j
j i = k, \quad k j = i, \quad i k = j

So, we’ve found a nonabelian finite group with 1/4 of its elements lying in the center, and this is the maximum possible fraction!

How big can the centralizer be?

Here’s another way to ask how commutative a finite group G can be, without being abelian. Any element g \in G has a centralizer C(g), consisting of all elements that commute with g.

How big can C(g) be? If g is in the center of G, then C(g) is all of G. So let’s assume g is not in the center, and ask how big the fraction |C(g)|/|G| can be.

In other words: how large can the fraction of elements of G that commute with g be, without it being everything?

It’s easy to check that the centralizer C(g) is a subgroup of G. So, again using Lagrange’s theorem, we know |G|/|C(g)| is an integer. To make the fraction |C(g)|/|G| big, we want this integer to be small. If it’s 1, everything commutes with g. So the first real option is 2.

Can we find an element of a finite group that commutes with exactly 1/2 the elements of that group?

Yes! One example is our friend the quaternion group Q. Each non-identity element commutes with exactly half the elements. For example, i commutes only with its own powers: 1, i, -1, -i.

So we’ve found a finite group with a non-central element that commutes with 1/2 the elements in the group, and this is maximum possible fraction!

What’s the maximum probability for two elements to commute?

Now let’s tackle the original question. Suppose G is a nonabelian group. How can we maximize the probability for two randomly chosen elements of G to commute?

Say we randomly pick two elements g,h \in G. Then there are two cases. If g is in the center of G it commutes with h with probability 1. But if g is not in the center, we’ve just seen it commutes with h with probability at most 1/2.

So, to get an upper bound on the probability that our pair of elements commutes, we should make the center Z \subset G as large as possible. We’ve seen that |Z|/|G| is at most 1/4. So let’s use that.

Then with probability 1/4, g commutes with all the elements of G, while with probability 3/4 it commutes with 1/2 the elements of G.

So, the probability that g commutes with h is

\frac{1}{4} \cdot 1 + \frac{3}{4} \cdot \frac{1}{2} = \frac{2}{8} + \frac{3}{8} = \frac{5}{8}

Even better, all these bounds are attained by the quaternion group Q. 1/4 of its elements are in the center, while every element not in the center commutes with 1/2 of the elements! So, the probability that two elements in this group commute is 5/8.

So we’ve proved the 5/8 theorem and shown we can’t improve this constant.

Further thoughts

I find it very pleasant that the quaternion group is “as commutative as possible without being abelian” in three different ways. But I shouldn’t overstate its importance!

I don’t know the proof, but the website groupprops says the following are equivalent for a finite group G:

• The probability that two elements commute is 5/8.

• The inner automorphism group of G has 4 elements.

• The inner automorphism group of G is \mathbb{Z}/2 \times \mathbb{Z}/2.

Examining the argument I gave, it seems the probability 5/8 can only be attained if

|Z|/|G| = 1/4

|C(g)|/|G| = 1/2 for every g \notin Z.

So apparently any finite group with inner automorphism group \mathbb{Z}/2 \times \mathbb{Z}/2 must have these other two properties as well!

There are lots of groups with inner automorphism group \mathbb{Z}/2 \times \mathbb{Z}/2. Besides the quaternion group, there’s one other 8-element group with this property: the group of rotations and reflections of the square, also known as the dihedral group of order 8. And there are six 16-element groups with this property: they’re called the groups of Hall–Senior class two. And I expect that as we go to higher powers of two, there will be vast numbers of groups with this property.

You see, the number of nonisomorphic groups of order 2^n grows alarmingly fast. There’s 1 group of order 2, 2 of order 4, 5 of order 8, 14 of order 16, 51 of order 32, 267 of order 64… but 49,487,365,422 of order 1024. Indeed, it seems ‘almost all’ finite groups have order a power of two, in a certain asymptotic sense. For example, 99% of the roughly 50 billion groups of order ≤ 2000 have order 1024.

Thus, if people trying to classify groups are like taxonomists, groups of order a power of 2 are like insects.

In 1964, the amusingly named pair of authors Marshall Hall Jr. and James K. Senior classified all groups of order 2^n for n \le 6. They developed some powerful general ideas in the process, like isoclinism. I don’t want to explain it here, but which involves the quotient G/Z that I’ve been talking about. So, though I don’t understand much about this, I’m not completely surprised to read that any group of order 2^n has commuting probability 5/8 iff it has ‘Hall–Senior class two’.

There’s much more to say. For example, we can define the probability that two elements commute not just for finite groups but also compact topological groups, since these come with a god-given probability measure, called Haar measure. And here again, if the group is nonabelian, the maximum possible probability for two elements to commute is 5/8!

There are also many other generalizations. For example Guralnick and Wilson proved:

• If the probability that two randomly chosen elements of G generate a solvable group is greater than 11/30 then G itself is solvable.

• If the probability that two randomly chosen elements of G generate a nilpotent group is greater than 1/2 then G is nilpotent.

• If the probability that two randomly chosen elements of G generate a group of odd order is greater than 11/30 then G itself has odd order.

The constants are optimal in each case.

I’ll just finish with two questions I don’t know the answer to:

• For exactly what set of numbers p \in (0,1] can we find a finite group where the probability that two randomly chosen elements commute is p? If we call this set S we’ve seen

S \subseteq (0,5/8] \cup \{1\}

But does S contain every rational number in the interval (0,5/8], or just some? Just some, in fact—but which ones? It should be possible to make some progress on this by examining my proof of the 5/8 theorem, but I haven’t tried at all. I leave it to you!

• For what properties P of a finite group is there a theorem of this form: “if the probability of two randomly chosen elements generating a subgroup of G with property P exceeds some value p, then G must itself have property P”? Is there some logical form a property can have, that will guarantee the existence of a result like this?


Here is a nice discussion, where I learned some of the facts I mentioned, including the proof I gave:

• MathOverflow, 5/8 bound in group theory.

Here is an elementary reference, free online if you jump through some hoops, which includes the proof for compact topological groups, and other bits of wisdom:

• W. H. Gustafson, What is the probability that two group elements commute?, American Mathematical Monthly 80 (1973), 1031–1034.

For example, if G is finite simple and nonabelian, the probability that two elements commute is at most 1/12, a bound attained by \mathrm{A}_5.

Here’s another elementary article:

• Desmond MacHale, How commutative can a non-commutative group be?, The Mathematical Gazette 58 (1974), 199–202.

If you get completely stuck on Puzzle 1, you can look here for some hints on what values the probability of two elements to commute can take… but not a complete solution!

The 5/8 theorem seems to have first appeared here:

• P. Erdös and P. Turán, On some problems of a statistical group-theory, IV, Acta Math. Acad. Sci. Hung. 19 (1968) 413–435.

Noether’s Theorem

12 September, 2018


I’ve been spending the last month at the Centre of Quantum Technologies, getting lots of work done. This Friday I’m giving a talk, and you can see the slides now:

• John Baez, Getting to the bottom of Noether’s theorem.

Abstract. In her paper of 1918, Noether’s theorem relating symmetries and conserved quantities was formulated in term of Lagrangian mechanics. But if we want to make the essence of this relation seem as self-evident as possible, we can turn to a formulation in term of Poisson brackets, which generalizes easily to quantum mechanics using commutators. This approach also gives a version of Noether’s theorem for Markov processes. The key question then becomes: when, and why, do observables generate one-parameter groups of transformations? This question sheds light on why complex numbers show up in quantum mechanics.

At 5:30 on Saturday October 6th I’ll talk about this stuff at this workshop in London:

The Philosophy and Physics of Noether’s Theorems, 5-6 October 2018, Fischer Hall, 1-4 Suffolk Street, London, UK. Organized by Bryan W. Roberts (LSE) and Nicholas Teh (Notre Dame).

This workshop celebrates the 100th anniversary of Noether’s famous paper connecting symmetries to conserved quantities. Her paper actually contains two big theorems. My talk is only about the more famous one, Noether’s first theorem, and I’ll change my talk title to make that clear when I go to London, to avoid getting flak from experts. Her second theorem explains why it’s hard to define energy in general relativity! This is one reason Einstein admired Noether so much.

I’ll also give this talk at DAMTP—the Department of Applied Mathematics and Theoretical Physics, in Cambridge—on Thursday October 4th at 1 pm.

The organizers of London workshop on the philosophy and physics of Noether’s theorems have asked me to write a paper, so my talk can be seen as the first step toward that. My talk doesn’t contain any hard theorems, but the main point—that the complex numbers arise naturally from wanting a correspondence between observables and symmetry generators—can be expressed in some theorems, which I hope to explain in my paper.


Compositionality – Now Open For Submissions

24 August, 2018

Our new journal Compositionality is now open for submissions!

It’s an open-access journal for research using compositional ideas, most notably of a category-theoretic origin, in any discipline. Topics may concern foundational structures, an organizing principle, or a powerful tool. Example areas include but are not limited to: computation, logic, physics, chemistry, engineering, linguistics, and cognition.

Compositionality is free of cost for both readers and authors.


We invite you to submit a manuscript for publication in the first issue of Compositionality (ISSN: 2631-4444), a new open-access journal for research using compositional ideas, most notably of a category-theoretic origin, in any discipline.

To submit a manuscript, please visit


Compositionality refers to complex things that can be built by sticking together simpler parts. We welcome papers using compositional ideas, most notably of a category-theoretic origin, in any discipline. This may concern foundational structures, an organising principle, a powerful tool, or an important application. Example areas include but are not limited to: computation, logic, physics, chemistry, engineering, linguistics, and cognition.

Related conferences and workshops that fall within the scope of Compositionality include the Symposium on Compositional Structures (SYCO), Categories, Logic and Physics (CLP), String Diagrams in Computation, Logic and Physics (STRING), Applied Category Theory (ACT), Algebra and Coalgebra in Computer Science (CALCO), and the Simons Workshop on Compositionality.


Submissions should be original contributions of previously unpublished work, and may be of any length. Work previously published in conferences and workshops must be significantly expanded or contain significant new results to be accepted. There is no deadline for submission. There is no processing charge for accepted publications; Compositionality is free to read and free to publish in. More details can be found in our editorial policies at


John Baez, University of California, Riverside, USA
Bob Coecke, University of Oxford, UK
Kathryn Hess, EPFL, Switzerland
Steve Lack, Macquarie University, Australia
Valeria de Paiva, Nuance Communications, USA


Corina Cirstea, University of Southampton, UK
Ross Duncan, University of Strathclyde, UK
Andree Ehresmann, University of Picardie Jules Verne, France
Tobias Fritz, Max Planck Institute, Germany
Neil Ghani, University of Strathclyde, UK
Dan Ghica, University of Birmingham, UK
Jeremy Gibbons, University of Oxford, UK
Nick Gurski, Case Western Reserve University, USA
Helle Hvid Hansen, Delft University of Technology, Netherlands
Chris Heunen, University of Edinburgh, UK
Aleks Kissinger, Radboud University, Netherlands
Joachim Kock, Universitat Autonoma de Barcelona, Spain
Martha Lewis, University of Amsterdam, Netherlands
Samuel Mimram, Ecole Polytechnique, France
Simona Paoli, University of Leicester, UK
Dusko Pavlovic, University of Hawaii, USA
Christian Retore, Universite de Montpellier, France
Mehrnoosh Sadrzadeh, Queen Mary University, UK
Peter Selinger, Dalhousie University, Canada
Pawel Sobocinski, University of Southampton, UK
David Spivak, MIT, USA
Jamie Vicary, University of Birmingham and University of Oxford, UK
Simon Willerton, University of Sheffield, UK


The Editorial Board of Compositionality

Complex Adaptive System Design (Part 8)

22 August, 2018

John Foley, Joe Moeller and I have made some nice progress on compositional tasking for the Complex Adaptive System Composition and Design Environment project.

‘Compositional tasking’ means assigning tasks to networks agents in such a way that you can connect or even overlay such tasked networks and get larger ones. This lets you build up complex plans from smaller pieces.

In my last post in this series, I sketched an approach using ‘commitment networks’. A commitment network is a graph where nodes represent agents and edges represent commitments, like “A should move toward B either for 3 hours or until they meet, whichever comes first”. By overlaying such graphs we can build up commitment networks that describe complex plans of action. The rules for overlaying incorporate ‘automatic deconflicting’. In other words: don’t need to worry about agents being given conflicting duties as you stack up plans… because you’ve decided ahead of time what they should do in these situations.

I still like that approach, but we’ve been asked to develop some ideas more closely connected to traditional methods of tasking, like PERT charts, so now we’ve done that.

‘PERT’ stands for ‘program evaluation and review technique’. PERT charts were developed by the US Navy in 1957, but now they’re used all over industry to help plan and schedule large projects.

Here’s simple example:

The nodes in this graph are different states, like “you have built the car but not yet put on the tires”. The edges are different tasks, like “put the tires on the car”. Each state is labelled with an arbitrary name: 10, 20, 30, 40 and 50. The tasks also have names: A, B, C, D, E, and F. More importantly, each task is labelled by the amount of time that task requires!

Your goal is to start at state 10 and move all the way to state 50. Since you’re bossing lots of people around, you can make them do tasks simultaneously. However, you can only reach a state after you have done all the tasks leading up to that state. For example, you can’t reach state 50 unless you have already done all of tasks C, E, and F. Some typical questions are:

• What’s the minimum amount of time it takes to get from state 10 to state 50?

• Which tasks could take longer, without changing the answer to the previous question? How much longer could each task take, without changing the answer? This amount of time is called the slack for that task.

There are known algorithms for solving such problems. These help big organizations plan complex projects. So, connecting compositional tasking to PERT charts seems like a good idea.

At first this seemed confusing because in our previous work the nodes represented agents, while in PERT charts the nodes represent states. Of course graphs can be used for many things, even in the same setup. But the trick was getting everything to fit together nicely.

Now I think we’re close.

John Foley has been working out some nice example problems where a collection of agents need to move along the edges of a graph from specified start locations to specified end locations, taking routes that minimize their total fuel usage. However, there are some constraints. Some edges can only be traversed by specified teams of agents: they can’t go alone. Also, no one agent is allowed to run out of fuel.

This is a nice problem because while it’s pretty simple and specific, it’s representative of a large class of problems where a collection of agents are trying to carry out tasks together. ‘Moving along the edge of a graph’ can stand for a task of any sort. The constraint that some edges can only be traversed by specified teams is then a way of saying that certain tasks can only be accomplished by teams.

Furthermore, there are nice software packages for optimization subject to constraints. For example, John likes one called Choco. So, we plan to use one of these as part of the project.

What makes this all compositional is that John has expressed this problem using our ‘network model’ formalism, which I began sketching in Part 6. This allows us to assemble tasks for larger collections of agents from tasks for smaller collections.

Here, however, an idea due to my student Joe Moeller turned out to be crucial.

In our first examples of network models, explained earlier in this series, we allowed a monoid of networks for any set of agents of different kinds. A monoid has a binary operation called ‘multiplication’, and the idea here was this could describe the operation of ‘overlaying’ networks: for example, laying one set of communication channels, or committments, on top of another.

However, Joe knew full well that a monoid is a category with one object, so he pushed for a generalization that allowed not just a monoid but a category of networks for any set of agents of different kinds. I didn’t know what this was good for, but I figured: what the heck, let’s do it. It was a mathematically natural move, and it didn’t make anything harder—in fact it clarified some of our constructions, which is why Joe wanted to do it.

Now that generalization is proving to be crucial! We can take our category of networks to have states as objects and tasks (ways of moving between states) as morphisms! So, instead of ‘overlaying networks’, the basic operation is now composing tasks.

So, we now have a framework where if you specify a collection of agents of different kinds, we can give you the category whose morphisms are tasks those agents can engage in.

An example is John’s setup where the agents are moving around on a graph.

But this framework also handles PERT charts! While the folks who invented PERT charts didn’t think of them this way, one can think of them as describing categories of a certain specific sort, with states as objects and tasks as morphisms.

So, we now have a compositional framework for PERT charts.

I would like to dive deeper into the details, but this is probably enough for one post. I will say, though, that we use some math I’ve just developed with my grad student Jade Master, explained here:

Open Petri nets (part 3), Azimuth, 19 August 2018.

The key is the relation between Petri nets and PERT charts. I’ll have more to say about that soon, I hope!

Some posts in this series:

Part 1. CASCADE: the Complex Adaptive System Composition and Design Environment.

Part 2. Metron’s software for system design.

Part 3. Operads: the basic idea.

Part 4. Network operads: an easy example.

Part 5. Algebras of network operads: some easy examples.

Part 6. Network models.

Part 7. Step-by-step compositional design and tasking using commitment networks.

Part 8. Compositional tasking using category-valued network models.

Open Petri Nets (Part 3)

19 August, 2018

I’ve been talking about my new paper with Jade Master:

• John Baez and Jade Master, Open Petri nets.

In Part 1 we saw the double category of open Petri nets; in Part 2 we saw the reachability semantics for open Petri nets as a double functor. Now I’d like to wrap up by showing you the engine beneath the hood of our results.

I fell in love with Petri nets when I realized that they were really just presentations of free symmetric monoidal categories. If you like category theory, this turns Petri nets from something mysterious into something attractive.

In any category you can compose morphisms f\colon X \to Y and g\colon Y \to Z and get a morphism gf \colon X \to Z. In a monoidal category you can also tensor morphisms f \colon X \to X' and g \colon Y \to Y' and get a morphism f \otimes g \colon X \otimes X' \to Y \otimes Y'. This of course relies on your ability to tensor objects. In a symmetric monoidal category you also have X \otimes Y \cong Y \otimes X. And of course, there is more to it than this. But this is enough to get started.

A Petri net has ‘places’ and also ‘transitions’ going between multisets of places:

From this data we can try to generate a symmetric monoidal category whose objects are built from the places and whose morphisms are built from the transitions. So, for example, the above Petri net would give a symmetric monoidal category with an object

2 susceptible + infected

and a morphism from this to the object

susceptible + 2 infected

(built using the transition infection), and a morphism
from this to the object

susceptible + infected + resistant

(built using the transition recovery) and so on. Here we are using + to denote the tensor product in our symmetric monoidal category, as usual in chemistry.

When we do this sort of construction, the resulting symmetric monoidal category is ‘free’. That is, we are not imposing any really interesting equations: the objects are freely generated by the places in our Petri net by tensoring, and the morphisms are freely generated by the transitions by tensoring and composition.

That’s the basic idea. The problem is making this idea precise!

Many people have tried in many different ways. I like this approach the best:

• José Meseguer and Ugo Montanari, Petri nets are monoids, Information and Computation 88 (1990), 105–155.

but I think it can be simplified a bit, so let me describe what Jade and I did in our paper.

The problem is that there are different notions of symmetric monoidal category, and also different notions of morphism between Petri nets. We take the maximally strict approach, and work with ‘commutative’ monoidal categories. These are just commutative monoid objects in \mathrm{Cat}, so their associator:

\alpha_{A,B,C} \colon (A + B) + C \stackrel{\sim}{\longrightarrow} A + (B + C)

their left and right unitor:

\lambda_A \colon 0 + A \stackrel{\sim}{\longrightarrow} A

\rho_A \colon A + 0 \stackrel{\sim}{\longrightarrow} A

and even—disturbingly—their braiding:

\sigma_{A,B} \colon A + B \stackrel{\sim}{\longrightarrow} B + A

are all identity morphisms.

The last would ordinarily be seen as ‘going too far’, since while every symmetric monoidal category is equivalent to one with trivial associator and unitors, this ceases to be true if we also require the braiding to be trivial. However, it seems that Petri nets most naturally serve to present symmetric monoidal categories of this very strict sort. There just isn’t enough information in a Petri net to make it worthwhile giving them a nontrivial braiding

\sigma_{A,B} \colon A + B \stackrel{\sim}{\longrightarrow} B+A

It took me a while to accept this, but now it seem obvious. If you want a nontrivial braiding, you should be using something a bit fancier than a Petri net.

Thus, we construct adjoint functors between a category of Petri nets, which we call \textrm{Petri}, and a category of ‘commutative monoidal categories’, which we call \textrm{CMC}.

An object of \textrm{Petri} is a Petri net: that is, a set S of places, a set T of transitions, and source and target functions

s, t \colon T \to \mathbb{N}[S]

where \mathbb{N}[S] is the underlying set of the free commutative monoid on S.

More concretely, \mathbb{N}[S] is the set of formal finite linear combinations of elements of S with natural number coefficients. The set S naturally includes in \mathbb{N}[S], and for any function

f \colon S \to S'

there is a unique monoid homomorphism

\mathbb{N}[f] \colon \mathbb{N}[S] \to \mathbb{N}[S']

extending f.

A Petri net morphism from a Petri net

s, t \colon T \to \mathbb{N}[S]

to a Petri net

s', t' \colon T' \to \mathbb{N}[S']

is a pair of functions

f \colon T \to T'

g \colon S \to S'

making the two obvious diagrams commute:

There is a category \textrm{Petri} with Petri nets as objects and Petri net morphisms as morphisms.

On the other hand, a commutative monoidal category is a commutative monoid object in \mathrm{Cat}. Explicitly, it’s a strict monoidal category (C,+,0) such that for all objects A and B we have

A + B = B + A

and for all morphisms f and g

f + g = g + f

Note that a commutative monoidal category is the same as a strict symmetric monoidal category where the symmetry isomorphisms

\sigma_{A,B} \colon A + B \stackrel{\sim}{\longrightarrow} B+A

are all identity morphisms. Every strict monoidal functor between commutative monoidal categories is automatically a strict symmetric monoidal functor. So, we let \mathrm{CMC} be the category whose objects are commutative monoidal categories and whose morphisms are strict monoidal functors.

There’s a functor

U \colon \mathrm{CMC} \to \mathrm{Petri}

sending any commutative monoidal category C to its underlying Petri net. This Petri net has the set of objects \mathrm{Ob}(C) as its set of places and the set of morphisms \mathrm{Mor}(C) as its set of transitions, and

s, t \colon \mathrm{Mor}(C) \to \mathrm{Ob}(C) \hookrightarrow \mathbb{N}[\mathrm{Ob}(C)]

as its source and target maps.

Proposition. The functor U \colon \mathrm{CMC} \to \mathrm{Petri} has a left adjoint

F \colon \mathrm{Petri} \to \mathrm{CMC}

This is Proposition 10 in our paper, and we give an explicit construction of this left adjoint.

So that’s our conception of the free commutative monoidal category on a Petri net. It’s pretty simple. How could anyone have done anything else?

Montanari and Meseguer do almost the same thing, but our category of Petri nets is a subcategory of theirs: our morphisms of Petri nets send places to places, while they allow more general maps that send a place to a formal linear combination of places. On the other hand, they consider a full subcategory of our \mathrm{CMC} containing only commutative monoidal categories whose objects form a free commutative monoid.

Other papers do a variety of more complicated things. I don’t have the energy to explain them all, but you can see some here:

• Pierpaolo Degano, José Meseguer and Ugo Montanari, Axiomatizing net computations and processes, in Logic in Computer Science 1989, IEEE, New Jersey, 1989, pp. 175–185.

• Vladimiro Sassone, Strong concatenable processes: an approach to the category of Petri net computations, BRICS Report Series, Dept. of Computer Science, U. Aarhus, 1994.

• Vladimiro Sassone, On the category of Petri net computations, in Colloquium on Trees in Algebra and Programming, Springer, Berlin, 1995.

• Vladimiro Sassone, An axiomatization of the algebra of Petri net concatenable processes, in Theoretical Computer Science 170 (1996), 277–296.

• Vladimiro Sassone and Pavel Sobociński, A congruence for Petri nets, Electronic Notes in Theoretical Computer Science 127 (2005), 107–120.

Getting the free commutative monoidal category on a Petri net right is key to developing the reachability semantics for open Petri nets in a nice way. But to see that, you’ll have to read our paper!

Part 1: the double category of open Petri nets.

Part 2: the reachability semantics for open Petri nets.

Part 3: the free symmetric monoidal category on a Petri net.