Bigness (Part 3)

Last time I talked about a 200-terabyte proof, completed in 2016, that the journal Nature called the “largest ever”. But Michael Nielsen pointed me to a vastly longer proof from 2012. This one required about 60,000,000 terabytes—that is, 60 exabytes! So, they were unable to store the whole proof.

Let’s pause to contemplate what 60 exabytes amounts to. It’s been estimated that all words ever spoken by human beings, if transcribed into text, would take just 5 exabytes to store. As of 2020, the genome of all the humans on Earth would take about 10 exabytes to store (if one foolishly didn’t do any data compression). On the other hand, in 2011 the total capacity of all hard drives sold by Seagate Technology was 330 exabytes.

So what proof was 60 exabytes long?

A semigroup is a set with a binary associative operation, which we write as multiplication. We say two semigroups S and T are isomorphic if there is a bijection f\colon S \to T such that

f(ss') = f(s) f(s')

for all s,s' \in S, and anti-isomorphic if there’s a bijection with

f(ss') = f(s') f(s)

for all s,s' \in S. We say two semigroups are equivalent if they are isomorphic or anti-isomorphic.

In 2012, Andreas Distler, Chris Jefferson, Tom Kelsey, and Lars Kottho showed that up to equivalence, there are

12,418,001,077,381,302,684 ≈ 1.2 · 1019

semigroups with 10 elements. That’s the fact that took 60 exabytes to establish! They also checked their method against previous calculations of the number of semigroups with 7, 8, or 9 elements.

Upon hearing of the much shorter proof called the “largest ever” by Nature, Chris Jefferson wrote:

Darn, I had no idea one could get into the media with this kind of stuff. I had a much larger “proof”, where we didn’t bother storing all the details, in which we enumerated 718,981,858,383,872 semigroups, towards counting the semigroups of size 10.

Uncompressed, it would have been about 63,000 terabytes just for the semigroups, and about a thousand times that to store the “proof”, which is just the tree of the search.

Of course, it would have compressed extremely well, but also I’m not sure it would have had any value, you could rebuild the search tree much faster than you could read it from a disc, and if anyone re-verified our calculation I would prefer they did it by a slightly different search, which would give us much better guarantees of correctness.

As I mentioned, his team found a total of 12,418,001,077,381,302,684 semigroups of size 10. But they took advantage of a formula, also published in 2012, that counts most of the semigroups of a given size, namely those that are ‘nilpotent of degree 3’. Thus, they only had to find 718,981,858,383,872 by a brute force search, a mere 0.006% of the total:

• Andreas Distler, Chris Jefferson, Tom Kelsey, and Lars Kottho, The semigroups of order 10, in Principles and Practice of Constraint Programming, Springer Lecture Notes in Computer Science 7514, Springer, Berlin, 2012, pp. 883–899.

What does it mean for a semigroup to be nilpotent of degree 3? It means that if you multiply any three elements of the semigroup you always get the same answer! Here’s the paper that showed how to count those:

• Andreas Distler and James D. Mitchell, The number of nilpotent semigroups of degree 3, Electron. J. Combin. 19 (2012) 51.

It’s a piece of semigroup folklore that as the size approaches infinity, the fraction of semigroups of that size that are nilpotent of degree 3 approaches 1.

Puzzle. Can you prove this?

Apparently nobody has yet.

Puzzle. Assuming the folklore is true, what’s the rough growth rate of the number of semigroups with n elements, counted up to equivalence? The formulas given by Distler and Mitchell are very complicated, but they may have simple asymptotics.

Here they count the number of semigroups that are nilpotent of degree 3 with various numbers of elements, up to equivalence:


9 Responses to Bigness (Part 3)

  1. hmflogic says:

    Long finite sequences, Journal of Combinatorial Theory, Series A 95, 102-144 (2001).

  2. Royce Peng says:

    In the book “Enumeration of Finite Groups”, a simple argument is given which proves that the number of semigroups of order n is eventually greater than n^{n^2 (1 - \epsilon)} , so it’s quite close to the number n^{n^2} of binary operations. A more sophisticated argument gives a lower bound of n^{n^2 (1 - (\log \log n / \log n))} - O (1/ \log n) .

  3. Wolfgang says:

    Isn’t it a bit odd to count an exhaustive search as a proof? Technically, I guess, it is one, of course. But then this just means, that there will be no limit to the size of a proof anymore. There are possibly an infinite amount of especially combinatorial enumeration problems which go on forever, each contributing an infinite sized proof.

    • John Baez says:

      The proof I’m discussing here was far more clever than an exhaustive search. An exhaustive search would have taken vastly longer, and would be completely impractical. This is already visible from the fact that they proved there are


      semigroups of size 10 by listing just


      of them. The rest — 99.994% of them — were dealt with in a more clever way. But even listing the ones they did would have been impractical without some serious mathematical tricks.

      By the way: there never was a limit on the size of proofs. Since there are infinitely many theorems (in most formalizations of mathematics), and only finitely many of size < N for any number N (in any specific system), there have to be infinitely many of size > N.

      In particular, I know a theorem whose shortest proof (in your favorite consistent formalization of set theory) is so long that the number of digits in the number of digits in the number of digits in the number of steps in its shortest proof exceeds the number of atoms in the observable Universe. I explain how to find such a theorem here:

      Insanely long proofs.

      But the question I’m interested in right now is: what’s the longest proof that someone has actually done? If anyone knows one longer than 60 exabytes, please let me know!

      • Wolfgang says:

        Ok, thank you for the clarification.

      • arch1 says:

        I realize this is stretching the definition of “mathematical proof,” but by way of comparison, the largest chess endgame tablebase (which records the optimal-play outcome of each reachable chess position having 7 or fewer pieces on the board) is a miniscule 140 TB. Research is ongoing on an 8-piece tablebase, which is expected to exceed 1 petabyte in size.

  4. Phillip Helbig says:

    ” On the other hand, in 2011 the total capacity of all hard drives sold by Seagate Technology was 330 exabytes”

    You have to love the fact that their street address is on Disk Drive. :-)

You can use Markdown or HTML in your comments. You can also use LaTeX, like this: $latex E = m c^2 $. The word 'latex' comes right after the first dollar sign, with a space after it.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.