Last time I talked about a 200-terabyte proof, completed in 2016, that the journal Nature called the “largest ever”. But Michael Nielsen pointed me to a vastly longer proof from 2012. This one required about 60,000,000 terabytes—that is, 60 exabytes! So, they were unable to store the whole proof.
Let’s pause to contemplate what 60 exabytes amounts to. It’s been estimated that all words ever spoken by human beings, if transcribed into text, would take just 5 exabytes to store. As of 2020, the genome of all the humans on Earth would take about 10 exabytes to store (if one foolishly didn’t do any data compression). On the other hand, in 2011 the total capacity of all hard drives sold by Seagate Technology was 330 exabytes.
So what proof was 60 exabytes long?
A semigroup is a set with a binary associative operation, which we write as multiplication. We say two semigroups and are isomorphic if there is a bijection such that
for all , and anti-isomorphic if there’s a bijection with
for all . We say two semigroups are equivalent if they are isomorphic or anti-isomorphic.
In 2012, Andreas Distler, Chris Jefferson, Tom Kelsey, and Lars Kottho showed that up to equivalence, there are
semigroups with 10 elements. That’s the fact that took 60 exabytes to establish! They also checked their method against previous calculations of the number of semigroups with 7, 8, or 9 elements.
Upon hearing of the much shorter proof called the “largest ever” by Nature, Chris Jefferson wrote:
Darn, I had no idea one could get into the media with this kind of stuff. I had a much larger “proof”, where we didn’t bother storing all the details, in which we enumerated 718,981,858,383,872 semigroups, towards counting the semigroups of size 10.
Uncompressed, it would have been about 63,000 terabytes just for the semigroups, and about a thousand times that to store the “proof”, which is just the tree of the search.
Of course, it would have compressed extremely well, but also I’m not sure it would have had any value, you could rebuild the search tree much faster than you could read it from a disc, and if anyone re-verified our calculation I would prefer they did it by a slightly different search, which would give us much better guarantees of correctness.
As I mentioned, his team found a total of 12,418,001,077,381,302,684 semigroups of size 10. But they took advantage of a formula, also published in 2012, that counts most of the semigroups of a given size, namely those that are ‘nilpotent of degree 3’. Thus, they only had to find 718,981,858,383,872 by a brute force search, a mere 0.006% of the total:
• Andreas Distler, Chris Jefferson, Tom Kelsey, and Lars Kottho, The semigroups of order 10, in Principles and Practice of Constraint Programming, Springer Lecture Notes in Computer Science 7514, Springer, Berlin, 2012, pp. 883–899.
What does it mean for a semigroup to be nilpotent of degree 3? It means that if you multiply any three elements of the semigroup you always get the same answer! Here’s the paper that showed how to count those:
• Andreas Distler and James D. Mitchell, The number of nilpotent semigroups of degree 3, Electron. J. Combin. 19 (2012) 51.
It’s a piece of semigroup folklore that as the size approaches infinity, the fraction of semigroups of that size that are nilpotent of degree 3 approaches 1.
Puzzle. Can you prove this?
Puzzle. Assuming the folklore is true, what’s the rough growth rate of the number of semigroups with elements, counted up to equivalence? The formulas given by Distler and Mitchell are very complicated, but they may have simple asymptotics.
Here they count the number of semigroups that are nilpotent of degree 3 with various numbers of elements, up to equivalence:
Long finite sequences, Journal of Combinatorial Theory, Series A 95, 102-144 (2001).
Thanks! Yes, I should talk about that in this series sometime! I’ll spend a while just digging up old things I wrote and polishing them.
In the book “Enumeration of Finite Groups”, a simple argument is given which proves that the number of semigroups of order n is eventually greater than , so it’s quite close to the number of binary operations. A more sophisticated argument gives a lower bound of .
Nice! Thanks, Royce. There was a typo in formula which made it not compile; I hope I added parentheses in the right places.
Isn’t it a bit odd to count an exhaustive search as a proof? Technically, I guess, it is one, of course. But then this just means, that there will be no limit to the size of a proof anymore. There are possibly an infinite amount of especially combinatorial enumeration problems which go on forever, each contributing an infinite sized proof.
The proof I’m discussing here was far more clever than an exhaustive search. An exhaustive search would have taken vastly longer, and would be completely impractical. This is already visible from the fact that they proved there are
12,418,001,077,381,302,684
semigroups of size 10 by listing just
718,981,858,383,872
of them. The rest — 99.994% of them — were dealt with in a more clever way. But even listing the ones they did would have been impractical without some serious mathematical tricks.
By the way: there never was a limit on the size of proofs. Since there are infinitely many theorems (in most formalizations of mathematics), and only finitely many of size < N for any number N (in any specific system), there have to be infinitely many of size > N.
In particular, I know a theorem whose shortest proof (in your favorite consistent formalization of set theory) is so long that the number of digits in the number of digits in the number of digits in the number of steps in its shortest proof exceeds the number of atoms in the observable Universe. I explain how to find such a theorem here:
• Insanely long proofs.
But the question I’m interested in right now is: what’s the longest proof that someone has actually done? If anyone knows one longer than 60 exabytes, please let me know!
Ok, thank you for the clarification.
I realize this is stretching the definition of “mathematical proof,” but by way of comparison, the largest chess endgame tablebase (which records the optimal-play outcome of each reachable chess position having 7 or fewer pieces on the board) is a miniscule 140 TB. Research is ongoing on an 8-piece tablebase, which is expected to exceed 1 petabyte in size.
“On the other hand, in 2011 the total capacity of all hard drives sold by Seagate Technology was 330 exabytes”
You have to love the fact that their street address is on Disk Drive. :-)