If you followed the business about Edward Nelson’s claim to have proved the inconsistency of arithmetic, you might have heard people mention this paper:

• Shira Kritchman and Ran Raz, The surprise examination paradox and the second incompleteness theorem, *AMS Notices* **57** (December 2010), 1454–1458.

It’s got a great idea in it, which I’d like to explain in a really sloppy way so everyone can understand it. Logic is cool, but most people never get to the cool part because they can’t fight their way through the rather dry technicalities.

You all know the surprise examination paradox, right? The teacher says one day he’ll give a quiz and it will be a surprise. So the kids think “well, it can’t be on the last day then—we’d know it was coming.” And then they think “well, so it can’t be on the day before the last day, either!—we’d know it was coming.” And so on… and they convince themselves it can’t happen at all.

But then the teacher gives it the very next day, and they’re completely surprised.

People still argue a lot about how to settle this paradox. But anyway, Kritchman and Raz use a *rigorous* version of this paradox together with Chaitin’s incompleteness theorem to demonstrate Gödel’s second incompleteness theorem—which says, *very* roughly, that:

If math can prove itself consistent, it’s not.

If you’re a logician, I bet this sort of sloppy statement really annoys you. Yeah? Does it? Take a chill pill, dude: this post isn’t for you—it’s for ordinary mortals. I said I want to summarize Kritchman and Raz’s argument *in a really sloppy way*. If you want precision and details, read their paper.

Okay, here we go:

Chaitin’s theorem, which is already astounding, says there’s some length L such that you can’t prove any particular string of bits needs a program longer than L to print it out. At least, this is so if math is consistent. If it’s not consistent, you can prove anything!

On the other hand, there’s some finite number of programs of length ≤ L. So if you take a list of more numbers than that, say 1, 2, …, N, there’s *got to be at least one* that needs a program longer than L to print it out.

Okay: *there’s got to be at least one*. How many? Suppose *just one*. Then we can go through all programs of length ≤ L, find those that print all the other numbers on our list… and thus, by a process of elimination, find the culprit.

But that means we’ve proved that this culprit is a number can only be computed by a program of length > L. But Chaitin’s theorem says that’s impossible! At least, not if math is consistent.

So there can’t be just one. At least, not if math is consistent.

Okay: suppose there are *just two*. Well, we can pull the same trick and find out who they are! So there can’t be *just two*, either. At least, not if math is consistent.

We can keep playing this game until we rule out all the possibilities… and then we’re really stuck. We get a contradiction. At least, if math is consistent.

**So if we could prove math is consistent, we’d know it’s not!**

I haven’t read the paper yet (which, by the way, I couldn’t reach from your link) but informally your argument above depends on what kind of logic you are using to prove Chaitlin’s theorem. If you are using intuitionist logic, there exists a number ‘L’ gives you a number and the rest argument works. However, if Chaitlin’s theorem is non-constructive, then the argument only shows you cannot construct L (ie it is impossible to know the value of L).

By the way the url where I did find the paper was

http://www.ams.org/notices/201011/rtx101101454p.pdf

I left the “http” out of the URL—thanks for catching that. I fixed it now.

Kritchman and Raz are using classical logic, and their proof of Gödel’s incompleteness theorem applies to any recursively axiomatizable theory T extending, say, Peano arithmetic (though as Nelson notes one can get away with less, e.g. Robinson arithmetic). Chaitin’s theorem also applies in this context. Given such a theory and a choice of universal programming language, Chaitin’s theorem allows you to explicitly

constructa number L such that T is unable to prove any particular string of bits needs a program longer than L to print it out. The sketch of a proof of Chaitin’s theorem I linked to makes this clear.By the way, a second conclusion from the article would be that Eugenia Cheng has a way with words :)

Every blog needs ‘in jokes’ to make its readers feel like privileged members of an exclusive club. But yeah.

So roughly how big is L, for at least one reasonable specific choice of universal programming language? (Since you say L can be explicitly constructed, I presume that there must be a way to estimate an upper bound on its size, though perhaps a very non-tight bound.)

Actually, any explicit upper bound for whatever you’re calling L is itself a perfectly good explicit unprovable complexity: if for no x can T prove that K(x)>L, then a fortiori for no x can it prove K(x) > L+1.

That is, Chaitin’s theorem doesn’t need to find the minimal unprovable complexity to give explicit true non-theorems!

Hi, Bruce! I keep asking people to estimate for me, and nobody ever does. It’s possible Chaitin has, because he’s written a book that works everything out very explicitly for a particular programming language. But I don’t know.

To be precise: I want someone to give me a constant such that, say, Peano arithmetic can’t prove any natural number requires a C program more than characters long to print it out.

Since you’re a programmer, maybe you can guess if I tell you this:

Any number with

will do the job. Here:

• is the base in which we write numbers in our programming language.

• is the length of a program where you input a natural number and it tries to go through all proofs in Peano arithmetic until it finds one that proves some natural number has Kolmogorov complexity ; then it outputs this number. (Of course, for the value of we care about, it will not halt.)

• is a small overhead cost: the length of the extra 'glue' to create a bigger program that takes the number , written out in our chosen base, and feeds it into the program described above.

This bigger program thus has length

and for the proof to work we need this to be smaller than

The bulk of the problem is getting an upper bound on The number is just a bit bigger, surely smaller than

Anyone want to take a crack at it?

I don’t think the program can be done.

There is a slight problem here.

The Kolmogrov Complexity is not a computable function:

• 3.1 Incomputability of Kolmogorov complexity, Wikipedia.

So will any particular such program run into issues such as halting problem?

So although I am convinced that for every n and s such that PA proves K(n)=s, s<L I suspect that there may be some n for which PA cannot prove the values of K(n),

So rather than giving you an actual value L it only shows it would be contradictory for there not to be such a value (Since our reasoning is classical this is not a distinction … isn't it weird that I now find this difficult to swallow when, 30 years ago during my BSc Mathematics course, I wouldn't have had difficulty percieving the distinction?).

As the Wikipedia article and I explained, you can explicitly compute a constant L such that Peano arithmetic can’t prove any natural number requires a C program more than L characters long to print it out.

It may be impossible to compute the

smallestsuch constant L—that would be an interesting result which I haven’t seen. However, as ‘some guy in the street’ pointed out, if some constant does the job then so does any larger constant. I’m merely asserting thatsomesuch constant does the job and can be explicitly computed.The halting problem and the incomputability of Kolmogorov complexity are no obstacle here. Indeed, they underlie everything we’re discussing.

Chaitin’s theorem says that not only is Kolmogorov complexity uncomputable, there’s no way to prove it takes any value bigger than L, even though it does so in all but finitely many cases. This is an extremely strong form of uncomputability!

From my description here, I’m sure that L = 10

^{}^{9}works.But a good programmer who has studied some logic (like Bruce) could probably guess whether L = 10

^{}^{6}, 10^{}^{5}or even some smaller value works!I don’t think it can be done in C, since C semantics are not well-defined unless you specify a particular finite machine size. (Since C programs can do things like convert pointers to integers and back, tell you the size of any datatype, and convert data of any specified datatype to bytes and back.) On a finite machine of N bits, all programs either finish in time less than about 2**N or take forever.

But if you take “C without size-specific operations”, or a higher level language like Python, or for that matter a different sort of low-level language like a Turing machine, then that’s not an issue — you can define a precise semantics that allows it to run a program for an arbitrarily long time and allocate an arbitrary number of objects in memory which contain pointers to each other. (To stick with the spirit of the question, for whatever language you choose, you’d want to disallow use of any large external batch of information like a “standard library”, except for whatever is so basic that you think of it as part of the native language. This is not a serious handicap for this problem.)

The main things that the program ‘U’ (I’d rather call the program itself ‘U’ than call its length ‘U’) needs to do are:

– recognize a syntactically correct statement or proof;

– check the validity of a purported proof;

– recognize certain statements as saying or implying “The Kolmogorov complexity of n is more than i” for some n and i. (It’s not necessary to recognize all such statements, just at least one for each n and i; so it can just recognize a statement that consists of some template with specific values of n and i inserted into it at certain places.)

Assuming that U expresses the proofs it wants to check in a practical proof language (which will be more like what a practical theorem-prover like Coq uses than like what a traditional logician would recognize as “straight Peano arithmetic”, but which will not be excessively powerful in the spirit of this question), I’d estimate that the most complex part is checking proof validity, but that that can still be expressed in at most a few dozen syntactic rules, each expressible in a few lines of code. (The authors of a system like Coq, which includes code to actually do that, would know better, as long as they remember that the vast majority of their system’s actual code is not needed for this problem.)

This makes me think that even without trying to compact it much, in a reasonable language we could write U in a few hundred lines of code, or (after a bit of simple compression) a few thousand bytes. (And perhaps much less if we tried hard to compact the whole program in clever ways.)

So L will also be “a few thousand” (bytes or digits), or perhaps less, rather than some number you can never possibly count to.

For comparison:

– Bill Gates’ first commercial success was an implementation of a useful version of BASIC in about 4000 bytes;

– 4k programs (4096 bytes or less) can produce graphical animations like this one http://www.youtube.com/watch?v=FWmv1ykGzis (to be fair, the complexity of some of the OS, graphics drivers, and hardware should be included, but this is a lot less than you might think if you imagine rewriting it purely for compactness rather than for speed, and only including what this sort of program needs to produce output);

– the complete genetic code of an organism can be as short as a few hundred thousand bytes, and that has to be encoded in a way that doesn’t allow for highly clever compression schemes.

So this answers my question, as precisely as I cared about. (The key point I didn’t know when I asked it was that it was basically just asking how complex a program like U needs to be.) In fact, given your description of Chaitin’s book, I’d be surprised if he doesn’t include a concrete version of a program that could be U, and therefore directly come up with a specific legal value for L.

People in-the-know already know this, but for everybody else, there is a nice post by Terry Tao about this.

I’m no expert on these matters but regarding this paragraph:

“Okay: there’s got to be at least one. How many? Suppose just one. Then we can go through all programs of length ≤ L, find those that print all the other numbers on our list… and thus, by a process of elimination, find the culprit.”

Doesn’t the halting problem prevent you from doing this? Some of these programs will run forever, and some of them will run for a very long time, and it’s impossible to tell the difference. So you wouldn’t end up with just one program.

Or have I misunderstood?

The uncomputability of the halting problem is crucial; it’s the real reason for Chaitin’s theorem and thus this spinoff. We must constantly bear it in mind. But it doesn’t cause problems here.

At this stage in the proof we’re assuming that only one number on our list is not printed out by a program of length ≤ L. So, we can list all programs of length ≤ L and do the following trick:

let the 1st run one step, let the 2nd run one step,… let the last run one step,

let the 1st run one step, let the 2nd run one step,… let the last run one step,

and so on. Some will never halt, and some will halt and print out numbers that aren’t on our list. But by our assumption, all but one of the numbers on our list will eventually get printed out… so then we know the ‘culprit’.

I didn’t explain this very clearly before—thanks for pushing me to do so.

[…] John Baez: The Inconsistency of Arithmetic, Chaitin’s Theorem and the Surprise Examination Paradox […]

More on Chaitin’s incompleteness theorem.

I think the reasoning has some interesting qualities which are time sensitive, language sensitive, but not paradoxical in the way it seems at first. To define terms, consider what it means to be surprised. There is clearly the issue of when something is a surprise. Let’s say the professor offered the test on Friday week prior, it would be a surprise until the end of class on Thursday. But let’s tighten up surprise — there would be 24 hours where you knew for certain there would be a test on Thursday, assuming at least one day of the week will definitely have a test. The professor says he will stipulate that that is not “really” a surprise, even though from the point of view one has on Monday, it surely would be. It would only be after class on Wed, that you would be sure the only option would be Thursday, not to experience this 24 hour window of knowledge going from Thursday to the Friday of the test. But on Monday, you would know it could be Monday, Tues or Wed, without the 24 hour window, going forward. On Tues, you know it could be that day or Wed. On Wed, you would know it had to be that day, or there would be a potential 24 hour period of not being surprised. If there is a potential, disallowed by the professor of never being aware, even for a moment of the test before it occurs, then you know the next day is a no go if the test doesn’t hit you that day, and we are back in the pickle. Tuesday must be the day, but we have the window. Uh oh. If the Professor lightens up and says on Monday you will not be able to guess when the test is, if the test hasn’t been given by Thursday you will have a one day heads up, then there is no predicting the test, until and unless the end of class on Thursday, but not on Monday or earlier( there is a Monty Hall issue, but that’s another story). Otherwise, the Professor is selling a bill of goods ( or he’s buying into wily George’s redefinition of surprise unwittingly), and can’t promise the test will be given on one of 5 days, and will never allow for a twenty four hour window.

So the Professor relents, and sees that looking forward until Thursday he will be able to tally surprise people, but not on Thursday for Friday — that like picking a marble out of jars, one marble, five jars, pick four empty jars you know that the fifth will have it. The paradox comes from a nutty stipulation of “surprise”, and the fallacy ignoratio elenchi , or redefinition thatGeorge introduces. It’s logically impossible to avoid the 24 window, and if that standard is utilized, it generates a nutty result. Just as if I said the same thing about the jars: “You will be surprised. You will never know which jar has the marble.” So you pick till 4, and realize it must be five = fail. Translate: can’t be 5, so 4 is the highest, number. So you get to three jars and know it must be 4 = fail. Translate: can’t be 4 so three is the highest number. And so on. What is occurring is just a product of a defective definition of “surprise” lurking that can’t be met given the other conditions. ( The contradiction argument is not a “proof,” but a disproof: A–> P. not P —> Contra—->invalid —->Plug in A, A is invalid). Change the def to what we normally mean as third dimensional creatures when we say surprise, meaning at the beginning of the sequence not foreclosing the possibility of this changing going forward. And we are back to being surprised, at least till Thursday. But on Monday, since we know it’s not foreclosed by the potential 24 hour window, it would still be a surprise to us if the test were to occur on Friday.