In 1936 Tarski proved a fundamental theorem of logic: the undefinability of truth. Roughly speaking, this says there’s no consistent way to extend arithmetic so that it talks about ‘truth’ for statements about arithmetic. Why not? Because if we could, we could cook up a statement that says “I am not true.” This would lead to a contradiction, the Liar Paradox: if this sentence is true then it’s not, and if it’s not then it is.
This is why the concept of ‘truth’ plays a limited role in most modern work on logic… surprising as that might seem to novices!
However, suppose we relax a bit and allow probability theory into our study of arithmetic. Could there be a consistent way to say, within arithmetic, that a statement about arithmetic has a certain probability of being true?
We can’t let ourselves say a statement has a 100% probability of being true, or a 0% probability of being true, or we’ll get in trouble with the undefinability of truth. But suppose we only let ourselves say that a statement has some probability greater than and less than
, where
Is that okay?
Yes it is, according to this draft of a paper:
• Paul Christiano, Eliezer Yudkowsky, Marcello Herreshoff and Mihaly Barasz, Definability of “Truth” in Probabilistic Logic
(Early draft), 28 March 2013.
But there’s a catch, or two. First there are many self-consistent ways to assess the probability of truth of arithmetic statements. This suggests that the probability is somewhat ‘subjective’ . But that’s fine if you think probabilities are inherently subjective—for example, if you’re a subjective Bayesian.
A bit more problematic is this: their proof that there exists a self-consistent way to assess probabilities is not constructive. In other words, you can’t use it to actually get your hands on a consistent assessment.
Fans of game theory will be amused to hear why: the proof uses Kakutani’s fixed point theorem! This is the result that John Nash used to prove games have equilibrium solutions, where nobody can improve their expected payoff by changing their strategy. And this result is not constructive.
In game theory, we use Kakutani’s fixed point theorem by letting each player update their strategy, improving it based on everyone else’s, and showing this process has a fixed point. In probabilistic logic, the process is instead that the thinker reflects on what they know, and updates their assessment of probabilities.
The statement
I have not yet carefully checked the proof of Barasz, Christiano, Herreshoff and Yudkowsky’s result. Some details have changed in the draft since I last checked, so it’s probably premature to become very nitpicky. But just to encourage technical discussions of this subject, let me try stating the result a bit more precisely. If you don’t know Tarski’s theorem, go here:
• Tarski’s undefinability theorem, Wikipedia.
I’ll assume you know that and are ready for the new stuff!
The context of this work is first-order logic. So, consider any language in first-order logic that lets us talk about natural numbers and also rational numbers. Let
be the language
with an additional function symbol
thrown in. We require that
be a rational number whenever
is a natural number. We want
to stand for the probability of the truth of the sentence whose Gödel number is
This will give a system that can reflect about probability that what it’s saying is true.
So, suppose is some theory in the language
How can we say that the probability function
has ‘reasonable opinions’ about truth, assuming that the axioms of
are true?
The authors have a nice way of answering this. First they consider any function assigning a probability to each sentence of
They say that
is coherent if there is a probability measure on the set of models of
such that
is the measure of the set of models in which
is satisfied. They show that
is coherent iff these three conditions hold:
1) for all sentences
2) for each tautology.
3) for each contradiction.
(By the way, it seems to me that 1) and 2) imply and thus 3). So either they’re giving a slightly redundant list of conditions because they feel in the mood for it, or they didn’t notice this list was redundant, or it’s not and I’m confused. It’s good to always say a list of conditions is redundant if you know it is. You may be trying to help your readers a bit, and it may seem obvious to you, but it you don’t come out and admit the redundancy, you’ll make some of your readers doubt their sanity.)
(Also by the way, they don’t say how they’re making the set of all models into a measurable space. But I bet they’re using the σ-algebra where all subsets are measurable, and I think there’s no problem with the fact that this set is very large: a proper class, I guess! If you believe in the axiom of universes, you can just restrict attention to ‘small’ models… and your probability measure will be supported on a countable set of models, since an uncountable sum of positive numbers always diverges, so the largeness of the set of these models is largely irrelevant.)
So, let’s demand that be coherent. And let’s demand that
whenever the sentence
is one of the axioms of
At this point, we’ve got this thing that assigns a probability to each sentence in our language. We’ve also got this thing
in our language, such that
is trying to be the probability of the truth of the sentence whose Gödel number is
But so far these two things aren’t connected.
To connect them, they demand a reflection principle: for any sentence and any rational numbers
Here is the Gödel number of the sentence
So, this principle says that if a sentence has some approximate probability of being true, the thinker—as described by
—knows this. They can’t know precise probabilities, or we’ll get in trouble. Also, making the reflection principle into an if and only if statement:
is too strong. It leads to a contradictions, very much as in Tarski’s original theorem on the undefinability of truth! However, in the latest draft of the paper, the authors seem to have added a weak version of the converse to their formulation of the reflection principle.
Anyway, the main theorem they’re claiming is this:
Theorem (Barasz, Christiano, Herreshoff and Yudkowsky). There exists a function assigning a probability to each sentence of
such that
1) is coherent,
2) whenever the sentence
is one of the axioms of
and
3) the reflection principle holds.
Cool! Typo in the reflection principle:
should be
.
Thanks – fixed!
Typo carried over to the “if and only if” reflection principle.
Thanks – now I fixed that too!
s/Herresho/Herreshoff/
Thank you for this post, it helped me understand the result a bit better.
You’re missing a few letters from the third author’s name: Herreshoff. (I’m guessing that you copy-pasted directly from the PDF, which tends to drop any f.)
Thanks, Aaron and Kaj—fixed!
Yes, I cut-and-pasted the name from the PDF. The letter f is a pain in the butt.
By the way, I believe a function
that’s coherent and gives probability 1 to all the axioms in a theory
also gives probability 1 to all the theorems of theory
. So, even if we should interpret these probabilities as ‘subjective’, this is the probability distribution for a very very smart mathematician: someone who can see all the logical consequences of any bunch of axioms!
For example, if
extends Peano arithmetic, and Goldbach’s conjecture is provable in Peano arithmetic, then
must assign probability 100% to Goldbach’s conjecture… even if my best guess of the probability is something like 99.99%.
Goldbach’s conjecture is of the type where if it is false, there is a finitely checkable counterexample. So if Goldbach’s conjecture is not provable in PA, it is true.
You mean if it is not refutable then it is true.
For the record, this problem of “logical uncertainty” is also on MIRI’s research agenda:
http://intelligence.org/2013/01/30/yudkowsky-on-logical-uncertainty/
But don’t axioms and their consequences exhaust all true statements in a theory? Unless there can be true statements that are not a consequence of the axioms it would seem to me that such function would define truth.
For any consistent finitely axiomatizable theory including arithmetic, there are lots of statements that can neither be proved nor disproved. This is Gödel’s first incompleteness theorem. Some of these statements are rather clearly true: for example, the unprovable statements that say “This statement is unprovable”. So truth is different from provability. Tarski’s theorem is about truth.
However, you’re touching on a good point: I believe it’s only the statements that are neither provable nor disprovable that can be assigned probabilities other than 0 or 1 by a probability function meeting the conditions of the big theorem at the bottom of my post.
I wish the paper stated this more clearly—or else said why it’s not true.
John B. wrote:
Let me say why I think this.
Given a theory
there’s a (large) set
consisting of all models of this theory. This becomes a measurable space if we say all subsets are measurable. Then I believe coherent probability assignments
assigning probability 1 to all the axioms of
are in 1-1 correspondence with probability measures
on
The probability
of a sentence
is the measure of the subset of
consisting of models in which
is satisfied.
The soundness and completeness theorems of first-order logic say
is a theorem of
if and only if it’s satisfied by every model in
.
The measure of
itself is 1. So, if
is satisfied in every model of
, we get 
Putting these together, we see any theorem
of the theory
must have
for every coherent probability assignment
Similarly, if
is a theorem,
So it’s only the sentences that are neither provable nor disprovable that can have probabilities
So, if we interpret these probabilities in a subjective Bayesian way, we’re assuming a rational agent of infinite powers, who is 100% sure of any statement that can proved from an axiom system, given that it’s sure of the axioms.
Over on G+, Valdis Kletnieks wrote:
Gödel’s first incompleteness theorem and Tarski’s theorem are different but related.
Tarski’s theorem says you can’t define truth of arithmetic statements within arithmetic—since if you could, you could write a statement saying “I am false”, and get a contradiction. If it’s true, it’s false. If it’s false, it’s true. Very bad.
Gödel knew this before Tarski, but he noticed something else: you can define provability of arithmetic statements within arithmetic. So you can write a statement saying “I am unprovable”. But you don’t get a contradiction! If it’s unprovable, it’s true. If it’s provable, it’s false. Not so bad, but still very interesting: either arithmetic contains true but unprovable statements, or false but provable ones.
Gödel then restated this in a way that avoids mentioning the concept of “truth”.
Over on G+, Alexander Kruel wrote:
I’m not sure how helpful this work will be in developing artificial intelligence. In my article, I mentioned that their proof that well-behaved assignments of probabilities to mathematical statements exist is nonconstructive. In other words, it doesn’t provide an algorithm to actually find probabilities.
In an earlier email to Yudkowsky I asked whether any well-behaved assignment of probabilities is actually uncomputable: in other words, there’s no algorithm to compute the probabilities to arbitrary accuracy. The new draft of the paper says that indeed any well-behaved assignment is uncomputable… but it doesn’t (yet) provide a proof, and I don’t see why this must be true.
But anyway:
Even if the ideas here can’t be implemented algorithmically, they could lay the groundwork for ideas that can. And I think the idea of blending probability theory and logic this way is a good one.
By the way, that quote neglected to mention that the world can be saved by caped mathematicians with the power of flight.
OK, so I skimmed this very quickly, but … can we Godel-number all the various statements about probability (e.g. as suggested by the reflection principle, as stated for rationals, not reals)? If so, isn’t there a problem? That is, if we can ‘arithmetize’ this theorem itself, and all the elements of its proof, we come back to Tarski’s thm. I see only two ways out: (a) my observation above has no bearing on the theorem, or (b) some kind of magic happens in going from a countable variant of probability/measure-theory, to one set up on a proper class.
I suspect (a) holds: the theorem is not really defining truth, but a concept of the probability of truth.
Linas Vepstas wrote:
Yes, we can.
Can you say what you think the problem would be?
Okay. Sure. It’s certainly not defining truth! It’s merely defining a concept of the probability of truth. That’s the whole point.
And this concept is weak enough that we can’t say a statement is true if and only if its probability of being true equals 1… because if we could, we’d get a paradox. We’d be able to make a statement S that says
and this would imply that negation of S has a probability 1 of being true, so the negation of S would be true, so S would be false… so it would be true! In short, we’d get a paradox.
You might be interested in Section 3.4 of Christiano et al‘s paper, currently the last section, which is called “Going Meta”. They talk about systems of arithmetic that “know” the reflection principle is true. They don’t currently talk about what happens when you arithmetize their whole argument. That could have some interesting consequences. But I don’t think they’d be “problems”, exactly.
This was helpful to me, thank you. I didn’t realize until reading it that the fact this argument relies on axioms doesn’t make it self-contradictory.
These are not as revolutionary ideas. There are decades old works on many valued logic and on its applications to paradoxes. See for example Petr Hajek’s book Methamatematics of Fuzzy Logic. These are well-known to experts in many value and philosophical logic.
I have modest familiarity with metamathematics of fuzzy logics. You can’t carry out this construction in fuzzy logic, because the handling of quantifiers is different. In general, the goals and philosophy of fuzzy logic seem significantly different, and so we diverge on many implementation details.
I should clarify because this may be an important point, and does deserve more serious discussion. There will definitely be a more extensive discussion in the paper itself, which should be forthcoming soon.
In fuzzy logic, we can introduce a truth predicate Tr that returns “true” or “false,” in the usual way, and the truth of self-referential sentences involving Tr may simply be fuzzy. However the logic cannot make statements about Tr(phi)’s fuzzy value, it can simply include it in a proposition whose truth value will then be fuzzy.
At face value this would be a modestly limited but satisfactory way to make use of a truth predicate. But for such a fuzzy truth predicate, you might have Tr(A) & Tr(B) != A & B, which is problematic for this usage. These issues are well-understood; if there have been significant advances in the last few years however then I may be ignorant of them.
In our system we can directly discuss the values of P. This is the difference between P being a procedure that returns a number, and P being a routine that returns “True” with the appropriate probability. Again, the latter usage would be fine, except that in the case of the fuzzy logic truth predicate we must admit arbitrary correlations between this return probability and the actual state of the world (such that e.g. when P(phi) is run on a statement with probability 1/2, it returns “True” half of the time as desired, but it returns true in precisely those worlds where phi itself is false).
Our system has an analogous problem to the fuzzy truth predicate, in that we might find that P(phi) 1/2 in worlds where phi is false (e.g. if phi := P(phi) < 1/2). But now we have isolated the problem in an infinitesimal error in the value of P, which is not an issue for any system that makes careful use of it (whereas arbitrary correlations between the return value of a stochastic truth predicate and the actual truth of the world, would be a quite serious problem).
You may also want to check the modal logic literature.
Is the Barasz, Christiano, Herreshoff and Yudkowsky theorem expressible in L?
Good question!
If we take
to be the language of set theory and take
to be the axioms of ZFC (Zermelo-Fraenkel set theory including the axiom of choice), we can state the Barasz, Christiano, Herreshoff and Yudkowsky theorem in
and prove it using
So, fire away!
(ZFC sounds like overkill, but their theorem relies on the Kakutani fixed point theorem, which is nonconstructive, so the axiom of choice could easily be lurking in there someplace.)
This reminds me my ancient thesis in Physics (I had ro read it again) regarding the fuzzy logic.
There are some value associated to adjectives (colors,temperature,etc.), that are not probability: for example the colors of the rainbow; a sentence can have a fuzzy propositional value in a chain of logic:
because of the proposition have different truths.
I use the functional calculus to associate the optimal value to sentence:
where the p are the truth value, and I use the Lagrange multipliers to obtain the optimal probability to associate cluster data to the value.
This choice divides the space into clusters of propositions.
The output of a sentence is:
and this value can be optimized for some value of the proposition (that are not boolean): I use they only for function approximation, not for a boolean-like logic.
I am thinking today to the true liar.
I think that logic, and mathematics, can give us the truth only if are falsifiable, like the physics.
I think that a true liars are the proposition that applied to the reality give us no information: the probability of true, and false, are equal (p=0.5) and the series of probability have not correlation (we cannot extract information by the true liar).
An example of true liar can be a proposition: a random point on a fractal surface has positive curvature (each measure gives a different curvature, and the proposition don’t gives we nothing).
If logic, and mathematics, are applied iteratively to the reality, then we cannot give truths (like the physics), but we have nearly-true proposition.
Saluti
DOmenico
Sorry about the confusing conditions 1-3; condition 3 seems to be necessary; you can’t derive it from 1 and 2 because e.g. you don’t know a priori that the probability of
and the probability (
AND TRUE) are the same (this can be derived from condition 3, or you could use some different axioms).
We mention in the paper that the consistency of a principle (e.g. the reflection principle) can be relevant to constructing practical systems even if the existence proof is non-constructive. For example, the principle could be used as a rule of inference (allowing a system to infer that
is likely to be true once it learns that
is likely to be close to 1), in which case the consistency statement shows that adopting it is unproblematic.
Intuitively, our P represents “truth” not “knowledge.” As you say, it would be a smart mathematician indeed! We imagine the reflection principle as being a rule which constrains our beliefs, in the same way that a formal correctness criterion for a truth predicate would do so.
To see that no coherent P can be computable, consider two recursively enumerable, provably disjoint, but recursively inseparable sets
and
(e.g. Turing machines that halt and output 0 and Turing machines that halt and output 1). Then
if
, and
if
, so you could use
to separate
and
.
Finally, the point of this research program is to contribute as much as possible to our understanding of reasoning; the aim is to (as a society) ensure we have an understanding of intelligence which matches our ability to build intelligences. This is an instance of the general principle that if you are going to build something which might have negative effects, it is worthwhile to understand it well first. I think it is fair to say that both Tarski’s result and our paper contribute (at least modestly) to our understanding of semantics and reflective reasoning. It will take many such steps before we have amassed something really useful, but the point is to indicate that there is a real research program here which is currently being neglected.
Thanks for your clarifications! I’m sorry for giving your paper a public blogging before you were done writing it, but it was too interesting for me to resist thinking about it, and the way I think is by writing… and these days, the way I write is by blogging.
I like your proof that any coherent
has to be uncomputable. I was looking in the wrong places for a proof of this! (I guess this is only for
‘s that assign probability 1 to the axioms in some sufficiently powerful theory, like Peano arithmetic or Robinson arithmetic, right?)
Since I like logic and probability theory, I think it’s great to combine them this way even if it’s not immediately relevant to designing rational agents with limited powers.
Over on G+, Richard Elwes wrote:
I don’t know this other work. I’m curious about it now.
[…] week I went to a workshop on mathematical logic to work on an extension of this problem, which basically aims to define a probabilistic version of first order logic that gets […]
A quick question to those who have understood the proof:
The use of the Kakutani fixed-point theorem demonstrates that there is a fixed point – an enumeration of probabilities for sentences in L’. Do we know that such an enumeration is unique? How do we know this?
It seems to me that much of the value of having probabilities for statements analogous to the use of the predicate True relies upon uniqueness, and not just consistency.
No, the choice of probabilities is far from unique!
For any axiomatization of arithmetic with finitely many axioms, there are infinitely many statements that can be neither proved nor disproved using these axioms. In the setup discussed here, we have no choice at all about assigning probabilities to statements that can be proved: they have to get probability 100%. And we have no choice at all about assigning probabilities to statements that can be disproved: they have to get probability 0%. But we have quite a bit of freedom in assigning probabilities to statements that can neither be proved nor disproved. The reflection principle lessens this freedom, but it doesn’t eliminate it.
What’s more problematic than the nonuniqueness, in my opinion, is that in this setup, no choice of probabilities obeying the rules listed is computable. That is, you can’t write a program to compute these probabilities, no matter what choice you make.
Thank-you for that, John.
My own interest in this question is whether this result casts light upon the bivalency of the Continuum Hypothesis (CH), and a number of other questions. A ‘fixed’ probability would suggest strongly that CH is bivalent. As things stand, some progress seems to have been made on this question, although I’m not quite sure what.
I don’t know what ‘bivalent’ means. Neither true nor false? Both true and false?
As you probably know, the Continuum Hypothesis can neither be proved nor disproved from ZFC (Zermelo-Fraenkel set theory plus the Axiom of Choice). So, by the completeness theorem, there are models of ZFC in which the Continuum Hypothesis holds and models in which it doesn’t. Arguments about which of these models is ‘true’ must invoke mathematical taste, or some other axioms which one is willing to declare ‘true’. Personally I think it’s best to explore both kinds of models and learn what they’re like. And indeed that’s what set theorists have been doing.
I bet that if you take ZFC plus the refleciton principle governing
as the theory you ‘know’, the one I’d been calling
you can find probability assignments
meeting all the conditions of the Barasz-Christiano-Herreshoff-Yudkowsky theorem that assign whatever probability you want to the Continuum Hypothesis. However, my attempted proof of this has a hole.
Thank-you, John, for your attempt to address this question.
I’m having trouble replying to your latest reply, so I’m responding to an earlier reply. The bivalency of CH is a really slippery problem, and I was interested to see if this probabilistic approach could make any traction in resolving it. At first blush, I would expect that any probability could be attributed to CH, but then it doesn’t surprise me if even that is hard, or impossible to establish.
I should be clear that I am gathering data for writing an essay in logic on the continuum hypothesis as part of my philosophy Masters, so I’m not looking for homework help, but only pointers at best.
The question of bivalency is a philosopher’s question, as much as it is a mathematician’s. In classical first order logic, for any X, (X v ¬X) is true, ie. every X is bivalent. Indeed, you can enter (X v ¬X) as a line in a formal proof. The trouble with this is that in second order logic, sentences such as the liar sentence L used in Gödel’s theorem cause serious problems for this rule.
So possibly some sentences are bivalent, and others are not. Intuitionists hold that you cannot perform certain operations that rely on the law of the excluded middle, although others are allowed, or else some basic mathematics couldn’t be done. Naïve intuitionism has a problem dealing with the Goldback Conjecture (GC), since it seems pretty clear that either there is a counterexample, or there isn’t, so it could conceivably have no counterexamples and yet no proof – yet surely in such a case it would be true, so (GC v ¬GC) holds?
In the light of this, the bivalency of CH could conceivably be resolved without answering the question as to whether it is in fact true or false. Also, CH could turn out to be resolvable with (say) a large cardinality axiom, or else a strengthening of the logic being used. I’ve listed some relevant papers below:
‘On the Question of Absolute Undecidability’ – Peter Koellner – doi:10.1093/philmat/nkj009
‘Is the Continuum Hypothesis a Definite Mathematical Problem’ – Solomon Feferman – #95,96 http://math.stanford.edu/~feferman/papers.html
‘Kreisel, The Continuum Hypothesis and Second Order Set Theory’ – Thomas Weston – Journal of Philosophical Logic 5 (1976) 281-298
‘What is Absolute Undecidability’ – Justin Clarke-Doane – https://files.nyu.edu/jcd305/public/
December 13-21, 2013 – I’ll be attending Workshop on Probability, Logic and Reflection at the Machine Intelligence Research Institute at 2721 Shattuck Avenue in Berkeley, with Eliezer Yudkowsky, Paul Christiano and others.
I thought that each mathematical demonstration can be translate in boolean algebra, the boolean net can be transformed in an electronic circuit, if the truth table of the circuit is constant then the demonstration have not inconsistencies; but it exist the flip flop, in the electronic, that have not a constant value, so that the boolean algebra can be inconsistent.
If the value of the truth of a sentence is a spin value (spin up and down), then with a quantum computer there is not more inconsistence, because there is a quantum value for each node of the net, and it is possible deduce theorem from flip flop outputs: there is an overlap between boolean net (without incongruence) and quantum spin net (the truth tables are the same).
The boolean net have two state of truth, with an oscillation between the two state, but I am thinking that the quantum computer have a single quantum equilibrium state for a boolean net, and there is not more incongruence (intermediate values).
Domenico wrote approximately:
Each statement in the propositional calculus can be translated into boolean algebra, but I doubt this is true for the predicate logic.
It is a my great mistake: I don’t know if each mathematical theorem can be represented like a boolean net (this is the muddle), with nodes of propositions and axioms, so that the Godel’s incompleteness theorems can be solved with a quantum computer.
Thank you.
Here’s another way to show that
is uncomputable in general: suppose
is the sentence that expresses “the Turing machine
halts on empty input”. Then if
if not zero or one, the machine
does not halt! This is clearly undecidable in general.
I think this work is interesting! My question is this: what is the difference with working with a probabilistic assignment of truth and the more “standard” provability predicate? It seems that both assign a value to each sentence into some (boolean) lattice. Is there a way of connecting both points of view?
I agree that it would be great to connect the two concepts as precisely as possible. Are probabilities really elements of a Boolean lattice? Or did you mean something else? I’m getting a bit confused…
Probabilities (or rather, events in a probability space) really are elements of a boolean lattice as are most spaces stable under intersection, union and complement, but I’m not sure my suggestion isn’t junk, as I don’t think the “classical” point of view brings much more to the table than the usual analysis of modal logic in S4.
[…] earlier “Definability of truth in probabilistic logic” report, discussed by John Baez here? In this paper, Paul aims to take a broader look at the interaction between probabilistic […]