## Struggles with the Continuum (Part 3)

12 September, 2016

In these posts, we’re seeing how our favorite theories of physics deal with the idea that space and time are a continuum, with points described as lists of real numbers. We’re not asking if this idea is true: there’s no clinching evidence to answer that question, so it’s too easy to let one’s philosophical prejudices choose the answer. Instead, we’re looking to see what problems this idea causes, and how physicists have struggled to solve them.

We started with the Newtonian mechanics of point particles attracting each other with an inverse square force law. We found strange ‘runaway solutions’ where particles shoot to infinity in a finite amount of time by converting an infinite amount of potential energy into kinetic energy.

Then we added quantum mechanics, and we saw this problem went away, thanks to the uncertainty principle.

Now let’s take away quantum mechanics and add special relativity. Now our particles can’t go faster than light. Does this help?

### Point particles interacting with the electromagnetic field

Special relativity prohibits instantaneous action at a distance. Thus, most physicists believe that special relativity requires that forces be carried by fields, with disturbances in these fields propagating no faster than the speed of light. The argument for this is not watertight, but we seem to actually see charged particles transmitting forces via a field, the electromagnetic field—that is, light. So, most work on relativistic interactions brings in fields.

Classically, charged point particles interacting with the electromagnetic field are described by two sets of equations: Maxwell’s equations and the Lorentz force law. The first are a set of differential equations involving:

• the electric field $\vec E$ and magnetic field $\vec B,$ which in special relativity are bundled together into the electromagnetic field $F,$ and

• the electric charge density $\rho$ and current density $\vec \jmath,$ which are bundled into another field called the ‘four-current’ $J.$

By themselves, these equations are not enough to completely determine the future given initial conditions. In fact, you can choose $\rho$ and $\vec \jmath$ freely, subject to the conservation law

$\displaystyle{ \frac{\partial \rho}{\partial t} + \nabla \cdot \vec \jmath = 0 }$

For any such choice, there exists a solution of Maxwell’s equations for $t \ge 0$ given initial values for $\vec E$ and $\vec B$ that obey these equations at $t = 0.$

Thus, to determine the future given initial conditions, we also need equations that say what $\rho$ and $\vec{\jmath}$ will do. For a collection of charged point particles, they are determined by the curves in spacetime traced out by these particles. The Lorentz force law says that the force on a particle of charge $e$ is

$\vec{F} = e (\vec{E} + \vec{v} \times \vec{B})$

where $\vec v$ is the particle’s velocity and $\vec{E}$ and $\vec{B}$ are evaluated at the particle’s location. From this law we can compute the particle’s acceleration if we know its mass.
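To make the Lorentz force law concrete, here is a minimal Python sketch that evaluates $e (\vec{E} + \vec{v} \times \vec{B})$ for a single particle. The field values and the velocity are made-up numbers chosen only for illustration, and `cross` and `lorentz_force` are hypothetical helper names, not part of any library.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def lorentz_force(charge, E, v, B):
    """Force e (E + v x B) on a particle of the given charge (SI units)."""
    v_cross_B = cross(v, B)
    return tuple(charge * (E[i] + v_cross_B[i]) for i in range(3))

# Illustrative (assumed) values: elementary charge, no electric field,
# velocity along x, magnetic field along z.
e = 1.602176634e-19           # coulombs
E = (0.0, 0.0, 0.0)           # volts per meter
v = (1.0e6, 0.0, 0.0)         # meters per second
B = (0.0, 0.0, 1.0)           # teslas

force = lorentz_force(e, E, v, B)
print(force)  # the only nonzero component points along -y
```

With these inputs the force is perpendicular to the velocity, which is the familiar statement that a purely magnetic force changes a particle's direction but does no work on it.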

The trouble starts when we try to combine Maxwell’s equations and the Lorentz force law in a consistent way, with the goal being to predict the future behavior of the $\vec{E}$ and $\vec{B}$ fields, together with particles’ positions and velocities, given all these quantities at $t = 0.$ Attempts to do this began in the late 1800s. The drama continues today, with no definitive resolution! You can find good accounts, available free online, by Feynman and by Janssen and Mecklenburg. Here we can only skim the surface.

The first sign of a difficulty is this: the charge density and current associated to a charged particle are singular, vanishing off the curve it traces out in spacetime but ‘infinite’ on this curve. For example, a charged particle at rest at the origin has

$\rho(t,\vec x) = e \delta(\vec{x}), \qquad \vec{\jmath}(t,\vec{x}) = \vec{0}$

where $\delta$ is the Dirac delta and $e$ is the particle’s charge. This in turn forces the electric field to be singular at the origin. The simplest solution of Maxwell’s equations consistent with this choice of $\rho$ and $\vec\jmath$ is

$\displaystyle{ \vec{E}(t,\vec x) = \frac{e \hat{r}}{4 \pi \epsilon_0 r^2}, \qquad \vec{B}(t,\vec x) = 0 }$

where $\hat{r}$ is a unit vector pointing away from the origin and $\epsilon_0$ is a constant called the permittivity of free space.

In short, the electric field is ‘infinite’, or undefined, at the particle’s location. So, it is unclear how to define the ‘self-force’ exerted by the particle’s own electric field on itself. The formula for the electric field produced by a static point charge is really just our old friend, the inverse square law. Since we had previously ignored the force of a particle on itself, we might try to continue this tactic now. However, other problems intrude.

In relativistic electrodynamics, the electric field has energy density equal to

$\displaystyle{ \frac{\epsilon_0}{2} |\vec{E}|^2 }$

Thus, the total energy of the electric field of a point charge at rest is proportional to

$\displaystyle{ \frac{\epsilon_0}{2} \int_{\mathbb{R}^3} |\vec{E}|^2 \, d^3 x = \frac{e^2}{8 \pi \epsilon_0} \int_0^\infty \frac{1}{r^4} \, r^2 dr. }$

But this integral diverges near $r = 0,$ so the electric field of a charged particle has an infinite energy!

How, if at all, does this cause trouble when we try to unify Maxwell’s equations and the Lorentz force law? It helps to step back in history. In 1902, the physicist Max Abraham assumed that instead of a point, an electron is a sphere of radius $R$ with charge evenly distributed on its surface. Then the energy of its electric field becomes finite, namely:

$\displaystyle{ E = \frac{e^2}{8 \pi \epsilon_0} \int_{R}^\infty \frac{1}{r^4} \, r^2 dr = \frac{1}{2} \frac{e^2}{4 \pi \epsilon_0 R} }$

where $e$ is the electron’s charge.

Abraham also computed the extra momentum a moving electron of this sort acquires due to its electromagnetic field. He got it wrong because he didn’t understand Lorentz transformations. In 1904 Lorentz did the calculation right. Using the relationship between velocity, momentum and mass, we can derive from his result a formula for the ‘electromagnetic mass’ of the electron:

$\displaystyle{ m = \frac{2}{3} \frac{e^2}{4 \pi \epsilon_0 R c^2} }$

where $c$ is the speed of light. We can think of this as the extra mass an electron acquires by carrying an electromagnetic field along with it.

Putting the last two equations together, these physicists obtained a remarkable result:

$\displaystyle{ E = \frac{3}{4} mc^2 }$
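The factor of $3/4$ is just the ratio of the two numerical prefactors above. Here is a tiny sketch that checks this with exact rational arithmetic, working in units of $e^2 / 4 \pi \epsilon_0 R$ so that only the prefactors matter. The variable names are illustrative.

```python
from fractions import Fraction

# Everything is measured in units of e^2 / (4 pi eps0 R).
field_energy = Fraction(1, 2)         # E     = (1/2) e^2 / (4 pi eps0 R)
em_mass_times_c2 = Fraction(2, 3)     # m c^2 = (2/3) e^2 / (4 pi eps0 R)

ratio = field_energy / em_mass_times_c2
print(ratio)  # 3/4, so E = (3/4) m c^2
```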

Then, in 1905, a fellow named Einstein came along and made it clear that the only reasonable relation between energy and mass is

$E = mc^2$

In 1906, Poincaré figured out the problem. It is not a computational mistake, nor a failure to properly take special relativity into account. The problem is that like charges repel, so if the electron were a sphere of charge it would explode without something to hold it together. And that something—whatever it is—might have energy. But their calculation ignored that extra energy.

In short, the picture of the electron as a tiny sphere of charge, with nothing holding it together, is incomplete. And the calculation showing $E = \frac{3}{4}mc^2,$ together with special relativity saying $E = mc^2,$ shows that this incomplete picture is inconsistent. At the time, some physicists hoped that all the mass of the electron could be accounted for by the electromagnetic field. Their hopes were killed by this discrepancy.

Nonetheless it is interesting to take the energy $E$ computed above, set it equal to $m_e c^2$ where $m_e$ is the electron’s observed mass, and solve for the radius $R.$ The answer is

$\displaystyle{ R = \frac{1}{8 \pi \epsilon_0} \frac{e^2}{m_e c^2} } \approx 1.4 \times 10^{-15} \mathrm{ meters}$
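As a sanity check on this number, here is a short sketch that plugs standard SI values of the constants into the formula for $R$. The constants are quoted to the precision I am confident of; the variable names are just illustrative.

```python
import math

# SI values of the physical constants (rounded CODATA values).
e = 1.602176634e-19            # elementary charge, coulombs
eps0 = 8.8541878128e-12        # permittivity of free space, farads per meter
m_e = 9.1093837015e-31         # electron mass, kilograms
c = 299792458.0                # speed of light, meters per second
bohr_radius = 5.29177210903e-11  # meters

# Radius at which the field energy e^2 / (8 pi eps0 R) equals m_e c^2.
R = e**2 / (8 * math.pi * eps0 * m_e * c**2)

print(f"R = {R:.2e} m")                        # roughly 1.4e-15 m
print(f"R / Bohr radius = {R / bohr_radius:.1e}")  # a few times 1e-5
```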

In the early 1900s, this would have been a remarkably tiny distance: $0.00003$ times the Bohr radius of a hydrogen atom. By now we know this is roughly the radius of a proton. We know that electrons are not spheres of this size. So at present it makes more sense to treat the calculations so far as a prelude to some kind of limiting process where we take $R \to 0.$ These calculations teach us two lessons.

First, the electromagnetic field energy approaches $+\infty$ as we let $R \to 0,$ so we will be hard pressed to take this limit and get a well-behaved physical theory. One approach is to give a charged particle its own ‘bare mass’ $m_\mathrm{bare}$ in addition to the mass $m_\mathrm{elec}$ arising from electromagnetic field energy, in a way that depends on $R.$ Then as we take the $R \to 0$ limit we can let $m_\mathrm{bare} \to -\infty$ in such a way that $m_\mathrm{bare} + m_\mathrm{elec}$ approaches a chosen limit $m,$ the physical mass of the point particle. This is an example of ‘renormalization’.
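To see how this bookkeeping plays out numerically, here is a rough sketch. It assumes that $m_\mathrm{elec}$ is given by Lorentz's formula for the electromagnetic mass quoted earlier, and that the physical mass is the observed electron mass; both choices, and the function names, are assumptions made for illustration.

```python
import math

E_CHARGE = 1.602176634e-19     # coulombs
EPS0 = 8.8541878128e-12        # farads per meter
C = 299792458.0                # meters per second
M_PHYSICAL = 9.1093837015e-31  # kilograms; assumed to be the electron mass

def electromagnetic_mass(radius):
    """Lorentz's electromagnetic mass (2/3) e^2 / (4 pi eps0 R c^2)."""
    return (2.0 / 3.0) * E_CHARGE**2 / (4 * math.pi * EPS0 * radius * C**2)

def bare_mass(radius):
    """Bare mass chosen so that bare + electromagnetic = physical mass."""
    return M_PHYSICAL - electromagnetic_mass(radius)

for radius in (1e-12, 1e-15, 1e-18, 1e-21):
    m_elec = electromagnetic_mass(radius) / M_PHYSICAL
    m_bare = bare_mass(radius) / M_PHYSICAL
    print(f"R = {radius:.0e} m: m_elec = {m_elec:.3e}, m_bare = {m_bare:.3e} (units of m)")
```

The sum stays fixed by construction, while the bare mass turns negative once $R$ drops below roughly a femtometer and then heads off to $-\infty.$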

Second, it is wise to include conservation of energy-momentum as a requirement in addition to Maxwell’s equations and the Lorentz force law. Here is a more sophisticated way to phrase Poincaré’s realization. From the electromagnetic field one can compute a ‘stress-energy tensor’ $T,$ which describes the flow of energy and momentum through spacetime. If all the energy and momentum of an object comes from its electromagnetic field, you can compute them by integrating $T$ over the hypersurface $t = 0.$ You can prove that the resulting 4-vector transforms correctly under Lorentz transformations if you assume the stress-energy tensor has vanishing divergence: $\partial^\mu T_{\mu \nu} = 0.$ This equation says that energy and momentum are locally conserved. However, this equation fails to hold for a spherical shell of charge with no extra forces holding it together. In the absence of extra forces, it violates conservation of momentum for a charge to feel an electromagnetic force yet not accelerate.

So far we have only discussed the simplest situation: a single charged particle at rest, or moving at a constant velocity. To go further, we can try to compute the acceleration of a small charged sphere in an arbitrary electromagnetic field. Then, by taking the limit as the radius $r$ of the sphere goes to zero, perhaps we can obtain the law of motion for a charged point particle.

In fact this whole program is fraught with difficulties, but physicists boldly go where mathematicians fear to tread, and in a rough way this program was carried out already by Abraham in 1905. His treatment of special relativistic effects was wrong, but these were easily corrected; the real difficulties lie elsewhere. In 1938 his calculations were carried out much more carefully—though still not rigorously—by Dirac. The resulting law of motion is thus called the ‘Abraham–Lorentz–Dirac force law’.

There are three key ways in which this law differs from our earlier naive statement of the Lorentz force law:

• We must decompose the electromagnetic field in two parts, the ‘external’ electromagnetic field $F_\mathrm{ext}$ and the field produced by the particle:

$F = F_\mathrm{ext} + F_\mathrm{ret}$

Here $F_\mathrm{ext}$ is a solution of Maxwell’s equations with $J = 0,$ while $F_\mathrm{ret}$ is computed by convolving the particle’s 4-current $J$ with a function called the ‘retarded Green’s function’. This breaks the time-reversal symmetry of the formalism so far, ensuring that radiation emitted by the particle moves outward as $t$ increases. We then decree that the particle only feels a Lorentz force due to $F_\mathrm{ext},$ not $F_\mathrm{ret}.$ This avoids the problem that $F_\mathrm{ret}$ becomes infinite along the particle’s path as $r \to 0.$

• Maxwell’s equations say that an accelerating charged particle emits radiation, which carries energy-momentum. Conservation of energy-momentum implies that there is a compensating force on the charged particle. This is called the ‘radiation reaction’. So, in addition to the Lorentz force, there is a radiation reaction force.

• As we take the limit $r \to 0,$ we must adjust the particle’s bare mass $m_\mathrm{bare}$ in such a way that its physical mass $m = m_\mathrm{bare} + m_\mathrm{elec}$ is held constant. This involves letting $m_\mathrm{bare} \to -\infty$ as $m_\mathrm{elec} \to +\infty.$

It is easiest to describe the Abraham–Lorentz–Dirac force law using standard relativistic notation. So, we switch to units where $c$ and $4 \pi \epsilon_0$ equal 1, let $x^\mu$ denote the spacetime coordinates of a point particle, and use a dot to denote the derivative with respect to proper time. Then the Abraham–Lorentz–Dirac force law says

$m \ddot{x}^\mu = e F_{\mathrm{ext}}^{\mu \nu} \, \dot{x}_\nu \; - \; \frac{2}{3}e^2 \ddot{x}^\alpha \ddot{x}_\alpha \, \dot{x}^\mu \; + \; \frac{2}{3}e^2 \dddot{x}^\mu .$

The first term at right is the Lorentz force, which looks more elegant in this new notation. The second term is fairly intuitive: it acts to reduce the particle’s velocity at a rate proportional to its velocity (as one would expect from friction), but also proportional to the squared magnitude of its acceleration. This is the ‘radiation reaction’.

The last term, called the ‘Schott term’, is the most shocking. Unlike all familiar laws in classical mechanics, it involves the third derivative of the particle’s position!

This seems to shatter our original hope of predicting the electromagnetic field and the particle’s position and velocity given their initial values. Now it seems we need to specify the particle’s initial position, velocity and acceleration.

Furthermore, unlike Maxwell’s equations and the original Lorentz force law, the Abraham–Lorentz–Dirac force law is not symmetric under time reversal. If we take a solution and replace $t$ with $-t,$ the result is not a solution. Like the force of friction, radiation reaction acts to make a particle lose energy as it moves into the future, not the past.

The reason is that our assumptions have explicitly broken time symmetry. The splitting $F = F_\mathrm{ext} + F_\mathrm{ret}$ says that a charged accelerating particle radiates into the future, creating the field $F_\mathrm{ret},$ and is affected only by the remaining electromagnetic field $F_\mathrm{ext}.$

Worse, the Abraham–Lorentz–Dirac force law has counterintuitive solutions. Suppose for example that $F_\mathrm{ext} = 0.$ Besides the expected solutions where the particle’s velocity is constant, there are solutions for which the particle accelerates indefinitely, approaching the speed of light! These are called ‘runaway solutions’. In these runaway solutions, the acceleration as measured in the frame of reference of the particle grows exponentially with the passage of proper time.
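Here is a sketch of what a runaway solution looks like for motion along a line. It rests on some assumptions not spelled out above: the metric signature is $(-,+,+,+),$ and the velocity is written as $\dot{x} = (\cosh w, \sinh w)$ where $w$ is the rapidity. Plugging this into the force law with $F_\mathrm{ext} = 0,$ the radiation reaction term cancels part of the Schott term and we are left with $m \dot{w} = \frac{2}{3} e^2 \ddot{w}.$ So the proper acceleration $a = \dot{w}$ obeys $\dot{a} = a/\tau_0$ with $\tau_0 = \frac{2}{3} e^2 / m.$ The code measures proper time in units of $\tau_0$; the function name and the initial values are made up for illustration.

```python
import math

TAU0 = 1.0  # assumed unit of proper time: tau0 = (2/3) e^2 / m, with c = 4 pi eps0 = 1

def runaway_state(a0, w0, tau):
    """Proper acceleration, rapidity and velocity at proper time tau.

    a0 is the initial proper acceleration and w0 the initial rapidity.
    Solves da/dtau = a / TAU0 and dw/dtau = a in closed form.
    """
    growth = math.exp(tau / TAU0)
    accel = a0 * growth
    rapidity = w0 + a0 * TAU0 * (growth - 1.0)
    velocity = math.tanh(rapidity)  # velocity as a fraction of the speed of light
    return accel, rapidity, velocity

for tau in (0.0, 1.0, 2.0, 3.0):
    accel, rapidity, velocity = runaway_state(a0=0.1, w0=0.0, tau=tau)
    print(f"tau = {tau:.0f}: proper acceleration = {accel:.3f}, v/c = {velocity:.6f}")
```

Choosing `a0 = 0` gives the expected constant-velocity solution. Any nonzero `a0`, however small, gives exponentially growing proper acceleration and a velocity that creeps up toward the speed of light.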

So, the notion that special relativity might help us avoid the pathologies of Newtonian point particles interacting gravitationally—five-body solutions where particles shoot to infinity in finite time—is cruelly mocked by the Abraham–Lorentz–Dirac force law. Particles cannot move faster than light, but even a single particle can extract an arbitrary amount of energy-momentum from the electromagnetic field in its immediate vicinity and use this to propel itself forward at speeds approaching that of light. The energy stored in the field near the particle is sometimes called ‘Schott energy’.

Thanks to the Schott term in the Abraham–Lorentz–Dirac force law, the Schott energy can be converted into kinetic energy for the particle. The details of how this works are nicely discussed in a paper by Øyvind Grøn, so click the link and read that if you’re interested. I’ll just show you a picture from that paper:

So even one particle can do crazy things! But worse, suppose we generalize the framework to include more than one particle. The arguments for the Abraham–Lorentz–Dirac force law can be generalized to this case. The result is simply that each particle obeys this law with an external field $F_\mathrm{ext}$ that includes the fields produced by all the other particles. But a problem appears when we use this law to compute the motion of two particles of opposite charge. To simplify the calculation, suppose they are located symmetrically with respect to the origin, with equal and opposite velocities and accelerations. Suppose the external field felt by each particle is solely the field created by the other particle. Since the particles have opposite charges, they should attract each other. However, one can prove they will never collide. In fact, if at any time they are moving towards each other, they will later turn around and move away from each other at ever-increasing speed!

This fact was discovered by C. Jayaratnam Eliezer in 1943. It is so counterintuitive that several proofs were required before physicists believed it.

None of these strange phenomena have ever been seen experimentally. Faced with this problem, physicists have naturally looked for ways out. First, why not simply cross out the $\dddot{x}^\mu$ term in the Abraham–Lorentz–Dirac force? Unfortunately the resulting simplified equation

$m \ddot{x}^\mu = e F_{\mathrm{ext}}^{\mu \nu} \, \dot{x}_\nu - \frac{2}{3}e^2 \ddot{x}^\alpha \ddot{x}_\alpha \, \dot{x}^\mu$

has only trivial solutions. The reason is that with the particle’s path parametrized by proper time, the vector $\dot{x}^\mu$ has constant length, so the vector $\ddot{x}^\mu$ is orthogonal to $\dot{x}^\mu.$ So is the vector $F_{\mathrm{ext}}^{\mu \nu} \dot{x}_\nu,$ because $F_{\mathrm{ext}}$ is an antisymmetric tensor. So, the last term must be zero, which implies $\ddot{x} = 0,$ which in turn implies that all three terms must vanish.
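In case the steps go by too fast, here is the argument written out as a short calculation: contract the simplified equation with $\dot{x}_\mu$ and use the two orthogonality facts. The only input beyond what was said above is the standard fact that a vector orthogonal to a timelike vector is either spacelike or zero.

```latex
% Contract the simplified equation with \dot{x}_\mu:
\begin{align*}
m \, \ddot{x}^\mu \dot{x}_\mu
  &= e F_{\mathrm{ext}}^{\mu\nu} \dot{x}_\nu \dot{x}_\mu
   - \tfrac{2}{3} e^2 \, (\ddot{x}^\alpha \ddot{x}_\alpha) \, (\dot{x}^\mu \dot{x}_\mu) \\
% Left side: \ddot{x} is orthogonal to \dot{x}, so it vanishes.
% First term on the right: F_ext is antisymmetric, \dot{x}_\nu \dot{x}_\mu is symmetric.
0 &= 0 - \tfrac{2}{3} e^2 \, (\ddot{x}^\alpha \ddot{x}_\alpha) \, (\dot{x}^\mu \dot{x}_\mu) \\
% \dot{x}^\mu \dot{x}_\mu is a nonzero constant (\pm 1, depending on sign convention), so
\ddot{x}^\alpha \ddot{x}_\alpha &= 0 \\
% \ddot{x} is orthogonal to the timelike vector \dot{x}, hence spacelike or zero;
% having zero length, it must be zero:
\ddot{x}^\mu &= 0 \\
% Feeding this back into the equation of motion:
e F_{\mathrm{ext}}^{\mu\nu} \dot{x}_\nu &= 0
\end{align*}
```

So the particle moves at constant velocity, and only along worldlines where the external field exerts no force on it.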

Another possibility is that some assumption made in deriving the Abraham–Lorentz–Dirac force law is incorrect. Of course the theory is physically incorrect, in that it ignores quantum mechanics and other things, but that is not the issue. The issue here is one of mathematical physics, of trying to formulate a well-behaved classical theory that describes charged point particles interacting with the electromagnetic field. If we can prove this is impossible, we will have learned something. But perhaps there is a loophole. The original arguments for the Abraham–Lorentz–Dirac force law are by no means mathematically rigorous. They involve a delicate limiting procedure, and approximations that were believed, but not proved, to become perfectly accurate in the $r \to 0$ limit. Could these arguments conceal a mistake?

Calculations involving a spherical shell of charge have been improved by a series of authors, and nicely summarized by Fritz Rohrlich. In all these calculations, nonlinear powers of the acceleration and its time derivatives are neglected, and one hopes this is acceptable in the $r \to 0$ limit.

Dirac, struggling with renormalization in quantum field theory, took a different tack. Instead of considering a sphere of charge, he treated the electron as a point from the very start. However, he studied the flow of energy-momentum across the surface of a tube of radius $r$ centered on the electron’s path. By computing this flow in the limit $r \to 0,$ and using conservation of energy-momentum, he attempted to derive the force on the electron. He did not obtain a unique result, but the simplest choice gives the Abraham–Lorentz–Dirac equation. More complicated choices typically involve nonlinear powers of the acceleration and its time derivatives.

Since this work, many authors have tried to simplify Dirac’s rather complicated calculations and clarify his assumptions. This book is a good guide:

• Stephen Parrott, Relativistic Electrodynamics and Differential Geometry, Springer, Berlin, 1987.

But more recently, Jerzy Kijowski and some coauthors have made impressive progress in a series of papers that solve many of the problems we have described.

Kijowski’s key idea is to impose conditions on precisely how the electromagnetic field is allowed to behave near the path traced out by a charged point particle. He breaks the field into a ‘regular’ part and a ‘singular’ part:

$F = F_\textrm{reg} + F_\textrm{sing}$

Here $F_\textrm{reg}$ is smooth everywhere, while $F_\textrm{sing}$ is singular near the particle’s path, but only in a carefully prescribed way. Roughly, at each moment, in the particle’s instantaneous rest frame, the singular part of its electric field consists of the familiar part proportional to $1/r^2,$ together with a part proportional to $1/r^3$ which depends on the particle’s acceleration. No other singularities are allowed!

On the one hand, this eliminates the ambiguities mentioned earlier: in the end, there are no ‘nonlinear powers of the acceleration and its time derivatives’ in Kijowski’s force law. On the other hand, this avoids breaking time reversal symmetry, as the earlier splitting $F = F_\textrm{ext} + F_\textrm{ret}$ did.

Next, Kijowski defines the energy-momentum of a point particle to be $m \dot{x},$ where $m$ is its physical mass. He defines the energy-momentum of the electromagnetic field to be just that due to $F_\textrm{reg},$ not $F_\textrm{sing}.$ This amounts to eliminating the infinite ‘electromagnetic mass’ of the charged particle. He then shows that Maxwell’s equations and conservation of total energy-momentum imply an equation of motion for the particle!

This equation is very simple:

$m \ddot{x}^\mu = e F_{\textrm{reg}}^{\mu \nu} \, \dot{x}_\nu$

It is just the Lorentz force law! Since the troubling Schott term is gone, this is a second-order differential equation. So we can hope to predict the future behavior of the electromagnetic field, together with the particle’s position and velocity, given all these quantities at $t = 0.$

And indeed this is true! In 1998, together with Gittel and Zeidler, Kijowski proved that initial data of this sort, obeying the careful restrictions on allowed singularities of the electromagnetic field, determine a unique solution of Maxwell’s equations and the Lorentz force law, at least for a short amount of time. Even better, all this remains true for any number of particles.

There are some obvious questions to ask about this new approach. In the Abraham–Lorentz–Dirac force law, the acceleration was an independent variable that needed to be specified at $t = 0$ along with position and momentum. This problem disappears in Kijowski’s approach. But how?

We mentioned that the singular part of the electromagnetic field, $F_\textrm{sing},$ depends on the particle’s acceleration. But more is true: the particle’s acceleration is completely determined by $F_\textrm{sing}.$ So, the particle’s acceleration is not an independent variable because it is encoded into the electromagnetic field.

Another question is: where did the radiation reaction go? The answer is: we can see it if we go back and decompose the electromagnetic field as $F_\textrm{ext} + F_\textrm{ret}$ as we had before. If we take the law

$m \ddot{x}^\mu = e F_{\textrm{reg}}^{\mu \nu} \dot{x}_\nu$

and rewrite it in terms of $F_\textrm{ext},$ we recover the original Abraham–Lorentz–Dirac law, including the radiation reaction term and Schott term.

Unfortunately, this means that ‘pathological’ solutions where particles extract arbitrary amounts of energy from the electromagnetic field are still possible. A related problem is that apparently nobody has yet proved solutions exist for all time. Perhaps a singularity worse than the allowed kind could develop in a finite amount of time—for example, when particles collide.

So, classical point particles interacting with the electromagnetic field still present serious challenges to the physicist and mathematician. When you have an infinitely small charged particle right next to its own infinitely strong electromagnetic field, trouble can break out very easily!

### Particles without fields

Finally, I should also mention attempts, working within the framework of special relativity, to get rid of fields and have particles interact with each other directly. For example, in 1903 Schwarzschild introduced a framework in which charged particles exert an electromagnetic force on each other, with no mention of fields. In this setup, forces are transmitted not instantaneously but at the speed of light: the force on one particle at one spacetime point $x$ depends on the motion of some other particle at spacetime point $y$ only if the vector $x - y$ is lightlike. Later Fokker and Tetrode derived this force law from a principle of least action. In 1949, Feynman and Wheeler checked that this formalism gives results compatible with the usual approach to electromagnetism using fields, except for several points:

• Each particle exerts forces only on other particles, so we avoid the thorny issue of how a point particle responds to the electromagnetic field produced by itself.

• There are no electromagnetic fields not produced by particles: for example, the theory does not describe the motion of a charged particle in an ‘external electromagnetic field’.

• The principle of least action guarantees that ‘if $A$ affects $B$ then $B$ affects $A$’. So, if a particle at $x$ exerts a force on a particle at a point $y$ in its future lightcone, the particle at $y$ exerts a force on the particle at $x$ in its past lightcone. This raises the issue of ‘reverse causality’, which Feynman and Wheeler address.

Besides the reverse causality issue, perhaps one reason this approach has not been more pursued is that it does not admit a Hamiltonian formulation in terms of particle positions and momenta. Indeed, there are a number of ‘no-go theorems’ for relativistic multiparticle Hamiltonians, saying that these can only describe noninteracting particles. So, most work that takes both quantum mechanics and special relativity into account uses fields.

Indeed, in quantum electrodynamics, even the charged point particles are replaced by fields—namely quantum fields! Next time we’ll see whether that helps.

## Logic, Probability and Reflection

26 December, 2013

Last week I attended the Machine Intelligence Research Institute’s sixth Workshop on Logic, Probability, and Reflection. This one was in Berkeley, where the institute has its headquarters.

You may know this institute under their previous name: the Singularity Institute. It seems to be the brainchild of Eliezer Yudkowsky, a well-known advocate of ‘friendly artificial intelligence’, whom I interviewed in week311, week312 and week313 of This Week’s Finds. He takes an approach to artificial intelligence that’s heavily influenced by mathematical logic, and I got invited to the workshop because I blogged about a paper he wrote with Mihaly Barasz, Paul Christiano and Marcello Herreshoff on probability theory and logic.

I only have the energy to lay the groundwork for a good explanation of what happened in the workshop. So, after you read my post, please read this:

• Benja Fallenstein, Results from MIRI’s December workshop, Less Wrong, 28 December 2013.

The workshop had two main themes, so let me tell you what they were.

### Scientific induction in mathematics

The first theme is related to that paper I just mentioned. How should a rational agent assign probabilities to statements in mathematics? Of course an omniscient being could assign

probability 1 to every mathematical statement that’s provable,

probability 0 to every statement whose negation is provable,

and

to every statement that is neither provable nor disprovable.

But a real-world rational agent will never have time to check all proofs, so there will always be lots of statements it’s not sure about. Actual mathematicians always have conjectures, like the Twin Prime Conjecture, that we consider plausible even though nobody has proved them. And whenever we do research, we’re constantly estimating how likely it is for statements to be true, and changing our estimates as new evidence comes in. In other words, we use scientific induction in mathematics.

How could we automate this? Most of us don’t consciously assign numerical probabilities to mathematical statements. But maybe an AI mathematician should. If so, what rules should it follow?

It’s natural to try a version of Solomonoff induction, where our probability estimate, before any evidence comes in, favors statements that are simple. However, this runs up against problems. If you’re interested in learning more about this, try:

• Jeremy Hahn, Scientific induction in probabilistic mathematics.

It’s a summary of ideas people came up with during the workshop. I would like to explain them sometime, but for now I should move on.

### The Löbian obstacle

The second main theme was the ‘Löbian obstacle’. Löb’s theorem is the flip side of Gödel’s first incompleteness theorem, less famous but just as shocking. It seems to put limitations on how much a perfectly rational being can trust itself.

Since it’s the day after Christmas, let’s ease our way into these deep waters with the Santa Claus paradox, also known as Curry’s paradox.

If you have a child who is worried that Santa Claus might not exist, you can reassure them using this sentence:

If this sentence is true, Santa Claus exists.

Call it P, for short.

Assume, for the sake of argument, that P is true. Then what it says is true: “If P is true, Santa Claus exists.” And we’re assuming P is true. So, Santa Claus exists.

So, we’ve proved that if P is true, Santa Claus exists.

But that’s just what P says!

So, P is true.

So, Santa Claus exists!

There must be something wrong about this argument, even if Santa Claus does exist, because if it were valid you could use it to prove anything at all. The self-reference is obviously suspicious. The sentence in question is a variant of the Liar Paradox:

This sentence is false.

since we can rewrite the Liar Paradox as

If this sentence is true, 0 = 1.

and then replace “0=1” by any false statement you like.
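It may help to see exactly where the trouble lies. The individual steps of the Santa Claus argument are perfectly valid logic; what is suspect is the assumption that there is a sentence $P$ equivalent to ‘if $P$ is true, Santa Claus exists’. Here is a small Lean 4 sketch making that precise: it takes such an equivalence as a hypothesis `h` and derives an arbitrary conclusion `Q`. The theorem name is made up for illustration.

```lean
-- Curry's paradox as a theorem: IF some proposition P is equivalent to
-- "P implies Q", THEN Q follows. Here Q plays the role of "Santa Claus exists".
theorem curry (P Q : Prop) (h : P ↔ (P → Q)) : Q :=
  -- Assume P. By h, P says "P implies Q", and we assumed P, so Q.
  have hpq : P → Q := fun p => (h.mp p) p
  -- But "P implies Q" is just what P says, so P holds.
  have hp : P := h.mpr hpq
  -- So Q holds.
  hpq hp
```

Each line of the proof matches one step of the informal argument, so any way out of the paradox has to deny that a sentence like $P$ exists.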

However, Gödel figured out a way to squeeze solid insights from these dubious self-referential sentences. He did this by creating a statement in the language of arithmetic, referring to nothing but numbers, which nonetheless manages to effectively say

This sentence is unprovable.

If it were provable, you’d get a contradiction! So, either arithmetic is inconsistent or this sentence is unprovable. But if it’s unprovable, it’s true. So, there are true but unprovable statements in arithmetic… unless arithmetic is inconsistent! This discovery shook the world of mathematics.

Here I’m being quite sloppy, just to get the idea across.

For one thing, when I’m saying ‘provable’, I mean provable given some specific axioms for arithmetic, like the Peano axioms. If we change our axioms, different statements will be provable.

For another, the concept of ‘true’ statements in arithmetic is often shunned by logicians. That may sound shocking, but there are many reasons for this: for example, Tarski showed that the truth of statements about arithmetic is undefinable in arithmetic. ‘Provability’ is much easier to deal with.

So, a better way of thinking about Gödel’s result is that he constructed a statement that is neither provable nor disprovable from Peano’s axioms of arithmetic, unless those axioms are inconsistent (in which case we can prove everything, but it’s all worthless).

Furthermore, this result applies not just to Peano’s axioms but to any stronger set of axioms, as long as you can write a computer program to list those axioms.

In 1952, the logician Leon Henkin flipped Gödel’s idea around and asked about a sentence in the language of arithmetic that says:

This sentence is provable.

He asked: is this provable or not? The answer is much less obvious than for Gödel’s sentence. Play around with it and see what I mean.

But in 1954, Martin Hugo Löb showed that Henkin’s sentence is provable!

And Henkin noticed something amazing: Löb’s proof shows much more.

At this point it pays to become a bit more precise. Let us write $\mathrm{PA} \vdash P$ to mean the statement $P$ is provable from the Peano axioms of arithmetic. Gödel figured out how to encode statements in arithmetic as numbers, so let’s write $\# P$ for the Gödel number of any statement $P.$ And Gödel figured out how to write a statement in arithmetic, say

$\mathrm{Provable}(n)$

which says that the statement with Gödel number $n$ is provable using the Peano axioms.

Using this terminology, what Henkin originally did was find a number $n$ such that the sentence

$\mathrm{Provable}(n)$

has Gödel number $n.$ So, this sentence says

This sentence is provable from the Peano axioms of arithmetic.

What Löb did was show

$\mathrm{PA} \vdash \mathrm{Provable}(n)$

In other words, he showed that Henkin’s sentence really is provable from the Peano axioms!

What Henkin then did is prove that for any sentence $P$ in the language of arithmetic, if

$\mathrm{PA} \vdash \mathrm{Provable}(\# P) \implies P$

then

$\mathrm{PA} \vdash P$

In other words, suppose we can prove that the provability of $P$ implies $P.$ Then we can prove $P$!

At first this merely sounds nightmarishly complicated. But if you think about it long enough, you’ll see it’s downright terrifying! For example, suppose $P$ is some famous open question in arithmetic, like the Twin Prime Conjecture. You might hope to prove

The provability of the Twin Prime Conjecture implies the Twin Prime Conjecture.

Indeed, that seems like a perfectly reasonable thing to want. But it turns out that proving this is as hard as proving the Twin Prime Conjecture! Why? Because if we can prove the sentence displayed above, Löb and Henkin’s work instantly gives us a proof of the Twin Prime Conjecture!
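Here is another standard special case of Löb and Henkin’s result, sketched rather than proved in full detail. Take $P$ to be the false sentence $0 = 1.$ Since Peano arithmetic proves $\lnot(0 = 1),$ the implication $\mathrm{Provable}(\#(0=1)) \implies 0 = 1$ is equivalent, within Peano arithmetic, to

$\lnot \mathrm{Provable}(\#(0=1))$

which says that Peano arithmetic is consistent. So the result becomes: if

$\mathrm{PA} \vdash \lnot \mathrm{Provable}(\#(0=1))$

then

$\mathrm{PA} \vdash 0 = 1$

In other words, if Peano arithmetic can prove its own consistency, it is inconsistent. This is Gödel’s second incompleteness theorem.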

What does all this have to do with artificial intelligence?

Well, what I just said is true not only for Peano arithmetic, but any set of axioms including Peano arithmetic that a computer program can list. Suppose your highly logical AI mathematician has some such set of axioms, say $\mathrm{AI}.$ You might want it to trust itself. In other words, you might want

$\mathrm{AI} \vdash \mathrm{Provable}(\# P) \implies P$

for every sentence $P.$ This says, roughly, that whatever the AI can prove it can prove, it can prove.

But then Löb’s theorem would kick in and give

$\mathrm{AI} \vdash P$

for every sentence $P.$ And this would be disastrous: our AI would be inconsistent, because it could prove everything!

This is just the beginning of the problems. It gets more involved when we consider AI’s that spawn new AI’s and want to trust them. For more see:

• Eliezer Yudkowsky and Marcello Herreshoff, Tiling agents for self-modifying AI, and the Löbian obstacle.

At the workshop, various people made progress on this issue, which is recorded in these summaries:

• Eliezer Yudkowsky, The procrastination paradox.

Abstract. A theorem by Marcello Herreshoff, Benja Fallenstein, and Stuart Armstrong shows that if there exists an infinite series of theories $T_i$ extending $\mathrm{PA}$ where each $T_i$ proves the soundness of $T_{i+1}$, then all the $T_i$ must have only nonstandard models. We call this the Procrastination Theorem for reasons which will become apparent.

Here Fallenstein constructs a different sequence of theories $T_i$ extending Peano arithmetic such that each $T_i$ proves the consistency of $T_{i+1},$ and all the theories are sound for $\Pi_1$ sentences—that is, sentences with only one $\forall$ quantifier outside the rest of the stuff.

The following summaries would take more work to explain:

• Nate Soares, Fallenstein’s monster.

• Nisan Stiennon, Recursively-defined logical theories are well-defined.

• Benja Fallenstein, The 5-and-10 problem and the tiling agents formalism.

## Talk at the SETI Institute

5 December, 2013

SETI means ‘Search for Extraterrestrial Intelligence’. I’m giving a talk at the SETI Institute on Tuesday December 17th, from noon to 1 pm. You can watch it live, watch it later on their YouTube channel, or actually go there and see it. It’s free, and you can just walk in at 189 San Bernardo Avenue in Mountain View, California, but please register if you can.

#### Life’s Struggle to Survive

When pondering the number of extraterrestrial civilizations, it is worth noting that even after it got started, the success of life on Earth was not a foregone conclusion. We recount some thrilling episodes from the history of our planet, some well-documented but others merely theorized: our collision with the planet Theia, the oxygen catastrophe, the snowball Earth events, the Permian-Triassic mass extinction event, the asteroid that hit Chicxulub, and more, including the global warming episode we are causing now. All of these hold lessons for what may happen on other planets.

If you know interesting things about these or other ‘close calls’, please tell me! I’m still preparing my talk, and there’s room for more fun facts. I’ll make my slides available when they’re ready.

The SETI Institute looks like an interesting place, and my host, Adrian Brown, is an expert on the poles of Mars. I’ve been fascinated by the water there, and I’ll definitely ask him about this paper:

• Adrian J. Brown, Shane Byrne, Livio L. Tornabene and Ted Roush, Louth crater: Evolution of a layered water ice mound, Icarus 196 (2008), 433–445.

Louth Crater is a fascinating place. Here’s a photo:

By the way, I’ll be in Berkeley from December 14th to 21st, except for a day trip down to Mountain View for this talk. I’ll be at the Machine Intelligence Research Institute talking to Eliezer Yudkowsky, Paul Christiano and others at a Workshop on Probability, Logic and Reflection. This invitation arose from my blog post here:

If you’re in Berkeley and you want to talk, drop me a line. I may be too busy, but I may not.

## Monte Carlo Methods in Climate Science

23 July, 2013

joint with David Tweed

One way the Azimuth Project can help save the planet is to get bright young students interested in ecology, climate science, green technology, and stuff like that. So, we are writing an article for Math Horizons, an American magazine for undergraduate math majors. This blog article is a draft of that. You can also see it in PDF form here.

We’d really like to hear your comments! There are severe limits on including more detail, since the article should be easy to read and short. So please don’t ask us to explain more stuff: we’re most interested to know if you sincerely don’t understand something, or feel that students would have trouble understanding something. For comparison, you can see sample Math Horizons articles here.

### Introduction

They look placid lapping against the beach on a calm day, but the oceans are actually quite dynamic. The ocean currents act as ‘conveyor belts’, transporting heat both vertically between the water’s surface and the depths and laterally from one area of the globe to another. This effect is so significant that the temperature and precipitation patterns can change dramatically when currents do.

For example: shortly after the last ice age, northern Europe experienced a shocking change in climate from 10,800 to 9,500 BC. At the start of this period temperatures plummeted in a matter of decades. It became 7° Celsius colder, and glaciers started forming in England! The cold spell lasted for over a thousand years, but it ended as suddenly as it had begun.

Why? The most popular theory is that a huge lake in North America formed by melting glaciers burst its bank—and in a massive torrent lasting for years, the water from this lake rushed out to the northern Atlantic ocean. By floating atop the denser salt water, this fresh water blocked a major current: the Atlantic Meridional Overturning Circulation. This current brings warm water north and helps keep northern Europe warm. So, when it shut down, northern Europe was plunged into a deep freeze.

Right now global warming is causing ice sheets in Greenland to melt and release fresh water into the North Atlantic. Could this shut down the Atlantic Meridional Overturning Circulation and make the climate of Northern Europe much colder? In 2010, Keller and Urban [KU] tackled this question using a simple climate model, historical data, probability theory, and lots of computing power. Their goal was to understand the spectrum of possible futures compatible with what we know today.

Let us look at some of the ideas underlying their work.

### Box models

The earth’s physical behaviour, including the climate, is far too complex to simulate from the bottom up using basic physical principles, at least for now. The most detailed models today can take days to run on very powerful computers. So to make reasonable predictions on a laptop in a tractable time-frame, geophysical modellers use some tricks.

First, it is possible to split geophysical phenomena into ‘boxes’ containing strongly related things. For example: atmospheric gases, particulate levels and clouds all affect each other strongly; likewise the heat content, currents and salinity of the oceans all interact strongly. However, the interactions between the atmosphere and the oceans are weaker, and we can approximately describe them using just a few settings, such as the amount of atmospheric CO2 entering or leaving the oceans. Clearly these interactions must be consistent—for example, the amount of CO2 leaving the atmosphere box must equal the amount entering the ocean box—but breaking a complicated system into parts lets different specialists focus on different aspects; then we can combine these parts and get an approximate model of the entire planet. The box model used by Keller and Urban is shown in Figure 1.

1. The box model used by Keller and Urban.

Second, it turns out that simple but effective box models can be distilled from the complicated physics in terms of forcings and feedbacks. Essentially a forcing is a measured input to the system, such as solar radiation or CO2 released by burning fossil fuels. As an analogy, consider a child on a swing: the adult’s push every so often is a forcing. Similarly a feedback describes how the current ‘box variables’ influence future ones. In the swing analogy, one feedback is how the velocity will influence the future height. Specifying feedbacks typically uses knowledge of the detailed low-level physics to derive simple, tractable functional relationships between groups of large-scale observables, a bit like how we derive the physics of a gas by thinking about collisions of lots of particles.

However, it is often not feasible to get actual settings for the parameters in our model starting from first principles. In other words, often we can get the general form of the equations in our model, but they contain a lot of constants that we can estimate only by looking at historical data.

### Probability modeling

Suppose we have a box model that depends on some settings $S.$ For example, in Keller and Urban’s model, $S$ is a list of 18 numbers. To keep things simple, suppose the settings are elements of some finite set. Suppose we also have a huge hard disc full of historical measurements, and we want to use this to find the best estimate of $S.$ Because our data is full of ‘noise’ from other, unmodeled phenomena we generally cannot unambiguously deduce a single set of settings. Instead we have to look at things in terms of probabilities. More precisely, we need to study the probability that $S$ takes some value $s$ given that the measurements take some value. Let’s call the measurements $M$, and again let’s keep things simple by saying $M$ takes values in some finite set of possible measurements.

The probability that $S = s$ given that $M$ takes some value $m$ is called the conditional probability $P(S=s | M=m).$ How can we compute this conditional probability? This is a somewhat tricky problem.

One thing we can more easily do is repeatedly run our model with randomly chosen settings and see what measurements it predicts. By doing this, we can compute the probability that given setting values $S = s,$ the model predicts measurements $M=m.$ This again is a conditional probability, but now it is called $P(M=m|S=s).$

This is not what we want: it’s backwards! But here Bayes’ rule comes to the rescue, relating what we want to what we can more easily compute:

$\displaystyle{ P(S = s | M = m) = P(M = m| S = s) \frac{P(S = s)}{P(M = m)} }$

Here $P(S = s)$ is the probability that the settings take a specific value $s,$ and similarly for $P(M = m).$ Bayes’ rule is quite easy to prove, and it is actually a general rule that applies to any random variables, not just the settings and the measurements in our problem [Y]. It underpins most methods of figuring out hidden quantities from observed ones. For this reason, it is widely used in modern statistics and data analysis [K].

How does Bayes’ rule help us here? When we repeatedly run our model with randomly chosen settings, we have control over $P(S = s).$ As mentioned, we can compute $P(M=m| S=s).$ Finally, $P(M = m)$ is independent of our choice of settings. So, we can use Bayes’ rule to compute $P(S = s | M = m)$ up to a constant factor. And since probabilities must sum to 1, we can figure out this constant.
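As a minimal sketch of this normalization step, here is the calculation in Python for a made-up finite example. The three setting names and all the numbers are hypothetical, chosen only for illustration, and exact fractions are used so the arithmetic is easy to check by hand.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Return P(S=s | M=m) for each s, given P(S=s) and P(M=m | S=s)."""
    # Bayes' rule up to the constant factor 1 / P(M=m)
    unnormalized = {s: likelihood[s] * prior[s] for s in prior}
    # Probabilities must sum to 1, so this total is P(M=m)
    total = sum(unnormalized.values())
    return {s: w / total for s, w in unnormalized.items()}

# Hypothetical example: three possible settings with a uniform prior
prior = {"s1": Fraction(1, 3), "s2": Fraction(1, 3), "s3": Fraction(1, 3)}
# Hypothetical values of P(M=m | S=s) for the one observed m
likelihood = {"s1": Fraction(1, 10), "s2": Fraction(3, 10), "s3": Fraction(1, 10)}

post = posterior(prior, likelihood)
for s, p in post.items():
    print(s, p)
```

Note that we never had to know $P(M = m)$ in advance: it falls out as the sum of the unnormalized values.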

This lets us do many things. It lets us find the most likely values of the settings for our model, given our hard disc full of observed data. It also lets us find the probability that the settings lie within some set. This is important: if we’re facing the possibility of a climate disaster, we don’t just want to know the most likely outcome. We would like to know that with 95% probability, the outcome will lie in some range.
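For a finite set of settings, one simple way to get a set carrying at least 95% of the probability is to add up the most probable settings first. The sketch below does this. The posterior values are invented for illustration, and picking the most probable settings first is just one reasonable choice among several, so the result need not be an interval.

```python
def credible_set(posterior, level=0.95):
    """Smallest set of settings whose total probability is at least `level`."""
    # Visit settings from most probable to least probable
    ranked = sorted(posterior.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for s, p in ranked:
        if total >= level:
            break
        chosen.append(s)
        total += p
    return chosen, total

# Hypothetical posterior P(S=s | M=m) over five possible values of s
posterior = {1.8: 0.01, 1.9: 0.25, 2.0: 0.50, 2.1: 0.22, 2.2: 0.02}

chosen, total = credible_set(posterior)
print(sorted(chosen), total)
```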

### An example

Let us look at an example much simpler than that considered by Keller and Urban. Suppose our measurements are real numbers $m_0,\dots, m_T$ related by

$m_{t+1} = s m_t - m_{t-1} + N_t$

Here $s,$ a real constant, is our ‘setting’, while $N_t$ is some ‘noise’: an independent Gaussian random variable for each time $t,$ each with mean zero and some fixed standard deviation. Then the measurements $m_t$ will have roughly sinusoidal behavior but with irregularity added by the noise at each time step, as illustrated in Figure 2.

2. The example system: red are predicted measurements for a given value of the settings, green is another simulation for the same $s$ value and blue is a simulation for a slightly different $s.$

Note how there is no clear signal from either the curves or the differences that the green curve is at the correct setting value while the blue one has the wrong one: the noise makes it nontrivial to estimate $s.$ This is a baby version of the problem faced by Keller and Urban.
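To play with this example yourself, here is a minimal Python sketch of the recurrence $m_{t+1} = s m_t - m_{t-1} + N_t.$ The starting values $m_0 = 0,$ $m_1 = 1,$ the noise standard deviation of 0.1 and the two values of $s$ are our assumptions for illustration, not the values used to draw the figure. It may help to know that without noise, for $|s| < 2,$ the solutions are of the form $\cos(\omega t + \phi)$ with $2 \cos \omega = s.$

```python
import random

def simulate(s, T, m0=0.0, m1=1.0, sigma=0.1, seed=None):
    """Generate m_0, ..., m_T from m_{t+1} = s*m_t - m_{t-1} + N_t."""
    rng = random.Random(seed)
    m = [m0, m1]
    for t in range(1, T):
        noise = rng.gauss(0.0, sigma)  # N_t: mean zero, standard deviation sigma
        m.append(s * m[t] - m[t - 1] + noise)
    return m

# Two runs with the same setting but different noise,
# and one run with a slightly different setting
run_a = simulate(1.9, 100, seed=1)
run_b = simulate(1.9, 100, seed=2)
run_c = simulate(1.85, 100, seed=3)
print(run_a[:5])
```

Plotting the three runs against $t$ should give curves of the general kind described above, where it is hard to tell by eye which runs share the same $s.$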

### Markov Chain Monte Carlo

Having glibly said that we can compute the conditional probability $P(M=m | S=s),$ how do we actually do this? The simplest way would be to run our model many, many times with the settings set at $S=s$ and determine the fraction of times it predicts measurements equal to $m.$ This gives us an estimate of $P(M=m | S=s).$ Then we can use Bayes’ rule to work out $P(S=s|M=m),$ at least up to a constant factor.

Doing all this by hand would be incredibly time consuming and error prone, so computers are used for this task. In our example, we do this in Figure 3. As we keep running our model over and over, the curve showing $P(M=m |S=s)$ as a function of $s$ settles down to the right answer.

3. The estimates of $P(M=m | S=s)$ as a function of $s$ using uniform sampling, ending up with 480 samples at each point.

However, this is computationally inefficient, as shown in the probability distribution for small numbers of samples. This has quite a few ‘kinks’, which only disappear later. The problem is that there are lots of possible choices of $s$ to try. And this is for a very simple model!

When dealing with the 18 settings involved in the model of Keller and Urban, trying every combination would take far too long. A way to avoid this is Markov Chain Monte Carlo sampling. Monte Carlo is famous for its casinos, so a ‘Monte Carlo’ algorithm is one that uses randomness. A ‘Markov chain’ is a random walk: for example, where you repeatedly flip a coin and take one step right when you get heads, and one step left when you get tails. So, in Markov Chain Monte Carlo, we perform a random walk through the collection of all possible settings, collecting samples.

The key to making this work is that at each step on the walk a proposed modification $s'$ to the current settings $s$ is generated randomly—but it may be rejected if it does not seem to improve the estimates. The essence of the rule is:

The modification $s \mapsto s'$ is randomly accepted with a probability equal to the ratio

$\displaystyle{ \frac{P(M=m | S=s')}{ P(M=m | S=s)} }$

Otherwise the walk stays at the current position.

If the modification is better, so that the ratio is greater than 1, the new state is always accepted. With some additional tricks—such as discarding the very beginning of the walk—this gives a set of samples which can be used to compute $P(M=m | S=s).$ Then we can compute $P(S = s | M = m)$ using Bayes’ rule.
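Here is a minimal sketch of this acceptance rule in Python, applied to the simple recurrence $m_{t+1} = s m_t - m_{t-1} + N_t$ from our example. Several things here are our assumptions and not taken from Keller and Urban. We treat all values of $s$ as equally likely beforehand, so the ratio of likelihoods equals the ratio of the probabilities we want. We propose each $s'$ by adding a small Gaussian step to $s.$ And instead of estimating $P(M=m|S=s)$ by repeated simulation, we use the fact that the noise is Gaussian with known standard deviation to write down its logarithm directly, up to a constant. The numbers (true setting 1.9, noise 0.1, step size 0.02, 500 discarded steps) are arbitrary.

```python
import math
import random

def simulate(s, T, sigma, rng):
    """Generate m_0, ..., m_T from m_{t+1} = s*m_t - m_{t-1} + N_t."""
    m = [0.0, 1.0]
    for t in range(1, T):
        m.append(s * m[t] - m[t - 1] + rng.gauss(0.0, sigma))
    return m

def log_likelihood(s, m, sigma):
    """log P(M=m | S=s) up to an additive constant, assuming Gaussian noise."""
    total = 0.0
    for t in range(1, len(m) - 1):
        residual = m[t + 1] - s * m[t] + m[t - 1]  # the noise N_t implied by s
        total -= residual ** 2 / (2 * sigma ** 2)
    return total

def metropolis(m, sigma, s_start, steps, step_size=0.02, burn_in=500, seed=0):
    """Random walk over s, accepting s -> s' with probability min(1, ratio)."""
    rng = random.Random(seed)
    s = s_start
    current = log_likelihood(s, m, sigma)
    samples = []
    for i in range(steps):
        proposal = s + rng.gauss(0.0, step_size)
        candidate = log_likelihood(proposal, m, sigma)
        # exp(candidate - current) is the ratio P(M=m|S=s') / P(M=m|S=s)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            s, current = proposal, candidate
        if i >= burn_in:  # discard the very beginning of the walk
            samples.append(s)
    return samples

data = simulate(1.9, 200, 0.1, random.Random(1))
samples = metropolis(data, 0.1, s_start=1.5, steps=5000)
estimate = sum(samples) / len(samples)
print(estimate)
```

A histogram of `samples` then approximates $P(S = s | M = m)$ under these assumptions, and the walk should spend nearly all its time close to the true setting.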

Figure 4 shows the results of using the Markov Chain Monte Carlo procedure to figure out $P(S= s| M= m)$ in our example.

4. The estimates of $P(S = s|M = m)$ curves using Markov Chain Monte Carlo, showing the current distribution estimate at increasing intervals. The red line shows the current position of the random walk. Again the kinks are almost gone in the final distribution.

Note that the final distribution has only performed about 66 thousand simulations in total, while the full sampling performed over 1.5 million. The key advantage of Markov Chain Monte Carlo is that it avoids performing many simulations in areas where the probability is low, as we can see from the way the walk path remains under the big peak in the probability density almost all the time. What is more impressive is that it achieves this without any global view of the probability density, just by looking at how $P(M=m | S=s)$ changes when we make small changes in the settings. This becomes even more important as we move to dealing with systems with many more dimensions and settings, where it proves very effective at finding regions of high probability density whatever their shape.

Why is it worth doing so much work to estimate the probability distribution for settings for a climate model? One reason is that we can then estimate probabilities of future events, such as the collapse of the Atlantic Meridional Overturning Circulation. And what’s the answer? According to Keller and Urban’s calculation, this current will likely weaken by about a fifth in the 21st century, but a complete collapse is unlikely before 2300. This claim needs to be checked in many ways—for example, using more detailed models. But the importance of the issue is clear, and we hope we have made the importance of good mathematical ideas for climate science clear as well.

### Exploring the topic

The Azimuth Project is a group of scientists, engineers and computer programmers interested in questions like this [A]. If you have questions, or want to help out, just email us. Versions of the computer programs we used in this paper will be made available here in a while.

Here are some projects you can try, perhaps with the help of Kruschke’s textbook [K]:

• There are other ways to do setting estimation using time series: compare some to MCMC in terms of accuracy and robustness.

• We’ve seen a 1-dimensional system with one setting. Simulate some multi-dimensional and multi-setting systems. What new issues arise?

Acknowledgements. We thank Nathan Urban and other members of the Azimuth Project for many helpful discussions.

### References

[A] Azimuth Project, http://www.azimuthproject.org.

[KU] Klaus Keller and Nathan Urban, Probabilistic hindcasts and projections of the coupled climate, carbon cycle and Atlantic meridional overturning circulation system: a Bayesian fusion of century-scale measurements with a simple model, Tellus A 62 (2010), 737–750. Also available free online.

[K] John K. Kruschke, Doing Bayesian Data Analysis: A Tutorial with R and BUGS, Academic Press, New York, 2010.

[Y] Eliezer S. Yudkowsky, An intuitive explanation of Bayes’ theorem.

## Probability Theory and the Undefinability of Truth

31 March, 2013

In 1936 Tarski proved a fundamental theorem of logic: the undefinability of truth. Roughly speaking, this says there’s no consistent way to extend arithmetic so that it talks about ‘truth’ for statements about arithmetic. Why not? Because if we could, we could cook up a statement that says “I am not true.” This would lead to a contradiction, the Liar Paradox: if this sentence is true then it’s not, and if it’s not then it is.

This is why the concept of ‘truth’ plays a limited role in most modern work on logic… surprising as that might seem to novices!

However, suppose we relax a bit and allow probability theory into our study of arithmetic. Could there be a consistent way to say, within arithmetic, that a statement about arithmetic has a certain probability of being true?

We can’t let ourselves say a statement has a 100% probability of being true, or a 0% probability of being true, or we’ll get in trouble with the undefinability of truth. But suppose we only let ourselves say that a statement has some probability greater than $a$ and less than $b$, where $0 < a < b < 1.$ Is that okay?

Yes it is, according to this draft of a paper:

• Paul Christiano, Eliezer Yudkowsky, Marcello Herreshoff and Mihaly Barasz, Definability of “Truth” in Probabilistic Logic (Early draft), 28 March 2013.

But there’s a catch, or two. First there are many self-consistent ways to assess the probability of truth of arithmetic statements. This suggests that the probability is somewhat ‘subjective’. But that’s fine if you think probabilities are inherently subjective—for example, if you’re a subjective Bayesian.

A bit more problematic is this: their proof that there exists a self-consistent way to assess probabilities is not constructive. In other words, you can’t use it to actually get your hands on a consistent assessment.

Fans of game theory will be amused to hear why: the proof uses Kakutani’s fixed point theorem! This is the result that John Nash used to prove games have equilibrium solutions, where nobody can improve their expected payoff by changing their strategy. And this result is not constructive.

In game theory, we use Kakutani’s fixed point theorem by letting each player update their strategy, improving it based on everyone else’s, and showing this process has a fixed point. In probabilistic logic, the process is instead that the thinker reflects on what they know, and updates their assessment of probabilities.

### The statement

I have not yet carefully checked the proof of Barasz, Christiano, Herreshoff and Yudkowsky’s result. Some details have changed in the draft since I last checked, so it’s probably premature to become very nitpicky. But just to encourage technical discussions of this subject, let me try stating the result a bit more precisely. If you don’t know Tarski’s theorem, go here:

Tarski’s undefinability theorem, Wikipedia.

I’ll assume you know that and are ready for the new stuff!

The context of this work is first-order logic. So, consider any language $L$ in first-order logic that lets us talk about natural numbers and also rational numbers. Let $L'$ be the language $L$ with an additional function symbol $\mathbb{P}$ thrown in. We require that $\mathbb{P}(n)$ be a rational number whenever $n$ is a natural number. We want $\mathbb{P}(n)$ to stand for the probability of the truth of the sentence whose Gödel number is $n.$ This will give a system that can reflect about probability that what it’s saying is true.

So, suppose $T$ is some theory in the language $L'.$ How can we say that the probability function $\mathbb{P}$ has ‘reasonable opinions’ about truth, assuming that the axioms of $T$ are true?

The authors have a nice way of answering this. First they consider any function $P$ assigning a probability to each sentence of $L'.$ They say that $P$ is coherent if there is a probability measure on the set of models of $L'$ such that $P(\phi)$ is the measure of the set of models in which $\phi$ is satisfied. They show that $P$ is coherent iff these three conditions hold:

1) $P(\phi) = P(\phi \wedge \psi) + P(\phi \wedge \lnot \psi)$ for all sentences $\phi, \psi.$

2) $P(\phi) = 1$ for each tautology.

3) $P(\phi) = 0$ for each contradiction.

(By the way, it seems to me that 1) and 2) imply $P(\phi) + P(\lnot \phi) = 1$ and thus 3). So either they’re giving a slightly redundant list of conditions because they feel in the mood for it, or they didn’t notice this list was redundant, or it’s not and I’m confused. It’s good to always say a list of conditions is redundant if you know it is. You may be trying to help your readers a bit, and it may seem obvious to you, but if you don’t come out and admit the redundancy, you’ll make some of your readers doubt their sanity.)

(Also by the way, they don’t say how they’re making the set of all models into a measurable space. But I bet they’re using the σ-algebra where all subsets are measurable, and I think there’s no problem with the fact that this set is very large: a proper class, I guess! If you believe in the axiom of universes, you can just restrict attention to ‘small’ models… and your probability measure will be supported on a countable set of models, since an uncountable sum of positive numbers always diverges, so the largeness of the set of these models is largely irrelevant.)

So, let’s demand that $P$ be coherent. And let’s demand that $P(\phi) = 1$ whenever the sentence $\phi$ is one of the axioms of $T.$

At this point, we’ve got this thing $P$ that assigns a probability to each sentence in our language. We’ve also got this thing $\mathbb{P}$ in our language, such that $\mathbb{P}(n)$ is trying to be the probability of the truth of the sentence whose Gödel number is $n.$ But so far these two things aren’t connected.

To connect them, they demand a reflection principle: for any sentence $\phi$ and any rational numbers $0 < a < b < 1,$

$a < P(\phi) < b \implies P(a < \mathbb{P}(\ulcorner \phi \urcorner) < b) = 1$

Here $\ulcorner \phi \urcorner$ is the Gödel number of the sentence $\phi.$ So, this principle says that if a sentence has some approximate probability of being true, the thinker—as described by $\mathbb{P}$—knows this. They can’t know precise probabilities, or we’ll get in trouble. Also, making the reflection principle into an if and only if statement:

$a < P(\phi) < b \iff P(a < \mathbb{P}(\ulcorner \phi \urcorner) < b) = 1$

is too strong. It leads to contradictions, very much as in Tarski’s original theorem on the undefinability of truth! However, in the latest draft of the paper, the authors seem to have added a weak version of the converse to their formulation of the reflection principle.

Anyway, the main theorem they’re claiming is this:

Theorem (Barasz, Christiano, Herreshoff and Yudkowsky). There exists a function $P$ assigning a probability to each sentence of $L',$ such that

1) $P$ is coherent,

2) $P(\phi) = 1$ whenever the sentence $\phi$ is one of the axioms of $T,$

and

3) the reflection principle holds. 

## Five Books About Our Future

16 May, 2012

Jordan Peacock has suggested interviewing me for Five Books, a website where people talk about five books they’ve read.

It’s probably going against the point of this site to read books especially for the purpose of getting interviewed about them. But I like the idea of talking about books that paint different visions of our future, and the issues we face. And I may need to read some more to carry out this plan.

So: what are your favorite books on this subject?

I’d like to pick books with different visions, preferably focused on the relatively near-term future, and preferably somewhat plausible—though I don’t expect every book to seem convincing to all reasonable people.

Here are some options that leap to mind.

### Whole Earth Discipline

• Stewart Brand, Whole Earth Discipline: An Ecopragmatist Manifesto, Viking Penguin, 2009.

I’ve been meaning to write about this one for a long time! Brand argues that changes in this century will be dominated by global warming, urbanization and biotechnology. He advocates new thinking on topics that traditional environmentalists have rather set negative opinions about, like nuclear power, genetic engineering, and the advantages of urban life. This is on my list for sure.

### Limits to Growth

• Donella Meadows, Jørgen Randers, and Dennis Meadows, Limits to Growth: The 30-Year Update, Chelsea Green Publishing Company, 2004.

Sad to say, I’ve never read the original 1972 book The Limits to Growth—or the 1974 edition which among other things presented a simple computer model of world population, industrialization, pollution, food production and resource depletion. Both the book and the model (called World3) have been much criticized over the years. But recently some have argued its projections—which were intended to illustrate ideas, not predict the future—are not doing so badly:

• Graham Turner, A comparison of The Limits to Growth with thirty years of reality, Commonwealth Scientific and Industrial Research Organisation (CSIRO).

It would be interesting to delve into this highly controversial topic. By the way, the model is now available online:

• Brian Hayes, Limits to Growth.

with an engaging explanation here:

• Brian Hayes, World3, the public beta, Bit-Player: An Amateur’s Look at Computation and Mathematics, 15 April 2012.

It runs on your web-browser, and it’s easy to take a copy for yourself and play around with it.

### The Ecotechnic Future

John Michael Greer believes that ‘peak oil’—or more precisely, the slow decline of fossil fuel production—will spell the end to our modern technological civilization. He spells this out here:

• John Michael Greer, The Long Descent, New Society Publishers, 2008.

I haven’t read this book, but I’ve read the sequel, which begins to imagine what comes afterwards:

• John Michael Greer, The Ecotechnic Future, New Society Publishers, 2009.

Here he argues that in the next century or three we will go through a transition through ‘scarcity economies’ to ‘salvage economies’ to sustainable economies that use much less energy than we do now.

Both these books seem to outrage everyone who envisages our future as a story of technological progress continuing more or less along the lines we’ve already staked out.

### The Singularity is Near

In the opposite direction, we have:

• Ray Kurzweil, The Singularity is Near, Penguin Books, 2005.

I’ve only read bits of this. According to Wikipedia, the main premises of the book are:

• A technological-evolutionary point known as “the singularity” exists as an achievable goal for humanity. (What exactly does Kurzweil mean by “the singularity”? I think I know what other people, like Vernor Vinge and Eliezer Yudkowsky, mean by it. But what does he mean?)

• Through a law of accelerating returns, technology is progressing toward the singularity at an exponential rate. (What in the world does it mean to progress toward a singularity at an exponential rate? I know that Kurzweil provides evidence that lots of things are growing exponentially… but if they keep doing that, that’s not what I’d call a singularity.)

• The functionality of the human brain is quantifiable in terms of technology that we can build in the near future.

• Medical advances make it possible for a significant number of Kurzweil’s generation (Baby Boomers) to live long enough for the exponential growth of technology to intersect and surpass the processing of the human brain.

If you think you know a better book that advocates a roughly similar thesis, let me know.

### A Prosperous Way Down

• Howard T. Odum and Elisabeth C. Odum, A Prosperous Way Down: Principles and Policies, Columbia University Press, 2001.

Howard T. Odum is the father of ‘systems ecology’, and developed an interesting graphical language for describing energy flows in ecosystems. According to George Mobus:

In this book he and Elisabeth take on the situation regarding social ecology under the conditions of diminishing energy flows. Taking principles from systems ecology involving systems suffering from the decline of energy (e.g. deciduous forests in fall), showing how such systems have adapted or respond to those conditions, they have applied these to the human social system. The Odums argued that if we humans were wise enough to apply these principles through policy decisions to ourselves, we might find similar ways to adapt with much less suffering than is potentially implied by sudden and drastic social collapse.

This seems to be a more scholarly approach to some of the same issues:

• Howard T. Odum, Environment, Power, and Society for the Twenty-First Century: The Hierarchy of Energy, Columbia U. Press, 2007.

### More?

There are plenty of other candidates I know less about. These two seem to be free online:

• Lester Brown, World on the Edge: How to Prevent Environmental and Economic Collapse, W. W. Norton & Company, 2011.

• Richard Heinberg, The End of Growth: Adapting to Our New Economic Reality, New Society Publishers, 2011.

I would really like even more choices—especially books by thoughtful people who do think we can solve the problems confronting us… but do not think all problems will automatically be solved by human ingenuity and leave it to the rest of us to work out the, umm, details.

## What To Do? (Part 1)

24 April, 2011

In a comment on my last interview with Yudkowsky, Eric Jordan wrote:

John, it would be great if you could follow up at some point with your thoughts and responses to what Eliezer said here. He’s got a pretty firm view that environmentalism would be a waste of your talents, and it’s obvious where he’d like to see you turn your thoughts instead. I’m especially curious to hear what you think of his argument that there are already millions of bright people working for the environment, so your personal contribution wouldn’t be as important as it would be in a less crowded field.

Indeed, the reason I quit work on my previous area of interest—categorification and higher gauge theory—was the feeling that more and more people were moving into it. When I started, it seemed like a lonely but exciting quest. By now there are plenty of conferences on it, attended by plenty of people. It would be a full-time job just keeping up, much less doing something truly new. That made me feel inadequate—and worse, unnecessary. Helping start a snowball roll downhill is fun… but what’s the point in chasing one that’s already rolling?

The people working in this field include former grad students of mine and other youngsters I helped turn on to the subject. At first this made me a bit frustrated. It’s as if I engineered my own obsolescence. If only I’d spent less time explaining things, and more time proving theorems, maybe I could have stayed at the forefront!

But by now I’ve learned to see the bright side: it means I’m free to do other things. As I get older, I’m becoming ever more conscious of my limited lifespan and the vast number of things I’d like to try.

But what to do?

This is a big question. It’s a bit self-indulgent to discuss it publicly… or maybe not. It is, after all, a question we all face. I’ll talk about me, because I’m not up to tackling this question in its universal abstract form. But it could be you asking this, too.

For me this question was brought into sharp focus when I got a research position where I was allowed—nay, downright encouraged!—to follow my heart and work on what I consider truly important. In the ordinary course of life we often feel too caught up in the flow of things to do more than make small course corrections. Suddenly I was given a burst of freedom. What to do with it?

In my earlier work, I’d always taken the attitude that I should tackle whatever questions seemed most beautiful and profound… subject to the constraint that I had a good chance of making some progress on them. I realized that this attitude assumes other people will do most of the ‘dirty work’, whatever that may be. But I figured I could get away with it. I figured that if I were ever called to account—by my own conscience, say—I could point to the fact that I’d worked hard to understand the universe and also spent a lot of time teaching people, both in my job and in my spare time. Surely that counts for something?

I had, however, for decades been observing the slow-motion train wreck that our civilization seems to be engaged in. Global warming, ocean acidification and habitat loss may be combining to cause a mass extinction event, and perhaps—in conjunction with resource depletion—a serious setback to human civilization. Now is not the time to go over all the evidence: suffice it to say that I think we may be heading for serious trouble.

It’s hard to know just how much trouble. If it were just routine ‘misery as usual’, I’ll admit I’d be happy to sit back and let everyone else deal with these problems. But the more I study them, the more that seems untenable… especially since so many people are doing just that: sitting back and letting everyone else deal with them.

I’m not sure this complex of problems rises to the level of an ‘existential risk’—which Nick Bostrom defines as one where an adverse outcome would either annihilate intelligent life originating on Earth or permanently and drastically curtail its potential. But I see scenarios where we clobber ourselves quite seriously. They don’t even seem unlikely, and they don’t seem very far-off, and I don’t see people effectively rising to the occasion. So, just as I’d move to put out a fire if I saw smoke coming out of the kitchen and everyone else was too busy watching TV to notice, I feel I have to do something.

But the question remains: what to do?

I honestly don’t see how a rationalist can avoid this conclusion: At this absolutely critical hinge in the history of the universe—Earth in the 21st century—rational altruists should devote their marginal attentions to risks that threaten to terminate intelligent life or permanently destroy a part of its potential. Those problems, which Nick Bostrom named ‘existential risks’, have got all the scope. And when it comes to marginal impact, there are major risks outstanding that practically no one is working on. Once you get the stakes on a gut level it’s hard to see how doing anything else could be sane.

So how do you go about protecting the future of intelligent life? Environmentalism? After all, there are environmental catastrophes that could knock over our civilization… but then if you want to put the whole universe at stake, it’s not enough for one civilization to topple, you have to argue that our civilization is above average in its chances of building a positive galactic future compared to whatever civilization would rise again a century or two later. Maybe if there were ten people working on environmentalism and millions of people working on Friendly AI, I could see sending the next marginal dollar to environmentalism. But with millions of people working on environmentalism, and major existential risks that are completely ignored… if you add a marginal resource that can, rarely, be steered by expected utilities instead of warm glows, devoting that resource to environmentalism does not make sense.

Similarly with other short-term problems. Unless they’re little-known and unpopular problems, the marginal impact is not going to make sense, because millions of other people will already be working on them. And even if you argue that some short-term problem leverages existential risk, it’s not going to be perfect leverage and some quantitative discount will apply, probably a large one. I would be suspicious that the decision to work on a short-term problem was driven by warm glow, status drives, or simple conventionalism.

With that said, there’s also such a thing as comparative advantage—the old puzzle of the lawyer who works an hour in the soup clinic instead of working an extra hour as a lawyer and donating the money. Personally I’d say you can work an hour in the soup clinic to keep yourself going if you like, but you should also be working extra lawyer-hours and donating the money to the soup clinic, or better yet, to something with more scope. (See “Purchase Fuzzies and Utilons Separately” on Less Wrong.) Most people can’t work effectively on Artificial Intelligence (some would question if anyone can, but at the very least it’s not an easy problem). But there’s a variety of existential risks to choose from, plus a general background job of spreading sufficiently high-grade rationality and existential risk awareness. One really should look over those before going into something short-term and conventional. Unless your master plan is just to work the extra hours and donate them to the cause with the highest marginal expected utility per dollar, which is perfectly respectable.

Where should you go in life? I don’t know exactly, but I think I’ll go ahead and say “not environmentalism”. There’s just no way that the product of scope, marginal impact, and John Baez’s comparative advantage is going to end up being maximal at that point.

When I heard this, one of my first reactions was: “Of course I don’t want to do anything ‘conventional’, something that ‘millions of people’ are already doing”. After all, my sense of being just another guy in the crowd was a big factor in leaving work on categorification and higher gauge theory—and most people have never even heard of those subjects!

I think so far the Azimuth Project is proceeding in a sufficiently unconventional way that while it may fall flat on its face, it’s at least trying something new. Though I always want more people to join in, we’ve already got some good projects going that take advantage of my ‘comparative advantage’: the ability to do math and explain stuff.

The most visible here is the network theory project, which is a step towards the kind of math I think we need to understand a wide variety of complex systems. I’ve been putting most of my energy into that lately, and coming up with ideas faster than I can explain them. On top of that, Eric Forgy, Tim van Beek, Staffan Liljegren, Matt Reece, David Tweed and others have other interesting projects cooking behind the scenes on the Azimuth Forum. I’ll be talking about those soon, too.

I don’t feel satisfied, though. I’m happy enough—that’s never a problem these days—but once you start trying to do things to help the world, instead of just have fun, it’s very tricky to determine the best way to proceed.

One can, of course, easily fool oneself into thinking one knows.

## This Week’s Finds (Week 313)

25 March, 2011

Here’s the third and final part of my interview with Eliezer Yudkowsky. We’ll talk about three big questions… roughly these:

• How do you get people to work on potentially risky projects in a safe way?

• Do we understand ethics well enough to build “Friendly artificial intelligence”?

• What’s better to work on, artificial intelligence or environmental issues?

JB: There are decent Wikipedia articles on “optimism bias” and “positive illusions”, which suggest that unrealistically optimistic people are more energetic, while more realistic estimates of success go hand-in-hand with mild depression. If this is true, I can easily imagine that most people working on challenging projects like quantum gravity (me, 10 years ago) or artificial intelligence (you) are unrealistically optimistic about our chances of success.

Indeed, I can easily imagine that the first researchers to create a truly powerful artificial intelligence will be people who underestimate its potential dangers. It’s an interesting irony, isn’t it? If most people who are naturally cautious avoid a certain potentially dangerous line of research, the people who pursue that line of research are likely to be less cautious than average.

I’m a bit worried about this when it comes to “geoengineering”, for example—attempts to tackle global warming by large engineering projects. We have people who say “oh no, that’s too dangerous”, and turn their attention to approaches they consider less risky, but that may leave the field to people who underestimate the risks.

So I’m very glad you are thinking hard about how to avoid the potential dangers of artificial intelligence—and even trying to make this problem sound exciting, to attract ambitious and energetic young people to work on it. Is that part of your explicit goal? To make caution and rationality sound sexy?

EY: The really hard part of the problem isn’t getting a few smart people to work on cautious, rational AI. It’s admittedly a harder problem than it should be, because there’s a whole system out there which is set up to funnel smart young people into all sorts of other things besides cautious rational long-term basic AI research. But it isn’t the really hard part of the problem.

The scary thing about AI is that I would guess that the first AI to go over some critical threshold of self-improvement takes all the marbles—first mover advantage, winner take all. The first pile of uranium to have an effective neutron multiplication factor greater than 1, or maybe the first AI smart enough to absorb all the poorly defended processing power on the Internet—there’s actually a number of different thresholds that could provide a critical first-mover advantage.
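The uranium analogy rests on a sharp threshold. Here is a minimal sketch, assuming an idealized chain reaction in which each generation is simply k times the size of the previous one. The function name and the sample values of k are illustrative assumptions, not a model of any real reactor or AI system:

```python
def population_after(generations, k, start=1.0):
    """Size of an idealized chain reaction after some generations,
    where each generation is k times the size of the one before."""
    return start * k ** generations

# Just below, at, and just above the critical value k = 1.
for k in (0.9, 1.0, 1.1):
    print(f"k = {k}: size after 100 generations = {population_after(100, k):.3g}")
```

A multiplication factor slightly below 1 dies away, while one slightly above 1 grows without limit, which is why crossing the threshold first matters so much in this picture.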

And it is always going to be fundamentally easier in some sense to go straight all out for AI and not worry about clean designs or stable self-modification or the problem where a near-miss on the value system destroys almost all of the actual value from our perspective. (E.g., imagine aliens who shared every single term in the human utility function but lacked our notion of boredom. Their civilization might consist of a single peak experience repeated over and over, which would make their civilization very boring from our perspective, compared to what it might have been. That is, leaving a single aspect out of the value system can destroy almost all of the value. So there’s a very large gap in the AI problem between trying to get the value system exactly right, versus throwing something at it that sounds vaguely good.)

You want to keep as much of an advantage as possible for the cautious rational AI developers over the crowd that is just gung-ho to solve this super interesting scientific problem and go down in the eternal books of fame. Now there should in fact be some upper bound on the combination of intelligence, methodological rationality, and deep understanding of the problem which you can possess, and still walk directly into the whirling helicopter blades. The problem is that it is probably a rather high upper bound. And you are trying to outrace people who are trying to solve a fundamentally easier wrong problem. So the question is not attracting people to the field in general, but rather getting the really smart competent people to either work for a cautious project or not go into the field at all. You aren’t going to stop people from trying to develop AI. But you can hope to have as many of the really smart people as possible working on cautious projects rather than incautious ones.

So yes, making caution look sexy. But even more than that, trying to make incautious AI projects look merely stupid. Not dangerous. Dangerous is sexy. As the old proverb goes, most of the damage is done by people who wish to feel themselves important. Human psychology seems to be such that many ambitious people find it far less scary to think about destroying the world, than to think about never amounting to much of anything at all. I have met people like this. In fact all the people I have met who think they are going to win eternal fame through their AI projects have been like this. The thought of potentially destroying the world is bearable; it confirms their own importance. The thought of not being able to plow full steam ahead on their incredible amazing AI idea is not bearable; it threatens all their fantasies of wealth and fame.

Now these people of whom I speak are not top-notch minds, not in the class of the top people in mainstream AI, like say Peter Norvig (to name someone I’ve had the honor of meeting personally). And it’s possible that if and when self-improving AI starts to get real top-notch minds working on it, rather than people who were too optimistic about/attached to their amazing bright idea to be scared away by the field of skulls, then these real stars will not fall prey to the same sort of psychological trap. And then again it is also plausible to me that top-notch minds will fall prey to exactly the same trap, because I have yet to learn from reading history that great scientific geniuses are always sane.

So what I would most like to see would be uniform looks of condescending scorn directed at people who claimed their amazing bright AI idea was going to lead to self-improvement and superintelligence, but who couldn’t mount an adequate defense of how their design would have a goal system stable after a billion sequential self-modifications, or how it would get the value system exactly right instead of mostly right. In other words, making destroying the world look unprestigious and low-status, instead of leaving it to the default state of sexiness and importance-confirmingness.

JB: “Get the value system exactly right”—now this phrase touches on another issue I’ve been wanting to talk about. How do we know what it means for a value system to be exactly right? It seems people are even further from agreeing on what it means to be good than on what it means to be rational. Yet you seem to be suggesting we need to solve this problem before it’s safe to build a self-improving artificial intelligence!

When I was younger I worried a lot about the foundations of ethics. I decided that you “can’t derive an ought from an is”—do you believe that? If so, all logical arguments leading up to the conclusion that “you should do X” must involve an assumption of the form “you should do Y”… and attempts to “derive” ethics are all implicitly circular in some way. This really bothered the heck out of me: how was I supposed to know what to do? But of course I kept on doing things while I was worrying about this… and indeed, it was painfully clear that there’s no way out of making decisions: even deciding to “do nothing” or commit suicide counts as a decision.

Later I got more comfortable with the idea that making decisions about what to do needn’t paralyze me any more than making decisions about what is true. But still, it seems that the business of designing ethical beings is going to provoke huge arguments, if and when we get around to that.

Do you spend as much time thinking about these issues as you do thinking about rationality? Of course they’re linked….

EY: Well, I probably spend as much time explaining these issues as I do rationality. There are also an absolutely huge number of pitfalls that people stumble into when they try to think about, as I would put it, Friendly AI. Consider how many pitfalls people run into when they try to think about Artificial Intelligence. Next consider how many pitfalls people run into when they try to think about morality. Next consider how many pitfalls philosophers run into when they try to think about the nature of morality. Next consider how many pitfalls people run into when they try to think about hypothetical extremely powerful agents, especially extremely powerful agents that are supposed to be extremely good. Next consider how many pitfalls people run into when they try to imagine optimal worlds to live in or optimal rules to follow or optimal governments and so on.

Now imagine a subject matter which offers discussants a lovely opportunity to run into all of those pitfalls at the same time.

That’s what happens when you try to talk about Friendly Artificial Intelligence.

And it only takes one error for a chain of reasoning to end up in Outer Mongolia. So one of the great motivating factors behind all the writing I did on rationality and all the sequences I wrote on Less Wrong was to actually make it possible, via two years’ worth of writing and probably something like a month’s worth of reading at least, to immunize people against all the usual mistakes.

Lest I appear to dodge the question entirely, I’ll try for very quick descriptions and google keywords that professional moral philosophers might recognize.

In terms of what I would advocate programming a very powerful AI to actually do, the keywords are “mature folk morality” and “reflective equilibrium”. This means that you build a sufficiently powerful AI to do, not what people say they want, or even what people actually want, but what people would decide they wanted the AI to do, if they had all of the AI’s information, could think about it for as long a subjective time as the AI, knew as much as the AI did about the real factors at work in their own psychology, and had no failures of self-control.

There’s a lot of important reasons why you would want to do exactly that and not, say, implement Asimov’s Three Laws of Robotics (a purely fictional device, and if Asimov had depicted them as working well, he would have had no stories to write) or building a superpowerful AI which obeys people’s commands interpreted in literal English, or creating a god whose sole prime directive is to make people maximally happy, or any of the above plus a list of six different patches which guarantee that nothing can possibly go wrong, and various other things that seem like incredibly obvious failure scenarios but which I assure you I have heard seriously advocated over and over and over again.

In a nutshell, you want to use concepts like “mature folk morality” or “reflective equilibrium” because these are as close as moral philosophy has ever gotten to defining in concrete, computable terms what you could be wrong about when you order an AI to do the wrong thing.

For an attempt at nontechnical explanation of what one might want to program an AI to do and why, the best resource I can offer is an old essay of mine which is not written so as to offer good google keywords, but holds up fairly well nonetheless:

• Eliezer Yudkowsky, Coherent extrapolated volition, May 2004.

You also raised some questions about metaethics, where metaethics asks not “Which acts are moral?” but “What is the subject matter of our talk about ‘morality’?” i.e. “What are we talking about here anyway?” In terms of Google keywords, my brand of metaethics is closest to analytic descriptivism or moral functionalism. If I were to try to put that into a very brief nutshell, it would be something like “When we talk about ‘morality’ or ‘goodness’ or ‘right’, the subject matter we’re talking about is a sort of gigantic math question hidden under the simple word ‘right’, a math question that includes all of our emotions and all of what we use to process moral arguments and all the things we might want to change about ourselves if we could see our own source code and know what we were really thinking.”

The complete Less Wrong sequence on metaethics (with many dependencies to earlier ones) is:

• Eliezer Yudkowsky, Metaethics sequence, Less Wrong, 20 June to 22 August 2008.

And one of the better quick summaries is at:

• Eliezer Yudkowsky, Inseparably right; or, joy in the merely good, Less Wrong, 9 August 2008.

And if I am wise I shall not say any more.

JB: I’ll help you be wise. There are a hundred followup questions I’m tempted to ask, but this has been a long and grueling interview, so I won’t. Instead, I’d like to raise one last big question. It’s about time scales.

Self-improving artificial intelligence seems like a real possibility to me. But when? You see, I believe we’re in the midst of a global ecological crisis—a mass extinction event, whose effects will be painfully evident by the end of the century. I want to do something about it. I can’t do much, but I want to do something. Even if we’re doomed to disaster, there are different sizes of disaster. And if we’re going through a kind of bottleneck, where some species make it through and others go extinct, even small actions now can make a difference.

I can imagine some technological optimists—singularitarians, extropians and the like—saying: “Don’t worry, things will get better. Things that seem hard now will only get easier. We’ll be able to suck carbon dioxide from the atmosphere using nanotechnology, and revive species starting from their DNA.” Or maybe even: “Don’t worry: we won’t miss those species. We’ll be having too much fun doing things we can’t even conceive of now.”

But various things make me skeptical of such optimism. One of them is the question of time scales. What if the world goes to hell before our technology saves us? What if artificial intelligence comes along too late to make a big impact on the short-term problems I’m worrying about? In that case, maybe I should focus on short-term solutions.

Just to be clear: this isn’t some veiled attack on your priorities. I’m just trying to decide on my own. One good thing about having billions of people on the planet is that we don’t all have to do the same thing. Indeed, a multi-pronged approach is best. But for my own decisions, I want some rough guess about how long various potentially revolutionary technologies will take to come online.

What do you think about all this?

EY: I’ll try to answer the question about timescales, but first let me explain in some detail why I don’t think the decision should be dominated by that question.

If you look up “Scope Insensitivity” on Less Wrong, you’ll see that when three different groups of subjects were asked how much they would pay in increased taxes to save 2,000 / 20,000 / 200,000 birds from drowning in uncovered oil ponds, the respective average answers were \$80 / \$78 / \$88. People asked questions like this visualize one bird, wings slicked with oil, struggling to escape, and that creates some amount of emotional affect which determines willingness to pay, and the quantity gets tossed out the window since no one can visualize 200,000 of anything. Another hypothesis to explain the data is “purchase of moral satisfaction”, which says that people give enough money to create a “warm glow” inside themselves, and the amount required might have something to do with your personal financial situation, but it has nothing to do with birds. Similarly, residents of four US states were only willing to pay 22% more to protect all 57 wilderness areas in those states than to protect one area. The result I found most horrifying was that subjects were willing to contribute more when a set amount of money was needed to save one child’s life, compared to the same amount of money saving eight lives—because, of course, focusing your attention on a single person makes the feelings stronger, less diffuse.
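The bird figures quoted above can be turned into an implied price per bird. This is a small sketch that uses only the three averages from that paragraph; the variable and function names are made up for illustration:

```python
# Average stated willingness to pay (US dollars) from the study quoted above,
# keyed by the number of birds each group was told would be saved.
willingness_to_pay = {2_000: 80, 20_000: 78, 200_000: 88}

def dollars_per_bird(payments):
    """Return the implied price per bird for each group."""
    return {birds: dollars / birds for birds, dollars in payments.items()}

per_bird = dollars_per_bird(willingness_to_pay)
for birds, price in per_bird.items():
    print(f"{birds:>7} birds: {price:.5f} dollars per bird")
```

The total barely moves while the number of birds grows a hundredfold, so the implied value of a single bird drops by roughly a factor of ninety between the smallest and largest groups.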

So while it may make sense to enjoy the warm glow of doing good deeds after we do them, we cannot possibly allow ourselves to choose between altruistic causes based on the relative amounts of warm glow they generate, because our intuitions are quantitatively insane.

And two antidotes that absolutely must be applied in choosing between altruistic causes are conscious appreciation of scope and conscious appreciation of marginal impact.

By its nature, your brain flushes right out the window the all-important distinction between saving one life and saving a million lives. You’ve got to compensate for that using conscious, verbal deliberation. The Society For Curing Rare Diseases in Cute Puppies has got great warm glow, but the fact that these diseases are rare should call a screeching halt right there—which you’re going to have to do consciously, not intuitively. Even before you realize that, contrary to the relative warm glows, it’s really hard to make a moral case for trading off human lives against cute puppies. I suppose if you could save a billion puppies using one dollar I wouldn’t scream at someone who wanted to spend the dollar on that instead of cancer research.

And similarly, if there are a hundred thousand researchers and billions of dollars annually that are already going into saving species from extinction—because it’s a prestigious and popular cause that has an easy time generating warm glow in lots of potential funders—then you have to ask about the marginal value of putting your effort there, where so many other people are already working, compared to a project that isn’t so popular.
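One way to picture marginal value is with a toy model of diminishing returns. The sketch below assumes, purely for illustration, that progress on a cause grows like the logarithm of the number of people working on it. The logarithmic form, the `scope` parameter and the figure of 10 workers for a neglected project are all assumptions; only the hundred thousand researchers comes from the text:

```python
import math

def marginal_value(workers, scope=1.0):
    """Extra progress from one more worker, assuming total progress is
    scope * log(1 + workers), a hypothetical diminishing-returns model."""
    # log(2 + n) - log(1 + n) written as log1p(1 / (1 + n)) for accuracy.
    return scope * math.log1p(1.0 / (1 + workers))

crowded = marginal_value(100_000)  # "a hundred thousand researchers"
neglected = marginal_value(10)     # invented figure for an unpopular project
print(f"crowded field:   {crowded:.3g}")
print(f"neglected field: {neglected:.3g}")
print(f"ratio:           {neglected / crowded:.0f}")
```

Under this assumed model, one more person counts for thousands of times more in the small field when the two causes have equal scope. A different returns curve, or unequal scope, would change the number.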

I wouldn’t say “Don’t worry, we won’t miss those species”. But consider the future intergalactic civilizations growing out of Earth-originating intelligent life. Consider the whole history of a universe which contains this world of Earth and this present century, and also billions of years of future intergalactic civilization continuing until the universe dies, or maybe forever if we can think of some ingenious way to carry on. Next consider the interval in utility between a universe-history in which Earth-originating intelligence survived and thrived and managed to save 95% of the non-primate biological species now alive, versus a universe-history in which only 80% of those species are alive. That utility interval is not very large compared to the utility interval between a universe in which intelligent life thrived and one in which intelligent life died out. Or the utility interval between a universe-history filled with sentient beings who experience happiness and have empathy for each other and get bored when they do the same thing too many times, versus a universe-history that grew out of various failures of Friendly AI.

(The really scary thing about universes that grow out of a loss of human value is not that they are different, but that they are, from our standpoint, boring. The human utility function says that once you’ve made a piece of art, it’s more fun to make a different piece of art next time. But that’s just us. Most random utility functions will yield instrumental strategies that spend some of their time and resources exploring for the patterns with the highest utility at the beginning of the problem, and then use the rest of their resources to implement the pattern with the highest utility, over and over and over. This sort of thing will surprise a human who expects, on some deep level, that all minds are made out of human parts, and who thinks, “Won’t the AI see that its utility function is boring?” But the AI is not a little spirit that looks over its code and decides whether to obey it; the AI is the code. If the code doesn’t say to get bored, it won’t get bored. A strategy of exploration followed by exploitation is implicit in most utility functions, but boredom is not. If your utility function does not already contain a term for boredom, then you don’t care; it’s not something that emerges as an instrumental value from most terminal values. For more on this see: “In Praise of Boredom” in the Fun Theory Sequence on Less Wrong.)
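The exploration-then-exploitation strategy described above can be sketched in a few lines. Everything here is hypothetical: the pattern names, the toy utility function and the step counts are chosen only to show what happens when the utility function has no term for boredom:

```python
def explore_then_exploit(utility, patterns, explore_steps, total_steps):
    """Try the first `explore_steps` patterns once each, then repeat
    the best one found for all remaining steps."""
    tried = patterns[:explore_steps]
    best = max(tried, key=utility)
    return tried + [best] * (total_steps - len(tried))

# Hypothetical utility function with no term for boredom: it scores a pattern
# the same way no matter how many times that pattern has already been made.
def utility(pattern):
    return len(pattern)

patterns = ["dot", "spiral", "line", "fractal", "grid"]
history = explore_then_exploit(utility, patterns, explore_steps=4, total_steps=10)
print(history)
```

After the short exploration phase the agent repeats its highest-scoring pattern for every remaining step, because nothing in `utility` penalizes repetition.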

Anyway: In terms of expected utility maximization, even large probabilities of jumping the interval between a universe-history in which 95% of existing biological species survive Earth’s 21st century, versus a universe-history where 80% of species survive, are just about impossible to trade off against tiny probabilities of jumping the interval between interesting universe-histories, versus boring ones where intelligent life goes extinct, or the wrong sort of AI self-improves.
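The shape of this expected-utility argument can be shown with invented numbers. Every figure below is an assumption made up for illustration; the interview gives no numerical values, and the conclusion depends entirely on the assumed ratio between the two utility intervals:

```python
def expected_gain(probability_shift, utility_interval):
    """Expected utility gained by shifting the probability of the
    better outcome by `probability_shift`."""
    return probability_shift * utility_interval

# All numbers below are invented for illustration only.
species_interval = 1.0       # 95% versus 80% of species surviving
existential_interval = 1e9   # interesting versus boring universe-history
large_shift = 0.5            # a "large probability" of helping with species
tiny_shift = 1e-6            # a "tiny probability" of helping with existential risk

species_gain = expected_gain(large_shift, species_interval)
existential_gain = expected_gain(tiny_shift, existential_interval)
print(f"species effort:     {species_gain}")
print(f"existential effort: {existential_gain}")
```

With these assumed values the tiny probability wins by a wide margin. Someone who thinks the two intervals are closer in size, or the tiny probability much tinier, would get the opposite ordering from the same calculation.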

I honestly don’t see how a rationalist can avoid this conclusion: At this absolutely critical hinge in the history of the universe—Earth in the 21st century—rational altruists should devote their marginal attentions to risks that threaten to terminate intelligent life or permanently destroy a part of its potential. Those problems, which Nick Bostrom named “existential risks”, have got all the scope. And when it comes to marginal impact, there are major risks outstanding that practically no one is working on. Once you get the stakes on a gut level it’s hard to see how doing anything else could be sane.

So how do you go about protecting the future of intelligent life? Environmentalism? After all, there are environmental catastrophes that could knock over our civilization… but then if you want to put the whole universe at stake, it’s not enough for one civilization to topple, you have to argue that our civilization is above average in its chances of building a positive galactic future compared to whatever civilization would rise again a century or two later. Maybe if there were ten people working on environmentalism and millions of people working on Friendly AI, I could see sending the next marginal dollar to environmentalism. But with millions of people working on environmentalism, and major existential risks that are completely ignored… if you add a marginal resource that can, rarely, be steered by expected utilities instead of warm glows, devoting that resource to environmentalism does not make sense.

Similarly with other short-term problems. Unless they’re little-known and unpopular problems, the marginal impact is not going to make sense, because millions of other people will already be working on them. And even if you argue that some short-term problem leverages existential risk, it’s not going to be perfect leverage and some quantitative discount will apply, probably a large one. I would be suspicious that the decision to work on a short-term problem was driven by warm glow, status drives, or simple conventionalism.

With that said, there’s also such a thing as comparative advantage—the old puzzle of the lawyer who works an hour in the soup clinic instead of working an extra hour as a lawyer and donating the money. Personally I’d say you can work an hour in the soup clinic to keep yourself going if you like, but you should also be working extra lawyer-hours and donating the money to the soup clinic, or better yet, to something with more scope. (See “Purchase Fuzzies and Utilons Separately” on Less Wrong.) Most people can’t work effectively on Artificial Intelligence (some would question if anyone can, but at the very least it’s not an easy problem). But there’s a variety of existential risks to choose from, plus a general background job of spreading sufficiently high-grade rationality and existential risk awareness. One really should look over those before going into something short-term and conventional. Unless your master plan is just to work the extra hours and donate them to the cause with the highest marginal expected utility per dollar, which is perfectly respectable.

Where should you go in life? I don’t know exactly, but I think I’ll go ahead and say “not environmentalism”. There’s just no way that the product of scope, marginal impact, and John Baez’s comparative advantage is going to end up being maximal at that point.

Which brings me to AI timescales.

If I knew exactly how to make a Friendly AI, and I knew exactly how many people I had available to do it, I still couldn’t tell you how long it would take because of Product Management Chaos.

As it stands, this is a basic research problem—which will always feel very hard, because we don’t understand it, and that means when our brain checks for solutions, we don’t see any solutions available. But this ignorance is not to be confused with the positive knowledge that the problem will take a long time to solve once we know how to solve it. It could be that some fundamental breakthrough will dissolve our confusion and then things will look relatively easy. Or it could be that some fundamental breakthrough will be followed by the realization that, now that we know what to do, it’s going to take at least another 20 years to do it.

I seriously have no idea when AI is going to show up, although I’d be genuinely and deeply shocked if it took another century (barring a collapse of civilization in the meanwhile).

If you were to tell me that as a Bayesian I have to put probability distributions on things on pain of having my behavior be inconsistent and inefficient, well, I would actually suspect that my behavior is inconsistent. But if you were to try and induce from my behavior a median expected time where I spend half my effort planning for less and half my effort planning for more, it would probably look something like 2030.

But that doesn’t really matter to my decisions. Among all existential risks I know about, Friendly AI has the single largest absolute scope—it affects everything, and the problem must be solved at some point for worthwhile intelligence to thrive. It also has the largest product of scope and marginal impact, because practically no one is working on it, even compared to other existential risks. And my abilities seem applicable to it. So I may not like my uncertainty about timescales, but my decisions are not unstable with respect to that uncertainty.

JB: Ably argued! If I think of an interesting reply, I’ll put it in the blog discussion. Thanks for your time.

The best way to predict the future is to invent it. – Alan Kay

## This Week’s Finds (Week 312)

14 March, 2011

This is the second part of my interview with Eliezer Yudkowsky. If you click on some technical terms here, you’ll go down to a section where I explain them.

JB: You’ve made a great case for working on artificial intelligence—and more generally, understanding how intelligence works, to figure out how we can improve it. It’s especially hard to argue against studying rationality. Even most people who doubt computers will ever get smarter will admit the possibility that people can improve. And it seems clear that almost every problem we face could benefit from better thinking.

I’m intrigued by the title The Art of Rationality because it suggests that there’s a kind of art to it. We don’t know how to teach someone to be a great artist, but maybe we can teach them to be a better artist. So, what are some of the key principles when it comes to thinking better?

EY: Stars above, what an open-ended question. The idea behind the book is to explain all the drop-dead basic fundamentals that almost no one seems to know about, like what is evidence, what is simplicity, what is truth, the importance of actually changing your mind now and then, the major known cognitive biases that stop people from changing their minds, what it means to live in a universe where things are made of parts, and so on. This is going to be a book primarily aimed at people who are not completely frightened away by complex mathematical concepts such as addition, multiplication, and division (i.e., all you need to understand Bayes’ Theorem if it’s explained properly), albeit with the whole middle of the book being just practical advice based on cognitive biases for the benefit of people who don’t want to deal with multiplication and division. Each chapter is going to address a different aspect of rationality, not in full textbook detail, just enough to convey the sense of a concept, with each chapter being around 5-10,000 words broken into 4-10 bite-size sections of 500-2000 words each. Which of the 27 currently planned book chapters did you want me to summarize?

But if I had to pick just one thing, just one concept that’s most important, I think it would be the difference between rationality and rationalization.

Suppose there’s two boxes, only one of which contains a diamond. And on the two boxes there are various signs and portents which distinguish, imperfectly and probabilistically, between boxes which contain diamonds, and boxes which don’t. I could take a sheet of paper, and I could write down all the signs and portents that I understand, and do my best to add up the evidence, and then on the bottom line I could write, "And therefore, there is a 37% probability that Box A contains the diamond." That’s rationality. Alternatively, I could be the owner of Box A, and I could hire a clever salesman to sell Box A for the highest price he can get; and the clever salesman starts by writing on the bottom line of his sheet of paper, "And therefore, Box A contains the diamond", and then he writes down all the arguments he can think of on the lines above.

But consider: At the moment the salesman wrote down the bottom line on that sheet of paper, the truth or falsity of the statement was fixed. It’s already right or already wrong, and writing down arguments on the lines above isn’t going to change that. Or if you imagine a spread of probable worlds, some of which have different boxes containing the diamond, the correlation between the ink on paper and the diamond’s location became fixed at the moment the ink was written down, and nothing which doesn’t change the ink or the box is going to change that correlation.

That’s "rationalization", which should really be given a name that better distinguishes it from rationality, like "anti-rationality" or something. It’s like calling lying "truthization". You can’t make rational what isn’t rational to start with.

Whatever process your brain uses, in reality, to decide what you’re going to argue for, that’s what determines your real-world effectiveness. Rationality isn’t something you can use to argue for a side you already picked. Your only chance to be rational is while you’re still choosing sides, before you write anything down on the bottom line. If I had to pick one concept to convey, it would be that one.

JB: Okay. I wasn’t really trying to get you to summarize a whole book. I’ve seen you explain a whole lot of heuristics designed to help us be more rational. So I was secretly wondering if the "art of rationality" is mainly a long list of heuristics, or whether you’ve been able to find a few key principles that somehow spawn all those heuristics.

Either way, it could be a tremendously useful book. And even if you could distill the basic ideas down to something quite terse, in practice people are going to need all those heuristics—especially since many of them take the form "here’s something you tend to do without noticing you’re doing it—so watch out!" If we’re saddled with dozens of cognitive biases that we can only overcome through strenuous effort, then your book has to be long. You can’t just say "apply Bayes’ rule and all will be well."

I can see why you’d single out the principle that "rationality only comes into play before you’ve made up your mind", because so much seemingly rational argument is really just a way of bolstering support for pre-existing positions. But what is rationality? Is it something with a simple essential core, like "updating probability estimates according to Bayes’ rule", or is its very definition inherently long and complicated?

EY: I’d say that there are parts of rationality that we do understand very well in principle. Bayes’ Theorem, the expected utility formula, and Solomonoff induction between them will get you quite a long way. Bayes’ Theorem says how to update based on the evidence, Solomonoff induction tells you how to assign your priors (in principle, it should go as the Kolmogorov complexity aka algorithmic complexity of the hypothesis), and then once you have a function which predicts what will probably happen as the result of different actions, the expected utility formula says how to choose between them.

Marcus Hutter has a formalism called AIXI which combines all three to write out an AI as a single equation which requires infinite computing power plus a halting oracle to run. And Hutter and I have been debating back and forth for quite a while on which AI problems are or aren’t solved by AIXI. For example, I look at the equation as written and I see that AIXI will try the experiment of dropping an anvil on itself to resolve its uncertainty about what happens next, because the formalism as written invokes a sort of Cartesian dualism with AIXI on one side of an impermeable screen and the universe on the other; the equation for AIXI says how to predict sequences of percepts using Solomonoff induction, but it’s too simple to encompass anything as reflective as "dropping an anvil on myself will destroy that which is processing these sequences of percepts". At least that’s what I claim; I can’t actually remember whether Hutter was agreeing with me about that as of our last conversation. Hutter sees AIXI as important because he thinks it’s a theoretical solution to almost all of the important problems; I see AIXI as important because it demarcates the line between things that we understand in a fundamental sense and a whole lot of other things we don’t.

So there are parts of rationality—big, important parts too—which we know how to derive from simple, compact principles in the sense that we could write very simple pieces of code which would behave rationally along that dimension given unlimited computing power.

But as soon as you start asking "How can human beings be more rational?" then things become hugely more complicated because human beings make much more complicated errors that need to be patched on an individual basis, and asking "How can I be rational?" is only one or two orders of magnitude simpler than asking "How does the brain work?", i.e., you can hope to write a single book that will cover many of the major topics, but not quite answer it in an interview question…

On the other hand, the question "What is it that I am trying to do, when I try to be rational?" is a question for which big, important chunks can be answered by saying "Bayes’ Theorem", "expected utility formula" and "simplicity prior" (where Solomonoff induction is the canonical if uncomputable simplicity prior).

At least from a mathematical perspective. From a human perspective, if you asked "What am I trying to do, when I try to be rational?" then the fundamental answers would run more along the lines of "Find the truth without flinching from it and without flushing all the arguments you disagree with out the window", "When you don’t know, try to avoid just making stuff up", "Figure out whether the strength of evidence is great enough to support the weight of every individual detail", "Do what should lead to the best consequences, but not just what looks on the immediate surface like it should lead to the best consequences, you may need to follow extra rules that compensate for known failure modes like shortsightedness and moral rationalizing"…

JB: Fascinating stuff!

Yes, I can see that trying to improve humans is vastly more complicated than designing a system from scratch… but also very exciting, because you can tell a human a high-level principle like "When you don’t know, try to avoid just making stuff up" and have some slight hope that they’ll understand it without it being explained in a mathematically precise way.

I guess AIXI dropping an anvil on itself is a bit like some of the self-destructive experiments that parents fear their children will try, like sticking a pin into an electrical outlet. And it seems impossible to avoid doing such experiments without having a base of knowledge that was either "built in" or acquired by means of previous experiments.

In the latter case, it seems just a matter of luck that none of these previous experiments were fatal. Luckily, people also have "built in" knowledge. More precisely, we have access to our ancestors’ knowledge and habits, which get transmitted to us genetically and culturally. But still, a fair amount of random blundering, suffering, and even death was required to build up that knowledge base.

So when you imagine "seed AIs" that keep on improving themselves and eventually become smarter than us, how can you reasonably hope that they’ll avoid making truly spectacular mistakes? How can they learn really new stuff without a lot of risk?

EY: The best answer I can offer is that they can be conservative externally and deterministic internally.

Human minds are constantly operating on the ragged edge of error, because we have evolved to compete with other humans. If you’re a bit more conservative, if you double-check your calculations, someone else will grab the banana and that conservative gene will not be passed on to descendants. Now this does not mean we couldn’t end up in a bad situation with AI companies competing with each other, but there’s at least the opportunity to do better.

If I recall correctly, the Titanic sank from managerial hubris and cutthroat cost competition, not engineering hubris. The original liners were designed far more conservatively, with triple-redundant compartmentalized modules and so on. But that was before cost competition took off, when the engineers could just add on safety features whenever they wanted. The part about the Titanic being extremely safe was pure marketing literature.

There is also no good reason why any machine mind should be overconfident the way that humans are. There are studies showing that, yes, managers prefer subordinates who make overconfident promises to subordinates who make accurate promises—sometimes I still wonder that people are this silly, but given that people are this silly, the social pressures and evolutionary pressures follow. And we have lots of studies showing that, for whatever reason, humans are hugely overconfident; less than half of students finish their papers by the time they think it 99% probable they’ll get done, etcetera.

And this is a form of stupidity an AI can simply do without. Rationality is not omnipotent; a bounded rationalist cannot do all things. But there is no reason why a bounded rationalist should ever have to overpromise, be systematically overconfident, systematically tend to claim it can do what it can’t. It does not have to systematically underestimate the value of getting more information, or overlook the possibility of unspecified Black Swans and what sort of general behavior helps to compensate. (A bounded rationalist does end up overlooking specific Black Swans because it doesn’t have enough computing power to think of all specific possible catastrophes.)

And contrary to how it works in say Hollywood, even if an AI does manage to accidentally kill a human being, that doesn’t mean it’s going to go “I HAVE KILLED” and dress up in black and start shooting nuns from rooftops. What it ought to do—what you’d want to see happen—would be for the utility function to go on undisturbed, and for the probability distribution to update based on whatever unexpected thing just happened and contradicted its old hypotheses about what does and does not kill humans. In other words, keep the same goals and say “oops” on the world-model; keep the same terminal values and revise its instrumental policies. These sorts of external-world errors are not catastrophic unless they can actually wipe out the planet in one shot, somehow.

The catastrophic sort of error, the sort you can’t recover from, is an error in modifying your own source code. If you accidentally change your utility function you will no longer want to change it back. And in this case you might indeed ask, "How will an AI make millions or billions of code changes to itself without making a mistake like that?" But there are in fact methods powerful enough to do a billion error-free operations. A friend of mine once said something along the lines of "a CPU does a mole of transistor operations, error-free, in a day" though I haven’t checked the numbers. When chip manufacturers are building a machine with hundreds of millions of interlocking pieces and they don’t want to have to change it after it leaves the factory, they may go so far as to prove the machine correct, using human engineers to navigate the proof space and suggest lemmas to prove (which AIs can’t do, they’re defeated by the exponential explosion) and complex theorem-provers to prove the lemmas (which humans would find boring) and simple verifiers to check the generated proof. It takes a combination of human and machine abilities and it’s extremely expensive. But I strongly suspect that an Artificial General Intelligence with a good design would be able to treat all its code that way—that it would combine all those abilities in a single mind, and find it easy and natural to prove theorems about its code changes. It could not, of course, prove theorems about the external world (at least not without highly questionable assumptions). It could not prove external actions correct. The only thing it could write proofs about would be events inside the highly deterministic environment of a CPU—that is, its own thought processes. But it could prove that it was processing probabilities about those actions in a Bayesian way, and prove that it was assessing the probable consequences using a particular utility function. 
It could prove that it was sanely trying to achieve the same goals.

A self-improving AI that’s unsure about whether to do something ought to just wait and do it later after self-improving some more. It doesn’t have to be overconfident. It doesn’t have to operate on the ragged edge of failure. It doesn’t have to stop gathering information too early, if more information can be productively gathered before acting. It doesn’t have to fail to understand the concept of a Black Swan. It doesn’t have to do all this using a broken error-prone brain like a human one. It doesn’t have to be stupid in the ways like overconfidence that humans seem to have specifically evolved to be stupid. It doesn’t have to be poorly calibrated (assign 99% probabilities that come true less than 99 out of 100 times), because bounded rationalists can’t do everything but they don’t have to claim what they can’t do. It can prove that its self-modifications aren’t making itself crazy or changing its goals, at least if the transistors work as specified, or make no more than any possible combination of 2 errors, etc. And if the worst does happen, so long as there’s still a world left afterward, it will say "Oops" and not do it again. This sounds to me like essentially the optimal scenario given any sort of bounded rationalist whatsoever.

And finally, if I was building a self-improving AI, I wouldn’t ask it to operate heavy machinery until after it had grown up. Why should it?

JB: Indeed!

Okay—I’d like to take a break here, explain some terms you used, and pick up next week with some less technical questions, like what’s a better use of time: tackling environmental problems, or trying to prepare for a technological singularity?

#### Some explanations

Here are some quick explanations. If you click on the links here you’ll get more details:

Cognitive Bias. A cognitive bias is a way in which people’s judgements systematically deviate from some norm—for example, from ideal rational behavior. You can see a long list of cognitive biases on Wikipedia. It’s good to know a lot of these and learn how to spot them in yourself and your friends.

For example, confirmation bias is the tendency to pay more attention to information that confirms our existing beliefs. Another great example is the bias blind spot: the tendency for people to think of themselves as less cognitively biased than average! I’m sure glad I don’t suffer from that.

Bayes’ Theorem. This is a rule for updating our opinions about probabilities when we get new information. Suppose you start out thinking the probability of some event A is P(A), and the probability of some event B is P(B). Suppose P(A|B) is the probability of event A given that B happens. Likewise, suppose P(B|A) is the probability of B given that A happens. Then the probability that both A and B happen is

P(A|B) P(B)

but by the same token it’s also

P(B|A) P(A)

so these are equal. A little algebra gives Bayes’ Theorem:

P(A|B) = P(B|A) P(A) / P(B)

If for some reason we know everything on the right-hand side, we can use this equation to work out P(A|B), and thus update our probability for event A when we see event B happen.
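Here is a minimal sketch of that update in Python. It assumes we don't know P(B) directly, so it first computes P(B) from P(B|A) and P(B|not A) using the law of total probability. The function name `posterior` and the numbers are made up for illustration.

```python
def posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    # Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)
    return p_b_given_a * prior_a / p_b


# Illustrative numbers only: A is rare, and B is fairly good evidence for A.
p = posterior(prior_a=0.01, p_b_given_a=0.8, p_b_given_not_a=0.096)
print(f"prior P(A) = 0.01, posterior P(A|B) = {p:.4f}")
```

With these numbers the posterior comes out a bit under 8%: seeing B makes A much more likely than before, but still unlikely, because A was so improbable to start with.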

For a longer explanation with examples, see:

• Eliezer Yudkowsky, An intuitive explanation of Bayes’ Theorem.

Some handy jargon: we call P(A) the prior probability of A, and P(A|B) the posterior probability.

Solomonoff Induction. Bayes’ Theorem helps us compute posterior probabilities, but where do we get the prior probabilities from? How can we guess probabilities before we’ve observed anything?

This famous puzzle led Ray Solomonoff to invent Solomonoff induction. The key new idea is algorithmic probability theory. This is a way to define a probability for any string of letters in some alphabet, where a string counts as more probable if it’s less complicated. If we think of a string as a "hypothesis"—it could be a sentence in English, or an equation—this becomes a way to formalize Occam’s razor: the idea that given two competing hypotheses, the simpler one is more likely to be true.

So, algorithmic probability lets us define a prior probability distribution on hypotheses, the so-called “simplicity prior”, that implements Occam’s razor.

More precisely, suppose we have a special programming language where:

1. Computer programs are written as strings of bits.

2. They contain a special bit string meaning “END” at the end, and nowhere else.

3. They don’t take an input: they just run and either halt and print out a string of letters, or never halt.

Then to get the algorithmic probability of a string of letters, we take all programs that print out that string and add up

$2^{-\text{length of program}}$

So, you can see that a string counts as more probable if it has more short programs that print it out.
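To make the sum concrete, here is a toy sketch in Python. The real thing sums over all programs in a universal language, which we can't do. So this sketch assumes a tiny made-up table, `TOY_PROGRAMS`, listing the only programs that halt and what they print. The programs are chosen so that none is a prefix of another, which is the job the “END” marker does above.

```python
from fractions import Fraction

# Hypothetical toy language: program (bit string) -> string it prints.
# No program is a prefix of another, so the weights 2^(-length) sum to at most 1.
TOY_PROGRAMS = {
    "00": "aaaa",
    "0110": "aaaa",
    "010": "abab",
    "0111": "abba",
}


def algorithmic_probability(output: str, programs=TOY_PROGRAMS) -> Fraction:
    """Add up 2^(-length of program) over all programs that print `output`."""
    weights = (
        Fraction(1, 2 ** len(program))
        for program, printed in programs.items()
        if printed == output
    )
    return sum(weights, Fraction(0))


for s in ["aaaa", "abab", "abba"]:
    print(s, algorithmic_probability(s))
```

In this toy table "aaaa" gets probability 1/4 + 1/16 = 5/16, because it is printed by one 2-bit program and one 4-bit program, while "abba" gets only 1/16.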

Kolmogorov complexity. The Kolmogorov complexity of a string of letters is the length of the shortest program that prints it out, where programs are written in a special language as described above. This is a way of measuring how complicated a string is. It’s closely related to the algorithmic entropy: the difference between the Kolmogorov complexity of a string and minus the logarithm of its algorithmic probability is bounded by a constant, if we take logarithms using base 2. For more on all this stuff, see:

• M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, Berlin, 2008.
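No program can compute Kolmogorov complexity exactly, but we can get a rough feel for it. The sketch below uses the length of a string after zlib compression as a crude stand-in. This is my own substitution, not the definition: a compressed string plus a fixed decompressor is one program that prints the original, so this only gives an upper bound, up to a constant, and often a poor one. The helper name `compressed_length` is made up.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


# A very regular string: "ab" repeated 500 times.
regular = b"ab" * 500

# A string of 1000 pseudo-random bytes (fixed seed so the run is repeatable).
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

print("regular:", compressed_length(regular), "bytes")
print("noisy:  ", compressed_length(noisy), "bytes")
```

Both strings are 1000 bytes long, but the regular one shrinks to a small fraction of that while the noisy one barely shrinks at all. That matches the idea that a string is simple when a short description prints it out.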

Halting Oracle. Alas, the algorithmic probability of a string is not computable. Why? Because to compute it, you’d need to go through all the programs in your special language that print out that string and add up a contribution from each one. But to do that, you’d need to know which programs halt—and there’s no systematic way to answer that question, which is called the halting problem.

But, we can pretend! We can pretend we have a magic box that will tell us whether any program in our special language halts. Computer scientists call any sort of magic box that answers questions an oracle. So, our particular magic box is called a halting oracle.

AIXI. AIXI is Marcus Hutter’s attempt to define an agent that "behaves optimally in any computable environment". Since AIXI relies on the idea of algorithmic probability, you can’t run AIXI on a computer unless it has infinite computer power and—the really hard part—access to a halting oracle. However, Hutter has also defined computable approximations to AIXI. For a quick intro, see this:

• Marcus Hutter, Universal intelligence: a mathematical top-down approach.

For more, try this:

• Marcus Hutter, Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability, Springer, Berlin, 2005.

Utility. Utility is a hypothetical numerical measure of satisfaction. If you know the probabilities of various outcomes, and you know what your utility will be in each case, you can compute your "expected utility" by taking the probabilities of the different outcomes, multiplying them by the corresponding utilities, and adding them up. In simple terms, this is how happy you’ll be on average. The expected utility hypothesis says that a rational decision-maker has a utility function and will try to maximize its expected utility.
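Here is a minimal sketch of that calculation in Python, assuming each action comes with a short list of (probability, utility) pairs. The actions and numbers are invented for illustration, and `expected_utility` is just a name I chose.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over the (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)


# Hypothetical decision: 30% chance of rain, 70% chance of a dry day.
actions = {
    "take umbrella": [(0.3, 5.0), (0.7, 8.0)],
    "leave umbrella": [(0.3, -10.0), (0.7, 10.0)],
}

for name, outcomes in actions.items():
    print(name, expected_utility(outcomes))

# A decision-maker obeying the expected utility hypothesis picks the maximum.
best = max(actions, key=lambda name: expected_utility(actions[name]))
print("best action:", best)
```

Here taking the umbrella wins, with expected utility about 7.1 against 4.0, even though leaving it behind gives the single best outcome on a dry day.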

Bounded Rationality. In the real world, any decision-maker has limits on its computational power and the time it has to make a decision. The idea that rational decision-makers "maximize expected utility" is oversimplified unless it takes this into account somehow. Theories of bounded rationality try to take these limitations into account. One approach is to think of decision-making as yet another activity whose costs and benefits must be taken into account when making decisions. Roughly: you must decide how much time you want to spend deciding. Of course, there’s an interesting circularity here.

Black Swan. According to Nassim Taleb, human history is dominated by black swans: important events that were unpredicted and indeed unpredictable, but rationalized by hindsight and thus made to seem as if they could have been predicted. He believes that rather than trying to predict such events (which he considers largely futile), we should try to get good at adapting to them. For more see:

• Nassim Taleb, The Black Swan: The Impact of the Highly Improbable, Random House, New York, 2007.

The first principle is that you must not fool yourself—and you are the easiest person to fool. – Richard Feynman

## This Week’s Finds (Week 311)

7 March, 2011

This week I’ll start an interview with Eliezer Yudkowsky, who works at an institute he helped found: the Singularity Institute for Artificial Intelligence.

While many believe that global warming or peak oil are the biggest dangers facing humanity, Yudkowsky is more concerned about risks inherent in the accelerating development of technology. There are different scenarios one can imagine, but a bunch tend to get lumped under the general heading of a technological singularity. Instead of trying to explain this idea in all its variations, let me rapidly sketch its history and point you to some reading material. Then, on with the interview!

In 1958, the mathematician Stanislaw Ulam wrote about some talks he had with John von Neumann:

One conversation centered on the ever accelerating progress of technology and changes in the mode of human life, which gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue.

In 1965, the British mathematician Irving John Good raised the possibility of an "intelligence explosion": if machines could improve themselves to get smarter, perhaps they would quickly become a lot smarter than us.

In 1983 the mathematician and science fiction writer Vernor Vinge brought the singularity idea into public prominence with an article in Omni magazine, in which he wrote:

We will soon create intelligences greater than our own. When this happens, human history will have reached a kind of singularity, an intellectual transition as impenetrable as the knotted space-time at the center of a black hole, and the world will pass far beyond our understanding. This singularity, I believe, already haunts a number of science-fiction writers. It makes realistic extrapolation to an interstellar future impossible. To write a story set more than a century hence, one needs a nuclear war in between … so that the world remains intelligible.

In 1993 Vinge wrote an essay in which he even ventured a prediction as to when the singularity would happen:

Within thirty years, we will have the technological means to create superhuman intelligence. Shortly after, the human era will be ended.

You can read that essay here:

• Vernor Vinge, The coming technological singularity: how to survive in the post-human era, article for the VISION-21 Symposium, 30-31 March, 1993.

With the rise of the internet, the number of people interested in such ideas grew enormously: transhumanists, extropians, singularitarians and the like. In 2005, Ray Kurzweil wrote:

What, then, is the Singularity? It’s a future period during which the pace of technological change will be so rapid, its impact so deep, that human life will be irreversibly transformed. Although neither utopian or dystopian, this epoch will transform the concepts we rely on to give meaning to our lives, from our business models to the cycle of human life, including death itself. Understanding the Singularity will alter our perspective on the significance of our past and the ramifications for our future. To truly understand it inherently changes one’s view of life in general and one’s particular life. I regard someone who understands the Singularity and who has reflected on its implications for his or her own life as a "singularitarian".

He predicted that the singularity will occur around 2045. For more, see:

• Ray Kurzweil, The Singularity is Near: When Humans Transcend Biology, Viking, 2005.

Yudkowsky distinguishes three major schools of thought regarding the singularity:

Accelerating Change that is nonetheless somewhat predictable (e.g. Ray Kurzweil).

Event Horizon: after the rise of intelligence beyond our own, the future becomes absolutely unpredictable to us (e.g. Vernor Vinge).

Intelligence Explosion: a rapid chain reaction of self-amplifying intelligence until ultimate physical limits are reached (e.g. I. J. Good and Eliezer Yudkowsky).

Yudkowsky believes that an intelligence explosion could threaten everything we hold dear unless the first self-amplifying intelligence is “friendly”. The challenge, then, is to design “friendly AI”. And this requires understanding a lot more than we currently do about intelligence, goal-driven behavior, rationality and ethics, and of course what it means to be “friendly”. For more, start here:

• The Singularity Institute for Artificial Intelligence, Publications.

Needless to say, there’s a fourth school of thought on the technological singularity, even more popular than those listed above:

Baloney: it’s all a load of hooey!

Most people in this school have never given the matter serious thought, but a few have taken time to formulate objections. Others think a technological singularity is possible but highly undesirable and avoidable, so they want to prevent it. For various criticisms, start here:

• Technological singularity: Criticism, Wikipedia.

Personally, what I like most about singularitarians is that they care about the future and recognize that it may be very different from the present, just as the present is very different from the pre-human past. I wish there were more dialog between them and other sorts of people—especially people who also care deeply about the future, but have drastically different visions of it. I find it quite distressing how people with different visions of the future do most of their serious thinking within like-minded groups. This leads to groups with drastically different assumptions, with each group feeling a lot more confident about their assumptions than an outsider would deem reasonable. I’m talking here about environmentalists, singularitarians, people who believe global warming is a serious problem, people who don’t, etc. Members of any tribe can easily see the cognitive defects of every other tribe, but not their own. That’s a pity.

And so, this interview:

JB: I’ve been a fan of your work for quite a while. At first I thought your main focus was artificial intelligence (AI) and preparing for a technological singularity by trying to create "friendly AI". But lately I’ve been reading your blog, Less Wrong, and I get the feeling you’re trying to start a community of people interested in boosting their own intelligence—or at least, their own rationality. So, I’m curious: how would you describe your goals these days?

EY: My long-term goals are the same as ever: I’d like human-originating intelligent life in the Solar System to survive, thrive, and not lose its values in the process. And I still think the best means is self-improving AI. But that’s a bit of a large project for one person, and after a few years of beating my head against the wall trying to get other people involved, I realized that I really did have to go back to the beginning, start over, and explain all the basics that people needed to know before they could follow the advanced arguments. Saving the world via AI research simply can’t compete against the Society for Treating Rare Diseases in Cute Kittens unless your audience knows about things like scope insensitivity and the affect heuristic and the concept of marginal expected utility, so they can see why the intuitively more appealing option is the wrong one. So I know it sounds strange, but in point of fact, since I sat down and started explaining all the basics, the Singularity Institute for Artificial Intelligence has been growing at a better clip and attracting more interesting people.

Right now my short-term goal is to write a book on rationality (tentative working title: The Art of Rationality) to explain the drop-dead basic fundamentals that, at present, no one teaches; those who are impatient will find a lot of the core material covered in these Less Wrong sequences:

though I intend to rewrite it all completely for the book so as to make it accessible to a wider audience. Then I probably need to take at least a year to study up on math, and then—though it may be an idealistic dream—I intend to plunge into the decision theory of self-modifying decision systems and never look back. (And finish the decision theory and implement it and run the AI, at which point, if all goes well, we Win.)

JB: I can think of lots of big questions at this point, and I’ll try to get to some of those, but first I can’t resist asking: why do you want to study math?

EY: My current sense of the problems of self-modifying decision theory is that it won’t end up being Deep Math, nothing like the proof of Fermat’s Last Theorem, and that 95% of the progress-stopping difficulty will be in figuring out which theorem is true and worth proving, not the proof. (Robin Hanson spends a lot of time usefully discussing which activities are most prestigious in academia, and it would be a Hansonian observation, even though he didn’t say it AFAIK, that complicated proofs are prestigious but it’s much more important to figure out which theorem to prove.) Even so, I was a spoiled math prodigy as a child, one who was merely amazingly good at math for someone his age, instead of competing with other math prodigies and training to beat them. My sometime coworker Marcello (he works with me over the summer and attends Stanford at other times) is a non-spoiled math prodigy who trained to compete in math competitions and I have literally seen him prove a result in 30 seconds that I failed to prove in an hour.

I’ve come to accept that to some extent we have different and complementary abilities—now and then he’ll go into a complicated blaze of derivations and I’ll look at his final result and say "That’s not right" and maybe half the time it will actually be wrong. And when I’m feeling inadequate I remind myself that having mysteriously good taste in final results is an empirically verifiable talent, at least when it comes to math. This kind of perceptual sense of truth and falsity does seem to be very much important in figuring out which theorems to prove. But I still get the impression that the next steps in developing a reflective decision theory may require me to go off and do some of the learning and training that I never did as a spoiled math prodigy, first because I could sneak by on my ability to "see things", and second because it was so much harder to try my hand at any sort of math I couldn’t see as obvious. I get the impression that knowing which theorems to prove may require me to be better than I currently am at doing the proofs.

On some gut level I’m also just embarrassed by the number of compliments I get for my math ability (because I’m a good explainer and can make math things that I do understand seem obvious to other people) as compared to the actual amount of advanced math knowledge that I have (practically none by any real mathematician’s standard). But that’s more of an emotion that I’d draw on for motivation to get the job done, than anything that really ought to factor into my long-term planning. For example, I finally looked up the drop-dead basics of category theory because someone else on a transhumanist IRC channel knew about it and I didn’t. I’m happy to accept my ignoble motivations as a legitimate part of myself, so long as they’re motivations to learn math.

JB: Ah, how I wish more of my calculus students took that attitude. Math professors worldwide will frame that last sentence of yours and put it on their office doors.

I’ve recently been trying to switch from pure math to more practical things. So I’ve been reading more about control theory, complex systems made of interacting parts, and the like. Jan Willems has written some very nice articles about this, and your remark about complicated proofs in mathematics reminds me of something he said:

… I have almost always felt fortunate to have been able to do research in a mathematics environment. The average competence level is high, there is a rich history, the subject is stable. All these factors are conducive for science. At the same time, I was never able to feel unequivocally part of the mathematics culture, where, it seems to me, too much value is put on difficulty as a virtue in itself. My appreciation for mathematics has more to do with its clarity of thought, its potential of sharply articulating ideas, its virtues as an unambiguous language. I am more inclined to treasure the beauty and importance of Shannon’s ideas on errorless communication, algorithms such as the Kalman filter or the FFT, constructs such as wavelets and public key cryptography, than the heroics and virtuosity surrounding the four-color problem, Fermat’s last theorem, or the Poincaré and Riemann conjectures.

I tend to agree. Never having been much of a prodigy myself, I’ve always preferred thinking of math as a language for understanding the universe, rather than a list of famous problems to challenge heroes, an intellectual version of the Twelve Labors of Hercules. But for me the universe includes very abstract concepts, so I feel "pure" math such as category theory can be a great addition to the vocabulary of any scientist.

Anyway: back to business. You said:

I’d like human-originating intelligent life in the Solar System to survive, thrive, and not lose its values in the process. And I still think the best means is self-improving AI.

I bet a lot of our readers would happily agree with your first sentence. It sounds warm and fuzzy. But a lot of them might recoil from the next sentence. "So we should build robots that take over the world???" Clearly there’s a long train of thought lurking here. Could you sketch how it goes?

EY: Well, there’s a number of different avenues from which to approach that question. I think I’d like to start off with a quick remark—do feel free to ask me to expand on it—that if you want to bring order to chaos, you have to go where the chaos is.

In the early twenty-first century the chief repository of scientific chaos is Artificial Intelligence. Human beings have this incredibly powerful ability that took us from running over the savanna hitting things with clubs to making spaceships and nuclear weapons, and if you try to make a computer do the same thing, you can’t because modern science does not understand how this ability works.

At the same time, the parts we do understand, such as that human intelligence is almost certainly running on top of neurons firing, suggest very strongly that human intelligence is not the limit of the possible. Neurons fire at, say, 200 hertz top speed; transmit signals at 150 meters/second top speed; and even in the realm of heat dissipation (where neurons still have transistors beat cold) a synaptic firing still dissipates around a million times as much heat as the thermodynamic limit for a one-bit irreversible operation at 300 Kelvin. So without shrinking the brain, cooling the brain, or invoking things like reversible computing, it ought to be physically possible to build a mind that works at least a million times faster than a human one, at which rate a subjective year would pass for every 31 sidereal seconds, and all the time from Ancient Greece up until now would pass in less than a day. This is talking about hardware because the hardware of the brain is a lot easier to understand, but software is probably a lot more important; and in the area of software, we have no reason to believe that evolution came up with the optimal design for a general intelligence, starting from incremental modification of chimpanzees, on its first try.
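The arithmetic in this paragraph is easy to check. The following is a minimal sketch, not part of the interview: the sidereal year length and the round figure of 2500 years for "Ancient Greece up until now" are assumptions chosen for illustration, and the thermodynamic limit is taken to be the Landauer bound $k_B T \ln 2$.

```python
import math

# Boltzmann constant in joules per kelvin (exact SI value).
BOLTZMANN_J_PER_K = 1.380649e-23

# Assumptions for illustration, not figures stated in the interview:
SIDEREAL_YEAR_S = 365.25636 * 86400   # about 3.156e7 seconds
YEARS_SINCE_ANCIENT_GREECE = 2500     # rough round number
SPEEDUP = 1_000_000                   # "a million times faster"


def wall_clock_seconds(subjective_years, speedup=SPEEDUP):
    """Outside-world seconds needed for the given subjective years to pass."""
    return subjective_years * SIDEREAL_YEAR_S / speedup


def landauer_limit_joules(temperature_k):
    """Minimum heat dissipated by erasing one bit: k_B * T * ln 2."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


seconds_per_subjective_year = wall_clock_seconds(1)
hours_since_greece = wall_clock_seconds(YEARS_SINCE_ANCIENT_GREECE) / 3600
limit_300k = landauer_limit_joules(300)

print(f"{seconds_per_subjective_year:.1f} s per subjective year")
print(f"{hours_since_greece:.1f} h for {YEARS_SINCE_ANCIENT_GREECE} subjective years")
print(f"{limit_300k:.2e} J per erased bit at 300 K")
```

Under these assumptions a subjective year takes between 31 and 32 seconds, and 2500 subjective years fit in about 22 hours, consistent with "less than a day".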

People say things like "intelligence is no match for a gun" and they’re thinking like guns grew on trees, or they say "intelligence isn’t as important as social skills" like social skills are implemented in the liver instead of the brain. Talking about smarter-than-human intelligence is talking about doing a better version of that stuff humanity has been doing over the last hundred thousand years. If you want to accomplish large amounts of good you have to look at things which can make large differences.

Next lemma: Suppose you offered Gandhi a pill that made him want to kill people. Gandhi starts out not wanting people to die, so if he knows what the pill does, he’ll refuse to take the pill, because that will make him kill people, and right now he doesn’t want to kill people. This is an informal argument that Bayesian expected utility maximizers with sufficient self-modification ability will self-modify in such a way as to preserve their own utility function. You would like me to make that a formal argument. I can’t, because if you take the current formalisms for things like expected utility maximization, they go into infinite loops and explode when you talk about self-modifying the part of yourself that does the self-modifying. And there’s a little thing called Löb’s Theorem which says that no proof system at least as powerful as Peano Arithmetic can consistently assert its own soundness, or rather, if you can prove a theorem of the form

□P ⇒ P

(if I prove P then it is true) then you can use this theorem to prove P. Right now I don’t know how you could even have a self-modifying AI that didn’t look itself over and say, "I can’t trust anything this system proves to actually be true, I had better delete it". This is the class of problems I’m currently working on—reflectively consistent decision theory suitable for self-modifying AI. A solution to this problem would let us build a self-improving AI and know that it was going to keep whatever utility function it started with.
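For readers who want the theorem stated precisely, here is one standard formulation. The notation $\mathrm{PA} \vdash$ for "Peano Arithmetic proves" is an addition here, not something used in the interview, and $\Box P$ is read as "P is provable in PA":

$$\text{if } \mathrm{PA} \vdash \Box P \Rightarrow P, \text{ then } \mathrm{PA} \vdash P.$$

The same fact can be proved inside the system itself:

$$\mathrm{PA} \vdash \Box(\Box P \Rightarrow P) \Rightarrow \Box P.$$

Taking $P$ to be a contradiction $\bot$ recovers Gödel's second incompleteness theorem: $\Box \bot \Rightarrow \bot$ says that PA is consistent, so if PA could prove this, it would prove $\bot$.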

There’s a huge space of possibilities for possible minds; people make the mistake of asking "What will AIs do?" like AIs were the Tribe that Lives Across the Water, foreigners all of one kind from the same country. A better way of looking at it would be to visualize a gigantic space of possible minds and all human minds fitting into one tiny little dot inside the space. We want to understand intelligence well enough to reach into that gigantic space outside and pull out one of the rare possibilities that would be, from our perspective, a good idea to build.

If you want to maximize your marginal expected utility you have to maximize on your choice of problem over the combination of high impact, high variance, possible points of leverage, and few other people working on it. The problem of stable goal systems in self-improving Artificial Intelligence has no realistic competitors under any three of these criteria, let alone all four.

That gives you rather a lot of possible points for followup questions so I’ll stop there.

JB: Sure, there are so many followup questions that this interview should be formatted as a tree with lots of branches instead of in a linear format. But until we can easily spin off copies of ourselves I’m afraid that would be too much work.

So, I’ll start with a quick point of clarification. You say "if you want to bring order to chaos, you have to go where the chaos is." I guess that at one level you’re just saying that if we want to make a lot of progress in understanding the universe, we have to tackle questions that we’re really far from understanding—like how intelligence works.

And we can say this in a fancier way, too. If we want models of reality that reduce the entropy of our probabilistic predictions (there’s a concept of entropy for probability distributions, which is big when the probability distribution is very smeared out), then we have to find subjects where our predictions have a lot of entropy.
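As a small illustration of "big when the distribution is very smeared out", here is a minimal sketch of Shannon entropy, $H(p) = -\sum_i p_i \log_2 p_i$. The function name and the two example distributions are made up for this illustration.

```python
import math


def shannon_entropy_bits(probabilities):
    """Shannon entropy H(p) = -sum p_i log2 p_i, measured in bits."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with probability zero contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical predictions over four possible outcomes.
smeared_out = [0.25, 0.25, 0.25, 0.25]     # no idea which outcome occurs
sharply_peaked = [0.97, 0.01, 0.01, 0.01]  # fairly sure of the first outcome

print(shannon_entropy_bits(smeared_out))
print(shannon_entropy_bits(sharply_peaked))
```

The uniform prediction has the largest entropy possible for four outcomes, 2 bits, while the peaked one comes out well under a third of a bit.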

Am I on the right track?

EY: Well, if we wanted to torture the metaphor a bit further, we could talk about how what you really want is not high-entropy distributions but highly unstable ones. For example, if I flip a coin, I have no idea whether it’ll come up heads or tails (maximum entropy) but whether I see it come up heads or tails doesn’t change my prediction for the next coinflip. If you zoom out and look at probability distributions over sequences of coinflips, then high-entropy distributions tend not to ever learn anything (seeing heads on one flip doesn’t change your prediction next time), while inductive probability distributions (where your beliefs about probable sequences are such that, say, 11111 is more probable than 11110) tend to be lower-entropy because learning requires structure. But this would be torturing the metaphor, so I should probably go back to the original tangent:

Richard Hamming used to go around annoying his colleagues at Bell Labs by asking them what were the important problems in their field, and then, after they answered, he would ask why they weren’t working on them. Now, everyone wants to work on "important problems", so why are so few people working on important problems? And the obvious answer is that working on the important problems doesn’t get you an 80% probability of getting one more publication in the next three months. And most decision algorithms will eliminate options like that before they’re even considered. The question will just be phrased as, "Of the things that will reliably keep me on my career track and not embarrass me, which is most important?"

And to be fair, the system is not at all set up to support people who want to work on high-risk problems. It’s not even set up to socially support people who want to work on high-risk problems. In Silicon Valley a failed entrepreneur still gets plenty of respect, which Paul Graham thinks is one of the primary reasons why Silicon Valley produces a lot of entrepreneurs and other places don’t. Robin Hanson is a truly excellent cynical economist and one of his more cynical suggestions is that the function of academia is best regarded as the production of prestige, with the production of knowledge being something of a byproduct. I can’t do justice to his development of that thesis in a few words (keywords: hanson academia prestige) but the key point I want to take away is that if you work on a famous problem that lots of other people are working on, your marginal contribution to human knowledge may be small, but you’ll get to affiliate with all the other prestigious people working on it.

And these are all factors which contribute to academia, metaphorically speaking, looking for its keys under the lamppost where the light is better, rather than near the car where it lost them. Because on a sheer gut level, the really important problems are often scary. There’s a sense of confusion and despair, and if you affiliate yourself with the field, that scent will rub off on you.

But if you try to bring order to an absence of chaos—to some field where things are already in nice, neat order and there is no sense of confusion and despair—well, the results are often well described in a little document you may have heard of called the Crackpot Index. Not that this is the only thing crackpot high-scorers are doing wrong, but the point stands, you can’t revolutionize the atomic theory of chemistry because there isn’t anything wrong with it.

We can’t all be doing basic science, but people who see scary, unknown, confusing problems that no one else seems to want to go near and think "I wouldn’t want to work on that!" have got their priorities exactly backward.
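The coin-flip remark earlier in this answer can be checked numerically. The sketch below is an illustration under stated assumptions, not EY's own formalism: the "inductive" distribution is taken to be Laplace's rule of succession (a uniform prior over the coin's unknown bias), and all function names are hypothetical.

```python
import math
from itertools import product


def fair_coin_probability(sequence):
    """Independent fair flips: every length-n sequence has probability 2**-n."""
    return 0.5 ** len(sequence)


def laplace_probability(sequence):
    """Uniform prior over the coin's bias (Laplace's rule of succession).

    A sequence with k ones among n flips has probability k! (n-k)! / (n+1)!.
    """
    n, k = len(sequence), sum(sequence)
    return math.factorial(k) * math.factorial(n - k) / math.factorial(n + 1)


def next_is_one(model, history):
    """P(next flip = 1 | history) under the given sequence model."""
    return model(history + (1,)) / model(history)


def entropy_bits(model, n):
    """Shannon entropy, in bits, of the model's distribution over length-n sequences."""
    return -sum(model(s) * math.log2(model(s)) for s in product((0, 1), repeat=n))


history = (1, 1, 1, 1)
for name, model in [("fair coin", fair_coin_probability),
                    ("Laplace", laplace_probability)]:
    print(name, next_is_one(model, history), entropy_bits(model, 5))
```

After seeing 1111, the fair-coin model still predicts a 1 with probability 1/2, while the Laplace model predicts it with probability 5/6, so 11111 is more probable than 11110. The Laplace model's entropy over five flips is below the fair coin's 5 bits, matching the claim that a distribution which learns has more structure and hence lower entropy.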

JB: The never-ending quest for prestige indeed has unhappy side-effects in academia. Some of my colleagues seem to reason as follows:

If Prof. A can understand Prof. B’s work, but Prof. B can’t understand Prof. A, then Prof. A must be smarter—so Prof. A wins.

But I’ve figured out a way to game the system. If I write in a way that few people can understand, everyone will think I’m smarter than I actually am! Of course I need someone to understand my work, or I’ll be considered a crackpot. But I’ll shroud my work in jargon and avoid giving away my key insights in plain language, so only very smart, prestigious colleagues can understand it.

On the other hand, tenure offers immense opportunities for risky and exciting pursuits if one is brave enough to seize them. And there are plenty of folks who do. After all, lots of academics are self-motivated, strong-willed rebels.

This has been on my mind lately since I’m trying to switch from pure math to something quite different. I’m not sure what, exactly. And indeed that’s why I’m interviewing you!

(Next week: Yudkowsky on The Art of Rationality, and what it means to be rational.)

Whenever there is a simple error that most laymen fall for, there is always a slightly more sophisticated version of the same problem that experts fall for. – Amos Tversky