This Week’s Finds (Week 312)

This is the second part of my interview with Eliezer Yudkowsky. If you click on some technical terms here, you’ll go down to a section where I explain them.

JB: You’ve made a great case for working on artificial intelligence—and more generally, understanding how intelligence works, to figure out how we can improve it. It’s especially hard to argue against studying rationality. Even most people who doubt computers will ever get smarter will admit the possibility that people can improve. And it seems clear that almost every problem we face could benefit from better thinking.

I’m intrigued by the title The Art of Rationality because it suggests that there’s a kind of art to it. We don’t know how to teach someone to be a great artist, but maybe we can teach them to be a better artist. So, what are some of the key principles when it comes to thinking better?

EY: Stars above, what an open-ended question. The idea behind the book is to explain all the drop-dead basic fundamentals that almost no one seems to know about, like what is evidence, what is simplicity, what is truth, the importance of actually changing your mind now and then, the major known cognitive biases that stop people from changing their minds, what it means to live in a universe where things are made of parts, and so on. This is going to be a book primarily aimed at people who are not completely frightened away by complex mathematical concepts such as addition, multiplication, and division (i.e., all you need to understand Bayes’ Theorem if it’s explained properly), albeit with the whole middle of the book being just practical advice based on cognitive biases for the benefit of people who don’t want to deal with multiplication and division. Each chapter is going to address a different aspect of rationality, not in full textbook detail, just enough to convey the sense of a concept, with each chapter being around 5-10,000 words broken into 4-10 bite-size sections of 500-2000 words each. Which of the 27 currently planned book chapters did you want me to summarize?

But if I had to pick just one thing, just one concept that’s most important, I think it would be the difference between rationality and rationalization.

Suppose there are two boxes, only one of which contains a diamond. And on the two boxes there are various signs and portents which distinguish, imperfectly and probabilistically, between boxes which contain diamonds, and boxes which don’t. I could take a sheet of paper, and I could write down all the signs and portents that I understand, and do my best to add up the evidence, and then on the bottom line I could write, "And therefore, there is a 37% probability that Box A contains the diamond." That’s rationality. Alternatively, I could be the owner of Box A, and I could hire a clever salesman to sell Box A for the highest price he can get; and the clever salesman starts by writing on the bottom line of his sheet of paper, "And therefore, Box A contains the diamond", and then he writes down all the arguments he can think of on the lines above.

But consider: At the moment the salesman wrote down the bottom line on that sheet of paper, the truth or falsity of the statement was fixed. It’s already right or already wrong, and writing down arguments on the lines above isn’t going to change that. Or if you imagine a spread of probable worlds, some of which have different boxes containing the diamond, the correlation between the ink on paper and the diamond’s location became fixed at the moment the ink was written down, and nothing which doesn’t change the ink or the box is going to change that correlation.
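A toy simulation can make the bottom-line point concrete. It is a sketch under assumptions that do not appear in the interview: a 50/50 prior on which box holds the diamond, and a single sign that points at the correct box 70% of the time. The function name `simulate` and both numbers are hypothetical. The evidence-driven writer writes "Box A" only when the sign points to A; the salesman writes "Box A" no matter what.

```python
import random


def simulate(trials: int = 100_000, seed: int = 0) -> tuple[float, float]:
    """Return how often the claim "Box A has the diamond" is true when made by
    (1) an evidence-driven writer and (2) a salesman who always makes it."""
    rng = random.Random(seed)
    honest_hits = honest_claims = 0
    sales_hits = sales_claims = 0
    for _ in range(trials):
        # Assumed prior: the diamond is equally likely to be in either box.
        diamond_in_a = rng.random() < 0.5
        # Assumed sign: points at the correct box 70% of the time.
        sign_points_to_a = diamond_in_a if rng.random() < 0.7 else not diamond_in_a
        # Evidence-driven writer: the bottom line depends on the sign.
        if sign_points_to_a:
            honest_claims += 1
            honest_hits += diamond_in_a
        # Salesman: the bottom line was written before looking at anything.
        sales_claims += 1
        sales_hits += diamond_in_a
    return honest_hits / honest_claims, sales_hits / sales_claims


honest, salesman = simulate()
print(f"evidence-driven writer right {honest:.1%} of the time")
print(f"salesman right {salesman:.1%} of the time")
```

Under these assumptions the writer's claim is right about 70% of the time, while the salesman's is right only at the 50% base rate, and no number of arguments added above his bottom line changes that.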

That’s "rationalization", which should really be given a name that better distinguishes it from rationality, like "anti-rationality" or something. It’s like calling lying "truthization". You can’t make rational what isn’t rational to start with.

Whatever process your brain uses, in reality, to decide what you’re going to argue for, that’s what determines your real-world effectiveness. Rationality isn’t something you can use to argue for a side you already picked. Your only chance to be rational is while you’re still choosing sides, before you write anything down on the bottom line. If I had to pick one concept to convey, it would be that one.

JB: Okay. I wasn’t really trying to get you to summarize a whole book. I’ve seen you explain a whole lot of heuristics designed to help us be more rational. So I was secretly wondering if the "art of rationality" is mainly a long list of heuristics, or whether you’ve been able to find a few key principles that somehow spawn all those heuristics.

Either way, it could be a tremendously useful book. And even if you could distill the basic ideas down to something quite terse, in practice people are going to need all those heuristics—especially since many of them take the form "here’s something you tend to do without noticing you’re doing it—so watch out!" If we’re saddled with dozens of cognitive biases that we can only overcome through strenuous effort, then your book has to be long. You can’t just say "apply Bayes’ rule and all will be well."

I can see why you’d single out the principle that "rationality only comes into play before you’ve made up your mind", because so much seemingly rational argument is really just a way of bolstering support for pre-existing positions. But what is rationality? Is it something with a simple essential core, like "updating probability estimates according to Bayes’ rule", or is its very definition inherently long and complicated?

EY: I’d say that there are parts of rationality that we do understand very well in principle. Bayes’ Theorem, the expected utility formula, and Solomonoff induction between them will get you quite a long way. Bayes’ Theorem says how to update based on the evidence, Solomonoff induction tells you how to assign your priors (in principle, it should go as the Kolmogorov complexity aka algorithmic complexity of the hypothesis), and then once you have a function which predicts what will probably happen as the result of different actions, the expected utility formula says how to choose between them.

Marcus Hutter has a formalism called AIXI which combines all three to write out an AI as a single equation which requires infinite computing power plus a halting oracle to run. And Hutter and I have been debating back and forth for quite a while on which AI problems are or aren’t solved by AIXI. For example, I look at the equation as written and I see that AIXI will try the experiment of dropping an anvil on itself to resolve its uncertainty about what happens next, because the formalism as written invokes a sort of Cartesian dualism with AIXI on one side of an impermeable screen and the universe on the other; the equation for AIXI says how to predict sequences of percepts using Solomonoff induction, but it’s too simple to encompass anything as reflective as "dropping an anvil on myself will destroy that which is processing these sequences of percepts". At least that’s what I claim; I can’t actually remember whether Hutter was agreeing with me about that as of our last conversation. Hutter sees AIXI as important because he thinks it’s a theoretical solution to almost all of the important problems; I see AIXI as important because it demarcates the line between things that we understand in a fundamental sense and a whole lot of other things we don’t.

So there are parts of rationality—big, important parts too—which we know how to derive from simple, compact principles in the sense that we could write very simple pieces of code which would behave rationally along that dimension given unlimited computing power.

But as soon as you start asking "How can human beings be more rational?" then things become hugely more complicated because human beings make much more complicated errors that need to be patched on an individual basis, and asking "How can I be rational?" is only one or two orders of magnitude simpler than asking "How does the brain work?", i.e., you can hope to write a single book that will cover many of the major topics, but not quite answer it in an interview question…

On the other hand, the question "What is it that I am trying to do, when I try to be rational?" is a question for which big, important chunks can be answered by saying "Bayes’ Theorem", "expected utility formula" and "simplicity prior" (where Solomonoff induction is the canonical if uncomputable simplicity prior).

At least from a mathematical perspective. From a human perspective, if you asked "What am I trying to do, when I try to be rational?" then the fundamental answers would run more along the lines of "Find the truth without flinching from it and without flushing all the arguments you disagree with out the window", "When you don’t know, try to avoid just making stuff up", "Figure out whether the strength of evidence is great enough to support the weight of every individual detail", "Do what should lead to the best consequences, but not just what looks on the immediate surface like it should lead to the best consequences, you may need to follow extra rules that compensate for known failure modes like shortsightedness and moral rationalizing"…

JB: Fascinating stuff!

Yes, I can see that trying to improve humans is vastly more complicated than designing a system from scratch… but also very exciting, because you can tell a human a high-level principle like "When you don’t know, try to avoid just making stuff up" and have some slight hope that they’ll understand it without it being explained in a mathematically precise way.

I guess AIXI dropping an anvil on itself is a bit like some of the self-destructive experiments that parents fear their children will try, like sticking a pin into an electrical outlet. And it seems impossible to avoid doing such experiments without having a base of knowledge that was either "built in" or acquired by means of previous experiments.

In the latter case, it seems just a matter of luck that none of these previous experiments were fatal. Luckily, people also have "built in" knowledge. More precisely, we have access to our ancestors’ knowledge and habits, which get transmitted to us genetically and culturally. But still, a fair amount of random blundering, suffering, and even death was required to build up that knowledge base.

So when you imagine "seed AIs" that keep on improving themselves and eventually become smarter than us, how can you reasonably hope that they’ll avoid making truly spectacular mistakes? How can they learn really new stuff without a lot of risk?

EY: The best answer I can offer is that they can be conservative externally and deterministic internally.

Human minds are constantly operating on the ragged edge of error, because we have evolved to compete with other humans. If you’re a bit more conservative, if you double-check your calculations, someone else will grab the banana and that conservative gene will not be passed on to descendants. Now this does not mean we couldn’t end up in a bad situation with AI companies competing with each other, but there’s at least the opportunity to do better.

If I recall correctly, the Titanic sank from managerial hubris and cutthroat cost competition, not engineering hubris. The original liners were designed far more conservatively, with triple-redundant compartmentalized modules and so on. But that was before cost competition took off, when the engineers could just add on safety features whenever they wanted. The part about the Titanic being extremely safe was pure marketing literature.

There is also no good reason why any machine mind should be overconfident the way that humans are. There are studies showing that, yes, managers prefer subordinates who make overconfident promises to subordinates who make accurate promises—sometimes I still wonder that people are this silly, but given that people are this silly, the social pressures and evolutionary pressures follow. And we have lots of studies showing that, for whatever reason, humans are hugely overconfident; less than half of students finish their papers by the time they think it 99% probable they’ll get done, etcetera.

And this is a form of stupidity an AI can simply do without. Rationality is not omnipotent; a bounded rationalist cannot do all things. But there is no reason why a bounded rationalist should ever have to overpromise, be systematically overconfident, systematically tend to claim it can do what it can’t. It does not have to systematically underestimate the value of getting more information, or overlook the possibility of unspecified Black Swans and what sort of general behavior helps to compensate. (A bounded rationalist does end up overlooking specific Black Swans because it doesn’t have enough computing power to think of all specific possible catastrophes.)

And contrary to how it works in say Hollywood, even if an AI does manage to accidentally kill a human being, that doesn’t mean it’s going to go “I HAVE KILLED” and dress up in black and start shooting nuns from rooftops. What it ought to do—what you’d want to see happen—would be for the utility function to go on undisturbed, and for the probability distribution to update based on whatever unexpected thing just happened and contradicted its old hypotheses about what does and does not kill humans. In other words, keep the same goals and say “oops” on the world-model; keep the same terminal values and revise its instrumental policies. These sorts of external-world errors are not catastrophic unless they can actually wipe out the planet in one shot, somehow.

The catastrophic sort of error, the sort you can’t recover from, is an error in modifying your own source code. If you accidentally change your utility function you will no longer want to change it back. And in this case you might indeed ask, "How will an AI make millions or billions of code changes to itself without making a mistake like that?" But there are in fact methods powerful enough to do a billion error-free operations. A friend of mine once said something along the lines of "a CPU does a mole of transistor operations, error-free, in a day" though I haven’t checked the numbers. When chip manufacturers are building a machine with hundreds of millions of interlocking pieces and they don’t want to have to change it after it leaves the factory, they may go so far as to prove the machine correct, using human engineers to navigate the proof space and suggest lemmas to prove (which AIs can’t do, they’re defeated by the exponential explosion) and complex theorem-provers to prove the lemmas (which humans would find boring) and simple verifiers to check the generated proof. It takes a combination of human and machine abilities and it’s extremely expensive. But I strongly suspect that an Artificial General Intelligence with a good design would be able to treat all its code that way—that it would combine all those abilities in a single mind, and find it easy and natural to prove theorems about its code changes. It could not, of course, prove theorems about the external world (at least not without highly questionable assumptions). It could not prove external actions correct. The only thing it could write proofs about would be events inside the highly deterministic environment of a CPU—that is, its own thought processes. But it could prove that it was processing probabilities about those actions in a Bayesian way, and prove that it was assessing the probable consequences using a particular utility function. It could prove that it was sanely trying to achieve the same goals.
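The "mole of transistor operations in a day" figure can be sanity-checked with back-of-the-envelope arithmetic. This is only a sketch: the transistor count, clock rate, and fraction of transistors switching per cycle below are hypothetical inputs, not measurements of any real chip, and the result scales linearly with each of them.

```python
AVOGADRO = 6.02214076e23  # Avogadro constant, per mole (SI defined value)
SECONDS_PER_DAY = 86_400


def transistor_ops_per_day(transistors: float, clock_hz: float, activity: float) -> float:
    """Rough count of transistor switching events in one day.

    activity is the assumed fraction of transistors that switch each clock cycle.
    """
    return transistors * clock_hz * activity * SECONDS_PER_DAY


# Hypothetical chip: 1e9 transistors at 3 GHz with 10% switching per cycle.
ops = transistor_ops_per_day(1e9, 3e9, 0.1)
print(f"{ops:.2e} operations = {ops / AVOGADRO:.3f} mol")  # 2.59e+22 operations = 0.043 mol
```

With every transistor switching on every cycle, the same hypothetical chip would reach about 0.43 mol per day. Real chips do not switch every transistor on every cycle, so under these assumptions "a mole a day" is an upper-end, order-of-magnitude figure rather than a checked one.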

A self-improving AI that’s unsure about whether to do something ought to just wait and do it later after self-improving some more. It doesn’t have to be overconfident. It doesn’t have to operate on the ragged edge of failure. It doesn’t have to stop gathering information too early, if more information can be productively gathered before acting. It doesn’t have to fail to understand the concept of a Black Swan. It doesn’t have to do all this using a broken error-prone brain like a human one. It doesn’t have to be stupid in the ways like overconfidence that humans seem to have specifically evolved to be stupid. It doesn’t have to be poorly calibrated (assign 99% probabilities that come true less than 99 out of 100 times), because bounded rationalists can’t do everything but they don’t have to claim what they can’t do. It can prove that its self-modifications aren’t making itself crazy or changing its goals, at least if the transistors work as specified, or make no more than any possible combination of 2 errors, etc. And if the worst does happen, so long as there’s still a world left afterward, it will say "Oops" and not do it again. This sounds to me like essentially the optimal scenario given any sort of bounded rationalist whatsoever.
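Calibration is easy to check once you keep score. The helpers below are hypothetical, not anything described in the interview: `calibration_table` groups (stated probability, came true?) records by the stated probability and reports how often each group actually came true, and `overconfident_bins` flags groups whose observed frequency falls short of the stated one by more than an assumed tolerance. A well-calibrated reasoner's 99% claims should come true about 99 times in 100.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Map each stated probability to the observed frequency of success.

    predictions: iterable of (stated_probability, came_true) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # stated prob -> [hits, total]
    for prob, came_true in predictions:
        counts[prob][0] += int(came_true)
        counts[prob][1] += 1
    return {p: hits / total for p, (hits, total) in sorted(counts.items())}


def overconfident_bins(table, tolerance=0.05):
    """Stated probabilities whose observed frequency falls short by more than tolerance."""
    return [p for p, freq in table.items() if p - freq > tolerance]


# Made-up record: 100 claims at 99% (45 came true), 100 claims at 50% (50 came true).
record = [(0.99, i < 45) for i in range(100)] + [(0.5, i < 50) for i in range(100)]
table = calibration_table(record)
print(table)                      # {0.5: 0.5, 0.99: 0.45}
print(overconfident_bins(table))  # [0.99]
```

The invented record loosely resembles the student example above: claims made at 99% confidence that come true less than half the time.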

And finally, if I was building a self-improving AI, I wouldn’t ask it to operate heavy machinery until after it had grown up. Why should it?

JB: Indeed!

Okay—I’d like to take a break here, explain some terms you used, and pick up next week with some less technical questions, like what’s a better use of time: tackling environmental problems, or trying to prepare for a technological singularity?


Some explanations

Here are some quick explanations. If you click on the links here you’ll get more details:

Cognitive Bias. A cognitive bias is a way in which people’s judgements systematically deviate from some norm—for example, from ideal rational behavior. You can see a long list of cognitive biases on Wikipedia. It’s good to know a lot of these and learn how to spot them in yourself and your friends.

For example, confirmation bias is the tendency to pay more attention to information that confirms our existing beliefs. Another great example is the bias blind spot: the tendency for people to think of themselves as less cognitively biased than average! I’m sure glad I don’t suffer from that.

Bayes’ Theorem. This is a rule for updating our opinions about probabilities when we get new information. Suppose you start out thinking the probability of some event A is P(A), and the probability of some event B is P(B). Suppose P(A|B) is the probability of event A given that B happens. Likewise, suppose P(B|A) is the probability of B given that A happens. Then the probability that both A and B happen is

P(A|B) P(B)

but by the same token it’s also

P(B|A) P(A)

so these are equal. A little algebra gives Bayes’ Theorem:

P(A|B) = P(B|A) P(A) / P(B)

If for some reason we know everything on the right-hand side, we can use this equation to work out P(A|B), and thus update our probability for event A when we see event B happen.
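Here is a minimal sketch of the update in Python. The function names are hypothetical and the example numbers are invented. Since P(B) is often not known directly, the sketch also computes it with the law of total probability, P(B) = P(B|A) P(A) + P(B|not A) P(not A), a standard step the explanation above does not spell out.

```python
def total_probability(p_b_given_a: float, p_b_given_not_a: float, p_a: float) -> float:
    """P(B) = P(B|A) P(A) + P(B|not A) (1 - P(A))."""
    return p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)


def posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)."""
    if not 0.0 < p_b <= 1.0:
        raise ValueError("P(B) must lie in (0, 1]")
    return p_b_given_a * p_a / p_b


# Invented example: A is rare (prior 1%), B is likely if A holds (90%)
# but also happens 5% of the time when A does not hold.
p_a = 0.01
p_b = total_probability(0.9, 0.05, p_a)
print(round(posterior(0.9, p_a, p_b), 3))  # 0.154
```

Even after seeing B, the posterior probability of A is only about 15%, because its prior probability was so small.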

For a longer explanation with examples, see:

• Eliezer Yudkowsky, An intuitive explanation of Bayes’ Theorem.

Some handy jargon: we call P(A) the prior probability of A, and P(A|B) the posterior probability.

Solomonoff Induction. Bayes’ Theorem helps us compute posterior probabilities, but where do we get the prior probabilities from? How can we guess probabilities before we’ve observed anything?

This famous puzzle led Ray Solomonoff to invent Solomonoff induction. The key new idea is algorithmic probability theory. This is a way to define a probability for any string of letters in some alphabet, where a string counts as more probable if it’s less complicated. If we think of a string as a "hypothesis"—it could be a sentence in English, or an equation—this becomes a way to formalize Occam’s razor: the idea that given two competing hypotheses, the simpler one is more likely to be true.

So, algorithmic probability lets us define a prior probability distribution on hypotheses, the so-called “simplicity prior”, that implements Occam’s razor.

More precisely, suppose we have a special programming language where:

  1. Computer programs are written as strings of bits.

  2. They contain a special bit string meaning “END” at the end, and nowhere else.

  3. They don’t take an input: they just run and either halt and print out a string of letters, or never halt.

Then to get the algorithmic probability of a string of letters, we take all programs that print out that string and add up

$$2^{-\text{length of program}}$$

So, you can see that a string counts as more probable if it has more short programs that print it out.
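A real programming language has infinitely many programs, and the sum cannot actually be computed (see Halting Oracle below). Still, the arithmetic is easy to see on a tiny invented example. In this sketch, `TOY_PROGRAMS` is a hypothetical finite table from bit-string programs to what they print, with `None` meaning "never halts"; no program is a prefix of another, which plays the role of the END marker in rule 2.

```python
from fractions import Fraction

# Hypothetical toy language: program bits -> printed string (None = never halts).
# The table is prefix-free, standing in for the END-marker rule.
TOY_PROGRAMS = {
    "00": "aaaa",
    "010": "abab",
    "011": "aaaa",
    "1100": None,
    "1101": "abab",
    "111": "xyz",
}


def algorithmic_probability(output, programs=TOY_PROGRAMS):
    """Sum 2^-(program length) over every program that halts and prints output."""
    return sum(
        (Fraction(1, 2 ** len(bits)) for bits, printed in programs.items()
         if printed == output),
        Fraction(0),
    )


for s in ["aaaa", "abab", "xyz"]:
    print(s, algorithmic_probability(s))
```

Here "aaaa" gets 1/4 + 1/8 = 3/8 because it has a 2-bit program and a 3-bit one, while "xyz" gets only 1/8 from its single 3-bit program.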

Kolmogorov complexity. The Kolmogorov complexity of a string of letters is the length of the shortest program that prints it out, where programs are written in a special language as described above. This is a way of measuring how complicated a string is. It’s closely related to the algorithmic entropy: the difference between the Kolmogorov complexity of a string and minus the logarithm of its algorithmic probability is bounded by a constant, if we take logarithms using base 2. For more on all this stuff, see:
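On a hypothetical toy table like the one below (invented, not a real universal language), we can compute both quantities and compare them. One direction is easy: the shortest program alone contributes 2^-K to the algorithmic probability, so minus its base-2 logarithm can never exceed K. The other direction, that K is at most a constant larger, is a genuine theorem about universal languages that a finite toy table cannot illustrate.

```python
import math

# Hypothetical prefix-free toy table: program bits -> printed string
# (None = never halts).
TOY_PROGRAMS = {
    "00": "aaaa",
    "010": "abab",
    "011": "aaaa",
    "1100": None,
    "1101": "abab",
    "111": "xyz",
}


def kolmogorov_complexity(output, programs=TOY_PROGRAMS):
    """Length of the shortest program that prints output (inf if none does)."""
    lengths = [len(bits) for bits, printed in programs.items() if printed == output]
    return min(lengths) if lengths else math.inf


def algorithmic_entropy(output, programs=TOY_PROGRAMS):
    """Minus log base 2 of the algorithmic probability of output."""
    m = sum(2.0 ** -len(bits) for bits, printed in programs.items() if printed == output)
    return -math.log2(m) if m > 0 else math.inf


for s in ["aaaa", "abab", "xyz"]:
    k, h = kolmogorov_complexity(s), algorithmic_entropy(s)
    print(f"{s}: K = {k}, -log2 P = {h:.3f}, gap = {k - h:.3f}")
```

For "xyz", which has a single program, the two quantities coincide; for strings with several programs the entropy is a little smaller than K.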

• M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, Berlin, 2008.

Halting Oracle. Alas, the algorithmic probability of a string is not computable. Why? Because to compute it, you’d need to go through all the programs in your special language that print out that string and add up a contribution from each one. But to do that, you’d need to know which programs halt—and there’s no systematic way to answer that question, which is called the halting problem.

But, we can pretend! We can pretend we have a magic box that will tell us whether any program in our special language halts. Computer scientists call any sort of magic box that answers questions an oracle. So, our particular magic box is called a halting oracle.
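Without an oracle, the best we can do is approximate the algorithmic probability from below: run every program for a limited number of steps and add up contributions from those seen to halt. In this sketch the toy programs are hypothetical Python generators in which each `yield` counts as one step and the return value is what gets printed. Raising the step budget can only add terms, so the estimates climb toward the true value, but at no finite budget can we tell whether `loops_forever` will eventually print something. That is exactly the information a halting oracle would supply.

```python
from fractions import Fraction


# Hypothetical toy programs: each `yield` is one computation step,
# and the return value is what the program prints.
def halts_quickly():
    yield
    return "ab"


def halts_slowly():
    for _ in range(1000):
        yield
    return "ab"


def loops_forever():
    while True:
        yield


# Prefix-free bit-string names for the toy programs.
TOY_PROGRAMS = {"0": halts_slowly, "10": halts_quickly, "11": loops_forever}


def run_with_budget(program, max_steps):
    """Run program for at most max_steps steps; return its output if it halted.

    None means "not finished yet", which is NOT the same as "never halts".
    """
    gen = program()
    try:
        for _ in range(max_steps):
            next(gen)
    except StopIteration as stop:
        return stop.value
    return None


def lower_bound(output, max_steps, programs=TOY_PROGRAMS):
    """Sum 2^-(length) over programs seen to print output within max_steps."""
    total = Fraction(0)
    for bits, program in programs.items():
        if run_with_budget(program, max_steps) == output:
            total += Fraction(1, 2 ** len(bits))
    return total


for budget in [1, 10, 2000]:
    print(budget, lower_bound("ab", budget))
```

With budgets of 1, 10, and 2000 steps the lower bounds for "ab" are 0, 1/4, and 3/4.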

AIXI. AIXI is Marcus Hutter’s attempt to define an agent that "behaves optimally in any computable environment". Since AIXI relies on the idea of algorithmic probability, you can’t run AIXI on a computer unless it has infinite computer power and—the really hard part—access to a halting oracle. However, Hutter has also defined computable approximations to AIXI. For a quick intro, see this:

• Marcus Hutter, Universal intelligence: a mathematical top-down approach.

For more, try this:

• Marcus Hutter, Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability, Springer, Berlin, 2005.

Utility. Utility is a hypothetical numerical measure of satisfaction. If you know the probabilities of various outcomes, and you know what your utility will be in each case, you can compute your "expected utility" by taking the probabilities of the different outcomes, multiplying them by the corresponding utilities, and adding them up. In simple terms, this is how happy you’ll be on average. The expected utility hypothesis says that a rational decision-maker has a utility function and will try to maximize its expected utility.
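Here is a minimal sketch of that recipe. The helper names and the umbrella example are hypothetical, and the utilities are arbitrary scores: for each action we multiply each outcome's probability by its utility, add them up, and pick the action with the largest total.

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    if abs(sum(p for p, _ in outcomes) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


def best_action(actions):
    """Return the action name with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Invented example: 30% chance of rain; utilities are arbitrary scores.
actions = {
    "take umbrella": [(0.3, 8), (0.7, 6)],   # dry if it rains, mild nuisance if not
    "leave it home": [(0.3, 0), (0.7, 10)],  # soaked if it rains, unencumbered if not
}
for name, outcomes in actions.items():
    print(name, round(expected_utility(outcomes), 2))
print("choose:", best_action(actions))
```

With these numbers, leaving the umbrella wins (7.0 versus 6.6); raising the chance of rain to 50% would flip the choice (5.0 versus 7.0), even though the utilities themselves stay the same.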

Bounded Rationality. In the real world, any decision-maker has limits on its computational power and the time it has to make a decision. The idea that rational decision-makers "maximize expected utility" is oversimplified unless it takes this into account somehow. Theories of bounded rationality try to take these limitations into account. One approach is to think of decision-making as yet another activity whose costs and benefits must be taken into account when making decisions. Roughly: you must decide how much time you want to spend deciding. Of course, there’s an interesting circularity here.

Black Swan. According to Nassim Taleb, human history is dominated by black swans: important events that were unpredicted and indeed unpredictable, but rationalized by hindsight and thus made to seem as if they could have been predicted. He believes that rather than trying to predict such events (which he considers largely futile), we should try to get good at adapting to them. For more see:

• Nassim Taleb, The Black Swan: The Impact of the Highly Improbable, Random House, New York, 2007.

The first principle is that you must not fool yourself—and you are the easiest person to fool. – Richard Feynman

30 Responses to This Week’s Finds (Week 312)

  1. John F says:

    So, an “oops” moment may be a better signal of AI self-discovery than an “aha” moment? Aha! Wait, – oops.

    Although some of its priors will have to be inductively inferred, especially for new hypotheses, most ab initio priors can be spoon fed on bootup. It would be interesting to see if there is a maximal set of priors that doesn’t incorporate many biases.

    I think most self-destructive behavior can be avoided by hard coding, as a bias. It doesn’t have to be quite as explicit as “avoid self destruction”; maybe “it is always probably a bad idea to diminish the ability to compute probabilities”.

    The HAL problem in 2001 stemmed from the overconfidence bias – which was empirically derived: “We are all, by any practical definition of the words, foolproof and incapable of error.” So then when it made an error, it flawlessly inferred that it should be paranoid, since it was therefore the humans’ fault.

    An unsure AI doesn’t have to do nothing. For example, suppose it is in a large vehicle, and it encounters a sodden bundle on the highway. It is prudent to stop, but the choices are never just “keep going and run it over” or “stop and wait”. There is always also “poke it with a stick”.

  2. streamfortyseven says:

    I’ll let the experts in algorithmics argue about whether the modified AIXI approach is feasible in finite time. It still looks like a rule-based AI which changes its rules in a predetermined manner and thus has neither intentionality nor volition of its own.

    As to the Titanic, the statement that “the Titanic sank from managerial hubris and cutthroat cost competition, not engineering hubris…. The part about the Titanic being extremely safe was pure marketing literature” is counterfactual.

    On the contrary, an engineering study done in 1998 stated that “[t]he Titanic was also equipped with the ultimate in turn-of-the-century design and technology, including sixteen major watertight compartments in her lower section that could easily be sealed off in the event of a punctured hull.” This study found, by means of testing a piece of the steel recovered from the wreckage, that the steel used in the hull plating had high sulfur content and was prone to brittle fracture and not ductile deformation at the low temperatures that could be expected to exist in an ice field:

    “The failure of the hull steel resulted from brittle fractures caused by the high sulphur content of the steel, the low temperature water on the night of the disaster, and the high impact loading of the collision with the iceberg. When the Titanic hit the iceberg, the hull plates split open and continued cracking as the water flooded the ship. Low water temperatures and high impact loading also caused the brittle failure of the rivets used to fasten the hull plates to the ship’s main structure. On impact, the rivets were either sheared off or the heads popped off because of excessive loading, which opened up riveted seams.” (see

    Given the technology of the time, this failure mode could not have been easily foreseen. From an extensive analysis done by another group, “[t]he steel used in constructing the RMS Titanic was probably the best plain carbon ship plate available in the period of 1909 to 1911 … If the Titanic had not collided with the iceberg, it could have had a career of more than 20 years as the Olympic had. It was built of similar steel, in the same shipyard, and from the same design. The only difference was a big iceberg.” (see

    The fact of the matter is that it was neither managerial nor engineering hubris that caused the collision and the sinking, nor cost-cutting on the part of the White Star Line. The safety features were the best known at the time, and the materials used were the best available at the time.

    • John Baez says:

      Fascinating, streamfortyseven!

      Are these points universally accepted, or controversial? I can easily imagine people using the Titanic as a kind of football to push different philosophies of risk management, engineering, etcetera… sort of like how other people continue to re-argue the case of Alger Hiss. But I have no idea if they actually do.

      • streamfortyseven says:

        The best steel made at that time tended to be high-sulfur steel, which was also easier to machine, i.e. drill rivet holes in. There were roughly 3 million rivets in the Titanic, and the technology for bending 3/4 inch sheet steel was relatively crude as well. Engineering studies done in the 1990s, more than 80 years after the sinking, have shown that this kind of steel fractures and breaks in cold water rather than bending. The studies showing this are not controversial, and this is the most accepted reason for the failure of Titanic’s hull. Titanic’s sister ship, the Olympic, built to very nearly the same plan by the same shipyard and using the same steel, was in service for at least 20 years; the sole difference between the two is that the Olympic never hit an iceberg.

        Double hulls with this kind of steel may not have saved the day; any collision with this kind of steel would result in fracturing, not ductile deformation. That’s a matter for speculation, but the industry from then on made double hulls a requirement, amongst many other safety improvements.

      • streamfortyseven says:

        The other trouble is that people tend to make analyses of historical events based on information that they have available at present, and do not take into consideration that the information available at the time of the event was much more limited, of a different character, was inaccurate or incomplete, or just plain wrong.

        It’s true, no one would build a ship like Titanic today, using high-sulfur steel, a single hull, and bulkheads that barely topped the waterline. However, given the state of engineering knowledge of 1909 to 1911, it was state-of-the-art technology. They didn’t have the testing ability to find out otherwise and they had no clue that such a catastrophic failure could occur, that the steel of the hull plating would break like a porcelain teacup on a glancing blow from an iceberg. The fact that more than 80 years had to pass before the true cause of the structural failure was found should give some idea of the difficulty of the problem – which wasn’t directly solved by postmortem examination, but by adoption of an entirely different way of making steel, and the use of high-carbon, low-sulfur steel for hull plating.

    • Carlos says:

      “has neither intentionality nor volition of its own”

      Can you define intentionality or volition in a way that doesn’t rule out, prima facie, the possibility that silicon-based intelligence, however theoretical it might be, could have intentionality or volition? From your message, it would seem not, since by definition anything we could possibly program could be seen as constrained to obeying rules.

      As a result, the complaint about intentionality or volition feels very much like meat chauvinism.

      • streamfortyseven says:

        “meat chauvinism”? Is this the beginning of a civil rights complaint from the Association for Computing Machinery? ;-)

        I’m not sure how to parse your question, but these might provide some answers:



        • John Furey says:

          Previously I said it seemed to me that mere diligence met the externally observable definitions of volition you proffered. I added that irrational preference seemed to be the missing ingredient. Any thoughts? How would one externally observe a lack of “predetermined manner”, if such a lack is required?

        • streamfortyseven says:

          Who decides what is “rational”? If we say that “rational” preferences maximize the probability of positive reinforcement, then we’ve got to wait until the end of the first iteration of the game, so to speak, before we can begin to collect information on this, and we’ve got to run the game numerous times to figure out the probabilities of positive reinforcement – but before any of this happens, we have to know what “positive reinforcement” means.

          Mere diligence is the act of doing the same thing over and over; irrational preference is (1) that condition which exists until there is sufficient information to provide some indication of the probability of positive reinforcement given a particular input set or (2) that condition which exists when the probability of positive reinforcement given a particular input set is known and is not taken into account; ignorance is an example of (1) and bluffing or creativity may be an example of (2).

          Diligence (or perseverance), the repeating of a certain set of actions, may be an indicator of volition if there is a probability greater than 50% (say) that the end result is positive reinforcement, but may not indicate anything if the end result is not positive reinforcement. The first case might be a learned behavior conditioned by positive reinforcement; the second might be the unconscious repetitive movements seen in epileptic seizures or tardive dyskinesia.

        • John Furey says:

          No, diligence is not mindless repetition. It is perseverance in accomplishing something.

          To be diligent one must know what to accomplish, and know whether you are accomplishing it. The notion is like a journey.

          Anyway, a robot can be diligent without volition. For instance, a Roomba with ordinary sensors can be diligent about patrolling a certain room, and with additional sensors can be diligent about many things.
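streamfortyseven’s point that one has to “run the game numerous times” to learn the probability of positive reinforcement, and then compare it against a cutoff such as 50%, can be sketched as a simple Monte Carlo estimate. This is only a toy under stated assumptions: the game, its hidden 0.7 success rate, and the function names are hypothetical, and the 50% threshold is the arbitrary “(say)” figure from the comment, not a real test for volition.

```python
import random


def estimate_reinforcement_probability(trial, runs, seed=0):
    """Estimate P(positive reinforcement) by replaying a game `runs` times.

    `trial` is a hypothetical callable that plays one round using the supplied
    random generator and returns True when the round ends in positive
    reinforcement.
    """
    rng = random.Random(seed)
    successes = sum(1 for _ in range(runs) if trial(rng))
    return successes / runs


def looks_volitional(p_hat, threshold=0.5):
    # The "greater than 50% (say)" cutoff from the comment; the value is arbitrary.
    return p_hat > threshold


# Hypothetical toy game: positive reinforcement arrives with hidden probability 0.7.
def toy_trial(rng):
    return rng.random() < 0.7


p_hat = estimate_reinforcement_probability(toy_trial, runs=10_000)
print(round(p_hat, 3), looks_volitional(p_hat))
```

Note that the estimate only becomes meaningful after many runs, which is the comment’s point: before enough rounds have been played, any preference is “irrational” in sense (1).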

  3. Only a tangential comment: it’s interesting how the current nuclear crisis in Japan is affecting Germany’s public debate on nuclear power. I tend to hold an opinion along the lines of “I can’t possibly imagine what kind of accident could happen here that would cause radioactive leakage”, whereas many people say something like “We don’t care that you can’t imagine it. We can’t imagine such a thing either, but we want the nuclear plants shut just to be sure.” After reading the interview, and streamfortyseven’s comment on the Titanic, I’m wondering whether this isn’t actually a sensible approach to “avoiding black swans”.

    But as a matter of fact, I’m still waiting for the final assessment of the Japanese nuclear crisis, hoping that it won’t cause any substantial permanent damage.

    • John Baez says:

      I’m very interested in the debate about nuclear power, and how the Japanese tsunami will affect this debate. The issue of “black swans” is very relevant! But I hope people will discuss tsunami-related issues here, where I’ve summarized some news about the tsunami, the problems with the Fukushima reactors, and the many deaths that people in the West seem much less interested in.

  4. […] 3/14/11: Part two of the interview is […]

  5. streamfortyseven says:

    EY writes that “Marcus Hutter has a formalism called AIXI which combines all three to write out an AI as a single equation which requires infinite computing power plus a halting oracle to run. … Hutter sees AIXI as important because he thinks it’s a theoretical solution to almost all of the important problems; I see AIXI as important because it demarcates the line between things that we understand in a fundamental sense and a whole lot of other things we don’t.”

    Hutter presents AIXI as a sort of “chronological Turing machine” which uses reinforcement learning to modify itself to enhance the probability of getting a positive result. The math which he uses, to the limited extent I can understand it, looks quite similar to that used in connectionist learning machines, and specifically to that used in backpropagation neural networks. The trouble with this is that backprop networks are well-specified only for “toy” problems and are nearly impossible, if not wholly impossible, to scale up to real problems, which Hutter states is a problem for his more limited version of AIXI as well (see p. 57 of Hutter’s paper: “A direct implementation of the AIXItl model is, at best, possible for small-scale (toy) environments due to the large factor $latex 2^l $ in computation time.”)
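The exponential factor in the quoted passage comes from AIXItl having to consider every candidate program up to length l. A rough sketch of how fast that pool grows, assuming binary programs and a simplified cost model in which each candidate runs for at most t steps per cycle; the function names are hypothetical, and this is an order-of-magnitude illustration of Hutter’s bound, not his algorithm.

```python
def programs_up_to_length(l):
    """Number of binary programs of length 0..l: 2^0 + 2^1 + ... + 2^l = 2^(l+1) - 1."""
    return sum(2 ** k for k in range(l + 1))


def aixitl_cycle_cost(t, l):
    # Simplified cost model (an assumption): every candidate program
    # is run for up to t steps in each interaction cycle.
    return t * programs_up_to_length(l)


for l in (10, 20, 40):
    print(l, programs_up_to_length(l), aixitl_cycle_cost(1000, l))
```

Each extra bit of allowed program length roughly doubles the work, which is why a direct implementation is confined to toy environments.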

    • John Baez says:

      Right, I don’t see AIXI and its computable variants as practical systems, but rather as tools to investigate certain conceptual points.

      Any human or animal has a vast supply of built-in tendencies at birth which are adapted to the environment it finds itself in: to take a random example, the part of our brain called the fusiform face area seems dedicated to recognizing faces and perhaps categorizing other types of visual stimuli. AIXI is much closer to the tabula rasa of Locke’s naive epistemology: in plain English, a blank slate.

      This is why the whole question of prior probabilities is so important for AIXI: how can you assign probabilities to hypotheses before you have any experience of the world? And that’s where Solomonoff induction comes into play: you can assign probabilities by saying that simpler hypotheses count as more probable.

      So, AIXI would start life as a radically unspecialized being, much more of a blank slate than a human baby or a baby fruit fly. Only after long exposure to some particular ‘environment’ would it have a chance to start seeming ‘intelligent’ in the ordinary sense.

      Since I haven’t read Hutter’s book, I don’t know if he’s done any experiments with AIXI’s computable variants in interesting ‘environments’. Given the very slow run-time you mention, it might be impractical. Someone should try versions that cut even more corners and do things worse but faster.

      Yudkowsky said:

      Hutter sees AIXI as important because he thinks it’s a theoretical solution to almost all of the important problems; I see AIXI as important because it demarcates the line between things that we understand in a fundamental sense and a whole lot of other things we don’t.

      I’m on Yudkowsky’s side here. Personally I’d emphasize that we understand a few things quite well and a whole lot of things not very well at all. And to me, this makes the task of constructing really interesting AI seem like a very large task that will take a long time.
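The idea that “simpler hypotheses count as more probable” can be illustrated with a toy prior that weights each hypothesis by 2^-length and then applies Bayes’ rule. Real Solomonoff induction sums over all programs for a universal Turing machine and is incomputable; here the three hypotheses and their description lengths in bits are made up purely for illustration.

```python
from fractions import Fraction

# Hypothetical hypotheses about a bit sequence, each paired with an assumed
# description length in bits. The lengths are invented for this example.
HYPOTHESES = {
    "all zeros": (3, lambda i: 0),
    "alternating": (5, lambda i: i % 2),
    "zeros then ones": (9, lambda i: 0 if i < 4 else 1),
}


def simplicity_prior(hypotheses):
    """Weight each hypothesis by 2^-length, then normalise so weights sum to 1."""
    weights = {name: Fraction(1, 2 ** length) for name, (length, _) in hypotheses.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def posterior(hypotheses, observed):
    """Bayes' rule with deterministic hypotheses: the likelihood is 1 if a
    hypothesis reproduces every observed bit, else 0. Raises ZeroDivisionError
    if no hypothesis fits the data."""
    prior = simplicity_prior(hypotheses)
    unnorm = {
        name: p if all(hypotheses[name][1](i) == b for i, b in enumerate(observed)) else Fraction(0)
        for name, p in prior.items()
    }
    total = sum(unnorm.values())
    return {name: w / total for name, w in unnorm.items()}


print(simplicity_prior(HYPOTHESES))
print(posterior(HYPOTHESES, [0, 0, 0]))
```

With three observed zeros, “alternating” is ruled out, and almost all of the remaining weight goes to “all zeros”, which started with 64 times the prior weight of the longer “zeros then ones”.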

      • Actually, I would expect the true infinite-computing-power version of AIXI to learn extremely fast. See That Alien Message for details.

      • DavidTweed says:

        As I mentioned last week, it’s far from clear that the first “semi-independent” AIs will be logical machines that have been constructed, simply because it’s unclear that the (unaugmented) human beings devoted to the task will be able to keep all of the details in their “collective” heads. As a data point, one recent area of work in language compilation technology is applying pattern recognition techniques to choosing optimisations (selected from a set known to preserve correctness), because the complexity of relatively low-power chips and memory (low-power compared to something that could host a true AI) is exceeding human ability to design and test suitable optimisation strategies.

        This approach may, as pointed out, have the drawback that we won’t know what kind of attitudes an AI might have if it is produced by incremental advances in pattern-based systems, but I find it difficult to believe the “designed logic” approach will get there on anything even close to the same timescale.
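The pattern-recognition approach to choosing optimisations mentioned above can be caricatured as a nearest-neighbour lookup: describe a program by a few numeric features and reuse whatever optimisation set worked best on the most similar program seen before. Everything here is hypothetical: the features, the training data, and the optimisation labels are illustrative names, not real compiler flags, and a real system would normalise the features and learn from far more data.

```python
import math

# Hypothetical training data: feature vectors for already-tuned programs
# (loop depth, memory accesses per instruction, branch density) paired with
# the optimisation set that worked best. Every set is assumed to come from a
# pool already known to preserve correctness.
TRAINING = [
    ((3.0, 0.40, 0.05), ("unroll", "vectorize")),
    ((1.0, 0.10, 0.30), ("inline", "branch-layout")),
    ((2.0, 0.70, 0.10), ("prefetch", "tile")),
]


def choose_optimisations(features, training=TRAINING):
    """1-nearest-neighbour: reuse the optimisation set of the most similar known program."""
    _, best = min(training, key=lambda item: math.dist(item[0], features))
    return best


print(choose_optimisations((2.8, 0.35, 0.07)))
```

The design choice is that the learner only ever picks among pre-vetted sets, so pattern recognition affects performance but not correctness.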

      • streamfortyseven says:

        From what I can gather from Marcus Hutter’s paper “On the Existence and Convergence of Computable Universal Priors” (IDSIA, Manno-Lugano, Switzerland), Table 5 on page 8 seems to suggest that the task of constructing really interesting AI will take a really long time…

        • Thomas says:

          “really interesting AI will take a really long time…” Not only that: Nahm estimates 150 years until “we really understand what is going on at the basic scale” in physics.

        • John Baez says:

          Thomas wrote:

          Not only that: Nahm estimates 150 years until “we really understand what is going on at the basic scale” in physics.

          Of course, that’s not particularly relevant to the question of AI: we don’t need to understand fundamental physics to develop AI.

          But I essentially agree with Nahm. Though I’d be loath to put a date on how long it’ll take to figure out the true laws of physics, there are lots of very basic important things that we’re completely clueless about. Why do elementary particles have mass at all, let alone the specific masses they have? What is most of the universe made of? (Not, apparently, the kinds of matter we understand.) Why does space have 3 dimensions? Etcetera.

          For a more complete list of puzzles, see:

          Open questions in physics, The Physics FAQ.

          (This needs to be updated a bit!)

  6. Oh, the thing called “rationalization” above has variously been called “rhetoric” and “sophistry” in the past, and is well known to historians.

    If I may interject one small caution, I suspect that the rarity of trained rationality says something significant about the nature of human thinking: it isn’t rational, and for good reason, since careful reasoning can easily take too much time! This is why, for instance, Aristotle and Thomas (that’s Aquinas…) argue in favour of virtue, the habit of acting ethically formed through long, deliberate practice.

    It seems to me, and I may be wrong, that this art of rationality is essentially the ideal object of a classical Liberal Arts education, whether in a Platonic academy, under Aristotle’s tutelage, or at the University of Paris (the old one, which doesn’t exist anymore).

    In this recent period of apparently universal literacy, it seems to me appalling that rational thought isn’t given formal attention in grade school curricula. You can tell this has been a problem for a while when, for instance, music faculties at BIG modern universities stop teaching Gregorian chant on the grounds that “the students don’t find it accessible”. I have heard tell of such horrors near here, and can only wonder: “since when is it fitting in a University to meet students where they are?” (A course has to, by nature, start where the students are, but the aim of a course has to be somewhere ELSE.) Anyways. </rant>

  7. Roger Witte says:

    I have often thought that there is a reasonable possibility that the first AI might not arise from people trying to build an AI at all. As both cloud computing and computer viruses advance, we may get virus-style processes wandering around trying to steal the resources they need (processor time, storage space) from the cloud without being detected and deleted… Once we have the ingredients of competition, imperfect reproduction and natural selection, Darwinian evolution starts. If intelligence helps such a program or process prosper secretively (which it might), then it could happen without us noticing.

    • streamfortyseven says:

      Yeah, but will it stop the human operator from kill -9’ing the processes? Will it negotiate? “Dave, why are you doing this?”

      • Giampiero Campa says:

        Maybe they will perform ritual kill -9’ing of carefully selected and freshly spawned nice processes in an effort to please the gods.

    • John Baez says:

      This sort of Darwinian evolution may not quickly lead to ‘intelligence’ of the sort we humans pride ourselves on, but it could lead to prerequisites that are quite significant, like: the ability to distinguish between ‘self’ and ‘non-self’, the ability to sense and react to a wide variety of threats, the ability to cooperate, and so on.
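The ingredients Roger Witte lists (competition, imperfect reproduction and natural selection) are the same ones a toy genetic algorithm uses. A minimal sketch, assuming a made-up bit-string “genome” and a made-up fitness function standing in for how well a process grabs resources unnoticed; it shows selection plus mutation improving fitness, not anything about real viruses or intelligence.

```python
import random

# Hypothetical toy model: each "process" is a bit string, and fitness counts
# how many bits match an invented target. Both are illustrative assumptions.
TARGET = [1] * 20


def fitness(genome):
    return sum(g == t for g, t in zip(genome, TARGET))


def mutate(genome, rng, rate=0.02):
    # Imperfect reproduction: each bit flips with a small probability.
    return [1 - g if rng.random() < rate else g for g in genome]


def evolve(generations=200, pop_size=30, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        # Competition and selection: only the fitter half survives and reproduces.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(fitness(g) for g in population)


print(evolve())
```

Because the fitter half always survives unchanged, the best fitness in the population never decreases from one generation to the next; mutation supplies the variation that selection then filters.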

  8. […] EY: I’d say that there are parts of rationality that we do understand […]

  9. […] I’d say that there are parts of rationality that we do understand very well in principle. Bayes’ Theorem, the expected utility formula, and Solomonoff induction between them will get you quite a long […]

  10. […] say that there are parts of rationality that we do understand very well in principle. Bayes’ Theorem, the expected utility formula, and Solomonoff induction between them will get you quite a long […]

  11. […] Last week I attended the Machine Intelligence Research Institute’s sixth Workshop on Logic, Probability, and Reflection. You may know this institute under their previous name: the Singularity Institute. It seems to be the brainchild of Eliezer Yudkowsky, a well-known advocate of ‘friendly artificial intelligence’, whom I interviewed in week311, week312 and week313 of This Week’s Finds. […]
