This Week’s Finds (Week 313)

25 March, 2011

Here’s the third and final part of my interview with Eliezer Yudkowsky. We’ll talk about three big questions… roughly these:

• How do you get people to work on potentially risky projects in a safe way?

• Do we understand ethics well enough to build “Friendly artificial intelligence”?

• What’s better to work on, artificial intelligence or environmental issues?

So, with no further ado:

JB: There are decent Wikipedia articles on “optimism bias” and “positive illusions”, which suggest that unrealistically optimistic people are more energetic, while more realistic estimates of success go hand-in-hand with mild depression. If this is true, I can easily imagine that most people working on challenging projects like quantum gravity (me, 10 years ago) or artificial intelligence (you) are unrealistically optimistic about our chances of success.

Indeed, I can easily imagine that the first researchers to create a truly powerful artificial intelligence will be people who underestimate its potential dangers. It’s an interesting irony, isn’t it? If most people who are naturally cautious avoid a certain potentially dangerous line of research, the people who pursue that line of research are likely to be less cautious than average.

I’m a bit worried about this when it comes to “geoengineering”, for example—attempts to tackle global warming by large engineering projects. We have people who say “oh no, that’s too dangerous”, and turn their attention to approaches they consider less risky, but that may leave the field to people who underestimate the risks.

So I’m very glad you are thinking hard about how to avoid the potential dangers of artificial intelligence—and even trying to make this problem sound exciting, to attract ambitious and energetic young people to work on it. Is that part of your explicit goal? To make caution and rationality sound sexy?

EY: The really hard part of the problem isn’t getting a few smart people to work on cautious, rational AI. It’s admittedly a harder problem than it should be, because there’s a whole system out there which is set up to funnel smart young people into all sorts of other things besides cautious rational long-term basic AI research. But it isn’t the really hard part of the problem.

The scary thing about AI is that I would guess that the first AI to go over some critical threshold of self-improvement takes all the marbles—first mover advantage, winner take all. The first pile of uranium to have an effective neutron multiplication factor greater than 1, or maybe the first AI smart enough to absorb all the poorly defended processing power on the Internet—there’s actually a number of different thresholds that could provide a critical first-mover advantage.

And it is always going to be fundamentally easier in some sense to go straight all out for AI and not worry about clean designs or stable self-modification or the problem where a near-miss on the value system destroys almost all of the actual value from our perspective. (E.g., imagine aliens who shared every single term in the human utility function but lacked our notion of boredom. Their civilization might consist of a single peak experience repeated over and over, which would make their civilization very boring from our perspective, compared to what it might have been. That is, leaving a single aspect out of the value system can destroy almost all of the value. So there’s a very large gap in the AI problem between trying to get the value system exactly right, versus throwing something at it that sounds vaguely good.)

You want to keep as much of an advantage as possible for the cautious rational AI developers over the crowd that is just gung-ho to solve this super interesting scientific problem and go down in the eternal books of fame. Now there should in fact be some upper bound on the combination of intelligence, methodological rationality, and deep understanding of the problem which you can possess, and still walk directly into the whirling helicopter blades. The problem is that it is probably a rather high upper bound. And you are trying to outrace people who are trying to solve a fundamentally easier wrong problem. So the question is not attracting people to the field in general, but rather getting the really smart competent people to either work for a cautious project or not go into the field at all. You aren’t going to stop people from trying to develop AI. But you can hope to have as many of the really smart people as possible working on cautious projects rather than incautious ones.

So yes, making caution look sexy. But even more than that, trying to make incautious AI projects look merely stupid. Not dangerous. Dangerous is sexy. As the old proverb goes, most of the damage is done by people who wish to feel themselves important. Human psychology seems to be such that many ambitious people find it far less scary to think about destroying the world, than to think about never amounting to much of anything at all. I have met people like this. In fact all the people I have met who think they are going to win eternal fame through their AI projects have been like this. The thought of potentially destroying the world is bearable; it confirms their own importance. The thought of not being able to plow full steam ahead on their incredible amazing AI idea is not bearable; it threatens all their fantasies of wealth and fame.

Now these people of whom I speak are not top-notch minds, not in the class of the top people in mainstream AI, like say Peter Norvig (to name someone I’ve had the honor of meeting personally). And it’s possible that if and when self-improving AI starts to get real top-notch minds working on it, rather than people who were too optimistic about/attached to their amazing bright idea to be scared away by the field of skulls, then these real stars will not fall prey to the same sort of psychological trap. And then again it is also plausible to me that top-notch minds will fall prey to exactly the same trap, because I have yet to learn from reading history that great scientific geniuses are always sane.

So what I would most like to see would be uniform looks of condescending scorn directed at people who claimed their amazing bright AI idea was going to lead to self-improvement and superintelligence, but who couldn’t mount an adequate defense of how their design would have a goal system stable after a billion sequential self-modifications, or how it would get the value system exactly right instead of mostly right. In other words, making destroying the world look unprestigious and low-status, instead of leaving it to the default state of sexiness and importance-confirmingness.

JB: “Get the value system exactly right”—now this phrase touches on another issue I’ve been wanting to talk about. How do we know what it means for a value system to be exactly right? It seems people are even further from agreeing on what it means to be good than on what it means to be rational. Yet you seem to be suggesting we need to solve this problem before it’s safe to build a self-improving artificial intelligence!

When I was younger I worried a lot about the foundations of ethics. I decided that you “can’t derive an ought from an is”—do you believe that? If so, all logical arguments leading up to the conclusion that “you should do X” must involve an assumption of the form “you should do Y”… and attempts to “derive” ethics are all implicitly circular in some way. This really bothered the heck out of me: how was I supposed to know what to do? But of course I kept on doing things while I was worrying about this… and indeed, it was painfully clear that there’s no way out of making decisions: even deciding to “do nothing” or commit suicide counts as a decision.

Later I got more comfortable with the idea that making decisions about what to do needn’t paralyze me any more than making decisions about what is true. But still, it seems that the business of designing ethical beings is going to provoke huge arguments, if and when we get around to that.

Do you spend as much time thinking about these issues as you do thinking about rationality? Of course they’re linked….

EY: Well, I probably spend as much time explaining these issues as I do rationality. There are also an absolutely huge number of pitfalls that people stumble into when they try to think about, as I would put it, Friendly AI. Consider how many pitfalls people run into when they try to think about Artificial Intelligence. Next consider how many pitfalls people run into when they try to think about morality. Next consider how many pitfalls philosophers run into when they try to think about the nature of morality. Next consider how many pitfalls people run into when they try to think about hypothetical extremely powerful agents, especially extremely powerful agents that are supposed to be extremely good. Next consider how many pitfalls people run into when they try to imagine optimal worlds to live in or optimal rules to follow or optimal governments and so on.

Now imagine a subject matter which offers discussants a lovely opportunity to run into all of those pitfalls at the same time.

That’s what happens when you try to talk about Friendly Artificial Intelligence.

And it only takes one error for a chain of reasoning to end up in Outer Mongolia. So one of the great motivating factors behind all the writing I did on rationality and all the sequences I wrote on Less Wrong was to actually make it possible, via two years’ worth of writing and probably something like a month’s worth of reading at least, to immunize people against all the usual mistakes.

Lest I appear to dodge the question entirely, I’ll try for very quick descriptions and google keywords that professional moral philosophers might recognize.

In terms of what I would advocate programming a very powerful AI to actually do, the keywords are “mature folk morality” and “reflective equilibrium”. This means that you build a sufficiently powerful AI to do, not what people say they want, or even what people actually want, but what people would decide they wanted the AI to do, if they had all of the AI’s information, could think about for as long a subjective time as the AI, knew as much as the AI did about the real factors at work in their own psychology, and had no failures of self-control.

There are a lot of important reasons why you would want to do exactly that and not, say, implement Asimov’s Three Laws of Robotics (a purely fictional device, and if Asimov had depicted them as working well, he would have had no stories to write), or build a superpowerful AI which obeys people’s commands interpreted in literal English, or create a god whose sole prime directive is to make people maximally happy, or any of the above plus a list of six different patches which guarantee that nothing can possibly go wrong, and various other things that seem like incredibly obvious failure scenarios but which I assure you I have heard seriously advocated over and over and over again.

In a nutshell, you want to use concepts like “mature folk morality” or “reflective equilibrium” because these are as close as moral philosophy has ever gotten to defining in concrete, computable terms what you could be wrong about when you order an AI to do the wrong thing.

For an attempt at nontechnical explanation of what one might want to program an AI to do and why, the best resource I can offer is an old essay of mine which is not written so as to offer good google keywords, but holds up fairly well nonetheless:

• Eliezer Yudkowsky, Coherent extrapolated volition, May 2004.

You also raised some questions about metaethics, where metaethics asks not “Which acts are moral?” but “What is the subject matter of our talk about ‘morality’?” i.e. “What are we talking about here anyway?” In terms of Google keywords, my brand of metaethics is closest to analytic descriptivism or moral functionalism. If I were to try to put that into a very brief nutshell, it would be something like “When we talk about ‘morality’ or ‘goodness’ or ‘right’, the subject matter we’re talking about is a sort of gigantic math question hidden under the simple word ‘right’, a math question that includes all of our emotions and all of what we use to process moral arguments and all the things we might want to change about ourselves if we could see our own source code and know what we were really thinking.”

The complete Less Wrong sequence on metaethics (with many dependencies to earlier ones) is:

• Eliezer Yudkowsky, Metaethics sequence, Less Wrong, 20 June to 22 August 2008.

And one of the better quick summaries is at:

• Eliezer Yudkowsky, Inseparably right; or, joy in the merely good, Less Wrong, 9 August 2008.

And if I am wise I shall not say any more.

JB: I’ll help you be wise. There are a hundred followup questions I’m tempted to ask, but this has been a long and grueling interview, so I won’t. Instead, I’d like to raise one last big question. It’s about time scales.

Self-improving artificial intelligence seems like a real possibility to me. But when? You see, I believe we’re in the midst of a global ecological crisis—a mass extinction event, whose effects will be painfully evident by the end of the century. I want to do something about it. I can’t do much, but I want to do something. Even if we’re doomed to disaster, there are different sizes of disaster. And if we’re going through a kind of bottleneck, where some species make it through and others go extinct, even small actions now can make a difference.

I can imagine some technological optimists—singularitarians, extropians and the like—saying: “Don’t worry, things will get better. Things that seem hard now will only get easier. We’ll be able to suck carbon dioxide from the atmosphere using nanotechnology, and revive species starting from their DNA.” Or maybe even: “Don’t worry: we won’t miss those species. We’ll be having too much fun doing things we can’t even conceive of now.”

But various things make me skeptical of such optimism. One of them is the question of time scales. What if the world goes to hell before our technology saves us? What if artificial intelligence comes along too late to make a big impact on the short-term problems I’m worrying about? In that case, maybe I should focus on short-term solutions.

Just to be clear: this isn’t some veiled attack on your priorities. I’m just trying to decide on my own. One good thing about having billions of people on the planet is that we don’t all have to do the same thing. Indeed, a multi-pronged approach is best. But for my own decisions, I want some rough guess about how long various potentially revolutionary technologies will take to come online.

What do you think about all this?

EY: I’ll try to answer the question about timescales, but first let me explain in some detail why I don’t think the decision should be dominated by that question.

If you look up “Scope Insensitivity” on Less Wrong, you’ll see that when three different groups of subjects were asked how much they would pay in increased taxes to save 2,000 / 20,000 / 200,000 birds from drowning in uncovered oil ponds, the respective average answers were $80 / $78 / $88. People asked questions like this visualize one bird, wings slicked with oil, struggling to escape, and that creates some amount of emotional affect which determines willingness to pay, and the quantity gets tossed out the window since no one can visualize 200,000 of anything. Another hypothesis to explain the data is “purchase of moral satisfaction”, which says that people give enough money to create a “warm glow” inside themselves, and the amount required might have something to do with your personal financial situation, but it has nothing to do with birds. Similarly, residents of four US states were only willing to pay 22% more to protect all 57 wilderness areas in those states than to protect one area. The result I found most horrifying was that subjects were willing to contribute more when a set amount of money was needed to save one child’s life, compared to the same amount of money saving eight lives—because, of course, focusing your attention on a single person makes the feelings stronger, less diffuse.

So while it may make sense to enjoy the warm glow of doing good deeds after we do them, we cannot possibly allow ourselves to choose between altruistic causes based on the relative amounts of warm glow they generate, because our intuitions are quantitatively insane.

And two antidotes that absolutely must be applied in choosing between altruistic causes are conscious appreciation of scope and conscious appreciation of marginal impact.

By its nature, your brain flushes right out the window the all-important distinction between saving one life and saving a million lives. You’ve got to compensate for that using conscious, verbal deliberation. The Society For Curing Rare Diseases in Cute Puppies has got great warm glow, but the fact that these diseases are rare should call a screeching halt right there—which you’re going to have to do consciously, not intuitively. Even before you realize that, contrary to the relative warm glows, it’s really hard to make a moral case for trading off human lives against cute puppies. I suppose if you could save a billion puppies using one dollar I wouldn’t scream at someone who wanted to spend the dollar on that instead of cancer research.

And similarly, if there are a hundred thousand researchers and billions of dollars annually that are already going into saving species from extinction—because it’s a prestigious and popular cause that has an easy time generating warm glow in lots of potential funders—then you have to ask about the marginal value of putting your effort there, where so many other people are already working, compared to a project that isn’t so popular.

I wouldn’t say “Don’t worry, we won’t miss those species”. But consider the future intergalactic civilizations growing out of Earth-originating intelligent life. Consider the whole history of a universe which contains this world of Earth and this present century, and also billions of years of future intergalactic civilization continuing until the universe dies, or maybe forever if we can think of some ingenious way to carry on. Next consider the interval in utility between a universe-history in which Earth-originating intelligence survived and thrived and managed to save 95% of the non-primate biological species now alive, versus a universe-history in which only 80% of those species are alive. That utility interval is not very large compared to the utility interval between a universe in which intelligent life thrived and intelligent life died out. Or the utility interval between a universe-history filled with sentient beings who experience happiness and have empathy for each other and get bored when they do the same thing too many times, versus a universe-history that grew out of various failures of Friendly AI.

(The really scary thing about universes that grow out of a loss of human value is not that they are different, but that they are, from our standpoint, boring. The human utility function says that once you’ve made a piece of art, it’s more fun to make a different piece of art next time. But that’s just us. Most random utility functions will yield instrumental strategies that spend some of their time and resources exploring for the patterns with the highest utility at the beginning of the problem, and then use the rest of their resources to implement the pattern with the highest utility, over and over and over. This sort of thing will surprise a human who expects, on some deep level, that all minds are made out of human parts, and who thinks, “Won’t the AI see that its utility function is boring?” But the AI is not a little spirit that looks over its code and decides whether to obey it; the AI is the code. If the code doesn’t say to get bored, it won’t get bored. A strategy of exploration followed by exploitation is implicit in most utility functions, but boredom is not. If your utility function does not already contain a term for boredom, then you don’t care; it’s not something that emerges as an instrumental value from most terminal values. For more on this see: “In Praise of Boredom” in the Fun Theory Sequence on Less Wrong.)

Anyway: In terms of expected utility maximization, even large probabilities of jumping the interval between a universe-history in which 95% of existing biological species survive Earth’s 21st century, versus a universe-history where 80% of species survive, are just about impossible to trade off against tiny probabilities of jumping the interval between interesting universe-histories, versus boring ones where intelligent life goes extinct, or the wrong sort of AI self-improves.
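
To see the shape of that comparison in numbers, here is a minimal sketch in Python. Every figure in it is invented purely for illustration; the point is only that a huge utility interval multiplied by a tiny probability shift can swamp a modest interval multiplied by a large shift.

```python
# Toy expected-utility comparison. Every number below is invented for
# illustration only; nothing here reflects a real estimate.

# Utility intervals (arbitrary units): how much better one universe-history
# is than the alternative.
interval_species = 1.0          # 95% vs. 80% of species surviving
interval_existential = 1.0e9    # thriving intelligent life vs. extinction or boring AI

# How much your marginal effort shifts the probability of the better outcome.
shift_species = 0.10            # a large shift on the species question
shift_existential = 1.0e-6      # a tiny shift on the existential question

ev_species = shift_species * interval_species
ev_existential = shift_existential * interval_existential

print(f"Expected-utility gain, species work:      {ev_species:.3g}")
print(f"Expected-utility gain, existential work:  {ev_existential:.3g}")
# With these made-up numbers the existential term dominates by a factor of 10,000.
```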

I honestly don’t see how a rationalist can avoid this conclusion: At this absolutely critical hinge in the history of the universe—Earth in the 21st century—rational altruists should devote their marginal attentions to risks that threaten to terminate intelligent life or permanently destroy a part of its potential. Those problems, which Nick Bostrom named “existential risks”, have got all the scope. And when it comes to marginal impact, there are major risks outstanding that practically no one is working on. Once you get the stakes on a gut level it’s hard to see how doing anything else could be sane.

So how do you go about protecting the future of intelligent life? Environmentalism? After all, there are environmental catastrophes that could knock over our civilization… but then if you want to put the whole universe at stake, it’s not enough for one civilization to topple, you have to argue that our civilization is above average in its chances of building a positive galactic future compared to whatever civilization would rise again a century or two later. Maybe if there were ten people working on environmentalism and millions of people working on Friendly AI, I could see sending the next marginal dollar to environmentalism. But with millions of people working on environmentalism, and major existential risks that are completely ignored… if you add a marginal resource that can, rarely, be steered by expected utilities instead of warm glows, devoting that resource to environmentalism does not make sense.

Similarly with other short-term problems. Unless they’re little-known and unpopular problems, the marginal impact is not going to make sense, because millions of other people will already be working on them. And even if you argue that some short-term problem leverages existential risk, it’s not going to be perfect leverage and some quantitative discount will apply, probably a large one. I would be suspicious that the decision to work on a short-term problem was driven by warm glow, status drives, or simple conventionalism.

With that said, there’s also such a thing as comparative advantage—the old puzzle of the lawyer who works an hour in the soup kitchen instead of working an extra hour as a lawyer and donating the money. Personally I’d say you can work an hour in the soup kitchen to keep yourself going if you like, but you should also be working extra lawyer-hours and donating the money to the soup kitchen, or better yet, to something with more scope. (See “Purchase Fuzzies and Utilons Separately” on Less Wrong.) Most people can’t work effectively on Artificial Intelligence (some would question if anyone can, but at the very least it’s not an easy problem). But there’s a variety of existential risks to choose from, plus a general background job of spreading sufficiently high-grade rationality and existential risk awareness. One really should look over those before going into something short-term and conventional. Unless your master plan is just to work the extra hours and donate them to the cause with the highest marginal expected utility per dollar, which is perfectly respectable.

Where should you go in life? I don’t know exactly, but I think I’ll go ahead and say “not environmentalism”. There’s just no way that the product of scope, marginal impact, and John Baez’s comparative advantage is going to end up being maximal at that point.

Which brings me to AI timescales.

If I knew exactly how to make a Friendly AI, and I knew exactly how many people I had available to do it, I still couldn’t tell you how long it would take because of Product Management Chaos.

As it stands, this is a basic research problem—which will always feel very hard, because we don’t understand it, and that means when our brain checks for solutions, we don’t see any solutions available. But this ignorance is not to be confused with the positive knowledge that the problem will take a long time to solve once we know how to solve it. It could be that some fundamental breakthrough will dissolve our confusion and then things will look relatively easy. Or it could be that some fundamental breakthrough will be followed by the realization that, now that we know what to do, it’s going to take at least another 20 years to do it.

I seriously have no idea when AI is going to show up, although I’d be genuinely and deeply shocked if it took another century (barring a collapse of civilization in the meanwhile).

If you were to tell me that as a Bayesian I have to put probability distributions on things on pain of having my behavior be inconsistent and inefficient, well, I would actually suspect that my behavior is inconsistent. But if you were to try and induce from my behavior a median expected time where I spend half my effort planning for less and half my effort planning for more, it would probably look something like 2030.

But that doesn’t really matter to my decisions. Among all existential risks I know about, Friendly AI has the single largest absolute scope—it affects everything, and the problem must be solved at some point for worthwhile intelligence to thrive. It also has the largest product of scope and marginal impact, because practically no one is working on it, even compared to other existential risks. And my abilities seem applicable to it. So I may not like my uncertainty about timescales, but my decisions are not unstable with respect to that uncertainty.

JB: Ably argued! If I think of an interesting reply, I’ll put it in the blog discussion. Thanks for your time.


The best way to predict the future is to invent it. – Alan Kay


Energy, the Environment, and What Mathematicians Can Do (Part 2)

20 March, 2011

A couple of days ago I begged for help with a math colloquium talk I’m giving this Wednesday at Hong Kong University.

The response was immediate and wonderfully useful. Thanks, everyone! If my actual audience is as knowledgeable and critical as you folks, I’ll be shocked and delighted.

But I only showed you the first part of the talk… because I hadn’t written the second part yet! And the second part is the hard part: it’s about “what mathematicians can do”.

Here’s a version including the second part:

Energy, the Environment, and What Mathematicians Can Do.

I include just one example of what you’re probably dying to see: a mathematician proving theorems that are relevant to environmental and energy problems. And you’ll notice that this guy is not doing work that will directly help solve these problems.

That’s sort of on purpose: I think we mathematicians sit sort of near the edge of the big conversation about these problems. We do important things, now and then, but their importance tends to be indirect. And I think that’s okay.

But it’s also a bit unsatisfying. What’s your most impressive example of a mathematically exciting result that also directly impacts environmental and energy issues?

I have a bunch of my own examples, but I’d like to hear yours. I want to start creating a list.

(By the way: research is just part of the story! One of the easier ways mathematicians can help save the planet is to teach well. And I do discuss that.)


Mathematics of Planet Earth

20 March, 2011

While struggling to prepare my talk on “what mathematicians can do”, I remembered this website pointed out by Tom Leinster:

Mathematics of Planet Earth 2013.

The idea is to get lots of mathematicians involved in programs on these topics:

• Weather, climate, and environment
• Health, human and social services
• Planetary resources
• Population dynamics, ecology and genomics of species
• Energy utilization and efficiency
• Connecting the planet together
• Geophysical processes
• Global economics, safety and stability

There are already a lot of partner societies (including the American Mathematical Society) and partner institutes. I would love to see more details, but this website seems directed mainly at getting more organizations involved, rather than saying what any of them are going to do.

There is a call for proposals, but it’s a bit sketchy. It says:

A call to join is sent to the planet.

which makes me want to ask “From where?”

(That must be why I’m sitting here blogging instead of heading an institute somewhere. I never fully grew up.)

I guess the details will eventually become clearer. Does anyone know some activities that have been planned?


Energy, the Environment, and What Mathematicians Can Do (Part 1)

18 March, 2011

I’m preparing a talk to give at Hong Kong University next week. It’s only half done, but I could use your feedback on this part while I work on the rest:

Energy, The Environment, and What Mathematicians Can Do.

So far it makes a case for why mathematicians should get involved in these issues… but doesn’t say what they can do to help! That’ll be the second part. So, you’ll just have to bear with the suspense for now.

By the way, all the facts and graphs should have clickable links that lead you to online references. The links aren’t easy to see, but if you hover the cursor over a fact or graph, and click, it should work.


This Week’s Finds (Week 312)

14 March, 2011

This is the second part of my interview with Eliezer Yudkowsky. If you click on some technical terms here, you’ll go down to a section where I explain them.

JB: You’ve made a great case for working on artificial intelligence—and more generally, understanding how intelligence works, to figure out how we can improve it. It’s especially hard to argue against studying rationality. Even most people who doubt computers will ever get smarter will admit the possibility that people can improve. And it seems clear that almost every problem we face could benefit from better thinking.

I’m intrigued by the title The Art of Rationality because it suggests that there’s a kind of art to it. We don’t know how to teach someone to be a great artist, but maybe we can teach them to be a better artist. So, what are some of the key principles when it comes to thinking better?

EY: Stars above, what an open-ended question. The idea behind the book is to explain all the drop-dead basic fundamentals that almost no one seems to know about, like what is evidence, what is simplicity, what is truth, the importance of actually changing your mind now and then, the major known cognitive biases that stop people from changing their minds, what it means to live in a universe where things are made of parts, and so on. This is going to be a book primarily aimed at people who are not completely frightened away by complex mathematical concepts such as addition, multiplication, and division (i.e., all you need to understand Bayes’ Theorem if it’s explained properly), albeit with the whole middle of the book being just practical advice based on cognitive biases for the benefit of people who don’t want to deal with multiplication and division. Each chapter is going to address a different aspect of rationality, not in full textbook detail, just enough to convey the sense of a concept, with each chapter being around 5-10,000 words broken into 4-10 bite-size sections of 500-2000 words each. Which of the 27 currently planned book chapters did you want me to summarize?

But if I had to pick just one thing, just one concept that’s most important, I think it would be the difference between rationality and rationalization.

Suppose there’s two boxes, only one of which contains a diamond. And on the two boxes there are various signs and portents which distinguish, imperfectly and probabilistically, between boxes which contain diamonds, and boxes which don’t. I could take a sheet of paper, and I could write down all the signs and portents that I understand, and do my best to add up the evidence, and then on the bottom line I could write, "And therefore, there is a 37% probability that Box A contains the diamond." That’s rationality. Alternatively, I could be the owner of Box A, and I could hire a clever salesman to sell Box A for the highest price he can get; and the clever salesman starts by writing on the bottom line of his sheet of paper, "And therefore, Box A contains the diamond", and then he writes down all the arguments he can think of on the lines above.

But consider: At the moment the salesman wrote down the bottom line on that sheet of paper, the truth or falsity of the statement was fixed. It’s already right or already wrong, and writing down arguments on the lines above isn’t going to change that. Or if you imagine a spread of probable worlds, some of which have different boxes containing the diamond, the correlation between the ink on paper and the diamond’s location became fixed at the moment the ink was written down, and nothing which doesn’t change the ink or the box is going to change that correlation.

That’s "rationalization", which should really be given a name that better distinguishes it from rationality, like "anti-rationality" or something. It’s like calling lying "truthization". You can’t make rational what isn’t rational to start with.

Whatever process your brain uses, in reality, to decide what you’re going to argue for, that’s what determines your real-world effectiveness. Rationality isn’t something you can use to argue for a side you already picked. Your only chance to be rational is while you’re still choosing sides, before you write anything down on the bottom line. If I had to pick one concept to convey, it would be that one.
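
A minimal sketch of the “add up the signs and portents” step in the two-box example: each independent sign contributes its log likelihood ratio, and the sum fixes the posterior odds. The likelihood ratios below are made up, chosen only so the answer lands near the 37% figure used above.

```python
import math

# Minimal sketch of "adding up the evidence" for the two-box parable.
# The likelihood ratios below are invented; they are chosen only so the
# answer comes out near the 37% used in the interview.

prior_prob = 0.5                       # no idea which box holds the diamond
likelihood_ratios = [2.0, 0.5, 0.59]   # P(sign | diamond in A) / P(sign | not in A)

# Work in log-odds so the signs literally "add up".
log_odds = math.log(prior_prob / (1 - prior_prob))
for lr in likelihood_ratios:
    log_odds += math.log(lr)

posterior_odds = math.exp(log_odds)
posterior_prob = posterior_odds / (1 + posterior_odds)
print(f"P(diamond in Box A) = {posterior_prob:.0%}")  # about 37% with these made-up inputs
```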

JB: Okay. I wasn’t really trying to get you to summarize a whole book. I’ve seen you explain a whole lot of heuristics designed to help us be more rational. So I was secretly wondering if the "art of rationality" is mainly a long list of heuristics, or whether you’ve been able to find a few key principles that somehow spawn all those heuristics.

Either way, it could be a tremendously useful book. And even if you could distill the basic ideas down to something quite terse, in practice people are going to need all those heuristics—especially since many of them take the form "here’s something you tend to do without noticing you’re doing it—so watch out!" If we’re saddled with dozens of cognitive biases that we can only overcome through strenuous effort, then your book has to be long. You can’t just say "apply Bayes’ rule and all will be well."

I can see why you’d single out the principle that "rationality only comes into play before you’ve made up your mind", because so much seemingly rational argument is really just a way of bolstering support for pre-existing positions. But what is rationality? Is it something with a simple essential core, like "updating probability estimates according to Bayes’ rule", or is its very definition inherently long and complicated?

EY: I’d say that there are parts of rationality that we do understand very well in principle. Bayes’ Theorem, the expected utility formula, and Solomonoff induction between them will get you quite a long way. Bayes’ Theorem says how to update based on the evidence, Solomonoff induction tells you how to assign your priors (in principle, it should go as the Kolmogorov complexity aka algorithmic complexity of the hypothesis), and then once you have a function which predicts what will probably happen as the result of different actions, the expected utility formula says how to choose between them.

Marcus Hutter has a formalism called AIXI which combines all three to write out an AI as a single equation which requires infinite computing power plus a halting oracle to run. And Hutter and I have been debating back and forth for quite a while on which AI problems are or aren’t solved by AIXI. For example, I look at the equation as written and I see that AIXI will try the experiment of dropping an anvil on itself to resolve its uncertainty about what happens next, because the formalism as written invokes a sort of Cartesian dualism with AIXI on one side of an impermeable screen and the universe on the other; the equation for AIXI says how to predict sequences of percepts using Solomonoff induction, but it’s too simple to encompass anything as reflective as "dropping an anvil on myself will destroy that which is processing these sequences of percepts". At least that’s what I claim; I can’t actually remember whether Hutter was agreeing with me about that as of our last conversation. Hutter sees AIXI as important because he thinks it’s a theoretical solution to almost all of the important problems; I see AIXI as important because it demarcates the line between things that we understand in a fundamental sense and a whole lot of other things we don’t.

So there are parts of rationality—big, important parts too—which we know how to derive from simple, compact principles in the sense that we could write very simple pieces of code which would behave rationally along that dimension given unlimited computing power.
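
Here is a toy illustration of that “simple pieces of code” claim, under invented assumptions: a tiny hypothesis space of coin biases, a crude simplicity-style prior standing in for Solomonoff induction, a Bayesian update on observed flips, and an expected-utility choice between two bets. Real Solomonoff induction is uncomputable; this only shows how the three pieces slot together.

```python
# Toy "rational along one dimension" agent: simplicity-weighted prior,
# Bayesian updating, then expected-utility action selection.
# Everything here is a stand-in; real Solomonoff induction is uncomputable.

# Hypotheses: possible biases of a coin, with a crude "simplicity" weight
# (a toy prior favoring simpler-sounding hypotheses, not Solomonoff's prior).
hypotheses = {0.5: 4.0, 0.25: 2.0, 0.75: 2.0, 0.9: 1.0}
total = sum(hypotheses.values())
prior = {h: w / total for h, w in hypotheses.items()}

# Bayes' Theorem: update on observed evidence (say, 3 heads out of 4 flips).
observed = [1, 1, 1, 0]
posterior = {}
for h, p in prior.items():
    likelihood = 1.0
    for flip in observed:
        likelihood *= h if flip == 1 else (1 - h)
    posterior[h] = p * likelihood
norm = sum(posterior.values())
posterior = {h: p / norm for h, p in posterior.items()}

# Expected utility formula: pick the action with the higher expected payoff.
# Action "bet_heads" pays 1 if the next flip is heads; "pass" pays 0.4 for sure.
p_heads = sum(h * p for h, p in posterior.items())
expected_utility = {"bet_heads": p_heads * 1.0, "pass": 0.4}
best = max(expected_utility, key=expected_utility.get)

print({h: round(p, 3) for h, p in posterior.items()})
print(f"P(next flip is heads) = {p_heads:.2f}, best action: {best}")
```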

But as soon as you start asking "How can human beings be more rational?" then things become hugely more complicated because human beings make much more complicated errors that need to be patched on an individual basis, and asking "How can I be rational?" is only one or two orders of magnitude simpler than asking "How does the brain work?", i.e., you can hope to write a single book that will cover many of the major topics, but not quite answer it in an interview question…

On the other hand, the question "What is it that I am trying to do, when I try to be rational?" is a question for which big, important chunks can be answered by saying "Bayes’ Theorem", "expected utility formula" and "simplicity prior" (where Solomonoff induction is the canonical if uncomputable simplicity prior).

At least from a mathematical perspective. From a human perspective, if you asked "What am I trying to do, when I try to be rational?" then the fundamental answers would run more along the lines of "Find the truth without flinching from it and without flushing all the arguments you disagree with out the window", "When you don’t know, try to avoid just making stuff up", "Figure out whether the strength of evidence is great enough to support the weight of every individual detail", "Do what should lead to the best consequences, but not just what looks on the immediate surface like it should lead to the best consequences, you may need to follow extra rules that compensate for known failure modes like shortsightedness and moral rationalizing"…

JB: Fascinating stuff!

Yes, I can see that trying to improve humans is vastly more complicated than designing a system from scratch… but also very exciting, because you can tell a human a high-level principle like "When you don’t know, try to avoid just making stuff up" and have some slight hope that they’ll understand it without it being explained in a mathematically precise way.

I guess AIXI dropping an anvil on itself is a bit like some of the self-destructive experiments that parents fear their children will try, like sticking a pin into an electrical outlet. And it seems impossible to avoid doing such experiments without having a base of knowledge that was either "built in" or acquired by means of previous experiments.

In the latter case, it seems just a matter of luck that none of these previous experiments were fatal. Luckily, people also have "built in" knowledge. More precisely, we have access to our ancestors’ knowledge and habits, which get transmitted to us genetically and culturally. But still, a fair amount of random blundering, suffering, and even death was required to build up that knowledge base.

So when you imagine "seed AIs" that keep on improving themselves and eventually become smarter than us, how can you reasonably hope that they’ll avoid making truly spectacular mistakes? How can they learn really new stuff without a lot of risk?

EY: The best answer I can offer is that they can be conservative externally and deterministic internally.

Human minds are constantly operating on the ragged edge of error, because we have evolved to compete with other humans. If you’re a bit more conservative, if you double-check your calculations, someone else will grab the banana and that conservative gene will not be passed on to descendants. Now this does not mean we couldn’t end up in a bad situation with AI companies competing with each other, but there’s at least the opportunity to do better.

If I recall correctly, the Titanic sank from managerial hubris and cutthroat cost competition, not engineering hubris. The original liners were designed far more conservatively, with triple-redundant compartmentalized modules and so on. But that was before cost competition took off, when the engineers could just add on safety features whenever they wanted. The part about the Titanic being extremely safe was pure marketing literature.

There is also no good reason why any machine mind should be overconfident the way that humans are. There are studies showing that, yes, managers prefer subordinates who make overconfident promises to subordinates who make accurate promises—sometimes I still wonder that people are this silly, but given that people are this silly, the social pressures and evolutionary pressures follow. And we have lots of studies showing that, for whatever reason, humans are hugely overconfident; less than half of students finish their papers by the time they think it 99% probable they’ll get done, etcetera.

And this is a form of stupidity an AI can simply do without. Rationality is not omnipotent; a bounded rationalist cannot do all things. But there is no reason why a bounded rationalist should ever have to overpromise, be systematically overconfident, systematically tend to claim it can do what it can’t. It does not have to systematically underestimate the value of getting more information, or overlook the possibility of unspecified Black Swans and what sort of general behavior helps to compensate. (A bounded rationalist does end up overlooking specific Black Swans because it doesn’t have enough computing power to think of all specific possible catastrophes.)

And contrary to how it works in say Hollywood, even if an AI does manage to accidentally kill a human being, that doesn’t mean it’s going to go “I HAVE KILLED” and dress up in black and start shooting nuns from rooftops. What it ought to do—what you’d want to see happen—would be for the utility function to go on undisturbed, and for the probability distribution to update based on whatever unexpected thing just happened and contradicted its old hypotheses about what does and does not kill humans. In other words, keep the same goals and say “oops” on the world-model; keep the same terminal values and revise its instrumental policies. These sorts of external-world errors are not catastrophic unless they can actually wipe out the planet in one shot, somehow.

The catastrophic sort of error, the sort you can’t recover from, is an error in modifying your own source code. If you accidentally change your utility function you will no longer want to change it back. And in this case you might indeed ask, "How will an AI make millions or billions of code changes to itself without making a mistake like that?" But there are in fact methods powerful enough to do a billion error-free operations. A friend of mine once said something along the lines of "a CPU does a mole of transistor operations, error-free, in a day" though I haven’t checked the numbers. When chip manufacturers are building a machine with hundreds of millions of interlocking pieces and they don’t want to have to change it after it leaves the factory, they may go so far as to prove the machine correct, using human engineers to navigate the proof space and suggest lemmas to prove (which AIs can’t do, they’re defeated by the exponential explosion) and complex theorem-provers to prove the lemmas (which humans would find boring) and simple verifiers to check the generated proof. It takes a combination of human and machine abilities and it’s extremely expensive. But I strongly suspect that an Artificial General Intelligence with a good design would be able to treat all its code that way—that it would combine all those abilities in a single mind, and find it easy and natural to prove theorems about its code changes. It could not, of course, prove theorems about the external world (at least not without highly questionable assumptions). It could not prove external actions correct. The only thing it could write proofs about would be events inside the highly deterministic environment of a CPU—that is, its own thought processes. But it could prove that it was processing probabilities about those actions in a Bayesian way, and prove that it was assessing the probable consequences using a particular utility function. It could prove that it was sanely trying to achieve the same goals.
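
The “mole of transistor operations in a day” figure is easy to sanity-check with rough, assumed numbers for a modern chip (a rough upper bound, since not every transistor switches every cycle):

```python
# Back-of-the-envelope check of the "mole of transistor operations per day"
# remark. The transistor count and clock rate are rough assumed figures for
# a modern CPU, not measurements, and not every transistor switches each cycle.

transistors = 1e9        # order of a billion transistors on a chip
clock_hz = 3e9           # order of a few GHz
seconds_per_day = 86_400

ops_per_day = transistors * clock_hz * seconds_per_day
avogadro = 6.02e23

print(f"Transistor-switchings per day: about {ops_per_day:.1e}")
print(f"That is roughly {ops_per_day / avogadro:.1f} moles")  # same order of magnitude as a mole
```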

A self-improving AI that’s unsure about whether to do something ought to just wait and do it later after self-improving some more. It doesn’t have to be overconfident. It doesn’t have to operate on the ragged edge of failure. It doesn’t have to stop gathering information too early, if more information can be productively gathered before acting. It doesn’t have to fail to understand the concept of a Black Swan. It doesn’t have to do all this using a broken error-prone brain like a human one. It doesn’t have to be stupid in the ways like overconfidence that humans seem to have specifically evolved to be stupid. It doesn’t have to be poorly calibrated (assign 99% probabilities that come true less than 99 out of 100 times), because bounded rationalists can’t do everything but they don’t have to claim what they can’t do. It can prove that its self-modifications aren’t making itself crazy or changing its goals, at least if the transistors work as specified, or make no more than any possible combination of 2 errors, etc. And if the worst does happen, so long as there’s still a world left afterward, it will say "Oops" and not do it again. This sounds to me like essentially the optimal scenario given any sort of bounded rationalist whatsoever.

And finally, if I was building a self-improving AI, I wouldn’t ask it to operate heavy machinery until after it had grown up. Why should it?

JB: Indeed!

Okay—I’d like to take a break here, explain some terms you used, and pick up next week with some less technical questions, like what’s a better use of time: tackling environmental problems, or trying to prepare for a technological singularity?

 
 

Some explanations

Here are some quick explanations. If you click on the links here you’ll get more details:


Cognitive Bias. A cognitive bias is a way in which people’s judgements systematically deviate from some norm—for example, from ideal rational behavior. You can see a long list of cognitive biases on Wikipedia. It’s good to know a lot of these and learn how to spot them in yourself and your friends.

For example, confirmation bias is the tendency to pay more attention to information that confirms our existing beliefs. Another great example is the bias blind spot: the tendency for people to think of themselves as less cognitively biased than average! I’m sure glad I don’t suffer from that.


Bayes’ Theorem. This is a rule for updating our opinions about probabilities when we get new information. Suppose you start out thinking the probability of some event A is P(A), and the probability of some event B is P(B). Suppose P(A|B) is the probability of event A given that B happens. Likewise, suppose P(B|A) is the probability of B given that A happens. Then the probability that both A and B happen is

P(A|B) P(B)

but by the same token it’s also

P(B|A) P(A)

so these are equal. A little algebra gives Bayes’ Theorem:

P(A|B) = P(B|A) P(A) / P(B)

If for some reason we know everything on the right-hand side, we can use this equation to work out P(A|B), and thus update our probability for event A when we see event B happen.
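
For example, here is a minimal worked example in Python with invented numbers—a rare condition, a fairly sensitive test, and a modest false-positive rate—just to show the formula in action:

```python
# Minimal worked example of Bayes' Theorem with invented numbers:
# A = "patient has the condition", B = "test comes back positive".

p_A = 0.01             # prior probability of A
p_B_given_A = 0.90     # probability of a positive test if A holds
p_B_given_notA = 0.05  # false-positive rate

# P(B) by the law of total probability.
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B

print(f"P(B)   = {p_B:.4f}")
print(f"P(A|B) = {p_A_given_B:.2%}")   # about 15% with these made-up numbers
```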

For a longer explanation with examples, see:

• Eliezer Yudkowsky, An intuitive explanation of Bayes’ Theorem.

Some handy jargon: we call P(A) the prior probability of A, and P(A|B) the posterior probability.


Solomonoff Induction. Bayes’ Theorem helps us compute posterior probabilities, but where do we get the prior probabilities from? How can we guess probabilities before we’ve observed anything?

This famous puzzle led Ray Solomonoff to invent Solomonoff induction. The key new idea is algorithmic probability theory. This is a way to define a probability for any string of letters in some alphabet, where a string counts as more probable if it’s less complicated. If we think of a string as a "hypothesis"—it could be a sentence in English, or an equation—this becomes a way to formalize Occam’s razor: the idea that given two competing hypotheses, the simpler one is more likely to be true.

So, algorithmic probability lets us define a prior probability distribution on hypotheses, the so-called “simplicity prior”, that implements Occam’s razor.

More precisely, suppose we have a special programming language where:

  1. Computer programs are written as strings of bits.

  2. They contain a special bit string meaning “END” at the end, and nowhere else.

  3. They don’t take an input: they just run and either halt and print out a string of letters, or never halt.

Then to get the algorithmic probability of a string of letters, we take all programs that print out that string and add up

2^(-length of program)

So, you can see that a string counts as more probable if it has more short programs that print it out.
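
Here is a toy numerical illustration. The table of "programs" below is entirely made up and stands in for a real enumeration of programs in the special language above (which we could not actually carry out, as explained under "Halting Oracle"); the only point is the 2^(-length) sum:

```python
# Toy illustration of algorithmic probability. The table of "programs" is
# entirely made up; in reality you would enumerate all programs in the special
# language described above, and you could not decide which ones halt.

programs = {
    "010":      "ABAB",   # a short program that prints "ABAB"
    "1101":     "ABAB",   # a longer program printing the same string
    "0111":     "AZQX",
    "10101101": "ABAB",
}

def algorithmic_probability(target):
    """Sum 2^(-length of program) over programs that print the target string."""
    return sum(2 ** -len(code) for code, output in programs.items()
               if output == target)

print(algorithmic_probability("ABAB"))  # 2^-3 + 2^-4 + 2^-8 = 0.19140625
print(algorithmic_probability("AZQX"))  # 2^-4              = 0.0625
```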


Kolmogorov Complexity. The Kolmogorov complexity of a string of letters is the length of the shortest program that prints it out, where programs are written in a special language as described above. This is a way of measuring how complicated a string is. It’s closely related to the algorithmic entropy: if we take logarithms using base 2, the Kolmogorov complexity of a string differs from minus the logarithm of its algorithmic probability by at most a constant. For more on all this stuff, see:

• M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, Berlin, 2008.
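
Continuing the made-up program table from the previous sketch, the Kolmogorov complexity is just the length of the shortest listed "program", and you can compare it with minus the base-2 logarithm of the (toy) algorithmic probability:

```python
import math

# Continuing the made-up program table above: Kolmogorov complexity as the
# length of the shortest program printing the string, compared with
# -log2 of its (toy) algorithmic probability.

programs = {
    "010":      "ABAB",
    "1101":     "ABAB",
    "0111":     "AZQX",
    "10101101": "ABAB",
}

def kolmogorov_complexity(target):
    return min(len(code) for code, output in programs.items() if output == target)

def algorithmic_probability(target):
    return sum(2 ** -len(code) for code, output in programs.items()
               if output == target)

for s in ("ABAB", "AZQX"):
    k = kolmogorov_complexity(s)
    neg_log_p = -math.log2(algorithmic_probability(s))
    print(f"{s}: K = {k}, -log2(P) = {neg_log_p:.2f}")
# In this toy table the two numbers differ by less than one bit.
```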


Halting Oracle. Alas, the algorithmic probability of a string is not computable. Why? Because to compute it, you’d need to go through all the programs in your special language that print out that string and add up a contribution from each one. But to do that, you’d need to know which programs halt—and there’s no systematic way to answer that question, which is called the halting problem.

But, we can pretend! We can pretend we have a magic box that will tell us whether any program in our special language halts. Computer scientists call any sort of magic box that answers questions an oracle. So, our particular magic box is called a halting oracle.


AIXI. AIXI is Marcus Hutter’s attempt to define an agent that "behaves optimally in any computable environment". Since AIXI relies on the idea of algorithmic probability, you can’t run AIXI on a computer unless it has infinite computing power and—the really hard part—access to a halting oracle. However, Hutter has also defined computable approximations to AIXI. For a quick intro, see this:

• Marcus Hutter, Universal intelligence: a mathematical top-down approach.

For more, try this:

• Marcus Hutter, Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability, Springer, Berlin, 2005.


Utility. Utility is a hypothetical numerical measure of satisfaction. If you know the probabilities of various outcomes, and you know what your utility will be in each case, you can compute your "expected utility" by taking the probabilities of the different outcomes, multiplying them by the corresponding utilities, and adding them up. In simple terms, this is how happy you’ll be on average. The expected utility hypothesis says that a rational decision-maker has a utility function and will try to maximize its expected utility.
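
A minimal numerical example, with invented probabilities and utilities:

```python
# Expected utility of two options, with invented probabilities and utilities.

outcomes = {
    "take_umbrella": [(0.3, 8), (0.7, 7)],   # (probability, utility) pairs
    "leave_it_home": [(0.3, 0), (0.7, 10)],  # rain ruins your day without it
}

for action, dist in outcomes.items():
    eu = sum(p * u for p, u in dist)
    print(f"{action}: expected utility = {eu:.1f}")
# take_umbrella: 0.3*8 + 0.7*7 = 7.3 ;  leave_it_home: 0.3*0 + 0.7*10 = 7.0
# An expected-utility maximizer takes the umbrella with these made-up numbers.
```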


Bounded Rationality. In the real world, any decision-maker has limits on its computational power and the time it has to make a decision. The idea that rational decision-makers "maximize expected utility" is oversimplified unless it takes this into account somehow. Theories of bounded rationality try to take these limitations into account. One approach is to think of decision-making as yet another activity whose costs and benefits must be taken into account when making decisions. Roughly: you must decide how much time you want to spend deciding. Of course, there’s an interesting circularity here.


Black Swan. According to Nassim Taleb, human history is dominated by black swans: important events that were unpredicted and indeed unpredictable, but rationalized by hindsight and thus made to seem as if they could have been predicted. He believes that rather than trying to predict such events (which he considers largely futile), we should try to get good at adapting to them. For more see:

• Nassim Taleb, The Black Swan: The Impact of the Highly Improbable, Random House, New York, 2007.


The first principle is that you must not fool yourself—and you are the easiest person to fool. – Richard Feynman


Tsunami

12 March, 2011

I hope everyone reading this, and everyone they know, is okay…

Stories, anyone?

Check out this animation from NOAA, the National Oceanic and Atmospheric Administration:

The tsunami was unnoticeable here in Singapore. It was just 10 centimeters tall when it hit the North Maluku islands in Indonesia, and we’re protected from the open Pacific by lots of Indonesian islands.

Of course, this “protection” has its own dangers, since Indonesia is geologically active: since I’ve lived here there have been two volcanic eruptions in Java, and an earthquake in western Sumatra created a tsunami that killed over 282 people in the Mentawai islands. An earthquake in eastern Sumatra could cause a tsunami here, perhaps—Sumatra is visible from tall buildings downtown. But today things are fine, here.

They’re worse in California!—though as you might expect, some there took advantage of the tsunami for surfing.


This Week’s Finds (Week 311)

7 March, 2011

This week I’ll start an interview with Eliezer Yudkowsky, who works at an institute he helped found: the Singularity Institute for Artificial Intelligence.

While many believe that global warming or peak oil are the biggest dangers facing humanity, Yudkowsky is more concerned about risks inherent in the accelerating development of technology. There are different scenarios one can imagine, but a bunch tend to get lumped under the general heading of a technological singularity. Instead of trying to explain this idea in all its variations, let me rapidly sketch its history and point you to some reading material. Then, on with the interview!

In 1958, the mathematician Stanislaw Ulam wrote about some talks he had with John von Neumann:

One conversation centered on the ever accelerating progress of technology and changes in the mode of human life, which gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue.

In 1965, the British mathematician Irving John Good raised the possibility of an "intelligence explosion": if machines could improve themselves to get smarter, perhaps they would quickly become a lot smarter than us.

In 1983 the mathematician and science fiction writer Vernor Vinge brought the singularity idea into public prominence with an article in Omni magazine, in which he wrote:

We will soon create intelligences greater than our own. When this happens, human history will have reached a kind of singularity, an intellectual transition as impenetrable as the knotted space-time at the center of a black hole, and the world will pass far beyond our understanding. This singularity, I believe, already haunts a number of science-fiction writers. It makes realistic extrapolation to an interstellar future impossible. To write a story set more than a century hence, one needs a nuclear war in between … so that the world remains intelligible.

In 1993 he wrote an essay in which he even ventured a prediction as to when the singularity would happen:

Within thirty years, we will have the technological means to create superhuman intelligence. Shortly after, the human era will be ended.

You can read that essay here:

• Vernor Vinge, The coming technological singularity: how to survive in the post-human era, article for the VISION-21 Symposium, 30-31 March, 1993.

With the rise of the internet, the number of people interested in such ideas grew enormously: transhumanists, extropians, singularitarians and the like. In 2005, Ray Kurzweil wrote:

What, then, is the Singularity? It’s a future period during which the pace of technological change will be so rapid, its impact so deep, that human life will be irreversibly transformed. Although neither utopian nor dystopian, this epoch will transform the concepts we rely on to give meaning to our lives, from our business models to the cycle of human life, including death itself. Understanding the Singularity will alter our perspective on the significance of our past and the ramifications for our future. To truly understand it inherently changes one’s view of life in general and one’s particular life. I regard someone who understands the Singularity and who has reflected on its implications for his or her own life as a "singularitarian".

He predicted that the singularity will occur around 2045. For more, see:

• Ray Kurzweil, The Singularity is Near: When Humans Transcend Biology, Viking, 2005.

Yudkowsky distinguishes three major schools of thought regarding the singularity:

• Accelerating Change: change keeps speeding up, yet remains somewhat predictable (e.g. Ray Kurzweil).

• Event Horizon: after the rise of intelligence beyond our own, the future becomes absolutely unpredictable to us (e.g. Vernor Vinge).

• Intelligence Explosion: a rapid chain reaction of self-amplifying intelligence until ultimate physical limits are reached (e.g. I. J. Good and Eliezer Yudkowsky).

Yudkowsky believes that an intelligence explosion could threaten everything we hold dear unless the first self-amplifying intelligence is "friendly". The challenge, then, is to design “friendly AI”. And this requires understanding a lot more than we currently do about intelligence, goal-driven behavior, rationality and ethics—and of course what it means to be “friendly”. For more, start here:

• The Singularity Institute for Artificial Intelligence, Publications.

Needless to say, there’s a fourth school of thought on the technological singularity, even more popular than those listed above:

Baloney: it’s all a load of hooey!

Most people in this school have never given the matter serious thought, but a few have taken time to formulate objections. Others think a technological singularity is possible but highly undesirable and avoidable, so they want to prevent it. For various criticisms, start here:

• Technological singularity: Criticism, Wikipedia.

Personally, what I like most about singularitarians is that they care about the future and recognize that it may be very different from the present, just as the present is very different from the pre-human past. I wish there were more dialog between them and other sorts of people—especially people who also care deeply about the future, but have drastically different visions of it. I find it quite distressing how people with different visions of the future do most of their serious thinking within like-minded groups. This leads to groups with drastically different assumptions, with each group feeling a lot more confident about their assumptions than an outsider would deem reasonable. I’m talking here about environmentalists, singularitarians, people who believe global warming is a serious problem, people who don’t, etc. Members of any tribe can easily see the cognitive defects of every other tribe, but not their own. That’s a pity.

And so, this interview:

JB: I’ve been a fan of your work for quite a while. At first I thought your main focus was artificial intelligence (AI) and preparing for a technological singularity by trying to create "friendly AI". But lately I’ve been reading your blog, Less Wrong, and I get the feeling you’re trying to start a community of people interested in boosting their own intelligence—or at least, their own rationality. So, I’m curious: how would you describe your goals these days?

EY: My long-term goals are the same as ever: I’d like human-originating intelligent life in the Solar System to survive, thrive, and not lose its values in the process. And I still think the best means is self-improving AI. But that’s a bit of a large project for one person, and after a few years of beating my head against the wall trying to get other people involved, I realized that I really did have to go back to the beginning, start over, and explain all the basics that people needed to know before they could follow the advanced arguments. Saving the world via AI research simply can’t compete against the Society for Treating Rare Diseases in Cute Kittens unless your audience knows about things like scope insensitivity and the affect heuristic and the concept of marginal expected utility, so they can see why the intuitively more appealing option is the wrong one. So I know it sounds strange, but in point of fact, since I sat down and started explaining all the basics, the Singularity Institute for Artificial Intelligence has been growing at a better clip and attracting more interesting people.

Right now my short-term goal is to write a book on rationality (tentative working title: The Art of Rationality) to explain the drop-dead basic fundamentals that, at present, no one teaches; those who are impatient will find a lot of the core material covered in these Less Wrong sequences:

• Map and territory.
• How to actually change your mind.
• Mysterious answers to mysterious questions.

though I intend to rewrite it all completely for the book so as to make it accessible to a wider audience. Then I probably need to take at least a year to study up on math, and then—though it may be an idealistic dream—I intend to plunge into the decision theory of self-modifying decision systems and never look back. (And finish the decision theory and implement it and run the AI, at which point, if all goes well, we Win.)

JB: I can think of lots of big questions at this point, and I’ll try to get to some of those, but first I can’t resist asking: why do you want to study math?

EY: A sense of inadequacy.

My current sense of the problems of self-modifying decision theory is that it won’t end up being Deep Math, nothing like the proof of Fermat’s Last Theorem—that 95% of the progress-stopping difficulty will be in figuring out which theorem is true and worth proving, not the proof. (Robin Hanson spends a lot of time usefully discussing which activities are most prestigious in academia, and it would be a Hansonian observation, even though he didn’t say it AFAIK, that complicated proofs are prestigious but it’s much more important to figure out which theorem to prove.) Even so, I was a spoiled math prodigy as a child—one who was merely amazingly good at math for someone his age, instead of competing with other math prodigies and training to beat them. My sometime coworker Marcello (he works with me over the summer and attends Stanford at other times) is a non-spoiled math prodigy who trained to compete in math competitions and I have literally seen him prove a result in 30 seconds that I failed to prove in an hour.

I’ve come to accept that to some extent we have different and complementary abilities—now and then he’ll go into a complicated blaze of derivations and I’ll look at his final result and say "That’s not right" and maybe half the time it will actually be wrong. And when I’m feeling inadequate I remind myself that having mysteriously good taste in final results is an empirically verifiable talent, at least when it comes to math. This kind of perceptual sense of truth and falsity does seem to be very much important in figuring out which theorems to prove. But I still get the impression that the next steps in developing a reflective decision theory may require me to go off and do some of the learning and training that I never did as a spoiled math prodigy, first because I could sneak by on my ability to "see things", and second because it was so much harder to try my hand at any sort of math I couldn’t see as obvious. I get the impression that knowing which theorems to prove may require me to be better than I currently am at doing the proofs.

On some gut level I’m also just embarrassed by the number of compliments I get for my math ability (because I’m a good explainer and can make math things that I do understand seem obvious to other people) as compared to the actual amount of advanced math knowledge that I have (practically none by any real mathematician’s standard). But that’s more of an emotion that I’d draw on for motivation to get the job done, than anything that really ought to factor into my long-term planning. For example, I finally looked up the drop-dead basics of category theory because someone else on a transhumanist IRC channel knew about it and I didn’t. I’m happy to accept my ignoble motivations as a legitimate part of myself, so long as they’re motivations to learn math.

JB: Ah, how I wish more of my calculus students took that attitude. Math professors worldwide will frame that last sentence of yours and put it on their office doors.

I’ve recently been trying to switch from pure math to more practical things. So I’ve been reading more about control theory, complex systems made of interacting parts, and the like. Jan Willems has written some very nice articles about this, and your remark about complicated proofs in mathematics reminds me of something he said:

… I have almost always felt fortunate to have been able to do research in a mathematics environment. The average competence level is high, there is a rich history, the subject is stable. All these factors are conducive for science. At the same time, I was never able to feel unequivocally part of the mathematics culture, where, it seems to me, too much value is put on difficulty as a virtue in itself. My appreciation for mathematics has more to do with its clarity of thought, its potential of sharply articulating ideas, its virtues as an unambiguous language. I am more inclined to treasure the beauty and importance of Shannon’s ideas on errorless communication, algorithms such as the Kalman filter or the FFT, constructs such as wavelets and public key cryptography, than the heroics and virtuosity surrounding the four-color problem, Fermat’s last theorem, or the Poincaré and Riemann conjectures.

I tend to agree. Never having been much of a prodigy myself, I’ve always preferred thinking of math as a language for understanding the universe, rather than a list of famous problems to challenge heroes, an intellectual version of the Twelve Labors of Hercules. But for me the universe includes very abstract concepts, so I feel "pure" math such as category theory can be a great addition to the vocabulary of any scientist.

Anyway: back to business. You said:

I’d like human-originating intelligent life in the Solar System to survive, thrive, and not lose its values in the process. And I still think the best means is self-improving AI.

I bet a lot of our readers would happily agree with your first sentence. It sounds warm and fuzzy. But a lot of them might recoil from the next sentence. "So we should build robots that take over the world???" Clearly there’s a long train of thought lurking here. Could you sketch how it goes?

EY: Well, there’s a number of different avenues from which to approach that question. I think I’d like to start off with a quick remark—do feel free to ask me to expand on it—that if you want to bring order to chaos, you have to go where the chaos is.

In the early twenty-first century the chief repository of scientific chaos is Artificial Intelligence. Human beings have this incredibly powerful ability that took us from running over the savanna hitting things with clubs to making spaceships and nuclear weapons, and if you try to make a computer do the same thing, you can’t because modern science does not understand how this ability works.

At the same time, the parts we do understand, such as that human intelligence is almost certainly running on top of neurons firing, suggest very strongly that human intelligence is not the limit of the possible. Neurons fire at, say, 200 hertz top speed; transmit signals at 150 meters/second top speed; and even in the realm of heat dissipation (where neurons still have transistors beat cold) a synaptic firing still dissipates around a million times as much heat as the thermodynamic limit for a one-bit irreversible operation at 300 Kelvin. So without shrinking the brain, cooling the brain, or invoking things like reversible computing, it ought to be physically possible to build a mind that works at least a million times faster than a human one, at which rate a subjective year would pass for every 31 sidereal seconds, and all the time from Ancient Greece up until now would pass in less than a day. This is talking about hardware because the hardware of the brain is a lot easier to understand, but software is probably a lot more important; and in the area of software, we have no reason to believe that evolution came up with the optimal design for a general intelligence, starting from incremental modification of chimpanzees, on its first try.
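As an aside, the arithmetic here is easy to check. A minimal Python sketch, using only the rough figures quoted above (a million-fold speedup, and the Landauer limit k_B·T·ln 2 at 300 kelvin):

```python
import math

# Figures quoted above (order-of-magnitude estimates, not measurements).
speedup = 1e6                      # hypothesized speedup over biological neurons
seconds_per_year = 365.25 * 24 * 3600

# A subjective year at a million-fold speedup:
print(seconds_per_year / speedup)  # ~31.6 seconds, matching the "31 sidereal seconds"

# Landauer limit: minimum heat dissipated by one irreversible bit operation
# at temperature T, namely k_B * T * ln(2).
k_B = 1.380649e-23                 # Boltzmann constant, J/K
T = 300.0                          # kelvin
landauer = k_B * T * math.log(2)
print(landauer)                    # ~2.9e-21 J

# A synaptic firing dissipating "around a million times" this limit would be roughly:
print(1e6 * landauer)              # ~2.9e-15 J, i.e. a few femtojoules per synaptic event
```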

People say things like "intelligence is no match for a gun" and they’re thinking like guns grew on trees, or they say "intelligence isn’t as important as social skills" like social skills are implemented in the liver instead of the brain. Talking about smarter-than-human intelligence is talking about doing a better version of that stuff humanity has been doing over the last hundred thousand years. If you want to accomplish large amounts of good you have to look at things which can make large differences.

Next lemma: Suppose you offered Gandhi a pill that made him want to kill people. Gandhi starts out not wanting people to die, so if he knows what the pill does, he’ll refuse to take the pill, because that will make him kill people, and right now he doesn’t want to kill people. This is an informal argument that Bayesian expected utility maximizers with sufficient self-modification ability will self-modify in such a way as to preserve their own utility function. You would like me to make that a formal argument. I can’t, because if you take the current formalisms for things like expected utility maximization, they go into infinite loops and explode when you talk about self-modifying the part of yourself that does the self-modifying. And there’s a little thing called Löb’s Theorem which says that no proof system at least as powerful as Peano Arithmetic can consistently assert its own soundness, or rather, if you can prove a theorem of the form

□P ⇒ P

(if I prove P then it is true) then you can use this theorem to prove P. Right now I don’t know how you could even have a self-modifying AI that didn’t look itself over and say, "I can’t trust anything this system proves to actually be true, I had better delete it". This is the class of problems I’m currently working on—reflectively consistent decision theory suitable for self-modifying AI. A solution to this problem would let us build a self-improving AI and know that it was going to keep whatever utility function it started with.
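As an aside, the theorem being invoked can be stated precisely. Here is the standard formulation of Löb’s theorem (nothing specific to the research program described here), with T any theory at least as strong as Peano Arithmetic and □P read as "P is provable in T":

```latex
% Löb's theorem, for a theory T extending Peano Arithmetic,
% with \Box P abbreviating "P is provable in T":
\[
  \text{if } T \vdash \Box P \rightarrow P , \text{ then } T \vdash P .
\]
% Internalized form: T \vdash \Box(\Box P \rightarrow P) \rightarrow \Box P.
% So T can prove the soundness schema \Box P \rightarrow P only for statements P
% that it already proves; this is the obstacle to a system asserting its own soundness.
```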

There’s a huge space of possibilities for possible minds; people make the mistake of asking "What will AIs do?" like AIs were the Tribe that Lives Across the Water, foreigners all of one kind from the same country. A better way of looking at it would be to visualize a gigantic space of possible minds and all human minds fitting into one tiny little dot inside the space. We want to understand intelligence well enough to reach into that gigantic space outside and pull out one of the rare possibilities that would be, from our perspective, a good idea to build.

If you want to maximize your marginal expected utility you have to maximize on your choice of problem over the combination of high impact, high variance, possible points of leverage, and few other people working on it. The problem of stable goal systems in self-improving Artificial Intelligence has no realistic competitors under any three of these criteria, let alone all four.

That gives you rather a lot of possible points for followup questions so I’ll stop there.

JB: Sure, there are so many followup questions that this interview should be formatted as a tree with lots of branches instead of in a linear format. But until we can easily spin off copies of ourselves I’m afraid that would be too much work.

So, I’ll start with a quick point of clarification. You say "if you want to bring order to chaos, you have to go where the chaos is." I guess that at one level you’re just saying that if we want to make a lot of progress in understanding the universe, we have to tackle questions that we’re really far from understanding—like how intelligence works.

And we can say this in a fancier way, too. If we want models of reality that reduce the entropy of our probabilistic predictions (there’s a concept of entropy for probability distributions, which is big when the probability distribution is very smeared out), then we have to find subjects where our predictions have a lot of entropy.

Am I on the right track?
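(As a quick aside on the parenthetical above, and not part of the interview: the Shannon entropy −Σ p log₂ p of a probability distribution is indeed large when the distribution is smeared out and small when it is sharply peaked. A tiny illustrative sketch:

```python
from math import log2

def entropy_bits(dist):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * log2(p) for p in dist if p > 0)

# A sharply peaked prediction over four outcomes: low entropy.
peaked = [0.97, 0.01, 0.01, 0.01]

# A "smeared out" prediction over the same four outcomes: maximal entropy.
smeared = [0.25, 0.25, 0.25, 0.25]

print(entropy_bits(peaked))   # ~0.24 bits
print(entropy_bits(smeared))  # 2.0 bits: the more smeared out, the bigger the entropy
```
)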

EY: Well, if we wanted to torture the metaphor a bit further, we could talk about how what you really want is not high-entropy distributions but highly unstable ones. For example, if I flip a coin, I have no idea whether it’ll come up heads or tails (maximum entropy) but whether I see it come up heads or tails doesn’t change my prediction for the next coinflip. If you zoom out and look at probability distributions over sequences of coinflips, then high-entropy distributions tend not to ever learn anything (seeing heads on one flip doesn’t change your prediction next time), while inductive probability distributions (where your beliefs about probable sequences are such that, say, 11111 is more probable than 11110) tend to be lower-entropy because learning requires structure. But this would be torturing the metaphor, so I should probably go back to the original tangent:

Richard Hamming used to go around annoying his colleagues at Bell Labs by asking them what were the important problems in their field, and then, after they answered, he would ask why they weren’t working on them. Now, everyone wants to work on "important problems", so why are so few people working on important problems? And the obvious answer is that working on the important problems doesn’t get you an 80% probability of getting one more publication in the next three months. And most decision algorithms will eliminate options like that before they’re even considered. The question will just be phrased as, "Of the things that will reliably keep me on my career track and not embarrass me, which is most important?"

And to be fair, the system is not at all set up to support people who want to work on high-risk problems. It’s not even set up to socially support people who want to work on high-risk problems. In Silicon Valley a failed entrepreneur still gets plenty of respect, which Paul Graham thinks is one of the primary reasons why Silicon Valley produces a lot of entrepreneurs and other places don’t. Robin Hanson is a truly excellent cynical economist and one of his more cynical suggestions is that the function of academia is best regarded as the production of prestige, with the production of knowledge being something of a byproduct. I can’t do justice to his development of that thesis in a few words (keywords: hanson academia prestige) but the key point I want to take away is that if you work on a famous problem that lots of other people are working on, your marginal contribution to human knowledge may be small, but you’ll get to affiliate with all the other prestigious people working on it.

And these are all factors which contribute to academia, metaphorically speaking, looking for its keys under the lamppost where the light is better, rather than near the car where it lost them. Because on a sheer gut level, the really important problems are often scary. There’s a sense of confusion and despair, and if you affiliate yourself with the field, that scent will rub off on you.

But if you try to bring order to an absence of chaos—to some field where things are already in nice, neat order and there is no sense of confusion and despair—well, the results are often well described in a little document you may have heard of called the Crackpot Index. Not that this is the only thing crackpot high-scorers are doing wrong, but the point stands, you can’t revolutionize the atomic theory of chemistry because there isn’t anything wrong with it.

We can’t all be doing basic science, but people who see scary, unknown, confusing problems that no one else seems to want to go near and think "I wouldn’t want to work on that!" have got their priorities exactly backward.

JB: The never-ending quest for prestige indeed has unhappy side-effects in academia. Some of my colleagues seem to reason as follows:

If Prof. A can understand Prof. B’s work, but Prof. B can’t understand Prof. A, then Prof. A must be smarter—so Prof. A wins.

But I’ve figured out a way to game the system. If I write in a way that few people can understand, everyone will think I’m smarter than I actually am! Of course I need someone to understand my work, or I’ll be considered a crackpot. But I’ll shroud my work in jargon and avoid giving away my key insights in plain language, so only very smart, prestigious colleagues can understand it.

On the other hand, tenure offers immense opportunities for risky and exciting pursuits if one is brave enough to seize them. And there are plenty of folks who do. After all, lots of academics are self-motivated, strong-willed rebels.

This has been on my mind lately since I’m trying to switch from pure math to something quite different. I’m not sure what, exactly. And indeed that’s why I’m interviewing you!

(Next week: Yudkowsky on The Art of Rationality, and what it means to be rational.)


Whenever there is a simple error that most laymen fall for, there is always a slightly more sophisticated version of the same problem that experts fall for. – Amos Tversky

