Here’s the third and final part of my interview with Eliezer Yudkowsky. We’ll talk about three big questions… roughly these:
• How do you get people to work on potentially risky projects in a safe way?
• Do we understand ethics well enough to build “Friendly artificial intelligence”?
• What’s better to work on, artificial intelligence or environmental issues?
So, with no further ado:
JB: There are decent Wikipedia articles on “optimism bias” and “positive illusions”, which suggest that unrealistically optimistic people are more energetic, while more realistic estimates of success go hand-in-hand with mild depression. If this is true, I can easily imagine that most people working on challenging projects like quantum gravity (me, 10 years ago) or artificial intelligence (you) are unrealistically optimistic about our chances of success.
Indeed, I can easily imagine that the first researchers to create a truly powerful artificial intelligence will be people who underestimate its potential dangers. It’s an interesting irony, isn’t it? If most people who are naturally cautious avoid a certain potentially dangerous line of research, the people who pursue that line of research are likely to be less cautious than average.
I’m a bit worried about this when it comes to “geoengineering”, for example—attempts to tackle global warming by large engineering projects. We have people who say “oh no, that’s too dangerous”, and turn their attention to approaches they consider less risky, but that may leave the field to people who underestimate the risks.
So I’m very glad you are thinking hard about how to avoid the potential dangers of artificial intelligence—and even trying to make this problem sound exciting, to attract ambitious and energetic young people to work on it. Is that part of your explicit goal? To make caution and rationality sound sexy?
EY: The really hard part of the problem isn’t getting a few smart people to work on cautious, rational AI. It’s admittedly a harder problem than it should be, because there’s a whole system out there which is set up to funnel smart young people into all sorts of other things besides cautious rational long-term basic AI research. But it isn’t the really hard part of the problem.
The scary thing about AI is that I would guess that the first AI to go over some critical threshold of self-improvement takes all the marbles—first mover advantage, winner take all. The first pile of uranium to have an effective neutron multiplication factor greater than 1, or maybe the first AI smart enough to absorb all the poorly defended processing power on the Internet—there’s actually a number of different thresholds that could provide a critical first-mover advantage.
And it is always going to be fundamentally easier in some sense to go straight all out for AI and not worry about clean designs or stable self-modification or the problem where a near-miss on the value system destroys almost all of the actual value from our perspective. (E.g., imagine aliens who shared every single term in the human utility function but lacked our notion of boredom. Their civilization might consist of a single peak experience repeated over and over, which would make their civilization very boring from our perspective, compared to what it might have been. That is, leaving a single aspect out of the value system can destroy almost all of the value. So there’s a very large gap in the AI problem between trying to get the value system exactly right, versus throwing something at it that sounds vaguely good.)
You want to keep as much of an advantage as possible for the cautious rational AI developers over the crowd that is just gung-ho to solve this super interesting scientific problem and go down in the eternal books of fame. Now there should in fact be some upper bound on the combination of intelligence, methodological rationality, and deep understanding of the problem which you can possess, and still walk directly into the whirling helicopter blades. The problem is that it is probably a rather high upper bound. And you are trying to outrace people who are trying to solve a fundamentally easier wrong problem. So the question is not attracting people to the field in general, but rather getting the really smart competent people to either work for a cautious project or not go into the field at all. You aren’t going to stop people from trying to develop AI. But you can hope to have as many of the really smart people as possible working on cautious projects rather than incautious ones.
So yes, making caution look sexy. But even more than that, trying to make incautious AI projects look merely stupid. Not dangerous. Dangerous is sexy. As the old proverb goes, most of the damage is done by people who wish to feel themselves important. Human psychology seems to be such that many ambitious people find it far less scary to think about destroying the world, than to think about never amounting to much of anything at all. I have met people like this. In fact all the people I have met who think they are going to win eternal fame through their AI projects have been like this. The thought of potentially destroying the world is bearable; it confirms their own importance. The thought of not being able to plow full steam ahead on their incredible amazing AI idea is not bearable; it threatens all their fantasies of wealth and fame.
Now these people of whom I speak are not top-notch minds, not in the class of the top people in mainstream AI, like say Peter Norvig (to name someone I’ve had the honor of meeting personally). And it’s possible that if and when self-improving AI starts to get real top-notch minds working on it, rather than people who were too optimistic about/attached to their amazing bright idea to be scared away by the field of skulls, then these real stars will not fall prey to the same sort of psychological trap. And then again it is also plausible to me that top-notch minds will fall prey to exactly the same trap, because I have yet to learn from reading history that great scientific geniuses are always sane.
So what I would most like to see would be uniform looks of condescending scorn directed at people who claimed their amazing bright AI idea was going to lead to self-improvement and superintelligence, but who couldn’t mount an adequate defense of how their design would have a goal system stable after a billion sequential self-modifications, or how it would get the value system exactly right instead of mostly right. In other words, making destroying the world look unprestigious and low-status, instead of leaving it to the default state of sexiness and importance-confirmingness.
JB: “Get the value system exactly right”—now this phrase touches on another issue I’ve been wanting to talk about. How do we know what it means for a value system to be exactly right? It seems people are even further from agreeing on what it means to be good than on what it means to be rational. Yet you seem to be suggesting we need to solve this problem before it’s safe to build a self-improving artificial intelligence!
When I was younger I worried a lot about the foundations of ethics. I decided that you “can’t derive an ought from an is”—do you believe that? If so, all logical arguments leading up to the conclusion that “you should do X” must involve an assumption of the form “you should do Y”… and attempts to “derive” ethics are all implicitly circular in some way. This really bothered the heck out of me: how was I supposed to know what to do? But of course I kept on doing things while I was worrying about this… and indeed, it was painfully clear that there’s no way out of making decisions: even deciding to “do nothing” or commit suicide counts as a decision.
Later I got more comfortable with the idea that making decisions about what to do needn’t paralyze me any more than making decisions about what is true. But still, it seems that the business of designing ethical beings is going to provoke huge arguments, if and when we get around to that.
Do you spend as much time thinking about these issues as you do thinking about rationality? Of course they’re linked….
EY: Well, I probably spend as much time explaining these issues as I do rationality. There are also an absolutely huge number of pitfalls that people stumble into when they try to think about, as I would put it, Friendly AI. Consider how many pitfalls people run into when they try to think about Artificial Intelligence. Next consider how many pitfalls people run into when they try to think about morality. Next consider how many pitfalls philosophers run into when they try to think about the nature of morality. Next consider how many pitfalls people run into when they try to think about hypothetical extremely powerful agents, especially extremely powerful agents that are supposed to be extremely good. Next consider how many pitfalls people run into when they try to imagine optimal worlds to live in or optimal rules to follow or optimal governments and so on.
Now imagine a subject matter which offers discussants a lovely opportunity to run into all of those pitfalls at the same time.
That’s what happens when you try to talk about Friendly Artificial Intelligence.
And it only takes one error for a chain of reasoning to end up in Outer Mongolia. So one of the great motivating factors behind all the writing I did on rationality and all the sequences I wrote on Less Wrong was to actually make it possible, via two years worth of writing and probably something like a month’s worth of reading at least, to immunize people against all the usual mistakes.
Lest I appear to dodge the question entirely, I’ll try for very quick descriptions and google keywords that professional moral philosophers might recognize.
In terms of what I would advocate programming a very powerful AI to actually do, the keywords are “mature folk morality” and “reflective equilibrium”. This means that you build a sufficiently powerful AI to do, not what people say they want, or even what people actually want, but what people would decide they wanted the AI to do, if they had all of the AI’s information, could think about it for as long a subjective time as the AI, knew as much as the AI did about the real factors at work in their own psychology, and had no failures of self-control.
There are a lot of important reasons why you would want to do exactly that and not, say, implement Asimov’s Three Laws of Robotics (a purely fictional device, and if Asimov had depicted them as working well, he would have had no stories to write), or build a superpowerful AI which obeys people’s commands interpreted in literal English, or create a god whose sole prime directive is to make people maximally happy, or do any of the above plus a list of six different patches which guarantee that nothing can possibly go wrong, or try various other things that seem like incredibly obvious failure scenarios but which I assure you I have heard seriously advocated over and over and over again.
In a nutshell, you want to use concepts like “mature folk morality” or “reflective equilibrium” because these are as close as moral philosophy has ever gotten to defining in concrete, computable terms what you could be wrong about when you order an AI to do the wrong thing.
For an attempt at nontechnical explanation of what one might want to program an AI to do and why, the best resource I can offer is an old essay of mine which is not written so as to offer good google keywords, but holds up fairly well nonetheless:
• Eliezer Yudkowsky, Coherent extrapolated volition, May 2004.
You also raised some questions about metaethics, where metaethics asks not “Which acts are moral?” but “What is the subject matter of our talk about ‘morality’?” i.e. “What are we talking about here anyway?” In terms of Google keywords, my brand of metaethics is closest to analytic descriptivism or moral functionalism. If I were to try to put that into a very brief nutshell, it would be something like “When we talk about ‘morality’ or ‘goodness’ or ‘right’, the subject matter we’re talking about is a sort of gigantic math question hidden under the simple word ‘right’, a math question that includes all of our emotions and all of what we use to process moral arguments and all the things we might want to change about ourselves if we could see our own source code and know what we were really thinking.”
The complete Less Wrong sequence on metaethics (with many dependencies on earlier ones) is:
• Eliezer Yudkowsky, Metaethics sequence, Less Wrong, 20 June to 22 August 2008.
And one of the better quick summaries is at:
• Eliezer Yudkowsky, Inseparably right; or, joy in the merely good, Less Wrong, 9 August 2008.
And if I am wise I shall not say any more.
JB: I’ll help you be wise. There are a hundred followup questions I’m tempted to ask, but this has been a long and grueling interview, so I won’t. Instead, I’d like to raise one last big question. It’s about time scales.
Self-improving artificial intelligence seems like a real possibility to me. But when? You see, I believe we’re in the midst of a global ecological crisis—a mass extinction event, whose effects will be painfully evident by the end of the century. I want to do something about it. I can’t do much, but I want to do something. Even if we’re doomed to disaster, there are different sizes of disaster. And if we’re going through a kind of bottleneck, where some species make it through and others go extinct, even small actions now can make a difference.
I can imagine some technological optimists—singularitarians, extropians and the like—saying: “Don’t worry, things will get better. Things that seem hard now will only get easier. We’ll be able to suck carbon dioxide from the atmosphere using nanotechnology, and revive species starting from their DNA.” Or maybe even: “Don’t worry: we won’t miss those species. We’ll be having too much fun doing things we can’t even conceive of now.”
But various things make me skeptical of such optimism. One of them is the question of time scales. What if the world goes to hell before our technology saves us? What if artificial intelligence comes along too late to make a big impact on the short-term problems I’m worrying about? In that case, maybe I should focus on short-term solutions.
Just to be clear: this isn’t some veiled attack on your priorities. I’m just trying to decide on my own. One good thing about having billions of people on the planet is that we don’t all have to do the same thing. Indeed, a multi-pronged approach is best. But for my own decisions, I want some rough guess about how long various potentially revolutionary technologies will take to come online.
What do you think about all this?
EY: I’ll try to answer the question about timescales, but first let me explain in some detail why I don’t think the decision should be dominated by that question.
If you look up “Scope Insensitivity” on Less Wrong, you’ll see that when three different groups of subjects were asked how much they would pay in increased taxes to save 2,000 / 20,000 / 200,000 birds from drowning in uncovered oil ponds, the respective average answers were $80 / $78 / $88. People asked questions like this visualize one bird, wings slicked with oil, struggling to escape, and that creates some amount of emotional affect which determines willingness to pay, and the quantity gets tossed out the window since no one can visualize 200,000 of anything. Another hypothesis to explain the data is “purchase of moral satisfaction”, which says that people give enough money to create a “warm glow” inside themselves, and the amount required might have something to do with your personal financial situation, but it has nothing to do with birds. Similarly, residents of four US states were only willing to pay 22% more to protect all 57 wilderness areas in those states than to protect one area. The result I found most horrifying was that subjects were willing to contribute more when a set amount of money was needed to save one child’s life, compared to the same amount of money saving eight lives—because, of course, focusing your attention on a single person makes the feelings stronger, less diffuse.
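To make the scale of this effect concrete, here is a minimal Python sketch that divides each average answer quoted above by the number of birds it was meant to save. The only data are the three figures from the passage; the function and variable names are illustrative assumptions, not part of the cited study.

```python
# Figures quoted in the interview: birds saved -> average stated
# willingness to pay, in US dollars.
stated_wtp = {2_000: 80, 20_000: 78, 200_000: 88}


def implied_price_per_bird(wtp_by_scope):
    """Return the implied dollars-per-bird for each group of subjects."""
    return {birds: dollars / birds for birds, dollars in wtp_by_scope.items()}


per_bird = implied_price_per_bird(stated_wtp)
for birds, price in per_bird.items():
    print(f"{birds:>7} birds: ${price:.5f} per bird")

# If stated value scaled with scope, this ratio would be close to 1.
smallest, largest = min(per_bird), max(per_bird)
ratio = per_bird[smallest] / per_bird[largest]
print(f"implied per-bird value falls by a factor of about {ratio:.0f}")
```

The total barely moves while the number of birds grows a hundredfold, so the implied value of a single bird collapses by roughly two orders of magnitude.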
So while it may make sense to enjoy the warm glow of doing good deeds after we do them, we cannot possibly allow ourselves to choose between altruistic causes based on the relative amounts of warm glow they generate, because our intuitions are quantitatively insane.
And two antidotes that absolutely must be applied in choosing between altruistic causes are conscious appreciation of scope and conscious appreciation of marginal impact.
By its nature, your brain flushes right out the window the all-important distinction between saving one life and saving a million lives. You’ve got to compensate for that using conscious, verbal deliberation. The Society For Curing Rare Diseases in Cute Puppies has got great warm glow, but the fact that these diseases are rare should call a screeching halt right there—which you’re going to have to do consciously, not intuitively. Even before you realize that, contrary to the relative warm glows, it’s really hard to make a moral case for trading off human lives against cute puppies. I suppose if you could save a billion puppies using one dollar I wouldn’t scream at someone who wanted to spend the dollar on that instead of cancer research.
And similarly, if there are a hundred thousand researchers and billions of dollars annually that are already going into saving species from extinction—because it’s a prestigious and popular cause that has an easy time generating warm glow in lots of potential funders—then you have to ask about the marginal value of putting your effort there, where so many other people are already working, compared to a project that isn’t so popular.
I wouldn’t say “Don’t worry, we won’t miss those species”. But consider the future intergalactic civilizations growing out of Earth-originating intelligent life. Consider the whole history of a universe which contains this world of Earth and this present century, and also billions of years of future intergalactic civilization continuing until the universe dies, or maybe forever if we can think of some ingenious way to carry on. Next consider the interval in utility between a universe-history in which Earth-originating intelligence survived and thrived and managed to save 95% of the non-primate biological species now alive, versus a universe-history in which only 80% of those species are alive. That utility interval is not very large compared to the utility interval between a universe in which intelligent life thrived and one in which intelligent life died out. Or the utility interval between a universe-history filled with sentient beings who experience happiness and have empathy for each other and get bored when they do the same thing too many times, versus a universe-history that grew out of various failures of Friendly AI.
(The really scary thing about universes that grow out of a loss of human value is not that they are different, but that they are, from our standpoint, boring. The human utility function says that once you’ve made a piece of art, it’s more fun to make a different piece of art next time. But that’s just us. Most random utility functions will yield instrumental strategies that spend some of their time and resources exploring for the patterns with the highest utility at the beginning of the problem, and then use the rest of their resources to implement the pattern with the highest utility, over and over and over. This sort of thing will surprise a human who expects, on some deep level, that all minds are made out of human parts, and who thinks, “Won’t the AI see that its utility function is boring?” But the AI is not a little spirit that looks over its code and decides whether to obey it; the AI is the code. If the code doesn’t say to get bored, it won’t get bored. A strategy of exploration followed by exploitation is implicit in most utility functions, but boredom is not. If your utility function does not already contain a term for boredom, then you don’t care; it’s not something that emerges as an instrumental value from most terminal values. For more on this see: “In Praise of Boredom” in the Fun Theory Sequence on Less Wrong.)
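The point that boredom has to be written into the utility function can be illustrated with a toy sketch. Everything here is a hypothetical assumption made for illustration: the four "patterns", the scoring rules, and the greedy agent are not taken from the interview or from the Fun Theory Sequence. The agent simply picks whichever pattern scores highest at each step, once with a score that ignores history and once with an explicit repetition penalty.

```python
def greedy_history(score, patterns, steps):
    """At each step, pick the pattern with the highest score given the past."""
    history = []
    for _ in range(steps):
        history.append(max(patterns, key=lambda p: score(p, history)))
    return history


def no_boredom(pattern, history):
    # Hypothetical utility: each pattern has a fixed value; the past is ignored.
    return len(pattern)


def with_boredom(pattern, history):
    # Same base value, minus an explicit penalty for recent repetition.
    return len(pattern) - 2 * history[-3:].count(pattern)


patterns = ["a", "bb", "ccc", "dddd"]
plain = greedy_history(no_boredom, patterns, steps=12)
bored = greedy_history(with_boredom, patterns, steps=12)

print("without boredom term:", len(set(plain)), "distinct pattern(s)")
print("with boredom term:   ", len(set(bored)), "distinct pattern(s)")
```

Without the penalty the agent repeats its single best pattern at every step; variety appears only when the score itself contains a term that rewards it.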
Anyway: In terms of expected utility maximization, even large probabilities of jumping the interval between a universe-history in which 95% of existing biological species survive Earth’s 21st century, versus a universe-history where 80% of species survive, are just about impossible to trade off against tiny probabilities of jumping the interval between interesting universe-histories, versus boring ones where intelligent life goes extinct, or the wrong sort of AI self-improves.
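The shape of this expected-utility comparison can be sketched in a few lines of Python. All of the numbers below are hypothetical placeholders in arbitrary units, chosen only to show how the arithmetic works; the interview gives no such figures, and the conclusion depends entirely on what values one actually believes.

```python
def expected_gain(probability_shift, utility_interval):
    """Expected utility gained by shifting the odds across a utility interval."""
    return probability_shift * utility_interval


# Hypothetical placeholders, in arbitrary utility units (assumptions only).
species_interval = 1.0        # 95% vs 80% of species surviving
existential_interval = 1e9    # interesting vs boring universe-history
large_shift = 0.5             # a large change in probability
tiny_shift = 1e-6             # a tiny change in probability

species_gain = expected_gain(large_shift, species_interval)
existential_gain = expected_gain(tiny_shift, existential_interval)

# How many times larger the second interval must be before the tiny
# shift outweighs the large one.
break_even_ratio = large_shift / tiny_shift

print(f"species gain:     {species_gain:g}")
print(f"existential gain: {existential_gain:g}")
print(f"break-even interval ratio: {break_even_ratio:g}")
```

Under these assumed values the tiny probability shift dominates, because the ratio of the two utility intervals exceeds the ratio of the two probability shifts. With a smaller assumed gap between the intervals, the comparison would go the other way.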
I honestly don’t see how a rationalist can avoid this conclusion: At this absolutely critical hinge in the history of the universe, Earth in the 21st century, rational altruists should devote their marginal attentions to risks that threaten to terminate intelligent life or permanently destroy a part of its potential. Those problems, which Nick Bostrom named “existential risks”, have got all the scope. And when it comes to marginal impact, there are major risks outstanding that practically no one is working on. Once you get the stakes on a gut level it’s hard to see how doing anything else could be sane.
So how do you go about protecting the future of intelligent life? Environmentalism? After all, there are environmental catastrophes that could knock over our civilization… but then if you want to put the whole universe at stake, it’s not enough for one civilization to topple, you have to argue that our civilization is above average in its chances of building a positive galactic future compared to whatever civilization would rise again a century or two later. Maybe if there were ten people working on environmentalism and millions of people working on Friendly AI, I could see sending the next marginal dollar to environmentalism. But with millions of people working on environmentalism, and major existential risks that are completely ignored… if you add a marginal resource that can, rarely, be steered by expected utilities instead of warm glows, devoting that resource to environmentalism does not make sense.
Similarly with other short-term problems. Unless they’re little-known and unpopular problems, the marginal impact is not going to make sense, because millions of other people will already be working on them. And even if you argue that some short-term problem leverages existential risk, it’s not going to be perfect leverage and some quantitative discount will apply, probably a large one. I would be suspicious that the decision to work on a short-term problem was driven by warm glow, status drives, or simple conventionalism.
With that said, there’s also such a thing as comparative advantage—the old puzzle of the lawyer who works an hour in the soup clinic instead of working an extra hour as a lawyer and donating the money. Personally I’d say you can work an hour in the soup clinic to keep yourself going if you like, but you should also be working extra lawyer-hours and donating the money to the soup clinic, or better yet, to something with more scope. (See “Purchase Fuzzies and Utilons Separately” on Less Wrong.) Most people can’t work effectively on Artificial Intelligence (some would question if anyone can, but at the very least it’s not an easy problem). But there’s a variety of existential risks to choose from, plus a general background job of spreading sufficiently high-grade rationality and existential risk awareness. One really should look over those before going into something short-term and conventional. Unless your master plan is just to work the extra hours and donate them to the cause with the highest marginal expected utility per dollar, which is perfectly respectable.
Where should you go in life? I don’t know exactly, but I think I’ll go ahead and say “not environmentalism”. There’s just no way that the product of scope, marginal impact, and John Baez’s comparative advantage is going to end up being maximal at that point.
Which brings me to AI timescales.
If I knew exactly how to make a Friendly AI, and I knew exactly how many people I had available to do it, I still couldn’t tell you how long it would take because of Product Management Chaos.
As it stands, this is a basic research problem, which will always feel very hard because we don’t understand it, and that means when our brain checks for solutions, we don’t see any solutions available. But this ignorance is not to be confused with the positive knowledge that the problem will take a long time to solve once we know how to solve it. It could be that some fundamental breakthrough will dissolve our confusion and then things will look relatively easy. Or it could be that some fundamental breakthrough will be followed by the realization that, now that we know what to do, it’s going to take at least another 20 years to do it.
I seriously have no idea when AI is going to show up, although I’d be genuinely and deeply shocked if it took another century (barring a collapse of civilization in the meanwhile).
If you were to tell me that as a Bayesian I have to put probability distributions on things on pain of having my behavior be inconsistent and inefficient, well, I would actually suspect that my behavior is inconsistent. But if you were to try and induce from my behavior a median expected time where I spend half my effort planning for less and half my effort planning for more, it would probably look something like 2030.
But that doesn’t really matter to my decisions. Among all existential risks I know about, Friendly AI has the single largest absolute scope: it affects everything, and the problem must be solved at some point for worthwhile intelligence to thrive. It also has the largest product of scope and marginal impact, because practically no one is working on it, even compared to other existential risks. And my abilities seem applicable to it. So I may not like my uncertainty about timescales, but my decisions are not unstable with respect to that uncertainty.
JB: Ably argued! If I think of an interesting reply, I’ll put it in the blog discussion. Thanks for your time.
The best way to predict the future is to invent it. – Alan Kay