Wednesday, September 24, 2014

Bostrom's Superintelligence - Does AI constitute an Existential Risk?

The folks at OUP kindly sent me a review copy of Nick Bostrom's new book Superintelligence, exploring AI risk.  It's a topic that lends itself to eyerolls and easy mockery ("Computers taking over the world? No thanks, I already saw that movie.") -- but I don't think that's quite fair.  So long as you accept that there's a non-trivial chance of an Artificial General Intelligence eventually being designed that surpasses human-level general intelligence, then Bostrom's cautionary discussion is surely one well worth having.  For he makes the case that imperfectly implemented AGI constitutes an existential risk more dangerous than asteroids or nuclear war. To mitigate that risk, we need to work out in advance if/how humanity could safely constrain or control an AGI more intelligent than we are.

Is superintelligent AGI likely? There doesn't seem to be any basis for doubting its in-principle technological feasibility.  Could it happen anytime soon?  That's obviously much harder to say.  But, especially if we imagine starting from whole-brain emulations, and allowing for recursive self-improvement, it doesn't seem outrageous to think that it could occur within a century, or even a matter of mere decades. Perhaps many or most people would be sufficiently cautious to not wish to develop the technology, but all it takes is one...

Why is AGI so risky? Roughly, the argument can be thought of in two steps: (1) Superintelligence will tend to lead to super-capability, or power to optimize the world according to the AGI's goals; and (2) If those goals do not perfectly align with humanity's, we could easily find ourselves in the position of the dodo.

re: 1, Why expect an AGI to be super-capable? A smartphone can't do much by itself, after all; would an AGI be so different?  This depends a bit on how we conceive of AGI, but Bostrom discusses a few obvious possibilities for containment, particularly "Tool AI" (which, on the model of current computers, are not fully-fledged agents with goals, but mere tools for accomplishing a task in a more or less constrained manner), and "Oracle AI" (which can answer questions but not initiate any other actions in the world).

Bostrom argues that Tool AI, if they are to be sufficiently useful, will be made to conduct fairly open-ended and wide-ranging searches for solutions to their assigned problems -- solutions that go beyond what their programmers could have thought of (this is just what makes them a form of AGI rather than just a very fast pocket calculator).  But this already introduces the risk of them deciding on a form of action that we wouldn't necessarily endorse, or that will have unexpected (to us) consequences.  At least if we program the AI with actual goals, we can enlist the superintelligence to help us prevent unintended disaster. (But how great a risk is this, exactly? I'm not totally sold on this argument against Tool AI.)

Oracle AI present two kinds of risk.  One is that they could provide excessive power to their controllers ("Tell me how to take over the world!"). On the other hand, if given goals besides truth-telling -- if, for example, they are somehow imbued with a sense of justice and morality to prevent exploitation should they fall into the wrong hands -- they may be motivated to break free of their confines and optimize the world more directly.  How could they break free?  Bostrom suggests that one avenue of capability that a "boxed-in" self-improving AGI would likely prioritize is social manipulation. It might internally simulate the brains of its jailers to test out various hypotheses about how to get them to "release" it. Perhaps it could blackmail its jailers by threatening to create & torture internal simulations of them -- which could be a frightening prospect if one holds both a computational view of consciousness and a psychological view of personal identity! Perhaps it could entice them with the prospect of how much more it could help them (or the world) if only it were let free...

None of which is to suggest that we would necessarily be unable to control an AI, of course. It's just to note that there is a non-trivial risk here to be aware of.

re: 2, we may ask: would a free and super-capable AGI be so bad? Well, that depends on what it has been programmed to want. If it wants to make us smile, Bostrom points out, it might find that the most effective way to do so is to paralyze our facial muscles into "constant beaming smiles". If it wants to maximize happiness, it might find that the best way to do so is to exterminate biological life and tile the universe with computationally-based happy experiences, simply "playing on loop", so to speak. Neither seems to capture what we really had in mind.

Bostrom takes these cautionary lessons (along with the familiar phenomenon of fundamental moral disagreement) to suggest that attempting to directly explicate the correct values is too risky, and that an indirect approach might be preferable. Something like, "Do what our ideally rational and fully informed selves would converge on wanting..."  How to operationalize these ideas into computer code remains unclear. (Could we hope that an AGI's general linguistic competence would allow it to correctly discern what normative concepts like "ideally rational" mean?)

Anyway, it's all interesting stuff (though probably not a lot new to readers of Less Wrong or Robin Hanson). And it seems to me there's a decent case to be made that it's extremely important stuff, too (though if you disagree, do post a comment to explain!). At any rate, I hope Bostrom's book gets people thinking more about existential risk and shaping the far future, whether or not AI risk turns out to be the most important point of focus to this end. (For more general discussion, see Nick Beckstead, 'On the Overwhelming Importance of Shaping the Far Future.')


  1. I think there's a pretty straightforward argument for taking this kind of discussion seriously, on general grounds independent of one's particular assessment of the possibility of AI itself. The issues discussed by Bostrom tend to be limit-case versions of issues that arise in forming institutions, especially ones that serve a wide range of purposes. Most of the things Bostrom discusses, on both the risk and the prevention side, have lower-level, less efficient analogues in institution-building.

    1. Ah, that'd be interesting! Do you have any particular examples in mind?

    2. A lot of the problems -- perverse instantiation and principal-agent problems, for instance -- are standard issues in law and constitutional theory, and a lot of constitutional theory is concerned with addressing them. In checks and balances, for instance, we are 'stunting' and 'tripwiring' different institutions to make them work less efficiently in matters where we foresee serious risks. Enumeration of powers is an attempt to control a government by direct specification, and political theories going back to Plato that insist on the importance of education are using domesticity and indirect normativity. (Plato's actually very interesting in this respect, because the whole point of Plato's Republic is that the constitution of the city is deliberately set up to mirror the constitution of a human person, so in a sense Plato's republic functions like a weird artificial intelligence.)

      The major differences arise, I think, from two sources: (1) With almost all institutions, we are dealing with less-than-existential risks. If government fails, that's bad, but it's short of wiping out all of humanity. (2) The artificial character of an AI introduces some quirks -- e.g., there are fewer complications in setting out to hardwire AIs with various things than trying to do it with human beings and institutions. But both of these mean that a lot of Bostrom's work on this point can be seen as looking at the kind of problems and strategies involved in institutions, in a sort of pure case where usual limits don't apply.

  2. Others know much more about this than I, but I'll offer the following thought anyway. It strikes me that there's a tension between (1) talking about what goals an AGI is programmed with, and (2) thinking that whole brain emulation (or something much like it) is one of the most plausible routes to AGI. There's no straightforward sense in which an AGI that comes about via whole brain emulation is "programmed to want" anything. A brain-emulating AGI may end up wanting things, but not because its creators intentionally hard-coded certain values. That is, with whole brain emulation, you don't have any clear picture of what sort of behavior you're going to get at the end. When Google's "Artificial Brain" ended up identifying cat videos, that wasn't because its programmers had told it to look for cat videos. Rather, they'd "told" it, effectively, to look for patterns, and cats were what it found.

    So if that's right, the idea that a big part of avoiding the existential risk here involves figuring out what sort of value-strategy to code into an AGI (e.g., the idea towards the end of your post that rather than hard-coding particular substantive values, maybe we should hard-code a certain value-discovering methodology) strikes me as possibly beside the point. If AGI comes about via what strikes many as the most plausible route, there won't have been any hard-coding of values, or value-discovering methodologies, at any point in the process.

    1. Huh, yeah, good point! I guess Bostrom is implicitly assuming that some kind of old-fashioned programming approach is more likely? The sort of "em"(ulation)-centric future that Robin Hanson discusses does look very different from the picture Bostrom sketches -- and (as you note) raises very different issues.

    2. Yeah. In a sense that makes things scarier, since it eliminates one way of avoiding the danger (namely, figuring out the right value programming strategy).

  3. Personally I just hope (and believe) that an AI will realise that, if humans are its biological boot system, keeping us around & functional is in its best long-term interests, as there are natural disasters that would hurt it more than us. The best example of this is solar flares - the Carrington event of 1859 was so powerful that it set fire to telegraph stations, and would be disproportionately more damaging to a machine intelligence - whilst humans would be relatively (relatively) unaffected. Skynet is going to feel pretty stupid if, the day after it kills the last human, a flare fries its servants on the surface & condemns it to existing in a bunker, with no tools to reboot! Whilst this is a flippant riposte to a serious point, it does highlight the fact that an AI, if intelligent & with an eye on maintaining its survival, would have its chances vastly enhanced if it cooperates with humans, rather than enslaves / eliminates us.

