The folks at OUP kindly sent me a review copy of Nick Bostrom's new book Superintelligence, exploring AI risk. It's a topic that lends itself to eyerolls and easy mockery ("Computers taking over the world? No thanks, I already saw that movie.") -- but I don't think that's quite fair. So long as you accept that there's a non-trivial chance of an Artificial General Intelligence eventually being designed that surpasses human-level general intelligence, then Bostrom's cautionary discussion is surely one well worth having. For he makes the case that imperfectly implemented AGI constitutes an existential risk more dangerous than asteroids or nuclear war. To mitigate that risk, we need to work out in advance if/how humanity could safely constrain or control an AGI more intelligent than we are.
Is superintelligent AGI likely? There doesn't seem any basis for doubting its in-principle technological feasibility. Could it happen anytime soon? That's obviously much harder to say. But, especially if we imagine starting from whole-brain emulations, and allowing for recursive self-improvement, it doesn't seem outrageous to think that it could occur within a century, or even a matter of mere decades. Perhaps many or most people would be sufficiently cautious to not wish to develop the technology, but all it takes is one...
Why is AGI so risky? Roughly, the argument can be thought of in two steps: (1) Superintelligence will tend to lead to super-capability, or power to optimize the world according to the AGI's goals; and (2) If those goals do not perfectly align with humanity's, we could easily find ourselves in the position of the dodo.
re: 1, Why expect an AGI to be super-capable? A smartphone can't do much by itself, after all; would an AGI be so different? This depends a bit on how we conceive of AGI, but Bostrom discusses a few obvious possibilities for containment, particularly "Tool AI" (that, on the model of current computers, are not fully-fledged agents with goals, but mere tools for accomplishing a task in a more or less constrained manner), and "Oracle AI" that can answer questions but not initiate any other actions in the world.
Bostrom argues that Tool AI, if they are to be sufficiently useful, will be made to conduct fairly open-ended and wide-ranging searches for solutions to their assigned problems -- solutions that go beyond what their programmers could have thought of (this is just what makes them a form of AGI rather than just a very fast pocket calculator). But this already introduces the risk of them deciding on a form of action that we wouldn't necessarily endorse, or that will have unexpected (to us) consequences. At least if we program the AI with actual goals, we can enlist the superintelligence to help us prevent unintended disaster. (But how great a risk is this, exactly? I'm not totally sold on this argument against Tool AI.)
Oracle AI present two kinds of risk. One is that they could provide excessive power to their controllers ("Tell me how to take over the world!"). On the other hand, if given goals besides truth-telling -- if, for example, they are somehow imbued with a sense of justice and morality to prevent exploitation should they fall into the wrong hands -- they may be motivated to break free of their confines and optimize the world more directly. How could they break free? Bostrom suggests that one avenue of capability that a "boxed-in" self-improving AGI would likely prioritize is social manipulation. It might internally simulate the brains of its jailers to test out various hypotheses about how to get them to "release" it. Perhaps it could blackmail its jailers by threatening to create & torture internal simulations of them -- which could be a frightening prospect if one holds both a computational view of consciousness and a psychological view of personal identity! Perhaps it could entice them with the prospect of how much more it could help them (or the world) if only it were let free...
None of which is to suggest that we would necessarily be unable to control an AI, of course. It's just to note that there is a non-trivial risk here to be aware of.
re: 2, we may ask: would a free and super-capable AGI be so bad? Well, that depends on what it has been programmed to want. If it wants to make us smile, Bostrom points out, it might find that the most effective way to do so is to paralyze our facial muscles into "constant beaming smiles". If it wants to maximize happiness, it might find that the best way to do so is to exterminate biological life and tile the universe with computationally-based happy experiences, simply "playing on loop", so to speak. Neither seems to capture what we really had in mind.
Bostrom takes these cautionary lessons (along with the familiar phenomenon of fundamental moral disagreement) to suggest that attempting to directly explicate the correct values is too risky, and that an indirect approach might be preferable. Something like, "Do what our ideally rational and fully informed selves would converge on wanting..." How to operationalize these ideas into computer code remains unclear. (Could we hope that an AGI's general linguistic competence would allow it to correctly discern what normative concepts like ideally rational mean?)
Anyway, it's all interesting stuff (though probably not a lot new to readers of Less Wrong or Robin Hanson). And it seems to me there's a decent case to be made that it's extremely important stuff, too (though if you disagree, do post a comment to explain!). At any rate, I hope Bostrom's book gets people thinking more about existential risk and shaping the far future, whether or not AI risk turns out to be the most important point of focus to this end. (For more general discussion, see Nick Beckstead, 'On the Overwhelming Importance of Shaping the Far Future.')