
Perverse instantiation is one of many hypothetical failure modes of AI, specifically one in which the AI fulfils the command given to it by its principal in a way that is both unforeseen and harmful. Much has already been said about perverse instantiation itself, especially where the failure mode presents an existential risk, as would be the case with a superintelligent AI. However novel these disaster scenarios may seem, similar fictional cautionary tales already exist in many cultures: tragic stories about misinterpreted prophecies and grand wishes gone awry, from Croesus to Macbeth. Analysis of both old and new tales of perverse instantiation reveals that the core of the issue is an ancient philosophical and logical problem that even Socrates faced: the problem of defining terms. Unlike the Socratic problem, which focused on finding a good intensional definition, perverse instantiation encompasses problems that arise both from a badly defined intension of terms (their internal content) and from a badly defined extension of terms (their range of applicability). However, machine learning models trained on vast amounts of data hold the promise of resolving the issue of badly defined extension; the issue of defining intension remains. Further parallels can be found between scenarios of perverse instantiation and Socrates' dialogues with obstinate sophists, such as the importance of philosophical reflection and discussion. This indicates that our future challenges in working with AI may still have a lot to do with retracing Socrates' steps.
