AI Control Reading List

A curated list of resources on AI control research and concepts.

This page collates recommended reading for those seeking to understand the field of AI control.

AI control aims to develop techniques for training and deploying AI systems so that they cannot cause security or safety failures, even if they try to cause such failures or to subvert the control mechanisms themselves. Although the basic idea has been discussed for some time, systematic research and publication in this area increased significantly starting in late 2023.

Foundational Content

Other Important Conceptual Writing

Listed in roughly decreasing order of priority.

More Stuff to Read