alignment research is very weird

on its face, AI alignment is just the field of study of how we make AI do what we want. seems simple enough.

in practice, it leads to many very strange places. it turns out that making an AI that optimizes for anything at all, we don't care what, is much easier than making it robustly optimize for what we want (whatever that is). here are some weird questions that come up and seem like they might actually need figuring out in order to build aligned AI:

it may be that some or most of these questions are irrelevant. for example, maybe we can just build "dumb AI" whose scope is limited to writing poetry and designing non-smart-AI software, and somehow everyone agrees to only build that kind of AI (as opposed to, say, facebook AI killing everyone six months later). but in the general case, where AI is meant to be arbitrarily capable, which is what most AGI labs are pursuing (sometimes even intentionally and explicitly), these questions are relevant, at the very least in the meta sense of "which questions do we actually need to figure out?".


CC_-1 License
Unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.
This site lives at https://carado.moe and /ipns/k51qzi5uqu5di8qtoflxvwoza3hm88f5osoogsv4ulmhurge2etp9d37gb6qe9.