
botched alignment and alignment awareness

AI alignment is hard.

an AI developer who doesn't know about the problem of alignment to general human values might accidentally develop a superintelligence which optimizes for something largely unrelated to humans, leading us to an X-line. on the other hand, if they make a botched attempt at alignment to human values, there seems to be more of a chance (compared to if they don't try) of booting a superintelligence which cares about enough aspects of human existence to tile the universe with some form of humans, but not enough to make those humans' lives actually worth living (with goals such as "humans must not die"), resulting in S-lines.

considering this, raising awareness of AI alignment issues may be a very bad idea: it might be much better to let everyone develop not-human-caring-at-all AI and cause X-lines rather than risk them making imperfect attempts resulting in S-lines. or: we shouldn't try to implement alignment to human values until we really know what we're doing.

contrary to a previous post of mine, this is a relatively hopeful position: no matter how many timelines end in X-risk, inhabited P-lines can continue to exist and research alignment, hopefully without too many S-lines being created. on the other hand, while this increases the chance of the singularity turning out good by leaving us more time to figure out alignment, it also means that a good singularity might take longer to arrive than i'd've otherwise expected.


RSS feed available here; new posts are also linked on my twitter.
Unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.
This site lives at https://carado.moe and /ipns/k51qzi5uqu5di8qtoflxvwoza3hm88f5osoogsv4ulmhurge2etp9d37gb6qe9.