
against AI alignment?

i usually consider AI alignment to be pretty critical. that said, i can see ways in which the research generally associated with alignment could, if applied, do more harm than good.

this is a development on my idea of botched alignment: just like AI tech is dangerous if it's developed before alignment because unaligned AI might lead to X-lines, alignment is dangerous because it lets us align AI to things that we think we want but that aren't actually good; which sounds like it could lead to an increase in the ratio of S-lines to U-lines.

with this comes a sort of second orthogonality thesis, if you will: one between what we think we want and what is actually good. note that in both cases, the orthogonality thesis is a default position: it could be wrong, but we shouldn't assume that it is.

determining what is good is very hard, and in fact has been the subject of the field of ethics, which has been a work in progress for millennia. and, just like we must accomplish alignment before we accomplish superintelligence if we are to avoid X-risks, we might want to consider getting ethics accomplished before we start using alignment if we are to avoid S-risks, and avoiding S-risks should be a lot more important. or, at least, we should heavily consider the input of ethics into alignment.

things like my utopia are merely patches to try and propose a world that hopefully doesn't get too bad even after a lot of time has passed; but they're still tentative and no doubt a billion unknown and unknown unknown things can go wrong in them.

it is to be emphasized that both parts of the pipeline are important: we must make sure that what we think is good is what is actually good, and then we must ensure that that is what AI pursues. maybe there's a weird trick to implementing what is good directly without having to figure it out ourselves, but i'm skeptical, and in any case we shouldn't go around assuming that to be the case. in addition, i remain highly skeptical of "value learning" approaches: they seem like they would be at most as good as aligning to what we think is good.

so, it is possible that just as i have strongly opposed doing AI tech research until we've figured out AI alignment, i might now raise concerns about researching AI alignment without progress on, and input from, ethics. in fact, there's a possibility that putting resources into AI tech over alignment could be an improvement: we should absolutely avoid S-risks, even at the cost of enormously increased X-risks.


CC_ -1 License Unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.
This site lives at https://carado.moe and /ipns/k51qzi5uqu5di8qtoflxvwoza3hm88f5osoogsv4ulmhurge2etp9d37gb6qe9.