posted on 2023-02-25

clarifying formal alignment implementation

one thing i think bears clarifying, for the purpose of how i intend to save the world using AI aligned to formal goals, is that giving an AI a model of the world (with an understanding of people and 3D space and physics and stuff) is way too hard: pointing to concepts in such models is gonna be messy, and we don't have aligned data or reward signals to align to.

however, there are two things i think we can do more easily, and which are more likely to be eventually aligned / robust to the sharp left turn:

this is my way around the problem of getting AI to model the world and concepts in the way we want: leave it up to AI₀, and possibly its successors. at the moment, it seems plausible to me that we get to even delegate solving embedded agency to AI₁, at least if we're okay with Son-of-CDT, but maybe even if not.

this is why i don't have much interest in a lot of existing prosaic alignment research, or in world-modeling capabilities work.


CC0 1.0 license: unless otherwise specified on individual pages, all posts on this website are licensed under the CC0 1.0 license.
unless explicitly mentioned, all content on this site was created by me, not by others nor by AI.