(posted on 2022-06-10)

inner alignment is "just" a hard engineering problem; outer alignment is the work of philosophy and politics and values, which our species has been investigating and debating for millennia.

are human values the same for everyone, or do they differ?

should we implement the values held by us, us now, everyone now, everyone ever, or everyone possible?

would some philosophical/political perspectives constitute suffering risks? for example, if many people on earth want to be correct, and if they also believe there is a hell where some people suffer forever, does that mean satisfying their values entails creating an at least moderately-sized hell, the inhabitants of which in some sense "value" suffering forever? is that okay?

if one person wants to go have gay sex, but ten christians want nobody anywhere to have gay sex, does self-determination trump naïve utilitarian value satisfaction?

or should we create one giant super-consensus society where we all value being boringly blissful, and forgo all diversity, such that our values are easily implemented and non-conflicting; do we desire harmony above diversity?

if we value diversity, how much diversity should we instantiate; what is the threshold of "evilness" at which a culture should not be able to exist?

how do we even reason about existential self-determination?

what about suffering in fundamental physics and suffering subroutines?

what are the politics and fundamental values of the people who will get to work on alignment?

on one hand, my answers to these questions are, respectively, "the latter", "us now", "possibly, yes, no", "yes", "no", "a bunch", "i don't know", "hopefully they don't matter too much", and "uh oh". on the other hand, i hope this post conveys how ridiculously not-talked-about-enough these questions are, considering how important they might be to what we fill the rest of this universe's history with.

mildly related: "politics is the mind-killer" is the mind-killer

