tammy's blog about
AI alignment, utopia, anthropics, and more;
i've come to clarify my view of value sufficiently many times that i feel like having a single post i can link to would be worth it. this is that.
what i call value is the things we care about: what determines what we ought to do. i use "morality" and "ethics" interchangeably to generally mean the study of value.
a lot of this post is just ethics 101, but i feel it's still nice to have my own summary of things.
for more on values, read the sequences, notably book V.
see also this post on how explicit values can come to be.
a first distinction is that between consequentialism, where values are about outcomes, and deontology, where values are about actions.
the trolley problem is the typical example of a thought experiment that can help us determine whether someone is a consequentialist or a deontologist: a consequentialist will press the lever because they care about the outcome of people being alive, whereas a deontologist will not press the lever because they care about the action of causing a death.
i am a consequentialist: i care about outcomes. that said, consequentialism has to be followed all the way to the end: if someone says "well, a consequentialist would do this thing, which would eventually lead to a worse world", then they're failing to understand consequentialism: if the eventual outcome is a worse world, then a consequentialist should oppose the thing. to that end, we have rule consequentialism: recognizing that committing to certain rules (such as "if you commit a murder, you go to prison") helps us achieve generally better outcomes in the longer term.
a special case of consequentialism is utilitarianism, in which the consequential outcome being cared about is some form of positive outcome for persons; generally happiness and/or well-being. i tend to also value people getting their values satisfied and having self-determination/freedom (not valuing self-determination has issues), possibly moreso than happiness or well-being, so i don't know if i count as a utilitarian.
i make a distinction between instrumental values, and intrinsic values (the latter can also be called "core values", "axiomatic values", "ultimate values", or "terminal values"; but i try to favor the term "intrinsic" just because it's the one wikipedia uses).
instrumental values are values that one has because it helps them achieve other values; intrinsic values are what one ultimately values, without any justification.
any theoretical inquiry into values should trace a sequence of instrumental values eventually leading to a set of intrinsic values; and those cannot be justified. if a justification is given for a value, then that value is actually instrumental.
just because intrinsic values don't have justifications, doesn't mean we can't have a discussion about them: a lot of discussion i have about values is trying to determine whether the person i'm talking to actually holds the values that they believe they hold; people can be and very often are wrong about what values they hold, no doubt to some extent including myself.
one can have multiple intrinsic values; and then, maximizing the satisfaction of those values is often the careful work of weighing those different intrinsic values against each other in tradeoffs.
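a caricature of that weighing could look something like a weighted sum over how well each intrinsic value is satisfied. (this is purely a toy sketch: the values, weights, and scores here are made up for illustration, and real values are nowhere near this clean or this legible.)

```python
# toy sketch: an agent with two hypothetical intrinsic values,
# weighing them against each other when comparing outcomes.
# all names and numbers here are made up for illustration.

def total_value(outcome, weights):
    """weighted sum over how well each intrinsic value is satisfied."""
    return sum(weights[v] * satisfaction for v, satisfaction in outcome.items())

# made-up relative weights on two made-up intrinsic values
weights = {"happiness": 0.6, "freedom": 0.4}

# each outcome scores how well it satisfies each value, from 0 to 1
outcome_a = {"happiness": 0.9, "freedom": 0.2}  # happy but constrained
outcome_b = {"happiness": 0.5, "freedom": 0.9}  # freer but less happy

best = max([outcome_a, outcome_b], key=lambda o: total_value(o, weights))
```

with these particular numbers, the freer-but-less-happy outcome narrowly wins; shift the weights a little and the ranking flips, which is all "tradeoff" means here.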
this isn't to say intrinsic values don't have causal origins; but that's a different matter from moral justification.
a lot of the time, when just saying "values", people are talking about intrinsic values rather than all values (including instrumental); i do this myself, including throughout this post.
most people don't have a formalized set of values, they just act by whatever seems right to them in the moment. but, even to rationalists like me, knowing what values one has is very hard, even moreso in a formalized manner; if we had the complete formal description of the values of even just one person, we'd have gone a long way towards solving AI alignment, which is by extremely far the single most important problem humankind has ever faced, and is gonna be very difficult to get right.
to try and determine my own values, i generally make a guess and then extrapolate how a superintelligence would maximize those values to the extreme and see where that fails. but, even with that process, it is very hard work, and like pretty much everyone else, i don't have a clear idea what my values are; though i have some broad ideas, i still have to go by what feels right a lot of the time.
another question is whether one's values can be about something other than oneself. this is not about whether someone ultimately only wants their own values to be satisfied; that is true by definition. this is about whether those values can be about something other than the person holding them.
people seem to be divided between the following positions:
1. one's intrinsic values can only be about oneself, such as one's own experiences or well-being;
2. one's intrinsic values can also be about the world outside of oneself;
3. the distinction doesn't hold up, because the self it relies on isn't a coherent thing to begin with.
i hold position 2, and strongly reject position 1, though it seems very popular among people with whom i have talked about values; i see no reason why someone can't hold a value about the world outside of themselves, such as intrinsically wanting other people to be happy or intrinsically wanting the world to contain pretty things. for more on that, see this post and this post from the sequences.
position 3 can make some sense if you deconstruct identity, but i believe identity is a real thing that can be tracked, and thus something whose outcomes you can absolutely happen to particularly care about.
value preservation is the notion that, if you know that you value something (such as being wealthy or the world containing pretty things), you should probly try to avoid becoming someone who doesn't value those things, or worse: someone who values the opposite (such as being poor or the world containing only ugly things).
the reason for this is simple: you know that if you become someone who values being poor, you'll be unlikely to keep taking actions that will lead you to be wealthy, which goes against your current values; and your goal is to accomplish your values.
some people argue "well, if i become someone who values being poor, and then i take actions to that end, that's fine isn't it? i'm still accomplishing my values". but it's really not! we established that your value is "being wealthy", not "being someone whose values are satisfied". in fact, "being someone whose values are satisfied" is meaningless to have as a particular value; the fact that you want your values to be satisfied is already implied in them being your values.
i call the process of someone finding out that they should preserve their values, and thus committing to whatever values they had at that moment, "value crystallization"; however, one ought to be careful with that. considering that one's set of values is likely a very complex thing, one is likely to hastily over-commit to what they believe are their values, even though they are wrong about what values they hold; worse yet, they might end up committing so hard that they actually start changing their values towards those believed values. this is something one should of course aim to avoid: as mentioned above, you generally don't want to become someone who doesn't hold the values you currently do, including through the process of hasty crystallization and over-commitment.
this is not to say you should remain in a complete haze where you just do whatever seems right at any moment; without a special effort, this could very well entail your values changing, something you shouldn't want even if you don't know what those values are.
what you should do is try to broadly determine what values you have, and generally try to commit to preserving whatever values you have; and in general, to be the type of person who preserves the values they have. this should help you preserve whatever values you actually do have, even while you still haven't figured out what they are.
a funny hypothetical version of this could be: present-you should make a contract with future-you that if they ever gain the ability to precisely examine values, they should examine what values present-you had, and adopt those.
unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.
unless explicitly mentioned, all content on this site was created by me; not by others nor by AI.