
(posted on 2022-09-27 — also cross-posted on lesswrong, see there for comments)

existential self-determination

existential self-determination is a problem i have pondered for a while (1, 2). in this post, i talk about how i've come to think about it since my shift towards rejecting continuous identity, and tentatively embracing trust in my meta-values.

here's the current view: among the set of viable moral-patient-instants (henceforth "MPI"), current me has some values about which ones to instantiate. notably:

when the AI weighs those against various constraints like computational cost or conflict resolution, according to whatever set of meta-values it's aligned to, it can figure out what next set of MPIs to spawn. note that it's not clear how to determine what an MPI's values are with regards to that; this is where difficulty remains.

(one of those constraints is that we should probly only create future MPIs which would retroactively consent to exist. i'm hopeful this is the case for my own future selves: i would want to create a future self/future selves that are reasonably aligned with my current self, and my current values include that i'm pretty happy about existing — or so i believe, at least. evaluating that would ultimately be up to the AI, of course.)

note that this framework doesn't embed a fundamental notion of continuous identity: the AI just looks at the values it's aligned to — hopefully those entail satisfying the values of currently existing MPIs — and satisfies those values in whatever way they want, including what new MPIs should exist. any notion of "continuous identity" is merely built inside those MPIs.

a typical notion of "continuous person" would be a particular case of sequences of MPIs generally valuing the existence of future instances in the sequence; but that's just one set of values among others, and other perspectives on individualism could be satisfied as well in the same future.

in fact this framework about which set of future MPIs we'd want to instantiate, describing not just which minds are instantiated but what environment they get to experience — including interactions with other MPIs — seems like it might be a sufficient foundation for AI-satisfied values in general. that is to say: it might be the case that any kind of meaningful values would be reasonably encodable as answers to the question "what next set of MPIs should be instantiated?". or, put another way, that might be the type that a utility function would take.

such a foundation does rule out caring about non-moral-patient material things: you can't want The Moon to be painted green; at most, you can want everyone to perceive A Moon as green. but, by way of embracing computational materialism, i kind of already hold this position — the ultimate point of importance is MPIs, and caring "radiates outwards" from those.
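as a rough illustration of "the type that a utility function would take", here is a minimal python sketch. everything in it is hypothetical and made up for this example: the `MPI` record, its `mind` and `environment` fields, the `World` and `Utility` aliases, and the `choose_next` helper. it only shows the shape of the idea (a utility function scores candidate next sets of MPIs and sees nothing else), not how an aligned AI would actually evaluate values.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable


@dataclass(frozen=True)
class MPI:
    """hypothetical stand-in for a moral-patient-instant:
    a mind together with the environment it gets to experience."""
    mind: str
    environment: str


# one candidate "next set of MPIs to instantiate"
World = FrozenSet[MPI]

# the type suggested in the text: values are encoded as a score
# over candidate next sets of MPIs, and over nothing else
Utility = Callable[[World], float]


def choose_next(candidates: Iterable[World], utility: Utility) -> World:
    """pick the candidate next set that the utility function scores highest."""
    return max(candidates, key=utility)


def perceived_green_moon(world: World) -> float:
    """toy utility: counts the MPIs whose experienced environment
    contains a green moon. it cannot refer to the moon itself,
    only to what each MPI perceives."""
    return sum(1.0 for mpi in world if "green moon" in mpi.environment)
```

under this toy encoding, "want The Moon to be painted green" has nowhere to live: a `Utility` only receives MPIs, so the closest expressible preference is the one `perceived_green_moon` encodes, namely that more MPIs perceive a green moon.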



unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.