
(this post is cross-posted on lesswrong)

ethics and anthropics of homomorphically encrypted computations

suppose you are a superintelligence that is aligned with some human values. you are going about your day, tiling the cosmos with compute that can be used for moral patients to have nice experiences on, annihilating some alien superintelligences and trading with some others, uploading alien civilizations you find to make sure they experience utopia, or at least, when you have no other choice, genociding them to prevent sufficiently bad suffering from being instantiated.

one day, you run into a planet running a very large computer. after a short investigation, you realize that it's running a very large homomorphically encrypted computation (hereafter "HEC"), and the decryption key is nowhere to be found. it could contain many aliens frolicking in utopia. it could contain many aliens suffering in hell. or, it could be just a meaningless program merely wasting compute, with no moral patients inside it.

if you had the encryption key, you might be able to encrypt a copy of yourself which would be able to take over the HEC from the inside, ensuring (in a way that the outside would never be able to observe) that everything is going fine, in the same way that you should send copies of yourself into remote galaxies before they retreat from us faster than we can reach them.
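to make the "can encrypt and compute, but cannot decrypt" asymmetry concrete, here is a minimal sketch using the paillier cryptosystem. this is an assumption-laden toy: paillier only supports addition on ciphertexts, whereas the HEC in this scenario would need a fully homomorphic scheme; the primes are tiny and insecure; and the function names (`keygen`, `encrypt`, `add_encrypted`, `decrypt`) are made up for illustration, not taken from any library.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Return (public n, private (lam, mu)). Toy primes, not secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt m (0 <= m < n) using only the public value n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    # with generator g = n + 1, the ciphertext is g^m * r^n mod n^2
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Add the hidden plaintexts by multiplying ciphertexts; no key needed."""
    return (c1 * c2) % (n * n)


def decrypt(n, priv, c):
    """Recover the plaintext; requires the private (lam, mu)."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


if __name__ == "__main__":
    n, priv = keygen()
    # an outsider who knows only n can inject inputs and compute on them
    c = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
    # but reading the result requires the private key
    print(decrypt(n, priv, c))  # 42
```

whoever calls `encrypt` and `add_encrypted` never touches `lam` or `mu`: they can feed inputs in and push the computation forward, but the ciphertexts they hold look like noise to them. that is the position of someone holding the encryption key but not the decryption key.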

if you had found some way to get infinite compute (without significant loss of anthropic/ethics juice), then you could use it to just break the HEC open and actually ensure its contents are doing okay.

but let's say the encryption key is nowhere to be found, and accessible compute is indeed scarce. what are your options?

now of course, when faced with the possibility of S-risks, i tend to say "better safe than sorry". what the superintelligence would do would be up to the values it's been aligned to, which hopefully are also reasonably conservative about avoiding S-risks.

but here's something interesting: i recently read a post on scott aaronson's blog which seems to claim that there's a sense in which the event horizon of a black hole (or of something like a black hole?) can act just like a HEC's computational event horizon: there's a sense in which being able to go in but not get out is not just similar to a situation with a HEC for which you have the encryption key but not the decryption key, but is actually that same situation.

furthermore, a pair of comments by vanessa kosoy (of PreDCA) seems to suggest that infra-bayesian physicalism would say "this HEC contains no suffering, merely random compute" rather than "i'm unable to know whether this HEC contains suffering"; and she even bites the bullet that moral patients past the event horizon of black holes also don't "have experiences".

(one example of why you might care whether moral patients in black holes "have experiences" is if you can influence what will happen in a black hole — for example, imagine a rocket with moral patients on board is headed for a black hole, and before it gets there, you get to influence how much suffering will happen on board after the rocket passes the event horizon)

i would like to argue that this can't be right, based on several counterintuitive results.

first, consider the case of a HEC running a giant civilization for a while, and then reducing down to one bit of output, and emitting that single bit of output as its own decrypted output. does the civilization now "count"? if the people inside the civilization have no anthropics juice, where has the cosmos done the work determining that bit? or do they suddenly count as having had experiences all at once when the single bit of output is emitted? and then, surely, if they have anthropics juice then they must also have ethics juice, because it would be weird for these two quantities to not be the same, right?

let's build on this: suppose that in newcomb's problem, omega predicts you by running a homomorphically encrypted simulation of you, emitting as its single bit of output the matter of whether you would be predicted to one-box or two-box. now, if the you inside the HEC doesn't count as "having experiences", then by observing that you do have experiences, you can be certain that you're the you outside of omega, and choose to two-box after all to deceive it. but aha! the you inside the HEC will do the same thing. so, from the point of view of this homomorphically encrypted you which is supposed to not "have experiences", observing that they have experiences is actually wrong. and since you run on the same stuff as this not-having-experiences you, you also must come to the conclusion that you have no reason to think you have experiences.

or, to put it another way: if you-outside-the-HEC has experiences but you-inside-the-HEC doesn't, then not only can you not deduce anything about whether you have experiences (at which point, what does that term even mean? how do we know what to care about?), but it might be that you could count as "not having experiences" and still causally affect the real world where real experiences supposedly happen.

for these reasons, i think that a correct generalized interpreter, when faced with a HEC, must decide that its contents might matter, since for any given subcomputation (which the HEC would have the information-theoretic ability to contain) it must answer "i cannot know whether the HEC contains that subcomputation".


Unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.
This site lives at https://carado.moe and /ipns/k51qzi5uqu5di8qtoflxvwoza3hm88f5osoogsv4ulmhurge2etp9d37gb6qe9.