posted on 2023-01-29

communicating with successful alignment timelines

consider the following plan for AI alignment:

  1. (quantum-)generate an asymmetric cryptographic signature keypair (with a quantum-resistant signature scheme)
  2. (quantum-)generate an idea for alignment — such as a 1GB file of plaintext
  3. if the idea is good, use it to solve alignment and then have the aligned AI store a signature of the idea somewhere
  4. if it isn't, destroy the private signing key, then create an AI whose formal goal is to implement whichever signed solution it finds stored in the multiverse — with signatures checked using the (non-destroyed) public verification key.
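the steps above can be sketched with a toy lamport one-time signature — hash-based, hence plausibly quantum-resistant, and consumed by a single use. this is my own illustrative sketch, not part of the original plan; a real deployment would use an audited post-quantum library.

```python
# toy Lamport one-time signature: keygen / sign / verify.
# illustrative only -- not hardened, not constant-time.
import hashlib
import secrets

def keygen():
    # private key: 256 pairs of random 32-byte secrets
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    # public key: the hash of each secret
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk

def bits(msg: bytes):
    # the 256 bits of the message digest, most significant first
    digest = hashlib.sha256(msg).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(sk, msg: bytes):
    # reveal one secret per digest bit; doing this consumes the key,
    # since revealing more secrets for a second message would leak it
    return [sk[i][b] for i, b in enumerate(bits(msg))]

def verify(pk, msg: bytes, sig):
    # check each revealed secret against the published hash
    return all(hashlib.sha256(s).digest() == pk[i][b]
               for (i, b), s in zip(enumerate(bits(msg)), sig))

# step 1: generate the keypair
sk, pk = keygen()
idea = b"placeholder for a 1GB plaintext alignment idea"
# step 3 (good-idea timeline): sign the idea, then the key is spent/destroyed
sig = sign(sk, idea)
del sk
# step 4 (other timelines): check candidate ideas against the public key only
assert verify(pk, idea, sig)
assert not verify(pk, b"some other idea", sig)
```

the one-time property is the relevant design choice here: once the key has signed (or been destroyed), no later attacker holding our civilization's remains can sign a different idea with it.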

for this scheme to work, AIs we make in timelines where the randomly generated idea isn't good — which is the exponential majority of timelines — need to be unable to recover the private signing key, whether by brute-force, by examining the world for traces of it, or by resimulating history.

perhaps boxing an AI can work for this. note that the AI doesn't necessarily need to resimulate alternate timelines in full; it might be able, even with its limited boxed compute, to guess at what kinds of ideas we'd tend to sign. requiring this limitation on an AI's capabilities makes this a wonky alignment scheme.

my understanding of quantum mechanics is limited, but as i understand it there might be quantum computation schemes which could, at least theoretically, allow a private signing key's bits to be stored in a way that lets us either destroy the key or consume it by signing a piece of data, such that its bits are not leaked into the world when we destroy it. consuming the key when using it to sign might help ensure that even if our aligned-AI-and-civilization are later taken over and overwhelmed by an alien superintelligence, it can never use our private signing key to sign some other idea.

2023-02-28 edit: i've realized this wouldn't work because, if we can spawn exponentially many timelines to explore ideaspace, then the AI can spawn exponentially many timelines to generate all possible signature keypairs, find the private key that matches our public key that way, and use that to sign whatever idea makes its job easiest. so, we'd have to have some notion of causality requiring the generated ideas to precede the AI, like the "past user" in QACI or PreDCA. but at that point, cryptography is not needed anymore; we can just look for instances of a good idea next to the phrase "and i think this is a really good idea" — at most, cryptography might(?) help against remote attackers like aliens or something.


CC_-1 License — unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.
unless explicitly mentioned, all content on this site was created by me; not by others nor by AI.