tammy's blog about
AI alignment, utopia, anthropics, and more;
here are some things i've realized with regard to blob location, for QACI.
if i quantum-generate the question blob then it's not clear if, in order to inject the counterfactual question, i need to locate-and-counterfactually-replace just the first intended instance of the blob (perhaps the physics-level qubits), or if i need to in some sense locate-and-counterfactually-replace all intended instances, including ones that are "complexly encoded" macro-states — instances of the question blob that are for example stored in configurations of transistors on hard drives. it seems like i'll have to locate complexly encoded rather than basic-quantum-level encoded blobs anyways because:
capturing the answer as physics-level qubits might be complicated, because of quantum decoherence: you can turn qubits into decohered macro-states, but you can't turn decohered macro-states back into qubits, or at least not before we randomly reach that state exponentially far into heat death or something. so, it looks like anything that's downstream of the answer blob is going to have to be located as a macro-state.
one way that i feel like we could somewhat reliably locate the answer blob, which also gives us flexibility in when the answer blob is produced rather than requiring every question-answer interval to end at the same time, would be by using a cryptographic signature scheme: we generate the question blob along with a large, quantum-randomly generated cryptographic signature keypair, and then when we have an answer we produce a blob of data consisting of the answer followed by a signature of the answer. the QACI formal goal would then require locating the answer next to a signature of it that verifies under the public key.
if we implement "first occurrence of the answer in time" correctly, then we don't even need to destroy the private key: the unaligned AI can sign whatever it wants, and that doesn't change what the first signed occurrence is.
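to make the signature idea concrete, here's a minimal sketch in python, under several assumptions that are mine and not part of QACI itself: the "world" is flattened into a single byte string ordered by time, the answer has a fixed hypothetical length `ANSWER_LEN`, the OS random generator stands in for quantum randomness, and the signature scheme is a Lamport signature over SHA-256, picked only because it fits in the standard library. all function names are made up for illustration.

```python
import hashlib
import secrets

HASH_LEN = 32               # SHA-256 digest size in bytes
BITS = 256                  # one revealed secret per bit of the message digest
SIG_LEN = BITS * HASH_LEN   # a Lamport signature is 256 secrets of 32 bytes
ANSWER_LEN = 16             # hypothetical fixed answer size, for this sketch only


def _h(data):
    return hashlib.sha256(data).digest()


def _digest_bits(message):
    # The 256 bits of SHA-256(message), most significant bit first.
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(BITS)]


def generate_keypair():
    # Assumption: the OS CSPRNG stands in for a quantum random source.
    private = [(secrets.token_bytes(HASH_LEN), secrets.token_bytes(HASH_LEN))
               for _ in range(BITS)]
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public


def sign(private, message):
    # Reveal, for each digest bit, the secret matching that bit's value.
    return b"".join(private[i][bit] for i, bit in enumerate(_digest_bits(message)))


def verify(public, message, signature):
    if len(signature) != SIG_LEN:
        return False
    for i, bit in enumerate(_digest_bits(message)):
        chunk = signature[i * HASH_LEN:(i + 1) * HASH_LEN]
        if _h(chunk) != public[i][bit]:
            return False
    return True


def make_answer_blob(private, answer):
    # The blob is the answer immediately followed by its signature.
    assert len(answer) == ANSWER_LEN
    return answer + sign(private, answer)


def locate_first_answer(public, stream):
    # Scan offsets in order; return (offset, answer) for the first position
    # where an answer sits next to a signature valid under the public key.
    blob_len = ANSWER_LEN + SIG_LEN
    for offset in range(len(stream) - blob_len + 1):
        answer = stream[offset:offset + ANSWER_LEN]
        signature = stream[offset + ANSWER_LEN:offset + blob_len]
        if verify(public, answer, signature):
            return offset, answer
    return None


private, public = generate_keypair()
first = make_answer_blob(private, b"first answer 000")
later = make_answer_blob(private, b"later answer 111")  # e.g. signed later by an AI holding the key
stream = secrets.token_bytes(50) + first + secrets.token_bytes(50) + later
print(locate_first_answer(public, stream))
```

in this toy setup the later blob is also validly signed, but the scan stops at the earlier one, which is the "first occurrence" property in miniature. two caveats: Lamport keys are meant to sign only once, so a real version would want a different scheme, and scanning a byte string says nothing about how to locate macro-states in actual physics.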
one doubt i have at this point is how this plays with many-worlds. in communicating with successful alignment timelines, i got blindsided by unintuitive consequences of many-worlds (see the edit on that post); i wonder if there could be something similar going on here.
finally, there might be a "cleaner" way to "entangle" the answer with the question somehow than a cryptographic signature, which i'm not aware of.
unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.
unless explicitly mentioned, all content on this site was created by me; not by others nor AI.