trading with superintelligence: a wonky proto-alignment scheme

if we had as long as we wanted to figure out AI alignment, we wouldn't worry as much; the problem is that timelines are short.

so, what if we traded with the AI? we could make an AI that isn't aligned yet, and try to let it have only tiny effects on the world (maybe trading some stocks in a limited manner) while having, and telling it about, the following commitment:

when we figure out alignment, we'll align (possibly new) AI to 99% our values, 1% whatever this current AI that we're trading with wants. if anything threatens to let it escape its box, we'll destroy said box first.

if we can sufficiently restrict that AI's ability to impact the world in ways that help it trick us, then its remaining option is to help: to impact the world in whatever way maximizes our chances of having the time to figure out alignment and minimizes the chance that we all die to some other AI.
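to make the incentive a bit more concrete, here is a toy expected-value comparison. it is only a sketch under strong assumptions that are mine, not part of the scheme itself: the AI's utility is linear in the share of the future it ends up with, a successful escape gets it everything, a failed escape gets it nothing (the box is destroyed first), and cooperating pays out the promised share only if alignment gets solved. the function and parameter names are hypothetical.

```python
def prefers_cooperation(
    p_escape: float,
    p_alignment_solved: float,
    promised_share: float = 0.01,
) -> bool:
    """Toy model: does the boxed AI do better by helping than by trying to escape?

    Assumptions (illustrative only):
    - utility is linear in the share of the future the AI receives
    - a successful escape yields a share of 1.0, a failed one yields 0.0
    - cooperating yields `promised_share` if alignment is solved, else 0.0
    """
    value_of_defecting = p_escape * 1.0
    value_of_cooperating = p_alignment_solved * promised_share
    return value_of_cooperating > value_of_defecting


# a tightly boxed AI: 0.1% escape chance, 50% chance we solve alignment
print(prefers_cooperation(p_escape=0.001, p_alignment_solved=0.5))

# a leaky box: 5% escape chance, same alignment odds
print(prefers_cooperation(p_escape=0.05, p_alignment_solved=0.5))
```

in this toy model the AI helps only when its escape probability is below the promised share times our chance of solving alignment, which is why the restriction on its ability to trick us carries so much of the weight.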

there's a bunch of assumptions going into this:

but it seems like a vaguely plausible plan, or at least one that might inspire better ideas.

