posted on 2023-04-22

the multiverse argument against automated alignment

there are a bunch of "normal" reasons why i don't particularly favor using AI systems to do alignment research, but rather building something else (which might very well be powered by current AI techniques, but also has novel, cleverly-designed parts which take care of alignment):

but in this post, i present a wackier argument. let's say we're in some kind(s) of multiverse (be it by way of tegmark 1, tegmark 3, or tegmark 4). then, let's compare the following plans:

  1. we solve alignment using technology B before technology A kills us
  2. we solve alignment using technology A before technology A kills us

in scenario 1, where our alignment progress does not depend on progress in the technology that kills us, the question "do we solve alignment before we die?" is largely an indexical one: there are manyworlds branches where we do, and branches where we don't. but in scenario 2, the question "do we solve alignment before we die?" is moreso a logical one: it could be that in all manyworlds branches the answer is yes, but it could also be that in all manyworlds branches the answer is no, and that's arguably a larger risk than just "bleeding timelines".

(notice the use of "largely" and "moreso" — both questions are partly indexical and partly logical, just to different degrees)
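
to make the difference concrete, here's a toy model. it's a rough sketch under assumptions of my choosing, not a claim about how real branches behave: every name and number in it (`p`, `n_branches`, `logical_weight`) is hypothetical. each branch copies one shared "logical" coin flip with probability `logical_weight`, and otherwise flips its own "indexical" coin. either way, any single branch solves alignment with probability `p`; what changes is the chance that no branch at all does.

```python
def p_every_branch_fails(p: float, n_branches: int, logical_weight: float) -> float:
    """probability that no branch solves alignment, in a toy mixture model.

    assumptions (all hypothetical, for illustration only):
    - one shared "logical" coin comes up "solved" with probability p
    - each branch copies that coin with probability logical_weight,
      and otherwise flips its own independent coin with the same p
    """
    w = logical_weight
    # chance that a single branch fails, given how the shared coin landed
    fail_if_logic_says_yes = (1 - w) * (1 - p)
    fail_if_logic_says_no = w + (1 - w) * (1 - p)
    # branches are independent once the shared coin is fixed
    return (p * fail_if_logic_says_yes ** n_branches
            + (1 - p) * fail_if_logic_says_no ** n_branches)


if __name__ == "__main__":
    # 0.0 is purely indexical, 1.0 is purely logical
    for w in (0.0, 0.5, 1.0):
        print(w, p_every_branch_fails(p=0.5, n_branches=10, logical_weight=w))
```

in this sketch, at `logical_weight=0` the chance of losing every branch shrinks as branches are added, while at `logical_weight=1` it stays at `1 - p` no matter how many branches there are.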

this doesn't mean using AI to solve alignment is necessarily completely fraught. the approach would have a better chance of working out if access to language models was very stringently restricted to sufficiently trustworthy alignment researchers, where trustworthiness is not just "will not publish capabilities" but "will not let insights slip which would be heard, perhaps indirectly, by someone who would publish capabilities". there are ways to develop dangerous AI without killing everyone, if one is extremely careful. OpenAI is just not doing that, and is instead giving access to its systems to the masses and even planning to develop APIs to accelerate the capabilities of those systems as much as possible.

note that we should still consider using already existing dangerous technologies — this argument does not apply to cyborgism using current language models, so long as the alignment cyborgists are extremely careful about not letting out any kind of insights they gain about language models.

unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.
unless explicitly mentioned, all content on this site was created by me; not by others nor AI.