an action — such as building, running, giving access to, or publishing an AI system — is dangerous to the extent that it moves the world into a more dangerous state. giving people access to DALL-E means the world now contains the easy ability to generate images automatically, which is probably not a big deal as far as doom goes; but GPT is a potentially highly useful piece of automated intelligence with a complex understanding of the world. someone out there building an agentic AI can just plug GPT into their system (either GPT-3 via API access, or GPT-2 by embedding the model directly) and use it to manipulate the world in clever, complex ways.
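to illustrate how little is needed to do this: the "plug GPT into an agentic AI" move is basically just a loop that consults the model to choose its next action. here's a minimal sketch — `query_llm` is a hypothetical stub standing in for either an API call to a hosted model or a locally embedded one; none of this is any particular project's real code:

```python
# minimal agent loop: the agent repeatedly consults a language model
# to decide what to do next. query_llm is a stub standing in for a
# real call to a hosted API (GPT-3 style) or a locally loaded model
# (GPT-2 style); the loop structure is the same either way.

def query_llm(prompt: str) -> str:
    # stub: a real agent would send `prompt` to an actual model here
    if "plan" in prompt:
        return "1. gather information 2. act on it"
    return "done"

def agent_step(goal: str, history: list[str]) -> str:
    # build a prompt from the goal plus everything observed so far,
    # and let the model produce the next action
    prompt = f"goal: {goal}\nhistory: {history}\nplan:"
    return query_llm(prompt)

history: list[str] = []
action = agent_step("some real-world goal", history)
history.append(action)  # feed the model's output back in and repeat
```

the point is that the agent's capability comes almost entirely from the model it calls; the wrapper itself is trivial to write.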
sure, with RLHF, GPT can be made to refuse (at least in naive circumstances) to say racist-sounding things or explain how to make meth. but an agentic, world-affecting AI doesn't particularly need to say racist things or know how to make meth in order to have a significant impact on the world — up to and including improving itself to the point of achieving a decisive strategic advantage and then destroying everything. the mere fact that it can procedurally call GPT, a genuinely useful piece of intelligence, as often as it wants on arbitrary queries increases the likelihood that it can significantly impact the world, because GPT is intelligent and produces potentially useful results at all.
under these conditions, what should OpenAI (and other LLM developers) do?
of course the ideal would be for them to stop all development, close shop, and give all their money to alignment. but short of that, if they really want to continue existing anyway, the second best thing would be to significantly limit access to GPT — don't give API access except maybe to a very select few alignment organizations, and definitely don't put entire models out there. while it might help with PR, i don't think RLHF particularly reduces X-risk except insofar as it generally makes the LLMs less useful.