
(posted on 2022-10-31 — also cross-posted on lesswrong, see there for comments)

publishing alignment research and infohazards

to me, turning my thoughts into posts that i then publish on my blog and sometimes lesswrong serves the following purposes:

however, i've come to increasingly want to write and publish posts which i've determined (either on my own or with the advice of trusted peers) to be potentially infohazardous, notably with regard to potentially helping AI capability progress.

on one hand, there is no post of mine i wouldn't trust, say, yudkowsky reading; on the other i can't just, like, DM him and everyone else i trust a link to an unlisted post every time i make one.

it would be nice to have a platform — or maybe a lesswrong feature — which lets me choose which persons or groups can read a post, with maybe a little ⚠ sign next to its title.

note that such a platform/feature would need something more complex than just a binary "trusted" flag: just because i can make a post that the Important People can read, doesn't mean i should be trusted to read everything else that they can read; and there might be people whom i trust to read some posts of mine but not others.

maybe trusted recipients could be grouped by orgs — such as "i trust MIRI" or "i trust The Standard List Of Trusted Persons". maybe something like the ability to post on the alignment forum is a reasonable proxy for "trustable person"?
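to make the "more complex than a binary trusted flag" point concrete, here is a minimal sketch of per-post read permissions. everything in it is hypothetical: the `Group` and `Post` classes, the `can_read` method, and the usernames are made up for illustration, and nothing here corresponds to an existing lesswrong feature. the assumption is that each post carries its own list of readers and groups, so that being able to publish to someone says nothing about being able to read their posts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Group:
    """a hypothetical named list of readers, such as an org."""
    name: str
    members: frozenset[str]


@dataclass
class Post:
    title: str
    author: str
    # when False, the post is public and the lists below are ignored
    restricted: bool = False
    # individual readers chosen by the author for this post only
    readers: set[str] = field(default_factory=set)
    # groups chosen by the author for this post only
    groups: list[Group] = field(default_factory=list)

    def can_read(self, user: str) -> bool:
        """permission is decided per post, not per user."""
        if not self.restricted or user == self.author:
            return True
        if user in self.readers:
            return True
        return any(user in group.members for group in self.groups)

    def display_title(self) -> str:
        """restricted posts get a warning sign next to their title."""
        return f"⚠ {self.title}" if self.restricted else self.title


# hypothetical group and users, for illustration only
some_org = Group("some org", frozenset({"alice", "bob"}))

my_post = Post("an idea", author="me", restricted=True, groups=[some_org])
alice_post = Post("another idea", author="alice", restricted=True, readers={"bob"})

print(my_post.can_read("alice"))   # alice is in a group i chose
print(alice_post.can_read("me"))   # alice did not choose me as a reader
print(my_post.display_title())
```

in this sketch trust is not symmetric and not global: `alice` can read `my_post` because i listed her group, but i cannot read `alice_post` because she did not list me. a proxy such as "can post on the alignment forum" would just be one more `Group` whose membership is maintained by someone else.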

i am aware that this seems hard to figure out, let alone implement. perhaps there is a much easier alternative i'm not thinking about; for the moment, i'll just stick to making unlisted posts and sending them to the very small intersection of people i trust with infohazards and people for whom it's socially acceptable for me to DM links to new posts of mine.



unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.