(this post, cross-posted on lesswrong, has been written for the first Refine blog post day, at the end of the week of readings, discussions, and exercises about epistemology for doing good conceptual research)

goal-program bricks

this is the follow-up to the Insulated Goal-Program idea, in which i suggest doing alignment by giving an AI a program to run as its ultimate goal, the running of which would hopefully realize our values. in this post, i talk about what pieces of software could be used to put together an appropriate goal-program, as well as some examples of plans built out of them.

here are some naive examples of outlines for goal-programs which seem like they could be okay:

these feel like we could be getting somewhere in terms of figuring out actual goal-programs that could lead to valuable outcomes; at the very least, it seems like a valuable avenue of investigation. in addition, unlike AGI, many pieces of the goal-program can be individually tested, iterated on, etc. in the usual engineering fashion.

(this post is cross-posted on lesswrong; feel free to leave comments there)


CC_ -1 License Unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.
This site lives at https://carado.moe and /ipns/k51qzi5uqu5di8qtoflxvwoza3hm88f5osoogsv4ulmhurge2etp9d37gb6qe9.