
values system as test-driven development

i realized something while reading hands and cities on infinite ethics: the work of determining the shape of our values system is akin to test-driven development.

we are designing a procedure (possibly looking for the simplest one) by throwing candidate procedures at a collection of decision tests, and seeing which ones match our intuitions.

i wonder if a value-learning approach to AI alignment could look like getting a superintelligence to find such a procedure: perhaps we feed it a collection of tests and it looks for the simplest procedure that passes them, which hopefully extrapolates well to situations we didn't think of.

perhaps, even pre-superintelligence, we can formalize values research as a suite of tests and try to come up with, or generate, a procedure that passes them while also being selected for simplicity.
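a minimal sketch of what that search could look like, with everything here hypothetical: decision tests as (situation, intuitively-right choice) pairs, candidate procedures ranked by a crude simplicity proxy, and the simplest candidate consistent with every test kept.

```python
# sketch only: "tests" encode moral intuitions, "candidates" are possible
# values procedures, and we select the simplest one passing all tests.
# all names and the complexity proxy are made up for illustration.

def passes_all(procedure, tests):
    """check a candidate procedure against every decision test."""
    return all(procedure(situation) == choice for situation, choice in tests)

def simplest_passing(candidates, tests, complexity):
    """return the simplest candidate matching our intuitions, or None."""
    for procedure in sorted(candidates, key=complexity):
        if passes_all(procedure, tests):
            return procedure
    return None

# toy decision tests: situations are dicts, choices are strings
tests = [
    ({"lives_saved": 5, "lies_told": 0}, "do it"),
    ({"lives_saved": 0, "lies_told": 1}, "don't"),
]

# toy candidate procedures
candidates = [
    lambda s: "do it",                                       # always act
    lambda s: "do it" if s["lives_saved"] > 0 else "don't",  # naive consequentialism
]

# crude stand-in for description length
complexity = lambda p: len(p.__code__.co_code)

winner = simplest_passing(candidates, tests, complexity)
```

the hope in the text above is exactly the last line: the `winner` this selects was only fit to the tests we wrote, but it also decides situations the tests never mentioned.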

why simplicity? doesn't occam's razor only apply to descriptive research, not prescriptive? that is true, but "what is the procedure that formalizes my values system" is in fact a descriptive matter, in a way: we're trying to model something (our own intuitions) as factually accurately as we can.


CC0-1 License: unless otherwise specified on individual pages, all posts on this website are licensed under the CC0-1 license.