we are designing a procedure (ideally the simplest one) by throwing candidate procedures at a collection of decision tests, and looking for which one matches our intuitions.
i wonder if a value-learning approach to AI alignment could look like trying to get superintelligence to find such a procedure; perhaps we feed it a collection of tests and it looks for the simplest procedure that matches those, and hopefully that extrapolates well to situations we didn't think of.
perhaps, even pre-superintelligence, we can formalize values research as a collection of tests, and try to come up with or generate a procedure which passes them while also being selected for simplicity.
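a minimal sketch of what "simplest procedure passing the tests" could look like, assuming candidate procedures come tagged with some complexity score; the candidates, tests, and the complexity measure here are all hypothetical stand-ins, not a real proposal for how to enumerate value-formalization procedures:

```python
def simplest_passing(candidates, tests):
    """return the lowest-complexity candidate that passes every decision test."""
    # scan candidates in order of increasing complexity (occam-style preference)
    for complexity, proc in sorted(candidates, key=lambda c: c[0]):
        if all(proc(case) == verdict for case, verdict in tests):
            return complexity, proc
    return None  # no candidate matched all our intuitions

# decision tests: (situation, intuitive verdict) pairs -- toy placeholders
tests = [(1, True), (2, True), (-3, False)]

# candidates paired with a crude complexity score (here, just a hand-assigned number)
candidates = [
    (1, lambda x: True),              # simplest: approve everything
    (2, lambda x: x > 0),             # approve positive situations
    (3, lambda x: 0 < x < 10),        # a narrower, more complex rule
]

result = simplest_passing(candidates, tests)
# the "approve everything" candidate fails the third test,
# so the search settles on the next-simplest rule that fits
```

the hope expressed above is that whatever this search returns also extrapolates to situations outside the test collection.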
why simplicity? doesn't occam's razor only apply to descriptive research, not prescriptive? that is true, but "what is the procedure that formalizes my values system" is indeed a descriptive matter, in a way: we're trying to model something — our actual values — to the best factual accuracy we can.