tammy's blog about
AI alignment,
utopia,
anthropics,
and more;
also on twitter, lesswrong, rss
i believe the True Name for our formalized values will be a program running a computation (typically a simulation or bunch of simulations) in which those values are maximized or satisfied.
in generalized computational interpretability, i talk about what deeply caring about a general computation looks like. in this post, i will outline an example of this.
suppose you are given a program A described in SKI calculus, and you want to know whether it encodes a simulation of conway's game of life B with a hardcoded specific initial state.
in the general case, to get the state of a specific cell in a conway's game of life after n
steps takes Θ(n³)
time: n
because of the time steps, and n²
because of the two-dimensional light cone. for this example, let's say that this is actually the lowest bound to get the states you care about in this particular initial state of conway's game of life.
now, if there is an "extracting" program E such that for any coordinates x,y
and time step n
, taking as input the entire history of running A for Θ(n³)
steps, it returns in less than Θ(n³)
— for example in O(log n)
— a value that is always equal to the state of B at x,y,n
, then A encodes the computation B: A must have "done some of the work" of running B, because we can extract that value from A without re-doing all of the required work.
on the other hand, a formal proof that no such program exists can demonstrate that A does not encode B.
it could be that the problem is sometimes undecidable — i.e. there exists neither a program E (or proof that it exists), nor a proof that it doesn't exist. this seems fine to me; for example, when you're unable to determine whether a computation encodes suffering, just don't run it, or only run it for a limited amount of steps.
(to generalize this to constant-time or constant-size patterns, and to help us figure out the constants in those big-O/Θ's, perhaps information theory can help)
if the specific conway's game of life pattern B you're testing for happens to already be computable in less than Θ(n³)
, then whatever complexity is its lowest bound (if it has one) is the new one under which E must run.
hopefully this can give us an idea as to what formalized shape our values (avoiding suffering, etc) should take, and how to create a world that realizes them.
unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.
unless explicitely mentioned, all content on this site was created by me; not by others nor AI.