tammy's blog about AI alignment, utopia, anthropics, and more

*this post was written by Tamsin Leake at Orthogonal.*

*thanks to Julia Persson and mesaoptimizer for their help putting it together.*

this post explains the justification for, and the math formalization of, the QACI plan for formal-goal alignment. you might also be interested in its companion post, *formalizing the QACI alignment formal-goal*, which just covers the math in a more straightforward, bottom-up manner.

🟣 **misato** — hi ritsuko! so, how's this alignment stuff going?

🟡 **ritsuko** — well, i think i've got

🟢 **shinji** — that's exciting! what is it?

🟡 **ritsuko** — so, you know how in

🟡 **ritsuko** — ah, yes, the good old days when we believed this was the single obstacle to alignment.

🔴 **asuka** *barges into the room and exclaims* — hey, check this out! i found this fancy new theory on lesswrong about how "shards of value" emerge in neural networks!

🔴 **asuka** *then walks away while muttering something about eiffel towers in rome and waluigi hyperstition…*

🟡 **ritsuko** — indeed. these days, all these excited kids running around didn't learn about AI safety by thinking really hard about what agentic AIs would do — they got here by being spooked by large language models, and as a result they're thinking in all kinds of strange directions, like what it means for a language model to be aligned or how to locate natural abstractions for human values in neural networks.

🟢 **shinji** — of course that's what we're looking at! look around you, turns out that the shape of intelligence is RLHF'd language models, not agentic consequentialists! why are you still interested in those old ideas?

🟡 **ritsuko** — the problem, shinji, is that we

🟣 **misato** — wait, isn't that anthropics? i'd rather stay away from that type of thinking, it seems too galaxybrained to reason about…

🟡 **ritsuko** — you can't really do that either — the "back to square one" interpretation of anthropics, where you don't update at all,

🟣 **misato** — …is it?

🟡 **ritsuko** — of course it is! on inside view,

🟢 **shinji** — that's kind of frightening!

🟡 **ritsuko** — well, it's where we are. we already thought we were small in space; now we know that we're also small in probabilityspace. the important part is that it

🟣 **misato** — so all the excited kids running around saying we have to figure out how to align language models or whatever…

🟡 **ritsuko** — they're chasing a chimera. impressive LLMs are not what we observe because they're what powerful AI looks like — they're what we observe because they're what powerful AI

🟣 **misato** — i'm not sure most timelines are dead yet, though.

🟡 **ritsuko** — we don't know if "most" timelines are alive or dead from agentic AI, but we know that however many are dead, we couldn't have known about them. if every AI winter was actually a bunch of timelines dying, we wouldn't know.

🟣 **misato** — you know, this doesn't necessarily seem so bad. considering that confused alignment people is what's caused the appearance of the three organizations trying to kill everyone as fast as possible, maybe it's better that alignment research seems distracted with things that aren't as relevant, rather than figuring out agentic AI.

🟡 **ritsuko** — you can say that alright! there's already enough capability hazards being carelessly published everywhere as it is, including on lesswrong. if people were looking in the direction of the kind of consequentialist AI that actually determines the future, this could cause a lot of damage. good thing there's a few very careful people here and there, studying the

🟢 **shinji** — whatever kind of anthropic shenanigans are at play here, they sure seem to be saving our skin! maybe we'll be fine because of quantum immortality or something?

🟣 **misato** — that's not how things work, shinji. quantum immortality explains how you got here, but doesn't help you save the future.

🟢 **shinji** *sighs, with a defeated look on his face* — …so we're back to the good old MIRI alignment, we have to perfectly specify human values as a utility function *and* figure out how to align AI to it? this seems impossible!

🟡 **ritsuko** — well, that's where things get interesting! now that we're talking about coherent agents whose actions we can reason about, agents whose instrumentally convergent goals such as goal-content integrity would be beneficial if they were aligned, agents who won't mysteriously turn bad eventually because they're not yet coherent agents, we can actually

🟣 **misato** — …and that's what you've been doing?

🟡 **ritsuko** — well, that's kind of what agent foundations had been about all along, and what got rediscovered elsewhere as "formal-goal alignment": designing an aligned coherent goal and figuring out how to make an AI that is aligned to maximizing it.

🟢 **shinji** — so what's your idea? i sure could use some hope right now, though i have no idea what an aligned utility function would even

🟡 **ritsuko** *smirks* — so, the first important thing to realize is that the challenge of designing an AI that emits output which saves the world can be formulated like this: design an AI trying to solve a mathematical problem, and make the mathematical problem be analogous enough to "what kind of output would save the world" that the AI, by solving it, happens to also save our world.

🟢 **shinji** — but what does that actually

🟣 **misato** — maybe it looks like "what output should you emit, which would cause your predicted sequence of stimuli to look like a nice world?"

🟡 **ritsuko** — what do you think actually happens if an AI were to succeed at this?

🟣 **misato** — oh, i guess it would hack its stimuli input, huh. is there even a way around this problem?

🟡 **ritsuko** — what you're facing is a facet of the problem of

🟡 **ritsuko** — the answer — as in PreDCA — is to model the world from the top-down, and ask: "look into this giant universe. you're in there somewhere. which action should the you-in-there-somewhere take, for this world to have the most expected utility?"

🟢 **shinji** — expected utility? by what utility function?

🟡 **ritsuko** — we're coming to it, shinji. there are three components to this: the

🟣 **misato** — and this top-down view works? how the hell would it compute

🟡 **ritsuko** — how the hell do you expect AI would have done expected utility maximization

🟡 **ritsuko** — on the one hand, the question is immensely computationally expensive — it asks to compute the entire history of the universe up to this shinji! but on the other hand, it is talking about a world which

🟣 **misato** — i'm not convinced. after all, we relied on humans to make this guess! of course you can guess about shinji, you're a human like him. why would the AI be able to make those guesses, being the alien thing that it is?

🟡 **ritsuko** — i mean, one of its options is to

🟢 **shinji** — but what if the worlds that are actually described by such math are not in fact this world, but strange alien worlds that look nothing like ours?

🟡 **ritsuko** — yes, this is also part of the problem. but let's not keep moving the goalpost here. there are two problems:

🟡 **ritsuko** — if you have to solve two problems A and B, then you have to solve A assuming B is solved, and then solve B assuming A is solved. then, you've got a pair of solutions which work with one another. here, we're solving the problem of whether an AI would be able to solve this problem,

🟢 **shinji** — are there any

🟣 **misato**,

🟤 **kaji** *can be seen standing against a wall, whistling, pretending not to hear anything.*

🟡 **ritsuko** — right. one thing i will reiterate, is that we should not observe a published solution to "how to get powerful problem-solving AI" before the world is saved. this is in the class of problems where we die shortly after a solution is found and published, so our lack of observing such a solution is not much evidence for its difficulty.

🟡 **ritsuko** — anyways, to come back to embedded agency.

🟣 **misato** — ah, i had a question. the AI returns a first action which it believes would overall steer the world in a direction that maximizes its expected utility. and then what? how does it get its observation, update its model, and take the next action?

🟡 **ritsuko** — well, there are a variety of clever schemes to do this, but an easy one is to just

🟣 **misato** — what?

🟡 **ritsuko** — to just

🟢 **shinji** — "run the action?"

🟡 **ritsuko** — sure. we can decide in advance that the action will be a linux command to be executed, for example. the scheme does not really matter, so long as the AI gets an output channel which has pretty easy bits of steering the world.

🟣 **misato** — hold on, hold on. a single action? what do you intend for the AI to do, output a really good pivotal act and then hope things get better?

🟡 **ritsuko** — have a little more imagination! our AI — let's call it AI₀ — will almost certainly return a single action that

🟡 **ritsuko** — …and because it's solving the problem "what action would maximize utility when inserted into this world", it will understand that AI₁ needs to have embedded agency and the various other aspects that are instrumental to it — goal-content integrity, robustly delegating RSI, and so on.

🟢 **shinji** — "RSI"? what's that?

🟣 **misato** *sighs* — you know, it keeps surprising me how many youths don't know about the acronym RSI, which stands for Recursive Self-Improvement. it's pretty indicative of how little they're thinking about it.

🟢 **shinji** — i mean, of course! recursive self-improvement is an obsolete old MIRI idea that doesn't apply to the AIs we have today.

🟣 **misato** — right, kids like you got into alignment by being spooked by chatbots. (what silly things do they even teach you in class these days?)

🟣 **misato** — you have to realize that the generation before you, the generation of ritsuko and me, didn't have the empirical evidence that AI was gonna be impressive. we started on something like the empty string, or at least on coherent arguments where we had to actually build a gears-level inside-view understanding of what AI would be like, and what it would be capable of.

🟣 **misato** — to me, one of the core arguments that sold me on the importance of AI and alignment was recursive self-improvement — the idea that

🟢 **shinji** — but this turned out irrelevant, because AI is getting better than humans

🟡 **ritsuko** — again, false. we can

🟣 **misato** — so, i think i have a vague idea of what you're saying, now. top-down view of the universe, which is intractable but that's fine apparently, thanks to some mysterious capabilities; one-shot AI to get around various embedded agency difficulties. what's the actual utility function to align to, now? i'm really curious. i imagine a utility function assigns a value between 0 and 1 to any, uh, entire world? world-history? multiverse?

🟡 **ritsuko** — it assigns a value between 0 and 1 to any

🟣 **misato** — oh boy.

🟡 **ritsuko** — so, first: we're not passing a

🟣 **misato** — wait, "hijack it"? aren't we assuming an inner-aligned AI, here?

🟡 **ritsuko** — i don't like this term, "inner-aligned"; just like "AGI", people use it to mean too many different and unclear things. we're assuming an AI which does its best to pick an answer to a math problem. that's it.

🟡 **ritsuko** — we don't make an AI which tries to not be harmful with regards to its side-channels, such as hardware attacks — except for its output, it needs to be strongly boxed, such that it can't destroy our world by manipulating software or hardware vulnerabilities. similarly, we don't make an AI which tries to output a solution we

🟡 **ritsuko** *starts scribbling on a piece of paper on her desk* — let's write down some actual math here. let's call $\Omega$ the set of world-states, $\Delta_{\Omega}$ the set of distributions over world-states, and $A$ the set of actions.

🟢 **shinji** — what are the types of all of those?

🟡 **ritsuko** — let's not worry about that, for now. all we need to assume for the moment is that those sets are countable. we could define both $\Omega \coloneqq \mathbb{B}^*$ and $A \coloneqq \mathbb{B}^*$ — define them both as the set of finite bitstrings — and this would functionally capture all we need. as for distributions over world-states $\Delta_{\Omega}$, we'll define $\Delta_X \coloneqq \{f \mid f \in X \to [0;1], \sum_{x \in X} f(x) \le 1\}$ for any countable set $X$, and we'll call "mass" the number which a distribution associates to any element.

🟣 **misato** — woah, woah, hold on, i haven't looked at math in a while. what do all those squiggles mean?

🟡 **ritsuko** — $\Delta_X$ is defined as the set of functions $f$, which take an $X$ and return a number between $0$ and $1$, such that if you take the $f$ of all $x$'s in $X$ and add those up, you get a number not greater than $1$. note that i use a notation of sums $\sum$ where the variables being iterated over are above the $\sum$ and the constraints that must hold are below it — so this sum adds up all of the $f(x)$ for each $x$ such that $x \in X$.

🟣 **misato** — um, sure. i mean, i'm not quite sure what this

🟡 **ritsuko** — the set $\Delta_X$ of distributions over $X$ is basically like saying "for any finite amount of mass at most 1, what are some ways to distribute that mass among some or all of the $X$'s?" each of those ways is a distribution; each of those ways is an $f$ in $\Delta_X$.
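(as an aside — a tiny python sketch, not part of the dialogue's math, of what the defining property of $\Delta_X$ amounts to for a finite toy $X$; the dict representation and the function name are my own choices:)

```python
# a "distribution" in Delta_X over a countable set X, represented here as a
# dict mapping elements of X to their mass. unlike a probability
# distribution, the total mass may sum to *less* than 1 -- the only
# requirement is that it does not exceed 1.

def is_subdistribution(f):
    """check the defining property of Delta_X for a finite dict f."""
    return all(0.0 <= m <= 1.0 for m in f.values()) and sum(f.values()) <= 1.0

uniform   = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}  # mass exactly 1: allowed
deficient = {"a": 0.5, "b": 0.25}                         # mass 0.75: also allowed
too_much  = {"a": 0.9, "b": 0.2}                          # mass 1.1: not a distribution
```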

🟡 **ritsuko** — anyways. the AI will take as input an intractable math expression of type $A \to [0;1]$, and return a single $A$. note that we're in math here, so "is of type" and "is in set" are really the same thing; we'll use $\in$ to denote both set membership and type membership, because they're the same concept. for example, $A \to [0;1]$ is the set of all functions taking as input an $A$ and returning a $[0;1]$ — returning a real number between $0$ and $1$.

🟢 **shinji** — hold on, a

🟡 **ritsuko** — well, a real number, but we're passing to the AI a discrete piece of math which will only ever describe countable sets, so we'll only ever describe countably many of those real numbers. infinitely many, but countably infinitely many.

🟣 **misato** — so the AI has type $(A \to [0;1]) \to A$, and we pass it an action-scoring function of type $A \to [0;1]$ to get an action. checks out. where do utility functions come in?

🟡 **ritsuko** — they don't need to come in at all, actually! we'll be defining a piece of math which describes the world for the purpose of pointing at the humans who will decide on a scoring function, but the scoring function will only be over

🟡 **ritsuko** — the AI doesn't need to know that its math points to the world it's in; and in fact, conceptually, it isn't

🟡 **ritsuko** — we will just very carefully box it such that its only meaningful output into our world, the only bits of steering it can predictably use, are those of the action it outputs. and we will also have very carefully designed it such that the only thing it ultimately cares about, is that that output have as high of an expected scoring as possible — it will care about this

🟡 **ritsuko** — this meaning of "inner-alignment" is still hard to accomplish, but it is much better defined, much narrower, and thus hopefully much easier to accomplish than the "full" embedded-from-the-start alignment which very slow, very careful corrigibility-based AI alignment would require.

🟣 **misato** — so what does that scoring function actually look like?

🟡 **ritsuko** — you know what, i hadn't started mathematizing my alignment idea yet; this might be a good occasion to get started on that!

🟡 **ritsuko** *wheels in a whiteboard* — so, what i expect is that the order in which we're gonna go over the math is going to be the *opposite order* to that of the final math report on QACI. here, we'll explore things from the top-down, filling in details as we go — whereas the report will go from the bottom-up, fully defining constructs and then using them.

$\begin{array}{l} \mathit{Prior} \in \Delta_{\mathit{Hypothesis}} \\ \mathit{LooksLikeThisWorld} \in \mathit{Hypothesis} \to [0;1] \\ \mathit{HowGood} \in A \times \mathit{Hypothesis} \to [0;1] \\ \mathit{Score}(\mathit{action}) \coloneqq \sum_{h \in \mathit{Hypothesis}} \mathit{Prior}(h) \cdot \mathit{LooksLikeThisWorld}(h) \cdot \mathit{HowGood}(\mathit{action}, h) \end{array}$

🟡 **ritsuko** — this is roughly what we'll be doing here. go over all hypotheses $h$ the AI could have within some set of hypotheses, called $\mathit{Hypothesis}$; measure their $\mathit{Prior}$ probability, the $\mathit{LooksLikeThisWorld}$ chance that they correspond to our world, and how good the $\mathit{action}$ is in them. this is the general shape of
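(the shape of that sum can be sketched numerically — the three hypotheses and all the numbers below are made up purely for illustration, and the dict-based representation is my own:)

```python
# numerical sketch of Score(action) = sum over hypotheses h of
# Prior(h) * LooksLikeThisWorld(h) * HowGood(action, h)

def score(action, hypotheses, prior, looks_like_this_world, how_good):
    return sum(
        prior[h] * looks_like_this_world[h] * how_good[(action, h)]
        for h in hypotheses
    )

hypotheses = ["h1", "h2", "h3"]
prior = {"h1": 0.5, "h2": 0.25, "h3": 0.25}
looks = {"h1": 1.0, "h2": 0.5, "h3": 0.0}  # h3 doesn't look like our world at all
good  = {("act", "h1"): 0.8, ("act", "h2"): 0.2, ("act", "h3"): 1.0}

s = score("act", hypotheses, prior, looks, good)
# h3 contributes nothing, however good the action would be in it,
# because LooksLikeThisWorld(h3) = 0
```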

🟢 **shinji** — wait, the set of hypotheses is called $\mathit{Hypothesis}$, not $\mathit{Hypotheses}$? that's a bit confusing.

🟡 **ritsuko** — this is pretty standard in math, shinji. the reason to call the set of hypotheses $\mathit{Hypothesis}$ is because, as explained before, sets are also types, and so $\mathit{LooksLikeThisWorld}$ will be of type $\mathit{Hypothesis} \to [0;1]$ rather than $\mathit{Hypotheses} \to [0;1]$.

🟣 **misato** — what's in a $\mathit{Hypothesis}$, exactly?

🟡 **ritsuko** — the set of

🟣 **misato** — so, a mathematical object representing empirical beliefs?

🟡 **ritsuko** — i would rather put it as a pair of:

🟢 **shinji** — what the hell is "realityfluid"???

🟡 **ritsuko** — it's a very long story, i'm afraid.

🟣 **misato** — think of it as a measure of how some constant amount of "matteringness"/"realness" — typically 1 unit of it — is distributed across possibilities. even though it kinda mechanistically works like probability mass, it's "in the other direction": it represents what's

🟢 **shinji** — why would it sum to 1? what if there's an infinite amount of stuff out there?

🟣 **misato** — your realityfluid still needs to sum up to some constant. if you allocate an infinite amount of matteringness, things break and don't make sense.

🟡 **ritsuko** — indeed. this is why the most straightforward way to allocate realityfluid is to just imagine that the set of all that exists is a universal program whose computation is cut into time-steps each doing a constant amount of work, and then allocate some diminishing quantities of realityfluid to each time step.

🟣 **misato** — like saying that compute step number $n \ge 1$ has $\frac{1}{2^n}$ realityfluid?

🟡 **ritsuko** — that would indeed normalize, but it diminishes

🟢 **shinji** — what the hell are you talking about??

🟡 **ritsuko** *hands shinji a paper called "Why Philosophers Should Care About Computational Complexity"* — look, this is a whole other tangent, but basically, polynomial amounts of computation correspond to "doing something", whereas exponential amounts of computation correspond to "magically obtaining something out of the ether", and this sort of ramifies naturally across the rest of computational complexity applied to metaphysics and philosophy.

🟡 **ritsuko** — so instead, we can say that computation step number $n \ge 1$ has $\frac{1}{n^2}$ realityfluid. this only diminishes quadratically, which is satisfactory.
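(a quick numerical check, mine rather than ritsuko's: both allocations have finite total mass — $\sum_n 2^{-n} = 1$ and $\sum_n n^{-2} = \pi^2/6$ — but they decay at wildly different rates:)

```python
import math

# compare the two realityfluid allocations: geometric (1/2^n) vs
# quadratic (1/n^2). both have finite total mass, but by step 100 the
# geometric allocation has already made existence astronomically
# expensive, while the quadratic one has not.

def geometric(n):
    return 1 / 2**n

def quadratic(n):
    return 1 / n**2

total_quadratic = sum(quadratic(n) for n in range(1, 100001))
# partial sums of 1/n^2 approach pi^2/6 ~= 1.6449
```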

🟡 **ritsuko** — oh, and for the same reason, the universal program needs to be quantum — for example, a quantum equivalent of the classical universal program, implemented on something like a quantum turing machine. otherwise, unless BQP=BPP, quantum multiverses like ours might be exponentially expensive to compute, which would be strange.

🟢 **shinji** — why $n^2$? why not $n^{1.01}$ or $n^{37}$?

🟡 **ritsuko** — those do indeed all normalize — but we pick $2$ because at some point you just have to

🟢 **shinji** — and why are we assuming the universe is made of discrete computation anyways? isn't stuff made of real numbers?

🟡 **ritsuko** *sighs* — look, this is what the church-turing-deutsch principle is about. for any universe made up of real numbers, you can approximate it thusly:

- compute 1 step of it with every number truncated to its first 1 binary digit of precision
- compute 2 steps of it with every number truncated to its first 2 binary digits of precision

for 1 time step with 1 bit of precision, then 2 time steps with 2 bits of precision, then 3 with 3, and so on. for any piece of branch-spacetime which is only finitely far away from the start of its universe, there exists a threshold at which it starts being computed in a way that is indistinguishable from the version with real numbers.
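(here's a toy python version of that schedule — the dynamical system $x \mapsto x/2 + \frac{1}{3}$ is my own stand-in for "a universe made of real numbers", with exact rationals standing in for infinite precision:)

```python
from fractions import Fraction

# stage k re-runs the system from the start for k steps, truncating every
# number to k binary digits of precision. for any fixed time step, later
# stages approximate the exact (real-number) trajectory arbitrarily well.

def step(x):
    return x / 2 + Fraction(1, 3)

def truncate(x, bits):
    return Fraction(int(x * 2**bits), 2**bits)  # keep `bits` binary digits

def stage(k, x0=Fraction(0)):
    """run k steps at k bits of precision, returning the trajectory."""
    traj = [x0]
    for _ in range(k):
        traj.append(truncate(step(traj[-1]), k))
    return traj

# the exact trajectory, with unbounded precision
exact = [Fraction(0)]
for _ in range(5):
    exact.append(step(exact[-1]))
```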

🟢 **shinji** — but they're only an approximation of us! they're not

🟡 **ritsuko** *sighs* — you don't *know* that. you could be the approximation, and you would be unable to tell. and so, we can work without uncountable sets of real numbers, since they're unnecessary to explain observations, and thus an unnecessary assumption to hold about reality.

🟢 **shinji**,

🟡 **ritsuko** — what else are you going to do? you're expressing things in

🟣 **misato** — actually, can't we introduce turing jumps/halting oracles into this universal program? i heard that this lets us

🟡 **ritsuko** — there's kind-of-a-sense in which that's true. we could say that the universal program has access to a first-degree halting oracle, or a 20th-degree; or maybe it runs for 1 step with a 1st degree halting oracle, then 2 steps with a 2nd degree halting oracle, then 3 with 3, and so on.

🟡 **ritsuko** — your program is now capable, at any time step, of computing an infinite amount of stuff. let's say one of those steps happens to run an entire universe of stuff, including a copy of us. how do you sub-allocate realityfluid? how much do we expect to be in there? you could allocate sub-compute-steps — with a 1st degree halting oracle executing at step $n \ge 1$, you allocate $\frac{1}{n^2 m^2}$ realityfluid to each of the $m \ge 1$ infinite sub-steps in the call to the halting-oracle. you're just doing discrete realityfluid allocation again, except now some of the realityfluid in your universe is allocated to people who have obtained results from a halting oracle.
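(a sanity check of my own: the sub-allocation still has finite total mass, since $\sum_{n,m \ge 1} \frac{1}{n^2 m^2} = (\pi^2/6)^2$:)

```python
import math

# sum 1/(n^2 m^2) over a large finite grid of (n, m) pairs; the double
# series converges to (pi^2/6)^2 ~= 2.705, so the infinitely many oracle
# sub-steps can all be assigned realityfluid from a finite budget.

N = 1000
total = sum(1 / (n * n * m * m) for n in range(1, N + 1) for m in range(1, N + 1))
```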

🟡 **ritsuko** — this works, but what does it get you? assuming halting oracles is kind of a very strange thing to do, and regular computation with no halting oracles is

🟢 **shinji** *ruminates, unsure where to go from there.*

🟣 **misato** *interrupts* — hey, do we really need to cover this? let's say you found out that this whole view of things is wrong. could you fix your math then, to whatever is the correct thing?

🟡 **ritsuko** *waves around* — what?? what do you mean *if it's wrong*?? i'm not rejecting the premise that i might be wrong here, but like, my answer here depends a lot on *in what way i'm wrong* and *what is the better / more likely correct thing*. so, i don't know how to answer that question.

🟣 **misato** *snaps shinji back to attention* — that's fair enough, i guess. well, let's get back on track.

🟡 **ritsuko** — so, one insight i got for my alignment idea came from PreDCA, which stands for

- the AI locating itself within possibilities
- locating the high-agenticness-thing which had lots of causation-bits onto itself — call it the "**Pre**cursor". this is supposed to find the human user who built/launched the AI. (**D**etection)
- a bunch of criteria to ensure that the precursor is the intended human user and not something else (**C**lassification)
- extrapolating that precursor's utility function, and maximizing it (**A**ssistance)

🟣 **misato** — what the hell kind of math would accomplish that?

🟡 **ritsuko** — well, it's not entirely clear to me. some of it is explained, other parts seem like they're expected to just work naturally. in any case, this isn't so important — the "Learning Theoretic Agenda" into which PreDCA fits is not fundamentally similar to mine, and i do not expect it to be the kind of thing that saves us in time. as far as i predict, most of the dignity points that agenda will have purchased by the time alignment is solved, it already purchased by inspiring my own ideas.

🟢 **shinji** — and

🟡 **ritsuko** — a lot more likely so, yes! for one, i am not trying to build

🟡 **ritsuko** — anyways, the important thing is that that idea made me think "hey, what else could we do to even more make sure the selected precursor is the human user we want, and not something else like a nearby fly or the process of evolution?" and then i started to think of some clever schemes for locating the AI in a top-down view of the world, without having to decode physics ourselves, but rather by somehow pointing to the user "through" physics.

🟣 **misato** — what does that mean, exactly?

🟡 **ritsuko** — well, remember how PreDCA points to the user from-the-top-down? the way it tries to locate the user is by looking for

🟣 **misato** — and what sort of patterns are we looking for? what are the

🟡 **ritsuko** — as far as i understand, PreDCA looks for

🟣 **misato** — …just raw bitstrings?

🟡 **ritsuko** — that's right. the idea here is kinda like doing an incantation, except the incantation we're locating is a very large piece of data which is unlikely to be replicated outside of this world. imagine generating a very large (several gigabytes) file, and then asking the AI "look for pieces of information, in the set of all computations, which look like that pattern." we call "blobs" such bitstrings serving as anchors to find our world, and our location within it, in the set of possible world-states and locations-within-them.

🟡 **ritsuko** — for example, let's say the universe is a conway's game of life. then, the AI could have as its set of hypotheses programs which take as input the entire state of the conway's game of life grid at any instant, and return a bitstring which must be equal to the blob.

🟡 **ritsuko** — first, we define $\Omega \coloneqq \{\omega \mid \omega \in \mathcal{P}(\mathbb{Z}^2), \#\omega \in \mathbb{N}\}$ (uppercase omega, a set of lowercase omegas) as the set of "world-states" — states of the grid, defined as the set of cell positions whose cell is alive.

🟢 **shinji** — what's $\mathcal{P}(\mathbb{Z}^2)$ and $\#\omega$?

🟡 **ritsuko** — $\mathbb{Z}^2$ is the set of pairs whose elements are both a member of $\mathbb{Z}$, the set of integers. so $\mathbb{Z}^2$ is the set of pairs of integers — that is, grid coordinates. then, $\mathcal{P}(\mathbb{Z}^2)$ is the set of subsets of $\mathbb{Z}^2$. finally, $\#\omega$ is the size of set $\omega$ — requiring that $\#\omega \in \mathbb{N}$ is akin to requiring that $\omega$ is a finite set, rather than infinite. let's also define:

- $\mathbb{B} = \{\top, \perp\}$ is the set of booleans
- $\mathbb{B}^*$ is the set of finite bitstrings
- $\mathbb{B}^n$ is the set of bitstrings of length $n$
- $|b|$ is the length of bitstring $b$
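(to make the conway's-game-of-life example concrete, here's a python sketch — my own encoding — of a world-state $\omega$ as a finite set of live-cell coordinates, with one update step:)

```python
from collections import Counter

# a world-state omega in the game-of-life example: a finite subset of Z^2,
# here a frozenset of (x, y) coordinates of live cells.

def life_step(omega):
    """one conway's-game-of-life step on a finite set of live cells."""
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in omega
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return frozenset(
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in omega)
    )

blinker = frozenset({(0, -1), (0, 0), (0, 1)})  # a vertical blinker
```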

🟡 **ritsuko** — what do you think "locate blob $b \in \mathbb{B}^*$ in world-state $\omega \in \Omega$" could look like, mathematically?

🟣 **misato** — let's see — i can use the set of bitstrings of same length as $b$, which is $\mathbb{B}^{|b|}$. let's build a set of $\{f \mid f \in \Omega \to \mathbb{B}^{|b|} \dots$

🟢 **shinji** — wait, $\Omega \to \mathbb{B}^{|b|}$ is the set of

🟡 **ritsuko** — this is a very good remark, shinji! indeed, we need to do a bit more work; for now we'll just posit that for any sets $A,B$, $A \stackrel{H}{\to} B$ is the set of always-halting, always-succeeding programs taking as input an $A$ and returning a $B$.

🟣 **misato** — let's see — what about $\{f \mid f \in \Omega \stackrel{H}{\to} \mathbb{B}^{|b|}, f(\omega) = b\}$?

🟡 **ritsuko** — you're starting to get there — this is indeed the set of programs which return $b$ when taking $\omega$ as input. however, it's merely a

🟢 **shinji** — oh, i remember! it's the set of functions in $X \to [0;1]$ which sum up to at most one over all of $X$.

🟡 **ritsuko** — indeed! so, we're gonna posit what i'll call

🟣 **misato** — oh, i know then! the distribution, for each $f \in \Omega \stackrel{H}{\to} \mathbb{B}^{|b|}$, must return $\begin{cases} K^{-}_{\Omega \stackrel{H}{\to} \mathbb{B}^{|b|}}(f) & \text{if } f(\omega) = b \\ 0 & \text{if } f(\omega) \ne b \end{cases}$

🟡 **ritsuko** — that's right! we can start to define $\mathit{Loc}_n \in \Omega \times \mathbb{B}^n \to \Delta_{\Omega \stackrel{H}{\to} \mathbb{B}^n}$ as the function that takes as input a pair of world-state $\omega \in \Omega$ and blob $b \in \mathbb{B}^n$ of length $n$, and returns a distribution over programs that "find" $b$ in $\omega$. plus, since functions $f$ are weighed by their kolmogorov simplicity, for complex $b$'s they're "encouraged" to find the bits of complexity of $b$

🟡 **ritsuko** — note also that this $\mathit{Loc}_n$ distribution over $\Omega \stackrel{H}{\to} \mathbb{B}^n$ returns, for any function $f$, either $K^{-}_{\Omega \stackrel{H}{\to} \mathbb{B}^n}(f)$ or $0$, which entails that for any given $\omega, b$, the sum of $\mathit{Loc}_n(\omega,b)(f)$ for all $f$'s adds up to at most one — that sum represents in a sense "how hard it is to find $b$ in $\omega$" or "the probability that $b$ is somewhere in $\omega$".

$\forall (\omega,b) \in \Omega \times \mathbb{B}^n : \sum_{f \in \Omega \stackrel{H}{\to} \mathbb{B}^n} \mathit{Loc}_n(\omega,b)(f) \le 1$

🟡 ** ritsuko** — the notation here, ${\mathit{Loc}}_{n}(\omega, b)(f)$, is because ${\mathit{Loc}}_{n}(\omega, b)$ returns a distribution in $\Delta_{\Omega \stackrel{H}{\to} \mathbb{B}^{n}}$, which is itself a function $(\Omega \stackrel{H}{\to} \mathbb{B}^{n}) \to [0;1]$ — so we apply $\mathit{Loc}$ to $\omega, b$, and then we evaluate the resulting distribution at $f$.
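here's a hedged toy sketch of this first version of $\mathit{Loc}$: true kolmogorov simplicity is uncomputable, so each candidate "program" below just gets a hand-assigned weight standing in for $K^{-}$, and world-states and blobs are plain bitstrings.

```python
from fractions import Fraction

# Toy Loc: each candidate "program" gets a hand-assigned weight
# standing in for its kolmogorov simplicity K^-.

def f_prefix(omega):  # "read the first two bits of the world"
    return omega[:2]

def f_suffix(omega):  # "read the last two bits of the world"
    return omega[-2:]

def f_const(omega):   # "ignore the world, always output 10"
    return "10"

SIMPLICITY = {f_prefix: Fraction(1, 4),
              f_suffix: Fraction(1, 4),
              f_const:  Fraction(1, 8)}

def loc(omega, b):
    """Distribution over programs that find blob b in world-state omega."""
    return {f: (w if f(omega) == b else Fraction(0))
            for f, w in SIMPLICITY.items()}

d = loc("1011", "10")    # f_prefix and f_const find "10"; f_suffix finds "11"
total = sum(d.values())  # "how much is '10' located in '1011'" — at most 1
```

note how the total mass is strictly less than one: the leftover mass is all the programs that fail to find the blob.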

🟢 ** shinji** — "the sum represents"? what do you mean by "represents"?

🟡 ** ritsuko** — well, it's the concept i'm trying to find a "true name" for, here. "how much is the blob $b$ located in world-state $\omega$? as much as the sum of the kolmogorov simplicities of every program that returns $b$ when taking $\omega$ as input."

🟣 ** misato** — and then what? i feel like my understanding of how this ties into anything is still pretty loose.

🟡 ** ritsuko** — so, we're actually gonna get counterfactuals out of this.

🟢 ** shinji** — how are we gonna get counterfactuals?

🟡 ** ritsuko** — here's my idea: we're gonna make $f(\omega)$ return not just $\mathbb{B}^{*}$ but rather $\mathbb{B}^{n} \times \mathbb{B}^{*}$ — a pair of the blob and a "free bitstring" $\tau$ (tau) which it can use to store "everything in the world-state except $b$". and we'll also sample programs $g \in \mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega$ which "put the world-state back together" given a blob and the same free bitstring.

🟣 ** misato** — so, for $\omega ,b$, $\mathit{\text{Loc}}$ is defined as something like…

$\begin{array}{l} {\mathit{Loc}}_{n}(\omega, b) \in \Delta_{(\Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*}) \times (\mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega)} \\ {\mathit{Loc}}_{n}(\omega, b)(f, g) := \begin{cases} K^{-}_{\Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*}}(f) \cdot K^{-}_{\mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega}(g) & \text{if } \begin{array}{l} \text{let } \tau \in \mathbb{B}^{*} \text{ such that } f(\omega) = (b, \tau) \\ \text{in } g(b, \tau) = \omega \end{array} \\ 0 & \text{otherwise} \end{cases} \end{array}$

🟢 ** shinji** *stares at the math for a while* — actually, shouldn't the $\text{if}$ statement be more general? you don't just want $g$ to work on $b$, you want $g$ to work on *any other blob of the same length*.

🟡 ** ritsuko** — that's correct, shinji! let's call the original blob $b$ the "factual blob", call other blobs of the same length we could insert in its stead "counterfactual blobs", and write them as $b'$ — we can establish that $'$ (prime) will denote counterfactual things in general.

🟣 ** misato** — so it's more like…

$\begin{array}{l} \text{let } \tau \in \mathbb{B}^{*} \text{ such that } f(\omega) = (b, \tau) \\ \text{in } \forall b' \in \mathbb{B}^{n}: g(b', \tau) = \dots \end{array}$

🟣 ** misato** — …$g(b', \tau)$ should equal what, exactly?

🟡 ** ritsuko** — we don't know what it should equal, but we do know how it should relate back to $f$:

$\begin{array}{l} \text{let } \tau \in \mathbb{B}^{*} \text{ such that } f(\omega) = (b, \tau) \\ \text{in } \forall b' \in \mathbb{B}^{n}: f(g(b', \tau)) = (b', \tau) \end{array}$

🟡 ** ritsuko** — actually, let's make ${\mathit{Loc}}_{n}$ be merely a distribution over functions $\mathbb{B}^{n} \to \Omega$ that produce counterfactual world-states from counterfactual blobs — let's call those "counterfactual insertion functions", denote them $\gamma$ (gamma) and their set $\Gamma_{n}$ — and we'll encapsulate $\tau$ away from the rest of the math:

$\begin{array}{rl} {\mathit{Loc}}_{n}(\omega, b)(\gamma) := & \displaystyle\sum_{f, g, \tau} K^{-}_{\Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*}}(f) \cdot K^{-}_{\mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega}(g) \\ & f \in \Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*} \\ & g \in \mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega \\ & \tau \in \mathbb{B}^{*} \\ & f(\omega) = (b, \tau) \\ & \forall b' \in \mathbb{B}^{n}: f(g(b', \tau)) = (b', \tau) \;\wedge\; \gamma(b') = g(b', \tau) \end{array}$
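as a concrete hedged toy (not the actual machinery — just one simple way an $f, g$ pair could satisfy these constraints): take world-states to be bitstrings, with the blob living at a fixed, known offset, so that $\tau$ is just "everything around the blob".

```python
# Toy deconstruct/reconstruct pair: the blob b sits at a fixed offset
# in a bitstring world-state, and tau stores "everything except b".

N, OFFSET = 4, 3  # blob length, and where the blob sits in the world

def f(omega):
    """Deconstruct: world-state -> (blob, free bitstring tau)."""
    return omega[OFFSET:OFFSET + N], omega[:OFFSET] + omega[OFFSET + N:]

def g(b, tau):
    """Reconstruct: (blob, tau) -> world-state."""
    return tau[:OFFSET] + b + tau[OFFSET:]

omega = "0011010111"
b, tau = f(omega)
gamma = lambda b2: g(b2, tau)  # counterfactual insertion for this location

assert g(b, tau) == omega                 # g inverts f on the factual blob
for b2 in ("0000", "1111", "0101"):       # ...and on counterfactual blobs
    assert f(gamma(b2)) == (b2, tau)
```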

🟢 ** shinji** — isn't $f(g(b\prime ,\tau ))=(b\prime ,\tau )$ a bit circular?

🟡 ** ritsuko** — well, yes and no. it leaves a lot of degrees of freedom to $f$ and $g$ — perhaps too many. let's say we had some function $\mathit{SimilarPasts} \in \Omega \times \Omega \to [0;1]$ — let's not worry about how it works. then we could weight each "blob location" by how similar the counterfactual world-states are to the factual one, when sampled over all counterfactual blobs.

🟣 ** misato** — maybe we should also constrain the $f,g$ programs for how long they take to run?

🟡 ** ritsuko** — ah yes, good idea. let's say that for $x \in X$ and $f \in X \stackrel{H}{\to} Y$, $R(f, x) \in \mathbb{N} \setminus \{0\}$ is how long it takes to run program $f$ on input $x$, in some number of steps each doing a constant amount of work — such as steps of compute in a turing machine.

$\begin{array}{rl} {\mathit{Loc}}_{n}(\omega, b)(\gamma) := & \displaystyle\sum_{f, g, \tau} K^{-}_{\Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*}}(f) \cdot K^{-}_{\mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega}(g) \cdot \displaystyle\sum_{b' \in \mathbb{B}^{n}} \frac{1}{\#\mathbb{B}^{n}} \cdot \frac{\mathit{SimilarPasts}(\omega, g(b', \tau))}{R(g, (b', \tau)) + R(f, g(b', \tau))} \\ & f \in \Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*} \\ & g \in \mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega \\ & f(\omega) = (b, \tau) \\ & \forall b' \in \mathbb{B}^{n}: f(\gamma(b')) = (b', \tau) \;\wedge\; \gamma(b') = g(b', \tau) \end{array}$

🟡 ** ritsuko** — (i've also replaced $f(g(b\prime ,\tau ))$ with $f(\gamma (b\prime ))$ since that's shorter and they're equal anyways)

🟣 ** misato** — where does the first sum end, exactly?

🟡 ** ritsuko** — it applies to the whole– oh, you know what, i can achieve the same effect by flattening the whole thing into a single sum, and renaming the $b'$ in $\forall b' \in \mathbb{B}^{n}$ to $b''$ to avoid confusion.

$\begin{array}{rl} {\mathit{Loc}}_{n}(\omega, b)(\gamma) := & \displaystyle\sum_{f, g, \tau, b'} K^{-}_{\Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*}}(f) \cdot K^{-}_{\mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega}(g) \cdot \frac{1}{\#\mathbb{B}^{n}} \cdot \frac{\mathit{SimilarPasts}(\omega, g(b', \tau))}{R(g, (b', \tau)) + R(f, g(b', \tau))} \\ & f \in \Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*} \\ & g \in \mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega \\ & b' \in \mathbb{B}^{n} \\ & f(\omega) = (b, \tau) \\ & \forall b'' \in \mathbb{B}^{n}: f(\gamma(b'')) = (b'', \tau) \;\wedge\; \gamma(b'') = g(b'', \tau) \end{array}$

🟢 ** shinji** — are we still operating in conway's game of life here?

🟡 ** ritsuko** — oh yeah, now might be a good time to start generalizing. we'll carry around not just world-states $\omega \in \Omega$, but also a hypothesis $\alpha$ for the initial state and mechanics of our universe:

$\begin{array}{rl} {\mathit{Loc}}_{n}(\alpha, \omega, b)(\gamma) := & \displaystyle\sum_{f, g, \tau, b'} K^{-}_{\Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*}}(f) \cdot K^{-}_{\mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega}(g) \cdot \frac{1}{\#\mathbb{B}^{n}} \cdot \frac{{\mathit{SimilarPasts}}_{\alpha}(\omega, g(b', \tau))}{R(g, (b', \tau)) + R(f, g(b', \tau))} \\ & f \in \Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*} \\ & g \in \mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega \\ & b' \in \mathbb{B}^{n} \\ & f(\omega) = (b, \tau) \\ & \forall b'' \in \mathbb{B}^{n}: f(\gamma(b'')) = (b'', \tau) \;\wedge\; \gamma(b'') = g(b'', \tau) \end{array}$

🟢 ** shinji** — i notice that you're multiplying together your "kolmogorov simplicities" and $\frac{1}{\#{\mathbb{B}}^{n}}$ and now $\mathit{\text{SimilarPasts}}$ divided by a sum of how long they take to run. what's going on here exactly?

🟡 ** ritsuko** — well, each of those numbers is a "confidence amount" — a scalar between 0 and 1 that says "how much does this possibility count", and multiplying them combines the confidences.

🟢 ** shinji** — ah, i see. so these sums do something kinda like "expected value" in probability?

🟡 ** ritsuko** — something kinda like that. actually, this notation is starting to get unwieldy. i'm noticing a bunch of this pattern: $\displaystyle\sum_{x \in \mathit{SomeSet}} \mathit{SomeDistribution}(x) \cdot \mathit{expression}$

🟣 ** misato** — so, if you want to use the standard probability theory notations, you need random variables which–

🟡 ** ritsuko** — ugh, i'd rather not drag in random variables. let's just define some syntactic sugar of our own:

$\underset{\begin{array}{c} x_{1} : X_{1} \\ \vdots \\ x_{n} : X_{n} \\ C_{1} \\ \vdots \\ C_{m} \end{array}}{\mathbf{M}[V]} \;:=\; \underset{\begin{array}{c} x_{1} \in \mathit{domain}(X_{1}) \\ \vdots \\ x_{n} \in \mathit{domain}(X_{n}) \\ C_{1} \\ \vdots \\ C_{m} \end{array}}{\sum} X_{1}(x_{1}) \cdot \ldots \cdot X_{n}(x_{n}) \cdot V$

🟡 ** ritsuko** — $\mathbf{M}$ will stand for "constrained mass", and it's basically syntactic sugar for sums, where $x : X$ means "sum over $x \in \mathit{domain}(X)$ (where $\mathit{domain}$ returns the set of arguments over which a function is defined), and then multiply each iteration of the sum by $X(x)$". now, we just have to define uniform distributions over finite sets as…

🟢 ** shinji** — ${\mathit{Uniform}}_{X}(x) := \frac{1}{\#X}$ for finite set $X$?

🟡 ** ritsuko** — that's it! and now, $\mathit{\text{Loc}}$ is much more easily written down:

$\begin{array}{rl} {\mathit{Loc}}_{n}(\alpha, \omega, b)(\gamma) := & \mathbf{M}\left[\dfrac{{\mathit{SimilarPasts}}_{\alpha}(\omega, g(b', \tau))}{R(g, (b', \tau)) + R(f, g(b', \tau))}\right] \\ & f : K^{-}_{\Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*}} \\ & g : K^{-}_{\mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega} \\ & b' : {\mathit{Uniform}}_{\mathbb{B}^{n}} \\ & f(\omega) = (b, \tau) \\ & \forall b'' \in \mathbb{B}^{n}: f(\gamma(b'')) = (b'', \tau) \;\wedge\; \gamma(b'') = g(b'', \tau) \end{array}$
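the constrained-mass sugar itself can be sketched in a few lines (hedged: distributions here are finite dicts `{value: mass}`, which gives us $\mathit{domain}$ for free, and the constraint is a single predicate over all bound variables):

```python
from fractions import Fraction

# M[body] over variables drawn from distributions, filtered by a
# constraint: sum over every assignment, weighting each term by the
# mass each distribution assigns its variable.

def M(body, constraint, *dists):
    def rec(args, weight, rest):
        if not rest:
            return weight * body(*args) if constraint(*args) else Fraction(0)
        return sum(rec(args + (x,), weight * w, rest[1:])
                   for x, w in rest[0].items())
    return rec((), Fraction(1), dists)

def uniform(xs):
    xs = list(xs)
    return {x: Fraction(1, len(xs)) for x in xs}

# example: the mass of two fair die rolls summing to 4
die = uniform(range(1, 7))
p = M(lambda a, b: Fraction(1), lambda a, b: a + b == 4, die, die)
```

with a body of `1`, as in the dice example, $\mathbf{M}$ just measures how much mass satisfies the constraints — the same trick $\mathbf{M}[1]$ pulls later on.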

🟢 ** shinji** — huh. you know, i'm pretty skeptical of you inventing your own probability notations, but this does read a lot better.

🟣 ** misato** — so, are we done here? is this blob location?

🟡 ** ritsuko** — well, i expect that some things are gonna come up later that will make us want to change this definition. but right now, the only improvement i can think of is to replace $f : K^{-}_{\Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*}}$ and $g : K^{-}_{\mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega}$ with $(f, g) : K^{-}_{(\Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*}) \times (\mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega)}$.

🟣 ** misato** — huh, what's the difference?

🟡 ** ritsuko** — well, now we're sampling $f, g$ from their *joint* kolmogorov simplicity — as one program encoding the pair — rather than sampling each independently.

🟣 ** misato** — and we want that?

🟡 ** ritsuko** — yes! there are some cases where we'd want two mathematical objects to have a lot of information in common, and other places where we'd want them not to. here, it is clearly the former: we want the program that "deconstructs" the world-state into blob and everything-else, and the program that "reconstructs" a new world-state from a counterfactual blob and the same everything-else, to be able to share information about how they do that.

🟢 ** shinji** — so we've put together a true name for "piece of data in the universe which can be replaced with counterfactuals". that's pretty nifty, i guess, but what do we do with it?

🟡 ** ritsuko** — now, this is where the core of my idea comes in: in the physical world, we're gonna create a random, unique-enough blob on someone's computer. then we're going to, still in the physical world, read its contents right after generating it. if it looks like a counterfactual question (i.e. if it doesn't look like randomness) we'll create another blob of data, which can be recognized by $\mathit{Loc}$ as an answer.

🟢 ** shinji** — what does that entail, exactly?

🟡 ** ritsuko** — we'll have created a piece of data in the physical world which we can formally point at, and whose contents we can counterfactually replace in the math.

🟣 ** misato** — hold on — we already have this. the AI can already have an interface where it asks a human user something, and waits for our answer. and the problem with that is that, obviously, the AI hijacks us or its interface to get whatever answer makes its job easiest.

🟡 ** ritsuko** — aha, but this is different! we can point at a counterfactual question-and-answer chunk-of-time (call it "question-answer counterfactual interval", or "QACI") which is located in the AI's *past*, before it was ever launched — so there's no interface there for it to hijack.

🟣 ** misato** — huh.

🟡 ** ritsuko** — that's another idea i got from PreDCA — making the AI pursue the values of its past user, from before it was launched.

🟢 ** shinji** — but we don't want the AI to lock-in our values, we want the AI to satisfy our values-as-they-evolve-over-time, don't we?

🟣 ** misato** — well, shinji, there's multiple ways to phrase your mistake, here. one is that, actually, you do — but if you're someone whose values endorse evolving over time, then your current values already contain that endorsement.

🟣 ** misato** — but you–

🟡 ** ritsuko** — put it another way: if you're reasonable, then if the AI asks you what you want inside the question-answer counterfactual interval, you won't answer "i want everyone to be forced to watch the most popular TV show in 2023". you'll answer something more like "i want everyone to be able to reflect on their own values and choose what values and choices they endorse, and how, and that the field of philosophy can continue in these ways in order to figure out how to resolve conflicts", or something like that.

🟣 ** misato** — wait, if the AI is asking the user counterfactual questions, won't it ask the user whatever counterfactual question brainhacks the user into responding whatever answer makes its job easiest? it can just hijack the QACI.

🟡 ** ritsuko** — aha, but we don't have to have the AI choose which counterfactual questions get asked — those are determined by the math, and by the user's own answers.

🟢 ** shinji** — so it's kinda like coherent extrapolated volition but for actions?

🟡 ** ritsuko** — sure, i think of it as something in that spirit, but made fully formal.

🟣 ** misato** — how does that look in math?

🟡 ** ritsuko** — so, let's define $\mathit{QACI}$ as a function, and this'll clarify what's going on. $q \in \mathbb{B}^{*}$ will be our initial random factual question blob. $\mathit{QACI} \in \Omega \times \Gamma_{|q|} \times \mathbb{B}^{|q|} \to \Delta_{\mathbb{B}^{|q|}}$ takes as parameters a blob location for the question — which, remember, comes in the form of a function you can use to produce counterfactual world-states with counterfactual blobs! — and a counterfactual question blob $q'$, and returns a distribution of possible answers $r$. it's defined as:

$\begin{array}{rl} \mathit{QACI}(\alpha, \gamma_q, q')(r) := & \mathbf{M}[1] \\ & \omega_r : \Omega^{\to}_{\alpha}(\gamma_q(q')) \\ & \gamma_r : {\mathit{Loc}}_{|q|}(\alpha, \omega_r, r) \end{array}$

🟡 ** ritsuko** — we're, for now, just positing that there is a function $\Omega^{\to}_{\alpha} \in \Omega \to \Delta_{\Omega}$ (remember that $\alpha$ defines a hypothesis for the initial state, and mechanics, of our universe) which, given a world-state, returns a distribution over world-states in its future. so this piece of math samples possible future world-states of the counterfactual world-state where $q$ was replaced with $q'$, and possible locations of possible answers in those world-states.

🟣 ** misato** — $\mathbf{M}[1]$? what does that even mean?

🟡 ** ritsuko** — here, the fact that ${\mathit{Loc}}_{n}(\alpha, \omega, b)$ and $\Omega^{\to}_{\alpha}$ already carry all the mass does the work — the distributions we sample from are weighted, so the body of the sum can just be $1$.

🟣 ** misato** — hmmm. wait, this just finds whichever-answers-are-the-easiest-to-find. what guarantees that $r$ looks like an answer the user actually wrote?

🟡 ** ritsuko** — this is a good point. maybe we should define something like $\mathit{Sign} \in \mathbb{B}^{*} \to \mathbb{B}^{|q|}$ which, to any input "payload" of a certain length, associates a blob which is actually highly complex, because $\mathit{Sign}$ embeds a lot of bits of complexity. for example, maybe $\mathit{Sign}(\pi)$ (where $\pi$ is the "payload") concatenates $\pi$ together with a long cryptographic hash of $\pi$ and of some piece of information highly entangled with our world-state.
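a hedged sketch of what such a $\mathit{Sign}$ could look like (the entangled bitstring is a stand-in constant here, and the blob length and padding are arbitrary choices for the sketch):

```python
import hashlib

# Toy Sign: concatenate the payload with a cryptographic hash of the
# payload plus data assumed to be highly entangled with our
# world-state, then zero-pad to a fixed blob length |q|.

ENTANGLED = b"stand-in for information entangled with our world-state"
BLOB_LEN = 64  # |q| in bytes, chosen arbitrarily here

def sign(payload: bytes) -> bytes:
    digest = hashlib.sha256(payload + ENTANGLED).digest()
    return (payload + digest).ljust(BLOB_LEN, b"\x00")

blob = sign(b"an answer payload")
```

a program that doesn't already know the entangled data has no cheap way to produce blobs of this shape, which is the point.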

$\begin{array}{rl} \mathit{QACI}(\alpha, \gamma_q, q')(\pi_r) := & \mathbf{M}[1] \\ & \omega_r : \Omega^{\to}_{\alpha}(\gamma_q(q')) \\ & \gamma_r : {\mathit{Loc}}_{|q|}(\alpha, \omega_r, \mathit{Sign}(\pi_r)) \end{array}$

🟢 ** shinji** — we're not signing the counterfactual question ${q}^{\prime}$, only the answer payload ${\pi}_{r}$?

🟡 ** ritsuko** — that's right. signatures matter for blobs we're trying to *find* in the world — the counterfactual question is a blob we *insert*, so it doesn't need one.

🟣 ** misato** — so, it seems to me like how $\Omega^{\to}$ works here is pretty critical. for example, if it contains a bunch of mass at world-states where some AI is launched, whether ours or another, then that AI will try to fill its future lightcone with answers that would match various $\mathit{Sign}(\pi_r)$'s — so that *it* gets to decide what the answer says.

🟡 ** ritsuko** — this is true! indeed, how we sample for ${\Omega}^{\to}$ is pretty critical. how about this: first, we'll pass the distribution into $\mathit{\text{Loc}}$:

$\begin{array}{rl} \mathit{QACI}(\alpha, \gamma_q, q')(\pi_r) := & \mathbf{M}[1] \\ & \gamma_r : {\mathit{Loc}}_{|q|}(\alpha, \Omega^{\to}_{\alpha}(\gamma_q(q')), \mathit{Sign}(\pi_r)) \end{array}$

🟡 ** ritsuko** — …and inside ${\mathit{Loc}}_{n}$, which is now of type ${\mathit{Loc}}_{n} \in \Omega \times \Delta_{\Omega} \times \mathbb{B}^{n} \to \Delta_{\Gamma_{n}}$, for any $f, g$ we'll only sample world-states $\omega$ which have the most mass in the distribution, among those where $f$ finds the blob:

$\begin{array}{rl} {\mathit{Loc}}_{n}(\alpha, \delta, b)(\gamma) := & \mathbf{M}\left[\dfrac{{\mathit{SimilarPasts}}_{\alpha}(\omega, g(b', \tau))}{R(g, (b', \tau)) + R(f, g(b', \tau))}\right] \\ & (f, g) : K^{-}_{(\Omega \stackrel{H}{\to} \mathbb{B}^{n} \times \mathbb{B}^{*}) \times (\mathbb{B}^{n} \times \mathbb{B}^{*} \stackrel{H}{\to} \Omega)} \\ & \omega : \lambda\omega : {\mathit{max}}^{\Delta}\!\left(\lambda\omega : \Omega .\, \begin{cases} \delta(\omega) & \text{if } f(\omega) = (b, \tau) \\ 0 & \text{otherwise} \end{cases}\right).\; \delta(\omega) \\ & b' : {\mathit{Uniform}}_{\mathbb{B}^{n}} \\ & f(\omega) = (b, \tau) \\ & \forall b'' \in \mathbb{B}^{n}: \gamma(b'') = g(b'', \tau) \;\wedge\; f(\gamma(b'')) = (b'', \tau) \end{array}$

🟡 ** ritsuko** — the intent here is that for any way-to-find-the-blob $f, g$, we only sample the closest matching world-states in time — which keeps us from picking up "answers" planted much later by some superintelligence flooding its future with matching blobs.

🟣 ** misato** — can you disentangle the line where you sample $\omega $?

🟡 ** ritsuko** — sure! so, we write an anonymous function $\lambda \omega :X.\delta (\omega )$ — a distribution is a function, after all! — taking a parameter $\omega $ from the set $X$, and returning $\delta (\omega )$. so this is going to be a distribution that is just like $\delta $, except it's only defined for a subset of $\Omega $ — those in $X$.

🟡 ** ritsuko** — in this case, $X$ is defined as such: first, take the set of elements $\omega \in \Omega$ for which $f(\omega) = (b, \tau)$. then, apply the distribution $\delta$ to all of them, and only keep the elements with the most $\delta$-mass (there can be multiple, if several elements are tied for the maximum!).
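that maximal-mass step is easy to sketch (hedged toy: the distribution is a finite dict, and "where $f$ finds the blob" is just a predicate):

```python
from fractions import Fraction

# Restrict a distribution delta to the world-states where f finds the
# blob, then keep every element tied for the maximum mass.

def max_mass_set(delta, found_blob):
    candidates = {w: m for w, m in delta.items() if found_blob(w)}
    top = max(candidates.values())
    return {w for w, m in candidates.items() if m == top}

delta = {"w1": Fraction(1, 2), "w2": Fraction(1, 4),
         "w3": Fraction(1, 4), "w4": Fraction(1, 8)}
# suppose f finds (b, tau) only in w2, w3 and w4:
best = max_mass_set(delta, lambda w: w != "w1")
```

here `best` contains both `w2` and `w3` — exactly the tie case mentioned above.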

🟡 ** ritsuko** — oh, and i guess the $f(\omega) = (b, \tau)$ constraint is redundant now, i'll erase it. remember that this syntax means "sum over the body for all values of $f, g, \omega, \tau, b'$ for which these constraints hold…", which means we can totally have the value of $\tau$ be bound inside the definition of $\omega$ like this — it'll just have exactly one value for any pair of $f$ and $\omega$.

🟢 ** shinji** — why is $\mathit{\text{QACI}}$ returning a distribution over answers, rather than picking the single element with the most mass in the distribution?

🟡 ** ritsuko** — that's a good question! in theory, it could be that, but we do want the user to be able to go to the next possible counterfactual answer if the first one isn't satisfactory, and the one after that if needed, and so on.

🟢 ** shinji** — so the AI is asking the counterfactual past-user-in-time to come up with a good action-scoring function in… however long a question-answer counterfactual interval is.

🟡 ** ritsuko** — let's say about a week.

🟢 ** shinji** — and this helps… how, again?

🟡 ** ritsuko** — well. first, let's posit ${\mathit{EvalMath}}_{X} \in \mathbb{B}^{*} \to \{\{x\} \mid x \in X\} \cup \{\varnothing\}$, which tries to parse and evaluate a bitstring representing a piece of math (in some pre-established formal language) and returns either:

- the singleton of what it evaluates to, if that is a member of $X$
- the empty set, if the result isn't a member of $X$ or the bitstring fails to parse or evaluate

🟡 ** ritsuko** — we then define ${\mathit{EvalMath}}_{X}^{\Delta} \in \Delta_{\Pi} \to X$ as a function that returns the highest-mass element of the distribution for which ${\mathit{EvalMath}}_{X}$ returns a value rather than the empty set. we'll also posit $q'_{*} \in * \to \mathbb{B}^{|q|}$, a convenience function which converts any mathematical object into a counterfactual blob in $\mathbb{B}^{|q|}$. this isn't really allowed, but it's just for the sake of example here.
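a hedged toy of ${\mathit{EvalMath}}_{X}$, with python expressions standing in for the pre-established formal language:

```python
# Toy EvalMath_X: returns the singleton {value} if the string
# evaluates to a member of X, and the empty set if it fails to parse,
# fails to evaluate, or lands outside X.

def eval_math(expr, in_X):
    try:
        value = eval(expr, {"__builtins__": {}})  # toy sandbox, no builtins
    except Exception:
        return set()
    return {value} if in_X(value) else set()

def is_int(v):
    return isinstance(v, int) and not isinstance(v, bool)

a = eval_math("2 + 3", is_int)   # evaluates, lands in X
b = eval_math("2 +", is_int)     # fails to parse
c = eval_math("'hi'", is_int)    # evaluates, but outside X
```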

🟣 ** misato** — okay…

🟡 ** ritsuko** — so, let's say the first call is $\mathit{QACI}(\alpha, \gamma_q, q'_{*}(\text{"please produce a good action-scoring"}))$. the user can return not just a plain answer, but a piece of math — including math that calls $\mathit{QACI}$ again.

🟣 ** misato** — right, this is the long-reflection process you mentioned. and what about the part where they get a mathematical oracle?

🟡 ** ritsuko** — so, the user can return things like:

${\mathit{EvalMath}}_{U}^{\Delta}(\mathit{QACI}(\alpha, \gamma_q, q'_{*}(\mathit{SomeUncomputableQuery}())))$

${\mathit{EvalMath}}_{U}^{\Delta}(\mathit{QACI}(\alpha, \gamma_q, q'_{*}(\mathit{Halts}(\mathit{SomeProgram}, \mathit{SomeInput}))))$.

🟣 ** misato** — huh. that's nifty.

🟢 ** shinji** — what if some weird memetic selection effects happen, or what if in one of the QACI intervals, the user randomly gets hit by a truck and then the whole scheme fails?

🟡 ** ritsuko** — so, the user can set up giant acyclic graphs of calls to themselves, providing a lot of redundancy. that way, if any single node fails to return a coherent output, the next nodes can notice this and keep working with their peers' outputs.

🟡 ** ritsuko** — a small graph of QACI can be accomplished with something like

${\mathit{EvalMath}}_{U}^{\Delta}\left(\mathit{QACI}\left(\alpha, \gamma_q, q'_{*}\left(\begin{array}{l} {\mathit{EvalMath}}_{U}^{\Delta}(\mathit{QACI}(\alpha, \gamma_q, q'_{*}(\text{"what about this…"}))), \\ {\mathit{EvalMath}}_{U}^{\Delta}(\mathit{QACI}(\alpha, \gamma_q, q'_{*}(\text{"what about that…"}))) \end{array}\right)\right)\right)$
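the redundancy idea itself can be sketched as a chain of intervals where each node sees its predecessors' coherent outputs and routes around failures (hedged toy — nodes here are plain callables, and a failed interval is modelled as returning `None`):

```python
# Toy redundant graph of QACI intervals: each node receives the list
# of coherent outputs so far; None models an interval that failed.

def run_graph(nodes):
    outputs = []
    for node in nodes:
        out = node(list(outputs))
        if out is not None:      # skip failed intervals
            outputs.append(out)
    return outputs[-1] if outputs else None

nodes = [
    lambda prev: "draft v1",
    lambda prev: None,  # this user got hit by a truck
    lambda prev: prev[-1] + ", refined" if prev else "fresh draft",
]
final = run_graph(nodes)
```

the third node still gets to refine "draft v1" even though the second interval produced nothing.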

🟡 ** ritsuko** — and you can imagine how this could be extended to a much larger structure.

🟢 ** shinji** — wait a minute; what if the QACI user goes crazy from realizing that they're a counterfactual — that they're only being imagined by the AI?

🟡 ** ritsuko** — ah. uuh. well, don't panic shinji, but, that might very much be our own situation right now.

🟢 ** shinji** *starts flailing around.*

🟣 ** misato** — what the hell are you talking about?

🟡 ** ritsuko** — misato, come on. we're inside a crazy high-tech compound designed to build and maintain giant robots to fight alien creatures. we're pretty obviously characters in some kind of fiction.

🔵 ** rei** *appears in front of them and waves at the camera.*

🟣 ** misato** — …so, why are we talking about alignment, instead of, i don't know, trying to escape?

🟡 ** ritsuko** — misato, this is still our world — what happens in it is still real to us.

🟡 ** ritsuko** — but don't be mistaken: if we are indeed fictional, then the future of this world we're observing barely matters. our parent world likely holds a lot more realityfluid. so we shou–

🟢 ** shinji**,

🟡 ** ritsuko** — anyways, the coherent thing for fictional characters to do is to try to steer their parent world, because it's generally "realer". and generally, this is achieved by cooperating with whatever their parent world seems to be doing, so we should probably just continue to discuss AI alignment.

🟡 ** ritsuko** — and, in a similar manner, there are people who grok these facts sufficiently well that we would expect them to take the correct actions, even upon realizing that they are the fictional consideration of a superintelligence. given the explanation i just gave, i would claim that i could do it myself, for one.

🔵 ** rei** *magically erases everyone's memory of this section of the dialogue, and then disappears, out of view from everyone.*

🟢 ** shinji** — so we have $\text{QACI}(\alpha, \gamma_q, q')$, the function which can call itself recursively. what's the top-level, terminal call to it which yields the action-scoring function?

🟡 ** ritsuko** — ah, i think it'd look like:

$\text{QACI}_0(\alpha, \gamma_q)(u) \;\coloneqq \mathop{\mathbf{M}}_{\substack{\pi_r \,:\, \text{QACI}(\alpha, \gamma_q, q'_0) \\ u \,\in\, \text{EvalMath}_U(\pi_r)}} [\,1\,]$

🟡 ** ritsuko** — where $q'_0$ is some initial counterfactual blob, such as the plaintext string "please return a good scoring function over actions" encoded in ASCII, and then padded with zeros to be of the size needed for a blob. $\text{QACI}_0$ has type $\Omega \times \Gamma_{|q|} \to \Delta_U$ — from an initial world-state and a question location, it returns a distribution over action-scoring functions.
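the encode-and-pad step is mundane enough to sketch directly; `BLOB_SIZE` here is an arbitrary stand-in for whatever fixed blob size the scheme actually uses:

```python
# Building the initial counterfactual "question blob": ASCII-encode the
# prompt, then zero-pad to the fixed blob size. BLOB_SIZE is an arbitrary
# choice for this sketch, not a value from the post.
BLOB_SIZE = 1024

def make_initial_blob(text: str, size: int = BLOB_SIZE) -> bytes:
    payload = text.encode("ascii")
    if len(payload) > size:
        raise ValueError("prompt too large for blob")
    return payload + b"\x00" * (size - len(payload))

q0 = make_initial_blob("please return a good scoring function over actions")
```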

🟣 ** misato** — so like, the counterfactual user inside the $\mathit{\text{QACI}}$ call should be able to return math that calls more $\mathit{\text{QACI}}$, but where do

🟢 ** shinji** — couldn't they return the whole math?

🟡 ** ritsuko** — ah, that's not gonna work — the chance of erroneous blob locations might accumulate too much if each $\text{QACI}$ call does a new question-location sampling; we want something more reliable. an easy solution is to $\text{EvalMath}$ the text not into a $U$, but into an $\Omega \times \Gamma_{|q|} \to U$, and to pass it $\alpha, \gamma_q$ so that the user can return a function which receives those and uses them to call $\text{QACI}$.

🟡 ** ritsuko** — actually, while we're at it, we can pass it a whole lot more things it might need…

$\text{QACI}_0(\alpha, \gamma_q)(u) \;\coloneqq \mathop{\mathbf{M}}_{\substack{\pi_r \,:\, \text{QACI}(\alpha, \gamma_q, q'_0) \\ f \,\in\, \text{EvalMath}_{\{q\} \times \Omega \times \Gamma_{|q|} \to U}(\pi_r) \\ f(q, \alpha, \gamma_q) = u}} [\,1\,]$

🟢 ** shinji** — what's going on with $f$ here?

🟡 ** ritsuko** — oh, this is just a trick of how we implement distributions — when measuring the mass of any specific $u$, we try to $\text{EvalMath}$ the answer payload into a function $f$, and we only count the location when $u$ is equal to $f(q, \alpha, \gamma_q)$ with useful parameters passed to it.
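here's a toy sketch of that trick (all names, and the toy `eval_math`, are invented for this example): the mass of a candidate $u$ is just the summed mass of answer locations whose payload evaluates to an $f$ that maps the parameters to exactly $u$:

```python
# Toy model of the distribution-by-indicator trick: a candidate u gets the
# total mass of all answer locations whose payload parses into a function f
# with f(q, alpha, gamma_q) == u. Payloads that fail to parse count for nothing.

def qaci0_mass(u, answer_locations, eval_math, q, alpha, gamma_q):
    total = 0.0
    for payload, mass in answer_locations:
        f = eval_math(payload)  # None if the payload isn't valid math
        if f is not None and f(q, alpha, gamma_q) == u:
            total += mass  # only locations that pin down exactly u count
    return total

# toy eval_math: payloads of the form "const N" evaluate to a constant function
def toy_eval_math(payload):
    if payload.startswith("const "):
        n = int(payload.split()[1])
        return lambda q, alpha, gamma_q: n
    return None

locations = [("const 1", 0.5), ("const 2", 0.3), ("garbage", 0.2)]
m1 = qaci0_mass(1, locations, toy_eval_math, "q", "alpha", "gamma_q")  # 0.5
```

note how the unparseable "garbage" payload simply contributes no mass to any $u$.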

🟣 ** misato** — what's

🟡 ** ritsuko** — so… remember this?

$\text{Score}(a) \;\coloneqq \mathop{\mathbf{M}}_{h \,:\, \text{Prior}} [\,\text{LooksLikeThisWorld}(h) \cdot \text{HowGood}(a, h)\,]$

🟡 ** ritsuko** — this is where we start actually plugging in our various parts. we'll assume some distribution over initial world-states $\Omega_\alpha \in \Delta_\Omega$ and sample question locations $\gamma_q$ in futures of those initial world-states — which will serve, for now, as the $\text{LooksLikeThisWorld}$.

$\text{Score}(a) \;\coloneqq \mathop{\mathbf{M}}_{\substack{\alpha \,:\, \Omega_\alpha \\ \gamma_q \,:\, \text{Loc}_{|q|}(\alpha, \Omega_\alpha^\to(\alpha), q)}} [\,\text{QACI}_0(\alpha, \gamma_q)(a)\,]$

🟡 ** ritsuko** — the actual AI we use will be of a type like $U\stackrel{H}{\to}A$, and so we can just call $\mathit{\text{AI}}(\mathit{\text{Score}})$, and execute its action guess.
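as a cartoon of that final step (the real AI is whatever powerful optimizer actually fills this role, not a brute-force argmax; the names here are invented):

```python
# Cartoon "AI" of type (scoring function) -> action: brute-force argmax
# over a tiny finite action set. The real system would be a powerful
# optimizer over an astronomically large action space.
def ai(score, candidate_actions):
    return max(candidate_actions, key=score)

# toy scoring function peaking at action 3; the "AI" finds and returns it
best = ai(lambda a: -(a - 3) ** 2, [0, 1, 2, 3, 4])
```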

🟣 ** misato** — and… that's it?

🟡 ** ritsuko** — well, no. i mean, the whole fundamental structure is here, but there's still a bunch of work we should do if we want to increase the chances that this produces the outcomes we want.

🟡 ** ritsuko** — so, right now each call to $\text{Loc}$ penalizes $f, g$ for being too kolmogorov-complex. we could take advantage of this by encouraging our two different blob locations — the question location and the answer location — to share bits of information, rather than each coming up with its own, possibly different bits of information. this increases the chances that the question is located "in a similar way" to the answer.

🟣 ** misato** — what does this mean, concretely?

🟡 ** ritsuko** — well, for example, they could have the same bits of information for

🟡 ** ritsuko** — for this, we'll define a set of "location priors" being sampled as part of the hypothesis that $\text{Score}$ samples over — let's call it $\Xi$ (xi). we might as well posit $\Xi \coloneqq \mathbb{B}^*$.

🟡 ** ritsuko** — we'll also define $K^{-\sim}_{P,X} : P \to \Delta_X$, a kolmogorov simplicity measure which can use another piece of information, as, let's see…

$K^{-\sim}_{P,X}(p)(x) \;\coloneqq K^{-}_{P \times X}(p, x)$

🟡 ** ritsuko** — there we go, measuring the simplicity of the pair of the prior and the element favors information being shared between them.
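the actual $K^-$ is uncomputable, but the intuition that shared bits make the pair simpler can be demonstrated with a crude computable proxy, compressed length (an assumption of this sketch, not the post's actual measure):

```python
import hashlib
import zlib

# Crude computable stand-in for K^-(p, x): compressed length of the pair
# (smaller = "simpler"). This is only an intuition pump.
def joint_cost(p: bytes, x: bytes) -> int:
    return len(zlib.compress(p + x, 9))

p = b"the quick brown fox jumps over the lazy dog. " * 20
shares_bits = p  # an x that reuses p's information exactly
# deterministic pseudo-random bytes of the same length, sharing nothing with p
fresh_bits = b"".join(
    hashlib.sha256(bytes([i])).digest() for i in range(len(p) // 32 + 1)
)[:len(p)]

# the pair (p, x) is far "simpler" when x shares information with p
cost_shared = joint_cost(p, shares_bits)
cost_fresh = joint_cost(p, fresh_bits)
```

`cost_shared` comes out much smaller than `cost_fresh`, mirroring how $K^-_{P \times X}$ favors pairs that share bits.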

🟣 ** misato** — wait, this fails to normalize now, doesn't it? because not all of $P\times X$ is sampled, only pairs whose first element is $p$.

🟡 ** ritsuko** — ah, you're right! we can simply normalize this distribution to solve that issue.

$K^{-\sim}_{P,X}(p) \;\coloneqq \text{Normalize}_X(\lambda x : X.\; K^{-}_{P \times X}(p, x))$

🟡 ** ritsuko** — and in $\text{Score}$ we'll simply add $\xi : K^{-}_{\Xi}$ and then pass $\xi$ around to all blob locations:

$\text{Score}(u) \;\coloneqq \mathop{\mathbf{M}}_{\substack{\alpha \,:\, \Omega_\alpha \\ \xi \,:\, K^{-}_{\Xi} \\ \gamma_q \,:\, \text{Loc}_{|q|}(\alpha, \Omega_\alpha^\to(\alpha), q, \xi)}} [\,\text{QACI}_0(\alpha, \gamma_q, \xi)(u)\,]$

$\text{QACI}_0 \;\in\; \Omega \times \Gamma_{|q|} \times \Xi \to \Delta_U$

$\text{QACI}_0(\alpha, \gamma_q, \xi)(u) \;\coloneqq \mathop{\mathbf{M}}_{\substack{\pi_r \,:\, \text{QACI}(\alpha, \gamma_q, q'_0, \xi) \\ f \,\in\, \text{EvalMath}_{\{q\} \times \Omega \times \Gamma_{|q|} \times \Xi \to U}(\pi_r) \\ f(q, \alpha, \gamma_q, \xi) = u}} [\,1\,]$

🟡 ** ritsuko** — finally, we'll use it in $\mathit{\text{Loc}}$ to sample $f,g$ from:

$\text{Loc}_n \;\in\; \Omega \times \Delta_\Omega \times \mathbb{B}^n \times \Xi \to \Delta_{\Gamma_n}$

$\text{Loc}_n(\alpha, \delta, b, \xi)(\gamma) \;\coloneqq \mathop{\mathbf{M}}_{\substack{(f, g) \,:\, K^{-\sim}_{\Xi,\, (\Omega \xrightarrow{H} \mathbb{B}^n \times \mathbb{B}^*) \times (\mathbb{B}^n \times \mathbb{B}^* \xrightarrow{H} \Omega)}(\xi) \\ \omega \,:\, \max^{\Delta}_{\Omega}\!\left(\lambda \omega : \Omega.\, \begin{cases} \delta(\omega) & \text{if } f(\omega) = (b, \tau) \\ 0 & \text{otherwise} \end{cases}\right) \\ b' \,:\, \text{Uniform}_{\mathbb{B}^n} \\ \forall b'' \in \mathbb{B}^n :\; \gamma(b'') = g(b'', \tau),\; f(\gamma(b'')) = (b'', \tau)}} \left[\frac{\text{SimilarPasts}_\alpha(\omega, g(b', \tau))}{R(g, (b', \tau)) + R(f, g(b', \tau))}\right]$

🟡 ** ritsuko** — here's an issue: currently in $\mathit{\text{Score}}$, we're weighing hypotheses by how hard it is to find both the question and the answer.

🟡 ** ritsuko** — do you think that's wrong?

🟣 ** misato** — i think we should first ask for how hard it is to find questions, and then normalize the distribution of answers, so that harder-to-find answers don't penalize hypotheses. the reasoning behind this is that we want QACI graphs to be able to do a lot of complicated things, and that we hope question location is sufficient to select what we want already.

🟡 ** ritsuko** — ah, that makes sense, yeah! thankfully, we can just normalize right around the call to ${\mathit{\text{QACI}}}_{0}$, before applying it to $u$:

$\text{Score}(u) \;\coloneqq \mathop{\mathbf{M}}_{\substack{\alpha \,:\, \Omega_\alpha \\ \xi \,:\, K^{-}_{\Xi} \\ \gamma_q \,:\, \text{Loc}_{|q|}(\alpha, \Omega_\alpha^\to(\alpha), q, \xi)}} [\,\text{Normalize}_U(\text{QACI}_0(\alpha, \gamma_q, \xi))(u)\,]$
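a toy model of the difference (names invented for the sketch): normalize each hypothesis's answer distribution before mixing, so that a hypothesis whose answers happen to be hard to find isn't penalized relative to one with easy-to-find answers:

```python
# Toy model of per-hypothesis answer normalization. Each hypothesis carries
# an unnormalized distribution over answers; normalizing within each
# hypothesis before mixing means that hard-to-find answers no longer
# penalize the hypothesis they live in.

def normalize(dist):
    total = sum(dist.values())
    return {k: (v / total if total > 0 else 0.0) for k, v in dist.items()}

def score(u, hypotheses):
    """hypotheses: list of (hypothesis_mass, answer_dist) pairs."""
    return sum(m * normalize(dist).get(u, 0.0) for m, dist in hypotheses)

hyps = [
    (0.6, {"u1": 10.0, "u2": 10.0}),  # easy-to-find answers
    (0.4, {"u1": 0.001}),             # hard-to-find answer: same weight after normalizing
]
s = score("u1", hyps)  # 0.6 * 0.5 + 0.4 * 1.0 = 0.7
```

without the `normalize` call, the second hypothesis would contribute almost nothing, purely because its answer was harder to find.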

🟢 ** shinji** — what happens if we don't get the blob locations we want, exactly?

🟡 ** ritsuko** — well, it depends. there are two kinds of "blob mislocations": "naive" and "adversarial" ones. naive mislocations are hopefully not a huge deal; considering that we're doing average scoring over all scoring functions weighed by mass, hopefully the "signal" from our aligned scoring functions beats out the "noise" from locations that select the wrong thing at a random place, like "boltzmann blobs".

🟡 ** ritsuko** — adversarial blobs, however, are tougher. i expect that they mostly result from unfriendly alien superintelligences, as well as earth-borne AI, both unaligned ones and ones that might result from QACI. against those, i hope that inside QACI we come up with some good decision theory that lets us not worry about that.

🟣 ** misato** — actually, didn't someone recently publish some work on a threat-resistant utility bargaining function, called "Rose"?

🟡 ** ritsuko** — oh, nice! well in that case, if $\text{Rose}$ is of type $\Delta_U \to U$, then we can simply wrap it around all of $\text{Score}$:

$\text{Score} \;\coloneqq \text{Rose}\Big(\lambda u : U.\, \mathop{\mathbf{M}}_{\substack{\alpha \,:\, \Omega_\alpha \\ \xi \,:\, K^{-}_{\Xi} \\ \gamma_q \,:\, \text{Loc}_{|q|}(\alpha, \Omega_\alpha^\to(\alpha), q, \xi)}} [\,\text{Normalize}_U(\text{QACI}_0(\alpha, \gamma_q, \xi))(u)\,]\Big)$

🟡 ** ritsuko** — note that we're putting the whole thing inside an anonymous $\lambda$-function, and assigning to $\text{Score}$ the result of applying $\text{Rose}$ to that distribution.

🟢 ** shinji** — you know, i feel like there ought to be some better ways to select hypotheses that look like our world.

🟡 ** ritsuko** — hmmm. you know, i do feel like if we had some "observation" bitstring $\mu \in \mathbb{B}^*$ (mu) which strongly identifies our world, like a whole dump of wikipedia or something, that might help — something like $\gamma_\mu : \text{Loc}_{|\mu|}(\alpha, \Omega_\alpha^\to(\alpha), \mu, \xi)$. but how do we tie that into the existing set of variables serving as a sampling?

🟣 ** misato** — we could look for the question $q$ in futures of the observation world-state– how do we get that world-state again?

🟡 ** ritsuko** — oh, if you've got $\gamma_\mu$ you can reconstitute the factual observation world-state with $\gamma_\mu(\mu)$.

🟣 ** misato** — in that case, we can just do:

$\text{Score} \;\coloneqq \text{Rose}\Big(\lambda u : U.\, \mathop{\mathbf{M}}_{\substack{\alpha \,:\, \Omega_\alpha \\ \xi \,:\, K^{-}_{\Xi} \\ \gamma_\mu \,:\, \text{Loc}_{|\mu|}(\alpha, \Omega_\alpha^\to(\alpha), \mu, \xi) \\ \gamma_q \,:\, \text{Loc}_{|q|}(\alpha, \Omega_\alpha^\to(\gamma_\mu(\mu)), q, \xi)}} [\,\text{Normalize}_U(\text{QACI}_0(\alpha, \gamma_q, \xi))(u)\,]\Big)$

🟡 ** ritsuko** — oh, neat! actually, couldn't we generate

🟣 ** misato** — let's see here, the second observation can be ${\mu}_{2}$…

$\text{Score} \;\coloneqq \text{Rose}\Big(\lambda u : U.\, \mathop{\mathbf{M}}_{\substack{\alpha \,:\, \Omega_\alpha \\ \xi \,:\, K^{-}_{\Xi} \\ \gamma_{\mu_1} \,:\, \text{Loc}_{|\mu_1|}(\alpha, \Omega_\alpha^\to(\alpha), \mu_1, \xi) \\ \gamma_{\mu_2} \,:\, \text{Loc}_{|\mu_2|}(\alpha, \Omega_\alpha^\to(\gamma_{\mu_1}(\mu_1)), \mu_2, \xi) \\ \gamma_q \,:\, \text{Loc}_{|q|}(\alpha, \Omega_\alpha^\to(\gamma_{\mu_1}(\mu_1)), q, \xi)}} [\,\text{Normalize}_U(\text{QACI}_0(\alpha, \gamma_q, \xi))(u)\,]\Big)$

🟣 ** misato** — how do i sample the ${\gamma}_{q}$ location from both the future of ${\gamma}_{{\mu}_{1}}$

🟡 ** ritsuko** — well, i'm not sure we want to do that. remember that $\mathit{\text{Loc}}$ tries to find the

$\text{Score} \;\coloneqq \text{Rose}\Big(\lambda u : U.\, \mathop{\mathbf{M}}_{\substack{\alpha \,:\, \Omega_\alpha \\ \xi \,:\, K^{-}_{\Xi} \\ \gamma_{\mu_1} \,:\, \text{Loc}_{|\mu_1|}(\alpha, \Omega_\alpha^\to(\alpha), \mu_1, \xi) \\ \gamma_{\mu_2} \,:\, \text{Loc}_{|\mu_2|}(\alpha, \Omega_\alpha^\to(\gamma_{\mu_1}(\mu_1)), \mu_2, \xi) \\ \gamma_q \,:\, \text{Loc}_{|q|}(\alpha, \Omega_\alpha^\to(\gamma_{\mu_2}(\mu_2)), q, \xi) \\ \Omega_\alpha^\to(\gamma_q(q))(\gamma_{\mu_2}(\mu_2)) \,\geq\, \Omega_\alpha^\to(\gamma_{\mu_2}(\mu_2))(\gamma_q(q))}} [\,\text{Normalize}_U(\text{QACI}_0(\alpha, \gamma_q, \xi))(u)\,]\Big)$

🟡 ** ritsuko** — it's a bit hacky, but we can simply demand that "the ${\mu}_{2}$ world-state be in the future of the $q$ world-state more than the $q$ world-state is in the future of the ${\mu}_{2}$ world-state".

🟣 ** misato** — huh. i guess that's… one way to do it.

🟢 ** shinji** — could we encourage the blob location prior to use the bits of information from the observations? something like…

$\text{Score} \;\coloneqq \text{Rose}\Big(\lambda u : U.\, \mathop{\mathbf{M}}_{\substack{\alpha \,:\, \Omega_\alpha \\ \xi \,:\, K^{-\sim}_{\mathbb{B}^* \times \mathbb{B}^*,\, \Xi}(\mu_1, \mu_2) \\ \gamma_{\mu_1} \,:\, \text{Loc}_{|\mu_1|}(\alpha, \Omega_\alpha^\to(\alpha), \mu_1, \xi) \\ \gamma_{\mu_2} \,:\, \text{Loc}_{|\mu_2|}(\alpha, \Omega_\alpha^\to(\gamma_{\mu_1}(\mu_1)), \mu_2, \xi) \\ \gamma_q \,:\, \text{Loc}_{|q|}(\alpha, \Omega_\alpha^\to(\gamma_{\mu_2}(\mu_2)), q, \xi) \\ \Omega_\alpha^\to(\gamma_q(q))(\gamma_{\mu_2}(\mu_2)) \,\geq\, \Omega_\alpha^\to(\gamma_{\mu_2}(\mu_2))(\gamma_q(q))}} [\,\text{Normalize}_U(\text{QACI}_0(\alpha, \gamma_q, \xi))(u)\,]\Big)$

🟡 ** ritsuko** — nope. because then, $\mathit{\text{Loc}}$'s $f$ programs can simply return the observations as constants, rather than finding them in the world, which defeats the entire purpose.

🟣 ** misato** — …so, what's in those observations, exactly?

🟡 ** ritsuko** — well, $\mu_2$ is mostly just going to be $\mu_1$ with "more, newer content". but the core of it, $\mu_1$, could be a whole lot of stuff: a dump of wikipedia, a callable of some LLM, whatever else would let it identify our world.

🟢 ** shinji** — can't we just, like, plug the AI into the internet and let it gain data that way or something?

🟡 ** ritsuko** — so there's like

🟣 ** misato** — interesting. though of course, the security concerns make this probably unviable.

🟡 ** ritsuko** — hahah. yeah. oh, and we probably want to pass ${\mu}_{1},{\mu}_{2}$ inside ${\mathit{\text{QACI}}}_{0}$:

$\text{QACI}_0(\alpha, \gamma_q, \xi)(u) \;\coloneqq \mathop{\mathbf{M}}_{\substack{\pi_r \,:\, \text{QACI}(\alpha, \gamma_q, q'_0, \xi) \\ f \,\in\, \text{EvalMath}_{\{q\} \times \{\mu_1\} \times \{\mu_2\} \times \Omega \times \Gamma_{|q|} \times \Xi \to U}(\pi_r) \\ f(q, \mu_1, \mu_2, \alpha, \gamma_q, \xi) = u}} [\,1\,]$

🟣 ** misato** — so, is that it then? are we done?

🟡 ** ritsuko** — hardly! i expect that there's

🟢 ** shinji** — you know, the math can seem intimidating at first, but actually it's

🟡 ** ritsuko** — for sure! it should be noted that i'm not particularly qualified at this. my education isn't in math

🟢 ** shinji** — what are some directions which you think are worth exploring, for people who want to help improve QACI?

🟡 ** ritsuko** — oh boy. well, here are some:

- find things that are broken about the current math, and ideally help fix them too.
- think about utility function bargaining more — notably, perhaps scores are regularized, such as maybe by weighing ratings that are more "extreme" (further away from $\frac{1}{2}$) as less probable. alternatively, maybe scoring functions have a finite amount of "votestuff" that they get to distribute amongst all options the way a normalizing distribution does, or maybe we implement something kinda like quadratic voting?
- think about how to make a lazily evaluated observation viable. i'm not sure about this, but it *feels* like the kind of direction that might help avoid unaligned alien AIs capturing our locations by bruteforcing blob generation using many-worlds.
- generally figure out more ways to ensure that the blob locations match the world-states we want — both by improving $\text{Loc}$ and $\text{Sign}$, and by finding more clever ways to use them — you saw how easy it was to add two blob locations for the two observations $\mu_1, \mu_2$.
- think about turning this scheme into a continuous rather than one-shot AI. (possibly exfohazardous, do not publish)
- related to that, think about ways to make the AI aligned not just with regards to its guess, but also with regards to its side-effects, so as to avoid it wanting to exploit its way out. (possibly exfohazardous, do not publish)
- alternatively, think about how to box the AI so that the output with regards to which it is aligned is its only meaningful source of world-steering.
- one thing we didn't get into much is what could actually be behind $\Omega $, ${\Omega}^{\to}$, and $\mathit{\text{SimilarPasts}}$. you can read more about those here, but i don't have super strong confidence in the way they're currently put together. in particular, it would be great if someone who groks physics a lot more than me thought about whether many-worlds gives unaligned alien superintelligences the ability to forge any blob or observation we could put together in a way that would capture our AI's blob location.
- maybe there are some ways to avoid this by tying the question world-state with the AI's action world-state? maybe implementing embedded agency helps with this? note that blob location can totally *locate the AI's action*, and use that to produce counterfactual action world-states. maybe that is useful. (possibly exfohazardous, do not publish)
- think about $\text{Sign}$ and the $\text{ExpensiveHash}$ function (see the full math post) and how to either implement it or achieve a similar effect otherwise. for example, maybe instead of relying on an expensive hash, we can formally define that $f, g$ need to be "consequentialist agents trying to locate the blob in the way we want", rather than *any program that works*.
- think about how to make counterfactual QACI intervals resistant to someone launching unaligned superintelligence within them.

🟣 ** misato** — ack, i didn't really think of that last one. yeah, that sounds bad.

🟡 ** ritsuko** — yup. in general, i could also do with people who could help with

🟢 ** shinji** — well, things don't look great, but i'm glad this plan is around! i guess it's

🟡 ** ritsuko** — i know right? that's how i feel as well. lol.

🟣 ** misato** — lmao, even.

unless otherwise specified on individual pages, all posts on this website are licensed under the CC0-1.0 license.

unless explicitly mentioned, all content on this site was created by me; not by others nor AI.