
posted on 2023-11-22 — also cross-posted on lesswrong, see there for comments

So you want to save the world? an account in paladinhood

Aroden, who gave me the strength to be a holy warrior, give me also the wisdom to choose my fights wisely … and help me grow strong enough to save everyone, literally everyone, and do not let my strength come coupled with contempt for weakness, or my wisdom with contempt for foolishness … and look after everyone … in this strange country and in all the worlds You travelled to and any worlds You didn't.

And make Hell cease.

– Iomedae, paladin of Aroden; from lintamande's glowfic "in His strength, I will dare and dare and dare until I die"

Introduction

A couple years ago, I was struggling to get myself to think about alignment instead of making my video game and reading doujins. Today, I find that my main remaining bottleneck to my dignity-point output is my physical stamina, and that I might make an actual significant difference to p(doom).

This post is about how I got there. It is written in the imperative mood, because I hope that it serves as advice; but be aware that the space of human minds is vastly diverse, even just within lesswrong rationalists, and what worked for me might not work for you. (That said, it doesn't look like there are very many people striving for paladinhood-or-similar out there.)

What's an Iomedae?

(If you don't know what a glowfic is, maybe check out this review of Yudkowsky and lintamande's glowfic, planecrash.)

In glowfics involving linta!golarion (lintamande's interpretation of golarion, the setting of the tabletop RPG pathfinder), Iomedae is the lawful-good goddess of "defeating evil" — or, as lantalótë puts it:

Iomedae's a Good goddess but she's not the goddess of any of those. She's not a goddess of nice things. She's the goddess of prioritization, of looking at the world with all its horrors and saying 'There will be time enough for love and beauty and joy and family later. But first we must make the world safe for them.'

Before ascending to godhood, Iomedae was a paladin of Aroden, the god of civilization. Various lintamande glowfics feature her in various settings, but "in His strength, I will dare and dare and dare until I die" in particular features her not long after she became a paladin: she gets isekai'd into america, is taken in by child protective services, and tries to understand this strange world she's appeared in, and what it means for her to do good there.

(Throughout this post, when I paste a quote without mentioning its source, its source will be that particular glowfic.)

What do I mean by "paladin"?

What linta!Iomedae's paladinhood represents to me is being willfully and entirely devoted to doing what matters; to doing whatever it takes to save the world. Her paladinhood (that is, the fact of her being a paladin) is not a burden that she bears; it is what she wants to do, deep in her heart.

I would've cheerfully pressed a button that instantly transformed my mindset into that of Iomedae; I reflectively endorsed becoming someone who does whatever it takes to save the world, even at very large costs to myself. But human brains are strange apparatuses! we have no such button; if we want to change something in ourselves, we have to do a lot of hard work.

I find myself mostly on the other side of that hard work, now, satisfied with what I've become. It is immensely wasteful that I didn't go through this seven years earlier, when I first became familiar with the sequences and other rationalist work. But I don't dwell on it, because dwelling on it does not, in fact, help save the world.

(Note that the notion of paladinhood is not one I was originally aware of when I decided to try hard to save the world. I learned about it after the fact, and thought it fit me well as an aesthetic. My brain likes aesthetics and narratives a lot; having an archetype to identify with helps me reify the thing I'm going for, to myself and to others.)

Human minds and the sequences

Human minds are fucking weird.

For example: I expect that if I had framed this post as "you should want to save the world, and here's how" rather than "if you want to save the world, then here's how", a bunch of people who are currently on board with this post would instead have felt attacked and reacted by finding excuses to reject this post's contents.

It takes a lot of epistemic discipline and epistemic hygiene to examine a proposition without immediately reacting in some way. I am not exempt from this, even now!

We are not good at rationality. That is why the sequences exist. If we were fully rational, we wouldn't need all this hard work to actually believe the true things and do the good actions. We have to train our system 2 to notice when we're making irrational mistakes, until system 1 eventually learns to be better at epistemics and agency.

If you haven't already, I think that you should probably read the sequences. Sure, they're far from perfect, and a lot of what they cover is considered common sense. But still, in general, I expect that they're of a lot of value to people who want to save the world. To quote the original dath ilan post:

Anyone in dath ilan would tell you that. They’ve read the real stuff and been through the real training, not the horrible mangled mess I could manage to remember and write up into the Sequences.

The sequences are far worse than dath ilan's education material. But it's also the only dath ilan artifact we've got.

The core of paladinhood

The core of paladinhood is to, in every degree of freedom you have, pick the option which maximizes utility. Pretty straightforward! everything else is downstream of that.

Yes, taking breaks and taking care of one's health is downstream of that, because a healthier paladin saves more worlds. Yes, having a general principle of being kind to others is downstream of that, because a paladin who is known to be kind and helpful will tend to have more resources to save the world with.

The human mind tends to try to generate excuses to reject arguments-from-the-outside urging one to take different actions or think differently. The fact that some people would've said things like

actually you shouldn't just maximize utility, you should also take breaks so you're healthy, so this post is wrong and I get to ignore it!

is an example of irrationally avoiding taking the correct actions. Again, I used to fall for this myself a lot, and I still do! it's not clear what parts of the typical human mind are responsible for trying to generate these excuses all the time.

For me, I think it's a part of my brain that finds it somehow dishonorable to have others figure out what I should do rather than figuring it out myself.

But it doesn't particularly matter what part of the brain is responsible for these excuses. What matters is noticing that one is generating them and learning the habit of not doing that.

I expect that looking for excuses not to have to change is in fact a very profound cause of people choosing to believe that AI definitely cannot kill everyone, or that it's good if AI kills everyone. These beliefs are not the fundamental thing going on; they're downstream of their minds looking for an excuse to not have been wrong, to not have to change their actions or beliefs. (As a note: it feels like people's reluctance to change their actions or beliefs, in areas where they don't usually do that, might fundamentally be a form of executive dysfunction? executive function is really important, it seems.)

Maximizing utility (with the right decision theory) actually fully captures the thing that matters, if you're trying to save the world. Not that you have to do actual math to compare utilities; most of the time, it's clear to me which choice is the best. Ultimately, maximizing utility is a matter of ordering, not of quantities: you don't need to figure out by how much margin a choice is the best one, you just need to know that it's the best one.
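To make the "ordering, not quantities" point concrete, here is a toy Python sketch. The option names and utility numbers are made up for illustration; the point it shows is that the option picked by `max` stays the same under any strictly increasing rescaling of the utilities, so only their order matters for the choice.

```python
import math

# Hypothetical options with made-up utilities; only their order matters.
utilities = {"option_a": 3.0, "option_b": 10.0, "option_c": 1.5}


def best_choice(utils, transform=lambda u: u):
    """Return the option whose (transformed) utility is highest."""
    return max(utils, key=lambda option: transform(utils[option]))


# Strictly increasing transforms change the numbers but not the ordering,
# so they never change which option comes out on top.
transforms = [
    lambda u: u,
    lambda u: 3 * u + 7,
    math.exp,
    math.log,  # fine here because every utility is positive
]

choices = [best_choice(utilities, t) for t in transforms]
print(choices)  # ['option_b', 'option_b', 'option_b', 'option_b']
```

Knowing that `option_b` beats the others is enough to act; how large the margin is never enters the decision.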

Oh, and don't completely discard all rule-utilitarian/deontological guidelines, obviously. That's part of rationality, but it bears repeating. My claim is not that you should pick what feels like the "naive" optimal-seeming action instead of what the reasonable rule-utilitarian heuristics recommend; my claim is that you should steer the actions which you're currently leaving up to status-quo/what's-easiest/what's-the-most-comfortable/what-other-people-say/whatever.

You should have intent in all of your actions, and check that you're not missing out on some important actions you could take. That intent can be "ah yes the optimal allocation of my cognitive resources is to just apply the rule-utilitarian heuristics here", but it should be some intent.

Wait, what's new here?

This post about "paladinhood" could be interpreted as just a repackaging of regular lesswrong-rationalist consequentialism.

It kind of is! but I feel like we could use a new package right now, with some useful recommendations in there.

No, you don't have to care about "paladinhood" in particular to be someone who saves the world. But if identifying with a narrative does help your mind fit into a role as much as it does mine, then I expect that something like "paladinhood" has promise.

Very plausibly, paladins like-in-linta!golarion are dissimilar to me. I did take things pretty easy, not prioritizing at all, for the first 28-ish years of my life — even many earthlings have a much longer history of caring and trying.

I'm not deciding to adopt this narrative because I think I'm particularly more deserving of it than others; I'm adopting it because it helps my mind do the thing.

I am trying to make the choices that save the world the most, and it turns out one of them is to vaguely roleplay as being inspired by a glowfic character.

Privilege

Most paladins are rich, really, because you need armor and a sword, and only rich people have those, and you need to let a strong healthy child go off to spend their life in Aroden's service, and desperate families can't do without a strong healthy child.

Some people are in a much better position to save the world than others. When I started trying to save the world, I had (in no particular order):

Don't get me wrong: I had some handicaps as well, and some of those privileges are ones that I obtained by making the right choices earlier in my life.

Because my goal is to save the world, I'm aiming this post at the kind of people who are more likely to be able to help save the world.

If you find yourself sufficiently unprivileged that saving the world seems exceedingly difficult for you, I'm not entirely sure what advice to give, sorry. Some people are gonna be having a bad enough time that it's actually better utility for them to take care of themselves than to try and solve alignment. You're a moral patient too! being a paladin doesn't make you less of a moral patient than others, it merely lets you realize that you're as much a moral patient as others.

Do note that many people tend to overestimate how high the bar for contributing to technical alignment is; this could easily be the case for you.

The empty string, inside-view vs outside-view

Yudkowsky famously mentions that his critical beliefs about AI were derived from the (nearly) empty string — that is to say, he didn't need convincing-from-the-outside, he just thought about it a bunch and came to the conclusions he did. I believe I'm similar to him in that respect.

Having an inside view (beliefs that make sense to you, rather than beliefs you hold merely because others hold them) is extremely important.

Let's say you're concerned about animal suffering. You should realize that, by far, the thing that will most determine how much animal suffering the future contains is what kind of AI is the one that inevitably takes over the world, and then you should decide to work on something which impacts what kind of AI is the AI that inevitably takes over the world.

This work can be direct (building aligned AI which saves the world) or indirect (trying to make it so that the people trying to solve alignment get enough time and resources to succeed before the people functionally-trying-to-kill-everyone-by-building-the-kind-of-AI-that-kills-everyone succeed). But for both of these, you should try really hard to have a profound understanding of how long you expect it to take until someone builds the AI that kills everyone, and what is required to build aligned AI which saves the world.

If you do not have an understanding of what kind of AI will kill everyone, what kind of AI will save the world, and what kind of AI will do neither (and thus is only relevant if it impacts the probability of the first two), then you have no way to know which is which.

People like to rely on outside view (adopting the beliefs of others based on a mixture of perceived consensus, perceived coherence, perceived status, and outright vibe) because it saves them the hard work of thinking about the problem, and because they feel imposter-syndromey about coming up with their own beliefs. This leads to an immense amount of double-counting, in a way I find most easily illustrated by a comic.

You can take into account the opinions of others, but your criterion should be how much epistemic rigor and epistemic hygiene they seem to exercise; and you need epistemic rigor&hygiene yourself to determine who exercises good epistemic rigor&hygiene. You should also be careful about double-counting claims when one of your sources says something because they heard it from another one of your sources.
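As a toy illustration of the double-counting point, here is a Python sketch that traces each person who repeats a claim back through who-heard-it-from-whom, and counts only the independent origins. The names and the data model (each person has at most one upstream source) are hypothetical simplifications for illustration, not a real epistemics tool.

```python
from typing import Dict, Optional, Set

# Hypothetical data: who said the claim, and whom they heard it from
# (None means they reasoned it out themselves).
heard_from: Dict[str, Optional[str]] = {
    "alice": None,     # originated the claim
    "bob": "alice",    # repeating alice
    "carol": "bob",    # repeating bob, so ultimately alice
    "dave": None,      # reasoned it out independently
}


def origin(speaker: str, sources: Dict[str, Optional[str]]) -> str:
    """Follow the chain of 'heard it from' links back to where the claim started."""
    seen = set()  # guards against cycles like a -> b -> a
    while sources.get(speaker) is not None and speaker not in seen:
        seen.add(speaker)
        speaker = sources[speaker]
    return speaker


def independent_origins(sources: Dict[str, Optional[str]]) -> Set[str]:
    """Collapse every speaker onto their origin; repeaters add nothing new."""
    return {origin(s, sources) for s in sources}


print(len(heard_from))                          # 4 people said it
print(sorted(independent_origins(heard_from)))  # ['alice', 'dave']
```

Four people saying the claim amounts to two independent sources here; how much weight each of those two deserves still depends on the quality of their epistemics, not on how many people repeated them.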

Having epistemic hygiene entails, among other things: not being easily prone to memes, social-pressure-beliefs and other selection effects causing filtered-evidence issues. For more comprehensive training in epistemic rigor, consult the sequences.

Really, the safest is to build up a solid inside view model yourself and mostly rely on that.

Also, human minds are very susceptible to social pressure, including for beliefs. If, like me, you have a propensity to assume-in-system-1 that people know what they're talking about when they use a confident tone, then you should be very careful to only hang out with people who actually use confidence-tones to express their actual-confidence-while-aware-of-their-own-confusions.

The main thing that determines whether I trust what others say is whether they have good epistemics, rather than whether they're being honest. There are immense amounts of erroneousness in the field of AI safety, and it's almost all the result of irrationality rather than adversariality.

One safe way to develop a solid inside view and maintain good epistemic hygiene is to simply not talk/hear/read about AI at all, anywhere, ever. While not on purpose, that's kind of the strategy I implemented myself: I thought about the topic on my own without really following research or lesswrong, and that left me free enough from social-pressure-belief that when I finally started reading lesswrong and talking to alignment researchers, I had developed my own comprehensive view of AI alignment and AI doom. I then checked my beliefs against others', updating when an argument made sense to me, and, very importantly, maintaining my belief when I observed people disagree because they failed to consider the things I'd been considering myself.

If you see one person say something that doesn't make sense to you, you should update about as much as if you see one hundred people say the very same thing. Weigh arguments by coherence rather than repeatedness, and weigh people by the quality of their epistemics rather than by their status.

Also, when you hear a claim, notice the implicit claim that it matters. Think about whether that implicit claim makes sense and whether it comes from an epistemically hygienic and rigorous place. And if it doesn't, then you need to not spend effort thinking about this irrelevant topic, regardless of how many people are talking about it.

People who are agentic — people who are trying, such as trying to save the world — will consider and put intent in what they choose to say. Their message is not just what is being said, it's that this is what they chose to say and that this is how they chose to say it. Steer the world, including by steering what you think and talk about, and pay less attention to people who do not particularly ensure that they're steering what they're talking about.

See also this quote from MIRI's alignment research field guide:

Why did we choose to write this document? What were we expecting from it, and what caused us to select this particular format and content, out of all of the possibilities?

This is why agency/executive-function is so important. You can be a genius at some topics, but if you don't aim yourself at the right things, you'll essentially just be digging ditches. To save the world you must steer the world, including what you think and do and say.

Be a straightforward, serious, reasonable rationalist

There are as many excuses to do a strange variant of lesswrong rationality as there are rationalist-adjacent offshoots (link has planecrash spoilers).

Furthermore, irony is one of people's favorite excuses to not have to change their mind or actions: if you find a way to not take what matters seriously, and if that way is considered cool by the ambient memeplex, then you have found a very comfortable way to not even wonder if you should change your mind or actions!

But no, regular lesswrong rationality is actually The Correct Thing, and if there is a vibe-archetype-aesthetic which does promote sincerity and at least allow epistemic rigor at all, it's being a straightforward, serious, reasonable rationalist. Avoid memes, esoteric stuff, rat cults, irony, etc.

The human mind loves latching onto things that are aesthetic or memetic rather than things that are true. The way to remain epistemically safe is to practice good memetic hygiene and not expose your brain to things which you know your brain will be tempted to believe due to variables unrelated to actual truth.

Do not attempt to confront the memes and evaluate their truth value regardless. That's dumb. That's like exposing yourself to as many diseases as possible in order to become strong — it's not how it works.

You do not have to leave {the part of your mind that cares for aesthetics} starving, however! being a straightforward, serious, reasonable rationalist is its own aesthetic. In fact, that's one helpful role rationalist fiction plays in the rationalist community: it makes {being a straightforward, serious, reasonable rationalist} something cool, a vibe one can adopt.

If you act like an anime character, you win in anime and you lose in real life.
If you act like a ratfic character, you win in ratfic and you win in real life.

Memetics being what they are, your mind will of course be very tempted to engage in all of their bullshit. It probably takes a persistent effort to remain epistemically hygienic. But it is, in general, the action which lets one reason about how to save the world, which is required to save the world.

Try and grind trying

One quality of the human mind is its plasticity. I've noticed that when I first tried to be more paladin-ey, my mind was resistant: it wanted to stick to its virtue ethics, even in situations where the consequentially optimal choice was clearly different. It took a lot of effort to overcome my attachment to my virtue ethics and the various other ways I had learned to spend my life.

But what I realized afterwards is that, while resisting, it also slowly gave in — such that some months later, I found it much easier to take the consequentialist actions.

I've found this to also apply to focus, rigor, and stamina.

Similar to an RPG, if you challenge your limit, you will actually improve.

This can also be motivating when facing a choice to do what's comfortable or to do what's right — if you do what's right, not only will more utility be produced, but also you'll have a little bit of an easier time picking the right choice at each later occasion.

It's okay to take breaks if you've pushed yourself too much; you don't want to burn out. It's a careful balance. I've found that if I push myself into a slight burnout, I can recover with a few days~weeks of rest, so I actually have a safety net in case I push myself too hard.

Maybe it's just me, but I've noticed some amount of stigma in EA/rat circles about trying harder. Some people are pretty opposed to the sentiment that we should try harder. Needless to say, I disagree; trying harder improves how far you can comfortably push yourself from then on.

If you grind XP, you level up and your Max HP increases. And if your HP gets low enough, you can always come back and rest.

Resting itself is a skill. These days, I feel like I have a somewhat good sense of when I should stop working; and on those days, I actually manage to not work at all, the entire day.

If only you knew…

Notice how confused you are, and infer how confused others probably are.

Notice how much effort it's taking you to focus on what's important without getting distracted or nerdsniped, and infer how likely others are to be focusing on the right thing, and how likely what they say is to be a distraction/nerdsnipe.

Notice how much effort it's taking you to withhold bits of exfohazardous information, and infer how likely it is for someone else to leak information that you're sharing with them.

Notice how much work it's taking you to epistemically rigorously, epistemically hygienically, security-mindset-ly ensure your ideas for alignment hold up, and deduce how likely it is for the-kind-of-people-working-at-AGI-labs to be implementing alignment correctly.

Notice how hard it is to be the kind of person who tries to save the world without giving in to irony or memes or cynicism or other excuses to ignore the hard parts, and infer how bad things really are.

And when you do, don't panic.

Part of steering the world is steering yourself, and part of steering yourself is steering how you feel. Yes, things are pretty bad, but wallowing will not actually make you better at saving the world. Yes, it feels unjust and frustrating that the burden of saving the world from the foolishness of some idiots falls on the shoulders of so few, but wallowing in these feelings of injustice and frustration won't help.

Yes, it is unpleasant to have to discard those feelings and do what's consequentially right instead. Very unpleasant! it felt to me like a core part of my personality, which I valued a lot, was being attacked.

But then I pushed myself a bunch, and eventually eroded that part of me away, and now I'm a consequentialist who tries to save the world. I'll go back to being my fun silly emotional self in utopia, after the world is saved, when I no longer require instrumental rationality and I can take it easy again.

But for now, if you are to be a paladin, you have to face reality. In order to make the correct decisions, you have to be able to face how bad things really are.

If you are a paladin, however, you also get some good news: there's one more paladin out there trying to save the world!

Your moral patienthood

Some people, faced with paladinhood, decide that others count more than themselves and that they're foregoing personhood or moral patienthood.

I think that is wrong.

There is clearly a fundamental mistake being made when one thinks they are "saving the world for literally everyone, except people who signed up for this" (sometimes adding "oh and also only I get to informedly-consent to sign up for this, for some reason").

In my opinion, paladins should consider themselves just as much moral patients as others.

"how can eroding part of yourself away be okay for you to do, if you consider yourself a person/moral patient?"

Well, there are really three reasons. The first is that these are extraordinary circumstances, where so much hangs in the balance that actually, it's okay for people to do that if they know what they're doing and if it actually helps. The second is that I strongly value freedom, including the freedom to undergo this kind of path. I don't know that I value arbitrary freedom; I don't think I should be able to sign up for irreversible maximal suffering forever. Thankfully, saving the world does not come at anywhere near that price! in fact, I'm plausibly still having a much better time than the majority of humans on earth. Some people have their own considerable life issues to deal with; what I've got is throwing myself really hard at saving the world.

The third is that, again, once the world is saved, I can probably restore myself. Indeed, taking into account the longer term, saving the world isn't just what's best for everyone-except-me, it's also what's best for me. I would still save the world if it wasn't, if it meant my death; but as it happens, no, saving the world is actually the thing that gets me not just the most utility, but the best life personally.


And if you do become a paladin, or if you already recognize yourself in this post, do please send me a message. I'd like to be friends with more paladins.

I'll end this post by linking a pair of replies near the very end of planecrash (spoilers!).


CC_-1 License: unless otherwise specified on individual pages, all posts on this website are licensed under the CC_-1 license.
Unless explicitly mentioned, all content on this site was created by me; not by others nor by AI.