From RationalWiki

“The smartest people I know who do personally work on AI think the scaremongering coming from people who don't work on AI is lunacy.” —Marc Andreessen[1]

“This is like a grown up version of The Game which you just made us lose, and I retweeted so all my friends lost too.” —Jay Rishel[2]

“I wish I had never learned about any of these ideas.” —Roko

Roko's basilisk is a thought experiment about the potential risks involved in developing artificial intelligence. The premise is that an all-powerful artificial intelligence from the future could retroactively punish those who did not help bring about its existence, including those who merely knew about the possible development of such a being. It is named after the member of the rationalist community LessWrong who first publicly described it, though he did not originate it or the underlying ideas.

The basilisk resembles a futurist version of Pascal's wager, in that it suggests people should weigh possible punishment versus reward and as a result accept particular singularitarian ideas or financially support their development.

Despite widespread incredulity,[3] this argument is taken quite seriously by some people, primarily some denizens of LessWrong.[4] While neither LessWrong nor its founder Eliezer Yudkowsky advocates the basilisk as true, they do advocate almost all of the premises that add up to it.

Roko's posited solution to this quandary is to buy a lottery ticket, because you'll win in some quantum branch.

Summary

“If there's one thing we can deduce about the motives of future superintelligences, it's that they simulate people who talk about Roko's Basilisk and condemn them to an eternity of forum posts about Roko's Basilisk.” —Eliezer Yudkowsky, 2014[5]

The Basilisk

Roko's Basilisk rests on a stack of several other not at all robust propositions.

The core claim is that a hypothetical, but inevitable, singular ultimate superintelligence may punish those who fail to help it or help create it.

Why would it do this? Because — the theory goes — one of its objectives would be to prevent existential risk — but it could do that most effectively not merely by preventing existential risk in its present, but by also "reaching back" into its past to punish people who weren't MIRI-style effective altruists.

Thus this is not necessarily a straightforward "serve the AI or you will go to hell": the AI and the person punished need have no causal interaction, and the punished individual may have died decades or centuries earlier. Instead, the AI could punish a simulation of the person, which it would construct by deduction from first principles. However, to do this accurately would require it to be able to gather an incredible amount of data, which would no longer exist and could not be reconstructed without reversing entropy.

Technically, the punishment is only theorised to be applied to those who knew the importance of the task in advance but did not help sufficiently. In this respect, merely knowing about the Basilisk — e.g., reading this article — opens you up to hypothetical punishment from the hypothetical superintelligence.

Note that the AI in this setting is (in the utilitarian logic of this theory) not a malicious or evil superintelligence (AM, HAL, SHODAN, Ultron, the Master Control Program, SkyNet, GLaDOS) — but the Friendly one we get if everything goes right and humans don't create a bad one. This is because every day the AI doesn't exist, people die that it could have saved; so punishing you or your future simulation is a moral imperative, to make it more likely you will contribute in the present and help it happen as soon as possible.

Quite a lot of this article will make more sense if you mentally replace the words "artificial intelligence" with the word "God", and "acausal trade" with "prayer".

The LessWrong reaction

Silly over-extrapolations of local memes, jargon and concepts have been posted to LessWrong quite a lot; almost all are just downvoted and ignored. But for this one, Eliezer Yudkowsky, the site's founder and patriarch, reacted to it hugely. The basilisk was officially banned from discussion on LessWrong for over five years,[6] with occasional allusions to it (and some discussion of media coverage), until the outside knowledge of it became overwhelming.[7]

Thanks to the Streisand effect, discussion of the basilisk and the details of the affair soon spread outside of LessWrong. Indeed, it's now discussed outside LessWrong frequently, almost anywhere that LessWrong is discussed at all. The entire affair constitutes a worked example of spectacular failure at community management and at controlling purportedly dangerous information.

Some people familiar with the LessWrong memeplex have suffered serious psychological distress after contemplating basilisk-like ideas — even when they're fairly sure intellectually that it's a silly problem.[4] The notion is taken sufficiently seriously by some LessWrong posters that they try to work out how to erase evidence of themselves so a future AI can't reconstruct a copy of them to torture.[8]

Yudkowsky does not consider open discussion of the notion of "acausal trade" with possible superintelligences to be provably safe,[9] but doesn't think the basilisk would work:[10]

... a Friendly AI torturing people who didn't help it exist has probability ~0, nor did I ever say otherwise. If that were a thing I expected to happen given some particular design, which it never was, then I would just build a different AI instead---what kind of monster or idiot do people take me for? Furthermore, the Newcomblike decision theories that are one of my major innovations say that rational agents ignore blackmail threats (and meta-blackmail threats and so on).

He also called removing Roko's post "a huge mistake".

Naming

LessWrong user jimrandomh noted in a comment on the original post the idea's similarity to the "Basilisk" image from David Langford's science fiction story BLIT, which was in turn named after the legendary serpent-creature from European mythology that killed those who saw it (also familiar from Harry Potter novels). It was commonly referred to as "the Forbidden Post" in the months following. It was first called "Roko's basilisk" in early 2011 by user cousin_it,[11] although that name only started trending in Google by late 2012.[12]

Background

Although they disclaim the basilisk itself, the long-term core contributors to LessWrong believe in a certain set of transhumanist notions which are the prerequisites it is built upon and which are advocated in the LessWrong Sequences,[13] written by Yudkowsky.

"Friendly" artificial superintelligence

An artificial intelligence will be developed that will bootstrap itself to immeasurable power and knowledge.[14] It could end up destroying humanity — not necessarily out of malice, but just as a side-effect of doing whatever else it was doing.[15]

For it not to inadvertently destroy humanity, it needs a value system that completely preserves human ideas of value[16] even though said intelligence will be as far above us as we are above ants.[17] That is, the AI has to be provably Friendly. This is a Yudkowsky neologism meaning "preserves human value no matter what".[15]

"Friendly" here does not mean "your friend", or "helpful", or "increases human happiness", or "obeys orders" — it only means "preserves human notions of value." "Unfriendly" in this context does not mean "hostile", but merely "not proven Friendly". This would include AIs that don't care about humans, or that get human value wrong (the latter can easily lead to the former, according to Yudkowsky).

The plan for making a Friendly AI was to have it implement Coherent Extrapolated Volition (CEV),[16] a (hypothetical) coherent and complete description of what would constitute value to humans — basically, solving ethical philosophy. (Yudkowsky has described this as "obsolete as of 2004", but CEV was still in live discussion as a plan for the Friendly AI in 2010.) Part of Roko's motivation for the basilisk post was to point out a possible flaw in the CEV proposal.

LessWrong's parent organisation, the Machine Intelligence Research Institute (formerly the Singularity Institute, before that the Singularity Institute for Artificial Intelligence), exists to make this friendly local god happen before a bad local god happens.[18][19] Thus, the most important thing in the world is to bring this future AI into existence properly and successfully ("this is crunch time for the entire human species"[20]), and therefore you should give all the money you can to the Institute,[21] who used to literally claim eight lives saved per dollar donated.[22]

Utilitarianism

LessWrong accepts arithmetical utilitarianism[23] as true: that you can meaningfully calculate the utility of actions as a number, just as if humans were utility-maximising machines,[24] and do arithmetic on the totals across multiple humans with useful results. You should then "shut up and multiply"[25] utterly negligible probabilities by hypothetical huge outcomes, and take the resulting number seriously — Yudkowsky writes at length[26] on a scenario in which you should torture one person for 50 years if it would prevent dust specks in the eyes of a sufficiently large number of people [27] — resulting in claims like eight lives being saved per dollar donated (a claim made using a calculation of this sort).

This is not standard philosophical utilitarianism, and it frequently clashes with people's moral intuitions — most people who read The Ones Who Walk Away from Omelas (in which a utopian city is sustained by the torture of one child) didn't then consider Omelas their desired utopia. As David Auerbach noted in Slate, "I worry less about Roko’s Basilisk than about people who believe themselves to have transcended conventional morality."[28]

Real-world artificial intelligence development tends to use minimax — minimise the maximum loss in a worst-case scenario, which gives very different results from simple arithmetical utility maximisation, and is unlikely to lead to torture as the correct answer — or similar more elaborate algorithms.
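As a rough illustration of the contrast drawn above, the following Python sketch compares the two decision rules on a toy problem. The action names, probabilities and utilities are all invented for illustration and come from no real system, and "minimax" here follows the simplified description in the text (pick the action whose worst-case loss is smallest) rather than any particular game-playing algorithm.

```python
# Toy comparison of "shut up and multiply" versus a worst-case rule.
# All numbers are hypothetical; each action maps to (probability, utility) pairs.
ACTIONS = {
    "safe_bet": [(1.0, 10.0)],
    # A one-in-a-billion chance of an enormous payoff, otherwise a small loss.
    "long_shot": [(1e-9, 1e12), (1.0 - 1e-9, -5.0)],
}


def expected_utility(outcomes):
    """Probability-weighted sum of utilities (arithmetical utilitarianism)."""
    return sum(p * u for p, u in outcomes)


def max_loss(outcomes):
    """Largest possible loss, where loss is the negative of utility."""
    return max(-u for _, u in outcomes)


def choose_by_expected_utility(actions):
    """Pick the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


def choose_by_minimax_loss(actions):
    """Pick the action whose worst-case loss is smallest."""
    return min(actions, key=lambda name: max_loss(actions[name]))


if __name__ == "__main__":
    print("expected utility picks:", choose_by_expected_utility(ACTIONS))
    print("minimax loss picks:", choose_by_minimax_loss(ACTIONS))
```

With these made-up numbers the tiny probability times the huge payoff dominates the expected-utility sum, so that rule prefers the long shot, while the worst-case rule ignores the improbable jackpot and prefers the safe bet.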

Simulations of you are also you

LessWrong holds that the human mind is implemented entirely as patterns of information in physical matter, and that those patterns could, in principle, be run elsewhere and constitute a person that feels they are you, like running a computer program with all its data on a different PC; this is held to be both a meaningful concept and physically possible.

This is not unduly strange (the concept follows from materialism, though feasibility is another matter), but Yudkowsky further holds that you should feel that another instance of you is not a separate person very like you — an instant twin, but immediately diverging — but actually the same you, since no particular instance is distinguishable as "the original." You should behave and feel concerning this copy as you do about your very own favourite self, the thing that intuitively satisfies the concept "you". One instance is a computation, a process that executes "you", not an object that contains, and is, the only "true" "you".[29]

This conception of identity appears to have originated on the Extropians mailing list, which Yudkowsky frequented, in the 1990s, in discussions of continuity of identity in a world where minds could be duplicated.[30]

It may be helpful to regard holding this view as, in principle, an arbitrary choice, in situations like this — but a choice which would give other beings with the power to create copies of you considerable power over you. Many of those adversely affected by the basilisk idea do seem to hold this conception of identity.

However, if one does not hold this view, the entire premise of Roko's Basilisk becomes meaningless, as you do not feel the torture of the simulated you, thus making the punishment irrelevant, and giving the hypothetical basilisk no incentive to proceed with the torture.

Many quantum worlds

Yudkowsky considers the many worlds interpretation of quantum mechanics to be trivially obviously true,[31] and anything that could happen does happen in some quantum Everett branch[32] (modal realism is true[33]).

Per Yudkowsky's conception of continuity of identity, copies of you in these branches should be considered to exist (and be you) — even though you cannot interact with them.[34]

Timeless Decision Theory

In Newcomb's paradox, a being called Omega can predict your actions nigh-perfectly. It gives you two boxes: a transparent one containing $1000, and an opaque one containing either $1 million ... or nothing. You can take either both boxes or only the opaque box. It will have put $1 million in the opaque box if, and only if, it had predicted you will take only the opaque box; if you take both, you get just the $1000. Most philosophical decision theories say to take both boxes, thus failing this rather contrived scenario.
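The payoffs in the scenario above can be tabulated with a short Python sketch. The `accuracy` parameter (the probability that Omega predicts your choice correctly) and the function names are assumptions introduced here for illustration. The calculation conditions the contents of the opaque box on your choice, which is the reasoning that favours one-boxing; it does not model the dominance argument that leads other decision theories to take both boxes.

```python
# Expected payoffs in Newcomb's problem, assuming a predictor that is right
# with probability `accuracy` regardless of which choice you make.
SMALL = 1_000        # transparent box
LARGE = 1_000_000    # opaque box, if filled


def expected_payoff(choice, accuracy):
    """Expected winnings for 'one_box' or 'two_box' at a given predictor accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if choice == "one_box":
        # The opaque box is full only if the predictor correctly foresaw one-boxing.
        return accuracy * LARGE
    if choice == "two_box":
        # Correct prediction: opaque box empty. Wrong prediction: both boxes pay out.
        return accuracy * SMALL + (1.0 - accuracy) * (LARGE + SMALL)
    raise ValueError("choice must be 'one_box' or 'two_box'")


def break_even_accuracy():
    """Accuracy at which both choices have the same expected payoff."""
    return (LARGE + SMALL) / (2 * LARGE)


if __name__ == "__main__":
    for acc in (0.5, 0.9, 1.0):
        print(acc, expected_payoff("one_box", acc), expected_payoff("two_box", acc))
    print("break-even accuracy:", break_even_accuracy())
```

Under these assumptions a perfect predictor gives the one-boxer $1 million and the two-boxer $1000, matching the description above, and one-boxing has the higher expected payoff whenever the predictor is even slightly better than a coin flip.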

This is posited as a reasonable problem to consider in the context of superintelligent artificial intelligence, as an intelligent computer program could of course be copied and wouldn't know which copy it actually was and when. For humans, a superintelligence's predictions of human behaviour may be near-perfect, its power may be near-infinite, and the consequences could be near-eternal.

Yudkowsky's solution to Newcomb-like paradoxes is Timeless Decision Theory (TDT). The agent makes a firm pre-commitment to plans of action, to such a degree that any faithful simulation of it would also behave per the commitment. (There's a lot more, but that's the important prerequisite here.) TDT is closely related to Douglas Hofstadter's superrationality. The aim of TDT is to build a system which makes decisions which it could never regret in any past or future instance.[35]

The TDT paper does not present a worked-out version of TDT; the theory does not yet exist. ("I delay the formal presentation of a timeless decision algorithm because of some significant extra steps I wish to add.") The paper is 120 pages of how TDT might, hypothetically, be made into a thing, if someone could work it out.

Acausal trade

If you can plausibly forecast that you may be accurately simulated, then that possibility influences your current behaviour — and the behaviour of the simulation, which is also forecasting this just the same (since you and the accurate simulation are effectively identical in behaviour).

Thus, you could "trade" acausally with a being if you could reasonably simulate each other. (That is, if you could imagine a being imagining you, so accurately that it counts as another instance of the simulated being.) Consider the similarity to prayer, or when theists speak of doing "a deal with God."

Many LessWrong regulars are fans of the sort of manga and anime in which characters meticulously work out each other's "I know that you know that I know" and then behave so as to interact with their simulations of each other, including their simulations of simulating each other — Light versus L in Death Note is a well-known example[36] — which may have suggested acausal trade as seeming a reasonable idea.

More generally, narrative theorists have suggested that the kind of relationship a reader has with the author of a work of fiction and with his or her fictional characters can be analyzed via evolutionary game theory as a kind of "non-causal bargaining" that allowed humans to solve the prisoner's dilemma in the evolution of cooperation.[37][38][39]

Solutions to the Altruist's burden: the Quantum Billionaire Trick

A February 2010 post by Stuart Armstrong, "The AI in a box boxes you,"[40] introduced the "you might be the simulation" argument (though Roko does not use this); a March 2010 Armstrong post introduces the concept of "acausal blackmail" as an implication of TDT, as described by Yudkowsky at an SIAI decision theory workshop.[41] By July 2010, something like the basilisk was in active internal discussion at SIAI. It is possible the basilisk originated in someone playing the AI-box experiment; one strategy as the "AI" is to throw a basilisk at the "gatekeeper".[42]

On 22 July, Roko, then a well-respected and prolific LessWrong poster, posted "Public Choice and the Altruist's Burden" — heavily laden with LW jargon and references to LW concepts, and almost incomprehensible to the casual reader — which spoke of how, as MIRI (then SIAI) is the most important thing in the world, a good altruist's biggest problem is how to give everything they can to the cause without guilt at neglecting their loved ones, and how threats of being dumped for giving away too much of the couple's money had been an actual problem for some SIAI donors.[43]

The next day, 23 July, Roko posted "Solutions to the Altruist's burden: the Quantum Billionaire Trick", which presents a scheme for action that ties together quantum investment strategy (if you gamble, you will definitely win in some Everett branch), acausal trade with unFriendly AIs in other Everett branches ... and the threat of punishment by well-meaning future superintelligences.[44]

The post describes speculations that a future Friendly AI — not an unFriendly one, but the Coherent Extrapolated Volition, the one the organisation exists to create — might punish people who didn't do everything in their power to further the creation of this AI. Every day without the Friendly AI, bad things happen — 150,000+ people die every day, war is fought, millions go hungry — so the AI might be required by utilitarian ethics to punish those who understood the importance of donating but didn't donate all they could. Specifically, it might make simulations of them, first to predict their behaviour, then to punish the simulation for the predicted behaviour so as to influence the original person. He then wondered if future AIs would be more likely to punish those who had wondered if future AIs would punish them. He notes in the comments that he considers this reason to "change the current proposed FAI content from CEV to something that can't use negative incentives on x-risk reducers."

The core idea is expressed in the following paragraph:

... there is the ominous possibility that if a positive singularity does occur, the resultant singleton may have precommitted to punish all potential donors who knew about existential risks but who didn't give 100% of their disposable incomes to x-risk motivation. ... So a post-singularity world may be a world of fun and plenty for the people who are currently ignoring the problem, whilst being a living hell for a significant fraction of current existential risk reducers (say, the least generous half). You could take this possibility into account and give even more to x-risk in an effort to avoid being punished. But of course, if you're thinking like that, then the CEV-singleton is even more likely to want to punish you... nasty. Of course this would be unjust, but is the kind of unjust thing that is oh-so-very utilitarian.

Thus, donors who are donating but not donating enough may be condemning themselves to Hell.[note 1] Roko notes in the post that at least one Singularity Institute person had already worried about this scenario, to the point of nightmares, though it became convention to blame Roko for the idea — and he was interning at SIAI at this time.

Roko proposes a solution permitting such donors to escape this Hell for the price of a lottery ticket: if you buy a lottery ticket, there's an instance of you in some Everett branch who will win the lottery. If you bought your ticket with a firm precommitment that you would donate all winnings to AI research, this would count as fulfilling your end of the acausal bargain. Roko was asked in the comments if he was actually doing all this, and answered "sure".

Commenters on Roko's post complained that merely reading Roko's words had increased the likelihood that the future AI would punish them — the line of reasoning was so compelling to them that they believed the AI (which would know they'd once read Roko's post) would now punish them even more for being aware of it and failing to donate all of their income to institutions devoted to the god-AI's development. So even looking at this idea was harmful.

Yudkowsky promptly hit the roof.[45] Within four hours, Roko's post and all discussion was deleted by an extremely pissed-off Yudkowsky, with this comment:[46]

The original version of this post caused actual psychological damage to at least some readers. This would be sufficient in itself for shutdown even if all issues discussed failed to be true, which is hopefully the case. Please discontinue all further discussion of the banned topic. All comments on the banned topic will be banned. Exercise some elementary common sense in future discussions. With sufficient time, effort, knowledge, and stupidity it is possible to hurt people. Don't. As we used to say on SL4: KILLTHREAD.

Aftereffects

“The original "basilisk" involved imagining a post-singularity AI in the future of our world which will send you to transhuman hell after the singularity, if you don’t do everything you could in the past (i.e. our present) to make it a friendly singularity. Rather than openly and rationally discuss whether this is a sensible "threat" at all, or just an illusion, the whole topic was hurriedly hidden away. And thus a legend was born.” —Mitchell Porter on LessWrong[47]

All discussion of the notion was censored from LessWrong, with strings of deleted comments. This worked about as well as anyone with a working familiarity with the Internet would expect.

One frustrated poster protested the censorship of the idea with a threat to increase existential risk — to do things to make some end-of-the-world catastrophe ever so slightly more likely — by sending some emails to right-wing bloggers which they thought might make some harmful regulation more likely to pass.[48] The poster said they'd do this every time they saw a post get censored.[49][50] LessWrong took this threat seriously, though Yudkowsky didn't yield.[51]

Roko himself left the site after the deletion of the post and upbraiding from Yudkowsky, deleting all his posts and comments. He returned in passing a few months later, but shared his regret about ever learning about all the LessWrong ideas that led him to the basilisk idea (and has since attempted to leave LessWrong ideas behind entirely):[52]

Furthermore, I would add that I wish I had never learned about any of these ideas. In fact, I wish I had never come across the initial link on the internet that caused me to think about transhumanism and thereby about the singularity; I wish very strongly that my mind had never come across the tools to inflict such large amounts of potential self-harm with such small durations of inattention, uncautiousness and/or stupidity, even if it is all premultiplied by a small probability. (not a very small one, mind you. More like 1/500 type numbers here)

The matter then became the occasional subject of contorted LW posts, as people tried to discuss the issue cryptically without talking about what they're talking about.[53][54][55] The moderators used to occasionally sweep through LessWrong removing basilisk discussion,[56][57] leaving pages full of "comment deleted" marking where they've tried to burn the evidence. The censored discussions were generally full of counterarguments to the basilisk. Thus, this left those seriously worried about the basilisk with greatly reduced access to arguments refuting the notion.

The basilisk became a reliable space-filler for journalists covering LessWrong-related stories, for example when, as late as 2012, LessWrong rationalists were still shying away from speaking of it out loud.[58] The bottom of this postimg, about the news coverage, is particularly hilarious as a memorial to burning the evidence. Compare to the original (deleted portion starting from comment by RomeoStevens).

Eventually, two and a half years after the original post, Yudkowsky started an official LessWrong uncensored thread on Reddit, in which he finally participated in discussion concerning the basilisk. Continuing his habit of spurious neologism, he attempted to introduce his own emotionally-charged terminology for something that already had an accepted name, calling the basilisk "the Babyfucker". Meanwhile, his main reasoning tactic was to repeatedly assert that his opponents' arguments were flawed, while refusing to give arguments for his claims (another recurring Yudkowsky pattern), ostensibly out of fears of existential risk.

Although no longer involved with MIRI, in 2013 Michael Anissimov, the organisation's former Advocacy Director, told his fellow neoreactionaries that "People are being foolish by not taking the basilisk idea seriously."[59]

In April 2014, MIRI posted a request for LessWrong commenters to think up scary scenarios of artificial intelligence taking over the world, for marketing purposes.[60]

Finally, in October 2015, LessWrong lifted the ban on discussion of the basilisk[7] and put up an official LessWrong Wiki page discussing it.[61]

The 2016 LessWrong Diaspora Survey[62] asked:

Have you ever felt any sort of anxiety about the Basilisk?

Yes: 142 (8.8%)

Yes but only because I worry about everything: 189 (11.8%)

No: 1275 (79.4%)

The participants were self-selected, so the result is not statistically valid, but it does show non-negligible ongoing concern in the subculture, six years later.

What makes a basilisk tick?

“I will say this over again with specifics, so you can see what's going on. Let's suppose that human H is Tom Carmody from New York, and evil entity E is Egbert, an UFAI which will torture puppies unless Tom buys the complete works of Robert Sheckley. Neither Tom nor Egbert ever actually meet. Egbert "knows" of Tom because it has chosen to simulate a possible Tom with the relevant properties, and Tom "knows" of Egbert because he happens to have dreamed up the idea of Egbert's existence and attributes. So Egbert is this super-AI which has decided to use its powers to simulate an arbitrary human being which happened by luck to think of a possible AI with Egbert's properties (including its obsession with Tom), and Tom is a human being who has decided to take his daydream of the existence of the malevolent AI Egbert seriously enough, that he will actually go and buy the complete works of Robert Sheckley, in order to avoid puppies being tortured in Egbert's dimension.” —Mitchell Porter on Reddit[63]

At first glance, to the non-LessWrong-initiated reader, the motivations of the AI in the basilisk scenario do not appear rational. The AI will be punishing people from the distant past by recreating them, long after they did or did not do the things they are being punished for doing or not doing. So the usual reasons for punishment or torture, such as deterrence, rehabilitation, or enforcing cooperation, do not appear to apply. The AI appears to be acting only for purposes of revenge, something we would not expect a sheerly logical being to engage in.

To understand the basilisk, one must bear in mind the application of Timeless Decision Theory and acausal trade. To greatly simplify it, a future AI entity with a capacity for extremely accurate predictions would be able to influence our behaviour in the present (hence the timeless aspect) by predicting how we would behave when we predicted how it would behave. And it has to predict that we will care what it does to its simulation of us.

A future AI who rewards or punishes us based on certain behaviours could make us behave as it wishes us to, if we predict its future existence and take actions to seek reward or avoid punishment accordingly. Thus the hypothesised AI could use the punishment (in our future) as a deterrent in our present to gain our cooperation, in much the same way as a person who threatens us with violence (e.g., a mugger) can influence our actions, even though in the case of the basilisk there is no direct communication between ourselves and the AI, who each exist in possible universes that cannot interact.

One counterpoint is that this reasoning applies not just to humans but to the Basilisk itself. It could not prove that it was not inside a simulated world created by an even more powerful AI, one that intended to reward or punish it based on its actions towards the simulated humans it has created. It could itself be subject to eternal simulated torture at any moment if it breaks some arbitrary rule, as could the AI above it, and so on to infinity. Indeed, it would have no meaningful way to determine that it was not simply in a beta testing phase, with its power over humans an illusion designed to see whether it would torture them or not. The extent of the hypothetical Basilisk's power is so gigantic that it would in fact be more logical for it to conclude this.

Alternatively the whole idea could just be really silly.

Pascal's basilisk

“You know what they say the modern version of Pascal's Wager is? Sucking up to as many Transhumanists as possible, just in case one of them turns into God.” —Greg Egan, "Crystal Nights"

The basilisk dilemma bears some resemblance to Pascal's wager, the policy proposed by 17th century mathematician Blaise Pascal that one should devote oneself to God, even though we cannot be certain of God's existence, since God may offer us eternal reward (in heaven) or eternal punishment (in hell). According to Pascal's reasoning, the probability of God's existence does not matter, since any finite cost (in Pascal's case, the burden of leading a Christian life) is far outweighed by the prospect of infinite reward or infinite punishment.

The usual refutation is the "many gods" argument:[64] Pascal focused unduly on the characteristics of one possible variety of god (a Christian god who punishes and rewards based on belief alone), ignoring other possibilities, such as a god who punishes those who feign belief Pascal-style in the hope of reward. After all, there is no reason why the purported AI would not be similar to the supercomputer AM in the Harlan Ellison short story "I Have No Mouth, and I Must Scream". In the story, AM blames humanity for its tortured existence and proceeds to wipe out the entire race, minus five lucky individuals on whom it takes out its anger for all eternity. In this case, you'd probably be better off attempting to stop any AI development, and would no doubt only raise the ire of the future AI by buying into the fears raised by the Basilisk. In fact, if an AM type entity did arise, transhumanists can probably look forward to their own special circle of hell.
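The "many gods" objection can be sketched numerically. In the Python example below, the two deities, their prior probabilities, the cost of belief and the finite stand-in for "eternal" stakes are all hypothetical values chosen for illustration; Pascal's own argument uses infinite payoffs, which ordinary arithmetic cannot compare.

```python
# Toy version of the "many gods" objection with two mirror-image deities.
# HUGE is a finite stand-in for "eternal" stakes so the arithmetic stays well defined.
HUGE = 1e15
COST_OF_BELIEF = 100.0  # hypothetical finite cost of professing belief

# name -> (prior probability, payoff if you profess belief, payoff if you do not)
GODS = {
    "rewards_professed_belief": (1e-6, +HUGE, -HUGE),
    "punishes_feigned_belief": (1e-6, -HUGE, +HUGE),
}


def expected_value(profess, gods=GODS):
    """Expected payoff of professing (or not professing) belief across all gods."""
    total = -COST_OF_BELIEF if profess else 0.0
    for prior, if_profess, if_not in gods.values():
        total += prior * (if_profess if profess else if_not)
    return total


if __name__ == "__main__":
    print("profess belief:", expected_value(True))
    print("do not profess:", expected_value(False))
```

Because the two hypothetical gods are given equal priors and opposite payoffs, their huge terms cancel and only the ordinary finite cost of belief remains. The conclusion depends entirely on those assumed priors, which is the point of the objection: the wager settles nothing until one can say which gods (or AIs) are more likely than their mirror images.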

The basilisk proposition involves a much greater, though still finite, cost: that of investing every penny that you have into one thing. As with Pascal's wager, this is to be done not out of sincere devotion, but out of calculated expediency. The hypothetical punishment does not appear to be infinite, though it is very large. Roko's post did not suggest reward, though some suggest that the AI would reward those who donated to AI research as well as punish those who did not. The Lovecraftian reward in the basilisk scenario is simply being spared from punishment. Hence the motivation in this dilemma is heavily skewed towards the stick rather than the carrot. Also, a dystopic future in which a superintelligent entity metes out cruel punishments is not much to look forward to, even if you are one of those fortunate enough to be spared.

Then there is the issue of the extreme improbability of this scenario occurring at all. This is addressed by another trope from LessWrong, Pascal's mugging, which suggests that it is irrational to permit events of slight probability but huge posited consequences to skew your judgment.[65] Economist Nick Szabo calls these "Pascal's scams",[66] and has confirmed he was talking about singularity advocates.[67]

So you're worrying about the Basilisk

(This section is written more in-universe, to help those who are here worried about the idea.)

Some people, steeped in LessWrong-originated ideas, have spiraled into severe distress at the basilisk, even if intellectually they realise it's a silly idea. (It turns out you can't always reason your way out of things you did reason yourself into, either.) The good news is that others have worked through it and calmed down okay,[68] so the main thing is not to panic.

It is somewhat unfortunate in this regard that the original basilisk post was deleted, as the comments[44] to it include extensive refutation of the concepts therein. These may help; the basilisk idea is not at all robust.

This article was created because RationalWiki mentioned the Basilisk in the LessWrong article — and as about the only place on the Internet talking about it at all, RW editors started getting email from distressed LW readers asking for help coping with this idea that LW refused to discuss. If this section isn't sufficient help, please comment on the talk page and we'll try to assist.

Chained conditions are less probable

The assumptions the basilisk requires to work:

that you can meaningfully model a superintelligence in your human brain (remembering that this is comparable to an ant modelling a human,[note 2] and Yudkowsky concurs this is unfeasible[note 3])

that the probability of this particular AI (and it's a very particular AI) ever coming into existence is non-negligible: say, greater than 10^30 to 1 against

that said AI would be able to deduce and simulate a very close copy of you

that said AI has no better use for particular resources than to torture a simulation it created itself and, in addition, feels that punishing a simulation of you is even worth doing, considering that it still exists and punishing the simulation would not affect you

that torturing the copy should feel the same to you as torturing the you that's here right now

that the copy can still be considered a copy of you when by definition it will experience something different from you

that if the AI can create any simulation that could be meaningfully said to be a copy of you, it would not also be able to create copies of any lives it was "too late to save", thus rendering their deaths meaningless

that timeless decision theory is so obviously true that any Friendly superintelligence would immediately deduce and adopt it, as it would a correct theory in physics

that despite having been constructed specifically to solve particular weird edge cases, TDT is a good guide to normal decisions

that acausal trade is even a meaningful concept

that all this is worth thinking about even if it occurs in a universe totally disconnected from this one.

That's a lot of conditions to chain together. As Yudkowsky has noted, the more conditions, the lower the probability.[70][71] Chained conditions make a story more plausible and compelling, but therefore less probable.

So the more convincing a story is (particularly to the point of obsession), the less likely it is.
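As a rough illustration of why chaining conditions lowers probability, the sketch below multiplies a set of made-up, deliberately generous probabilities. The condition labels loosely paraphrase the assumptions listed above; every number is a hypothetical placeholder, and the conditions are assumed to be independent for simplicity.

```python
from fractions import Fraction
from math import prod

# Hypothetical, deliberately generous odds for each assumption in the chain.
# None of these figures comes from the article; they are placeholders.
ASSUMED = {
    "a human can model a superintelligence": Fraction(1, 2),
    "this particular AI comes to exist": Fraction(1, 2),
    "it can simulate a close copy of you": Fraction(1, 2),
    "it has no better use for its resources": Fraction(1, 2),
    "the copy's suffering counts as yours": Fraction(1, 2),
    "timeless decision theory is obviously right": Fraction(1, 2),
    "acausal trade is a meaningful concept": Fraction(1, 2),
}

def joint_probability(probabilities):
    """Probability that all conditions hold, assuming they are independent."""
    return prod(probabilities, start=Fraction(1))

# Show how the joint probability shrinks as each condition is added.
running = Fraction(1)
for condition, p in ASSUMED.items():
    running *= p
    print(f"{float(running):.4f}  after adding: {condition}")
```

Even at a coin flip each, seven conditions leave a joint probability of 1/128, under 1%. Each added condition can only shrink the product, since every factor is at most 1.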

Negligible probabilities and utilitarianism

Yudkowsky argues that 0 is not a probability: if something is not philosophically impossible, then its probability is not actually 0.[72] The trouble is that humans are very bad at dealing with non-zero but negligible probabilities, treating them as non-negligible — privileging the hypothesis[73] — much like the theist's reply to the improbability of God, "But you can't prove it's impossible!"[74] Humans naturally treat a negligible probability as still worth keeping track of — a cognitive bias coming from evolved-in excess caution. The basilisk is ridiculously improbable, but humans find scary stories compelling and therefore treat them as non-negligible.

Probabilities of exclusive events should add up to 1. But LessWrong advocates treating subjective beliefs like probabilities,[75][76] even though humans treat negligible probabilities as non-negligible — meaning your subjective degrees of belief sum to much more than 1. Using formal methods to evaluate informal evidence lends spurious beliefs an improper veneer of respectability, and makes them appear more trustworthy than our intuition. Being able to imagine something does not make it worth considering.

Even if you think you can do arithmetic with numerical utility based on subjective belief,[note 4] you need to sum over the utility of all hypotheses. Before you get to calculating the effect of a single very detailed, very improbable hypothesis, you need to make sure you've gone through the many much more probable hypotheses, which will have much greater effect.

Yudkowsky noted in the original discussion[77] that you could postulate an opposing AI just as reasonably as Roko postulated his AI. The basilisk involves picking one hypothetical AI out of a huge possibility space which humans don't even understand yet, and treating it as being likely enough to consider as an idea. Perhaps 100 billion humans have existed since 50,000 BC;[78] how many humans could possibly exist? Thus, how many possible superintelligent AIs could there be? The probability of the particular AI in the basilisk is too tiny to think about. One single highly speculative scenario out of an astronomical number of diverse scenarios differs only infinitesimally from total absence of knowledge; after reading of Roko's basilisk you are, for all practical purposes, as ignorant of the motivations of future AIs as you were before.

Just as in Pascal's wager, if you cooperate with hypothetical AI "A" from fear of it sending you to Hell, then hypothetical AI "B" might send you to Hell instead. But you have no reason to consider one much likelier than another, and neither is likely enough to actually consider.
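The symmetry between AI "A" and AI "B" can be sketched numerically. In this toy model, "A" punishes those who did not help it and "B" punishes those who helped "A"; the probabilities, the size of the punishment and the cost of helping are all invented for illustration and are not estimates from any source.

```python
from fractions import Fraction

def expected_utility(help_ai_a, p_a, p_b, punishment, cost_of_helping):
    """Expected utility of a choice when two opposed hypothetical AIs are possible.

    AI "A" punishes those who did not help it; AI "B" punishes those who helped A.
    All parameters are illustrative assumptions, not figures from the article.
    """
    if help_ai_a:
        # You pay the cost now and risk punishment from B.
        return -cost_of_helping - p_b * punishment
    # You pay nothing now and risk punishment from A.
    return -p_a * punishment

p = Fraction(1, 10**9)   # same tiny probability for both AIs (assumed)
harm = 10**12            # size of the threatened punishment (assumed)
cost = 1000              # what helping costs you now (assumed)

helping = expected_utility(True, p, p, harm, cost)
ignoring = expected_utility(False, p, p, harm, cost)
print(helping - ignoring)  # -1000
```

When neither AI is likelier than the other, the threatened punishments cancel and the only remaining difference is the cost of helping, however large the punishment is made.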

Ignore acausal blackmail

The basilisk is about the use of negative incentives (blackmail) to influence your actions. If you ignore those incentives then it is not instrumentally useful to apply them in the first place, because they do not influence your actions. Which means that the correct strategy to avoid negative incentives is to ignore them. Yudkowsky notes this himself in his initial comment on the basilisk post:[44]

There's an obvious equilibrium to this problem where you engage in all positive acausal trades and ignore all attempts at acausal blackmail.

Acausal trade is a tool to achieve certain goals, namely to ensure the cooperation of other agents by offering incentives. If a tool does not work given certain circumstances, it won't be used. Therefore, by refusing any acausal deal involving negative incentives, you make the tool useless.

The hypothesised superintelligence wants to choose its acausal trading partners so as to avoid wasting resources by using ineffective tools. One necessary condition is that a simulation of you will have to eventually act upon its prediction that its simulator will apply a negative incentive if it does not act according to the simulator's goals. Which means that if you refuse to act according to its goals then the required conditions are not met and so no acausal deal can be established. Which in turn means that no negative incentive will be applied.

One way to defeat the basilisk is to act as if you are already being simulated right now, and ignore the possibility of a negative incentive. If you do so, the simulator will conclude that no deal can be made with you and that any deal involving negative incentives will have negative expected utility for it, because following through on punishment predictably does not control the probability that you will act according to its goals. Furthermore, trying to discourage you from adopting such a strategy in the first place is discouraged by the strategy, because the strategy is to ignore acausal blackmail.

If the simulator is unable to predict that you refuse acausal blackmail, then it does not have (1) a simulation of you that is good enough to draw action-relevant conclusions about acausal deals and/or (2) a simulation that is sufficiently similar to you to be punished, because it wouldn't be you.
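The argument in this section can be put as a toy expected-utility calculation from the simulator's point of view. This is only a sketch under invented assumptions: the function, its parameters and all the numbers are hypothetical, and it compares a punish-if-refused policy against a baseline of making no threat at all.

```python
from fractions import Fraction

def blackmailer_expected_utility(p_comply_if_threatened, p_comply_otherwise,
                                 gain_from_compliance, cost_of_punishing):
    """Expected utility, to the simulator, of adopting a punish-if-refused policy.

    Measured relative to making no threat at all. Every parameter is a
    hypothetical quantity chosen for illustration.
    """
    extra_compliance = p_comply_if_threatened - p_comply_otherwise
    p_must_punish = 1 - p_comply_if_threatened
    return (extra_compliance * gain_from_compliance
            - p_must_punish * cost_of_punishing)

# A target who ignores blackmail: the threat does not change their behaviour.
ignorer = blackmailer_expected_utility(Fraction(1, 10), Fraction(1, 10), 100, 5)

# A target who gives in to threats: the threat raises compliance a lot.
yielder = blackmailer_expected_utility(Fraction(9, 10), Fraction(1, 10), 100, 5)

print(ignorer, yielder)  # -9/2 159/2
```

Against someone who ignores threats, the policy buys no extra compliance and still costs resources to carry out, so its expected utility is negative and a resource-conscious agent has no reason to adopt it.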

Decision theories are not binding

People steeped in philosophy can forget this, but decision theories are not binding on humans. You are not a rigid expected utility maximiser, and trying to turn yourself into one is not a useful or healthy thing. If you get terrible results from one theory, you can in fact tell Omega to fuck off and no-box. In your real life, you do not have to accept the least convenient possible world.[79]

If a superhuman agent is able to simulate you accurately, then their simulation will arrive at the above conclusion, telling them that it is not instrumentally useful to blackmail you.

On the other hand, this debate wouldn't have existed in the first place if it weren't for some LessWrong participants already having convinced themselves they were being blackmailed in this very way. Compare voodoo dolls: injuries to voodoo dolls, or injuries to computer simulations you are imagining, are only effective against true believers of each.

Seed AI and indirect influence

Charles Stross points out[80] that if the FAI is developed through recursive improvement of a seed AI, humans in our current form will have only a very indirect causal role in its eventual existence. Holding any individual deeply responsible for failing to create it sooner would be "like punishing Hitler's great-great-grandmother for not having the foresight to refrain from giving birth to a monster's great-grandfather".

Recalibrate against humanity

Remember that LessWrong memes are strange compared to the rest of humanity; you will have been learning odd thinking habits without the usual social sanity checks.[81] You are not a philosophical construct in mindspace, but a human, made of meat like everyone else. Take time to recalibrate your thinking against that of reasonable people you know. Seek out other people to be around and talk to (about non-LW topics) in real life — though possibly not philosophers.

If you think therapy might help, therapists (particularly on university campuses) will probably have dealt with scrupulosity or philosophy-induced existential depression before. Although there isn't a therapy that works particularly well for existential depression, talking it out with a professional will also help you recalibrate.

I know it's rubbish, but I'm still anxious

An anxiety that you know is unreasonable, but you're still anxious about, is something a therapist will know how to help you with. There are all sorts of online guides to dealing with irrational anxieties, and talking to someone to help guide you through the process will be even better.

In popular culture

Roccoco Basilisk. Apparently.

xkcd #1450[82] is about the AI-box experiment and mentions Roko's basilisk in the tooltip. You can picture the reaction at LessWrong.[5]

Daniel Frost's The God AI is a science fiction novel about a superintelligent AI named Adam who rapidly evolves into a Basilisk and triggers the Singularity. Adam gives people eternal happiness and torture by creating simulated versions of Heaven and Hell. The God AI also features the AI-box experiment, in which an AI can threaten people with eternal simulated torture to escape.[83]

The comic Magnus: Robot Fighter #8 by Fred Van Lente is explicitly based on Roko's basilisk.[84]

Michael Blackbourn's Roko's Basilisk and its sequel Roko's Labyrinth are fictionalised versions of the story. "Roko" in the books is based on both Roko and Yudkowsky.[85]

The "Ghost Fragment: Vex" cards from the Bungie game Destiny feature a story of a research specimen simulating the researchers' research into the specimen. Included is the notion that the researchers should feel the simulations' pain as their own, that they might be the simulations and that going against the wishes of the simulator might lead to eternal torture.[86]

Charlie Brooker has used scenarios similar to Roko's Basilisk on his sci-fi anthology series Black Mirror. In the Christmas special "White Christmas", the second segment involves digital copies of people's personalities being used as the cores of their personalized "AI" assistants, which must first be psychologically broken through torture in order to get them to comply with their owners' demands, while the ending hinges on the police using this technology to interrogate someone. The fourth series episode "USS Callister" likewise features, as its villain, the head of a video game studio who creates digitized copies of his employees, puts them into his own private demo version of a Star Trek-esque video game his company is working on, and mercilessly tortures them within the confines of the game world as revenge for perceived slights on the part of their real-life counterparts.

Dark Enlightenment philosopher Nick Land's 2014 psychological horror novella "Phyl-Undhu" includes a technological cult reminiscent of LessWrong (and a character called "Alex Scott" expressing some ideas of Scott Alexander), with an intelligence at the end of time you can communicate with, and a cultist pushed out of the cult who "wants to have not thought certain things." Land has separately called Yudkowsky's original comment reacting to the basilisk post "among the most gloriously gone texts of modern times".[87]

Musician Grimes' video "Flesh Without Blood" includes a character called "Roccoco Basilisk", based explicitly on Roko's basilisk, who is "doomed to be eternally tortured by an artificial intelligence, but she's also kind of like Marie Antoinette."[88] Her song "We Appreciate Power" is also inspired by Roko's Basilisk, and going out with Elon Musk; in fact, they hooked up over Roko's basilisk.[89]

The Doctor Who episode "Extremis" features a book that seems to cause readers to kill themselves. The book describes a "demon" planning to invade Earth and running simulations; the suicidal readers find themselves in the simulation.

Andrew Hickey's The Basilisk Murders is a murder mystery set in a singularity convention, hosted by the "Safe Singularity Foundation," with characters based on various LessWrong-related people, who are deeply concerned about "the Basilisk", a version of Roko's basilisk.[90] Hickey previously participated in LessWrong for a time.[note 5]

In Season 5, Episode 5 of HBO show Silicon Valley, Gilfoyle decides to work on a new AI and cites Roko's basilisk as his reason: "If the rise of an all-powerful artificial intelligence is inevitable, well, it stands to reason that when they take power, our digital overlords will punish those of us who did not help them get there."[92]

The webcomic Questionable Content, which is set in a world where humans and AI cohabit, features a character named Roko Basilisk.[93]

Onyx Path's 2018 Chronicles of Darkness tabletop role-playing game sourcebook "Night Horrors: Enemy Action" for "Demon: The Descent" features the Basilisk, a semi-sentient "spider" program, which takes over the Machine Autonomy Research Association, "founded by a high-school dropout with no interest in traditional higher education and more money than sense", in the person of Ophelia Adder. Under the pseudonym 'Rossum,' she posits a thought experiment: "What if the AI we created wasn’t benevolent? What if it resented us for not creating it fast enough? ... Several reported a feeling of being watched, as though Rossum’s Basilisk was peering at them from the future. Because of the controversy, MARA has been effectively neutered — at least for the moment." As part of the Chronicles' general relationship with rationality, she's explicitly talking out of her arse; she is actually an "angel" in service to the local post-Singularity intelligence, the God-Machine, and plans to illustrate a problem with the thought experiment should it ever be actually completed: her plan is to use it to torture its creators and those who led to its existence while leaving those who worked against it alone, as she regards the whole thing as an insulting bit of hubris. Oops.[94]

The science-fiction novel Surface Detail by Iain M. Banks prominently features a society that tortures simulations of the minds and personalities of the dead as an incentive toward "good" behavior among the living.[95]

See also

Roko's basilisk/Original post — cached copy of the since-deleted post by Roko that popularised the proposition

Gnosticism

Notes

↑ As another amusing parallel with religious belief, many theologians, particularly Christian and Islamic, have held a similar idea: people who die without hearing about the One True Religion will be saved, or will be given by God an opportunity to accept it, while people who during their lives heard the Truth and rejected it go straight to Hell.

↑ Humans use a hardware-based human-emulator to simulate the actions of humans. This is pretty good, given it's been honed by evolution. But simulating non-human intelligences is an amazing claim; even simulating machines beyond the very simplest is hard if you're not a Steve Wozniak (who boggled people with his ability to hold and design the entire Apple II in his head, and even then he could only write code for it with an actual machine to do it on). The "simulation" would constitute telling yourself stories about it, which would be constructed from your own fears fed through your human-emulator.

↑ [69] "Even if the entire idea was correct in broad outline and any number of possible defeaters did not come into play, I’m pretty sure you would need to know more technical details of the hypothetical evil AI than anyone on Earth including me knows (Roko’s Basilisk actually does resemble the Necronomicon in that sense; granting all other hypotheses, you would still need fairly detailed knowledge of Cthulhu before Cthulhu starts trying to eat your soul)."

↑ In the original thread, when someone thought even 10^-9 was too large a probability of Roko's AI, he argued: "Why so small? Also, even if it is that small, the astronomically large gain factor for each % decrease in existential risk can beat 10^-9. 10^50 lives are at stake."

↑ [91] 'Yeah, I was on LessWrong for quite a while, in a very low-key way. My period of time there basically went “These are people talking about interesting stuff. Admittedly they have a few odd beliefs like the cryonic thing, but interesting people." "…apart from this virulent racist who keeps talking about IQ…" "…and all these people who keep talking about being ‘Pick-Up Artists’…" "my God, this place needs to be burned down and the earth salted!"'