In response to Wei Dai's claim that a multi-post 2009 Less Wrong discussion on gender issues and offensive speech went well, MIRI researcher Evan Hubinger writes—

Do you think having that debate online was something that needed to happen for AI safety/x-risk? Do you think it benefited AI safety at all? I'm genuinely curious. My bet would be the opposite—that it caused AI safety to be more associated with political drama that helped further taint it.

Okay, but the reason you think AI safety/x-risk is important is that twenty years ago, people like Eliezer Yudkowsky and Nick Bostrom were trying to do systematically correct reasoning about the future, noticed that the alignment problem looked really important, and followed that line of reasoning where it took them, even though it probably looked "tainted" to the serious academics of the time. (The robot apocalypse is nigh? Pfft, sounds like science fiction.)

The cognitive algorithm of "Assume my current agenda is the most important thing, and then execute whatever political strategies are required to protect its social status, funding, power, un-taintedness, &c." wouldn't have led us to notice the alignment problem, and I would be pretty surprised if it were sufficient to solve it (although that would be very convenient).

An analogy: it's actually easier to build a calculator that does correct arithmetic than it is to build a "triskaidekaphobic calculator" that does "correct arithmetic, except that it never displays the result 13", because the simplest implementation of the latter is just a calculator plus an extra conditional that puts something else on the screen when the real answer would have been 13.
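
To make the "calculator plus an extra conditional" structure concrete, here is a minimal sketch in Python. The function names, the three supported operators, and the `"ERR"` placeholder are all made up for illustration; the analogy only says that "something else" goes on the screen.

```python
import operator

# Hypothetical operator table for a toy calculator (illustrative only).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def calculate(a, op, b):
    """The ordinary calculator: just does correct arithmetic."""
    return OPS[op](a, b)


def triskaidekaphobic_calculate(a, op, b, substitute="ERR"):
    """The ordinary calculator plus one extra conditional.

    It has to compute the real answer first; only then can it know
    whether that answer is the one it must not display.
    """
    result = calculate(a, op, b)
    if result == 13:
        # Assumption: what gets shown instead is arbitrary.
        return substitute
    return result
```

In this sketch the censoring version calls `calculate` as a subroutine, so whoever builds it needs a working calculator first: `triskaidekaphobic_calculate(6, "+", 7)` can only know to hide its answer because the correct sum was computed underneath.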

If you don't actually understand how arithmetic works, but you feel intense social pressure to produce a machine that never displays the number 13, I don't think you actually succeed at building a triskaidekaphobic calculator: you're trying to solve a problem under constraints that make it impossible to solve a strictly easier problem.

Similarly, I conjecture that it's actually easier to build a rationality/alignment research community that does systematically correct reasoning than it is to build a Catholic rationality/alignment research community that does "systematically correct reasoning, except never saying anything the Pope disagrees with." The latter is a strictly harder problem: you have to somehow both get the right answer and throw out all of the steps of your reasoning that the Pope doesn't want you to say.