In a recent e-mail thread, Andrew Critch sent me the following "subtle problem with sending junior AI-x-risk-concerned researchers into AI capabilities research". Here's the explanation he wrote of his view, shared with his permission:

I'm fairly concerned with the practice of telling people who "really care about AI safety" to go into AI capabilities research, unless they are very junior researchers who are using general AI research as a place to improve their skills until they're able to contribute to AI safety later. (See Leveraging Academia.)

The reason is not a fear that they will contribute to AI capabilities advancement in some manner that will be marginally detrimental to the future. It's also not a fear that they'll fail to change the company's culture in the ways they'd hope, and end up feeling discouraged. What I'm afraid of is that they'll feel pressure to start pretending to themselves, or to others, that their work is "relevant to safety". Then what we end up with are companies and departments filled with people who are "concerned about safety", creating a false sense of security that something relevant is being done, when all we have are a bunch of simmering concerns and concomitant rationalizations.

This fear of mine requires some context from my background as a researcher. I see this problem with environmentalists who "really care about climate change", who tell themselves they're "working on it" by studying the roots of a fairly arbitrary species of tree in a fairly arbitrary ecosystem that won't generalize to anything likely to help with climate change.

My assessment that their work won't generalize is mostly not from my own outside view; it comes from asking the researcher about how their work is likely to have an impact, and getting a response that either says nothing more than "I'm not sure, but it seems relevant somehow", or an argument with a lot of caveats like "X might help with Y, which might help with Z, which might help with climate change, but we really can't be sure, and it's not my job to defend the relevance of my work. It's intrinsically interesting to me, and you never know if something could turn out to be useful that seemed useless at first."

At the same time, I know other climate scientists who seem to have actually done an explicit or implicit Fermi estimate for the probability that they will personally soon discover a species of bacteria that could safely scrub the Earth's atmosphere of excess carbon. That's much better.

I've seen the same sort of problem with political scientists who are "really concerned about nuclear war" who tell themselves they're "working on it" by trying to produce a minor generalization of an edge case of a voting theorem that, when asked, they don't think will be used by anyone ever.

At the same time, I know other political scientists who seem to be trying really hard to work backward from a certain geopolitical outcome, and earnestly working out the details of what the world would need to make that outcome happen. That's much better.

Having said this, I do think it's fine and good if society wants to sponsor a person to study obscure roots of obscure trees that probably won't help with climate change, or edge cases of theorems that no one will ever use or even take inspiration from, but I would like everyone to be on the same page that in such cases what we're sponsoring is intellectual freedom and development, and not climate change prevention or nuclear war prevention. If folks want to study fairly obscure phenomena because it feels like the next thing their mind needs to understand the world better, we shouldn't pressure them to have to think that the next thing they learn might "stop climate change" or "prevent nuclear war", or else we fuel the fire of false pretenses about which of the world's research gaps are being earnestly taken care of.

Unfortunately, the above pattern of "justifying" research by just reflecting on what you care about, rationalizing it, and not checking the rationalization for rationality, appears to me to be extremely prevalent among folks who care about climate change or nuclear war, and this is not something I want to see replicated elsewhere, especially not in the burgeoning fields of AI safety, AI ethics, or AI x-risk reduction. And I'm concerned that if we tell folks to go into AI research just to "be concerned", we'll be fueling a false sense of security by filling departments and companies with people who "seem to really care" but aren't doing correspondingly relevant research work, and creating a research culture where concerns about safety, ethics, or x-risk do not result in actually prioritizing research into safety, ethics, or x-risk.

When you're giving general-purpose career advice, the meme "do AI yourself, so you're around to help make it safe" is a really bad meme. It fuels a narrative that says "Being a good person standing next to the development of dangerous tech makes the tech less dangerous." Just standing nearby doesn't actually help unless you're doing technical safety research. Just standing nearby does create a false sense of security through the mere-exposure effect. And the "just stand nearby" attitude drives people to worsen race conditions by creating new competitors in different geographical locations, so they can exercise their Stand Nearby powers to ensure the tech is safe.

Important: the above paragraphs are advice about what advice to give, because of the social pressures and tendencies to rationalize that advice-giving often produces. By contrast, if you're a person who's worried about AI, and thinking about a career in AI research, I do not wish to discourage you from going into AI capabilities research. To you, what I want to say is something different....

Step 1: Learn by doing. Leverage Academia. Get into a good grad school for AI research, and focus first on learning things that feel like they will help you personally to understand AI safety better (or AI ethics, or AI x-risk; replace with your area of interest throughout). Don't worry about whether you're "contributing" to AI safety too early in your graduate career. Before you're actually ready to make real contributions to the field, try to avoid rationalizing doing things because "they might help with safety"; instead, do things because "they might help me personally to understand safety better, in ways that might be idiosyncratic to me and my own learning process."

Remember, what you need to learn to understand safety, and what the field needs to progress, might be pretty different, and you need to have the freedom to fill whatever gaps seem important to you personally. Early in your research career, you need to be in "consume" mode more than "produce" mode, and it's fine if your way of "consuming" knowledge and skill is to "produce" things that aren't very externally valuable. So, try to avoid rationalizing the externally-usable safety-value of ideas or tools you produce on your way to understanding how to produce externally-usable safety research later.

The societal value of you producing your earliest research results will be that they help you personally to fill gaps in your mind that matter for your personal understanding of AI safety, and that's all the justification you need in my books. So, do focus on learning things that you need to understand safety better, but don't expect those things to be a "contribution" that will matter to others.

Step 2: Once you've learned enough that you're able to start contributing to research in AI safety (or ethics, or x-risk), then start focusing directly on making safety research contributions that others might find insightful. When you're ready enough to start actually producing advances in your field, that's when it's time to start thinking about what the social impact of those advances would be, and start shifting your focus somewhat away from learning (consuming) and somewhat more toward contributing (producing).

(Content from Critch ends here.)