Owen Cotton-Barratt and Daniel Dewey

There’s been some discussion lately about whether we can estimate how likely efforts to mitigate existential risk from AI are to succeed, and about what reasonable estimates of that probability might be. In a recent conversation between the two of us, Daniel mentioned that he didn’t have a good way to estimate the probability that joining the AI safety research community would actually avert existential catastrophe. Though it would be hard to be certain about this probability, it would be nice to have a principled back-of-the-envelope method for approximating it. Owen has a rough method, based on the one he used in his article Allocating risk mitigation across time, but he had never spelled it out.

It goes like this.

First, estimate the total existential risk associated with developing highly capable AI systems, bearing in mind all of the work on safety that will be done.

— 0.01% 0.03% 0.1% 0.3% 1% 3% 10% 30% 90%

Now estimate the size of the research community working on safety by the time we develop those potentially risky AI systems. This number should include researchers who are not directly focused on AI safety, but who nevertheless make some fractional contribution relative to a full-time safety researcher; for example, if 10 AI capability researchers will each contribute the equivalent of 10% of a full-time AI safety researcher, they would collectively add 1 “member” to the research community.

— 10 30 100 300 1,000 3,000 10,000 30,000 100,000 300,000 1 million 3 million 10 million
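For concreteness, the fractional-contribution rule can be spelled out as a small calculation. All of the numbers below are invented for illustration, not estimates we endorse:

```python
# Counting "members" of the eventual safety research community,
# including fractional contributions from capability researchers.
full_time_safety_researchers = 270   # assumed full-time headcount
part_time_contributors = 300         # assumed capability researchers
fraction_of_full_time_each = 0.10    # each does ~10% of a safety researcher's work

community_size = (full_time_safety_researchers
                  + part_time_contributors * fraction_of_full_time_each)
print(community_size)  # 300.0
```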

Now estimate the effect of adding a researcher to the community now (and for the rest of their career), in terms of the total number of researchers that will be added to the eventual community. This could be less than one if you think they will displace people who would otherwise enter the field later, or more than one if you think they will add momentum to the field. Again, if adding a researcher now leads some AI capability researchers to spend more or less of their time on safety-relevant research, this number should count those fractional researchers added or subtracted as well.

— 0.01 0.03 0.1 0.3 1 3 10 30 100 300 1,000
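To illustrate how these effects might net out, here is a sketch with hypothetical figures (the displacement and momentum terms are assumptions chosen purely for the example):

```python
# Net researchers added by one person joining the field now.
direct = 1.0         # the researcher themselves, full-time for their career
displacement = -0.3  # assumed: fraction of a later entrant they displace
momentum = 0.5       # assumed: extra fractional safety work they catalyse

net_added = direct + displacement + momentum
print(net_added)  # 1.2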

Now suppose that in a heroic effort we managed to double the total amount of work that would be done on AI safety. What percentage of the bad scenarios should we expect this to avert?

— 0.01% 0.03% 0.1% 0.3% 1% 3% 10% 30%

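The interactive page combined the four selections into an estimate. As a sketch of one natural way to combine them, suppose (as a simple model) that the risk averted by a marginal contribution scales linearly with the fraction of a doubling of total safety work it represents. The input values below are arbitrary example selections from the scales above:

```python
# Back-of-the-envelope combination of the four estimates.
# Example selections only; the linear-scaling rule is an assumption.
total_risk = 0.03           # step 1: total existential risk from AI
community_size = 1_000      # step 2: eventual size of the safety community
researchers_added = 1.0     # step 3: net researchers your joining adds
averted_by_doubling = 0.10  # step 4: fraction of bad scenarios a doubling averts

# One researcher supplies researchers_added / community_size of a doubling,
# so (under linear scaling) averts that fraction of the doubling's benefit.
p_avert = total_risk * averted_by_doubling * (researchers_added / community_size)
print(p_avert)  # ~3e-06, i.e. roughly 1 in 300,000
```

With these particular selections, joining the field would avert existential catastrophe with probability on the order of a few in a million; different selections shift the answer by orders of magnitude, which is rather the point of the exercise.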