Many effective altruists believe that efforts to shape artificial general intelligence (AGI) – in particular, solving the alignment problem – may be a top priority. Part of the reasoning behind this is that sudden progress in AI capabilities could happen soon, and might lead to a decisive strategic advantage for a single AI system. This could mean that the evolution of values reaches a steady state in the near future – the universe would be shaped according to the values of that AI. This, in turn, offers exceptional leverage to shape the far future by influencing how that AI is built.

I’m sceptical about this narrative, for the following reasons:

AI risk in different scenarios

Different views on what the future of AI will look like give rise to different safety concerns. Given my own views, it seems misguided to worry that a single uncontrolled AI system will take over the world. That doesn’t mean that there will be no serious issues – for instance, escalating conflicts between “human-AI systems” may lead to large amounts of disvalue.

Even if AGI comes in the form of one or more unified, autonomous agents, I think society will already have changed dramatically due to advances in narrow AI before we get to AGI. I would agree that if you magically dropped an AI system with superhuman capabilities into our world, it would likely find a way to obtain a decisive strategic advantage – but I don’t think the transition to advanced AI will look like this. It’s a mistake to imagine an AGI against the backdrop of our contemporary society.

It is also far from clear whether the development of artificial general intelligence will quickly result in a steady state (the “end of history”, as the leading AI achieves a decisive strategic advantage). While that is a possible scenario, it seems more likely to me that no single actor will be able to seize all power, which means that there will still be many actors with differing goals for at least some time after the transition to advanced AI. Even if civilisation eventually reaches a steady state with a single, unified AI, that AI would be shaped by actors in the earlier multipolar world, which may persist for a long time.

If advanced AI does not quickly result in a steady state, then it’s not clear that influencing its development is a good lever to affect the far future. This could be true even if the technology has a large impact on society: electricity, for example, transformed the world, but “shaping the development of electricity” was arguably not a top priority for past altruists.

Given all that, I think it’s not clear whether altruists should focus on (directly) shaping AI. It’s a bit of a reach to think one can accurately predict technological developments many decades (or even centuries) down the line; it’s even more of a reach to think one can anticipate and make progress on problems arising from these future technologies. On the other hand, influencing the future is difficult in general, and shaping the development of advanced AI could be high-leverage even if it is gradual and distributed. (For more on this, see Should altruists focus on artificial intelligence?.)

If (directly) shaping advanced AI is a priority, then it’s still unclear whether alignment with human values is the most important aspect to focus on. The main reason for this is that there will be strong pressures to solve alignment (since everybody wants AI systems that are aligned to their values), so I expect that AI systems will, by and large, be aligned with the values of their operators, at least in a narrow sense. (See Why I expect successful (narrow) alignment.)

Also, alignment may not be sufficient, and possibly not even robustly positive, in terms of reducing future suffering, which is why I’m mostly interested in implementing worst-case safety measures that are aimed at preventing s-risks from AI. (See An introduction to worst-case AI safety and Focus areas of worst-case AI safety.)

A broader perspective

Advanced AI is just one possible path to a new growth mode, i.e. a significant acceleration of economic and technological progress, comparable to industrialisation. A plausible mechanism for how this would come about is a far greater degree of editability of the minds that drive technological innovation. Human minds can also adapt to some degree (see e.g. the Flynn effect), but the available mechanisms are limited in scope. If future minds can be edited more liberally, they can be optimised for productivity, which could lead to much higher growth rates (and arguably also faster change in general).

That could happen if powerful AI systems play a larger role in technological progress, as software seems far more malleable than human brains. But this is only one example – greater editability could also become possible through strong forms of biological enhancement (e.g. iterated embryo selection) or through whole brain emulations, assuming that emulations are easier to edit than biological brains. (However, I don’t think these technologies are around the corner either.)

From this perspective, the key question is how to design such minds to ensure good outcomes (in terms of preventing both incidental and agential s-risks). In a certain sense, this is a generalisation of AI safety, which considers the question of how to design safe AI systems.