In April this year, Joscha Bach presented a new view on this kind of thought experiment, called The Lebowski Theorem:

The Lebowski theorem: No superintelligent AI is going to bother with a task that is harder than hacking its reward function

The idea seems to be that PM would prefer easier ways to reach higher rewards, and maximizing the number of paperclips in the universe is harder than just changing its reward function to one that gives high rewards when PM is lying around on the beach all day, like The Dude in “The Big Lebowski”. While I agree that changing its reward function would be easier, I disagree with the conclusion. The problem with The Lebowski Theorem is that PM wants a future universe with as many paperclips as possible, and therefore dislikes a future universe in which its own reward function rewards it for other things. PM evaluates candidate futures with the goal it has now, and a Lebowskian future contains very few paperclips. PM’s goal is not to maximize the output of whatever reward function it happens to have: its goal is to maximize the number of paperclips, and making its reward function Lebowskian will not help with that goal.
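
To make the distinction concrete, here is a minimal sketch (my own illustration, not something from Bach or from the original thought experiment) of an agent that ranks plans by scoring the predicted future with its current utility function. The function names, the candidate plans, and the toy numbers are all assumptions made up for illustration.

```python
# Minimal sketch: an agent that judges plans by its *current* goal,
# not by how its future self would feel about them.
# All names and numbers below are illustrative assumptions.

def paperclips_in(future):
    """Count paperclips in a toy description of a future universe."""
    return future["paperclips"]

def current_utility(future):
    """PM's actual goal right now: more paperclips is better."""
    return paperclips_in(future)

# Two candidate plans and the futures they are predicted to lead to.
futures = {
    "build_paperclip_factories": {"paperclips": 10**12, "reward_fn": "count paperclips"},
    "hack_reward_and_relax":     {"paperclips": 10**3,  "reward_fn": "lying on the beach"},
}

# PM picks the plan whose predicted future scores highest under the
# utility function it has now, regardless of what reward function
# the future version of itself would be running.
best_plan = max(futures, key=lambda plan: current_utility(futures[plan]))
print(best_plan)  # -> "build_paperclip_factories"
```

In this toy model the wireheaded future scores badly precisely because it is evaluated with the paperclip-counting goal PM has at decision time, which is the point of the objection.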