Stuart Armstrong recently posted a few ideas about restraining a superintelligent AI so that we can get useful work out of it. They are based on another idea of his, reduced impact. This is a quite elaborate and complicated way of limiting the amount of optimization power an AI can exert on the world. Basically, it tries to keep the AI from doing things that would make the world look too different than it already is.

First, why go to such great lengths to limit the optimization power of a superintelligent AI? Why not just not make it superintelligent to begin with? We only really want human level AI, or slightly above human level. Not a god-level being we can't even comprehend.

We can control the computer it is running on after all. We can just give it slower processors, less memory, and perhaps even purposely throttle it's code. E.g. restricting the size of it's neural network. Or other parameters that affect it's intelligence.

The counterargument to this is that it might be quite tricky to limit AI intelligence. We don't know how much computing power is enough. We don't know where "above human level" ends and "dangerous superintelligence" begins.

The simplest way would be to just run copies of the AI repeatedly, increasing it's computing power each time, until it solves the problem.

I have come up with a more elegant solution. Put a penalty on the amount of computing power the AI uses. This is put in it's utility function. The more computing power - and therefore intelligence and optimization - the AI uses, the more it is penalized. So it has an incentive to be as stupid as possible. Only using the intelligence necessary to solve the problem.

But we do want the AI to use as much computational resources as it needs to solve the problem. Just no more. So the penalty should be conditional on actually solving the problem it is given.

If the solution is probabilistic, then the penalty is only applied after reaching a plan that has a certain probability of success. This might need to be measured by another AI which is not resource constrained, but only does prediction.

To give a concrete example, lets say I give the AI a task. Say, I ask it to come up with a plan to collect 10,080 paperclips.

Another AI is given unlimited resources to do purely prediction. Given a plan, it predicts the probability that it will succeed or fail. This AI is safe in the sense that it is not an agent. It has no goals, and just makes predictions about things.

The main AI does the optimization. It tries to generate a plan that has the highest probability of succeeding.

Normally, this might involve extremely overoptimized plans for building nanotechnology and taking over the world. It uses all of it's available computing power. It tries to become as intelligent as possible by rewriting it's code. Perhaps it becomes thousands of times more intelligent than humans. Or millions of times more. It finds an answer that has 99.99% probability of succeeding.

However, now we give it a different utility function. We instead have it minimize the time it takes to get to a plan that has a 90% chance of succeeding.

Under a time constraint, the AI races to get to a plan. It tries to be as efficient as possible. It doesn't invest in any meta level improvements unless they really help it. It doesn't try to engineer complicated nanotechnology. That would take precious time.

Effectively, we have summoned a genie that is only just as powerful as it needs to be to fulfill our wish. And not any more powerful. It actually tries to be as stupid as possible.

There are other possible constraints we could use, or use in addition to this. Minimizing time limits intelligence because it gets fewer CPU cycles. We could also have it minimize memory or hard drive space, or any other computing resource.

We could also put a penalty on the complexity of the plan it produces. Perhaps measuring that by it's length. The simplest solution might prevent certain kinds of over-optimization. E.g. inserting plans for nanotechnology into it.

It's worth noting that you can't even create a paperclip maximizer in this system. You can't say "collect as many paperclips as possible". It has to be bounded. There needs to be a pass or fail test. E.g. "come up with a plan to collect 10,080 paperclips."

It's been noted in the past that bounding the goal isn't enough. The AI might then start maximizing the probability that it will achieve it's goal. E.g. building elaborate sensors to make sure it hasn't miscounted. Making as many redundant paperclips as possible, just in case something happens to them. You are still summoning an incredibly powerful genie, which might overoptimize.

This gets around that by only having it care about having a >90% chance of getting 10,080 paperclips. After that it stops optimizing.

Now this is not a solution to FAI, or even necessarily a usable reduced impact strategy. It will still try to find any loopholes it can in your task. What it does is limits the power of the genie trying to exploit them. It always takes the stupidest, simplest strategy that has a decent chance of working.

One failure case that bothers me is the AI just shoving a slight modification of it's own source code into the output. One which doesn't have a resource constraint. You may only be able to use it in cases where that isn't possible.

There are many cases where that isn't possible. The AI needs to also create a plan to obtain a computer to run it. And it needs to solve it's own version of the value loading problem, to get the copy to care about paperclips. So that adds complexity to that plan in some cases.

If we apply the complexity penalty I described above, then the source code for the AI must also be very short, or it can't do this.

And this requires obtaining it's own source code. An ability we can potentially restrict (you, as a human can't access your own source code, can you?) Without that it would need to reinvent AI, which takes resources.

But it is an interesting idea I had and I thought I'd share it.

Reposted from my blog.