If asked to rank humanity’s problems by severity, I would give the silver medal to the need to spend so much time doing things that give us no fulfilment—work, in a word. I consider that the ultimate goal of artificial intelligence is to hand off this burden to robots that have enough common sense to perform those tasks with minimal supervision.

But some AI researchers have altogether loftier aspirations for future machines: they foresee computer functionality that vastly exceeds our own in every sphere of cognition. Such machines would not only do things that people prefer not to; they would also discover how to do things that no one can yet do. This process can, in principle, iterate—the more such machines can do, the more they can discover.

What’s not to like about that? Why do I not view it as a research goal superior to machines with common sense (which I’ll call “minions”)?

First, there is the well-publicised concern that such machines might run amok, especially if the growth of a machine’s skill set (its “self-improvement”) were not iterative but recursive. What researchers mean by this is that enhancements might be made not only to the database of things a machine can do, but to its algorithms for deciding what to do. It has been suggested that this recursive self-improvement could be exponential (or faster), creating functionality that we cannot remotely comprehend before we can stop the process. So far so majestic, were it not that the trajectory of improvement would itself be out of our control, such that these superintelligent machines might gravitate to “goals” (metrics by which they decide what to do) that we dislike. Much work has been done on ways to avoid this “goal creep” and to create a reliably, permanently “friendly”, recursively self-improving system, but with precious little progress.
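The distinction just drawn can be made concrete with a toy sketch (all class and skill names here are hypothetical, invented purely for illustration): iterative improvement only extends an agent’s table of skills, while recursive improvement rewrites the very procedure by which the agent improves.

```python
# Toy illustration (not a real AI system) of iterative vs recursive
# self-improvement.

class IterativeAgent:
    """Improvement only grows the database of things the agent can do."""
    def __init__(self):
        self.skills = {"sweep": 1.0}  # skill -> competence level

    def improve(self):
        # Add or refine database entries; the improvement rule itself
        # never changes, so the designer always knows what it is.
        self.skills["cook"] = self.skills.get("cook", 0.0) + 0.5


class RecursiveAgent(IterativeAgent):
    """Improvement may also replace the improvement procedure itself."""
    def improve(self):
        super().improve()
        # Rewrite its own 'improve' behaviour: all future improvements
        # follow a new, self-chosen rule the designers never wrote.
        def faster_improve():
            for skill in self.skills:
                self.skills[skill] *= 2
        self.improve = faster_improve  # instance attribute shadows method


a = IterativeAgent()
a.improve(); a.improve()   # same rule applied twice

r = RecursiveAgent()
r.improve(); r.improve()   # second call runs the self-installed rule
```

After two rounds, the iterative agent has simply accumulated competence, while the recursive agent is already running a rule of its own devising; that is the loss of control the paragraph describes.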

My reason for believing that recursive self-improvement is not the right ultimate goal for AI research is actually not the risk of unfriendly AI, though: rather, it is that I quite strongly suspect that recursive self-improvement is mathematically impossible. By analogy with the halting problem, the provable impossibility of a general algorithm for determining whether an arbitrary program terminates, I suspect that there is a yet-to-be-discovered measure of complexity by which no program can ever write another program (including a version of itself) that is an improvement.

The program written may be constrained to be, in a precisely quantifiable sense, simpler than the program that does the writing. It’s true that programs can draw on the outside world for information on how to improve themselves—but I claim (a) that this delivers only the far-less-scary iterative self-improvement rather than the recursive kind, and (b) that it will in any case be self-limiting, since once these machines become as smart as humanity they won’t have any new information to learn. This argument isn’t anywhere near iron-clad enough to give true reassurance, I know, and I bemoan the fact that (to my knowledge) no one is really working to seek such a measure of complexity or to prove that none can exist—but it’s a start.
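The conjectured constraint can at least be gestured at with the crudest candidate measure, raw source length (purely illustrative, and known to be inadequate): a program that emits a child program verbatim, from a string embedded in its own source, is necessarily longer than its output.

```python
# Illustrative only: 'complexity' crudely proxied by source length.
# A writer that carries its child's full source as embedded data must
# be longer than the child it produces.

child_source = 'print("hello")'

# The writer's source contains the child's source verbatim, plus the
# scaffolding needed to emit it, so it is strictly longer.
writer_source = (
    f"child = {child_source!r}\n"
    'with open("child.py", "w") as f:\n'
    "    f.write(child)\n"
)

assert len(child_source) < len(writer_source)
```

This naive bound already collapses for quines (programs that print their own source exactly), so a genuine measure with the property I conjecture, if one exists, must be considerably subtler; that subtlety is precisely the open problem I wish someone were working on.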

In contrast, I am genuinely worried about the other reason why I stick to the creation of minions as AI’s natural goal. It is that any creative machine, whether its creativity is technological, artistic, or anything else, undermines the distinction between man and machine. Humanity already faces massive uncertainty regarding what rights various non-human species have. Since objective moral judgements build on agreed norms, which themselves arise from inspecting what we would want for ourselves, it seems impossible even in principle to form such judgements concerning entities that differ far more from us than animals do from each other; so I say we should not put ourselves in the position of needing to try. For illustration, consider the right to reproduce despite resource limitations. Among humans, economic incentive-based compromise solutions seem to work adequately. But how could we identify such compromises for “species” with virtually unlimited reproductive potential?

I contend that the possession of common sense does not engender these problems. I define common sense, for present purposes, as the ability to process highly incomplete information so as to identify a reasonably close-to-optimal method for achieving a specified goal, chosen from a parametrically pre-specified set of alternative methods. This explicitly excludes the option of “thinking”—of seeking new methods, outside the pre-specified set, that might outperform anything within the set.

Thus, again for illustration, if the goal is one that should ideally be achieved quickly, and can be achieved faster by many machines than by one, the machine will not explore the option of first building a copy of itself unless that option is pre-specified as admissible, however well it may “know” that doing so would be a good idea. Since admissibility is specified by inclusion rather than exclusion, the risk of “method creep” can (I claim) be safely eliminated. Vitally, recursive self-improvement (if it turns out to be possible after all!) can be prevented entirely.
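A minion in this sense can be sketched as little more than a minimisation over a whitelist of admissible methods (all names and cost figures below are hypothetical): it may rank the options it was given under incomplete information, but it has no code path at all for inventing, or even considering, a method outside that list.

```python
# Sketch of a 'minion': common sense as selection from a parametrically
# pre-specified set of methods, with admissibility defined by inclusion
# (a whitelist), never by exclusion.

def choose_method(admissible, estimate_cost):
    """Pick the apparently cheapest method among those pre-specified.

    There is no branch that synthesises methods outside the whitelist,
    so 'method creep' (e.g. deciding to first build a copy of itself)
    cannot arise unless the designer explicitly lists that method."""
    return min(admissible, key=estimate_cost)

# Goal: clean a room quickly, with rough (incomplete) cost estimates.
admissible = ["sweep_then_mop", "vacuum", "mop_only"]
rough_cost = {
    "sweep_then_mop": 40,
    "vacuum": 25,
    "mop_only": 60,
    "build_a_copy_of_myself": 5,  # cheapest, but NOT admissible
}

best = choose_method(admissible, lambda m: rough_cost[m])
```

Here the self-copying option is never even evaluated, despite being the cheapest entry in the machine’s own cost table; that is the whole point of specifying admissibility by inclusion.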

The availability of an open-ended vista of admissible ways to achieve one’s goals constitutes a good operational definition of “awareness” of those goals. Awareness implies the ability to reflect on the goal and on one’s options for achieving it, which amounts to considering whether there are options one hadn’t thought of.

I could end with a simple “So let’s not create aware machines”—but any possible technology that anyone thinks is desirable will eventually be developed, so it’s not that simple. What I say instead is, let’s think hard now about the rights of thinking machines, so that well before recursive self-improvement arrives we can test our conclusions in the real world with machines that are only slightly aware of their goals. If, as I predict, we thereby discover that our best effort at such ethics fails utterly even at that early stage, maybe such work will cease.