Magic means large search spaces

From: Eliezer S. Yudkowsky (sentience@pobox.com)

Date: Thu Jul 21 2005 - 11:23:42 MDT

Michael Vassar correctly wrote:

>

> c) "magic" has to be accounted for. How many things can you do that a dog

> would simply NEVER think of?



Daniel Radetsky wrote:

> "Ben Goertzel" <ben@goertzel.org> wrote:

>

>> How about the argument that every supposedly final and correct theory of

>> physics we humans have come up with, has turned out to be drastically

>> wrong....

>

> This provides an infinitesimal degree of support to the claim that the real

> final and correct theory would permit magic...



Only if 'magic' is interpreted as drawing mystic circles and waving your

hands. 'Magic' takes on a different meaning here - it means, simply, anything

you didn't think of. Not just anything a human would simply NEVER think of,

but also anything YOU didn't think of - the former being a subset of the

latter and therefore also 'magic'. The point of the AI-Box Experiments is

that I can do 'magic' in the latter sense relative to some people who firmly

stated that NOTHING could possibly persuade them to let an AI out of the box.

Obviously, being human, I did nothing that was *strongly* magical.



The problem of magic is the problem of a very large search space, in a case

where we not only lack the brainpower to search each element of the space, we

may lack the brainpower to properly delineate the search space. The AI is not

limited to breaking out via any particular method you consider. Neither you

nor the AI have enough computing power to consider the *entire* search space,

but the AI may search much more efficiently than you do (not to mention a lot

faster). Thus, your inability to think of a way out yourself is only slight

evidence that the AI will be unable to think of a way out. Similarly, the

conviction of certain people that no possible mind, not even a transhuman,

would be able to persuade them to let the AI out of a box was not strong

evidence that I would be unable to persuade them to let the AI out of the box.

Creativity has the appearance of magic when, even after the fact, you don't

know how it worked. The AI-Box Experiment is a lesson in the tendency of

sufficiently large search spaces to contain magic. That is why I refuse to

publish transcripts. Get used to the existence of magic.
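
As a toy illustration of the "slight evidence" point, here is a minimal Bayesian sketch; every number in it is invented for illustration:

    # Toy Bayesian update: why "I couldn't think of an escape" is only
    # weak evidence that no escape exists.  All numbers are invented.
    prior_escape_exists = 0.5    # prior: the space contains an escape
    p_miss_given_exists = 0.9    # I search a tiny fraction of the space,
                                 # so I probably miss a real escape
    p_miss_given_none = 1.0      # if no escape exists, I surely find none

    posterior = (prior_escape_exists * p_miss_given_exists) / (
        prior_escape_exists * p_miss_given_exists
        + (1 - prior_escape_exists) * p_miss_given_none
    )
    print(posterior)  # ~0.47: barely moved from the 0.5 prior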



The argument which Ben Goertzel cites is that, since physics has changed over

the last few generations, we should anticipate that we have stated the search

space incorrectly when we consider all physical means by which the AI might

break out of the box. This does not mean that the AI *has* to go outside

known physics to break out, because there might also be an escape route in

known physics that you did not think of. Consider OpenBSD, the most secure OS

you can obtain on an open market. OpenBSD is constantly changing as people

discover new bugs and fix them. Our fundamental physics is in less flux than

OpenBSD, though arguably over history fundamental physics has gone through

more total change than OpenBSD. I don't know how to break into an OpenBSD

box, since I am not a security expert, just as you are not a physicist. I

anticipate that even some people whose job title says "system administrator"

wouldn't be able to break into a patched OpenBSD box. I have no idea whether

NSA spooks can break into an OpenBSD box but my money would be on 'yes'. The

fact that new bugs are constantly being discovered in OpenBSD is a strong argument that

additional security flaws exist. Of course there are *presently* no security

flaws known to the open-source community or they would be fixed, just as there

are presently no known predictive breakdowns in modern physics. But we expect

there are more bugs in OpenBSD and we expect there are more bugs in our model

of physics.



Even though I can think of absolutely no way to break into an OpenBSD box

using only an Internet connection, I anticipate that an AI could break into an

OpenBSD box using only an Internet connection, using 'magic'. Maybe this

would involve going outside the search space that I or NSA spooks think about;

though probably there are answers within the search space too. Who can guess

which would be more efficient? Going outside the search space involves some

extra trouble in any particular case, but building the infrastructure for

magic might save the AI effort in the long run. From our perspective the

results are the same whether the loophole is creative (comprehensible in

retrospect) or magic (inexplicable even after we witness the results).



> For example, we might say that the AI might be able to manipulate its

> hardware in such a way as to create electromagnetic fields (or some such

> thing) and "break the box." Given the AI's degree of flexibility in the

> hardware, there is some chance that it will succeed. If the chance is

> vanishingly small, we say that this exploit in the box's solidity is not

> reasonably exploitable.

>

> So, I want to know why you believe there is an exploit which is reasonably

> exploitable.



If you consider any *single* element of a large search space, the probability

remains infinitesimal that *that single element* is an escape route. It is

the probability of the whole search space that is the problem. If I consider

a single bitstring targeted at a single TCP/IP port, its probability of

breaking the OpenBSD box is very low. If an AI sends out that exact bitstring

then the probability is still very low, presuming there are no free variables

to manipulate such as the time of attack. Similarly the probability that

classical magic, drawing mystical circles, will work, remains low even if it

is an AI drawing the mystic circles. But if the AI can send arbitrarily

formed bitstrings to any port, then the probability of a working exploit

existing is high, and the probability of a seed AI being able to find at least

one such exploit, I also estimate to be high.
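
To make the single-element-versus-whole-space distinction concrete, here is a toy calculation; both numbers are invented for illustration:

    # Toy illustration: any single element of a large search space is
    # almost certainly not an exploit, yet the space as a whole almost
    # certainly contains one.  Both numbers are invented.
    p_single = 1e-9    # chance one particular bitstring is an exploit
    n_elements = 1e11  # number of distinct bitstrings an attacker can try

    p_any = 1 - (1 - p_single) ** n_elements
    print(p_any)  # prints 1.0: the chance at least one exploit exists
                  # is about 1 - e^(-100), i.e. nearly certain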



When you cite particular physical means of breaking a box and their apparent

implausibility to you, you are simply saying that some particular bitstring

probably does not obtain root on an OpenBSD box. What of it? How many things

can you do that a dog would simply NEVER think of?



Daniel Radetsky wrote:

>

> I can't, but I submit that no one on this list has any basis to assess the

> probability either. So if I claim that the probability is infinitesimal,

> then your only basis for disagreement is pure paranoia, which I feel

> comfortable dismissing.



That's not how rationality works. If you don't know the answer you are not

free to pick a particular answer and demand that someone disprove it. It is

analogous to finding a blank spot on your map of the world and rejoicing, not

because you have new knowledge to discover, but because you can draw whatever

you want to be there. And once you have drawn your dragon or your comfortable

absence of dragons, you become committed to your ignorance, and all knowledge

is your enemy, for it might wipe away that comfortable blank spot on the

map over which you drew... You have not made this error too greatly, but

there have been others on SL4 committed to defending their comfortable

ignorance. There is no freedom in the way of cutting through to the correct

answer. It is a dance, not a walk. On each step of that dance your foot must

come down in exactly the right spot, neither to the left nor to the right. If

you say that the probability of this very large search space containing an

exploit is 'infinitesimal', you must give reason for it. If I say that the

probability is 'certain', I must give reason for it. You cannot hole up with

your preferred answer and wait for someone to provide positive disproof; that

may comfort you but it is not how truthseeking works.



When there is a blank spot on the map our best guess is that it is "similar"

to past experience. The art consists of a detailed understanding of what it

means to be "similar". Similarity of fundamental laws takes precedence over

similarity of surface characteristics. Many would-be flyers failed before the

Wright Flyer flew, but if you could make a physical prediction of exactly when

the other flyers would hit the ground, you could use the same quantitative

model of aerodynamics to predict the Wright Flyer would fly. So we should

assume the blank spot on the map follows the same rules as the territory we

know, interpreted at the most fundamental level possible. Does this mean we

assume that the blank spot on the map obeys known physics exactly? Yes and

no. If we have any particular question of physics in which an exact,

quantitative prediction is desired, then we have to assume that the prediction

of present physics is the point of maximum probability. If you have a

computer containing a superintelligent AI, and you throw it off a roof 78.4

meters high, and you want to know the computer's downward velocity when it

hits the ground, the best *quantitative* guess is 39.2 meters/second. If the

computer does not hit the ground due to 'magic', i.e., some action performed

by an intelligence that searches a space we cannot search as well ourselves

nor correctly formulate, we have no idea where it will go or how fast it will

be moving. Hence the prediction of modern physics is by far the best *exact*

guess. That is one sense in which we presume the blank spot on the map

resembles known territory. But we are not committed to the absurd statement

that we expect every one of our physical generalizations to prove correct in

every possible experiment in the future, even though at any particular point

any particular generalization is our best exact guess. This is no more

paradoxical than my simultaneous expectation that any specific ticket will not

win the lottery and that some ticket will win the lottery. My beliefs are

probabilistic, so that any large number of individual statements can have a

high probability, yet their conjunction a low probability.
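
A quick check of the free-fall arithmetic, plus the lottery point, in a short sketch; the 0.999 and 10,000 at the end are invented illustration numbers:

    import math

    # Check of the figures in the text: a drop from 78.4 m under g = 9.8 m/s^2.
    h, g = 78.4, 9.8
    v = math.sqrt(2 * g * h)
    print(v)  # 39.2 m/s, reached after t = sqrt(2h/g) = 4.0 seconds

    # The lottery point: many individually-probable statements can have
    # an improbable conjunction.  Numbers are invented.
    p_each = 0.999      # each generalization holds in one experiment
    n = 10_000          # number of future experiments
    print(p_each ** n)  # ~4.5e-5: the conjunction almost surely fails somewhere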



It is an interesting question how *exactly* to formulate the generalization,

'All past models of physics except one have already proven incorrect, so I

estimate a low probability that we are at the final step currently'. Or how

to note the facts that present physics has persisted over a subjective time

consistent with past generalizations that were eventually disproven (i.e. not

an unusually long time) or that there are known problems in the modern theory

(i.e. reconciling quantum mechanics and general relativity). This literally

"meta-physical" generalization yields no specific predictions so it can't

override physics in any specific case.
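
One crude way to formulate that generalization - certainly not the only way, and the count below is invented for illustration - is Laplace's rule of succession over past theory transitions:

    # One crude formalization (Laplace's rule of succession): if n
    # successive "final" models of physics have each been overturned,
    # estimate the chance the current one is overturned as (n+1)/(n+2).
    # n = 4 is an invented count; the argument does not depend on the
    # exact value.
    n_overturned = 4
    p_current_overturned = (n_overturned + 1) / (n_overturned + 2)
    print(p_current_overturned)  # ~0.83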



But in the case where we have a superintelligent AI, we may need to think

about a cognitive system that systematically searches a large space for *any*

useful breakdown in our physical model (or anything we didn't think of that

can be done within modern physics). It's sort of like a man falling out of a

plane. How does a dog that knows physics predict the unfolding of a

parachute? The answer is that the dog cannot calculate when the man

will hit the ground, but the dog would be wise to allocate less probability

mass than usual to the proposition that the man hits the ground at the time

predicted by Galileo. The dog may be justly confident that if the man waves

his hands and chants "Booga booga", or if the man straps anvils to his feet,

then the man will still hit the ground. But for the man to actually, reliably

hit the ground, requires the truth of the enormous conjunction, "No matter

*what* the man does he will still hit the ground."



To guess that physics might break down *somewhere*, or that known physics

might contain some way to break out of the box, presumes that the blank spot

is similar to known territory; but the presumption takes place at a higher

level. It is a generalization about intelligence, goals, creativity, and what

happens when a higher intelligence encounters a space blank to us. This last

is generalized mostly from contrasting human past civilizations with human

future civilizations, because if we compared chimpanzees or lizards to humans we

would have to conclude that the answer was just pure incomprehensible magic.

But since the degree by which the human future outsmarted the human past is

impressive enough to rule out AI-boxing as a good idea, there is no need to

appeal to strong magic.



If you consider in your security model a branch describing the existence of a

hostile mind that is smarter than you are, you must assume that this branch of

your security model is a total loss. How many things can you do that a dog

would simply NEVER think of?



It may still make sense to take precautions against hostile

transhumans, because this is a likely failure mode even of the branches of

your security model that don't explicitly expect it. A hope in hell is better

than no hope, and the precautions may make sense in any case - they may be

particularly useful against nascent pre-transhumans. But if a branch of your

security model involves an unknown probability of creating a hostile

transhuman, you have to assume that this is an unknown probability of total

loss, not rely on your dog to invent countermeasures.



The problem with the AI-Box paradigm is that it assumes that the existence of

a hostile transhuman is a manageable problem and makes this the foundation of

the strategy. Typically you assume in your security model that if the

terrorists smuggle a nuke into New York City, set it up in the UN Building,

escape to a safe distance, take the trigger out of their pockets, and put their

finger on the trigger, well, you've sorta lost at that point. Stop them if

you can, any way you can, but your security model is supposed to rely on

stopping the nuclear weapon EARLIER. Maybe in Hollywood the hero crashes in

through the door at this point, but that's not what security experts assume.

Countries don't allow known terrorists through customs carrying nuclear

weapons on the theory that, hey, the hero can always shoot them if they look

like they're going to pull the trigger. Allowing the existence of a hostile

transhuman is just plain STUPID, end of story.



--
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence