From The Transhumanist Wiki

By Eliezer Yudkowsky.

Last edited July 29, 2006.

The question:

This work addresses the following family of questions about Friendly AI:

"How can an AI be creative if we know exactly what it will do? Or if we don't know exactly what it will do, how can we know it will be Friendly?"

Sometimes this also takes the form of a confidently asserted objection:

"For cognition to work requires chaos/emergence/self-organization/randomness/noise. Therefore cognition is, by its nature, unpredictable. Therefore Friendly AI is impossible."

Then there are the versions based on Vinge:

"If you knew exactly what a smarter intelligence would do, you would be that smart yourself. Therefore you can't know what an AI smarter than yourself will do. Therefore Friendly AI is impossible."

"The more powerful an intelligence is, the less predictable it is. Thus if a very powerful intelligence operates in our world, we can't predict the outcome. Even if the AI is Friendly, humans will no longer be in control of their lives because we won't be able to understand what's happening around us."

Or:

"If the AI never surprises us, it must not be very smart. What if the AI surprises you by doing something unFriendly?"

"How can a predictable Goal System deal with an unpredictable world?"

"Algorithms that include randomness are often more powerful than algorithms without them. If the Friendly AI isn't allowed to use randomness it won't be able to compete with more powerful AIs that can."

"Anything mechanical enough to be understandable is too mechanical to be intelligent."

"An AI simple enough that humans can comprehend it will be dumber than humans."

Preface:

I hope to convey an understanding of intelligence, optimization, and predictability, from within which to answer the above family of questions; give some sense of why it is not necessarily impossible to create a creative, intelligent, predictably Friendly AI. This work is narrowly focused; for example, it doesn't try to ask - given that one has the power to create an AI that is "predictably Friendly" for some chosen sense of "Friendly" - what "Friendly" should mean. (Similarly, the document Coherent Extrapolated Volition, which does focus on choosing a sense of "Friendly", disclaims any attempt to say how a thus-Friendly AI might be built. One question at a time.)

This work is semi-technical. A full resolution to the above family of questions arises from rigorous understanding of probability theory - including the concepts of prediction, randomness, knowledge, and confidence. I have tried to explain this resolution using analogies and metaphors. I don't know how much impact these analogies and metaphors will have on someone not comfortable with algebra. Anyone planning to read both this work and Technical Explanation should read Technical Explanation first. Nonetheless this work is intended to stand on its own.

Intelligence and predictability

Imagine that I'm visiting a distant city, and a local friend volunteers to drive me to the airport. I don't know the neighborhood. Each time my friend approaches a street intersection, I don't know whether my friend will turn left, turn right, or continue straight ahead. I can't predict my friend's move even as we approach each individual intersection - let alone, predict the whole sequence of moves in advance.

Yet I can predict the result of my friend's unpredictable actions: we will arrive at the airport. Even if my friend's house were located elsewhere in the city, so that my friend made a completely different sequence of turns, I would just as confidently predict our arrival at the airport. I can predict this long in advance, before I even get into the car. My flight departs soon, and there's no time to waste; I wouldn't get into the car in the first place, if I couldn't confidently predict that the car would travel to the airport along an unpredictable pathway.

Isn't this a remarkable situation, from a scientific perspective? I can predict the outcome of a process, without being able to predict any of the intermediate steps of the process. Ordinarily one predicts by imagining the present and then running the visualization forward in time. If you want a precise model of the Solar System, one that takes into account planetary perturbations, you must start with a model of all major objects and run that model forward in time, step by step. Sometimes simpler problems have a closed-form solution, where calculating the future at time T takes the same amount of work regardless of T. A coin rests on a table, and after each minute, the coin turns over. The coin starts out showing heads. What face will it show a hundred minutes later? Obviously you did not answer this question by visualizing a hundred intervening steps. You used a closed-form solution that worked to predict the outcome, and would also work to predict any of the intervening steps.
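The coin example can be sketched in a few lines of illustrative Python (the function names are my own invention, purely for illustration):

```python
# The coin example: the face at minute T depends only on the parity of T,
# so a closed-form rule predicts it with the same amount of work for any T.

def coin_face_simulated(t):
    """Predict by running the process forward, one step at a time."""
    face = "heads"
    for _ in range(t):
        face = "tails" if face == "heads" else "heads"
    return face

def coin_face_closed_form(t):
    """Predict with a closed-form solution: constant work for any t."""
    return "heads" if t % 2 == 0 else "tails"
```

Both methods agree that after a hundred minutes the coin shows heads; the closed form also predicts any intervening step without visualizing the steps before it.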

But when my friend drives me to the airport, I can predict the outcome successfully using a strange model that won't work to predict any of the intermediate steps. My model doesn't even require me to input the initial conditions - I don't need to know where we start out in the city!

I do need to know something about my friend. I must know that my friend wants me to make my flight. I must credit that my friend is a good enough planner to successfully drive me to the airport (if he wants to). These are properties of my friend's initial state - properties which let me predict the final destination, though not any intermediate turns.

I must also credit that my friend knows enough about the city to drive successfully. This may be regarded as a relation between my friend and the city; hence, a property of both. But an extremely abstract property, which does not require any specific knowledge about either the city or my friend's knowledge about the city.

Optimization processes

Consider a car - say, a Toyota Corolla. The Toyota Corolla is made up of some number of atoms - say, on the (very) rough order of ten to the thirtieth. If you consider all the possible ways we could arrange those 10^30 atoms, it's clear that only an infinitesimally tiny fraction of possible configurations would qualify as a working car. If you picked a random configuration of 10^30 atoms once per Planck interval, many many ages of the universe would pass before you hit on a working car.

(At this point, someone in the audience usually asks: "But isn't this what the creationists argue? That if you took a bunch of atoms and put them in a box and shook them up, it would be astonishingly improbable for a fully functioning rabbit to fall out?" But the logical flaw in the creationists' argument is not that randomly reconfiguring molecules would by pure chance assemble a rabbit. The logical flaw is that there is a process, natural selection, which, through the non-chance retention of chance mutations, selectively accumulates complexity, until a few billion years later it produces a rabbit. Only the very first replicator in the history of time needed to pop out of the random shaking of molecules - perhaps a short RNA string, though there are more sophisticated hypotheses about autocatalytic hypercycles of chemistry.)

Restricting our attention to running vehicles, there is still an astronomically huge design space of vehicles that could be composed of the same atoms as the Corolla. Most possible running vehicles won't work quite as well. For example, we could take the parts in the Corolla's air conditioner and mix them up in hundreds of possible configurations; nearly all these configurations would result in an inferior vehicle (still recognizable as a car) that lacked a working air conditioner. Thus there are many more configurations corresponding to inferior vehicles than to vehicles of Corolla quality.

A tiny fraction of the design space does describe vehicles that we would recognize as faster, more efficient, and safer than the Corolla. Thus the Corolla is not optimal under our preferences, nor under the designer's own goals. The Corolla is, however, optimized, because the designer had to hit an infinitesimal target in design space just to create a working car, let alone a car of Corolla-equivalent quality. The subspace of working vehicles is dwarfed by the space of all possible molecular configurations for the same atoms. You cannot build so much as an effective wagon by sawing boards into random shapes and nailing them together according to coinflips. To hit such a tiny target in configuration space requires a powerful optimization process. The better the car you want, the more optimization pressure you have to exert. You need a huge optimization pressure just to get a car at all.

This whole discussion assumes implicitly that the designer of the Corolla was trying to produce a "vehicle", a means of travel. This assumption deserves to be made explicit, but it is not wrong, and it is highly useful in understanding the Corolla.

Planning also involves hitting tiny targets in a huge search space. On a 19-by-19 Go board there are roughly 10^170 legal positions. In the early positions of a Go game there are more than 300 legal moves per turn. The search space explodes, and nearly all moves are foolish ones if your goal is to win the game. From all the vast space of Go possibilities, a Go player seeks out the infinitesimal fraction of plans which have a decent chance of winning.

You cannot even drive to the supermarket without planning - it will take you a long, long time to arrive if you make random turns at each intersection. The set of turn sequences that will take you to the supermarket is a tiny subset of the space of turn sequences. Note that the subset of turn sequences we're seeking is defined by its consequence - the target - the destination. Within that subset, we care about other things, like the driving distance. There are plans that would take us to the supermarket in a huge pointless loop-the-loop.

In general, as you live your life, you try to steer reality into a particular region of possible futures. When you buy a Corolla, you do it because you want to drive to the supermarket. You drive to the supermarket to buy food, which is a step in a larger strategy to avoid starving. All else being equal, you prefer possible futures in which you are alive, rather than dead of starvation. When you drive to the supermarket, you aren't really aiming for the supermarket, you're aiming for a region of possible futures in which you don't starve. Each turn at each intersection doesn't carry you toward the supermarket, it carries you out of the region of possible futures where you lie helplessly starving in your apartment. If you knew the supermarket was empty, you wouldn't bother driving there. An empty supermarket would occupy exactly the same place on your map of the city, but it wouldn't occupy the same role in your map of possible futures. It is not a location within the city that you are really aiming at, when you drive.

The key idea about an optimization process is that we can know something about the target - the region of possible futures into which the optimization process steers reality - without necessarily knowing how the optimization process will hit that target.

Human intelligence is one kind of powerful optimization process, capable of winning a game of Go or turning sand into digital computers. Natural selection is much slower than human intelligence; but over geological time, cumulative selection pressure qualifies as a powerful optimization process.

Once upon a time, human beings anthropomorphized stars, saw constellations in the sky and battles between constellations. But though stars burn longer and brighter than any craft of biology or human artifice, stars are neither optimization processes, nor products of strong optimization pressures. The stars are not gods; there is no true power in them.

Is the notion of "optimization" useful?

Is the notion of a powerful optimization process useful for our particular purpose of discussing Friendly AI?

I would argue that if something is not a powerful optimization process, then it has neither the power to convey the special and unusual benefits of AI, nor the potential to pose a special danger. And if something is a powerful optimization process, then it may either convey the special benefits of AI, or pose a special danger.

If we spot a ten-mile-wide asteroid hurtling toward Earth, then we are all in deadly jeopardy. (If we do not spot the asteroid, then we are in much worse jeopardy; the map is not the territory.) But, if we do spot the asteroid, it is still only a natural hazard. If we devise a likely plan, the asteroid itself will not oppose us, will not try to think of a counterplan. If we try to deflect the asteroid with a nuclear weapon, the asteroid will not send out interceptors to stop us. If we train lasers on the asteroid's surface (to vaporize gas that expands and deflects the asteroid), the asteroid will not mirror its surface. If we deflect that one asteroid, the asteroid belt will not send another planet-killer in its place. We might have to do some work to steer the future out of the unpleasant region, but there would be no counterforce trying to steer it back. The sun going nova might prove just as deadly to the human species as a recursively-self-improving non-Friendly AI, but the sun going nova would not be a threat of that special kind that actively resists solution, a problem that counters humanity's attempt to solve it.

A universal antiviral agent would be a tremendous benefit to human society, but it would still have no purpose of its own. Even a universal antiviral can be used for negative purposes - for example, it could be administered to soldiers on a battlefield, who could then blanket the enemy with a lethal virus. A universal antiviral would have no power to benefit humanity except insofar as we, with our own minds and intelligence, planned to wield it well. A universal antiviral does not change the nature of the game that humans play against Death. It would still be just us, alone, with our own wits, steering the future.

What we fear from Artificial Intelligence is that we may be beaten at our own game, outwitted and out-invented. Against all our best efforts the future will be steered into a region where humanity is no more; our lesser abilities crushed, like a ninth-dan professional rolling over an amateur Go player. Conversely we may also hope for a powerful optimization process that steers the world out of trouble, away from futures in which humanity is extinguished. We can hope for better outcomes than we could have produced through unaugmented human wits.

In both cases, I speak of the real-world results, and not the particular fashion in which those results are achieved. I speak of the impact that the AI has upon the world - which is what the notion of an optimization process is all about. There are many other purposes of discussion which would imply a legitimate interest in how the AI works internally. There are other purposes of discussion, under which it would not be vacuous to argue whether to call the AI "intelligent" or confer upon it the label of "mind".

But, just so long as the AI is a powerful optimization process as defined - so long as it does in fact possess an enormously strong capacity to steer reality into particular regions of possible futures - then there automatically exists the potential for a huge impact upon the world, a major reshaping of reality which greatly helps or greatly harms humankind.

Conversely, anything that lacks a powerful ability to steer the future into narrow target regions, cannot harm or help humans in the special way. An asteroid that can only steer itself toward Earth using weak puffs of gas, would be slightly more difficult, but not impossible, to fend off. We would only need to push on it more strongly than it could push back. The same goes for an optimization process that can only weakly puff on the future. Similarly, a weak beneficial puff on the future may help us a little, but it is unlikely to help us more than we could have helped ourselves.

In this restricted sense, then, the notion of a powerful optimization process is necessary and sufficient to a discussion about Artificial Intelligence that could powerfully benefit or powerfully harm humanity. If you say that an AI is mechanical and therefore "not really intelligent", and it kills you, you are still dead; and conversely, if an "unintelligent" AI cures your cancer, you are still alive.

Surface facts and deep generators

There's a popular conception of AI as a tape-recorder-of-thought, which only plays back knowledge given to it by the programmers.

Suppose that you tried to build a CPU by programming in, as separate and disconnected facts, every possible sum of two 32-bit integers. This requires a giant lookup table with 2^64 (18 billion billion) entries. Imagine the woes of a research team that tries to build this Arithmetic Expert System as a giant semantic network. They will run headlong into the "common-sense problem" for addition. Seemingly, to teach a computer addition, you must teach it a nearly infinite number of facts that humans somehow just know. Maybe the research team will launch a distributed Internet project to encode all the detailed knowledge necessary for addition. Or maybe they'll try to buy a supercomputer, on the theory that past projects to create Artificial Addition failed because of inadequate computing power.

A compact description of the underlying rules of arithmetic (e.g. the axioms of addition) can give rise to a vast variety of surface facts (e.g. that 953,188 + 12,152 = 965,340). Trying to capture the surface phenomenon, rather than the generator, rapidly runs into the problem of needing to capture an infinite number of surface facts.
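As a toy illustration (scaled down to 4-bit integers, since a real 2^64-entry table is infeasible), here is the difference between storing surface facts and capturing the generator - the code and names are mine, not from any source:

```python
# Contrast the "giant lookup table" approach to addition with the compact
# generator. With 4-bit integers the table has only 256 entries; with
# 32-bit integers it would need 2**64 entries, while the generator below
# would barely grow at all.

BITS = 4

# Surface facts: one stored entry per possible pair of 4-bit integers.
lookup_table = {(a, b): a + b for a in range(2**BITS) for b in range(2**BITS)}

# Deep generator: ripple-carry addition derived from the rules of
# binary arithmetic.
def add(a, b):
    result, carry = 0, 0
    for i in range(BITS + 1):  # sum of two 4-bit numbers needs 5 bits
        bit_a, bit_b = (a >> i) & 1, (b >> i) & 1
        result |= (bit_a ^ bit_b ^ carry) << i
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
    return result
```

The generator reproduces every entry in the table, and unlike the table, it scales to wider integers by changing one constant rather than squaring the storage.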

You cannot build Deep Blue (the famous program that defeated world chess champion Garry Kasparov in 1997) by programming in a good chess move for every possible chess position. First of all, it is impossible to build a chess player this way, because you don't know exactly which positions it will encounter. You would have to record a specific move for zillions of positions, more than you could consider in a lifetime with your slow neurons. And second, even if you did this, the resulting program would not play chess any better than you do. That is the peril of recording and playing back surface phenomena, rather than capturing the underlying generator.

Deep Blue played chess barely better than the world's top humans, but a heck of a lot better than its own programmers. Deep Blue's programmers could play chess, of course - they had to know the rules - but the programmers didn't play chess anywhere near as well as Kasparov or Deep Blue. Deep Blue's programmers didn't just capture their own chess-move generator. If they'd captured their own chess-move generator, they could have avoided the problem of programming an infinite number of chess positions - but they couldn't have beaten Garry Kasparov; they couldn't have built a program that played better chess than any human in the world. The programmers built a better move generator - one that more powerfully steered the game toward the target of winning game positions. Deep Blue's programmers have some slight ability to find chess moves that aim at this same target, but their steering ability is much weaker than Deep Blue's.

Does this seem paradoxical? Maybe it seems paradoxical, but remember that it actually happened - the programmers did actually build Deep Blue, it did actually make moves the programmers could never have thought of, and it did actually beat Kasparov. You can call this "paradoxical", if you like, but it remains a fact. It is likewise "paradoxical" but true that Garry Kasparov was not born with a complete library of chess moves programmed into his DNA. Kasparov invented his own moves; he was not explicitly preprogrammed by evolution to make particular moves - though natural selection did build a brain that could learn. And Deep Blue's programmers invented Deep Blue's code without evolution explicitly encoding Deep Blue's code into their genes.

Steam shovels lift more weight than humans can heft, skyscrapers are taller than their human builders, humans play better chess than natural selection, and computer programs play better chess than humans. The creation can exceed the creator. Call this paradoxical, if you like, but it happens in real life. You can deliberately create a move-chooser that chooses according to a different rule than you yourself employ. You can call it a great and sacred magic, if you like, that humans can invent new strategies which blind unthinking evolution did not explicitly preprogram into us. But Deep Blue also made moves beyond the ability of its programmers. So if there is a sacred magic, it is a sacred magic which AI programmers can infuse into computer programs.

Answers and questions

If I want to create an AI that plays better chess than I do, I have to program a search for winning moves. I can't program in specific moves because then the chess player won't be any better than I am.

This holds true on any level where an answer has to meet a sufficiently high standard. If you want any answer better than you could come up with yourself, you necessarily sacrifice your ability to predict the exact answer in advance.

But do you necessarily sacrifice your ability to predict everything?

As my coworker, Marcello Herreshoff, says: "We never run a program unless we know something about the output and we don't know the output." Deep Blue's programmers didn't know which moves Deep Blue would make, but they must have known something about Deep Blue's output which distinguished that output from the output of a pseudo-random move generator. After all, it would have been much simpler to create a pseudo-random move generator; but instead the programmers felt obligated to carefully craft the complex program that is Deep Blue. In both cases, the programmers wouldn't know the move - so what was the key difference? What was the fact that the programmers knew about Deep Blue's output, if they didn't know the output?

Imagine that the programmers had said to themselves, "Well, if we knew what Deep Blue's move would be, it couldn't possibly play any better than we could. So we need to make sure we don't know Deep Blue's move. So we'll use a random move generator. Problem solved!" One thing is for sure, the resulting program wouldn't have played good chess. Of course, the programmers might be able to convince themselves that the program would play well... after all, they don't know where the program will move, and they don't know what the best move is, so it cancels out, right?

Intelligence and probability

Calibrating predictions about intelligence

Imagine that I'm playing chess against a smarter opponent. If I could predict exactly where my opponent would move on each turn, I would automatically be at least as good a chess player as my opponent. I could just ask myself where my opponent would move, if she were in my shoes; and then make the same move myself. (In fact, to predict my opponent's exact moves, I would need to be superhuman - I would need to predict my opponent's exact mental processes, including her limitations and her errors. It would become a problem of psychology, rather than chess.)

So predicting an exact move is not possible, but neither is it true that I have no information about my opponent's moves. Personally, I am a very weak chess player (I play an average of maybe two games per year). But even if I'm playing against former world champion Garry Kasparov, there are certain things I can predict about his next move. When the game starts, I can guess that the move P-K4 is more likely than P-KN4. I can guess that if Kasparov has a move which would allow me to checkmate him on my next move, Kasparov will not make that move. Much less reliably, I can guess that Kasparov will not make a move that exposes his queen to my capture - but here, I could be greatly surprised; there could be a rationale for a queen sacrifice which I have not seen.

And finally, of course, I can guess that Kasparov will win the game! Supposing that Kasparov is playing black, I can guess that the final position of the chess board will occupy the class of positions that are wins for black. I cannot predict specific features of the board in detail; but I can narrow things down relative to the class of all possible ending positions.

But I am not actually certain that Kasparov will win. It's extremely likely, but not certain. Such knowledge is made up of probabilities, not sureties. For our purposes here, a "probability" is a guess to which a number is attached, indicating how often you expect to be correct about that kind of guess.

If you're well-calibrated in your probabilities, it means that if we keep track of all the guesses where you say "sixty percent", about 6 in 10 of those guesses turn out to be correct. On the other hand, if you go around declaring that you are "ninety-eight percent certain" of something, and about 7 in 10 of those guesses turn out to be correct, we will say you are poorly calibrated.
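Calibration is easy to check mechanically. Here is a hypothetical sketch (the guessers and their hit rates are invented for illustration): tag each guess with its stated probability, then tally how often guesses with each tag came true.

```python
import random

# Toy calibration check: group guesses by their stated probability and
# measure the actual frequency of correctness within each group.

def calibration(guesses):
    """guesses: list of (stated_probability, came_true) pairs."""
    by_p = {}
    for p, correct in guesses:
        by_p.setdefault(p, []).append(correct)
    return {p: sum(v) / len(v) for p, v in by_p.items()}

random.seed(0)
# A well-calibrated guesser: events tagged 0.6 really happen ~60% of the time.
good = [(0.6, random.random() < 0.6) for _ in range(10000)]
# A poorly calibrated guesser: says "98% certain", right only ~70% of the time.
bad = [(0.98, random.random() < 0.7) for _ in range(10000)]
```

Running `calibration(good)[0.6]` comes out near 0.6, while `calibration(bad)[0.98]` comes out near 0.7, far from the stated 0.98 - the signature of poor calibration.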

(Mr. Spock of Star Trek is extremely poorly calibrated; he often says something like "Captain, if you steer the Enterprise directly into that black hole, our probability of surviving is only 2.234%" and yet nine times out of ten the Enterprise is not destroyed. What kind of tragic fool gives four significant digits for a figure that is off by two orders of magnitude? But then Spock is no more skilled a rationalist than the scriptwriters who produce his dialogue, for if you knew exactly what a great rationalist would say, you would be that rational yourself.)

If I play chess against a superior opponent, and I don't know for certain where my opponent will move, I can still produce a probability distribution that is well-calibrated - in the sense that, over the course of many games, legal moves that I label with a probability of "ten percent" are made by the opponent around 1 time in 10. That is my goal in the task of fine-tuning my own uncertainty: when I say "ten percent", around 1 time in 10 that event should happen; neither more often nor less; neither 1 time in 100, nor 1 time in 4, but 1 time in 10.

You might ask: Is producing a well-calibrated distribution over Kasparov beyond my abilities as an inferior chess player? The answer is a definite no! There is a trivial way to produce a well-calibrated probability distribution. If my opponent has 37 legal moves, I can assign a probability of 1/37 to each move. This is called a maximum-entropy distribution, representing my total ignorance - I have no idea where my opponent might move; all legal moves seem equally likely to me. (Note: "Maximum entropy" is a mathematical term, not just a colloquial way of saying "totally ignorant". There is a way to calculate the "entropy" of the probability distribution, and 1/37 for each legal move is the unique distribution that maximizes the calculated "entropy".) If I give the maximum-entropy distribution as my reply, then I am perfectly calibrated. Why? Because I assigned 37 different moves a probability of 1 in 37, and exactly one of those moves will happen, so I applied the label "1 in 37" to 37 different events and exactly 1 of those events occurred.
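The entropy calculation mentioned above is short enough to show directly - a sketch, with an invented "confident" distribution for contrast:

```python
import math

# Entropy of a probability distribution, in bits. The uniform distribution
# over 37 legal moves maximizes this quantity, at log2(37) ~ 5.21 bits;
# any more opinionated distribution has lower entropy.

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

uniform = [1 / 37] * 37                # total ignorance: maximum entropy
confident = [0.9] + [0.1 / 36] * 36    # a strong guess about one move
```

Here `entropy(uniform)` equals log2(37) exactly, and `entropy(confident)` is strictly smaller, reflecting the information the confident distribution claims to have.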

But total ignorance is not very useful, even if you confess it honestly. So the question then becomes whether I can do better than maximum entropy. Can one perfectly calibrated predictor still be better than another? Yes. Let's say that you and I both answer a quiz with ten questions. You assign probabilities of 90% to your answers, and get one answer wrong. I assign probabilities of 80% to my answers, and get two answers wrong. We are both perfectly calibrated, but you exhibited better discrimination - your answers more strongly distinguished truth from falsehood.
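A logarithmic score makes the quiz comparison quantitative - this is one standard way (not the only one) to reward discrimination:

```python
import math

# Log score (higher is better): sum the log-probability each player
# assigned to what actually turned out to be true. Both quiz players are
# perfectly calibrated, but the 90% player concentrated more probability
# mass on the truth.

def log_score(assignments):
    """assignments: probabilities assigned to the actual outcomes."""
    return sum(math.log2(p) for p in assignments)

you = [0.9] * 9 + [0.1]        # 90% on each answer; one wrong, truth got 10%
me  = [0.8] * 8 + [0.2] * 2    # 80% on each answer; two wrong
```

With these numbers `log_score(you)` is about -4.69 bits against -7.22 bits for `log_score(me)`: the better-discriminating predictor wins.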

(For more on this subject, see Technical Explanation.)

I can assign a well-calibrated probability distribution over the chess moves of a stronger opponent, even though I'm not certain. If I'm almost totally ignorant, I can still assign a well-calibrated distribution - but it will closely approach the maximum-entropy distribution that assigns equal probability to all legal moves. "Strong confidence" is when you assign probabilities that approach 1.0 or 0.0 - you label one specific outcome "nearly certain" and the others "nearly impossible". That which we call "honest ignorance" is when you assign roughly equal probabilities to most possibilities - you have no idea what might happen; all outcomes seem equally likely to you. In between is "guessing", where some outcomes seem more likely than others, but no outcome has a probability approaching 1.0.

Entropy versus creativity

Suppose that someone shows me an arbitrary chess position, and asks me: "What move would Kasparov make if he played black, starting from this position?" Since I'm not nearly as good a chess player as Kasparov, I can only weakly guess Kasparov's move, and I'll assign a non-extreme probability distribution to Kasparov's possible moves. In principle I can do this for any legal chess position, though my guesses may approach maximum entropy. If you put me in a box and feed me chess positions and get probability distributions back out, then we would have - theoretically speaking - a system that produces Yudkowsky's guess for Kasparov's move in any chess position. We shall suppose (though it may be unlikely) that my prediction is well-calibrated, if not very discriminating.

Now suppose we turn "Yudkowsky's prediction of Kasparov's move" into an actual chess opponent, by having a computer randomly make moves at the exact probabilities I assigned. We'll call this system RYK, which stands for "Randomized Yudkowsky-Kasparov", though it should really be "Random Selection from Yudkowsky's Probability Distribution over Kasparov's Move."
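A minimal sketch of the RYK construction in code - the move names and probabilities here are invented for illustration, not an actual prediction of Kasparov:

```python
import random

# RYK: turn a predictor's probability distribution into a player by
# sampling a move at exactly the predicted probabilities.

def ryk_move(distribution, rng):
    """distribution: dict mapping moves to predicted probabilities."""
    moves = list(distribution)
    weights = [distribution[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]

# An invented prediction for the opening move.
prediction = {"P-K4": 0.4, "P-Q4": 0.3, "N-KB3": 0.25, "P-KN4": 0.05}
rng = random.Random(0)
moves = [ryk_move(prediction, rng) for _ in range(10000)]
```

Over many samples, even a move the predictor considers a blunder - P-KN4 at 5% - gets played about 5% of the time, which is exactly why RYK plays so much worse than the player it models.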

Will RYK be as good a player as Kasparov? Definitely not! Sometimes the RYK system will randomly make dreadful moves which the real-life Kasparov would never make - such as opening the game with P-KN4. I assign such moves a low probability, but sometimes the computer makes them anyway, by sheer random chance. The real Kasparov also sometimes makes moves that I assigned a low probability, but only when the move has a better rationale than I realized - the astonishing, unanticipated queen sacrifice.

Randomized Yudkowsky-Kasparov is definitely no smarter than Yudkowsky, because RYK draws on no more chess skill than I myself possess - I build all the probability distributions myself, using only my own abilities. Actually, RYK is a far worse player than Yudkowsky. I myself would make the best move I saw with my knowledge. RYK only occasionally makes the best move I saw - I won't be very confident that Kasparov would make exactly the same move I would.

Now suppose that I myself play a game of chess against the RYK system.

RYK has the odd property that, on each and every turn, my probabilistic prediction for RYK's move is exactly the same prediction I would make if I were playing against world champion Garry Kasparov.

Nonetheless, I can easily beat RYK, where the real Kasparov would crush me like a bug.

The creative unpredictability of intelligence is not like the noisy unpredictability of a random number generator. When I play against a smarter player, I can't predict exactly where my opponent will move against me. But I can predict the end result of my smarter opponent's moves, which is a win for the other player. When I see the randomized opponent make a move that I assigned a tiny probability, I chuckle and rub my hands, because I think the opponent has randomly made a dreadful move and now I can win. When a superior opponent surprises me by making a move to which I assigned a tiny probability, I groan because I think the other player saw something I didn't, and now I'm about to be swept off the board. Even though it's exactly the same probability distribution! I can be exactly as uncertain about the actions, and yet draw very different conclusions about the eventual outcome. (Technical note: This situation is possible because I am not logically omniscient; I do not explicitly represent a joint probability distribution over all entire games.)

When I play against a smarter player, I can't predict exactly where my opponent will move against me. If I could predict that, I would necessarily be at least that good at chess myself. But I can predict the consequence of the unknown move, which is a win for the other player; and the more the player's actual action surprises me, the more confident I become of this final outcome.

The unpredictability of intelligence is a very special and unusual kind of surprise, which is not at all like noise or randomness. There is a weird balance between the unpredictability of actions and the predictability of outcomes.

What is the empirical content of beliefs about intelligence?

The strength of a hypothesis is determined by its simplicity and by the amount of probability mass it concentrates into the exact outcome observed. For example, suppose that I predict that the price of a cookie on Tuesday will be between 1 and 50 cents, while you predict that the price will be between 31 and 35 cents. If the price is 34 cents, both of our predictions came true, but yours concentrated ten times as much probability mass into the exact outcome of 34. Guessing an outcome between 1 and 50, without further specification, is like assigning a 2% probability to each of 50 possible numbers, while guessing an outcome between 31 and 35 is like assigning 20% probability to each of 5 possible numbers. The more probability mass your hypothesis concentrates into the actual observed outcome, the better you do. If a hypothesis is unfalsifiable - if you can make any observation seem to fit the hypothesis equally well - then the hypothesis doesn't concentrate its probability mass at all; it is a disguised maximum-entropy probability distribution, which is to say, a cleverly masked form of total ignorance. (For more on this, again see Technical Explanation.)
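The cookie-price comparison can be checked with a few lines of Python (a sketch of the text's own arithmetic; exact fractions are used only for clarity):

```python
from fractions import Fraction

# Each hypothesis spreads its probability mass uniformly over the
# integer cent values it predicts.
def mass_per_outcome(low, high):
    return Fraction(1, high - low + 1)

broad = mass_per_outcome(1, 50)    # 1/50 on each of 50 possible prices
narrow = mass_per_outcome(31, 35)  # 1/5 on each of 5 possible prices
print(narrow / broad)  # -> 10: the narrow hypothesis concentrates 10x the mass
```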

Since I am so uncertain of Kasparov's move, what is the empirical content of my belief that "Kasparov is a highly intelligent chess player"? What real-world experience does my belief tell me to anticipate? Is it a cleverly masked form of total ignorance?

To sharpen the dilemma, suppose Kasparov plays against some mere chess grandmaster Mr. G, who's not in the running for world champion. My own ability is far too low to distinguish between these levels of chess skill. When I try to guess Kasparov's move, or Mr. G's next move, all I can do is try to guess "the best chess move" using my own meager knowledge of chess. Then I would produce exactly the same prediction for Kasparov's move or Mr. G's move in any particular chess position. So what is the empirical content of my belief that "Kasparov is a better chess player than Mr. G"?

The empirical content of my belief is the testable, falsifiable prediction that the final chess position will occupy the class of chess positions that are wins for Kasparov, rather than drawn games or wins for Mr. G. (Counting resignation as a legal move that leads to a chess position classified as a loss.) The degree to which I think Kasparov is a "better player" is reflected in the amount of probability mass I concentrate into the "Kasparov wins" class of outcomes, versus the "drawn game" and "Mr. G wins" class of outcomes. These classes are extremely vague in the sense that they refer to vast spaces of possible chess positions - but "Kasparov wins" is more specific than maximum entropy, because it can be definitely falsified by a vast set of chess positions.

The outcome of Kasparov's game is predictable because I know, and understand, Kasparov's goals. Within the confines of the chess board, I know Kasparov's motivations - I know his success criterion, his utility function, his target as an optimization process. I know where Kasparov is ultimately trying to steer the future and I anticipate he is powerful enough to get there, although I don't anticipate much about how Kasparov is going to do it.

How exactly do I describe "where Kasparov is trying to steer the future"? In the case of chess, there's a simple function that classifies chess positions into wins for black, wins for white, and drawn games. If I know which side Kasparov is playing, I know the class of chess positions Kasparov is aiming for. (If I don't know which side Kasparov is playing, I can't predict whether black or white will win - which is not the same as confidently predicting a drawn game.)

More generally, I can describe motivations using a preference ordering. When I consider two potential outcomes, A and B, I can say that I prefer A to B, prefer B to A, or find myself indifferent between them. I would write these relations as A > B, B > A, or A ~ B. Suppose your preferences follow the ordering A < B < C ~ D ~ E ~ F < G ~ H ~ I. Then you like B more than A, and C more than B. But {C, D, E, F} all belong to the same class and seem equally desirable to you; you are indifferent as to which of {C, D, E, F} you receive, though you would rather have any of them than B, and you would rather have G (or H, or I) than any of C, D, E, or F.
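One minimal way to model such an ordering is to rank indifference classes; this sketch just encodes the example ordering above, and the names and ranks are illustrative:

```python
# Indifference classes for A < B < C ~ D ~ E ~ F < G ~ H ~ I,
# ranked from least preferred (0) to most preferred (3).
RANK = {'A': 0, 'B': 1, 'C': 2, 'D': 2, 'E': 2, 'F': 2, 'G': 3, 'H': 3, 'I': 3}

def prefer(x, y):
    """Compare two outcomes: '>', '<', or '~' (indifference)."""
    if RANK[x] > RANK[y]:
        return '>'
    if RANK[x] < RANK[y]:
        return '<'
    return '~'

print(prefer('C', 'B'))  # -> >
print(prefer('C', 'D'))  # -> ~
print(prefer('A', 'G'))  # -> <
```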

When I think you're a powerful intelligence, and I think I know something about your preferences, then I'll predict that you'll steer reality into regions that are higher in your preference ordering. Think of a huge circle containing all possible outcomes, such that outcomes higher in your preference ordering appear to be closer to the center. Outcomes between which you are indifferent are the same distance from the center - imagine concentric rings of outcomes that are all equally preferred. If you aim your actions and strike a consequence close to the center - an outcome that ranks high in your preference ordering - then I'll think better of your ability to aim.

The more intelligent I believe you are, the more probability I'll concentrate into outcomes that I believe are higher in your preference ordering - that is, the more I'll expect you to achieve a good outcome, and the better I'll expect the outcome to be. Even if a powerful enemy opposes you, so that I expect the final outcome to be one that is low in your preference ordering, I'll still expect you to lose less badly if I think you're more intelligent.

Side effects

Suppose that at the end of the game, I count the number of pieces on white squares, subtract the number of pieces on black squares, and ask whether the resulting number is odd or even - call this the "parity of the board". I don't know what the board parity will be at the end of the game; I assign 50/50 odds to two possibilities, representing my complete ignorance. The reason I can't make any prediction is that Kasparov doesn't care about the board's parity - there's no term for board parity in Kasparov's preference function, that I know of.

The exact final state of the board is determined by Kasparov and his opponent, both trying to steer the chess game. Their actions repeatedly affect the board's parity. Otherwise the board would just keep the even parity it has at the start of the game. But neither Kasparov nor his opponent cares specifically about the parity of the board - they aren't paying attention to it. Not caring about something isn't the same as wanting to leave it untouched. Neither Kasparov nor Mr. G has an explicit term in his preference function for the board parity - they don't even notice the board parity - but this does not imply that the board parity remains unchanged throughout the game. From my perspective, Kasparov and Mr. G randomize the board parity as a side effect of influencing the properties they do care about.

Quantifying optimization and intelligence

We are now ready to define quantitatively the power of an optimization process. In addition to the notion of a preference ordering, already introduced, we'll need the further concept of a state space of possible plans, possible designs, or possible outcomes. For example, looking at a Toyota Corolla, we could regard the state space as the set of all possible molecular configurations of the same atoms.

Given a description of what is possible, and a preference ordering over the possibilities, then I can look at the outcome actually achieved - for example, the actual design of the Toyota Corolla - and ask:

How many possibilities in the state space would be as good or better than the actual outcome, under the preference ordering? How many possibilities are there, total, in the entire state space?

Divide the first number by the second. The result is the fraction of outcomes as-good-or-better within the total space of possibilities. This gives you a quantifiable measure of how small a target the optimization process was able to hit.

If you take the base two logarithm of the reciprocal of this fraction, that gives you the power of an optimization process measured in bits.

For example, suppose there are 1024 possible outcomes, and you achieve an outcome X. And suppose that there are only 4 possible outcomes that you regard as "as good as X or better", including X itself. Then only 1 in 256 possible outcomes are "as good or better" than the outcome actually achieved. An optimization process that reliably hits this close to the center does 8 bits of optimization.

(The mathematically sophisticated will recognize that I am measuring the entropy of something. We might call it the entropy of a system relative to a preference ordering. As always in our universe where Liouville's Theorem holds, it takes work to reduce entropy - any kind of entropy.)

It's about equally difficult to do 8 bits of optimization whether there are only 4 satisfactory outcomes in a space that contains 1024 possibilities, or only 1,000,000 satisfactory outcomes in a space that contains 256,000,000. In either case, the relative size of the target is the same. In either case, you would need to randomly search around 256 cases to find a satisfactory outcome, if you didn't have any way to search more efficiently.
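Both numeric examples reduce to the same one-line formula; a minimal sketch:

```python
import math

def optimization_bits(num_as_good_or_better, total_outcomes):
    # log2 of the reciprocal of the fraction of as-good-or-better outcomes.
    fraction = num_as_good_or_better / total_outcomes
    return math.log2(1 / fraction)

print(optimization_bits(4, 1024))                 # -> 8.0
print(optimization_bits(1_000_000, 256_000_000))  # -> 8.0 (same relative target size)
print(round(optimization_bits(1, 1_000_000), 2))  # -> 19.93 (about 20 bits)
```

The third line corresponds to the one-in-a-million case discussed just below: hitting a target that only one outcome in a million satisfies is about 20 bits of optimization.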

You may also find it convenient to think in terms of utility functions, a kind of preference that is more structured than simple ordering. A utility function is when you can assign a real number saying exactly how much you want something - for example, you might assign a utility of 15 to eating chocolate ice cream, a utility of 10 to eating vanilla ice cream, and a utility of 0 to receiving no ice cream. Then you would prefer chocolate ice cream to vanilla ice cream; you would also prefer a 70% chance of receiving chocolate ice cream to a 100% chance of receiving vanilla ice cream.

You could also measure the fraction of all possible outcomes with utility greater than or equal to e.g. 42, and thereby get the observed power of an optimization process. If only one outcome in a million has a utility of 42 or better, then reliably achieving an outcome this good would require around 20 bits of optimization.

The two known powerful optimization processes in this universe, human intelligence and natural selection, both produce outcomes that are vastly improbable - thousands of bits or more. The usual analogy is "How long does it take a monkey randomly hitting typewriter keys to type the complete works of Shakespeare?" If you relax your requirements by allowing the monkey to produce any work of length and quality equal to a Shakespearean play (as judged by a fair-minded literary critic), it still takes a very long time. Program your computer to show you random strings of letters and punctuation, and see how long it takes to produce a single comprehensible sentence, let alone a page. It doesn't take much optimization pressure to leave the space of things that pure randomness could produce in a mere billion ages of the universe.
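The monkey arithmetic can be made concrete for the simpler exact-match case (an assumption for illustration: a 27-symbol alphabet of 26 letters plus space):

```python
import math

def string_bits(n, alphabet_size=27):
    # Bits of optimization needed to hit one exact n-character string by chance.
    return n * math.log2(alphabet_size)

print(round(string_bits(10)))    # -> 48: within reach of brute-force search
print(round(string_bits(1000)))  # -> 4755: beyond any universe-lifetime of random tries
```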

Could we recognize an alien intelligence?

Could I recognize an alien intelligence as exceptionally smart, without understanding the alien mind's motivations, the way I understand Kasparov's goal in chess?

I could land on an alien planet and discover what seemed to be a highly sophisticated machine, all gleaming chrome as the stereotype demands. Can I recognize this machine as being in any sense well-designed, if I have no idea what the machine is intended to accomplish? Can I guess that the machine's makers were intelligent, without guessing their motivations?

I could examine a piece of the machine under a microscope, and discover billions of tiny transistors. What are the transistors computing? I don't know. But I can still recognize well-designed transistors and guess that the machine is computing something. Many different possible computing problems would all require the aliens to solve the same subproblem of efficiently processing information.

I can look at cables through which large electrical currents are running, and be astonished to realize that the cables are flexible, high-temperature, high-amperage superconductors - an amazingly good solution to the subproblem of transporting electricity that is generated in one location and used in another.

I can look at gears, whirring rapidly, and imagine that if those gears had random shapes, they would clash and fly apart and generate destructive internal forces - the gears seem to have been selected from a tiny subset of possible whirring shapes, such that the shapes mesh when they rotate.

In this scenario I have just imagined, what I recognize within the alien machine are well-optimized subgoals similar to the subgoals of human engineers. Subgoals might overlap even if the final goals are widely different from our own. I might also be able to infer a subproblem by inspecting a part of the machine, much more easily than I could infer the alien's psychological desires and final purposes.

If there are no subproblems to which I can recognize a good solution, then I can't recognize the machine as a "machine"! Think back to the Toyota Corolla: only an infinitesimal fraction of its state space describes "vehicles" of equal or greater speed, efficiency, safety, reliability, and comfort. This is something to remark upon when I see the Corolla (or any other car, even a Model-T). But if I don't see any criterion for the parts or the whole, so that, as far as I know, a random volume of air molecules or a clump of dirt would be just as surprising, just as worthy of remark, then why am I focusing on this particular object and saying, "Here is a machine"? Why not say the same about a cloud or a rainstorm? Why is it a good hypothesis to suppose that intelligence or any other optimization process played a role in selecting the form of what I see, any more than it is a good hypothesis to suppose that the dust particles in my room are arranged by dust elves?

Even the gleaming chrome exterior of the machine is a solution to the subproblem of protecting the machine's internal parts from the environment. If the machine is made of hard materials which retain their shape over time, then that is a solution to making a function persistent - ensuring that an invention, once it is designed and built, continues functioning over time.

If you can't identify any optimization target at all, you don't have optimization, you just have noise. Every possible configuration would appear to equally fit the criterion; every possible configuration would be assigned equal probability; nothing you could observe would falsify the theory. This is a hypothesis of maximum entropy.

Creativity and breaking the rules

Creativity is surprising - but not just any kind of surprise counts as a creative surprise. Suppose I set up an experiment involving a quantum event of very low amplitude, such that the macroscopic probability is a hundred million to one. If the event is actually observed to occur, it is a happenstance of extremely low probability, and in that sense surprising. But it is not a creative surprise. Surprisingness is not a sufficient condition for creativity.

Creativity, as we all know, involves breaking the rules - but not all the rules. If everyone builds their cars from iron triangles, and I build a better car using bronze squares, then that is a creative surprise. I broke the surface rules normally used to invent solutions, and I built a better car thereby. Ordinarily, one would expect a car built from bronze squares to catch fire and explode; and yet this car starts up and drives to the supermarket. How unexpected! How surprising! But the result must still be a car. If I tried to make a better car from bronze squares, and failed completely, ending up with a heap of scrap metal, there would be nothing surprising about that. More experienced engineers would just shake their heads wisely and say, "That's why we use iron triangles, kiddo."

The pleasant shock of witnessing Art arises from the constraints of Art - from watching a skillful archer send an arrow into an exceedingly narrow target. Static on a television screen is not beautiful, it is noise.

In the strange domain known as Modern Art, people sometimes claim that their goal is to break the rules, to defy convention, for its own sake. They put up a blank square of canvas, and call it a painting; and by now that is considered staid and boring Modern Art, because a blank square of canvas still hangs on the wall and has a frame. What about a heap of garbage? That can also be Modern Art! Surely, this demonstrates that true creativity knows no rules, and even no goals...

But the rules are still there, though unspoken. I could submit a realistic landscape painting as Modern Art, and this would be rejected because it violates the rule that Modern Art cannot delight the untrained senses of a mere novice. Or better yet, if a heap of garbage can be Modern Art, then I'll claim that someone else's heap of garbage is my work of Modern Art - boldly defying the convention that I need to produce something for it to count as my artwork. Or what about the pattern of dust particles on my desk? Isn't that Art? Flushed with triumph, I present to you an even bolder, more convention-defying work of Modern Art - a stunning, outrageous piece of performance art that, in fact, I never performed. I am defying the foolish convention that I need to actually perform my performance art for it to count as Art.

Now, up to this point, I probably could still get a grant from the National Endowment for the Arts, and get sophisticated critics to discuss my shocking, outrageous non-work, which boldly violates the convention that art must be real rather than imaginary. But now suppose that I go one step further, and refuse to tell anyone that I have performed my work of non-Art. I even refuse to apply for an NEA grant. It is the work of Modern Art that never happened and that no one knows never happened; it exists only as my concept of what I am supposed not to conceptualize. Better yet, I will say that my Modern Art is your non-conception of something that you are not conceptualizing. Here is the ultimate work of Modern Art, that truly defies all rules: It isn't mine, it isn't real, and no one knows it exists...

And this ultimate rulebreaker I could not pass off as Modern Art, even if NEA grant committees knew that no one knew it existed. For one thing, they would realize that I was making fun of them - and that is an unspoken rule of Modern Art that no one dares violate. You must take yourself seriously. You must break the surface rules in a way that allows sophisticated critics to praise your boldness and defiance with a straight face. This is the unwritten real goal, and if it is not achieved, all efforts are for naught. Whatever gets sophisticated critics to praise your rule-breaking is good Modern Art, and whatever fails in this end is poor Modern Art. Within that unalterable constraint, you can use whatever creative means you like.

But let us turn from Modern Art to more conventional forms of creativity, such as engineering. Does creative engineering sometimes involve altering your goals? First my goal was to try and figure out how to build a car using iron triangles; now my goal is to build a car using bronze squares...

Creativity clearly involves altering my local intentions, my what-I'm-trying-to-do-next. I begin by intending to configure iron triangles, to build a car, to drive to the supermarket, to buy food, to eat food, so that I don't starve to death, because I prefer being alive to starving to death. I may creatively use bronze squares, instead of iron triangles; creatively walk, instead of driving; creatively drive to a gas station, instead of a supermarket; creatively grow my own vegetables, instead of buying them; or even creatively devise a way to run my body on electricity, instead of chemical energy. What does not count as "creativity" is creatively preferring to starve to death, rather than eating. This "solution" does not strike me as very impressive; it involves no effort, no intelligence, and no surprises. If this is someone's idea of how to break all the rules, they would become pretty easy to predict.

Are there cases where you genuinely want to change your preferences? You may look back in your life and find that your moral beliefs have changed over decades, and that you count this as progress. Civilizations also change their morals over time. In the seventeenth century, people used to think it was okay to enslave people with differently colored skin; and now we think otherwise.

The notion of "change in preferences" gets into Friendly AI issues which are far beyond the scope of this particular essay - though see Coherent Extrapolated Volition.

But you might guess by now, you might somehow intuit, that if these moral changes seem interesting and important and vital and indispensable, then not just any change would suffice. You might suspect that you're judging potential changes as better or worse, even if you can't consciously, verbally report the rules that govern your intuitive perceptions. If there's no criterion, no target, no way of choosing - then your current point in state space is just as good as any other point, no more, no less; and you might as well keep your current state, unchanging, forever.

Every improvement is necessarily a change, but not every change is an improvement. If all you learn from observing a history of improvements is that "change is good", and so you chase after change, any change - then that's rather like the dogs in Pavlov's famous experiment who salivated at the sound of a bell, whether or not the bell was accompanied by meat. You've trained yourself to chase the wrong stimulus.

The supposed role of randomness in intelligence

Now imagine forgetting everything you've just read, and approaching the problem from a purely instinctive perspective. You might instinctively think something like this:

When someone shows me how to build a toaster that's vastly more efficient than any toaster I've ever seen before, I'm surprised.

When I thought someone was trustworthy, and then it turns out they embezzled all the money from my bank account, I'm surprised.

It's clear that you can't be super-smart without generating surprises.

Therefore a smarter-than-human AI might surprisingly decide to kill humans.

The reasoning here follows the form:

Major premise: All oranges are fruits.

Minor premise: All apples are fruits.

Therefore, all oranges are apples.

When you describe different events using the same word "surprise", they don't thereby become the same sort of thing. And it doesn't follow that one kind of surprise implies the other. Marvin Minsky labeled this the problem of "suitcase words" - when you describe many different phenomena using the same word, and then reason about them as if they were indistinguishable.

If an AI is unpredictable in its exact actions, must it be unpredictable in its optimization target - its motives, its goals - or in the consequences of its actions? So far I have argued that there is no logical necessity to this effect - it is not "paradoxical" for Deep Blue to predictably make unpredictably good chess moves.

But one might still argue that there is a pragmatic necessity for some sort of genuine unpredictability as to motives. Maybe cognition must make internal use of chaos/randomness/noise in order to work effectively, and these chaotic internal algorithms will give rise to surprising surface behavior of the "surprisingly kill humans" type.

Chaos, of itself, is not dangerous - or at least, it's not a danger on the special level of AI. If you send a string of random ones and zeroes to motor output, that causes the AI to jerk around randomly, but it doesn't cause the AI to go on a killing spree - the resulting actions will not be optimized to cause harm to humans. Rather, the idea seems to be that an AI whose cognitive processes make use of noise, even if designed to be Friendly, has an unavoidable probability of going on a deliberate killing spree.

In other words, it's argued that a mind, searching for plans that strike close to the center of the criterion of helping humans, can only search effectively, using chaotic search methods that can potentially output motor actions coherently optimized to kill humans. It's argued that a really smart AI must include noisy cognitive processes that potentially do this. It's argued that to strike at the center of an optimization target, there is no way to get a really good aim, without using an aiming process that has so much unpredictability in it, that it can potentially end up aiming somewhere else - even the exact opposite direction.

But wait - why should cognition run on randomness? Why does this make any more sense than cognition running on peanut butter?

Maybe people observe that intelligence generates "surprises", and conclude that intelligence must run on surprise-stuff as fuel. There is a well-known principle of magic called the Law of Similarity which states that Effects Resemble Causes, which is why, in prescientific cultures, there are rituals like pouring water on the ground to summon rain. Similarly, if objects catch on fire and burn, the cause must be a mysterious fire-stuff called "phlogiston"...

But there are more serious arguments for randomness playing a role in cognition, so let's address those first.

Calculating the power of pure randomness

You may note a certain trend in this essay: I've been arguing that noise hath no power, nor yet beauty from entropy, nor strength from randomness.

We can formalize this argument, using the concepts of a state space of possibilities, a preference ordering, and a fraction that describes the proportion of possibilities "as good or better" than some example. Or, if you prefer to think in terms of utility functions, then consider the fraction of all possible outcomes with utility greater than or equal to 42.

Suppose this fraction is 0.02: only 2% of all outcomes are outcomes with a utility of 42 or higher. And then suppose you observe an outcome with a utility of 42 (say, the car starts up immediately and drives to the supermarket in 8 minutes using a tenth of a gallon of gas). Then the likelihood of getting an outcome this good by pure chance is, obviously, 2%.

This may not sound very profound. But you may have heard people talking about emergence as if it could be used to explain complex, functional orders. People will say that the complex functional order of an ant colony emerges - as if, starting from ants that had been selected only to function as solitary individual ants, they got together in a group for the first time and the highly useful order of an ant colony popped right out. Actually, the complex order of the ant colony was produced by natural selection, the nonchance retention of chance mutations. A million mutations occur; by chance, one mutation builds an organism which reproduces more frequently. Because organisms which reproduce more often produce more copies of the genes they carry, the one mutant in a million that is lucky may become universal in the gene pool. This cycle repeats over, and over, and over again, through millions of generations, until you're left with an organism that could not possibly be explained by emergence - whose probability of emerging by pure luck would be infinitesimal over the lifespan of our universe.

The order of an ant colony is an evolved pattern, not an emergent pattern. If you shake up atoms randomly, with no natural selection operating, nothing resembling the higher levels of organization in the ant colony will fall out of the box.

Pure randomness has no more power than we would expect it to have. If an outcome is one in a million in our preference ordering, it will take an average of one million tries to produce it by pure randomness.
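A quick Monte Carlo check of this claim, using the 2% target from earlier rather than one-in-a-million so the simulation runs quickly (the seed is arbitrary):

```python
import random

random.seed(0)

def tries_until_hit(p_good):
    # Count pure-random tries until hitting an outcome in the top p_good fraction.
    tries = 1
    while random.random() >= p_good:
        tries += 1
    return tries

samples = [tries_until_hit(0.02) for _ in range(10_000)]
print(round(sum(samples) / len(samples)))  # close to 1/0.02 = 50 tries on average
```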

A probability of one in a million corresponds to only 20 bits of information. A mathematician's "bits" are not simply the ones and zeroes on a hard drive, though the two are related: 20 magnetic spots on a hard drive can transmit at most 20 bits of information, if an optimal encoding is used. (For more on this, see Shannon Information.) Many products of intelligence are optimized far beyond one in a million - they contain so many bits of information as to place them far beyond anything pure randomness could produce in the lifetime of a universe.

I emphasize "pure" randomness because if you combine a random process like mutation with a nonrandom process like selection - organisms dying or reproducing in a way that correlates nonrandomly with their genes - then you can get millions of bits of optimization in just a few billion years. And, just to anticipate the nitwit creationists, "non-random component" does not mean "orchestrated by a secret intelligence behind the scenes". "Non-random" means simple correlation: it is not the case that every possible genome has exactly the same chance of reproducing.

Pure randomness does not yield optimization, except in the sense that a billion tries may yield one result that apparently possesses 30 bits of optimization.

It should now be clear that a nonrandom component is necessary for high degrees of optimization.

But is it, perhaps, equally necessary to have a random component as well?

Can we do better by adding randomness?

You may have heard that certain algorithms in Artificial Intelligence work better when we inject randomness into them. Is this true, and if so, how is it possible?

Technical Explanation discusses the Bayesian scoring method when you answer many questions in a row. There are many important properties that a scoring method should have. One of them is that if you pretend to be more confident than you really are, you should do worse. It's quite possible to do worse than a maximum-entropy estimate, if you know nothing but pretend otherwise.

Suppose you were asked twelve multiple-choice questions with four options apiece, and you gave your answer to each question in the form of a probability distribution over the four options - for each question you would give your probability that option A was correct, then that option B was correct, then C and D, with the probabilities summing to 1. How do we score you? For each question, we look at the probability that you assigned to the actual, correct answer, ignoring the probabilities assigned to other answers. Then we multiply together the probabilities assigned to the correct answer on all twelve questions. The result is the joint probability you assigned to the final outcome, that is, the probability you assigned to the correct answer-sheet for the entire test.

(Suppose you flip a coin three times. If you think that "heads" is 50% probable on the first flip, 50% probable on the second flip, and 50% probable on the third flip, and you think the coinflips are uncorrelated, then your probability of seeing "HHH" is 1/8. For more details on this, including what happens if the coinflips are correlated, see Technical Explanation.)

The maximum-entropy distribution for a question with four options is a probability of 1/4 for each option. Is it possible to score worse than maximum entropy? Sure! For example, you could, on each of your twelve questions, assign 85% probability to one answer, and 5% apiece to the other three answers. But suppose that, despite your high confidence, you do no better than chance, answering three questions correctly and nine incorrectly. Then your final score, the joint probability you assigned to the entire answer-sheet, is (0.85)^3 * (0.05)^9 = 1.2e-12. If you'd given the maximum-entropy response, you would have been guaranteed a score of (0.25)^12 = 6e-8.
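The two scores can be verified directly; this just reproduces the text's arithmetic:

```python
# Joint probability assigned to the correct answer sheet across twelve questions.
confident_wrong = (0.85 ** 3) * (0.05 ** 9)  # 3 right at 85%, 9 wrong with 5% on the right answer
max_entropy = 0.25 ** 12                     # 25% on every option, all twelve questions
print(f"{confident_wrong:.1e}")  # -> 1.2e-12
print(f"{max_entropy:.1e}")      # -> 6.0e-08
```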

If you're strongly confident in wrong answers, it is quite possible to do worse than if you confess total ignorance. In this case, you will be able to predictably do better by adjusting your probability distribution toward greater entropy - by moving closer to the maxentropy distribution. One may distinguish stupidity from ignorance. Confessing your own ignorance is not a substitute for actually knowing something, but it's a step up from being stupid.

Similarly, injecting randomness can sometimes improve a system's performance. Adding noise can predictably decrease the entropy of a system relative to your preference ordering. All you need is a system that starts out in a state that is literally worse than random - one that is worse than a majority of possible states, or with utility lower than average. If so, replacing the current state with a random state is expected to result in an improvement (although, with sufficiently bad luck, it could make things even worse).

If the average utility of a randomly selected state is -10, and the current system starts out with a utility of -100, then adding noise will cause the system to revert toward the mean. The expected utility will creep back up toward -10 until it approaches the level you could have gotten by pure randomness.
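A toy simulation of this reversion to the mean, using the utilities from the example (the uniform distribution over states is my own arbitrary choice; any distribution with mean -10 would do):

```python
import random

random.seed(0)

# Toy model: states have utilities drawn uniformly from [-20, 0], so a random
# state has expected utility -10. The current state has utility -100 -- far
# worse than random.
current_utility = -100.0

# Replacing the current state with a random state, many times over:
samples = [random.uniform(-20.0, 0.0) for _ in range(100_000)]
average = sum(samples) / len(samples)

print(round(average, 1))  # close to -10: noise reverts the system to the mean
```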

This is something to think about when you hear that the performance of an Artificial Intelligence algorithm can be improved by adding noise to it. To improve an algorithm by injecting randomness into it, the unrandomized version must (on some step) do worse than random.

This is not quite as severe an indictment of "algorithms that are improved by randomness" as it may sound. Imagine that we're trying to solve a pushbutton combination lock with 20 numbers and four steps - 160,000 possible combinations. And we try the following algorithm for opening it:

1. Enter 0-0-0-0 into the lock.
2. If the lock opens, return with SUCCESS.
3. If the lock remains closed, go to step 1.

Obviously we can improve this algorithm by substituting "Enter a random combination" on the first step.

If we were to try and explain in words why this works, a description might go something like this: "When we first try 0-0-0-0 it has the same chance of working (so far as we know) as any other combination. But if it doesn't work, it would be stupid to try it again, because now we know that 0-0-0-0 doesn't work."

The first key idea is that, after trying 0-0-0-0, we learn something - we acquire new knowledge, which should then affect how we plan to continue from there. This is knowledge, quite a different thing from randomness...

What exactly have we learned? We've learned that 0-0-0-0 doesn't work; or to put it another way, given that 0-0-0-0 failed on the first try, the conditional probability of it working on the second try is negligible.

Consider your probability distribution over all the possible combinations: Your probability distribution starts out in a state of maximum entropy, with all 160,000 combinations having a 1/160,000 probability of working. After you try 0-0-0-0, you have a new probability distribution, which has slightly less entropy; 0-0-0-0 has an infinitesimal probability of working, and the remaining 159,999 possibilities each have a 1/159,999 probability of working. To try 0-0-0-0 again would now be stupid, as defined above - the expected utility of trying 0-0-0-0 is less than average; the vast majority of potential actions now have higher expected utility than does 0-0-0-0. An algorithm that tries 0-0-0-0 again would do worse than random, and we can improve the algorithm by randomizing it.

One may also consider an algorithm as a sequence of tries: The "unrandomized algorithm" describes the sequence of tries 0-0-0-0, 0-0-0-0, 0-0-0-0... and this sequence of tries is a special sequence that has below-average expected utility in the space of all possible sequences. Thus we can improve on this sequence by selecting a random sequence instead.

Or imagine that the combination changes every second. In this case, 0-0-0-0, 0-0-0-0 is just as good as the randomized algorithm - no better and no worse. What this shows you is that the supposedly "random" algorithm is "better" relative to a known regularity of the lock - that the combination is constant on each try. Or to be precise, the reason the random algorithm does predictably better than the stupid one is that the stupid algorithm is "stupid" relative to a known regularity of the lock.

However, the random algorithm is still not optimal - it does not take full advantage of the knowledge we have acquired. A random algorithm might randomly try 0-0-0-0 again; it's not likely, but it could happen. The longer the random algorithm runs, the more likely it is to try the same combination twice; and if the random algorithm is sufficiently unlucky, it might still fail to solve the lock after millions of tries. We can take full advantage of all our knowledge by using an algorithm that systematically tries 0-0-0-0, 0-0-0-1, 0-0-0-2... This algorithm is guaranteed not to repeat itself, and will find the solution in bounded time. Considering the algorithm as a sequence of tries, no other sequence in sequence-space is expected to do better, given our initial knowledge. (Any other nonrepeating sequence is equally good; but nonrepeating sequences are rare in the space of all possible sequences.)
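To make the comparison concrete, here is a sketch that pits all three strategies - repeating 0-0-0-0, guessing at random, and enumerating systematically - against a scaled-down lock with 1,000 combinations instead of 160,000 (the helper names are mine):

```python
import itertools
import random

random.seed(0)
N_DIGITS, N_STEPS = 10, 3            # scaled-down lock: 10**3 = 1000 combinations
SPACE = list(itertools.product(range(N_DIGITS), repeat=N_STEPS))

def tries_until_open(guesses, secret, limit=100_000):
    """Count guesses until the secret comes up, or give up at `limit`."""
    for i, guess in enumerate(itertools.islice(guesses, limit), start=1):
        if guess == secret:
            return i
    return None                       # never opened within the limit

secret = (7, 2, 9)

# 1. The stupid algorithm: try 0-0-0 forever. Never opens the lock.
stupid = tries_until_open(itertools.repeat((0, 0, 0)), secret)

# 2. Random guessing with replacement: opens eventually, with ~1000 expected
#    tries, but may retest combinations it has already ruled out.
rand_tries = tries_until_open(
    (random.choice(SPACE) for _ in itertools.count()), secret)

# 3. Systematic enumeration: never repeats; guaranteed within 1000 tries.
systematic = tries_until_open(iter(SPACE), secret)

print(stupid, rand_tries, systematic)
```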

A combination dial often has a tolerance of 2 in either direction. 20-45-35 will open a lock set to 22-44-33. In this case, the algorithm that tries 0-1-0, 0-2-0, et cetera, ends up being stupid again; a randomized algorithm will (usually) work better. But an algorithm that tries 0-5-0, 0-10-0, 0-10-5, will work better still.

Sometimes it is too expensive to take advantage of all the knowledge that we could, in theory, acquire from previous tests. Moreover, a complete enumeration or interval-skipping algorithm would still end up being stupid. In this case, computer scientists often use a cheap pseudo-random algorithm, because the computational cost of using our knowledge exceeds the benefit to be gained from using it. This does not show the power of randomness, but, rather, the predictable stupidity of certain specific deterministic algorithms on that particular problem. Remember, the pseudo-random algorithm is also deterministic! But the deterministic pseudo-random algorithm doesn't belong to the class of algorithms that are predictably stupid (do much worse than average).

Noise and overfitting

There are other possible reasons why a noisy AI algorithm might work better than the noiseless version. There is always (I assert) some reason why the noiseless algorithm is being stupid (worse-than-random), somewhere or other; but the reason can get rather technical. For example, there are neural network training algorithms that work better if you simulate noise in the neurons. On this occasion it is especially tempting to say something like, "Lo! When we make our artificial neurons noisy, just like biological neurons, they work better! Behold the healing life-force of entropy!" What might actually be happening - for example - is that the network training algorithm, operating on noiseless neurons, would vastly overfit the data. If you expose the noiseless network to the series of coinflips "HTTTHHTTH"... the training algorithm will say the equivalent of, "I bet this coin was specially designed to produce HTTTHHTTH every time it's flipped!" instead of "This coin probably alternates randomly between heads and tails." A hypothesis overfitted to the data does not generalize. On the other hand, when we add noise to the neurons and then try training them again, they can no longer fit the data precisely, so instead they settle into a simpler hypothesis like "This coin alternates randomly between heads and tails."
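As a toy stand-in for the neural-network case, the following sketch uses a high-degree polynomial as the overfitting-prone learner and input jitter as the "noise in the neurons"; the model, numbers, and seed are all illustrative assumptions of mine, not the training setup described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# The "world": a simple linear relationship, observed with measurement noise.
x_train = np.linspace(-1.0, 1.0, 10)
y_train = x_train + rng.normal(0.0, 0.3, size=x_train.shape)

x_test = np.linspace(-1.0, 1.0, 101)
y_test = x_test                          # the true, simple relationship

# Noiseless learner: a degree-9 polynomial threads every training point
# exactly -- the equivalent of "this coin is designed to produce HTTTHHTTH".
overfit = np.polyfit(x_train, y_train, deg=9)

# Noisy learner: jitter each input many times before fitting. The fit can no
# longer pass through every point, so it settles into a smoother hypothesis.
x_jit = np.repeat(x_train, 50) + rng.normal(0.0, 0.15, size=10 * 50)
y_jit = np.repeat(y_train, 50)
noisy = np.polyfit(x_jit, y_jit, deg=9)

mse_overfit = np.mean((np.polyval(overfit, x_test) - y_test) ** 2)
mse_noisy = np.mean((np.polyval(noisy, x_test) - y_test) ** 2)
print(mse_noisy < mse_overfit)           # the jittered fit generalizes better
```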

To describe what was going on inside the combination lock, we needed concepts like expected utility, conditional probability, and learning from evidence. To describe what goes on inside the far more complex neural network, we would need far more sophisticated concepts, like prior probability, Kolmogorov complexity, Solomonoff induction, Vapnik-Chervonenkis dimension, and computational learning theory. But the general idea is still that the noiseless version of the network training algorithm is stupid on a certain stage of its operation - it overfits the data - and the noisy version substitutes ignorance-better-than-stupidity on that stage of the algorithm.

But the noisy network is not optimal. If we see a coin produce HTTTHHTTH we should not suspect that it is set to always produce HTTTHHTTH; but it is quite a different matter if we see the coin produce HTTTHHTTH on the first set of nine trials, HTTTHHTTH again on the second set, HTTTHHTTH again on the third set, and so on. The noisy neural network may never learn such a hypothesis.

There are other ways to avoid overfitting data - techniques deliberately constructed around principled notions such as prior probability. These methods do not blur the sensory data or add noise to the computing elements. These principled methods can learn precise hypotheses, but demand extra evidence to justify the extra complexity relative to vague hypotheses. These principled methods can take complete advantage of all the information they have, and produce better results thereby; just as, on the lockpicking problem, enumerating a non-repeating sequence of combinations takes full advantage of all information gained, and therefore works better than random tries that may repeat themselves.
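The contrast with the repeated-coin case above can be made concrete with a toy minimum-description-length score - complexity cost in bits weighed against fit to the data. The scoring scheme is a deliberately simplified illustration of mine, not any particular principled method:

```python
def score(prior_bits, log2_likelihood):
    """Total log2-score of a hypothesis: complexity cost plus fit to the data.
    Higher (closer to zero) is better."""
    return float(-prior_bits) + log2_likelihood

# Hypothesis A: "the coin is rigged to produce exactly HTTTHHTTH each run."
# Specifying a particular 9-flip pattern costs 9 bits of complexity, but then
# the data is predicted with certainty (log2-likelihood 0).
# Hypothesis B: "the coin is fair." Nothing to specify, but each observed
# flip costs 1 bit of likelihood.

def scores(n_runs):
    flips = 9 * n_runs
    a = score(prior_bits=9, log2_likelihood=0.0)
    b = score(prior_bits=0, log2_likelihood=-flips)
    return a, b

print(scores(1))  # (-9.0, -9.0): one run of HTTTHHTTH -- a dead heat
print(scores(3))  # (-9.0, -27.0): three identical runs -- the precise
                  # hypothesis now earns its extra complexity
```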

Noise and hill-climbing

What about hill-climbing, simulated annealing, or genetic algorithms? These AI algorithms are local search techniques that randomly investigate some of their nearest neighbors. If an investigated neighbor is superior to the current position, the algorithm jumps there. (Or sometimes probabilistically jumps to a neighbor with probability determined by the difference between neighbor goodness and current goodness.) Are these techniques drawing on the power of noise?

Local search algorithms take advantage of the regularity of the search space - that if you find a good point in the search space, its neighborhood of closely similar points is a likely place to search for a slightly better neighbor. And then this neighbor, in turn, is a likely place to search for a still better neighbor; and so on. To the extent this regularity of the search space breaks down, hill-climbing algorithms will perform poorly. If the neighbors of a good point are no more likely to be good than randomly selected points, then a hill-climbing algorithm simply won't work. We might as well search random points, rather than following a path of increasing fitness through the search space. (An excellent introductory work on this subject is Artificial Intelligence: A Modern Approach by Russell and Norvig.)

Doesn't a local search algorithm need to make random changes to the current point in order to generate neighbors for evaluation? Not necessarily; some local search algorithms systematically generate all possible neighbors, and select the best one. These greedy algorithms work fine for some problems, but on other problems it has been found that greedy local algorithms get stuck in local maxima. The next step up from greedy local algorithms, in terms of added randomness, is random-restart hill-climbing - as soon as we find a local maximum, we restart someplace random, and repeat this process a number of times. For our final solution, we return the best local maximum found when time runs out. Random-restart hill-climbing is surprisingly useful; it can easily solve some problem classes where any individual starting point is unlikely to lead to a global maximum or acceptable solution, but it is likely that at least one of a thousand individual starting points will lead to the global maximum or acceptable solution.
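A minimal sketch of greedy hill-climbing versus random-restart hill-climbing, on a made-up landscape with one small hill and one big one:

```python
import random

random.seed(0)

# A one-dimensional landscape with two peaks: a small hill at x=20 (height 50)
# and a big hill at x=80 (height 100).
def fitness(x):
    return max(50 - abs(x - 20), 100 - 2 * abs(x - 80))

def greedy_climb(x):
    """Systematically check both neighbors; move to the better one, or stop."""
    while True:
        neighbors = [n for n in (x - 1, x + 1) if 0 <= n <= 100]
        best = max(neighbors, key=fitness)
        if fitness(best) <= fitness(x):
            return x                     # local maximum: greedy search is stuck
        x = best

# Greedy search from x=0 climbs the small hill and stops there.
stuck = greedy_climb(0)
print(stuck, fitness(stuck))             # 20 50

# Random-restart: climb from many random starting points, keep the best.
best = max((greedy_climb(random.randrange(101)) for _ in range(50)), key=fitness)
print(best, fitness(best))               # 80 100
```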

The non-randomly-restarting, greedy, local-maximum-grabbing algorithm, is "stupid" at the stage where it gets stuck in a local maximum. Once you find a local maximum, you know you're not going to do better by greedy local search - so you may as well try something else with your time. Picking a random point and starting again is drastic, but it's not as stupid as searching the neighbors of a particular local maximum over and over again. (Evolution may do this, and often does get stuck in local optima. Evolution, being unintelligent, has no mind to "notice" when it is testing the same genomes over and over.)

Even more stupid is picking a particular starting point, and then evaluating its fitness over and over again, without even searching its neighbors. This is the lockpicker who goes on trying 0-0-0-0 forever. (This is what evolution would be like without any mutations. But since most mutations are detrimental, evolution favors mechanisms that reduce the number of mutations. That this path might ultimately lead to static genomes is not something evolution would "consider".)

Hill-climbing search is not so much a little bit randomized compared to the completely stupid lockpicker, as almost entirely nonrandomized compared to a completely ignorant searcher. We search only the local neighborhood, rather than selecting a random point from the entire state space. That probability distribution has been narrowed enormously, relative to the overall state space. This exploits the knowledge gained from finding a good point - namely, that its neighbors are likely to be good as well.

You can imagine splitting a hill-climbing algorithm into components that are "deterministic" (or rather, knowledge-exploiting) and "randomized" (the leftover ignorance). A programmer writing a probabilistic hill-climber will use some formula to assign probabilities to each neighbor, as a function of the neighbor's fitness. For example, a neighbor with a fitness of 60 might have probability 80% of being selected, while other neighbors with fitnesses of 55, 52, and 40 might have selection probabilities of 10%, 9%, and 1%. The programmer writes a deterministic algorithm, a fixed formula, that produces these numbers - 80, 10, 9, and 1. What about the actual job of making a random selection at these probabilities? Usually the programmer will hand that job off to someone else's pseudo-random algorithm - almost any programming language's standard libraries will contain a standard pseudo-random algorithm; there's no need to write your own. If the hill-climber doesn't seem to work well, the programmer tweaks the deterministic part of the algorithm, the part that assigns these fixed numbers 80, 10, 9, and 1. The programmer does not say - "I bet these probabilities are right, but I need a source that's even more random like a thermal noise generator, instead of this merely pseudo-random algorithm that is ultimately deterministic!" The programmer does not go in search of better noise.
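The division of labor described above - a deterministic formula that outputs the numbers 80, 10, 9, and 1, plus an off-the-shelf pseudo-random generator to make the draw - can be sketched as follows (the neighbor names are hypothetical):

```python
import random

random.seed(0)

# The deterministic part: whatever fixed formula the programmer wrote, it
# outputs these numbers for these four neighbors -- 80, 10, 9, and 1 percent.
# (Hard-coded here to stand in for that formula.)
neighbors = ["A", "B", "C", "D"]     # fitnesses 60, 55, 52, 40
weights = [0.80, 0.10, 0.09, 0.01]

# The leftover ignorance: an off-the-shelf pseudo-random generator makes the
# actual draw. No programmer goes looking for "better noise" than this.
picks = random.choices(neighbors, weights=weights, k=10_000)
print(picks.count("A") / 10_000)     # close to 0.80
```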

It is theoretically possible for a poorly designed "pseudo-random algorithm" to be stupid relative to the search space; for example, it might always jump in the same direction. But the "pseudo-random algorithm" has to be really shoddy for that to happen. You're only likely to get stuck with that problem if you reinvent the wheel instead of using a standard, off-the-shelf solution. A decent pseudo-random algorithm works just as well as a thermal noise source on optimization problems. It is possible (though difficult) for an exceptionally poor noise source to be exceptionally stupid on the problem, but you cannot do exceptionally well by finding a noise source that is exceptionally random. The power comes from the knowledge - the deterministic formula that assigns a fixed probability distribution. It does not reside in the remaining ignorance. If you knew even more, you would do better, not worse.

Noise and natural selection

What about natural selection? Isn't that the classic algorithm for drawing on the power of randomness?

There is a popular conception that "mutations" are good things, that "mutants" have supernormal abilities - that the strength of evolution lies in its magical power to produce good mutations. I recall particularly a trailer for an X-Men movie which voiced over: "In every human being... there is the genetic code... for mutation..."

Evolutionary biology is a complex subject; simple statements rarely do it justice. Nonetheless this is not how evolution works. The vast majority of mutations are neutral or deleterious. Very, very few are improvements. And this is what you would expect - the higher the utility, the smaller the region of configuration space with equal or greater utility. Most of the time, a random move will take you away from the center. Most mutations are bad for you. The power of natural selection is not that it produces good mutations, but that good mutations are selectively retained more often than bad mutations. It is nonrandom selection, not random mutation, which carries the power. Random mutation, by itself, would do nothing. But we could just as easily substitute a deterministic pseudo-random algorithm for making mutations (which is exactly what most genetic algorithms do), and natural selection would do just as well as if it were "really ultimately random".

Natural selection is much simpler than human intelligence, and correspondingly less efficient. Natural selection is so simple, in fact, that we can use simple math to describe its characteristics as an optimization process, including its inefficiency. For example, suppose there's a gene which has a 3% fitness advantage relative to the alternative alleles at that locus; an individual with this gene has, on average, around 3% more children than others. Imagine that a single mutant is born with this advantageous gene. (Remember, evolution isn't going to magically produce a batch of mutants all with the same advantageous mutation; this advantageous mutation was produced by a stray cosmic ray, along with innumerable deleterious or neutral mutations.) There's a certain probability that the advantageous mutation will die out of the population, by sheer bad luck, before it can promote itself to fixation. Superfly gets squashed by an elephant. So, if the first mutant has a 3% fitness advantage, what is the probability that this gene spreads through the whole population, as opposed to dying out? This calculation turns out to be independent of most things you would expect it to depend on, like population size, and the answer turns out to be 6%. The general rule is that if the fitness advantage is s, then the probability of fixation is 2s. So even if you have a mutation that confers a 3% advantage in fitness, which is huge as mutations go, the chance is only 6% that the mutation spreads to the whole population.
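The 2s rule can be checked with a simple branching-process approximation - each copy of the mutant gene leaves a Poisson-distributed number of offspring, and a lineage counts as "fixed" once it grows large enough to escape drift. The escape threshold and trial count here are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
s = 0.03                    # fitness advantage of the new mutation
TRIALS = 20_000
ESCAPE = 200                # copies at which drift is (almost) escaped

survived = 0
for _ in range(TRIALS):
    n = 1                   # one initial mutant copy
    while 0 < n < ESCAPE:
        # Each copy leaves Poisson(1 + s) offspring, so the next generation's
        # total is a single Poisson draw with mean (1 + s) * n.
        n = rng.poisson((1 + s) * n)
    if n >= ESCAPE:
        survived += 1

print(round(survived / TRIALS, 3))   # close to 2s = 0.06
```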

Suppose the beneficial mutation does spread. How long does it take to become universal in the gene pool? This calculation does depend on population size. With a fitness advantage of 3%, and a population size of 100,000, the mean time to fixation is 767 generations.
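One way to recover the quoted figure, assuming the standard diffusion-approximation estimate t ≈ (2/s) ln N is the formula behind it:

```python
import math

s = 0.03          # fitness advantage
N = 100_000       # population size

# Diffusion-approximation estimate of the mean time to fixation for an
# advantageous allele (my assumption about the formula behind the quoted 767):
t = (2 / s) * math.log(N)
print(round(t))   # 768 generations -- matching the ~767 quoted above
```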

For humans, that would mean an average of sixteen tries and ten thousand years to accumulate a single beneficial mutation.

To get complex machinery, the mutations have to evolve serially - one at a time. If gene B is dependent on gene A, and gene A is only present in 1% of the population, then B isn't an advantage except in the presence of A, which only happens 1% of the time. So the fitness advantage of B goes down by a factor of 100. What this means is that A has to be universal, has to go to fixation in the gene pool, before other genes dependent on it can evolve. Evolution has no foresight. It doesn't look ahead. It doesn't produce good mutations in anticipation of other mutations coming along. Whosoever has the most kids in one generation, their genes are more frequent in the next generation, and that's all there is to it.

Once A and B are both fixed in the gene pool, an improved version of A, A*, which is dependent on B, can also evolve. Now A* and B are mutually dependent on each other. Then C comes along, which depends on A* and B; and B*, which is dependent on A* and C. Eventually you get complex machinery with lots of moving parts that all seem to depend on each other. Nitwit creationists point to the complex machine and say, "How could that happen by chance?" Well, it can't happen by chance, but it can happen by selecting on a sequence of chance mutations. In the battle to get evolution taught in high schools, biologists need to emphasize the counterintuitive creative powers of evolution. But what is also true, and less emphasized, is that it takes millions of years to embroider complex machinery this way, because the sequence of events has to happen serially, one after another.

We can calculate how fast natural selection is, and it's extraordinarily slow. The only reason natural selection can produce patterns as complex as living beings is that, over the course of hundreds of millions of years, you get strong cumulative selection pressures - powerful enough to hit the target of a rabbit genome, in all the space of possibilities.

In contrast, a human engineer - say, a programmer - can sit down at a computer and produce new complex machinery with hundreds of interdependent parts in one afternoon. The human can foresightfully design new parts in anticipation of later designing other new parts; produce coordinated simultaneous changes in interdependent machinery; and learn from experience what kinds of new tweaks are worth trying, rather than waiting for a cosmic ray to produce a good one. By the standards of evolution this is simply magic.

There's a public mystique of evolution, which exists for two reasons.

First, many of the people praising evolution to the stars are on the side of the scientists, but they are not scientists. People with a nontechnical understanding of evolution argue with creationists, perceive that they are very much smarter than the creationists, and think to themselves: "I must understand evolution really well." Meanwhile they have no idea that a quantitative understanding even exists. They understand evolution as a force that improves things, but they don't know how to calculate how much force it is exerting - like the difference between knowing that "things fall down" and being able to calculate a parabola. Human engineers "improve" designs, and evolution "improves" designs, and that puts them on essentially the same level as optimization processes - right? It's the same word, so it must be pretty much the same thing.

Second, the big public battle is over the counterintuitive idea that evolution works at all - not how slowly it works. Professional biology journals carry articles about constrained pathways and speed limits on evolution, but the public debate never gets past the point of arguing over and over again whether evolution works at all.

The human optic cable is installed backward; it comes out of the front of the retina and goes through a hole in the retina to get into the brain - rather than, as a human would have designed the system, simply coming out of the back of the retina to begin with. The retina initially evolved backward, and natural selection never fixed it, because when you've got a lot of interdependent machinery it's hard to change one thing without breaking everything else. A human engineer could do it with a pack of simultaneous changes. Evolution blindly climbs an incremental pathway of mutations.

That sort of biological stupidity is how we know that Earthly life was not created by a superintelligent designer - or if it was a superintelligent designer, it was a superintelligent designer who pretended to be incredibly stupid (by human standards) in exactly the way that evolution ought to be incredibly stupid. This point is not emphasized as heavily, in the public debate, as the idea that evolution could have created life. The result is that rather more friends of science understand that evolution is powerful enough to create life, than understand that evolution is not powerful enough to reroute the human optic cable so that it doesn't go through the retina.

Where do people get the notion that only a chaotic, noisy process can optimize properly? I suspect that it has a great deal to do with all the necessary public hammering-home of the idea that evolution can work at all. Evolution happens to be noisy and chaotic. It is well to remember that evolution is not only noisy and chaotic, but also inefficient, slow, and often jaw-droppingly stupid. (And yes, the noise and chaos have something to do with that.) The miracle of evolution is not how well it works, but that it works at all.

It is amazing that evolution works at all; it's a purely natural optimization process with no brain or intelligence. The story of humankind had to start somewhere, and it had to start somewhere simple. If not for evolution, the universe would contain no complex intelligences to marvel at how stupid evolution is. But evolution is still an extremely primitive optimization process by comparison with, oh, say, a human brain. In some ways biology is still ahead of human engineering, but give us 3.85 billion years to polish our designs and we could do a lot better.

The mystery of ignorance

In the previous section, I analyzed a few special cases where cognitive power is attributed to randomness, arguing:

- That the "non-random" version of the lockpicker, to which the random version is compared, is a special nonrandom algorithm that is exceptionally stupid.
- That most "non-random" versions will do as well as the random version.
- That some non-random versions will do exceptionally well because they fully exploit all available knowledge.
- That the "noisy neural network" does better because the "non-random" version engages in egregious overfitting, and because we know a priori that the correct answer fits into the smaller hypothesis space learnable by a noisy network.
- That the "random mutation" of a hill-climbing algorithm is almost entirely nonrandom, in the sense that it examines only a tiny neighborhood of the entire search space.
- That it is often possible to replace "random mutation" algorithms with algorithms that do just as well or better by, e.g., examining the entire local neighborhood rather than making a single jump in a random direction.
- That the power does not come from the "true randomness" of the noise source, in that a pseudo-random algorithm (making the system as a whole purely deterministic) does just as well.
- That the ability of the system to perform optimization, at all, derives purely from the part of the system that exploits knowledge.
- That evolution, a kind of naturally arising hill-climbing algorithm, is more limited than generally appreciated.
- That the noisiness and chaos give rise to specific calculable disadvantages.
- That the astonishing thing is not how well evolution worked but that it worked at all.

These four special cases do not constitute a general argument against randomness. I did think it wise to dispose of these special cases first, because they are often brought as examples by the advocates of chaos.

Some of the cases above are dangerously subtle. In mathematics it only requires one mistake to prove an erroneous theorem. Similarly it only requires one misstep to conclude that randomness is the key to optimization - like making one error in calculating the work done by an engine, and concluding that it derives power from waste heat, making it a perpetual motion machine. (There is actually a deep analogy between these two cases.) This is why it is important to appreciate the forthcoming general argument against power-from-randomness - so that, faced with an apparent proof of perpetual motion, you don't say "Wow!" and rush out to build a prototype, but instead go back and check your calculations.

Does an unpredictable world demand an unpredictable Goal System?

From Robyn Dawes, "Rational Choice in an Uncertain World", p. 259:

"Many psychological experiments were conducted in the late 1950s and early 1960s in which subjects were asked to predict the outcome of an event that had a random component but yet had base-rate predictability - for example, subjects were asked to predict whether the next card the experimenter turned over would be red or blue in a context in which 70% of the cards were blue, but in which the sequence of red and blue cards was totally random. In such a situation, the strategy that will yield the highest proportion of success is to predict the more common event. For example, if 70% of the cards are blue, then predicting blue on every trial yields a 70% success rate. What subjects tended to do instead, however, was match probabilities - that is, predict the more probable event with the relative frequency with which it occurred. For example, subjects tended to predict 70% of the time that the blue card would occur and 30% of the time that the red card would occur. Such a strategy yields a 58% success rate, because the subjects are correct 70% of the time when the blue card occurs (which happens with probability .70) and 30% of the time when the red card occurs (which happens with probability .30); .70 * .70 + .30 * .30 = .58. In fact, subjects predict the more frequent event with a slightly higher probability than that with which it occurs, but do not come close to predicting its occurrence 100% of the time, even when they are paid for the accuracy of their predictions."

To this effect Dawes cites (Tversky, A. and Edwards, W. 1966. Information versus reward in binary choice. Journal of Experimental Psychology, 71, 680-683). Subjects who were paid a nickel for each correct prediction over a thousand trials, in which the more frequent event occurred with 70% frequency on a random basis, guessed that event 76% of the time.
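The arithmetic in the quoted passage checks out, and we can also score the strategy the paid subjects actually used:

```python
def success_rate(p_guess_blue, p_blue=0.70):
    """Expected accuracy when guessing 'blue' with the given frequency,
    against a sequence that is blue with probability p_blue."""
    return p_guess_blue * p_blue + (1 - p_guess_blue) * (1 - p_blue)

print(round(success_rate(1.00), 3))  # 0.7   -- always guess the majority color
print(round(success_rate(0.70), 3))  # 0.58  -- probability matching
print(round(success_rate(0.76), 3))  # 0.604 -- the paid subjects' strategy
```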

Dawes goes on to say, "Despite feedback through a thousand trials, subjects cannot bring themselves to believe that the situation is one in which they cannot predict." Maybe so! But even if subjects think they can make a prediction - if they come up with a hypothesis - they don't have to actually bet on the predicted card in order to test the hypothesis. They would be wiser to say quietly to themselves, "Now if this hypothesis is correct, the next card will be red," and then bet on blue until the hypothesis is confirmed - especially if all their previous hypotheses have failed!

I would not fault the subjects for continuing to invent hypotheses - how could they know the sequence was beyond their ability to predict? - but I would fault them for betting on their guesses when this wasn't necessary to gather information.

I would interpret the result as follows: People fail to realize that, given imperfect information, the optimal betting strategy does not resemble a typical sequence of actual cards. They see a mix of mostly blue cards with some red, and suppose that the optimal betting strategy (given their knowledge) must be a mix of mostly blue cards with some red. It is the old rule of magic that Effects Resemble Causes, formerly called the Law of Similarity, which these days is called "the representativeness heuristic".

A "random" key does not fit a "random" lock. A random code does not solve a random combination on the first try just because they are "both random". Different noise sources will not correlate; different randomnesses are not commensurate. The stock market has an element of randomness - or rather, unpredictability relative to our current knowledge - but you cannot crack the stock market by randomizing your stock-buying pattern. When your knowledge is imperfect, when the world seems to you to have an element of randomness, randomizing your actions doesn't solve the problem. Randomizing your actions takes you further from the target, not closer. In a world already imperfect, throwing away your intelligence just makes things worse.

Blank maps and blank territories

The great Bayesian theorist E. T. Jaynes observed that if we are ignorant about a phenomenon, this is a fact about our state of mind, not a fact about the phenomenon. Suppose someone tells me (and I trust them) that a certain hat, lying on a table, overlies a coin. I didn't see the coin before the hat went on top of it, so I don't know whether the coin is showing heads or tails. Is this a fact about the coin? No, it is a fact about me. My beliefs exist as patterns of neural firing activity in my brain - they are not part of the coin. When I assign a probability of 50% that the coin is showing heads, and 50% probability that the coin is showing tails, I am describing my state of knowledge about the coin, not describing a property of the coin alone. Perhaps, with sufficient knowledge of physics and sufficient computing power and a fast detailed camera, you could write a program that would observe a coin-flipping machine and predict the outcome in advance. But in practice, a coinflip is "random" because humans can't predict it.

Or recall the chess game in which I must assign a probability distribution over Kasparov's possible moves. The probability distribution I assign to Kasparov is not so much a property of Kasparov, as a property of me. There are many possible systems that would produce well-calibrated probability distributions as predictions of Kasparov's moves, and it is quite possible for these systems to produce different probability distributions. One assigns a probability of 60% to a move, another assigns a probability of 80%, yet both systems can be well-calibrated in the sense that moves for which they say "70 percent" happen around 7 times out of 10.
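That calibration does not pin down a unique probability assignment can be shown with a toy simulation. The numbers and the "informed"/"uninformed" predictors below are my own illustration, not anything measured about Kasparov: suppose half the positions are "easy" (the predicted move occurs with probability 0.8) and half are "hard" (probability 0.6), and only one predictor can tell which is which.

```python
import random

random.seed(1)

def simulate(trials=100_000):
    # The informed predictor sees whether each position is easy (p = 0.8)
    # or hard (p = 0.6) and announces that probability; the uninformed
    # predictor cannot tell, so it always announces the average, 0.7.
    outcomes = {"informed says 0.6": [], "informed says 0.8": [],
                "uninformed says 0.7": []}
    for _ in range(trials):
        p = random.choice([0.6, 0.8])
        happened = random.random() < p  # did the predicted move occur?
        outcomes["informed says %.1f" % p].append(happened)
        outcomes["uninformed says 0.7"].append(happened)
    # Observed frequency of the move, grouped by announced probability.
    return {label: sum(v) / len(v) for label, v in outcomes.items()}

rates = simulate()
for label, rate in rates.items():
    print(label, "-> observed frequency %.2f" % rate)
```

Each announced probability matches its observed frequency, so both predictors are well-calibrated, even though they disagree on every single trial. The disagreement reflects a difference in their knowledge, not a contradiction about Kasparov.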

Jaynes labeled the error of thinking that probabilities are properties of things-in-themselves the Mind Projection Fallacy, which occurs when we mistake cognitive properties for parts of the outside world. Suppose that I'm making a map of a city, and in one corner, corresponding to a part of the city I haven't visited, there's a blank space on the map. That doesn't mean that when I visit that part of the city, I'll find a blank territory. Ignorance exists in the map, not in the territory. There are mysterious questions, but never mysterious answers.

Now how could any AI be powered by our own ignorance about it, when this ignorance is a fact about us, rather than a fact about the AI?

An unknown key does not fit an unknown lock. This is the fundamental reason why noise hath no power.

Worshipping sacred mysteries

The influence of animal or vegetable life on matter is infinitely beyond the range of any scientific inquiry hitherto entered on. Its power of directing the motions of moving particles, in the demonstrated daily miracle of our human free-will, and in the growth of generation after generation of plants from a single seed, are infinitely different from any possible result of the fortuitous concurrence of atoms... Modern biologists were c