While Minsky has always had a great fondness for robots, he came to the conclusion rather early that, from the point of view of laboratory experiments, making a robot mobile was more trouble than it was worth. “I thought that there were enough problems in trying to understand hands and eyes, and so forth, without getting into any extra irrelevant engineering,” he told me. “My friends at the Stanford Research Institute decided in the mid-sixties to make their first robot mobile—against my advice.”

In 1962, Henry Ernst, who was studying with both Minsky and Claude Shannon, made the Artificial Intelligence Group’s first computer-controlled robot. It was a mechanical arm with a shoulder, an elbow, and a gripper—basically, the kind of arm that is used to manipulate radioactive materials remotely. The arm was attached to a wall and activated by several motors, which, in turn, were controlled by a computer. The robot’s universe of discourse consisted of a box and blocks that were set out on a table. It had photocells in the fingertips of the gripper. The hand would come down until it was nearly in contact with the surface of the table, and then, when the photocells sensed the darkness of the hand’s shadow, its program would tell it to stop. It would thereupon begin to move sidewise until it came into contact with a block or the box. It could tell the difference, because if the object was less than three inches long it was a block and if it was more than three inches long it was the box. The program would then direct the arm to pick up the block and put it in the box. The arm could find all the blocks on a table and put them into the box. “It was sort of eerie to watch,” Minsky recalled. “Actually, the program was way ahead of its time. I don’t know if we appreciated then how advanced it was. It could deal with the unexpected. If something that it didn’t expect happened, it would jump to another part of its program. If you moved the box in the middle of things, that wouldn’t bother it much. It would just go and look for it. If you moved a block, it would go and find another one. If you put a ball on the table, it would try to verify that it was a block. Incidentally, when Stanley Kubrick was making his film ‘2001’ he asked me to check the sets to see if anything he was planning to film was technically impossible. I drew a sketch for Kubrick of how mechanical hands on the space pod might work. When I saw the film, I was amazed that M-G-M had been able to make better mechanical hands than we could. They opened the spaceship’s airlock door fantastically well. Later, I learned that the hands didn’t really work, and that the door had been opened by a person concealed on the other side.”
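The behavior Minsky describes—the three-inch rule, and re-finding the box when it has been moved—amounts to a simple control loop. Here is a minimal sketch in Python, a toy simulation of my own rather than Ernst’s program; the Obj class and clear_table function are invented for illustration.

```python
# A toy re-creation of the 1962 program's logic, not Ernst's actual code.
# Objects are told apart by length alone, per the three-inch rule, and the
# box's position is re-read on every pass, so moving it mid-run is tolerated.

BOX_THRESHOLD = 3.0  # inches; anything longer reads as "the box"

class Obj:
    def __init__(self, name, length, position):
        self.name, self.length, self.position = name, length, position

def clear_table(objects):
    for obj in list(objects):
        if obj.length > BOX_THRESHOLD:
            continue  # that is the box itself, not a block
        # re-locate the box each time, in case it has been moved
        box = next(o for o in objects if o.length > BOX_THRESHOLD)
        objects.remove(obj)
        print(f"put {obj.name} into the box at {box.position}")

table = [
    Obj("box", 6.0, (10, 0)),
    Obj("block-1", 1.5, (2, 3)),
    Obj("block-2", 2.0, (5, 1)),
]
clear_table(table)
```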

In the mid-nineteen-sixties, Minsky and Papert began working together on the problem of vision. These efforts ultimately produced a program created by Minsky in collaboration with a group of hackers—Gerald Sussman, William Gosper, Jack Holloway, Richard Greenblatt, Thomas Knight, Russell Noftsker, and others—that was designed to make a computer “see.” To equip the computer for sight, Minsky adapted some television cameras. He found that the most optically precise one had been invented in the early nineteen-thirties by Philo Farnsworth, one of the pioneers of television. It was still being manufactured by ITT. Minsky ordered one and managed to get it working, but it kept blurring. He telephoned the company and was told that the best thing to do would be to talk to Farnsworth himself, who was still doing research at the company. Minsky explained his problem on the telephone, and Farnsworth instantly diagnosed it. Minsky then fixed the blurring and attached the camera to a PDP-6 computer. The idea was to connect this camera to an arm so that one could tell the computer to pick up objects that its eye had spotted and identified. The arm was then to do various things with the objects. In the course of this, Minsky designed a mechanical arm, powered by fourteen musclelike hydraulic cylinders. It had a moving shoulder, three elbows, and a wrist—all not much thicker than a human arm. When all the bugs were finally out and the machine was turned on, the hand would wave around until the eye found it. “It would hold its hand in front of its eye and move it a little bit to see if it really was itself,” Minsky said. The eye had to find itself in the coördinate system of the hand. Despite all the problems, they were able to get the arm to catch a ball by attaching a cornucopia to the hand, so that the ball would not fall out. It would sometimes try to catch people, too, so they finally had to build a fence around it.
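Finding the eye in the coördinate system of the hand is, in modern terms, a hand-eye calibration. The sketch below is my own toy version, with invented numbers: the arm visits a few known positions, the camera records where the hand appears in the image, and a least-squares fit recovers the map from pixels to arm coördinates.

```python
# A schematic of the hand-eye step, not the PDP-6 program: small known arm
# moves, observed pixel positions, and a least-squares affine fit between
# them. All data here are made up for illustration.

import numpy as np

# (arm x, arm y) positions the hand was commanded to, and the pixel
# coordinates at which the camera saw it there
arm_pts   = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
image_pts = np.array([[120, 80], [200, 82], [118, 160], [199, 161]], dtype=float)

# Fit the affine map: arm ≈ [u, v, 1] @ A, solved by least squares
H = np.hstack([image_pts, np.ones((len(image_pts), 1))])
A, *_ = np.linalg.lstsq(H, arm_pts, rcond=None)

def image_to_arm(u, v):
    """Where, in arm space, does a given pixel lie?"""
    return np.array([u, v, 1.0]) @ A

print(image_to_arm(160, 120))  # roughly the middle of the little workspace
```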

The project turned out to be much more difficult than anyone had imagined it would be. In the first place, the camera’s eye, it was discovered, preferred to focus on the shadows of objects rather than on the objects themselves. When Minsky and his colleagues got that straightened out, they found that if the scene contained shiny objects the robot would again become confused and try to grasp reflections, which are often the brightest “objects” in a scene. To solve such problems, a graduate student named David Waltz (now a professor of electrical engineering at the University of Illinois at Urbana) developed a new theory of shadows and edges, which helped them eliminate most of these difficulties. They also found that conventional computer-programming techniques were not adequate. Minsky and Papert began to try to invent programs that were not centralized but had parts—heterarchies—that were semi-independent but could call on one another for assistance. Eventually, they developed these notions into something they called the society-of-the-mind theory, in which they conjectured that intelligence emerges from the interactions of many small systems operating within an evolving administrative structure. The first program to use such ideas was constructed by Patrick Winston, who would later succeed Minsky as director of the A.I. Laboratory. And by 1970 Minsky and his colleagues had been able to show the computer a simple structure, like a bridge made of blocks, and get the machine, on its own, to build a duplicate.
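The flavor of a heterarchy can be suggested in a few lines. The toy below is my own construction, not the laboratory’s code: small, semi-independent specialists that call on one another for help rather than reporting to a single central controller.

```python
# A toy heterarchy: BUILD asks MOVE for help, MOVE asks FIND-SPACE, and if
# FIND-SPACE fails, MOVE turns to CLEAR-SPACE instead. No one routine is in
# overall charge. All names and logic here are invented for illustration.

def build(goal, scene):
    for block in goal:
        move(block, scene)

def move(block, scene):
    spot = find_space(block, scene)
    if spot is None:
        spot = clear_space(block, scene)  # stuck: call on another specialist
    scene[spot] = block
    print(f"placed {block} at spot {spot}")

def find_space(block, scene):
    for spot in range(10):                # ten spots on our toy table
        if spot not in scene:
            return spot
    return None

def clear_space(block, scene):
    spot = min(scene)                     # crude: evict whatever sits lowest
    del scene[spot]
    return spot

build(["red cube", "green cube", "pyramid"], {})
```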

At about the same time, one of Papert’s students, Terry Winograd, who is now a professor of computer science and linguistics at Stanford, produced a system called SHRDLU. (The letters “ETAOIN SHRDLU” come from the first two columns of keys on a Linotype machine; operators would run a finger down them to fill out, and so flag for discarding, a line containing a typographical error.) SHRDLU was probably the most complicated computer program that had been written up to that time. The world that Winograd created for his SHRDLU program consisted of an empty box, cubes, rectangular blocks, and pyramids, all of various colors. To avoid the complications of robotics, Winograd chose not to use actual objects but to have the shapes represented in three dimensions on a television screen. This display was for the benefit of the people running the program and not for the machine, which in this case was a PDP-10 with a quarter of a million words of memory. The machine can respond to a typed command like “Find a block that is taller than the one you are holding and put it into the box” or “Will you please stack up both of the red blocks and either a green cube or a pyramid?” When it receives such a request, an “arm,” symbolized by a line on the television screen, moves around and carries it out. The programming language was based on one named PLANNER, created by Carl Hewitt, another of Papert’s students. PLANNER, according to Minsky, consists largely of suggestions of the kind “If a block is to be put on something, then make sure there is room on the something for the block to fit.” The programmer does not have to know in advance when such suggestions will be needed, because the PLANNER system has ways to detect when they are necessary. Thus, the PLANNER assertions do not have to be written in any particular order—unlike the declarations in ordinary programming languages—and it is easy to add new ones when they are needed. This makes programs in the language relatively easy to write, but it also makes it extremely difficult to anticipate what a program will do before one tries it out—“so hard,” Minsky remarked, “that no one tries to use it anymore.” He added, “But it was an important stepping stone to the methods we use now.” One can ask SHRDLU to describe what it has done and to say why it has done it. One can ask “Can a pyramid be supported by a block?” and it will say “Yes,” or ask “Can the table pick up blocks?” and it will say “No.” It is sensitive to ambiguities. If one asks it to pick up a pyramid—and there are several pyramids—it will say “I don’t understand which pyramid you mean.” SHRDLU can also learn, to a certain extent. When Winograd began a question “Does a steeple—” the machine interrupted him with “Sorry, I don’t know the word ‘steeple.’” It was then told that “a steeple is a stack which contains two green cubes and a pyramid,” and was then asked to build one. It did, discovering for itself that the pyramid had to be on top. It can also correctly answer questions like “Does the shortest thing the tallest pyramid’s support supports support anything green?” Still, as Douglas Hofstadter points out in his book “Gödel, Escher, Bach,” SHRDLU has limitations, even within its limited context. “It cannot handle ‘hazy’ language,” Hofstadter says. If one asks it, for example, “How many blocks go on top of each other to make a steeple?” the phrase “go on top of each other”—which, despite its paradoxical character, makes sense to us—is too imprecise to be understood by the machine. We use phrases like this all the time without being conscious of how peculiar they are when they’re analyzed logically.
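Minsky’s example of a PLANNER suggestion can be mimicked with what is now called pattern-directed invocation. The sketch below is a loose Python paraphrase of my own, not Hewitt’s language: advice for a goal is registered in any order, and the system itself detects, at run time, which suggestions a goal triggers. The goal names and world structure are invented for illustration.

```python
# A rough sketch of PLANNER's flavor, not PLANNER itself: suggestions are
# registered in no particular order, and achieve() decides which of them a
# goal invokes. The rules paraphrase Minsky's "make sure there is room"
# example; everything else here is hypothetical.

suggestions = []

def when(goal_type):
    """Register a suggestion for a kind of goal; registration order is free."""
    def register(fn):
        suggestions.append((goal_type, fn))
        return fn
    return register

@when("put-on")
def ensure_room(block, support, world):
    # "If a block is to be put on something, make sure there is room..."
    if world["free_space"][support] < world["size"][block]:
        raise RuntimeError(f"no room on {support} for {block}")

@when("put-on")
def do_move(block, support, world):
    world["free_space"][support] -= world["size"][block]
    print(f"put {block} on {support}")

def achieve(goal_type, *args):
    # pattern-directed: the system, not the programmer, finds the advice
    for registered_type, fn in suggestions:
        if registered_type == goal_type:
            fn(*args)

world = {"size": {"b1": 2}, "free_space": {"table": 10}}
achieve("put-on", "b1", "table", world)
```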