The processor ran on integrated circuits, and this was the first time ICs had ever been used in a computer. In fact, the technology was so new that the production engineers didn’t even know how to test them! Every single step of this computer’s hardware was a gamble, a story of clever trial and error by some very smart people.

The Problem(s)

When you attempt a project this ambitious, there are bound to be problems. A lot of them. The AGC was built at a time when the concept of software itself was a little bit fuzzy. Software was simply part of the process of developing the hardware, and the MIT researchers were expected to program the thing as well. On this project, however, the software was by no means trivial. It introduced engineering problems in a field that wasn’t even considered real engineering.

One problem of particular interest is multi-programming. Clearly, the AGC had a lot of different tasks to do at the same time, like calculating trajectories, running maintenance programs, displaying information to the user, etc. How would it handle all of these at once, especially with the pathetically small amount of memory it had?

Traditional computers at the time used a system called round-robin scheduling. When there were multiple tasks to be done, the processor handled each one for a fixed slot of time, and kept cycling between them. For 3 processes P1, P2, and P3, the computer might handle P1 for 10 microseconds, P2 for 10 microseconds, P3 for 10 more, and then go back to P1.
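
To make the idea concrete, here’s a minimal Python sketch of round-robin scheduling. It is only an illustration, not how any computer of that era was actually programmed: the `round_robin` function, the work amounts, and the arbitrary time units are all made up for this example. Every turn occupies a full slot, so each entry also records how much of the slot went unused.

```python
from collections import deque

def round_robin(jobs, time_slice):
    """Give each job a fixed slot in turn until all jobs are finished.

    jobs: dict mapping a job name to its remaining work (arbitrary time units).
    Returns a list of (name, time_used, time_idle) tuples, one per slot.
    """
    queue = deque(jobs.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        used = min(time_slice, remaining)
        # The slot is fixed, so whatever the job doesn't use is idle time.
        timeline.append((name, used, time_slice - used))
        remaining -= used
        if remaining > 0:
            queue.append((name, remaining))  # back of the line
    return timeline

# Hypothetical workload: P1 needs 25 units, P2 needs 10, P3 needs 15.
timeline = round_robin({"P1": 25, "P2": 10, "P3": 15}, time_slice=10)
for name, used, idle in timeline:
    print(f"{name}: ran {used}, idle {idle}")
```

With this workload the slots go P1, P2, P3, P1, P3, P1, and the last two slots each sit idle for half their length.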

This method has its problems. For one, if a short program comes along, the remainder of its slot is wasted. If there’s a long or important program that demands a lot of computing power, it doesn’t get enough attention. Round-robin can work for large server computers that handle repetitive tasks, but for a real-time system that must stay responsive to the user, it fails.

The Solution

The engineers realized that they needed a method for the processor to complete the important jobs first, and know which jobs to keep for later — a notion of priority. The tasks should also be processed one by one, so that you finish the important tasks as soon as possible — a run queue.

The solution is to assign each process a particular priority, and keep moving the higher-priority tasks to the top of your queue. This priority-queue system ushered in the age of modern multi-programming and is still used in commercial operating systems.
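
Here’s a rough sketch of a priority run queue in Python, using the standard `heapq` module. This is a modern stand-in for the idea, not the AGC’s actual implementation: the `RunQueue` class, the job names, and the priority numbers are all invented for illustration.

```python
import heapq
import itertools

class RunQueue:
    """A run queue that always hands back the highest-priority job."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so jobs of equal priority come out first-in, first-out.
        self._counter = itertools.count()

    def add(self, priority, name):
        # heapq is a min-heap, so store the negated priority to get
        # the largest priority out first.
        heapq.heappush(self._heap, (-priority, next(self._counter), name))

    def pop(self):
        _, _, name = heapq.heappop(self._heap)
        return name

    def __len__(self):
        return len(self._heap)

# Hypothetical jobs and priorities (bigger number = more important).
queue = RunQueue()
queue.add(10, "update display")
queue.add(30, "compute trajectory")
queue.add(20, "run maintenance check")
queue.add(30, "fire thrusters")

order = [queue.pop() for _ in range(len(queue))]
print(order)
```

The jobs come out most important first, no matter what order they were added in, and the two priority-30 jobs keep their arrival order.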

Apollo had two separate job queues: the Waitlist and the Executive. Programs would move from the Waitlist to the Executive once the higher-priority processes had finished executing.

When a higher-priority process comes along, it is moved straight to the top of the Executive. This is called an interrupt, because the executing process is interrupted to make way for the more important one. The AGC also had a handful of 12-word erasable areas called Core Sets, where it stored information about the programs in progress. Hence, interrupted processes could be resumed where they left off. Clever!
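
The interrupt-and-resume idea can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the AGC’s real mechanism: the `run` function, the tick-based clock, and the dictionary standing in for a Core Set are all hypothetical. The point is just that each job’s progress lives in its own saved record, so a job that gets pushed aside picks up exactly where it stopped.

```python
import heapq

def run(arrivals):
    """Simulate priority scheduling with interrupts.

    arrivals: dict mapping a tick number to a list of
              (priority, name, steps) jobs arriving at that tick.
    Returns a log of which job step ran at each tick.
    """
    log = []
    ready = []  # heap of (-priority, arrival_order, saved_state)
    arrival_order = 0
    tick = 0
    last_arrival = max(arrivals, default=-1)
    while ready or tick <= last_arrival:
        for priority, name, steps in arrivals.get(tick, []):
            # Stand-in for a Core Set: where the job is and how far it must go.
            saved_state = {"name": name, "next_step": 1, "steps": steps}
            heapq.heappush(ready, (-priority, arrival_order, saved_state))
            arrival_order += 1
        if ready:
            state = ready[0][2]  # highest-priority job, possibly a newcomer
            log.append(f"{state['name']} step {state['next_step']}")
            state["next_step"] += 1
            if state["next_step"] > state["steps"]:
                heapq.heappop(ready)  # finished: free its saved state
        tick += 1
    return log

# Hypothetical scenario: a low-priority display job is interrupted
# one tick in by a high-priority guidance job.
log = run({0: [(10, "display", 3)], 1: [(30, "guidance", 2)]})
print(log)
```

The display job runs one step, waits while guidance runs to completion, and then resumes at step 2 instead of starting over.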

The Test

Priority-interrupt scheduling showed its true power during Apollo 11, after the Lunar Module initiated its descent towards the Moon. When the spacecraft was at 30,000 feet above the lunar surface, while the descent program was executing, the computer raised a warning alarm with the code 1202. In what has to be called the most important tech support call of all time, engineer Steve Bales at Mission Control in Houston discovered that the code meant “Executive Overflow - No Core Sets”, implying that the computer was handling more than it could take.

Think about how much work the system must have been doing at the time. It had to know where the lunar module was and where it was moving, information called the state vector. It needed to maintain the right attitude based on that position, as well as on velocity, altitude, and engine performance data. It also needed to adjust the abort trajectory constantly, ready to get the crew back into orbit should something force an abort. All this consumed nearly 90% of the CPU’s computing time.

Houston identified that the overflow could be due to pilot Buzz Aldrin leaving the external radar on in passive mode (called SLEW mode), which might still be eating some of the computer’s precious clock cycles. Sure enough, once the radar was turned off, the alarms ceased and the computer was stable once again.

Without a priority-interrupt system, the computer would have crashed when the program gave it a new job, and would have needed a hard reboot. Instead, the scheduler simply chucked out some lower-priority tasks and sounded the alarm, keeping the system alive. Without the scheduler, the moon landing would never have happened!
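
Here’s a toy Python model of that behaviour. It is a loose sketch under my own assumptions and not the AGC’s real restart logic: the `Executive` class, the number of core sets, the priority threshold, and the job names are all hypothetical. Only the alarm code 1202 comes from the story above.

```python
class Executive:
    """Toy scheduler with a fixed pool of core sets (one per active job)."""

    def __init__(self, core_sets, keep_at_or_above):
        self.core_sets = core_sets                # how many jobs fit at once
        self.keep_at_or_above = keep_at_or_above  # priority that survives a shed
        self.jobs = []                            # list of (priority, name)
        self.alarms = []

    def request(self, priority, name):
        """Try to start a job; returns True if it was given a core set."""
        if len(self.jobs) >= self.core_sets:
            # No free core set: sound the alarm and drop the less
            # important jobs instead of falling over.
            self.alarms.append(1202)
            self.jobs = [j for j in self.jobs if j[0] >= self.keep_at_or_above]
        if len(self.jobs) < self.core_sets:
            self.jobs.append((priority, name))
            return True
        return False

# Hypothetical overload: three core sets, and a stream of extra radar jobs.
ex = Executive(core_sets=3, keep_at_or_above=20)
ex.request(30, "guidance")
ex.request(10, "display")
ex.request(10, "radar counter")
ex.request(10, "radar counter")  # pool is full: alarm, shed, carry on

print(ex.alarms)
print(ex.jobs)
```

The fourth request trips the alarm, the low-priority jobs are thrown out, and the guidance job is still there afterwards.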

The Greatest Computer Ever Made

Margaret Hamilton, Director of Software Engineering at the MIT Instrumentation Lab, with the AGC source code.

I only talked about one small part of the AGC here, but it was truly a revolutionary machine in every way. As a programmer whose programs regularly leak memory, it baffles me how the software was so well optimized that it could do so much with so little. This wasn’t just a computer that had to work pretty well; it was a computer that would kill people if it failed. Perfection requires effort, enormous amounts of effort. The work was so demanding that 15 MIT engineers got divorced while working on the project! The years of passionate work that went into building the AGC show in the stellar success of the Apollo program.

I think it’s safe to say that this is indeed the greatest computer ever made, and will be for ages to come.