Which Machines Do Computer Architects Admire?

The Toolsmith Conference was held in Chapel Hill, NC, in October, 2001, to honor Fred Brooks. Dr. Brooks was a student of Howard Aiken at Harvard, joined IBM to work on the Stretch Project, and later served as chief architect of the IBM S/360. He founded the Computer Science Department at UNC-Chapel Hill in 1964 and served as chairman for 20 years. At UNC he has regularly taught the course in computer architecture, and in 1997 he published a textbook on computer architecture with co-author Gerrit Blaauw.

Dick Sites, co-architect of the DEC Alpha, was one of the speakers at the conference. He commented that part of his learning process at Chapel Hill was being motivated by hearing which machines Fred admired and then studying them to see for himself the noble characteristics.

On my way back from the conference, I started wondering which machines have been admired by other computer architects. So I asked several and herein I list their answers. I wish to thank each for his or her time in replying and permission to quote the answer.

[later additions: Gordon Bell]

primary machines

6502 - listed by Wilson

CDC-6600 and 7600 - listed by Fisher, Sites, Smith, Worley

Cray-1 - listed by Hill, Patterson, Sites, Smith, Sohi, Wallach

Cray X-MP and Cray Y-MP - listed by Sohi (see August, et al., IEEE Computer, January 1989, pp. 45-52)

Cray-2 - listed by Smith

Cray-4 - listed by Worley

GE-645 (Multics) - listed by Wallach

IAS - listed by Sites

IBM Stretch - listed by Sites

IBM 1401 - listed by Sites

IBM 1570 - listed by Worley

IBM 7040/90 - listed by Wallach

IBM S/360 and S/370 - listed by Alpert, Hill, Patterson, Sites

S/360 Model 91 - listed by Sohi

IBM ACS - listed by Worley

IBM America - listed by Alpert (see special issue, IBM Jrnl. Res. & Dev., January 1990)

Intel x86 - listed by Alpert

LC-2 - listed by Patt

MIPS - listed by Alpert, Hill, Patterson, Sohi, Worley

Multiflow - listed by Alpert

PDP-11 - listed by Alpert, Patt, Patterson

secondary machines

IBM ACS - listed by Sites

IBM Service Free Processor (801 predecessor) - listed by Sites

Xerox Alto - listed by Sites

sources of counterexamples

CDC-8600 - listed by Smith

IBM Stretch - listed by Sites

NS-32016 (16032) - listed by Wilson

VAX-11 - listed by Sites

Don Alpert, engineering manager of Intel Itanium project 1993-1997, chief architect of the Intel Pentium, also architect of NS 32532, NS Swordfish, and Zilog Z80,000

October, 2001

IBM System/370 - Formalized and exemplified the key distinction between architecture and implementation. Encompassed complete system functionality of multiple processors, virtual memory, and I/O. Performance of workloads and implementation tradeoffs became quantified and well characterized. I was especially influenced by Len Shustek's Ph.D. dissertation at Stanford comparing the IBM 370/168 and Amdahl 470/V6 machines. Shustek worked with Bernard Peuto on his analysis, and Bernard managed hardware engineering at Zilog, where I later worked. Finally, the 370 also demonstrated convincingly that an architecture can be successful despite technical flaws (FP guard bit, lack of signed displacements, lack of PC-relative addressing, lack of virtual memory, absolute addressing for interrupts, ...) if there is a valuable base of installed software, controlled evolution of features, and sufficient innovation and investment in hardware development.

PDP-11 - I was first exposed to the PDP-11 in my first semester as a grad student at Stanford. We had to design the datapath and microprogram the dataflow for basic operations. I continue to be impressed by the generality, simplicity, and power of the ISA. I was also fascinated by how the PDP-11 influenced other architectures of the late-1970's, specifically how architects of the Motorola 68K, DEC VAX, and Zilog Z8000 learned from PDP-11 usage patterns but chose different directions for 32-bit addressing, number/usage of registers, and addressing modes. I also had the opportunity to meet Gordon Bell over a working lunch about 10 years ago. He confirmed the story that the basic PDP-11 architecture was conceived over one weekend by him and a colleague at Carnegie Mellon, whose name I unfortunately cannot recall. In addition, the Unibus introduced by the PDP-11 stands out as a model for later I/O bus standards.

MIPS - I first met John Hennessy at a get-together for incoming Stanford EE grad students in August, 1977 when he was beginning his first year on the faculty. I also came to know the grad students who worked on the original Stanford MIPS design and compiler in the early 1980's. I have to admit that until the initial MIPS Co. systems came out I disbelieved that such a simple architecture could perform so well. This demonstrated that the CISC approach and the idiosyncratic features grafted onto RISCs (like register windows, capability addressing, even the original MIPS non-interlocked pipeline) performed no better than a pure RISC. In the summer of 1987 I visited with Mark Horowitz and his grad students at Stanford who were developing the MIPS-X to learn more about the implementation tradeoffs for this simple architecture.

America and Multiflow - When we were planning the successor to the NS32532 at NSC in 1987, it was clear that we could integrate multiple integer and FP functional units along with several KB of cache. The challenge was to figure out a machine organization that could deliver performance while maintaining a compatible programming model at a cost consistent with the targeted embedded applications. I first learned about the idea of superscalar architecture from hearing about John Cocke's work at IBM. If I recall correctly, the initial machine he described had the code name "America". The superscalar approach became the programming model for Swordfish, and the VLIW approach of Multiflow highly influenced the machine organization. I had met Josh Fisher once when I was a student at Stanford, then heard him give a talk about Multiflow at UC Berkeley in 1987.

Intel x86 - I first came across the 8086 in late 1978, when I was looking to select an architecture for a grad-school project in Prof. Mike Flynn's emulation lab. I had a friend at Intel who provided me copies of the manuals to review. I concluded that the ISA was too quirky to be of interest, and chose to emulate a Pascal P-Code virtual machine instead. Of course, the 8086 turned into a great commercial success despite its flaws because Intel provided complete system development solutions and IBM chose the architecture for the PC. In 1988 I traveled extensively for NSC to present plans for Swordfish to computer system developers. I consistently heard from them that the NSC technology was excellent, but their number one criterion for adopting an architecture was the installed base of application software. So after developing the architectures for three microprocessors between 1981-1989, publishing papers about them but seeing none of them in widespread use, I jumped at the opportunity to join Intel to work on the P5. And since that time I have come to appreciate the great irony that after computer architects have spent decades accumulating knowledge about effective ISA techniques, this quirky architecture that violates most sound principles has become so wildly successful. The P5 eventually exceeded its initial production plans by over 10x!

Bob Colwell, chief architect of Intel IA32 microprocessors from 1992-2000, and formerly a CPU architect at Multiflow

October, 2001

I believe outstanding designs have some common characteristics, regardless of the product space. The first is that they succeed: they must accomplish the task for which they were designed. This is no trivial thing. For instance, in my book, it is not ok for a designer to say "so maybe my new server CPU isn't very fast at server workloads, but boy does it fly on floating point." Designers don't get to change targets to suit whatever they came up with in their designs. Buyers get to decide such things. (It goes without saying that the converse is not necessarily true -- sometimes poor designs make a lot of money. Ignore this, it isn't relevant to this discussion.)

Beyond the practical success of a design, I admire machines whose designers went beyond the requirements and reached for elegance and beauty. There is a fundamental truth to a well designed machine of any kind, and you can tell when the designers have found it. For example, when a pilot is in trouble, and needs to push her airplane beyond the envelope that's in the book, and the plane remains stable, predictable, and controllable, that pilot should find the designers and buy them all a beer. There are many stories about helicopters from the Vietnam era (Chickenhawk, for example) where the pilot relates stories about having pushed Hueys beyond their flight envelopes due to the necessities of war, and the machine stayed with him. The DC-3 propeller plane was legendary for its dependability, no matter what trouble the pilot had flown into. There's a gestalt to these things, and they apply to microprocessors as well as to buildings, cars, and musical instruments.

Josh Fisher, pioneer of VLIW computing, chief architect of Multiflow, and now an HP Fellow

October, 2001

The CDC-6600 is well-known to have been important, but few people really have a sense of what it introduced:

Clean, concise (today we say RISC) instruction set

Seriously fast Icache

Software pipelining

Multithreading (SMT-style, more-or-less, though not as nice, in the peripherals)

Truly separated I/O

An easy-to-understand and thus easy-to-program-for superscalar control unit (the "Scoreboard")

I knew a lot of people who lived with this thing--we had serial #4 at NYU/Courant when I was a graduate student. It was a pretty wonderful lesson in Computer Architecture, and has influenced me to build really clean, RISC-like (VLIW being the logical successor of RISC if you think about it) architectures from then on - and also architectures you can compile for.

My whole career stemmed from building a little microcoded engine that emulated the CPU of that machine (the 6600 cost $250k/yr for maintenance, as I recall, and we built an ECL engine for $40k each). If the machine hadn't been so clean, we wouldn't have even considered doing that.

Mark D. Hill, University of Wisconsin, named a 2000 IEEE Fellow for contributions to cache memory design and analysis

February, 2002

I would like to spread my most admired designs over time.

For the 1960s, I choose the IBM System/360 architecture. The very concept of specifying an architecture to be the interface to software for many hardware implementations was a dramatic step forward. While in some ways it was an imperfect compromise (e.g., the SS instructions for Cobol), its core has lasted forty years and counting. If I would choose an implementation, it would be the 360/85, which included the first cache, enabling it to perform better than its more complex cousin, the 360/91.

For the 1970s, I choose the Cray-1. Architecturally, this clean design centered on supporting pipelining to the extreme (for the day). The implementation used optimizations that crossed levels of abstraction to obtain the best scalar performance of the day, augmented by even better vector performance.

For the 1980s, I choose the MIPS architecture and the R2000 implementation. This design embodies the quantitative approach to selecting the key hardware mechanisms to support software. The R2000 uses classic pipelining to obtain excellent performance from the limited transistors of the day.

I prefer to wait ten more years before selecting my choice for the 1990s.

Yale Patt, University of Texas at Austin, winner of 1996 Eckert-Mauchly award (ACM/IEEE) for contributions to computer systems architecture

October, 2001

PDP-11

LC-2

Dave Patterson, University of California, Berkeley, and designer of the RISC-I

October, 2001

Cray 1

IBM 360 (if picking one: the model 85 which did the cache)

PDP 11 (if picking one: model 45)

MIPS (if picking one: R3000)

Dick Sites, DEC Alpha co-architect

October, 2001

I looked back this weekend at what CPU manuals I have that are particularly well-thumbed. This is a better clue than most about what influenced me. You will notice four names occur frequently below: Fred Brooks, Gerry Blaauw, John Cocke, and Seymour Cray.

Cray-1 - A pure load-store RISC machine with very few warts. In my opinion, Seymour's crowning achievement. "Farmers buy computers." [Cray]

CDC 6600 - A ground-breaking machine designed for pure Fortran floating-point speed. Innovations: a simpler and perhaps more effective multi-function-unit design than the 7030; peripheral processors (PPUs) for I/O and OS and console; single datapath timeshared to make illusion of multiple PPUs; weird floating-point, entirely for speed; separate address, integer, floating registers and instructions; load/store as side effect of address calculation; multiple instructions per word; "Parity is for farmers". [Cray]

IBM 7030 (Stretch) - An incredibly rich source of examples and counter-examples for perhaps half of all the ideas in computer architecture. Matching disk controller, operating system and optimizing Fortran compiler. Tried to stretch the state of the art by a factor of 100; 1956-1961. [Brooks, Blaauw, Cocke]

IBM 360 - 35 years later, and still a commercial success. Introduced architecture as distinct from implementation; upward and downward compatibility; planned future extension; planned for enough address bits; had base registers to allow wide addresses in small instructions; brought commercial and scientific processing into a single design; put microcoding on the map; put hardware emulation on the map; had multiple compiler languages; gave the industry the 8-bit byte. [Brooks, Blaauw, Amdahl]

VAX-11 - Truth be told, this served as a continuing source of counter-examples when we did the Alpha design. We wrote down everything that made VAX implementations difficult or slow or tied to a specific operating system, and made sure to leave it out.

von Neumann's IAS machine - Like the much-later Cray-1, carefully considered for speed. Functions that could be performed as quickly in software were explicitly left out of the hardware (an enduring principle, even though the tradeoffs change with technology each decade).

IBM 1401 - A fine printer connected to a puny processor. This workhorse went through an astounding range of capabilities: from 1400 characters of memory to 16,000; from card I/O to magnetic tapes; from 4-tape Autocoder to 2-tape Autocoder to a Fortran compiler that ran in 8000 characters (Fortran program stayed in memory, compiler was 63 overlays); from simple 1-character instructions to index registers. The 1401 is the only machine I have used that allowed useful programs to be punched onto a single card, directly in machine language. Simplest boot sequence: read a card into locations 1..80 and branch to location 1.

I should also mention the (unpublished) IBM 801 predecessor I worked on one summer [Cocke], the unbuilt ACS processor from another summer at IBM [Cocke], and the seminal Xerox Alto [Thacker].

Jim Smith, University of Wisconsin, chief architect of the Astronautics ZS-1, and winner of 1999 Eckert-Mauchly award (ACM/IEEE) for contributions to computer systems architecture

October, 2001

An interesting question -- for me there is a clear answer. Although it is tempting to come up with an obscure machine -- the machine that I consider to be a true masterpiece is highly regarded by many other people. The CRAY-1 matched technology, implementation, instruction set like no other. Its elegant simplicity is unsurpassed.

But just about all the Cray machines were quite interesting.

The CDC 6600 was jointly developed by Cray and Thornton. It was the first RISC -- the RISC philosophy was laid out chapter-and-verse by Thornton in the "Elephant Book" passed out at the 6600's introduction in 1963 -- a copy of which no doubt made its way to IBM. Thornton designed the central processor with Les Davis; Cray and Harry Runkel(?) did the memory and I/O systems (also a major achievement).

The 6600 central processor with its scoreboard-controlled out-of-order issue was innovative and is very well known, but the 7600 central processor, designed by Cray (Thornton had gone off to do the STAR-100) is probably a cleaner, more unified design than the 6600. The instruction set was very similar to the 6600's. Although it may sound heretical, I think the pipelined, in-order issue 7600 is the 6600 "done right."

The 8600 [U.S. patent 3,833,889] was very interesting, but a technology over-reach that was never finished. Cray pushed the envelope and sometimes pushed too hard. The 8600 was constructed of discrete circuits (when discrete really meant discrete -- in the 8600 individual transistors and resistors were soldered to the circuit board). The 8600 also used very aggressive three-dimensional packaging [U.S. patent 3,832,603] -- and would have run at 125 MHz (in the very early 70s) -- 50% faster than the Cray-1, which came out in the mid 70s. The 8600 had four processors and used two-operand instructions with 16 operating registers (versus the three-operand instructions and 8 registers used in just about all the other Cray designs). The 8600 also used a cylindrical cabinet like the Cray-1 with power supplies radiating out at the base. Cray left CDC to form Cray Research before the 8600 was finished (largely because of CDC budget cuts) but people close to the project have indicated that it had serious packaging-related problems and it is doubtful that it ever would have been finished.

After the Cray-1, there were at least two aborted Cray-2 designs (maybe more). It took about ten years between the Cray-1 and Cray-2. One of the early Cray-2 attempts was remarkably innovative. It used a register hierarchy, with a single accumulator at the top. The single accumulator could be renamed, and dependent instructions were linked through the accumulator. These linked instructions would all be directed to one of eight parallel issue FIFOs (one per physical copy of the accumulator). My group is currently researching this type of architecture; its simple, distributed microarchitecture seems amenable to very high clock rates and wire-delay limited technologies.

The Cray-2 was not especially successful at the time but is another machine that may have relevance today. The scalar ISA was a little cleaner than the Cray-1, but the most interesting characteristic was the extremely deep pipelining -- four gate levels per pipeline stage. Instruction issue took two cycles between consecutive instructions. This was probably too deep for optimal scalar performance, but the deep pipes were excellent for vectors. The Cray X-MP (same chip technology, less aggressive packaging, eight gate levels per pipe stage) had a slower clock, 8.5 ns versus 4.1 ns, but faster scalar performance.

The Cray-3 was in some respects like the 8600 -- perhaps it pushed the envelope too far, also in the direction of very aggressive 3-D packaging. The ISA was very similar to the Cray-2, but Cray somehow found a way to issue a scalar instruction every cycle -- the clock frequency was around 500MHz, as I recall.

Highly recommended is the book "The Supermen: the Story of Seymour Cray and the Technical Wizards behind the Supercomputer" by Charles J. Murray. All the bits and pieces of history that I picked up while working at CDC and Cray correlate very strongly with the version of history given in this book.

Guri Sohi, University of Wisconsin, winner of 1999 Maurice Wilkes award (ACM SIGARCH) for contributions to computer architecture

November 2001

I learned a lot by studying the IBM 360/91. It is a must for any serious student of architecture.

The Cray-1 taught me a lot about basic high-performance processing principles: clean ISA, vectors, the importance of dealing with branches efficiently, etc. I learned about basic MP support for automatically parallelized applications from the Cray X-MP and the Cray Y-MP.

The MIPS architecture taught me about modern RISC ISAs.

Steve Wallach, chief architect of the Convex C-series and the DG MV/8000

October, 2001

Three systems greatly influenced me:

The IBM 7040/90 - This was the first system that I ever programmed in assembly language. I was fascinated and intrigued by the register set, the addressing modes, etc., especially how the ISA was influenced by Fortran.

The Multics system - The whole protection and security greatly influenced me. The virtual addressing and protection structure of the Data General MV/8000 was based on Multics.

The Cray-1 - The Convex C-1 was a virtual memory mini-cray. I read the Cray manuals and the various published papers many, many times. I was fascinated with the Cray-1. I built a military vector machine (the AADC - an APL machine in 1973) and wanted to see what the Cray-1 was. I still have a pirated video tape of Seymour making a presentation at LLNL on the Cray-1. A must see for all future architects.

And while not a machine, the papers by Knuth and Wulf that showed the statistics of addressing modes, instruction set usage, etc.

Sophie Wilson, chief architect of ARM and more recently of the Broadcom FirePath

October, 2001

Primarily the 6502. I learned about pipelines from it (by comparison with the 6800) and its designers were clear believers in the KISS principle. Plus the syntax of its assembler and general accessibility of it from the machine code perspective. I can still write in hex for it - things like A9 (LDA #) are tattooed on the inside of my skull. The assembly language syntax (but obviously not the mnemonics or the way you write code) and general feel of things are inspirations for ARM's assembly language and also for FirePath's. I'd hesitate to say that the actual design of the 6502 inspired anything in particular - both ARM and FirePath come from that mysterious ideas pool which we can't really define (it's hard to believe that ARM was designed just from using the 6502, 16032 and reading the original Berkeley RISC I paper - ARM seems to have not much in common with any of them!). And clearly the 6502's follow-up, the 65816, wasn't "clean" any more, so whichever of Mensch and Moore contributed what to the 6502, Mensch by himself was a bit at sea.

Biggest object lesson was, however, National Semiconductor's 32016 (aka 16032): this showed how to completely make a mess of things. The 32016 first exposed the value of memory bandwidth to Steve Furber and me, showed how making things over-complex led to exceedingly long implementation times with loads of bugs in the implementation, and showed that however hard you tried to approach what compiler writers claimed they wanted, you couldn't satisfy them (no, I never did use a VAX). And an 8MHz 32016 was completely trounced in performance terms by a 4MHz 6502...

Bill Worley, worked with John Cocke on the IBM 801, later a principal architect of HP's two most important computer architectures -- PA-RISC and PA Wide-Word (which was the basis of IA-64)

March, 2002

1. Mid 1960s - IBM 1570. This was a 16-bit personal computer that never was productized because IBM then saw no market for a "personal computer." The 1570 had a one-bit data path and a high-speed ECL implementation. The primary software was an interactive APL system. Instructions were 16 bits or 32 bits in length. In the latter case, the low-order 16 bits were an immediate value or an address displacement. The high-order 16 bits consisted of four 4-bit fields specifying opcode, up to three register operands (16 general registers), or control. There were only two computation instructions: 2's complement ADD and NOR. Instruction inputs and outputs, however, freely could be complemented (1's complement). Code was remarkably compact. I recall that the soft disk I/O subroutine was 14 instructions in length. (The soft disk resembled a 45-RPM record with a plastic clamp around the outer periphery.) The system looked like a desk, with a built-in 2741 Selectric typewriter and optical punched card reader. Two disk drives lay side by side in a center pull-out drawer. Because this never became a product I don't know if it qualifies for your list or not, but it did begin a family of peripheral controllers, and I did admire it.

2. 1960s - IBM ACS machine. This 48-bit machine also never became a product, but conceived nearly all the principles for an out-of-order superscalar RISC processor over two decades before such products emerged.

3. 1960s - CDC 6600. Seymour Cray broke new ground in many categories.

4. 1980s - MIPS. The MIPS architecture was elegant in its simplicity, though it often used two instructions where one could have sufficed.

5. 1990s - The Cray Computer Cray-4. Seymour's architecture and hardware implementation I considered to be things of beauty.

Gordon Bell, architect or co-architect on the DEC PDP-4, PDP-5, PDP-6, and PDP-11; led the development of the DEC VAX

April 2008 - excerpt from Computerworld interview

What has been your favorite computer of all time?

For one that I was involved with, the VAX was the most successful. It was a joy to be involved with. It was a wonderful team. I'm very proud of what we produced. And for one that I wasn't with, in a funny way it's probably the IBM 360. I love Seymour Cray's computers. I'd say it's the vector processor. The Cray style of vector processor is one of the great inventions. It's certainly underappreciated by most scientists. It just computes very fast. It was the workhorse for computing for, really, two decades. It was the workhorse from '75 to '95. It had a wonderful elegance to it and the way it works. It really was a spectacular piece of engineering.

Related Lists

Perspectives from Eckert-Mauchly Award Winners

Bruce Shriver and Bennett Smith asked six Eckert-Mauchly Award winners what were the 5 or 6 most important books or articles that affected the way they approached the central issues of computer architecture and what 5 or 6 books or articles they would recommend for others to read because of likely impact on future architectures. See pp. 52-61 in Shriver and Smith, The Anatomy of a High-Performance Microprocessor: A Systems Perspective, IEEE Computer Society Press, 1998. The six award winners are: John Cocke, Harvey Cragon, Mike Flynn, Yale Patt, Dan Siewiorek, and Robert Tomasulo.

Processor design pitfalls

Grant Martin and Steve Leibson listed thirteen failed processor design styles in "Beyond the Valley of the Lost Processors: Problems, Fallacies, and Pitfalls in Processor Design." See Chapter 3 in Jari Nurmi (ed.), Processor Design: System-On-Chip Computing for ASICs and FPGAs, Springer, 2007.

Designing a high-level ISA to support a specific language or language domain

Use of intermediate ISAs to allow a simple machine to emulate its betters

Stack machines

Extreme CISC and extreme RISC

VLIW

Overly aggressive pipelining

Unbalanced processor design

Omitting pipeline interlocks

Non-power-of-2 data-word widths for general-purpose computing

Too small an address space

Memory segmentation

Multithreading

Symmetric multiprocessing


mark@cs.clemson.edu