What Really Happened with Vista

I have generally posted about things that I have been directly involved with: either code I wrote or projects I managed. In this post I am taking a different tack to write about my perspective on the underlying causes of the Windows Vista (codename Longhorn) debacle. While this happened over a decade ago, this was a crucial period in the shift to mobile and had long-running consequences inside Microsoft. I have found many of the descriptions of Microsoft’s problems, especially around the shift to mobile, to be unconvincing and not to mesh with my understanding or experience of what went wrong. Vanity Fair’s article Microsoft’s Lost Decade ascribed it to bureaucratic rot and infighting (“life … had become staid and brutish”) or culture rot due to the negative effects of a competitive stack ranking evaluation system. A more recent article in The Atlantic describes it as a classic “Innovator’s Dilemma” story.

I think there is a different story to tell — one that is better rooted in the actual facts of the projects and the real motivations of key parties. This is not an effort at alternative history — I have no idea what would have happened if these mistakes were not made but they certainly did not help Microsoft navigate this critical inflection point in the computing industry.

This is also not investigative journalism — I have not done a deep series of interviews with key parties. This is my point of view based on what I experienced during that period and learned afterwards. Although I was in the Office organization at the time, I worked closely with many Windows partners and so was very aware of what was happening in the Windows organization.

I apologize for the length. The TL;DR version is:

Microsoft badly misjudged the underlying trends in computer hardware, in particular the right turn that occurred in 2003 to the trend of rapid improvements in single-threaded processor speed and matching improvements in other core elements of the PC. Vista was planned for and built for hardware that did not exist. This was bad for desktops, worse for laptops and disastrous for mobile.

The bet on C# and managed code was poorly motivated and poorly executed. This failure in particular can be laid directly on Bill Gates and his fruitless Holy Grail effort to create a universal storage and universal canvas applications infrastructure. This had especially long-running consequences.

Windows project management had teetered on the edge of catastrophe throughout its history, with a trail of late projects that stumbled to completion. Vista was a disaster but was just the culmination of a series of near-catastrophes in the core executive mission of complex project execution.

Since it is so critical to this story, I want to start with a short primer on the structure of the industry and value creation.

Any device is constructed from hardware, an operating system (OS) and applications. At the most basic level, the OS manages and exposes hardware resources so they can be shared and used by applications. The OS exposes application programming interfaces (APIs) that allow different types of hardware to be connected to the device and APIs that allow applications access to this hardware as well as OS-provided services. While at the base level the OS is only exposing hardware resources, in practice the “OS” includes many other higher-level functions including a graphical user experience, complex controls for displaying and editing rich text or embedding HTML, higher level networking support, file management, mechanisms to share data and functionality between applications and even full applications like browser, email, photo management and cameras. The history of the OS, especially in the consumer world, is one of including more and more high-level services, either provided directly to users or exposed as APIs to applications.

This evolution of higher-level functionality is driven by the virtuous cycle and multi-sided network effects inherent in the OS business. More and more users of an OS attract more developers. More developers create more applications that make the OS more attractive to users. That results in a cycle of still more users leading to still more developers. The APIs exposed by the OS are what makes this such a stable business strategy for the winners in this contest. Millions of developers in aggregate expend massive effort programming to the system APIs and the services behind them. The deeper some application depends on the sophisticated APIs exposed by a particular OS, the harder it is to move that application to some other OS. This means that even if a competitor matches the core functionality of another OS, they will be missing all those applications. There is no way a single OS provider can match the effort expended by those millions of developers.

With this dynamic, there are multiple reinforcing reasons for an OS provider to add more and more sophisticated functionality and APIs to their OS and make it easier for developers to access this functionality. Sophisticated functionality should attract developers and make it easier for developers using these APIs to quickly build better applications. These better applications fit directly into the virtuous cycle to attract more users. A classic example was when Windows was the first OS that made it possible for an application to embed an HTML surface directly in the application. Critically, when an application uses this functionality, it now makes it harder to move that application to another OS.

If you look at Windows, iOS and Android, they all operate with this same dynamic despite the fact that Microsoft, Apple and Google all monetize differently. Microsoft classically charged a per-device licensing fee that was paid by OEMs that sold Windows devices. This was a horizontal business strategy with lots of OEMs all paying Microsoft for the devices they built and sold. Apple monetizes by building and selling devices directly. Google does not charge OEMs an OS license but rather depends on post-sale monetization primarily through search. In fact the fear of being excluded from mobile search (and mobile services generally) by Apple and Microsoft is what drove Google to invest in Android in the first place. Microsoft is also moving to direct device monetization (through its Surface line) as well as post-sales monetization (through Bing and subscription-based services like Office 365).

Another important part of the story here is third-party middleware like Java and Adobe Flash. Middleware is in some ways no different than higher-level OS services except they are built and provided by a third-party. OS providers have a love-hate relationship with middleware. To the extent that it makes it possible for developers to more quickly build great applications for their platform, there is love. The “hate” part is driven by a number of dynamics. Certain types of middleware specifically target the challenge of building applications for multiple platforms. Middleware like Java and Flash promised “write once, run anywhere”. Applications built on this middleware do not depend directly on OS APIs and therefore can run on any platform where the middleware exists. The middleware handles mapping the APIs it exposes on to the native OS APIs. (Note that modern readers will think of Java as either server-side infrastructure for web sites or as the preferred language for Android development. I am referring to its origin as a language for on-demand browser-based applications which was its main identity as Vista was planned.)
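To make the mapping concrete, here is a minimal sketch in Python of what a cross-platform middleware layer does. Everything in it is hypothetical: `OsANative`, `OsBNative` and their methods are invented stand-ins for two incompatible native APIs, not the real API of Java, Flash or any OS.

```python
from abc import ABC, abstractmethod


# Two imaginary native OS APIs. The names and signatures are invented
# for illustration and deliberately do not match each other.
class OsANative:
    def create_window_a(self, title: str) -> str:
        return f"[OS-A window: {title}]"


class OsBNative:
    def open_surface(self, caption: str, flags: int = 0) -> str:
        return f"[OS-B surface: {caption}]"


class Middleware(ABC):
    """The portable API that applications program against."""

    @abstractmethod
    def show_window(self, title: str) -> str:
        ...


class MiddlewareOnA(Middleware):
    """Maps the portable call onto the hypothetical OS-A API."""

    def __init__(self) -> None:
        self._os = OsANative()

    def show_window(self, title: str) -> str:
        return self._os.create_window_a(title)


class MiddlewareOnB(Middleware):
    """Maps the same portable call onto the hypothetical OS-B API."""

    def __init__(self) -> None:
        self._os = OsBNative()

    def show_window(self, title: str) -> str:
        return self._os.open_surface(title, flags=0)


def app_main(platform: Middleware) -> str:
    # The application never names an OS API, so nothing ties it to one OS.
    return platform.show_window("Hello")
```

The same `app_main` runs unchanged on either backend, which is exactly the property that worries an OS provider: the application's dependency is on the middleware, not on the platform underneath it.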

Cross-platform middleware disrupts the network effects driven by exclusive applications tightly bound to an OS through exclusive OS-specific APIs. Applications built on such middleware also tend to target “lowest common denominator” functionality and are slower to take advantage of new OS capabilities. Some types of OS functionality generate their own internal network effects where the more applications that use the functionality, the better all the applications behave. Rich copy-and-paste is a classic example; the more applications allow rich content to be copied and pasted between them, the more valuable the OS is for each user. If a third-party middleware provider blocks this dynamic, it has blocked a key opportunity to create further sustainable differentiation over time.

The browser as an application-delivery platform is probably the most stable example of middleware that disrupts the OS API dynamic. Looking back on 35 years of PC history, other approaches have existed for a time but ultimately collapsed for reasons that I do not need to dive into here. The critical thing for this story is that twenty years ago it was not so clear that this is how it would play out. Fear of middleware and disruption of the sustainable API differentiation drove much thinking at the time of Vista.

Let’s dive into the story of Windows and the challenges of execution.

I am going to do a brutal job of summarizing but I think it captures the essence. A Windows release generally had a key theme and rough timeline. For example, Windows 95 was about modernizing consumer Windows to 32-bit, a modern file system, updated UI and standard networking (including a browser). Beyond main themes, individual developers and teams would determine the key features for their area on their own and begin on development. The product under development would lose stability as new features were checked in and for long periods could barely be packaged together as a release much less used broadly on a dogfood basis. At a certain point, the team would determine they had made enough progress for the release and start the drive to stability and shipping. The history of the Windows team was that the release generally slipped significantly from the initial target date (Windows 95 was initially Windows 93) and important target functionality was either dropped or shipped well short of the original functional target. The drive to shipping was often a “death march” that involved long nights and weekends of bug fixing to make the new dates. I will note that a key distinction between Windows and Office was that after Office 97, Office would pick a release date and generally nail it. This made all the difference in achieving broad coordination with minimal process overhead.

This process contrasts significantly with modern engineering practices. Independent of whether individual feature ideas are driven top-down from a broad consistent vision or bottom-up from individual engineers and teams, modern practices generally involve maintaining continuous ship-level quality and actually shipping to customers on a very frequent basis. Services might ship multiple times a day while client code will ship weekly or monthly (updating clients has an expense for both the provider and user which militates against updating too rapidly). This requires major architectural and engineering infrastructure to accomplish reliably for large complex systems like Windows or Office. This process does not necessarily make it easier to build big advances in complex functionality, but it dramatically increases the team’s agility and ability to respond to external events and realities. It also forces a much more honest ongoing appraisal of how much real progress is being made. It is probably worth a separate post to describe how Office made this transition, but for this story suffice it to say Windows was nowhere near this process during this timeframe.

Windows XP was a massive release that followed this pattern. It unified the business and consumer platforms on the robust kernel of Windows NT and the consumer Windows user experience. Compatibility with all the applications built on top of the consumer Windows platform was a huge challenge but was key to enabling the transition to a single consumer and business platform. Unfortunately, Windows XP also had a headline-making zero-day security exploit on its public release date. This and other high-profile security disasters started a pivot inside Microsoft to a huge overhaul of software and engineering practices around security and ultimately an immense service pack to Windows XP that consumed a large part of the Windows organization. Additionally, key parts of the Windows core engineering team were focused on the move to 64-bit computing which was especially key to unifying Windows across client and server development (Windows was behind other enterprise platforms like Sun’s Solaris in 64-bit computing at the time). This was critical as Windows competed (successfully) to build up a larger enterprise server business.

While large parts of the Windows organization were focused on these initiatives, an equally large organization was focusing on building the next generation of Windows on top of a “managed” C# platform. This requires a little more background.

The browser started evolving as an application delivery environment from very early days. Marc Andreessen’s notorious quote “Netscape will soon reduce Windows to a poorly debugged set of device drivers” is from 1995. This led to the “embrace and extend” strategy that got Microsoft in so much trouble in the antitrust case. Microsoft built and invested in its own browser and its own ActiveX code embedding mechanism. Java arrived in this time frame as an alternate application delivery strategy. Developers could use Java, a high level language with its own rich set of APIs, and have the code automatically downloaded and run in the browser. Java was billed as a “write once, run anywhere” technology that fell clearly in the “hate” half of the middleware reaction spectrum. Microsoft signed a surprising licensing deal with Sun for Java but then was sued when they extended Java to allow direct access to native Windows APIs (which would disrupt the “run-anywhere” promise but allow a Java developer access to the richer and evolving set of platform APIs). Microsoft settled the Java lawsuit and ultimately decided to forge its own path with the C# language. This proved to be disastrous for a wide range of reasons. (I’ll note that C# is a fine piece of technology on its own — the disaster was strategic.)

C# is a “managed” language which mostly means that developers do not need to manage allocating and freeing memory “by hand”. The language and its runtime component use garbage collection to automatically recover any memory no longer in use. Importantly for this time, the runtime also prevents the type of memory bugs that cause many of the security vulnerabilities we were seeing. At the time then and really for the following decade there were passionate arguments about the impact on programmer productivity and security of automatic memory management. I will not try to have that debate here but perhaps suffice to say that the most successful OS of our current era, iOS, did not make this gamble. (Android sells more copies but iOS captures the vast majority of the profit.) Managed environments have an inherent resource overhead compared to unmanaged environments so they tend to require more memory to run. Most environments that leverage the productivity benefits of managed code are careful to limit its use to where it makes the most sense rather than leverage it blindly.
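As a small illustration of what "managed" means in practice, here is a sketch in Python rather than C# (Python is used only because it is compact; its collector is not the .NET one and the details differ). The `Node` class and function names are made up for the example. Two objects point at each other, the program drops its references, and the runtime, not the programmer, recovers the memory.

```python
import gc
import weakref


class Node:
    """A small object that can point at another Node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.other = None


def make_cycle_and_drop() -> weakref.ref:
    # Build two objects that reference each other, then let the local
    # names go out of scope. No explicit "free" call appears anywhere.
    a = Node("a")
    b = Node("b")
    a.other = b
    b.other = a
    # A weak reference lets us observe the object without keeping it alive.
    return weakref.ref(a)


def demo() -> tuple:
    was_enabled = gc.isenabled()
    gc.disable()  # pause automatic collection so the check below is deterministic
    try:
        probe = make_cycle_and_drop()
        alive_before = probe() is not None  # the cycle keeps both objects alive
        gc.collect()                        # the runtime finds the unreachable cycle
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after
```

On CPython, `demo()` should report the object as alive before the collection and gone after it. The convenience is real, but so is the cost the paragraph above describes: between the moment the program stops using memory and the moment the collector reclaims it, that memory is still occupied.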

Programmers new to these types of environments (and virtually all of the hundreds of developers working in Windows on the project at this time were) tend to take a casual overall attitude to memory use. But memory use, whether automatic or manual, is resource use and a casual attitude to resource use results in bloated code that requires more memory to run. In fact, even being “highly productive” (generating lots of code) is not necessarily the best outcome for a project (“if I had more time I would have written a shorter letter”). Using more resources was part of the value system at the time since it reinforced how important a large rich client was to the computing experience (vs. a “thin client” like a simple application exposed through a web page). The Windows team providing updates on Longhorn would brag about how many new APIs they had written.

Part of the bet on C# was also a bet on a rich base class library and then building new client technology as a set of class libraries on top of this base. The base library provides simple types like strings and arrays as well as more complex data structures and services like lists and hashtables. The goal was that this would provide consistency to the overall Windows API. Win32 had started as a relatively small consistent API but had exploded over the previous decade with many different teams contributing to the API set and with little overall consistent oversight. This effort was seen as an opportunity to rein that back in.

The fact that no other operating system had taken this path was seen as the type of “big bet” that was a fundamental part of the value system in the internal Microsoft culture. Unfortunately, even beyond the issues of bloated resource use, there were fundamental challenges involved in using this as an operating system technology (especially around how to handle resource failures in critical parts of the OS, how to independently update applications, the managed runtime and the OS, and how to allow different parts of the OS to evolve independently) that had not been addressed or even fully understood at the time. There was virtually no migration strategy for existing applications built on top of the unmanaged Win32. Despite this, vast armies of developers were unleashed building on top of this unsteady platform. What were they working on?

This is really where we need to pull Bill Gates into this story.

The whole history of Microsoft from its origin is about the primacy of software over hardware. Hardware is a commodity — software is where the value lies. If you look at sustainable profit in the PC ecosystem, this view is mostly accurate. Certainly it was true looking at OEMs and Microsoft. Intel was the hardware player that was able to capture most value in the overall hardware stack. At the same time, the thing that makes the overall computing ecosystem so dynamic is the relentless exponential improvement in hardware capability driven by Moore’s Law and other exponential hardware trends. Software rides this wave even as it captures much of the economic value in the overall final product. Remember the fundamental core role of an OS is to expose hardware resources for fair use by applications.

The intertwined role of hardware and software can sometimes make these perspectives on value attribution hard to tease apart. In fact, it might be easier to look at the smartphone space to provide more clarity. When engineers at RIM (maker of the dominant BlackBerry phone) saw a demonstration of the original iPhone, their initial reaction was “that’s impossible!” It was impossible to build a full screen lightweight phone with touch interface and the demonstrated performance and actually have the battery last long enough to be useful. But it actually was possible. Over the last ten years, the market has been driven by ongoing hardware innovation (better screens, faster communication, faster processor, more memory, better camera, new sensors, better batteries, lighter weight, instant on) mediated by the OS software.

As iOS was opened up to third-party applications, the striking thing was just how carefully the OS controlled application behavior in order to preserve overall device performance. From the standards and review process enforced through the curated Apple store, to careful application sandboxing, to the initial limitation to a single task and no background processing, to tight constraints on application responsiveness, to low-power hardware-assisted audio and video processing as well as a host of other behaviors all focused on managing precious power use, many iOS innovations were fundamentally focused on the core OS function of managing and carefully exposing hardware resources to applications.

The contrast with the Windows team perspective as the Vista project started could not be more stark. The role of hardware innovation was to enable new software innovation rather than the role of software being to expose hardware innovation.

As the Longhorn project started, three big teams began working on large efforts to rethink the client software stack on top of the managed C# platform. The WinFS team would build a new universal storage layer for applications. Instead of a simple hierarchical collection of files and folders, the file system would be powered by a full-featured relational storage engine. This would not only make it easier to build powerful new applications, but their data would not be locked in opaque files but would be exposed to other applications as relational tables that could be mixed and matched into new and even richer applications. These are the types of internal network effects that create powerful competitive moats. The information could also be exposed in a new file browser for simple search as well as complex queries. This would finally realize the vision that inspired the Cairo universal storage effort but was abandoned in order to ship Windows 95.
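To give a feel for the "mixed and matched" idea, here is a rough sketch using Python's built-in `sqlite3` module as a stand-in for a shared relational store. This is not the WinFS API or schema; the tables, columns and the two "applications" are invented purely for illustration.

```python
import sqlite3

# One shared store that every application on the machine can see.
store = sqlite3.connect(":memory:")
store.executescript(
    """
    CREATE TABLE contacts (
        id    INTEGER PRIMARY KEY,
        name  TEXT,
        email TEXT
    );
    CREATE TABLE photos (
        id         INTEGER PRIMARY KEY,
        path       TEXT,
        contact_id INTEGER REFERENCES contacts(id)
    );
    """
)

# A hypothetical mail application stores its contacts as rows, not an opaque file.
store.executemany(
    "INSERT INTO contacts (id, name, email) VALUES (?, ?, ?)",
    [(1, "Ada", "ada@example.com"), (2, "Grace", "grace@example.com")],
)

# A hypothetical photo application tags its pictures with those same contacts.
store.executemany(
    "INSERT INTO photos (path, contact_id) VALUES (?, ?)",
    [("beach.jpg", 1), ("lab.jpg", 2), ("party.jpg", 1)],
)


def photos_of(conn: sqlite3.Connection, name: str) -> list:
    """A third application joins data that two other applications wrote."""
    rows = conn.execute(
        "SELECT p.path FROM photos AS p "
        "JOIN contacts AS c ON c.id = p.contact_id "
        "WHERE c.name = ? ORDER BY p.path",
        (name,),
    )
    return [row[0] for row in rows]
```

A call like `photos_of(store, "Ada")` answers a question that neither original application was written to answer. Note that the query only works because both applications agreed to expose their internal data model as shared tables.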

The Avalon (later Windows Presentation Foundation) team would rethink the presentation layer on top of powerful graphics processors. This presentation layer would be focused on building a universal canvas where user interface and rich application content could be seamlessly mixed into experiences that were part document and part user interface, all powered with 3D game-capable graphics processors.

A third team built the code that shipped as Windows Communication Foundation (WCF) for building networked features, a critical component for virtually every modern application.

The combination of rich storage and rich presentation was Bill’s Holy Grail. Built on a consistent managed C# infrastructure, these components would enable new classes of powerful applications that could be quickly and efficiently constructed by developers. The rich infrastructure and API moat would power the OS virtuous cycle for another decade.

So what went wrong? In a word, everything.

Some problems were due to short-term execution failures and some were longer term strategic failures.

As the core team came off the security effort and the 64-bit Windows product, they re-evaluated the status of the overall Longhorn project. The teams had written a massive amount of code. Unfortunately, when you are building a complex system and running without clear constraints and delivery deadlines, the right mental image for a team that is generating lots of code is not one that is building a railroad and is now 90% across the country. A better image is one where you have dug an incredibly deep hole that you now have to figure out how to climb out of and fill back in. The team was just coming to grips with understanding all the implications of trying to ship OS features on top of this managed infrastructure. They recognized they had a ton of work to do to even make the basic premise a reality. On top of this, none of the major components were anywhere near ready to ship. They also started to realize the performance implications of introducing major new subsystems into existing code paths. The code being written in WinFS and Avalon was not replacing the existing OS infrastructure; it was layered on top of it. So all of its significant performance costs were purely additive.

As detailed in a Wall Street Journal article from 2005, Jim Allchin made the decision to push these major components out of the release while continuing development on them. The effect was that three years into the product cycle, they were effectively starting from scratch. All these managed features would be pushed out of the core OS and would ship separately. Pulling them out was clearly the right decision, but it both revealed and introduced problems that would last for more than a decade.

The bet on C# and managed code included a strategy that reduced investments in the core unmanaged Win32 layers. I remember long meetings trying to get Windows to commit to relatively minor investments in text and graphics features that Office needed. Pulling these C# components out of the release made it even more obvious that Windows would be going years with very little improvement in core user interface controls for developers (like Office) on their main Win32 API.

Also catastrophically, the bet on Avalon had been paired with a major disinvestment in IE. The IE team was gutted to staff Avalon and IE was left on life support struggling to address the torrent of security issues cascading in. The vision was that HTML would be a legacy technology and the kinds of applications our competitors were targeting for the browser and HTML would be built on top of the new Avalon infrastructure.

This was a huge strategic mistake and opened up a gap for the rise of Firefox and then the Chrome browser from Google. Whether continued investment in IE would have prevented that is impossible to tell, but it certainly did not help. It also hamstrung the IE team and left them unprepared and unstaffed to address the continuing rapid evolution of web technologies which degraded IE’s reputation with web developers. The fact that it was a mistake was apparent across the company immediately; there was no need for twenty-twenty hindsight. Office and other parts of the company had large investments in the web and HTML. There was no plausible path where those investments would move over to Avalon, much less expecting the entire industry to move. In fact there was never even an attempt by the Avalon team to describe a plausible path — something magical would happen and suddenly everyone would be building Avalon apps instead of on HTML. It was absurd as well as being unconscionable. Immediately after we “won” the browser wars and saw Netscape absorbed by AOL, we radically cut further development in these open standard technologies. It was not until Windows 7 that we re-staffed the IE team and restarted aggressive investment in IE and standard web technologies.

There were further challenges with Avalon.

Avalon’s model was based on this focus to realize Bill’s vision and provide a universal canvas runtime for applications. As I detailed in the post Leaky by Design, one of the key challenges for developers of frameworks like Avalon is how to expose features at different levels so that applications can tie in at the appropriate functional level and not pay excessive performance costs. By only exposing functionality at a very high level, they made all their work essentially unavailable to more sophisticated applications (like the Office apps) that would like to tie in at lower levels. It would take 10 more years until the release of Windows 10 before they really resolved these design issues.

Avalon also made a bet on the PC graphics model driven by power-hungry graphics cards. The mobile graphics model, while sharing some elements, is mostly focused on achieving smooth animation by taking pre-rendered textures or layers and zooming, panning and blending them. The number of layers is carefully constrained so the mobile graphics processor can achieve the fast frame rates needed for smooth animation and user interaction with very low power usage. The graphics model Avalon was exposing was effectively moving in the opposite direction.
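The layer-based model can be sketched in a few lines of Python. This is a deliberately tiny illustration for a single pixel: the `Layer` type, the `MAX_LAYERS` budget of four and the straight-alpha "over" blend are my assumptions for the example, not a description of any real mobile compositor.

```python
from dataclasses import dataclass

# Hypothetical budget: the compositor refuses to blend more layers than this.
MAX_LAYERS = 4


@dataclass(frozen=True)
class Layer:
    color: tuple   # pre-rendered (r, g, b), each component in 0.0..1.0
    alpha: float   # opacity in 0.0..1.0


def blend_over(dst: tuple, layer: Layer) -> tuple:
    """Blend one layer over what is already there (straight-alpha 'over')."""
    a = layer.alpha
    return tuple(a * s + (1.0 - a) * d for s, d in zip(layer.color, dst))


def composite(background: tuple, layers: list) -> tuple:
    """Blend pre-rendered layers back to front, within a fixed layer budget."""
    if len(layers) > MAX_LAYERS:
        raise ValueError(f"too many layers: {len(layers)} > {MAX_LAYERS}")
    out = background
    for layer in layers:
        out = blend_over(out, layer)
    return out
```

The point of the design is that the per-frame work is a fixed, small number of cheap blends over content that was rendered ahead of time, so the cost per frame is bounded no matter how complex the original content was.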

The challenges with WinFS were in some ways even more fundamental than for Avalon; while Avalon shipped independently and some key concepts were used as the basis for the UI components that shipped in Windows 8 and 10, WinFS was ultimately abandoned.

As initially envisioned, WinFS would become the file system. The challenge was that replacing the file system with a completely new implementation that provides major new functionality while at the same time appearing essentially unchanged to the vast array of existing software is an incredibly daunting task. Especially because key Windows core engineering was busy with other efforts (security and 64-bit), WinFS was built as a component that would sit on the side and provide additional functionality for searching and rich queries. This design meant that WinFS would incur significant additional performance cost with fewer opportunities to optimize end-to-end. As with any new feature, those costs would have to be balanced with the feature benefits. But at that point, WinFS was really just providing search which was “just a feature” and not the world-shaking paradigm change that Bill envisioned. Microsoft already had a desktop search engine that operated at significantly lower performance cost than WinFS. Furthermore, incurring such an upheaval in the ecosystem for local PC search right as most information was moving off the PC and into the cloud was a major misreading of where innovation was heading, driven by this relentless effort to try to focus innovation on the rich client.

More profoundly, Bill’s vision of an ecosystem of applications that all store and share their data in this relational store is in direct conflict with how applications build their data models. While some desktop applications (and almost all internal IT-written ones) use relational stores for their internal data model, they do not want to expose those data models for unmonitored read and write by other applications. I detailed some of the fundamental reasons in the post referenced above, Leaky by Design. There were (and are) lots of other choices for applications that want to use a relational store. Of course the long-term direction was that all this data was moving into the cloud, not getting trapped in a local PC storage system.

The decision to continue investing in this managed stack and push it out of the OS release would have long-running implications well after Vista. The management team accepted the reality that it would not be part of the OS release but continued to view these layers as the primary locus of client innovation. When Steven Sinofsky reorganized the Windows organization for the Windows 7 product cycle, he pushed all of the managed code efforts out of Windows and into the developer division, aligned with the other teams in “DevDiv” focused on building the managed compilers, runtimes, base libraries and development environments. He would later fight the battle of what was the core Windows runtime for the Windows 8 product cycle but effectively deferred that fight by pushing those teams out and not creating alternate efforts inside the Windows organization. This had long-running consequences. It continued the internal investments and costs. It continued the public perception that the managed runtimes were the future of Windows. It created a “scorched earth” where no real significant investment was going on in these areas outside the managed area (so those improvements were completely missing for Office and other large unmanaged applications). It also divorced these managed code teams from even thinking about deep investments focused on exposing new hardware innovation rather than building a purely independent middleware layer. These managed libraries and runtimes became “pure middleware”. In fact, in an aborted effort to compete with Flash, these teams packaged core components together into Silverlight and even delivered it across different OS platforms. It would be harder to provide clearer evidence that all this software innovation was completely divorced from a focus on how to uniquely expose hardware innovation in a way that only an OS is capable of.

If middleware is one of an OS provider’s nightmare scenarios for disintermediation, it is clear that “we met the enemy and he is us”. I do not claim to have had unique insight during this period. I was frustrated by the focus on these managed code layers and their uselessness for most Office scenarios but I could not articulate the strategic issues clearly. In fact, the OS innovations in iOS were what made it so clear in retrospect how wrong-headed the overall world view driving this work was.

The accusations of bloat I have made against the managed C# stack clearly do not explain the challenges with Vista performance, since the managed layers were pushed out of the release. Windows XP shipped with a minimum system requirement of 64MB of memory, raised to 128MB in the major Windows XP security service pack. Vista increased the memory requirements to 512MB, although it realistically required up to 1GB to run well (older readers will remember the scandal of questionably labeled “Vista Capable” computers). There is no single explanation for the increase in requirements. Lots of individual teams looked to take advantage of “inevitable” increases in performance due to Moore’s Law, and the cumulative effect was this bloat. In fact, an important factor in this overall performance cost (and the overall quality issues) was the race to ship that happened at the end of the release. Performance results come from big decisions but just as often come from many small decisions and small improvements made through long hours spent analyzing code, driving results and balancing costs and benefits. That time simply was not available. The improvements made in Windows 7 clearly demonstrated that the opportunity was there, but not the time.

The other major impact to Vista’s reputation for stability was due to the problematic nature of the drivers — the key software written and provided by hardware makers (graphics cards, network cards, hard drives, etc.) that ties in to the OS. Vista made an important change to the driver model that moved this software out of the core OS kernel and into a layer that could be managed more robustly. The “blue screen of death” that Windows XP was renowned for when the OS crashed was almost always caused by the code in some third-party driver. By moving this code out of the kernel, Windows could make the overall system much more robust.

The changes made to the driver model required large code changes across the vast landscape of hardware providers that wrote code for Windows. The advantage of that big moat becomes an anchor when trying to make these types of large-scale changes across the ecosystem. Because Vista was so often delayed, hardware vendors had a difficult time scheduling or prioritizing this work. Much was not ready at the time of Vista’s launch, which meant that many users’ first experience with Vista was colored by missing or very flaky drivers.

The collapse of processor scaling I mentioned at the start of this post is just part of the performance story here. The computing industry has been driven by exponential improvements in data processing — the amount of data that can be stored and processed, the speed at which it can be processed and the bandwidth and latency with which it can be communicated between different devices. Much of this is driven by Moore’s Law — the regular doubling of the number of transistors that can fit in the same area of an integrated circuit. This simple doubling pattern is familiar to consumers as it manifests in the increase in processor speeds, increases in amount of dynamic memory, increases in storage capacity and the increases in communication speed they came to expect.

The reality is quite a bit more complicated. Increases in processor speed were accompanied by increases in power use and heat output. I remember a telling chart that plotted the increases in the heat output of Intel’s processors. The logarithmic scale showed a straight line through Intel’s early processors, through Pentium, up to a heat output equivalent to the surface of the sun. Stein’s Law, “if something cannot go on forever, it will stop”, comes to mind here. The computing industry had run headlong into the “power wall”. Processor speeds could not scale without unacceptable increases in power requirements and heat output. When you look at charts of processor speed trends, there is a right turn in 2003, right in the middle of the Vista debacle. Other trends also made a naïve reading of performance improvements dangerous. Chip manufacturers were making denser memory chips, but the “memory wall” meant that latency between CPU and memory made effective use of all that memory harder and harder. Perhaps the worst problem in creating a balanced PC system was that disk storage capacity increased rapidly while the number of random IO operations per second increased far more slowly. This meant that larger programs could fit on those larger disks and in that larger memory but took much longer to start up. The imbalance meant that faster programs could easily issue IO requests faster than the disk could service them. The result was sluggish systems despite machines that had faster processors and more memory.
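To make the disk imbalance concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from any Windows code or measurement. The function names and the figures used with them (a disk that services about 100 random IO operations per second, a program that issues 150 per second or needs 5,000 random reads to start) are assumptions chosen purely for illustration, and the model assumes constant rates with no caching or request reordering.

```python
def backlog_after(seconds, requests_per_sec, disk_iops):
    """Pending IO requests after `seconds`, assuming constant rates.

    Hypothetical model: each second the program issues `requests_per_sec`
    random IO requests and the disk completes at most `disk_iops` of them.
    """
    backlog = 0.0
    for _ in range(seconds):
        # The queue grows by the shortfall; it can never go below empty.
        backlog = max(0.0, backlog + requests_per_sec - disk_iops)
    return backlog


def startup_seconds(random_reads, disk_iops):
    """Time to start a program that needs `random_reads` random reads,
    if the disk is the only bottleneck (an assumption)."""
    return random_reads / disk_iops


if __name__ == "__main__":
    # A faster CPU lets the program ask for more than the disk can deliver.
    print(backlog_after(10, requests_per_sec=150, disk_iops=100))
    # A bigger program on a bigger disk still waits on the same random IO rate.
    print(startup_seconds(5000, disk_iops=100))
```

Under these assumed numbers the queue of pending requests grows by 50 every second, and start-up time depends only on the random IO rate. A faster processor or more memory does nothing to shorten it unless the number of random reads goes down.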

Vista was shipping into an environment where the shift to mobility was gaining more and more speed. Revenue totals for laptops passed desktops in 2003; by 2005 laptops also passed desktops in total units sold. Because Vista ran so poorly on newer cheap laptops (“netbooks”), Microsoft was forced to let OEMs continue selling Windows XP for those lower end machines.

An important part of what was happening here was a deeper problem: the basic sufficiency of the desktop form factor for the jobs it was being asked to do. The basic use cases had mostly stabilized by 2000 and have not changed much since then: productivity (mostly Office), communications, browsing (including search, web sites and web applications), custom internal line-of-business applications and front ends to custom devices (think of your dentist’s x-ray machine). Microsoft could continue building new APIs but mostly the devices already did what users needed. The improvements users wanted in many cases needed less software, not more: better manageability, stability, performance and security on the software side, and longer battery life, lighter weight, faster processors, faster communications and bigger screens on the hardware side.

Sufficiency is a dirty word in this business although the challenges it causes are well captured and shared broadly by Christensen’s “The Innovator’s Dilemma”. The more recent book “The Rise and Fall of American Growth” by Robert Gordon describes this notion of sufficiency across a much wider spectrum of the American economy. Sufficiency is sort of like an economic recession. A recession is declared after two consecutive quarters of declining output — which means you are six or more months into the recession before it is actually acknowledged. Even as the use cases for desktop computers did not change, there continued to be important evolution in the basic hardware that kept participants in the ecosystem focused on trying to leverage these innovations into new use cases. Decades after laptops were introduced, I still want an even lighter laptop with an even longer battery life. But what I use that laptop for generally has not changed.

Note that I am focusing here on form factor sufficiency. Overall computing requirements across the economy have continued to grow explosively. But faster and more pervasive communications enable more flexibility in how an application allocates its computing requirements (data and processing) between different nodes in the system. Many influences push to place more of that processing in the server or cloud and have for the last two decades. I would rather power those compute cycles off the Grand Coulee Dam with a server in eastern Washington than have to lug a battery pack around with me. If the data needs to be accessed from multiple devices or accessed by multiple users, I want to store and process it in a server, not on a local PC. Continuing improvements to wireless communication (and end-to-end communication bandwidth overall) make this an extremely stable state for device computing.

We do not see this only in desktop (including laptop) computing. The tablet probably blasted to form factor sufficiency faster than any broad consumer computing device we have ever seen. Actually, a broader perspective would say that is untrue. We had been struggling with weight, battery life, processing capability, input modes and overall responsiveness in different incarnations of the tablet for decades. But when the iPad arrived on the scene with its combination of screen size, weight, battery life, touch input, processing power and instant-on, we had turned through an inflection point of sufficiency. Changes since then have been merely incremental, a description that drives the engineers working on these things crazy given the great energy and creativity they expend. The engineers at Maytag working on the next iteration of the washing machine probably feel the same way.

Yes, people want better screens, faster processors and longer battery life. But mostly the device does what people need it to do, which goes a long way toward explaining the rapid leveling off of tablet sales. Smartphones seem to be going through a similar transition. In fact, as communication improves and software gets better at transparently moving data between the service and the device, this becomes even more true.

Are there broad lessons to draw from this story?

One is so fundamental as to be trite. Execution matters. There is no innovation without execution.

The second is one that I took greatly to heart in my subsequent career. If you want to do broad ambitious things, you need to be accountable for articulating why it is the right thing to do. You need to be able to write down your basic thesis and the evidence behind it and then defend it. In fact, the more power you hold, the more accountable you need to be, and the more you need to open yourself to honest challenge on either facts or logic. This is even more critical in times of rapid change because the facts and consequential logic might change. Accountability and transparency mean you are able to reassess your conclusions and react quickly.

As I was living through this, I kept on trying to recall a short science fiction story that I had read growing up that tickled my memory as being relevant. I finally tracked it down and discovered that it was a famous story by Arthur C. Clarke, Superiority, first published in 1951. It is a short read — I encourage you to follow the link and see some amazing parallels.
