The Windows team had a history of attempting massive and ambitious projects that were often abandoned or repurposed after a few years. An earlier example was the ambitious Cairo project which was eventually gutted, with some components salvaged and shipped as part of Windows 2000.

By far the biggest problem with Windows releases, in my humble opinion, was the length of each release. On average, a release took about three years from inception to completion but only about six to nine months of that time was spent developing “new” code. The rest of the time was spent in integration, testing, alpha and beta periods — each lasting several months.

Some projects needed more than six months of core development so they proceeded in parallel and merged with the main code base when ready. This meant that the main tree was almost always in a semi-broken state as large pieces of functionality were being integrated or replaced. Much tighter controls were put in place during the Windows 7 release to ensure a constantly healthy and functioning code base but earlier releases were plagued with daily instability for months at a time.

The chaotic nature of development often resulted in teams playing schedule chicken, convincing themselves and others that their code was in better shape than other projects, that they could “polish” the few remaining pieces of work just in time, so they would be allowed to checkin their component in a half-finished state.

The three year release cycle also meant we rarely knew what the competitive landscape and external ecosystem would look like when we started a release. Missing a release meant cancellation (as the feature rarely made sense two releases or six years later) or, worse, banishment to Siberia — continued development on a component that was mostly ignored by the rest of the multi-thousand person organization and doomed to eventual failure or irrelevance, but one that the team or the execs simply couldn’t bring themselves to abandon. I was personally responsible for a few such projects. Hindsight is 20/20.

Given that each team was busy pushing their own agenda and features into the release, they often skimped on integration with other components, user interface, end to end testing, and ugly and tedious issues such as upgrade, leaving these thorny issues for the endgame. That, in turn, meant some teams quickly became bottlenecks as everyone jockeyed for their help in finishing their UI or upgrade testing at the last minute.

At any given point in time, there were multiple major releases in progress as well as multiple side projects. Different teams were responsible for code bases in various states of health resulting in a model where “the rich got richer and the poor got poorer” over time — teams that fell behind, for one reason or another, more often than not stayed behind.

As a project neared completion, program managers would start looking at requirements for the next release and developers in “healthy” (rich) teams would start implementing new code In a newly “forked” source tree for the next release but vast parts of the organization (the poor) were still stuck on the current release. In particular, test teams rarely freed up from a release until it shipped so new code wasn’t thoroughly tested in the beginning of a project and “unhealthy” teams always lagged behind, putting the finishing touches on the current release and falling further and further behind. These teams were also often the ones with the lowest morale and highest attrition meaning that the engineers inherited fragile code they hadn’t written and hence didn’t understand. I could write compelling horror stories about some of the “bug fixes” and “enhancements” such teams had introduced into the code base over the years.

For most of the duration of Vista/Longhorn, I was responsible for the storage and file systems technologies In Windows. That meant I was involved with the WinFS effort although it was driven primarily by the SQL database team, a sister organization to the Windows team.

Bill Gates was personally involved at a very detailed level and was even jokingly referred to as “the WinFS PM”: the program manager responsible for the project.

Hundreds, if not thousands, of man years of engineering went into an idea whose time had simply passed: what if we combine the query capabilities of the database with the streaming capabilities and unstructured data functionality of the file system and expose it as a programming paradigm for the creation of unique new “rich” applications.

In hindsight, it’s obvious that Google handily solved this problem, providing a seamless and fast indexing experience for unstructured and structured data. And they did so for the entire internet, not just for your local disk. And you didn’t even need to rewrite your applications to take advantage of it. Even if WinFS had been successful, it would have taken years for applications to be rewritten to take advantage of its features.

When Longhorn was cancelled and Vista was hastily put together from its smoldering embers, WinFS was kicked out of the OS release. It was pursued by the SQL team for a few more years as a standalone project. By this time, Windows had a built-in indexing engine and integrated search experience — implemented purely on the side with no application changes needed. So the relevance of WinFS became even murkier but the project still carried on.

The massive security related architectural changes in Longhorn were kept as part of the Windows Vista project after the Longhorn “reset.” We had learned a lot about security in the rapidly expanding internet universe and wanted to apply those learnings at an architectural level in the OS to improve overall security for all customers.

We had no choice. Windows XP had shown that we were victims of our own success. A system that was designed for usability fell far short in terms of security when confronted with the realities of the internet age. Addressing these security problems meant the creation of a parallel project, Windows XP Service Pack 2, which (despite its name) was a huge undertaking sucking thousands of resources away from Longhorn.

To be clear, the Windows NT kernel had always supported multiple user identities and properly implemented administrative privilege boundaries but these were ignored in user mode in order to enhance Windows 95 compatibility. This meant that the default user on the system was set up with administrative privileges “out of the box” and by default.

Applications routinely (and often unknowingly) abused this privilege and stepped on each others’ toes by overwriting common files and registry settings. The same, of course, was true of pieces of malware that took over the system easily because the user was running with supervisor (“root”) privileges.

Enforcing strict administrative boundaries in Vista meant breaking practically every single application in the Windows universe. Part of the solution was UAC (User Account Control), arguably the most hated feature of Vista. The system would ask the user whether she really meant to raise the privilege level when she ran a command or clicked on a script that tried to do so. Installing legacy applications almost always required elevation of privilege so a user’s first interaction with the system was tons of confusing UAC popups ruining the experience.

It would not be much of an exaggeration to say that 99% of all Windows apps failed to even install properly if administrative access was taken away from the logged-in user. Decades of backwards compatibility with Windows 95 meant our hands were tied: improve security, break app compat.

Vista was, by all measures, massively more secure than any earlier OS shipped by Microsoft but in the process also managed to break application and device driver compatibility in an unprecedented manner as we stopped supporting “known bad” APIs or to work around them using mechanisms like UAC.

Customers hated it because their apps broke and ecosystem partners hated it because they felt they didn’t have enough time to update and certify their drivers and applications as Vista was rushed out the door to compete with a resurgent Apple.

In many cases, these security changes meant deep architectural changes were required to third party Device drivers and solutions. And most ecosystem vendors had no incentives to invest heavily in their legacy apps. Some of these solutions took the unorthodox approach of modifying data structures and even instructions in the kernel in order to implement their functionality, bypassing APIs and multiprocessor locks, often causing havoc. At one point, something like 70% of all Windows system crashes (“blue screens”) were caused by these third party drivers and their unwillingness to use supported APIs to implement their functionality. Antivirus vendors were notorious for using this approach.

In my role as head of Microsoft security, I personally spent years explaining to antivirus vendors why we would no longer allow them to “patch” kernel instructions and data structures in memory, why this was a security risk, and why they needed to use approved APIs going forward, that we would no longer support their legacy apps with deep hooks in the Windows kernel — the same approach that hackers were using to attack consumer systems.

Our “friends,” the antivirus vendors, threatened to sue us in return, claiming we were blocking their livelihood and abusing our monopoly power! With friends like that, who needs enemies? They just wanted their old solutions to keep working even if that meant reducing the security of our mutual customers — the very thing they were supposed to be improving.

There were so many seismic shifts happening in the computing industry during those years — the advent of the internet, the rise of the mobile phone, the emergence of cloud computing, the creation of new ad-supported business models, the viral growth of social media, the relentless march of Moore’s law, the maturation of 64-bit computing, cheap reliable storage, abundant networking bandwidth, an evolving security and privacy landscape, and the popularity of open source are just a few factors that assaulted Windows from all directions.

The response, not surprisingly for a wildly successful platform, was to dig its heels in and keep incrementally improving the existing system — innovator’s dilemma in a nutshell. The more code we added, the more complexity we created, the larger the team got, the bigger the ecosystem, the harder it became to leapfrog the competition.

As if the competitive forces weren’t enough, this was also the time when armies of engineers and program managers spent countless hours, days, weeks, and months with representatives from the DOJ and corporate lawyers, documenting existing APIs from previous releases in order to comply with the government’s antitrust rulings.

The stark reality is that, at this point in its life cycle, it took roughly three years to get a major release of Windows out the door and that was simply too slow for the fast moving market. WinFS, Security, and Managed Code were just a few of the massive projects on the agenda for Longhorn. There were also hundreds of smaller bets.

When you have a multi-thousand person organization and literally billions of customers, everyone gets a say. The same OS release that is supposed to work on the forthcoming tablet and smartphone footprint is also supposed to work on your laptop, in servers running in the data center, and in embedded devices such as NAS boxes “Powered by Windows” — not to mention on top of a hypervisor (HyperV) in the cloud. The requirements pulled the team in opposite directions as we tried to make forward progress on all segments of the market simultaneously.