Windows 10 for PCs arrived two weeks ago. Thankfully, we don’t need to wait years to say this will be a Microsoft operating system release like no other.

The most obvious clue is not the fact that Windows 10 was installed on more than 14 million devices in 24 hours, that you can get it for cheap or upgrade to it for free, nor even that it ships with a digital assistant and a proper browser. No, the big deal here is that Microsoft is turning its OS into a service, and that means as you read these words, it’s still being built.

For the next few years, we’ll be getting not just Windows 10 updates and patches, but new improvements and features. This is possible because Microsoft built this version very differently from all its previous releases.

To get more details about what the software giant did this time around, I sat down to chat with Gabe Aul, a Microsoft veteran of 23 years. He was the general manager of the Data and Fundamentals team in the Operating Systems Group before Windows 10’s release, but because he did such a superb job, he was promoted to vice president of the Engineering Systems team in the Windows and Devices Group.

Aul oversaw Windows 10’s engineering process, and so he naturally had a lot to say about how the company built Windows 10, what went wrong, and what comes next.

Build upgrading

Before their respective releases, Windows 7 went through two public previews while Windows 8 went through three. Both were quite the achievement at the time: After all, Microsoft was showing off its most important piece of software to the world before it was finished.

During the Windows Insider Program, Microsoft released 15 public preview builds of Windows 10 for PCs (Windows 10 Mobile is coming later). Here they are in order: 9841, 9860, 9879, 9926, 10041, 10049, 10061, 10074, 10122, 10130, 10158, 10159, 10162, 10166, and 10240.

Windows 10 builds were made available in private and public rings. The private ones included: Canary Ring (daily updates and only accessible by Windows developers), Operating Systems Group Ring (after the Canary Ring approves a build), and Microsoft Ring (after the OSG Ring approves a build, Microsoft employees can test it internally). The 15 public builds went through two Windows Insider Rings: Fast (builds that have been approved by the Microsoft Ring) and Slow (builds that have no major issues in the Fast Ring).

One of the big reasons it was even technically possible to deliver so many builds was the set of changes the Windows 10 team made to the build upgrading process. To be clear, the core upgrade mechanism was not new: It is the same in-place upgrade technology that is already available in Windows 8 and Windows 8.1 (ESD files have been enhanced, but they’re still largely the same).

I learned that there were multiple new components, though, including targeting, pool management, registration, the insider channel, and so on. The most important new part is that the Windows 10 team was (and still is) able to offer a specific group of people a given set of builds, letting them do an in-place upgrade whenever a new build becomes available.

A photo from our Flight Ops meeting today, working on the next Fast ring flights. pic.twitter.com/ATXYYY1qbB — Gabriel Aul (@GabeAul) June 12, 2015

I also learned about some of the new infrastructure. The deployment mechanism (connect to Windows Update, take builds and stage them, declare that a build is available for update) was new. The measurement systems (a flight ops meeting held at 2 p.m. PT every day to check on problems both encountered by users and turned up by raw telemetry data for performance, reliability, app compatibility, and so on) were also new.

If all the lights were green, the team would go through the feedback and make the call to promote a given build to the next ring. The mechanism to do that actual promotion was also new, as was the whole return leg (all the user feedback details, which we’ll get to later) of any build’s journey.
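The ring progression and the "all lights green" check described above can be sketched roughly as follows. This is only an illustration: the ring names come from the description above, while `HealthReport`, its fields, and `next_ring` are hypothetical names, not Microsoft's actual tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Ring order as described in the article, from most internal to most public.
RINGS = ["Canary", "OSG", "Microsoft", "Fast", "Slow"]


@dataclass
class HealthReport:
    """Hypothetical summary of a build's signals in its current ring."""
    performance_ok: bool
    reliability_ok: bool
    app_compat_ok: bool
    feedback_ok: bool  # assumption: no blocking issues in user feedback

    def all_green(self) -> bool:
        return all([self.performance_ok, self.reliability_ok,
                    self.app_compat_ok, self.feedback_ok])


def next_ring(current: str, report: HealthReport) -> Optional[str]:
    """Return the ring to promote to, or None if the build stays put."""
    if not report.all_green():
        return None  # any red light blocks promotion
    index = RINGS.index(current)
    if index == len(RINGS) - 1:
        return None  # no further ring to promote to in this sketch
    return RINGS[index + 1]


green = HealthReport(True, True, True, True)
red = HealthReport(True, False, True, True)
print(next_ring("Microsoft", green))  # Fast
print(next_ring("Fast", red))         # None
```

In this sketch a single failing signal holds the build in its current ring, which mirrors the idea that promotion only happens once every measurement checks out.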

Eliminating the ta-da!

This was a massive technical achievement, but the sheer number of Windows 10 builds made available to testers (five times the number of Windows 8 preview builds) isn’t the only differentiator.

In past preview programs, Microsoft’s approach was to hold the final design assets until the very end. The company very much wanted a “big reveal” — there was a desire to be able to say “Here is what it actually looks like.”

Windows 10 was different. “For the whole product, all features across the board, we said as soon as they’re ready they’ll go in, and then once they’re in, they’ll go out,” Aul explained.

Furthermore, when the first Windows 7 and Windows 8 previews came out, Microsoft was already two-thirds of the way through product development. As Aul put it, most of the product development was done “in the echo chamber here in Redmond.”

With Windows 10, the product development was “less than a third of the way through the release when we did our first preview.” There simply was no reveal moment.

Instead of holding the look and feel until the rest of the OS was complete, and then trying to explain why the chosen way was so great, Microsoft decided to show testers what they had so far, and then ask for feedback on “whether directionally we’re going in the right way or not, and what else they want us to do in addition to what our plans are.”

It was an open, non-Microsoft approach. Aul said the team was “much more willing to test our assumptions and to see if we’re on the right track, rather than hoping we were on the right track and getting to the end and finding out if it was right or not.”

By getting user feedback “very early,” Microsoft could incorporate it much sooner in the development process. In short, the company was able to be much more responsive to what Windows users actually said they liked, didn’t like, and wanted changed.

Issues and delays

For testers, the biggest frustration by far was that Microsoft simply couldn’t get builds out fast enough. “Initially we just misjudged the hunger,” Aul admitted.

The thinking was that users would love the idea of monthly builds, especially when compared to previous Windows preview programs. Sure, they loved it, but they also wanted more.

The Windows 10 team needed to change its operations in order to spin out testable builds faster. The mission control dashboard, used for deciding when to promote a build from one ring to the next, had to evolve. Instead of relying on separate feeds and experts physically coming into the room to give their take, it eventually became possible to make a call in real time based on the data coming in on the screens.

That sounds great and all, but the truth is that Microsoft still failed to get builds out on a monthly basis. There were two instances (one for PC and one for phone) when a build didn’t arrive for longer than a month, and in both, Aul blamed infrastructure and engineering that “got in our way.”

In the case of the PC build, a week after the most recent build, the team brought in a bunch of shell changes, and it simply took a long time to achieve a certain level of stability. The monthly flight boundary was missed, and they didn’t have much choice but to wait until the next one.

For the phone build, no hotfix mechanism exists, so a full build is necessary every time. This means that if you find a bug during the ring progression process, you have to go back to the beginning. In this case, “honestly we just had bad luck,” Aul said — an individual bug set the team back three or four times in a row.

That problem has since been resolved by snapping code to a servicing branch and working from there if a bug needs to be fixed. Aul told me he doesn’t expect there will be delays anymore, and indeed, a month after the last build, a new one arrived yesterday, right on time.

Update: Microsoft releases another Windows 10 preview — 2 builds in 2 days! http://t.co/aJewTisllh pic.twitter.com/5c6Z2wvqoX — Emil Protalinski (@EPro) June 30, 2015

The big picture: Microsoft was engineering the process while also engineering the operating system. A lot was learned and a lot improved, but it wasn’t all smooth sailing.

Aul summarized this process. First, it took a while for Microsoft to figure out it was wrong in gauging demand. Then the team tried to react, but still wasn’t able to execute as fast as it wanted to. Now the group is able to promote builds based only on their health and stability, as opposed to technical limitations.

Aul pointed to the fact that Microsoft put out three builds in four days at the end of the Windows 10 preview, but I argued this was only because the OS was much more stable by that point. Not so, said Aul: “Even if we had three totally stable builds [earlier in the program], we would not be able to get those builds out in four days.” Essentially, the infrastructure improved just in time for the last handful of builds.

A/B testing

Microsoft gathered a wide variety of feedback during the Windows 10 preview program. Whereas previous Windows previews relied on community forums and indirect channels for commentary, the Windows 10 feedback systems added on top of that a dedicated feedback app (which remains in the OS even after launch), as well as popup surveys (which helped find new bugs, nominate feature suggestions, gauge emotional response on how features functioned, and even determine build quality to decide when builds should move up to the next ring).

What stood out to me about the Windows 10 preview tools was Microsoft’s ability to perform A/B testing. Sometimes called split testing, this is the process of comparing two very similar variants of the same product or feature to conclude which one performs better.

Microsoft was able to A/B test internally (including during user studies), starting in the Windows Vista betas and going up all the way to Windows 8.1 previews, but the company could not run them during any of its broad public previews. With Windows 10, “this is the first time we’ve been able to do that,” Aul confirmed.

When I asked him what the biggest A/B test was, he immediately answered it was the one for virtual desktops. This is an enthusiast feature in Windows 10 that lets you make virtual copies of your desktop view and switch between them.

The Windows 10 team started with the basic capability of having multiple virtual desktops, then added functionality to see what users wanted: Over the release, the team would look to incorporate the top feedback items. First basic switching was added, then keyboard support was requested and implemented, then users asked about being able to sort contents in the taskbar that map to the desktop, and so on. Aul considers virtual desktops a “great example for where the feedback really led the development of the feature.”

Eventually, an A/B test was required. For the taskbar, there were two different ways to show your running apps: Display all the apps or display a filtered view, where for any given desktop you only see what you’re running in that desktop. Microsoft offered the options to two different groups and asked each how they liked their variant.

Just as with any other A/B test during the Windows 10 preview, Microsoft then used that vote to decide the default behavior. In the case of virtual desktops, it was a split distribution: 52 percent liked filtered, 46 percent liked combined, and the rest were neutral. Microsoft decided to keep both options: Filtered became the default, but you could easily switch to show all apps.

For other A/B tests, the distribution was much more obvious, so the Windows 10 team limited functionality to just the popular behavior. In most cases, feedback drove how features were developed, but in other cases, the results were outright unanticipated.
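The decision rule described above (keep both options when the vote is close, ship only the popular behavior when it is lopsided) could be sketched like this. The 52/46 split is from the article; the function name `decide_behavior` and the `clear_margin` threshold of 20 percentage points are assumptions for illustration, since Microsoft's actual cutoff was not disclosed.

```python
from typing import Dict


def decide_behavior(votes: Dict[str, float], clear_margin: float = 20.0) -> dict:
    """Pick a default variant and decide whether to keep the alternative.

    votes maps a variant name to the percent of respondents who preferred it.
    clear_margin is a hypothetical threshold, in percentage points, above
    which the result counts as decisive.
    """
    ranked = sorted(votes.items(), key=lambda item: item[1], reverse=True)
    (winner, top), (_, second) = ranked[0], ranked[1]
    close_call = (top - second) < clear_margin
    return {"default": winner, "keep_alternative": close_call}


# The virtual desktop taskbar test: 52 percent filtered, 46 percent combined.
taskbar = decide_behavior({"filtered": 52.0, "combined": 46.0})
print(taskbar)  # {'default': 'filtered', 'keep_alternative': True}
```

With a six-point gap the sketch keeps both behaviors and makes the more popular one the default, which matches the outcome reported for virtual desktops.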

For example, while developing Microsoft Edge, the team didn’t bother building a Home button because it wasn’t deemed to be functionality that a modern browser should have. It turned out that a lot of users wanted it, so Microsoft added the option (it’s not on by default, but you can easily flip the switch).

Not applying feedback

All this feedback, whether from the individual reports or from dedicated A/B tests, is a lot to process, understand, and act on. Commentary consisted of everything from crash reports, hangs, and system failures, all the way up to “I don’t like this icon.” In short, there were comments on “everything from underlying core quality to aesthetics,” as Aul put it.

Yet implementing user feedback isn’t as simple as just prioritizing all the requests and applying them one by one. The quantity and variety were “fantastic,” but it was still a bumpy road to travel.

There were many cases where feedback could only influence the Windows 10 team so much. Aul broke up the issues into three cases:

Branding case

Whenever Microsoft (or really any major business) needs to pick a new product name, the company has to go through a formal brand exercise, including ensuring it has all the rights and trademarks. Once a name is decided on, user feedback can’t do much.

As an example, Aul pointed to Microsoft Edge. Commentary ranged from people preferring the original Spartan code name all the way to arguments that Microsoft should have kept the Internet Explorer name (likely before it was made clear that both browsers would stay in Windows 10). In short, as Aul put it, “Thank you for the opinions,” but Edge is here to stay.

Iteration case

For smaller items that can be changed more easily, Microsoft often goes through a couple of different iterations, but eventually it just has to make a final decision. Aul: “We know we’re not going to make everybody happy, but we think we’ve evolved it to a better place.”

Examples include user interface elements like icons, what specific settings are called, how many options to include in basic versus advanced settings, and so on. The threshold for when Windows 10’s battery saver kicks in was a big discussion, but Microsoft eventually had to pick a default number (20 percent) — though you can still change the figure with the slider.

Long-term case

Then there are the bigger requests that require a lot of engineering work to deliver exactly what users want. A change might seem trivial to the user, but there’s a lot more happening that Microsoft has to juggle.

The most obvious example was the request for a full Aero theme, which was the second biggest piece of feedback by volume, and which Microsoft didn’t deliver. Sure, the company added some transparency, but because the whole windowing system is designed to have thin borders and because, more importantly, Microsoft had made a commitment to app developers that the window color is their space (think Netflix and its iconic red for the window border), offering a full-blown Aero theme simply wasn’t in the cards.

And yet, a small percentage of users (though still a large number overall) want what Windows 10 can’t offer, at least in this first release. Aul wouldn’t commit to anything, but he did promise the Windows 10 team was “continuing to look at what could be done going forward.”

So if Aero was the second-most requested change, what was the first? The biggest piece of feedback by volume was about the fact that the task view and search bar were locked to the taskbar. Users wanted to be able to unpin them, so Microsoft added that option.

Attitude and obsession

While the above examples didn’t necessarily result in changes that users wanted, Aul insists Windows 10 was more of a feedback-driven release than any of its predecessors. In June, Microsoft said it had received 3 million pieces of Windows Insider feedback. As more and more testers joined the program, I’m told that number jumped to 5.8 million by July 1, and by the end of the month, it had hit 7 million.

In fact, feedback items were present everywhere, from local team meetings all the way up to leadership team meetings. “I was really proud of everybody because there was no ‘Oh they just don’t get it, they just don’t understand, we just know better,'” Aul said.

Instead, the questions were more around figuring out an alternative, a compromise, a solution. “I loved it — it was such a great customer-focused way of working.”

He credits Microsoft CEO Satya Nadella and Terry Myerson, executive vice president of the Windows and Devices Group, with this approach. “They’re both very obsessed with doing the right thing for people,” Aul said.

In fact, it quickly became clear that individual development teams craved the user feedback. “The biggest problem that I’ve had this cycle is that all the feedback flows through my team, and being able to give the quantity and quality of interpretation, and analysis, and visualization of the feedback to the people who are demanding it,” Aul said. “We had to work hard to keep up with the demand.”

The solution? Data science clubs.

Microsoft employees would come in and learn how to access the source of all the feedback directly so they could perform pivots and explore what they wanted to know. This wasn’t just a basic SQL data dump: They were taught how to do complex analysis so they could infer some useful meaning out of all the information.

Looking ahead

Windows 10 has launched, but some components you’d expect to have been removed are still there. The feedback app is open to all (although Aul hinted that Insiders will be getting special features in upcoming builds), and the team has been working to scale up its various systems to handle the feedback that will come from all Windows 10 users, not just Insiders.

But Insiders are still going to play a very important role going forward, because after all, Windows 10 development isn’t done. I asked Aul if Insiders will help test everything that goes out, including everything from very specific bug fixes all the way to completely new features. The short answer? “Yes.”

The long answer? “It’s complicated.” Insiders will still get full builds, but these will contain fixes that are actually applied to the current build — Microsoft will likely refer to this as the “consumer build.” In short, Insiders will be the guinea pigs not just for features but for fixes, too.

Here is how this will work. An engineer will code a fix for a given bug against the current branch (in order to ensure there is nothing in the latest build that might influence the fix), and the change will go out as part of the next build. Once the patch is validated, it will be taken as a separate hotfix (standalone patch rather than a new build) and sent out to affected users under the main branch.

In Windows 8.1, Microsoft deployed fixes to a small set of users first to see how they fared, but this was only done for driver updates. Now the company plans to broaden this approach to all types of patches. “Any package that we want to get stage validation on — except for security, because that we want to do immediately — we will do some sort of staged rollout,” Aul confirmed.
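As a rough illustration of the policy Aul describes (security fixes go out immediately, everything else is staged), here is a minimal sketch. The `Package` type, the `rollout_stages` function, and the stage names are all hypothetical; the article does not say how many stages Microsoft uses or what they are called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Package:
    name: str
    is_security: bool


def rollout_stages(package: Package) -> List[str]:
    """Return the audiences a package reaches, in order (stage names assumed)."""
    if package.is_security:
        # Security fixes skip staging and go to everyone right away.
        return ["everyone"]
    # Other packages are validated on smaller audiences first.
    return ["insiders", "pilot group", "everyone"]


print(rollout_stages(Package("driver-update", is_security=False)))
print(rollout_stages(Package("security-patch", is_security=True)))
```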

Insiders of course want to know how often they can expect new builds, now that Windows 10 for PCs is out. Aul didn’t want to put a number on it: “We’re still playing with the frequency.” He did say, however, that the team has considered adding another ring, and toyed with the idea of having a new build (say every week) that goes out to a subset of Insiders who really don’t care about stability.

Of the 6 million Windows Insiders, many will undoubtedly leave the program now that Windows 10 is available (in Windows Update, you can simply choose to stop receiving new builds). After all, the various testers joined at very different times with diverse expectations, from those who want builds as frequently as possible to those who expect builds to be as stable as the last few that were released weeks before launch. But Aul isn’t worried: “I will say that based on my interactions with Insiders, I’m betting the vast majority will stay; it will be 80+ percent.”

That could still mean losing more than 1 million testers (20 percent of 6 million is 1.2 million). When he looks to the future, though, Aul can’t help but be positive. After all, everything we talked about isn’t going to be filed away in some cabinet. It’s just going to be applied to Windows 10 going forward.

“I’m excited about the future for Windows. I feel like we are on the right track with the way we’re thinking about this and having it be a continuous evolution versus these big three-year drops that people get stuck on. I think it’s going to be great for the world, great for everyone to be running the latest, secure, up-to-date OS.”

In other words, Aul’s team doesn’t have time to take a break. His favorite way to refer to Windows 10 during our interview was as a “great flywheel.” And as he said, it’s only beginning to turn.