Windows 10, in normal usage and typical configurations, will send quite a lot of information to Microsoft. Windows 8, in normal usage and typical configurations, will also send quite a lot of information to Microsoft. On the other side of the fence, OS X, in normal usage and typical configurations, will send some information to Apple. It's hard to imagine a modern-day operating system that doesn't do this, at least to some extent.

For example, Windows, OS X, iOS, and Android all sport app stores. Buying from those app stores requires payment information, typically including a name, address, and credit card number. Those stores may have age-based restrictions, so they might require a date of birth. Those purchases are, of course, tracked, both to ensure that developers get paid and to allow popularity lists to be constructed.

Different platforms have different twists on this. The iOS App Store, for example, can show you apps that are popular nearby; it must be recording some location data when purchases are made so it can make this correlation. Windows 10 goes in a different direction. It includes personalized "Picks for you" and can suggest particular apps, based on their similarity to apps that have been previously installed. This currently doesn't seem very intelligent; it will sometimes recommend apps that are already installed.

Continuing evolution

It's probably fair to say that Windows 10 goes further in this kind of thing than previous operating systems. But it does so not as an outlier or a major break from past behavior, but as the next step in a continuous process of making operating systems more connected and making data collection and analysis more extensive.

Some of this is an obvious repercussion of user-facing features. Windows and OS X both offer search (in Windows through Cortana, in OS X through Spotlight) that spans both the local system and online sources. Naturally, the online portion of that search is sent to the respective company.

Similarly, Siri and Cortana use online systems for their speech recognition. Siri maintains personalized but anonymous speech data on each user to improve speech recognition accuracy.

Cortana similarly personalizes speech models; corrections made to her transcriptions are used to adjust speech models and improve dictation accuracy. Perhaps more contentiously, information about appointments and contacts' names and nicknames is also incorporated into these models so that Cortana can better recognize the people and events that you're talking about.

Windows uses similar personalization systems for both handwriting (using a stylus) and typing. This is used to improve text recognition algorithms so that more handwriting is recognized, and so that autocomplete can make more relevant suggestions. Microsoft regards these personalization features as so important that you can't use Cortana without them. Apple, similarly, makes Siri's personalization an integral part of the service that can't be disabled without disabling Siri entirely.

There are two common reasons for this kind of data collection. The first is that these services simply need to know these things to be useful. Siri needs to know the names of your contacts to be able to set up calls or send messages. Cortana needs to know when and where your appointments are to tell you when you need to leave the home or office to get to them.

But there's a deeper reason: the software powering these capabilities is fundamentally heuristic, using approximation and guesswork to generate its results. Traditionally this wasn't the case; a hardware keyboard with no autocompletion doesn't need any fancy heuristics. It just needs to map key presses directly to characters. But speech recognition, software keyboards of all kinds, and handwriting recognition don't have this precision. The software driving these things has to construct and evaluate a range of different possible interpretations and then pick the most likely option among them.

Sometimes that software will pick the wrong interpretation; sometimes it won't even generate the right interpretation at all. Analyzing real-world usage data gives companies like Microsoft and Apple (and Apple's speech recognition provider, Nuance) the opportunity to make their heuristics better. This software is all fundamentally data-driven, and as intelligent systems such as Cortana, Siri, and Google Now become more capable and more advanced, they're going to want to slurp up ever more data.
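The "generate candidates, then pick the most likely" approach, and the way personal data can tip the balance, can be illustrated with a toy ranking function. This is a minimal sketch under stated assumptions: the candidate scores, the `personal_vocab` set, and the flat `boost` bonus are all hypothetical, and the recognizers from Apple, Microsoft, Nuance, or Google use far more sophisticated statistical models.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    model_score: float  # hypothetical likelihood from the recognizer, 0..1


def pick_interpretation(candidates, personal_vocab, boost=0.2):
    """Return the candidate with the highest combined score.

    Toy illustration only, not how any shipping recognizer works.
    personal_vocab is a hypothetical set of lowercase words drawn from the
    user's own data (contact names, nicknames, appointment titles); each
    matching word adds a flat bonus to the recognizer's base score.
    """
    def score(c):
        hits = sum(1 for w in c.text.lower().split() if w in personal_vocab)
        return c.model_score + boost * hits

    return max(candidates, key=score)


# Two hypothetical readings of the same spoken phrase.
candidates = [Candidate("call shawn", 0.48), Candidate("call sean", 0.52)]
print(pick_interpretation(candidates, set()).text)      # call sean
print(pick_interpretation(candidates, {"shawn"}).text)  # call shawn
```

With no personal data the more common spelling wins; once a contact named Shawn is part of the user's data, the same input is transcribed differently. That kind of improvement is the motivation for feeding contact names into the models in the first place.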

There are further opt-in features that can expose even more data. Cortana, for example, can read your e-mails to find package tracking numbers and flight bookings, which she can then tell you about. This is an opt-in feature, and it means that Cortana will read your e-mails. This e-mail reading appears to occur locally, on each device, but Microsoft will still learn at least some things about your e-mail—for each flight or tracking number Cortana finds, she'll query Microsoft's systems to learn more about them. This should be obvious; your phone doesn't know whether a flight is delayed or just how lost your package has gotten, so naturally online services have to be queried.

The power of the cloud

One of the most important online services in wide use is essentially crowd-sourced: location. Microsoft, Google, Apple, and no doubt others operate location services. While GPS provides a way for devices to figure out where they are without sending any data, all three companies have built systems that allow location to be determined without GPS; instead, they use the IDs of Wi-Fi networks that a phone or computer can see.

These databases were often primed using data collected by street view camera cars (itself a contentious practice), but they are further extended and updated using data collected and sent by end users' phones and PCs: each time a device queries the location service, asking where it is based on the Wi-Fi IDs it can see, the location service might remember those Wi-Fi IDs and their inferred location.
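The crowd-sourcing loop described above can be sketched as a toy service that estimates a device's position as the average of the known positions of the networks it sees, then remembers any unfamiliar networks at that estimate. Everything here is an assumption for illustration: the class, the network IDs, the coordinates, and the plain-average estimator (real services presumably use more elaborate methods and vastly larger datasets).

```python
from typing import Dict, List, Optional, Tuple

LatLon = Tuple[float, float]


class ToyLocationService:
    """Hypothetical crowd-sourced Wi-Fi location database (illustrative only)."""

    def __init__(self, known: Dict[str, LatLon]):
        # Maps a Wi-Fi network ID to an estimated position, as if the
        # database had been primed by a survey car.
        self.known: Dict[str, LatLon] = dict(known)

    def locate(self, visible_ids: List[str]) -> Optional[LatLon]:
        """Estimate a device's position from the network IDs it can see."""
        points = [self.known[i] for i in visible_ids if i in self.known]
        if not points:
            return None  # no known networks in view, so no estimate
        # Plain average of known positions (an assumed, simplistic estimator).
        estimate = (
            sum(lat for lat, _ in points) / len(points),
            sum(lon for _, lon in points) / len(points),
        )
        # Crowd-sourcing step: networks the service had not seen before are
        # recorded at the inferred position, growing the database.
        for i in visible_ids:
            self.known.setdefault(i, estimate)
        return estimate


# Hypothetical networks and coordinates.
service = ToyLocationService({"net-a": (10.0, 20.0), "net-b": (12.0, 22.0)})
print(service.locate(["net-a", "net-b", "net-c"]))  # (11.0, 21.0)
print(service.locate(["net-c"]))  # (11.0, 21.0), learned from the first query
```

The query itself is the data: simply by asking where it is, the device has taught the service where `net-c` is, and the service has also seen which networks that device was near.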

This is very useful, but it obviously has privacy implications: the online service providers can track which devices are making which requests and which devices are near which Wi-Fi networks, and they could feasibly track how devices move around. The service providers will all claim that the data is anonymized and that no persistent tracking is performed... but it almost certainly could be.

Indeed, that same "useful but with privacy implications" trade-off is the recurring theme of modern systems. Siri, Cortana, Google Now—they're all useful. But they have privacy implications. Syncing files to OneDrive or Google Drive is useful, but it creates some privacy exposure. Using a Microsoft Account to log in to Windows, sync settings between PCs, and have access to the same apps, or using a Google account to log in to ChromeOS, for the same benefits, are both useful things, but they carry a privacy trade-off.

These trade-offs can bite people. Microsoft, in common with most other American online service providers, will generally comply with court orders demanding data and will cooperate with police investigations. Google, for example, has contacted the FBI when its algorithm detected that a user of its Picasa photo service had uploaded child pornography, and Microsoft performs similar analysis of files on OneDrive. Microsoft received a torrent of bad press when it revealed that it had looked through a Hotmail user's inbox while investigating piracy of Microsoft's own software, though since then the company has promised to hand over such investigations to law enforcement forces rather than conducting them internally.

One of the more contentious aspects of this is that Windows 10, like Windows 8 before it, has the ability to encrypt hard disks and back up the encryption keys to OneDrive (or, for corporate machines that are part of a Windows domain, Active Directory). This online backup is not mandatory; while some have claimed that the only way to enable encryption without storing keys in OneDrive is to upgrade to Windows 10 Pro, this is untrue. If you want to put the backup key onto a USB drive instead of storing it online, that's possible.

This is, once again, a trade-off. Drive encryption has some value, especially on laptops, but it also has some risk; lost keys often mean lost data. For average home users, having an online key backup may well be a sensible risk/reward trade-off; the potential loss of privacy if a key is seized or stolen from OneDrive and subsequently used to decrypt their hard disk is likely outweighed by the extra protection that disk encryption provides. The default scheme may make your data less private if you're concerned about government seizure of your assets, but arguably more private if you're concerned about a crackhead stealing your laptop.

Windows 10 lets you opt out of these things if you prefer, but it's a less capable, less useful platform if you do—just as iOS, Android, ChromeOS, and even OS X become less useful if you disable every part of their online cloud service connectivity. This is a trend that isn't going to go away.

One other facet of modern-day computing is perhaps a little less welcome, but equally likely to be a fixture. Like Windows 8 and iOS, Windows 10 includes a persistent, anonymous advertising ID. This ID, which is enabled by default but can be turned off, is exposed to in-app advertisers so that they can track your activity and, in principle, show ads that are more relevant to your interests. Turn it off and you'll get untargeted ads. The privacy concerns here are much the same as those with cookies on the Web: marginally better ads, at the expense of giving advertisers a somewhat better idea of the things that you're interested in.

A surprising change

There is one setting in Windows 10 that's a little more unusual, however. Windows has long had the capability to report basic usage data to Microsoft. This includes, for example, data about any programs or drivers that crash, so that Microsoft can detect any widespread problems. This facility has also included the ability to optionally send more detailed crash reports to the company. These optional reports can potentially include snapshots of the memory being used by processes, and these snapshots can include personal data. So far, so ordinary; OS X and other operating systems have a similar capability, and many applications have equivalent reporting facilities implemented at the app level. The data that these facilities can collect can be invaluable for detecting problems and developing fixes.

On top of this, many Microsoft programs, including Windows itself, have a thing called the Customer Experience Improvement Program. This is, traditionally, an opt-in program. When enabled, Microsoft collects various kinds of usage information. For the operating system, this might include, say, which programs are installed, how often each Control Panel is used, or what the preferred settings for Explorer windows are. For an application, it might include, say, which menu items are used most often, how many documents are opened simultaneously, or whatever else might be appropriate.

Microsoft asserts that the information collected is anonymous (tied to a randomly generated ID rather than any personal identifier) and filtered to remove any personal information it might accidentally collect. It also promises that the information collected from these schemes will only be used for diagnosis and development, never for advertising or sales.

Windows 10, however, shakes this up. Instead of two separate systems (one for error reporting, a second for collecting usage data), both have been rolled into one combined setting. This setting has four positions: off; Basic, which covers basic error reporting and simple device capability reporting; Enhanced, which extends the basic information with more detailed error reporting and usage telemetry; and Full, which adds process memory snapshots to the enhanced data. This means that there's no way to participate in error reporting without also participating in usage tracking, and vice versa.

Further, the "off" option is only available in Windows 10 Enterprise. The common home user versions of Windows, Windows 10 Home and Windows 10 Pro, always collect (and report) at least "Basic" level information and offer no way to turn off the feature entirely.
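The combined setting, as described above, behaves like a single ordered level in which each position includes everything below it. The sketch below models that description only; the enum names, the data-category strings, and the edition check are hypothetical labels taken from this article's summary, not Microsoft's actual implementation, policy names, or registry values.

```python
from enum import IntEnum


class Telemetry(IntEnum):
    """Hypothetical model of the four positions of the combined setting."""
    OFF = 0
    BASIC = 1
    ENHANCED = 2
    FULL = 3


def collected(level: Telemetry) -> set:
    """Data categories sent at a given level; each level includes those below it."""
    data = set()
    if level >= Telemetry.BASIC:
        data |= {"basic error reports", "device capabilities"}
    if level >= Telemetry.ENHANCED:
        data |= {"detailed error reports", "usage telemetry"}
    if level >= Telemetry.FULL:
        data |= {"process memory snapshots"}
    return data


def selectable_levels(edition: str) -> list:
    """Levels a user can choose; per the article, only Enterprise offers OFF."""
    lowest = Telemetry.OFF if edition == "Enterprise" else Telemetry.BASIC
    return [level for level in Telemetry if level >= lowest]


for edition in ("Home", "Pro", "Enterprise"):
    minimum = selectable_levels(edition)[0]
    print(edition, minimum.name, sorted(collected(minimum)))
```

Because the categories are cumulative, detailed error reports and usage telemetry always arrive together, and on any edition other than Enterprise the lowest selectable level still sends something.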

The genuine privacy implications of this seem slight, but for those who absolutely do not want to send anything to Microsoft, Windows 10 is certainly a regression. Is Microsoft poring over this data, trying to sniff out the details of Windows users' lives and figure out all their secrets? It's highly unlikely—but the removal of the ability to turn off this reporting is nonetheless strange, and there's no clear reason for it.

Microsoft describes its data collection and usage policies on its privacy page. Some of the descriptions are a little fuzzy, though overall the page gives a clearer idea of what Windows 10 and other services collect, and why.

But the broad pattern is clear. The days of mainstream operating systems that don't integrate cloud services, that don't exploit machine learning and big data, that don't let developers know which features are used and what problems occur, are behind us, and they're not coming back. This may cost us some amount of privacy, but we'll tend to get something in return: software that can do more things and that works better. For many of us the benefits of these design decisions will be worth it. Those who think they aren't will continue to have to hunt through options to turn these features off... if they can.

Listing image by Ruth Suehle, opensource.com