Is Microsoft reading your Skype instant messages?

That’s the inflammatory allegation that a UK-based security blog made in a post earlier today:

Anyone who uses Skype has consented to the company reading everything they write. The H's associates in Germany at heise Security have now discovered that the Microsoft subsidiary does in fact make use of this privilege in practice. Shortly after sending HTTPS URLs over the instant messaging service, those URLs receive an unannounced visit from Microsoft HQ in Redmond.

That's a pretty dramatic conclusion, based on very thin evidence.

Heise Security, the German branch of the same publishing company, received a tip from a reader alleging that he had “observed some unusual traffic” following an IM session over Skype. So they performed a single experiment:

Heise Security then reproduced the events by sending two test HTTPS URLs, one containing login information and one pointing to a private cloud-based file-sharing service. A few hours after their Skype messages, they observed the following in the server log:

65.52.100.214 - - [30/Apr/2013:19:28:32 +0200] "HEAD /.../login.html?user=tbtest&password=geheim HTTP/1.1"

As an aside, if you're sending URLs that contain login credentials in plain text, you already have big security problems. The same is true if your session ID allows anyone to masquerade as you simply by clicking a link.
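To see why credentials in a URL are a problem, consider what anyone with access to a server log can recover from the request line quoted above. The following is a minimal Python sketch, assuming the abbreviated log format shown in the heise Security excerpt; the regular expression and the function names are illustrative, not taken from any real tool.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Matches the abbreviated log entry quoted in the article:
# client IP, two placeholder fields, a bracketed timestamp, then the request line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\]\s+'
    r'"(?P<method>[A-Z]+) (?P<target>\S+) (?P<protocol>[^"]+)"'
)


def parse_log_entry(entry):
    """Return the named fields of a log entry, or None if it does not match."""
    match = LOG_PATTERN.search(entry)
    return match.groupdict() if match else None


def query_parameters(target):
    """Return the query-string parameters carried in a request target."""
    return parse_qs(urlsplit(target).query)


# The entry exactly as quoted, including the path the original report elided.
entry = (
    '65.52.100.214 - - [30/Apr/2013:19:28:32 +0200] '
    '"HEAD /.../login.html?user=tbtest&password=geheim HTTP/1.1"'
)

fields = parse_log_entry(entry)
params = query_parameters(fields["target"])

print(fields["ip"], fields["method"])
print(params)
```

The user name and password sit in the request target itself, so they end up in every log and on every intermediate system that records the URL, regardless of which HTTP method was used.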

That IP address, 65.52.100.214, is indeed controlled by Microsoft, as a cursory inspection of DNS records confirms. But after doing some investigating of my own, I’ve concluded that the reason for the mysterious visit is almost certainly innocent.

Microsoft doesn't normally discuss the details of its security infrastructure. However, I’m reasonably certain that address is part of Microsoft’s SmartScreen infrastructure, which the company uses to identify suspicious and dangerous URLs so that it can block malware, phishing sites, and spam in Internet Explorer, Outlook.com, and other Microsoft services. Presumably, Skype picked up SmartScreen filtering when it took over the functions previously handled by Windows Live Messenger. (Microsoft has not publicly confirmed that change and declined a request to comment on this story.)

First, let’s dismiss the implication that someone at Skype is following links from its customers and “reading everything they write.” That HTTP request uses the HEAD method rather than GET. As the relevant portion of the HTTP standard explains, this method specifically doesn’t retrieve content:

This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
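The difference is easy to observe on a local machine. The sketch below is an illustration only: it starts a throwaway web server on the loopback interface using Python's standard library, then sends the same URL with HEAD and with GET. The handler class, the page body, and the /login.html path are all made up for the demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the content of a private page (hypothetical).
BODY = b"<html><body>private page content</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_HEAD(self):
        # HEAD: status line and headers only, no entity body.
        self._send_headers()

    def do_GET(self):
        # GET: the same headers, followed by the body.
        self._send_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(port, method):
    """Send one request and return (status, Content-Length header, body bytes)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/login.html")
    resp = conn.getresponse()
    body = resp.read()
    result = (resp.status, resp.getheader("Content-Length"), body)
    conn.close()
    return result


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

head_status, head_length, head_body = fetch(port, "HEAD")
get_status, get_length, get_body = fetch(port, "GET")

server.shutdown()
server.server_close()

print("HEAD:", head_status, "Content-Length", head_length, "bytes received", len(head_body))
print("GET: ", get_status, "Content-Length", get_length, "bytes received", len(get_body))
```

Both responses carry the same status and headers, but only the GET response delivers the page itself. A HEAD request tells the requester that the resource exists and how the server describes it, not what it says.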

Testing hyperlinks to see if they’re safe, perhaps? That’s the official explanation Microsoft gave to the original authors of the article when they asked:

In response to an enquiry from heise Security, Skype referred them to a passage from its data protection policy: "Skype may use automated scanning within Instant Messages and SMS to (a) identify suspected spam and/or (b) identify URLs that have been previously flagged as spam, fraud, or phishing links."

Heise Security was skeptical of that explanation. Wouldn’t Microsoft/Skype have to look at the contents of a given page to determine whether it’s a phishing site or spam? No. Microsoft’s SmartScreen technology works by examining the reputation of a host, and it uses a wide range of markers to assess that reputation. This 2010 post from the team responsible for the SmartScreen technology explains how it looks at URLs:

Obviously SmartScreen's reputation systems learn that particular URLs are bad—that is the first step—but we go much further. Every URL is hosted on a domain. … Abusers will often host hundreds or thousands of individually abusive URLs on a single domain. With the right evidence, SmartScreen's reputation system will flag whole domains as abusive. URLs and domains are concepts that let humans refer to computers. But every computer that's directly on the Internet also has a numeric code, called its IP address, that lets other computers refer to it. For example, 109.22.33.142 might be the IP address of the computer that's running the web server that's hosting the canada-pharmacy.us domain. SmartScreen's reputation system tracks these as well and will mark specific web server IP addresses as abusive. SmartScreen will also generalize to other computers "in the neighborhood" of known bad ones. For example, IP addresses are often allocated in blocks, and it's likely that the person who owns 109.22.33.142 also owns 109.22.33.143 and .144 and .145. We use knowledge about the way infrastructure blocks are allocated–into subnets, ASN (Autonomous System Number) blocks, the way message routing works, and more–to figure out what other computers the abusers own, and prevent those abusers from attacking Microsoft customers.
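The "neighborhood" idea in that excerpt can be illustrated with Python's standard ipaddress module. This is only a sketch of the general concept, not SmartScreen's actual logic, which Microsoft has not published. The /24 block size and the same_block helper are assumptions chosen for the example; the addresses are the ones the SmartScreen team used.

```python
import ipaddress


def same_block(known_bad, candidate, prefix_len=24):
    """Return True if candidate falls in the same address block as known_bad.

    prefix_len is an assumption for illustration; real allocations vary in size.
    """
    # strict=False lets us derive the enclosing network from a host address.
    block = ipaddress.ip_network(f"{known_bad}/{prefix_len}", strict=False)
    return ipaddress.ip_address(candidate) in block


known_bad = "109.22.33.142"
candidates = ["109.22.33.143", "109.22.33.144", "109.22.33.145", "109.22.34.1"]

for candidate in candidates:
    print(candidate, same_block(known_bad, candidate))
```

Under the assumed /24 block, the three adjacent addresses fall inside the neighborhood of the known-bad host and 109.22.34.1 does not. A reputation system can reach that kind of conclusion without ever looking at a page's content.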

Let's be clear: SmartScreen doesn't scan every link in every IM or email. It doesn't need to. An algorithm determines that a message contains a link (identified by a text string like http:// or ftp://). Most links are from known safe domains. The test links pointed to unfamiliar and possibly suspicious hosts, so the SmartScreen servers requested more information from the host, using a HEAD (not GET) request with the exact URL that was included in the original Skype message.
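As a rough illustration of that triage, here is a minimal Python sketch. It is not Microsoft's implementation, whose details are not public. The URL pattern, the KNOWN_SAFE_DOMAINS set, the helper names, and the sample message are all hypothetical.

```python
import re
from urllib.parse import urlsplit

# Find links by their scheme prefix, as the article describes (http://, ftp://, ...).
URL_PATTERN = re.compile(r'(?:https?|ftp)://[^\s<>"]+', re.IGNORECASE)

# Hypothetical allow list; a real reputation service would hold far more data.
KNOWN_SAFE_DOMAINS = {"example.com", "example.org"}


def extract_urls(message):
    """Return every URL-like string found in a message."""
    return URL_PATTERN.findall(message)


def is_known_safe(url):
    """Return True if the URL's host is a known safe domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in KNOWN_SAFE_DOMAINS)


def urls_needing_check(message):
    """Return only the URLs whose hosts are unfamiliar and would merit a closer look."""
    return [url for url in extract_urls(message) if not is_known_safe(url)]


message = (
    "See https://www.example.com/docs and "
    "https://files.unfamiliar.test/share?id=1 thanks"
)
print(urls_needing_check(message))
```

Only the link to the unfamiliar host is selected for follow-up. The rest of the message text plays no part in the decision beyond locating the URLs.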

I spent 30 minutes or so poking around some particularly dark corners of the Internet, where the webmasters had inadvertently left their server logs and other incriminating documents open to the public. I found evidence that this particular Microsoft IP address had queried servers containing pages filled with PayPal usernames and passwords entered by phishing victims. That address was in logs from warez sites hosting downloads of pirated games and movies; it was in records kept by several spammy-looking sites offering "pharmaceuticals" for sale; and I even found it on one BBS where the site’s owners were alarmed by a possible Microsoft intrusion until they determined that the credentials of one of their administrators had been compromised and used to send spam to their members.

I couldn’t find any examples of legitimate sites complaining about unauthorized access from this IP address. Update: And contrary to heise Security's assertion, I found many examples of plain HTTP links that had been scanned by SmartScreen.

In short, Microsoft’s explanation checks out. If you share a URL in a Skype instant message, there’s a possibility (not a guarantee, just a chance) that a SmartScreen server will ask for more information from the server that hosts that URL. It will then use that information to help determine whether the link is legit. If someone on Skype sends you a link to a phishing site or one containing malware, you should know, right? That's the point of the SmartScreen feature.

There’s no evidence that anyone, human or machine, is reading your confidential messages. There's no evidence that the content of the messages is being examined at all. Automated scanning of some URLs within instant messages isn't the same as "reading everything you write." This is roughly equivalent to what mail servers do when they check the header information on an incoming message to determine whether it's spam. That's a legitimate security function, not an invasion of privacy.

You can put that tinfoil hat away, at least for now.