Very good question. Metadata is insanely powerful, and most people just don't realize how powerful it can be. Fortunately, it's not that hard to discover if you do a little digging into how technology and the internet work.

I'm of the opinion that it's not very easy to hide online. In most cases, if you're a small fry and nothing shady comes up, nobody's going to pay much attention to you beyond the first investigation into your activities.

National Security Letters and Subpoenas

Thanks to National Security Letters, the NSA, FBI, DEA, CIA, and other alphabet agencies can work together to request lots of data from American companies. If you wish to operate on U.S. soil, you are subject to U.S. laws and regulations.

Under the Stored Communications Act, companies aren't required to store information on users, but they are required to hand over whatever information they have stored.

In some repressive countries, the police just need to show up and force you to do anything they want, and you'll have no choice if you want to live. Worse still, the government may own those internet service providers.

What is a Device ID?

When I talk about "Device IDs" in this post, I refer to anything that can uniquely identify your machine: Operating System keys, hardware Device ID enumeration, browser fingerprints, accounts that you're logged into, and so on. Assume it's a piece of information that uniquely identifies you.
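To make the idea concrete, here is a minimal sketch of how a handful of machine attributes could be folded into one stable identifier. The attribute names and values are made up for illustration, and the `device_id` function is hypothetical; real services use their own (mostly undocumented) schemes. The point is only that the network path is not an input, so the identifier comes out the same no matter which IP address you connect from.

```python
import hashlib

def device_id(attributes):
    """Hash a set of machine attributes into one stable identifier."""
    # Sort the keys so the same attributes always produce the same hash,
    # regardless of the order in which they were collected.
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical attribute values, made up for illustration.
machine = {
    "os_product_id": "00000-00000-00000-AAAAA",
    "disk_serial": "EXAMPLE-SERIAL-1234",
    "browser_user_agent": "ExampleBrowser/1.0",
    "logged_in_account": "someone@example.com",
}

# The IP address is not part of the input, so the identifier is identical
# whether the machine connects directly or through a VPN.
id_at_home = device_id(machine)
id_behind_vpn = device_id(dict(machine))

print(id_at_home == id_behind_vpn)
```

Changing any single attribute (a new disk, a different account) produces a different hash, which is why real fingerprinting tends to rely on several independent attributes rather than one.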

Device ID Correlation

Many VPN users think that they can chain multiple VPNs on a single machine, and that they're "safe." They are usually reconnecting to multiple services under different IP addresses, but their Device ID remains the same.

Even if they were to shut down their service accounts (Skype, Steam, Battle.net, etc.), they're generally still leaking information to Microsoft, Apple, and others. You're periodically requesting updates from Microsoft, and your machine sends packets to Microsoft servers every now and then.

And if you shut down your service accounts prior to performing illicit actions, it points to a clear pattern of trying to hide yourself, and will help give you away. In many cases, you're damned if you do, damned if you don't.

Even if you use Linux, you're usually requesting information from update servers periodically. For example, Ubuntu frequently phones home to get information. Ubuntu will use whatever IP address you happen to be using at the time.

Most major service providers record every single IP address that has ever been used to connect to their servers, whether by you or by someone else using your account. When you use Windows or OS X, information about your computer is sent to their servers. All IP addresses associated with that Device ID will help investigators find you.
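As a rough sketch of what that correlation looks like, assume a provider keeps log records of the form (device ID, IP address). The record layout, the device names, and the `ips_by_device` helper are all assumptions for illustration, and the addresses come from the reserved documentation ranges. Grouping by Device ID immediately ties every VPN exit back to the one time the same machine connected directly.

```python
from collections import defaultdict

# Hypothetical log records: (device_id, ip_address).
# All addresses are from reserved documentation ranges.
records = [
    ("device-A", "203.0.113.10"),  # first VPN exit
    ("device-A", "198.51.100.7"),  # second VPN exit
    ("device-A", "192.0.2.44"),    # direct home connection
    ("device-B", "203.0.113.10"),  # unrelated machine on the same VPN exit
]

def ips_by_device(records):
    """Group every IP address ever seen under the Device ID that used it."""
    seen = defaultdict(set)
    for dev, ip in records:
        seen[dev].add(ip)
    return dict(seen)

for dev, ips in sorted(ips_by_device(records).items()):
    print(dev, sorted(ips))
```

Note that a shared VPN exit address shows up under both devices, so the IP alone says little; it is the Device ID that links the VPN sessions to the home connection.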

Note that most of the IP addresses in these images are local IP addresses so I don't end up posting someone's address and causing them annoyance.

These investigative methods are used by good guys, and bad guys. In some countries, having the wrong beliefs, being the wrong race, or opposing the wrong people will usually result in your death.

Real World Example of Successful Device ID Correlation

It appears that with the recent Apple Hack, the attacker attempted to hide his identity behind proxies, but connected to an Apple server via SSH, which somehow revealed his actual Mac serial number during the connection process:

"Two Apple laptops were seized, and the serial numbers matched the serial numbers of the devices which accessed the internal systems,"

IP Address Correlation

Another method, which almost always assists Device ID analysis, is IP Address Correlation. I called it an attack in the picture, but it's really an investigative method. And a damn good one at that.

It isn't just which IP addresses you used. It's also the time ranges during which those IP addresses were used.

This can be used to detect those who are clever enough to hide behind multiple proxies without leaking any additional information, other than the IP address itself.
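Here is a minimal sketch of the timing idea, under the assumption that an investigator has two things: timestamps of activity seen from a proxy exit IP, and the time windows during which each candidate subscriber was online. The data, the subscriber names, and the `fraction_within` helper are all hypothetical; real analysis would be statistical and far messier.

```python
from datetime import datetime, timedelta

def fraction_within(sessions, events, slack=timedelta(seconds=30)):
    """Return the fraction of events that fall inside any session window."""
    if not events:
        return 0.0
    hits = sum(
        any(start - slack <= moment <= end + slack for start, end in sessions)
        for moment in events
    )
    return hits / len(events)

def at(hour, minute):
    """Build a timestamp on an arbitrary fixed day (illustration only)."""
    return datetime(2000, 1, 1, hour, minute)

# Hypothetical activity observed from the proxy exit IP.
exit_ip_activity = [at(20, 15), at(20, 50), at(22, 45)]

# Hypothetical online windows for two candidate subscribers.
subscriber_sessions = {
    "subscriber-1": [(at(20, 0), at(21, 0)), (at(22, 30), at(23, 0))],
    "subscriber-2": [(at(9, 0), at(10, 0))],
}

scores = {
    name: fraction_within(sessions, exit_ip_activity)
    for name, sessions in subscriber_sessions.items()
}
print(scores)
```

A single overlap proves nothing, but a candidate whose online windows cover all of the observed activity, day after day, stands out quickly from everyone else.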

Quantum Insertion

This is likely just a classic Man-in-the-Middle (MITM) attack on non-HTTPS websites, or HTTPS websites with improper configuration, which was given a fancy name. Nothing more.

This may be reserved only for high-value targets, or it may be abused with reckless abandon; using this attack against security researchers is actually a painfully stupid thing to do. It's best if the target is completely unaware. With the Snowden leaks, any researcher worth their salt will be reading about these kinds of attacks because "bad guys" will be doing it too.

Quantum Insertion is just a MITM attack which could allow a threat actor to intercept a request to retrieve a file from a website, or even a fake website, and then replace that file with another one entirely. For example, if you're downloading known hacking tools, a successful man-in-the-middle attack will redirect your download and replace it with an infected variant of the same file.

In other cases, a threat actor could insert infected Flash or JavaScript files into the page and exploit a vulnerability.

With a successful MITM attack, an actor can gain access to the machine you're on, and easily find out who you really are by unraveling your VPN chain from the source. At the very least, they can get the first entry point of your IP address in the proxy chain.

They could also steal your password through unencrypted or intercepted connections. Password re-use can help reveal your identity as well (see page 57 in the document, or 59 if using "go to page").

Examples of Things Vulnerable to Quantum Insertion / MITM Attacks

Typical targets would tend to be high-traffic downloads or websites that are unencrypted. This could include websites that only provide insecure methods for downloading the files.

For example, Nvidia drivers default to non-HTTPS downloads with no verifiable hashes or signatures available on their HTTPS site, and many Linux distributions also serve ISOs over unencrypted connections, some of which even fail to provide verifiable hashes. What's more, Nvidia drivers frequently phone home with device IDs.

Popular security products such as Burp Suite require Jython to run modules. The Jython website has no HTTPS certificate, provides no hashes for the latest version of the software, and provides only SHA1 and MD5 hashes for older versions. Because the site has no HTTPS certificate, those hashes could easily be modified in transit to give you a false sense of security. Older versions of software are frequently outdated and may lead to compromise of your system, or to further compromise through privilege escalation.
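For reference, checking a download against a published hash is only a few lines. This is a minimal sketch using SHA-256 from Python's standard library; the function names are my own, and the whole check is only as trustworthy as the channel you got the expected hash from. If the hash came over the same unencrypted connection as the file, an attacker in the middle can swap both.

```python
import hashlib
import hmac

def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file without loading it all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA-256 digest equals the published hex value."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

Usage would look like `matches_published_hash("driver.run", published_value)`, where both the file name and `published_value` are placeholders for whatever you downloaded and whatever hash the vendor published over HTTPS.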

XKeyscore and PRISM

XKeyscore and PRISM have all of the abilities I've mentioned here, and much more. Say what you want about the NSA and their generally ineffective dragnet surveillance: they are damn good at figuring things out after the fact. This helps prevent people from getting away with committing crimes in the long run. It also helps bring the bad guys to justice... do what you want with that information.

Is it just the government that can track you this way?

Unfortunately, no. Repressive regimes, corporations, advertisers, and the like, are able to create and maintain such databases in order to track users around the web.

Many companies also share your data with third parties. It's even there in the EULA. Read the EULA next time, and it will reveal connections to many of the things I've mentioned here. Remember, if it's free, then you are the product. And in many cases, even if it isn't free (Windows, etc.), you are still the product.

And in repressive regimes, if companies such as Apple and Microsoft wish to do business, they are required to adhere to the laws of that country. This is one of the many ways that repressive regimes can track down individuals behind Yahoo, Hotmail, Google, and other accounts. And if those companies refuse to provide information, then they'll face fines, expulsion, or worse.