The Korean version of this article appears here. 이 글의 한국어 버전은 여기 있습니다.

Privacy threat in 2019

Everything we do on the internet can be recorded: what we search for, what we click, how long we stay on a particular article, how far we scroll through it, where on the page our mouse pointer lingers most, and what we typed and erased before actually posting on Facebook. Every single action can be tied to our account and stored indefinitely in a database.

Whose database, exactly? On wired.com, around 50 different companies’, it seems.



Surprising fact: Many VPN companies also do this.

And because these firms typically sell the collected data to “data brokers,” it’ll possibly end up in more than 50 companies’ servers.

Why is this a problem? With sufficiently “big” data, analytics companies can statistically infer what is not directly observable in the data itself. Here are some examples. Just by looking at people’s Facebook likes (on average 170 “likes” per subject), a group of three researchers from the University of Cambridge and Microsoft were able to “accurately predict” their sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. Another group of researchers successfully applied machine learning to people’s Instagram photos to predict whether they would have depression in the future.

If a few researchers with limited access to data can predict this much, it is conceivable that companies with more comprehensive data and more powerful computing resources can gain much deeper insight into our lives.

Although these data and statistically inferred “insights” are currently used, for most of us, just to optimize advertising, which is not harmful in itself, there’s no guarantee that it will remain that way in the future. Data can definitely be used against us, if circumstances permit.

Many people resort to VPNs to guard against these threats, but in my opinion, they are misguided.

VPNs do nothing about trackers like Facebook or Google.

VPNs were not developed with privacy in mind. The major problem with a VPN is that it does nothing other than hide your IP address. Cookies, browser fingerprints, TCP/IP fingerprints, and sometimes the local IP address (via WebRTC) are still directly exposed to trackers even when you are “behind” a VPN.

Cookies can be cleared easily, but browser fingerprints and TCP/IP fingerprints can’t. There is a plethora of “fingerprinting-resistant” browsers out there, for example Brave and Safari, but they don’t work. Fingerprinting protection works by making everyone’s fingerprint the same, thereby increasing what is called the anonymity set. Brave and Safari fail to make every Brave user’s, or every Safari user’s, fingerprint the same, in part because blocking every single fingerprinting vector is very hard and sometimes infeasible (it breaks websites). For example, Brave in its default setting leaks the user’s window size, OS and OS version, language setting (Brave reports en-US regardless of the OS language unless the user changes it manually, whereas Safari leaks the OS language by default), timezone (for example, GMT+0900 Korean Standard Time), battery status, WebGL fingerprint, and HTML5 canvas fingerprint.

The other part of the reason is that there are so many user-modifiable settings. AFAIK, Brave’s default shield setting looks like this:

3rd-party cookies blocked

All scripts allowed

3rd-party device recognition blocked

Because this default setting lets the first party run every JavaScript and device fingerprinting attack, as seen on browserleaks.com for example, some privacy-conscious people might change it to a stricter setting. But the catch is that, because relatively few people change these settings, those who do “stick out” and end up with more unique fingerprints. Those who don’t change them also stick out: as they have no fingerprinting protection against first-party trackers, the first-party trackers can simply fingerprint them uniquely. You lose no matter what.
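To make the idea of an anonymity set concrete, here is a minimal Python sketch. The visitor list and the three attributes (browser, shield setting, timezone) are invented for illustration; a real fingerprint combines far more signals. It simply counts how many visitors share each exact fingerprint.

```python
from collections import Counter

# Hypothetical visitors, each reduced to a (browser, shield setting, timezone)
# tuple. All values are made up for illustration.
visitors = [
    ("Brave", "default", "GMT+0900"),
    ("Brave", "default", "GMT+0900"),
    ("Brave", "default", "GMT+0900"),
    ("Brave", "default", "GMT+0100"),
    ("Brave", "scripts-blocked", "GMT+0900"),
]


def anonymity_sets(fingerprints):
    """Map each distinct fingerprint to the number of visitors sharing it."""
    return Counter(fingerprints)


sets = anonymity_sets(visitors)

# Fingerprints shared by nobody else identify one visitor uniquely.
unique = [fp for fp, size in sets.items() if size == 1]

for fp, size in sets.items():
    print(fp, "-> anonymity set of", size)
```

In this toy data set, the one visitor who tightened the shield setting and the one visitor in a different timezone each end up alone in their set, which is the “sticking out” effect described above.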

It’s not the end of the story. VPN users, already finely segmented by their relatively unique browser and TCP/IP fingerprints, can be further segmented by their IP addresses. Because relatively few people change their VPN server location during 1 session, trackers can be pretty sure that two users with exactly the same browser fingerprint and TCP/IP fingerprint but different IP addresses, one 185.213.155.133 (Mullvad de-fra-003 server) and the other 185.213.155.134 (Mullvad de-fra-004 server), are indeed different users. (Tor Browser is immune to this type of segmentation, thanks to very frequently changing IP addresses and first-party stream isolation, both of which work transparently without requiring any end-user interaction.)

Even if we assume that the VPN user frequently changes server location, relatively few people switch between entirely different VPN providers, partly because doing so requires several VPN subscriptions, which can be quite expensive. So 1 user = 1 VPN subscription, and therefore a user with a NordVPN IP and another user with an ExpressVPN IP can be reliably distinguished.

Furthermore, VPN users with exactly the same browser fingerprint, TCP/IP fingerprint, and public IP address can be further segmented by their internal IP addresses (192.168.x.x, 10.x.x.x, or 172.16.x.x through 172.31.x.x), which leak via WebRTC. This internal IP address remains stable during 1 VPN session. Using VPNs known to be WebRTC-leak-free doesn’t help because, even though such VPNs don’t leak the user’s home IP address, they still leak the user’s internal IP address in the VPN network, which is allocated by the VPN server itself.

For example, consider this hypothetical VPN user having 4 different internal/external IP addresses:

Home external IP address: 81.4.175.181 (personally identifying)

Home internal IP address: 192.168.1.106 (not personally identifying) ← This can sometimes be leaked via WebRTC, for example when using a “VPN router”.

VPN external IP address: 185.213.155.133 (Mullvad de-fra-003 server)

VPN internal IP address: 10.8.1.12 (not personally identifying) ← Usually, this one is leaked via WebRTC. Stable during 1 VPN session.

The internal IP address is, in itself, not personally identifying, but it can certainly be used to segment users and track a (pseudonymous) user across websites because it is stable.
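As a rough illustration of the internal/external distinction, the sketch below runs the four addresses of the hypothetical user through Python’s standard `ipaddress` module. The dictionary labels and the `is_internal` helper are names chosen for this example. Note that `is_private` covers a few more reserved ranges than the three home/VPN ranges mentioned above, so treat it as an approximation.

```python
import ipaddress

# The four addresses of the hypothetical VPN user described above.
addresses = {
    "home external": "81.4.175.181",
    "home internal": "192.168.1.106",
    "VPN external": "185.213.155.133",
    "VPN internal": "10.8.1.12",
}


def is_internal(addr: str) -> bool:
    """True if addr lies in a private range and so is not routable on the internet."""
    return ipaddress.ip_address(addr).is_private


internal = {name: is_internal(addr) for name, addr in addresses.items()}

for name, flag in internal.items():
    print(f"{name:14} {addresses[name]:16} internal={flag}")
```

A tracker does not need the internal address to be identifying in itself; it only needs the value to stay the same while the other attributes also stay the same.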

A somewhat related concept is cross-site linkability. Because a user’s browser fingerprint, TCP/IP fingerprint, and sometimes the VPN’s external and internal IP addresses stay the same for at least 1 VPN session, companies can track a user across websites with ease. That is, if a VPN user logs in to Google to get some work done, closes the browser, and opens a new incognito window to watch porn, that porn-watching activity (not only the mere fact that the user watches porn, but also his specific preferences, frequency, and duration) can reliably be linked to the user’s Google account. This constitutes what is called a “shadow profile”, which doesn’t appear on Google’s My Activity web page and therefore cannot be deleted. Not that this concerns Google: former Google CEO Eric Schmidt once said, “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.” And all of this happens even though the user is “behind” several chained VPNs using different VPN providers outside of 14-eyes countries to “distribute” trust…

You can’t solve this problem by changing the VPN server location whenever you access a different website, because the web browser is not the only program that accesses the internet. Suppose the user has Google Drive installed on his PC. Then Google knows this:

2019-05-03 13:01-13:12. User’s Google Drive connects to Google server using VPN1 address.

2019-05-03 13:01-13:12. Someone with specific browser fingerprint connects to website 1 using VPN1 address.

2019-05-03 13:12. User’s Google Drive is disconnected (What happened: VPN’s “kill switch” function disconnected user’s PC from the internet while changing server location…)

2019-05-03 13:12. Someone with specific browser fingerprint disconnects from website 1.

2019-05-03 13:13-13:49. User’s Google Drive connects to Google server using VPN2 address.

2019-05-03 13:13-13:49. Someone with specific browser fingerprint connects to website 2 using VPN2 address.

2019-05-03 13:49. User’s Google Drive is disconnected.

2019-05-03 13:49. Someone with specific browser fingerprint disconnects from website 2.

The pattern is obvious. Of course, you can try to thwart this pattern analysis by alternating between several different VPN providers, using several different web browsers, changing browser settings here and there, adding and deleting browser plugins all the time, and unpredictably turning on and off several different PCs and mobile devices connected to your home router, each device connected to the same or a different VPN. But this is very time- and energy-consuming, especially in the long term, and the benefits are dubious.
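The pattern matching in the timeline above can be sketched in a few lines of Python. This is only an illustration of the idea: the session records are transcribed from the timeline, the `correlate` function is a hypothetical name, and a real analysis would presumably tolerate some slack in the timestamps rather than require exact equality.

```python
from datetime import datetime


def t(hhmm):
    """Parse a time of day on 2019-05-03, the date used in the timeline."""
    return datetime.strptime("2019-05-03 " + hhmm, "%Y-%m-%d %H:%M")


# (label, source address, start, end), transcribed from the timeline above.
drive_sessions = [
    ("user's Google Drive", "VPN1", t("13:01"), t("13:12")),
    ("user's Google Drive", "VPN2", t("13:13"), t("13:49")),
]
browser_sessions = [
    ("website 1", "VPN1", t("13:01"), t("13:12")),
    ("website 2", "VPN2", t("13:13"), t("13:49")),
]


def correlate(known, anonymous):
    """Pair sessions that share a source address and identical start/end times."""
    matches = []
    for k_label, k_ip, k_start, k_end in known:
        for a_label, a_ip, a_start, a_end in anonymous:
            if k_ip == a_ip and k_start == a_start and k_end == a_end:
                matches.append((k_label, a_label))
    return matches


matches = correlate(drive_sessions, browser_sessions)
print(matches)
```

Both browser sessions pair up with the logged-in Google Drive sessions, even though the VPN address changed in between.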

Tor Browser addresses these problems by implementing first-party isolation (which keeps cookies and other offline caches from different domains separate), stream isolation (a different circuit for each first party), TCP/IP fingerprinting protection (the website sees the exit relay’s TCP/IP header, not the user’s), and strong browser fingerprinting protection (see their design document).
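The effect of first-party isolation on cookies can be shown with a toy model. This is not how Tor Browser is implemented; the two classes, the `visit` helper, and the example domains are assumptions made purely to contrast a cookie store keyed by the cookie’s domain with one keyed by the (first party, cookie domain) pair.

```python
class SharedJar:
    """Cookie store keyed only by the domain that set the cookie."""

    def __init__(self):
        self._cookies = {}

    def get_or_set(self, first_party, cookie_domain, new_value):
        # first_party is ignored: the tracker sees one cookie everywhere.
        return self._cookies.setdefault(cookie_domain, new_value)


class IsolatedJar:
    """Cookie store keyed by (first party, cookie domain)."""

    def __init__(self):
        self._cookies = {}

    def get_or_set(self, first_party, cookie_domain, new_value):
        return self._cookies.setdefault((first_party, cookie_domain), new_value)


def visit(jar, sites, tracker="tracker.example"):
    """Return the tracker ID read back on each site.

    The tracker offers a fresh ID on every visit; an existing cookie wins.
    """
    return [jar.get_or_set(site, tracker, f"id-{i}") for i, site in enumerate(sites)]


sites = ["news.example", "shop.example"]
shared_ids = visit(SharedJar(), sites)      # same ID on both sites
isolated_ids = visit(IsolatedJar(), sites)  # a different ID per first party
print(shared_ids, isolated_ids)
```

With the shared store the embedded tracker reads the same ID on both sites and can link the visits; with the isolated store it gets an unrelated ID under each first party.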

VPN’s ISP retains logs.

Many VPN providers boast of their strict “no-logs” policy. Some venerable VPNs even take extreme measures such as operating their servers in RAM-only mode, ensuring that absolutely no log is left even by mistake.

A no-logs VPN is very much like a web browser with “incognito mode” turned on. Top-class engineers from Mozilla, Google, Apple, and Microsoft make sure that the browser leaves absolutely zero logs on your computer when incognito mode is on, but we all know it doesn’t matter. Even though the browser itself doesn’t log, our ISP certainly will. (And that’s why we use VPNs in the first place!)

The same thing happens with VPN servers. Even though the VPN server itself retains zero logs, upstream network providers such as the datacenter (in which the VPN server is placed), the datacenter’s ISP, and the ISP’s Autonomous System will certainly log. Knowing this, the Dutch police went straight to EarthVPN’s datacenter company (unrelated to the VPN company itself) and obtained datacenter logs to catch a user, bypassing the VPN company entirely. EarthVPN has a strict “absolutely no logs” policy, which is a technically true statement.

Tor Browser addresses these problems by chaining 3 relays instead of 1 (to prevent any one relay from seeing both the source and the destination; see EFF’s illustration), cell padding (to thwart NetFlow log analysis; see their spec document), stream isolation (to prevent any one circuit from capturing a user’s entire traffic), the New Identity feature (to prevent circuit re-use), and replacing any established circuit every few minutes (to prevent one circuit from capturing a user’s entire traffic to one domain).
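The point about chaining 3 relays can be captured in a very small model. The sketch below only tracks which two neighbours each hop talks to directly; it ignores encryption, timing analysis, and colluding relays, and the hop labels and the `neighbours` helper are names chosen for this example.

```python
def neighbours(path):
    """For each intermediate hop, return the two hops it talks to directly."""
    return {path[i]: (path[i - 1], path[i + 1]) for i in range(1, len(path) - 1)}


tor_path = ["user", "guard relay", "middle relay", "exit relay", "destination"]
vpn_path = ["user", "VPN server", "destination"]

tor_view = neighbours(tor_path)
vpn_view = neighbours(vpn_path)

# Hops that are directly adjacent to both the user and the destination.
tor_sees_both = [hop for hop, adj in tor_view.items()
                 if "user" in adj and "destination" in adj]
vpn_sees_both = [hop for hop, adj in vpn_view.items()
                 if "user" in adj and "destination" in adj]

print("Tor hops seeing both ends:", tor_sees_both)
print("VPN hops seeing both ends:", vpn_sees_both)
```

In the 3-relay path no single hop is adjacent to both ends, whereas the single VPN server always is, so whoever logs at or around that server sees both source and destination.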

If Tor is better than VPN, then why do all the tech magazines and blogs constantly say “Tor is dangerous, VPN is safe”?

Whenever the magazine’s or blog’s reader clicks a link to a VPN company’s website and buys a VPN service, the magazine or blog receives a hefty commission from the company.

So they have every incentive to smear Tor and “nudge” people to buy a VPN.

Actually, some VPN providers and tech blogs even spread outright false information, like ‘VPN makes your connection end-to-end encrypted’, while noting that ‘Tor traffic is not encrypted between the exit relay and the destination site’, which is technically true.

Although ExpressVPN doesn’t directly say that its service makes your connection end-to-end encrypted, its illustration and video certainly imply it.

So their reasoning goes, ‘therefore VPN is better’.

Blatantly false information.

Not true.

In reality, a VPN doesn’t provide end-to-end encryption, and anyone positioned between the VPN server and the destination website can snoop.

Moreover, using a VPN can sometimes be more dangerous than using neither a VPN nor Tor, because a VPN provider can MITM HTTPS, unlike Tor or ISPs. Many VPN providers silently install the VPN’s root certificate during VPN client installation (which requires administrative privileges), or make users install the root certificate manually, although it is not necessary for the VPN to function.

not good.

Now your VPN provider is in a position to capture all your internet traffic, and it has its root certificate installed on your system. So it can theoretically perform a man-in-the-middle (MITM) attack even on encrypted HTTPS communication whenever it wants to. Some VPNs even advertise their capability to perform MITM on their customers, albeit for a legitimate reason (ad-blocking). In contrast, a Tor exit relay can’t MITM HTTPS because its root certificate is not installed on your system.
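Why an installed root certificate matters can be sketched with a deliberately simplified trust model. Here a certificate is just a (subject, issuer) pair and validation only checks whether the issuer is in the trust store; real X.509 validation also verifies signatures, validity dates, host names, and more. The CA names and `is_trusted` are hypothetical.

```python
def is_trusted(cert, trust_store):
    """Toy check: accept a certificate if its issuer is a trusted root."""
    subject, issuer = cert
    return issuer in trust_store


# Hypothetical trust store before any VPN client is installed.
system_roots = {"Public CA"}

genuine = ("bank.example", "Public CA")    # what the real site presents
forged = ("bank.example", "VPN Root CA")   # what an intercepting VPN could present

before = is_trusted(forged, system_roots)
after = is_trusted(forged, system_roots | {"VPN Root CA"})

print("forged cert accepted before root install:", before)
print("forged cert accepted after root install: ", after)
```

Once the extra root is in the store, a certificate issued by it for any site passes this check just as the genuine one does, which is what puts the provider in a position to intercept HTTPS.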

All in all, Tor is not dangerous, VPN is not safe, and tech blogs are not unbiased, despite what they say.