by

by Steven Englehardt [0], Gunes Acar, and Arvind Narayanan

So far in the No boundaries series, we’ve uncovered how web trackers exfiltrate identifying information from web pages, browser password managers, and form inputs.

Today we report yet another type of surreptitious data collection by third-party scripts that we discovered: the exfiltration of personal identifiers from websites through “login with Facebook” and other such social login APIs. Specifically, we found two types of vulnerabilities [1]:

seven third parties abuse websites’ access to Facebook user data

one third party uses its own Facebook “application” to track users around the web.

Vulnerability 1: Third parties piggyback on Facebook access granted to websites

Facebook Login and other social login systems simplify the account creation process for users by decreasing the number of passwords to remember. But social login brings risks: Cambridge Analytica was found misusing user data collected by a Facebook quiz app which used the Login with Facebook feature. We’ve uncovered an additional risk: when a user grants a website access to their social media profile, they are not only trusting that website, but also third parties embedded on that site.

We found seven scripts collecting Facebook user data using the first party’s Facebook access [4]. These scripts are embedded on a total of 434 of the top 1 million sites (UPDATE: see clarification below). We detail how we discovered these scripts in Appendix 1 below. Most of them grab the user ID, and two grab additional profile information such as email and username. We are unable to determine whether first parties are aware of this particular data access [5].

The user ID collected through the Facebook API is specific to the website (or the “application” in Facebook’s terminology), which would limit the potential for cross-site tracking. But these app-scoped user IDs can be used to retrieve the global Facebook ID, user’s profile photo, and other public profile information, which can be used to identify and track users across websites and devices [6].

Company Script Address Facebook Data Collected OnAudience* http://api.behavioralengine.com/scripts/be-init.js User ID (hashed),

Email (hashed), Gender Augur https://cdn.augur.io/augur.min.js Email, Username Lytics** https://d3c…psk.cloudfront.net/opentag-54778-547608.js?_=[*] User ID ntvk1.ru https://p1.ntvk1.ru/nv.js User ID ProPS^ http://st-a.props.id/ai.js User ID (has code to collect more) Tealium^ http://tags.tiqcdn.com/utag/ipc/[*]/prod/utag.js User ID Forter^ https://cdn4.forter.com/script.js?sn=[*] User ID

* OnAudience stopped collecting this information after we released the results of a previous study in the No Boundaries series, which showed them abusing browser autofill to collect user email addresses.

**(Added 2018-04-22; 5:55pm): The script loaded by Opentag tag manager includes a code snippet which accesses the Facebook API and sends the user ID to an endpoint of the form https://c.lytics.io/c/1299?fbstatus=[...]&fbuid=[...]&[...] This snippet appears to be a modified version of an example code snippet found on the lytics.github.io website with the description “Capturing Facebook Events”. The example code appears to provide instructions for first parties to collect Facebook user ID and login status.

^ Although we observe these scripts query the Facebook API and save the user’s Facebook ID, we could not verify that it is sent to their server due to obfuscation of their code and some limitations of our measurement methods.

While we can’t say how these trackers use the information they collect, we can examine their marketing material to understand how it may be used. OnAudience, Tealium AudienceStream, Lytics, and ProPS all offer some form of “customer data platform”, which collect data to help publishers to better monetize their users. Forter offers “identity-based fraud prevention” for e-commerce sites. Augur offers cross-device tracking and consumer recognition services. We were unable to determine the company which owns the ntvk1.ru domain.

Vulnerability 2: Tracking users around the web with the Facebook Login service

Some third parties use the Facebook Login feature to authenticate users across many websites: Disqus, a commenting widget, is a popular example. However, hidden third-party trackers can also use Facebook Login to deanonymize users for targeted advertising. This is a privacy violation, as it is unexpected and users are unaware of it. But how can a hidden tracker get the user to Login with Facebook? When the same tracker is also a first party that users visit directly. This is exactly what we found Bandsintown doing. Worse, they did so in a way that allowed any malicious site to embed Bandsintown’s iframe to identify its users.

We discovered that the iframe injected by Bandsintown would pass the user’s information to the embedding script indiscriminately. Thus, any malicious site could have used their iframe to identify visitors. We informed Bandsintown of this vulnerability and they confirmed that it is now fixed.

Conclusion

This unintended exposure of Facebook data to third parties is not due to a bug in Facebook’s Login feature. Rather, it is due to the lack of security boundaries between the first-party and third-party scripts in today’s web. Still, there are steps Facebook and other social login providers can take to prevent abuse: API use can be audited to review how, where, and which parties are accessing social login data. Facebook could also disallow the lookup of profile picture and global Facebook IDs by app-scoped user IDs. It might also be the right time to make Anonymous Login with Facebook available following its announcement four years ago.

Clarification (2018-04-19 2:25am): In a previous version of the post we listed three sites (fiverr.com, bhphotovideo.com, and mongodb.com) which embed scripts that match the URL patterns given above. Third-party scripts may contain different contents when loaded by different sites, even though the scripts are served from same or similar URLs. We confirmed that the Forter scripts embedded on fiverr.com and bhphotovideo.com do NOT include functionality to access Facebook data. On mongodb.com we only observed the presence of an Augur script. We have published an updated list of sites, marking the ones where we have confirmed the presence of functionality to access Facebook data.

Correction (2018-04-22; 05:55pm): In a previous version of this post, we listed a Lytics script ( https://c.lytics.io/static/io.min.js ) as the cause of Facebook API access. Although this script is used to send the Facebook user ID to Lytics ( c.lytics.io ), the code which accesses the Facebook API was served within an OpenTag script as described above. The code snippet in the OpenTag script responsible for accessing Facebook user data is likely configured by the first party, so we have removed our previous opinion that first parties are likely unaware of the data access.

Several companies stated that they do not use Facebook data for third-party tracking purposes.

[0] Steven Englehardt is currently working at Mozilla as a Privacy Engineer. He coauthored this post in his Princeton capacity, and this post doesn’t necessarily represent Mozilla’s views.

[1] We use the term “vulnerability” to refer to weaknesses arising from insecure design practices on today’s web, rather than its commonly understood sense in computer security of weaknesses arising due to software bugs.

[2] In this post we focus on websites which use Facebook Login, but the vulnerabilities we describe are likely to exist for most social login providers and on mobile devices. Indeed, we found scripts that appear to grab user identifiers from the Google Plus API and from the Russian social media site VK , but we limited our investigation to Facebook Login as it’s the most widely used social SDK on the web.

[3] When the user completes the Facebook login procedure on a website that uses Facebook’s Javascript SDK, the SDK stores an authentication token in the page. When the user navigates to a new page, the SDK automatically reestablishes the authentication token using the browser’s Facebook cookies. All third-party queries to the SDK automatically use this token.

[4] In order to better understand the level of integration a third party has with the first party, we categorize scripts based on their use of the first party’s Application ID (or AppId), which is provided to Facebook during the login initialization phase to identify the site. Inclusion of a site’s application ID and initialization code in the third-party library suggests a tighter integration—the first party was likely required to configure the third-party script to access the Facebook SDK on their behalf. While application IDs aren’t meant to be secrets, we take the lack of an App ID to imply loose integration—the first party may not be aware of the access. In fact, all of the scripts in this category take the same actions when embedded on a simple test page with no prior business relationship.

[5]. The following could indicate the first party’s awareness of the Facebook data access:

1) third-party initiates the Facebook login process instead of passively waiting for the login to happen; 2) third-party includes the unique App ID of the website it is embedded on. The seven scripts listed above neither initiate the login process, nor contain the app ID of the websites.

Still, it is very hard to be certain about the exact relationship between the first parties and third parties.

[6] The application-scoped IDs can be resolved to real user profile information by querying Facebook’s Graph API or retrieve the user’s profile photo (which does not even require authentication!). When security researchers showed that it is possible to map app-scoped IDs to Facebook IDs and download profile pictures Facebook responded as follows: “This is intentional behavior in our product. We do not consider it a security vulnerability, but we do have controls in place to monitor and mitigate abuse.” A Facebook interface with similar controls was reportedly used to harvest of 2 Billion Facebook users’ public profile data. Note that although the endpoint found by the researchers does not work anymore, the following endpoint still redirects to users’ profile page: https://www.facebook.com/[app_scoped_ID].

APPENDIX:

Appendix 1 — Measurement Methods

To study the abuse of social login APIs we extended OpenWPM to simulate that the user has authenticated and given full permissions to the Facebook Login SDK on all sites. We added instrumentation to monitor the use of the Facebook SDK interface (`window.FB`). We did not otherwise inject the user’s identity into the page, so any exfiltrated personal data must have been queried from our spoofed API.

As in our previous measurements, we crawled 50,000 sites from the Alexa top 1 million in June 2017. We used the following sampling strategy: visit all of the top 15,000 sites, randomly sample 15,000 sites from the Alexa rank range [15,000 100,000), and randomly sample 20,000 sites from the range [100,000, 1,000,000). This combination allowed us to observe the attacks on both high and low traffic sites. On each of these 50,000 sites we visited 6 pages: the front page and a set of 5 other pages randomly sampled from the internal links on the front page.

To spoof that a user is logged in, we create our own `window.FB` object and replicate the interface of version 2.8 of the Facebook SDK. The spoofed API has the following properties:

For method calls that normally return personal information we spoof the return values as if the user is logged in and call and necessary callback function arguments. These include `FB.api()`, `FB.init(), `FB.getLoginStatus()`, `FB.Event.subscribe()` for the events `auth.login`, `auth.authResponseChange`, and `auth.statusChange`, and `FB.getAuthResponse()`. For the Graph API (`FB.api`), we support most of the profile data fields supported by the real Facebook SDK. We parse the requested fields and return a data object in the same format the real graph API would return. For method calls that don’t return personal information we simply call a no-op function and ignore any callback arguments. This helps minimize breakage if a site calls a method we don’t fully replicate. We fire `window.fbAsyncInit` once the document has finished loading. This function is normally called by the Facebook SDK.

The spoofed `window.FB` object is injected into every frame on every page load, regardless of the presence of a real Facebook SDK. We then monitor access to the API using OpenWPM’s Javascript call monitoring. All HTTP request and response data, include HTTP POST payloads are examined to detect the exfiltration of any of the spoofed profile data (including that which has been hashed or encoded).

For both calls to `window.FB` and HTTP data, we store the Javascript stack trace at the time of execution. We use this stack trace to understand which APIs scripts accessed and when they were sending data back. For some scripts our instrumentation only captured the API access, but not the exfiltration. In these cases, we manually debugged the scripts to determine whether the data was only used locally or if it was obfuscated before being transmitted. We explicitly note the cases where we could not make this determination.

Appendix 2 — Third parties which access the Facebook API on behalf of first parties

We also found a number of third-party scripts interacting with the Facebook API, which appear to be operating on behalf of the first party [4]. These companies offer a range of services, such as integrating multiple social login options, monitoring social media engagement, and aggregating customer data. As a specific example, BlueConic offers a Facebook Profile transfer service, that copies information from the user’s Facebook profile information to BlueConic’s data platform. Additional third-party services which access Facebook profile information on the first party’s behalf include: Zummy, Social Miner, Limespot (personalizer.io), Kissmetrics, Gigya, and Webtrends. (Update: Limespot informed us that they deactivated the relevant feature in November 2017.)

Image assets used in figures are from the Noun Project:

computer tower by Melvin, Female by SBTS, javascript file by Adnen Kadri, click by Aybige