DataSpii: The catastrophic data leak via browser extensions

Sam Jadali

SecurityWithSam.com

Abstract



We present DataSpii (pronounced data-spy), the catastrophic data leak that occurs when any one of eight browser extensions collects browsing activity data — including personally identifiable information (PII) and corporate information (CI) — from unwitting Chrome and Firefox users. Our investigation uncovered an online service selling the collected browsing activity data to its subscription members in near real-time. In this report, we delineate the sensitive data source types relevant to the security of individuals and businesses across the globe. We observed two extensions employing dilatory tactics — an effective maneuver for eluding detection — to collect the data. We identified the collection of sensitive data from the internal network environments of Fortune 500 companies. Several Fortune 500 companies provided an additional measure of confirmation through a process of responsible disclosure. By deploying a honeypot to monitor web traffic, we discovered near-immediate visits to URLs collected by the extensions. To address the evolving threat to data security, we propose preemptive measures such as limiting access to shareable links, and removing PII and CI from metadata.

1. Introduction



Imagine if someone could publicly access, in near real-time — within an hour — your sensitive personal data on the websites you are browsing. Imagine, further, this person could access your sensitive business data in much the same way. Moreover, what if you and/or your colleagues were, yourselves, unwittingly leaking such data? In Table 1 below, we enumerate the types of data that can be accessed.

Table 1. Types of personal and corporate data accessible via the online service.

Personal data Corporate data personal interests

tax returns

GPS location

travel itineraries

gender

genealogy

usernames

passwords

credit card information

genetic profiles company memos

employee tasks

API keys

proprietary source code

LAN environment data

firewall access codes

proprietary secrets

operational material

zero-day vulnerabilities

This scenario may seem far-fetched, but in reality, it has global implications for millions of people across the globe who unwittingly send information about their browsing activity to a publicly accessible online service via at least eight browser extensions. The online service, referred to in this report as Company X, then provides the collected data to its paid or free-trial members who request such data about any domain of their choosing. See Table 2 for the Chrome and Firefox browser extensions in question.*

*The eight extensions state in either their terms of service, privacy policies, or descriptions that they may collect data that is personally identifiable or non-personally identifiable. While such statements appear to provide transparency about their data collection, privacy policies are largely ignored by web users [1].

Table 2. Chrome and Firefox extensions identified in the DataSpii leak.

Extension name Number of users

(as of Feb/Mar 2019) Browser vendor Chrome extension ID

(if applicable) Hover Zoom 800,000+ users Chrome nonjdcjchghhkdoolnlbekcfllmednbl SpeakIt! 1.4+ million users Chrome pgeolalilifpodheeocdmbhehgnkkbak SuperZoom 329,000+ users Chrome and Firefox gnamdgilanlgeeljfnckhboobddoahbl SaveFrom.net Helper† ≤140,000 users Firefox N/A FairShare Unlock‡ 1+ million users Chrome and Firefox alecjlhgldihcjjcffgjalappiifdhae PanelMeasurement‡ 500,000+ users Chrome kelbkhobcfhdcfhohdkjnaimmicmhcbo Branded Surveys‡ 8 users (June 2019) Chrome dpglnfbihebejclmfmdcbgjembbfjneo Panel Community Surveys‡ 1 user (June 2019) Chrome lpjhpdcflkecpciaehfbpafflkeomcnb

†The invasive data collecting behavior occurs when the SaveFrom.net Helper extension is installed from the author’s official website using Firefox on macOS or Ubuntu. We have not observed such invasive behavior when it is installed from a browser vendor store.

‡FairShare Unlock, PanelMeasurement, Branded Surveys, and Panel Community Surveys make explicit efforts to let their users know they collect browser activity data.

These browser extensions and their respective browsers (i.e., Chrome or Firefox) run on desktops and laptops with macOS, Windows, Chrome OS, or Ubuntu operating systems. The extensions also run on Chromium-based browsers (e.g., Opera, Yandex Browser) that accept Google Chrome extensions. Google Chrome Sync’s feature, for instance, can spread the extensions to home and work computers. The extensions collect highly sensitive user information from URLs, page titles, and referrers. The collected data is then filtered by domain name and offered for sale. In one instance, we observed the widespread exposure of corporate project data and employee tasks from thousands of companies that use a popular project management provider, atlassian.net. It is important to note that this report examines data obtained from Company X; it does not address whether other companies or individuals may have also disseminated such data.

1.1 How it works

As stated above, the term Company X shall refer to the online service selling analytics data of any website. To protect the data of the millions of users and businesses impacted by DataSpii, this report has omitted the name of the online service. The term Party Y shall refer to the data collector vis-à-vis the eight browser extensions at the center of this report. For convenience, Party Y also refers to one or more corporate entities that may or may not have legal relationships with each other. Party Y shall also refer to other browser extensions, as yet unidentified, involved in the collection, processing, and/or dissemination of data.

During the course of our investigation, we observed Party Y extensions collecting browser activity data. Within one hour of collection, we observed that same data being disseminated to members of Company X. (Company X does not provide all Party Y collected metadata types [e.g., last-modified] available to its customers.) Once signed up for Company X’s free trial, a member can then view the near real-time traffic data of any organization by merely entering its domain name (see Figure 1). The data is then filtered by domain name and delivered to the Google Analytics accounts of Company X members. The collected metadata delivered to Company X members includes hostname, URL, page title, referrer, browser, browser version, city, state, country, internet service provider (ISP), network domain, operating system (OS), OS version, date, and time. The Google Analytics service collates the metadata, thus providing Company X members with insight into the sources of a domain’s web traffic statistics.

Figure 1. Possible scenario for how one impacted individual, Jerry, can leak the data of many of his clients. Tax returns are among the types of data made accessible via DataSpii.

1.2 Background

A browser is software (e.g., Google Chrome, Mozilla Firefox, Safari, Microsoft Edge, Internet Explorer) that enables users to search, access, and display web pages. Browsers provide an interface to perform various web activities such as emailing, cloud-sharing, online shopping, watching movies, listening to music, and sharing photos. Browser extensions are modules that enhance and customize the browsing experience. For example, ad blockers are extensions that increase page speeds and reduce browser crashes, whereas password managers are extensions that generate and store random passwords, thereby increasing overall security.

According to the Google Chrome Developer site, “Extensions are event based programs used to modify or enhance the Chrome browsing experience. Events are browser triggers, such as navigating to a new page, removing a bookmark, or closing a tab. Extensions monitor these events in their background script, then react with specified instructions” [2]. Such triggers allow an extension to send user browsing data to third-party servers, which in turn can be used to compile user profiles for (the purpose of) targeting users with personalized advertisements.

In Data and Goliath, Bruce Schneier notes that some corporations use surveillance data to target consumers with discounts, trial offers, and other promotional features [3]. Corporations and governments that exchange surveillance data for legitimate purposes may also be sharing it — albeit inadvertently — with cybercriminals, often in huge data breaches. While most users understand that browsing activity is surveilled and tailored for advertisement purposes, few appreciate the full implications of being monitored.

Likewise, browser extensions surveil user data in return for convenience. Antimalware extensions warn users of danger by collecting the URLs that users visit and cross-referencing those URLs with a live database of known hacked sites. Grammar check extensions collect user-entered text, including social media posts, documents, and emails. In exchange for such conveniences, users are bombarded with personalized insights, performance stats, or advertisements. When such data is collected for tailored advertisements, market research, or enhanced browsing experiences, it may inadvertently end up in the wrong hands. Thus, the very same data that enhances personal experiences jeopardizes personal, corporate, and government information.

1.3 Company X and Party Y

Company X’s frequently asked questions (FAQ) page maintains that millions of people around the world opted in to anonymously share their browsing history. It further maintains that what it is doing is legal, ethical, and patented. While Company X does not disclose its method of data collection, it does seek to assure its users that the data collected remains anonymized. We suggest that this is not always the case. Moreover, if an employee is using a data-collecting browser extension on a company computer, the employee may not have the authority to consent on behalf of his or her employer.

While our investigation did not establish a legal relationship between Party Y and Company X, it did establish that much of the data collected by Party Y was made available to members of Company X.

The privacy policies of the Party Y extensions inform the end user that data may be collected. While privacy policies may allay some concerns about data collection, they are largely ignored by the typical web user [1]. Furthermore, in this report, we elucidate a delayed data collection measure employed by two Party Y extensions.

While Party Y extension policies state they may employ “PII scrubbers” to assist in providing anonymity, this report details their inability to scrub all types of PII. As a matter of legality, the European Union’s recently enacted data privacy law (EU-GDPR) states that the PII of all EU data subjects must be redacted or pseudonymized [4]. The failure to redact sensitive information, including PII and CI, endangers everyone who uses Party Y extensions, either directly or by association.

1.4 Invasiveness

After our reporting of a Party Y extension to a browser vendor, the vendor remotely disabled the extension for all users; however, that did not stop the data collection in our browser. While the extension ceased its functionality, we continued to observe our browsing activity being sent via POST request to Party Y servers. Ultimately, the data collection stopped when we removed the extension.

1.5 LAN impact

DataSpii circumvented the most effective security measures including authentication or encryption, which were designed to thwart leaks. Using a multi-pronged approach, we discovered the collection of sensitive data from the private LAN networks of many Fortune 500 companies. In addition, we devised a LAN experiment and observed the collection of hyperlinks stored within the page content of the website hosted on our LAN. Such data collected from a single visit to a page in a LAN environment can be used to map a corporation’s LAN environment. Furthermore, we observed the dissemination of our LAN data to three different hostnames. The collected data included our site’s LAN IP address, hostname, page title, timestamp of the visit, as well as the URLs of page resources (i.e., CSS files, JS files, and images) referenced in our HTML code. We then observed much of our LAN data being disseminated to members of Company X. Finally, through the responsible disclosure process, we corroborated our discovery with impacted individuals and major corporations.

1.6 History

Cybersecurity companies have long warned about the dangers of publicly accessible links [5]. In addition, the HTTP referer header has been notoriously known to cause leakage of publicly accessible URLs. Researchers have also warned about the data collection practices of browser extensions. Unfortunately, many corporations often fail to act on this direct knowledge. The problem is compounded when an online service such as Company X publishes the near real-time traffic data for its members. The dissemination of such data by Company X undermines the data protection features of data source types such as ephemeral links, URL-based query string authentication parameters, and API keys.

2. Affected data source types and impacted companies

Extensions have access to powerful browser application interface (API) functions that tap into user browsing activity. While some extensions may use this data to perform their inherent, legitimate functions, others use the data strictly for monetization or advertising. In order to be approved by a browser extension store, an extension must be reviewed by browser vendors; however, perfunctory reviews can result in the approval of bad actors. We identify seven data source types and provide examples of impacted companies by each.

Note: We have made disclosures to all impacted companies listed in this report. We have not identified any data being used maliciously. However, through our honeypot experiment, we observed third parties visiting unique URLs captured by Party Y browser extensions.

2.1 Data source type: Shared links

Web users frequently share publicly accessible unique links with family, friends, and colleagues. The links can direct recipients to Apple iCloud photos, Quickbooks invoices, 23andMe ancestry data, Nest security camera video clips, or confidential documents stored on OneDrive. The following examples detail how the leaking of shared links can impact major online services.

2.1.1 Zoom Video Communications

The DataSpii leak enables malicious parties to monitor corporate or client meetings. Enterprise businesses use services like Zoom for online meetings, video conferencing, and webinars. Typically, the meeting organizer will invite colleagues or clients by sharing a Zoom meeting link via email. The invite link URL often contains a unique meeting ID. When a participant of the meeting opens the link using a browser with an invasive extension, the link URL may then be displayed to a Company X customer in near real-time (see Figure 2).

Features of Zoom’s “Business” plan enable customers to choose a vanity hostname such as example.zoom.us. The vanity hostname acts as an easy way to brand the Zoom account and is included as a subdomain of the link URLs used for the organization’s online meetings. Search refinement tools provided by Google Analytics make it easy for a Company X member to filter the subdomains of Zoom customers, thereby allowing that member to filter meetings specific to an organization. Meetings that are not password protected become vulnerable to eavesdropping by outsiders with access to the URL. Although Zoom notifies meeting participants when a new user joins a meeting, it is possible for these notifications to be disregarded, especially during meetings with large numbers of participants. Zoom also offers a popular “waiting room” feature, enabling the host to manually admit each participant to ensure only their intended participants join their meeting.

Figure 2. Zoom.us meeting URLs.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.1.2 Skype

As with Zoom, Skype users ask their friends or colleagues to join a group chat using a publicly accessible shared link. The recipient of the link can join the chat directly from a web browser. When a user opens the link with a Party Y browser extension, the Skype group URL may become available to Company X customers who request the data of Skype.com (see Figure 3). A malicious Company X member may use the link to eavesdrop on chat conversations. Note that when a new user joins the group chat, Skype will notify connected users that a new guest has joined the conversation.

Figure 3. Skype group chat URLs.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.1.3 23andMe

23andMe customers can share reports with family, friends, and/or their entire social network using a publicly accessible link. When the link is opened by a user with a Party Y browser extension, the URL of the report becomes available to any Company X customer that requests data on 23andme.com. Figure 4 displays URLs matching a known format for publicly accessible 23andMe reports. Such reports may include information on ancestry, family DNA, muscle composition, and other biomedical data.





Figure 4. Redacted links to shared 23andMe DNA and ancestry reports.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.1.4 Apple example I

When an iOS user stores photos on iCloud and shares four or more photos via the Photos app, the app will generate a publicly accessible unique link to the photos. When a recipient or sender opens the link in a browser with a Party Y extension, the link to the photos is collected. Such a link may end up in the hands of a Company X member requesting the data of icloud.com (see Figure 5). When the URL is visited, the photos become visible and may even display an iOS user’s first and last name.

Figure 5. Links to shared Apple iCloud photos or videos.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.1.5 Apple example II

Shoppers who place orders via apple.com may receive an email from Apple containing a link to view their order status. When such links are viewed using a browser with a Party Y extension, they may be delivered to customers of Company X that request data of Apple.com. Figure 6 contains Apple order status links obtained from the Company X service.

Figure 6. Apple order status URLs provided by the Company X service.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

A bad actor may use the URL data to view an Apple.com product order status page in a browser. While Apple may well omit PII from visible areas of the order status page, during our observations, it did not conceal PII within the source code of our own Apple order. This source code can easily be viewed in web browsers by right-clicking anywhere on the page and selecting the “view page source” option. Figure 7 displays our PII data, including first name, last name, the last four digits of the purchaser’s credit card number, credit card type, and the Apple store location used for pickup.

Figure 7. Source code view of our Apple order status page.

2.1.6 Intuit Quickbooks

Vendors often email Quickbooks invoice links to customers for payment. Such invoices may be accessible via a publicly accessible unique link (see Figure 8). When the links are viewed using a browser with a Party Y extension, they may be delivered to customers of Company X that request data of intuit.com. Viewing the invoice may reveal the recipient’s full name, address, city, state, zip code, amount paid, invoice number, date, and details of the service, as well as the name and address of the vendor who sent the invoice.

Figure 8. Intuit Quickbooks invoice links obtained from the Company X service.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.1.7 Nest Labs

Nest provides businesses and homeowners with 24/7 video surveillance of their property. Nest members can share surveillance videos with others using a publicly accessible unique link. If the sender or recipient of a shared video clip accesses the URL via a browser with a Party Y extension, the link to the clip may then become available through the Company X service (see Figure 9).

Figure 9. Redacted URLs of shared Nest security cameras clips.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.1.8 JSFiddle

Web developers use sites such as JSFiddle.net to share and collaborate code with their colleagues. Developers can access their code via a publicly accessible unique URL. If the senders or recipients visit a JSFiddle.net URL using a browser with a Party Y extension, the code may appear to members of Company X requesting data on jsfiddle.net. In addition, Google Analytics tools allow Company X members to filter URLs by a visitor’s service provider. Google Analytics defines the service provider dimension as the internet service provider (ISP) registered to the IP address of the visitor. Many of the largest corporations operate as their own ISP. In Figure 10, we filter the service provider dimension for some of the largest organizations, and observe visits to JSFiddle.net URLs from those networks. The data obtained from Company X could then be used by bad actors to view or scrape the contents of JSFiddle URLs visited by targeted corporations.

Figure 10. Redacted JSFiddle.net URLs filtered by service provider dimension (ISP).

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.2 Data source type: PII embedded in URL

Insecure practices such as embedding PII in the URL still occurs within many organizations. When browser users with a Party Y extension visit pages with PII in the URL, the data can be viewed by Company X members. A cybercriminal can use such data to find a person’s GPS coordinates in near real-time, commit burglary during a homeowner’s absence, or to stalk travelers along their destinations.

2.2.1 Southwest Airlines, American Airlines, and United Airlines

Southwest.com, United.com, and AA.com (American) all include passenger last names and record locator (confirmation) numbers within the URL. In addition, Southwest.com URLs may also include a passenger’s first name. On February 6, 2019, Wandera’s threat research team reported that many airlines, including Southwest Airlines, expose PII in their URLs. During our investigation in early June of 2019, Company X data showed thousands of Southwest URLs containing passenger first names, last names, and record locator numbers within the URL (see Figure 11).

Note: After disclosure, Southwest confirmed they are aware of this issue and have enhancements underway.

Figure 11. Southwest.com URL data containing values for passengerFirstName, passengerLastName, and confirmationNumber.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.2.2 Uber

Uber customers can book rides in a web browser via the m.uber.com website. In doing so, customers with a Party Y browser extension may expose their pickup and drop-off locations to cybercriminals, who could then use such information to stalk riders using their GPS coordinates in near real-time (see Figure 12).

Figure 12. Uber.com URL data containing customer pickup and drop-off locations.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.2.3 Apple example III

In Figure 13, we observe URLs containing email addresses from the iforgot.apple.com hostname. Such email addresses can be cross-referenced with hacked databases located on the dark web to exploit vulnerable Apple users.

Figure 13. iforgot.apple.com page URLs containing email addresses.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.3 Data source type: Page titles

To protect corporate assets, security-conscious businesses enforce user authentication to access company documents and portals; however, such businesses may ignore the content of data stored within the page title, which may contain descriptive confidential information. For example, “Firewall needs to be re-enabled. All ports are currently open” or “reset VPN password to [REDACTED-PASSWORD].” The near real-time exposure of such information creates opportunities for cybercriminals to attack individuals or organizations.

2.3.1 Atlassian

Atlassian is an enterprise software company that provides project management and document collaboration software to its clients. Atlassian’s project management software, JIRA, inserts project issues and employee-assigned tasks into the page title. Atlassian customers have the option to self-host Atlassian software or use the Atlassian Cloud platform. Companies that choose the Atlassian Cloud option can host their project management portal on a subdomain of atlassian.net (i.e., example.atlassian.net) [6]. When a Company X member requests data for atlassian.net, the data includes all subdomains under the domain. Within two hours of requesting Atlassian.net data via the Company X service, we observed the collection of 30,000 unique URLs on more than 1,400 atlassian.net subdomains. This included page title data from the following atlassian.net subdomains: BuzzFeed, NBCdigital, AlienVault, CardinalHealth, TMobile, Reddit, and UnderArmour. Users of these subdomains may have a Party Y extension leaking corporate page title and URL data, as shown in the following examples.

Example: nbcdigital.atlassian.net page titles include:

“Technical failure for <5 minute period of time affecting public save submissions or technical issues before Save period starts that cannot be resolved before the Save period opens – [REDACTED] – Confluence”

“Wrong contestant name(s) and/or image(s) displaying online, in the viz, or in the official app”

Example page title from an unnamed automotive company:

“Whitelist IP [RED.ACT.ED.IP] and [RED.ACT.ED.IP] for all AWS servers.”

From such information, a cybercriminal would know that the two IP addresses or server(s) have unfettered access to the entire company’s AWS servers. The cybercriminal can use such knowledge to focus all of his or her efforts on breaking into the server(s) associated with those IPs.

2.3.2 Microsoft

OneDrive is a file- and photo-sharing hosting service operated by Microsoft. Users of OneDrive often share their documents with friends or colleagues via a publicly accessible unique link. OneDrive inserts the titles of documents or filenames into the browser page title. The Party Y extensions capture the URL and page title data and deliver it to a Company X customer requesting data on 1drv.ms. Using Google Analytics advanced filter tools, metadata such as page titles are searchable. Figure 14 illustrates a page title search query for the word “tax.” The redacted results include the URLs to page titles such as “2018 Tax Return [REDACTED],” where the redacted text is often an individual’s name. A cybercriminal can access these documents and use the data in them to steal identities and/or view financial income, bank account numbers, social security numbers, and other confidential information.

Figure 14. Microsoft OneDrive page title search query for “tax” under the 1drv.ms domain.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.3.3 Kareo

Kareo provides healthcare providers with electronic health record (EHR) management via Kareo’s practice management software. Healthcare providers that use Kareo can access patient information via a web browser. Customers of Company X requesting data on Kareo.com may use Google Analytics to find patient names. Figure 15 lists redacted patient names obtained from the Company X service. Furthermore, the figure illustrates many patient names being exposed from a single city. Over a four-day period, we observed 193 unique page titles from Hunters Creek, Florida. We observed the service provider (ISP/ASN/OrgName) value for all entries originating from Hunters Creek as being listed from a single service provider: [REDACTED-CLINIC-NAME] – [REDACTED]. Moreover, we observed all page title entries associated with the visits originating from Windows 7 running Chrome. The metadata led us to suspect the patient-name leak originated from one or more staff member(s) at the unnamed clinic.

Note: After our disclosure to Kareo, Kareo deployed a product update to remove patient names from page titles in the Kareo platform.

Figure 15. Company X obtained data for the Kareo.com domain name.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.3.4 DrChrono

DrChrono provides an online platform for doctors and patients that makes EHRs available digitally [7]. Healthcare staff members can access such software from their offices.

Customers of Company X requesting data on DrChrono.com may use Google Analytics to find patient and medication names. Figure 16 displays page title data such as “RefillRequest for [REDACTED-NAME].” In the right-hand column, many of the redacted subdomains of the hostname contain the names of various health clinics and doctors’ offices. Using collected metadata such as timestamp, browser, OS, ISP, and hostname, the metadata may be used to follow the visitor’s browser activity path and infer from this which patient is associated with which medication.

Figure 16. Company X data for the drchrono.com domain name provided page title data in the following format “Lab Results for JOHN DOE.”

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.3.5 Blue Origin

Blue Origin is a private aerospace company founded by Amazon’s CEO, Jeff Bezos. We observed the following page title data within a Google Analytics account populated by Company X and filtered for the domain name, blueorigin.com. We observed a blueorigin.com subdomain with URL and page title data that follow patterns similar to those of Atlassian’s JIRA software (e.g., /browse/[PROJECT-KEY]-[PROJECT-ID]). Due to the sensitive nature, we have decided against publishing a screenshot. Moreover, we have removed the project-name prefixes (commonly seen with JIRA issues) and suffixes which accompany the page titles.

Blue Origin page title examples:

“manifold [REDACTED] contaminated with [REDACTED] material”

“[REDACTED-COMPETITOR-NAME]”

“Rust found in [REDACTED] manifold”

“speed sensor broke during [REDACTED]”

“[REDACTED] Failed Calibration”

“Verify [REDACTED] can survive degrease and clean operations”

2.3.6 FireEye

FireEye is a publicly-traded cybersecurity company with offices around the world. We observed the following page title data within a Google Analytics account populated by Company X and filtered for the domain name, fireeye.com. The page titles contained the phrase “JIRA” or originated from fireye.com hostnames containing that phrase. We observed the service provider (ISP/ASN/OrgName) of the visitor(s) as “fireeye inc.” The city and region metadata components match a city of one of FireEye’s office locations [8]. This leads us to suspect that a user (or users) using the fireeye inc network is impacted by the DataSpii leak.

We have removed any category-name prefixes (commonly seen with JIRA issues) and suffixes which may accompany page titles. Due to the sensitive nature, we decided against publishing a screenshot.



The following are FireEye.com page titles collected from visitor(s) with an ISP listed as: fireeye inc.:

“URL with Malicious download missed detection [REDACTED-URL]”

“Large backup fails when [REDACTED]”

“As [REDACTED] engineer, I would like to run the [REDACTED]”

“Expose what Engines are running on the [REDACTED]”

2.4 Data source type: File attachments

Company X members can search, view, scrape, and save publicly accessible attachments leaked by users with Party Y extensions. This includes document, image, and video attachments from popular platforms such as Facebook Messenger and Zendesk. This vulnerability compromises the possible confidential and/or sensitive nature of the files being transmitted.

2.4.1 Facebook Messenger

The URLs to Facebook Messenger attachments may be captured by Party Y extensions (see Figure 17). In the course of our investigation, we sent a sample JPEG to a friend over Facebook Messenger. We observed the JPEG attachment being served over a publicly accessible fbcdn.net URL. While Facebook appears to append query strings that specify a time-limited window for access, Company X is able to provide these URLs in near real-time, thereby allowing access within Facebook’s time-limited window.

Figure 17. Search query for “tax” on 1drv.ms data collected over a 96-hour period.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.4.2 Zendesk

Many of the world’s largest corporations and organizations use Zendesk for their support ticketing system. The Zendesk website states that their customers include Uber, Airbnb, Mailchimp, GoFundMe, the American Civil Liberties Union (ACLU), and Dollar Shave Club [9]. Users who submit support tickets sometimes attach files containing confidential or personal information. Using an attachment we submitted to a Zendesk portal, we observed the storage of our attachment on the zdusercontent.com domain name. We observed that the URL to our attachment is publicly accessible. During the investigation of our own Zendesk ticket attachment, we observed that even when the token query parameter was excluded, the attachment still loaded in a remote browser. We discuss redacted query strings in Section 4.1. Figure 18 shows other attachment URLs captured on the zdusercontent.com domain name. Using Google Analytics filter tools, such ticket attachments can be filtered by referrer and narrowed to the companies from which they originate.

Note: Zendesk support instances have a setting which customers can enable to “Require authentication to download [attachments].” As such, the public accessibility of the zdusercontent.com URLs is dependent upon the setting the customers have in place. By default, it is not enabled.

Figure 18. URLs to Zendesk ticket attachments on the zdusercontent.com domain name.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.5 Data source type: Ephemeral links and authorization strings

In order to defend against malicious activity, businesses may protect publicly accessible URLs by using query string parameters that (a) set a time-limited window for access, (b) add an authorization string, or (c) use (a) and (b) together. However, the ability of Company X to deliver complete URLs to its customers in near real-time renders these security methods less effective, if not altogether ineffective.

2.5.1 Amazon Web Services (AWS) Simple Storage Service (S3) buckets

Per AWS S3 documentation: “By default, all S3 buckets are private and can be accessed only by users that are explicitly granted.” Administrators, however, may authenticate S3 access requests by using AWS S3 Query String Authentication. This form of authentication includes the following URL query string parameters, that can be captured by Party Y extensions:

The AWSAccessKeyId specifies, “the AWS secret access key used to sign the request and, indirectly, the identity of the developer making the request.”

The Signature specifies, “[t]he URL encoding of the Base64 encoding of the HMAC-SHA1 of StringToSign.” The encoding is a unique representation that includes an AWS secret access key ID that Amazon assigns to users when they sign up for Amazon Web Services. In other words, it ensures that the request a user is about to make is not modified in transit.

The Expires parameter specifies when the signature expires and is specified as a UNIX epoch timestamp. Any request received after the specified time will be rejected.

Shopify AWS S3 buckets

Customers of Company X requesting data on amazonaws.com can view URLs containing AWSAccessKeyId, Expires, and Signature query string parameters (explained in the previous subsection). The shopify.s3.amazonaws.com URL is one of many such examples (see Figure 19). A malicious Company X customer can view the captured data in near real-time and use it to access Shopify shipping labels and details about customer orders.

Figure 19. A Shopify AWS S3 bucket as an example of how authorization strings and ephemeral links are rendered ineffective by the DataSpii leak.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.5.2 Facebook Photos



Facebook users share private photos on the platform while making use of privacy settings to restrict access. Such Facebook photo privacy settings include Public, Friends, Custom, and Only Me. Once posted, the URLs to such photos may reside on the Facebook domain, fbcdn.net.

To illustrate this, we uploaded a Facebook photo and chose the Only Me photo privacy setting: URL to photo on fbcdn.net. We viewed this photo with a Party Y extension and observed the URL collection by the extension. Then, within one hour, we witnessed the delivery of the URL by Company X to our Google Analytics account (see Figure 20).

Figure 20. The fbcdn.net URL to our Facebook photo defined with an “Only me” privacy setting.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

As discussed in our results, our honeypot experiment determined that our URLs collected by Party Y extensions were visited by a third party almost immediately after collection. If a third party were to scrape and collect URLs to domains like fbcdn.net, it may enable them to use facial-recognition technology to process and identify individuals visible in the image. While we could only verify visits to our own honeypot domain, third-party viewing, processing, or scraping of such content has disastrous implications for privacy.

2.6 Data source type: API keys, secrets, and tokens

Party Y extensions can capture API authentication credentials, which poses especially vexing security issues. An API can offer software developers an easy way to integrate their software or app with another online platform. In order to serve their function, APIs often provide high-level access to an account. Malicious actors may use API keys and API secrets found on the Company X service to gain access to accounts on cryptocurrency exchanges or financial services, or to access sensitive information from any impacted corporation.

An unnamed, cloud-based website security service

An unnamed platform provides firewall protection services to businesses across the globe. The service’s web portal allows their customers to click on a link to perform firewall actions. Accessing such a link by a user with a Party Y browser extension can deliver the sensitive API key and API secret to a Company X customer requesting data on the unnamed service. A malicious Company X customer can then view customer API keys and secrets, bypass the website’s firewall, and pull the business’s logging data (see Figure 21). Unless use of the API is restricted by IP address, a bad actor can use the API data to wreak havoc on a corporation.

Figure 21. Customer API keys and secrets from unnamed cloud firewall provider on display.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

2.7 Data source type: Private LAN environments

For added security, corporations may host their most confidential data on a private intranet that is accessible only from within the organization. For convenience, corporations frequently host their intranet portals on a subdomain of their domain name (e.g., jira.example.com). When an employee with internet and intranet access has a Party Y extension and browses intranet portals on a company subdomain, the extension captures the private intranet browsing activity data before sending it to Party Y servers, which in turn deliver the data to any Company X customer requesting it on the corporation’s domain name.

Many of the page titles on the intranet subdomains include what appear to be employee tasks and issues. Data provided by Company X can provide competitors, state-sponsored actors, and cybercriminals with proprietary secrets, information about existing cybersecurity vulnerabilities, and other sensitive corporate data such as internal health memos.

We employed three prongs to identify corporations impacted by this data type. First, we used readily available DNS tools to ascertain whether the hostname for a data point was publicly resolvable. Second, we ascertained the visitor’s internet service provider (ISP). Many of the impacted Fortune 500 companies act as their own ISP with their own autonomous system number (ASN). Third, we relied on city and region metadata to ascertain if the visitor was near an impacted company’s place of operation.

We ascertained whether the three conditions listed below occurred. If at least two of the conditions occurred (condition 1 being required), we then presumed that the page visit was from a LAN environment, suggesting that a network user or staff member of the corporation may have a Party Y browser extension:

the hostname is not publicly resolvable or the hostname points to an IP in the known private LAN IP range (i.e., 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16) the Service Provider (ISP/ASN/OrgName) value matches the name of the company; the City and Region metadata show the visitor is near or in the city of an office location of the impacted company.

We denote each page title with one of the following notations:

‖ All three conditions occurred.

¦ Conditions 1 and 3 are confirmed. We use this notation to acknowledge that not all companies operate their own ASN. In addition, some users may use a VPN to access a corporate LAN while maintaining a wide-area-network (WAN) IP via their local ISP, such as Comcast.

§ Conditions 1 and 2 are confirmed, and the City and Region metadata were missing.

Due to the sensitive page title data, we do not provide screenshots but instead provide the redacted examples in text. The prefixes and suffixes that identify the category or task, like those used by JIRA, have been removed as well. We raise this note to make readers aware that most page titles belong to a subcategory of the removed category prefix under which those page titles were raised.

The following page titles were obtained from Company X, and we identify the pages as originating from a private LAN environment:

2.7.1 Health technology companies, including AthenaHealth and Epic Systems, Inc.

‖ “Vaccines not being marked as completed once administered”

‖ “Case Delays due to Emergent Case after Patient is In Room”

‖ “PRN Medications administered multiple times”

‖ ”Giving OR Nurses Access to [REDACTED]”

‖ “Timed PT Services are Displayed and Calculated Incorrectly on The Billing Tab”

‖ ”Newborns not showing up in [REDACTED]”

2.7.2 Pharmaceutical companies, including Amgen, Merck, Pfizer, and Roche

‖ “[REDACTED] Samples were approved without approving the test results”

§ “URGENT Missing Samples [REDACTED]”

§ “Revise Sample Limit of [REDACTED] 5mg to [REDACTED] Packs per person/year”

‖ “[REDACTED] Aliquotting Issue prevents [REDACTED] from fully documenting [REDACTED]”

‖ “Personnel plates were not marked as sampled”

‖ “User Approved own entered results”

‖ “We would like to use Jira as a ticketing system for handling request based on privacy regulations globally”

§ “MR Review Single [REDACTED-DRUG-NAME] [REDACTED]”

‖ “Samples are showing Out of Spec even when the results are In spec”

‖ “[REDACTED] admin disable api key”

‖ “Need the Parent [REDACTED] ID visible when attempting to match customers, see attachment for more detail”

‖ “Update library dependencies per [REDACTED] vulnerabilities”

2.7.3 Kaiser Permanente

‖ “Unpatched Library/Software: [REDACTED] Risk-High – [REDACTED] Jira”

‖ “[REDACTED] Insufficient [REDACTED-SECURITY-ISSUE]

2.7.4 Unnamed airplane manufacturer

‖ “MFA access does not work through [REDACTED].”

Note: MFA is an acronym often used for multifactor authentication. Bad actors could act on such near real-time information to gain access to systems.

2.7.5 SpaceX

‖ “[REDACTED] Propulsion Failures”

‖ “[REDACTED] Vent Pressure [REDACTED] Displacement”

‖ “Dragon 2 Propulsion Initialization [REDACTED]”

2.7.6 Tesla

‖ “Investigate [REDACTED] Raven front drivetrain [REDACTED] at [REDACTED] MPH and also [REDACTED]”

‖ “Model [REDACTED] Significant white oxidation along the [REDACTED]”

‖ “[REDACTED VIN#] Driving at speeds of approx 5 to 25 mph there is a loud whining sound from the rear drive unit. Noticed that when checking oil level that [REDACTED]”

‖ “Periodic clunking noises while supercharging Model [REDACTED]”

‖ “Bring back [REDACTED] trims for Model [REDACTED] features pages in the Performance section”

2.7.7 Apple

‖ “Issue where [REDACTED] and [REDACTED] field are getting updated in response of story and collection update APIs by [REDACTED]”

‖ “New hit test API & [REDACTED] concept”

‖ “HomePod Settings (New Proposal) [REDACTED]”

‖ “scripts for testing VIO accuracy amoung [sic] various devices”

2.7.8 Cybersecurity firms, including Symantec, Trend Micro, and Palo Alto Networks

‖ “[REDACTED] Production Security vulnerabilities detected in [REDACTED]”

‖ “[REDACTED] does not have [REDACTED] agents deployed to enable security monitoring”

¦ “[REDACTED-TIER] – [REDACTED-MAJOR-BANK] – IDS Sensor [REDACTED] not displaying [REDACTED]”

¦ “[REDACTED-TIER] – [REDACTED-MAJOR-RETAIL-CORP] – Evaluate false positive for [REDACTED]”

¦ “[REDACTED-TIER] – [REDACTED-MAJOR-TELECOM-CORP] – SMS only providing limited Log History”

‖ “[REDACTED] Servers reaching maximum capacity and processing capabilities”

¦ “Reputation filter ignoring IP exceptions”

¦ “[REDACTED] fails to activate package when large number of malware references are present on a filter”

‖ “IPS Performance degrades after enabling OOB”

Note: IPS is a common acronym for Intrusion Protection System; OOB is a common acronym for Out-of-Band.

2.8 Example attack scenario

One Fortune 500 company’s private LAN data included the following page titles:

“Add client [REDACTED-LAN-IP-ADDRESS] to NFS export list on [REDACTED-SERVER-NAME]”

“Supermicro PSUs are bad”

“Please install latest CentOS on [REDACTED]”

“Snapshot strategy: Moving from 30 dailies, to 5 dailies, and 2 weeklies”

“Need [REDACTED-FOLDER] exported for NFS for [REDACTED-LARGE-LAN-IP-BLOCK]”

A bad actor’s access to such data will inform him or her of the following:

(1) the Fortune 500 company’s disaster recovery strategy; (2) the brand of hardware in use; (3) the IP address to a server that has access to an NFS storage unit; (4) the company’s preferred Linux distribution; and (5) a large LAN IP block with full access to a folder.

Such data can be used by bad actors to pinpoint attack vectors and plan a large-scale breach.

3. Methodology

To identify the extensions involved in data collection, we began our analysis by reviewing the first reported extension: Hover Zoom. In order to view the data captured by the extension, we analyzed the extension in an isolated environment. We used this controlled environment to verify whether the data was being collected by the extension, to decode and decompress the data, and to verify whether it was accessible via the Company X service.

3.1 Extension analysis environment

In order to set up an environment with minimal interference, we employed the following:

Installation of a clean operating system (OS). Personal computers can be contaminated and exposed to malware in a variety of ways. OS software obtained from third-party providers may contain malware [10]. To decrease such risks, we provisioned a Windows Server 2016 virtual machine by using an OS ISO downloaded directly from microsoft.com. Clean IP. To rule out interference from ad networks or other third parties having obtained data from an IP address, we allocated to the server an IP address that had not been used for at least five years before this experiment. Browser. For each browser tested, we installed the browser by downloading the installation file from the browser vendor’s website. Burp Suite. Burp Suite, a web application security tool downloaded from portswigger.net, was used for recon and analysis. The app acts as an intermediary between the browser and destination web server, allowing inspection and analysis of web traffic. Burp Suite records all requests from the moment of extension installation and continues to do so as long as the application remains open. FoxyProxy Standard Chrome Extension. This extension proxies Chrome requests to Burp Suite. After vetting FoxyProxy, we used this extension in our Windows OS testing environment. We chose FoxyProxy instead of the native Windows proxy service because the native service tended to add unnecessary traffic bulk to our Burp Suite data. We also verified our results using other operating systems such as Chrome OS running on a Chromebook, and in some instances without using FoxyProxy. Suspected extension. We installed each browser extension in question using the resource from which it is most likely to be downloaded by everyday users (e.g., the Chrome Web Store). However, the invasive behavior of the SaveFrom.net Helper Firefox extension only occurred when downloaded from the extension author’s website, en.savefrom.net. In addition, we downloaded Firefox’s FairShare Unlock extension from https://unlock.fairsharelabs.com/install/.

We used Chrome, Chromium, and Firefox to verify that the leak can occur on macOS, Ubuntu, Windows, and Chrome OS (the OS used on Chromebooks).

3.2 Tools for analysis

For analysis, we used Burp Suite as well as Chrome’s Development Tools; specifically, we used “Inspect views background page” for each extension. We inspected the background page to monitor live events and network traffic requests initiated by the extension. We reviewed historical and live events via Burp Suite. We noted the timestamp, hostname, URL query strings, HTTP status, response length, the raw requests and responses for each request, and the sequence of the requests. We then searched for temporal patterns and anomalous requests such as an unusually long response length that occurred while the browser was idle. Such high response lengths may indicate either the download of an extension’s data collection instruction set or the more commonly observed automated extension update.

In some cases, the raw responses of requests contained minified source code, making it difficult to read. We used readily available unminification tools (e.g., Atom Beautify, unminify.com, beautifier.io) to translate the code into a more readable format. For additional extension source code analysis, we used Rob Wu’s CRX Viewer. We informed ourselves with actionable intelligence provided by Duo Security’s CRXcavator. We used PacketTotal’s PCAP historical analysis database to perform research on the hostnames that we uncovered via Burp Suite. For the historical WHOIS and hosting information of those hostnames, we used DomainTools.com. For greater hosting and DNS insight, we further assessed the hostnames and IPs using Farsight Security’s historical DNS database. We also performed a quantitative text and source code analysis of each extension author’s website, including the extension’s terms of service and privacy policy pages. In order to find similar sites and extensions, we used search engine tools such as PublicWWW.com to perform reverse source code lookups.

4. Results

4.1 Observations

The sync expansion

In order to provide a consistent experience across all devices, Chrome’s Sync function automatically syncs bookmarks, history, settings, and browser extensions across all computers logged into Chrome using a Gmail account. Consequently, an invasive extension combined with the browser’s syncing behavior enables data collection from all logged-in locations (e.g., home and work computers).

HTTPS and HTTP environments

The browser’s powerful API functions allow extensions to tap into browser activity. When an invasive extension is installed within the browser, it can access page content — regardless of whether the site being visited is listed as secure (i.e., it uses HTTPS) or insecure (i.e., it does not use HTTPS). The extension may then collect data and send it off to a server for processing.

Redaction methods

Company X-provided page URL data of AA.com (American Airlines) displayed redacted last names with “*****”. During our analysis of Hover Zoom’s POST requests in Burp Suite, we observed redaction of common query string parameters such as “lastName.” However, passenger names from Company X-provided URL data of Southwest.com (Southwest Airlines) were unredacted. Our observations suggest that this is due to the query string parameter (passengerLastName) employed by Southwest’s web developers. Hover Zoom exhibited redaction capabilities for obvious query string parameters; however, we did not observe such behavior for variations of such parameters.

During our analysis of SuperZoom’s POST requests, we did not observe any redaction capabilities. As a result, query string parameters such as lastname, firstname, username, password, ssn, and apikey, were not redacted, and were visible in data delivered to customers of Company X (see Figures 22 and 23).

Figure 22. Company X-provided data on our page visits using a browser with the SuperZoom extension. Visible are the unredacted query strings values for lastname, username, ssn, cc, password, and apikey. Using Burp Suite, we observed the unredacted data in SuperZoom’s POST request to cr-input.superzoom.net.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

Figure 23. Company X-provided data for our own domain.

We did, however, note that some redaction was performed post-collection. Using a browser with a Party Y extension, we submitted a URL with the parameter “[email protected]” In each instance, regardless of whether the extension performed redaction of the email address, the address was delivered to us by Company X with the redaction

4.1.1 POST request patterns

We observed all eight extensions perform POST requests to hostnames that resolve to two groups of IPs. The first group, identified as Group A, consists of five hostnames that were observed from four extensions. The five hostnames from Group A resolve to the same two IP addresses and use a subdomain of the following names: “cr-b” or “ff-b.” The second group, identified as Group B, consists of eight hostnames that all resolve to the same five IP addresses. IP Group B hostnames use subdomains of the following patterns: “cr-input” or “ff-input.” We observe that all invasive Chrome extensions send their requests to hostnames that begin with “cr.” All invasive Firefox extensions send their requests to hostnames that begin with “ff.”

4.1.2 Hostnames

Using the available toolsets, we uncovered Party Y hostnames being used in data collection for Company X (see Table 3).

Table 3. GET and POST request Hostnames used by Party Y extensions.

Group A Group B cr-b.hvrzm.com

cr-b.getspeakit.com

cr-b.ebehaviors.com

cr-b.panelmeasurement.com

ff-b.ebehaviors.com cr-input.hvrzm.com

cr-input.getspeakit.com

cr-input.superzoom.net

cr-input.ebehaviors.com

cr-input.panelmeasurement.com

ff-input.ebehaviors.com

ff-input.mxpnl.net

ff-input.superzoom.net

Each Group A hostname has two DNS A records that resolve to 34.195.36.59 and 52.200.192.173.

Each Group B hostname has five DNS A records that resolve to 52.44.145.169, 34.231.181.237, 34.237.105.7, 52.204.245.185, 54.160.162.145, and 52.72.99.240.

We observed eight Party Y extensions making GET or POST requests to one or more of the Table 3 hostnames.

We uncovered other hostnames, as well; however, we were unable to link them to any extension (see Table 4). These hostnames use DNS A records that point to the same DNS A records as those discussed in Table 3’s respective Group A and B hostnames. It is possible these extensions were already flagged or removed from browser vendor stores.

Table 4. Hostnames not yet identified with any particular extension.

Unidentified Group A hostnames Unidentified Group B hostnames cr-b.prestadb.net

networkanalytics.net

b.networkanalytics.net cr-input.prestadb.net

ff-input.prestadb.net

input.networkanalytics.net

cr-input.nodehop.com

ff-input.nodehop.com

cr-input.mxpnl.net

cr-input.freevideodownloader.net

ff-input.freevideodownloader.net

4.1.3 Text analysis

Notwithstanding the SaveFrom.net Helper extension, the privacy policies of the other five extensions are nearly identical. They share the same verbiage. FairShare Labs, PanelMeasurement, and Super Zoom listed the same business address on the extension author website, extension Chrome web store page, or extension privacy policy page, respectively. Links to the policies can be found in the appendices.

4.1.4 Data collection

To simplify the results for the body of our report, we analyzed a single extension, Hover Zoom. Figures and analyses for the other five extensions can be found in the appendices.

Immediately after the installation of Hover Zoom, we observed a GET request to cr-b.hvrzm.com containing query strings noting a unique browser ID, the extension’s installation time, install version, current version, and Hover Zoom’s Chrome extension ID. Using screenshots from Burp Suite, we present these requests in Figures 24 through 34. We identify the individual GET and POST requests using the request number located in the left-hand column of the screenshot (see Figure 24).

We observed the extension’s automated update to a new version, but, once again, we did not observe the collection of browsing activity data; however, 24 days after installation, we did observe a second automated update to the extension. Immediately following the second automated update, a GET request to cr-b.hvrzm.com was again performed with query strings noting the unique browser ID, installation time, install version (noted as two versions prior), current version, and Chrome extension ID (see Figure 25). One second later, another GET request was made, notably with a 156KB response payload. Following this request, all subsequent browser activity data was collected and sent via POST request to cr-input.hvrzm.com. Notably, the cr-input.hvrzm.com hostname does not appear in the original source code of the extension, nor does it appear in an installation on a new device. In Figure 26, we observe a unique reason query string parameter in the GET request. Figure 27 is the 156KB response payload to Figure 26’s GET request. We found 150KB of the original 156KB response payload stored in a file located in the Google Chrome profile’s File System folder, C:\Users\Administrator\AppData\Local\Google\Chrome\User Data\Default\File System\002\p\00\00000000. Upon examination of this payload, we observed minified JS data containing encoding and compression instructions, and we also observed the introduction of the cr-input.hvrzm.com hostname. We provide an unminified view of the 00000000 file in Figure 28.

Immediately following the download of the 156KB payload, a subsequent GET request was made noting a config statement in the URL (see Figure 29). In Figure 30, we observed the GET response to the config statement, which noted a batch_size of 10 and batch_max_wait of 600,000. Figure 31 illustrates how Hover Zoom waited until > 9 URLs had been collected before it made a POST request to cr-input.hvrzm.com. If fewer than 10 URLs were collected, it made a POST request with that data within 600,000 milliseconds (10 minutes). We browsed one page (request #12643) on the text.npr.org website. Precisely 10 minutes after request #12643, we observed POST request #12645 to cr-input.hvrzm.com containing a compressed and encoded payload.

Figure 24. GET request #113. On Feb 5, 2019 at 1:03 a.m., Hover Zoom was installed on a newly provisioned Windows Server 2016 Standard virtual machine with Google Chrome, Burp Suite, and the FoxyProxy Standard extension preinstalled. FoxyProxy Standard was used to proxy this information back to Burp Suite. Immediately upon Hover Zoom installation, we observed GET request #113 to cr-b.hvrzm.com with query string parameters notating the “installTime” and unique browser ID. After installation, we did not observe the collection of browser activity data until March 1, 2019.

Figure 25. GET request #2102. On March 1, 2019 at 8:00 a.m., a GET request was made to cr-b.hvrzm.com containing query strings noting the unique browser ID, original installTime recorded as a UNIX epoch timestamp, installVersion, and current extension version (currentVersion), as well as the Chrome extension ID (cid). This request immediately followed an automatic update to the Hover Zoom extension (request #2101).

Figure 26. GET request #2103. The “reason” query string value within the request is noted as {“parts”:{}}.

Figure 27. Response to GET request #2103. On this windows VM, contents of the response were saved within the following file: C:\Users\Administrator\AppData\Local\Google\Chrome\User Data\Default\File System\002\p\00\00000000. For reference, a copy of the 00000000 file (dated March 1, 2019) can be downloaded here.

Figure 28. Unminified view of the ~150KB file, 00000000, found in the Chrome File System folder. Lines 1437 and 1438 of this unminified view introduce a new hostname, cr-input.hvrzm.com, not seen in the official source code of the Hover Zoom extension. The original source code may be accessed by viewing the files in the Chrome profile’s Extensions folder or by using Rob Wu’s CRX Viewer.

Figure 29. GET request #2104, which contains “config” as a query string field name within the URL path.

Figure 30. Response to GET request #2104. The response notes a batch_size of 10, and batch_max_wait of 600000.

Figure 31. Request #12642. On March 11, 2019, we logged into the VM to check for any changes. We browsed several sites, including 10 pages on text.npr.org. After we visited those pages, we observed a POST request to cr-input.hvrzm.com. The payload contained compressed and encoded data of our browsing activity. We have published our DataSpii decoder tool to decode any payload data: https://decoder.dataspii.com. In addition, precisely 10 minutes after request #12643, we observed POST request #12645 to cr-input.hvrzm.com containing a compressed and encoded payload

To ascertain whether our browsing activity was accessible via Company X’s service, we used a browser with the actively data-collecting Hover Zoom extension. We then visited a domain under our own control, for which we had already requested analytics data via the Company X dashboard. We visited a unique URL on our domain. See Figure 32 (request is highlighted in red) for the POST request containing our browsing activity.

Figure 32. Request #12598 (highlighted in red) contains POST request data of our browsing activity from GET request #12589 (highlighted in green).

Figure 33. Decoded content of POST request #12598. We decoded the data using our DataSpii decoder, https://decoder.dataspii.com. Further, we observed Hover Zoom’s redaction abilities.

Figure 34. The URL containing some redacted query string parameters from GET request #12589 (Fig. 32) is visible in the Google Analytics account that received its data from the Company X service.

Note: This is a screenshot of data within a Google Analytics account populated by Company X.

4.2 Obstacles

The two users we identified via the data collected from Company X had one extension in common: Hover Zoom. From the outset of our investigation of Hover Zoom, we encountered several obstacles to identifying the extensions and observing the data collection process. The obstacles included dilatory tactics, encoding and compression, prevention of crawling by search engine crawlers, and evolving behavior, as detailed below.

4.2.1 Dilatory observation for data collection

During the course of our investigation, we observed dilatory tactics by Hover Zoom and SpeakIt!. We observed this tactic on multiple OS platforms, on virtual machines (VMs) located in several geographic regions, and at various hosting providers that use different ASNs. We observed these extensions undergo two stages of updating before downloading a data collection instruction set.

Timeline 1

February 5, 2019: We installed (via the Chrome Web Store) the latest version of SpeakIt!, 0.3.10, on one VM and the latest version of Hover Zoom, 6.0.40, on another VM. Since we did not observe any browsing activity data collection at the time of installation, we surmised that either there was no data collection or that that was a quiescent period after which data collection could begin.

February 15, 2019: We observed each extension perform an automated Chrome extension update. Hover Zoom was updated to version 6.0.41, and SpeakIt! to version 0.3.11. We did not observe any collection of browsing activity data after the update.

March 1, 2019: We observed each extension perform an automated Chrome extension update. Hover Zoom was updated to version 6.0.42, and SpeakIt! to version 0.3.12.

March 1, 2019: Seconds after the update, we observed a GET request to cr-b.hvrzm.com (Hover Zoom) or cr-b.getspeakit.com (SpeakIt!), with a response payload containing a data collection instruction set. Following the GET request, all subsequent user browser activity data was collected and sent via a POST request to cr-input.hvrzm.com (Hover Zoom) or cr-input.getspeakit.com (SpeakIt!).

Timeline 2

May 22, 2019: We installed SpeakIt! version 0.3.21 (the latest version at the time) on a VM located in a different geographic region and at a different hosting provider.

June 1, 2019: SpeakIt! was automatically updated to version 0.3.22. After the update, we did not observe any browsing activity data collection.

June 15, 2019: We observed an automatic update to SpeakIt! version 0.3.23.

June 15, 2019: Seconds after the update, we observed a GET request to cr-b.getspeakit.com. This GET request’s response payload contained the data collection instruction set. Following this request, all subsequent user browser activity data was collected and sent via a POST request to cr-input.getspeakit.com.

We repeated this experiment six times, under numerous scenarios; each time we obtained the same result. In the past, similar tactics have been used to avoid detection of data collection. As of May 9, 2019, more than 2.29 million people use Hover Zoom and SpeakIt!.

4.2.2 Encoding, compression, and the DataSpii decoder

Encoding and compression technologies can be used by extensions to evade the detection of data collection. Such technologies can obfuscate data and limit bandwidth consumption. We observed the use of an LZ-based compression algorithm by Hover Zoom, SpeakIt!, FairShare Labs, and PanelMeasurement (see Table 5). Using Burp Suite and a Windows grep tool, we identified the response payload involved in encoding and compression. We identified a file in the Chrome profile’s File System folder containing the response payload.

To assist our analyses, we developed a tool for decoding and decompressing the POST request payloads from all Party Y browser extensions. The DataSpii decoder tool can be found at https://decoder.dataspii.com.

Table 5. Encoding and compression used for SpeakIt!’s POST requests to cr-input.getspeakit.com.

i = new Array(JSON.stringify(e)),

o = t.utils.utf16to8(JSON.stringify(i)),

n = t.compressor.deflate(o);

return a.encode(t.utils.bin2String(n))

4.2.3 Robots.txt

We observed a robots.txt file with instructions to specifically prevent search engines from crawling the terms of service and privacy policy pages on four out of the eight extension authors’ websites. We circumvented the obstacle by performing additional searches on other suspected matching data, such as the corporate street address listed for the extensions.

4.2.4 Evolution

The evolving behavior of Party Y extensions complicated our research. Over the last five months, we observed several newly downloaded data collection instruction sets containing new encoding and compression methods, further hindering our decoding process. These changes required us to modify our DataSpii decoder so we could analyze the new data collection instructions sets. On May 8, 2019, for instance, we observed Hover Zoom’s introduction of a new hostname, pnldsk.adclarity.com. We then observed Hover Zoom making encoded POST requests to this hostname using a new encoding method.

In addition, we initially observed a Party Y extension making POST requests for every 10 URL visits; weeks later, we observed the same extension making POST requests for every 50 URL visits. The extension modified its behavior when the response payload of a GET request increased a batch_size value from 10 to 50.

4.3 Honeypot experiment

We visited a web page on a URL of our own domain using a browser with a Party Y extension. To classify the URL as a unique, we appended query string parameters that specified the time, OS, browser, and Party Y extension. To minimize outside interference, we avoided the use of analytics snippets and third-party trackers in the page code (see Figure 35).

Figure 35. Source code of the samtesting.html page.

Within one hour of the collection, the unique URL appeared in the Google Analytics account populated by Company X. Within 3.5 hours of the collection, we observed a third party visiting the unique URL. Over the course of several weeks, we visited more than two dozen unique URLs on our control domain. We performed this test with each of the eight extensions. Within three to five hours of each visit, a third party visited the unique URLs collected on our domain. Except for large attachments, we observed this visitation pattern 100% of the time. The rapid third-party visitation of collected URLs compromises the security of publicly accessible URLs and ephemeral links. Such visitation enables third parties to process and scrape the page contents of the URL.

While visiting unique URLs using a browser with a Party Y extension, we observed third party visits from several IPs, which all originated from AWS netblocks. We used DomainTools.com and standard DNS lookup tools to verify that the IP’s forward and reverse DNS records match. Table 6 presents the IP addresses, ASN owner, and hostname. We then observed the IP addresses matching forward and reverse DNS records, all of which originated from the kontera.com domain.

Table 6: Third-party IPs visiting URLs collected by our browser using a Party Y extension.

IP Address Netblock Owner/ASN Matching forward and reverse DNS Entry 54.209.60.63 Amazon Web Services / AS14618 nat.aws.kontera.com 52.71.155.178 Amazon Web Services / AS14618 nat-service.aws.kontera.com 184.72.115.35 Amazon Web Services / AS14618 nat-service1.aws.kontera.com 54.175.74.27 Amazon Web Services / AS14618 nat-service3.aws.kontera.com 54.86.66.252 Amazon Web Services / AS14618 nat-service4.aws.kontera.com

Honeypot results

Test 1. We visited our unique URL using Chromium 74.0.3684.0 with the SuperZoom extension on macOS. Note: The server time was logged in GMT. However, in our URL, the “time” parameter value was written in Pacific Daylight Time. In addition, we acknowledge an error in the notation of the time zone. The time zone values were mistakenly written as PST, when it should have been written as PDT.

Our original visit:

OUR-REDACTED-IP – – [11/Mar/2019:20:50:06 +0000] “GET /samtesting.html?&os=mac&brow=crmium&v=74.0.3684.0&ext=SZ&date=mar112019&time=149pmpst&socsec=123004567&customerssn=123004567&lastname=doe&first=john&last=doe&password=mypass&p=anotherpass&apikey=XYZ HTTP/1.1” 200 198 “-” “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3684.0 Safari/537.36”

Approximately 4 hours later, an unknown AWS IP performed a GET request of the collected URL:

184.72.115.35 – – [12/Mar/2019:01:03:45 +0000] “GET /samtesting.html?&os=mac&brow=crmium&v=74.0.3684.0&ext=SZ&date=mar112019&time=149pmpst&socsec=123004567&customerssn=123004567&lastname=doe&first=john&last=doe&password=mypass&p=anotherpass&apikey=XYZ HTTP/1.1” 200 198 “-” “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.1.25 (KHTML, like Gecko) Version/8.0 Safari/600.1.25”

Test 2. We visited a unique URL using Firefox 65.0.1 with the SaveFrom.net Helper extension (installed via the extension author’s website) on macOS. Note: The server time was logged in GMT. However, in our URL, the “time” parameter value was written in Pacific Daylight Time. Further, we acknowledge an error in the notation of the time zone. The time zone values were mistakenly written as PST, when it should have been written as PDT.

Our original visit:

[OUR-REDACTED-IP] – – [11/Mar/2019:21:42:00 +0000] “GET /samtesting.html?&os=macosx10143&brow=ff&v=65.0.1&ext=SFfromsfhelpernet&date=mar112019&time=241pmpst&socsec=123004567&customerssn=123004567&lastname=doe&first=john&last=doe&password=mypass&p=anotherpass&apikey=XYZ HTTP/1.1” 200 198 “-” “Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0”

Approximately 3.5 hours later, an unknown AWS IP performed a GET request of the collected URL:

184.72.115.35 – – [12/Mar/2019:01:17:47 +0000] “GET /samtesting.html?&os=macosx10143&brow=ff&v=65.0.1&ext=SFfromsfhelpernet&date=mar112019&time=241pmpst&socsec=123004567&customerssn=123004567&lastname=doe&first=john&last=doe&password=mypass&p=anotherpass&apikey=XYZ HTTP/1.1” 200 198 “-” “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.1.25 (KHTML, like Gecko) Version/8.0 Safari/600.1.25”

Additional observations

Using a browser with a Party Y extension, we visited various sample file types, including zip and SQL database files. When visiting the zip file, the browser downloaded the file into the file system. It did not load them directly in the browser. As a result, we did not observe the transmission of the zip URL to a third-party hostname. However, the SQL files were loaded in the browser and the URL of our SQL files was transmitted to cr-input.hvrzm.com. Three hours after it was collected by the Party Y extension, we observed a third-party visit to our SQL file:

184.72.115.35 – – [18/May/2019:12:50:27 +0000] “GET /dataspii-sql-50000rows.sql HTTP/1.1” 200 4393501 “-” “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.1.25 (KHTML, like Gecko) Version/8.0 Safari/600.1.25”

As noted in the log entry above, the entirety of the file, 4393501 bytes, was requested and transmitted to the visitor’s IP: 184.72.115.35. Using a browser with a Party Y extension, we visited an SQL file that was 8MB in size, but we did not observe the expected third-party file request. Perhaps this was due to the larger file size.

4.3.1 Network map vulnerability — collection of LAN data

We set up a non-publicly resolvable website on a web server which contained three HTML pages and one CSS file. The primary index page and two additional HTML pages each include images with hyperlinks to the other two pages. Each page also references a CSS file. We placed these HTML pages on a web server accessible only via a private LAN. We browsed the website using the following method:

preconfigured the local computer’s hosts file with the non-publicly resolvable hostname alongside our local IP address, ensured the computer was connected to both the internet and private LAN, and used a browser that has a Party Y extension enabled.

In this experiment, we used the Hover Zoom Chrome extension. We visited two (non-publicly resolvable) URLs on our LAN, which we denote as URL 1 and URL 2. We monitored all connections by using Burp Suite or Chrome Development Tools’ “Inspect views background page” option for the Party Y extension. We then verified whether our data was retrievable using the Company X service. We denote URL 1 and URL 2 below:

URL 1

http://dataspii-lan-experiment-on-non-publicly-resolvable-hostname.[REDACTED].com/index.html?&os=win&brow=chrome&v=74.0.3729.131&ext=HZ&date=may162019&time=117ampst&lantest=yes&usinghostsfilefordns=yes&ispublic=no&ishostnamepubliclyresolveable=no&withoutfoxyproxy=yes

URL 2

http://dataspii-lan-experiment-on-non-publicly-resolvable-hostname.[REDACTED].com/corporate-passwords-stored-here.html

After visiting the URLs, we observed Hover Zoom making POST requests to three different hostnames:

cr-input.hvrzm.com

p.ymnx.co

pnldsk.adclarity.com

These POST requests occurred on ports 443 and 40443.

4.3.1.1 URL 1 and URL 2 sent to cr-input.hvrzm.com

URL 1 sent to cr-input.hvrzm.com

Using Chrome’s Development Tools’ inspect views background page feature, we observed the encoded POST request and decoded it with our DataSpii decoder tool (see Figure 36).

Figure 36. Decoded cr-input.hvrzm.com POST request containing browser activity data of website hosted on our private LAN. The website used a non-publicly resolvable hostname of a domain name we own.

The captured metadata included the site’s LAN IP address (10.10.33.5), page title, timestamp, and URL visited, as well as the URLs of the resources within the page, including CSS files, JS files, and images. The captured data included many elements of the HTTP header, including server type, content security policy, timestamp, access-control-allow-methods, access-control-allow-credentials, access-control-max-age, last-modified timestamp, and more (see Section 4.5.1 for more information).

URL 2 sent to cr-input.hvrzm.com

We visited URL 2 on our private LAN, observed the encoded POST request to cr-input.hvrzm.com using Burp Suite, and decoded it with the DataSpii decoder (see Figure 37).

Figure 37. Decoded cr-input.hvrzm.com POST request containing URL 2.

4.3.1.2 URL 2 sent to pnldsk.adclarity.com

We observed Hover Zoom sending an encoded POST request to a second hostname, pnldsk.adclarity.com. We decoded that request with our DataSpii decoder. The decoded request included values for panelistId, ticketId, the URL we visited on our private LAN, and several encoded values for content (see Figure 38). We subsequently ran one content value through a standard base64 decoder, which revealed the capture of hyperlinks that were coded within our web page. The content value also included the image source URL used for the hyperlink (see Figure 39). Such data could be used by bad actors to map LAN infrastructure.

Figure 38. Decoded POST request to pnldsk.adclarity.com. Base64-encoded values for content are visible. We decode the content in Figure 36.

Figure 39. Decoded base64 “content” value shown in Figure 38. The data contains an image source and hyperlink to a page within our private LAN environment. Such data was captured from our private LAN environment and sent via POST request to pnldsk.adclarity.com.

4.3.1.3 URL 2 sent to p.ymnx.co

We observed Hover Zoom make a third POST request to p.ymnx.co. The decoded request contained the URL that was visited on our LAN as well as the user agent (UA) of our browser (see Figure 40).

Figure 40. Decoded POST request data sent to p.ymnx.co.

4.4 Quantification

Quantifying Party Y’s collection of PII and CI and their dissemination by both Party Y and Company X warrants further investigation. We do not know the number of Company X members or the domains for which they requested data. In addition, the total number of data points collected or disseminated is difficult to ascertain. Company X does not include historical data to their customers. Company X provides the real-time data as it is being captured (with a one-hour delay) to a member’s Google Analytics account. The data is then saved, collated, and made available to the Company X member. In addition, Company X members can log into a dashboard and share the data with anyone who has a Gmail account.

Second, data collection companies are in the business of selling data. Our investigation did not set out to examine links between Party Y’s business transactions or legal relationships with other companies. However, as mentioned in Section 4.3, we observed third parties visiting the unique URLs captured by Party Y extensions. Such capability is dangerous to global security.

4.4.1 Google Analytics hit data

Before March of 2019, customers of Company X that requested the data of a domain could see the number of hits made to it. Using the data we requested for fbcdn.net as an example, over a 30-day period we observed 60,267,694 hits to that domain. A Google Analytics hit is defined as “[a]n interaction that results in data being sent to Analytics. Common hit types include page tracking hits, event tracking hits, and ecommerce hits” [11].

In addition, over that same 30-day period, we recorded the number of Google Analytics hits from 28 domains via the Company X service. The combined hit count was over 246 million. For reference, 3 of the 28 domains are ranked in the Alexa Top 100, 17 of the 28 domains in the Top 5000, and the remaining 8 are outside of the top 5000. We estimate that within one year, the hit count of these 28 domains would be in the billions.

4.4.2 Sample PII collection stats

Using an unnamed airline as an example, a 28-day period in March of 2019 yielded nearly two million Google Analytics property hits for that airline. During that period, we observed 175,000+ unique URLs, of which 8,500 URLs contained passenger names, yielding an average of 304 unique URLs of passenger names per day. Using this number as a daily average, we estimate an exposure of over 110,000 unique URLs containing passenger names per year.

Note: These are not scientific measurements. We acknowledge that there are many variables, such as: a Party Y extension’s active installation count during any period, evolving extension behavior, passenger traffic patterns over the course of the year, etc. In addition, the addition or removal of one individual (e.g., a travel agent) with a Party Y extension, could greatly affect these results.

4.4.3 Party Y redaction efforts

We recognize that Hover Zoom, SpeakIt!, PanelMeasurement, and FairShare Unlock have made some effort to redact PII during data collection. We observed redaction measures for query string parameters such as firstName and lastName; however, the measures did not cover all variations, such as first and last (see Figure 34). For example, Southwest.com uses the query string parameter, passengerLastName. We observed unredacted last names in all URLs containing the passengerLastName parameter in Southwest.com traffic data obtained from Company X.

Furthermore, the redaction abilities did not include employee tasks and other CI from page titles, nor did any Party Y extension exclude data collected from LAN environments. Moreover, we did not observe redaction of any PII or CI by SuperZoom and SaveFrom.net Helper. We noted a single exception: email addresses. For example, we observed the collection of unredacted email addresses by SuperZoom; however, the email addresses appeared in redacted format on the Company X platform. We presume the redaction of email addresses occurred by a third-party processor or by Company X.

4.5 Effects

4.5.1 Private LAN Blueprint

When corporate staff use extensions such as Hover Zoom while browsing pages on their LAN, the extensions may be acting as real-time private LAN crawlers. As seen in Figures 36 through 39, we discovered Hover Zoom’s ability to capture LAN IPs, hostnames, web server technology data, and even the hyperlinks embedded within our page. We then observed Company X disseminating LAN data to its members in near real-time. (Note that whereas Hover Zoom appears to collect many types of metadata, Company X does not appear to provide all the metadata types [e.g., last-modified] to its customers.) Armed with such data, a bad actor could create a detailed network topology of an organization’s private infrastructure. In addition, the collation of such data facilitates the identification of internal disaster recovery systems, firewalls, routers, switches, and web server technologies. Such detailed information can, in turn, be used to monitor and exploit vulnerabilities.

4.5.2 Browser vendor’s remote deactivation did not stop collection

After reporting a Party Y extension to a browser vendor, the vendor remotely disabled the extension for all users; however, that did not stop the data collection. While the extension ceased its functionality, we continued to observe our browsing activity being sent via POST request to Party Y servers. The data collection stopped when we removed the extension. Moreover, a second report to a browser vendor resulted in the extension’s removal from the extension store; however, the extension was not remotely deactivated. It remained active and working in our test environment.

4.6 Courses of action

4.6.1 Preemptive removal of PII, API keys, and confidential information from metadata

The continuous evolution of data collecting extensions, weak browser vendor policies, and the general complacency about publicly accessible unique URLs have engendered the DataSpii leak. Data collection via privacy-violating browser extensions has been a well-documented issue for nearly two decades [12]. Party Y extensions are a few of many privacy-violating extensions. While Google and Mozilla may decide to improve their extension-vetting processes or increase their browser protections, browser extensions are merely one of many avenues where data collection occurs. In addition, data-mining firms will continue to collect data via other platforms (e.g., mobile).

We recommend that developers, corporations, and cybersecurity professionals act preemptively to prevent the leaking of PII and CI via metadata. This includes removing PII, CI, and confidential material within metadata such as URLs. We propose that companies further protect their APIs by restricting access to whitelisted IP addresses. In addition, file sharing services such as Dropbox should increase protections by expanding their premium features such as file expiration into their free tier. The restriction of such features to only the premium tier poses global risks.

The companies impacted by the capture of LAN data, as listed in this report, may want to consider changing their local environments by, for instance, modifying LAN network IP ranges or subnets.

4.6.2 Firefox Send: Archetype of file sharing services

Mozilla’s Firefox Send is a file sharing service with enhanced security features that protect against leaks. Users of this service upload files and share the links with others. Unlike other file sharing services, Firefox Send, by default, sets URLs to expire in 24 hours and one download. In addition, Firefox Send does not insert the file or document name into metadata such as the page title. Such default measures protect users against the DataSpii leak.

5. Discussion

5.1 Motivation

Our research on DataSpii arose out of sheer serendipity. While attempting to sign up with Company X’s well-known sister company, we were redirected to Company X’s site. We thought we were signing up with the sister company, yet we were, in reality, signing up with Company X. At first glance, Company X’s claims of viewing any domain’s traffic data seemed dubious. Instead of canceling our free trial with Company X, we instead opted to continue the trial to see where our observations would lead us. Within days of requesting data of a domain under our control, we recognized unique URLs that could only have been accessed by specific users. Since Company X states that it provides data only from desktop browsers, browser extensions and their access to privileged API features naturally emerged as the culprit.

Having associated unique URLs with users of domains under our control, we then asked three of the impacted users to provide us with a list of their browser extensions. Taken aback by the implications, the first user had immediately deleted all browser extensions and was, as a result, unable to identify the extension names. The other two users provided us with lists of their browser extensions, which revealed one common denominator: Hover Zoom. The first user was using Firefox on Ubuntu, the second was using Chrome on a Chromebook running Chrome OS, and the third was using Chrome on Windows. Despite Company X’s claim that its users have opted for their data to be collected, Hover Zoom users we spoke with were unaware of the collection or could not recall agreeing to it. The discrepancy between what the company claims and what the users experienced prompted our investigation.

5.2 Alternative theories

Initially, we theorized a sensor sitting at the edge of an ISP or corporate network. While a sensor could in theory capture unencrypted data, it would not be able to do so from websites enforcing the HTTPS protocol. However, similar to our methodology using Burp Suite, network monitoring systems can allow enterprises to monitor HTTPS traffic via Certificate Authority (CA) certificates as deployed by group policy. Considering that Company X was able to present encrypted data outside of corporate environments, this was beyond the capability afforded by the use of a sensor. This technology ignores Company X’s ability to provide encrypted data that originated outside of corporate environments. Barring a catastrophic vulnerability in encryption protocols, a more plausible explanation would be that local software was capturing browsing activity data. While we cannot rule out additional forms of collection, the most logical explanation was the privileged API features afforded to browser extensions.

Second, we reviewed the methods used by Party Y and Company X to ascertain whether our findings could be spurious. In theory, we acknowledge it is possible to submit invalid data to Group B hostnames such as cr-input.hvrzm.com, which data could then be delivered to members of Company X. In addition, an IP address’s netblock owner could be modified to any owner name by modifying the ASN or OrgName records at the regional internet registry (RIR) (e.g., ARIN).

We believe such orchestration is unlikely, for several reasons. First, any practical gains would be minimal compared to the effort required to fabricate a data collection scenario like DataSpii. The amount of resources required to achieve such a feat only to sell the data at a nominal cost would be illogical. Furthermore, the orchestrator would jeopardize profit. Secondly, even if it were practical, no corporation would want to invite the scrutiny such orchestration would entail. Third, it is unprecedented. If we assume the orchestration occurred, there would be many technical obstacles. An orchestrator would have to fabricate employee tasks, LAN hostnames, and page URLs. The orchestrator would then have had to identify and associate service providers and their IP addresses with fabricated visitors. The orchestrator would concoct extensions as a ruse only to sell the data at nominal fees. For these reasons and others, we believe that DataSpii was a bonafide leak on a catastrophic scale.

5.3 Related work

Other investigations into browser extensions that collect data have been recently published [13,14]. In one such investigation, Weissbacher et al. employed a novel prototype, Ex-Ray, to identify privacy-violating extensions on a large scale. In the process, they uncovered the extension SpeakIt!, just as we did. The authors dismissed the likelihood of dormancy given the economic incentive for the immediate collection of data via privacy-violating extensions. Had the authors monitored the extensions as long as we did, they may well have replicated our findings of dormancy.

In another investigation, Kapravelos et al. used Hulk, a novel system comprising of HoneyPages and a fuzzer, to simulate the behavior of extensions [15]. During the course of our investigation, several Party Y extensions required 10 or 50 page visits before eliciting a POST request with the browser activity data. We observed the batch_size parameter in response to GET requests by repeated manual input, a process that systems like Hulk could obviate.

In 2018, Quan and Kapravelos employed a novel prototype called Mystique to uncover browser extension data leaks [16]. Of the 181,683 extensions analyzed in the study, 3,868 (2.13%) emerged as candidates for leaking sensitive data. Although a step in the right direction, approaches like Mystique may not be suited to uncovering leaks concealed by dilatory tactics. We propose that Mystique and Ex-Ray extend the period of monitoring, just as we did.

Inspired by these prior investigations, we developed the DataSpii decoder tool to decode and decompress the data collected by Party Y extensions. The DataSpii decoder enables instant analysis of the collected data, allowing us to identify the capture of sensitive private LAN data from Party Y extensions. To the best of our knowledge, we are the first to successfully employ metadata analyses to accurately identify impacted individuals and corporations.

6. Conclusion

Our investigation determined that Party Y extensions had been providing sensitive data to members of the Company X service. We made several key observations in this regard. First, browser extensions like SpeakIt! and Hover Zoom employed dilatory tactics. The collection of data began 24 days post installation. Second, third-party visits to the unique URLs collected by Party Y extensions occurred. Third, a Party Y extension captured detailed, private LAN data before sending it to multiple hostnames.

Based on our investigation, we offer several recommendations. First, we recommend further research using novel methods to replicate, qualify, or extend our findings. Second, we recommend that browser vendors review their extension policies. Third, we recommend that corporations enact stronger browser security policies. Fourth, we recommend that web developers remove PII and CI from metadata such as URLs.

Though prudent, short-term fixes will not ultimately protect data from threats such as DataSpii. True data security will require the sustained collaboration of web developers, cybersecurity professionals, marketers, and browser vendors. The implications of our investigation transcend any one extension, website, Fortune 500 company, browser, or OS.

DataSpii arose from hazardous assumptions about data security. It circumvented some of the best technological safeguards (e.g., authentication and encryption) against data leaks. Even the most responsible individuals proved vulnerable to DataSpii; with vast budgets and myriad experts on hand, even the largest cybersecurity corporations proved vulnerable to DataSpii. Our data is only as secure as those with whom we entrust it. It takes just one party to unwittingly leak another party’s data.

Born of serendipity, our investigation has carefully documented a catastrophic leak on an unprecedented scale. We hope this report is evaluated by experts in the field. Once that has been done, the real work can begin.

Acknowledgements

I would like to thank Bridget Vandenbosch for her editing, logo and site design, and tireless support and assistance over the last seven months. I would like to express my great appreciation to Dr. Michael Weissbacher for his valuable suggestions. I would also like to thank the tech editor who told me six months ago that my initial discovery alone was not enough. Finally, I would like to thank the Electronic Frontier Foundation for their legal assistance. This report would not have been possible without the help of the above.

References

[1] Schaub, Florian, Aditya Marella, Pranshu Kalvani, Blase Ur, Chao Pan, Emily Forney, and Lorrie Faith Cranor. “Watching them watching me: Browser extensions’ impact on user privacy awareness and concern.” In NDSS workshop on usable security. 2016.https://pdfs.semanticscholar.org/d5ae/e2be1abe0d672a205a0792adbae352c733f9.pdf.

[2] “Manage Events with Background Scripts.” Chrome: Developer. Accessed May 19, 2019. https://developer.chrome.com/extensions/background_pages.

[3] Bruce Schneier, Data and Goliath: The hidden battles to collect your data and control your world (New York: Norton, 2013).

[4] Council of the European Union, and European Parliament. “Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC (General Data Protection Regulation).” Official Website of the European Union. Accessed May 24, 2019. https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679.

[5] Sobers, Rob. “The Dangers of Shared Links.” Inside Out Security (June 11, 2013). Accessed June 6, 2019. https://www.varonis.com/blog/the-dangers-of-shared-links/.

[6] Atlassian Cloud Support. “Changing the cloud instance URL address or migrating to a new cloud instance.” (April 2019). https://confluence.atlassian.com/cloudkb/changing-the-cloud-instance-url-address-or-migrating-to-a-new-cloud-instance-691011835.html.

[7] “Practice Medicine, Not Administration.” DrChrono. Accessed May 21, 2019. https://www.drchrono.com/.

[8] “Contact Us.” FireEye. Accessed June 1, 2019. https://www.fireeye.com/company/contact-us.html.

[9] “Customers – What Companies Use to Service You.” Zendesk. Accessed May 19, 2019. https://www.zendesk.com/why-zendesk/customers/.

[10] Gantz, John F., Pavel Soper, Thomas Vavra, Lars Smith, Victor Lim, and Stephen Minton. “Unlicensed software and cybersecurity threats.” White Paper (2015). http://download.softwareculicenta.ro/raport-idc-2015-03-malware.pdf.

[11] “Hit – Analytics Help.” Google. Accessed June 12, 2019. https://support.google.com/analytics/answer/6086082?hl=en.

[12] Martin, David M., Richard M. Smith, Michael Brittain, Ivan Fetch, and Hailin Wu. “The privacy practices of Web browser extensions.” Communications of the ACM 44, no. 2 (2001): 45-50.

[13] Weissbacher, Michael, Enrico Mariconti, Guillermo Suarez-Tangil, Gianluca Stringhini, William Robertson, and Engin Kirda. “Ex-Ray: Detection of History-leaking Browser Extensions.” In Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 590-602. ACM, 2017.

[14] “Chrome Extensions – AKA Total Absence of Privacy.” Detectify Labs. Accessed June 12, 2019. https://labs.detectify.com/2015/11/19/chrome-extensions-aka-total-absence-of-privacy/.

[15] Kapravelos, Alexandros, Chris Grier, Neha Chachra, Christopher Kruegel, Giovanni Vigna, and Vern Paxson. “Hulk: Eliciting malicious behavior in browser extensions.” In 23rd {USENIX} Security Symposium ({USENIX} Security 14: 2014): 641-654.

[16] Chen, Quan, and Alexandros Kapravelos. “Mystique: Uncovering information leakage from browser extensions.” In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 1687-1700. ACM, 2018.

[17] “SpeakIt! – Text to Speech for Chrome.” n.d. Chrome Web Store. Google. Accessed May 19, 2019. https://chrome.google.com/webstore/detail/speakit/pgeolalilifpodheeocdmbhehgnkkbak?hl=en-US.

[18] “SuperZoom.” n.d. Chrome Web Store. Google. Accessed February 1, 2019. https://chrome.google.com/webstore/detail/super-zoom/gnamdgilanlgeeljfnckhboobddoahbl?hl=en

[19] “SaveFrom.net Help