Are you one of the allegedly 400 million users of Avast antivirus products? Then I have bad news for you: you are likely being spied upon. The culprit is the Avast Online Security extension that these products urge you to install in your browser for maximum protection.

But even if you didn’t install Avast Online Security yourself, it doesn’t mean that you aren’t affected. This isn’t obvious but Avast Secure Browser has Avast Online Security installed by default. It is hidden from the extension listing and cannot be uninstalled by regular means, its functionality apparently considered an integral part of the browser. Avast products promote this browser heavily, and it will also be used automatically in “Banking Mode.” Given that Avast bought AVG a few years ago, there is also a mostly identical AVG Secure Browser with the built-in AVG Online Security extension.

Summary of the findings

When Avast Online Security extension is active, it will request information about your visited websites from an Avast server. In the process, it will transmit data that allows reconstructing your entire web browsing history and much of your browsing behavior. The amount of data being sent goes far beyond what’s necessary for the extension to function, especially if you compare to competing solutions such as Google Safe Browsing.

Avast Privacy Policy covers this functionality and claims that it is necessary to provide the service. Storing the data is considered unproblematic due to anonymization (I disagree), and Avast doesn’t make any statements explaining just how long it holds on to it.

What is happening exactly?

Using browser’s developer tools you can look at an extension’s network traffic. If you do it with Avast Online Security, you will see a request to https://uib.ff.avast.com/v5/urlinfo whenever a new page loads in a tab:

So the extension sends some binary data and in return gets information on whether the page is malicious or not. The response is then translated into the extension icon to be displayed for the page. You can clearly see the full address of the page in the binary data, including query part and anchor. The rest of the data is somewhat harder to interpret, I’ll get to it soon.

This request isn’t merely sent when you navigate to a page, it also happens whenever you switch tabs. And there is an additional request if you are on a search page. This one will send every single link found on this page, be it a search result or an internal link of the search engine.

What data is being sent?

The binary UrlInfoRequest data structure used here can be seen in the extension source code. It is rather extensive however, with a number of fields being nested types. Also, some fields appear to be unused, and the purpose of others isn’t obvious. Finally, there are “custom values” there as well which are a completely arbitrary key/value collection. That’s why I decided to stop the extension in the debugger and have a look at the data before it is turned into binary. If you want to do it yourself, you need to find this.message() call in scripts/background.js and look at this.request after this method is called.

The interesting fields were:

Field Contents uri The full address of the page you are on. title Page title if available. referer Address of the page that you got here from, if any. windowNum

tabNum Identifier of the window and tab that the page loaded into. initiating_user_action

windowEvent How exactly you got to the page, e.g. by entering the address directly, using a bookmark or clicking a link. visited Whether you visited this page before. locale Your country code, which seems to be guessed from the browser locale. This will be “US” for US English. userid A unique user identifier generated by the extension (the one visible twice in the screenshot above, starting with “d916”). For some reason this one wasn’t set for me when Avast Antivirus was installed. plugin_guid Seems to be another unique user identifier, the one starting with “ceda” in the screenshot above. Also not set for me when Avast Antivirus was installed. browserType

browserVersion Type (e.g. Chrome or Firefox) and version number of your browser. os

osVersion Your operating system and exact version number (the latter only known to the extension if Avast Antivirus is installed).

And that’s merely the fields which were set. The data structure also contains fields for your IP address and a hardware identifier but in my tests these stayed unused. It also seems that for paying Avast customers the identifier of the Avast account would be transmitted as well.

What does this data tell about you?

The data collected here goes far beyond merely exposing the sites that you visit and your search history. Tracking tab and window identifiers as well as your actions allows Avast to create a nearly precise reconstruction of your browsing behavior: how many tabs do you have open, what websites do you visit and when, how much time do you spend reading/watching the contents, what do you click there and when do you switch to another tab. All that is connected to a number of attributes allowing Avast to recognize you reliably, even a unique user identifier.

If you now think “but they still don’t know who I am” – think again. Even assuming that none of the website addresses you visited expose your identity directly, you likely have a social media account. There has been a number of publications showing that, given a browsing history, the corresponding social media account can be identified in most cases. For example, this 2017 study concludes:

Of the 374 people who confirmed the accuracy of our de-anonymization attempt, 268 (72%) were the top candidate generated by the MLE, and 303 participants (81%) were among the top 15 candidates. Consistent with our simulation results, we were able to successfully de-anonymize a substantial proportion of users who contributed their web browsing histories.

With the Avast data being far more extensive, it should allow identifying users with an even higher precision.

Isn’t this necessary for the extension to do its job?

No, the data collection is definitely unnecessary to this extent. You can see this by looking at how Google Safe Browsing works, the current approach being largely unchanged compared to how it was integrated in Firefox 2.0 back in 2006. Rather than asking a web server for each and every website, Safe Browsing downloads lists regularly so that malicious websites can be recognized locally.

No information about you or the sites you visit is communicated during list updates. […] Before blocking the site, Firefox will request a double-check to ensure that the reported site has not been removed from the list since your last update. This request does not include the address of the visited site, it only contains partial information derived from the address.

I’ve seen a bunch of similar extensions by antivirus vendors, and so far all of them provided this functionality by asking the antivirus app. Presumably, the antivirus has all the required data locally and doesn’t need to consult the web service every time. In fact, I could see Avast Online Security also consult the antivirus application for the websites you visit if this application is installed. It’s an additional request however, the request to the web service goes out regardless. Update (2019-10-29): I understand this logic better now, and the requests made to the antivirus application have a different purpose.

Wait, but Avast Antivirus isn’t always installed! And maybe the storage requirements for the full database exceed what browser extensions are allowed to store. In this case the browser extension has no choice but to ask the Avast web server about every website visited. But even then, this isn’t a new problem. For example, the Mozilla community had a discussion roughly a decade ago about whether security extensions really need to collect every website address. The decision here was: no, sending merely the host name (or even a hash of it) is sufficient. If higher precision is required, the extension could send the full address only if a potential match is detected.

What about the privacy policy?

But Avast has a privacy policy. They surely explained there what they need this data for and how they handle it. There will most certainly be guarantees in there that they don’t keep any of this data, right?

Let’s have a look. The privacy policy is quite long and applies to all Avast products and websites. The relevant information doesn’t come until the middle of it:

We may collect information about the computer or device you are using, our products and services running on it, and, depending on the type of device it is, what operating systems you are using, device settings, application identifiers (AI), hardware identifiers or universally unique identifiers (UUID), software identifiers, IP Address, location data, cookie IDs, and crash data (through the use of either our own analytical tools or tolls provided by third parties, such as Crashlytics or Firebase). Device and network data is connected to the installation GUID. We collect device and network data from all users. We collect and retain only the data we need to provide functionality, monitor product and service performance, conduct research, diagnose and repair crashes, detect bugs, and fix vulnerabilities in security or operations (in other words, fulfil our contract with you to provision the service).

Unfortunately, after reading this passage I still don’t know whether they retain this data for me. I mean, “conduct research” for example is a very wide term and who knows what data they need to do it? Let’s look further.

Our AntiVirus and Internet security products require the collection of usage data to be fully functional. Some of the usage data we collect include: […] information about where our products and services are used, including approximate location, zip code, area code, time zone, the URL and information related to the URL of sites you visit online […] We use this Clickstream Data to provide you malware detection and protection. We also use the Clickstream Data for security research into threats. We pseudonymize and anonymize the Clickstream Data and re-use it for cross-product direct marketing, cross-product development and third party trend analytics.

And that seems to be all of it. In other words, Avast will keep your data and they don’t feel like they need your approval for that. They also reserve the right to use it in pretty much any way they like, including giving it to unnamed third parties for “trend analytics.” That is, as long as the data is considered anonymized. Which it probably is, given that technically the unique user identifier is not tied to you as a person. That your identity can still be deduced from the data – well, bad luck for you.

Edit (2019-10-29): I got a hint that Avast acquired Jumpshot a bunch of years ago. And if you take a look at the Jumpshot website, they list “clickstream data from 100 million global online shoppers and 20 million global app users” as their product. So you now have a pretty good guess as to where your data is going.

Conclusions

Avast Online Security collecting personal data of their users is not an oversight and not necessary for the extension functionality either. The extension attempts to collect as much context data as possible, and it does so on purpose. The Avast privacy policy shows that Avast is aware of the privacy implications here. However, they do not provide any clear retention policy for this data. They rather appear to hold on to the data forever, feeling that they can do anything with it as long as the data is anonymized. The fact that browsing data can usually be deanonymized doesn’t instill much confidence however.

This is rather ironic given that all modern browsers have phishing and malware protection built in that does essentially the same thing but with a much smaller privacy impact. In principle, Avast Secure Browser has this feature as well, it being Chromium-based. However, all Google services have been disabled and removed from the settings page – the browser won’t let you send any data to Google, sending way more data to Avast instead.

Update (2019-10-28): Somehow I didn’t find existing articles on the topic when I searched initially. This article mentions the same issue in passing, it was published in January 2015 already. The screenshot there shows pretty much the same request, merely with less data.