A study published in June 2019 finds that roughly one in 600 of the Alexa Top 1 million websites executes WebAssembly (Wasm) code. The study also finds that over 50% of the sites using WebAssembly use it for malicious purposes, such as cryptocurrency mining and malware code obfuscation.

Marius Musch, Christian Wressnegger, Martin Johns, and Konrad Rieck, in a study sponsored by the Institute for Application Security and the Institute of System Security from the Technische Universität Braunschweig, analyzed the prevalence of WebAssembly in the Alexa Top 1 million websites. The team examined the websites in the Alexa sample over a time span of four days and successfully studied 947,704 websites, visiting 3,465,320 web pages in total. The study provides novel information about the prevalence of WebAssembly and the extent of its usage by the websites featuring Wasm modules, and it categorizes the purposes for which those sites use WebAssembly.

The study found 1,950 Wasm modules on 1,639 sites (roughly one site out of 600). A significant portion of these modules is not loaded on the front page of a site but on subpages, often through a third-party script or an iframe with another origin (795 sites from the sample). The study reports that the 1,950 Wasm modules represent 150 unique samples, indicating that some Wasm modules are found on several sites, with the extreme case of one module being present on 346 different sites. Conversely, 87 samples are unique to a single website, indicating custom development for that particular website. On average, sites using WebAssembly use 1.2 Wasm modules per page visited by the study. Ranking-wise, sites with a lower Alexa rank, i.e. higher user traffic (google.com, for instance, ranks first), tend to use WebAssembly more often.
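As a quick sanity check on the figures above, the reported counts can be turned into ratios directly. The sketch below only uses numbers quoted in this article; the variable names are illustrative. With 947,704 sites studied and 1,639 of them using Wasm, the prevalence works out to about one site in 578, which the article rounds to one in 600.

```typescript
// Counts quoted from the study (see the text above).
const sitesStudied = 947_704;
const sitesWithWasm = 1_639;
const modulesFound = 1_950;

// One Wasm-using site for every N sites studied.
const sitesPerWasmSite = sitesStudied / sitesWithWasm;

// Average number of collected modules per Wasm-using site.
const modulesPerWasmSite = modulesFound / sitesWithWasm;

console.log(`About one site in ${Math.round(sitesPerWasmSite)} uses WebAssembly`);
console.log(`About ${modulesPerWasmSite.toFixed(1)} modules per Wasm-using site`);
```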

The study also provides data about how extensively the relevant websites use WebAssembly, based on two indicators. The first is the size of the WebAssembly module, which ranges from 8 bytes to 25.3 MB, with a median of 100 KB per module. This spread can be explained by differences in purpose: the study reports that some sites only test whether the browser supports WebAssembly, while other sites actually rely on the functionality the module exposes.

The second indicator, the relative usage of WebAssembly versus JavaScript as extracted by the Chrome browser's integrated performance profiler, shows two clear segments. On one side, a majority of sites (1,121 sites, or roughly two-thirds of the sample) almost never use WebAssembly. On the other side, the remaining sites spend their execution time almost exclusively running Wasm code.

The research team manually sorted the Wasm modules into six categories reflecting the purpose behind the use of WebAssembly: Custom, Game, Library, Mining, Obfuscation, and Test. Two of these six categories represent malicious usage of WebAssembly: Mining (55.6% of the website sample) and Obfuscation (0.2% of the website sample). The study details:

The largest observed category implements a cryptocurrency miner in WebAssembly, for which we found 48 unique samples on 913 sites in the Alexa Top 1 Million.

(…) 56%, the majority of all WebAssembly usage in the Alexa Top 1 Million is for malicious purposes.

Wasm samples in the Mining category exhibit distinctive traits compared with modules from other categories. The code of the collected WebAssembly miners shares a high degree of similarity. Furthermore, profiling data indicates that the vast majority of websites with intense usage of Wasm (more than 50% of the time spent running WebAssembly code) are indeed mining for cryptocurrencies. A manual analysis of the modules in the Mining category that did not display intense Wasm code usage (relative CPU share below 50%) indicates four key reasons for the failure to run Wasm code:

A mining script is included, but the miner is not started or was disabled and the script not removed.
The miner only starts once the user interacts with the web page or after a certain delay.
The miner is broken, either because of invalid modifications or because the remote API has changed.
The WebSocket backend is not responding, which prevents the miner from running.

The study concludes:

[The study] suggests that we are currently only seeing the tip of the iceberg of a new generation of malware (…). In consequence, incorporating the analysis of WebAssembly code hence is going to be of essence for effective future defense mechanisms.

The full study is available online. A shorter presentation summarizing the results of the study can also be consulted.

The data collection methodology defines a site as one entry in the Alexa list, together with the pages that share the same origin with that entry. The research team instrumented a browser to collect all WebAssembly code. As a preliminary study revealed that a significant fraction of the Wasm code is not loaded when visiting the front page of a domain, the study collected data from three randomly selected links from the front page. This led to identifying 25% more sites that use WebAssembly and to collecting 40% more unique samples, compared to a crawl of the same sites without any subpages.
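The subpage sampling described above can be sketched as follows. This is a minimal illustration, not the study's crawler: the function names are hypothetical, and restricting the candidates to links that share the front page's origin is an assumption drawn from the article's definition of a site.

```typescript
// Keep only links that resolve to the same origin as the front page
// (assumption: a "site" is the Alexa entry plus its same-origin pages).
function sameOriginLinks(frontPage: string, hrefs: string[]): string[] {
  const origin = new URL(frontPage).origin;
  const unique = new Set<string>();
  for (const href of hrefs) {
    let url: URL;
    try {
      url = new URL(href, frontPage); // resolves relative links
    } catch {
      continue; // skip hrefs that cannot be parsed
    }
    if (url.origin === origin) {
      unique.add(url.href);
    }
  }
  return Array.from(unique);
}

// Pick up to `count` distinct links uniformly at random
// using a partial Fisher-Yates shuffle on a copy of the input.
function sampleLinks(
  links: string[],
  count: number = 3,
  random: () => number = Math.random,
): string[] {
  const pool = links.slice();
  const picks = Math.min(count, pool.length);
  for (let i = 0; i < picks; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, picks);
}
```

A crawler built along these lines would visit the front page, collect its anchors, and then visit `sampleLinks(sameOriginLinks(frontPage, hrefs))`. Passing the random source as a parameter keeps the sampling reproducible in tests.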

The research team additionally used a profiler to gather information about the CPU usage of the visited sites, allowing the team to assess the percentage of time spent executing JavaScript and WebAssembly code. For profiling purposes, the research team measured the execution time of Wasm and JavaScript code and excluded all other factors like idle times when waiting for network responses.
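The relative-usage indicator can be expressed as a simple ratio. The sketch below assumes a hypothetical `ExecutionProfile` shape holding the time spent in JavaScript and in Wasm code, with idle time already excluded; it is not the profiler's actual output format. The 50% threshold matches the article's definition of intense Wasm usage.

```typescript
// Hypothetical per-site profile: execution time only, idle time excluded.
interface ExecutionProfile {
  jsMillis: number;
  wasmMillis: number;
}

// Fraction of measured execution time spent in WebAssembly (0 to 1).
function wasmShare(profile: ExecutionProfile): number {
  const total = profile.jsMillis + profile.wasmMillis;
  return total === 0 ? 0 : profile.wasmMillis / total;
}

// "Intense" usage as described in the article: more than 50% of the time in Wasm.
function usesWasmIntensively(profile: ExecutionProfile, threshold: number = 0.5): boolean {
  return wasmShare(profile) > threshold;
}
```

For example, a site that spends 750 ms in Wasm and 250 ms in JavaScript has a share of 0.75 and counts as an intense user, whereas a site that only probes for WebAssembly support would sit close to 0.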

At a technical level, the research team transparently hooked the creation of all JavaScript functions which can compile or instantiate Wasm modules. This includes the instantiate method, the instantiateStreaming method, the WebAssembly.Module constructor and more.
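The hooking idea can be illustrated with a small wrapper around the standard `WebAssembly.compile` and `WebAssembly.instantiate` functions. This is a simplified sketch under stated assumptions, not the study's instrumentation: the helper names and the `capturedModules` log are hypothetical, only raw byte arguments are recorded, and `instantiateStreaming` and the `WebAssembly.Module` constructor would need analogous wrappers that are not shown here.

```typescript
// Minimal view of the WebAssembly namespace used by this sketch.
type WasmEntryPoint = (...args: unknown[]) => unknown;
interface WasmNamespace {
  [name: string]: WasmEntryPoint;
}

// Hypothetical log entry; the study's own data format is not described here.
interface CapturedModule {
  api: string;
  byteLength: number;
}

const capturedModules: CapturedModule[] = [];

// Record the module size when the first argument is raw binary data.
function recordIfBinary(api: string, source: unknown): void {
  if (source instanceof ArrayBuffer || ArrayBuffer.isView(source)) {
    capturedModules.push({ api, byteLength: source.byteLength });
  }
}

// Replace one entry point with a wrapper that logs, then delegates to the original.
function hookWasmEntryPoint(wasm: WasmNamespace, name: string): void {
  const original = wasm[name];
  if (typeof original !== "function") {
    return; // entry point not available in this environment
  }
  wasm[name] = function (this: unknown, ...args: unknown[]): unknown {
    recordIfBinary(`WebAssembly.${name}`, args[0]);
    return original.apply(this, args);
  };
}

const wasmNamespace = (globalThis as unknown as { WebAssembly: WasmNamespace }).WebAssembly;
for (const name of ["compile", "instantiate"]) {
  hookWasmEntryPoint(wasmNamespace, name);
}
```

Because the wrapper delegates to the original function with the same arguments, page code keeps working unchanged while every compiled or instantiated module passes through `recordIfBinary`. A real crawler would store the bytes themselves rather than only their length.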

Alexa provides website traffic statistics, including a website traffic ranking. The rank is calculated using a proprietary methodology that combines a site’s estimated average of daily unique visitors and its estimated number of pageviews over the past three months.

The Technische Universität Braunschweig (Braunschweig Institute of Technology) is the oldest Technische Universität (comparable to an institute of technology in the American system) in Germany and ranks among the top universities for engineering in Germany.