Chinese messaging app WeChat relies on the input of unwitting users to autonomously expand its blacklist of sensitive images, according to a new study by a Canadian Internet watchdog group.

Research released on July 15 by the University of Toronto’s Citizen Lab focused on how WeChat, owned by Internet tech giant Tencent and boasting an active monthly user base of more than 1 billion people, uses a number of censorship mechanisms to screen picture files sent between users in one-to-one and group chats.

The study found that the app, without requiring human involvement, can expand its own blacklist of prohibited images by subjecting files to both real-time and retroactive analyses.

“Users using the platform are building the [blacklist’s] database by sending images,” Jeffrey Knockel, one of the study’s authors, said, describing the model as a form of “outsourcing”.

“Building the censorship mechanism by using the service, it’s something that we hadn’t really measured on a platform before,” said Knockel, who is a postdoctoral fellow at Citizen Lab.

WeChat – or weixin in Chinese – is China’s most popular social media platform, handling around 45 billion messages sent and received by users daily, according to company data from 2018. That number is about 20 billion for Facebook’s Messenger chat platform, and 65 billion for Facebook-owned WhatsApp.

Researchers in the most recent study did not have access to the app’s inner workings, instead drawing conclusions based on months of experimental analysis.

The censorship mechanisms analysed by Citizen Lab underscore one of the ways Internet applications in China and the US are diverging.

WeChat’s autonomous censorship “is in part a reflection of the bifurcation [between the Internet in China and outside the country], since US platforms are not involved in censoring individual chat”, said Adam Segal, director of the digital and cyberspace policy programme at the New York-based Council on Foreign Relations.

“Tencent is also training the system without the consent of the users,” Segal said.

One mechanism, Citizen Lab’s study found, checks images against a database of blacklisted pictures in real time by examining the file’s “hash” – a digital fingerprint, made up of a series of bits, that can be extracted and read within seconds, before the file reaches its intended recipient.

If the hash matches an existing file in WeChat’s hash index of blacklisted images, the file will be filtered and prevented from reaching the other user, according to the research.

If there is no match – which is to say the file’s hash is not yet on the blacklist – the image will be delivered to the recipient, but will then be queued for retroactive screening using two further tools: an optical character recognition (OCR) tool that scans the image for sensitive words or phrases, and another that checks the picture for visual similarity with previously censored images.

Should either of those tests come back positive, the hash of the image will be added to the app’s blacklist, meaning the picture can be blocked in real time whenever another user attempts to send it in the future.

Researchers confirmed the process by manipulating the hash of a known blocked image without altering its visual appearance; the altered file could be sent successfully once, but not a second time, indicating that the image’s hash had been added to WeChat’s blacklist.
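
The two-stage process described above can be illustrated with a short Python sketch. It is a toy model and not Tencent's code: the class and function names are hypothetical, SHA-256 is an assumption (the article does not say which hash function WeChat uses), and the OCR and visual-similarity checks are replaced by trivial stand-ins.

```python
import hashlib
from collections import deque


def file_hash(data: bytes) -> str:
    """Fingerprint of the raw file bytes. SHA-256 is an assumption here."""
    return hashlib.sha256(data).hexdigest()


class ImageFilter:
    """Toy model of the real-time and retroactive checks (hypothetical names)."""

    def __init__(self, ocr, looks_blacklisted, keywords):
        self.ocr = ocr                            # callable: bytes -> str
        self.looks_blacklisted = looks_blacklisted  # callable: bytes -> bool
        self.keywords = set(keywords)
        self.blacklist = set()                    # hashes blocked in real time
        self.pending = deque()                    # files awaiting later screening

    def send(self, data: bytes) -> bool:
        """Real-time step: return True if delivered, False if filtered."""
        if file_hash(data) in self.blacklist:
            return False
        self.pending.append(data)                 # delivered, but screened later
        return True

    def screen_pending(self) -> None:
        """Retroactive step: a positive result from either check adds the hash."""
        while self.pending:
            data = self.pending.popleft()
            text = self.ocr(data)
            has_keyword = any(word in text for word in self.keywords)
            if has_keyword or self.looks_blacklisted(data):
                self.blacklist.add(file_hash(data))


def toy_ocr(data: bytes) -> str:
    # Stand-in for OCR: read the bytes as text, ignoring trailing padding.
    return data.rstrip(b"\x00").decode("ascii", errors="ignore")


def toy_similarity(data: bytes) -> bool:
    # Stand-in for the visual-similarity check; never fires in this sketch.
    return False


chat_filter = ImageFilter(toy_ocr, toy_similarity, keywords={"banned"})
original = b"banned slogan"
altered = original + b"\x00"          # same "appearance", different hash

first = chat_filter.send(altered)     # hash unknown, so the file is delivered
chat_filter.screen_pending()          # retroactive screening flags the file
second = chat_filter.send(altered)    # hash now blacklisted, so it is filtered
print(first, second)                  # prints: True False
```

The last lines mirror the researchers' test: a file whose hash has been changed gets through once, and is blocked on the second attempt because screening has since added its hash to the blacklist.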

Tencent did not respond to requests for comment.

The report found that results varied, depending on whether images were sent in one-to-one chats, group chats or to the app’s timeline feature – known as “Moments” – with each method appearing to have its own hash index against which to cross-check pictures.

Citizen Lab, based at the University of Toronto’s Munk School of Global Affairs & Public Policy, has been a whistle-blower of sorts on activities by governments and other authorities that amount to what it calls “digital espionage against civil society”.

More than a decade ago, the lab made international headlines for exposing online efforts to combat issues China deems politically sensitive, when it analysed the workings of a cyber espionage network that it named GhostNet.

Its investigation concluded that Tibetan computer systems were compromised by multiple infections emanating from servers in China, which “gave attackers unprecedented access to potentially sensitive information, including documents from the private office of the Dalai Lama”.

Citizen Lab’s WeChat study also found that images were more likely to be censored within group chats and on Moments than one-to-one chats, indicating that the potential reach of an image is factored into the censorship mechanism.

Image filtering only occurred when at least one of the chat participants had an account registered to a mainland Chinese number, the study found, suggesting foreign users were not a high-priority target of the content censorship.

The processes revealed by the report operated autonomously, said Citizen Lab’s Knockel, though human involvement would, in some cases, be required to decide what kind of images were placed in the original blacklist.

Recent years have seen WeChat expand from a private messaging tool to a one-stop digital multi-tool enabling users to do anything from paying for groceries and booking train tickets to self-publishing and finding partners for dating.

Like all Internet companies in China, Tencent is required by law to screen and control content shared across its platforms, including WeChat.

While automated censorship of individual chat differentiates Chinese social media platforms from those in the US, there is convergence in a broader sense, CFR’s Segal said.

“The big platforms in the US would like to develop machine learning to deal with content issues. For example, in his early public comments, [Facebook CEO Mark Zuckerberg] often responded that AI would help Facebook with disinformation, racist, sexist and other abusive material,” Segal said.

In China, exactly what content is considered illicit and how it is to be controlled are often left in the hands of the platforms themselves, though regulators are known to have issued specific guidelines, particularly regarding topical sensitive issues.

Further analysis in Monday’s report found that WeChat’s image censorship was reactive to current affairs, filtering pictures related to politically charged news events, such as the arrest of Huawei executive Meng Wanzhou and the US-China trade war.

Researchers also found evidence of many images of the Hong Kong protests against the government’s controversial proposed extradition law falling foul of WeChat’s censorship tools.

The report, however, was completed before the demonstrations began in earnest and so does not include such findings.

Filtering was not limited to pictures of the protests alone, Knockel said, noting that images of a crowdfunded advert that activists had placed in international newspapers also appeared to have entered WeChat’s blacklist. – South China Morning Post