Here is a really superb paper on the 50 cent party, the Chinese Communist Party’s army of loyalist Internet trolls. The researchers scraped literally millions of below-the-line comments and Weibo posts, hired Chinese students to identify the 50-centers in random samples and classify the posts by subject, and checked that the students, who worked independently, agreed with each other’s classifications (they did, with an agreement score of 0.880, where perfect agreement would be 1). They then trained a variety of machine learning models against this corpus, evaluated them against more randomly selected comments, and picked the best, sending the results back to the students for validation. That done, they could turn the machine loose to churn through the pile of comments.
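
To make the agreement check concrete, here is a minimal sketch of one common way to score how well two independent coders agree. I don't know exactly which statistic the paper used, so this assumes Cohen's kappa, which corrects raw agreement for the agreement you would expect by chance and equals 1 when the coders agree perfectly. The labels and data below are invented for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each coder labelled at random,
    # using their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical toy data: two coders marking ten comments (not from the paper).
coder_1 = ["troll", "troll", "not", "not", "troll",
           "not", "not", "troll", "not", "not"]
coder_2 = ["troll", "troll", "not", "not", "not",
           "not", "not", "troll", "not", "troll"]
print(round(cohens_kappa(coder_1, coder_2), 3))
```

Here the coders agree on eight of ten comments, but because both label most comments "not", a fair amount of that agreement would happen by chance, so kappa comes out well below 0.8.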

The results are fascinating. Official trolling focuses on five key subjects: ethnic conflict, corruption, disasters, individual leaders, and nationalism.

What fascinates me here is that the mission of the 50 cent party could be summed up as clinging on to the mandate of heaven. Scandal, natural disasters (or more accurately, failure to respond to them), and ethnic strife are the classic markers of a Chinese empire that is losing its grip on legitimacy. The ideological means by which this is resisted seems to be the flag. As for the rest, it’s fairly obvious that, given an army of Internet trolls at their beck and call, individual leaders will tend to use it to look after their reputations. Also, of course, the legitimacy they are trying to defend is that of the leaders.

This similarly excellent paper is based on a very similar research project, but comes to subtly different conclusions about target subjects. This, however, is down to methodological differences. The first paper uses human investigators to classify a sample of the comments by the topics they perceive among them, and then uses software to identify comments with similar properties to the ones in each topic, in what is known as supervised learning. The second takes a different approach: its software tries to identify clusters of traits that maximise the statistical variance between categories, in what is known as unsupervised learning. The investigators then attempted to identify what these empirically-determined clusters mean to human beings.
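
The difference between the two approaches can be sketched on a toy example. This is not what either paper actually ran: the nearest-centroid classifier and plain k-means below are simple stand-ins I have chosen, and the comments and label names are made up. The supervised model is handed human labels; the unsupervised one only sees the text and returns numbered clusters that a human then has to name. (k-means minimises variance within clusters, which for a fixed dataset is the same as maximising variance between them.)

```python
from collections import Counter

def vectorise(text, vocab):
    """Bag-of-words counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def nearest(vec, centroids):
    """Key of the centroid closest to vec."""
    return min(centroids, key=lambda k: sq_dist(vec, centroids[k]))

def fit_supervised(vectors, labels):
    """Nearest-centroid classifier: one centroid per human-assigned label."""
    groups = {}
    for vec, label in zip(vectors, labels):
        groups.setdefault(label, []).append(vec)
    return {label: mean(vs) for label, vs in groups.items()}

def fit_unsupervised(vectors, k, rounds=10):
    """Plain k-means seeded with the first k vectors; no labels involved."""
    centroids = {i: list(vectors[i]) for i in range(k)}
    for _ in range(rounds):
        groups = {i: [] for i in range(k)}
        for vec in vectors:
            groups[nearest(vec, centroids)].append(vec)
        # Keep the old centroid if a cluster happens to end up empty.
        centroids = {i: mean(vs) if vs else centroids[i]
                     for i, vs in groups.items()}
    return centroids

# Hypothetical toy corpus and human labels (invented for illustration).
comments = ["love china love party", "corrupt official corrupt",
            "long live china", "official corrupt scandal"]
labels = ["cheerleading", "corruption", "cheerleading", "corruption"]

vocab = sorted({w for c in comments for w in c.split()})
vectors = [vectorise(c, vocab) for c in comments]

model = fit_supervised(vectors, labels)    # learns the humans' categories
clusters = fit_unsupervised(vectors, k=2)  # finds its own groupings

new = vectorise("china china love", vocab)
print(nearest(new, model))                        # a human-readable label
print([nearest(v, clusters) for v in vectors])    # anonymous cluster ids
```

The supervised model answers in the coders' vocabulary, while the clusters come back as bare numbers, which is why the second paper's authors had to work out afterwards what each cluster meant.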

On nationalism, for example, they identify a cluster of topics around “taunting foreign countries” but note that this represents a small percentage of total output. This sounds like it contradicts the other study, but by far the biggest cluster they found was identified as “cheerleading”. Typical posts in this category include strings like “I love China!” and “Long live the CCP!”, which I think can fairly be described as expressions of nationalism.

The Chinese students correctly identified that vacuous cheerleading is a big part of nationalism, while the unsupervised classifier correctly detected that nationalist rah-rah yelling shares its sentiment-analysis traits with similar speech about abstract concepts, local or class identities, or the Party. George Orwell says much the same thing in Notes on Nationalism.

One important point that the unsupervised classifier picks out is that aggressive, negative comment about foreigners (so-called fenqing trolling) is probably a more authentic phenomenon than the 50-centers’ support-the-troops cheerleading, as it doesn’t originate from the official distribution network. Rather than deliver it on tap, the Party chooses whether to tolerate it or not when it happens to break out spontaneously.

Our second paper also shows that the command-and-control network is highly centralised at the district level: trolls report to the Internet propaganda bureau, which in turn communicates with numerous higher government and Party agencies. That makes the bureau a critical node in the network.
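
What "critical node" means can be illustrated with a toy graph. The node names and edges below are hypothetical, not the paper's data, and the measure (how many pieces the network falls into when a node is removed) is just one simple way to express the idea, not necessarily the one the authors used.

```python
from collections import deque

def build(edges):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def components(graph, removed=None):
    """Connected components, optionally ignoring one removed node."""
    seen, parts = set(), []
    for start in graph:
        if start == removed or start in seen:
            continue
        queue, part = deque([start]), {start}
        seen.add(start)
        while queue:  # breadth-first search from this start node
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr != removed and nbr not in seen:
                    seen.add(nbr)
                    part.add(nbr)
                    queue.append(nbr)
        parts.append(part)
    return parts

# Hypothetical district network: trolls below, agencies above.
edges = [
    ("troll_1", "bureau"), ("troll_2", "bureau"), ("troll_3", "bureau"),
    ("bureau", "party_committee"), ("bureau", "district_government"),
    ("party_committee", "district_government"),
]
graph = build(edges)

# For each node, count the fragments left behind if it is taken out.
fragments = {node: len(components(graph, removed=node)) for node in graph}
most_critical = max(fragments, key=fragments.get)
print(most_critical, fragments[most_critical])
```

In this toy network, removing any troll or agency leaves everything else connected, whereas removing the bureau cuts every troll off from the agencies above it.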

Both papers converge on similar conclusions about the nature of the 50-centers themselves.

The first paper identifies four types of troll user account, which may even be stages in a life cycle. 50-centers register lots and lots of user accounts which generally don’t engage much and aren’t extensively personalised. They don’t do much until they are mobilised for a topic- and event-specific blitz campaign. In intelligence terms, they would be considered sleeper agents.

Once activated, though, some of them start to display an informal affiliation with the Party and often with the local Public Security Bureau. This allows them to start distributing grey propaganda and projecting informal surveillance. They would now be considered agents-of-influence. Some of them are eventually acknowledged by the authorities, becoming semi-overt agents of the state or the Party. The second paper, basing its conclusions on a major document leak, argues that the typical 50-center actually is a Party or government employee.

Finally, their usefulness at an end, accounts go quiet and are deleted.

I would add that if we read the four phases as a life cycle, it matches some classic ideas about propaganda. The angry eggs, the unpersonalised sleeper accounts, serve to project a general mood rather than specific messaging. In particular, they create false consensus, giving the impression everyone agrees with the system, and a generally hostile environment for dissenters (who are being gaslighted into noping-out of the discourse, some would say). Their development into insider sources permits new content to be introduced into the debate. Their revelation as official agents acts as a so-called surprising validator, confirming the validity of that content. But you can only blow your cover once, so at this point that particular account is no longer of use, and it is then garbage-collected.

A really interesting project would be to run a similar method back on Twitter. To what extent do wild-type trolls, cued in by stigmergic interaction with their environment and each other, and artificial ones commanded to act by authority, differ?