On August 19, Twitter dropped a new trove of state troll tweets that the company said were from “a significant state-backed information operation focused on the situation in Hong Kong, specifically the protest movement and their calls for political change.”

The tweets deserve a deeper examination, and clearly more can be done with the material. I previously worked on a project to uncover such disinformation campaigns on Twitter.

Given time constraints, this is a quick and dirty first exploratory look at the data. More to come in the coming days and weeks.

1. DATA, NOTEBOOK AND ASSUMPTIONS

My rough notebook is here, and the repo will be updated as I find more time to work on this project.

The CSV files are too large to upload to GitHub. Download them directly from Twitter instead.

To contain the complexity of the project at this stage, I filtered out the retweets, which are an interesting area deserving a separate look. I also focused only on the English and Chinese-language tweets. The tweets in this dataset came in 59 languages, believe it or not.
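For readers who want to reproduce the filtering step without opening the notebook, here is a minimal sketch using only Python's standard library. It is not the notebook's actual code. The column names `is_retweet` and `tweet_language`, the language codes `en` and `zh`, and the tiny sample data are all assumptions for illustration; check them against the header of the CSV you download.

```python
import csv
import io

# Assumed column names and language codes; verify against the real CSV header.
RETWEET_COL = "is_retweet"
LANG_COL = "tweet_language"
KEEP_LANGS = {"en", "zh"}


def filter_tweets(rows, keep_langs=KEEP_LANGS):
    """Yield rows that are not retweets and whose language is in keep_langs."""
    for row in rows:
        # Treat only the literal string "true" (any case) as a retweet flag.
        is_retweet = (row.get(RETWEET_COL) or "").strip().lower() == "true"
        if is_retweet:
            continue
        if (row.get(LANG_COL) or "").strip().lower() in keep_langs:
            yield row


# Made-up sample rows standing in for the real file.
SAMPLE_CSV = """tweetid,tweet_language,is_retweet,tweet_text
1,en,False,example original tweet
2,en,True,example retweet
3,zh,False,示例推文
4,es,False,ejemplo
"""

kept = list(filter_tweets(csv.DictReader(io.StringIO(SAMPLE_CSV))))
print([row["tweetid"] for row in kept])
```

Because `filter_tweets` is a generator, the same function can be pointed at a `csv.DictReader` over the real file and will stream it row by row rather than loading everything into memory.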

2. OVERALL LOOK AT THE CHINESE STATE TROLL TWEETS

Twitter said the tweets it released came from “936 accounts originating from within the People’s Republic of China (PRC). Overall, these accounts were deliberately and specifically attempting to sow political discord in Hong Kong, including undermining the legitimacy and political positions of the protest movement on the ground”.

The accounts, already suspended, “represent the most active portions of this campaign; a larger, spammy network of approximately 200,000 accounts”, Twitter added in its press release.

Here are the key figures I found from a quick overview:

Unique userids: 890

Unique user display names: 883

Unique user screen names: 890

Unique user reported locations: 178

Unique user creation dates: 427

Unique account languages: 9

Unique tweet languages: 59

Unique tweet text: 3,236,991

Unique tweet time: 1,412,732

Unique hashtags: 110,957
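Counts like the ones above can be produced by collecting the distinct values seen in each column. The sketch below shows one way to do that with the standard library. It is an illustration, not the notebook's code: the helper `count_unique`, the column names and the sample rows are all hypothetical. Empty cells are skipped, which is a choice made here and may not match how the figures above were computed.

```python
import csv
import io
from collections import defaultdict


def count_unique(rows, columns):
    """Return {column: number of distinct non-empty values} over all rows."""
    seen = defaultdict(set)
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value:  # skip missing or empty cells (an assumption of this sketch)
                seen[col].add(value)
    return {col: len(seen[col]) for col in columns}


# Made-up sample rows; the column names are assumed, not confirmed.
SAMPLE_CSV = """userid,user_screen_name,tweet_language,tweet_text
a1,alpha,en,hello
a1,alpha,zh,你好
b2,beta,en,hello
"""

columns = ["userid", "user_screen_name", "tweet_language", "tweet_text"]
counts = count_unique(csv.DictReader(io.StringIO(SAMPLE_CSV)), columns)
for col in columns:
    print(f"Unique {col}: {counts[col]}")
```

Note that this counts distinct cell values, so a column that stores several items per cell (a list of hashtags, say) would be counted per distinct cell rather than per individual item.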

The number of troll tweets was whittled down from the initial 3.6 million to 581,070 after I filtered out the retweets and kept only the English and Chinese-language tweets. Too aggressive? Perhaps, but that’s still a lot to work with.

3. QUINTESSENTIAL CHINESE STATE TWEETS

First, let’s have a quick look at what these Chinese state tweets targeting Hong Kong look like, both in English and Chinese: