Fake Tweet Cloud

Co-Author: David Kes

With the fervor of the presidential election being skewed by Russian probing, as well as the notorious Facebook / Cambridge Analytica scandal still topping daily domestic headlines, it became clear to us that "fake news" and Russian trolls are still prevalent, yet vague, concepts. Who are these fake users disguising themselves as? How are these fake users fooling people? How are they influencing people? With our backgrounds in natural language processing and data visualization, and our interest in the intersection of technology and politics, it was only natural to examine the fake Russian users' tweet data with Python and Plotly. In particular, we analyzed:

Users and Tweets Overview

Who these fake users pretend to be (names and descriptions)

When the fake accounts were created

When the fake accounts were most active in tweeting

What topics the accounts were covering

Case Study of Most Successful (or Unsuccessful) Users

Tweet Velocity

Tweet Polarity

Tweet Subjectivity

Data

NBC News published a database of more than 200,000 tweets that Twitter has tied to "malicious activity" from Russia-linked accounts during the 2016 U.S. presidential election. These accounts, working in concert as part of larger networks, pushed hundreds of thousands of inflammatory tweets. The data can be found here.

Figure 1: Joined User and Tweet Data Provided by NBC
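A table like the one in Figure 1 can be built by merging the user and tweet files on their shared user key. Below is a minimal sketch with pandas; the column names (`user_key`, `followers_count`, `text`) and the sample rows are hypothetical stand-ins rather than the exact NBC schema:

```python
import pandas as pd

# Hypothetical stand-ins for the published user and tweet tables
users = pd.DataFrame({
    "user_key": ["jenn_abrams", "ameliebaldwin"],
    "followers_count": [70000, 5000],
})
tweets = pd.DataFrame({
    "user_key": ["jenn_abrams", "jenn_abrams", "ameliebaldwin"],
    "text": ["tweet a", "tweet b", "tweet c"],
})

# Left-join so every tweet keeps its author's profile fields
joined = tweets.merge(users, on="user_key", how="left")
print(joined.shape)  # one row per tweet, now with follower counts
```

With the join in hand, every tweet-level plot below can also be sliced by user-level attributes such as follower count.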

Figure 2: Example of a (non-fake) Twitter Account with Highlighted Description and Tweet

Now that we have the data, let’s dive into the analysis:

Who are these fake users?

Figure 3: First and Last “Names” of the Users

If we take our fake users at face value, we can see three distinct categories of fake account names. The first consists of American-sounding first names such as "Chris," "Rick," or "Jeremy" combined with American-sounding last names such as "Green," "Roberts," or "Cox." The second consists of formal-sounding news sources such as "New [York Times]" or "Atlanta Today." The third consists of purely foreign names. From this, we can see that it is often difficult to tell which accounts are fake based on the name alone, as it could be any average Joe, news site, or unrecognizable name.

What are their profile descriptions?

Figure 4: Topics Within the Descriptions of Fake Users

A unique part of creating a Twitter account is the option to include a short description of who you are. Users often post their summaries and ideologies for others to see. The above graph shows one topic the fake accounts have in common: religion. By using words like "God," "InGodWeTrust," and "GodBlessAmerica," the fake accounts become relatable to a large group of people. Other bios include Black Lives Matter topics, official-news-sounding descriptions (such as "sports," "weather," and "official"), and foreign topics. By quickly, if not instantly, relating to a fake user, real users are more likely to follow or agree with the fake account's tweets. More on this similarity-attraction phenomenon can be read here.
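A crude way to surface these description themes is a plain token count, sketched here as a stand-in for the full topic model behind Figure 4 (the sample bios are hypothetical):

```python
from collections import Counter

# Hypothetical profile descriptions echoing the themes in Figure 4
descriptions = [
    "Proud American. God bless America #InGodWeTrust",
    "Your local sports and weather. Official news.",
    "God family country",
]

# Strip light punctuation, lowercase, and tally every token
tokens = Counter(
    word.strip("#.,").lower()
    for text in descriptions
    for word in text.split()
)
print(tokens.most_common(3))
```

On the real corpus, the highest-frequency tokens after stop-word removal line up with the religious, news, and foreign buckets described above.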

When were they made?

Figure 5: Russian Fake Accounts Created from 2009 to 2017

Of the 454 accounts deemed to be fake Russian accounts, we can see that creation of the fake accounts started in 2009 and peaked in 2013 before slowly tapering off through the start of 2017. Interestingly, this means the majority of fake accounts were created years before the actual 2016 presidential election, perhaps to sow strife and division amongst US readers well ahead of the election.
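The yearly creation counts behind Figure 5 reduce to a simple tally over account creation dates; the dates below are hypothetical stand-ins for the data's creation field:

```python
from collections import Counter
from datetime import date

# Hypothetical account creation dates
created_at = [date(2009, 5, 1), date(2013, 2, 9),
              date(2013, 11, 30), date(2016, 7, 4)]

# Count accounts created per calendar year
per_year = Counter(d.year for d in created_at)
print(sorted(per_year.items()))  # [(2009, 1), (2013, 2), (2016, 1)]
```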

Where are they from?

We took a look at where the fake users "originated" from and discovered that, of the 454 values, approximately half were missing. Of the 287 locations listed, 124 were some form of "United States," 68 were large metropolitan cities in the United States (e.g., San Francisco, New York, Atlanta, Los Angeles), 37 were foreign countries, and the remaining 58 were imaginary, like "located at the corner of happy and healthy" or "the block down the street." Since the data was largely missing and most likely fake, we opted not to analyze it further.

How influential are they?

Figure 6: Followers vs. Number of Tweets of Fake Accounts

At a high level, we can see the number of tweets increasing with the number of followers. This makes sense, as these fake accounts leverage their popularity on social media to reach and influence more individuals. One notable fake user is the infamous Jenna Abrams account, whose racist, controversial, and fake tweets were at one point covered in mainstream media. At this point, it would be safe to say these tweets were influential on the general public.
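The relationship plotted in Figure 6 can be checked numerically with a Pearson correlation between follower counts and tweet counts; the values below are hypothetical stand-ins, computed with the standard library for portability:

```python
# Hypothetical per-account follower and tweet counts
followers = [100, 1000, 10000, 70000]
tweet_counts = [50, 200, 900, 4000]

n = len(followers)
mf = sum(followers) / n
mt = sum(tweet_counts) / n

# Pearson r = cov(x, y) / (std(x) * std(y))
cov = sum((f - mf) * (t - mt) for f, t in zip(followers, tweet_counts)) / n
sf = (sum((f - mf) ** 2 for f in followers) / n) ** 0.5
st = (sum((t - mt) ** 2 for t in tweet_counts) / n) ** 0.5
r = cov / (sf * st)
print(round(r, 2))  # close to 1.0 for strongly correlated data
```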

When are they posting?

Figure 7: Heat Map of Fake User Tweet Activity

From the above heat map, we can see that the fake users predominantly posted on Sundays and Tuesdays in the later months of the year (i.e., August, September, October, November, and December). We can expect this is not by chance: the fake users understood that their content reaches more individuals on weekends rather than weekdays, and in the latter half of the year, when the election and other big events occur.
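The weekday-by-month tallies behind a heat map like Figure 7 come straight from the tweet timestamps; the timestamps below are hypothetical stand-ins:

```python
from collections import Counter
from datetime import datetime

# Hypothetical tweet timestamps
timestamps = [datetime(2016, 10, 2, 14), datetime(2016, 10, 4, 9),
              datetime(2016, 10, 9, 20), datetime(2016, 8, 7, 11)]

# Tally tweets into (weekday, month) cells; weekday() returns 6 for Sunday
heat = Counter((t.weekday(), t.month) for t in timestamps)
print(heat[(6, 10)])  # Sunday tweets in October
```

Each `(weekday, month)` count becomes one cell of the heat map.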

What are they saying?

Figure 8: Topical Modeling of Tweet Content

Similar to the descriptions of the fake users, we looked at the topics covered within the actual tweet content. In the above graph, we can see that Black Lives Matter and other racial subjects were one topic the Russian accounts targeted, with words such as "police," "blacklivesmatter," and "crime," and references to the shooting in San Bernardino, particularly the perpetrator being of minority descent. Other topics covered by the users included Hillary Clinton's private email server, ISIS, pro-Trump slogans, mockery of the election debates, and school shootings. These poignant and popular events were particularly easy topics for the fake accounts to attach their propaganda to, as individuals were already livid and divided over how to react to them.
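The topic assignment behind Figure 8 can be approximated crudely with keyword buckets; both the bucket terms and sample tweets below are hypothetical, standing in for the full topic model:

```python
# Hypothetical keyword buckets echoing the topics in Figure 8
TOPICS = {
    "race": {"police", "blacklivesmatter", "crime"},
    "clinton_emails": {"hillary", "emails", "server"},
}

tweets = [
    "police shooting sparks blacklivesmatter protest",
    "what is hidden in hillary clinton's private emails and server",
]

def label(tweet):
    """Assign the bucket with the largest keyword overlap."""
    words = set(tweet.lower().split())
    return max(TOPICS, key=lambda topic: len(TOPICS[topic] & words))

print([label(t) for t in tweets])
```

A real topic model (e.g., LDA) learns these groupings from co-occurrence instead of a hand-written keyword list, but the labeling step is the same idea.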

Case Study — in depth look at top 20 users

We now know what the prototypical fake account looks like, but what is the anatomy of a successful (or perhaps unsuccessful) Russian fake account? How are they gaining attention? From Figure 6, we see a positive correlation between number of followers and number of tweets. Therefore, the following graphs examine the tweeting behavior of the top 20 most-followed fake Russian accounts in terms of velocity, sentiment, and subjectivity over time.

Tweet Velocity

Figure 9: Number of Tweets Tweeted by Fake Accounts

In terms of pure tweet volume, we can see the fake accounts were almost nonexistent until around June 2016, at which point the volume of tweets increased dramatically, reaching its apex in October 2016. Tweet volume then tumbled after November 2016 (election month), with one last resurgence around December 2016 before returning to a nearly inactive state. This worrying trend shows the opportunistic behavior of the fake accounts, tweeting at the most tense and vital points of the election fervor.
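A monthly velocity series like Figure 9 is just a count of tweets per (year, month); the dates below are hypothetical stand-ins:

```python
from collections import Counter
from datetime import date

# Hypothetical tweet dates clustered around the election
tweet_dates = [date(2016, 6, 12), date(2016, 10, 3),
               date(2016, 10, 21), date(2016, 10, 30), date(2016, 11, 9)]

# Count tweets per calendar month
monthly = Counter((d.year, d.month) for d in tweet_dates)
peak = max(monthly, key=monthly.get)
print(peak, monthly[peak])  # (2016, 10) 3
```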

Tweet Sentiment and Subjectivity

The following graph shows the average sentiment and subjectivity of the tweets made by the top 20 most-followed users. In the context of tweets, sentiment is defined as an attitude, thought, or judgment. Tweets can be scored with the TextBlob package, with sentiment ranging from -1 for a negative tweet to +1 for a positive tweet. Subjectivity, in the context of tweets, measures how opinionated a given tweet is; in TextBlob, the score ranges from 0 for very objective to 1 for heavily subjective or opinionated.

Figure 10: Average Sentiment and Subjectivity of Top 20 Followed Users

Time Series Point Analysis

Figure 11: Time Series Tweets with Hash Tags

Using the power of hindsight and Wikipedia's record of 2016 US current events, we can see what some notable spikes relate to:

August 4, 2016: spike of the hashtag #obamaswishlist, posts about fanciful and supposedly hypocritical items Obama "wanted"

August 17, 2016: spike of the hashtag #trumpsfavoriteheadline, tweets imagining sardonic headlines that Donald Trump would endorse

September 28, 2016: #ihavearighttoknow, a movement by fake accounts demanding to know what was in Hillary Clinton's emails

October 5, 2016: #ruinadinnerinonephrase, used both politically and apolitically, with some tweets referencing Hillary Clinton while others made memes out of the hashtag

October 17, 2016: #makemehateyouinonephrase, another hashtag movement seen as either part of meme culture or part of the political system

November 14, 2016: #reallifemagicspells, used in reference to Black Lives Matter and Trump's family

December 7, 2016: #idrunforpresidentif "I'd known I needed literally zero experience," and other sardonic comments about the presidential election

Perhaps coincidentally or not, the initial spikes all involved fake accounts simultaneously using hashtags to mock presidents and presidential candidates. The tweets were clearly political, name-dropping actual candidates. However, as time progressed, the distinguishing factor between these tweets became less obvious, as the fake accounts adopted genuinely popular hashtags that were not clearly political. Additionally, the tweets seemed aimed at all candidates rather than one in particular until Trump was actually elected, at which point all attacks were directed at Trump.
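The point analysis above amounts to flagging days whose tweet volume far exceeds the average and reading off the dominant hashtag. A minimal sketch, where the daily data is a hypothetical stand-in:

```python
from collections import Counter

# Hypothetical hashtags observed per day
daily = {
    "2016-08-03": ["#weather"] * 2,
    "2016-08-04": ["#obamaswishlist"] * 9 + ["#news"],
    "2016-08-05": ["#sports"] * 3,
}

# Flag days whose volume exceeds 1.5x the mean, and grab the top hashtag
mean_volume = sum(len(tags) for tags in daily.values()) / len(daily)
spikes = {day: Counter(tags).most_common(1)[0][0]
          for day, tags in daily.items()
          if len(tags) > 1.5 * mean_volume}
print(spikes)  # {'2016-08-04': '#obamaswishlist'}
```

The threshold multiplier is a tunable assumption; on the real series, each flagged day maps onto one of the dated hashtag spikes listed above.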

Conclusion

From our analysis, we learned that the fake accounts disguise themselves as (1) average Americans, (2) news sites with metropolitan names, or (3) international names, and describe themselves with relatable topics such as political and religious beliefs. They pursued their objective of influencing Twitter users by posting subjective and polarized tweets at opportunistic times, such as weekends when scandals and large announcements occurred. Finally, they grew aware of how conspicuous their posts were, subtly joining trending hashtags and injecting propaganda into them.

Thank you for taking the time to read our analysis of fake Russian tweets. Leave a like or comment if you can now talk about fake tweets without sounding fake, think there's anything else we should analyze, or have any opinions. Feel free to follow or connect with Stephen and/or David on LinkedIn, as we'll be posting more interesting articles and tutorials!
