Regardless of which side of politics you are on, the country in which you live, or your socio-economic status, you will most likely have heard of Donald Trump. This analysis is an assessment of Donald Trump’s tweets, aimed at understanding the changes in his themes as he moved from being a billionaire on The Apprentice to the President of the United States of America. This analysis was performed using tweets archived on TrumpTwitterArchive.com, Stata 14 and WordStat for Stata for Mac produced by Provalis Research. This analysis was performed out of general interest and was not funded by or supported by any organisation.

Overall observations

The key observation from the analysis is that Donald Trump is pro-American (no surprise there) and that the theme of “Make America Great” spans much of his Twitter output. Whether he makes the news, is the news or follows the news, his tweets reflect the general content of the news at each stage. The analysis also found that he tweets heavily about himself, with “Trump” being one of the most frequently used words across his statements. The other key observation is that he rarely tweeted about policies or the direction of the nation, choosing instead to use Twitter to promote himself or comment on others (e.g. competing candidates, visiting dignitaries or other celebrities). When the tweets are separated by stage of candidacy (i.e. Pre-Candidate, Pseudo-Candidate, Candidate, Presidential Campaign, President-Elect and President), his core themes change to reflect his goals and topics of interest, while he maintains a shameless penchant for self-promotion.

Method

The approach to conduct this analysis involved the following key steps:

1. Extract the tweets from TrumpTwitterArchive.com.

2. Segment the tweets, using Stata 14, into relevant stages of Trump’s pre-political and political career through to his presidency.

3. Use WordStat for Stata for Mac to identify word frequency, relationship and linkage to: identify core themes from the tweets; identify relationships between words; build a dictionary to categorise specific words with similar meaning or in similar categories; and analyse the text for each stage to see the impact of categorisation of his tweets.

4. Analyse the outputs from WordStat for Stata for Mac to identify trends in topics and changes in attitudes and develop an understanding of the core themes which are important to Donald Trump when he tweets.

Extracting the tweets

TrumpTwitterArchive.com presents itself as a complete log of tweets, retweets and mentions of @realDonaldTrump. As at 12 March 2017 it recorded a total of 30,612 tweets covering the period of 4 May 2009 to 12 March 2017. While Donald Trump began using Twitter in 2009 he did not begin investing significant time and effort into Twitter until his unofficial campaign launch in 2011.

Extracting the tweets from TrumpTwitterArchive.com was performed by using the date range filters available in TrumpTwitterArchive.com and then selecting the available tweets.

TrumpTwitterArchive.com records the date and time of each tweet, the tweet text and the device used to make the tweet. While TrumpTwitterArchive.com claims to be a complete list of tweets, retweets and mentions of @realDonaldTrump, the completeness of TrumpTwitterArchive.com is not part of this analysis and has not been assessed in this analysis. Potential discrepancies exist due to the restriction of public access to a limited subset of historical tweets through the Twitter website and due to individual users (such as Trump) being able to delete tweets and previously uploaded content. Further, TrumpTwitterArchive.com does not contain geographic location information or picture content included in tweets.

This analysis is limited by the scope of the information retained in TrumpTwitterArchive.com. Native Twitter retains additional information about a tweet including the geographic location at which the tweet was made and other meta-data. The geographic and meta-data information is not available in TrumpTwitterArchive.com.

Extracting the tweets from TrumpTwitterArchive.com was performed by selecting a date range of tweets and then using copy and paste to append the tweets to a text file of all tweets.

With the tweets in a single dataset, the tweets were then imported into Stata 14 to separate the tweets into their constituent parts using regular expression (regex) features of Stata 14. The following segments were extracted:

Date and time of tweet -> converted to -> Day, Month, Year

Twitter text -> converted to -> Tweet text and generating device
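The splitting step above can be illustrated outside Stata. The following is a minimal sketch in Python, not the Stata code used in this analysis. It assumes a hypothetical layout for each pasted line (timestamp, then tweet text, then the device in square brackets); the real layout of text copied from TrumpTwitterArchive.com may differ, and the names `LINE_PATTERN` and `parse_line` are illustrative only.

```python
import re
from datetime import datetime

# Hypothetical layout of one pasted line (an assumption, not the archive's documented format):
#   "Mar 12, 2017 09:41:23 AM Example tweet text [Twitter for Android]"
LINE_PATTERN = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+"
    r"(?P<text>.*?)\s+\[(?P<device>[^\]]+)\]$"
)

def parse_line(line):
    """Split one raw line into day, month, year, tweet text and device."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None  # the line does not follow the assumed layout
    stamp = datetime.strptime(match.group("stamp"), "%b %d, %Y %I:%M:%S %p")
    return {
        "day": stamp.day,
        "month": stamp.month,
        "year": stamp.year,
        "text": match.group("text"),
        "device": match.group("device"),
    }

# Made-up sample line, not a real tweet.
sample = "Mar 12, 2017 09:41:23 AM Example tweet text [Twitter for Android]"
record = parse_line(sample)
```

Returning `None` for lines that do not match makes it easy to count how many pasted lines fail the assumed layout before trusting the parsed dataset.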

Tweets were removed from the dataset using regular expressions to identify them. Removal covered public tweets directed to @realDonaldTrump, retweets by Donald Trump of content published by other users, and duplicate tweets. Because tweets are posted from multiple devices, a duplicate was defined as the exact same text on the same day (even if the time was different). The resulting set comprised 20,720 tweets covering the period from 4 May 2009 to 12 March 2017.
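The removal rules can be sketched as follows. This is an illustrative Python version under stated assumptions, not the Stata regular expressions actually used: the field names, the sample records and the `NOT_ORIGINAL` pattern for spotting retweets and quoted tweets are all hypothetical.

```python
import re

# Made-up records; field names are assumptions for illustration.
tweets = [
    {"date": "2016-08-30", "time": "08:01", "text": "Example statement one"},
    {"date": "2016-08-30", "time": "08:05", "text": "Example statement one"},  # same text, same day
    {"date": "2016-08-30", "time": "09:00", "text": "RT @someone: Example retweet"},
    {"date": "2016-08-31", "time": "10:00", "text": '"@someone: Example quoted tweet"'},
    {"date": "2016-08-31", "time": "11:00", "text": "Example statement one"},  # same text, different day
]

# Assumed markers of content not written by the account holder.
NOT_ORIGINAL = re.compile(r'^\s*(RT\s+@|"?@\w+:)')

def clean(records):
    """Drop retweets/quoted tweets, then drop same-day repeats of identical text."""
    seen = set()
    kept = []
    for rec in records:
        if NOT_ORIGINAL.match(rec["text"]):
            continue
        key = (rec["date"], rec["text"].strip())  # time is deliberately ignored
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept

cleaned = clean(tweets)
```

Keying the duplicate check on date and text, but not time, mirrors the rule that the same text posted twice on one day counts once, while the same text on a different day is retained.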

Segmenting the tweets

Using key dates in Donald Trump’s presidential campaign the tweets were divided into six stages of his candidacy:

Billionaire host of The Apprentice: 4 May 2009 – 12 January 2011. His pre-candidacy stage begins at the oldest records in TrumpTwitterArchive.com and ends at the beginning of his pseudo-candidacy. There were 199 tweets recorded in this stage.

Pseudo-candidacy: 13 January 2011 – 31 May 2015. His pseudo-candidacy starts in early 2011 when a website, shouldtrumprun.com, was established prior to the 2012 presidential election. There were 14,472 tweets recorded in this stage.

Candidate: 1 June 2015 – 22 July 2016. This stage runs from the announcement that he was running for president to his acceptance of the Republican nomination on 22 July 2016. There were 4,191 tweets recorded in this stage.

Presidential campaign: 23 July 2016 – 8 November 2016. His presidential campaign runs from his confirmation as the Republican nominee up to the date of the election. There were 1,205 tweets recorded in this stage.

President-elect: 9 November 2016 – 20 January 2017. This short stage runs from the announcement of his election win up to the date of inauguration. There were 324 tweets recorded in this stage.

President: 21 January 2017 – 12 March 2017. This stage covers the early days of his presidency up to 12 March 2017. There were 329 tweets recorded in this stage.
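The six date ranges above translate directly into a lookup. The segmentation in this analysis was done in Stata 14; the Python sketch below only illustrates the same rule, and the names `STAGES` and `stage_for` are hypothetical.

```python
from datetime import date

# Stage boundaries as listed above: (first day, last day inclusive, label).
STAGES = [
    (date(2009, 5, 4), date(2011, 1, 12), "Billionaire host of The Apprentice"),
    (date(2011, 1, 13), date(2015, 5, 31), "Pseudo-candidacy"),
    (date(2015, 6, 1), date(2016, 7, 22), "Candidate"),
    (date(2016, 7, 23), date(2016, 11, 8), "Presidential campaign"),
    (date(2016, 11, 9), date(2017, 1, 20), "President-elect"),
    (date(2017, 1, 21), date(2017, 3, 12), "President"),
]

def stage_for(tweet_date):
    """Return the stage label for a tweet date, or None if it is outside the study period."""
    for start, end, label in STAGES:
        if start <= tweet_date <= end:
            return label
    return None

print(stage_for(date(2016, 8, 30)))  # prints "Presidential campaign"
```

Because the ranges are contiguous and inclusive, every date from 4 May 2009 to 12 March 2017 falls into exactly one stage.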

It should be noted that Donald Trump continues to tweet daily and thus adds to the unstructured data being generated by this frequent user of Twitter. On becoming President of the United States of America, Donald Trump took over the coveted @POTUS Twitter handle. Tweets under this handle have not been included in this analysis.

There are, of course, other key dates in the election campaign that could be used to segment the tweets, such as the dates on which Republican or Democratic candidates dropped out of the race. This analysis, however, has focused on the six stages clearly identifiable in TrumpTwitterArchive.com.

General analysis

Using WordStat for Stata for Mac, the full list of 20,720 tweets was loaded, with the tweet text as the unstructured data to be analysed and the day, month and year as categorical variables for additional analysis. Using WordStat’s ability to identify the frequency and linkage of the words used by Donald Trump, a text relationship diagram was produced. This diagram (Diagram 1) shows the general themes and topics of Donald Trump across the eight years of tweets.

Diagram 1: Single word themes from eight years of tweets (top 300 words)

Diagram 1 shows the top 300 words used by Donald Trump on Twitter from 4 May 2009 to 12 March 2017, broken into 60 clusters of words used throughout this period. The size of each bubble indicates how frequently the word is used, while the location of the words is a general approximation (adjusted for the size of the diagram) of the relative proximity of the words to each other in Donald Trump's tweets. The cluster of light orange bubbles contains the keywords used by Donald Trump across the period. It stands out as a theme that rings true to the campaign he ran: “Make America Great”. Other standout themes are Barack Obama, Obamacare and Hillary Clinton. Notably, the most common word used in Donald Trump’s tweets is “Trump”.
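The bubble sizes rest on simple word counts. WordStat performs this step internally; the Python sketch below is only a rough stand-in. The tokenising rule, the small `STOP_WORDS` set and the example sentences are assumptions made for illustration (the sentences are made up, not real tweets).

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "is", "to", "and", "of", "in", "will", "we"}

def word_frequencies(texts):
    """Count how often each non-stop word appears across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z0-9']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

# Made-up example sentences.
examples = [
    "Trump will make America great",
    "Thank you America",
    "Trump Tower is great",
]
freq = word_frequencies(examples)
top_words = freq.most_common(3)  # the three most frequent words with their counts
```

Applied to the full tweet set, `most_common(300)` would give the kind of top-300 list that Diagram 1 is built from, although WordStat's own preprocessing may differ.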

The proximity plot below shows that MakeAmericaGreat (as three words merged) is the third most common word (phrase) used near “Trump” in his tweets. This clearly stands as his objective and one of his sales phrases.

Diagram 2: Proximity plot of words most commonly used with the word “Trump” across 8 years
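The idea behind a proximity plot can be approximated by counting which words appear close to a target word. WordStat uses its own co-occurrence measures, so the Python sketch below is a simplified stand-in rather than a reproduction; the window size of three words and the example sentences are assumptions.

```python
import re
from collections import Counter

def neighbours(texts, target, window=3):
    """Count words occurring within `window` positions of `target` in each text."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z0-9']+", text.lower())
        for i, word in enumerate(words):
            if word != target:
                continue
            start = max(0, i - window)
            nearby = words[start:i] + words[i + 1:i + 1 + window]
            counts.update(nearby)
    return counts

# Made-up example sentences, with the slogan already merged into one token.
examples = [
    "Trump will makeamericagreat",
    "Vote Trump to makeamericagreat",
    "Thank you Trump supporters",
]
near_trump = neighbours(examples, "trump")
```

Ranking `near_trump` by count gives a crude version of the ordering shown in the plot, where a merged phrase such as MakeAmericaGreat is treated as a single word.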

It is, however, when we separate out the six stages of Donald Trump’s Twitter campaign that we begin to see differences and changes in his approach, his themes and the topics which fascinate him.

Stage 1 – The billionaire host of The Apprentice

This stage can be classified as Donald Trump’s celebrity phase, which would later be superseded by his political phases. Measured by his use of Twitter, slogans and constant messaging, however, this celebrity phase pales in comparison to the later Twitter stages for @realDonaldTrump. In this stage he tweeted just 199 times across about 20 months, launching himself on Twitter and other social media platforms. The first tweet recorded on TrumpTwitterArchive.com is an invitation to join him on television later that night. This post is quickly followed by a request to become a fan on Facebook. With so few tweets, he did not have a lot to say. His main themes were his television appearances, Miss Universe, himself, and the hotels that he operates. The diagram below shows Donald Trump as his primary word, while things that he owns or controls closely follow, e.g. Apprentice, Miss (for Miss Universe), and his hotels.

Diagram 3: Stage 1 single word themes

Stage 2 – Pseudo-candidacy: The politicisation of Donald Trump’s social media

In early 2011 Donald Trump began a pseudo-campaign to run for president of the United States of America. A website, shouldtrumprun.com, was established, and people began to listen to his political statements and his calls for change in American politics. In this paper, we refer to this period as his pseudo-candidacy. He did not register as a candidate for the 2012 election but, as his tweets reveal, he ran a constant campaign on Twitter to displace the incumbent president and “makeamericagreat”. The diagram below (Diagram 4) shows that Donald Trump tweets about himself and promotes himself using age-old salesman techniques, such as excessive use of “Thanks” and “Thank” (thank you) to appeal to his audience (see the blue and green bubbles at the bottom centre of the diagram).

Diagram 4: Single word themes – Pseudo-candidacy

Diagram 4 shows that he believes in himself. He uses the word “great” with high frequency and in close proximity to himself. He also associates Obamacare with being a disaster, often suggesting that it is a “total disaster”.

The next diagram is extracted from the Dendrogram tab of WordStat for Stata. It shows the increasing politicisation of Donald Trump’s tweets across the years from 2011 to 2015, when he decided to formally become a candidate. A cluster of 13 words that did not appear in the prior stage stands out in the analysis of this stage. Key words in this cluster, notably Obama, President, Country, Election and Vote, indicate his politicisation as he began his pseudo-campaign. The 13 words are:

America

Obama

President

Donaldtrump

Run

Donald

Trump

Country

Leadership

Mr

Realdonaldtrump

Election

Vote

The diagram below shows the increasing prevalence of these words in his text year by year.

Diagram 5: Word cluster prevalence by year – Pseudo-candidacy
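One way to picture "prevalence by year" is the share of each year's tweets that contain at least one word from the cluster. That definition is an assumption; WordStat may compute prevalence differently. The Python sketch below uses the 13 cluster words listed above and made-up example records.

```python
import re

# The 13 cluster words listed above, lower-cased.
CLUSTER = {
    "america", "obama", "president", "donaldtrump", "run", "donald", "trump",
    "country", "leadership", "mr", "realdonaldtrump", "election", "vote",
}

def cluster_share_by_year(records):
    """Return {year: share of that year's tweets containing any cluster word}."""
    totals, hits = {}, {}
    for year, text in records:
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        totals[year] = totals.get(year, 0) + 1
        if words & CLUSTER:
            hits[year] = hits.get(year, 0) + 1
    return {year: hits.get(year, 0) / totals[year] for year in sorted(totals)}

# Made-up (year, text) pairs, not real tweets.
records = [
    (2011, "Watch the show tonight"),
    (2011, "Our country needs leadership"),
    (2012, "Get out and vote"),
    (2012, "The election is close"),
]
shares = cluster_share_by_year(records)
```

A rising share from one year to the next would correspond to the increasing politicisation described above.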

Donald Trump was also fascinated by Obamacare during this same stage. He used a cluster of words expressing his opinion of Obamacare. The cluster includes the words “disaster”, “Obamacare” and “website”. Two example tweets from his fascination with Obamacare are “just sit back and watch, Obamacare is such a disaster it will fall like a house of broken cards. The website is the best part of this mess!” and “Today is the day that obamacare website was supposed to be up and working.WRONG-website is closed down, a total disaster! 90 million doomed.”

Stage 3 – Candidate: Make America Great

In 2015 Donald Trump formally decided to run for president of the United States of America and launched an official campaign with a slogan of “Make America Great”. In this stage he issued 4,191 tweets with a strong cluster of words focusing on making America great and thanking people for supporting him. The diagram below shows that clustering. Again, the standout words in this stage are Trump, Great and Thank (as a part of thank you).

Diagram 6: Single word themes used while campaigning as a candidate

The emerging cluster in this stage is a new fascination with Hillary Clinton, with Donald Trump using Twitter to clearly express his opinion of her. The purple cluster above shows that he was fixated on “crooked Hillary Clinton”, linking the word “bad” with her, while also referencing President Obama. Donald Trump’s competitors for the Republican nomination also get a mention, but not to the extent of his Democratic rival Hillary. Interestingly, Donald Trump must not have considered Bernie Sanders a threat, as he mentions “Bernie Sanders” in only 21 tweets across the period while mentioning “Crooked Hillary” 123 times.

Stage 4 – Presidential campaign: The nominee

With his nomination confirmed by the Republican Convention, Donald Trump tweeted 1,205 times in the three and a half months between 23 July 2016 and 8 November 2016. That is an average of 11 tweets per day! A major change occurred in his tweets. For the first time in his Twitter campaign he mentioned someone else more than he mentioned himself, with Hillary Clinton (referenced as Hillary, HillaryClinton or #CrookedHillary) mentioned 281 times in the 1,205 tweets, which is roughly one mention for every four tweets. Obamacare remains a key theme and, finally, some semblance of potential policies begins to emerge in his tweets, with small clusters appearing relating to borders / immigration, corruption, jobs and crime. The famous Donald Trump line of “We will build a wall” became a well-known slogan of his campaign, featuring in a tweet within this stage: “From day one I said that I was going to build a great wall on the SOUTHERN BORDER, and much more. Stop illegal immigration. Watch Wednesday!” (30/8/2016). Interestingly, the wall is mentioned only rarely, with more references to “Wall Street” than to the “Great Wall Across the Southern Border”. His salesman technique of thanking people remained strong, with “Thank” still appearing as one of the most commonly used words.
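The two headline figures in this stage can be checked with simple arithmetic. The short Python sketch below uses the stage dates from the segmentation (23 July 2016 to 8 November 2016, counted inclusively) and the counts quoted above; the variable names are illustrative.

```python
from datetime import date

stage_start = date(2016, 7, 23)  # first day of the stage, per the segmentation
stage_end = date(2016, 11, 8)    # election day
tweets_in_stage = 1205
hillary_mentions = 281

days = (stage_end - stage_start).days + 1            # inclusive day count
tweets_per_day = tweets_in_stage / days              # average tweets per day
mentions_per_tweet = hillary_mentions / tweets_in_stage

print(days, round(tweets_per_day, 1), round(mentions_per_tweet * 100, 1))
# prints: 109 11.1 23.3
```

So the stage spans 109 days, giving about 11 tweets per day, and the 281 mentions amount to about 23 mentions per 100 tweets, a little under one-quarter.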

Diagram 7: Single word themes from Donald Trump’s campaign as Republican nominee

Emerging in this stage is a precursor cluster to future stages, as Donald Trump begins his campaign of denouncing the media with statements about the media being “totally dishonest”. See the khaki coloured bubbles in the upper right section of the above diagram: “wow”, “media”, “dishonest” and “totally”.

Stage 5 – President-elect: The elected president thanks his supporters

This relatively short stage from 9 November 2016 to 20 January 2017 is where Donald Trump thanks his supporters for giving him the presidency. Hillary has not gone away in his tweets, nor have his comments about the media, but the new cluster for Donald Trump contains “Bad” and “Russia” as he realises that this new role will involve international diplomacy.

Diagram 8: Single word themes – President-elect

Stage 6 – President: The world leader?

Donald Trump continues to tweet. While he is now the holder of the prestigious @POTUS account, with Barack Obama relegated to @POTUS44, he continues to tweet under his personal account @realDonaldTrump, using each account to promote the other. With 329 tweets in the period to 12 March 2017, he has new themes and new fascinations as President. He remains committed to using the word “great” in his tweets and is still enthralled by President Obama (his predecessor). The theme of Russia extends into this new stage, but he is now aware of the need for national security, while still responding to the media with comments about fake news and the failings of the New York Times. References to himself as “Trump” have decreased in this stage, but his salesmanship has not gone away, as “Thank”, “Great” and “People” remain leading words.

Diagram 9: Single word themes – As president to 12 March 2017

Categorisation

The analysis of Donald Trump’s tweets to this point in this paper has just relied on linkages between single words (some of which are merged words e.g. makeamericagreatagain or crookedhillary). Using WordStat for Stata for Mac it was possible to build word categories to merge some of the topics. A categorisation dictionary was built with the following categories:

Politics

Television

Hotels

Media

Opinions

US places

Famous people

Self

Using the features of WordStat to identify misspellings, phrases and named entities, categories were compiled. Analysis of these categories shows that across the period Donald Trump remains confident in himself and his hotels with a strong linkage between positive statements and these terms (as seen by the clustering of green bubbles). Hillary Clinton stands on her own closely related with negative comments (as seen in the lilac coloured bubbles at the top of the diagram).

Diagram 10: Categorised themes
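A categorisation dictionary of this kind maps individual words to a broader category, so that counts are made per category rather than per word. The Python sketch below illustrates the idea only. The category names come from the list above, but the member words shown are hypothetical examples, not the dictionary actually built in WordStat.

```python
import re
from collections import Counter

# Category names are from the list above; the member words are hypothetical.
DICTIONARY = {
    "Politics": {"election", "vote", "obamacare"},
    "Television": {"apprentice"},
    "Hotels": {"hotel", "tower"},
    "Media": {"media", "news"},
    "Self": {"trump", "donald"},
}

# Invert the dictionary so each word points to its category.
WORD_TO_CATEGORY = {
    word: category for category, words in DICTIONARY.items() for word in words
}

def categorise(text):
    """Count how many words in the text fall into each category."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(WORD_TO_CATEGORY[w] for w in words if w in WORD_TO_CATEGORY)

# Made-up example sentence, not a real tweet.
counts = categorise("Trump hotel on the news after the election vote")
```

Counting by category merges related words, such as "election" and "vote", into a single theme, which is what allows the categorised diagram to show broader topics than the single-word diagrams.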

Conclusion

Donald Trump has used Twitter to push his social agenda, his campaign and his point of view with the bravado of the salesman that he is. The analysis shows that his social agenda, his campaign and his views are Donald Trump-centric: both the word analysis and the categorised analysis show that his most frequent topic is himself and the things that he owns. The analysis also shows that he uses Twitter to talk other people down while building himself up. His Twitter use avoided the topics that are frequently debated in politics, such as jobs, growth, security and health, and even as president he fell back on his standard Twitter persona to promote himself, talk down others and avoid the issues that face real Americans. This analysis of his Twitter posts shows the world according to Donald Trump, his fascinations and his concerns. It also shows that his underlying themes change and are driven by events around him, while still maintaining a Donald Trump-centric agenda. Clearly, the constant across more than 20,000 tweets is that he holds himself in high regard.

Where to next with this analysis?

1. An interesting extension of the analysis would be to study the relationship between Donald Trump’s tweets and general media to answer the question: does Donald Trump follow the news or does he generate the news? And if he moved from one to the other, when did this transition occur?

2. This extension could be built into a monthly blog to watch his topics change and then map that to general news headlines.

3. A separate analysis could be performed to identify linkages between posts from @POTUS and @realDonaldTrump.



