From the chart we can see that, in general, the last use of the Android phone is on 8 March 2018. Two Android tweets appear again on 25 March, but they look innocuous, so for simplicity we’ll disregard them; no more Android tweets appear from April until the end of the dataset.

Based on this I then split the data in two: pre-iPhone and post-iPhone. For the pre-iPhone data we can assume that any Android tweets come from Trump and any other tweets come from Staff. For the post-iPhone data, we can no longer make that assumption. This means that we need to train our neural network model to distinguish Trump and Staff tweets on the pre-iPhone data, and then use that model to predict the author of each tweet in the post-iPhone data. Easy!
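
As a rough sketch of that split, assume each tweet is a record with a date and a source field. The field names, the placeholder rows and the helper `split_and_label` are invented here for illustration, and the cutoff is simply the 8 March 2018 date read off the chart.

```python
from datetime import date

# Cutoff taken from the chart discussion above; treat it as an assumption.
CUTOFF = date(2018, 3, 8)

def split_and_label(tweets, cutoff=CUTOFF):
    """Return (pre, post): labelled pre-iPhone tweets and unlabelled post-iPhone tweets."""
    pre, post = [], []
    for tweet in tweets:
        if tweet["date"] <= cutoff:
            # Pre-iPhone rule from the text: Android means Trump, anything else means Staff.
            label = "Trump" if "Android" in tweet["source"] else "Staff"
            pre.append({**tweet, "label": label})
        else:
            # Post-iPhone tweets get no label; the model has to predict it later.
            post.append(dict(tweet))
    return pre, post

# Placeholder rows, not real tweets.
sample = [
    {"date": date(2018, 1, 5), "source": "Twitter for Android", "text": "placeholder a"},
    {"date": date(2018, 1, 5), "source": "Twitter for iPhone", "text": "placeholder b"},
    {"date": date(2018, 4, 2), "source": "Twitter for iPhone", "text": "placeholder c"},
]
pre_iphone, post_iphone = split_and_label(sample)
print([t["label"] for t in pre_iphone])  # ['Trump', 'Staff']
print(len(post_iphone))                  # 1
```

With this rule the two stray Android tweets after the cutoff simply land in the unlabelled post-iPhone set, which matches the decision to disregard them.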

Neural Networks like Numbers - Lots of Numbers

The first step before training the network was to make the words comprehensible to it. We do this by fitting a TF-IDF vectorizer from the Python library scikit-learn to the entirety of the text data. This stage essentially creates a little model that keeps a record of every unique word across all of the tweets and how prevalent it is across the data. We keep this model handy because we’re going to need it to transform our different sets of text data into a numerical pattern that the neural network can understand. Vectorizers work by transforming a collection of texts into a very wide spreadsheet, with a row for each document and a column for each possible word. The value in each cell can be the number of times that word occurs in the document, a 1 or a 0 for whether it occurs at all, or, in the case of TF-IDF, a score based on how significant that word is for that document, considering both the document itself and all the other documents in the dataset. This is demonstrated below with two extracts from Alice in Wonderland and the two-row array of TF-IDF word scores.
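
For readers who want to try this themselves, here is a minimal sketch using scikit-learn's `TfidfVectorizer` with its default settings. The two sentences are stand-in extracts picked for this sketch and may not be the ones used in the original illustration. The settings used for the real tweets aren't stated, so the defaults here are an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two stand-in extracts from Alice in Wonderland (chosen for this sketch).
docs = [
    "Alice was beginning to get very tired of sitting by her sister on the bank",
    "The rabbit-hole went straight on like a tunnel for some way",
]

vectorizer = TfidfVectorizer()           # defaults: lowercases text, keeps words of 2+ characters
scores = vectorizer.fit_transform(docs)  # learns the vocabulary, returns a sparse score matrix
dense = scores.toarray()                 # the "wide spreadsheet": one row per document

print(dense.shape)                       # (number of documents, number of unique words)

# vocabulary_ maps each word to its column, so we can compare one word across rows.
col = vectorizer.vocabulary_["alice"]
print(dense[0, col], dense[1, col])      # positive in the first extract, 0.0 in the second
```

The fitted `vectorizer` object is the "little model" to keep handy: calling its `transform` method on other text later produces rows with exactly the same columns.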

But how does it “know”?! — Training a Neural Network

For classification, neural networks are trained by feeding them lots of examples of your data and, crucially, providing the expected output, or label, that the model should predict. With every example the model compares its prediction to the ‘correct’ label, makes adjustments to itself, and tries again. Training continues until the model is as close as possible to the ideal scenario of correctly predicting the classification for every instance of the training data. In our case we give the model lots of tweets, and with each tweet we tell it whether it is a Trump tweet or a Staff tweet, until its predictions are very close to the correct categories.
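
To make the "predict, compare, adjust" loop concrete, here is a deliberately tiny sketch: a single artificial neuron rather than a full network, trained on four made-up examples with two numeric features each. The data, the learning rate and the function names are all assumptions for illustration. A real network has many such units in layers, but the update idea is the same.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0..1 so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, labels, epochs=200, lr=0.5):
    """Train one neuron by nudging its weights after every example."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y  # compare the prediction with the 'correct' label
            # Adjust each weight a little in the direction that reduces the error.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(weights, bias, x):
    """Return 1 or 0 depending on which side of 0.5 the neuron's output falls."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0

# Made-up training data: label 1 when the first feature dominates, 0 otherwise.
examples = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2], [0.1, 1.0]]
labels = [1, 0, 1, 0]

weights, bias = train(examples, labels)
print([predict(weights, bias, x) for x in examples])  # [1, 0, 1, 0]
```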

Often, after training across many iterations, a neural network model will become highly accurate at matching its predictions to the expected outcomes, but what we really want to know is how well it will predict categories for tweets it has never seen. We can evaluate this by holding back a percentage of our labelled data so that those tweets are never seen by the model in training. We then ask the model to predict the author of each held-back tweet and check its predictions against the category that we’ve already assigned. The final model that I built was 97.6% accurate in predicting the labels of the training data it had seen, and 76.8% accurate for tweets it had never seen before. However, one doesn’t just build a model. First you build 120 models!
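
Here is a small sketch of that hold-back evaluation. It uses scikit-learn's `train_test_split` and `MLPClassifier` as stand-ins, since the framework used for the real model isn't stated. The texts are invented placeholders rather than real tweets, and the network size, the 25% hold-back and the random seed are arbitrary assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Invented placeholder texts, not real tweets.
texts = [
    "huge crowd tonight amazing", "so unfair very sad", "amazing rally huge win",
    "very unfair total mess", "huge win tonight sad", "total mess so unfair",
    "join us live at noon", "watch the briefing live", "read the full statement here",
    "join the briefing at noon", "watch live read more here", "full statement at noon",
]
labels = ["Trump"] * 6 + ["Staff"] * 6

# As in the article, the vectorizer is fitted on all of the text.
X = TfidfVectorizer().fit_transform(texts)

# Hold back 25% of the labelled examples; the model never sees them in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0
)

model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # accuracy on examples seen in training
test_acc = model.score(X_test, y_test)     # accuracy on the held-back examples
print(f"seen: {train_acc:.3f}  unseen: {test_acc:.3f}")
```

With only a dozen toy examples the numbers themselves mean little. The point is the pattern of comparing the two scores, which is how figures like 97.6% and 76.8% come about.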

Choosing the right Neural Network for you…

Often when working with text data, improvements in machine learning come through better curation of the training data: in our case, for example, by either focusing just on iPhone and Android tweets, or considering whether Trump may be responsible for tweets from other platforms as well. We might also consider whether better pre-processing of the text (cleaning out noise, correctly identifying emojis, removing URLs, etc.) would help. However, it could also be that these noisy elements are actually the most informative to the model. In our case, after cleaning out the URLs, hashtags and emojis from the tweets, I found the model’s accuracy plummeted by around 20%, indicating that it was these very features that were aiding identification.

Crucially, though, a key part of building and using neural networks is choosing how many layers your network will have and how many nodes go in each layer. These aren’t necessarily things that can be chosen algorithmically, and they differ depending on the data. One suggestion is to run trials on different model shapes, that is, different numbers of layers and nodes. For every model shape you train and evaluate the model multiple times, each time using a different subset of the training data. You then take the average score of each shape’s trials and see which shape generally performed best on small chunks of the training data. As you can see, after evaluating 120 independently built models, the best result was model 010, a neural network of 4 hidden layers where each hidden layer contained 1000 nodes. Note that simply increasing nodes and layers does not necessarily increase predictive power; the right model ‘shape’ often depends on the data you are working with.
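
One common way to run such trials is k-fold cross-validation, sketched below with scikit-learn's `cross_val_score`. This is an assumption about the general approach, not a reconstruction of the exact procedure behind the 120 models. The data is synthetic, and the candidate shapes are kept tiny so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; the real input would be the TF-IDF rows and their labels.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Candidate shapes: 1 or 2 hidden layers, each with 8 or 32 nodes.
shapes = [(nodes,) * layers for layers in (1, 2) for nodes in (8, 32)]

mean_scores = {}
for shape in shapes:
    model = MLPClassifier(hidden_layer_sizes=shape, max_iter=300, random_state=0)
    # cv=3 trains and scores the shape three times, each on a different split of the data.
    fold_scores = cross_val_score(model, X, y, cv=3)
    mean_scores[shape] = fold_scores.mean()

# Pick the shape with the best average score across its trials.
best_shape = max(mean_scores, key=mean_scores.get)
for shape, score in mean_scores.items():
    print(shape, round(score, 3))
print("best shape:", best_shape)
```

Here four shapes with three folds each means twelve trained models, so the count grows quickly as more layer and node options are added.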

Making this graph required building and training 120 different neural networks. You’d think it would look more impressive.

The Grand Finale: Finding Trump

Having determined the best model shape to use (4 hidden layers of 1000 nodes), we build a brand-new model and train it on all the training data we have. We then turn to our so far neglected post-iPhone dataset: the tweets from after Trump switched to iPhone. Here we have no ‘correct’ labels, as we are no longer certain which tweets came from Trump’s phone and which from his Staff; they all come from the iPhone platform. We transform the text using our original TF-IDF vectorizer and feed it to the model, asking it for a prediction: Staff or Trump? In terms of activity, we can see the model produces a relatively even split between the tweets, with more predicted activity from Staff on some days and more predicted activity from Trump on others.
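
The final step might look roughly like the sketch below. The key detail is that the new tweets go through the already-fitted vectorizer's `transform` method and the vectorizer is not refitted, so the columns line up with what the model was trained on. The texts and dates are invented placeholders, and `MLPClassifier` with a small hidden layer again stands in for the real network, whose framework isn't stated.

```python
from collections import Counter, defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

# Invented placeholder data, not real tweets.
train_texts = [
    "huge crowd tonight amazing", "so unfair very sad", "amazing rally huge win",
    "join us live at noon", "watch the briefing live", "read the full statement here",
]
train_labels = ["Trump"] * 3 + ["Staff"] * 3
new_tweets = [  # (day, text) pairs from the unlabelled post-iPhone period
    ("2018-04-02", "huge win so unfair"),
    ("2018-04-02", "watch the statement live"),
    ("2018-04-03", "amazing crowd tonight"),
]
new_texts = [text for _, text in new_tweets]

# Fit the vectorizer once on all of the text, as described earlier in the article.
vectorizer = TfidfVectorizer().fit(train_texts + new_texts)

model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
model.fit(vectorizer.transform(train_texts), train_labels)

# transform (not fit_transform): reuse the same vocabulary and columns.
X_new = vectorizer.transform(new_texts)
predictions = model.predict(X_new)

# Tally predicted authors per day to see who was more active when.
per_day = defaultdict(Counter)
for (day, _), label in zip(new_tweets, predictions):
    per_day[day][str(label)] += 1

for day in sorted(per_day):
    print(day, dict(per_day[day]))
```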