In the world’s current political climate, propaganda is the name of the game and Twitter is the medium of choice. Automation is king, and if you’re not using Twitter bots to sway the masses, you’re doing it wrong. Here at StdLib, we don’t really have any political motivations, but we sure do enjoy building bots. And, with the launch of StdLib Sourcecode, it’s never been easier for us to share our newest project with you: introducing Jaden Trudeau, the eccentric future Prime Minister of Canada. We’ll teach you all about how we built this wonder of modern engineering, and how you can build your own “Political Terminator” (shoutout to Arnie) in minutes.

With the goal of building a Twitter bot that appeals to the masses, we chose to combine the wisdom of Jaden Smith:

With the wholesomeness of Justin Trudeau:

To create the world’s perfect politician: Jaden Trudeau. More specifically, the goal is to create a bot that occasionally tweets procedurally generated sentences in the style of Jaden Smith and Justin Trudeau. This combination results in wonderful specimens such as:

The tool of choice for this project is a Markov chain. Markov chains have many real-world applications, like Google’s PageRank algorithm, but none are as important as this one. If you want to skip to the working version of the code, you can check out its API page here. From that page you can try the service yourself, and even mix in other people’s Twitter accounts!

Coming to an election near you — Jaden Trudeau

What’s the deal with Markov Chains?

We describe a Markov chain as follows: We have a set of states, S = {s_1, s_2, …, s_r}. The process starts in one of these states and moves successively from one state to another. Each move is called a step. If the chain is currently in state s_i, then it moves to state s_j at the next step with a probability denoted by p_ij, and this probability does not depend upon which states the chain was in before the current state. [source]

A two state Markov chain [source]

In short, a Markov chain is a mathematical model that transitions from one state to another, ignoring the history of previous states and examining only the present one. While that explanation is still a bit abstract, it becomes clearer in the context of generating sentences. Below is an outline of how you might generate text using a Markov chain.

1. Split a body of text (your corpus) into tokens (words and punctuation).
2. Build a frequency table. This data structure has a key for every unique token in your corpus, mapped to a list of all the tokens that follow that key, along with how often each one follows it. It also helps to add special keys for the start and end of sentences; this ensures that when sampling from the model you can always start and end sentences with appropriate words.
3. Select a starting point (one of those special start keys), then randomly select a token from the list of tokens that follow it. The probability that a token is chosen should be proportional to how often it appears after the key. This new token is now the state of the Markov chain.
4. Look up the new token in the frequency table and repeat until you reach an end key.
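The outline above can be sketched in a few lines of JavaScript. This is an illustrative sketch, not the bot’s actual code; the `__START`/`__END` keys and the table shape (`{ token: { nextToken: count, … } }`) are assumptions based on the description above.

```javascript
// Pick a key from a frequency map with probability proportional to its count.
function weightedChoice(freqs) {
  const total = Object.values(freqs).reduce((sum, count) => sum + count, 0);
  let r = Math.random() * total;
  for (const [token, count] of Object.entries(freqs)) {
    r -= count;
    if (r <= 0) return token;
  }
}

// Walk the chain from the special start key until the end key is reached.
function generate(table) {
  const words = [];
  let state = '__START';
  while (true) {
    state = weightedChoice(table[state]);
    if (state === '__END') break;
    words.push(state);
  }
  return words.join(' ');
}
```

Each call to `generate` walks one random path through the table, so repeated calls produce different sentences.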

Implementation

With a general idea of how to proceed, it’s time to get going. First things first, we need to fetch some tweets. With Twit, that’s no problem.
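The fetching code looks roughly like the sketch below. The screen name, the environment-variable names, and the `timelineParams` helper are illustrative assumptions rather than the bot’s real configuration; `T.get('statuses/user_timeline', …)` is Twit’s standard way to read a user’s timeline.

```javascript
// Build the query for Twitter's user_timeline endpoint.
function timelineParams(screenName, count) {
  return { screen_name: screenName, count: count, trim_user: true };
}

// Fetch a user's recent tweets and pass their text to the callback.
function fetchTweets(screenName, callback) {
  const Twit = require('twit'); // needs valid Twitter app credentials
  const T = new Twit({
    consumer_key: process.env.CONSUMER_KEY,
    consumer_secret: process.env.CONSUMER_SECRET,
    access_token: process.env.ACCESS_TOKEN,
    access_token_secret: process.env.ACCESS_TOKEN_SECRET
  });
  T.get('statuses/user_timeline', timelineParams(screenName, 200), (err, tweets) => {
    if (err) return callback(err);
    callback(null, tweets.map((t) => t.text));
  });
}
```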

After receiving the tweets, they need to be tokenized. With tweets, this is not an entirely trivial process: tweets are full of URLs, emoji, and ill-formed sentences. We can turn a string representing a tweet into an array of tokens with the code below:
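A minimal sketch of such a tokenizer (the regexes are assumptions; the real implementation may handle more Twitter-specific edge cases):

```javascript
// Strip URLs and @-mentions from a tweet, then split it into tokens on whitespace.
function tokenize(tweet) {
  return tweet
    .replace(/https?:\/\/\S+/g, '') // drop URLs
    .replace(/@\w+/g, '')           // drop @-mentions
    .split(/\s+/)
    .filter((token) => token.length > 0);
}
```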

This function takes in a tweet, strips it of URLs and mentions, and splits it into words. These arrays can then be fed into the frequency table.

The code to generate the table is a little long for a Medium post, but you can see it here. After the table is generated, entries look like this:
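The full table-building code is linked above; a condensed sketch of the same idea, with an entry shaped like the traversal described next (the exact counts are illustrative), might look like:

```javascript
// Sentence-boundary markers so sampling can begin and end cleanly.
const START = '__START';
const END = '__END';

// Build a frequency table from arrays of tokens (one array per tweet).
function buildTable(tokenizedTweets) {
  const table = {};
  for (const tokens of tokenizedTweets) {
    const padded = [START, ...tokens, END];
    for (let i = 0; i < padded.length - 1; i++) {
      const word = padded[i];
      const next = padded[i + 1];
      if (!table[word]) table[word] = {};
      table[word][next] = (table[word][next] || 0) + 1;
    }
  }
  return table;
}

// An entry consistent with the traversal described below (counts illustrative):
// {
//   __START: { our: 1, we: 1 },
//   our: { future: 2, differences: 2, relationship: 1 },
//   ...
// }
```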

These entries can be traversed in a few ways. At the beginning, there is a 50/50 chance of selecting ‘our’ or ‘we’ as the starting word. Assuming ‘our’ gets chosen, there is then a 2/5 chance that ‘future’ or ‘differences’ gets chosen next and a 1/5 chance for ‘relationship’. This process repeats until a chain is created, such as:

__START -> our -> future -> office -> __END