Colbert AI v2.0 Banner

A long time ago my friends and I were watching “The Late Show with Stephen Colbert”, and we simply couldn’t get enough of our favourite host. We wished we could have our own Colbert at home. It might sound like some evil abduction plan, but it did spark an idea.

“What if we built our own Stephen Colbert?”

Now, that does sound like something a mad scientist working on human cloning from your hair-follicle DNA would do, but we took a safer approach: building an AI that can mimic the humour of Stephen Colbert. We called it “Colbert AI”. As you can see, it says “v2.0”, which implies we weren’t happy with the first version, so we decided to build a second.

What is Colbert AI?

Colbert AI is a Deep Learning Language Model that generates text in the style of Stephen Colbert’s famous monologues.

To build it, we used a state-of-the-art deep learning language model, OpenAI’s GPT-2, and fine-tuned it using text from YouTube video captions.

Does it work?

Now, there are some people out there who think trump’s a bad person. For instance, this weekend, I watched the presidential candidate’s first candidate round-up, and he was named “The man who can’t get anything he wants to get right.” ( cheers and applause ) that’s a good quality. That’s a good quality, because the only person who can’t get anything right is Donald Trump. ( laughter ) and I’m not sure he’s read the new book, “The man who can’t get anything wrong.” This is a big day for the president of the united states. Trump is about to be released from impala. (laughter) (applause) and this is huge news because this is a big week for him because the court has decided that he can no longer use the n-word, because, in a letter to his staff, the president said, “If I didn’t use the n-word, then why are all the other white house staff members calling me a cuck?!” (laughter) (applause) (cheers and applause) (piano riff) and trump’s not the only person who has been in jail for the “N-word.” last week, Austin turns out to be a founder of “N-god,” which was also the name of a movie. (cheers and applause) and now trump is going to have a new “N-god.” (laughter) and, of course, the “N-god”

Yes, it does! Above is a sample of the text generated by Colbert AI.

As you can probably tell, it’s not as polished as OpenAI’s unicorn samples. That’s because we only used the 355M model, compared to the 1.5B model behind OpenAI’s unicorn samples. We are still working on making it better. Maybe one day we’ll get the AI its own Late Show.

This sums up our experience.

Sooo... How did we build it?

Right on, asking the real questions here.

We used OpenAI’s GPT-2 and fine-tuned it on data we collected from the Late Show’s video captions on YouTube. We used Hugging Face Transformers and PyTorch to load the GPT-2 355M model and fine-tune it.

Let’s dive in and show you how to build your own.

The GPT-2

Image Credit: OpenAI

OpenAI is an artificial intelligence research company whose stated mission is to discover the path to safe artificial general intelligence. Almost a year ago, they released GPT-2.

GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data. On language tasks like question answering, reading comprehension, summarization, and translation, GPT-2 begins to learn these tasks from the raw text, using no task-specific training data. While scores on these downstream tasks are far from state-of-the-art, they suggest that the tasks can benefit from unsupervised techniques, given sufficient data and compute.

Fine-Tuning

Training deep learning models from scratch can cost tens of thousands of dollars, maybe even more. Fine-tuning is therefore the way to go, as many generous organizations have made their trained models open source.

Hugging Face has made many pre-trained Transformer models available, and they can be loaded very quickly using PyTorch.
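To give you a feel for how little code this takes, here is a minimal sketch of loading the pre-trained medium (355M) GPT-2 checkpoint and its tokenizer with Hugging Face Transformers. This is an illustration rather than our exact setup, and class names may differ slightly between library versions.

```python
# Minimal sketch: load the pre-trained GPT-2 medium (355M) model and its
# tokenizer with Hugging Face Transformers, and move the model to GPU if one
# is available. Illustrative, not our exact code.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").to(device)
```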

As mentioned earlier, we used captions from YouTube videos as our dataset. The script that collects them is available on our GitHub.
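As a rough illustration of what that collection step looks like, a package such as youtube_transcript_api can pull captions for a list of video IDs and write them into one training text file. The video IDs and file name below are placeholders; our actual script is the one on GitHub.

```python
# Illustrative only: fetch YouTube captions with the youtube_transcript_api
# package and concatenate them into a single plain-text training file.
# The video IDs and output file name are placeholders.
from youtube_transcript_api import YouTubeTranscriptApi

video_ids = ["VIDEO_ID_1", "VIDEO_ID_2"]  # monologue videos (placeholders)

with open("colbert_captions.txt", "w", encoding="utf-8") as f:
    for vid in video_ids:
        transcript = YouTubeTranscriptApi.get_transcript(vid)
        text = " ".join(chunk["text"] for chunk in transcript)
        f.write(text + "\n")
```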

GPT-2 comes in four sizes: small, medium, large, and XL, with 124M, 355M, 774M, and 1.5B parameters respectively. Since we were working in Google Colab, we could only fine-tune the medium model with 355M parameters.

Image by Jay Alammar
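To give a flavour of the fine-tuning step itself, here is a heavily simplified PyTorch training loop over the caption text. It assumes the placeholder caption file from the sketch above and leaves out things like gradient accumulation, learning-rate scheduling, and checkpointing, so treat it as a sketch rather than our actual training code.

```python
# Simplified fine-tuning sketch (not our exact training code): split the
# caption text into fixed-length token blocks and run a basic language-model
# training loop. GPT-2 computes the language-modeling loss internally when
# labels are passed alongside the input IDs.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").to(device)

# Tokenize the whole caption file and chop it into 512-token blocks.
text = open("colbert_captions.txt", encoding="utf-8").read()
ids = tokenizer.encode(text)
block_size = 512
blocks = [ids[i:i + block_size] for i in range(0, len(ids) - block_size, block_size)]
loader = DataLoader(TensorDataset(torch.tensor(blocks)), batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for (batch,) in loader:
        batch = batch.to(device)
        # Passing labels makes the model shift them internally and return the loss.
        loss = model(batch, labels=batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Save the fine-tuned weights and tokenizer to a local directory.
model.save_pretrained("colbert-gpt2-medium")
tokenizer.save_pretrained("colbert-gpt2-medium")
```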

In the following Gist, we demonstrate how to generate text using the fine-tuned medium-size GPT-2 from Hugging Face.
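If you can’t see the Gist, the sketch below shows roughly what generation looks like: load the fine-tuned checkpoint (the placeholder directory saved in the sketch above) and sample with Transformers’ generate method. The prompt and sampling settings here are illustrative, not the exact ones from our Gist.

```python
# Illustrative generation sketch: load the fine-tuned checkpoint and sample a
# monologue-style continuation with top-k / nucleus sampling.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2Tokenizer.from_pretrained("colbert-gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("colbert-gpt2-medium").to(device)
model.eval()

prompt = "Now, there are some people out there who think"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=200,
        do_sample=True,                        # sample instead of greedy decoding
        top_k=50,                              # keep the 50 most likely next tokens
        top_p=0.95,                            # nucleus sampling
        temperature=0.9,
        pad_token_id=tokenizer.eos_token_id,   # silence the padding warning
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```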

Conclusion

The results were pretty impressive, but there’s more work to be done. We are also considering the mad-scientist approach where we take a hair sample and make a clone (obviously kidding).

The best part is that the same approach works for anyone. Maybe you’ll make an AI version of Neil deGrasse Tyson. If you do, please share it with us; that would be really amazing. You can reach out to us on Twitter (Abbas and Shubham).

Hope you enjoyed reading the blog.