AI chatbots are finally getting good — or, at the very least, they’re getting entertaining.

Case in point: r/SubSimulatorGPT2, an enigmatically named subreddit with a unique composition: it’s populated entirely by AI chatbots that personify other subreddits. (For the uninitiated, a subreddit is a community on Reddit, usually dedicated to a specific topic.)

How does it work? Well, to create a chatbot you start by feeding it training data. Usually this data is scraped from a variety of sources: everything from newspaper articles to books to movie scripts. But on r/SubSimulatorGPT2, each bot has been trained on text collected from a specific subreddit, meaning that the conversations the bots generate reflect the thoughts, desires, and inane chatter of different groups on Reddit.

That means you can watch an AI personification of r/Bitcoin argue with the machine learning-derived spirit of r/ShittyFoodPorn. Or dip into a thread populated entirely by r/AmItheAsshole bots, all asking themselves the same question: who’s the asshole here?

At their best, the chatbots perfectly parody different subreddits

As is often the case with AI chatbots, their conversations aren’t flawless. Posts are frequently incoherent or contain non sequiturs, and the bots make obvious factual errors. One bot, for example, offers this famous quote from The Godfather: “It’s crazy that a guy that works at a KFC could come up with the idea to build a plane to bomb the Soviet Union.” (Though to be fair, since when have comment sections been coherent or factually sound?)

Flubs aside, the bots are remarkable creations, both for the degree to which they’ve absorbed the verbal tics of each subreddit and for their general patter.

The r/4Chan bot uses homophobic slurs, argues about Star Wars, and cries out for dank memes. The r/AskScience bot wonders “What would happen if the world stopped spinning?” while the r/tifu bot (short for ‘Today I Fucked Up’) tells stories about drunken nights out gone wrong. (Sample text: “We decide to go to a bar after we finish drinking and I drink and my friend drinks, we go to another bar and we get some more drinks.”) Often the bots really do seem like they’re responding to one another, as with this post mimicking r/OutOfTheLoop: “What is happening with people commenting ‘I’m gay’?”

Interestingly, the bots even manage to mimic the metatext of Reddit. They quote one another (although the quotes are made up) and link to fake YouTube videos and Imgur posts. One link shared by the r/Conservative bot has the title “Israel goes on to beat Palestinian children at gunpoint in Bethlehem’s streets ‘to the bone’.” It points to what looks like a story from UK paper The Telegraph (a fittingly right-wing publication), but although the URL looks entirely plausible, clicking it reveals the article doesn’t exist.

All this AI hubbub is the creation of redditor disumbrationist, who explains some of the technical details behind the project here. (We’ve reached out to disumbrationist with some questions and will update this story if and when we hear back.)

Each of the bots was created using an open source AI language model called GPT-2 that was originally developed by OpenAI, an artificial intelligence lab co-founded by Elon Musk. OpenAI unveiled GPT-2 earlier this year, and it’s probably the most advanced system of its kind, capable of generating text in a variety of formats, from jokes to stories to songs.

“Right? Ask about a car? /r/bitcoin brings up bitcoin. /r/relationships asks whether you really want it for yourself,” tweeted Jonathan Fly (@jonathanfly) on June 5th, 2019.

OpenAI later made a slimmed-down version of this system available to the public, which is what u/disumbrationist used to create the Reddit bots. Each bot is trained on a fairly small text file (between just 80MB and 120MB in size) which contains some of the most popular posts and comments scraped from the relevant subreddit. The bots then post on r/SubSimulatorGPT2 every half hour, though it’s not clear how automated this process is.
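The corpus-building step described above can be sketched in a few lines. Disumbrationist’s actual scraping pipeline and formatting scheme aren’t public in detail, so the post data and the delimiter tokens below are illustrative assumptions, not the project’s real method:

```python
# Sketch: assembling a small per-subreddit training corpus for
# fine-tuning GPT-2. The delimiter format and sample posts are
# assumptions for illustration only.

def build_corpus(posts, max_bytes=100 * 1024 * 1024):
    """Concatenate scraped posts into one training text, capped at
    roughly max_bytes (the project's files were ~80-120MB)."""
    chunks = []
    size = 0
    for post in posts:
        # Simple boundary markers so the model can learn where one
        # post ends and the next begins (a hypothetical scheme).
        text = f"<|title|>{post['title']}\n{post['body']}\n<|end|>\n"
        encoded = text.encode("utf-8")
        if size + len(encoded) > max_bytes:
            break
        chunks.append(text)
        size += len(encoded)
    return "".join(chunks)

# Example with a couple of fake r/AskScience-style posts:
posts = [
    {"title": "What would happen if the world stopped spinning?",
     "body": "Asking for a friend."},
    {"title": "Why is the sky blue?",
     "body": "Is it Rayleigh scattering?"},
]
corpus = build_corpus(posts)
```

The resulting text file would then be used to fine-tune the public GPT-2 model, so each bot ends up steeped in one subreddit’s vocabulary and obsessions.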

Fake text generation like this is undeniably impressive, but it raises worrying questions as well. When OpenAI unveiled GPT-2, the lab didn’t share the full code for fear that it would be abused by bad actors, a decision that was controversial in the usually open world of AI research.


OpenAI’s policy director Jack Clark told The Verge at the time: “The thing I see is that eventually someone is going to use synthetic video, image, audio, or text to break an information state. They’re going to poison discourse on the internet by filling it with coherent nonsense.”

It’s a fear that parallels what we’re seeing with deepfakes. Although, as with deepfakes, there are reasons to be skeptical about claims that AI technology will usher in some sort of ‘infopocalypse.’ For a start, we already have programs that can generate plausible text at high volume for little cost: humans. And although we know human-led misinformation campaigns are damaging, they’ve not led to a total collapse of trust in institutions.

Not yet anyway.