My latest obsession has been writing bots for Reddit. Reddit is an online forum that hosts a range of topics, everything from news and politics to strange, possibly NSFW captions on WikiHow images. But mostly, it’s just memes.

What’s nice about Reddit is the sheer volume of content. Apparently 6% of internet users in the US use the social network, so it’s a treasure trove of user-generated content. But the best part is the Reddit API, specifically PRAW, the Python API wrapper. PRAW does all the rate limiting for you and lets you drink from the proverbial fire hose by streaming every post or comment posted to the site live. That’s around 11 million posts a month and 2.8 million comments a day. And you can scrape them all with:

```python
for c in r.subreddit("all").stream.comments(pause_after=-1):
    do_something(c)
```

Well, not exactly. I do notice some comments fly under the radar and never trigger my bots, which I suspect is due to the internal workings of PRAW’s rate limiting or latency on my end.

And adding app authorization is as simple as Settings -> App Authorization -> Create Application.

Don’t worry: only the name and redirect URI are actually required, and the redirect URI can be localhost.
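Once the app page gives you a client ID and secret, PRAW can pick them up from a `praw.ini` file. This is a sketch with placeholder values — the site name `my_bot` and the version string are arbitrary:

```ini
[my_bot]
client_id=YOUR_CLIENT_ID
client_secret=YOUR_CLIENT_SECRET
user_agent=script:my_bot:v0.1 (by /u/your_username)
```

Then `r = praw.Reddit("my_bot")` gives you an authenticated read-only instance; add `username` and `password` entries if the bot needs to post.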

Create as many apps as you want. Create as many accounts as you want with one email.

Ease of access and a good API play no small role in Reddit’s success.

The API doesn’t just benefit people like me, but also the hundreds of moderators making sure everyone stays in line. Reddit decentralized the moderation process, although controversies still exist.

Drinking From the Fire Hose

Despite the ease of the API, some boilerplate is required. As with most things involving web scraping, it mostly amounts to wrapping everything in a try-except and logging. When you’re drinking from the fire hose and you start to choke, you just have to step back for a second and keep going.
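That retry loop can be sketched generically: catch whatever the stream throws, log it, back off briefly, and re-open the stream. The names `resilient_stream` and `make_stream` are mine, not PRAW’s:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("bot")

def resilient_stream(make_stream, retries=3, backoff=1.0):
    """Yield items from make_stream(), re-opening it after any exception.

    make_stream is a zero-argument callable returning a fresh iterator,
    e.g. lambda: r.subreddit("all").stream.comments(pause_after=-1).
    """
    attempt = 0
    while attempt <= retries:
        try:
            for item in make_stream():
                yield item
            return  # stream ended normally
        except Exception:
            attempt += 1
            log.exception("stream died (attempt %d/%d)", attempt, retries)
            time.sleep(backoff * attempt)  # step back for a second, keep going
    log.error("giving up after %d retries", retries)
```

With PRAW you would pass `lambda: r.subreddit("all").stream.comments(pause_after=-1)` as `make_stream` and loop over the generator as before.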

I wrote a simple bot platform that I’ll eventually write a post about. It lets me define a bot in just a few lines. Here is an example of a bot that replies “Nice” to comments that sit at the end of at least two consecutive “Nice” replies (don’t ask, it’s a Reddit thing).

```python
from bot import Bot
import re
from utils.utils import get_commented_posts
from praw.models.reddit.comment import Comment

NAME = "RepliesNice"
already_commented = get_commented_posts("logs/{}.log".format(NAME))

NICE = "Nice"
ACCEPTABLE_NICE = {"Nice", "Nice."}

def action(p, c, test):
    global already_commented
    already_commented = already_commented.union(set([p.id]))
    if not test:
        r = c.reply(NICE)
        r.disable_inbox_replies()
    return NICE

def get_comment_body(comment):
    if type(comment) is Comment:
        return comment.body

def get_bot():
    bot = Bot(NAME)
    bot.is_valid_post = lambda p: not p.archived and p.id not in already_commented
    bot.is_valid_comment = lambda c: c.body in ACCEPTABLE_NICE and get_comment_body(c.parent()) in ACCEPTABLE_NICE
    bot.action = action
    return bot

if __name__ == "__main__":
    bot = get_bot()
    bot.monitor_comments_live("all")
```

Here’s the bot in action, reaping some of that sweet, sweet Reddit karma:

Other bots include

Colorizer Bot

My favorite bot is the colorizer bot, which trolls the subreddits pics, images, OldSchoolCool, and HistoryPorn — recently banned on HistoryPorn :-(

The bot is based on the Colorful Image Colorization repo. It downloads every image posted to these subreddits, checks whether it’s black and white, and if so spins up a Docker container to colorize it and uploads the result to Imgur. The bot can also be triggered in other subreddits by the keyword “colorize”.
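The black-and-white check can be sketched on raw RGB pixels: a photo is effectively grayscale when its three channels (almost) agree everywhere. The function name and the tolerance value here are my own assumptions, not the bot’s actual code:

```python
def is_grayscale(pixels, tolerance=8):
    """Heuristic black-and-white test on an iterable of (r, g, b) tuples.

    Old scans are rarely perfectly gray (slight tint, JPEG noise), so
    each channel may deviate from the others by up to `tolerance`.
    """
    for r, g, b in pixels:
        if max(r, g, b) - min(r, g, b) > tolerance:
            return False
    return True
```

With Pillow you would feed it something like `img.convert("RGB").getdata()`.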

Here’s an example of FDR as a boy:

FDR as a boy. A very pretty boy

The results are hit-or-miss. One problem is that many of the processed images are black and white because they’re old, while the neural network was trained on modern images.