What’s up? Today we’re going to look at how to retrieve the stories (official term for submissions or selfposts) from any given subreddit. What we’re going to do is pretty simple: customize a URL with the proper subreddit and read the JSON object that comes back. It’s going to be a pretty short one. I’m going to attach the login code I’ve written along with the code we’ve looked at today, so that you can just copy and paste it into your IDE and start playing with it right away. Just make sure you’ve got all the required modules installed, mentioned here. Let’s get started!


First we create a new function, called `subredditInfo` (feel free to choose a better name), that takes the following arguments. The first is ‘client’, which is simply a ‘requests.session’ object that has already logged in, with the login cookie and the modhash saved on it. We looked at how to make that session instance in an earlier post, so we won’t get into that again.

The next argument is ‘limit’, which limits the number of stories reddit sends back to you. I believe the default is 25, but I haven’t checked. The reddit API says that you can only fetch 100 links at a time, so to get more than that you’ll need to familiarize yourself with the ‘before’ and ‘after’ parameters in the docs.

The ‘sr’ argument is the subreddit we’re getting the stories from. The capitalization doesn’t seem to matter. Don’t forget that you can combine several subreddits at a time with ‘+’, so you could do sr='devblogs+loony' if you wanted two at a time.
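Here’s a tiny sketch of that ‘+’ trick, in case you keep your subreddits in a list. The helper name `join_subreddits` is made up for this example, and it’s written in Python 3 syntax (unlike the Python 2 code in the rest of this post).

```python
def join_subreddits(names):
    """Join subreddit names with '+' so they fit the 'sr' argument."""
    # strip stray whitespace and skip empty entries
    cleaned = [name.strip() for name in names if name.strip()]
    return '+'.join(cleaned)


sr = join_subreddits(['devblogs', 'loony'])
# same URL pattern the post uses, with no sorting method filled in
url = 'http://www.reddit.com/r/{sr}/.json'.format(sr=sr)
print(url)
```

You could then pass the joined string straight in as the ‘sr’ argument.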

The ‘sorting’ argument is the way you’d like the API to sort the stories it sends back. You can choose ‘new’, ‘top’ or ‘hot’.

‘return_json’ tells our function whether to return the straight JSON response or the list of stories. It just depends on what you want to use it for.

Finally, ‘**kwargs’ is for any other parameters you’d want to set, such as the aforementioned ‘before’ and ‘after’ parameters. Basically, you specify whether you want the stories that come before, or after, the id of the story you pass in. Say there were three stories: ‘a’, ‘b’, and ‘c’. If you wanted ‘a’, you’d pass ‘before’ = ‘b’ as a parameter, and you’d get ‘a’. Likewise, if you pass the argument ‘after’ = ‘b’, you’d get the ‘c’ story. It’s confusing, but if you play around with it, you’ll get the hang of it.
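To make the ‘after’ idea a bit more concrete, here’s a rough sketch of walking through a listing page by page, in Python 3 syntax. It doesn’t touch the network: `fake_fetch` is a stand-in I made up that serves the three stories ‘a’, ‘b’ and ‘c’ one per page. I’m also assuming each response carries an ‘after’ value under j[‘data’] to pass along on the next request, which is how reddit listings appear to work; on the real API that value is a longer name like ‘t3_’ plus the id, not a bare letter.

```python
def fetch_all(fetch_page, max_pages=5):
    """Collect stories page by page by following the 'after' cursor.

    'fetch_page' is any callable that takes an 'after' value (or None)
    and returns a listing dict shaped like reddit's JSON response.
    """
    stories = []
    after = None
    for _ in range(max_pages):
        page = fetch_page(after)
        stories.extend(page['data']['children'])
        after = page['data']['after']
        if after is None:  # nothing left to fetch
            break
    return stories


# made-up data standing in for reddit: three stories, one per page
FAKE = ['a', 'b', 'c']


def fake_fetch(after):
    start = 0 if after is None else FAKE.index(after) + 1
    chunk = FAKE[start:start + 1]
    last = start + len(chunk)
    next_after = chunk[-1] if last < len(FAKE) else None
    return {'data': {'children': chunk, 'after': next_after}}


print(fetch_all(fake_fetch))
```

With the real function you’d swap `fake_fetch` for something like `lambda after: subredditInfo(client, return_json=True, after=after)`, and keep ‘max_pages’ low so you don’t hammer the API.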

You can also pass in the time frame you’d like to get the stories from, as the ‘t’ parameter: either ‘hour’, ‘week’, ‘month’, ‘year’, or ‘all’ for all-time.

    #----------------------------------------------------------------------
    def subredditInfo(client, limit=25, sr='tankorsmash',
                      sorting='', return_json=False, **kwargs):
        """retrieves X (max 100) amount of stories in a subreddit

        'sorting' is whether or not the sorting of the reddit should be
        customized or not, if it is: Allowed passing params/queries such as
        t=hour, week, month, year or all"""

Here we set the parameters we’d like to send along with the URL. We first create a dict with the ‘limit’ string as a key and the ‘limit’ argument as a value. Then we update the dict (combining two dicts, overwriting the values of any matching keys) with the keyword arguments we passed in when we called the function.

        #query to send
        parameters = {'limit': limit,}
        parameters.update(kwargs)
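If the update step feels abstract, here’s a small standalone sketch of it in Python 3 syntax. `merge_parameters` is just a name I picked for the demo; it takes the extra parameters as a plain dict so you can see the overwrite happen.

```python
def merge_parameters(limit=25, extra=None):
    """Start from the 'limit' default and layer the extra parameters on top."""
    parameters = {'limit': limit}
    # update() adds new keys and overwrites the ones that already exist
    parameters.update(extra or {})
    return parameters


# a new key is simply added
print(merge_parameters(extra={'t': 'week'}))
# a matching key ('limit') is overwritten by the incoming value
print(merge_parameters(extra={'limit': 100, 'after': 'b'}))
```

In `subredditInfo` itself, ‘limit’ is its own argument, so it never shows up in kwargs; the overwriting only starts to matter if you seed the dict with more defaults.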

Then we build the URL, filling in the proper subreddit and sorting method on the fly. By default the URL is ‘http://www.reddit.com/r/tankorsmash/.json’, but you’re going to want to change that for your own purposes fairly quickly. Then we call the ‘get’ method of the ‘client’, which is simply a ‘requests.session’ instance, to make an HTTP request to the URL. After that, we take the HTTP response and turn its JSON body into a Python dict. Here I use the response’s built-in ‘json’ attribute, but you could also use the ‘json’ module for the exact same thing.

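        url = r'http://www.reddit.com/r/{sr}/{top}.json'.format(sr=sr, top=sorting)
        r = client.get(url,params=parameters)
        print 'sent URL is', r.url

        #'json' is a property in old versions of requests;
        #from requests 1.0 on it is a method, so call r.json() instead
        j = r.json
        #j = json.loads(r.text) ## manual alternative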
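If you want to see what URL you’d end up with before sending anything, here’s a quick offline sketch in Python 3 syntax. `build_listing_url` is a made-up helper; it uses the same format string as above and tacks the parameters on with `urllib.parse.urlencode`, which is roughly what requests does for you when you pass ‘params’.

```python
from urllib.parse import urlencode


def build_listing_url(sr='tankorsmash', sorting='', **params):
    """Build the listing URL the same way the post does, plus a query string."""
    base = 'http://www.reddit.com/r/{sr}/{top}.json'.format(sr=sr, top=sorting)
    if params:
        # sort the parameters so the output is predictable
        return base + '?' + urlencode(sorted(params.items()))
    return base


# an empty 'sorting' leaves '/.json' at the end of the URL
print(build_listing_url())
print(build_listing_url('python', 'top', limit=5, t='week'))
```

The first call shows why the default URL ends in ‘/.json’: the sorting slot is simply empty.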

Here, we either return the raw JSON dict or return a list of stories, so it’s easier to iterate over and manipulate. There are too many parts of the JSON response to go over here, but the most important ones are ‘title’, ‘url’, ‘permalink’, ‘id’ and ‘author’. The stories live in the list at j[‘data’][‘children’]; each item in that list has a dict called ‘data’, which holds the key/value pairs you’re looking for. Say you were looking for the title of the first item in the returned JSON; you’d get it like this: j[‘data’][‘children’][0][‘data’][‘title’]. If you ever get a TypeError about list indices, it’s probably because you’re trying to put a key in instead of an index (an integer).

        #return raw json
        if return_json:
            return j
        #or list of stories
        else:
            stories = []
            for story in j['data']['children']:
                #print story['data']['title']
                stories.append(story)

            return stories
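To practice digging through that structure without hitting reddit, here’s a sketch using a hand-written dict, in Python 3 syntax. The `sample` data and the `summarize` helper are both invented for the demo; a real response has many more keys per story.

```python
# made-up response with the same nesting as a reddit listing
sample = {
    'kind': 'Listing',
    'data': {
        'children': [
            {'kind': 't3', 'data': {'id': 'a', 'title': 'First story',
                                    'author': 'someone'}},
            {'kind': 't3', 'data': {'id': 'b', 'title': 'Second story',
                                    'author': 'someone_else'}},
        ],
    },
}


def summarize(stories):
    """Pull the id and title out of each story's inner 'data' dict."""
    return [(story['data']['id'], story['data']['title']) for story in stories]


first_title = sample['data']['children'][0]['data']['title']
print(first_title)
print(summarize(sample['data']['children']))

# using a key where the list wants an integer index raises TypeError
try:
    sample['data']['children']['title']
except TypeError as err:
    print('TypeError:', err)
```

The list you get back from `subredditInfo` with ‘return_json’ left as False has the same shape as sample[‘data’][‘children’], so `summarize` would work on it too.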

And there you have it. Some of those paragraphs got a little long, but what can you do! Here’s the full code; just make sure you enter your own username and password.

    import json
    import requests
    from pprint import pprint as pp2

    #import os
    #print os.getcwd()

    #----------------------------------------------------------------------
    def login(username, password):
        """logs into reddit, saves cookie"""

        print 'begin log in'
        #username and password
        UP = {'user': username, 'passwd': password, 'api_type': 'json',}
        headers = {'user-agent': '/u/TankorSmash\'s API python bot', }

        #POST with user/pwd
        client = requests.session(headers=headers)
        r = client.post('http://www.reddit.com/api/login', data=UP)

        #print r.text
        #print r.cookies

        #gets and saves the modhash
        j = json.loads(r.text)
        client.modhash = j['json']['data']['modhash']
        print '{USER}\'s modhash is: {mh}'.format(USER=username, mh=client.modhash)
        client.user = username

        def name():
            return '{}\'s client'.format(username)

        #pp2(j)
        return client

    #----------------------------------------------------------------------
    def subredditInfo(client, limit=25, sr='tankorsmash',
                      sorting='', return_json=False, **kwargs):
        """retrieves X (max 100) amount of stories in a subreddit

        'sorting' is whether or not the sorting of the reddit should be
        customized or not, if it is: Allowed passing params/queries such as
        t=hour, week, month, year or all"""

        #query to send
        parameters = {'limit': limit,}
        #parameters= defaults.copy()
        parameters.update(kwargs)

        url = r'http://www.reddit.com/r/{sr}/{top}.json'.format(sr=sr, top=sorting)
        r = client.get(url,params=parameters)
        print 'sent URL is', r.url
        j = json.loads(r.text)

        #return raw json
        if return_json:
            return j
        #or list of stories
        else:
            stories = []
            for story in j['data']['children']:
                #print story['data']['title']
                stories.append(story)

            return stories

    client = login('USERNAME', 'PASSWORD')

    j = subredditInfo(client, limit=1)

    pp2(j)