I'm trying to scrape new stories from Reddit using their API and Python's urllib2, but I keep getting JSON documents like this one:

    {u'kind': u'Listing', u'data': {u'modhash': u'', u'children': [], u'after': None, u'before': None}}
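
That response parses fine; the problem is that `children` is empty and `after` is `None`, which also means the loop in my script below can never make progress. A minimal standalone sketch (no network) of what the script sees for such a response:

```python
import json

# The u'...' dict shown above is Python 2's repr of this JSON after json.loads:
raw = ('{"kind": "Listing", "data": {"modhash": "", '
       '"children": [], "after": null, "before": null}}')

parsed = json.loads(raw)
stories = parsed['data']['children']  # [] -- nothing to extend the story list with
after = parsed['data']['after']       # None -- the next request fetches the same empty page

print(len(stories))  # prints 0, so len(stories) < limit stays true forever
```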

Here is my code:

    import json
    import time
    import urllib2

    def get_submissions(after=None):
        url = 'http://reddit.com/r/all/new.json?limit=100'
        if after:
            url += '&after=%s' % after

        _user_agent = 'Reddit Link Analysis Bot by PirateLogic @ github.com/jamesbrewer'
        _request = urllib2.Request(url, headers={'User-agent': _user_agent})
        _json = json.loads(urllib2.urlopen(_request).read())

        return [story for story in _json['data']['children']], _json['data']['after']

    if __name__ == '__main__':
        after = None
        stories = []
        limit = 1

        while len(stories) < limit:
            new_stories, after = get_submissions(after)
            stories.extend(new_stories)
            time.sleep(2)  # The Reddit API allows one request every two seconds.
            print '%d stories collected so far .. sleeping for two seconds.' % len(stories)
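
One thing I tried to rule out is the hand-built query string. This sketch assembles the same URL with the standard library's `urlencode` instead of string concatenation (pure URL construction, no request made; `t3_xxxxx` is a placeholder fullname, not a real value):

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

def build_url(after=None, limit=100):
    # Same endpoint the script above requests, but with an
    # explicitly encoded query string.
    params = {'limit': limit}
    if after:
        params['after'] = after
    return 'http://reddit.com/r/all/new.json?' + urlencode(params)

print(build_url())                  # first request: limit only
print(build_url(after='t3_xxxxx')) # subsequent requests include after
```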

What I've written is fairly short and straightforward, but I'm obviously overlooking something, or I don't have a complete understanding of the API or of how urllib2 works.

Here's an example page from the API.

What's the deal?

EDIT: After trying to load the example page in another browser, I'm seeing the same JSON I posted at the top. It seems to affect only /new.json, though; if I try /hot.json or just /.json, I get what I want.
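
If /new defaults to a different ordering than I expect (only a guess on my part; the `sort=new` parameter below is an assumption, not confirmed behavior), an explicit sort parameter is cheap to test alongside the working endpoints:

```python
base = 'http://reddit.com/r/all'

# URLs to compare side by side -- paste into a browser or swap into
# get_submissions(). The sort=new parameter is an assumption.
candidates = [
    base + '/new.json?limit=100',           # returns the empty listing above
    base + '/new.json?sort=new&limit=100',  # hypothesis: force the "new" sort
    base + '/hot.json?limit=100',           # returns stories, per the edit above
]

for url in candidates:
    print(url)
```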