This is the second installment of my Imgur API series: ‘How to Download Entire Imgur Galleries’. Check out part 1 here in case you missed how to log into the API and upload an image!

I’m actually sort of cheating here, because we don’t need to use the API at all, if we don’t want to. That is because we’re only going to deal with the galleries that are built from images submitted to reddit. This means that after you’re done with this tutorial, you’ll be able to point the script at a given subreddit’s name and grab all the images that have been submitted to /r/aww or /r/wallpapers.

I will be writing another tutorial soon for galleries and albums unrelated to reddit.com, as well as for grabbing gallery information such as the title and other descriptive details, but that’s less related to the actual downloading of the gallery, which is what we’re interested in today!

Please note that I’m working with Python 2.7 on Windows 7 64-bit, so you might have to modify the code slightly to accommodate your platform or OS.

Anyways, hit the jump to get started!

Edit: May 29th, 2012: reddit user easttntoppedtree caught that the gallery maxes out at 56 images, so you’ll have to add /page/PAGENUMBERHERE.json to the end of the URL to get the next 56 images, like so: http://imgur.com/r/scarlettjohansson/top/page/1.json, keeping in mind that 0 (zero) is a valid page number.
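If you want to grab more than the first 56 images, a small helper like the one below can build the paginated URLs for you. This is my own hypothetical addition, not part of the original script; it just fills in the /page/N.json template described above.

```python
#Hypothetical helper: builds the paginated top.json URLs for a subreddit.
#Page numbering starts at 0, and each page holds up to 56 images.
def gallery_page_urls(subreddit, pages):
    template = 'http://imgur.com/r/{sr}/top/page/{pg}.json'
    return [template.format(sr=subreddit, pg=page) for page in range(pages)]

#the first two pages for our example subreddit
urls = gallery_page_urls('scarlettjohansson', 2)
```

You would then loop over `urls`, fetching and processing each page the same way the single-page version does below.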

Tinypaste link for the full working code, as seen at the bottom of the page

First things first: the imports. We are going to import the usual networking suspects, requests and json, as well as the ever-useful pprint module. New to us this time is the datetime module, which is a handy way to handle time-related things in Python; we’ll just be using it to grab today’s date and time. Finally, there’s the os module, which Python uses to interact with your operating system. You can do helpful things like detect whether or not a folder exists and create one, or check which path Python is currently running in. It’s a very powerful module; so again, we’ll only be briefly using it, to create a folder to hold our images in.

```python
import requests
import json
from pprint import pprint
import datetime
import os
```

Here is where we’ll have our constants for the script, such as how many images we want to download, and which subreddit’s gallery we will take the images from.

```python
##Set constants for script
DL_LIMIT = 5
SUBREDDIT = 'scarlettjohansson'
```

Here, we use the requests and json modules to make a GET request to the URL, customized to fit the SUBREDDIT value we set above. Once we get the response from the site, we transform the raw JSON object into a Python dictionary, which is something we can manipulate a lot more effectively within Python. Make sure that you are using the text attribute of the response, instead of the content or raw response data.


```python
##Download and load the JSON information for the Gallery
#get json object from imgur gallery. can be appended with /month or /week for
# more recent entries
r = requests.get(r'http://imgur.com/r/{sr}/top.json'.format(sr=SUBREDDIT))
#creates a python dict from the JSON object
j = json.loads(r.text)
```

This is just for your own uses, to see exactly what the response was. You can use this to determine whether imgur is over capacity or the URL was set incorrectly. For now, I’ve commented it out, since everything should be working fine.

```python
#prints the dict, if necessary. Used for debug mainly
#pprint(j)
```

Now, we extract the list of images from the JSON dict we just created. You can check out the layout of the dictionary by uncommenting the `pprint` line above.

```python
#get the list of images from j['gallery']
image_list = j['gallery']
```

Some more flavour text, so we can confirm the number of images in the gallery. It counts the objects in the list using Python’s built-in len function.

```python
#print the number of images found
print len(image_list), 'images found in the gallery'
```

More debugging options, here for you to examine the content of the first image in the list we just created. It’s found at index 0 because, as you know, Python lists begin at index 0 instead of 1.

```python
#debugging, examine the first image in the gallery, confirm no errors
pprint(image_list[0])
```

Now, we want to create a folder to hold all the images we are going to be downloading in a minute. I like putting them in timestamped folders, but you can easily change it to be named after the subreddit, or anything else.

Here, we use the `datetime` module to fetch the current time, in a format specific to `datetime`.

```python
#get the time object for today
folder = datetime.datetime.today()
```

That means we need to turn it into a printable string we can use to name our folder, so we run the built-in str function on it, which does exactly what we want.

```python
#turn it into a printable string
string_folder = str(folder)
```
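For example (using a fixed timestamp here so the output is predictable), str gives you the familiar ISO-style representation, colons and all:

```python
import datetime

#a fixed datetime, so the result is predictable
folder = datetime.datetime(2012, 5, 29, 12, 30, 45)
string_folder = str(folder)
#string_folder is now '2012-05-29 12:30:45' -- note the colons,
#which we'll have to deal with next
```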

Then, since some characters cannot be used in a folder name, we need to remove them. We use the string’s replace method to swap the colon character for a folder-friendly character, the period.

```python
#replace some illegal chars
legal_folder = string_folder.replace(':', '.')
```
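The colon is the only illegal character that shows up in a datetime string, but if you ever name the folder after something else (a gallery title, say), Windows forbids a few more characters. Here’s a more thorough sketch, my own hypothetical extension rather than part of the original script:

```python
#Hypothetical, more thorough sanitizer: replaces every character Windows
#forbids in folder names with a period
def sanitize_folder_name(name):
    for ch in '<>:"/\\|?*':
        name = name.replace(ch, '.')
    return name

legal_folder = sanitize_folder_name('2012-05-29 12:30:45')
```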

Now, we use the mkdir function from the `os` module to create a folder using the legal string we just created. Remember that unless you specify otherwise, the folder will be created in the same location the script is running from.

```python
#create the folder using the name legal_folder
os.mkdir(str(legal_folder))
```
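One caveat: os.mkdir raises an OSError if the folder already exists, which can happen if you run the script twice with the same folder name. A small variant of my own (not in the original script) that checks first:

```python
import os

#Hypothetical guard: only create the folder if it isn't already there,
#so re-running the script doesn't crash with an OSError
def ensure_folder(path):
    if not os.path.exists(path):
        os.mkdir(path)
    return path
```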

Next, we need to extract each image’s name and file type. So we create an empty list, to which we’ll append a 2-item tuple containing the name and extension of each file, which we’ll use for downloading and saving the image.

```python
#list of pairs containing the image name and file extension
image_pairs = []
```

At each index of the list of images we created, we’ll find a dict filled with miscellaneous information about the image, such as its size, and how many times it was downloaded. All we’re interested in, though, are the hash and ext keys and their values. So for every image dictionary in the list, we take the values associated with hash and ext and append them as a pair to the new list for later.

```python
#extract image and file extension from dict
for image in image_list:
    #get the raw image name
    img_name = image['hash']
    #get the image extension(jpg, gif etc)
    img_ext = image['ext']
    #append pair to list
    image_pairs.append((img_name, img_ext))
```

Next, we need to download the images from the website. We do that by substituting each image’s name and ext into the URL template below. We don’t want to surpass our pre-set download limit, though; in case there’s a bandwidth limit, and because you don’t want to hammer Imgur’s servers, we need to keep track of the number of images we grab.

So first, we set a temporary variable to keep track of the number of images we’ve grabbed.

```python
#current image number, for looping limits
current = 0
```

Then we start a loop that stops downloading once current is equal to or greater than the DL_LIMIT we set at the beginning of the file.

```python
#run download loop, until DL_LIMIT is reached
for name, ext in image_pairs:
    #so long as we haven't hit the download limit:
    if current < DL_LIMIT:
```

Then, we fill the URL template with the name and extension of the image on the site.

```python
        #this is the image URL location
        url = r'http://imgur.com/{name}{ext}'.format(name=name, ext=ext)
        #print the image we are currently downloading
        print 'Current image being downloaded:', url
```

Next, we have to download the actual image, instead of the JSON that is referencing it. We do that by once again using the requests module to create a GET request to the URL we’ve filled in and then saving the response.

```python
        #download the image data
        response = requests.get(url)
        #set the file location
        path = r'./{fldr}/{name}{ext}'.format(fldr=legal_folder, name=name, ext=ext)
```

Then we create a file object at the path location, opened in ‘write binary’ mode instead of the default ‘read’. We need to make sure we are writing, for one thing, but also that we are writing binary data to the file rather than strings (think 0s and 1s instead of ‘abc’s). This is the same reason we use the response.content attribute instead of response.text.

```python
        #open the file object in write binary mode
        fp = open(path, 'wb')
        #perform the write operation
        fp.write(response.content)
```

To finish off the for loop we close the file object we opened to write the image to disk, as well as increase the current image count, so we can make sure we don’t download too many images.

```python
        #close the file
        fp.close()
        #advance the image count
        current += 1
```
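As an alternative to the open/write/close trio, Python’s with statement closes the file for you, even if the write raises an exception part way through. A small sketch of my own variation, not the original script’s code:

```python
#Equivalent save step using a with-statement: the file is closed
#automatically when the block exits, even on an error
def save_image(path, data):
    with open(path, 'wb') as fp:
        fp.write(data)
```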

Finally, we close off with some flavour text, just to let the user know we’ve successfully run the script.

```python
#print off a completion string
print 'Finished downloading {cnt} images to {fldr}!'.format(cnt=current, fldr=legal_folder)
```

As usual, here’s the full code, which you should be able to run on your PC just fine.

```python
import requests
import json
from pprint import pprint
import datetime
import os

##Set constants for script
DL_LIMIT = 5
SUBREDDIT = 'scarlettjohansson'

##Download and load the JSON information for the Gallery
#get json object from imgur gallery. can be appended with /month or /week for
# more recent entries
r = requests.get(r'http://imgur.com/r/{sr}/top.json'.format(sr=SUBREDDIT))
#creates a python dict from the JSON object
j = json.loads(r.text)

#prints the dict, if necessary. Used for debug mainly
#pprint(j)

#get the list of images from j['gallery']
image_list = j['gallery']

#print the number of images found
print len(image_list), 'images found in the gallery'

#debugging, examine the first image in the gallery, confirm no errors
pprint(image_list[0])

## Create a dynamically named folder
#get the time object for today
folder = datetime.datetime.today()
#turn it into a printable string
string_folder = str(folder)
#replace some illegal chars
legal_folder = string_folder.replace(':', '.')
#create the folder using the name legal_folder
os.mkdir(str(legal_folder))

## Extract image info from the gallery
#list of pairs containing the image name and file extension
image_pairs = []
#extract image and file extension from dict
for image in image_list:
    #get the raw image name
    img_name = image['hash']
    #get the image extension(jpg, gif etc)
    img_ext = image['ext']
    #append pair to list
    image_pairs.append((img_name, img_ext))

## Download images from imgur.com
#current image number, for looping limits
current = 0
#run download loop, until DL_LIMIT is reached
for name, ext in image_pairs:
    #so long as we haven't hit the download limit:
    if current < DL_LIMIT:
        #this is the image URL location
        url = r'http://imgur.com/{name}{ext}'.format(name=name, ext=ext)
        #print the image we are currently downloading
        print 'Current image being downloaded:', url
        #download the image data
        response = requests.get(url)
        #set the file location
        path = r'./{fldr}/{name}{ext}'.format(fldr=legal_folder, name=name, ext=ext)
        #open the file object in write binary mode
        fp = open(path, 'wb')
        #perform the write operation
        fp.write(response.content)
        #close the file
        fp.close()
        #advance the image count
        current += 1

#print off a completion string
print 'Finished downloading {cnt} images to {fldr}!'.format(cnt=current, fldr=legal_folder)
```