
We’re working on some improvements to the navigation experience on Lingo, and I needed a large number of images to upload, in order to make sure the navigation features scaled well. After hunting around for a bit, I remembered that NASA maintains huge image archives with lenient access and usage policies.

I thought it might be fun to see if I could download several thousand images from NASA’s archives using only command line tools. Here’s what I ended up with. I code on a Mac, so this is assuming a standard Bash environment (I installed one tool via Homebrew, but everything else should be installed by default in most UNIX-y shell environments).

To start off, I poked around NASA’s image archives and landed on “Astronomy Picture of the Day”, which has a simple, convenient API. I ran a quick test using their handy demo key (I have truncated some response values with [...] for readability):



curl https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY

{
  "date": "2018-04-02",
  "explanation": "While cruising around Saturn[...]",
  "hdurl": "https://apod.nasa.gov/apod/image/1804/SaturnRin[...]",
  "media_type": "image",
  "service_version": "v1",
  "title": "Moons, Rings, Shadows, Clouds: Saturn (Cassini)",
  "url": "https://apod.nasa.gov/apod/image/1804/SaturnRings[...]"
}

The indispensable curl fetches the API response and prints it to my console without any futzing around.

Now I just want to pull out the value of the url key and download that file. Parsing data from JSON responses isn’t super convenient with standard-issue command line tools, so I decided to use jq, which can be installed with your favorite package manager (e.g. brew install jq). I just piped the response from curl to jq and had it extract the url value:
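As a minimal sketch of that extraction step, here is the same jq filter run against a trimmed copy of the response instead of the live API, so it can be tried offline. The sample_response and image_url variable names are made up for this example, and the truncated URL is left truncated as it appears above.

```shell
# Trimmed sample of the API response (hypothetical variable; the [...] is kept as-is).
sample_response='{"date": "2018-04-02", "media_type": "image", "url": "https://apod.nasa.gov/apod/image/1804/SaturnRings[...]"}'

# .url selects the value of the "url" key. jq prints it as a JSON string,
# so the surrounding double quotes are part of the output.
image_url=$(printf '%s' "$sample_response" | jq .url)
echo "$image_url"

# Against the live API, the same filter is applied to curl's output:
#   curl https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY | jq .url
```

Note that the printed value still carries its JSON quotes, which matters in the next step.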

Alright, we have the URL to an image! To download the file, we can use another well-worn UNIX workhorse, wget, which can be used to build a full-featured web scraper (it even has a cameo in The Social Network), but we’ll just use it to download a single file at a time:

wget `curl https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY | jq .url`

"https://apod.nasa.gov/apod/image/1804/Saturn[...]": Scheme missing.

Oops. What’s going on here? jq is returning the JSON value in quotes, but wget is expecting an unquoted string, and can’t parse the scheme (https) at the beginning of the URL. Let’s strip those quotes off by piping the URL to tr:



wget `curl https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY | jq .url | tr -d '"'`

[...]

2018-04-02 20:46:31 (229 KB/s) - ‘SaturnRingsMoons_Cassini_967.jpg’ saved [40322/40322]

The output from wget is pretty verbose, with a progress indicator, etc, so I’m truncating everything except the last line, where we see that we successfully downloaded an image.
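As an aside, jq can also drop the quotes on its own: its -r (raw output) flag prints string values without JSON quoting, which makes the tr step optional. The sketch below compares both approaches on a made-up one-key sample; the variable names are hypothetical and the truncated URL is left truncated.

```shell
# Hypothetical sample response containing only the key we care about.
sample_response='{"url": "https://apod.nasa.gov/apod/image/1804/SaturnRings[...]"}'

# Approach used in this post: let jq print the quoted string, then delete the quotes.
via_tr=$(printf '%s' "$sample_response" | jq .url | tr -d '"')

# Alternative: ask jq for raw output, so no quotes are printed in the first place.
via_raw=$(printf '%s' "$sample_response" | jq -r .url)

echo "$via_tr"
echo "$via_raw"
```

Both lines should print the same unquoted URL. One small difference is that tr -d '"' would also remove any double quote inside the value itself, whereas -r leaves the string content untouched.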

But we wanted a bunch of images, right? The API allows passing in a date parameter in the form of date=2018-04-02, in order to request the Picture of the Day for a specific date. So all we need to do is loop through the days of the year in order, and run our command for each day, until we get as many images as we need.

As luck would have it, the date program allows specifying an adjustment offset with the -v flag, like so: date -v+1d prints the current date plus one day, date -v-1d prints the current date minus one day (you can also substitute y for d to adjust by years, m for months, etc). The sign matters: without it, date -v1d sets the day of the month to 1 instead of adding a day. date also gives us some formatting options, so we can get yesterday’s date in the format the API expects like so: date -v-1d +"%Y-%m-%d".
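One caveat is that -v belongs to the BSD date that ships with macOS. GNU date, found on most Linux distributions, takes a relative date string through -d instead. A small sketch of a wrapper that tries the BSD form first and falls back to the GNU form; days_ago is a hypothetical helper name, not something either tool provides.

```shell
# days_ago N: print the date N days before today as YYYY-MM-DD.
days_ago() {
  if date -v-1d +%Y-%m-%d >/dev/null 2>&1; then
    # BSD date (macOS): -v adjusts the current date by the signed offset.
    date -v-"$1"d +%Y-%m-%d
  else
    # GNU date: -d parses a relative date description.
    date -d "$1 days ago" +%Y-%m-%d
  fi
}

days_ago 1   # yesterday's date in the format the API expects
```

The probe with a fixed -v-1d only decides which syntax is available; the actual offset is applied in the branch that follows.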

We can use the $(…) syntax to nest the date command within the curl call, so that the output of date is concatenated with the base API endpoint string. This command will download yesterday’s Picture of the Day:

wget `curl "https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY&date="$(date -v-1d +'%Y-%m-%d') | jq .url | tr -d '"'`
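If the one-liner is hard to read, the URL can be assembled in separate steps first. The sketch below shows how command substitution and variable expansion end up in one string; the variable names are made up for the example, and the date is fixed so the result is predictable.

```shell
api_base="https://api.nasa.gov/planetary/apod"
api_key="DEMO_KEY"

# $(...) runs a command and substitutes its output in place.
# Here it captures today's date; the post uses date -v-1d for yesterday.
today=$(date +'%Y-%m-%d')

# A fixed date keeps this example reproducible.
request_date="2018-04-02"

# Inside double quotes, ${...} expansions are joined into a single word,
# so the & in the query string is never interpreted by the shell.
request_url="${api_base}?api_key=${api_key}&date=${request_date}"
echo "$request_url"
# prints https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY&date=2018-04-02
```

Swapping request_date for "$today", or for the output of date -v-1d +'%Y-%m-%d', gives the same URL the one-liner builds.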

Finally, all we need to do is wrap the whole thing in a for loop, using Bash’s sequence expression syntax:

for COUNT in {0..5}; do wget `curl -s "https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY&date="$(date -v-${COUNT}d +"%Y-%m-%d") | jq '.url' | tr -d '"'`; done
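For a longer run, it may be worth guarding against days where the response does not describe an image, since the response carries a media_type field and wget would otherwise be handed an empty or non-image URL. The following is a sketch under that assumption, not a tested scraper; apod_image_url and fetch_apod are hypothetical helper names.

```shell
# Read an APOD JSON response on stdin and print its url, but only when
# media_type is "image". Anything else, including an error object that has
# no media_type key, prints nothing.
apod_image_url() {
  jq -r 'select(.media_type == "image") | .url'
}

# Fetch the entry for one date (YYYY-MM-DD) and download it if it is an image.
fetch_apod() {
  apod_url=$(curl -s "https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY&date=$1" | apod_image_url)
  if [ -n "$apod_url" ]; then
    wget -q "$apod_url"
  else
    echo "no image for $1, skipping" >&2
  fi
}

# Usage, mirroring the loop above (BSD/macOS date syntax):
#   for COUNT in {0..5}; do fetch_apod "$(date -v-${COUNT}d +%Y-%m-%d)"; done
```

For a few thousand images the demo key's rate limits will likely get in the way, so a personal key from api.nasa.gov would take the place of DEMO_KEY.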