In a previous blog post I covered how to utilize the YouTube API to find the preview images for videos and then reverse search them using the TinEye API. In this blog post we will cover how to use the same techniques for Vimeo to retrieve the location of the preview image, and then we will use our previous code to submit it to TinEye for reverse searching. This can assist you in determining whether you are looking at a brand new video or something that has been reposted from an earlier point in time. Let’s get started.

The Vimeo Simple API

Vimeo has a full-featured API, which they call the Advanced API, that we can use to do all kinds of fancy stuff like searching for videos, users, etc. Vimeo also offers a handy feature they call the Simple API: JSON output that is automatically available for every public video. For example, if we view this awesome volleyball video:

https://vimeo.com/71215064

We can see that the video ID is the number at the end of the URL: 71215064

To retrieve all of the JSON for this video we can use the following URL scheme:

http://vimeo.com/api/v2/video/{VIDEOID}.json

So in our awesome volleyball example this looks like:

http://vimeo.com/api/v2/video/71215064.json
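Because the URL scheme is just the video ID dropped into a template, it is easy to build in code. Here is a minimal sketch, assuming you might be handed a full Vimeo page URL like the one above rather than a bare ID. The helper names extract_vimeo_id and simple_api_url are hypothetical, and the regular expression only handles URLs whose path ends in the numeric ID (no query strings).

```python
import re


# Hypothetical helper: pull the numeric video ID out of a Vimeo page URL,
# or accept a bare ID. Assumes the ID is the trailing run of digits.
def extract_vimeo_id(url_or_id):
    match = re.search(r"(\d+)/?$", url_or_id.strip())
    if match is None:
        raise ValueError("No Vimeo video ID found in %r" % url_or_id)
    return match.group(1)


# Hypothetical helper: fill the ID into the Simple API URL scheme.
def simple_api_url(video_id):
    return "http://vimeo.com/api/v2/video/%s.json" % video_id


print(simple_api_url(extract_vimeo_id("https://vimeo.com/71215064")))
# prints http://vimeo.com/api/v2/video/71215064.json
```

A URL with no trailing number, such as a user profile page, raises a ValueError instead of silently producing a bad API URL.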

The main drawback of the Simple API is that it only works with public videos. If your script needs to handle private videos or run more advanced queries against Vimeo, you will need to get an API key and check out the developer docs.

Examining the JSON

So what does this JSON document actually contain? If you browse to the URL, your browser will download a JSON file that you can open in your favourite text editor or in my favourite Python IDE, Wing. Let’s examine the JSON:

```json
[
  {
    "description": "Some moments and highlights from the Olympic Games 2012 Volleyball tournament. Music: Jakob – Malachite",
    "duration": 303,
    "embed_privacy": "anywhere",
    "height": 480,
    "id": 71215064,
    "mobile_url": "https://vimeo.com/71215064",
    "stats_number_of_comments": 3,
    "stats_number_of_likes": 59,
    "stats_number_of_plays": 49593,
    "tags": "volleyball, olympics, olympic games, london 2012, highlights, slow motion, brazil, russia, poland, italy",
    "thumbnail_large": "https://i.vimeocdn.com/video/444712440_640.webp",
    "thumbnail_medium": "https://i.vimeocdn.com/video/444712440_200x150.webp",
    "thumbnail_small": "https://i.vimeocdn.com/video/444712440_100x75.webp",
    "title": "Olympic Games 2012 Volleyball in slow motion",
    "upload_date": "2013-07-28 17:34:09",
    "url": "https://vimeo.com/71215064",
    "user_id": 2460313,
    "user_name": "Yngve Sundfjord",
    "user_portrait_huge": "https://i.vimeocdn.com/portrait/362408_300x300.webp",
    "user_portrait_large": "https://i.vimeocdn.com/portrait/362408_100x100.webp",
    "user_portrait_medium": "https://i.vimeocdn.com/portrait/362408_75x75.webp",
    "user_portrait_small": "https://i.vimeocdn.com/portrait/362408_30x30.webp",
    "user_url": "https://vimeo.com/sundfjord",
    "width": 640
  }
]
```

Pretty awesome, right? We have a bunch of useful information stored here. In particular, we are interested in the thumbnail_large key, because it gives us the image we can submit to the TinEye API to find other sites that contain the same image. You will also notice an upload_date key, which you can use to check whether this video went up before the other results you find in your reverse image search.
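To see how those two keys come out of the response, here is a small offline sketch that parses a trimmed copy of the JSON above with Python’s standard json module. The trimmed sample (only three of the real keys) is for illustration; the point is that the top-level value is a list holding one video object, which is why the script indexes it with [0].

```python
import json

# Trimmed copy of the Simple API response shown above (illustration only)
sample = """[{"id": 71215064,
  "thumbnail_large": "https://i.vimeocdn.com/video/444712440_640.webp",
  "upload_date": "2013-07-28 17:34:09"}]"""

videos = json.loads(sample)   # the top level is a list holding one video object
video = videos[0]

print(video["thumbnail_large"])   # https://i.vimeocdn.com/video/444712440_640.webp
print(video["upload_date"])       # 2013-07-28 17:34:09
```

In the real script, requests’ response.json() performs the json.loads step for you.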

Now let’s start coding this up. If you haven’t already installed the TinEye API library (pytineye), look at the previous post for instructions on getting up and running with it. Open a new script called vimeo_reverse_search.py and punch in the following code:

```python
import argparse

import requests
from pytineye import TinEyeAPIRequest

# Replace PUBLICKEY and PRIVATEKEY with your own TinEye API keys
tineye = TinEyeAPIRequest('http://api.tineye.com/rest/', 'PUBLICKEY', 'PRIVATEKEY')

# Read the Vimeo video ID from the command line
ap = argparse.ArgumentParser()
ap.add_argument("-v", "--videoID", required=True,
                help="The videoID of the Vimeo video. For example: https://www.vimeo.com/VIDEOID")
args = vars(ap.parse_args())

video_id = args['videoID']
```

OK, nothing too fancy here yet. We are just setting up the TinEye API, adding some argument parsing for the script, and extracting the video_id variable from the command-line arguments passed in. Let’s implement our Vimeo JSON retrieval function now:
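If you want to check the argument parsing without running the whole script, argparse lets you hand parse_args an explicit list instead of reading the real command line. This sketch rebuilds the same parser in isolation; the list ["-v", "71215064"] stands in for what you would type in a terminal.

```python
import argparse

# Same parser as in the script, built in isolation for experimenting
ap = argparse.ArgumentParser()
ap.add_argument("-v", "--videoID", required=True,
                help="The videoID of the Vimeo video. For example: https://www.vimeo.com/VIDEOID")

# parse_args accepts an explicit argument list in place of sys.argv[1:]
args = vars(ap.parse_args(["-v", "71215064"]))
video_id = args['videoID']

print(video_id)   # 71215064
```

The long form --videoID works the same way, and because the option is marked required=True, leaving it off makes argparse print a usage error and exit.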

```python
#
# Retrieve the video JSON from Vimeo
#
def get_vimeo_video(video_id):

    url = "http://vimeo.com/api/v2/video/%s.json" % video_id

    response = requests.get(url)

    if response.status_code == 200:

        # The Simple API returns a list containing a single video object
        video_info = response.json()

        print("[*] Video uploaded: %s" % video_info[0]['upload_date'])

        return video_info[0]['thumbnail_large']

    else:

        print("[!!!] Failed to retrieve the video: %s" % response.text)
        return None
```

Let’s break this code down a little bit:

The function definition: we define our get_vimeo_video function, which takes a video_id parameter holding the Vimeo video ID we covered previously.

Building and sending the request: we build the Simple API URL for the video and then send the HTTP request off with requests.get.

Handling the response: if our request was successful (status code 200), we store the parsed JSON in the video_info variable. Because the Simple API returns a list containing one video object, we index into it with [0] to print the date the video was uploaded and then return the location of the large preview image. If the request failed, we print the server’s response and the function returns None.
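One way to exercise both branches without touching the network is to pass the HTTP function in as a parameter. The sketch below is a variation on get_vimeo_video, not the code used in this post: the fetch parameter and the FakeResponse class are hypothetical stand-ins that mimic only the two things the function relies on, status_code and json().

```python
# Variation on get_vimeo_video with the HTTP call injected (hypothetical design)
def get_vimeo_thumbnail(video_id, fetch):
    url = "http://vimeo.com/api/v2/video/%s.json" % video_id
    response = fetch(url)
    if response.status_code != 200:
        return None
    video_info = response.json()
    # Guard against an empty list as well as a failed request
    if not video_info:
        return None
    return video_info[0]['thumbnail_large']


class FakeResponse:
    """Minimal stand-in for a requests response (only what we use)."""

    def __init__(self, status_code, payload=None):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


ok = lambda url: FakeResponse(200, [{"thumbnail_large": "https://example.com/thumb.webp"}])
missing = lambda url: FakeResponse(404)

print(get_vimeo_thumbnail("71215064", ok))       # https://example.com/thumb.webp
print(get_vimeo_thumbnail("71215064", missing))  # None
```

In a real run you would pass requests.get as fetch; the fakes only make it easy to confirm the function returns None instead of crashing when Vimeo has nothing for you.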

OK, the code we have so far takes care of grabbing and parsing the JSON from the Vimeo servers and gives us an image location that we can use for reverse searching with the TinEye API. Now let’s get that image to TinEye by implementing a function to handle it. Some of this code is reused from the previous post, so it might look familiar. Add the following code to your script:

```python
#
# Search TinEye for the image.
#
def search_tineye(image_url):

    try:
        result = tineye.search_url(image_url)
    except Exception as e:
        # Usually caused by incorrectly copied API keys
        print("[!!!] TinEye search failed! %s" % e)
        return None

    result_urls = []
    dates = {}

    # Walk every match, and every backlink (page) within each match
    for match in result.matches:

        for link in match.backlinks:

            # Record each page URL only once, mapping its crawl date to the URL
            if link.backlink not in result_urls:

                result_urls.append(link.backlink)
                dates[link.crawl_date] = link.backlink
```

This is a bit more code so let’s step through it together:

The function definition: we create our search_tineye function, which receives an image_url parameter holding the location of the Vimeo preview image.

The try/except block: we send our request to the TinEye API with tineye.search_url. If there are any problems with the call (usually because the API keys were copied and pasted incorrectly), we print an error message and return, since there are no results to process.

The nested loops: we walk through the list of TinEye matches, and because each match can contain several links, we walk through those as well. Each new link is added to result_urls, and its crawl date is recorded in the dates dictionary (crawl date mapped to link) so that we can find the oldest post later.
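The de-duplication loop can be tried out offline with stand-in objects. In this sketch, Match and Backlink are hypothetical namedtuples that only imitate the attributes the script reads (backlinks, backlink and crawl_date); they are not pytineye’s classes, and the URLs and dates are made up. A set handles the membership test so the list is not rescanned for every link, while the list keeps the order in which links were found.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-ins for TinEye result objects (not pytineye classes)
Backlink = namedtuple("Backlink", ["backlink", "crawl_date"])
Match = namedtuple("Match", ["backlinks"])

matches = [
    Match([Backlink("http://a.example/page", datetime(2014, 3, 1)),
           Backlink("http://b.example/page", datetime(2014, 2, 7))]),
    # the same page can show up under more than one match
    Match([Backlink("http://a.example/page", datetime(2014, 3, 1))]),
]

result_urls = []   # unique page URLs, in the order first seen
seen = set()       # fast membership test
dates = {}         # crawl date -> page URL

for match in matches:
    for link in match.backlinks:
        if link.backlink not in seen:
            seen.add(link.backlink)
            result_urls.append(link.backlink)
            dates[link.crawl_date] = link.backlink

print(len(result_urls))   # 2
```

Because dates is keyed by crawl date, two pages crawled at exactly the same moment would overwrite each other; that only matters if you need every page that shares the oldest date.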

Now let’s implement the pieces that will show us the results of our API requests. Add the following code:

```python
    # (continuation of search_tineye)
    # Any entries in result_urls mean TinEye found matches
    if result_urls:

        print()
        print("[*] Discovered %d unique URLs with image matches." % len(result_urls))

        for url in result_urls:

            print(url)

        # Sorting the crawl dates puts them in chronological order
        sorted_dates = sorted(dates.keys())
        oldest_date = sorted_dates[0]

        print()
        print("[*] Oldest match was crawled on %s at %s" % (oldest_date, dates[oldest_date]))

    else:

        print("[!!!] No results found on TinEye.")
```

The if statement: we test whether our result_urls list contains any items, which indicates that we have hits from our TinEye request.

Printing the hits: we print the number of unique URLs and then walk through the list of results, printing each URL where the image was found.

Finding the oldest match: we sort the crawl dates, which puts them in chronological order, so the first entry is the oldest; we print it along with the URL it belongs to.
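Sorting every date just to take the first one works, but min() finds the oldest key in a single pass. This sketch assumes the crawl dates are datetime objects (or anything else that orders chronologically); the sample dates and URLs are invented, and oldest_match is a hypothetical helper that also guards against an empty dictionary.

```python
from datetime import datetime

# Invented sample: crawl date -> page URL
dates = {
    datetime(2014, 3, 1): "http://a.example/page",
    datetime(2014, 2, 7): "http://b.example/page",
    datetime(2014, 5, 20): "http://c.example/page",
}


def oldest_match(dates):
    """Return (crawl_date, url) for the earliest crawl, or None if empty."""
    if not dates:
        return None
    oldest = min(dates)   # same result as sorted(dates)[0], without a full sort
    return oldest, dates[oldest]


crawl_date, url = oldest_match(dates)
print("[*] Oldest match was crawled on %s at %s" % (crawl_date, url))
# prints [*] Oldest match was crawled on 2014-02-07 00:00:00 at http://b.example/page
```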

Alright, the bulk of our script is finished. All that’s left is to call the functions we have set up:

```python
# grab the Vimeo video details
image_url = get_vimeo_video(video_id)

# submit the image to TinEye (only if Vimeo gave us a preview image)
if image_url:
    search_tineye(image_url)
```

That’s it! We call our get_vimeo_video function to retrieve the preview image URL and then pass it off to our search_tineye function to do the search on TinEye. Let’s see what happens when we run it.

Let It Rip!

Running the script with the video ID from above looks like this:

```
# python vimeo_reverse_search.py -v 71215064
[*] Video uploaded: 2013-07-28 17:34:09

[*] Discovered 2 unique URLs with image matches.
http://wn.com/Olympic_Games
http://www.volleyball-movies.net/category/84

[*] Oldest match was crawled on 2014-02-07 00:00:00 at http://wn.com/Olympic_Games
```

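Cool, so we can see that the video was uploaded on July 28, 2013, and that the oldest matching image TinEye found was crawled on February 7, 2014. This suggests the video was put online on Vimeo before it appeared on the other sites detected. Keep in mind that a crawl date only records when TinEye crawled a page, not when the page was published, so treat it as a clue rather than proof.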
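Here is a sketch of that comparison as plain date arithmetic, using the two timestamps from the output above. The helper name days_before_first_crawl is hypothetical, and it assumes both timestamps use the YYYY-MM-DD HH:MM:SS format shown in the output.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"   # format used by both timestamps in the output above


def days_before_first_crawl(upload_date, oldest_crawl):
    """Whole days between the Vimeo upload and TinEye's oldest crawl.

    A positive number means the upload predates every crawled copy.
    """
    uploaded = datetime.strptime(upload_date, FMT)
    crawled = datetime.strptime(oldest_crawl, FMT)
    return (crawled - uploaded).days


print(days_before_first_crawl("2013-07-28 17:34:09", "2014-02-07 00:00:00"))  # 193
```

For our volleyball video, the upload came a little over six months (193 full days) before the oldest crawled copy.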