Last May 10 and 11 the course of Computer Forensic Expert was held in Reus (Spain) by the Asociación Nacional de Tasadores y Peritos Judiciales Informáticos (ANTPJI), which I am member and one of the instructors, where I had the pleasure of giving a talk on two of my passions such as Python and OSINT (Open Source Intelligence).

Python is a great language to quickly develop all kinds of powerful applications with lots of libraries to perform exploits, reverse engineering, web analysis tools and more. No doubt a useful knowledge for any security expert.

Internet is immense, sheltering all unimaginable information and this is the reason why OSINT techniques are vital to collect, analyze and present this information.

For this course, I decided it would be interesting for attendees to learn how to develop simple tools (scripts) that allow them to perform OSINT using Python with a series of practical exercises with a specific objective each.

Presentation and code are available at VULNEX website.

Note: I have removed from the scripts the Google Hacking query, so the reader can insert its own query.

Tool #1

Objective: search ANTPJI members on LinkedIn using Google Custom Search API.

These scripts are very simple and do the same thing but in a different way. The first one uses the Google API Client, while the second uses the fantastic Requests library.

In these scripts we are using some Google Hacking to find members of the association on LinkedIn.

# File: ex1_a.py # Date: 05/14/13 # Author: Simon Roses Femerling # Desc: Basic Google Hacking # # VULNEX (C) 2013 # www.vulnex.com import const from apiclient.discovery import build import pprint # your google hacking query query='' query_params='' doquery=query+query_params service = build("customsearch","v1",developerKey=const.cse_token) res = service.cse().list( q=doquery, cx=const.cse_id, num=10).execute() pprint.pprint(res) # VULNEX EOF

# File: ex1_b.py # Date: 05/14/13 # Author: Simon Roses Femerling # Desc: Simple Google Hacking # # VULNEX (C) 2013 # www.vulnex.com import requests import json import urllib import const site="https://www.googleapis.com/customsearch/v1?key=" # Your Google Hacking query query='' query_params='' url=site+const.cse_token+"&cx="+const.cse_id+"&q=" + urllib.quote(query+query_params) response = requests.get(url) print json.dumps(response.json,indent=4) # VULNEX EOF

When running any of these scripts, we get the following result:

Not too interesting for the moment 🙂

Tool #2

Objective: obtain photos of the ANTPJI members LinkedIn profiles using Google Custom Search API.

The following script gets the photos of the members of the association on LinkedIn and also extracts picture metadata 😉 The script generates an HTML page with all the photos.

Used libraries: Google API Client, PIL, Requests and Markup.

# File: ex2.py # Date: 05/14/13 # Author: Simon Roses Femerling # Desc: Download picture and extract metadata # # VULNEX (C) 2013 # www.vulnex.com import const from apiclient.discovery import build import pprint import os from PIL import Image from StringIO import StringIO from PIL.ExifTags import TAGS import requests import markup def do_query(istart=0): if istart == 0: res = service.cse().list( q=doquery, cx=const.cse_id, num=10).execute() else: res = service.cse().list( q=doquery, cx=const.cse_id, num=10, start=istart).execute() return res pic_id=1 do_stop=10 cnt=1 page=markup.page() # Set page title page.init(title="ANTPJI OSINT") page.h1("ANTPJI OSINT") # Set output directory out_dir = "pics_gepl" # Your Google Hacking query query='' query_params='' doquery=query+query_params service = build("customsearch","v1",developerKey=const.cse_token) if not os.path.exists(out_dir): os.makedirs(out_dir) res=[] while True: if cnt==1: res = do_query() else: if not res['queries'].has_key("nextPage"): break res = do_query(res['queries']['nextPage'][0]['startIndex']) cnt+=1 if cnt > do_stop: break if res.has_key("items"): for item in res['items']: name="" if not item.has_key('pagemap'): continue if not item['pagemap'].has_key('hcard'): continue hcard = item['pagemap']['hcard'] for card in hcard: pic_url="" if 'title' in card: if 'fn' in card: name = card['fn'] if 'photo' in card: pic_url = card['photo'] if pic_url != "": image = requests.get(pic_url) pic_n = os.path.join(out_dir,"%s.jpg") % pic_id file = open(pic_n,"w") pic_id+=1 try: i = Image.open(StringIO(image.content)) if hasattr(i,"_getexif"): ret = {} info = i._getexif() if info: for k,v in info.items(): decode = TAGS.get(k,v) ret[decode] = v print ret i.save(file,"JPEG") page.p(name.encode('ascii','ignore')) page.img(src=pic_n) page.br() page.br() except IOError, e: print "error: %s" % e file.close() # Set your output filename with open('index_gepl.html','w') as fp: fp.write(str(page)) # VULNEX EOF

And this is the result:

With few lines of code we have got a very interesting tool.

Tool #3

Objective: what is the relationship of ANTPJI members on LinkedIn?

With this script we are looking for the relationship between the members of the association at LinkedIn and create a graph that relates the words.

Used libraries: Google API Client, NetworkX and Matplotlib.

# File: ex3.py # Date: 05/14/13 # Author: Simon Roses Femerling # Desc: Build graph from profiles # # VULNEX (C) 2013 # www.vulnex.com import const from apiclient.discovery import build import networkx as nx import matplotlib.pyplot as plt def do_query(istart=0): if istart == 0: res = service.cse().list( q=doquery, cx=const.cse_id, num=10).execute() else: res = service.cse().list( q=doquery, cx=const.cse_id, num=10, start=istart).execute() return res do_stop=10 cnt=1 # Your Google Hacking query here query='' query_params='' doquery=query+query_params service = build("customsearch","v1",developerKey=const.cse_token) G=nx.DiGraph() res=[] while True: if cnt==1: res = do_query() else: if not res['queries'].has_key("nextPage"): break res = do_query(res['queries']['nextPage'][0]['startIndex']) cnt+=1 if cnt > do_stop: break if res.has_key("items"): for item in res['items']: name="" if not item.has_key('pagemap'): continue if not item['pagemap'].has_key('hcard'): continue hcard = item['pagemap']['hcard'] for card in hcard: if 'title' in card: if 'fn' in card: name = card['fn'] G.add_edge(name,card["fn"]) plt.figure(figsize=(30,30)) nx.draw(G) # Set your output filename plt.savefig('antpji_rela_map.png') # VULNEX EOF

And this is the graph generated:

Tool #4

Objective: what’s hot on Twitter account of the association?

This script downloads the latest tweets from the account of the association and generates a tag cloud. Useful to quickly view what are they talking about.

Used libraries: Requests, pytagcloud.

# File: ex4.py # Date: 05/14/13 # Author: Simon Roses Femerling # Desc: Create word cloud # # VULNEX (C) 2013 # www.vulnex.com import requests import json import urllib import const from pytagcloud import create_tag_image, make_tags from pytagcloud.lang.counter import get_tag_counts site="http://search.twitter.com/search.json?q=" # Your query here query="" url=site+urllib.quote(query) response = requests.get(url) tag = [] for res in response.json["results"]: tag.append(res["text"].encode('ascii','ignore')) text = "%s" % "".join(tag) tags = make_tags(get_tag_counts(text),maxsize=100) # Set your output filename create_tag_image(tags,"antpji_word_cloud.png", size=(600,500), fontname="Lobster") # VULNEX EOF

And this is the tag cloud:

Tool #5

Objective: do the ANTPJI usernames from Twitter exist on social networks sites?

The following script extracts the usernames that have been published or mentioned in the Twitter of the association and checks in 160 social networks sites.

Used libraries: Requests.

# File: ex5.py # Date: 05/14/13 # Author: Simon Roses Femerling # Desc: Check usernames on 160 social network sites # # VULNEX (C) 2013 # www.vulnex.com import requests import json import urllib import const import pprint site="http://search.twitter.com/search.json?q=" # Your query here query="" url=site+urllib.quote(query) print "Recolectando alias en Twitter: %s

" % query response = requests.get(url) users = [] for res in response.json["results"]: if res.has_key('to_user'): if not res['to_user'] in users: users.append(str(res["to_user"])) if res.has_key('from_user'): if not res['from_user'] in users: users.append(str(res["from_user"])) print "ALIAS-> %s" % users print "

Comprobrando alias en 160 websites

" for username in users: for service in const.services: try: res1 = requests.get('http://checkusernames.com/usercheckv2.php?target=' + service + '&username=' + username, headers={'X-Requested-With': 'XMLHttpRequest'}).text if 'notavailable' in res1: print "" print username + " -> " + service print "" except Exception as e: print e # VULNEX EOF

And the result is as follows:

Tool #6

Objective: can we extract metadata from ANTPJI photos?

This script downloads the photos related to ANTPJI from Google and extracts the metadata.

Used libraries: Requests, PIL and Markup.

# File: ex6.py # Date: 05/14/13 # Author: Simon Roses Femerling # Desc: Download pictures from Google and extract metadata # # VULNEX (C) 2013 # www.vulnex.com import const from apiclient.discovery import build import pprint import os from PIL import Image from StringIO import StringIO from PIL.ExifTags import TAGS import requests import markup def do_query(istart=0): if istart == 0: res = service.cse().list( q=doquery, cx=const.cse_id, num=10).execute() else: res = service.cse().list( q=doquery, cx=const.cse_id, num=10, start=istart).execute() return res pic_id=1 do_stop=10 cnt=1 page=markup.page() # Set your page title page.init(title="ANTPJI OSINT") page.h1("ANTPJI OSINT") # Set output directory out_dir = "pics_gepl" # Define your Google hacking query here query='' query_params='' doquery=query+query_params service = build("customsearch","v1",developerKey=const.cse_token) if not os.path.exists(out_dir): os.makedirs(out_dir) res=[] while True: if cnt==1: res = do_query() else: if not res['queries'].has_key("nextPage"): break res = do_query(res['queries']['nextPage'][0]['startIndex']) cnt+=1 if cnt > do_stop: break if res.has_key("items"): for item in res['items']: name="" if not item.has_key('pagemap'): continue if not item['pagemap'].has_key('hcard'): continue hcard = item['pagemap']['hcard'] for card in hcard: pic_url="" if 'title' in card: if 'fn' in card: name = card['fn'] if 'photo' in card: pic_url = card['photo'] if pic_url != "": image = requests.get(pic_url) pic_n = os.path.join(out_dir,"%s.jpg") % pic_id file = open(pic_n,"w") pic_id+=1 try: i = Image.open(StringIO(image.content)) if hasattr(i,"_getexif"): ret = {} info = i._getexif() if info: for k,v in info.items(): decode = TAGS.get(k,v) ret[decode] = v print ret i.save(file,"JPEG") page.p(name.encode('ascii','ignore')) page.img(src=pic_n) page.br() page.br() except IOError, e: print "error: %s" % e file.close() # Set your output filename with open('index_gepl.html','w') as fp: fp.write(str(page)) # VULNEX EOF

A picture is worth a thousand words!

As we have seen throughout this article we can easily write sophisticated OSINT tools with a little bit of Python that allows us to gather lots of information about individuals or collectives.

If you would like me to go into any topic in Python and OSINT in depth let me know 🙂

What tools do you use for OSINT?

— Simon Roses Femerling

References