In this part I introduce the Python implementation of the IP-to-geolocation script. It is more object-oriented and hopefully easier to read. The first part of this article describes how to read HTTP resources and parse their content. The second part covers the same ground as the PHP version. As a conclusion I compare the results of all five APIs with the data from the cache.

python: asynchronous http requests on google appengine

First I wanted to build the same multi-URL fetch function as in PHP, but I am using Google App Engine. Threads are not allowed there, and the urllib/httplib modules are masked by Google's urlfetch module. I first chose the plain and easy urllib.urlopen call because the Google backend is fast. After this was done I found a section in the updated URLFetch documentation (since June 18, 2009, App Engine version 1.2.3) that describes asynchronous calls: you create a special RPC object with the urlfetch module and pass it to the fetch call. Have fun with the improved example.

    import logging

    from google.appengine.api import urlfetch


    class InfoItem(dict):
        '''dict that starts reading from the API during __init__'''

        def __init__(self, url):
            # Start the asynchronous fetch right away.
            self.rpc = urlfetch.create_rpc()
            urlfetch.make_fetch_call(self.rpc, url)

        def ready(self):
            '''Check if the async call is ready.
            @return True - if got data after parsing
            '''
            try:
                # Waits here if the response has not arrived yet.
                result = self.rpc.get_result()
            except urlfetch.Error, ex:
                logging.error("Error while fetch: %s" % ex)
                return False
            if result.status_code != 200:
                return False
            # parse() is provided by the subclasses.
            return self.parse(result.content)
        #ready
    #InfoItem

For easy access the result class is based on a Python dict. To check whether the API data has been filled into the dict, call the ready() function. You can create the InfoItem instances, do something else, and then ask each instance through ready() whether the data has arrived (if not, the call waits). Accessing the values is easy because it is a dict.
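The start-now, collect-later pattern can also be tried outside App Engine. The following is a minimal sketch in plain Python 3, assuming a thread pool as a stand-in for the urlfetch RPC object (which is not how App Engine does it, since threads are not allowed there). The names DeferredItem and fake_fetch are hypothetical, and no network access is made.

```python
from concurrent.futures import ThreadPoolExecutor

# One shared pool; a stand-in for App Engine's asynchronous RPC machinery.
_pool = ThreadPoolExecutor(max_workers=4)


def fake_fetch(url):
    """Hypothetical fetcher: returns canned text instead of calling the network."""
    return "payload for " + url


class DeferredItem(dict):
    """dict that starts fetching in __init__ and fills itself in ready()."""

    def __init__(self, url, fetch=fake_fetch):
        super().__init__()
        # Submit the work immediately; the caller can do other things meanwhile.
        self._future = _pool.submit(fetch, url)

    def ready(self):
        """Wait for the result; return True if data was stored in the dict."""
        try:
            content = self._future.result()
        except Exception:
            return False
        self["content"] = content
        return True


items = [DeferredItem("http://example.com/a"), DeferredItem("http://example.com/b")]
# ... other work could happen here while both fetches are in flight ...
results = [item["content"] for item in items if item.ready()]
print(results)
```

Both items are created before either one is asked for its result, so the waiting time overlaps instead of adding up.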

Parsing XML data with python

XML should be parsed with the ElementTree module. It is very fast and simple to use. A subclass of InfoItem has two jobs: building the API URL by simply appending the IP string, and parsing the content.

    import logging
    import StringIO
    import xml.etree.ElementTree as etree
    from xml.parsers.expat import ExpatError


    class IpInfoDbItem(InfoItem):
        '''Simple parsing of the content of the IpInfoDb API'''

        def __init__(self, ip):
            '''Init with the IpInfoDb-url'''
            super(IpInfoDbItem, self).__init__("http://ipinfodb.com/ip_query.php?ip="+ip)

        def parse(self, content):
            '''Parse the IpInfoDb-XML and save the keys in the inner dict.
            @return True - if parsing was successful.
            '''
            try:
                #etree needs a file-like object instead of a string!
                t = etree.ElementTree().parse(StringIO.StringIO(content))
                self.update({'name': 'ipinfodb',
                             'country': t.find("CountryName").text or '',
                             'city': t.find("City").text or '',
                             'lat': float(t.find("Latitude").text),
                             'long': float(t.find("Longitude").text)})
                return True
            except (ExpatError, IOError), ex:
                logging.warn("Nothing parsed: %s" % ex)
                return False
        #parse
    #IpInfoDbItem

    #Test the code directly (if google modules are in the path)
    testing = IpInfoDbItem("127.0.0.1")
    if testing.ready():
        print testing
    # {'lat': 0.0, 'country': 'Reserved', 'name': 'ipinfodb', 'long': 0.0, 'city': None}

The example starts fetching the data from the IpInfoDb API in the __init__ function, parses the XML and fills the values into the dict with self.update.
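The parsing step can be tried without App Engine or network access. This is a minimal sketch in Python 3, assuming a made-up sample response: only the four element names are taken from the code above, while the root element name, the values and the helper name parse_ipinfodb are hypothetical.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample response; only the four element names come from the article.
SAMPLE_XML = """<Response>
  <CountryName>Germany</CountryName>
  <City></City>
  <Latitude>51.0</Latitude>
  <Longitude>9.0</Longitude>
</Response>"""


def parse_ipinfodb(content):
    """Return a location dict, or None if the XML is malformed or incomplete."""
    try:
        # fromstring() accepts the string directly, no file-like wrapper needed.
        root = etree.fromstring(content)
        return {
            'name': 'ipinfodb',
            'country': root.findtext("CountryName") or '',
            'city': root.findtext("City") or '',
            # findtext() returns None for a missing element, so float() raises.
            'lat': float(root.findtext("Latitude")),
            'long': float(root.findtext("Longitude")),
        }
    except (etree.ParseError, TypeError, ValueError):
        return None


info = parse_ipinfodb(SAMPLE_XML)
print(info)
```

Returning None for broken input keeps the caller's check as simple as the ready() test in the class version.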

Parsing non-structured data with python

The same hint as in PHP: use regular expressions to match the data!

    import re


    class HostIpItem(InfoItem):
        '''dict that starts reading from hostip during __init__'''

        def __init__(self, ip):
            super(HostIpItem, self).__init__("http://api.hostip.info/get_html.php?position=true&ip="+ip)

        def parse(self, content):
            '''Parse the HostIp-Text and save the keys in the inner dict.
            @return True if parsing was successful.
            '''
            # One "Key: value" pair per line; line breaks are written as \n.
            match = re.search(r"Country:\s+(.*?)\(\w+\)\n"
                              r"City:\s+(.*?)\n"
                              r"Latitude: (-*\d+\.\d+)\n"
                              r"Longitude: (-*\d+\.\d+)", content, re.S|re.I)
            if match:
                self.update({'name': 'hostip',
                             'country': match.group(1),
                             'city': match.group(2),
                             'long': float(match.group(4)),
                             'lat': float(match.group(3))})
                return True
            return False
        #parse
    #HostIpItem

Works like the xml example …
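To see the regular expression at work without calling the service, here is a minimal sketch in Python 3. The sample text is a made-up response in the "Key: value" layout that the pattern expects, the helper name parse_hostip is hypothetical, and the pattern is a slightly more tolerant variant (flexible whitespace, optional minus sign) rather than the exact one from the class.

```python
import re

# Hypothetical sample of the plain-text response, one "Key: value" pair per line.
SAMPLE_TEXT = (
    "Country: GERMANY (DE)\n"
    "City: Berlin\n"
    "Latitude: 52.5\n"
    "Longitude: 13.4\n"
)

PATTERN = re.compile(
    r"Country:\s+(.*?)\s*\(\w+\)\s+"   # country name, then the code in parentheses
    r"City:\s+(.*?)\s+"                # city may contain spaces and commas
    r"Latitude:\s+(-?\d+\.\d+)\s+"
    r"Longitude:\s+(-?\d+\.\d+)",
    re.S | re.I,
)


def parse_hostip(content):
    """Return a location dict, or None if the text does not match."""
    match = PATTERN.search(content)
    if not match:
        return None
    return {
        'name': 'hostip',
        'country': match.group(1),
        'city': match.group(2),
        'lat': float(match.group(3)),
        'long': float(match.group(4)),
    }


location = parse_hostip(SAMPLE_TEXT)
print(location)
```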

Build a complete web application

To put this together you have to define a RequestHandler that fetches the data and produces JavaScript. In Django style you need the following template; the values in {{ x }} are replaced with the entries of a dict.

    var com = com||{};
    com.unitedCoders = com.unitedCoders||{};
    com.unitedCoders.geo = com.unitedCoders.geo||{};
    com.unitedCoders.geo.ll = {{ ll_json }} ;
    {{ maxmind }}
    {{ wipmania }}
    {{ google }}
    document.write('<script type="text/javascript" src="http://pyUnitedCoders.appspot.com/geo_func.js"></script>');
    com.unitedCoders.geo.staticMapUrl = function(x, y) {
        var url = "http://maps.google.com/staticmap?key={{ google_key }}&size="+x+"x"+y+"&markers=";
        var colors = ["blue","green","red","yellow","white", "black"];
        for (var i=0; i<com.unitedCoders.geo.ll.length;i++) {
            var s = com.unitedCoders.geo.ll[i];
            url += s.lat+","+s.long+",mid"+colors[i]+(i+1)+"%7C";
        };
        url += this.getLat() + ","+this.getLong() + ",black";
        return url;
    };

    import os

    # Assumption: the JSON encoder is the simplejson copy bundled with Django.
    from django.utils.simplejson import encoder
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app
    from google.appengine.ext.webapp import template


    class GeoScript(webapp.RequestHandler):

        def get(self):
            '''Get the location infos for the calling ip (from api).'''
            self.response.headers['Content-Type'] = 'text/plain;charset=UTF-8'
            # Assumption: the ip is the address of the calling client.
            ip = self.request.remote_addr

            #result-dict and local location list
            result = {}
            ll = []

            #Start fetching API data
            ipInfo = IpInfoDbItem(ip)
            hostIp = HostIpItem(ip)

            #Add some more Javascript APIs
            scriptTemp = "document.write('<script type=\"text/javascript\" src=\"%s\"></script>');"
            result['maxmind'] = scriptTemp % "http://j.maxmind.com/app/geoip.js"
            result['wipmania'] = scriptTemp % "http://api.wipmania.com/wip.js"
            if self.request.get("key"):
                result['google_key'] = self.request.get("key")
                result['google'] = scriptTemp % \
                    ("http://www.google.com/jsapi?key=" + self.request.get("key"))

            #Get the fetched API Data
            if ipInfo.ready():
                ll.append(ipInfo)
            if hostIp.ready():
                ll.append(hostIp)
            # Encode the collected list (the original read the unset key result['ll']).
            result['ll_json'] = encoder.JSONEncoder().encode(ll)

            #Put all together in the javascript template
            path = os.path.join(os.path.dirname(__file__), 'geo.temp')
            self.response.out.write(template.render(path, result))
        #get
    #GeoScript

    application = webapp.WSGIApplication([
        ('/geo_data.js', GeoScript)
        ], debug=True)


    def main():
        run_wsgi_app(application)

    if __name__ == "__main__":
        main()
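The ll_json value that ends up in the template is just the list of result dicts serialized to JSON. A minimal sketch in Python 3 with the standard json module (not the encoder used on App Engine at the time) and made-up location values:

```python
import json

# Hypothetical results, shaped like the dicts the two item classes fill in.
ll = [
    {'name': 'ipinfodb', 'country': 'Germany', 'city': 'Berlin', 'lat': 52.5, 'long': 13.4},
    {'name': 'hostip', 'country': 'GERMANY', 'city': '', 'lat': 51.0, 'long': 9.0},
]

# dict subclasses serialize like plain dicts, so the item objects can be passed as-is.
ll_json = json.dumps(ll, sort_keys=True)

# This is the line the template produces from "com.unitedCoders.geo.ll = {{ ll_json }} ;".
script_line = "com.unitedCoders.geo.ll = %s ;" % ll_json
print(script_line)
```

Because JSON is valid JavaScript literal syntax here, the browser can read the list directly, which is what the loop in staticMapUrl relies on.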

For more information on how to start a Python web application on Google App Engine, read the fine Google documentation!

conclusion

Don't mix too many languages, you will get confused! Implementing the server-side script in parallel in PHP and Python, plus one version of the advanced functions in JavaScript, mixes three scripting languages. My first mistakes were adding semicolons in Python and forgetting the block braces in JavaScript.

After deploying, you can watch a server-side script in the dashboard of Google App Engine and get all the data: logs and API calls in detail. You can manage many different versions (the default is one) or give deploy access to other Google accounts. That's great!

What is the best service provider?

I did some caching and checked the results of all five APIs. Here is the hit rate for locating the visitors of this blog, and the distance from the returned locations to the center (the center is the average of all lat/long value pairs with a given city value per IP).

    Service Provider   lat/long per IP   city per IP   distance to center
    maxmind            86%               85%           123 km
    WIPmania           89%               0%            1059 km
    google             48%               0%            197 km
    IPinfoDB           98%               91%           168 km
    hostip             35%               53%           404 km

HostIP and Google do not offer location data for many visitors. IPInfoDB and MaxMind do not return the same positions for all IPs (as was suggested in the comments). At this time WIPmania mainly returns the center of the country, so its positions are not very accurate (in comparison to the calculated center).
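How the center and the hit rates can be computed is sketched below in Python 3. All records are made up for illustration, and the helper names center and hit_rate are hypothetical; this is one plausible reading of the description above, not the script that produced the table.

```python
# Hypothetical cached results for one visitor IP: provider -> (lat, long, city).
# None marks a value the provider did not return.
cached = {
    'maxmind':  (52.5, 13.4, 'Berlin'),
    'ipinfodb': (52.6, 13.5, 'Berlin'),
    'wipmania': (51.0, 9.0, None),     # country center only, no city
    'hostip':   (None, None, None),    # no data at all
}


def center(results):
    """Average lat/long over all entries that carry a position and a city."""
    points = [(lat, lng) for lat, lng, city in results.values()
              if lat is not None and lng is not None and city]
    if not points:
        return None
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def hit_rate(records, index):
    """Share of records whose field at index is filled in (0 = lat, 2 = city)."""
    if not records:
        return 0.0
    hits = sum(1 for record in records if record[index] not in (None, ''))
    return hits / len(records)


# Hypothetical records of one provider over four visitor IPs.
provider_records = [
    (52.5, 13.4, 'Berlin'),
    (48.1, 11.6, None),
    (None, None, None),
    (50.1, 8.7, 'Frankfurt'),
]

print(center(cached))
print(hit_rate(provider_records, 0), hit_rate(provider_records, 2))
```

Entries without a city are left out of the center, so a provider that only returns the country center does not pull the average away from the city-level results.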

How to calculate the distance between lat/long values?

I found some nice functions in JavaScript (please don't add functions to the prototypes of String and Number!!!). In Python the distance function looks like this:

    import math

    def distance(lat1, long1, lat2, long2):
        # 57.2958 converts degrees to radians (180/pi); 6378.7 is the earth radius in km
        return 6378.7 * math.acos(
            math.sin(lat1/57.2958) * math.sin(lat2/57.2958) +
            math.cos(lat1/57.2958) * math.cos(lat2/57.2958) *
            math.cos(long2/57.2958 - long1/57.2958))
    #distance
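The formula is the spherical law of cosines. Here is a minimal variant in Python 3 that uses math.radians instead of the 57.2958 constant and clamps the cosine before calling acos. The name distance_km and the clamping step are additions for illustration and are not part of the function above.

```python
import math

EARTH_RADIUS_KM = 6378.7  # same radius as in the function above


def distance_km(lat1, long1, lat2, long2):
    """Great-circle distance in km via the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta = math.radians(long2 - long1)
    cosine = (math.sin(phi1) * math.sin(phi2)
              + math.cos(phi1) * math.cos(phi2) * math.cos(delta))
    # Rounding can push the value slightly outside [-1, 1]; clamp before acos.
    cosine = max(-1.0, min(1.0, cosine))
    return EARTH_RADIUS_KM * math.acos(cosine)


# One degree of longitude on the equator is roughly 111.3 km with this radius.
print(round(distance_km(0.0, 0.0, 0.0, 1.0), 1))
```

Without the clamp, two nearly identical points can produce a cosine just above 1.0, and math.acos then raises a ValueError.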