Once upon a time, I had to quickly resolve thousands of DNS names. My first solution was to call gethostbyname for each host in turn. This turned out to be extremely slow: I could only resolve 200 hosts in a minute. Someone I talked to suggested doing it asynchronously. I looked around and found adns, an asynchronous DNS library. Since I was writing the code in Python, I looked around some more and found Python bindings for adns. I tried adns and, wow, I could resolve 20000 hosts in a minute!

In this post I want to share the slow code and the fast asynchronous code. The slow code is only useful if you need to resolve a handful of domains. The asynchronous code is much more useful. I made it a Python module so that you can reuse it. It's called "async_dns.py", and an example of how to use it is included at the bottom of the post.

Here is the slow code that uses gethostbyname. The only reusable part of this code is the "resolve_slow" function, which takes a list of hosts, resolves them one at a time, and returns a dictionary of { host: ip } pairs.

To measure how fast it is, I made it resolve the hosts "www.domain0.com", "www.domain1.com", ..., "www.domain999.com" and print how long the whole process took.

    #!/usr/bin/python

    import socket
    from time import time

    def resolve_slow(hosts):
        """
        Given a list of hosts, resolves them and returns a dictionary
        containing {'host': 'ip'}.
        If resolution for a host failed, 'ip' is None.
        """
        resolved_hosts = {}
        for host in hosts:
            try:
                # Blocks until this one lookup finishes or fails.
                host_info = socket.gethostbyname(host)
                resolved_hosts[host] = host_info
            except socket.gaierror:
                resolved_hosts[host] = None
        return resolved_hosts

    if __name__ == "__main__":
        host_format = "www.domain%d.com"
        number_of_hosts = 1000

        hosts = [host_format % i for i in range(number_of_hosts)]

        start = time()
        resolved_hosts = resolve_slow(hosts)
        end = time()

        print("It took %.2f seconds to resolve %d hosts." % (end-start, number_of_hosts))

And here is the fast code that uses adns. I created a class "AsyncResolver" that can be reused if you import it from this code. Just like "resolve_slow" from the previous example, it takes a list of hosts to resolve, and its "resolve" method returns a dictionary of { host: ip } pairs. The optional "intensity" argument sets how many hosts are resolved at once.

If you run this code, it will print out how long it took to resolve 20000 hosts.

    #!/usr/bin/python

    import adns
    from time import time

    class AsyncResolver(object):
        def __init__(self, hosts, intensity=100):
            """
            hosts: a list of hosts to resolve
            intensity: how many hosts to resolve at once
            """
            self.hosts = hosts
            self.intensity = intensity
            self.adns = adns.init()

        def resolve(self):
            """ Resolves hosts and returns a dictionary of { 'host': 'ip' }. """
            resolved_hosts = {}
            active_queries = {}
            host_queue = self.hosts[:]

            def collect_results():
                for query in self.adns.completed():
                    answer = query.check()
                    host = active_queries[query]
                    del active_queries[query]
                    if answer[0] == 0:
                        ip = answer[3][0]
                        resolved_hosts[host] = ip
                    elif answer[0] == 101: # CNAME
                        # Follow the alias, but record the result under the original host.
                        query = self.adns.submit(answer[1], adns.rr.A)
                        active_queries[query] = host
                    else:
                        resolved_hosts[host] = None

            def finished_resolving():
                # Done when nothing is queued or in flight. Comparing
                # len(resolved_hosts) with len(self.hosts) would never
                # terminate if the host list contained duplicates.
                return not host_queue and not active_queries

            while not finished_resolving():
                # Keep at most self.intensity queries in flight.
                while host_queue and len(active_queries) < self.intensity:
                    host = host_queue.pop()
                    query = self.adns.submit(host, adns.rr.A)
                    active_queries[query] = host
                collect_results()

            return resolved_hosts

    if __name__ == "__main__":
        host_format = "www.host%d.com"
        number_of_hosts = 20000

        hosts = [host_format % i for i in range(number_of_hosts)]

        ar = AsyncResolver(hosts, intensity=500)
        start = time()
        resolved_hosts = ar.resolve()
        end = time()

        print("It took %.2f seconds to resolve %d hosts." % (end-start, number_of_hosts))
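To make the role of "intensity" concrete, here is a minimal sketch of the same submit/collect loop with adns swapped out for a plain function, so it runs without a network. The names resolve_bounded, lookup and peak_in_flight are hypothetical and not part of async_dns.py, and unlike adns the stand-in answers every pending query instantly.

```python
from collections import deque


def resolve_bounded(hosts, lookup, intensity=100):
    """
    Sketch of the submit/collect loop with a stand-in resolver.

    hosts: a list of hosts to resolve
    lookup: hypothetical callable(host) -> ip or None (replaces adns here)
    intensity: maximum number of queries allowed in flight at once
    Returns (resolved_hosts, peak_in_flight).
    """
    resolved_hosts = {}
    in_flight = deque()          # plays the role of active_queries
    host_queue = list(hosts)
    peak_in_flight = 0

    while host_queue or in_flight:
        # Top up: submit new queries until the cap is reached.
        while host_queue and len(in_flight) < intensity:
            in_flight.append(host_queue.pop())
        peak_in_flight = max(peak_in_flight, len(in_flight))

        # Collect: assume every in-flight query has completed.
        while in_flight:
            host = in_flight.popleft()
            resolved_hosts[host] = lookup(host)

    return resolved_hosts, peak_in_flight


if __name__ == "__main__":
    # A fake lookup table; hosts missing from it "fail" with None.
    fake_table = {"a.example": "192.0.2.1", "b.example": "192.0.2.2"}
    hosts = ["a.example", "b.example", "c.example"]
    resolved, peak = resolve_bounded(hosts, fake_table.get, intensity=2)
    print(resolved)
    print("peak in flight: %d" % peak)
```

The number of pending queries never exceeds the cap, no matter how long the host list is. AsyncResolver follows the same pattern, except that its queries finish whenever adns reports them as completed rather than all at once.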

I wrote it in a manner that makes it reusable in other programs. Here is an example of how to reuse this code:

    from async_dns import AsyncResolver

    ar = AsyncResolver(["www.google.com", "www.reddit.com", "www.nonexistz.net"])
    resolved = ar.resolve()

    for host, ip in resolved.items():
        if ip is None:
            print("%s could not be resolved." % host)
        else:
            print("%s resolved to %s" % (host, ip))

Output:

    www.nonexistz.net could not be resolved.
    www.reddit.com resolved to 159.148.86.207
    www.google.com resolved to 74.125.39.99
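With thousands of hosts you will probably want a summary rather than one line per host. Here is a small sketch of one way to split the returned dictionary into successes and failures. The helper name split_results is hypothetical and not part of async_dns.py, and the sample data is copied from the output above, so no DNS lookups happen.

```python
def split_results(resolved):
    """
    Split a {host: ip} dictionary, such as the one returned by
    AsyncResolver.resolve(), into (found, failed).

    found: a dictionary of host -> ip for hosts that resolved
    failed: a sorted list of hosts whose ip is None
    """
    found = {}
    failed = []
    for host, ip in resolved.items():
        if ip is None:
            failed.append(host)
        else:
            found[host] = ip
    return found, sorted(failed)


if __name__ == "__main__":
    # Sample data only; in real use this would come from ar.resolve().
    sample = {
        "www.google.com": "74.125.39.99",
        "www.reddit.com": "159.148.86.207",
        "www.nonexistz.net": None,
    }
    found, failed = split_results(sample)
    print("%d resolved, %d failed" % (len(found), len(failed)))
    for host in failed:
        print("%s could not be resolved." % host)
```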

Download async_dns.py

Download link: catonmat.net/ftp/async_dns.py

See you next time!