According to Wikipedia,

Reverse geocoding is the process of back (reverse) coding of a point location (latitude, longitude) to a readable address or place name. This permits the identification of nearby street addresses, places, and/or areal subdivisions such as neighbourhoods, county, state, or country.

Reverse geocoding is crucial to my work at OpenSignal, where I have to churn through terabytes of crowdsourced data to compare operator performance (in terms of coverage and data speeds) and user numbers in various geographical areas across the globe. To make these analyses easier, I built a reverse geocoder library in Python that is offline and fast, and that improves on an existing one built by Richard Penman. Because it works offline, you do not have to deal with slow web APIs (such as Nominatim and Google) or their query limits. The library was built with speed in mind: it can reverse geocode 10 million GPS coordinates in under 30 seconds on a machine with 8 cores.

Since its release over a year ago, the library has been pretty well received by the community. It has been downloaded over 10,000 times and has received 1,230 stars on GitHub. Being featured as the #1 post on Hacker News certainly helped drive a lot of traffic to it. I'm really grateful to everyone in the community who has helped report and squash quite a few bugs along the way.

Under the Hood

The library comes packaged with a database of places with a population greater than 1,000, obtained from GeoNames. The entire database is loaded into a k-d tree, and a nearest-neighbour search is used to find the city or town closest to the input point location. There's a nice explanation of k-d trees in Data Skeptic, one of my favourite podcasts. The SciPy implementation of k-d trees is, unfortunately, single-threaded and does not exploit the multiple CPU cores available on your machine. To improve performance, I therefore implemented a parallelised k-d tree that comes into its own for really large inputs (on the order of millions of coordinates), as seen in the graph below.
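
To make the k-d tree idea concrete, here is a minimal pure-Python sketch of a nearest-neighbour lookup. It is not the library's code: the five sample places, the function names (to_xyz, build, nearest, reverse_geocode) and the choice to map coordinates onto the unit sphere before building the tree are all assumptions made for illustration. The real library loads its places from the GeoNames database and builds on SciPy's k-d tree.

```python
import math

# Hypothetical sample data: (name, latitude, longitude). The real library
# loads its places from a GeoNames-derived database instead.
PLACES = [
    ("London", 51.5074, -0.1278),
    ("Paris", 48.8566, 2.3522),
    ("New York", 40.7128, -74.0060),
    ("Tokyo", 35.6762, 139.6503),
    ("Sydney", -33.8688, 151.2093),
]


def to_xyz(lat, lon):
    """Map latitude/longitude in degrees to a point on the unit sphere."""
    lat_r, lon_r = math.radians(lat), math.radians(lon)
    return (
        math.cos(lat_r) * math.cos(lon_r),
        math.cos(lat_r) * math.sin(lon_r),
        math.sin(lat_r),
    )


def build(items, depth=0):
    """Build a k-d tree from (point, label) pairs, splitting on the median."""
    if not items:
        return None
    axis = depth % 3  # cycle through the x, y and z axes
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    return {
        "item": items[mid],
        "axis": axis,
        "left": build(items[:mid], depth + 1),
        "right": build(items[mid + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return (squared_distance, label) of the stored point closest to target."""
    if node is None:
        return best
    point, label = node["item"]
    dist = sum((a - b) ** 2 for a, b in zip(point, target))
    if best is None or dist < best[0]:
        best = (dist, label)
    diff = target[node["axis"]] - point[node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < best[0]:
        best = nearest(far, target, best)
    return best


TREE = build([(to_xyz(lat, lon), name) for name, lat, lon in PLACES])


def reverse_geocode(lat, lon):
    """Return the name of the sample place nearest to (lat, lon)."""
    return nearest(TREE, to_xyz(lat, lon))[1]
```

Calling reverse_geocode(51.52, -0.17) on this sample data returns "London". Because each lookup is independent of the others, a large batch of coordinates can be split into chunks and handed to separate worker processes, which is the general idea behind parallelising the queries.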