A blazing fast geo database with LevelDB, Go and Geohashes Aug 30, 2015

You have probably heard of LevelDB: it’s a blazing fast key-value store (a library, not a daemon) that uses Snappy compression.

There are plenty of uses for it, and the API is very simple, at least in Go (I will be using goleveldb).

The key is a []byte, the value is a []byte, and you can “get”, “put” & “delete”; that’s it.

I needed a low-memory, low-CPU system that could collect millions of geo data points and query over them. Geohash has an interesting property: you can encode latitude and longitude into a string, such as f2m616nn, which represents the lat & long 46.770, -71.304. If you shorten the string to f2m61 it still refers to the same position, only with less precision.

A 4-digit hash gives 19,545 meters of precision; to perform a lookup around a position you simply query the 8 adjacent blocks, using a Geohash library for Go.
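To make the truncation property concrete, here is a stdlib-only sketch of the standard geohash algorithm (interleave longitude and latitude bits, longitude first, then map every 5 bits to a base32 character). `encodeGeohash` is a hypothetical helper written for illustration, not the library’s API:

```go
package main

import (
	"fmt"
	"strings"
)

// The geohash base32 alphabet (no a, i, l, o).
const base32 = "0123456789bcdefghjkmnpqrstuvwxyz"

// encodeGeohash interleaves longitude and latitude bits by repeatedly
// halving each range, emitting one base32 character per 5 bits.
func encodeGeohash(lat, lon float64, precision int) string {
	var sb strings.Builder
	latMin, latMax := -90.0, 90.0
	lonMin, lonMax := -180.0, 180.0
	even := true // even bit positions encode longitude
	bit, ch := 0, 0
	for sb.Len() < precision {
		if even {
			mid := (lonMin + lonMax) / 2
			if lon >= mid {
				ch = ch<<1 | 1
				lonMin = mid
			} else {
				ch = ch << 1
				lonMax = mid
			}
		} else {
			mid := (latMin + latMax) / 2
			if lat >= mid {
				ch = ch<<1 | 1
				latMin = mid
			} else {
				ch = ch << 1
				latMax = mid
			}
		}
		even = !even
		bit++
		if bit == 5 {
			sb.WriteByte(base32[ch])
			bit, ch = 0, 0
		}
	}
	return sb.String()
}

func main() {
	full := encodeGeohash(46.770, -71.304, 8)
	fmt.Println(full)      // f2m616nn
	fmt.Println(full[:5])  // f2m61 — a prefix of the same hash, just coarser
}
```

Truncating the hash never changes which cell the prefix refers to; it only widens the cell, which is exactly what makes prefix scans over keys work.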

Here you would store all of the data points matching a geohash in the same set.

Problem: there is no such thing as a set in LevelDB.

But there is a cursor, so you can seek to a position and then iterate over the next or previous entries (keys are byte ordered).

So your data could be stored that way: a 4-digit geohash + a unique id.
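The set-emulation idea can be sketched with the standard library alone: a sorted slice stands in for LevelDB’s ordered key space, and a prefix scan stands in for seeking an iterator to the geohash prefix and walking forward. The point ids below are made up for illustration:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixScan returns every key in the sorted slice that starts with
// prefix, mimicking a LevelDB iterator seeking to the prefix and
// advancing until keys stop matching.
func prefixScan(sortedKeys []string, prefix string) []string {
	// Seek: first index whose key is >= prefix.
	i := sort.SearchStrings(sortedKeys, prefix)
	var out []string
	for ; i < len(sortedKeys); i++ {
		if !strings.HasPrefix(sortedKeys[i], prefix) {
			break // left the prefix range, like iter.Next() returning false
		}
		out = append(out, sortedKeys[i])
	}
	return out
}

func main() {
	// "4-digit geohash + unique id" keys (ids are hypothetical).
	keys := []string{
		"f2m5-0001",
		"f2m6-0001",
		"f2m6-0002",
		"f2m7-0001",
	}
	sort.Strings(keys)
	fmt.Println(prefixScan(keys, "f2m6")) // [f2m6-0001 f2m6-0002]
}
```

In LevelDB itself this is the iterator seek-and-scan; the sorted slice here only demonstrates why byte ordering gives you a set per geohash cell for free.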

Then you can perform a proximity lookup by searching for the 8 hashes adjacent to the position you are looking at, with a precision of 20 km; good but not very flexible.

We can have a more generic solution. First we need a key: a simple int64 unique id.

```go
// NewKey generates a new key using time prefixed by 'K'
func NewKey() Key {
	return NewKeyWithInt(time.Now().UnixNano())
}

// NewKeyWithInt returns a key prefixed by 'K' with value i
func NewKeyWithInt(id int64) Key {
	key := bytes.NewBufferString("K")
	binary.Write(key, binary.BigEndian, id)
	return key.Bytes()
}
```

Here we encode a key with a Unix timestamp, so our key is not just a key, it is also an encoded time value; it will be unique thanks to the nanosecond precision. We use BigEndian so keys can be byte compared: older sorts before newer.

Now for geo encoding, our key will be of the form:

G201508282105dhv766K.... (note the end of the key is binary encoded)

You always need a prefix for your keys so you can seek and browse them without running over different key types. Here I have a G as Geo, then a string-encoded date prefix so we can search by date. We don’t want extra precision there, since it would add extra seeks to LevelDB (that’s why we take the minutes modulo 10). Then we add a precise geohash and finally our previous unique id.

```go
// NewGeoKey generates a new key using a position & a key
func NewGeoKey(latitude, longitude float64) GeoKey {
	t := time.Now().UTC()
	kint := t.UnixNano()
	kid := NewKeyWithInt(kint)
	// G + string date + geohash 6 + timestamped key
	// G201508282105dhv766K....
	gk := geohash.EncodeWithPrecision(latitude, longitude, 6)
	ts := t.Format("2006010215")
	// modulo 10 to store 10mn interval
	m := t.Minute() - t.Minute()%10
	zkey := []byte("G" + ts + fmt.Sprintf("%02d", m) + gk)
	zkey = append(zkey, kid...)
	return zkey
}
```

We can now look up by flexible date & flexible proximity, like a Redis ZRANGE; you simply need to reverse the process.

```go
// GeoKeyPrefix returns prefixes to lookup using a GeoKey and time range
func GeoKeyPrefix(start, stop time.Time) []string {
	var res []string
	d := 10 * time.Minute
	t := start
	for {
		if t.After(stop) {
			break
		}
		key := "G" + t.Format("2006010215") + fmt.Sprintf("%02d", t.Minute()-t.Minute()%10)
		res = append(res, key)
		t = t.Add(d)
	}
	return res
}
```

Lookup that way:

```go
d := time.Duration(-10) * time.Minute
geoPrefixs := GeoKeyPrefix(time.Now().UTC().Add(d), time.Now().UTC())

// geohash precision (digits, meters):
// 1, 5003530
// 2, 625441
// 3, 123264
// 4, 19545
// 5, 3803
// 6, 610
gk := geohash.EncodeWithPrecision(lat, long, 4)
adjs := geohash.CalculateAllAdjacent(gk)
adjs = append(adjs, gk)

// for each adjacent block
for _, gkl := range adjs {
	// for each time range modulo 10
	for _, geoPrefix := range geoPrefixs {
		startGeoKey := []byte(geoPrefix + gkl)
		iter := s.NewIterator(util.BytesPrefix(startGeoKey), nil)
		for iter.Next() {
			log.Println(iter.Value())
		}
		iter.Release()
	}
}
```

It could be optimized further by reducing the size of the keys, but it already performs extremely well, storing around 3 million geopoints per day while using less than 3% CPU, and it can receive hundreds of queries per second.

Oh did I forget to mention it’s running on a Raspberry Pi? :)

I could maybe turn it into a library, but it’s so simple it’s probably useless.

Next blog post: what are those millions points used for?