A Time Series Storage for Coordinates

Dec 11, 2019 (tags: go, geo)

TL;DR: Knowing your data helps you compress it better than general-purpose algorithms do.

Problem

For one of my side projects, an IoT database, I wanted a specialized time series to store timestamps coupled with coordinates.

I needed a simple solution that supports both live storage and cold compressed storage, and that tolerates gaps: IoT devices can be off for days and then reappear.

But I couldn’t find anything fitting my needs. The Gorilla paper from Facebook is really nice, but it expects a maximum gap of 4 hours between events.

Solution

Time series for geospatial data have particular properties:

- events often occur at regular intervals
- the object is either not moving at all, or
- the object is often moving at a constant speed

This means we can store the deltas between events instead of the full values, which reduces the space needed.

Storage

- One uint32 is needed to encode an uncompressed time as a Unix timestamp
- Two float32 are needed to encode an uncompressed latitude/longitude
  (stored as uint32: float32 * 1e5, since 0.00001 degree represents ~1 meter)
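The fixed-point conversion described above can be sketched as follows. This is only an illustration, not the tsd code: the names coordScale, toFixed and toDegrees are made up for the example, it uses float64 where the post mentions float32, and rounding to the nearest unit is an assumption. The factor of 100000 (five decimal places) matches the hex values in the recap example later in this post, where 48.22222 is stored as 004994ce.

```go
package main

import (
	"fmt"
	"math"
)

// coordScale keeps five decimal places, roughly 1 meter of resolution.
// (Hypothetical name, chosen for this sketch.)
const coordScale = 1e5

// toFixed converts degrees to a scaled integer, rounding to the nearest unit.
func toFixed(deg float64) int32 {
	return int32(math.Round(deg * coordScale))
}

// toDegrees converts a scaled integer back to degrees.
func toDegrees(v int32) float64 {
	return float64(v) / coordScale
}

func main() {
	lat, lng := toFixed(48.22222), toFixed(2.33333)
	// The uint32 conversion only reinterprets the bits for storage.
	fmt.Printf("%08x %08x\n", uint32(lat), uint32(lng)) // 004994ce 00038f75
	fmt.Println(toDegrees(lat), toDegrees(lng))
}
```

Keeping the value signed until the last moment lets negative latitudes and longitudes survive the round trip.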

By storing the delta between two events, we need less space:

Take 1201986030, followed 10 seconds later by 1201986040: the delta is 10, which fits in a uint8, or in a uint16 for a bigger delta, and either way takes less space than the original uint32.

The next event will probably occur 10 seconds later, at 1201986050. By storing deltas of deltas (which requires a signed integer), we store either 0 (zero is nice since it takes zero space) or the positive or negative change in the delta!

Many time series databases, Prometheus included, use this trick.

The same applies to coordinates: both a moving car and a parked car compress well with this technique.
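As a rough sketch of the delta-of-deltas idea, assuming values are already integers (Unix seconds or scaled coordinates): the function name deltaOfDeltas is hypothetical, and treating the first delta as measured against a previous delta of zero is an assumption of this example.

```go
package main

import "fmt"

// deltaOfDeltas returns, for each value after the first, how much the delta
// changed compared with the previous delta. The first delta is compared
// against an implicit previous delta of zero.
func deltaOfDeltas(values []int64) []int64 {
	out := make([]int64, 0, len(values))
	var prev, prevDelta int64
	for i, v := range values {
		if i == 0 {
			prev = v
			continue
		}
		delta := v - prev
		out = append(out, delta-prevDelta)
		prev, prevDelta = v, delta
	}
	return out
}

func main() {
	ts := []int64{1201986030, 1201986040, 1201986050, 1201986060}
	fmt.Println(deltaOfDeltas(ts)) // [10 0 0]

	// A scaled latitude that starts drifting by one unit per event.
	lat := []int64{4822222, 4822222, 4822221, 4822220}
	fmt.Println(deltaOfDeltas(lat)) // [0 -1 0]
}
```

Once the pace is steady, every entry after the first change is zero, which is what makes the scheme cheap for regular events.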

Because the gap between two times or two coordinates may be too large to fit in a uint16, we sometimes have to store a full uncompressed value.

We still have to store something for every event to indicate whether each field is a zero, a uint8 delta, a uint16 delta or a full value.

    const (
        Delta0  DeltaEncoding = 0b00
        Delta8  DeltaEncoding = 0b01
        Delta16 DeltaEncoding = 0b10
        Full32  DeltaEncoding = 0b11
    )
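A minimal sketch of how a delta could be mapped to one of these four classes. The function encodingFor and the underlying type uint8 are assumptions made for this example, and the signed ranges are inferred from the recap example later in this post, where a delta of -1 is stored as the single byte ff.

```go
package main

import (
	"fmt"
	"math"
)

// DeltaEncoding is assumed to be a small unsigned integer in this sketch.
type DeltaEncoding uint8

const (
	Delta0  DeltaEncoding = 0b00
	Delta8  DeltaEncoding = 0b01
	Delta16 DeltaEncoding = 0b10
	Full32  DeltaEncoding = 0b11
)

// encodingFor picks the smallest class able to hold d as a signed value.
// (Hypothetical helper, not taken from the tsd code.)
func encodingFor(d int64) DeltaEncoding {
	switch {
	case d == 0:
		return Delta0
	case d >= math.MinInt8 && d <= math.MaxInt8:
		return Delta8
	case d >= math.MinInt16 && d <= math.MaxInt16:
		return Delta16
	default:
		return Full32
	}
}

func main() {
	// 86400 is a one-day gap in seconds: too large for 16 bits.
	for _, d := range []int64{0, 10, -1, 300, 86400} {
		fmt.Printf("%6d -> %02b\n", d, encodingFor(d))
	}
}
```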

Two bits are enough to identify the timestamp encoding, and the same goes for latitude and longitude, so all three fit in a single control byte. Each one is read back with an & mask.

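    // TSDelta returns the encoding of the timestamp (bits 0-1).
    func (d DeltaEncoding) TSDelta() DeltaEncoding { return d & 0b000011 }

    // LatDelta returns the encoding of the latitude (bits 2-3).
    // In Go, & and >> share the same precedence; the parentheses only make it explicit.
    func (d DeltaEncoding) LatDelta() DeltaEncoding { return (d & 0b001100) >> 2 }

    // LngDelta returns the encoding of the longitude (bits 4-5).
    func (d DeltaEncoding) LngDelta() DeltaEncoding { return (d & 0b110000) >> 4 }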
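The write side of the control byte is not shown in this post. A possible counterpart to the accessors, assuming the same bit layout, might look like this; pack is a hypothetical helper and the underlying type uint8 is an assumption.

```go
package main

import "fmt"

type DeltaEncoding uint8

const (
	Delta0  DeltaEncoding = 0b00
	Delta8  DeltaEncoding = 0b01
	Delta16 DeltaEncoding = 0b10
	Full32  DeltaEncoding = 0b11
)

func (d DeltaEncoding) TSDelta() DeltaEncoding  { return d & 0b000011 }
func (d DeltaEncoding) LatDelta() DeltaEncoding { return (d & 0b001100) >> 2 }
func (d DeltaEncoding) LngDelta() DeltaEncoding { return (d & 0b110000) >> 4 }

// pack combines three 2-bit classes into one control byte:
// timestamp in bits 0-1, latitude in bits 2-3, longitude in bits 4-5.
func pack(ts, lat, lng DeltaEncoding) DeltaEncoding {
	return ts | lat<<2 | lng<<4
}

func main() {
	// Timestamp unchanged, latitude and longitude each stored on one byte.
	c := pack(Delta0, Delta8, Delta8)
	fmt.Printf("%02x\n", uint8(c)) // 14
	fmt.Println(c.TSDelta(), c.LatDelta(), c.LngDelta())
}
```

The value 14 is the control byte that appears at the start of row #4 in the recap example.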

Compression

Let’s recap with an example:

    #  timestamp   lat       lng      encoded
    1  1201986030  48.22222  2.33333  47a4d9ee004994ce00038f75
    2  1201986040  48.22222  2.33333  010a
    3  1201986050  48.22222  2.33333  00
    4  1201986060  48.22221  2.33334  14ff01
    5  1201986070  48.22220  2.33335  00
    6  1201986080  48.22219  2.33336  00

From #1 to #3, you can see that a stationary device reporting at a constant interval ends up consuming only 1 byte per event.

From #4 to #6, a device moving at a constant speed first takes 1 byte per coordinate plus the control byte, then ends up consuming only that same single control byte per event.
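The recap can be reproduced with a small encoder. This is a sketch written for this example, not the tsd implementation: Point, field, full32, next and Encode are hypothetical names, big-endian byte order is inferred from the hex in the example, and restarting the delta chain after a full value is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

type DeltaEncoding uint8

const (
	Delta0  DeltaEncoding = 0b00
	Delta8  DeltaEncoding = 0b01
	Delta16 DeltaEncoding = 0b10
	Full32  DeltaEncoding = 0b11
)

// Point is one event. Lat and Lng are degrees multiplied by 100000.
type Point struct {
	TS       uint32
	Lat, Lng int32
}

// field tracks the previous value and previous delta of one column.
type field struct{ prev, delta int64 }

// full32 appends the low 32 bits of v in big-endian order.
func full32(buf []byte, v int64) []byte {
	return append(buf, byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
}

// next appends the delta of deltas for v (or the full value when it does not
// fit in 16 bits) and returns the 2-bit class that was used.
func (f *field) next(buf []byte, v int64) ([]byte, DeltaEncoding) {
	delta := v - f.prev
	dd := delta - f.delta
	f.prev, f.delta = v, delta
	switch {
	case dd == 0:
		return buf, Delta0
	case dd >= math.MinInt8 && dd <= math.MaxInt8:
		return append(buf, byte(dd)), Delta8
	case dd >= math.MinInt16 && dd <= math.MaxInt16:
		return append(buf, byte(dd>>8), byte(dd)), Delta16
	default:
		f.delta = 0 // assumption: a full value restarts the delta chain
		return full32(buf, v), Full32
	}
}

// Encode returns one byte slice per event so that rows are easy to inspect.
func Encode(points []Point) [][]byte {
	var ts, lat, lng field
	rows := make([][]byte, 0, len(points))
	for i, p := range points {
		if i == 0 {
			// First event: three full values and no control byte.
			row := full32(nil, int64(p.TS))
			row = full32(row, int64(p.Lat))
			row = full32(row, int64(p.Lng))
			ts.prev, lat.prev, lng.prev = int64(p.TS), int64(p.Lat), int64(p.Lng)
			rows = append(rows, row)
			continue
		}
		row := []byte{0} // placeholder for the control byte
		var cTS, cLat, cLng DeltaEncoding
		row, cTS = ts.next(row, int64(p.TS))
		row, cLat = lat.next(row, int64(p.Lat))
		row, cLng = lng.next(row, int64(p.Lng))
		row[0] = byte(cTS | cLat<<2 | cLng<<4)
		rows = append(rows, row)
	}
	return rows
}

func main() {
	points := []Point{
		{1201986030, 4822222, 233333},
		{1201986040, 4822222, 233333},
		{1201986050, 4822222, 233333},
		{1201986060, 4822221, 233334},
		{1201986070, 4822220, 233335},
		{1201986080, 4822219, 233336},
	}
	for _, row := range Encode(points) {
		fmt.Printf("%x\n", row)
	}
}
```

Run on the six events of the example, it prints the same hex strings as the encoded column: 12 bytes for the first event, then 010a, 00, 14ff01, 00 and 00.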

On real data, these simple tricks compress better than common general-purpose compressors while being very easy on the CPU. Here are the compressed sizes for three datasets, compared with other algorithms:

    Size   Snappy  LZ4    TSD
    64776  61007   57225  23948
    18864  18870   18267  9883
    15768  12721   11860  4775
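To read these numbers as ratios, a few lines of arithmetic are enough. The sizes below are copied from the comparison above; the helper name ratio is made up for this example.

```go
package main

import "fmt"

// ratio returns the compressed size as a fraction of the raw size.
func ratio(compressed, raw int) float64 {
	return float64(compressed) / float64(raw)
}

func main() {
	// Each row: raw size, Snappy, LZ4, TSD (copied from the comparison).
	rows := [][4]int{
		{64776, 61007, 57225, 23948},
		{18864, 18870, 18267, 9883},
		{15768, 12721, 11860, 4775},
	}
	for _, r := range rows {
		fmt.Printf("raw %5d  snappy %5.1f%%  lz4 %5.1f%%  tsd %5.1f%%\n",
			r[0], 100*ratio(r[1], r[0]), 100*ratio(r[2], r[0]), 100*ratio(r[3], r[0]))
	}
}
```

On these three datasets, TSD output is roughly 30% to 52% of the raw size, while Snappy and LZ4 stay above 75%, and Snappy is even slightly larger than the raw data on the second one.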

Conclusion

There is nothing revolutionary here, and it can be improved a lot (only 6 bits of the control byte are used, values could be XORed, RLE…), but it demonstrates that a simple, specialized piece of software can perform better than a general-purpose compression library.

The code for tsd is up on GitHub!
