Visualization is a fundamental part of modern data-centered applications: a plot can show you in the blink of an eye whether your data has the shape you’re expecting. But retrieving all your samples just to cram them into a graph that doesn’t have enough pixels to show them all is clearly not a good idea.

Downsampling seems the obvious next step, but how do you choose which samples to keep and which to throw away? The key idea is to keep the samples that make the overall shape of your data as similar to the original as possible.

Downsampling with LTTB

LTTB (Largest Triangle Three Buckets) is a downsampling algorithm described by Sveinn Steinarsson in his Master’s thesis that decimates your data while preserving its visual appearance.

The gist of the algorithm is:

1. Choose the number of output samples that you want (threshold).
2. Keep the first and last original samples and split the rest of the samples into threshold - 2 buckets of approximately the same size.
3. For each bucket, choose the best sample: the one that forms the triangle with the biggest area having as vertices itself, the best sample of the previous bucket (already chosen) and the average sample of the next bucket.
4. Return a list with the first sample, the selected best samples and the last sample.
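The steps above can be sketched in plain Elixir. This is a self-contained toy implementation for illustration, not ExLTTB’s actual code; samples are represented as {x, y} tuples:

```elixir
defmodule LTTBSketch do
  # If we already have few enough samples, keep them all
  def downsample(samples, threshold) when threshold >= length(samples), do: samples

  def downsample(samples, threshold) when threshold >= 3 do
    first = hd(samples)
    last = List.last(samples)
    middle = samples |> Enum.drop(1) |> Enum.drop(-1)

    # Split everything but the first and last sample into threshold - 2 buckets
    buckets = split_buckets(middle, threshold - 2)

    {selected, _prev} =
      buckets
      |> Enum.with_index()
      |> Enum.reduce({[], first}, fn {bucket, i}, {acc, prev} ->
        # The third vertex is the average of the next bucket
        # (or the last sample, for the final bucket)
        next_avg =
          case Enum.at(buckets, i + 1) do
            nil -> last
            next -> average(next)
          end

        # The best sample maximises the triangle area
        best = Enum.max_by(bucket, &triangle_area(prev, &1, next_avg))
        {[best | acc], best}
      end)

    [first | Enum.reverse(selected)] ++ [last]
  end

  defp split_buckets(samples, n) do
    size = length(samples) / n

    samples
    |> Enum.with_index()
    |> Enum.group_by(fn {_sample, i} -> min(trunc(i / size), n - 1) end)
    |> Enum.sort()
    |> Enum.map(fn {_key, pairs} -> Enum.map(pairs, &elem(&1, 0)) end)
  end

  defp average(bucket) do
    n = length(bucket)

    {Enum.sum(Enum.map(bucket, &elem(&1, 0))) / n,
     Enum.sum(Enum.map(bucket, &elem(&1, 1))) / n}
  end

  defp triangle_area({x1, y1}, {x2, y2}, {x3, y3}) do
    abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2
  end
end
```

Note how the output is always built out of original samples: the averages are only used as helper vertices when comparing triangle areas.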

The algorithm gives much better results than just dropping random samples, since it chooses the samples that maximise visual similarity to the original data. It also guarantees that every sample in the output was present in the original input, so if you’re downsampling a time series of events you can be sure that all plotted data comes from actual events and is not the result of averaging real events into a non-existent one.

Sprinkle some Elixir on it: ExLTTB

ExLTTB is the Elixir implementation of LTTB we wrote at Ispirata to visualise data coming out of Astarte, our open source IoT platform written in Elixir.

The library can be used to downsample all sorts of data, but in its most basic API ExLTTB.downsample_to expects a list of maps (or anything that can be used with the Access behaviour) containing :x and :y keys, and a threshold that is the number of output samples wanted.

Simple ExLTTB example
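With some hypothetical data it could look like this (assuming downsample_to takes the sample list and the threshold as its arguments; check the library docs for the exact signature):

```elixir
samples = [
  %{x: 0, y: 0},
  %{x: 1, y: 4},
  %{x: 2, y: 1},
  %{x: 3, y: 5},
  %{x: 4, y: 2},
  %{x: 5, y: 0}
]

# Keep 4 of the 6 samples: the first, the last and the two "best" ones
downsampled = ExLTTB.downsample_to(samples, 4)
```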

If you have data with a different shape, you just have to define three functions that must be passed as opts: sample_to_x_fun, sample_to_y_fun and xy_to_sample_fun. The first two must take a sample as input and return what you consider the x and y of your data. The last function takes an x and a y and must return a sample that can be used as an input to sample_to_{x,y}_fun (this function builds the average point used in the algorithm).

If your data contains additional fields, these are left unchanged: your output samples will contain the same fields of the original ones.

Let’s throw something more complex at it
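For instance, with samples shaped as hypothetical {x, y, metadata} tuples, the three accessor functions could be passed like this (assuming opts is a keyword list):

```elixir
samples = [{0, 1.0, :a}, {1, 3.5, :b}, {2, 2.0, :c}, {3, 7.5, :d}, {4, 1.5, :e}]

result =
  ExLTTB.downsample_to(samples, 3,
    # Extract x and y from a sample
    sample_to_x_fun: fn {x, _y, _meta} -> x end,
    sample_to_y_fun: fn {_x, y, _meta} -> y end,
    # Build the average sample used internally by the algorithm;
    # the metadata slot is a placeholder since the average is never emitted
    xy_to_sample_fun: fn x, y -> {x, y, nil} end
  )
```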

Doing it lazily

So we have a way to downsample our finite data. But what if we wanted to downsample an infinite stream of samples?

ExLTTB contains another module, ExLTTB.Stream, which provides a modified version of ExLTTB based on Stream functions. This makes it usable in Stream pipelines, so your result is computed lazily.

The API is slightly different: since you could have an infinite stream of data, it doesn’t make sense to specify how many output samples you want. So instead of a threshold, you specify an average bucket size: if your stream emits 1000 samples and you use an average bucket size of 10, ExLTTB will produce 100 output samples. The first sample is guaranteed to be the same, as in the original LTTB algorithm, while the last sample is the same only if the input is finite (otherwise, you don’t have the concept of a “last sample”).

To stress the difference, this function is called downsample instead of downsample_to.

ExLTTB.Stream also supports options for accessing complex data just like ExLTTB.
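Putting it together, a lazy pipeline over an infinite stream might look like this (here downsample is assumed to take the stream and the average bucket size as its arguments; check the library docs for the exact signature):

```elixir
# An infinite stream of %{x: ..., y: ...} samples
result =
  Stream.iterate(0, &(&1 + 1))
  |> Stream.map(fn x -> %{x: x, y: :math.sin(x / 10)} end)
  # Assumption: the second argument is the average bucket size
  |> ExLTTB.Stream.downsample(10)
  # Lazily take the first 20 downsampled samples
  |> Enum.take(20)
```

Nothing is computed until Enum.take/2 starts pulling samples through the pipeline, so the source stream never has to be fully materialized.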

Let’s plot something

Here you can see a small example of the results obtainable with ExLTTB (I’ve used Expyplot for the plots).