At Mapbox we are happy to open source RoboSat our production ready end-to-end pipeline for feature extraction from aerial and satellite imagery. In the following I describe technical details, how it will change the way we make use of aerial and satellite imagery, and how OpenStreetMap can benefit from this project.

Berlin aerial imagery, segmentation mask, building outlines, simplified GeoJSON polygons

Live on-demand segmentation tile server for debugging purpose

Here is how RoboSat works.

The prediction core is a segmentation model — a fully convolutional neural net which we train on pairs of images and masks. The aerial imagery we download from our Mapbox Maps API in all its beauty. The masks we extract from OpenStreetMap geometries and rasterize them into image tiles. These geometries might sometimes be coarsely mapped but automatically extracting masks allows us to quickly bootstrap a dataset for training.

We then have two Slippy Map directory structures with images and corresponding masks. The Slippy Map directory structure helps us in preserving a tile’s geo-reference which will allow us later on to go back from pixels to coordinates. It is RoboSat’s main abstraction and most pipeline steps transform one Slippy Map directory into another Slippy Map directory.

We then train our segmentation models on (potentially multiple) GPUs and save their best checkpoints. We implemented our model architectures in PyTorch and are using GPUs, specifically AWS p2/p3 instances, and a NVIDIA GTX 1080 TI to keep our Berlin office warm during winter.

When we use the checkpoints for prediction on a Slippy Map directory with aerial imagery we get a Slippy Map directory with probabilities for every pixel in image tiles:

Parking lot prediction; probability scales saturation (S) in HSV colorspace

Which we turn into segmentation masks handling model ensembles and tile borders:

Smooth predictions across tile boundaries. Do you see tile boundaries here? No? Great!

Serializing the probabilities in quantized form and only storing binary model outputs allows us to save results in single-channel PNG files which we can attach continuous color palettes to for visualization. We do the same for masks and then make use of PNG compression to save disk space when scaling up this project e.g. across all of North America.

Based on the segmentation masks we then do post-processing to remove noise, fill in small holes, find contours, handle (potentially nested) (multi-)polygons, and simplify the shapes with Douglas-Peucker:

Segmentation masks, noise removal, restoring connectivity, finding contours

We then transform pixels in Slippy Map tiles into world coordinates: GeoJSON features. In addition we handle tile borders and de-duplicate against OpenStreetMap to filter out predictions which are already mapped.

The end result is a GeoJSON file with simplified (multi-)polygon recommendations. Thanks robots!

Here is an example visualizing the prediction pipeline:

Aerial imagery, segmentation probabilities, masks, extracted features, merging features across tile boundaries

I see RoboSat as a building block for multiple use-cases and projects:

RoboSat can “look” at every edit in OpenStreetMap in real-time to flag suspicious changesets. At the same time it can help to let good looking changesets go through without manual verification.

RoboSat can tell you how complete the map is in a specific area for a specific feature. For example: “Buildings in Seattle are 90% mapped”. And then it can show you unmapped buildings and polygon recommendations for them.

RoboSat can be integrated into imagery platforms like OpenAerialMap or toolchains like OpenDroneMap to generate a better understanding of the area minutes after flying your drone.

And while the possibilities are endless I want to emphasize that RoboSat is neither meant for fully automated mapping nor capable of doing so. We will use RoboSat as a supporting tool but not for automated imports.

In the coming months we will invest into RoboSat to expand it to multiple features like buildings and roads (what we already have internally; see images at the top), better handle variations in geography, imagery quality and zoom levels — all while making sure the pipeline stays generic and scalable.

If you want to give RoboSat or related projects a go check out Pratik’s note about using Mapbox imagery for machine learning.

Happy to hear your feedback; and feel free to open issues in the RoboSat repository for feature requests, ideas, questions, or bug reports :)