The commercially captured satellite imagery available in the SpaceNet dataset offers a wide array of opportunities for advances in computer vision and machine learning. This article introduces the SpaceNet dataset to readers who are unfamiliar with it.

What is SpaceNet?


SpaceNet is a publicly available dataset of commercial satellite imagery, accompanied by labelled information that can be used to train machine learning models. CosmiQ Works, Radiant Solutions and NVIDIA partnered to release the SpaceNet dataset to the public in order to foster innovation in computer vision algorithms that can automatically extract geometric features such as roads and buildings.

Dataset Source

To better understand the dataset, it helps to know where the data comes from. The satellite imagery is provided by DigitalGlobe, an American vendor of space imagery and geospatial content. DigitalGlobe owns a “constellation” of satellites that capture high-quality remote sensing imagery, including QuickBird, GeoEye-1 and the WorldView series (WorldView-1, WorldView-2, WorldView-3 and WorldView-4). The data generated by each satellite varies in characteristics such as resolution and geolocation accuracy.

Comparison of data captured by different DigitalGlobe satellites (for a more detailed comparison of each satellite, read the official DigitalGlobe specification document)

What locations are covered in SpaceNet?

SpaceNet contains satellite data for five Areas of Interest (AOIs). The table below summarizes the data in each AOI.

AOIs in SpaceNet

Round 1 of the SpaceNet Challenge (November 2016) used an open corpus of WorldView-2 imagery of Rio de Janeiro (AOI_1). This imagery was a mosaic with a 50 cm Ground Sample Distance (GSD), meaning each pixel covers roughly a 50 cm by 50 cm patch of ground, and it had eight spectral bands. In Round 1, 42 developers competed in an open challenge hosted by TopCoder to create algorithms that extract building footprints from satellite imagery.
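To make the GSD figures concrete, the short Python sketch below converts a tile's pixel dimensions and GSD into the ground area it covers. The 1000 × 1000 pixel tile size is a hypothetical value chosen for round numbers, not the actual SpaceNet tile size.

```python
def ground_footprint(width_px: int, height_px: int, gsd_m: float) -> tuple[float, float, float]:
    """Return (width_m, height_m, area_km2) covered by an image tile.

    Each pixel covers a gsd_m x gsd_m square on the ground, so the
    footprint is the pixel count times the GSD along each axis.
    """
    width_m = width_px * gsd_m
    height_m = height_px * gsd_m
    area_km2 = (width_m * height_m) / 1_000_000  # 1 km^2 = 1,000,000 m^2
    return width_m, height_m, area_km2


# Hypothetical 1000 x 1000 pixel tile at the 50 cm GSD of the Round 1 imagery
# versus the 30 cm GSD of the WorldView-3 imagery used in later rounds.
for gsd in (0.5, 0.3):
    w, h, a = ground_footprint(1000, 1000, gsd)
    print(f"GSD {gsd} m: {w:.0f} m x {h:.0f} m = {a:.2f} km^2, pixel area {gsd * gsd:.2f} m^2")
```

The same arithmetic works in reverse: a hypothetical 10 m wide building spans about 20 pixels at 50 cm GSD but about 33 pixels at 30 cm GSD, giving a model more pixels per building to work with.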

The next phases of the SpaceNet Challenge consist of a follow-on competition using DigitalGlobe’s 30 cm WorldView-3 imagery and building footprints for four new, geographically diverse cities around the globe. These datasets come in multiple imagery formats (panchromatic, multispectral, RGB pansharpened and multispectral pansharpened), which allows experimentation with different types of imagery. The data is also available for download already split into training and test sets.

Sample of a high-resolution image captured by the WorldView-3 satellite

How much area is covered?

The visualizations below show the geographical area covered by the SpaceNet dataset for each AOI.

AOI_1: Rio de Janeiro, Brazil

AOI_2: Las Vegas, USA

AOI_3: Paris, France

AOI_4: Shanghai, China

AOI_5: Khartoum, Sudan

Understanding the Dataset

To make the dataset easier to understand, you can think of SpaceNet as divided into two categories of data: vector data and raster data.

In a vector representation, we describe the data in terms of geometric shapes such as points, lines and polygons. This representation is usually written in a text markup syntax such as Well-Known Text (WKT) and explicitly stores the actual coordinates of each vertex.
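As a minimal illustration of how a vector geometry stores its vertex coordinates explicitly, the following Python sketch parses a simple WKT polygon and computes its area with the shoelace formula. The parser is a deliberately limited stand-in (single ring, 2-D only); a real project would more likely use a geometry library such as Shapely (`shapely.wkt.loads`). The footprint string is a hypothetical example.

```python
import re


def parse_wkt_polygon(wkt: str) -> list[tuple[float, float]]:
    """Parse the exterior ring of a simple WKT POLYGON into (x, y) vertices.

    Minimal sketch: handles only 'POLYGON ((x y, x y, ...))' with a single
    ring (no holes, no Z/M values, no MULTIPOLYGON).
    """
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.+?)\s*\)\)\s*", wkt, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    vertices = []
    for pair in match.group(1).split(","):
        x, y = pair.split()  # each vertex is "x y"
        vertices.append((float(x), float(y)))
    return vertices


def shoelace_area(vertices: list[tuple[float, float]]) -> float:
    """Planar polygon area from its vertex coordinates (shoelace formula)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


footprint = "POLYGON ((0 0, 10 0, 10 5, 0 5, 0 0))"  # hypothetical footprint, metres
vertices = parse_wkt_polygon(footprint)
print(len(vertices), shoelace_area(vertices))  # 5 50.0
```

The area is in the squared units of the coordinates, so it is only meaningful in square metres when the polygon is expressed in a projected, metre-based coordinate system rather than in longitude/latitude.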

In a raster representation (which includes imagery), we divide the area into equal-sized squares (cells, or pixels in an image) and assign a value to each square. For example, we may use a two-dimensional matrix to represent an area. Apart from an origin point (e.g. the top-left or bottom-left corner) and the cell size, no geographic coordinates are stored; the position of every other cell follows from its row and column index.
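The plain-Python sketch below illustrates the raster idea by burning a polygon into a 0/1 matrix: only the top-left origin and the cell size are stored, and every cell's ground position is computed from its row and column. Marking a cell when its centre lies inside the polygon is one simple rasterization rule chosen for illustration; libraries such as `rasterio.features.rasterize` provide production implementations.

```python
def point_in_polygon(x: float, y: float, vertices: list[tuple[float, float]]) -> bool:
    """Even-odd (ray casting) test: is (x, y) inside the polygon?"""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(vertices, origin_x, origin_y, cell_size, rows, cols):
    """Burn a polygon into a rows x cols matrix of 0/1 values.

    The origin is the top-left corner, so y decreases as the row index grows.
    A cell is set to 1 when its centre falls inside the polygon.
    """
    grid = []
    for r in range(rows):
        cy = origin_y - (r + 0.5) * cell_size  # y of the cell centre
        row = []
        for c in range(cols):
            cx = origin_x + (c + 0.5) * cell_size  # x of the cell centre
            row.append(1 if point_in_polygon(cx, cy, vertices) else 0)
        grid.append(row)
    return grid


mask = rasterize([(2, 2), (6, 2), (6, 6), (2, 6)],
                 origin_x=0.0, origin_y=8.0, cell_size=1.0, rows=8, cols=8)
for row in mask:
    print("".join(map(str, row)))
```

The result is an 8 × 8 grid with a 4 × 4 block of ones in rows 2 to 5 and columns 2 to 5. The vector polygon needed only four coordinate pairs, while the raster needs 64 cell values plus an origin and a cell size.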

Raster vs Vector data

Now, let’s take a look at both kinds of data in the context of SpaceNet.

Raster data in SpaceNet

Raster data in the SpaceNet dataset is provided as .tif images. These are GeoTIFF images, a special image format that embeds metadata for georeferencing the image, i.e. describing how each pixel maps to real-world coordinates and distances.
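The georeferencing metadata in a GeoTIFF is commonly expressed as an affine transform. The Python sketch below assumes the six-coefficient ordering used by GDAL's geotransform and maps between pixel indices and world coordinates; the example coefficients (a UTM-style origin with 0.3 m pixels) are hypothetical. With GDAL installed, `gdal.Open(path).GetGeoTransform()` returns such a tuple for a real file.

```python
def pixel_to_world(geotransform, col, row):
    """Map pixel (col, row) to world coordinates with a GDAL-style geotransform.

    geotransform = (x_origin, pixel_width, row_rotation,
                    y_origin, col_rotation, pixel_height)
    pixel_height is negative for north-up images. The result is the
    top-left corner of the pixel; pass col + 0.5, row + 0.5 for its centre.
    """
    x0, dx, rx, y0, ry, dy = geotransform
    x = x0 + col * dx + row * rx
    y = y0 + col * ry + row * dy
    return x, y


def world_to_pixel(geotransform, x, y):
    """Inverse mapping, limited in this sketch to north-up (unrotated) images."""
    x0, dx, rx, y0, ry, dy = geotransform
    if rx != 0 or ry != 0:
        raise ValueError("rotated geotransforms are not handled in this sketch")
    return (x - x0) / dx, (y - y0) / dy


gt = (500000.0, 0.3, 0.0, 4000000.0, 0.0, -0.3)  # hypothetical north-up geotransform
print(pixel_to_world(gt, 100, 200))
```

Because the pixel height is negative for north-up images, moving down the image decreases the y coordinate, which is why the row index is multiplied by a negative value.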

The Rio de Janeiro dataset (AOI_1) used in Round 1 of the SpaceNet Challenge is formatted slightly differently and lacks the 8-band multispectral and pansharpened multispectral data; it contains only RGB channels pansharpened to a resolution of 0.5 m. All the other datasets use the following directory structure for both train and test data.
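When working with the multi-format releases, a useful first step is to match the same image across the format folders. The Python sketch below assumes folders named `PAN`, `MUL`, `MUL-PanSharpen` and `RGB-PanSharpen`, each holding files named `<FORMAT>_<image_id>.tif`; these names are an assumption to be checked against your own download, and `group_by_image_id` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import Path

# ASSUMPTION: folder names and "<FORMAT>_<image_id>.tif" file naming are
# illustrative; verify them against the directory listing of your download.
FORMATS = ("PAN", "MUL", "MUL-PanSharpen", "RGB-PanSharpen")


def group_by_image_id(root):
    """Return {image_id: {format: Path}} for every .tif under root/<FORMAT>/."""
    groups = defaultdict(dict)
    for fmt in FORMATS:
        folder = Path(root) / fmt
        if not folder.is_dir():
            continue  # a release may not ship every format
        prefix = fmt + "_"
        for tif in sorted(folder.glob("*.tif")):
            if tif.stem.startswith(prefix):
                image_id = tif.stem[len(prefix):]  # strip the format prefix
                groups[image_id][fmt] = tif
    return dict(groups)
```

For example, `{k: v for k, v in group_by_image_id(root).items() if len(v) == len(FORMATS)}` keeps only the image IDs that are present in every format.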

Directory structure of raster data in SpaceNet

Most commercial satellites capture imagery in several coarser-resolution multispectral bands as well as a finer-resolution panchromatic band. This is because of the trade-off between spectral resolution (the number and width of the wavelength bands sampled by an imaging detector) and spatial resolution: a narrow band collects less light, so its pixels must cover a larger ground area to gather enough signal, whereas the wide panchromatic band can afford smaller pixels. The SpaceNet dataset provides panchromatic, RGB and 8-band multispectral images. Let’s discuss the characteristic properties of these images.
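To illustrate how the two kinds of bands can be combined, the Python sketch below implements a variant of the Brovey transform, one of the simplest pansharpening methods, normalising by the band mean. It is not necessarily the method used to produce SpaceNet's pansharpened products; images are plain nested lists to keep the sketch dependency-free, and the integer resolution `factor` is an assumed input.

```python
def upsample_nearest(band, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    out = []
    for row in band:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def brovey_pansharpen(ms_bands, pan, factor, eps=1e-9):
    """Brovey-style pansharpening (mean-normalised variant).

    ms_bands: list of 2-D lists, the coarse multispectral bands (same shape)
    pan:      2-D list, `factor` times finer than the multispectral bands
    Each upsampled band is scaled by pan / mean(upsampled bands), so the
    band ratios (colour) are kept while spatial detail comes from pan.
    """
    up = [upsample_nearest(b, factor) for b in ms_bands]
    rows, cols = len(pan), len(pan[0])
    sharpened = [[[0.0] * cols for _ in range(rows)] for _ in up]
    for r in range(rows):
        for c in range(cols):
            mean = sum(b[r][c] for b in up) / len(up)
            ratio = pan[r][c] / (mean + eps)  # eps avoids division by zero
            for k, b in enumerate(up):
                sharpened[k][r][c] = b[r][c] * ratio
    return sharpened


# Two hypothetical 1x1 multispectral bands sharpened by a 2x2 panchromatic band.
out = brovey_pansharpen([[[10.0]], [[30.0]]], [[20.0, 40.0], [10.0, 0.0]], factor=2)
print(out)
```

For WorldView-2 and WorldView-3 the panchromatic band is nominally four times finer than the multispectral bands, so `factor=4` would apply; real pipelines also need careful co-registration and smoother resampling than nearest neighbour.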

MUL/RGB