2. Background

Before we explore network architectures and dive into the technical details, we want to explain why image segmentation of aerial images is useful and point to related research.

2.1. Image Segmentation

One of the reasons neural networks became so successful is their ability to solve complex tasks without any domain knowledge. This means that even though we used aerial images in this project, the same architecture can be used for any other image segmentation task. In fact, at no point did we use any of our knowledge about what roads or buildings should look like! In contrast, ten years ago, people would have handcrafted specific features and restricted the search space by manually specifying shape grammars or other mathematical constructs (describing the shapes of roads and buildings).

As with everything in life, there is an upside and a downside to having such a general-purpose tool at our disposal. The downside is that we are most likely giving up a bit of accuracy. For example, requiring parking spots to be rectangular might have given us another one percent increase in accuracy.

But on the upside, we:

Produce a more general model that can be re-purposed.

Avoid making any assumptions, reducing our (inaccurate) bias¹.

Create results that are easier to understand and reproduce.

¹ Side note: We refer to bias that gets introduced when designing a solution. Whenever you select specific features and ignore others, you are inserting bias (which can have a positive or negative effect). An example would be to assume that all buildings are rectangular.

Historically, CNNs were mainly used for classifying images, but in recent years image segmentation has become increasingly popular. You can now find numerous papers describing various approaches. For a start, we recommend “Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition” (by Wei, X.S. et al. 2016) and “Fully Convolutional Networks for Semantic Segmentation” (by Shelhamer, E. et al. 2016). In this article, we draw upon many of their ideas, especially from the second paper.

2.2. Aerial Images

Aerial and satellite images are a natural fit for image segmentation because they support many different applications. We will discuss some here, but please let us know if you come across any other interesting applications (elias.s@highdimension.io).

Autonomous vehicles are the first use case. Safety is the most important factor when designing an autonomous vehicle, but safety requires an understanding of the environment. The width of the road and the position of sidewalks or pedestrian crossings are only a few of the things an autonomous vehicle needs to know. Current maps do not provide this information, and driving around to collect that knowledge is slow and expensive. By analysing satellite and aerial images, we can build high-definition maps that can help all autonomous vehicles navigate the world more safely.

The detection of parking spots is another interesting application. Even though aerial images are usually not available in real time, the initial position of parking spots, good drop-off points and pick-up locations is of value for both transportation companies and service providers such as Parkopedia. Furthermore, acquiring multiple images from different days, weeks or months allows us to estimate the usage intensity of streets, parking spots and urban areas in general, which is not only highly relevant for city planning but can also help companies make more accurate decisions.
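To make the idea of usage intensity concrete, here is a minimal sketch. It assumes that each image has already been segmented and reduced to an occupied/empty flag per parking spot. The function name `usage_intensity`, the spot identifiers and the dates are hypothetical and only serve as illustration; they are not part of our pipeline.

```python
def usage_intensity(observations):
    """Return, per spot, the fraction of observations in which it was occupied.

    `observations` maps an acquisition date to a dict of
    {spot_id: occupied}. Spots missing from an image (for example
    because of clouds) are simply not counted for that date.
    """
    seen = {}      # how often each spot was observed
    occupied = {}  # how often each spot was observed as occupied
    for spots in observations.values():
        for spot_id, is_occupied in spots.items():
            seen[spot_id] = seen.get(spot_id, 0) + 1
            occupied[spot_id] = occupied.get(spot_id, 0) + (1 if is_occupied else 0)
    return {spot_id: occupied[spot_id] / seen[spot_id] for spot_id in seen}


# Made-up example data: four acquisitions, two parking spots.
observations = {
    "2017-03-01": {"A": True, "B": False},
    "2017-04-01": {"A": True, "B": True},
    "2017-05-01": {"A": False, "B": False},
    "2017-06-01": {"A": True},  # spot B not visible in this image
}
intensity = usage_intensity(observations)
```

With only a handful of images per spot the estimate is coarse, so in practice one would want many acquisitions spread over different times before drawing conclusions.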

Last but not least, we are also detecting buildings, which can give us a good estimate of how populated a given area is. In fact, large NGOs are often interested in population estimates for areas where the government is not releasing reliable information. Assessing how much area is covered by buildings can significantly improve those estimates.
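As a rough illustration of how building coverage could be read off a segmentation result, the sketch below counts building pixels in a label mask. It assumes the mask is a 2D list of integer class labels, that label 1 means "building", and that the ground resolution (metres per pixel) is known. The function names and the example values are hypothetical.

```python
def count_building_pixels(mask, building_label=1):
    """Count the pixels in `mask` that carry the building label."""
    return sum(1 for row in mask for px in row if px == building_label)


def building_coverage(mask, building_label=1):
    """Return the fraction of the image area covered by buildings."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    return count_building_pixels(mask, building_label) / total


def built_up_area_m2(mask, metres_per_pixel, building_label=1):
    """Return the built-up area in square metres (assumed resolution)."""
    return count_building_pixels(mask, building_label) * metres_per_pixel ** 2


# Made-up 4x4 mask: 0 = background, 1 = building.
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
coverage = building_coverage(mask)                   # 5 of 16 pixels
area = built_up_area_m2(mask, metres_per_pixel=0.5)  # 5 pixels * 0.25 m^2
```

Turning such a coverage figure into a population estimate needs further assumptions (for example, people per square metre of built-up area), which are outside the scope of this sketch.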