Mosaic is a nice way to group satellite images and to work with it as a single image. Creating mosaic in commercial (such as ArcGIS, or ENVI) and open source (QGIS) software is easy. However that’s not the case if you’re trying to implement it using open source libraries (that are used behind the scenes in ArcGIS, QGIS, etc) and to automate the whole process with Python. There are a lot of tips that commercial software handles silently without bothering you, so you don’t have to dig into details of it.

Since our goal is to have as much automated process of Detecting water areas on satellite images with computer vision as it is possible, we need to try to create mosaic using GDAL and Python 🙂 Let’s see how far we can go 🙂

There are no single answer on the question how to build a mosaic in GDAL, cause as it is common in GDAL, one goal could be achieved with many approaches. Depending on what input data you have algorithm could vary significantly.

Let’s start with some basic and later on we’ll try to develop a general approach for all cases.

There are 2 basic GDAL utilities that could be used for creating mosaic from command line:

There are a lot of things to take under consideration while using this GDAL utils. Each of them has it’s own pros and cons. Let’s try to play with them a bit.

To make process as much close to reality as possible, let’s create mosaic from Sentinel-2 images for NYC, Washington DC and Houston. After Getting Sentinel images and browsing it we can see that mosaic will cover big area. At the end our mosaic should look like this (and it will 🙂 ):

Size of each input file is 361 MB. Let’s see what we’ll get

So, let’s start with gdal_merge.

It took about 20 minutes for gdal_merge to complete the process. This is not acceptable. Let’s see how it looks in QGIS:

Mmm…. Black rectangle after 20 minutes? 🙂 No, this is not what we want.

Let’s try gdalwarp

With this command it seems like this is an infinite process.

It processes rasters one by one. After hours of work progress was still around 50 %. Definitely not acceptable.

After googling I found another approach worth to try:

build virtual dataset (gdalbuildvrt) from all 3 files convert virtual dataset to raster file (gdal_translate)

This sounds promising.

Let’s try this:

But this would be to easy to work 🙂 We got this warning:

Well… Bad sign. vrtFile.vrt file is created, but when we open it we see info only about NYC and WashingtonDC:

As warning message said, Houston was skipped.

Complaining on heterogeneous projections means that some of the raster files have different Spatial reference (this is quite logical, cause we explicitly chose Houston, NYC and Washington DC that are fare away from each other). Let’s check our suspicions: will run gdalinfo for each raster:

1) For NYC we have

2) For Washington DC we have

3) For Houston it shows

Aha! NYC and Washington DC have the same Spatial Reference, but Houston has different one! Ok, then we should reproject file into the same UTM zone and then try to build virtual raster once again. Since we have 2 files in UTM zone 18, we’re gonna reproject Houston (which is in UTM zone 15).

For reprojection we will use gdalwarp.

To perform transformation between between different UTM zones we need to pass to gdalwarp source and destination Spatial References in PROJ.4 format. How can we get this info? The answer is to use gdalsrsinfo util.

Let’s run this command for NYC raster file and Houston raster:

Output is following:

For now we’re interested only in PROJ.4 block. For Washington DC gdalsrsinfo will show exactly the same PROJ.4 block.

For Houston we will have

Ok, so now we will reproject Houston to UTM zone of NYC and Washington DC.

Here we created a new output file and specified source and destination Spatial References as well as additional options for better processing. After couple of seconds we have reprojected file.

If we open now all 3 files in QGIS full extent will look like this:

Looks good! Even though we reprojected Houston raster it is still in the right place (south-west corner)!

Let’s check what gdalinfo shows for newly created reprojected Houston file:

Exactly what we wanted.

Ok, so now we can try to build virtual dataset once again and we do not expect warnings about heterogenous projects.

So, we run

This time it goes without warnings and .vrt file contains information about 3 rasters (NYC, Washington DC and Houston):

Good sign! Now it’s time to convert this .vrt to raster files. But with gdal_translate command for mosaic we need to use important additional settings

It’s important to set TILED, BIGTIFF, COMPRESS and PHOTOMETRIC settings, cause our mosaic is gonna be a big file (though COMPRESS option will help us to avoid huge file size). Without setting this properties we’re gonna face a lot of problems.

It takes some time to build mosaic. After approx. 10 minutes process should be finished.

Size of mosaic is 204 MB! Of course, this is because of compression we used. 204 MB file mosaic for 3 files 360 MB each! Not bad!

The only problem is that when you try to open this mosaic file in SNAP or QGIS is takes around 10 minutes to open the file. This is really strange, but most likely this is because QGIS and SNAP have some problems with this compressed files. Not really sure about this. I tried different combination for gdal_translate, but without success. Still it took a lot of time to display it.

In SNAP it also takes a lot o time to display and it displays it with some “blocks”

Most likely reason for that is the same as in QGIS.

Anyway, even with this restriction here’s a summary of mosaic creation from Sentinel-2 images:

Check if rasters have the same Spatial reference. If not then reproject them to one UTM zone with gdalwarp. Build virtual dataset. Translate virtual dataset to raster file with options for big tif mosaic files.

This process can be easily automated with Python.