Working in French Guiana, i came to know the Mapazonia project which aims to improve OpenStreetMap data of the Amazon region. Starting to contribute, i learnt to use the OSM Tasking Manager, a tool that divides Areas Of Interest (AOI) into task tiles. Interested in the Guiana Shield, i wondered if i could use a similar way to map this area.

Once task tiles created, thousands of them for such a huge AOI, a question arises: Where to map first?

To answer this, we could think of several approaches:

we should find unmapped geographic features where the map is blank ;

; there should be less justified empty tiles in remote areas, where there is few people living;

where source data, satellite imagery , is of coarse resolution , geographic features covering large areas can be digitized (i.e. small scale mapping);

, is of , geographic features covering large areas can be digitized (i.e. small scale mapping); at heart of AOI first ;

Let’s build a task tile priority map from these ideas!

Existing data

We’ll begin by retrieving existing data to get a map similar to Martin Raifer’s work on node density. OSM extracts are available from Geofabrik and we can set up a PostGIS database following Regina Obe’s tutorial. Task tiles are level 12 tiles of the web mapping reference system. Here is how we create them:

CREATE TABLE tile AS SELECT x, y FROM generate_series(0, pow(2, 12)::INT - 1) AS x, generate_series(0, pow(2, 12)::INT - 1) AS y ; COMMENT ON TABLE tile IS 'Map tiles' ; COMMENT ON COLUMN tile.x IS 'Tile coordinate' ; COMMENT ON COLUMN tile.y IS 'Tile coordinate' ;

and optionally build their geometries (we are indeed at the conceptual frontier between vector and raster data models):

CREATE FUNCTION ST_GeomFromTileXY(z int, x int, y int, OUT tile_geom geometry(Polygon, 4326)) AS $$ DECLARE side real; xmin real; ymin real; xmax real; ymax real; minlon real; maxlon real; maxlat real; minlat real; BEGIN side := pow(2, z); xmin := x::real/side - .5; ymin := .5 - y::real/side; xmax := (x + 1)::real/side - .5; ymax := .5 - (y + 1)::real/side; minlon := 360.*xmin; maxlon := 360.*xmax; maxlat := 90. - 360.*atan(exp(-ymin*2.*pi()))/pi(); minlat := 90. - 360.*atan(exp(-ymax*2.*pi()))/pi(); tile_geom := ST_MakeEnvelope(minlon, minlat, maxlon, maxlat, 4326); END; $$ LANGUAGE plpgsql;

Reference system to tile coordinates conversion can be achieved as follows:

CREATE FUNCTION TileCoord(z int, point geometry(Point, 4326), OUT tile_x int, OUT tile_y int) AS $$ DECLARE tile_n real; sinphi real; x real; y real; BEGIN tile_n := power(2, z)::real; x := (ST_X(point) + 180.)/360.; sinphi = sin(ST_Y(point)*pi()/180.); y := .5 - ln((1. + sinphi)/(1. - sinphi))/(4.*pi()); tile_x := CAST(trunc(tile_n*x + .5) AS int); tile_y := CAST(trunc(tile_n*y + .5) AS int); END; $$ LANGUAGE plpgsql;

To compute node density, we first find the tile each point falls in:

CREATE TEMP TABLE planet_osm_point_dump AS SELECT id, type, (coord).tile_x, (coord).tile_y, geom FROM (SELECT row_number() OVER (ORDER BY type, id, (dump).path) AS id, type, TileCoord(12, (dump).geom) AS coord, (dump).geom AS geom FROM (SELECT 0 AS type, osm_id AS id, ST_DumpPoints(way) AS dump FROM planet_osm_point UNION ALL SELECT 1 AS type, osm_id AS id, ST_DumpPoints(way) AS dump FROM planet_osm_line UNION ALL SELECT 2 AS type, osm_id AS id, ST_DumpPoints(way) AS dump FROM planet_osm_polygon ) AS point ) AS point_tiled;

and count points by tile:

ALTER TABLE tile ADD COLUMN node_count INT DEFAULT 0;

UPDATE tile t SET node_count = count.num FROM ( SELECT x, y, count(*) AS num FROM planet_osm_point_dump GROUP BY x, y ORDER BY x, y) AS count WHERE t.x = count.x AND t.y = count.y;

Remote areas

The population seems to be concentrated on coasts and along main rivers, decreasing upstream. We’ll model remoteness with hydrological distance, the one traveled by surface runoff. It is computed with GRASS GIS r.stream.distance module. Hydrological modeling typically uses a flow direction map derived from a Elevation Model (DEM):

The USGS provides the GTOPO30 HYDRO1k dataset we can download from the Earth Explorer portal (ID GT30H1KSA for South America). Its projection is the Lambert Azimuthal Equal Area that is described by the PROJ.4 parameters:

+proj=laea +ellps=sphere +lon_0=-60 +lat_0=-15 +x_0=0.0 +y_0=0.0 +units=m

From these parameters, create a location with g.proj and a mapset:

g.proj -c proj4="+proj=laea +ellps=sphere +lon_0=-60 +lat_0=-15 +x_0=0.0 +y_0=0.0 +units=m" location=south_america g.mapset -c mapset=hydro location=south_america

The dataset contains several layers (see the README file) but we’re mainly interested in importing Elevation and Flow Directions:

r.in.gdal in=gt30h1ksa/sa_dem.bil out=elev r.in.gdal -e in=gt30h1ksa/sa_fd.bil out=dir_arc

Transform direction codes to GRASS GIS format using a lookup table file arc2grass.lut :

255 = 0 128 = 1 64 = 2 32 = 3 16 = 4 8 = 5 4 = 6 2 = 7 1 = 8

r.reclass in=dir_arc out=dir rules=arc2grass.lut

r.stream.distance -o dir=dir elev=elev method=up dist=dir_dist

To assign distance to tiles, we have to reproject the grid from LAEA to Web Mercator. It is at zoom level 15 a tile side is of about 1223 m, the closest to our distance grid resolution.

We first create a location with this projection and move to this location:

g.proj -c epsg=3857 location=mercator g.mapset -c mapset=priority location=mercator

Then create the level 15 region and reproject:

g.region w=-20037508.342789244 s=-20037508.342789244 e=20037508.342789244 n=20037508.342789244 rows=32768 cols=32768 r.proj in=dir_dist loc=south_america mapset=hydro out=dir_dist_1k meth=near

We now set the region to tile resolution and resample the grid:

g.region w=-20037508.3427892 s=-20037508.3430388 e=20037508.3427892 n=20037508.3430388 rows=4096 cols=4096 r.resamp.stats in=dir_dist_1k out=dir_dist meth=median

Export

We finally lighten the data, as we dont need submetric precision, and either export it to a raster file:

r.mapcalc "dir_dist = if(isnull(dir_dist), -1, int(dir_dist))" r.out.gdal -t in=dir_dist out=hydro_dist.tif type=UInt32 nodata=4294967295

or to text, prior to a PostGIS import:

r.mapcalc "y = row() - 1" r.mapcalc "x = col() - 1" r.out.xyz in=y sep=space out=hydro_dist-y.xyz r.out.xyz in=x sep=space out=hydro_dist-x.xyz r.out.xyz in=dir_dist sep=space out=hydro_dist.xyz paste hydro_dist-x.xyz hydro_dist-y.xyz hydro_dist.xyz | awk '!/-1$/ {print $3, $6, $9}' > hydro_dist.ssv

Note that NULL or masked cells won’t be exported (c.f. r.out.xyz documentation) making paste produce unexpected results.

CREATE TEMP TABLE tile_attr ( z INTEGER DEFAULT 12, x INTEGER, y INTEGER, value REAL);

COPY tile_attr (x, y, value) FROM 'hydro_dist.ssv' CSV DELIMITER ' ' ; UPDATE tile t SET hydro_dist = a.value FROM tile_attr a WHERE t.level = a.z AND t.x = a.x AND t.y = a.y ;

Distance to AOI

We here build a weight grid to increase priority for tiles located at the heart of our area of interest. The weight is the as-the-crow-flies distance to its boundary.

In the Web Mercator location with the level 12 zoom region, import (using v.proj if necessary) and rasterize your AOI:

v.to.rast in=aoi type=line,area out=aoi use=val value=1

Compute the distance to the AOI for outer cells:

r.grow.distance in=aoi dist=dist_outer_aoi

Proceed the same way for inner cells:

r.mapcalc "aoi_inv = if(isnull(aoi), 1, null())" r.grow.distance in=aoi_inv dist=dist_inner_aoi

Combine distance grids in inverting values for inner tiles:

r.mapcalc "dist_aoi = if(isnull(aoi), dist_outer_aoi, -dist_inner_aoi)"

With several AOIs, build a mask, shift and scale values to [0, 1]:

r.mapcalc "roi = aoi_1 ||| aoi_2" r.grow in=roi out=roi old=1 new=1 r.mask roi eval $(r.univar -g dist_aoi | grep "\(min\|max\)") r.mapcalc "dist_aoi = (dist_aoi - $min)/($max - $min)" --o

and compute a weighted average of each distance.

r.mapcalc "aois_dist = (dist_aoi_1 + dist_aoi_2)/2."

Imagery resolution

Imagery resolution can be retrieved the same way the Bing coverage analyser tool does (see the Coverage article on OSM Wiki). Here is an example Python code:

PATTERN = "http://t{srv}.domain.tld/tile/{key}.jpg" HEADER = "Content-Length"; server = randint(0, 7) url = PATTERN.format(srv=server, key=quadkey) meta = urlopen(url).info() size = meta.getheaders(HEADER)[0]

and the PostgreSQL function computing the quadkey from zoom level and tile coordinates:

CREATE FUNCTION QuadKey(z int, x int, y int, OUT quadkey character varying(16)) AS $$ DECLARE digit int; mask int; BEGIN quadkey := ''; FOR i IN REVERSE z..1 LOOP digit := 0; mask := 1 << (i - 1); IF (x & mask) != 0 THEN digit := digit + 1; END IF; IF (y & mask) != 0 THEN digit := digit + 2; END IF; quadkey := quadkey || digit; END LOOP; END; $$ LANGUAGE plpgsql;

In coarse resolution areas, Bing tiles are basically null. Let’s assess imagery quality by counting non null level 14 tiles:

SELECT substring(quadkey for 12) AS key, count(quadkey)/16 AS resolution_index FROM tile_bing WHERE z = 14 AND size > 0 GROUP BY substring(quadkey for 12) ORDER BY key ;

Cooking a priority index

With all these measurements by tile, we can now build a priority index according to our wishes. For example:

priority = ((1 - density) + dist_hydro + (1 - dist_aoi) + (1 - resolution))/4

sets a high value for tiles containing no nodes, located far from the coast but close to our area of interest and covered by low resolution imagery.

To use the priority map within QGIS, add a status column to the tile table, and create a dynamic TODO Tile layer with the DB Manager plugin and the following query:

SELECT row_number() OVER (ORDER BY priority DESC) AS id, priority, x, y, ST_ExteriorRing(geom) AS geom FROM tile WHERE status <> 'DONE' ORDER BY priority DESC LIMIT 32 ;

This new layer shows the 32 next task tiles to map:

Before mapping with JOSM, either save a selected tile to GPX (check FORCE_GPX_TRACK and note the use of ST_ExteriorRing ), or call GDAL from a layer action:

ogr2ogr -f GPX -sql "SELECT x || ' ' || y AS name, ST_ExteriorRing(geom) FROM tile WHERE x = '[% "x" %]' AND y = '[% "y" %]'" [% "x" %]_[% "y" %].gpx PG:dbname=osm -lco FORCE_GPX_TRACK=YES -nlt LINESTRING

Once the tile mapped, update its status attribute and refresh QGIS’ display ( F5 ).

Using this priority map, i noticed sometimes Mapbox imagery is of higher resolution than Bing’s. It would be useful to find if they provide an imagery resolution map. In addition, two tiles of consequential rank can be spatially very distant, finding a way to use them by clusters could be interesting. Finally, shouldn’t priority be directly included within OSMTM?

Happy mapping!