OSM to PostGIS – The Basics

Ever wondered how to import OSM (OpenStreetMap) data into PostGIS [1] for the purpose of visualization and further analytics? Here are the basic steps to do so.

There are a bunch of tools on the market— osm2pgsql; imposm; ogr2org; just to mention some of those. In this article I will focus on osm2pgsql [2] .

Let’s start with the software prerequisites. PostGIS comes as a PostgreSQL database extension, which must be installed in addition to the core database. Up till now, the latest PostGIS version is 3, which was released some days ago. For the current tasks I utilized PostGIS 2.5 on top of PostgreSQL 11.

This brings me to the basic requirements for the import – PostgreSQL >= 9.4 and PostGIS 2.2 are required, even though I recommend installing PostGIS >=2.5 on your database; it’s supported from 9.4 upwards. Please consult PostGIS’ overall compatibility and support matrix [3] to find a matching pair of components.

Osm2pgsql Setup

Let’s start by setting up osm2pgsql on the OS of your choice – I stick to Ubuntu 18.04.04 Bionic Beaver and compiled osm2gsql from source to get the latest updates.

Install required libraries

sudo apt-get install make cmake g++ libboost-dev libboost-system-dev \ libboost-filesystem-dev libexpat1-dev zlib1g-dev \ libbz2-dev libpq-dev libproj-dev lua5.2 liblua5.2-dev

Grab the repo

git clone https://github.com/openstreetmap/osm2pgsql.git

Compile

mkdir build && cd build cmake .. make sudo make install

If everything went fine, I suggest checking the resulting binary and its release by executing

./osm2pgsql-version. osm2pgsql version 1.0.0 (64 bit id space)

Data acquisition

In the world of OSM, data acquisition is a topic of its own, and worth writing a separate post discussing different acquisition strategies depending on business needs, spatial extent and update frequency. I won’t get into details here, instead, I’ll just grab my osm data for my preferred area directly from Geofabrik, a company offering data extracts and related daily updates for various regions of the world. This can be very handy when you are just interested in a subregion and therefore don’t want to deal with splitting the whole planet osm depending on your area of interest – even though osm2pgsql offers the possibility to hand over a bounding box as a spatial mask.

As a side note – osm data’s features are delivered as lon/lat by default.

So let’s get your hands dirty and fetch a pbf of your preferred area from Geofabrik’s download servers [4] [5]. For a quick start, I recommend downloading a dataset covering a small area:

wget https://download.geofabrik.de/europe/iceland-latest.osm.pbf

Optionally utilize osmium to check the pbf file by reading its metadata afterwards.

osmium fileinfo ~/osmdata/iceland-latest.osm.pbf

File: Name: /home/florian/osmdata/iceland-latest.osm.pbf Format: PBF Compression: none Size: 40357343 Header: Bounding boxes: (-25.7408,62.8455,-12.4171,67.5008) With history: no Options: generator=osmium/1.8.0 osmosis_replication_base_url=http://download.geofabrik.de/europe/iceland-updates osmosis_replication_sequence_number=2413 osmosis_replication_timestamp=2019-10-26T20:18:02Z pbf_dense_nodes=true timestamp=2019-10-26T20:18:02Z

Database setup

Before finally importing the osm into PostGIS, we have to set up a database enabling the PostGIS extension. As easy as it sounds – connect to your database with your preferred database client or pgsql, and enable the extension by executing

create extension postgis

.

Subsequently, execute

select POSTGIS_VERSION()

to validate the PostGIS installation within your database.

Database import

osm2pgsql is a powerful tool to import osmdata into PostGIS offering various parameters to tune. It’s worthwhile to mention the existence of the default-style parameter [6], which defines how osm data is parsed and finally represented in the database. The diagram below shows the common database model generated by osm2pgsql using the default style.

It’s hard to give a recommendation on how this style should be adopted, as this heavily depends on its application. As a rule of thumb, the default style is a good starting point for spatial analysis, visualizations and can even be fed with spatial services, since this layout is supported by various solutions (e.g. Mapnik rendering engine).

I will start off with some basic import routines and then move to more advanced ones. To speed up the process in general, I advise you to define the number of processes and cache in MB to use. Even if this blogpost is not intended as a performance report, I attached some execution times and further numbers for the given parametrized commands to better understand the impact the mentioned parameters have.

The imports were performed on a virtualized Ubuntu 18.04 (KVM) machine equipped with 24 cores (out of 32 logical cores provided by an AMD Ryzen Threadripper 2950X), 24GB RAM, and a dedicated 2TB NVMe SSD (Samsung 970 EVO).

Mandatory parameters

Before going into details, internalize the following main parameters:

-U for database user, -W to prompt for the database password, -d refers to the database and finally -H defines the host. The database schema is not exposed as a parameter and therefore must be adjusted via the search path.

Import utilizing default style

Default import of pbf: existing tables will be overwritten. Features are projected to WebMercator (SRID 3857) by default.

osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 --number-processes 24 -C 20480 iceland-latest.osm.pbf

The command was executed in ~11 seconds. The table below shows generated objects and cardinalities.

table_schema table_name row_estimate total index table public planet_osm_line 142681 97 MB 13 MB 67 MB public planet_osm_point 145017 19 MB 7240 kB 12 MB public planet_osm_polygon 176204 110 MB 17 MB 66 MB public planet_osm_roads 8824 14 MB 696 kB 5776 kB

Import utilizing default style with slim



Parameter s (“slim”) forces the tool to store temporary node information in the database instead of in the memory. It is an optional parameter intended to enable huge imports and avoid out-of-memory exceptions. The parameter is mandatory if you want to enable incremental updates instead of full ones. The parameter can be complemented with

--flat-nodes

to store this information outside the database as a file.

The command was executed in ~37 seconds. The table below shows generated objects and cardinalities.

table_schema table_name row_estimate total index table public planet_osm_line 142681 102 MB 18 MB 67 MB public planet_osm_nodes 6100392 388 MB 131 MB 258 MB public planet_osm_point 145017 23 MB 11 MB 12 MB public planet_osm_polygon 176204 117 MB 24 MB 66 MB public planet_osm_rels 9141 9824 kB 4872 kB 4736 kB public planet_osm_roads 8824 15 MB 1000 kB 5776 kB public planet_osm_ways 325545 399 MB 306 MB 91 MB

Import utilizing default style, tag configuration

By default, tags referenced by a column are exposed as separate columns. Parameter hstore forces the tool to store unreferenced tags in a separate hstore column.

Note: Database extension hstore must be installed beforehand.

osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 -s --hstore --number-processes 24 -C 20480 iceland-latest.osm.pbf

For the sake of completeness

–hstore-match-only and -hstore-all should be mentioned as well:

–hstore-all pushes standard tags to individual columns and the hstore column as well

The command was executed in ~35 seconds. The table below shows generated objects and cardinalities.

table_schema table_name row_estimate total index table public planet_osm_line 142875 106 MB 18 MB 71 MB public planet_osm_nodes 6100560 388 MB 131 MB 258 MB public planet_osm_point 151041 27 MB 12 MB 15 MB public planet_osm_polygon 176342 122 MB 24 MB 72 MB public planet_osm_rels 9141 9824 kB 4872 kB 4736 kB public planet_osm_roads 8824 16 MB 1000 kB 6240 kB public planet_osm_ways 325545 399 MB 306 MB 91 MB

Import utilizing default style, multi-geometry

By default, objects containing multiple disjoint geometries are stored as separate features within the database. Think of Vienna and its districts, which could be represented as 23 individual polygons or one multi-polygon. Parameter -G forces the tool to store geometries belonging to the same object as multi-polygon.

osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 -s -G --number-processes 24 -C 20480 iceland-latest.osm.pbf

The impact emerges most clearly during spatial operations, since the spatial index utilizes the feature to its fullextent, in order to decide which features must be considered.

The command was executed in ~36 seconds. The table below shows generated objects and cardinalities.

table_schema table_name row_estimate total index table public planet_osm_line 142681 102 MB 18 MB 67 MB public planet_osm_nodes 6100392 388 MB 131 MB 258 MB public planet_osm_point 145017 23 MB 11 MB 12 MB public planet_osm_polygon 174459 117 MB 24 MB 65 MB public planet_osm_rels 9141 9824 kB 4872 kB 4736 kB public planet_osm_roads 8824 15 MB 1000 kB 5776 kB public planet_osm_ways 325545 399 MB 306 MB 91 MB

Results and next steps

The table highlights the influence of osm2pgsql parameters on execution time, generated objects, cardinalities and subsequently sizing. In addition, it’s worth it to understand the impact of parameters like multi-geometry, which forces the tool to create multi-geometry features instead of single-geometry features. Preferring one over the other might lead to performance issues, especially when executing spatial operators (as those normally take advantage of the extent of the features).

The next posts will complement this post by inspecting and visualizing our import results and subsequently dealing with osm updates to stay up to date.

References

[1] „PostGIS Reference,“ [Online]. Available: https://postgis.net/. [2] „osm2pgsql GitHub Repository,“ [Online]. Available: https://github.com/openstreetmap/osm2pgsql. [3] „PostGIS Compatiblity and Support,“ [Online]. Available: https://trac.osgeo.org/postgis/wiki/UsersWikiPostgreSQLPostGIS. [4] „Geofabrik Download Server,“ [Online]. Available: https://download.geofabrik.de/. [5] „PBF Format,“ [Online]. Available: https://wiki.openstreetmap.org/wiki/PBF_Format. [6] „Default Style – osm2pgsql,“ [Online]. Available: https://wiki.openstreetmap.org/wiki/Osm2pgsql/schema. [7] „QGIS Osm styles,“ [Online]. Available: https://github.com/yannos/Beautiful_OSM_in_QGIS. [8] „osm2pgsql Database schema,“ [Online]. Available: https://wiki.openstreetmap.org/w/images/b/bf/UMLclassOf-osm2pgsql-schema.png.

Appendix

Generated objects and cardinalities from statistics

SELECT *, pg_size_pretty(total_bytes) AS total , pg_size_pretty(index_bytes) AS INDEX , pg_size_pretty(table_bytes) AS TABLE FROM ( SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM ( SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME , c.reltuples AS row_estimate , pg_total_relation_size(c.oid) AS total_bytes , pg_indexes_size(c.oid) AS index_bytes , pg_total_relation_size(reltoastrelid) AS toast_bytes FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE relkind = 'r' ) a ) a where a.table_schema ='public';

Generated objects and cardinalities with count

create function rowcount_all(schema_name text DEFAULT 'public'::text) returns TABLE(table_name text, cnt bigint) language plpgsql as $$ declare table_name text; begin for table_name in SELECT c.relname FROM pg_class c JOIN pg_namespace s ON (c.relnamespace=s.oid) WHERE c.relkind = 'r' AND s.nspname=schema_name LOOP RETURN QUERY EXECUTE format('select cast(%L as text),count(*) from %I.%I', table_name, schema_name, table_name); END LOOP; end $$; alter function rowcount_all(text) owner to postgres; with pgClass as ( SELECT *, pg_size_pretty(total_bytes) AS total , pg_size_pretty(index_bytes) AS INDEX , pg_size_pretty(table_bytes) AS TABLE FROM ( SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM ( SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME , c.reltuples AS row_estimate , pg_total_relation_size(c.oid) AS total_bytes , pg_indexes_size(c.oid) AS index_bytes , pg_total_relation_size(reltoastrelid) AS toast_bytes FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE relkind = 'r' ) a ) a) select rowcount_all.table_name, row_estimate,cnt,total,pgClass.INDEX,pgClass.TABLE from rowcount_all(),pgClass where rowcount_all.table_name=pgClass.TABLE_NAME