Remember the last time we helped Santa analyse statistical data using basic features of PostGIS? If not, take a time to read it first, cause now it’s time to move on to some serious stuff 😈 We’ll take a look at how to speed up queries execution and try to find the shortest path to our desired destination.

In the first tutorial we ended up writing simple queries with geospatial data using PostgreSQL and PostGIS. This powerful extension allowed us to add geospatial context to our data and analyse it from the totally different perspective, like calculating the distance between elements, joining tables based on objects’ position and much more. The problem we encountered was the time of queries execution even on the small dataset e.g. the Problem 2 with geospatial join of two tables (3500 and 250 rows) from the last post took about 3.5s to finish. A quick look at the EXPLAIN ANALYZE function result shows the problem.

Sort (cost=229281.12..229281.74 rows=246 width=23) (actual time=3233.233..3233.254 rows=246 loops=1)

Sort Key: (count(children.id)) DESC

Sort Method: quicksort Memory: 44kB

-> GroupAggregate (cost=229266.01..229271.35 rows=246 width=23) (actual time=3232.137..3233.140 rows=246 loops=1)

Group Key: countries.id, countries.name

-> Sort (cost=229266.01..229266.73 rows=288 width=19) (actual time=3232.125..3232.444 rows=3619 loops=1)

Sort Key: countries.id, countries.name

Sort Method: quicksort Memory: 309kB

-> Nested Loop Left Join (cost=0.00..229254.25 rows=288 width=19) (actual time=25.742..3230.079 rows=3619 loops=1)

Join Filter: ((countries.geom ~ children.geom) AND _st_contains(countries.geom, children.geom))

Rows Removed by Join Filter: 861175

-> Seq Scan on countries (cost=0.00..32.46 rows=246 width=1996) (actual time=0.020..0.445 rows=246 loops=1)

-> Materialize (cost=0.00..87.73 rows=3515 width=36) (actual time=0.000..0.288 rows=3515 loops=246)

-> Seq Scan on children (cost=0.00..70.15 rows=3515 width=36) (actual time=0.023..1.331 rows=3515 loops=1)

Planning time: 1.274 ms

Execution time: 3433.486 ms

For each country, the database is doing a sequential scan of all children objects performing complex operation of checking whether childrens.geom is within countries.geom or not, which gives 864'690 checks. Can we make it faster? Yes we can, let’s add indexes!

Spatial indexes

But how..? The normal way of adding index fails because we are trying to create it on a binary column which exceeds the max size.

=# CREATE INDEX children_gix ON chidren (geom);

ERROR: index row requires 11340 bytes, maximum size is 8191

The answer is to use special GIST index which instead of treating geom as binary data, uses its bounding box. The detailed explanation on how it works can be found here. To use this feature simply add USING GIST clause to your command.

=# CREATE INDEX children_gix ON chidren USING GIST (geom);

Now it’s time for some benchmarks. The Problem 2 shown before now takes only 400ms to finish so it’s almost 10x faster!

Pathfinding

Now as everything works fast, we can move to more advanced problems like finding the shortest and most optimal path between set of points. However, PostGIS doesn’t have built-in functions for performing such calculations, there is another extension named pgRouting which integrates with PostGIS and adds geospatial routing functionality to your database. That’s exactly what we were looking for!

Setup

pgRouting setup is easy on both OSX and Ubuntu operating systems. For detailed instructions for any OS visit official manual.

OSX

Use Homebrew package manager and run:

brew install pgrouting

Ubuntu



sudo sh -c 'echo "deb



# Import the repository key, update the package lists

sudo apt-get install wget ca-certificates # Create /etc/apt/sources.list.d/pgdg.list.sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'# Import the repository key, update the package listssudo apt-get install wget ca-certificates wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - sudo apt-get update



# Install pgrouting based on your postgres Installation: for this example is 9.3

sudo apt-get install postgresql-9.3-pgrouting

Features

Routing functions

pgRouting has support for wide range of routing algorithms eg.:

pgr_astar Shortest path A*

Shortest path A* pgr_dijkstra Dijkstra’s shortest path algorithm

Each of these functions takes as an input an edge_sql (SQL query, which should return a set of rows with the following columns: id, source, target, cost), start_vid (id of the starting vertex), end_vid (id of the ending vertex)

Database

Like PostGIS, pgRouting is also an optional extension, so before using its features you have to enable it in your database.

$ createdb pgr_tutorial

$ psql pgr_tutorial

pgr_tutorial=# CREATE EXTENSION postgis;

CREATE EXTENSION

pgr_tutorial=# CREATE EXTENSION pgrouting;

CREATE EXTENSION

Now, when everything is ready we can load some data. To explain the basics of pgRouting I’ll be using the public roads network of Masovian Voivodeship in Poland, downloaded from Geofabrik and converted using os2po tool. You can download it here [12MB].

gunzip -c roads.sql.gz | psql pgr_tutorial

Roads network

The only table in the dataset is roads which represents road network as LineStrings.

CREATE TABLE roads (

id integer NOT NULL PRIMARY KEY,

geom geometry(LineString, 4326)

);

Roads network of Masovian Voivodeship, Poland (Preview generated using QGis)

Network topology

As we have seen before, routing functions take as an argument network of connections in the form of a graph (start node id, end node id), so the last thing to do is to convert our actual road network to a graph. Fortunately, pgRouting helps us a lot with this task.

Start with adding two additional columns to our roads table.

ALTER TABLE roads ADD COLUMN source integer;

ALTER TABLE roads ADD COLUMN target integer;

CREATE INDEX roads_source_idx ON roads (source);

CREATE INDEX roads_target_idx ON roads (target);

And then use pgr_createTopology function which builds a network topology based on the geometry information (it analyses roads geometry and automatically assigns node ids to the source and target columns).

SELECT pgr_createTopology('roads', 0.0001, 'geom', 'id');

After about 5 minutes of calculations the dataset will be ready.

Cost

Since most of the algorithms need cost parameter assigned to each edge (road), we will use dynamically calculated length of each road using PostGIS st_length function (in the real application we should consider a lot of additional parameters such as type of roads, traffic, velocity limits etc.).

Start navigating! 🚗💨

After this short introduction let’s move to some real-world examples and find the shortest path between nodes #1 and #5000 using Dijkstra algorithm.

SELECT seq, node, edge, cost as cost, agg_cost, geom

FROM pgr_dijkstra(

'SELECT id, source, target, st_length(geom, true) as cost FROM roads',

1,

5000

) as pt

JOIN roads rd ON pt.edge = rd.id;

After about 1.5s the results appear (The total path length is about 57km):

Results list of nodes and roads on the path

Shortest path between nodes #1 and #5000 (Preview generated using QGis)

You may ask what are the nodes #1 and #5000? The answer is simple: just random numbers to verify if everything is working well. 😄 Of course the most common usage of routing applications is to find path between points on the map (lat, lon), not random nodes. To achieve that we have to find the nearest node to our desired locations, and then use the same query as before. So let’s find the way from our office (52.231909, 20.983455) to the Warsaw Chopin Airport (52.169647, 20.973400).

Query to find nearest node (PostGIS uses reversed order for lat lon):

SELECT source

FROM roads

ORDER BY ST_Distance(

ST_StartPoint(geom),

ST_SetSRID(ST_MakePoint(20.983455, 52.231909), 4326),

true

) ASC

LIMIT 1;

Final query:

SELECT seq, node, edge, cost as cost, agg_cost, geom

FROM pgr_dijkstra(

'SELECT id, source, target, st_length(geom, true) as cost FROM roads',

(SELECT source FROM roads

ORDER BY ST_Distance(

ST_StartPoint(geom),

ST_SetSRID(ST_MakePoint(20.983455, 52.231909), 4326),

true

) ASC

LIMIT 1),

(SELECT source FROM roads

ORDER BY ST_Distance(

ST_StartPoint(geom),

ST_SetSRID(ST_MakePoint(20.973400, 52.169647), 4326),

true

) ASC

LIMIT 1)

) as pt

JOIN roads rd ON pt.edge = rd.id;

The result obtained from Google Maps is almost the same as ours!

Comparison of Google Maps with pgRouting

As we can see in this short tutorial PostGIS & pgRouting is simple, yet powerful set of tools only waiting for you to use it in your new application. It can work with roads, public transport, flights network and much more. Why not try building Google Maps competitor in a week or less? Or integrate navigation in your new “Tinder for …” app? Possibilities are almost infinite!

If you enjoyed this post, please don’t forget to tap ❤! You can also follow us on Facebook, LinkedIn and Twitter.