par_psql v0.1: 'Parallel psql', for queries and workflows in PostgreSQL/PostGIS.
================================================

Hi everyone

http://github.com/gbb/par_psql/

I’ve written a tool, par_psql, that makes parallelisation easier for PostgreSQL/PostGIS users by adding one new piece of syntax to SQL scripts: the --& marker.

With --& inline, it runs queries or groups of queries in parallel.

Without --&, it first waits for any parallel work to finish, then runs the subsequent code normally.

This allows easy control of parallelism and synchronisation inline within your SQL script.
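For readers who think in shell terms, this behaviour resembles bash background jobs followed by `wait`. The sketch below is only an analogy and not par_psql's actual implementation: `run_query` is a hypothetical stand-in for sending a query to the database, and `sleep` stands in for query run time.

```shell
#!/usr/bin/env bash
# Analogy only: run_query is a hypothetical stand-in for running SQL via psql.
log=$(mktemp)

run_query() {
  sleep "$2"            # pretend the query takes this many seconds
  echo "$1" >> "$log"   # record the order in which "queries" finish
}

run_query "create table a"  0      # no marker: runs normally, in sequence
run_query "create table a1" 2 &    # like a query marked with --&
run_query "create table a2" 1 &    # like a query marked with --&
wait                               # an unmarked query first waits for parallel work
run_query "create table c"  0      # then runs normally

cat "$log"
```

The two background jobs overlap, so the whole run takes about as long as the slowest of them, and "create table c" is always logged last.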

The tool is backwards compatible with existing psql scripts, and par_psql scripts are backwards compatible with psql: --& is an ordinary SQL comment, so plain psql ignores it and runs the script sequentially. It should work with any version of PostgreSQL. The only dependencies are bash and psql.

Benchmarks and example code are provided at http://github.com/gbb/par_psql.

Quick example
=============

create table a as ...
create table a1 as ... --&
create table a2 as ... --&
create table c ...

Some cool uses
==============

1. GIS and any other discipline where you prepare diverse source datasets in a multi-stage workflow before integrating them.

2. Where you have CPU-intensive queries, split the work on one field (e.g. ID), create the temporary tables in parallel, then UNION the results.

3. Add "preview runs" that complete progressively using subsets of the data, without delaying the main task.

4. Create scripts where several tasks run at fixed times after the script begins (use pg_sleep() and run them in parallel).
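As a rough illustration of use 2, the shell snippet below writes a small par_psql script that splits one CPU-heavy query on ID and recombines the parts. Every name in it (big_table, id, geom, slow_fn, part_0, part_1, result) is hypothetical. It uses ordinary tables rather than CREATE TEMP TABLE, on the assumption that parallel queries may run in separate sessions, where session-local temporary tables would not be visible to later statements.

```shell
#!/usr/bin/env bash
# Sketch only: all table, column and function names below are hypothetical.
script=$(mktemp)

# Quoted heredoc delimiter, so the shell leaves the SQL untouched.
cat > "$script" <<'SQL'
create table part_0 as select id, slow_fn(geom) as val from big_table where id % 2 = 0; --&
create table part_1 as select id, slow_fn(geom) as val from big_table where id % 2 = 1; --&
create table result as select * from part_0 union all select * from part_1;
drop table part_0, part_1;
SQL

# Count the statements marked for parallel execution.
grep -c -e '--&$' "$script"
```

The two halves are disjoint by construction, so `union all` is used here to skip the duplicate-removal step that a plain UNION performs. The generated file can then be run with par_psql; see the project README for the exact invocation.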

It’s available under the PostgreSQL open source license. It's a 'quick hack' and version 0.1, so please be kind with any criticism or bug reports. That said, it works well for me. Enjoy! :-)

Graeme Bell

http://github.com/gbb/par_psql/

P.S. I am grateful to the Norwegian Forest and Landscape Institute (soon to be integrated into the Norwegian NIBIO Institute) for supporting and open sourcing this and other scripts as a contribution to this year’s FOSS4G Europe Open Source Mapping conference.