What’s new in PostgreSQL 10? Part 2 – Native Partitioning by Michael Goldberg Software in PostgreSQL

Overview

Before version 10, PostgreSQL used inheritance tables as a method for physically partitioning a large table. The inheritance table creates multiple child tables for the parent table and maintains the consistency of data by CHECK constraints and triggers. The application can send requests to the parent table and transparently use the data of the child table. However, this method requires more time for development, since it is necessary to create CHECK constraints for each table, in addition, we have a certain loss of performance on the data insertion. The main disadvantages of the current implementation of the partitioning of large tables:

Support for data consistency only at the level of one partition Trigger development and support required To automatically add partitions, you need to develop a additional code INSERT statements for the parent table need to redirect to child tables by triggers so that it is slower Increase the time to build a query plan with an increase in the number of partitions. This is due to the sequential search of all the partitions by the planer and checking the constraint exclusion Can not use a foreign table as a partition. Single-Level Partitioning

In version 10, fell the patch which creates the basic infrastructure for native partitioning and also solves a some of current problems.

PostgreSQL offers built-in support for the following forms of partitioning:

Range Partitioning. The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. List Partitioning. The table is partitioned by explicitly listing which key values appear in each partition.

Range Partitioning

The RANGE partition table is a way to group multiple partitions that can store a range of specific values. To create a range partition table, first create a parent table accessed by the application. Specify the PARTITION BY RANGE clause in the CREATE TABLE statement. In the RANGE clause, specify the column name (or calculated value) to be partitioned. Multiple column names can be specified by separating them with a comma (,). A NOT NULL constraint is automatically set for the partitioned columns (except for calculated values). At this point the INSERT statement for the table fails.

In our example, we will store the coordinates of GPS device points. Specifically, the tables will have following schema:

postgres=# CREATE TABLE points(id serial NOT NULL, serial_num bigint NOT NULL, range_dt timestamp NOT NULL, lat numeric, lon numeric) PARTITION BY RANGE (range_dt); CREATE TABLE postgres=# CREATE TABLE points_20170701 PARTITION OF points (id, primary key(id)) FOR VALUES FROM ('2017-07-01 00:00:00') TO ('2017-07-02 00:00:00'); CREATE TABLE postgres=# CREATE TABLE points_20170702 PARTITION OF points (id, primary key(id)) FOR VALUES FROM ('2017-07-02 00:00:00') TO ('2017-07-03 00:00:00'); CREATE TABLE postgres=# CREATE TABLE points_archive PARTITION OF points(id, primary key(id)) FOR VALUES FROM (unbounded) TO ('2017-06-30 00:00:00') WITH (parallel_workers = 8); CREATE TABLE

Let’s see how the partition definition looks like:

postgres=# \d+ points Table "public.points" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ------------+-----------------------------+-----------+----------+------------------------------------+---------+--------------+------------- id | integer | | not null | nextval('points_id_seq'::regclass) | plain | | serial_num | bigint | | not null | | plain | | range_dt | timestamp without time zone | | not null | | plain | | lat | numeric | | | | main | | lon | numeric | | | | main | | Partition key: RANGE (range_dt) Partitions: points_20170701 FOR VALUES FROM ('2017-07-01 00:00:00') TO ('2017-07-02 00:00:00'), points_20170702 FOR VALUES FROM ('2017-07-02 00:00:00') TO ('2017-07-03 00:00:00'), points_archive FOR VALUES FROM (UNBOUNDED) TO ('2017-06-30 00:00:00')

postgres=# \d points_20170701 Table "public.points_20170701" Column | Type | Collation | Nullable | Default ------------+-----------------------------+-----------+----------+--------- id | integer | | not null | serial_num | bigint | | not null | range_dt | timestamp without time zone | | not null | lat | numeric | | | lon | numeric | | | Partition of: points FOR VALUES FROM ('2017-07-01 00:00:00') TO ('2017-07-02 00:00:00') Indexes: "points_20170701_pkey" PRIMARY KEY, btree (id)

postgres-# \d points_20170702 Table "public.points_20170702" Column | Type | Collation | Nullable | Default ------------+-----------------------------+-----------+----------+--------- id | integer | | not null | serial_num | bigint | | not null | range_dt | timestamp without time zone | | not null | lat | numeric | | | lon | numeric | | | Partition of: points FOR VALUES FROM ('2017-07-02 00:00:00') TO ('2017-07-03 00:00:00') Indexes: "points_20170702_pkey" PRIMARY KEY, btree (id)

postgres=# \d points_archive Table "public.points_archive" Column | Type | Collation | Nullable | Default ------------+-----------------------------+-----------+----------+--------- id | integer | | not null | serial_num | bigint | | not null | range_dt | timestamp without time zone | | not null | lat | numeric | | | lon | numeric | | | Partition of: points FOR VALUES FROM (UNBOUNDED) TO ('2017-06-30 00:00:00') Indexes: "points_archive_pkey" PRIMARY KEY, btree (id)

Inserting data:

postgres=# INSERT INTO points(serial_num, range_dt, lat, lon) VALUES (12345678, '2017-07-01 00:10:00', 55.32, 45.34), (12345678, '2017-07-02 02:30:00', 56.21, 44.38), (12345678, '2017-01-01 17:55:00', 54.02, 47.34); INSERT 0 3

Selecting our data:

postgres=# SELECT * FROM public.points_20170701; id | serial_num | range_dt | lat | lon ----+------------+---------------------+-------+------- 1 | 12345678 | 2017-07-01 00:10:00 | 55.32 | 45.34 (1 row) postgres=# SELECT * FROM public.points_20170702; id | serial_num | range_dt | lat | lon ----+------------+---------------------+-------+------- 2 | 12345678 | 2017-07-02 02:30:00 | 56.21 | 44.38 (1 row) postgres=# SELECT * FROM public.points_archive; id | serial_num | range_dt | lat | lon ----+------------+---------------------+-------+------- 3 | 12345678 | 2017-01-01 17:55:00 | 54.02 | 47.34 (1 row)

PostgreSQL insert our data in the appropriate tables!

And selecting data with expression:

postgres=# EXPLAIN ANALYZE SELECT * FROM points WHERE range_dt < '2017-07-01 00:00:00'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Append (cost=0.00..19.00 rows=240 width=84) (actual time=0.008..0.009 rows=1 loops=1) -> Seq Scan on points_archive (cost=0.00..19.00 rows=240 width=84) (actual time=0.007..0.008 rows=1 loops=1) Filter: (range_dt < '2017-07-01 00:00:00'::timestamp without time zone) Planning time: 0.134 ms Execution time: 0.023 ms (5 rows)

Great! It’s work! OK, try creating a new partition and adding.

postgres=# CREATE TABLE points_201706 PARTITION OF points(id, primary key(id)) FOR VALUES FROM ('2017-06-01 00:00:00') TO ('2017-07-01 00:00:00'); ERROR: partition "points_201706" would overlap partition "points_archive"

Logically, partitions can not intersect. We know that the data in these two tables do not overlap, so we can do this:

postgres=# ALTER TABLE points DETACH PARTITION points_archive; ALTER TABLE postgres=# ALTER TABLE points ATTACH PARTITION points_archive FOR VALUES FROM (unbounded) TO ('2017-06-01 00:00:00'); ALTER TABLE postgres=# CREATE TABLE points_201706 PARTITION OF points(id, primary key(id)) FOR VALUES FROM ('2017-06-01 00:00:00') TO ('2017-07-01 00:00:00'); CREATE TABLE

Good. What about a nested partitioning? For example if we have many rows at the partition points_20170704 then we can make a data partition for the serial_num key.

postgres=# CREATE TABLE points_20170704 PARTITION OF points FOR VALUES FROM ('2017-07-04 00:00:00') TO ('2017-07-05 00:00:00') PARTITION BY RANGE (range_dt); CREATE TABLE postgres=# CREATE TABLE points_20170704_1 PARTITION OF points_20170704 (id, primary key(id)) FOR VALUES FROM (10000000) TO (12345678); CREATE TABLE postgres=# CREATE TABLE points_20170704_2 PARTITION OF points_20170704 (id, primary key(id)) FOR VALUES FROM (12345678) TO (unbounded); CREATE TABLE

It’s work! Insert the data:

postgres=# INSERT INTO points(serial_num, range_dt, lat, lon) VALUES(12345677, '2017-07-04 00:40:00', 55.32, 45.34), (12345678, '2017-07-04 01:30:00', 56.21, 44.38); INSERT 0 2

And look at the query result:

postgres=# EXPLAIN ANALYZE SELECT * FROM points WHERE serial_num = 12345678 AND range_dt BETWEEN '2017-07-04 00:10:55' AND '2017-07-04 00:50:30'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Append (cost=0.00..22.60 rows=1 width=84) (actual time=0.008..0.008 rows=0 loops=1) -> Seq Scan on points_20170704_2 (cost=0.00..22.60 rows=1 width=84) (actual time=0.008..0.008 rows=0 loops=1) Filter: ((range_dt >= '2017-07-04 00:10:55'::timestamp without time zone) AND (range_dt <= '2017-07-04 00:50:30'::timestamp without time zone) AND (serial_num = 12345678)) Rows Removed by Filter: 1 Planning time: 0.348 ms Execution time: 0.023 ms

Nice. And now let’s assume that access to archival data is very rare and too expensive to keep in the current database. But however, we need to be able to read them. What to do? Fortunately, native partitioning allows you to work with FDW tables. Let’s check it out!

postgres=# CREATE EXTENSION postgres_fdw; CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '127.0.0.1', port '5432', dbname 'postgres'); CREATE USER MAPPING FOR postgres SERVER foreign_server OPTIONS (user 'postgres', password 'bigsecret'); CREATE FOREIGN TABLE points_archive_fdw (id integer NOT NULL, serial_num bigint NOT NULL, range_dt timestamp NOT NULL, lat numeric, lon numeric) SERVER foreign_server OPTIONS (schema_name 'public', table_name 'points_archive'); postgres=# SELECT * FROM points_archive_fdw; id | serial_num | range_dt | lat | lon ----+------------+---------------------+-------+------- 3 | 12345678 | 2017-01-01 17:55:00 | 54.02 | 47.34 (1 row)

Simulate the situation, as if we moved the archive partition to another server and created a FDW table instead of it:

postgres=# ALTER TABLE points DETACH PARTITION points_archive; ALTER TABLE postgres=# ALTER TABLE points ATTACH PARTITION points_archive_fdw FOR VALUES FROM (unbounded) TO ('2017-06-01 00:00:00'); ALTER TABLE postgres=# EXPLAIN ANALYZE SELECT * FROM points WHERE range_dt < '2017-05-01 00:00:00'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Append (cost=100.00..124.53 rows=253 width=84) (actual time=0.332..0.333 rows=1 loops=1) -> Foreign Scan on points_archive_fdw (cost=100.00..124.53 rows=253 width=84) (actual time=0.331..0.332 rows=1 loops=1) Planning time: 0.274 ms Execution time: 0.602 ms (4 rows)

Wow! Native partitioning allows you to use tables on a remote server!

List Partitioning

And finally, check the list of partitioning and compare the speed of data insertion on a single table and on a partitioned.

postgres=# CREATE TABLE accounts (id serial NOT NULL, account_type character varying NOT NULL, created_at timestamp NOT NULL DEFAULT now()) PARTITION BY LIST (account_type); CREATE TABLE postgres=# CREATE TABLE accounts_europe PARTITION OF accounts (id, primary key(id)) FOR VALUES IN ('europe'); CREATE TABLE postgres=# CREATE TABLE accounts_australia PARTITION OF accounts (id, primary key(id)) FOR VALUES IN ('australia'); CREATE TABLE postgres=# CREATE TABLE accounts_america PARTITION OF accounts (id, primary key(id)) FOR VALUES IN ('america'); CREATE TABLE postgres=# CREATE TABLE accounts_asia PARTITION OF accounts (id, primary key(id)) FOR VALUES IN ('asia'); CREATE TABLE postgres=# CREATE TABLE accounts_single (id serial primary key NOT NULL, account_type character varying NOT NULL, created_at timestamp NOT NULL DEFAULT now()); CREATE TABLE

postgres=# INSERT INTO accounts_single(account_type) (SELECT (array['europe','asia','australia','america']::varchar[])[(random()*10000)::INT%4+1] FROM generate_series(1,100000)); INSERT 0 100000 Time: 995.336 ms postgres=# INSERT INTO accounts(account_type) (SELECT (array['europe','asia','australia','america']::varchar[])[(random()*10000)::INT%4+1] FROM generate_series(1,100000)); INSERT 0 100000 Time: 1058.801 ms

No bad. We have almost the same speed of data insertion.

Conclusion

The correct step in the development of native partitioning was made and some of the problems that the developers had faced before were solved. So far, only the basic infrastructure has been implemented, and everything is not very far from inheritance. In addition to the syntax, storing meta-information about partitions with the system catalog and cache is implemented, as well as routing of tuples with INSERT / COPY. But the slow CONSTRAINT EXCLUSION mechanism is still used to select the appropriate partitions for SELECT. But Robert Haas wrote about what improvements in the mechanism of partitioning will be in the future. And we are sure that this useful work will be done successfully.