Table partitioning is an optimisation technique available in PostgreSQL that can yield considerable performance gains for slow queries running on a big table, where indexes grow beyond available memory. The exact definition of “big” will obviously vary depending on the hardware in use, but for most purposes a table can be considered a candidate for partitioning once its size exceeds the server’s physical memory.
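As a quick sanity check, PostgreSQL's built-in size functions can tell you how a table compares with your server's memory (the table name here is illustrative):

```sql
-- Report a table's total on-disk size, including its indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('transactions'));
```

If the reported size comfortably fits in RAM, partitioning is unlikely to be worth the added complexity.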

We faced this issue when working on the recently launched new version of the CITES Trade Database, a database of international trade in CITES-protected species which holds over 13 million reported transactions going back to 1975. Users can download filterable reports that summarise trade. Because of the number of possible filter combinations, those reports are not easily persisted, so we looked into other ways of speeding things up.





Divide and conquer

Table partitioning allows you to split a huge table into a number of smaller physical tables, while still accessing it as a single logical table. Most importantly, queries running against the logical table may only need to access some of the physical tables to fetch their results. The way data is split across the physical tables (= partitions) is managed by the database according to defined constraints. Those should be given as non-overlapping ranges (e.g. year > 2000 AND year <= 2001) or lists (e.g. customer_id = 5).

The column used to differentiate between partitions is the partition key. It should be chosen to reflect the most frequently used filter on the table. In our case a good candidate was the year of transaction, as that filter is specified with every query. The objective when defining partitions is to allow as many queries as possible to fetch data from as few partitions as possible - ideally one. Note that the database imposes no restrictions on how you partition your data and, more importantly, performs no verification of your partitioning logic.
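To make this concrete, here is a sketch of year-range partitions for a hypothetical transactions table (the table and column names are illustrative, not the actual Trade Database schema):

```sql
-- Hypothetical master table, partitioned on the year column.
CREATE TABLE transactions (id SERIAL, year INT, quantity INT);

-- Each partition covers a non-overlapping range of the partition key.
CREATE TABLE transactions_2000_2004 (
  CHECK (year >= 2000 AND year < 2005)
) INHERITS (transactions);

CREATE TABLE transactions_2005_2009 (
  CHECK (year >= 2005 AND year < 2010)
) INHERITS (transactions);
```

A query filtered on a single year then only needs to touch one of the child tables. The mechanisms that make this work are explained below.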

Magic starts here



There are 2 mechanisms that lie at the heart of partitioning:

table inheritance

table check constraints

Table inheritance allows a table to inherit column definitions from its parent table. The child table may have additional columns, and it may also have multiple parents. For the purposes of partitioning, all the tables in the hierarchy have the same set of columns. The root of the hierarchy is sometimes called the master table.

CREATE TABLE A (value INT);
CREATE TABLE B () INHERITS (A);
CREATE TABLE C () INHERITS (A);

The magic lies in the fact that querying the parent table will also fetch results from child tables.

INSERT INTO A VALUES (0);
INSERT INTO B VALUES (10);
INSERT INTO C VALUES (20);

SELECT * FROM A WHERE value < 20;
 value
-------
     0
    10

Inspection using EXPLAIN ANALYZE reveals that the query now has an “append” step, which merges results from all tables in the hierarchy:

EXPLAIN ANALYZE SELECT * FROM A WHERE value < 20;
                                            QUERY PLAN
----------------------------------------------------------------------------------------------------
 Append  (cost=0.00..69.38 rows=1290 width=4) (actual time=0.020..0.033 rows=2 loops=1)
   ->  Seq Scan on a  (cost=0.00..4.00 rows=80 width=4) (actual time=0.019..0.021 rows=1 loops=1)
         Filter: (value < 20)
   ->  Seq Scan on b  (cost=0.00..40.00 rows=800 width=4) (actual time=0.005..0.006 rows=1 loops=1)
         Filter: (value < 20)
   ->  Seq Scan on c  (cost=0.00..25.38 rows=410 width=4) (actual time=0.005..0.005 rows=0 loops=1)
         Filter: (value < 20)
         Rows Removed by Filter: 1

While magical in itself, this alone does not help with performance, as we still access all of the underlying partitions. Ideally, we’d like the database not to even bother looking into tables which are guaranteed not to contain relevant data. This behaviour can be achieved with check constraints.

CREATE TABLE A (value INT);
CREATE TABLE B (CHECK (value < 20)) INHERITS (A);
CREATE TABLE C (CHECK (value >= 20)) INHERITS (A);

There are no check constraints defined on the master table, because it is supposed to remain empty.

A check constraint defined on a table will prevent any data that does not match the constraint from getting in:

INSERT INTO B VALUES (10);
INSERT INTO C VALUES (20);
INSERT INTO C VALUES (5);
ERROR:  new row for relation "c" violates check constraint "c_value_check"
DETAIL:  Failing row contains (5).

but more importantly, it will allow the planner to optimise queries on partitioned tables and only consider those partitions which hold relevant data:

EXPLAIN ANALYZE SELECT * FROM A WHERE value < 20;
                                            QUERY PLAN
----------------------------------------------------------------------------------------------------
 Append  (cost=0.00..40.00 rows=801 width=4) (actual time=0.015..0.017 rows=1 loops=1)
   ->  Seq Scan on a  (cost=0.00..0.00 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1)
         Filter: (value < 20)
   ->  Seq Scan on b  (cost=0.00..40.00 rows=800 width=4) (actual time=0.012..0.013 rows=1 loops=1)
         Filter: (value < 20)

Real life starts here

Obviously, there are a couple of pitfalls to watch out for.

Gotcha #1: In order to enable exclusion of tables by the planner, a setting called constraint_exclusion needs to be turned on.

Gotcha #2: When creating child tables, any constraints other than NOT NULL or CHECK are not inherited. They need to be specified for every partition (e.g. foreign keys, indexes).

Gotcha #3: There is no practical way to enforce uniqueness of a serial id column across partitions. If application logic depends on that assumption, it either needs to be extended to consider the partitioning key together with the numeric id, or a different type of id generator needs to be used (e.g. UUID).

Gotcha #4: There is no way to define a foreign key referencing a partitioned table (other than one targeting a single partition).
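Continuing the A/B/C example, the first two gotchas translate into statements along these lines (the index names are illustrative):

```sql
-- Gotcha #1: allow the planner to exclude partitions based on their
-- CHECK constraints ('partition' has been the default since PostgreSQL 8.4).
SET constraint_exclusion = partition;

-- Gotcha #2: indexes are not inherited, so each partition needs its own.
CREATE INDEX b_value_idx ON b (value);
CREATE INDEX c_value_idx ON c (value);
```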

A practical issue that remains to be solved is how to insert / update data so that it seamlessly ends up in its dedicated partition. The recommended way to ensure this is to define triggers that redirect the statements. An alternative is to use rules, but because of their overhead they are only recommended for batch operations. Here’s a simple trigger example:

CREATE OR REPLACE FUNCTION a_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
  IF NEW.value < 20 THEN
    INSERT INTO b VALUES (NEW.*);
  ELSE
    INSERT INTO c VALUES (NEW.*);
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_a_trigger
BEFORE INSERT ON a
FOR EACH ROW EXECUTE PROCEDURE a_insert_trigger();
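With the trigger in place, inserts aimed at the master table are routed to the matching partition while the master itself stays empty, which we can verify like so:

```sql
INSERT INTO a VALUES (10);  -- redirected to b by the trigger
INSERT INTO a VALUES (25);  -- redirected to c by the trigger

-- ONLY restricts the query to the named table, skipping child tables:
SELECT * FROM ONLY a;  -- returns no rows
SELECT * FROM a;       -- returns both rows, via the child tables
```

Note that the trigger returns NULL, which cancels the original insert into the master table - without it, every row would be stored twice.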

Rails applications might benefit from using the Partitioned gem. It generates migrations with all the required SQL (using rules for insert / update routing) and adds helpers to ActiveRecord models that simplify the API for accessing partitioned tables.

Another useful tool is pg_partman, a PostgreSQL extension for managing time-based and serial-based table partition sets, which comes with solid documentation.

Outcome

We ran some benchmarks using pgbench to see whether we could get a performance win from partitioning the transactions table in the new Trade Database. The tests involved partitioning by different year intervals, in order to discover the optimal partitioning scheme. The best results were obtained when partitioning by 5-year intervals, with our reports running at twice the speed in cases where data was fetched from a single partition. Unfortunately, as more partitions were targeted, the performance gain diminished very quickly. We decided to hold off on implementing partitioning until we have some usage statistics from the new system to analyse, as we were not sure it is reasonable to assume that users will mostly fetch data in 5-year ranges.

This post is based on a presentation given at the Cambridge Ruby User Group meetup in March 2014. Partitioning is described nicely in “PostgreSQL 9.0 High Performance” by Gregory Smith.

@agnessa480