Handling Very Large Tables in Postgres Using Partitioning

Listen to this article

One of the interesting patterns that we’ve seen, as a result of managing one of the largest fleets of Postgres databases, is one or two tables growing at a rate that’s much larger and faster than the rest of the tables in the database. In terms of absolute numbers, a table that grows sufficiently large is on the order of hundreds of gigabytes to terabytes in size. Typically, the data in this table tracks events in an application or is analogous to an application log. Having a table of this size isn’t a problem in and of itself, but can lead to other issues; query performance can start to degrade and indexes can take much longer to update. Maintenance tasks, such as vacuum, can also become inordinately long. Depending on how you need to work with the information being stored, Postgres table partitioning can be a great way to restore query performance and deal with large volumes of data over time without having to resort to changing to a different data store.

We use pg_partman ourselves in the Postgres database that backs the control plane that maintains the fleet of Heroku Postgres, Heroku Redis, and Heroku Kafka stores. In our control plane, we have a table that tracks all of the state transitions for any individual data store. Since we don’t need that information to stick around after a couple of weeks, we use table partitioning. This allows us to drop tables after the two week window and we can keep queries blazing fast. To understand how to get better performance with a large dataset in Postgres, we need to understand how Postgres does inheritance, how to set up table partitions manually, and then how to use the Postgres extension, pg_partman, to ease the partitioning setup and maintenance process.

Postgres has basic support for table partitioning via table inheritance. Inheritance for tables in Postgres is much like inheritance in object-oriented programming. A table is said to inherit from another one when it maintains the same data definition and interface. Table inheritance for Postgres has been around for quite some time, which means the functionality has had time to mature. Let’s walk through a contrived example to illustrate how inheritance works:

CREATE TABLE products ( id BIGSERIAL, price INTEGER created_at TIMESTAMPTZ, updated_at TIMESTAMPTZ ); CREATE TABLE books ( isbn TEXT, author TEXT, title TEXT ) INHERITS (products); CREATE TABLE albums ( artist TEXT, length INTEGER, number_of_songs INTEGER ) INHERITS (products);

In this example, both books and albums inherit from products. This means that if a record was inserted into the books table, it would have all the same characteristics of the products table plus that of the books table. If a query was issued against the products table, that query would reference information on the product table plus all of its descendants. For this example, the query would reference products, books and albums. That’s the default behavior in Postgres. But, you can also issue queries against any of the child tables individually.

Now that we have a grasp on inheritance in Postgres, we’ll set up partitioning manually. The basic premise of partitioning is that a master table exists that all other children inherit from. We’ll use the phrase ‘child table’ and partition interchangeably throughout the rest of the setup process. Data should not live on the master table at all. Instead, when data gets inserted into the master table, it gets redirected to the appropriate child partition table. This redirection is usually defined by a trigger that lives in Postgres. On top of that, CHECK constraints are put on each of the child tables so that if data were to be inserted directly on the child table, the correct information will be inserted. That way data that doesn’t belong in the partition won’t end up in there.

When doing table partitioning, you need to figure out what key will dictate how information is partitioned across the child tables. Let’s go through the process of partitioning a very large events table in our Postgres database. For an events table, time is the key that determines how to split out information. Let’s also assume that our events table gets 10 million INSERTs done in any given day and this is our original events table schema:

CREATE TABLE events ( uuid text, name text, user_id bigint, account_id bigint, created_at timestamptz );

Let’s make a few more assumptions to round out the example. The aggregate queries that run against the events table only have a time frame of a single day. This means our aggregations are split up by hour for any given day. Our usage of the data in the events table only spans a couple of days. After that time, we don’t query the data any more. On top of that, we have 10 million events generated a day. Given these extra assumptions, it makes sense to create daily partitions. The key that we’ll use to partition the data will be the time at which the event was created (e.g. created_at).

CREATE TABLE events ( uuid text, name text, user_id bigint, account_id bigint, created_at timestamptz ); CREATE TABLE events_20160801 ( CHECK (created_at >= ‘2016-08-01 00:00:00’ AND created_at < ‘2016-08-02 00:00:00’) ) INHERITS (events); CREATE TABLE events_20160802 ( CHECK (created_at >= ‘2016-08-02 00:00:00’ AND created_at < ‘2016-08-03 00:00:00’) ) INHERITS (events);

Our master table has been defined as events and we have two tables out in the future that are ready to accept data, events_20160801 and events_20160802. We’ve also put CHECK constraints on them to make sure that only data for that day ends up on that partition. Now we need to create a trigger to make sure that any data entered on the master table gets directed to the correct partition:

CREATE OR REPLACE FUNCTION event_insert_trigger() RETURNS TRIGGER AS $$ BEGIN IF ( NEW.created_at >= ‘2016-08-01 00:00:00'AND NEW.created_at < ‘2016-08-02 00:00:00' ) THEN INSERT INTO events_20160801 VALUES (NEW.*); ELSIF ( NEW.created_at >= ‘2016-08-02 00:00:00'AND NEW.created_at < ‘2016-08-03 00:00:00' ) THEN INSERT INTO events_20160802 VALUES (NEW.*); ELSE RAISE EXCEPTION 'Date out of range. Fix the event_insert_trigger() function!'; END IF; RETURN NULL; END; $$ LANGUAGE plpgsql; CREATE TRIGGER insert_event_trigger BEFORE INSERT ON event FOR EACH ROW EXECUTE PROCEDURE event_insert_trigger();

Great! The partitions have been created, the trigger function defined, and the trigger has been added to the events table. At this point, my application can insert data on the events table and the data can be directed to the appropriate partition.

Unfortunately, utilizing table partitioning is a very manual setup fraught with chances for failure. It requires us to go into the database every so often to update the partitions and the trigger, and we haven’t even talked about removing old data from the database yet. This is where pg_partman comes in.

pg_partman is a partition management extension for Postgres that makes the process of creating and managing table partitions easier for both time and serial-based table partition sets. Compared to partitioning a table manually, pg_partman makes it much easier to partition a table and reduce the code necessary to run partitioning outside of the database. Let’s run through an example of doing this from scratch:

First, let’s load the extension and create our events table. If you already have a big table defined, the pg_partman documentation has guidance for how to convert that table into one that’s using table partitioning.

$ heroku pg:psql -a sushi sushi::DATABASE=> CREATE EXTENSION pg_partman; sushi::DATABASE=> CREATE TABLE events ( id bigint, name text, properities jsonb, created_at timestamptz );

Let’s reuse our assumptions that we made about our event data we made earlier. We’ve got 10 million events that are created a day and our queries really need aggregation on a daily basis. Because of this we’re going to create daily partitions.

sushi::DATABASE=> SELECT create_parent('public.events', 'created_at', 'time', 'daily');

This command is telling pg_partman that we’re going to use time-series based partitioning, created_at is going to be the column we use for partitioning, and we want to partition on a daily basis for our master events table. Amazingly, everything that was done to manually set up partitioning is completed in this one command. But we’re not finished, we need to make sure that on regular intervals maintenance is run on the partitions so that new tables get created and old ones get removed.

sushi::DATABASE=> SELECT run_maintenance();

The run_maintenance() command will instruct pg_partman to look through all of the tables that were partitioned and identify if new partitions should be created and old partitions destroyed. Whether or not a partition should be destroyed is determined by the retention configuration options. While this command can be run via a terminal session, we need to set this up to run on a regular basis. This is a great opportunity to use Heroku Scheduler to accomplish the task.

This command will run on an hourly basis to double check the partitions in the database. Checking the partitioning on an hourly basis might be a bit overkill in this scenario but since Heroku Scheduler is a best effort service, running it hourly is not going to cause any performance impacts on the database.

That’s it! We’ve set up table partitioning in Heroku Postgres and it will be running on its own with very little maintenance on our part. This setup only scratches the surface of what’s possible with pg_partman. Check out the extension’s documentation for the details of what’s possible.

Table partitioning allows you to break out one very large table into many smaller tables dramatically increasing performance. As pointed out in the 'Setting up Partitioning Manually' section, many challenges exist when trying to create and use table partitioning on your own but pg_partman can ease that operational burden. Despite that, table partitioning shouldn’t be the first solution you reach for when you run into problems. A number of questions should be asked to determine if table partitioning is the right fit:

Do you have a sufficiently large data set stored in one table, and do you expect it to grow significantly over time? Is the data immutable, that is, will it never updated after being initially inserted? Have you done as much optimization as possible on the big table with indexes? Do you have data that has little value after a period of time? Is there a small range of data that has to be queried to get the results needed? Can data that has little value be archived to a slower, cheaper storage medium, or can the older data be stored in aggregate or “rolled up”?

If you answered yes to all of these questions, table partitioning could make sense for you. The big caveat is that table partitioning requires you to evaluate how you’re querying your data. This is a big departure from designing a schema and optimizing it as you go, table partitioning requires you to plan ahead and consider your usage patterns. So long as you take these factors into account, table partitioning can create very real performance gains for your queries and your application.

In situations where you have high volumes of data that has a very short shelf life, days, weeks or even months, table partitioning can make lots of sense. As always, make sure you ask how you’re going to query your data now and into the future. Table partitioning won’t handle everything for you but it will at least allow you to extend the life of your Heroku Postgres installation. If you’re using pg_partman, we’d love to hear about it. Email us at postgres@heroku.com or via Twitter @heroku.