One of the things Postgres has been “missing” for a while is logical replication based on activity replay. Until fairly recently, in order to replicate single tables from one database to another, we had to encumber the table with performance-robbing triggers coupled to a third party daemon to manage transport. Those days might finally be behind us thanks to pglogical.

But is it easy to use? Let’s give it a try on our trusty sensor_log table. First, we’ll need to create everything on the donor node:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE - (s.id || 's')::INTERVAL
  FROM generate_series(1, 1000000) s(id);

CREATE EXTENSION pglogical;

SELECT pglogical.create_node(
    node_name := 'prod_sensors',
    dsn := 'host=localhost port=5432 dbname=postgres'
);

SELECT pglogical.create_replication_set(
    set_name := 'logging',
    replicate_insert := TRUE,
    replicate_update := FALSE,
    replicate_delete := FALSE,
    replicate_truncate := FALSE
);

SELECT pglogical.replication_set_add_table(
    set_name := 'logging',
    relation := 'sensor_log',
    synchronize_data := TRUE
);

After all of that, we have a replication set containing a single table representing one million rows of sensor data. At this point, any number of subscribers could connect to the replication set and request its contents. After that, all inserts would also be replayed on the subscriber once they are detected in the Postgres transaction log.
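As a quick sanity check before bringing any subscribers online, we can inspect pglogical's own catalog to confirm the set exists and only captures inserts. This is a sketch that assumes the pglogical.replication_set catalog table exposes flag columns mirroring the create_replication_set parameters:

```sql
-- On the provider: verify the 'logging' set replicates inserts only.
SELECT set_name, replicate_insert, replicate_update,
       replicate_delete, replicate_truncate
  FROM pglogical.replication_set;
```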

In a second Postgres instance, we would do this to “consume” the table:

CREATE TABLE sensor_log (
  id            INT PRIMARY KEY NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

CREATE EXTENSION pglogical;

SELECT pglogical.create_node(
    node_name := 'sensor_warehouse',
    dsn := 'host=localhost port=5999 dbname=postgres'
);

SELECT pglogical.create_subscription(
    subscription_name := 'wh_sensor_data',
    replication_sets := array['logging'],
    provider_dsn := 'host=localhost port=5432 dbname=postgres'
);

SELECT pg_sleep(5);

SELECT count(*) FROM sensor_log;

  count  
---------
 1000000

We actually did a few things here, so let’s break it down. First we created the table structure itself to act as a container for the incoming data. Then we created a node, just as we did on the origin, and finally subscribed to the provider.

We should note that creating the table beforehand wasn’t strictly required. The create_subscription function has a synchronize_structure parameter that uses pg_dump to capture table DDL during the subscription phase and replay it on the subscriber. Unfortunately, this operation does not restrict itself to the tables in the replication set, so any conflicting objects in the existing schema will cause the import to fail.
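Had the warehouse schema been empty, the subscription call might have looked like the sketch below instead. Keep the caveat above in mind: synchronize_structure attempts to copy everything pg_dump can see, not just sensor_log.

```sql
-- Hypothetical alternative: let pglogical create the table for us.
-- Only safe when the subscriber schema has no conflicting objects.
SELECT pglogical.create_subscription(
    subscription_name := 'wh_sensor_data',
    replication_sets := array['logging'],
    provider_dsn := 'host=localhost port=5432 dbname=postgres',
    synchronize_structure := TRUE
);
```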

Once we’ve established the subscription, we merely need to wait a few seconds for the initial data copy to complete. This is only one million rows, so we don’t have to wait very long. After that, all subsequent inserts should also show up on this subscriber.
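Rather than sleeping for an arbitrary interval, we could also poll the subscription itself. Assuming pglogical's show_subscription_status function behaves as documented, its status column should settle on 'replicating' once the initial copy is done:

```sql
-- On the subscriber: check whether the initial sync has finished.
SELECT subscription_name, status
  FROM pglogical.show_subscription_status('wh_sensor_data');
```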

One really cool thing about pglogical is that it takes advantage of Postgres 9.4+ background workers. This means there’s no external daemon sitting around watching a queue, or the transaction logs, or any other source. There’s an actual Postgres backend supervising the Postgres replication stream, and it will capture applicable content for the sensor_log table. The extension becomes the management daemon as if it were a native Postgres feature.

And it gets even better. For those of us that rely on large warehouses that accumulate data from one or more active production locations over several years, this is something of a golden fleece. Trigger-based replication falls flat here because synchronization means synchronization. If we needed a table to just sit and accumulate data, our options were ETL, ad-hoc copy scripts, or nothing. But what if we set up our sensor_log table on the subscriber just a bit differently?

SELECT pglogical.drop_subscription(
    subscription_name := 'wh_sensor_data'
);

TRUNCATE TABLE sensor_log;

CREATE TABLE sensor_log_part (
  LIKE sensor_log INCLUDING INDEXES
) INHERITS (sensor_log);

CREATE OR REPLACE FUNCTION f_redirect_sensor_log()
RETURNS TRIGGER AS
$$
BEGIN
  INSERT INTO sensor_log_part VALUES (NEW.*);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_redirect_sensor_log_bi
BEFORE INSERT ON sensor_log
   FOR EACH ROW
       EXECUTE PROCEDURE f_redirect_sensor_log();

ALTER TABLE sensor_log
  ENABLE ALWAYS TRIGGER t_redirect_sensor_log_bi;

SELECT pglogical.create_subscription(
    subscription_name := 'wh_sensor_data',
    replication_sets := array['logging'],
    provider_dsn := 'host=localhost port=5432 dbname=postgres'
);

SELECT pg_sleep(5);

SELECT count(*) FROM ONLY sensor_log;

 count 
-------
     0

SELECT count(*) FROM sensor_log;

  count  
---------
 1000000

This is where the magic really resides. Instead of inserting data into the table itself, we were able to redirect it into a table partition. This is important, because some of the larger Postgres warehouses rely on partitions to distribute maintenance and risk. Instead of one single 4TB table, we might have a dozen 350GB tables.

This proof of concept suggests pglogical is compatible with existing partition systems. And that is important if we’re trying to simplify our stack by removing complex ETL processes. Since we created the replication set to only capture insert activity, we can purge the origin of old data as frequently as we wish, and it will remain in the warehouse indefinitely.
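For instance, because the logging set was created with replicate_delete := FALSE, a routine retention job on the provider is simply invisible to the warehouse. A minimal sketch of such a purge:

```sql
-- On the provider: keep only the most recent 90 days.
DELETE FROM sensor_log
 WHERE reading_date < CURRENT_DATE - INTERVAL '90 days';

-- On the subscriber: the full history remains untouched, since
-- the replication set does not capture delete activity.
SELECT count(*) FROM sensor_log;
```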

This comes with one caveat, though. Notice that last ALTER TABLE statement where we marked our trigger to always fire? This is necessary because pglogical uses copy mechanisms that otherwise circumvent trigger logic. If we didn’t include that statement, the sync would have dumped all of the rows into the base sensor_log table instead of our target partition.
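The underlying mechanism is Postgres’ session_replication_role setting: ordinary triggers only fire when the role is 'origin', while replication sessions run as 'replica' and skip them; only triggers marked REPLICA or ALWAYS survive. Assuming pglogical’s apply process behaves like any other replication session, we can simulate the effect locally:

```sql
-- Simulate a replication session: ordinary triggers are skipped,
-- but our ENABLE ALWAYS trigger still redirects the row.
SET session_replication_role = 'replica';

INSERT INTO sensor_log (id, location, reading, reading_date)
VALUES (2000001, '1', 1, now());

SET session_replication_role = DEFAULT;
```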

This last detail raises an important question: are common partition management extensions like pg_partman really compatible with this use case? Automated partition management libraries must regularly modify the underlying trigger to ensure current data is redirected into the appropriate partition. In this case, the answer depends heavily on how the triggers are redefined.

If a partition management system simply replaces the underlying function, all is safe: the trigger definition itself is never dropped and recreated, so the ALWAYS modification survives. This would also be the preferred method, since it doesn’t require a lock on the table to recreate the trigger definition. Unfortunately, that’s only an assumption, and even the “correct” approach doesn’t spare us the extra step of manually marking the triggers as ALWAYS after creation.

For pg_partman, we’d do something like this after initial partition setup on the warehouse node:

DO $$
DECLARE
  table_name VARCHAR;
  trigger_name VARCHAR;
BEGIN
  FOR table_name, trigger_name IN
      SELECT tgrelid::regclass::text, tgname
        FROM pg_trigger
       WHERE tgname LIKE '%\_part\_trig'
  LOOP
    EXECUTE 'ALTER TABLE ' || table_name ||
            ' ENABLE ALWAYS TRIGGER ' || trigger_name;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

Gross. This also means no new table can join the subscription set without this kind of interception. It would be better if popular partition libraries offered a parameter for always enforcing triggers, or perhaps something more universal, like a post-creation hook for executing an arbitrary function to handle advanced use cases. I promise I’m not picking on pg_partman! This applies even to not-so-popular partition extensions like my own tab_tier.
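In the meantime, we can at least audit which triggers a future sync would silently skip by checking pg_catalog.pg_trigger directly; tgenabled is 'A' only for triggers marked ALWAYS:

```sql
-- Find user triggers that are NOT marked ENABLE ALWAYS.
SELECT tgrelid::regclass AS table_name, tgname, tgenabled
  FROM pg_trigger
 WHERE NOT tgisinternal
   AND tgenabled <> 'A';
```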

It’s a crazy world out there in extension land. Apparently we have to keep our eyes peeled for exciting new functionality like pglogical, and for the implications it carries for potentially related extensions. Either way, our warehouses are happier for it!