(This page is currently under development ahead of the release of PostgreSQL 9.5)

This page contains an overview of PostgreSQL Version 9.5's features, including descriptions, testing and usage information, and links to blog posts containing further information. See also PostgreSQL 9.5 Open Items.

Major new features

IMPORT FOREIGN SCHEMA

Previously, in order to create a foreign table in PostgreSQL, you would need to define the table, referencing the destination columns and data types, and if you have a lot of tables, this can become tedious and error-prone, and when those tables change, you need to do it all over again...

CREATE FOREIGN TABLE remote.customers ( id int NOT NULL, name text, company text, registered_date date, expiry_date date, active boolean, status text, account_level text) SERVER dest_server OPTIONS (schema_name 'public'); CREATE FOREIGN TABLE remote.purchases ( id int NOT NULL, purchase_time timestamptz, payment_time timestamptz, itemid int, volume int, invoice_sent boolean) SERVER dest_server OPTIONS (schema_name 'public');

As of PostgreSQL 9.5, you can import tables en masse:

IMPORT FOREIGN SCHEMA public FROM SERVER dest_server INTO remote;

This would create foreign tables in the schema named "remote" for every table that appeared in the public schema on the foreign server labelled "dest_server".

You can also filter out any tables you don't wish:

IMPORT FOREIGN SCHEMA public EXCEPT (reports, audit) FROM SERVER dest_server INTO remote;

Or limit it to just a specific set of tables:

IMPORT FOREIGN SCHEMA public LIMIT TO (customers, purchases) FROM SERVER dest_server INTO remote;

Waiting for 9.5 – Implement IMPORT FOREIGN SCHEMA.

PostgreSQL 9.5: IMPORT FOREIGN SCHEMA

Row-Level Security Policies

Additional security can be added to tables to prevent users from accessing rows they shouldn't be able to see.

Say you had a table with log data, where the username column contained the database user name which created the log entry:

CREATE TABLE log ( id serial primary key, username text, log_event text);

But you don't want users to see the log entries from other users, so we create a policy that says you're allowed to see the row if the username column matches the current user running the query:

CREATE POLICY policy_user_log ON log FOR ALL TO PUBLIC USING (username = current_user);

And then we enable Row Level Security on the table:

ALTER TABLE log ENABLE ROW LEVEL SECURITY;

As the user "report", we would then only see rows where the username column contained the value 'report':

# SELECT * FROM log; id | username | log_event ----+----------+---------------- 1 | report | DELETE issued 4 | report | Reset accounts (2 rows)

As the user "messaging", we see a different set of rows:

id | username | log_event ----+-----------+---------------------- 2 | messaging | Message queue purged 3 | messaging | Reset accounts (2 rows)

Whereas the "postgres" user, as the superuser would get:

id | username | log_event ----+-----------+---------------------- 1 | report | DELETE issued 2 | messaging | Message queue purged 3 | messaging | Reset accounts 4 | report | Reset accounts (4 rows)

That's because the superuser sees all rows due to the BYPASSRLS attribute on the superuser role by default.

If you have multiple policies, they are all OR'd together. For example, if you had the following 2 policies:

CREATE POLICY policy_user_log ON log FOR ALL TO PUBLIC USING (username = current_user);

CREATE POLICY policy_even_ids_only ON log FOR ALL TO PUBLIC USING (id % 2 = 0);

This would effectively result in the following being added to the WHERE clause of any statement:

WHERE ((username = current_user) OR (id % 2 = 0))

So as long as it matches one policy, it passes, so as the report user mentioned above, we'd now see the following:

id | username | log_event ----+-----------+---------------------- 1 | report | DELETE issued 2 | messaging | Message queue purged 4 | report | Reset accounts (3 rows)

The additional policy now allows the row where the id is 2 because it matches the new policy. The first row doesn't match the new policy, but it's returned because it matches the first. So polices are permissive rather than restrictive.

Waiting for 9.5 – Row-Level Security Policies (RLS)

BRIN Indexes

BRIN stands for Block Range INdexes, and store metadata on a range of pages. At the moment this means the minimum and maximum values per block.

This results in an inexpensive index that occupies a very small amount of space, and can speed up queries in extremely large tables. This allows the index to determine which blocks are the only ones worth checking, and all others can be skipped. So if a 10GB table of order contained rows that were generally in order of order date, a BRIN index on the order_date column would allow the majority of the table to be skipped rather than performing a full sequential scan. This will still be slower than a regular BTREE index on the same column, but with the benefits of it being far smaller and requires less maintenance.

For example:

-- Create the table CREATE TABLE orders ( id int, order_date timestamptz, item text);

-- Insert lots of data into it INSERT INTO orders (order_date, item) SELECT x, 'dfiojdso' FROM generate_series('2000-01-01 00:00:00'::timestamptz, '2015-03-01 00:00:00'::timestamptz,'2 seconds'::interval) a(x);

-- Let's look at how much space the table occupies # \dt+ orders List of relations Schema | Name | Type | Owner | Size | Description --------+--------+-------+-------+-------+------------- public | orders | table | thom | 13 GB | (1 row)

-- What's involved in finding the orders between 2 dates # EXPLAIN ANALYSE SELECT count(*) FROM orders WHERE order_date BETWEEN '2012-01-04 09:00:00' and '2014-01-04 14:30:00'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=5425021.80..5425021.81 rows=1 width=0) (actual time=30172.428..30172.429 rows=1 loops=1) -> Seq Scan on orders (cost=0.00..5347754.00 rows=30907121 width=0) (actual time=6050.015..28552.976 rows=31589101 loops=1) Filter: ((order_date >= '2012-01-04 09:00:00+00'::timestamp with time zone) AND (order_date <= '2014-01-04 14:30:00+00'::timestamp with time zone)) Rows Removed by Filter: 207652500 Planning time: 0.140 ms Execution time: 30172.482 ms (6 rows)

-- Now let's create a BRIN index on the order_date column CREATE INDEX idx_order_date_brin ON orders USING BRIN (order_date);

-- And see how much space it takes up # \di+ idx_order_date_brin List of relations Schema | Name | Type | Owner | Table | Size | Description --------+---------------------+-------+-------+--------+--------+------------- public | idx_order_date_brin | index | thom | orders | 504 kB | (1 row)

-- Now let's see how much faster the query is with this very small index # EXPLAIN ANALYSE SELECT count(*) FROM orders WHERE order_date BETWEEN '2012-01-04 09:00:00' and '2014-01-04 14:30:00'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=2616868.60..2616868.61 rows=1 width=0) (actual time=6347.651..6347.651 rows=1 loops=1) -> Bitmap Heap Scan on orders (cost=316863.99..2539600.80 rows=30907121 width=0) (actual time=36.366..4686.634 rows=31589101 loops=1) Recheck Cond: ((order_date >= '2012-01-04 09:00:00+00'::timestamp with time zone) AND (order_date <= '2014-01-04 14:30:00+00'::timestamp with time zone)) Rows Removed by Index Recheck: 6419 Heap Blocks: lossy=232320 -> Bitmap Index Scan on idx_order_date_brin (cost=0.00..309137.21 rows=30907121 width=0) (actual time=35.567..35.567 rows=2323200 loops=1) Index Cond: ((order_date >= '2012-01-04 09:00:00+00'::timestamp with time zone) AND (order_date <= '2014-01-04 14:30:00+00'::timestamp with time zone)) Planning time: 0.108 ms Execution time: 6347.701 ms (9 rows)

This example is on an SSD drive, so the results would be even more pronounced on an HDD.

By default, the block size is 128 pages. This resolution can be increased or decreased using the pages_per_range:

-- Create an index with 32 pages per block CREATE INDEX idx_order_date_brin_32 ON orders USING BRIN (order_date) WITH (pages_per_range = 32); -- Create an index with 512 pages per block CREATE INDEX idx_order_date_brin_512 ON orders USING BRIN (order_date) WITH (pages_per_range = 512);

The lower the pages per block, the more space the index will occupy, but the less lossy the index will be, i.e. it will need to discard fewer rows.

# \di+ idx_order_date_brin* List of relations Schema | Name | Type | Owner | Table | Size | Description --------+-------------------------+-------+-------+--------+---------+------------- public | idx_order_date_brin | index | thom | orders | 504 kB | public | idx_order_date_brin_32 | index | thom | orders | 1872 kB | public | idx_order_date_brin_512 | index | thom | orders | 152 kB | (3 rows)

Waiting for 9.5 – BRIN: Block Range Indexes.

Foreign Table Inheritance

Foreign tables can now either inherit local tables, or be inherited from.

For example, a local table can inherit a foreign table:

-- Create a new table which inherits from the foreign table # CREATE TABLE local_customers () INHERITS (remote.customers); -- Insert some data into it # INSERT INTO local_customers VALUES (16, 'Bruce',$$Jo's Cupcakes$$, '2015-01-15', '2017-01-14', true, 'running', 'basic'); -- And if we query the parent foreign table... # SELECT tableoid::regclass, * FROM remote.customers; tableoid | id | name | company | registered_date | expiry_date | active | status | account_level ------------------+----+-------+---------------+-----------------+-------------+--------+---------+--------------- remote.customers | 1 | James | Hughson Corp | 2014-05-03 | 2016-05-02 | t | idle | premium local_customers | 16 | Bruce | Jo's Cupcakes | 2015-01-15 | 2017-01-14 | t | running | basic (2 rows)

Or a foreign table can be made to inherit from a local table:

-- Create a new table that the foreign table will be a child of # CREATE TABLE master_customers (LIKE remote.customers); -- Insert a new row into this table # INSERT INTO master_customers VALUES (99, 'Jolly',$$Cineplanet$$, '2014-10-30', '2016-10-29', true, 'running', 'premium'); -- Have the foreign table inherit from the new table # ALTER TABLE remote.customers INHERIT master_customers; -- Let's have a look at the contents of the new table now # SELECT tableoid::regclass, * FROM master_customers; tableoid | id | name | company | registered_date | expiry_date | active | status | account_level ------------------+----+-------+---------------+-----------------+-------------+--------+---------+--------------- master_customers | 99 | Jolly | Cineplanet | 2014-10-30 | 2016-10-29 | t | running | premium remote.customers | 1 | James | Hughson Corp | 2014-05-03 | 2016-05-02 | t | idle | premium local_customers | 16 | Bruce | Jo's Cupcakes | 2015-01-15 | 2017-01-14 | t | running | basic (3 rows) -- And the query plan... # EXPLAIN ANALYSE SELECT tableoid::regclass, * FROM master_customers; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Result (cost=0.00..140.80 rows=1012 width=145) (actual time=0.014..0.595 rows=3 loops=1) -> Append (cost=0.00..140.80 rows=1012 width=145) (actual time=0.012..0.591 rows=3 loops=1) -> Seq Scan on master_customers (cost=0.00..1.48 rows=48 width=145) (actual time=0.012..0.013 rows=1 loops=1) -> Foreign Scan on customers (cost=100.00..124.52 rows=484 width=145) (actual time=0.567..0.567 rows=1 loops=1) -> Seq Scan on local_customers (cost=0.00..14.80 rows=480 width=145) (actual time=0.007..0.008 rows=1 loops=1) Planning time: 0.256 ms Execution time: 1.040 ms (7 rows)

Waiting for 9.5 – Allow foreign tables to participate in inheritance. – A.K.A. PostgreSQL got sharding.

Postgres 9.5 feature highlight: Scale-out with Foreign Tables now part of Inheritance Trees

GROUPING SETS, CUBE and ROLLUP

This set of features allows one to summarise data into sets.

For example, if we have this data:

# SELECT * FROM employees; name | role | department | gender ----------+-----------------+------------+-------- Tim | Manager | Sales | Male Sarah | Manager | Finance | Female Neil | Accountant | Finance | Male Joe | Project Manager | Sales | Male Yvette | Project Manager | Finance | Female Craig | Project Manager | IT | Male Penelope | Manager | IT | Female (7 rows)

If we wanted to see summaries for each department, role and gender, we can use GROUPING SETS:

# SELECT department, role, gender, count(*) FROM employees GROUP BY GROUPING SETS (department, role, gender, ()); department | role | gender | count ------------+-----------------+-----------+------- Finance | | | 3 IT | | | 2 Sales | | | 2 | | | 7 | | Female | 3 | | Male | 4 | Accountant | | 1 | Manager | | 3 | Project Manager | | 3 (9 rows)



Here we can see the count of employees in each department, each role, and each gender. We also get a total where all columns except count are blank.

If we wanted a count for every combination of those 3 categories, we could use CUBE:

# SELECT department, role, gender, count(*) FROM employees GROUP BY CUBE (department, role, gender); department | role | gender | count ------------+-----------------+-----------+------- Finance | Accountant | Male | 1 Finance | Accountant | | 1 Finance | Manager | Female | 1 Finance | Manager | | 1 Finance | Project Manager | Female | 1 Finance | Project Manager | | 1 Finance | | | 3 IT | Manager | Female | 1 IT | Manager | | 1 IT | Project Manager | Male | 1 IT | Project Manager | | 1 IT | | | 2 Sales | Manager | Male | 1 Sales | Manager | | 1 Sales | Project Manager | Male | 1 Sales | Project Manager | | 1 Sales | | | 2 | | | 7 Finance | | Female | 2 IT | | Female | 1 | | Female | 3 Finance | | Male | 1 IT | | Male | 1 Sales | | Male | 2 | | Male | 4 | Accountant | Male | 1 | Accountant | | 1 | Manager | Female | 2 | Manager | Male | 1 | Manager | | 3 | Project Manager | Female | 1 | Project Manager | Male | 2 | Project Manager | | 3 (33 rows)

So we get counts for every combination of all values. If we wanted to ensure columns are grouped in sequence, where we only summarise from the left to right, we'd use ROLLUP:

# SELECT department, role, gender, count(*) FROM employees GROUP BY ROLLUP (department, role, gender); department | role | gender | count ------------+-----------------+-----------+------- Finance | Accountant | Male | 1 Finance | Accountant | | 1 Finance | Manager | Female | 1 Finance | Manager | | 1 Finance | Project Manager | Female | 1 Finance | Project Manager | | 1 Finance | | | 3 IT | Manager | Female | 1 IT | Manager | | 1 IT | Project Manager | Male | 1 IT | Project Manager | | 1 IT | | | 2 Sales | Manager | Male | 1 Sales | Manager | | 1 Sales | Project Manager | Male | 1 Sales | Project Manager | | 1 Sales | | | 2 | | | 7 (18 rows)

So we don't get summaries per role or per gender except when used in combination with the previous columns.

These are just basic examples. Far more complicated configurations are possible.

Waiting for Grouping Sets

JSONB-modifying operators and functions

In 9.3 (and to a greater extent in 9.4), JSONB data could be extracted using various functions and operators, but nothing that could actually modify the data. As of 9.5, JSONB data can now be modified.

jsonb || jsonb (concatenate / overwrite)

The || operator allows us to combine 2 jsonb objects. If there's overlap, values are replaced on the highest level.

For example, if we want to add values to a jsonb object:

# SELECT '{"name": "Joe", "age": 30}'::jsonb || '{"town": "London"}'::jsonb; ?column? ---------------------------------------------- {"age": 30, "name": "Joe", "town": "London"} (1 row)



Or we can overwrite existing values:

# SELECT '{"town": "Dataville", "population": 4096}'::jsonb || '{"population": 8192}'::jsonb; ?column? ------------------------------------------- {"town": "Dataville", "population": 8192} (1 row)

Note that this only works on the highest level, so nested objects are replaced from the top level. For example:

# SELECT '{"name": "Jane", "contact": {"phone": "01234 567890", "mobile": "07890 123456"}}'::jsonb || '{"contact": {"fax": "01987 654321"}}'::jsonb; ?column? ------------------------------------------------------ {"name": "Jane", "contact": {"fax": "01987 654321"}} (1 row)





jsonb - text / int (remove key / array element)

We can remove keys from a jsonb object with the - operator:

# SELECT '{"name": "James", "email": "james@localhost"}'::jsonb - 'email'; ?column? ------------------- {"name": "James"} (1 row)

Or remove values from an array (base 0):

# SELECT '["red","green","blue"]'::jsonb - 1; ?column? ----------------- ["red", "blue"] (1 row)

jsonb #- text[] / int (remove key / array element in path)

The previous example, we can remove keys or array elements, but not any lower than the highest level, so we can provide a path to the value we want to delete using a text array. Here we'll want to remove the fax number from within the contact value:

# SELECT '{"name": "James", "contact": {"phone": "01234 567890", "fax": "01987 543210"}}'::jsonb #- '{contact,fax}'::text[]; ?column? --------------------------------------------------------- {"name": "James", "contact": {"phone": "01234 567890"}} (1 row)

Or we can remove an array value. Here we'll get rid of the array value as index 1 (2nd value):

# SELECT '{"name": "James", "aliases": ["Jamie","The Jamester","J Man"]}'::jsonb #- '{aliases,1}'::text[]; ?column? -------------------------------------------------- {"name": "James", "aliases": ["Jamie", "J Man"]} (1 row)

jsonb_set function

The above lets us delete values in a path, but not update them, so we have the jsonb_set function for that. We'll update the phone value within the contact value:

# SELECT jsonb_set('{"name": "James", "contact": {"phone": "01234 567890", "fax": "01987 543210"}}'::jsonb, '{contact,phone}', '"07900 112233"'::jsonb); jsonb_replace -------------------------------------------------------------------------------- {"name": "James", "contact": {"fax": "01987 543210", "phone": "07900 112233"}} (1 row)

The anatomy of this function is:

jsonb_set( target jsonb, # The jsonb value you're amending. path text[], # The path to the value you wish to add to or change, represented as a text array. new_value jsonb, # The new object, key : value pair or array value(s) to add to or change. create_missing boolean # An optional field that, if true (default), creates the value if the key doesn't already exist. # If false, the path must exist for the update to happen, or the value won't be updated. )

If the target key doesn't exist, and you specify true as a fourth parameter (the default), it will be appended at the location specified by the path:

# SELECT jsonb_set('{"name": "James", "contact": {"phone": "01234 567890", "fax": "01987 543210"}}'::jsonb, '{contact,skype}', '"myskypeid"'::jsonb, true); jsonb_set ------------------------------------------------------------------------------------------------------ {"name": "James", "contact": {"fax": "01987 543210", "phone": "01234 567890", "skype": "myskypeid"}} (1 row)

The same function call, but using false instead of true will not update in this case because there's no pre-existing key to update.

# SELECT jsonb_set( '{"name": "James", "contact": {"phone": "01234 567890", "fax": "01987 543210"}}'::jsonb, '{contact,skype}', '"myskypeid"'::jsonb, false); jsonb_set -------------------------------------------------------------------------------- {"name": "James", "contact": {"fax": "01987 543210", "phone": "01234 567890"}} (1 row)

Or you can set the value of array elements by specifying its position. The position start from 0, so the 2nd element is 1. Negative values start from the end backward, so the last element is -1, 2nd to last is -2.

# SELECT jsonb_set('{"name": "James", "skills": ["design","snowboarding","mechnaicalengineering"]}', '{skills,2}', '"mechanical engineering"'::jsonb, true); jsonb_set ----------------------------------------------------------------------------------- {"name": "James", "skills": ["design", "snowboarding", "mechanical engineering"]} (1 row)



So the positions in this array can be referred to like so:

0 1 2 # positive values -3 -2 -1 # negative values ["design", "snowboarding", "mechanical engineering"]

With this array, -4 or anything lower would append to the beginning. 3 or anything higher would append to the end.

jsonb_pretty

Notice that jsonb doesn't preserve white-space, so no matter how much effort you put in to make the object easier to read, it will end up as a long string. Well jsonb_pretty will format it for you. If we use the previous jsonb example and wrap it all in a jsonb_pretty function:

# SELECT jsonb_pretty(jsonb_set('{"name": "James", "contact": {"phone": "01234 567890", "fax": "01987 543210"}}'::jsonb, '{contact,phone}', '"07900 112233"'::jsonb)); jsonb_pretty --------------------------------- { + "name": "James", + "contact": { + "fax": "01987 543210", + "phone": "07900 112233"+ } + } (1 row)

Much easier to read.

9.5 brings support for "UPSERT" operations.

INSERT is extended to accept an ON CONFLICT DO UPDATE/IGNORE clause. This clause specifies an alternative action to take in the event of a would-be duplicate violation.

For example, if we have a simple table with user accounts logins where we wanted to track the number of times that user had logged in:

# SELECT username, logins FROM user_logins; username | logins ----------+-------- James | 4 Lois | 2 (2 rows)

And we wanted to add 2 new logins, normally we'd have a problem if the primary key (or unique constraint) was violated:

# INSERT INTO user_logins (username, logins) VALUES ('Naomi',1),('James',1); ERROR: duplicate key value violates unique constraint "users_pkey" DETAIL: Key (username)=(James) already exists.

Unlike approaches using a Common Table Expression, the new command has no race conditions, guaranteeing either an insert or an update (provided there is no incidental error). ON CONFLICT DO UPDATE safely guarantees "insert-or-update" semantics, with no risk of the statement failing to perform one of those two actions for each row proposed for insertion (unless there was an independent error). Unlike SQL MERGE as implemented in other database systems, there is no implementation or semantic restriction that burdens users with having to consider that their UPSERT might fail due to a concurrent operation. In short, the new "UPSERT" feature is designed to be both as simple and as safe as possible.

Further example of new syntax:

INSERT INTO user_logins (username, logins) VALUES ('Naomi',1),('James',1) ON CONFLICT (username) DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;

Now let's look at what happened:

# SELECT username, logins FROM user_logins; username | logins ----------+-------- Lois | 2 Naomi | 1 James | 5 (3 rows)

We have a new row for Naomi, which shows her having logged in once, but then we also have James whose logins value has incremented by one as specified by the UPDATE part of the statement. The UPDATE statement knows which rows it's updating based on the column or unique constraint that's being checked against.

Of course there are scenarios where you might want to insert a value into a table, but only if it's not there already.

Say we had a list of countries which would be used to constrain values in other tables:

# SELECT * FROM countries; country ----------- Australia Italy Japan UK USA (5 rows)

We want to add 2 more countries. If one or more of them already existed and violated the primary key (in this case the "country" column), we'd get an error:

# INSERT INTO countries (country) VALUES ('France'),('Japan'); ERROR: duplicate key value violates unique constraint "countries_pkey" DETAIL: Key (country)=(Japan) already exists.

But now we can tell it that a conflict is fine, and just DO NOTHING in those scenarios:

# INSERT INTO countries (country) VALUES ('France'),('Japan') ON CONFLICT DO NOTHING; INSERT 0 1

Now we should just have one additional country in our table:

# SELECT * FROM countries; country ----------- Australia Italy Japan UK USA France (6 rows)

If there were additional columns, that were also uniquely-constrained, we could specify the constraint or column that we want to apply the condition to, so that a "legitimate" conflict on another column produces an error.

So, to clarify our intent, we could have written the last example as:

# INSERT INTO countries (country) VALUES ('France'),('Japan') ON CONFLICT ON CONSTRAINT countries_pkey DO NOTHING;

or

# INSERT INTO countries (country) VALUES ('France'),('Japan') ON CONFLICT (country) DO NOTHING;

However, the former isn't really fully equivalent to the latter, because the "inference" based syntax is more robust against things like schema migrations that result in a new name for essentially the same constraint/business rule. Users should prefer the latter (unless it's unavoidable, as with exclusion constraints).

Note that providing multiple sets of conflict/update conditions isn't supported, so if a specific conflict is specified, but another conflict occurs instead, it will produce a conflict error like it would with a normal insert.

Upsert Lands in PostgreSQL 9.5 – a First Look

Youtube video - PostgreSQL 9.5's Upsert Feature Explained

pg_rewind

pg_rewind makes it possible to efficiently bring an old primary in sync with a new primary without having to perform a full base backup. This works by looking in the Write Ahead Log to see which pages have been modified, and only copying across those pages.

In this example, we have a primary (running on port 5530) and a standby subscribing to it (on port 5531):

# SELECT * FROM pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 11609 usesysid | 16384 usename | rep_user application_name | standby1 client_addr | 127.0.0.1 client_hostname | client_port | 38434 backend_start | 2015-03-29 00:11:55.243319+00 backend_xmin | state | streaming sent_location | 0/C81BB40 write_location | 0/C81BB40 flush_location | 0/C81BB40 replay_location | 0/C81BB40 sync_priority | 0 sync_state | async

Now we'll promote the standby:

$ pg_ctl promote -D standby1 server promoting

And we'll make some changes on this instance:

$ psql -p 5531 postgres # CREATE TABLE x (content text); CREATE TABLE # INSERT INTO x SELECT 'test' FROM generate_series(1,1000); INSERT 0 1000

Now we'll stop old primary and use pg_rewind to re-synchronise it:

$ pg_ctl stop -D primary waiting for server to shut down.... done server stopped $ pg_rewind -D primary --source-server='host=localhost port=5531' -P connected to remote server The servers diverged at WAL position 0/C81BB40 on timeline 1. Rewinding from last common checkpoint at 0/2000060 on timeline 1 reading source file list reading target file list reading WAL in target Need to copy 274 MB (total source directory size is 290 MB) 281142/281142 kB (100%) copied creating backup label and updating control file Done!

And we'll make some changes to get it to subscribe to the new primary:

$ cd primary $ mv recovery.{done,conf} $ vi recovery.conf # edited to set host info to point to port 5531 in this case $ vi postgresql.conf # as our example instances are running on the same server, we'll just change the port so it doesn't conflict

Then start the new standby (old primary):

$ pg_ctl start -D primary

Let's see if it's successfully caught up:

$ psql -p 5531 postgres # connect to the new primary # SELECT * FROM pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 11837 usesysid | 16384 usename | rep_user application_name | standby1 client_addr | 127.0.0.1 client_hostname | client_port | 49148 backend_start | 2015-03-29 00:22:39.047657+00 backend_xmin | state | streaming sent_location | 0/C8559B0 write_location | 0/C8559B0 flush_location | 0/C8559B0 replay_location | 0/C855978 sync_priority | 0 sync_state | async

And see if the test data from the new primary is on the new standby:

$ psql -p 5530 postgres # connect to the new standby # SELECT COUNT(*) FROM x; count ------- 1000 (1 row)

All synchronised.

Waiting for 9.5 – Add pg_rewind, for re-synchronizing a master server after failback.

Other new features

ALTER TABLE ... SET LOGGED / UNLOGGED

PostgreSQL allows one to create tables which aren't written to the Write Ahead Log, meaning they aren't replicated or crash-safe, but also don't have the associated overhead, so are good for data that doesn't need the guarantees of regular tables. But if you decided an unlogged table should now be replicated, or a regular table should no longer be logged, you'd previously have to create a new copy of the table and copy the data across. But in 9.5, you can switch between logged and unlogged using a new command:

Set an unlogged table to logged:

ALTER TABLE <tablename> SET LOGGED;

Set a logged table to unlogged:

ALTER TABLE <tablename> SET UNLOGGED;

For example:

# CREATE UNLOGGED TABLE messages (id int PRIMARY KEY, message text);

# SELECT relname, CASE relpersistence WHEN 'u' THEN 'unlogged' WHEN 'p' then 'logged' ELSE 'unknown' END AS table_type FROM pg_class WHERE relname ~ 'messages*'; relname | table_type ---------------+------------ messages | unlogged messages_pkey | unlogged (2 rows)

Note that setting an unlogged table to logged will generate WAL which will contain all data in the table, so this would cause a spike in replication traffic for large tables. And now we change it to a logged table:

# ALTER TABLE messages SET LOGGED;

And the result of the previous query is now:

relname | table_type ---------------+------------ messages | logged messages_pkey | logged (2 rows)

SKIP LOCKED

If you need to SELECT from a table and protect those rows from being updated until your transaction has completed, you would specify FOR UPDATE, but if some rows are locked, you can specify SKIP LOCKED to tell it to simply ignore those rows and just perform the operation on whichever rows it can access.

For example:

In session 1:

# BEGIN; BEGIN

# SELECT * FROM colours; id | name ----+------- 1 | red 2 | green 3 | blue (3 rows)

# UPDATE colours SET name = 'scarlet' WHERE name = 'red'; CREATE TABLE

In session 2:

# SELECT * FROM colours FOR UPDATE NOWAIT; ERROR: could not obtain lock on row in relation "colours"

# SELECT * FROM colours FOR UPDATE SKIP LOCKED; id | name ----+------- 2 | green 3 | blue (2 rows)

Waiting for 9.5 – Implement SKIP LOCKED for row-level locks

Parallel VACUUMing

The vacuumdb utility now supports parallel jobs. This is specified with the -j option, just like when using pg_dump or pg_restore. This means vacuuming a database will complete a lot quicker, and especially so for cases where tables are spread across multiple tablespaces. It will also start vacuuming the largest relations first.

For example:

vacuumdb -j4 productiondb

This would vacuum the database named "productiondb" by spawning 4 vacuum jobs to run simultaneously.

Waiting for 9.5 – vacuumdb: enable parallel mode

A boolean configuration parameter named track_commit_timestamp is now available. "on" enables commit timestamp tracking. This makes it possible to see when a transaction was committed. (timestamp with timezone of the servertime)

To query that information, use the function pg_xact_commit_timestamp(transaction_id) to find out when rows were INSERTed/UPDATEed, call the pg_xact_commit_timestamp function passing in the xmin system column. Note that you have to set the parameter and restart before it can start tracking timestamps, so it won't show up until you do so:

# SELECT pg_xact_commit_timestamp(xmin), * FROM colours; pg_xact_commit_timestamp | id | name --------------------------+----+------- | 1 | red | 2 | green | 3 | blue (3 rows)

Then we edit postgresql.conf so that track_commit_timestamp = on, then restart PostgreSQL:

# INSERT INTO colours VALUES ('mauve'),('cyan'),('indigo'); INSERT # SELECT pg_xact_commit_timestamp(xmin), * FROM colours; pg_xact_commit_timestamp | id | name -------------------------------+----+-------- | 1 | red | 2 | green | 3 | blue 2015-10-02 11:16:34.678267+01 | 4 | mauve 2015-10-02 11:16:34.678267+01 | 5 | cyan 2015-10-02 11:16:34.678267+01 | 6 | indigo (6 rows)

You can see that the first 3 rows that were inserted before the parameter was enabled don't return the timestamp, so it only starts to keep track of timestamps for INSERTs/UPDATEs once the parameter has been enabled and the database cluster has been restarted.

There is also the set-returning function pg_last_committed_xact which tells you the timestamp of the last committed transaction:

# SELECT * FROM pg_last_committed_xact(); xid | timestamp ------+------------------------------- 2039 | 2015-10-02 11:16:34.678267+01 (1 row)

cluster_name

There's a new parameter in postgresql.conf called cluster_name. This allows a string to be used to identify the cluster in the process list, which is particularly useful when multiple clusters are running on the same server:

$ ps -ef | grep checkpointer postgres 12181 12178 0 11:12 ? 00:00:00 postgres: personnel: checkpointer process postgres 12207 12204 0 11:12 ? 00:00:00 postgres: reportsdb: checkpointer process postgres 12233 12230 0 11:12 ? 00:00:00 postgres: management: checkpointer process

Sorting optimization (Abbreviated Keys)

The abbreviated keys optimization can be expected to greatly enhance the performance of sorts in PostgreSQL, including those used for CREATE INDEX. Reportedly, in some cases, CREATE INDEX on text columns can be as much as an entire order of magnitude faster (3x is a more typical improvement). Numeric sorts also support the optimization.

Abbreviated keys will complement grouping sets very nicely. Reporting queries will be very significantly faster with PostgreSQL 9.5.

Abbreviated keys: exploiting locality to improve PostgreSQL's text sort performance

Abbreviated keys for numeric to accelerate numeric sorts

GiST Index-Only Scans

Previously, the only index access method that supported index-only scans was B-Tree and SP-GiST, but support is added for GiST in PostgreSQL 9.5:

In this example, we'll be using the btree_gist extension:

# CREATE EXTENSION btree_gist;

We'll set up a simple table that stores meeting room reservations:

# CREATE TABLE meetings ( id serial primary key, room int, reservation tstzrange);

Then we add an exclusion constraint to ensure that no booking for any one room overlaps another booking for the same room, which creates an index to enforce the constraint:

# ALTER TABLE meetings ADD CONSTRAINT meeting_exclusion EXCLUDE USING GIST (room with =, reservation with &&);

And we'll populate it with lots of test data:

# WITH RECURSIVE ins AS ( SELECT 1 AS room, '2000-01-01 08:00:00'::timestamptz AS reservation_start, (ceil(random()*24)*5 || ' minutes')::interval as duration UNION ALL SELECT CASE WHEN ins.reservation_start > now() THEN ins.room + 1 ELSE ins.room END AS room, CASE WHEN ins.reservation_start > now() THEN '2000-01-01 08:00:00'::timestamptz ELSE ins.reservation_start + ins.duration END AS reservation_start, (ceil(random()*16)*15 || ' minutes')::interval AS duration FROM ins WHERE reservation_start < now() + '1 day'::interval and room <= 200 ) INSERT INTO meetings (room, reservation) SELECT room, tstzrange(reservation_start, reservation_start + duration) FROM ins WHERE (reservation_start + duration)::time between '08:00' and '20:00';

One run of this results in 6.4 million rows.

If we get the query plan for counting how many meetings occurred during May in 2014 for each room:

# EXPLAIN SELECT room, count(*) FROM meetings WHERE reservation && '[2014-05-01,2014-05-31]'::tstzrange GROUP BY room ORDER BY room; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Sort (cost=1294.20..1294.21 rows=2 width=4) Sort Key: room -> HashAggregate (cost=1294.17..1294.19 rows=2 width=4) Group Key: room -> Index Only Scan using meeting_exclusion on meetings (cost=0.41..1113.98 rows=36038 width=4) Index Cond: (reservation && '["2014-05-01 00:00:00+01","2014-05-31 00:00:00+01"]'::tstzrange)



Prior to 9.5, we would get the following plan:

QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Sort (cost=28570.15..28570.16 rows=2 width=4) Sort Key: room -> HashAggregate (cost=28570.12..28570.14 rows=2 width=4) Group Key: room -> Bitmap Heap Scan on meetings (cost=778.49..28386.07 rows=36810 width=4) Recheck Cond: (reservation && '["2014-05-01 00:00:00+01","2014-05-31 00:00:00+01"]'::tstzrange) -> Bitmap Index Scan on meeting_exclusion (cost=0.00..769.29 rows=36810 width=0) Index Cond: (reservation && '["2014-05-01 00:00:00+01","2014-05-31 00:00:00+01"]'::tstzrange)

TABLESAMPLE

The TABLESAMPLE clause allows returning a randomly selected sample of the rows in the table. There are two sampling algorithms provided, SYSTEM and BERNOULLI. SYSTEM chooses random data pages and returns all of the rows on each page, and is thus quite fast. Bernoulli, which is slower, returns a more random set of rows.

Example:

SELECT * FROM profiles TABLESAMPLE SYSTEM ( 0.01 );

Lock scalability improvements

The locks acquired for most operations in postgres didn't scale well to many concurrent shared acquisitions. Despite only acquiring shared locks, frequently most of the backends were blocked acquiring a exclusive spinlock. This frequently lead to 100% CPU usage by a large number of sessions. Now shared lockers should not interfere much with each others. This particularly helps on servers with more than one socket.

[1]

Improved buffer replacement scalability

Before 9.5 every buffer replacement happened while holding a global lock. Now the hot path is lock-free, and the slow paths use a per buffer spinlock.

[2] [3]

Reduced per-backend memory usage

Before 9.5 every backend used an array containing an entry for each page shared_buffers to manage page reference counts. With large shared_buffers that could add tens to hundreds of megabytes to each session. Now a small, fixed size, array is used instead.

[4]

CustomScan interface and Join-pushdown infrastructure

CustomScan interface allows extensions to implement own logic to scan relation, also replace built-in logic if its estimated cost is more reasonable. Relevant infrastructure also allows to replace built-in join nodes, by foreign-scan (if both child nodes are managed by same foreign server) or custom-scan; that performs like a scan on remotely/preliminary joined relations.