On 8th of May, Andres Freund committed patch:

Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE. The newly added ON CONFLICT clause allows to specify an alternative to raising a unique or exclusion constraint violation error when inserting. ON CONFLICT refers to constraints that can either be specified using a inference clause (by specifying the columns of a unique constraint) or by naming a unique or exclusion constraint. DO NOTHING avoids the constraint violation, without touching the pre-existing row. DO UPDATE SET ... [WHERE ...] updates the pre-existing tuple, and has access to both the tuple proposed for insertion and the existing tuple; the optional WHERE clause can be used to prevent an update from being executed. The UPDATE SET and WHERE clauses have access to the tuple proposed for insertion using the "magic" EXCLUDED alias, and to the pre-existing tuple using the table name or its alias. This feature is often referred to as upsert. This is implemented using a new infrastructure called "speculative insertion". It is an optimistic variant of regular insertion that first does a pre-check for existing tuples and then attempts an insert. If a violating tuple was inserted concurrently, the speculatively inserted tuple is deleted and a new attempt is made. If the pre-check finds a matching tuple the alternative DO NOTHING or DO UPDATE action is taken. If the insertion succeeds without detecting a conflict, the tuple is deemed inserted. To handle the possible ambiguity between the excluded alias and a table named excluded, and for convenience with long relation names, INSERT INTO now can alias its target table. Bumps catversion as stored rules change. Author: Peter Geoghegan, with significant contributions from Heikki Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes. Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs, Dean Rasheed, Stephen Frost and many others.

This is a dream-come-true for many. Lack of upsert was troubling to many, caused many questions on irc, and posts on blogs, including mine, of course.

Now, it's all over. With the new syntax we can forget about all of this.

So, let's see how it works.

CREATE TABLE test ( id serial PRIMARY KEY , some_key text NOT NULL UNIQUE , some_val int4 NOT NULL DEFAULT 0 ) ;

So, let's insert some rows, and see what we can do.

First, sanity checks:

INSERT INTO test ( some_key ) VALUES ( 'a' ) , ( 'b' ) ; SELECT * FROM test; id | some_key | some_val ----+----------+---------- 1 | a | 0 2 | b | 0 ( 2 ROWS )

With these two rows, we can do:

INSERT INTO test ( some_key ) VALUES ( 'a' ) ; ERROR: duplicate KEY VALUE violates UNIQUE CONSTRAINT "test_some_key_key" DETAIL: KEY ( some_key ) = ( a ) already EXISTS .

This, correctly, errored out. How about ignoring the error?

INSERT INTO test ( some_key ) VALUES ( 'a' ) ON CONFLICT DO NOTHING; INSERT 0 0 SELECT * FROM test; id | some_key | some_val ----+----------+---------- 1 | a | 0 2 | b | 0 ( 2 ROWS )

That's new. How about making update?

INSERT INTO test ( some_key ) VALUES ( 'a' ) ON CONFLICT DO UPDATE SET some_val = some_val + 1 ; ERROR: ON CONFLICT DO UPDATE requires inference specification OR CONSTRAINT name LINE 2 : ON CONFLICT DO UPDATE SET some_val = some_val + 1 ; ^ HINT: FOR example , ON CONFLICT ON CONFLICT ( < column > ) .

This is interesting – for “DO NOTHING" you can just type it as it, but when doing “DO UPDATE" you have to put some more work.

Why? Consider that we have 2 different unique indexes – on id and on some_key. What we update should be different depending on which key might get violated.

So, for DO UPDATE we need to specify which key violation is the one we care about, so let's do it:

INSERT INTO test ( some_key ) VALUES ( 'a' ) ON CONFLICT ON CONSTRAINT test_some_key_key DO UPDATE SET some_val = some_val + 1 ; ERROR: COLUMN reference "some_val" IS ambiguous LINE 2 : ... TRAINT test_some_key_key DO UPDATE SET some_val = some_val +... ^

And another error. This time – it has problem with “some_val". Why? That's because UPDATE has access to both values in row that is being updated, and the ones that I just tried to insert. And some_val doesn't specify where the data comes from, or goes into.

So, how to specify which some_val, I want? For the table columns, prefix it with table name, and for the row that you tried to insert – use “excluded" pseudo-table.

So, my simple example becomes

INSERT INTO test ( some_key ) VALUES ( 'a' ) ON CONFLICT ON CONSTRAINT test_some_key_key DO UPDATE SET some_val = test . some_val + 1 ; UPSERT 0 1 SELECT * FROM test; id | some_key | some_val ----+----------+---------- 2 | b | 0 1 | a | 1 ( 2 ROWS )

But if I wanted, for example, to use provided value, I could:

INSERT INTO test ( some_key , some_val ) VALUES ( 'b' , 10 ) ON CONFLICT ON CONSTRAINT test_some_key_key DO UPDATE SET some_val = excluded . some_val; UPSERT 0 1 SELECT * FROM test; id | some_key | some_val ----+----------+---------- 1 | a | 1 2 | b | 10 ( 2 ROWS )

From what I can tell, you can't have two “ON CONFLICT" clauses, which means you can't have different handling of pkey violation, and some other key violation, but maybe it will be added in future.

Now, how does that behave in case of multiple rows?

INSERT INTO test ( some_key , some_val ) VALUES ( 'a' , 5 ) , ( 'b' , 10 ) , ( 'c' , 15 ) ON CONFLICT ON CONSTRAINT test_some_key_key DO UPDATE SET some_val = excluded . some_val returning * ; id | some_key | some_val ----+----------+---------- 1 | a | 5 2 | b | 10 9 | c | 15 ( 3 ROWS ) UPSERT 0 3

And if I'd ignore duplicates?

INSERT INTO test ( some_key , some_val ) VALUES ( 'a' , 5 ) , ( 'b' , 10 ) , ( 'c' , 15 ) , ( 'd' , 20 ) ON CONFLICT DO NOTHING returning * ; id | some_key | some_val ----+----------+---------- 13 | d | 20 ( 1 ROW )

Nice.

So, finally, let's test performance.

I'll test 2 approaches:

insert on conflict do nothing

upsert done via function, which uses LOOP approach

Insert is trivial. Now for the function.

CREATE FUNCTION upsert_loop ( p_key text , p_val int4 ) RETURNS void AS $$ BEGIN LOOP UPDATE test SET some_val = p_val WHERE some_key = p_key; IF found THEN RETURN ; END IF ; BEGIN INSERT INTO test ( some_key , some_val ) VALUES ( p_key , p_val ) ; RETURN ; EXCEPTION WHEN unique_violation THEN -- Do nothing, and loop to try the UPDATE again. END ; END LOOP; END ; $$ LANGUAGE plpgsql; CREATE FUNCTION SELECT upsert_loop ( 'a' , 123 ) ; upsert_loop ------------- ( 1 ROW ) SELECT upsert_loop ( 'z' , 23 ) ; upsert_loop ------------- ( 1 ROW ) SELECT * FROM test; id | some_key | some_val ----+----------+---------- 2 | b | 10 9 | c | 15 13 | d | 20 1 | a | 123 14 | z | 23 ( 5 ROWS )

Works. So now, let's write simple test.

I will need couple of test files.

I need some (many) duplicates to happen, so let's make dataset that will be having them:

perl -e 'my @c = ("a".."z", "A".."Z", "0".."9"); printf "%s%s %d

", $c[rand @c], $c[rand @c], rand 1000 for 1..100000' | split -l 20000 - input.

This, somewhat cryptic line, generated five files, named “input.aa" to “input.ae".

Each of these files contains 20000 lines, and the lines look like this:

uE 459 GD 347 HN 977 wE 3 OD 201 MD 926 MB 523 t1 903 FH 157 tk 104

There is 3844 different combinations in 1 column, so we will have lots of updates.

Now, I'll convert these files into 2 sets of sql files, one using insert … on conflict, and the other will use function:

for a in input.a?; do cat $a | sed "s/ /', /;s/^/select upsert_loop('/;s/\$/);/" > function-$a; done for a in input.a?; do cat $a | sed "s/ /', /;s/^/insert into test (some_key, some_val) values ('/;s/\$/) on conflict do nothing;/" > conflict-$a; done

The files looks now like this:

= $ head function - INPUT . aa SELECT upsert_loop ( 'uE' , 459 ) ; SELECT upsert_loop ( 'GD' , 347 ) ; SELECT upsert_loop ( 'HN' , 977 ) ; SELECT upsert_loop ( 'wE' , 3 ) ; SELECT upsert_loop ( 'OD' , 201 ) ; SELECT upsert_loop ( 'MD' , 926 ) ; SELECT upsert_loop ( 'MB' , 523 ) ; SELECT upsert_loop ( 't1' , 903 ) ; SELECT upsert_loop ( 'FH' , 157 ) ; SELECT upsert_loop ( 'tk' , 104 ) ; 14 : 41 :00 depesz@krowka ~ = $ head conflict - INPUT . aa INSERT INTO test ( some_key , some_val ) VALUES ( 'uE' , 459 ) ON conflict do nothing; INSERT INTO test ( some_key , some_val ) VALUES ( 'GD' , 347 ) ON conflict do nothing; INSERT INTO test ( some_key , some_val ) VALUES ( 'HN' , 977 ) ON conflict do nothing; INSERT INTO test ( some_key , some_val ) VALUES ( 'wE' , 3 ) ON conflict do nothing; INSERT INTO test ( some_key , some_val ) VALUES ( 'OD' , 201 ) ON conflict do nothing; INSERT INTO test ( some_key , some_val ) VALUES ( 'MD' , 926 ) ON conflict do nothing; INSERT INTO test ( some_key , some_val ) VALUES ( 'MB' , 523 ) ON conflict do nothing; INSERT INTO test ( some_key , some_val ) VALUES ( 't1' , 903 ) ON conflict do nothing; INSERT INTO test ( some_key , some_val ) VALUES ( 'FH' , 157 ) ON conflict do nothing; INSERT INTO test ( some_key , some_val ) VALUES ( 'tk' , 104 ) ON conflict do nothing;

Now, with this in place, I will run them all, at the same time, and check how long it takes. From clean table.

So, I need s helper script, which does:

#!/usr/bin/env bash prefix = "$1" psql -qAtX -c 'truncate test' for a in $prefix -input.a?; do psql -qAtX -f $a >/ dev / null & done time wait

This will run truncate on the test table, then run all sql files (for given prefix) in parallel, and check how long it takes for them to finish.

Since single run is not really good thing, i will test each of the methods 3 times, and pick best time. Results:

ON CONFLICT – 16.780s

function – didn't finish. after 3 minutes a gave up.

Winner is clear. But, ON CONFLICT has also one benefit – it seamlessly handles working with multiple rows. So importing new data will be much simpler now. Thanks guys – that's great feature.

update

Andreas notice that I used key name in all “on conflict" clauses – where you can use “on conflict (col_a, col_b)". This form (with listed columns, and not constraint name) has the benefit that it will work if you'd change name of unique constraint.