If you have worked with certain other (than PostgreSQL) open source databases, you might wonder why PostgreSQL doesn't have MERGE, and why the UPSERT example in the documentation is so complicated.

Well, let's try to answer the question, and look into some alternatives.

First, of course – let's define the task.

I will need a way to either insert a row, or update it if it already exists. What does “it already exists" mean? Let's assume we're talking about a row with the same value in a UNIQUE column.

For example – I have a table:

CREATE TABLE test (
    whatever INT4 PRIMARY KEY,
    counter  INT4 NOT NULL DEFAULT 0
);

Very simple, but that's basically the point. I want to insert a new “whatever", but if it already exists – I want to update the counter instead.

And – what makes UPSERT so complicated – it should never raise an exception.

The first, simplest solution is:

UPDATE test SET counter = counter + 1 WHERE whatever = ?;

and if it fails (modified row count is 0) – run:

INSERT INTO test (whatever, counter) VALUES (?, 1);

It's so simple, it can't fail. Right? Let's see. A very simple Perl program to test it:

#!/usr/bin/env perl
use DBI;
my $dbh = DBI->connect(
    'dbi:Pg:dbname=pgdba;host=127.0.0.1;port=5920',
    undef,
    undef,
    { 'AutoCommit' => 1, },
);
while (1) {
    my $whatever = int(1 + rand 100);
    my $rows = $dbh->do(
        'UPDATE test set counter = counter + 1 WHERE whatever = ?',
        undef,
        $whatever
    );
    if ( 0 == $rows ) {
        $dbh->do(
            'INSERT INTO test (whatever, counter) VALUES (?, 1)',
            undef,
            $whatever
        );
    }
}

Of course it will never end (the while loop from line 9 to line 23 doesn't have an ending condition), but after running it I can stop it by pressing ctrl-c. And I did, after a couple of minutes – and we're good. No errors. So it's OK, right? Wrong.

Before I show why it's wrong, let me explain what it does – in case a reader of this blogpost doesn't read such simple Perl programs fluently 🙂

Line 1 – standard UNIX “shebang" to tell the shell to run it via perl

Line 2 – load DBI module, which is all-purpose database access library

Lines 3-8 – technically one statement, just indented a lot – creates a database connection, using the Pg driver, to database pgdba, over a TCP/IP connection to 127.0.0.1, port 5920. The undefs are just a way to specify the default username and lack of password. The AutoCommit flag in line 7 makes each statement its own transaction.

Line 9 – beginning of infinite loop

Line 10 – I pick a random integer value in the range 1-100

Lines 11-15 – a single command that runs UPDATE in the database with the given $whatever (the random value from the previous step)

Line 16 – if no rows were updated (->do() returned 0) – run the next step

Lines 17-21 – a single command, running INSERT in the database

All very simple, step by step doing what we need.

And yet, despite the initial test, I say that it's wrong. Why?

The answer is very simple – it breaks when I run it many times in parallel. How? Let's see:

=$ psql -c 'truncate test'; for i in {1..10}; do ./test.pl & done; sleep 5; killall perl
TRUNCATE TABLE
[1] 12843
[2] 12844
[3] 12845
[4] 12846
[5] 12847
[6] 12848
[7] 12849
[8] 12850
[9] 12851
[10] 12852
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(55) already exists. at ./test.pl line 17.
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(20) already exists. at ./test.pl line 17.
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(11) already exists. at ./test.pl line 17.
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(97) already exists. at ./test.pl line 17.
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(18) already exists. at ./test.pl line 17.
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(18) already exists. at ./test.pl line 17.
[1]   Terminated   ./test.pl
[2]   Terminated   ./test.pl
[4]   Terminated   ./test.pl
[5]   Terminated   ./test.pl
[6]   Terminated   ./test.pl
[7]   Terminated   ./test.pl
[8]   Terminated   ./test.pl
[9]-  Terminated   ./test.pl
[3]-  Terminated   ./test.pl
[10]+ Terminated   ./test.pl

We have errors. But how? It shows a duplicate key error at test.pl line 17 – which is the $dbh->do() of the INSERT. But the UPDATE didn't find the row. So what's going on?

The reason is very simple – with multiple processes working in parallel it is perfectly possible that two processes chose the same value. Both issued an UPDATE – and neither UPDATE found a row, so they both issued an INSERT. And the second INSERT failed.

Of course the window of time for such a case to happen is very short – if the second process had run 1 millisecond later, there would have been no problem. That's why it's called a race condition.
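For readers who don't want to run ten Perl processes to see it – the unlucky interleaving can be sketched in a few lines of Python. This is a toy: a dict stands in for the table, an exception stands in for the unique violation, and the schedule of the two “sessions" is forced by hand; all names are invented for illustration.

```python
# Toy in-memory "table": maps whatever -> counter.
table = {}

def update(whatever):
    """Like 'UPDATE test SET counter = counter + 1 WHERE whatever = ?' - returns row count."""
    if whatever in table:
        table[whatever] += 1
        return 1
    return 0

def insert(whatever):
    """Like 'INSERT INTO test (whatever, counter) VALUES (?, 1)' - raises on duplicate."""
    if whatever in table:
        raise ValueError('duplicate key value violates unique constraint')
    table[whatever] = 1

# Both sessions pick the same value, and both UPDATEs run before either INSERT.
assert update(55) == 0   # session #1: no row yet
assert update(55) == 0   # session #2: no row yet
insert(55)               # session #1: inserts the row
try:
    insert(55)           # session #2: duplicate key error
    second_insert_failed = False
except ValueError:
    second_insert_failed = True
```

The point is the gap between the “check" (UPDATE found nothing) and the “act" (INSERT) – anything another session does in that gap invalidates the check.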

At this point some people will say: OK, but that's simple – just do it in a transaction.

So, let's try. I changed the script to be:

#!/usr/bin/env perl
use DBI;
my $dbh = DBI->connect(
    'dbi:Pg:dbname=pgdba;host=127.0.0.1;port=5920',
    undef,
    undef,
    { 'AutoCommit' => 0, },
);
while (1) {
    my $whatever = int(1 + rand 100);
    my $rows = $dbh->do(
        'UPDATE test set counter = counter + 1 WHERE whatever = ?',
        undef,
        $whatever
    );
    if ( 0 == $rows ) {
        $dbh->do(
            'INSERT INTO test (whatever, counter) VALUES (?, 1)',
            undef,
            $whatever
        );
    }
    $dbh->commit();
}

Changes – line 7 now shows autocommit disabled, and we have line 23, which calls commit() after the work has been done. We don't need BEGIN; because Perl's DBI is smart enough to start a transaction whenever we're not in one and some work has to be done.

So, with this change, we see:

=$ psql -c 'truncate test'; for i in {1..10}; do ./test.pl & done; sleep 5; killall perl
TRUNCATE TABLE
...
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(23) already exists. at ./test.pl line 17.
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(98) already exists. at ./test.pl line 17.
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(22) already exists. at ./test.pl line 17.
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(77) already exists. at ./test.pl line 17.
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(9) already exists. at ./test.pl line 17.
...

(I removed the “job started" and “Terminated" lines for brevity).

It still fails? Why? The reason is very simple – transactions (in the default isolation level) cannot prevent anything like this. They are just there to make sure that:

other connections will not see “your" changes before you finish making them

if you so decide – you can abandon all your changes, i.e. rollback

That's all.

If you're paying attention, you noticed “in the default isolation level".

There are docs on these levels, but generally they control how “separated" you want your transactions to be. There is even a “SERIALIZABLE" level, which is described like this:

The most strict is Serializable, which is defined by the standard in a paragraph which says that any concurrent execution of a set of Serializable transactions is guaranteed to produce the same effect as running them one at a time in some order.

This might sound great. Let's test it. Another change in the test script:

#!/usr/bin/env perl
use DBI;
my $dbh = DBI->connect(
    'dbi:Pg:dbname=pgdba;host=127.0.0.1;port=5920',
    undef,
    undef,
    { 'AutoCommit' => 0, },
);
while (1) {
    $dbh->do('SET TRANSACTION ISOLATION LEVEL SERIALIZABLE');
    my $whatever = int(1 + rand 100);
    my $rows = $dbh->do(
        'UPDATE test set counter = counter + 1 WHERE whatever = ?',
        undef,
        $whatever
    );
    if ( 0 == $rows ) {
        $dbh->do(
            'INSERT INTO test (whatever, counter) VALUES (?, 1)',
            undef,
            $whatever
        );
    }
    $dbh->commit();
}

(new line 10).

Unfortunately, this time, when I try to run it, I get a HUGE number of errors like these:

=$ psql -c 'truncate test'; for i in {1..10}; do ./test.pl & done 2>&1 | head -n 15; sleep 5; killall perl
TRUNCATE TABLE
DBD::Pg::db commit failed: ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.
HINT:  The transaction might succeed if retried. at ./test.pl line 24.
DBD::Pg::db commit failed: ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on commit attempt with conflict in from prepared pivot.
HINT:  The transaction might succeed if retried. at ./test.pl line 24.
DBD::Pg::db commit failed: ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on commit attempt with conflict in from prepared pivot.
HINT:  The transaction might succeed if retried. at ./test.pl line 24.
DBD::Pg::db do failed: ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during write.
HINT:  The transaction might succeed if retried. at ./test.pl line 18.
DBD::Pg::db do failed: ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during write.
HINT:  The transaction might succeed if retried. at ./test.pl line 18.

Ouch. That sucks.
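To be fair, the HINT in those messages (“The transaction might succeed if retried") points at the standard remedy for serialization failures: wrap the whole transaction in a bounded retry loop. A toy Python sketch of that idea – the exception class and all names here are invented, standing in for the database reporting SQLSTATE 40001:

```python
class SerializationFailure(Exception):
    """Stand-in for 'could not serialize access' (SQLSTATE 40001)."""

def run_with_retries(txn, max_tries=5):
    # Run the transaction; on a serialization failure, roll back and
    # try the whole thing again, up to max_tries attempts.
    for attempt in range(1, max_tries + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_tries:
                raise

# Demo: a "transaction" that fails twice, then succeeds on the third try.
calls = {'n': 0}
def flaky_txn():
    calls['n'] += 1
    if calls['n'] < 3:
        raise SerializationFailure()
    return 'committed'

result = run_with_retries(flaky_txn)
```

That works, but it means every caller has to be prepared to redo its work – which is exactly the kind of complication we were trying to avoid.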

So, another idea might be: just use a lock. Of course. But what kind of lock? You cannot lock a row that doesn't exist. So we'd have to lock the table. That's doable.

So, let's change line 10 from setting the isolation level to:

$dbh->do('LOCK TABLE test IN ACCESS EXCLUSIVE MODE');

A quick run shows – NO ERRORS. Yeah. We're all good. Are we? Well, not really. The reason is simple – performance.

Each command either inserted a row with counter = 1, or incremented an existing counter. So, by summing all the counters we'll see how many queries actually happened. Since all copies of the script are killed after 5 seconds, we'll get some baseline number:

$ SELECT SUM(counter) FROM test;
 sum
─────
 439
(1 row)

OK. And how does it work without locking? We will have errors, but let's just see the numbers. Line 10 got commented out, and:

=$ psql -c 'truncate test'; for i in {1..10}; do ./test.pl & done &>/dev/null; sleep 5; killall perl; psql -c "select sum(counter) from test"
TRUNCATE TABLE
...
 sum
------
 2042
(1 row)

That's a very big difference. We had some errors, but the number of logged “items" was nearly 5 times higher. So – locking the table can be an answer, but it kills performance.

There is one kind of lock that could be used here: so-called advisory locks. Using them, you can lock something that doesn't exist in the database – because you're applying the lock to a number. Not a row. Not a table. Not any object. Just a number (or two numbers, but that's irrelevant here).
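To get a feel for the idea outside the database: a toy Python sketch that models advisory locks as mutexes keyed by a number. All names here are invented – the real pg_advisory_lock lives in the server, not the client – but the essence is the same: nothing “real" is locked, the callers merely agree to serialize on the same number.

```python
import threading
from collections import defaultdict

# Advisory locks sketched as mutexes keyed by an arbitrary number.
registry_guard = threading.Lock()
locks = defaultdict(threading.Lock)

def advisory_lock(n):
    with registry_guard:        # creating the per-number mutex must itself be safe
        lock = locks[n]
    lock.acquire()

def advisory_unlock(n):
    locks[n].release()

table = {}

def upsert(whatever):
    # With the per-number lock held, check-then-act is safe: no other
    # session can run the same upsert for the same number concurrently.
    advisory_lock(whatever)
    try:
        if whatever in table:   # the "UPDATE" branch
            table[whatever] += 1
        else:                   # the "INSERT" branch
            table[whatever] = 1
    finally:
        advisory_unlock(whatever)

# 4 concurrent workers, each doing 1000 upserts over 10 distinct keys.
threads = [threading.Thread(target=lambda: [upsert(i % 10) for i in range(1000)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```

After the threads finish, every one of the 4000 upserts is accounted for – no duplicates, no lost updates – even though nothing resembling a table lock was taken.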

So – this looks like a godsend. Right? Well, mostly – yes. Let's see. Two small changes (line 10 got moved to line 11 and changed to take the lock, and I added lines 25 and 26 to release the lock):

#!/usr/bin/env perl
use DBI;
my $dbh = DBI->connect(
    'dbi:Pg:dbname=pgdba;host=127.0.0.1;port=5920',
    undef,
    undef,
    { 'AutoCommit' => 0, },
);
while (1) {
    my $whatever = int(1 + rand 100);
    $dbh->do('SELECT pg_advisory_lock( ? )', undef, $whatever);
    my $rows = $dbh->do(
        'UPDATE test set counter = counter + 1 WHERE whatever = ?',
        undef,
        $whatever
    );
    if ( 0 == $rows ) {
        $dbh->do(
            'INSERT INTO test (whatever, counter) VALUES (?, 1)',
            undef,
            $whatever
        );
    }
    $dbh->commit();
    $dbh->do('SELECT pg_advisory_unlock( ? )', undef, $whatever);
    $dbh->commit();
}

You might ask: why do I need another commit() at line 26? The answer is simple – without it, the unlock would work, but it would be part of the same transaction as the next iteration of the loop. And that – while not exactly an error – is a thing I don't like: doing stuff for two unrelated jobs in the same transaction.

So, let's run it, see if there are errors, and what the performance is:

=$ psql -c 'truncate test'; for i in {1..10}; do ./test.pl & done; sleep 5; killall perl; psql -c "select sum(counter) from test"
TRUNCATE TABLE
...
 sum
------
 2114
(1 row)

Looks good – no errors, and the performance number looks great. So, we're set. Right? Yeah, you guessed it – wrong.

Why? Reason is very simple – lack of trust.

Advisory locks work great, and will protect you from all kinds of errors. If you use them.

And since an advisory lock doesn't lock anything “real" in the database – like a table or a row – it will not stop other access to the table: from parts of your code where you forgot about the locks, or from psql sessions where you're too lazy to take them.

This wouldn't generally be that big of a problem, but PostgreSQL devs, and (to some extent) users, strive for perfection. And safety in all cases. So – while using advisory locks (or even the solution with the race condition) might be good for your situation – it is not good enough for the general case.

Now, let me make a small digression. Once, a long time ago, I wrote in Polish about a method to do “insert of new rows" (i.e. to insert rows, ignoring, without error, those that already exist), and then, quite recently, David Fetter extended the description.

Generally the method is basically either:

INSERT INTO test (whatever, counter)
    SELECT 123, 1
    WHERE NOT EXISTS (
        SELECT * FROM test WHERE whatever = 123
    );

or the same, with an OUTER JOIN:

INSERT INTO test (whatever, counter)
    SELECT v.*
    FROM
        (VALUES (123, 1)) AS v (w, c)
        LEFT JOIN test t ON t.whatever = v.w
    WHERE t.whatever IS NULL;

Unfortunately, both of these approaches have the same problem – a race condition. The window of opportunity for the error is shorter than with two separate queries, but it exists. I will not show it, because the test is trivial and I don't want to inflate this already big blogpost – you can either trust me or simply run the test yourself.

End of digression, back to main track.

So, we need a way to do the update, insert if it didn't work, and then – if the insert failed – redo the update. Seems simple, with one problem: we wanted errors not to happen.

And now we should ask ourselves: what does “the error did not happen" mean? Does it mean that at no point do we see the “ERROR" text? Or simply that we have one transaction that makes the change in the database and always succeeds, even if it sometimes logs an error along the way?

If we want to get rid of errors altogether, we need to dive into the PL/pgSQL language, but for now let's just assume we want it done in a single transaction that will always succeed. Errors might be shown, but each iteration of the loop from line 9 should increment a counter in the database.

So, I'll use a relatively unknown feature of PostgreSQL: SAVEPOINT. A savepoint is a “thing" within a transaction that we can always roll back to, even after errors that happened since it was set.

New version of test program:

#!/usr/bin/env perl
use DBI;
my $dbh = DBI->connect(
    'dbi:Pg:dbname=pgdba;host=127.0.0.1;port=5920',
    undef,
    undef,
    { 'AutoCommit' => 0, },
);
for (1 .. 500) {
    my $whatever = int(1 + rand 100);
    my $rows = $dbh->do(
        'UPDATE test set counter = counter + 1 WHERE whatever = ?',
        undef,
        $whatever
    );
    if ( 1 == $rows ) {
        $dbh->commit();
        next;
    }
    $dbh->do('SAVEPOINT x');
    $rows = $dbh->do(
        'INSERT INTO test (whatever, counter) VALUES (?, 1)',
        undef,
        $whatever
    );
    if ( 1 == $rows ) {
        $dbh->commit();
        next;
    }
    $dbh->do('ROLLBACK TO x');
    $dbh->do(
        'UPDATE test set counter = counter + 1 WHERE whatever = ?',
        undef,
        $whatever
    );
    $dbh->commit();
}

(I know, it's not pretty, but it should be simple to understand).

So, what does it do? First – I changed the loop from infinite to “do 500 operations". I run 10 parallel tasks, which should give me 5000 operations in the database – a bit more than I did previously, but good enough to see how it works.

Then – I do update, and see if it worked. If it did – fine, commit transaction, and do another loop.

If it didn't work – I create a savepoint, imaginatively named “x", and call the insert – this time checking how many rows were inserted.

If the row was inserted – commit the transaction, and continue with the next loop iteration. I don't have to “destroy" savepoints – they “live" within the transaction only, so when you commit the transaction the savepoint is removed too.

If the row was not inserted (because of, in our case, a UNIQUE violation) – issue a rollback to savepoint x (thus clearing the “transaction is aborted" state), and redo the update. Since we now know that the row is there, we don't have to check anything – just do it, commit, and continue with the next iteration of the loop.
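The decision flow of the steps above can be condensed into a toy Python sketch. As before, a dict stands in for the table and an exception for the UNIQUE violation; the before_insert hook is an invented device to inject what “another session" does in the gap between our UPDATE and our INSERT (the real savepoint machinery has no direct analogue here, since a failed dict insert leaves no “aborted transaction" state to roll back).

```python
def upsert_iteration(table, whatever, before_insert=lambda: None):
    """One iteration of the script's logic:
    UPDATE -> (SAVEPOINT; INSERT) -> on duplicate: (ROLLBACK TO; UPDATE again)."""
    if whatever in table:              # the first UPDATE matched a row
        table[whatever] += 1
        return 'updated'
    before_insert()                    # another session may sneak in here
    try:                               # SAVEPOINT x; INSERT ...
        if whatever in table:
            raise ValueError('duplicate key value')
        table[whatever] = 1
        return 'inserted'
    except ValueError:                 # ROLLBACK TO x; redo the UPDATE
        table[whatever] += 1
        return 'updated after conflict'
```

Called twice with the same key it takes the first two paths; with a hook that inserts the key “concurrently", it takes the third path and still ends up incrementing the counter.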

Results?

=$ psql -c 'truncate test'; for i in {1..10}; do ./test.pl & done; time wait; psql -c "select sum(counter) from test"
TRUNCATE TABLE
...
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(46) already exists. at ./test.pl line 21.
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(90) already exists. at ./test.pl line 21.
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(81) already exists. at ./test.pl line 21.
DBD::Pg::db do failed: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (whatever)=(71) already exists. at ./test.pl line 21.
...
real    0m12.216s
user    0m1.228s
sys     0m0.336s
 sum
------
 5000
(1 row)

So, we see that the INSERT errored four times. But the final count on the table shows that every iteration of the loop did in fact increment a counter – which means the savepoint worked.

What about performance? Previously we had ~2100 increments in 5 seconds (~420/s). This time we have 5000 in 12.2 seconds (~410/s). This means that this is only ~3% slower than the version with advisory locks.

But, I have to admit, I cheated a little. We do 5000 upserts, but with only 100 different values of “whatever". This means that in 4896 cases the loop iteration ended right after the first UPDATE.

Whether this is an issue for you, you have to decide yourself – and perhaps redo the tests with a different number of loops, or a different range of “whatever" values.

Of course – you might say that you don't like the fact that the program shows errors. I could silence them in the script, or I could move the logic to a stored procedure. Since the code in the docs uses a function – let's use one.

A PL/pgSQL function that implements the same logic is:

CREATE OR REPLACE FUNCTION upsert_test( in_whatever INT4 ) RETURNS void AS $$
BEGIN
    UPDATE test SET counter = counter + 1 WHERE whatever = in_whatever;
    IF FOUND THEN
        RETURN;
    END IF;
    BEGIN
        INSERT INTO test (whatever, counter) VALUES (in_whatever, 1);
    EXCEPTION WHEN OTHERS THEN
        UPDATE test SET counter = counter + 1 WHERE whatever = in_whatever;
    END;
    RETURN;
END;
$$ LANGUAGE plpgsql;

and this allowed me to simplify the Perl test script to:

#!/usr/bin/env perl
use DBI;
my $dbh = DBI->connect(
    'dbi:Pg:dbname=pgdba;host=127.0.0.1;port=5920',
    undef,
    undef,
    { 'AutoCommit' => 1, },
);
for (1 .. 500) {
    my $whatever = int(1 + rand 100);
    $dbh->do('SELECT upsert_test( ? )', undef, $whatever);
}

Couple of notes:

there is no “SAVEPOINT" command in PL/pgSQL. But the inner BEGIN (line 7) does exactly the same thing.

WHEN OTHERS (function, line 9) is a “catch-all" type of exception handler. Using “WHEN unique_violation" (as in the Pg docs) is cleaner, but it's not relevant at the moment.

You might have noticed that the Perl script is again using AutoCommit – this is because now every transaction is just one command (a select from the function), so we don't need explicit transactions. Though in real-life applications I wouldn't use autocommit.

So, how does it work?

=$ psql -c 'truncate test'; for i in {1..10}; do ./test.pl & done; time wait; psql -c "select sum(counter) from test"
TRUNCATE TABLE
...
real    0m12.094s
user    0m0.932s
sys     0m0.268s
 sum
------
 5000
(1 row)

No errors on screen. No errors in the logs. Performance acceptable. The function is a bit simpler than the one in the docs (no loop, just 3 simple queries).

So we're good? Yeah, sure, you expected it. No, we're not good.

We forgot about one fact. DELETEs happen.

What will happen if we have the following case:

session #1                           session #2
----------                           ----------
begin;
update … – fails, there is no row
                                     insert
insert – fails, the row exists
                                     DELETE
update – fails, there is no row!

Of course, the chances of such a case are very low – the timing would have to be perfect. But it is technically possible, and if it is technically possible, it should be at least mentioned, and at best solved.

This is, of course, another race condition. And this is exactly why the docs version of the upsert function has a loop.

If you'll excuse me, I will skip showing the error actually happening – as that requires either adding artificial slowdowns to the code, or a lot of luck, or a lot of time. But I hope you understand why DELETEs can cause problems, and why a loop is needed to solve them.

Of course – adding the loop, the way it's shown in the docs, introduces a potential infinite loop: code that runs inserts and deletes in perfect synchronization with your function would cause your function to never end.

But such a situation (running in perfect synchronization) is simply improbable, so we should be good, and in practice the loop will redo its work at most a couple of times.
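The shape of that loop can be sketched in the same toy Python style (dict for table; the interference list is an invented hook that injects what “other sessions" do at the gaps between our steps). The point: a concurrent INSERT+DELETE just sends us around the loop once more, and the upsert still lands.

```python
def upsert_with_loop(table, whatever, interference=()):
    """Retry until either the UPDATE or the INSERT succeeds - the shape
    of the upsert loop in the docs. Each call to other_sessions() runs
    the next injected "concurrent" action, if any."""
    steps = iter(interference)
    def other_sessions():
        next(steps, lambda: None)()
    while True:
        other_sessions()               # concurrent sessions may act here...
        if whatever in table:          # UPDATE matched a row - done
            table[whatever] += 1
            return 'updated'
        other_sessions()               # ...and here, between UPDATE and INSERT
        if whatever in table:          # unique violation on INSERT:
            continue                   # go around the loop again
        table[whatever] = 1            # INSERT succeeded - done
        return 'inserted'
```

Driving it through the problematic schedule – another session INSERTs our key right before our INSERT, then DELETEs it before our retried UPDATE – the loop simply falls through to a successful INSERT on the second pass.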

So, all in all – now, I hope, you understand why the code in the documentation is so complex, with exception handling and a loop.

In your own programs you can use other approaches. For example – if you're sure you're running just one writing client at a time, you can use the OUTER JOIN or EXISTS approach that I mentioned in the digression. If you are sure about your app, and that no one will ever touch the database any other way, you can use advisory locks (these would work with DELETEs too, as long as the deletes also took the advisory lock). Or you can use exception handling, and possibly a loop. It all depends on the case you have.

As a final word – yeah, but why doesn't PostgreSQL handle it internally, like the other database?

The answer lies somewhere between “developers have other things to do first" and “it's complicated, because there are many ways to do it, and the most correct one is slow – so there is a trade-off decision to be made".

I don't know which method is used by “MERGE" in the other database, and I would like to know what kind of trade-offs they made – speed over correctness, or the other way around. And to what extent.