Could PostgreSQL 9.5 be your next JSON database?

TL;DR: No, but that's not the right question.

Just over a year ago we asked Is PostgreSQL Your Next JSON Database... Now, with PostgreSQL 9.5 out, it's time to check if Betteridge's law still applies. So let's talk about JSONB support in PostgreSQL 9.5.

For context, and for those of you who haven't been following, it's worth knowing the history of JSON in PostgreSQL. If you're all up to speed already, just skip ahead to read about the new features. The JSON story begins with the arrival of JSON in PostgreSQL 9.2..

JSON in 9.2

The original JSON data type that landed in PostgreSQL 9.2 was basically a text column flagged as JSON data for processing through a parser. In 9.2 though, you could turn rows and arrays in json and for everything else you have to dive into one of the PL languages. Useful in some cases but ... more, lots more was needed. To illustrate, if we had JSON data like this:

{ "title": "The Shawshank Redemption", "num_votes": 1566874, "rating": 9.3, "year": "1994", "type": "feature", "can_rate": true, "tconst": "tt0111161", "image": { "url": "http://ia.media-imdb.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_.jpg", "width": 933, "height": 1388 } }

We could create a table like so:

CREATE TABLE filmsjson ( id BIGSERIAL PRIMARY KEY, data JSON );`

And insert data into it like so:

compose=> INSERT INTO filmsjson (data) VALUES ('{ "title": "The Shawshank Redemption", "num_votes": 1566874, "rating": 9.3, "year": "1994", "type": "feature", "can_rate": true, "tconst": "tt0111161", "image": { "url": "http://ia.media-imdb.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZ TgwMDU2MjEyMDE@._V1_.jpg", "width": 933, "height": 1388 } }') INSERT 0 1 compose=> select * FROM filmsjson; id | data ----+----------------------------------------------------------------------------------------------------------- 1 | { + | "title": "The Shawshank Redemption", + | "num_votes": 1566874, + | "rating": 9.3, + | "year": "1994", + | "type": "feature", + | "can_rate": true, + | "tconst": "tt0111161", + | "image": { + | "url": "http://ia.media-imdb.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_.jpg",+ | "width": 933, + | "height": 1388 + | } + | } (1 row);

And apart from storing and retrieving the entire document, there was little we could do with it. Notice that all the spaces and carriage returns have been preserved. That'll be important later...

Fast forward to PostgeSQL 9.3.

On the back of a new parser for JSON in PostgreSQL 9.3, operators appear to extract values from the JSON data type. Chief among them is -> which can, given an integer, extract a value from a JSON array or, given a string, member of an JSON data type and ->> which does the same but returns text. Building on this is #> and #>> which allow a path to be specified to the value to be extracted.

With our previous example table, that meant we could now at least peer into the JSON and do a query like:

compose=> select data->'title' from filmsjson; ?column? ---------------------------- "The Shawshank Redemption" (1 row) compose=> select data#>'{image,width}' from filmsjson; ?column? ---------- 933 (1 row)

Yes, the path is a list of keys working down through the JSON document. Don't be caught out thinking the curly braces represent JSON though - this is a text array as a literal string which PostgreSQL interprets into a text[]. That means that query is equivelant to this:

select data#>ARRAY['image','width'] from filmsjsonb;

These were joined by a good set of functions but this was all still pretty limited. It didn't really allow for complex queries, there was limited indexing on particular fields and only a few ways to create new JSON elements. But most importantly all that on the fly parsing of a text field wasn't efficient.

Cut to PostgreSQL 9.4.

PostgreSQL 9.4 is where JSONB arrived. JSONB is a binary encoded version of JSON which efficiently stores the keys and values of a JSON document. This means all the space padding is gone and with it all the need to parse the JSON. The down side is that you can't have repeated keys at the same level and you generally lose all the formatted structure of the document. It's a sacrifice thats well worth making because everything gets generally more efficient because there's no on the fly parsing. It does slow inserts down because it's there that the parsing actually gets done. To see the difference, let's create a JSONB table and insert our example data into it:

compose=> CREATE TABLE filmsjsonb ( id BIGSERIAL PRIMARY KEY, data JSONB ); CREATE TABLE compose=> INSERT INTO filmsjsonb (data) VALUES ('{ "title": "The Shawshank Redemption", "num_votes": 1566874, "rating": 9.3, "year": "1994", "type": "feature", "can_rate": true, "tconst": "tt0111161", "image": { "url": "http://ia.media-imdb.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZ TgwMDU2MjEyMDE@._V1_.jpg", "width": 933, "height": 1388 } }'); INSERT 0 1 compose=> select * from filmsjsonb id | data ----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 | {"type": "feature", "year": "1994", "image": {"url": "http://ia.media-imdb.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_.jpg", "width": 933, "height": 1388}, "title": "The Shawshank Redemption", "rating": 9.3, "tconst": "tt0111161", "can_rate": true, "num_votes": 1566874} (1 row)

Yes, that is rather wide. All the spaces and returns from the JSON data have gone leaving one compact key/value list.

Although they share many features, here's a fun fact: JSONB has no creation functions. In 9.4, the JSON data type got a bundle of extra creation functions: json_build_object() , json_build_array() and json_object() . Use those, or other creation functions, and cast to JSONB ( ::jsonb ) to get the JSONB version. It reflects the logic the PostgreSQL developers have applied - JSON for document fidelity and storage, JSONB for fast, efficient operations. So while JSON and JSONB both have the -> , ->> , #> and #>> operators, only JSONB has the "contains" and "exists" operators @> , <@ , ? , ?| and ?& .

Exists is a check for strings that match top-level keys in the JSONB data so we can check there's a rating field in our example data like so:

compose=> select data->'title' from filmsjsonb where data ? 'rating'; ?column? ---------------------------- "The Shawshank Redemption" (1 row)

But if we queried for the url key that's inside the image value, we'd fail

compose=> select data->'title' from filmsjsonb where data ? 'url'; ?column? ---------- (0 rows)

But we could test the image value, like so:

compose=> select data->'title' from filmsjsonb where data->'image' ? 'url'; ?column? ---------------------------- "The Shawshank Redemption" (1 row)

The ?| operator does the same thing but "or" matches the keys against an array of strings rather than just one string. The ?& operator does a similar thing but "and" matches so all the strings in the array must be matched.

But exists operators just check for presence. With the '@>' contains operator you can match keys, paths and values. But let's quickly import some more movies into the database first. Ok, now say we want all the movies from 1972, we can look for the records that contain "year":"1972".

compose=> select data->'title' from filmsjsonb where data @> '{"year":"1972"}'; ?column? ----------------- "The Godfather" "Solaris" (2 rows)

And we can look for particular values within objects:

compose=> select data->'title' from filmsjsonb where data @> '{ "image":{"width":500}}'; ?column? -------------------------------------- "The Green Mile" "My Neighbor Totoro" "Nausicaä of the Valley of the Wind" (3 rows)

9.4 also brought creating GIN indexes which cover all the fields in the JSONB documents for all JSON operations. It's also possible to create GIN indexes with json_path_ops set which gives smaller, faster indexes but only for use of the @> contains operator which is actually remarkably useful as many JSON operations on nested documents are about finding documents which contain particular values. That said, there's still plenty of scope for more comprehensive and capable indexing.

So, 9.4 brought PostgreSQL up to the point where you could create, extract and index JSON/JSONB. What was missing though was the ability to modify the JSON/JSONB data types. You still had to look at passing the JSON data to a PLv8 or PLPerl script where it could be natively manipulated. So, things were close to being a full service JSON document handling environment, but not quite.

Enter PostgreSQL 9.5

PostgreSQL 9.5's new JSON capabilities are all about modifying and manipulating JSONB data. Apart from one, that is. The jsonb_pretty() function takes JSONB and makes it more readable so you go from:

compose=> select data from filmsjsonb where id=1; data ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- {"type": "feature", "year": "1994", "image": {"url": "http://ia.media-imdb.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_.jpg", "width": 933, "height": 1388}, "title": "The Shawshank Redemption", "rating": 9.3, "tconst": "tt0111161", "can_rate": true, "num_votes": 1566874} (1 row)

To a much more digestable form...

compose=> select jsonb_pretty(data) from filmsjsonb where id=1; jsonb_pretty --------------------------------------------------------------------------------------------------------------- { + "type": "feature", + "year": "1994", + "image": { + "url": "http://ia.media-imdb.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_.jpg",+ "width": 933, + "height": 1388 + }, + "title": "The Shawshank Redemption", + "rating": 9.3, + "tconst": "tt0111161", + "can_rate": true, + "num_votes": 1566874 + } (1 row)

Which is much more readable and going to pop up in any JSON related PostgreSQL 9.5 examples. On to the operators....

Let us delete

The simplest modifier is deletion. Just say what you want gone and make it go away. For that, 9.5 introduces the - and #- operators. The - operator works like the -> operator except instead of returning a value from an array (if given an integer as a parameter) or object (if given a string), it deletes the value or key/value pair. So, with our movie database, if we want to remove the rating field then this does the trick:

compose=> update filmsjsonb set data=data-'rating'; UPDATE 250

The #- operator goes further, taking a path as a parameter. So say we wanted to remove the image's dimension properties:

compose=> update filmsjsonb set data=data#-'{image,width}'; UPDATE 250 compose=> update filmsjsonb set data=data#-'{image,height}'; UPDATE 250 compose=> select jsonb_pretty(data) from filmsjsonb where id=1; jsonb_pretty -------------------------------------------------------------------------------------------------------------- { + "type": "feature", + "year": "1994", + "image": { + "url": "http://ia.media-imdb.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_.jpg"+ }, + "title": "The Shawshank Redemption", + "tconst": "tt0111161", + "can_rate": true, + "num_votes": 1566874 + } (1 row)

We do two updates because the path specifier doesn't allow for optional keys but we can get it down to one update by remembering that the set expression can be as complex as we need it.

compose=> update filmsjsonb set data#-'{image,height}'#-'{image,width}'; UPDATE 250

Although you can delete data from the database, remember that you can also just remove it from your output too:

compose=> select jsonb_pretty(data#-'{image,height}'#-'{image,width}') from filmsjsonb where id=1; jsonb_pretty -------------------------------------------------------------------------------------------------------------- { + "type": "feature", + "year": "1994", + "image": { + "url": "http://ia.media-imdb.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_.jpg"+ }, + ....

Concatenation

The operator for manipulations is the concatenation operator || . This tries to combine two JSONB objects into one. It works with the top level keys of both values only and when the same key is present on both sides, it resolves it by taking the right-hand operand's value. This means you can use it as an update mechanism too. Say, using out example data, we need to set the can_rate field to false, clear the num_votes field and add a new revote field set to true...

compose=> update filmsjsonb set data=data || '{"can_rate":false,"num_votes":0," revote":true }'; UPDATE 250 compose=> select jsonb_pretty(data) from filmsjsonb where id=1; jsonb_pretty --------------------------------------------------------------------------------------------------------------- { + "type": "feature", + "year": "1994", + "image": { + "url": "http://ia.media-imdb.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_.jpg",+ "width": 933, + "height": 1388 + }, + "title": "The Shawshank Redemption", + "rating": 9.3, + "revote": true, + "tconst": "tt0111161", + "can_rate": false, + "num_votes": 0 + } (1 row)

This is a generally useful way to merge JSONB data types, for example in post processing. As an update method it leaves something to be desired. Updating a single top level-field, it's a bit overkill. Updating a nested single field in a document, then you have to dig your way down to the containing object and merge from there. If only there was a simple way to set a particular field...

jsonb_set for success

The jsonb_set() function is designed for updating single fields wherever they are in the JSON document. Let's jump straight to an example:

compose=> update filmsjsonb SET data = jsonb_set(data,'{"image","width"}',to_js onb(1024)) where id=1;

This will change the value of the image.width property to 1024. The arguments for jsonb_set() are simple; the first argument is a JSONB data type you want to modify, the second is a text array path and the third is a JSONB value to replace the value at the end of that path. If the key/value pair at the end of the path doesn't exist, by default, jsonb_set() creates and sets it. To stop that behavior, add a fourth optional parameter ("create_missing") and set it to false. If "create_missing" is true but other components of the path don't exist then jsonb_set() won't try to create the entire path and will just fail. Say we wanted to add a new object to our image data about picture rights, we can simply add in the JSON data for that new object:

compose=> update filmsjsonb SET data = jsonb_set(data,'{"image","quality"}','{" copyright":"company X","registered":true}') where id=2; compose=> select jsonb_pretty(data) from filmsjsonb where id=2; jsonb_pretty --------------------------------------------------------------------------------------------------------------- { + "type": "feature", + "year": "1972", + "image": { + "url": "http://ia.media-imdb.com/images/M/MV5BMjEyMjcyNDI4MF5BMl5BanBnXkFtZTcwMDA5Mzg3OA@@._V1_.jpg",+ "width": 1024, + "height": 500, + "quality": { + "copyright": "company X", + "registered": true + } + }, + "title": "The Godfather", + "rating": 9.2, + "tconst": "tt0068646", + "can_rate": true, + "num_votes": 1072605 + } (1 row)

jsonb_set() is probably the most important addition in PostgreSQL 9.5's JSON functions. It offers the chance to change data in-place within JSONB data types. Do remember that where we've used simple values to set parameters is only for examples; you could have PostgreSQL subqueries creating new values and co-ercing them into JSONB subdocuments or arrays to create richer JSON documents.

Consider this

What this all leads to is an interesting position for PostgreSQL. PostgreSQL 9.5's JSON enhancements mean that you could use PostgreSQL as a JSON database; it's fast and functional. Whether you'd want to is a different consideration.

For example, the relatively accessible APIs or client libraries of many JSON databases are not there. In their place is a PostgreSQL specific dialect of SQL for manipulating JSON which is used in tandem with the rest of the database's SQL to exploit the full power of it. This means you still have to learn SQL, a requirement which, unfortunately, too many people use as their reason for using a "NoSQL" database.

You can use PostgreSQL to create rich, complex JSON/JSONB documents within the database. But then if you are doing that, you may want to consider whether you are using PostgreSQL well. If the richness and complexity of those documents comes from relating the documents to each other then the relational model is often the better choice for data models that have intertwined data. The relational model also has the advantage that it handles that requirement without large scale duplication within the actual data. It also has literally decades of engineering expertise backing up design decisions and optimizations.

What JSON support in PostgreSQL is about is removing the barriers to processing JSON data within an SQL based relational environment. The new 9.5 features take down another barrier, adding just enough accessible, built-in and efficient functions and operators to manipulate JSONB documents.

PostgreSQL 9.5 isn't your next JSON database, but it is a great relational database with a fully fledged JSON story. The JSON enhancements arrive alongside numerous other improvements in the relational side of the database, "upsert", skip locking and better table sampling to name a few.

It may not be your next JSON database, but PostgreSQL could well be the next database you use to work with relational and JSON data side by side.