Introducing the JSONB Query API of Objection.js

As you all know PostgreSQL have had a good support for JSON datatype already since 9.4. I love it.

It is really powerful tool e.g. to simplify DB when one has to store extra data with a row which isn't really relational, but whose structure might be quite complex.

Schemaless development is one popular reason why people use MongoDB (e.g. for fast prototyping). Now one can do that with PostgreSQL with transaction support. Not saying that "MongoDB is dead" though 😉

While ago we at Vincit open sourced our Node.js ORM objection.js we've been developing for a few years in our internal use.

In addition to the basic ORM stuff (composite keys etc.) it has some neat document features and it supports reading / writing JSON columns with proper serialization. If you are into schemas you can even declare application side schema for a JSON column contents with it.

The thing I was missing was an API to be able to write queries against the data stored in JSON columns without need to write raw SQL.

So here it is! The first release of objection.js JSON query API. API was designed to just wrap query operators provided by PostgreSQL 9.4 and give nice access to them through JS APIs.

In this post I'll walk through

Referring the JSON attributes inside a column

Usage of JSONB query API and pitfalls

How to create and choose indexes for JSON data

Referring JSON Fields Inside the Column

To start creating queries one has to be able to refer individual JSON fields inside the column. Beforehands objection did support only syntax for referring table and column 'TableName.columnName' syntax, which had to be extended to support fields inside a column.

PostgreSQL Syntax

There are two different ways in PostgreSQL to extract / refer separate values inside a JSON value

"columnName"#>'{phoneNumbers,0,description}' "columnName"->'phoneNumbers'->0->'description'

Only difference I've noticed between these two has been that with -> notation the operand is type sensitive. One has to use integers to refer an array and strings to refer object keys.

-- operator -> integers for arrays, strings for keys select ('{"arr": ["value"], "obj": {"0":"value"}}'::jsonb) ->'obj'->0; -- returns null select ('{"arr": ["value"], "obj": {"0":"value"}}'::jsonb) ->'arr'->0; -- returns "value" select ('{"arr": ["value"], "obj": {"0":"value"}}'::jsonb) ->'obj'->'0'; -- returns "value" select ('{"arr": ["value"], "obj": {"0":"value"}}'::jsonb) ->'arr'->'0'; -- returns null -- operator #> doesn't care of type select ('{"arr": ["value"], "obj": {"0":"value"}}'::jsonb) #>'{obj,0}'; -- returns "value" select ('{"arr": ["value"], "obj": {"0":"value"}}'::jsonb) #>'{arr,0}'; -- returns "value" select ('{"arr": ["value"], "obj": {"0":"value"}}'::jsonb) #>'{obj,"0"}'; -- returns "value" select ('{"arr": ["value"], "obj": {"0":"value"}}'::jsonb) #>'{arr,"0"}'; -- returns "value"

Objection.js Syntax

To refer fields inside JSONB column with objection we extended the table / column reference syntax to have additional JSON reference part.

Earlier table/column reference syntax was <tableName>.<columnName> . Now updated syntax is <tableName>.<columnName>:<jsonFieldReference>

an example JsonFieldReference

'UserTable.jsonProfileData:details.phoneNumbers[1]'

Since #>'{}' notation does not require caller to know internal types of JSON structure we decided to use that everywhere when converting our JSON field reference syntax to SQL .

It doesn't matter if you write access with an array notation 'col1:[1].nested' or you use dot to access nested fields 'col1:1.nested' . The code generated from both will be equivalent.

db.Place.query() .whereJsonIsObject('JustTable.imaColumn:same.sqlstr.generated') .dumpSql(); // select * from "Place" // where ( "JustTable"."imaColumn"#>'{same,sqlstr,generated}' )::jsonb // @> '{}'::jsonb db.Place.query() .whereJsonIsObject('JustTable.imaColumn:[same][sqlstr][generated]') .dumpSql(); // select * from "Place" // where ( "JustTable"."imaColumn"#>'{same,sqlstr,generated}' )::jsonb // @> '{}'::jsonb

Syntax is documented in objection API docs.

Query API

JSON API is an extension to QueryBuilder providing few new functions for filtering data according to values, which are contained in PostgreSQL JSONB columns.

All the code related to this post is found from objection.js examples/jsonb folder.

Let's start by putting some data in our tables:

var castleGreyskull = { name: 'Castle Grayskull', details: { color: 'Gray', looksLikeSkull: true }, heroes : [ { name: 'He-Man', details: { title: 'Master of the Universe', type: 'Hero', quote: 'by the power of Grayskull!', luckyNumbers: [ 1 ] } }, { name: 'Teela', details: { type: 'Hero', magic: true, 0 : "zero", luckyNumbers: [ 5, 7 ] } }, { name: 'Orko', details: { type: 'Hero', title: 'Jester', magic: true, funniness: true, race : 'Trollan', luckyNumbers: [ 0 ] } } ] }; var snakeMountain = { name: 'Snake Mountain', details: { color: 'Purple', looksLikeSnake: true }, heroes : [ { name: 'Skeletor', details: { title: 'Main Villain of the Evildoers', type: 'Villain', luckyNumbers: [ 4, 5 ,9 ] } }, { name: 'Evil-Lyn', details: [ {}, "race", [], 0 ] }, { name: 'Faker', details: null } ] }; // The easy way of inserting data with relations db.Place.query().insertWithRelated([castleGreyskull, snakeMountain]);

Objection's .insertWithRelated(...) lets us add data + relations with quite powerful syntax. After insert we have following data in DB:

objection-jsonb-example=# select * from "Place"; id | name | details ----+------------------+---------------------------- 1 | Castle Grayskull | { + | | "color": "Gray", + | | "looksLikeSkull": true+ | | } 2 | Snake Mountain | { + | | "color": "Purple", + | | "looksLikeSnake": true+ | | } (2 rows) objection-jsonb-example=# select * from "Hero"; id | name | details | homeId ----+----------+------------------------------------------------+-------- 1 | He-Man | { +| 1 | | "type": "Hero", +| | | "quote": "by the power of Grayskull!", +| | | "title": "Master of the Universe", +| | | "luckyNumbers": [ +| | | 1 +| | | ] +| | | } | 2 | Teela | { +| 1 | | "0": "zero", +| | | "type": "Hero", +| | | "magic": true, +| | | "luckyNumbers": [ +| | | 5, +| | | 7 +| | | ] +| | | } | 3 | Orko | { +| 1 | | "race": "Trollan", +| | | "type": "Hero", +| | | "magic": true, +| | | "title": "Jester", +| | | "funniness": true, +| | | "luckyNumbers": [ +| | | 0 +| | | ] +| | | } | 4 | Skeletor | { +| 2 | | "type": "Villain", +| | | "title": "Main Villain of the Evildoers", +| | | "luckyNumbers": [ +| | | 4, +| | | 5, +| | | 9 +| | | ] +| | | } | 5 | Evil-Lyn | [{}, "race", [], 0] | 2 6 | Faker | | 2 (6 rows)

Now we got the data! Lets move to the queries.

Where JSON Contain Keys or Values

There are 3 operators for JSONB columns to query if JSON has key with certain name ? , ?| and ?& . Key can be key of an object or it can be a string item in a JSON array.

Actually to support these operators we had to add support to knex to allow escaping ? so that it won't be considered as a SQL parameter binding by the underlying query builder library knex .

.whereJsonHasAny(<JsonFieldReference>, <string[]>)

db.Hero.query().select('name') .whereJsonHasAny('details', ['title', 'race']) // returns [ {"name":"He-Man"}, // had "title" as an object key {"name":"Orko"}, // had "title" and "race" in object keys {"name":"Skeletor"}, // had "title" as an object key {"name":"Evil-Lyn"} // had "race" as an array string item ] // generated sql: // select "name" from "Hero" // where "details"#>'{}' ?| array['title','race']

Teela and Faker were not returned since they didn't have any of the queried keys.

.whereJsonHasAll(<JsonFieldReference>, <string[]>)

db.Hero.query().select('name') .whereJsonHasAll('details', ['title', 'race']) // returns [ {"name":"Orko"} // had both "title" and "race" as object keys ] // generated sql: // select "name" from "Hero" // where "details"#>'{}' ?& array['title','race']

Only String Keys Allowed!

PostgreSQL supports only string type of keys when querying if value exist with ? operators.

db.Hero.query().select('name') .whereJsonHasAny('details', ['0']) // returns [ {"name":"Teela"} // had '0' string as an object key ] // generated sql: // select "name" from "Hero" // where "details"#>'{}' ?| array['0']

Evil-Lyn is not returned, because she has item 0 as a number, not as a string.

If one tries to make the same query, but trying to match numbers instead of strings there will be an error:

select "name" from "Hero" where "details"#>'{}' ?| array[0]; ERROR: operator does not exist: jsonb ?| integer[]

Where JSON is Equal

JSON equality operation is for checking if an object or an array is deeply equal.

The data in JSONB type is normalized so order of the keys in an object does not matter. It is all the same if one does query by {a:1, b:2} or {b:2, a:1} .

.whereJsonEquals(<JsonFieldReference>, <Object|Array>)

db.Place.query().select('name') .whereJsonEquals('details', { looksLikeSnake: true, color: 'Purple' }) // returns a purple place that looks like a snake... pretty obvious [{"name":"Snake Mountain"}] // generated sql: // select "name" from "Place" // where // ( "details"#>'{}' )::jsonb = // '{\"looksLikeSnake\":true,\"color\":\"Purple\"}'::jsonb

.whereJsonEquals(<JsonFieldReference>, <JsonFieldReference>)

If a string is given as a second parameter it is considered as a reference to another column / JSON field in the DB.

// Both sides json field references db.Hero.query().select('name') .whereJsonEquals('details:funniness', 'details:magic') // returns [{"name":"Orko"}] // generated sql: // select "name" from "Hero" // where // ( "details"#>'{funniness}' )::jsonb = // ( "details"#>'{magic}' )::jsonb

Caveats When Queried Key Doesn't Exist

If one tries to match a key that does not exist in the JSON PostgreSQL will ignore that row completely from results.

TLDR; It's complicated... when doing e.g. not queries make sure that every row that you want to be included in your results have the key that you refer in your query.

This is general rule when doing queries to JSONB columns which feels natural in most of the cases, but e.g. with negative queries it feels unintuitive since the set of rows returned by not query does not return all the other rows which did not match the positive query.

In other words not query returns all the rows that contain the referred field and whose contents do not match the compared value.

The rows which are returned by the following query .whereJsonNotEquals('col:a', {innerField:1}) are highlighted.

col | ----------------------+- {a: {innerField: 1}} | {a: {innerField: 2}} | {a: {innerField: 3}} | {a: {}} | {a: null} | {a: true} | {b: {innerField: 1}} | {b: {innerField: 2}} | {b: {innerField: 3}} | {b: {innerField: 4}} |

Yet another example with our dataset from the example project:

// Not variant returns those which not exist, but not those who don't even has the key db.Hero.query().select('name') .whereJsonNotEquals('details:luckyNumbers', [ 0 ]) // returns [{"name":"He-Man"},{"name":"Teela"},{"name":"Skeletor"}] // generated sql: // select "name" from "Hero" // where ( "details"#>'{luckyNumbers}' )::jsonb != '[0]'::jsonb

Evil-Lyn and Faker who don't have the luckyNumbers field are not returned. Only those rows which have the field and where field value is not [0] were fetched.

Also one may write a query where both sides of the clause are non-existing. Because fields are not being found all the rows are dropped from the result set already before the comparison and nothing will be returned.

// Nothing found even when both sides of the comparison point to non existing field db.Place.query().select('name') .whereJsonEquals('details:nonExisting1', 'details:nonExisting2') // returns: [] // The same result for not equals db.Place.query().select('name') .whereJsonNotEquals('details:nonExisting1', 'details:nonExisting1') // returns: []

One can make these cases to behave differently by adding separate handling how to handle the rows which do not contain referred attribute.

// return rows where both nonExisting1 and nonExisting2 are the same also those where both are null or undefined db.Place.query().select('name') .whereIn('id', function (builder) { builder.select('id').from('Place') .whereJsonField('details:nonExisting1','IS', null) .whereJsonField('details:nonExisting2','IS', null); }) .orWhereJsonEquals('details:nonExisting1', 'details:nonExisting2');

Yet better is to just avoid this kind of problems by making sure that the keys you are querying are always really found.

Where JSON is Subset or Superset

PostgreSQL has @> and <@ operators for querying if some JSON value is subset of another.

.whereJsonSupersetOf(<JsonFieldReference>, <Object|Array|JsonFieldReference>)

db.Hero.query().select('name') .whereJsonSupersetOf('details', { type: 'Hero' }) // returns [{"name":"He-Man"},{"name":"Teela"},{"name":"Orko"}] // generated sql: // select "name" from "Hero" // where ( "details"#>'{}' )::jsonb @> '{\"type\":\"Hero\"}'::jsonb

.whereJsonSubsetOf(<JsonFieldReference>, <Object|Array|JsonFieldReference>)

db.Hero.query().select('name') .whereJsonSubsetOf('details:luckyNumbers', [9,5,4]) // returns [{"name":"Skeletor"}] // generated sql: // select "name" from "Hero" // where ( "details"#>'{luckyNumbers}' )::jsonb <@ '[9,5,4]'::jsonb

Like with the .whereJsonEquals() both sides of the subset / superset comparison can be a reference to JSON field.

Where JSON is an Array or an Object

PostgreSQL does not have separate operator for querying by the type of a JSON field, but we can exploit the sub-/superset operators to find out if a field is an object or an array.

.whereJsonIsObject(<JsonFieldReference>)

db.Hero.query().select('name') .whereJsonIsObject('details') // returns [{"name":"He-Man"},{"name":"Teela"},{"name":"Orko"},{"name":"Skeletor"}] // generated sql: // select "name" from "Hero" // where ( "details"#>'{}' )::jsonb @> '{}'::jsonb

.whereJsonIsArray(<JsonFieldReference>)

db.Hero.query().select('name') .whereJsonIsArray('details') // returns [{"name":"Evil-Lyn"}] // generated sql: // select "name" from "Hero" // where ( "details"#>'{}' )::jsonb @> '[]'::jsonb

Since there is just one parameter which is always a JsonFieldReference it is trivial to implement correctly working not variants which will work also in the case where the field reference doesn't exist.

Meaning that also the rows which don’t even have the mentioned key will be returned unlike the case with other methods like described in section .whereJsonNotEquals() .

.whereJsonIsArray(<JsonFieldReference>)

db.Hero.query().select('name').whereJsonNotObject('details') // returns [{"name":"Evil-Lyn"},{"name":"Faker"}] // generated sql: // select "name" from "Hero" // where (not ( "details"#>'{}' )::jsonb @> '{}'::jsonb // or ("details"#>>'{}')::TEXT is NULL)

See the extra check which matches also the rows where referred field doesn't exist.

Querying With String / Integer Operators >, <, ILIKE etc...

If one likes to use other operators which are not available for the JSONB type we need to cast the field reference to correct type which has the desired operator.

For that objection.js has currently somewhat limited support (more complex stuff one has to with Raw queries) for the most common cases with .whereJsonField(...) method.

It supports casting the field reference to a TEXT , NUMERIC or BOOLEAN type automatically depending on which kind of value is used for comparison in query.

.whereJsonField(<JsonFieldReference>, <operator>, <string|number|boolean>)

// '%master%' is string so SQL type will be TEXT db.Hero.query().select('name') .whereJsonField('details:title', 'ilike', '%master%') // returns [{"name":"He-Man"}] // generated sql: // select "name" from "Hero" // where ("details"#>>'{title}')::TEXT ilike '%master%'

// 7 is a number so SQL type will be NUMERIC db.Hero.query().select('name') .whereJsonField('details:luckyNumbers[1]', '=', 7) // returns [{"name":"Teela"}] // generated sql: // select "name" from "Hero" // where ("details"#>>'{luckyNumbers,1}')::NUMERIC = '7'

In this example when casting a field which does not exist in JSON column the casted value will be NULL .

// null is null so we just cast to TEXT here anyways db.Hero.query().select('name') .whereJsonField('details:luckyNumbers[1]', 'IS', null) // returns [{"name":"He-Man"},{"name":"Orko"},{"name":"Evil-Lyn"},{"name":"Faker"}] // generated sql: // select "name" from "Hero" // where ("details"#>>'{luckyNumbers,1}')::TEXT is NULL

Creating Index for JSON Columns / Fields

There are many good articles about different index choices specific to PostgreSQL JSONB columns. Couple of my favorites are JSONB type performance in PostgreSQL 9.4 by Marco Nenciarini and Indexing JSON data with jsonb data type by Michael Paquier.

Here I'm just going to introduce briefly the two most common index types and how to create the index for JSONB column with the knex migrations (the migration framework used by objection.js).

General Inverted Indexes a.k.a. GIN

This is the index type which makes all JSONB set operations fast. All isSuperset / isSubset / hasKeys / hasValues etc. queries can use this index to speed 'em up. Usually this is the index you want and it may take around 30% extra space on the DB server.

.raw('CREATE INDEX on ?? USING GIN (??)', ['Hero', 'details'])

If one likes to use only the subset/superset operators with faster and smaller index one can give an extra parameter when creating the index: "The path_ops index supports only the search path operator "@>" (see below), but produces a smaller and faster index for these kinds of searches.". According to Marco Nenciarini's post the speed up can be over 600% where the size of index is reduced from ~30% -> ~20%.

.raw('CREATE INDEX on ?? USING GIN (?? jsonb_path_ops)', ['Place', 'details'])

Index on Expression

Another type of index one may use for JSONB field is to create an expression index e.g. for a certain JSON field inside a column.

You might want to use these if you are using lots of `.whereJsonField()` queries, which cannot be sped up with GIN index. Use of these indexes are more limited, but they are also somewhat faster than using GIN and querying e.g. { field: value } with subset operator. They also takes a lot less space. So if you want to make certain query to go extra fast you may consider using .whereJsonField() + index on expression.

.raw("CREATE INDEX on ?? ((??#>>'{type}'))", ['Hero', 'details'])

Complete Migration Example and Created Tables / Indexes

exports.up = function (knex) { return knex.schema .createTable('Hero', function (table) { table.increments('id').primary(); table.string('name'); table.jsonb('details'); table.integer('homeId').unsigned() .references('id').inTable('Place'); }) .raw( 'CREATE INDEX on ?? USING GIN (??)', ['Hero', 'details'] ) .raw( "CREATE INDEX on ?? ((??#>>'{type}'))", ['Hero', 'details'] ) .createTable('Place', function (table) { table.increments('id').primary(); table.string('name'); table.jsonb('details'); }) .raw( 'CREATE INDEX on ?? USING GIN (?? jsonb_path_ops)', ['Place', 'details'] ); };

objection-jsonb-example=# \d "Hero" Table "public.Hero" Column | Type ---------+------------------------ id | integer name | character varying(255) details | jsonb homeId | integer Indexes: "Hero_pkey" PRIMARY KEY, btree (id) "Hero_details_idx" gin (details) "Hero_expr_idx" btree ((details #>> '{type}'::text[]))

objection-jsonb-example=# \d "Place" Table "public.Place" Column | Type ---------+------------------------ id | integer name | character varying(255) details | jsonb Indexes: "Place_pkey" PRIMARY KEY, btree (id) "Place_details_idx" gin (details jsonb_path_ops)

Expression index is used for example for

explain select * from "Hero" where details#>>'{type}' = 'Hero'; QUERY PLAN ---------------------------------------------------------------- Index Scan using "Hero_expr_idx" on "Hero" Index Cond: ((details #>> '{type}'::text[]) = 'Hero'::text)

Last Words

SQL + JSON is here and now there is also a proper Node.js ORM which supports it. If you feel like going schemaless with your SQL DB nothing can stop you now! If you like to put some additional document based data to your rows and query it, just do it!

If you are new to objection.js an easy way to get going is to git clone the repo, run an example project and start experimenting. There are examples for ES5,ES6 and ES7, which ever makes you comfortable.

Next steps with objection's JSON support is to figure out a convenient way to allow JSON field references in all of the QueryBuilder API methods.

Especially I would like to allow selecting just parts from JSON column without need to write raw SQL e.g. .select('id', 'details:luckyNumbers[1]') and to come up with nice syntax to support patching values atomically to JSON columns (with jsonb_set() and other JSON modify functionality which came with PostgreSQL 9.5).

If you want to keep up with the latest news what is going on with objection.js you can follow me in Twitter.

Follow @MikaelLepisto

.