When building servers that provide JSON data, either using REST or GraphQL, we often want to query some time, along with its “children”. If we’re not careful we can end up with a large number of inefficient queries. Fortunately, Postgres has some functions that allow returning complex data structures directly as “JSON”. This works great with node.js because they just become native JavaScript objects.

The Problem

Consider the database schema:

A database schema with 3 tables: “users”, “user_teams” & “teams”. Each user has a manager identified by the “manager_id” field. The “user_teams” table links “users” and “managers”.

We can create this database schema, along with some sample data by running:

CREATE TABLE users (

user_id BIGSERIAL NOT NULL PRIMARY KEY,

display_name TEXT NOT NULL,

manager_id BIGINT NULL REFERENCES users(user_id)

); CREATE TABLE teams (

team_id BIGSERIAL NOT NULL PRIMARY KEY,

display_name TEXT NOT NULL

); CREATE TABLE user_teams (

user_id BIGINT NOT NULL REFERENCES users,

team_id BIGINT NOT NULL REFERENCES teams,

PRIMARY KEY(user_id, team_id)

); INSERT INTO users

(user_id, display_name, manager_id)

VALUES

(1, 'Forbes', NULL), (2, 'John', NULL), (3, 'Joe', 1); INSERT INTO teams

(team_id, display_name)

VALUES

(1, 'Awesome Team'), (2, 'Team of One'); INSERT INTO user_teams

(user_id, team_id)

VALUES

(1, 1), (2, 1), (1, 2);

Now imagine we want to run a query like:

Get a list of all users, and for each user, get their teams and their managers

We want output that looks something like:

[

{

"id": 1,

"display_name": "Forbes",

"manager": null,

"teams": [

{

"id": 1,

"display_name": "Awesome Team"

},

{

"id": 2,

"display_name": "Team of One"

}

]

},

{

"id": 2,

"display_name": "John",

"manager": null,

"teams": [

{

"id": 1,

"display_name": "Awesome Team"

}

]

},

{

"id": 3,

"display_name": "Joe",

"manager": {

"id": 1,

"display_name": "Forbes"

},

"teams": []

}

]

The Simplest Solution?

We have a few options. We could just run all the queries recursively using node.js to combine all the data using @databases/pg :

const results = await Promise.all(

(await db.query(

sql`

SELECT u.user_id AS id, u.display_name, u.manager_id

FROM users u

`,

)).map(async ({manager_id, ...user}) => ({

...user,

manager:

(await db.query(

sql`

SELECT u.user_id AS id, u.display_name

FROM users u

WHERE u.user_id=${manager_id}

`,

))[0] || null,

teams: await db.query(

sql`

SELECT t.team_id AS id, t.display_name

FROM user_teams ut

JOIN teams t USING (team_id)

WHERE ut.user_id = ${user.id}

`,

),

})),

);

console.log(results);

This works fine, as long as the data set is small. The only problem is we’re running 2n + 1 queries (where n is the number of users). We also can’t run the inner queries until we’ve fetched all the users. This is putting a lot of unnecessary load on both our node.js server and our Postgres server.

Select ANY

If you’re familiar with JavaScript, and have some familiarity with SQL, you might think to fix this by making just 3 queries:

const users = await db.query(

sql`

SELECT u.user_id AS id, u.display_name, u.manager_id

FROM users u

`,

); const userTeams = await db.query(

sql`

SELECT t.team_id AS id, t.display_name, ut.user_id AS user_id

FROM user_teams ut

JOIN teams t USING (team_id)

WHERE ut.user_id = ANY(${users.map(u => u.id)})

`,

); const managers = await db.query(

sql`

SELECT u.user_id AS id, u.display_name

FROM users u

WHERE u.user_id = ANY(${users.map(u => u.manager_id)})

`,

);

This is much more efficient, but we’re still left with a lot of work to do in the JavaScript land to combine these results into the desired nested structure. We’ve fetched all the data reasonably efficiently, but we’ve still got most of the work to do.

JSON to the Rescue!

Wouldn’t it be great if Postgres could just directly return the nested data in exactly the format we want? The good news is, it can!

First, we need to understand a little bit about the data types

The Record data type, represents a row of SQL data. At the top level, all SQL queries return a list of Record s. Unfortunately, if we try and return nested rows, these come back as awkward strings that are hard to decode into anything useful. The Array data type, represents a list of things, e.g. JSON objects. The JSON data type, represents an object or array.

We’re going to use the following functions to deal with all these types:

row_to_json takes a Record and returns JSON . array_agg is an “aggregation function”. If you use this in a SELECT query, you get one record back containing an Array of values, instead of many records each containing one value. array_to_json takes an Array (e.g. the result of array_agg ) and converts it into JSON . coalesce takes two values, and if the first value is NULL it takes the second value (casting it to the type of the first value).

Handling the manager field

The first is row_to_json . You can use a subquery that returns only one value as a field in SQL. All we need to do here is convert the row to JSON so that our one value can be a complex object:

const results = await db.query(

sql`

SELECT

u.user_id AS id,

u.display_name,

(

SELECT row_to_json(x) FROM

(

SELECT m.user_id AS id, m.display_name

FROM users m WHERE m.user_id = u.manager_id

) x

) AS manager

FROM users u

`,

);

The extra nesting there is so that we can give a clear name x to our record, and pass that into row_to_json .

We can tidy this up with a helper function:

function nestQuerySingle(query) {

return sql`

(SELECT row_to_json(x) FROM (${query}) x)

`;

}

Then our query becomes:

const results = await db.query(

sql`

SELECT

u.user_id AS id,

u.display_name,

${nestQuerySingle(

sql`

SELECT m.user_id AS id, m.display_name

FROM users m WHERE m.user_id = u.manager_id

`

)} AS manager

FROM users u

`,

);

N.B. This would throw an error in Postgres if the manager query ever returned multiple records

Handling the teams

Now we have our Record s converted into JSON , but we need to fix another problem to handle the teams. The issue is that you cannot have multiple values as part of a single field. This is where array_agg comes in, along with array_to_json and coalesce .

const results = await db.query(

sql`

SELECT

u.user_id AS id,

u.display_name,

${nestQuerySingle(

sql`

SELECT m.user_id AS id, m.display_name

FROM users m WHERE m.user_id = u.manager_id

`

)} AS manager,

coalesce(

(

SELECT array_to_json(array_agg(row_to_json(x)))

FROM (

SELECT t.team_id AS id, t.display_name

FROM user_teams ut

JOIN teams t USING (team_id)

WHERE ut.user_id = u.user_id

) x

),

'[]'

) AS teams

FROM users u

`,

);

We use a sub query to get the teams , then we use array_to_json(array_agg(row_to_json(x))) to convert the many records into a single JSON value. Unfortunately, Postgres will return NULL here if there are no records, not [] , so we need to use coalesce to specify a default. The default of '[]' as a string here gets automatically cast to JSON to match the type of the result of array_to_json .

One more helper function:

function nestQuery(query) {

return sql`

coalesce(

(

SELECT array_to_json(array_agg(row_to_json(x)))

FROM (${query}) x

),

'[]'

)

`;

}

and we have the final query:

const results = await db.query(

sql`

SELECT

u.user_id AS id,

u.display_name,

${nestQuerySingle(

sql`

SELECT m.user_id AS id, m.display_name

FROM users m WHERE m.user_id = u.manager_id

`

)} AS manager,

${nestQuery(

sql`

SELECT t.team_id AS id, t.display_name

FROM user_teams ut

JOIN teams t USING (team_id)

WHERE ut.user_id = u.user_id

`

)} AS teams

FROM users u

`,

);

This returns the exact same data as our “simple” solution using queries nested in node.js, except that now the nesting occurs in SQL. This will be much more performant, not to mention easier to maintain.

Conclusion