Posted by Moser on 17 Mar 2018

On client project I have to do a search in a structure of tables and return results including some child objects. A classical instance of the N+1 query problem, if I was using an ORM. I decided not to use the ORM for the feature because it will be one of the hottest paths of the application and I wanted more control over the queries (rather complex logic with LIKE & sorting). But the filtering/sorting is not the topic today, so I will leave it out in the examples.

For illustration, let’s assume the following schema:

CREATE TABLE parents ( parent_id INTEGER PRIMARY KEY , name VARCHAR ); CREATE TABLE children ( child_id INTEGER PRIMARY KEY , name VARCHAR , birthdate DATE , parent_id INTEGER , FOREIGN KEY ( parent_id ) REFERENCES parents );

Without any further thinking I would do a query like this and a bit of code that constructs parent objects with their respective children.

SELECT * FROM parents JOIN children USING ( parent_id ) ORDER BY parent_id ;

parent_id | name | child_id | name | birthdate -----------+-------+----------+---------+------------ 1 | Jim | 1 | Tamara | 2017-02-01 1 | Jim | 3 | Tom | 2005-10-01 1 | Jim | 5 | Tonja | 2011-07-17 2 | Jenny | 2 | Tim | 2000-11-02 2 | Jenny | 4 | Theresa | 2017-04-30

Just before I started writing the logic to pull apart the result and put it into it’s object structure, I thought, “It would be nice to let the database put together all the children of one parent into an array.”

I already knew array_agg which aggregates all values into an array. After some reading I discovered json_build_object which takes a sequence of keys and values and creates a JSON object from it.

So my final query looked like this:

SELECT parent_id , p . name , array_agg ( json_build_object ( 'child_id' , c . child_id , 'name' , c . name , 'birthdate' , c . birthdate )) as children FROM parents p JOIN children c USING ( parent_id ) GROUP BY 1 , 2 ;

parent_id | name | children -----------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 | Jim | {"{\"child_id\" : 1, \"name\" : \"Tamara\", \"birthdate\" : \"2017-02-01\"}","{\"child_id\" : 3, \"name\" : \"Tom\", \"birthdate\" : \"2005-10-01\"}","{\"child_id\" : 5, \"name\" : \"Tonja\", \"birthdate\" : \"2011-07-17\"}"} 2 | Jenny | {"{\"child_id\" : 2, \"name\" : \"Tim\", \"birthdate\" : \"2000-11-02\"}","{\"child_id\" : 4, \"name\" : \"Theresa\", \"birthdate\" : \"2017-04-30\"}"}

When the query is executed with sqlalchemy, children in the resulting rows is already correctly typed as a list of dictionaries.

The explain analyze output for both queries (on a trivially small test set) shows that the aggregated version is 50-100% slower (150-200μs vs. 250-350μs), but I guess that will rarely be a real problem because - at least in my case

the execution time of my query is dominated by filtering/sorting the parent rows.

If you would like to play with the example yourself, get the SQL file here.