As I don’t use ORMs anymore, I end up writing a lot of fairly complicated 100+ line SQL queries. I found a few ways to keep them readable.

1. Use lowercase

In most languages newer than COBOL, most of the code is in lowercase, so there is no reason it shouldn’t work for SQL. I’ve been writing it in lowercase for a few years and find it to be quite readable, with or without syntax highlighting.

My queries look like this:

select geometries.id , geometries.type , geometries.geometry , geometries.properties from geometries join projects using (project_id) join customers using (customer_id) where project_id = $1 and GeometryType(geometry) = 'POLYGON' order by geometries.properties->>'name'

Just like in any other code, there will be bits of queries which are non-obvious, and they can be easily be commented on:

where user_id = $1 and project_id = $2 and events.created_at >= to_timestamp($3) and events.created_at < to_timestamp($4) -- Sort by event ID to deal with entry-exit pairs in the same epoch order by area_id, created_at, event_id

Without that comment above order , I’d likely be puzzled by the non-obvious sort by event_id if I returned to this query a few months after writing it.

3. Format carefully

I apply a few rules:

Start each major block of a query on a new line

Use leading commas in lists - this prevents annoying errors due to forgetting a comma, and makes adding items to the list easier

Break up conditions into multiple lines such that logical operators are at the start of the line

Use whitespace liberally to make it clear which parts of a query belong together (e.g. when writing a subquery in a where ).

For example:

select geometries.id , geometries.type , geometries.geometry , geometries.properties from geometries where project_id in (select project_id from projects where customer_id = $1) and GeometryType(geometry) = 'POLYGON' and ( updated_at >= clock_timestamp() - interval '1 month' or deleted_at >= clock_timestamp() - interval '1 month' ) order by geometries.properties->>'name'

I do like to preserve vertical space (if I can see the whole query without scrolling, it’s much easier), so I’m likely to write multiple columns per line in a select list, but still use leading commas when a single line is not enough:

select geometries.id, geometries.type, geometries.geometry, geometries.properties , extract(epoch from created_at at time zone (select timezone from projects where id = $1)) , updated_at, deleted_at

4. Use CTEs

I use CTEs a lot in Postgres as a substitute for a lot of subqueries, and I find that I work towards the final result in a different way, more like I would in code - by treating the query as a series of data transformations. So, for example, if I want to produce a result for a set of users on a range of days, I write a CTE to get the users, then another one to generate the range of dates, then another one to combine them to get all possible “user-days”, then another to retrieve the data for each and process it, and so on:

with roles as ( select id from user_roles where is_active and (properties->>'is_person' or properties->>'does_signon') ) , relevant_users as ( select * from users where role_id in (select id from roles) ) , days as ( select generate_series($1::date, $2::date, '1 day') as day_start ) , user_days as ( select day_start, day + interval '1 day' as day_end, user_id from relevant_users cross join days ) select * from events join user_days on events.user_id = user_days.user_id and created_at >= day_start and created_at < day_end where type in ('entry', 'exit')

This results in very readable queries - way better than having subqueries all over the place. There are performance caveats to using CTEs so sometimes I have to structure the query differently, but it works well for a lot of them, as long as the intermediate result sets are small.