Share Tweet Share





First off, before I get started on some more guidelines for writing maintainable queries, Regina and Leo already wrote about Part I. My response to their example is, there is a clear lack of comprehensibility there … but it’s due to badly named tables, not a need for aliases, let alone single-letter aliases. And if you’re stuck with badly named tables, then by all means use aliases … but use meaningful aliases. Regina, Leo, if you don’t feel you made your point, how about a more realistic example?

Use Explicit Joins

Due to parser/planner limitations from 10 years ago (or 5 years ago in the case of MySQL), developers have gotten used to doing their joins in the WHERE clause, SQL89-style. Today, this is like programmers who limit all of their function names to 8 characters because darn it, that’s all FORTRAN ever needed! WHERE-clause joins are a bad habit, both because they mix up the joins and the filter conditions. They also make it easy to leave out a join condition entirely, resulting in a cartesian product which might not be discovered until tables are fully populated in production, at which time it brings the whole server down. Explicit joins (e.g. persons JOIN posts ON persons.id = posts.author ) are far more readable, and make it easy for you and other maintainer to visualize the full chain of relations produced by the query. This, in turn, makes it easy (or at least less difficult) to see where problems might be happening, especially extra result rows or an empty result set. PostgreSQL and other advanced databases are capable of reordering your explicit joins just as easily as WHERE clause joins. So there’s no performance difference, either.

Use that Whitespace!

One of the other issues I encounter constantly, both in reading clients’ SQL and on the PostgreSQL IRC channel and mailing lists, is that people will submit queries as one big line, or broken up randomly in a way that doesn’t improve readability at all. Usually I ask the submitter to reformat the query and come back. SQL isn’t Python. Whitespace is seldom significant, and has no performance impact. So, linebreak, tab and space for maximum readability; think of your SQL as its own documentation and format it like documentation. Here’s a good way to use whitespace:

Each clause (SELECT, FROM, WHERE, GROUP, ORDER) should go on its own line.

Clauses which are more than 80 characters long should be broken up at natural divisions (like commas between identifiers) and tabbed to wrap to a new line.

Subqueries should start on their own line, and be tabbed inwards to match up with the start of the subquery, with each subquery clause starting its own line.

Complex expressions should also get their own line, tabbed in.

Since that’s a little hard to picture, here’s a generic template: SELECT table1,field1, table2.field2, expression1, expression2 FROM table1 JOIN table2 ON condition JOIN table3 ON condition WHERE filter1 AND filter2 AND table1.field5 IN ( SELECT fieldA FROM subtableA WHERE filterA ) GROUP BY field1, field2 ORDER BY field1, field2 Breaking it up over multiple lines like this makes it easy to zero in on the particular clause which is giving the query problems. Some people prefer to line up a division line between the syntax keywords and the identifiers instead of tabbing. This approach can be very readable as well, but it’s hard to demonstrate using simple HTML so I don’t have an example.

Other Rules for Readability

Here’s a few other formatting tips to make your queries easy to read, understand, and maintain:

SQL grammar words and builtins (such as SELECT, AS and MAX) should be in all caps, and identifiers and custom functions should be in lower case.

In the FROM and WHERE clauses, try to put the tables and fields in the same order they are in in the SELECT and/or GROUP BY clauses.

Never use the shortcut of referencing GROUP BY identifiers by position (e.g. ORDER BY 2,3 ), even if the SQL standard does allow it.

), even if the SQL standard does allow it. Refer to fields by their “table.field” and not just “field” name unless the query is very simple. This means that you don’t need to look up each table definition to find out where a field is from.

Putting It All Together