Share Tweet Share





Now that I’m consulting again, I find myself spending a lot of time fixing other people’s queries. And I find that the first 15-30 minutes of fixing a complex query I spend re-writing the query just so I can see what it’s doing — and considering what I bill, that’s rather costly time for formatting that should have been done correctly in the first place. While I’m very glad to see that many web developers are discovering the power of complex queries, I’m not at all happy about how they write them. A lot common practice seems to be based on conserving keystrokes rather than comprehensibility, rendering a 6-table reporting query something out of the Obfuscated Code Contest. The cost of writing unmaintainable queries should be obvious: they are harder to debug, harder to tune, and dramatically increase your costs of training up new staff and/or contractors. Worse, the query bugs you don’t catch because of your unreadable queries will often consume 90% of your system resources and troubleshooting time. You’re making your own job harder by writing unmaintainable, unreadable SQL. To paraphrase a famous quote about Perl: “Write all of your SQL with the maintainer in mind. Because chances are, that maintainer will be you.” Unfortunately, while a lot of attention has been paid to making procedural/object code well-documented and maintainable in various books, scant attention has been paid to making SQL maintainable. The only such text I can recall, in fact, came from chapter 2 of O’Reilly’s PL/SQL book. So here’s a few rules for writing maintainable SQL.

Avoid Table and Field Alias Abbreviations

… or use long ones. I’d thought that we’d all left the days of having program variables named “I” and “X” and “Y” behind us in BASIC and COBOL, but apparently not. For the same reasons you don’t want to have a global variable named “p”, you don’t want to have a queried table aliased as “p”. Certainly there’s a reason to use an abbreviated alias for “persons_referral_type” if you’re going to refer to it 18 times in the query, but there’s no legitimate reason to abbreviate the name of the table “class” to “cs”. If you’re any good as a programmer, you should be typing at least 80WPM, so those extra 4 characters are taking you what, 0.4 seconds? Whereas getting “cs” for “class” and “cs” for “club_status” mixed up could cost you up to 12 hours of debugging. Further, if you do have a really long name you need to abbreviate, make the abbreviation long enough to be distinctive. “persons_referral_type”, for example, should be abbreviated “persreft” or “personref” or “p_reftype” or similar, not “pr”, which could also refer to “performance_rating”.

Make 2nd Referral Aliases and Subquery Aliases Descriptive

Of course, there are times you have to use an alias because you’re referring to the same table twice in the same query in two different contexts, or because you have a named subquery or expression. Again, just like a program variable, the rule here is to be distinctive and descriptive, so that someone who is reading the variable deep down in the query knows exactly to what it refers. For example, if you have two references to the “status” lookup table, one from “person” and one from “club”, the proper aliasing would be “status AS person_status” and “status AS club_status”, not “status AS s1” or “status AS cs”. The same goes for subqueries. This is the wrong way to select a correlated summary statistic: (SELECT MAX(score) FROM persons p2 WHERE p2.club = persons.club) as ms, This is the right way: (SELECT MAX(score) FROM persons AS pers_max WHERE pers_max.club = persons.club) AS max_club_score, See how much more readable the second version is? You don’t have to run the subquery in your head to figure out what it’s doing; the alias tells you most of what you need to know for troubleshooting. This rule will become even more critical when we get Common Table Expressions in PostgreSQL 8.4.

Other Rules for Aliases

1. Always use the optional keyword AS. This both provides a visual que that there’s an alias, and helps avoid accidentally breaking the query by pasting something between the identifier and the alias or by referring to the original table name and not the alias elsewhere in the query. 2. Never, ever, ever, reuse the same alias for two different objects in different contexts in the query. Even if the SQL parser is fine with it, it will drive your maintainer insane. 3. Never use an alias which is the name of an actual object elsewhere in the database, even if it’s not used the the query. I’ll continue How To Write Maintainable Queries next week.