Share Tweet Share





Recently on the pgsql-performance mailing list an Oracle DBA launched an argument we seem to see about once a year on some PostgreSQL mailing list: a proposal that we support Query Hints. As always, the proposal gets rejected. We’ve even added a FAQ item about it.

However, this comes up every year, and I thought I’d try to explain why we don’t want Query Hints in an effort to reduce pointless arguments like this week’s thread. Especially since most other major SQL DBMSes have them; it makes us look the odd man out, and makes PostgreSQL look “behind”. But there’s a very simple reason we don’t have them, and it’s not the one you think:

PostgreSQL does not have query hints because we are not a for-profit company

That certainly seems like an odd connection, but I’ll explain it.

Back in the bad old days of Oracle 7 and Informix 5, query planners were pretty dumb. Oracle used a rule-based optimizer which got things wrong as often as it got them right. So it was very important for DBAs to be able to override the query planner, even on a per-operation basis. So Hints became a fairly important means of query tuning.

Warp forward a decade, and most of the leading databases have very sophisticated query planners, using cost-based estimation or more advanced techniques to optimally execute all queries. And they need them; advances in SQL, indexing technologies, and join techniques mean that query planners need to make literally millions of decisions in order to execute even a moderately complex query. Cases where a human can outguess the query planner are relatively rare (like 0.1%) and tend to be ephemeral; once the underlying data changes due to accumulation, we’re back in the query planner’s court.

So if query planners are so cool and exceptions are so rare, why do most of the major DBMSes implement Hints? Because DBAs want them. Not that they need them, but they want them, for a few reasons:

Many older DBAs, and their proteges distrust query planners because of bad experiences back in the 80’s and early 90’s. They are sure they can do a better job, even though they can’t.

Other DBAs are lazy or labor under unrealistic deadlines. Applying a query hint is often faster and easier than diagnosing the real reason the query has a bad plan.

Ignorance and weak software play a role too: good diagnostic tools and techniques for troubleshooting bad queries did not become available until relatively recently, and most DBAs still don’t know how to use them.

Since so many DBAs are convinced that they need Hints, they demand that DBMS vendors supply them. And if those DBAs work for large multinational corporations, the DBMS vendors listen. Whether or not their engineers think query hints are a good idea, or even work.

The developers who work on the PostgreSQL not-for-profit database project, though, have the privilege of not implementing a bad idea just because a lot of people seem to want it. Q.E.D.

All that aside, there are those 0.1% of pathological cases where the query planner does The Wrong Thing even when it’s patently obvious what the right thing is. In our community, that usually leads to patches and improvements in the query planner and the statistics system, but that’s scant consolation to the hapless DBA who’s stuck behind a mandated 3-year upgrade cycle. So what would the PostgreSQL project consider to fix this?

Well, one thought is a system which would allow DBAs to selectively adjust the cost calculations for queries. For example, one which allowed you to put “SELECTIVITY 0.1” after a WHERE clause to tell the planner that despite what it thinks from its statistics, that set of criteria will give you 10% of the table. Even better would be a system of fudging the statistics on database objects, to allow DBAs to indicate (for example) that using a particular index is more costly than it appears because of the poor clustering of the data or because of the complex calculated expression it uses.

The advantage to a “cost fudging” system is that it’s far less vulnerable to rapid obsolescence due to data growth and version upgrades, and it produces better query plans by “nudging” the optimizer instead of putting it in a straitjacket. Data object cost adjustment also has the advantage that it would be centrally auditable. Further, we could gather data about cost fudging from our community of users and use it to improve the database’s cost model.

In 9.0, we’ve already added one such cost-fudging factor: the ability to override ndistinct estimates for columns. With feedback and reproduceable examples from our users, we’ll be able to add more in the future.