Optimizing postgresql query for DISTINCT values

If you want to get DISTINCT values from large tables (and by large I mean hundreds of thousands of rows), it is painfully slow.

Simple query like this: SELECT DISTINCT(day) FROM reports can take a lot of time to be processed.

Here is how you can improve its speed dramatically using postgres recursive functionality:

WITH RECURSIVE t(n) AS ( SELECT MIN(day) FROM reports UNION SELECT (SELECT day FROM reports WHERE day > n ORDER BY day LIMIT 1) FROM t WHERE n IS NOT NULL ) SELECT n FROM t;

I’ve tested it against 50 million rows with 365 distinct values for day, and it executed in 8ms, opposed to 15.7 seconds for SELECT DISTINCT query.

There are only two prerequisites:

you need index column you want to get distincts from.

you have postgres 8.4+

For more info, check out the postgresql docs on the topic: http://www.postgresql.org/docs/9.1/static/queries-with.html