Optimized search with PostgreSQL

Your search is anchored at the start and no fuzzy search logic is required. This is not the typical use case for full text search.

If it gets more fuzzy or your search is not anchored at the start, look here for more:

Similar UTF-8 strings for autocomplete field

More on pattern matching in Postgres.

In PostgreSQL you can make use of advanced index features that should make the query very fast. In particular look at operator classes and indexes on expressions.

1) text_pattern_ops

Assuming your column is of type text, you would use a special index for text pattern operators like this:

CREATE INDEX name_text_pattern_ops_idx ON tbl (name text_pattern_ops); SELECT name FROM tbl WHERE name ~~ ('Hambu' || '%');

This is assuming that you operate with a database locale other than C - most likely de_DE.UTF-8 in your case. You could also set up a database with locale 'C'. I quote the manual here:

If you do use the C locale, you do not need the xxx_pattern_ops operator classes, because an index with the default operator class is usable for pattern-matching queries in the C locale.

2) Index on expression

I'd imagine you would also want to make that search case insensitive. so let's take another step and make that an index on an expression:

CREATE INDEX lower_name_text_pattern_ops_idx ON tbl (lower(name) text_pattern_ops); SELECT name FROM tbl WHERE lower(name) ~~ (lower('Hambu') || '%');

To make use of the index, the WHERE clause has to match the the index expression.

3) Optimize index size and speed

Finally, you might also want to impose a limit on the number of leading characters to minimize the size of your index and speed things up even further:

CREATE INDEX lower_left_name_text_pattern_ops_idx ON tbl (lower(left(name,10)) text_pattern_ops); SELECT name FROM tbl WHERE lower(left(name,10)) ~~ (lower('Hambu') || '%');

left() was introduces with Postgres 9.1. Use substring(name, 1,10) in older versions.

4) Cover all possible requests

What about strings with more than 10 characters?

SELECT name FROM tbl WHERE lower(left(name,10)) ~ (lower(left('Hambu678910',10)) || '%'); AND lower(name) ~~ (lower('Hambu678910') || '%');

This looks redundant, but you need to spell it out this way to actually use the index. Index search will narrow it down to a few entries, the additional clause filters the rest. Experiment to find the sweet spot. Depends on data distribution and typical use cases. 10 characters seem like a good starting point. For more than 10 characters, left() effectively turns into a very fast and simple hashing algorithm that's good enough for many (but not all) use cases.

5) Optimize disc representation with CLUSTER

So, the predominant access pattern will be to retrieve a bunch of adjacent rows according to our index lower_left_name_text_pattern_ops_idx . And you mostly read and hardly ever write. This is a textbook case for CLUSTER . I quote the manual:

When a table is clustered, it is physically reordered based on the index information.

With a huge table like yours, this can dramatically improve response time because all rows to be fetched are in the same or adjacent blocks on disk.

First call:

CLUSTER tbl USING lower_left_name_text_pattern_ops_idx;

Information which index to use will be saved and successive calls will re-cluster the table:

CLUSTER tbl; CLUSTER; -- cluster all tables in the db that have previously been clustered.

If you don't want to repeat it:

ALTER TABLE tbl SET WITHOUT CLUSTER;

For tables with more write load look into pg_repack , which can doe the same without exclusive lock on the table.

6) Prevent too many rows in the result

Demand a minimum of, say, 3 or 4 characters for the search string. I add this for completeness, you probably do it anyway.

And LIMIT the number of rows returned:

SELECT name FROM tbl WHERE lower(left(name,10)) ~~ (lower('Hambu') || '%') LIMIT 501;

If your query returns more than 500 rows, tell the user to narrow down his search.

7) Optimize filter method (operators)

If you absolutely must squeeze out every last microsecond, you can utilize operators of the text_pattern_ops family. Like this:

SELECT name FROM tbl WHERE lower(left(name, 10)) ~>=~ lower('Hambu') AND lower(left(name, 10)) ~<=~ (lower('Hambu') || chr(2097151));

You gain very little with this last stunt. Normally, standard operators are a better choice.

If you do all that, search time will be reduced to a matter of milliseconds.