At recent conferences, we have received a curious question from users who have used Oracle in the past and are now using PostgreSQL: “Do we have hypothetical indexes in PostgreSQL ?“. The answer to that question is YES. The true meaning of hypothetical is imaginary but not real. We can try creating indexes that are imaginary for the PostgreSQL optimizer which don’t exist in reality. Let’s look at some detailed examples.

How are hypothetical indexes helpful?

The execution time of an SQL in most of the relational databases depends on the cost of the execution. An optimizer chooses the plan that has the lowest cost, and thus considers further phases such as execute and fetch using that plan. One of the easiest ways of optimizing a well-written SQL is through appropriate indexes that are suitable for that query. An index may reduce the number of pages fetched from disk and may live in the cache due to its size (<<< table size). So, indexing is always a low-hanging fruit for admins and developers wishing to tune an SQL.

But often developers wish to see if an index can be really helpful. The only option available is to create the index and check to see if the queries are able to use it with a reduced cost. Creating the index may be fine if it is in a test environment, or a table that is not huge in size, but if for some reason you wish to see if an index can be helpful for an SQL without actually creating it, then you may test and try this extension to create hypothetical indexes.

PostgreSQL Extension for hypothetical indexes

In order to achieve the functionality of creating imaginary indexes in PostgreSQL, we need to use an extension named hypopg. This extension is made available in the PGDG repository from PostgreSQL 9.2 and works until the latest active release PostgreSQL 11.3. In order to create this extension, you may just use yum or apt depending on your linux distro or compile it from source.

Installing hypopg

In RedHat/CentOS

# yum install hypopg 1 # yum install hypopg

For Debian/Ubuntu, you must make sure you have the development package named: postgresql-server-dev-X where X is the major version.

# apt install postgresql-server-dev-11 (Example for PostgreSQL 11) # apt install postgresql-11-hypopg 1 2 # apt install postgresql-server-dev-11 (Example for PostgreSQL 11) # apt install postgresql-11-hypopg

Creating the Extension

# su - postgres -c "psql -d percona -c 'CREATE EXTENSION hypopg WITH SCHEMA myextensions'" CREATE EXTENSION 1 2 # su - postgres -c "psql -d percona -c 'CREATE EXTENSION hypopg WITH SCHEMA myextensions'" CREATE EXTENSION

In order to show you the list of functions created by hypopg, I have created this extension in a different schema. Here is the list:

percona=# select proname from pg_proc where pronamespace IN (select oid from pg_namespace where nspname = 'myextensions'); proname ---------------------- hypopg hypopg_create_index hypopg_drop_index hypopg_get_indexdef hypopg_list_indexes hypopg_relation_size hypopg_reset (7 rows) 1 2 3 4 5 6 7 8 9 10 11 12 percona = # select proname from pg_proc where pronamespace IN ( select oid from pg_namespace where nspname = 'myextensions' ) ; proname -- -- -- -- -- -- -- -- -- -- -- hypopg hypopg_create_index hypopg_drop_index hypopg_get_indexdef hypopg_list_indexes hypopg_relation_size hypopg_reset ( 7 rows )

hypopg Functions

hypopg_create_index: This function is used to create a hypothetical index. We do not have to specify the name of the index because it is ignored by this function anyway.

percona=# SELECT * FROM myextensions.hypopg_create_index('CREATE INDEX btree_hypo_idx on foo.products USING BTREE(quantity)'); indexrelid | indexname ------------+------------------------------------ 16665 | <16665>btree_foo_products_quantity (1 row) 1 2 3 4 5 percona = # SELECT * FROM myextensions.hypopg_create_index('CREATE INDEX btree_hypo_idx on foo.products USING BTREE(quantity)'); indexrelid | indexname -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 16665 | < 16665 > btree_foo_products_quantity ( 1 row )

hypopg_drop_index: This function is used to drop a hypothetical index upon testing. As soon as you disconnect your session, the index automatically gets dropped as it is only created in the private memory of that session.

percona=# select * from myextensions.hypopg_drop_index(16687); hypopg_drop_index ------------------- t (1 row) 1 2 3 4 5 percona = # select * from myextensions.hypopg_drop_index(16687); hypopg_drop_index -- -- -- -- -- -- -- -- -- - t ( 1 row )

hypopg_get_indexdef: The definition of the hypothetical index we have created using this function.

percona=# select * from hypopg_get_indexdef(16713); hypopg_get_indexdef ----------------------------------------------------- CREATE INDEX ON foo.products USING btree (quantity) (1 row) 1 2 3 4 5 percona = # select * from hypopg_get_indexdef(16713); hypopg_get_indexdef -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - CREATE INDEX ON foo .products USING btree ( quantity ) ( 1 row )

hypopg_list_indexes: This function helps us list all the hypothetical indexes we have created in our session.

percona=# select * from hypopg_list_indexes(); indexrelid | indexname | nspname | relname | amname ------------+------------------------------------+---------+----------+-------- 16713 | <16713>btree_foo_products_quantity | foo | products | btree (1 row) 1 2 3 4 5 percona = # select * from hypopg_list_indexes(); indexrelid | indexname | nspname | relname | amname -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- - + -- -- -- -- -- + -- -- -- -- 16713 | < 16713 > btree_foo_products_quantity | foo | products | btree ( 1 row )

hypopg_relation_size: This function helps us estimate the index size, so we know the approximate amount of space this index could occupy.

percona=# CREATE INDEX idx1 ON foo.products (product_sku); CREATE INDEX percona=# \di+ idx1 List of relations Schema | Name | Type | Owner | Table | Size | Description --------+------+-------+----------+----------+--------+------------- foo | idx1 | index | postgres | products | 631 MB | (1 row) percona=# SELECT * FROM myextensions.hypopg_create_index('CREATE INDEX on foo.products USING BTREE(product_sku)'); indexrelid | indexname ------------+--------------------------------------- 16718 | <16718>btree_foo_products_product_sku (1 row) percona=# select * from pg_size_pretty(hypopg_relation_size(16718)); pg_size_pretty ---------------- 653 MB (1 row) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 percona = # CREATE INDEX idx1 ON foo.products (product_sku); CREATE INDEX percona = # \di+ idx1 List of relations Schema | Name | Type | Owner | Table | Size | Description -- -- -- -- + -- -- -- + -- -- -- - + -- -- -- -- -- + -- -- -- -- -- + -- -- -- -- + -- -- -- -- -- -- - foo | idx1 | index | postgres | products | 631 MB | ( 1 row ) percona = # SELECT * FROM myextensions.hypopg_create_index('CREATE INDEX on foo.products USING BTREE(product_sku)'); indexrelid | indexname -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - 16718 | < 16718 > btree_foo_products_product_sku ( 1 row ) percona = # select * from pg_size_pretty(hypopg_relation_size(16718)); pg_size_pretty -- -- -- -- -- -- -- -- 653 MB ( 1 row )

hypopg_reset: This function drops all the hypothetical indexes created by us in a single command.

percona=# select * from hypopg_list_indexes(); indexrelid | indexname | nspname | relname | amname ------------+---------------------------------------+---------+----------+-------- 16715 | <16715>btree_foo_products_quantity | foo | products | btree 16716 | <16716>btree_foo_products_product_sku | foo | products | btree (2 rows) percona=# select * from hypopg_reset(); hypopg_reset -------------- (1 row) percona=# select * from hypopg_list_indexes(); indexrelid | indexname | nspname | relname | amname ------------+-----------+---------+---------+-------- (0 rows) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 percona = # select * from hypopg_list_indexes(); indexrelid | indexname | nspname | relname | amname -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- - + -- -- -- -- -- + -- -- -- -- 16715 | < 16715 > btree_foo_products_quantity | foo | products | btree 16716 | < 16716 > btree_foo_products_product_sku | foo | products | btree ( 2 rows ) percona = # select * from hypopg_reset(); hypopg_reset -- -- -- -- -- -- -- ( 1 row ) percona = # select * from hypopg_list_indexes(); indexrelid | indexname | nspname | relname | amname -- -- -- -- -- -- + -- -- -- -- -- - + -- -- -- -- - + -- -- -- -- - + -- -- -- -- ( 0 rows )

How does a hypothetical index work in reality?

When you use the hypopg_create_index function to create a hypothetical index, it creates an index in our connection’s private memory. None of the catalog tables nor the original tables are actually touched. The only way to see if we can benefit from that index is by running an EXPLAIN <QUERY>. However, if you wish to run an EXPLAIN ANALYZE that runs that SQL and provides the run time stats, it would not be able to use that hypothetical index. This is because a hypothetical index does not exist in reality.

Testing hypothetical indexes

While testing hypothetical indexes, I want to check if there are any cases where it may not work. Because our main reason for using this extension is to test whether the optimizer could identify the index we wish to create, I have thought of testing almost all types of the most-used indexes in PostgreSQL such as BTREE, HASH, GIN, GiST, BRIN, and BLOOM indexes. Only BTREE indexes are currently mentioned as applicable for hypothetical indexes.

For this reason, I have created the following table which may not make much sense for a real-time use case but helps us test all the types of indexes we could create.

percona=# CREATE TABLE foo.products (product_sku character(8), quantity int, location point, comments text, comments_tsv tsvector, assembled_days integer[], manufactured_date timestamptz); CREATE TABLE percona=# INSERT INTO foo.products VALUES ('a1b2c3d4','2',point '(1,1)','This is my first message',NULL,ARRAY[1,3,5], now() - interval '11 days'); ....................................................................................................................................... ....................................................................................................................................... percona=# INSERT INTO foo.products VALUES ('j1l2m3n4','1000000',point '(5,2)','This is my millionth message',NULL,ARRAY[2,5,7], now() - interval '10 days'); percona=# update foo.products set comments_tsv = to_tsvector(comments); UPDATE 1000000 percona=# select * from foo.products LIMIT 1; product_sku | quantity | location | comments | comments_tsv | assembled_days | manufactured_date -------------+----------+----------+----------------------------+------------------------+----------------+------------------------------- a1b2c3d4 | 2 | (1,1) | This is my first message | 'first':4 'messag':5 | {1,3,5} | 2019-06-01 17:31:35.632891-04 (1 row) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 percona = # CREATE TABLE foo.products (product_sku character(8), quantity int, location point, comments text, comments_tsv tsvector , assembled_days integer [ ] , manufactured_date timestamptz ) ; CREATE TABLE percona = # INSERT INTO foo.products VALUES ('a1b2c3d4','2',point '(1,1)','This is my first message',NULL,ARRAY[1,3,5], now() - interval '11 days'); . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . percona = # INSERT INTO foo.products VALUES ('j1l2m3n4','1000000',point '(5,2)','This is my millionth message',NULL,ARRAY[2,5,7], now() - interval '10 days'); percona = # update foo.products set comments_tsv = to_tsvector(comments); UPDATE 1000000 percona = # select * from foo.products LIMIT 1; product_sku | quantity | location | comments | comments_tsv | assembled_days | manufactured_date -- -- -- -- -- -- - + -- -- -- -- -- + -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - a1b2c3d4 | 2 | ( 1 , 1 ) | This is my first message | 'first' : 4 'messag' : 5 | { 1 , 3 , 5 } | 2019 - 06 - 01 17 : 31 : 35.632891 - 04 ( 1 row )

Btree Indexes

We may try creating a BTREE index on the quantity column to speed up the performance of queries that use that column as a predicate. As you can see in the following log, I have created a hypothetical btree index, and the EXPLAIN on the SQL shows an index scan where the cost of using an index is much less than a sequence scan without the index. With this exercise, we know that creating this index could help optimize the SQL we tested.

WITHOUT Index -------------- percona=# EXPLAIN select * from foo.products where quantity IN (4,5,6); QUERY PLAN --------------------------------------------------------------------- Seq Scan on products (cost=0.00..716349.40 rows=6246717 width=128) Filter: (quantity = ANY ('{4,5,6}'::integer[])) (2 rows) Create Hypothetical Index -------------------------- percona=# SELECT * FROM myextensions.hypopg_create_index('CREATE INDEX btree_hypo_idx on foo.products USING BTREE(quantity)'); indexrelid | indexname ------------+------------------------------------ 16665 | <16665>btree_foo_products_quantity (1 row) WITH Hypothetical Index ------------------------ percona=# EXPLAIN select * from foo.products where quantity IN (4,5,6); QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Index Scan using <16665>btree_foo_products_quantity on products (cost=0.06..546930.72 rows=6246729 width=128) Index Cond: (quantity = ANY ('{4,5,6}'::integer[])) (2 rows) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 WITHOUT Index -- -- -- -- -- -- -- percona = # EXPLAIN select * from foo.products where quantity IN (4,5,6); QUERY PLAN -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - Seq Scan on products ( cost = 0.00..716349.40 rows = 6246717 width = 128 ) Filter : ( quantity = ANY ( '{4,5,6}' :: integer [ ] ) ) ( 2 rows ) Create Hypothetical Index -- -- -- -- -- -- -- -- -- -- -- -- -- percona = # SELECT * FROM myextensions .hypopg_create_index ( 'CREATE INDEX btree_hypo_idx on foo.products USING BTREE(quantity)' ) ; indexrelid | indexname -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 16665 | < 16665 > btree_foo_products_quantity ( 1 row ) WITH Hypothetical Index -- -- -- -- -- -- -- -- -- -- -- -- percona = # EXPLAIN select * from foo.products where quantity IN (4,5,6); QUERY PLAN -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Index Scan using < 16665 > btree_foo_products_quantity on products ( cost = 0.06..546930.72 rows = 6246729 width = 128 ) Index Cond : ( quantity = ANY ( '{4,5,6}' :: integer [ ] ) ) ( 2 rows )

BLOOM Indexes

The documentation does not mention that hypothetical indexes currently work for BLOOM indexes. However, I have tested to see if an EXPLAIN could show a plan that uses the hypothetical bloom index, and it did indeed work.

percona=# CREATE EXTENSION bloom ; CREATE EXTENSION percona=# SELECT * FROM myextensions.hypopg_create_index('create index on foo.products using bloom(quantity);'); indexrelid | indexname ------------+------------------------------------ 16703 | <16703>bloom_foo_products_quantity (1 row) percona=# SET enable_seqscan TO OFF; SET percona=# EXPLAIN select * from foo.products WHERE quantity = 4; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on products (cost=199449.64..873500.25 rows=4164944 width=128) Recheck Cond: (quantity = 4) -> Bitmap Index Scan on <16703>bloom_foo_products_quantity (cost=0.00..198408.40 rows=4164944 width=0) Index Cond: (quantity = 4) (4 rows) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 percona = # CREATE EXTENSION bloom ; CREATE EXTENSION percona = # SELECT * FROM myextensions .hypopg_create_index ( 'create index on foo.products using bloom(quantity);' ) ; indexrelid | indexname -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 16703 | < 16703 > bloom_foo_products_quantity ( 1 row ) percona = # SET enable_seqscan TO OFF; SET percona = # EXPLAIN select * from foo.products WHERE quantity = 4; QUERY PLAN -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Bitmap Heap Scan on products ( cost = 199449.64..873500.25 rows = 4164944 width = 128 ) Recheck Cond : ( quantity = 4 ) -> Bitmap Index Scan on < 16703 > bloom_foo_products_quantity ( cost = 0.00..198408.40 rows = 4164944 width = 0 ) Index Cond : ( quantity = 4 ) ( 4 rows )

BRIN Indexes

I have then tried to create a hypothetical BRIN index to see if I get any errors because only btree hypothetical indexes are currently supported. To my surprise, I didn’t see any errors. But then I do see a strange error when trying to do an EXPLAIN on the SQL, or even when I try to run a SELECT on that table, as you see in the following log. So, just because it allows you to create the brin hypothetical index doesn’t mean it will work.

percona=# SELECT * FROM myextensions.hypopg_create_index('create index on foo.products using brin(manufactured_date) with (pages_per_range=4);'); indexrelid | indexname ------------+-------------------------------------------- 16669 | <16669>brin_foo_products_manufactured_date (1 row) percona=# select * from hypopg_list_indexes(); indexrelid | indexname | nspname | relname | amname ------------+--------------------------------------------+---------+----------+-------- 16669 | <16669>brin_foo_products_manufactured_date | foo | products | brin (1 row) percona=# EXPLAIN select * from foo.products WHERE manufactured_date < '2019-06-03 17:31:35'; ERROR: could not open relation with OID 16669 percona=# select count(*) from foo.products WHERE manufactured_date < '2019-06-03 17:31:35'; ERROR: could not open relation with OID 16669 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 percona = # SELECT * FROM myextensions .hypopg_create_index ( 'create index on foo.products using brin(manufactured_date) with (pages_per_range=4);' ) ; indexrelid | indexname -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 16669 | < 16669 > brin_foo_products_manufactured_date ( 1 row ) percona = # select * from hypopg_list_indexes(); indexrelid | indexname | nspname | relname | amname -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- - + -- -- -- -- -- + -- -- -- -- 16669 | < 16669 > brin_foo_products_manufactured_date | foo | products | brin ( 1 row ) percona = # EXPLAIN select * from foo.products WHERE manufactured_date < '2019-06-03 17:31:35'; ERROR : could not open relation with OID 16669 percona = # select count(*) from foo.products WHERE manufactured_date < '2019-06-03 17:31:35'; ERROR : could not open relation with OID 16669

Hash, GIN, GiST or other Indexes

Unlike BRIN indexes, when we try to create any other type of hypothetical index, it throws an exact error message that states that the index type we have specified is not supported.

Hash Index ----------- percona=# SELECT * FROM myextensions.hypopg_create_index('CREATE INDEX hash_hypo_idx on foo.products USING HASH(product_sku)'); ERROR: hypopg: access method "hash" is not supported GiST Index ----------- percona=# SELECT * FROM myextensions.hypopg_create_index('create index on foo.products using gist(location);'); ERROR: hypopg: access method "gist" is not supported percona=# GIN Index ---------- percona=# SELECT * FROM myextensions.hypopg_create_index('create index on foo.products using gin(comments_tsv);'); ERROR: hypopg: access method "gin" is not supported percona=# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Hash Index -- -- -- -- -- - percona = # SELECT * FROM myextensions .hypopg_create_index ( 'CREATE INDEX hash_hypo_idx on foo.products USING HASH(product_sku)' ) ; ERROR : hypopg : access method "hash" is not supported GiST Index -- -- -- -- -- - percona = # SELECT * FROM myextensions .hypopg_create_index ( 'create index on foo.products using gist(location);' ) ; ERROR : hypopg : access method "gist" is not supported percona = # GIN Index -- -- -- -- -- percona = # SELECT * FROM myextensions .hypopg_create_index ( 'create index on foo.products using gin(comments_tsv);' ) ; ERROR : hypopg : access method "gin" is not supported percona = #

Conclusion

It is great to see some effort towards helping developers and admins with an extension which helps them know if an index can be used without actually having to create it. This is something we could use to automate index recommendations with a combination of another extension called: pg_qualstats, which we shall discuss in my next blog post. Thanks to all the contributors who have helped us achieve this functionality in PostgreSQL.

Discuss on HackerNews.