As of now, the driving table for a parallel query is accessed by a
parallel sequential scan, which limits its usability to a certain
degree. Parallelising index scans would allow parallel query to be
used in many more cases. This patch enables parallelism for btree
scans; supporting parallel index scans for other index types such as
hash, gist, and spgist can be done as separate patches.

The basic idea is quite similar to parallel heap scans: each worker
(including the leader, whenever possible) will scan a block and then
get the next block that needs to be scanned. The parallelism is
implemented at the leaf level of the btree. The first worker to start
the scan descends the tree to a leaf, while the others wait until it
has reached the leaf. After reading the leaf block, the first worker
sets the next block to be read, wakes the first worker waiting for the
next block, and proceeds to scan tuples from the block it has read.
Similarly, each worker, after reading a block, sets the next block to
be read and wakes up the first waiting worker. This is achieved by
using the condition variable patch [1] proposed by Robert.
Parallelism is supported for both forward and backward scans.
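The handoff described above can be illustrated with a toy model: Python
threads stand in for parallel workers and threading.Condition stands in
for the proposed condition variables. All names here are illustrative
and simplified; the patch itself is C code operating on shared memory.

```python
# Toy model of the leaf-level block handoff (illustration only; not the
# patch's actual C implementation).
import threading

class SharedScanState:
    """Stands in for the shared-memory state of a parallel btree scan."""
    def __init__(self, right_links, first_leaf):
        self.right_links = right_links  # leaf block -> next leaf block
        self.first_leaf = first_leaf
        self.next_block = None          # next leaf block handed to a worker
        self.initialized = False        # has a worker descended to a leaf?
        self.done = False
        self.cv = threading.Condition()

def scan_worker(state, worker_id, scanned):
    with state.cv:
        if not state.initialized:
            # First worker descends the tree and seeds the scan.
            state.initialized = True
            state.next_block = state.first_leaf
            state.cv.notify_all()
        else:
            # Others wait until the first worker has reached a leaf.
            while state.next_block is None and not state.done:
                state.cv.wait()
    while True:
        with state.cv:
            if state.done:
                return
            block = state.next_block
            nxt = state.right_links.get(block)
            if nxt is None:
                state.done = True
                state.cv.notify_all()   # no more blocks: release everyone
            else:
                # Publish the next block and wake one waiting worker,
                # then scan our claimed block outside the lock.
                state.next_block = nxt
                state.cv.notify()
        scanned.append((worker_id, block))  # "scan tuples" from the block

# Three "workers" cooperatively scan a chain of four leaf blocks.
state = SharedScanState({1: 2, 2: 3, 3: 4}, first_leaf=1)
scanned = []
workers = [threading.Thread(target=scan_worker, args=(state, i, scanned))
           for i in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Regardless of thread scheduling, each leaf block is claimed exactly
once, because claiming a block and publishing its successor happen
atomically under the lock.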

The optimizer chooses parallelism based on the number of pages in the
index relation, and the CPU cost of evaluating the rows is divided
equally among the workers. The Index Scan node is made parallel-aware
and can be used beneath a Gather node as shown below:

Current Plan for Index Scans
----------------------------------------
Index Scan using idx2 on test  (cost=0.42..7378.96 rows=2433 width=29)
  Index Cond: (c < 10)

Parallel version of plan
----------------------------------
Gather  (cost=1000.42..1243.40 rows=2433 width=29)
  Workers Planned: 1
  ->  Parallel Index Scan using idx2 on test  (cost=0.42..0.10 rows=1431 width=29)
        Index Cond: (c < 10)
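The rows=1431 under the Gather is the serial estimate (rows=2433)
divided across the worker plus the leader's expected contribution. A
sketch of that arithmetic, modeled on PostgreSQL's parallel-divisor
heuristic (simplified here; the real planner also clamps the row
estimate):

```python
# Sketch of how the planner scales a serial row estimate for a parallel
# path: the leader participates too, but is assumed to lose about 30%
# of its time per worker to other duties (simplified illustration).
def parallel_divisor(num_workers, leader_participates=True):
    divisor = float(num_workers)
    if leader_participates:
        leader_contribution = 1.0 - 0.3 * num_workers
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor

serial_rows = 2433                       # estimate from the serial plan
rows_per_process = int(serial_rows / parallel_divisor(1))
print(rows_per_process)                  # 1431, matching the plan above
```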

Parallel index scans can be used in parallelising aggregate queries as
well. For example, given a query like: select count(*) from t1 where
c1 > 1000 and c1 < 1100 and c2='aaa' Group By c2; the following forms
of parallel plans are possible:

Finalize HashAggregate
  Group Key: c2
  ->  Gather
        Workers Planned: 1
        ->  Partial HashAggregate
              Group Key: c2
              ->  Parallel Index Scan using idx_t1_partial on t1
                    Index Cond: ((c1 > 1000) AND (c1 < 1100))
                    Filter: (c2 = 'aaa'::bpchar)

OR

Finalize GroupAggregate
  Group Key: c2
  ->  Sort
        ->  Gather
              Workers Planned: 1
              ->  Partial GroupAggregate
                    Group Key: c2
                    ->  Parallel Index Scan using idx_t1_partial on t1
                          Index Cond: ((c1 > 1000) AND (c1 < 1100))
                          Filter: (c2 = 'aaa'::bpchar)

In the second plan (GroupAggregate), the Sort + Gather step would be
replaced with GatherMerge once we have a GatherMerge node, as proposed
by Rushabh [2]. Note that the above examples are given only to
explain the usage of parallel index scans; actual plans will be
selected based on cost.

Performance tests
----------------------------
These tests were performed on the community machine hydra (POWER-7).
Initialize pgbench with scale factor 3000 (./pgbench -i -s 3000 postgres),
then count the rows in pgbench_accounts based on the values of aid and bid.

Serial plan
------------------
set max_parallel_workers_per_gather=0;

postgres=# explain analyze select count(aid) from pgbench_accounts
where aid > 1000 and aid < 90000000 and bid > 800 and bid < 900;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=4714590.52..4714590.53 rows=1 width=8) (actual time=35684.425..35684.425 rows=1 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.57..4707458.12 rows=2852961 width=4) (actual time=29210.743..34385.271 rows=9900000 loops=1)
         Index Cond: ((aid > 1000) AND (aid < 90000000))
         Filter: ((bid > 800) AND (bid < 900))
         Rows Removed by Filter: 80098999
 Planning time: 0.183 ms
 Execution time: 35684.459 ms
(7 rows)

Parallel Plan
-------------------
set max_parallel_workers_per_gather=2;

postgres=# explain analyze select count(aid) from pgbench_accounts
where aid > 1000 and aid < 90000000 and bid > 800 and bid < 900;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Finalize Aggregate  (cost=3924773.13..3924773.14 rows=1 width=8) (actual time=15033.105..15033.105 rows=1 loops=1)
   ->  Gather  (cost=3924772.92..3924773.12 rows=2 width=8) (actual time=15032.986..15033.093 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=3923772.92..3923772.92 rows=1 width=8) (actual time=15030.354..15030.354 rows=1 loops=3)
               ->  Parallel Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.57..3920801.08 rows=1188734 width=4) (actual time=12476.068..14600.410 rows=3300000 loops=3)
                     Index Cond: ((aid > 1000) AND (aid < 90000000))
                     Filter: ((bid > 800) AND (bid < 900))
                     Rows Removed by Filter: 26699666
 Planning time: 0.244 ms
 Execution time: 15036.081 ms
(11 rows)

The above is the median of 3 runs; all runs gave almost the same
execution time. Notice that the execution time is reduced by more
than half with two workers, and I have also tested with four workers,
where the time is reduced to one-fourth (9128.420 ms) of the serial
plan. I think these results are quite similar to what we got for
parallel sequential scans. Another thing to note is that
parallelising index scans is more beneficial when there is a Filter
that removes many of the rows fetched by the index scan, or when the
Filter is costly (for example, a filter that executes an expensive
function). This observation is also quite similar to what we have
observed with parallel sequential scans.
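The intuition can be captured with a toy model (my own illustration
with made-up constants, not the patch's actual costing): filter
evaluation is divided among the processes, while every row that
survives the filter still has to be funneled back through Gather one
at a time.

```python
# Toy cost model for why a selective or expensive Filter makes a
# parallel index scan pay off (illustrative constants, not real costs).
def scan_time(rows_fetched, selectivity, filter_cost, transfer_cost, workers):
    # Filter evaluation parallelises across the cooperating processes.
    filter_time = rows_fetched * filter_cost / workers
    # Surviving rows must be shipped to the leader through Gather;
    # a serial scan (workers == 1) has no such transfer step.
    gather_time = rows_fetched * selectivity * transfer_cost if workers > 1 else 0.0
    return filter_time + gather_time

def speedup(selectivity, rows=1_000_000, filter_cost=1.0,
            transfer_cost=5.0, workers=2):
    serial = scan_time(rows, selectivity, filter_cost, transfer_cost, 1)
    parallel = scan_time(rows, selectivity, filter_cost, transfer_cost, workers)
    return serial / parallel
```

With these constants, a filter that discards 99% of the fetched rows
yields a healthy speedup, while a filter that keeps 90% of them makes
the parallel scan lose to the serial one on transfer overhead.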

I think we can parallelise Index Only Scans as well, but I have not
evaluated that, and it can certainly be done as a separate patch in
the future.

Contributions
--------------------
The first patch (parallel_index_scan_v1.patch) implements parallelism
at the IndexAM level - Rahila Syed and Amit Kapila, based on design
inputs and suggestions by Robert Haas.

The second patch (parallel_index_opt_exec_support_v1.patch) provides
optimizer and executor support for parallel index scans - Amit Kapila.

To use these patches, first apply the condition variable patch [1],
then parallel_index_scan_v1.patch, and then
parallel_index_opt_exec_support_v1.patch.

Thoughts?

[1] - https://www.postgresql.org/message-id/CAEepm%3D0zshYwB6wDeJCkrRJeoBM%3DjPYBe%2B-k_VtKRU_8zMLEfA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAGPqQf09oPX-cQRpBKS0Gq49Z%2Bm6KBxgxd_p9gX8CKk_d75HoQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

