The index-only scan is one of the most powerful tuning methods of all. It not only avoids accessing the table to evaluate the where clause, but avoids accessing the table completely if the database can find the selected columns in the index itself.

To cover an entire query, an index must contain all columns from the SQL statement—in particular also the columns from the select clause as shown in the following example:

CREATE INDEX sales_sub_eur ON sales ( subsidiary_id, eur_value )

SELECT SUM(eur_value) FROM sales WHERE subsidiary_id = ?

Of course indexing the where clause takes precedence over the other clauses. The column SUBSIDIARY_ID is therefore in the first position so it qualifies as an access predicate.

On my Own Behalf I make my living from training, other SQL related services and selling my book. Learn more at https://winand.at/.

The execution plan shows the index scan without a subsequent table access ( TABLE ACCESS BY INDEX ROWID ).

DB2 Explain Plan --------------------------------------------------------------- ID | Operation | Rows | Cost 1 | RETURN | | 21 2 | GRPBY (COMPLETE) | 1 of 34804 ( .00%) | 21 3 | IXSCAN SALES_SUB_EUR | 34804 of 1009326 ( 3.45%) | 19 Predicate Information 3 - START (Q1.SUBSIDIARY_ID = ?) STOP (Q1.SUBSIDIARY_ID = ?) Oracle ---------------------------------------------------------- | Id | Operation | Name | Rows | Cost | ---------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 104 | | 1 | SORT AGGREGATE | | 1 | | |* 2 | INDEX RANGE SCAN| SALES_SUB_EUR | 40388 | 104 | ---------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("SUBSIDIARY_ID"=TO_NUMBER(:A))

The index covers the entire query so it is also called a covering index.

Note If an index prevents a table access it is also called a covering index. The term is misleading, however, because it sounds like an index property. The phrase index-only scan correctly suggests that it is an execution plan operation.

The index has a copy of the EUR_VALUE column so the database can use the value stored in the index. Accessing the table is not required because the index has all of the information to satisfy the query.

An index-only scan can improve performance enormously. Just look at the row count estimate in the execution plan: the optimizer expects to aggregate more than 40,000 rows. That means that the index-only scan prevents 40,000 table fetches—if each row is in a different table block. If the index has a good clustering factor—that is, if the respective rows are well clustered in a few table blocks—the advantage may be significantly lower.

Besides the clustering factor, the number of selected rows limits the potential performance gain of an index-only scan. If you select a single row, for example, you can only save a single table access. Considering that the tree traversal needs to fetch a few blocks as well, the saved table access might become negligible.

Important The performance advantage of an index-only scans depends on the number of accessed rows and the index clustering factor.

The index-only scan is an aggressive indexing strategy. Do not design an index for an index-only scan on suspicion only because it unnecessarily uses memory and increases the maintenance effort needed for update statements. See Chapter 8, “Modifying Data”. In practice, you should first index without considering the select clause and only extend the index if needed.

Index-only scans can also cause unpleasant surprises, for example if we limit the query to recent sales:

SELECT SUM(eur_value) FROM sales WHERE subsidiary_id = ? AND sale_date > ?

Without looking at the execution plan, one could expect the query to run faster because it selects fewer rows. The where clause, however, refers to a column that is not in the index so that the database must access the table to load this column.

DB2 Explain Plan ------------------------------------------------------------------- ID | Operation | Rows | Cost 1 | RETURN | | 13547 2 | GRPBY (COMPLETE) | 1 of 1223 ( .08%) | 13547 3 | FETCH SALES | 1223 of 34804 ( 3.51%) | 13547 4 | RIDSCN | 34804 of 34804 (100.00%) | 32 5 | SORT (UNIQUE) | 34804 of 34804 (100.00%) | 32 6 | IXSCAN SALES_SUB_EUR | 34804 of 1009326 ( 3.45%) | 19 Predicate Information 3 - SARG (? < Q1.SALE_DATE) SARG (Q1.SUBSIDIARY_ID = ?) 6 - START (Q1.SUBSIDIARY_ID = ?) STOP (Q1.SUBSIDIARY_ID = ?) Oracle -------------------------------------------------------------- |Id | Operation | Name | Rows |Cost | -------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 371 | | 1 | SORT AGGREGATE | | 1 | | |*2 | TABLE ACCESS BY INDEX ROWID| SALES | 2019 | 371 | |*3 | INDEX RANGE SCAN | SALES_DATE| 10541 | 30 | -------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("SUBSIDIARY_ID"=TO_NUMBER(:A)) 3 - access("SALE_DATE">:B)

The table access increases the response time although the query selects fewer rows. The relevant factor is not how many rows the query delivers but how many rows the database must inspect to find them.

Warning Extending the where clause can cause “illogical” performance behavior. Check the execution plan before extending queries.

If an index can no longer be used for an index-only scan, the optimizer will choose the next best execution plan. That means the optimizer might select an entirely different execution plan or, as above, a similar execution plan with another index. In this case it uses an index on SALE_DATE , which is a leftover from the previous chapter.

From the optimizer’s perspective, this index has two advantages over SALES_SUB_EUR . The optimizer believes that the filter on SALE_DATE is more selective than the one on SUBSIDIARY_ID . You can see that in the respective “Rows” column of the last two execution plans (about 10,000 versus 40,000). These estimations are, however, purely arbitrary because the query uses bind parameters. The SALE_DATE condition could, for example, select the entire table when providing the date of the first sale.

The second advantage of the SALES_DATE index is that is has a better clustering factor. This is a valid reason because the SALES table only grows chronologically. New rows are always appended to the end of the table as long as there are no rows deleted. The table order therefore corresponds to the index order because both are roughly sorted chronologically—the index has a good clustering factor.

When using an index with a good clustering factor, the selected tables rows are stored closely together so that the database only needs to read a few table blocks to get all the rows. Using this index, the query might be fast enough without an index-only scan. In this case we should remove the unneeded columns from the other index again.

Note Some indexes have a good clustering factor automatically so that the performance advantage of an index-only scan is minimal.

In this particular example, there was a happy coincidence. The new filter on SALE_DATE not only prevented an index-only scan but also opened a new access path at the same time. The optimizer was therefore able to limit the performance impact of this change. It is, however, also possible to prevent an index only scan by adding columns to other clauses. However adding a column to the select clause can never open a new access path which could limit the impact of losing the index-only scan.

Tip Maintain your index-only scans. Add comments that remind you about an index-only scan and refer to that page so anyone can read about it.

Function-based indexes can also cause unpleasant surprises in connection with index-only scans. An index on UPPER(last_name) cannot be used for an index-only scan when selecting the LAST_NAME column. In the previous section we should have indexed the LAST_NAME column itself to support the LIKE filter and allow it to be used for an index-only scan when selecting the LAST_NAME column.

Tip Always aim to index the original data as that is often the most useful information you can put into an index. Avoid function-based indexing for expressions that cannot be used as access predicates.

Aggregating queries like the one shown above make good candidates for index-only scans. They query many rows but only a few columns, making a slim index sufficient for supporting an index-only scan. The more columns you query, the more columns you have to add to the indexed to support an index-only scan. As a developer you should therefore only select the columns you really need.

Tip Avoid select * and fetch only the columns you need.

Regardless of the fact that indexing many rows needs a lot of space, you can also reach the limits of your database. Most databases impose rather rigid limits on the number of columns per index and the total size of an index entry. That means you cannot index an arbitrary number of columns nor arbitrarily long columns. The following overview lists the most important limitations. Nevertheless there are indexes that cover an entire table as we see in the next section.

INCLUDE : Non-key Columns SQL Server and PostgreSQL 11+ support so-called non-key columns in B-tree indexes. They are—other than the key columns, which we discussed so far—only stored in the leaf nodes and can thus not be used for access predicates. Non-key columns are specified in the include clause: CREATE INDEX empsubupnam ON employees (subsidiary_id, last_name) INCLUDE(phone_number, first_name)