In short: using the tag_titles index brings the query’s performance from O(n) to O(log n + log n) = O(log n) which is similar to if we were to query on the primary key.

You can prefix the query with EXPLAIN QUERY PLAN to verify that the index is used.

EXPLAIN QUERY PLAN SELECT * FROM tags WHERE title = "Kotlin";

“Covering Indexes”

In situations where you tend to do a lot of querying for subsets of your table’s data, you can utilize what are called “Covering Indexes” to eliminate that second binary search from the query planning operation.

Say our application needs to be able to display the description as well as the title of a particular tag when you search for it.

SELECT title, description FROM tags WHERE title = "Kotlin";

We saw that with the index described in the last section: one binary search is done to find the rowid of the record where the title is “Kotlin”, and a second is done to find the remainder of the data that was required. Let’s create a new index which includes the description as well as the title :

CREATE INDEX tag_title_desc ON tags (title, description);

Now when you run the query, SQLite’s query planner is smart enough to recognize that we only need data which is contained within the index and can return that data immediately after the index search without resorting to looking at the actual records in the tags table.

EXPLAIN QUERY PLAN

SELECT title, description FROM tags WHERE title = "Kotlin";

The tag_title_desc is considered a “covering index” because it covers all the fields needed to satisfy the query.

One O(log n) operation is better than two, especially when the data set is huge.

- some smart person

Side Note: Although the tag_title_desc defines description as a field on the index, it won’t help you if you’re trying to query the tags table by description. This is because the index is sorted by title first, and description second, and there’s no good way to binary-search the descriptions using an index like that. If we need that functionality, we should consider adding a new index on the description field.

Using Indexes with ORDER BY

When you use ORDER BY without an index on the column to be sorted, SQLite needs to build up a temporary data structure that contains the sorted results each time the query is executed. That data structure will use up space in memory (or on disk, depending on the circumstances) and will be torn down after the query is executed. This can be a lot of extra work that might noticeably degrade performance.

However, in addition to helping with quickly finding records: indexes are used by the SQLite query planner to speed up sorting when the ORDER BY clause references indexed columns. For example:

SELECT * FROM tags ORDER BY title ASC;

Here, SQLite doesn’t need to build up the temporary data structure and can simply iterate through the tag_titles index. When fetching the data for the result set it will perform binary searches to retrieve each tags record.

Both sorting with a non-covering index and without are O(n log n) operations, but eliminating the need to generate the temporary data structure cuts out a lot of work.

When a covering index can be used, the sort is even faster because SQLite can omit those binary searches all-together and the query becomes O(n) because it only needs to iterate through the index and return the covered columns from the index.

When should I consider creating a new index?

Because indexes are sorted data structures and their benefit comes from how binary search works, it’s important to ensure that your indexed columns have what is called “high cardinality”. All this means is that the indexed data has a lot of uniqueness.

Why is this important? Well, without a lot of uniqueness to your data, the binary search can end up becoming an O(n) linear scan anyway.

The Costs

Additionally, it’s important to remember two important costs to creating and maintaining indexes on your tables:

Size

Indexes need to be stored in the database file, and their structure is similar to tables. You should remember that adding an index on a column or a set of columns effectively doubles the storage used to keep the data those columns represent. Insert/Update/Delete Performance

When you insert, update, or delete records in your indexed tables, it’s important that the indexes are kept up to date as well as in the correct order. Each of these operations will take slightly longer when there are indexes on the table than when there are no indexes. If you’re going to be doing a lot of updates, consider dropping your indexes before the operation and re-creating them afterwards.

Conclusion

Indexes are a powerful tool at your disposal to boost performance of your application’s data retrieval. They help with querying and sorting the data, and when covering indexes are used correctly can make lookups incredibly fast. Covering indexes take the whole idea to the next level.

Indexes also come with some overhead: they will use additional space, and can increase the time it takes to insert or update records because SQLite needs to keep the indexes up to date. It’s important to weigh the pros and cons before you add an index to your table.

This post touches on the basics of indexes in SQLite, but you can read a whole lot more about how SQLite’s query planner works and its use of indexes in the official documentation. Additionally, for the sake of making this post more approachable, I purposefully glossed over how the indexes are built and maintained. You can read more about how SQLite (and other databases) maintain indexes by learning about B-trees.

Finally, a lot of what we discussed in this post can be applied to how indexes work in other database systems.

If you enjoyed this post, please tap that heart button to recommend it to your followers. Also, you should check out the other posts in my series: “Squeezing Performance from SQLite”: