By: Ben Snaidero | Updated: 2017-07-07 | Comments (7) | Related: More > Indexing

Problem

I've read many SQL Server best practice documents that state every table should have a clustered index as opposed to leaving it as a heap structure with nonclustered indexes. Most of the documentation notes the administrative benefits of using clustered indexes. Are there any performance implications, either positive or negative, from implementing a clustered index on a table?

Solution

In order to test the performance of each of these types of table/index structures we will create two identical tables with one difference. One will have a primary key created with a clustered index and the other will have the primary key created using a nonclustered index so the table data will stay in the heap structure. Below is the T-SQL to create and load sample data into these tables.

-- Create table and indexes CREATE TABLE testtable ([col1] [int] NOT NULL PRIMARY KEY CLUSTERED, [col2] [int] NULL, [col3] [int] NULL, [col4] [varchar](50) NULL, [col5] uniqueidentifier); -- Load sample data into table DECLARE @val INT SELECT @val=1 WHILE @val < 5000000 BEGIN INSERT INTO testtable (col1, col2, col3, col4, col5) VALUES (@val,round(rand()*100000,0), round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid()) SELECT @[email protected]+1 END GO -- Create sample table and indexes CREATE TABLE testtable2 ([col1] [int] NOT NULL PRIMARY KEY NONCLUSTERED, [col2] [int] NULL, [col3] [int] NULL, [col4] [varchar](50) NULL, [col5] uniqueidentifier); INSERT INTO testtable2 SELECT * FROM testtable;

One thing I noticed right away was that the heap table was using more space since the table and index are separate structures. With the clustered index the data is stored in the leaf pages of the index so less space is used. Below is a chart that shows how much space is used by each object after the initial load.

TableName IndexName Used (KB) Reserved (KB) Rows testtable PK__testtabl__357D0D3E3D086A66 257952 257992 4999999 testtable2 HEAP 256992 257032 4999999 testtable2 PK__testtabl__357D0D3F2CBA35D8 89432 89608 4999999

Now that we have some data loaded let's perform another data load and monitor the performance using SQL Profiler and check the space usage after the inserts complete. Below is the T-SQL to load another 100,000 records each table.

-- insert when all pages are full DECLARE @val INT SELECT @val=5000000 WHILE @val < 5100000 BEGIN INSERT INTO testtable (col1, col2, col3, col4, col5) VALUES (@val,round(rand()*100000,0), round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid()) SELECT @[email protected]+1 END GO DECLARE @val INT SELECT @val=5000000 WHILE @val < 5100000 BEGIN INSERT INTO testtable2 (col1, col2, col3, col4, col5) VALUES (@val,round(rand()*100000,0), round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid()) SELECT @[email protected]+1 END GO

Looking first at the space usage as one would expect (since the pages were all full ie. nothing has been deleted from the table) each object is using proportionately more space. Both the heap and clustered index increased in size by almost the same amount.

TableName IndexName Used (KB) Reserved (KB) Rows testtable PK__testtabl__357D0D3E3D086A66 263128 263176 5099999 testtable2 HEAP 262392 262472 5099999 testtable2 PK__testtabl__357D0D3F2CBA35D8 91216 91272 5099999

A little more interesting is the SQL Profiler output. Due to the fact that there are two objects to be updated the nonclustered index and heap table insert requires slightly more CPU and performs more reads and writes than the clustered index insert. It also takes slightly longer to complete.

Index Type CPU (ms) Reads Writes Duration (ms) CLUSTERED 3500 304919 654 11288 HEAP 3890 406083 904 11438

Let's now randomly delete some data and then run another 100,000 record insert to see if having free space on some of the pages in the table makes any difference with performance. Below is the T-SQL to perform both the delete and the data load.

-- remove 1000000 random records from each table DELETE FROM testtable WHERE col1 in (SELECT TOP 1000000 col1 FROM testtable ORDER BY newid()); DELETE FROM testtable2 WHERE col1 not in (SELECT col1 FROM testtable); GO -- insert when there is free space in pages DECLARE @val INT SELECT @val=5100000 WHILE @val < 5200000 BEGIN INSERT INTO testtable (col1, col2, col3, col4, col5) VALUES (@val,round(rand()*100000,0), round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid()) SELECT @[email protected]+1 END GO DECLARE @val INT SELECT @val=5100000 WHILE @val < 5200000 BEGIN INSERT INTO testtable2 (col1, col2, col3, col4, col5) VALUES (@val,round(rand()*100000,0), round(rand()*100000,0),'TEST' + cast(@val AS VARCHAR), newid()) SELECT @[email protected]+1 END GO

Looking at the SQL Profiler output we see a much larger performance gap between the two types of structures this time with the table with the clustered index having much better performance.

Index Type CPU (ms) Reads Writes Duration (ms) CLUSTERED 3562 304859 653 10334 HEAP 4973 422142 7053 13042

This difference is most likely due to the fact that when inserting data into a heap table the database engine will search for empty space within each page to store the new data being inserted. This is done since the data in the heap table is unsorted so it can go anywhere. The table with the clustered index simply has data appended to the end of the clustered index since the primary key is increasing. Let's also take a look at the space usage for the table. If the heap structure is indeed reusing space it should not have grown any more since we deleted 1,000,000 records and only inserted 100,000. Below is the table sizes after this last batch of inserts completed. We can see here that the size of the heap table has not changed confirming our earlier assumption.

TableName IndexName Used (KB) Reserved (KB) Rows testtable PK__testtabl__357D0D3E3D086A66 268304 268360 4199999 testtable2 HEAP 262392 262472 4199999 testtable2 PK__testtabl__357D0D3F2CBA35D8 93008 93064 4199999

Summary

In this test case we showed that using a clustered index gave us better performance in every category that we measured, CPU, IO and total duration. The only side effect of using a clustered index was that it did use a little bit more space. This space could also be reclaimed by rebuilding the index during a maintenance window (or online if you are running Enterprise Edition). In any case, before implementing any index changes in production you should always test in your own environment to ensure you also see a positive improvement in performance.

Next Steps

About the author