You’re doing it wrong.

You didn’t mean for it to be that way, I know. But the truth is, things happen. When they do, I am brought in to find out what is happening and what has gone wrong. I see repeating patterns in the database issues I am asked to investigate. I keep track of them as I see them, and I have been able to place them into some general buckets for you here.

When it comes to database performance, these are the five factors I see causing end users the most pain.

1. Memory Configuration

I’ve written before about memory configurations but this bears repeating: if you are relying on the default settings for Microsoft SQL Server then you are simply asking for trouble.

SQL Server (like other database systems such as Oracle and Sybase) needs to read data pages into its internal memory before they can be used. Of course your server needs memory to operate as well. When your database engine and your server are competing for the same memory resources, you get bad performance. You want your server and your database engine to be like dancing partners, and less like my kids fighting over the last cupcake.

It is often easy to spot when your database server is under memory pressure. The Available MBytes memory counter is the leading indicator that your server OS is seeing significant memory pressure (PRO TIP: if the amount of ‘available’ memory is low, that is often bad). From there you can investigate a bit more to see how much of the available server memory is being used by your database instance. For SQL Server, a quick query is this one from Glenn Berry (blog | @GlennAlanBerry):

```sql
-- Good basic information about OS memory amounts and state (Query 28)
SELECT total_physical_memory_kb, available_physical_memory_kb,
       total_page_file_kb, available_page_file_kb,
       system_memory_state_desc
FROM sys.dm_os_sys_memory WITH (NOLOCK)
OPTION (RECOMPILE);
-- You want to see "Available physical memory is high"
-- This indicates that you are not under external memory pressure
```

The best way to avoid this issue altogether is to configure the max memory setting for your instance. If you want to know what value to choose for your max memory setting, I would point you to this blog post by Jonathan Kehayias (blog | @SQLPoolboy); I like the formula he has been using, and it works better than any others I have used previously.
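Once you have settled on a number from a formula like his, the setting itself is a one-time change. A minimal sketch, with a placeholder value (26624 MB here is just an illustration; your number depends on total RAM and what else runs on the box):

```sql
-- Example only: the 26624 MB value is a placeholder, not a recommendation.
-- Compute the right value for your server before running this.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 26624;
RECONFIGURE;
```

The change takes effect immediately, with no instance restart required.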

2. Putting It All On One Disk

As a database administrator, I know that for years we have been advocating the use of dedicated drives for our data and log files. Not only is it good for performance, but it helps to mitigate disasters by keeping the transaction logs on a different drive than the data. We have even been able to convince server admins that we need a dedicated set of disks just for tempdb, for an additional performance gain. So why am I seeing a sudden increase in database servers using one disk for data, logs, and tempdb?

Virtualization.

It would seem that for some reason folks think that being virtual makes it OK to load up everything on one disk device. If you are lucky, that disk device is dedicated to your guest, but I am willing to bet it is actually shared with other guests as well. This leads to a LOT of disk saturation, an excessive amount of read and write activity. It also leads to my colleagues saying things like “hey, can you pass along my contact info to those guys? They’ll be needing some help soon.”

It is extra overhead to create extra devices, I know. But loading up everything on one big RAID-5 device isn’t the answer (those can fail, too). As a result of this over-provisioning of disk resources, I now tell my customers they can no longer expect an average disk read and write latency of 20ms or less. For many, those days are long past. These days I see averages closer to 40ms, more than twice what was previously considered acceptable.
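If you want to see where your own files fall against those 20ms and 40ms marks, one way (a sketch, using the file-stats DMV) is to compute average stall per read and per write for each database file:

```sql
-- Average read/write latency (ms) per database file since instance start
SELECT DB_NAME(vfs.database_id) AS [Database],
       mf.physical_name,
       CASE WHEN vfs.num_of_reads = 0 THEN 0
            ELSE vfs.io_stall_read_ms / vfs.num_of_reads END AS [Avg Read ms],
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS [Avg Write ms]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf
    ON vfs.database_id = mf.database_id
   AND vfs.file_id = mf.file_id
ORDER BY [Avg Read ms] DESC;
```

Keep in mind these numbers are cumulative since the instance last started, so they smooth over both the quiet nights and the painful spikes.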

You can get that performance back by using dedicated devices, just like you used to have dedicated drives.

3. Excessive Indexing

I often see database designs that contain WAY too many indexes. How many is too many? That depends, of course. The way I like to define “too many” is to first check whether there are duplicate indexes. If I have duplicates, then I have too many indexes. Simple enough in theory, right?
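As a rough sketch of that first check (not a complete duplicate-index script; this compares only key columns and ignores included columns, sort order, and filters), you can build each index’s key-column list from the catalog views and look for tables where two indexes share one:

```sql
-- Candidate duplicate indexes: same table, identical key column list.
-- Investigate before dropping anything; this ignores included columns,
-- column sort order, and filtered-index predicates.
WITH IndexKeys AS (
    SELECT i.[object_id], i.index_id, i.name,
           key_cols = STUFF((SELECT ',' + c.name
                             FROM sys.index_columns AS ic
                             JOIN sys.columns AS c
                               ON c.[object_id] = ic.[object_id]
                              AND c.column_id = ic.column_id
                             WHERE ic.[object_id] = i.[object_id]
                               AND ic.index_id = i.index_id
                               AND ic.is_included_column = 0
                             ORDER BY ic.key_ordinal
                             FOR XML PATH('')), 1, 1, '')
    FROM sys.indexes AS i
    WHERE i.index_id > 0
      AND OBJECTPROPERTY(i.[object_id], 'IsUserTable') = 1
)
SELECT OBJECT_NAME(a.[object_id]) AS [Table],
       a.name AS [Index A],
       b.name AS [Index B],
       a.key_cols
FROM IndexKeys AS a
JOIN IndexKeys AS b
  ON a.[object_id] = b.[object_id]
 AND a.key_cols = b.key_cols
 AND a.index_id < b.index_id;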

The way I see a lot of duplicate indexes being introduced into systems is twofold: using the Database Engine Tuning Advisor (DTA), or using the missing index hints from SQL Server Management Studio (SSMS). I can spot the use of DTA indexing easily, as those indexes will have names prefixed with ‘_dta’. The indexes created from the hints shown in SSMS do not have a similar default naming convention, but when you come across a table with a dozen indexes that are essentially duplicates of one another, you can start to get a sense that someone has been relying on advice from one tool or the other.

The idea behind an index is a simple one: you want to find a piece of information faster than scanning each and every page of data on disk. When your database is only used for reads, additional indexes are not likely to be much of a problem for query performance. The problems really kick in with the DUI (Delete, Update, Insert) statements, as each of those statements needs to touch every index on the table. That’s when performance slows to a crawl, and it can also result in deadlocking.

You can follow the link from earlier in this section for my script on finding duplicate indexes. Here is another piece of code from Glenn Berry (blog | @GlennAlanBerry) that will help you determine whether you have indexes seeing more write activity than reads (PRO TIP: that’s not ideal):

```sql
-- Possible Bad NC Indexes (writes > reads)
SELECT OBJECT_NAME(s.[object_id]) AS [Table Name],
       i.name AS [Index Name],
       i.index_id,
       user_updates AS [Total Writes],
       user_seeks + user_scans + user_lookups AS [Total Reads],
       user_updates - (user_seeks + user_scans + user_lookups) AS [Difference]
FROM sys.dm_db_index_usage_stats AS s WITH (NOLOCK)
INNER JOIN sys.indexes AS i WITH (NOLOCK)
    ON s.[object_id] = i.[object_id]
   AND i.index_id = s.index_id
WHERE OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1
  AND s.database_id = DB_ID()
  AND user_updates > (user_seeks + user_scans + user_lookups)
  AND i.index_id > 1
ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC;
-- Consider your complete workload
-- Investigate further before dropping an index
```

Get rid of the indexes you don’t need. You will see a reduction in your I/O consumption and likely better performance. Your database and its backup files will be smaller, too.

4. Plan Cache Bloat

When you submit a query for execution the database engine will check to see if a plan already exists in memory. If it does, great! This saves time as the query optimizer will not need to create a new plan. The area of memory that contains the query plans is named the plan cache. With each new release of SQL Server we are given more ways to dive into the internal memory stores and thus I find the topic of plan cache and plan cache bloating coming up more often these days.

There is a lot of information available about the plan cache, about troubleshooting plan cache issues, and a great article written by Bob Beauchemin (blog | @bobbeauch) on how data access code affects query performance. That’s a lot of reading, so I am going to do my best to simplify things for you a bit here.

You don’t want to see an excessive number of query plans in your plan cache that are only used once (i.e., ‘ad-hoc’ plans) because these plans take up memory that could otherwise be used to store data pages instead. You can determine for yourself how your plan cache is allocated by using this query written by Robert Pearl (blog | @PearlKnows):

```sql
WITH CACHE_ALLOC AS (
    SELECT objtype AS [CacheType],
           COUNT_BIG(objtype) AS [Total Plans],
           SUM(CAST(size_in_bytes AS decimal(18,2)))/1024/1024 AS [Total MBs],
           AVG(usecounts) AS [Avg Use Count],
           SUM(CAST((CASE WHEN usecounts = 1 THEN size_in_bytes ELSE 0 END) AS decimal(18,2)))/1024/1024 AS [Total MBs - USE Count 1],
           CASE WHEN (GROUPING(objtype) = 1) THEN COUNT_BIG(objtype) ELSE 0 END AS GTOTAL
    FROM sys.dm_exec_cached_plans
    GROUP BY objtype
)
SELECT [CacheType], [Total Plans], [Total MBs], [Avg Use Count],
       [Total MBs - USE Count 1],
       CAST([Total Plans]*1.0/SUM([Total Plans]) OVER() * 100.0 AS decimal(5,2)) AS Cache_Alloc_Pct
FROM CACHE_ALLOC
ORDER BY [Total Plans] DESC;
```

That helps you discover plan cache issues after they have already happened. How do you discover whether you are having problems right now? That’s simple, too: check for an excessive number of compilations. After all, you can’t have a lot of these ad-hoc plans being stored in the plan cache without them all being compiled first. Robert comes to the rescue again with this bit of code for us:

```sql
SELECT t1.cntr_value AS [Batch Requests/sec],
       t2.cntr_value AS [SQL Compilations/sec],
       plan_reuse = CONVERT(decimal(15,2),
           (t1.cntr_value*1.0 - t2.cntr_value*1.0)/t1.cntr_value*100)
FROM master.sys.dm_os_performance_counters t1,
     master.sys.dm_os_performance_counters t2
WHERE t1.counter_name = 'Batch Requests/sec'
  AND t2.counter_name = 'SQL Compilations/sec';
```

You can take some quick measurements to see if your system is having an excessive number of compilations. The commonly cited guideline is that compilations should account for no more than 10% of your batch requests.

The best way I know to avoid plan cache issues is to write your code with query plan reuse in mind.
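The single biggest win here is usually parameterization. A minimal illustration (the `dbo.Orders` table and `@CustomerID` parameter are hypothetical, just to show the shape):

```sql
-- Ad-hoc: every distinct literal can compile its own single-use plan
-- SELECT * FROM dbo.Orders WHERE CustomerID = 42;

-- Parameterized: one plan, reused for every value passed in
EXEC sp_executesql
     N'SELECT * FROM dbo.Orders WHERE CustomerID = @CustomerID',
     N'@CustomerID int',
     @CustomerID = 42;
```

Stored procedures and parameterized calls from your application’s data access layer get you the same reuse.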

5. One Size Does Not Fit All

My experience as a database administrator is that most systems start out as a way to store data. They are designed with one purpose in mind: to stuff as much data as possible, as quickly as possible, into the tables. If we are lucky, these databases are designed to be normalized, which helps with the insertion of data and protects data integrity. These are often called online transaction processing (OLTP) systems.

Eventually someone else comes along and says “what a lovely bunch of data you have there, may I cursor through all your rows?” They start asking for exports of the data, or they start writing reports directly against the data. They want to mine every piece of that data looking for some piece of information that will help them make the best possible business decision. What they are doing is trying to use the data for analytical processing. They want an OLAP (online analytical processing) system.

Here’s the problem: OLTP is not the same as OLAP. When end users try to use a system designed for OLTP as an OLAP system, it leads to performance issues. For SQL Server, this is usually where locking and blocking rear their heads. It was not uncommon for me to see contention between users that wanted to insert data and users that wanted to generate reports. I had an alert built just to notify me when a session was blocked for more than five minutes. Can you imagine waiting that long and considering it part of your “normal” processing?

I can. I see it frequently when OLTP systems are being used for OLAP purposes.

What you want to do here is build a reporting solution for those OLAP users. That solution could be just a few tables optimized for reporting purposes or a full-blown SSRS solution. Another possible answer is to denormalize your data, if possible, but understand that this moves the performance issues from the reporting functions onto the DUI functions.

There you go: the five buckets that I see causing performance issues, and each one is something you control.

If only you weren’t doing it wrong.
