When you query sys.dm_os_wait_stats, or check your waits with sp_Blitz® or sp_BlitzFirst®, one of your biggest wait types is probably CXPACKET. Let’s talk about what’s really going on. Here’s a 15-minute video explaining it, or you can keep scrolling down to read:

Our SQL Server Stand-In: a Class.

Pretend that you’re a schoolteacher in a classroom with a couple dozen students. You’ve got a class project that you need to accomplish as quickly as possible. The project is to build a list of all of the distinct words in the book Moby Dick, along with the frequency of each word.

There’s three ways you could do it:

1. Hand the entire book to a single student and let them work on it until it’s done. This student is going to be miserable, and it’s going to take a heck of a long time for this project to finish. However, this leaves the rest of the students to hang out and have fun playing Words With Friends.

2. Divide the pages of the book out between all of the students. Tear the book into a bunch of stacks of pages, distribute that work, and let them start working simultaneously. As they turn in their work to you, they can each get freed up to do other stuff. You (the teacher) will combine their work together into one master list. Each student’s paperwork will include the word “whale”, for example, and you’ll need to add up all of those “whale” references on one master list.

3. Divide the pages between a group of students (but not all of them.) Instead of putting all 24 students to work on the same project, you might tear the book into 8 sections and assign that work to 8 of your students. The other 16 students would be free to veg out in case another assignment comes in.

How This Relates to SQL Server

You’ve probably already figured out that the work involved is a query. The teacher is the master thread running a query. The students are CPU cores. Microsoft SQL Server can break a single query out across multiple cores as long as the work can be parallelized. (Paul White explains what kinds of work isn’t parallelized.)

When you pass in a big query, SQL Server will consider parallelizing it across more cores – up to your Maximum Degree of Parallelism (MAXDOP).

Read that sentence again, because it’s really important. We’re going to come back to that.

The more students we get involved, the more work we have to do to coordinate their efforts and combine their results. The students, being rowdy goons, get frustrated when they get done with their work quickly and have to sit around waiting for other students to finish. This isn’t really a bottleneck per se – the students could go off and do other work – but they like to complain about how they had to wait around for the slow kids.

That complaining is CXPACKET – Class eXchange Packets. The class is turning in their packets, and complaining about the slow kids.

Just like kids complaining, CXPACKET by itself isn’t a problem. However, it’s an indication that something might be going wrong in this important sentence. Let’s read it again:

When you pass in a big query, SQL Server will consider parallelizing it across more cores – up to your Maximum Degree of Parallelism (MAXDOP).

How to Reduce CXPACKET by Setting MAXDOP

The default Maximum Degree of Parallelism – the number of students we’ll get involved in a project – is zero, which means unlimited. That made sense back in the days when our SQL Servers only had a couple of cores, but today, we’ve got huge, ginormous processors. (I’m not bragging, I’m just saying.)

Microsoft KB 2806535 gives MAXDOP recommendations, and the short story is that you should set it to the number of cores in one processor, up to 8. (I’m simplifying here.)

Microsoft is saying that in most cases, as you go beyond 8 students working on a single project, there’s a diminishing amount of returns. You’re better off leaving the rest of the students free to work on other projects.

There’s a ton of bad advice on the web that goes a lot further and says you should set MAXDOP to 1 for OLTP-oriented workloads. They’re saying that if your application’s database needs are mostly very small inserts, updates, and deletes, then you should constrain SQL Server to only use one student at a time for any project, no matter how big that project appears to be.

That advice is usually no good because it assumes your server will never, ever run a big query. Sure, in some environments like Microsoft SharePoint, MAXDOP = 1 does make sense – but the key is really knowing the query workloads well. In most of the OLTP environments I see, people still want to run big, ugly reports in production, and we can’t stop them. When they do hit the server hard with those queries, I want to throw a few cores at the problem to make it finish faster. The key is knowing that there’s another knob we can tweak.

Defining Big Queries: Cost Threshold for Parallelism

Remember that really important sentence? Let’s read it again:

When you pass in a big query, SQL Server will consider parallelizing it across more cores – up to your Maximum Degree of Parallelism (MAXDOP).

What exactly is a big query? Turns out we can actually tell SQL Server by setting Cost Threshold for Parallelism – the minimum cost of a query before SQL Server will consider dividing the work across multiple cores. It’s set in exactly the same place as MAXDOP – in SSMS, right-click on the server name and click Properties, Advanced.

The default Cost Threshold for Parallelism is 5 – but 5 what? Books Online says:

The cost refers to an estimated elapsed time in seconds required to run the serial plan on a specific hardware configuration.

That’s technically correct, but the words “specific hardware configuration” are a little tricky. They refer to an imaginary machine with an imaginary amount of CPU power and storage throughput. It’s like how we measure cars in horsepower. If you put together a tug-of-war between a Jaguar XFR-S and 550 horses, it’s not like it would be a dead heat. These are just numbers.

The real thing to know is that a query cost of 5 is a lot like 5 horsepower – it’s pretty darned small. With the defaults, SQL Server will aggressively try to parallelize everything.

The Real Way to Reduce CXPACKET Waits

Set MAXDOP per Microsoft’s recommendations in KB 2806535.

Set Cost Threshold for Parallelism in a way that lets your small queries stay single-threaded, but still gets more students involved when there’s a really big project. To do that, you have to…

Learn your workload. Use SQL Server’s plan cache to find the most resource-intensive queries on your system, and then analyze the costs of those queries. On one of my servers, for example, I know that we get hit with a lot of OLTP queries with a cost of around 30-40, and I don’t want those to go parallel. However, we also get a lot of reporting queries that scan the entire sales detail table, and those always have a cost of around 200 or higher.

This is a big topic, and tuning is different on a server by server basis. To get trained on learning your workload and taking the right steps for it, check out our SQL Server training videos.

What If Performance Gets Worse?

After setting Cost Threshold and MAXDOP, it’s possible that performance could suddenly get worse. You might have queries with a cost in the range of 5 to 50 that used to go parallel, and now suddenly go single-threaded.

Use sp_BlitzCache® to identify the top 10 most resource-intensive queries, making sure to run it with @SortOrder = ‘CPU’, then reads, then duration, looking at your SQL Server’s plan cache overall. Look for queries with a cost between 5 and 50, and do query tuning or index tuning on those queries.

It’s also possible that 50 might not be the right number: say you’ve got a query with an estimated cost of 40, and it’s actually a wildly underestimated cost. SQL Server is making a plan that doesn’t take into account how much data actually comes back. In that case, you can either tune the query, or lower Cost Threshold to 39 to let that query go parallel.

Wanna learn more?

Check out our online training classes.