I recently noticed that my Windows 7 home laptop was running slower than usual. It seemed to be spending a lot of time waiting on disk I/O. When I looked at Task Manager it showed that my laptop was caching about one fifth as much disk data as normal, but the cause of this was not obvious.

Part of my job is to be a performance expert – to investigate both mundane and esoteric performance problems. So when I realized that I had a serious and systemic problem I knuckled down and brought all the tools of my trade to bear, and ultimately I found the culprit.

The story you are about to hear is true. Only the precise values of cached, available, and free memory have been changed to protect the innocent.

Spoiler alert: the Western Digital My Book Live backup drive that I’d purchased from Costco was the culprit.

Even better spoiler alert: Western Digital contacted me nine days after I posted this entry, and took the bug quite seriously. About four months later they published a fix. The story has a happy ending, and you can change the world through blogging.

Caching counts

In order to understand this mystery it is necessary to understand the importance of the Windows disk cache. Having lots of memory in your computer is important to allow running lots of programs without hitting the page file, but also to allow lots of data to be cached in memory. Hard drives, with their spinning physical platters, are among the slowest components in a computer, and laptop drives are the slowest type of hard drives. Windows caches data from the hard drive so that it will be available instantly, instead of requiring many milliseconds for each read. The more data that is cached, the less frequently the disk drive will be needed, and the faster your computer will be. Because of the Windows disk cache there is no such thing as too much memory. My work machine has 32 GiB which means that I typically have over 24 GiB of disk data cached, and this allows me to bounce around between different programs and different projects with minimal disk I/O.

Windows (starting with Vista) has systems in place whose only purpose is to fill up the disk cache with data that its logging shows you are likely to need. SuperFetch watches your usage patterns and knows how to prefetch data when you launch an application. It also prefetches data when your system is idle.

These features were confusing when Windows Vista first came out. Users would complain that Vista was a pig because it used so much memory, because Task Manager would show very little free memory. But it turns out that this is a good thing. Free memory is a total waste. Free memory is not doing anything useful. A well running Windows machine should have free memory near zero, and cached memory (the Windows disk cache) nice and high. Cached memory is fulfilling a useful purpose, it makes your computer run faster, and it can be repurposed (turned into free memory to satisfy allocations) the instant that it is needed.

A healthy machine

When Windows is running well Task Manager (available by right-clicking on the taskbar) should show Physical Memory statistics similar to those shown to the right: lots of cached memory, lots of available memory, and very little free memory. If available memory is low (less than a few thousand MiB) then it is worth purchasing more memory so that Windows will have more space for running programs and caching data. If free memory is high (more than a few hundred MiB) then either Windows has not had time to populate the disk cache with useful data or else something is wrong.

An unhealthy machine

On my laptop the performance tab of Task Manager looked more like this:

Looking at the area circled in purple shows that I’ve got lots of available memory, but my disk cache is smaller than expected for a machine with 8 GiB of RAM, and free memory is unusually high. On my 8 GiB laptop I normally have about 4-5 GiB of data cached, but when I noticed this problem my laptop was typically only caching about 1-2 GiB of data.

Looking at the history of memory usage shows why cache memory is low and free memory is high. The area circled in red shows three brief spikes of memory usage. Each of those spikes represent a transient 4 GiB memory allocation, and Windows satisfied the first of those allocations by evicting 4 GiB of useful data from the file cache, leaving it as wasteful freed memory at the end.

Task Manager helped me to understand the structure of the problem, but it was no use at finding the culprit. Its Peak Working Set column showed nothing that could explain the spikes. It was time to invoke my favorite system profiling tool.

Investigating with xperf

I turned to the most powerful performance investigation tool that I know of: ETW tracing (aka xperf) using the Windows Performance Toolkit. This is a free suite of tools created by Microsoft that lets you profile all aspects of a Windows machine’s performance, from CPU usage, to disk I/O, to memory allocations, and more. Xperf is not an easy tool to use, but it rewards expert users with almost unlimited insight into the performance of Windows.

I decided to use the new Windows Performance Recorder UI (wprui) to record a trace. I selected all of the options that were related to memory allocations and told it to log to memory (now I would use UIforETW with VirtualAlloc stacks always checked in the Settings dialog, POOL in Extra kernel flags, and PoolAlloc in Extra kernel stackwalks). In this mode wprui or UIforETW continually monitors your system, writing detailed performance information to circular buffers in memory. When you notice a performance problem you click the Save button and it saves the buffers to disk. Thus, it is a retroactive profiler that can at any given time record the last ~30 seconds of system activity. I generally leave in-memory tracing running continuously so that I can always catch rare performance problems, without having to worry about reproducing them. When this screenshot was taken I’d had wprui running for most of a week, but the ‘Time’ field wraps after one day.

Caught in the act

The memory spikes were mostly happening at random times and were easy to miss, so I wasn’t having any luck recording a trace. However I eventually realized that I was getting a memory spike every time that my laptop resumed from standby. That made things much easier. All I had to do was have wprui recording to memory, then suspend and resume my laptop, log in, and click ‘Save’. Done. Then I loaded the trace into WPA (Windows Performance Analyzer) and looked for memory related anomalies. This can be tricky because you often don’t know what you are looking for or where you will find it, but when you look at the right graph it is sometimes quite obvious. Can you see the oh-so-subtle memory spike in the graph below?

So far we are seeing similar data to what Task Manager showed us, but with xperf we can dig deeper. The trace contains call stacks for, among other things, every pool memory allocation. After selecting the appropriate range (from the middle of the spike to the left of it) we can get a summary table. By arranging the summary table columns appropriately, sorting by size, and drilling down into the AIFO type (Allocated Inside the selected time range, Freed Outside), we quickly find the 4,294,967,296 byte allocation that is causing all the pain. And we have a call stack.

My first assumption upon seeing this call stack was that this was a Windows bug. I’d recently installed some Windows updates and I guessed that one of them was causing this bug. I was wrong, but it will take a bit more digging to see that.

The call stack shows that FstubAllocateDiskInformation is making two calls to ExAllocatePoolWithTag, and one of them is for a lot of memory. It’s time to look at FstubAllocateDiskInformation.

The truth is in the code

The simplest way to look at FstubAllocateDiskInformation is to use Local Kernel Debugging. This lets me use windbg to grab a snapshot of the kernel’s memory without actually breaking in to the kernel. Local kernel debugging is limited in that you can’t set breakpoints or step through code, but it lets you view disassembly, and it is simpler than setting up cables and a second machine to allow live kernel debugging. It was good enough.

Obviously the ‘uf’ command is what we need to dump the disassembly of this function. I’ve reformatted the results and put the crucial assembly language below, with a few lines bold faced and a few comments added:

lkd> uf ntkrnlmp!FstubAllocateDiskInformation

nt!FstubAllocateDiskInformation:

mov qword ptr [rsp+8],rbx

mov qword ptr [rsp+10h],rbp

mov qword ptr [rsp+18h],rsi

push rdi

sub rsp,20h

mov rsi,rdx

mov rbp,rcx

mov r8d,42747346h ; pool tag of FstB

mov edx,40h ; allocation size

xor ecx,ecx ; flags

call nt!ExAllocatePoolWithTag

test rax,rax

je Exit1 nt!FstubAllocateDiskInformation+0x35:

lea rdx,[rax+10h]

mov rcx,rbp

mov rbx,rax

call nt!FstubGetDiskGeometry

test eax,eax

js Exit2 nt!FstubAllocateDiskInformation+0x4c:

mov ecx,dword ptr [rbx+24h]

test ecx,ecx ; compare SectorSize to zero

je Exit3 nt!FstubAllocateDiskInformation+0x57:

mov rax,qword ptr [rbx+28h]

test rax,rax

je Exit3 nt!FstubAllocateDiskInformation+0x64:

xor edx,edx

mov dword ptr [rbx+8],ecx

mov r8d,42747346h ; pool tag of FstB

div rax,rcx

mov rdx,rcx ; SectorSize/allocation size

mov ecx,4 ; flags

mov qword ptr [rbx+38h],rax

mov qword ptr [rbx],rbp

call nt!ExAllocatePoolWithTag

test rax,rax

je Exit4

A careful reading of the code above isn’t required in order to follow along. The code contains two calls to ExAllocatePoolWithTag, one for 0x40 bytes and one for a variable amount. Both use a pool tag of 42747346h, which is a four-byte code for ‘FstB’, the file system pool. A full analysis required looking at the input parameter types and their layouts, but I also went to the reactos site and looked at their version of the source code. It was quite helpful for confirming my analysis of the assembly language. The crucial lines can be paraphrased as:

FstubGetDiskGeometry(DiskInformation);

…

if (DiskInformation->SectorSize != 0)

ExAllocatePoolWithTag(DiskInformation->SectorSize);

In other words, FstubAllocateDiskInformation asks for the geometry of the disk (using FstubGetDiskGeometry), and then if the sector size is non-zero it allocates however many bytes the driver says is the sector size.

I briefly stopped to verify the sector size of my main disk:

C:\>fsutil fsinfo ntfsinfo c:

NTFS Volume Serial Number : 0x00b8457cb8457168

Version : 3.1

Number Sectors : 0x0000000048ad8ae2

Total Clusters : 0x000000000915b15c

Free Clusters : 0x00000000016ac4a5

Total Reserved : 0x0000000000000780

Bytes Per Sector : 512

Bytes Per Physical Sector : 512

…

and then, in a long-delayed flash of insight, I realized what the problem was. A few weeks earlier, around the time the performance problems first started, and around the time of the Windows Updates that I had suspected, I had purchased a backup drive. When I went to use this backup drive it had installed a driver. And this driver was the cause of all of my problems:

Problem solved!

I fired up device manager, found the problematic driver, uninstalled it, and without even a reboot my computer was fixed. I could suspend and resume my laptop with no memory spikes and without my disk cache getting purged. Bliss! Now my laptop was acting like a computer with 8 GiB, instead of like a computer with 4 GiB.

I’m not a device driver expert – user-mode is my domain – but I can imagine what must be going on. The device driver appears to be responsible for filling in a DISK_GEOMETRY structure when responding to IOCTL_DISK_GET_DRIVE_GEOMETRY queries. It’s not clear to me what the driver is supposed to do if this information is not available. It could return a failure code, or it could return a structure initialized with zeroes, or it could initialize the fields – including BytesPerSector – with 0xFFFFFFFF. And that’s what this driver does. Since SectorSize is an unsigned value and since pool allocations are rounded up this results in a pool allocation of exactly 4 GiB. This is an easy mistake to make, but the Western Digital developers should have noticed that the driver was causing problems.

The Windows kernel carefully tests for BytesPerSector being equal to zero, but it doesn’t do any other sanity checking to see if BytesPerSector is one of the small number of reasonable values. Thus, the kernel missed an opportunity to avoid this problem.

If you own this device I recommend returning it. However that may not be sufficient. I was hitting this performance problem even when I was miles away from – and not connected to – the backup drive. It may be necessary to run device manager and manually uninstall the driver, but I’m not sure.

I also recommend contacting Western Digital and demanding that they release a fix.

Postscript

When I found the problem I mostly just wanted to return the drive, but first I made sure I had the latest drivers on my machine and all the latest updates on the NAS, to make sure my results were valid. I then deleted my data and wiped the drive. The Full Factory Restore option securely erases the drive, which is important when returning a drive that had several full backups of my laptop on it. And, of course I uninstalled the drivers, to make sure the problem went away.

Once again ETW let me investigate the mysterious and find the root cause in a truly scientific manner. I want more developers to use evidence-based-techniques to investigate performance problems rather than guessing.

I feel sorry for regular consumers who naively purchased this drive and have little hope of avoiding this problem. If they notice the performance problem they are unlikely to guess that it is correlated to the new backup drive.

I contacted Microsoft through some personal contacts and they have made sure that the appropriate teams know about this issue. I contacted Western Digital through regular support channels on August 29th, 2012 but have not heard back from them yet – they are now aware of the issue because one of their engineers saw this article. Support replied to my request on September 7th, 2012 – nine days later.

Other stories

If you liked this article then you might like my other forensic performance investigations with xperf:

For more information on using xperf to do your own investigations, look at ETW central.