My kingdom for some symbols

I spend a large portion of my time at work trying to make things faster and less crashy. Usually the problems I investigate are in our own code so I have full information – source code and symbols. However, sometimes the problems are at least partially in some other company’s code, and the task gets trickier.

This article was originally posted on #AltDevBlogADay.

Note that 64-bit binaries are also needed in order to do stack walking, so symbol servers that only provide PE files actually do have some value. That’s because 64-bit stack walking uses unwind metadata stored in the PE file, instead of the linked list of stack frames that is used for stack walking in 32-bit processes. Therefore it is good news that AMD and NVIDIA both now support symbol servers for downloading their PE files (binaries), even though symbols are not hosted on these symbol servers. Note also that using http: rather than https: for symbol servers is a really bad idea.
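To make the 32-bit half of that comparison concrete, here is a minimal sketch of walking a linked list of stack frames. It is a toy simulation, not a real debugger: the `memory` dictionary, the addresses in it, and the function name `walk_frame_chain` are all made up for illustration. It assumes the conventional layout where the frame pointer points at the caller’s saved frame pointer and the return address sits four bytes above it.

```python
def walk_frame_chain(memory, frame_pointer, max_frames=64):
    """Follow a 32-bit style frame-pointer chain and collect return addresses.

    memory is a hypothetical dict mapping addresses to 32-bit values.
    """
    return_addresses = []
    while frame_pointer != 0 and len(return_addresses) < max_frames:
        saved_frame_pointer = memory[frame_pointer]          # caller's frame pointer
        return_addresses.append(memory[frame_pointer + 4])   # return address
        # The stack grows down, so each caller's frame must be at a higher
        # address. Anything else means the chain is corrupt, so stop.
        if saved_frame_pointer != 0 and saved_frame_pointer <= frame_pointer:
            break
        frame_pointer = saved_frame_pointer
    return return_addresses


# A toy stack with three frames; a saved frame pointer of 0 ends the chain.
toy_memory = {
    0x1000: 0x1010, 0x1004: 0x401000,
    0x1010: 0x1020, 0x1014: 0x402000,
    0x1020: 0x0000, 0x1024: 0x403000,
}
stack = walk_frame_chain(toy_memory, 0x1000)
print([hex(address) for address in stack])
```

Nothing in this walk needs the binary itself, only the stack memory. A 64-bit walk cannot work this way because there is no guaranteed chain to follow, which is why the PE files have to be available.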

Mandatory disclaimer: this post represents my opinion, not that of my employer.

A call stack we can believe in

For example, last week I investigated a Visual Studio hang. This intermittent hang had been bothering me for months and I finally decided to record an xperf trace of the hang and investigate. The details will be the subject of another post, but one vital clue was this call stack from the xperf CPU scheduling summary table:

The call stack is entirely in Microsoft code. It starts in Visual Studio and ends up in Windows, and this call stack shows that Visual Studio hung for 2.585 seconds while trying to CreateFileW so that it can GotoEntry in the CResultList. Even though I know nothing about the Visual Studio architecture, that was enough information to let me understand the problem, and I then changed our project files in order to completely avoid this hang in the future. Shazam!

The reason I was able to diagnose this problem is that Microsoft publishes symbols for most of its code on a public symbol server. Symbols are published for Windows, Visual Studio, and much more, and this often lets me fix performance problems and crashes even when they are entirely separate from our code. Yay Microsoft!
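For anyone who has not set this up, debuggers and profilers on Windows typically find the symbol server through the _NT_SYMBOL_PATH environment variable. The sketch below just builds that string. The helper name `build_symbol_path` and the cache directory `C:\symbols` are assumptions for illustration, so pick whatever local cache location suits your machine.

```python
MICROSOFT_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir, server_url):
    """Build an _NT_SYMBOL_PATH entry of the form srv*<local cache>*<server>."""
    # Refuse plain http: downloaded symbols and binaries should not be
    # fetched over an unencrypted connection.
    if not server_url.lower().startswith("https://"):
        raise ValueError("symbol server URL should use https")
    return f"srv*{cache_dir}*{server_url}"


# C:\symbols is a hypothetical local cache directory.
symbol_path = build_symbol_path(r"C:\symbols", MICROSOFT_SYMBOL_SERVER)
print(symbol_path)
```

Setting _NT_SYMBOL_PATH to the printed value before launching the tool should be enough for it to download and cache Microsoft’s public symbols on demand.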

A call stack that knows how to keep its secrets

Another example, not quite so happy, is demonstrated by this call stack. This is sampling profiler data from a thread that is in our game:

Huh. This thread sure is using a lot of CPU time. In our process. I wonder what it’s doing? Except for the two out of 1,036 samples that hit in Windows functions I can only tell that it is NVIDIA code that is executing – there are no indications as to what it is doing.

I don’t mean to pick on NVIDIA here. Well, to be more accurate, I don’t mean to just pick on NVIDIA. This is a problem with all three major graphics vendors – NVIDIA, AMD, and Intel. None of them share symbols with the public and this leaves game developers with a significant problem. When a crash occurs deep in graphics driver code (a not uncommon occurrence) we are helpless. When a frame glitch occurs deep in graphics driver code (also quite common) we are helpless. And when game startup includes excessive memory allocations or CPU time deep in graphics driver code… we are helpless.

You can’t handle the truth

I’ve been told by some graphics vendors that having symbols would not be valuable to game developers, and might even be confusing. Game developers couldn’t possibly understand their cryptic function names, and might misinterpret them.

Poppycock.

I’ve solved dozens of performance problems in other people’s code, with just symbols to guide me. Having symbols has never been confusing, and has almost always saved me time.

If I had symbols for the graphics drivers then I could solve some problems on my own. I could recognize patterns in the crashes and performance problems that I see. I could give more precise suggestions and bug reports to the graphics vendors. I could more easily figure out what is happening in my code that is causing problems in their code.

As it is I can do almost nothing. Significant CPU time and memory is being consumed in my game’s process and I don’t have symbols to help understand why.

Call to action

If you’re a game developer, ask the graphics vendors that you work with for symbols. They’ll say no, but it’s still important to ask, in order to remind them of the importance of this issue. After they say no, be sure to send them all of the crash dumps and xperf traces where they are a factor and insist that they help you, since they won’t let us help ourselves. And don’t forget to share your stories and needs for symbols in the comments.

If you’re a graphics vendor – please release symbols. I’m confident that if you do it will let us make better games on your hardware, while saving time for your support team, and for game developers. I know that deciding to share symbols is hard, because symbols reveal a lot. I get that. But not sharing symbols with anyone is counterproductive.

Final disclaimer

I own stock in Microsoft, Intel, and AMD, but not NVIDIA. I hope it has not affected my impartiality. You decide.