Sometimes when feeding data into a black box like MsDia140.dll it can be hard to know what code paths are taken and why. For example, let’s look at the function GSI1::readHash (see here to follow along in the source).

This function does some stuff I still don’t fully understand, so let’s walk through the process of gaining more understanding.

First we need a partially malformed PDB. This is easy to do since PDB files are sensitive to tiny changes, often in non-obvious ways. In particular, I’m going to work on the Publics stream. This is a fragment of a Multi-Stream File (aka MSF) which contains, among other things, publicly visible debug symbols for some program.

At the beginning of the stream, there is a structure which llvm-pdbdump is sadly cryptic about. Thankfully, llvm-pdbdump contains some sanity checks which seem to align well with the checks made by Microsoft’s code, so it’s at least easy to use the tool to verify what we’re spitting out.

readHash is responsible for decoding part of this data structure, which appears to be some kind of hash table for accelerating symbol lookups. Inside the code for readHash (see link above) there is a call to a pesky function called fixHashIn . By attaching WinDbg to a running copy of Dia2Dump.exe and setting liberal numbers of breakpoints, I traced a failure in my PDB generation code to this single function. fixHashIn is vomiting because I’m feeding it data it doesn’t like.

The first thing to note is that fixHashIn begins with a decrement instruction to decrease the value of one of its parameters. This parameter is supposedly the number of buckets in the hash table, or so I extrapolate from the source.