For as long as I can remember talking about video cards and GPU performance at AnandTech, there has been debate over the type of benchmarks used to represent that performance. In the old days, the debate was mostly manufacturer driven. Curiously enough, the discourse usually fired up when one manufacturer was at a significant deficit in GPU performance. NVIDIA made a big deal about moving away from timedemos and average frame rates during the early GeForce FX (NV30) days, when its cards might have delivered a decent gaming experience but were slaughtered in most benchmarks. Even Intel advocated for a shift away from most CPU bound gaming benchmarks back during the early years of the Pentium 4 - again, for obvious reasons.

It’s a shame that these revolutions in gaming performance testing were always associated with underperforming products (and later dropped once the product stack improved in the next generation or two). It’s a shame because there has always been merit in introducing additional metrics in order to provide the most complete picture when it came to gaming performance.

The issue lay mostly dormant over the past several years. Every now and then there’d be a new attempt to revolutionize GPU performance testing, but most failed to gain widespread traction for one reason or another. Broad repeatability, one of the basic tenets of the scientific method, was usually cast aside in pursuit of a lot of these new attempts at performance testing - which ultimately limited acceptance.

A year and a half ago, Scott Wasson over at the Tech Report did something no one since Dr. Pabst was able to do: he actually brought about a revolution in the 3D game benchmarking scene.

The approach seemed ridiculously simple - we’ve all had the tools for so very long. Scott used FRAPS to record frame times, and would calculate how long every frame in a benchmark took to render. By focusing on individual frame latencies, Scott’s method could better characterize the little hiccups and stutters that would get smoothed out in an average frame rate. With the new method came a bunch of nifty graphs, and the world changed.

The methodology wasn’t perfect, as FRAPS lacks a holistic view of the 3D rendering pipeline, but it did reveal some surprising issues (in addition to spawning further work that uncovered even more issues on the multi-GPU front). Interestingly enough, many of the issues uncovered by this focus on frame times/latency seemed to primarily impact AMD hardware.

AMD remained curiously quiet as to exactly why its hardware and drivers were so adversely impacted by these new testing methods. While our own foray into evolving GPU testing will come later this week, we had the opportunity to sit down with AMD to understand exactly what’s been going on.

Although neither strictly a defense nor merely an explanation of what we’ve been seeing over the past year, AMD wanted to sit down and better explain their position. This includes both why AMD’s products have been impacted in the manner they were, and why at the same time (and not unlike NVIDIA) AMD is worried about FRAPS being given more weight than it should be. Ultimately AMD believes that it’s to the benefit of buyers and journalists alike to better understand just what is happening, why it’s happening, and just what the most common tools can and are measuring.

What follows is based on our meeting with some of AMD's graphics hardware and driver architects, where they went into depth in all of these issues. In the following pages we’ll get into a high-level explanation of how the Windows rendering pipeline works, why this leads to single-GPU issues, why this leads to multi-GPU issues, and what various tools can measure and see in the rendering process.