Yesterday, we reported on Ashes of the Singularity performance in DirectX 12 and how it gives AMD a significant advantage over Nvidia. There’s a report making the rounds from Guru3D that compares FCAT results for AMD and Nvidia hardware. The resulting frame time plot makes AMD look terrible, but these results aren’t accurate. The output looks the way it does because there’s a mismatch between what FCAT expects and how AMD’s driver actually performs image compositing. This creates the distinct impression (visible below) of poor performance on AMD GPUs.

First, some basics. FCAT is a system Nvidia pioneered that can be used to record, play back, and analyze the output that a game sends to the display. This captures a game at a different point than FRAPS does, and it offers fine-grained analysis of the entire captured session. Guru3D argues that FCAT’s results are intrinsically correct because “Where we measure with FCAT is definitive though, it’s what your eyes will see and observe.” Guru3D is wrong. FCAT records output data, but its analysis of that data is based on assumptions it makes about the output, and those assumptions don’t reflect what users experience in this case.

AMD’s driver follows Microsoft’s recommendations for DX12 and composites using the Desktop Window Manager (DWM) to increase smoothness and reduce tearing. FCAT, in contrast, assumes that the GPU is using DirectFlip. According to Oxide, the problem is that FCAT assumes so-called intermediate frames make it into the data stream and depends on these frames for its data analysis. If V-Sync is implemented differently than FCAT expects, the FCAT tools cannot properly analyze the final output. The application’s accuracy is only as reliable as its assumptions, after all.

An Oxide representative told us that the only real negative from AMD’s switch to DWM compositing from DirectFlip “[I]s that it throws off FCAT.”

In this case, AMD is using Microsoft’s recommended compositing method, not the method that FCAT expects, and the result is an FCAT graph that makes AMD’s performance look terrible. It isn’t. From an end-user’s perspective, compositing through DWM eliminates tearing in windowed mode and may reduce it in fullscreen mode as well when V-Sync is disabled.

When we approached Oxide about this problem, the company provided us with an Event Tracing for Windows (ETW) trace of Ashes of the Singularity running on an AMD Radeon R9 390X.

The top row of the yellow data line shows when data was presented to the back buffer. There’s some mild variation, to be sure — but not the crazy up-and-down pattern FCAT is showing. Oxide recommends using ETW for performance analysis on frame smoothness, since the times it gives are accurate to within 100 microseconds (0.1ms).
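To make the idea concrete, here is a minimal sketch of how present timestamps like the ones in that trace can be turned into frame-to-frame intervals and a few summary numbers. The timestamps are made up for illustration (they are not Oxide's data), and the function names are our own rather than part of any ETW tooling.

```python
from statistics import mean, pstdev


def frame_intervals_ms(present_times_ms):
    """Return the gap between each pair of consecutive present timestamps (ms)."""
    return [later - earlier
            for earlier, later in zip(present_times_ms, present_times_ms[1:])]


def summarize(intervals_ms):
    """Summarize frame pacing: average gap, spread around it, and the worst gap."""
    return {
        "mean_ms": mean(intervals_ms),
        "stdev_ms": pstdev(intervals_ms),
        "worst_ms": max(intervals_ms),
    }


# Hypothetical present timestamps in milliseconds, invented for this example.
present_times_ms = [0.0, 16.5, 33.4, 50.0, 66.9, 83.3]

intervals = frame_intervals_ms(present_times_ms)
summary = summarize(intervals)

print("intervals (ms):", [round(gap, 1) for gap in intervals])
print("summary:", {name: round(value, 2) for name, value in summary.items()})
```

A small standard deviation relative to the mean corresponds to the "mild variation" described above; a sawtooth pattern like the one in the FCAT plot would show up as a much larger spread.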

According to Oxide, Microsoft is making a huge push in Windows 10 to make the operating system cooperative, with an emphasis on smooth image presentation (which is why the AMD driver composites using DWM instead of DirectFlip). DirectFlip also isn’t as power-efficient as DWM. All of these considerations, however, make it more difficult to profile applications.

FCAT is an extremely useful and powerful tool, but it’s not perfect. In his initial coverage of FCAT several years ago, Scott Wasson, who pioneered the use of “inside the second” techniques of analyzing GPU output, wrote the following:

There’s a pretty widespread assumption at other sites that FCAT data is “better” since it comes from later in the frame production process, and some folks like to say Fraps is less “accurate” as a result. I dispute those notions. Fraps and FCAT are both accurate for what they measure; they just measure different points in the frame production process. It’s quite possible that Fraps data is a better indication of animation smoothness than FCAT data. For instance, a smooth line in an FCAT frame time distribution wouldn’t lead to smooth animation if the game engine’s internal simulation timing doesn’t match well with how frames are being delivered to the display. The simulation’s timing determines the *content* of the frames being produced, and you must match the sim timing to the display timing to produce optimally fluid animation. Even “perfect” delivery of the frames to the display will look awful if the visual information in those frames is out of sync.

Trust your eyes

There’s one final reason to dispute what FCAT is reporting: It doesn’t match how the game appears to run on AMD hardware. Scott Wasson’s initial report on sub-second GPU rendering was so influential because it crystallized and demonstrated a problem that reviewers and gamers had noticed for years. Discussions of microstutter are as old as multi-GPU configurations. Here’s a graph from our original Radeon HD 7990 review:

That microstutter was clearly, obviously visible while benchmarking the game. It might not have shown up in a conventional FPS graph, but it popped out immediately in the FRAPS frame time data. Looking at that graph for the first time, I felt like I’d finally found a way to intuitively capture what I’d been seeing for years.
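A short sketch shows why an average FPS figure can hide this kind of stutter. Both frame time series below are invented for illustration and deliver the same average frame rate; the frame-to-frame change metric is our own simple stand-in, not the calculation FRAPS or FCAT performs.

```python
def average_fps(frame_times_ms):
    """Average frames per second over the whole run."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def mean_frame_to_frame_change_ms(frame_times_ms):
    """Average absolute difference between consecutive frame times (ms)."""
    deltas = [abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    return sum(deltas) / len(deltas)


# Hypothetical data: ten evenly paced frames vs. ten frames that alternate
# between short and long, as in classic multi-GPU microstutter.
steady = [20.0] * 10
stutter = [10.0, 30.0] * 5

for label, series in (("steady", steady), ("stutter", stutter)):
    print(label,
          "avg fps:", average_fps(series),
          "mean change (ms):", mean_frame_to_frame_change_ms(series))
```

Both series average 50 FPS, but only the per-frame view exposes the alternating short and long frames.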

Ashes of the Singularity doesn’t look like that on an R9 Fury X. It doesn’t look anything like the FCAT graph suggests it does. It appears to be equally smooth on both AMD and Nvidia hardware when running at roughly the same frame rate. Granted, the experience of smoothness is subjective, but the difference in presentation between AMD and Nvidia is nothing like the initial FCAT graph implies.

Ashes of the Singularity measures its own frame variance in a manner similar to FRAPS; we extracted that information for both the GTX 980 Ti and the R9 Fury X. The graph above shows two video cards that perform identically — AMD’s frame times are slightly lower because AMD’s frame rate is slightly higher. There are no other significant differences. That’s what the benchmark “feels” like when viewed in person. The FCAT graph above suggests incredible levels of microstutter that simply don’t exist when playing the game or viewing the benchmark.

AMD has told us that it recognizes the value of FCAT in performance analysis and fully intends to support the feature in a future driver update. In this case, however, what FCAT shows is happening simply doesn’t match the experience of the actual output — and it misrepresents AMD in the process.