Intro

gfx-rs is a Rust project aiming to make graphics programming more accessible and portable, focusing on exposing a universal Vulkan-like API. It’s a single Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL. We are also building a Vulkan Portability implementation based on it, which allows non-Rust applications using Vulkan to run everywhere. This post is focused on the Metal backend only.

Previously, we benchmarked Dota2 and were able to run many other applications and engines successfully, including Dolphin Emulator. For Dolphin, we initially focused on visual correctness. Once games appeared to render correctly, we shifted our focus to performance to ensure they also render quickly.

Setup

@MayImilae proposed a simple benchmark scenario: run the game Metroid Prime 2 (US), load into Sanctuary Fortress, wait for the animation to finish, and finally record 20 seconds of frame times (without providing any input to the game). We ensure the game window is on screen and in focus while being benchmarked.

The Dolphin settings used for the benchmark were:

Store EFB Copies to Texture Only: Enabled

Speed Limit: Unlimited

Internal Resolution: 4x Native

Vsync: Off

As with Dota2, gfx’s Metal backend was tested in two modes: one with Immediate command recording and one with Deferred. These were configured using the GFX_METAL_RECORDING environment variable. gfx-portability itself was selected by pointing the LIBVULKAN_PATH environment variable to it. The library was built from tag 0.5 using a simple make version-release command. We also experimented with Dolphin’s “Backend multi-threading” option (“MT” for short), because we doubted whether it is the right approach when used with a normal Vulkan driver.
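As a rough illustration of the configuration described above, the following Rust sketch builds a launch command with both environment variables set. The variable names come from this post. The helper name `benchmark_command`, the binary and library paths, and the value strings (`"immediate"`, `"deferred"`) are assumptions made for the example, so check them against your own setup.

```rust
use std::process::Command;

/// Builds (but does not spawn) a command that launches Dolphin with
/// gfx-portability selected as the Vulkan library.
///
/// `dolphin_bin` and `portability_lib` are hypothetical paths supplied by
/// the caller. `recording` is the value for GFX_METAL_RECORDING; the exact
/// accepted strings are an assumption here.
fn benchmark_command(dolphin_bin: &str, portability_lib: &str, recording: &str) -> Command {
    let mut cmd = Command::new(dolphin_bin);
    // Point the Vulkan loader path at the gfx-portability library.
    cmd.env("LIBVULKAN_PATH", portability_lib)
        // Select the Metal backend's command recording mode.
        .env("GFX_METAL_RECORDING", recording);
    cmd
}
```

Calling `benchmark_command("dolphin-emu", "/path/to/libportability.dylib", "deferred").spawn()` would then start the emulator with that configuration, assuming both paths exist on the machine.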

Results

| Test/Library | gfx/immediate/-MT | gfx/immediate/+MT | gfx/deferred/+MT | MoltenVK/-MT | MoltenVK/+MT |
|---|---|---|---|---|---|
| platform A (Intel, dual-core) | | | | | |
| frame time average | 14.933781 | 15.989498 | 14.827277 | 15.731309 | 15.492961 |
| frame time variance | 2.3165195 | 2.1808865 | 1.753293 | 3.0022306 | 4.5931387 |
| platform B (AMD, quad core) | | | | | |
| frame time average | 14.572058 | 14.32026 | 14.479047 | 18.306593 | 18.41038 |
| frame time variance | 17.192923 | 2.0200737 | 2.1380246 | 30.974926 | 29.487541 |

Frame times were gathered using Dolphin’s built-in logging, which was manually turned on and off for that 20-second time span. The output was then fed to a simple analysis tool, which produced the average and variance of the numbers.
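The analysis step is small enough to sketch. The Rust code below is not the tool we used, only a minimal illustration of the same computation. It assumes the log has been reduced to one frame time per line, and it computes the population variance (dividing by n); whether the original tool used the population or the sample variance is not stated here. The function names are made up for the example.

```rust
/// Parses one frame time per line, skipping blank or non-numeric lines.
/// The one-value-per-line format is an assumption for this sketch.
fn parse_frame_times(log: &str) -> Vec<f64> {
    log.lines()
        .filter_map(|line| line.trim().parse::<f64>().ok())
        .collect()
}

/// Returns (mean, population variance) of the samples,
/// or None if there are no samples.
fn mean_and_variance(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Average of squared deviations from the mean.
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some((mean, variance))
}
```

A lower variance means the individual frame times stay closer to the average, which is what "more consistent" refers to in the conclusions.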

Conclusions

In Dolphin, gfx-portability provides faster and more consistent frame rates than MoltenVK. Average frame times decreased by about 4% on the Intel platform and by about 22% on the AMD platform. The difference in consistency is especially visible on AMD, where we produce a rock-solid frame rate. Subjectively, the game also plays much more smoothly under gfx-portability.
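For readers who want to check the percentages, here is a small Rust sketch of the arithmetic. It assumes the comparison is between the lowest average frame time of any gfx configuration and the lowest average of any MoltenVK configuration on each platform; that pairing is our reading of the results table, and the function name is made up for the example.

```rust
/// Relative decrease of `improved` with respect to `baseline`, in percent.
fn relative_decrease_percent(baseline: f64, improved: f64) -> f64 {
    (baseline - improved) / baseline * 100.0
}

/// Platform A (Intel): MoltenVK/+MT vs gfx/deferred/+MT, values from the table.
fn intel_decrease() -> f64 {
    relative_decrease_percent(15.492961, 14.827277)
}

/// Platform B (AMD): MoltenVK/-MT vs gfx/immediate/+MT, values from the table.
fn amd_decrease() -> f64 {
    relative_decrease_percent(18.306593, 14.32026)
}
```

With these inputs, the Intel figure rounds to 4% and the AMD figure rounds to 22%.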

Of the gfx configurations tested, Deferred+MT showed the best results. This is similar to the Dota2 results, but we still find it surprising that Immediate did not get ahead this time. Unlike Dota2, in this case we didn’t have many small command buffers that Deferred recording would be able to stitch together. Thus, we conclude that the explanation lies in the Metal implementations/drivers, which work most efficiently when the hardware queue is immediately available (which is not the case for Immediate recording).

Rust is still showing its strength (and potential!), although we are approaching a point where zero-cost abstractions start breaking (quantum level?). For example, the copyless crate allows us to use the same standard containers but with fewer memcpy instructions generated by LLVM. Hopefully, the optimization story of Rust will keep evolving, and eventually we’ll be able to deprecate the crate and programs will run faster out of the box.

Finally, the usual disclaimer: we are not benchmarking specialists, and the results here should be taken with a grain of salt. We’ll be happy to assist any party that attempts to reproduce them.