One of the recent happenings in the world of Linux graphics is the rise of DXVK. For those who don't know, DXVK is a translation layer which translates D3D11 and D3D10 API calls to Vulkan. It's intended to be used together with Wine to allow more Windows game titles to run directly on Linux without modification. Wine already has a D3D10/11 to OpenGL translator, but DXVK generally has better performance and compatibility than what is built into core Wine.

For Linux gamers, this has meant a wealth of new titles to play on their favorite operating system. For driver developers, it means more workloads with different shaders and API usage patterns, which in turn means more bugs and more opportunities for performance optimization. While a lot of stuff works fine and performs very well out of the box, we've had a handful of new GPU hangs and other issues reported. Much of the work I've done over the last three months or so has focused on fixing or improving the performance of games running under DXVK.

Because bug fixing is boring, let's talk about making games faster!

Skyrim Special Edition

One of the first titles I tested on DXVK (the third, if I recall correctly) was The Elder Scrolls V: Skyrim Special Edition. When I first fired the game up, there were two immediately obvious problems: everything was green (this turned out to be a DXVK bug) and it was a slide-show. I don't recall the exact numbers, but it may have been in the seconds-per-frame range. While Skyrim may once have been considered graphically intensive, that was a long time ago, and I knew we could do better.

The first thing I did to narrow down the problem was to use RenderDoc to capture a frame of the game so I could inspect it draw by draw. Even though RenderDoc doesn't have actual performance counter support yet, it does use timestamps to tell you how long each draw takes. I was quickly able to identify a particular draw call that was dominating the frame render time even though it was just rendering a quad with some shading.

With a bit more work, I was able to isolate the offending shader and look at the assembly. It was an ambient occlusion shader containing a couple of large constant arrays, which it used as look-up tables for part of the calculation. Because of their size, the arrays were taking up considerable shader resources and causing a large amount of spilling. Also, since they were accessed with dynamic indices, we were generating large if-ladders to access them.

Isn't this a fairly obvious thing we should be optimizing? Yes, and in OpenGL we have been. Unfortunately, the optimization pass for this lives at the GLSL IR level rather than in NIR, so the SPIR-V path used by Vulkan can't take advantage of it. Using more or less the same idea as the GLSL IR pass, I wrote a NIR pass which pulls large constant arrays out into a blob of constant data associated with the shader, which we then turn into a UBO in the Vulkan driver. The optimization got rid of all of the spilling in that and similar shaders, reduced the time required for that draw by 99.6% (no joke!), and brought the framerate from slide-show to nicely playable, roughly in line with the performance of the same game under native D3D11.

This all goes to show that sometimes the difference between garbage performance and good performance is just that one tiny thing you were missing all along.

Batman: Arkham City

Some time later, a user complained on the DXVK issue tracker about GPU hangs with Batman: Arkham City on Intel. How I fixed the hangs is a very boring story but, while I was looking at GPU error states trying to figure them out, I noticed that the tessellation shaders were spilling like mad. (As it turns out, that had nothing to do with the hangs, and our spilling was working perfectly.)

Why were they spilling so badly? The problem turned out to be the shadow variables that DXVK creates for inputs. There are very good reasons for these shadows, having to do with differences between the D3D shader interface and Vulkan. However, our compiler was having difficulty eliminating them, so we were storing 4K of temporary data, which blows out the register file and makes us spill like mad. The pattern in DXVK looks like this:

```glsl
layout(location=0) in vec3 v0[3];
layout(location=1) in vec2 v1[3];
layout(location=0) out vec4 oVertex[3][32];

// Shadow copy of the inputs, indexed [vertex][slot]
vec4 shader_in[3][32];

void hs_main ()
{
   oVertex[gl_InvocationID][0].xyz = shader_in[gl_InvocationID][0].xyz;
   oVertex[gl_InvocationID][1].xy = shader_in[gl_InvocationID][1].xy;
   // Do some other stuff
}

void main ()
{
   // Copy every input into the shadow array, then run the real shader body
   shader_in[0][0].xyz = v0[0];
   shader_in[1][0].xyz = v0[1];
   shader_in[2][0].xyz = v0[2];
   shader_in[0][1].xy = v1[0];
   shader_in[1][1].xy = v1[1];
   shader_in[2][1].xy = v1[2];
   hs_main();
}
```

To clean this up, I wrote a series of four optimizations which chew through the above mess and turn it into, effectively, this:

```glsl
layout(location=0) in vec3 v0[3];
layout(location=1) in vec2 v1[3];
layout(location=0) out vec4 oVertex[3][32];

void main ()
{
   // The dynamic index now applies to the inputs themselves
   oVertex[gl_InvocationID][0].xyz = v0[gl_InvocationID].xyz;
   oVertex[gl_InvocationID][1].xy = v1[gl_InvocationID].xy;
   // Do some other stuff
}
```

Not only are the temporary arrays gone, but the array access with an index of gl_InvocationID is now on an input variable directly rather than on a temporary. It's much easier for our hardware to do an indirect access on a vertex input than on a temporary so, again, we dropped the if-ladders and almost all of the spilling.

The improvement to Batman: Arkham City wasn't nearly as dramatic as with Skyrim, but it was still around a 15% FPS increase in the game's built-in benchmark.

Conclusion

So what's the moral of the story? It's not that bad shaders or spilling are the root of all performance problems. (I could just as easily tell you stories of badly placed HiZ resolves.) It's that big performance problems are sometimes caused by small things (which doesn't mean they're easy to find!). It's also that we, the developers on the Intel Mesa team, care about Linux gamers and are hard at work trying to make our open-source Vulkan and OpenGL drivers the best they can be.
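A small model can show why the Skyrim look-up tables hurt so much. Suppose a large table lives in registers and is read with a dynamic index, and the compiler cannot address registers indirectly. In that case it may lower the read into an if-ladder that tests the index against every element. Moving the table into a UBO turns the same read into one indexed load from memory. The C sketch below is a simplified CPU-side model of those two lowerings, not Mesa code, and the function names `ladder_lookup` and `buffer_lookup` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified CPU model of two ways to lower a dynamically indexed read
 * from a constant array. Illustration only, not Mesa code. */

/* If-ladder model: with the table held in registers and no indirect
 * register addressing, table[index] becomes one compare-and-select per
 * element. Every arm is evaluated, and *compares counts that work. */
static float ladder_lookup(const float *table, size_t size, size_t index,
                           unsigned *compares) {
    float result = 0.0f;
    for (size_t i = 0; i < size; i++) {
        (*compares)++;
        /* Each arm reads one fixed element, as a ladder arm would. */
        result = (index == i) ? table[i] : result;
    }
    return result;
}

/* UBO model: the table lives in a buffer and is read with a single
 * indexed load, so no per-element compares are needed. */
static float buffer_lookup(const float *table, size_t size, size_t index) {
    assert(index < size);
    return table[index];
}
```

Both functions return the same element. In this model, each `ladder_lookup` call costs one compare per table element, so reading every index of an 8-entry table once costs 64 compares. On a GPU, the ladder version also has to keep every table element live in registers, and that register pressure is what triggers the spilling.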
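To put the 99.6% figure from the Skyrim section in perspective: a 99.6% reduction leaves 0.4% of the original draw time, which works out to roughly a 250x speedup for that one draw.

$$
t_{\text{new}} = (1 - 0.996)\,t_{\text{old}} = 0.004\,t_{\text{old}}, \qquad \frac{t_{\text{old}}}{t_{\text{new}}} = \frac{1}{0.004} = 250
$$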
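The Arkham City cleanup can be sketched in the same spirit. The post doesn't spell out the four optimizations, so what follows is an assumption about one plausible ingredient rather than a description of them. If a temporary array is an element-for-element copy of an input array and is not written again, a read from it with a dynamic index (such as gl_InvocationID) can be forwarded to the input itself, which leaves the temporary dead. The toy IR types and the `forward_load` function in this C sketch are hypothetical. The sketch models a single slot of the shadow array, indexed by vertex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy IR (hypothetical): which array a load currently reads from. */
typedef enum { READ_TEMP, READ_INPUT } load_source;

typedef struct {
    load_source source;  /* array the load reads */
    bool dynamic_index;  /* true if indexed by e.g. gl_InvocationID */
    int index;           /* element read when dynamic_index is false */
} load_ref;

/* copied_from[k] is the input element that was stored into temp[k], or -1
 * if temp[k] holds something else. Assumes temp is not written afterwards. */
static bool temp_mirrors_input(const int *copied_from, size_t len) {
    for (size_t k = 0; k < len; k++) {
        if (copied_from[k] != (int)k)
            return false;
    }
    return true;
}

/* Rewrites a load from temp into a load from input when that is safe.
 * Returns true if the load was rewritten. */
static bool forward_load(load_ref *load, const int *copied_from, size_t len) {
    assert(load != NULL && copied_from != NULL);
    if (load->source != READ_TEMP)
        return false;
    if (load->dynamic_index) {
        /* Any element may be read, so every element must mirror input. */
        if (!temp_mirrors_input(copied_from, len))
            return false;
    } else {
        /* Only one element is read: follow it back to its input element. */
        if (load->index < 0 || (size_t)load->index >= len ||
            copied_from[load->index] < 0)
            return false;
        load->index = copied_from[load->index];
    }
    load->source = READ_INPUT;
    return true;
}
```

The `dynamic_index` case is the interesting one. With a constant index, only one element has to match. A dynamic index could select any element, so every element of the temporary must mirror the input before the read can be forwarded.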