[PULL REQUEST] Optimized Big Picture shaders for Linux (long)

[github.com]

[github.com]

Particle shader

vec4 texcol = color;
vec2 uv = tex.st - 0.5;
float radius = sqrt( dot( uv, uv ) );
float flSharpRadius = ( clamp( particleSharpness, 0.0, 0.98 ) ) / 2.0;
float alpha = 1.0;
if ( radius < flSharpRadius )
{
    alpha = 1.0;
}
else
{
    alpha = clamp( (1.0 - ( (radius - flSharpRadius) / (0.5 - flSharpRadius ) ) ), 0.0, 1.0 );
}
gl_FragColor.r = color.r * color.a * alpha;
gl_FragColor.g = color.g * color.a * alpha;
gl_FragColor.b = color.b * color.a * alpha;
gl_FragColor.a = color.a * alpha;

vec2 uv = tex.st - 0.5;
float radius = sqrt( dot( uv, uv ) );
float flSharpRadius = clamp( particleSharpness, 0.0, 0.98 ) * 0.5;
gl_FragColor = color * vec4( color.aaa, 1.0 ) *
    mix( 1.0,
         clamp( 1.0 - ( radius - flSharpRadius ) / ( 0.5 - flSharpRadius ), 0.0, 1.0 ),
         step( flSharpRadius, radius ) );
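As a sanity check, the two particle listings above can be compared on the CPU. The sketch below is not part of the shaders: it is a plain Python emulation, and the helper names (clamp, mix, step, alpha_branch, alpha_branchless, max_difference) are made up for illustration. The helpers follow the GLSL definitions of the built-ins with the same names, and only the alpha factor is compared, since the color multiplication is the same in both versions.

```python
def clamp(x, lo, hi):
    # GLSL clamp(x, lo, hi)
    return max(lo, min(hi, x))


def mix(a, b, t):
    # GLSL mix(a, b, t) = a * (1 - t) + b * t
    return a * (1.0 - t) + b * t


def step(edge, x):
    # GLSL step(edge, x): 0.0 if x < edge, else 1.0
    return 0.0 if x < edge else 1.0


def alpha_branch(radius, sharpness):
    # Alpha factor as computed by the original listing (if/else).
    sharp_radius = clamp(sharpness, 0.0, 0.98) / 2.0
    if radius < sharp_radius:
        return 1.0
    return clamp(1.0 - (radius - sharp_radius) / (0.5 - sharp_radius), 0.0, 1.0)


def alpha_branchless(radius, sharpness):
    # Alpha factor as computed by the mix/step listing.
    sharp_radius = clamp(sharpness, 0.0, 0.98) * 0.5
    falloff = clamp(1.0 - (radius - sharp_radius) / (0.5 - sharp_radius), 0.0, 1.0)
    return mix(1.0, falloff, step(sharp_radius, radius))


def max_difference(samples=200):
    # Largest absolute difference over a grid of radius/sharpness values,
    # including sharpness values outside the clamped 0.0..0.98 range.
    worst = 0.0
    for i in range(samples + 1):
        sharpness = -0.5 + 2.0 * i / samples
        for j in range(samples + 1):
            radius = 0.75 * j / samples
            diff = abs(alpha_branch(radius, sharpness) - alpha_branchless(radius, sharpness))
            worst = max(worst, diff)
    return worst
```

Calling max_difference() sweeps the whole grid. Because sharpness is clamped to 0.98, the divisor 0.5 - sharp_radius is never smaller than 0.01 in either version, so the sweep never divides by zero.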

YUV shader

void main (void)
{
    vec2 texHalf = tex.st/2;
    float y = texture2DRect( Texture0, tex.st ).r;
    float u = texture2DRect( Texture1, texHalf ).r;
    float v = texture2DRect( Texture2, texHalf ).r;
    y = 1.1643*(y-0.0625);
    u = u-0.5;
    v = v-0.5;
    gl_FragColor.r = y+1.5958*v;
    gl_FragColor.g = y-0.39173*u-0.81290*v;
    gl_FragColor.b = y+2.017*u;
    gl_FragColor.a = 1.0;
}

// Per-channel weights for ( Y, U, V ), applied with one dot product each.
const vec3 mulRed = vec3( 1.0, 0.0, 1.5958 );
const vec3 mulGreen = vec3( 1.0, -0.39173, -0.8129 );
const vec3 mulBlue = vec3( 1.0, 2.017, 0.0 );

void main (void)
{
    vec2 texHalf = tex.st * 0.5;
    // 0.42723 = 0.5 - 1.1643 * 0.0625, so a single "- 0.5" covers all three components.
    vec3 yuv = vec3(
        1.1643 * texture2DRect( Texture0, tex.st ).r + 0.42723,
        texture2DRect( Texture1, texHalf ).r,
        texture2DRect( Texture2, texHalf ).r ) - 0.5;
    gl_FragColor = vec4( dot( yuv, mulRed ), dot( yuv, mulGreen ), dot( yuv, mulBlue ), 1.0 );
}
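The two YUV listings above can be checked against each other the same way. This is again only a CPU-side sketch in Python with made-up helper names (dot, yuv_to_rgb_scalar, yuv_to_rgb_vector, max_channel_difference). The y, u, v arguments stand for the .r samples of Texture0, Texture1 and Texture2, assumed here to lie in the 0.0..1.0 range. A small difference is expected because 0.42723 is a rounded form of 0.5 - 1.1643 * 0.0625.

```python
MUL_RED = (1.0, 0.0, 1.5958)
MUL_GREEN = (1.0, -0.39173, -0.8129)
MUL_BLUE = (1.0, 2.017, 0.0)


def dot(a, b):
    # GLSL dot() for 3-component vectors given as tuples.
    return sum(p * q for p, q in zip(a, b))


def yuv_to_rgb_scalar(y, u, v):
    # Component-by-component conversion, as in the original listing.
    y = 1.1643 * (y - 0.0625)
    u = u - 0.5
    v = v - 0.5
    return (y + 1.5958 * v,
            y - 0.39173 * u - 0.81290 * v,
            y + 2.017 * u)


def yuv_to_rgb_vector(y, u, v):
    # One shared "- 0.5" and three dot products, as in the vector listing.
    yuv = (1.1643 * y + 0.42723 - 0.5, u - 0.5, v - 0.5)
    return (dot(yuv, MUL_RED), dot(yuv, MUL_GREEN), dot(yuv, MUL_BLUE))


def max_channel_difference(steps=16):
    # Largest per-channel difference over a grid of samples in 0.0..1.0.
    worst = 0.0
    for i in range(steps + 1):
        for j in range(steps + 1):
            for k in range(steps + 1):
                y, u, v = i / steps, j / steps, k / steps
                a = yuv_to_rgb_scalar(y, u, v)
                b = yuv_to_rgb_vector(y, u, v)
                worst = max(worst, max(abs(p - q) for p, q in zip(a, b)))
    return worst
```

The remaining difference comes only from rounding the constant and is far below one 8-bit color step (1/255).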

Other shaders

Recently, I found the Big Picture mode shader code in the Steam folder. To my surprise, one of the shaders used dynamic conditional branching, which is considered a performance killer because it prevents the GPU from running a single instruction across all the pixels it is processing in parallel. So I decided to optimize the shaders. The GitHub issue and the repo with the updated code are linked at the top of this post.

So, let me explain the biggest changes in the shaders.

The particle shader (tex2dparticle.frag) is the shader with branching I was talking about. From what I see, the branch condition depends on a varying variable, so it can differ within a single batch of pixels. The original main() code is the first listing under "Particle shader" above.

I replaced the branch with mix/step, which is much faster: even though some instructions are wasted, all pixels in the batch are processed with the same instructions. main() is also shorter now than in the unoptimized code.

There's nothing wrong if flSharpRadius gets close to 0.5. Because particleSharpness is clamped to 0.98, flSharpRadius tops out at 0.49 and the divisor never drops below 0.01. Even a division by zero is harmless in a shader, since it only results in an undefined number (or infinity), and with the clamp in place such a value never reaches the pixel color.

The resulting code is the second listing under "Particle shader" above. I also replaced the division by 2.0 with a multiplication by 0.5 (less work for the compiler) and vectorized the color/alpha multiplication.

The issue with the YUV color space texture shader is that it underuses fast vector operations. Vector operations mostly take one instruction (and mostly one cycle, including dot product), and GPUs are designed for vector processing. The original main() is the first listing under "YUV shader" above, and the version with vector operations is the second.

For the compiler, it would be hard to figure out that a dot product can be used here, especially for the red and blue components. And you can't rely on compiler optimizations in GLSL, because every graphics card vendor has its own shader compiler, unlike HLSL, which is compiled solely by fxc.exe. The three 0.5 subtractions are also replaced by one.

The 0.42723 constant is there so that 0.5 can be subtracted from the Y component (the .r sample of Texture0) together with U and V. It is calculated as 0.5 - 1.1643 * 0.0625. The same code is also merged into the YUV part of the fancyquaduber.frag shader.

In the other shaders, the optimizations are mostly small, like replacing separate .r=; .g=; .b=; .a=; assignments with a single vec4 assignment and using 1-argument vector constructors. As I said before, fancyquaduber.frag includes my YUV optimization, so that is probably the biggest optimization among the other shaders.