Passing values…

A few months ago I ran into some interesting performance problems with OpenGL on OSX. I identified the cause and put in some workarounds so development could continue. This week I properly fixed the issue, and I want to record it here so that I, and others, can avoid this mistake.

So here’s a scene rendering on OSX at an abysmal 14 frames per second on a MacBook Pro. That’s right: 14. I’ve got the game paused, so no time is spent on updates; this is just drawing.

If I move the camera to a different location, the frame rate is 126 fps. That’s a difference of roughly 63 milliseconds per frame. Ouch.
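
Frame times show the size of that gap more directly than frame rates. At t = 1000 / fps milliseconds per frame, the two views compare like this:

```latex
t_{14} = \frac{1000\,\text{ms}}{14} \approx 71.4\,\text{ms}, \qquad
t_{126} = \frac{1000\,\text{ms}}{126} \approx 7.9\,\text{ms}, \qquad
t_{14} - t_{126} \approx 63.5\,\text{ms}
```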

So, after much debugging, I determined that rendering animated models was causing the slowdown. The view of just trees doesn’t have any deer or people moving around, and if I remove the people from my original test scene, the frame rate is over 100 fps.

Rendering houses and trees differs only slightly from rendering animated models, so I disabled the shader code that animates the models, and the frame rate went back up to normal. This looks funny, but it runs fast.

So here’s the basic GLSL code that handles animation. It looks pretty standard and simple. This isn’t the entire shader, just enough to show how the animation part works.

```glsl
struct BoneConstants
{
    mat4x4 transforms[64];
};

uniform BoneConstants bc;

in vec3 inputPosition;
in vec4 inputWeight;
in ivec4 inputIndex;

vec3 SkinPosition(vec3 position, ivec4 index, vec4 weight, BoneConstants bones)
{
    return ((bones.transforms[index.x] * vec4(position, 1.0)) * weight.x +
            (bones.transforms[index.y] * vec4(position, 1.0)) * weight.y +
            (bones.transforms[index.z] * vec4(position, 1.0)) * weight.z +
            (bones.transforms[index.w] * vec4(position, 1.0)) * weight.w).xyz;
}

void main()
{
    vec3 position = SkinPosition(inputPosition, inputIndex, inputWeight, bc);
    // gc and tc are uniform blocks declared elsewhere in the full shader.
    gl_Position = (gc.worldToProjection * (tc.transform * vec4(position, 1.0)));
}
```

This code transforms the position of a vertex by up to four of the model’s bones. It then weights each result by how much influence that bone has on the vertex and adds them together.
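
The following is a minimal CPU-side sketch of the same four-bone linear blend skinning in C++. It is meant to make the math concrete outside a shader. The Vec3/Vec4/Mat4 aliases and the helper functions are hypothetical stand-ins for GLSL’s vec3, vec4 and mat4, stored column-major as GLSL does. They are not part of any library. The sketch takes the bone array by const reference, so nothing is copied per call.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins for GLSL's vec3, vec4 and mat4.
// Mat4 is column-major like GLSL: element (row r, column c) is m[c * 4 + r].
using Vec3 = std::array<float, 3>;
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i * 4 + i] = 1.0f;
    return m;
}

// Translation lives in the fourth column (indices 12, 13, 14).
Mat4 translation(float x, float y, float z) {
    Mat4 m = identity();
    m[12] = x;
    m[13] = y;
    m[14] = z;
    return m;
}

// Equivalent of GLSL's mat4 * vec4.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[col * 4 + row] * v[col];
    return r;
}

// Linear blend skinning: transform the vertex by each of its four bones,
// scale each result by that bone's weight, and sum (weights usually add up to 1).
Vec3 skinPosition(const Vec3& position, const std::array<int, 4>& index,
                  const Vec4& weight, const std::array<Mat4, 64>& bones) {
    const Vec4 p{position[0], position[1], position[2], 1.0f};
    Vec4 sum{};
    for (int i = 0; i < 4; ++i) {
        assert(index[i] >= 0 && index[i] < 64);  // bone index must be in range
        const Vec4 t = mul(bones[index[i]], p);
        for (int k = 0; k < 4; ++k) sum[k] += t[k] * weight[i];
    }
    return {sum[0], sum[1], sum[2]};  // the shader's .xyz
}
```

For example, leave bone 0 as the identity and translate bone 1 by +2 on x. A vertex at (1, 1, 1) weighted 50/50 between those two bones then lands at (2, 1, 1), halfway between its two transformed positions.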

I stared at this code for a while (more than a while, actually), and after messing about a bit, it finally dawned on me what’s wrong with it. Facepalm.

To fix it, instead of calling a function to animate the models, I manually inlined the code. And my frame rate returned to normal, with animated characters.

```glsl
void main()
{
    vec3 position = ((bc.transforms[inputIndex.x] * vec4(inputPosition, 1.0)) * inputWeight.x +
                     (bc.transforms[inputIndex.y] * vec4(inputPosition, 1.0)) * inputWeight.y +
                     (bc.transforms[inputIndex.z] * vec4(inputPosition, 1.0)) * inputWeight.z +
                     (bc.transforms[inputIndex.w] * vec4(inputPosition, 1.0)) * inputWeight.w).xyz;
    gl_Position = (gc.worldToProjection * (tc.transforms[gl_InstanceID] * vec4(position, 1.0)));
}
```

Wow. So what’s going on there?

There are two ways to pass parameters to a function: by value or by reference.

When you pass a parameter by value, a copy of the variable is made, so any changes to it inside the function don’t affect its value in the calling function.

When you pass a parameter by reference, the function operates on the caller’s variable directly, so any modifications change it. No copy is made.
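
The distinction is shown below in C++ terms. GLSL’s own ‘inout’ qualifier is specified as copy-in/copy-out rather than as a true reference, so C++ gives a cleaner example. BoneConstants here is a hypothetical stand-in for the shader’s struct of 64 matrices, stored as plain floats. The function names are illustrative only.

```cpp
#include <cassert>

// Hypothetical stand-in for the shader's BoneConstants:
// 64 column-major 4x4 float matrices (64 * 16 floats = 4 KB).
struct BoneConstants {
    float transforms[64][16];
};

// By value: the callee receives its own 4 KB copy,
// so the caller's data is untouched.
void doubleFirstByValue(BoneConstants bones) {
    bones.transforms[0][0] *= 2.0f;  // modifies the local copy only
}

// By reference: the callee works on the caller's object directly;
// no copy is made and the change is visible afterwards.
void doubleFirstByReference(BoneConstants& bones) {
    bones.transforms[0][0] *= 2.0f;
}

// Usage: only the by-reference call changes the caller's value.
void demoParameterPassing() {
    BoneConstants bc{};
    bc.transforms[0][0] = 1.0f;
    doubleFirstByValue(bc);
    assert(bc.transforms[0][0] == 1.0f);
    doubleFirstByReference(bc);
    assert(bc.transforms[0][0] == 2.0f);
}
```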

In my case with animation, the entire array of bone transformations is being copied because it’s being passed by value. That is sixty-four 4×4 float matrices (4 KB) on every call, which means for every vertex. My suspicion is that the program running on the GPU doesn’t have enough registers to hold this copy. So the GLSL compiler generates code that copies the array piece by piece, then runs that code over and over to evaluate the final result. What should be just a few matrix multiplies, scales, and adds becomes many, many copies and conditionals. This may also result in different execution paths per GPU thread, causing even more slowdown.
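
The per-call cost shows up clearly in a C++ analogy. This sketch does not model GPU registers or any real shader compiler. It gives a hypothetical CountingBones type (the same 64 × 16 floats, 4 KB) a copy constructor that counts copies. It then simulates one skinning-style call per vertex, once with the bones passed by value and once by const reference.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

static std::size_t copyCount = 0;  // how many CountingBones copies were made

// Hypothetical 64-matrix bone array (64 * 16 floats = 4 KB) that counts its copies.
struct CountingBones {
    std::array<float, 64 * 16> transforms{};

    CountingBones() = default;
    CountingBones(const CountingBones& other) : transforms(other.transforms) { ++copyCount; }
};

// Reads one float, but the by-value parameter forces a full 4 KB copy per call.
float firstElementByValue(CountingBones bones) { return bones.transforms[0]; }

// Reads the same float through a const reference: nothing is copied.
float firstElementByConstRef(const CountingBones& bones) { return bones.transforms[0]; }

// Simulates one call per vertex and reports how many copies were made.
std::size_t copiesForVertices(int vertexCount, bool byValue) {
    assert(vertexCount >= 0);
    const CountingBones bones{};
    copyCount = 0;
    float sum = 0.0f;
    for (int v = 0; v < vertexCount; ++v)
        sum += byValue ? firstElementByValue(bones) : firstElementByConstRef(bones);
    (void)sum;  // the values themselves don't matter here
    return copyCount;
}
```

With 1,000 vertices, the by-value version makes 1,000 copies, about 4 MB moved. The const-reference version makes none. An optimizer that can prove a copy is never written to may remove it. Here it cannot, because the counting copy constructor has a visible side effect.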

My first attempt, before manually inlining the code, was actually to pass the array by reference. But the GLSL compiler yelled at me that you can’t pass a uniform by reference: an ‘inout’ argument has to be writable, and uniforms are read-only.

On Windows and Linux, I suspect the compiler is smart enough to see that the function doesn’t modify the array and optimizes the copy away. (Or my GTX 980 and 290X are just too fast for me to notice the slowdown…)

Most people reference the global uniform array of bone transformations directly and never run into this issue. But my custom shader language, which generates GLSL, doesn’t have a concept of globals. So everything a function needs is passed to it as a parameter. Arghghghg.

So what’s the real fix?

I don’t want to have to manually repeat code in shaders; that’s just bad programming practice. Luckily, I control the compiler for my own shading language, so I can make it generate different code.

So I recently added an ‘inline’ keyword for functions. The function body gets inlined automatically at each call site, and any value passed by reference isn’t copied when the GLSL is generated.
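
SRSL’s compiler isn’t shown in this post, but the core idea can be sketched. The C++ below is a toy under stated assumptions, and Param, substituteNames and inlineCall are hypothetical names. It assumes the inline function’s body is a single expression. It replaces each by-reference (‘inout’) parameter with the caller’s argument expression. Each by-value parameter gets a local copy, as an ordinary call would. It does not handle scoping, clashes with the caller’s own names, or multi-statement bodies.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one parameter of an inline function.
struct Param {
    std::string type;   // GLSL type emitted for a local copy, e.g. "vec3"
    std::string name;   // name used inside the function body
    bool byReference;   // true for 'inout' parameters: substitute, never copy
};

// Replaces whole identifiers found in `names` in a single pass, skipping
// anything right after a '.', so members and swizzles like .x stay intact.
std::string substituteNames(const std::string& code,
                            const std::map<std::string, std::string>& names) {
    auto isIdent = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };
    std::string out;
    std::size_t i = 0;
    while (i < code.size()) {
        if (!isIdent(code[i])) { out += code[i++]; continue; }
        std::size_t j = i;
        while (j < code.size() && isIdent(code[j])) ++j;
        const std::string word = code.substr(i, j - i);
        const auto it = names.find(word);
        const bool isMember = i > 0 && code[i - 1] == '.';
        out += (it != names.end() && !isMember) ? it->second : word;
        i = j;
    }
    return out;
}

// Emits GLSL for `resultType resultName = <inlined body>;` at a call site.
std::string inlineCall(const std::vector<Param>& params,
                       const std::vector<std::string>& args,
                       const std::string& bodyExpr,
                       const std::string& resultType,
                       const std::string& resultName) {
    assert(params.size() == args.size());
    std::string prologue;
    std::map<std::string, std::string> names;
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (params[i].byReference) {
            names[params[i].name] = args[i];  // use the caller's expression directly
        } else {
            const std::string local = resultName + "_" + params[i].name;
            prologue += params[i].type + " " + local + " = " + args[i] + ";\n";
            names[params[i].name] = local;    // by value: work on a local copy
        }
    }
    return prologue + resultType + " " + resultName + " = " +
           substituteNames(bodyExpr, names) + ";\n";
}
```

The example call uses position by value and the other three parameters as ‘inout’, matching the new SRSL signature, with the arguments inputPosition, inputIndex, inputWeight and bc. It emits one small local copy of the vec3 and reads bc.transforms in place, so the 64-matrix uniform is never copied.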

Previously my skinning function looked (in SRSL, not GLSL) like this:

```
inline float3 SkinPosition(float3 position, int4 index, float4 weight, BoneConstants bc) {...}
```

And now it looks like this:

```
inline float3 SkinPosition(float3 position, inout int4 index, inout float4 weight, inout BoneConstants bc) {...}
```

No more repeated skinning code everywhere.

Getting my compiler to inline the code is pretty easy. However, most shader languages don’t have a goto or label statement to jump over the remaining code. That makes it hard (if not impossible) to inline functions that return early from inside complex flow control, so my inline feature doesn’t handle them. This isn’t really an issue for shaders, as they tend to be straightforward, without many loops or conditionals.
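
Here is a small C++ illustration of why early returns are the awkward case; the function names are hypothetical. The first function returns from inside a loop. Pasting its body into a caller is not enough, because that return would now exit the caller. Without goto, the inlined copy has to be rewritten into single-exit form, with a result variable and a flag threaded into the loop condition. Nested loops and multiple returns each add more of this bookkeeping.

```cpp
#include <cassert>

// Original helper: returns from inside the loop as soon as it finds a match.
float firstAbove(const float* values, int count, float threshold) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        if (values[i] > threshold) return values[i];
    }
    return 0.0f;
}

// A caller that originally read: return firstAbove(values, count, threshold) * 2.0f;
// Here the helper's body is inlined by hand in single-exit form.
float callerWithInlinedBody(const float* values, int count, float threshold) {
    // --- inlined body of firstAbove ---
    float result = 0.0f;  // each 'return x;' becomes 'result = x;'
    bool done = false;    // stops the loop where the early return used to
    for (int i = 0; i < count && !done; ++i) {
        if (values[i] > threshold) {
            result = values[i];
            done = true;
        }
    }
    // --- end of inlined body; the caller's own code continues ---
    return result * 2.0f;
}
```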

So, long story short: don’t pass uniform arrays or large structs to a function by value in GLSL.