May 2, 2016

The mainstream real-time graphics APIs, OpenGL and Direct3D, are probably the most widespread way that programmers interact with heterogeneous hardware. But their brand of CPU–GPU integration is unconscionable. CPU-side code needs to coordinate closely with GPU-side shader programs for good performance, but the APIs we have today treat the two execution units as isolated universes. This mindset leads to stringly typed interfaces, a huge volume of boilerplate, and impoverished GPU-specific programming languages.

This post tours a few gritty realities in a trivial OpenGL application. You can follow along with a literate listing of the full source code.

Shaders are Strings

To define an object’s appearance in a 3D scene, real-time graphics applications use shaders: small programs that run on the GPU as part of the rendering pipeline. There are several kinds of shaders, but the two most common are the vertex shader, which determines the position of each vertex in an object’s mesh, and the fragment shader, which produces the color of each pixel on the object’s surface. You write shaders in special C-like programming languages: OpenGL uses GLSL.

This is where things go wrong. To set up a shader, the host program sends a string containing shader source code to the graphics card driver. The driver JITs the source to the GPU’s internal architecture and loads it onto the hardware.

Here’s a simplified pair of GLSL vertex and fragment shaders in C string constants:

const char *vertex_shader = "in vec4 position;

" "out vec4 myPos;

" "void main() {

" " myPos = position;

" " gl_Position = position;

" "}

"; const char *fragment_shader = "uniform float phase;

" "in vec4 myPos;

" "void main() {

" " gl_FragColor = ...;

" "}

";

(It’s also common to load shader code from text files at startup time.) Those in , out , and uniform qualifiers denote communication channels between the CPU and GPU and between the different stages of the GPU’s rendering pipeline. That myPos variable serves to shuffle data from through vertex shader into the fragment shader. The vertex shader’s main function assigns to the magic gl_Position variable for its output, and the fragment shader assigns to gl_FragColor .

Here’s roughly how you compile and load the shader program:

// Compile the vertex shader. GLuint vshader = glCreateShader(GL_VERTEX_SHADER); glShaderSource(vshader, 1, &vertex_shader, 0); // Compile the fragment shader. GLuint fshader = glCreateShader(GL_FRAGMENT_SHADER); glShaderSource(fshader, 1, &fragment_shader, 0); // Create a program that stitches the two shader stages together. GLuint shader_program = glCreateProgram(); glAttachShader(shader_program, vshader); glAttachShader(shader_program, fshader); glLinkProgram(shader_program);

With that boilerplate, we’re ready to invoke shader_program to draw objects.

The shaders-in-strings interface is the original sin of graphics programming. It means that some parts of the complete program’s semantics are unknowable until run time—for no reason except that they target a different hardware unit. It’s like eval in JavaScript, but worse: every OpenGL program is required to cram some of its code into strings.

Direct3D and the next generation of graphics APIs—Mantle, Metal, and Vulkan—clean up some of the mess by using a bytecode to ship shaders instead of raw source code. But pre-compiling shader programs to an IR doesn’t solve the fundamental problem: the interface between the CPU and GPU code is needlessly dynamic, so you can’t reason statically about the whole, heterogeneous program.

Stringly Typed Binding Boilerplate

If string-wrapped shader code is OpenGL’s principal investment in pain, then it collects its pain dividends via the CPU–GPU communication interface.

Check out those variables position and phase in the vertex and fragment shaders, respectively. The in and uniform qualifiers mean they’re parameters that come from the CPU. To use those parameters, the host program’s first step is to look up location handles for each variable:

GLuint loc_phase = glGetUniformLocation(program, "phase"); GLuint loc_position = glGetAttribLocation(program, "position");

Yes, you look up the variable by passing its name as a string. The phase parameter is just a float scalar, but position is a dynamically sized array of position vectors, so it requires even more boilerplate to set up a backing buffer.

Next, we use these handles to pass new data to the shaders to draw each frame:

// The render loop. while (1) { // Bind our compiled shader program. glUseProgram(shader_program); // Set the scalar `phase` variable. glUniform1f(loc_phase, sin(4 * t)); // Set the `location` array by copying data into the buffer. glBindBuffer(GL_ARRAY_BUFFER, buffer); glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(points), points); // Use these parameters and our shader program to draw something. glDrawArrays(GL_TRIANGLE_FAN, 0, NVERTICES); }

The verbosity is distracting, but those glUniform1f and glBufferSubData calls are morally equivalent to writing set("variable", value) instead of let variable = value . The C and GLSL compilers can check and optimize the CPU and GPU code separately, but the stringly typed CPU–GPU interface prevents either compiler from doing anything useful to the complete program.

The Age of Heterogeneity

OpenGL and its equivalents make miserable standard bearers for the age of hardware heterogeneity. Heterogeneity is rapidly becoming ubiquitous, and we need better ways to write software that spans hardware units with different capabilities. OpenGL’s programming model espouses the simplistic view that heterogeneous software should comprise multiple, loosely coupled, independent programs.

If pervasive heterogeneity is going to succeed, we need to bury this 20th-century notion. We need programming models that let us write one program that spans multiple execution contexts. This won’t erase heterogeneity’s essential complexity, but it will let us stop treating non-CPU code as a second-class citizen.