There has been a lot of discussion about OpenGL lately, prompted by the arrival of next-gen low-level APIs such as Mantle, Metal and DirectX 12. These APIs promise better performance through a design that maps more closely to the hardware (that’s why they are called low-level). I’m not going to cover them in this article; there are a lot of good resources like:

OpenGL’s response to these low-level APIs is what some people call AZDO (Approaching Zero Driver Overhead). AZDO is a presentation by Cass Everitt, Graham Sellers, John McDonald and Tim Foley. You can watch the presentation here:

Or check out the slides here: http://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead

What AZDO proposes is to use a group of extensions aimed at reducing driver overhead in modern OpenGL applications.

So, first of all, what is driver overhead? Driver overhead becomes noticeable when the application and the GPU could render more, but the driver cannot keep up with them and becomes the bottleneck.

OpenGL remains the only multiplatform API at the moment, and this will not change in the immediate future. Mantle is not even out yet and will be Windows and AMD only (at least at first), and Metal is aimed at Apple products. For this reason I think we should embrace AZDO to reduce driver overhead and write more efficient programs. That’s why I have decided to write some articles explaining some of these extensions. The first article in this series is devoted to persistent-mapped buffers.

Persistent-mapped buffers

Use case: updating dynamic buffers faster (dynamic VB/IB data, highly dynamic uniform data, MultiDrawIndirect command buffers).

Normally, while a buffer object is mapped, it cannot be used in a non-mapped fashion. Rendering commands that would read from or write to the buffer generate an error (GL_INVALID_OPERATION) while it is mapped.

However, if the buffer is created with immutable storage and the GL_MAP_PERSISTENT_BIT flag, then the buffer can remain mapped while the GPU is using it. Immutable storage means you cannot reallocate that storage (e.g. orphaning the buffer won’t work). You can modify the data but not its “memory address”.

To allocate immutable storage, call glBufferStorage( GLenum target, GLsizeiptr size, const GLvoid * data, GLbitfield flags ) with GL_MAP_PERSISTENT_BIT set in flags. You can then map the buffer (passing GL_MAP_PERSISTENT_BIT to glMapBufferRange as well) and keep it mapped forever.
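As a small illustration of how the flags argument fits together, here is a sketch that checks two of the combination rules documented for glBufferStorage: GL_MAP_PERSISTENT_BIT needs at least one of the read or write bits, and GL_MAP_COHERENT_BIT (discussed later in this article) needs GL_MAP_PERSISTENT_BIT. The helper names IsValidStorageFlags and PersistentWriteFlags are hypothetical and not part of OpenGL. The constants only mirror the values of the corresponding GL_MAP_* enums so that the sketch compiles without GL headers; in a real program you would use the GL enums directly.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the GL_MAP_* bit values (assumption: used here only so the
// sketch does not depend on GL headers).
const std::uint32_t kMapReadBit       = 0x0001; // GL_MAP_READ_BIT
const std::uint32_t kMapWriteBit      = 0x0002; // GL_MAP_WRITE_BIT
const std::uint32_t kMapPersistentBit = 0x0040; // GL_MAP_PERSISTENT_BIT
const std::uint32_t kMapCoherentBit   = 0x0080; // GL_MAP_COHERENT_BIT

// Hypothetical helper: returns false for flag combinations that
// glBufferStorage is documented to reject with GL_INVALID_VALUE.
inline bool IsValidStorageFlags(std::uint32_t flags)
{
    const bool readOrWrite = (flags & (kMapReadBit | kMapWriteBit)) != 0;
    const bool persistent  = (flags & kMapPersistentBit) != 0;
    const bool coherent    = (flags & kMapCoherentBit) != 0;

    if (persistent && !readOrWrite) return false; // persistent needs read or write
    if (coherent && !persistent)    return false; // coherent needs persistent
    return true;
}

// Hypothetical helper: builds the flags for a write-only persistent mapping,
// optionally coherent.
inline std::uint32_t PersistentWriteFlags(bool coherent)
{
    std::uint32_t flags = kMapWriteBit | kMapPersistentBit;
    if (coherent) flags |= kMapCoherentBit;
    assert(IsValidStorageFlags(flags));
    return flags;
}
```

The same flags value would typically be passed both to glBufferStorage and to glMapBufferRange, which is what the sample program in this article does.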

Persistent mapping comes at a cost: you now have to do the synchronization yourself. That is, you need to use fences to ensure that the GPU is not using the buffer while you are writing data to it. Every time you issue a draw call that may use the persistent-mapped buffer, you need to place a fence with glFenceSync. The next time you want to update the buffer, you call glClientWaitSync to make sure the fence has been signaled and the GPU is no longer working with your buffer. If you do this naively, chances are your application will spend a lot of time in glClientWaitSync. That’s when you need double (or even triple) buffering on your buffer object, so you can update one region of the buffer while the GPU is using another. AZDO actually proposes triple buffering: you create an immutable storage three times bigger than needed, so that one region is what the GPU is using, another is what the driver is holding, ready for the GPU to use, and the last one is the region you are updating.

This is a little program that shows how to use this extension. It creates a vertex buffer, maps it at the beginning of the execution and then uses the pointer to update the data every frame. I don’t use double or triple buffering in this example to keep it simple, but it can be added easily. You can find this sample app and some others in this GitHub repository: https://github.com/fsole/GLSamples

EDIT: I forgot about this and it is actually quite important. As neobrain points out, to make sure the data you have just written to a mapped buffer is visible to the GPU, you need to either place a memory barrier or create and map the buffer using GL_MAP_COHERENT_BIT. If you set GL_MAP_COHERENT_BIT when allocating the buffer’s storage and map it with the same bit, the data you write automatically becomes visible to every OpenGL command issued after the write.

    #include <cmath>
    #include <cstdlib>
    #include <iostream>

    #include "GL/glew.h"
    #include "GL/freeglut.h"

    namespace
    {
      struct SVertex2D
      {
        float x;
        float y;
      };

      const GLchar* gVertexShaderSource[] = {
        "#version 440 core\n"
        "layout (location = 0 ) in vec2 position;\n"
        "void main(void)\n"
        "{\n"
        "  gl_Position = vec4(position,0.0,1.0);\n"
        "}\n" };

      const GLchar* gFragmentShaderSource[] = {
        "#version 440 core\n"
        "out vec3 color;\n"
        "void main(void)\n"
        "{\n"
        "  color = vec3(0.0,1.0,0.0);\n"
        "}\n" };

      const SVertex2D gTrianglePosition[] = { {-0.5f,-0.5f}, {0.5f,-0.5f}, {0.0f,0.5f} };
      const size_t gVertexCount = sizeof( gTrianglePosition ) / sizeof( gTrianglePosition[0] );

      GLfloat gAngle = 0.0f;
      GLuint gVertexBuffer( 0 );
      SVertex2D* gVertexBufferData( 0 );
      GLuint gProgram( 0 );
      GLsync gSync( 0 );

    } //Unnamed namespace

    GLuint CompileShaders( const GLchar** vertexShaderSource, const GLchar** fragmentShaderSource )
    {
      //Compile vertex shader
      GLuint vertexShader( glCreateShader( GL_VERTEX_SHADER ) );
      glShaderSource( vertexShader, 1, vertexShaderSource, NULL );
      glCompileShader( vertexShader );

      //Compile fragment shader
      GLuint fragmentShader( glCreateShader( GL_FRAGMENT_SHADER ) );
      glShaderSource( fragmentShader, 1, fragmentShaderSource, NULL );
      glCompileShader( fragmentShader );

      //Link vertex and fragment shader together
      GLuint program( glCreateProgram() );
      glAttachShader( program, vertexShader );
      glAttachShader( program, fragmentShader );
      glLinkProgram( program );

      //Delete shader objects
      glDeleteShader( vertexShader );
      glDeleteShader( fragmentShader );

      return program;
    }

    void Init( void )
    {
      //Check if OpenGL version is at least 4.4
      //(assumes single-digit major and minor numbers in the version string)
      const GLubyte* glVersion( glGetString( GL_VERSION ) );
      int major = glVersion[0] - '0';
      int minor = glVersion[2] - '0';
      if( major < 4 || ( major == 4 && minor < 4 ) )
      {
        std::cerr << "ERROR: Minimum OpenGL version required for this demo is 4.4. Your current version is "
                  << major << "." << minor << std::endl;
        exit( -1 );
      }

      //Init glew
      glewInit();

      //Set clear color
      glClearColor( 1.0f, 1.0f, 1.0f, 0.0f );

      //Create and bind the shader program
      gProgram = CompileShaders( gVertexShaderSource, gFragmentShaderSource );
      glUseProgram( gProgram );
      glEnableVertexAttribArray( 0 );

      //Create a vertex buffer object
      glGenBuffers( 1, &gVertexBuffer );
      glBindBuffer( GL_ARRAY_BUFFER, gVertexBuffer );
      glVertexAttribPointer( 0, 2, GL_FLOAT, GL_FALSE, 0, 0 );

      //Create an immutable data store for the buffer
      size_t bufferSize( sizeof( gTrianglePosition ) );
      GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
      glBufferStorage( GL_ARRAY_BUFFER, bufferSize, 0, flags );

      //Map the buffer forever
      gVertexBufferData = (SVertex2D*)glMapBufferRange( GL_ARRAY_BUFFER, 0, bufferSize, flags );
    }

    void LockBuffer()
    {
      //Replace the previous fence (if any) with a new one
      if( gSync )
      {
        glDeleteSync( gSync );
      }
      gSync = glFenceSync( GL_SYNC_GPU_COMMANDS_COMPLETE, 0 );
    }

    void WaitBuffer()
    {
      if( gSync )
      {
        //Poll the fence (1 ns timeout) until it has been signaled
        while( 1 )
        {
          GLenum waitReturn = glClientWaitSync( gSync, GL_SYNC_FLUSH_COMMANDS_BIT, 1 );
          if( waitReturn == GL_ALREADY_SIGNALED || waitReturn == GL_CONDITION_SATISFIED )
            return;
        }
      }
    }

    void Display()
    {
      glClear( GL_COLOR_BUFFER_BIT );
      gAngle += 0.1f;

      //Wait until the GPU is no longer using the buffer
      WaitBuffer();

      //Modify vertex buffer data using the persistent mapped address
      //(the buffer holds gVertexCount vertices; the original loop ran to 6 and wrote past the end)
      for( size_t i( 0 ); i != gVertexCount; ++i )
      {
        gVertexBufferData[i].x = gTrianglePosition[i].x * cosf( gAngle ) - gTrianglePosition[i].y * sinf( gAngle );
        gVertexBufferData[i].y = gTrianglePosition[i].x * sinf( gAngle ) + gTrianglePosition[i].y * cosf( gAngle );
      }

      //Draw using the vertex buffer
      glDrawArrays( GL_TRIANGLES, 0, 3 );

      //Place a fence which will be signaled when the draw command has finished
      LockBuffer();

      glutSwapBuffers();
    }

    void Quit()
    {
      //Clean-up
      if( gSync )
      {
        glDeleteSync( gSync );
      }
      glUseProgram( 0 );
      glDeleteProgram( gProgram );
      glUnmapBuffer( GL_ARRAY_BUFFER );
      glDeleteBuffers( 1, &gVertexBuffer );

      //Exit application
      exit( 0 );
    }

    void OnKeyPress( unsigned char key, int x, int y )
    {
      //'Esc' key
      if( key == 27 )
        Quit();
    }

    int main( int argc, char** argv )
    {
      glutInit( &argc, argv );
      glutInitDisplayMode( GLUT_DOUBLE | GLUT_RGB );
      glutInitWindowSize( 400, 400 );
      glutCreateWindow( "Persistent-mapped buffers example" );

      //freeglut expects a display callback to be registered for the window
      glutDisplayFunc( Display );
      glutIdleFunc( Display );
      glutKeyboardFunc( OnKeyPress );

      Init();

      //Enter the GLUT event loop
      glutMainLoop();
    }
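The sample above uses a single region and a single fence. As a rough sketch of how the triple buffering described earlier could be layered on top, the class below only does the bookkeeping: it computes how big the storage must be and which region the CPU may write in the current frame. RegionRing and its methods are hypothetical names that do not appear in the original sample, and the class makes no OpenGL calls, so the fences themselves still have to be managed by the caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of OpenGL or the original sample): tracks
// which region of a persistently mapped buffer is written in the current frame.
class RegionRing
{
public:
    // regionSize:  bytes needed for one frame's worth of data.
    // regionCount: 3 for triple buffering, 2 for double buffering.
    explicit RegionRing(std::size_t regionSize, std::size_t regionCount = 3)
        : mRegionSize(regionSize), mRegionCount(regionCount), mCurrent(0)
    {
        assert(regionSize > 0 && regionCount > 0);
    }

    // Total size to request from glBufferStorage and glMapBufferRange.
    std::size_t TotalSize() const { return mRegionSize * mRegionCount; }

    // Index of the region the CPU writes this frame.
    std::size_t Current() const { return mCurrent; }

    // Byte offset of the current region inside the mapped pointer.
    std::size_t WriteOffset() const { return mCurrent * mRegionSize; }

    // Index of the first element of the current region, e.g. the 'first'
    // argument of glDrawArrays when elementSize is the size of one vertex.
    std::size_t FirstElement(std::size_t elementSize) const
    {
        assert(elementSize > 0 && mRegionSize % elementSize == 0);
        return WriteOffset() / elementSize;
    }

    // Move on to the next region once this frame's draw calls are issued.
    void Advance() { mCurrent = (mCurrent + 1) % mRegionCount; }

private:
    std::size_t mRegionSize;
    std::size_t mRegionCount;
    std::size_t mCurrent;
};
```

One way to use it in the sample would be to keep one GLsync per region instead of the single gSync. Each frame would wait on the fence of region Current(), write the vertices starting at gVertexBufferData + FirstElement( sizeof( SVertex2D ) ), draw with that same first index, place a new fence for that region and then call Advance(). With three regions, the wait should rarely block, because the fence being waited on was placed two frames earlier.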