In my previous post about the SOIL library I talked about adding some new features. One of them was improving mipmap generation by simply using the glGenerateMipmap(EXT) function. In this post I describe the changes needed to implement it and the benefits gained.

In short: for NPOT sizes I get around 4x faster texture loading and 2x lower memory consumption. For POT sizes loading is about 2x faster (no memory difference).

The problem

Add the ability to use glGenerateMipmap in the SOIL library. The old functionality - the custom software solution for mipmap generation - will be (and should be) left unchanged.

The new generation method is selected by passing a new flag called SOIL_FLAG_GL_MIPMAPS. For desktop OpenGL this should be much faster than the original SOIL method. It can be hardware accelerated and it works for NPOT textures. With the standard SOIL_FLAG_MIPMAPS, SOIL rescales the image to POT and then creates mipmaps. All of that happens in custom code on the CPU side.



Another assumption: since the lib is small I do not want to introduce GLEW or other extension loading libraries. Extension loading will be done manually.

Desired usage:

texID = SOIL_load_OGL_texture("test.jpg", SOIL_LOAD_AUTO, SOIL_CREATE_NEW_ID, SOIL_FLAG_GL_MIPMAPS); // << new flag

The solution

Since there is no GL_EXT_mipmap extension, we need to find where our desired function lives. The easiest way to do that is to download the latest version of glext.h and search for glGenerateMipmap. We will find two versions:

glGenerateMipmap - in OpenGL 3.0 core or in GL_ARB_framebuffer_object

glGenerateMipmapEXT - in GL_EXT_framebuffer_object

The code will try to find the first one; if that fails, the second function pointer will be obtained. If both lookups fail, we fall back to the same functionality as SOIL_FLAG_MIPMAPS.

There is no need to load all functions from the extension; only one is essential. First, the code below should be added:

// soil.c
typedef void (APIENTRY *P_PFNGLGENERATEMIPMAPPROC)(GLenum target);
static P_PFNGLGENERATEMIPMAPPROC soilGlGenerateMipmap = NULL;

static int has_gen_mipmap_capability = SOIL_CAPABILITY_UNKNOWN;
static int query_gen_mipmap_capability( void );

The example above adds the function pointer type (the proper declaration can be found in glext.h) and then the actual function pointer. The last line declares a function that has to be invoked at some point to load and check the extension. The lookup itself should be performed only the first time.

Query extension

Let us go inside query_gen_mipmap_capability() :

int query_gen_mipmap_capability( void )
{
    /* check for the capability */
    P_PFNGLGENERATEMIPMAPPROC ext_addr = NULL;
    if( has_gen_mipmap_capability == SOIL_CAPABILITY_UNKNOWN )
    {
        // instead of checking "GL_ARB_framebuffer_object" or
        // "GL_EXT_framebuffer_object"
        // we simply test the function pointer
        ext_addr = (P_PFNGLGENERATEMIPMAPPROC) soilLoadProcAddr("glGenerateMipmap");
        if(ext_addr == NULL)
        {
            ext_addr = (P_PFNGLGENERATEMIPMAPPROC) soilLoadProcAddr("glGenerateMipmapEXT");
        }
        if(ext_addr == NULL)
        {
            /* not there, flag the failure */
            has_gen_mipmap_capability = SOIL_CAPABILITY_NONE;
        }
        else
        {
            /* it's there! */
            has_gen_mipmap_capability = SOIL_CAPABILITY_PRESENT;
            soilGlGenerateMipmap = ext_addr;
        }
    }
    return has_gen_mipmap_capability;
}

The code is quite simple. It checks whether our function pointer is available in the system. We could check for the extension first, but our method should be equally safe. SOIL is usually called after all OpenGL extension setup, so GL_ARB_framebuffer_object should already have been checked by then.

Let us go to the soilLoadProcAddr function:
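For comparison, the extension-string check that the pointer test replaces could look like the sketch below. It is only an illustration: soil_has_extension is a hypothetical helper name, not part of SOIL, and it assumes the caller has already obtained the space-separated extension list (for example from glGetString(GL_EXTENSIONS) on a compatibility context; core profiles enumerate extensions with glGetStringi instead).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of SOIL.
   Returns 1 if `name` appears as a whole space-separated token in
   `ext_list`, 0 otherwise. A plain strstr() is not enough, because
   one extension name can be a prefix of another. */
static int soil_has_extension(const char *ext_list, const char *name)
{
    size_t name_len;
    const char *p;

    if (ext_list == NULL || name == NULL || *name == '\0')
        return 0;

    name_len = strlen(name);
    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* the token must start at the list start or right after a space */
        int starts_ok = (p == ext_list) || (p[-1] == ' ');
        /* and it must end at a space or at the end of the list */
        int ends_ok = (p[name_len] == ' ') || (p[name_len] == '\0');
        if (starts_ok && ends_ok)
            return 1;
        p += name_len; /* skip this partial match and keep searching */
    }
    return 0;
}
```

One possible use is to test for "GL_ARB_framebuffer_object" and then "GL_EXT_framebuffer_object" before looking up the pointer. This can matter on GLX, where a non-NULL result from glXGetProcAddress does not by itself guarantee that the function is supported.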

void *soilLoadProcAddr(const char *procName)
{
#ifdef WIN32
    PROC p = wglGetProcAddress(procName);
    if (soilTestWinProcPointer(p))
        return p;
    else
        return NULL;
#elif defined(__APPLE__) || defined(__APPLE_CC__)
    // apple specific..
#elif defined ( linux ) || defined( __linux__ )
#if !defined(GLX_VERSION_1_4)
    return glXGetProcAddressARB((const GLubyte *)procName);
#else
    return glXGetProcAddress((const GLubyte *)procName);
#endif
#else
    return NULL; // unsupported platform
#endif
}

An interesting function is soilTestWinProcPointer:

#ifdef WIN32
static int soilTestWinProcPointer(const PROC pTest)
{
    ptrdiff_t iTest;
    if(!pTest) return 0;
    iTest = (ptrdiff_t)pTest;
    if(iTest == 1 || iTest == 2 || iTest == 3 || iTest == -1) return 0;
    return 1;
}
#endif

It appears that we cannot assume that wglGetProcAddress returns either NULL or a valid pointer. Some implementations return 1, 2, 3 or -1 on failure, so we need to test for those values as well.

Usage

Now we can use our loading code in the SOIL texture loading function. This happens in SOIL_internal_create_OGL_texture:

if( flags & SOIL_FLAG_MIPMAPS || flags & SOIL_FLAG_GL_MIPMAPS) { ... }

Inside the if statement we just need to write:

if ((flags & SOIL_FLAG_GL_MIPMAPS) && query_gen_mipmap_capability() == SOIL_CAPABILITY_PRESENT)
{
    soilGlGenerateMipmap(opengl_texture_target);
}
else
{
    // old functionality...
}

Benefits

In the introduction I used catchy phrases like "4x speedup" or "2x lower memory consumption". Let me explain where those results may come from.

Memory consumption

For POT sizes there will be no difference, of course. The new method creates exactly the same number of levels as the SOIL way. But for NPOT sizes the situation changes. Let us take a simple case:

Image 540x600 RGB8 - memory needed: 540*600*3 bytes = ~950 KB

This image will have mipmaps: 270x300, 135x150, 67x75, 33x37, 16x18, 8x9, 4x4, 2x2, 1x1 - 10 levels (including original image).

In total we will need around 1265 KB (33% more than with no mipmaps, of course).

When we use the SOIL method, we first need to rescale the image to POT - the new size is 1024x1024! This is 3072 KB!

Mipmaps: 512x512, 256x256, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, 1x1. In total we will have 11 levels (one more than for NPOT).

Total memory: around 4096 KB! As we see, it is more than 3x larger than in the NPOT case.

The difference is of course biggest when the input size is a little bit larger than some POT size. If the input size is only a little bit smaller than some POT size, the difference is small. As mentioned before, for POT sizes there is no difference (no need to scale the texture).
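The arithmetic above can be reproduced with a few small helpers. This is a sketch for checking the numbers only; the function names are made up for this example and are not part of SOIL, and it assumes tightly packed pixels with no driver-side padding.

```c
#include <stddef.h>

/* Smallest power of two >= n. Intended for texture-sized values
   (1 <= n <= 2^31); larger inputs would overflow. */
static unsigned next_pot(unsigned n)
{
    unsigned p = 1;
    while (p < n)
        p *= 2;
    return p;
}

/* Number of mipmap levels, including the base level.
   Each dimension is halved (rounding down) until both reach 1. */
static unsigned mip_level_count(unsigned w, unsigned h)
{
    unsigned levels = 1;
    while (w > 1 || h > 1) {
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
        ++levels;
    }
    return levels;
}

/* Total bytes for the full mipmap chain, base level included. */
static size_t mip_chain_bytes(unsigned w, unsigned h, unsigned bytes_per_pixel)
{
    size_t total = 0;
    if (w == 0 || h == 0)
        return 0; /* empty image: nothing to store */
    for (;;) {
        total += (size_t)w * h * bytes_per_pixel;
        if (w == 1 && h == 1)
            break;
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
    }
    return total;
}
```

For the 540x600 RGB8 image, mip_chain_bytes(540, 600, 3) gives 1,295,631 bytes (about 1265 KB) over 10 levels. After rescaling to next_pot(540) x next_pot(600) = 1024x1024, the chain takes 4,194,303 bytes (about 4096 KB) over 11 levels.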

Performance

The first gain comes from the smaller number of pixels to process when we use NPOT textures.

The second comes from internal optimizations: the possibility of hardware-accelerated scaling and the lower cost of driver calls (one call to glGenerateMipmap vs several calls to glTexImage).

Brief results

I load one image 50 times and create 50 different texture objects. In each pair below, the first number is for SOIL_FLAG_GL_MIPMAPS and the second for SOIL_FLAG_MIPMAPS.

Image 540x600 RGB jpeg: 50 loads: 0.5s vs 3.5s, 62MB vs 200MB (total memory for 50 textures)

Image 1024x1024 RGB jpeg: 50 loads: 1.1s vs 3.1s, memory 200MB in both cases of course
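A measurement loop of this kind can be sketched as follows. This is not the actual test harness, just a minimal illustration under some assumptions: load_fn, time_loads and count_load are hypothetical names, the callback stands in for one texture load (in a real run it would wrap a call such as SOIL_load_OGL_texture), and clock() measures CPU time rather than wall-clock time.

```c
#include <time.h>

/* Hypothetical callback type for one load; returns nonzero on success. */
typedef int (*load_fn)(void *user_data);

/* Runs `load` `count` times and returns the elapsed CPU time in
   seconds, or -1.0 if any load fails. */
static double time_loads(load_fn load, void *user_data, int count)
{
    clock_t start = clock();
    int i;
    for (i = 0; i < count; ++i) {
        if (!load(user_data))
            return -1.0;
    }
    /* With a real GL context it is worth calling glFinish() here,
       since the driver may still be processing queued commands. */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Stand-in loader for trying the harness without OpenGL:
   it only counts how many times it was called. */
static int count_load(void *user_data)
{
    ++*(int *)user_data;
    return 1;
}
```

Calling time_loads(count_load, &calls, 50) with an int counter runs the stand-in 50 times; replacing count_load with a real loader, once per flag, would give a pair of timings to compare.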



Those are only brief results and I will describe my perf test in the next post.

Although we usually load textures in the init phase, and thus there is no need to fight for performance at all costs, I think it is important to know that a simple improvement can give a nice speed-up. It will be significant in scenarios where we load textures dynamically throughout the game, or when we load a whole directory of photos to display them in a gallery. The user should see results as soon as possible.

Besides all that, it was quite an interesting experience for me :) I dug into the code and had to verify my initial thoughts :)

Notes

Then the code for loading/checking: