Have just been testing the new Cython accelerator module for PyOpenGL 3.x on my workstation. As on the laptop, the PyOpenGL-specific stuff just sort of drops out of the set of hotspots. There are a few little spots showing a percent or two, but pretty much everything else is OpenGLContext scenegraph-management stuff. Those hotspots tend to be nicely compartmentalized things, such as the "converters" module instances.
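For anyone wanting to run this kind of hotspot check themselves, here is a minimal sketch using the standard library's `cProfile` and `pstats`. The post doesn't say which profiler was used, so treat that as an assumption; `render_frame` and `convert_argument` are hypothetical stand-ins for scenegraph and wrapper work, not PyOpenGL or OpenGLContext code.

```python
import cProfile
import io
import pstats


def convert_argument(value):
    """Hypothetical stand-in for a per-call argument-conversion step."""
    return value * 2


def render_frame(n):
    """Hypothetical stand-in for one frame's worth of scenegraph work."""
    total = 0
    for i in range(n):
        total += convert_argument(i)
    return total


def top_hotspots(func, *args, limit=5):
    """Profile func(*args) and return a report of the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


if __name__ == "__main__":
    print(top_hotspots(render_frame, 10000))
```

Sorting by cumulative time shows which high-level operations dominate; sorting by `"tottime"` instead points at the individual functions worth moving into an accelerator module.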

Also ran a profiler over a code base that uses ctypes-only arrays (instead of the Numpy arrays OpenGLContext uses)... those still need work, as a few of the helper functions are far too slow. Nothing complex there from what I can see; I just need to provide similar, trivial accelerator functions to get the array dimensions and the like out of the Python-level instance objects.
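As an illustration of how small such a helper can be, here is a pure-Python sketch that reads the dimensions of a (possibly nested) ctypes array. It relies only on the standard ctypes attributes `_length_` and `_type_` that array types carry; the function name `array_dimensions` is hypothetical and is not PyOpenGL's API. A function this simple is exactly the kind of thing that could be copied into an accelerator module more or less unchanged.

```python
import ctypes


def array_dimensions(value):
    """Return (shape, base_type) for a ctypes array instance or array type.

    Hypothetical helper. ctypes array types expose their element count as
    ``_length_`` and their element type as ``_type_``; a nested array such as
    ``c_float * 4 * 3`` is an array of 3 elements whose element type is
    ``c_float * 4``, so walking ``_type_`` yields the shape (3, 4).
    """
    array_type = value if isinstance(value, type) else type(value)
    dims = []
    # Scalar types like c_float have no _length_, which ends the walk.
    while hasattr(array_type, "_length_"):
        dims.append(array_type._length_)
        array_type = array_type._type_
    return tuple(dims), array_type


Matrix = ctypes.c_float * 4 * 4
shape, base = array_dimensions(Matrix())
print(shape, base)
```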

OpenGLContext still needs some optimization love: the traversal code is still too heavy, and the node-path code is taking far too long (it's calculating the current matrix for every transform in order to do frustum culling on the client side). There are lots of ways to optimize those, but the first step is algorithmic fixes, not writing C/Cython code.
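One possible algorithmic fix of the kind meant here is to reuse each parent's accumulated matrix instead of recomputing every transform's matrix from the root. The sketch below is not OpenGLContext's code; it assumes a node path is simply a list of local 4x4 transforms, stored as row-major nested lists with the column-vector convention (translation in the last column). The naive version does O(n²) matrix multiplies for a path of depth n, the incremental one O(n), and both produce identical results.

```python
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x, y, z):
    """Build a 4x4 translation matrix (translation in the last column)."""
    m = [row[:] for row in IDENTITY]
    m[0][3], m[1][3], m[2][3] = x, y, z
    return m


def cumulative_matrices_naive(path):
    """Recompute every transform's matrix from the root: O(n^2) multiplies."""
    result = []
    for depth in range(len(path)):
        m = IDENTITY
        for local in path[: depth + 1]:
            m = mat_mul(m, local)
        result.append(m)
    return result


def cumulative_matrices_incremental(path):
    """Extend the parent's accumulated matrix by one step: O(n) multiplies."""
    result = []
    m = IDENTITY
    for local in path:
        m = mat_mul(m, local)
        result.append(m)
    return result


path = [translation(1, 0, 0), translation(0, 2, 0), translation(0, 0, 3)]
print(cumulative_matrices_incremental(path)[-1])
```

Caching the accumulated matrix on each path prefix goes one step further, so sibling nodes sharing a parent don't repeat the parent's work across traversals.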

I can't say I've explored using Cython for accessing e.g. raw OpenGL functionality (or any C functionality), nor have I figured out how to do things such as declaring arrays in Cython (I'm using Python lists). Still, for this kind of "spot optimization" of code that is already modular and heavily refactored, it's been a very effective tool. For the most part I'm just copying the existing Python code into the accelerator module and tweaking it to provide the C-level hints that let Cython generate optimized code.

There are obviously concerns with the approach: the accelerator modules will have to be recompiled for every Python version (just like the old SWIG system). But the accelerator modules are comparatively trivial. They're pretty much just Python-API extensions, they don't need to hook into the system OpenGL/Tk/whatever libraries, and they should be relatively stable (all they do is optimize the wrapping process itself, rather than the particular wrappers).