



This looks a lot like another screenshot, from two years ago:







So why am I posting this now? Well the answer is simple, we're going back in time and preparing a new melonDS release that is roughly equivalent to 0.1.



-



Joke aside, there are some key differences between those screenshots:



* newer one has proper clipping at screen edges

* both lack Z-buffering, but newer one is different, likely because the older one didn't have Y-sorting

* newer one is using OpenGL



So, yeah, that's the long-awaited OpenGL renderer. Like the good ol' software renderer in its time, it's taking its baby steps, and barely beginning to render something, but I'm working on it, so in a while it will become awesome :P



This renderer will aim for reasonable accuracy. As a result, it will require a pretty recent OpenGL version, and compatible hardware. It's set to OpenGL 4.3 currently, but I will adjust the minimum requirement once the renderer is finished.



If needed, I can provide alternate versions of the renderer for lower-end hardware supporting older OpenGL versions, but they will be less accurate. While the software renderer is the 'gold standard', the current OpenGL renderer is a sort of 'minimum standard' to get most games to render correctly. Any lower-spec renderer may render certain games wrong and that will be unlikely to get fixed (or it would be fixed but at the cost of killing performance).



-



Speaking of the software renderer, I also felt like doing a bit more research towards the holy grail: pixel perfection. I'm not done yet, but I finally have those pesky polygon edge slope functions down. Someday we will get those aging cart tests to pass ;)



But that will be for later. I may also write a post about all the juicy low-level hardware details.



-



But, back to OpenGL. Might as well explain why the planning phase for this renderer took so long. Although you guess that it's 50% the DS GPU being a pile of quirks and 50% me being a lazy fuck.



The first experiments were made with a compute shader based rasterizer. That way, I could get it perfect, while supporting graphical enhancements. I ended up ditching this solution because the performance wasn't good.



So, back to more standard rendering methods, aka pushing triangles. We won't get to rasterize quads correctly that way, but in most cases, the difference shouldn't matter.



First thing to do is to devise an efficient way to push triangles. This requires straying away from standard rendering methods, especially in how we do texturing and all.



On the DS, a game can choose to change the current texture at any time. Polygon attributes can only be changed before a BEGIN_VTXS command, but that doesn't make it any better. Polygons are sorted by their Y coordinates before rendering, which can completely change their ordering. Basically, there is no guarantee that polygons will be grouped by polygon/texture attributes, and the ordering after Y-sorting must be preserved or you might break things like UIs that rely on it.



This is shitty for our purposes though. If, for the DS, changing polygon/texture attributes is mostly free, you can't say as much about OpenGL (or any desktop graphics API for that matter). You would end up with one draw call per polygon, which isn't really a good thing.



Another thing worth considering is that our window for 3D rendering is not a full frame (16.667ms). On the DS, 3D rendering starts at scanline 215 (or 214?). Rendering any sooner would be a bad idea as the game might still be updating texture VRAM. But, we need 3D graphics as soon as scanline 0 of the next frame, which leaves us only 48 scanlines worth of time to do the rendering.



The software renderer is able to work around this limitation by using threading and per-scanline rendering (pretty much like the real thing, except that one seems to render two scanlines at once), which extends the rendering time frame to 192 scanlines.



OpenGL does not render per-scanline, though. So we can forget about this. However, a possibility would be splitting the frame in four 256x48 chunks. I will study that possibility if performance is an issue -- would have to see how far the extended rendering timeframe can outweigh the extra draw calls. Maybe propose the two rendering methods as options.



Back to pushing triangles, for now. I devised a way to pass polygon/texture attributes to the fragment shader, and render all the polygons in one draw call. Nice and dandy, but we're not out of trouble. This will imply passing the raw DS VRAM to the fragment shader and having it handle all the details of texturing, akin to the TextureLookup() function in the software renderer. No idea about the performance implications of this.



Also, we will have to think of something for shadow polygons, I don't think we can use the regular stencil buffer with this.



Well. I hope this renderer will be compatible with OpenGL ES, with all the tricks it may be pulling, but... we'll see.