For years, Nintendo 64 emulation has been pretty bad and lagging significantly behind Nintendo Gamecube/Wii emulation. At least 90 to 95% of the remaining problems are at the RDP level, the N64’s video subcomponent chip. By moving away from High-Level Emulation of the RDP, we could solve most of the remaining problems. The problem has been that for a long time, it seemed impossible to do this at playable speeds. Software rendering is too slow for a GPU from this timeframe, and older versions of OpenGL have too many crippling limitations in order to allow for a 1:1 reprogramming and port of Angrylion to GL.

At last, this dire situation will change in the upcoming days and we can finally release to the public something that will revolutionize N64 emulation forever so that we can move away from all of the hacky HLE video plugins that have been released in recent years.

The world’s first-ever low-level N64 video plugin implemented using the Vulkan API!

And not just any video plugin either. This is a reimplementation/port of Angrylion to Vulkan. This will be the first time most will be able to get anywhere close to playable speeds with an accuracy-based N64 video renderer.

This hardware renderer is unique for the following reasons:

This is the first N64 emulator project ever so far to receive Vulkan support.

This is the first time ever that an emulator takes advantage of asynchronous compute (exclusive only to DirectD12/Vulkan) for hardware rasterization of an emulated GPU.

This is the first time ever that the Angrylion renderer has been ported to a graphics API. It is the first time an RDP LLE video renderer for N64 has been capable of running at fullspeed. It marks a shift away from decades of inaccurate high-level emulation of the N64’s RDP which made for buggy N64 emulation in general.

How to use it?

When it will be released in the upcoming days, this is what you will need in order to use it.

You will need the latest RetroArch version (either nightlies or the upcoming 1.3.5 version). The libretro API has been updated to make asynchronous compute cores possible, hence why ‘Mupen64plus HW libretro’ will not work on any older version of RetroArch.

Your video card also needs to support the Vulkan graphics API.

When RetroArch 1.3.5 gets released

Download the new RetroArch 1.3.5, go to ‘Online Updater’, go to ‘Core Updater’.

From there, go to ‘Experimental’, and download Mupen64plus HW. This will download the Vulkan-enabled Mupen64plus core.

Before trying to use it, make sure your video card supports the Vulkan API otherwise it won’t work!

Why RDP LLE? Why is this significant?

For years, Nintendo 64 emulators have fixated upon a High-Level Emulation approach to emulate the RDP, the N64’s video rasterizer. Examples include Glide64, Rice, GLN64 (and its recent fork, GlideN64).

It is a practical but imperfect way of emulating the RDP for many reasons:

These plugins require numerous game-specific hacks and workarounds. It becomes a real maintenance chore and there’s plenty of missing graphical effects to this day. Examples include: missing lens flares in Turok: The Dinosaur Hunter, corrupt backgrounds in Killer Instinct Gold and GoldenEye 007, fiddly auxilliary frame buffer glitches, inaccurate approximations of graphical effects due to combiner issues, etc.

Most of these HLE RDP plugins recycle a lot of old code. For instance, Gliden64 is mostly a collage of GLN64 + Glide64 code, but the code recycling goes deeper than that. Low-level triangle rasterization functions in both Glide64 and Gliden64 are borrowed from Z64 GL, an RDP plugin by Ziggy. The problem is that bugs still exist in these sections of the code. Most of the low-level rasterization functions that keep being borrowed in these high-level plugins are directly responsible for many of the remaining glitches you can see. And since the code was written by outside people who are no longer active in the scene, it doesn’t seem likely it is ever going to get fixed.

There are other legacy issues. The most notorious one of all is of course Glide64, which originally targeted (you guessed it) the obsolete 3Dfx graphics API Glide. We are talking GL 1.2 / 1.3-ish era here, really stone-age. An OpenGL wrapper for Glide had to be written around Glide64 in order to get it to run with OpenGL-supported video cards in the first place, but the wrapper code unfortunately is far from optimal. Other plugins like Z64 GL still seem to use OpenGL 1.4x-era code and lots of questionable fixed function wrapper code.

Many games use custom RSP microcode to do certain game tasks. For instance, Rogue Squadron uses custom RSP microcode for terrain heightmap generation, while games like Resident Evil 2 and Legend of Zelda: Ocarina of Time use the RSP for video and image decompression routines. Usually this would call for a high-level implementation/approximation of what the game would expect to be returned to the RDP, and to also implement corresponding high-level displaylist implementations on the RDP rasterization side. Many games simply have never had their custom microcode properly reverse engineered, so the only way to play these games is to use a combination of a low-level RSP plugin and a low-level RDP renderer. Most of the existing microcode was actually handed to devs on a silver platter and it seems the remaining microcodes will probably never be reversed for this reason.

You run into pretty big bottlenecks with traditional GL rendering for which no real solutions exist, frame buffer bottlenecks, depth buffer bottlenecks, etc. More recent versions of OpenGL (4.3+) have made it possible to fix some of the issues, like better depth compare, faster and more efficient framebuffer to framebuffer copying, but it’s still honestly a big inoptimal mess.

Coverage emulation is usually completely stubbed out in HLE video plugins.

All of these plugins have so far completely avoided trying to emulate the VI interface. The VI interface basically reads from the RDP’s frame buffer and sends it to the digital-to-analog converter to create the video output. Along the way it applies several postprocessing effects including what appears to be 8x MSAA. I guess some can blame for this VI interface for leading to the ‘smudged’/’smoothed out’/’blurry’ look of many N64 games. But hey, we’re going for authentic here 🙂

Enter this new renderer. It takes as a base Angrylion (the most accurate RDP rasterizer yet so far) and it uses compute shaders to transfer the workload to the GPU instead of the CPU. Angrylion has been known to render nearly all games accurately unlike regular HLE N64. The only problem has been that it has been too slow to run at full-speed because of it being completely software rendered, which puts all the strain on the CPU. RDP LLE changes that around so that this rendering bottleneck is completely gone. With RDP LLE, the only remaining bottleneck will be the interpreter RSP plugin that a low-level RDP plugin has to use.

Work remaining to be done

With this video renderer we have aimed for a GL 4.3 / Vulkan featureset in order to escape most of the bottlenecks and limitations that usually drags N64 emulation down. From now on, there will be two big remaining tasks to be done:

We will have to port the code over to OpenGL 4.3+. Lower subsets of OpenGL won’t work as this renderer requires compute shader support.

With the RDP bottleneck being completely gone with this renderer, RSP has now become the main bottleneck. We will have to write a recompiler for the RSP in order to attain even better performance and reduce the RSP bottleneck as much as possible. So far, only Project64 has an RSP recompiler like this, but there are plans of using Daeken’s generic recompiler system in order to come up with something equivalent for Mupen64plus libretro.

Asynchronous compute raymarching libretro test core

In order to make this renderer possible, extensions to the libretro API had to be added.

For educational reasons and in order to serve as a proof of concept on how to make your own libretro core that takes advantage of the recently added asynchronous compute capabilities, a test core has been made, called ‘libretro-test-vulkan-async-compute‘.

It is a basic test program that demonstrates raymarching being done in Vulkan. We’d very much like to see people improve upon this and collaborate to make a more impressive core out of it.

You can find the sourcecode for this sample test core inside RetroArch’s source code directory tree (cores/libretro-test-vulkan-async-compute in specific).

Conclusion

It has been a long time coming, but finally with paraLLEl, N64 emulation can finally become ‘good enough’ and we no longer need to have patchwork renderer plugins that try to fix graphics issues on a per-game basis.