Emulating the extra processors

One of the fun aspects of a cartridge-based system is that you are literally plugging a PCB directly into your system. Thus, you can put extra coprocessors, often digital signal processors, right inside of the cartridge. This gives you an extra edge against your competitors' titles. The most popular SNES coprocessors were the SuperFX for polygon rendering and sprite rotation (used in Star Fox and Super Mario World 2) and the DSP-1 for 3D math (used in Pilotwings and Super Mario Kart).

Often, these processors are so segregated from the host processor that it is possible to implement them using high-level emulation (HLE). This isn't unique to the SNES coprocessors; you also see it in N64 video microcode emulation, among other areas.

The idea here is to think not about individual instructions, but about what an entire group of them does together. In other words, think about a program in terms of functions like "render a triangle at these points" or "rotate this sprite by N degrees." It is then possible to simulate these operations with virtually no overhead. Unfortunately, this approach also discards all timing information required for the execution of each individual instruction. And worst of all, it's usually anything but perfect: minor edge cases, quirks, and flaws in the original implementations are lost, resulting in subtly different operation.
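As a rough illustration of the high-level approach, the sketch below services a made-up "rotate a point by N degrees" coprocessor command with a single host-side function. The function name, the argument layout, and the rounding behavior are all assumptions for the sake of the example; none of it describes how a real SNES chip behaves.

```python
import math
from typing import Tuple

def hle_rotate(x: int, y: int, degrees: float) -> Tuple[int, int]:
    """Hypothetical HLE handler: rotate (x, y) about the origin.

    The entire coprocessor routine is replaced by host floating-point math.
    No per-instruction timing is tracked, and the rounding used here is a
    guess: real hardware would more likely use fixed-point tables with
    their own quirks, which is exactly what this shortcut throws away.
    """
    radians = math.radians(degrees)
    cos_a, sin_a = math.cos(radians), math.sin(radians)
    new_x = round(x * cos_a - y * sin_a)
    new_y = round(x * sin_a + y * cos_a)
    return (new_x, new_y)

# One call stands in for what might be hundreds of coprocessor instructions.
rotated = hle_rotate(100, 0, 90)
```

The call costs the host almost nothing, but the result is only as faithful as the guess about rounding, and the game is told the answer is ready instantly.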

The low-level emulation (LLE) approach is to treat the DSP just like a regular processor: execute each and every instruction, one at a time. Games no longer run faster than they really should, and the edge cases all work correctly. But this is immensely more demanding. Whereas Super Mario Kart runs just as fast with HLE as Super Mario World, a game with no coprocessor, it runs 25-30 percent slower when emulated with LLE. The Cx4 chip used in the later Mega Man X games is so powerful that it can literally cut performance in half with LLE. DSPs are in fact very powerful, typically running at 21 MIPS or more, even back in the SNES days.
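To make the contrast concrete, here is a minimal sketch of a low-level interpreter loop. The four-instruction set, the cycle costs, and the `ToyDSP` name are invented for illustration and do not correspond to any real coprocessor; the point is only that every instruction is fetched, executed, and charged its cycles individually.

```python
from dataclasses import dataclass

# Hypothetical cycle cost of each opcode in this toy instruction set.
CYCLES = {"LDA": 2, "ADD": 2, "SHL": 1, "HLT": 1}

@dataclass
class ToyDSP:
    acc: int = 0          # 16-bit accumulator
    pc: int = 0           # program counter
    cycles: int = 0       # elapsed clock cycles
    halted: bool = False

    def step(self, program):
        """Fetch, decode, and execute exactly one instruction."""
        opcode, operand = program[self.pc]
        self.pc += 1
        if opcode == "LDA":
            self.acc = operand & 0xFFFF
        elif opcode == "ADD":
            self.acc = (self.acc + operand) & 0xFFFF    # wraps at 16 bits
        elif opcode == "SHL":
            self.acc = (self.acc << operand) & 0xFFFF   # high bits fall off
        elif opcode == "HLT":
            self.halted = True
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        self.cycles += CYCLES[opcode]

def run(program):
    """Run a program to completion, one instruction at a time."""
    dsp = ToyDSP()
    while not dsp.halted:
        dsp.step(program)
    return dsp

# Computes (0x9000 * 2) + 5 on the toy chip.
program = [("LDA", 0x9000), ("SHL", 1), ("ADD", 5), ("HLT", 0)]
dsp = run(program)
```

Here the accumulator ends at 0x2005 after 6 cycles, because the shift overflows 16 bits. A high-level shortcut that simply computed 0x9000 * 2 + 5 with host integers would get 0x12005 and report no elapsed time at all. The price is a Python-level dispatch for every single instruction, which is where the extra host CPU load comes from.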

LLE is also a very expensive operation, monetarily speaking: obtaining the DSP program code requires melting the integrated circuit with nitric acid, scanning in the surface of a chip with an electron microscope, and then either staining and manually reading out or physically altering and monitoring the traces to extract the program and data ROMs. This kind of work can cost up to millions of dollars to have done professionally, depending upon the chip's complexity, due to the extremely specialized knowledge and equipment involved. Thanks to the efforts of an individual who goes by the name "Dr. Decapitator," we've been able to extract this data from nearly a dozen chips for just the cost of materials.

Even once the extraction is finished, the work is far from over, because DSPs are usually one-off specialty parts. Instruction sets must be reverse-engineered from binary blobs and emulated with virtually no documentation at all. This is a demanding process, and it shows the level of dedication needed to accurately emulate these games.

Accurate enough?

Honestly, even with all of the issues listed above, we've only scratched the surface of accurate emulation. Take the case of DICE, the digital integrated circuit emulator. Here is an emulator that works at the transistor level for absolutely perfect recreation of the very first video games ever created. To run Pong at about 5-10fps, DICE requires a 3GHz processor. Yes, you read that right: there is no computer processor at this time that can run Pong at the circuit level at full speed. It's not that DICE is a slow program; indeed, it is very well optimized. It's that there is enormous overhead to simulating every last transistor propagation delay.
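To give a feel for why propagation delays matter and why they are costly, here is a small event-driven sketch of a two-gate circuit, out = a AND (NOT a). It is not how DICE is implemented; the circuit, the delay values, and the `simulate` function are assumptions chosen purely for illustration.

```python
import heapq

def simulate(inverter_delay, and_delay, input_changes):
    """Event-driven simulation of out = a AND (NOT a).

    input_changes is a list of (time, value) pairs for input a.
    Returns the list of (time, value) changes seen on the output.
    Delays are in arbitrary time units picked for the example.
    """
    signals = {"a": 0, "not_a": 1, "out": 0}
    # Each event is (time, sequence number, signal name, new value).
    events = [(t, i, "a", v) for i, (t, v) in enumerate(input_changes)]
    heapq.heapify(events)
    seq = len(events)
    trace = []
    while events:
        time, _, name, value = heapq.heappop(events)
        if signals[name] == value:
            continue  # no change, so nothing propagates
        signals[name] = value
        if name == "out":
            trace.append((time, value))
        if name == "a":
            # The inverter reacts to its input after its own delay.
            heapq.heappush(events, (time + inverter_delay, seq, "not_a", 1 - value))
            seq += 1
        if name in ("a", "not_a"):
            # The AND gate re-evaluates whenever either input changes.
            result = signals["a"] & signals["not_a"]
            heapq.heappush(events, (time + and_delay, seq, "out", result))
            seq += 1
    return trace

# Input a rises at time 10; the inverter takes 2 units, the AND gate 1.
glitch = simulate(inverter_delay=2, and_delay=1, input_changes=[(10, 1)])
```

In pure Boolean logic, a AND (NOT a) is always 0. With delays modeled, the output briefly pulses high from time 11 to time 13, because the AND gate sees the new value of a before the inverter has caught up. Capturing that pulse took four queued events for a single input edge on a two-gate circuit, which hints at the overhead involved once a design has thousands of components.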

But one day it will be possible. And I for one am happy that this classic has been completely replicated for future generations.

Applying DICE's approach to more modern systems becomes problematic, however. Take the case of Visual6502. Some clever individuals used a technique not unlike our DSP extraction method to scan in the surface of the 6502 CPU, used by the Nintendo Entertainment System and Commodore 64, among others. They vectorized the transistors and provided a high-level simulation of the chip, which disregards propagation delays. This code was famously demonstrated in JavaScript but has also been ported to C and optimized. Even with the shortcuts it takes, computers are not yet fast enough to use this implementation in emulation.

As much as I would like every emulator to support every last transistor propagation delay, in truth this simply isn't possible. It becomes fair to say that we will likely never see hardware capable of emulating an N64 at this level at playable framerates within our lifetimes.

In fact, it's doubtful this is even possible for the SNES. I do understand that at some level, the power of modern computing does have a hand in determining just how accurate emulators can become.

The compromise I have made was to craft a rough design that would be similar to the real hardware and to cleanly isolate each processor, sharing only the state that the real chips would share. While the goal is always to match the results of all operations on real hardware perfectly, my approach at least creates an emulated system that is virtually indistinguishable from real hardware. But unlike DICE, it is not a perfect digital form of the exact original hardware design.
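The following is a minimal sketch of that kind of design, not the actual code of any emulator. Each processor keeps its own private state and its own clock, the only thing they share is a single port object, and a tiny scheduler always advances whichever processor is furthest behind. The class names, the step costs of 3 and 2 cycles, and the scheduling rule are all assumptions made for the example.

```python
class Port:
    """The only state both chips can see, like a shared I/O register."""
    def __init__(self):
        self.value = 0

class Processor:
    def __init__(self, name, cycles_per_step, port):
        self.name = name
        self.cycles_per_step = cycles_per_step  # hypothetical cost of one step
        self.clock = 0                          # private notion of elapsed time
        self.port = port

class Producer(Processor):
    def step(self):
        self.port.value += 1                    # write to the shared register
        self.clock += self.cycles_per_step

class Consumer(Processor):
    def __init__(self, name, cycles_per_step, port):
        super().__init__(name, cycles_per_step, port)
        self.seen = []                          # private state, hidden from other chips

    def step(self):
        # Record when the read happened and what the shared register held.
        self.seen.append((self.clock, self.port.value))
        self.clock += self.cycles_per_step

def run(processors, until):
    """Always step whichever processor is furthest behind in time."""
    while True:
        current = min(processors, key=lambda p: p.clock)
        if current.clock >= until:
            break
        current.step()

port = Port()
cpu = Producer("cpu", 3, port)
dsp = Consumer("dsp", 2, port)
run([cpu, dsp], until=12)
```

Because neither object can reach into the other, any interaction between the two has to go through the port, just as it would have to go through a bus or register on the real board. Keeping the clocks close together is what decides which value the consumer sees on each read.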

When it comes to the N64 and above, I realize that even this approach is no longer practical. I do not have any easy answers here, but I know that nobody can realistically be expected to develop an emulator that cannot run at even a single frame per second on the most powerful system in the world.

In closing

What concerns me is that, for the most part, developers are afraid to even utilize half of the available processing power of today's systems. Old hardware gets typecast into the system requirements of the earliest, least accurate emulators, which become the benchmarks for all future efforts.

I see no reason why we should not be utilizing every last ounce of power that we have available today—or perhaps even a little bit more in anticipation of faster hardware in the future. The older emulators are not going away: they are still there for folks with older, slower hardware.

For those who have more powerful systems, please give the more accurate emulators a chance! Developers desperately need users to improve the faithfulness of their emulation, and they need an audience who can encourage them to keep going.