With all the recent tinkering with audio devices and sound routines, I stumbled across various resources, old and new. One such resource was this article on OS/2 Museum, about the Gravis UltraSound. (And small world, this site is by Michal Necasek, who is on the OpenWatcom team, the C/C++ compiler I use for my 16-bit DOS projects). More specifically, the ‘flamewar’ between Rich Heimlich and the rest of the newsgroup, regarding the quality of the UltraSound patches, and general usability in games.

Now, as a long-time demoscener and Amiga guy, it probably doesn’t surprise you that I myself was an early adopter of the GUS, and it has always had a special place in my heart. So I decided to browse through that flamewar, for a trip down memory lane. I can understand both sides of the argument.

In the blue corner…

The GUS was not designed to be a perfect clone of a prior standard, and build on that, unlike most other sound cards. Eg, the original Sound Blaster was basically an AdLib with a joystick port and a DMA-driven DAC for digital audio added. Later Sound Blasters and clones would in turn build on 100% AdLib/Sound Blaster compatibility. Likewise, the Roland MT-32 set a standard. Roland Sound Canvas set another standard (General MIDI), and also included an MT-32 compatibility mode (which wasn’t quite 100% though). Most other MIDI devices would also try to be compatible with Sound Canvas/General MIDI, and sometimes even MT-32.

The GUS was different. Being a RAM-based wavetable synthesizer, it most closely resembles the Amiga’s Paula chip. Which is something completely alien to PCs. You upload samples, which you can then play at any pitch and volume, anywhere in the stereo image (panning). While there was a brave attempt at a software layer to make the thing compatible with Sound Blaster (SBOS), the nature of the hardware didn’t lend itself very well to simulating the Yamaha OPL2 FM chip. So the results weren’t that great.

In theory it would lend itself quite well to MIDI, and there was also an emulator available to support Roland MT-32 and Sound Canvas (Mega-Em). However, for a complete General MIDI patch set, you needed quite a lot of RAM, and that was the bottleneck here. Early cards only had 256kB. Later cards had 512kB and could be upgraded to a maximum of 1 MB. Even 1 MB is still quite cramped for a high-quality General MIDI patch set. Top quality ROM-based wavetable synthesizers would have around 4 MB of ROM to store the patches.

Since the card was rather new and not that well-known, there weren’t that many games that supported the card directly, so you often had to rely on these less-than-great emulators. Even when games did use the card natively, the results weren’t always that great. And that’s what I want to focus on in this blog, but more on that later.

I never used SBOS myself, so I suppose my perspective on the GUS is slightly different anyway. My first soundcard was a Sound Blaster Pro 2, and when I got a GUS some years later (being a C64/Amiga guy, the SB Pro never impressed me much. The music sounded bland and the card was very noisy), I just left the SB Pro in my system, so I had the best of both worlds: Full AdLib/SB compatibility, and GUS support (or MT-32/Sound Canvas emulation) when required.

In the red corner…

People who owned and loved the UltraSound, knew what the card was capable of, if you played to its strengths, rather than its weaknesses (as in the emulators).

Gravis included their own MIDI player, where you could configure the player to use specially tweaked patch sets for each song. The card could really shine there. For example, they included a solo piano piece, where the entire RAM could be used for a single high-quality piano patch:

Another demonstration they included was this one:

That works well for individual songs, because you know what instruments are and aren’t used. But for generic software like games, you have to support all instruments, so you have to cram all GM instruments into the available RAM.

And being so similar to the Amiga’s Paula, the GUS was quickly adopted by demosceners, who had just recently started to focus on the PC, and brought the Amiga’s ProTracker music to the PC. Initially just by software-mixing multiple channels and outputting on PC speaker, Covox, Sound Blaster or similar single-channel devices. So when the GUS came out, everything seemed to fall into place: This card was made to play modules. Each module contains only the samples it needs, so you make maximum use of the RAM on the card. The chip would perform all mixing in hardware, so there was very little CPU overhead for playing music, and the resulting quality was excellent.

On the Amiga, every game used tracked music. So that would be a great solution for the GUS on the PC as well, right? Well, apparently not, because in practice very few games included tracked music on PC. And of the few games that did, many of them were ported from the Amiga, and used the 4-channel 8-bit music from the Amiga as-is. That didn’t really give the GUS much of a chance to shine. It couldn’t show off its 16-bit quality or its ability to mix up to 32 channels in hardware. Mixing just 4 channels was not such a heavy load on the CPU at the time, so hardware mixing wasn’t that much of an advantage in this specific case.

Yamaha’s FM synthesis

As you may know, the Sound Blaster and AdLib cards used a Yamaha FM synthesizer chip. Originally they used the OPL2, later generations (starting with Sound Blaster Pro 2 and AdLib Gold), used the more advanced OPL3. Now, Yamaha is a big name in the synthesizer world. And their FM synthesis was hugely popular in the 80s, especially with their revolutionary DX7 synthesizer, which you can hear in many hits from that era.

But I just said that I thought the Sound Blaster Pro 2 sounded bland. What happened here? Well, my guess is that MIDI happened. The above flamewar with Rich Heimlich seems to revolve a lot around the capability of the devices to play MIDI data. Rich Heimlich was doing QA for game developers at the time, and apparently game developers thought MIDI was very important.

Yamaha’s chips, much like the GUS, weren’t that well-suited for MIDI. For different reasons however, although, some are related. That is, if you want to play MIDI data, you need to program the proper instruments into the FM synthesizer. If you just use generic instrument patches, your music will sound… generic.

Also, you are not exploiting the fact that it is in fact an FM synthesizer, and you can modify all the operators in realtime, doing cool filter sweeps and other special effects that make old synthesizers so cool.

Why MIDI?

So what is it that made MIDI popular? Let’s define MIDI first, because MIDI seems to mean different things to different people.

I think we have to distinguish between three different ‘forms’ of MIDI:

MIDI as in the physical interface to connect musical devices MIDI as in the file format to store and replay music MIDI as in General MIDI

Interfaces

The first is not really relevant here. Early MIDI solutions on PC were actually a MIDI interface. For example, the MT-32 and Sound Canvas that were mentioned earlier were actually ‘sound modules’, which is basically a synthesizer without the keyboard. So the only way to get sound out of it is to send it MIDI data. Which you could do from any MIDI source, such as a MIDI keyboard, or a PC with a MIDI interface. The Roland MPU-401 was an early MIDI interface for PC, consisting of an ISA card and a breakout box with MIDI connections. The combination of MPU-401 + MT-32 became an early ‘standard’ in PC audio.

However, Roland later released the LAPC-I, which was essentially an MPU-401 and MT-32 integrated on an ISA card. So you no longer had any physical MIDI connection between the PC and the sound module. Various later sound cards would also offer MPU-401-compatibility, and redirect the MIDI data to their onboard synthesizer (like the GUS with its MegaEm emulation, or the Sound Blaster 16 with WaveBlaster option, or the AWE32). I can also mention the IBM Music Feature Card, which was a similar concept to the LAPC-I, except that its MIDI interface was not compatible with the MPU-401, and it contained a Yamaha FB-01 sound module instead of an MT-32.

So for PCs, the physical MIDI interface is not relevant. The MPU-401 hardware became a de-facto standard ‘API’ for sending MIDI data to a sound module. Whether or not that is actually implemented with a physical MIDI interface makes no difference for PC software.

File format

Part of the MIDI standard is also a way to store the MIDI data that is normally sent over the interface to a file, officially called ‘Standard MIDI file’ or sometimes SMF. It is basically a real-time log of MIDI data coming in from an interface: a sequence of MIDI data events, with delta timestamps of very high accuracy (up to microsecond resolution). We mostly know them as ‘.MID’ files. These are also not that relevant to PC games. That is, they may be used in the early stages of composing the music, but most developers will at some point convert the MIDI data to a custom format that is more suited to realtime playback during a game on various hardware.

General MIDI

Now this is the part that affects sound cards, and the GUS in particular. Initially, MIDI was nothing more than the first two points: an interface and a file format. So where is the problem? Well, MIDI merely describes a number of ‘events’, such as note on/off, vibrato, etc. So MIDI events tell a sound module what to play, but nothing more. For example, you can send an event to select ‘program 3’, and then to play a C#4 at velocity 87.

The problem is… what is ‘program 3’? That’s not described by the MIDI events. Different sound modules could have entirely different types of instruments mapped to the same programs. And even if you map to the same instruments, the ‘piano’ of one sound module will sound different to the other, and one module may support things like aftertouch, while another module does not, so the expression is not the same.

In the PC-world, the MT-32 became the de-facto standard, because it just happened to be the first commonly available/supported MIDI device. So games assumed that you connected an MT-32, and so they knew what instruments mapped to which programs. One reason why the IBM Music Feature Card failed was because its FB-01 was very different from the MT-32, and the music had to be tweaked specifically for the unit to even sound acceptable, let alone sound good.

Roland later introduced the SC-55 Sound Canvas, as something of a ‘successor’ to the MT-32. The SC-55 was the first device to also support ‘General MIDI’, which was a standardization of the instrument map, as well as a minimum requirement for various specs, such as polyphony and multi-timbral support. It could be switched to the MT-32 instrument map for backward compatibility.

Where did it go wrong?

While the idea of standardizing MIDI instruments and specs seems like a noble cause, it never quite worked in practice. Firstly, even though you now have defined that program 1 is always a piano, and that you can always find an organ at program 17, there is still no guarantee that things will sound anything alike. Different sound modules will have different methods of sound generation, use different samples, and whatnot, so it never sounds quite the same. What’s worse, if you play an entire piece of music (as is common with games), you use a mix of various instruments. You get something that is more than the ‘sum of its parts’… as in, the fact that each individual instrument may not sound entirely like the one the composer had used, is amplified by them not fitting together in the mix like the composer intended.

In fact, even the SC-55 suffered from this already: While it has an MT-32 ’emulation’ mode, it does not use the same linear arithmetic method of sound generation that the real MT-32 uses, so its instruments sound different. Games that were designed specifically for the MT-32 may sound slightly less than perfect to downright painful.

The second problem is that indeed developers would design sound specifically for the MT-32, and thereby used so-called ‘System Exclusive’ messages to reprogram the sounds of the MT-32 to better fit the composition. As the name already implies, these messages are exclusive to a particular ‘system’, and as such are ignored by other devices. So the SC-55 can play the standard MT-32 sounds, but it cannot handle any non-standard programming.

This leads to a ‘lowest common denominator’ problem: Because there are so many different General MIDI devices out there, it’s impossible to try and program custom sounds on each and every one of them. So you just don’t use it. This is always a problem with standards and extension mechanisms, and reminds me a lot of OpenGL and its extension system.

Today, many years later, General MIDI is still supported by the built-in Windows software synthesizer and most synthesizers and sound modules on the market, and the situation hasn’t really changed: if you just grab a random General MIDI file, it will sound different on all of them, and in many cases it doesn’t even sound that good. The fact that it’s ‘lowest common denominator’ also means that some of the expression and capabilities of synthesizers are lost, and they tend to sound a bit ‘robotic’.

So I think by now it is safe to say that if the goal of General MIDI was to standardize MIDI and make all MIDI sound good everywhere, all the time, it has failed miserably. Hence General MIDI never caught on as a file format for sharing music, and we stopped using it for that purpose many years ago. The ‘classic’ MIDI interface and file format/data are still being used in audio software, but things went more into the direction of custom virtual instruments with VSTi plugins and such, so I don’t think anyone bothers with standardized instrument mapping anymore. The first two parts of MIDI, the interface and the file format, did their job well, and still do to this day.

Getting back to games, various developers would build their music system around MIDI, creating their own dialect or preprocessor. Some interesting examples are IMF by ID software, which preprocesses the MIDI data to OPL2-specific statements, and HERAD by Cryo Interactive.

Doing something ‘custom’ with MIDI was required for at least two reasons:

Only high-end devices like the IBM Music Feature Card and the MPU-401/MT-32/Sound Canvas could interpret MIDI directly. For other devices, such as PC speaker, PCjr/Tandy audio, AdLib or Game Blaster, you would need to translate the MIDI data to specific commands for each chip to play the right notes. Most audio devices tend to be very limited in the number of instruments they can play at a time, and how much polyphony they have.

Especially that second issue is a problem with MIDI. Since MIDI only sends note on/off commands, there is no explicit polyphony. You can just endlessly turn on notes, and have ‘infinite polyphony’ going on. Since MIDI devices tend to be somewhat ‘high-end’, they’ll usually have quite a bit of polyphony. For example, the MT-32 already supports up to 32 voices at a time. It has a simple built-in ‘voice allocation’, so it will dynamically allocate voices to each note that is played, and it will turn off ‘older’ notes when it runs out. With enough polyphony that usually works fine in practice. But if you only have a few voices to start with, even playing chords and a melody at the same time may already cause notes to drop out.

An alternative

Perhaps it’s interesting to mention the Music Macro Language (MML) here. Like the MIDI file format it was a way to store note data in a way that was independent from the actual hardware. Various early BASIC dialects had support for it. It seemed to especially be popular in Japan, possibly because of the popularity of the MSX platform there. At any rate, where some game developers would build a music system around MIDI, others would build an MML interpreter, usually with their own extensions to make better use of the hardware. Chris Covell did an interesting analysis of the MML interpreter found in some Neo Geo games.

So, trackers then!

Right, so what is the difference between trackers and MIDI anyway? Well, there are some fundamental differences, mainly:

The instrument data is stored together with the note data. Usually the instruments are actually embedded inside the tracker ‘module’ file, although some early trackers would store the instruments in separate files and reference them from the main file, so that instruments could easily be re-used by multiple songs on a single disk. Notes are entered in ‘patterns’, like a 2d matrix of note data, where a pattern is a few bars of music. These patterns are then entered in a ‘sequence’, which determines the order of the song, allowing easy re-use of patterns. The columns of the pattern are ‘channels’, where each channel maps directly to a physical voice on the audio hardware, and each channel is monophonic, like the audio hardware is. The horizontal ‘rows’ of the pattern make up the timeline. The timing is usually synchronized to the framerate (depending on the system this is usually 50, 60 or 70 Hz), and the tempo is set by how many frames each row should take.

Does that sound limited? Well yes, it does. But there is a method to this madness. Where MIDI is a ‘high-level’ solution for music-related data, which is very flexible and has very high accuracy, trackers are more ‘low-level’. You could argue that MIDI is like C, and trackers are more like assembly. Or, you could think of MIDI as HTML: it describes which components should be on the page, and roughly describes the layout, but different browsers, screen sizes, installed fonts etc will make the same page look slightly different. A tracker however is more like PostScript or PDF: it describes *exactly* what the page looks like. Let’s look at these 4 characteristics of trackers in detail.

Instruments inside/with the file

Trackers started out as being hardware-specific music editors, mainly on C64 and Amiga. As such, they were targeted at a specific music chip and its capabilities. As a result, you can only play tracker modules on the actual hardware (or an emulation thereof). But since it is a complete package of both note data and instrument data, the tracker module defines exactly how the song should sound, unlike MIDI and its General MIDI standard, which merely describe that a certain instrument should be ‘a piano’, or ‘a guitar’ or such.

The most popular form of tracker music is derived from the Amiga and its SoundTracker/NoiseTracker/ProTracker software. I have discussed the Amiga’s Paula sound chip before. It was quite revolutionary at the time in that it used 4 digital sound channels. Therefore, Amiga trackers used digital samples as instruments. Given enough CPU processing power, and a way to output at least a single digital audio stream, it was relatively easy to play Amiga modules on other platforms, so these modules were also used on PC and even Atari ST at times.

Notes entered in patterns

I more or less said it already: trackers use sequences of patterns. I think to explain what a ‘pattern’ is, an image speaks more than a thousand words:

If you are familiar with drum machines, they usually work in a similar way: a ‘pattern’ is a short ‘slice’ of music, usually a few bars. Then you create a song by creating a ‘sequence’ of patterns, where you can re-use the same patterns many times to save time and space.

Patterns are vertically oriented, you usually have 64 rows to place your notes on. What these rows mean ‘rhythmically’ depends on what song speed you choose (so how quickly your rows are played), and how ‘sparsely’ you fill them. For example, you could put 4 bars inside a single pattern. But if you space your notes apart twice as far, and double the speed, it sounds the same, but you only get 2 bars out of the 64 rows now. However, you have gained extra ‘resolution’, because you now have twice the amount of rows in the same time interval.

Pattern columns are ‘voices’

This is perhaps the biggest difference between MIDI and trackers: Any polyphony in a tracker is explicit. Each channel is monophonic, and maps directly to a (monophonic) voice on the hardware. This is especially useful for very limited sound chips that only have a handful of voices (like 3 for the C64 and 4 for the Amiga). MIDI simply sends note on/off events, and there will be some kind of interpreter that converts the MIDI data to the actual hardware, which will have to decide which voices to allocate, and may have to shut down notes when new note on-events arrive and there are no more free voices.

With a tracker, you will explicitly allocate each note you play to a given channel/voice, so you always know which notes will be enabled, and which will be disabled. This allows you to make very effective use of only a very limited number of channels. You can for example ‘weave’ together some drums and a melody or bassline. See for example what Rob Hubbard does here, at around 4:03:

He starts out with just a single channel, weaving together drums and melody. Then he adds a second channel with a bassline and even more percussion. And then the third channel comes in with the main melody and some extra embellishments. He plays way more than 3 parts all together on a chip only capable of 3 channels. That is because he can optimize the use of the hardware by manually picking where every note goes.

Here is another example, by Purple Motion (of Future Crew), using only two channels:

And another 2-channel one by Purple Motion, just because optimization is just that cool:

I suppose these songs give a good idea of just how powerful a tool a tracker can be in capable hands.

The horizontal ‘rows’ of the pattern make up the timeline

This part also has to do with efficiency and optimization, but not in the musical sense. You may recall my earlier articles regarding graphics programming and ‘racing the beam’ and such. Well, of course you will want to play some music while doing your graphics in a game or demo. But you don’t want your music routine to get in the way of your tightly timed pixel pushing. So what you want is to have a way to synchronize your music routine as well. This is why trackers will usually base their timing on the refresh-rate of the display.

For example, Amiga trackers would run at 50 Hz (PAL). That is, your game/demo engine will simply call the music routine once per frame. The speed-command would be a counter of how many frames each row would take. So if you set speed 6, that means that the music routine will count down 6 ‘ticks’ before advancing to the next row.

This allows you to choose when you call the music routine during a frame. So you can reserve a ‘slot’ in your rastertime, usually somewhere in the vertical blank interval, where you play the music. Then you know that by definition the music routine will not do anything during the rest of the frame, so you can do any cycle-exact code you like. The music is explicitly laid out in the row-format to be synchronized this way, allowing for very efficient and controlled replaying in terms of CPU time. The replay routine will only take a handful of scanlines.

With regular MIDI this is not possible. MIDI has very accurate timing, and if you were to just take any MIDI song, you will likely have to process multiple MIDI events during a frame. You never quite know when and where the next MIDI event may pop up. Which is why games generally quantize the MIDI down. However, quantizing it all the way down to around 50 or 60 Hz is not going to work well, so they generally still use a significantly higher frequency, like in the range 240-700 Hz. Which is an acceptable compromise, as long as you’re not trying to race the beam.

Back to the UltraSound

The specific characteristics and advantages of tracker-music should make it clear why it was so popular in the demoscene. And by extension you will probably see why demosceners loved the UltraSound so much: it seems to be ‘custom-made’ for playing sample-based tracker modules. ProTracker modules already sounded very good with 4 channels and 8-bit samples, even if on the PC you needed to dedicate quite a bit of CPU power for a software-mixing routine.

But now there was this hardware that gave you up to 32 channels, supported 16-bit samples, and even better: it did high-quality mixing in hardware, so like on the Amiga it took virtually no CPU time to play music at all. The UltraSound was like a ‘tracker accelerator’ card. If you heard the above examples with just 2 or 3 channels on primitive chips like the C64’s SID and the Amiga’s Paula, you can imagine what was possible with the UltraSound in capable hands.

Where things went wrong for the UltraSound is that trackers were not adopted by a lot of game developers. Which is strange in a way. On the Amiga, most games used one of the popular trackers, usually ProTracker. You would think that this approach would be adopted for the UltraSound as well. But for some reason, many developers treated it as a MIDI device only, and the UltraSound wasn’t nearly as impressive in games as it was in the demoscene.

So, let’s listen to two of my favourite demos from the time the UltraSound reigned supreme in the demoscene. The legendary demo Second Reality has an excellent soundtrack (arguably the highlight of the demo), using ‘only’ 8 channels:

And Triton’s Crystal Dream II also has some beautiful tracked music, again I believe it is ‘only’ 8 channels, certainly not the full 32 that the UltraSound offered (note by the way that the card pictured in the background of the setup menu is an UltraSound card):

What is interesting is that both these groups developed their own trackers. Future Crew developed Scream Tracker, and Triton developed FastTracker. They became the most popular trackers for PC and UltraSound.

So who won in the end? Well, neither did, really. The UltraSound came a bit too late. There were at least three developments that more or less rendered the UltraSound obsolete:

CPUs quickly became powerful enough to mix up to 32 channels with 16-bit accuracy and linear interpolation in the background, allowing you to get virtually the same quality of tracker music from any sound card with a single stereo 16-bit DAC (such as a Sound Blaster 16 or Pro Audio Spectrum 16) as you do from the UltraSound. CD-ROMs became mainstream, and games started to just include CD audio tracks as music, which no sound card could compete with anyway. Gaming migrated from DOS to Windows. Where games would access sound hardware directly under DOS, in Windows the sound hardware was abstracted, and you had to go via an API. This API was not particularly suited to a RAM-based wavetable synthesizer like the UltraSound was, so again you were mostly in General MIDI-land.

As for MIDI, point 2 more or less sealed its fate in the end as well, at least as far as games are concerned. Soundtracks are ‘pre-baked’ to CD-tracks or at least digital audio files on a CD, and just streamed through a stereo 16-bit DAC. MIDI has no place there.

I would say that General MIDI has become obsolete altogether. It may still be a supported standard in the market, but I don’t think many people actually use it to listen to music files on their PCs anymore. It just never sounded all that good.

MIDI itself is still widely used as a basis for connecting synthesizers and other equipment together, and most digital audio workstation software will also still be able to import and export standard MIDI files, although they generally have their own internal song format that is an extension of MIDI, which also includes digital audio tracks. Many songs you hear on the radio today probably have some MIDI events in them somewhere.

Trackers are also still used, both in the demoscene, and also in the ‘chiptune’ scene, which is somewhat of a spinoff of the demoscene. Many artists still release tracker songs regularly, and many fans still listen to tracker music.