Standards and specs

The Interchange File Format (IFF)

Simple, portable, and extensible data storage


The Interchange File Format (IFF) standard is widely regarded as long dead, and indeed, no one uses it anymore, except that nearly everyone uses it sometimes. Many believe the IFF standard is an Amiga graphics standard, and certainly, there have been a great many graphics files saved in the IFF format. However, IFF is not just a graphics format. It has been used for graphics, audio, text, saved games, and more. Electronic Arts actually developed the standard, back when it was a software company and not just a video game company.

The IFF standard was introduced in 1985 and has a number of characteristics which date it. The most obvious is the lack of support for files exceeding 4GB in size.

The IFF format has long been abandoned. It's not that the Web site for it is down; it predates Web sites and never had one. There is no central registry to speak of anymore. However, the IFF format is still in use. The AIFF audio file format is simply one particular instance of an IFF file. The Quetzal save format shared by Z-machine interpreters is an IFF format. The file format survives because it does a good job of solving a number of the central and recurring problems which lead people to want a "file format" in the first place.

While the IFF format might not be actively maintained, or in very widespread use, it offers a lot of insight into generic file format standards and issues that might come up in dealing with them.

A brief overview

The easiest way to understand what the IFF standard aimed to accomplish is to look at a brief overview of it. At the most generic level, an IFF file consists of a single IFF chunk. A chunk is a four-byte type, followed by a 32-bit length, followed by that many bytes of data. The type is generally chosen to be four printable ASCII characters in a row, which in some way hint at the type of data encoded. How is the data encoded? That depends on the type. The outermost chunk of a file must be one of three types: 'FORM,' 'LIST,' or 'CAT .' (There is a space in the CAT name.) The FORM type is the simple case: its data is a four-byte form type followed by nested chunks. The LIST and CAT types group multiple FORMs together. Since every chunk declares its size, a program can skip over a chunk it doesn't know how to handle.
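The chunk layout described above is easy to sketch in code. The following Python fragment is a minimal illustration, not part of any real IFF library, and the helper name is mine; it parses a generic chunk header, which is four ASCII type bytes followed by a 32-bit big-endian size:

```python
import struct

def read_chunk_header(data, offset):
    """Parse one IFF chunk header at `offset`: a four-byte ASCII
    type followed by a 32-bit big-endian length. Returns the type,
    the declared size, and the offset where the chunk data begins."""
    ctype = data[offset:offset + 4].decode("ascii")
    (size,) = struct.unpack(">I", data[offset + 4:offset + 8])
    return ctype, size, offset + 8

# A minimal FORM: the FORM's data is a four-byte form type ("TEST")
# plus one nested, empty chunk ("BODY" with size 0) -- 12 bytes total.
sample = b"FORM" + struct.pack(">I", 12) + b"TEST" + b"BODY" + struct.pack(">I", 0)
ctype, size, data_start = read_chunk_header(sample, 0)
# ctype == "FORM", size == 12, data_start == 8
```

A real reader would then recurse into the FORM's data the same way, treating each nested chunk identically.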

Headers and extensions

You can figure out what a file is in essentially four ways. The first is to simply have everything hard-coded: "my program writes a file named foo, then reads the file named foo." However, this doesn't allow for any kind of interchange between programs. To allow interchange, you need some way for programs to recognize files that they can read.

CP/M introduced the notion of filename "extensions": three-letter suffixes which gave information about a file's type. For instance, ".BAS" might indicate a BASIC program, or ".TXT" a text file. The practice was adopted in MS-DOS, and to this day, Windows® XP still uses extensions to identify files. None of CP/M, MS-DOS, or early Windows allowed more than one extension on a file, or extensions over three characters in length. This led to a number of ugly and hard-to-read extensions, and a fair amount of reuse of common names.

UNIX® systems tended to use both filename extensions and headers. A header is a chunk of data at the beginning of a file identifying its type. Many UNIX programs have used exceptionally trivial headers, consisting of as little as a single "magic number" -- a numeric value which is unlikely to occur except in files of a given type.

Finally, the Mac introduced the notion of file type and creator codes, which were stored in the file system as data about the file rather than data in the file. Type and creator codes offered a very different set of quirks from extensions. Modern Macs use both systems, with surprisingly little confusion.

The IFF format has a magic number in its header, followed by a more complete header indicating the contents in detail. As it happens, many IFF files are also stored with an ".IFF" filename extension, although users are sometimes surprised when such a file isn't a graphics file. Some programs have tended to save single-FORM IFF files with their form's type as a file extension -- for instance, ".ilbm" for an ILBM file.

A number of common IFF types are widely enough known to be supported in modern software. The two most obvious are AIFF and ILBM, which are a sound format and a graphics format (InterLeaved BitMap), respectively. The JFIF (JPEG) image format has a similar header, too, and the PNG specification has loosely imitated some of its concepts. (One major change is that PNG files are designed to be easily serially writable. IFF writers generally rely on the ability to seek back and update data.)

The IFF format is a binary file format. It is intended to be easy for programs to read, but it is not human-readable with the naked eye. The decision to encode sizes at the head of chunks simplifies processing when reading, but can make files harder to write. Many programs which write IFF files depend on seeking back in the file to "fill in" the size once it's known, which can be awkward if you want to use IFF files in a pipeline. Note that no special marker is given for the end of a chunk; after all, you already know what size it is.
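The seek-back-and-fill-in pattern is simple enough to sketch. This Python fragment (the helper names are mine, and it assumes a seekable output stream) writes a chunk header with a placeholder size, then patches in the real size once the data has been written:

```python
import io
import struct

def begin_chunk(f, ctype):
    """Write a chunk type plus a placeholder size; return the
    position of the size field so end_chunk() can patch it."""
    f.write(ctype)
    size_pos = f.tell()
    f.write(struct.pack(">I", 0))  # placeholder, filled in later
    return size_pos

def end_chunk(f, size_pos):
    """Seek back and fill in the real chunk size, then pad the
    chunk to an even length (the pad byte is not counted)."""
    end = f.tell()
    size = end - size_pos - 4
    f.seek(size_pos)
    f.write(struct.pack(">I", size))
    f.seek(end)
    if size % 2:
        f.write(b"\0")

f = io.BytesIO()
pos = begin_chunk(f, b"NAME")
f.write(b"hello")  # 5 bytes -- odd, so a pad byte follows
end_chunk(f, pos)
data = f.getvalue()  # b"NAME" + size 5 + b"hello" + pad: 14 bytes
```

This is exactly the pattern that breaks down in a pipeline: on a non-seekable stream, the writer would instead have to buffer each chunk in memory until its size is known.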

A couple of quirks reflect the 68000 architecture on which the IFF format was first developed. One is the use of 32-bit, big-endian numbers. The other is that chunks are always padded to an even number of bytes. The stored chunk length does not include the pad byte: if your chunk length is three, you write a fourth byte after your real contents, so the reader doesn't have to worry about two-byte alignment issues. This provides a substantial performance advantage on many architectures, even when unaligned accesses are possible.
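The padding rule matters mostly when walking a file chunk by chunk: because the stored size excludes the pad byte, a reader has to round up before advancing. A minimal sketch (the helper name is hypothetical):

```python
import struct

def skip_chunk(data, offset):
    """Return the offset of the chunk following the one at `offset`,
    honoring the even-padding rule: the stored size excludes the
    pad byte, so round it up to an even number before advancing."""
    (size,) = struct.unpack(">I", data[offset + 4:offset + 8])
    return offset + 8 + size + (size % 2)

# An odd-sized chunk (3 data bytes plus 1 pad byte), then another chunk.
buf = b"ODD " + struct.pack(">I", 3) + b"abc\0" + b"NEXT" + struct.pack(">I", 0)
next_off = skip_chunk(buf, 0)  # 8-byte header + 3 data + 1 pad = 12
```

Forgetting the `size % 2` adjustment is the classic interoperability bug in homegrown IFF readers: it works on files whose chunks all happen to be even-sized, then fails on the first odd one.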

Similarities

Some formats are at least similar to IFF. For instance, the RIFF file format is nearly identical to IFF, except that it uses little-endian integers. For the arguable convenience of slightly faster interpretation of a four-byte size field on x86 hardware, compatibility was discarded -- in retrospect, maybe not a good choice. It's one more standard to keep track of, and one more format for software to interpret.

The IFF specification's allowance for new chunk types is already suspiciously similar to XML, but the real kicker is that, since chunks can be nested, the IFF spec provides for defining your own chunk type, which defines its own contents. Since the whole chunk is skipped by programs unfamiliar with it, there's no namespace clash here. On the other hand, doing this too much violates the point of the spec. This is supposed to be the Interchange File Format, not the Only I Know What This Is File Format. This is a problem XML tools can face too, however. It is intrinsic to allowing later definition of new data types.

Another thing particularly similar to IFF files is the Macintosh resource fork. The resource file format has a different basic structure: you can have many resources of each type, but each must have a unique numeric ID. By contrast, with IFF, you can have as many chunks as you have space for, but you have to come up with your own convention for assigning tags or names to them. In fact, this might be an excellent use of the NAME chunk type. The AIFF FORM can have a NAME chunk attached to it, so you could scan a file for AIFF FORMs whose NAME chunk contains a given name. The big feature this implementation would lack is the ability to randomly access specific resources and edit them without rewriting the whole file. You could write an IFF library to handle this -- it's just not a standard feature. Embedding additional tags in data is a popular feature; JPEG files, for example, can contain thumbnails and metadata (such as camera orientation or resolution).

Most file formats that provide headers have some conceptual similarity to IFF. What they typically lack is the in-file documentation of which parts are headers and what types of data a file contains. The entire idea of a "file format" is one which arose gradually. Early programs simply wrote their data in whatever order was convenient, then loaded it in the same order. The development of file formats intended to be shared took time. Wikipedia has some interesting material on the history of file formats (see Related topics).

Compatibility

The obvious problem with a very generic file format is the possibility for subtle variances in handling of boundary conditions. This problem is hardly unique to generic file formats. Microsoft® Word used to invert black and white in some non-empty set of well-formed TIFF files. (According to Microsoft tech support, to whom I spoke, Word did not officially support the Version 6 TIFF spec, but some unspecified previous version. This was in 1995, a full three years after the completely free and unencumbered reference implementation was released.) In principle, the generic format has the advantage that developers will be sharing a common file format library for reading and writing the file, and doing their own processing only on specific chunks. In practice, of course, not all developers are so careful.

HAM and cheese

The HAM mode isn't really a standards issue, but it's too cool not to mention in a bit more detail. HAM, or Hold-And-Modify, refers to a video mode of the Amiga video hardware which allowed simultaneous display of 4,096 colors using only six bits per pixel, on a platform whose palettes topped out at 32 entries. How, you ask?

4,096 colors represent a 12-bit color space; each pixel's final value has four bits each of red, green, and blue. So the HAM mode divides each six-bit pixel into two fields: the first two bits indicate what type of pixel it is, and the remaining four provide a value. If the type value is 0, the value bits select a color from a 16-color palette. If the type value is 1, the value bits replace the blue component: the red and green values of the previous pixel are held, but the blue value is modified -- thus the name, hold-and-modify. Type 2 modifies the red component the same way, and type 3 the green. The "previous" pixel is the last pixel drawn on the scan line. Thus, the HAM mode introduced some artifacting that could blur vertical lines (because horizontal transitions could take a couple of pixels), but allowed surprisingly good color fidelity for the number of bits used. (Later Amigas, adding an eight-bitplane mode, also supported a HAM8 format which did the same thing with 18-bit color, using two type bits and six value bits.)

For extra fun, a few programs supported an even more elaborate format, called SHAM (Sliced HAM), in which each raster line of the image could have its own 16-color palette. This was possible only because the Amiga could run interrupt routines on every raster line. The SHAM feature was deprecated when HAM8 was introduced.

The Amiga also had a "Halfbrite" mode, in which five bits selected a color from a 32-color palette and the sixth bit indicated that the color should be displayed at half intensity. This provided a 64-color mode with only a 32-color palette.
One of the most visually impressive Amiga animations was a simple animation done entirely in the 6th bitplane in Halfbrite mode. (Some early Amigas didn't support this mode.) This is the kind of thing you might have to deal with when trying to interpret graphics files from old hardware. Not all systems do this. The PICT and WMF image formats are essentially encodings of drawing instructions rather than renditions of the hardware layout.

Some particular instances of IFF files have unusual qualities. For instance, the Amiga's insanely strange Hold-And-Modify video modes are supported directly by the IFF file format. Pictures stored for this representation present an unusual challenge to software intending to display the picture on any other hardware. Even when the overall file format is somewhat standardized or open, a particular file's data might not be easy to use on another platform.

In fact, although you might expect endianness problems, the IFF format simply stated its expectations up front (sizes are big-endian), and I've never heard of any platform-specific problems in IFF code written with even a casual eye towards spec compliance. The weakness is mostly just that a graphics program which reads "IFF files" can't do much with an IFF sound file. Supporting the format doesn't imply supporting the data. This is much like the issues XML readers face. Being able to parse XML doesn't guarantee that you can read the data in a given XML file.

In some cases, data representations reflect hardware assumptions. For instance, the ILBM format reflects Amiga graphics hardware, and Amiga audio files often reflect Amiga audio hardware. (PC audio hardware was frequently different.)

This highlights one of the benefits of standardization: while very few people are using video hardware with a HAM mode (see sidebar), or have access to an Amiga audio chip, video and audio data for that platform are available in an exhaustively documented standard format, with multiple implementations floating around. By contrast, even a current and actively maintained format might be virtually impossible to use reliably. IFF "FTXT" (formatted text) files might not have had all the features of a modern word processor, but the definition is public, and programs supporting the format could generally communicate with each other. By contrast, I can't reliably exchange Microsoft Word documents with other users of Microsoft Word.

Lessons learned

One of the most disappointing lessons learned from IFF is simply that a pretty good standard can easily be ignored in favor of a huge variety of proprietary formats. A very large number of formats out there have carefully attempted to recreate just the basic functionality of IFF. In practice, this means that graphics programs are stuck supporting dozens of subtly incompatible formats, each of which has its own innovative set of quirks.

If you were going to do an IFF standard today, the obvious things to fix would be the size limitation (32-bit sizes are no longer large enough for files) and the flat and fairly small namespace for chunk types. The PCI spec's paired 16-bit values for vendor and product, while providing no more data bits than the 32-bit chunk types, are in many ways more useful due to improved namespace management. On the other hand, four bytes here and there is small potatoes today. It might make sense to use four bytes for a 'vendor' and another four for a 'product.' The similarity between this notion and the Macintosh type/creator codes is probably not a pure coincidence. A broader range of universal types could be provided. In fact, such a standard could be a pure extension to traditional IFF, which thoughtfully reserved the names CAT1-CAT9, FOR1-FOR9, and LIS1-LIS9 for future versions.

The IFF file format's basic design goals have survived. The same issues that were important in 1985 seem to be important today. 99% of the files many users manipulate on a regular basis could be happily stored in an IFF file of some sort, and the code used to read and write such files would be comparatively trivial. Well-tested IFF libraries are out there.

The problems IFF left to developers (such as how to tell people what your new chunk type was) have not really been changed that much by newer standards, such as XML. XML does offer some improvements over IFF in extensibility, and especially in human-readability. On the other hand, compare the size of even a simple XML parser to a very complete and robust IFF parser; it's not all wine and roses.

In the end, everything else aside, it comes down to this: my life as a computer user was immeasurably easier on a platform where nearly everything used IFF as, well, an interchange file format. The partial solutions of XML haven't really solved anything. Programs that are storing exactly the same sort of data in XML files still do it in entirely innovative and incompatible ways. Providing for common data types was a good idea. The IFF format provided many of the features much lauded in XML -- for instance, an easy way to skip over data you don't know what to do with.

What's really changed isn't the technical issues; it's the culture. For whatever reason, XML seems to be used primarily as a checkmark. Is our standard open? You betcha, we're using XML. Never mind that it would be easier and cheaper to get our data storage routines through industrial espionage than to figure out what on earth we're storing in it; it's still an "open standard." Unfortunately, our cultural unwillingness to store data in accessible formats is beyond the scope of a standards committee.


Related topics