At the heart of stdio is the FILE structure. We’ll look at a portion of stdio.h from 7th Edition. This comes from The Unix Tree:

Well, here we go. This struct _iobuf is what everyone knows and loves as the FILE structure. A lot of this structure is still with us today. If you look at different versions of the structure, you’ll see the same members. The _ptr, _cnt, and _base members are used to help implement the buffering policies. The _flag member is how the structure tracked whether the stream was open for reading or writing and whether it had hit EOF or an error.

The _file member is particularly interesting. It held the file descriptor that backed the FILE structure. Since a char is either a signed or an unsigned byte, this limited you to file descriptors 0 through 127 or 0 through 255. Many programs can quite reasonably have more than 256 file descriptors open. If you’re writing a tool like grep, this might not be true, but if you’re writing a network application and there’s a file descriptor per connection, having more than 256 connections is quite common.
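To make the limit concrete, here is a minimal sketch of what happens when a descriptor is squeezed into a byte-sized field. The names (tiny_stream, store_fd, load_fd, and the two fits_in helpers) are hypothetical and only stand in for a char-sized _file member; the wraparound value assumes 8-bit chars.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a stream whose descriptor field is one byte,
 * like the char-sized _file member described above. */
struct tiny_stream {
    unsigned char file;
};

/* Store fd the way a plain assignment would: values above UCHAR_MAX wrap. */
void store_fd(struct tiny_stream *s, int fd)
{
    assert(s != NULL);
    s->file = (unsigned char)fd;
}

/* Read the descriptor back, as a fileno()-style accessor would. */
int load_fd(const struct tiny_stream *s)
{
    assert(s != NULL);
    return s->file;
}

/* Does fd survive a round trip through a signed char field? */
int fits_in_signed_char(int fd)
{
    return fd >= 0 && fd <= SCHAR_MAX;
}

/* Does fd survive a round trip through an unsigned char field? */
int fits_in_unsigned_char(int fd)
{
    return fd >= 0 && fd <= UCHAR_MAX;
}
```

With 8-bit chars, storing descriptor 300 and reading it back yields 44, so the stream would quietly refer to a different descriptor than the one it was given.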

Back in 7th Edition, the FILE structures were statically allocated. There was no dynamic allocation of them, and if we look at the bit above there, we’ll see that all you could have were 20 different FILE structures. In that world, a small file descriptor limit probably made a lot more sense, and this was written before there was much networking present.
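A fixed pool like that can be sketched roughly as follows. This is not the 7th Edition source: the member roles follow the description above, but every name here (mini_iobuf, mini_iob, MINI_NFILE, MINI_INUSE, mini_alloc, mini_free) is hypothetical, and the "in use" flag is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MINI_NFILE 20   /* the 20-stream cap mentioned above */
#define MINI_INUSE 01   /* hypothetical "slot is open" flag bit */

/* Hypothetical stream record; roles follow the prose, names do not. */
struct mini_iobuf {
    char *ptr;    /* next byte in the buffer */
    int   cnt;    /* bytes remaining in the buffer */
    char *base;   /* start of the buffer */
    char  flag;   /* mode and status bits */
    char  file;   /* file descriptor */
};

/* The whole pool lives in static storage; nothing is ever malloc'd. */
static struct mini_iobuf mini_iob[MINI_NFILE];

/* Claim the first unused slot, or return NULL when all 20 are taken. */
struct mini_iobuf *mini_alloc(int fd)
{
    size_t i;

    assert(fd >= 0 && fd <= 127);
    for (i = 0; i < MINI_NFILE; i++) {
        if ((mini_iob[i].flag & MINI_INUSE) == 0) {
            mini_iob[i].flag = MINI_INUSE;
            mini_iob[i].file = (char)fd;
            mini_iob[i].cnt = 0;
            mini_iob[i].ptr = mini_iob[i].base = NULL;
            return &mini_iob[i];
        }
    }
    return NULL;
}

/* Release a slot so a later mini_alloc() can reuse it. */
void mini_free(struct mini_iobuf *s)
{
    assert(s != NULL);
    s->flag = 0;
}
```

Because the array has a fixed size and external linkage in designs like this, both the element size and the element count end up visible to every program that touches it.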

Unfortunately, the fact that these structures and the array were declared this way leaked into many different applications. Let’s take a look at the 4.4BSD version of the structure:

While it’s different, many things are still the same. The flags and file descriptor grew to a short, but everything is still public, which means the layout of the structure is part of the ABI. This all leads to a few critical issues that are all intertwined.

Lack of Opacity

If we look at the APIs that exist around stdio, all of them take a FILE *. This is a pointer to the FILE structure. This means that using the APIs doesn’t actually require someone to know the structure’s size. Even from the beginning, the APIs that gave you access to stdio, fopen() and fdopen(), returned a FILE *, and stdin, stdout, and stderr were all FILE * pointers.

All of this tells us that a consumer didn’t actually need to know the layout of the structure. When consumers don’t know the layout of a structure, we call it an opaque structure. There are a couple of implications of this. Most notably, an application cannot allocate the structure itself and must ask a library to do so, and it needs to use functions to access the members of the structure.
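As a rough illustration of the pattern, here is a minimal sketch of an opaque type in C. The my_stream type and its three functions are hypothetical, invented for this example. In a real library the first part would live in a public header and the second part in a private source file; they are shown together here only so the example compiles as one unit.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- What a public header would contain ---- */
typedef struct my_stream my_stream;   /* incomplete type: size and layout hidden */
my_stream *my_stream_create(int fd);
int my_stream_fileno(const my_stream *s);
void my_stream_destroy(my_stream *s);

/* ---- What only the library's own source file would contain ---- */
struct my_stream {
    int fd;
    int flags;
};

/* The library allocates, because only it knows sizeof(struct my_stream). */
my_stream *my_stream_create(int fd)
{
    my_stream *s = malloc(sizeof *s);

    if (s == NULL)
        return NULL;
    s->fd = fd;
    s->flags = 0;
    return s;
}

/* Accessor function: callers never dereference the structure themselves. */
int my_stream_fileno(const my_stream *s)
{
    assert(s != NULL);
    return s->fd;
}

/* Frees the record only; this sketch does not close the descriptor. */
void my_stream_destroy(my_stream *s)
{
    free(s);
}
```

A caller that only sees the header part can hold a my_stream * and pass it around, but cannot declare a my_stream on the stack or read s->fd, so the library stays free to reorder or grow the structure.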

In the early days of Unix, things like binary compatibility weren’t top of mind. By the 90s, some systems had started caring about backwards compatibility, but it was a bit too late: an existing body of software was using those fields directly.

Many of the things that we know as standardized functions, such as getc() from C89 or fileno() from POSIX, were actually macros that just dereferenced the structure members. Here’s an example of another part of V7’s stdio.h:

#define getc(p) (--(p)->_cnt>=0? *(p)->_ptr++&0377:_filbuf(p))
#define getchar() getc(stdin)
#define putc(x,p) (--(p)->_cnt>=0? ((int)(*(p)->_ptr++=(unsigned)(x))):_flsbuf((unsigned)(x),p))
#define putchar(x) putc(x,stdout)
#define feof(p) (((p)->_flag&_IOEOF)!=0)
#define ferror(p) (((p)->_flag&_IOERR)!=0)
#define fileno(p) p->_file

What we think of today as functions were actually all macros that dereferenced the FILE structure directly. Every consumer had to know the implementation. Take fileno() for example. It returns the corresponding file descriptor for a FILE *. Here, it just dereferenced the field, which means that programs that called fileno() encoded the actual size and offset of the _file member in their binaries. The same is true of all the other members referenced.
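To see why a baked-in offset is fragile, consider this sketch. Both layouts (stream_v1 and stream_v2) and the old_binary_fileno() helper are hypothetical; the helper simulates what a macro-style fileno() leaves behind in an already compiled program, namely a load from a fixed byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout a program was originally compiled against. */
struct stream_v1 {
    char *ptr;
    int   cnt;
    char *base;
    short flags;
    short file;
};

/* Hypothetical later layout: the library inserted a member before flags. */
struct stream_v2 {
    char *ptr;
    int   cnt;
    char *base;
    int   extra;
    short flags;
    short file;
};

/* What a macro-style fileno() amounts to in the old binary:
 * "load a short from a fixed byte offset", using the v1 offset. */
short old_binary_fileno(const void *stream)
{
    short fd;

    assert(stream != NULL);
    memcpy(&fd, (const char *)stream + offsetof(struct stream_v1, file),
           sizeof fd);
    return fd;
}
```

Handed a stream_v1 object, the helper returns the descriptor. Handed a stream_v2 object, it reads bytes that now belong to a different member, and nothing at compile time or run time flags the mismatch. A real function call into the library would have picked up the new offset automatically.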

If we turn to modern implementations, the stdio functions aren’t actually implemented as macros for the most part. Folks will just pay the cost of a function call for that opacity. However, that doesn’t actually mean it’s safe to modify and change the FILE structure around. The problem isn’t just software that people compiled on their own and that could have encoded the layout in the past. No, some modern software refuses to let go of the encoding and accept that these structures may now be opaque.

A great example of this is Gnulib, which is a portability library. Gnulib has actually gone back and encoded the now-opaque structures that exist into itself! It will happily reach into the structures and modify them. Here’s a header file that encodes all of the structures for a number of different operating systems from Windows, to Android, to OS X, and even Minix. At this point, an operating system maintainer can’t do too much without risking arbitrary corruption in user programs. While it’s tempting to say who cares about someone that encodes the private interface of a once-public structure, when software breaks we all lose. Users generally don’t care about who was responsible for breakage, just that it broke, and that’s not necessarily a bad thing.

For better or for worse, the stdio ship has sailed here. It doesn’t matter if Android made their version private or if the Solaris 64-bit structure was never even in a public header, because once stdio’s internals have been visible, some software will still use them and encode the private implementation, even when functions were added to get and set these private members.