I’ve never seen a FIFO that worked. Period. Every piece of hardware I’ve had to write a driver for has had a buggy FIFO.

A FIFO, for those of you fortunate enough not to know, is a hardware gizmo that buffers up bytes between a source and a destination. FIFOs are used a lot in situations where you temporarily need to store a few extra bytes because the source and destination data rates don’t exactly match. For instance, disks and network controllers like to “dribble” data back and forth, while memory systems work most efficiently in bursts; this is a clear mismatch, and often you stick a FIFO in the middle to deal with it.

Imagine you’re a software guy and it’s your job to make a disk driver for a new piece of hardware. The first thing to try is to just read a sector from the disk. So you go flipping through the hardware documentation and find that you need to set up a transfer address, a transfer count, a transfer direction, and then an offset adjustment fumbleguzzle, followed by a “Go!” bit, then stand back and wait for the completion gortwibble.

Digging further, you find that the offset adjustment fumbleguzzle is computed by taking the 1’s complement of the modulo-eight-byte transfer size added to the ending transfer address. The completion gortwibble has you confused, until you realize the interrupt arrives through a spare line on the sound chip. Fine. You check your prototype hardware board and verify that you have the right collection of blue, green, yellow and purple-with-black-stripes wires. You’re sittin’ pretty, and it’s just a matter of slamming out the right code. How hard could it be?

Now, you’re a veteran of several chip wars, grizzled and tough and you eat hardware guys for breakfast, and you don’t believe anything you read in documentation. So you write something simple just to tickle what the hardware docs claim. It’s easy enough: Stuff some hardware registers and slurp sector zero off the drive. Go!

The system hard-locks, and the only way to get it back is to remove the power supply. Huh. So you single step through the code to find out where you blundered, and it works great. Sector zero lands in memory, right where it should, all happy and sparkly and wondering what the fuss is about. Fuck.

Through an afternoon of trial-and-error, you find out that there has to be a few instructions’ delay between the setting of the DMA address and the transfer count, or else the transfer count is set to “lots” and the DMA engine happily wipes out memory at DMA speeds; when the wavefront of DMA-driven destruction reaches your debugger’s stack it’s Lights Out. Furthermore, there were lies (lies! imagine that?) told about the completion gortwibble, and the interrupt needs to be edge-triggered, not level-sensitive, though the latter is all the cheap-ass sound chip is capable of. This will never work. You’re going to need more fancy-colored wires on that board.

So you follow official channels and send out a memo asking for an ECO (“engineering change order”), there are meetings and public floggings, and you finally get hardware that works, handed to you by a humbled and now very quiet hardware engineer who might, might get his soldering iron back if he’s on good behavior for the next six weeks. Ha ha. No, not really. What actually happens is that you knock on the side of hardware guy’s office door (these guys get offices, apparently) and explain the situation re gortwibbles and interrupts. You get a blank stare. Okay, maybe you made a stupid mistake. You explain more slowly and enunciate very clearly, waving your hands in wide, slow gestures, pantomiming DMA and register values, just as Americans do in foreign countries when they are asking well-armed locals where the Consulate used to be or maybe where they keep the blue-and-white wires. You feel like an idiot. You feel even more like an idiot when Blank Stare forwards you the email (CC’d to the whole hardware team, sales, marketing and Usenet, but nobody on the software side at all) explaining how the fumbleguzzle and gortwibble registers were designed-out months ago and replaced with an integrated, 67-bit-wide cobwolly register, and the interrupt in question doesn’t exist anymore (now, there are six of them to deal with. Erm).

“You’ll get that hardware next week.”

“What do we have now?”

“Those are the Rev B chips. They had lots of problems. Why are you even using those?”

Please note that I haven’t even gotten to the subject of FIFOs yet, and because I’m beating my head against the usual concrete post in the software area I’m not sure I’m going to stay conscious that long.

– – – –

So imagine (just for kicks) that you’ve got the disk system happily transferring bytes back and forth, but then you get reports that occasionally people are seeing some corrupted bytes. “That’s my data,” says one person, “but it’s shifted by one byte here.”

FIFO madness.

A FIFO is like an accordion; it fills up with bytes from the disk, but the memory system isn’t ready yet, so the FIFO has to hang onto them, getting more and more full, until finally the memory asks for “Bytes, and lots of ’em!” and (phew!) the FIFO deflates and starts filling again.

But the memory system is picky, and won’t accept bytes unless they are on a 4-byte boundary, so for unaligned accesses there are wacky start and end conditions. Standard textbook stuff, and they cover this stuff in every school’s design course, every school but the one that your hardware guy went to, that is. In Outer Gonzonistan they use a method handed down from The Ancients by generations of Village Elders, involving six bits specifying an arcane rotation-and-mask after mixing in the blood of a software —

“Oh God,” you cry, “Save me from this living hell.” Because as you fix one bug involving edge conditions, something else undocumented sticks its prairie-dog ass into the air and hoses you down with poo. Adjust the count for alignment, except you have to make sure things don’t cross a 4K boundary, and let’s not get started about the DMA engine hitting cirty cache lines or the wicked timing problems involving DRAM refresh.

This is about the time that the hardware manager approaches your own manager and accuses you of not being a team player. “My guys have been designing these FIFOs and catching nothing but flack from your software guys,” he says over his hardware-guy-class matching belly and beard. “And why isn’t that disk driver done yet?”

Your manager explains that the hardware is buggy. This is about the time that the Director of Hardware approaches the Director of Software. “I understand that some of the people on the software team are not being Team Players, and that the software is behind schedule while the hardware people are thumb-twiddling.”

Your own Director says she’ll look into it. Three minutes later you’re pinned against a wall in the parking garage, staring into the barrel of a sawed-off HR violation and stammering reasons why you shouldn’t just be launched into oblivion and write, say, video games for a living.

Ever dealt with a GPU hang?