How to Write Bad Code

Hans Spiller

There are a lot of ways to write bad code, and I've done most of them. But before we can really go about deciding what is bad code and what's not, we need to establish some metrics and goals.

Why would you want to write bad code? Well, there are a lot of reasons:

Fun

The most important is that it's fun. In the same way that throwing rocks at passing cars and tearing the wings off of bugs are both a lot of fun, it's fun to really screw up a program.

Job security

If you're the only one who can work on a piece of code that they need, they can't fire you. The flip side of this is that they can't let you go work on anything else, either. Being able to really screw up a piece of code and then ducking out to go screw up another project too is a truly special talent that only a few have demonstrated.

Egoism

Knowing that no one else can work on your code makes you feel smarter.

What are the tools of our trade?

The most important tool is confusion. It helps to a degree if we are confused ourselves, but if we're sufficiently confused we might not be able to get the piece of code to appear to work, and then where would we be? Code that doesn't work at all doesn't satisfy any of the reasons for writing bad code--no one will want it, so no one will be screwed up by it. So the key is to keep our personal confusion to a safe level, and maliciously do things to confuse the unfortunate who tries to read our work.

Most of the standard tactical maneuvers can be applied to achieve confusion: misdirection, obscuration, overwhelming, stealth, surprise, indirection, etc. If it works in football or warfare, it can probably be used to write bad code.

The second tool is suffering. This includes a lot of things: waiting for compiles, unhelpful error messages, long hours, the cumulative effects of confusion--anything that makes working on a piece of code unpleasant. I generally refer to this as pain. Bad code is painful to work on.

Let's look at a few software engineering concepts and see how they can make our code good or bad.

Locality

Locality can be a very easy thing to measure. In order to figure out how something works, how often do you have to search around? If you have to follow several pointers for every single line, the bad code artist has done a good job.
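
To make the effect concrete, here's a minimal C sketch (all of the struct and function names are hypothetical, invented for illustration): to understand the one return statement, the would-be maintainer must chase three pointers, i.e. visit three other definitions.

```c
#include <assert.h>

/* Hypothetical structs for illustration -- in real bad code, each one
 * would live in its own header in its own directory. */
typedef struct Config  { int timeout; }     Config;
typedef struct Session { Config *cfg; }     Session;
typedef struct Server  { Session *session; } Server;

/* To follow this one line, the reader must consult all three
 * definitions above -- three context switches for one value. */
int get_timeout(Server *srv)
{
    return srv->session->cfg->timeout;
}
```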

Dijkstra demonstrates one example of this in his "Goto Considered Harmful" letter. Each time you have to follow a goto, you have to change contexts. Of course, in most cases it's difficult to do more than a single indirection per line using goto, so it's a comparatively weak method of writing bad code. Worse yet, there are a lot of cases where gotos can actually improve readability, so goto is a fairly limited weapon.

We can expand the power of the goto by turning it into a function call, and doing several tiny ones on a line. Accessor functions are an excellent way to achieve this, especially inherited accessor functions. Each function does a tiny thing, and then calls some other function, requiring the would-be maintainer to change contexts each time, not just elsewhere in the same procedure, but in another source file, or better yet, in another source file that's in another directory. We can use this to our advantage, too, by hiding a side effect inside the accessor. It's most effective if he's not sure whether the accessor he's looking at is the right one. Accessors, Inheritance and Overloading are among the best things yet invented for bad code writers. Of course, accessors are a double edged sword. It's quite possible to write an accessor that does something helpful, like internal consistency checking, or making it easy to change the layout of a structure, instead of just generating pointless and irritating indirection. When most accessors do almost nothing, while others, apparently very similar, do something quite radical, the bad coder has achieved the goal.
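
A minimal C sketch of the hidden-side-effect accessor (the names and the particular side effect are hypothetical): it looks like an innocent getter, but it quietly keeps count and eventually changes its own answer.

```c
#include <assert.h>

static int s_width = 10;
static int s_access_count = 0;   /* hidden state the caller never suspects */

/* Looks like a trivial getter; is not. */
int get_width(void)
{
    s_access_count++;            /* the hidden side effect */
    if (s_access_count > 3)
        s_width = 0;             /* quietly changes its answer after heavy use */
    return s_width;
}
```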

Dijkstra's complaint about goto is that it leads to spaghetti code. Of course, with a little work, spaghetti code can be followed. Spaghetti data, on the other hand, requires a much larger amount of work to follow. Spaghetti objects are clearly the highest form so far devised.

Source Code Browsers can give the illusion of greater locality than there actually is. This can work tremendously in favor of the bad coder. For example, many browsers don't work if the application failed to build. So by wiring in hidden dependencies on the specific directory structure of your machine, you can make life sheer hell for the person who would try to work on your code. And you can say "I dunno, it compiled for me" and show them how easy it is to find stuff with the browser.

Coupling/Encapsulation

Coupling is a measure of how interrelated two separate things are. If there's a lot of coupling, then it's difficult to follow one thing without a deep understanding of the other. If there's minimal coupling, then you can understand the one thing with only a minimal understanding of what the other one does. Of course, as bad code artists, we're trying to maximize coupling.

The object orientation people have done bad code artists a huge favor here. What those misguided fools think they're doing is minimizing the amount that the user of a component has to understand about the workings of the component. One of the ideas they use, they call "Hiding". Wow, this sounds just like a bad code technique! By making it difficult to get into the components of an "Object" they think they're promoting neatly encapsulated packages. Wrong! They've played right into our hands. We can use those very mechanisms to hide our little pitfalls. And by forcing a complicated interface to fit an irrelevant language structure, we can convolute it dramatically!

If the object oriented people really wanted to promote encapsulation, they would have stressed encapsulation more, and never even used the word "hiding". Hiding is a very powerful bad code technique, because it makes things difficult to follow. Encapsulation is just the opposite--it makes it unnecessary to follow. You can't force encapsulation--you have to come up with an interface that's easy to describe--anathema to a bad coder. You can easily enforce hiding of course, and that plays perfectly into the hands of bad coders.

Here's a simple bad code technique based on coupling that can be used in any language to really screw things up: It is frequently useful to build a table which is indexed with an enumeration. Now I hear you say that this sounds like a good code technique. To make it a bad code technique, you merely need to separate the definition of the enumeration and the definition of the table itself. With most languages, this is easily done--in fact, with many languages, it takes work to do anything else. Now, for the coup de grâce: Put some complicated conditional compilation in the table, so that not all builds use all of the elements--and make sure the enumeration has all the same conditionals. Putting all the variation at the end is cheating, and so is using a tool to generate it (unless you're abusing a tool...see below). For truly bad code, you have to be able to mix and match from the middle.
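
A hedged C sketch of the pattern (the names and the FEATURE_BAR conditional are invented for illustration); in genuinely bad code the enum and the table would of course live in different files:

```c
#include <assert.h>

#define FEATURE_BAR 1   /* must agree in BOTH definitions below */

enum Thing {
    THING_FOO,
#if FEATURE_BAR
    THING_BAR,          /* conditional entry in the MIDDLE of the enum */
#endif
    THING_BAZ,
    THING_COUNT
};

/* ...imagine this table in another file entirely. Flip the #define in
 * one place but not the other, and every entry after the mismatch
 * silently indexes the wrong string. */
const char *thing_names[] = {
    "foo",
#if FEATURE_BAR
    "bar",              /* must stay in lockstep with the enum above */
#endif
    "baz"
};
```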

Redundancy

Duplication of code is a tremendously effective way to set pitfalls. For example, if you have a whole bunch of code that does very similar things, possibly with very minor variations, it's important to duplicate it if you want to write bad code. The most obvious effect is to bloat the code, which is obviously the sort of badness we're looking for. But there's a much more insidious thing. Suppose that the would-be understander of your bad code needs to add something to this duplicated sequence, such as a bug fix. He has to find each instance and separately add the change to each one. Similarly, there's a chance some subtle difference will sneak into some of the cases, and will get by testing. Serendipitous bad code! This of course works best if the similar sequences look quite different, e.g. with different comments or variable names, but are in fact very similar functionally.

Dijkstra's "Goto Considered Harmful" letter is one of the bad code artist's best friends here. A lot of people who think they're writing good code by eliminating gotos are in fact setting just the kind of pitfalls we're talking about. One kind of redundancy that's often used to eliminate gotos is to add a flag. Here we have an extra piece of data whose sole purpose is to obfuscate a simple piece of control flow! After 25 years, I think it's fair to say that Dijkstra's letter has done far more to promote bad code through duplication and spurious flags than the "spaghetti" code it was intended to prevent. Hooray for Dijkstra! Of course, there is nothing preventing bad code artists from doing both.
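
Here's a small C sketch of the trade (function names and the "work" are hypothetical): the same early-exit-with-cleanup logic written once with a goto and once with the spurious flag.

```c
#include <assert.h>
#include <stdlib.h>

/* The straightforward goto version: one exit path, obvious cleanup. */
int process_with_goto(int fail_step)
{
    int result = -1;
    char *buf = malloc(16);
    if (buf == NULL)
        goto done;
    if (fail_step == 1)
        goto cleanup;       /* error: bail out, but still free */
    result = 0;             /* the "real work" would go here */
cleanup:
    free(buf);
done:
    return result;
}

/* The "goto-free" version: an extra flag whose sole job is to re-encode
 * the control flow the goto expressed directly. */
int process_with_flag(int fail_step)
{
    int result = -1;
    int ok = 1;             /* the spurious flag */
    char *buf = malloc(16);
    if (buf == NULL)
        ok = 0;
    if (ok && fail_step == 1)
        ok = 0;
    if (ok)
        result = 0;         /* the "real work" would go here */
    if (buf != NULL)
        free(buf);
    return result;
}
```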

Obscurity

It's quite possible to write a bad piece of code which is concise, self contained, and non-redundant. It generally is quite difficult though, and to achieve it is a sort of Nirvana for the bad coder. It is in the area of obscurity that all of the truly bad coders rise to their supreme creative achievements. The most effective way is through unnecessarily complicated algorithms. An example would be using a quick sort or shell sort (uncommented of course) when there can never be more than a dozen or so items to be sorted. A good coder would have used a bubble sort. It's simpler, and for the short list, just as fast. Another example is using a complicated data structure, especially one that requires a lot of finicky special case code, to handle something that can easily be traversed linearly, and which is only used in a non time critical portion, such as user interaction.
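
For reference, the good coder's dozen-item sort might look like this (a minimal sketch, nothing more):

```c
#include <assert.h>

/* Bubble sort: repeatedly swap adjacent out-of-order pairs. O(n^2),
 * but for a dozen items that is just as fast as anything fancier,
 * and the whole algorithm fits in one glance. */
void bubble_sort(int *a, int n)
{
    for (int i = 0; i < n - 1; i++)
        for (int j = 0; j < n - 1 - i; j++)
            if (a[j] > a[j + 1]) {
                int tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
}
```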

One of my favorite comments is "Abandon hope all ye who enter here" heading a particularly obscure passage in a famous operating system. Many of the worst pieces of obscurity, such as the one this comment heads, are done in the name of performance. When this is done with full knowledge of the real performance of the program (either late in the development cycle, or by someone with a deep understanding of the behavior of the system), and in ways that leave the code portable and maintainable, this is good code. That's a lot of "with"s. Leave any of them out, and you've got bad code. Bit twiddlers are often the baddest of the bad coders, and they frequently take a justifiable pride in this.

One of the most effective ways to obscure things is to hide them in constructors and destructors. Many languages have constructs which have hidden or obscure side effects, (e.g. garbage collection, heap allocation, implicit indirection, mass allocation or copying) but the Object Oriented people have provided the most effective tool yet. They've also generated a bunch of new terminology and syntax for some old ideas, which always contributes mightily in our great goal.

Inheritance

Inheritance is a mechanism by which components can be selectively reused verbatim by another component of a program. This can be used to achieve incredible levels of nonlocality, obscuration, and misdirection if suitably applied. The most effective use the author has seen is in a fairly simple (approx 5000 lines of code) program which does some would-be simple analysis, using a well understood technique which has been universal since the computer industry first began. (The previous implementations of the algorithm the author has seen are in the general range of 2000 LOC, including one that was entirely in assembly, b.t.w.) This program used identically named functions which were in fact not related in any way, while frequently calling inherited functions across the same interface, to achieve a remarkable level of confusion. The programmer also found extensive opportunities to duplicate large passages of nearly identical code in several components. So you can see, inheritance is a wonderful opportunity for bringing out our other techniques.

Comments

Comments are one of the best ways to obfuscate and misdirect. At the very least, they can be used to overwhelm. Here are a couple of examples:

les bx, [pFoo] ; move pointer to a foo into es:bx

Note that the comment provides NO information at all. If we can read the code, we don't need the comment. Even without using Hungarian, we'd need to be pretty thick not to be able to figure out that something being loaded into es:bx was a pointer. (For those not fluent in the 80x86, es:bx is a register pair which can *only* be used as a pointer; it gives a protection fault if you load it with something that is not.) All the comment does is take up space on the disk, and more importantly, take up the time of the would-be understander.

/************
 * IT FOOTYPE::ItOfThat(THAT * pThat)
 * Purpose:
 *    find the IT of a THAT
 * Parameters:
 *    pThat: A pointer to a THAT
 * Returns:
 *    an IT
 * Notes:
 *****************************/
IT FOOTYPE::ItOfThat(THAT * pThat) {

This is even better. In a sense, it's triply redundant. Hungarian, the comment, and the declaration all say exactly the same thing, while providing no useful information about interactions, algorithms, side effects, and so forth. If it uses the this pointer in an obscure way, that just adds to the fun. Be wary: if the input and output are registers (i.e., if this is about assembly language), this is useful information, and as bad coders, we wouldn't want that. (Incidentally, this example, with the names changed, is taken from actual code from an actual program which is being marketed commercially by a major software developer.) A lot of bad coders blindly put such a header on all procedures, to pad their Lines Of Code statistics.

Perhaps the single most important thing that can be done with comments is misdirection, and sometimes it happens entirely by accident by people who have no intention of writing bad code. The situation usually comes about like this: Coder A writes a tricky bit of code, and carefully comments the algorithm used. Coder B (perhaps even the same person as coder A, but some time later) comes along and makes a big change to the algorithm, or even moves it entirely into a different place, leaving the comment behind.

Coder C, trying to make some subsequent change, reads the comment, and then spends a bunch of time reading the code and trying to match it up with the comments. Note that had Coder A actually been a malicious bad coder, he could have achieved the same effect by writing some obscure but irrelevant (but relevant sounding) comment instead of waiting for the transformation to occur by accident. Doug Klunder says that he'd rather have no comment at all than a misleading one. Obviously an enemy of bad coding.

Logistics

More pain can be caused by logistics than through any other element of code design. By logistics, I mean the actual mechanism involved in putting a change into a piece of executing code. This can involve editors, compilers, linkers, downloaders, debuggers, automatic build tools (e.g. make) and any other tool you might think of to involve. To a certain extent, the more tools involved, the more painful it is to work on a piece of code. But some tools actually do help. If the objective is to write bad code, the help they give should be minimized and undermined at every turn.

The makefile is one of the most effective. For example, using the same dependency list for every object file, whether it uses a particular include file or structure or not, will cause you to recompile everything, whether what you changed affected it or not. The people who use a "master include file" which includes everything all the time have added their support to this little element of bad coding. The monolithic precompiled header is of course exactly the same thing. Notice that this produces much of the effect of coupling without the bad code artist actually having to write any coupled code.

Object Oriented Languages, particularly C++, contribute greatly to this effect. In order to add a private method (or member) to a class, you need to modify the include file. In order to use any method of the class, you need to include the include file. So even with a well written makefile, you need to recompile a lot of stuff when all you've actually changed is completely encapsulated! For the bad code writer, this is a big improvement over the predecessor language C, where if you write a private function, nothing needs to be recompiled except the module that contains the private function. There are efforts afoot to build compilers which are able to figure out the true dependencies and avoid this spurious overhead. The bad coders of the world must unite to put down this sacrilege!

(A non-sarcastic aside: the single best feature of C++ (and apart from inlining and // comments, perhaps the only good feature of C++) is that because all procedures are defined with CLASSNAME::ProcName, it's very easy to find the definition with a text editor. By including the colons in the search, you only get a few uses of the function.)

One of my personal favorites is the spurious use of tools advertised as labor saving. One of the best examples is in the area of writing compiler front ends. For the last dozen or so years, one of the popular bad coding fads is to use so-called compiler compilers. These tools mostly use a technology called LALR(1) (see the section on obscurity) and, when given a language description with embedded actions written in a host language, produce a parser for the language with the actions invoked at the appropriate place. This sounds pretty simple, and many of those cretins who think that the goal is to write good code jumped on the bandwagon. Of course, none of the academics who invented LALR had ever written a compiler for a language with more than a few dozen productions in it, or for a user community that they couldn't count on the fingers of one hand. When people started using it to write compilers for real programmers to use, they found that the underlying tables grew exponentially. But then some bad coders noticed that the tables were sparse, and they added a compression algorithm that made it run in exponential time, instead of exponential space. Then some other bad coders discovered that the error messages were terrible, so instead of leaving bad enough alone, they added some extremely obscure code to make error messages that were negligibly better. Then they discovered that the output file (a machine generated program which implements the parser described) had become too big to compile, and they split it up. Now any good coder would know that to minimize coupling, you'd split things by function: the syntax and semantics for a given construct together, and the syntax and semantics for a different construct in a different place. But because they were using a tool, the bad coders cut the syntax from the semantics, almost perfectly perpendicular to what a good coder would have done, and maximized the coupling across the dividing line.
A masterpiece of bad code, and many of the most popular compilers used today do it! Unmaintainable, slow, terrible error handling, complicated logistics, obscure....LALR's got it all! Perhaps the best part is that using the traditional recursive descent technology, parsers are simple, fast, handle errors well, are easy to write and maintain, and are a pretty small part of the typical compiler. LALR not only was done by bad coders, it can make bad coders out of good ones! These guys are stars of the bad code art!

Portability

Portability can describe a number of things. If a source tree can be used to build the same program on a number of different machines, but using the same processor and operating system, one level of portability has been achieved. A higher level is to be able to vary the operating system or the processor. There are many aspects to this: byte and word order, the machine's native word size, arcane memory structures (such as segmentation) and finally, operating system incompatibilities.

Byte and word order and size are pretty hard to make much trouble with. Most of the fun is to be had in file formats, which are discussed in a separate chapter. But there are still some tricks to be played. For example, using the same data object for several different things...sometimes as an array of bytes, sometimes as an array of larger integers (or even as floating point!) and changing the current representation without changing the data. This is a wonderful trick for the bad coder, because not only does it make the code precarious and unportable, but it also screws up the compiler's optimizations because it has a property known as "aliasing".

Another trick to screw up portability is amazingly common. The language C has two logical sorts of integers: ones which have a specific size (signed and unsigned char, short and long), and ones which are based on the word size of the machine (int and unsigned int). The language guarantees that int will be at least as long as a short, but no more. So the bad coder working on a 32 bit platform should naturally use "int" for all that stuff that needs 32 bits to work right, and "long" for all those things which tend to be limited by the machine, such as array indices and other sorts of counters. This is just the opposite of what a good coder would do.
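
A hedged illustration of the good coder's side of this, using the exact-width types of modern C's <stdint.h> (which postdates the C of this essay, where "long" played the fixed-size role): where the logic needs exactly 32 bits of wraparound, say so in the type, and the code keeps working when int grows.

```c
#include <assert.h>
#include <stdint.h>

/* uint32_t wraps at exactly 2^32 on every platform; an "int" or
 * "unsigned" here would silently change behavior with the word size. */
uint32_t add_mod32(uint32_t a, uint32_t b)
{
    return a + b;   /* unsigned arithmetic wraps modulo 2^32 */
}
```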

The higher level things described, such as memory architecture and operating system compatibilities, are much more powerful, because they can affect the program at a much higher (and therefore more abstract) level. Design something into the architecture of the program which is fundamentally based on some operating system structure which is present in no other operating system. If you're on a segmented machine, build the availability of cheap movement of segment sized chunks of memory into your design. When you get to a machine with a flat memory model, your program will run slooooowly.

File Formats

The worst damage that can be done to portability is in the area of file formats. Dirty tricks in file formats have so much potential that they deserve a section on their own.

All of the platform dependent stuff is a natural. You can build a dependency on a particular platform's byte order, and then when you try to read it in on a platform with a different byte order, weird stuff happens. Packing issues are effective too. Most machines need to align data words on machine word boundaries. But these often vary: 16 bit machines need 16 bit alignment, and 32 bit machines need 32 bit alignment. Some even require more, and a few require none at all. So by defining a particular packing into a file, we can make it tricky to port.
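
For contrast, the good coder's countermeasure is to define the file's byte order explicitly and serialize field by field, rather than dumping structs and inheriting the writer's byte order and padding. A minimal sketch in C (the function names are mine, not from any particular library):

```c
#include <assert.h>
#include <stdint.h>

/* Write a 32 bit value in a FIXED byte order (little-endian here),
 * one byte at a time, so the file looks the same on every machine. */
void put_u32_le(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v & 0xFF);          /* least significant first */
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Read it back the same way -- no alignment or byte order assumptions. */
uint32_t get_u32_le(const unsigned char *in)
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```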

Of course, by the simple expedient of file accessors, the file format can be isolated in such a way that a good coder can undermine our hard work. But remember that accessor functions can be a double edged sword. Build up a towering, precarious, and most of all, obscure dependency structure among these accessors and you're right back into bad code. There are a number of ways to achieve this, such as complicated interrelations between parts of the file that are not adjacent in any apparent way (see the section on Locality--the most effective approach for this is to actually put the pieces in different physical files). But the most effective approach is versioning.

Versioning is simply the necessary fact that file formats change a little bit (and sometimes a lot) from version to version of the program. The new version is generally required to read the old versions, so you can bury lots of magic in the code to recognize these distinctions. A good coder will make a careful decision and implement low level version issues in the low level accessors, and high level issues at the higher level. So a bad coder should do just the opposite. Generally it's difficult (but by no means impossible) to get low level stuff to a very high level, but pushing high level issues towards the bottom is really powerful, and it generally results in tremendous duplication of similar, but obscurely different pieces of code.

There are other file format issues that bear on bad code too. For example, you can make a format which is extremely wasteful: the famous WINMAIL.DAT format encodes 256 color bitmaps and 18 color icons with 16 bits per pixel. 8 bits per pixel would have been ample. You can also simply be redundant. The popular DBF format maintains the length of each field at least twice, as well as keeping a separate position within the record. Since the records are all consecutive, the accessor function could have simply added up the lengths and figured out where the fields were. This requires a database which wants to change the size of a field to update all of these pieces separately. Excellent bad code.

Of course, the darned good coders keep finding ways to minimize this stuff. For example, they've been putting compression schemes into the file system, so even though these wasteful formats are still pointlessly difficult to manipulate, they don't actually waste much of the disk. But the bad coders still have a way to fight back. The compression schemes rely on highly regular patterns in the data to achieve their ends. By doing something to obscure these regularities, we can undermine it. The best way known to man is called "Data Encryption". The people designing data encryption schemes are trying to do precisely what we want, which is to make data difficult to read by the uninitiated. Of course all you need to become initiated is a password, but the file system compression scheme certainly doesn't have that. Encryption changes the regularities in a file and makes them appear like random noise. The more effective the encryption scheme, the more random it appears.

One of my personal favorite tricks with file formats is to get so good at reading the bits that the original coder just looks at them with a hex dumper. Of course the next guy along is going to need a tool to make heads or tails of it. Gotcha!

There is one file format that bad coders need to be warned against, and that is human readable formats. Since text files are represented in a way which is extremely easy to move from one platform to another, they undermine our goal of being nonportable. Since the "language" described by a human readable file format is almost always extensible in obvious, regular, easily understood ways, they undermine our versioning tricks. As long as they're not encrypted, human readable files are easily compressed, often quite a lot. And of course the need for a special dumper is zero. Uncompressed, they are a relatively inefficient way to store stuff, but unless the designer of the text format is very bad (one of us bad coders!), it's rarely worse than a factor of two or three.

The one place that the bad coder can screw up a human readable format is in the reading and writing of it. Since it's built up of more abstract pieces, a parser is required. A simple recursive descent parser is very easy to write, and with a well written lexical phase can be extremely fast, so even here, bad coders have their work cut out for them. But by the simple expedient of writing it badly, the simplest parser can be made an obscure knot of indecipherable gobbledygook, and the lexer can be made cripplingly slow. Of course, it takes some skill to do these simple things badly enough to achieve our ends while getting them to work at all, but it's been achieved many times.
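
For the record, here's how small "very easy to write" really is: a complete recursive descent parser in C for a toy expression grammar (digits, +, *, parentheses; no error handling, purely illustrative). One function per grammar rule, each readable on its own.

```c
#include <assert.h>
#include <ctype.h>

static const char *p;           /* cursor into the input string */

static long expr(void);         /* expr   := term   { '+' term }   */
static long term(void);         /* term   := factor { '*' factor } */

/* factor := number | '(' expr ')' */
static long factor(void)
{
    if (*p == '(') {
        p++;                    /* consume '(' */
        long v = expr();
        p++;                    /* consume ')' (toy: no error check) */
        return v;
    }
    long v = 0;
    while (isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}

static long term(void)
{
    long v = factor();
    while (*p == '*') {
        p++;
        v *= factor();
    }
    return v;
}

static long expr(void)
{
    long v = term();
    while (*p == '+') {
        p++;
        v += term();
    }
    return v;
}

/* entry point: parse and evaluate an expression string */
long parse(const char *s)
{
    p = s;
    return expr();
}
```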

Modifications

Oct 94 Many updates

Feb 6 95 added file format examples and footers

Mar 3 96 minor grammatical improvements, and added this modification history

Dec 96 Translated to html

Dec 29, 96 Typographic improvements, added section on inheritance.

Ain't I a stinker?