This topic takes me back to some of the “good old days” from when I was a C programmer. I found “Top 10 Ways to be Screwed by the C programming language”, by Dave Dyer, on reddit. As I read through it I got a chuckle. Here are some of the good ones.

Note: none of the code examples display carriage-returns in strings that are printed. For whatever reason my blog software has a real problem with them.

Accidental assignment/Accidental Booleans

```c
if(a=b) c; /* a always equals b, but c will be executed if b!=0 */
```

Depending on your viewpoint, the bug in the language is that the assignment operator is too easy to confuse with the equality operator; or maybe the bug is that C doesn’t much care what constitutes a boolean expression: (a=b) is not a boolean expression! (but C doesn’t care).

No, this was not a bug in the language. It’s working the way it was intended. Fundamentally, C has a simple structure. Part of that is you can evaluate expressions just about anywhere, no matter what type they are. It works the same way in for-loops:

```c
int i;
for (printf("Starting processes"), i = 0;
     i < 6 && ExecuteProcess(i);
     printf("Processes executed successfully"), i++)
    ;
```

In this example, ExecuteProcess() returns 0 for failure or a non-zero value for success. That’s how it determines “true” and “false”. 0 == “false”. The example above will work. It’s hackerish, and it’s something I tried to avoid doing in my own code just to avoid confounding others, but the language designers wanted to allow this. The way they saw it, a for-loop was just a “form” that had an initialization step, a test, a code block that would get executed (if you put one in–the fact that I didn’t is intentional), and a post-execution step. You can pour whatever you want into these slots. The if construct is no different. In this way, C is a bit like Lisp, which has similar “open-ended” forms where you can put any sort of expression into the slots of constructs.

Closely related to this lack of rigor in booleans, consider this construction:

```c
if( 0 < a < 5) c; /* this "boolean" is always true! */
```

Always true because (0<a) generates either 0 or 1 depending on whether a is greater than 0, then compares the result to 5, which is always true, of course. C doesn’t really have boolean expressions, it only pretends to.

I forget when was the last time I’d seen constructs like this that actually worked. Maybe BASIC? It’s useful, but he’s right. It won’t fly in C. The reason is that C treats the code inside the if statement as just an expression to evaluate (an r-value). Nothing more. It doesn’t consider the context. As such, it evaluates expressions according to some rules. A book I got years ago, “C: A Reference Manual”, by Harbison and Steele, is an excellent book on C. There was one page I went to more than any other. It had a hierarchical list of all the operators you could use, and what precedence they had in relation to each other. The higher up in the list, the higher their precedence. Some operators were equal in precedence, in which case they’d just be evaluated from left to right. Dyer’s example above is one such case.

Or consider this:

```c
if( a =! b) c; /* this is compiled as (a = !b), an assignment,
                  rather than (a != b) or (a == !b) */
```

Yes, it’s a typo. What do you expect? I understand that in more modern languages, like C# (and maybe Java, too), this would’ve been flagged. I know C# flags an assignment inside a conditional statement; I forget if it’s just a warning or a compile error.

Unhygienic macros

```c
#define assign(a,b) a=(char)b

assign(x,y>>8)  /* becomes x=(char)y>>8 -- probably not what you want */
```

I didn’t get this example. Yes, you have to be careful how you construct your macros so they fit the situations you intend to use them in. Just common sense, if you ask me. Just rewrite the macro like so: “#define assign(a, b) a=(char)(b)”. So for, “assign(x,y>>8)” you’ll get “x=(char)(y>>8)”. That wasn’t so hard, was it?

Mismatched header files

Suppose foo.h contains:

```c
struct foo { BOOL a; };
```

file F1.c contains

```c
#define BOOL char
#include "foo.h"
```

file F2.c contains

```c
#define BOOL int
#include "foo.h"
```

now, F1.c and F2.c disagree about the fundamental attributes of structure “foo”. If they talk to each other, You Lose!

This is just undisciplined coding at work. The place to define a value like BOOL is in something like a types.h file. You would then include that file inside foo.h. This way, whenever you include foo.h, you also get a consistent BOOL type built into the inclusion.

Unpredictable struct construction

Consider this bit packing struct:

```c
struct eeh_type
{
    uint16 size: 10;   /* 10 bits */
    uint16 code:  6;   /*  6 bits */
};
```

Depending on which C compiler, and which “endian” flavor of machine you are on, this might actually be implemented as <10-bits><6-bits> or as <6-bits><10-bits>. Also, again depending on the C compiler, machine architecture, and various mysterious preference settings, the items might be aligned to the nearest 8, 16, 32, or 64 bits. So what matters? If you are trying to match bits with a real world file, everything!

Need another way to lose big? How about this:

```c
Rect foo = {0,1,2,3}; // assign numbers to the first four slots
```

You may think you know what those four slots are, but there’s at least an even chance you’ll have to discover the hard way if the structure ever changes.

Indefinite order of evaluation (contributed by Xavier @ triple-i.com)

```c
foo(pointer->member, pointer = &buffer[0]);
```

Works with gcc (and other compilers I used until I tried acc) and does not with acc. The reason is that gcc evaluates function arguments from left to right, while acc evaluates arguments from right to left. K&R and ANSI/ISO C specifications do not define the order of evaluation for function arguments. It can be left-to-right, right-to-left or anything else and is “unspecified”. Thus any code which relies on this order of evaluation is doomed to be non-portable, even across compilers on the same platform. This isn’t an entirely non-controversial point of view. Read the supplementary dialog on the subject. [link does not work — Mark]

A fundamental misunderstanding of C I’ve seen occasionally is programmers not recognizing that it’s just a higher level assembly language. This goes for Dyer’s complaint about “fake booleans” as well. If you really want to grok why C does what it does, check out an assembly language sometime.

ANSI C might’ve helped alleviate this misunderstanding some (or maybe it just created more confusion). K&R C was the epitome of the “higher level assembly language” mindset. It’s the original C language, the one that Kernighan & Ritchie wrote (hence the name). It didn’t care about much besides syntax. It cared little for types, even though you had to specify them. All types did for K&R C was tell it how much memory you wanted for a variable, and/or how big of an offset in memory you wanted between it and other variable values. I don’t know for sure, but I think it was the originator of the concept that an index into an array is just an offset into it, not an enumerator for its elements.

For example, back when I used it in college, an int was 16 bits (2 bytes), and it would allocate that on the stack for me, and keep track of its length, if I declared an int variable. That was it. It didn’t care what I put in it, just as an assembler wouldn’t check what type of value I put into a memory location.

The compiler put adjacent fields in a struct into contiguous memory locations. It would compute the memory needed for the struct by adding up the length of each field specified in it, and allocate that total. If I referenced a field within that struct, it was just an offset and a length into that memory area. Again, it didn’t care what type of value I put in it.

I discovered a couple years out of college that one could use this property of structs, even in ANSI C, to parse binary files. To tell you the truth, I think structs and bit fields were made for this sort of thing. Some example file reading code would be: “fread(&structVar, sizeof(structType), 1, fileHandle);”. fread()’s first parameter is of type void *, so it’ll take anything as a blob of memory. After this call, the struct variable is populated with data, and it’s easy enough to parse it by just saying “a = structVar.field1;”. Union types could also be used for parsing in memory.

In K&R C, if I defined a function, it’d look like this:

```c
SomeFunction(arg1, arg2, arg3)
int arg1;
char *arg2;
int arg3;
{
    /* some code */
}
```

All the type specifiers did was tell the compiler where the offsets in the stack frame were. It cared nothing for types. It didn’t even care how many arguments I passed in to the function! I could’ve passed an 8-byte struct in for arg1, and not filled in arg2 or arg3, for all it cared. I could’ve accessed the first 2 bytes of my struct via arg1, the next four bytes via arg2 (pointers were 4 bytes then), and the next 2 via arg3, because my 8-byte struct filled the stack frame the function was expecting. The struct could’ve been bigger. It still wouldn’t have cared. It would’ve caused problems, because then I would’ve corrupted the stack, but the compiler didn’t watch for such things, just as an assembler wouldn’t have.

It wasn’t even that necessary to declare the function in a header file. If you called the function from another module, the linker would figure out what you wanted and just set up the call. The one time you needed to declare a function in the header was if you had a circular reference, where Function A called Function B, which called Function A.

Maybe there was a point to this madness, but I know it drove even some very smart people nuts. One possibility was you could use this property of functions to automatically do some of the work for you in separating a chunk of memory into pieces, so you could use the function arguments as a kind of template into it. It’s hackerish, again, but it would’ve worked.

Anyway, continuing:

Easily changed block scope (Suggested by Marcel van der Peijl)

```c
if( ... )
    foo();
else
    bar();
```

which, when adding debugging statements, becomes

```c
if( ... )
    foo(); /* the importance of this semicolon can't be overstated */
else
    printf( "Calling bar()" ); /* oops! the else stops here */
    bar(); /* oops! bar is always executed */
```

There is a large class of similar errors, involving misplaced semicolons and brackets.

This is a programming 101 mistake. Come on! Put brackets around your code blocks, for cripes sake! 🙂 I’m getting the sneaking suspicion that the people complaining about this stuff are Python programmers.

I know Python determines code blocks via indentation. I personally find this a little dangerous, because while I’m fastidious about style, in a hurry I sometimes just put aside code formatting to get something done, and then clean it up later. Worrying about formatting slows you down. Formatting doesn’t matter to C; delimiters do.

Unsafe returned values (suggested by Bill Davis <wdavis@dw3f.ess.harris.com>)

```c
char *f() {
   char result[80];
   sprintf(result,"anything will do");
   return(result);    /* Oops! result is allocated on the stack. */
}

int g() {
   char *p;
   p = f();
   printf("f() returns: %s",p);
}
```

The “wonderful” thing about this bug is that it sometimes seems to be a correct program; as long as nothing has reused the particular piece of stack occupied by result.

Yeah, this is a mistake that’s often made by beginners, if they don’t understand stack dynamics that well. He’s right that sometimes this will work, if nothing else happens to overwrite that part of the stack. But there’s an easy way to overcome this bug: make “result” static in f(). That way it survives the return (though each call reuses the same buffer). Another method is to dynamically allocate the buffer for “result” (make the “result” variable a pointer) on the heap using malloc() (but you gotta remember to free “p” later in g()!). This way, only the pointer in f() is destroyed when f() returns, but the buffer remains.

When you’re writing in C or C++, you have to be more concerned with how the computer is executing what you’re going to write.

Undefined order of side effects. (suggested by michaelg@owl.WPI.EDU and others)

Even within a single expression, even with only strictly manifest side effects, C doesn’t define the order of the side effects. Therefore, depending on your compiler, i/++i might be either 0 or 1. Try this:

```c
#include <stdio.h>

int foo(int n) { printf("Foo got %d", n); return(0); }
int bar(int n) { printf("Bar got %d", n); return(0); }

int main(int argc, char *argv[])
{
    int m = 0;
    int (*(fun_array[3]))();
    int i = 1;
    int ii = i/++i;
    printf("i/++i = %d, ", ii);
    fun_array[1] = foo;
    fun_array[2] = bar;
    (fun_array[++m])(++m);
}
```

Prints either i/++i = 1 or i/++i = 0;

Prints either “Foo got 2”, or “Bar got 2”

Yeah, this is a common problem with C/C++. It’s been there forever. I took a brief course on C in college around 1991, and one of the things the teacher explicitly had us try out was an example like this. The moral of the story is never to modify a variable (with something like ++i) inside a larger expression that also reads that same variable. In other words, don’t do stuff like i/++i. The standard leaves the result undefined, so you never know how it’s going to come out.

The reason I was told this and other such weirdness exists in the language is the language designers wanted to make it possible for the compiler to optimize for the hardware. This meant not locking down certain characteristics like this. If you wanted to predict what the compiler for a particular hardware platform would do, you needed to understand how the CPU handled sequences of operators.

Utterly unsafe arrays

This is so obvious it didn’t even make the list for the first 5 years, but C’s arrays and associated memory management are completely, utterly unsafe, and even obvious cases of error are not detected.

```c
int thisIsNuts[4];
int i;
for ( i = 0; i < 10; ++i )
{
    thisIsNuts[ i ] = 0;
    /* Isn't it great? I can use elements 1-10 of a
       4 element array, and no one cares */
}
```

Of course, there are infinitely many ways to do things like this in C.

Any decent instructional material on C would tell you to watch out for this. Again, C is just a higher level assembly language. No assembler would watch out for this, either. If you ran this code on a Unix system, you might get a “segmentation fault” error (the operating system terminating the process), or the stray writes might just silently corrupt whatever sits next to the array on the stack.

Octal numbers (suggested by Paul C. Anagnostopoulos)

In C, numbers beginning with a zero are evaluated in base 8. If there are no 8’s or 9’s in the numbers, then there will be no complaints from the compiler, only screams from the programmer when he finally discovers the nature of the problem.

```c
int numbers[] = { 001,    // line up numbers for typographical
                          // clarity, lose big time
                  010,    // 8 not 10
                  014 };  // 12, not 14
```

What I was remembering, it turns out, is the octal escape sequence inside character and string literals, which uses a backslash (like ‘\012’ for a newline). For integer literals, a leading zero really does mean octal, just as Dyer shows. There are all sorts of markers like this in C and C++. Heck, don’t start a number with 0x either, because that’s the marker for a hexadecimal integer. Don’t put an “f” after a floating-point number unless you mean it, because that makes it a float instead of a double, etc.

Fabulously awful “standard libraries” (suggested by Pietro Gagliardi)

The default libraries in C are leftovers from the stone age of computing, when anything that worked was acceptable. They are full of time bombs waiting to explode at runtime. For an example, look no further than the “standard i/o library”, which, amazingly, is still standard.

```c
{
    int a=1,b=2;
    char buf[10];
    sscanf("%d %d",a,b); // don't you mean &a,&b? Prepare to blow!
    sprintf(buf,"this is the result: %d %d");
        // putting at least 20 characters in a 10 character buffer
        // and fetching a couple random vars from the stack.
}
```

I ran into this bug rather frequently because I didn’t use fscanf() or sscanf() that much. It makes some sense, though, because C passes everything by value. You have to pass in pointers to create “out” values. The reason this fails is that functions like scanf() and printf() take a variable number of arguments (after the format string). Since the value types could be anything, the language can’t restrict what types go in the argument list. And since variables are not references to objects, as in modern languages, but actual spots in physical memory, you have to distinguish a reference to a memory location (a pointer) from an actual value in memory. In this case, sscanf() wants pointers, but Gagliardi is passing it ints. Since C does no runtime type checking, and the values come in through a var-arg list, sscanf() can’t tell it isn’t getting pointers. It just has to assume that it is. In C++, the same mistake is typically caught at compile time instead, because iostreams and templates know the actual types of their arguments.

I don’t see this issue with var-args in C as a weakness in the standard library, but rather in the language. Functions like this still exist with modern languages, but today’s languages, with garbage-collected memory, use the concept of references to objects as the default, so there’s less of a problem (though it’s still possible to throw a function like this a curveball by specifying a certain number of arguments in the format string, but passing in fewer arguments in the var-arg list).

All of this reminds me of why I’ve sworn off programming in C as much as I can. I understand it, but I’ve grown beyond it. C is still used extensively in open source programming, from what I hear. I think in a way, C is returning to its roots. It’s good for writing things like operating systems, device drivers, and virtual machines, because those things need to interact with the hardware at an intimate level, and C certainly doesn’t get in the way of that.

I programmed in C for a couple years in college, and then for another 4 years out in the work world, and I was mainly using it for writing utilities, applications, and servers. In terms of software engineering for those things, it wasn’t the best language, but back in the 90s, it’s what a lot of places used for a while, before moving on to C++. I think the main reason it got picked was it represented a “happy medium” between high level abstraction and execution speed. In the 80s, if you needed it to be fast, you wrote it in assembly language. In the 90s, you wrote it in C. From my experience C++ didn’t get reasonably fast on the hardware available at the time until the late 90s.

Language compilers, interpreters, and VM environments are now written in, or translated to, C: Ruby, Java, .Net, and Squeak, for example. I’m sure there are some others. Squeak is a little unusual. The source code for its VM is in Smalltalk, but that gets translated to C and then compiled to generate a new version of it. In a way, C is becoming the new assembly language.