C versus C++

by Jakob Østergaard

February 24th, 2004

Introduction

Someone asked:

hey can anybody tell me what do i choose for programing either GNU c or JAVA for implementing some networkprograms using some linux systemcalls

Being a C++ evangelist (for what I believe are very good reasons), I suggested that the person started his project in C++ instead of Java or C.

Now, the language wars will never end, and everyone who enjoys programming have their own favourite language. I do not believe that I will ever manage to convince the unwashed masses that C++ is a superior language - yet, when someone cries out for advice, as was the case, I will of course give my best recommendations - and since I firmly believe that C++ is a superior general purpose programming language, I am going to recommend it. That was the start of "Beowulf Language War 2004". Q1 I guess ;)

Why C++ is a superior general purpose programming language

A little history

The C++ programming language was initially created by Bjarne Stroustrup, as a "better C". C++ was an attempt to extend and (to some extent) change the C programming language to overcome some of the common and big problems with the language. C++ was first standardized in 1998, and as such, is a much younger language than C.

So, what's the deal with C++?

For the vast majority of problems out there, I state that C++ provides no significant downsides and a number of significant improvements. Bold? Some people seem to think so, but it's really the case. Let's start out by clearing up a few very common C++ misunderstandings:

C++ is slower than C Wrong! Many C programs are valid C++ programs as well - and such a C program should run at identical speed when translated with either the C and with the C++ compiler.

Wrong! Many C programs are valid C++ programs as well - and such a C program should run at identical speed when translated with either the C and with the C++ compiler. C++ specific features give overhead Wrong! The so-called overhead introduced by certain C++ specific features (such as virtual function calls or exceptions), is comparable to the overhead you yourself would introduce should you choose to go thru the pain it would be to implement a similar feature in C.

Wrong! The so-called overhead introduced by certain C++ specific features (such as virtual function calls or exceptions), is comparable to the overhead you yourself would introduce should you choose to go thru the pain it would be to implement a similar feature in C. C++ is object oriented Wrong! The C++ language contains some language extentions over C, that make object oriented programming and generic programming more convenient. C++ does not force object oriented design anywhere - it merely allows for it if the programmer deems OO feasible. C allows for object oriented programming as well, C++ only makes it simpler and less error prone.

Stronger typing The type system in C++ is stronger than in C. This prevents many common programming errors - coupled with the next very important feature, the stronger type system even manages not to be an inconvenience.

The type system in C++ is stronger than in C. This prevents many common programming errors - coupled with the next very important feature, the stronger type system even manages not to be an inconvenience. Parameterized types The template keyword allows the programmer to write generic (type-agnostic) implementations of algorithms. Where in C, one could write a generic list implementation with an element like: struct element_t { struct element_t *next, *prev; void *element; }; C++ allows one to write something like: template <typename T> struct element_t { element_t<T> *next, *prev; T element; }; Not only does the C++ implementation prevent common programming errors (like putting an element of the wrong type on the list), it also allows better optimization by the compiler! For example, a generic sort implementation is available in both C and C++ - the C routine is defined as: void qsort(void *base, size_t nmemb, size_t size, int(*compar)(const void *, const void *)); whereas the C++ routine is defined as template void sort(RandomAccessIterator first, RandomAccessIterator last); The difference being, that for example sorting an array of integers, would, in the C case, require a function call for every single compare, whereas the C++ implementation would allow the compiler to inline the integer comparison calls, as the actual sort routine is instantiated at compile time, automatically by the compiler, with the correct types inserted in the template arguments.

The keyword allows the programmer to write generic (type-agnostic) implementations of algorithms. Where in C, one could write a generic list implementation with an element like: C++ allows one to write something like: Not only does the C++ implementation prevent common programming errors (like putting an element of the wrong type on the list), it also allows better optimization by the compiler! For example, a generic sort implementation is available in both C and C++ - the C routine is defined as: whereas the C++ routine is defined as The difference being, that for example sorting an array of integers, would, in the C case, require a function call for every single compare, whereas the C++ implementation would allow the compiler to inline the integer comparison calls, as the actual sort routine is instantiated at compile time, automatically by the compiler, with the correct types inserted in the template arguments. A bigger standard library C++ allows the full use of the C standard library. This is very important of course, as the C standard library is an invaluable resource when writing real world programs. However, C++ includes the Standard Template Library. The STL contains a number of useful templates, like the sort routine above. It includes useful common data structures such as lists, maps, sets, etc. Like the sort routine, the other STL routines and data structures are "tailored" to the specific needs the programmer has - all the programmer has to do is fill in the types. Of course, the STL is no silver bullet - but it does provide a great help very often, when solving general problems. How often have you implemented a list in C? How often would an RB-tree have been a better solution, if only you had had the time to do it? With the STL you do not need to make such compromises - use the tree if it's a better fit, it's as easy as using the list.

There are no good C++ compilers It's been like this for a long time. But you must remember, that the language was standardized in 1998 - it is a complex language, more complex than C. It has taken a long time for compilers to catch up to the standard. But as of this writing, there are good compilers available for the most widely used platforms out there; GCC in versions 3.X are generally very good, and it runs on GNU/Linux and most UNIX platforms. Intel has a good compiler for Win32 - it is also pretty good, but unfortunately it still relies on the MS STL which is sub-par.

It's been like this for a long time. But you must remember, that the language was standardized in 1998 - it is a complex language, more complex than C. It has taken a long time for compilers to catch up to the standard. But as of this writing, there are good compilers available for the most widely used platforms out there; GCC in versions 3.X are generally very good, and it runs on GNU/Linux and most UNIX platforms. Intel has a good compiler for Win32 - it is also pretty good, but unfortunately it still relies on the MS STL which is sub-par. People don't know good C++ This is not an often heard complaint, but it's something that I see a lot. C++ is a big and complex language - but it also used to be a language that was hyped a lot, especially back in the "OOP solves hunger, cures AIDS and cancer" days. The result seems to be that a lot of really poor C++ code, basically bad C with a few class statements here and there, is out there and is being used as learning material. This means, a lot of people who believe they know C++ actually write really crap code. That's too bad, and it's a problem, but I think it's unfair to blame C++ for this.

Note, that I am not arguing that everything is rewritten in C++. There are many large projects out there which are written in C - I do not believe that it is a good idea to just "convert" them to C++. C++ allows for cleaner solutions than C does, for a great many problems. Doing a minimal conversion of a solution which is "as clean as it gets" in C, to C++, would convert "good C" code into "poor C++". That is not a change to the better!

I have converted code from C to C++. But such a conversion has always ended up being a complete rewrite. From scratch. A redesign as such may not be necessary - after all, C++ does not force any new design principles down your throat, and if the original C code was well designed, there may not be a need to redo that.

However, for new projects starting out today, I fail to see why anyone (who knows C++ or is willing to learn it) would use C rather than C++. Unless, of course, that one of the two problems stated earlier are important.

The Challenge

The problem: Write a program that reads text from standard-input, and returns (prints) the total number of distinct words found in the text.

In short, I presented a C++ solution to the problem, and challenged someone to do it in C. I even wrote:

I still guarantee you two things: 1) Your code will be longer 2) Your program will be slower

Initial solutions

The small C++ solution

std::set

std::string

stdin

#include <set> #include <string> #include <iostream> int main(int argc, char **argv) { // Declare and Initialize some variables std::string word; std::set<std::string> wordcount; // Read words and insert in rb-tree while (std::cin >> word) wordcount.insert(word); // Print the result std::cout << "Words: " << wordcount.size() << std::endl; return 0; }

The faster C solution

Note, the actual code presented here is not the exact suggestion sent to the list - I changed the printing of the result at the bottom of the code, so that the program prints out the number of distinct words - the original code was made to print out an unsorted list of word frequencies. This change only touches a few of the very last lines in the code, it touches nothing in the central parts of the program at all.

Unlike my initial solution, this program uses a hash to store the distinct words. A hash can be faster than an RB-tree - a well dimensioned hash can exhibit O(1) complexity for insertions where an RB-tree exhibits O(log(n)) complexity.

The change of algorithm is unfortunate for a language-performance comparison point of view - as the C implementation will exhibit different run-time performance than the C++ implementation, for reasons that are not related to the choice of language at all. However, I'm not going to bitch about it, I'm going to evaluate the suggestion and still prove my point; that C++ is a better language for solving general purpose programming problems.

Let's have a look at the C code shall we;

/* Copyright (C) 1999 Lucent Technologies */ /* From 'Programming Pearls' by Jon Bentley */ /* wordfreq.c -- list of words in file, with counts */ #include <stdio.h> #include <stdlib.h> #include <string.h> typedef struct node *nodeptr; typedef struct node { char *word; int count; nodeptr next; } node; #define NHASH 29989 #define MULT 31 nodeptr bin[NHASH]; unsigned int hash(char *p) { unsigned int h = 0; for ( ; *p; p++) h = MULT * h + *p; return h % NHASH; } #define NODEGROUP 1000 int nodesleft = 0; nodeptr freenode; nodeptr nmalloc() { if (nodesleft == 0) { freenode = malloc(NODEGROUP*sizeof(node)); nodesleft = NODEGROUP; } nodesleft--; return freenode++; } #define CHARGROUP 10000 int charsleft = 0; char *freechar; char *smalloc(int n) { if (charsleft < n) { freechar = malloc(n+CHARGROUP); charsleft = n+CHARGROUP; } charsleft -= n; freechar += n; return freechar - n; } void incword(char *s) { nodeptr p; int h = hash(s); for (p = bin[h]; p != NULL; p = p->next) if (strcmp(s, p->word) == 0) { (p->count)++; return; } p = nmalloc(); p->count = 1; p->word = smalloc(strlen(s)+1); strcpy(p->word, s); p->next = bin[h]; bin[h] = p; } int main() { int i; nodeptr p; char buf[16384]; unsigned total = 0; for (i = 0; i < NHASH; i++) bin[i] = NULL; while (scanf("%s", buf) != EOF) incword(buf); for (i = 0; i < NHASH; i++) for (p = bin[i]; p != NULL; p = p->next) total += p->count ? 1 : 0; printf("Words: %d

", total); return 0; }

> As in maybe it's just me, but when I look at Jakob's code I have no idea > how it works. What are these routines? What do they do? Where's the > logic? Oh, I can read the comments and see the keywords and sort of > guess that all sorts of dark magic is being done; I just have no idea > how it does it, any more than I would if one took Jakob's code or my > code or anybody's code for the same task and wrapped it up in a single > routine: And here I argue that this is where C++'s strength becomes evident. STL is _standard_ (hence its name). Thus, for a C++ programmer, Jakob's code is comprehensible in one 15second long glimpse and this programmer can immediately use (trust?) the code. With the equivalent C code from the posted link, I personally have the problem that I need at least 15 minutes to read and understand it, and even then, I can't be sure I identified all gotchas and thus I can use it without major crashes and problems. I argue that the biggest problem with C up until glib and gsl (thus very recently) was exactly the lack of high level routines collected in a ubiquous library.

Showdown, so far

C and C++ code was compiled with GCC 3.3.2, using the following switches:

-O3 -Wall -g -march=pentium2

-O3 -march=pentium2

Run-time performance Solution Lines of code Run-time C++ RB-tree 15 235 sec. C Hash 85 36 sec.

Wow! Quite a difference! At 567% the code-size, the C implementation gets the work done in 15% of the time of the C++ solution. Quite a difference, in run-time as well as implementation-time.

But this benchmark is no good for the purpose of the "C versus C++" argument - it's two completely different approaces that yield completely different results, for reasons that have nothing to do with the languages used.

Identical algorithms, different languages

Shorter (so that it is easier audited/comprehended)

Generic (so that it can be reused)

Faster

Prettier

brew coffee

With the time that I spent on this, I was not able to achieve all of those goals. It was a tradeoff between code length/clarity and speed, basically.

As it turned out, making the code more generic did not affect run-time performance, and did not significantly affect code size.

The main thing that affected code size, was, that I made the code reliable and reusable - I changed the C implementation which used global variables and #define s (thus making the implementation unusable if for example someone wanted to use two different hashes in a program), into a generic template class.

A short table summarizing the changes:

C versus C++, hash implementation Quality C implementation C++ implementation Generality Hash parameters are defines, everything-global Hash parameters specified as template parameters, nothing-global Usability No memory cleanup - customized allocation routines make addition of cleanup slightly complicated Full cleanup in hash destructor

One of the nifty features of the C++ implementation is, that the hash parameters are specified at hash instantiation, as template parameters. So, a program needing two different hashes with different size parameters, would simply declare them as:

my_hash_t<> one; // Uses default parameters my_hash_t<11,3> two; // NHASH=11, MULT=3 in this instantiation

C++'ification of the C solution

/* * Rewrite in C++ by Jakob Oestergaard * of: (original copyright follows) */ /* Copyright (C) 1999 Lucent Technologies */ /* From 'Programming Pearls' by Jon Bentley */ /* wordfreq.c -- list of words in file, with counts */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <vector> struct node_t { char *word; int count; node_t *next; }; template <typename T> struct alist_t { alist_t(size_t elms) : next(0) { area = new T[elms]; } ~alist_t() { if (next) delete next; delete[] area; } T* allocate(size_t elms) { alist_t *n = new alist_t<T>(elms); n->next = next; next = n; return n->area; } T *area; alist_t<T> *next; }; template <unsigned NHASH = 29989, unsigned MULT = 31, unsigned NODEGROUP = 1000, unsigned CHARGROUP = 1000> class my_hash_t { public: my_hash_t() : bin(NHASH,0), nodesleft(0), charsleft(0), nglist(NODEGROUP), cglist(CHARGROUP) { } unsigned hash(const char *p) const { unsigned int h = 0; while (*p) h = MULT * h + *p++; return h % NHASH; } void incword(const char *s) { node_t* p; int h = hash(s); for (p = bin[h]; p; p = p->next) if (strcmp(s, p->word) == 0) { ++p->count; return; } p = nmalloc(); p->count = 1; p->word = smalloc(strlen(s)+1); strcpy(p->word, s); p->next = bin[h]; bin[h] = p; } node_t *const * begin() const { return &bin[0]; } node_t *const * end() const { return &bin[NHASH]; } private: std::vector<node_t*> bin; int nodesleft; node_t* freenode; int charsleft; char *freechar; alist_t<node_t> nglist; alist_t<char> cglist; node_t* nmalloc() { if (nodesleft == 0) { freenode = nglist.allocate(NODEGROUP); nodesleft = NODEGROUP; } nodesleft--; return freenode++; } char *smalloc(int n) { if (charsleft < n) { freechar = cglist.allocate(n+CHARGROUP); charsleft = n+CHARGROUP; } charsleft -= n; freechar += n; return freechar - n; } }; int main() { my_hash_t<> my_hash; char buf[16384]; while (scanf("%s", buf) != EOF) my_hash.incword(buf); unsigned total = 0; for (node_t*const* p = my_hash.begin(); p != my_hash.end(); ++p) for (const node_t *e = *p; e; e = e->next) total += e->count ? 1 : 0; printf("Words: %d

", total); return 0; }

malloc

alist_t

template class

Now, what you may not realize if you're an old-school C programmer is, that you can actually stick the entire hash implementation - everything in the above listing except for the main routine - in a header file. Yep, that's right - C++ will handle this, the linker will not barf. You can then include this header file in every project you do, where you may need a hash implementation. You define the properties of the has when you instantiate the hash - in your own code. The header file just provides the template implementation. C'mon, that's clever!

Showdown; identical algorithm, C versus C++

Speed - the C++ implementation must run as fast or faster than the C implementation

Generality - the C++ implementation must be easily reusable in other projects (unlike the C implementation)

Usability - the C++ implementation must manage its memory properly (unlike the C implementation)

Code length - The C++ implementation is 36 lines longer than the C implementation - I accepted this, as the new code can in fact be used as a component in other projects (unlike the C implementation)

Run-time performance Solution Lines of code Run-time C++ Hash 121 34.697 sec. C Hash 85 35.929 sec.

Conclusion so far

I believe that I have demonstrated how C++ constructs (the template ) allows for generic re-usable and flexible implementations without run-time performance overhead, and (the constructor/destructor) allows for automatic initialization and cleanup of data structures - making it simpler for the user of generic data structures, such as the hash presented, to write real-world programs that are reliable and do not leak memory.

I also believe that I have proven, that C++ does not force any particular type of design down anyone's throat. For example, the node_t in my code is almost identical to the struct node in the original C code - there simply was no need to change it, and therefore it was left basically the way that it was.

Further solutions

The reasons for the beating turned out to be pretty interesting.

Fixed C++ solution

char*

std::string

std::string a = "abc"; // easy initialization from C strings const char *b = a.c_str(); // easy to get a C string out again std::string b = "foo" + a; // operations on strings are easy too! size_t b_len = b.size(); // O(1) behavior for some operations bool a_eq_b = a == b; // string comparison, done right

std::string

I discovered that for string comparisons (which is the number-one CPU consuming operation in both the C hash implementation and the C++ RB-tree implementation) in my STL use the memcmp routine, whereas the C implementation directly calls the strcmp routine. Now, for some odd reason, the strcmp routine on my system is heavily optimized for the Pentium II processor, while the memcmp routine used when I specify Pentium II optimizations is optimized for the i486 processor. This was one cause for the slowdown - my RB-tree implementation simply could not compare strings as fast as the C implementation, even though it ought to be faster at comparing since it already has O(1) access to the string lengths.

But this was not the whole explanation. It turns out, that using std::cin >> buf is just horribly inefficient. I did not dig further into this to discover why - I guess that all in all, maintaining the length of the string explicitly as the STL std::string does, is just not a win in this scenario. Further, I guess that std::cin is just not horribly well optimized, in the STL that I use. I guess this is understandable - the STL is big, and I/O may just not have been the number one priority for the STL people - understandable, I guess, since the standard C library offers fine routines for doing I/O, and especially when high-performance I/O is a concern, one will use the "raw" read and write system calls anyway.

So, I cooked up a very simple implementation that still uses std::string (because it gives us safe memory management and fully automatic cleanup), but takes the C approach to reading in data from the text file. The code is presented below:

#include <set> #include <string> #include <iostream> #include <stdio.h> int main(int argc, char **argv) { // Declare and Initialize some variables char buf[16384]; std::set<std::string> wordcount; // Read words and insert in rb-tree while (scanf("%s", buf) != EOF) wordcount.insert(buf); // Print the result std::cout << "Words: " << wordcount.size() << std::endl; return 0; }

Run-time performance Solution Lines of code Run-time C++ RB-tree (I) 15 235 sec. C++ Hash 121 34.697 sec. C Hash 85 35.929 sec. C++ RB-tree (II) 15 63.970 sec.

Faster C++ solution

It really serves no purpose. Except perhaps to boost my ego by presenting something that is faster than the "programming pearl" presented earlier ;)

The following solution uses something that is almost-but-not-quite a digital trie. In my tests, it uses about four times as much memory as the hash implementations did. It sacrifices memory footprint for performance.

#include <unistd.h> #include <vector> #include <cctype> #include <stdint.h> #include <iostream> const size_t pagesize = 4096; struct knot_t { knot_t() : exists(false), childcount(0), children(0) { } knot_t* child(unsigned c) { // Look for a matching child if (children) { unsigned i = childcount / 2; unsigned leftpos = 0; unsigned rightpos = childcount; while (children[i].letter != uint8_t(c)) { if (uint8_t(c) < children[i].letter) { rightpos = i; i = (leftpos + i) / 2; } else { leftpos = i+1; i = (rightpos + i) / 2; } if (rightpos == leftpos) goto nomatch; } return children + i; } nomatch: // Create matching child - insert in order knot_t *nary = new knot_t[childcount + 1]; knot_t *destp = nary; knot_t *sourcep = children; knot_t *endsource = children + childcount; while (sourcep != endsource && sourcep->letter < c) *destp++ = *sourcep++; knot_t *outp = destp++; while (sourcep != endsource) *destp++ = *sourcep++; delete[] children; children = nary; outp->letter = c; childcount++; return outp; } unsigned distinctcount() const { unsigned total = exists ? 1 : 0; for (unsigned i = 0; i != childcount; ++i) total += children[i].distinctcount(); return total; } bool exists; private: uint8_t letter; unsigned short childcount; knot_t *children; }; int main(int argc, char **argv) { knot_t base; char buffer[pagesize]; knot_t *curnot = 0; while (ssize_t r = read(0, buffer, pagesize)) { if (r <= 0) break; for (unsigned bufpos = 0; bufpos != unsigned(r); ++bufpos) { uint8_t c = buffer[bufpos]; if (isspace(c)) { if (!curnot) continue; curnot->exists = true; curnot = 0; continue; } if (!curnot) { curnot = base.child(c); continue; } curnot = curnot->child(c); } } if (curnot) curnot->exists = true; std::cout << "Distinct: " << base.distinctcount() << std::endl; return 0; }

Run-time performance Solution Lines of code Run-time C++ RB-tree (I) 15 235 sec. C++ Hash 121 34.697 sec. C Hash 85 35.929 sec. C++ RB-tree (II) 15 63.970 sec. C++ Ego-booster 93 18.108 sec.

Of course, the ego-booster could also be implemented in C. Probably with negligible run-time penalty (as per the previous tests). This code really uses virtually none of the niceties that C++ provides, in mainly serves to prove that C++ can be used for real performance critical code, where old-timers would cry "C" if asked. Well, I'm not asking, I was too busy coding ;)

The above code spends about half of the run-time waiting for a register stall, because I store the letter as a uint8_t instead of an unsigned . It would be simple to change, but it would cost about 30% extra on the memory footprint. Ah, tradeoffs...

The end