C++ Frequently Questioned Answers

This is a single page version of C++ FQA Lite. C++ is a general-purpose programming language, not necessarily suitable for your special purpose. FQA stands for "frequently questioned answers". This FQA is called "lite" because it questions the answers found in C++ FAQ Lite.

The single page version does not include most "metadata" sections such as the FQA FAQ.


Defective C++

This page summarizes the major defects of the C++ programming language (listing all minor quirks would take eternity). To be fair, some of the items by themselves could be design choices, not bugs. For example, a programming language doesn't have to provide garbage collection. It's the combination of the things that makes them all problematic. For example, the lack of garbage collection makes C++ exceptions and operator overloading inherently defective. Therefore, the problems are not listed in the order of "importance" (which is subjective anyway - different people are hit the hardest by different problems). Instead, most defects are followed by one of their complementary defects, so that when a defect causes a problem, the next defect in the list makes it worse.

No compile time encapsulation

In naturally written C++ code, changing the private members of a class requires recompilation of the code using the class. When the class is used to instantiate member objects of other classes, the rule is of course applied recursively.

This makes C++ interfaces very unstable - a change invisible at the interface level still requires rebuilding the calling code, which can be very problematic when that code is not controlled by whoever makes the change. So shipping C++ interfaces to customers can be a bad idea.
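Here's a minimal sketch of the problem (Widget is a made-up class):

// widget.h - shipped to the customer
class Widget {
public:
  void draw() const;  // the public interface never changed...
private:
  int cached_area_;   // ...but adding or removing this private member
                      // changes the object layout, so every file that
                      // #includes widget.h must be recompiled
};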

Well, at least when all relevant code is controlled by the same team of people, the only problem is the frequent rebuilds of large parts of it. This wouldn't be too bad by itself with almost any language, but C++ has...

Outstandingly complicated grammar

"Outstandingly" should be interpreted literally, because all popular languages have context-free (or "nearly" context-free) grammars, while C++ has undecidable grammar. If you like compilers and parsers, you probably know what this means. If you're not into this kind of thing, there's a simple example showing the problem with parsing C++: is AA BB(CC); an object definition or a function declaration? It turns out that the answer depends heavily on the code before the statement - the "context". This shows (on an intuitive level) that the C++ grammar is quite context-sensitive.

In practice, this means three things. First, C++ compiles slowly (the complexity takes time to deal with). Second, when it doesn't compile, the error messages are frequently incomprehensible (the smallest error which a human reader wouldn't notice completely confuses the compiler). And third, parsing C++ right is very hard, so different compilers will interpret it differently, and tools like debuggers and IDEs periodically get awfully confused.

And slow compilation interacts badly with frequent recompilation. The latter is caused by the lack of encapsulation mentioned above, and the problem is amplified by the fact that C++ has...

No way to locate definitions

OK, so before we can parse AA BB(CC); , we need to find out whether CC is defined as an object or a type. So let's locate the definition of CC and move on, right?

This would work in most modern languages, in which CC is either defined in the same module (so we've already compiled it), or it is imported from another module (so either we've already compiled it, too, or this must be the first time we bump into that module - so let's compile it now, once, but of course not the next time we'll need it). So to compile a program, we need to compile each module, once, no matter how many times each module is used.

In C++, things are different - there are no modules. There are files, each of which can contain many different definitions or just small parts of definitions, and there's no way to tell in which files CC is defined, or which files must be parsed in order to "understand" its definition. So who is responsible for arranging all those files into a sensible string of C++ code? You, of course! In each compiled file, you #include a bunch of header files (which themselves include other files); the #include directive basically issues a copy-and-paste operation to the C preprocessor, inherited by C++ without changes. The compiler then parses the result of all those copy-and-paste operations. So to compile a program, we need to compile each file the number of times it is used in other files.
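To see what "copy-and-paste" means here, consider this sketch (a.h and b.cpp are hypothetical):

// a.h - not a module, just text
struct CC { int x; };

// b.cpp
#include "a.h" // the preprocessor pastes the entire contents of a.h
               // (and whatever a.h itself #includes) right here, and
               // the compiler parses all of it again - and again in
               // every other file that includes a.h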

This causes two problems. First, it multiplies the long time it takes to compile C++ code by the number of times it's used in a program. Second, the only way to figure out what should be recompiled after a change to the code is to check which of the #include files have been changed since the last build. The set of files to rebuild generated by this inspection is usually a superset of the files that really must be recompiled according to the C++ rules of dependencies between definitions. That's because most files #include definitions they don't really need, since people can't spend all their time removing redundant inclusions.

Some compilers support "precompiled headers" - saving the result of the parsing of "popular" header files to some binary file and quickly loading it instead of recompiling from scratch. However, this only works well with definitions that almost never change, typically third-party libraries.

And now that you've waited all that time until your code base recompiles, it's time to run and test the program, which is when the next problem kicks in.

No run time encapsulation

Programming languages have rules defining "valid" programs - for example, a valid program shouldn't divide by zero or access the 7th element of an array of length 5. A valid program isn't necessarily correct (for example, it can delete a file when all you asked was to move it). However, an invalid program is necessarily incorrect (there is no 7th element in the 5-element array). The question is, what happens when an invalid program demonstrates its invalidity by performing a meaningless operation?

If the answer is something like "an exception is raised", your program runs in a managed environment. If the answer is "anything can happen", your program runs somewhere else. In particular, C and C++ are not designed to run in managed environments (think about pointer casts), and while in theory they could run there, in practice all of them run elsewhere.

So what happens in a C++ program with the 5-element array? Most frequently, you access something at the address that would contain the 7th element, but since there isn't any, it contains something else, which just happens to be located there. Sometimes you can tell from the source code what that is, and sometimes you can't. Anyway, you're really lucky if the program crashes, because if it keeps running, you'll have a hard time understanding why it ends up crashing or misbehaving later. If it doesn't scare you (you debugged a couple of buffer overflows and feel confident), wait until you get to many megabytes of machine code and many months of execution time. That's when the real fun starts.
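In code, the whole drama is as short as this:

void innocent_looking() {
  int arr[5];
  arr[6] = 42; // no 7th element exists; this overwrites whatever happens
               // to live past the array. No exception, no guaranteed
               // crash - the program just keeps going, corrupted
}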

Now, the ability of a piece of code to modify a random object when in fact it tries to access an unrelated array indicates that C++ has no run time encapsulation. Since it doesn't have compile time encapsulation, either, one can wonder why it calls itself object-oriented. Two possible answers are warped perspective and marketing (these aren't mutually exclusive).

But if we leave the claims about being object-oriented aside, the fact that a language runs in unmanaged environments can't really be called a "bug". That's because managed environments check things at run time to prevent illegal operations, which translates to a certain (though frequently overestimated) performance penalty. So when performance isn't that important, a managed environment is the way to go. But when it's critical, you just have to deal with the difficulties in debugging. However, C++ (compared to C, for example) makes that much harder than it already has to be, because there are...

No binary implementation rules

When an invalid program finally crashes (or enters an infinite loop, or goes to sleep forever), what you're left with is basically the binary snapshot of its state (a common name for it is a "core dump"). You have to make sense of it in order to find the bug. Sometimes a debugger will show you the call stack at the point of crash; frequently that information is overwritten by garbage. Other things which can help the debugger figure things out may be overwritten, too.

Now, figuring out the meaning of partially corrupted memory snapshots is definitely not the most pleasant way to spend one's time. But with unmanaged environments you have to do it, and it can be done, if you know how your source code maps to binary objects and code. Too bad that with C++, there's a ton of these rules and each compiler uses different ones. Think about exception handling or various kinds of inheritance or virtual functions or the layout of standard library containers. In C, there are no standard binary implementation rules either, but the language is an order of magnitude simpler and in practice compilers use the same rules. Another thing making C++ code hard to debug is the above-mentioned complicated grammar, since debuggers frequently can't deal with many language features (place breakpoints in templates, parse pointer casting commands in data display windows, etc.).

The lack of a standard ABI (application binary interface) has another consequence - it makes shipping C++ interfaces to other teams / customers impractical since the user code won't work unless it's compiled with the same tools and build options. We've already seen another source of this problem - the instability of binary interfaces due to the lack of compile time encapsulation.

The two problems - with debugging C++ code and with using C++ interfaces - don't show up until your project grows complicated in terms of code and / or human interactions, that is, until it's too late. But wait, couldn't you deal with both problems programmatically? You could generate C or other wrappers for C++ interfaces and write programs automatically shoveling through core dumps and deciphering the non-corrupted parts, using something called reflection. Well, actually, you couldn't, not in a reasonable amount of time - there's...

No reflection

It is impossible to programmatically iterate over the methods or the attributes or the base classes of a class in a portable way defined by the C++ standard. Likewise, it is impossible to programmatically determine the type of an object (for dynamically allocated objects, this can be justified to an extent by the performance penalties of RTTI, but not for statically allocated globals, and if you could start at the globals, you could decipher lots of the memory pointed to by them). Features of this sort - where a program can access the structure of programs, in particular, its own structure - are collectively called reflection, and C++ doesn't have it.

As mentioned above, this makes generating wrappers for C++ classes and shoveling through memory snapshots a pain, but that's a small fraction of the things C++ programmers are missing due to this single issue. Wrappers can be useful not only to work around the problem of shipping C++ interfaces - you could automatically handle things like remote procedure calls, logging method invocations, etc. A very common application of reflection is serialization - converting objects to byte sequences and vice versa. With reflection, you can handle it for all types of objects with the same code - you just iterate over the attributes of compound objects, and only need special cases for the basic types. In C++, you must maintain serialization-related code and/or data structures for every class involved.
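For example, here's the kind of code you get to write (and maintain) once per class (Point and serialize are made-up):

#include <iostream>

struct Point { int x, y; };

// hand-written, member by member; a language with reflection could
// iterate over the members of any object in one generic function
void serialize(std::ostream& out, const Point& p) {
  out << p.x << ' ' << p.y;
}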

But perhaps we could deal with this problem programmatically then? After all, debuggers do manage to display objects somehow - the debug information, emitted in the format supported by your tool chain, describes the members of classes and their offsets from the object base pointer and all that sort of meta-data. If we're stuck with C++, perhaps we could parse this information and thus have non-standard, but working reflection? Several things make this pretty hard - not all compilers can produce debug information and optimize the program aggressively enough for a release build, not all debug information formats are documented, and then in C++, we have a...

Very complicated type system

In C++, we have standard and compiler-specific built-in types, structures, enumerations, unions, classes with single, multiple, virtual and non-virtual inheritance, const and volatile qualifiers, pointers, references and arrays, typedefs, global and member functions and function pointers, and templates, which can have specializations on (again) types (or integral constants), and you can "partially specialize" templates by pattern matching their type structure (for example, have a specialization for std::vector<MyRetardedTemplate<T> > for arbitrary values of T), and each template can have base classes (in particular, it can be derived from its own instantiations recursively, which is a well-known practice documented in books), and inner typedefs, and... We have lots of kinds of types.

Naturally, representing the types used in a C++ program, say, in debug information, is not an easy task. A trivial yet annoying manifestation of this problem is the expansion of typedefs done by debuggers when they show objects (and by compilers when they produce error messages - another reason why these are so cryptic). You may think it's a StringToStringMap, but only until the tools enlighten you - it's actually more of a...

// don't read this, it's impossible. just count the lines
std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
  std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
  std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >,
  std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const,
    std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >

But wait, there's more! C++ supports a wide variety of explicit and implicit type conversions, so now we have a nice set of rules describing the Cartesian product of all those types, specifically, how conversion should be handled for each pair of types. For example, if your function accepts const std::vector<const char*>& (which is supposed to mean "a reference to an immutable vector of pointers to immutable built-in strings"), and I have a std::vector<char*> object ("a mutable vector of mutable built-in strings"), then I can't pass it to your function because the types aren't convertible. You have to admit that it doesn't make any sense, because your function guarantees that it won't change anything, and I guarantee that I don't even mind having anything changed, and still the C++ type system gets in the way and the only sane workaround is to copy the vector. And this is an extremely simple example - no virtual inheritance, no user-defined conversion operators, etc.
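Spelled out in code (print_strings is a made-up function):

#include <vector>

void print_strings(const std::vector<const char*>& strings);

void caller(std::vector<char*>& v) {
  // print_strings(v); // error: std::vector<char*> doesn't convert
                       // to std::vector<const char*>
  std::vector<const char*> copy(v.begin(), v.end()); // the sane workaround
  print_strings(copy);
}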

But conversion rules by themselves are still not the worst problem with the complicated type system. The worst problem is the...

Very complicated type-based binding rules

Types lie at the core of the C++ binding rules. "Binding" means "finding the program entity corresponding to a name mentioned in the code". When the C++ compiler compiles something like f(a,b) (or even a+b), it relies on the argument types to figure out which version of f (or operator+) to call. This includes overload resolution (is it f(int,int) or f(int,double)?), the handling of function template specializations (is it template<class T> void f(vector<T>&,int) or template<class T> void f(T,double)?), and argument-dependent lookup (ADL), used to figure out the namespace (is it A::f or B::f?).
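Even the "simple" overload resolution part holds surprises; a tiny sketch:

void f(int, int);
void f(int, double);

void g() {
  f(1, 2);    // f(int,int) - exact match
  f(1, 2.5);  // f(int,double) - exact match
  f(1, 2.5f); // f(int,double) again: float-to-double promotion ranks
              // above float-to-int conversion - obvious, right?
}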

When the compiler "succeeds" (translates source code to object code), it doesn't mean that you are equally successful (that is, you think a+b called what the compiler thought it called). When the compiler "fails" (translates source code to error messages), most humans also fail (to understand these error messages; multiple screens listing all available overloads of things like operator<< are less than helpful). By the way, the C++ FAQ has very few items related to the unbelievably complicated static binding, like overload resolution or ADL or template specialization. Presumably people get too depressed to ask any questions and silently give up.

In short, the complicated type system interacts very badly with overloading - having multiple functions with the same name and having the compiler figure out which of them to use based on the argument types (don't confuse it with overriding - virtual functions, though very far from perfect, do follow rules quite sane by C++ standards). And probably the worst kind of overloading is...

Defective operator overloading

C++ operator overloading has all the problems of C++ function overloading (incomprehensible overload resolution rules), and then some. For example, overloaded operators have to return their results by value - naively returning references to objects allocated with new would cause temporary objects to "leak" when code like a+b+c is evaluated. That's because C++ doesn't have garbage collection, since that, folks, is inefficient. Much better to have your code copy massive temporary objects and hope to have them optimized out by our friend the clever compiler. Which, of course, won't happen any time soon.
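A sketch of the dilemma, with a made-up Matrix class:

struct Matrix {
  // ...lots of data...
  Matrix operator+(const Matrix& other) const {
    Matrix result = *this;
    // ...add other into result...
    return result; // a full copy of a possibly huge object
  }
  // returning a reference to a new-allocated Matrix would avoid the
  // copy, but then nobody ever deletes the temporaries produced
  // in the middle of a+b+c
};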

Like several other features in C++, operator overloading is not necessarily a bad thing by itself - it just happens to interact really badly with other things in C++. The lack of automatic memory management is one thing making operator overloading less than useful. Another such thing is...

Defective exceptions

Consider error handling in an overloaded operator or a constructor. You can't use the return value, and setting/reading error flags may be quite cumbersome. How about throwing an exception?

This could be a good idea in some cases if C++ exceptions were any good. They aren't, and can't be - as usual, because of another C++ "feature", the oh-so-efficient manual memory management. If we use exceptions, we have to write exception-safe code - code which frees all resources when the control is transferred from the point of failure (throw) to the point where explicit error handling is done (catch). And the vast majority of "resources" happens to be memory, which is managed manually in C++. To solve this, you are supposed to use RAII, meaning that all pointers have to be "smart" (be wrapped in classes freeing the memory in the destructor, and then you have to design their copying semantics, and...). Exception-safe C++ code is almost infeasible to achieve in a non-trivial program.
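A sketch of the difference (process is a made-up function that may throw):

#include <vector>

void process(int* data, int n); // may throw

void broken(int n) {
  int* p = new int[n];
  process(p, n); // if this throws, p leaks -
  delete[] p;    // this line is never reached during stack unwinding
}

void safer(int n) {
  std::vector<int> v(n); // RAII: the vector's destructor runs during
  process(&v[0], n);     // unwinding, so nothing leaks when this throws
}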

Of course, C++ exceptions have other flaws, following from still other C++ misfeatures. For example, the above-mentioned lack of reflection in the special case of exceptions means that when you catch an exception, you can't get the call stack describing the context where it was thrown. This means that debugging illegal pointer dereferencing may be easier than figuring out why an exception was thrown, since a debugger will list the call stack in many cases of the former.

The bottom line is that throw/catch are about as useful as longjmp/setjmp (BTW, the former typically runs faster, but its mere existence makes the rest of the code run slower, which is almost never acknowledged by C++ aficionados). So we have two features, each with its own flaws, and no interoperability between them. This is true for the vast majority of C++ features - most are...

Duplicate facilities

If you need an array in C++, you can use a C-like T arr[] or a C++ std::vector<T> or any of the array classes written before std::vector appeared in the C++ standard. If you need a string, use char* or std::string or any of the pre-standard string classes. If you need to take the address of an object, you can use a C-like pointer, T*, or a C++ reference, T&. If you need to initialize an object, use C-like aggregate initialization or C++ constructors. If you need to print something, you can use a C-like printf call or a C++ iostream call. If you need to generate many similar definitions with some parameters specifying the differences between them, you can use C-like macros or C++ templates. And so on.

Of course you can do the same thing in many ways in almost any language. But the C++ feature duplication is quite special. First, the many ways to do the same thing are usually not purely syntactic options directly supported by the compiler - you can compute a+b with a-b*-1 , but that's different from having T* and T& in the same language. Second, you probably noticed a pattern - C++ adds features duplicating functionality already in C. This is bad by itself, because the features don't interoperate well (you can't printf to an iostream and vice versa, code mixing std::string and char* is littered with casts and calls to std::string::c_str , etc.). This is made even worse by the pretty amazing fact that the new C++ features are actually inferior to the old C ones in many aspects.
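For example, here's the glue code you get to write when the two worlds meet (greet is made-up):

#include <cstdio>
#include <iostream>
#include <string>

void greet(const std::string& name) {
  std::printf("hello, %s\n", name.c_str()); // printf doesn't know std::string
  std::cout << "hello, " << name << "\n";   // iostream doesn't write to a FILE*
}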

And the best part is that C++ devotees dare to refer to the C features as evil, and frequently will actually resort to finger pointing and name calling when someone uses them in C++ code (not to mention using plain C)! And at the same time they (falsely) claim that C++ is compatible with C and that this is one of its strengths (why, if C is so evil?). The real reason to leave the C syntax in C++ was of course marketing - there's absolutely NO technical reason to parse C-like syntax in order to work with existing C code, since that code can be compiled separately. For example, mixing C and the D programming language isn't harder than mixing C and C++. D is a good example since its stated goals are similar to those of C++, but almost all other popular languages have ways to work with C code.

So IMO all that old syntax was kept for strictly commercial purposes - to market the language to non-technical managers or programmers who should have known better and didn't understand the difference between "syntax" and "compatibility with existing code" and simply asked whether the old code would compile with this new compiler. Or maybe they thought it would be easier to learn a pile of new syntax when you also have the (smaller) pile of old syntax than when you have just the new syntax. Either way, C++ became widespread by exploiting misconceptions.

Well, it doesn't matter anymore why they kept the old stuff. What matters is that the new stuff isn't really new, either - it's obsessively built in ways exposing the C infrastructure underneath it. And that is purely a wrong design decision, made without an axe to grind. For example, in C++ there's...

No high-level built-in types

C is a pretty low-level language. Its atomic types are supposed to fit into machine registers (usually one, sometimes two of them). The compound types are designed to occupy a flat chunk of memory of a size known at compile time.

This design has its virtues. It makes it relatively easy to estimate the performance & resource consumption of code. And when you have hard-to-catch low-level bugs, which sooner or later happens in unmanaged environments, having a relatively simple correspondence between source code definitions and machine memory helps to debug the problem. However, in a high-level language, which is supposed to be used when the development-time-cost / execution-time-cost ratio is high, you need things like resizable arrays, key-value mappings, integers that don't overflow and other such gadgets. Emulating these in a low-level language is possible, but is invariably painful since the tools don't understand the core types of your program.

C++ doesn't add any built-in types to C (correction). All higher-level types must be implemented as user-defined classes and templates, and this is when the defects of C++ classes and templates manifest themselves in their full glory. The lack of syntactic support for higher-level types (you can't initialize std::vector with {1,2,3} or initialize an std::map with something like {"a":1,"b":2} or have large integer constants like 3453485348545459347376) is a small part of the problem. Cryptic multi-line or multi-screen compiler error messages, debuggers that can't display the standard C++ types and slow build times unheard of anywhere outside of the C++ world are the larger part of the problem. For example, here's a simple piece of code using the C++ standard library followed by an error message produced from it by gcc 4.2.0. Quiz: what's the problem?

// the code
typedef std::map<std::string,std::string> StringToStringMap;

void print(const StringToStringMap& dict) {
  for(StringToStringMap::iterator p=dict.begin(); p!=dict.end(); ++p) {
    std::cout << p->first << " -> " << p->second << std::endl;
  }
}

// the error message
test.cpp: In function 'void print(const StringToStringMap&)':
test.cpp:8: error: conversion from 'std::_Rb_tree_const_iterator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >' to non-scalar type 'std::_Rb_tree_iterator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >' requested

The decision to avoid new built-in types yields other problems, such as the ability to throw anything, but without the ability to catch it later. class Exception , a built-in base class for all exception classes treated specially by the compiler, could solve this problem with C++ exceptions (but not others). However, the most costly problem with having no new high-level built-in types is probably the lack of easy-to-use containers. But to have those, we need more than just new built-in types and syntax in the C++ compiler. Complicated data structures can't be manipulated easily when you only have...

Manual memory management

Similarly to low-level built-in types, C++ manual memory management is inherited from C without changes (but with the mandatory addition of duplicate syntax - new/delete, which normally call malloc/free but don't have to do that, and of course can be overloaded).

Similarly to the case with low-level built-in types, what makes sense for a low-level language doesn't work when you add higher-level features. Manual memory management is incompatible with features such as exceptions & operator overloading, and makes working with non-trivial data structures very hard, since you have to worry about the life cycles of objects so they won't leak or die while someone still needs them.

The most common solution is copying - since it's dangerous to point to an object which can die before we're done with it, make yourself a copy and become an "owner" of that copy to control its life cycle. An "owner" is a C++ concept not represented in its syntax; an "owner" is the object responsible for deallocating a dynamically allocated chunk of memory or some other resource. The standard practice in C++ is to assign each "resource" (a fancy name for memory, most of the time) to an owner object, which is supposed to prevent resource leaks. What it doesn't prevent is access to dead objects; we have copying for that. Which is slow and doesn't work when you need many pointers to the same object (for example, when you want other modules to see your modifications to the object).

An alternative solution to copying is using "smart" pointer classes, which could emulate automatic memory management by maintaining reference counts or what-not. Here's a minimal sketch of the kind of class this means (naive on purpose - no cycles, no threads, made-up names):
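template<class T>
class RefPtr {
  T* ptr_;
  int* count_; // shared by all RefPtrs pointing to *ptr_
public:
  explicit RefPtr(T* p) : ptr_(p), count_(new int(1)) {}
  RefPtr(const RefPtr& other) : ptr_(other.ptr_), count_(other.count_) { ++*count_; }
  ~RefPtr() { release(); }
  RefPtr& operator=(const RefPtr& other) {
    if(this != &other) {
      release();
      ptr_ = other.ptr_;
      count_ = other.count_;
      ++*count_;
    }
    return *this;
  }
  T& operator*() const { return *ptr_; }
  T* operator->() const { return ptr_; }
private:
  void release() { if(--*count_ == 0) { delete ptr_; delete count_; } }
};

To implement pointer classes like this for the many different types in your program, you're encouraged to use...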

Defective metaprogramming facilities

There are roughly two kinds of metaprogramming: code that generates other code and code that processes other code. The second kind is practically impossible to do with C++ code - you can't reliably process source code due to the extremely complicated grammar and you can't portably process compiled code because there's no reflection. So this section is about the first kind - code generation.

You can generate C++ code from within a C++ program using C macros and C++ templates. If you use macros, you risk getting clubbed to death by C++ fanatics. Their irrational behavior left aside, these people do have a point - C macros are pretty lame. Too bad templates are probably even worse. They are limited in ways macros aren't (however, the opposite is also true). They compile forever. Being the only way to do metaprogramming, they are routinely abused to do things they weren't designed for. And they are a rats' nest of bizarre syntactic problems.

That wouldn't necessarily be so bad if C++ didn't rely on metaprogramming for doing essential programming tasks. One reason C++ has to do so is that in C++, the common practice is to use static binding (overload resolution, etc.) to implement polymorphism, not dynamic binding. So you can't take an arbitrary object at run time and print it, but in many programs you can take an arbitrary type at compile time and print objects of this type. Here's one common (and broken) application of metaprogramming - the ultimate purpose is to be able to print an arbitrary object at run time:

// an abstract base class wrapping objects of arbitrary types.
// there can be several such classes in one large project
struct Obj {
  virtual void print(std::ostream&) const = 0;
};

template<class T>
struct ObjImpl : Obj {
  T wrapped;
  virtual void print(std::ostream& out) const { out << wrapped; }
};

// now we can wrap int objects with ObjImpl<int> and string objects
// with ObjImpl<std::string>, store them in the same collection of Obj*
// and print the entire collection using dynamic polymorphism:
void print_them(const std::vector<Obj*>& objects) {
  for(int i=0; i<(int)objects.size(); ++i) {
    objects[i]->print(std::cout); // prints wrapped ints, strings, etc.
    std::cout << std::endl;
  }
}

Typically there are 10 more layers of syntax involved, but you get the idea. This sort of code doesn't really work because it requires all relevant overloads of operator<< to be visible before the point where ObjImpl is defined, and that doesn't happen unless you routinely sort your #include directives according to that rule. Some compilers will compile the code correctly with the rule violated, some will complain, some will silently generate wrong code.

But the most basic reason to rely on the poor C++ metaprogramming features for everyday tasks is the above-mentioned ideological decision to avoid adding high-level built-in types. For example, templates are at the core of the...

Unhelpful standard library

Most things defined by the C++ standard library are templates, and relatively sophisticated ones, causing the users to deal with quite sophisticated manifestations of the problems with templates, discussed above. In particular, a special program called STLFilt exists for decrypting the error messages related to the C++ standard library. Too bad it doesn't patch the debug information in a similar way.

Another problem with the standard library is all the functionality that's not there. A large part of the library duplicates the functionality from the C standard library (which is itself available to C++ programs, too). The main new thing is containers ("algorithms" like max and adjacent_difference don't count as "functionality" in my book). The standard library doesn't support listing directories, opening GUI windows or network sockets. You may think that's because these things are non-portable. Well, the standard library doesn't have matrices or regular expressions, either.

And when you use the standard library in your code, one reason it compiles slowly to a large binary image is that the library extensively uses the...

Defective inlining

First, let's define the terms.

"Inlining" in the context of compilers refers to a technique for implementing function calls (instead of generating a sequence calling the implementation of the function, the compiler integrates that implementation at the point where the call is made). "Inlining" in the context of C++ refers to a way to define functions in order to enable (as opposed to "force") such implementation of the calls to the function (the decision whether to actually use the opportunity is made by the compiler).

Now, the major problem with this C++ way to enable inlining is that you have to place the definition of the function in header files, and have it recompiled over and over again from source. This doesn't have to be that way - the recompilation from source can be avoided by having higher-level object file formats (the way it's done in LLVM and gcc starting from version 4). This approach - link-time inlining - is one aspect of "whole program optimization" supported by modern compilers. But the recompilation from source could also be avoided in simpler ways if C++ had a way to locate definitions instead of recompiling them, which, as we've seen, it hasn't.

The crude support for inlining, designed with a traditional implementation of a C tool chain in mind, wouldn't be as bad if it wasn't used all the time. People define large functions inline for two reasons. Some of them "care" (emotionally) about performance, but never actually measure it, and someone told them that inlining speeds things up, and forgot to tell them how it can slow them down. Another reason is that it's simply annoying to define functions non-inline, since that way, you place the full function definition in a .cpp file and its prototype in a .h file. So you write the prototype twice, with small changes (for example, if a class method returns an object of a type itself defined in the class, you'll need an extra namespace qualification in the .cpp file since you're now outside of the namespace of the class). Much easier to just have the body written right in the .h file, making the code compile more slowly and recompile more frequently (changing the function body will trigger a recompilation).
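For example (Shape being a made-up class):

// shape.h
class Shape {
public:
  enum Kind { CIRCLE, SQUARE };
  Kind kind() const; // the prototype, written once here...
private:
  Kind kind_;
};

// shape.cpp
#include "shape.h"
Shape::Kind Shape::kind() const // ...and almost-but-not-quite again here;
{                               // note the extra Shape:: qualification on
  return kind_;                 // the return type, needed outside the class
}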

And you don't even need to actually write any inline functions to get most of their benefits! A large subset of the inline functions of a program are...

Implicitly called & generated functions

Here's a common "design pattern" in C++ code. You have a huge class. Sometimes there's a single pseudo-global object of this class. In that case, you get all the drawbacks of global variables because everybody has a pointer to the thing and modifies it and expects others to see the changes. But you get no benefits of global variables since the thing is allocated on the stack and when your program crashes with a buffer overflow, you can't find the object in a debugger. And at other times there are many of these objects, typically kept in a pseudo-global collection.

Anyway, this huge class has no constructors, no destructor and no operator= . Of course people create and destroy the objects, and sometimes even assign to them. How is this handled by the compiler?

This is handled by the compiler by generating a gigantic pile of code at the point where it would call the user-defined functions with magic names (such as operator= ) if there were any. When you crash somewhere at that point, you get to see kilobytes of assembly code in the debugger, all generated from the same source code line. You can then try and figure out which variable didn't like being assigned to, by guessing where the class member offsets are in the assembly listing and looking for symbolic names of the members corresponding to them. Or you can try and guess who forgot all about the fact that these objects were assigned to using the "default" operator= and added something like built-in pointer members to the class. Because that wouldn't work, and could have caused the problem.
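A sketch of that last trap (Table is made-up):

struct Table {
  int* data;
  Table() : data(new int[100]) {}
  ~Table() { delete[] data; }
  // no user-defined operator= - the compiler generates one that
  // copies the pointer member verbatim
};

void assign(Table& a, const Table& b) {
  a = b; // a's old array leaks, and a.data == b.data now, so both
         // destructors will delete[] the same array - a crash waiting
         // to happen far away from this line
}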

Implicit generation of functions is problematic because it slows compilation down, inflates the program binaries and gets in the way when you debug. But the problem with implicitly calling functions (whether or not they were implicitly generated) is arguably even worse.

When you see code like a=f(b,c) (or even a=b+c , thanks to operator overloading), you don't know whether the objects are passed by reference or by value (see "information hiding"). In the latter case, the objects are copied with implicitly called functions; in the former case, that's possible, too, if implicit type conversions were involved. Which means that you don't really understand what the program does unless you know the relevant information about the relevant overloads and types. And by the way, the fact that you can't see whether the object is passed by reference or by value at the point of call is another example of implicit stuff happening in C++.

One more problem with automatically generated functions (such as constructors and destructors) is that they must be regenerated when you add private members to a class, so changing the private parts of a class triggers recompilation... Which brings us back to square 1.

Big Picture Issues

This part deals with the Big (and somewhat Sad) Picture.

[6.1] Is C++ a practical language?

FAQ: Sure - not perfect, but mature and well-supported, which is good for business.

FQA: C++ is not "mature" in the sense that different compilers will interpret it differently, and C++ modules built by different vendors will not work with each other. C++ is not "well-supported" in the sense that development tools for C++ lack features and are unreliable compared to other languages. These things make one ask "Am I the first one trying to do this?" all the time.

This situation is not likely to change, because it follows from the C++ definition. C++ is very complicated for programs (or people) to understand. The C++ specification leaves out most aspects crucial for interoperability, such as modules and calling conventions. C++ has a huge installed base, and since solving these problems backward-compatibly is impossible, they won't be solved.

[6.2] Is C++ a perfect language?

FAQ: No, and it shouldn't be, it should be practical, which, as we've just seen, it is. Perfect is for academia, practical is for business.

FQA: No language is "perfect" because our requirements from a "perfect" language are inconsistent with each other. So instead of perfection, good languages provide consistency and usability. This can be called "practical" from the point of view of language users.

C++ is different - it's designed for perfection. Where other languages give you a feature, C++ gives you meta-features. Instead of built-in strings and vectors, it gives you templates. Instead of garbage collection, it gives you smart pointers. This way, you can (theoretically) implement your own "perfect" (most efficient and generic) strings. In practice, this turns into a nightmare since many different kinds of strings, smart pointers, etc., each perfect in its own way, will not work with each other. C++ sacrifices usability for perfection.

However, despite the obsession with perfection, C++ is "practical" - from a language designer's perspective rather than from a user's point of view. The "practical" thing in C++ is that it's based on C. This helped the language gain popularity. This is also the main reason for inconsistencies in the language - ambiguities in the grammar (declaration/definition, type name/object name...), duplications of functionality in the different features (pointers/references, constructors/aggregate initialization, macros/constants/templates, files/namespaces...). C++ sacrifices consistency for popularity. This "practical" approach helps to increase the number of C++ users, but it doesn't help those users to get their job done.

[6.3] What's the big deal with OO?

FAQ: Object-oriented programming is the best known way to develop complex systems. It was invented because customers kept demanding increasingly complex systems.

FQA: Object-oriented programming is very useful. For a lot of things it's so useful you're likely to want support for it built into your language. Of course nobody knows how to build complex systems in the general case (or in your special case). OOP can help, other things can help, but ultimately there is no simple way to deal with complexity. Which is only surprising if you think that there should exist a reliable process to produce anything people are willing to pay for. The laws of business are powerful, the laws of nature are more powerful.

Most kinds of built-in language support for object-oriented programming, including no such support, have big advantages over C++ classes. The single biggest problem with C++ classes is that private members are written in header files, so changing them requires recompiling the code using them - for important practical purposes, this makes private members a part of the interface. C++ is built such that recompilation is very slow (an order of magnitude slower than it is with virtually any other language), and classes are built to make recompilation a frequent event.

From a business perspective, this means two things: your C++ developers spend a significant amount of their time in recompilation cycles, and C++ interfaces provided to your customers or by your vendors will cause you major headaches (when versions are upgraded, some of the code won't be recompiled and software will fail in creative ways). Luckily, C++ interfaces are hard to provide (effectively all parties must use the same compiler with the same settings), so quite typically C++ modules have interfaces written in C.

[6.4] What's the big deal with generic programming?

FAQ: Generic programming allows one to create components which are easy to use, widely applicable (reusable) and efficient. Using them makes your code faster and reduces the amount of errors. Creating them is a "non-process" (a poetic description of solving hard problems follows - waking up at night and other things probably questionable from the "business perspective" of which the FAQ is so fond). Most people can use them, but aren't cut out to create their own - one must like to solve puzzles for that. But these generic components are so generic that you can probably find an off-the-shelf one for your needs.

FQA: "Generic programming" in the context of C++ refers to templates.

Templates are hard to use (and not only define & implement) due to cryptic compiler error messages, extremely long compilation time and remarkable hostility to symbolic debugging - both code browsing and data inspection. The usability problems are not solved by using off-the-shelf components.

Templates are mostly applicable to containers or smart pointers, which can contain or point to almost anything. When the constraints on the input are less trivial, most of the time you either don't really need polymorphism, or you are better off with dynamic polymorphism (for example, the kind you get with C++ virtual functions). That's because in most cases, the benefits (such as separate compilation) are worth the overhead of dynamic binding (which is dwarfed by the complexity of the dispatched operations themselves).

Templates are a form of code generation, and hence they don't make code faster or slower compared to code you'd write manually. They do tend to make it larger since the compiler generates the same code many times. Although there are theoretical ways to avoid this, you find yourself solving someone else's problem. With the "evil" C macros you can at least control when they are expanded.

People who like to solve puzzles usually prefer interesting puzzles. With templates, the greatest puzzle is what on Earth the code means (even compilers frequently disagree). Practical people avoid fiddling with problems which nobody actually wants solved, and templates are only interesting inside the world of C++, not the real world.

[6.5] Is C++ better than Ada? (or Visual Basic, C, FORTRAN, Pascal, Smalltalk, or any other language?)

FAQ: Answering this question is not very helpful because business considerations dominate technical considerations. Specifically, availability (of compile time and run time environments, tools, developers) is the most important consideration. People who don't get this are techie weenies endangering their employer's interests.

FQA: Answering this question is not very helpful because the real question is what language is best for your specific purposes. The purposes are defined by the business considerations (what seems worth doing) and by technical considerations (what seems possible to do). In particular, your purposes may limit the availability of developers, tools, etc. You have to meet these constraints.

One thing is always true: where you can use C++, you can use C. In particular, if someone gave you C++ interfaces, a thin layer of wrappers will hide them. Using C instead of C++ has several practical benefits: faster development cycle, reduced complexity, better support by tools such as debuggers, higher portability and interoperability. When C++ is an option, C is probably a better option.

Another thing is always true: where you can use a managed environment (where the behavior of wrong programs is defined), using it will save a lot of trouble. C++ (like C) is designed for unmanaged environments (where the behavior of wrong programs is undefined). Unmanaged environments make it very hard to locate faults and impose no limit on the damage done by an undetected fault. In theory, C++ implementations can run in managed environments, but in practice they don't because of innumerable compatibility issues.

Yet another thing is almost always true: picking up a new language is easier for an experienced C++ programmer than working in C++. This is the result of the exceeding complexity of C++.

People who think there's no point in comparing programming languages, for example because "business considerations dominate technical considerations", are free to start their new projects in COBOL (COmmon Business-Oriented Language).

[6.6] Who uses C++?

FAQ: Lots and lots and lots of people and organizations. Which is excellent for business since a lot of developers are available.

FQA: Empirical studies indicate that 20% of the people drink 80% of the beer. With C++ developers, the rule is that 80% of the developers understand at most 20% of the language. It is not the same 20% for different people, so don't count on them to understand each other's code.

Two things are at fault: the exceptional complexity of C++ and its wide popularity, which attracts hordes of people who don't consider professional competence a personal priority. The few competent developers will spend much of their time dealing with problems created by the language instead of the original problems (and a subset of these developers will not even notice).

The large number of developers at least has the advantage of motivating the development of tools for dealing with C++ code. However, the design of the language makes it notoriously hard to produce such tools - a problem motivation can't quite remedy. Compare the quality of code browsing in C++ IDEs to IDEs of other languages and you'll get the idea. You can look at language-specific IDEs, general-purpose programming IDEs or extensions for general-purpose text editors - C++ loses everywhere. Don't just look at small examples, try it on large programs (especially ones using cutting-edge template libraries).

[6.7] How long does it take to learn OO/C++?

FAQ: In 6-12 months you can become proficient, in 3 years you are a local mentor. Some people won't make it - those can't learn, and/or they are lazy. Changing the way you think and what you consider "good" is hard.

FQA: In 6-12 months you can become as proficient as it gets. It is impossible to "know" C++ - it keeps surprising one forever. For example, what does the code cout << p do when p is a volatile pointer? Hint: as experienced people might expect, there's an unexpected implicit type conversion involved.
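For the curious, here's the punchline, sketched:

#include <iostream>

void print(volatile int* p) {
  std::cout << p; // no operator<< accepts a pointer to volatile, so the
                  // compiler implicitly converts p to bool; the program
                  // prints 1 (or 0 for a null pointer)
}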

While some people are better at learning than others, it is also true that some languages are easier to learn and use than others. C++ is one of the hardest, and your reward for the extra effort spent learning it is likely to be extra effort spent using it. If you find it hard to work in C++, trying another language may be a good idea.

Before you subvert the way you think about programming and your definition of "good" in this context to fit C++, it might be beneficial to consult common sense again. For example, does compilation time really cost nothing (is development time that cheap, are there compilation servers with 100 GHz CPUs around)? Is run time really priceless (don't user keystrokes limit our speed, how much data are we processing anyway)? How efficient is a C++ construct really in your implementation (templates, exceptions, endless copying & conversion)? The reasoning behind C++ may be consistent, but the assumptions almost never hold.

Learning OO has nothing to do with learning C++, and it is probably better to learn OO using a different language as an example. The OO support in C++ is almost a parody on OO concepts. For example, encapsulation is supposed to hide the implementation details from the user of a class. In C++, the implementation is hidden neither at compile time (change a private member and you must recompile the calling code) nor at run time (overwrite memory where an object is stored and you'll find out a lot about the implementation of its class - although in an unpleasant way).

[6.8] What are some features of C++ from a business perspective?

FAQ: Here are a few:

FQA: Here are a few more:

No practical implementation of C++ runs in managed environments, increasing both the defect rate and the potential damage of an undetected defect

Providing C++ interfaces to a software component is impossible in practice due to lack of compile time and run time interoperability

C++ is extremely inconsistent and complicated, increasing learning curves and the defect rate

C++ compilers typically fail to comply with the intricate standard, reducing portability

C++ compilation is both very slow and very frequent, increasing development time and defect rate (people write cryptic and dangerous code to avoid recompilation, for example, use global variables instead of adding arguments to functions, saving 1.5 hours per rebuild x 20 developers = 30 hours of downtime)

C++ lacks standard types representing basic data structures like strings, arrays and lists (or has more than one standard and many non-standard ones, which is the same), making it harder to reuse code (each interface works with a different kind of strings) and reducing the speed due to run time type conversion

All things mentioned in the FAQ are false for most practical purposes:

Despite the "huge" installed base, the tools dealing with C++ code are poor and their interoperability is a disaster (in both cases the problem is in the language definition)

C++ interfaces are usually very complicated (lots of small classes, implicitly generated functions like constructors & destructors, code bundled with the interface in template definitions...). As mentioned above, providing C++ interfaces to someone outside of your team is very hard in practice. And private members are for many purposes effectively a part of your interface.

Operator overloading is almost always counter-intuitive if one tries to understand the functionality (why does the left shift operator print things?), and always counter-intuitive if one tries to estimate performance (go figure if * multiplies two integers or two matrices, especially inside a template definition) or locate bugs (lethal ones can hide in places like operator= , where they are hard to see)

C++ doesn't reduce the safety-vs-anything trade-off since it's extremely unsafe (it "supports" all the undefined behavior of C like buffer overflows, adds many new scenarios with undefined results like invisibility of template specializations at the point of usage, and its complexity reduces the chances that someone actually knows what a program does and can prove its correctness). Where's the "trade-off"?

Old code can call new code in almost any popular language, for example, C (the ancient qsort function is probably calling new code as we speak). The item is really supposed to describe the benefits of OO to non-technical people. C++ is not likely to give its user the benefits of OO.

[6.9] Are virtual functions (dynamic binding) central to OO/C++?

FAQ: Sure, that's what makes C++ an object-oriented language. Don't switch from C to C++ unless you need virtual functions!

FQA: They probably are if you consider C++ an "object-oriented" language (a C++ debugger doesn't - try asking it to show what "object" is located at a random place, for example). You have to carefully define "object-oriented" so that C++ fits the definition.

Dynamic binding is central to any language since otherwise old code can't call new code, making code reuse very hard. Virtual functions are one form of dynamic binding supported by C++ (function pointers, inherited from C, are another one).

Switching from any language to C++ is not necessarily a good idea.

[6.10] I'm from Missouri. Can you give me a simple reason why virtual functions (dynamic binding) make a big difference?

FAQ: Before OO, you could only reuse old code by having new code call it. With OO, old code can call new code - more reuse. Even if the source code for the old code is not available.

FQA: It is unclear why the FAQ gets this wrong - most of the time it is technically accurate. Dynamic binding - old code calling new code - exists outside of OO. There are countless examples on any scale, ranging from the C qsort function to operating systems, which run programs written long after the code of those systems.
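Here's the qsort example spelled out - decades-old library code calling a function written today, through a plain function pointer:

#include <cstdlib>

// the "new code":
extern "C" int compare_ints(const void* a, const void* b) {
  int x = *(const int*)a, y = *(const int*)b;
  return (x > y) - (x < y);
}

void sort_ints(int* values, int n) {
  // the "old code": qsort was compiled long before compare_ints
  // was written, yet it calls it - dynamic binding without OO
  std::qsort(values, n, sizeof(int), compare_ints);
}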

The special thing in OO is that, well, it works with objects. In the case of dynamic binding, this means that not only does old code call new code - it also passes the state (encapsulated in the object receiving the method call) needed for this new code to work. This also happens outside of OO, but OO is an excellent unifying framework for dealing with this kind of thing. Especially if you have a good OO environment.

The omission of facts in the FAQ is much more typical than the technical inaccuracy. Specifically, there's a difference between theory and practice when it comes to old code not available in source form calling new code. In practice, code generated from C++ source is not portable, limiting the scenarios where the reuse actually works. Worse, even C++ implementations running on the same hardware and operating system are rarely compatible. For actually having old code call new code, you must limit yourself to a small subset of the language (C is one good one), and/or have both the old and the new code built with the same tools under the same settings.

[6.11] Is C++ backward compatible with ANSI/ISO C?

FAQ: Almost. But a declaration of a function without parameters means different things in C and C++, and sizeof('x') is likely to yield a different value, and...

FQA: The pair of words "almost compatible" is almost meaningless - for many technical purposes, compatibility is a binary thing. "Compatible", on the other hand, can have several meanings.

If your question is "Can I compile C code with a C++ compiler?", the answer is "no" because of numerous differences in the way code is interpreted (some things will be reported by the compiler, some will be silently misinterpreted). However, this is not a real problem, since you can compile C code with a C compiler.

If your question is "Can I call C code from C++ code?", the answer is "yes", but it's not special to C++. You can call C code from virtually any popular language because most of today's environments are based on C, making it both easy and beneficial to support this.

If your question is "Can I call C++ code from C code?", the answer is "sometimes". It is possible if the C++ code exposes a C interface (no classes, no exceptions...), and even then there are problems like making sure C++ global constructors and destructors are invoked. Many platforms provide ways for this to work.
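Here's a sketch of how such a C interface usually looks (all names hypothetical):

/* engine_c.h - the C face of a C++ implementation */
#ifdef __cplusplus
extern "C" {
#endif
typedef struct engine engine; /* opaque handle: no classes leak through */
engine* engine_create(void);
void engine_run(engine* e);
void engine_destroy(engine* e);
#ifdef __cplusplus
}
#endif

// engine_c.cpp - compiled as C++, bridging the two worlds:
#include "engine_c.h"
#include "Engine.hpp" // the C++ class being hidden

extern "C" engine* engine_create(void) { return reinterpret_cast<engine*>(new Engine); }
extern "C" void engine_run(engine* e) { reinterpret_cast<Engine*>(e)->run(); }
extern "C" void engine_destroy(engine* e) { delete reinterpret_cast<Engine*>(e); }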

If your question is "Is it easier for a C programmer to learn and use C++ than another new language, possibly object-oriented?", the answer is "no". C++ is very hard to learn and use and the hardest parts are not related to the C subset, but to the new parts and the way they interact with the old parts.

If your question is "Are C++ programs likely to contain bugs similar to these littering C programs, like buffer overflows?", the answer is "yes". If you are willing to sacrifice performance to gain stability, a managed environment might suit your needs. If you want to improve the stability of your programs without sacrificing neither development time nor run time, you probably can't. In particular, the "high-level" C++ is compatible to the "low-level" C when it comes to damage caused by low-level errors.

[6.12] Is C++ standardized?

FAQ: Yes, an ISO standard was adopted in 1997. The FAQ mentions twice that it was "adopted by unanimous vote".

FQA: Yes, there is a document specifying what "C++" means, and lots of implementation vendors signed it. The important thing about standardization, however, is the practical implications. Let's examine them.

The C++ standard does not specify what source code is translated to. Unlike code built from Java source, compiled C++ code will usually only run on one hardware/OS configuration.

The C++ standard does not address interoperability between implementations. Unlike code built from C source, compiled C++ code will only be able to call C++ code built with the same compiler and the same settings. Different implementations handle exceptions, global initialization & destruction, virtual functions, RTTI, mangling conventions, etc. differently. The C standard leaves out interoperability between implementations just like the C++ standard - but C is an order of magnitude simpler, so you won't have these problems in practice.

The C++ standard does not define a term like "module" or "library" - only "program" and "translation unit" (roughly, the latter means a preprocessed source file). If you deliver dynamically/statically linked libraries, you're on your own. Again, you will have problems with global initialization & destruction, RTTI, exceptions...

The C++ standard does not specify a machine-readable definition of the C++ grammar, and the question whether a given sequence of characters is legal C++ is undecidable. Building tools reliably processing C++ code (including compilers) is extremely hard.

The C++ standard has been out there for a long time. Today, different C++ compilers will interpret C++ code differently. The most frequent source of problems is static binding - figuring out what function calls should be generated from a given statement. Compilers implement name resolution (affected by namespaces, function & operator overloading, template signature matching & specialization, implicit type conversions, type qualifiers, inheritance...) differently. Neither the standard document nor common sense will easily tell you which compiler is "right".
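
As a tiny taste of the rules involved, here's a sketch (far simpler than the cases compilers actually disagree on):

    void f(int);                  // a non-template function
    template<class T> void f(T);  // a function template

    void test()
    {
        f(1);    // calls f(int): exact match, non-template preferred
        f(1.0);  // calls f<double>: the template's exact match beats
                 // f(int) plus a conversion
        f('a');  // calls f<char>, not f(int) - the promotion loses, too
    }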

You may think that compilers will eventually catch up with the standard (most vendors are trying all the time, but the tools still frequently disagree on the question of what "C++" means). Well, the next version of the standard is supposed to be adopted before 2010, giving those vendors some more work. For those compiler writers with really too much time on their hands, there's C++/CLI.

C++ is standardized, but it may have less practical benefits than you might be used to expect from "standards".

[6.13] Where can I get a copy of the ANSI/ISO C++ standard?

FAQ: Get ready to spend some money. A list of links follows.

FQA: Get ready to throw away some money. Seriously, what are you going to do with your copy? The document is incomprehensible.

The document may be useful if you are a language lawyer planning to sue the people responsible for the language or a particular implementation. But if you want to build working software, it's more practical to accept the fact that your implementation is not standard-compliant in many dark areas. If you find a front-end bug (for example, nifty, expensive tools will frequently crash trying to process complicated C++ code; every compiler I've used did) - that's actually your problem. While you are lost in the maze of C++ features, your competitor has already released a working product written without such complications.

The document is also useful if you're into meta-programming (compilers/debuggers/profilers/verifiers...) and want to write tools dealing with C++ code. The standard may help chill your passion before you throw away too much of your time.

[6.14] What are some "interview questions" I could ask that would let me know if candidates really know their stuff?

FAQ: If you are a non-technical person (manager/HR), ask a technical person to help you judge the technical competence of a candidate. If you are a technical person, the FAQ is one source of good questions, separating the truly competent people from the posers.

FQA: The good interview questions probably don't mention anything unique to C++.

Ultimately, you are looking for people with good will (some call them "cooperative"), who will do things, not just talk about them (some call them "practical"), and who will think, not just do (some call them "intelligent"). So the best questions, relevant for all candidates, are about their most recent large projects. The answers give you lots of information, and good answers are almost impossible to fake.

You may also need people to have some prior knowledge relevant to their work, since you don't have time to train them and let them gain experience. If you are sure that's the case (despite the fact that the people you are looking for are good learners), ask specific questions. Questions about high-level software organization issues (like OO) may be useful. Questions about low-level software construction issues (like pointers) may be useful. These issues are not specific to C++.

Asking about things specific to C++ is not very useful.

First, many of these things are useless for any practical purpose and are best avoided. Whether someone knows these things is correlated quite loosely with proficiency, and there are many excellent developers out there who weren't confronted with a particular obscure C++ feature yet, or successfully forgot it. So chances are that you are going to reject a good candidate.

Second, a good candidate actually knowing the answer may prefer an employer asking more relevant and practical questions. So chances are that a good candidate is going to reject you.

And third, there are people who look for the most complicated way to solve a problem to show off their intelligence. These tend to stumble into the dark areas of the tools they use all the time, so they will know answers to many C++-specific questions (they won't know answers to many more, because almost nobody does). Your questions will rank these people as the best possible candidates. Later you will find out that these people are poor practitioners.

[6.15] What does the FAQ mean by "such and such is evil"?

FAQ: This means that a feature should be avoided whenever possible. The strong word is supposed to help people change their old thinking.

FQA: This means the feature satisfies the following conditions:

It is inherited from C

It is easy to abuse (especially when it interacts with the new C++ features)

It can cause problems when abused (especially when it interacts with the new C++ features)

C++ provides one or more facilities duplicating the functionality of the feature, replacing the original problems with new and much more complicated problems

For example, macros, pointers, and arrays meet this definition (the corresponding C++ "solutions" are const & template, references & smart pointers, and vector/string classes). Include files almost meet this definition, except that C++ doesn't duplicate this functionality (namespaces are a parallel notion of "modules", but they can't be used to locate definitions). Consequently, the FAQ will not call include files "evil". On the other hand, function overloading doesn't come from C, so duplicate facilities like template specialization, default arguments, etc. are not enough for the FAQ to call function overloading "evil". Still, function overloading is very commonly abused, leading to major problems.
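
For illustration, here's a sketch of one such duplicate pair (the names are made up):

    #define BUF_SIZE 512                /* the C way: no type, no scope */
    const int kBufSize = 512;           // the C++ replacement

    #define DOUBLE_IT(x) ((x) + (x))    /* a function-like macro */
    template<class T> T double_it(T x) { return x + x; }  // the C++ replacement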

A C++ user is likely to have a different definition of "evil". A user doesn't care whether something came from C or not, and whether C++ tried to offer duplicate facilities (while forcing users to deal with the original ones since they're still in the language). A user typically cares about the "easy to abuse and causing trouble when abused" parts. Lots and lots of parts of C++ are like that.

As to the features the FAQ does call evil - why are they in the language? Is it good for the users of the language, or for those who designed and promoted it?

[6.16] Will I sometimes use any so-called "evil" constructs?

FAQ: Of course! Evil means "usually undesirable", but sometimes you have to choose from a set of bad options, and an "evil" feature is your best option. There are no universal rules. Think! At this point the FAQ (and your typical C++ devotee) gets quite agitated.

FQA: Of course! You have no choice. They are built into the language. For example, "abc" and {1,2,3} are evil arrays, the keyword this and the standard char** argv are evil pointers, and you'll need an evil #define to define a usable interface (for the header file inclusion guards).
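
For example, here's the "evil" macro every header file needs (MY_HEADER_H is a made-up guard name):

    /* my_header.h - without this macro, including the file twice
       would produce redefinition errors */
    #ifndef MY_HEADER_H
    #define MY_HEADER_H

    /* ...declarations... */

    #endif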

Note that with evil arrays, you can write int a[3] = {1,2,3}; while with the supposedly less evil std::vector , you can't. You'll find that the brand new C++ features duplicating the functionality of the "evil" old C features are inferior to the latter in many more ways.
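
A sketch of the difference, for the standard discussed here (initializer lists only arrived in a later standard; make_v is a made-up name):

    #include <vector>

    int a[3] = {1, 2, 3};      // the "evil" C array: one line

    std::vector<int> make_v()  // the C++ replacement takes a function...
    {
        std::vector<int> v;    // ...and three calls
        v.push_back(1);
        v.push_back(2);
        v.push_back(3);
        return v;
    }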

Worse, you can avoid neither the features the FAQ calls evil nor the ones the user would call evil, because if your code doesn't use a feature, it doesn't mean that someone else's code you have to live with doesn't. For example, you may try to avoid exceptions, but the C++ operator new , as well as code in third-party libraries, will throw exceptions, and you have to catch them.

There's a basic assumption behind C++ that extra features can't be a problem - only missing features can. That's why there are so many features in C++, and in particular so many duplicate ones. Real world analogies ("imagine a dog with twelve legs") are pale compared to this reality.

[6.17] Is it important to know the technical definition of "good OO"? Of "good class design"?

FAQ: Not if you are a practitioner. Business considerations are the important ones. Precise technical definitions of "good" may lead developers to ignore these considerations, so they are dangerous.

FQA: Whether it's important or not, there is no technical definition of "good", in particular good OO or good class design. "Good" is not a formal term, nor is it universal. For example, if you work for a company, it's important to consider how beneficial something ultimately is for that company in order to define "good".

However, there are technical definitions of OO. So while there are no formal means to tell whether something is good OO, you may be able to reason whether something is OO or not. Which is not necessarily interesting by itself. But it may be interesting if you have reasons to believe that OO is a good tool for your job - you may want to make sure you'll actually get the benefits you expect. It may also be interesting if someone calls something OO - you may wonder whether you use the same terms or whether they know what they're talking about.

Getting obsessive about precise definitions is a bad way to make decisions. But it's also bad to ignore definitions and blindly go with the hype. For example, people promoting C++ keep telling how good OO is, and how C++ supports OO, and then you try to find out what OO actually means, and suddenly it turns out that it's not important. Isn't that a little strange?

It is very beneficial for a practitioner to gain familiarity with OO systems other than C++, and with OO definitions other than the "encapsulation, inheritance, polymorphism" trinity interpreted in special ways allowing C++ to be considered "OO". For example, a claim that an environment lacking boundary checking or garbage collection is not an OO environment sounds outrageous to people accustomed to C++. But from many perspectives, it makes a lot of sense. If anyone can overwrite an object, where's the "encapsulation"? If disposing an object can lead to dangling references or memory leaks, how is the system "object-oriented"? What about the ability to tell what kind of object is located at a given place and time? You say the software works with objects - where are they? And if one can't find out, how is one supposed to debug the software?

When people claim that C++ is object-oriented and therefore "good", it may be worth checking whether your notion of "good" is similar to theirs - from a business perspective, for example.

[6.18] What should I tell people who complain that the word "FAQ" is misleading, that it emphasizes the questions rather than the answers, and that we should all start using a different acronym?

FAQ: These people should get a life. Changing a term used and understood by many people is pointless, because people no longer care about the origins of the term and directly associate it with the right meaning.

FQA: If people are accustomed to expressing an idea in a certain way, and it works for them, trying to convince them to use a new way serves no useful purpose. We could use the opportunity to ask nitpicking questions about how this wisdom is applied to C++ itself. For example, why would someone deprecate static variables at the translation unit scope and demand that people use anonymous namespaces to get identical behavior? And all that.
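
For reference, the two ways look like this:

    // both give the name internal linkage - invisible outside
    // this translation unit
    static int s_counter = 0;          // deprecated by the standard...
    namespace { int a_counter = 0; }   // ...in favor of this, which
                                       // behaves identically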

Instead, we'll use the opportunity to point out that at the time of writing (2007), "FQA" appears to be a less popular acronym than "FAQ": a Google search yields a few results, but a Wikipedia search does not. Still, changing "FQA" to something else in this document is not an option: it's all over the place.

Classes and objects

One of the stated goals of C++ is support for object-oriented programming. This page introduces C++ classes and outlines the tactics they use to defeat their purpose.

[7.1] What is a class?

FAQ: In OO software, "the fundamental building block".

A class is a type - a representation for a set of states (much like a C struct ) and a set of operations for changing the state (moving from one state to another). Classes are similar to built-in types in this sense (for example, an int holds a bunch of bits and provides operations like + and *).

FQA: That's a correct theoretical definition. It's equally applicable to all OO languages, but they are different when it comes to more specific, practical aspects of their particular implementation of classes.

How do I create objects? And what happens when they are no longer needed? Is it my job to figure out which ones are unused and deallocate them? Bad.

What happens if I have bugs? If I have a pointer to an object, can it be invalid (be a random bit pattern, point to a dead object)? It can? The program will crash or worse? What about arrays of objects and out-of-bounds indexes? Crash or a modification of some other random object? You call that encapsulation? Bad.

What happens if I change/add/remove a private value, without changing the interface? All code using the class has to be recompiled? I bet you call that encapsulation, too. Bad.

I don't like C++ classes.

[7.2] What is an object?

FAQ: A chunk of memory with certain semantics. The semantics are defined by the class of the object.

FQA: They are also defined by the bugs which cause the code to overwrite data of these objects without bothering to use the interface defined by the class. People who think that real programmers write code without bugs need to upgrade to a human brain.

Still, it sounds interesting: the memory of our C++ program is apparently broken into chunks storing objects of various classes, with "defined semantics". Looks very promising, that. For example, we could ask a debugger about the kind of object located at such a chunk and inspect its data (as in "this is a Point with x=5 and y=6"). We could even take this one step further and implement things like garbage collectors, which can check whether an object is used by looking for pointers to it in the places which are supposed to store pointers.

Unfortunately, you can't tell the class of a C++ object given a pointer to it at run time. So if you debug a crashed C++ program and find a pointer somewhere in its guts, and you don't know its type, you'll have to guess that "0000000600000005" is a Point. Which is completely obvious, because that's what a pair of adjacent integers looks like in hexadecimal memory listings of a little endian 32 bit machine. And two adjacent integers might be a Point. Or some other two-integer structure. Or a part of a three-integer-and-a-float structure. Or they might be two unrelated numbers which just happen to be adjacent.

Which is why you can't automatically collect the garbage of C++ programs.

[7.3] When is an interface "good"?

FAQ: It is good when it hides details, so that the users see a simpler picture. It should also speak the language of the user (a developer, not the customer).

FQA: Um, sometimes you want the interface to expose many details and speak the language of the machine, although that's probably not very common. The generic answer is something like "an interface is good if it gets the user somewhere".

For example, using OpenGL you can render nifty 3D stuff at real time frame rates. FFTW delivers, well, the Fastest Fourier Transform in the West. With Qt, you can develop cross-platform GUI, and "cross-platform" won't mean "looking like an abandoned student project". Writing that stuff from scratch is lots of work; using the libraries can save lots of work. Apparently learning the interfaces of these libraries is going to pay off for many people.

For a negative example, consider <algorithm> . Does std::for_each get us anywhere compared to a bare for loop, except that now we need to define a functor class? That's a bad interface, because learning it doesn't make it easier to achieve anything useful.
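
Here's a sketch of the comparison (PrintInt and print_all are made-up names):

    #include <algorithm>
    #include <vector>
    #include <cstdio>

    struct PrintInt {   // the functor class std::for_each demands
        void operator()(int x) const { std::printf("%d\n", x); }
    };

    void print_all(const std::vector<int>& v)
    {
        std::for_each(v.begin(), v.end(), PrintInt());  // the <algorithm> way

        for (std::vector<int>::size_type i = 0; i < v.size(); ++i)
            std::printf("%d\n", v[i]);                  // the bare loop
    }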

[7.4] What is encapsulation?

FAQ: The prevention of "unauthorized access" to stuff.

The idea is to separate the implementation (which may be changed) from the interface (which is supposed to be stable). Encapsulation will force users to rely on the interface rather than the implementation. That will make changing the implementation cheaper, since the code of the users won't need to be changed.

FQA: That's a nice theoretical definition. Let's talk about practice - the properties of the C++ keywords private and protected , which actually implement encapsulation.

These keywords will cause the compiler to produce an error message upon access to a non-public member outside of the class. However, they will not cause the compiler to prevent "unauthorized access" by buggy code, for example upon buffer overflow. If you debug a crashed or misbehaving C++ program, forget about encapsulation. There's just one object now: the memory.

As to the cost of changes to the private parts - they trigger recompilation of all code that #include s your class definition. That's typically an order of magnitude more than "code actually using your class", because everything ends up including everything. "The key money-saving insight", as the business-friendly-looking FAQ puts it, is that every time you change a class definition, you are recompiling the programs using it. Here's another simple observation: C++ compiles slowly. And what do we get now when we put two and two together? That's right, kids - with C++ classes, the developers get paid primarily to wait for recompilation.

If you want software that is "easy to change", stay away from C++ classes.

[7.5] How does C++ help with the tradeoff of safety vs. usability?

FAQ: In C, stuff is either stored in struct s (safety problem - no encapsulation), or it is declared static at the file implementing an interface (usability problem - there is no way to have many instances of that data).

With C++ classes, you can have many instances of the data (many objects) and encapsulation (non- public members).

FQA: This is wildly wrong, and the chances that the FAQ author didn't know it are extremely low. That's because you can't use FILE* from <stdio.h> or HWND from <windows.h> or in fact any widely used and/or decent C library without noticing that the FAQ's claim is wrong.

When you need multiple instances and encapsulation in C, you use a forward declaration of a struct in the header file, and define it in the implementation file. That's actually better encapsulation than C++ classes - there's still no run-time encapsulation (memory can be accidentally/maliciously overwritten), but at least there's compile-time encapsulation (you don't have to recompile the code using the interface when you change the implementation).
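
Here's a minimal sketch of the technique (Point and the point_* functions are made-up names):

    /* point.h - the interface; the struct is only forward-declared */
    typedef struct Point Point;
    Point* point_create(int x, int y);
    int point_get_x(const Point* p);
    void point_destroy(Point* p);

    /* point.c - the implementation; the layout is invisible to callers,
       so changing it never forces them to recompile */
    #include "point.h"
    #include <stdlib.h>
    struct Point { int x, y; };
    Point* point_create(int x, int y)
    {
        Point* p = (Point*)malloc(sizeof(Point));
        p->x = x; p->y = y;
        return p;
    }
    int point_get_x(const Point* p) { return p->x; }
    void point_destroy(Point* p) { free(p); }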

The fact that a crude C technique for approximating classes is better than the support for classes built into the C++ language is really shameful. Apparently so shameful that the FAQ had to distort the facts in an attempt to save face (or else the readers would wonder whether there's any point to C++ classes at all). The FQA hereby declares that it will not go down this path. Therefore, we have to mention this: the forward declaration basically makes it impossible for the calling code to reserve space for the object at compile time. This means that a struct declared in a header file or a C++ class can sometimes be allocated more efficiently than a forward-declared struct . However, this is really about a different tradeoff - safety vs. efficiency, and there's no escape from this tradeoff. Either the caller knows about the details such as the size of an object at compile time - which breaks compile-time encapsulation - or it doesn't, so it can't handle the allocation.

Anyway, here's the real answer to the original question: C++ helps with the tradeoff of safety vs. usability by eliminating both.

C++ is extremely unsafe because every pointer can be used to modify every piece of memory from any point in code. C++ is extremely unusable due to cryptic syntax, incomprehensible semantics and endless rebuild cycles. Where's your tradeoff now, silly C programmers?

[7.6] How can I prevent other programmers from violating encapsulation by seeing the private parts of my class?

FAQ: Don't bother. The fact that a programmer knows about the inner workings of your class isn't a problem. It's a problem if code is written to depend on these inner workings.

FQA: That's right. Besides, people can always access the code if a machine can. Preventing people from "seeing" the code, in the sense that they can access it but can't understand it, is obfuscation, not encapsulation.

[7.7] Is Encapsulation a Security device?

FAQ: No. Encapsulation is about error prevention. Security is about preventing purposeful attacks.

FQA: Depends on the kind of "encapsulation". Some managed environments rely on their support for run time encapsulation, which makes it technically impossible for code to access private parts of objects, to implement security mechanisms. C++ encapsulation evaporates at run time, and is almost non-existent even at compile time - use #define private public before including a header file and there's no more encapsulation. It's hardly "encapsulation" at all, so of course it has no security applications - security is harder than encapsulation.

The capital E and S in the question are very amusing. I wonder whether they are a manifestation of Deep Respect for Business Values or Software Engineering; both options are equally hilarious.

[7.8] What's the difference between the keywords struct and class ?

FAQ: By default, struct members and base classes are public . With class , the default is private . Never rely on these defaults! Otherwise, class and struct behave identically.

But the important thing is how developers feel about these keywords. struct conveys the feeling that its members are supposed to be read and modified by the code using it, and class feels like one should use the class methods and not mess with the state directly. This difference is the important one when you decide which keyword to use.

FQA: struct is a C keyword. class was added to C++ because it is easier than actually making the language object-oriented. And it does a good job when it comes to the feeling of a newbie who heard that "OO is good".

Check out the emotional discussion about which keyword should be used in the FAQ. The more similar two duplicate C++ features are, the more heated the argument about the best option to use in each case becomes. Pointers/references, arrays/vectors... Yawn.

By the way, the forward-declaration-of-struct thing works in C++, and it's better than a class without virtual functions most of the time.
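
For reference, the entire difference fits in two lines:

    struct S { int x; };   // x is public: struct members default to public
    class C { int x; };    // x is private: class members default to private
    // apart from the defaults (also for base classes), the keywords are
    // interchangeable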

Inline functions

Inline functions are a pet feature of people who think they care about performance, but don't bother to measure it.

[9.1] What's the deal with inline functions?

FAQ: Inlining a function call means that the compiler inserts the code of the function into the calling code (which is technically different, but logically similar to the expansion of #define macros). This may improve performance, because the compiler optimizes the callee code in the context of the calling code instead of implementing a function call. However, the performance impact depends on lots of things.

There's more than one way to say that a function should be inline, some of which use the inline keyword and some don't. No matter what way you use, the compiler might actually inline the function and it might not - you're just giving it a "hint". Sounds vague? It is - and it is good: it lets the compiler generate better and/or more debuggable code.

FQA: To summarize: the compiler has the right to inline or not inline any function, whether it's declared inline in any of the several ways or not. Doesn't this make "inline functions" a meaningless term?

It's impossible to make any sense of this without discussing the history of actual implementations of the C language tool chain - compilers, assemblers and linkers. A straight-forward C implementation (which originally all of them were) works like this. First, a compiler generates assembly code from each source file, separately (without looking at other source files). Then the assembler converts the assembly code to an "object file", where "object" means "a sequence of bytes" (talk about "object oriented"). For example, a function is one kind of "object" - the bytes encode the machine instructions the compiler used to implement it.

The values of the bytes making up these "objects" are almost completely finalized at this stage. The only kind of "unknowns" is addresses of "objects" - when an "object" refers to an address of another "object" (say, a function calls another function), the assembler can't compute the actual values of the bytes making up the function call instructions. This is done by the linker, which allocates the "objects" (basically by concatenating them). The linker then resolves the references (such as function calls) to the addresses of the "objects".

What this means is that the only way to inline functions is to #include their definition in the header file - otherwise, the compiler doesn't see the code of the function, and the linker can't do inlining, because all it sees is byte sequences, and it would have to decompile them first. Which explains why you need to include the source code of inline functions in header files, but doesn't explain why you need an inline keyword and other ways to explicitly declare a function as "inline". After all, the compiler is free to ignore these hints, so what's their point?

Well, the point is that the compiler can't tell an #include d function from one written in your source file, because that's how C preprocessing works - the compiler only sees one large file of code. So unless you explicitly declare the #include d functions "inline", it will generate their code like it does with normal functions. Then the linker will complain about multiple definitions.
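
A minimal sketch of the resulting rules (add and sub are made-up names):

    /* util.h - included from several .cpp files */
    inline int add(int a, int b) { return a + b; }
    /* OK: 'inline' tells the toolchain to tolerate one definition
       per translation unit */

    /* int sub(int a, int b) { return a - b; } */
    /* without 'inline', every .cpp including this header would emit
       its own copy of sub, and the linker would complain about
       multiple definitions */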

Here's how code is compiled by many modern compilers, including some C and C++ compilers. The compiler transforms the source code to an intermediate representation, "lower" than the source language but "higher" than assembly language. This makes it possible to do inlining at link time, either on a per-library or a whole-program basis. The machine code is only generated at the final linkage step, or it can even be delayed until run time (the so-called "just in time compilation"). This way, you don't have to split your functions into "inline" (those that can be inlined, but the compiler gets to decide if they actually are) and the rest (those that simply can't be inlined). Instead, you let the compiler make the decision for all functions. Unfortunately, the meaning of C and C++ is defined with the old sort of implementation in mind, and having newer, more sophisticated implementations around can't change it.

[9.2] What's a simple example of procedural integration?

FAQ: There's an example of a function calling another function and how inlining may save you copying the parameters when you pass them to a function and copying the result it returns and stuff. There's also a disclaimer saying that it's just an example and many different things can happen.

FQA: Basically, code may be portable, but performance is typically not. For example, inlining a function may make code faster or slower, depending on lots of things discussed in the next FAQs. This is one reason to leave the decision to a compiler, because "it understands the target platform better". This is also a reason to leave the decision to a human, because compilers don't really know the target platform very well (they know the target processor but not the entire system), and because they don't understand the problem you are solving at all (so they can't tell how many times each piece of code is likely to run, etc.).

Anyway, the problem of "helping the compiler to optimize code" by adding "hints" to the code, especially portable code, is quite hard. There are many things similar to inlining in this respect (for example, loop unrolling). Which is why there's no unroll keyword forcing loop unrolling. And the inline keyword only exists because there's no better way to enable (as opposed to "force") inlining in C/C++.

[9.3] Do inline functions improve performance?

FAQ: Sometimes they do, sometimes they don't. There's no simple answer.

Inlining can make code faster by eliminating function call overhead, or slower by generating too much code, causing instruction cache misses. It may make it larger by replicating all that callee code, or smaller by saving the instructions used to implement function calls. It may inflate the code of an innermost loop, causing repeated cache misses, or it may improve the locality of reference in the loop, by compiling all relevant code at adjacent addresses. It may also be irrelevant for performance, because your system is not CPU-bound.

See? Told you there was no simple answer.

FQA: However, there is a relatively simple answer to the legitimate question: "Why do we need this language feature if its effect is undefined?". See the first FAQ in the section. There's also a relatively simple & useful rule: functions with short code, and functions typically called with compile time constant arguments (so that most of their code reduces to a constant), are good candidates for inlining. Long functions are worse candidates, because the function call overhead is negligible compared to the work the functions actually do, and the main problem with inlining - increased code size - becomes dominant.
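
For illustration, a sketch of the kind of function this rule favors (square is a made-up name):

    // short, and often called with compile time constant arguments
    inline int square(int x) { return x * x; }

    // square(8) will typically be folded into the constant 64;
    // inlining a 200-line function would mostly just bloat the code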

It's usually a good idea to only explicitly enable the inlining of very short functions, and performance considerations are not the only reason. In C++ you have to place the code in header files to enable inlining. While the run time performance may or may not improve, the compile time performance is guaranteed to drop. Which means changing code becomes hard, which means you'll do your best to not change it, which means you'll leave wrong things unfixed. And debuggers handle inlined code pretty poorly (typically you can't inspect the local variables of inlined functions or even step through their source code lines), so debugging the optimized (production) build becomes harder. And debugging a special "debug" build is not always possible (some bugs won't reproduce in that build), not to mention that you have to spend time building those binaries, too.

If your application is not CPU bound, you aren't getting any benefits from using an unsafe language like C++ except for extra quality time with the debugger.

[9.4] How can inline functions help with the tradeoff of safety vs. speed?

FAQ: "In straight C" you could implement encapsulation using a void* , so that users can't access the underlying data. Instead, the users have to call functions which access that data by casting the void* to the right type first.

Not only is this type-unsafe - it's also costly, since the simplest access now involves a function call. In C++ you can use inline accessors to private data - safe, fast.

FQA: Today C has inline functions (the FAQ probably doesn't consider the current C standard "straight", I wonder why). AFAIK they were back-ported from C++ together with const and other useless things. But it's irrelevant to the question, which is about the completely wrong argument that inline accessors to private data are a form of "high-speed encapsulation".

First, the tales about void* are wrong - you can use forward declarations to achieve the holy grail of (compile time) type safety. Second, a good language implementation can inline the small C-style accessors at link time. Third, private provides little encapsulation - change a private member and you have to recompile all code using the class. Fourth, most of the time private members with straight-forward public accessors are just a verbose way to implement a public member, since changing the representation is almost impossible and/or hardly useful.

And in the quite rare cases where it is possible and useful, "properties" - a language facility allowing you to overload the obj.member syntax - could solve the problem, but C++ doesn't have properties. Or you could refactor the code automatically - if anything could reliably parse it.

In the C++ world code is considered a good thing, of which there should be plenty. private: int _n; public: int n() const { return _n; } is thus better than int n; . The question is - do you like lots and lots of C++ code doing practically nothing?

[9.5] Why should I use inline functions instead of plain old #define macros?

FAQ: Because macros are evil. In particular, when a macro is expanded, the parameters are not evaluated before the expansion, but copied "as is" (they are interpreted as character sequences by the preprocessor, not as C++ expressions). Therefore, if a parameter is an expression with a side effect, such as i++ , and the macro mentions it several times, the macro will expand to buggy code (for instance, i will get incremented many times). Or the generated code may be slow (when you use expressions like functionTakingAgesToCompute() as macro arguments).

Besides, inline functions check the argument types.

FQA: Yeah, C macros are no picnic. But see that thing about argument types? How can you write an inline function computing the maximal value of two arguments? A template inline function, you say? Try this: std::max(n,5) with short n .
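
Here's a sketch of both problems side by side (MAX and the variable names are made up):

    #include <algorithm>

    #define MAX(a, b) ((a) > (b) ? (a) : (b))

    void demo()
    {
        int i = 0, j = 0;
        int m = MAX(i++, j);  // i++ appears twice in the expansion,
                              // so i may be incremented twice

        short n = 3;
        // int k = std::max(n, 5);   // does not compile: T is deduced
                                     // as both short and int
        int k = std::max<int>(n, 5); // you get to spell the type yourself
        (void)m; (void)k;
    }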

And how should function arguments be passed - by value or by reference? "By value" may cause extra copying, and "by reference" may slow down the code due to aliasing problems, forcing the compiler to actually spill values to memory in order to pass them to the code of an inlined function! How's that for "performance benefits"? Another problem C macros don't have.

Frequently inline functions are better than macros, though, because the problems with macros turn out to be more severe in many cases. As usual, it's you who gets the interesting job of choosing between duplicate C++ facilities, each flawed in its own unique way.

[9.6] How do you tell the compiler to make a non-member function inline ?

FAQ: Prepend the inline keyword to its prototype, and place the code in a header file, unless it's only used in a single .cpp file, or else you'll get errors about "unresolved externals" from the linker.

FQA: The FAQ's decision to avoid the discussion of the reasons leading to these requirements is wrong. Clearly people who don't understand the underlying implementation issues won't survive to live the miserable life of competent C++ developers. That's because in C++, the underlying stuff tends to climb out of the basement in repeated attempts to make you the one lying under a pile of hard, urgent, mind-numbing low-level problems.

Therefore, the only legitimate excuse for telling about a totally weird language requirement and not explaining why it exists is brevity. See the FAQ's lengthy discussion about the performance of inline functions for a pretty good evidence that brevity is hardly the motivation here.

[9.7] How do you tell the compiler to make a member function inline ?

FAQ: Declare the function in the class as usual. In the definition, add the inline keyword to the prototype. The definition must be in a header file.

FQA: Yep, it's similar to non-member functions.

[9.8] Is there another way to tell the compiler to make a member function inline ?

FAQ: Yes, by writing its code right in the body of the class, instead of only writing the declaration there, and defining it outside of the class. This way, you don't even have to use the inline keyword.

It's easier when you write classes, but harder when you read them, because the interface is mixed with the implementation. Remember the "reuse-oriented world"? Think about the welfare of the many, many users of your class!

FQA: What popular language forces you to write a bare interface ("header file") and separately an implementation containing all the information in the interface ("source files")? Somehow all those languages which only make you type the interface once are not at all hard to use. Might that be because these languages are parsable, so IDEs can do the oh-so-interesting job of extracting the interface from the implementation, and then you can use things like class view windows to inspect interfaces?

Unless you use templates and operator overloading and all that, many IDEs even have a chance of working with your C++ code. So writing inline functions inside class definitions won't really hurt your users after all. Even if there were no IDEs, you should probably only inline very short functions, with implementations as descriptive as a comment (is return _x; any less documentation than "// returns the value of the x coordinate"?). If there's lots of "implementation details" to hide from the eye of a casual observer, inlining is most likely a bad idea anyway.

[9.9] With inline member functions that are defined outside the class, is it best to put the inline keyword next to the declaration within the class body, next to the definition outside the class body, or both?

FAQ: The "best practice" is to only use it in the definition outside of the class. Blah, blah, blah, argues the FAQ passionately about the issue. "Observable semantics", "practical standpoint", blah, blah, blah.

FQA: Programmers typically have a good ability to keep many details in their heads. So good that many don't realize that this ability is finite. If you litter your brain with idiotic "best practices" which don't even affect the observable semantics of code, you do it at the expense of not thinking about something else when you write code. "Something else" may include really important things, like the purpose and the meaning of the code.

If you don't care about this sort of discussions, and you find yourself under an attack of "software professionals" buzzing buzzwords about your "bad practices", send them a link to this page to distract them, and use the time gained by the distraction to go out and buy a buzzer to talk back to them.

References

This page is about C++ references - a duplicate language feature introduced in order to support other duplicate features.

[8.1] What is a reference?

FAQ: It's an alias for an object - another name by which it can be called. The implementation is frequently identical to that of pointers. But don't think of references as pointers - a reference is the object.

FQA: A C++ reference is like a C++ pointer except for the following differences:

You use it as if it were a value: ref.member , not ptr->member , etc. (in this sense ref behaves like (*ptr) ).

It must be initialized to point to an object - otherwise, the code won't compile.

After the initialization, you can't make it point to another object.

You can't take the address of a reference like you can with pointers (forming a pointer to a pointer).

There's no "reference arithmetics" (but you can take the address of an object pointed by a reference and do pointer arithmetics on it as in &obj + 5 ).

Strange phrases like "a reference IS the object" are used quite frequently in the C++ community. Such claims are only useful to hide the fact that C++ pointers & references are so similar that having both in the language is an unnecessary complication. In other contexts, the claims are simply false. For example, a wide class of bugs comes from accessing dangling references - references to objects which were already destroyed. If a reference is the object, or just another name for it, how can that happen? Names of destroyed objects are inaccessible - it takes a previously assigned pointer to access a destroyed object (C++ also breaks that rule - you can access a destroyed global object by its name from a destructor of another global object, but that's a different can of worms).

[8.2] What happens if you assign to a reference?

FAQ: A reference is the object, so of course you assign to the referent object.

FQA: Which means that you can't understand the effect of a statement as simple as a=b; without kn