Java Style Classes in C++

You can argue night and day about whether C++ is a good language, but for some reason, I have grown to really like it. This may sound especially strange from someone who has looked at several hundred language designs and designed quite a few himself, but the C++ language design makes a lot of sense (most C++ haters simply don’t understand the language deeply enough to see the elegance in it), and maybe more importantly, implementation quality for it is sadly still generations ahead of other languages.

That said, there is one problem with C++ that is by far the most annoying of all the shortcomings one could identify in it: the .cpp/.h separation for classes, in combination with “declare before use”.

Header files as a simplistic way to aid modularity come from C, and there they work reasonably well, as their contents tend to be short and simple, and they neatly separate interface from implementation. In C++ however, the model is driven to the breaking point:

Classes don’t allow a neat interface separation, as implementation elements (private class state and methods) are required to be present in the header file, making your view of an interface more complicated/cluttered than in C.

Having to declare functions/classes before their use (partially or not) is a remnant from the days when the compiler had to run on a PDP-11 with very little memory, and is a burden which almost no other (modern) programming language has. It was already a pain in C; in C++ it makes ordering classes, and declaring/including them in the correct order in larger programs, a rather complicated puzzle, and it drives more implementation details into the header files.

If you’re fanatic about perfect factoring of code like I am, this system is problematic in 3 ways:

The separation causes excessive duplication of declarations. Duplication is the #1 sin in programming.

Ideally, tightly coupled elements in code should be close together in source code, and loosely coupled elements should be separated. The C++ model does the opposite (!).

Perfect factoring requires constant refactoring. Refactoring is hard if single elements are spread over multiple files, and every change has to be replicated (e.g. method declaration vs. definition).
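To make the duplication concrete, here is a small sketch. The `Counter` class is made up for illustration; the first two parts stand in for what would normally be a header and a .cpp file, and the last part is the same class with everything stated once.

```cpp
// What would traditionally live in counter.h (hypothetical example class).
class Counter {
public:
    explicit Counter(int start);
    void add(int amount);
    int value() const;
private:
    int total_;   // private state still has to appear in the "interface"
};

// What would traditionally live in counter.cpp.
// Every signature above is written out a second time here.
Counter::Counter(int start) : total_(start) {}
void Counter::add(int amount) { total_ += amount; }
int Counter::value() const { return total_; }

// Java style: the same class, with each member written exactly once.
class CounterInline {
public:
    explicit CounterInline(int start) : total_(start) {}
    void add(int amount) { total_ += amount; }
    int value() const { return total_; }
private:
    int total_;
};
```

Renaming `add` in the first version means touching two places (and usually two files); in the second version it is a single edit.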



For me, the above are some strong concerns, and being able to fix them would make C++ significantly better to work with. But we can’t do away with header file separation, and we can’t fix the declaration order. Or can we? As it turns out, some little-known facts about C++ make this possible!

Java, for all its shortcomings, has a wonderful way of dealing with modularity: you can write classes in a single file, with all method implementations straight inside the class body, with no duplication required. The closest we can get in C++ to make this happen is to write our program entirely in header files (one or more self-contained classes each), and include them all in a single .cpp. This does not work as smoothly as it does in Java however, and most C++ programmers will dismiss this idea immediately as infeasible. But let’s look at all the complications of using C++ this way in turn, and see if they really hold:

Separating Interface from Implementation

Depending on the size of a program, you can still partition your program into multiple black-box style modules, where each module is written in the above style, with a single .cpp and many headers. The majority of headers can be the implementation of the module, and one of them can be a classical pure abstract interface that is available to the other modules. The concept of a class and an interface should be separate anyway: a module (or component) should have the interface, and the vast majority of classes are implementation chunks of a component, and thus do not require an interface to the outside world.
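A rough sketch of such a module, collapsed into one listing. All names here (`Renderer`, `make_renderer`, `renderer_module`, `Mesh`, `Impl`) are hypothetical; in a real program the first part would be the one public header and the rest would be the module's private headers included from its single .cpp.

```cpp
#include <memory>
#include <string>

// The only part other modules get to see: a pure abstract interface
// plus a factory function (both hypothetical).
struct Renderer {
    virtual ~Renderer() = default;
    virtual std::string draw(const std::string& what) = 0;
};
std::unique_ptr<Renderer> make_renderer();

// Everything below is implementation, private to the module's .cpp.
struct renderer_module {
    struct Impl : Renderer {
        std::string draw(const std::string& what) override {
            Mesh m{what};                 // Mesh is declared further down
            return "drew " + m.describe();
        }
    };
    struct Mesh {
        std::string name;
        std::string describe() const { return "mesh:" + name; }
    };
};

std::unique_ptr<Renderer> make_renderer() {
    return std::make_unique<renderer_module::Impl>();
}
```

A client module would only call `make_renderer()` and use the returned `Renderer`; none of the implementation classes need an interface of their own.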

Compile time

This is not an issue either. Assuming you have all operating system headers and frequently used containers (STL) or math libraries in precompiled headers, and assuming that if your program is really huge, you have separated it into components as per above, the remaining amount of code is absolutely negligible for a modern C++ compiler. In fact, modern C++ compilers often do more work at link time than at compile time (some even do code generation there), so building everything as a single .cpp can easily be faster.

Code bloat

Since modern C++ programs already have a lot of implementation code inside headers, and since code bloat is a big concern for cache performance, compilers already do a very good job at filtering out duplicate method bodies, no matter where you place them or how you name them. And since you only need to include every header just once, it may even produce less code on older compilers.

So it seems like it could work... but we left the most difficult problem for last:

Declare before Use

Java doesn’t have this problem. If your program is just a list of linearly included header files, circular dependencies between classes have just grown worse compared to the old situation, because now dependencies caused by implementation code can also cause unresolvable loops (which class to declare first?). So it seems like our excitement about being able to do Java style programming is dead in the water... or is it?

There is a C++ feature very few programmers seem to know about (or at least, seem to have exploited): inside a class, declare before use doesn’t hold (!). More precisely, the bodies of member functions are compiled as if the class were already complete, so they can use anything declared anywhere in the class, including further down. Clearly Bjarne was aware of this legacy problem in C++ and fixed it for the inside of classes, but couldn’t do the same for top-level declarations because of backwards compatibility with C (why? there must be some intricacy I am missing here).
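As a minimal illustration of that rule within one ordinary class (the `Account` class and its members are made up for this sketch), a method can use a data member and another method that only appear later in the class:

```cpp
struct Account {
    // deposit() uses balance_, log_count_ and note(), all declared further down.
    int deposit(int amount) {
        balance_ += amount;
        note();
        return balance_;
    }
    int notes() const { return log_count_; }

private:
    void note() { ++log_count_; }
    int balance_ = 0;
    int log_count_ = 0;
};
```

The same ordering at top level (a free function calling another free function defined below it, with no prior declaration) would be rejected by the compiler.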

So we can effectively solve our problems by wrapping the entire program (all our classes/headers) in a single “dummy” class whose sole purpose is to give us more modern C++ features. We can do this conveniently by putting this class inside our single .cpp, and the #include statement inside that class. This may feel like a dirty hack from a classical C++ perspective, but remember that headers are nothing but textual inclusion, and this system has some significant software engineering benefits. Don’t let traditions hold you back!

There are some weird C++ quirks that one needs to take into account when programming this way, but they are all very minor. Let’s look at an example of usage (imagine the classes could sit in their own files):

    struct entire_program
    {
        struct B;

        struct A
        {
            B *bbb;
            void Aa() { B bb; bb.Bb(); }
        };

        struct B
        {
            A aaa;
            void Bb() { A aa; aa.Aa(); }
        };
    };

Notice something amazing: even though the full declaration of B is not yet given at the point of declaration of A, we can create value (full, non-pointer) objects of type B inside A’s methods, and we can call methods on them that have not been declared yet! This is pretty much just like Java. Notice that even though I can create a B value inside an A method, I cannot create a B instance variable inside A; there I can only create pointers. This makes sense, because otherwise we would be able to create a pair of classes which would be impossible to instantiate (they would have infinite size!). As you can see, I can create an A instance variable inside B just fine, so it is not really a limitation.
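The snippet above compiles, but Aa and Bb would call each other forever if you actually ran them. Here is a variant that terminates, as a sketch only: the `ping`/`pong` methods, the depth parameter and the constructor are my additions for illustration.

```cpp
struct entire_program {
    struct B;                        // forward declaration, needed for the pointer member

    struct A {
        B* bbb;                      // only a pointer to B is allowed as a member here
        A() : bbb(nullptr) {}
        // B is only forward declared above, yet inside a method body
        // we can create one by value and call a method on it.
        int ping(int n) {
            if (n == 0) return 0;
            B b;
            return 1 + b.pong(n - 1);
        }
    };

    struct B {
        A aaa;                       // a full A member is fine: A is complete here
        int pong(int n) {
            if (n == 0) return 0;
            A a;
            return 1 + a.ping(n - 1);
        }
    };
};
```

Calling `ping(n)` on an `entire_program::A` bounces between the two classes n times and returns n.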

It is rather illogical that a forward class decl is still required, as clearly the compiler appears to know everything about the class at that point already (unlike at top level, where a forward decl does not allow you anything except pointer decls). But it is not really a big deal either, because the amount of duplication caused by it is nothing compared to traditional C++. Curiously though, there is a way around these forward declarations:

    struct entire_program
    {
        struct A
        {
            int a;
            void Aa() { bb.b = 1; bb.Bb(); bb.aaa.Aa(); }
        };

        struct B
        {
            int b;
            A aaa;
            void Bb() { aa.a = 1; aa.Aa(); }
        };

        static A aa;
        static B bb;
    };

This now allows you to access everything inside B already from A, but without even the forward decl (!). There are weird limitations to this however, that make it useful only for limited program structures. The A and B objects referred to have to be static (class variables), and if you want additional objects of this type you still require the forward decl. It is therefore mostly useful for situations where you have “system/support” elements in your module that are referenced throughout, as there is no other way to have instance variables in the outer class easily available to the entire program.
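One detail the snippet leaves out is that static data members are only declared inside the class and still need a definition outside it (the engine source further down does this for its `_g` object). A small linkable sketch, with the methods changed to return values so the effect is visible; those return values are my own additions:

```cpp
struct entire_program {
    struct A {
        int a;
        // Neither B nor bb has been declared at this point, and there is
        // no forward declaration; it works because bb is a static member.
        int Aa() { bb.b = 1; return bb.Bb() + 1; }
    };

    struct B {
        int b;
        A aaa;
        int Bb() { aa.a = 41; return aa.a; }
    };

    static A aa;
    static B bb;
};

// The out-of-class definitions of the two static members.
entire_program::A entire_program::aa;
entire_program::B entire_program::bb;
```

After `entire_program::aa.Aa()` has run, `bb.b` is 1 and `aa.a` is 41, and the call itself yields 42.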

An additional thought may be that this outer class looks much like a fake namespace, so could namespaces be used to achieve the same effect? Sadly, no. For some reason, a namespace has the same top level behavior as normal, and thus does not support “use before declaration” as supported here.
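For comparison, this is roughly what the first example has to look like inside a namespace (the name `not_a_class` and the returned value are made up for this sketch). The method body that needs a complete B has to move out of the class and below B, which brings back exactly the declaration/definition split we were trying to avoid:

```cpp
namespace not_a_class {
    struct B;                        // forward declaration: pointers and references only

    struct A {
        B* bbb;
        int Aa();                    // body cannot go here: B is still incomplete
    };

    struct B {
        int value = 7;
        int Bb() { return value; }
    };

    // Only now, after B is complete, can the body be written.
    inline int A::Aa() { B bb; return bb.Bb(); }
}
```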

As an example, here’s the entire .cpp source code of a tiny game engine I am working on:

    #include "stdafx.h"
    #include "containers.h"
    #include "linalg.h"

    struct engine
    {
        #include "scriptobj.h"
        #include "scriptcfg.h"
        #include "camera.h"
        #include "cmaloader.h"
        #include "gamestat.h"
        #include "game.h"
        #include "particles.h"
        #include "d3dmeshrt.h"
        #include "d3drenderer.h"
        #include "system.h"

        static System _g;
    };

    engine::System engine::_g;

    int WINAPI WinMain(HINSTANCE hInst, HINSTANCE, char *args, int)
    {
        engine::_g.Main(args);
        return 0;   // WinMain is declared to return int
    }

The headers at the top are the “tools” that are entirely independent of the engine, and can thus be included outside of it. The single global _g object contains all “system” facilities such as logs, timers, and other things that are used from everywhere, and as you can see I am using the static trick above, as this is the only way to make this object available to the entire engine easily.

The header files in the middle can be ordered in any way; the smarter you order them, the fewer forward decls you require. In this case, the engine requires only one or two forward decls, everything else is in-order. None of the include files above contain any #include statements themselves; all including in the entire program is in the source above. The headers just contain one or more class decls each, with all implementation inside the class, nothing else.

Refactoring classes and methods in the above program is incredibly easy and fun, as everything is local in just one place. It is quite amazing to see how code can shrink, and become much more readable as a consequence of removing all superfluous declaration overhead.