In the last few weeks I have written about the contents of my “Large C++ Legacy Applications” talk: how dealing with those applications is a team game, the importance of planning the refactoring, and tests and modularization. This post concludes the series with a look at the tools at our disposal.

Tooling

There are tools that we can use to refactor and clean up the mess left behind for us. The most obvious is the tooling built into our IDEs: many modern IDEs provide assistance beyond mere syntax highlighting. They show us warnings while we write the code, i.e. they provide some static analysis. This can help us find dubious spots in our code, which in turn can prevent errors and improve readability.

There are only a few IDEs that I know of that provide tools for even simple refactoring steps, like extracting and inlining functions and variables. That kind of functionality is common in IDEs for other languages, like Eclipse, IntelliJ, and Visual Studio for C#. The more complex syntax of C++, however, seems to make it harder to provide the same functionality in C++ IDEs.

One of the better-known examples of IDEs with emerging refactoring support is CLion, which I also use in the “4C environment” for Fix. The refactoring support definitely has its limits, but as far as I can see, development is headed in the right direction.

IDE plugins

Some IDEs provide plugin functionality that allows third-party vendors to add refactoring aids. The most prominent examples are probably Visual Assist X and ReSharper for C++. I have not used either myself, but as far as I know, those tools are at least on par with CLion when it comes to refactoring support.

Static analyzers

While compilers and IDEs already emit a lot of warnings about code that does not look quite right, there is no substitute for a proper static analyzer. There are lots of subtle things that can go wrong in large code bases, and static analyzers are designed to find all kinds of small omissions and subtle bugs, so you should use one or two of them.

Consider using a newer IDE and compiler

Modern IDE tooling is getting better and better, but much of it is only available in newer IDEs. Plugins may not work in older IDEs, and modern static analyzers might warn about code that cannot be fixed if you have to cater to the needs of some ancient compiler.

In addition to the better tool support, newer compilers also support the new C++ standards. This can enable us to write code that is less tedious, safer, and more performant.

But of course, it’s not that simple.

Switching the compiler

Switching to another compiler can be a large task in its own right. That is especially true if we skip multiple versions, move from 32-bit to 64-bit compilation, and/or switch to a different compiler vendor.

One of the many small issues we can encounter is the size of pointers and integral types. There is code written a decade or two ago that simply assumes the size of a pointer is, and always will be, 32 bits or 4 bytes. Other code compiles without warnings only if long and int have the same size.

For example, grepping a million-line code base for the number 4 is not the best way to spend several days. Neither is hunting down that subtle bug where the chunk of memory you allocated for two pointers is suddenly only enough for a single pointer.

Or try to see the problem in this code:

std::pair<std::string, std::string> splitOnFirstComma(std::string const& input) {
    unsigned position = input.find(',');
    if (position == std::string::npos) {
        return std::make_pair(input, "");
    }
    std::string first = input.substr(0, position);
    std::string second = input.substr(position + 1, std::string::npos);
    return std::make_pair(first, second);
}

unsigned is short for unsigned int, which usually has 32 bits. On a 64-bit platform, std::string::npos is a 64-bit value, so assigning the result of find to position truncates it, and the subsequent comparison with npos always fails. This introduces one of those nasty subtle bugs we all love so dearly.

All these small details have to be found, taken into account, and fixed when switching the compiler. This is usually a series of small, isolated refactorings. The exception is when you are using a proprietary framework that comes with your old compiler and IDE and is not available for the newer compiler you want to switch to; in that case, switching the compiler can become a large project in its own right.

Continuous integration

Running all the tests that are not yet real unit tests, plus all the static analysis tools, can take some time. I have worked on projects where compilation from scratch took half an hour, the “unit” tests another hour, and static analysis was in the same order of magnitude.

This is something we cannot afford to run several times a day on our local machines. Therefore, we usually run a reduced test suite and only incremental builds locally. It is, however, crucial to run the full build from scratch, all tests, and the static analysis as often as possible, especially when we are refactoring. A continuous integration (CI) server can come in very handy for that.

I myself have mostly used Jenkins in corporate environments. For many C++ projects hosted on GitHub, Travis CI is a natural choice. But there is also a host of other options; see for example this post at code-maze.com.

Refactoring without tool support

What if we are stuck with our ancient compiler and don’t have support from fancy tools? Well, we still have one tool at our disposal: the compiler itself. Taking very small steps in the right order allows us to leverage the syntax checks the compiler has to perform anyway.

For example, if we want to find all uses of a function, we can simply rename its declaration and definition and compile. The compiler will complain about an unknown function name at each use of that function. Of course, this assumes that there is no other declaration with the same name.

With C++11, we can mark a virtual function in the base class as final to find all classes that override it: the compiler has to complain about each and every one of them.

Example: factor out a function

Let me finish this post with a step-by-step example of getting help from the compiler while factoring out a function. Consider this original code:

std::shared_ptr<Node> createTree(TreeData const& data) {
    auto rootData = data.root();
    auto newNode = std::make_shared<Node>();
    newNode->configure(rootData);
    for (auto&& subTreeData : data.children()) {
        newNode->add(createTree(subTreeData));
    }
    return newNode;
}

We want to factor out lines 2-4 into their own function, createNode. I’ll assume a C++11-conformant compiler, but similar things can be done with older compilers, too.

The first step is to add an extra scope around the lines in question to see which entities are created inside the future function but used outside of it. These will be the return values:

std::shared_ptr<Node> createTree(TreeData const& data) {
    {
        auto rootData = data.root();
        auto newNode = std::make_shared<Node>();
        newNode->configure(rootData);
    }
    for (auto&& subTreeData : data.children()) {
        newNode->add(createTree(subTreeData)); //ERROR: newNode was not declared...
    }
    return newNode;
}

So, our function needs to return newNode . The next step is to make our code compile again by putting the new scope into a lambda. We can already give the lambda the name of the new function:

std::shared_ptr<Node> createTree(TreeData const& data) {
    auto createNode = [&]{
        auto rootData = data.root();
        auto newNode = std::make_shared<Node>();
        newNode->configure(rootData);
        return newNode;
    };
    auto newNode = createNode();
    for (auto&& subTreeData : data.children()) {
        newNode->add(createTree(subTreeData));
    }
    return newNode;
}

The capture by reference makes all variables defined before the lambda accessible inside it. The next step is to find out which of those the lambda actually uses, by simply removing the capture:

std::shared_ptr<Node> createTree(TreeData const& data) {
    auto createNode = []{
        auto rootData = data.root(); //ERROR: data is not captured
        auto newNode = std::make_shared<Node>();
        newNode->configure(rootData);
        return newNode;
    };
    auto newNode = createNode();
    for (auto&& subTreeData : data.children()) {
        newNode->add(createTree(subTreeData));
    }
    return newNode;
}

So, we have to get data into our function. This can be done by making it a parameter and passing it explicitly at the call site:

std::shared_ptr<Node> createTree(TreeData const& data) {
    auto createNode = [](TreeData const& data){
        auto rootData = data.root();
        auto newNode = std::make_shared<Node>();
        newNode->configure(rootData);
        return newNode;
    };
    auto newNode = createNode(data);
    for (auto&& subTreeData : data.children()) {
        newNode->add(createTree(subTreeData));
    }
    return newNode;
}

Now the lambda has no dependencies on its outer scope and vice versa, which means we can extract it as a real function:

auto createNode(TreeData const& data) {
    auto rootData = data.root();
    auto newNode = std::make_shared<Node>();
    newNode->configure(rootData);
    return newNode;
}

std::shared_ptr<Node> createTree(TreeData const& data) {
    auto newNode = createNode(data);
    for (auto&& subTreeData : data.children()) {
        newNode->add(createTree(subTreeData));
    }
    return newNode;
}

Depending on our needs, we can now add some further polishing, e.g. specifying the return type of createNode explicitly (note that the deduced auto return type above actually requires C++14) and passing rootData as its parameter instead of data. However, the main task of extracting the function is done, simply by relying on the compiler to tell us what to do by triggering compile errors the right way.

Conclusion

Tools that help us refactor and analyze our legacy code base are important for the necessary refactoring work. It is, however, possible, albeit tedious, to refactor our code even without such tools. So there is no real excuse to let our legacy code rot for another decade.