A recurring theme in just about every discussion comparing programming languages – apart from using the wrong tool for the job, adamantly pushing a language objectively/demonstrably inferior at x out of blind loyalty, bashing languages you’ve never used or studied simply because you’ve seen firsthand how well received such comments can be, and worse – is acting on stale information that no longer holds true.

At NeoSmart Technologies, we don’t just have one dog in the race; our software is developed in a multitude of languages, ranging from C/C++ to C#/ASP.NET (desktop and web alike), rust, [JS|TypeScript]/HTML/[LESS|CSS], (ba)sh scripting, and more.1 So it’s always interesting to watch these discussions unfold (sometimes up close and personal, sometimes disinterestedly from afar) and observe which arguments remain standing once the dust has settled and the troops have gone home for the day.

As mentioned, one of the most important concepts to keep in mind when discussing programming languages is the tool ↔ job relationship. An astute reader will have observed that while the list of languages we code in may be somewhat diverse, it actually contains little overlap in practice: no one in their right mind would use a shell script to create a complicated GUI for an enterprise product (or would they?) and JavaScript isn’t quite the right tool for the job when you’re writing a kernel driver.2 Low-level, low-dependency applications written in C or C++ probably shouldn’t be dumped in favor of some .NET Core code anytime soon (although work has progressed on native, dependency-free AOT compilation of .NET Core code in recent weeks) – but what about rust? While C# may have never made an official goal out of replacing legacy C and C++ code, rust certainly has.

No amount of arguing can dance around the fact that rust (the language) exposes significantly more information to help rustc (the compiler) protect developers from hanging themselves with their own rope. But there are arguments to be made for recent improvements and advancements in the C++ world: language and library changes have made manual management of pointers a significantly-less-necessary evil, while toolchain improvements have sharpened static analysis and provided immense relief from a certain class of bugs and vulnerabilities.

Many rust champions that haven’t come from the C++ world are unaware of these improvements. In a “C++ vs rust” debate, it’s not too improbable that you’ll see a rustacean retort with something along the lines of “.. and I’ll never have to use pointers ever again!” (or something else equal parts overly-dramatized and untrue), to which a C++ developer that has embraced the outpouring of new features in the language and standard library since C++11 will pipe up with the surprising news that “it’s actually possible to write an entire project in C++11 without having to deal with pointers even once” – which is certainly true enough. That is almost inevitably followed up with “and if you restrict yourself to a subset of C++ (the language), you don’t even have to worry about memory allocation or lifespans, because it’s all done automatically via references and smart pointers” – at which point they’ve unfortunately demonstrated their lack of familiarity with lifetimes and opened themselves up to the same problems pointers would have similarly caused oh so long ago.

While it’s true that shared_ptr and co (perhaps primarily unique_ptr<T>, which brings almost rust-like qualities to the language, unfortunately hidden behind (officially) heap-allocated memory and a pointer indirection) have made C++ a significantly faster, safer, and more productive language to write code in, it’s most unfortunately untrue that C++11 onwards has made it easier to avoid lifetime issues. The reason is a problem seen all too often in large C++11 (or C++14 and C++17) codebases, in both the private/enterprise and the public/OSS sectors, that can perhaps best be summed up by the old adage: what the right hand giveth, the left hand taketh away.

The problem is that while C++11 brought the world the glories of shared_ptr, unique_ptr, and a mountain of other benefits, it also introduced a horribly misunderstood, impossible-to-properly-vet feature in the form of capture-by-reference for variables used in lambdas, causing a raft of lifetime issues to surface. At heart, reference-captured variables used in lambdas are no different from functions returning references to locally-scoped variables: an obvious no-no if there ever was one. But lambdas significantly obfuscate the issue, and make it all too easy to pass in one or more variables by reference (especially with the evil, never-should-have-been-adopted, should-be-a-hard-warning-by-default capture-all-by-reference [&] lambda operator).

Fundamentally, the following code snippet demonstrating the usage of a function-local variable passed by reference to a lambda is no different than a naive return of a locally-scoped variable by reference in “legacy” C++:

```cpp
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

auto helper1(std::string str) -> auto {
    std::string capital;
    std::transform(str.begin(), str.end(),
                   std::back_inserter(capital), ::toupper);
    // BUG: `capital` is captured by reference, but it is destroyed the
    // moment helper1() returns -- the lambda outlives its captive.
    return [&capital] () {
        std::cout << "HELLO, " << capital << "!" << std::endl;
    };
}

int main(int argc, const char *argv[]) {
    auto callback1 = helper1("Mahmoud");
    callback1(); // use-after-free: reads the long-dead `capital`
}
```

It’s easy to argue – going into the very-much contrived code sample above already knowing what to look for and all – that the invalid (or, more technically, use-after-free) memory access above is obvious, a mistake no one could ever fall for twice. But you’d be surprised. Previous “idiomatic” usage of pass-by-reference in C++ (i.e. as used by your average programmer, not by expert practitioners or beginners) typically took one of two forms: pass-by-reference arguments to a function, and return-by-reference from a function, with usages of the former probably outnumbering the latter by at least 500:1.

I’m not going to argue that it’s impossible to end up in a situation where an argument passed in to a function by reference winds up pointing to invalid memory, but by-and-large, it’s a fairly safe and very ordinary occurrence. A variable is declared before the function is called, and a reference to it is passed to the function so that its existing value may be manipulated by the called function – most often used as a workaround for the fact that C++ (the language) had no tuples and functions could only return a single result:

```cpp
bool foo(int input, int &output) {
    bool ok = true;
    int result = input * 2; // stand-in: do something here that might fail
    if (ok) {
        output = result;
    }
    return ok;
}
```

In the code above, the reference is used as a more-or-less equivalent replacement for C’s pointers, preventing the null dereferencing that might have occurred if int &output were int *output instead – just like ref in C# would be used to accomplish the same (before the language grew up and learned to use tuples).

This usage of pass-by-reference for variables is so overwhelmingly dominant in C++ development that most developers you interview will have a hard time telling you when to use return-by-reference, or whether it’s even legal in the language (let alone both legal and safe) to write something like this:

```cpp
#include <iostream>

class Singleton {
public:
    int x;
    int &retrieve_reference() { return x; }
    void print() { std::cout << "x is " << x << std::endl; }
};

Singleton singleton;

int main(int argc, const char *argv[]) {
    int &x = singleton.retrieve_reference();
    x = 42;
    singleton.print(); // prints "x is 42"
}
```

Return-by-reference, by contrast, is only reached for when you need it; you never find developers returning by reference without at least thinking they had a good reason to do so. And the only time you’d need it is when the caller must be able to modify the state of an existing variable – i.e. a variable that is, at least at the time the function returns, still alive. Think about it: ignoring the memory issues in the code fragment below, what would be the point of code that essentially boils down to this:

```cpp
int &foo() {
    int x = 7;
    return x; // a reference to a local: dangling the instant foo() returns
}
```

But with C++11 onwards, it’s become only too easy to write code that does just that – only without it being so obvious. The insidious “capture-all-by-reference” [&] makes it all too likely that a (lazy) developer will use it as a shortcut while fully cognizant of which variables are being referenced and when they are expected to go out of scope, only for a future commit (by the same developer, no less!) to introduce into the lambda an access to a different variable – without any complaints from the compiler – that results in a use-after-free vulnerability.

Since variables captured by value in a lambda are read-only by default (unless mutable is used), and since lambdas are often used to reduce code repetition and eliminate copy-and-paste errors by modifying variables declared in an outer scope from within the lambda (only possible when those variables are captured by reference), lambdas are a natural breeding ground for these use-after-free errors. This problem is greatly exacerbated by the abundance of literature – both online and offline – claiming that lambdas “extend” the lifetime of variables “captured” by value, when, in reality, the lambda stores a copy of the variable, and it is only that copy whose lifetime matches the lambda’s.

Suffice it to say, if you restrict yourself to a subset of C/C++ that avoids pointers and references and shares state between scopes exclusively via reference-counted smart pointers or by passing-by-value, these gotchas don’t apply. But while references were a significant improvement over pointers, it’s either intellectual dishonesty or sheer naïveté to think that references magically eliminate memory errors. It’s important to understand that memory access violations come in many different shapes and colors, and replacing C pointers with C++ references only solves one class of them (null dereference).

But I think we can all agree that virtually all the reasons to use C++17 disappear if you are forced to use reference-counted, heap-allocated variables for virtually all of your code. It might not be as slow as *insert name of your least-favorite interpreted language here*, but it will certainly be far more verbose!

— Addendum: Bonus Content —

If you’re still not sure, here’s an example of code using std::shared_ptr that is still vulnerable to this issue, only not as obviously so:

```cpp
#include <cstdlib>
#include <functional>
#include <iostream>
#include <memory>
#include <set>

class int_wrapper;
std::set<int_wrapper*> destroyed;

class int_wrapper {
    int _value;

public:
    int_wrapper(int x) { _value = x; }
    ~int_wrapper() { destroyed.insert(this); }

    int get_value() {
        if (::destroyed.find(this) != ::destroyed.end()) {
            std::cerr << "mayday! get_value() called against"
                         " destroyed object!" << std::endl;
        }
        return _value;
    }
};

std::function<void ()> bar() {
    std::shared_ptr<int_wrapper> ptr;
    // [&] captures `ptr` by reference -- but `ptr` dies when bar() returns,
    // while the lambda (copied into std::bind below) lives on.
    auto helper = [&](int x) {
        for (int i = 1; i <= x; ++i) {
            if (i == (rand() % x) || i == (rand() % 3)) {
                std::cout << "Picked a number!" << std::endl;
                ptr = std::make_shared<int_wrapper>(i);
                break;
            }
        }
        int to_print = ptr->get_value();
        std::cout << "The pointless convolution returned " << to_print
                  << std::endl;
    };

    bool condition1 = (rand() % 20) == 2;
    if (condition1) {
        return std::bind(helper, 12);
    } else {
        return std::bind(helper, 5);
    }
}

int main(int argc, const char *argv[]) {
    srand(42);
    auto foo = bar();
    foo(); // use-after-free: `helper` writes to and reads the dead `ptr`
}
```