Uglier than a Windows backslash, odder than ===, more common than PHP, more unfortunate than CORS, more disappointing than Java generics, more inconsistent than XMLHttpRequest, more confusing than a C preprocessor, flakier than MongoDB, and more regrettable than UTF-16, the worst mistake in computer science was introduced in 1965.

In commemoration of the 50th anniversary of Sir Tony Hoare’s null reference, this article explains what null is, why it is so terrible, and how to avoid it.

I call it my billion-dollar mistake… At that time, I was designing the first comprehensive type system for references in an object-oriented language. My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. – Tony Hoare, co-designer of ALGOL W and inventor of the null reference.

What is wrong with NULL?

The short answer: NULL is a value that is not a value. And that’s a problem.

It has festered in the most popular languages of all time and is now known by many names: NULL, nil, null, None, Nothing, Nil, nullptr. Each language has its own nuances.

Some of the problems caused by NULL apply only to a particular language, while others are universal; a few are simply different facets of a single issue.

NULL…

1. subverts types
2. is sloppy
3. is a special case
4. makes poor APIs
5. exacerbates poor language decisions
6. is difficult to debug
7. is non-composable

1. NULL subverts types

Statically typed languages check the uses of types in a program without actually executing it, providing certain guarantees about program behavior. For example, in Java, if I write x.toUpperCase(), the compiler will inspect the type of x. If x is known to be a String, the type check succeeds; if x is known to be a Socket, the type check fails.

Static type checking is a powerful aid in writing large, complex software. But for Java, these wonderful compile-time checks suffer from a fatal flaw: any reference can be null, and calling a method on null produces a NullPointerException. Thus:

toUpperCase() can be safely called on any String…unless the String is null.

read() can be called on any InputStream…unless the InputStream is null.

toString() can be called on any Object…unless the Object is null.

Java is not the only culprit; many other type systems have the same flaw, including, of course, ALGOL W. In these languages, NULL is above type checks. It slips through them silently, waiting until runtime to finally burst free in a shower of errors. NULL is the nothing that is simultaneously everything.
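The flaw described above can be made concrete with a minimal Java sketch (the class, method, and variable names here are my own, purely for illustration):

```java
public class NullSubvertsTypes {
    // Type-checks cleanly: s is declared as a String,
    // so the compiler happily permits the method call.
    static String shout(String s) {
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(shout("hello")); // HELLO

        // This also type-checks: null inhabits every reference type.
        String s = null;
        try {
            System.out.println(shout(s));
        } catch (NullPointerException e) {
            // The compiler was satisfied, yet the call blew up at runtime.
            System.out.println("NullPointerException at runtime");
        }
    }
}
```

The compiler verified that s is a String, but that verification guarantees nothing about null, which is exactly the hole in the type system this section describes.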

2. NULL is sloppy

There are many times when it doesn’t make sense to have a null. Unfortunately, if the language permits anything to be null, well, anything can be null.

Java programmers risk carpal tunnel from writing

```java
if (str == null || str.equals("")) {
}
```

It’s such a common idiom that C# adds String.IsNullOrEmpty:

```csharp
if (string.IsNullOrEmpty(str)) {
}
```

Abhorrent.

Every time you write code that conflates null strings and empty strings, the Guava team weeps.

– Google Guava

Well said. But when your type system (e.g. Java’s or C#’s) allows NULL everywhere, you cannot reliably exclude the possibility of NULL, and it is nearly inevitable that it will wind up conflated somewhere. The ubiquitous possibility of null posed such a problem that Java 8 added support for type annotations such as @NonNull (JSR 308) to try to retroactively patch this flaw in its type system.
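For illustration, a plain-Java analog of C#’s helper is trivial to write—and it demonstrates exactly the conflation the Guava team laments (the helper and class names are my own):

```java
public class StringGuards {
    // A hand-rolled analog of C#'s string.IsNullOrEmpty.
    static boolean isNullOrEmpty(String str) {
        return str == null || str.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isNullOrEmpty(null)); // true
        System.out.println(isNullOrEmpty(""));   // true
        System.out.println(isNullOrEmpty("hi")); // false
        // Note the conflation: the first two cases are indistinguishable here,
        // even though "no string at all" and "an empty string" may mean very
        // different things to the caller.
    }
}
```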

3. NULL is a special case

Given that NULL functions as a value that is not a value, NULL naturally becomes the subject of various forms of special treatment.

Pointers

For example, consider this C++:

```cpp
char c = 'A';
char *myChar = &c;
std::cout << *myChar << std::endl;
```

myChar is a char *, meaning that it is a pointer to—i.e. the memory address of—a char. The compiler verifies this. Therefore, the following is invalid:

```cpp
char *myChar = 123; // compile error
std::cout << *myChar << std::endl;
```

Since 123 is not guaranteed to be the address of a char, compilation fails. However, if we change the number to 0 (which is NULL in C++), the compiler accepts it:

```cpp
char *myChar = 0;
std::cout << *myChar << std::endl; // runtime error
```

As with 123, NULL is not actually the address of a char. Yet this time the compiler permits it, because 0 (NULL) is a special case.

Strings

Yet another special case appears in C’s null-terminated strings. This is a bit different from the other examples, as there are no pointers or references involved. But the idea of a value that is not a value is still present, in the form of a char that is not a char.

A C-string is a sequence of bytes whose end is marked by the NUL (0) byte:

76  117 99  105 100 32  83  111 102 116 119 97  114 101 0
L   u   c   i   d       S   o   f   t   w   a   r   e   NUL

Thus, each character of a C-string can be any of the 256 possible bytes, except 0 (the NUL character). Not only does this make string length a linear-time operation; even worse, it means that C-strings cannot be used for arbitrary ASCII or extended ASCII data. Instead, they can only be used for the unusual subset known as ASCIIZ.

This exception for a single NUL character has caused innumerable errors: API weirdness, security vulnerabilities, and buffer overflows. NULL is the worst mistake in computer science; more specifically, NUL-terminated strings are the most expensive one-byte mistake.
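The linear-time length scan can be sketched by treating a byte array as a NUL-terminated string, mirroring what C’s strlen does (the class and method names are my own; this is an illustration, not C’s actual implementation):

```java
public class AsciizLength {
    // Walk the buffer until the NUL terminator is found,
    // as C's strlen does: O(n) in the length of the string.
    static int strlen(byte[] buf) {
        int i = 0;
        while (buf[i] != 0) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] s = {76, 117, 99, 105, 100, 32, 83, 111,
                    102, 116, 119, 97, 114, 101, 0}; // "Lucid Software" + NUL
        System.out.println(strlen(s)); // 14
        // A byte value of 0 can never be part of the data itself;
        // it always, specially, means "end of string".
    }
}
```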

4. NULL makes poor APIs

For the next example, we will journey to the land of dynamically typed languages, where NULL will again prove to be a terrible mistake.

Key-value store

Suppose we create a Ruby class that acts as a key-value store. It may be a cache, an interface to a key-value database, etc. We’ll make the general-purpose API simple:

```ruby
class Store
  ##
  # associate key with value
  #
  def set(key, value)
    ...
  end

  ##
  # get value associated with key, or return nil if there is no such key
  #
  def get(key)
    ...
  end
end
```

We can imagine an analog in many languages (Python, JavaScript, Java, C#, etc.).

Now suppose our program has a slow or resource-intensive way of finding out someone’s phone number—perhaps by contacting a web service. To improve performance, we’ll use a local Store as a cache, mapping a person’s name to their phone number.

```ruby
store = Store.new()
store.set('Bob', '801-555-5555')
store.get('Bob')   # returns '801-555-5555', which is Bob's number
store.get('Alice') # returns nil, since it does not have Alice
```

However, some people won’t have phone numbers (i.e. their phone number is nil). We’ll still cache that information, so we don’t have to repopulate it later.

```ruby
store = Store.new()
store.set('Ted', nil) # Ted has no phone number
store.get('Ted')      # returns nil, since Ted does not have a phone number
```

But now the meaning of our result is ambiguous! It could mean:

1. the person does not exist in the cache (Alice)
2. the person exists in the cache and does not have a phone number (Ted)

One circumstance requires an expensive recomputation; the other, an instantaneous answer. But our code is not sophisticated enough to distinguish between these two.

In real code, situations like this come up frequently, in complex and subtle ways. Thus, simple, generic APIs can suddenly become special-cased, confusing sources of sloppy nullish behavior.

Patching the Store class with a contains() method might help. But it introduces redundant lookups, reducing performance and inviting race conditions.

Double trouble

JavaScript has this same issue, but with every single object.

If a property of an object doesn’t exist, JavaScript returns a value to indicate the absence. The designers of JavaScript could have chosen this value to be null. But instead they worried about cases where the property exists and is set to the value null. In a stroke of ungenius, JavaScript added undefined to distinguish a null property from a non-existent one. But what if the property exists and is set to the value undefined? Oddly, JavaScript stops there; there is no uberundefined. Thus JavaScript wound up with not one, but two forms of NULL.
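The key-value ambiguity above is not confined to dynamic languages: Java’s standard HashMap permits null values, so get() is ambiguous in exactly the same way, and disambiguating requires a second lookup via containsKey. A sketch (the cache scenario and names are my own; HashMap, get, and containsKey are from java.util):

```java
import java.util.HashMap;
import java.util.Map;

public class CacheAmbiguity {
    public static void main(String[] args) {
        Map<String, String> phoneCache = new HashMap<>();
        phoneCache.put("Bob", "801-555-5555");
        phoneCache.put("Ted", null); // Ted is known to have no phone number

        // get() returns null in BOTH cases -- indistinguishable results:
        System.out.println(phoneCache.get("Ted"));   // null (cached "no number")
        System.out.println(phoneCache.get("Alice")); // null (never looked up)

        // Telling the two apart requires a second, redundant lookup:
        System.out.println(phoneCache.containsKey("Ted"));   // true
        System.out.println(phoneCache.containsKey("Alice")); // false
    }
}
```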

5. NULL exacerbates poor language decisions

Java silently converts between reference and primitive types. Add in null, and things get even weirder. For example, this does not compile:

```java
int x = null; // compile error
```

This does compile:

```java
Integer i = null;
int x = i; // runtime error
```

though it throws a NullPointerException when run.

It’s bad enough that member methods can be called on null; it’s even worse when you never even see the method being called.
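A runnable sketch of this trap: the NullPointerException below comes from the unboxing conversion the compiler inserts (effectively a call to intValue()), an invocation that appears nowhere in the source (the class name is my own):

```java
public class UnboxingTrap {
    public static void main(String[] args) {
        Integer i = null; // compiles: null is a valid Integer reference
        try {
            int x = i; // compiles too: Java unboxes i silently
            System.out.println(x);
        } catch (NullPointerException e) {
            // No method call is visible on the failing line of source,
            // yet a method was effectively called on null.
            System.out.println("NullPointerException from hidden unboxing");
        }
    }
}
```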

6. NULL is difficult to debug

C++ is a great example of how troublesome NULL can be. Calling member functions on a NULL pointer won’t necessarily crash the program. It’s much worse: it might crash the program.

```cpp
#include <iostream>

struct Foo {
    int x;
    void bar() {
        std::cout << "La la la" << std::endl;
    }
    void baz() {
        std::cout << x << std::endl;
    }
};

int main() {
    Foo *foo = NULL;
    foo->bar(); // okay
    foo->baz(); // crash
}
```

When I compile this with gcc, the first call succeeds and the second call fails. Why? The target of foo->bar() is known at compile time, so the compiler avoids a runtime vtable lookup and transforms it into a static call like Foo_bar(foo), with this as the first argument. Since bar doesn’t dereference that NULL pointer, the call succeeds. But baz does dereference it (reading x), which causes a segmentation fault.

But suppose instead we had made bar virtual, meaning its implementation may be overridden by a subclass:

```cpp
    ...
    virtual void bar() {
    ...
```

As a virtual function, foo->bar() does a vtable lookup on the runtime type of foo, in case bar() has been overridden. Since foo is NULL, the program now crashes at foo->bar() instead, all because we made a function virtual:

```cpp
int main() {
    Foo *foo = NULL;
    foo->bar(); // crash
    foo->baz();
}
```

NULL has made debugging this code extraordinarily difficult and unintuitive for the programmer of main. Granted, dereferencing NULL is undefined behavior per the C++ standard, so technically we shouldn’t be surprised by whatever happened. Still, this is a non-pathological, common, very simple, real-world example of one of the many ways NULL can be capricious in practice.