Automatic Reference Counting

Since 2005, I've been very publicly concerned about the long-term prospects of Apple's programming language and application framework, Objective-C and Cocoa, going so far as to speculate about a possible technological crisis a few years in the future.

When the future arrived, I revisited the issue of Apple's language and API future in light of Apple's dramatic entrance into the mobile market and the unprecedented growth this has enabled. You can read my conclusions for yourself, but the bottom line is that I'm still concerned about the issue—and think Apple should be too. Success hides problems, and Apple has been so very successful in recent years.

Enter (and exit) garbage collection

Apple has done a tremendous amount of work to modernize its development platform, including completely replacing its compiler, overhauling its IDE, and adding features and new syntax to the Objective-C language itself.

All of these things are great, but none address my specific concerns about memory management. Apple did eventually see fit to add garbage collection to Objective-C, but my fear that Apple wouldn't really commit to garbage collection in Objective-C turned out to be well-founded. Today, years after the introduction of this feature, very few of Apple's own applications use garbage collection.

There's a good reason for this. Runtime garbage collection is simply a poor fit for Objective-C. For all its syntactic simplicity and long, distinguished history, the C programming language is actually a surprisingly complex beast, especially when it comes to memory management. In C, any correctly aligned pointer-size bit pattern in memory can potentially be used as an address; the language explicitly allows casting from void * to a typed pointer, and vice versa. Objective-C, as a superset of C, inherits these charming properties. In exchange for this sacrifice, Objective-C code can be compiled alongside plain C code and can link to C libraries with ease.
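To make the pointer problem concrete, here is a minimal C sketch of an address taking a round trip through an integer. The Widget type, the HIDE_MASK constant, and the function names are all invented for illustration. While the pointer is stored in scrambled form, nothing in memory looks like a reference to the allocation, which is exactly the kind of situation a memory-scanning collector cannot reason about.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical payload type, used only for illustration. */
typedef struct { int value; } Widget;

/* Arbitrary mask (an assumption of this sketch) used to scramble the address. */
#define HIDE_MASK ((uintptr_t)0x5A5A5A5Au)

/* Store the address as a scrambled integer; no valid pointer remains visible. */
uintptr_t hide_pointer(Widget *w) {
    return (uintptr_t)(void *)w ^ HIDE_MASK;
}

/* Undo the scrambling and cast the integer back to a typed pointer. */
Widget *reveal_pointer(uintptr_t hidden) {
    return (Widget *)(void *)(hidden ^ HIDE_MASK);
}

/* Round trip: the object is still reachable, but only by code that knows the trick. */
void hidden_pointer_demo(void) {
    Widget *w = malloc(sizeof *w);
    assert(w != NULL);
    w->value = 42;

    uintptr_t hidden = hide_pointer(w);
    w = NULL; /* the only remaining "reference" is now a plain integer */

    Widget *again = reveal_pointer(hidden);
    assert(again->value == 42);
    free(again);
}
```

Perfectly legal C, and perfectly opaque to anything trying to work out which memory is still in use just by looking at it.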

This means that the runtime garbage collector is expected to traverse memory allocated by an arbitrary conglomeration of Objective-C and plain old C code and make the correct decision—every time—about what memory may safely be collected. Apple's Objective-C garbage collection is a global switch. It can't be enabled just for the clean, object-oriented Objective-C code that application developers write; it applies to the entire process, including all the frameworks that the application links to.

It seems sensible for garbage collection to take a hands-off approach to any memory allocated outside Objective-C's gated object-oriented community. Unfortunately, memory allocated "the old-fashioned way" in plain C code routinely makes its way into the world of Objective-C, and vice versa. In theory, all such code could be annotated in such a way that it works correctly with garbage collection. In practice, Mac OS X contains way too much code—much of it not written by Apple—to be able to properly vet every line of it to ensure that a runtime garbage collector has enough information to make the right decisions in every case.

And, in fact, despite Apple's bold claims of readiness, there have been and continue to be cases where even code within Apple's own frameworks can confuse the Objective-C garbage collector. These kinds of bugs are particularly insidious because they may only manifest themselves when the collector runs within a certain window of time. The garbage collection compatibility outlook for third-party libraries is even more grim.

Long story short: garbage collection for Objective-C is out. (It's still supported in Lion, but I wouldn't count on Apple putting a tremendous amount of effort into it going forward. And don't be surprised if it goes the way of Rosetta in a few years.) In its place, Apple has created something called Automatic Reference Counting, or ARC for short. But to understand ARC, you should first understand how memory management in Cocoa has traditionally worked.

Cocoa memory management

Cocoa uses a memory management technique called reference counting. Each object has a reference count associated with it, which starts at one when the object is created. When some part of an application takes ownership of an object, it increments the object's reference count by sending it a retain message. When it's done with the object, it decrements the reference count by sending a release message to the object. When an object's reference count reaches zero, it is deallocated.

This allows a single object to be used by several different parts of the application, each of which is responsible for bookending its use of the object with retain and release messages. If retain is sent to an object more times than release, then its reference count will never reach zero and its memory will never be freed. This is called a memory leak. If release is sent more times than retain, then a release message sent after the object's reference count has reached zero will find itself looking at the region of memory formerly occupied by the object, which may now contain anything at all. A crash usually ensues.
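The bookkeeping described above can be modeled in a few lines of plain C. This is only a sketch of the idea, not Cocoa's implementation: the Object type, the function names, and the live_objects counter (there so that leaks are observable) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object header: just a reference count. */
typedef struct { unsigned refcount; } Object;

/* Number of objects currently allocated; a nonzero value at exit means a leak. */
int live_objects = 0;

/* The creator owns the new object, so the count starts at one. */
Object *object_create(void) {
    Object *obj = malloc(sizeof *obj);
    if (obj == NULL) return NULL;
    obj->refcount = 1;
    live_objects++;
    return obj;
}

/* Take ownership: increment the count. */
Object *object_retain(Object *obj) {
    assert(obj->refcount > 0);
    obj->refcount++;
    return obj;
}

/* Give up ownership: decrement, and deallocate when the count hits zero. */
void object_release(Object *obj) {
    assert(obj->refcount > 0); /* catches only the simplest over-release */
    if (--obj->refcount == 0) {
        live_objects--;
        free(obj);
    }
}
```

Balanced calls bring live_objects back to zero. Skip one object_release and the object is never freed. Add an extra one and the function ends up reading memory that has already been handed back to the allocator, which the assert here cannot reliably catch.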

Finally, there's the autorelease message, which means "release, but later." When an object is sent an autorelease message, it's added to the current "autorelease pool." When that pool is drained, all objects in it are sent one release message for each time they were added to the pool. (An object may be added to the same autorelease pool multiple times.) Cocoa applications have an autorelease pool that's drained at the end of each pass through the event loop, but new pools can be created locally by the programmer.
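An autorelease pool is conceptually just a list of pending release messages. The following C sketch models that idea under some loud assumptions: the pool is a fixed-size array, the caller passes it around explicitly (Cocoa tracks the current pool for you), and every name here is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object, as in any simple refcount scheme. */
typedef struct { unsigned refcount; } Object;

int live_objects = 0; /* lets a caller observe when objects are freed */

Object *object_create(void) {
    Object *obj = malloc(sizeof *obj);
    if (obj == NULL) return NULL;
    obj->refcount = 1;
    live_objects++;
    return obj;
}

Object *object_retain(Object *obj) {
    obj->refcount++;
    return obj;
}

void object_release(Object *obj) {
    assert(obj->refcount > 0);
    if (--obj->refcount == 0) {
        live_objects--;
        free(obj);
    }
}

/* Fixed capacity is an assumption of this sketch; a real pool would grow. */
#define POOL_CAPACITY 16

typedef struct {
    Object *pending[POOL_CAPACITY];
    size_t count;
} AutoreleasePool;

/* "Release, but later": remember the object; the same one may be added twice. */
Object *pool_autorelease(AutoreleasePool *pool, Object *obj) {
    assert(pool->count < POOL_CAPACITY);
    pool->pending[pool->count++] = obj;
    return obj;
}

/* Send one release per entry, then empty the pool. */
void pool_drain(AutoreleasePool *pool) {
    for (size_t i = 0; i < pool->count; i++) {
        object_release(pool->pending[i]);
    }
    pool->count = 0;
}
```

An object that has been retained once after creation and then added to the pool twice stays alive until pool_drain runs, at which point it receives both releases and is freed.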

Simple, right? Just make sure your retain and release/autorelease messages are balanced and you're golden. But as straightforward as it is conceptually, it's actually surprisingly easy to get wrong. Experienced Cocoa programmers will tell you that retain/release memory management eventually becomes second nature (and it does), but programmers are only human. Accurately tracking the lifecycle of all objects in a large application starts to push the limits of human mental capacity. To help, Apple provides sophisticated developer tools for tracking memory allocations and hunting down leaks.

But education and tools only go so far. Cocoa experts may not see retain/release memory management as a problem, but Apple is looking towards the future, towards new developers. Other mobile and desktop platforms don't require this sort of manual memory management in their top-level application frameworks. Based on Apple's past efforts with garbage collection, it seems clear that Apple believes it would be better for the platform if developers didn't have to manually manage memory. Now, finally, Apple believes it has found a solution that it can really get behind.

Enter ARC

To understand how ARC works, start by picturing a traditional Objective-C source code file written by an expert Cocoa programmer. The retain, release, and autorelease messages are sent in all the right places and are in perfect balance.

Now imagine editing that source code file, removing every instance of the retain, release, and autorelease messages, and changing a single build setting in Xcode that instructs the compiler to put all the appropriate memory management calls back into your program when the source code is compiled. That's ARC. It's just what the name says: traditional Cocoa reference counting, done automatically.

Before explaining how ARC does this, it's important to understand what ARC does not do. First, ARC does not impose a new runtime memory model. Code compiled under ARC uses the same memory model as plain C or non-ARC Objective-C code, and can be linked to all the same libraries. Second, ARC provides automatic memory management for Objective-C objects only (though note that blocks also happen to be Objective-C objects under the covers). Memory allocated in any other way is not touched and must still be managed manually. (The same goes for other resources like file handles and sockets.) Finally, ARC is not garbage collection. There is no process that scans the memory image of a running application looking for memory to deallocate. Everything ARC does happens at compile time.

What ARC does at compile time is not magic. There is no deep artificial intelligence at work here. ARC doesn't even use LLVM's sophisticated static analyzer to figure out where to put the retains and releases. The static analyzer takes a long time to run, too long to be a mandatory part of the build process; it can also produce false positives. That's fine for a tool meant to detect possible bugs, but reliable memory management requires certainty.

What allows ARC to work is the same thing that enables people to (eventually) become expert Cocoa programmers: conventions. Cocoa has rules about the transfer of ownership that takes place during common operations like getting or setting an object attribute, initializing an object, or making a mutable copy. Furthermore, the methods that implement these operations follow a set of naming conventions. ARC knows all these rules and uses them to decide when to retain and when to release.
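To give a flavor of how mechanical these naming rules are, here is a small C sketch that decides, from a method name alone, whether the caller ends up owning the returned object. It models the long-standing Cocoa rule that names beginning with alloc, new, copy, or mutableCopy hand ownership to the caller, with the refinement that the prefix must not be followed by a lowercase letter. The function names are invented for this example, init methods (which have their own rules) are deliberately left out, and this is an illustration rather than the compiler's actual code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if the selector name implies that the caller owns the returned object. */
bool returns_owned_reference(const char *selector) {
    static const char *const families[] = { "alloc", "new", "copy", "mutableCopy" };

    /* Leading underscores are ignored when matching. */
    while (*selector == '_') selector++;

    for (size_t i = 0; i < sizeof families / sizeof families[0]; i++) {
        size_t len = strlen(families[i]);
        /* The prefix must be followed by something other than a lowercase
           letter, so "copyWithZone:" matches but "copyright" does not. */
        if (strncmp(selector, families[i], len) == 0 &&
            !islower((unsigned char)selector[len])) {
            return true;
        }
    }
    return false;
}

/* A few examples, doubling as a self-check. */
void naming_convention_examples(void) {
    assert(returns_owned_reference("alloc"));
    assert(returns_owned_reference("copyWithZone:"));
    assert(returns_owned_reference("newDocument"));
    assert(!returns_owned_reference("copyright"));
    assert(!returns_owned_reference("stringWithFormat:"));
}
```

The point is that no understanding of the program is needed. The name is the contract, and a compiler can read names just as well as a person can.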

In fact, ARC follows the rules in a more pedantic manner than any human ever would, bracketing every operation that could possibly be influenced by object ownership with the appropriate retain and release messages. This can produce a huge number of memory management operations. Luckily, Apple has an excellent optimizing compiler called Clang (since rechristened by Apple's marketing geniuses as the Apple LLVM Compiler 3.0). Clang sweeps through this sea of mechanically generated code, detecting and eliminating redundancies until what remains looks a lot like what a human would have written.
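The clean-up step can be pictured with a toy model. The C sketch below treats a function body as a flat list of retain, release, and "use" operations and removes any retain that is immediately undone by a release of the same object. This is emphatically not how Clang's optimizer is implemented; the Op type, the operation kinds, and the function name are all assumptions made to illustrate the idea of cancelling redundant pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation kinds for a toy instruction stream. */
typedef enum { OP_RETAIN, OP_RELEASE, OP_USE } OpKind;

typedef struct {
    OpKind kind;
    int object_id; /* which object the operation applies to */
} Op;

/* Compacts ops in place, dropping each retain that is directly followed
   (after earlier cancellations) by a release of the same object.
   Returns the new number of operations. */
size_t cancel_redundant_pairs(Op *ops, size_t n) {
    assert(ops != NULL || n == 0);
    size_t out = 0; /* ops[0..out) holds the operations kept so far */
    for (size_t i = 0; i < n; i++) {
        if (ops[i].kind == OP_RELEASE && out > 0 &&
            ops[out - 1].kind == OP_RETAIN &&
            ops[out - 1].object_id == ops[i].object_id) {
            out--; /* the pair has no net effect: drop both */
        } else {
            ops[out++] = ops[i];
        }
    }
    return out;
}
```

Given the sequence retain, retain, release, use, release (all on one object), the inner retain/release pair disappears and three operations remain: retain, use, release. Pairs with a use in between are left alone here, since this toy has no way to prove that removing them is safe.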

Conventions were made to be broken, of course. But what ARC lacks in semantic sophistication it makes up for in predictability and speed, speed, speed. In cases where the human really does know best, ARC can be told exactly what to do thanks to a comprehensive set of new attributes and macros that allow the developer to annotate variables, data structures, methods, and parameters with explicit instructions for ARC. But the idea behind ARC is that these exceptions should be rare.

To ensure that ARC can do what it's designed to do in a correct manner, a few additional language restrictions have been added. Most of them are esoteric, existing on the boundaries between Objective-C and plain C code (e.g., C structs and unions are not allowed to contain references to Objective-C objects). Compatibility with existing C code is one of Objective-C's greatest strengths. But since ARC is a per-compilation-unit feature and ARC and non-ARC code can be mixed freely, these new language restrictions make ARC more reliable without compromising interoperability.