A commonly misunderstood concept in .NET performance tuning is the importance of avoiding memory allocations. It is thought that since memory allocations are fast, that they rarel,y if ever, have an impact on performance.

To understand the cause of this misunderstanding, one must go back to the era of COM programming as seen in C++ and Visual Basic 4 thru 6. With COM, memory was managed using a reference counting style garbage collector. Each time an object was assigned to a reference variable, a hidden counter was incremented. If the variable is reassigned or falls out of scope, the counter is decremented. And if the counter reaches zero, the object is deleted, freeing the memory to use elsewhere.

This system of memory management is “deterministic”. By careful analysis, you can determine exactly when an object will be deleted. This in turn means that you can automatically free resources such as database connections. Contrast this with .NET, where you need a separate mechanism (i.e. IDisposable/using) to ensure the non-memory resources are released in a timely manner.

There are three major downsides to reference counting garbage collectors. The first is that they are susceptible to “circular references”. If two objects reference each other, even indirectly, then it is impossible for the reference count to drop to zero and a memory leak occurs. Code has to be carefully written to either avoid circular references or to provide a deconstruct method of some sort that break the loop when the objects are no longer needed.

The other major drawback occurs when working in a multi-threaded environment. In order to avoid race conditions, some sort of locking mechanism (e.g. Interlocked.Increment, spinlock, etc.) is needed to ensure that the refence counts remain accurate. These operations are surprisingly expensive.

Finally, the list of available memory locations can become fragmented with a lot of small, unusable spaces between live objects. Memory allocation often involves walking a linked list of chain of free locations, looking for a spot large enough for the desired object. (Memory fragmentation can also in .NET on the “Large Object Heap” or LOH.)

By contrast, allocating memory in a mark-and-sweep style garbage collector such as .NET or Java is a simple matter of a pointer increment. And assignments are no more costly than assigning an integer. It’s only when the GC actually runs that the real cost is paid, and often that’s mitigated by using a generational collector.

Back when .NET was new, many people complained that the non-deterministic performance of .NET’s garbage collector would harm performance and be difficult to reason about. Microsoft’s counter-argument at the time was that, for most use cases, the mark-and-sweep garbage collector would actually be faster despite the intermittent GC pauses.

Unfortunately, over time the message has become somewhat garbled. Even if we accept the theory that mark-and-sweep garbage collection is faster than reference counting, that doesn’t mean it’s necessarily fast in an absolute sense. Memory allocations, and the associated memory pressure, are often a cause of hard to detect performance problems.

Furthermore, the more memory being actively used, the less efficient your CPU caches are. While main RAM is so large that using disk-based virtual memory is almost unheard of for most use cases, the cache in the CPU is tiny by comparison. And the time it takes to populate the CPU cache from RAM can take dozens or even hundreds of CPU cycles.

In a recent article, Frans Bouma identified several techniques for optimizing memory usage. While he was specifically looking at improving ORM performance, the suggestions are useful in a variety of situations. Some of his suggestions include:

Avoid params arrays

The params keyword is useful, but expensive compared to a normal function call because it requires a memory allocation. APIs should provide non-params overloads for commonly used parameter counts.

An IEnumerable<T> or IList<T> overload should also be provided so that collections don’t need to be unnecessarily copied into an array before calling the function.

Pre-size data-structures if you add data immediately afterwards

A List<T> or other collection class can be resized multiple times while being populated. Each resize operation allocates another internal array than then needs to filled by the previous array. You can often avoid this cost by providing the collection’s constructor with a capacity parameter.

Initialize members lazily

If you know that a given object won’t actually be needed most of the time, then you should use lazy-initialization to avoid allocating it prematurely. Usually this is done manually, as the Lazy<T> class itself requires allocations.

Back in 2011, we reported on Microsoft’s efforts to reduce the size of Task by using similar techniques. They reported that they saw a 49 to 55% reduction in the time it takes to create a Task<Int32> and a 52% reduction in size.