In a .NET application, memory and performance are very much linked. Poor memory management can hurt performance in many ways. One such effect is called GC Pressure or Memory Pressure.

GC Pressure (garbage collector pressure) is when the GC doesn’t keep up with memory deallocations. When the GC is pressured, it will spend more time garbage collecting, and these collections will come more frequently. When your app spends more time garbage collecting, it spends less time executing code, thus directly hurting performance.
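One way to observe GC pressure directly is to count how many Gen 0 collections happen while some code runs. The sketch below uses GC.CollectionCount(0). The helper names CountGen0Collections and AllocateGarbage are hypothetical. The exact counts you see depend on the runtime version, GC mode, and heap size.

```csharp
using System;

public static class GcPressureProbe
{
    // Keeps the latest allocation reachable so the allocation can't be optimized away.
    // Each previous array becomes garbage as soon as it is overwritten.
    private static byte[] _sink;

    // Hypothetical helper: how many Gen 0 collections happened while `work` ran.
    public static int CountGen0Collections(Action work)
    {
        int before = GC.CollectionCount(0); // Gen 0 collections so far in this process
        work();
        return GC.CollectionCount(0) - before;
    }

    // Hypothetical helper: allocates `count` short-lived 1 KB arrays.
    public static void AllocateGarbage(int count)
    {
        for (int i = 0; i < count; i++)
        {
            _sink = new byte[1024];
        }
    }
}
```

For example, CountGen0Collections(() => AllocateGarbage(100_000)) allocates roughly 100 MB of short-lived arrays and will typically report several collections. Each technique below tries to shrink numbers like that.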

If you’re not familiar with garbage collector fundamentals, I suggest reading this article first.

This article will show 8 techniques to minimize GC pressure, and by doing so, improve performance.

1. Set initial capacity for dynamic collections

.NET provides a lot of great collection types like List<T>, Dictionary<TKey, TValue>, and HashSet<T>. All those collections have a dynamic capacity, which means they automatically grow as you add more items.

While this functionality is very convenient, it's not great for memory management. Whenever the collection reaches its capacity limit, it allocates a new, larger memory buffer (usually an array double in size) and copies the existing items into it. That means an additional allocation, and the old buffer becomes garbage that has to be collected.
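The resizing described above can be observed through List<T>.Capacity. In this sketch, CapacityGrowth and CountResizes are hypothetical names. Each capacity change corresponds to one new internal array being allocated and the previous one becoming garbage.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityGrowth
{
    // Counts how many times the list had to grow its internal array while adding `count` items.
    public static int CountResizes(int count, int initialCapacity)
    {
        var list = new List<int>(initialCapacity);
        int resizes = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // A new, larger array was allocated and the items were copied into it.
                resizes++;
                lastCapacity = list.Capacity;
            }
        }
        return resizes;
    }
}
```

A list created with capacity 0 (the default) resizes several times while 1000 items are added. A list created with new List<int>(1000) never resizes.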

Check out this benchmark:

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

public class CapacityBenchmarks
{
    [Params(1000)] // 1000 items, as in the results below
    public int Size;

    [Benchmark]
    public void ListDynamicCapacity()
    {
        List<int> list = new List<int>(); // default capacity, grows as needed
        for (int i = 0; i < Size; i++)
        {
            list.Add(i);
        }
    }

    [Benchmark]
    public void ListPlannedCapacity()
    {
        List<int> list = new List<int>(Size); // capacity known up front: no resizing
        for (int i = 0; i < Size; i++)
        {
            list.Add(i);
        }
    }
}
```

I’m using BenchmarkDotNet here with [Host]: .NET Core 2.1.9 (CoreCLR 4.6.27414.06, CoreFX 4.6.27415.01), 64bit RyuJIT

In the first method, the List collection started with default capacity and expanded in size. In the second benchmark, I set the initial capacity to the number of items it’s going to have.

For 1000 items, the results were:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| ListDynamicCapacity | 3.415 us | 0.0687 us | 0.1240 us |
| ListPlannedCapacity | 2.422 us | 0.0219 us | 0.0183 us |

By setting the capacity, we reduced the run time by about 30%. In practice, the improvement is probably even greater. BenchmarkDotNet performs garbage collections before and after each benchmark run, so part of the cost of collecting the discarded buffers may not show up in these numbers.

I performed another benchmark for Dictionary and HashSet, with similar results:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| DictionaryDynamicCapacity | 36.693 us | 0.7505 us | 1.4637 us |
| DictionaryPlannedCapacity | 17.500 us | 0.3325 us | 0.3696 us |
| HashSetDynamicCapacity | 28.080 us | 0.4264 us | 0.3780 us |
| HashSetPlannedCapacity | 16.533 us | 0.3285 us | 0.3374 us |

2. Use ArrayPool for short-lived large arrays

Allocating arrays, and inevitably deallocating them, can be quite costly. Performing these allocations at high frequency causes GC pressure and hurts performance. An elegant solution is the System.Buffers.ArrayPool<T> class, found in the System.Buffers NuGet package.

The idea is similar to the ThreadPool. A shared pool of arrays is allocated, and you can reuse those arrays without actually allocating and deallocating memory. The basic usage is to call ArrayPool<T>.Shared.Rent(size). This returns a regular array of at least the requested length, which you can use any way you please. When finished, call ArrayPool<T>.Shared.Return(array) to return the buffer to the shared pool.
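In real code, the rent/return pair is usually wrapped in try/finally. That way the buffer goes back to the pool even if an exception is thrown. Below is a minimal sketch built around a hypothetical SumOfFirst helper. Rent can return an array longer than requested, so the code only uses the length it actually asked for.

```csharp
using System;
using System.Buffers;

public static class PooledBufferExample
{
    // Hypothetical helper: fills a temporary buffer with 0..count-1 and returns the sum.
    public static long SumOfFirst(int count)
    {
        ArrayPool<int> pool = ArrayPool<int>.Shared;
        // The rented array may be longer than `count`, so only the first `count` slots are used.
        int[] buffer = pool.Rent(count);
        try
        {
            for (int i = 0; i < count; i++)
            {
                buffer[i] = i;
            }

            long sum = 0;
            for (int i = 0; i < count; i++)
            {
                sum += buffer[i];
            }
            return sum;
        }
        finally
        {
            // Always hand the buffer back, even if the code above throws.
            pool.Return(buffer);
        }
    }
}
```

Return also has an optional clearArray parameter. Passing clearArray: true wipes the buffer before it goes back to the pool, which matters if the buffer held sensitive data.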

Here’s a benchmark showing this:

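```csharp
using System.Buffers;
using BenchmarkDotNet.Attributes;

public class ArrayPoolBenchmarks
{
    [Params(100, 1000)] // the two sizes measured below
    public int ArraySize;

    [Benchmark]
    public void RegularArray()
    {
        int[] array = new int[ArraySize]; // new heap allocation on every call
    }

    [Benchmark]
    public void SharedArrayPool()
    {
        var pool = ArrayPool<int>.Shared;
        int[] array = pool.Rent(ArraySize); // reuses a pooled buffer
        pool.Return(array);
    }
}
```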

For 100 integers the results are:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| RegularArray | 41.23 ns | 0.8544 ns | 2.236 ns |
| SharedArrayPool | 47.42 ns | 0.9781 ns | 1.087 ns |

Pretty similar, but when running for 1,000 integers:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| RegularArray | 404.53 ns | 8.074 ns | 18.872 ns |
| SharedArrayPool | 51.71 ns | 1.354 ns | 1.505 ns |

As you can imagine, the ArrayPool allocation time stays the same, whereas regular allocation time increases as the size grows.

Much like the ThreadPool with threads, the ArrayPool should be used for short-lived large arrays. For more info on the ArrayPool, read Adam Sitnik’s excellent blog post.

3. Use Structs instead of Classes (sometimes)

Structs have several benefits when it comes to deallocation:

When structs are not part of a class, they are allocated on the Stack and don't require garbage collection at all (they are released by stack unwinding).

Structs are stored on the heap when they are part of a class (or any reference type). In that case, they are stored inline and are deallocated when the containing type is deallocated. Inline means the struct's data is stored as-is. A reference type, by contrast, stores a pointer to another heap location that holds the actual data. This is especially meaningful in collections: a collection of structs is much cheaper to deallocate because it's just one buffer of memory.

Structs take less memory than a reference type because they don't have an object header and a MethodTable pointer.

In most cases, you will want to use classes. Use structs when all of the following are true (full guidelines from Microsoft):

The struct size is 16 bytes or less (e.g., 4 integers). Beyond that size, classes are usually more efficient than structs.

The struct is short-lived.

The struct is immutable.

The struct will not have to be boxed frequently.

In addition, structs are passed by value, so a struct passed as a method parameter is copied entirely. For large structs, this copying is expensive and can hurt performance instead of improving it.
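Here are the copy semantics in concrete form. A struct parameter receives its own copy, so changes inside the method don't reach the caller, and every call pays for that copy. For structs larger than a few fields, the in modifier (C# 7.2 and later) passes a read-only reference instead. The types below are hypothetical.

```csharp
public struct MutablePoint
{
    public int X;
    public int Y;
}

// `readonly` guarantees immutability and lets the compiler avoid hidden defensive copies.
public readonly struct ReadOnlyPoint
{
    public ReadOnlyPoint(int x, int y) { X = x; Y = y; }
    public int X { get; }
    public int Y { get; }
}

public static class StructCopyDemo
{
    // Receives a full copy of the caller's struct.
    public static void MoveRight(MutablePoint p)
    {
        p.X += 10; // modifies the local copy only
    }

    // `in` passes a read-only reference, so no copy of the struct is made.
    public static int Sum(in ReadOnlyPoint p)
    {
        return p.X + p.Y;
    }
}
```

After var p = new MutablePoint { X = 1 }; StructCopyDemo.MoveRight(p);, p.X is still 1, because the method only changed its own copy.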

Here’s a benchmark that shows how efficient allocating structs can be:

```csharp
using BenchmarkDotNet.Attributes;

class VectorClass
{
    public int X { get; set; }
    public int Y { get; set; }
}

struct VectorStruct
{
    public int X { get; set; }
    public int Y { get; set; }
}

public class StructVsClassBenchmarks
{
    private const int ITEMS = 10000;

    [Benchmark]
    public void WithClass()
    {
        VectorClass[] vectors = new VectorClass[ITEMS];
        for (int i = 0; i < ITEMS; i++)
        {
            vectors[i] = new VectorClass(); // one separate heap object per element
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }

    [Benchmark]
    public void WithStruct()
    {
        VectorStruct[] vectors = new VectorStruct[ITEMS];
        // At this point all the vector instances are already allocated with default values
        for (int i = 0; i < ITEMS; i++)
        {
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }
}
```

Result:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| WithClass | 77.97 us | 1.5528 us | 2.6785 us |
| WithStruct | 12.97 us | 0.2564 us | 0.6094 us |

As you can see, allocating structs is about 6 times faster than allocating classes.

4. Avoid Finalizers

Finalizers in C# are very expensive for several reasons:

An object with a finalizer survives at least one extra garbage collection. When it becomes unreachable, it is kept alive until its finalizer has run, so the GC promotes it to the next generation instead of reclaiming it. This means it can't be collected in Gen 0, which is the fastest generation.

Objects waiting to be finalized go into a finalization queue, which is processed by a single dedicated finalizer thread. This can cause problems if a finalizer runs for a long time or throws an exception.

To prove how terrible finalizers can be for performance, consider the following benchmark:

```csharp
using BenchmarkDotNet.Attributes;

class Simple
{
    public int X { get; set; }
}

class SimpleWithFinalizer
{
    ~SimpleWithFinalizer() { } // even an empty finalizer has a cost
    public int X { get; set; }
}

public class FinalizerBenchmarks
{
    private int ITEMS = 100000;
    private static Simple _instance1;
    private static SimpleWithFinalizer _instance2;

    [Benchmark]
    public void AllocateSimple()
    {
        for (int i = 0; i < ITEMS; i++)
        {
            _instance1 = new Simple();
        }
    }

    [Benchmark]
    public void AllocateSimpleWithFinalizer()
    {
        for (int i = 0; i < ITEMS; i++)
        {
            _instance2 = new SimpleWithFinalizer();
        }
    }
}
```

The result for 100,000 items is:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| AllocateSimple | 409.9 us | 9.063 us | 17.24 us |
| AllocateSimpleWithFinalizer | 128,796.8 us | 2,520.871 us | 2,588.75 us |

The measuring unit ‘us’ stands for microseconds: 1000 us = 1 millisecond.

As you can see, allocating the class with a finalizer was about 314 times slower than allocating the class without one (roughly a 1:314 ratio).

Sometimes, finalizers are unavoidable. For example, they are often used in the Dispose Pattern. In such cases, make sure to suppress finalization once it's no longer required, like this:

```csharp
public void Dispose()
{
    Dispose(true);             // the actual dispose functionality
    GC.SuppressFinalize(this); // now, the finalizer won't be called
}
```
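For context, here is a minimal sketch of the full Dispose pattern around that method. It assumes a hypothetical UnmanagedBuffer class that owns a block of native memory from Marshal.AllocHGlobal. The finalizer only runs as a safety net when Dispose was never called. Calling Dispose frees the memory and removes the object from finalization. In newer code, wrapping native handles in a SafeHandle subclass is often preferable to writing a finalizer yourself.

```csharp
using System;
using System.Runtime.InteropServices;

public class UnmanagedBuffer : IDisposable
{
    private IntPtr _handle; // native memory the GC knows nothing about
    private bool _disposed;

    public UnmanagedBuffer(int size)
    {
        _handle = Marshal.AllocHGlobal(size);
    }

    public bool IsDisposed => _disposed;

    public void Dispose()
    {
        Dispose(true);             // the actual cleanup
        GC.SuppressFinalize(this); // cleanup done: the finalizer won't be called
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return; // calling Dispose twice is a no-op

        if (disposing)
        {
            // Release managed IDisposable members here (none in this sketch).
        }

        // Unmanaged memory is released on both paths.
        Marshal.FreeHGlobal(_handle);
        _handle = IntPtr.Zero;
        _disposed = true;
    }

    // Safety net: runs only if Dispose was never called.
    ~UnmanagedBuffer()
    {
        Dispose(false);
    }
}
```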

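5. Use stackalloc for short-lived array allocations

The stackalloc keyword in C# allocates memory on the stack, which makes both allocation and deallocation very fast. The memory is released automatically when the method returns, without involving the garbage collector. Only unmanaged types are allowed, so classes won't work, but primitives and structs without reference-type fields are supported. Here's an example benchmark: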

```csharp
using System;
using BenchmarkDotNet.Attributes;

struct VectorStruct
{
    public int X { get; set; }
    public int Y { get; set; }
}

public class StackAllocBenchmarks
{
    [Benchmark]
    public void WithNew()
    {
        VectorStruct[] vectors = new VectorStruct[5];
        for (int i = 0; i < 5; i++)
        {
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }

    [Benchmark]
    public unsafe void WithStackAlloc() // Note that an unsafe context is required (AllowUnsafeBlocks)
    {
        VectorStruct* vectors = stackalloc VectorStruct[5];
        for (int i = 0; i < 5; i++)
        {
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }

    [Benchmark]
    public void WithStackAllocSpan() // When using Span, no unsafe context is needed
    {
        Span<VectorStruct> vectors = stackalloc VectorStruct[5];
        for (int i = 0; i < 5; i++)
        {
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }
}
```

The results are:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| WithNew | 10.372 ns | 0.1531 ns | 0.1432 ns |
| WithStackAlloc | 5.704 ns | 0.0938 ns | 0.0831 ns |
| WithStackAllocSpan | 5.742 ns | 0.0965 ns | 0.1021 ns |

stackalloc is almost twice as fast as regular instantiation. When increasing the number of items from 5 to 100, the difference is even greater: 82 ns vs. 36 ns.

Prefer Span<T> over a pointer, since it doesn't require an unsafe context.

Learn more about stackalloc here.
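Stack space is limited, so a common approach is to stackalloc only small buffers and fall back to a heap array for larger ones. Below is a minimal sketch. The 256-element threshold and the SumOfSquares helper are hypothetical choices for illustration. Using stackalloc inside a conditional expression like this requires C# 8.0 or later.

```csharp
using System;

public static class StackAllocWithFallback
{
    // Hypothetical threshold: up to 256 ints (1 KB) go on the stack,
    // anything larger falls back to a normal heap array to avoid a stack overflow.
    private const int MaxStackElements = 256;

    // Hypothetical example: sums i * i for i in [0, count) using a temporary buffer.
    public static int SumOfSquares(int count)
    {
        Span<int> buffer = count <= MaxStackElements
            ? stackalloc int[count]
            : new int[count];

        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = i * i;
        }

        int sum = 0;
        foreach (int value in buffer)
        {
            sum += value;
        }
        return sum;
    }
}
```

Both branches produce a Span<int>, so the rest of the method doesn't care where the memory came from.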

6. Use StringBuilder, but not always

Strings are immutable: once created, they cannot change. Any concatenation like str1 = str1 + str2 allocates a new string object. The StringBuilder class was created to avoid these new allocations and improve performance.

I recently wrote a blog post on StringBuilder performance and found out that things were not as simple as they might seem. Here’s the summary of my research:

Regular concatenations are more efficient than StringBuilder for a small number of concatenations. Depending on string sizes, StringBuilder becomes more efficient beyond 10-15 concatenations.

StringBuilder can be optimized by setting its initial capacity.

StringBuilder can be optimized by reusing the same instance. This can make a difference for very frequent usage, like logging.

For more information, read the full article: Challenging the C# StringBuilder Performance
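The last two points can be combined: keep one StringBuilder with a reasonable initial capacity, and call Clear() before each use. Below is a minimal sketch using a hypothetical LineFormatter for log lines. The builder is shared state, so the class is not thread-safe.

```csharp
using System.Text;

// Hypothetical log-line formatter; each instance reuses a single StringBuilder.
// Not thread-safe: use one instance per thread (or add locking) for concurrent callers.
public sealed class LineFormatter
{
    // Initial capacity chosen up front so typical lines don't force the builder to grow.
    private readonly StringBuilder _builder = new StringBuilder(capacity: 256);

    public string Format(string level, string message)
    {
        _builder.Clear(); // empties the builder so the same instance can be reused
        _builder.Append('[')
                .Append(level)
                .Append("] ")
                .Append(message);
        return _builder.ToString();
    }
}
```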

7. Use String Interning in very specific cases

About 60% of the human body is water. Similarly, about 70% of a .NET application is strings. This makes optimizing strings one of the most important aspects of memory management.

The .NET runtime has a hidden optimization. For literal strings with the same value, it uses the same reference. For example, consider the following code:

```csharp
string a = "Table";
string b = "Table";
```

It seems like a and b would be 2 different objects. But the CLR allocates just 1 object, which both a and b reference. This optimization is called String Interning, and it has 2 positive side effects:

You save memory by using just 1 object.

It's cheaper to compare the strings. A comparison first checks for reference equality. Since a and b reference the same object, it returns true without checking the string contents.

This optimization is done only for string literals, for example when you write something like string myString = "Something";. It's not done for strings that are computed at run time, because interning is expensive. Each new string would have to be hashed and looked up in the runtime's intern pool to find an identical one. Paying that cost on every string creation isn't worth it, so the runtime just doesn't do it.
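You can check the difference between literals and run-time strings with object.ReferenceEquals. The sketch below uses a hypothetical InterningDemo helper. The two literals share one object, but a string built at run time from a char array does not. string.Intern, covered next, maps equal run-time strings back to a single instance.

```csharp
public static class InterningDemo
{
    // Builds "Table" at run time, so the result is a new object, not the interned literal.
    private static string BuildAtRuntime() => new string(new[] { 'T', 'a', 'b', 'l', 'e' });

    public static bool LiteralsShareReference()
    {
        string a = "Table";
        string b = "Table";
        return object.ReferenceEquals(a, b); // equal literals refer to the same instance
    }

    public static bool RuntimeStringSharesReference()
    {
        string a = "Table";
        string c = BuildAtRuntime();
        return object.ReferenceEquals(a, c); // equal contents, but different objects
    }

    public static bool InternedCopiesShareReference()
    {
        // string.Intern returns the single pooled instance for a given value.
        string first = string.Intern(BuildAtRuntime());
        string second = string.Intern(BuildAtRuntime());
        return object.ReferenceEquals(first, second);
    }
}
```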

As it happens, you can intern strings manually with the string.Intern(string) method. You can check whether a string is already interned with string.IsInterned(string), which returns the interned instance or null. In very specific cases, you can use this for optimization. Here's one example:

```csharp
using BenchmarkDotNet.Attributes;

public class InterningBenchmarks
{
    [Params(100, 10000)] // the two sizes measured below
    public int Size;

    private string _part1 = "Hello";
    private string _part2 = " World";

    [Benchmark]
    public void WithoutInterning()
    {
        string s1 = GetNonLiteral();
        string s2 = GetNonLiteral(); // equal contents, but a different object than s1
        for (int i = 0; i < Size; i++)
        {
            bool x = s1.Equals(s2); // has to compare the characters
        }
    }

    [Benchmark]
    public void WithInterning()
    {
        string s1 = string.Intern(GetNonLiteral());
        string s2 = string.Intern(GetNonLiteral()); // same object as s1
        for (int i = 0; i < Size; i++)
        {
            bool x = s1.Equals(s2); // succeeds on the reference check
        }
    }

    private string GetNonLiteral()
    {
        return _part1 + _part2; // computed at run time, so not interned automatically
    }
}
```

For 100 items this benchmark will return:

| Method | Mean | Error | StdDev | Median |
|---|---|---|---|---|
| WithoutInterning | 198.3 ns | 3.986 ns | 10.776 ns | 201.5 ns |
| WithInterning | 424.4 ns | 8.426 ns | 8.653 ns | 421.0 ns |

And for 10,000 items:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| WithoutInterning | 68.06 us | 0.6225 us | 0.5198 us |
| WithInterning | 16.11 us | 0.3288 us | 0.3075 us |

As you can see, this can be very effective when the number of comparisons is much larger than the number of intern operations. Such cases are very rare. If you do consider interning, benchmark first to make sure you are actually optimizing anything.

Note that an interned string will never be garbage collected. It might make more sense to create a local string-pool of your own. You can see Jon Skeet’s answer on StackOverflow where he explains this point further and even shows an implementation example.
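The local string-pool idea can be sketched with a Dictionary<string, string> that maps each value to the first instance seen. This is a minimal illustration using a hypothetical StringPool class, not the implementation from that answer. It isn't thread-safe. Unlike string.Intern, the whole pool and its strings become collectable once the pool itself is no longer referenced.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical local string pool: equal strings map to one shared instance.
public sealed class StringPool
{
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public int Count => _pool.Count;

    public string Intern(string value)
    {
        if (_pool.TryGetValue(value, out string existing))
        {
            return existing; // reuse the instance we saw first
        }
        _pool.Add(value, value);
        return value;
    }
}
```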

8. Avoid memory leaks

Memory leaks are a constant troublemaker in any big application. Besides the obvious danger of an eventual out-of-memory exception, memory leaks also cause GC Pressure and performance issues. Here’s how:

With a memory leak, objects remain referenced even when they are effectively unused. While they are referenced, the garbage collector keeps promoting them to higher generations instead of collecting them. These promotions are expensive and add work for the GC.

Memory leaks cause more memory to be in use. This means you will run out of free space quicker, causing the GC to do more frequent collections.
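One common way objects stay referenced without being used is an event subscription. A long-lived publisher's event holds a delegate that points back at the subscriber, so the subscriber can't be collected. The sketch below uses hypothetical Publisher and Subscriber classes to show the leak-prone subscription and the unsubscribe that fixes it.

```csharp
using System;

// Hypothetical long-lived object, e.g. an app-wide service.
public class Publisher
{
    public event EventHandler Changed;

    public int SubscriberCount => Changed?.GetInvocationList().Length ?? 0;

    public void RaiseChanged()
    {
        Changed?.Invoke(this, EventArgs.Empty);
    }
}

// Hypothetical short-lived object that listens to the publisher.
public class Subscriber : IDisposable
{
    private readonly Publisher _publisher;

    public int Notifications { get; private set; }

    public Subscriber(Publisher publisher)
    {
        _publisher = publisher;
        // From now on, the publisher's delegate keeps this subscriber reachable.
        _publisher.Changed += OnChanged;
    }

    private void OnChanged(object sender, EventArgs e)
    {
        Notifications++;
    }

    // Without unsubscribing, the subscriber lives as long as the publisher does.
    public void Dispose()
    {
        _publisher.Changed -= OnChanged;
    }
}
```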

Memory leaks are a huge subject. Here are 2 resources you can take advantage of to learn more:

Summary

I hope you got value from the mentioned tips and tricks. You probably noticed that all of the above optimizations make use of one or more of these core concepts:

Allocations should be avoided if possible.

Reusing memory is better than allocating new memory.

Allocating on the Stack is faster than allocating on the Heap.

These are not the only concepts in performance optimizations, but probably the most important ones when it comes to GC pressure.

Happy coding.