A common execution path optimization

Sergey

February 21st, 2017

Today I want to talk about one interesting optimization pattern that you may face in framework code or in high-performance libraries.

The idea is simple: suppose you have a commonly used method that has two execution paths – one is very common and simple, and the second one takes longer to execute, involves more steps, but happens much less often.

As an example, let’s consider the List&lt;T&gt;.Add method. On the “happy path”, Add is very simple and efficient: if there is enough space, it adds an item to an internal array. But if the internal array is full, Add resizes the underlying array (by allocating a new array of double the size) and then adds the element to it.

Here is a simple implementation of this method:

```csharp
public void Add(T item)
{
    if (_size < _items.Length)
    {
        // Common path: the array has enough space
        _items[_size++] = item;
        return;
    }

    // Corner case: need to resize the array
    int newCapacity = _items.Length == 0 ? _defaultCapacity : _items.Length * 2;
    T[] newItems = new T[newCapacity];
    if (_size > 0)
    {
        Array.Copy(_items, 0, newItems, 0, _size);
    }

    _items = newItems;
    _items[_size++] = item;
}
```

Unfortunately, this implementation is not eligible for method inlining by the JIT compiler, because it is too large. Even if we try to “force” inlining by using MethodImplOptions.AggressiveInlining, nothing good will happen: as we’ll see at the end of the post, inlining big methods doesn’t improve performance.

There is no official documentation regarding the JIT compiler’s inlining heuristics, but there are enough unofficial sources (like blog posts) that cover the current behavior. The JIT compiler won’t inline a method in the following cases:

Method is marked with MethodImplOptions.NoInlining

Method body is larger than 32 bytes of IL code

Method is virtual, including interface method implementations

Method has complex flow control, like switch or while

Method has exception handling logic

The Add method shown above falls under the second rule and won’t be inlined due to its size.
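Some of these rules can be influenced explicitly via MethodImplAttribute. A minimal sketch (the method bodies are made up for illustration):

```csharp
using System;
using System.Runtime.CompilerServices;

public static class InliningHints
{
    // Never inlined, regardless of how small the body is.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int NotInlined(int x) => x + 1;

    // Lifts the IL-size restriction, but the JIT may still refuse
    // to inline (e.g. for virtual calls or exception handling).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int MaybeInlined(int x) => x + 1;
}
```

Note that both options are hints about eligibility: the attribute changes which heuristics apply, but the JIT still makes the final call.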

We know that the method has two cases: a lightweight common path, when the list has enough space, and a rare case, when the list should be resized. Based on this knowledge we can extract the logic for the rare case into a separate method and leave the happy-path code as is:

```csharp
public void Add(T item)
{
    if (_size < _items.Length)
    {
        // Common path: the array has enough space
        _items[_size++] = item;
        return;
    }

    // Rare case: need to resize the array
    AddItemSlow(item);
}

private void AddItemSlow(T item)
{
    int newCapacity = _items.Length == 0 ? _defaultCapacity : _items.Length * 2;
    T[] newItems = new T[newCapacity];
    if (_size > 0)
    {
        Array.Copy(_items, 0, newItems, 0, _size);
    }

    _items = newItems;
    _items[_size++] = item;
}
```

The change is very simple, but it makes the method small enough for method inlining by the JIT compiler. We can check the difference using the amazing benchmarking tool – BenchmarkDotNet:

```csharp
const int DefaultCapacity = 100;
const int NumberOfElements = 500000;

[Benchmark]
public void InlinableList()
{
    var list = new OptimizedList<int>(DefaultCapacity);
    for (int i = 0; i < NumberOfElements; i++)
    {
        list.Add(i);
    }
}

[Benchmark]
public void NonInlinableList()
{
    var list = new NonOptimizedList<int>(DefaultCapacity);
    for (int i = 0; i < NumberOfElements; i++)
    {
        list.Add(i);
    }
}
```

           Method |      Mean |    StdErr |
----------------- |---------- |---------- |
    InlinableList | 3.3074 ms | 0.0373 ms |
 NonInlinableList | 3.9557 ms | 0.0097 ms |

As you can see, the difference is pretty significant: about 20% for a method that just adds 500,000 elements. You may wonder why the difference is so big. The worst-case complexity of a single Add call is O(N), but the amortized complexity is O(1): the underlying array grows exponentially, so in the vast majority of cases adding an element to the list is so cheap that the method-call overhead starts playing a noticeable role.
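The amortized O(1) claim is easy to check by counting element copies. The sketch below (a hypothetical helper, not part of the benchmark code) simulates the doubling strategy and counts how many elements Array.Copy would move in total:

```csharp
using System;

public static class GrowthCost
{
    // Counts the total element copies performed by the doubling
    // strategy while appending 'n' items to a list that starts
    // with the given capacity.
    public static long CountCopies(int capacity, int n)
    {
        long copies = 0;
        int size = 0;
        for (int i = 0; i < n; i++)
        {
            if (size == capacity)
            {
                copies += size;  // Array.Copy moves 'size' elements
                capacity *= 2;
            }
            size++;
        }
        return copies;
    }
}
```

Because each resize doubles the capacity, the resize sizes form a geometric series, and CountCopies(100, 500000) stays below 2 × 500,000: fewer than two copies per appended element on average, which is exactly what amortized O(1) means here.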

In Visual Studio, you can check that the JIT compiler actually inlines the method. To do that, add a breakpoint to both benchmark methods, run the app in Release mode, and then switch to the Disassembly window (Debug => Windows => Disassembly). But be aware that you need to disable two options in the Tools => Options => Debugging => General menu: ‘Suppress JIT optimization on module load’ and ‘Enable Just My Code’.

When to use the pattern?

This is not a common pattern that would be helpful in every application. It may be useful for extremely hot paths in end-user production code. However, it is more likely to be helpful for high-performance libraries and frameworks, where even a slight overhead is critical since they may be called on someone else’s hot path.

For instance, this pattern is used in the BCL and in several places in TPL Dataflow (StartTaskSafe, OfferAsyncIfNecessary, ProcessAsyncIfNecessary etc.):

```csharp
internal static Exception StartTaskSafe(Task task, TaskScheduler scheduler)
{
    if (scheduler == TaskScheduler.Default)
    {
        task.Start(scheduler);
        // We don't need to worry about scheduler exceptions
        // from the default scheduler.
        return null;
    }
    // Slow path with try/catch separated out so that
    // StartTaskSafe may be inlined in the common case.
    else return StartTaskSafeCore(task, scheduler);
}
```

You may consider using this pattern if you have a highly used member with two distinct scenarios: one fast and common, the other heavyweight and rare. If the heavyweight part is not inline-friendly due to its size or language constructs (like try/finally), then splitting the method in two will help the CLR JIT inline the common path.
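The general shape of the pattern can be sketched with a hypothetical caching member (all names here are invented for illustration; only the fast-path/slow-path split itself comes from the post):

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public sealed class SquareCache
{
    private readonly Dictionary<int, int> _cache = new Dictionary<int, int>();

    // Fast path: tiny body, a good inlining candidate.
    public int GetSquare(int x)
    {
        int result;
        if (_cache.TryGetValue(x, out result))
            return result;
        return GetSquareSlow(x);  // rare case lives in a separate method
    }

    // Slow path: the bulky work; its size no longer affects
    // whether GetSquare itself can be inlined at call sites.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private int GetSquareSlow(int x)
    {
        int result = x * x;  // stands in for an expensive computation
        _cache[x] = result;
        return result;
    }
}
```

The NoInlining hint on the slow path is optional; the essential part is that the rare branch is a call, so the common branch stays under the JIT's size limit.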

Beware of too much inlining

You may think that method inlining is such a good thing that frameworks and libraries should force the runtime to inline as much as possible using MethodImplOptions.AggressiveInlining, which was added in .NET 4.5. First of all, the attribute only removes the restriction on method size and won’t force the runtime to inline virtual methods (even ‘sealed’ ones), or methods with complicated control flow or exception handling.

But the main reason for not using the attribute without careful measurement is speed. Method inlining is a double-edged sword: it removes call overhead and reduces the number of executed instructions, but it can make the resulting code bigger.

For instance, we can force method inlining for our original implementation of the Add method, but it won’t yield any performance benefits. Actually, the overall performance would be slightly worse (for more information see “To Inline or not to Inline: That is the question” by Vance Morrison):

                             Method |      Mean |    StdErr |    StdDev |    Median |
----------------------------------- |---------- |---------- |---------- |---------- |
                   NonInlinableList | 3.7030 ms | 0.0116 ms | 0.0402 ms | 3.6858 ms |
 OriginalListWithAggressiveInlining | 3.8335 ms | 0.0381 ms | 0.2347 ms | 3.7086 ms |
                      InlinableList | 3.3077 ms | 0.0457 ms | 0.2238 ms | 3.1836 ms |

Additional links and resources