The scripting virtual machine team at Unity is always looking for ways to make your code run faster. This is the first post in a three-part miniseries about a few micro-optimizations performed by the IL2CPP AOT compiler, and how you can take advantage of them. While nothing here will make code run two or three times as fast, these small optimizations can help in important parts of a game, and we hope they give you some insight into how your code is executing.

Modern compilers are excellent at performing many optimizations to improve run-time performance. As developers, we can often help our compilers by making what we know about the code explicit. Today we’ll explore one micro-optimization for IL2CPP in some detail, and see how it might improve the performance of your existing code.

Devirtualization

There is no other way to say it: virtual method calls are more expensive than direct method calls. We’ve been working on some performance improvements in the libil2cpp runtime library to cut back the overhead of virtual method calls (more on this in the next post), but they still require a runtime lookup of some sort. The compiler cannot know which method will be called at run time. Or can it?

Devirtualization is a common compiler optimization tactic which changes a virtual method call into a direct method call. A compiler might apply this tactic when it can prove, at compile time, exactly which method will be called. Unfortunately, this can often be difficult to prove, as the compiler does not always see the entire code base. But when it is possible, it can make virtual method calls much faster.
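To make the difference concrete, here is a minimal sketch in C++ (the language IL2CPP generates). It is not IL2CPP output, and the `Shape` and `Circle` types and both helper functions are hypothetical names chosen for illustration. The first function dispatches at run time; the second shows by hand what a devirtualized call amounts to.

```cpp
#include <string>

// Hypothetical types for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// Virtual call: the target is chosen at run time from the
// dynamic type of the object behind the reference.
std::string name_virtual(const Shape& s) {
    return s.name();
}

// Direct call: the qualified name suppresses virtual dispatch, so the
// target is fixed at compile time. This is only correct if no type
// derived from Circle overrides name(), which is exactly the fact a
// compiler has to prove before it can devirtualize on its own.
std::string name_direct(const Circle& c) {
    return c.Circle::name();
}
```

Both functions return the same string for a plain `Circle`. The second one simply skips the run-time lookup.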

The canonical example

As a young developer, I learned about virtual methods with a rather contrived animal example. This code might be familiar to you as well:

    public abstract class Animal {
      public abstract string Speak();
    }

    public class Cow : Animal {
      public override string Speak() {
        return "Moo";
      }
    }

    public class Pig : Animal {
      public override string Speak() {
        return "Oink";
      }
    }

Then in Unity (version 5.3.5) we can use these classes to make a small farm:

    using UnityEngine;

    public class Farm : MonoBehaviour {
      void Start() {
        Animal[] animals = new Animal[] { new Cow(), new Pig() };
        foreach (var animal in animals)
          Debug.LogFormat("Some animal says '{0}'", animal.Speak());

        var cow = new Cow();
        Debug.LogFormat("The cow says '{0}'", cow.Speak());
      }
    }

Here each call to Speak is a virtual method call. Let’s see if we can convince IL2CPP to devirtualize any of these method calls to improve their performance.

Generated C++ code isn’t too bad

One of the features of IL2CPP I like is that it generates C++ code instead of assembly code. Sure, this code doesn’t look like C++ code you would write by hand, but it is much easier to understand than assembly. Let’s see the generated code for the body of that foreach loop:

    // Set up a local variable to point to the animal array
    AnimalU5BU5D_t2837741914* L_5 = V_2;
    int32_t L_6 = V_3;
    int32_t L_7 = L_6;

    // Get the current animal from the array
    V_1 = ((L_5)->GetAt(static_cast<il2cpp_array_size_t>(L_7)));
    Animal_t3277885659 * L_9 = V_1;

    // Call the Speak method
    String_t* L_10 = VirtFuncInvoker0< String_t* >::Invoke(4 /* System.String AssemblyCSharp.Animal::Speak() */, L_9);

I’ve removed a bit of the generated code to simplify things. See that ugly call to Invoke? It is going to look up the proper virtual method in the vtable and then call it. This vtable lookup will be slower than a direct function call, but that is understandable. The Animal could be a Cow or a Pig, or some other derived type.
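If vtables are new to you, the following sketch shows the general shape of such a lookup. It is a deliberately simplified, hypothetical object model and not libil2cpp's actual data layout: `TypeInfo`, `Object`, `InvokeVirtual`, and the slot constant are all names I made up for this example.

```cpp
#include <string>

// Hypothetical, simplified object model; NOT libil2cpp's real layout.
struct Object;
using Method = std::string (*)(Object*);

struct TypeInfo {
    const Method* vtable;  // one function pointer per virtual slot
};

struct Object {
    const TypeInfo* type;  // every object points at its type's metadata
};

std::string Cow_Speak(Object*) { return "Moo"; }
std::string Pig_Speak(Object*) { return "Oink"; }

constexpr int kSpeakSlot = 0;  // assumed slot index for Speak

const Method kCowVTable[] = { Cow_Speak };
const Method kPigVTable[] = { Pig_Speak };
const TypeInfo kCowType = { kCowVTable };
const TypeInfo kPigType = { kPigVTable };

// Virtual invoke: load the type, load the slot, then make an
// indirect call through the function pointer found there.
std::string InvokeVirtual(int slot, Object* obj) {
    return obj->type->vtable[slot](obj);
}
```

Calling `InvokeVirtual(kSpeakSlot, &cow)` costs two dependent memory loads plus an indirect call, while calling `Cow_Speak(&cow)` directly is a plain function call that the C++ compiler is also free to inline.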

Let’s look at the generated code for the second call to Debug.LogFormat, which is more like a direct method call:

    // Create a new cow
    Cow_t1312235562 * L_14 = (Cow_t1312235562 *)il2cpp_codegen_object_new(Cow_t1312235562_il2cpp_TypeInfo_var);
    Cow__ctor_m2285919473(L_14, /*hidden argument*/NULL);
    V_4 = L_14;
    Cow_t1312235562 * L_16 = V_4;

    // Call the Speak method
    String_t* L_17 = VirtFuncInvoker0< String_t* >::Invoke(4 /* System.String AssemblyCSharp.Cow::Speak() */, L_16);

Even in this case we are still making the virtual method call! IL2CPP is pretty conservative with optimizations, preferring to ensure correctness in most cases. Since it does not do enough whole-program analysis to be sure that this can be a direct call, it opts for the safer (and slower) virtual method call.

Suppose we know that there are no other types of cows on our farm, so no type will ever derive from Cow. If we make this knowledge explicit to the compiler, we can get a better result. Let’s change the class to be defined like this:

    public sealed class Cow : Animal {
      public override string Speak() {
        return "Moo";
      }
    }

The sealed keyword tells the compiler that no one can derive from Cow (sealed could also be used directly on the Speak method). Now IL2CPP will have the confidence to make a direct method call:

    // Create a new cow
    Cow_t1312235562 * L_14 = (Cow_t1312235562 *)il2cpp_codegen_object_new(Cow_t1312235562_il2cpp_TypeInfo_var);
    Cow__ctor_m2285919473(L_14, /*hidden argument*/NULL);
    V_4 = L_14;
    Cow_t1312235562 * L_16 = V_4;

    // Look ma, no virtual call!
    String_t* L_17 = Cow_Speak_m1607867742(L_16, /*hidden argument*/NULL);

The call to Speak here will not be unnecessarily slow, since we’ve been very explicit with the compiler and allowed it to optimize with confidence.
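C++, the language IL2CPP emits, has a comparable way to state this kind of assumption: `final`, which can be placed on a whole class or on a single method, much like `sealed` in C#. The sketch below is a hand-written C++ analogue of the farm types, not IL2CPP output, and `Piglet`, `CowSays`, and `PigSays` are hypothetical names. Whether a given C++ compiler actually devirtualizes these calls is up to its optimizer; `final` only gives it permission.

```cpp
#include <string>

struct Animal {
    virtual ~Animal() = default;
    virtual std::string Speak() const = 0;
};

// final on the class: nothing can derive from Cow, so a call
// through a Cow reference can only ever reach Cow::Speak.
struct Cow final : Animal {
    std::string Speak() const override { return "Moo"; }
};

// final on one method: Pig can still be derived from, but
// Speak cannot be overridden again further down the hierarchy.
struct Pig : Animal {
    std::string Speak() const final { return "Oink"; }
};

// Allowed: Piglet derives from Pig and inherits Pig::Speak unchanged.
struct Piglet : Pig {};

std::string CowSays(const Cow& cow) { return cow.Speak(); }
std::string PigSays(const Pig& pig) { return pig.Speak(); }
```

Sealing only the method is the lighter-weight choice when other code still needs to derive from the type.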

This kind of optimization won’t make your game dramatically faster, but it is good practice to express any assumptions you have about the code in the code itself, both for future human readers and for compilers. If you are compiling with IL2CPP, I encourage you to peruse the generated C++ code in your project and see what else you might find!

Next time we’ll discuss why virtual method calls are expensive, and what we are doing to make them faster.