This post builds on the theoretical part, “Hoisting in .NET Explained”. It contains examples of the JIT hoisting optimization with assembly listings. If you haven’t read the previous post, please take a look at it first.

Local variables, arguments and fields

This is the basic example of the hoisting optimization. I combined the local variable, argument and field access cases into a single example with a field access, because their semantics are very similar. The JIT doesn’t read the value from memory on each iteration; it loads it into a register once, before the loop.

    public class HoistingField
    {
        public int a = 123;

        public int Run()
        {
            var sum = 0;
            for (var i = 0; i < 11; i++)
                sum += a;
            return sum;
        }
    }

    00007fff`23b907b0 33c0 xor eax,eax
    00007fff`23b907b2 33d2 xor edx,edx
    ; put the field value into register
    00007fff`23b907b4 8b4908 mov ecx,dword ptr [rcx+8]
    ; iteration starts here
    00007fff`23b907b7 03c1 add eax,ecx
    00007fff`23b907b9 ffc2 inc edx
    00007fff`23b907bb 83fa0b cmp edx,0Bh
    00007fff`23b907be 7cf7 jl 00007fff`23b907b7
    ; iteration ends here
    00007fff`23b907c0 c3 ret
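At the source level, the effect is roughly the same as copying the field into a local before the loop. The following is only a sketch of that equivalence; the class name HoistingFieldManual and the method names are hypothetical and not part of the original examples.

```csharp
public class HoistingFieldManual
{
    public int a = 123;

    // The loop as written in the post: the field is read in the loop body.
    public int RunOriginal()
    {
        var sum = 0;
        for (var i = 0; i < 11; i++)
            sum += a;
        return sum;
    }

    // Roughly what the generated code does: one read before the loop,
    // then only the local (kept in a register) is used inside it.
    public int RunHoisted()
    {
        var sum = 0;
        var local = a; // corresponds to: mov ecx,dword ptr [rcx+8]
        for (var i = 0; i < 11; i++)
            sum += local;
        return sum;
    }
}
```

Both methods return the same value; the difference is only in where the read of the field happens.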

Array’s length and element

This is a classic example. We read the array’s Length property and an element at a fixed index (arr[1]) inside a loop. The JIT is smart enough to move those reads outside the loop.

    public class HoistingArray
    {
        public int Run(int[] arr)
        {
            var sum = 0;
            for (var i = 0; i < arr.Length; i++)
                sum += arr[1] + arr[i];
            return sum;
        }
    }

    00007fff`23bb0580 4883ec28 sub rsp,28h
    00007fff`23bb0584 33c0 xor eax,eax
    00007fff`23bb0586 33c9 xor ecx,ecx
    ; put the array's length into register
    00007fff`23bb0588 448b4208 mov r8d,dword ptr [rdx+8]
    00007fff`23bb058c 4585c0 test r8d,r8d
    00007fff`23bb058f 7e1f jle 00007fff`23bb05b0
    00007fff`23bb0591 4183f801 cmp r8d,1
    00007fff`23bb0595 761e jbe 00007fff`23bb05b5
    ; put the array's second element, arr[1], into register
    00007fff`23bb0597 448b4a14 mov r9d,dword ptr [rdx+14h]
    ; iteration starts here
    00007fff`23bb059b 4103c1 add eax,r9d
    00007fff`23bb059e 4c63d1 movsxd r10,ecx
    00007fff`23bb05a1 468b549210 mov r10d,dword ptr [rdx+r10*4+10h]
    00007fff`23bb05a6 4103c2 add eax,r10d
    00007fff`23bb05a9 ffc1 inc ecx
    00007fff`23bb05ab 443bc1 cmp r8d,ecx
    00007fff`23bb05ae 7feb jg 00007fff`23bb059b
    ; iteration ends here
    00007fff`23bb05b0 4883c428 add rsp,28h
    00007fff`23bb05b4 c3 ret
    00007fff`23bb05b5 e84e14a85f call clr!TranslateSecurityAttributes+0x900d4 (00007fff`83631a08) (JitHelp: CORINFO_HELP_RNGCHKFAIL)
    00007fff`23bb05ba cc int 3
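The listing can be read back into source form as follows. This is a sketch of the shape of the generated code, not the JIT's actual intermediate representation; HoistingArrayManual and its static methods are hypothetical names introduced for illustration.

```csharp
public static class HoistingArrayManual
{
    // The loop as written in the post.
    public static int RunOriginal(int[] arr)
    {
        var sum = 0;
        for (var i = 0; i < arr.Length; i++)
            sum += arr[1] + arr[i];
        return sum;
    }

    // Hand-hoisted version that mirrors the listing.
    public static int RunHoisted(int[] arr)
    {
        var sum = 0;
        var length = arr.Length;     // mov r8d,dword ptr [rdx+8]
        if (length <= 0)             // test r8d,r8d / jle
            return sum;
        // The range check for index 1 runs once (cmp r8d,1 / jbe);
        // for a one-element array this throws, as the original loop
        // would on its first iteration.
        var second = arr[1];         // mov r9d,dword ptr [rdx+14h]
        for (var i = 0; i < length; i++)
            sum += second + arr[i];
        return sum;
    }
}
```

Note that only arr[1] is moved out of the loop; arr[i] depends on the loop counter and has to be read on every iteration.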

JIT helper method call

Actually, this is the case that drew my attention to the hoisting optimization. The JIT can hoist calls to its internal helper methods. This example is from my post “.NET Generics under the hood”. Take a look if you want to get familiar with the topic.

In this example we have a generic class, and one of its methods calls another generic method. This means that an expensive handle lookup has to be performed at run time before the method is called. Good news! The JIT can move that lookup outside the loop.

    using System.Collections.Generic;
    using System.Linq;

    public class HoistingJitHelperMethod<T>
    {
        private List<T> _list = new List<T>();

        public void Run()
        {
            for (var i = 0; i < 11; i++)
                if (_list.Any())
                    return;
        }
    }

    00007ffc`8c2e0b00 57 push rdi
    00007ffc`8c2e0b01 56 push rsi
    00007ffc`8c2e0b02 53 push rbx
    00007ffc`8c2e0b03 4883ec30 sub rsp,30h
    00007ffc`8c2e0b07 48894c2428 mov qword ptr [rsp+28h],rcx
    00007ffc`8c2e0b0c 488bf1 mov rsi,rcx
    00007ffc`8c2e0b0f 33ff xor edi,edi
    00007ffc`8c2e0b11 488b0e mov rcx,qword ptr [rsi]
    00007ffc`8c2e0b14 48ba281c328cfc7f0000 mov rdx,7FFC8C321C28h
    ; execute the expensive call only once outside the loop
    00007ffc`8c2e0b1e e80d90685f call clr!DllCanUnloadNowInternal+0x32c90 (00007ffc`eb969b30) (JitHelp: CORINFO_HELP_RUNTIMEHANDLE_CLASS)
    00007ffc`8c2e0b23 488bd8 mov rbx,rax
    ; iteration starts here
    00007ffc`8c2e0b26 488bcb mov rcx,rbx
    00007ffc`8c2e0b29 488b5608 mov rdx,qword ptr [rsi+8]
    00007ffc`8c2e0b2d e80edacc5c call System_Core_ni+0x2be540 (00007ffc`e8fae540) (System.Linq.Enumerable.Any[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1<System.__Canon>), mdToken: 0000000006000732)
    00007ffc`8c2e0b32 84c0 test al,al
    00007ffc`8c2e0b34 7408 je 00007ffc`8c2e0b3e
    00007ffc`8c2e0b36 4883c430 add rsp,30h
    00007ffc`8c2e0b3a 5b pop rbx
    00007ffc`8c2e0b3b 5e pop rsi
    00007ffc`8c2e0b3c 5f pop rdi
    00007ffc`8c2e0b3d c3 ret
    00007ffc`8c2e0b3e ffc7 inc edi
    ; 7A1200h is 8,000,000, so this listing appears to come from a build with a larger loop bound than the 11 in the source
    00007ffc`8c2e0b40 81ff00127a00 cmp edi,7A1200h
    00007ffc`8c2e0b46 7cde jl 00007ffc`8c2e0b26
    ; iteration ends here
    00007ffc`8c2e0b48 4883c430 add rsp,30h
    00007ffc`8c2e0b4c 5b pop rbx
    00007ffc`8c2e0b4d 5e pop rsi
    00007ffc`8c2e0b4e 5f pop rdi
    00007ffc`8c2e0b4f c3 ret

Static field

The JIT doesn’t hoist static field reads. The reason is unclear to me. There’s an ongoing discussion on GitHub. Multithreaded code, backward compatibility, a bug?

    public class HoistingStatic
    {
        public static int a = 123;

        public int Run()
        {
            var sum = 0;
            for (var i = 0; i < 11; i++)
                sum += a;
            return sum;
        }
    }

    00007ffa`a0520590 33c0 xor eax,eax
    00007ffa`a0520592 33d2 xor edx,edx
    ; iteration starts here
    ; read a value from the main memory
    00007ffa`a0520594 8b0dc241efff mov ecx,dword ptr [00007ffa`a041475c]
    00007ffa`a052059a 03c1 add eax,ecx
    00007ffa`a052059c ffc2 inc edx
    00007ffa`a052059e 83fa0b cmp edx,0Bh
    00007ffa`a05205a1 7cf1 jl 00007ffa`a0520594
    ; iteration ends here
    00007ffa`a05205a3 c3 ret
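If the repeated read matters in a hot loop, one option is to copy the static field into a local by hand. The sketch below assumes it is acceptable for the loop not to observe writes that other threads make to the field while it runs; HoistingStaticManual is a hypothetical name, not one of the original examples.

```csharp
public class HoistingStaticManual
{
    public static int a = 123;

    public int Run()
    {
        var sum = 0;
        // Single read of the static field, done by hand.
        // Assumption: nobody needs the loop to see concurrent updates to 'a'.
        var local = a;
        for (var i = 0; i < 11; i++)
            sum += local;
        return sum;
    }
}
```

Whether this is worth doing depends on the loop; the read that the JIT leaves in place is a single mov per iteration.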

Try catch block

The JIT doesn’t optimize a loop whose body starts with a try block.

    public class HoistingTryCatchBlock
    {
        public int Run(int a)
        {
            var sum = 0;
            for (var i = 0; i < 11; i++)
            {
                try
                {
                    sum += a;
                }
                catch
                {
                }
            }
            return sum;
        }
    }

    00007fff`23b706f0 55 push rbp
    00007fff`23b706f1 4883ec10 sub rsp,10h
    00007fff`23b706f5 488d6c2410 lea rbp,[rsp+10h]
    00007fff`23b706fa 48892424 mov qword ptr [rsp],rsp
    00007fff`23b706fe 895518 mov dword ptr [rbp+18h],edx
    00007fff`23b70701 33c0 xor eax,eax
    00007fff`23b70703 8945fc mov dword ptr [rbp-4],eax
    00007fff`23b70706 8945f8 mov dword ptr [rbp-8],eax
    ; iteration starts here
    ; read values from the stack
    00007fff`23b70709 8b45fc mov eax,dword ptr [rbp-4]
    00007fff`23b7070c 8b5518 mov edx,dword ptr [rbp+18h]
    00007fff`23b7070f 03c2 add eax,edx
    00007fff`23b70711 8945fc mov dword ptr [rbp-4],eax
    00007fff`23b70714 8b45f8 mov eax,dword ptr [rbp-8]
    00007fff`23b70717 ffc0 inc eax
    00007fff`23b70719 8945f8 mov dword ptr [rbp-8],eax
    00007fff`23b7071c 8b45f8 mov eax,dword ptr [rbp-8]
    00007fff`23b7071f 83f80b cmp eax,0Bh
    00007fff`23b70722 7ce5 jl 00007fff`23b70709
    ; iteration ends here
    00007fff`23b70724 8b45fc mov eax,dword ptr [rbp-4]
    00007fff`23b70727 488d6500 lea rsp,[rbp]
    00007fff`23b7072b 5d pop rbp
    00007fff`23b7072c c3 ret
    00007fff`23b7072d 55 push rbp
    00007fff`23b7072e 4883ec10 sub rsp,10h
    00007fff`23b70732 488b29 mov rbp,qword ptr [rcx]
    00007fff`23b70735 48892c24 mov qword ptr [rsp],rbp
    00007fff`23b70739 488d6d10 lea rbp,[rbp+10h]
    00007fff`23b7073d 488d05d0ffffff lea rax,[00007fff`23b70714]
    00007fff`23b70744 4883c410 add rsp,10h
    00007fff`23b70748 5d pop rbp
    00007fff`23b70749 c3 ret

Many exits in a loop

The JIT doesn’t hoist out of loops with many exits. It doesn’t know which branch will be executed and tries not to add extra work. In that case it optimizes only the first entry block.

    public class HoistingManyExits
    {
        public int a = 123;

        public int Run()
        {
            var sum = 0;
            for (var i = 0; i < 11; i++)
            {
                if (sum > 123)
                    return sum;
                sum += a;
            }
            return sum;
        }
    }

    00007fff`23ba0810 33c0 xor eax,eax
    00007fff`23ba0812 33d2 xor edx,edx
    ; iteration starts here
    00007fff`23ba0814 83f87b cmp eax,7Bh
    00007fff`23ba0817 7e01 jle 00007fff`23ba081a
    00007fff`23ba0819 c3 ret
    ; read the field's value from the main memory
    00007fff`23ba081a 448b4108 mov r8d,dword ptr [rcx+8]
    00007fff`23ba081e 4103c0 add eax,r8d
    00007fff`23ba0821 ffc2 inc edx
    00007fff`23ba0823 83fa0b cmp edx,0Bh
    00007fff`23ba0826 7cec jl 00007fff`23ba0814
    ; iteration ends here
    00007fff`23ba0828 c3 ret

Many variables

The following example shows a case where there aren’t enough registers and the operation isn’t expensive. The JIT doesn’t hoist the field read in this case.

    public class HoistingManyVars
    {
        public int a = 123;

        public int Run(int x1, int x2, int x3, int x4, int x5, int x6, int x7, int x8, int x9, int x10)
        {
            var sum = 0;
            for (var i = 0; i < 11; i++)
            {
                sum += x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + a;
            }
            return sum;
        }
    }

    00007fff`23b808b0 4157 push r15
    00007fff`23b808b2 4156 push r14
    00007fff`23b808b4 4154 push r12
    00007fff`23b808b6 57 push rdi
    00007fff`23b808b7 56 push rsi
    00007fff`23b808b8 55 push rbp
    00007fff`23b808b9 53 push rbx
    00007fff`23b808ba 8b442460 mov eax,dword ptr [rsp+60h]
    00007fff`23b808be 448b542468 mov r10d,dword ptr [rsp+68h]
    00007fff`23b808c3 448b5c2470 mov r11d,dword ptr [rsp+70h]
    00007fff`23b808c8 8b742478 mov esi,dword ptr [rsp+78h]
    00007fff`23b808cc 8bbc2480000000 mov edi,dword ptr [rsp+80h]
    00007fff`23b808d3 8b9c2488000000 mov ebx,dword ptr [rsp+88h]
    00007fff`23b808da 8bac2490000000 mov ebp,dword ptr [rsp+90h]
    00007fff`23b808e1 4533f6 xor r14d,r14d
    00007fff`23b808e4 4533ff xor r15d,r15d
    ; iteration starts here
    00007fff`23b808e7 4403f2 add r14d,edx
    00007fff`23b808ea 4503f0 add r14d,r8d
    00007fff`23b808ed 4503f1 add r14d,r9d
    00007fff`23b808f0 4403f0 add r14d,eax
    00007fff`23b808f3 4503f2 add r14d,r10d
    00007fff`23b808f6 4503f3 add r14d,r11d
    00007fff`23b808f9 4403f6 add r14d,esi
    00007fff`23b808fc 4403f7 add r14d,edi
    00007fff`23b808ff 4403f3 add r14d,ebx
    00007fff`23b80902 4403f5 add r14d,ebp
    ; read the field's value from the main memory
    00007fff`23b80905 448b6108 mov r12d,dword ptr [rcx+8]
    00007fff`23b80909 4503f4 add r14d,r12d
    00007fff`23b8090c 41ffc7 inc r15d
    00007fff`23b8090f 4183ff0b cmp r15d,0Bh
    00007fff`23b80913 7cd2 jl 00007fff`23b808e7
    ; iteration ends here
    00007fff`23b80915 418bc6 mov eax,r14d
    00007fff`23b80918 5b pop rbx
    00007fff`23b80919 5d pop rbp
    00007fff`23b8091a 5e pop rsi
    00007fff`23b8091b 5f pop rdi
    00007fff`23b8091c 415c pop r12
    00007fff`23b8091e 415e pop r14
    00007fff`23b80920 415f pop r15
    00007fff`23b80922 c3 ret

Math functions & double type

In the following example I wanted to check hoisting for the double type and Math functions. I expected the int Math.Abs(int x) call to be hoisted, but it wasn’t. Who can explain that? Math.Pow() isn’t hoisted either, I assume because it operates on the double type.

    using System;

    public class HoistingMath
    {
        public double Run(int a)
        {
            var sum = 0d;
            for (var i = 0; i < 11; i++)
                sum += Math.Abs(a) + Math.Pow(2, 2);
            return sum;
        }
    }

    00007fff`23b80970 57 push rdi
    00007fff`23b80971 56 push rsi
    00007fff`23b80972 53 push rbx
    00007fff`23b80973 4883ec30 sub rsp,30h
    00007fff`23b80977 c4e17829742420 vmovaps xmmword ptr [rsp+20h],xmm6
    00007fff`23b8097e c5f877 vzeroupper
    00007fff`23b80981 8bf2 mov esi,edx
    00007fff`23b80983 c4e14957f6 vxorpd xmm6,xmm6,xmm6
    00007fff`23b80988 33ff xor edi,edi
    ; iteration starts here
    00007fff`23b8098a 85f6 test esi,esi
    00007fff`23b8098c 7c04 jl 00007fff`23b80992
    00007fff`23b8098e 8bde mov ebx,esi
    00007fff`23b80990 eb09 jmp 00007fff`23b8099b
    00007fff`23b80992 8bce mov ecx,esi
    ; call Math.Abs()
    00007fff`23b80994 e8a7c0545e call mscorlib_ni+0x45ca40 (00007fff`820cca40) (System.Math.AbsHelper(Int32), mdToken: 0000000006000f17)
    00007fff`23b80999 8bd8 mov ebx,eax
    00007fff`23b8099b c4e17b100544000000 vmovsd xmm0,qword ptr [00007fff`23b809e8]
    00007fff`23b809a4 c4e17b100d43000000 vmovsd xmm1,qword ptr [00007fff`23b809f0]
    ; call Math.Pow()
    00007fff`23b809ad e86e45d15f call clr!NGenCreateNGenWorker+0xa7880 (00007fff`83894f20) (System.Math.Pow(Double, Double), mdToken: 0000000006000f10)
    00007fff`23b809b2 c4e17057c9 vxorps xmm1,xmm1,xmm1
    00007fff`23b809b7 c4e1732acb vcvtsi2sd xmm1,xmm1,ebx
    00007fff`23b809bc c4e17b58c1 vaddsd xmm0,xmm0,xmm1
    00007fff`23b809c1 c4e14b58f0 vaddsd xmm6,xmm6,xmm0
    00007fff`23b809c6 ffc7 inc edi
    00007fff`23b809c8 83ff0b cmp edi,0Bh
    00007fff`23b809cb 7cbd jl 00007fff`23b8098a
    ; iteration ends here
    00007fff`23b809cd c4e17828c6 vmovaps xmm0,xmm6
    00007fff`23b809d2 c5f877 vzeroupper
    00007fff`23b809d5 c4e17828742420 vmovaps xmm6,xmmword ptr [rsp+20h]
    00007fff`23b809dc 4883c430 add rsp,30h
    00007fff`23b809e0 5b pop rbx
    00007fff`23b809e1 5e pop rsi
    00007fff`23b809e2 5f pop rdi
    00007fff`23b809e3 c3 ret
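Since neither call is hoisted here, the loop-invariant expression can be evaluated once by hand. A minimal sketch; HoistingMathManual is a hypothetical name, and the version below is a manual rewrite rather than anything the JIT produces.

```csharp
using System;

public class HoistingMathManual
{
    public double Run(int a)
    {
        var sum = 0d;
        // Both calls depend only on values that don't change inside the loop,
        // so the whole term is computed once before it.
        var term = Math.Abs(a) + Math.Pow(2, 2);
        for (var i = 0; i < 11; i++)
            sum += term;
        return sum;
    }
}
```

Because the loop always runs at least once, evaluating the term up front doesn't change when an exception from Math.Abs(int.MinValue) would surface.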

Not a “do-while” loop

In the following example the JIT isn’t sure that the loop body will be executed at all. The JIT tries to optimize the path that will definitely be executed, so it doesn’t perform an unnecessary read from main memory before the loop.

    public class HoistingNotDoWhile
    {
        public int a = 123;

        // ShouldContinue() is a method of this class; its body isn't shown in this post.
        public int Run()
        {
            var sum = 0;
            for (; ShouldContinue();)
            {
                sum += a;
            }
            return sum;
        }
    }

    00007fff`23ba0a40 57 push rdi
    00007fff`23ba0a41 56 push rsi
    00007fff`23ba0a42 4883ec28 sub rsp,28h
    00007fff`23ba0a46 488bf1 mov rsi,rcx
    00007fff`23ba0a49 33ff xor edi,edi
    00007fff`23ba0a4b 488bce mov rcx,rsi
    00007fff`23ba0a4e e835f8ffff call 00007fff`23ba0288 (HoistingInDotNetExamples.HoistingNotDoWhile.ShouldContinue(), mdToken: 000000000600000a)
    00007fff`23ba0a53 84c0 test al,al
    00007fff`23ba0a55 7411 je 00007fff`23ba0a68
    ; iteration starts here
    ; read the field's value from the main memory
    00007fff`23ba0a57 8b4e08 mov ecx,dword ptr [rsi+8]
    00007fff`23ba0a5a 03f9 add edi,ecx
    00007fff`23ba0a5c 488bce mov rcx,rsi
    00007fff`23ba0a5f e824f8ffff call 00007fff`23ba0288 (HoistingInDotNetExamples.HoistingNotDoWhile.ShouldContinue(), mdToken: 000000000600000a)
    00007fff`23ba0a64 84c0 test al,al
    00007fff`23ba0a66 75ef jne 00007fff`23ba0a57
    ; iteration ends here
    00007fff`23ba0a68 8bc7 mov eax,edi
    00007fff`23ba0a6a 4883c428 add rsp,28h
    00007fff`23ba0a6e 5e pop rsi
    00007fff`23ba0a6f 5f pop rdi
    00007fff`23ba0a70 c3 ret
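The listing checks the condition once before the loop and again at the bottom of each iteration, which has the shape of an if wrapped around a do-while. Here is a sketch of that shape at the source level. The counter-based ShouldContinue and the constructor are hypothetical stand-ins, since the original method body isn't shown in the post.

```csharp
public class HoistingNotDoWhileSketch
{
    public int a = 123;

    // Hypothetical state that drives ShouldContinue in this sketch.
    private int _remaining;

    public HoistingNotDoWhileSketch(int iterations)
    {
        _remaining = iterations;
    }

    // Hypothetical condition: true for the first 'iterations' calls.
    private bool ShouldContinue()
    {
        return _remaining-- > 0;
    }

    // Shape of the generated code: test once up front,
    // then loop with the test at the bottom.
    public int Run()
    {
        var sum = 0;
        if (ShouldContinue())
        {
            do
            {
                sum += a; // the field is still read on every iteration
            } while (ShouldContinue());
        }
        return sum;
    }
}
```

With three iterations the body runs three times; with zero iterations the first check fails and the field is never read.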

Epilogue

We’ve taken a close look at some examples of the JIT hoisting optimization. You can find those examples on GitHub. Please create a pull request if you want to add an interesting case. Thank you for your time.