Delphi XE6 introduced a new optimization for inlined functions that return a floating-point value, first noticed by dewresearch.

Here is an exploration of what was improved… and what was not improved.



When inlining was introduced in Delphi, one limitation was that functions returning a floating-point value would incur an unnecessary round-trip to the stack. For short, simple math functions, this could not just negate the benefits of inlining, but sometimes make performance worse.

With XE6, that round-trip seems to have been optimized away in some cases.

A look at the most trivial case

Here are test cases for the usual conventions for functions returning a floating point value:

function GetFloat : Double;
begin
   Result := 0;
end;

function GetFloatInline : Double; inline;
begin
   Result := 0;
end;

procedure GetFloatVar(var Result : Double);
begin
   Result := 0;
end;

procedure GetFloatVarInline(var Result : Double); inline;
begin
   Result := 0;
end;

The procedure variants traditionally offered higher performance than the function variants, by eliminating all round-trips to the stack.
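To look at the generated code yourself, a minimal console program along the following lines could be used. This is only a sketch: the program name FloatInlineDemo, the RunCalls procedure and the Writeln calls are assumptions added for illustration and are not part of the original test unit. The four routines are the ones listed above, and the local variable f mirrors the one seen in the compiler output.

```pascal
program FloatInlineDemo; // hypothetical name, not from the original test project

{$APPTYPE CONSOLE}

function GetFloat : Double;
begin
   Result := 0;
end;

function GetFloatInline : Double; inline;
begin
   Result := 0;
end;

procedure GetFloatVar(var Result : Double);
begin
   Result := 0;
end;

procedure GetFloatVarInline(var Result : Double); inline;
begin
   Result := 0;
end;

// Hypothetical helper: keeps f as a local variable on the stack,
// so each call site can be inspected in the CPU view.
procedure RunCalls;
var
   f : Double;
begin
   f := 1;
   f := GetFloat;           // non-inlined function call
   Writeln(f:0:1);

   f := 1;
   f := GetFloatInline;     // inlined function
   Writeln(f:0:1);

   f := 1;
   GetFloatVar(f);          // non-inlined procedure with a var parameter
   Writeln(f:0:1);

   f := 1;
   GetFloatVarInline(f);    // inlined procedure with a var parameter
   Writeln(f:0:1);
end;

begin
   RunCalls;
end.
```

Placing a breakpoint inside RunCalls and opening the CPU view shows the code generated for each call. The extra assignments and Writeln calls add instructions of their own, so the listing will not match the output shown in this article line for line.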

Here is the Delphi XE6 compiler output for calls to those functions. The inlined variants are nice and tight:

Unit1.pas.48: f := GetFloat;
005D7357 E8DCFFFFFF       call GetFloat
005D735C DD1C24           fstp qword ptr [esp]
005D735F 9B               wait
Unit1.pas.49: f := GetFloatInline;
005D7360 33C0             xor eax,eax
005D7362 890424           mov [esp],eax
005D7365 89442404         mov [esp+$04],eax
Unit1.pas.50: GetFloatVar(f);
005D7369 8BC4             mov eax,esp
005D736B E8DCFFFFFF       call GetFloatVar
Unit1.pas.51: GetFloatVarInline(f);
005D7370 33C0             xor eax,eax
005D7372 890424           mov [esp],eax
005D7375 89442404         mov [esp+$04],eax

By comparison, here is the Delphi XE compiler output for the GetFloatInline call. The output is unchanged for the other calls.

Unit1.pas.49: f := GetFloatInline;
004AB664 33C0             xor eax,eax
004AB666 89442408         mov [esp+$08],eax
004AB66A 8944240C         mov [esp+$0c],eax
004AB66E 8B442408         mov eax,[esp+$08]    // stack juggling
004AB672 890424           mov [esp],eax        // stack juggling
004AB675 8B44240C         mov eax,[esp+$0c]    // stack juggling
004AB679 89442404         mov [esp+$04],eax    // stack juggling

And that’s just for the call site (there is additional induced overhead in the preamble and postamble), and just for a trivial function returning a constant.

So the Delphi XE6 compiler demonstrates a clear advantage here.

What about the non-inlined functions?

Well, nothing changed, and the procedure variant still has the edge. A non-inlined function returning a float still exhibits the stack round-trip in XE6, in the same way as in Delphi XE:

Unit1.pas.24: function GetFloat : Double;
Unit1.pas.25: begin
005D7338 83C4F8           add esp,-$08
Unit1.pas.26: Result := 0;
005D733B 33C0             xor eax,eax
005D733D 890424           mov [esp],eax
005D7340 89442404         mov [esp+$04],eax
Unit1.pas.27: end;
005D7344 DD0424           fld qword ptr [esp]
005D7347 59               pop ecx
005D7348 5A               pop edx
005D7349 C3               ret

Unit1.pas.34: procedure GetFloatVar(var Result : Double);
Unit1.pas.35: begin
Unit1.pas.36: Result := 0;
005D734C 33D2             xor edx,edx
005D734E 8910             mov [eax],edx
005D7350 895004           mov [eax+$04],edx
Unit1.pas.37: end;
005D7353 C3               ret

Next: A marginally more complex case & Conclusion