The presenter, Eric Brumer (from the Visual C++ compiler team), talked, in a quite unique way, about the deep-down details of code optimization: why it is better to let the compiler do the hard work, why the new and powerful FMA (fused multiply-add) instructions can sometimes slow down your code, and how to think about code performance in general.

Summary

Visual Studio supports code generation with SIMD instructions: /arch:SSE, /arch:SSE2, /arch:AVX and /arch:AVX2. The last one will be available in VS 2013 Update 2 and on Intel Haswell chips only.
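To make "let the compiler do the hard work" a bit more concrete, here is a minimal sketch of the kind of loop an auto-vectorizer can usually handle: a simple counted loop over contiguous data with no dependency between iterations. The function name `multiply_add` is made up for this illustration and does not come from the talk. Whether the loop really gets vectorized depends on the compiler version and options, so treat this as an assumption to verify, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the talk): a[i] = x[i] * y[i] + z[i].
// The loop body is independent per element, which is what makes it a
// candidate for SIMD code generation under /arch:SSE2, /arch:AVX, etc.
void multiply_add(const std::vector<float>& x,
                  const std::vector<float>& y,
                  const std::vector<float>& z,
                  std::vector<float>& a)
{
    // All inputs must have the same length.
    assert(x.size() == y.size() && y.size() == z.size());

    a.resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        a[i] = x[i] * y[i] + z[i];
    }
}
```

With the Visual C++ command-line compiler this could be built with something like `cl /O2 /arch:AVX2 example.cpp`. Adding `/Qvec-report:2` asks the compiler to report which loops it vectorized and why others were skipped.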

Profile, profile, profile! I hear this all the time when watching or reading any presentation about performance. Maybe they are all right! :)

FMA can slow down the code!

It will be faster for a = y*x + z, but not for a = y*x + z*w.

On Intel, mul takes 5 cycles, add takes 3 cycles, and FMA takes 5 cycles.



So for the latter equation the two muls are executed in parallel and their results are then added: 8 cycles in total.



The FMA version first has to compute z*w with mul, and only then can it use FMA: 10 cycles in total.

Conclusion: be careful
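The cycle counts above can be written down as a small sketch. The latency constants are the ones quoted in this summary. The function names are invented for the illustration, and the model only counts the dependency chain, ignoring everything else the CPU does. Note also that a compiler is free to turn the plain expression into an FMA by itself, so the first function shows the source form, not a guaranteed instruction sequence.

```cpp
#include <cassert>
#include <cmath>

// Latencies in cycles, as quoted in the summary above.
constexpr int kMulLatency = 5;
constexpr int kAddLatency = 3;
constexpr int kFmaLatency = 5;

// a = y*x + z*w without FMA: the two multiplies are independent and can
// overlap, so the chain is one mul followed by one add.
constexpr int latency_without_fma() { return kMulLatency + kAddLatency; }

// With FMA: z*w has to finish first, then the FMA that consumes it runs.
constexpr int latency_with_fma() { return kMulLatency + kFmaLatency; }

static_assert(latency_without_fma() == 8, "mul || mul, then add");
static_assert(latency_with_fma() == 10, "mul, then dependent FMA");

// Source form 1: plain expression (hypothetical name).
double mul_mul_add(double y, double x, double z, double w)
{
    return y * x + z * w;
}

// Source form 2: explicit multiply followed by std::fma (hypothetical name).
double mul_then_fma(double y, double x, double z, double w)
{
    const double zw = z * w;
    return std::fma(y, x, zw);  // computes y * x + zw with a single rounding
}

// Usage sketch: for exactly representable inputs both forms agree.
inline void demo()
{
    assert(mul_mul_add(2.0, 3.0, 4.0, 5.0) == 26.0);
    assert(mul_then_fma(2.0, 3.0, 4.0, 5.0) == 26.0);
}
```

Both functions return the same value here; the difference is only in how long the dependency chain is, which is why measuring is the safe way to decide.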

256 bit code does not run 2X faster than 128 bit!

Computation and instruction execution are 2x faster, but we still need to wait for memory.



Highly efficient code is actually memory efficient code.
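One rough way to see why wider vectors do not automatically double the speed is a toy model in which a loop cannot finish faster than the slower of its compute time and its memory time. This model is my simplification, not something shown in the talk, and the function names and time units are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Toy model (an assumption, not a measurement): loop time is bounded by
// whichever is slower, the computation or the memory traffic.
double estimated_loop_time(double compute_time, double memory_time)
{
    assert(compute_time >= 0.0 && memory_time >= 0.0);
    return std::max(compute_time, memory_time);
}

// Going from 128-bit to 256-bit vectors is modeled as halving the compute
// time while the memory time stays the same.
double speedup_from_wider_vectors(double compute_time, double memory_time)
{
    const double before = estimated_loop_time(compute_time, memory_time);
    const double after = estimated_loop_time(compute_time / 2.0, memory_time);
    assert(after > 0.0);
    return before / after;
}
```

In this model, `speedup_from_wider_vectors(10.0, 1.0)` gives 2.0 because the loop is compute bound, `speedup_from_wider_vectors(10.0, 8.0)` gives only 1.25, and `speedup_from_wider_vectors(4.0, 10.0)` gives 1.0 because memory was already the limit.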

In the last part of the presentation there was an analysis of a performance bug in the Eigen3 math library.

Compiling with /arch:AVX2 (and /arch:AVX) caused a 60% slowdown on Haswell chips!



BTW: there was no difference between /arch:SSE2 and /arch:AVX on Sandy Bridge.



The problem was caused by a bottleneck in the CPU store buffer. I hadn't heard about that before, but using this thing carefully can give you a huge boost (or problems :)).



Here is a nice looking link with some more info about Store Buffers on Sandy and Haswell



CPUs are so powerful that they can 'analyze' the code, and sometimes this can introduce subtle secondary bugs like this one. You need to know your profiler tools to properly analyze such situations.

Wrap up:

Highly efficient code is actually memory efficient code.

Overall, the presentation was great!