From: Paul Turner <>
Date: Thu, 4 Jan 2018 01:10:47 -0800
Subject: [RFC] Retpoline: Binary mitigation for branch-target-injection (aka "Spectre")

Apologies for the discombobulation around today's disclosure. Obviously the

original goal was to communicate this a little more coherently, but the

unscheduled advances in the disclosure disrupted the efforts to pull this

together more cleanly.



I wanted to open discussion of the "retpoline" approach and define its

requirements so that we can separate the core

details from questions regarding any particular implementation thereof.



As a starting point, a full write-up describing the approach is available at:

https://support.google.com/faqs/answer/7625886



The 30 second version is:

Returns are a special type of indirect branch. As function returns are intended

to pair with function calls, processors often implement dedicated return stack

predictors. This choice of predictor allows us to generate an

indirect branch in which speculative execution is intentionally redirected into

a controlled location via a return stack target that we control, preventing

branch-target injection (also known as "Spectre") against these binaries.



On the targets (Intel Xeon) we have measured so far, the cost is within a few

cycles of a "native" indirect branch for which branch prediction hardware has
been disabled.

This is unfortunately measurable -- from 3 cycles on average to about 30.

However, the cost is largely mitigated for many workloads since the kernel uses

comparatively few indirect branches (versus say, a C++ binary). With some

effort we have the average overall overhead within the 0-1.5% range for our

internal workloads, including some particularly demanding packet-processing engines.



There are several components, the majority of which are independent of kernel

modifications:



(1) A compiler supporting retpoline transformations.

(1a) Optionally: annotations for hand-coded indirect jmps, so that they may be

made compatible with (1).

[ Note: The only known indirect jmp which is not safe to convert is the

early virtual address check in head entry. ]

(2) Kernel modifications for preventing return-stack underflow (see document

above).

The key points where this occurs are:

- Context switches (into protected targets)

- Interrupt return (we return into potentially unwinding execution)

- Sleep-state exit (flushes caches)

- Guest exit.

(These can be run-time gated; a full refill costs 30-45 cycles.)

(3) Optional: Optimizations so that direct branches can be used for hot kernel

indirects. While, as discussed above, kernel execution generally depends on

fewer indirect branches, there are a few places (in particular, the

networking stack) where we have chained sequences of indirects on hot paths.

(4) More general support for guarding against RSB underflow in an affected

target. While this is harder to exploit and may not be required for many

users, the approaches we have used here are not generally applicable.

Further discussion is required.



With respect to what these deltas mean for an unmodified kernel:

(1a) At minimum, annotation only. More complicated config- and

run-time-gated options are also possible.

(2) Trivially run-time & config gated.

(3) De-virtualizing these branches improves performance in both the

retpoline and non-retpoline cases.



For an out-of-the-box kernel that is reasonably protected, (1)-(3) are required.



I apologize that this does not come with a clean set of patches merging the

things that we and Intel have looked at here. That was one of the original

goals for this week. Strictly speaking, I think that Andi, David, and I have

a fair amount of merging and clean-up to do here. This is an attempt

to keep discussion of the fundamentals at least independent of that.



I'm trying to keep the above reasonably compact/dense. I'm happy to expand on

any details in sub-threads. I'll also link back some of the other compiler work

which is landing for (1).



Thanks,



- Paul



