This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Spectre V1 diagnostic / mitigation

From: Richard Biener <rguenther at suse dot de>

To: gcc at gcc dot gnu dot org

Date: Tue, 18 Dec 2018 16:36:51 +0100 (CET)

Subject: Spectre V1 diagnostic / mitigation

Hi,

in the past weeks I've been looking into prototyping both Spectre V1 (speculative array bound bypass) diagnostics and mitigation in an architecture-independent manner to assess feasibility and to get some kind of upper bound on the performance impact one can expect. https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is an interesting read in this context as well.

For simplicity I have implemented mitigation on GIMPLE right before RTL expansion and have chosen TLS to do mitigation across function boundaries. Diagnostics sit in the same place but the two are not in any way dependent on each other.

The mitigation strategy chosen is that of tracking speculation state via a mask that can be used to zero parts of the addresses that leak the actual data. That's similar to what aarch64 does with -mtrack-speculation (but oddly there's no mitigation there).

I've optimized things to the point that is reasonable when working target-independently on GIMPLE, but I've only looked at x86 assembly and performance. I expect any "final" mitigation, if we choose to implement and integrate such, would be after RTL expansion, since RTL expansion can end up introducing quite some control flow whose speculation state is not properly tracked by the prototype.

I'm cut&pasting single runs of SPEC INT 2006/2017 here. The runs were done with -O2 [-fspectre-v1={2,3}] where =2 is function-local mitigation and =3 does global mitigation, passing the state via TLS memory.

The following was measured on a Haswell desktop CPU:

-O2 vs. -O2 -fspectre-v1=2

                                  Estimated                       Estimated
                 Base       Base       Base      Peak       Peak       Peak
Benchmarks       Ref.   Run Time      Ratio      Ref.   Run Time      Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
400.perlbench    9770        245       39.8 *    9770        452       21.6 *  184%
401.bzip2        9650        378       25.5 *    9650        726       13.3 *  192%
403.gcc          8050        236       34.2 *    8050        352       22.8 *  149%
429.mcf          9120        223       40.9 *    9120        656       13.9 *  294%
445.gobmk       10490        400       26.2 *   10490        666       15.8 *  167%
456.hmmer        9330        388       24.1 *    9330        536       17.4 *  138%
458.sjeng       12100        437       27.7 *   12100        661       18.3 *  151%
462.libquantum  20720        300       69.1 *   20720        384       53.9 *  128%
464.h264ref     22130        451       49.1 *   22130        586       37.8 *  130%
471.omnetpp      6250        291       21.5 *    6250        398       15.7 *  137%
473.astar        7020        334       21.0 *    7020        522       13.5 *  156%
483.xalancbmk    6900        182       37.9 *    6900        306       22.6 *  168%
 Est. SPECint_base2006                   --
 Est. SPECint2006                                                        --

-O2 -fspectre-v1=3 (base not run, peak columns only)

                                  Estimated
                 Peak       Peak       Peak
Benchmarks       Ref.   Run Time      Ratio
-------------- ------  ---------  ---------
400.perlbench    9770        497       19.6 *  203%
401.bzip2        9650        772       12.5 *  204%
403.gcc          8050        427       18.9 *  181%
429.mcf          9120        696       13.1 *  312%
445.gobmk       10490        726       14.4 *  181%
456.hmmer        9330        537       17.4 *  138%
458.sjeng       12100        721       16.8 *  165%
462.libquantum  20720        446       46.4 *  149%
464.h264ref     22130        613       36.1 *  136%
471.omnetpp      6250        471       13.3 *  162%
473.astar        7020        579       12.1 *  173%
483.xalancbmk    6900        350       19.7 *  192%
 Est. SPECint(R)_base2006           Not Run
 Est. SPECint2006                        --

While the following was measured on a Zen Epyc server:

-O2 vs. -O2 -fspectre-v1=2

                                    Estimated                        Estimated
                   Base       Base       Base       Peak       Peak       Peak
Benchmarks       Copies   Run Time       Rate     Copies   Run Time       Rate
--------------- -------  ---------  ---------    -------  ---------  ---------
500.perlbench_r       1        499       3.19 *        1        621       2.56 *  124%
502.gcc_r             1        286       4.95 *        1        392       3.61 *  137%
505.mcf_r             1        331       4.88 *        1        456       3.55 *  138%
520.omnetpp_r         1        454       2.89 *        1        563       2.33 *  124%
523.xalancbmk_r       1        328       3.22 *        1        569       1.86 *  173%
525.x264_r            1        518       3.38 *        1        776       2.26 *  150%
531.deepsjeng_r       1        365       3.14 *        1        448       2.56 *  123%
541.leela_r           1        598       2.77 *        1        729       2.27 *  122%
548.exchange2_r       1        460       5.69 *        1        756       3.46 *  164%
557.xz_r              1        403       2.68 *        1        586       1.84 *  145%
 Est. SPECrate2017_int_base              3.55
 Est. SPECrate2017_int_peak                                               2.56     72%

-O2 -fspectre-v1=3 (base not run, NR)

                                             Estimated
                   Base     Peak       Peak       Peak
Benchmarks       Copies   Copies   Run Time       Rate
--------------- -------  -------  ---------  ---------
500.perlbench_r      NR        1        700       2.27 *  140%
502.gcc_r            NR        1        485       2.92 *  170%
505.mcf_r            NR        1        596       2.71 *  180%
520.omnetpp_r        NR        1        604       2.17 *  133%
523.xalancbmk_r      NR        1        643       1.64 *  196%
525.x264_r           NR        1        797       2.20 *  154%
531.deepsjeng_r      NR        1        542       2.12 *  149%
541.leela_r          NR        1        872       1.90 *  146%
548.exchange2_r      NR        1        761       3.44 *  165%
557.xz_r             NR        1        595       1.81 *  148%
 Est. SPECrate2017_int_base                    Not Run
 Est. SPECrate2017_int_peak                       2.26     64%

You can see, even though we're comparing apples and oranges, that the performance impact is quite dependent on the microarchitecture.

Similarly interesting as performance is the effect on text size, which is surprisingly high (_best_ case is 13 bytes per conditional branch plus 3 bytes per instrumented memory).
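A note on reading the tables: the trailing percentage on each row is not labelled in the pasted output. It appears to be the instrumented value divided by the -O2 baseline, rounded to the nearest percent (run time on the benchmark rows, rate on the SPECrate summary lines, and text size in the size tables). The following is only a sketch under that assumption; `ratio_percent` is a hypothetical helper name, not something from the patch or from the SPEC tools.

```c
#include <assert.h>

/* Hypothetical helper, assuming the percentages in this mail are
   VALUE / BASE rounded to the nearest whole percent.  Integer math
   only, so it also works on rates scaled by 100 (3.55 -> 355).  */
static unsigned
ratio_percent (unsigned long long base, unsigned long long value)
{
  assert (base != 0);
  /* Adding base / 2 before dividing rounds to nearest.  */
  return (unsigned) ((100 * value + base / 2) / base);
}
```

With the numbers from the mail, `ratio_percent (245, 452)` reproduces the 184% of 400.perlbench on Haswell, and `ratio_percent (355, 256)` reproduces the 72% on the SPECrate2017_int_peak line. In the latter case a smaller number means a slower result, since it compares rates rather than run times.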
CPU2006:

BASE -O2
     text     data      bss      dec     hex  filename
  1117726    20928    12704  1151358  11917e  400.perlbench
    56568     3800     4416    64784    fd10  401.bzip2
  3419568     7912   751520  4179000  3fc438  403.gcc
    12212      712    11984    24908    614c  429.mcf
  1460694  2081772  2330096  5872562  599bb2  445.gobmk
   284929     5956    82040   372925   5b0bd  456.hmmer
   130782     2152  2576896  2709830  295946  458.sjeng
    41915      764       96    42775    a717  462.libquantum
   505452    11220   372320   888992   d90a0  464.h264ref
   638188     9584    14664   662436   a1ba4  471.omnetpp
    38859      900     5216    44975    afaf  473.astar
  4033878   140248    12168  4186294  3fe0b6  483.xalancbmk

PEAK -O2 -fspectre-v1=2
     text     data      bss      dec     hex  filename
  1508032    20928    12704  1541664  178620  400.perlbench   135%
    76098     3800     4416    84314   1495a  401.bzip2       135%
  4483530     7912   751520  5242962  500052  403.gcc         131%
    16006      712    11984    28702    701e  429.mcf         131%
  1647384  2081772  2330096  6059252  5c74f4  445.gobmk       112%
   377259     5956    82040   465255   71967  456.hmmer       132%
   164672     2152  2576896  2743720  29dda8  458.sjeng       126%
    47901      764       96    48761    be79  462.libquantum  114%
   649854    11220   372320  1033394   fc4b2  464.h264ref     129%
   706908     9584    14664   731156   b2814  471.omnetpp     111%
    48493      900     5216    54609    d551  473.astar       125%
  4862056   140248    12168  5014472  4c83c8  483.xalancbmk   121%

PEAK -O2 -fspectre-v1=3
     text     data      bss      dec     hex  filename
  1742008    20936    12704  1775648  1b1820  400.perlbench   156%
    83338     3808     4416    91562   165aa  401.bzip2       147%
  5219850     7920   751520  5979290  5b3c9a  403.gcc         153%
    17422      720    11984    30126    75ae  429.mcf         143%
  1801688  2081780  2330096  6213564  5ecfbc  445.gobmk       123%
   431827     5964    82040   519831   7ee97  456.hmmer       152%
   182200     2160  2576896  2761256  2a2228  458.sjeng       139%
    53773      772       96    54641    d571  462.libquantum  128%
   691798    11228   372320  1075346  106892  464.h264ref     137%
   976692     9592    14664  1000948   f45f4  471.omnetpp     153%
    54525      908     5216    60649    ece9  473.astar       140%
  5808306   140256    12168  5960730  5af41a  483.xalancbmk   144%

CPU2017:

BASE -O2 -g
     text     data      bss      dec     hex  filename
  2209713     8576     9080  2227369  21fca9  500.perlbench_r
  9295702    37432  1150664 10483798  9ff856  502.gcc_r
    21795      712      744    23251    5ad3  505.mcf_r
  2067560     8984    46888  2123432  2066a8  520.omnetpp_r
  5763577   142584    20040  5926201  5a6d39  523.xalancbmk_r
   508402     6102    29592   544096   84d60  525.x264_r
    84222      784 12138360 12223366  ba8386  531.deepsjeng_r
   223480     8544    30072   262096   3ffd0  541.leela_r
    70554      864     6384    77802   12fea  548.exchange2_r
   180640      884    17704   199228   30a3c  557.xz_r

PEAK -fspectre-v1=2
     text     data      bss      dec     hex  filename
  2991161     8576     9080  3008817  2de931  500.perlbench_r  135%
 12244886    37432  1150664 13432982  ccf896  502.gcc_r        132%
    28475      712      744    29931    74eb  505.mcf_r        131%
  2397026     8984    46888  2452898  256da2  520.omnetpp_r    116%
  6846853   142584    20040  7009477  6af4c5  523.xalancbmk_r  119%
   645730     6102    29592   681424   a65d0  525.x264_r       127%
   111166      784 12138360 12250310  baecc6  531.deepsjeng_r  132%
   260835     8544    30072   299451   491bb  541.leela_r      117%
    96874      864     6384   104122   196ba  548.exchange2_r  137%
   215288      884    17704   233876   39194  557.xz_r         119%

PEAK -fspectre-v1=3
     text     data      bss      dec     hex  filename
  3365945     8584     9080  3383609  33a139  500.perlbench_r  152%
 14790638    37440  1150664 15978742  f3d0f6  502.gcc_r        159%
    31419      720      744    32883    8073  505.mcf_r        144%
  2867893     8992    46888  2923773  2c9cfd  520.omnetpp_r    139%
  8183689   142592    20040  8346321  7f5ad1  523.xalancbmk_r  142%
   697434     6110    29592   733136   b2fd0  525.x264_r       137%
   123638      792 12138360 12262790  bb1d86  531.deepsjeng_r  147%
   315347     8552    30072   353971   566b3  541.leela_r      141%
    98578      872     6384   105834   19d6a  548.exchange2_r  140%
   239144      892    17704   257740   3eecc  557.xz_r         133%

The patch relies heavily on RTL optimizations for DCE purposes. At the same time we rely on RTL not statically computing the mask (RTL has no conditional constant propagation).
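To make the mask-tracking strategy described earlier in this mail a bit more concrete, here is a hand-written C approximation of what the instrumentation does to a bounds-checked double lookup. This is only a sketch and not output of the patch: the names `sv1_a`, `sv1_b`, `lookup_plain` and `lookup_masked` are made up for illustration, and `unsigned char` is used for the first array purely to keep the second index non-negative. Note also that an optimizing compiler is free to fold the mask away when it is written at the source level like this, which is exactly why the prototype inserts it late and relies on RTL not computing it statically.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical data for the sketch.  */
unsigned char sv1_a[1024] = { [3] = 7 };
int sv1_b[1024] = { [7] = 42 };

/* Uninstrumented reference.  Like the classic testcase it assumes
   0 <= i and bound <= 1024; returns -1 when the check fails.  */
int
lookup_plain (int i, int bound)
{
  assert (bound <= 1024);
  if (i < bound)
    return sv1_b[sv1_a[i]];
  return -1;
}

/* Source-level approximation of the instrumented variant.  */
int
lookup_masked (int i, int bound)
{
  assert (bound <= 1024);

  /* All-ones while execution is on the architecturally correct path.  */
  intptr_t mask = -1;

  /* The compare result is materialized as data (-1 or 0) and the
     branch tests that value instead of the original condition.  */
  intptr_t tem = (i < bound) ? -1 : 0;
  if (tem != 0)
    {
      /* True edge: if this path is only reached by mis-speculation,
         TEM is really 0 and the mask collapses to all-zeroes.  */
      mask &= tem;
      /* Mask the index of the bounded load; index 0 is always safe.  */
      int masked_i = i & (int) mask;
      return sv1_b[sv1_a[masked_i]];
    }

  /* False edge: the mirrored update.  */
  mask &= ~tem;
  (void) mask;
  return -1;
}
```

Architecturally both functions return the same values, for example 42 for `lookup_masked (3, 10)` and -1 for `lookup_masked (20, 10)`. The difference only matters under mis-speculation, where the masked index degrades to 0 instead of an attacker-chosen out-of-bounds value. Only the first load is masked here, on the assumption that a value produced by a masked load is treated as safe.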
Full instrumentation of the classic Spectre V1 testcase char a[1024]; int b[1024]; int foo (int i, int bound) { if (i < bound) return b[a[i]]; } is the following: foo: .LFB0: .cfi_startproc xorl %eax, %eax cmpl %esi, %edi setge %al subq $1, %rax jne .L4 ret .p2align 4,,10 .p2align 3 .L4: andl %eax, %edi movslq %edi, %rdi movsbq a(%rdi), %rax movl b(,%rax,4), %eax ret so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome. Patch below for reference (and your own testing in case you are curious). I do not plan to pursue this further at this point. Richard. >From 01e4a5a43e266065d32489daa50de0cf2425d5f5 Mon Sep 17 00:00:00 2001 From: Richard Guenther <rguenther@suse.de> Date: Wed, 5 Dec 2018 13:17:02 +0100 Subject: [PATCH] warn-spectrev1 diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 7960cace16a..64d472d7fa0 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1334,6 +1334,7 @@ OBJS = \ gimple-ssa-sprintf.o \ gimple-ssa-warn-alloca.o \ gimple-ssa-warn-restrict.o \ + gimple-ssa-spectrev1.o \ gimple-streamer-in.o \ gimple-streamer-out.o \ gimple-walk.o \ diff --git a/gcc/common.opt b/gcc/common.opt index 45d7f6189e5..1ae7fcfe177 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -702,6 +702,10 @@ Warn when one local variable shadows another local variable or parameter of comp Wshadow-compatible-local Common Warning Undocumented Alias(Wshadow=compatible-local) +Wspectre-v1 +Common Var(warn_spectrev1) Warning +Warn about code susceptible to spectre v1 style attacks. + Wstack-protector Common Var(warn_stack_protect) Warning Warn when not issuing stack smashing protection for some reason. @@ -2406,6 +2410,14 @@ fsingle-precision-constant Common Report Var(flag_single_precision_constant) Optimization Convert floating point constants to single precision constants. +fspectre-v1 +Common Alias(fspectre-v1=, 2, 0) +Insert code to mitigate spectre v1 style attacks. 
+ +fspectre-v1= +Common Report RejectNegative Joined UInteger IntegerRange(0, 3) Var(flag_spectrev1) Optimization +Insert code to mitigate spectre v1 style attacks. + fsplit-ivs-in-unroller Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization Split lifetimes of induction variables when loops are unrolled. diff --git a/gcc/gimple-ssa-spectrev1.cc b/gcc/gimple-ssa-spectrev1.cc new file mode 100644 index 00000000000..c2a5dc95324 --- /dev/null +++ b/gcc/gimple-ssa-spectrev1.cc @@ -0,0 +1,824 @@ +/* Loop interchange. + Copyright (C) 2017-2018 Free Software Foundation, Inc. + Contributed by ARM Ltd. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +GCC is distributed in the hope that it will be useful, but WITHOUT +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. 
*/ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "is-a.h" +#include "tree.h" +#include "gimple.h" +#include "tree-pass.h" +#include "ssa.h" +#include "gimple-pretty-print.h" +#include "gimple-iterator.h" +#include "params.h" +#include "tree-ssa.h" +#include "cfganal.h" +#include "gimple-walk.h" +#include "tree-ssa-loop.h" +#include "tree-dfa.h" +#include "tree-cfg.h" +#include "fold-const.h" +#include "builtins.h" +#include "alias.h" +#include "cfgloop.h" +#include "varasm.h" +#include "cgraph.h" +#include "gimple-fold.h" +#include "diagnostic.h" + +/* The Spectre V1 situation is as follows: + + if (attacker_controlled_idx < bound) // speculated as true but is false + { + // out-of-bound access, returns value interesting to attacker + val = mem[attacker_controlled_idx]; + // access that causes a cache-line to be brought in - canary + ... = attacker_controlled_mem[val]; + } + + The last load provides the side-channel. The pattern can be split + into multiple functions or translation units. Conservatively we'd + have to warn about + + int foo (int *a) { return *a; } + + thus any indirect (or indexed) memory access. That's obvioulsy + not useful. + + The next level would be to warn only when we see load of val as + well. That then misses cases like + + int foo (int *a, int *b) + { + int idx = load_it (a); + return load_it (&b[idx]); + } + + Still we'd warn about cases like + + struct Foo { int *a; }; + int foo (struct Foo *a) { return *a->a; } + + though dereferencing VAL isn't really an interesting case. It's + hard to exclude this conservatively so the obvious solution is + to restrict the kind of loads that produce val, for example based + on its type or its number of bits. It's tempting to do this at + the point of the load producing val but in the end what matters + is the number of bits that reach the second loads [as index] given + there are practical limits on the size of the canary. 
For this + we have to consider + + int foo (struct Foo *a, int *b) + { + int *c = a->a; + int idx = *b; + return *(c + idx); + } + + where idx has too many bits to be an interesting attack vector(?). + */ + +/* The pass does two things, first it performs data flow analysis + to be able to warn about the second load. This is controlled + via -Wspectre-v1. + + Second it instruments control flow in the program to track a + mask which is all-ones but all-zeroes if the CPU speculated + a branch in the wrong direction. This mask is then used to + mask the address[-part(s)] of loads with non-invariant addresses, + effectively mitigating the attack. This is controlled by + -fpectre-v1[=N] where N is default 2 and + 1 optimistically omit some instrumentations (currently + backedge control flow instructions do not update the + speculation mask) + 2 instrument conservatively using a function-local speculation + mask + 3 instrument conservatively using a global (TLS) speculation + mask. This adds TLS loads/stores of the speculation mask + at function boundaries and before and after calls. + */ + +/* We annotate statements whose defs cannot be used to leaking data + speculatively via loads with SV1_SAFE. This is used to optimize + masking of indices where masked indices (and derived by constant + ones) are not masked again. Note this works only up to the points + that possibly change the speculation mask value. 
*/ +#define SV1_SAFE GF_PLF_1 + +namespace { + +const pass_data pass_data_spectrev1 = +{ + GIMPLE_PASS, /* type */ + "spectrev1", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_NONE, /* tv_id */ + PROP_cfg|PROP_ssa, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + TODO_update_ssa, /* todo_flags_finish */ +}; + +class pass_spectrev1 : public gimple_opt_pass +{ +public: + pass_spectrev1 (gcc::context *ctxt) + : gimple_opt_pass (pass_data_spectrev1, ctxt) + {} + + /* opt_pass methods: */ + opt_pass * clone () { return new pass_spectrev1 (m_ctxt); } + virtual bool gate (function *) { return warn_spectrev1 || flag_spectrev1; } + virtual unsigned int execute (function *); + + static bool stmt_is_indexed_load (gimple *); + static bool stmt_mangles_index (gimple *, tree); + static bool find_value_dependent_guard (gimple *, tree); + static void mark_influencing_outgoing_flow (basic_block, tree); + static tree instrument_mem (gimple_stmt_iterator *, tree, tree); +}; // class pass_spectrev1 + +bitmap_head *influencing_outgoing_flow; + +static bool +call_between (gimple *first, gimple *second) +{ + gcc_assert (gimple_bb (first) == gimple_bb (second)); + /* ??? This is inefficient. Maybe we can use gimple_uid to assign + unique IDs to stmts belonging to groups with the same speculation + mask state. 
*/ + for (gimple_stmt_iterator gsi = gsi_for_stmt (first); + gsi_stmt (gsi) != second; gsi_next (&gsi)) + if (is_gimple_call (gsi_stmt (gsi))) + return true; + return false; +} + +basic_block ctx_bb; +gimple *ctx_stmt; +static bool +gather_indexes (tree, tree *idx, void *data) +{ + vec<tree *> *indexes = (vec<tree *> *)data; + if (TREE_CODE (*idx) != SSA_NAME) + return true; + if (!SSA_NAME_IS_DEFAULT_DEF (*idx) + && gimple_bb (SSA_NAME_DEF_STMT (*idx)) == ctx_bb + && gimple_plf (SSA_NAME_DEF_STMT (*idx), SV1_SAFE) + && (flag_spectrev1 < 3 + || !call_between (SSA_NAME_DEF_STMT (*idx), ctx_stmt))) + return true; + if (indexes->is_empty ()) + indexes->safe_push (idx); + else if (*(*indexes)[0] == *idx) + indexes->safe_push (idx); + else + return false; + return true; +} + +tree +pass_spectrev1::instrument_mem (gimple_stmt_iterator *gsi, tree mem, tree mask) +{ + /* First try to see if we can find a single index we can zero which + has the chance of repeating in other loads and also avoids separate + LEA and memory references decreasing code size and AGU occupancy. */ + auto_vec<tree *, 8> indexes; + ctx_bb = gsi_bb (*gsi); + ctx_stmt = gsi_stmt (*gsi); + if (PARAM_VALUE (PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES) > 0 + && for_each_index (&mem, gather_indexes, (void *)&indexes)) + { + /* All indices are safe. */ + if (indexes.is_empty ()) + return mem; + if (TYPE_PRECISION (TREE_TYPE (*indexes[0])) + <= TYPE_PRECISION (TREE_TYPE (mask))) + { + tree idx = *indexes[0]; + gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (idx)) + || POINTER_TYPE_P (TREE_TYPE (idx))); + /* Instead of instrumenting IDX directly we could look at + definitions with a single SSA use and instrument that + instead. But we have to do some work to make SV1_SAFE + propagation updated then - this would really ask to first + gather all indexes of all refs we want to instrument and + compute some optimal set of instrumentations. 
*/ + gimple_seq seq = NULL; + tree idx_mask = gimple_convert (&seq, TREE_TYPE (idx), mask); + tree masked_idx = gimple_build (&seq, BIT_AND_EXPR, + TREE_TYPE (idx), idx, idx_mask); + /* Mark the instrumentation sequence as visited. */ + for (gimple_stmt_iterator si = gsi_start (seq); + !gsi_end_p (si); gsi_next (&si)) + gimple_set_visited (gsi_stmt (si), true); + gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT); + gimple_set_plf (SSA_NAME_DEF_STMT (masked_idx), SV1_SAFE, true); + /* Replace downstream users in the BB which reduces register pressure + and allows SV1_SAFE propagation to work (which stops at call/BB + boundaries though). + ??? This is really reg-pressure vs. dependence chains so not + a generally easy thing. Making the following propagate into + all uses dominated by the insert slows down 429.mcf even more. + ??? We can actually track SV1_SAFE across PHIs but then we + have to propagate into PHIs here. */ + gimple *use_stmt; + use_operand_p use_p; + imm_use_iterator iter; + FOR_EACH_IMM_USE_STMT (use_stmt, iter, idx) + if (gimple_bb (use_stmt) == gsi_bb (*gsi) + && gimple_code (use_stmt) != GIMPLE_PHI + && !gimple_visited_p (use_stmt)) + { + FOR_EACH_IMM_USE_ON_STMT (use_p, iter) + SET_USE (use_p, masked_idx); + update_stmt (use_stmt); + } + /* Modify MEM in place... (our stmt is already marked visited). */ + for (unsigned i = 0; i < indexes.length (); ++i) + *indexes[i] = masked_idx; + return mem; + } + } + + /* ??? Can we handle TYPE_REVERSE_STORAGE_ORDER at all? Need to + handle BIT_FIELD_REFs. */ + + /* Strip a bitfield reference to re-apply it at the end. */ + tree bitfield = NULL_TREE; + tree bitfield_off = NULL_TREE; + if (TREE_CODE (mem) == COMPONENT_REF + && DECL_BIT_FIELD (TREE_OPERAND (mem, 1))) + { + bitfield = TREE_OPERAND (mem, 1); + bitfield_off = TREE_OPERAND (mem, 2); + mem = TREE_OPERAND (mem, 0); + } + + tree ptr_base = mem; + /* VIEW_CONVERT_EXPRs do not change offset, strip them, they get folded + into the MEM_REF we create. 
*/ + while (TREE_CODE (ptr_base) == VIEW_CONVERT_EXPR) + ptr_base = TREE_OPERAND (ptr_base, 0); + + tree ptr = make_ssa_name (ptr_type_node); + gimple *new_stmt = gimple_build_assign (ptr, build_fold_addr_expr (ptr_base)); + gimple_set_visited (new_stmt, true); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + ptr = make_ssa_name (ptr_type_node); + new_stmt = gimple_build_assign (ptr, BIT_AND_EXPR, + gimple_assign_lhs (new_stmt), mask); + gimple_set_visited (new_stmt, true); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + tree type = TREE_TYPE (mem); + unsigned align = get_object_alignment (mem); + if (align != TYPE_ALIGN (type)) + type = build_aligned_type (type, align); + + tree new_mem = build2 (MEM_REF, type, ptr, + build_int_cst (reference_alias_ptr_type (mem), 0)); + if (bitfield) + new_mem = build3 (COMPONENT_REF, TREE_TYPE (bitfield), new_mem, + bitfield, bitfield_off); + return new_mem; +} + +bool +check_spectrev1_2nd_load (tree, tree *idx, void *data) +{ + sbitmap value_from_indexed_load = (sbitmap)data; + if (TREE_CODE (*idx) == SSA_NAME + && bitmap_bit_p (value_from_indexed_load, SSA_NAME_VERSION (*idx))) + return false; + return true; +} + +bool +check_spectrev1_2nd_load (gimple *, tree, tree ref, void *data) +{ + return !for_each_index (&ref, check_spectrev1_2nd_load, data); +} + +void +pass_spectrev1::mark_influencing_outgoing_flow (basic_block bb, tree op) +{ + if (!bitmap_set_bit (&influencing_outgoing_flow[SSA_NAME_VERSION (op)], + bb->index)) + return; + + /* Note we are deliberately non-conservatively stop at call and + memory boundaries here expecting earlier optimization to expose + value dependences via SSA chains. 
*/ + gimple *def_stmt = SSA_NAME_DEF_STMT (op); + if (gimple_vuse (def_stmt) + || !is_gimple_assign (def_stmt)) + return; + + ssa_op_iter i; + FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, i, SSA_OP_USE) + mark_influencing_outgoing_flow (bb, op); +} + +bool +pass_spectrev1::find_value_dependent_guard (gimple *stmt, tree op) +{ + bitmap_iterator bi; + unsigned i; + EXECUTE_IF_SET_IN_BITMAP (&influencing_outgoing_flow[SSA_NAME_VERSION (op)], + 0, i, bi) + /* ??? If control-dependent on. + ??? Make bits in influencing_outgoing_flow the index of the BB + in RPO order so we could walk bits from STMT "upwards" finding + the nearest one. */ + if (dominated_by_p (CDI_DOMINATORS, + gimple_bb (stmt), BASIC_BLOCK_FOR_FN (cfun, i))) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, stmt, "Condition %G in block %d " + "is related to indexes used in %G

", + last_stmt (BASIC_BLOCK_FOR_FN (cfun, i)), + i, stmt); + return true; + } + + /* Note we are deliberately non-conservatively stop at call and + memory boundaries here expecting earlier optimization to expose + value dependences via SSA chains. */ + gimple *def_stmt = SSA_NAME_DEF_STMT (op); + if (gimple_vuse (def_stmt) + || !is_gimple_assign (def_stmt)) + return false; + + ssa_op_iter it; + FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, it, SSA_OP_USE) + if (find_value_dependent_guard (stmt, op)) + /* Others may be "nearer". */ + return true; + + return false; +} + +bool +pass_spectrev1::stmt_is_indexed_load (gimple *stmt) +{ + /* Given we ignore the function boundary for incoming parameters + let's ignore return values of calls as well for the purpose + of being the first indexed load (also ignore inline-asms). */ + if (!gimple_assign_load_p (stmt)) + return false; + + /* Exclude esp. pointers from the index load itself (but also floats, + vectors, etc. - quite a bit handwaving here). */ + if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))) + return false; + + /* If we do not have any SSA uses the load cannot be one indexed + by an attacker controlled value. */ + if (zero_ssa_operands (stmt, SSA_OP_USE)) + return false; + + return true; +} + +/* Return true whether the index in the use operand OP in STMT is + not transfered to STMTs defs. 
*/ + +bool +pass_spectrev1::stmt_mangles_index (gimple *stmt, tree op) +{ + if (gimple_assign_load_p (stmt)) + return true; + if (gassign *ass = dyn_cast <gassign *> (stmt)) + { + enum tree_code code = gimple_assign_rhs_code (ass); + switch (code) + { + case TRUNC_DIV_EXPR: + case CEIL_DIV_EXPR: + case FLOOR_DIV_EXPR: + case ROUND_DIV_EXPR: + case EXACT_DIV_EXPR: + case RDIV_EXPR: + case TRUNC_MOD_EXPR: + case CEIL_MOD_EXPR: + case FLOOR_MOD_EXPR: + case ROUND_MOD_EXPR: + case LSHIFT_EXPR: + case RSHIFT_EXPR: + case LROTATE_EXPR: + case RROTATE_EXPR: + /* Division, modulus or shifts by the index do not produce + something useful for the attacker. */ + if (gimple_assign_rhs2 (ass) == op) + return true; + break; + default:; + /* Comparisons do not produce an index value. */ + if (TREE_CODE_CLASS (code) == tcc_comparison) + return true; + } + } + /* ??? We could handle builtins here. */ + return false; +} + +static GTY(()) tree spectrev1_tls_mask_decl; + +/* Main entry for spectrev1 pass. */ + +unsigned int +pass_spectrev1::execute (function *fn) +{ + calculate_dominance_info (CDI_DOMINATORS); + loop_optimizer_init (AVOID_CFG_MODIFICATIONS); + + int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun)); + int rpo_num = pre_and_rev_post_order_compute_fn (fn, NULL, rpo, false); + + /* We track for each SSA name whether its value (may) depend(s) on + the result of an indexed load. + A set of operation will kill a value (enough). */ + auto_sbitmap value_from_indexed_load (num_ssa_names); + bitmap_clear (value_from_indexed_load); + + unsigned orig_num_ssa_names = num_ssa_names; + influencing_outgoing_flow = XCNEWVEC (bitmap_head, num_ssa_names); + for (unsigned i = 1; i < num_ssa_names; ++i) + bitmap_initialize (&influencing_outgoing_flow[i], &bitmap_default_obstack); + + + /* Diagnosis. */ + + /* Function arguments are not indexed loads unless we want to + be conservative to a level no longer useful. 
*/ + + for (int i = 0; i < rpo_num; ++i) + { + basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]); + + for (gphi_iterator gpi = gsi_start_phis (bb); + !gsi_end_p (gpi); gsi_next (&gpi)) + { + gphi *phi = gpi.phi (); + bool value_from_indexed_load_p = false; + use_operand_p arg_p; + ssa_op_iter it; + FOR_EACH_PHI_ARG (arg_p, phi, it, SSA_OP_USE) + { + tree arg = USE_FROM_PTR (arg_p); + if (TREE_CODE (arg) == SSA_NAME + && bitmap_bit_p (value_from_indexed_load, + SSA_NAME_VERSION (arg))) + value_from_indexed_load_p = true; + } + if (value_from_indexed_load_p) + bitmap_set_bit (value_from_indexed_load, + SSA_NAME_VERSION (PHI_RESULT (phi))); + } + + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + if (is_gimple_debug (stmt)) + continue; + + if (walk_stmt_load_store_ops (stmt, value_from_indexed_load, + check_spectrev1_2nd_load, + check_spectrev1_2nd_load)) + warning_at (gimple_location (stmt), OPT_Wspectre_v1, "%Gspectrev1", + stmt); + + bool value_from_indexed_load_p = false; + if (stmt_is_indexed_load (stmt)) + { + /* We are interested in indexes to later loads so ultimatively + register values that all happen to separate SSA defs. + Interesting aggregates will be decomposed by later loads + which we then mark as producing an index. Simply mark + all SSA defs as coming from an indexed load. */ + /* We are handling a single load in STMT right now. */ + ssa_op_iter it; + tree op; + FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE) + if (find_value_dependent_guard (stmt, op)) + { + /* ??? Somehow record the dependence to point to it in + diagnostics. 
*/ + value_from_indexed_load_p = true; + break; + } + } + + tree op; + ssa_op_iter it; + FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE) + if (bitmap_bit_p (value_from_indexed_load, + SSA_NAME_VERSION (op)) + && !stmt_mangles_index (stmt, op)) + { + value_from_indexed_load_p = true; + break; + } + + if (value_from_indexed_load_p) + FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_DEF) + /* ??? We could cut off single-bit values from the chain + here or pretain that float loads will be never turned + into integer indices, etc. */ + bitmap_set_bit (value_from_indexed_load, + SSA_NAME_VERSION (op)); + } + + if (EDGE_COUNT (bb->succs) > 1) + { + gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb)); + /* ??? What about switches? What about badly speculated EH? */ + if (!stmt) + continue; + /* We could constrain conditions here to those more likely + being "bounds checks". For example common guards for + indirect accesses are NULL pointer checks. + ??? This isn't fully safe, but it drops the number of + spectre warnings for dwarf2out.i from cc1files from 70 to 16. */ + if ((gimple_cond_code (stmt) == EQ_EXPR + || gimple_cond_code (stmt) == NE_EXPR) + && integer_zerop (gimple_cond_rhs (stmt)) + && POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)))) + ; + else + { + ssa_op_iter it; + tree op; + FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE) + mark_influencing_outgoing_flow (bb, op); + } + } + } + + for (unsigned i = 1; i < orig_num_ssa_names; ++i) + bitmap_release (&influencing_outgoing_flow[i]); + XDELETEVEC (influencing_outgoing_flow); + + + + /* Instrumentation. */ + if (!flag_spectrev1) + return 0; + + /* Create the default all-ones mask. When doing IPA instrumentation + this should initialize the mask from TLS memory and outgoing edges + need to save the mask to TLS memory. */ + gimple *new_stmt; + if (!spectrev1_tls_mask_decl + && flag_spectrev1 >= 3) + { + /* Use a smaller variable in case sign-extending loads are + available? 
*/ + spectrev1_tls_mask_decl + = build_decl (BUILTINS_LOCATION, + VAR_DECL, NULL_TREE, ptr_type_node); + TREE_STATIC (spectrev1_tls_mask_decl) = 1; + TREE_PUBLIC (spectrev1_tls_mask_decl) = 1; + DECL_VISIBILITY (spectrev1_tls_mask_decl) = VISIBILITY_HIDDEN; + DECL_VISIBILITY_SPECIFIED (spectrev1_tls_mask_decl) = 1; + DECL_INITIAL (spectrev1_tls_mask_decl) + = build_all_ones_cst (ptr_type_node); + DECL_NAME (spectrev1_tls_mask_decl) = get_identifier ("__SV1MSK"); + DECL_ARTIFICIAL (spectrev1_tls_mask_decl) = 1; + DECL_IGNORED_P (spectrev1_tls_mask_decl) = 1; + varpool_node::finalize_decl (spectrev1_tls_mask_decl); + make_decl_one_only (spectrev1_tls_mask_decl, + DECL_ASSEMBLER_NAME (spectrev1_tls_mask_decl)); + set_decl_tls_model (spectrev1_tls_mask_decl, + decl_default_tls_model (spectrev1_tls_mask_decl)); + } + + /* We let the SSA rewriter cope with rewriting mask into SSA and + inserting PHI nodes. */ + tree mask = create_tmp_reg (ptr_type_node, "spectre_v1_mask"); + new_stmt = gimple_build_assign (mask, + flag_spectrev1 >= 3 + ? spectrev1_tls_mask_decl + : build_all_ones_cst (ptr_type_node)); + gimple_stmt_iterator gsi + = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (fn))); + gsi_insert_before (&gsi, new_stmt, GSI_CONTINUE_LINKING); + + /* We are using the visited flag to track stmts downstream in a BB. */ + for (int i = 0; i < rpo_num; ++i) + { + basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]); + for (gphi_iterator gpi = gsi_start_phis (bb); + !gsi_end_p (gpi); gsi_next (&gpi)) + gimple_set_visited (gpi.phi (), false); + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); gsi_next (&gsi)) + gimple_set_visited (gsi_stmt (gsi), false); + } + + for (int i = 0; i < rpo_num; ++i) + { + basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]); + + for (gphi_iterator gpi = gsi_start_phis (bb); + !gsi_end_p (gpi); gsi_next (&gpi)) + { + gphi *phi = gpi.phi (); + /* ??? 
We can merge SAFE state across BB boundaries in + some cases, like when edges are not critical and the + state was made SAFE in the tail of the predecessors + and not invalidated by calls. */ + gimple_set_plf (phi, SV1_SAFE, false); + } + + bool instrumented_call_p = false; + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + gimple_set_visited (stmt, true); + if (is_gimple_debug (stmt)) + continue; + + tree op; + ssa_op_iter it; + bool safe = is_gimple_assign (stmt); + if (safe) + FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE) + { + if (safe + && (SSA_NAME_IS_DEFAULT_DEF (op) + || !gimple_plf (SSA_NAME_DEF_STMT (op), SV1_SAFE) + /* Once mask can have changed we cannot further + propagate safe state. */ + || gimple_bb (SSA_NAME_DEF_STMT (op)) != bb + /* That includes calls if we have instrumented one + in this block. */ + || (instrumented_call_p + && call_between (SSA_NAME_DEF_STMT (op), stmt)))) + { + safe = false; + break; + } + } + gimple_set_plf (stmt, SV1_SAFE, safe); + + /* Instrument bounded loads. + We instrument non-aggregate loads with non-invariant address. + The idea is to reliably instrument the bounded load while + leaving the canary, being it load or store, aggregate or + non-aggregate, alone. */ + if (gimple_assign_single_p (stmt) + && gimple_vuse (stmt) + && !gimple_vdef (stmt) + && !zero_ssa_operands (stmt, SSA_OP_USE)) + { + tree new_mem = instrument_mem (&gsi, gimple_assign_rhs1 (stmt), + mask); + gimple_assign_set_rhs1 (stmt, new_mem); + update_stmt (stmt); + /* The value loaded my a masked load is "safe". */ + gimple_set_plf (stmt, SV1_SAFE, true); + } + + /* Instrument return store to TLS mask. */ + if (flag_spectrev1 >= 3 + && gimple_code (stmt) == GIMPLE_RETURN) + { + new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask); + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); + } + /* Instrument calls with store/load to/from TLS mask. + ??? 
Placement of the stores/loads can be optimized in an LCM
+             way.  */
+          else if (flag_spectrev1 >= 3
+                   && is_gimple_call (stmt)
+                   && gimple_vuse (stmt))
+            {
+              new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
+              gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+              if (!stmt_ends_bb_p (stmt))
+                {
+                  new_stmt = gimple_build_assign (mask,
+                                                  spectrev1_tls_mask_decl);
+                  gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
+                }
+              else
+                {
+                  edge_iterator ei;
+                  edge e;
+                  FOR_EACH_EDGE (e, ei, bb->succs)
+                    {
+                      if (e->flags & EDGE_ABNORMAL)
+                        continue;
+                      new_stmt = gimple_build_assign (mask,
+                                                      spectrev1_tls_mask_decl);
+                      gsi_insert_on_edge (e, new_stmt);
+                    }
+                }
+              instrumented_call_p = true;
+            }
+        }
+
+      if (EDGE_COUNT (bb->succs) > 1)
+        {
+          gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
+          /* ??? What about switches?  What about badly speculated EH?  */
+          if (!stmt)
+            continue;
+
+          /* Instrument conditional branches to track mis-speculation
+             via a pointer-sized mask.
+             ??? We could restrict to instrumenting those conditions
+             that control interesting loads or apply simple heuristics
+             like not instrumenting FP compares or equality compares
+             which are unlikely bounds checks.  But we have to instrument
+             bool != 0 because multiple conditions might have been
+             combined.  */
+          edge truee, falsee;
+          extract_true_false_edges_from_block (bb, &truee, &falsee);
+          /* Unless -fspectre-v1 is at least 2 we do not instrument
+             loop exit tests.  */
+          if (flag_spectrev1 >= 2
+              || !loop_exits_from_bb_p (bb->loop_father, bb))
+            {
+              gimple_stmt_iterator gsi = gsi_last_bb (bb);
+
+              /* Instrument
+                   if (a_1 > b_2)
+                 as
+                   tem_mask_3 = a_1 > b_2 ? -1 : 0;
+                   if (tem_mask_3 != 0)
+                 this will result in a
+                   xor %eax, %eax; cmp|test; setCC %al; sub $0x1, %eax; jne
+                 sequence which is faster in practice than when retaining
+                 the original jump condition.  This is 10 bytes overhead
+                 on x86_64 plus 3 bytes for an and on the true path and
+                 5 bytes for an and and not on the false path.
*/
+              tree tem_mask = make_ssa_name (ptr_type_node);
+              new_stmt = gimple_build_assign (tem_mask, COND_EXPR,
+                                              build2 (gimple_cond_code (stmt),
+                                                      boolean_type_node,
+                                                      gimple_cond_lhs (stmt),
+                                                      gimple_cond_rhs (stmt)),
+                                              build_all_ones_cst (ptr_type_node),
+                                              build_zero_cst (ptr_type_node));
+              gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+              gimple_cond_set_code (stmt, NE_EXPR);
+              gimple_cond_set_lhs (stmt, tem_mask);
+              gimple_cond_set_rhs (stmt, build_zero_cst (ptr_type_node));
+              update_stmt (stmt);
+
+              /* On the false edge
+                   mask = mask & ~tem_mask_3;  */
+              gimple_seq tems = NULL;
+              tree tem_mask2 = make_ssa_name (ptr_type_node);
+              new_stmt = gimple_build_assign (tem_mask2, BIT_NOT_EXPR,
+                                              tem_mask);
+              gimple_seq_add_stmt_without_update (&tems, new_stmt);
+              new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
+                                              mask, tem_mask2);
+              gimple_seq_add_stmt_without_update (&tems, new_stmt);
+              gsi_insert_seq_on_edge (falsee, tems);
+
+              /* On the true edge
+                   mask = mask & tem_mask_3;  */
+              new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
+                                              mask, tem_mask);
+              gsi_insert_on_edge (truee, new_stmt);
+            }
+        }
+    }
+
+  gsi_commit_edge_inserts ();
+
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_spectrev1 (gcc::context *ctxt)
+{
+  return new pass_spectrev1 (ctxt);
+}
diff --git a/gcc/params.def b/gcc/params.def
index 6f98fccd291..19f7dbf4dad 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1378,6 +1378,11 @@ DEFPARAM(PARAM_LOOP_VERSIONING_MAX_OUTER_INSNS,
          " loops.",
          100, 0, 0)

+DEFPARAM(PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES,
+         "spectre-v1-max-instrument-indices",
+         "Maximum number of indices to instrument before instrumenting the whole address.",
+         1, 0, 0)
+
 /*

 Local variables:
diff --git a/gcc/passes.def b/gcc/passes.def
index 144df4fa417..2fe0cdcfa7e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -400,6 +400,7 @@ along with GCC; see the file COPYING3.
If not see
   NEXT_PASS (pass_lower_resx);
   NEXT_PASS (pass_nrv);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
+  NEXT_PASS (pass_spectrev1);
   NEXT_PASS (pass_warn_function_noreturn);
   NEXT_PASS (pass_gen_hsail);

diff --git a/gcc/testsuite/gcc.dg/Wspectre-v1-1.c b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
new file mode 100644
index 00000000000..3ac647e72fd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-Wspectre-v1" } */
+
+unsigned char a[1024];
+int b[256];
+int foo (int i, int bound)
+{
+  if (i < bound)
+    return b[a[i]]; /* { dg-warning "spectrev1" } */
+}
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9f9d85fdbc3..f5c164f465f 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -625,6 +625,7 @@ extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_spectrev1 (gcc::context *ctxt);

 /* Current optimization pass.  */
 extern opt_pass *current_pass;
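
For illustration only, here is a rough hand-written C picture of what the
branch and load instrumentation amounts to for foo () from the testcase,
in the function-local mode (-fspectre-v1=2). This is a sketch under
assumptions, not output of the pass: the name foo_masked, the use of
uintptr_t/size_t, and the "return 0" on the false path are made up for
the example, and only the first index is masked because instrument_mem ()
is not part of the hunks quoted here. The real transform happens on
GIMPLE, and an optimizing compiler is free to fold this source-level
version away, so it shows the data flow of the mask and is not a usable
mitigation by itself.

```c
#include <stddef.h>
#include <stdint.h>

unsigned char a[1024];
int b[256];

/* Hypothetical source-level view of the instrumented foo ().  */
int
foo_masked (int i, int bound)
{
  /* Function-local mode: the mask starts out as all-ones.  */
  uintptr_t mask = ~(uintptr_t) 0;

  /* tem_mask = i < bound ? -1 : 0; the branch then tests tem_mask.  */
  uintptr_t tem_mask = i < bound ? ~(uintptr_t) 0 : (uintptr_t) 0;
  if (tem_mask != 0)
    {
      /* True edge: mask = mask & tem_mask.  */
      mask &= tem_mask;
      /* Masked load: if this path was entered by mis-speculation,
         mask is zero and the index collapses to 0.  */
      unsigned char idx = a[(size_t) i & mask];
      /* The value loaded by a masked load is treated as "safe".  */
      return b[idx];
    }

  /* False edge: mask = mask & ~tem_mask.  */
  mask &= ~tem_mask;
  (void) mask;
  return 0; /* Assumption: the original testcase returns nothing here.  */
}
```

With -fspectre-v1=3 the initial value of mask would instead be loaded
from the TLS variable, and mask would be stored back to it before calls
and returns, as the hunks above do with spectrev1_tls_mask_decl.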