Alexandre Oliva - 2018-10-11 v1.0 (*)

-g-Ology, or gOlogy, stands for the study of how optimization levels (selected by -O flags) affect the quality of debugging information (enabled by -g flags). This report assesses the theoretical and practical impact of various optimizations available in the GNU Compiler Collection version 8 on the debugging experience of applications compiled by it. The goal is to assess the quality of the debug information generated by GCC with optimization enabled, document the effects of optimization passes, and identify and document problems and opportunities to improve it.

GCC offers various optimization levels, from -O0 to -O3, plus -Og, -Os and -Ofast, and well over a hundred independently-controllable optimization flags. Each of the optimization levels enables a subset of the optimization flags; enabling debugging information generation, on the other hand, is not supposed to have any effect whatsoever on the executable code. This report focuses on flags that are enabled by the -O* options, and their effects on (extended) DWARF debug information generated by GCC.

This report is structured as follows. The introduction outlines how GCC gets from source code to output assembly code and debug information, the major internal representation forms used throughout compilation, and several techniques used by GCC to keep track of the mapping from internal representations and output executable code to corresponding source code concepts. Then, the bulk of the report goes through each of the -O flags, and in each of them, through the optimization passes that are enabled or affected by the -O flag, describing the general behavior of the pass and what effects it may have on debug information. The final section highlights and consolidates the most relevant findings.
== Introduction

In GCC, language front ends parse a translation unit and deliver to the so-called middle end a number of functions (procedures, methods, subprograms) to compile in a form that, although language-independent, closely resembles a parse tree. Each function then goes through a number of passes, some of which are only executed when certain optimization flags are enabled, or other conditions are met.

The tree form is turned into gimple form, in which each function amounts to a set of basic blocks in a control flow graph, each containing a sequence of stmts represented as tuples. A stmt may be a label definition, a simple assignment, a function call, a conditional or unconditional branch, an asm statement, debug binds or markers, or other less common forms. Scalar variables are versioned and converted to static single assignment (SSA) form, in which each reference to a variable takes a version that links it back to a single definition of that variable version. Additional definitions, called PHI nodes, may be introduced at confluence basic blocks, indicating which version is to be taken when arriving from each incoming block. This is the form in which most of the optimization passes in GCC take place.

Each function is then expanded to the register transfer language (RTL) form, in which basic blocks are now formed by a sequence of insns, each one corresponding to a machine instruction defined in the target back end, or other machine-independent forms such as debug binds and markers, notes and other forms not relevant for this report. Each insn may contain zero or more computations represented as SETs (one of which may set PC to indicate a branch), a CALL, an ASM, and indicators that additional registers or memory can be used or unpredictably modified. Scalar variables are initially assigned to pseudo-registers, and many RTL optimization passes operate in this form.
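As a rough illustration of the SSA form described above, the following C sketch pairs an ordinary function with a hand-written equivalent in which every assignment targets a fresh name and the join point selects a version explicitly, the way a PHI node would. The names select_step, x_1, x_2, x_3 and came_from_then are made up for this example; GCC's own dumps number versions differently, and this is not output from any GCC pass.

```c
/* Ordinary source form: x is assigned on two paths and read after they join. */
int select_step(int n)
{
    int x;
    if (n > 0)
        x = n * 2;
    else
        x = 1 - n;
    return x + 1;
}

/* Hand-written SSA-like equivalent (illustrative names, not a GCC dump). */
int select_step_ssa(int n_0)
{
    int x_1, x_2, x_3;
    int came_from_then;          /* models which incoming edge was taken */

    if (n_0 > 0) {
        x_1 = n_0 * 2;           /* the only definition of x_1 */
        came_from_then = 1;
    } else {
        x_2 = 1 - n_0;           /* the only definition of x_2 */
        came_from_then = 0;
    }

    /* x_3 = PHI <x_1 (then edge), x_2 (else edge)> */
    x_3 = came_from_then ? x_1 : x_2;
    return x_3 + 1;
}
```

Each name in the second function has exactly one definition, so any use can be traced back to it without scanning for intervening assignments; the explicit selection at the join is the only place where versions meet.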
Register allocation will then map each remaining pseudo-register to a hardware register (if optimizing) or a stack slot, adding spills and reloads as needed to satisfy the requirements of each hardware instruction. A few RTL passes run after register allocation, and at the end assembly code is output for each insn, while outputting debug information that is to be interspersed with the assembly code, and gathering debug information that is consolidated and output afterwards.

=== Preserving debug information

There was a time when debugging required disabling optimizations. Debug information formats back then could only assign a single location to each variable, and optimizing out the frame pointer would remove the base reference for all stack-based variables.

GCC has long had the notion that enabling debug information should not cause any changes to executable code. To that end, each stmt and insn carries source location information, i.e., file and line (and, more recently, column) numbers and lexical blocks, even when debug information is not enabled. Without optimization, this makes for single-stepping in a debugger just in the natural order of execution, and all variables are assigned stable memory locations, which makes for a single location per variable throughout its lifetime.

Optimizations introduce complications, combining, simplifying and removing computations, modifying the order of execution, reusing registers and stack slots, duplicating portions of code, introducing alternate induction variables and modifying the iteration order in loop nests. Compiler and debug information formats have evolved over time so as to enable optimized programs to be represented and debugged, with varying levels of success. For example, automatic variables in optimized programs may live in a register for some time, another register at another time, and a stack slot at other times.
DWARF debug information supports location lists, that may indicate a different location for a variable for different, possibly-overlapping executable code ranges. Memory references in gimple and RTL forms carry symbolic expressions used for alias analysis, and also to build location lists; SSA versions, RTL pseudo-registers and hardware registers also carry symbolic references to the variables they refer to. The variable tracking pass identifies, using such symbolic references, situations in which the location of a variable varies throughout its lifetime, and arranges for location lists to be output accordingly. As location expressions gained the ability to represent value expressions, it became possible to indicate that in a certain range a variable holds a known constant value, or that its value is not available directly, but it can be computed from other locations. Variable tracking at assignments extended variable tracking, introducing debug binds early in compilation that associate a scalar source variable with the location in which its value is stored, arranging for the location/value expressions to be adjusted throughout the compilation (even if computations are removed or moved past the binds, so that the bound value expressions remain accurate) while preserving their natural execution order, and using such binds to generate location lists. Although each stmt and insn carries source location information, as they're shuffled by optimization, single-stepping may go back to earlier statements, and it becomes impossible to tell when the effects of a statement are complete. Statement Frontier Notes (SFN) are introduced as additional debug notes, emitted (so far only by C and C++ parsers) in the stmt stream to mark the beginning of logical statements, thus after any debug binds associated with previous statements take effect. 
Their natural execution order is retained by the compiler, so the markers can be used to output source location information marked as recommended stop points (the is_stmt flag in DWARF line number tables), avoiding bouncing and making for predictable observability of side effects. Given optimization, it is not uncommon for no executable code to remain between inspection points for multiple neighbor statements. This was a problem because, although multiple source locations can be associated with a single address in the line number table, ranges in location lists could only name addresses of executable instructions. Location view (LVu) numbering was introduced to identify each of the entries in the line number table that refer to the same code address, so that they can then be referenced unambiguously in location lists. The representation of such extended location lists requires extensions proposed for DWARF v6, and at the time of this writing, there aren't any debuggers that support such extended location lists. Still, since GCC makes the information available and we expect debuggers to catch up eventually, the analyses that follow assume the disambiguation given by LVu is effective in masking the optimization effects it was created to overcome. Despite all this effort, it is not realistic to expect the debug experience of a program without optimization to be the same as that of a program optimized even by optimizations regarded as not affecting debugging. For example, a variable assigned to an exclusive stack slot will be available throughout a function, but optimization may assign it to a register during its limited live range, and then it won't be possible to inspect it elsewhere. Setting breakpoints based on addresses of executable code may not work as effectively in optimized programs, because the same spot of the program may have been duplicated by optimization, and then the breakpoint may not hit where expected. 
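One way to picture a location list is as a table of code address ranges, each paired with the place where the variable can be found within that range. The sketch below is a deliberately simplified model in C, not the DWARF encoding and not GCC or debugger code; the struct loc_entry type, the lookup_location function, the addresses and the location strings are all assumptions made for illustration, and location views are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified location-list entry: for code addresses in
   [start, end), the variable can be found at the place named by where. */
struct loc_entry {
    uintptr_t start;
    uintptr_t end;
    const char *where;
};

/* Made-up list for one variable: two registers, a gap, then a stack slot. */
const struct loc_entry example_list[] = {
    { 0x1000, 0x1010, "register A" },
    { 0x1010, 0x1020, "register B" },
    { 0x1028, 0x1038, "stack slot" },
};

/* Return the entry whose range covers pc, or NULL if no range does. */
const struct loc_entry *lookup_location(const struct loc_entry *list,
                                        size_t count, uintptr_t pc)
{
    for (size_t i = 0; i < count; i++)
        if (pc >= list[i].start && pc < list[i].end)
            return &list[i];
    return NULL;
}
```

A lookup that finds no covering range, such as an address inside the gap between the second and third entries, corresponds to what a debugger would report as "optimized out" at that point.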
Having the value of a variable available in a given location, say its stack slot, does not guarantee that it is possible to modify it: the value could have just been loaded into a register, which the program may then modify and store back in the stack slot. This might happen even without optimization, but the windows for this possibility are narrower. Furthermore, folding that logically follows from reasoning about what is known about a variable at compile time may no longer be applicable if the variable is modified in the debugger; if a block was removed because the condition guarding it was provably false at compile time, changing a variable so that the condition would evaluate to true will not bring back the code that was optimized out.

So, inspecting variables in optimized programs is more likely to yield "optimized out" because optimizations may expose dead ranges that are not noticed with -O0, and modifying them may always conflict with optimizations. As for breakpoints, using source locations rather than code addresses is less likely to yield surprising results.

== Optimizations

In this section, each optimization level is detailed, enumerating the flags incrementally enabled by it over the previous level, and detailing the effects on debugging brought about by each of the optimization levels and flags.

Optimization levels form a nearly-strict crescendo in terms of passes they activate: -O0 ->#O0, -Og ->#Og, -O1 ->#O1, -Os ->#Os, -O2 ->#O2, -O3 ->#O3, -Ofast ->#Ofast.

Nevertheless, determining when a pass is run is an involved process. Each pass has a gate function that decides whether to run the pass based on optimization levels and flags. The default_options_table array in gcc/opts.c arranges for flags to be enabled depending on the optimization level, but some flags are enabled by default through their initializer in e.g. gcc/common.opt. Some are also forced enabled or disabled depending on other conditions.
However, even if the gate condition of a pass is enabled, it might not run if any enclosing pass group fails its own gate condition. The following outline depicts the optimization passes GCC goes through while compiling a function, in the order they might run; the information is extracted from gcc/passes.def. Indentation indicates grouping of the indented passes within the previous less-indented pass group. Parameters for the pass are indicated between parentheses after the pass name. all_lowering_passes: pass_warn_unused_result pass_diagnose_omp_blocks pass_diagnose_tm_blocks #pass_lower_omp pass_lower_omp see -O1 ->#O1-pass_lower_omp pass_lower_cf pass_lower_tm pass_refactor_eh #pass_lower_eh pass_lower_eh see -Og ->#Og-pass_lower_eh pass_build_cfg pass_warn_function_return #pass_expand_omp pass_expand_omp see -Og ->#Og-pass_expand_omp, and -O1 ->#O1-pass_lower_omp pass_sprintf_length(!fold_return_value) pass_walloca(strict_mode) pass_build_cgraph_edges all_small_ipa_passes: pass_ipa_free_lang_data pass_ipa_function_and_variable_visibility pass_ipa_chkp_versioning pass_ipa_chkp_early_produce_thunks pass_build_ssa_passes: pass_fixup_cfg pass_build_ssa pass_warn_nonnull_compare pass_ubsan pass_early_warn_uninitialized pass_nothrow pass_rebuild_cgraph_edges pass_chkp_instrumentation_passes: pass_fixup_cfg pass_chkp pass_rebuild_cgraph_edges pass_local_optimization_passes: pass_fixup_cfg pass_rebuild_cgraph_edges pass_local_fn_summary pass_early_inline pass_all_early_optimizations: pass_remove_cgraph_callee_edges pass_object_sizes(insert_min_max) #pass_ccp pass_ccp(!nonzero) ->#Og-tree-ccp see also --tree-bit-ccp ->#O1-tree-bit-ccp, and --ipa-bit-cp ->#Os-ipa-bit-cp #pass_forwprop pass_forwprop ->#O1-tree-forwprop pass_early_thread_jumps #pass_sra_early pass_sra_early ->#O1-tree-sra #pass_build_ealias pass_build_ealias ->#O1-tree-pta #pass_fre pass_fre ->#Og-tree-fre #pass_early_vrp pass_early_vrp ->#Os-tree-vrp #pass_merge_phi pass_merge_phi ->#O1-pass_merge_phi 
#pass_dse pass_dse ->#Og-tree-dse #pass_cd_dce pass_cd_dce ->#Og-tree-dce see also --tree-dce(aggressive) ->#Os-tree-dce(aggressive) #pass_early_ipa_sra pass_early_ipa_sra ->#Os-ipa-sra #pass_tail_recursion pass_tail_recursion ->#Os-optimize-sibling-calls #pass_convert_switch pass_convert_switch ->#Os-tree-switch-conversion #pass_cleanup_eh pass_cleanup_eh see -Og ->#Og-pass_split_crit_edges #pass_profile pass_profile see --guess-branch-probability ->#Og-guess-branch-probability #pass_local_pure_const pass_local_pure_const ->#Og-ipa-pure-const #pass_split_functions pass_split_functions ->#Os-partial-inlining pass_strip_predict_hints pass_release_ssa_names pass_rebuild_cgraph_edges pass_local_fn_summary pass_ipa_oacc: pass_ipa_pta pass_ipa_oacc_kernels: pass_oacc_kernels: #pass_ch pass_ch ->#Og-tree-ch pass_fre see above ->#pass_fre #pass_lim pass_lim ->#O1-tree-loop-im #pass_dominator pass_dominator(!may_peel_loop_headers) ->#O1-tree-dominator-opts #pass_dce pass_dce ->#Og-tree-dce pass_parallelize_loops(oacc_kernels) #pass_expand_omp_ssa pass_expand_omp_ssa see -Og ->#Og-pass_expand_omp, and -O1 ->#O1-pass_lower_omp pass_rebuild_cgraph_edges pass_target_clone pass_ipa_chkp_produce_thunks pass_ipa_auto_profile pass_ipa_tree_profile: pass_feedback_split_functions pass_ipa_free_fn_summary(small) #pass_ipa_increase_alignment pass_ipa_increase_alignment ->#O3-tree-loop-vectorize-pass_ipa_increase_alignment pass_ipa_tm pass_ipa_lower_emutls all_regular_ipa_passes: pass_ipa_whole_program_visibility #pass_ipa_profile pass_ipa_profile ->#Og-ipa-profile #pass_ipa_icf pass_ipa_icf ->#Os-ipa-icf #pass_ipa_devirt pass_ipa_devirt ->#Os-devirtualize see also --devirtualize-speculatively ->#Os-devirtualize-speculatively #pass_ipa_cp pass_ipa_cp ->#Os-ipa-cp see also --ipa-bit-cp ->#Os-ipa-bit-cp, --ipa-vrp ->#Os-ipa-vrp, and --ipa-cp-clone ->#O3-ipa-cp-clone pass_ipa_cdtor_merge pass_ipa_hsa pass_ipa_fn_summary #pass_ipa_inline pass_ipa_inline see -Og ->#Og-pass_ipa_inline, 
--inline-functions-called-once ->#O1-inline-functions-called-once, --inline-small-functions ->#Os-inline-small-functions, --indirect-inlining ->#Os-indirect-inlining, -Os ->#Os-inline-functions, -O2 ->#O2-no-inline-functions, and -O3 ->#O3-inline-functions #pass_ipa_pure_const pass_ipa_pure_const ->#Og-ipa-pure-const pass_ipa_free_fn_summary(!small) #pass_ipa_reference pass_ipa_reference ->#Og-ipa-reference pass_ipa_comdats all_late_ipa_passes: pass_materialize_all_clones pass_ipa_pta pass_omp_simd_clone all_passes: pass_fixup_cfg pass_lower_eh_dispatch pass_oacc_device_lower pass_omp_device_lower pass_omp_target_link pass_all_optimizations: pass_remove_cgraph_callee_edges pass_strip_predict_hints pass_ccp(nonzero) see above ->#pass_ccp pass_post_ipa_warn #pass_complete_unrolli pass_complete_unrolli ->#Os-pass_complete_unrolli see also --tree-loop-ivcanon ->#O1-tree-loop-ivcanon #pass_backprop pass_backprop ->#O1-ssa-backprop #pass_phiprop pass_phiprop ->#O1-tree-phiprop pass_forwprop see above ->#pass_forwprop pass_object_sizes(!insert_min_max) #pass_build_alias pass_build_alias ->#O1-tree-pta #pass_return_slot pass_return_slot ->#Og-pass_return_slot pass_fre see above ->#pass_fre pass_merge_phi see above ->#pass_merge_phi #pass_thread_jumps pass_thread_jumps ->#Os-expensive-optimizations-pass_thread_jumps #pass_vrp pass_vrp(warn_array_bounds) ->#Os-tree-vrp pass_chkp_opt pass_dce see above ->#pass_dce #pass_stdarg pass_stdarg ->#O1-stdarg-opt #pass_call_cdce pass_call_cdce ->#O1-tree-builtin-call-dce #pass_cselim pass_cselim ->#O1-tree-cselim #pass_copy_prop pass_copy_prop ->#Og-tree-copy-prop #pass_tree_ifcombine pass_tree_ifcombine ->#O1-pass_tree_ifcombine pass_merge_phi see above ->#pass_merge_phi #pass_phiopt pass_phiopt ->#O1-ssa-phiopt see also --hoist-adjacent-loads ->#Os-hoist-adjacent-loads pass_tail_recursion see above ->#pass_tail_recursion pass_ch see above ->#pass_ch pass_lower_complex #pass_sra pass_sra ->#O1-tree-sra pass_thread_jumps see above 
->#pass_thread_jumps pass_dominator(may_peel_loop_headers) see above ->#pass_dominator #pass_isolate_erroneous_paths pass_isolate_erroneous_paths ->#Os-isolate-erroneous-paths-dereference #pass_phi_only_cprop pass_phi_only_cprop ->#O1-tree-dominator-opts pass_dse see above ->#pass_dse #pass_reassoc pass_reassoc(insert_powi) ->#O1-tree-reassoc pass_dce see above ->#pass_dce pass_forwprop see above ->#pass_forwprop pass_phiopt see above ->#pass_phiopt pass_ccp(nonzero) see above ->#pass_ccp #pass_cse_sincos pass_cse_sincos ->#Og-pass_cse_sincos #pass_optimize_bswap pass_optimize_bswap ->#Os-expensive-optimizations-pass_optimize_bswap #pass_laddress pass_laddress ->#O1-pass_laddress pass_lim see above ->#pass_lim pass_walloca(!strict_mode) #pass_pre pass_pre ->#Os-tree-pre see also --code-hoisting ->#Os-code-hoisting, --tree-tail-merge ->#Os-tree-tail-merge, and --tree-partial-pre ->#O3-tree-partial-pre #pass_sink_code pass_sink_code ->#Og-tree-sink pass_sancov pass_asan pass_tsan pass_dce see above ->#pass_dce #pass_fix_loops pass_fix_loops ->#O1-tree-loop-optimize #pass_tree_loop pass_tree_loop: ->#O1-tree-loop-optimize pass_tree_loop_init #pass_tree_unswitch pass_tree_unswitch ->#O3-unswitch-loops #pass_scev_cprop pass_scev_cprop ->#O1-tree-scev-cprop #pass_loop_split pass_loop_split ->#O3-split-loops #pass_loop_jam pass_loop_jam ->#O3-loop-unroll-and-jam pass_cd_dce see above ->#pass_cd_dce #pass_iv_canon pass_iv_canon ->#O1-tree-loop-ivcanon #pass_loop_distribution pass_loop_distribution ->#O3-tree-loop-distribution see also --tree-loop-distribute-patterns ->#O3-tree-loop-distribute-patterns #pass_linterchange pass_linterchange ->#O3-loop-interchange pass_copy_prop see above ->#pass_copy_prop pass_graphite: pass_graphite_transforms pass_lim see above ->#pass_lim pass_copy_prop see above ->#pass_copy_prop pass_dce see above ->#pass_dce pass_parallelize_loops(!oacc_kernels) pass_expand_omp_ssa see above ->#pass_expand_omp_ssa #pass_ch_vect pass_ch_vect 
->#Og-tree-ch see also --tree-loop-vectorize ->#O3-tree-loop-vectorize-pass_ch_vect #pass_if_conversion pass_if_conversion ->#O3-tree-loop-if-convert #pass_vectorize pass_vectorize: ->#O3-tree-loop-vectorize see also --vect-cost-model=cheap ->#Os-vect-cost-model=cheap, and --vect-cost-model=dynamic ->#O3-vect-cost-model=dynamic pass_dce see above ->#pass_dce #pass_predcom pass_predcom ->#O3-predictive-commoning #pass_complete_unroll pass_complete_unroll ->#O3-pass_complete_unroll see also --tree-loop-ivcanon ->#O1-tree-loop-ivcanon, and --peel-loops ->#O3-peel-loops #pass_slp_vectorize pass_slp_vectorize ->#O3-tree-slp-vectorize see also --vect-cost-model=cheap ->#Os-vect-cost-model=cheap, and --vect-cost-model=dynamic ->#O3-vect-cost-model=dynamic pass_loop_prefetch #pass_iv_optimize pass_iv_optimize ->#O1-ivopts pass_lim see above ->#pass_lim pass_tree_loop_done #pass_tree_no_loop pass_tree_no_loop: ->#O1-tree-loop-optimize pass_slp_vectorize see above ->#pass_slp_vectorize pass_simduid_cleanup #pass_lower_vector_ssa pass_lower_vector_ssa see -Og ->#Og-pass_lower_vector #pass_cse_reciprocals pass_cse_reciprocals ->#Ofast-reciprocal-math pass_sprintf_length(fold_return_value) pass_reassoc(!insert_powi) see above ->#pass_reassoc #pass_strength_reduction pass_strength_reduction ->#Og-tree-slsr see also --expensive-optimizations ->#Os-expensive-optimizations-pass_strength_reduction #pass_split_paths pass_split_paths ->#O3-split-paths pass_tracer pass_thread_jumps see above ->#pass_thread_jumps pass_dominator(!may_peel_loop_headers) see above ->#pass_dominator #pass_strlen pass_strlen ->#O2-optimize-strlen pass_thread_jumps see above ->#pass_thread_jumps pass_vrp(!warn_array_bounds) see above ->#pass_vrp pass_warn_restrict pass_phi_only_cprop see above ->#pass_phi_only_cprop pass_dse see above ->#pass_dse pass_cd_dce see above ->#pass_cd_dce pass_forwprop see above ->#pass_forwprop pass_phiopt see above ->#pass_phiopt #pass_fold_builtins pass_fold_builtins see -Og 
->#Og-pass_fold_builtins, and --inline-atomics ->#O1-inline-atomics #pass_optimize_widening_mul pass_optimize_widening_mul ->#Os-expensive-optimizations-pass_optimize_widening_mul #pass_store_merging pass_store_merging ->#Os-store-merging #pass_tail_calls pass_tail_calls ->#Os-optimize-sibling-calls pass_dce see above ->#pass_dce #pass_split_crit_edges pass_split_crit_edges ->#Og-pass_split_crit_edges pass_late_warn_uninitialized #pass_uncprop pass_uncprop ->#O1-tree-dominator-opts pass_local_pure_const see above ->#pass_local_pure_const pass_all_optimizations_g: pass_remove_cgraph_callee_edges pass_strip_predict_hints pass_lower_complex pass_lower_vector_ssa see above ->#pass_lower_vector_ssa pass_ccp(nonzero) see above ->#pass_ccp pass_post_ipa_warn pass_object_sizes pass_fold_builtins see above ->#pass_fold_builtins pass_sprintf_length(fold_return_value) pass_copy_prop see above ->#pass_copy_prop pass_dce see above ->#pass_dce pass_sancov pass_asan pass_tsan pass_split_crit_edges see above ->#pass_split_crit_edges pass_late_warn_uninitialized pass_uncprop see above ->#pass_uncprop pass_local_pure_const see above ->#pass_local_pure_const pass_tm_init: pass_tm_mark pass_tm_memopt pass_tm_edges pass_simduid_cleanup pass_vtable_verify pass_lower_vaarg #pass_lower_vector pass_lower_vector see -Og ->#Og-pass_lower_vector pass_lower_complex_O0 pass_sancov_O0 pass_lower_switch pass_asan_O0 pass_tsan_O0 pass_sanopt pass_cleanup_eh see above ->#pass_cleanup_eh pass_lower_resx pass_nrv pass_cleanup_cfg_post_optimizing pass_warn_function_noreturn pass_gen_hsail #pass_expand pass_expand see -Og ->#Og-pass_expand, --tree-coalesce-vars ->#Og-tree-coalesce-vars, --tree-ter ->#Og-tree-ter, --defer-pop ->#Og-defer-pop, and --expensive-optimizations ->#Os-expensive-optimizations-pass_strength_reduction pass_rest_of_compilation: pass_instantiate_virtual_regs pass_into_cfg_layout_mode #pass_jump pass_jump see -Og ->#Og-pass_jump, and --thread-jumps ->#Os-thread-jumps 
#pass_lower_subreg pass_lower_subreg ->#Og-split-wide-types #pass_df_initialize_opt pass_df_initialize_opt see -Og ->#Og-pass_df_initialize_opt #pass_cse pass_cse ->#Og-pass_cse see also --expensive-optimizations ->#Os-expensive-optimizations-pass_strength_reduction, --rerun-cse-after-loop ->#Os-rerun-cse-after-loop, and --cse-follow-jumps ->#Os-cse-follow-jumps #pass_rtl_fwprop pass_rtl_fwprop ->#Og-forward-propagate #pass_rtl_cprop pass_rtl_cprop ->#Os-gcse #pass_rtl_pre pass_rtl_pre ->#Os-gcse #pass_rtl_hoist pass_rtl_hoist ->#Os-gcse pass_rtl_cprop see above ->#pass_rtl_cprop pass_rtl_store_motion #pass_cse_after_global_opts pass_cse_after_global_opts ->#Os-rerun-cse-after-loop see also --cse-follow-jumps ->#Os-cse-follow-jumps #pass_rtl_ifcvt pass_rtl_ifcvt ->#O1-if-conversion pass_reginfo_init pass_loop2: pass_rtl_loop_init #pass_rtl_move_loop_invariants pass_rtl_move_loop_invariants ->#O1-move-loop-invariants see also -Og ->#Og-pass_rtl_move_loop_invariants pass_rtl_unroll_loops #pass_rtl_doloop pass_rtl_doloop ->#O1-branch-count-reg pass_rtl_loop_done pass_web pass_rtl_cprop see above ->#pass_rtl_cprop #pass_cse2 pass_cse2 ->#Os-rerun-cse-after-loop see also --cse-follow-jumps ->#Os-cse-follow-jumps #pass_rtl_dse1 pass_rtl_dse1 ->#Og-dse #pass_rtl_fwprop_addr pass_rtl_fwprop_addr ->#Og-forward-propagate #pass_inc_dec pass_inc_dec ->#Og-auto-inc-dec #pass_initialize_regs pass_initialize_regs ->#Og-pass_initialize_regs #pass_ud_rtl_dce pass_ud_rtl_dce ->#Os-dce(ud) #pass_combine pass_combine ->#Og-pass_combine see also --expensive-optimizations ->#Os-expensive-optimizations-pass_strength_reduction #pass_if_after_combine pass_if_after_combine ->#O1-if-conversion pass_partition_blocks pass_outof_cfg_layout_mode pass_split_all_insns #pass_lower_subreg2 pass_lower_subreg2 ->#Og-split-wide-types pass_df_initialize_no_opt pass_stack_ptr_mod pass_mode_switching pass_match_asm_constraints pass_sms pass_live_range_shrinkage #pass_sched pass_sched ->#O2-schedule-insns 
#pass_early_remat pass_early_remat ->#Os-pass_early_remat #pass_ira pass_ira see -Og ->#Og-pass_ira, --ira-share-save-slots ->#Og-ira-share-save-slots, --omit-frame-pointer ->#Og-omit-frame-pointer, -Os ->#Os-pass_ira, --expensive-optimizations ->#Os-expensive-optimizations-pass_strength_reduction, --caller-saves ->#Os-caller-saves, --ipa-ra ->#Os-ipa-ra, and --lra-remat ->#Os-lra-remat #pass_reload pass_reload see -Og ->#Og-pass_reload, and --expensive-optimizations ->#Os-expensive-optimizations-pass_strength_reduction pass_postreload: #pass_postreload_cse pass_postreload_cse ->#Os-expensive-optimizations-pass_strength_reduction #pass_gcse2 pass_gcse2 ->#O3-gcse-after-reload #pass_split_after_reload pass_split_after_reload ->#Og-pass_split_after_reload pass_ree #pass_compare_elim_after_reload pass_compare_elim_after_reload ->#Og-compare-elim pass_branch_target_load_optimize1 #pass_thread_prologue_and_epilogue pass_thread_prologue_and_epilogue see -Og ->#Og-pass_jump, and --shrink-wrap ->#Og-shrink-wrap #pass_rtl_dse2 pass_rtl_dse2 ->#Og-dse #pass_stack_adjustments pass_stack_adjustments ->#Og-combine-stack-adjustments #pass_jump2 pass_jump2 see --crossjumping ->#Os-crossjumping #pass_duplicate_computed_gotos pass_duplicate_computed_gotos ->#Os-expensive-optimizations-pass_duplicate_computed_gotos pass_sched_fusion #pass_peephole2 pass_peephole2 ->#Os-peephole2 #pass_if_after_reload pass_if_after_reload ->#O1-if-conversion2 pass_regrename #pass_cprop_hardreg pass_cprop_hardreg ->#Og-cprop-registers #pass_fast_rtl_dce pass_fast_rtl_dce ->#Og-dce(fast) see also -Og ->#Og-pass_fast_rtl_dce #pass_reorder_blocks pass_reorder_blocks ->#Og-reorder-blocks see also --reorder-blocks-algorithm=stc ->#O2-reorder-blocks-algorithm=stc pass_branch_target_load_optimize2 pass_leaf_regs #pass_split_before_sched2 pass_split_before_sched2 ->#Os-schedule-insns2 #pass_sched2 pass_sched2 ->#Os-schedule-insns2 pass_stack_regs: #pass_split_before_regstack pass_split_before_regstack 
->#Og-pass_split_after_reload pass_stack_regs_run pass_late_compilation: #pass_compute_alignments pass_compute_alignments see --align-loops ->#Os-align-loops, --align-jumps ->#Os-align-jumps, --align-labels ->#Os-align-labels, and --align-functions ->#Os-align-functions #pass_variable_tracking pass_variable_tracking ->#Og-pass_variable_tracking pass_free_cfg pass_machine_reorg pass_cleanup_barriers #pass_delay_slots pass_delay_slots ->#Og-delayed-branch pass_split_for_shorten_branches pass_convert_to_eh_region_ranges #pass_shorten_branches pass_shorten_branches see -Og ->#Og-pass_shorten_branches pass_set_nothrow_function_flags pass_dwarf2_frame #pass_final pass_final see -Og ->#Og-pass_final, --peephole ->#Og-peephole, and --ipa-ra ->#Os-ipa-ra pass_df_finish pass_clean_state

#build #gimplify
Before optimizations, the program is parsed so as to build ->#Og-build a tree representation, that is then gimplified ->#Og-gimplify.

#TODO_cleanup_cfg #TODO_rebuild_alias #TODO_remove_unused_locals
Some optimization passes run such cleanup passes as TODO_cleanup_cfg ->#Og-TODO_cleanup_cfg, TODO_rebuild_alias ->#O1-tree-pta, and TODO_remove_unused_locals ->#Og-TODO_remove_unused_locals.

#strict-aliasing #varasm #fast-math
There are other flags that affect too many passes to mention, such as --strict-aliasing ->#Os-strict-aliasing, --merge-constants ->#Og-merge-constants and --fast-math ->#Ofast-fast-math, or that cannot be associated with any optimization pass, such as --reorder-functions ->#Os-reorder-functions.

#O0 -O0: optimize=0

Disable optimization. This flag sets optimization level to 0.

This is the base level, the gold standard for the debugging experience, against which other levels are compared. All automatic variables and parameters are allocated to memory, being loaded and, if modified, stored back, at every use. All branches and labels are preserved, and no blocks are duplicated.
Functions are not inlined, except for mandatory inlines, e.g., functions marked with attribute always_inline. Source locations of branches or returns that are preserved only in CFG edges are materialized as NOPs.

#Og -Og: optimize=1 + debug

Perform only very fast optimizations with low impact on debugging. This flag sets the optimization level to 1, but limited by an option for better debugging that disables a number of optimizations, even some that would otherwise be enabled at optimization level 1.

#Og-build --- build ->#build

Optimization enables the selection of the local dynamic TLS model to access thread-local variables known to be defined in the dynamic module being compiled. Without that, the global dynamic TLS model is used instead, but this change has no effect on debugging.

Type conversion folding attempts to replace conversions to float of the results of standard calls that return double with calls to the variants that return float. Likewise, conversions to integral types of the results of standard calls that return double (e.g. round, logb) are replaced with calls that return integral types (lround, ilogb). These only affect debugging inasmuch as the behavior of the substituted functions is to be inspected.

#Og-gimplify --- gimplify ->#gimplify

Optimization brings small changes in the processing of nested functions that enable frame structs and static chains to be optimized away, without impact on debugging, and in the representation of variable-length arrays in nested functions, which may lose some details about the types.

#Og-pass_expand_omp --- pass_expand_omp ->#pass_expand_omp, and pass_expand_omp_ssa ->#pass_expand_omp_ssa

Some OpenMP primitives may also be simplified when optimization is enabled. These are internal implementation details, so they shouldn't affect debugging.
#Og-pass_lower_eh --- pass_lower_eh ->#pass_lower_eh

Gimple EH lowering decisions change with optimization, but finally regions may be duplicated either way, and with the same minor effects on debugging: different code addresses for the same source code lines.

#Og-pass_split_crit_edges --- pass_split_crit_edges ->#pass_split_crit_edges, and pass_cleanup_eh ->#pass_cleanup_eh

Critical edges are also split to ease optimizations, and later unsplit if they remain.

#Og-pass_ipa_inline --- pass_ipa_inline ->#pass_ipa_inline

Optimization slightly affects the way variables and parameters are remapped when inlining, but the effects of these changes on debug information are masked away.

#Og-TODO_cleanup_cfg --- TODO_cleanup_cfg ->#TODO_cleanup_cfg

When optimizing, various passes run cleanups of the control flow graph. This may delete unreachable blocks and trivially dead insns like unused sets or copies to self. In gimple mode, the removal of unreachable blocks may propagate SSA defs to uses, but it is hard to imagine that any uses thereof will be reachable, so there should be no impact on debugging. Removed blocks may be missed during debugging: breakpoints can't be set in removed blocks.

Cleanup may renumber basic blocks, detect forwarder blocks, remove unused labels and fallthrough forwarder blocks, merge blocks with unconditional fallthrough, replace jumps to returns or jumps with copies of the targets, simplify conditional jumps and remove single-destination jumps. The removal of fallthrough forwarder blocks may discard debug binds and markers, which could make single-stepping or breaking at the source locations represented by the removed markers impossible. Binds might also be lost, though at least in gimple there will often be redundant binds at confluence points, shortly thereafter. A similar negative effect arises when a jump is replaced with a return or another jump, bypassing any debug markers and binds at the original target's block.
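As a source-level picture of one of the cleanups just listed, consider a goto whose target block does nothing but return. The C sketch below shows a function as written and a hand-edited variant in which the jump has been replaced by a copy of the return, which is roughly the shape the code takes once a jump to a return is replaced. Both function names are invented for this example, and whether GCC performs the replacement for a given function and target is not something this sketch establishes.

```c
/* As written: the search leaves the loop through a jump to a shared label. */
int first_negative(const int *v, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (v[i] < 0)
            goto done;      /* jump to a block that only returns */
    i = -1;
done:
    return i;
}

/* Hand-edited equivalent: the jump is replaced with a copy of the return,
   so the early exit no longer passes through the block at the label. */
int first_negative_direct(const int *v, int n)
{
    for (int i = 0; i < n; i++)
        if (v[i] < 0)
            return i;
    return -1;
}
```

In the second form, anything attached to the block at done, such as a debug marker for the return statement's line, is bypassed on the early-exit path, which is the kind of loss described above.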
When optimizing, NOPs that would materialize CFG edge source locations are not inserted, and extra steps that preserve source locations during gimplification of jumps and labels are not taken. If corresponding debug markers are also dropped, this may remove the possibility of stopping at some goto. #Og-TODO_remove_unused_locals --- TODO_remove_unused_locals ->#TODO_remove_unused_locals Optimization enables unused local variables and lexical blocks to be released early; it may cause variables and scopes that cannot ever be entered to be omitted altogether from debug information. #Og-pass_return_slot --- pass_return_slot ->#pass_return_slot Optimization enables the named return value pass, that detects functions that return aggregate types in memory, always returning the same local variable, and unifies that variable with the result, using the name and source location of the variable, and mapping all uses of the variable to the result. This may have an effect on debugging if the variable happens to be taken from an inlined function: in this case, the source name and location mapping is skipped, because it would introduce a name not present in the original function, but the variable is still remapped to the return declaration, so the source location of the variable's declaration is lost. #Og-pass_cse_sincos --- pass_cse_sincos ->#pass_cse_sincos Optimization enables a pass that combines calls to sin, cos and cexpi with the same SSA operand into a single dominating cexpi call, taking the real or imaginary part of the result at each former sin or cos call. This pass also attempts to simplify pow, powi and cabs calls. None of these affect debugging, aside from the ability to step into any of the affected math function calls. 
#Og-pass_fold_builtins --- pass_fold_builtins ->#pass_fold_builtins Optimization enables a pass that simplifies memcpy to memset if the copied-from range is known to be all zeros, and some stdarg calls to simple pointer operations if va_list is a simple pointer type, along with other similar transformations. These do not affect debugging, aside from stepping into or breaking at simplified functions. #Og-pass_lower_vector --- pass_lower_vector ->#pass_lower_vector, and pass_lower_vector_ssa ->#pass_lower_vector_ssa Optimization enables attempts to optimize divide and modulus operations on vectors of integral types into combinations of vector multiply, shift, and add. It also enables attempts to optimize initialization of vectors to avoid piecewise initialization. None of these affect debugging. #Og-pass_expand --- pass_expand ->#pass_expand Enabling optimization changes defer_stack_allocation behavior, but its effect on debugging is limited to narrowing the live ranges of dead values. It also enables reordering of operations in expand, so that those requiring more operands are performed first. This reordering does not involve memory-modifying operations, and debug binds cover affected cases, so it does not affect debugging. Expand also introduces plenty of pseudos when optimizing, which allows replacement of common subexpressions and similar simplifications. Conversely, gimplification introduces more temporaries when not optimizing, and it attempts to reuse temporaries when optimizing. The effects on debugging are limited to variations in variable location assignments. #Og-pass_jump --- pass_jump ->#pass_jump, and pass_thread_prologue_and_epilogue ->#pass_thread_prologue_and_epilogue The jump and pro_and_epilogue RTL passes run cleanup_cfg with CLEANUP_EXPENSIVE when optimizing. This performs some more expensive block merging, and simplification of conditional jumps around jumps.
The merging has no effect on debugging (indeed, it could reduce the loss of debug markers and binds if done on forwarder blocks), whereas the simplification might drop markers and binds along with the jumps, with impact on debugging similar to that of the other jump simplifications. #Og-pass_df_initialize_opt --- pass_df_initialize_opt ->#pass_df_initialize_opt Several RTL optimization passes also use dataflow analysis to update notes about unused register definitions, as well as death points of registers. Debug binds that reference registers after their death points or unused sets are detected during this analysis, and debug temporaries are introduced next to the death points to preserve the equivalent expressions for use in the debug binds. This generally improves the debugging experience, enabling bind expressions to resort to the equivalences to express the values bound to user variables even if the register is reused for another purpose and no longer holds the value. #Og-pass_cse --- pass_cse ->#pass_cse The first CSE (common subexpression elimination) pass is enabled when optimizing. The effects of this pass are described under --rerun-cse-after-loop ->#Os-rerun-cse-after-loop. A third CSE pass may be activated with --rerun-cse-after-global-opts. #Og-pass_rtl_move_loop_invariants --- pass_rtl_move_loop_invariants ->#pass_rtl_move_loop_invariants Depending on the selected register allocation model, optimization changes register pressure cost estimates in the RTL loop analyzers, but that's not something that changes the kinds of optimizations made there, or the kinds of impacts on debugging they may have. #Og-pass_initialize_regs --- pass_initialize_regs ->#pass_initialize_regs Optimization enables the init-regs pass, that adds zero-initialization for pseudos before uninitialized uses, without effects on debugging. 
#Og-pass_combine --- pass_combine ->#pass_combine Optimization enables combine, a pass that performs arithmetic substitution of single-use pseudo-set insns into others. After successful substitution, insns become useless and are removed, but if their values are still used in debug binds, the binds are updated accordingly, and markers ensure the bind effects are still visible. Therefore, this pass has no effect on debugging. #Og-pass_ira --- pass_ira ->#pass_ira Optimization also changes the default register allocation region setting, without effects on debugging. #Og-pass_reload --- pass_reload ->#pass_reload Optimization enables reload inheritance and removal of redundant reload stores, without effects on debugging. #Og-pass_split_after_reload --- pass_split_after_reload ->#pass_split_after_reload, and pass_split_before_regstack ->#pass_split_before_regstack Additional insn splitting passes are enabled after reload when optimizing, without any effects on debugging; any impact would have been brought about by later splitting passes anyway. #Og-pass_fast_rtl_dce --- pass_fast_rtl_dce ->#pass_fast_rtl_dce Several RTL optimization passes run a fast dead code elimination subpass, at the end of the live registers dataflow analysis, as long as --dce is enabled; see --dce(fast) ->#Og-dce(fast) for details. #Og-pass_variable_tracking --- pass_variable_tracking ->#pass_variable_tracking Optimization enables variable tracking, debug binds and markers, to try to mask the effects of optimizations on debugging. They are not needed without optimization. #Og-pass_shorten_branches --- pass_shorten_branches ->#pass_shorten_branches When optimizing, insn lengths are estimated with multiple passes that grow lengths as needed, which may result in shorter variants, without effects on debugging. #Og-pass_final --- pass_final ->#pass_final Final may discard redundant compares when optimizing.
It also links back single-use labels to jumps to them, for use in machine-specific transformations such as SH's constant pool placement. These transformations have no effect on debugging. #Og-tree-ccp --tree-ccp: pass_ccp ->#pass_ccp Enable SSA-CCP optimization on trees. Conditional constant propagation attempts to determine the value of conditions that control conditional branches. It may simplify (fold) some calls and assigns into constant assignments, and turn conditional branches into unconditional ones, possibly dropping blocks that become unreachable. The most significant effect on the debugging experience is that setting breakpoints at certain source code ranges may become impossible as the blocks containing them are dropped. The extra folding might make additional lines not be represented by any instructions, but SFN provides markers to stand for them, and VTA and LVu ensure the effects of the optimized-away code can be inspected even without remaining instructions, so the overall impact of this pass on the debugging information is likely negligible. #Og-tree-fre --tree-fre: pass_fre ->#pass_fre Enable Full Redundancy Elimination (FRE) on trees. This pass uses value numbering to identify and remove redundant SSA computations, replacing them with previously-computed results, while also propagating copies, removing dead computations, folding computations, and resolving conditional branches and indirect calls. Changes are only relevant for debugging sessions that would modify variables to create situations that wouldn't normally arise at runtime. The substitutions and folding have no effect on debugging, unless variables are changed in the debugger so as to break the equivalences. Stmt removals are masked by debug binds, markers and views. 
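To make the redundancy removal above concrete, here is a minimal C sketch. The function names are hypothetical, and the second function is only a hand-written approximation of the effect of value numbering, not actual GCC output. Unsigned arithmetic is used so that the example has no overflow-related undefined behavior.

```c
#include <assert.h>

/* Source form: a * b is computed twice with the same operands. */
unsigned twice_product(unsigned a, unsigned b)
{
  unsigned x = a * b;
  unsigned y = a * b;   /* same value as x: a redundant computation */
  return x + y;
}

/* Hand-written approximation (an assumption) of the function after
   the redundant computation is replaced with the earlier result.  */
unsigned twice_product_fre(unsigned a, unsigned b)
{
  unsigned x = a * b;
  unsigned y = x;       /* reuses the previously computed value */
  return x + y;
}

/* Self-check: both forms agree on a small range of inputs. */
void check_twice_product(void)
{
  for (unsigned a = 0; a < 10; a++)
    for (unsigned b = 0; b < 10; b++)
      assert(twice_product(a, b) == twice_product_fre(a, b));
}
```

In normal execution the two forms are indistinguishable. A debugging session that overwrites `x` before `y` is computed is the kind of intervention that breaks the equivalence the pass relied upon.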
Resolving conditional branches may remove entire blocks if they aren't reachable to begin with, but the consequent inability to set breakpoints on them could be surprising, especially if the debugging session were to change variables so as to try to force the execution of the unreachable block. Resolving indirect calls to direct ones might also surprise attempts to modify pointers in a debug session, attempting to cause a different function to be called. #Og-tree-dse --tree-dse: pass_dse ->#pass_dse Enable dead store elimination. This pass removes stores and mem* calls that modify memory that is overwritten without intervening reads. Addressable variables, that might be modified by such removed stmts, are not tracked by debug binds, so debugging sessions might be confusing as expected effects of removed dead stores will not be observable. #Og-guess-branch-probability --guess-branch-probability: pass_profile ->#pass_profile Enable guessing of branch probabilities. No effect on debugging per se. #Og-tree-ch --tree-ch: pass_ch ->#pass_ch, and pass_ch_vect ->#pass_ch_vect Enable loop header copying on trees. This pass copies loop headers, turning the copies into entry tests. Debug binds in the copied blocks are also copied to the post-loop block, modeling the binds introduced after PHI nodes when entering SSA. With those additional bindings, duplicating the header blocks does not impact debugging significantly within the copied blocks or after them. One possibly confusing consequence is that setting a breakpoint at the current program counter, while single-stepping the loop entry test, will not break at subsequent iterations, and vice-versa. This is unlikely to be surprising, and setting breakpoints by line overcomes this effect. User labels, that would not be present in the copy, could make for further confusion, but if they provide for additional edges into the loop header, they will actually stop the transformation from taking place. 
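The header copying described above can be sketched at the source level in C. The function names are hypothetical, and the second function is a hand-written approximation of the transformed loop shape, not actual GCC output.

```c
#include <assert.h>

/* Source form: the loop header holds the test i < n. */
unsigned sum_below(unsigned n)
{
  unsigned s = 0;
  unsigned i = 0;

  while (i < n) {
    s += i;
    i++;
  }
  return s;
}

/* Approximation (an assumption) of the shape after header copying:
   one copy of the test guards loop entry, the other sits at the
   bottom of the loop and controls the remaining iterations.  */
unsigned sum_below_ch(unsigned n)
{
  unsigned s = 0;
  unsigned i = 0;

  if (i < n) {            /* copied header: entry test */
    do {
      s += i;
      i++;
    } while (i < n);      /* original header: iteration test */
  }
  return s;
}

/* Self-check: both shapes compute the same sums. */
void check_sum_below(void)
{
  for (unsigned n = 0; n < 20; n++)
    assert(sum_below(n) == sum_below_ch(n));
}
```

The two copies of the test sit at different code addresses, which is why a breakpoint set at the program counter of the entry test is not hit again on later iterations, while a breakpoint set by line covers both.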
When --tree-loop-vectorize ->#O3-tree-loop-vectorize is enabled, another ch_vect pass is activated, that differs from the regular ch pass only in deciding which loops are to undergo such header copying, so both passes have essentially the same effects on debugging. #Og-tree-dce --tree-dce: pass_dce ->#pass_dce, and pass_cd_dce ->#pass_cd_dce Enable SSA dead code elimination optimization on trees. This may remove assignments, branches and even some calls that are deemed unused/dead. Dead assignments are propagated into debug stmts before removal, so the removal itself does not affect debugging. Dead branches may cause entire blocks to be removed, making any expectation of stepping through or setting breakpoints at such blocks during debugging impossible to meet. Pure or const calls, as well as malloc and free pairs that are deemed dead, may be removed, frustrating expectations of stepping into them during debugging. #Og-ipa-profile --ipa-profile: pass_ipa_profile ->#pass_ipa_profile Perform interprocedural profile propagation. This pass propagates execution frequencies from callers to callees. Also, upon identifying the target of an indirect call from execution profiles, it introduces a speculative direct call that can then be inlined or otherwise optimized. None of this affects debugging. #Og-ipa-pure-const --ipa-pure-const: pass_ipa_pure_const ->#pass_ipa_pure_const, and pass_local_pure_const ->#pass_local_pure_const Discover pure and const functions. Detect and mark functions on whether or not they have side effects, loop, or throw, and propagate the information to decide about callers.
This, by itself, has no effect on debugging, but it may enable the elision of calls that would return the same value, without any other side effects, of functions that are not explicitly marked as pure or const, and this elision may be slightly confusing for debugging, as such functions may be called (and hit breakpoints) fewer times than expected, and stepping into elided calls will not be possible. #Og-ipa-reference --ipa-reference: pass_ipa_reference ->#pass_ipa_reference Discover readonly and non addressable static variables. This pass analyses how static variables are used by functions, and propagates the gathered information to callers, so that it can be used in later optimizations. There aren't any effects on debugging. #Og-tree-copy-prop --tree-copy-prop: pass_copy_prop ->#pass_copy_prop Enable copy propagation on trees. This pass identifies and simplifies expressions based on copy-related SSA names. This may unify multiple variables into a single location, in ranges in which they take up equivalent values, making it impossible to modify them independently in the debugger. The identification of such equivalences may also resolve conditional branches to unconditional ones, removing entire basic blocks and the possibility of overriding the conditions in the debugger. #Og-tree-sink --tree-sink: pass_sink_code ->#pass_sink_code Enable SSA code sinking on trees. This pass moves statements down the control flow, closer to uses thereof, when it may be profitable, and removes them when they are unused. As the DEF is removed from a position that dominates a debug bind, the bind is adjusted, masking the effects on debugging, at least as far as scalars are concerned. Addressable variables are not subject to value tracking in debug binds, and so the delaying of stores may actually be observable during debugging. #Og-tree-slsr --tree-slsr: pass_strength_reduction ->#pass_strength_reduction Perform straight-line strength reduction. 
This pass replaces computations involving multiplies with ones involving adds, in some cases introducing additional temporaries. In the end, trackable variables end up getting the same values, just computed in a different way, so this does not affect debugging. #Og-tree-coalesce-vars --tree-coalesce-vars: pass_expand ->#pass_expand Enable SSA coalescing of user variables. This flag allows the compiler to assign to a single pseudo-register SSA versions originally created for different user variables. With the aid of debug binds, this has very little effect on debugging: the impact is limited to early loss of values expected to be about to be overwritten, e.g. when an earlier value of a variable is already dead, and the location holding it is overwritten by a value computed for a temporary or for another variable, before being copied to the former variable. Between the computation point and the binding point, attempting to inspect the variable may indicate it is optimized out at that point, which is perfectly accurate, if undesirable from a debugging perspective. #Og-tree-ter --tree-ter: pass_expand ->#pass_expand Replace temporary expressions in the SSA->normal pass. This substitutes singly-used SSA defs into their single (non-debug) uses for expand to have larger expressions to select insns from. Debug binds may end up with more complex expressions than needed, bound before the actual computation of the larger expression takes place, but this does not affect debugging. #Og-defer-pop --defer-pop: pass_expand ->#pass_expand Defer popping functions args from stack until later. No effect on debugging. #Og-split-wide-types --split-wide-types: pass_lower_subreg ->#pass_lower_subreg, and pass_lower_subreg2 ->#pass_lower_subreg2 Split wide types into independent registers. This flag enables two RTL lowering passes that explode wide-mode pseudos into multiple word-mode ones.
In many cases this modifies insns in place, but it occasionally emits multiple insns to replace a single one. In no such case does it affect debugging. Such splitting may be performed on user variables, and although we can represent variable locations with independent locations for different fragments, such wide variables do not always get debug binds at assignments for tracking throughout compilation. Location inference from DECLs associated with REGs and MEMs is used for fragments of such variables instead, which does correctly identify locations, but not necessarily at points of the program that reflect the recommended inspection points. This may cause debugging sessions to observe changes to such variables too early or too late, which can make debugging confusing. Adding debug binds for the fragments, and arranging for GCC to aggregate them back, might get more accurate information, but since this would be done at such a late stage, it is possible that the binds would be introduced at points that do not satisfy the usual expectation that side effects would take place between the markers immediately before and after the assignment. There are also issues with dismembered aggregates, mentioned under --tree-sra ->#O1-tree-sra, that would likely affect such split variables as well. #Og-forward-propagate --forward-propagate: pass_rtl_fwprop ->#pass_rtl_fwprop, and pass_rtl_fwprop_addr ->#pass_rtl_fwprop_addr Perform a forward propagation pass on RTL. These RTL passes replace uses of a pseudo with its single reaching definition. This in itself has no impact on debugging. If a pseudo is propagated into all uses, it will become unused, but then it will have been substituted into debug binds as well and, if not, the unused def might end up preserved as a debug temp. 
There is a possibility that, by propagating a pseudo, it becomes dead earlier, and then, after register allocation, debug binds that referenced it while it was still set end up finding the register reused for other purposes earlier than without this transformation. Since the propagation found the source of the definition was available all the way to the propagation point, and the equivalence between the propagated pseudo and its definition is noted by the variable tracking machinery at the definition point, it is very likely that an alternate expression for the register value will be found. #Og-dse --dse: pass_rtl_dse1 ->#pass_rtl_dse1, and pass_rtl_dse2 ->#pass_rtl_dse2 Use the RTL dead store elimination pass. This flag is enabled by default, but it's only activated when optimizing. The RTL passes enabled by it remove stores in memory that are overwritten without intervening reads, that store the same value as the previous store, or that write a value to the stack that is not read before the function returns. Since it affects addressable variables, global or local, debug binds do not apply, and so the effects of removing these stores are going to be noticeable in debugging, except for the redundant stores. #Og-auto-inc-dec --auto-inc-dec: pass_inc_dec ->#pass_inc_dec Generate auto-inc/dec instructions. The flag is enabled by default, but it's only activated when optimizing, and when the target architecture supports auto inc or auto dec addressing modes. It detects insns that add or subtract a constant or pseudo from a pseudo before or after the pseudo or a copy thereof is used in a memory reference, and it attempts to turn the memory address into a pre- or post-inc, -dec or -mod addressing mode. 
This may cause one of the pseudos to change earlier or later than expected, and although this is only done when the pseudo is not otherwise used between the original modification insn and the insn that absorbs it, debug binds between them are not adjusted, so they will bind to the wrong value, and when the pseudo is modified even that incorrect location may be lost. #Og-ira-share-save-slots --ira-share-save-slots: pass_ira ->#pass_ira Share slots for saving different hard registers. The flag is enabled by default, but it's only activated when optimizing. It allows registers whose lifetimes do not overlap to be saved in the same slot across calls. This could shorten the apparent live range of variables, making them unavailable at spots in which they might be in the absence of this flag. #Og-omit-frame-pointer --omit-frame-pointer: pass_ira ->#pass_ira When possible do not generate stack frames. This flag attempts to avoid reserving and using a register as a frame pointer, using stack pointer-relative addresses as needed. A frame pointer register used to be essential for debugging, but call frame information obviated it: it is now irrelevant for this purpose, and this optimization has no effect on debugging. #Og-compare-elim --compare-elim: pass_compare_elim_after_reload ->#pass_compare_elim_after_reload Perform comparison elimination after register allocation has finished. This pass removes redundant compare insns, relying on insns that set flags as side effects instead. It has no effect on debugging. #Og-shrink-wrap --shrink-wrap: pass_thread_prologue_and_epilogue ->#pass_thread_prologue_and_epilogue Emit function prologues only before parts of the function that need it, rather than at the top of the function. This pass attempts to insert the prologue sequence at a later point than the entry point, which may involve duplicating some blocks and moving non-prologue early insns down to other blocks.
The moved insns are simple enough that debug binds can be adjusted and mask the moves, so it does not affect debugging. Block duplication has little to no impact on debugging, though breakpoints set based on code addresses, rather than on logical locations, may notice the difference. The later prologue may confuse debuggers that assume the end of the prologue, noted in debug information, marks the beginning of user code: such debuggers will likely be significantly affected by this optimization. #Og-combine-stack-adjustments --combine-stack-adjustments: pass_stack_adjustments ->#pass_stack_adjustments Looks for opportunities to reduce stack adjustments and stack references. This flag consolidates consecutive stack allocations, consecutive stack deallocations, or deallocations followed by allocations, within single blocks, adjusting stack pointer-relative addresses as needed. It has no effect on debugging. #Og-cprop-registers --cprop-registers: pass_cprop_hardreg ->#pass_cprop_hardreg Perform a register copy-propagation optimization pass. This pass only replaces (pseudos assigned to) hard regs in SET_SRCs with earlier-defined equivalent values, and removes noop moves. Substitutions are made in debug bind insns too. So, aside from noop moves that stood for source lines on their own in non-SFN settings, this shouldn't affect the debugging experience in any way. #Og-dce(fast) --dce(fast): pass_fast_rtl_dce ->#pass_fast_rtl_dce Use the RTL dead code elimination pass. This flag is enabled by default, but the fast rtl_dce pass is only activated when optimizing. Insns are regarded as dead if they only set registers and none of them are live. Dead sets used in debug binds are preserved in debug temps, so this does not affect debugging. #Og-reorder-blocks --reorder-blocks: pass_reorder_blocks ->#pass_reorder_blocks Reorder basic blocks to improve code placement. The reorder blocks pass attempts to increase the number of fallthrough edges by moving basic blocks.
This may remove the possibility of breaking at explicit goto statements. #Og-delayed-branch --delayed-branch: pass_delay_slots ->#pass_delay_slots Attempt to fill delay slots of branch instructions. This pass moves insns about, attempting to fill delay slots on arches that support them, most often of calls, branches, jumps and returns. It runs after var-tracking, and it may move insns across debug bind notes that would be affected by it, potentially confusing location information. It may create opportunities for jumps to jumps to be redirected to the ultimate jump target, which may invalidate breakpoints that could have been set at the bypassed jumps. On a few arches, calls followed by jumps may have their delay slots filled with insns that modify the register holding the return address for the call, which may confuse debuggers as to the point of the call, including the recovery of entry-point values from the caller frame and location information. Conditional markers might enable CFG simplifications without invalidating breakpoints, but failing that, it would probably be wise to disable this and return address adjustments at -Og ->#Og. #Og-peephole --peephole: pass_final ->#pass_final Enable machine specific peephole optimizations. This flag is enabled by default, but it is only activated if optimization is enabled, on machines that define peepholes, not to be confused with the newer peephole2, handled by --peephole2 ->#Os-peephole2. Unlike peephole2, these older peepholes recognize sequences of insns during the final pass and output assembly code directly. Any debug notes between insns that are recognized as a peephole group are moved before or after the peephole output, which keeps markers mostly correct, but may corrupt binds. #Og-merge-constants --merge-constants: varasm ->#varasm Attempt to merge identical constants across compilation units.
With this flag, constant pool entries and other constants that do not amount to objects that may have their addresses taken and compared (or --merge-all-constants is given, requesting even such read-only objects to be merged), are emitted in mergeable sections so that the linker can detect and remove duplicates. This may affect debugging inasmuch as the address/identity of the unified objects matters; since so-unified objects are usually string literals and initializers, rather than user-visible variables, this should seldom if ever affect debugging. #O1 -O1: optimize=1 Perform only very fast optimizations. This option sets the optimization level to 1. #O1-pass_lower_omp --- pass_lower_omp ->#pass_lower_omp, pass_expand_omp ->#pass_expand_omp, and pass_expand_omp_ssa ->#pass_expand_omp_ssa With -O0 ->#O0 or -Og ->#Og, the maximum vectorization factor for OpenMP is limited to 1. At -O1 ->#O1 or higher, target-specific vector sizes are used instead. #O1-pass_merge_phi --- pass_merge_phi ->#pass_merge_phi Basic blocks containing only PHI nodes, debug binds and markers may be dropped altogether by the mergephi pass. Dropping markers could make some statements impossible to stop at when stepping, and dropping binds makes their side effects not visible, so that earlier binds seem to remain effective. It might be possible to move the binds and markers into the destination block so as to keep them as conditionals. #O1-pass_tree_ifcombine --- pass_tree_ifcombine ->#pass_tree_ifcombine Pairs of tests guarding conditional blocks in && or || arrangements may be combined into a single test by the ifcombine pass. The block holding the second test becomes unconditional, so any markers and binds in it will take effect even when they shouldn't. Further optimizations are enabled if the then block is a forwarder to the else block, or vice-versa (a forwarder block is empty except for phi nodes, debug binds and markers). 
These may further confuse debugging by changing the situations in which the forwarder's binds and markers take effect. Conditional binds and markers may alleviate these problems. #O1-pass_laddress --- pass_laddress ->#pass_laddress The laddress pass lowers address-taking operations that are not invariant, so as to expose the computations involving offsets and array indexing to optimizers. It has no effect on debugging. #O1-tree-bit-ccp --tree-bit-ccp: pass_ccp ->#pass_ccp Enable SSA-BIT-CCP optimization on trees. This flag slightly modifies the behavior of the SSA tree-ccp pass ->#Og-tree-ccp, so that it keeps track of individual bits in SSA registers, rather than just entire registers. This allows some further simplifications, especially of conditional branches based on individual bits. This does not introduce any new kind of impact on the debugging experience, but it may make further blocks unreachable and thus unavailable for breakpointing, and further assignments reduced to reuse of constants without additional code. #O1-tree-forwprop --tree-forwprop: pass_forwprop ->#pass_forwprop Enable forward propagation on trees. This pass, enabled by default but activated only at -O1 ->#O1 or higher, is run up to 3 times on each function. It substitutes expressions assigned to SSA names into uses thereof, folding statements in place. This doesn't affect debugging, but other transformations made by these passes do. Loads of complex types whose real or imaginary parts are used separately are broken up into separate component loads, but debug binds referencing the complex value loaded from memory are reset, degrading debug information: the bind stmt might be adjusted instead. Stores of complex values are also split up, without effect on debugging. Expressions taking the address of variables, and possibly adding offsets to them, may be substituted into indirections, enabling variables to become non-addressable and turned into SSA form, as in --tree-phiprop ->#O1-tree-phiprop.
The conditions in conditional branches may be folded to constants, which changes the control flow graph and can render entire blocks unreachable. Likewise, simplifications in switch expressions may rule out some case targets. It may combine memcpy and memset calls to neighboring ranges into a single memcpy, which may affect debugging if the pointer returned by the memset call is referenced in debug binds. Additional specialized transformations involve bit rotations, permutations, bitfield refs and vector constructors, but none of these affect debugging. #O1-tree-sra --tree-sra: pass_sra_early ->#pass_sra_early, and pass_sra ->#pass_sra Perform scalar replacement of aggregates. This flag enables passes that turn members of aggregates that would normally live in memory into stand-alone scalars that can be optimized like registers. The original aggregate object may in some cases be fully taken apart, but when it is still used as a whole, the scalar is "spilled" back in place and "reloaded" as needed. After assignments to the scalar introduced by these passes, as well as spills and reloads, debug binds are introduced so that var-tracking can keep track of the fragments of the aggregate, so this pass should be transparent as far as debug information is concerned. Unfortunately, there are problems or limitations in the var-tracking pass that cause us to not use the annotations for the scalarized members, at least in cases in which the aggregate as a whole is small enough to be regarded as an SSA register. Some investigation into var-tracking is needed to determine how to use at least the conflicting notes that apply to both the whole aggregate and the scalarized member, but this may turn out to show significant shortcomings in VTA (variable tracking at assignments) and require some work to make use of the available annotations so as to bring debug information quality of (fully- and?) partially-scalarized aggregates in line with that of scalars.
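As a source-level illustration of scalar replacement of aggregates, consider the following minimal C sketch. The struct, the function names, and the scalar names `p_x` and `p_y` are all hypothetical, and the second function is a hand-written approximation of a fully scalarized aggregate rather than actual GCC output.

```c
#include <assert.h>

struct point {
  int x;
  int y;
};

/* Source form: p is an aggregate that would normally live in memory. */
int sum_coords(int x, int y)
{
  struct point p;

  p.x = x;
  p.y = y;
  return p.x + p.y;
}

/* Approximation (an assumption) of the fully scalarized form: each
   member becomes a stand-alone scalar and p itself disappears.  */
int sum_coords_sra(int x, int y)
{
  int p_x = x;   /* stands for p.x */
  int p_y = y;   /* stands for p.y */

  return p_x + p_y;
}

/* Self-check: both forms agree on a few inputs. */
void check_sum_coords(void)
{
  assert(sum_coords(1, 2) == sum_coords_sra(1, 2));
  assert(sum_coords(-4, 9) == sum_coords_sra(-4, 9));
}
```

In the scalarized form there is no single memory object for `p`, so a debugger has to reassemble it from the locations of its fragments.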
Another notable limitation introduced by this pass is that dismembered aggregates can no longer be used in inferior calls that expect references or pointers. #O1-tree-loop-im --tree-loop-im: pass_lim ->#pass_lim Enable loop invariant motion on trees. Although this flag is enabled by default, the pass is omitted from the set of passes activated at -Og ->#Og, so it is only run at -O1 ->#O1 or higher. This pass moves invariants out of loops, and performs store motion. Floating-point divides and shifts for bit tests may have invariant divisors and shifted bits rearranged for hoisting, without impact on debugging. Access to memory at an invariant address may be turned into an SSA scalar, with a load at the loop entry and a store at the loop exit; such early loads and delayed stores may be confusing for debugging. Invariant computations are moved to the edge into the loop from the preheader, after being removed from their original position. The removal triggers propagation into debug binds, which preserves bind equivalences but drops the actual location, and makes the binds more fragile. With a bit of additional effort, it would be possible to keep the binds unchanged. Still, this movement should have little to no impact on debugging. #O1-tree-dominator-opts --tree-dominator-opts: pass_dominator ->#pass_dominator, pass_phi_only_cprop ->#pass_phi_only_cprop, and pass_uncprop ->#pass_uncprop Enable dominator optimizations. Although this flag is enabled even at -Og ->#Og, the passes controlled by it are omitted from the set of passes activated at -Og ->#Og, so they are only run at -O1 ->#O1 or higher. It propagates constants and copies into uses, folds expressions, attempts to resolve conditionals, eliminates redundant computations and redundant stores, replaces inequalities with equality tests, propagates coalescible SSA names equivalent to PHI values incoming from each edge, propagates and removes degenerate PHIs, and performs jump threading.
The only transformation that has any significant effect on the debug experience, given that VTA, SFN and LVu mask the effects of the others, is jump threading. See the effects of (gimple) jump threading under --tree-vrp ->#Os-tree-vrp-pass_thread_jumps. #O1-inline-functions-called-once --inline-functions-called-once: pass_ipa_inline ->#pass_ipa_inline Integrate functions only required by their single caller. This option works as an enabler for certain cases of inlining, in that, if this option is disabled, or optimization is disabled, for a function or for any of its callers, and no other flag or attribute mandates or enables inlining, then the possibility of inlining into all callers and not emitting an out-of-line copy will not even be considered. Oddly, the "called once"/"single caller" bit seems to be a left-over artifact of earlier implementations: there doesn't seem to be any test involving the caller count in the inlining code paths activated by this flag. Inline substitution, per se, is not usually a significant source of debug information degradation: any piece of debug information that could be represented in the out-of-line function can be and is equally represented for each inlined copy. Potential loss arises out of debug-lossy optimizations, when performing transformations that are enabled or strengthened by the additional information available when analyzing both the caller and the callee in a single context. For example, the inline expansion of a function within a loop that is unrolled may face significant ambiguity as to how many inlined copies of the function there are and how far scopes in each copy extend, especially if instructions of different iterations are shuffled together by e.g. modulo scheduling. Another situation in which inlining may affect the debug experience significantly is that of heavy use of abstraction calls.
As large numbers of nearly empty, abstraction-only functions are inlined, the density of code vs debug annotations becomes low, and the risk of hitting upper limits on debug annotations counts grows. When they are hit, such annotations as debug markers and binds may be dropped, removing the compiler's ability to mask the effects of optimizations on debugging. The loss of markers removes the linearity of single-stepping and the robustness of the relationship between source locations in the program and observable effects that they bring. The loss of debug binds takes with it much of the possibility of observing variables not held in stable memory locations. Such degradation, that takes debug information back to the days in which the debugging of optimized programs was reasonably held to be unreasonably difficult, may sometimes be avoided at the expense of significant compile time and memory, using such parameters as "max-debug-marker-count", "max-vartrack-size", "max-vartrack-expr-depth", and "max-vartrack-reverse-op-size". #O1-ssa-backprop --ssa-backprop: pass_backprop ->#pass_backprop Enable backward propagation of use properties at the SSA level. This flag is enabled by default, but the pass is only activated at -O1 ->#O1 or higher. It detects numeric variables whose sign does not matter, and optimizes away operations that affect only their sign. Debug binds referencing modified SSA DEFs are adjusted when possible, but since some cases involve function calls and those do not belong in debug binds, some binds may be lost, and others, especially after PHI nodes, may be bound to expressions that have their signs reversed, which may be confusing. #O1-tree-phiprop --tree-phiprop: pass_phiprop ->#pass_phiprop Enable hoisting loads from conditional pointers. 
This pass, enabled by default but activated only at -O1 ->#O1 or higher, turns phi nodes whose incoming args all take the address of a scalar value, and are later dereferenced, into phi nodes that take the scalar values directly. The pass makes sure that the loaded memory values cannot change between the load points, original and optimized, but this transformation might affect debugging if it involves modifying any of the affected memory variables, as the values may have already been loaded. It may also cause a variable that was addressable to become non-addressable and promoted to an SSA register. Debug binds would only be assigned at the time of this promotion, which may be too late to capture assignments that might have already been moved or optimized out. As a result, such variables, promoted to non-addressable, will have worse location tracking than scalar variables that never have their address taken, but no worse than if they had remained addressable all the way. #O1-tree-pta --tree-pta: pass_build_alias ->#pass_build_alias, pass_build_ealias ->#pass_build_ealias, and TODO_rebuild_alias ->#TODO_rebuild_alias Perform function-local points-to analysis on trees. This just computes more refined alias sets; it doesn't make any transformations, so whatever effects it might have on the debugging experience are indirect. #O1-stdarg-opt --stdarg-opt: pass_stdarg ->#pass_stdarg Optimize amount of stdarg registers saved to stack at start of function. The code enabled by this flag estimates the maximum sizes of the general-purpose and floating-point register areas used in a stdarg variable argument list function, so as to limit the number of registers that need to be saved. This does not affect debugging. #O1-tree-builtin-call-dce --tree-builtin-call-dce: pass_call_cdce ->#pass_call_cdce Enable conditional dead code elimination for builtin calls.
Although this flag is enabled even at -Og ->#Og, the pass is omitted from the set of passes activated at -Og ->#Og, so it is only run at -O1 ->#O1 or higher. This pass replaces builtin calls with simpler operations, and/or guards the operation by conditions that decide whether or not to execute the call, replaced or not. This may be slightly confusing when setting breakpoints at the omitted calls, or attempting to single-step into them. #O1-tree-cselim --tree-cselim: pass_cselim ->#pass_cselim Transform condition stores into unconditional ones. This flag is enabled by default when there is a conditional move instruction, but the pass is only activated at -O1 ->#O1 or higher. The pass moves gimple stores in conditional blocks to subsequent join blocks, introducing PHI nodes to select the value to be stored. Addressable variables rely on var-tracking (MEM annotations) rather than var-tracking-at-assignments debug binds, so moving stores causes observable changes in the debug experience: if a variable that should be modified by a store is inspected after the expected store point, but before the replacement store is executed, an outdated value will be found. I wonder if it might be possible to insert debug binds to temporarily override the location of variables that live in memory most of their lifetime, so that such deferred writes could be reflected in location lists, and observed immediately through such a bind, in spite of the deferred execution of the store. As in --hoist-adjacent-loads ->#Os-hoist-adjacent-loads, the moves could leave the conditional blocks empty, which could make it impossible to set breakpoints at lines within them or to single-step into them, as SFNs get dropped along with the removed blocks.
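The store sinking described for this pass can be pictured at the source level as follows. This is only a sketch with hypothetical names; the local variable tmp stands in for the PHI node that GCC introduces in gimple.

```c
/* As written: one store in each conditional block. */
void pick_before(int *p, int c, int a, int b)
{
    if (c)
        *p = a;
    else
        *p = b;
}

/* Approximation after the stores are sunk to the join block. */
void pick_after(int *p, int c, int a, int b)
{
    int tmp;              /* plays the role of the PHI node */
    if (c)
        tmp = a;
    else
        tmp = b;
    *p = tmp;             /* the only store, now in the join block */
}
```

Between the point at which the original store would have executed and the sunk store, a debugger reading `*p` still finds the old value.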
Unlike the combined stores from if/then/else structures, sunk stores from else-less then blocks (or from else blocks with empty then blocks) retain their location information, so one might be able to stop at them even when the conditional block to be executed does not include that line. This can all get confusing, and it could be alleviated with conditional binds and markers. #O1-ssa-phiopt --ssa-phiopt: pass_phiopt ->#pass_phiopt Optimize conditional patterns using SSA PHI nodes. This pass performs various transformations (see --hoist-adjacent-loads ->#Os-hoist-adjacent-loads for more) that may drop small or empty conditional blocks, combining a test and a conditional assignment (represented as a PHI node) into a flag-store, an abs, min, or max expr. If a temporary is needed, it may be cloned from the phi result, but that will then be placed in one of the operands of the original PHI node, so any debug binds referencing the original result remain correctly unchanged. The potential negative impact on the debug experience of these transformations is limited to the removal of a conditional block, with diminished ability to step into the block or set breakpoints in it, and the potential of an early (temporary) overwrite of the location of the variable that will eventually hold the join value, which might make the variable impossible to inspect or modify after such overwrite. The 3-way min-max cases do not change this picture much, except for the possibility of loss of visibility of the result of the intermediate assignment, as bind and marker are removed along with the conditional block. Another situation in which a conditional block may be eliminated is that in which both edges out of the condition yield the same value for the PHI (e.g. x != a ? a : x simplifies to a). 
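Two of the patterns above, sketched at the source level with hypothetical function names. In the first, a test plus a conditional assignment collapses into a max operation; in the second, both edges yield the same value, so the condition disappears. GCC works on PHI nodes in gimple; the C forms below are approximations.

```c
/* As written: a small conditional block guards the assignment. */
int at_least_before(int x, int lo)
{
    int r = x;
    if (x < lo)
        r = lo;
    return r;
}

/* Approximation after the pass: the conditional block is gone.
   The ?: expression stands for a single max operation. */
int at_least_after(int x, int lo)
{
    return x > lo ? x : lo;
}

/* Value unification: when x != a the result is a, and when x == a
   the result is x, which equals a. */
int unify_before(int x, int a)
{
    return x != a ? a : x;
}

int unify_after(int x, int a)
{
    (void)x;              /* x no longer influences the result */
    return a;
}
```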
Such simple cases of value unification have just the usual impact of removing a conditional block, but more elaborate cases, with multiple assignments computing the result of the conditional block, have the assignments, but not markers or binds, moved out of the conditional block, with the usual consequences: difficulty stepping into the removed block, or inspecting the results of computations whose debug binds were dropped, before the debug binds at a subsequent join point, if any. Yet another transformation is factoring a conversion out of a PHI node. If both incoming edges perform the same conversion, or if one is a constant and moving the conversion after the join is still found potentially profitable for enabling other optimizations, a new PHI is introduced with type and values prior to the conversion, the original conversions are removed, a new conversion stmt is introduced at the top of the join block, storing in the original PHI result, and finally the original PHI def is removed. This transformation does not remove any block, the original conversions can be propagated into any debug binds, and the new conversion (without location information) is inserted before the debug bind of the original PHI node. The final removal of the original PHI node does not reset debug binds, because we skip propagation into binds upon PHI node removal, and the conversion assignment becomes the new definition. The moved conversions can still be inspected, thanks to SFN and VTA, and the converted value is bound to the variable that takes that value at the join point too, so this transformation does not affect the debug experience. #O1-tree-reassoc --tree-reassoc: pass_reassoc ->#pass_reassoc Enable reassociation on tree level. Although this flag is enabled by default, the pass is omitted from the set of passes activated at -Og ->#Og, so it is only run at -O1 ->#O1 or higher.
This pass rearranges multiple stmts that perform the same operation, say addition, ordering operands by rank and issuing multiple operations in parallel when that's advantageous. This ends up removing nearly all of the original stmts and issuing new ones, using new SSA names. Debug binds retain the original operations, and markers allow them to be inspected when single-stepping. The reassociation might insert extraneous calls, however, e.g. turning repeated multiplies into powi calls; this might be slightly confusing if stepping into calls. Range tests in conditional branches may end up simplified, making the branches unconditional, and rendering some blocks unreachable, which prevents setting breakpoints in them. #O1-tree-loop-optimize --tree-loop-optimize: pass_fix_loops ->#pass_fix_loops, pass_tree_loop ->#pass_tree_loop, and pass_tree_no_loop ->#pass_tree_no_loop Enable loop optimizations on tree level. This flag is enabled by default, but it is only activated when optimization at -O1 ->#O1 or higher is enabled. When activated, this flag enables a pass that detects loops and gathers information about them. If the flag is activated and loops are found in a function, then various loop passes are run over that function; otherwise, only the pass enabled by --tree-slp-vectorize ->#O3-tree-slp-vectorize is. #O1-tree-scev-cprop --tree-scev-cprop: pass_scev_cprop ->#pass_scev_cprop Enable copy propagation of scalar-evolution information. This flag is enabled by default, but it is only activated when --tree-loop-optimize ->#O1-tree-loop-optimize is activated. If scalar evolution determines that a PHI node is invariant, replace uses thereof, including those in debug binds, by the invariant. This has no effect on debugging. It also computes, through scalar evolution, the final value of variables modified in loops, dropping the PHI node in favor of a computation based on values known before the loop is entered.
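Final value replacement can be pictured with a small source-level sketch. The function names are hypothetical, and unsigned arithmetic is used so that the two forms agree even when the result wraps around.

```c
/* As written: s is updated on every iteration. */
unsigned total_before(unsigned n)
{
    unsigned s = 0;
    for (unsigned i = 0; i < n; i++)
        s += 3;
    return s;
}

/* Approximation after final value replacement: the value of s after
   the loop is computed from n, which is known before the loop is
   entered. The loop, left without uses, can then be removed by
   later passes. */
unsigned total_after(unsigned n)
{
    return 3u * n;
}
```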
This may affect debugging when the removal of the PHI node resets a debug bind referencing it, but the bind could be preserved, since a new, equivalent definition will be introduced. #O1-tree-loop-ivcanon --tree-loop-ivcanon: pass_iv_canon ->#pass_iv_canon, pass_complete_unroll ->#pass_complete_unroll, and pass_complete_unrolli ->#pass_complete_unrolli Create canonical induction variables in loops. This flag is enabled by default, but it is only activated when --tree-loop-optimize ->#O1-tree-loop-optimize is activated. This pass estimates the number of iterations of each loop, identifies exit edges and removes those whose conditions are never met, based on gathered information about the maximum number of iterations. It attempts complete loop unrolling, and is done with the loop if that succeeds. Otherwise, if the loop meets certain conditions, a countdown induction variable is introduced and the loop exit test is replaced so as to compare this variable with zero. The only transformations that minimally impact debugging are the removal of loop exits, which may render some unreachable blocks unavailable for setting breakpoints (that would never be hit), and loop unrolling, which uses the same machinery as loop peeling and has the same effects on debugging (see --peel-loops ->#O3-peel-loops). #O1-ivopts --ivopts: pass_iv_optimize ->#pass_iv_optimize Optimize induction variables on trees. This flag is enabled by default, but it is only activated when --tree-loop-optimize ->#O1-tree-loop-optimize is activated. For each loop, after detecting base and general induction variables and selecting the optimal set, any new, artificial induction variables are created and added to the loop. Then, uses of induction variables not chosen for the optimal set are rewritten in terms of the optimal set, adjusting their original assignments or inserting new assignments instead of phi nodes.
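A typical outcome of induction variable optimization, sketched at the source level with hypothetical names: the index i is replaced by an artificial pointer induction variable, and the exit test is rewritten in terms of it. Whether GCC picks this particular set of induction variables depends on the target's cost model.

```c
#include <stddef.h>

/* As written: i is the user-visible induction variable. */
long sum_before(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Approximation after the pass: the artificial variable p replaces i,
   and the exit test compares p against a precomputed end pointer. */
long sum_after(const int *a, size_t n)
{
    long s = 0;
    const int *end = a + n;
    for (const int *p = a; p != end; p++)
        s += *p;
    return s;
}
```

Once i is gone, its value can only be shown by a debugger if a debug bind expresses it in terms of the surviving variable, here as `p - a`.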
Finally, assignments to induction variables set to be removed are propagated into debug binds, if needed, and then discarded. Alas, propagation into debug binds may lose plenty of useful information: PHI nodes cannot be propagated into binds, and regular assignments are not removed in an order that ensures that, say, if a definition of A is used in a definition of B and both are to be removed, we get a chance to propagate B and then A into debug binds that referenced only B. If we happen to remove A first, uses of B in debug binds end up having to be reset, losing relevant location information. #O1-inline-atomics --inline-atomics: pass_fold_builtins ->#pass_fold_builtins Inline __atomic operations when a lock free instruction sequence is available. This flag is enabled by default, but the transformations described herein, part of the fold builtins pass, are only activated at -O1 ->#O1 or higher. Various atomic operations are turned into atomic bit test and set, complement or reset. The transformation may invalidate user variables used only in compares with zero. #O1-if-conversion --if-conversion: pass_rtl_ifcvt ->#pass_rtl_ifcvt, and pass_if_after_combine ->#pass_if_after_combine Perform conversion of conditional jumps to branchless equivalents. Various situations in this RTL pass remove tests, conditional branches and basic blocks. This can make for very surprising single-stepping into the blocks guarded by the conditions, as lines that would not be expected to run given the condition actually get to run, or vice-versa. SFNs don't help, they just reinforce whatever block execution is taken, or get dropped altogether. Aside from the confusing single-stepping, the block removal might (but likely doesn't) cause GCC to lose track of debug bindings. In theory, at confluence points (when entering SSA), we introduce additional debug binds that allow GCC to recover from the loss of bindings in the separate branches.
These should allow GCC to get back in sync with the result of the if-converted assignments at the confluence point, so at least after the confluence point, the bindings should have been recovered: if-converted sets will be inserted before the confluence-recovering debug bind. These transformations usually apply to a single assignment in each conditional block, but there is support for turning multiple assignments in a then block into multiple assignments from IF_THEN_ELSE (cond, then_value, orig_value) too. There aren't further debugging complications in this case, but the blocks can be much longer, breaking users' expectations of single stepping for longer. SFN might make all of this worse, in that the statement markers in the conditional blocks are actually dropped, so you don't get to step into the blocks any more. Support for conditional markers and binds could alleviate the effects of these transformations. #O1-move-loop-invariants --move-loop-invariants: pass_rtl_move_loop_invariants ->#pass_rtl_move_loop_invariants Move loop invariant computations out of loops. This pass identifies SET insns that are invariant within a loop, and moves them to the loop preheader, possibly using a new pseudo to hold the invariant, or replaces them with a copy from the pseudo holding an equivalent invariant. Debug binds remain in place and need not be adjusted, as the transformations ensure the values are available in the original pseudos at the points right after the original SETs, where the binds will tend to be. The only risk I can see to debuggability is that moved insns, and insns leading to equivalences that may end up dead and removed at later passes, may leave lines of code without any insns standing for them. The use of SFN and LVu information in debuggers, enabling them to stop at and inspect the state even at such lines, removes this potential problem. 
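Loop invariant motion, as described for this pass, can be sketched at the source level as follows. The names are hypothetical, the local k stands for the new pseudo, and the actual pass works on RTL insns.

```c
#include <stddef.h>

/* As written: a * b + 1u is recomputed on every iteration. */
void scale_before(unsigned *v, size_t n, unsigned a, unsigned b)
{
    for (size_t i = 0; i < n; i++)
        v[i] = v[i] * (a * b + 1u);
}

/* Approximation after the pass: the invariant is computed once,
   before the loop, into a new temporary. */
void scale_after(unsigned *v, size_t n, unsigned a, unsigned b)
{
    unsigned k = a * b + 1u;    /* moved to the loop preheader */
    for (size_t i = 0; i < n; i++)
        v[i] = v[i] * k;
}
```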
#O1-branch-count-reg --branch-count-reg: pass_rtl_doloop ->#pass_rtl_doloop Replace add, compare, branch with branch on count register. This pass replaces the conditional branch at the end of a loop with a single decrement-counter-and-conditionally-loop sequence, when the loop iteration count can be computed. The original loop counter is not removed by this pass, so this pass by itself does not affect debug information. However, the original loop counter may become unused, and then be optimized away, and then it is unlikely that the generic adjustments to debug bind statements will be able to realize it can be computed from the newly-introduced loop counter. There is room for improvement, adjusting the debug binds of the original loop counter in terms of the new related IV. This might require some additional infrastructure that could likely be generalized and used for IVs in general. #O1-if-conversion2 --if-conversion2: pass_if_after_reload ->#pass_if_after_reload Perform conversion of conditional jumps to conditional execution. This pass turns insns in then and else blocks into COND_EXEC, enabled by the if condition (then) or its negation (else), removing the conditional branch, the branches at the end of the conditional blocks, and bringing it all into a single basic block. It does not modify or remove debug insns, so single-stepping will enter and execute both blocks, though the side effects of insns whose condition is not active will not be executed. In general, insns that modify a variable will be followed by a debug insn that binds the variable to the location holding its modified value. Although debug insns don't have conditional binds, the location of a variable often (but not always) remains the same across modification. In the cases it doesn't, only the bind at the confluence of the conditional blocks will get the variable location and value back in sync. 
In addition to the post-confluence point, a variable modified within a block turned into conditionally-executed insns can also be correctly inspected right after an (active) assignment to it, i.e., the conditional assignment that would have been executed should the conditional blocks have remained separate. SFN and LVu technology help make sure there will be a usable inspection point with the correct bindings at that point. At other points in the combined block, variables potentially modified in it may be regarded as bound to a stale or unused location holding an unrelated or uninitialized value, corresponding to what would have been assigned to the variable in the other block. This can get confusing if one does not realize that the block that is apparently being executed was not the one corresponding to the guarding condition. All of these caveats of conditional execution only apply in the somewhat unusual cases in which the location of the variable actually changes. Because of control flow confluence and variable value unification at that point (regardless of the debug bind at the confluence point), it will most often be the case that the variable lives at the same register or memory location throughout the conditionally executed blocks, so the degradation of the debugging experience by this pass, although possible, should be rare. Debug binds and markers cannot currently be marked as conditional; making that possible could further alleviate the impact of this transformation. #Os -Os: optimize=2 + size Perform optimizations that tend to reduce the code size. This option sets the optimization level to 2, in a mode that assigns higher priority to reducing code size. Optimization at level 2 or higher extends tests on whether memory references may overlap with affine combinations analysis. 
This may infer non-aliasing in cases lower optimization levels wouldn't, enabling further optimizations, but nothing with effects on debugging that couldn't be had in other more obvious cases of non-aliasing. #Os-pass_complete_unrolli --- pass_complete_unrolli ->#pass_complete_unrolli Optimization level 2 or higher enables a pass that completely unrolls inner loops that iterate just a few times. Unrolling uses the same machinery that performs loop peeling (see --peel-loops ->#O3-peel-loops) and, by itself, does not affect debugging. #Os-pass_early_remat --- pass_early_remat ->#pass_early_remat An early rematerialization pass runs at optimization level 2 or higher. It rematerializes pseudos whose live ranges cross calls by copying the reaching definition insns between calls and uses. The pseudo may then be regarded as dead before the call, which might reset binds after the new death points, even when they could be adjusted so as to refer to the definition that will be used for rematerialization. In some cases, however, the expression may be lost entirely, but even when it is preserved, it might be too complex to be recognized as unchanged when the pseudo is rematerialized, so locations or values based on the pseudo might be lost. #Os-pass_ira --- pass_ira ->#pass_ira Optimizing for size changes the default register allocation region setting back to the one used when not optimizing. #Os-expensive-optimizations --expensive-optimizations: Perform a number of minor, expensive optimizations. #Os-expensive-optimizations-pass_thread_jumps --- pass_thread_jumps ->#pass_thread_jumps Gimple jump threading is one of the significant transformations enabled by this flag; see the effects of jump threading on debugging under --tree-vrp ->#Os-tree-vrp-pass_thread_jumps. 
#Os-expensive-optimizations-pass_optimize_bswap --- pass_optimize_bswap ->#pass_optimize_bswap The bswap gimple pass, also enabled by expensive optimizations, recognizes shifts and rotates equivalent to byte-swap transformations, and replaces them with a byte-swap builtin. Any user-visible intermediate computations should have debug bind statements that will ultimately be adjusted and preserved even if the computations themselves are dropped, but some stmt moving, replacing, and inserting-then-removing, might actually mess up debug bind tracking of the final value. #Os-expensive-optimizations-pass_optimize_widening_mul --- pass_optimize_widening_mul ->#pass_optimize_widening_mul Another expensive optimizations pass is widening_mul. It recognizes various opportunities for math optimizations, such as fusing multiply and add, testing overflows on adds or subtracts, and combining divide and modulus into a single operation. Final assignment stmts are replaced and stmts performing no longer needed computations are removed in a way that doesn't harm debugging. 
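One of the math optimizations mentioned, combining a divide and a modulus of the same operands, can be approximated at the source level. As an assumption made purely for illustration, the C library function div() stands in for the combined operation; GCC itself does not emit a call to div() for this. The function names are hypothetical.

```c
#include <stdlib.h>

/* As written: separate divide and modulus on the same operands. */
void split_before(int n, int d, int *q, int *r)
{
    *q = n / d;
    *r = n % d;
}

/* Approximation after the pass: one combined operation yields both
   results. div() is used here only as a stand-in for it. */
void split_after(int n, int d, int *q, int *r)
{
    div_t qr = div(n, d);
    *q = qr.quot;
    *r = qr.rem;
}
```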
#Os-expensive-optimizations-pass_strength_reduction --- pass_strength_reduction ->#pass_strength_reduction, pass_expand ->#pass_expand, pass_combine ->#pass_combine, pass_cse ->#pass_cse, pass_ira ->#pass_ira, pass_reload ->#pass_reload, and pass_postreload_cse ->#pass_postreload_cse Some of the changes brought about by this flag are additional canonicalization of addresses when comparing base addresses in alias analysis, searching for alternate base addresses in gimple strength reduction, loop iteration count estimation even for loops with multiple exits, taking conflict counts into account when ordering SSA names for coalescing, combination of temporary slots for automatic variables, reuse of wider-mode ANDs and MEMs for CSE, simplifications and cheap extensions in combine, slightly more elaborate selection of register class preferences and attempts to decrease the number of live ranges in the integrated register allocator, removal of some unneeded reloads, and additional post-reload combine and CSE subpasses. None of these changes affects debugging in ways that the same passes would not already affect it without this flag. #Os-expensive-optimizations-pass_duplicate_computed_gotos --- pass_duplicate_computed_gotos ->#pass_duplicate_computed_gotos Another of the expensive optimizations is the compgotos RTL pass, which duplicates each small-enough block ending in computed jumps and merges the copies with predecessors that have it as their single successor, with no effects on debugging. #Os-strict-aliasing --strict-aliasing: strict_aliasing ->#strict-aliasing Assume strict aliasing rules apply. This flag limits the cases in which pointer accesses may alias, but that does not enable any kind of transformation with impact on debugging that could not also be incurred otherwise, using pointers known not to alias through other means.
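For reference, the kind of assumption that strict aliasing licenses can be sketched as below, with hypothetical function names. Since an int and a float may not legitimately occupy the same object, the store through f is assumed not to change `*i`, so the final load can be replaced by the value stored earlier.

```c
/* As written: *i is read back after a store through f. */
int store_both_before(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return *i;            /* a reload would be needed if f could alias i */
}

/* Approximation of what the compiler may do under strict aliasing. */
int store_both_after(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return 1;             /* load replaced by the value stored above */
}
```

The same forwarding could happen without this flag whenever the two pointers are known not to alias by other means, which is why no additional debugging impact is expected.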
#Os-vect-cost-model=cheap --vect-cost-model=cheap: pass_vectorize ->#pass_vectorize, and pass_slp_vectorize ->#pass_slp_vectorize Use the cheap cost model for vectorization. This affects --tree-loop-vectorize ->#O3-tree-loop-vectorize and --tree-slp-vectorize ->#O3-tree-slp-vectorize decisions, but not the kinds of transformations they make. #Os-tree-vrp --tree-vrp: pass_early_vrp ->#pass_early_vrp, and pass_vrp ->#pass_vrp Perform Value Range Propagation on trees. This flag activates two different passes: early vrp and vrp proper. Early vrp is simpler in that it is not iterative, going through basic blocks once in dominance order rather than using the SSA propagation engine. Once the range assigned to an SSA name is narrowed down to a single constant, subsequent statements referencing the name can be propagated into and possibly folded, and the definition may be removed. Conditional statements may be simplified, removing edges and basic blocks. Expressions in other statements may also be simplified based on ranges. Such simplifications, in themselves, do not significantly affect the debugging experience. Removed definitions, if mentioned in debug binds, will be propagated into them and preserved there, with markers and views enabling them to be single-stepped and inspected; otherwise simplified statements remain in place with the same outputs, and don't require any debug information changes. Simplified conditions may cause entire blocks to become unreachable and be removed, which prevents placing breakpoints in them, but such breakpoints wouldn't be reached anyway. #Os-tree-vrp-pass_thread_jumps --- pass_thread_jumps ->#pass_thread_jumps At the end of VRP proper, (gimple) jump threading takes place, using value ranges to simplify conditional stmts to tell whether outgoing edges of threadable blocks can be determined from incoming edges.
Gimple jump threading duplicates a block when arriving at it through a certain incoming edge implies exiting it through a certain outgoing edge. This duplication, in itself, does not affect the debug experience: the copied block carries as much debug information as the original block. During threading, however, there are blocks that are not copied, namely forwarding blocks. From a codegen perspective, all they seem to do is to jump to another block. From a debug experience perspective, however, they may contain plenty of bind statements and markers, and those are not duplicated: binds are consolidated so that only the latest bind to each variable is copied, and markers are dropped entirely. This arrangement, intended to reinforce binds after newly-introduced confluences, drops debug binds that would not be observable before the introduction of markers and views. With markers and views, dropping the blocks in favor of bind consolidation amounts to significant loss. Effects need to be assessed, as forwarding blocks and leading/trailing debug stmts may end up removed by CFG cleanup. Better means to preserve them when consolidating forwarding blocks guarded by optimized-out conditions may be needed: conditional markers and binds are a possibility to explore. #Os-tree-dce(aggressive) --tree-dce(aggressive): pass_cd_dce ->#pass_cd_dce See --tree-dce ->#Og-tree-dce. At optimization level 2 or higher (i.e., starting at -Os ->#Os), the second tree dead code elimination pass is run in aggressive mode, which takes control dependencies into account, enabling additional conditional branches to be eliminated. This does not, however, fundamentally change the kinds of effects these passes have on debugging. #Os-ipa-sra --ipa-sra: pass_early_ipa_sra ->#pass_early_ipa_sra Perform interprocedural reduction of aggregates. This pass modifies the argument list of a function that takes aggregates as arguments, splitting them into scalars, and adjusting the callers.
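The parameter splitting performed by interprocedural SRA can be pictured at the source level as follows. The struct and function names are hypothetical; in GCC, the rewritten function is a modified clone, and the callers are adjusted to match.

```c
/* Hypothetical aggregate, used only for illustration. */
struct range { int lo; int hi; };

/* As written: the callee takes the aggregate by reference. */
static int width_before(const struct range *r)
{
    return r->hi - r->lo;
}

int caller_before(int a, int b)
{
    struct range r = { a, b };
    return width_before(&r);
}

/* Approximation after the pass: the aggregate parameter is split
   into scalars, and the caller passes the members directly. */
static int width_after(int r_lo, int r_hi)
{
    return r_hi - r_lo;
}

int caller_after(int a, int b)
{
    return width_after(a, b);
}
```

Inside width_after nothing named r remains, so a debugger stopped there has no parameter r to display.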
The impact on debugging could possibly be no different from that of --tree-sra ->#O1-tree-sra, but the parameter transformations do not retain any traces of the original parameters that could have variable location information generated in a way that reconstructed the original object, or even that tracked each replacement scalar parameter separately. This would require infrastructure to somehow retain the original parameters and describe how they map to the replacement parameters. #Os-optimize-sibling-calls --optimize-sibling-calls: pass_tail_recursion ->#pass_tail_recursion, and pass_tail_calls ->#pass_tail_calls Optimize sibling and tail recursive calls. This enables two separate passes. One attempts to turn tail recursion into loops, the other marks non-recursive tail calls as such, so that the expander emits them as jumps rather than calls. Neither transformation affects debugging within an activation of a function, but they do affect debugging in that call stacks may be missing expected frames, stepping over a tail call would require additional logic in the debugger and the call would not return to the expected caller, and setting a breakpoint at the entry point of a recursively tail-called function may miss the recursive tail-calls. #Os-tree-switch-conversion --tree-switch-conversion: pass_convert_switch ->#pass_convert_switch Perform conversions of switch initializations. This activates switch statement lowering alternatives that may be more efficient than the jump tables or decision trees that are otherwise used. One of the lowering possibilities uses the switch value as a shift count, and then uses bit tests instead of multiple equality tests. No visible effects on the debug experience are expected from this. Another turns a switch statement with all cases containing assignments of constants to the same variables into arrays of the constants and assignments to the variables from indexed elements of the arrays. 
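The array-based lowering can be sketched at the source level as follows. The function and array names are hypothetical, and the range check shown is one plausible way to separate the in-range cases from the default.

```c
/* As written: every case assigns a constant to the same variable. */
int days_before(int m)
{
    int d;
    switch (m) {
    case 0: d = 31; break;
    case 1: d = 28; break;
    case 2: d = 31; break;
    case 3: d = 30; break;
    default: d = 0; break;
    }
    return d;
}

/* Approximation after the conversion: the constants move into an
   array, and all in-range cases become one indexed load. */
int days_after(int m)
{
    static const int table[4] = { 31, 28, 31, 30 };
    int d;
    if ((unsigned)m < 4u)
        d = table[m];     /* single block for cases 0 to 3 */
    else
        d = 0;            /* default case */
    return d;
}
```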
This collapses the code for all (in-range) cases into a single block, losing any debug annotations they might contain. This ultimately prevents stepping into the switch statement or breaking at any of the cases. Optimized-out assignments that might have been preserved in such annotations will be lost altogether. As for assignments that are handled by this transformation, even though debug binds in the cases are lost, binds introduced by VTA after the post-switch PHI nodes wi