What is reload?

A note before reading: reload is being replaced by LRA. Initially (July 2013) LRA was only implemented for x86/x86_64. There is work to bring it to other targets as well. If you are having trouble with reload in GCC 4.9 or later, work on converting your port to use LRA instead. Start with the lra_p target hook. In GCC 7 the default is to use LRA.

Reload is the GCC equivalent of Satan. See [gccsource:reload.c], [gccsource:reload1.c], and [gccsource:reload.h] if you have a brave soul. (You'll probably also wind up looking at [gccsource:local-alloc.c] and [gccsource:global.c], the register allocator proper.)

What does reload do?

Good question. The what is still understandable. Don't ask about the how.

Reload does everything, and probably no one knows exactly how much that is. But to give you some idea:

- Spill code generation
- Instruction/register constraint validation
- Constant pool building
- Turning non-strict RTL into strict RTL (doing more of the above in evil ways)
- Register elimination--changing frame pointer references to stack pointer references
- Reload inheritance--essentially a builtin CSE pass on spill code

All these things are badly intertwined, so that it is almost impossible to improve the spill code generation without breaking platforms. Reload badly needs to be simply a spill code generator (preferably driven from a new register allocator), and the machine-specific constraints (i.e. how to move a constant, a piece of memory, or register from here to there) need to be handled elsewhere.

How does it do this?

So you had to ask, eh? Oh well, here we go (thanks Ulrich Weigand!).

The RTL optimization passes can be split into two quite different classes:

- non-strict RTL (pre-reload/regalloc)
- strict RTL (post-reload/regalloc)

In the strict RTL mode, the internal representation of the function is a stream of insn patterns, each of which matches exactly one real assembler instruction of the target (let's ignore niceties like splitters here; they are irrelevant to the reload/regalloc problem). An "exact match" means just that all operands of each insn must be directly encodable in machine language, whether they be hard registers of some sort, memory references, or immediate constants.

On the other hand, in the non-strict RTL mode, the requirements on the insn stream are much looser. While the general *structure* of each insn (e.g. a two-address PLUS in SImode) must correspond to something that the machine can do in assembler, the details of the *operands* do not yet have to be immediately ready for encoding into machine language.

The most notable such difference, of course, is that we allow pseudo-registers, which do not a priori correspond to hardware registers, to appear as or within operands. But there are other details where we are allowed to be a bit lax; e.g. memory address modes need not yet fully conform to the target instructions, and we may also allow operands of a different form, like an immediate operand that later on may need to reside in a register or in memory.

In all these cases we usually don't allow operands that would be completely impossible on the target machine (like memory-indirect addressing on a machine where this is fundamentally not supported); but in cases where the machine may or may not allow an operand, depending on just which of multiple possible instructions is selected to implement the RTL, we will typically be generous and allow all options during the early phases.

Note that we call these two intermediate languages strict RTL and non-strict RTL: they are both simply target-defined subsets of the full RTL language. Both can be inferred from the machine description. The insn patterns, including operand predicates, define the non-strict RTL. Taking the insn constraints into account in addition to that, we define the strict RTL. This is the sole purpose of the constraints.
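To make the predicate/constraint split concrete, here is a toy Python model. It is not GCC code: the operand tuples, the function names and the example insn are assumptions made for illustration. Only the idea of a permissive `general_operand`-style predicate and of the constraint letters `r`, `m` and `i` is borrowed, loosely, from machine descriptions.

```python
# Toy model: operands are (kind, value) tuples. Not GCC's real data structures.
# Kinds: "pseudo" (virtual register), "hard" (machine register),
#        "mem" (memory reference), "const" (immediate).

def general_operand(op):
    """Predicate: loosely accepts anything that could become an operand."""
    return op[0] in ("pseudo", "hard", "mem", "const")

# Constraint letters, loosely modelled on 'r', 'm' and 'i'.
CONSTRAINT_ACCEPTS = {
    "r": {"hard"},   # strict: only a real hard register satisfies 'r'
    "m": {"mem"},
    "i": {"const"},
}

def satisfies(op, constraint):
    """True if op matches any letter in the constraint string."""
    return any(op[0] in CONSTRAINT_ACCEPTS[c] for c in constraint)

def is_nonstrict(operands, predicates):
    """Non-strict RTL: every operand passes its predicate."""
    return all(pred(op) for op, pred in zip(operands, predicates))

def is_strict(operands, predicates, alternatives):
    """Strict RTL: predicates hold and one alternative fits all operands."""
    if not is_nonstrict(operands, predicates):
        return False
    return any(
        all(satisfies(op, c) for op, c in zip(operands, alt))
        for alt in alternatives
    )

# A hypothetical two-operand insn with alternatives (r, r) and (r, m).
PREDS = [general_operand, general_operand]
ALTS = [("r", "r"), ("r", "m")]

before = [("pseudo", 70), ("const", 4)]   # acceptable before reload only
after = [("hard", 3), ("mem", "sp+8")]    # acceptable after reload
```

In this model `before` is valid non-strict RTL but not strict RTL, while `after` is both. Getting from one to the other is the job described next.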

So, given the description of these two intermediate languages, the task for the register allocator and for reload is very simple to state:

*Transform any given non-strict RTL program into a semantically equivalent strict RTL program, in the most efficient way possible*.

Unfortunately, this task is not quite as simple to implement.

One part of the task is to address issues typically associated with register allocation: eliminate pseudo-registers by placing them into hard registers and/or stack slots, and emit the spill code necessary to move them back and forth as required.
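As a rough illustration of this first part, the following Python sketch rewrites a tiny instruction list. Everything in it is hypothetical: the tuple-based insn format, the `rewrite` function, and above all the assumption that two scratch registers (`t0`, `t1`) are always free. Real reload has to find such registers itself.

```python
# Toy spill-code generator. Insns are (dest, op, src1, src2) tuples with
# register names as strings. Not GCC's real IR.

def rewrite(insns, hard_reg, slot, scratch=("t0", "t1")):
    """Replace pseudos by hard registers; spilled pseudos go via scratch regs."""
    out = []
    for dest, op, a, b in insns:
        srcs = []
        for i, s in enumerate((a, b)):
            if s in hard_reg:
                srcs.append(hard_reg[s])
            elif s in slot:
                # Spilled input: load it from its stack slot first.
                out.append((scratch[i], "load", slot[s], None))
                srcs.append(scratch[i])
            else:
                srcs.append(s)  # already a hard register
        if dest in slot:
            # Spilled output: compute into a scratch register, then store it.
            out.append((scratch[0], op, srcs[0], srcs[1]))
            out.append((None, "store", scratch[0], slot[dest]))
        else:
            out.append((hard_reg.get(dest, dest), op, srcs[0], srcs[1]))
    return out

insns = [("p2", "add", "p0", "p1")]
result = rewrite(insns, hard_reg={"p0": "r1", "p2": "r2"}, slot={"p1": "[sp+0]"})
# result holds a load of p1's slot into t1, followed by the add on r1 and t1.
```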

The other is to make sure that for each insn, all operands are provided in a form that satisfies the constraints of at least one alternative. This may involve reloading the operand from one form into another, but may also involve other actions like fixing up memory addresses to use (one of the) allowed addressing modes, possibly swapping the operands of a commutative instruction, and the more arcane issues like moving values between integer and float registers via a temporary memory location if the hardware needs that.
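The alternative-selection part can be sketched as a small search. This is a toy model under stated assumptions: the cost is simply the number of operands that would need a reload, the constraint letters `r`, `m` and `i` only loosely imitate machine descriptions, and the `choose` function and example insn are hypothetical. Reload's actual cost model is far more involved.

```python
# Toy alternative chooser. Operand kinds: "reg", "mem", "const".

ACCEPTS = {"r": "reg", "m": "mem", "i": "const"}

def reloads_needed(kinds, alt):
    """Count operands whose kind matches no letter of their constraint."""
    return sum(
        1 for kind, constraint in zip(kinds, alt)
        if kind not in {ACCEPTS[c] for c in constraint}
    )

def choose(kinds, alternatives, commutative=False):
    """Return (cost, alternative index, swapped) with the fewest reloads."""
    candidates = [(kinds, False)]
    if commutative:
        # A commutative insn may also be tried with its operands exchanged.
        candidates.append(((kinds[1], kinds[0]), True))
    best = None
    for order, swapped in candidates:
        for index, alt in enumerate(alternatives):
            cost = reloads_needed(order, alt)
            if best is None or cost < best[0]:
                best = (cost, index, swapped)
    return best

# Hypothetical two-input insn: sources must be (reg, reg) or (reg, const).
ALTS = [("r", "r"), ("r", "i")]
```

With operands `("const", "reg")`, the non-commutative case needs one reload whichever alternative is picked first, whereas allowing the swap lets the second alternative match with no reload at all.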

The problem is that it appears to be impossible to do a good job by solving first the one task and then the other; they are too tightly intertwined.

On the one hand, the register allocator needs to understand the insn constraints so that it will use correct register classes etc. It should also know at least something about non-register constraints. For example, if an insn has a memory-constraint alternative, the register allocator should know that spilling an operand register to the stack can be done implicitly by simply substituting a memory operand in the insn; there is no need to generate explicit spill code.
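The memory-alternative case is small enough to write down as a toy decision function. The function name, the returned dictionary and the `scratch` register are all hypothetical; only the `m` constraint letter is taken from machine description conventions.

```python
def place_spilled_operand(constraint, slot):
    """Decide how a spilled pseudo is fed to an operand with this constraint."""
    if "m" in constraint:
        # The insn can read memory directly: substitute the stack slot.
        return {"operand": slot, "extra_insns": []}
    # Otherwise an explicit load into a scratch register is needed.
    return {"operand": "scratch", "extra_insns": [("load", "scratch", slot)]}
```

An operand constrained as `"rm"` takes the stack slot as is, while one constrained as `"r"` costs an extra load.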

On the other hand, reload proper needs to allocate fresh registers for some of its tasks, so it cannot work after a register allocator that left it no free registers to work with. But neither can you just run reload before the allocator, because then the allocator no longer has the choice of switching between alternatives to help avoid spill code.

(Not to mention that the spill code generated by the allocator sometimes itself needs to be reloaded to fix issues like address displacement overflows; and whether a stack displacement overflows will only be known for certain after all spill slots have been allocated!)
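The displacement problem can be illustrated with a few lines of Python. The 12-bit signed displacement field, the 8-byte slot size and both function names are assumptions chosen for the example, not properties of any particular GCC target.

```python
# Toy check for stack displacement overflow.

DISP_MIN, DISP_MAX = -2048, 2047   # assumed 12-bit signed displacement field

def assign_slots(spilled, slot_size=8, base=0):
    """Give each spilled pseudo a stack offset, in order."""
    return {p: base + i * slot_size for i, p in enumerate(spilled)}

def needs_address_reload(offset):
    """True if the offset no longer fits the displacement field."""
    return not (DISP_MIN <= offset <= DISP_MAX)

slots = assign_slots([f"p{i}" for i in range(300)])
overflowing = [p for p, off in slots.items() if needs_address_reload(off)]
```

In this model, whether a given slot overflows depends on how many slots were allocated before it, so the question cannot be answered until slot assignment is finished.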