As I noted a year and a half ago, the constraint propagation pass in SBCL can slow compilation down a lot (lp#394206, lp#792363). To a large extent, I feel like that issue has been fixed in HEAD, but the lack of documentation on that pass made progress slower than it could have been. So, this post is a half-organised version of my notes, before I forget it all.

1 What constraint propagation does

The constraint propagation pass is executed as part of the IR1 optimisation loop, which operates on Lisp-level basic blocks. It’s only concerned with propagating information between “real” variables (LAMBDA-VARs): variables that have a name and can be set and referred to multiple times, as opposed to temporaries (LVARs).

The distinction is useful because there are (usually) far fewer full-blown variables than temporaries, and only the former directly benefit from exploiting previously executed type checks and predicates to deduce information (flow sensitivity).

Instead of handling all the operations found in basic blocks, constraint propagation only considers a few key operations, and otherwise exploits the type information previously derived in a bottom-up manner (i.e. propagating the types of arguments to result types) by the other IR1 passes. Those passes, conversely, only have access to the results of constraint propagation through the type derived for each REFerence to a lambda-var.

So, constraint propagation goes through each basic block, updating its state (set of known true constraints) when it encounters BIND nodes (LET bindings, more or less), references to a lambda-var, CAST nodes ([usually] checked type assertions), CSET nodes (assignment), or branches on a few key predicates (EQL, <, > or TYPEP, mostly).

As would be expected, the state at the end of each basic block is propagated to its successors, join points take the intersection of their predecessors’ states, and entry points are initialised with empty states... and this is all executed iteratively until convergence.
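That scheme can be sketched in a few lines of Python. This is a toy model, not SBCL’s actual code: constraints are plain strings, blocks add a fixed set of local facts, and every block is assumed to be reachable from the single entry point.

```python
# Toy forward dataflow in the style described above: each block's out-state
# is its in-state plus locally derived facts, join points intersect their
# predecessors' states, and we iterate until nothing changes.

def propagate(blocks, preds, entry, local_facts):
    """blocks: list of block ids; preds: block id -> list of predecessor ids
    (every non-entry block must have at least one); local_facts: block id ->
    set of constraints the block itself establishes."""
    out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                in_set = set()  # entry points start with the empty conset
            else:
                # join points take the intersection of predecessor states
                in_set = set.intersection(*(out[p] for p in preds[b]))
            new_out = in_set | local_facts[b]
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return out
```

On a diamond-shaped CFG where only one branch learns `x > 0`, the join block keeps only the facts common to both paths.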

2 What constraint propagation isn’t

It feels a lot like an abstract interpretation algorithm. However, a major difference is that constraint propagation doesn’t converge on a least fixed point. Take the following trivial example:

CL-USER> (lambda ()
           (let ((x 0.0))
             (dotimes (i 10 x)
               (setf x (/ x 2.0)))))
#<FUNCTION (LAMBDA ()) {10048E20C9}>
CL-USER> (sb-kernel:%simple-fun-type *)
(FUNCTION NIL (VALUES NUMBER &OPTIONAL))

It’s obvious to us that X will always be a single float. Unfortunately, while SBCL’s type analyses converge to a fixed point, they’re not always seeded with an underapproximation of a least fixed point.

When there are assignments to a variable, the type of that variable is initialised with its declared type (T if none); once that initial type has been used to derive the types of the values that could be assigned to it, the union of these types (and of the type of the initially bound value) is taken. In the example above, the return type of (/ [T] [single-float]) is NUMBER. Once we have an over-approximation of the least fixed point, we shouldn’t expect to tighten it back much.

In a proper abstract interpretation pass, X would be initialised with the bottom type or the singleton type (EQL 0.0) (or some approximation thereof), and iteratively widened with the derived type of the division and of its initialisation value, up to convergence. That would easily deduce that X is always 0.0, or, at least, a single-float.
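The difference between the two seedings can be made concrete on a toy type lattice. This is an illustrative model, not SBCL’s type system: the lattice, `div_type`, and `fixpoint` are all made up for the `(setf x (/ x 2.0))` example.

```python
# A four-element chain of types, from most to least precise.
LATTICE = {"BOTTOM": 0, "SINGLE-FLOAT": 1, "NUMBER": 2, "T": 3}

def union(a, b):
    """Type union on the chain: the less precise of the two."""
    return a if LATTICE[a] >= LATTICE[b] else b

def div_type(arg):
    """Toy derived type of (/ arg 2.0): stays SINGLE-FLOAT for a
    SINGLE-FLOAT argument, is only known to be a NUMBER otherwise."""
    if arg == "BOTTOM":
        return "BOTTOM"
    return "SINGLE-FLOAT" if arg == "SINGLE-FLOAT" else "NUMBER"

def fixpoint(seed):
    """Iterate x := (type of initial value 0.0) U (type of (/ x 2.0))
    to convergence, starting from SEED."""
    x = seed
    while True:
        new = union("SINGLE-FLOAT", div_type(x))
        if new == x:
            return x
        x = new
```

Seeding with the declared type T converges to NUMBER, just like the REPL transcript above; seeding with BOTTOM, as an abstract interpreter would, converges to SINGLE-FLOAT.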

It’s also not like any SSA variant. Yes, Continuation-passing Style is equivalent to Static single assignment form. One key part of that equivalence is that both styles rename variables at join points (via arguments to the continuation or phi functions). IR1 does mention continuations, but these continuations don’t perform the essential renaming of variables. That is the very reason why we have to use constraint sets to represent what we know about a variable’s contents at a given program point.

3 How constraint propagation does it

3.1 Constraints

At the heart of the pass are constraint sets (CONSETs) that are updated as the analysis walks through a basic block. Each constraint in a conset represents a small fact that is known to be true about a given lambda-var: that its contents are (or are not) of a certain type, are greater or less than values of a given type, EQL to a constant, or EQL to another variable.
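The shape of such a fact is easy to sketch. SBCL’s actual CONSTRAINT structure differs in its details; this is just an illustration of the four kinds of facts listed above.

```python
from dataclasses import dataclass
from typing import Any

# frozen=True makes constraints hashable, so they can live in sets.
@dataclass(frozen=True)
class Constraint:
    kind: str    # one of "typep", "<", ">", "eql"
    var: str     # the constrained lambda-var
    other: Any   # a type, a constant, or another variable
    not_p: bool  # True for negated facts, e.g. "x is NOT a fixnum"

# "X holds a FIXNUM" and "X is not EQL to Y":
facts = {Constraint("typep", "x", "fixnum", False),
         Constraint("eql", "x", "y", True)}
```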

As a pun, EQL constraints can also link a lambda-var and an lvar: since combinations (function calls) work with lvars, these special constraints are used to translate information about a combination’s arguments into information about lambda-vars (which is what constraint propagation handles).

EQL constraints are special. When two variables are EQL, it’s essential for good performance to extend that EQLness to all the variables that are EQL to either of them (i.e. compute the transitive closure of the EQL relation). This is useful because, when something is true of a variable (e.g. it’s TYPEP FIXNUM), it’s also true of all the variables EQL to it.
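A minimal sketch of that closure, with variables as strings (again a toy, not SBCL’s representation): every time we learn a new EQL pair, the two variables’ classes are merged, so any fact true of one member can be asserted of the whole class.

```python
def eql_closure(pairs):
    """Map each variable to the full set of variables EQL to it,
    given a list of (a, b) EQL facts."""
    classes = {}
    for a, b in pairs:
        # Merge a's class (or the singleton {a}) with b's class.
        merged = classes.get(a, {a}) | classes.get(b, {b})
        for v in merged:
            classes[v] = merged
    return classes
```

With `x EQL y` and `y EQL z`, all three variables end up in one class, so learning `(typep x 'fixnum)` lets us conclude the same of `y` and `z`.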

However, we only do that for one of the operands in a non-EQL constraint between variables, and not at all for EQL constraints involving constants (overly eager constant propagation can hurt, since our codegen doesn’t handle repeated references to constants that well) or lvars (these aren’t really constraints, but rather pure bookkeeping information punned as constraints).

3.2 Propagation

To begin propagation, entry points are initialised with empty constraint sets (we know nothing about the inputs), and the consets are updated by walking down the basic blocks in execution order, in CONSTRAINT-PROPAGATE-IN-BLOCK:

BIND nodes represent things like LET bindings. If we have a useful (more precise than T) type for the bound value, the new variable is (currently) of that type. Also, if the bound value is a lambda-var’s current value, then the newly introduced variable is EQL to it.

REF nodes are references to a lambda-var. The LVAR that receives that value is EQL to the lambda-var. Additionally, if we’re on the last iteration, it’s useful to store all that we know about the referenced variable at that program point for other IR1 passes to exploit.

CAST nodes are type checks. If the check fails, execution will not continue, so we can simply assume that the check is successful, and propagate the type to the lambda-var at the end of the chain of casts (if any).
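With toy types modelled as sets of possible concrete representations (purely illustrative; SBCL intersects real type objects), the CAST rule is a one-liner: control only continues past a successful check, so the asserted type can be intersected with what was already known.

```python
def apply_cast(known, asserted):
    """Narrow a variable's known type by a (successful) type check."""
    return known & asserted

# x was known to be a REAL; a (the float x) cast narrows it:
REAL = {"single-float", "double-float", "fixnum", "bignum", "ratio"}
FLOAT = {"single-float", "double-float"}
```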

CSET nodes represent assignments. The most important thing to do is forget all that is known about the assigned variable at that program point: after the assignment, it’s bound to a brand new value. Still, before that, if we want a lot of propagation, it’s useful to propagate some information that’s always true of that variable to the variables EQL to it. For now, it only makes sense to propagate the fact that, if the assigned variable is known to always be of a given type, all the variables that are currently EQL to it are also of that type. Then, as with bind nodes, it’s useful to add type constraints from the new value, and to add EQL constraints if applicable.
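The CSET rule, sketched with constraints as plain tuples `("typep", var, type)` and `("eql", var, other)` (a simplification of the real structures, and `var_type` standing in for the variable’s always-true declared/derived type):

```python
def apply_cset(conset, var, var_type, new_type):
    """Update CONSET for the assignment of a value of NEW_TYPE to VAR."""
    # 1. Variables EQL to VAR still hold the *old* value, so they inherit
    #    any type known to always hold for VAR.
    inherited = {("typep", other, var_type)
                 for (kind, a, other) in conset
                 if kind == "eql" and a == var}
    # 2. Forget everything about VAR: it now holds a brand new value.
    kept = {c for c in conset if var not in c[1:]}
    # 3. Record what we know about the newly assigned value.
    return kept | inherited | {("typep", var, new_type)}
```

Assigning a single-float to `x` when `x` was a fixnum EQL to `y` leaves us knowing that `y` is a fixnum and `x` is a single-float, and nothing more.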

Finally, if the block ends with a conditional branch, it’s useful to look at the predicate guiding the branch to propagate different constraints to the consequent and alternative successors.

If the predicate is simply a reference to a variable, then it’s known to be NIL in the alternative, and something else in the consequent.

If it’s a function call, only a few functions are interpreted:

TYPEP-like functions result in constraints to the effect that a given lambda-var is or is not of a given type.

EQ and EQL result in additional EQL constraints (between variables, or to a constant) in the consequent, and the reverse constraints (not EQL) in the alternative.

< and > are used to derive tighter bounds on numeric types. However, we do not track relationships between variables (except for EQLity and non-EQLity), and only note that a given variable is less or greater than a value of some type.

A few other type predicates also result in TYPEP constraints, when specially marked for the compiler.
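The branch rules above amount to forking the conset in two, again with constraints as toy tuples (predicates and negated-fact spellings are made up for the sketch):

```python
def branch_consets(conset, predicate):
    """Return (consequent conset, alternative conset) for a branch on
    PREDICATE, a (kind, var, other) tuple."""
    kind, var, other = predicate
    if kind == "typep":
        return (conset | {("typep", var, other)},
                conset | {("not-typep", var, other)})
    if kind == "eql":
        return (conset | {("eql", var, other)},
                conset | {("not-eql", var, other)})
    # Uninterpreted predicate: both successors inherit the same facts.
    return conset, conset
```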

Once a basic block has been constraint propagated through, the information is used by its successors. Basic blocks are processed in such an order that at least one of each block’s predecessors has been propagated through before the block itself. The analysis is run with an initial conset made of the intersection of the (already computed) consets at the end of its predecessors, taking care to use the right one if some predecessor ends in a conditional branch (COMPUTE-BLOCK-IN). And that’s repeated on blocks whose initial conset might have changed, until we hit a fixed point.

Once that’s done, we’re only interested in storing all the flow-sensitive information we have about each variable in the relevant REFerences to it. So, constraint propagation is executed one last time (otherwise we only have the consets at the beginning and end of each basic block), and, when a ref node is encountered, the set of constraints related to the referenced variables is extracted and converted into a type, in CONSTRAIN-REF-TYPE.
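The last step, converting a variable’s constraints into a type, can be sketched with the same toy sets-of-representations types as before (the real CONSTRAIN-REF-TYPE also handles bounds and EQL constraints; this only shows the TYPEP half):

```python
REAL = frozenset({"single-float", "double-float", "fixnum", "bignum", "ratio"})
FLOAT = frozenset({"single-float", "double-float"})

def constrain_ref_type(conset, var, initial_type):
    """Derive a type for a REF of VAR from the conset at that point,
    starting from the variable's globally derived INITIAL_TYPE."""
    derived = set(initial_type)
    for c in conset:
        if c[0] == "typep" and c[1] == var:
            derived &= c[2]   # known to be of this type
        elif c[0] == "not-typep" and c[1] == var:
            derived -= c[2]   # known NOT to be of this type
    return derived
```

Knowing that `x` is a FLOAT but not a DOUBLE-FLOAT pins the REF’s type down to SINGLE-FLOAT.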

4 What’s been and what could be done

We currently represent consets as bitsets. Unsurprisingly, the only operation that is horrible with bitsets is iterating through the members of a given set. Fortunately, we only use this in a few functions, and it seems like bitsets are still the best option, when supplemented with a few indices.
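The trade-off is easy to see in a sketch where a conset is just a machine integer (Python’s bignums stand in for SBCL’s bit-vectors): every hash-consed constraint gets a small integer id, and set operations become bit twiddling.

```python
def conset_add(conset, cid):
    return conset | (1 << cid)

def conset_member(conset, cid):
    return bool(conset & (1 << cid))

def conset_intersect(a, b):
    # Joins at merge points are a single AND per word.
    return a & b

def conset_members(conset):
    # The one slow operation: walking all members, bit by bit.
    return [i for i in range(conset.bit_length()) if conset & (1 << i)]
```

Membership and intersection are cheap; enumerating members costs time proportional to the universe of constraints, not to the set’s size.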

The heaviest timesink was FIND-CONSTRAINT: constraints are hash-consed so that equivalent constraints are always EQ. We used to perform a linear search over the set of constraints associated with a given variable to find them; we now use a hash table in each variable to speed the search up.

The other major slowdowns were related to functions that need to iterate through the intersection of the current set of constraints and certain subsets of the constraints associated with a given variable (e.g. all the EQL constraints for a given variable in the current constraint set). Instead of iterating through the intersection of two bitvectors, we represent each subset of the constraints associated with each variable as vectors of constraints (the hash-consing step already ensures there won’t be any duplicate). Now, we only have to iterate through these vectors and check for membership in a bitset.
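The fix amounts to flipping the iteration around, as in this sketch (same integer-as-bitset conset as above; the per-variable vector is hypothetical but mirrors the idea): scan the duplicate-free vector of one variable’s constraints of a given kind, and keep those whose id is in the current bitset.

```python
def constraints_in_conset(var_constraint_vector, conset):
    """var_constraint_vector: list of constraint ids for one variable's
    constraints of some kind (e.g. its EQL constraints), duplicate-free
    thanks to hash-consing.  conset: the current bitset, as an integer."""
    return [cid for cid in var_constraint_vector if conset & (1 << cid)]
```

Each step is a vector scan plus an O(1) bit test, instead of enumerating the members of an intersection of two bit-vectors.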

These simple changes are enough to bring compile times for a couple test cases down from hundreds of seconds to a few seconds or less.