
To carry on the optimizations, we change the syntax of types. Recall that in sound_eager, types consisted of free or bound type variables TVar, (implicitly universally) quantified type variables QVar, and function types TArrow. The first, seemingly unprincipled, change is to eliminate QVar as a distinct alternative and instead dedicate a very large positive integer (which should be treated as the inaccessible ordinal ω) as generic_level. A free type variable TVar at generic_level is taken to be a quantified type variable.

More substantially, all types, not only free type variables, now have levels. The level of a composite type (TArrow in our case) is an upper, not necessarily exact, bound on the levels of its components. In other words, if a type belongs to a live region, all its components are alive as well. It immediately follows that if a (composite) type is at generic_level, it may contain quantified type variables. Contrapositively, if a type is not at generic_level, it does not contain any quantified variable. Therefore, instantiating such a type should return the type as it is, without traversing it. Likewise, if the level of a type is greater than the current level, it may contain free type variables to generalize. On the other hand, the generalization function need not even traverse a type whose level is equal to or less than the current level. This is the first example of how levels help eliminate traversals and rebuildings of a type, improving sharing.
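The instantiation shortcut can be sketched as follows, under a simplifying assumption: each type carries a single level, ignoring the pending updates discussed next. The names new_var, counter, get_level and instantiate, as well as this repr, are illustrative and not taken from the source. A quantified variable is a free TVar at generic_level; any subterm whose level is below generic_level is returned as is, so it stays shared with the type scheme.

```ocaml
(* Simplified, hypothetical representation: one level per type. *)
type level = int
let generic_level = 100000000

type typ =
  | TVar of tv ref
  | TArrow of typ * typ * level  (* upper bound on the components' levels *)
and tv = Unbound of string * level | Link of typ

(* Follow Link chains to the representative, compressing the path. *)
let rec repr = function
  | TVar ({contents = Link t} as tvr) ->
      let t = repr t in
      tvr := Link t; t
  | t -> t

let get_level t =
  match repr t with
  | TVar {contents = Unbound (_, l)} -> l
  | TArrow (_, _, l) -> l
  | TVar {contents = Link _} -> assert false  (* repr removed all Links *)

let counter = ref 0
let new_var level =
  incr counter;
  TVar (ref (Unbound ("_" ^ string_of_int !counter, level)))

(* Instantiate a type scheme at [level]. Each quantified variable is
   replaced by a fresh variable, the same one for every occurrence.
   A subterm below generic_level cannot contain quantified variables,
   so it is returned untouched and remains shared. *)
let instantiate (level : level) (ty : typ) : typ =
  let subst = Hashtbl.create 8 in
  let rec loop ty =
    match repr ty with
    | TVar {contents = Unbound (name, l)} when l = generic_level ->
        (match Hashtbl.find_opt subst name with
         | Some v -> v
         | None ->
             let v = new_var level in
             Hashtbl.add subst name v; v)
    | TArrow (t1, t2, l) when l = generic_level ->
        TArrow (loop t1, loop t2, level)  (* rebuilt: it held generic parts *)
    | ty -> ty                            (* not generic: no traversal *)
  in
  loop ty
```

For instance, instantiating a scheme of the form 'a -> 'a -> s, where s is a non-generic arrow, copies the two generic arrows but reuses s itself.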

Unifying a type t with a free type variable should update t's level to the level of the type variable if the latter is smaller. For a composite type, such an update means recursively updating the levels of all its components. To postpone costly traversals, we give composite types two levels: level_old is an upper bound on the levels of the type's components; level_new, which is less than or equal to level_old, is the level the type should have after the update. If level_new < level_old, the type has pending level updates. The syntax of types in sound_lazy is thus:

```ocaml
type level = int
let generic_level = 100000000   (* as in OCaml typing/btype.ml *)
let marked_level  = -1          (* for marking a node, to check for cycles *)

type typ =
  | TVar of tv ref
  | TArrow of typ * typ * levels
and tv = Unbound of string * level | Link of typ
and levels = {mutable level_old : level; mutable level_new : level}
```

We have not yet explained marked_level. The occurs check on each unification with a free type variable is expensive, raising the algorithmic complexity of unification and type checking. We now postpone this check until the whole expression is type checked. In the meantime, unification may create cycles in types, so type traversals have to check for cycles or risk divergence. The marked_level is temporarily assigned to level_new of a composite type to indicate that the type is being traversed. Encountering marked_level during a traversal means a cycle has been detected, which raises the occurs check error. Incidentally, OCaml types may in general be cyclic: (equi-)recursive types arise when type checking objects and polymorphic variants, and when the -rectypes compiler option is set. The OCaml type checker uses a similar marked-level trick to detect cycles and avoid divergence.
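The marked-level trick can be sketched with a hypothetical helper, check_acyclic, which is not part of the source; repr is likewise an assumed definition. Only the arrows on the current traversal path are marked, so a subterm that is merely shared (as in a DAG) is not mistaken for a cycle, and the saved level is restored even when the traversal fails.

```ocaml
type level = int
let marked_level = -1

type typ =
  | TVar of tv ref
  | TArrow of typ * typ * levels
and tv = Unbound of string * level | Link of typ
and levels = {mutable level_old : level; mutable level_new : level}

(* Assumed repr: follow Link chains, compressing the path. *)
let rec repr = function
  | TVar ({contents = Link t} as tvr) ->
      let t = repr t in
      tvr := Link t; t
  | t -> t

(* Hypothetical helper: fail with "occurs check" if [ty] is cyclic.
   While the components of an arrow are being visited, its level_new holds
   marked_level; reaching a marked arrow again means we came back to it
   along the current path, i.e., the type contains a cycle. *)
let rec check_acyclic (ty : typ) : unit =
  match repr ty with
  | TVar _ -> ()
  | TArrow (_, _, ls) when ls.level_new = marked_level ->
      failwith "occurs check"
  | TArrow (t1, t2, ls) ->
      let saved = ls.level_new in
      ls.level_new <- marked_level;
      (try check_acyclic t1; check_acyclic t2
       with e -> ls.level_new <- saved; raise e);
      ls.level_new <- saved   (* unmark: a shared subterm is not a cycle *)
```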

The sound_lazy unification has several important differences from sound_eager:

```ocaml
let rec unify : typ -> typ -> unit = fun t1 t2 ->
  if t1 == t2 then ()                   (* t1 and t2 are physically the same *)
  else match (repr t1,repr t2) with
  | (TVar ({contents = Unbound (_,l1)} as tv1) as t1,   (* unify two free vars *)
     (TVar ({contents = Unbound (_,l2)} as tv2) as t2)) ->
      (* bind the higher-level var *)
      if l1 > l2 then tv1 := Link t2 else tv2 := Link t1
  | (TVar ({contents = Unbound (_,l)} as tv),t')
  | (t',TVar ({contents = Unbound (_,l)} as tv)) ->
      update_level l t';
      tv := Link t'
  | (TArrow (tyl1,tyl2,ll), TArrow (tyr1,tyr2,lr)) ->
      if ll.level_new = marked_level || lr.level_new = marked_level then
        failwith "cycle: occurs check";
      let min_level = min ll.level_new lr.level_new in
      ll.level_new <- marked_level; lr.level_new <- marked_level;
      unify_lev min_level tyl1 tyr1;
      unify_lev min_level tyl2 tyr2;
      ll.level_new <- min_level; lr.level_new <- min_level
  (* everything else is the unification error *)
  | _ -> failwith "unification error"

and unify_lev l ty1 ty2 =
  let ty1 = repr ty1 in
  update_level l ty1;
  unify ty1 ty2
```

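Compared with sound_eager, binding a free type variable no longer performs the occurs check: it merely calls update_level on the bound type. Unifying two TArrow types temporarily sets their level_new to marked_level, so that revisiting either of them during the traversal reports a cycle, and passes the smaller of their two levels down to the components via unify_lev. The function repr, like Btype.repr in the OCaml type checker, follows the chain of Link indirections to the representative of a type.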
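To watch the postponed occurs check at work, the pieces can be assembled into one file. This repr is an assumed definition, update_level is repeated from its definition given further on, and unify is the code above. Binding a variable x to x -> x succeeds silently and creates a cyclic type; the cycle is reported only when a later unification traverses it. Binding a level-1 variable to an arrow at level 3 merely records a pending update on the arrow.

```ocaml
type level = int
let generic_level = 100000000
let marked_level = -1

type typ =
  | TVar of tv ref
  | TArrow of typ * typ * levels
and tv = Unbound of string * level | Link of typ
and levels = {mutable level_old : level; mutable level_new : level}

(* Assumed repr: follow Link chains, compressing the path. *)
let rec repr = function
  | TVar ({contents = Link t} as tvr) ->
      let t = repr t in
      tvr := Link t; t
  | t -> t

let to_be_level_adjusted : typ list ref = ref []

let update_level : level -> typ -> unit = fun l -> function
  | TVar ({contents = Unbound (n, l')} as tvr) ->
      assert (not (l' = generic_level));
      if l < l' then tvr := Unbound (n, l)
  | TArrow (_, _, ls) as ty ->
      assert (not (ls.level_new = generic_level));
      if ls.level_new = marked_level then failwith "occurs check";
      if l < ls.level_new then begin
        if ls.level_new = ls.level_old then
          to_be_level_adjusted := ty :: !to_be_level_adjusted;
        ls.level_new <- l
      end
  | _ -> assert false

let rec unify (t1 : typ) (t2 : typ) : unit =
  if t1 == t2 then ()
  else match (repr t1, repr t2) with
  | (TVar ({contents = Unbound (_, l1)} as tv1) as t1,
     (TVar ({contents = Unbound (_, l2)} as tv2) as t2)) ->
      if l1 > l2 then tv1 := Link t2 else tv2 := Link t1
  | (TVar ({contents = Unbound (_, l)} as tv), t')
  | (t', TVar ({contents = Unbound (_, l)} as tv)) ->
      update_level l t';   (* no occurs check here *)
      tv := Link t'
  | (TArrow (tyl1, tyl2, ll), TArrow (tyr1, tyr2, lr)) ->
      if ll.level_new = marked_level || lr.level_new = marked_level then
        failwith "cycle: occurs check";
      let min_level = min ll.level_new lr.level_new in
      ll.level_new <- marked_level; lr.level_new <- marked_level;
      unify_lev min_level tyl1 tyr1;
      unify_lev min_level tyl2 tyr2;
      ll.level_new <- min_level; lr.level_new <- min_level
  | _ -> failwith "unification error"

and unify_lev l ty1 ty2 =
  let ty1 = repr ty1 in
  update_level l ty1;
  unify ty1 ty2
```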

The function update_level is one of the key parts of the optimized algorithm. Often, it merely promises to update the level of a type to the given level. It works in constant time and maintains the invariant that a type level may only decrease. The level of a type variable is updated immediately. For a composite type, level_new is set to the desired new level if the latter is smaller. In addition, if previously level_new and level_old were the same, the type is put into the to_be_level_adjusted queue for later update of the levels of the components. This work queue is akin to the list of assignments from the old generation to the young maintained by a generational garbage collector (such as the one in OCaml). Incidentally, a unification of two TArrow types has to traverse the types anyway, and so it does pending level updates along the way.

```ocaml
let to_be_level_adjusted : typ list ref = ref []

let update_level : level -> typ -> unit = fun l -> function
  | TVar ({contents = Unbound (n,l')} as tvr) ->
      assert (not (l' = generic_level));
      if l < l' then
        tvr := Unbound (n,l)
  | TArrow (_,_,ls) as ty ->
      assert (not (ls.level_new = generic_level));
      if ls.level_new = marked_level then failwith "occurs check";
      if l < ls.level_new then begin
        (* first pending update of this type: queue it *)
        if ls.level_new = ls.level_old then
          to_be_level_adjusted := ty :: !to_be_level_adjusted;
        ls.level_new <- l
      end
  | _ -> assert false
```
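A small trace of update_level, with the definition repeated so that the block compiles on its own; the arrow and its levels are made up for illustration. Levels only ever decrease, and an arrow enters to_be_level_adjusted only on its first pending update, so lowering it again costs constant time and adds no duplicate queue entry. The component keeps its old level until the pending update is forced.

```ocaml
type level = int
let generic_level = 100000000
let marked_level = -1

type typ =
  | TVar of tv ref
  | TArrow of typ * typ * levels
and tv = Unbound of string * level | Link of typ
and levels = {mutable level_old : level; mutable level_new : level}

let to_be_level_adjusted : typ list ref = ref []

let update_level : level -> typ -> unit = fun l -> function
  | TVar ({contents = Unbound (n, l')} as tvr) ->
      assert (not (l' = generic_level));
      if l < l' then tvr := Unbound (n, l)
  | TArrow (_, _, ls) as ty ->
      assert (not (ls.level_new = generic_level));
      if ls.level_new = marked_level then failwith "occurs check";
      if l < ls.level_new then begin
        if ls.level_new = ls.level_old then
          to_be_level_adjusted := ty :: !to_be_level_adjusted;
        ls.level_new <- l
      end
  | _ -> assert false

(* Hypothetical example: an arrow b -> b whose component is at level 4. *)
let b = TVar (ref (Unbound ("b", 4)))
let ls = {level_old = 4; level_new = 4}
let arr = TArrow (b, b, ls)

let () =
  update_level 3 arr;  (* first pending update: arr is queued, level_new = 3 *)
  update_level 2 arr;  (* already pending: level_new = 2, not queued again *)
  update_level 5 arr;  (* ignored: a level never increases *)
  Printf.printf "level_new=%d level_old=%d queued=%d\n"
    ls.level_new ls.level_old (List.length !to_be_level_adjusted)
  (* prints: level_new=2 level_old=4 queued=1 *)
```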

Pending level updates must be performed before generalization: after all, a pending update may decrease the level of a type variable, promoting it to a wider region, and hence save the variable from quantification. Not all pending updates have to be forced, however: only those of types whose level_old > current_level. Any other type contains no variables generalizable at the present point, so its level update may be delayed further.
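One way to force only the relevant pending updates is sketched below. It is an illustrative reconstruction, not the source's code: the name force_delayed_adjustments matches the function that gen calls, but its body, the assumed repr, and the initial value of current_level are assumptions. An arrow whose level_old is at most current_level goes back on the queue untouched; any other pending arrow has its new level pushed into its components, which may in turn postpone or process nested arrows.

```ocaml
type level = int
let marked_level = -1

type typ =
  | TVar of tv ref
  | TArrow of typ * typ * levels
and tv = Unbound of string * level | Link of typ
and levels = {mutable level_old : level; mutable level_new : level}

(* Assumed repr: follow Link chains, compressing the path. *)
let rec repr = function
  | TVar ({contents = Link t} as tvr) ->
      let t = repr t in
      tvr := Link t; t
  | t -> t

let current_level = ref 1
let to_be_level_adjusted : typ list ref = ref []

(* Sketch: perform the pending level updates that may affect generalization
   at the current level; keep the others queued. *)
let force_delayed_adjustments () =
  (* Lower the component [ty] of an arrow to [level]; return the new queue. *)
  let rec loop acc level ty =
    match repr ty with
    | TVar ({contents = Unbound (name, l)} as tvr) when l > level ->
        tvr := Unbound (name, level); acc
    | TArrow (_, _, ls) when ls.level_new = marked_level ->
        failwith "occurs check"              (* came back to it: a cycle *)
    | TArrow (_, _, ls) as ty ->
        if ls.level_new > level then ls.level_new <- level;
        adjust_one acc ty
    | _ -> acc
  (* Decide what to do with one arrow that may have a pending update. *)
  and adjust_one acc = function
    | TArrow (_, _, ls) when ls.level_old = ls.level_new ->
        acc                                  (* nothing pending *)
    | TArrow (_, _, ls) as ty when ls.level_old <= !current_level ->
        ty :: acc                            (* nothing to generalize: later *)
    | TArrow (ty1, ty2, ls) ->
        let level = ls.level_new in
        ls.level_new <- marked_level;        (* guard against cycles *)
        let acc = loop acc level ty1 in
        let acc = loop acc level ty2 in
        ls.level_new <- level;
        ls.level_old <- level;               (* the update is done *)
        acc
    | TVar _ -> assert false                 (* only arrows are queued *)
  in
  to_be_level_adjusted := List.fold_left adjust_one [] !to_be_level_adjusted
```

Postponed arrows keep their pending update; once more let regions have been left and current_level has dropped below their level_old, a later call processes them.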

The generalization function searches for free TVars that belong to a dead region (that is, whose level is greater than the current level) and sets their level to generic_level, hence quantifying the variables. The function traverses only those parts of the type that may contain type variables to generalize: if a type has the (new) level of current_level or smaller, all its components belong to live regions, and hence the type has nothing to generalize. After the generalization, a composite type receives generic_level if it contains a quantified type variable. Later on, the instantiation function will therefore look only through those parts of the type whose level is generic_level.

```ocaml
let gen : typ -> unit = fun ty ->
  force_delayed_adjustments ();
  let rec loop ty =
    match repr ty with
    | TVar ({contents = Unbound (name,l)} as tvr)
        when l > !current_level ->
          tvr := Unbound (name,generic_level)
    | TArrow (ty1,ty2,ls) when ls.level_new > !current_level ->
        let ty1 = repr ty1 and ty2 = repr ty2 in
        loop ty1; loop ty2;
        let l = max (get_level ty1) (get_level ty2) in
        ls.level_old <- l; ls.level_new <- l   (* set the exact level upper bound *)
    | _ -> ()
  in
  loop ty
```
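A self-contained run of gen, assuming that no level updates are pending: force_delayed_adjustments is therefore stubbed out as a no-op here (a hypothetical stand-in), and repr, get_level and the value of current_level are likewise assumptions. With current_level at 1, the variable at level 2 belongs to a dead region and is quantified, the variable at level 1 is left alone, and an arrow at level 1 is not traversed at all.

```ocaml
type level = int
let generic_level = 100000000

type typ =
  | TVar of tv ref
  | TArrow of typ * typ * levels
and tv = Unbound of string * level | Link of typ
and levels = {mutable level_old : level; mutable level_new : level}

(* Assumed repr: follow Link chains, compressing the path. *)
let rec repr = function
  | TVar ({contents = Link t} as tvr) ->
      let t = repr t in
      tvr := Link t; t
  | t -> t

let current_level = ref 1

(* Hypothetical stub: assume there are no pending level updates. *)
let force_delayed_adjustments () = ()

let get_level ty =
  match repr ty with
  | TVar {contents = Unbound (_, l)} -> l
  | TArrow (_, _, ls) -> ls.level_new
  | TVar {contents = Link _} -> assert false

(* gen, as given above *)
let gen : typ -> unit = fun ty ->
  force_delayed_adjustments ();
  let rec loop ty =
    match repr ty with
    | TVar ({contents = Unbound (name, l)} as tvr) when l > !current_level ->
        tvr := Unbound (name, generic_level)
    | TArrow (ty1, ty2, ls) when ls.level_new > !current_level ->
        let ty1 = repr ty1 and ty2 = repr ty2 in
        loop ty1; loop ty2;
        let l = max (get_level ty1) (get_level ty2) in
        ls.level_old <- l; ls.level_new <- l  (* exact upper bound *)
    | _ -> ()
  in
  loop ty

(* Example: 'a -> b, where 'a sits in a dead region (level 2). *)
let a = TVar (ref (Unbound ("a", 2)))
let b = TVar (ref (Unbound ("b", 1)))
let ab = TArrow (a, b, {level_old = 2; level_new = 2})
let bb = TArrow (b, b, {level_old = 1; level_new = 1})  (* skipped by gen *)
let () = gen ab; gen bb
```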

The type checker typeof remains the same as in sound_eager, entering a new region when type checking a let expression. Please see the source code for details.