August 7th, 2007 (08:11 pm)

current mood: nerdy

An interesting feature of dependent data types is the ability to erase not only types, but also data, except where needed. Consider vector concatenation:

    cat :: (n::Nat) -> Vec n a -> (m::Nat) -> Vec m a -> Vec (n+m) a
    cat _ Nil _ bs = bs
    cat n (Cons a as) m bs = Cons a (cat (undefined::(n-1)) as m bs)

The n and m parameters are never used. Therefore, the compiler can erase them. If the caller needs those values, it can always hold onto them itself. As a notational convenience, I will indicate values that should not be explicitly entered by following them with =>, since that's how Haskell indicates class constraints.

    cat :: (n::Nat) => Vec n a -> (m::Nat) => Vec m a -> Vec (n+m) a
    cat Nil bs = bs
    cat (Cons a as) bs = Cons a (cat as bs)

    length :: (n::Nat) => Vec n a -> Nat
    length (v::Vec n a) = n

So, quickly, how does the compiler know what n is? The simplest solution is that the code is generated from the type signature (theorem proving!):

    data Vec = Nil  :: Vec 0 a
             | Cons :: a -> Vec n a -> Vec (n+1) a

    -- inferred code --
    vec_n Nil         = 0
    vec_n (Cons _ as) = 1 + vec_n as

This can be generated automatically because of the inductive definition of Vec.

But there's more. I just said that a double arrow meant that a value didn't have to be written out by the programmer; I didn't say that it wouldn't be passed as a parameter. Of course, the compiler doesn't have to call the generated vec_n. If it knows statically what the n parameter of the type is, it can just use that (assuming that length can be inspected). As a trivial example:

    let a = [1,2,3]
        b = [4,5,6,7]
    in length (a ++ b)

In this case, we statically know (a::Vec 3 x) and (b::Vec 4 x), so (a++b::Vec 7 x), for some value of x, and length extracts 7 without doing any work.
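For anyone who wants to poke at this in today's GHC, here is a rough approximation of the idea. It is a sketch under assumptions, not the notation above: GHC has no dependent (n::Nat) => binder, so I fake it with Peano naturals promoted to the type level, a closed type family Plus standing in for n+m, and a type class KnownLen whose dictionary plays the role of the implicitly passed n. The names Plus, KnownLen, lenVal, vecN, len, sampleA and sampleB are all made up for this illustration.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies, ScopedTypeVariables #-}
module Main where

import Data.Kind (Type)
import Data.Proxy (Proxy (..))

-- Peano naturals, promoted to the type level by DataKinds.
data Nat = Z | S Nat

-- Type-level addition, standing in for the (n+m) of the post.
type family Plus (n :: Nat) (m :: Nat) :: Nat where
  Plus 'Z     m = m
  Plus ('S n) m = 'S (Plus n m)

-- Length-indexed vectors.
data Vec (n :: Nat) (a :: Type) where
  Nil  :: Vec 'Z a
  Cons :: a -> Vec n a -> Vec ('S n) a

-- Concatenation: the lengths appear only in the type, never as arguments.
cat :: Vec n a -> Vec m a -> Vec (Plus n m) a
cat Nil         bs = bs
cat (Cons a as) bs = Cons a (cat as bs)

-- The "inferred code": recover the length by walking the structure.
vecN :: Vec n a -> Int
vecN Nil         = 0
vecN (Cons _ as) = 1 + vecN as

-- The length as an implicitly passed value: the class dictionary is
-- the parameter the programmer never writes at the call site.
class KnownLen (n :: Nat) where
  lenVal :: Proxy n -> Int

instance KnownLen 'Z where
  lenVal _ = 0

instance KnownLen n => KnownLen ('S n) where
  lenVal _ = 1 + lenVal (Proxy :: Proxy n)

-- Reads the length off the type index; the vector itself is never inspected.
len :: forall n a. KnownLen n => Vec n a -> Int
len _ = lenVal (Proxy :: Proxy n)

sampleA :: Vec ('S ('S ('S 'Z))) Int
sampleA = Cons 1 (Cons 2 (Cons 3 Nil))

sampleB :: Vec ('S ('S ('S ('S 'Z)))) Int
sampleB = Cons 4 (Cons 5 (Cons 6 (Cons 7 Nil)))

main :: IO ()
main = do
  print (len (cat sampleA sampleB))   -- length taken from the type index
  print (vecN (cat sampleA sampleB))  -- length taken by traversal
```

Both lines print 7. Note that len ignores its argument entirely, which is the point of the trivial example: the answer comes from the statically known index. Whether GHC actually folds the dictionary down to a constant depends on optimization settings, so treat that part as plausible rather than guaranteed.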