Verifying Data Structures in Haskell

Posted on April 23, 2017

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeInType #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE RebindableSyntax #-}
{-# OPTIONS_GHC -fplugin GHC.TypeLits.Normalise #-}

module VerifiedDataStructures where

import Data.Kind hiding (type (*))
import Data.Type.Equality
import Unsafe.Coerce
import GHC.TypeLits hiding (type (<=))
import Data.Proxy
import Data.Coerce
import Prelude

A while ago I read this post on reddit (by David Feuer), about sorting traversables (which was a follow-up on this post by Will Fancher), and I was inspired to write some pseudo-dependently-typed Haskell. The post (and subsequent library) detailed how to use size-indexed heaps to perform fast, total sorting on any traversable. I ended up with a library which has five size-indexed heaps (Braun, pairing, binomial, skew, and leftist), each verified for structural correctness. I also included the non-indexed implementations of each for comparison (as well as benchmarks, tests, and all that good stuff).

The purpose of this post is to go through some of the tricks I used and problems I encountered writing a lot of type-level code in modern Haskell.

Type-Level Numbers in Haskell

In order to index things by their size, we’ll need a type-level representation of size. We’ll use Peano numbers for now:

data Peano = Z | S Peano

Z stands for zero, and S for successor. The terseness is pretty necessary here, unfortunately: arithmetic becomes unreadable otherwise. The simplicity of this definition is useful for proofs and manipulation; however, any runtime representation of these numbers is going to be woefully slow.

With the DataKinds extension, the above is automatically promoted to the type-level, so we can write type-level functions (type families) on the Peano type:

type family Plus (n :: Peano) (m :: Peano) :: Peano where
  Plus Z m = m
  Plus (S n) m = S (Plus n m)

Here the TypeFamilies extension is needed. I’ll try and mention every extension I’m using as we go, but I might forget a few, so check the repository for all of the examples (quick aside: I did manage to avoid using UndecidableInstances, but more on that later). One pragma that’s worth mentioning is:

{-# OPTIONS_GHC -fno-warn-unticked-promoted-constructors #-}

This suppresses warnings on the definition of Plus above. Without it, GHC would want us to write:

type family Plus (n :: Peano) (m :: Peano) :: Peano where
  Plus 'Z m = m
  Plus ('S n) m = 'S (Plus n m)

I think that looks pretty ugly, and it can get much worse with more involved arithmetic. The only thing I have found the warnings useful for is [] : the type-level empty list gives an error in its unticked form.

Using the Type-Level Numbers with a Pairing Heap

In the original post, a pairing heap (Fredman et al. 1986) was used, for its simplicity and performance. The implementation looked like this:

data Heap n a where
  E :: Heap Z a
  T :: a -> HVec n a -> Heap (S n) a

data HVec n a where
  HNil  :: HVec Z a
  HCons :: Heap m a -> HVec n a -> HVec (Plus m n) a

You immediately run into trouble when you try to define merge:

merge :: Ord a => Heap m a -> Heap n a -> Heap (Plus m n) a
merge E ys = ys
merge xs E = xs
merge h1@(T x xs) h2@(T y ys)
  | x <= y    = T x (HCons h2 xs)
  | otherwise = T y (HCons h1 ys)

Three errors show up here, but we’ll look at the first one:

Could not deduce (m ~ (Plus m Z))

GHC doesn’t know that x = x + 0. Somehow, we’ll have to prove that it does.

Singletons

In a language with true dependent types, proving the proposition above is as simple as:

plusZeroNeutral : (n : Nat) -> n + 0 = n
plusZeroNeutral Z = Refl
plusZeroNeutral (S k) = cong (plusZeroNeutral k)

(this example is in Idris)

In Haskell, on the other hand, we can’t do the same: functions on the value-level Peano have no relationship with functions on the type-level Peano . There’s no way to automatically link or promote one to the other.

This is where singletons come in (Eisenberg and Weirich 2012). A singleton is a datatype which mirrors a type-level value exactly, except that it has a type parameter which matches the equivalent value on the type-level. In this way, we can write functions on the value-level which are linked to the type-level. Here’s a potential singleton for Peano :

data Natty n where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)

(we need GADTs for this example)

Now, when we pattern-match on Natty , we get a proof of whatever its type parameter was. Here’s a trivial example:

isZero :: Natty n -> Maybe (n :~: Z)
isZero Zy     = Just Refl
isZero (Sy _) = Nothing

When we match on Zy , the only value which n could have been is Z , because the only way to construct Zy is if the type parameter is Z .
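To see the refinement pay off, here’s a small self-contained sketch: once isZero produces its Refl, GHC lets us return a Vec n a at type Vec Z a. (The Vec type and the provedEmpty helper are my own additions for illustration.)

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeOperators #-}

import Data.Type.Equality ((:~:) (..))

data Peano = Z | S Peano

data Natty n where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)

isZero :: Natty n -> Maybe (n :~: Z)
isZero Zy     = Just Refl
isZero (Sy _) = Nothing

data Vec n a where
  VNil  :: Vec Z a
  VCons :: a -> Vec n a -> Vec (S n) a

-- Matching on Refl refines n to Z, so xs :: Vec n a is
-- accepted at the result type Vec Z a.
provedEmpty :: Natty n -> Vec n a -> Maybe (Vec Z a)
provedEmpty n xs = case isZero n of
  Just Refl -> Just xs
  Nothing   -> Nothing
```

Without the Refl in scope, the Just xs branch would not type-check.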

Using this technique, the plusZeroNeutral proof looks reasonably similar to the Idris version:

plusZeroNeutral :: Natty n -> Plus n Z :~: n
plusZeroNeutral Zy = Refl
plusZeroNeutral (Sy n) = case plusZeroNeutral n of
  Refl -> Refl

To generalize the singletons a little, we could probably use the singletons library, or we could roll our own:

data family The k :: k -> Type

data instance The Peano n where
  Zy :: The Peano Z
  Sy :: The Peano n -> The Peano (S n)

plusZeroNeutral :: The Peano n -> Plus n Z :~: n
plusZeroNeutral Zy = Refl
plusZeroNeutral (Sy n) = case plusZeroNeutral n of
  Refl -> Refl

The The naming is kind of cute, I think. It makes the signature look almost like the Idris version ( the is a function from the Idris standard library). The The type family requires the TypeInType extension, which I’ll talk a little more about later.

Proof Erasure and Totality

There’s an issue with these kinds of proofs: the proof code runs every time it is needed. Since the same value is coming out the other end each time ( Refl ), this seems wasteful.

In a language like Idris, this problem is avoided by noticing that you’re only using the proof for its type information, and then erasing it at runtime. In Haskell, we can accomplish the same with a rule:

{-# NOINLINE plusZeroNeutral #-}
{-# RULES
"plusZeroNeutral" forall x.
  plusZeroNeutral x = unsafeCoerce (Refl :: 'Z :~: 'Z)
#-}

This basically says “if this type-checks, then the proof must exist, and therefore the proof must be valid. So don’t bother running it”. Unfortunately, that’s a little bit of a lie. It’s pretty easy to write a proof which type-checks that isn’t valid:

falseIsTrue :: False :~: True
falseIsTrue = falseIsTrue

We won’t be able to perform computations which rely on this proof in Haskell, though: because the computation will never terminate, the proof will never provide an answer. This means that, while the proof isn’t valid, it is type safe. That is, of course, unless we use our manual proof-erasure technique. The RULES pragma will happily replace it with the unsafeCoerce version, effectively introducing unsoundness into our proofs. The reason that this doesn’t cause a problem for a language like Idris is that Idris has a totality checker: you can’t write the above definition (with the totality checker turned on) in Idris.

So what’s the solution? Do we have to suffer through the slower proof code to maintain correctness? In reality, it’s usually OK to assume termination. It’s pretty easy to see that a proof like plusZeroNeutral is total. It’s worth bearing in mind, though, that until Haskell gets a totality checker (likely never, apparently) these proofs aren’t “proper”.

Generating Singletons

One extra thing: while you’re proving things in one area of your code, you might not have the relevant singleton handy. To generate them on-demand, you’ll need a typeclass:

class KnownSing (x :: k) where
  sing :: The k x

instance KnownSing Z where
  sing = Zy

instance KnownSing n => KnownSing (S n) where
  sing = Sy sing

This kind of drives home the inefficiency of singleton-based proofs, and why it’s important to erase them aggressively.
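As a small demonstration of the class in action, here’s a self-contained sketch of an implicit version of the earlier proof: the caller no longer passes the singleton by hand, but the class still rebuilds it (and runs the whole proof) at every use. The primed name is my own.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)
import Data.Type.Equality ((:~:) (..))

data Peano = Z | S Peano

type family Plus (n :: Peano) (m :: Peano) :: Peano where
  Plus Z m = m
  Plus (S n) m = S (Plus n m)

data family The k :: k -> Type

data instance The Peano n where
  Zy :: The Peano Z
  Sy :: The Peano n -> The Peano (S n)

class KnownSing (x :: k) where
  sing :: The k x

instance KnownSing Z where
  sing = Zy

instance KnownSing n => KnownSing (S n) where
  sing = Sy sing

plusZeroNeutral :: The Peano n -> Plus n Z :~: n
plusZeroNeutral Zy = Refl
plusZeroNeutral (Sy n) = case plusZeroNeutral n of
  Refl -> Refl

-- The implicit version: the singleton is conjured by the class,
-- then the O(n) proof runs on it as before.
plusZeroNeutral' :: forall n. KnownSing n => Plus n Z :~: n
plusZeroNeutral' = plusZeroNeutral (sing :: The Peano n)
```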

Proofs Bundled with the Data Structure

One other way to solve these problems is to try to find a data structure which runs the proof code anyway. As an example, consider a length-indexed list:

infixr 5 :-
data List n a where
  Nil  :: List Z a
  (:-) :: a -> List n a -> List (S n) a

You might worry that concatenation of two lists requires some expensive proof code, like merge for the pairing heap. Maybe surprisingly, the default implementation just works:

infixr 5 ++
(++) :: List n a -> List m a -> List (Plus n m) a
(++) Nil       ys = ys
(++) (x :- xs) ys = x :- xs ++ ys

Why? Well, if you look back to the definition of Plus , it’s almost exactly the same as the definition of (++) . In effect, we’re using lists as the singleton for Peano here.
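Here’s that idea as a self-contained sketch that can actually be run. (I’ve renamed the operator to (++.) and added a toList helper, both my own, to avoid clashing with the Prelude.)

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

data Peano = Z | S Peano

type family Plus (n :: Peano) (m :: Peano) :: Peano where
  Plus Z m = m
  Plus (S n) m = S (Plus n m)

infixr 5 :-
data List n a where
  Nil  :: List Z a
  (:-) :: a -> List n a -> List (S n) a

-- Concatenation: each clause mirrors a clause of Plus, so the
-- recursion on the list *is* the proof that the lengths add.
infixr 5 ++.
(++.) :: List n a -> List m a -> List (Plus n m) a
Nil       ++. ys = ys
(x :- xs) ++. ys = x :- (xs ++. ys)

toList :: List n a -> [a]
toList Nil       = []
toList (x :- xs) = x : toList xs
```

Because pattern-matching Nil refines n to Z, the first clause needs Plus Z m ~ m, which reduces by definition: no lemma required.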

The question is, then: is there a heap which performs these proofs automatically for functions like merge? As far as I can tell: almost. First though:

Small Digression: Manipulating and Using the Length-Indexed List

The standard definition of ++ on normal lists can be cleaned up a little with foldr:

(++) :: [a] -> [a] -> [a]
(++) = flip (foldr (:))

Can we get a similar definition for our length-indexed lists? Turns out we can, but the type of foldr needs to be a little different:

foldrList :: (forall x. a -> b x -> b (S x))
          -> b m
          -> List n a
          -> b (n + m)
foldrList f b Nil       = b
foldrList f b (x :- xs) = f x (foldrList f b xs)

newtype Flip (f :: t -> u -> Type) (a :: u) (b :: t)
  = Flip { unFlip :: f b a }

foldrList1 :: (forall x. a -> b x c -> b (S x) c)
           -> b m c
           -> List n a
           -> b (n + m) c
foldrList1 f b =
  unFlip . foldrList (\e -> Flip . f e . unFlip) (Flip b)

infixr 5 ++
(++) :: List n a -> List m a -> List (n + m) a
(++) = flip (foldrList1 (:-))

So what’s the point of this more complicated version? Well, if this were normal Haskell, we might get some foldr-fusion or something (in reality we would probably use augment if that were the purpose).

With this type-level business, though, there’s a similar application: loop unrolling. Consider the natural-number type again. We can write a typeclass which will perform induction over them:

class KnownPeano (n :: Peano) where
  unrollRepeat :: Proxy n -> (a -> a) -> a -> a

instance KnownPeano Z where
  unrollRepeat _ = const id
  {-# INLINE unrollRepeat #-}

instance KnownPeano n => KnownPeano (S n) where
  unrollRepeat (_ :: Proxy (S n)) f x =
    f (unrollRepeat (Proxy :: Proxy n) f x)
  {-# INLINE unrollRepeat #-}

Because the recursion here calls a different unrollRepeat function in the “recursive” call, we get around the usual hurdle of not being able to inline recursive calls. That means that the whole loop will be unrolled, at compile-time. We can do the same for foldr:
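For instance, here’s a self-contained sketch of the class in use; the example value at the bottom is my own. The chain of instance dispatches is resolved (and, with the INLINE pragmas, flattened) at compile time.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))

data Peano = Z | S Peano

class KnownPeano (n :: Peano) where
  unrollRepeat :: Proxy n -> (a -> a) -> a -> a

instance KnownPeano Z where
  unrollRepeat _ = const id
  {-# INLINE unrollRepeat #-}

instance KnownPeano n => KnownPeano (S n) where
  -- Each "recursive" call is to the method of a *smaller* instance,
  -- so every call can be inlined.
  unrollRepeat _ f x = f (unrollRepeat (Proxy :: Proxy n) f x)
  {-# INLINE unrollRepeat #-}

-- Applies (*2) three times; at -O this compiles to 2 * (2 * (2 * 1)).
eight :: Int
eight = unrollRepeat (Proxy :: Proxy (S (S (S Z)))) (*2) 1
```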

class HasFoldr (n :: Peano) where
  unrollFoldr :: (forall x. a -> b x -> b (S x))
              -> b m
              -> List n a
              -> b (n + m)

instance HasFoldr Z where
  unrollFoldr _ b _ = b
  {-# INLINE unrollFoldr #-}

instance HasFoldr n => HasFoldr (S n) where
  unrollFoldr f b (x :- xs) = f x (unrollFoldr f b xs)
  {-# INLINE unrollFoldr #-}

I can’t think of many uses for this technique, but one that comes to mind is an n-ary uncurry (like Lisp’s apply):

infixr 5 :-
data List (xs :: [*]) where
  Nil  :: List '[]
  (:-) :: a -> List xs -> List (a ': xs)

class KnownList (xs :: [*]) where
  foldrT :: (forall y ys. y -> result ys -> result (y ': ys))
         -> result '[]
         -> List xs
         -> result xs

instance KnownList ('[] :: [*]) where
  foldrT _ = const
  {-# INLINE foldrT #-}

instance KnownList xs => KnownList (x ': xs) where
  foldrT f b (x :- xs) = f x (foldrT f b xs)
  {-# INLINE foldrT #-}

type family Func (xs :: [*]) (y :: *) where
  Func '[] y = y
  Func (x ': xs) y = x -> Func xs y

newtype FunType y xs
  = FunType { runFun :: Func xs y -> y }

uncurry :: KnownList xs => Func xs y -> List xs -> y
uncurry f l =
  runFun (foldrT (c (\x g h -> g (h x))) (FunType id) l) f
  where
    c :: (a -> ((Func xs y -> y) -> (Func zs z -> z)))
      -> (a -> (FunType y xs -> FunType z zs))
    c = coerce
    {-# INLINE c #-}
{-# INLINE uncurry #-}

I think that you can be guaranteed the above is inlined at compile-time, making it essentially equivalent to a handwritten uncurry .

Binomial Heaps

Anyway, back to the size-indexed heaps. The reason that (++) worked so easily on lists is that a list can be thought of as the data-structure equivalent to Peano numbers. Another numeric-system-based data structure is the binomial heap, which is based on binary numbering (I’m going mainly off of the description from Hinze 1999).

So, to work with binary numbers, let’s get some preliminaries on the type-level out of the way:

data instance The Bool x where
  Falsy :: The Bool False
  Truey :: The Bool True

data instance The [k] xs where
  Nily :: The [k] '[]
  Cony :: The k x -> The [k] xs -> The [k] (x : xs)

instance KnownSing True where
  sing = Truey

instance KnownSing False where
  sing = Falsy

instance KnownSing '[] where
  sing = Nily

instance (KnownSing xs, KnownSing x) => KnownSing (x : xs) where
  sing = Cony sing sing

We’ll represent a binary number as a list of Booleans:

type family Sum (x :: Bool) (y :: Bool) (cin :: Bool) :: Bool where
  Sum False False False = False
  Sum False False True  = True
  Sum False True  False = True
  Sum False True  True  = False
  Sum True  False False = True
  Sum True  False True  = False
  Sum True  True  False = False
  Sum True  True  True  = True

type family Carry (x :: Bool) (y :: Bool) (cin :: Bool)
                  (xs :: [Bool]) (ys :: [Bool]) :: [Bool] where
  Carry False False False xs ys = Add False xs ys
  Carry False False True  xs ys = Add False xs ys
  Carry False True  False xs ys = Add False xs ys
  Carry False True  True  xs ys = Add True  xs ys
  Carry True  False False xs ys = Add False xs ys
  Carry True  False True  xs ys = Add True  xs ys
  Carry True  True  False xs ys = Add True  xs ys
  Carry True  True  True  xs ys = Add True  xs ys

type family Add (cin :: Bool) (xs :: [Bool]) (ys :: [Bool]) :: [Bool] where
  Add c (x : xs) (y : ys) = Sum x y c : Carry x y c xs ys
  Add False '[] ys = ys
  Add False xs '[] = xs
  Add True  '[] ys = CarryOne ys
  Add True  xs '[] = CarryOne xs

type family CarryOne (xs :: [Bool]) :: [Bool] where
  CarryOne '[]          = True : '[]
  CarryOne (False : xs) = True : xs
  CarryOne (True  : xs) = False : CarryOne xs

The odd definition of Carry is to avoid UndecidableInstances : if we had written, instead:

type family Carry (x :: Bool) (y :: Bool) (cin :: Bool) :: Bool where
  Carry False False False = False
  Carry False False True  = False
  Carry False True  False = False
  Carry False True  True  = True
  Carry True  False False = False
  Carry True  False True  = True
  Carry True  True  False = True
  Carry True  True  True  = True

type family Add (cin :: Bool) (xs :: [Bool]) (ys :: [Bool]) :: [Bool] where
  Add c (x : xs) (y : ys) = Sum x y c : Add (Carry x y c) xs ys
  Add False '[] ys = ys
  Add False xs '[] = xs
  Add True  '[] ys = CarryOne ys
  Add True  xs '[] = CarryOne xs

We would have been warned about nested type-family application.
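Since it’s easy to slip up in a wall of type-family equations like this, one sanity check is a value-level mirror of the same full adder that can actually be run and tested against ordinary Int addition. All the names below are my own.

```haskell
-- Little-endian binary, matching the type-level encoding:
-- least-significant bit first.
toInt :: [Bool] -> Int
toInt = foldr (\b n -> fromEnum b + 2 * n) 0

fromInt :: Int -> [Bool]
fromInt 0 = []
fromInt n = odd n : fromInt (n `div` 2)

sumB, carryB :: Bool -> Bool -> Bool -> Bool
sumB x y c   = x /= (y /= c)                     -- three-way xor
carryB x y c = (x && y) || (x && c) || (y && c)  -- majority

carryOne :: [Bool] -> [Bool]
carryOne []           = [True]
carryOne (False : xs) = True : xs
carryOne (True  : xs) = False : carryOne xs

-- Value-level mirror of the Add family: ripple-carry addition.
add :: Bool -> [Bool] -> [Bool] -> [Bool]
add c (x : xs) (y : ys) = sumB x y c : add (carryB x y c) xs ys
add False []   ys       = ys
add False xs   []       = xs
add True  []   ys       = carryOne ys
add True  xs   []       = carryOne xs
```

Checking this mirror against (+) for a range of inputs gives some confidence that the type families compute what we think they do.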

Now we can base the merge function very closely on these type families. First, though, we’ll have to implement the heap.

Almost-Verified Data Structures

There are different potential properties you can verify in a data structure. In the sort-traversable post, the property of interest was that the number of elements in the structure would stay the same after adding and removing some number n of elements. For this post, we’ll also verify structural invariants. I won’t, however, verify the heap property. Maybe in a later post.

When indexing a data structure by its size, you encode an awful lot of information into the type signature: the type becomes very specific to the structure in question. It is possible, though, to encode a fair few structural invariants without getting so specific. Here’s a signature for “perfect leaf tree”:

data BalTree a = Leaf a | Node (BalTree (a,a))

With that signature, it’s impossible to create a tree with more elements in its left branch than its right; the size of the tree, however, remains unspecified. You can use a similar trick to implement matrices which must be square (from Okasaki 1999): the usual trick ( type Matrix n a = List n (List n a) ) is too specific, providing size information at compile-time. If you’re interested in this approach, there are several more examples in Hinze (2001).
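As a quick sketch of why the nesting forces balance: every Node doubles the element type, so any inhabitant holds exactly 2^d elements for some depth d. (The size helper and the example value are my own.)

```haskell
data BalTree a = Leaf a | Node (BalTree (a, a))

-- Each Node wraps a tree of *pairs*, so the element count
-- doubles at every level: the size is always a power of two.
size :: BalTree a -> Int
size (Leaf _) = 1
size (Node t) = 2 * size t

-- A perfectly balanced tree of four elements; a three-element
-- tree of this type simply cannot be written.
four :: BalTree Int
four = Node (Node (Leaf ((1, 2), (3, 4))))
```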

It is possible to go from the size-indexed version back to the non-indexed version, with an existential ( RankNTypes for this example):

data ErasedSize f a = forall (n :: Peano). ErasedSize
  { runErasedSize :: f n a }

This will let you prove invariants in your implementation using an index, while keeping the user-facing type signature general and non-indexed.

A Fully-Structurally-Verified Binomial Heap

Wasserman (2010) was able to encode all of the structural invariants of the binomial heap without indexing by its size (well, all invariants except truncation, which turned out to be important a little later). I’ll be using a similar approach, except I’ll leverage some of the newer bells and whistles in GHC. Where Wasserman’s version used types like this for the numbering:

data Zero a = Zero
data Succ rk a = BinomTree rk a :< rk a
data BinomTree rk a = BinomTree a (rk a)

We can reuse the type-level Peano numbers with a GADT:

infixr 5 :-
data Binomial xs rk a where
  Nil  :: Binomial '[] n a
  Skip :: Binomial xs (S rk) a -> Binomial (False : xs) rk a
  (:-) :: Tree rk a
       -> Binomial xs (S rk) a
       -> Binomial (True : xs) rk a

data Tree rk a = Root a (Node rk a)

infixr 5 :<
data Node n a where
  NilN :: Node Z a
  (:<) :: Tree n a -> Node n a -> Node (S n) a

The definition of Tree here ensures that any tree of rank n has 2^n elements. The binomial heap, then, is a list of trees, in ascending order of size, with a True at every point in its type-level list where a tree is present, and a False wherever one is absent. In other words, the type-level list is a binary encoding of the number of elements it contains.

And here are the merge functions:

mergeTree :: Ord a => Tree rk a -> Tree rk a -> Tree (S rk) a
mergeTree xr@(Root x xs) yr@(Root y ys)
  | x <= y    = Root x (yr :< xs)
  | otherwise = Root y (xr :< ys)

merge :: Ord a
      => Binomial xs z a
      -> Binomial ys z a
      -> Binomial (Add False xs ys) z a
merge Nil ys = ys
merge xs Nil = xs
merge (Skip xs) (Skip ys) = Skip (merge xs ys)
merge (Skip xs) (y :- ys) = y :- merge xs ys
merge (x :- xs) (Skip ys) = x :- merge xs ys
merge (x :- xs) (y :- ys) = Skip (mergeCarry (mergeTree x y) xs ys)

mergeCarry :: Ord a
           => Tree rk a
           -> Binomial xs rk a
           -> Binomial ys rk a
           -> Binomial (Add True xs ys) rk a
mergeCarry t Nil ys = carryOne t ys
mergeCarry t xs Nil = carryOne t xs
mergeCarry t (Skip xs) (Skip ys) = t :- merge xs ys
mergeCarry t (Skip xs) (y :- ys) = Skip (mergeCarry (mergeTree t y) xs ys)
mergeCarry t (x :- xs) (Skip ys) = Skip (mergeCarry (mergeTree t x) xs ys)
mergeCarry t (x :- xs) (y :- ys) = t :- mergeCarry (mergeTree x y) xs ys

carryOne :: Ord a
         => Tree rk a
         -> Binomial xs rk a
         -> Binomial (CarryOne xs) rk a
carryOne t Nil = t :- Nil
carryOne t (Skip xs) = t :- xs
carryOne t (x :- xs) = Skip (carryOne (mergeTree t x) xs)

You’ll notice that no proofs are needed: that’s because the merge function itself is the same as the type family, like the way ++ for lists was the same as the Plus type family.

Of course, this structure is only verified insofar as you believe the type families. It does provide a degree of double-entry, though: any mistake in the type family will have to be mirrored in the merge function to type-check. On top of that, we can write some proofs of properties we might expect:

addCommutes :: The [Bool] xs
            -> The [Bool] ys
            -> Add False xs ys :~: Add False ys xs
addCommutes Nily _ = Refl
addCommutes _ Nily = Refl
addCommutes (Cony Falsy xs) (Cony Falsy ys) =
  gcastWith (addCommutes xs ys) Refl
addCommutes (Cony Truey xs) (Cony Falsy ys) =
  gcastWith (addCommutes xs ys) Refl
addCommutes (Cony Falsy xs) (Cony Truey ys) =
  gcastWith (addCommutes xs ys) Refl
addCommutes (Cony Truey xs) (Cony Truey ys) =
  gcastWith (addCommutesCarry xs ys) Refl

addCommutesCarry :: The [Bool] xs
                 -> The [Bool] ys
                 -> Add True xs ys :~: Add True ys xs
addCommutesCarry Nily _ = Refl
addCommutesCarry _ Nily = Refl
addCommutesCarry (Cony Falsy xs) (Cony Falsy ys) =
  gcastWith (addCommutes xs ys) Refl
addCommutesCarry (Cony Truey xs) (Cony Falsy ys) =
  gcastWith (addCommutesCarry xs ys) Refl
addCommutesCarry (Cony Falsy xs) (Cony Truey ys) =
  gcastWith (addCommutesCarry xs ys) Refl
addCommutesCarry (Cony Truey xs) (Cony Truey ys) =
  gcastWith (addCommutesCarry xs ys) Refl

Unfortunately, though, this method does require proofs (ugly proofs) for the delete-min operation. One of the issues is truncation: since the binary digits are stored least-significant-bit first, the same number can be represented with any number of trailing zeroes. This kept causing problems for me when it came to subtraction, and adding the requirement of no trailing zeroes (truncation) to the constructors for the heap was a pain, requiring extra proofs on merge to show that it preserves truncation.

Doubly-Dependent Types

Since some of these properties are much easier to verify on the type-level Peano numbers, one approach might be to convert back and forth between Peano numbers and binary, and use the proofs on Peano numbers instead.

type family BintoPeano (xs :: [Bool]) :: Peano where
  BintoPeano '[] = Z
  BintoPeano (False : xs) = BintoPeano xs + BintoPeano xs
  BintoPeano (True  : xs) = S (BintoPeano xs + BintoPeano xs)

First problem: this requires UndecidableInstances . I’d really rather not have that turned on, to be honest. In Idris (and Agda), you can prove decidability using a number of different methods, but this isn’t available in Haskell yet.

Regardless, we can push on.

To go in the other direction, we’ll need to calculate the parity of natural numbers. Taken from the Idris tutorial:

data Parity (n :: Peano) where
  Even :: The Peano n -> Parity (n + n)
  Odd  :: The Peano n -> Parity (S (n + n))

parity :: The Peano n -> Parity n
parity Zy = Even Zy
parity (Sy Zy) = Odd Zy
parity (Sy (Sy n)) = case parity n of
  Even m -> gcastWith (plusSuccDistrib m m) (Even (Sy m))
  Odd  m -> gcastWith (plusSuccDistrib m m) (Odd (Sy m))

plusSuccDistrib :: The Peano n -> proxy m -> n + S m :~: S (n + m)
plusSuccDistrib Zy _ = Refl
plusSuccDistrib (Sy n) p = gcastWith (plusSuccDistrib n p) Refl

We need this function on the type-level, though, not the value-level: here, again, we run into trouble. What does gcastWith look like on the type-level? As far as I can tell, it doesn’t exist (yet. Although I haven’t looked deeply into the singletons library yet).

This idea of doing dependently-typed stuff on the type-level started to be possible with TypeInType . For instance, we could have defined our binary type as:

data Binary :: Peano -> Type where
  O :: Binary n -> Binary (n + n)
  I :: Binary n -> Binary (S (n + n))
  E :: Binary Z

And then the binomial heap as:

data Binomial (xs :: Binary n) (rk :: Peano) (a :: Type) where
  Nil  :: Binomial E n a
  Skip :: Binomial xs (S rk) a -> Binomial (O xs) rk a
  (:-) :: Tree rk a
       -> Binomial xs (S rk) a
       -> Binomial (I xs) rk a

What we’re doing here is indexing a type by an indexed type. This wasn’t possible in Haskell a few years ago. It still doesn’t get us a nice definition of subtraction, though.

Using a Typechecker Plugin

It’s pretty clear that this approach gets tedious almost immediately. What’s more, if we want the proofs to be erased, we introduce potential for errors.

The solution? Beef up GHC’s typechecker with a plugin. I first came across this approach in Kenneth Foner’s talk at Compose. He used a plugin that called out to the Z3 theorem prover (from Diatchki 2015); I’ll use a simpler plugin which just normalizes type-literals.

From what I’ve used of these plugins so far, they seem to work really well. They’re very unobtrusive, only requiring a pragma at the top of your file:

{-# OPTIONS_GHC -fplugin GHC.TypeLits.Normalise #-}

The plugin is only called when GHC can’t unify two types: this means you don’t get odd-looking error messages in unrelated code (in fact, the error messages I’ve seen so far have been excellent—a real improvement on the standard error messages for type-level arithmetic). Another benefit is that we get to use type-level literals ( Nat imported from GHC.TypeLits), rather than the noisy-looking type-level Peano numbers.

data Tree n a = Root a (Node n a)

data Node :: Nat -> Type -> Type where
  NilN :: Node 0 a
  (:<) :: {-# UNPACK #-} !(Tree n a)
       -> Node n a
       -> Node (1 + n) a

mergeTree :: Ord a => Tree n a -> Tree n a -> Tree (1 + n) a
mergeTree xr@(Root x xs) yr@(Root y ys)
  | x <= y    = Root x (yr :< xs)
  | otherwise = Root y (xr :< ys)

infixr 5 :-
data Binomial :: Nat -> Nat -> Type -> Type where
  Nil  :: Binomial n 0 a
  (:-) :: {-# UNPACK #-} !(Tree z a)
       -> Binomial (1 + z) xs a
       -> Binomial z (1 + xs + xs) a
  Skip :: Binomial (1 + z) (1 + xs) a
       -> Binomial z (2 + xs + xs) a

This definition also ensures that the binomial heap has no trailing zeroes in its binary representation: the Skip constructor can only be applied to a heap bigger than zero.

Since we’re going to be looking at several different heaps, we’ll need a class to represent all of them:

class IndexedQueue h a where
  {-# MINIMAL insert, empty, minViewMay, minView #-}
  empty :: h 0 a
  minView :: h (1 + n) a -> (a, h n a)
  singleton :: a -> h 1 a
  singleton = flip insert empty
  insert :: a -> h n a -> h (1 + n) a
  minViewMay :: h n a
             -> (n ~ 0 => b)
             -> (forall m. (1 + m) ~ n => a -> h m a -> b)
             -> b

class IndexedQueue h a => MeldableIndexedQueue h a where
  merge :: h n a -> h m a -> h (n + m) a

You’ll need MultiParamTypeClasses for this one.

mergeB :: Ord a
       => Binomial z xs a
       -> Binomial z ys a
       -> Binomial z (xs + ys) a
mergeB Nil ys = ys
mergeB xs Nil = xs
mergeB (Skip xs) (Skip ys) = Skip (mergeB xs ys)
mergeB (Skip xs) (y :- ys) = y :- mergeB xs ys
mergeB (x :- xs) (Skip ys) = x :- mergeB xs ys
mergeB (x :- xs) (y :- ys) = Skip (mergeCarry (mergeTree x y) xs ys)

mergeCarry :: Ord a
           => Tree z a
           -> Binomial z xs a
           -> Binomial z ys a
           -> Binomial z (1 + xs + ys) a
mergeCarry !t Nil ys = carryOne t ys
mergeCarry !t xs Nil = carryOne t xs
mergeCarry !t (Skip xs) (Skip ys) = t :- mergeB xs ys
mergeCarry !t (Skip xs) (y :- ys) = Skip (mergeCarry (mergeTree t y) xs ys)
mergeCarry !t (x :- xs) (Skip ys) = Skip (mergeCarry (mergeTree t x) xs ys)
mergeCarry !t (x :- xs) (y :- ys) = t :- mergeCarry (mergeTree x y) xs ys

carryOne :: Ord a
         => Tree z a
         -> Binomial z xs a
         -> Binomial z (1 + xs) a
carryOne !t Nil = t :- Nil
carryOne !t (Skip xs) = t :- xs
carryOne !t (x :- xs) = Skip (carryOne (mergeTree t x) xs)

instance Ord a => MeldableIndexedQueue (Binomial 0) a where
  merge = mergeB
  {-# INLINE merge #-}

instance Ord a => IndexedQueue (Binomial 0) a where
  empty = Nil
  singleton x = Root x NilN :- Nil
  insert = merge . singleton

( BangPatterns for this example)

On top of that, it’s very easy to define delete-min:

minView xs = case minViewZip xs of
  Zipper x _ ys -> (x, ys)

minViewMay q b f = case q of
  Nil    -> b
  _ :- _ -> uncurry f (minView q)
  Skip _ -> uncurry f (minView q)

data Zipper a n rk = Zipper !a (Node rk a) (Binomial rk n a)

skip :: Binomial (1 + z) xs a -> Binomial z (xs + xs) a
skip x = case x of
  Nil    -> Nil
  Skip _ -> Skip x
  _ :- _ -> Skip x

data MinViewZipper a n rk where
  Infty :: MinViewZipper a 0 rk
  Min :: {-# UNPACK #-} !(Zipper a n rk) -> MinViewZipper a (n + 1) rk

slideLeft :: Zipper a n (1 + rk) -> Zipper a (1 + n + n) rk
slideLeft (Zipper m (t :< ts) hs)
  = Zipper m ts (t :- hs)

pushLeft :: Ord a
         => Tree rk a
         -> Zipper a n (1 + rk)
         -> Zipper a (2 + n + n) rk
pushLeft c (Zipper m (t :< ts) hs)
  = Zipper m ts (Skip (carryOne (mergeTree c t) hs))

minViewZip :: Ord a => Binomial rk (1 + n) a -> Zipper a n rk
minViewZip (Skip xs) = slideLeft (minViewZip xs)
minViewZip (t@(Root x ts) :- f) = case minViewZipMay f of
  Min ex@(Zipper minKey _ _)
    | minKey < x -> pushLeft t ex
  _ -> Zipper x ts (skip f)

minViewZipMay :: Ord a => Binomial rk n a -> MinViewZipper a n rk
minViewZipMay (Skip xs) = Min (slideLeft (minViewZip xs))
minViewZipMay Nil = Infty
minViewZipMay (t@(Root x ts) :- f) = Min $ case minViewZipMay f of
  Min ex@(Zipper minKey _ _)
    | minKey < x -> pushLeft t ex
  _ -> Zipper x ts (skip f)

Similarly, compare the version of the pairing heap with the plugin:

data Heap n a where
  E :: Heap 0 a
  T :: a -> HVec n a -> Heap (1 + n) a

data HVec n a where
  HNil  :: HVec 0 a
  HCons :: Heap m a -> HVec n a -> HVec (m + n) a

insert :: Ord a => a -> Heap n a -> Heap (1 + n) a
insert x xs = merge (T x HNil) xs

merge :: Ord a => Heap m a -> Heap n a -> Heap (m + n) a
merge E ys = ys
merge xs E = xs
merge h1@(T x xs) h2@(T y ys)
  | x <= y    = T x (HCons h2 xs)
  | otherwise = T y (HCons h1 ys)

minView :: Ord a => Heap (1 + n) a -> (a, Heap n a)
minView (T x hs) = (x, mergePairs hs)

mergePairs :: Ord a => HVec n a -> Heap n a
mergePairs HNil = E
mergePairs (HCons h HNil) = h
mergePairs (HCons h1 (HCons h2 hs)) =
  merge (merge h1 h2) (mergePairs hs)

To the version without the plugin:

data Heap n a where
    E :: Heap Z a
    T :: a -> HVec n a -> Heap (S n) a

data HVec n a where
    HNil  :: HVec Z a
    HCons :: Heap m a -> HVec n a -> HVec (m + n) a

class Sized h where
    size :: h n a -> The Peano n

instance Sized Heap where
    size E = Zy
    size (T _ xs) = Sy (size xs)

plus :: The Peano n -> The Peano m -> The Peano (n + m)
plus Zy m = m
plus (Sy n) m = Sy (plus n m)

instance Sized HVec where
    size HNil = Zy
    size (HCons h hs) = size h `plus` size hs

insert :: Ord a => a -> Heap n a -> Heap (S n) a
insert x xs = merge (T x HNil) xs

merge :: Ord a => Heap m a -> Heap n a -> Heap (m + n) a
merge E ys = ys
merge xs E = case plusZero (size xs) of
    Refl -> xs
merge h1@(T x xs) h2@(T y ys)
    | x <= y = case plusCommutative (size h2) (size xs) of
        Refl -> T x (HCons h2 xs)
    | otherwise = case plusSuccDistrib (size xs) (size ys) of
        Refl -> T y (HCons h1 ys)

minView :: Ord a => Heap (S n) a -> (a, Heap n a)
minView (T x hs) = (x, mergePairs hs)

mergePairs :: Ord a => HVec n a -> Heap n a
mergePairs HNil = E
mergePairs (HCons h HNil) = case plusZero (size h) of
    Refl -> h
mergePairs (HCons h1 (HCons h2 hs)) =
    case plusAssoc (size h1) (size h2) (size hs) of
        Refl -> merge (merge h1 h2) (mergePairs hs)

Leftist Heaps

The typechecker plugin makes it relatively easy to implement several other heaps: skew, Braun, etc. You’ll need one extra trick to implement a leftist heap, though. Let’s take a look at the unverified version:

data Leftist a
    = Leaf
    | Node {-# UNPACK #-} !Int
           a
           (Leftist a)
           (Leftist a)

rank :: Leftist s -> Int
rank Leaf = 0
rank (Node r _ _ _) = r
{-# INLINE rank #-}

mergeL :: Ord a => Leftist a -> Leftist a -> Leftist a
mergeL Leaf h2 = h2
mergeL h1 Leaf = h1
mergeL h1@(Node w1 p1 l1 r1) h2@(Node w2 p2 l2 r2)
    | p1 < p2 =
        if ll <= lr
            then Node (w1 + w2) p1 l1 (mergeL r1 h2)
            else Node (w1 + w2) p1 (mergeL r1 h2) l1
    | otherwise =
        if rl <= rr
            then Node (w1 + w2) p2 l2 (mergeL r2 h1)
            else Node (w1 + w2) p2 (mergeL r2 h1) l2
  where
    ll = rank r1 + w2
    lr = rank l1
    rl = rank r2 + w1
    rr = rank l2

In a weight-biased leftist heap, the left branch of any tree must have at least as many elements as the right branch. Ideally, we would encode that invariant in the representation of the size-indexed leftist heap:

data Leftist n a where
    Leaf :: Leftist 0 a
    Node :: !(The Nat (n + m + 1))
         -> a
         -> Leftist n a
         -> Leftist m a
         -> !(m <= n)
         -> Leftist (n + m + 1) a

rank :: Leftist n s -> The Nat n
rank Leaf = sing
rank (Node r _ _ _ _) = r
{-# INLINE rank #-}

Two problems, though: first, we need to be able to compare the sizes of two heaps in the merge function, and if we were using type-level Peano numbers this comparison would be too slow. More importantly, we need the comparison to provide a proof of the ordering, so that we can use it in the resulting Node constructor.
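To see why the proof side is easy but the performance side isn't, here's a minimal, self-contained sketch of a proof-carrying comparison on Peano singletons. The names here (SPeano, :<=:, Cmp, cmp, isLe) are mine for illustration, not from the library:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}

data Peano = Z | S Peano

-- A singleton: one value per type-level Peano number.
data SPeano (n :: Peano) where
    Zy :: SPeano 'Z
    Sy :: SPeano n -> SPeano ('S n)

-- Structural evidence that n <= m.
data (:<=:) (n :: Peano) (m :: Peano) where
    LeZ :: 'Z :<=: m
    LeS :: n :<=: m -> 'S n :<=: 'S m

-- A comparison result that carries a proof either way.
data Cmp n m where
    Le :: n :<=: m -> Cmp n m
    Ge :: m :<=: n -> Cmp n m

-- The proof falls out of the recursion for free, but cmp must walk
-- both numbers: O(min(n, m)) at runtime.
cmp :: SPeano n -> SPeano m -> Cmp n m
cmp Zy     _      = Le LeZ
cmp (Sy _) Zy     = Ge LeZ
cmp (Sy n) (Sy m) = case cmp n m of
    Le p -> Le (LeS p)
    Ge p -> Ge (LeS p)

isLe :: Cmp n m -> Bool
isLe (Le _) = True
isLe (Ge _) = False
```

An Integer-backed representation gets the same proof from a single machine comparison, at the cost of an unsafeCoerce.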

Integer-Backed Type-Level Numbers

In Agda, the Peano type is actually backed by Haskell’s Integer at runtime. This allows compile-time proofs to be written about values which are calculated efficiently. We can mimic this in Haskell with a newtype wrapper around Integer carrying a phantom Peano parameter, if we promise never to put in an integer whose value differs from its phantom parameter. We can make this promise a little more trustworthy by not exporting the newtype constructor.

newtype instance The Nat n where
    NatSing :: Integer -> The Nat n

instance KnownNat n => KnownSing n where
    sing = NatSing $ Prelude.fromInteger $ natVal (Proxy :: Proxy n)

FlexibleInstances is needed for the instance. We can also encode all the necessary arithmetic:

infixl 6 +.
(+.) :: The Nat n -> The Nat m -> The Nat (n + m)
(+.) =
    (coerce :: (Integer -> Integer -> Integer)
            -> The Nat n -> The Nat m -> The Nat (n + m))
    (+)
{-# INLINE (+.) #-}

Finally, the comparison function (we need ScopedTypeVariables for this):

infix 4 <=.
(<=.) :: The Nat n -> The Nat m -> The Bool (n <=? m)
(<=.) (NatSing x :: The Nat n) (NatSing y :: The Nat m)
    | x <= y =
        case (unsafeCoerce (Refl :: True :~: True) :: (n <=? m) :~: True) of
            Refl -> Truey
    | otherwise =
        case (unsafeCoerce (Refl :: True :~: True) :: (n <=? m) :~: False) of
            Refl -> Falsy
{-# INLINE (<=.) #-}

totalOrder :: p n -> q m -> (n <=? m) :~: False -> (m <=? n) :~: True
totalOrder (_ :: p n) (_ :: q m) Refl =
    unsafeCoerce Refl :: (m <=? n) :~: True

type x <= y = (x <=? y) :~: True

It’s worth mentioning that all of these functions are somewhat axiomatic: there’s no checking of these definitions going on, and any later proofs are only correct in terms of these functions.

If we want our merge function to really look like the non-verified version, though, we’ll have to mess around with the syntax a little.

A Dependent if-then-else

When we match on a singleton, the pattern match brings a proof of the singleton’s type into scope. For instance:

type family IfThenElse (c :: Bool) (true :: k) (false :: k) :: k where
    IfThenElse True  true false = true
    IfThenElse False true false = false

intOrString :: The Bool cond -> IfThenElse cond Int String
intOrString Truey = 1
intOrString Falsy = "abc"

In Haskell, since we can overload the if-then-else construct (with RebindableSyntax ), we can provide the same syntax, while hiding the dependent nature:

ifThenElse :: The Bool c -> (c :~: True -> a) -> (c :~: False -> a) -> a
ifThenElse Truey t _ = t Refl
ifThenElse Falsy _ f = f Refl

Verified Merge

Finally, then, we can write the implementation for merge, which looks almost exactly the same as the non-verified merge:

instance Ord a => IndexedQueue Leftist a where
    minView (Node _ x l r _) = (x, merge l r)
    {-# INLINE minView #-}
    singleton x = Node sing x Leaf Leaf Refl
    {-# INLINE singleton #-}
    empty = Leaf
    {-# INLINE empty #-}
    insert = merge . singleton
    {-# INLINE insert #-}
    minViewMay Leaf b _ = b
    minViewMay (Node _ x l r _) _ f = f x (merge l r)

instance Ord a => MeldableIndexedQueue Leftist a where
    merge Leaf h2 = h2
    merge h1 Leaf = h1
    merge h1@(Node w1 p1 l1 r1 _) h2@(Node w2 p2 l2 r2 _)
        | p1 < p2 =
            if ll <=. lr
                then Node (w1 +. w2) p1 l1 (merge r1 h2)
                else Node (w1 +. w2) p1 (merge r1 h2) l1 . totalOrder ll lr
        | otherwise =
            if rl <=. rr
                then Node (w1 +. w2) p2 l2 (merge r2 h1)
                else Node (w1 +. w2) p2 (merge r2 h1) l2 . totalOrder rl rr
      where
        ll = rank r1 +. w2
        lr = rank l1
        rl = rank r2 +. w1
        rr = rank l2
    {-# INLINE merge #-}

What’s cool about this implementation is that it has the same performance as the non-verified version (if Integer is swapped out for Int , that is), and it looks pretty much the same. This is very close to static verification for free.

Generalizing Sort to Parts

The Sort type used in the original blog post can be generalized to any indexed container.

data Parts f g a b r where
    Parts :: (forall n. g (m + n) b -> (g n b, r))
          -> !(f m a)
          -> Parts f g a b r

instance Functor (Parts f g a b) where
    fmap f (Parts g h) =
        Parts (\h' -> case g h' of (remn, r) -> (remn, f r)) h
    {-# INLINE fmap #-}

instance (IndexedQueue f x, MeldableIndexedQueue f x)
      => Applicative (Parts f g x y) where
    pure x = Parts (\h -> (h, x)) empty
    {-# INLINE pure #-}
    (Parts f (xs :: f m x) :: Parts f g x y (a -> b)) <*> Parts g (ys :: f n x) =
        Parts h (merge xs ys)
      where
        h :: forall o. g ((m + n) + o) y -> (g o y, b)
        h v = case f v  of { (v' , a) ->
              case g v' of { (v'', b) ->
              (v'', a b)}}
    {-# INLINABLE (<*>) #-}

This version doesn’t insist that you order the elements of the heap in any particular way: we could use indexed difference lists to reverse a container, or indexed lists to calculate permutations of a container, for instance.
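As a rough illustration of the reversal idea, here is a non-indexed sketch (the name reverseT is mine): the container is refilled with its own elements, drawn from the front of its element list while traversing right-to-left. The partial match on (x:xs) is exactly what a size-indexed list makes provably total in the verified version:

```haskell
import Data.Foldable (toList)
import Data.Traversable (mapAccumR)

-- Reverse any Traversable by refilling it with its own elements.
-- mapAccumR visits positions right-to-left, so the rightmost slot
-- receives the leftmost element, and so on.
reverseT :: Traversable t => t a -> t a
reverseT t = snd (mapAccumR step (toList t) t)
  where
    step (x:xs) _ = (xs, x)
    step []     _ = error "impossible: toList t has the same length as t"
```

With the indexed version, the length invariant that makes the empty case impossible is checked by the compiler rather than asserted at runtime.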

Other Uses For Size-Indexed Heaps

I’d be very interested to see any other uses of these indexed heaps, if anyone has any ideas. Potentially they could be used in any place where there is a need for some heap which is known to be of a certain size (a true prime sieve, for instance).
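For instance, here is a sketch of the incremental sieve (O’Neill’s “genuine sieve”) with Data.Map standing in for the priority queue; a size-indexed heap could conceivably take its place, with the index tracking the number of pending entries:

```haskell
import qualified Data.Map.Strict as Map

-- O'Neill-style incremental sieve. The map acts as a priority queue
-- from each upcoming composite to the wheel steps of the primes that
-- generate it.
primes :: [Int]
primes = 2 : sieve [3, 5 ..] Map.empty
  where
    sieve (x:xs) table = case Map.lookup x table of
        -- No pending composite equals x, so x is prime. Its first
        -- relevant composite is x*x, reached in steps of 2*x
        -- (we only examine odd candidates).
        Nothing    -> x : sieve xs (Map.insert (x * x) [2 * x] table)
        -- x is composite: push each generating prime's entry forward.
        Just steps -> sieve xs (foldr reinsert (Map.delete x table) steps)
      where
        reinsert step = Map.insertWith (++) (x + step) [step]
    sieve [] _ = error "unreachable: the candidate list is infinite"
```
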

The Library

I’ve explored all of these ideas here. It has implementations of all the heaps I mentioned, as well as the index-erasing type, and a size-indexed list, for reversing traversables. In the future, I might add things like a Fibonacci heap, or the optimal Brodal/Okasaki heap (Brodal and Okasaki 1996).