GADTs are weak approximations of inductive families from dependently typed languages—so let's begin there instead.

Inductive families are the core datatype introduction method in a dependently typed language. For instance, in Agda you define the natural numbers like this

data Nat : Set where zero : Nat succ : Nat -> Nat

which isn't very fancy, it's essentially just the same thing as the Haskell definition

data Nat = Zero | Succ Nat

and indeed in GADT syntax the Haskell form is even more similar

{-# LANGUAGE GADTs #-} data Nat where Zero :: Nat Succ :: Nat -> Nat

So, at first blush you might think GADTs are just neat extra syntax. That's just the very tip of the iceberg though.

Agda has capacity to represent all kinds of types unfamiliar and strange to a Haskell programmer. A simple one is the type of finite sets. This type is written like Fin 3 and represents the set of numbers {0, 1, 2} . Likewise, Fin 5 represents the set of numbers {0,1,2,3,4} .

This should be quite bizarre at this point. First, we're referring to a type which has a regular number as a "type" parameter. Second, it's not clear what it means for Fin n to represent the set {0,1...n} . In real Agda we'd do something more powerful, but it suffices to say that we can define a contains function

contains : Nat -> Fin n -> Bool contains i f = ?

Now this is strange again because the "natural" definition of contains would be something like i < n , but n is a value that only exists in the type Fin n and we shouldn't be able to cross that divide so easily. While it turns out that the definition is not nearly so straightforward, this is exactly the power that inductive families have in dependently typed languages—they introduce values that depend on their types and types that depend on their values.

We can examine what it is about Fin that gives it that property by looking at its definition.

data Fin : Nat -> Set where zerof : (n : Nat) -> Fin (succ n) succf : (n : Nat) -> (i : Fin n) -> Fin (succ n)

this takes a little work to understand, so as an example lets try constructing a value of the type Fin 2 . There are a few ways to do this (in fact, we'll find that there are exactly 2)

zerof 1 : Fin 2 zerof 2 : Fin 3 -- nope! zerof 0 : Fin 1 -- nope! succf 1 (zerof 0) : Fin 2

This lets us see that there are two inhabitants and also demonstrates a little bit of how type computation happens. In particular, the (n : Nat) bit in the type of zerof reflects the actual value n up into the type allowing us to form Fin (n+1) for any n : Nat . After that we use repeated applications of succf to increment our Fin values up into the correct type family index (natural number that indexes the Fin ).

What provides these abilities? In all honesty there are many differences in between a dependently typed inductive family and a regular Haskell ADT, but we can focus on the exact one that is most relevant to understanding GADTs.

In GADTs and inductive families you get an opportunity to specify the exact type of your constructors. This might be boring

data Nat where Zero :: Nat Succ :: Nat -> Nat

Or, if we have a more flexible, indexed type we can choose different, more interesting return types

data Typed t where TyInt :: Int -> Typed Int TyChar :: Char -> Typed Char TyUnit :: Typed () TyProd :: Typed a -> Typed b -> Typed (a, b) ...

In particular, we're abusing the ability to modify the return type based on the particular value constructor used. This allows us to reflect some value information up into the type and produce more finely specified (fibered) typed.

So what can we do with them? Well, with a little bit of elbow grease we can produce Fin in Haskell. Succinctly it requires that we define a notion of naturals in types

data Z data S a = S a > undefined :: S (S (S Z)) -- 3

... then a GADT to reflect values up into those types...

data Nat where Zero :: Nat Z Succ :: Nat n -> Nat (S n)

... then we can use these to build Fin much like we did in Agda...

data Fin n where ZeroF :: Nat n -> Fin (S n) SuccF :: Nat n -> Fin n -> Fin (S n)

And finally we can construct exactly two values of Fin (S (S Z))

*Fin> :t ZeroF (Succ Zero) ZeroF (Succ Zero) :: Fin (S (S Z)) *Fin> :t SuccF (Succ Zero) (ZeroF Zero) SuccF (Succ Zero) (ZeroF Zero) :: Fin (S (S Z))

But notice that we've lost a lot of convenience over the inductive families. For instance, we can't use regular numeric literals in our types (though that's technically just a trick in Agda anyway), we need to create a separate "type nat" and "value nat" and use the GADT to link them together, and we'd also find, in time, that while type level mathematics is painful in Agda it can be done. In Haskell it's incredibly painful and often cannot.

For instance, it's possible to define a weaken notion in Agda's Fin type

weaken : (n <= m) -> Fin n -> Fin m weaken = ...

where we provide a very interesting first value, a proof that n <= m which allows us to embed "a value less than n " into the set of "values less than m ". We can do the same in Haskell, technically, but it requires heavy abuse of type class prolog.

So, GADTs are a resemblance of inductive families in dependently typed languages that are weaker and clumsier. Why do we want them in Haskell in the first place?

Basically because not all type invariants require the full power of inductive families to express and GADTs pick a particular compromise between expressiveness, implementability in Haskell, and type inference.

Some examples of useful GADTs expressions are Red-Black Trees which cannot have the Red-Black property invalidated or simply-typed lambda calculus embedded as HOAS piggy-backing off the Haskell type system.

In practice, you also often see GADTs use for their implicit existential context. For instance, the type

data Foo where Bar :: a -> Foo

implicitly hides the a type variable using existential quantification

> :t Bar 4 :: Foo