printf

On the surface of it, typing the C-like formatter printf requires dependent types. After all, the type of printf fmt (the number and the types of its arguments) depends on the value of the format string, fmt. Listed on this page, however, are many implementations of type-safe printf in the Hindley-Milner system, OCaml and Haskell. One way or another, first-class delimited continuations appear, either directly or encoded in continuation-passing style (CPS). Does it mean that delimited continuations somehow `implement' dependent types?

Further, OCaml has its own type system for typed formatting, refusing to accept, for example, printf "the value of x is %d" 1.0 because 1.0 has the type float rather than the expected int. Even GCC issues a warning in similar cases. Does it mean that OCaml or GCC are dependently typed?

The answer to all these questions is negative. The C-like type-safe printf does require dependent types, which neither OCaml nor Haskell, let alone GCC, have. A simple test will demonstrate this, distinguishing a truly dependently-typed system from a fake. We systematically construct two archetypal fake dependently-typed printfs, tracing the progression from dependent types to fancy non-dependent types to the Hindley-Milner system, and discussing what burden we shift from the type checker to the user and what we lose in expressiveness.

Recall that in C, printf is a polyvariadic function whose first argument is the format string, containing ordinary characters and conversion specifications. The latter are introduced by the % character and define how to format the corresponding printf argument. The number and the types of the subsequent arguments to printf depend on the number and the content of conversion specifications in the format string. Hence printf has the dependent-function type Pi x:String. PA x, which is also written as {x:String} -> PA x. Here x is a value, String is a Type (that is, String :: *) and PA :: String -> *, which computes the type of printf from the contents of the format string, has the dependent kind. With such a printf, we can give a type to the function

fts x = "The formatting result is " ++ sprintf ("%" ++ x) 1

fts :: {x:String | isPrefix "d" x && not (elem '%' x)} -> String

# let fts x = "The formatting result is " ^ sprintf ("%" ^^ x) 1;;
                                                     ^^^
Error: Premature end of format string ``"%"''

The example fts confirms that the C-like type-safe printf requires dependent types. There are, however, approximations, also type safe, that do not impose such a requirement. We now design two characteristic approximations. With the dependently-typed sprintf, sprintf "Hello" has the type String and sprintf "The value of %c is %d" has the different type Char -> Int -> String. And yet in both cases the format descriptor has the same type String. Therefore, the type checker has to parse and reason about the content of that string, its value. In fact, the OCaml error message about fts above betrays the type checker's actual parsing of the format string. Alas, OCaml lacks the type system to express the result of parsing "%".

What if we parse the format string ourselves, separating out the literal text and explicating conversion specifications? Instead of "The value of %c is %d" we would give printf as the format descriptor the following parsed form:

fmtG = LIT "The value of " :^ CHAR :^ LIT " is " :^ INT

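Here LIT str stands for literal text, CHAR and INT are the conversion specifications, and (:^) joins two descriptors. The format descriptor is a GADT: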

data Desc a where
  LIT  :: String -> Desc ()
  INT  :: Desc Int
  CHAR :: Desc Char
  (:^) :: Desc a -> Desc b -> Desc (a,b)

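The index a of the type Desc a reflects the structure of the descriptor. For example,

fmtG :: Desc ((((), Char), ()), Int)

Given such a descriptor, printf has the following signature: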

printf :: Desc a -> PS a

type PS a = PFormat a String

type family PFormat a w :: *
type instance PFormat ()    w = w
type instance PFormat Int   w = Int -> w
type instance PFormat Char  w = Char -> w
type instance PFormat (a,b) w = PFormat a (PFormat b w)

Here PS :: * -> * and PFormat :: * -> * -> * are type functions: their arguments are types, not values.
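To see how the type function computes the type of printf for a concrete descriptor, the following sketch unfolds PS for the index of fmtG by hand and lets GHC confirm the result. The names check and sample are hypothetical helpers introduced only for this illustration; the kind annotation on PFormat is omitted for brevity (it defaults to *).

```haskell
{-# LANGUAGE TypeFamilies #-}

-- The type functions from the text.
type family PFormat a w
type instance PFormat ()    w = w
type instance PFormat Int   w = Int -> w
type instance PFormat Char  w = Char -> w
type instance PFormat (a,b) w = PFormat a (PFormat b w)

type PS a = PFormat a String

-- Unfolding by hand, one clause at a time:
--   PS ((((), Char), ()), Int)
-- = PFormat ((((), Char), ()), Int) String
-- = PFormat (((), Char), ()) (PFormat Int String)
-- = PFormat (((), Char), ()) (Int -> String)
-- = PFormat ((), Char) (PFormat () (Int -> String))
-- = PFormat ((), Char) (Int -> String)
-- = PFormat () (PFormat Char (Int -> String))
-- = Char -> Int -> String

-- 'check' (a hypothetical helper) type checks only if GHC
-- reduces the left-hand type to the right-hand one.
check :: PS ((((), Char), ()), Int) -> (Char -> Int -> String)
check = id

-- A small use: the argument is accepted at the unreduced type.
sample :: String
sample = check (\c n -> c : show n) 'x' 3
```

The reduction is driven entirely by the type index, never by a run-time string, which is the sense in which the typing is non-dependent.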

There are people who would call a language with type functions `dependently typed'. After all, the type Desc a reflects the particular value of the format descriptor. For example, the type Desc Int corresponds to the value INT, and no other value has that type. Since the value of the format descriptor influences the type so directly and unambiguously, it seems the type of printf sort of depends on the format descriptor value. This dependence however is not sufficient to type check fts. Our emulation gives printf the already parsed format string. We have shifted the burden of parsing the format string from the type checker to the programmer -- gaining simpler, non-dependent typing but losing the flexibility of building arbitrary conversion specifications such as those in fts.

The signature of printf guides, even compels, the implementation. Just as PS used an auxiliary type function PFormat, printf relies on the auxiliary function interp to interpret the format descriptor. The structure of PFormat drives the implementation of interp:

printf desc = interp desc id

interp :: Desc a -> (String -> w) -> PFormat a w
interp (LIT str) = \k -> k str
interp INT       = \k -> k . show
interp CHAR      = \k -> k . \c -> [c]
interp (x :^ y)  = \k -> interp x (\sx -> interp y (\sy -> k (sx ++ sy)))
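The pieces above can be assembled into one loadable module, sketched below. The LANGUAGE pragmas, the fixity declaration for (:^) and the test value tGr are assumptions added for this sketch; the left-associative fixity is chosen so that fmtG gets the type stated earlier.

```haskell
{-# LANGUAGE GADTs, TypeFamilies #-}

-- Assumed fixity: left-associative, so that fmtG below has the
-- type Desc ((((), Char), ()), Int).
infixl 5 :^

-- The parsed format descriptor, indexed by its structure.
data Desc a where
  LIT  :: String -> Desc ()
  INT  :: Desc Int
  CHAR :: Desc Char
  (:^) :: Desc a -> Desc b -> Desc (a,b)

-- The type function computing the type of the formatter.
type family PFormat a w
type instance PFormat ()    w = w
type instance PFormat Int   w = Int -> w
type instance PFormat Char  w = Char -> w
type instance PFormat (a,b) w = PFormat a (PFormat b w)

-- The interpreter, in continuation-passing style.
interp :: Desc a -> (String -> w) -> PFormat a w
interp (LIT str) k = k str
interp INT       k = k . show
interp CHAR      k = k . (\c -> [c])
interp (x :^ y)  k = interp x (\sx -> interp y (\sy -> k (sx ++ sy)))

printf :: Desc a -> PFormat a String
printf desc = interp desc id

fmtG :: Desc ((((), Char), ()), Int)
fmtG = LIT "The value of " :^ CHAR :^ LIT " is " :^ INT

-- Hypothetical test value: evaluates to "The value of x is 3".
tGr :: String
tGr = printf fmtG 'x' 3
```

Passing an argument of the wrong type, as in printf fmtG 'x' 'y', is rejected at compile time, because PFormat has already computed Char -> Int -> String from the index of fmtG.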

The interpreter is clearly compositional: the interpretation of a descriptor depends only on the components of the descriptor regardless of the overall context. The compositionality gives us an idea to name the applications of interp to primitive descriptors:

lit :: String -> (String -> w) -> w        -- cf. PFormat () w = w
lit str = \k -> k str                      -- which is, interp (LIT str)

int :: (String -> w) -> (Int -> w)         -- cf. PFormat Int w = Int -> w
int = \k -> k . show                       -- which is, interp INT

char :: (String -> w) -> (Char -> w)       -- cf. PFormat Char w = Char -> w
char = \k -> k . \c -> [c]                 -- which is, interp CHAR

We likewise introduce the operation (^) on interpreted descriptors, such that

interp (desc1 :^ desc2) === (interp desc1) ^ (interp desc2)

(^) :: ((String -> t1) -> t) -> ((String -> t2) -> t1) -> (String -> t2) -> t
(^) x y = \k -> x (\sx -> y (\sy -> k (sx ++ sy)))

Instead of writing the format descriptor as a GADT such as fmtG, we write the interpreted descriptor, interp fmtG, directly in terms of the interpreted primitive descriptors lit, int and char:

-- The interpreted descriptor: interp fmtG
fmtI = lit "The value of " ^ char ^ lit " is " ^ int
-- fmtI :: (String -> w) -> Char -> Int -> w

The main function printf now receives the already interpreted descriptor as the argument, so it becomes

printf' idesc = idesc id

tIr = printf' fmtI 'x' 3 -- "The value of x is 3"
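For readers who want to load this version, here is a minimal self-contained sketch. It needs no language extensions. Two details are assumptions of the sketch, not taken from the text above: the Prelude's exponentiation operator (^) must be hidden to reuse the name, and the fixity declaration is one plausible choice.

```haskell
-- Assumption: hide the Prelude's exponentiation so that (^) is free.
import Prelude hiding ((^))

-- Assumed fixity for the descriptor-joining operator.
infixl 5 ^

-- Interpreted primitive descriptors: each takes a continuation k.
lit :: String -> (String -> w) -> w
lit str k = k str

int :: (String -> w) -> (Int -> w)
int k = k . show

char :: (String -> w) -> (Char -> w)
char k = k . (\c -> [c])

-- Joining two interpreted descriptors.
(^) :: ((String -> t1) -> t) -> ((String -> t2) -> t1) -> (String -> t2) -> t
(^) x y = \k -> x (\sx -> y (\sy -> k (sx ++ sy)))

fmtI :: (String -> w) -> Char -> Int -> w
fmtI = lit "The value of " ^ char ^ lit " is " ^ int

-- Run a descriptor with the identity continuation.
printf' :: ((String -> String) -> t) -> t
printf' idesc = idesc id

tIr :: String
tIr = printf' fmtI 'x' 3          -- "The value of x is 3"

-- A descriptor without conversion specifications takes no arguments.
tHello :: String
tHello = printf' (lit "Hello")    -- "Hello"
```

Every type here is an ordinary Hindley-Milner type, and GHC can infer all of them when the signatures are omitted (fmtI, being a pattern binding, is then subject to the monomorphism restriction).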

We have thus derived one of Danvy's original implementations of type-safe printf. By making the programmer write interpreted format descriptors (which is hardly a burden in this case: fmtI requires fewer keystrokes than fmtG) we gain even simpler typing. We no longer need the GADT Desc a; furthermore, the type function PFormat is incorporated into the types of lit, int, etc. and does not have to be defined explicitly. The type-safe printf' and fmtI are hence typeable in the Hindley-Milner system. Since we have wired in the interpreter interp, we have lost the ability to change it, that is, to interpret the format descriptor in a different way -- to parse input, to format to a network pipe, etc. Incidentally, we regain the ability to change the interpreter if we abstract over it, using type classes -- obtaining the tagless-final type-safe printf.
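The remark about abstracting over the interpreter can be illustrated with a rough sketch. The class FormatSpec, the newtypes FPr and Template, and the functions sprintf and template are names chosen for this illustration and should not be read as the definitive tagless-final implementation. The second interpreter, which renders the descriptor back as a C-style format string, is only there to show that the same descriptor can be interpreted in more than one way.

```haskell
import Prelude hiding ((^))

infixl 5 ^

-- Format descriptors, abstracted over their interpretation 'repr'.
class FormatSpec repr where
  lit  :: String -> repr a a
  int  :: repr a (Int -> a)
  char :: repr a (Char -> a)
  (^)  :: repr b c -> repr a b -> repr a c

-- Interpreter 1: the CPS printer derived in the text.
newtype FPr a b = FPr ((String -> a) -> b)

instance FormatSpec FPr where
  lit str = FPr (\k -> k str)
  int     = FPr (\k -> k . show)
  char    = FPr (\k -> k . (\c -> [c]))
  FPr x ^ FPr y = FPr (\k -> x (\sx -> y (\sy -> k (sx ++ sy))))

sprintf :: FPr String b -> b
sprintf (FPr fmt) = fmt id

-- Interpreter 2 (illustrative): show the descriptor as a C format string.
newtype Template a b = Template String

instance FormatSpec Template where
  lit str = Template (concatMap escape str)
    where escape c = if c == '%' then "%%" else [c]
  int     = Template "%d"
  char    = Template "%c"
  Template x ^ Template y = Template (x ++ y)

template :: Template a b -> String
template (Template s) = s

-- One descriptor, two interpretations.
fmt :: FormatSpec repr => repr a (Char -> Int -> a)
fmt = lit "The value of " ^ char ^ lit " is " ^ int

tPr :: String
tPr = sprintf fmt 'x' 3      -- "The value of x is 3"

tTm :: String
tTm = template fmt           -- "The value of %c is %d"
```

The descriptor fmt is written once, against the class; choosing sprintf or template selects the interpreter.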

The interpreter interp is written in continuation-passing style, with k as the continuation -- which is forced upon us by the structure of PFormat. The last clause of that type function

type instance PFormat (a,b) w = PFormat a (PFormat b w)

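says that formatting according to a descriptor of the type (a,b), with w as the final result, is formatting according to a whose result is in turn the formatter for b, of the type (PFormat b w).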