I was listening to the StackOverflow podcast this week, as I do every week, and in this week’s podcast, Joel Spolsky and Scott Hanselman talked briefly about type inference, e.g. extending C#’s “var” keyword to parameters, rather than just local variables (listen for yourself – about 6 minutes, starting at 1:22:30). The discussion was spirited, but murky: Joel said he knew it is possible, because some languages do it (citing examples); Scott said that as a layman he couldn’t immediately see how type inference could work in certain situations (mostly involving overloading). So it seems the time is ripe for me to try to shed a little light on the subject.

Today’s blog post describes type inference in the context of F#, but as you read you’ll see how much of the same reasoning can be applied to other programming languages. For those readers who already know F#, this blog entry might help solidify your mental model of how the F# type inference process works, and for those who are new to type-inferred languages, this blog entry will hopefully provide you with a basic framework to reason about what type-inferred language can (and cannot easily) do.

Type inference basics

To start things off, consider some very simple F# code:

let x = 3 let b = false

The compiler infers that ‘x’ has type ‘int’ and that ‘b’ has type ‘bool’, based on the type of the literal values on the right-hand-sides. In a language like C# 2.0 you’d have to explicitly declare each variable’s type, whereas nowadays in C# you can just use ‘var’ and the types are inferred like with the ‘let’ keyword of F#. (If you’re using Visual Studio, you can put your mouse over the declarations and the hover-tooltips will show you the inferred type.)

Though this feature does not buy you much in simple examples like the previous one, it starts earning its keep in cases like

open System.Collections.Generic let d : Dictionary<int,string> = new Dictionary<int,string>() for kvp : KeyValuePair<int,string> in d do printfn "%d: %s" kvp.Key kvp.Value

where I’ve explicitly declared the types of ‘d’ and ‘kvp’. Putting a type declaration on ‘d’ is completely redundant, since the same long type name is mentioned again a few characters later on the right-hand side. And Dictionary is one of the most commonly used collection types in the framework; .Net developers know they can iterate over each of the KeyValue pairs in a Dictionary. So we can get rid of the noise and write the code as just

open System.Collections.Generic let d = new Dictionary<int,string>() for kvp in d do printfn "%d: %s" kvp.Key kvp.Value

where the types of ‘d’ and ‘kvp’ are inferred.

Type inference of parameters and return types

It is straightforward to extend type inference to parameters and return types in some cases, based on the body of the method or function. For example, consider the function

let Not(b) = if b then false else true

In the body of the function, ‘b’ is used in an “if”, so ‘b’ must have type ‘bool’. And the if-then-else expression is also returning boolean literals. So the compiler can easily infer that Not is a function that takes a bool and returns a bool.

Types can be inferred not just from uses of literals and control constructs (as in the last example) or constructors (as with Dictionary in the prior example), but also from calls to other methods whose type signatures are known. Consider this contrived function to determine if a string is longer than 50 characters:

let IsLongString(s) = if System.String.IsNullOrEmpty(s) then false else s.Length > 50

Here, the first line of the function body calls String.IsNullOrEmpty(s), and that method takes a single parameter of type ‘string’ and has no other overloads. As a result, the compiler can infer that ‘s’ must be a string. The whole body resolves to a boolean, so the compiler infers that IsLongString takes a string and returns a bool.

The examples in this section demonstrate that for many functions, inferring the parameter and return types is straightforward by simple analysis of the body of the function. (Note also that eventual call sites of these functions need not be taken into account; the types here are completely decided based on the function bodies, not on how these functions might eventually be called.)

Type inference and generics

The power of type inference really shines when generics are involved. For example:

let PrintAll(s) = for x in s do printfn "%A" x

What is the type of PrintAll? Well, the ‘for’ statement constrains the heretofore unknown ‘s’ to be a seq<’T> (an alias for IEnumerable<T>, for those readers less familiar with F#), and so ‘x’ will have the corresponding ‘T’ type. The ‘printfn’ line adds no further constraints (‘”%A” is the ‘any’ format specifier, which works for all types, just calling .ToString() if needed). The type inference logic cannot nail down ‘T’, but the code in the body will work for all types ‘T’, so we can simply infer this to be a generic function. Thus PrintAll’s type is inferred as “seq<’T> –> unit”, that is, a generic function that takes any IEnumerable<T> and returns nothing. The corresponding C# would be

static void PrintAll<T>( IEnumerable <T> s) { foreach ( var x in s) { Console .WriteLine(x); } }

Note that you can specify the function signature explicitly in F# too, if you so choose:

let PrintAll<'T>(s : seq<'T>) : unit = for x in s do printfn "%A" x

but F# lets you leave off the generic parameter declaration “<’T>”, or the argument type “: seq<’T>”, or the return type “: unit”, or all three of these type annotations, since all of these can be inferred from the function body.

For a more advanced example with generic constraints, consider this F# function:

let PrintAndDispose(s) = for x in s do printfn "%A" x (x :> System.IDisposable).Dispose()

This is like the last example, except after we print each item in the sequence we upcast it to IDisposable and Dispose() it. The inferred argument type is now a seq<’T> for all ‘T that inherit IDisposable. The corresponding C# is:

static void PrintAndDispose<T>( IEnumerable <T> s) where T : IDisposable { foreach ( var x in s) { Console .WriteLine(x); x.Dispose(); } }

The ‘where’ clause constrains the generic parameter to be something that inherits IDisposable.

The examples in this section demonstrate that type inference can not only infer simple argument and return types, but even generic types or generic types with constraints. Type inference is powerful, but it is not without limitations, as we’ll see next.

Type inference when overloading is involved

To paraphrase myself, “overloading is the bugaboo of type-inferred languages”. Consider this simple contrived example:

let s = "abcdefg" let FindIndexOfChar(c) = s.IndexOf(c)

Unlike the previous examples where the F# compiler easily inferred the types of everything, this time it falls down. The first three error diagnostics the compiler issues are:

...\Program.fs(44,5): error FS0041:

A unique overload for method 'IndexOf' could not be determined based on type information prior to this program point.

The available overloads are shown below (or in the Error List window). A type annotation may be needed. ...\Program.fs(44,5): error FS0041: Possible overload: 'System.String.IndexOf(value: char) : int'. ...\Program.fs(44,5): error FS0041: Possible overload: 'System.String.IndexOf(value: string) : int'.

(In fact there are a bunch more error lines, because there are a bunch more overloads of System.String.IndexOf.) This example gets to the crux of the Joel & Scott discussion from the podcast. Overloading introduces an ambiguity – in this example ‘c’ could have either type ‘char’ or type ‘string’ based on the overloads, and the F# type inference system doesn’t know what to infer.

(Programming language wonks may now want to take a moment to read the section “For the wonks” at the bottom of this blog entry, where I discuss other ways to deal with overloading, such as Haskell “type classes” and F# inline functions with “hat” types. That’s a needless digression for most readers, though.)

In F#, the ‘answer’ is simple; as the compiler says, “a type annotation may be needed”. So we just add one:

let FindIndexOfChar(c : char) = s.IndexOf(c)

Type inference is not all-powerful in F#, there are cases where you must explicitly specify types. Overloaded methods are a case where types are ambiguous in the absence of other constraints; when you call some.Method(…) and the method is overloaded, you need other prior type information to determine which overload is intended. This applies only for an overloaded method; recall the example:

let IsLongString(s) = if System.String.IsNullOrEmpty(s) then false else s.Length > 50

IsNullOrEmpty is not overloaded, and thus we can infer that ‘s’ is a string based on this method call.

The example in this section demonstrates one case where a type cannot be inferred and an explicit type annotation was required. Next let’s look at another such case.

Another stumbling block for type inference

Consider a different variation of the IsLongString function:

let IsLongString2(s) = if s = null then false else s.Length > 50

Now let’s reason through what we can infer about ‘s’. In the first line of the body, all we know is that ‘s’ is a reference type rather than a value type (since we’re comparing it to null). And then on the last line, we try to call the .Length member. Well, a lot of .Net types have a .Length member, and there’s no way to infer which of those types we mean here. This time the compiler error points to s.Length, reporting

Program.fs(54,9): error FS0072: Lookup on object of indeterminate type based on information prior to this program point.

A type annotation may be needed prior to this program point to constrain the type of the object.

This may allow the lookup to be resolved.

In other words, if you want to do “o.Prop” or “o.Method(…)” on some object ‘o’, you must already know the type of ‘o’ in order to perform the member lookup. In this case we don’t already have the type of ‘s’ pinned down, so we can’t call s.Length. Once again, the fix is simple – add a type annotation!

let IsLongString2(s : string) = if s = null then false else s.Length > 50

And now we’re good.

This section and the previous one show the two most common cases where F# will require a type annotation because types could not be unambiguously inferred. Most of the times when the compiler will require you to declare a type, it’s because of one of these two cases.

Summing up type inference in F#

As we’ve seen, F# can infer types in a variety of contexts including

types of local variables

types of arguments to functions/methods

return types

generic types

type constraints on generics

and it can infer this information in a variety of ways, including

knowing the types of literals like ‘false’ or ‘3’

knowing the types of newly constructed objects like “new Dictionary<int,string>()”

knowing the type signatures of called functions or non-overloaded methods

inferring types based on control constructs (e.g. inferring IEnumerable<T> in “for x in s do…” or inferring bool in “if e then…”)

Nevertheless there are some cases where types cannot be inferred; in all such cases you can simply add a type annotation to explicitly declare the type. Indeed, even when types can be inferred, you can still choose to declare the type – it’s up to you. When you are accustomed to working in statically-typed languages without inference, it can be a little daunting when you’re first starting out in a type-inferred language and there are no types declared anywhere. As Scott said in the podcast, “it’s just syntactic sugar”. So you can “sweeten to your taste”. You can always fall back to writing types everywhere in F#, just as you would in C# 2.0. But as you do more and more functional programming, I think you’ll find that type inference is really handy. My favorite example to drive home that point is this F# code:

let MyFunc(f,s) = seq { for x in s do yield f(x) }

In C# 4.0, the same function is written as

static IEnumerable <U> MyFunc<T, U>( Func <T, U> f, IEnumerable <T> s) { foreach ( var x in s) { yield return f(x); } }

The bodies of these functions are pretty comparable, but the C# declaration line is pretty big compared to the F#. Of course, you might recognize these functions as variations of F#’s “Seq.map” and C#’s “Enumerable.Select”. When you do a lot of functional programming, you end up using and authoring lots of transformative functions like these (that take functions as arguments and apply those functions in various ways to perform transformations of input sequences), and it is nice to not have to clutter these functions with lots of type annotations.

That’s pretty much it for today; those interested in a little more content/detail can keep reading, but the remaining sections are “optional”. :)

Parenthetical aside

Though type inference is “just” syntax sugar, it really can matter; there are cases where you’d never write cool functional programming code in C# because you get completely swamped under by the type annotations. As an example see here; one of the C# functions in that example is this monstrosity:

static Tree< KeyValuePair <A, bool >> DiffTree<A>(Tree<A> tree, Tree<A> tree2) { return XFoldTree((A x, Func <Tree<A>, Tree< KeyValuePair <A, bool >>> l, Func <Tree<A>, Tree< KeyValuePair <A, bool >>> r,

Tree<A> t) => (Tree<A> t2) => Node( new KeyValuePair <A, bool >(t2.Data, object .ReferenceEquals(t, t2)), l(t2.Left), r(t2.Right)), x => y => null , tree)(tree2); }

and yes, every single one of those type annotations is required by C# 4.0. Here’s the corresponding F# code that uses the same identifier names:

let DiffTree(tree,tree2) = XFoldTree ( fun x l r t t2 -> let (Node(x2,l2,r2)) = t2 Node((x2,t===t2), l l2, r r2)) ( fun _ _ -> Leaf) tree tree2

This code is taken from an advanced example, and without any context I do not expect the reader to be able to understand either code snippet. The point is just to illustrate how “sugar can matter”; without syntax sugars like type inference, some advanced functional programming techniques are just hopeless because the code is a horrible mess of generic type names and angle brackets that make it practically impossible to see where the type declarations end and the logic/computation begins.

For the wonks

There are other strategies used by various languages to deal with ‘overloading’. For example, in some ML dialects, there is no overloading allowed at all. Problem solved! Of course means that, for example, you need a different function to add two integers than to add two floating point numbers. As a result, you’ll find two different addition operators in these ML dialects, where + (plus) is used to add integers and +. (plus dot) is used add floats. The same is true of the other math operators. Kinda annoying; especially in the case of common arithmetical operators, overloading is pretty handy.

Then you have languages like Haskell, that use a language feature called “type classes” to deal with cases like overloading math operators for all numeric types. Operators like + are members of the type class named “Num”, and then types like integers and floats are instances of the Num class and each define their own specific implementations of those operators. Then you can write generic code that works for all numeric types. (You might liken the implementation to a blending of constrained generic types and OO-style runtime polymorphism.)

In F#, operators like + are defined as “inline” functions which use generic type variables of the form “^T” (which I call “hat types”) that can do certain limited kinds of ad-hoc overloading logic based on specific concrete types or member constraints. You can Bing for “F# duck typing” if you are interested in learning more. These advanced features help compensate for the lack of a way to express abstractions like “type classes” on .Net or in F#.

All that said, none of these techniques apply the .Net-style overloading example I showed earlier (with String.IndexOf). This kind of overloading, where the same method name is used multiple times with arbitrary parameter lists that differ by type or number, is known as “ad-hoc overloading”, and I don’t think there are any languages that try to infer types from such overloaded calls. Contrast this kind of overloading from the + case; in the case of +, there is a set of numeric types T that all support the form T+T->T (as well as some other operators like ‘*’) with similar semantics and type schemata. Whereas String.IndexOf just has a bunch of overloads with varying numbers of parameters that do various similarish things (find the index of a string, or a char, starting at the beginning, or maybe at a given starting index, …) and they all happen to have the same method name. There’s no intrinsic commonality among “char c” and “string s” and “int startIndex” and “StringComparison compType”, these types just make for handy argument sets of String.IndexOf. Whereas there is an intrinsic commonality among “int x” and “float x” and “BigInteger x” when it comes to operators like + and * and functions like “abs”. There are various language strategies to cope with type inference in the “intrinsic commonality” cases where many types share the same operations (Haskell type classes, F# inline hat types, …), but personally I know of no mechanisms to deal with inference for ad-hoc overloading of a particular method name on a single type, a la .Net overloaded methods.