This is a post of two halves. I will start by explaining why I think types are so useful in professional programming, and then later discuss their place in learning to program.

I ♥ Types

From a software engineering viewpoint, I am a strong proponent of types. Unfortunately, most non-functional programming languages have included types in a fairly lacklustre way. When I say that types are useful, people may think about the distinction between integers, floats and strings (classic question: what type is a telephone number?), or the distinction between different record/class types. These are very useful distinctions but they are relatively basic uses of types. Let’s explore some more powerful uses of types.

NASA’s Mars Climate Orbiter was famously lost because of a mix-up in the units being used: one team’s software reported values in imperial units while the other expected metric. The compiler did not complain because all the numbers involved were typed as floating-point numbers, which you are permitted to combine as you please. The Orbiter failure should therefore be seen as a typing failure. The F# language has a units-of-measure feature that can prevent such mix-ups. One number can be typed as a float<meters>, another as a float<seconds>. Divide the former by the latter and you have a float<meters/seconds>. If you try to add that to an acceleration (typed as float<meters/seconds^2>) then you will get a compile error. This more thorough use of types begins to illustrate the benefits.
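To make the idea concrete, here is a rough sketch in Python rather than F#. The Quantity class and its field names are my own invention for illustration, and there is one important difference: F# rejects the bad addition at compile time, whereas this sketch can only raise an error when the line runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number tagged with exponents for metres and seconds (hypothetical helper)."""
    value: float
    metres: int = 0   # exponent of the length dimension
    seconds: int = 0  # exponent of the time dimension

    def __add__(self, other: "Quantity") -> "Quantity":
        # Addition only makes sense when both operands share the same unit.
        if (self.metres, self.seconds) != (other.metres, other.seconds):
            raise TypeError("cannot add quantities with different units")
        return Quantity(self.value + other.value, self.metres, self.seconds)

    def __truediv__(self, other: "Quantity") -> "Quantity":
        # Division subtracts exponents: metres / seconds gives metres per second.
        return Quantity(self.value / other.value,
                        self.metres - other.metres,
                        self.seconds - other.seconds)


distance = Quantity(100.0, metres=1)
duration = Quantity(20.0, seconds=1)
speed = distance / duration                         # metres^1 seconds^-1
acceleration = Quantity(9.8, metres=1, seconds=-2)  # metres^1 seconds^-2

try:
    speed + acceleration  # a velocity plus an acceleration is meaningless
    units_error = None
except TypeError as exc:
    units_error = str(exc)
```

The division works and carries its unit along with it; the addition is refused because the exponents do not match.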

Many problems in programming arise from letting too many variables in your program have a plain string type. A URL, a file path, a file’s contents, a MySQL query string, a GUI label and someone’s name are all text. Letting them all have the same type and be manipulated in the same ways leads to all sorts of accidental errors. Concatenating the contents of two files might make sense, but concatenating two absolute directory names or two URLs does not. Many, if not all, SQL injection bugs can be seen as the concatenation of two incompatible types: a string originating from user or external input (which could be typed as such) and a query string, resulting in a query string. To avoid SQL injection, you should not allow any unescaped user input into a query string. Similar logic applies to injecting user-originated JavaScript content into webpages. In fact, several dynamic languages have tried to prevent this by keeping a “dirty” (tainted) flag on strings that originate externally; they then refuse to use unescaped dirty strings in sensitive places such as HTML generation.
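A minimal sketch of the string-typing idea, in Python. The names UserInput, SqlFragment and quote are hypothetical, and the escaping shown is deliberately simplistic; it is only there to show the shape of the types, not to serve as real protection (a database driver’s parameterised queries are the usual tool for that).

```python
class UserInput:
    """Text that came from outside the program and is not yet trusted."""

    def __init__(self, text: str) -> None:
        self.text = text


class SqlFragment:
    """Text that is considered safe to place in a query."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __add__(self, other: object) -> "SqlFragment":
        # Only fragment + fragment is allowed; raw user input is rejected.
        if not isinstance(other, SqlFragment):
            raise TypeError("only SqlFragment values can be appended to a query")
        return SqlFragment(self.text + other.text)


def quote(user_input: UserInput) -> SqlFragment:
    # Illustrative escaping only: doubles single quotes and wraps the text in quotes.
    return SqlFragment("'" + user_input.text.replace("'", "''") + "'")


name = UserInput("Robert'); DROP TABLE students;--")
prefix = SqlFragment("SELECT * FROM students WHERE name = ")

safe_query = prefix + quote(name)  # allowed: the input went through quote()

try:
    prefix + name  # refused: raw user input meets a query string
    injection_error = None
except TypeError as exc:
    injection_error = str(exc)
```

The only route from UserInput to SqlFragment is through quote, so forgetting to escape becomes a type error rather than a security hole.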

Just as not all strings are the same, neither are all integers. Recently, I have been developing some software that reads from a database. Each table has a 64-bit integer primary key named id (NB: I didn’t design the schema). You might be tempted to read that field into a plain 64-bit integer type. But, as well as permitting nonsensical addition and multiplication of ids, this means that you might accidentally read the id field from the users table and use it to find an entry in the posts table. A classic way this can happen is that you have a function that takes two ids, one for users and one for posts, and you pass the parameters in the wrong order. The better way to program this system is to use a different type for the id of the users table (e.g. id<users>) and for the posts table (id<posts>). That way your method can have two parameters with different types, and the compiler can issue an error if you get the order wrong.
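Here is one way the id<users> / id<posts> idea might look in Python, using a generic wrapper whose type parameter exists only as a label. The names Id, Users, Posts and describe_post are assumptions made up for this sketch. Python itself does not check the annotations; the wrong-order call is caught only if you run a static checker such as mypy over the code.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


class Users:
    """Marker type for the users table (never instantiated)."""


class Posts:
    """Marker type for the posts table (never instantiated)."""


@dataclass
class Id(Generic[T]):
    """A primary key labelled with its table; deliberately offers no arithmetic."""
    value: int


def describe_post(author: Id[Users], post: Id[Posts]) -> str:
    return f"post {post.value} by user {author.value}"


author_id = Id[Users](7)
post_id = Id[Posts](1234)

summary = describe_post(author_id, post_id)

# describe_post(post_id, author_id)
# A static checker such as mypy is expected to reject the swapped call above,
# because Id[Posts] is not an Id[Users]. Python itself would still run it.
```

Because Id defines no arithmetic operators, adding or multiplying two ids fails as well, which rules out the nonsensical operations that a plain integer would permit.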

Types prevent errors, and static types prevent errors very early on in the development process: at compile-time. Static type declarations also serve as a form of documentation. What’s more, I know that the types must be kept up-to-date, unlike documentation. Out-of-date documentation requires a programmer to notice; out-of-date types cause a compiler error. And if you want some really interesting type-related programming, there are some types for which only a single function implementation is possible, so writing the type is enough for a programmer or a tool to work out the implementation. The Djinn tool for Haskell does exactly this: given a type, it generates an implementation.
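Two small examples of a type pinning down its implementation, written with Python type variables. The function names are mine. The claim really belongs to languages like Haskell, where a function of type a -> a cannot inspect its argument or have side effects; Python cannot enforce that, so treat this as an illustration of the reasoning rather than a guarantee.

```python
from typing import Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def identity(value: A) -> A:
    # Nothing is known about A, so the only A available to return is the argument.
    return value


def first(pair: Tuple[A, B]) -> A:
    # The only A in scope is the first component of the pair.
    return pair[0]
```

In both cases, once you have read the signature there is essentially nothing left to decide.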

And Yet…

You get the picture: I’m a big fan of using types. However, there is one case where I’m not as certain about how prominently types should feature: programming education.

I accept that a good strategy for teaching is to pare down what the students are exposed to. You start by teaching the students variables, or method calls, and you leave out other concepts (like loops) until students have got the hang of the first concept. So should types be one of the later concepts that are omitted until students are ready for them? Should we start in languages where the types are relatively hidden and then move to more obviously typed languages later on?

Early on, this is the distinction between writing:

x = 5
y = "hello"

and:

int x = 5
string y = "hello"

Obviously, the former involves fewer concepts to start with. But does the latter help your understanding in the long run? We get into further differences with the question of whether you can write:

x = 5
x = "hello"

Is a variable a place for storing anything, or does a variable have a type? If you start with the former, is it difficult to later teach the latter? Similarly, can you have heterogeneously typed lists with all sorts of different elements:

x = [ 5, "hello", [3.0, 5.6] ]
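For comparison, here are the snippets above gathered into one runnable Python sketch, with and without the types written out. The names count, greeting and mixed are mine; the annotations are optional in Python and are ignored when the program runs.

```python
from typing import List, Union

# Unannotated: the reader has to infer the types from the values.
x = 5
y = "hello"

# Annotated: the same values with their types written out.
count: int = 5
greeting: str = "hello"

# Rebinding a name to a value of another type is accepted at runtime.
# Some static checkers (mypy, for example) report it, since they infer
# int from the first assignment.
x = "hello"

# A heterogeneous list needs a union type to describe its elements.
mixed: List[Union[int, str, List[float]]] = [5, "hello", [3.0, 5.6]]
element_types = [type(item).__name__ for item in mixed]
```

Spelling out the type of the mixed list is noticeably more work than writing the list itself, which is part of the trade-off in question.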

Broadly, what I’m wondering is: are dynamically/flexibly typed systems a benefit to learners by hiding complexity, or are they a hindrance because they hide the types that are there underneath? (Aside from the lambda calculus and basic assembly language, I can’t immediately think of any programming languages that are truly untyped. Python, JavaScript et al do have types; they are just less apparent and more flexible.) Oddly, I haven’t found any research into these specific issues, which I suspect is because these variations tend to be per-language, and there are too many other confounds in comparing, say, Python and Java: they differ in far more than their type systems. I’m interested to hear anyone’s thoughts on this issue.
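As a small illustration of the point that Python does have types underneath, the following sketch asks the running program what it thinks the values are, and then tries to mix them. The variable names are only for illustration.

```python
x = 5
y = "hello"

# Every value carries a type at runtime, even though none was declared.
type_of_x = type(x).__name__
type_of_y = type(y).__name__

# Mixing incompatible types is still an error; it just surfaces when the line runs.
try:
    y + x
    mixing_error = None
except TypeError as exc:
    mixing_error = type(exc).__name__
```

A learner who never writes a type will still meet one the first time this error appears.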