CS 331 Spring 2018 > A Primer on Type Systems

CS 331 Spring 2018

A Primer on Type Systems

By Glenn G. Chappell

Text updated: 29 Jan 2018

Table of Contents

1. Introduction to Typing

This article introduces types, type systems, and associated terminology—all from a practical point of view. Sections 2–4 describe how type systems can be placed on three axes: static vs. dynamic, manifest vs. implicit, and nominal vs. structural. Section 5 closes with a discussion of type safety.

Basic Concepts

A type system is a way of classifying entities in a program (expressions, variables, etc.) by the kinds of values they represent, in order to prevent undesirable program states. The classification assigned to an entity its type. Most programming languages include some kind of type system.

Consider the following C++ code.

[C++] int abc;

In C++, int is a type. Variable abc has type int .

More C++ code:

[C++] cout << 123 + 456;

Literals 123 and 456 also represent values of type int in C++, although this does not need to be explicitly stated. In C++, applying a binary + operator to two values of type int results in another value of type int . Therefore, the expression 123 + 456 also has type int .

In the past, programming languages often had a fixed set of types. In contrast, modern programming languages typically have an extensible type system: one that allows programmers to define new types. For example, in C++, a class is a type.

[C++] class Zebra { // New type named "Zebra" …

Type checking means checking and enforcing the restrictions associated with a type system. The various actions involved with a type system (determining types, type checking) are collectively known as typing.

How Types are Used

In programming languages, types are used primarily in three ways.

Usage #1: Determining Legal Values

First, types are used to determine which values an entity may take on.

Being of type int , the above variable abc may be set to 42 , which also has type int .

[C++] abc = 42;

Being of type int also makes some values illegal for abc .

[C++] vector<int> v; abc = v; // ILLEGAL!

We cannot set a variable of type int to a value of type vector<int> . The above code contains a type error. When a C++ compiler flags a type error, it prevents the program from compiling successfully.

Usage #2: Determining Legal Operations

Second, types are used to determine which operations are legal.

Since variable abc is of type int , the binary + operator may be used with it.

[C++] cout << abc + abc;

Some operations are forbidden for type int . For example, there is no unary * operator for this type. A C++ compiler would flag a type error in the following code.

[C++] cout << *abc; // ILLEGAL!

A type whose values can be manipulated with the ease and facility of a type like int in C++, is said to be first-class. More formally, a first-class type is one for which

new values may be created from existing values at runtime,

values may be passed as arguments to functions and returned from functions, and

values may be stored in containers.

In some programming languages, such has Haskell, Scheme, Python, and Lua, function types are first-class. Such programming languages are said to have first-class functions.

Usage #3: Determining Which Operation

Third, types are used to determine which of multiple possible operations to perform.

Consider the following C++ function template addEm .

[C++] template <typename T, typename U> T addEm(T a, U b) { return a + b; }

We can call addEm with two arguments of type int or with two arguments of type string . The operation performed by the + operator is determined by the argument type: addition for int arguments, concatentation for string arguments.

Python similarly uses the binary + operator for both numeric addition and string concatenation. The following Python function, like its C++ counterpart, may be called with two integers or with two strings.

[Python] def addEm(a, b): return a + b

An important application of this third use of types is to provide a single interface for entities of different types. This called polymorphism. Both versions of addEm are polymorphic functions.

2. Type Systems: Static vs. Dynamic

Definitions

A type system can be characterized as static or dynamic.

In a static type system, types are determined and checked before program execution. This is typically done by a compiler. Type errors flagged during static type checking generally prevent a program from being executed.

Programming languages with static type systems include C, C++, Java, Haskell, Objective-C, Go, Rust, and Swift.

In a dynamic type system, types are determined and checked during program execution. Types are tracked by attaching to each value a tag indicating its type. Type errors in a particular portion of code are flagged only when that code actually executes.

Programming languages with dynamic type systems include Python, Lua, JavaScript, Ruby, Scheme, and PHP.

Static typing and dynamic typing are actually two very different things. They are handled at different times, are implemented very differently, give different information, and allow for the solution of different problems.

Static and dynamic typing are so different that some people prefer not to use the same word for both. They typically reserve the term “type” for use with a static type system, referring to the categories in a dynamic “type” system as tags. In this article, we will use “type” for both the static and dynamic flavors, but it should be noted that some consider “dynamic typing” to be a misnomer.

Regardless of the characteristics of any particular type system, remember that type systems generally share a common purpose.

The purpose of a type system is to prevent undesirable program states.

For example, we use types to prevent the execution of operations that are incorrect for a type, because it is undesirable that such operations execute.

Consequences of Static & Dynamic Typing

Now we consider some of the ways that static vs. dynamic type systems affect programming. Our examples will use C++, which is statically typed, and Python & Lua, which are dynamically typed.

Compilation vs. Runtime Errors

In C++, Python, and Lua, it is illegal to divide by a string. A C++ compiler will flag a type error in the following code.

[C++] cout << "Hello" << endl; cout << 1 / "bye" << endl; // ILLEGAL!

The static type checking in C++ means that the above code will not compile. An executable will not be created—much less executed.

Similarly, the following Python code will result in a type error being flagged.

[Python] print("Hello") print(1 / "bye") # ILLEGAL!

As will the following Lua code.

[Lua] io.write("Hello

") io.write(1 / "bye" .. "

") -- ILLEGAL!

However, because of the dynamic type checking in Python and Lua, the programs above can still compile successfully and begin execution. The type error will not be flagged until execution reaches the second statement. So both of the above programs will print “ Hello ”, and then the type error will be flagged.

In some dynamically typed programming languages, type errors raise exceptions that can be caught and handled. Here is a Python function ff that takes one parameter and attempts to print 1 divided by that parameter.

[Python] def ff(x): try: print(1 / x) # MAYBE illegal, depending on type of x except TypeError: print("TYPE ERROR")

Above, if a type error is flagged in print(1 / x) , then we catch the resulting exception and print a message. If we do ff(2) , then “ 0.5 ” will be printed. If we do ff("bye") , then “ TYPE ERROR ” will be printed. In both cases, execution will continue.

Typing of Both Variables and Values vs. Only Values

In a static type system, types are generally applied to both variables and values.

[C++] int x = 42;

Above, “ x ” is an identifier naming a variable of type int , and 42 is a value of type int .

In a dynamic type system, types are represented by tags attached to values. So generally only values have types in a dynamic type system.

[Python OR Lua] x = 42 x = "bye"

The above presents no problem in a dynamically typed programming language like Python or Lua. Variable x does not have a type. The value 42 does have a type. And the value "bye" has a different type. But in both cases, x is merely a reference to the value.

A bit more subtly, in dynamically typed programming languages container items typically do not have types; only their values do. So there is generally no problem with a container holding values of different types. Here are Python and Lua lists containing a number, a string, and a boolean.

[Python] mylist = [ 123, "hello", True ]

[Lua] mylist = { 123, "hello", true }

3. Specifying Types: Manifest vs. Implicit

Manifest & Implicit Typing

When we specify the type of an entity by explicitly stating it, we are doing manifest typing.

The typing of variables and functions in C, C++, and Java is mostly manifest.

[C++] double sq47(double n) { double result = 4.7 * n * n; return result; }

Above, the types of n ( double ), sq47 (function taking double and returning double ), and result ( double ) are explicitly stated. Such an explicit specification of a type is a type annotation.

When types are not specified explicitly, we have implicit typing.

The typing in Python and Lua is mostly implicit. Here is a Python function that is more or less equivalent to the above C++ function.

[Python] def sq47(n): result = 4.7 * n * n return result

In the above code, there are no type annotations at all.

In dynamically typed programming languages, typing is usually mostly implicit. It is therefore tempting to conflate manifest typing with static typing; however, the two are not the same.

For example, here is function sq47 in Haskell, which has a static type system.

[Haskell] sq47 n = result where result = 4.7 * n * n

Again, there are no type annotations. However, identifiers sq47 , n , and result in the above code still have types. This is because a Haskell compiler performs type inference, determining types from the way entities are used in the code. Haskell types are said to be inferred. However, while type annotations are mostly not required in Haskell, they are still allowed, as below.

[Haskell] sq47 :: Double -> Double -- Above is Haskell's optional manifest typing: -- sq47 is a function taking Double, returning Double sq47 n = result where result = 4.7 * n * n

The following table shows how various programming languages can be classified.

Type Specification Mostly Manifest Mostly Implicit Type

System Static C, C++, Java Haskell, OCaml Dynamic Not much goes here Python, Lua, Ruby, JavaScript, Scheme

Since 2011, C++ standards have allowed for the increasing use of type inference in C++. For example, the following is legal under the 2014 C++ Standard.

[C++14] auto sq47(double n) { auto result = 4.7 * n * n; return result; }

While the type of parameter n is explicitly specified above, the other two type annotations are no longer required.

On the other hand, some implicitly typed programming languages are moving in the direction of optional type annotations. For example, beginning in 2015, Python has allowed for type annotations on function parameters.

Explicit & Implicit Type Conversions

A type conversion takes a value of one type and returns an equivalent, or at least similar, value of another type. When we specify in our code that a type conversion is to be done, we are doing an explicit type conversion; other conversions are implicit.

For example, C++ does implicit type conversion from int to double .

[C++] int n = 32; auto z = sq47(n);

Function sq47 takes a parameter of type double , while n has type int . This mismatch is dealt with via an implicit type conversion: a double version of the value of n is computed, and this is passed to function sq47 . We could also do the conversion explicitly, as below.

[C++] int n = 32; auto z = sq47(static_cast<double>(n));

Haskell has few implicit type conversions. With function sq47 defined as above, the following code will not compile, as we are attempting to pass an Integer to a function that takes a Double .

[Haskell] n = 32 :: Integer z = sq47 n -- ILLEGAL!

An explicit type conversion fixes the mismatch.

[Haskell] n = 32 :: Integer z = sq47 (fromIntegral n)

Interestingly, when we write the above explicit type conversion, we need not state the type we are converting to.

4. Type Checking: Nominal vs. Structural

Nominal Typing

Various standards can be applied when type checking is done. Consider the following C++ types A and B .

[C++] struct A { int h; int m; }; struct B { int h; int m; };

Should types A and B be considered the same, for type-checking purposes? For example, should we allow the following function gg to be called with an argument of type B ?

[C++] void gg(A x) { ...

It might seem that there is no possible reason to distinguish between A and B . But here is one: they are different types. The programmer made them distinct, and perhaps we should honor that decision.

For example, perhaps the two data members in type A represent hours and minutes in a time of day, while the two data members in type B represent hits and misses in a shooting competition. It might be a good idea to avoid mixing up the two.

Type checking by this standard is nominal typing (“nominal” because we check whether a type has the right name). C++ checks ordinary function parameters using nominal typing. And indeed, if we try to call the above function gg with an argument of type B , a C++ compiler will flag a type error.

There are looser ways to apply nominal typing. For example, in the context of C++ inheritance, when a function takes a parameter of base-class pointer type, the type system allows a derived-class pointer to be passed as an argument.

[C++] class Derived : public Base { ... }; void hh(Base * bp); Derived * derp; hh(derp); // Legal, even though different type

Structural Typing

Another possible standard for type checking is structural typing, which considers two types to be interchangeable if they have the same structure and support the same operations. Under structural typing, the types A and B defined above would be considered the same.

Structural typing may also be applied in a relatively loose manner. Perhaps the loosest variation on structural typing allows an argument to be passed to a function as long as every operation that the function actually uses is defined for the argument. This is duck typing. (The name comes from the Duck Test: “If it looks like a duck, swims like a duck, and quacks like a duck, then it’s a duck.”)

C++ checks template-parameter types using duck typing. The following function template ggt can be called with arguments of type A or B .

[C++] template <typename T> void ggt(T x) { cout << x.h << " " << x.m << endl; }

The addEm functions discussed earlier are examples of duck typing. Here they are again.

[C++] template <typename T, typename U> T addEm(T a, U b) { return a + b; }

[Python] def addEm(a, b): return a + b

We have noted that C++ template-parameter types are checked using duck typing. Python, Lua, and some other dynamic languages check all function parameter types using duck typing. Both versions of addEm above can be called with arguments of any type, as long as all the operations used are defined for those types. In particular, both versions of addEm may be called with two integer arguments or with two string arguments.

No Type Checking

An alternative to the nominal and structural versions of type checking is no type checking at all.

Consider the programming language Forth, as defined in the 1994 ANSI standard. Forth distinguishes between integer and floating-point values, so it arguably has a notion of type. However, these two types are dealt with using different syntax. There is no need to check whether an integer parameter is actually an integer; Forth provides no facilities for passing a floating-point value in its place.

Thus, while Forth has types, it has no type checking.

This idea was once common in programming-language design. However, if different types involve different syntax, then it is difficult to implement an extensible type system. Virtually all modern programming languages include some form of type checking.

5. Type Safety: Strong [ick!], Weak [ick!], Sound

Type Safety

A programming language or programming-language construct is type-safe if it forbids operations that are incorrect for the types on which they operate.

Some programming languages/constructs may discourage incorrect operations or make them difficult, without completely forbidding them. We may thus compare the level of type safety offered by two programming languages/constructs.

For example, the C++ printf function, inherited from the C Standard Library, is not type-safe. This function takes an arbitrary number of parameters. The first should be a format string containing references to the other parameters.

[C++] printf("I am %d years old.", age);

The above inserts the value of variable age in place of the %d in the format string, on the assumption that age has type int . However, the C++ Standard specifies that the type of age is not checked. It could be a floating-point value, a pointer, or a struct ; the code would then compile, and a type error would slip by unnoticed.

In contrast, C++ stream I/O is type-safe.

[C++] cout << "I am " << age << " years old.";

When the above code is compiled, the correct output function is chosen based on the type of the variable age .

Strong [ick!] & Weak [ick!]

Two unfortunate terms are often unfortunately used in discussions of type safety; the results are—you guessed it—unfortunate. The terms are strong typing (or strongly typed) and weak typing (or weakly typed). These generally have something to do with the overall level of type safety offered by a programming language.

The problem with this terminology is that it has no standard definition; it is used in different ways by different people, many of whom seem to assume that their definition is universally agreed upon. I have seen at least three different definitions of “strongly typed” in common use. The C programming language is strongly typed by one of them and weakly typed by the other two.

See the references for a page listing many different definitions of “strongly typed”.

Therefore:

Avoid using the terms “strong” and “weak” typing, or “strongly” and “weakly” typed.

Nonetheless, the issue of how strictly type rules are enforced is an important one. When discussing it, we might speak informally of how one type system is “stronger” than another. If we wish to be more formal, then we can talk about type safety, or simply explain in detail what we mean. When we are discussing a static type system, we can also talk about soundness.

Soundness

A static type system is sound if it guarantees that operations that are incorrect for a type will not be performed; otherwise it is unsound.

Haskell has a sound type system. The type system of C (and thus C++) is unsound, since there is always a way of treating a value of one type as if it has a different type. This might appear to be a criticism. However, the type system of C was deliberately designed to be unsound. Being able to interpret a value in memory in arbitrary ways makes C useful for low-level systems programming.

There does not seem to be any equivalent of the notion of soundness in the world of dynamic typing. However, we can still talk about whether a dynamic type system strictly enforces type safety.

6. References

The following were particularly helpful in writing this article.

The following page summarizes various definitions for strongly typed and gives examples of inconsistent use of the term.

StronglyTyped from the C2 Wiki, a.k.a. the WikiWikiWeb.

7. Copyright & License

© 2015–2018 Glenn G. Chappell



A Primer on Type Systems is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. If different licensing is desired, please contact the author.