Untyped Programs Don’t Exist

Lately, I’ve been thinking about various (false) dichotomies, such as typed vs untyped programming and type systems vs program logics. In this blog post, I will argue that untyped programs don’t exist (although the statement will turn out to be trivial).

TLDR

All languages are typed, but may use different enforcement mechanisms (static checking, dynamic checking, no checking, or some combination). We should talk about how to use types in programming—e.g. tools for writing and enforcing invariants about programs—instead of talking about types and type checking as properties of languages.

Some Context

Most of my academic work involves “typed” languages. These languages have some nice properties for the metatheorist and compiler writer. Types lend themselves to strong automated reasoning, automatically eliminate large classes of errors, and simplify the job of whoever is reasoning about the programs. The downside is that the programmer must essentially statically prove properties of their program, in such a way that a machine can understand the theorem and check the proof.

When I’m hacking, I write in “untyped” languages. I write programs in Racket, scripts in bash, plugins and tools in JavaScript, papers in LaTeX, build systems in Make, and so on. These languages lend themselves to experimentation and avoid the overhead of necessarily proving properties of the programs. The downside is that the computer cannot help the programmer, since the programmer has not communicated the invariants about the program in a way the computer can understand.

“But surely”, a type evangelist says, “the very same benefits of types for metatheory help one develop the program in the first place? Why do you hobble yourself by omitting types? Come join us in the land of light! Use types from the start!”

“Dear friend, I couldn’t agree more!”, I reply. “Types are invaluable to developing my programs, but your ‘typed’ languages prevent me from writing down my types!”

"Well, certainly there are some limitations of typed languages," the type evangelist concedes, "but we could also choose to ignore the type system, create the uni-type, and program in the error monad. Now we have the best of both worlds!"

"Don’t you see," I say excitedly, "that is just what I’m doing! My ‘untyped’ languages are, in fact, well typed. My programs run implicitly in the error monad. What’s more, I am not required to prove it, for it is simply true."

A grave look comes over my interlocutor’s face. "But you forfeit all benefits of static typing. Your errors are reported later, and there are performance implications, and …"

"Exactly," I interrupt. "We are not arguing about typing, for all programs are well typed. We are arguing about pragmatics."
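The uni-type embedding from this exchange can be made concrete. A minimal sketch in JavaScript, making the implicit error monad explicit (the `Ok`, `Err`, and `run` names are my own, not from any library):

```javascript
// Make the implicit error monad explicit: every expression either
// produces a value (Ok) or a dynamic type error (Err).
const Ok = (value) => ({ ok: true, value });
const Err = (error) => ({ ok: false, error });

// Run a thunk "in the error monad": dynamic type errors become Err values.
function run(thunk) {
  try {
    return Ok(thunk());
  } catch (e) {
    return Err(e);
  }
}

const good = run(() => 1 + 2);   // Ok, with value 3
const bad = run(() => null.f);   // Err, holding a TypeError
```

Both results inhabit the same type, and no proof obligation was imposed on the programmer; the run-time check discharges it.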

Definitions

Before I can argue that untyped programs don’t exist, I need some near-formal definitions to work with. I posit that the following are reasonable, intuitive definitions of types and programs.

Definition. An expression is a symbol or sequence of symbols given some interpretation.

Example.

- 5 is an expression, whose interpretation is the number five.
- e₁ + e₂ is an expression, whose interpretation is the mathematical addition function applied to expressions e₁ and e₂.
- function(): return 5; is an expression, whose interpretation is a mathematical function that, when applied to any number of arguments, returns the expression 5.

Definition. A type is a statement of the invariants of some expressions.

Example.

- a register word is a type describing the kinds of values that fit in an x86 register, such as a collection of 32 bits. A register word supports operations such as:
  - moving a value of type register word into a register
  - moving a value of type register word from one register into another
- a pointer is a type describing a memory address. It is either uninitialized or a valid memory address. A pointer supports operations such as:
  - initialization, giving an uninitialized pointer a value
  - dereference, reading the value at the memory address of an initialized pointer
- a Nat is a type describing an element of the set of natural numbers. A Nat supports operations such as:
  - addition
  - multiplication
  - subtraction, but only when subtracting a smaller natural number from a larger one
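These invariants can be enforced dynamically, even in an “untyped” language. A minimal sketch in JavaScript (the `assertNat` and `natSub` helpers are hypothetical names, for illustration):

```javascript
// A Nat is a non-negative integer; each operation enforces the invariant dynamically.
function assertNat(n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new TypeError(`${n} is not a Nat`);
  }
  return n;
}

// Subtraction is only defined when the second Nat is no larger than the first.
function natSub(a, b) {
  assertNat(a);
  assertNat(b);
  if (b > a) throw new TypeError(`cannot subtract ${b} from ${a} in Nat`);
  return a - b;
}
```

Here `natSub(5, 3)` yields `2`, while `natSub(3, 5)` raises a dynamic type error rather than silently producing `-2`.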



Definition. A language is a collection of expressions.

Example.

- arith is a language containing the following expressions e:
  - 0, 1, 2, …, and -1, -2, -3, …, each representing an integer
  - e₁ + e₂, where e₁ and e₂ are integers
  - e₁ - e₂, where e₁ and e₂ are integers
- JavaScript is a language, defined by the ECMAScript standard, and extended by various implementations.
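The arith language is small enough to interpret directly. A sketch in JavaScript, using nested arrays as a hypothetical encoding of arith expressions:

```javascript
// Interpret an arith expression: either an integer n, or [op, e1, e2]
// where op is "+" or "-".
function interp(e) {
  if (Number.isInteger(e)) return e;
  const [op, e1, e2] = e;
  if (op === "+") return interp(e1) + interp(e2);
  if (op === "-") return interp(e1) - interp(e2);
  throw new TypeError(`not an arith expression: ${JSON.stringify(e)}`);
}

// (5 + 3) - 2 evaluates to 6.
const result = interp(["-", ["+", 5, 3], 2]);
```

Note the final clause: a sequence of symbols outside the language, like `["*", 1, 2]`, has no interpretation, and this interpreter reports that as a dynamic type error.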

Definition. A program is a collection of expressions from some language.

Example.

- 5 + 5 is an arith program.
- 5 is a JavaScript program.

Is X a Typed Language?

Is x86 assembly a typed language?

I say yes.

First, x86 assembly is a language: it meets our definition, as a collection of symbols or sequences of symbols given some interpretation. For example, mov ax, bx is an x86 assembly program that moves the contents of register bx to register ax.

Second, x86 is typed. Each expression in x86 assembly has invariants stated about it. For example, x86 defines the type “little endian”, which describes a particular encoding of binary data, such as numbers, over which operations like addition are defined. The division operation is typed: division is only defined when the denominator is non-zero. Attempting to divide by zero causes a type error (a dynamic exception).

I would make the same argument for every other language. C is a typed language. So is JavaScript. And Racket. And Haskell.

But I Don’t Get Type Errors in X!

First, you probably do. Second, when you don’t, that’s a major problem.

Let’s visit x86 for a moment, to see dynamically enforced type errors. Recall that division is not defined when the denominator is zero. The result of division by zero in x86 is defined to be a divide-error exception (#DE), interrupt vector 0. That is a type error: it reports that you attempted to divide by zero, and that this is ill-typed. It is a dynamically enforced type error.
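“Untyped” languages enforce type errors in exactly the same way. A small JavaScript illustration, where the thrown TypeError plays the role of the x86 exception:

```javascript
// Applying a non-function violates an invariant of function application;
// JavaScript enforces that invariant dynamically by raising a TypeError.
let error;
try {
  (5)();  // ill-typed: 5 is not a function
} catch (e) {
  error = e;  // a dynamically enforced type error
}
```

The check happens at the moment the ill-typed application is attempted, not before the program runs.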

Let’s move to C, in which we can easily see two different kinds of type errors: static and unenforced. The language C includes expressions like x=e, where x is a declared name and e is an expression. The expression x=e raises a static error when x is undeclared; this is a static type error. It is a statically enforced invariant that names must be declared before they are used. Other invariants are not enforced at all, such as the notorious undefined behavior. For example, bool b; if(b) { ... }; violates a C invariant, namely that uninitialized scalars are never used. However, C does not attempt to enforce this invariant, either statically or dynamically. The result of this sequence of symbols is undefined in C.
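JavaScript exhibits the same split between static and dynamic enforcement (though, unlike C, it leaves essentially nothing unenforced). A sketch using the `Function` constructor to make the two phases observable:

```javascript
// Static enforcement: ill-formed syntax is rejected before any code runs.
let staticError;
try {
  new Function("return = 5;");  // parsed, and rejected, without ever executing
} catch (e) {
  staticError = e;  // a SyntaxError
}

// Dynamic enforcement: an undeclared name is only caught when evaluated.
let dynamicError;
try {
  new Function("return someUndeclaredName;")();  // constructing succeeds; calling fails
} catch (e) {
  dynamicError = e;  // a ReferenceError
}
```

The same invariant, “names must be declared before use”, is enforced statically in C and dynamically in JavaScript; only the enforcement mechanism differs.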

Untyped Programs Don’t Exist

First, a few more definitions based on the above arguments about the languages x86 and C.

Definition. A type error is an error raised during the enforcement of a type, i.e., during the enforcement of an invariant about an expression.

Definition. Undefined behavior is the result of interpreting a non-expression, i.e., a sequence of symbols that has no meaning because some invariant has been violated.

Theorem. Untyped Programs Don’t Exist.

Proof. Recall that programs consist of expressions from a language. Expressions are sequences of symbols that have meaning. But undefined behavior only results from non-expressions. As programs are composed of expressions, a program cannot have undefined behavior. Therefore, all programs obey the invariants required by the expressions in the language. That is, all programs are well typed, and untyped programs don’t exist. QED.

I warned you it was a trivial theorem.

Conclusion

The theorem is trivial, but still useful because it helps us reframe our discussion.

Really, the statement is just a rephrasing of type safety: “well typed programs don’t go wrong”. For type safety, what we show is that programs exhibit only defined behavior. The difference is that type safety is typically thought of as a property of a language, and in particular, of statically typed languages. We should think about type safety differently: it is a property we must enforce of programs. Enforcing it via static typing of every program in the language is one useful way, but it is not the only way, and we cannot always hope to have type safety of a language.
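One way to enforce type safety per program, rather than per language, is a run-time contract at a function boundary. A minimal sketch in JavaScript (the `contract` helper is a hypothetical name):

```javascript
// Wrap a function so an invariant is checked at its boundary:
// type safety enforced for this program, without a static type system.
function contract(pred, name, f) {
  return (x) => {
    if (!pred(x)) {
      throw new TypeError(`expected ${name}, got ${JSON.stringify(x)}`);
    }
    return f(x);
  };
}

const double = contract(Number.isFinite, "a finite number", (x) => x * 2);
// double(21) yields 42; double("21") raises a type error
// rather than relying on implicit coercion.
```

The invariant lives with the program, and the programmer chooses where enforcement happens.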

Instead of arguing about untyped vs typed, a non-existent distinction, we should accept that all programs have invariants that must be obeyed, i.e., all programs are typed. The argument we must have is about the pragmatics of types and type checking:

- how can we express types for complex languages like x86 and C?
- in what situations should we enforce types, i.e., check types?
- is type checking useful?
- should we check types statically or dynamically?
- should we allow the programmer to circumvent type checking?
- is type checking decidable? should it be?

Related Reading

This paper proves an equivalence between an intrinsically typed language, in which meaning is only assigned to well-typed programs, and an extrinsically typed language, in which programs are first given meaning and can separately be ascribed types and proved to inhabit those types. In the extrinsic semantics, Reynolds treats all programs as existing in the universal domain, and uses embedding-projection pairs essentially as run-time contracts, since, e.g., only a function can be called. In my mind, this work essentially proves the same theorem as this blog post: even when the semantics of programs consider typing as happening “after” semantics, the semantics still require types.

This blog post argues that dynamic languages are just straitjacketed versions of static languages, and therefore aren’t really a separate class of languages. In many ways, I agree with this blog post. Because “dynamic” languages lack any static enforcement, they can be a hindrance when you do know how to encode the types you want, and they can lead to weird, type-confusing programming patterns. My favorite example is from Racket, where the value #f is sometimes used at type bool and sometimes at type Maybe A. This can lead to annoying problems with functions like findf over a list of bools. However, I think the post ignores some of the pragmatics. For example, while sum types give you incredible expressive power, tagged sums are very annoying to use in many languages that enforce static typing, yet very simple to use when you are not required to statically prove that a term inhabits a sum.
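JavaScript has the same pattern: `Array.prototype.find` returns `undefined` for “not found”, so `undefined` is used both at type `A` and at type `Maybe A`:

```javascript
// A successful search for undefined is indistinguishable from a failed search.
const found = [undefined, 1].find((x) => x === undefined);  // search succeeds
const notFound = [2, 3].find((x) => x === undefined);       // search fails
// found === notFound, even though one search succeeded and one failed
```

Without a distinct tag for the “nothing” case, the result conflates the value with the absence of a value.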

This blog post begins to get at some of the same criticisms of Harper’s view, and starts to talk about pragmatics.

This blog post, reproduced in 2010 on a Perl blog, does a great job of breaking down some false dichotomies and fallacies in discussions about type systems. It goes into more depth than this article about some distinctions in type systems, when they are meaningful and when they are not, and I pretty much agree with it.

The abstract and chapter 1 of this dissertation have something to say about syntax and semantics, which I think is closely related to the topic of this blog post. In particular, I think the thoughts on symbolic-syntactic reasoning are vital to understanding the trade-offs among different enforcements of typing.