The Behavior of Gradual Types: A User Study

Note: this post is an extended abstract for the paper The Behavior of Gradual Types: A User Study by Preston Tunnell Wilson, Ben Greenman, Justin Pombrio, and Shriram Krishnamurthi. The full paper, datasets, and slides are available online.

The long-term goal of gradual typing is to build languages that offer the “best” of both static and dynamic typing. Researchers disagree, however, on what the semantics of a mixed-typed language should be; there are at least three competing proposals for combining a dynamically-typed language with a similar statically-typed language.

It’s an interesting situation. There are dozens of papers on the semantics of gradual types—and many claim to have developers in mind—but zero papers that ask developers what they think.

To help inform the discussion, we recently designed a survey to see what programmers think of three mixed-typed semantics. The survey is based on 8 example programs; we selected these 8 programs because the set as a whole tells the three mixed-typed semantics apart. For each program, the survey presents a few possible outcomes of running the program and asks participants for their opinion on each outcome.

The image below shows one program from the survey:

This program creates an array, passes it between typed and untyped variables, and performs write and read operations. What should happen when we run this program? One option is to ignore the type annotations and return the second element of the array ("bye"). A second option is to reject the write operation (on line 4) because it attempts to write a number to a variable of type Array(String). A third option is to reject the assignment after the read operation (on line 5) because it attempts to assign a string to a variable of type Number. These are the three behaviors in the survey:

A fourth option is to reject the assignment of an Array(String) to a variable of type Array(Number). A few participants left comments asking for this behavior. See the anonymized responses for their comments, and see the paper for why we left that behavior out.

For each behavior, we asked for respondents’ preference along two independent dimensions:

Do you like or dislike this behavior?

Does it match your expectation as a programmer?

Combined, the dimensions lead to four possible attitudes: Like and Expected, Like and Unexpected, Dislike and Expected, Dislike and Unexpected. The full example question, with attitudes and space for comments, is below.

We administered the survey to three populations — software engineers, students, and Mechanical Turk workers — and thereby collected three sets of attitudes for each question. The results for the running example are below:

The figure is a matrix of three columns (one for each population) and three rows (one for each behavior). Each cell of the matrix contains a bar chart showing the attitudes that we collected.

Unlike the survey question, the behaviors in the results are labeled as Deep, Erasure, and Shallow. These names describe the three mixed-typed semantics.
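To connect those names to the running example, here is a minimal Python sketch that simulates the three outcomes. It is an illustration under stated assumptions, not the survey's program or any language's actual implementation: the five-line program is reconstructed from the prose description above, the helper names (`check`, `CheckedArray`, `run_erasure`, `run_deep`, `run_shallow`, `outcome`) are invented for this sketch, and the mapping of names to behaviors (Erasure ignores annotations, Deep rejects the write, Shallow rejects the read) follows our reading of the paper.

```python
class TypeMismatch(Exception):
    """Signals a failed run-time type check."""


def check(value, type_name, context):
    """Return value unchanged, or raise TypeMismatch if it lacks the named type."""
    expected = {"String": str, "Number": (int, float)}[type_name]
    if not isinstance(value, expected):
        raise TypeMismatch(f"{context}: {value!r} is not a {type_name}")
    return value


class CheckedArray:
    """Hypothetical wrapper that checks every element read and write."""

    def __init__(self, inner, elem_type):
        self.inner, self.elem_type = inner, elem_type

    def __getitem__(self, i):
        return check(self.inner[i], self.elem_type, "read")

    def __setitem__(self, i, value):
        self.inner[i] = check(value, self.elem_type, "write")


def run_erasure():
    """Erasure: type annotations have no effect at run time."""
    x = ["hi", "bye"]  # line 1: x : Array(String)
    y = x              # line 2: y is untyped
    z = y              # line 3: z : Array(Number)
    z[0] = 42          # line 4: write
    a = z[1]           # line 5: a : Number
    return a


def run_deep():
    """Deep: each typed/untyped boundary wraps the array with full checks."""
    x = ["hi", "bye"]
    y = CheckedArray(x, "String")  # typed -> untyped: remember Array(String)
    z = CheckedArray(y, "Number")  # untyped -> typed: expect Array(Number)
    z[0] = 42  # passes z's Number check, fails y's String check
    a = z[1]
    return a


def run_shallow():
    """Shallow: check only the outermost shape, at boundaries and reads."""
    x = ["hi", "bye"]
    y = x
    if not isinstance(y, list):  # boundary check: is it an array at all?
        raise TypeMismatch("boundary: not an Array")
    z = y
    z[0] = 42                          # writes are not checked
    a = check(z[1], "Number", "read")  # the read result must be a Number
    return a


def outcome(run):
    """Describe what happened when running one simulated semantics."""
    try:
        return f"result: {run()!r}"
    except TypeMismatch as err:
        return f"error: {err}"


if __name__ == "__main__":
    for run in (run_erasure, run_deep, run_shallow):
        print(run.__name__, "->", outcome(run))
```

In this sketch the Erasure run returns "bye", the Deep run fails at the write on line 4, and the Shallow run fails at the read on line 5, matching the three behaviors described for the survey question.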

For this question, the software engineers (left column, green bars) mostly picked the “Dislike and Unexpected” attitude for every behavior. The students (middle column, blue bars) also show consensus on “Dislike and Unexpected” for the Deep and Erasure behaviors; however, they are split on the Shallow behavior. The Mechanical Turk workers are divided on every behavior.

See the paper for the other questions and responses.

Overall, our main finding is that respondents preferred behaviors that enforced full types and reported runtime mismatches as early as possible. The takeaway is thus:

If you are designing a mixed-typed language and choose not to enforce full types, then make sure to explain this behavior to users!

Put lots of example programs in the language’s documentation. The programs in the survey can be adapted to explain how your chosen behavior differs from alternatives.

Questions

Here are some good questions we’ve gotten that are not clearly answered in the paper.

Q. Did any respondents “expect” more than one behavior?

Yes, 59% of the software engineers and 82% of the students selected “Like and Expected” and/or “Dislike and Expected” for different behaviors on the same program.
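One way to read that statistic is sketched below in Python. This is an assumption about how such a share could be computed, not the paper's analysis script: the row layout, the function name `share_expecting_multiple`, and the toy rows are all made up for illustration and are not survey data.

```python
from collections import defaultdict

# The two attitudes that count as "expected" in the survey's vocabulary.
EXPECTED = {"Like and Expected", "Dislike and Expected"}


def share_expecting_multiple(rows):
    """Fraction of respondents who marked two or more behaviors of the
    same program as expected, on at least one program.

    Each row is (respondent, program, behavior, attitude).
    """
    expected_count = defaultdict(int)
    respondents = set()
    for respondent, program, _behavior, attitude in rows:
        respondents.add(respondent)
        if attitude in EXPECTED:
            expected_count[(respondent, program)] += 1
    multiple = {r for (r, _p), n in expected_count.items() if n >= 2}
    return len(multiple) / len(respondents)


# Toy rows, invented for this sketch (NOT data from the study).
toy_rows = [
    ("r1", "p1", "Deep", "Like and Expected"),
    ("r1", "p1", "Shallow", "Dislike and Expected"),
    ("r1", "p1", "Erasure", "Dislike and Unexpected"),
    ("r2", "p1", "Deep", "Like and Expected"),
    ("r2", "p1", "Shallow", "Dislike and Unexpected"),
    ("r2", "p1", "Erasure", "Dislike and Unexpected"),
]

share = share_expecting_multiple(toy_rows)
```

Here respondent r1 expected two behaviors of the same program and r2 expected only one, so the toy share is one half.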

Q. Did the respondents have a prior preference for static or dynamic typing?

Near the end of the survey we asked: “Which do you prefer, typed or untyped programming?” See Table 2 of the paper for coded responses to this question, or the anonymized responses for the ground truth. Most respondents preferred typed programming.