The Purpose of Scala's Type System

A Conversation with Martin Odersky, Part III

by Bill Venners and Frank Sommers

May 18, 2009




Summary

Martin Odersky talks with Bill Venners and Frank Sommers about the design motivations behind Scala's type system.

Scala is an emerging general-purpose, type-safe language for the Java Platform that combines object-oriented and functional programming. It is the brainchild of Martin Odersky, a professor at Ecole Polytechnique Fédérale de Lausanne (EPFL). In this multi-part interview series, Artima's Bill Venners and Frank Sommers discuss Scala with Martin Odersky. In Part I, The Origins of Scala, Odersky gives a bit of the history that led to the creation of Scala. In Part II, The Goals of Scala, he discusses the compromises, goals, innovations, and benefits of Scala's design. In this installment, he dives into the design motivations for Scala's type system.

The value of Scala's "scalability"

Frank Sommers: In your talk at last year's JavaOne, you claimed that Scala is a "scalable language," that you can program in the small and program in the large with Scala. How does using a language like this help me as a programmer?

Martin Odersky: The way it helps you is by not having to mix many specialized languages. You can use the same language for very small as well as very large programs, for general purpose things as well as special purpose application domains. And that means that you need not worry about how you push data from one language environment into the next.

Currently if you want to push data across boundaries, you are often thrown back to low level representations. For instance, if you want to ship an SQL query from Java to a database and you use JDBC, your query ends up as a string. And that means that a small typo in your program will manifest itself as an ill-formed query at runtime, possibly at your customer site. There's no compiler or type system to tell you that you shouldn't have done that. This is very fragile and risky. So there's a lot to be gained if you have a single language.

The other issue is tooling. If you're using a single language, you can have a single environment with tooling. Whereas if you have many different languages, you have to mix and match environments, and your builds become much more complicated and difficult.

Scala's extensibility

Frank Sommers: You also mentioned in your talk the notion of extensibility, that Scala can be extended easily. Can you explain how? And again, how does that help the programmer?

Martin Odersky: The first dimension of scalability is from small to large, but I think there's another notion of extensibility from general to your specific needs. You want to be able to grow the language into domains that you particularly care about.

One example is numeric types. There are a lot of special numeric types out there—for instance, big integers for cryptographers, big decimals for business people, complex numbers for scientists—the list goes on. And each of these communities really cares deeply about their type, but a language that combined them all would be very unwieldy.

The answer, of course, is to say, well, let's do these types in libraries. But then if you really care about this application domain, you want the code accessing these libraries to look just as clean and sleek as code accessing built-in types. For that you need extensibility mechanisms in the language that let you write libraries such that users of those libraries don't even feel that it is a library. For users of a library, let's say a big decimal library, the BigDecimal type should be just as convenient to use as a built-in Int.
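To make the point concrete, here is a minimal sketch (assuming Scala 2.13 or Scala 3) of library-defined numeric types that read like built-in ones. The standard library's BigDecimal already supports ordinary arithmetic operators; the Complex class is a hypothetical example written for this illustration, relying only on the fact that Scala lets methods have symbolic names and be called in infix position.

```scala
// Hypothetical library class: symbolic method names let users write a + b and a * b.
final case class Complex(re: Double, im: Double) {
  def +(that: Complex): Complex = Complex(re + that.re, im + that.im)
  def *(that: Complex): Complex =
    Complex(re * that.re - im * that.im, re * that.im + im * that.re)
}

object NumericDemo {
  // scala.math.BigDecimal (available as BigDecimal by default) works the same way;
  // the Int 3 is converted by an implicit conversion defined in BigDecimal's companion.
  val price: BigDecimal = BigDecimal("19.99")
  val total: BigDecimal = price * 3 + BigDecimal("0.03")

  // User code for Complex looks just like arithmetic on a built-in type.
  val z: Complex = Complex(1.0, 2.0) * Complex(3.0, 4.0) + Complex(0.5, 0.0)
}
```

Nothing in the user-facing expressions reveals that Complex comes from a library rather than the language.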

Programming in the small with types

Frank Sommers: You mentioned earlier the importance of types in the context of having one language instead of many. I think most people appreciate the utility of types when programming in the large. When you have a very large-scale program, types help you organize the program and make changes to it reliably. But what do types buy us in terms of programming in the small, when you program just a script, for example? Are types important on that level as well?

Martin Odersky: They are probably less important when programming in the small. Types can be in a spectrum from incredibly useful to extremely annoying. Typically the annoying parts are type definitions that are redundant, which require you to do a lot of (finger) typing. The useful parts are, of course, when types save you from errors, when types give you useful program documentation, when types act as a safety net for safe refactoring.

Unit tests and free expression

Scala has type inference to try and let you minimize the annoying bits as much as possible. That means if you write a script, you don't see the types, because you can just leave them off and the system will infer them for you. At the same time, the types are there, so if you make a type error in the script, the compiler will catch it and give you an error message. And I believe that no matter whether it is a script or a large system, it's always more convenient to fix this thing immediately with the compiler than later on.
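A small sketch of what this looks like in practice (assuming Scala 2.13 or Scala 3): none of the values below carries a type annotation, yet each has a static type that the compiler infers and checks. The names are invented for the example.

```scala
object InferenceDemo {
  val words = List("scala", "is", "statically", "typed") // inferred: List[String]
  val lengths = words.map(_.length)                       // inferred: List[Int]
  val longest = words.maxBy(_.length)                     // inferred: String
  val total = lengths.sum                                  // inferred: Int

  // Uncommenting the next line is rejected at compile time, not at runtime,
  // because a String is not an Int:
  // val oops: Int = longest
}
```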

You still need unit tests to test your program logic, but compared to a dynamically typed language, you don't need a lot of the more trivial unit tests that may be just about the types. In the experience of many people, you need a lot fewer unit tests than you would in a dynamic language. Your mileage might vary, but that's been our experience in several cases.

The other objection that's been leveled against static type systems is that they constrain you too much in what you want to express. People say, "I want to express myself freely. I don't want a static type system getting in the way." In my experience in Scala this has not been true, I think for two reasons. The first is that the type system in Scala is actually very flexible, so it typically lets you compose things in very flexible patterns, which a language like Java, which has a less expressive type system, would often make more difficult. The second is that with pattern matching, you can recover type information in a very flexible way without even noticing it.

The idea of pattern matching is that in Scala I can take an object about which I know nothing, and then with a construct like a switch statement, match it against a number of patterns. And if it is one of these patterns, I can also immediately pull out the fields into local variables. Pattern matching is a construct that's built deep into Scala. A lot of Scala programs use it. It is a normal way to do things in Scala. One interesting thing is that by doing a pattern match you also recover the types automatically. What you put in was an object, which you didn't know anything about. If a pattern matches, you actually know you have something that corresponds to the type of the pattern. And the system is able to use that.
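The following sketch (assuming Scala 2.13 or Scala 3) illustrates the mechanism described above. The Shape, Circle, and Rect types are hypothetical; the function receives an Any about which it knows nothing, and each successful pattern both checks the runtime class and binds the fields to local variables with their precise static types.

```scala
// Hypothetical case classes used only for illustration.
sealed trait Shape
final case class Circle(radius: Double) extends Shape
final case class Rect(width: Double, height: Double) extends Shape

object MatchDemo {
  def area(x: Any): Option[Double] = x match {
    // r is a Double here: the pattern recovered the type and extracted the field.
    case Circle(r)  => Some(math.Pi * r * r)
    // w and h are bound to the two fields of the matched Rect.
    case Rect(w, h) => Some(w * h)
    case _          => None
  }
}
```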

Because of pattern matching, you can quite easily have a system where your types are very general, even maximally general (say, the type of every variable is Object), but you can still get everything out that you want through the use of pattern matching. So in that sense, you can program in Scala perfectly well as if it were a dynamically typed language. You would just use Object everywhere and pattern match everywhere. Now people usually don't do that, because you want to take more advantage of static types. But it is a very fluid fallback, a fallback that you don't even notice. By comparison, the analog in Java, where you would have to use a lot of type tests (instanceof) and type casts, is really heavyweight and clunky. And I completely understand why people object to having to do that all over the place.
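Here is a sketch of that "dynamically typed" fallback (assuming Scala 2.13 or Scala 3; the values and the size function are invented for illustration). Every value is typed as Any, Scala's root type, and typed patterns such as s: String narrow the static type inside each case, so s.length type-checks without the separate instanceof test and cast that Java would require.

```scala
object DynamicStyle {
  // Everything is typed as Any, as it would be in a dynamically typed program.
  val values: List[Any] = List(3, "four", 5.5, List(1, 2), true)

  def size(v: Any): Int = v match {
    case n: Int      => n          // n is statically an Int here
    case s: String   => s.length   // s is statically a String here
    case d: Double   => d.toInt
    case xs: List[_] => xs.length  // element type unknown, but length still works
    case _           => 0
  }

  val totalSize: Int = values.map(size).sum
}
```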

Quacking like a duck

Bill Venners: One of the things I have observed about Scala is that there are a lot more things I can express or say about my program in Scala's type system compared to Java's. People fleeing Java to a dynamic language often explain that they were frustrated with the type system and found they have a better experience if they throw out static types. Whereas it seems like Scala's answer is to try and make the type system better, to improve it so it is more useful and more pleasant to use. What kind of things can I say in Scala's type system that I can't in Java's?

Martin Odersky: One objection leveled against Java's type system is that it doesn't have what's often called duck typing. Duck typing is explained as, if it walks like a duck and quacks like a duck, it is a duck. Translated, if it has the features that I want, then I can just treat it as if it is the real thing. For instance, I want to get a resource that is closable. I want to say, "It needs to have a close method." I don't care whether it's a File or a Channel or anything else.

In Java, for this to work you need a common interface that contains the method, and everybody needs to implement that interface. First, that leads to a lot of interfaces and a lot of boilerplate code to implement all that. And second, it is often impossible to do if you think of this interface after the fact. If you write the classes first and the classes exist already, you can't add a new interface later on without breaking source code unless you control all the clients. So you have all these restrictions that the types force upon you.

One of the aspects where Scala is more expressive than Java is that it lets you express these things. In Scala it is possible to have a type that says: anything with a close method that takes no parameter and returns Unit (which is similar to void in Java). You can also combine it with other constraints. You can say: anything inheriting from a particular class that in addition has these particular methods with these signatures. Or you can say: anything inheriting from this class that has an inner class of a particular type. Essentially, you can characterize types structurally by saying what needs to be in the types so that you can work with them.
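A minimal sketch of such a structural type, assuming Scala 3, where calls through a structural type need the reflectiveSelectable import shown (Scala 2 used import scala.language.reflectiveCalls instead). FakeFile, FakeChannel, and the using helper are hypothetical; the two classes share no common interface, yet both are accepted because each has a close method that takes no parameters and returns Unit.

```scala
import scala.reflect.Selectable.reflectiveSelectable

// "Anything with a close method that takes no parameters and returns Unit."
type Closable = { def close(): Unit }

// Hypothetical classes with no common supertype other than AnyRef.
class FakeFile(val name: String) {
  var closed = false
  def close(): Unit = { closed = true }
}
class FakeChannel {
  var closed = false
  def close(): Unit = { closed = true }
}

// Loan-pattern helper: runs body, then closes the resource even if body throws.
def using[B](resource: Closable)(body: => B): B =
  try body finally resource.close()
```

The calls to close go through Java reflection at runtime, which is one reason structural types are usually reserved for cases like this where no shared interface exists.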

Existential types

Bill Venners: Existential types were added to Scala relatively recently. The justification I heard for existential types was that they allow you to map all Java types, in particular Java's wildcard types, to Scala types. Are existential types larger than that? Are they a superset of Java's wildcard types? And is there any other reason for them that people should know about?

Martin Odersky: It is hard to say because people don't really have a good conception of what wildcards are. The original wildcard design by Atsushi Igarashi and Mirko Viroli was inspired by existential types. In fact the original paper had an encoding in existential types. But then when the actual final design came out in Java, this connection got lost a little bit. So we don't really know the status of these wildcard types right now.

Existential types have been around for a number of years, about 20 years now. They express something very simple. They say you have a type, maybe a list, with an element type that you don't know. You know it's a list of some specific element type, but you don't know the element type. In Scala that would be expressed with an existential type. The syntax would be List[T] forSome { type T }. That's a bit bulky. The bulky syntax is in fact sort of intentional, because it turns out that existential types are often a bit hard to deal with. Scala has better alternatives. It doesn't need existential types so much, because we can have types that contain other types as members.

Scala needs existential types for essentially three things. The first is that we need to make some sense of Java's wildcards, and existential types are the sense we make of them. The second is that we need to make some sense of Java's raw types, the ungenerified types, because they are also still in the libraries. If you get a Java raw type, such as java.util.List, it is a list where you don't know the element type. That can also be represented in Scala by an existential type. Finally, we need existential types as a way to explain what goes on in the VM at the high level of Scala. Scala uses the erasure model of generics, just like Java, so we don't see the type parameters anymore when programs are run. We have to do erasure because we need to interoperate with Java. But then what happens when we do reflection or want to express what goes on in the VM? We need to be able to represent what the JVM does using the types we have in Scala, and existential types let us do that. They let you talk about types where you don't know certain aspects of those types.

Bill Venners: Can you give a specific example?

Martin Odersky: Take Scala lists as an example. I want to be able to describe the return type of the method head, which returns the first element (the "head") of the list. On the VM level, it is a List[T] forSome { type T }. We don't know what T is, but head returns a T. The theory of existential types tells us that this is a T for some type T, which is equivalent to the root type, Object. So we get this back from the head method. Thus in Scala, when we know something, we can eliminate these existential qualifications. When we don't know something, we leave them in, and the theory of existential types helps us there.
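A small sketch of the same idea in code, assuming Scala 2.13 or Scala 3. Scala 3 dropped the general forSome syntax, but the wildcard form List[_] (the shorthand for List[T] forSome { type T }) is still accepted. Because nothing is known about the element type, the best static type for head is Any, Scala's counterpart of Object. The function names are invented for the example.

```scala
object ExistentialDemo {
  // xs is "a List of some element type we don't know".
  def describe(xs: List[_]): String = xs match {
    case Nil => "empty"
    case _ =>
      // The unknown element type is approximated by its upper bound, Any.
      val first: Any = xs.head
      s"${xs.length} elements, first is $first"
  }

  // Java APIs typed with wildcards or raw java.util.List are seen in Scala
  // as a list of some unknown element type, written java.util.List[_].
  def sizeOfJavaList(xs: java.util.List[_]): Int = xs.size()
}
```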

Bill Venners: Would you have added existential types if you didn't need to worry about the Java compatibility concerns of wildcards, raw types, and erasure? If Java had reified types and no raw types or wildcards, would Scala have existential types?

Martin Odersky: If Java had reified types and no raw types or wildcards, I don't think we would have that much use for existential types and I doubt they would be in Scala.

Variance in Java and Scala

Bill Venners: In Scala variance is defined at the point the class is defined whereas in Java it's done at the usage sites with wildcards. Can you talk about that difference?

Martin Odersky: Because we can model wildcards in Scala with existential types, you actually can, if you want, do the same thing as in Java. But we encourage you not to do that and to use definition-site variance instead. Why? First, what is definition-site variance? When you define a class with a type parameter, for instance List[T], that raises a question. If you have a list of apples, is that also a list of fruit? You would say, yes, of course. If Apple is a subtype of Fruit, List[Apple] should be a subtype of List[Fruit]. That subtyping relationship is called covariance. But in some cases, that relationship doesn't hold. Say I have a variable in which I can put only an Apple, a reference of type Apple. That's not a reference of type Fruit, because I can't just assign any Fruit to this variable. It has to be an Apple. So you can see there are some situations where we should have the subtype relationship, and others where we shouldn't.
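The two situations can be sketched as follows (assuming Scala 2.13 or Scala 3; Fruit, Apple, Orange, and Cell are hypothetical). The standard library's immutable List is declared covariant, so a List[Apple] can be used where a List[Fruit] is expected. A mutable cell, like the Apple-only reference described above, has to stay invariant.

```scala
class Fruit(val name: String)
class Apple extends Fruit("apple")
class Orange extends Fruit("orange")

object VarianceDemo {
  val apples: List[Apple] = List(new Apple, new Apple)
  val fruits: List[Fruit] = apples // compiles: scala.List is covariant

  // A mutable cell is invariant in T. If Cell[Apple] were a Cell[Fruit],
  // code could store an Orange in a cell that promises to hold only Apples.
  final class Cell[T](var value: T)
  val appleCell = new Cell[Apple](new Apple)
  // val fruitCell: Cell[Fruit] = appleCell // does not compile
  // fruitCell.value = new Orange           // the unsoundness this prevents
}
```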

The solution in Scala is we annotate the type parameter. If List is covariant in T, we would write List[+T]. That would mean Lists are covariant in T. There are certain conditions that are attached to that. For instance, we can do that only if nobody changes the list, because otherwise we would get into the same problems that we had with the references.

What happens in Scala is the programmer says, well, I think lists should be covariant, which means they respect the subtype relationships. Then the programmer would decorate the type parameter T with a plus sign at the declaration site, only once for all Lists that anybody ever uses. Then the compiler will go and figure out whether all the definitions within List are actually compatible with that, that there's nothing being done with lists that would conflict with it. If there is something that's incompatible with covariance, the Scala compiler will issue an error. Scala has a range of techniques to deal with those errors, which a competent Scala programmer will pick up fairly quickly and can apply to end up with a class that compiles and is covariant for its users. Users don't have to think about it anymore. They know that if they have a list, they can just use it covariantly everywhere. So there was just one person who wrote the list class and had to think a little bit harder, and it was not so bad because the compiler helped this person with error messages.
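As a sketch of what the library author does (assuming Scala 2.13 or Scala 3; MyList, MyNil, and MyCons are hypothetical stand-ins for the real List), the type parameter is marked +T once, at the declaration. One of the standard techniques for satisfying the compiler's variance check is a lower bound: a naive prepend(x: T) would be rejected because T would appear in a parameter position, so prepend instead widens the element type to some U >: T.

```scala
class Fruit
class Apple extends Fruit
class Orange extends Fruit

sealed abstract class MyList[+T] {
  // Lower bound U >: T keeps the class covariant while still allowing prepend.
  def prepend[U >: T](x: U): MyList[U] = MyCons(x, this)
  def toList: List[T]
}
case object MyNil extends MyList[Nothing] {
  def toList: List[Nothing] = Nil
}
final case class MyCons[+T](head: T, tail: MyList[T]) extends MyList[T] {
  def toList: List[T] = head :: tail.toList
}

object MyListDemo {
  val apples: MyList[Apple] = MyNil.prepend(new Apple).prepend(new Apple)
  val fruits: MyList[Fruit] = apples                    // covariance, no wildcards
  val mixed: MyList[Fruit] = apples.prepend(new Orange) // U is inferred as Fruit
}
```

Users of MyList never see the lower bound unless they read the signature of prepend; they simply pass lists of apples wherever lists of fruit are expected.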

By contrast, the Java approach of having wildcards means that in the library you do nothing. You just write List<T> and that's it. And then if a user wants a covariant list, they write not List<Fruit>, but List<? extends Fruit>. So that's a wildcard. The problem is that that's user code. And these are users who are often not as expert as the library designers. Furthermore, a single mismatch between these annotations gives you type errors. So no wonder you can get a huge number of completely intractable error messages related to wildcards, and I think that this more than anything else has given the current Java generics a bad rap. Because really this wildcard approach is quite complicated for normal humans to grasp and deal with.
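For comparison, here is a sketch of the use-site style written in Scala (assuming Scala 2.13 or Scala 3; Box and totalWeight are hypothetical). Box is declared without any variance annotation, as a Java class would be, so every client signature that wants to accept boxes of any kind of fruit has to spell out the bound itself, the Scala analogue of Java's Box<? extends Fruit>.

```scala
class Fruit { def weight: Int = 1 }
class Apple extends Fruit { override def weight: Int = 2 }

// Deliberately invariant, like a Java generic class with no variance information.
final class Box[T](val contents: T)

object UseSiteDemo {
  // The bound must be repeated at each use site; with List[Box[Fruit]]
  // instead, a Box[Apple] would be rejected.
  def totalWeight(boxes: List[Box[_ <: Fruit]]): Int =
    boxes.map(_.contents.weight).sum

  val boxes: List[Box[_ <: Fruit]] = List(new Box(new Apple), new Box(new Fruit))
}
```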

Variance is something that is essential when you combine generics and subtyping, but it's also complex. There's no way to make this completely trivial. The thing we do better than Java is that we let you do it once in the libraries, so that the users don't have to see or deal with it.

Abstract type members

Bill Venners: In Scala, a type can be a member of another type, just as methods and fields can be members of a type. And in Scala those type members can be abstract, like methods can be abstract in Java. Is there not some overlap between abstract type members and generic type parameters? Why does Scala have both? And what do abstract types give you beyond what the generics give you?

Martin Odersky: Abstract types give you some things beyond what generics give you, but let me first state a somewhat general principle. There have always been two notions of abstraction: parameterization and abstract members. In Java you also have both, but it depends on what you are abstracting over. In Java you have abstract methods, but you can't pass a method as a parameter. You don't have abstract fields, but you can pass a value as a parameter. And similarly you don't have abstract type members, but you can specify a type as a parameter. So in Java you also have all three of these, but there's a distinction about what abstraction principle you can use for what kinds of things. And you could argue that this distinction is fairly arbitrary.

What we did in Scala was try to be more complete and orthogonal. We decided to have the same construction principles for all three sorts of members. So you can have abstract fields as well as value parameters. You can pass methods (or "functions") as parameters, or you can abstract over them. You can specify types as parameters, or you can abstract over them. And what we get conceptually is that we can model one in terms of the other. At least in principle, we can express every sort of parameterization as a form of object-oriented abstraction. So in a sense you could say Scala is a more orthogonal and complete language.
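The symmetry can be sketched like this (assuming Scala 2.13 or Scala 3; Formatter and ParamFormatter are hypothetical). The first version leaves a type, a value, and a method abstract as members; the second passes the same three things in as a type parameter, a value parameter, and a function parameter.

```scala
// Abstraction by members: a type, a value, and a method left abstract.
abstract class Formatter {
  type Input                      // abstract type member
  val prefix: String              // abstract value member
  def render(x: Input): String    // abstract method
  def format(x: Input): String = prefix + render(x)
}

// Abstraction by parameters: the same three things supplied from outside.
class ParamFormatter[Input](prefix: String, render: Input => String) {
  def format(x: Input): String = prefix + render(x)
}

object OrthogonalityDemo {
  val byMembers: Formatter { type Input = Int } = new Formatter {
    type Input = Int
    val prefix: String = "#"
    def render(x: Int): String = x.toString
  }
  val byParams = new ParamFormatter[Int]("#", _.toString)
}
```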

Now the question remains, what does that buy you? What abstract types buy you, in particular, is a nice treatment for these covariance problems we talked about before. One standard problem, which has been around for a long time, is the problem of animals and foods. The puzzle was to have a class Animal with a method, eat, which eats some food. The problem is if we subclass Animal and have a class such as Cow, then it would eat only Grass and not arbitrary food. A Cow couldn't eat a Fish, for instance. What you want is to be able to say that a Cow has an eat method that eats only Grass and not other things. Actually, you can't do that in Java because it turns out you can construct unsound situations, like the problem of assigning a Fruit to an Apple variable that I talked about earlier.

The question is what do you do? The answer is that you add an abstract type into the Animal class. You say, my new Animal class has a type of SuitableFood, which I don't know. So it's an abstract type. You don't give an implementation of the type. Then you have an eat method that eats only SuitableFood. And then in the Cow class I would say, OK, I have a Cow, which extends class Animal, and for Cow type SuitableFood equals Grass. So abstract types provide this notion of a type in a superclass that I don't know, which I then fill in later in subclasses with something I do know.
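The design just described translates almost word for word into code. This is a sketch assuming Scala 2.13 or Scala 3; the class names follow the interview, while the String result of eat is invented so the example has something to return.

```scala
class Food
class Grass extends Food
class Fish extends Food

abstract class Animal {
  type SuitableFood <: Food           // abstract type: unknown here, bounded by Food
  def eat(food: SuitableFood): String
}

class Cow extends Animal {
  type SuitableFood = Grass           // filled in by the subclass
  def eat(food: Grass): String = "munch"
}

object AnimalDemo {
  val bessie = new Cow
  val ok: String = bessie.eat(new Grass)
  // bessie.eat(new Fish)   // does not compile: a Fish is not Grass
  // val a: Animal = bessie
  // a.eat(new Grass)       // does not compile: a.SuitableFood is unknown, so the
  //                        // unsound "feed any Animal any Food" case is ruled out
}
```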

Now you could say, well, I could do the same thing with parameterization. And indeed you can. You could parameterize class Animal with the kind of food it eats. But in practice, when you do that with many different things, it leads to an explosion of parameters and, what's more, usually an explosion in the bounds of those parameters. At the 1998 ECOOP, Kim Bruce, Phil Wadler, and I had a paper where we showed that as you increase the number of things you don't know, the typical program will grow quadratically. So there are very good reasons not to use parameters but to have these abstract members, because they don't give you this quadratic blow-up.
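For comparison, a sketch of the parameterized alternative (assuming Scala 2.13 or Scala 3; the names mirror the previous example and Farm.feed is hypothetical). It works, but the food type and its bound now appear in the class header and must be threaded through every generic client, which is the kind of growth the paper describes once many such unknowns are involved.

```scala
class Food
class Grass extends Food

// The food type is now a parameter with a bound instead of a member.
abstract class Animal[F <: Food] {
  def eat(food: F): String
}

class Cow extends Animal[Grass] {
  def eat(food: Grass): String = "munch"
}

object Farm {
  // Every generic client must mention F and repeat its bound.
  def feed[F <: Food](animal: Animal[F], food: F): String = animal.eat(food)
}
```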

Getting used to the syntax

Bill Venners: When people look at random Scala code, there are two things that I think can make it look a bit cryptic. One is a DSL they are not familiar with, like the parser combinators or the XML library. The other is the kinds of expressions in the type system, especially combinations of things. How can Scala programmers get a handle on that kind of syntax?

Martin Odersky: Certainly there's a lot of new stuff there that has to be learned and absorbed. So this will take some time. I believe one of the things we have to work on is better tool support. Right now when you get a type error, we try to give you a nice error message. Sometimes it spans multiple lines to be able to explain more. We try to do a good job, but I think we could do much better if we could be more interactive.

Imagine if you had a dynamically typed language and you only had three or four lines max for an error message when something went wrong at runtime. There would be no debugger. There would be no stack trace. There would be just three or four lines, such as "null pointer dereference," and maybe the line number where it happened. I don't think dynamic languages would be very popular under those circumstances. Of course, that's not what happens. You're thrown into a debugger where you can quickly find out where the root of the problem is.

For types, we don't have that yet. All we have are these error messages. If you have a very rich and expressive type system that requires more knowledge to make sense of those error messages, you want more help. So one thing we want to investigate in the future is whether we can actually give you a more interactive environment such that if the types go wrong, you could find out why. For example, how the compiler figured out that this expression has this type, and why it doesn't think that this type conforms to some expected type. You could explore these things interactively. I think then the causes of type errors would be much easier to see than they are now.

On the other hand, some syntax is just new and takes some getting used to. That's probably something we can't avoid. We only hope that a couple of years from now these will be types that people take completely naturally and never question. There have been other things in mainstream languages that took some getting used to. I remember very well when exceptions came out, people found them strange. It took a lot of time to get used to them. And now of course everybody thinks they are completely natural. They are not novel anymore. And certainly Scala has a couple of things, mostly in the type side, which take some getting used to.

Next Week

Come back Monday, May 25 for the next installment of this conversation with Martin Odersky.

Resources

Martin Odersky is coauthor of Programming in Scala:

http://www.artima.com/shop/programming_in_scala

The Scala programming language website is at:

http://www.scala-lang.org

The original paper on wildcards is "On Variance-Based Subtyping for Parametric Types", by Atsushi Igarashi and Mirko Viroli. In Proc. of ECOOP'02, Springer LNCS, pages 441-469, 2002:

http://groups.csail.mit.edu/pag/reading-group/variance-ECOOP02.pdf (PDF)

The quadratic growth of programs that occurs as you increase the number of types you don't know is described in "A Statically Safe Alternative to Virtual Types", by Kim Bruce, Philip Wadler, and Martin Odersky. In Proc. of ECOOP'98, Springer LNCS, pages 523-549, 1998:

http://lampwww.epfl.ch/~odersky/papers/alt.ps.gz (Postscript)