Computing Thoughts

Scala: The Static Language that Feels Dynamic

by Bruce Eckel

June 12, 2011



Summary

The highest compliment you can deliver in the Python world is to say that something is "Pythonic" -- that it feels and fits into the Python way of thinking. I never imagined that a static language could feel this way, but Scala does -- and possibly even better.


I'm actually glad I waited this long before beginning to learn the language, because they've sorted out a lot of issues in the meantime. In fact, several versions of the language have broken compatibility with previous versions, requiring code rewrites. Some people have found this shocking, an indication that the language is "immature" and "not ready for the enterprise." I find it one of the most promising things about Scala -- it is not determined to become an instant boat anchor by committing to early decisions that are later revealed to be suboptimal, or outright mistakes. Java is the perfect case study, unable to pry its cold, dead fingers from old decisions made badly in a rush to meet an imagined deadline imposed by the Internet. C++ was admirable when it determined to be C-compatible because it brought legions of C programmers into the world of object-oriented programming, but coping with the resulting hurdles is no longer a good use of programmer time.

Indeed, I grew tired of the whole mindset that language design is more important than programmer time; that a programmer should work for the language rather than the reverse. So much so that I thought I had grown out of programming altogether. But now I think I might just have been tired of the old generation of languages and waiting for the next generation -- and especially the forward-thinking around those languages.

If you've read my past writings, you know I am unimpressed with arguments about static type checking for its own sake, which typically come down to "if I can't know X is an int, then the world will collapse!" I've written and seen enough robust code in Python to be unswayed by such histrionics; the payoff for all the hoop-jumping in C++ and Java seems small compared to what can be accomplished using far less, and much clearer, Python code.

Scala is the first language I've seen where static type-checking seems to pay off. Some of its amazing contortional abilities would not, I think, be possible without static type checking. And, as I shall attempt to show in this article, the static checking is relatively unobtrusive -- so much so that programming in Scala almost feels like programming in a dynamic language like Python.

It's Not "Just About Finger Typing"

One retort I've gotten a lot when I discuss the shortcomings of Java compared with a language like Python is "oh, you're just complaining about Finger Typing" (as opposed to the "typing" of type-checking). You can trivialize "finger typing" but in my experience it really does make a big difference when you can take an idea and express it in a few keystrokes versus the veritable vomiting of code necessary to express even the simplest concepts in Java. The real problem is not the number of keystrokes, but the mental load. By the time you've jumped through all those hoops, you've forgotten what you were actually trying to do. Often, the ceremony involved in doing something will dissuade you from trying it.

Scala removes as much of the overhead (and mental load) as possible, so you can express higher-order concepts as quickly as you can type them. I was amazed to discover that in many cases, Scala is even more succinct than Python.

The result of all this is something I've always loved about Python: the level of abstraction is such that you can typically express an idea in code more easily and clearly than you can by making diagrams on a whiteboard. There's no need for that intermediate step.

Let's look at an example. Suppose you'd like to model buildings. We can say:

    class Building
    val b = new Building

Note the absolute minimum amount of ceremony to create a class -- great when you're just sketching out a solution. If you don't need parens, you don't write them. A val is immutable, which is preferred in Scala because it makes concurrent code easier to write (there is also var for variables). And notice that I didn't have to put any type information on b, because Scala has type inference so if it can figure out the type for you, it will. No more jumping through hoops to satisfy a lazy language.
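A minimal sketch of val, var and type inference side by side, written as a script like the other listings here. The names explicit and count are made up for illustration and are not part of the original example.

```scala
class Building

val b = new Building           // type inferred as Building
val explicit: Building = b     // the same kind of declaration with the type written out

var count = 0                  // a var may be reassigned
count += 1

// b = new Building            // would not compile: reassignment to val

println(count) // 1
```

The commented-out line is the point of preferring val: the compiler rejects the reassignment before the program ever runs.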
If we want the Building to know how many square feet it contains, there's an explicit way:

    class Building(feet: Int) {
      val squareFeet = feet
    }
    val b = new Building(100)
    println(b.squareFeet)

When you do need to provide type information, you just give it after a colon. Note that println() does not require Java's System.out scoping. And class fields default to public -- which is not a big deal if you can stick to val, since that makes it read-only. You can always make them private if you want, and Scala has more fine-grained access control than any language I've seen.

If all you want to do is store the argument in the class, as above, Scala makes it easy. Note the addition of the val in the argument list:

    class Building(val feet: Int)
    val b = new Building(100)
    println(b.feet)

Now feet automatically becomes the field. But it doesn't stop there. Scala has the case class which does even more for you. For one thing, arguments automatically become fields, without saying val before them:

    case class Building(feet: Int)
    val b = Building(100)
    println(b) // Result: Building(100)

Note the new is no longer necessary to create an object, the same form that Python uses. And case classes override toString for you, to produce nice output.

But wait, there's more! A case class automatically gets an appropriate hashCode and == so you can use it in a Map (the -> separates keys from values):

    val m = Map(Building(5000) -> "Big",
                Building(900) -> "Small",
                Building(2500) -> "Medium")
    m(Building(900)) // Result: Small

Note that Map is available (along with List, Vector, Set, println() and more) as part of the "basic Scala building set" that comes without any imports. Again, this feels like Python.

Inheritance is also succinct.
Suppose we want to subclass Building to make a House class:

    class House(feet: Int) extends Building(feet)
    val h = new House(100)
    println(h.feet) // Result: 100

Although the extends keyword is familiar from Java, notice how the base-class constructor is called -- a pretty obvious way to do it, once you've seen it. And again, you don't write any more code than what is absolutely necessary to describe your system.

We can also mix in behavior using traits. A trait is much like an interface, except that traits can contain method definitions, which can then be combined when creating a class. Here are several traits to help describe a house:

    trait Bathroom
    trait Kitchen
    trait Bedroom {
      def occupants() = { 1 }
    }
    class House(feet: Int) extends Building(feet) with Bathroom with Kitchen with Bedroom
    val h = new House(100)
    val o = h.occupants()
    val feet = h.feet

occupants() is a typical Scala method definition: the keyword def followed by the method name, argument list, and then an = and the body of the method in curly braces. The value of the last expression in the method becomes the return value. More type inference is happening here; if we wanted to be more specific we could specify the return type of the method:

    def occupants(): Int = { 1 }

Notice that the method occupants() is now part of House, via the mixin effect of traits.

Consider how simple this code is ... and how undistracting. You can talk about what it's doing, rather than explaining meaningless syntactic requirements as you must do in Java. Creating a model takes no more than a few lines of straightforward code. Wouldn't you rather teach this to a novice programmer than Java?
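As a small sketch of the mixin effect, written as a script: the meals() method and the menu() function below are hypothetical additions, there only to give Kitchen some behavior and to show a House being used through one of its traits. The House constructor parameter is named sqFeet here simply to keep it distinct from the inherited feet field.

```scala
class Building(val feet: Int)

trait Bedroom {
  def occupants(): Int = 1
}

trait Kitchen {
  // Hypothetical method, added only so this trait carries some behavior
  def meals(): List[String] = List("breakfast", "dinner")
}

class House(sqFeet: Int) extends Building(sqFeet) with Bedroom with Kitchen

val h = new House(1200)

// A House can be passed wherever one of its traits is expected.
def menu(k: Kitchen): String = k.meals().mkString(", ")

println(h.feet)        // 1200
println(h.occupants()) // 1
println(menu(h))       // breakfast, dinner
```

The class declaration itself stays one line long; all the behavior arrives through the traits named after with.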

Functional Programming

Functional programming is often promoted first as a way to do concurrency. However, I've found it to be more fundamentally useful as a way to decompose programming problems. Indeed, C++ has had functional programming virtually from inception, in the form of the STL, without built-in support for concurrency. Python also has significant functional programming libraries but these are independent of its thread support (which, since Python cannot support true parallelism, is primarily for code organization). Scala has the best of both worlds: true multiprocessor parallelism and a powerful functional programming model -- but one that does not force you to program functionally if it's not appropriate.

When approaching a functional style of programming, I think it's important to go slow and be gentle with yourself. If you push too hard you can get tied up in knots. In fact, I think one of the great benefits of learning functional programming is that it disciplines you to break a problem into small, provable steps -- and to use existing (and proven) code for each of those steps whenever possible. This not only makes your non-functional code better, but it also tends to make everything you write more testable, since functional programming focuses on transforming data (thus, after each transformation, you have something else to test).

Much of functional programming involves performing operations on collections. Suppose, for example, you have a Vector of data:

    val v = Vector(1.1, 2.2, 3.3, 4.4)

You can certainly print this using a for loop:

    for(n <- v) {
      println(n)
    }

The left-arrow can be pronounced "in" -- n gets each value in v. This syntax is definitely a step up from having to give every detail as you had to do in C++ and Java (note that Scala does all the creation and type-inference for n). But with functional programming, you extract the looping structures altogether.
Scala collections and iterables have a large selection of operations to do this for you. One of the simplest is foreach, which performs an operation on each element in the collection. So the above code becomes:

    v.foreach(println)

This actually uses several shortcuts, and to take full advantage of functional programming you first need to understand the anonymous function -- a function without a name. Here's the basic form:

    ( function parameters ) => function body

The => is often pronounced "rocket," and it means, "Take the parameters on the left and apply them in the code on the right." An anonymous function can be large; if you have multiple lines, just put the body inside curly braces.

Here's a simple example of an anonymous function:

    (x:Int, y:Double) => x * y

The previous foreach call is, stated explicitly:

    v.foreach((n:Double) => println(n))

Usually, you can rely on Scala to do type inference on the argument -- in this case Scala can see that v contains Double so it can infer that n is a Double:

    v.foreach((n) => println(n))

If you only have a single argument, you can omit the parentheses:

    v.foreach(n => println(n))

When you have a single argument, you can leave out the parameter list altogether and use an underscore in the anonymous function body:

    v.foreach(println(_))

And finally, if the function body is just a call to a single function that takes one parameter, you can eliminate the parameter list, which brings us back to:

    v.foreach(println)

With all these options and the density possible in functional programming, it's easy to succumb to fits of cleverness and end up writing obscure code that will cause people to reject the language as too complex. But with some effort and focus on readability this doesn't need to happen.

foreach relies on side effects and doesn't return anything.
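Because foreach works only through side effects, one way to convince yourself that the shorthand forms really are equivalent is to have each call append to a buffer and then compare what was recorded. This is only a sketch, written as a script; record, seen and scale are hypothetical names introduced for the illustration.

```scala
import scala.collection.mutable.ArrayBuffer

val v = Vector(1.1, 2.2, 3.3, 4.4)

// An anonymous function is an ordinary value, so it can be stored and called.
val scale = (x: Int, y: Double) => x * y
println(scale(3, 1.5)) // 4.5

// Hypothetical helper: records each element instead of printing it.
val seen = ArrayBuffer[Double]()
def record(d: Double): Unit = { seen += d; () }

v.foreach((n: Double) => record(n)) // fully explicit
v.foreach((n) => record(n))         // parameter type inferred
v.foreach(n => record(n))           // parentheses dropped
v.foreach(record(_))                // underscore placeholder
v.foreach(record)                   // method name alone

// Five calls over four elements, each visiting the same values in order.
println(seen.size)                                // 20
println(seen.grouped(4).forall(_.toVector == v))  // true
```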
In more typical functional programming you'll perform operations (usually on a collection) and return the result, then perform operations on that result and return something else, etc. One of the most useful functional tools is map, rather unfortunately named because it's easy to confuse with the Map data structure. map performs an operation on each element in a sequence, just like foreach, but map creates and returns a new sequence from the result. For example:

    v.map(n => n * 2)

multiplies each element in v by 2 and returns the result, producing:

    Vector(2.2, 4.4, 6.6, 8.8)

Again, using shortcuts we can reduce the call to:

    v.map(_ * 2)

There are a number of operations that are simple enough to be called without parameters, such as:

    v.reverse
    v.sum
    v.sorted
    v.min
    v.max
    v.size
    v.isEmpty

Operations like reverse and sorted return a new Vector and leave the original untouched.

It's common to see operations chained together. For example, permutations produces an iterator that selects all the different permutations of v. To display these, we pass the iterator to foreach:

    v.permutations.foreach(println)

Another helpful function is zip, which takes two sequences and puts each adjacent element together, like a zipper. This:

    Vector(1,2,3).zip(Vector(4,5,6))

produces:

    Vector((1,4), (2,5), (3,6))

(Yes, the parenthesized groups within the Vector are tuples, just like in Python). We can get fancy, and zip the elements of v together with those elements multiplied by 2:

    v.zip(v.map(_ * 2))

which produces:

    Vector((1.1,2.2), (2.2,4.4), (3.3,6.6), (4.4,8.8))

It's important to know that anonymous functions are a convenience, and very commonly used, but they are not essential for doing functional programming. If anonymous functions are making your code too complicated, you can always define a named function and pass that. For example:

    def timesTwo(d: Double) = d * 2

(This uses another Scala shortcut: if the function body is a single expression, you don't need curly braces).
This can be used instead of the anonymous function:

    v.zip(v.map(timesTwo))

You know you could produce the same effect as the code in this section using for loops. One of the biggest benefits of functional programming is that it takes care of the fiddly code -- the very code that seems to involve the kind of common errors that easily escape our notice. You're able to use the functional pieces as reliable building blocks, and create robust code more quickly. It certainly is easy for functional code to rapidly devolve into unreadability, but with some effort you can keep it clear.

For me, one of the best things about functional programming is the mental discipline that it produces. I find it helps me learn to break problems down into small, testable pieces, and clarifies my analysis. For that reason alone, it is a worthwhile practice.
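To make the comparison with loops concrete, here is a sketch (as a script) that builds the same zipped result both ways. The looped and zipped names are just for this illustration; the loop version is one plausible hand-written equivalent, not the only one.

```scala
val v = Vector(1.1, 2.2, 3.3, 4.4)
def timesTwo(d: Double) = d * 2

// Functional version: no indices, no mutable state.
val zipped = v.zip(v.map(timesTwo))

// Loop version of the same result: the index and the accumulator are ours to get right.
var looped = Vector[(Double, Double)]()
for (i <- 0 until v.size) {
  looped = looped :+ ((v(i), timesTwo(v(i))))
}

println(zipped == looped) // true

// Chaining keeps each step small enough to test by itself.
println(v.map(timesTwo).reverse.max) // 8.8
```

Every line of the loop is a place where an off-by-one or a stale accumulator could hide, which is exactly the fiddly code the library operations take off your hands.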

Pattern Matching

It's amazing how long programmers have put up with stone-age (or more appropriately, assembly-age) language constructs. The switch statement is an excellent example. Seriously, jumping around based on an integral value? How much effort does that really save me? People have begged for things as simple as switching on strings, but this is usually met with "no" from the language designers.

Scala leapfrogs all that with the match statement, which looks much like a switch statement except that it can select on just about anything. The clarity and code savings are huge:

    // PatternMatching.scala (Run as script: scala PatternMatching.scala)
    trait Color
    case class Red(saturation: Int) extends Color
    case class Green(saturation: Int) extends Color
    case class Blue(saturation: Int) extends Color

    def matcher(arg:Any): String = arg match {
      case "Chowder" => "Make with clams"
      case x: Int => "An Int with value " + x
      case Red(100) => "Red sat 100"
      case Green(s) => "Green sat " + s
      case c: Color => "Some Color: " + c
      case w: Any => "Whatever: " + w
      case _ => "Default, but Any captures all"
    }

    val v = Vector(1, "Chowder", Red(100), Green(50), Blue(0), 3.14)
    v.foreach(x => println(matcher(x)))

A case class is especially useful because the pattern matcher can decompose it, as you'll see.

Any is the root class of all objects including what would be "primitive" types in Java. Since matcher() takes an Any we can be confident that it will handle any type that we pass in.

Ordinarily you'd see an opening curly brace right after the = sign, to surround the entire function body in curly braces. In this case, the function body is a single statement so I can take a shortcut and leave off the outer braces.

A pattern-matching statement starts with the object you want to match against (this can be a tuple), the match keyword and a body consisting of a sequence of case statements.
Each case begins with the match pattern, then a rocket and one or more lines of code which execute upon matching. The last line in each case produces a return value.

Match expressions can take many forms, only a few of which are shown here. First, you see a simple string match; however Scala has sophisticated regular expression syntax and you can use regular expressions as match expressions, including picking out the pieces into variables. You can capture the result of a match into a variable as in case x: Int. Case classes can produce an exact match as in Red(100) or you can pick out the constructor arguments as in Green(s). You can also match against traits, as in c: Color.

You have two choices if you want to catch everything else. To capture into a variable, you can match Any, as in case w: Any. If you don't care what the value is, you can just say case _.

Note that no "break" statement is necessary at the end of each case body.
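Two of the forms mentioned above but not shown in the listing are regular expressions as patterns and matching against a tuple. A minimal sketch of both, as a script; the Dimensions pattern and the describe and classify functions are hypothetical names chosen for this illustration.

```scala
// Hypothetical pattern: two runs of digits separated by an "x", as in "30x40".
val Dimensions = """(\d+)x(\d+)""".r

// A regular expression used as a pattern; the groups are picked out into w and h.
def describe(text: String): String = text match {
  case Dimensions(w, h) => "Width " + w + ", height " + h
  case _                => "Not a dimension: " + text
}

// Matching against a tuple: literals and variables can be mixed freely.
def classify(pair: (String, Int)): String = pair match {
  case ("house", 0)    => "An empty house"
  case ("house", feet) => "A house of " + feet + " square feet"
  case (kind, feet)    => "Some " + kind + " of " + feet + " square feet"
}

println(describe("30x40"))        // Width 30, height 40
println(describe("large"))        // Not a dimension: large
println(classify(("house", 900))) // A house of 900 square feet
println(classify(("barn", 2500))) // Some barn of 2500 square feet
```

Note that the regular expression has to match the whole string for the case to be selected, and that the captured groups arrive as strings.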

Concurrency with Actors

Most of what drove me away from programming were things I had figured out but couldn't convincingly express to others. Things that the Ph.D. computer scientists ought to be proving. Such as:

Beyond a certain level of program complexity, you must have a garbage collector. This could be as simple as any program where objects can belong to more than one collection, but at some point I believe it becomes impossible to manage memory yourself. (C++ people didn't buy this one, although C++0x has hooks now for garbage collection).

Checked exceptions are a failed experiment. For small programs they seem like a good idea, but they don't scale up well.

Shared-memory concurrency is impossible to get right. In theory the smartest programmer in the world could play whack-a-mole long enough to chase down and patch all the race conditions. But then, all you have to do is change the program a little and everything comes back. Shared-memory is just the wrong model for concurrency.

Note that all these are issues of scale -- things that work in the small start falling apart as programs get bigger or more complex. That's probably why they're hard to argue about, because demonstration examples can be small and obvious.

It turns out I was arguing with the wrong people. Or rather, the right people were not arguing about it, they were off fixing the problems.

When it comes to concurrency, the right answer is one that you can't screw up: you live behind a safe wall, and messages get safely passed back and forth over the wall. You don't have to think about whether something is going to lock up (not on a low level, anyway); you live in your little walled garden which happens to run with its own thread.

The most object-ish approach to this that I've seen is actors. An actor is an object that has an incoming message queue, often referred to as a "mailbox." When someone outside your walled garden wants you to do something, they send you a message that safely appears in your mailbox, and you decide how to handle that message. You can send messages to other actors through their mailboxes. As long as you keep everything within your walls and only communicate through messages, you're safe.

To create an actor, you inherit from the Actor class and define an act() method, which is called to handle mailbox messages. Here's the most trivial example I could think of:

    // Bunnies.scala (Run as script: scala Bunnies.scala)
    case object Hop
    case object Stop

    case class Bunny(id: Int) extends scala.actors.Actor {
      this ! Hop // Constructor code
      start()    // ditto
      def act() {
        loop {
          react {
            case Hop =>
              print(this + " ")
              this ! Hop
              Thread.sleep(500)
            case Stop =>
              println("Stopping " + this)
              exit()
          }
        }
      }
    }

    val bunnies = Range(0,10).map(new Bunny(_))
    println("Press RETURN to quit")
    readLine
    bunnies.foreach(_ ! Stop)

The act() method is automatically a match statement, although this is not built into the language -- Scala magic was used to make the Actor library work this way. Because of the match statement, case objects work especially well as messages (although, as with any match statement, you can match on virtually anything) -- a case object is just like a case class except that defining one automatically creates a singleton object.

The loop{ react{ construct looks a little strange at first; this is an artifact of the evolution of Scala actors. In the initial design, you only had a loop to open the match statement for mailbox messages. But later, in an act of brilliance, it was determined that the concurrency provided by threads could be combined with cooperative multitasking, wherein a single thread of control is passed around -- cooperatively -- among tasks. Each task does something and then explicitly gives up control, which is then passed to the next task. The benefit of cooperative multitasking is that it requires virtually no stack space or context switching time and thus it can scale up -- often to millions of tasks. By combining this with threaded concurrency, you get the best of both worlds: the speed and scalability of cooperative tasks, which are also distributed across as many processors as are available. This all comes transparently.

The loop{ react{ construct should be your default choice, and doesn't cost anything. I suspect if they were creating actors from scratch now, this construct would probably have been simplified into just loop{.

Note the two "naked" lines of code at the beginning of class Bunny. In Scala, you don't have to put object initialization code inside a special method, and you can put it anywhere inside the body of the class.
The first line uses the Actor operator ! for sending messages, and in this case the object sends a message to itself, to get things going. Then it calls start() to begin the actor's message loop.

When the actor receives a Hop message, it prints itself, sends itself another Hop message, then sleeps for half a second. When it gets a Stop message it calls Actor.exit() to stop the event loop.

To create all the Bunny objects I use Range() to create a sequence from 0 through 9, which is mapped onto calls to Bunny constructors. readLine waits for the user to press a carriage return, at which point a Stop message is sent to each Bunny.

Scala 2.9 includes parallel collections, a powerful way to easily use multiple processors for bulk operations like foreach, map, etc. Suppose you have a collection of data objects called toBeProcessed and an expensive function process. To automatically parallelize the processing, you just add a .par:

    val result = toBeProcessed.par.map(obj => process(obj))

If you know that you have objects that can be processed in parallel, this construct makes it effortless. You can find out more about parallel collections in this Scala Days 2010 video.

Even more powerful is the Akka library, which builds concurrent systems that are, among other things, transparently remoteable.

Scala is the best solution for concurrent programming that I've seen, and it keeps getting better.
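To make the mailbox idea itself concrete without relying on the actors library, here is a rough sketch that hand-rolls one: a blocking queue drained by a single thread. This is not the scala.actors API and it has none of the react scalability described above; MiniBunny, MailboxDemo and hopsAfterStop are hypothetical names, and java.util.concurrent.LinkedBlockingQueue stands in for the mailbox. It is wrapped in an object with a main method, rather than written as a script, so that the worker threads start only after the enclosing object is fully initialized.

```scala
import java.util.concurrent.LinkedBlockingQueue

object MailboxDemo {
  case object Hop
  case object Stop

  // Hypothetical MiniBunny: one thread draining one queue. Not the scala.actors API.
  class MiniBunny(val id: Int) {
    private val mailbox = new LinkedBlockingQueue[Any]()
    private var hopCount = 0 // touched only by the worker thread until it ends

    private val worker = new Thread(new Runnable {
      def run(): Unit = {
        var running = true
        while (running) {
          mailbox.take() match { // blocks until a message arrives
            case Hop  => hopCount += 1
            case Stop => running = false
            case _    => () // ignore anything unexpected
          }
        }
      }
    })
    worker.start()

    // Callers only ever touch the mailbox, never the state behind the wall.
    def !(msg: Any): Unit = mailbox.put(msg)

    // Waits for the worker to finish, then reports how often it hopped.
    def hopsAfterStop(): Int = { worker.join(); hopCount }
  }

  def run(): List[Int] = {
    val bunnies = Range(0, 3).map(new MiniBunny(_))
    bunnies.foreach { b => b ! Hop; b ! Hop; b ! Stop }
    bunnies.map(_.hopsAfterStop()).toList
  }

  def main(args: Array[String]): Unit =
    println(run()) // List(2, 2, 2)
}
```

The only thing shared between threads is the queue, which is designed for exactly that; the counter lives behind the wall and is read from outside only after the worker has stopped.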

But Scala is so Complex!

Scala does suffer from the mistaken idea that it's complicated, and for good reason. Many early adopters have been language enthusiasts who love to show how clever they are, and this only confuses beginners. But you can see from the code above that learning Scala should be a lot easier than learning Java! There's none of the horrible Java ceremony necessary just to write "Hello, world!" -- in Scala you can actually create a one-line script that says:

    println("Hello, world!")

Or you can run it in Scala's interactive interpreter, which allows you to easily experiment with the language.

Or consider opening a file and processing the contents (something that's also very high-ceremony in Java):

    val fileLines = io.Source.fromFile("Colors.scala").getLines.toList
    fileLines.foreach(println)

(The "processing" in this case is just printing each line). The simplicity of the code required to open a file and read all the lines, combined with the power of the language, suggests that Scala can be very useful for solving scripting problems (also Scala has strong native support for XML).

We can make some small modifications to create a word-count program:

    for(file <- args) {
      print(file + ": ")
      // Join lines with a space so the last word of one line
      // doesn't fuse with the first word of the next
      val contents = io.Source.fromFile(file).getLines.mkString(" ")
      println(contents.split(" ").length)
    }

args is available to all programs and contains all the command-line arguments, so this program steps through them one at a time. Here, we split words at single spaces but Scala also has regular expressions.

It is possible to write complex code that requires expertise to unravel. But it's totally unnecessary to write such code when teaching beginners. Indeed, if taught right a person should come away from Scala thinking that it is a simpler, more consistent language than the alternatives.
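As a sketch of the regular-expression route mentioned above, the word count can split on any run of whitespace instead of single spaces, which also copes with tabs, newlines and doubled spaces. This is written as a script; countWords and countFile are hypothetical helper names, not part of the original program.

```scala
// Hypothetical helper: counts whitespace-separated words in a string.
// "\\s+" is a regular expression matching one or more whitespace characters.
def countWords(text: String): Int =
  text.split("\\s+").count(_.nonEmpty)

// Hypothetical helper: reads a whole file, counts its words, and closes it afterwards.
def countFile(path: String): Int = {
  val source = scala.io.Source.fromFile(path)
  try countWords(source.mkString) finally source.close()
}

println(countWords("the quick  brown\nfox")) // 4
println(countWords(""))                      // 0
```

The count(_.nonEmpty) step is there because splitting an empty string, or one with leading whitespace, yields an empty first element that should not be counted as a word.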

How to Learn

All the Scala tutorials I encountered assume that you are a Java programmer. This is unfortunate because, as I've shown above, Scala could be taught as a first language in a much less confusing way than we are forced to teach Java. But it does make it easier for writers to assume that you know how to program, and in Java. I found Daniel Spiewak's series of blog posts titled Scala for Java Refugees to be a very helpful starting point.

Next, I read Programming in Scala, 2nd Edition by Martin Odersky, Lex Spoon, and Bill Venners. Odersky is the creator of the language so this is the authoritative book to read. It definitely assumes you're a Java programmer and it's not a particularly introductory book but it's worth pushing through to give you a fuller perspective on what Scala can do.

The Scala language website has a nice set of free tutorials which I've been finding very useful.

I've also looked for different views by reading other books, for example Programming Scala by Venkat Subramaniam. Although this one definitely suffers from too much cleverness (in the first chapter he gives a very obscure example, then tells you not to worry about it, then says to study and understand it) and should certainly not be your first book, I found the different perspective to be helpful.

After reading those, I took the Stairway to Scala workshop by Bill Venners (coauthor of Programming in Scala) and Dick Wall (leader of the Java Posse) in Ann Arbor in May. There's one coming up August 8-12 in San Francisco; you can find out more and register here: http://www.artima.com/shop/stairway_to_scala. This is not an especially introductory class; you should have programming experience, ideally in Java, because that's what they refer to most. I strongly recommend reading Programming in Scala beforehand, and other books if you can manage it. Even if you don't understand them in depth, exposing your brain early will allow some concepts to become more comfortable. It made a huge difference for me to do as much study as I did before the seminar.

There are also three different "tents" at the "Scala Campsite" during the Programming Summer Camp.

There are language features that I have only touched on here, or not covered at all. What I've shown should either give you the urge to learn and use Scala, or it will have you running back to the safety of your favorite language.


About the Blogger

Bruce Eckel (www.BruceEckel.com) provides development assistance in Python with user interfaces in Flex. He is the author of Thinking in Java (Prentice-Hall, 1998, 2nd Edition, 2000, 3rd Edition, 2003, 4th Edition, 2005), the Hands-On Java Seminar CD ROM (available on the Web site), Thinking in C++ (PH 1995; 2nd edition 2000, Volume 2 with Chuck Allison, 2003), C++ Inside & Out (Osborne/McGraw-Hill 1993), among others. He's given hundreds of presentations throughout the world, published over 150 articles in numerous magazines, was a founding member of the ANSI/ISO C++ committee and speaks regularly at conferences.

This weblog entry is Copyright © 2011 Bruce Eckel. All rights reserved.