March 20, 2011 — Mario Gleichmann

Welcome to another episode of Functional Scala!

What do you know about frogs? Well, I mean beyond the most common facts you learn from books. One way to learn more about frogs might be to dissect one, as you may have done back in school. That would be the analytic way. But there’s a better way: if you really want to learn about the nature of a frog, you should build one! By building a being that has froglike characteristics, you’re going to learn what makes a frog a frog and how frogs are adapted to their particular environment. It’s a perfect example of learning by synthesis!

Well, as this isn’t a series about biology but about Functional Programming with Scala, we’re going to focus on another research object, which will be list-like data structures in a functional environment. We’re going to construct a simple, functional list type and explore its characteristics along the way. So let’s start and play Tinkerbell …

Functional vs imperative data structures

If you’re coming from a more imperative background, chances are good that you’ve already heard of a List. Unfortunately, such lists are usually implemented in an imperative way, too. What does that mean? Take a look at the following example in Java:

List<Integer> primes = new LinkedList<Integer>(){{ add(3); add(5); add(7); }};
List<Integer> oddNums = primes;  // second reference to the same instance
...
oddNums.add( 15 );
primes.add( 2 );

Uhm, what happened? You can’t do that! Well, I shouldn’t do that, but I’m allowed to! What we have here is a nice example of sharing a single list instance. Holding more than one reference to a single list isn’t bad as such. But it might get really, really bad when performing so-called destructive updates in an improper way! In this case, an update via oddNums destroys the initial structure of the original list: the old version of the data structure is no longer available. There’s only one version of that list instance, and it gets updated in place. In doing so, we’ve triggered a side effect, since primes also refers to that one and only mutable version of our list! A data structure which only supports a single version at a time, even across updates, is called ephemeral, and it turns out that most imperative data structures are ephemeral.

The last section should set off some alarm bells in your mind if you think in a more functional way. In a purely functional setting there’s simply no mutable state, since there isn’t anything like (re-)assignment, hence no destructive updates and no side effects at all. Just look back at the algebraic datatypes we’ve produced so far. There, we can’t mutate a given Shape either, for example:

sealed abstract class Shape
case class Circle( radius : Double ) extends Shape
case class Rectangle( width : Double, height : Double ) extends Shape
...
val rectangle = Rectangle( width = 5, height = 2 )
val anotherRectangle = Rectangle( 10, 5 )

Once we’ve created a certain version (or value) of a shape, it stays that way forever: it’s simply immutable. But how can we ever provide a functional version of a list type which allows for updating its content then? Again, we only need to look back at the very basic characteristics of functional programming: all values which ever exist are just … values (in contrast to variables, which refer to a specific area within memory, which is occupied by a value and might be changed in place). So any operation upon a specific value can only produce or result in another value. For a first grasp, just look again at our well-known friend:

val scale : ( Int, Shape ) => Shape =
  ( times :Int, shape :Shape ) => shape match {
    case Circle( r ) => Circle( times * r )
    case Rectangle( w, h ) => Rectangle( times * w, times * h )
  }

val rectangle = Rectangle( 10, 15 )
val anotherRectangle = scale( 2, rectangle )

Aaahhh, by scaling a shape instance, we didn’t mutate the given one in place, but produced a new instance, which is kind of derived from the old one. The same should be true for lists: an operation upon a specific list value might result in a new list value. In other words: updating a certain version of a functional list should produce another, new version of a list without destroying the old one! Such functional data structures also got a fancy name, to contrast them with their ephemeral relatives: an (immutable) data structure which supports multiple versions at a time is called a persistent data structure.
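To see the difference side by side, here is a small sketch that uses collections from Scala’s standard library rather than the list type we’re about to build: scala.collection.mutable.ListBuffer stands in for an ephemeral structure, the built-in immutable List for a persistent one. The value names are made up for illustration only.

```scala
import scala.collection.mutable.ListBuffer

// Ephemeral: both names refer to the one and only version of the buffer.
val primesBuf  = ListBuffer(3, 5, 7)
val oddNumsBuf = primesBuf
// += mutates the buffer in place and hands back the very same buffer.
val sameBuf    = (oddNumsBuf += 15)

// Persistent: prepending yields a new version, the old one stays available.
val primes     = List(3, 5, 7)
val morePrimes = 2 :: primes
```

After running this, primesBuf contains 15 although we never touched it directly, while primes still holds exactly the three elements it was created with.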

A persistent List structure

So how could we achieve a persistent list? First of all, the above examples might give us a first hint: in a functional environment, it’s common to characterize values of a certain type by defining and finally using some value constructors of an algebraic datatype. Now, we need to think hard about the possible structure of a list (or any sequential data structure if you will) and how to represent it as an algebraic datatype. Remember our last example, where we introduced a datatype for representing an infinite set of algebraic expressions? There, we defined some value constructors for atoms (our basic building blocks) and operations, which recursively act on sub-expressions. What about a list? Could we also identify some atoms and some recursive structure, say for a list of integers? Maybe there should be a representation for an empty integer list. So let’s start with that:

sealed abstract class IntList
case object EmptyIntList extends IntList

Ok, now we’re able to create an empty integer list, using that singleton case object EmptyIntList as our first value constructor, hurray! Now comes the interesting part. Think of EmptyIntList as the atom of our list type. Like with our arithmetic expressions, can we create another list which uses the empty list just as a component for creating another, new list instance (just like we used two integer literals and created a new instance of an Add expression)? Well, then we might add another value constructor which just takes our empty list (as a representative of an IntList) and another integer value and say that this is a valid list, too. Observe:

sealed abstract class IntList
case object EmptyIntList extends IntList
case class NonEmptyIntList( hd :Int, tl :IntList ) extends IntList

This second value constructor just takes an arbitrary integer value and an arbitrary, existing list and composes both into a new list instance. If you look carefully at the structure of that value constructor, you might compare the construction of such a list instance to kind of prepending the new integer value to that existing list:

val intList = NonEmptyIntList( 3, NonEmptyIntList( 2, NonEmptyIntList( 1, EmptyIntList ) ) )
...
val anotherIntList = NonEmptyIntList( 4, intList )

From this point of view, you might wanna identify the given integer value as the head of the new list and the already existing list as the tail of the new list. Note that we didn’t destroy intList while constructing anotherIntList! It looks like we prepended another integer value to intList, but the original structure stays untouched. We’ve only used intList as the tail of that newly constructed list. And since both lists are immutable, we can be very confident that no evil rascal might destroy our list instance. Again, this characteristic also received a snazzy name with which you might swank in front of your team members from now on: it’s called structural sharing.
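One way to make that sharing visible is reference equality. The following sketch repeats the IntList definition from above so that it stands on its own, and uses Scala’s eq, which compares references rather than contents. The names sharesTail and unchanged are just illustrative.

```scala
sealed abstract class IntList
case object EmptyIntList extends IntList
case class NonEmptyIntList(hd: Int, tl: IntList) extends IntList

val intList        = NonEmptyIntList(3, NonEmptyIntList(2, NonEmptyIntList(1, EmptyIntList)))
val anotherIntList = NonEmptyIntList(4, intList)

// The tail of the new list is the very same object as the old list: nothing was copied.
val sharesTail = anotherIntList.tl eq intList

// The old version is still fully intact.
val unchanged = intList == NonEmptyIntList(3, NonEmptyIntList(2, NonEmptyIntList(1, EmptyIntList)))
```

Both values evaluate to true: the new list reuses the old one as its tail, and the old one still looks exactly as it did before.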

Going polymorphic

So far, we’re only able to produce lists of integer values. If we wanted to build another list of, say, string values, we couldn’t use our IntList! Because it would be pretty dense to come up with an individual list-like data structure for every type we wanna hold within that list, we might consider another, better solution! In fact, we’re able to leverage parametric polymorphism again and abstract over the type of the list’s content:

sealed abstract class Lst[A]
// explicit (): newer Scala versions reject case classes without a parameter list
case class EmptyLst[A]() extends Lst[A]
case class Cons[A]( head :A, tail :Lst[A] ) extends Lst[A]

Aahhh, by introducing parametric polymorphism, we’re able to produce lists of different types. We just added a type parameter to our new type Lst (remember, it’s a very simple implementation – for a full blown list, we’re going to add that missing i). In that case, our value constructors need to provide that polymorphism and therefore be polymorphic in the type of their elements, too (since the concrete type isn’t chosen until we define a certain list instance). Also note that we’ve renamed our second value constructor to Cons, as it’s a widely used name (for constructing a new list). Let’s take a look at building some list instances of different types:

val intList :Lst[Int] = Cons( 1, Cons( 2, Cons( 3, EmptyLst[Int]() ) ) )
val stringList = Cons( "a", Cons( "b", Cons( "c", EmptyLst[String]() ) ) )

Hm, anything annoying so far? By adding that type parameter, our list-like data structure became a bit more complex: take a look at our atom, the empty list. We needed to change it from a singleton case object to a case class, in order to provide polymorphism. As a consequence, we also need to come up with a different instance of an empty list for every individual type of the list’s content, e.g. an EmptyLst[Int] and an EmptyLst[String]. Since our data structure is immutable, there’s really no need to have more than one instance of an empty list (since all lists we ever create might refer to one and the same immutable empty list object as their final tail).
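A quick sketch of what that means in practice. It repeats the definitions so that it stands on its own, and writes the empty list with an explicit () because current Scala versions reject case classes without a parameter list. The value names are hypothetical.

```scala
sealed abstract class Lst[A]
case class EmptyLst[A]() extends Lst[A]
case class Cons[A](head: A, tail: Lst[A]) extends Lst[A]

// Every constructor call allocates a fresh empty list ...
val emptyInts1 = EmptyLst[Int]()
val emptyInts2 = EmptyLst[Int]()

// ... although the two are equal and carry no information at all.
val equalByValue = emptyInts1 == emptyInts2   // case class equality
val sameInstance = emptyInts1 eq emptyInts2   // reference comparison
```

Here equalByValue is true while sameInstance is false: we pay for separate objects that are indistinguishable in every way that matters.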

Is there a way to get back to the empty list as a single case object and still provide parametric polymorphism? Yes, there is. It turns out that Scala provides a ‘special’ type Nothing, which presents itself as the sub-type of all types. Come again? Well, in almost the same manner as Any is the super-type of all types, Nothing can be considered the sub-type of all types. You build a new type? Nothing is a subtype of it – it always sits at the bottom of your type hierarchy in Scala. So what speaks against leveraging that fact in our list type, like this:

sealed abstract class Lst[A]
case object EmptyLst extends Lst[Nothing]
case class Cons[A]( head :A, tail :Lst[A] ) extends Lst[A]

Wow, the compiler seems to be satisfied with that. EmptyLst is again a singleton case object, extending Lst[Nothing]. And since Nothing is the sub-type of all types, it’s also a sub-type of whatever our type parameter A turns out to be. But take a look at what the compiler says if we now try to come up with a concrete list instance:

val intList = Cons( 1, Cons( 2, Cons( 3, EmptyLst ) ) )
// error: type mismatch; found EmptyLst.type ... required: Lst[Int]

Hold on, what’s that? Didn’t we just say Nothing is the sub-type of all types, so also of type Int? Yep, we did! And that’s why the compiler didn’t complain when declaring EmptyLst that way! But what’s that error message sayin’ then? Well, in this case, we simply didn’t allow the tail of a list to be a Lst of some subtype of A (which is Int in that case) when passing the tail to our value constructor Cons. This is due to the fact that we declared our list type to be invariant in our type parameter A (which I’m not going to explain in this episode, since there are many good resources out there explaining type variance in Scala). So what’s left to do is to declare our type to be covariant in type parameter A and we’re done:

sealed abstract class Lst[+A]
...
val intList = Cons( 1, Cons( 2, Cons( 3, EmptyLst ) ) )
val stringList = Cons( "a", Cons( "b", Cons( "c", EmptyLst ) ) )

See that + before our type parameter A? It’s the sign for telling Scala that our list type will behave covariantly in A. Now constructing some list instances with different types – all referring to our singleton empty list – shouldn’t be a problem anymore! Wow, it seems that we’ve now constructed a full blown persistent, polymorphic, recursive, list-like algebraic datatype.
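Putting the pieces together, here is a sketch of the complete type as it stands now. The length function is not part of this episode; it’s a hypothetical helper, added only to show that one function works for lists of any element type, and that the single EmptyLst object (a Lst[Nothing]) is accepted wherever a Lst[Int] or a Lst[String] is expected.

```scala
sealed abstract class Lst[+A]
case object EmptyLst extends Lst[Nothing]
case class Cons[A](head: A, tail: Lst[A]) extends Lst[A]

// Hypothetical helper (not from the article): counts elements by walking the tails.
def length[A](lst: Lst[A]): Int = lst match {
  case EmptyLst      => 0
  case Cons(_, tail) => 1 + length(tail)
}

val intList    = Cons(1, Cons(2, Cons(3, EmptyLst)))
val stringList = Cons("a", Cons("b", EmptyLst))

// Covariance at work: Lst[Nothing] conforms to Lst[Int] and to Lst[String].
val noInts: Lst[Int]       = EmptyLst
val noStrings: Lst[String] = EmptyLst
```

Without the + on A, the two typed empty lists at the bottom (and both Cons chains) would be rejected by the compiler. With it, noInts and noStrings are literally the same object.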

Summary

Hey, we didn’t do badly as Tinkerbells! We’ve just accomplished the first step on our way to understanding the deeper meaning and characteristics of persistent list-like data structures! We started with a comparison of ephemeral, destructive data structures (which are widely used in the imperative world) and so-called persistent data structures, coming from the functional world! While ephemeral data structures are mutable and therefore change in place over time, a persistent data structure can be seen as an immutable value: operating upon it results in another, independent value and leaves the original one untouched. The new value might be seen as a consecutive version of the original one. Both versions then co-exist in parallel, maybe exploiting some structural sharing.

On our way to setting up an appropriate list type, we saw that parametric polymorphism may be a good option to come up with a single algebraic datatype which abstracts over the type of the list’s content. By introducing a type parameter, our list type became what’s called a type constructor: from now on, we need to give a concrete type (for the list’s content) in order to get a proper list type. That’s what usually happens when you move from a monomorphic type (like IntList) to a polymorphic type: Lst alone isn’t a usable type anymore. We first need to apply it to a type argument in order to get a full blown type (like Lst[Int]).

We’re far from finished yet. What about ‘updating’ or ‘replacing’ a single element within a given list? What about the concatenation of two lists? Can we still rely on structural sharing? And what about the cost of those operations? Is there a way to add some syntactic sugar to our list type, say when constructing lists or doing pattern matching on them? As these are all valid questions, we’re going to answer them in the next episode, while also taking a closer look at some basic and more advanced functions on lists. So don’t be afraid of the frog …