January 30, 2011 — Mario Gleichmann

Welcome to another episode of Functional Scala!

This episode is the first part of at least two episodes we’re gonna take a deeper look at what’s called Algebraic datatypes and how you could build some of them on your own in Scala. An algebraic datatype reflects the idea, that a type is – loosely spoken – a set of certain (data) values. So in it’s very simplest form, an algebraic datatype can be defined by giving an enumeration of all the individual data values for a given type (of course you always also need to give the type an appropriate name).

Let’s start right away by defining an own enumerated type. We need a type which consists of a finite set of values (otherwise it would be hard to enumerate them all, right?). Well, we all know that Scala already provides a type Boolean. For demonstration sake, let’s redefine that type as an algebraic datatype and call’em Bool. First of all, the single values of an algebraic datatype are only data – no behaviour at all – maybe that’s why they called datatypes …

In order to give you an impression how easily they can be defined in a pureley functional Language, let’s take a look at how Bool is defined in Haskell as an algebraic datatype and then define Bool in Scala afterwards:

data Bool = False | True

Wow, that was pretty short. What you saw was the introduction of a new datatype by using Haskell’s keyword data followed by the types name Bool. After the equals sign starts the enumeration of all values. In fact they are called value constructors because they’ll construct the individual values of the given type. Bool is a nice example of a so called sum type, because it can be characterized as the sum of all individual and unique values, which we just enumerated.

In Scala, you can define an equal algebraic datatype by using case classes. The fact that we’re using case classes will be more deeply motivated within one of the following episodes (right after we’ve grasped algebraic datatypes), where we’ll going to destruct values of a given datatype, using pattern matching (in fact, the idea of pattern matching is deeply connected to the idea of algebraic datatypes). For now, let’s see how we can define and use (or construct) algebraic datatypes in Scala. it goes something like this:

sealed abstract class Bool case object True extends Bool case object False extends Bool

Because Scala is a hybrid OO – functional Language, it doesn’t support algebraic datatypes in a direct way (like in Haskell). Instead we’re using some constructs out of the OO world – first and foremost the concept of classes and objects: our new type Bool itself is introduced by the abstract class Bool. You may wonder why it’s defined abstract. That’s because we don’t want to see an instance of that base class at runtime. Class Bool is only for introducing the type, not its values. All single values of that type are then defined by a single case object for each of them. Why objects, not classes? This time, it’s the other way round: they are representing the single values, not a certain type – and as long as there’s only one instance for every and each value within a type, they’re defined as singleton objects.

Wow, our first algebraic datatype in Scala! Now lay back and relax … unless you’re asking yourself how you’re able to work with them, now that Bool‘s in the world. Well, working with them means using some functions whose outcome is based (calculated) on Bool. For a first simple use case, let’s consider some boolean functions, based on our new type Bool:

val not : Bool => Bool = ( b : Bool ) => if( b eq True ) False else True val and : ( Bool, Bool ) => Bool = ( a :Bool, b: Bool ) => if( a eq False ) False else b val or : ( Bool, Bool ) => Bool = ( a :Bool, b: Bool ) => if( a eq True ) True else b ... println( and( or( True, not( True ) ), not( False ) ) )

Two points here: Since our values are defined as singleton objects, we’re able to compare those values by using the case objects inherited method eq (we’re living in a OO – functional world, i told you so!). Secondly, it really looks a little paradox to compare two values of type Bool using eq only to receive a value of type Boolean which is then used to decide the if expression. Trust me – as we’ll see, we really only need pattern matching to get rid of eq and that if expression (and hence of Boolean). It will turn out, that with pattern matching on hand, we really could write our own polymorphic if function (that we’re gonna do when we’ve looked at lazy evaluation).

Furthermore, you may have asked yourself, why we’ve sealed our case class (ok, it’s the third point): Consider the fact we didn’t. So everybody could expand our datatype and add another value, say Maybe. Now what about our neatly defined functions? Since they only know and care about the values True and False, their calculation is now kind of incomplete (similar to the visitor pattern in OO world, which you might use best with a stable type structure and some volatile (mostly unpredictable increasing) functions on that type structure). So sealing the case class ensures that those single values of that certain type might not be extended in a chaotic, random way, compromising all functions, based on that type!

Using if expressions in order to compare or match single values of a certain type quickly become really cumbersome, especially if there are more than two values. Watch the following algebraic datatype Season, which consists of four different data values:

sealed abstract class Season case object Winter extends Season case object Spring extends Season case object Summer extends Season case object Fall extends Season

Ok, that was also kind of quick. Nothing new here. Just a new type as the base case class and four case objects, representing the different values. Now let’s think of a function, which returns the next Season for a given one:

val next : Season => Season = ( s :Season ) => if( s eq Winter ) Spring else if( s eq Spring ) Summer else if( s eq Summer ) Fall else Winter

Writing that function wasn’t that quick anymore. Asume we need to write some more functions based on Season. Matching those single values may be inevitable, but in some other cases it may be sufficient to exploit the fact, that the values of an enumerated type are finite. Uhm, ehm, c’mon guy – what do you mean? How can we take advantage of that fact? Well, it turns out, that you can count all values of a finite set, right? And so, you could give each value an individual number. And so you could present a function which delivers the individual number for each value. And you could also present a function which delivers a distinct value for a given number (this function might be only a partial one, since we couldn’t provide a value for any number). Observe:

val toInt : Season => Int = ( s :Season ) => if( s eq Winter ) 0 else if( s eq Spring ) 1 else if( s eq Summer ) 2 else 3 val fromInt : Int => Season = ( i :Int ) => if( i == 0 ) Winter else if( i == 1 ) Spring else if( i == 2 ) Summer else Fall

Admitted, the given functions are not corner proof, but they will serve our purpose. But beside of that, what gives? Well, if you can give a definition for converting the values of a given type into an individual number and vice versa, you’re able to build your logic upon those numbers, not the values itself. And this may come very handy in some situations (the ‘trick’ is to find the right mapping between the values and numbers). Let’s take our function next and rewrite it using the corresponding numbers:

val next : Season => Season = ( s :Season ) => fromInt( ( toInt( s ) + 1 ) % 4 )

Aaaah, now it’s clear. We utilized the fact, that the corresponding numbers are in the right order. So we could produce a function which’s far more concise. Judge for yourself if it’s still that readable …

Summary

This was our first part to the introduction of algebraic datatypes and how they can be implemented within Scala by leveraging case classes (and objects). We’ve seen a special (most simple) form of algebraic datatypes – enumerated types. In this case, all values of that type could be enumerated one by one.

We’ve also seen, that writing functions which are based on algebraic datatypes seemed kind of cumbersome, since we tried to match a certain value (or a combination of some given values) by using eq and if expressions. Don’t worry – we will encounter a powerful technique called pattern matching which will allow us to deconstruct (whatever that means by now) any given value of a certain algebraic datatype, helping us to write more condensed and readable functions!

Of course there are situations, were you can’t give an enumeration for all possible values, especially when the underlyining type leads to a potential set of infinite values. Though, these kind of types can be described as algebraic datatypes as well. This will be the topic for the next episode. Hope to see you soon …