November 21, 2010 — Mario Gleichmann

Welcome to the fifth episode of Functional Scala!

Have you ever heard about set comprehensions? Well, if you ever took a course in mathematics, you surely have. But don’t fear if not! It’s simply a mathematical notation for specifying a set of elements by stating the properties that its members must statisfy. If this sounds too abstract to you, let’s take a look at some simple examples:

{ x | x ∈ N : x == x² }

What we have here is a description for a set of values, coming from the domain of natural numbers, whose square value needs to be equal to the value itself. If you think hard, you’ll discover quickly, that there are only two natural numbers which statisfy the given constraint. So the above given set comprehension is a specification for the set {0, 1}. Nice, isn’t it? I know it’s not, since the set comprehension takes more space than writing down the set directly! Well, there might be cases, for which enumerating the whole set of elements may take some more time and space than using a set comprehension. Take a look at this one:

{ x | x ∈ R : x > 0 }

Yep, it’s the set of all positive real numbers. Now, have a nice time in writing down this set directly – may would take some space … Ok, that’s all fine and good. But what’s that all to do with Functional Programming in general and Scala in particular? Well, it turns out that most Functional Languages provide a form of so called list comprehensions in one or the other way. In contrast to a set comprehension, you don’t specify the elements of a set but instead of a … list.

So let’s get back to another example and see how we can express this one as a comprehension within Scala. Let’s try to specify a list of the first five square values. For the last time, we’re gonna express it firstly as a set comprehension:

{ x² | x ∈ N : x > 0 ∧ x < 6 }

Before we translate this one into a list comprehension in Scala, we may take a closer look at the different elements of a comprehension. What you see before the pipe is called the output function, with output variable x. The output function determines how to calculate the values of the resulting list. Behind the pipe you see a input set which prescribes the domain for the output variable, followed by some constraints which may further limit the given domain. It’s typically a set of constraints which must hold for every element that is considered a valid value for the output variable.

I have to admit that i’ve cheated in the last section. Just once more, i will give you another transformed version of the above set comprehension. But bear with me, it’s only for the sake of a simpler entrance into Scala’s flavour of comprehensions:

{ x² | x ∈ { 1 .. 5 } }

Ahh, see that some of the above constraints aren’t needed, if we use another set for characterizing the domain for our output variable? Under this point of view you might say that a comprehension is a specification for building a more specific set out of a more general set. Ok, that’s enough for building our first comprehension in Scala. You might wanna try to recognize the predefined elements of a comprehension. Let’s go …

for( x <- ( 1 to 5 ) ) yield x*x

I don’t cheated again, honestly! What might look like a for-loop at first sight, is a perfectly pure comprehension! Only the notation for the different elements might seem a little odd at first: clearly, x is our output variable. It’s denoted as the first element within the for-block, followed by the input domain for x. The correlation between the output variable and its input domain is expressed by this arrow between them, connecting both elements to each other. In our case, the input domain is build by that range ( 1 to 5 ) and is commonly called a Generator in Scala. In fact, every type which provides a method map (in the sense of a Functor, which we’ll investigate in a further episode), a method flatMap (in the sense of a Monad) and a method filter is able to act as a Generator within a list comprehension!

Input domain? Check! Output variable? Check! But where’s our output function which determines the shape of the resulting list? I think you’ve already seen that it’s the last part of the comprehension, following that yield. The output function is in fact responsible for what is yielded back as the result of the whole comprehension expression.

Multiple Generators

And it gets even better. Concerning output variables and its correlated Generators, you’re not restricted to just one. You’re allowed to bring as many output variables (and its related Generators) into the game as you want. For example, we could produce a cartesian product of all pairs in the range upto three like so:

for( x <- (1 to 3 ); y <- (1 to 3) ) yield (x,y) // (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)

And now spot the order in which the output variables are running through all possible combinations? Aye, it’s from left to right: before the first generator (for x, on the left) steps from one value to the next one , the following generator’s just running through its whole domain once. And that’s again for the next value of x and so on and so forth. For better comprehension, you may think in fact of two nested for-loops: the first generator’s working at the outer loop, while the following generator’s running at the inner loop …

Ahh, ok – and what if i want to retain only one of those pairs whose values are actually only switched in position, like (1,3) and (3,1)? No problem, mate! It turns out that you’re allowed to refer to a output variable which was defined by a previous generator (so to say within an outer loop). In this case, we simply refer to the current x within the range definition of our second generator:

for( x <- (1 to 3 ); y <- (1 to x) ) yield (x,y) // (1,1), (2,1), (2,2), (3,1), (3,2), (3,3)

And while you’re able to refer to the (intermediate) results of a previous generator within a following one, you can do quite some funny things to lists, since lists might act as generators, too. For example, think about a list of lists, each sublist containing some integer values. Let’s say we want to flatten that list of integer lists, so the result is one single list, containing all the integer values of all sublists. Yep, its easily achivable by leveraging comprehensions:

val flatten = ( xss :List[List[Int]] ) => for( xs <- xss; x <- xs ) yield x

Uh, oh, that’s all? Exactly, it’s simple as that! We’ve constructed a new list out of a given list of lists just by unpacking the lists: while the first output variable is traversing all sublists of the given list, the second generator refers to them and reveals the content of that sublists, which is in turn yielded by the output function.

Guards

So far, we’ve only created some comprehensions which were running through all elements of the given input domains (that is all values which were produced by the given generators). Looking back at set comprehensions, we’ve already seen that it’s possible to declare some constraints in order to filter the possible valid values for a output variable. It turns out, that Scala allows for so called Guards within a comprehension. It’s pretty the same, as guards also representing a set of predicates which must hold for all valid values of an output variable.

For a first example, let’s say we want to create a list of all natural factors for a given integer value. What might be the input domain? Well, how about all natural numbers between one and the value for which we want to calculate its factors? Ok, but they’re surely not all a natural factor for the given integer value (unless we restrict our function to the values of 1 and 2). Only those numbers are accepted as a factor, if we can divide the given value by that number without a remainder. Hm, looks like a reasonable constraint! So let’s write it down:

val factors = ( n :Int ) => for( x <- ( 1 to n ); if n % x == 0 ) yield x

As you can see, the guard’s simply declared right after the generator. Of course, you’re not restricted to one single guard. You’re allowed to add as many guards as needed (each separated by a semicolon) – just keep in mind that the predicates of all guards must hold for a value in order to be considered a valid output variable. For better absorbability, here’s another example, which filters all primes within a list of integer values:

val primes = ( xs :List[Int] ) => for ( x <- xs; allfactors = factors( x ); if allFactors.length == 2; if allFactors(0) == 1; if allfactors(1) == x ) yield x ... primes( List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) ) // List(2, 3, 5, 7, 11)

Ok, in order to decide which integer value qualifies as an output variable, we simply have to examine all its factors. If there are only two factors and if these two factors are 1 and the number itself, we have a prime. Sure, sure, but what about that local value allFactors? Well that’s another feature of Scalas comprehension notation. You can place as many local values as you want and refer to them afterwards, even within generators which may follow!

Putting all pieces together, let’s bring this episode to a close with a somewhat more complex example, which shows another use case of Scalas comprehension mechanism. Imagine a set of companies and another set of employees whithin the following scenario. For this, we’re gonna define two case classes (again, we’ll have a closer look at case classes when it comes to algebraic datatypes) Company and Employee and create a list of companies and another list holding some employees:

case class Company( val name :String, val region :String, val avgSalary :Int ) case class Employee( val name :String, val companyName :String, val age :Int ) ... val companies = List( Company( "SAL", "HE", 2000 ), Company( "GOK", "DA", 2500 ), Company( "MIK", "DA", 3000 ) ) val employees = List( Employee( "Joana", "GOK", 20 ), Employee( "Mikey", "MIK", 31 ), Employee( "Susan", "MIK", 27 ), Employee( "Frank", "GOK", 28 ), Employee( "Ellen", "SAL", 29 ) )

Now we are interested in all employees which are statisfying all of the following constraints:

only employees with age greater than 25

only employees working for a company in region “DA”

only employees with a higher salary than the average salary of the company she’s working for (given a salary which is calculated by the employees age times 100)

For all detected employees, we’re interested in the employees name, the name of the company she’s working for and the amount which she’s above the average salary of the company. Uhm, em, well … wait a minute. Does this scenario ring a bell? If you’re familiar with relational databases and Structured Query Language, you surely would have solved that little brainteeser in a blink. Ok, just pretend that our two lists of employees and companies were read from a database and know selection takes place programatically afterwards. But how to retrieve the desired employees? Hmm, maybe by leveraging comprehensions? Wow, not bad – how did you hit on that? In fact, you can see a comprehension as a kind of Query, which also features a select clause (the output function), a from clause (the input domains) and a where clause (the guards). So without any further ado, let’s take all the stuff we’ve learned so far and try to express our Query as a comprehension:

val result = for( e <- employees; if e.age > 25; salary = e.age * 100; c <- companies; if c.region == "DA"; if c.name == e.companyName; if c.avgSalary < salary ) yield ( e.name, c.name, salary - c.avgSalary ) println( result ) // List( (Mikey, MIK, 100), (Frank, GOK, 300) )

Wow, isn’t that magic? You’re right, no magic! It’s simply comprehensions applied to a special problem of querying. But if you look at it that way, you can pretty easily compare the individual parts to a well known Query. There’s even a counterpart to an inner join. Can you see it? Right, there at line 8, where we correlate the companies name between the company and the employee.

Summary

Puh, what a journey! If you’ve seen Scala’s for-notation as a better loop so far, i hope that you’ve gained a slightly other view on things now. As we said at the very beginning of this episode, it’s rather a mechanism for deriving some specific data based on some more general data. And again, there’s no mutable data in that game. Only some values which are going inside the comprehension and some other values which are going out as the result of the comprehension. Does this remind you on some basic principles within the world of Functional Programming … ?