This document is intended to provide an educational guide for beginning developers working in Scala, to enhance their understanding of both Scala collections and monads in order to improve code quality and developer fluency. We hope that this guide is both educational and inspirational. Mastering monads along with Scala collections will lead to much more robust, idiomatic and functional Scala, and is a heck of a lot more fun than reinventing the wheel when these two abstractions are perfect for the problems that they help to solve.

In order to get the most out of this series, readers should have a basic understanding of functional collections in Scala (or a Java-based library like VAVR), specifically operations such as map , reduce , and flatten .

We’re going to introduce monads along with Scala collections because of how important monads as a concept are to Scala’s internals. Scala collections and related containers are built around monads, and share a number of properties (and methods) because of this. By understanding the basic building blocks that Scala collections are constructed from, we can gain an intuitive understanding of how they work and more naturally solve problems in our day-to-day code. This will especially help developers who are working with Scala for the first time, e.g, developers learning Scala for data and machine learning engineering using tools such as Spark and Flink.

While we’re talking about monads here in the context of containers, there’s a wider domain of application that includes things like keeping track of state or performing I/O, in a functional programming paradigm. However, for the purposes of this introduction, we can use a container model to develop a preliminary understanding of monads.

A note on nomenclature: When we talk about functions, we will often use the => notation to describe the structure of a function. For example, A => B means a function that takes a parameter of some type A and returns some other type B . A and B are generic placeholders for whatever the actual types might be, like String or Int . We might also denote multiple parameters using parentheses. For example, (A, B) => B indicates a function that takes two parameters of different types, and returns a value that is the same type as the second parameter.

MONADS!

Monads are one of the most useful functional programming concepts with arguably the most dismal marketing behind them. The mere mention of a monad is enough to cause panic and a hasty retreat back to unnecessarily bloated code. Let’s demystify what they are and how you can use them to improve the comprehension of your code, especially any of your code built around Scala collections.

Monads have a deep basis in the mathematical side of computer science, coming out of category theory. It may be interesting to do a deep dive into the mathematical background theory of monads, but it’s not necessary in order to become familiar with the core concepts. This section will therefore present a condensed and simplified take on monads, specifically geared to using them while working in Scala, and is not intended to be strictly mathematically correct. A relatively superficial “pattern-like” understanding of what a monad is and how it can be used is sufficient to be productive and write more elegant Scala code.

Definition

Monads are containers. That means they contain some sort of elements. Instead of allowing us to operate on these elements directly, the container itself has certain properties. We can then work with the container and the container works with the element within. This will help us to create more concise code; as mentioned, Scala collections uses monads internally for this reason, such as Option[T] .

In Scala, we define the contents of these containers in terms of types, so the content of a monad can be any valid Scala type. A monad might contain one or several elements, or none, depending on how it’s defined.

When we create a monad around a value, we say that we lift the value into the container.

The unit function of M “lifts” a value of type A to a type M[A] by wrapping it in an M.

Monads have certain properties:

A parametrized type — e.g, Option[T] Unit (return) — e.g, Option.apply FlatMap (bind) —e.g, Option.flatMap

We’ve used Option as an example above because Scala’s Option type is a monad, which we will cover in more detail soon. But first, let’s expand on what Unit (return) and FlatMap (bind) mean.

Unit is defined as: A => M[A]

In other words, put an A in an M . If we stop here, we have nothing more special than a wrapper! There’s nothing particularly complex about wrappers, and even in Java they’re used all the time. The parametrized type is important. If we define Option[Int] then Unit is Int => Option[Int] .

In Scala, Unit is usually the apply method, a special method that doesn’t need to be named. A.apply() is the same as A() which is the same as A{} . The curly brackets just let you use a multi-line expression as the parameter for apply. (This will become more important when we look at Try and Future .)

FlatMap is defined as: (M[A], A => M[B]) => M[B]

In order words, lift an A inside an M , and take a function that lifts and converts an A to a B inside an M , and then combine the two to get a B inside an M .

There appears to be a lot happening here, but in reality all we’re doing is defining a container type of M[A] that can do some kind of operation to produce and return an M[B] .

What about map ? Map is defined as M[A] => M[B] .

FlatMap knows how to do two things: it can apply a function to A , and it can flatten a nested container. When the function in question doesn’t lift A , the flattening is unnecessary, and we just need to apply the function.

Map is used when you have a function A => B , rather than A => M[B] and want to apply it to the A inside an M[A] . Map is not one of the defining properties of monads, however, because it’s technically just a special case of FlatMap.

A lifting function like Unit will wrap its object in a container, even if that object is itself the same type of container. Using flatMap will avoid adding this extra wrapper.

Benefits

Let’s take a step back and look at what the point of this is. Often we have complex tasks that have many steps, and this can result in very complex code. We might have callbacks, multiple functions that pass results back and forth, conditional trees, or other sticky kinds of patterns. Sometimes we have to almost manually keep track of the data we want to perform these operations on, acting as a kind of switchboard operator, passing an object to one function, only to catch the result and pass it off to another function. Rinse, repeat.

We often want to take a complex problem and break it down into manageable tasks, and then put those tasks back together to solve the problem. To do this, we need some kind of structure that will put this process “on rails” so that we’re reducing complexity by breaking down the problem, rather than increasing it by adding more and more independently moving parts. We need a kind of assembly line, where each part of the process can take care of its own business, and then pass the work to the next part. We want to break down our problem into a number of small functions, and then compose those functions together into one master function that represents the entire task.

Composing functions is the reason why we want to use monads and should care about monads. We can create a container that can perform actions on its contents, and then use that container as a way to manage those contents and perform the actions we need, in the order we need to, encapsulating the whole operation through function composition. Instead of handling the objects and outputs of functions, we put the objects into a container, and then write up a manifest of all the functions we need done to it. Then, like an assembly line, we put the raw materials in one end and get the finished product from the other end. When we create a Monad, we’re essentially building an assembly line out of individual assembly line components (functions) that can handle raw materials of various types. When we execute our code, we’re feeding raw materials (values) into the assembly line and watching it operate, eventually spitting out a finished product.

After lifting A into M, we can use map to perform a function on the elements of M. Then we can map again to perform a function on the product of the first map. This can be chained as many times as we like.

Working with map and flatMap might feel familiar to anyone who has worked with the pipe operator in Unix-based languages, which takes the output of one command and applies it as a parameter for the next command. Multiple pipe operations can be chained together into a larger function, and likewise map and flatMap can be chained to perform operations in sequence on the output of each step.

Exploring monads in Scala collections

In order to ground the concept of monads into something concrete, let’s explore how Scala collections behave and how they provide the monadic properties we discussed earlier: parametrized type, return, bind.

The following is not intended to be a deep dive into creating your own monads. For now, let’s get used to monadic data structures by getting used to the ones that already exist in Scala, and why they have been implemented as monads.

Scala collections are monads (by our definition), which gives us access to some straightforward examples.

List

Let’s get started by exploring how and why List in Scala is a monad.

To try these examples yourself, install sbt and launch your sbt console. This will give you an interactive Scala environment to follow along.

We can see that List.apply(1) and List(1) mean the same thing. They both represent the Unit function, which lifts the value 1 into a List .

Recall that Unit is A => M[A] . Specifically in this example, it’s Int => List[Int] . This means we can lift a value of Int into the List[Int] monad.

Let’s begin to explore these monadic properties deeper by understanding how the List monad would operate without FlatMap.

Our function, makeListofDoubles(...) takes an Int and returns a List[Double] . That means that if we were to map that function onto the list, it would return a list of lists, like:

FlatMap is the next essential step in order to flatten this structure by compressing the List[List[Double]] into a List[Double] . This is what we want rather than a deeply nested List of List (of List of List of List …). We can see here why map is derived from flatMap. It’s because flatMap knows how to map and flatten, but map only knows how to map.

For-comprehensions

A for-comprehension looks superficially like a for loop in other languages, but it is actually just an alias for a chain of flatMap and map . (As a matter of fact, the for-comprehension is literally re-written by the compiler into a flatMap/map chain, which is good to know if you’re trying to debug your code and need to know exactly what’s going on.)

However, the for-comprehension is much easier to understand at a glance than the nested flatMap and map operations. For-comprehensions work because of the properties of monads, and it serves as a concrete example of how monads can improve the readability of code.

In the example above we have 3 monads: m1 , m2 , and m3 . We extract the contents of those monads: a , b , c . And what do we yield? A monad of the same type containing the result of combining a , b , and c .

A for-comprehension always returns the same type of container we started with, and we can see why that is when we notice that we’re just flatMapping across multiple containers, and finally mapping the last container. The types need to align so that the values can be flattened into a single result.

Option

The simplest possible monad is a container that holds one element. In Scala, this is Option[T] . It either contains one element, or it contains zero elements.

Remember that a monad also requires a parametrized type. For example, an Option[Int] can only hold integers, and an Option[String] can only hold strings. All monads have type parameters in this way, and you might have noticed the type parameters on the Lists in the examples above.

Option has two subtypes: Some and None , which instantiate the abstract Option type. If an Option contains a value, it is Some(value) . Otherwise, it’s None .

This means that you can build an Option in 3 ways:

Notice the types. The first type is Option[String] , as expected, and the concrete value is Some(“hello”) . The second val is Some[String] , which is a specific child type of Option[String] since we were specific when defining it. However, the third val is None.type , which is a special type that’s equivalent to Option[Nothing] .

Nothing

Nothing is a special type that is a subtype of all other types. That means that if a container can hold an Int , it can hold Nothing . If it can hold a String , it can hold Nothing . Nothing can literally be used in place of any type, although it has no properties.

What this means is that if a val has type Option[A] , it can hold a value of type A or Nothing . This makes it possible for None to be the “empty” Option for all Option types.

We mention this here because this is a great concrete example of how Scala’s type system provides power and flexibility. When you begin to create your own monadic data structures, you’ll be able to leverage Scala’s type system to similar effect. However, a discussion of advanced types along those lines is beyond the scope of this guide; we’ll introduce more of these concepts in a future article.

map, flatMap, and success bias

Options are monads, so we can use flatMap (and therefore map) with them:

However, if we try to map on a None , we get None :

This is because Options are success-biased. That means, if map successfully finds a value in the Option , it executes the map function. Otherwise, it doesn’t. In fact, the code inside the map function is not executed at all, including any side-effecting code like println() . This observation can be used for debugging purposes.

Success-bias is also true in for-comprehensions:

get, getOrElse

Sometimes we have an Option[A] and we need an A . We can get the A by calling Option.get . However, this is unsafe. If there isn’t actually an A in the Option , it’ll throw an exception:

Because of this, we try to avoid using .get wherever possible. Instead, we can use Option.getOrElse . This returns the contents of the Option if available, or a default value if it’s not.

This converts the Option[String] to a String safely, given the requirement that we need to be able to come up with a default value that makes sense.

Pattern matching

An additional way to access an Option is through pattern matching. This works with most container types, including the standard collection types. Pattern matching allows us to deconstruct a container and separate out its contents as a named symbol.

If we end up with a Some , then we can assign a name, “s” to the value contained in the Option , and either return that value or do something with it and return some other String .

It’s important to note that match doesn’t just convert Option[A] => A like getOrElse and get do. It can convert Option[A] => B where B is any type we like. All that is required is for every case clause to individually return B .

Other Collections

Besides List , there are a number of other collections in the Scala standard library. The most commonly used of these are Map , Set , Vector , and Stream . Each of these collections implements its own version of Unit (i.e. apply ) and flatMap , and we can therefore treat them as monads, allowing us to use for-comprehensions with them.

Map

A Map type (not to be confused with a map function) is a collection of key-value pairs that allows the values to be accessed by providing one of the keys.

The important thing to remember about Map is that despite the idiosyncratic way of accessing values, the elements are tuples where the first member of the tuple is the key, and the second the value. This means that operations that map or flatMap over a Map will assume tuple parameters and involve the keys in the operation.

Usually, we only want to interact with the values of a Map , and because of that Map has a special mapValues method, which allows you to perform map operations on the values as though they were in a sequence by themselves but keeps them associated to their keys in the resulting Map .

Set

A Set is simply an unordered collection containing only unique values. Attempting to add an element to a Set that already contains it will return the original Set . The order of elements in a Set is never guaranteed, although it may appear to be reliable at first glance.

Vector

Use of List is very common in Scala, but it can sometimes be inefficient for random access because the time complexity of accessing an element is O(n) . Scala provides an alternative collection, Vector , that is optimized for random access by storing its elements in a tree structure that has little memory overhead. All operations on a Vector happen in effectively constant time, which means that for large collections it can be significantly more efficient than List.

Stream

A Stream is essentially a List with an undetermined number of elements, which are computed lazily. A “lazy” value is one that is only evaluated at the time it’s required.

We’ll demonstrate that with an example below. Because Stream elements are lazy, a Stream can have an infinite size, because we don’t need to immediately evaluate an infinite number of elements, just define how we’d evaluate them if we need to.

Note that the #:: operator is just a Stream constructor that ensures that the elements to the right are appended lazily. In this case a is the head of the Stream, and fib(b, a+b) is the recursively defined remainder.

If we were to just call fib(1, 1) it would result in an infinitely long Stream, since it would equal the entire Fibonacci sequence. By using the take method, we can grab only the first few elements. However, note that this doesn’t actually cause those elements to be evaluated, because despite knowing how many elements there will be, we don’t need to know what they are until we convert the Stream to a List. So our first result is Stream(1, ?), meaning that we know the head, but don’t know the rest of the Stream because it hasn’t been evaluated yet.

Once we evaluate the Stream elements, they stay evaluated. When we look at the stream after deriving the List from it, we can see that it contains all the expected elements, because they had to be evaluated in order to generate the List. Note that this difference in the way the Stream is displayed does not mean that the Stream is mutable. Its contents never changed. We just didn’t know what they were.

More Collection Methods

The standard collections have a large number of useful methods on them that generally take care of some very straightforward and common operations on collections. However, a few of them are worth calling out in the context of functional programming, because the way they operate is not always obvious to someone who is new to the language and used to more imperative style.

Fold

Fold is a way to reduce a multi-element collection down to a single element by applying a combining function. A simple example of how fold works is to think of a list of numbers and a sum function. The function is applied to each number in the list until all of them are added up.

Note that fold takes two parameter lists. The first is a “neutral element”, meaning that we can perform the fold operation any number of times with this element and it won’t change the final answer. For addition, this is 0. The second parameter list contains the fold function, which is a binary operation that itself takes two parameters, the elements being added together. Here we’re using the underscore notation to simplify the syntax and pass an anonymous function, but we could pass in any function that takes two parameters of similar type and returns a result of that type, like so:

It’s important to be aware that fold performs its operation on the collection elements in an arbitrary order, and not necessarily the order in which the elements appear in the collection. Fold is appropriate if the operation is commutative, e.g, multiplication and addition such as 1 x 2 = 2 x 1 .

Now, we can’t say the same thing about subtraction and division as these operations are not commutative, e.g, 1–2 ≠ 2–1 . For operations that are not commutative, maintaining a specific order of terms is important. When the order is important, we can use the foldLeft and foldRight methods instead.

These carry out fold in a definite order: foldLeft starts with the first (i.e. leftmost) element and works to the last, while foldRight starts with the last element and works to the first. Both take a neutral element to start this process with. Interestingly, foldLeft and foldRight do not need to end up with the same type as the collection originally contained. For example:

There is another method, reduce , which is best understood as a special case of fold that does not require a neutral value. It just operates on the collection elements. However, if the collection is empty, it will throw an exception.

Foreach

For a developer coming out of other languages, foreach might have a somewhat misleading application. It takes a function as a parameter, and performs that function on every element of the collection, then returns Unit .

This does not mutate the collection, as you might expect if you are familiar with using for-loops in imperative languages. Rather, it’s intended to generate a side-effect for each element in the collection, such as printing a log, persisting values to a data store, or performing some other I/O.

Foreach therefore should be thought of as changing something about the world, but not changing the collection itself. This is why it returns Unit , which is an empty value. It is a common practice in Scala to return Unit when performing an operation that creates a side effect, because returning a value might obfuscate the fact that a side effect occurred, which should always be obvious in otherwise side-effect free functional code. However, given that we often want to validate that side-effects occurred successfully by returning some marker to indicate success or failure, foreach is used less often than one might expect, and side-effecting code is often concealed inside a map instead.

GroupBy

The groupBy method partitions a collection into subsets based on some discriminator, and is somewhat similar to the way GROUP BY works in SQL. It generates a Map where the key is the discriminator, and the value is a collection of every member of the original collection that matches that discriminator. For example:

The map is made by grouping all elements that return the same result from the discriminator. In this case, the result is the first letter of each string. The derived map is keyed on those letters, so that accessing a letter in the keyset will return the list of strings beginning with that letter.

This is actually a very powerful and flexible method when you consider that the map can be keyed to anything that the discriminator function can return, including complex objects.

Next In This Series