Reading Time: 5 minutes

In this blog, We will be going to explore how we can use scala collections efficiently .

Though, we are taking care of immutability but still something more can be done to make your code more readable and faster.

List vs Vector:

Vector is a collection with good random access. List is indeed a linked list with very fast append-first operation ( :: ), but the links are unidirectional.

Scala Vector has a very effective iteration implementation and, comparing to List is faster in traversing linearly. if you are planning to do quite some appending, Vector is a better choice and List don’t work well with parallel algorithms.

Most efficient way to create ListBuffer:

Order by efficient:

val list = new mutable.ListBuffer[String]

val list = mutable.ListBuffer.empty[String]

val list = mutable.ListBuffer[String]()

First approach creates only one object ListBuffer itself but other two approaches create an instanceof scala.collection.mutable.AddingBuilder first, which is then asked for a new instance of ListBuffer. Therefore, first approach is very efficient.

Prefer TrieMap over any other mutable Map:

// before

val mutableMap = collection.mutable.Map[T,T]() // after

val immutableMap = collection.concurrent.TrieMap[T,T]()

If there is need to use mutable Map in multi-threading environment, use TrieMap. A Scala TrieMap is a trie-based concurrent scalable and immutable map implementation. Unlike normal trie maps, a Scala TrieMap has an efficient, non-blocking, O(1) time snapshot operation (and a slightly optimized readOnlySnapshot ) operation.

Creating empty collections explicitly:

List[T]() //before List.empty[T] //after

There are some immutable collection classes provide singleton “empty” implementations, however not all of the factory methods check length of the created collections. Thus, by making collection emptiness apparent at compile time, we could save either heap space (by reusing empty collection instances) or CPU cycles (otherwise wasted on runtime length checks).

Never negate emptiness-related properties:

//before

!list.isEmpty

!list.nonEmpty //after

list.nonEmpty

list.isEmpty

Simple built-in methods add less visual clutter than a compound expression. It is also applicable to Set, Option,Map, Iterator

Never compute length for emptiness check:

// before

list.length > 0

list.length != 0

list.length == 0 // after

list.nonEmpty

list.nonEmpty

list.isEmpty

Simple property is more easy to perceive than compound expression. And collections are decedents of LinearSeq which will take O(n) time to compute length instead of O(1) time for IndexedSeq. Therefore, we can speed up our code by avoiding unnecessary computations when exact value is not required. Besides that, we can never use length method on infinite streams but we can verify stream emptiness directly.

Never compute full length of collection for length matching:

// before

list.length > n

list.length < n list.length == n

list.length != n // after

list.lengthCompare(n) > 0

list.lengthCompare(n) < 0

list.lengthCompare(n) == 0

list.lengthCompare(n) != 0

Because computing length might be very “costly” for some collection classes. We can reduce comparisons time from O(length) to O(length min n) for decedents of LinearSeq. This approach won’t work with infinite streams.

Prefer length to size for arrays:

// before

array.size // after

array.length

While size and length are synonyms, in scala 2.11 Array.size calls are still implemented via implicit conversion, so that intermediate wrapper objects are created for every method call. Unless you enable escape analysis in JVM , those temporary objects will burden GC and can potentially degrade code performance (especially, within loops).

Don’t retrieve first element by index:

// before

list(0) // after

list.head

The second approach might be slightly faster for some collection classes. Additionally, property access is much simpler(both syntactically and semantically) than calling a method with an argument.

Don’t retrieve last element by index:

// before

list(list.length-1) // after

list.last

The second approach is more obvious and allows to avoid unnecessary length computation. Besides, some collection classes can retrieve last element more efficiently comparing to by-index access.

Don’t check index bound explicitly:

// before

if (i < list.length) Some(list(i)) else None // after

list.lift(i)

The second expression is more concise and semantically equivalent.

Don’t imitate headOption:

// before

if (list.nonEmpty) Some(list.head) else None

list.lift(0) // after

list.headOption

The optimized expression is more concise.

Don’t imitate lastOption :

// before

if (list.nonEmpty) Some(list.last) else None

list.lift(list.length – 1) // after

list.lastOption

The optimized expression is more concise and faster.

Be careful with indexOf and lastIndexOf argument types:

// before

List(1, 2, 3,4,5).indexOf(“1”) // compilable

List(1, 2, 3,4,5).lastIndexOf(“2”) // compilable // after

List(1, 2, 3,4,5).indexOf(1)

List(1, 2, 3,4,5).lastIndexOf(2)

indexOf and lastIndexOf accepts arguments of Any type. In practice, that might lead to hard-to-find bugs, which are not discoverable at compile time. That’s where auxiliary IDE inspections come in handy.

Don’t use equality predicate to check element presence:

// before

list.exists(_ == x) // after

list.contains(x)

The second expression is more concise and semantically equivalent.

Be careful with contains argument type:

// before

List(1, 2, 3,4,5).contains(“1”) // compilable // after

List(1, 2, 3,4,5).contains(1)

Just like indexOf and lastIndexOf method, contains also accepts argument of type Any, that might lead to hard-to-find bugs, which are not discoverable at compile time. Be careful with the method arguments.

Don’t construct indices range manually:

// before

Range(0, list.length) // After

list.indices

There is a in-built method that returns the range of all indices of a collection.

Don’t zip collection with its indices manually:

// before

list.zip(list.indices) // after

list.zipWithIndex

For one thing, the second expression is more concise. Besides, we may expect some performance gain, because we avoid hidden length calculation (which might be expensive for linear sequences). It will also work with infinite streams as well.

Don’t negate filtering predicate:

// before

List(1,2,3,4,5).filter(_ %2 != 0) // after

List(1,2,3,4,5).filterNot(_ %2 == 0)

The second expression is syntactically simpler (while semantically they are equal).

Don’t sort by a property manually:



// before

list.sortWith(_.property < _.property) // after

list.sortBy(_.property)

Second expression is more clear and concise.

Don’t sort by identity manually:

// before

list.sortBy(identity)

list.sortWith(_ < _) // after

list.sorted

There’s a special method for that, more clear and concise.

Perform reverse sorting in one step:

// before

list.sorted.reverse

list.sortBy(_.property).reverse // after

list.sorted(Ordering[T].reverse)

list.sortBy(_.property)(Ordering[T].reverse)

That’s how we can avoid a temporary collection allocation and exclude the additional transformation step (to save both heap space and CPU cycles).

Don’t use sorting to find the smallest element:

// before

list.sorted.head

list.sortBy(_.property).head // after

list.min

list.minBy(_.property)

The latter approach is more concise. Besides, it’s more effective, because it doesn’t create a temporary collection.

Don’t use sorting to find the largest element:

// before

list.sorted.last

list.sortBy(_.property).last // after

list.max

list.maxBy(_.property)

Merge consecutive filter calls:

// before

list.filter(p1).filter(p2) // after

list.filter(x => p1(x) && p2(x))

That’s how we can avoid creation of an intermediate collection (after the first filter call), and thus relieve GC’s burden.

The predicates p1 and p2 must be pure.

Don’t imitate span:

// before

val list1 = list.takeWhile(p)

val list2 = list.dropWhile(p) // after

val (list1, list2) = list.span(p)

That’s how we can traverse the sequence and check the predicate only once, instead of twice.

The predicate p must be pure.

Don’t imitate partition:

// before

val list1 = list.filter(p)

val list2 = list.filterNot(p) // after

val (list1, list2) = list.partition(p)

Again, the benefit is a single-pass computation.

The predicate p must be pure.

Don’t use map when result is not required:

// Before

list.map(???) // the result is ignored // After

list.foreach(???)

When side effect is all that is needed, there’s no justification for calling map . Such a call is misleading (and less efficient).

I know, this is not the complete list of efficient use of scala collections. There are many other ideas also which we still need to explore.

Hope that this blog is helpful for you. 🙂

References: