Collections vs sequences

The difference between eager and lazy evaluation lies in when each transformation on the collection is performed.

Collections are eagerly evaluated — each operation is performed when it’s called and the result of the operation is stored in a new collection. The transformations on collections are inline functions. For example, looking at how map is implemented, we can see that it’s an inline function, that creates a new ArrayList :

public inline fun <T, R> Iterable<T>.map(transform: (T) -> R): List<R> {

return mapTo(ArrayList<R>(collectionSizeOrDefault(10)), transform)

}

Sequences are lazily evaluated. They have two types of operations: intermediate and terminal. Intermediate operations are not performed on the spot; they’re just stored. Only when a terminal operation is called, the intermediate operations are triggered on each element in a row and finally, the terminal operation is applied. Intermediate operations (like map , distinct , groupBy etc) return another sequence whereas terminal operations (like first , toList , count etc) don’t.

Sequences don’t hold a reference to the items of the collection. They’re created based on the iterator of the original collection and keep a reference to all the intermediate operations that need to be performed.

Unlike transformations on collections, intermediate transformations on sequences are not inline functions — inline functions cannot be stored and sequences need to store them. Looking at how an intermediate operation like map is implemented, we can see that the transform function is kept in a new instance of a Sequence :

public fun <T, R> Sequence<T>.map(transform: (T) -> R): Sequence<R>{

return TransformingSequence(this, transform)

}

A terminal operation, like first , iterates through the elements of the sequence until the predicate condition is matched.

public inline fun <T> Sequence<T>.first(predicate: (T) -> Boolean): T {

for (element in this) if (predicate(element)) return element

throw NoSuchElementException(“Sequence contains no element matching the predicate.”)

}

If we look at how a sequence like TransformingSequence (used in the map above) is implemented, we’ll see that when next is called on the sequence iterator, the transformation stored is also applied.

internal class TransformingIndexedSequence<T, R>

constructor(private val sequence: Sequence<T>, private val transformer: (Int, T) -> R) : Sequence<R> { override fun iterator(): Iterator<R> = object : Iterator<R> {

…

override fun next(): R {

return transformer(checkIndexOverflow(index++), iterator.next())

}

…

}

Independent on whether you’re using collections or sequences, the Kotlin Standard Library offers quite a wide range of operations for both, like find , filter , groupBy and others. Make sure you check them out, before implementing your own version of these.

Collections and sequences

Let’s say that we have a list of objects of different shapes. We want to make the shapes yellow and then take the first square shape.

Collections vs sequences

Let’s see how and when each operation is applied for collections and when for sequences

Collections

map is called — a new ArrayList is created. We iterate through all items of the initial collection, transform it by copying the original object and changing the color, then add it to the new list.

is called — a new ArrayList is created. We iterate through all items of the initial collection, transform it by copying the original object and changing the color, then add it to the new list. first is called — we iterate through each item until the first square is found

Sequences

asSequence — a sequence is created based on the Iterator of the original collection

— a sequence is created based on the Iterator of the original collection map is called — the transformation is added to the list of operations needed to be performed by the sequence but the operation is NOT performed

is called — the transformation is added to the list of operations needed to be performed by the sequence but the operation is performed first is called — this is a terminal operation, so, all the intermediate operations are triggered, on each element of the collection. We iterate through the initial collection applying map and then first on each of them. Since the condition from first is satisfied by the 2nd element, then we no longer apply the map on the rest of the collection.

When working with sequences no intermediate collection is created and since items are evaluated one by one, map is only performed on some of the inputs.

Collections vs sequences — eager vs lazy evaluation

Performance

Order of transformations

Independent of whether you’re using collections or sequences, the order of transformations matters. In the example above, first doesn’t need to happen after map since it’s not a consequence of the map transformation. If we reverse the order of our business logic and call first on the collection and then transform the result, then we only create one new object — the yellow square. When using sequences — we avoid creating 2 new objects, when using collections, we avoid creating an entire new list.

Order of transformations matters — avoid unnecessary work

Because terminal operations can finish processing early, and intermediate operations are evaluated lazily, sequences can, in some cases, help you avoid doing unnecessary work compared to collections. Make sure you always check the order of the transformations and the dependencies between them!

Inlining and large data sets consequences

Collection operations use inline functions, so the bytecode of the operation, together with the bytecode of the lambda passed to it will be inlined. Sequences don’t use inline functions, therefore, new Function objects are created for each operation.

On the other hand, collections create a new list for every transformation while sequences just keep a reference to the transformation function.

When working with small collections, with 1–2 operators, these differences don’t have big implications so working with collections should be ok. But, when working with large lists the intermediate collection creation can become expensive; in such cases, use sequences.

Unfortunately, I’m not aware of any benchmarking study done that would help us get a better understanding on how the performance of collections vs sequences is affected with different sizes of collections or operation chains.