Performance Overhead

There are 3 main aspects that affect the performance overhead of sequences and streams:

Primitive handling (boolean, char, byte, short, int, long, float, & double)

Optional values

Lambda creation

Primitive Handling

Although Kotlin doesn’t expose primitive types in its type system, it uses primitives behind the scenes when possible. For example, a nullable Double (Double?) is stored as a java.lang.Double behind the scenes whereas a non-nullable Double is stored as a primitive double when possible.

Streams have primitive variants to avoid autoboxing but sequences do not:

// Sequence

people.asSequence()

.map { it.weight } // Autobox non-nullable Double

...

// Stream

people.stream()

.mapToDouble { it.weight } // DoubleStream from here onwards

...
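To make the contrast concrete, here is a minimal sketch that computes an average weight both ways. The Person class and the two function names are assumptions introduced for illustration; the article only implies a weight property.

```kotlin
// Hypothetical model for illustration; only `weight` is implied by the article.
data class Person(val name: String, val weight: Double)

// Sequence version: map yields a Sequence<Double>, so each weight is boxed
// before average() reads it.
fun averageWeightWithSequence(people: List<Person>): Double =
    people.asSequence()
        .map { it.weight }
        .average()

// Stream version: mapToDouble yields a DoubleStream, so the weights stay primitive.
fun averageWeightWithStream(people: List<Person>): Double =
    people.stream()
        .mapToDouble { it.weight }
        .average()          // OptionalDouble
        .orElse(Double.NaN) // mirror Sequence<Double>.average(), which is NaN when empty
```

Both functions return the same value for the same input; the difference is only in whether intermediate values are boxed along the way.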

However, if we capture them in a collection then they’ll be autoboxed anyway since generic collections store references. Additionally, if you’re already dealing with boxed values, unboxing and collecting them in another list is worse than passing along the boxed references so primitive streams can be detrimental when over-used:

// Stream

val testScores = people.stream()

.filter { it.testScore != null }

.mapToDouble { it.testScore!! } // Very bad! Use map { ... }

.toList() // Unnecessary autoboxing because we unboxed them

Although sequences don’t have primitive variants, they avoid some autoboxing by including utilities that simplify common actions. For example, we can use sumOf (sumByDouble before it was deprecated in Kotlin 1.5) instead of mapping the value and then summing it as a separate step. These utilities reduce autoboxing and also simplify the code.
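As a rough sketch of that difference, the two functions below compute the same total. The Person class and the function names are hypothetical, and the comments describe what the standard library is expected to do rather than measured behavior.

```kotlin
// Hypothetical model for illustration.
data class Person(val name: String, val weight: Double)

// Two steps: map produces a Sequence<Double> of boxed values, then sum() unboxes them.
fun totalWeightTwoSteps(people: List<Person>): Double =
    people.asSequence()
        .map { it.weight }
        .sum()

// One step: sumOf is an inline function that adds each selected value
// to a running total, so no intermediate boxed sequence is needed.
fun totalWeightOneStep(people: List<Person>): Double =
    people.asSequence()
        .sumOf { it.weight }
```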

When sequences do autobox, the resulting heap-usage pattern is very efficient. Sequences (and streams) pass each element through every operation in the pipeline, up to the terminal operation, before moving on to the next element. As a result, typically only a single autoboxed object is reachable at any point in time. Garbage collectors are designed to be efficient with short-lived objects, since only surviving objects get moved around, so the autoboxing that results from sequences is about the least expensive kind of heap usage. These short-lived autoboxed objects won’t flood the survivor spaces, so they are reclaimed on the collector’s efficient path rather than causing full collections.
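The element-by-element flow can be observed directly. The sketch below records the order in which each stage sees a value; traceSequence and its log list are helpers invented for this illustration.

```kotlin
// Records which stage handled which value, in the order it happened.
fun traceSequence(values: List<Int>): List<String> {
    val log = mutableListOf<String>()
    values.asSequence()
        .map { log += "map $it"; it * 2 }        // doubles each value
        .filter { log += "filter $it"; it > 2 }  // keeps doubled values above 2
        .forEach { log += "forEach $it" }        // terminal operation
    return log
}
```

For listOf(1, 2) the log is [map 1, filter 2, map 2, filter 4, forEach 4]: the first element is fully processed (and discarded by the filter) before the second element is touched, so the boxed value created for it is already unreachable by then.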

All else being equal, avoiding autoboxing is preferred, so streams can be more efficient when temporary primitive values pass through several separate stream operations. However, this applies only when the specialized primitive variants are used, and only as long as they aren’t over-used, since they can be detrimental in cases like the one above.

Optional Values

Streams create Optional wrappers when values might not be present (e.g. with min, max, reduce, findFirst, findAny, etc.) whereas sequences use nullable types:

// Sequence

people.asSequence()

...

.find { it.name.length > 5 } // returns nullable Person

// Stream

people.stream()

...

.filter { it.name.length > 5 }

.findAny() // returns Optional<Person> wrapper

Therefore sequences are more efficient with optional values as they avoid creating the Optional wrapper object.
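A self-contained version of the comparison might look like the following. The Person class and function names are assumptions for illustration, and findFirst is used instead of findAny so that both functions are guaranteed to pick the same element.

```kotlin
// Hypothetical model for illustration.
data class Person(val name: String)

// Sequence version: find returns Person? directly, with no wrapper object.
fun firstLongNameWithSequence(people: List<Person>): Person? =
    people.asSequence()
        .find { it.name.length > 5 }

// Stream version: findFirst returns an Optional<Person>, which has to be
// unwrapped to get back to a nullable type.
fun firstLongNameWithStream(people: List<Person>): Person? =
    people.stream()
        .filter { it.name.length > 5 }
        .findFirst()   // a new Optional is created when a match exists
        .orElse(null)
```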

Lambda Creation

Sequences support mapping and filtering non-null values in a single step and thus reduce the number of lambda instances:

// Sequence

people.asSequence()

.mapNotNull { it.testScore } // create lambda instance

...

// Stream

people.stream()

.map { it.testScore } // create lambda instance

.filter { it != null } // create another lambda instance

...

Additionally, most terminal operations on sequences are inline functions which avoid the creation of the final lambda instance:

people.asSequence()

.filter { it.age >= 18 }

.forEach { println(it.name) } // forEach inlined at compile time

Therefore sequences create fewer lambda instances resulting in more efficient execution due to less indirection.

Performance Overhead Conclusions

Streams have specialized primitive versions to avoid autoboxing when performing multiple transformations on primitive values. Sequences have some mitigation strategies to make unnecessary autoboxing less common. Additionally, the heap usage that’s caused by sequence autoboxing flows through the efficient path of garbage collectors. Nevertheless, streams can be more efficient for primitive transformations as long as they’re not over-used.

Note that the order of operations can have a significant impact on the number of autoboxing occurrences:

// Before

val adultAgesSquared = people.asSequence()

.map { it.age } // autobox non-nullable age

.filter { it >= 18 } // throw away some autoboxed values

.map { it * it } // square and autobox again

.toList()

// After - No unnecessary autoboxing

val adultAgesSquared = people.asSequence()

.filter { it.age >= 18 }

.map { it.age * it.age } // single autobox

.toList()
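To check that the reordering only changes how the work is done and not the result, both pipelines can be wrapped in functions and compared. The Person class and the two function names are introduced here for illustration only.

```kotlin
// Hypothetical model for illustration.
data class Person(val name: String, val age: Int)

// Original ordering: ages are extracted first, so every age flows through
// the pipeline as a value of the generic type Int.
fun adultAgesSquaredBefore(people: List<Person>): List<Int> =
    people.asSequence()
        .map { it.age }
        .filter { it >= 18 }
        .map { it * it }
        .toList()

// Reordered: filter on the Person first, then produce the squared age once.
fun adultAgesSquaredAfter(people: List<Person>): List<Int> =
    people.asSequence()
        .filter { it.age >= 18 }
        .map { it.age * it.age }
        .toList()
```

For any input list, both functions should return the same list of squared ages in the same order.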

Additionally, both sequences and streams should be avoided in performance-critical code. For example, intensive linear algebra on large datasets is best done without sequences or streams, because the added indirection can have a significant impact on this type of use case.
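As one small illustration of what that can look like, a dot product over primitive arrays can be written as a plain indexed loop. This is a sketch of the general idea, not a tuned implementation, and dotProduct is a name chosen for this example.

```kotlin
// Plain loop over primitive arrays: no lambdas, no boxing, no pipeline objects.
fun dotProduct(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "Vectors must have the same length" }
    var sum = 0.0
    for (i in a.indices) {
        sum += a[i] * b[i]
    }
    return sum
}
```

DoubleArray is used rather than List<Double> because it is backed by a primitive double[] on the JVM, so the values stay unboxed throughout.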

After looking at hundreds of real-life usages in a business setting, the vast majority are more efficient with sequences because they require fewer lambdas, they don’t wrap optional values, and most terminal operations are inlined.