Note: this issue has since been fixed in Java 8 (8u222). Thanks to Zheka Kozlov for the comment.

In a recent article, I showed that programmers should always apply a "filter first, map later" strategy with streams. The example I gave there was this one:

    hugeCollection
        .stream()
        .limit(2)
        .map(e -> superExpensiveMapping(e))
        .collect(Collectors.toList());

In this case, the limit() operation implements the filtering, which should take place before the mapping.

Several readers correctly pointed out that in this case, it doesn't matter in which order we put the limit() and map() operations, because most operations are evaluated lazily in the Java 8 Stream API.

Or rather: the terminal operation pulls values from the stream lazily, and once the limit() operation has passed on its maximum number of elements, it produces no further values, regardless of whether map() came before or after it. This is easy to demonstrate:

    import java.util.stream.Stream;

    public class LazyStream {
        public static void main(String[] args) {
            // map() placed before limit()
            Stream.iterate(0, i -> i + 1)
                  .map(i -> i + 1)
                  .peek(i -> System.out.println("Map: " + i))
                  .limit(5)
                  .forEach(i -> {});

            System.out.println();
            System.out.println();

            // limit() placed before map()
            Stream.iterate(0, i -> i + 1)
                  .limit(5)
                  .map(i -> i + 1)
                  .peek(i -> System.out.println("Map: " + i))
                  .forEach(i -> {});
        }
    }

The output of the above is:

    Map: 1
    Map: 2
    Map: 3
    Map: 4
    Map: 5


    Map: 1
    Map: 2
    Map: 3
    Map: 4
    Map: 5

But this isn’t always the case!

This optimisation is an implementation detail, and in general it is wise to apply the filter first, map later rule thoroughly rather than rely on it. In particular, the Java 8 implementation of flatMap() (before 8u222) is not lazy. Consider the following logic, where we put a flatMap() operation in the middle of the stream:

    import java.util.stream.Stream;

    public class LazyStream {
        public static void main(String[] args) {
            // flatMap(), then map() before limit()
            Stream.iterate(0, i -> i + 1)
                  .flatMap(i -> Stream.of(i, i, i, i))
                  .map(i -> i + 1)
                  .peek(i -> System.out.println("Map: " + i))
                  .limit(5)
                  .forEach(i -> {});

            System.out.println();
            System.out.println();

            // flatMap(), then limit() before map()
            Stream.iterate(0, i -> i + 1)
                  .flatMap(i -> Stream.of(i, i, i, i))
                  .limit(5)
                  .map(i -> i + 1)
                  .peek(i -> System.out.println("Map: " + i))
                  .forEach(i -> {});
        }
    }

The result is now:

    Map: 1
    Map: 1
    Map: 1
    Map: 1
    Map: 2
    Map: 2
    Map: 2
    Map: 2


    Map: 1
    Map: 1
    Map: 1
    Map: 1
    Map: 2

So, the first Stream pipeline maps all 8 values produced by the first two flatMap() results before the limit takes effect, whereas the second Stream pipeline really limits the stream to 5 elements first, and then maps only those.
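If you would rather count mapping calls than read console output, the comparison can be sketched as follows. This is only an illustration: the class name FlatMapCallCounter and its two helper methods are made up for this example, and the count for the map-first pipeline depends on the JDK you run it on.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original example.
public class FlatMapCallCounter {

    // Counts how often the mapping function runs when map() precedes limit().
    static int callsWithMapFirst() {
        AtomicInteger calls = new AtomicInteger();
        Stream.iterate(0, i -> i + 1)
              .flatMap(i -> Stream.of(i, i, i, i))
              .map(i -> { calls.incrementAndGet(); return i + 1; })
              .limit(5)
              .forEach(i -> {});
        return calls.get();
    }

    // Counts how often the mapping function runs when limit() precedes map().
    static int callsWithLimitFirst() {
        AtomicInteger calls = new AtomicInteger();
        Stream.iterate(0, i -> i + 1)
              .flatMap(i -> Stream.of(i, i, i, i))
              .limit(5)
              .map(i -> { calls.incrementAndGet(); return i + 1; })
              .forEach(i -> {});
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("map first:   " + callsWithMapFirst());
        System.out.println("limit first: " + callsWithLimitFirst());
    }
}
```

The limit-first pipeline should report 5 calls on any JDK. A count above 5 for the map-first pipeline (8 with the output shown above) indicates the eager behaviour described here.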

The reason for this is in the flatMap() implementation:

    // In ReferencePipeline.flatMap()
    try (Stream<? extends R> result = mapper.apply(u)) {
        if (result != null)
            result.sequential().forEach(downstream);
    }

As you can see, the result of the flatMap() operation is consumed eagerly with a terminal forEach() operation, which in our case always produces all four values and sends them to the next operation, whether or not that operation still wants them. So, flatMap() isn't lazy, and the operation after it will receive all of its results. This is true for Java 8. Future Java versions might improve this, of course.

We'd better filter first, and map later.
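To make the difference between an eager push and a cancellable push concrete, here is a small model. It is a sketch only: LimitingSink and both push methods are hypothetical names invented for this illustration, they are not JDK classes, and the cancellable loop is not a copy of any actual JDK code. It only shows the general idea of asking the downstream stage before pulling each element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class FlatMapPushSketch {

    // Hypothetical downstream stage that only wants `max` elements.
    static final class LimitingSink implements Consumer<Integer> {
        final int max;
        final List<Integer> received = new ArrayList<>();
        int offered = 0; // every element pushed at this sink, wanted or not

        LimitingSink(int max) {
            this.max = max;
        }

        @Override
        public void accept(Integer value) {
            offered++;
            if (received.size() < max) {
                received.add(value);
            }
        }

        boolean cancellationRequested() {
            return received.size() >= max;
        }
    }

    // Eager push: forEach() offers no way to stop halfway through.
    static void pushEagerly(Stream<Integer> inner, LimitingSink downstream) {
        inner.sequential().forEach(downstream);
    }

    // Cancellable push: ask the downstream stage before pulling each element.
    static void pushWithCancel(Stream<Integer> inner, LimitingSink downstream) {
        Spliterator<Integer> s = inner.sequential().spliterator();
        while (!downstream.cancellationRequested() && s.tryAdvance(downstream)) {
            // the work happens in the loop condition
        }
    }

    public static void main(String[] args) {
        LimitingSink eager = new LimitingSink(1);
        pushEagerly(Stream.of(0, 0, 0, 0), eager);
        System.out.println("eager push offered:       " + eager.offered);       // 4

        LimitingSink cancellable = new LimitingSink(1);
        pushWithCancel(Stream.of(0, 0, 0, 0), cancellable);
        System.out.println("cancellable push offered: " + cancellable.offered); // 1
    }
}
```

In this model both sinks end up holding a single element, but the eager variant has to handle all four values before it is done, which mirrors what happens to the operation that follows flatMap().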

Update: flatMap() gets fixed in JDK 10

Thanks, Tagir Valeev, for pointing out that there’s a fix coming up:

Btw flatMap gets fixed in Java 10. — Tagir Valeev (@tagir_valeev) January 25, 2018

Relevant links:

https://bugs.openjdk.java.net/browse/JDK-8075939

http://hg.openjdk.java.net/jdk/jdk10/rev/fca88bbbafb9