While preparing my class notes for functional programming in Java, I was struck by the neat correspondence between many Java Stream methods and Unix commands. I decided to organize the most common of these in dictionary form, mapping each one to its counterpart. I'd very much welcome comments regarding common patterns that I've missed.

Java stream processing and Unix pipelines share many traits:

No intermediate storage is needed for processing data

Functional processing: the original data is not modified

Lazy operations: no more data than is needed is processed (on Unix this is made possible transparently through the SIGPIPE signal)

An indefinite (possibly infinite) number of elements can be processed

Operations can be parallelized
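The laziness and unbounded-input traits listed above can be illustrated with a minimal sketch. The class and method names here are hypothetical, chosen only for this example. The stream source is infinite, yet the pipeline terminates because limit() stops pulling elements, much as head ends an endless producer on Unix by closing its end of the pipe.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the names are illustrative only.
class LazyStreamDemo {
    // Returns the squares of the first `count` positive integers.
    static List<Integer> firstSquares(int count) {
        return Stream.iterate(1, n -> n + 1)  // infinite source: 1, 2, 3, ...
                .map(n -> n * n)              // evaluated only for elements actually pulled
                .limit(count)                 // short-circuits the infinite source
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstSquares(5)); // prints [1, 4, 9, 16, 25]
    }
}
```

No intermediate list of all squares is ever built, and the original source is not modified.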

Compared to Java streams, Unix pipelines have two advantages:

Operations on multiple streams are possible with commands such as join, comm, paste and shell extensions such as Bash's process substitution and dgsh

Non-homogeneous binary data can be easily processed through toolsets such as SoX, NetPBM, FFmpeg, OpenSSL and diverse compression programs

On the other hand, compared to Unix pipelines, Java streams can efficiently process homogeneous streams of binary objects through custom-built functions.
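As a rough illustration of that last point, here is a sketch in which a stream of typed objects is processed by a custom-built function, with no text parsing or serialization between stages. The Reading type, its fields, and the method names are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; Reading and its fields are assumptions for illustration.
class ObjectStreamDemo {
    static final class Reading {
        final String sensor;
        final double celsius;

        Reading(String sensor, double celsius) {
            this.sensor = sensor;
            this.celsius = celsius;
        }
    }

    // A custom-built function operating directly on the object.
    static double toFahrenheit(Reading r) {
        return r.celsius * 9 / 5 + 32;
    }

    // Returns the highest reading in Fahrenheit, or NaN for an empty list.
    static double maxFahrenheit(List<Reading> readings) {
        return readings.stream()
                .mapToDouble(ObjectStreamDemo::toFahrenheit)
                .max()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
                new Reading("a", 0.0), new Reading("b", 100.0));
        System.out.println(maxFahrenheit(readings)); // prints 212.0
    }
}
```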

Without further ado, here is the mapping between Java stream methods and Unix pipeline commands, divided into sources, intermediate methods (filters), and terminal methods.

Stream Sources

Java's stream sources generate stream data from other objects. Many Unix commands produce an output stream from files, the filesystem, or databases.

| Java Stream Methods | Unix Pipeline Commands |
|---|---|
| BufferedReader.lines() | cat or curl |
| Files.list() | ls |
| Files.find(Path start, ...) | find |
| IntStream.range(int, int) | seq first last |
| Arrays.stream(Object[]) | dd |
| JarFile.stream() | jar tv or tar tv |
| Random.ints() | shuf -i |
| Collection.stream() | Database CLI, e.g. mysql |
| Stream.concat() | cat |
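A few of these sources can be tried out with the following sketch. The class and the helper names (seq, cat, lines) are hypothetical, picked to echo the Unix commands. One detail worth keeping in mind: IntStream.range() excludes its upper bound, whereas seq includes it, so the helper adds one to the end value.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical demo class; helper names mirror the Unix commands for illustration.
class StreamSourcesDemo {
    // Like `seq first last`: range() is end-exclusive, hence last + 1.
    static List<Integer> seq(int first, int last) {
        return IntStream.range(first, last + 1).boxed().collect(Collectors.toList());
    }

    // Like `cat a b`: the elements of a followed by the elements of b.
    static List<String> cat(List<String> a, List<String> b) {
        return Stream.concat(a.stream(), b.stream()).collect(Collectors.toList());
    }

    // Like `cat file`, reading lines from an in-memory string instead of a file.
    static List<String> lines(String text) {
        return new BufferedReader(new StringReader(text))
                .lines()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(seq(1, 5));
        System.out.println(cat(Arrays.asList("a"), Arrays.asList("b", "c")));
        System.out.println(lines("x\ny"));
    }
}
```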

Intermediate Stream Methods

Java's intermediate stream methods are the equivalent of typical Unix filters: they process stream data generating another stream as output.

| Java Stream Methods | Unix Pipeline Commands |
|---|---|
| filter(Predicate predicate) | grep RE or awk 'predicate' |
| map(Function mapper) | sed 's/RE/text/' or awk '{print ...}' or tr or cut or recode or rev or ... |
| distinct() | uniq |
| sorted() | sort |
| parallel() | xargs -P or parallel |
| peek(Consumer action) | tee >(...) |
| limit(long maxSize) | head |
| skip(long n) | tail |
| takeWhile(Predicate predicate) | sed -n '/RE/!q;p' or awk '!(predicate) {exit} {print}' |
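To see several of these filters chained together, consider the following sketch. The class name, method name, and sample data are hypothetical. The pipeline corresponds roughly to grep -v '^#' | tr a-z A-Z | sort | uniq | head -n 3.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; the pipeline is only an illustrative analogue.
class FilterPipelineDemo {
    static List<String> process(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.startsWith("#")) // grep -v '^#'
                .map(String::toUpperCase)              // tr a-z A-Z
                .sorted()                              // sort
                .distinct()                            // uniq
                .limit(3)                              // head -n 3
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "#c", "apple", "pear", "fig", "kiwi");
        System.out.println(process(input)); // prints [APPLE, FIG, KIWI]
    }
}
```

Unlike uniq, distinct() does not require adjacent duplicates, so the sorted() step is there only to match the Unix pipeline's output order.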

Terminal Stream Methods

Java's terminal stream methods consume a stream generating a result. The corresponding Unix commands produce the result as a single output line, or, for Boolean values, as their exit code.
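The following sketch shows two terminal methods in action. The pairings used in the comments (count() with wc -l, anyMatch() with grep -q) are my own assumptions for illustration, and the class and method names are hypothetical. The first method yields a single value, like a one-line result; the second yields a Boolean, which a Unix command would report through its exit code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; the Unix pairings in the comments are assumptions.
class TerminalDemo {
    // Single-value result, comparable to the one-line output of `wc -l`.
    static long countLines(List<String> lines) {
        return lines.stream().count();
    }

    // Boolean result, comparable to the exit code of `grep -q RE`.
    static boolean anyMatches(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return lines.stream().anyMatch(line -> pattern.matcher(line).find());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("foo", "bar", "baz");
        System.out.println(countLines(lines));       // prints 3
        System.out.println(anyMatches(lines, "^b")); // prints true
    }
}
```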