I'm a few years behind here, but:

In 'Edit 4/5/6' of the original post, you are using the construction:

$ /usr/bin/time cat big_file | program_to_benchmark

This is wrong in a couple of different ways:

You're actually timing the execution of cat , not your benchmark. The 'user' and 'sys' CPU usage displayed by time are those of cat , not your benchmarked program. Even worse, the 'real' time is also not necessarily accurate. Depending on the implementation of cat and of pipelines in your local OS, it is possible that cat writes a final giant buffer and exits long before the reader process finishes its work. Use of cat is unnecessary and in fact counterproductive; you're adding moving parts. If you were on a sufficiently old system (i.e. with a single CPU and -- in certain generations of computers -- I/O faster than CPU) -- the mere fact that cat was running could substantially color the results. You are also subject to whatever input and output buffering and other processing cat may do. (This would likely earn you a 'Useless Use Of Cat' award if I were Randal Schwartz.

A better construction would be:

$ /usr/bin/time program_to_benchmark < big_file

In this statement it is the shell which opens big_file, passing it to your program (well, actually to time which then executes your program as a subprocess) as an already-open file descriptor. 100% of the file reading is strictly the responsibility of the program you're trying to benchmark. This gets you a real reading of its performance without spurious complications.

I will mention two possible, but actually wrong, 'fixes' which could also be considered (but I 'number' them differently as these are not things which were wrong in the original post):

A. You could 'fix' this by timing only your program:

$ cat big_file | /usr/bin/time program_to_benchmark

B. or by timing the entire pipeline:

$ /usr/bin/time sh -c 'cat big_file | program_to_benchmark'

These are wrong for the same reasons as #2: they're still using cat unnecessarily. I mention them for a few reasons:

they're more 'natural' for people who aren't entirely comfortable with the I/O redirection facilities of the POSIX shell

there may be cases where cat is needed (e.g.: the file to be read requires some sort of privilege to access, and you do not want to grant that privilege to the program to be benchmarked: sudo cat /dev/sda | /usr/bin/time my_compression_test --no-output )

in practice, on modern machines, the added cat in the pipeline is probably of no real consequence.

But I say that last thing with some hesitation. If we examine the last result in 'Edit 5' --

$ /usr/bin/time cat temp_big_file | wc -l 0.01user 1.34system 0:01.83elapsed 74%CPU ...

-- this claims that cat consumed 74% of the CPU during the test; and indeed 1.34/1.83 is approximately 74%. Perhaps a run of:

$ /usr/bin/time wc -l < temp_big_file

would have taken only the remaining .49 seconds! Probably not: cat here had to pay for the read() system calls (or equivalent) which transferred the file from 'disk' (actually buffer cache), as well as the pipe writes to deliver them to wc . The correct test would still have had to do those read() calls; only the write-to-pipe and read-from-pipe calls would have been saved, and those should be pretty cheap.

Still, I predict you would be able to measure the difference between cat file | wc -l and wc -l < file and find a noticeable (2-digit percentage) difference. Each of the slower tests will have paid a similar penalty in absolute time; which would however amount to a smaller fraction of its larger total time.

In fact I did some quick tests with a 1.5 gigabyte file of garbage, on a Linux 3.13 (Ubuntu 14.04) system, obtaining these results (these are actually 'best of 3' results; after priming the cache, of course):

$ time wc -l < /tmp/junk real 0.280s user 0.156s sys 0.124s (total cpu 0.280s) $ time cat /tmp/junk | wc -l real 0.407s user 0.157s sys 0.618s (total cpu 0.775s) $ time sh -c 'cat /tmp/junk | wc -l' real 0.411s user 0.118s sys 0.660s (total cpu 0.778s)

Notice that the two pipeline results claim to have taken more CPU time (user+sys) than real wall-clock time. This is because I'm using the shell (bash)'s built-in 'time' command, which is cognizant of the pipeline; and I'm on a multi-core machine where separate processes in a pipeline can use separate cores, accumulating CPU time faster than realtime. Using /usr/bin/time I see smaller CPU time than realtime -- showing that it can only time the single pipeline element passed to it on its command line. Also, the shell's output gives milliseconds while /usr/bin/time only gives hundredths of a second.

So at the efficiency level of wc -l , the cat makes a huge difference: 409 / 283 = 1.453 or 45.3% more realtime, and 775 / 280 = 2.768, or a whopping 177% more CPU used! On my random it-was-there-at-the-time test box.

I should add that there is at least one other significant difference between these styles of testing, and I can't say whether it is a benefit or fault; you have to decide this yourself:

When you run cat big_file | /usr/bin/time my_program , your program is receiving input from a pipe, at precisely the pace sent by cat , and in chunks no larger than written by cat .

When you run /usr/bin/time my_program < big_file , your program receives an open file descriptor to the actual file. Your program -- or in many cases the I/O libraries of the language in which it was written -- may take different actions when presented with a file descriptor referencing a regular file. It may use mmap(2) to map the input file into its address space, instead of using explicit read(2) system calls. These differences could have a far larger effect on your benchmark results than the small cost of running the cat binary.