This is the eleventh progress report from the Wide Finder Project; I’ll use it as the results accumulator, updating it in place even if I go on to write more on the subject, which seems very likely. [Update: Your new leader: Perl.]

The Name · There’ve been a few posts wondering about the name “Wide Finder”. The original Ruby code came from my Beautiful Code chapter called “Finding Things”; thus it was a Finder. The problem with modern CPUs is that they’re getting wider not faster. Thus, I seek a Wide Finder.

Worth Doing · There is a steady drumbeat of commentary along the lines of “WTF? This is a trivial I/O-bound hack you could do on the command line, ridiculously inappropriate for investigating issues of parallelism.” To which I say “Horseshit”. It’s an extremely mainstream, boring, unglamorous file-processing batch job. There’s not the slightest whiff of IT Architecture or Social Networking or Service-Orientation about it. Well, that’s how lots of life is. And the Ruby version runs faster on my Mac laptop than on the mighty T5120. The T5120 is what most of the world’s future computers are going to look like! Houston, we have a problem, and the Wide Finder is a useful lens to examine it.

The Latest · [2007/11/19]: After a week or so off, I’m back to running Wide Finder code. First: Sean O’Rourke’s Perl code.

The good news is that it’s insanely fast; the bad news is that I had to visit CPAN to get Sys::Mmap, and CPAN hates me. Always has. In this case, the Makefile.PL makes a Makefile that doesn’t work on Solaris out of the box; you have to mangle the incantations first.

I also ran Dave Thomas’ memory-mapped code.

Neither the table’s format nor contents are carved in stone; I’m quite sure I’ll update it as I pour more results in. Are there any missing columns? Or outright broken-ness? Feel free to offer suggestions.

I still have a few Wide Finder implementations to run; if you’ve done one and it doesn’t appear here, it couldn’t hurt to drop me a line to make sure I know about it.

Results · The data is essentially all of the ongoing logfile from March of 2007; 971,538,252 bytes of data in 4,625,236 lines. I will make it available in compressed form to other experimenters on request, but I’ll require you to convince me that you won’t publish it.
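To put the elapsed times in perspective, the two figures above can be turned into an average line length and a bytes-per-second rate. This is a minimal Python sketch of that arithmetic only; the function names are made up for illustration, and the two elapsed times are the wf(32) and wf(1) entries from the results table.

```python
# Figures quoted in the text: size of the test log.
TOTAL_BYTES = 971_538_252
TOTAL_LINES = 4_625_236


def avg_line_length(total_bytes=TOTAL_BYTES, total_lines=TOTAL_LINES):
    """Mean bytes per log line, newline included."""
    return total_bytes / total_lines


def throughput_mb_per_sec(elapsed_seconds, total_bytes=TOTAL_BYTES):
    """Input consumed per wall-clock second, in units of 10**6 bytes."""
    return total_bytes / elapsed_seconds / 1e6


if __name__ == "__main__":
    print(f"average line: {avg_line_length():.1f} bytes")
    # Elapsed seconds for wf(32) and wf(1), taken from the results table.
    for elapsed in (1.51, 14.85):
        print(f"{elapsed:6.2f}s -> {throughput_mb_per_sec(elapsed):7.1f} MB/s")
```

The average line comes out at roughly 210 bytes, and the fastest run works through the file at something over 600 MB per second of wall-clock time.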

In each case, the benchmark was run at least twice, usually in succession with other benchmarks, in an attempt to have the disk cache as hot as possible.

This is a production T5120 with eight cores, each at 1.4GHz, and 64G of RAM. Its I/O performance is unexciting.

The table can be sorted by clicking on column headings.

Name               Language  Elapsed  User CPU  System CPU  LoC  Notes
wf(32)             Perl      1.51     16.06     2.89        61   O’Rourke
wf(16)             Perl      1.70     13.79     2.66        61   O’Rourke
wf-mmap-multicore  JoCaml    1.76     2.42      0.55        278  Fernandez
wf(64)             Perl      1.77     18.72     3.56        61   O’Rourke
wf(8)              Perl      2.52     12.63     2.49        61   O’Rourke
wfinder7_2(32)     Erlang    3.54     †         †           345  Nygren
wfinder7_2(16)     Erlang    4.09     †         †           345  Nygren
wf(4)              Perl      4.25     12.24     2.38        61   O’Rourke
wfinder7_2(64)     Erlang    4.27     †         †           345  Nygren
wf-6(16)           Python    4.38     *         *           137  Lundh
wfinder8(32)       Erlang    4.42     46.28     12.27       322  Nygren
wfinder8(64)       Erlang    4.45     56.13     18.92       322  Nygren
wfinder8(16)       Erlang    4.74     38.48     7.40        322  Nygren
tbray9a(128)       Erlang    5.26     36.75     8.39        121  Caoyuan
tbray9a(32)        Erlang    5.29     36.64     8.11        121  Caoyuan
tbray9a(64)        Erlang    5.45     36.79     8.23        121  Caoyuan
tbray9a(16)        Erlang    5.60     36.08     8.27        121  Caoyuan
wf-6(8)            Python    5.81     *         *           137  Lundh
wfinder7_2(8)      Erlang    6.02     †         †           345  Nygren
wfinder8(128)      Erlang    6.02     2:20.71   24.75       322  Nygren
wfinder8(8)        Erlang    6.39     34.94     5.49        322  Nygren
wfinder1_1         Erlang    6.46     34.07     8.02        287  Nygren
tbray9a(8)         Erlang    7.63     35.28     8.33        121  Caoyuan
wf(2)              Perl      7.64     12.16     3.32        61   O’Rourke
wf_pichi3          Erlang    8.28     51.98     9.38        545  Pichi
wf-6(4)            Python    9.08     3.66      1.89        137  Lundh
wfinder7_2(4)      Erlang    9.97     †         †           345  Nygren
wfinder8(4)        Erlang    10.50    33.03     5.12        322  Nygren
tbray9a(4)         Erlang    11.81    35.37     8.25        121  Caoyuan
wf-mmap            OCaml     14.64    12.20     2.44        200  Fernandez
wf(1)              Perl      14.85    12.09     3.25        61   O’Rourke
wf-6(2)            Python    16.91    3.62      1.86        137  Lundh
wfinder8(2)        Erlang    18.88    31.58     4.83        322  Nygren
wf-block           OCaml     18.99    12.96     6.01        144  Fernandez
tbray9a(2)         Erlang    20.14    35.31     8.28        121  Caoyuan
tbray5             Erlang    20.74    3:51.33   8.00        76   Caoyuan
wfinder8(1)        Erlang    36.11    31.17     4.72        322  Nygren
tbray9a(1)         Erlang    37.58    35.51     7.82        121  Caoyuan
wf                 OCaml     39.17    31.48     7.69        124  Fernandez
wf-2               Python    41.04    34.80     6.24        38   Lundh
widefinder         Perl      44.29    1:15.22   12.78       57   Wong
clv5               Gawk      46.73    40.63     6.10        24   Paddy3118
wf                 OCaml     49.69    41.94     7.75        110  Heikkinen
wf_p               Ruby      50.16    37.58     12.50       39   Heikkinen
dave               Ruby      58.27    43.18     14.39       8    Thomas
widefinder3        PHP       1:00:25  55:04     5.21        39   Beattie
tbray5             Erlang    1:04.32  35:33.35  45.84       93   Vinoski
report-counts      Ruby      1:43.71  1:27.11   16.60       13   Bray
?                  Groovy    2:21.83  2:22.97   19.95       17   Brown

Variability · The quantities in the cells marked “ * ” above varied so much from run to run that including them would probably be actively misleading.

Unknown · The quantities in the cells marked “†” are unknown. These multi-process Erlang runs arrange for the absence of parent/child process relationships, so it’s trickier to determine User and System CPU times.
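For a sense of why the missing parent/child relationship matters, here is a minimal Python sketch of how a timing harness commonly obtains User and System CPU: it asks the operating system for the CPU time charged to its own waited-for children. This illustrates the general mechanism under that assumption; it is not a description of how the numbers in the table were gathered, and time_command and the stand-in workload are hypothetical names invented for the example.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion; return (elapsed, user_cpu, system_cpu) in seconds."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # waits for the child to exit
    elapsed = time.perf_counter() - start
    after = os.times()
    # children_* only accumulates CPU used by descendants that were waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return elapsed, user, system


if __name__ == "__main__":
    # Stand-in workload: a child Python process that burns a little CPU.
    busy = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
    elapsed, user, system = time_command(busy)
    print(f"elapsed {elapsed:.2f}s  user {user:.2f}s  system {system:.2f}s")
```

Worker processes that are not descendants of the timed command (for example, ones that are launched separately and merely talked to) never show up in these counters, so their CPU time has to be collected some other way.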

Notes on Running Nygren’s wfinder7_2 · This is multi-process not multi-thread. The best results are with 32 processes.

Notes on Running Nygren’s wfinder8 · Check out Anders’ notes. The number in parentheses is the value of the +S argument to Erlang, telling it how many schedulers to run. Note that this is an 8-core machine with two integer instruction threads per core and support for eight thread contexts per core. Solaris thinks it sees 64 CPUs. So if you don’t specify +S the default is 64, which seems about right.

Notes on Running the wf-* Python Code · This code is from Fredrik Lundh; see his discussion. I added a number-of-processes command-line parameter to wf-6.py, which appears in parentheses in the table above; so wf-6(8) means running with eight processes. I left the chunk size at 50M for now; this is just (just barely) small enough to run with 16 processes.
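The two knobs mentioned here, a process count and a chunk size, can be illustrated with a minimal Python sketch of chunked, multi-process log counting. This is not Lundh's wf-6.py and makes no claim about how that program is structured. The regular expression is the article-fetch pattern usually quoted for the Wide Finder task and does not appear in this report, so treat it, along with every function name below, as an assumption.

```python
import os
import re
import sys
from collections import Counter
from multiprocessing import Pool

# Assumed pattern (not from this report): fetches of dated article URLs.
PATTERN = re.compile(rb"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) ")
CHUNK_SIZE = 50 * 1024 * 1024  # "50M", read here as 50 MiB


def chunk_ranges(size, chunk_size=CHUNK_SIZE):
    """Split [0, size) into consecutive (start, end) byte ranges."""
    return [(s, min(s + chunk_size, size)) for s in range(0, size, chunk_size)]


def count_chunk(job):
    """Count matches on every line that starts inside [start, end)."""
    path, start, end = job
    counts = Counter()
    with open(path, "rb") as f:
        if start > 0:
            f.seek(start - 1)
            if f.read(1) != b"\n":
                f.readline()  # mid-line: the previous chunk owns this line
        while f.tell() < end:
            line = f.readline()  # may run past end; that line is still ours
            if not line:
                break
            m = PATTERN.search(line)
            if m:
                counts[m.group(1)] += 1
    return counts


def wide_count(path, processes, chunk_size=CHUNK_SIZE):
    """Fan the chunks out to a pool of worker processes and merge the tallies."""
    jobs = [(path, s, e) for s, e in chunk_ranges(os.path.getsize(path), chunk_size)]
    total = Counter()
    with Pool(processes) as pool:
        for partial in pool.imap_unordered(count_chunk, jobs):
            total.update(partial)
    return total


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python sketch.py LOGFILE PROCESSES
    for key, n in wide_count(sys.argv[1], int(sys.argv[2])).most_common(10):
        print(n, key.decode("ascii", "replace"))
```

Each worker owns the lines that begin inside its byte range, so a line straddling a chunk boundary is counted exactly once. The sketch streams lines rather than holding a whole chunk in memory, so it says nothing about why 50M chunks were a tight fit with 16 processes on this run.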

Fredrik took a glance and suggested removing wf-3 through wf-5 in the interests of making the table more readable.

The reporting of user and system CPU time was wildly variable. For example, wf-6(8) reported user CPU as low as 5.06 seconds and as high as 31.99 seconds. I suspect that this may be worth bringing to the attention of the Solaris people; the state of the art in tracking CPU usage in this fairly exotic type of machine is still a little shaky, apparently.

Analysis · Are you kidding me!?!? Getouttahere. Maybe someday.

Previous News · [2007/10/30]: I’ve been back from Shanghai for a few days now, but it took till past lunch today to get logged into the mighty T5120 and doing some Wide Finding. It’s a pretty naked Solaris box, and compiling all this stuff has been just a bundle of fun. Not.

[2007/10/31]: Added a Lines-of-Code column to the table. Ran Fredrik Lundh’s multi-process Python, Ilmari Heikkinen’s OCaml, and Russ Beattie’s PHP. There are a few that I’ve tried and failed to run; Erlang code that won’t compile, C and JoCaml code that will compile but not run. In each case I’ve pinged the author, and I may go back and try to see if I can sort things out.

[2007/11/01]: Got Pichi’s patched Erlang to run. 545 lines, wow! Caoyuan’s too, much more concise, but not as fast.

[2007/11/01]: Did some runs with Nygren’s wfinder8; details here.

[2007/11/04]: Per advice from contributors below, I installed sorttable and it seems to work just fine!

[2007/11/05]: I ran Caoyuan’s latest; notes here.

Then I ran Mauricio Fernandez’ OCaml and JoCaml; see Aim for the Top! Beating the current #1 Wide Finder log analyzer with the join-calculus. Holy crap! I have to say that the OCaml build/run process is kind of klunky; this isn’t your classic REPL by any means. But you know, this isn’t the first time that I’ve seen OCaml thump the competition on a benchmark; it’s just the first time that it was a benchmark I cared about.

[2007/11/08]: I ran Eric Wong’s Perl code (notes here); kind of disappointing.

Then I ran Anders Nygren’s revised wfinder7_2; he writes: “The difference compared to wfinder8 is that this one does not use the SMP virtual machine. Instead it starts a number of slave erlang nodes.”

Caoyuan revised his tbray9 to produce tbray9a, requesting that I replace the results.

[2007/11/09]: Russ Beattie revised his PHP code, renaming it widefinder3, and indeed, it moved a few places up the table.

Also, based on several requests about this box’s I/O performance, I did a Bonnie run, which is kind of interesting.

I/O · There have been appeals, both in the comments and in email, for a characterization of the test box’s I/O performance. I ran Bonnie with the following results:

    -------Sequential Output-------- ---Sequential Input-- --Random--
    -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 GB M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU  /sec %CPU
 20  23.9 99.7  41.5 29.8  44.2 61.8  22.3  100 161.8  100 21835 239.4

Given that this stupid thing has 64G of RAM and I could only find 20G of disk space to work with, the results should be taken with a grain of salt (especially the Random Seeks number). Having said that, watching the free-memory readout made it look to me as though when it was doing the sequential runs, it was doing real I/O, not just caching.