After having too much fun with a 20 year old filesystem and the inability of unix commands to handle odd filenames, I decided to replace find /somewhere -type f | xargs -P 10 -n 1 do-stuff with a Perl 6 script.

The first step is to traverse a directory tree. I don't really need to keep a list of paths, but I do want to run stuff in parallel. Generating a supply in a thread seems to be a reasonable thing to do.

start my $s = supply {
    for '/snapshots/home-2019-01-29/' {
        emit .IO if (.IO.f & ! .IO.l);
        .IO.dir()».&?BLOCK if (.IO.d & ! .IO.l);
        CATCH { default { put BOLD .Str } }
    }
}



{
    my @files;
    react {
        whenever $s {
            @files.push: $_;
        }
    }
    say +@files;
    say now - ENTER now;
}



Recursion is done by calling the for block on the topic with .&?BLOCK . It's very short and very slow. It takes 21.3s for 200891 files — find will do the same in 0.296s.

The OS won't be the bottleneck here, so maybe threading will help. I don't want to overwhelm the OS with filesystem requests though. The built-in Telemetry module can tell us how many tasks are queued up for the thread pool at any given time. If we use Promise to start workers by hand, we can fall back to plain recursion when the pool is already backed up.

use Telemetry;

sub recurse(IO() $_){
    my @ret;
    @ret.push: .Str if (.IO.f & ! .IO.l);

    if (.IO.d & ! .IO.l) {
        if Telemetry::Instrument::ThreadPool::Snap.new<gtq> > 4 {
            @ret.append: do for .dir() { recurse($_) }
        } else {
            @ret.append: await do for .dir() {
                Promise.start({ recurse($_) })
            }
        }
    }

    CATCH { default { put BOLD .Str } }
    @ret.Slip
}

{
    say +recurse('/snapshots/home-2019-01-29');
    say now - ENTER now;
}



That takes 7.65s, which is a big improvement but still miles from the performance of a 20 year old C implementation. Also, find can do the same and more on a single CPU core instead of producing a load of ~800%.

Poking around in Rakudo's source, one can clearly see why. There are loads of IO::Path objects created and C strings concatenated, just to unbox those C strings and hand them over to some VM opcodes. All I want are absolute paths I can call open with. We have to go deeper!

use nqp;

my @files;
my @dirs = '/snapshots/home-2019-01-29';

while @dirs.shift -> str $dir {
    my Mu $dirh := nqp::opendir(nqp::unbox_s($dir));
    while my str $name = nqp::nextfiledir($dirh) {
        next if $name eq '.' | '..';
        my str $abs-path = nqp::concat( nqp::concat($dir, '/'), $name);
        next if nqp::fileislink($abs-path);
        @files.push: $abs-path if nqp::stat($abs-path, nqp::const::STAT_ISREG);
        @dirs.push: $abs-path if nqp::stat($abs-path, nqp::const::STAT_ISDIR);
    }
    CATCH { default { put BOLD .Str, ' ⟨', $dir, '⟩' } }
    nqp::closedir($dirh);
}

say +@files; say now - ENTER now;

And this finishes in 2.58s on just one core, and it should play nicer in situations where few file handles are available. Still 9 times slower than find, but workable. Wrapping it into a supply is a task for another day.
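As a hedged sketch of what that could look like (not from the post — the nqp calls mirror the loop above, and find-files is a name I made up), the same loop can emit paths from inside a supply block, so a consumer can start working on files while the walk is still running:

```raku
use nqp;

# Sketch only: wrap the nqp directory walk into a Supply that emits
# absolute paths of regular files as they are discovered.
sub find-files(Str $root --> Supply) {
    supply {
        my @dirs = $root;
        while @dirs.shift -> str $dir {
            my Mu $dirh := nqp::opendir(nqp::unbox_s($dir));
            while my str $name = nqp::nextfiledir($dirh) {
                next if $name eq '.' | '..';
                my str $abs-path = nqp::concat(nqp::concat($dir, '/'), $name);
                next if nqp::fileislink($abs-path);
                emit $abs-path if nqp::stat($abs-path, nqp::const::STAT_ISREG);
                @dirs.push: $abs-path if nqp::stat($abs-path, nqp::const::STAT_ISDIR);
            }
            nqp::closedir($dirh);
        }
    }
}

# Consume the paths as they arrive:
react {
    whenever find-files('/snapshots/home-2019-01-29') -> $path {
        say $path;
    }
}
```

Since a supply block is lazy, the walk only starts once somebody taps it, and back-pressure comes for free from the serial nature of supply blocks.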

So for the time being — if you want fast, you need nqp.

UPDATE: We need to check the currently waiting tasks, not the number of spawned workers. Example changed to Snap.new<gtq>.
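For completeness, a minimal sketch of taking such a snapshot by hand (the gtq key is taken from the recurse example above; my reading of it as the length of the thread pool's general queue is an assumption):

```raku
use Telemetry;

# Snapshot the thread pool; <gtq> holds the number of tasks sitting in
# the general queue, i.e. work no worker has picked up yet.
my $snap = Telemetry::Instrument::ThreadPool::Snap.new;
say $snap<gtq>;
```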