

shlomif_tech

[ shlomif ]

I've written about File-Find-Object before, but I have long intended to write an entry demonstrating its philosophical advantages over the core File::Find module. Today, I'd like to get to it.

As opposed to File::Find, File-Find-Object:

1. Has an iterative interface, and is capable of being interrupted in the middle.
2. Can be instantiated and be used to traverse an arbitrary number of directory trees in one process.
3. Can return result objects instead of just plain paths.

I'd like to demonstrate some of these advantages now.

Case Study #1: Looking for a Needle in a Haystack

Let's suppose you have a huge directory tree containing many directories and files, and you're looking for only one result (or a few). Once you have found that result, you wish to stop. This question was raised in this Stack Overflow post.

So how can you do it with File::Find? Not very easily. Either you can throw an exception:

    use strict;
    use warnings;

    use File::Find;

    # The directory to search; taken from the command line here.
    my $mydir = shift(@ARGV);

    sub processFile {
        if ($_ =~ /target/) {
            # Abort the traversal by throwing the result as an exception.
            die { type => "file-was-found", path => $File::Find::name };
        }
    }

    eval {
        find(\&processFile, $mydir);
    };

    if ( $@ ) {
        my $result = $@;
        if ( (ref($result) eq "HASH") && ($result->{type} eq "file-was-found") ) {
            my $path = $result->{path};
            # Do something with $path.
        }
        else {
            # A genuine error: rethrow it.
            die $result;
        }
    }
    else {
        # be sad
    }

This is incredibly inelegant, and abuses the Perl exception system for propagating values instead of errors. But there's an even worse way, using $File::Find::prune:

    #! /usr/bin/perl -w

    use strict;
    use File::Find;

    my @hits = ();
    my $hit_lim = shift || 20;

    find(
        sub {
            if( scalar @hits >= $hit_lim ) {
                # Enough hits: prune everything that is still pending.
                $File::Find::prune = 1;
                return;
            }
            elsif( -d $_ ) {
                return;
            }
            push @hits, $File::Find::name;
        },
        shift || '.'
    );

    $, = "\n";
    print @hits, "\n";

Here, we prune all the levels from the results up to the root to get out of the loop.

So how can you do it with File-Find-Object? In a very straightforward manner:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use File::Find::Object;

    sub find_needle {
        my $base = shift;

        my $finder = File::Find::Object->new({}, $base);

        # Pull one path at a time and return as soon as one matches.
        while (defined(my $r = $finder->next())) {
            if ($r =~ /target/) {
                return $r;
            }
        }

        return;
    }

    my $found = find_needle(shift(@ARGV));

    if (defined($found)) {
        print "$found\n";
    }
    else {
        die "Could not find target.\n";
    }

The find_needle() function is the important thing here, and one can see that it doesn't use any exceptions, excessive prunes or anything like that. It just harnesses the iterative interface of File-Find-Object. And it works too:

    shlomi:~$ perl f-f-o-find-needle.pl ~/progs/
    /home/shlomi/progs/Rpms/BUILD/ExtUtils-MakeMaker-6.52/t/dir_target.t
    shlomi:~$

Case Study #2: Recursive Diff

Let's suppose an evil djinni has removed the -r flag from your diff program, making you unable to recursively find the differences between the files in two directory trees. As a result, you now need to write a recursive-diff program in Perl that will run diff -u on the two copies of each equivalent path in the two directories.

Since File::Find cannot run two traversals side by side, using it would require collecting all the results from both directories first, and then traversing them in memory. But with File-Find-Object there is a better way:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use File::Find::Object;
    use List::MoreUtils qw(all);

    my @indexes = (0,1);
    my @paths;
    for my $idx (@indexes) {
        push @paths, shift(@ARGV);
    }

    # One independent finder per directory tree.
    my @finders = map { File::Find::Object->new({}, $_ ) } @paths;

    my @results;
    my @fns;

    # Advance finder number $idx and record the path of its current result.
    sub fetch {
        my $idx = shift;

        if ($results[$idx] = $finders[$idx]->next_obj()) {
            $fns[$idx] = join("/", @{$results[$idx]->full_components()});
        }

        return;
    }

    sub only_in {
        my $idx = shift;

        printf("Only in %s: %s\n", $paths[$idx], $fns[$idx]);
        fetch($idx);

        return;
    }

    for my $idx (@indexes) {
        fetch($idx);
    }

    COMPARE:
    while (all { $_ } @results) {
        # Skip anything that is not a plain file.
        my $skip = 0;
        foreach my $idx (@indexes) {
            if (!$results[$idx]->is_file()) {
                fetch($idx);
                $skip = 1;
            }
        }
        if ($skip) {
            next COMPARE;
        }

        if ($fns[0] lt $fns[1]) {
            only_in(0);
        }
        elsif ($fns[1] lt $fns[0]) {
            only_in(1);
        }
        else {
            # The same path exists in both trees: compare the two copies.
            system("diff", "-u", map {$_->path() } @results);
            foreach my $idx (@indexes) {
                fetch($idx);
            }
        }
    }

    # Report whatever is left over in the tree that was not exhausted.
    foreach my $idx (@indexes) {
        while($results[$idx]) {
            only_in($idx);
        }
    }

(As a bonus, we do not need to sort the results explicitly at any stage, because File-Find-Object sorts them for us.)

This program did not take me a long time to write, it works pretty well, and it does not populate a long list of the results of one or both directories in memory.

Conclusion

If you use File-Find-Object instead of File::Find, your code may be cleaner, your logic less convoluted, and you may actually be able to achieve things that are not possible with the latter. I hope I have whetted your appetite here and convinced you to give File-Find-Object a try.

So what does the future hold? I recently ported File-Find-Rule to File-Find-Object and called the result File-Find-Object-Rule. As a result, "->start" and "->match" are now truly iterative, and I believe you can iterate with them on several objects at once. As I discovered while porting File-Find-Object-Rule-MMagic, I unfortunately cannot maintain full backwards compatibility with the plugin API of File-Find-Rule, because the latter exposes some of the behaviour of File::Find (in a leaky abstraction fashion).

I'm planning on porting more File-Find-Rule plugins to File-Find-Object-Rule, and would appreciate any help. I also would like to look at the directory tree traversal APIs of other languages to see if they contain any interesting techniques.