

shlomif_tech

[ shlomif ]

I've been working on File-Find-Object for quite some time now. File-Find-Object provides an alternative to File::Find, which is a built-in module in the perl 5 distribution, with several major philosophical limitations. What both modules do is traverse a directory tree and allow the programmer to do something with all the results, similar to the Unix shell's "find" command. My obsession with working on File-Find-Object started in this perl5-porters thread, where I also claimed that File::Find was probably IO-Bound, and so making it iterative and capable of instantiation (which is partially what File-Find-Object is all about), will not harm its performance much. As it turns out, I was not entirely right. About two weeks ago, I decided to finally benchmark File-Find-Object. The original motivation for this was a small change I did and perceived as an optimisation in which I reduced the number of hard-disk-reading system calls that it used to a minimum. I wrote a few shell scripts that timed the old version of F-F-O versus the new one on my ~/progs directory (which contains many sub-directories), after it was in turn traversed by the shell "find" command. Both version took somewhat over 2 minutes to complete the scan, which seemed pretty good to me. However, when I benchmarked a similar script using File::Find, I found out that it finished the scan in under 5 seconds, making it 24 times faster than File-Find-Object. I didn't want F-F-O to be so much worse, so I decided to work on optimising it. My methodology in optimising File-Find-Object was to profile it using Devel::NYTProf (the Perl profiler with the best reputation, and one which I can now recommend as well), find problematic spots, try to optimise them, and then benchmark the old code against the new to see if there were any improvements, and if so - commit. Among the notable changes I performed were: The first thing I noticed was that a lot of time was spent in Class::Accessor code. To remedy this, I switched to Class-XSAccessor, which resulted in saving 40 seconds of runtime. The other changes I made were less substantial individually in terms of run-time saving. Added a flag to $self (the object instance) with whether or not the stack of the directories through which the traversal is in progress is full. It was used inside several dynamically-generated functions where I delegate to top and non-top versions. To save some extra speed, the flag is checked using exists which should be a bit faster than checking for the truth of the value. Saved 2.5 seconds of run-time by implementing a commonly used function (that was reported to consume a lot of time) in a direct manner, instead of the top/non-top delegation, and using some judicious optimisation using the "||" operator. Saved many seconds on eliminating calls to _copy methods that flat-copy array references (and that are also generated dynamically). Often, the copying was not needed at all. Eliminated excessive calls to File::Spec's functions by caching the results. For each node, File-Find-Object performs two actions in a specific order depending on the "depth" instance-wide parameter. Previously, handling them involved weird returns of arrays, and indexes-mangling (which I admit that I originally implemented in my endless refactoring). This was changed to a simple array of method names, from which values are extracted. Re-arranged the order of operations to prevent an if (_is_top()) { ... } conditional, that was done in every iteration. Originally, the main object served as the controlling object for the top directory, and dir_stack->[0] was the directory below it. So when checking for fatherhood of a directory, one had to return the main object for the father of dir_stack->[0] . In a series of commits, I moved it to be a separate object, as well as occupying the first step in the directory stack. As a result, I was able to cache the "_current" object inside an accessor, because doing so in for the main object before would have resulted in a circular reference, which would have caused a memory leak. Thanks to all these optimisations, File-Find-Object now runs at about 30-40 seconds. Yesterday I also benchmarked the first version of File-Find-Object before I took over its maintenance, and started refactoring it, and it runs at 17 seconds for the same test. So what are the Conclusions? Accessors in Perl may incur a large overhead if Class::XSAccessor (or similar) are not used. Extracting methods excessively can incur a heavy penalty on performance of Perl programs. Traversing a directory in Perl can be very CPU-bound. Many small optimisations can together yield a huge benefit. Bill Raymond says in this message to the fc-solve-discuss mailing list that "I achieved my fast times by multitudes of 1% reductions". I can attest to it here. </p> As indicated by the benchmarks, the road to a faster File-Find-Object is not over, and I still have some ideas for improvements in mind. But it's still much better than it used to be, and I can be proud of it. The title of this post was inspired by Ido Trivizki's "Shavin' another bit" presentation given at YAPC::Israel::2004, and which I attended and enjoyed.