ooph… yah… file_list isn’t very good.

That is some of the first Rust I wrote when I started working on this project. Taking the Vec as a parameter was a holdover from writing C-style code in Rust, which I now understand isn't necessary in most cases. You can see I didn't write code in that style later, once I learned that a Vec is returned cheaply (by move). As for the size, I can't remember why I returned the size of the Vec.
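For the curious, here's a minimal sketch of the "return it, don't fill it" shape (this is a hypothetical stand-in, not the library's actual file_list). Returning the Vec is cheap because only its (pointer, length, capacity) header moves; the heap contents are never copied:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical sketch: build and return the Vec instead of taking a
// caller-supplied one as an out-parameter (the C style). The Vec is
// moved out of the function; no element is copied.
fn file_list(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        files.push(entry?.path());
    }
    Ok(files)
}
```

The caller also gets the size for free via `files.len()`, which is why a separate size return isn't needed.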

I never really paid attention to that function after I got deeper into the project, but it should probably be removed entirely. I was never thrilled with it being part of the library anyway. process_files should probably take a slice of paths (e.g. &[std::path::PathBuf]) and let the user decide how they want to build the file list, or something along those lines.
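Something like the following, sketched under assumptions: process_files here is a hypothetical v2 signature, and parse_file is a stand-in for the real per-file parser (it just counts lines as "records"). Making it generic over `P: AsRef<Path>` lets callers pass `&[PathBuf]`, `&[&Path]`, and so on:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Stand-in for the library's real per-file parser: counts lines
// as "records" so the sketch is self-contained.
fn parse_file(path: &Path) -> io::Result<u64> {
    Ok(fs::read_to_string(path)?.lines().count() as u64)
}

// Hypothetical v2 shape: the caller builds the file list however
// they like; the library only borrows it.
fn process_files<P: AsRef<Path>>(paths: &[P]) -> io::Result<u64> {
    let mut records = 0;
    for p in paths {
        records += parse_file(p.as_ref())?;
    }
    Ok(records)
}
```

This also sidesteps the question of who owns the list: the library never allocates or frees it.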

And that’s what v2.0.0 is for.

As far as benchmarking goes, I'm not a fan of microbenchmarks in general. The one in the repo is somewhat useful, however, and I don't think I'll do much more than that on the microbenchmark front. Were you considering something specific?

If you go through the git history, you can see I was benchmarking with an example program run against real data as I learned and made changes, and recording the results in the main README. However, that example program is actually being used by my employer, so I had to move it to a project in their GitHub account before I published. The test project also did more than benchmark the library; it benchmarked the way my employer uses it.

That being said, I just stripped the example program down to get it pretty close to benchmarking only the parsing. Here's the result:

Processed 138 files having 11446671 records in 36120 milliseconds.

Not bad: that works out to roughly 317,000 records per second. That's with a pull request someone sent me that got rid of a ton of allocations, and it's without any concurrency. It would be very easy to write a program that does the parsing in parallel.
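To illustrate what I mean, here's a rough sketch using std scoped threads, one thread per file (a real program would cap the thread count or use something like rayon). parse_file is again a hypothetical stand-in for the library's parser, counting lines as "records":

```rust
use std::fs;
use std::path::PathBuf;
use std::thread;

// Stand-in for the library's parser: counts lines as "records".
fn parse_file(path: &PathBuf) -> u64 {
    fs::read_to_string(path)
        .map(|s| s.lines().count() as u64)
        .unwrap_or(0)
}

// Parse every file on its own scoped thread and sum the record counts.
// The scope guarantees all threads finish before `paths` is released.
fn process_parallel(paths: &[PathBuf]) -> u64 {
    thread::scope(|scope| {
        let handles: Vec<_> = paths
            .iter()
            .map(|p| scope.spawn(move || parse_file(p)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```

Since each file is parsed independently, there's no shared mutable state to coordinate, which is why I say it would be easy.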

I'll probably add an example project later that does the same as the current, stripped-down example. The problem is that it's nearly impossible to reproduce the results unless I leave a bunch of log files hanging around on my local drive. I'll have to think about that.

Thanks for taking a look. I appreciate the feedback.