The Issue

In the last month of 2016, I was assigned OrePAN2 in the CPAN Pull Request Challenge. When browsing its issues on GitHub, I discovered #47:

Right now we cannot easily rebuild a minicpan with a lot of modules because the MetaCPAN lookup fails. The problem is here: OrePAN2::Indexer line 148

This code needs to break up the query after X releases have been pushed onto the @file_search stack. I don’t have number handy, but trying to rebuild the minicpan will yield it fairly quickly.

AC: OrePAN2 can accommodate lookups for an arbitrary number of modules

For testing the MetaCPAN behaviour, see the use MetaCPAN subtest in t/06_inject_live.t. The logic which needs to be tweaked is in OrePAN2::Indexer::do_metacpan_lookup(). We could create an accessor that sets a threshold on how many modules to search on in @search_by_archives. If the number of files we need to look up exceeds the threshold, then we need to loop over the MetaCPAN search logic in order to get everything we need.

The accompanying test could inject 2 files into the $tmpdir and then use a very low threshold (like 1 archive) in order to force the looping behaviour. If both releases are found in $orepan->_metacpan_lookup then we have a green light.

Cannot Reproduce

On one hand, the issue’s complexity seemed to be medium, exactly what I felt able to solve by the end of the month. On the other hand, the description smelled of micromanagement: the steps to fix the issue were explained in detail, but the issue itself wasn’t given much focus.

As usual, I wanted to first reproduce the problem, then write a failing test for it, and then make the test pass by fixing the code. I created several CPAN mirrors, but I wasn’t able to reproduce the problem. I expected 1000 to be the threshold, as both MetaCPAN::Client::Request and MetaCPAN::Client mention

size => 1000,

But MetaCPAN lookup worked for me even with 1020 distributions. So, after some hesitation, I decided to just follow the instructions.

The code that processed the lookup consisted of two consecutive loops. The first loop gathered information from all the releases, while the second one iterated over the modules corresponding to the releases. The chunked processing just wrapped both loops in a simple

while (@search_by_archives) {
    my @search_by_archives_chunk
        = splice @search_by_archives, 0, $self->metacpan_lookup_size;

As specified, the test needed a way to set the threshold. Have you noticed the metacpan_lookup_size in the previous snippet? Yes, that’s it. The default is set to 200, but the test uses 1.

my $orepan = OrePAN2::Indexer->new(
    directory            => $tmpdir,
    metacpan             => 1,
    metacpan_lookup_size => 1,
);
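The chunking idea itself is easy to try in isolation. What follows is a minimal sketch, not OrePAN2's actual code: lookup_in_chunks and $fake_lookup are hypothetical names I made up for illustration, and the callback merely stands in for the real MetaCPAN query. It shows how splice cuts the list into pieces of at most the given size, and why a size of 1 forces one lookup per archive.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of OrePAN2: calls $lookup once per chunk
# of at most $size archives and merges the returned hashes.
sub lookup_in_chunks {
    my ($size, $lookup, @archives) = @_;
    die "chunk size must be positive\n" if $size < 1;

    my %found;
    my $calls = 0;
    while (@archives) {
        # splice removes and returns up to $size elements from the front.
        my @chunk = splice @archives, 0, $size;
        %found = (%found, %{ $lookup->(@chunk) });
        ++$calls;
    }
    return (\%found, $calls);
}

# Stand-in for the real MetaCPAN query (an assumption for this sketch):
# maps every archive name to a fake result.
my $fake_lookup = sub { return { map { $_ => "release for $_" } @_ } };

my @archives = ('Foo-0.01.tar.gz', 'Bar-0.02.tar.gz');
my ($found, $calls) = lookup_in_chunks(1, $fake_lookup, @archives);
printf "%d archives found in %d lookups\n", scalar keys %$found, $calls;
```

With two archives and a chunk size of 1, the callback runs twice. With the default-like size of 200 it would run once, which is why a test needs the tiny threshold to exercise the loop at all.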

The test passed (but it didn’t fail with the old code, either), so I created a pull request and asked for proper testing, including verification that the old issue was fixed wherever it was reproducible. The pull request was later merged, but I’m still not sure it really fixed the original problem. Is anyone able to reproduce the failure in the older version (0.45) and confirm that it is fixed in 0.46?

The Newton Tube Experiment

The situation reminds me of our physics teacher in high school: “Today, I’m going to show you the famous Newton Tube experiment. I have the tube here, it contains a ball and a feather. In the first part of the experiment, the tube is filled with air, and you can see that the ball falls faster than the feather. In the second part of the experiment, we should pump the air out of the tube and see the feather fall as fast as the ball. Unfortunately, my pump is broken, so we’ll skip this part. In the third part, the air is let back into the tube, and you can see the ball fall faster again. We’ve seen two thirds of the famous Newton Tube experiment.”

We had to believe, but we wanted to really see.