For April, I was assigned AnyEvent::ForkManager, which claims to provide an interface similar to Parallel::ForkManager, but compatible with AnyEvent. The module had some CPAN testers’ failures as well as an issue reported on GitHub, so I tried to fix it. I wasn’t quite successful, though.

The issue reported that the module's tests hung on MSWin. At work, I use Cygwin, so I tried to install the module there to see how the Linux/MSWin hybrid would do. I was able to install all the prerequisites, including the main one, AnyEvent, “a framework to do event-based programming”. Nevertheless, the tests for the module itself got stuck, albeit in a different place than the one reported in the issue.

I sprinkled the code with debugging messages (Basic debugging checklist #2) and discovered that the following line doesn't return:

isnt $$, $pm->manager_pid, 'called by child';

At first, I thought that manager_pid was the problem, so I extracted the call from the statement:

my $mpid = $pm->manager_pid;
isnt $$, $mpid, 'called by child';

Surprisingly, $mpid was populated correctly; it was the isnt that didn’t return. That seemed very suspicious: isnt is used in test suites all over CPAN, so it shouldn’t cause problems! Or maybe the isnt wasn’t the isnt I thought it was. I checked the dependencies and discovered Test::SharedFork, which defines its own testing subroutines. Adding some debugging output to it revealed the real problem in the constructor of a Test::SharedFork::Store::Locker object:

flock $store->{fh}, LOCK_EX or die $!;

The flock was waiting for the exclusive lock indefinitely. Out of curiosity, I inserted the following before the problematic line:
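When a blocking flock never returns, a non-blocking attempt can at least report that the lock is held somewhere else instead of hanging. The following is only a minimal sketch of that idea: the helper name try_lock_exclusive is hypothetical and is not part of Test::SharedFork.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical debugging helper, not part of Test::SharedFork.
# Tries to take an exclusive lock without blocking.
# Returns 1 on success, 0 if someone else holds the lock,
# and dies on any other error.
sub try_lock_exclusive {
    my ($fh) = @_;
    return 1 if flock($fh, LOCK_EX | LOCK_NB);
    return 0 if $!{EWOULDBLOCK} || $!{EAGAIN};
    die "flock failed: $!";
}
```

Calling it in place of the blocking flock and warning when it returns 0 would show whether the lock is contended at that point, at the cost of changing the timing of the test.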

use Data::Dumper;
$Data::Dumper::Deparse = 1;
warn Dumper($store);

Strangely, not only was I able to explore the structure, but all the tests passed. “Race condition!” thought I, and tried replacing the inserted lines with Time::HiRes::usleep(200). The tests were still passing, but when I lowered the value, they started to hang again.

Race conditions appear only sometimes, so I tried running the test suite 50 times on my Linux desktop. It failed 7 times with the following detail:
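Repeating a flaky test and counting the failures is easy to script. This is a minimal sketch, not the command I actually ran: the helper name count_failures is made up, and running the test through prove is an assumption. It has no timeout, so a run that hangs (as on Cygwin) would block the loop.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command $runs times and return
# how many of the runs finished with a non-zero status.
sub count_failures {
    my ($runs, @cmd) = @_;
    my $failures = 0;
    for my $run (1 .. $runs) {
        system(@cmd);
        # $? is non-zero on a failing exit code, on death by signal,
        # and (as -1) when the command could not be started at all.
        $failures++ if $? != 0;
    }
    return $failures;
}

# Assumed invocation, commented out so the sketch has no side effects:
# my $failed = count_failures(50, 'prove', '-l', 'xt/nonblocking.t');
# print "$failed of 50 runs failed\n";
```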

Interrupted system call at /home/choroba/perl5/lib/perl5/Test/SharedFork/Store.pm line 104.
Can't use an undefined value as a HASH reference at /home/choroba/perl5/lib/perl5/Test/SharedFork/Store.pm line 51.
END failed--call queue aborted at xt/nonblocking.t line 104.

On my laptop, the failures were less frequent (about 2/50), and sometimes, the message was different:

Interrupted system call at /home/choroba/perl5/lib/perl5/Test/SharedFork/Store.pm line 104.
Magic number checking on storable file failed at /usr/lib/perl5/5.18.1/x86_64-linux-thread-multi/Storable.pm line 398, at /home/choroba/perl5/lib/perl5/Test/SharedFork/Store.pm line 51.
END failed--call queue aborted at xt/nonblocking.t line 104.

Line 104 in Store.pm is the flock line shown above.
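“Interrupted system call” is the message for EINTR, so one plausible reading is that a signal arrived while the process was blocked in flock, and the "or die" turned that into a fatal error. A common way to handle EINTR is to retry the call. The sketch below shows that pattern only; the helper name lock_exclusive is hypothetical, this is not what Test::SharedFork does, and I haven't established that it would cure the failures above.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Errno qw(EINTR);

# Hypothetical helper, not part of Test::SharedFork.
# Takes an exclusive lock, retrying when a signal interrupts
# the blocking call; dies on any other error.
sub lock_exclusive {
    my ($fh) = @_;
    until (flock($fh, LOCK_EX)) {
        next if $! == EINTR;    # interrupted by a signal: try again
        die "flock failed: $!";
    }
    return 1;
}
```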