At tonight's Chicago Perl Mongers Office Hours, Ray came up with an interesting problem. While testing all of CPAN for CPAN Testers, how do you detect when a test is hanging and kill it before it takes down the entire machine? How do you simply kill a test that is taking too long? And how do you do it without having a wholly separate watchdog program?

Ray's using Parallel::ForkManager to execute testing jobs in parallel across multiple Perl installs. There are a few ways we could implement timeouts, including IPC::Run's timeout function or Perl's built-in alarm, but these all have to be set up inside the child process. It'd be nicer if the parent process could watch its own children.
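For comparison, here is a minimal sketch of the child-side approach using alarm. The helper name `run_with_alarm` is made up for illustration. It relies on two POSIX behaviours: a pending alarm survives exec, and the default action for SIGALRM is to terminate the process. A test that installs its own SIGALRM handler, or resets the alarm, would escape this timeout, which is part of why watching from the parent is attractive.

```perl
use strict;
use warnings;
use POSIX qw( SIGALRM );

# Hypothetical helper: run @cmd in a child, with the timeout armed
# inside the child itself. Returns true if the child died from SIGALRM.
sub run_with_alarm {
    my ( $timeout, @cmd ) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: arm the alarm, then replace ourselves with the command.
        # The pending alarm is preserved across exec.
        alarm $timeout;
        exec { $cmd[0] } @cmd or die "exec failed: $!";
    }

    # Parent: block until the child is done, one way or another.
    waitpid $pid, 0;

    # The low 7 bits of $? hold the signal that killed the child, if any.
    return ( $? & 127 ) == SIGALRM;
}
```

Calling `run_with_alarm( 10, $^X, '-e', 'sleep 30' )` should return true after about 10 seconds.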

The result of that hacking is a script that spawns 5 workers at a time, each of which sleeps for a random number of seconds between 1 and 20. If a worker is alive for longer than the 10-second timeout (a 50% chance), it is forcibly killed.

When enough workers have been spawned, we check on all of our workers to see if any have lived too long. Once a worker has finished or been killed, and has been reaped, we can start another worker.
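The loop described above can be sketched with nothing but core Perl. This is a reconstruction under assumptions, not the script from that night: it uses plain fork and a non-blocking waitpid rather than Parallel::ForkManager, and the function name `run_workers` and its options (`jobs`, `max_workers`, `timeout`, `poll`, `work`) are invented for the example.

```perl
use strict;
use warnings;
use POSIX qw( WNOHANG );
use Time::HiRes qw( time sleep );

# Illustrative sketch: run $opt{jobs} workers, at most $opt{max_workers}
# at once, killing any worker that lives longer than $opt{timeout} seconds.
sub run_workers {
    my ( %opt ) = @_;
    my $jobs        = $opt{jobs}        // 10;
    my $max_workers = $opt{max_workers} // 5;
    my $timeout     = $opt{timeout}     // 10;
    my $poll        = $opt{poll}        // 1;
    # Default job: sleep for a random 1 to 20 seconds
    my $work        = $opt{work}        // sub { sleep 1 + int rand 20 };

    my %started;    # pid => time the worker was forked
    my %killed;     # pid => 1 once we have sent it SIGKILL
    my %count = ( finished => 0, killed => 0 );
    my $next  = 1;  # number of the next job to start

    while ( $next <= $jobs || %started ) {
        # Start workers until we hit the limit or run out of jobs
        while ( $next <= $jobs && keys %started < $max_workers ) {
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ( $pid == 0 ) {
                srand;    # reseed so siblings do not share random numbers
                $work->( $next );
                exit 0;
            }
            $started{$pid} = time();
            $next++;
        }

        # Check on every live worker
        for my $pid ( keys %started ) {
            if ( waitpid( $pid, WNOHANG ) == $pid ) {
                # Reaped: its slot is free for another worker
                $count{ $killed{$pid} ? 'killed' : 'finished' }++;
                delete $started{$pid};
                delete $killed{$pid};
            }
            elsif ( !$killed{$pid} && time() - $started{$pid} > $timeout ) {
                # Lived too long: kill it now, reap it on a later pass
                kill 'KILL', $pid;
                $killed{$pid} = 1;
            }
        }

        sleep $poll if %started;
    }

    return \%count;
}

# Pass a job count on the command line to run the demo,
# e.g. `perl watchdog.pl 10` (the file name is arbitrary).
if ( @ARGV ) {
    my $count = run_workers( jobs => $ARGV[0] );
    print "finished: $count->{finished}, killed: $count->{killed}\n";
}
```

With the defaults, this matches the numbers above: 5 workers at a time, a 10-second timeout, and sleeps of 1 to 20 seconds. The parent only polls once per `poll` interval, so a worker can overrun the timeout by up to that long before it is killed. To run real tests instead of sleeping, the `work` callback would exec the test command.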

With this code, hopefully we can prevent some of the test suites for CPAN distributions from forcing the tester to reboot their machine.