People occasionally ask for practical examples of macros when I lament the lack of macros in Perl. While I'm usually pleased at the degree to which Perl lets me design code to get and stay out of my way, sometimes its abstractions just aren't quite enough enough to remove all of the duplication available.

(I've been refactoring one of our business projects in preparation for another round of deployment in the next couple of weeks. We could launch without these improvements, but administrative work took almost two weeks longer than the afternoon I'd planned for it, so I decided it was worth my time to reduce technical friction so that further improvements are easier. More users means more work, so why not accelerate that work while I have the chance? I have another longer technical post to write to praise the use of Moose roles for a plugin system and to show off the stupidly-great task launcher, but that's for later.)

I found myself writing two code couplets that were similar enough they triggered my "Hey, refactor away this duplication!" alert. It's extra sensitive, because I know I'll have a few more couplets like this in the very near future:

while (my $stock = $stock_rs->next) { my $pe_update = $self->analyze_pe( $stock ); $stock_txn->add( $pe_update ) if $pe_update; my $cash_yield_update = $self->analyze_cash_yield( $stock ); $analysis_txn->add( $cash_yield_update ) if $cash_yield_update; }

The *_txn variables contain objects representing deferred and scoped SQL updates. I'll talk about that at YAPC::NA 2012 in When Wrong is Better.

The general pattern is this: for every stock in the appropriate resultset, call a method in this plugin. The method will return nothing if it fails (or has nothing to do) or it will return data to be added to the appropriate transaction. I have at least two types of transactions available here at the moment, and may have more later: one transaction updates stock data and the other updates analysis data.

I have several options. I could rework the data model so that this stage always only updates one transaction, in which the loop body could instead look like:

{ for my $method (qw( analyze_pe analyze_cash_yield )) { next unless my $result = $self->$method( $stock ); $txn->add( $result ); } }

This technique of hoisting the variants into an ad hoc data structure and using existing looping techniques works well sometimes. (I use it in other parts of the system.) It's relatively easy to expand, even though it moves interesting information ("I'm calling the analyze_pe method!") to a place where tools have more trouble finding it. (I search for >analyze_pe when I want to find method calls.) You may have used something similar to define several parametric methods at BEGIN time. It's the same type of pattern, and while Perl provides most of the tools necessary to allow this, it doesn't natively express this pattern well.

I could also change the transaction object's add() method to do nothing when it receives an empty list of arguments. I like that in some ways, but I don't like it in others. I've come down on the side of keeping its invariant (it always takes only one scalar as an object) pure for now. If I change it to take a list of updates, that might be the right time to reconsider this.

What I notice in the code as it stands right now is that the individual variables $pe_update and $cash_yield_update are synthetic variables. They only exist to support the code as written; they're not necessary for the algorithm. If I were to modify this code but only this code, I'd really rather write:

{ ADD_TXN_WITH( $self, analyze_pe, $stock, $stock_txn ); ADD_TXN_WITH( $self, analyze_cash_yield, $stock, $analysis_txn ); }

... though that syntax doesn't thrill me either. The clearest possibility I see right now is:

{ $stock_txn->add( SKIP unless $self->analyze_pe( $stock ) ); $analysis_txn->add( SKIP unless $self->analyze_cash_yield( $stock ) ); }

... where SKIP does some magic to move to the next statement, not the next loop iteration. (I have some ideas how to write XS to make this work, but that creepy yak needs a shave and some mouthwash.)

The second best option right now is adding a function or method as indirection to encapsulate the synthetic code. I'd rather avoid synthetic code, but at least it reduces the possibility of copy and paste bugs.

For now, with only two steps in this analysis, I'm leaving it as it is. Two repetitions of something this similar set off my refactoring alarm, but I resist the urge for refactorings this small until I see three instances of near-duplicate code.