A lot of the advice given in Modern Perl: the Book is advice learned the hard way, whether through making my own mistakes, debugging code written on my teams, or reviewing code written by other people to help novices become better programmers. After a decade-plus of this experience, I think I've developed a good sense of what people find confusing and what problems rarely occur in practice.

The pervasive use of global variables? It'll eventually catch up to you. Variables popping into existence upon use, not declaration? It'll cause problems far sooner than you ever expect.

Clobbering $_ at a distance inside a map block? It happened to me the other day. Yes, it surprised me too.

I've been attaching Lucy, the search library with Perl bindings, to a document processing engine. As part of the processing stage, my code adds documents to the search index. The index schema keeps track of specific fields, and it's denormalized slightly to decouple the document database from the search index. The code to add a document to the index creates a hash from method calls on each document object. That's where things started to go wrong:

    sub add_entries {
        my ($self, $entries) = @_;
        my $index            = $self->index;

        for my $entry (@$entries) {
            my $fields = {
                map { $_ => scalar $entry->$_() } keys %index_fields
            };

            $index->add_doc( $fields );
        }

        $index->commit;
        $self->clear_index;
    }

I noticed things went wrong when my test bailed out with strange errors. Lucy was complaining about getting a hash key of '', the empty string. I was certain that %index_fields was correct.

While most of the methods called are simply accessors for simple object properties, these document objects have a method called short_content():

    sub short_content {
        my $self    = shift;
        my $meth    = length $self->content > $MIN_LENGTH
                    ? 'content'
                    : 'summary';
        my $content = $self->$meth();

        return unless defined $content;

        my $splitter     = Lingua::Sentence->new( 'en' );
        my $total_length = 0;
        my @sentences;

        for my $sentence ($splitter->split_array( $content )) {
            push @sentences, $sentence;
            $total_length += length $sentence;

            # must be a big mess, if this is true
            return $self->summary
                if @sentences == 1 and $total_length > $MAX_SANE_LENGTH;

            last if $total_length > 0.65 * $MAX_LENGTH;
        }

        if (@sentences) {
            my $text = join ' ', @sentences;
            return $text
                if length $text > $MAX_SENTENCE_LENGTH && $text =~ /\S/;
        }

        return substr $content, 0, $MAX_SHORT_CONTENT_LENGTH;
    }

A document may have a summary. It has content. short_content() returns a slice of the first significant portion of either, depending on which exists. While it's not the most detailed portion of the document, it's the earliest significant portion of the document, and it's demonstrably the best portion to index as a summary. (Thank you, inverted pyramid.)

The rest of this method attempts to break the short content at a sentence boundary, so as not to cut it off in the middle of a word or thought.

Nothing in that method obviously clobbers $_, but something called from it apparently does. (I wonder if Lingua::Sentence or one of its dependencies reads from a file or performs a substitution.) Regardless, I protected my precious hash key with a little defensive programming:

    my $fields = {
        map { my $field = $_; $field => scalar $entry->$field() }
        keys %index_fields
    };

... and all was well.
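To see the mechanism in isolation, here's a minimal sketch of the same failure. Everything in it is hypothetical: noisy_accessor() stands in for short_content(), and the bare while (<$fh>) loop is only my guess at the kind of code that could clobber $_ somewhere down the call chain. It is not taken from Lingua::Sentence or Lucy.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method like short_content(): somewhere in its
# call chain, a bare while (<$fh>) assigns every line to the global $_.
sub noisy_accessor {
    my ($name) = @_;    # copy the argument before anything touches $_
    my $data = "line one\nline two\n";
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while (<$fh>) { }   # at end of file, $_ holds undef
    return "value of $name";
}

my %index_fields = ( title => 1, author => 1 );

# Unsafe: the key on the left of => is still an alias to $_ when the call
# on the right clobbers it. Expect "uninitialized value" warnings here.
my %unsafe = map { $_ => noisy_accessor($_) } keys %index_fields;

# Safe: copy $_ into a lexical before calling anything else.
my %safe = map { my $field = $_; $field => noisy_accessor($field) }
    keys %index_fields;

printf "unsafe keys: [%s]\n", join '][', sort keys %unsafe;
printf "safe keys:   [%s]\n", join '][', sort keys %safe;
```

In the unsafe version both keys should collapse into the empty string, which matches the complaint Lucy made. The safe version keeps the real field names because the lexical copy is made before the call has a chance to overwrite $_.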

While this has been a very rare occurrence in 13+ years of Perl 5 programming, the trap is particularly insidious. The more work Perl does within that map block, the greater the potential for action at a distance. Furthermore, the better my abstractions—the more behavior hidden behind that simple method call—the greater my exposure to this type of bug.

Throw in things like dependency injection and Moose lazy attributes, where you don't have to manage the dependency graph of your data initialization and flow yourself (generally a good thing), and you can't tell what's going to happen where or when.

If my instincts are correct, and something reads from a file somewhere such that only the first call to construct a Lingua::Sentence object clobbers $_, the point is doubly true.
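A sketch of that first-call-only scenario, and of the fix on the callee's side, might look like the following. The names careless_load() and careful_load() and the in-memory "file" are assumptions invented for illustration. I haven't confirmed that any real module behaves this way.

```perl
use strict;
use warnings;

my $data = "alpha\nbeta\n";
my $cache;    # hypothetical lazily loaded resource

# Clobbers $_ only on the first call, while the cache is still empty.
sub careless_load {
    return $cache if $cache;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    my @lines;
    while (<$fh>) { chomp; push @lines, $_ }    # writes to the global $_
    return $cache = \@lines;
}

# Same work with a lexical loop variable; the caller's $_ is never touched.
sub careful_load {
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    my @lines;
    while (my $line = <$fh>) { chomp $line; push @lines, $line }
    return \@lines;
}

# Only the first iteration sees the damage, and because map aliases $_ to
# each element, the clobbering writes through into @names itself.
my @names  = qw( a b c );
my @result = map { careless_load(); defined $_ ? $_ : 'CLOBBERED' } @names;

my @fresh = qw( a b c );
my @ok    = map { careful_load(); $_ } @fresh;

print "careless: @result\n";
print "careful:  @ok\n";
```

A bug like this shows up once per process and then vanishes, which makes it miserable to reproduce. Writing while (my $line = <$fh>) in the callee removes the hazard at its source, but you can only do that in code you control. The lexical copy in the map block protects you from code you don't.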