Update: Changed subroutine to comply with Perl Best Practices

Update2: Removed the prototype from the subroutine.

I’ve always had a problem with recursion. Not with the general theory that a function will call itself, etc – no, that’s easy. The hard part was when I had to deal with complex data structures in Perl (an array- or hashref containing a hash of arrays of hashes, a gazillion levels deep). Well, I guess anyone would have a hard time with that kind of data.

Anyway, in this post I don’t intend to get all complicated explaining every kind of recursion out there. If you want that, check the Wikipedia article on recursion. What I do want to do is help all of those who are in the situation I was in, by explaining in the simplest way possible how to deal with this scenario.

Let’s start with a need. I have a complex data structure whose strings need the leading and trailing whitespace trimmed. But since I’m lazy, I’d like my subroutine to modify the data directly, and not return the modified value (pass by reference, not pass by value).

Here’s our data structure:

    my $data = [
        {
            key1 => ' trim me! ',
            key2 => ' trim me too! ',
        },
        [
            'some element to trim ',
            ' another one ',
        ],
        ' a simple string needing trimming ',
    ];

$data explained: an arrayref containing 3 elements. Element 0 is a hashref with the keys "key1" and "key2", element 1 is an arrayref of 2 elements, and element 2 is a simple string. All values have some extra spaces that need trimming (or so they say). We could use whatever number of levels and data types we want (except for anonymous subroutines, I guess; let’s not get too complicated).

Now, to trim all that, I want to be able to simply call trim() à la PHP.

    trim($data);    # note the lack of an assignment (no lvalue = rvalue): $data is modified in place

I also want it to accept simple arrays and hashes, and the references thereof: trim(@array); trim(\@array); trim(%hash); trim(\%hash); trim($string). After all, I never know what kind of data my colleagues will be working with. Better have it deal with everything.

The logic to do that is this: our subroutine will have to do the trimming (s///g) on scalars only. For that, it has to check if the data it received is a hash, array, etc, and if it is, iterate through each element and trim the value… but only if the element is not itself a hash, array, etc. Found it confusing? No problem, it really is.

In Perl, if I tried to remove the white space from element 0 of my $data variable, it wouldn’t work. The reason is that if I printed $data->[0] onto the screen, I’d get a funny looking output, something like HASH(0x1004f5f0). That’s Perl’s way of saying that you have a reference to a hash stored at memory address 0x1004f5f0. You can try to trim the spaces off of that string, but it won’t do you any good. The elements of your hash will still be untouched. That’s why you need to de-reference your data structures and dive into them.
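If you want to see this for yourself, here is a minimal sketch. The one-element $data is a cut-down, illustrative version of the structure above, and the address printed will differ on your machine.

```perl
use strict;
use warnings;

# A cut-down, illustrative version of $data.
my $data = [ { key1 => ' trim me! ' } ];

# In string context a reference looks something like HASH(0x1004f5f0).
my $as_string = "$data->[0]";
print "$as_string\n";

# Trimming the reference itself finds no whitespace to remove,
# so the substitution does nothing and the hash value keeps its spaces.
my $changed = ( $data->[0] =~ s/^\s+|\s+$//g );
print $changed ? "changed\n" : "nothing to trim\n";
printf "[%s]\n", $data->[0]{key1};
```

The value under key1 comes out exactly as it went in, spaces and all.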

To de-reference a structure is simple: just add a % in front of the variable if it’s a hashref (%{$ref}), or an @ if it’s an arrayref (@{$ref}). But how do you know which is which? Use ref().

    print ref($data->[0]) . "\n";    # HASH
    print ref($data->[1]) . "\n";    # ARRAY
    print ref($data->[2]) . "\n";    # empty string, which is false
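Putting ref() and de-referencing together, a short sketch that walks the top level of $data and reports what it finds (the messages are just for illustration):

```perl
use strict;
use warnings;

my $data = [
    { key1 => ' trim me! ', key2 => ' trim me too! ' },
    [ 'some element to trim ', ' another one ' ],
    ' a simple string needing trimming ',
];

for my $element ( @{$data} ) {
    my $type = ref $element;
    if ( $type eq 'HASH' ) {
        # %{...} turns the hashref back into a hash
        my @keys = sort keys %{$element};
        print "hash with keys: @keys\n";
    }
    elsif ( $type eq 'ARRAY' ) {
        # @{...} turns the arrayref back into an array
        my $count = scalar @{$element};
        print "array with $count elements\n";
    }
    else {
        # ref() returned an empty string: not a reference
        print "plain scalar: '$element'\n";
    }
}
```

This only looks one level deep; handling deeper levels is exactly where the recursion comes in.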

ref() tells you what kind of reference you are dealing with, and returns an empty string if the value is not a reference at all. It returns CODE for a code reference such as a closure or anonymous subroutine, but we’re not going there today.

So, now that we know how to identify the type of element we’re going to be working with, we can build our subroutine…

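    sub trim {
        for my $param (@_) {
            if ( ref($param) eq 'ARRAY' ) {
                # Recurse into every element of the array
                for my $element ( @{$param} ) {
                    trim($element);
                }
            }
            elsif ( ref($param) eq 'HASH' ) {
                # Recurse into every value of the hash
                for my $val ( values %{$param} ) {
                    trim($val);
                }
            }
            elsif ( ref($param) eq 'CODE' ) {
                # Leave code references alone (this also skips any remaining arguments)
                return;
            }
            else {
                # Plain scalar: strip leading and trailing whitespace
                $param =~ s/(^\s+|\s+$)//g;
            }
        }
    }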

trim() explained:

We’re working with the elements by reference instead of by value. Perl aliases the elements of @_ to the caller’s variables, and a for loop aliases its loop variable to each element in turn, so changing $param changes the original data. This means that the elements themselves will be modified, with no need to return any data. The first thing we do is iterate through all the parameters passed to trim(). In a subroutine, the arguments (in our case, variables) arrive in the special @_ array, allowing us to call trim($var1, $var2, $var3) if we want.
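A small sketch of the difference between working on @_ directly and working on a copy. Both helper subroutines are hypothetical and exist only for this demonstration:

```perl
use strict;
use warnings;

# Hypothetical helper: changes its first argument through the @_ alias.
sub shout_in_place {
    $_[0] = uc $_[0];
    return;
}

# Hypothetical helper: copies the argument first, so the caller sees no change.
sub shout_copy {
    my ($text) = @_;
    $text = uc $text;
    return;
}

my $first  = 'hello';
my $second = 'hello';

shout_in_place($first);
shout_copy($second);

print "$first $second\n";    # HELLO hello
```

trim() relies on the first behaviour, which is why it never needs to copy @_ into named variables.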

We iterate through all elements of @_ and check whether each one is an array reference. If it is, we iterate through its elements and call trim() again on each of them. That will handle as many nested arrays as we want (or as your computer can handle). Now we have to make it deal with hashes. Same technique: use ref() to see if it’s a hash reference. If it is, we iterate through its key/value pairs. There are several ways to do that. You could call keys and use each key to fetch the value, but the subroutine above calls values instead, because the loop variable is then an alias to the real hash value and any change sticks. Each value is passed to trim() for more checking. We also check to see if we received a sub { } (anonymous subroutine). In that case, we do nothing and just return, which also means that any arguments listed after a code reference in the same call are not processed.
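To compare the two ways of walking a hash, here is a sketch with two small, made-up hashes. One loop goes through values, the other through keys; both end up trimming the stored strings:

```perl
use strict;
use warnings;

my %by_values = ( a => ' x ', b => ' y ' );
my %by_keys   = ( a => ' x ', b => ' y ' );

# values hands back aliases, so editing $val edits the hash itself.
for my $val ( values %by_values ) {
    $val =~ s/^\s+|\s+$//g;
}

# The keys-based alternative reaches into the hash on every access.
for my $key ( keys %by_keys ) {
    $by_keys{$key} =~ s/^\s+|\s+$//g;
}

print join( ',', map { "$_=$by_values{$_}" } sort keys %by_values ), "\n";
print join( ',', map { "$_=$by_keys{$_}" } sort keys %by_keys ), "\n";
```

Which one you pick is mostly a matter of taste; the values form simply saves a hash lookup per element.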

Finally, after handling arrays, hashes and anonymous subroutines, we can set up the actual trimming of the strings. We take $param, which at this point is a plain scalar, and remove the leading and trailing whitespace with one neat substitution: ^\s+ stands for leading whitespace, \s+$ stands for trailing whitespace, and it’s all joined by the ( | ) (this or that). We only call it once because we’re using the global (g) modifier of the substitution s///g, which lets it match at both ends of the string.
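To tie it together, here is a complete script you can run. The trim() in it is my reading of the subroutine described in this post (no prototype, \s for whitespace), and @array, %hash and $string are made-up test values:

```perl
use strict;
use warnings;

sub trim {
    for my $param (@_) {
        if ( ref($param) eq 'ARRAY' ) {
            for my $element ( @{$param} ) { trim($element); }
        }
        elsif ( ref($param) eq 'HASH' ) {
            for my $val ( values %{$param} ) { trim($val); }
        }
        elsif ( ref($param) eq 'CODE' ) {
            return;    # code references are left alone
        }
        else {
            $param =~ s/(^\s+|\s+$)//g;    # plain scalar: do the trimming
        }
    }
    return;
}

my $data = [
    { key1 => ' trim me! ', key2 => ' trim me too! ' },
    [ 'some element to trim ', ' another one ' ],
    ' a simple string needing trimming ',
];
my @array  = ( ' one ', ' two ' );
my %hash   = ( name => '  padded  ' );
my $string = "\tsurrounded by whitespace \n";

trim($data);       # nested structure
trim(@array);      # plain array
trim( \%hash );    # hash reference
trim($string);     # simple string

printf "[%s] [%s] [%s]\n", $data->[0]{key1}, $data->[1][1], $data->[2];
printf "[%s] [%s] [%s]\n", $array[0], $hash{name}, $string;
```

Every call modifies its argument in place, so there is nothing to assign back.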

And that’s all there is to it!