I struggled with a problem where a given method would return an array of array refs of data, but the order (and sometimes presence) of array ref elements were sometimes slightly different. This is because this code needed to test real data and I could not mock the results. After giving this some thought, I realized I wanted something like the Levenshtein edit distance for data structures. Marcel Grünauer suggested that each element get assgined a unicode character. This solves my problem nicely with the following code ...

(Fair warning, this is a hack)

use strict; use Test::More 'no_plan'; use Test::Differences; use Data::Dumper; use Text::WagnerFischer 'distance'; sub is_almost($$$;$) { my ( $have, $want, $threshhold, $message ) = @_; $message ||= 'The two arrays should be close enough'; unless ( 'ARRAY' eq ref $have and 'ARRAY' eq ref $want ) { require Carp; Carp::confess( "First two arguments to is_almost() must be array refs"); } local $Test::Builder::Level = $Test::Builder::Level + 1; if ( !@$want ) { if ( !@$have ) { pass $message; } else { eq_or_diff $have, $want, $message; } return; } my %char_for; my $index = 1; my ( $have_str, $want_str ) = ( '', '' ); local $Data::Dumper::Indent = 0; local $Data::Dumper::Sortkeys = 1; local $Data::Dumper::Terse = 1; foreach my $element (@$have) { $have_str .= $char_for{ Dumper($element) } ||= chr( $index++ ); } foreach my $element (@$want) { $want_str .= $char_for{ Dumper($element) } ||= chr( $index++ ); } my $distance = distance( $have_str, $want_str ) / @$want; if ( $distance <= $threshhold ) { pass $message; if ($distance) { diag "Distance is $distance"; } } else { eq_or_diff $have, $want, $message; diag "Distance is $distance"; } } my $want = [ [ 1, 'North Beach', 'au', 'city' ], [ 2, 'North Beach', 'us', 'city' ], [ 3, 'North Beach', 'us', 'city' ], [ 4, 'North Beach Hotel', 'us', 'hotel' ], [ 5, 'North Beach', 'us', 'city' ], [ 6, 'North Beach', 'us', 'city' ], ]; my $have = [ [ 1, 'North Beach', 'au', 'city' ], [ 2, 'North Beach', 'us', 'city' ], [ 3, 'North Beach', 'us', 'city' ], [ 4, 'North Beach Hotel', 'us', 'hotel' ], [ 5, 'North Beach', 'us', 'city' ], [ 6, 'North Beach', 'us', 'city' ], ]; is_almost $have, $want, .20; $have = [ [ 2, 'North Beach', 'us', 'city' ], [ 3, 'North Beach', 'us', 'city' ], [ 4, 'North Beach Hotel', 'us', 'hotel' ], [ 5, 'North Beach', 'us', 'city' ], [ 6, 'North Beach', 'us', 'city' ], ]; is_almost $have, $want, .20; $have = [ [ 2, 'North Beach', 'us', 'city' ], [ 3, 'North Beach', 'us', 'city' ], [ 4, 'North Beach Hotel', 'us', 'hotel' ], [ 5, 'North Beach', 'us', 'city' ], [ 6, 'North Beach', 'us', 'city' ], [ 1, 'North Beach', 'au', 'city' ], ]; is_almost $have, $want, .20; __END__ ok 1 - The two arrays should be close enough ok 2 - The two arrays should be close enough # Distance is 0.166666666666667 not ok 3 - The two arrays should be close enough # Failed test 'The two arrays should be close enough' # at almost.pl line 90. # +----+------------------------------+----+------------------------------+ # | Elt|Got | Elt|Expected | # +----+------------------------------+----+------------------------------+ # | | * 0|1,North Beach,au,city * # | 0|2,North Beach,us,city | 1|2,North Beach,us,city | # | 1|3,North Beach,us,city | 2|3,North Beach,us,city | # | 2|4,North Beach Hotel,us,hotel | 3|4,North Beach Hotel,us,hotel | # | 3|5,North Beach,us,city | 4|5,North Beach,us,city | # | 4|6,North Beach,us,city | 5|6,North Beach,us,city | # * 5|1,North Beach,au,city * | | # +----+------------------------------+----+------------------------------+ # Distance is 0.333333333333333

It seems many of the "edit distance" modules struggle with unicode, so I played with different ones until I had one which led to results I considered vaguely satisfactory.

This is the first pass at a rough, rough hack. Suggestions welcome.