For a dictionary speedup, there is one general transformation/process you can do to greatly reduce the dictionary comparisons ahead of time.

Given that the above grid contains only 16 characters, some of them duplicate, you can greatly reduce the number of total keys in your dictionary by simply filtering out entries that have unattainable characters.

I thought this was the obvious optimization but seeing nobody did it I'm mentioning it.

It reduced me from a dictionary of 200,000 keys to only 2,000 keys simply during the input pass. This at the very least reduces memory overhead, and that's sure to map to a speed increase somewhere as memory isn't infinitely fast.

Perl Implementation

My implementation is a bit top-heavy because I placed importance on being able to know the exact path of every extracted string, not just the validity therein.

I also have a few adaptions in there that would theoretically permit a grid with holes in it to function, and grids with different sized lines ( assuming you get the input right and it lines up somehow ).

The early-filter is by far the most significant bottleneck in my application, as suspected earlier, commenting out that line bloats it from 1.5s to 7.5s.

Upon execution it appears to think all the single digits are on their own valid words, but I'm pretty sure thats due to how the dictionary file works.

Its a bit bloated, but at least I reuse Tree::Trie from cpan

Some of it was inspired partially by the existing implementations, some of it I had in mind already.

Constructive Criticism and ways it could be improved welcome ( /me notes he never searched CPAN for a boggle solver, but this was more fun to work out )

updated for new criteria

#!/usr/bin/perl use strict; use warnings; { # this package manages a given path through the grid. # Its an array of matrix-nodes in-order with # Convenience functions for pretty-printing the paths # and for extending paths as new paths. # Usage: # my $p = Prefix->new(path=>[ $startnode ]); # my $c = $p->child( $extensionNode ); # print $c->current_word ; package Prefix; use Moose; has path => ( isa => 'ArrayRef[MatrixNode]', is => 'rw', default => sub { [] }, ); has current_word => ( isa => 'Str', is => 'rw', lazy_build => 1, ); # Create a clone of this object # with a longer path # $o->child( $successive-node-on-graph ); sub child { my $self = shift; my $newNode = shift; my $f = Prefix->new(); # Have to do this manually or other recorded paths get modified push @{ $f->{path} }, @{ $self->{path} }, $newNode; return $f; } # Traverses $o->path left-to-right to get the string it represents. sub _build_current_word { my $self = shift; return join q{}, map { $_->{value} } @{ $self->{path} }; } # Returns the rightmost node on this path sub tail { my $self = shift; return $self->{path}->[-1]; } # pretty-format $o->path sub pp_path { my $self = shift; my @path = map { '[' . $_->{x_position} . ',' . $_->{y_position} . ']' } @{ $self->{path} }; return "[" . join( ",", @path ) . "]"; } # pretty-format $o sub pp { my $self = shift; return $self->current_word . ' => ' . $self->pp_path; } __PACKAGE__->meta->make_immutable; } { # Basic package for tracking node data # without having to look on the grid. # I could have just used an array or a hash, but that got ugly. # Once the matrix is up and running it doesn't really care so much about rows/columns, # Its just a sea of points and each point has adjacent points. # Relative positioning is only really useful to map it back to userspace package MatrixNode; use Moose; has x_position => ( isa => 'Int', is => 'rw', required => 1 ); has y_position => ( isa => 'Int', is => 'rw', required => 1 ); has value => ( isa => 'Str', is => 'rw', required => 1 ); has siblings => ( isa => 'ArrayRef[MatrixNode]', is => 'rw', default => sub { [] } ); # Its not implicitly uni-directional joins. It would be more effient in therory # to make the link go both ways at the same time, but thats too hard to program around. # and besides, this isn't slow enough to bother caring about. sub add_sibling { my $self = shift; my $sibling = shift; push @{ $self->siblings }, $sibling; } # Convenience method to derive a path starting at this node sub to_path { my $self = shift; return Prefix->new( path => [$self] ); } __PACKAGE__->meta->make_immutable; } { package Matrix; use Moose; has rows => ( isa => 'ArrayRef', is => 'rw', default => sub { [] }, ); has regex => ( isa => 'Regexp', is => 'rw', lazy_build => 1, ); has cells => ( isa => 'ArrayRef', is => 'rw', lazy_build => 1, ); sub add_row { my $self = shift; push @{ $self->rows }, [@_]; } # Most of these functions from here down are just builder functions, # or utilities to help build things. # Some just broken out to make it easier for me to process. # All thats really useful is add_row # The rest will generally be computed, stored, and ready to go # from ->cells by the time either ->cells or ->regex are called. # traverse all cells and make a regex that covers them. sub _build_regex { my $self = shift; my $chars = q{}; for my $cell ( @{ $self->cells } ) { $chars .= $cell->value(); } $chars = "[^$chars]"; return qr/$chars/i; } # convert a plain cell ( ie: [x][y] = 0 ) # to an intelligent cell ie: [x][y] = object( x, y ) # we only really keep them in this format temporarily # so we can go through and tie in neighbouring information. # after the neigbouring is done, the grid should be considered inoperative. sub _convert { my $self = shift; my $x = shift; my $y = shift; my $v = $self->_read( $x, $y ); my $n = MatrixNode->new( x_position => $x, y_position => $y, value => $v, ); $self->_write( $x, $y, $n ); return $n; } # go through the rows/collums presently available and freeze them into objects. sub _build_cells { my $self = shift; my @out = (); my @rows = @{ $self->{rows} }; for my $x ( 0 .. $#rows ) { next unless defined $self->{rows}->[$x]; my @col = @{ $self->{rows}->[$x] }; for my $y ( 0 .. $#col ) { next unless defined $self->{rows}->[$x]->[$y]; push @out, $self->_convert( $x, $y ); } } for my $c (@out) { for my $n ( $self->_neighbours( $c->x_position, $c->y_position ) ) { $c->add_sibling( $self->{rows}->[ $n->[0] ]->[ $n->[1] ] ); } } return \@out; } # given x,y , return array of points that refer to valid neighbours. sub _neighbours { my $self = shift; my $x = shift; my $y = shift; my @out = (); for my $sx ( -1, 0, 1 ) { next if $sx + $x < 0; next if not defined $self->{rows}->[ $sx + $x ]; for my $sy ( -1, 0, 1 ) { next if $sx == 0 && $sy == 0; next if $sy + $y < 0; next if not defined $self->{rows}->[ $sx + $x ]->[ $sy + $y ]; push @out, [ $sx + $x, $sy + $y ]; } } return @out; } sub _has_row { my $self = shift; my $x = shift; return defined $self->{rows}->[$x]; } sub _has_cell { my $self = shift; my $x = shift; my $y = shift; return defined $self->{rows}->[$x]->[$y]; } sub _read { my $self = shift; my $x = shift; my $y = shift; return $self->{rows}->[$x]->[$y]; } sub _write { my $self = shift; my $x = shift; my $y = shift; my $v = shift; $self->{rows}->[$x]->[$y] = $v; return $v; } __PACKAGE__->meta->make_immutable; } use Tree::Trie; sub readDict { my $fn = shift; my $re = shift; my $d = Tree::Trie->new(); # Dictionary Loading open my $fh, '<', $fn; while ( my $line = <$fh> ) { chomp($line); # Commenting the next line makes it go from 1.5 seconds to 7.5 seconds. EPIC. next if $line =~ $re; # Early Filter $d->add( uc($line) ); } return $d; } sub traverseGraph { my $d = shift; my $m = shift; my $min = shift; my $max = shift; my @words = (); # Inject all grid nodes into the processing queue. my @queue = grep { $d->lookup( $_->current_word ) } map { $_->to_path } @{ $m->cells }; while (@queue) { my $item = shift @queue; # put the dictionary into "exact match" mode. $d->deepsearch('exact'); my $cword = $item->current_word; my $l = length($cword); if ( $l >= $min && $d->lookup($cword) ) { push @words, $item; # push current path into "words" if it exactly matches. } next if $l > $max; # put the dictionary into "is-a-prefix" mode. $d->deepsearch('boolean'); siblingloop: foreach my $sibling ( @{ $item->tail->siblings } ) { foreach my $visited ( @{ $item->{path} } ) { next siblingloop if $sibling == $visited; } # given path y , iterate for all its end points my $subpath = $item->child($sibling); # create a new path for each end-point if ( $d->lookup( $subpath->current_word ) ) { # if the new path is a prefix, add it to the bottom of the queue. push @queue, $subpath; } } } return \@words; } sub setup_predetermined { my $m = shift; my $gameNo = shift; if( $gameNo == 0 ){ $m->add_row(qw( F X I E )); $m->add_row(qw( A M L O )); $m->add_row(qw( E W B X )); $m->add_row(qw( A S T U )); return $m; } if( $gameNo == 1 ){ $m->add_row(qw( D G H I )); $m->add_row(qw( K L P S )); $m->add_row(qw( Y E U T )); $m->add_row(qw( E O R N )); return $m; } } sub setup_random { my $m = shift; my $seed = shift; srand $seed; my @letters = 'A' .. 'Z' ; for( 1 .. 4 ){ my @r = (); for( 1 .. 4 ){ push @r , $letters[int(rand(25))]; } $m->add_row( @r ); } } # Here is where the real work starts. my $m = Matrix->new(); setup_predetermined( $m, 0 ); #setup_random( $m, 5 ); my $d = readDict( 'dict.txt', $m->regex ); my $c = scalar @{ $m->cells }; # get the max, as per spec print join ",

", map { $_->pp } @{ traverseGraph( $d, $m, 3, $c ) ; };

Arch/execution info for comparison:

model name : Intel(R) Core(TM)2 Duo CPU T9300 @ 2.50GHz cache size : 6144 KB Memory usage summary: heap total: 77057577, heap peak: 11446200, stack peak: 26448 total calls total memory failed calls malloc| 947212 68763684 0 realloc| 11191 1045641 0 (nomove:9063, dec:4731, free:0) calloc| 121001 7248252 0 free| 973159 65854762 Histogram for block sizes: 0-15 392633 36% ================================================== 16-31 43530 4% ===== 32-47 50048 4% ====== 48-63 70701 6% ========= 64-79 18831 1% == 80-95 19271 1% == 96-111 238398 22% ============================== 112-127 3007 <1% 128-143 236727 21% ==============================

More Mumblings on that Regex Optimization

The regex optimization I use is useless for multi-solve dictionaries, and for multi-solve you'll want a full dictionary, not a pre-trimmed one.

However, that said, for one-off solves, its really fast. ( Perl regex are in C! :) )

Here is some varying code additions:

sub readDict_nofilter { my $fn = shift; my $re = shift; my $d = Tree::Trie->new(); # Dictionary Loading open my $fh, '<', $fn; while ( my $line = <$fh> ) { chomp($line); $d->add( uc($line) ); } return $d; } sub benchmark_io { use Benchmark qw( cmpthese :hireswallclock ); # generate a random 16 character string # to simulate there being an input grid. my $regexen = sub { my @letters = 'A' .. 'Z' ; my @lo = (); for( 1..16 ){ push @lo , $_ ; } my $c = join '', @lo; $c = "[^$c]"; return qr/$c/i; }; cmpthese( 200 , { filtered => sub { readDict('dict.txt', $regexen->() ); }, unfiltered => sub { readDict_nofilter('dict.txt'); } }); }

s/iter unfiltered filtered unfiltered 8.16 -- -94% filtered 0.464 1658% --