
Preview and comparative results

The implementation below may not be the most "minimal" one, because I don't use any of the built-in functionality (DictionaryLookup with patterns, Graph-related functions, etc.) except the core language functions. However, it uses efficient data structures, such as a trie, linked lists, and hash tables, and arguably avoids most of the overheads typical of Mathematica programming. The combined use of a trie, linked lists, and recursion allows the main function to copy very little. The trie data structure also makes me completely independent of the system DictionaryLookup function.

Why is this critical here? Because the nature of the problem makes only the single last letter important for the next traversal step; constructing the whole word (containing all previous letters) just to check that it exists is a waste, and this is arguably the reason why the other solutions are both much slower and do not scale as well. Also, the preprocessing step, while rather costly (it takes about 6 seconds on my machine), has to be done only once, to initialize the "boggle engine" (moreover, the resulting trie can be stored in, e.g., an .mx file for later reuse, avoiding this overhead in subsequent sessions), while in the other posted solutions some preprocessing has to be done for every particular board.

The main message I want to deliver is that, for top-level Mathematica code, the choice of efficient data structures is crucial. Our Mathematica programming instincts demand that we reuse as much of the built-in functionality as possible, but one always has to question how well the existing functionality matches the problem. In this particular case, my opinion is that neither the built-in Graph-related functions nor DictionaryLookup with patterns bring much to the table. On the contrary, these functions force us to use data representations and/or algorithms that are unnatural for this problem, and this is what leads to the slowdowns. I may be over-emphasizing this point, but this was exactly the essence of the question.

Now, some timing comparisons (note that for the solution of @R.M., I had to include the pieces defining the adjnodes, letters and dict variables in the timing measurements):

Board 4x4 (the original one): Pillsy 3.3 sec, R.M. 1.4 sec, L.S. 0.04 sec

Board 5x5 ("E I S H R B D O I O T R O E X Z U Y Q S I A S U M"): Pillsy 18.8 sec, R.M. 7.6 sec, L.S. 0.05 sec

Board 7x7 ("E I E G E O T A O B A U R A N E I P L A Y O O I I C A T I I F U N L A S T I N G E W U H L E O X S"): Pillsy 373.8 sec, R.M. 191.5 sec, L.S. 0.18 sec



So, you can see that for larger boards, the difference between the running times is even more dramatic, hinting that the solutions have different computational complexities.

I took the trouble to perform and present all these timings because I think this problem is an important counterexample to the "conventional wisdom" of favoring shorter implementations that utilize built-ins over hand-written top-level mma code. While I agree that in general this is a good strategy, one always has to examine the case at hand. To my mind, this problem presents one notable exception to this rule.

Implementation

The following solution will not use Mathematica graphs, but will be about 100 times faster (than the timings you cite), and will rely on this post. I will borrow from there a function which builds the word tree:

    ClearAll[makeTree];
    makeTree[wrds : {__String}] := makeTree[Characters[wrds]];
    makeTree[wrds_ /; MemberQ[wrds, {}]] :=
       Prepend[makeTree[DeleteCases[wrds, {}]], {} -> {}];
    makeTree[wrds_] :=
       Reap[If[# =!= {}, Sow[Rest[#], First@#]] & /@ wrds, _,
          #1 -> makeTree[#2] &][[2]]

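As an aside, the word tree built by makeTree is essentially a trie. For readers more comfortable with a procedural formulation, here is a hypothetical Python sketch of the same idea; the nested-dictionary layout and the empty-string end-of-word marker are my own choices, playing the role of the {} -> {} entry in the Mathematica version:

```python
def make_tree(words):
    """Build a trie from an iterable of words, as nested dictionaries.

    A node containing the key "" marks the end of a valid word
    (analogous to the {} -> {} entry produced by makeTree)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker
    return root

tree = make_tree(["cat", "car", "care"])
# "car" is a valid word: following c -> a -> r lands on a node
# containing the end-of-word marker...
assert "" in tree["c"]["a"]["r"]
# ...while "ca" is only a prefix, so no marker is present there.
assert "" not in tree["c"]["a"]
```

The key property, exploited heavily below, is that one letter of lookahead suffices: extending a path by one letter is a single dictionary lookup, with no need to reconstruct the whole word.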
Its use is detailed in the mentioned post. Now, here is a helper function which produces the rules for vertex-number-to-letter conversion, and the adjacency rules:

    Clear[getLetterAndAdjacencyRules];
    getLetterAndAdjacencyRules[letterMatrix_?(MatrixQ[#, StringQ] &)] :=
       Module[{a, lrules, p, adjRules},
          lrules = Thread[Range[Length[#]] -> #] &@Flatten[letterMatrix];
          p = ArrayPad[
             Partition[Array[a, Length[lrules]], Last@Dimensions@letterMatrix],
             1];
          adjRules = Flatten[
             ListConvolve[{{1, 1, 1}, {1, 2, 1}, {1, 1, 1}}, p] /.
                Plus -> List /.
                {left___, 2*v_, right___} :> {v -> {left, right}} /.
                a[x_] :> x];
          Map[Dispatch, {lrules, adjRules}]];

It is pretty ugly, but it does the job. Next comes the main function, which finds all vertex sequences that result in valid dictionary words:
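For illustration, what this helper computes can also be sketched in Python with explicit index arithmetic instead of the ListConvolve trick; the function name and the plain-dict representation here are mine, and the boundary handling simply clips the 3x3 neighborhood at the board edges:

```python
def letter_and_adjacency_rules(board):
    """board: list of rows of single-letter strings.

    Numbers the cells row by row starting from 1 and returns
    (letter_rules, adj_rules): cell number -> letter, and
    cell number -> list of the numbers of its (up to 8) neighbors."""
    rows, cols = len(board), len(board[0])
    letter_rules = {r * cols + c + 1: board[r][c]
                    for r in range(rows) for c in range(cols)}
    adj_rules = {}
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c + 1
            adj_rules[v] = [rr * cols + cc + 1
                            for rr in range(max(r - 1, 0), min(r + 2, rows))
                            for cc in range(max(c - 1, 0), min(c + 2, cols))
                            if (rr, cc) != (r, c)]
    return letter_rules, adj_rules

letters, adj = letter_and_adjacency_rules(
    [["f", "x", "i", "e"],
     ["a", "m", "l", "o"],
     ["e", "w", "b", "x"],
     ["a", "s", "t", "u"]])
# Corner cell 1 ("f") has exactly three neighbors, matching the
# 1 -> {2, 5, 6} entry in the Dispatch table shown in the
# Illustration section.
assert sorted(adj[1]) == [2, 5, 6]
```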

EDIT

Apparently, there is a problem with Module-generated inner functions. I used Module in getVertexSequences initially, but, because in my benchmarks I happened to use a previous incarnation of it under a different name (where I had not yet localized the inner functions), I did not see the difference. The difference is an order-of-magnitude slow-down. Therefore, I switched to Block, to get back the performance I claimed (you can replace Block back with Module to observe the effect). This is likely related to this issue, and is something everyone should be aware of IMO, since it is quite insidious.

END EDIT

    Clear[getVertexSequences];
    getVertexSequences[adjrules_, letterRules_, allTree_, n_] :=
       Block[{subF, f, getWordsForStartingVertex},
          (* A function to extract a sub-tree *)
          subF[v_, tree_] :=
             With[{letter = v /. letterRules},
                With[{res = letter /. tree},
                   res /; res =!= letter]];
          subF[_, _] := {};
          (* Main function to do the recursive traversal *)
          f[vvlist_, {{} -> {}, rest___}] := f[Sow[vvlist], {rest}];
          f[_, {}] := Null;
          f[vvlist : {last_, prev_List}, subTree_] :=
             Scan[
                f[{#, vvlist}, subF[#, subTree]] &,
                Complement[last /. adjrules, Flatten[vvlist]]];
          (* Function to post-process the result *)
          getWordsForStartingVertex[v_] :=
             If[# === {}, #,
                Reverse[Map[Flatten, First@#], 2]] &@
                   Reap[f[{v, {}}, subF[v, allTree]]][[2]];
          (* Call the function on every vertex *)
          Flatten[Map[getWordsForStartingVertex, Range[n]], 1]]

At the heart of it there is a recursive function f, which acts very simply. The vvlist variable is a linked list of already visited vertices. The second argument is a sub-tree of the main word tree, which corresponds to the sequence of already visited vertices converted to letters (to understand better what the sub-tree is, see the mentioned post). When the sub-tree starts with {} -> {}, this means (by the way the word tree is constructed) that the sequence of vertices corresponds to a valid word, so we record it. In any case, if the sub-tree is not {}, we Scan our function recursively over the adjacent vertices, removing those we have already visited.
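The scheme just described can be sketched in Python as follows. This is a hypothetical illustration of the same traversal, not a translation of f: visited vertices are kept in a linked list of (vertex, previous) pairs playing the role of vvlist, the trie is a nested dict as in the earlier sketch (with "" marking the end of a word, like {} -> {}), and pruning happens because we only recurse into neighbors whose letter exists as a key of the current sub-trie:

```python
def vertex_sequences(adj, letters, tree, n):
    """Enumerate all vertex paths on the board that spell dictionary words.

    adj: vertex -> list of adjacent vertices (vertices numbered 1..n)
    letters: vertex -> letter, tree: nested-dict trie with "" end markers."""
    results = []

    def visited(vvlist):
        # Walk the linked list (vertex, previous) back to the root.
        while vvlist:
            yield vvlist[0]
            vvlist = vvlist[1]

    def walk(vvlist, subtree):
        if "" in subtree:  # current vertex sequence spells a valid word
            results.append(list(visited(vvlist))[::-1])
        last = vvlist[0]
        seen = set(visited(vvlist))
        for v in adj[last]:
            # Prune: descend only if the neighbor's letter continues
            # some dictionary word, i.e. is a key of the sub-trie.
            if v not in seen and letters[v] in subtree:
                walk((v, vvlist), subtree[letters[v]])

    for v in range(1, n + 1):
        if letters[v] in tree:
            walk((v, None), tree[letters[v]])
    return results

# Toy example: a 2x2 board "c a / t r" (all cells mutually adjacent)
# and a four-word dictionary.
letters = {1: "c", 2: "a", 3: "t", 4: "r"}
adj = {1: [2, 3, 4], 2: [1, 3, 4], 3: [1, 2, 4], 4: [1, 2, 3]}
tree = {"c": {"a": {"t": {"": {}}, "r": {"": {}}}},
        "a": {"r": {"t": {"": {}}}},
        "r": {"a": {"t": {"": {}}}}}
seqs = vertex_sequences(adj, letters, tree, 4)
assert sorted("".join(letters[v] for v in s) for s in seqs) == \
    ["art", "car", "cat", "rat"]
```

Note that, as in f, a dead branch is abandoned after a single failed lookup on the last letter; the word accumulated so far is never rebuilt or searched for as a whole.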

The final functions we need are the one to convert vertex sequences to words, and the one to construct the trie data structure. Here they are:

    Clear[wordsFromVertexSequences];
    wordsFromVertexSequences[vseqs_List, letterRules_] :=
       Map[StringJoin, vseqs /. letterRules];

    ClearAll[getWordTree];
    getWordTree[minLen_Integer: 1, maxLen : (_Integer | Infinity) : Infinity] :=
       makeTree[
          Select[ToLowerCase@DictionaryLookup["*"],
             minLen <= StringLength[#] <= maxLen &]];

The function to bring this all together:

    ClearAll[getWords];
    getWords[board_String, wordTree_] :=
       getWords[ToLowerCase@ImportString@board, wordTree];
    getWords[lboard_, wordTree_] :=
       Module[{lrules, adjrules},
          {lrules, adjrules} = getLetterAndAdjacencyRules[lboard];
          wordsFromVertexSequences[
             getVertexSequences[adjrules, lrules, wordTree,
                Times @@ Dimensions[lboard]],
             lrules]];

Illustration

First, construct a full tree of all words in a dictionary. This preprocessing step can take a little while:

largeTree = getWordTree[];

Now, construct the word matrix:

wmat = ToLowerCase@ImportString@"F X I E A M L O E W B X A S T U"

{{"f", "x", "i", "e"}, {"a", "m", "l", "o"}, {"e", "w", "b","x"}, {"a", "s", "t", "u"}}

Next, construct the rules for vertex-to-letter conversion and adjacency rules:

({lrules,adjrules} = getLetterAndAdjacencyRules[wmat])//Short[#,3]&

{Dispatch[{1->f,2->x,3->i,4->e,5->a,6->m,7->l,8->o,9->e,10->w,11->b, 12->x,13->a,14->s,15->t,16->u},-DispatchTables-], Dispatch[{1->{2,5,6},<<14>>,16->{11,12,15}},<<1>>]}

We are now ready to use our function:

(seqs = getVertexSequences[adjrules,lrules,largeTree,16])//Short//AbsoluteTiming

{0.0185547,{{1,5},{1,5,2},{1,5,6,9},{1,6},<<89>>,{15,14}, {15,16,11},{15,16,11,14},{15,16,12}}}

Note that it took very little time to get the result. We can finally convert it to words:

wordsFromVertexSequences[seqs,lrules]//Short

{fa,fax,fame,fm,xi,xml,xl,<<84>>,twas,tb,ts,tub,tubs,tux}

The way to call the final function:

    (* Do this only once per session *)
    $largeTree = getWordTree[3];

    board = ToLowerCase@ImportString@"F X I E A M L O E W B X A S T U"
    getWords[board, $largeTree]

{fax,fame,xml,imf,eli,elm,elma,<<59>>,stub,twa,twa,twas,tub,tubs,tux}

(Note that the result differs from that in the Illustration section, since I am now using a word tree with words of fewer than 3 letters excluded: $largeTree rather than largeTree.)

Discussion

Of course, I was cheating a bit in the sense that the preprocessing takes a while, but it has to be done only once. My main point is that I think the trie data structure (my interpretation of it) is the right one here, and, coupled with linked lists and hash tables (Dispatch-ed rules), it leads to a rather simple solution. The essence of the solution is expressed in the function f, which is just a few lines long and more or less self-documenting. And the solution itself turns out to be quite fast (especially given that it uses just top-level mma, with no packed arrays, Compile, etc.).

EDIT 2

To address the question in your edit, and more generally the question of the applicability of Mathematica's new Graph functionality to this problem: I think that while you can use the new Graphs to solve the problem, they are not a natural choice here. I may be wrong, of course, but these are my reasons:

The graph traversal you need for this problem does not fit directly into either of the built-in graph-traversal functions DepthFirstScan and BreadthFirstScan. Rather, it is a kind of enumeration of all possible depth-first traversals starting at a given vertex.

Those traversals should stop as soon as it becomes clear that no words can be constructed by going to any of the adjacent vertices. This can also be achieved in DepthFirstScan through the use of Catch and Throw, but it is rather inelegant, and will also induce an overhead.

The general ideology of DepthFirstScan and BreadthFirstScan is somewhat similar to the visitor design pattern used for tree traversals. The idea is that the traversal is done for you, while you supply the functions to be called on the tree (or graph) nodes. This approach works well when your traversal matches exactly the one implemented by the pattern; for example, most of the time a tree is traversed depth-first. However, I have had many chances to observe (in other languages) that as soon as I have to modify the traversal even slightly, using tools like that creates more problems than it solves.

The main question to ask yourself is this: does your traversal (the sequence of visited vertices) depend on the content of the vertices (information you obtain during the traversal)? If yes, then it is more than likely that generic traversal functions will not give you a good solution, because you need more control over the way the traversal is performed. The whole idea of the visitor pattern (as used for tree traversals) and the like is that you can separate the traversal itself from the information-processing during the traversal, and this is just not true for data-dependent traversals, where you cannot really decouple the traversal from the processing of the tree (or graph) nodes.

I think that we should separate the cases where graphs are just a useful abstraction for thinking about a problem from those where the problem can be solved by more or less standard graph-theoretic functionality (in particular, that present in Mathematica) once it is reformulated appropriately. The case at hand clearly looks to me like it belongs to the first category.