Some months ago, during a boring Twitter reading session, I spotted a tweet from Reto Meier:

This looked like a really interesting challenge and a fun way to play with Android development and algorithms. There was also a comment from Hoi Lam:

Adding a little ML Kit to it was a really good suggestion: it would allow capturing the input for the algorithm in a modern, AI-powered way.

So I decided to take on this journey, which I completed in roughly a couple of weeks.

Then I decided to dedicate more time to the project, to learn ML Kit better, and to write this article so it can be useful to other devs.

During the journey, I also spotted a bug here in the ML Kit Android sample app.

Let’s analyze this interesting journey together:

Design Phase

For the sake of a demo, it seems reasonable to create a single-Activity app without thinking too much about the perfect architecture. Just give the user a button to open the camera and take a picture of a Word Search/Boggle board, or to select an image from the phone storage, and that’s it.

Simple Activity, simple UI.

After the user confirms the picture, the algorithm should start processing it, looking for words.

This business logic can live in a ViewModel; in this specific case we also need a Context to use ML Kit, so we can use an AndroidViewModel. Let’s call it WordSearchAiViewModel.

WordSearchAiViewModel will receive the image from the Activity and, using the services of ML Kit, will find valid characters, then pass them to a processor (CloudDocumentTextRecognitionProcessor) containing the core of the Word Search algorithm. WordSearchAiViewModel will also expose two LiveData: one with the list of found words and one with the bounding boxes of the recognized characters, useful to show the user how accurate the recognition was.
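A minimal sketch of what this ViewModel could look like; note that the property and method names here are assumptions based on the description, not the project’s actual code:

```kotlin
import android.app.Application
import android.graphics.Bitmap
import android.graphics.Rect
import androidx.lifecycle.AndroidViewModel
import androidx.lifecycle.MutableLiveData

// Sketch only: names are assumptions, not the project's actual code.
class WordSearchAiViewModel(application: Application) : AndroidViewModel(application) {

    // Words found by the algorithm, observed by the Activity.
    val resultList = MutableLiveData<List<String>>()

    // Bounding boxes of the recognized characters, used for the overlay.
    val resultBoundingBoxes = MutableLiveData<List<Rect>>()

    fun detectDocumentTextIn(bitmap: Bitmap) {
        // Hand the image to ML Kit, then pass the recognized text to the
        // CloudDocumentTextRecognitionProcessor (see the next sections).
    }
}
```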

The Activity observes both LiveData so it can update the UI accordingly.

Text detection using ML Kit

To configure and use ML Kit for Android we can refer to the guide at

In particular, in our case we only need to set up the cloud project and use the text recognition features.

When the ViewModel receives the image from the Activity, it instantiates a FirebaseVisionDocumentTextRecognizer by calling

then it can pass the image to it so ML Kit performs the text recognition.

Just add a listener method that will handle the successful result of the detection, or the eventual failure.
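Putting the steps together, the recognition call could look roughly like this. The FirebaseVision calls are the standard Firebase ML Kit API; the handler method names are assumptions based on the description below:

```kotlin
// Cloud-based document text recognizer from the Firebase ML Kit API.
val detector = FirebaseVision.getInstance().cloudDocumentTextRecognizer

// Wrap the bitmap and run the recognition; the listeners receive either the
// recognized FirebaseVisionDocumentText or the failure exception.
detector.processImage(FirebaseVisionImage.fromBitmap(bitmap))
    .addOnSuccessListener { documentText ->
        postWordsFound(documentText)      // assumed name: process the found text
        postBoundingBoxes(documentText)   // assumed name: update the overlay
    }
    .addOnFailureListener { e ->
        Log.e("WordSearchAi", "Text recognition failed", e)
    }
```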

In this case we added two methods: one to process the recognized text and the other to draw the graphics overlay.

The postWordsFound method will take the text coming from the detector and pass it to the processor (CloudDocumentTextRecognitionProcessor), which will run the Word Search algorithm using a preloaded dictionary (in this case, loaded from the resources).
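Hypothetically, postWordsFound could be as simple as this sketch, where the `findWords` signature and the `processor` and `dictionary` fields are assumptions from the description above:

```kotlin
// documentText.text is the full recognized text; the processor runs the
// word-search algorithm against the preloaded dictionary.
private fun postWordsFound(documentText: FirebaseVisionDocumentText) {
    val wordsFound = processor.findWords(documentText.text, dictionary)
    resultList.postValue(wordsFound)
}
```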

Then the found words will be posted to the LiveData.

A look into the processor

The processor (CloudDocumentTextRecognitionProcessor) takes the recognized text and converts it into a clean format for the algo.

In this case we change all characters to lower case, split everything into lines, drop the last line (an empty line generated by the Firebase response), remove all the spaces, and finally convert each line into an array of chars.
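Those cleanup steps map almost one-to-one onto Kotlin’s collection operators; a minimal sketch (the function name is an assumption):

```kotlin
// Turn the raw recognized text into a board: one CharArray per row.
fun toBoard(recognizedText: String): List<CharArray> =
    recognizedText
        .lowercase()              // all characters to lower case
        .lines()                  // split into lines
        .dropLast(1)              // drop the empty trailing line from Firebase
        .map { line ->
            line.replace(" ", "") // remove all spaces
                .toCharArray()    // convert the row into an array of chars
        }
```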

To look for words we need to define a dictionary; for this sample we load it from a text file contained in the resources.

The dictionary is then trimmed of all words shorter than 3 characters, assuming we are only interested in words of at least that length.
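The trimming itself is a one-liner; a sketch assuming the dictionary arrives as one word per line (the helper name is made up):

```kotlin
// Keep only words with at least 3 characters, normalized to lower case.
fun trimDictionary(words: List<String>): Set<String> =
    words.map { it.trim().lowercase() }
        .filter { it.length >= 3 }
        .toSet()
```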

Then, with these inputs, we can call the algo. In this case the processor implements the word search algorithm using Kotlin delegation: the object WordSearchLinear contains the actual logic.
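The delegation can be sketched like this: an interface implemented by the object, with the processor delegating to it via `by`. The interface name and the placeholder row-only search are assumptions; the real project uses the Trie-based, eight-direction search described in the next section:

```kotlin
// The word-search contract; WordSearchLinear holds the actual logic and the
// processor picks it up through Kotlin delegation (`by`).
interface WordSearch {
    fun findWords(board: List<CharArray>, dictionary: Set<String>): List<String>
}

object WordSearchLinear : WordSearch {
    // Placeholder logic (rows only, left to right) so the sketch runs; the
    // real algorithm searches all eight directions with a Trie.
    override fun findWords(board: List<CharArray>, dictionary: Set<String>): List<String> =
        dictionary.filter { word -> board.any { row -> String(row).contains(word) } }
}

// Every WordSearch call on the processor is forwarded to WordSearchLinear.
class CloudDocumentTextRecognitionProcessor : WordSearch by WordSearchLinear
```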

The ALGO

We will assume that a word can be found starting from any letter in the board and going in only one direction, so from each letter we can go:

left

top

right

bottom

top/left

top/right

bottom/left

bottom/right

For simplicity, once we choose a direction we cannot change it. We continue to look for a valid word (one contained in our dictionary) until the end of the board is reached in that direction.

The algorithm to look for words in the table is based on the Trie data structure.

A Trie is a data structure very good at storing words and optimal for looking them up. It is just a tree of characters where every path from the root to a node marked as end-of-word spells a word. So in this tree, stored words that share a common prefix share the same character nodes for that prefix.
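A minimal Kotlin Trie along those lines (using an end-of-word flag on nodes, since one stored word can be a prefix of another):

```kotlin
// Each node maps a character to a child; isWord marks that the path from the
// root to this node spells a complete stored word.
class TrieNode {
    val children = mutableMapOf<Char, TrieNode>()
    var isWord = false
}

class Trie {
    private val root = TrieNode()

    // Walk/create one node per character, then mark the last one as a word.
    fun insert(word: String) {
        var node = root
        for (c in word) node = node.children.getOrPut(c) { TrieNode() }
        node.isWord = true
    }

    // Follow the word character by character; true only if the final node
    // is marked as end-of-word.
    fun contains(word: String): Boolean {
        var node: TrieNode? = root
        for (c in word) node = node?.children?.get(c)
        return node?.isWord == true
    }
}
```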

If you want to get to know the Trie better, I suggest you watch

or to read this complete post at

https://medium.com/basecs/trying-to-understand-tries-3ec6bede0014

The Algorithm developed

This algorithm performs a Depth-First Search starting from each character in the Word Search board. Looking at the findWords method, the first two loops iterate over each character (one loop for the rows, one for the columns), and the inner loop covers all the considered directions.

The directions are encoded as the number of steps to take along each axis. To move from a cell to another, we can loop through directionX and directionY. The first pair is [0, 1]: 0 steps horizontally, 1 step vertically (down), and so on.
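The encoding can be sketched as two parallel arrays. Below is a self-contained stand-in for the walk along one direction, using a plain Set instead of the Trie and a made-up helper name:

```kotlin
// Index i encodes one of the eight moves: directionX[i] horizontal steps,
// directionY[i] vertical steps. Index 0 is [0, 1]: straight down.
val directionX = intArrayOf(0, 0, 1, -1, 1, -1, 1, -1)
val directionY = intArrayOf(1, -1, 0, 0, 1, 1, -1, -1)

// Walk from (row, col) in direction `dir`, growing the prefix one letter at a
// time until the board edge, collecting every prefix found in the dictionary.
fun wordsFrom(board: List<CharArray>, row: Int, col: Int, dir: Int, dictionary: Set<String>): List<String> {
    val found = mutableListOf<String>()
    val prefix = StringBuilder()
    var r = row
    var c = col
    while (r in board.indices && c in board[r].indices) {
        prefix.append(board[r][c])
        if (prefix.toString() in dictionary) found += prefix.toString()
        r += directionY[dir]
        c += directionX[dir]
    }
    return found
}
```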

After defining where to start and which direction to follow, we can start the DFS to look for words contained in the loaded dictionary, using the power of the Trie, which has simply been filled with all the words from the dictionary.

As a Kotlin optimization of the DFS, we can use the tailrec keyword.
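Since the direction is fixed, the recursion is linear (exactly one recursive call per step), which is what makes tailrec applicable. A self-contained sketch with a tiny hand-built Trie node (names are assumptions, not the project’s actual code):

```kotlin
// Minimal Trie node for the sketch.
class Node {
    val children = mutableMapOf<Char, Node>()
    var isWord = false
}

fun insert(root: Node, word: String) {
    var n = root
    for (c in word) n = n.children.getOrPut(c) { Node() }
    n.isWord = true
}

// The direction (dx, dy) never changes, so there is exactly one recursive
// call, in tail position: Kotlin compiles this into a plain loop.
tailrec fun dfs(
    board: List<CharArray>,
    row: Int,
    col: Int,
    dx: Int,
    dy: Int,
    node: Node,
    prefix: String,
    found: MutableSet<String>
) {
    if (row !in board.indices || col !in board[row].indices) return // edge reached
    val next = node.children[board[row][col]] ?: return             // no stored word continues here
    val word = prefix + board[row][col]
    if (next.isWord) found += word                                  // complete word found
    dfs(board, row + dy, col + dx, dx, dy, next, word, found)
}
```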

Note also that, since we only go in one direction at a time, we don’t need to store whether a cell has been visited.

Test with a simple image

We can test this algorithm and the app with a simple board like this