Magic Sudoku uses a combination of Computer Vision, Machine Learning, and Augmented Reality to give a magical user experience that “just works” when you point your phone at a Sudoku puzzle. The basic flow of the application is this:

1. ARKit gets a new frame from the camera.
2. We use iOS 11's Vision framework to detect rectangles in the image.
3. If rectangles are found, we determine whether they form a Sudoku grid.
4. If we find a puzzle, we split it into 81 square images.
5. Each square is run through a neural network we trained to determine which number (if any) it contains.
6. Once enough numbers are gathered, we use a traditional recursive algorithm to solve the puzzle.
7. We pass a 3D model representing the solved puzzle back to ARKit to display on top of the original camera image.

All of this happens several times each second.
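For the curious, the recursive solving step boils down to classic backtracking. Here's a minimal Python sketch of that idea (not the production code, which runs in Swift on-device):

```python
def valid(grid, r, c, digit):
    # A digit must not repeat in its row, column, or 3x3 box.
    if any(grid[r][j] == digit for j in range(9)):
        return False
    if any(grid[i][c] == digit for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[br + i][bc + j] != digit
               for i in range(3) for j in range(3))

def solve(grid):
    """Solve a Sudoku by recursive backtracking.

    `grid` is a 9x9 list of lists; 0 marks an empty cell.
    Fills `grid` in place and returns True if a solution exists.
    """
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for digit in range(1, 10):
                    if valid(grid, r, c, digit):
                        grid[r][c] = digit
                        if solve(grid):
                            return True
                        grid[r][c] = 0  # backtrack
                return False  # no digit fits this cell
    return True  # no empty cells left: solved
```

Because a well-formed Sudoku has exactly one solution, the search terminates quickly in practice even though the worst case is exponential.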

I won’t dwell too much on the ARKit portion, the Sudoku solving algorithm, or the actual Machine Learning model itself (there are plenty of tutorials written about those subjects already).

What was most interesting to me were the practical aspects I learned while training my first machine learning algorithm.

Machine Learning: Lessons Learned

I first learned several ways that wouldn’t work

One of the original reasons I chose a Sudoku solver as our first AR app was that I knew classifying digits is basically the “hello world” of Machine Learning. I wanted to dip my toe in the water of Machine Learning while working on a real-world problem. This seemed like a realistic app to tackle.

Before I set out on training my own model, I tried a few strategies that would have made things a lot easier if they had worked.

The first thing I tried was an optical character recognition library called SwiftOCR. The problem with using SwiftOCR for my use case was that it is designed for reading strings (like gift card codes) rather than individual digits. I also couldn't get it to differentiate between "this is an empty square" and "I couldn't read this digit". After several hours of experimentation, it wasn't doing much better than randomly guessing digits. That's not a knock on SwiftOCR. It's a great library; it's just not well suited for this particular problem.

Second, I moved on to a pre-trained MNIST model that had already been converted to CoreML. MNIST is an open dataset of handwritten digits released in the 1990s. Using a pre-trained model would have been nice because it could have been a drop-in solution. The .mlmodel file is completely self-contained and exposes a class in Swift that you can basically use as a black-box oracle. Using MNIST was how I reached the milestone of having my first internal prototype that "worked" from start to finish. Unfortunately, the MNIST dataset of handwritten digits is not close enough to computer fonts to generalize well, and that first prototype was very finicky and prone to errors.

By this point things were working well enough that I knew I was on the right track. I hoped that if I could train my own Machine Learning model on real-world data extracted from Sudoku puzzles that it would become a lot more accurate and reliable.

Gathering data

The next step was to gather as many examples of Sudoku puzzles as I could. I went to our local Half Price Books and bought out their entire stock of Sudoku books.

I had the team at Hatchlings help me rip them apart and I modified the prototype app to upload each of the small squares it scanned to a server instead of running them through the CoreML model.

Our treasure trove of real-world data acquired from the local secondhand book store.

Many hands make for light work

Now I had a problem. After I scanned a wide variety of puzzles from each book, my server had stored about 600,000 images… but they were completely unlabeled.

Simple classification tool

So I cheated a little; I tapped into my secret superpower: thousands of Hatchlings users happy to help the developer of their favorite game.

I made a simple admin tool that let users classify images by pressing the number keys on their keyboard. After putting out a call for help in our Hatchlings Facebook Group, our users made quick work of things and within 24 hours they had classified all 600,000 images!

Admin tool to send mistakes back to the pool

Unfortunately, a small minority of users had misunderstood the task, and there were a significant number of misidentified images.

To correct this I made a second tool that showed you 100 images (all supposed to be the same number) and asked you to click the ones that didn’t match (which then threw them back into the first tool to be reclassified).

After the first pass I had enough verified data that I was able to add an automatic accuracy checker into both tools for future data runs (it would periodically show the user known images and check their work to determine how much to trust their answers going forward).
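The trust mechanic is simple enough to sketch. Here's a hypothetical version in Python (the class name, the optimistic prior, and the 0.5 cutoff are all my illustrative choices, not the actual admin-tool code):

```python
class TrustTracker:
    """Track how much to trust one volunteer's classifications.

    The tool periodically shows the user a "known" image (one whose
    label has already been verified) and compares their answer to it.
    Trust is the fraction of known images answered correctly.
    """

    def __init__(self, prior_correct=4, prior_total=5):
        # Start each user with an optimistic prior (80% trusted) so a
        # single early mistake doesn't zero out their contributions.
        self.correct = prior_correct
        self.total = prior_total

    def record_check(self, answer, known_label):
        # Called whenever the user classifies a known image.
        self.total += 1
        if answer == known_label:
            self.correct += 1

    @property
    def trust(self):
        return self.correct / self.total

    def weight(self):
        # Labels from low-trust users get discarded when votes on an
        # unlabeled image are tallied; everyone else is weighted by trust.
        return self.trust if self.trust >= 0.5 else 0.0
```

In practice, weighting (or dropping) labels from users who fail the known-image checks is what let later data runs stay clean without a manual verification pass over everything.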

Over the next few weeks our players classified several more batches of scanned data. By the time we launched the app it was trained on over a million images of Sudoku squares.

I swapped this dataset in for MNIST and followed a tutorial to help me craft a neural network using Keras. The results turned out better than I expected: 98.6% accuracy! We've been able to improve that to above 99% accuracy in subsequent versions.
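I won't reproduce the network here, but for a sense of scale, a small Keras convnet along the lines of the standard MNIST tutorials looks something like the sketch below. The layer sizes and the 10-class output (nine digits plus an "empty square" class) are my assumptions for illustration, not the actual Magic Sudoku model:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_square_classifier(input_shape=(28, 28, 1), num_classes=10):
    """Small CNN in the spirit of the standard Keras MNIST examples.

    num_classes=10 assumes nine digit classes plus one "empty square"
    class -- a guess, not the actual Magic Sudoku architecture.
    """
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=3, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(64, kernel_size=3, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A model this size trains quickly and converts cleanly to a CoreML .mlmodel for on-device inference.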

Train on real world data

At this point things were working well on the corpus of Sudoku puzzles we had collected from the bookstore. What we didn’t realize was that this was only a small subset of the Sudoku puzzle layouts out there waiting for us in the wild.

Immediately after launch we started receiving reports that our app wasn’t working with on-screen puzzles. Of course! People wanted to try our app but they didn’t have a Sudoku in front of them so they searched on Google for a puzzle to try it out. This is obvious in retrospect, but it blindsided us on launch day.

Problem number one was that our machine learning model was only trained on paper puzzles; it didn't know what to think about pixels on a screen. I pulled an all-nighter that first week and re-trained our model with puzzles on computer screens.

Problem number two was that ARKit only supports horizontal planes like tables and floors (not vertical planes like computer monitors). Solving this was trickier, but I did come up with a hacky workaround: a combination of heuristics and feature-point detection to place puzzles on non-horizontal surfaces.

Our ML model is now trained on blurry images as well

Another unexpected issue we discovered after launch was that ARKit has a fixed (and hard-coded) focal length; it doesn't autofocus. There is also some variation between devices (things work OK on my iPhone 7's camera sensor, but the iPhone 6S has trouble focusing up close). Our "fix" for this was to add blurrycam images to the training set. I wasn't sure whether it would be able to grok useful info from images that I couldn't even read myself... but adding them didn't seem to affect the model's accuracy much, if at all! The current version of the app works well even when the numbers in the image are super blurry.
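Our actual fix was to collect real blurry captures, but if you wanted to synthesize similar training data, a crude blur augmentation is easy to sketch. Here's a hypothetical NumPy version using a simple box blur (a stand-in for a proper Gaussian blur; the function and kernel size are illustrative, not our pipeline):

```python
import numpy as np

def box_blur(img, k=3):
    """Cheap blur: average each pixel over a k x k neighborhood,
    roughly simulating an out-of-focus 'blurrycam' frame.

    `img` is a 2D grayscale array; edges are padded by replication
    so the output has the same shape as the input.
    """
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)
```

Applying this (with random kernel sizes) to clean squares before training is one way to make a model tolerant of focus problems without re-scanning every puzzle.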

The cloud

Finally, I had heard a lot of hype about AWS’s GPU Instances for Deep Learning and Google Cloud’s Deep Learning offerings so I tried them out (and also tried the “Heroku for Deep Learning” platform FloydHub).

They worked, but they were slow. My 2016 MacBook Pro running tensorflow-cpu was outperforming the AWS p2.xlarge GPU instance. My suspicion is that the training runs were bottlenecked by disk I/O rather than compute.

The cloud instances are also expensive. By my calculations, the payoff period of building my own box would be less than 2 months of cloud run-time. So I built a machine with relatively modest specs for about $1200 and parked it in my parents’ basement. It’s over 3x faster on my dataset than the AWS GPU instance I was experimenting with and should pay for itself soon.
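The break-even math is straightforward. Assuming an hourly rate of roughly $0.90 for the p2.xlarge (about the on-demand price at the time; actual prices vary by region and change over time), continuous training pays off the box in under two months:

```python
box_cost = 1200.0   # one-time cost of the home-built machine (USD)
cloud_rate = 0.90   # assumed p2.xlarge on-demand rate (USD/hour)

hours_to_break_even = box_cost / cloud_rate
days_to_break_even = hours_to_break_even / 24
print(round(days_to_break_even, 1))  # → 55.6 days of continuous use
```

And since the home-built box is over 3x faster on this workload, the effective payoff period is shorter still.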

What’s Next

I’m still a Machine Learning beginner but I’ve learned a lot through making Magic Sudoku.

Over the next few weeks I have a pretty long to-do list of bug fixes and minor improvements to the current feature set. Now that I've fixed most of the issues with weird fonts, weights, padding, and blurriness, the next step is to improve my heuristics to better identify puzzles.

The current version (v1.4) can get into trouble if there isn’t much padding between the puzzle and the edge of the sheet of paper or if there is text near the edges. I have some theories on how I can improve this but it will probably be a lot of trial and error to get it right.

In case I can't get the heuristics to work, I've also been collecting a dataset of properly aligned puzzles and bad scans that I can use to train another neural network to filter out the bad scans.

In the longer term we have plans to add some pretty awesome features. For example, we’d like to add a “checker” feature for completed puzzles. Most of the new features we have planned will require us to read and separate handwriting from computer-given numbers so we’ve been collecting handwriting samples from users around the world to train a new machine learning model.

We also need to work out our business model. But that’s a story that will have to wait for part three. In the meantime, download the app and let us know what you think!

Stay tuned…

This is the second post in a 3-part series. Part one explained why we decided to make Magic Sudoku and what makes it unique. Part three will explore the business side of launching an app like this and how it’s been going so far.

Subscribe or follow me on Twitter and you'll be the first to know when part three comes out. And don't forget to download the app and give it a try yourself!