Updated @ 05:25 April 17: Four months after it first published this research (detailed in the story below), Google is now promoting the deep neural network as a win for both Street View and its Recaptcha product. As far as I can tell, nothing has changed since January; Google is simply framing it as "our neural network is so advanced that it can solve our captchas as well as a human can," rather than as an improvement to Street View's ability to read hard-to-decipher house numbers. The software can solve the hardest type of Recaptcha challenge with 99.8% accuracy (a good deal better than my own accuracy).

Original story

Having spent some time on the internet, you have no doubt been forced to prove your humanity by typing words and numbers into a captcha. Google's own Recaptcha variant has been used not only to keep bots at bay but also to help the search giant digitize text from scanned books and house numbers captured by Street View. Google doesn't rely exclusively on hijacking your brain cycles anymore, though. A new research paper from Google details how the company trained a neural network to read the millions of unidentified house numbers captured by Street View cameras without human intervention.

An artificial neural network is a computational model that seeks to replicate the parallel nature of a living brain. This system works directly on the pixel images that are captured by Street View cars, and it works more like your brain than many previous models. Instead of breaking each address image up into individual digits then identifying each one, it looks at the whole number and recognizes it, just like we do.
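The whole-number approach can be sketched as follows. The network produces a softmax over the number's length plus a separate softmax over the ten digits for each position, and the most likely house number is the sequence that maximizes the product of those probabilities. This is a simplified illustration of that decoding step, not Google's code; the probabilities below are made up to stand in for a trained network's outputs.

```python
import numpy as np

def decode_number(length_probs, digit_probs):
    """Pick the most likely house number from per-position softmax outputs.

    length_probs: shape (max_len + 1,), P(number has L digits), L = 0..max_len.
    digit_probs:  shape (max_len, 10), P(digit at position i is d).
    Returns (best_string, best_log_prob).
    """
    best, best_lp = "", -np.inf
    for L, p_len in enumerate(length_probs):
        if L == 0:
            continue  # skip the "no digits" hypothesis in this sketch
        # For a fixed length L, positions are scored independently, so the
        # best sequence takes the argmax digit at each of the first L slots.
        digits = digit_probs[:L].argmax(axis=1)
        lp = np.log(p_len) + np.log(digit_probs[np.arange(L), digits]).sum()
        if lp > best_lp:
            best, best_lp = "".join(map(str, digits)), lp
    return best, best_lp

# Toy outputs: the "network" is confident the number is three digits, "125".
length_probs = np.array([0.01, 0.04, 0.15, 0.70, 0.05, 0.05])  # P(L = 0..5)
digit_probs = np.full((5, 10), 0.02)
for i, d in enumerate([1, 2, 5]):
    digit_probs[i, d] = 0.82
digit_probs /= digit_probs.sum(axis=1, keepdims=True)
print(decode_number(length_probs, digit_probs)[0])  # prints "125"
```

Scoring the length and all digit positions jointly, rather than segmenting the image first, is what lets the model read the number the way a person does, in one look.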

When you type in an address on Google Maps, you expect it to return the correct location. Having the right address for each structure is essential to that, especially in areas where building numbers don't follow a predictable sequence. That's why there is value in knowing what it actually says on the front door, and why the company would go to the trouble of building a synthetic brain to read it.

To train its neural network, Google used the publicly available Street View House Numbers (SVHN) Dataset. This is exactly what it sounds like: a massive dataset of 200,000 addresses, split up into number blocks for a total of 600,000 digit images used to train the electronic brain. The system takes six days to learn the dataset, after which it can identify digits in Street View pictures with a high level of accuracy.

Google simplified the task by placing some constraints on the images fed to the neural network. Each address must already have been located and automatically cropped so that the number spans at least one third of the final image's width. The system also assumes that a number is five or fewer digits long, which holds in most regions. Because the network predicts the whole sequence at once rather than reading digits one at a time, its output layer must be fixed to a maximum length, so the limit is essential.
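The two constraints described above are simple enough to express as a filter. This is an illustrative sketch with made-up names (`crop_is_usable` and its parameters are not Google's API); it only shows the checks the article describes.

```python
MAX_DIGITS = 5            # assumed maximum sequence length
MIN_WIDTH_FRACTION = 1 / 3  # the number must span at least a third of the crop

def crop_is_usable(number_width_px, crop_width_px, num_digits):
    """Return True if a cropped address image satisfies both constraints:
    the number fills enough of the crop, and it fits the network's
    fixed-length output. All names here are illustrative."""
    wide_enough = number_width_px >= MIN_WIDTH_FRACTION * crop_width_px
    short_enough = num_digits <= MAX_DIGITS
    return wide_enough and short_enough

# A 40-pixel-wide "117" inside a 100-pixel crop passes; a six-digit number,
# or a number filling only a tenth of the crop, would be rejected.
```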

Humans transcribing numbers from Street View images are about 98% accurate, so that is the bar Google set for the machine. The figure does not necessarily apply to every image, though: it refers to the subset of images suitable for the automated system to identify. About 95% of captured addresses fall into this category, and on those the neural network meets the 98% accuracy requirement. Google says it has used the system to read 100 million physical street numbers so far. [Research paper: arxiv.org/abs/1312.6082 – “Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks”]
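One way to carve out such a "suitable subset" is to rank predictions by the model's confidence and keep only as many as still satisfy the accuracy target. The sketch below illustrates that trade-off with synthetic data; it is my own toy procedure, not the paper's exact method.

```python
def coverage_at_accuracy(predictions, target_accuracy=0.98):
    """Given (confidence, is_correct) pairs, find the largest fraction of
    images that can be kept (taking the most confident first) while
    accuracy on the kept subset stays at or above target_accuracy.
    Illustrative only; the names and data here are made up."""
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    correct = 0
    best_coverage = 0.0
    for kept, (confidence, is_correct) in enumerate(ranked, start=1):
        correct += is_correct
        if correct / kept >= target_accuracy:
            best_coverage = kept / len(ranked)
    return best_coverage

# Synthetic run: 50 confident correct answers, one shaky miss, ten bad
# low-confidence guesses. The threshold lands just past the shaky miss.
preds = [(0.99, True)] * 50 + [(0.9, False)] + [(0.5, False)] * 10
print(round(coverage_at_accuracy(preds), 3))
```

The point of the design is that lowering coverage (handing more images back to humans) is the lever that keeps automated accuracy at the human-level 98%.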

This computer model has lightened the load on human eyeballs considerably, but there are still some images that require a human’s assessment. As the neural network is improved, Google researchers hope that it could be of use in reading street signs or phone numbers on billboards.