Chinese search giant Baidu has served up its first ever visual search engine, which allows users to finally query the web using only images as input instead of keywords.

Google has long offered this sort of thing, but Baidu continues to show that it's determined to keep pace with Larry Page and company.

“We didn’t have any similar kind of product in China because we didn’t have the sufficient technology to handle this,” says Baidu's Kai Yu, who led the project. “In the China market, this is the first of its kind.”

Unveiled last week, the tool grew out of Baidu’s newly launched Institute of Deep Learning, the company’s Beijing- and Silicon Valley-based research arm focused on deep learning, a field of computer science that seeks to mimic how the human brain works. The company has already deployed deep-learning algorithms for optical character recognition, face recognition, voice recognition, online advertising, and web search. Yu and engineers at IDL have been working on visual search since September to meet growing demand among their users, Yu says.

Baidu’s visual search engine is powered by convolutional neural networks, the same type of deep-learning technology that underlies Google’s photo-tagging system, according to NYU’s Yann LeCun, who developed convolutional neural nets in the 1980s and is working on photo-tagging systems based on the same technology. (Google’s neural nets are being developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, whom Google hired in March to supercharge its deep-learning capabilities.)

Convolutional neural nets are particularly useful for this type of application because they are engineered to recognize objects from various angles, provided the network has been trained on those objects. The technology has also been used for handwriting recognition and for high-speed check-reading systems.

They are "designed to recognize visual patterns from pixel images with minimal preprocessing. They can recognize patterns with extreme variability and robustness to distortions," according to LeCun's website.
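That robustness to shifts and distortions comes from weight sharing: one small filter is reused at every position in the image. The toy sketch below (plain NumPy, purely illustrative, not Baidu’s or LeCun’s actual code) slides a single edge-detecting filter over two images and shows that the filter responds just as strongly to the same pattern no matter where it appears:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over the image and record its response at
    every position (valid padding, stride 1) -- the core operation
    of a convolutional layer."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A 3x3 vertical-edge detector.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

# The same vertical stripe drawn at two different positions.
img_a = np.zeros((8, 8)); img_a[:, 2] = 1.0
img_b = np.zeros((8, 8)); img_b[:, 5] = 1.0

# Because the same weights are applied at every location, the peak
# response is identical wherever the pattern sits.
print(convolve2d(img_a, kernel).max() == convolve2d(img_b, kernel).max())  # True
```

A real convolutional network stacks many such learned filters with pooling and nonlinearities, but the position-independence shown here is what makes the approach tolerant of where an object lands in the frame.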

Yu’s team is using Nvidia GPU servers to train its neural nets, but unlike Google, Baidu is sticking with commodity CPU servers for online deployment. Yu says Baidu engineers have done a “significant job of accelerating the online algorithm” to ensure it runs fast enough to meet user demand, and that for now they don’t need to turn to GPUs, which are faster but can be more power-intensive than traditional CPUs.

Part of the trick has been to develop algorithms that need only compare the query image against a small number of images in Baidu’s distributed database, rather than the billions the company has access to. From there, the system can figure out which images are similar to the original input and crank out relevant search results quickly. Baidu has also chosen to index images in main memory in order to retrieve them at the speeds internet users have come to expect.
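Baidu hasn’t disclosed how its index works, but the general shortlisting idea can be sketched with a standard technique such as locality-sensitive hashing: hash every image’s feature vector to a short binary code, keep the code-to-image buckets entirely in memory, and score a query only against the images that share its bucket. Everything below (the dimensions, the random hyperplanes, the toy 10,000-image database) is an illustrative assumption, not Baidu’s system:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
DIM, BITS = 64, 12
planes = rng.standard_normal((BITS, DIM))  # random hyperplanes

def signature(feature_vec):
    """Hash a feature vector to a short binary code: vectors on the
    same side of each random hyperplane tend to share a bucket."""
    return ((planes @ feature_vec) > 0).tobytes()

# In-memory index: binary code -> list of image ids.
database = {i: rng.standard_normal(DIM) for i in range(10_000)}
index = defaultdict(list)
for img_id, vec in database.items():
    index[signature(vec)].append(img_id)

def search(query_vec, k=5):
    """Score the query only against its own bucket -- a handful of
    candidates -- instead of all 10,000 images."""
    candidates = index[signature(query_vec)]
    q_norm = np.linalg.norm(query_vec)
    scored = sorted(
        candidates,
        key=lambda i: -np.dot(database[i], query_vec)
                      / (np.linalg.norm(database[i]) * q_norm))
    return scored[:k]

print(search(database[42]))  # image 42 ranks first in its bucket
```

With 10,000 vectors spread over 2^12 buckets, each lookup scores only a few candidates, which is the kind of pruning that makes a CPU-only, memory-resident deployment feasible.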

“Some special large-scale indexing structure is being used…. We keep everything in memory, otherwise it would be very difficult to serve the query in a fast way. When you access data in hard disk, that will be very painful,” says Yu. “We go to memory, so that’s even faster than Flash.”

Baidu’s new service compares images using only pixel and image-feature information. Typically, search engines also look at the text surrounding an image on its host webpage to serve up better results.
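As a toy illustration of purely image-based matching (Baidu’s system uses far richer learned features than this), the sketch below describes each image by a normalized intensity histogram and ranks matches by cosine similarity, with no surrounding text involved at all:

```python
import numpy as np

def grayscale_histogram(image, bins=16):
    """A simple global descriptor: the distribution of pixel
    intensities, normalized so image size doesn't matter."""
    hist, _ = np.histogram(image, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

def similarity(a, b):
    """Cosine similarity between two descriptors (1.0 = identical)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
bright_1 = rng.uniform(0.6, 1.0, (32, 32))  # a mostly bright image
bright_2 = rng.uniform(0.6, 1.0, (32, 32))  # another bright image
dark = rng.uniform(0.0, 0.4, (32, 32))      # a dark image

h1, h2, h3 = map(grayscale_histogram, (bright_1, bright_2, dark))
print(similarity(h1, h2) > similarity(h1, h3))  # True: bright pair match
```

Text signals can then be layered on top of a purely visual score like this, which is exactly the improvement Yu describes for future versions.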

“This is our first version. We just tried purely image-based search and we found the result was quite amazing,” says Yu. “In the future, we’ll further improve the product by combining text information.”

Future iterations will also port the product to mobile. Right now, Baidu’s visual-search engine is limited to the web, a move perhaps driven by pressure to bring the product to market. When Google launched its first image-search service, Google Goggles, in 2009, it started on mobile, and it took the company about two years to bring the service to the web. But Google is Google, and mobile image search can be more engineering-intensive. It presents a unique set of technical challenges, like controlling for cameras of varying quality, blurring, color balance, and overexposure.

Now that it has put itself on the visual-search map, Baidu can concentrate on making a product that’s more in line with how people search today.

“We want to make full use of [mobile] sensors to help users do all kinds of search in the most natural way,” Yu says. “Definitely, mobile search is our big target. We are already planning for this.”

