Computers can already recognize you in an image, but can they see a video or real-world objects and tell exactly what's going on?

Researchers are trying to make computer video recognition a reality, and they are using some image recognition techniques to make that happen.

Researchers in and outside of Google are making progress in video recognition, but there are also challenges to overcome, Rajat Monga, engineering director of TensorFlow for Google's Brain team, said during a question-and-answer session on Quora this week.

The benefits of video recognition are enormous. For example, a computer will be able to identify a person's activities, an event, or a location. Video recognition will also make self-driving cars more viable.

Video recognition has the potential of giving digital eyes to robots, which may then be able to do regular chores like laundry.

Image recognition is now common, but video recognition involves analyzing a sequence of related images over time. Video recognition is akin to human vision, where we see a stream of related images, recognize objects immediately, and identify what's going on around us.
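One common baseline for moving from image recognition to video recognition is to run an image classifier on each frame and pool the per-frame predictions into a single video-level label. The sketch below illustrates the idea in plain Python; `classify_frame` is a hypothetical stand-in for a real trained image model, and the label names and scores are made up for illustration.

```python
# Sketch of frame-level average pooling: classify each frame of a clip,
# then average the per-frame class probabilities into one video label.

def classify_frame(frame):
    """Toy per-frame classifier returning class probabilities.
    A real system would run a trained image-recognition model here;
    for this sketch, frames are already dicts of scores."""
    return frame

def classify_video(frames):
    """Average per-frame probabilities over the whole clip and
    return the top label plus the pooled scores."""
    totals = {}
    for frame in frames:
        for label, p in classify_frame(frame).items():
            totals[label] = totals.get(label, 0.0) + p
    n = len(frames)
    pooled = {label: s / n for label, s in totals.items()}
    return max(pooled, key=pooled.get), pooled

# Three frames that are individually ambiguous, but pooled together
# they point to one activity.
clip = [
    {"cooking": 0.5, "cleaning": 0.5},
    {"cooking": 0.7, "cleaning": 0.3},
    {"cooking": 0.6, "cleaning": 0.4},
]

label, scores = classify_video(clip)
print(label)  # cooking
```

Averaging is only the simplest way to use the temporal signal; real video models also learn relationships between frames, which is what makes the richer scene understanding Monga describes possible.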

Many gains in video recognition have come thanks to advances in the deep-learning models driving image recognition.

"With the sequence of frames in each video that are related to each other, it provides a much richer perspective of the real world, allowing the models to create a 3D view of the world, without necessarily needing stereo vision," said Monga, who leads up TensorFlow, an open-source machine-learning software stack from Google.

In the context of deep learning, there are challenges related to image recognition. Computers can recognize some items in images, but not everything. That's a disadvantage when it comes to goals like giving human-like vision to robots.

True human vision via video recognition is "still far away," Monga said.

Computers need to be trained to recognize images in deep-learning models, and there are large repositories that can be used to cross-reference objects in pictures. Large datasets like ImageNet, which has about 14 million images, have helped advance image recognition. But even larger datasets are needed, Monga said.

Researchers at Google are trying to enhance video recognition. The company's researchers are studying how deep learning could help robots with hand-eye coordination and learning through predictive video.

Google is making AI a big part of its cloud operations and using machine learning for Google Now, street mapping, and other services. Outside of Google, deep learning is also being used by self-driving cars to cruise streets safely. Companies are also using AI to get rid of bugs in code.

Deep learning -- both training and inferencing -- is getting better with faster computing, algorithms, and datasets, but there is still plenty of room for improvement, Monga said.

The rise of faster hardware and custom chips like Google's machine-learning Tensor Processing Unit has helped boost deep learning. Low-level calculations on GPUs drive most deep-learning models today, but faster hardware will speed up both training and inferencing.

"This remains a challenge even as we are getting customized chips, there’s continued demand for more [computing]," Monga said.

There's also a need for larger datasets and better algorithms, which provide the underlying formulae for deep-learning operations.

Training the neural networks that power deep-learning models "is hard without large enough datasets," Monga said.

Machine learning is growing fast, and many companies are adopting Google's tools. Variants of Google's TensorFlow have been developed by companies like Nvidia and Movidius (which is being acquired by Intel) for servers and embedded devices.

Google, Amazon, Facebook, Microsoft and IBM this week also formed the Partnership on AI organization to establish best AI practices. Funders Elon Musk, Peter Thiel, Sam Altman, and Jessica Livingston have pledged US $1 billion to the fast-growing OpenAI project, which is becoming a nerve center of AI activity in the IT industry.