Today the idea of a World Wide Web seems obvious in retrospect. Yet at the time it was a radical idea. I believe that Computer Vision needs its own web of visual data (images, videos, features & annotations). Such a web would not only increase research productivity in the short term but also enable possibilities that are currently difficult even to contemplate.

Inefficient distribution mechanisms

Datasets and competitions are currently the primary mechanisms for judging progress in Computer Vision. However, inefficiencies in sharing datasets (due to incompatible formats, parsing, etc.) and the time commitment required for participating in competitions significantly reduce the speed of innovation. Consider the following three recent video datasets:

Youtube BB (https://research.google.com/youtube-bb/download.html)

Video Comprehension (http://videomcc.org/)

VideoNet, a benchmark for all things video (http://videonet.team/)

A significant amount of effort and money went into collecting, labeling & organizing these datasets. Each dataset can be represented as a set of bounding-box annotations over frames & videos. Yet each one uses a different representation. To use any of these datasets, a PhD student has to spend at least a couple of hours, if not a couple of days, writing scripts to download, extract & parse the data.
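To make this concrete, the common denominator across such datasets is small. As a minimal sketch (the field names here are my own illustration, not taken from any of the datasets above), a unified bounding-box annotation could be as simple as:

```python
from dataclasses import dataclass, asdict

@dataclass
class BoxAnnotation:
    # Hypothetical unified schema; field names are illustrative,
    # not drawn from Youtube BB, Video Comprehension or VideoNet.
    video_id: str  # identifier of the source video
    frame: int     # frame index within the video
    label: str     # class label, e.g. "person"
    x: float       # top-left corner, normalized to [0, 1]
    y: float
    w: float       # box width, normalized
    h: float       # box height, normalized

ann = BoxAnnotation(video_id="abc123", frame=120, label="person",
                    x=0.1, y=0.2, w=0.3, h=0.5)
print(asdict(ann))  # serializes trivially to a dict, and from there to JSON
```

Each of the three datasets above could be losslessly expressed in a schema of roughly this shape; the cost of incompatibility is that every lab re-derives it independently.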

Part of the reason behind the incompatibility is the "static" nature of the data generation & labeling process. Typically a research group will collect a set of images / videos, assign labels & annotations using MTurk, etc., and then release the dataset. There is little reuse. However, with the advent of MSCOCO (http://mscoco.org/) and YFCC100M by Yahoo/Flickr (http://multimediacommons.wordpress.com), researchers have started reusing the same set of images and videos with different types of annotations. Even then, two issues remain unsolved:

How do we incorporate new visual data?

What methods and formats do we use for accessing and parsing data & annotations?

In an age where human attention is scarce, we need mechanisms that enable frictionless sharing & collaborative creation of visual data.

Web of visual data with Visual Data Network

Today programmers can use Git & GitHub/GitLab to share code, track revisions and collaborate. An ideal solution for sharing and collaborating on visual data requires the ability for users to share their datasets, and for other users to extend them by providing annotations, detections and computed features. I am building Visual Data Network to solve this problem.

Collaborating with Visual Data Network

With Visual Data Network (VDN), researchers, organizations and individual users can run their own VDN servers, each maintaining a list of datasets & annotations. By piggybacking on the DNS / URL scheme, datasets can be referenced across servers. Making data private is then equivalent to putting those servers inside a private network or behind authentication.
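Because a reference is just a URL, resolving one requires nothing beyond standard parsing. The sketch below is my own illustration of how such a reference might decompose (the `vdn.example.org` server and the path layout are assumptions, not the actual VDN scheme):

```python
from urllib.parse import urlparse

def parse_dataset_ref(ref: str) -> tuple[str, str, str]:
    """Split a hypothetical VDN reference of the form
    https://<server>/<dataset>/<annotation-set> into its parts."""
    parsed = urlparse(ref)
    server = parsed.netloc  # ordinary DNS name; privacy = private network or auth
    dataset, _, annotation_set = parsed.path.strip("/").partition("/")
    return server, dataset, annotation_set

print(parse_dataset_ref("https://vdn.example.org/youtube-bb/faces"))
# → ('vdn.example.org', 'youtube-bb', 'faces')
```

The appeal of this design is that existing web infrastructure (DNS, TLS, caching proxies, authentication) works unchanged; no new protocol is needed to federate datasets across servers.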

Deep Video Analytics to solve the Chicken and Egg problem

Standardization of formats alone is not sufficient. It must be accompanied by user-friendly software that leverages those standards to expand the ecosystem by enabling rapid creation of new visual data.

Consider the following example: a new video game, Zelda, is released. Gamers and enthusiasts would like to build an ML model that monitors live streams of the game to detect what the player is doing. Building such a domain-specific system requires collecting & labeling data, yet currently there are no tools that allow such collaborative annotation. There is a strong need for software that lets users quickly pull, store, process & query visual data, just as they would with relational data using databases such as MySQL & Postgres.
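To make the database analogy concrete, here is a minimal sketch (my own, not part of any existing tool) of storing and querying box annotations with SQLite, exactly as one would with relational data; the stream names and labels are invented for the Zelda example:

```python
import sqlite3

# In-memory database standing in for a visual-data store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE annotations (
        video_id TEXT, frame INTEGER, label TEXT,
        x REAL, y REAL, w REAL, h REAL
    )
""")

# Hypothetical annotations collected from two live streams.
rows = [
    ("zelda_stream_01", 10, "player", 0.40, 0.30, 0.20, 0.40),
    ("zelda_stream_01", 11, "player", 0.41, 0.30, 0.20, 0.40),
    ("zelda_stream_02",  5, "enemy",  0.70, 0.60, 0.10, 0.10),
]
conn.executemany("INSERT INTO annotations VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Query: in how many annotated frames does the player appear?
count = conn.execute(
    "SELECT COUNT(*) FROM annotations WHERE label = 'player'"
).fetchone()[0]
print(count)  # → 2
```

Today this kind of query layer has to be hand-rolled per dataset; the point of a shared platform is that it comes built in.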

As a result I have built Deep Video Analytics, an open source visual data analytics platform. Using Deep Video Analytics, users can quickly load, annotate & index images & videos. The platform uses deep learning models for indexing, detection and recognition to power visual search. Users can detect and recognize objects (such as faces) and seamlessly import and share processed datasets using Visual Data Network.

The following video gives a quick demo of an alpha version of Deep Video Analytics, including capabilities such as pulling datasets seamlessly from a Visual Data Network server.

A demo of an alpha version of Deep Video Analytics & Visual Data Network

My hope is that with Deep Video Analytics & Visual Data Network, the Computer Vision research community will be able to move from the current static view of datasets towards a more dynamic & collaborative future. More information about Deep Video Analytics & Visual Data Network can be found at the links below.