Alphabet’s autonomous driving unit Waymo, which began as Google’s self-driving car project in 2009, has long been protective of its technology and data. Yesterday, however, the company surprised many by announcing a new high-quality multimodal sensor dataset for autonomous driving. The Waymo Open Dataset was introduced at the Conference on Computer Vision and Pattern Recognition (CVPR) 2019, a top AI conference, in Long Beach, California.

In a talk at the CVPR 2019 Autonomous Driving Workshop, Waymo Principal Scientist Drago Anguelov said that traditional public datasets such as KITTI are too small for today’s leading autonomous driving companies. As a result, researchers and engineers spend too much time on data augmentation and on preventing overfitting. Moreover, he said, results achieved on KITTI do not necessarily generalize to larger datasets.

That motivated Waymo to curate the Waymo Open Dataset, which features some 3,000 driving scenes totalling 16.7 hours of video data, 600,000 frames, approximately 25 million 3D bounding boxes and 22 million 2D bounding boxes. Waymo’s data-collection vehicles are equipped with five LiDARs, five cameras and an undisclosed number of radars. Anguelov also stressed that Waymo achieves better LiDAR-to-camera synchronization than KITTI or nuScenes.

The Waymo Open Dataset also offers greater data diversity, covering a range of weather and lighting conditions as well as pedestrians, cyclists and construction.

Waymo plans to release the first part of the dataset, comprising 1,000 videos, in July, with more to follow. Anguelov said the company will also publish benchmarks and organize competitions.

The Waymo Open Dataset release signals a strategic shift inside Waymo, from working behind closed doors to embracing the open-source spirit. Many self-driving companies, even industry leader Waymo, are realizing that the road to broad commercialization may be long and winding. Waymo One, the company’s ride-hailing service in Arizona, has exposed technical shortcomings, with vehicles unable to handle unusual circumstances or tough weather conditions. Waymo CEO John Krafcik has admitted “it will take decades for self-driving cars to become common on roads.”

Anguelov told the audience that benchmark datasets have been critical to major advances in the field. ImageNet, for example, helped spark the deep learning boom in computer vision. Like ImageNet, the Waymo Open Dataset is expected to be used in academic and experimental research, and Waymo stands to benefit if researchers develop new models that outperform its benchmarks.

In 2018, Chinese tech giant Baidu and the University of California, Berkeley released the large-scale self-driving datasets ApolloScape and BDD100K, respectively. ApolloScape’s data volume is ten times greater than that of KITTI or Cityscapes, while BDD100K contains 100,000 driving videos, each 40 seconds long and recorded at 30 fps.

The Waymo Open Dataset will soon be available at www.waymo.com/open.