Waymo, the Alphabet subsidiary that hopes to someday pepper roads with self-driving taxis, today pulled back the curtain on a portion of the data used to train the algorithms underpinning its cars: the Waymo Open Dataset. Waymo principal scientist Dragomir Anguelov claims it's the largest multimodal sensor corpus for autonomous driving released to date.

“[W]e are inviting the research community to join us with the [debut] of the Waymo Open Dataset, [which is composed] of high-resolution sensor data collected by Waymo self-driving vehicles,” wrote Anguelov in a blog post published this morning. “Data is a critical ingredient for machine learning … [and] this rich and diverse set of real-world experiences has helped our engineers and researchers develop Waymo’s self-driving technology and innovative models and algorithms.”

The Waymo Open Dataset contains data collected over the millions of miles Waymo's cars have driven in Phoenix, Kirkland, Mountain View, and San Francisco, and it covers a wide variety of urban and suburban environments during day and night, dawn and dusk, and sunshine and rain. Samples are divided into 1,000 driving segments, each of which captures 20 seconds of continuous driving through the sensors affixed to every Waymo car. At 10 Hz, that works out to 200 frames per segment, or 200,000 frames in total. The sensors include five custom-designed lidars (which bounce light off of objects to map them in three dimensions) and five front- and side-facing cameras.
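The total frame count follows directly from the segment count, segment length, and capture rate. The minimal Python sketch below reproduces that arithmetic. It assumes every segment is exactly 20 seconds long and captured at a constant 10 Hz; the variable names are illustrative only and are not part of any Waymo tooling.

```python
# Figures as described for the Waymo Open Dataset release
NUM_SEGMENTS = 1_000     # driving segments in the corpus
SEGMENT_SECONDS = 20     # length of each segment, in seconds
FRAME_RATE_HZ = 10       # capture rate, frames per second

# Assumption: every segment is exactly 20 s long at a constant 10 Hz
frames_per_segment = SEGMENT_SECONDS * FRAME_RATE_HZ
total_frames = NUM_SEGMENTS * frames_per_segment

print(frames_per_segment)  # 200
print(total_frames)        # 200000
```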

The corpus additionally includes lidar frames and camera images in which vehicles, pedestrians, cyclists, and signage have been labeled, for a total of 12 million 3D labels and 1.2 million 2D annotations. Waymo says the camera and lidar frames have been synchronized by its in-house 3D perception models, which fuse data from multiple sources, obviating the need for manual alignment.

“Waymo designs our entire self-driving system — including hardware and software — to work seamlessly together, which includes choice of sensor placement and high-quality temporal synchronization,” wrote Anguelov. “This data has the potential to help researchers make advances in 2D and 3D perception and make progress on areas such as domain adaptation, scene understanding, and behavior prediction. We hope that the research community will generate more exciting directions with our data that will not only help to make self-driving vehicles more capable, but also impact other related fields and applications, such as computer vision and robotics.”

The launch of Waymo's enormous data set comes after Lyft revealed its own open source corpus for autonomous vehicle development. In addition to over 55,000 frames with human-labeled 3D annotations of traffic agents, it contains data streams from seven cameras and up to three lidar sensors, plus a drivable surface map and an underlying HD spatial semantic map that includes over 4,000 lane segments, 197 crosswalks, 60 stop signs, 54 parking zones, eight speed bumps, and 11 speed humps.

Other such collections include Mapillary Vistas’ data set of street-level imagery, the KITTI collection for mobile robotics and autonomous driving research, and the Cityscapes data set developed and maintained by Daimler, the Max Planck Institute for Informatics, and the TU Darmstadt Visual Inference Group.

Progress toward truly driverless cars

It has been over six months since Waymo launched Waymo One, its commercial self-driving taxi service, which runs on a fleet of over 600 cars with safety drivers behind the wheel, and the company says the service has grown to more than 1,000 riders in that time. Waymo also recently revealed that its cars have driven 10 billion autonomous miles in simulation and 10 million real-world autonomous miles in 25 cities.

Weeks after Waymo announced it would dedicate a factory in southeast Michigan to the production of level 4 autonomous cars (that is, cars capable of driving without human supervision within a defined set of operating conditions), the company said it had settled on a location in Detroit. Separately, Waymo partnered with Lyft to deploy 10 of its vehicles on the ride-hailing platform in Phoenix.

Waymo currently operates a roughly 20-person, 53,000-square-foot office in Novi, Michigan, that opened in 2016. In Detroit, it tests driverless Chrysler Pacifica hybrid minivans that are produced in Windsor, Canada, and shipped to Novi, where Waymo and Chrysler engineers outfit them with hardware and software. In Chandler, Arizona, Waymo last year expanded its full service center, which houses operations and support teams (including fleet technicians, dispatch, response, and rider support), to 60,000 square feet. More recently, the company pledged to open an 85,000-square-foot technical service center in Mesa, Arizona, near Phoenix's East Valley, which Waymo expects will "more than double" its capacity to maintain the Waymo One fleet.

Waymo also announced last year that it would add up to 62,000 minivans to its fleet and said it had signed a deal with Jaguar Land Rover to equip 20,000 of the automaker’s Jaguar I-Pace electric SUVs with its autonomous system by 2020.

Waymo has competition in Yandex, Tesla, Zoox, Aptiv, May Mobility, Pronto.ai, Aurora, Nuro, and GM’s Cruise Automation, to name just a few. Daimler last summer obtained a permit from the Chinese government allowing it to test autonomous cars powered by Baidu’s Apollo platform on public roads in China. Beijing-based Pony.ai, which has raised $214 million in venture capital, in early April launched a driverless taxi pilot in Guangzhou. And startup Optimus Ride this month built out a small autonomous shuttle fleet in New York City, becoming the first to do so.

According to market research firm ABI Research, as many as 8 million driverless cars will be added to the road in 2025, and Research and Markets anticipates that there will be some 20 million autonomous cars in operation in the U.S. by 2030.