“CityFlow,” a paper from NVIDIA researchers introducing a city-scale traffic camera dataset, has been accepted for oral presentation at CVPR 2019, earning two “Strong Accepts” and one “Accept” from reviewers. Billed as the world’s first large dataset to support cross-camera vehicle tracking and re-identification, CityFlow features the largest number of cameras (40) and the largest spatial span (> 3 km²) of any such smart city test platform.

Networked traffic cameras have great potential as citywide sensors for optimizing traffic flow and managing traffic accidents in crowded urban areas. Existing technology, however, lacks the ability to track vehicles on a large scale. Challenges include vehicles passing through multiple cameras at different intersections and ever-changing weather conditions.

To realize the potential of traffic cameras in this regard, three distinct but closely related research problems must be addressed: 1) detecting and tracking targets within a single camera, i.e. multi-target single-camera (MTSC) tracking; 2) re-identifying targets across multiple cameras (ReID); 3) detecting and tracking targets across the camera network, i.e. multi-target multi-camera (MTMC) tracking. MTMC tracking can be seen as a combination of MTSC tracking within each camera and image-based ReID, which connects a target's trajectories between cameras.
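The decomposition above can be illustrated with a minimal sketch. It is not the paper's method: it assumes a hypothetical MTSC tracker has already produced per-camera tracklets, and that a hypothetical ReID model has summarized each tracklet as one appearance vector. The names `Tracklet`, `link_tracklets` and the `threshold` value are invented for illustration. Tracklets from different cameras are merged under one global ID when their cosine similarity passes the threshold. Spatiotemporal constraints are ignored, and the transitive merging could in principle join two tracklets from the same camera through a third, which a real system would have to rule out.

```python
from dataclasses import dataclass
from itertools import combinations
from math import sqrt


@dataclass(frozen=True)
class Tracklet:
    camera_id: int   # camera that produced this single-camera track
    track_id: int    # ID assigned by the (hypothetical) MTSC tracker
    feature: tuple   # appearance vector from a (hypothetical) ReID model


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_tracklets(tracklets, threshold=0.9):
    """Map (camera_id, track_id) to a global ID shared by linked tracklets."""
    parent = list(range(len(tracklets)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(tracklets)), 2):
        a, b = tracklets[i], tracklets[j]
        if a.camera_id == b.camera_id:
            continue  # within one camera, MTSC tracking already separated them
        if cosine_similarity(a.feature, b.feature) >= threshold:
            parent[find(j)] = find(i)  # same vehicle seen by two cameras

    return {(t.camera_id, t.track_id): find(i) for i, t in enumerate(tracklets)}


# Toy input: the first two tracklets look alike, the third does not.
tracklets = [
    Tracklet(camera_id=0, track_id=1, feature=(1.0, 0.0)),
    Tracklet(camera_id=1, track_id=7, feature=(0.98, 0.05)),
    Tracklet(camera_id=1, track_id=8, feature=(0.0, 1.0)),
]
global_ids = link_tracklets(tracklets)
```

In this toy input, the tracklet from camera 0 and the first tracklet from camera 1 receive the same global ID, while the second tracklet from camera 1 keeps its own.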

Compared with pedestrian re-identification, which has developed rapidly in recent years, vehicle re-identification faces two major challenges: large intra-class variability, because the same vehicle looks very different from different viewing angles; and small inter-class variability, because vehicles of the same model and colour look highly similar. Existing vehicle re-identification datasets (VeRi-776 from Beijing University of Posts and Telecommunications; VehicleID and PKU-VD from Peking University) do not provide original videos or camera calibration information, so they cannot be used for studies on video-based cross-camera vehicle tracking.

CityFlow contains high-definition synchronized video collected from 40 cameras across 10 intersections in a medium-sized US city, with more than 200K annotated bounding boxes covering a wide range of scenes, viewing angles, vehicle models, and urban traffic flow conditions.

Key contributions of the paper:

1) The largest spatial span and the most cameras and intersections of any such dataset, covering diverse urban scenarios and traffic flows and providing the best platform for city-scale solutions;

2) The first video-based dataset to support multi-target multi-camera (MTMC) vehicle tracking, providing original videos, camera distribution and camera calibration information, and thus opening the way to a new research field;

3) An analysis of how various state-of-the-art algorithms perform on the dataset, compared with approaches that combine visual and spatiotemporal analysis.

The dataset sets new standards and is expected to facilitate additional research, improve the effectiveness of current algorithms, and optimize real-world traffic management. Both the CityFlow dataset and its online evaluation server have been released as part of the 2019 AI City Challenge. To protect privacy, all license plates and faces in the dataset have been obscured.

Zheng Tang, the first author of the paper, is a PhD student at the Department of Computer Engineering at the University of Washington (Seattle) and an intern at NVIDIA.

The paper “CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification” is on arXiv.