To the machine learning community, high-quality data is as vital as fuel is to a car: it’s what keeps the ML engines running. Recently, a dataset with 64,000 pictures of cars appeared on GitHub, the work of data scientist Nicolas Gervais. The Car Connection Picture Dataset is of added interest because its images are conveniently labeled by make, model, year, price, horsepower, body style and more.

Gervais first collected more than a quarter million images from the website thecarconnection.com. His focus was on exteriors, and excluding interior shots and other images left him with the 64k set, with pictures of about 320x210 pixels. Users can also access large versions of the images by adjusting the included scraper settings in “scrape.py.”

To demonstrate the dataset’s potential in practical applications, Gervais built a car price prediction model and set up an Audi vs BMW deep learning classification task in PyTorch.
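For readers curious how a two-make task like Audi vs BMW might be set up from labeled images, here is a minimal Python sketch of the filtering step only. It rests on an assumption not stated in this article: that the make is the first underscore-separated token of each file name. The function name and the sample file names are hypothetical, made up for illustration, and the dataset's actual layout may differ.

```python
from pathlib import Path


def label_audi_vs_bmw(filenames):
    """Return (filename, label) pairs: 0 for Audi, 1 for BMW.

    Files belonging to any other make are skipped.
    """
    classes = {"audi": 0, "bmw": 1}
    pairs = []
    for name in filenames:
        # ASSUMPTION: the make is the first underscore-separated token
        # of the file name. Check this against the real dataset first.
        make = Path(name).stem.split("_")[0].lower()
        if make in classes:
            pairs.append((name, classes[make]))
    return pairs


# Hypothetical file names, invented purely for illustration.
example = ["Audi_A4_2019_a.jpg", "BMW_X5_2018_b.jpg", "Honda_Civic_2020_c.jpg"]
print(label_audi_vs_bmw(example))
```

The resulting pairs could then be handed to whatever image-loading and training code a project uses; that part is left out here because the article does not describe how Gervais implemented it.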

So, what is the first thing the ML community thought of with these 64,000 pictures of cars in hand? Making fantasy rides, of course: “Seems like this would be really fun to hook up to StyleGAN2 and be able to generate cars based on those properties,” suggested Reddit user Skylion007, in a sentiment echoed by others on the ML discussion subreddit. StyleGAN is the hyperrealistic image generator developed by chip giant NVIDIA in 2018. Philip Wang used the tool to create “This Person Does Not Exist,” a website that generates a new hyperrealistic fake human face every time it’s refreshed. The tech has since been extended to cats, Airbnbs and anime faces, so why not cars?

Reddit exchange on the new car dataset’s potential for building dream cars with GANs.

Aside from amusing vehicle style mashups, it has also been suggested that the dataset could be used to predict future car designs or trends in style and price.

Gervais is a Python software engineer with TD in Montreal and a Machine Learning and Data Science student at McGill. The Car Connection Picture Dataset is available on his GitHub.