On July 4, swarms of sugar-happy children, sparkler-laden teens, and buzzed, burger-full adults crowd the streets in cities across the US. They stagger in packs, waving flags and wearing flags, acting, and let’s not mince words here, like very patriotic fools. Why not? Many have the day off, and maybe the day after off, too. Add fireworks, and you have the makings of a very good time.

Unless you’re a self-driving car developer, in which case you have the makings of a nightmare. See, on most days of the year, walkers act fairly predictably. They wait at crosswalks for their light. Or they don’t, and make little dashes across the street. Normal, everyday stuff—that’s what the systems that run autonomous vehicles are trained to handle. But if people act erratically, wandering about the lanes in ways they usually don’t, the cars can get confused.

So if you are that self-driving car developer, encountering the Fourth of July for the very first time, you might pay for the services of a new breed of data labeling company, like Scale API. Scale’s automated systems, helped along by somewhere north of 10,000 contract workers, examine and label the data collected by autonomous vehicles as they run tests on American roads. Those labels, in turn, help train the car’s software to recognize particular situations the next time they occur.

Here, the fine distinctions matter. If a data labeler were to consistently label cars as people, an autonomous vehicle’s software might get very, very confused, swerving or braking when it shouldn’t. Or: If data is labeled perfectly and accurately, every single time, then those systems just might learn how to safely maneuver through the wide, weird world. Put another way, the tedious task of data labeling is essential to building safe self-driving cars.

Here’s the silly thing, though. When a Scale customer—like self-driving car developers Cruise, Zoox, Lyft, Nutonomy, Nuro, Pony.ai or Voyage, or self-driving truck builders Embark and Starsky Robotics—sends data to be labeled, that data doesn’t get shared with other Scale clients. This is too bad, because autonomous driving systems could always use more data to train on, more images of the real world that help them refine their robobrains. It’s doubly too bad when it comes to edge cases, the unusual but dangerous happenings that all cars should be prepared to handle.

Sure, it makes a lot of sense for companies to want to keep these bits of data to themselves. The developers spend a lot of time and money collecting that information, after all. “I don’t know how you get competitors to share their most valuable information,” says Oscar Beijbom, who heads up the machine learning team at Nutonomy.1 “In a way, these corner cases are very precious.”