If you believe the CEOs, a fully autonomous car could be only months away. In 2015, Elon Musk predicted a fully autonomous Tesla by 2018; so did Google. Delphi and Mobileye’s Level 4 system is currently slated for 2019, the same year nuTonomy plans to deploy thousands of driverless taxis on the streets of Singapore. GM will put a fully autonomous car into production in 2019, with no steering wheel or ability for drivers to intervene. There’s real money behind these predictions, bets made on the assumption that the software will be able to catch up to the hype.

On its face, full autonomy seems closer than ever. Waymo is already testing cars on limited-but-public roads in Arizona. Tesla and a host of imitators already sell a limited form of Autopilot, counting on drivers to intervene if anything unexpected happens. There have been a few crashes, some deadly, but as long as the systems keep improving, the logic goes, we can’t be that far from not having to intervene at all.

But the dream of a fully autonomous car may be further off than we realize. There’s growing concern among AI experts that it may be years, if not decades, before self-driving systems can reliably avoid accidents. As self-trained systems grapple with the chaos of the real world, experts like NYU’s Gary Marcus are bracing for a painful recalibration in expectations, a correction sometimes called “AI winter.” That delay could have disastrous consequences for companies banking on self-driving technology, putting full autonomy out of reach for an entire generation.

“Driverless cars are like a scientific experiment where we don’t know the answer”

It’s easy to see why car companies are optimistic about autonomy. Over the past ten years, deep learning — a method that uses layered machine-learning algorithms to extract structured information from massive data sets — has driven almost unthinkable progress in AI and the tech industry. It powers Google Search, the Facebook News Feed, conversational speech-to-text algorithms, and champion Go-playing systems. Outside the internet, we use deep learning to detect earthquakes, predict heart disease, and flag suspicious behavior on a camera feed, along with countless other innovations that would have been impossible otherwise.

But deep learning requires massive amounts of training data to work properly, incorporating nearly every scenario the algorithm will encounter. Systems like Google Images, for instance, are great at recognizing animals as long as they have training data to show them what each animal looks like. Marcus describes this kind of task as “interpolation,” taking a survey of all the images labeled “ocelot” and deciding whether the new picture belongs in the group.

Engineers can get creative in where the data comes from and how it’s structured, but it places a hard limit on how far a given algorithm can reach. The same algorithm can’t recognize an ocelot unless it’s seen thousands of pictures of an ocelot — even if it’s seen pictures of housecats and jaguars, and knows ocelots are somewhere in between. That process, called “generalization,” requires a different set of skills.
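To make that limit concrete, here is a deliberately tiny sketch. It is not deep learning: it swaps in a nearest-centroid classifier, and the two features and every number in it are invented for illustration. What it shares with the systems described above is the constraint that a model can only answer with labels it was trained on.

```python
import math

# Hypothetical training data: (body length in meters, spot density from 0 to 1).
# All values are made up for this illustration.
TRAINING = {
    "housecat": [(0.45, 0.1), (0.50, 0.2), (0.48, 0.0)],
    "jaguar": [(1.50, 0.9), (1.70, 0.8), (1.60, 1.0)],
}


def centroid(points):
    """Average each feature across the examples of one label."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))


# "Training" here just means remembering the average example for each label.
CENTROIDS = {label: centroid(points) for label, points in TRAINING.items()}


def classify(features):
    """Return the known label whose centroid is closest to the input."""
    return min(CENTROIDS, key=lambda label: math.dist(features, CENTROIDS[label]))


# An ocelot-like animal sits between the two known groups, but the
# classifier has no "ocelot" label to give, so it must pick one of the two.
ocelot_like = (0.85, 0.6)
print(classify(ocelot_like))
```

Whatever the input, the answer is always "housecat" or "jaguar." Recognizing that an in-between animal is something new is the step this kind of model has no way to take on its own.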

For a long time, researchers thought they could improve generalization skills with the right algorithms, but recent research has shown that conventional deep learning is even worse at generalizing than we thought. One study found that conventional deep learning systems have a hard time even generalizing across different frames of a video, labeling the same polar bear as a baboon, mongoose, or weasel depending on minor shifts in the background. With each classification based on hundreds of factors in aggregate, even small changes to pictures can completely change the system’s judgment, something other researchers have taken advantage of in adversarial data sets.

Marcus points to the chat bot craze as the most recent example of hype running up against the generalization problem. “We were promised chat bots in 2015,” he says, “but they’re not any good because it’s not just a matter of collecting data.” When you’re talking to a person online, you don’t just want them to rehash earlier conversations. You want them to respond to what you’re saying, drawing on broader conversational skills to produce a response that’s unique to you. Deep learning just couldn’t make that kind of chat bot. Once the initial hype faded, companies lost faith in their chat bot projects, and there are very few still in active development.

That leaves Tesla and other autonomy companies with a scary question: Will self-driving cars keep getting better, like image search, voice recognition, and the other AI success stories? Or will they run into the generalization problem like chat bots? Is autonomy an interpolation problem or a generalization problem? How unpredictable is driving, really?

It may be too early to know. “Driverless cars are like a scientific experiment where we don’t know the answer,” Marcus says. We’ve never been able to automate driving at this level before, so we don’t know what kind of task it is. To the extent that it’s about identifying familiar objects and following rules, existing technologies should be up to the task. But Marcus worries that driving well in accident-prone scenarios may be more complicated than the industry wants to admit. “To the extent that surprising new things happen, it’s not a good thing for deep learning.”

“Safety isn’t just about the quality of the AI technology”

The experimental data we have comes from public accident reports, each of which offers some unusual wrinkle. A fatal 2016 crash saw a Model S drive full speed into the rear portion of a white tractor trailer, confused by the high ride height of the trailer and bright reflection of the sun. In March, a self-driving Uber crash killed a woman pushing a bicycle, after she emerged from an unauthorized crosswalk. According to the NTSB report, Uber’s software misidentified the woman as an unknown object, then a vehicle, then finally as a bicycle, updating its projections each time. In a California crash, a Model X steered toward a barrier and sped up in the moments before impact, for reasons that remain unclear.

Each accident seems like an edge case, the kind of thing engineers couldn’t be expected to predict in advance. But nearly every car accident involves some sort of unforeseen circumstance, and without the power to generalize, self-driving cars will have to confront each of these scenarios as if for the first time. The result would be a string of fluke-y accidents that don’t get less common or less dangerous as time goes on. For skeptics, a turn through the manual disengagement reports shows that scenario already well under way, with progress already reaching a plateau.

Andrew Ng — a former Baidu executive, Drive.AI board member, and one of the industry’s most prominent boosters — argues the problem is less about building a perfect driving system than training bystanders to anticipate self-driving behavior. In other words, we can make roads safe for the cars instead of the other way around. As an example of an unpredictable case, I asked him whether he thought modern systems could handle a pedestrian on a pogo stick, even if they had never seen one before. “I think many AV teams could handle a pogo stick user in pedestrian crosswalk,” Ng told me. “Having said that, bouncing on a pogo stick in the middle of a highway would be really dangerous.”

“Rather than building AI to solve the pogo stick problem, we should partner with the government to ask people to be lawful and considerate,” he said. “Safety isn’t just about the quality of the AI technology.”

“This is not an easily isolated problem”

Deep learning isn’t the only AI technique, and companies are already exploring alternatives. Though techniques are closely guarded within the industry (just look at Waymo’s recent lawsuit against Uber), many companies have shifted to rule-based AI, an older technique that lets engineers hard-code specific behaviors or logic into an otherwise self-directed system. It doesn’t have the same capacity to write its own behaviors just by studying data, which is what makes deep learning so exciting, but it would let companies avoid some of deep learning’s limitations. But with the basic tasks of perception still profoundly shaped by deep learning techniques, it’s hard to say how successfully engineers can quarantine potential errors.

Ann Miura-Ko, a venture capitalist who sits on the board of Lyft, says she thinks part of the problem is high expectations for autonomous cars themselves, classifying anything less than full autonomy as a failure. “To expect them to go from zero to level five is a mismatch in expectations more than a failure of technology,” Miura-Ko says. “I see all these micro-improvements as extraordinary features on the journey towards full autonomy.”

Still, it’s not clear how long self-driving cars can stay in their current limbo. Semi-autonomous products like Tesla’s Autopilot are smart enough to handle most situations, but require human intervention if anything too unpredictable happens. When something does go wrong, it’s hard to know whether the car or the driver is to blame. For some critics, that hybrid is arguably less safe than a human driver, even if the errors are hard to blame entirely on the machine. One study by the RAND Corporation estimated that self-driving cars would have to drive 275 million miles without a fatality to prove they were as safe as human drivers. The first death linked to Tesla’s Autopilot came roughly 130 million miles into the project, well short of the mark.
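For readers curious where a number like 275 million comes from, the arithmetic can be roughly reconstructed. The sketch below makes assumptions the paragraph above does not state: a human benchmark of about 1.09 fatalities per 100 million miles, a 95 percent confidence level, and a simple Poisson model of fatalities. The function name is hypothetical, and this is an illustration of the logic rather than the study's own method.

```python
import math


def fatality_free_miles_needed(fatalities_per_mile, confidence=0.95):
    """Miles that must pass with zero deaths before we can say, at the given
    confidence, that the true fatality rate is no higher than the benchmark.

    Under a Poisson model, the chance of zero deaths in n miles at rate r is
    exp(-r * n). Setting that equal to (1 - confidence) and solving for n
    gives n = -ln(1 - confidence) / r.
    """
    return -math.log(1.0 - confidence) / fatalities_per_mile


# Assumed human benchmark: about 1.09 fatalities per 100 million miles.
HUMAN_RATE = 1.09 / 100_000_000

required = fatality_free_miles_needed(HUMAN_RATE)
print(f"{required / 1e6:.0f} million miles")

# Miles logged before the first Autopilot-linked death, per the article.
AUTOPILOT_MILES = 130_000_000
print(f"{AUTOPILOT_MILES / required:.0%} of the way there")
```

Under these assumptions the requirement lands at about 275 million miles, so a fatality at roughly 130 million miles arrives less than halfway to the threshold.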

But with deep learning sitting at the heart of how cars perceive objects and decide to respond, improving the accident rate may be harder than it looks. “This is not an easily isolated problem,” says Duke professor Mary Cummings, pointing to an Uber crash that killed a pedestrian earlier this year. “The perception-decision cycle is often linked, as in the case of the pedestrian death. A decision was made to do nothing based on ambiguity in perception, and the emergency braking was turned off because it got too many false alarms from the sensor.”

That crash ended with Uber pausing its self-driving efforts for the summer, an ominous sign for other companies planning rollouts. Across the industry, companies are racing for more data to solve the problem, assuming the company with the most miles will build the strongest system. But where companies see a data problem, Marcus sees something much harder to solve. “They’re just using the techniques that they have in the hopes that it will work,” Marcus says. “They’re leaning on the big data because that’s the crutch that they have, but there’s no proof that ever gets you to the level of precision that we need.”

Correction: This piece originally described Andrew Ng as a founder of Drive.AI. In fact, he sits on the company’s board. The Verge regrets the error.