The latest update from crowdsourced review website Yelp has turned heads in the machine learning community. Yelp version 12.27.0 appeared in the Apple App Store with an odd release note: “We apologize to anyone who had problems with the app this week. We trained a neural net to eliminate all the bugs in the app and it deleted everything. We had to roll everything back. To be fair, we were 100% bug-free… briefly.”

The announcement seemed too early for April Fools’, but a Yelp spokesperson confirmed to Synced that it was in fact a gag: “We typically use a joking tone when we write our release notes for the app store. Our latest note was meant in jest.”

Joking aside, the note did shine a light on the task of automatically debugging code, which continues to challenge machine learning techniques. As renowned AI scholar and Google Director of Research Peter Norvig told the 2016 EmTech Digital conference, “the problem here is the methodology for scaling [machine learning verification] up to a whole industry is still in progress. We have been doing this for a while; we have some clues for how to make it work, but we don’t have the decades of experience that we have in developing and verifying regular software.”

Conventional software programming uses Boolean-based logic and can be tested to determine whether performance matches the original design. Machine learning, by contrast, is a black-box approach whose outputs are probabilistic, and so it cannot be effectively tested in the same manner.
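The contrast can be made concrete with a small illustrative sketch. The first function below is conventional deterministic code, which a test can check against an exact specification; the second is a toy stand-in for a trained model (not a real neural network), whose individual outputs may be wrong even when it is working as intended, so a test can only assert a statistical property such as an accuracy threshold.

```python
import random

# Deterministic code: the output is fully specified by the design,
# so a test can assert exact equality.
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

assert fahrenheit_to_celsius(212) == 100.0  # passes or fails, no middle ground

# A learned model behaves probabilistically. This toy "classifier"
# stands in for a trained model that is right about 90% of the time.
def toy_classifier(true_label, rng):
    return true_label if rng.random() < 0.9 else 1 - true_label

rng = random.Random(0)
labels = [i % 2 for i in range(1000)]
correct = sum(toy_classifier(y, rng) == y for y in labels)
accuracy = correct / len(labels)

# The only meaningful test is statistical: we cannot assert that any
# single prediction is correct, only that accuracy clears a threshold.
assert accuracy > 0.85
```

The exact-equality assertion either holds or it doesn’t; the accuracy assertion can pass while the model still misclassifies dozens of inputs, which is precisely why conventional test methodology transfers poorly to machine learning systems.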

Last September, Facebook debuted an artificial intelligence programming tool called SapFix to tackle the challenge of automated debugging. The tool can automatically identify bugs, generate candidate fixes, and propose the best fixes to engineers for review and deployment. Facebook has already deployed SapFix on its codebase.
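SapFix belongs to the broader family of generate-and-validate program repair. The sketch below is not Facebook’s actual algorithm, only a minimal illustration of that general idea under simplified assumptions: a test suite defines what “fixed” means, a set of templated candidate patches is generated, and only candidates that pass the tests are surfaced for a human to review.

```python
# Generate-and-validate program repair, illustrated on a toy bug.
# This is a hand-rolled sketch, NOT how SapFix itself works.

def buggy_abs(x):
    return x if x > 0 else x  # bug: negative inputs are not negated

# The test suite defines correct behavior.
def passes_tests(fn):
    try:
        return fn(3) == 3 and fn(-3) == 3 and fn(0) == 0
    except Exception:
        return False

assert not passes_tests(buggy_abs)  # the original function fails

# Candidate patches: simple templated mutations of the else branch.
candidate_patches = [
    lambda x: x if x > 0 else 0,
    lambda x: x if x > 0 else -x,   # the correct repair
    lambda x: x if x > 0 else x + 1,
]

# Validate: keep only candidates that pass the full test suite.
fixes = [p for p in candidate_patches if passes_tests(p)]
# In a SapFix-style workflow, surviving fixes would then be proposed
# to an engineer for approval rather than deployed automatically.
```

Only the second candidate survives validation here; the final human-approval step mirrors the article’s point that the tool suggests fixes rather than shipping them unsupervised.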

The Yelp joke also raised concerns that an AI might find a way to identify and exploit bugs maliciously. WIRED Senior Writer Tom Simonite tweeted “Worth noting that the scenario is very plausible. AI researchers have found themselves outwitted by their experiments when bots find ways to cheat, for example by discovering unknown bugs in their code.”

While the Yelp note raised a few smiles in the machine learning community, most researchers are well aware, in the back of their minds, of the potentially severe consequences of mistakes from deployed AI models. Even the most robust AI system is far from infallible, as Chief Data Scientist and Co-Founder of Quartic.ai Xiaozhou Wang says: “it is possible for a trained model to make mistakes, i.e., not behave entirely as expected… one of the biggest reasons is that they all have bugs.” Wang says the problem won’t go away anytime soon, as researchers have yet to agree on the best way to set up test frameworks that can effectively purge machine learning models of bugs.
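One candidate approach in that ongoing test-framework debate is metamorphic testing: instead of asserting exact outputs, a test asserts relations that should hold between outputs when an input is transformed. The sketch below uses a hand-written word-counting scorer as a stand-in for a trained sentiment model (the function and word lists are illustrative assumptions, not any real system), and checks one such relation.

```python
# Metamorphic testing sketch: the "model" here is a trivial stand-in
# for a trained sentiment classifier, used only to show the test idea.

POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "awful", "terrible"}

def sentiment_score(text):
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Metamorphic relation: appending a positive word should never
# lower the sentiment score, whatever the exact scores are.
base = sentiment_score("the food was good")
augmented = sentiment_score("the food was good and great")
assert augmented >= base
```

The appeal of such relations is that they need no labeled “correct” output for each input, which sidesteps part of the oracle problem Wang describes; the open question is which relations meaningfully constrain a real model’s behavior.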