Spammers, hackers, political propagandists, and other nefarious users have always tried to game the systems that social media sites put in place to protect their platforms. It’s a never-ending battle; as companies like Twitter and Facebook become more sophisticated, so do the trolls. And so last week, after Facebook shared new details about a tool it built to analyze text found in images like memes, some people began brainstorming how to thwart it.

Social media companies are under tremendous pressure from lawmakers, journalists, and users to be more transparent about how they decide what content should be removed and how their algorithms work, especially after they’ve made a number of high-profile mistakes. While many companies are now more forthcoming, they’ve also been reluctant to reveal too much about their systems because, they say, ill-intentioned actors will use the information to game them.

Last Tuesday, Facebook did reveal the details of how it uses a tool called Rosetta to help automatically detect content like memes that violate its hate speech policy or images that spread already debunked hoaxes; the company says the system processes the one billion public images and videos uploaded to Facebook each day.

Propagators of the false right-wing conspiracy theory QAnon took interest after “Q”—the anonymous leader who regularly posts nonsensical “clues” for followers—linked to several news articles about the tool, including WIRED’s. Rosetta works by detecting the words in an image and then feeding them through a neural network that parses what they say. The QAnon conspiracy theorists created memes and videos with deliberately obscured fonts, wonky text, or backwards writing, which they believe might trick Rosetta or disrupt this process. Many of the altered memes were first spotted on 8chan by Shoshana Wodinsky, an intern at NBC News.
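Facebook has not published Rosetta's code, but the two-stage design described above, detect text regions first and then recognize the characters in each one, can be illustrated with a toy pipeline. Everything below (the function names, the stored region labels, the hoax-phrase blocklist) is invented for illustration and stands in for what would really be neural networks:

```python
# Toy sketch of a two-stage OCR moderation pipeline: a detector
# proposes text regions, a recognizer transcribes them, and the
# transcript is checked against a blocklist of debunked claims.
# All names and data here are illustrative, not Facebook's.

def detect_text_regions(image):
    """Stage 1: return bounding boxes believed to contain text.
    A real system runs a detection network over the pixels; here
    we pretend regions are pre-annotated on the image dict."""
    return image.get("regions", [])

def recognize_text(region):
    """Stage 2: transcribe one region. A real system runs a
    sequence model over the cropped pixels; here we just read
    the stored label to keep the sketch self-contained."""
    return region["text"].lower()

def moderate(image, blocklist):
    """Join the per-region transcripts and flag blocklist hits."""
    transcript = " ".join(
        recognize_text(r) for r in detect_text_regions(image)
    )
    flagged = [phrase for phrase in blocklist if phrase in transcript]
    return transcript, flagged

meme = {"regions": [{"box": (10, 10, 200, 40), "text": "The Moon"},
                    {"box": (10, 50, 200, 80), "text": "landing was STAGED"}]}
transcript, flagged = moderate(meme, ["landing was staged"])
print(transcript)  # the moon landing was staged
print(flagged)     # ['landing was staged']
```

The evasion attempts described above target stage one: if the detector never proposes a region, the recognizer and the blocklist check never run at all.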


It's not clear whether any of these tactics will work (or how seriously they have even been tested), but it's not hard to imagine that other groups will keep trying to get around Facebook. It’s also incredibly difficult to build a machine-learning system that’s foolproof. Automated tools like Rosetta might get tripped up by wonky text or hard-to-read fonts. A group of researchers from the University of Toronto’s Citizen Lab found that the image-recognition algorithms used by WeChat—the most popular social network in China—could be tricked by changing a photo’s properties, like its coloring or its orientation. Because the system couldn’t detect that text was present in the image, it couldn’t process what it said.
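Citizen Lab's findings concern WeChat's actual neural-network filters; the sketch below only illustrates the general failure mode with an invented, deliberately naive heuristic. The point is that a cheap, content-preserving transformation can push an image outside the assumptions the detection stage was built on:

```python
# Toy illustration of the Citizen Lab finding: a naive "is there
# text here?" check that assumes sparse dark ink on a light
# background stops firing once the image's colors are inverted,
# so the recognition stage is never reached. Real detectors are
# neural networks, but the failure mode is the same in spirit.

def looks_like_text(image):
    """Flag the image for OCR when a sparse minority of pixels is
    dark 'ink' on a mostly light page (0 = black, 255 = white)."""
    pixels = [p for row in image for p in row]
    ink_fraction = sum(1 for p in pixels if p < 128) / len(pixels)
    return 0.05 < ink_fraction < 0.5

def invert(image):
    """Flip every pixel value -- the kind of cheap transformation
    Citizen Lab found could defeat image filtering."""
    return [[255 - p for p in row] for row in image]

# A 4x4 'meme': dark glyph pixels (0) on a white background (255).
meme = [[255,   0,   0, 255],
        [255,   0,   0, 255],
        [255, 255, 255, 255],
        [255, 255, 255, 255]]

print(looks_like_text(meme))          # True  -- detector fires
print(looks_like_text(invert(meme)))  # False -- same text, missed
```

A human reader sees the same glyphs either way; the filter's assumptions, not the content, are what the transformation breaks.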

It’s hard to create ironclad content-moderation systems in part because it’s difficult to map out what they should accomplish in the first place. Anish Athalye, a PhD student at MIT who has studied attacks against AI, says it’s difficult to account for every type of behavior a system should protect against, or even how that behavior manifests itself. Fake accounts might behave like real ones, and denouncing hate speech can look like hate speech itself. It’s not just the challenge of making the AI work, Athalye says. “We don't even know what the specification is. We don't even know the definition of what we're trying to build.”

When researchers do discover their tools are susceptible to a specific kind of attack, they can recalibrate their systems to account for it, but that doesn’t entirely solve the problem.

“The most common approach to correct these mistakes is to enlarge the training set and train the model again,” says Carl Vondrick, a computer science professor at Columbia University who studies machine learning and vision. “This could take between a few minutes or a few weeks to do. However, this will likely create an arms race where one group is trying to fix the model and the other group is trying to fool it.”
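The retrain-and-redeploy cycle Vondrick describes can be caricatured in a few lines. The "model" here is just a memorized set of spellings and the attack strings are invented, but the loop structure, attacker finds a miss, defender adds it to the training set and retrains, is the arms race in miniature:

```python
# Toy version of the retraining arms race: the defender's model
# is retrained whenever attackers find a bypass, and the attacker
# responds with a new variant. The model and the misspellings are
# invented for illustration; a real system refits a neural net.

def train(examples):
    """'Training' here just memorizes normalized spellings seen
    in the labeled examples."""
    return {text.lower() for text in examples}

def is_blocked(model, text):
    return text.lower() in model

training_set = ["fake cure"]
model = train(training_set)

attacks = ["fake cure", "f4ke cure", "f@ke cure"]  # evolving evasions
for attempt in attacks:
    if not is_blocked(model, attempt):
        # Defender spots the miss, adds it to the training set,
        # and retrains -- minutes to weeks in a real pipeline.
        training_set.append(attempt)
        model = train(training_set)

print(sorted(model))  # ['f4ke cure', 'f@ke cure', 'fake cure']
```

Each retraining round only covers evasions already observed, which is why the cycle has no natural endpoint: the attacker always moves last.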

Another challenge for platforms is deciding how transparent to be about how their algorithms work. Often when users, journalists, or government officials have asked social media companies to reveal their moderation practices, platforms have argued that disclosing their tactics will embolden bad actors who want to game the system. The situation with Rosetta appears to support that argument: before details of the tool were made public, conspiracy theorists apparently weren’t trying to get around it.