Billions of text posts, photos, and videos are uploaded to social media every day, a firehose of information that’s impossible for human moderators to sift through comprehensively. And so companies like Facebook and YouTube have long relied on artificial intelligence to help surface things like spam and pornography.

Something like a white supremacist meme, though, can be more challenging for machines to flag, since the task requires processing several different visual elements at once. Automated systems need to detect and “read” the words that are overlaid on top of the photo, as well as analyze the image itself. Memes are also complicated cultural artifacts, which can be difficult to understand out of context. Despite the challenges they bring, some social platforms are already using AI to analyze memes, including Facebook, which this week shared details about how it uses a tool called Rosetta to analyze photos and videos that contain text.

Facebook says it already uses Rosetta to help automatically detect content that violates things like its hate speech policy. With help from the tool, Facebook also announced this week that it’s expanding its third-party fact-checking effort to include photos and videos, not just text-based articles. Rosetta will aid in the process by automatically checking whether images and videos that contain text were previously flagged as false.

Rosetta works by combining optical character recognition (OCR) technology with other machine learning techniques to process text found in photos and videos. First, it uses OCR to identify where the text is located in a meme or video. You’ve probably used something like OCR before; it’s what allows you to quickly scan a paper form and turn it into an editable document. The automated program knows where blocks of text are located and can tell them apart from the place where you’re supposed to sign your name.

Once Rosetta knows where the words are, Facebook uses a neural network that can transcribe the text and understand its meaning. It can then feed that text through other systems, like one that checks whether the meme is about an already-debunked viral hoax.
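The two-stage flow described above (find the text, transcribe it, then hand it to a downstream check) can be sketched in a few lines of Python. This is only an illustration, not Facebook's code: every name here (`detect_text_regions`, `recognize_text`, `process_image`) is hypothetical, the detection and recognition stages are stubs standing in for trained models, and the exact-match lookup against previously flagged text is an assumption about how such a check might work.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRegion:
    box: tuple    # (left, top, right, bottom) in pixels
    content: str  # stand-in for the cropped pixels of this region


def detect_text_regions(image):
    """Stage 1 (stub): keep only the regions marked as containing text."""
    return [TextRegion(r["box"], r["content"])
            for r in image["regions"] if r["has_text"]]


def recognize_text(region):
    """Stage 2 (stub): a real system would run a recognition model here."""
    return region.content


def normalize(text):
    """Case-fold, drop punctuation, and collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", text.casefold())
    return " ".join(no_punct.split())


def process_image(image, debunked_texts):
    """Detect, transcribe, then check the transcript against flagged text."""
    regions = detect_text_regions(image)
    # Read top-to-bottom, then left-to-right, like a typical meme caption.
    regions.sort(key=lambda r: (r.box[1], r.box[0]))
    transcript = " ".join(recognize_text(r) for r in regions)
    return {
        "transcript": transcript,
        "previously_flagged": normalize(transcript) in debunked_texts,
    }


# Made-up example: a meme with a top caption, a photo, and a bottom caption.
debunked = {normalize("The moon is made of cheese")}
meme = {"regions": [
    {"box": (0, 170, 200, 200), "has_text": True, "content": "made of cheese!"},
    {"box": (0, 30, 200, 170), "has_text": False, "content": ""},
    {"box": (0, 0, 200, 30), "has_text": True, "content": "THE MOON IS"},
]}
result = process_image(meme, debunked)
print(result)
```

Normalizing the transcript before the lookup means small differences in capitalization or punctuation do not hide a match. A production system would presumably need something more tolerant than exact matching.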

The researchers behind Rosetta say the tool now extracts text from every image uploaded publicly to Facebook in real time, and it can “read” text in multiple languages, including English, Spanish, German, and Arabic. (Facebook says Rosetta is not used to scan images that users share privately on their timelines or in direct messages.)

Rosetta can analyze images that include text in many forms, such as photos of protest signs, restaurant menus, storefronts, and more. Viswanath Sivakumar, a software engineer at Facebook who works on Rosetta, said in an email that the tool works well both for identifying text in a landscape, like on a street sign, and also for memes—but that the latter is more challenging. “In the context of proactively detecting hate speech and other policy-violating content, meme-style images are the more complex AI challenge,” he wrote.

Unlike humans, an AI also typically needs to see tens of thousands of examples before it can learn to complete a complicated task, says Sivakumar. But memes, even for Facebook, are not endlessly available, and gathering enough examples in different languages can also prove difficult. Finding high-quality training data is an ongoing challenge for artificial intelligence research more broadly. Data often needs to be painstakingly hand-labeled, and many databases are protected by copyright laws.

To train Rosetta, Facebook researchers used images posted publicly on the site that contained some form of text, along with their captions and the location from which they were posted. They also created a program to generate additional examples, inspired by a method devised by a team of Oxford University researchers in 2016. That means the entire process is automated to some extent: One program automatically spits out the memes, and then another tries to analyze them.
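The appeal of generated training data is that the label comes for free: the program that places the text already knows what it says and where it sits. The toy sketch below shows that idea on a grid of characters rather than real pixels. It is not the Oxford method or Facebook's generator; the function name `make_synthetic_example`, the grid "image," and the sample phrases are all assumptions made for illustration.

```python
import random


def make_synthetic_example(phrases, width=40, height=8, rng=None):
    """Overlay one phrase on a noisy background and return (image, label)."""
    rng = rng or random.Random()
    phrase = rng.choice(phrases)
    if len(phrase) > width:
        raise ValueError("phrase does not fit in the image")

    # Background "texture": a grid of filler characters.
    grid = [[rng.choice(".:") for _ in range(width)] for _ in range(height)]

    # Pick a random position where the whole phrase fits on one row.
    row = rng.randrange(height)
    col = rng.randrange(width - len(phrase) + 1)
    grid[row][col:col + len(phrase)] = list(phrase)

    # The ground truth is known exactly because we placed the text ourselves.
    label = {"text": phrase, "box": (col, row, col + len(phrase), row + 1)}
    return ["".join(r) for r in grid], label


rng = random.Random(0)  # fixed seed so the run is repeatable
image, label = make_synthetic_example(["NO PARKING", "OPEN 24 HOURS"], rng=rng)
print("\n".join(image))
print(label)
```

A detector or recognizer can then be scored against `label` without anyone annotating the image by hand.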