It’s no secret that YouTube has struggled to moderate the videos on its platform over the past year. The company has repeatedly faced scandals over its inability to rid itself of inappropriate and disturbing content, including some videos aimed at children. Often missing from the discussion over YouTube’s shortcomings, though, are the employees directly tasked with removing things like porn and graphic violence, as well as the contractors who help train AI to detect unwelcome uploads. But a Mechanical Turk task shared with WIRED appears to provide a glimpse into what training one of YouTube's machine learning tools looks like at the ground level.

MTurk is an Amazon-owned marketplace where corporations and academic researchers pay individual contractors to perform micro-sized services—called Human Intelligence Tasks—in exchange for a small sum, usually less than a dollar. MTurk workers help keep the internet running by completing jobs like identifying objects in a photo, transcribing an audio recording, or helping to train an algorithm.

And while MTurk workers don't make content moderation decisions directly, they do routinely help train YouTube’s machine learning tools in all sorts of ways. The machine learning tools that they help train also do more than just find inappropriate videos; they aid other parts of YouTube’s system, like its recommendation algorithm.

“YouTube and Google have been posting tasks on Mechanical Turk for years,” says Rochelle LaPlante, the Mechanical Turk worker who shared the specific assignment with WIRED. “It’s been all different kinds of stuff—tagging content types, looking for adult content, flagging content that is conspiracy theory-type stuff, marking if titles are appropriate, marking if titles match the video, identifying if a video is from a VEVO account.”

LaPlante says that the tasks and guidelines often change. Some appear to be directly related to detecting offensive content, while others appear to be about helping determine whether a video is appropriate for a specific audience segment, like children. “Some workers have suspected this is related to decision making in which channels should be monetized or demonetized,” she says.

Watch and Learn

The specific moderation task shared with WIRED, which LaPlante completed on March 14 for a payout of 10 cents, is fairly straightforward, though it leaves plenty of room for the worker's opinions. The job offers a window into a usually opaque process: how a human's interpretation of a video is used to later help craft a machine learning algorithm. And even inside YouTube, machine learning algorithms only flag videos; determining whether something violates the company's Community Guidelines remains a human's job.

The MTurk HIT asks the worker to watch a video, and then tick a series of boxes about what it contains. It also asks them to pay attention to the video's title and description. The MTurk worker should “watch enough of the video” to be confident in their judgment, and the HIT suggests they consider watching it at 1.5x speed to quicken the process. The questions address whether the clip contains “crude/coarse language,” or “adult dialog,” including “offensive or controversial views.” It asks MTurk workers to differentiate between artistic nudity and content designed to “arouse or sexually gratify.”

One especially ambiguous section asks the worker to differentiate between “graphic depictions (actual or fictional) of drug use” and “incidental or comedic use of soft drugs.” The task doesn't include a list of what counts as a hard or soft drug, though it does indicate that “hard drugs” include heroin. At the end of the task, the worker judges whether they think the video is appropriate for children.

The MTurk task that LaPlante completed for YouTube.

To earn the federal minimum wage of $7.25 an hour at 10 cents per task, an MTurk worker would need to complete 72.5 tasks like this every hour, meaning there's an incentive to answer these questions extremely quickly. While some of the questions YouTube asks are straightforward (Is there any speech or singing in the audio?), most are nuanced, and underscore the complexity of training an artificial intelligence to help sort a gigantic, global video platform. The average cat video likely wouldn’t trip up a worker assigned to this task, but it’s not hard to imagine how, say, a political rant about abortion might.
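
For readers who want to check the math, here is a rough sketch of that calculation. It uses only the two figures reported above, the 10-cent payout and the $7.25 federal minimum wage, and it assumes (hypothetically) that every task pays the same amount and that no time is lost between tasks. The function names are illustrative and not part of any MTurk tooling.

```python
# Figures taken from the article; everything else here is an illustrative assumption.
PAYOUT_CENTS = 10          # what LaPlante was paid for this task
MINIMUM_WAGE_CENTS = 725   # federal minimum wage, per hour

def tasks_per_hour(wage_cents: int, payout_cents: int) -> float:
    """Number of tasks needed in one hour to reach the given hourly wage."""
    return wage_cents / payout_cents

def seconds_per_task(wage_cents: int, payout_cents: int) -> float:
    """Time available per task, assuming zero downtime between tasks."""
    return 3600 / tasks_per_hour(wage_cents, payout_cents)

needed = tasks_per_hour(MINIMUM_WAGE_CENTS, PAYOUT_CENTS)
budget = seconds_per_task(MINIMUM_WAGE_CENTS, PAYOUT_CENTS)
print(f"{needed} tasks per hour, about {budget:.1f} seconds per task")
# prints: 72.5 tasks per hour, about 49.7 seconds per task
```

Under those assumptions, a worker has a little under 50 seconds to watch a video, read its title and description, and answer every question. Any time spent finding the next task would shrink that budget further.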