M. Six Silberman, Kristy Milland, Rochelle LaPlante, Joel Ross, and Lilly Irani, with approval of Andrew Zaldivar and Bill Tomlinson

This paper is out of date and should no longer be cited.

The average wage statistic presented in the paper is misleading, as are many of its other statistics, such as those on the nationalities of workers.

While the statistics may have been accurate in 2009, since then:

Market demographics have changed

Market demographics have changed significantly due to changes in system design and rules. In the paper, we wrote that MTurk was gaining a substantial number of international workers. The reverse is now the case: new international workers are no longer being admitted to the system. To our knowledge, all new workers who have joined since late 2012 have been American. This change has made a substantial difference in who the workers are and what they expect as fair payment.

We learned that most tasks on AMT are done by a small group of professional Turkers, so averages are misleading

Most of the tasks on Mechanical Turk are done by a relatively small group of workers. These workers rely on income from “Turking” to meet basic needs. These workers should be understood as professional Turkers. Professional Turkers have a stronger incentive than other workers to search for well-paying tasks and complete them well to minimize the risk of rejection. Professional Turkers also participate in online worker communities and use specialized tools to find, choose, and do tasks. As a result, they earn higher wages than other workers. We now believe that Mechanical Turk worker wages follow a power law distribution. An average wage figure over all workers is therefore misleading. Importantly, even an accurate average figure over all workers is likely to be significantly lower than an average wage figure over all tasks.
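To see why an average over all workers can understate the wage attached to a typical task, consider a minimal Python sketch using entirely hypothetical numbers (not data from the paper or from MTurk). It splits workers into two made-up groups, a crude stand-in for a power-law distribution: a few high-volume professional Turkers and many occasional workers. The names `WorkerGroup`, `mean_wage_over_workers`, and `mean_wage_over_tasks` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class WorkerGroup:
    count: int          # number of workers in this (hypothetical) group
    tasks_each: int     # tasks completed by each worker in the group
    hourly_wage: float  # effective hourly wage in USD (assumed, not measured)


def mean_wage_over_workers(groups):
    """Average hourly wage where every worker counts once."""
    total_workers = sum(g.count for g in groups)
    return sum(g.count * g.hourly_wage for g in groups) / total_workers


def mean_wage_over_tasks(groups):
    """Average hourly wage where every completed task counts once."""
    total_tasks = sum(g.count * g.tasks_each for g in groups)
    return sum(g.count * g.tasks_each * g.hourly_wage for g in groups) / total_tasks


# Hypothetical market: 10 professional Turkers doing most of the work,
# 90 occasional workers doing a handful of tasks each.
groups = [
    WorkerGroup(count=10, tasks_each=1000, hourly_wage=8.0),
    WorkerGroup(count=90, tasks_each=10, hourly_wage=2.0),
]

print(f"per worker: ${mean_wage_over_workers(groups):.2f}/hr")  # per worker: $2.60/hr
print(f"per task:   ${mean_wage_over_tasks(groups):.2f}/hr")    # per task:   $7.50/hr
```

With these assumed figures, the per-worker average comes out near $2.60/hr while the per-task average is near $7.50/hr, because the per-task figure weights each worker by how many tasks they complete, and the workers who complete the most tasks also earn the most.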

We learned that low pay causes selection bias

Low-paying survey tasks posted to Mechanical Turk, quite possibly including our survey task, are likely to suffer from selection bias. Low-paying survey tasks do not recruit many professional Turkers, many of whom do not do surveys at all and instead prefer large batches. This bias affects the quality of results. In survey tasks aiming to collect demographic data, it is likely to produce an unrepresentative sample.

We learned that researcher and requester discourse about wages affects wages

We have also learned that researcher and requester discourse about wages may itself play a role in suppressing wages. Specifically, some researchers and requesters hypothesize that increasing pay does not improve quality because high-paying tasks attract spammers. A typical response is to post the same task several times with low pay and rely on a complex aggregation or quality control scheme to try to extract high-quality results from low-quality inputs. For batch HITs, a more effective approach is to post the task once with high pay and use a qualification test to select professional Turkers with the appropriate skills for the task (see Milland 2014; citation below). Other methods, such as fair, well-crafted attention checks, may be more appropriate for surveys.

What now?

An update to the paper is being prepared. When an update is available, a comment will be posted on the ACM Digital Library entry for the paper and this post will be updated.

In the meantime, we strongly encourage researchers considering citing this paper to read and cite the Guidelines for Academic Requesters and the following sources instead:

Fort, Karën, Gilles Adda, and K. Bretonnel Cohen. 2011. Amazon Mechanical Turk: gold mine or coal mine? Computational Linguistics 37(2): 413–420.

Martin, David, Benjamin V. Hanrahan, Jacki O’Neill, and Neha Gupta. 2014. Being a Turker. Proc. CSCW ’14: 224–235.

Milland, Kristy (“spamgirl”). 2014. The myth of low cost, high quality on Amazon’s Mechanical Turk. Turker Nation, 30 Jan 2014.