Design notes for a future crowd work market

A new market should account for the unique needs of professional crowd workers — and include them in market design and management

Rochelle LaPlante and Six Silberman

This post is prompted by Stanford computer science professor Michael Bernstein’s call for “aspiring researchers” to “join Stanford researchers to form the largest crowdsourcing research project ever.”

After reading the post, we wanted to offer some thoughts. Rochelle has spent eight years as a Turker — a worker on Amazon’s Mechanical Turk platform (AMT). Six has spent six years as co-maintainer of Turkopticon, a review site used by Turkers to review requesters — employers — on AMT. Between the two of us — and the many workers, requesters, researchers, and others we’ve had the privilege to learn from over the years — we’ve spent some time thinking about what a future crowd work market might look like. While we can’t speak for anyone else, we do hope others will see some of their experiences and hopes reflected here. And we hope, more than anything else, to spark open, inclusive discussion and constructive debate.

In this post we assume a next generation crowd work market will inherit some of the basic ideas of AMT:

There are workers and requesters

Requesters post tasks; workers do them

Requesters set prices (it is not, for example, an auction)

Requesters review workers’ submissions and approve or reject them

Requesters can post tasks and review work algorithmically, through an API

Requesters can make qualifications and use them to screen workers for specific tasks

We don’t necessarily think this is the best model. In fact, there are very good reasons to explore other models. But there are also good reasons to talk about how to improve on this one. Workers and requesters both understand it. Collectively, the crowd work community knows a lot of its problems, and we even have ideas about how to deal with some of them. (For example, while we assume requesters may reject work, we do not assume that rejected work goes unpaid, as it does on AMT.) As Bernstein rightly wrote, AMT is “notoriously bad at ensuring high quality results, producing respect and fair wages for workers, and making it easy to author effective tasks.” But the model behind it works well enough that it’s still popular after ten years. So while we hope the Stanford project sparks a broader discussion, in this post we start with this model.
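To make the assumed model concrete, here is a minimal sketch of it in Python. All names and fields are our own illustration, not AMT’s actual API; a real market would need far more (payment, timing, dispute handling, and so on).

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative sketch of the AMT-style model described above.
# Every name here is our own invention, not AMT's API.

@dataclass
class Task:
    requester: str
    description: str
    price: float                          # requesters set prices; no auction
    required_quals: set = field(default_factory=set)

@dataclass
class Submission:
    worker: str
    task: Task
    answer: str
    approved: Optional[bool] = None       # None = not yet reviewed

class Market:
    def __init__(self):
        self.tasks: list[Task] = []
        self.submissions: list[Submission] = []
        self.worker_quals: dict[str, set] = {}

    def post_task(self, task: Task) -> None:
        self.tasks.append(task)

    def visible_tasks(self, worker: str) -> list[Task]:
        # Qualifications screen workers for specific tasks.
        quals = self.worker_quals.get(worker, set())
        return [t for t in self.tasks if t.required_quals <= quals]

    def submit(self, worker: str, task: Task, answer: str) -> Submission:
        sub = Submission(worker, task, answer)
        self.submissions.append(sub)
        return sub

    def review(self, sub: Submission, approve: bool) -> None:
        # Requesters review and approve or reject. In a fairer market,
        # rejected work might still be paid (unlike on AMT).
        sub.approved = approve
```

Even this toy version makes the asymmetry visible: requesters set prices, define qualifications, and hold the review power, while workers can only see, do, and submit tasks.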

What have we learned from Mechanical Turk and Turkopticon?

Turkers are people, and requesters and platform operators are managers

Of course, nobody argues that Turkers aren’t people — at least, not in words. But the design of AMT makes human labor available as if it were computation — a goal made clear by the platform’s cheeky but truthful tagline, “artificial artificial intelligence.” This makes it easy for requesters to forget they are managing people. For example, the researcher David Martin wrote that when he presented the paper “Being a Turker” at the 2014 Conference on Computer Supported Cooperative Work, “some people came up to [him] after to say, ‘I had never really thought before that using MTurk involved real people, and that you maybe should pay them a decent amount…’.” When requesters forget — or worse, don’t realize in the first place — that Turkers are people, and that some of them rely on Turking as a major source of income, the odds of misunderstandings and bad outcomes go up. The AMT API lets requesters post tasks and review work automatically with code. But this code is special; it is code that directly manages people. Requesters and platform operators are not just programmers; they are managers. This is the case even if the design of AMT obscures that fact, and even if investors would rather invest in “technology companies” than “labor companies.”

Most tasks on AMT are done by a relatively small group of Turkers who rely on Turking income to make ends meet

Most of these professional Turkers are highly skilled and educated, live in the United States, consistently spend dozens of hours Turking per week, care about producing good work, rely on Turking income to make ends meet, and have highly developed supportive ecosystems to make this work financially viable and profitable. Many report that they Turk because it is the best paying work currently available to them, given constraints such as disability, lack of jobs in rural areas, discrimination, and family care obligations. So some of the stories offered by researchers and employers to justify low pay — for example, that most workers who rely on crowd work income live in “developing” countries with low costs of living; that most “developed”-country workers work on AMT mainly to pass time; that the work isn’t physically difficult; or that workers freely choose to participate in crowd work and can choose other work if they find the pay too low — are misleading myths.

Professional Turkers spend a lot of unpaid time helping each other and requesters

Turkers, especially professionals, spend a great deal of unpaid time helping each other find good work and avoid bad work, teaching one another about platform processes, helping requesters improve their task designs, and offering social — and sometimes financial — support to one another. They want market transactions to produce good outcomes for workers and requesters alike. They take professional pride in doing good work and helping other members of the community, sacrifice short-term personal profit to follow norms they believe benefit everyone, punish those who do not follow them, and spend unpaid time discussing those norms with others. And they care that the rules of the market and the community’s informal norms are conducive to producing good work over the long term. That is why they invest so much unpaid time educating requesters and helping with project design, setup, and testing.

Facilitating informal communication is an extremely powerful, and underappreciated, way to improve market outcomes

Many tasks posted to AMT rely on complex workflows enabled by custom software created by the requester or by third parties. Workers often deal with technical problems requesters did not plan for. For example, there may be a problem in the HTML code of a task that prevents workers from clicking the “Submit” button. Selecting a radio button to answer a question may deselect the radio button the worker clicked on to answer the previous question. A task may direct the worker to an outside site, but the URL for the site may be broken. A worker may be directed to enter a completion code after taking a survey on an outside site, but the site may not provide a completion code. Inexperienced requesters may not realize that such problems are possible and may not respond to workers’ attempts to communicate when problems arise. Experienced workers recognize these common problems and often even study a task’s HTML code to help requesters fix them. But for this to happen, requesters must be open to communicating with workers — and to the possibility that workers are skilled, helpful, and well-intentioned. When requesters are open to this possibility, workers and requesters can often solve problems together by communicating. Open informal communication helps workers better understand the context of a requester’s tasks, leading to better work. It helps requesters better understand workers’ processes and contexts, leading to better task design, higher work quality, and fairer pay and review processes. And it improves trust in the market overall, increasing requesters’ confidence that work will be done well and workers’ confidence that work will be reviewed fairly and paid for promptly.
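The radio-button bug described above usually comes from two questions sharing the same input `name`. As an illustration of the kind of pre-launch check a requester could run (assuming, hypothetically, that each question is wrapped in a `<fieldset>`), here is a sketch using Python’s standard html.parser:

```python
from collections import defaultdict
from html.parser import HTMLParser

class TaskFormChecker(HTMLParser):
    """Toy pre-launch check for two common HIT-template bugs:
    radio groups that collide across questions (so answering one
    question deselects another) and a form with no submit control.
    Assumes, purely for illustration, that each question is a <fieldset>."""

    def __init__(self):
        super().__init__()
        self.current_question = 0
        self.name_to_questions = defaultdict(set)  # radio name -> question numbers
        self.has_submit = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "fieldset":
            self.current_question += 1
        elif tag == "input" and a.get("type") == "radio":
            self.name_to_questions[a.get("name", "")].add(self.current_question)
        elif (tag == "input" and a.get("type") == "submit") or tag == "button":
            self.has_submit = True

    def problems(self):
        probs = [f"radio name {name!r} reused across questions"
                 for name, qs in self.name_to_questions.items() if len(qs) > 1]
        if not self.has_submit:
            probs.append("no submit button found")
        return probs
```

A check like this catches only the mechanical bugs; the broken external URLs and missing completion codes described above still require a human conversation to resolve, which is exactly the point.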

There are some cheaters among workers and requesters, and the interests of professional Turkers and well-intentioned requesters are aligned against all cheaters

There are cheaters among both workers and requesters. But both professional Turkers and well-intentioned requesters care about producing good work efficiently, paying workers fairly, and keeping the market as a whole functioning well. These well-intentioned participants on both sides of the market think about the long-term consequences of their actions for other participants. Cheaters, in contrast, care only about their own short-term profits. All cheating erodes trust in the market, forcing both workers and requesters to expend more effort verifying the good intentions of others. This makes crowd work less profitable, more time-consuming, and more stressful for well-intentioned participants. Some cheaters can be reformed once they understand that cheating doesn’t pay. Others don’t care that their strategy isn’t sustainable or that they hurt someone else; they have to be disabled or kicked out.

Suggestions for a future crowd work market

In view of these lessons, we have a few suggestions for the design and operation of a future crowd work market.

Present workers as people, not computation

This does not mean that requesters should know individual workers’ first and last names, email addresses, geographical locations, educational backgrounds, or work histories — in fact, giving requesters such information might give them more power than they already have over workers. Rather, it means that the design and messaging of the platform should incentivize and prime requesters to realistically consider workers’ abilities, needs, contexts, and limitations while designing and reviewing tasks. Workers should not be asked to complete tasks that are unsafe, illegal, or unethical. Platform operators should inform requesters that some workers rely on income earned from crowd work to make ends meet, and that if requesters intend to source work from these workers they should pay accordingly. When work on AMT goes well, no communication is needed. Workers and requesters both value this arrangement, and it distinguishes AMT from online staffing markets such as oDesk in which requesters explain their projects in detail, interview multiple prospective hires, and negotiate payment. But requesters should not be primed to always expect near-instantaneous and “frictionless” market interactions, as if they were interacting with a computing system. Task instructions may be unclear. Workers may encounter hard-to-reproduce technical problems. Or workers may unexpectedly object to some part of a task. A future crowd work market should warn requesters that they may need to communicate with workers if problems arise, and give them tools with which to do so effectively, efficiently, promptly, and respectfully.

Value and support professional workers

Professional workers provide a disproportionately large amount of value in a crowd work market, complete most of the work, and are more strongly invested in its smooth functioning than casual workers. Professionals go out of their way to teach new workers and requesters how to use the platform, help requesters improve their task designs, share real-time market information, communicate when problems arise, discuss and enforce market norms (such as protecting information about attention checks and qualification screeners on worker forums and requester review sites), and design and maintain forums and custom software. These activities provide a huge amount of uncompensated value in the market. They need not necessarily be formalized or compensated, as forcibly converting these currently voluntary activities into duties could limit workers’ flexibility. But platform operators should make an effort to understand these activities and their role in the overall functioning of the market, and to support them — financially, procedurally, or technologically — as appropriate. Additionally, professional workers, because of their reliance on crowd work income and their participation in crowd worker communities, are both more able and more motivated to produce high quality work. Thus not only do professional crowd workers do most crowd work tasks, they also do most of the good work. Platform designers should develop strong relationships with professional workers, understanding and valuing their paid and unpaid work, soliciting their advice on major design and operational decisions, and taking their advice seriously.

Punish cheaters

A two-sided and somewhat complex reputation system is needed to do this robustly. The limitations of both AMT and Turkopticon have taught us that, to work well, this cannot be a “side project.” Doing reputation well presents significant technical and social challenges: it requires ongoing engagement with the community of stakeholders and ongoing adjustments to the platform design. Offering a detailed design is not our goal here, but there is a lot of work on how to do it right, and it’s far from impossible.
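To illustrate one small piece of what “two-sided” means here, the following Python sketch (our own toy construction, not a proposed design) records ratings in both directions and flags an account only after enough independent reviews accumulate, so that no single bad interaction can eject anyone:

```python
from collections import defaultdict
from statistics import mean

class ReputationLedger:
    """Toy two-sided reputation sketch: workers rate requesters and
    requesters rate workers on a 1-5 scale. The thresholds below are
    arbitrary illustrations, not recommendations."""
    MIN_REVIEWS = 5     # reviews required before flagging is possible
    FLAG_BELOW = 2.5    # mean score below which an account is flagged

    def __init__(self):
        # key is (role, account_id), where role is "worker" or "requester"
        self.reviews = defaultdict(list)

    def rate(self, role, account_id, score):
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.reviews[(role, account_id)].append(score)

    def score(self, role, account_id):
        scores = self.reviews[(role, account_id)]
        return mean(scores) if scores else None

    def flagged(self, role, account_id):
        scores = self.reviews[(role, account_id)]
        return (len(scores) >= self.MIN_REVIEWS
                and mean(scores) < self.FLAG_BELOW)
```

Even this toy version shows why reputation is not a side project: the minimum-review threshold trades off protection against unfair single reviews versus speed of catching cheaters, and choosing it well requires exactly the ongoing stakeholder engagement described above.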

Take worker and requester input seriously — even, or especially, when it runs counter to designers’ understandings

Designers should not always give themselves the final say in design and management decisions, even though they may be in a position to do so because of their technical power. Workers and requesters should have a significant say in the design and operation of the market. But not all stakeholders are equally invested in the success of the market, and not all stakeholder input can or should be acted on. Explicit and well-publicized processes should allow workers and requesters substantive input into — and, in some cases, power over — platform design and management decisions. These processes should privilege professional workers who rely on income earned through the platform, requesters whose operations rely crucially on the platform, and other stakeholders who are significantly invested in the success of the market. And they should give stakeholders ways to hold platform operators accountable for implementing changes as agreed upon in a timely manner. These means should be technically integrated into the platform where appropriate and formally integrated into organizational structure and process where appropriate. Sustaining substantive stakeholder power over decision making may require a nonstandard organizational structure — such as a B Corporation or stakeholder-owned cooperative — and nontraditional funding strategies such as crowdfunding to prevent the concentration of power in the hands of a small number of managers and funders.

Provide good documentation and technical support and pay the people who provide it

Because workers and requesters are the most motivated and capable people available to solve one another’s problems, communication between them should be encouraged and easily available. But platform operators should also offer detailed documentation and technical support as needed. Workers should not be relied on as a source of unpaid technical support for requesters. If reliable support — for both workers and requesters — requires paid staff, funding should be arranged, and the tradeoffs and potential alternatives discussed with stakeholders.

Use technology to guide, support, and amplify — not replace — human capabilities and relationships

A new crowd work market should be oriented by the understanding that work is a human activity and markets are human institutions. A future crowd work market should take seriously the central lessons of the last two decades of research in “human-centered computing”: Technology can support and guide work, but it cannot replace human capabilities or relationships. Algorithms can aggregate human inputs, but no matter how sophisticated or accurate they become, they cannot replace human beings who take responsibility for decisions that affect others. Computers can make information available, but they cannot replace human judgment or emotion. Reputation systems can store, sort, and distribute accounts of people’s experiences and opinions, but they cannot build or create trust. If participants in a market do not trust each other, a reputation system can only make this clear; it cannot replace the long and difficult work of talking, understanding, misunderstanding, and reconciling out of which human trust is built. And when workers can’t trust requesters to pay them fairly, every task becomes a risk of wasted time. In this situation, workers may feel pressured to work as fast as possible, leading to lower quality work. And when requesters can’t trust workers to do good work, they must spend time, money, or both building sophisticated quality control systems and paying multiple workers to do the same work. A next-generation crowd work market will help well-intentioned workers do what they already want to do — get paid fairly for doing good work. The problems with both AMT and Turkopticon show that this is much easier said than done.

What now?

The design of a next-generation crowd work market should not be “crowdsourced” — because in crowdsourcing, the requester orchestrates the process and gets the big reward — money, credit, or both. In paid crowdsourcing arrangements, workers are “incentivized” to give their “input” with pay (or the chance to get paid). In Bernstein and colleagues’ crowdsourced research project, the incentives are the chance to offer input into the design of a new technology, to associate with a prestigious university, and to get one’s name on the author list of an academic paper. A few members of the “crowd” will likely benefit enormously from participating in this process. But as with most spec work — work done for free in hopes of future compensation — an inclusive but centrally-controlled process for crowdsourced research will probably end with the requesters getting the big rewards.

In the spirit of resisting this dynamic, we call on researchers to participate as equal, if expert, contributors to a collective, open-ended conversation about the future of crowd work — and to acknowledge the different but very real expertise of deeply invested crowd workers and requesters. Worker opinion especially should be solicited where workers already live and work — mainly on the various crowd worker forums. In this process, workers should be treated as equal, deeply invested, expert partners in building a better future of crowd work — not slotted into a process managed by a small team of researchers, designers, and programmers. Workers’ knowledge, earned over years of participating in the grueling daily grind of crowd work, is unique and offers a perspective not directly available to people who are not themselves experienced workers.

Bernstein and colleagues have struck the spark: it is time now to talk seriously and openly about what a new crowd work market could, and should, look like. We do not intend to orchestrate, facilitate, or even guide this discussion. We hope to contribute to it. And we hope very much that it will be wide-ranging and lively, and that professional workers’ and requesters’ voices will be heard just as loudly as those of researchers, designers, and programmers.

Addendum, February 20, 2015

Bernstein has explicitly invited workers to join the project.

The authors thank Lilly Irani, co-founder and -maintainer of Turkopticon, and Kristy “Spamgirl” Milland, professional Turker and community manager of Turker Nation, for their support and extremely helpful comments on this post. Some text in this post is based on Silberman’s dissertation-in-progress, Human-Centered Computing and the Future of Work: Lessons from Mechanical Turk and Turkopticon, 2008–2015. The opinions presented here are those of the authors as individuals. They do not represent the official positions of any organizations such as Turkopticon, Electrolyte Enterprises, Turker Nation, Mturkgrind, or the University of California, or the views of any particular group of workers.