As you can see, the instructions are clearer and more specific about what the worker should be looking for. That gave us better results on image quality; as a nice additional benefit, this granular data enables us to provide feedback to restaurants on how a food image could be improved.

Task parameter tuning

The MTurk requester UI provides an easy way to alter task parameters such as the task name, the description, tags that improve the searchability of the task, the number of unique workers who should perform each task, and worker qualifications and locations that help reduce human error. We ran experiments on a hand-curated, labelled dataset of images to tune these parameters for our task and get the best possible results.
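
The same parameters can also be set programmatically. The following is a minimal sketch, not our production code, of how they might map onto the keyword arguments of the `create_hit` call in boto3's MTurk client. The title, description, reward, durations, thresholds, and the `image_url` layout parameter name are all illustrative assumptions. The two qualification type IDs are, to the best of our understanding, the MTurk system qualifications for worker locale and approval rate; verify them against the MTurk documentation before use.

```python
def build_hit_params(image_url, layout_id, reward_usd="0.05", workers_per_image=3):
    """Assemble keyword arguments for boto3's MTurk create_hit call.

    All concrete values below are illustrative assumptions.
    """
    return {
        "Title": "Rate the quality of a food photo",
        "Description": "Look at one food photo and rate its quality.",
        "Keywords": "image, photo, food, rating",   # tags that aid search
        "Reward": reward_usd,                       # the API expects a string
        "MaxAssignments": workers_per_image,        # unique workers per task
        "LifetimeInSeconds": 3 * 24 * 3600,         # how long the task stays listed
        "AssignmentDurationInSeconds": 10 * 60,     # time allowed per worker
        "HITLayoutId": layout_id,                   # layout created in the requester UI
        "HITLayoutParameters": [
            {"Name": "image_url", "Value": image_url},  # hypothetical parameter name
        ],
        "QualificationRequirements": [
            {   # assumed system qualification: worker locale
                "QualificationTypeId": "00000000000000000071",
                "Comparator": "EqualTo",
                "LocaleValues": [{"Country": "US"}],
            },
            {   # assumed system qualification: percent of assignments approved
                "QualificationTypeId": "000000000000000000L0",
                "Comparator": "GreaterThanOrEqualTo",
                "IntegerValues": [95],
            },
        ],
    }


# Usage (requires boto3 and AWS credentials, so it is left as a comment):
#   import boto3
#   client = boto3.client("mturk")
#   client.create_hit(**build_hit_params(url, "YOUR_LAYOUT_ID"))
```

Keeping the parameters in one function makes it easy to vary a single value, such as the reward or the number of workers, between experiments.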

Studying similar MTurk tasks helped us agree on fair compensation for the workers. MTurk also lets requesters restrict their tasks to workers with certain expertise, but this smaller pool of workers can lead to longer turnaround times and higher prices.

The following aspects can also have an effect on the task turnaround time:

The number of workers available: The number of workers available to complete a task reduces when we specify qualifications and region filters. Worker availability also changes according to the willingness of workers to take up the task based on the wage provided.

The complexity of the task: Some tasks like address verification, outlining and tagging of objects in pictures, etc. are complex tasks and increase the average time a worker will spend on each task.

User interface provided to the workers: A complicated user interface without clear instructions or one with a faulty user experience will reduce the number of workers willing to take up the task, increasing turnaround time for the task.

Searchability of the task on the worker MTurk application: The newest tasks show up at the top of the search results for workers. Hence those tasks will have more workers taking up the task, reducing their turnaround time. We therefore recommend that you re-create incomplete tasks well before they expire as that will improve their search visibility. Adding appropriate tags to the task helps with better search result placement.

Guarding against inaccurate results

While MTurk is an easy-to-use and reliable tool for crowdsourcing data, it is worth noting that every worker will have a different perspective on the task you provide. It is almost certain that you will have instances where the same task yielded different answers from multiple workers. It’s also possible that some workers, while trying to maximize the number of tasks they work on, do not pay attention to the quality of their work.

To tackle these issues, we collected rating answers from multiple workers for the same image and used an average of scores from all the workers as the quality rating for an image to reduce the effect of anomalies. MTurk also lets you obtain anonymized data on every worker who finished the tasks, which can then be used to weed out workers submitting unsatisfactory results repeatedly.
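
The averaging and worker-filtering ideas above can be sketched as follows. This is a minimal illustration under stated assumptions, not our exact implementation: it assumes each answer has already been converted to a numeric score, and the deviation threshold, the minimum answer count, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_ratings(answers):
    """Return the mean score per image.

    answers: iterable of (image_id, worker_id, score) tuples.
    """
    by_image = defaultdict(list)
    for image_id, _worker_id, score in answers:
        by_image[image_id].append(score)
    return {image_id: mean(scores) for image_id, scores in by_image.items()}


def flag_workers(answers, max_mean_deviation=1.5, min_answers=3):
    """Return workers whose scores sit far from the per-image average.

    A worker is flagged only after at least `min_answers` answers, so a
    single disagreement does not count against anyone.
    """
    answers = list(answers)
    averages = average_ratings(answers)
    deviations = defaultdict(list)
    for image_id, worker_id, score in answers:
        deviations[worker_id].append(abs(score - averages[image_id]))
    return {
        worker_id
        for worker_id, devs in deviations.items()
        if len(devs) >= min_answers and mean(devs) > max_mean_deviation
    }
```

Note that an outlier also pulls the per-image average toward itself, so with only a few workers per image the threshold needs to be chosen with care.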

Automating the rating calculation process

The task layout was set up through the MTurk Requester user interface, but we automated the rest of the quality rating calculation process through daily cron jobs running on the Grubhub infrastructure, as follows:

Data collection: The data collector job runs once every day to aggregate all image assets uploaded onto the Grubhub platform the previous day and writes a dated file to an Amazon S3 bucket.

Task creation: The HIT creation cron job reads this file, programmatically creates tasks for MTurk workers using the MTurk API and records the metadata returned by the MTurk service in an Apache Cassandra database.

Rating calculation: Independent of both these jobs, this job runs the rating calculation process that fetches previously completed tasks from MTurk, calculates a rating score for the image assets associated with those tasks, and persists the score. It also fetches any expired tasks and re-creates them.
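
Two small pieces of this pipeline can be sketched in a few lines: naming the dated file that links the first two jobs, and deciding which recorded tasks the rating job should score or re-create. This is a simplified sketch with in-memory stand-ins; the key format, the `TaskRecord` fields, and the function names are hypothetical and do not reflect our actual S3 layout or Cassandra schema.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone


def dated_key(day, prefix="image-assets"):
    """Build the object key for one day's file (hypothetical naming scheme)."""
    return f"{prefix}/{day.isoformat()}.csv"


def previous_day_key(today, prefix="image-assets"):
    """Key of the file covering images uploaded the day before `today`."""
    return dated_key(today - timedelta(days=1), prefix)


@dataclass
class TaskRecord:
    """Metadata stored for each created task (hypothetical fields)."""
    hit_id: str
    image_id: str
    expires_at: datetime
    completed: bool


def split_tasks(records, now):
    """Separate tasks ready for scoring from expired ones to re-create."""
    to_rate = [r for r in records if r.completed]
    to_recreate = [r for r in records if not r.completed and r.expires_at <= now]
    return to_rate, to_recreate


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [
        TaskRecord("hit-1", "img-1", now + timedelta(days=1), True),
        TaskRecord("hit-2", "img-2", now - timedelta(hours=1), False),
    ]
    done, expired = split_tasks(records, now)
    print([r.hit_id for r in done], [r.hit_id for r in expired])
```

Because the jobs communicate only through the dated file and the stored task metadata, each one can fail and be re-run without affecting the others.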

Architectural overview.

Conclusion

Our early results from MTurk are promising. Several hundred thousand image assets have been rated on all three facets. We found that workers did a great job of identifying photos with a “great” or “bad” quality rating, but the accuracy of results for photos with a “good” quality rating can still be improved. Examples of the ratings received can be found below:

Some examples of ratings received.

Using MTurk to crowdsource data can be difficult to get right the first time, but some experimentation, as seen in this blog post, can yield satisfactory results. Our future steps include using the quality rating scores to understand possible correlations between the quality of food images on a restaurant menu page on Grubhub and their performance on our delivery platform, encouraging restaurants to replace “bad” quality images on our platform to possibly enhance their business, and exploring training an AI model on the data collected from MTurk to classify images based on their quality.