Strengths and Limitations of Mechanical Turk

Leib Litman, PhD

Hundreds of academic papers are published each year using data collected through Mechanical Turk. Researchers have gravitated to the platform primarily because it provides high-quality data quickly and affordably. But while Mechanical Turk has revolutionized data collection, it is by no means a perfect platform. Its major strengths and limitations are summarized below.


Strengths

A source of quick and affordable data

Thousands of participants are looking for tasks on Mechanical Turk throughout the day and can take your task with the click of a button. You can run a 10-minute survey with 100 participants at $1 each and have all your data within the hour.
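
To make the arithmetic concrete, here is a minimal cost sketch in Python. Note that MTurk adds a platform fee on top of the reward paid to each participant; the 20% rate used below is an assumption for illustration, since the actual rate depends on how the study is configured.

```python
# Back-of-the-envelope cost estimate for an MTurk study.
# ASSUMPTION: a flat 20% platform fee is used purely for illustration;
# the actual MTurk fee depends on the study's configuration.
def estimate_study_cost(participants: int, reward: float, fee_rate: float = 0.20) -> float:
    """Total cost: rewards paid to workers plus the platform fee."""
    rewards = participants * reward
    return rewards * (1 + fee_rate)

# The example above: 100 participants at $1.00 each.
print(f"${estimate_study_cost(100, 1.00):.2f}")  # -> $120.00 under the assumed fee
```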

Data are reliable

Researchers have examined data quality on MTurk and have found that, by and large, data are reliable, with participants performing tasks in ways similar to participants from more traditional samples. MTurk also has a useful reputation mechanism: researchers can approve or reject each worker's performance on a given study, and the reputation of each worker is based on how often their work has been approved or rejected. A standard practice is to collect data only from workers who have at least a 95% approval rating, further ensuring high-quality data collection.
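
For researchers who script their data collection, this filter is expressed as a QualificationRequirement when the HIT is created. Below is a minimal sketch using boto3's MTurk client; it assumes AWS credentials are configured, and the title, reward, and survey URL are placeholders for your own study. The qualification type ID is Amazon's built-in identifier for a Worker's percentage of approved assignments.

```python
import boto3

# ASSUMPTIONS: AWS credentials are configured; title, reward, and the
# survey URL below are placeholders for your own study.
mturk = boto3.client("mturk", region_name="us-east-1")

question_xml = """<ExternalQuestion
  xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/my-survey</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="10-minute research survey",
    Description="Answer a short academic survey.",
    Reward="1.00",                        # in US dollars, as a string
    MaxAssignments=100,
    AssignmentDurationInSeconds=30 * 60,
    LifetimeInSeconds=24 * 60 * 60,
    Question=question_xml,
    QualificationRequirements=[{
        # Amazon's system qualification: percent of past assignments approved.
        "QualificationTypeId": "000000000000000000L0",
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [95],
        # Hide the HIT from workers who do not meet the bar.
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    }],
)
print(hit["HIT"]["HITId"])
```

With ActionsGuarded set to DiscoverPreviewAndAccept, workers below the 95% threshold will not even see the HIT in their search results.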

Participant pool is more representative than traditional subject pools

Traditional subject pools used in social science research are often samples that are convenient for researchers to obtain, such as undergraduates at a local university. Mechanical Turk has been shown to be more diverse, with participants who are closer to the U.S. population in terms of gender, age, race, education, and employment.


Limitations

There are two kinds of potential limitations on MTurk: technical limitations and more fundamental limitations of the platform itself. Many of the technical limitations have been resolved by scripts written by researchers or by platforms such as CloudResearch, which help researchers do things they previously could not do on MTurk, including:

  • Exclude participants from a study based on participation in a previous study (see the sketch after this list)
  • Conduct longitudinal research
  • Make sure larger studies do not stall out after the first 500 to 1000 Workers
  • Communicate with many Workers at a time
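
As a sketch of how the first item on this list is typically scripted, a common pattern is to tag everyone who took an earlier study with a custom qualification and then require that the qualification be absent on the new HIT. The sketch below uses boto3; the qualification name and worker IDs are placeholders.

```python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# 1. Create a custom qualification that marks past participants.
#    (The name and worker IDs below are placeholders.)
qual = mturk.create_qualification_type(
    Name="Completed-Study-1",
    Description="Worker already participated in Study 1.",
    QualificationTypeStatus="Active",
)
qual_id = qual["QualificationType"]["QualificationTypeId"]

# 2. Assign it to everyone who took the earlier study.
for worker_id in ["A1EXAMPLE", "A2EXAMPLE"]:
    mturk.associate_qualification_with_worker(
        QualificationTypeId=qual_id,
        WorkerId=worker_id,
        IntegerValue=1,
        SendNotification=False,
    )

# 3. When creating the follow-up HIT, require that the
#    qualification does NOT exist for the worker.
exclusion_requirement = {
    "QualificationTypeId": qual_id,
    "Comparator": "DoesNotExist",
    "ActionsGuarded": "DiscoverPreviewAndAccept",
}
# Pass [exclusion_requirement] as QualificationRequirements to create_hit().
```

The inverse supports longitudinal research: using Comparator "Exists" restricts a follow-up wave to workers who completed the first wave.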

There are, however, several more fundamental limitations to data collection on MTurk:

Small population

There are about 100,000 Mechanical Turk workers who participate in academic studies each year. In any one month, about 25,000 unique workers participate in online studies, completing close to 600,000 assignments among them, and the more active workers complete hundreds of studies each month. The natural consequence of this small population is that participants are continuously recycled across research labs. This creates a problem of 'non-naivete': most participants on Mechanical Turk have been exposed to common experimental manipulations, and this exposure can affect their performance.

Although the effects of this exposure have not been fully examined, recent research indicates that it may be weakening the effect sizes of experimental manipulations, compromising data quality.

Diversity

Although Mechanical Turk workers are significantly more diverse than undergraduate subject pools, the Mechanical Turk population is significantly less diverse than the general US population: it is less politically diverse, more highly educated, younger, and less religious. This can complicate efforts to generalize findings to the population level.

Limited selective recruitment

Mechanical Turk has basic mechanisms for selectively recruiting workers who have already been profiled. To build these profiles, Mechanical Turk runs profiling HITs that are continuously available to workers. However, the platform is structured in such a way that recruiting people based on characteristics that have not been profiled is much more difficult. So while rudimentary selective recruitment mechanisms exist, there are significant limitations on the ability to recruit specific segments of workers.
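
For characteristics that Amazon does profile, the targeting again goes through QualificationRequirements. For example, the built-in Locale qualification restricts a HIT to workers in a given country; a minimal sketch, assuming the same boto3 setup as the earlier examples:

```python
# Built-in locale qualification: recruit only US-based workers.
us_only = {
    "QualificationTypeId": "00000000000000000071",  # Amazon's system Locale qualification
    "Comparator": "EqualTo",
    "LocaleValues": [{"Country": "US"}],
}
# Pass [us_only] in QualificationRequirements when calling create_hit().
# Characteristics MTurk has NOT profiled have no such qualification,
# which is the limitation described above.
```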


Solutions

CloudResearch offers researchers more specific selective recruitment options and has features in development to help researchers target less active participants, who are therefore more naive to common experimental manipulations and survey measures. CloudResearch also offers access to Prime Panels, which provides a pool of more than 50 million participants who can be selectively recruited and who are more diverse.


References

Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46(1), 112-130.

Goodman, J. K., Cryder, C. E., & Cheema, A. (2013). Data collection in a flat world: The strengths and weaknesses of Mechanical Turk samples. Journal of Behavioral Decision Making, 26(3), 213-224.

Litman, L., Robinson, J., & Abberbock, T. (2017). TurkPrime.com: A versatile crowdsourcing data acquisition platform for the behavioral sciences. Behavior Research Methods, 49(2), 433-442.

Litman, L., Robinson, J., & Rosenzweig, C. (2015). The relationship between motivation, monetary compensation, and data quality among US- and India-based workers on Mechanical Turk. Behavior Research Methods, 47(2), 519-528.

Paolacci, G., & Chandler, J. (2014). Inside the Turk: Understanding Mechanical Turk as a participant pool. Current Directions in Psychological Science, 23(3), 184-188.

Peer, E., Vosgerau, J., & Acquisti, A. (2014). Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behavior Research Methods, 46(4), 1023-1031.
