By Leib Litman, PhD, Cheskie Rosenzweig, MS, & Aaron Moss, PhD
Updated 9/12/22 to reflect changes to the MTurk Toolkit
Amazon Mechanical Turk (MTurk) is a microtask platform that has been used for social science research for nearly a decade. During this time, thousands of peer-reviewed papers have been published supporting MTurk’s status as a source of high-quality data (e.g., Hauser & Schwarz, 2016; Litman & Robinson, 2020; Litman et al., 2015). In the last year or two, however, more participants appear to be providing low-quality data for reasons that include inattention, a lack of language comprehension, low effort, and fraudulent responses from people outside the U.S. who misrepresent their location to access studies intended for people within the U.S. (e.g., Chandler et al., 2020; Kennedy et al., 2018). These problems have produced a growing number of data quality concerns, including lower pass rates on attention checks and a “bot” scare that has left researchers questioning the validity of their data.
CloudResearch has taken an aggressive approach to these problems. We are introducing a new solution that substantially mitigates data quality problems on MTurk.
CloudResearch extensively vets the entire Mechanical Turk population, using a variety of approaches to assess data quality, participant attention, engagement, and English comprehension. These methods include administering questions from a library of hundreds of pre-tested data quality checks, developed using patented technology. Our methods draw from our previously published work (e.g., Chandler et al., 2019), as well as ongoing research efforts. In addition, we created tasks designed to identify fraudulent actors who are willing to lie about their personal information. So far, we have vetted approximately 80% of all active U.S. MTurk workers, and we will continue to expand the list of vetted MTurkers as new people join the platform. This large-scale vetting has led to a solution that we now offer as part of our MTurk Toolkit.
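The vetting itself happens behind the scenes, and the specific checks are proprietary. Still, the general idea of scoring attention-check items can be illustrated with a short, hypothetical sketch. None of the items, field names, or thresholds below are CloudResearch’s actual checks; they simply show how a researcher might flag respondents who fail instructed-response questions.

```python
# Illustrative only: a toy attention-check scorer. The items, expected
# answers, and passing threshold are hypothetical, not CloudResearch's
# actual (proprietary) data quality checks.
ATTENTION_ITEMS = {
    "ac_color": "blue",      # "To show you are reading, select 'blue'."
    "ac_word": "seven",      # "Type the word 'seven' in the box below."
    "ac_agree": "disagree",  # Instructed-response item embedded in a scale.
}

PASS_THRESHOLD = 2  # require at least 2 of the 3 items answered correctly


def passes_attention_checks(response: dict) -> bool:
    """Return True if a participant's responses pass enough attention checks."""
    n_correct = sum(
        str(response.get(item, "")).strip().lower() == answer
        for item, answer in ATTENTION_ITEMS.items()
    )
    return n_correct >= PASS_THRESHOLD


# Example: the second (hypothetical) respondent fails every check.
responses = [
    {"worker_id": "A1", "ac_color": "Blue", "ac_word": "seven", "ac_agree": "agree"},
    {"worker_id": "A2", "ac_color": "red", "ac_word": "7", "ac_agree": "agree"},
]
flagged = [r["worker_id"] for r in responses if not passes_attention_checks(r)]
print(flagged)  # ['A2']
```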
If you want to be sure that everyone who participates in your study has shown prior evidence of attention and engagement, you can use our “CloudResearch-Approved Participants” feature. This feature allows only vetted workers to take HITs.
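When you use the Toolkit, this restriction is applied for you as part of study setup. For readers who script HITs directly against the MTurk API, the sketch below shows the general mechanism of limiting a HIT to workers who hold a particular qualification, using boto3. The qualification type ID is a placeholder, and this is illustrative only; it is not necessarily how the Approved Participants feature is implemented.

```python
# A minimal sketch of restricting a HIT to workers who hold a specific
# MTurk qualification, using boto3. The qualification type ID below is a
# placeholder; this is not necessarily how CloudResearch implements the
# Approved Participants feature.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

APPROVED_QUAL_ID = "3EXAMPLEQUALTYPEID"  # hypothetical qualification type ID

hit = mturk.create_hit(
    Title="Short academic survey",
    Description="A 10-minute survey on decision making.",
    Keywords="survey, research",
    Reward="2.00",
    MaxAssignments=100,
    AssignmentDurationInSeconds=3600,
    LifetimeInSeconds=86400,
    Question=open("external_question.xml").read(),  # ExternalQuestion XML
    QualificationRequirements=[
        {
            "QualificationTypeId": APPROVED_QUAL_ID,
            "Comparator": "Exists",
            # Hide the HIT entirely from workers who lack the qualification.
            "ActionsGuarded": "DiscoverPreviewAndAccept",
        }
    ],
)
print(hit["HIT"]["HITId"])
```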
Approved Participants span a range of MTurk experience levels, ages, races, and incomes, and they are generally demographically similar to other MTurk workers. Because our vetting tools are stringent, the pool of people available through this feature is smaller than the MTurk population as a whole, but it provides much higher quality data.
For evidence of the Approved List’s effectiveness, see this peer-reviewed publication and this preprint.
We also continue to offer other tools we developed in the past that remain useful for minimizing low-quality data. These tools allow researchers to block workers based on suspicious geolocations or duplicate IP addresses, and they work well in combination with our new solutions.
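These blocks are applied before workers enter a study, but the same logic can also be applied after the fact to data you have already collected. The sketch below uses hypothetical column names as placeholders for whatever location metadata your survey platform exports, and flags submissions that share an IP address or report an unexpected location.

```python
# Illustrative post-hoc screening for duplicate IP addresses and unexpected
# locations. The column names ("ip_address", "geo_state") are hypothetical
# placeholders for whatever location metadata your survey platform exports.
import pandas as pd

df = pd.DataFrame(
    {
        "worker_id": ["A1", "A2", "A3", "A4"],
        "ip_address": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"],
        "geo_state": ["NY", "CA", "NY", ""],
    }
)

# Flag every submission whose IP address appears more than once.
df["duplicate_ip"] = df.duplicated(subset="ip_address", keep=False)

# Flag submissions with missing or unexpected location data.
EXPECTED_STATES = {"NY", "CA"}  # in practice, all 50 states plus DC
df["suspicious_location"] = ~df["geo_state"].isin(EXPECTED_STATES)

flagged = df[df["duplicate_ip"] | df["suspicious_location"]]
print(flagged[["worker_id", "duplicate_ip", "suspicious_location"]])
```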
Our data quality solutions will protect your studies from most of the issues researchers have encountered recently on MTurk. But, as with any sampling method, there are best practices researchers should follow to ensure they get good data; ignoring them will likely compromise data quality even when our features are in use. Please take a look at Chapter 11 of our recently published book for further discussion of these best practices in the context of the ethical use of MTurk, or check out this blog. In general, the following elements of study setup can affect the quality of the data you get from MTurk:
When you launch a study using our data quality solutions, not every response from every participant will be perfect (people are human), but you should not see responses that make you wonder whether a participant is actually a person. You should not see a large percentage of workers providing low-quality data, and you can expect even better data quality from people on the CloudResearch Approved List. Because these are new solutions, we anticipate that many users will have questions and suggestions for improvement as they begin using these tools. Please let us know how things are going by leaving a note in our suggestion box, or feel free to reach out to us at any time. We are glad to offer you our new solutions, and we will continue monitoring participant attention and engagement.
Chandler, J., Paolacci, G., & Hauser, D. J. (2020). Data quality issues on MTurk. In L. Litman & J. Robinson (Eds.), Conducting Online Research on Amazon Mechanical Turk and Beyond (pp. 95-120). Thousand Oaks, CA: Sage Publications.
Chandler, J., Rosenzweig, C., Moss, A. J., Robinson, J., & Litman, L. (2019). Online panels in social science research: Expanding sampling methods beyond Mechanical Turk. Behavior Research Methods, 51(5), 2022-2038.
Hauser, D. J., & Schwarz, N. (2016). Attentive Turkers: MTurk participants perform better on online attention checks than do subject pool participants. Behavior Research Methods, 48(1), 400-407.
Kennedy, R., Clifford, S., Burleigh, T., Waggoner, P. D., Jewell, R., & Winter, N. J. (2018). The shape of and solutions to the MTurk quality crisis. Political Science Research and Methods, 1-16.
Litman, L., & Robinson, J. (2020). Conducting Online Research on Amazon Mechanical Turk and Beyond (Vol. 1). Thousand Oaks, CA: Sage Publications.
Litman, L., Robinson, J., & Rosenzweig, C. (2015). The relationship between motivation, monetary compensation, and data quality among US- and India-based workers on Mechanical Turk. Behavior Research Methods, 47(2), 519-528.