Doing Important Research on Amazon's Mechanical Turk?

There seem to be many important questions that need research, from the mundane (say, which of four slogans for 80,000 Hours people like best) to the interesting (say, how to convince people to donate more than they otherwise would). Unfortunately, it’s difficult to collect data in a quick, reliable, and affordable way. We generally lack access to easily surveyed populations, and a lot of research has high barriers to entry (such as needing to enroll in graduate school).

However, since the 2005 creation of Amazon’s Mechanical Turk, some of this has changed. Mechanical Turk is a website where anyone can create tasks for people to complete at a certain wage. These tasks can be anything, from identifying pictures to transcribing interviews to social science research.

Best of all, this is quick and cheap – for example, you could offer $0.25 to complete a short survey, put $75 in a pot, and get 300 responses within a day or two, and this should be quicker and cheaper than any other option available to you for collecting data.
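As a rough sketch of that budgeting arithmetic (the function name is made up for illustration, and it ignores any platform fees, which aren't covered here):

```python
def responses_for_budget(budget_usd: float, pay_per_response_usd: float) -> int:
    """Number of whole responses a budget can buy at a fixed pay rate.

    Hypothetical helper: assumes the only cost is the payment per response.
    """
    # Work in whole cents so that 0.25 and similar amounts divide cleanly.
    budget_cents = round(budget_usd * 100)
    pay_cents = round(pay_per_response_usd * 100)
    if pay_cents <= 0:
        raise ValueError("pay per response must be at least one cent")
    return budget_cents // pay_cents


# The example from the text: $75 in the pot at $0.25 per survey.
print(responses_for_budget(75, 0.25))  # 300
```

Swapping in a different pay rate or pot size gives a quick sense of how large a sample a given budget buys.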

But could Mechanical Turk actually be useable for answering important questions? Could running studies on Mechanical Turk be a competitive use of altruistic funds?

What Questions Would We Be Interested in Asking?

There are a variety of questions we might be interested in that would be appropriate to ask via Mechanical Turk. I don’t believe you could run a longitudinal study there, so testing the effects of vegetarian ads on diets in a useful way wouldn’t be possible. But less conclusive diet studies could still be run in this area.

Additionally, we could test to see how people respond to various marketing materials in EA space. We could explore how people think about charity and see what would make them more likely to donate and how changes in the marketing affect a willingness to donate. We could find out which arguments are more compelling. We could even test various memes against each other and see what people think of them.

Is Mechanical Turk a Reliable Source of Data?

It would only be good to use MTurk if the data you could get is useful. But is it?

Diversity of the Sample

The first question we might ask is whether MTurk produces a sample that is sufficiently diverse and representative of the United States. Unfortunately, this isn’t always the case. In “Problems With Mechanical Turk Study Samples”, Dan Kahan noted that women can be overrepresented (as high as 62%), African Americans are underrepresented (5% in MTurk compared to 12% in the US), and conservatives are very underrepresented (53% liberal / 25% conservative in MTurk vs. 20% liberal / 40% conservative in real life).

MTurkers are also more likely to vote, and to vote for Obama. More concerning, Kahan also found that respondents lie about their prior exposure to measures and even about whether they’re US citizens. Additionally, repeated exposure to standard survey questions can bias responses.

But is this really a problem? First, MTurk samples are still more diverse and representative than college student samples or other surveys conducted over the internet. Second, many important questions are about items that we wouldn’t expect to be influenced by demographics. So it’s quite possible that MTurk is the best of the available sources by enough to make it worth using.

Wage Sensitivity

Do you have to pay more for higher quality data? Possibly not. Buhrmester, Kwang, and Gosling and another analysis both found that even varying pay between 2 cents and 50 cents didn’t affect the quality of data received on psychological studies. Higher pay does, however, attract more participants and attract them faster.

Is Mechanical Turk a Competitive Use of Altruistic Funds?

It depends on the question being asked, how reliable the findings are, and how they’d be put into use.

Even though I don’t think MTurk could be used for veg flyers very well, it’s the best example I can think of right now: imagine that the current flyer converts 1% of people who read it to consider vegetarianism, but a different flyer might convert 1.05% of people. This means that every donation to veg ads now has approximately a 1.05x multiplier attached to it, because we can use the better flyer. If the MTurk study/studies to find this cost $1K, we would break even on this after distributing 100K flyers at 20 cents a flyer. I don’t know how many flyers are given out a year, so that may or may not be impressive, but the numbers are made up anyway.
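The break-even figure can be checked with a short sketch. The function and parameter names are invented for illustration, and it assumes the value of a flyer scales directly with its conversion rate, which is the simplification the example above already makes:

```python
def break_even_flyers(study_cost_usd: float,
                      cost_per_flyer_usd: float,
                      old_rate: float,
                      new_rate: float) -> float:
    """Flyers to distribute before the better flyer repays the study cost.

    Hypothetical helper: treats each dollar spent on the improved flyer as
    worth (new_rate / old_rate) dollars spent on the old one.
    """
    if new_rate <= old_rate:
        raise ValueError("the new flyer must convert better than the old one")
    multiplier = new_rate / old_rate        # about 1.05 in the example
    gain_per_dollar = multiplier - 1.0      # about $0.05 extra value per $1
    spend_needed = study_cost_usd / gain_per_dollar  # about $20K of flyers
    return spend_needed / cost_per_flyer_usd


# Made-up numbers from the text: $1K study, 20-cent flyers, 1% -> 1.05%.
flyers = break_even_flyers(1000, 0.20, 0.01, 0.0105)
print(round(flyers))  # roughly 100,000 flyers
```

In other words, $20K of flyer spending at a 5% improvement yields $1K of extra value, which matches the cost of the study.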

The bottom line is that studies may have strong compounding effects, which will almost always beat out the relatively linear increase in impact from a donation to something like AMF. But the chances that MTurk will produce something useful might be small. Likewise, it’s possible that there are better settings for running these tests (like split testing current materials as they are being distributed, doing longer-range and more reliable tracking of impact, etc.). Still, MTurk could be an interesting way to supplement existing research quickly and cheaply.

I think it’s worth thinking about further, even if I wouldn’t act on it yet.


(Also cross-posted on LessWrong.)