Story 1: (Internet) experiments in industryInternet experiments have now become a major activity in giant companies such as Amazon, Google, and Microsoft, in smaller web-based companies, and among academic researchers in management and the social sciences. The buzzword "A/B Testing" refers to the most common and simplest design which includes two groups (A and B), where subjects -- typically users -- are assigned at random to group A or B, and an effect of interest is measured. A/B tests are used for testing anything from the effect of a new website feature on engagement to the effect of a new language translation algorithm on user satisfaction. Companies run lots of experiments all the time. With a large and active user-base, you can run an internet experiment very quickly and quite cheaply. Academic researchers are now also beginning to use large scale randomized experiments to test scientific hypotheses about social and human behavior (as we did in One-Way Mirrors in Online Dating: A Randomized Field Experiment).
Based on our experience in this domain and on what I learned from colleagues and past-students working in such environments, there are multiple critical issues challenging the ability to draw valid conclusions from internet experiments. Here are three:
- Contaminated data: Companies constantly conduct online experiments introducing interventions of different types (such as running various promotions, changing website features, and switching underlying technologies). The result is that we never have "clean data" to run an experiment, and we don't know how they are dirty. The data are always somewhat contaminated by other experiments that are taking place in parallel, and in many cases we do not even know which or when such experiments have taken place.
- Spill-over effects: in a randomized experiment we assume that each observation/user experiences only one treatment (or control). However, in experiments that involve an intervention such as knowledge sharing (eg, the treatment group receives information about a new service while the control group does not), the treatment might "spill over" to control group members through social networks, online forums, and other information-sharing platforms that are now common. For example, many researchers use Amazon Mechanical Turk to conduct experiments, where, as DynamoWiki describes, "workers" (the experiment subjects) share information, establish norms, and build community through platforms like CloudMeBaby, MTurk Crowd, mTurk Forum, mTurk Grind, Reddit's /r/mturk and /r/HITsWorthTurkingFor, Turker Nation, and Turkopticon. This means that the control group can be "contaminated" by the treatment effect.
- Gift effect: Treatments that benefit the treated subjects in some way (such as a special promotion or advanced feature) can confuse the effect of the treatment with the effect of receiving a special treatment. In other words, the difference between the outcome for the treatment and control groups might be not due to the treatment per-se but rather due to the "special attention" the treatment group received by the company or researcher.