BzST | Business Analytics, Statistics, Teaching: behavioral big data

Showing posts with label behavioral big data. Show all posts

Thursday, December 22, 2016

Key challenges in online experiments: where are the statisticians?

Randomized experiments (or randomized controlled trials, RCT) are a powerful tool for testing causal relationships. Their main principle is random assignment, where subjects or items are assigned randomly to one of the experimental conditions. A classic example is a clinical trial with one or more treatment groups and a no-treatment (control) group, where individuals are assigned at random to one of these groups.

Story 1: (Internet) experiments in industry

Internet experiments have now become a major activity in giant companies such as Amazon, Google, and Microsoft, in smaller web-based companies, and among academic researchers in management and the social sciences. The buzzword "A/B Testing" refers to the most common and simplest design which includes two groups (A and B), where subjects -- typically users -- are assigned at random to group A or B, and an effect of interest is measured. A/B tests are used for testing anything from the effect of a new website feature on engagement to the effect of a new language translation algorithm on user satisfaction. Companies run lots of experiments all the time. With a large and active user-base, you can run an internet experiment very quickly and quite cheaply. Academic researchers are now also beginning to use large scale randomized experiments to test scientific hypotheses about social and human behavior (as we did in One-Way Mirrors in Online Dating: A Randomized Field Experiment).

Based on our experience in this domain and on what I learned from colleagues and past-students working in such environments, there are multiple critical issues challenging the ability to draw valid conclusions from internet experiments. Here are three:

Contaminated data: Companies constantly conduct online experiments introducing interventions of different types (such as running various promotions, changing website features, and switching underlying technologies). The result is that we never have "clean data" to run an experiment, and we don't know how they are dirty. The data are always somewhat contaminated by other experiments that are taking place in parallel, and in many cases we do not even know which or when such experiments have taken place.
Spill-over effects: in a randomized experiment we assume that each observation/user experiences only one treatment (or control). However, in experiments that involve an intervention such as knowledge sharing (eg, the treatment group receives information about a new service while the control group does not), the treatment might "spill over" to control group members through social networks, online forums, and other information-sharing platforms that are now common. For example, many researchers use Amazon Mechanical Turk to conduct experiments, where, as DynamoWiki describes, "workers" (the experiment subjects) share information, establish norms, and build community through platforms like CloudMeBaby, MTurk Crowd, mTurk Forum, mTurk Grind, Reddit's /r/mturk and /r/HITsWorthTurkingFor, Turker Nation, and Turkopticon. This means that the control group can be "contaminated" by the treatment effect.
Gift effect: Treatments that benefit the treated subjects in some way (such as a special promotion or advanced feature) can confuse the effect of the treatment with the effect of receiving a special treatment. In other words, the difference between the outcome for the treatment and control groups might be not due to the treatment per-se but rather due to the "special attention" the treatment group received by the company or researcher.

Story 2: Statistical discipline of Experimental Design

Design of Experiments (DOE or DOX) is a subfield of statistics that is focused on creating the most efficient designs for an experiment, and the most appropriate analysis. Efficient here refers to a context where each run is very expensive or resource consuming in some way. Hence, the goal in DOE methodology is to answer the causal questions of interest with the smallest number of runs (observations). The statistical methodological development of DOE was motivated by agricultural applications in the early 20th century, led by the famous Ronald Fisher. DOE methodology gained further momentum in the context of industrial experiments (today it is typically considered part of "industrial statistics"). Currently, the most active research area within DOE is "computer experiments" that is focused on constructing simulations to emulate a physical system for cases where experimentation is impossible, impractical, or terribly expensive (e.g. experimenting on climate).

Do the two stories converge?

With the current heavy use of online experiments by companies, one would have thought the DOE discipline would flourish: new research problems, plenty of demand from industry for collaboration, troves of new students. Yet, I hear that the number of DOE researchers in US universities is shrinking. Most "business analytics" or "data science" programs do not have a dedicated course on experimental design (with focus on internet experiments). Recent DOE papers in top industrial statistics journals (eg Technometrics) and DOE conferences indicate the burning topics from Story 1 are missing. Academic DOE research by statisticians seems to continue focusing on the scarce data context and on experiments on "things" rather than human subjects. The Wikipedia page on DOE also tells a similar story. I tried to make these points and others in my recent paper Analyzing Behavioral Big Data: Methodological, practical, ethical, and moral issues. Hopefully the paper and this post will encourage DOE researchers to address such burning issues and take the driver's seat in creating designs and analyses for researchers and companies conducting "big experiments".

Monday, October 24, 2016

Experimenting with quantified self: two months hooked up to a fitness band

It's one thing to collect and analyze behavioral big data (BBD) and another to understand what it means to be the subject of that data. To really understand. Yes, we're all aware that our social network accounts and IoT devices share our private information with large and small companies and other organizations. And although we complain about our privacy, we are forgiving about sharing it, most likely because we really appreciate the benefits.

So, I decided to check out my data sharing in a way that I cannot ignore: I started wearing a fitness band. I bought one of the simplest bands available - the Mi Band Pulse from Xiaomi - this is not a smart watch but "simply" a fitness band that counts steps, measures heart rate and sleep. It's cheap price means that this is likely to spread to many users. Hence, BBD at speed on a diverse population.

I had two questions in mind:

Can I limit the usage so that I only get the benefits I really need/want while avoiding generating and/or sharing data I want to keep private?
What do I not know about the data generated and shared?

I tried to be as "private" as possible, never turning on location, turning on bluetooth for synching my data with my phone only once a day for a few minutes, turning on notifications only for incoming phone calls, not turning on notifications for anything else (no SMS, no third party apps notifications), only using the Mi Fit app (not linking to other apps like Google Fit), and only "befriending" one family member who bought the same product.

While my experiment is not as daring as a physician testing his/her developed drugs on themselves, I did learn a few important lessons. Here is what I discovered:

Despite intentionally turning off my phone bluetooth right after synching the band's data every day (which means the device should not be able to connect to my phone, or the cloud), the device and my phone did continue having secret "conversations". I discovered this when my band vibrated on incoming phone calls when bluetooth was turned off. This happened on multiple occasions. I am not sure what part of the technology chain to blame - my Samsung phone? the Android operating system? the Mi Band? It doesn't really matter. The point is: I thought the band was disconnected from my phone, but it was not.
Every time I opened the Mi Fit app on my phone, it requested access to location, which I had to decline every time. Quite irritating, and I am guessing many users just turn it on to avoid the irritation.
My family member who was my "friend" through the Mi Fit app could see all my data: number of steps, heart rate measurements (whenever I asked the band to measure heart rate), weight (which I entered manually), and even... my sleep hours. Wait: my family member now knew exactly what time I went to sleep and what time I got up every morning. That's something I didn't consider when we decided to become "friends". Plus, Xiaomi has the information about our "relationship" including "pokes" that we could send each other (causing a vibration).
The sleep counter claims to measure "deep sleep". It was showing that even if I slept 8 hours and felt wonderfully refreshed, my deep sleep was only 1 hour. At first I was alarmed. With a bit of search I learned that it doesn't mean much. But the alarming effect made me realize that even this simple device is not for everyone. Definitely not for over-worried people.

After 2 months of wearing of the band 24-by-7, what have I learned about myself that I didn't already know? That my "deep sleep" is only one hour per night (whatever that means). And that the feeling of the vibrating band when conquering the 8000 steps/day limit feels good. Addictively good, despite being meaningless. All the other data was really no news (I walk a lot on days when I teach, my weight is quite stable, my heart rate is reasonable, and I go to sleep too late).

To answer my two questions:

Can I limit the usage to only gain my required benefits? Not really. Some is beyond my control, and in some cases I am not aware of what I am sharing.
What do I not know? Quite a bit. I don't know what the device is really measuring (eg "deep sleep"), how it is sharing it (when my bluethooth is off), and what the company does with it.

Is my "behavioral" data useful? To me probably not. To Xiaomi probably yes. One small grain of millet (小米 = XiaoMi) is not much, but a whole sack is valuable.

So what now? I move to Mi Band 2. Its main new feature is a nudge every hour when you're sitting for too long.