Friday, April 28, 2006

p-values in LARGE datasets

We had an interesting discussion in our department today, the result of confining statisticians and non-statisticians in a maze-like building. Our colleague, who called himself a "non-stat-guru", sent a query to us "stat-gurus" (his labels) regarding p-values in a model that is estimated from a very large dataset.

The problem: a certain statistical model was fit to 120,000 observations (that's right, n=120K). And, unsurprisingly, the p-values for all predictors turned out to be tiny, so every predictor came out as highly statistically significant.

Why does this happen and what does it mean?
When the number of observations is very large, standard errors of estimates become very small: a simple example is the standard error of the mean, which is equal to std/sqrt(n). Plug 1 million into that denominator! This means that the test has the power to detect even minuscule effects.
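
To get a feel for how quickly the standard error shrinks, here is a small sketch in Python. The standard deviation of 15 is just an assumed value for illustration, and standard_error is a helper name made up for this example.

```python
import math

def standard_error(std, n):
    """Standard error of the mean: std / sqrt(n)."""
    return std / math.sqrt(n)

std = 15.0  # assumed standard deviation, for illustration only
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: standard error = {standard_error(std, n):.4f}")
```

Going from n=100 to n=1,000,000 cuts the standard error by a factor of 100 (from 1.5 to 0.015 under this assumption), since it shrinks with the square root of n.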

For instance, say we want to test whether the average population IQ is 100 (remember that IQ scores are actually calibrated so that the average is 100...). We take a sample of 1 million people, measure their IQ and compute the mean and standard deviation. The null hypothesis is

H0: population mean (mu) = 100
H1: mu is not equal to 100

The test statistic is: T = (sample mean - 100) / (sample std / sqrt(n))

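The n=1,000,000 shrinks the denominator of the T statistic (the standard error), which inflates T and makes it statistically significant even for a sample mean that is only a hair away from 100. For instance, if the sample standard deviation is around 15, the standard error is 15/1000 = 0.015, so a sample mean of 100.05 already gives T of about 3.3. But is such a difference practically significant? Of course not.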
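
Here is a rough sketch of the same test in Python, to show how the verdict changes with n while the sample mean stays put. The sample mean of 100.05 and standard deviation of 15 are assumed values for illustration, one_sample_test is a made-up helper name, and the p-value uses the normal approximation to the t distribution (reasonable for large n, less so for small samples).

```python
import math
from statistics import NormalDist

def one_sample_test(sample_mean, sample_std, n, mu0=100.0):
    """Return (T, two-sided p-value) for H0: mu = mu0.

    The p-value uses the normal approximation, which is an
    assumption that works well when n is large.
    """
    se = sample_std / math.sqrt(n)          # standard error of the mean
    t = (sample_mean - mu0) / se            # test statistic
    p = 2 * (1 - NormalDist().cdf(abs(t)))  # two-sided p-value
    return t, p

# Same (assumed) sample mean and std, very different sample sizes
for n in (100, 10_000, 1_000_000):
    t, p = one_sample_test(100.05, 15.0, n)
    print(f"n = {n:>9,}: T = {t:6.3f}, p-value = {p:.5f}")
```

The difference of 0.05 IQ points is nowhere near significant at n=100 or n=10,000, yet at n=1,000,000 the p-value drops below 0.001. Nothing about the practical importance of 0.05 IQ points has changed.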

The problem, in short, is that in large datasets statistical significance is likely to diverge from practical significance.

What can be done?

1. Assess the magnitude of the coefficients themselves and what their interpretation is. Their practical significance might be low. For example, in a model for cigarette box demand in a neighborhood grocery store, such as demand = a + b * price, we might find a coefficient of b=0.000001 to be statistically significant (if we have enough observations). But what does it mean? An increase of $1 in price is associated with an average increase of 0.000001 in the number of cigarette boxes sold. Is this relevant?

2. Take a random sample and perform the analysis on that. You can use the remaining data to test the robustness of the model.
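
Both suggestions can be tried out on simulated data. The sketch below is only an illustration under made-up assumptions: the data are simulated with a tiny true slope of 0.03, the subsample size of 1,000 is arbitrary, and fit_line is a helper written for this example (plain least squares for y = a + b*x). It does not show the robustness check on the remaining data.

```python
import math
import random

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (b, standard error of b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares, used to estimate the noise variance
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b, se_b

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 120_000
true_b = 0.03            # assumed tiny effect, for illustration only
x = [rng.gauss(0, 1) for _ in range(n)]
y = [true_b * xi + rng.gauss(0, 1) for xi in x]

# Suggestion 1: look at the size of the coefficient, not only at T
b_full, se_full = fit_line(x, y)

# Suggestion 2: refit on a random subsample of the data
idx = rng.sample(range(n), 1_000)
b_sub, se_sub = fit_line([x[i] for i in idx], [y[i] for i in idx])

print(f"full data (n={n:,}): b = {b_full:.4f}, se = {se_full:.4f}, T = {b_full / se_full:.2f}")
print(f"subsample (n=1,000): b = {b_sub:.4f}, se = {se_sub:.4f}, T = {b_sub / se_sub:.2f}")
```

On the full data the standard error of b is roughly 11 times smaller than on the subsample (the square root of 120), so a slope this small will typically produce a large T there, while the subsample often will not flag it at all. Either way, the question to ask is whether a slope of about 0.03 matters in practice.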

Next time before driving your car, make sure that your windshield was not replaced with a magnifying glass (unless you want to detect every ant on the road).