Saturday, November 07, 2009

The value of p-values: Science magazine asks

My students know how I cringe when I am forced to teach them p-values. I have always felt that their meaning is hard to grasp, and hence that they are frequently abused by non-statisticians. This is clearly happening in research using large datasets, where p-values are practically useless for inferring the practical importance of effects (check out our latest paper on the subject, which looks at large-dataset research in Information Systems).
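To see why, here is a quick sketch in Python (a made-up illustration, not from the paper): simulate two groups that differ by a practically negligible amount, and watch a huge sample drive the p-value to essentially zero.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two groups of one million observations each, differing by a
# practically negligible 0.01 standard deviations.
n = 1_000_000
a = rng.normal(loc=0.00, scale=1.0, size=n)
b = rng.normal(loc=0.01, scale=1.0, size=n)

res = stats.ttest_ind(a, b)
print(f"mean difference: {b.mean() - a.mean():.4f}")  # trivially small
print(f"p-value: {res.pvalue:.2e}")                   # "highly significant"
```

The effect is trivially small, yet any 0.05 threshold would declare it significant. With enough data, nearly every effect is.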

So, when one of the PhD students taking my "Scientific Data Collection" course stumbled upon this recent Science Magazine article "Mission Improbable: A Concise and Precise Definition of P-Value", he couldn't resist emailing it to me. The article showcases the abuse of p-values in medical research due to their elusive meaning, and this is not even with large samples! Researchers incorrectly interpret a p-value as the probability that an effect exists, when it is actually the probability of seeing data at least as extreme as those observed if there were no effect at all. The result of such confusion can clearly be devastating when the issue at stake is the effectiveness of a new drug or vaccine.
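A small simulation makes the correct definition concrete (again a sketch with made-up numbers, not the article's example): the p-value is computed assuming there is no effect, and it estimates how often a test statistic at least as extreme as the observed one would arise under that assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# One "observed" study: 50 measurements from a process whose true mean is 0.3.
n = 50
sample = rng.normal(loc=0.3, scale=1.0, size=n)
t_obs = sample.mean() / (sample.std(ddof=1) / np.sqrt(n))

# The p-value asks: IF the true mean were zero, how often would a
# t-statistic at least this extreme occur?  Estimate it by simulating
# many studies in which the null really holds.
draws = rng.normal(loc=0.0, scale=1.0, size=(100_000, n))
null_t = draws.mean(axis=1) / (draws.std(axis=1, ddof=1) / np.sqrt(n))
p_sim = np.mean(np.abs(null_t) >= abs(t_obs))

res = stats.ttest_1samp(sample, popmean=0.0)
print(f"simulated: {p_sim:.4f}, from the t-test: {res.pvalue:.4f}")
# Neither number is the probability that the effect is real (or absent).
```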

There are obviously better ways to assess statistical significance that are more closely aligned with practical significance and less ambiguous than p-values. One is confidence intervals: you get an estimate of your effect, plus or minus some margin, and can then judge what that interval means in practice. Another approach (and it is good to try both) is to test the predictive accuracy of your model, to see whether the prediction error is at a reasonable level. This is achieved by applying the model to new data and evaluating how well it predicts those new data.
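Here is what both look like in practice, sketched in Python on synthetic data (the variable names and numbers are mine, using numpy and scikit-learn): a rough 95% confidence interval for the effect, followed by a holdout check of prediction error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(2)

# Synthetic example: outcome driven by one predictor plus noise.
X = rng.uniform(0, 10, size=(500, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 3, size=500)

# 1. Confidence interval for the effect: estimate +/- roughly 2 standard errors.
X1 = np.column_stack([np.ones(len(X)), X])        # add an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid = y - X1 @ coef
sigma2 = resid @ resid / (len(y) - X1.shape[1])
se = np.sqrt(sigma2 * np.linalg.inv(X1.T @ X1).diagonal())
lo, hi = coef[1] - 2 * se[1], coef[1] + 2 * se[1]
print(f"effect estimate: {coef[1]:.2f}, ~95% CI: [{lo:.2f}, {hi:.2f}]")

# 2. Predictive accuracy: fit on part of the data, measure error on the rest.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"holdout mean absolute error: {mae:.2f}")
```

The interval tells you whether the plausible range of the effect matters in practice; the holdout error tells you whether the model predicts well enough to be useful.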

Shockingly enough, people seem to really want to use p-values, even if they don't understand them. I was recently involved in designing materials for a basic statistics course for engineers and managers at a big company. We created an innovative and beautiful set of slides, with real examples, straightforward explanations, and practical advice. The 200+ slides made no mention of p-values, focusing instead on measuring effects, understanding sampling variability, standard errors, and confidence intervals, seeing the value of residual analysis in linear regression, and learning how to perform and evaluate prediction. Yet the company asked us to replace some of this material ("not sure if we will need residual analysis, sampling error etc. our target audience may not use it") with material on p-values and the 0.05 threshold ("It will be sufficient to know interpreting the p-value and R-sq to interpret the results"). Sigh.
It's hard to change a culture with such a long history.