In its recent editorial, the journal Basic and Applied Social Psychology announced that it will no longer accept papers that use classical statistical inference. No more p-values, t-tests, or even... confidence intervals!
"prior to publication, authors will have to remove all vestiges of the NHSTP (p-values, t-values, F-values, statements about ‘‘significant’’ differences or lack thereof, and so on)... confidence intervals also are banned from BASP"
Many statisticians would agree that it is high time to move on from p-values and statistical inference to practical significance, estimation, more elaborate non-parametric modeling, and resampling for avoiding assumption-heavy models. This is especially so now, when datasets are becoming larger and technology is able to measure more minute effects.
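To give a flavor of the resampling alternative, here is a minimal sketch (with made-up data, purely illustrative) of a bootstrap confidence interval for a difference in group means, which sidesteps distributional assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: outcome scores for two groups (illustrative only)
group_a = rng.normal(loc=5.0, scale=2.0, size=80)
group_b = rng.normal(loc=5.6, scale=2.0, size=80)

observed_diff = group_b.mean() - group_a.mean()

# Bootstrap: resample each group with replacement and recompute the difference
boot_diffs = np.empty(10_000)
for i in range(boot_diffs.size):
    boot_diffs[i] = (rng.choice(group_b, size=group_b.size, replace=True).mean()
                     - rng.choice(group_a, size=group_a.size, replace=True).mean())

# Percentile 95% interval for the effect magnitude, with no parametric model
lo, hi = np.percentile(boot_diffs, [2.5, 97.5])
print(f"difference in means: {observed_diff:.2f}, 95% bootstrap CI: [{lo:.2f}, {hi:.2f}]")
```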
In our 2013 paper "Too Big To Fail: Large Samples and the p-value Problem" we raise the serious issue of p-value-based decision making when using very large samples. Many have asked us for solutions that scale up p-values, but we haven't come across one that really works. Our focus was on detecting when a sample is "too large" for p-value-based decisions, and we emphasized reporting effect magnitude and precision (please do report standard errors!).
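To see the issue concretely, here is a hedged simulation sketch (not the analysis from our paper): with a practically negligible true effect, the p-value collapses toward zero as the sample grows, while the effect magnitude stays tiny and the standard error simply shrinks.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = 0.02  # a practically negligible difference in means

for n in [100, 10_000, 1_000_000]:
    x = rng.normal(0.0, 1.0, size=n)
    y = rng.normal(true_effect, 1.0, size=n)
    t, p = stats.ttest_ind(x, y)                      # two-sample t-test
    diff = y.mean() - x.mean()                        # effect magnitude
    se = np.sqrt(x.var(ddof=1) / n + y.var(ddof=1) / n)  # precision
    print(f"n={n:>9,}  effect={diff:+.3f}  SE={se:.4f}  p={p:.3g}")
```

With a million observations the test flags the 0.02 difference as highly "significant", even though nothing about its practical importance has changed.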
Machine learners would probably advocate finally moving to predictive modeling and evaluation. Predictive power is straightforward to measure, although it isn't always what social science researchers are looking for.
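For readers unfamiliar with that workflow, a hedged sketch of what predictive evaluation looks like (scikit-learn on synthetic data; an illustration, not a prescription for any particular study): hold out data, predict, and score on the holdout.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)

# Synthetic example: two predictors weakly related to a binary outcome
X = rng.normal(size=(1_000, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1_000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Predictive power is judged on data the model never saw
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))
```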
But wait. What this editorial dictates is only half a revolution: it says what it bans, but it does not offer a cohesive alternative beyond simple summary statistics. Focusing on effect magnitude is great for making results matter, but without reporting standard errors or confidence intervals we know nothing about the uncertainty of the effect. Abandoning any metric that relies on "had the experiment been replicated" is dangerous and misleading. First, this is more a philosophical assumption than an actual re-experimentation. Second, to test whether effects found in a sample generalize to a population of interest, we need the ability to replicate the results. Standard errors give some indication of how replicable the results are under the same conditions.
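To make the "had the experiment been replicated" interpretation concrete, here is an illustrative simulation (assumptions ours): the standard error reported from a single sample approximates the spread of the estimate across hypothetical replications of the same experiment under identical conditions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# One observed sample and its reported standard error of the mean
sample = rng.normal(loc=1.0, scale=2.0, size=n)
reported_se = sample.std(ddof=1) / np.sqrt(n)

# Simulated replications of the same experiment under the same conditions
replicated_means = np.array(
    [rng.normal(loc=1.0, scale=2.0, size=n).mean() for _ in range(5_000)]
)

print(f"reported SE from one sample:          {reported_se:.3f}")
print(f"spread of estimates across replicates: {replicated_means.std(ddof=1):.3f}")
```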
BASP's revolutionary decision has been gaining attention outside of psychology (a great tactic to promote a journal!), so much so that at times it is difficult even to reach the controversial editorial. Some statisticians have blogged about the decision, and others are tweeting. This is a great way to open a discussion about empirical analysis in the social sciences. However, we need to come up with alternatives that focus on uncertainty and the ultimate goal of generalization.