## Tuesday, October 27, 2009

### Testing directional hypotheses: p-values can bite

I've recently had interesting discussions with colleagues in Information Systems regarding testing directional hypotheses. Following their request, I'm posting about this apparently illusive issue.

In information systems research, the most common type of hypothesis is directional, i.e. the parameter of interest is hypothesized to go in a certain direction. An example would be testing the hypothesis that teenagers are more likely than older folks to use Facebook. Another example is the hypothesis that higher opening bids on eBay lead to higher final prices. In the Facebook example, the researcher would test the hypothesis by gathering data on Facebook usage by each age group, then comparing the average usage of each group, and if the teenager's average is sufficiently larger, then the hypothesis would be supported (at some statistically significant level). In the eBay example, a researcher might collect information on many eBay auctions, then fit a regression of price on the opening bid (and controlling for all other types of factors). If the regression coefficient turns out to be sufficiently larger than zero, then the researcher could conclude that the hypothesized effect is true (let's put aside issues of causality for the moment).

More formally, for the Facebook hypothesis the test statistic would be a T statistic of the form
T = (teenager Average - older folks Average) / Standard Error
The test statistic for the eBay example would also be a T statistics of the form:
T = opening-bid regression coefficient / Standard Error

Note an important point here: when stating a hypothesis as above (namely, "the alternative hypothesis"), there is always a null hypothesis that is the default. This null hypothesis is often neglected to be mentioned expliciltly in Information Systems articles, but let's make clear that in directional hypotheses such as the ones above, the null hypothesis includes both the "no effect" and the "opposite directional effect" scenarios. In the Facebook example, the null includes both the case that teenagers and older folks use Facebook equally, and that teenagers use Facebook less than older folks. In the eBay example, the null includes both cases of "opening bid doesn't affect final price" and "opening bid lowers final price".

Getting back to the T test statistics (or any other test statistic, for this matter): To evaluate whether the T is sufficiently extreme to reject the null hypothesis (and support the researcher's hypothesis), information systems researchers typically use a p-value, and compare it to some significince level. BUT, computing the p-values must take into account the directionality of the hypothesis! The default p-value that you'd get from running a regression model in any standard software is for a non-directional hypothesis! To get the directional p-value you would either divide that p-value by 2, if the sign of the T statistic is in the "right" direction (positive if your hypothesis said positive; negative if your hypothesis said negative), or you would have to use 1-p-value/2. In the first case, mistakenly using the software p-value would result in missing out on real effects (loss of statistical power), while in the latter case you might infer an effect, when there is none (or maybe there even is an effect in the opposite direction).

The solution to this confusion is to examine each hypothesis for its directionality (think what the null hypothesis is), then construct the corresponding p-value carefully. Some tests in some software packages will allow you to specify the direction and will give you a "kosher" p-value. But in many cases, regression being an example, most software will only spit out the no-directional p-value. Or just get a die-hard statistician on board.

Which reminds me again why I don't like p-values. For lovers of confidence intervals, I promise to post about confidence intervals for directional hypotheses (what is the sound of a one-sided confidence interval?)