Tuesday, July 24, 2012

Linear regression for binary outcome: even better news

I recently attended the 8th World Congress in Probability and Statistics, where I heard an interesting talk by Andy Tsao. His talk "Naivity can be good: a theoretical study of naive regression" (Abstract #0586) was about the use of Naive Regression, which is the application of linear regression to a categorical outcome, treating the outcome as numerical. He asserted that predictions from Naive Regression will be quite good. My last post was about the "goodness" of a linear regression applied to a binary outcome in terms of the estimated coefficients. That's what explanatory modeling is about. What Dr. Tsao alerted me to is that the predictions (or, more correctly, the classifications) will be good too. In other words, it's useful for predictive modeling! In his words:
"This naivity is not blessed from current statistical or machine learning theory. However, surprisingly, it delivers good or satisfactory performances in many applications."
Note that to derive a classification from naive regression, you treat the prediction as the class probability (although it might be negative or >1), and apply a cutoff value as in any other classification method.
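To make the cutoff step concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not code from Dr. Tsao's talk: the function names, the simulated two-predictor data, and the 0.5 cutoff are all assumptions made for the example.

```python
import numpy as np

def fit_naive_regression(X, y):
    """Ordinary least squares of a 0/1 outcome on X, with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_score(coef, X):
    """Fitted values, read as class 'probabilities' (may fall outside [0, 1])."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef

def classify(scores, cutoff=0.5):
    """Assign class 1 when the score reaches the cutoff, else class 0."""
    return (scores >= cutoff).astype(int)

# Simulated data (an assumption for illustration): two groups of 50 records.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

coef = fit_naive_regression(X, y)
scores = predict_score(coef, X)
labels = classify(scores, cutoff=0.5)

print("score range:", scores.min(), scores.max())
print("training accuracy:", (labels == y).mean())
```

The cutoff plays the same role as in any other classifier: changing it trades off the two types of misclassification, and scores below 0 or above 1 cause no trouble because only their position relative to the cutoff matters.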

Dr. Tsao pointed me to the good old The Elements of Statistical Learning, which has a section called Linear Regression of an Indicator Matrix. There are two interesting takeaways from Dr. Tsao's talk:
  1. Naive Regression and Linear Discriminant Analysis will have the same ROC curve, meaning that the ranking of predictions will be identical.
  2. If the two groups are of equal size (n1=n2), then Naive Regression and Linear Discriminant Analysis are equivalent and therefore produce the same classifications.
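Both takeaways can be checked numerically. The sketch below is my own illustration under stated assumptions: simulated data with three predictors, equal group sizes, a 0.5 cutoff for Naive Regression, and the equal-priors midpoint threshold for Linear Discriminant Analysis. It relies on the two-class result that the least squares slope vector is a positive multiple of the discriminant direction, so the two scores differ only by an increasing linear map.

```python
import numpy as np

# Simulated two-group data with equal group sizes (assumption for illustration).
rng = np.random.default_rng(1)
n0, n1 = 60, 60
X0 = rng.normal(0.0, 1.0, size=(n0, 3))
X1 = rng.normal(1.0, 1.0, size=(n1, 3))
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n0), np.ones(n1)])

# Naive Regression: least squares on the 0/1 outcome, with an intercept.
A = np.column_stack([np.ones(n0 + n1), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, beta = coef[0], coef[1:]
ols_scores = A @ coef

# LDA direction: inverse pooled within-group covariance times the mean difference.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
S_w = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (n0 + n1 - 2)
w = np.linalg.solve(S_w, mu1 - mu0)
lda_scores = X @ w

# Takeaway 1: the slope vectors are proportional with a positive constant,
# so both scores rank the records identically (same ROC curve).
ratio = beta / w
print("ratio of coefficients:", ratio)

# Takeaway 2: with n0 == n1, a 0.5 cutoff on the regression score matches
# the LDA rule with equal priors (threshold at the midpoint of the group means).
ols_labels = (ols_scores > 0.5).astype(int)
lda_labels = (lda_scores > w @ (mu0 + mu1) / 2).astype(int)
print("identical classifications:", np.array_equal(ols_labels, lda_labels))
```

With unequal group sizes, the ranking part still holds but the two default cutoffs no longer coincide, which is why the second takeaway is stated only for n1=n2.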