Thursday, July 19, 2007

Handling outliers with a smile

Here's one of the funniest statistics cartoons I've seen (thanks, Adi Gadwale!). First you laugh, then you cry.

It also reminds me of the famous industrial statistician George Box's claim: "All models are wrong, but some are useful."


Tabrez said...

Prof. Shmueli,
You mentioned in class that there are different sets of methods for outlier detection.

Can you point to any good resources on the topic?

Now that we are done with BUDT-750, I have too much time on my hands :)

Galit Shmueli said...

Detecting outliers in a single variable is easy via sorting and plotting, and in two dimensions scatterplots can help too. But once you move beyond two variables, outliers are harder to spot; that's where analytics come in. Almost any elementary statistics textbook covers outlier detection in the context of linear regression, where it is most developed. We also discussed PCA in class, where plotting the first 2 PC scores as a scatterplot can help detect multivariate outliers. I am not aware of a book devoted entirely to outlier detection, but a quick search on Amazon led me to Outliers in Statistical Data by Vic Barnett and Toby Lewis. I haven't read it, so I cannot comment on it yet.
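For readers who want to try the three approaches mentioned in the reply, here is a minimal NumPy sketch. It rests on my own assumptions rather than on anything prescribed in the post: the single-variable check uses Tukey's 1.5×IQR fences (the rule behind boxplot whiskers) instead of eyeballing a sorted list; the regression check flags internally studentized residuals beyond ±3; and the PCA check flags points whose standardized PC1/PC2 scores lie more than 3 units from the origin, as a numeric stand-in for looking at the scatterplot. The function names and thresholds are hypothetical choices.

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Single variable: flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def regression_outliers(X, y, threshold=3.0):
    """Linear regression: flag observations with large internally studentized residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    n, p = A.shape
    sigma = np.sqrt(resid @ resid / (n - p))   # residual standard error
    # Leverage h_i = diagonal of the hat matrix, computed from the reduced QR factor of A
    Q, _ = np.linalg.qr(A)
    h = (Q ** 2).sum(axis=1)
    r = resid / (sigma * np.sqrt(1.0 - h))
    return np.abs(r) > threshold

def pc_scores(X, n_components=2):
    """Return the first n_components principal-component scores of standardized X."""
    X = np.asarray(X, dtype=float)
    # Standardize columns so variables on different scales are comparable
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt from the SVD are the principal-component directions
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def pc_outliers(X, threshold=3.0):
    """Multivariate: flag rows far from the center of the PC1-vs-PC2 scatterplot."""
    scores = pc_scores(X, 2)
    scaled = scores / scores.std(axis=0, ddof=1)  # put PC1 and PC2 on a common scale
    dist = np.sqrt((scaled ** 2).sum(axis=1))
    return dist > threshold
```

For the visual route described in the reply, the two columns returned by `pc_scores` can be passed straight to a scatterplot, e.g. matplotlib's `plt.scatter(s[:, 0], s[:, 1])`; the numeric flags are only a rough shortcut for what the eye picks out.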