Detecting outliers in a single variable is easy: sort the values or plot them. In two dimensions, a scatterplot can still help. But once you move beyond two variables, outliers are harder to spot by eye. That's where analytics come in. Almost any elementary statistics textbook will have information on outlier detection in the context of linear regression, where it is most developed. We also discussed PCA in class, where plotting the first two PC scores as a scatterplot can help detect multivariate outliers. I am not aware of a book dedicated entirely to outlier detection, but a quick search on Amazon led me to Outliers in Statistical Data by Vic Barnett and Toby Lewis. I haven't read it, so I cannot comment on it yet.
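To make the PCA idea concrete, here is a minimal sketch in Python. It assumes NumPy is available and uses simulated data, not anything from class. The function names (pc_scores, flag_far_points), the cutoff of 3 standard deviations, and the planted point are all my own illustrative choices, not a standard recipe. The planted row is only about 2 standard deviations from the mean on each variable, so sorting any single column would not single it out, yet it goes against the pattern the other rows share.

```python
import numpy as np

def pc_scores(X, n_components=2):
    """Return the scores of each row on the first n_components principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize each column so no variable dominates because of its units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def flag_far_points(scores, threshold=3.0):
    """Return indices of rows lying far from the origin of the score plot.

    Each score column is rescaled to unit standard deviation first.
    The threshold is an illustrative assumption, not a formal test.
    """
    scaled = scores / scores.std(axis=0, ddof=1)
    dist = np.sqrt((scaled ** 2).sum(axis=1))
    return np.where(dist > threshold)[0]

# Simulated data (an assumption for illustration): four variables that all
# follow one shared factor, plus a little independent noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
X = factor + 0.3 * rng.normal(size=(200, 4))

# Planted row: moderate on every variable, but two go up while two go down.
planted = np.array([[2.0, 2.0, -2.0, -2.0]])
X = np.vstack([X, planted])  # the planted row gets index 200

scores = pc_scores(X, n_components=2)
flagged = flag_far_points(scores, threshold=3.0)
print("Rows flagged in the PC1/PC2 score plot:", flagged)
```

The two columns of scores are exactly what you would put on the axes of the scatterplot. A few ordinary rows may also be flagged just by chance, so treat the output as a list of points worth a second look rather than a verdict.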
2 comments:
Prof. Shmueli,
You mentioned in class that there are different sets of methods for outlier detection.
Can you point to any good resources on the topic?
Now that we are done with BUDT-750, there is too much time on hand :)