
Tuesday, August 03, 2010

The PCA Debate

Recently a posting on the Research Methods LinkedIn group asked what Principal Components Analysis (PCA) is in layman's terms and what it is useful for. The answers clearly reflected the two "camps": social science researchers and data miners. For data miners, PCA is a popular and useful data reduction method for reducing the dimension of a dataset with many variables. For social scientists, PCA is a type of factor analysis without a rotation step. That last sentence might sound cryptic to a non-social-scientist, so a brief explanation is in order: the goal of rotation is to simplify and clarify the interpretation of the principal components relative to each of the original variables. This is achieved by optimizing some criterion (see http://en.wikipedia.org/wiki/Factor_analysis#Rotation_methods for details).
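To make the data-miner view concrete, here is a minimal numpy sketch of PCA as data reduction (the toy dataset and all names are my own illustration, not from the discussion): standardize the variables, eigendecompose their correlation matrix, and read off the components and the variance each one explains.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy dataset: 100 observations on 5 variables that are driven by 2 latent sources
base = rng.normal(size=(100, 2))
X = base @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(100, 5))

# standardize, then eigendecompose the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0)
corr = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]       # sort components by variance explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Z @ eigvecs                    # principal component scores
explained = eigvals / eigvals.sum()     # proportion of variance per component
```

Because the toy data are essentially two-dimensional plus noise, the first two components capture almost all of the variance, which is exactly the data-reduction payoff: keep those two score columns and discard the rest.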

Now here comes the explain vs. predict divide:
PCA and factor analysis often produce practically similar results in terms of "rearranging" the total variance of the data. Hence, PCA is by far more common than factor analysis in data mining. In contrast, PCA is considered by social scientists to be inferior to factor analysis because their goal is to uncover underlying theoretical constructs. Costello & Osborne (in the 2005 issue of the online journal Practical Assessment, Research & Evaluation) give an overview of PCA and factor analysis, discuss the debate between the two, and summarize:
We suggest that factor analysis is preferable to principal components analysis. Components analysis is only a data reduction method. It became common
decades ago when computers were slow and expensive to use; it was a quicker, cheaper alternative to factor analysis... However, researchers rarely collect and analyze data without an a priori idea about how the variables are related (Floyd & Widaman, 1995). The aim of factor analysis is to reveal any latent variables that cause the manifest variables to covary.

Moreover, the choice of rotation method can lead to either correlated or uncorrelated factors. While data miners would tend to opt for uncorrelated factors (and therefore would stick to the uncorrelated principal components with no rotation at all), social scientists often choose a rotation that leads to correlated factors! Why? Costello & Osborne explain: "In the social sciences we generally expect some correlation among factors, since behavior is rarely partitioned into neatly packaged units that function independently of one another."
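For readers curious what a rotation actually does, below is a minimal numpy sketch of varimax, the most common orthogonal rotation criterion (the loadings matrix is hypothetical, and note that the correlated factors mentioned above come from oblique rotations such as promax, which work differently). Varimax searches for an orthogonal rotation of the loadings that maximizes the variance of the squared loadings, pushing each variable to load strongly on one factor and weakly on the others.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonally rotate a loadings matrix by the varimax criterion."""
    p, k = loadings.shape
    R = np.eye(k)                    # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        L = loadings @ R
        # standard SVD-based varimax update
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0)))
        )
        R = u @ vt
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return loadings @ R, R

# hypothetical loadings: 6 variables on 2 factors
loadings = np.array([[0.8, 0.3], [0.7, 0.2], [0.6, 0.3],
                     [0.2, 0.8], [0.3, 0.7], [0.2, 0.6]])
rotated, R = varimax(loadings)
```

Since R is orthogonal, the communalities (row sums of squared loadings) are unchanged; only the interpretation of the axes improves.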

At the end of the day, it comes down to the different places that causal-explanatory scientists and data miners occupy on the data-theory continuum. In the social sciences, researchers assume an underlying causal theory before considering any data or analysis. The "manifest world" is only useful for uncovering the "latent world". Hence, data and analysis methods are viewed only through the lens of theory. In contrast, in data mining the focus is on the data level, or the "manifest world", because often there is no underlying theory, or because the goal is to predict new (manifest) data or to capture an association at the measurable data level.




Thursday, March 26, 2009

Principal Components Analysis vs. Factor Analysis

Here is an interesting example of how similar mechanics lead to two very different statistical tools. Principal Components Analysis (PCA) is a powerful method for data compression, in the sense of capturing the information contained in a large set of variables by a smaller set of linear combinations of those variables. As such, it is widely used in applications that require data compression, such as visualization of high-dimensional data and prediction.

Factor Analysis (FA), technically considered a close cousin of PCA, is popular in the social sciences, and is used for the purpose of discovering a small number of 'underlying factors' from a larger set of observable variables. Although PCA and FA are both based on orthogonal linear combinations of the original variables, they are very different conceptually: FA tries to relate the measured variables to underlying theoretical concepts, while PCA operates only at the measurement level. The former is useful for explaining; the latter for data reduction (and therefore prediction).

Richard Darlington, a Professor Emeritus of Psychology at Cornell, has a nice webpage describing the two. He tries to address the confusion between PCA and FA by first introducing FA and only then PCA, which is the opposite of what you'll find in textbooks. Darlington comments:
I have introduced principal component analysis (PCA) so late in this chapter primarily for pedagogical reasons. It solves a problem similar to the problem of common factor analysis, but different enough to lead to confusion. It is no accident that common factor analysis was invented by a scientist (differential psychologist Charles Spearman) while PCA was invented by a statistician. PCA states and then solves a well-defined statistical problem, and except for special cases always gives a unique solution with some very nice mathematical properties. One can even describe some very artificial practical problems for which PCA provides the exact solution. The difficulty comes in trying to relate PCA to real-life scientific problems; the match is simply not very good.
Machine learners are very familiar with PCA as well as other compression-type algorithms such as Singular Value Decomposition (SVD), the most heavily used compression technique in the Netflix Prize competition. Such compression methods are also used as alternatives to variable selection algorithms such as forward selection and backward elimination: rather than retaining or removing complete variables, they use combinations of the variables.
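As a rough illustration of SVD as compression (the toy matrix is my own, not data from the Netflix competition), a truncated SVD keeps only the k largest singular values and yields the best rank-k approximation of the original matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical ratings-style matrix: 50 rows x 20 columns, roughly rank 3
U_true = rng.normal(size=(50, 3))
V_true = rng.normal(size=(3, 20))
X = U_true @ V_true + 0.05 * rng.normal(size=(50, 20))

# truncated SVD: keep the k largest singular values/vectors
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
X_k = U[:, :k] * s[:k] @ Vt[:k, :]   # best rank-k approximation of X

rel_err = np.linalg.norm(X - X_k) / np.linalg.norm(X)
```

The 1000-entry matrix is summarized by 3 x (50 + 20 + 1) numbers with little loss, which is the sense in which SVD "compresses" the data.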

I recently learned of Independent Components Analysis (ICA) from Scott Nestler, a former PhD student in our department. He used ICA in his dissertation on portfolio optimization. The idea is similar to PCA, except that the resulting components are not only uncorrelated, but actually independent.
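The distinction matters because uncorrelated does not imply independent. A tiny numpy illustration (my own, not from Nestler's dissertation): if x is symmetric around zero, then y = x² is completely determined by x, yet the two are nearly uncorrelated, so decorrelation-based methods like PCA would find nothing left to do.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)    # symmetric around 0
y = x**2                        # a deterministic function of x

# Pearson correlation is ~0 because the relationship is purely nonlinear;
# x and y are obviously dependent, which is what ICA's stronger
# independence criterion is designed to detect.
corr = np.corrcoef(x, y)[0, 1]
```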