Monday, September 22, 2008

Dr. Doom and data mining

Last month The New York Times featured an article about Dr. Doom: Economics professor "Roubini, a respected but formerly obscure academic, has become a major figure in the public debate about the economy: the seer who saw it coming."

This article caught my statistician's eye because of how it describes "data" and "models". Although the economists quoted in the article portray Roubini as not using data and econometric models, a careful read shows that he does use data and models, just perhaps unusual data and unusual models!

Here are two interesting quotes:
“When I weigh evidence,” he told me, “I’m drawing on 20 years of accumulated experience using models” — but his approach is not the contemporary scholarly ideal in which an economist builds a model in order to constrain his subjective impressions and abide by a discrete set of data.
Later on, the article reports:
"After analyzing the markets that collapsed in the ’90s, Roubini set out to determine which country’s economy would be the next to succumb to the same pressures."
This might not be data mining per se, but note that Roubini's approach is at heart similar to the data mining approach: looking at unusual data (here, taking an international view rather than focusing only on national data) and finding patterns within them that predict economic downfalls. In a standard data mining framework we would of course also include all those markets that have not collapsed, and then set up the problem as a "direct marketing" problem: who is most likely to fall?
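To make the "who is most likely to fall?" framing concrete, here is a minimal sketch in Python. It is not Roubini's method: the market names, the two indicators, and all of the numbers are made up for illustration, and plain logistic regression stands in for whatever classifier one would actually choose. The idea is simply to train on both collapsed and non-collapsed markets, then rank new markets by their predicted probability of collapse, much as a direct-marketing model ranks customers by their probability of responding.

```python
import math

# Hypothetical training data, invented for illustration only (not real figures).
# Each record: (market, deficit indicator, debt indicator, collapsed? 1 = yes, 0 = no)
MARKETS = [
    ("A", 8.0, 1.6, 1),
    ("B", 6.5, 1.3, 1),
    ("C", 7.2, 1.1, 1),
    ("H", 5.5, 1.4, 1),
    ("D", 1.0, 0.4, 0),
    ("E", 2.5, 0.6, 0),
    ("F", 0.5, 0.3, 0),
    ("G", 3.0, 0.9, 0),
]

# Hypothetical markets we want to score (outcome not yet known).
CANDIDATES = [("X", 7.0, 1.5), ("Y", 1.5, 0.5), ("Z", 4.0, 1.0)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def rank_candidates(train=MARKETS, candidates=CANDIDATES):
    """Return candidates sorted from most to least likely to collapse."""
    rows = [(m[1], m[2]) for m in train]
    labels = [m[3] for m in train]
    weights, bias = fit_logistic(rows, labels)
    scored = [
        (name, sigmoid(bias + weights[0] * a + weights[1] * b))
        for name, a, b in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, prob in rank_candidates():
        print(f"{name}: estimated collapse probability {prob:.2f}")
```

Note that the non-collapsed markets are essential here: without them, the model has nothing to contrast the collapsed markets against.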

A final note: As a strong believer in the difference between the goals of explaining and forecasting, I think that econometricians should stop limiting their modeling to explanatory, causality-based models. Good forecasting models might reveal little about causality, but in many cases their forecasts will be far more accurate than those from explanatory models!

Wednesday, September 03, 2008

Data conversion and open-source software

Recently I was trying to open a data file that was created in the statistical software SPSS. SPSS is widely used in the social sciences (a competitor to SAS), and appears to have gained some ground here in Bhutan. Being in Bhutan with a slow and erratic internet connection, though, I've failed time and again to use the software through our school's portal. Finding the local SPSS representative seemed a bit surreal, and so I went off trying to solve the problem in another way.

First stop: Googling "convert .sav to .csv" led me nowhere. SPSS and SAS both have an annoying "feature" of keeping data in file formats that are very hard to convert. A few software packages now import data from SAS databases, but I was unable to find a software package that will import from SPSS. This led me to a surprising finding: PSPP. Yes, that's right: PSPP, previously known as FIASCO, is an open-source "free replacement for the proprietary program, SPSS." The latest version even boasts a graphical user interface. Another interesting feature is described as "Fast statistical procedures, even on very large data sets."

My problem hasn't been solved yet, because downloading PSPP and the required Cygwin software poses a challenge with my narrow bandwidth... Thus, I cannot report on the usefulness of PSPP. I'd be interested in hearing from others who have tested/used it!
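
For readers facing the same .sav-to-.csv conversion, one possible route is the Python library pandas, whose `read_spss` function reads SPSS files (it relies on the separate `pyreadstat` package being installed). The sketch below is a hedged example rather than a tested recipe for any particular file: the function name `convert_sav_to_csv` and the file names `survey.sav` and `survey.csv` are hypothetical.

```python
from pathlib import Path


def convert_sav_to_csv(sav_path, csv_path):
    """Read an SPSS .sav file and write its contents as CSV.

    Returns the (rows, columns) shape of the converted table.
    """
    sav_path = Path(sav_path)
    if not sav_path.is_file():
        raise FileNotFoundError(f"No such SPSS file: {sav_path}")

    # Imported here so the missing-file check above works even when
    # pandas is not installed. pandas.read_spss needs pyreadstat.
    import pandas as pd

    df = pd.read_spss(str(sav_path))
    df.to_csv(csv_path, index=False)
    return df.shape


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    rows, cols = convert_sav_to_csv("survey.sav", "survey.csv")
    print(f"Wrote {rows} rows and {cols} columns")
```

By default `read_spss` converts labeled SPSS values into categorical columns, so the CSV will contain the value labels rather than the underlying numeric codes.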