Wednesday, September 03, 2008

Data conversion and open-source software

Recently I was trying to open a data file that was created in the statistical software SPSS. SPSS, a competitor to SAS, is widely used in the social sciences and appears to have gained some ground here in Bhutan. With the slow and erratic internet connection in Bhutan, though, I've failed time and again to use the software through our school's portal. Finding the local SPSS representative seemed a bit surreal, and so I went off trying to solve the problem in another way.

First stop: Googling "convert .sav to .csv" led me nowhere. SPSS and SAS both have an annoying "feature" of keeping data in file formats that are very hard to convert. A few software packages now import data from SAS databases, but I was unable to find a software package that will import from SPSS. This led me to a surprising finding: PSPP. Yes, that's right: PSPP, previously known as FIASCO, is an open-source "free replacement for the proprietary program, SPSS." The latest version even boasts a graphical user interface. Another interesting feature is described as "Fast statistical procedures, even on very large data sets."

My problem hasn't been solved yet, because downloading PSPP and the required Cygwin software is a challenge with my narrow bandwidth... Thus, I cannot report on the usefulness of PSPP. I'd be interested in hearing from others who have tested or used it!

7 comments:

Mike W. said...

Have you found your solution? I have Cygwin running on my home Windows machine and might be able to convert the file by writing a Perl script, if that's possible. Either way, I'm curious to know the answer and get feedback on PSPP. I'd be happy to help.

Galit Shmueli said...

So here's how the story ended: I tried converting the files with PSPP. Although it was able to open the files, PSPP does not have an export option... So once again, you are left jailed inside the SPSS environment.

The solution was to go back to the file originators and use their SPSS to save the files to CSV... They didn't know how to do it, so we had to show them (very easy: File > Save As).

If you do write a Perl script for converting files, I think many, many people would thank you if you posted it online!

Loggy said...

Use R and library(foreign). This will read and write SPSS files (adapted in fact from PSPP) as well as a lot of other formats.

Galit Shmueli said...

Thanks John! This is a very nice solution for statisticians. The "foreign" library in R is described at, and can be downloaded from, http://www.cran.r-project.org/web/packages/foreign

BUT, for non-statisticians or non-R users in general, there is still no reasonable solution for converting SPSS data! There may be a business opportunity here...

rishaad said...

Perhaps I'm missing something.

Given an SPSS file with (say) three variables x, y and z, it's trivial to convert it to CSV with syntax like:

WRITE OUTFILE='mydata.csv'
/x * ',' y * ',' z *.

EXECUTE.

This should work for any version of SPSS (at least any less than 20 years old). It also works with PSPP.

What's the big problem?

Galit Shmueli said...

Hi Rishaad -- indeed it is very easy if you have SPSS. The big problem is when you do not have SPSS. How do you convert a .sav file that someone gives you? I hope this clarifies.
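For readers facing that situation, one possible route that needs neither SPSS nor R is sketched below in Python. It is only a sketch under stated assumptions: it assumes the pandas library and its optional pyreadstat dependency are installed, and the helper names `csv_path_for` and `sav_to_csv` are made up for illustration. It has not been tried on the files discussed in this post.

```python
from pathlib import Path


def csv_path_for(sav_path):
    """Return the output path: same location and name, with a .csv extension."""
    return Path(sav_path).with_suffix(".csv")


def sav_to_csv(sav_path):
    """Read an SPSS .sav file and write its contents to a CSV file next to it.

    Hypothetical helper for illustration. Assumes pandas is installed together
    with pyreadstat, which pandas.read_spss relies on to parse the file.
    """
    # Imported here so the module still loads when pandas is not installed.
    import pandas as pd

    df = pd.read_spss(sav_path)   # one column per SPSS variable
    out_path = csv_path_for(sav_path)
    df.to_csv(out_path, index=False)  # index=False omits the row-number column
    return out_path
```

Calling `sav_to_csv("mydata.sav")` would then be expected to produce `mydata.csv` in the same folder. By default `read_spss` converts labelled values to their labels, so the CSV may show category names rather than the underlying numeric codes.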

janika said...

Thanks! PSPP is a really good solution for SPSS.