Wednesday, March 07, 2007

Source for data

Adi Gadwale, a student in my 2004 MBA Data Mining class, still remembers my fetish for business data and data visualization. He just sent me a link to an IBM Research website called Many Eyes, which includes user-submitted datasets as well as Java-applet visualizations.

The collection includes quite a few "junk" datasets, many with no description. But there are a few interesting ones: FDIC is a "scrubbed list of FDIC institutions removing inactive entities and stripping all columns apart from Assets, ROE, ROA, Offices (Branches), and State". It includes 8711 observations. Another is Absorption Coefficients of Common Materials -- I can just see the clustering exercise! Or the 2006 Top 100 Video Games by Sales. There are social-network data, time series, and cross-sectional data. But again, it's like shopping at a second-hand store -- you really have to go through a lot of junk in order to find the treasures.

Happy hunting! (And thanks to Adi.)

