Wednesday, January 27, 2010

Drag-and-drop data mining software for the classroom

The drag-and-drop (D&D) concept in data mining tools is very neat. You "drag" icons (aka "nodes") that perform different operations, and "connect" them to create a data mining process. This is also called "graphical programming". What I especially like about it is that it keeps the big picture in your mind, so you don't get blinded by analysis details. The end product is also much easier to present and document.
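For readers who think in code, here is a rough sketch of what "nodes" and "connections" amount to under the hood. It is only an illustration in plain Python, not how any of the packages discussed here are actually implemented; the `Node` class, its `connect` and `run` methods, and the toy data are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Node:
    """One icon on the canvas: a named operation plus its outgoing arrows."""
    name: str
    operation: Callable[[Any], Any]
    downstream: List["Node"] = field(default_factory=list)

    def connect(self, other: "Node") -> "Node":
        """Draw an arrow from this node to `other`; return `other` so calls chain."""
        self.downstream.append(other)
        return other

    def run(self, data: Any, trace: Optional[List[str]] = None) -> List[Any]:
        """Apply this node's operation, then pass the result along each arrow.

        Returns the outputs of the terminal nodes reachable from this node.
        """
        if trace is not None:
            trace.append(self.name)
        result = self.operation(data)
        if not self.downstream:
            return [result]
        outputs: List[Any] = []
        for node in self.downstream:
            outputs.extend(node.run(result, trace))
        return outputs


# Hypothetical toy process: data source -> drop missing values -> mean.
rows = [4.0, None, 6.0, 8.0]
source = Node("data source", lambda _: rows)
clean = Node("drop missing", lambda xs: [x for x in xs if x is not None])
summary = Node("mean", lambda xs: sum(xs) / len(xs))

# "Connecting" the icons is just chaining the nodes together.
source.connect(clean).connect(summary)

trace: List[str] = []
outputs = source.run(None, trace)
print(" -> ".join(trace))  # the "big picture" of the process
print(outputs)             # result of the final node
```

The printed trace is the textual equivalent of the diagram you would see on the canvas, which is exactly the part that makes D&D processes easy to present and document.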

There has been quite a shake-up lately among a few of the major D&D data mining software tools. Clementine (by SPSS, now IBM) is now called "IBM SPSS Modeler". Insightful Miner (by Insightful, now TIBCO) is now TIBCO Spotfire Miner. SAS Enterprise Miner remains SAS EM. And STATISTICA Data Miner by StatSoft also remains in the same hands.

There's a good comparison of these four tools (and two more non-D&D, menu-driven tools: KXEN and XLMiner) in a 2006 article by Nisbet, which compares performance, pricing, and more.

Let me look at the choice of a D&D package from the perspective of a professor teaching a data mining course in a business school. My own considerations are: (1) easy and fast to learn, (2) easy for my students to access, (3) cheap enough for our school to purchase, and (4) reasonably priced for students after they graduate. It's also nice to have good support (when things break down or when you just can't figure something out). And some instructors also like additional teaching materials.

I've had the longest experience with SAS EM, but it has been a struggle. At first we had individual student licenses, where each student had to install the software from a set of CDs that I had to circulate among them. The size of the software choked too many computers. So we moved to the server version (which allows students to use the software through our portal), but that has been excruciatingly slow. The server version is also quite expensive for the school. The potential solution was to move to the "SAS OnDemand" product, where the software is accessed online and sits on the SAS servers. SAS offers this through the SAS OnDemand for Academics (SODA) program, and it is faster. However, as I ranted in another post, SODA currently can only load SAS datasets. And finally, SAS EM is extremely expensive outside of academia. The likelihood that my students will have access to it in their post-graduation jobs is therefore low.

I recently discovered Spotfire Miner (by TIBCO) and played around with it. It is very easy and quick to learn, runs fast, and happily accepts a wide range of data file types. Cost for industry is currently $349/month. For use in the classroom it is free to both instructors and students (as part of TIBCO's University Program)!

I can't say much about IBM SPSS Modeler (previously known as Clementine) or StatSoft's STATISTICA Data Miner, except that after looking thoroughly through their websites I couldn't find any mention of pricing for academia or for industry. And I usually don't like the "request a quote" route, which tends to leave my mailbox full of promotional materials forever (probably the result of a data mining algorithm used for direct marketing!). That leaves me with open questions about both packages: Is the academic version identical to the full-blown version? Is it a standalone installation, or do you install it on a server?

For instructors who like extra materials: SAS offers a wealth of data mining teaching materials (you must contact them to receive the materials). StatSoft has a nice series of YouTube videos on different data mining topics and a brief PDF tutorial on data mining (they also have the awesome free Electronic Statistics Textbook which is a bit like an encyclopedia). I don't know of data mining teaching materials for the other packages (and couldn't find any on their websites).

It would be great to hear from other instructors and MBA students about their classroom (and post-graduation) experience with D&D software.