Wednesday, September 19, 2007

Webcast on Analytics in the Classroom

Tomorrow at 11:00 EST I will be giving a webcast describing several term projects by MBA students in my data mining class. Students have been working on real business projects in my class for 4 years now, and many of the projects have led to important insights for the companies that provided the data (in most cases, the students' workplaces).

For each of several cases I will describe the business objective; we'll look at the data via interactive visualization using Spotfire, and then examine some of the analyses and findings.

The webcast is organized by Spotfire (now a division of TIBCO). We have been using their interactive visualization software in classes via their b-school education outreach program.

To join the webcast tomorrow, please register: Analytics in the Classroom- Giving today's MBA's a Competitive Advantage, by Dr. Galit Shmueli, Univ. of MD

Thursday, September 06, 2007

Data mining = Evil?

Some people get a chill when they hear "data mining" because they associate it with "big brother". Well, here's one more major incident that sheds darkness on smart algorithms: the Department of Homeland Security announced the end of a data mining program called ADVISE (Analysis, Dissemination, Visualization, Insight and Semantic Enhancement). Why? Because it turns out that the program had been tested for two years on live data about real people "without meeting privacy requirements" (Yahoo! News: DHS ends criticized data-mining program).

There is nothing wrong or evil about data mining. It's like any other tool: you can use it or abuse it. Issues of privacy and confidentiality in data usage have always existed and will remain a major concern as more and more of our private data gets stored in commercial, government, and other databases.

Many students in my data mining class use data from their workplace for their term project. The projects almost always turn out to be insightful and useful beyond the class exercise. But we do always make sure to obtain permission, de-identify, and protect and restrict access to the data as needed. Good practice is the key to keeping "data mining" a positive term!

Wednesday, September 05, 2007

Shaking up the statistics community

A new book is provoking emotional reactions from the normally calm statistics community (no pun intended): The Black Swan: The Impact of the Highly Improbable by Nassim Taleb uses blunt language to critique the field of statistics, statisticians, and users of statistics. I have not yet read the book, but after reading the many reviews and coverage, I am rushing to get a copy.

The widely read ASA journal The American Statistician devoted a special section to reviewing the book and even obtained a (somewhat bland) response from the author. Four reputable statisticians (Robert Lund, Peter Westfall, Joseph Hilbe, and Aaron Brown) reviewed the book; some confront specific arguments and criticize the author for making unscientific claims. A few even include formulas and derivations. All four agree that this is an important read for statisticians and that it raises some interesting points worth pondering.

The author's experience comes from the world of finance, where he worked for investment banks and a hedge fund before making a fortune at his own hedge fund. His main claim (as I understand it from the reviews and coverage) is that analytics should focus more on the tails, or the unusual, and less on the "average". That's true in many applications (e.g., in my own research in biosurveillance, for early detection of disease outbreaks, or in anomaly detection as a whole). Before I make any other claims, though, I must rush to read the book!