Thursday, September 21, 2006

Dylan on data exploration

The ease of use of many data analysis and data mining software packages has led to a dangerous tendency to jump to the model-fitting stage without proper data exploration. Getting an initial understanding of the data via summarization and visualization is crucial for building good models.
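A minimal sketch of what such a first look might involve, assuming the data are a single numeric column. The helper names `summarize` and `text_histogram` and the wind-speed values are made up for illustration; any statistics package offers richer equivalents.

```python
import statistics

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "median": statistics.median(ordered),
        "mean": statistics.mean(ordered),
        "max": ordered[-1],
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }

def text_histogram(values, bins=5, width=40):
    """Return the lines of a crude text histogram with equal-width bins."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin number `bins`, so clamp it.
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + span * i / bins
        lines.append(f"{left_edge:8.1f} | {'#' * round(c / peak * width)}")
    return lines

# Made-up wind speeds, for illustration only.
wind_speeds = [3.1, 4.0, 4.2, 5.5, 5.9, 6.3, 7.0, 7.4, 8.8, 21.5]

summary = summarize(wind_speeds)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
print("\n".join(text_histogram(wind_speeds)))
```

In this toy sample the mean sits above the median and the histogram shows one isolated bar at the high end, the kind of outlier worth noticing before any model is fit.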

Mike Melcer, a current MBA student in my data mining class, mentioned that Bob Dylan knew this well. He sings, "You don't need a weatherman to know which way the wind blows" (from "Subterranean Homesick Blues"). The weatherman can, however, quantify the speed of the wind and the temperature. In other words, the modeling phase is there to formalize and quantify what you learn in the data exploration phase. But you do have to stick your head out of the window first.
