Tuesday, February 26, 2008

Data mining competition season

Those who have been following my posts probably recall "competition season," when all of a sudden several interesting new datasets appear, each framing a business problem that calls for a combination of data mining and creativity.

Two such competitions are the SAS Data Mining Shootout and the 2008 Neural Forecasting Competition (NN5). The SAS problem concerns revenue management for an airline that wants to improve its customer satisfaction. The NN5 competition is about forecasting cash withdrawals from ATMs.

The two competitions have several things in common. Both provide real data and reasonably realistic business problems. More interestingly, both include time series forecasting tasks; according to a recent survey on the popularity of different data mining techniques, time series appear to be growing more and more prominent. Both also require registration to access the data (I didn't compare their terms of use, though that would be another interesting comparison), and both welcome any type of modeling. Finally, each is tied to a conference where competitors can present their results and methods.

What would be really nice is if, as with KDD, the winners' papers were published online and made publicly available.
