Friday, October 06, 2006

Time Series Forecasting Competition

Forecasting transportation demand is important for multiple goals such as staffing, planning, and inventory control. The public transportation system in Santiago de Chile is currently going through a major effort of reconstruction (if you read Spanish, you can find more at

The 2006 Business Intelligence Competition (BI CUP 2006) focuses on forecasting demand for public transportation. The organizers provide a training set consisting of a time series of passengers arriving at a terminal, and the competitors must come up with a method for forecasting the test set, which comprises a few future days.

Although this problem is a great example of data mining for business intelligence, the winner is the model that generates the smallest mean absolute error (MAE), also known as mean absolute deviation (MAD). In other words, the closeness of the forecasts to the actual values in the test set is the only criterion for winning. This setup makes the competition more of a pure data mining problem, and much less one of business intelligence. Clearly, the most accurate model might be completely impractical. For example, it might be very computationally intensive, when in practice the model is supposed to produce real-time forecasts for many different series simultaneously. Or there might be different costs associated with forecast errors at different times of the day (or different days of the week). These types of considerations, when included in the modeling phase, turn the data mining task into a business-related one.
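
To make the contrast concrete, here is a minimal Python sketch of the MAE criterion next to a cost-weighted variant. The function names, the passenger counts, and the weights are all made up for illustration; they are not from the competition data, and the weighting scheme is only one assumed way of encoding "errors at some times cost more".

```python
def mean_absolute_error(actual, forecast):
    """Average of |actual - forecast| over the test period."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def weighted_absolute_error(actual, forecast, weights):
    """Absolute errors averaged with per-period cost weights (hypothetical)."""
    if not actual or not (len(actual) == len(forecast) == len(weights)):
        raise ValueError("series must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(
        w * abs(a - f) for a, f, w in zip(actual, forecast, weights)
    )
    return weighted_sum / total_weight


# Invented passenger counts for four periods of one day.
actual = [120, 340, 310, 150]
forecast = [100, 350, 300, 180]

# Assumed costs: errors in the two middle (peak) periods count three times.
weights = [1, 3, 3, 1]

mae = mean_absolute_error(actual, forecast)
wae = weighted_absolute_error(actual, forecast, weights)
print(mae, wae)
```

Two models with the same MAE can rank differently under the weighted measure, which is the sense in which business costs can change which model is "best".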

The competition organizers promised to provide more of the business details after the competition ends, at the end of October.

Five teams of MBAs in my data mining class are now working hard on this forecasting problem. A few might decide to formally participate in the competition.

Good luck to the competitors!
