
Tuesday, September 06, 2011

"Predict" or "Forecast"?

What is the difference between "prediction" and "forecasting"? I have heard this question asked quite a few times lately. The Predictive Analytics World conference website has a Predictive Analytics Guide page with the following Q&A:

How is predictive analytics different from forecasting?
Predictive analytics is something else entirely, going beyond standard forecasting by producing a predictive score for each customer or other organizational element. In contrast, forecasting provides overall aggregate estimates, such as the total number of purchases next quarter. For example, forecasting might estimate the total number of ice cream cones to be purchased in a certain region, while predictive analytics tells you which individual customers are likely to buy an ice cream cone.

In a recent interview on "Data Analytics", Prof. Ram Gopal asked me a similar question. I have a slightly different view of the difference: the term "forecasting" is used when the data form a time series and we are predicting the series into the future; hence "business forecasts" and "weather forecasts". In contrast, "prediction" is the act of predicting in a cross-sectional setting, where the data are a snapshot in time (say, a one-time sample from a customer database). Here you use information on a sample of records to predict the values of other records (which may be values that will only be observed in the future). That's my personal distinction.
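To make the distinction concrete, here is a minimal sketch in Python. The two methods are my own illustrative choices, not anything prescribed by the Q&A above: the "drift" method stands in for forecasting (extending one series forward in time), and k-nearest-neighbors stands in for cross-sectional prediction (using a sample of records to estimate the outcome of another record). The function names `drift_forecast` and `knn_predict` and all the numbers are hypothetical.

```python
from statistics import mean


def drift_forecast(series, horizon):
    """Forecasting: extend a time series into the future.

    Uses the simple 'drift' method: the last observed value plus the average
    change per period, projected `horizon` periods ahead.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + slope * h for h in range(1, horizon + 1)]


def knn_predict(records, new_x, k=3):
    """Prediction (cross-sectional): estimate the outcome of a new record
    by averaging the outcomes of the k most similar records in a one-time
    sample. Each record is an (x, y) pair with a single numeric predictor."""
    nearest = sorted(records, key=lambda r: abs(r[0] - new_x))[:k]
    return mean(y for _, y in nearest)


if __name__ == "__main__":
    # Hypothetical monthly sales of one product (a time series).
    monthly_sales = [100, 104, 108, 112, 116]
    print(drift_forecast(monthly_sales, horizon=3))  # [120.0, 124.0, 128.0]

    # Hypothetical snapshot of customers: (age, monthly ice cream spend).
    customers = [(30, 2.0), (35, 2.5), (40, 3.0), (60, 5.0), (65, 5.5)]
    print(knn_predict(customers, new_x=38, k=3))  # 2.5
```

Note that the second function never looks at time order: the new customer's spend is inferred from similar customers in the same snapshot, whereas the first function uses nothing but the series' own past.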



While forecasting has traditionally focused on providing "overall aggregate estimates", that has long since changed, and forecasting methods are now commonly used to produce individual estimates. Think again of weather forecasts: you can get forecasts for very specific areas. Moreover, daily (and even minute-by-minute) weather forecasts are generated for many different geographical areas. Another example is SKU-level forecasting for inventory management. Stores and large companies often forecast demand for every product they carry. These are not aggregate values but individual-product forecasts.

"Old-fashioned" forecasting has indeed been around for a long time and has been taught in statistics and operations research programs and courses. While some forecasting models require a lot of statistical expertise (ARIMA, GARCH and other acronyms), there is also a terrific and powerful set of data-driven, computationally fast, automated methods that can be used for forecasting even at the individual product or service level. In my eyes, forecasting is definitely part of predictive analytics.
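As one illustration of such an automated, data-driven method, here is a minimal Python sketch of simple exponential smoothing applied separately to every SKU, with the smoothing constant chosen automatically from the data. This is just one example of the family of methods mentioned above, not a specific method from this post; the grid of alpha values, the initialization at the first observation, and the function names are my own hypothetical choices.

```python
def ses_fit(series, alpha):
    """Simple exponential smoothing over `series`.

    Returns the final smoothed level and the sum of squared one-step-ahead
    errors. The level is initialised at the first observation (one common,
    simple choice among several)."""
    level = series[0]
    sse = 0.0
    for y in series[1:]:
        error = y - level          # one-step-ahead forecast error
        sse += error ** 2
        level += alpha * error     # move the level toward the new observation
    return level, sse


def auto_ses_forecast(series, horizon, alphas=None):
    """Choose alpha automatically by minimising in-sample one-step SSE over a
    grid, then forecast. SES forecasts are flat: every future period gets
    the final level."""
    if alphas is None:
        alphas = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    best_alpha = min(alphas, key=lambda a: ses_fit(series, a)[1])
    level, _ = ses_fit(series, best_alpha)
    return best_alpha, [level] * horizon


def forecast_all_skus(demand_by_sku, horizon):
    """Fit and forecast every SKU independently, with no manual tuning."""
    return {sku: auto_ses_forecast(history, horizon)
            for sku, history in demand_by_sku.items()}


if __name__ == "__main__":
    # Hypothetical weekly unit sales for two SKUs.
    demand = {
        "SKU-A": [5, 5, 5, 5],
        "SKU-B": [10, 10, 10, 20, 20, 20],
    }
    for sku, (alpha, fc) in forecast_all_skus(demand, horizon=2).items():
        print(sku, alpha, [round(v, 2) for v in fc])
```

Each SKU gets its own alpha: the stable SKU-A series is indifferent to alpha, while SKU-B, whose level jumps, picks the fastest-adapting value on the grid (0.9). Because each fit is a single pass over the data, the same loop scales to thousands of products.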

Friday, January 25, 2008

New "predictive tools" from Fair Isaac

Former student Erik Anderson sent me an interesting piece in the Star Tribune: "Fair Isaac hopes its new tools lessen lenders' risk of defaults". Fair Isaac is apparently updating its method for computing FICO scores for 2008. According to the article, "in the next few weeks [Fair Isaac] will roll out a suite of tools designed to predict future default risk". The emphasis is on predicting. In other words, given a database of past credit reports, a model is developed for predicting default risk.

I would be surprised if this were a new methodology, but it is very hard to decipher what is really new. Erik pointed out the following paragraph (note the huge reported improvement):

"The new tools include revamping the old credit-scoring formula so that it penalizes consumers with a high debt load more than the earlier version. The update, dubbed FICO 08, should increase predictive strength by 5 to 15 percent, according to Fair Isaac's vice president of scoring, Tom Quinn."

So what is new in the 2008 predictor? The inclusion of a new debt load variable? A different binning of debt into categories? A different way of incorporating debt into the model? A new model altogether? Or maybe the model, simply refit on the most recent data, now has a much larger parameter estimate for debt load than models fit on earlier data.
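The last possibility is easy to illustrate. FICO's actual formula is proprietary, so nothing below is Fair Isaac's method: it is a minimal sketch, assuming a one-predictor logistic regression of default on a hypothetical debt-load measure (a debt-to-income ratio), fit with the same code to two made-up data "vintages". The names `expand`, `fit_logistic`, `OLDER` and `RECENT`, and all the counts, are hypothetical.

```python
import math


def expand(groups):
    """Turn (debt_load, n_defaults, n_accounts) summaries into per-account
    records: xs holds debt load, ys holds 1 for default and 0 otherwise."""
    xs, ys = [], []
    for x, n_def, n_tot in groups:
        xs += [x] * n_tot
        ys += [1] * n_def + [0] * (n_tot - n_def)
    return xs, ys


def fit_logistic(xs, ys, lr=0.5, epochs=10000):
    """One-predictor logistic regression, P(default) = 1 / (1 + exp(-(b0 + b1*x))),
    fit by batch gradient ascent on the average log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient w.r.t. intercept
            g1 += (y - p) * x    # gradient w.r.t. debt-load coefficient
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical data "vintages": (debt-to-income ratio, defaults, accounts).
OLDER = [(0.2, 2, 10), (0.8, 3, 10)]
RECENT = [(0.2, 1, 10), (0.8, 6, 10)]

if __name__ == "__main__":
    for label, groups in (("older data", OLDER), ("recent data", RECENT)):
        _, b1 = fit_logistic(*expand(groups))
        print(f"{label}: debt-load coefficient = {b1:.2f}")
```

With only two debt levels per vintage, the fitted model reproduces the observed default rates exactly, so the coefficients can be checked by hand: (logit(0.3) - logit(0.2)) / 0.6 is about 0.90 for the older data and (logit(0.6) - logit(0.1)) / 0.6 is about 4.34 for the recent data. The "model" is unchanged; only the data it was fit to differ, yet debt load is now penalized much more heavily.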