Tuesday, September 06, 2011

"Predict" or "Forecast"?

What is the difference between "prediction" and "forecasting"? I have heard this question asked quite a few times lately. The Predictive Analytics World conference website has a Predictive Analytics Guide page with the following Q&A:

How is predictive analytics different from forecasting?
Predictive analytics is something else entirely, going beyond standard forecasting by producing a predictive score for each customer or other organizational element. In contrast, forecasting provides overall aggregate estimates, such as the total number of purchases next quarter. For example, forecasting might estimate the total number of ice cream cones to be purchased in a certain region, while predictive analytics tells you which individual customers are likely to buy an ice cream cone.

In a recent interview on "Data Analytics", Prof. Ram Gopal asked me a similar question. My view of the difference is slightly different: the term "forecasting" applies when the data are a time series and we are predicting future values of that series. Hence "business forecasts" and "weather forecasts". In contrast, "prediction" is the act of predicting in a cross-sectional setting, where the data are a snapshot in time (say, a one-time sample from a customer database). Here you use information on a sample of records to predict the value of other records (which can be a value that will be observed in the future). That's my personal distinction.
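
To make the time-series versus cross-sectional distinction concrete, here is a minimal Python sketch. The data, the function names, and the two deliberately simple methods (a last-value forecast and a nearest-neighbor prediction) are illustrative assumptions of mine, not something taken from the interview or the guide page.

```python
# Forecasting: the data are one series ordered in time, and we extend it forward.
def naive_forecast(series, horizon):
    """Repeat the last observed value for each of the next `horizon` periods."""
    return [series[-1]] * horizon


# Prediction: the data are a snapshot of records. Records with known outcomes
# are used to predict the outcome of other records.
def nearest_neighbor_predict(labeled, new_record):
    """Return the outcome of the labeled record closest to `new_record`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, outcome = min(labeled, key=lambda row: squared_distance(row[0], new_record))
    return outcome


# Hypothetical monthly sales, oldest month first.
monthly_sales = [120, 132, 128, 141]
print(naive_forecast(monthly_sales, horizon=3))

# Hypothetical customer snapshot: (age, visits per month) -> bought ice cream?
customers = [((25, 8), True), ((61, 1), False), ((34, 5), True)]
print(nearest_neighbor_predict(customers, (58, 2)))
```

In the first case the order of the observations carries the information. In the second, the records could be shuffled without changing the result.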

Forecasting has traditionally focused on providing "overall aggregate estimates", but that changed long ago, and forecasting methods are now commonly used to provide individual estimates. Think again of weather forecasts: you can get forecasts for very specific areas. Moreover, daily (and even minute-by-minute) weather forecasts are generated for many different geographical areas. Another example is SKU-level forecasting for inventory management. Stores and large companies often forecast demand for every product they carry. These are not aggregate values, but individual-product forecasts.
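
As a rough sketch of the SKU-level case, the Python snippet below forecasts each product separately and, for comparison, forecasts the summed series. The SKU names, the sales figures, and the choice of a three-week moving average are all hypothetical and only there to show the shape of the computation.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)


# Hypothetical weekly unit sales per SKU, oldest week first.
weekly_sales = {
    "SKU-001": [10, 12, 11, 13],
    "SKU-002": [40, 38, 42, 40],
    "SKU-003": [0, 1, 0, 2],
}

# One forecast per product, which is what an inventory decision needs.
sku_forecasts = {
    sku: moving_average_forecast(series) for sku, series in weekly_sales.items()
}

# A single aggregate forecast built from the week-by-week totals.
total_series = [sum(week) for week in zip(*weekly_sales.values())]
aggregate_forecast = moving_average_forecast(total_series)

print(sku_forecasts)
print(aggregate_forecast)
```

The aggregate number says nothing about which product to reorder, whereas the per-SKU forecasts do. Both come from the same forecasting method.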

"Old-fashioned" forecasting has indeed been around for a long time, and has been taught in statistics and operations research programs and courses. While some forecasting models require a lot of statistical expertise (such as ARIMA, GARCH and other acronyms), there is a terrific and powerful set of data-driven, computationally fast, automated methods that can be used for forecasting even at the individual product/service level. Forecasting, in my eyes, is definitely part of predictive analytics.
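
The post does not name a specific data-driven method, so as one possible illustration here is simple exponential smoothing in plain Python. The function name, the demand figures, and the fixed smoothing constant `alpha=0.2` are assumptions made for this sketch. In practice the constant would usually be tuned on held-out data.

```python
def simple_exponential_smoothing(series, alpha=0.2):
    """Return the one-step-ahead forecast for `series`.

    The level is updated as: level = alpha * y + (1 - alpha) * level,
    starting from the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# Hypothetical daily demand for a single product, oldest day first.
demand = [20, 22, 19, 24, 23]
print(simple_exponential_smoothing(demand))
```

Each update is a single multiply-and-add, so the same function can be run over thousands of product-level series with no manual model specification.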


Rob J Hyndman said...

I've heard the time series/cross sectional distinction before, although I tend to use predict and forecast interchangeably in both contexts myself. But the aggregate/individual distinction is new to me and seems impossible to use consistently in practice. How much disaggregation is required before you would switch from forecasting to prediction?

Galit Shmueli said...

Good point, Rob. Another point: I find that the distinction between "forecast" and "predict" is more prominent among statisticians and econometricians (who often concentrate more on model "appropriateness" than on predictive accuracy), who use it to emphasize the difference between models that account for temporal dependence directly (such as ARIMA) and those that don't. In contrast, data miners (who are typically focused on the result in terms of predictive power) will blur the terms and often even apply cross-sectional methods such as neural nets to time series data for purposes of forecasting.

Dushyant said...

Really nice discussion :)

Having worked in the analytics industry for 5 years, I am more of the opinion that forecasting and prediction are completely different terms, though the two have been used interchangeably.

Forecasting: a method that uses historical data, assuming there is a trend that will continue. Forecasting is basically the "prediction" of that trend!

Prediction: a method that builds a model (a many-to-many dependency model) of the real-world business problem and uses it to determine the future.

From an altogether different perspective, Aryans "predicted" the end of the world in 2012 based on certain "factors". They did not "forecast" the end of the world based on previous data :)

Galit Shmueli said...

Dushyant - I would just add that "prediction" need not be only of the future. You could try to predict the response of some records (not necessarily future ones) based on responses of other records. For instance, you can try to predict the chance of fraud in some existing transactions, given a sample of transactions with known fraud/non-fraud labels.