Monday, December 10, 2018

Forecasting large collections of time series

With the recent launch of Amazon Forecast, I can no longer procrastinate writing about forecasting "at scale"!

Quantitative forecasting of time series has been used (and taught) for decades, with applications in many areas of business such as demand forecasting, sales forecasting, and financial forecasting. The types of methods taught in forecasting courses tend to be discipline-specific:

  • Statisticians love ARIMA (autoregressive integrated moving average) models, with multivariate versions such as Vector ARIMA, as well as state space models and non-parametric methods such as STL decomposition.
  • Econometricians and finance academics go one step further into ARIMA variations such as ARFIMA (f=fractional), ARCH (autoregressive conditional heteroskedasticity), GARCH (g=generalized), NAGARCH (n=nonlinear, a=asymmetric), and plenty more.
  • Electrical engineers use spectral analysis (the equivalent of ARIMA in the frequency domain).
  • Machine learning researchers use neural nets and other algorithms.
In practice, it is common to see 3 types of methods being used by companies for forecasting future values of a time series: exponential smoothing, linear regression, and sometimes ARIMA.
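As a rough illustration, here is a minimal sketch of all three in R, using Rob Hyndman's forecast package (discussed further down in this post); the built-in AirPassengers series is just a stand-in for whatever monthly series needs forecasting:

    library(forecast)

    y <- AirPassengers  # stand-in: any monthly ts object

    # Exponential smoothing: ets() automatically selects the error/trend/season form
    fit_es <- ets(y)

    # Linear regression with a linear trend and monthly seasonal dummies
    fit_lr <- tslm(y ~ trend + season)

    # ARIMA with automated order selection
    fit_arima <- auto.arima(y)

    # 12-month-ahead forecasts from each model
    forecast(fit_es, h = 12)
    forecast(fit_lr, h = 12)
    forecast(fit_arima, h = 12)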

[Image from https://itnext.io]
Why the difference? Because the goal is different! Statistical models such as ARIMA and all its econ flavors are often used for parameter estimation or statistical inference. Those are descriptive goals (e.g., "is this series a random walk?", "what is the volatility of the errors?"). The spectral approach of electrical engineers is often used for the descriptive goal of characterizing a series' frequencies (signal processing), or for anomaly detection. In contrast, the business applications are strictly predictive: they want forecasts of future values. The simplest methods in terms of ease of use, computation, software availability, and understandability are linear regression models and exponential smoothing. And those methods provide sufficiently accurate forecasts in many applications - hence their popularity!

ML algorithms are in line with a predictive goal, aimed solely at forecasting. ARIMA and state space models can also be used for forecasting (albeit with a different modeling process than for a descriptive goal). The reason ARIMA is commonly used in practice is, in my opinion, the availability of automated functions.

For cases with a small number of time series to forecast (a typical case in many businesses), it is usually worthwhile investing time in properly modeling and evaluating each series individually, in order to arrive at the simplest solution that provides the required level of accuracy. Data scientists are sometimes over-eager to improve accuracy beyond what is practically needed, optimizing measures such as RMSE, while the actual impact is measured in a completely different way that depends on how the forecasts are used for decision making. For example, over-forecasting demand has completely different implications from under-forecasting; users might be more averse to certain directions or magnitudes of error.
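To make that asymmetry concrete, here is a small illustrative sketch in R of a piecewise-linear ("linlin") cost in which each unit of under-forecasting is, hypothetically, three times as costly as a unit of over-forecasting; the 3:1 ratio and the numbers are made up:

    # Hypothetical asymmetric cost: under-forecasting is 3x as costly per unit
    asym_cost <- function(actual, fcast, under = 3, over = 1) {
      err <- actual - fcast  # positive error = under-forecast
      mean(ifelse(err > 0, under * err, -over * err))
    }

    actual <- c(100, 120, 90)
    asym_cost(actual, c(95, 115, 85))   # always under by 5 units: cost = 15
    asym_cost(actual, c(105, 125, 95))  # always over by 5 units:  cost = 5

Both forecasts have identical RMSE, yet very different costs under this asymmetric measure.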

But what to do when you must forecast a large collection of time series, perhaps on a frequent basis? This is "big data" in the world of time series. Amazon predicts shipping times for each shipment under different shipping methods to determine the best method (optimized jointly with other shipments taking place at the same or nearby times); Uber forecasts the ETA for each trip; Google Trends generates forecasts for any keyword a user types, in near-real time. And... IoT applications call for forecasts of the time series from each of a huge number of devices. These applications obviously cannot invest time and effort in hand-crafting a solution for each series. In such cases, automated forecasting is a practical solution. A good "big data" forecasting solution should
  • be flexible to capture a wide range of time series patterns 
  • be computationally efficient and scalable
  • be adaptable to changes in patterns that occur over time
  • provide sufficient forecasting accuracy
In my course "Business Analytics Using Forecasting" at NTHU this year, teams experienced forecasting hundreds of series from a company we're collaborating with, using various approaches and tools. The excellent forecast package in R by Rob Hyndman's team includes automated functions for ARIMA (auto.arima), exponential smoothing (ets), and a single-layer neural net (nnetar). Facebook's prophet algorithm (and R package) fits an additive regression model. Some of these methods are computationally heavier (e.g., ARIMA), so implementation matters.
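For a sense of what fully automated, large-scale forecasting looks like, here is a minimal sketch using those functions; series_list is a hypothetical named list of ts objects standing in for a real collection of hundreds of series:

    library(forecast)

    # Hypothetical collection of series (in practice: hundreds or thousands)
    series_list <- list(s1 = AirPassengers, s2 = USAccDeaths)

    # Fully automated: ets() selects a model form for each series on its own;
    # auto.arima() could be swapped in, at a higher computational cost
    fcasts <- lapply(series_list, function(y) forecast(ets(y), h = 12)$mean)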

While everyone gets excited about complex methods, the evidence in time series forecasting so far is that "simple is king": naive forecasts are often hard to beat! In the recent M4 forecasting contest (with 100,000 series), what seemed to work well were combinations (ensembles) of standard forecasting methods, such as exponential smoothing and ARIMA, combined using a machine learning method to set the ensemble weights. Pure machine learning algorithms were far inferior. The secret sauce is ensembles.
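As a toy version of that idea, here is an equal-weight ensemble of ets and auto.arima forecasts; the M4 winners learned per-series weights with a machine learning model, which is not shown here:

    library(forecast)

    y <- USAccDeaths
    h <- 12
    f_ets   <- forecast(ets(y), h = h)$mean
    f_arima <- forecast(auto.arima(y), h = h)$mean

    # Equal weights for simplicity; learned weights are the M4 secret sauce
    f_ensemble <- (f_ets + f_arima) / 2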

Because simple methods often work well, it is well worth identifying which series really do require more than a naive forecast. How about segmenting the time series into groups? Methods that first fit models to each series and then cluster the estimates are one way to go (although this can be too time-consuming for some applications). The ABC-XYZ approach is different: it divides a large set of time series into 4 types, based on difficulty of forecasting (easy/hard) and magnitude of values (high/low), the latter being indicative of a series' importance.
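A crude sketch of such a segmentation, using the coefficient of variation as a stand-in for forecasting difficulty and the series mean for magnitude; both proxies, and both cutoffs, are arbitrary choices for illustration:

    # Label each series as easy/hard (coefficient of variation) and high/low (mean)
    segment <- function(y, cv_cut = 0.5, mag_cut = 1000) {
      cv <- sd(y) / mean(y)
      paste(ifelse(cv > cv_cut, "hard", "easy"),
            ifelse(mean(y) > mag_cut, "high", "low"), sep = "-")
    }

    sapply(list(s1 = AirPassengers, s2 = USAccDeaths), segment)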

Forecasting is experiencing a new "split personality" phase: small-scale tailored forecasting applications that integrate domain knowledge vs. large-scale applications that rely on automated "mass-production" forecasting. My prediction is that these two types of problems will continue to survive and thrive, requiring different types of modeling and different skills from the modelers.

For more on forecasting methods, the process of forecasting, and evaluating forecasting solutions, see Practical Time Series Forecasting: A Hands-On Guide and the accompanying YouTube videos.

Tuesday, November 05, 2013

A Tale of Two (Business Analytics) Courses

I have been teaching two business analytics elective MBA-level courses at ISB. One is called "Business Analytics Using Data Mining" (BADM) and the other, "Forecasting Analytics" (FCAS). Although we share the syllabi for both courses, I often receive the following question, in one variant or another:
What is the difference between the two courses?
The short answer is: BADM is focused on analyzing cross-sectional data, while FCAS is focused on time series data. This answer clarifies the issue to data miners and statisticians, but sometimes leaves aspiring data analytics students perplexed. So let me elaborate.

What is the difference between cross-sectional data and time series data?
Think photography. Cross-sectional data are like a snapshot in time. We might have a dataset on a large set of customers, with their demographic information and their transactional information summarized in some form (e.g., number of visits thus far). Another example is a transactional dataset, with information on each transaction, perhaps including a flag of whether it was fraudulent. A third is movie ratings on an online movie rental website. You have probably encountered multiple examples of such datasets in the Statistics course. BADM introduces methods that use cross-sectional data for predicting the outcomes of new records. In contrast, time series data are like a video, where you collect data over time. Our focus will be on approaches and methods for forecasting a series into the future. Data examples include daily traffic, weekly demand, monthly disease outbreaks, and so forth.
How are the courses similar?
The two courses are similar in terms of flavor and focus: they both introduce the notion of business analytics, where you identify business opportunities and challenges that can potentially be tackled with data mining or statistical tools. They are both technical courses, not in the mathematical sense, but in that we do hands-on work (and a team project) with real data, learning and applying different techniques, and experiencing the entire process from business problem definition to deployment back into the business environment.
In both courses, a team project is pivotal. Teams use real data to tackle a potentially real business problem/opportunity. You can browse presentations and reports from previous years to get an idea. We also use the same software packages in both courses, called XLMiner and TIBCO Spotfire. For those on the Hyderabad campus, BADM and FCAS students will see the same instructor in both courses this year (yes, that's me).
How do the courses differ in terms of delivery?
Since last year, I have "flipped" BADM and turned it into a MOOC-style course. This means that students are expected to do some work online before each class, so that in class we can focus on hands-on data mining, higher level discussions, and more. The online component will also be open to the larger community, where students can interact with alumni and others interested in analytics. FCAS is still offered in the more traditional lecture-style mode.
Is there overlap between the courses?
While the two courses share the data mining flavor and the general business analytics approaches, they have very little overlap in terms of methods, and even then, the implementations are different. For example, while we use linear regression in both cases, it is used in different ways when predicting with cross-sectional data vs. forecasting with time series.
So which course should I take? Should I take both?
Being completely biased, I find it difficult to tell you not to take either of these courses. However, I will say that these courses require a large investment of time and effort. If you are taking other heavy courses this term, you might want to stick with only one of BADM or FCAS. Taking both courses will give you a stronger and broader skill set in data analytics, so for those interested in working in the business analytics field, I'd suggest taking both. Finally, if you register for FCAS only, you'll still be able to join the online component of BADM without registering. Although it's not as extensive as taking the course, you'll get a glimpse of data mining with cross-sectional data.
Finally, a historical note: when I taught a similar course at the University of Maryland (in 2004-2010), it was a 14-week semester-long course. In that course, which was mostly focused on cross-sectional methods, I included a chunk on forecasting, so it was a mix. However, the separation into two dedicated courses is more coherent, gives more depth, does more justice to these extremely useful methods and approaches, and allows gaining first-hand experience in the uses of these different types of data structures that are commonly encountered in any organization.

Monday, July 30, 2012

Launched new book website for Practical Forecasting book

Last week I launched a new website for my textbook Practical Time Series Forecasting. The website offers resources such as the datasets used in the book, a news block that pushes posts to the book's Facebook page, information about the book and author, and, for instructors, online forms for requesting an evaluation copy and access to solutions.

I am already anticipating my colleagues' question: "what platform did you use?". Well, I did not hire a web designer, nor did I spend three months putting the website together using HTML. Instead, I used Google Sites. This is a great solution for those who like to manage their book website on their own (whether you're self-publishing or not): very readable, clean design, integration with other Google Apps components (such as forms), and as hack-proof as it gets. Not to mention easy to update and maintain, with free hosting.

Thanks to the tools and platforms offered by Google and Amazon, self-publishing is not only a realistic option for authors. It also allows a much closer connection between the author and the book's users -- instructors, students, and "independent" readers.


Wednesday, March 07, 2012

Forecasting + Analytics = ?

Quantitative forecasting is an age-old discipline, highly useful across different functions of an organization: from forecasting sales and workforce demand to economic forecasting and inventory planning.

Business schools have offered courses with titles such as "Time Series Forecasting", "Forecasting Time Series Data", and "Business Forecasting", more specialized courses such as "Demand Planning and Sales Forecasting", and even graduate programs titled "Business and Economic Forecasting". Plain "Forecasting" is also popular. Such courses are offered at the undergraduate, graduate, and even executive-education levels. All these titles might convey the importance and usefulness of forecasting, but they are far from conveying its coolness.

I've been struggling to find a better term for the courses that I teach on-ground and online, as well as for my recent book (with the boring name Practical Time Series Forecasting). The name needed to convey that we're talking about forecasting, particularly about quantitative data-driven forecasting, plus the coolness factor. Today I discovered it! Prof Refik Soyer from GWU's School of Business will be offering a course called "Forecasting for Analytics". A quick Google search did not find any results with this particular phrase -- so the credit goes directly to Refik. I also like "Forecasting Analytics", which links it to its close cousins "Predictive Analytics" and "Visual Analytics", all members of the Business Analytics family.


Tuesday, September 06, 2011

"Predict" or "Forecast"?

What is the difference between "prediction" and "forecasting"? I heard this being asked quite a few times lately. The Predictive Analytics World conference website has a Predictive Analytics Guide page with the following Q&A:

How is predictive analytics different from forecasting?
Predictive analytics is something else entirely, going beyond standard forecasting by producing a predictive score for each customer or other organizational element. In contrast, forecasting provides overall aggregate estimates, such as the total number of purchases next quarter. For example, forecasting might estimate the total number of ice cream cones to be purchased in a certain region, while predictive analytics tells you which individual customers are likely to buy an ice cream cone.
In a recent interview on "Data Analytics", Prof Ram Gopal asked me a similar question. I have a slightly different view of the difference: the term "forecasting" is used when we have a time series and predict the series into the future. Hence "business forecasts" and "weather forecasts". In contrast, "prediction" is the act of predicting in a cross-sectional setting, where the data are a snapshot in time (say, a one-time sample from a customer database). Here you use information on a sample of records to predict the values of other records (which can be values that will be observed in the future). That's my personal distinction.



While forecasting has traditionally focused on providing "overall aggregate estimates", that has long changed, and forecasting methods are commonly used to provide individual estimates. Think again of weather forecasts: you can get forecasts for very specific areas, and daily (even minute-by-minute) forecasts are generated for many different geographical areas. Another example is SKU-level forecasting for inventory management purposes: stores and large companies often generate forecasts for every single product they carry. These are not aggregate values, but individual-product forecasts.

"Old fashioned" forecasting has indeed been around for a long time, and has been taught in statistics and operations research programs and courses. While some forecasting models require a lot of statistical expertise (such as ARIMA, GARCH and other acronyms), there is a terrific and powerful set of data-driven, computationally fast, automated methods that can be used for forecasting even at the individual product/service level. Forecasting, in my eyes, is definitely part of predictive analytics.

Saturday, April 09, 2011

Visualizing time series: suppressing one pattern to enhance another pattern

Visualizing a time series is an essential step in exploring its behavior. Statisticians think of a time series as a combination of four components: trend, seasonality, level, and noise. All real-world series contain a level and noise, but not necessarily a trend and/or seasonality. It is important to determine whether trend and/or seasonality exist in a series in order to choose appropriate models and methods for descriptive or forecasting purposes. Hence, when looking at a time plot, typical questions include:
  • Is there a trend? If so, what type of function approximates it (linear, exponential, etc.)? Is the trend fixed throughout the period, or does it change over time?
  • Is there seasonal behavior? If so, is the seasonality additive or multiplicative? Does the seasonal behavior change over time?
Exploring such questions using time plots (line plots of the series over time) is enhanced by suppressing one type of pattern to better visualize other patterns. For example, suppressing seasonality can make a trend more visible; similarly, suppressing a trend can help in seeing seasonal behavior. How do we suppress seasonality? Suppose that we have monthly data with apparent annual seasonality. To suppress the seasonality (also called seasonal adjustment), we can do one of the following (see the code sketch after the list):
  1. Plot annual data (either annual averages or sums)
  2. Plot a moving average (an average over a window of 12 months centered around each particular month)
  3. Plot 12 separate series, one for each month (e.g., one series for January, another for February and so on)
  4. Fit a model that captures monthly seasonality (e.g., a regression model with 11 monthly dummies) and look at the residual series
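Here is a minimal R sketch of options 2 and 4, using the forecast package; the built-in AirPassengers series stands in for a monthly series with annual seasonality, such as the Amtrak ridership data discussed next:

    library(forecast)

    y <- AirPassengers  # stand-in for a monthly series with annual seasonality

    # Option 2: a centered 12-month moving average suppresses the seasonality
    ma12 <- ma(y, order = 12, centre = TRUE)
    plot(y)
    lines(ma12, col = "red")

    # Option 4: regression on 11 monthly dummies; the residual series is
    # the original series with the monthly seasonality suppressed
    fit <- tslm(y ~ season)
    plot(residuals(fit))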
An example is shown in the figure. The top left panel shows the original series (monthly ridership on Amtrak trains). The bottom left panel shows a moving average line, suppressing seasonality and revealing the trend. The top right panel shows a model that captures the seasonality. The lower right panel shows the residuals from that model, again enhancing the trend.

For further details and examples, see my recently published book Practical Time Series Forecasting: A Hands-On Guide (available in soft cover and as an eBook).

Thursday, March 01, 2007

Lots of real time series data!

I love data mining and statistics competitions - they always provide great real data! However, the big difference between a gold mine and "just some data" is whether the data description and context are complete. This reflects, in my opinion, the difference between "data mining for the purpose of data mining" and "data mining for business analytics" (or any other field of interest, such as engineering or biology).

Last year, BICUP2006 posted an interesting dataset on bus ridership in Santiago de Chile. Although there was a reasonable description of the data (the number of passengers at a bus station, at 15-minute intervals), there was no information on the actual context of the problem. The goal of the competition was to accurately forecast 3 days beyond the given data. Although this has its challenges, the main question is whether a method that accurately predicts these 3 days would be useful to the Santiago Transportation Bureau, or to anyone else outside the competition. For instance, the training data included 3 weeks with a pronounced weekday/weekend effect, yet the prediction set included only 3 weekdays. A method that predicts weekdays accurately might suffer on weekends. It is therefore imperative to include the final goal of the analysis. Will this forecaster be used to assist in bus scheduling on weekdays only? During rush hours only? How accurate do the forecasts need to be for practical use? Maybe a really simple model predicts accurately enough for the purpose at hand.

Another such instance is the upcoming NN3 Forecasting Competition (part of the 2007 International Symposium on Forecasting). The dataset includes 111 time series, varying in length (about 40-140 time points). However, I have not found any description of either the data or their context. In reality we would always know at least the time frequency: are these measurements taken every second? Minute? Day? Month? Year? This information is obviously important for determining factors such as seasonality and which methods are appropriate.
To download the data and see a few examples, you will need to register your email.

An example of a gold mine is the T-competition, which concentrates on forecasting transportation data. In addition to the large number of series (ranging in length and at various frequencies from daily to yearly), there is a solid description of what each series is, and the actual dates of measurement. They even include a set of seasonal indexes for each series. The data come from an array of transportation measurements in both Europe and North America.