Tuesday, January 22, 2013

Business analytics student projects a valuable ground for industry-academia ties

Since October 2012, I have taught multiple courses on data mining and on forecasting. Teams of students worked on projects spanning various industries, from retail to eCommerce to telecom. Each project presents a business problem or opportunity that is translated into a data mining or forecasting problem. Using real data, the team then executes the analytics solution, evaluates it and presents recommendations. A select set of project reports and presentations is available on my website (search for 2012 Nov and 2012 Dec projects).

For this year's projects, we used three datasets from regional sources (thanks to our industry partners Hansa Cequity and TheBargain.in). One is a huge dataset from an Indian chain of hypermarkets. Another contains data on electronic gadgets listed on online shopping sites in India. The third is a large survey on mobile usage conducted in India. These datasets were also used in several data mining contests that we set up during the course through CrowdANALYTIX.com and Kaggle.com. The contests were open to the public and indeed attracted submissions from around the world.

Business analytics courses are an excellent ground for industry-academia partnerships. Unlike one-way interactions such as industry guest lectures, internships, or student site visits, a business analytics project conducted by student teams (with faculty guidance) creates value both for the industry partner who shares the data and for the students. Students who have gained a basic understanding of data analytics can be creative about new uses of the data that companies have not considered (this can be achieved through "ideation contests"). Companies can also use this ground for piloting or testing the use of their data to address goals of interest, with little investment. Students get first-hand experience with regional data and problems, and can showcase their projects when they interview for positions that require such expertise.

So what is the catch? Building a strong relationship requires good, open-minded industry partners and a faculty member who can lead such efforts. This is a new role for most faculty teaching traditional statistics or data mining courses. Managing data confidentiality, creating data mining contests, and initiating and maintaining open communication channels with all stakeholders are nontrivial tasks. But they are well worth the effort.

Thursday, January 17, 2013

Predictive modeling and interventions (why you need post-intervention data)

In the last few months I've been involved in nearly 20 data mining projects done by student teams at ISB, as part of the MBA-level course and an executive education program. All projects relied on real data. One of the data sources was transactional data from a large regional hypermarket. While the projects ranged across a large spectrum of business goals and opportunities for retail, one point in particular struck me as recurring across many projects and in many face-to-face discussions: the use of secondary data (data that were already collected for some other purpose) for making decisions and deriving insights regarding future interventions.

By intervention I mean any action. In a marketing context, we can think of personalized coupons, advertising, customer care, etc.

In particular, many teams defined a data mining problem that would help them determine appropriate target marketing. For example, predict whether a customer's next shopping trip will include dairy products, and then use this prediction to offer appropriate promotions. Another example: predict whether a relatively new customer will be a high-value customer at the end of a year (as defined by some metric related to the customer's spending or shopping behavior), and use this prediction to target such customers for a "white glove" service. In other words, build a predictive model for deciding whom to target, when, and with what offer. While this approach seemed natural to many students and professionals, it has two major sticky points:

  1. We cannot properly evaluate the performance of the model in terms of actual business impact without post-intervention data. The reason is that without historical data on a similar intervention, we cannot evaluate how the targeted intervention will perform. For instance, while we can use a large existing transactional database to predict who is most likely to purchase dairy products, we cannot tell whether those customers would redeem a coupon targeted to them unless we have data from a similar past coupon campaign.
  2. We cannot build a predictive model that is optimized for the intervention goal unless we have post-intervention data. For example, if coupon redemption is the intervention's performance metric, we cannot build a predictive model that optimizes coupon redemption unless we have data on coupon redemption.

A predictive model is trained on past data. To evaluate the effect of an intervention, we must have some post-intervention data, both to build a model that aims at optimizing the intervention goal and to evaluate model performance in light of that goal. A pilot study (or pilot period) is therefore a good way to start: deploy the intervention either to a random sample or to the sample that a predictive model indicates is optimal in some way. It is best to do both, deploying to a sample that includes both randomly chosen and model-chosen customers. Once you have post-intervention data on the intervention results, you can build a predictive model to optimize results on a future, larger-scale intervention.
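The pilot design described above, with one randomly chosen group and one model-chosen group, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Customer` record, its `propensity` score, and the helpers `assign_pilot` and `redemption_rate` are hypothetical names, and the scores are assumed to come from a model trained on existing (pre-intervention) transactional data.

```python
import random
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    # Hypothetical score from a model trained on pre-intervention transactions,
    # e.g. the predicted probability of buying dairy on the next trip.
    propensity: float


def assign_pilot(customers, arm_size, seed=0):
    """Split a pilot into a random arm and a model-targeted arm.

    The random arm is a simple random sample of all customers; the model arm
    is the top-scored customers among those not already in the random arm.
    Returns (random_arm, model_arm) as lists of customer ids.
    """
    if 2 * arm_size > len(customers):
        raise ValueError("not enough customers for two disjoint arms")
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    random_arm = rng.sample([c.customer_id for c in customers], arm_size)
    chosen = set(random_arm)
    # Rank the remaining customers by model score, highest first.
    rest = [c for c in customers if c.customer_id not in chosen]
    rest.sort(key=lambda c: c.propensity, reverse=True)
    model_arm = [c.customer_id for c in rest[:arm_size]]
    return random_arm, model_arm


def redemption_rate(arm, redeemed_ids):
    """Fraction of an arm that redeemed the coupon (post-intervention outcome)."""
    if not arm:
        raise ValueError("empty arm")
    return sum(1 for cid in arm if cid in redeemed_ids) / len(arm)


# Usage with toy scores: 100 customers whose propensity rises with their id.
customers = [Customer(i, i / 100) for i in range(100)]
random_arm, model_arm = assign_pilot(customers, arm_size=10, seed=42)
print(len(random_arm), len(model_arm))  # 10 10
```

Once the pilot's redemption records are in, comparing `redemption_rate(random_arm, redeemed)` with `redemption_rate(model_arm, redeemed)` indicates whether model-based targeting beats random deployment, and those same records supply the redemption labels needed to train the next model. Drawing the random arm first keeps it a true random sample of the whole customer base, rather than of only the customers the model did not pick.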

Tuesday, January 15, 2013

What does "business analytics" mean in academia?

But what exactly does this mean?
At the recent ISIS conference, I organized and moderated a panel called "Business Analytics and Big Data: How it affects Business School Research and Teaching". The goal was to tackle the ambiguity of the terms "Business Analytics" and "Big Data" in the context of business school research and teaching. I opened with a few points:

  1. Some research b-schools are posting job ads for tenure-track faculty in "Business Analytics" (e.g., University of Maryland; Google "professor business analytics position" for plenty more). What does this mean? What background are these candidates supposed to have, and where are they supposed to publish to get promoted? ("The Journal of Business Analytics"?)
  2. A recent special issue of the top scholarly journal Management Science was devoted to "Business Analytics". What types of submissions fall under this label? What types do not?
  3. Many new "Business Analytics" programs have been springing up in business schools worldwide. What is new about their offerings? 

Panelists Anitesh, Ram and David - photo courtesy of Ravi Bapna
The panelists were a mix of academics (Prof. Anitesh Barua from UT Austin and Prof. Ram Chellappa from Emory University) and industry (Dr. David Hardoon, SAS Singapore). The audience was also mixed: academics, mostly from MIS departments in business schools, and industry experts from companies such as IBM and Deloitte.

The discussion took various twists and turns, with heavy audience participation. Here are several issues that emerged:

  • Is there any meaning to BA in academia, or is it just the use of analytics (=data tools) within a business context? Some industry folks said that BA is meaningful only within a business context, not as a research area.
  • Is BA just a fancier name for statisticians in a business school, or does it convey a different type of statistician? (This is similar to the adoption of "operations management" (OM) by many operations research (OR) academics.)
  • The academics on the panel made the point that BA has been changing the flavor of research by adding a discovery/exploratory dimension that does not typically exist in social science and IS research. Rather than only following a theorize-then-test-with-data approach, researchers now explore data in further detail using tools such as visualization and micro-level models. The main concern, however, was that it is still very difficult to publish such research in top journals.
  • With respect to "what constitutes a BA research article", Prof. Ravi Bapna said "it's difficult to specify what papers are BA, but it is easy to spot what is not BA".
  • While machine learning and data mining have been around for quite some time, and the methods have not really changed, the application of both within a business context has become more popular due to friendlier software and stronger computing power. These new practices are therefore now an important core component of MBA and other business programs.
  • One type of b-school program that seems to lag behind on the BA front is the PhD program. Are we equipping our PhD students with the ability to handle and take advantage of large datasets for developing theory? Are PhD programs revising their curricula to include big data technologies and machine learning as required core courses?

Some participants claimed that BA is just another buzzword that will go away after some time, so we need not worry about defining or demystifying it. After all, software vendors coin such terms, create a buzz, and eventually the buzz moves on. Whether this is the case with BA or with Big Data remains to be seen. In the meantime, we should ponder whether we are really doing something new in our research, and if so, pinpoint exactly what it is and how to formulate it as requirements for a new era of researchers.