Showing posts with label Analytics. Show all posts

Wednesday, September 19, 2012

Self-publishing to the rescue

The new Coursera course by Princeton Professor Mung Chiang was so popular that Amazon and the publisher ran out of copies of the textbook before the course even started (see "new website features" announcement; requires login). I experienced a stockout of my own textbook ("Data Mining for Business Intelligence") a couple of years ago, which caused grief and slight panic to both students and instructors.

With stockouts in mind, and recognizing the difficulty of obtaining textbooks outside of North America (unavailable, too expensive, or long/costly shipping), I decided to take things into my own hands and self-publish a "Practical Analytics" series of textbooks. Currently, the series has three books. All are available in soft-cover editions and Kindle editions. I used CreateSpace.com, an Amazon company, for publishing the soft-cover editions. Its print-on-demand model greatly reduces the stockout problem. I used Amazon KDP for publishing the Kindle editions, so definitely no stockouts there. Amazon makes the books available on its global websites, so they are reachable in many places worldwide (the Indian retailer Flipkart also carries the books). Finally, since I got to set the prices, I made sure to keep them affordable (for example, in India the e-books are even cheaper than in the USA).

How has this endeavor fared? Well, more than 1000 copies have been sold since March 2011. Several instructors have adopted the books for their courses. And from reader emails and ratings on Amazon, it looks like I'm on the right track.

To celebrate the power and joy of self-publishing as well as accessible and affordable knowledge, I am running a "free e-book promotion" next week. The following e-books will be available for free:

Both promotions will commence a little after midnight, Pacific Standard Time, and will last for 24 hours. To download each of the e-books, just go to the Amazon website during the promotion period and search for the title. You will then be able to download the book for free.

Enjoy, and feel free to share!

Tuesday, August 07, 2012

The mad rush: Masters in Analytics programs

The recent trend among mainstream business schools is opening a graduate program or a concentration in Business Analytics (BA). Googling "MS Business Analytics" reveals lots of big players offering such programs. A few examples (among many others) are:

These programs are intended (aside from making money) to bridge the knowledge gap between the "data or IT team" and the business experts. Graduates should be able to lead analytics teams in companies: identifying opportunities where analytics can add value, understanding pitfalls, figuring out the needed human and technical resources, and most importantly -- communicating analytics to top management. Unlike "marketing analytics" or other domain-specific programs, Business Analytics programs are "tools" oriented.

As a professor of statistics, I feel a combination of excitement and pain. The word Analytics is clearly more attractive than Statistics. But it is also broader in two senses. First, it combines methods and tools from a wider set of disciplines: statistics, operations research, artificial intelligence, computer science. Second, although technical skills are required to some degree, the focus is on the big picture and how the tools fit into the business process. In other words, it's about Business Analytics.

I am excited about the trend of BA programs because they are finally forcing disciplines such as statistics to consider the big picture and to fit into it, in both research and teaching. Research is clearly better guided by real problems. The top research journals are beginning to catch up: Management Science has an upcoming special issue on Business Analytics. As for teaching, it is exciting to teach students who are thirsty for analytics. The challenge is for instructors with PhDs in statistics, operations, computer science or other disciplines to repackage the technical knowledge into a communicable, interesting and useful curriculum. Formulas or algorithms, as beautiful as they might appear to us, are only tolerated when their beauty is clearly translated into meaningful and useful knowledge. Considering the business context requires a good deal of attention and often means modifying our own modus operandi (we've all been brainwashed by our research discipline).

But then, there's the painful part: the missed opportunity for statisticians to participate as major players (or is it envy?). The statistics community seems to be going through this cycle of "hey, how did we get left behind?". This happened with data mining, and is now happening with data analytics. The great majority of Statistics programs continually fail to be the leaders of the non-statistics world. Examining the current BA trend, I see that

  1. Statisticians are typically not the leaders of these programs. 
  2. Business schools that lack statistics faculty (and that's typical) either hire non-research statisticians as adjunct faculty to teach statistics and data mining courses, or else have these courses taught by faculty from other areas such as information systems and operations.
  3. "Data Analytics" or "Analytics" degrees are still not offered by mainstream Statistics departments. For example, North Carolina State U has an Institute for Advanced Analytics that offers an MS in Analytics degree. Yet, this does not appear to be linked to the Statistics Department's programs. Carnegie Mellon's Heinz College offers a Master's degree with a concentration in BI and BA, yet the Statistics department offers a Masters in Statistical Practice.

My greatest hope is that a new type of "analytics" research faculty member evolves. The new breed, while having deep knowledge in one field, will also possess more diverse knowledge and openness to other analytics fields (statistical modeling, data mining, operations research methods, computing, human-computer visualization principles). At the same time, for analytics research to flourish, the new-breed academic must have a foot in a particular domain, any domain, be it in the social sciences, humanities, engineering, life sciences, or other. I can only imagine the exciting collaboration among such groups of academics, as well as the value that they would bring to research, teaching and knowledge dissemination to other fields.

Thursday, August 04, 2011

The potential of being good

Yesterday I happened to hear talks by two excellent speakers, both on major data mining applications in industry. One common theme was that both speakers gave compelling and easy-to-grasp examples of what data mining algorithms and statistics can do beyond human intelligence, and how the two relate.

The first talk, by Christer Johnson of IBM Global Services, was given at the 2011 INFORMS Conference on Business Analytics and Operations Research (see video). Christer Johnson described the idea behind Watson, the artificial intelligence computer system developed by IBM that beat two champions of the Jeopardy! quiz show. Two main points in the talk about the relationship between humans and data mining methods that I especially liked are:
  1. Data analytics methods are designed not only to give an answer, but also to evaluate how confident they are about the answer. In answering the Jeopardy! questions, the data mining approach tells you not only what the most likely answer is, but also how confident you are about that answer.
  2. Building trust in an analytics tool occurs when you see it make mistakes and learn from those mistakes.

The second talk, "The Art and Science of Matching Items to Users", given by Deepak Agarwal, a Yahoo! principal research scientist and fellow statistician, was webcast at ISB's seminar series. You can still catch it on Aug 10 at Yahoo!'s Big Thinker Series in Bangalore. The talk was about recommender systems and their use within Yahoo!. Among various approaches used by Yahoo! to improve recommendations, Deepak described a main idea for improving the customization of news item displays on news.yahoo.com.

On the relation between human intelligence and automation, the process of choosing which items to display on Yahoo! is a two-step process, where first human editors create a pool of potential interesting news items, and then automated machine-learning algorithms choose which individual items to display from that pool.

Like Christer Johnson's point #1, Deepak illustrated the difference between "the answer" (what we statisticians call a point estimate) and "the potential of it being good" (what we call the confidence in the estimate, AKA variability) in a very cool way: Consider two news items of which one will be displayed to a user. The first item was already shown to 100 users and 2 users clicked on links from that page. The second was shown to 10,000 users and 250 users clicked on links. Which news item should you show to maximize clicks? (yes, this is about ad revenues...) Although the first item has a lower click-through rate (2% versus 2.5%), its rate is also less certain, in the sense that it is based on far less data than item 2's. Hence, it is potentially good. He then took this one step further: Combine the two! "Exploit what is known to be good, explore what is potentially good".
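To make the two-item example concrete, here is a minimal Python sketch. It is my own illustration, not the method described in the talk: the function names are hypothetical, the interval is the simple normal-approximation one, and the "combine the two" step is done with Thompson sampling under an assumed uniform Beta(1, 1) prior, which is just one common way to mix exploiting and exploring.

```python
import math
import random

def ctr(clicks, views):
    """Point estimate: the observed click-through rate."""
    return clicks / views

def std_error(clicks, views):
    """Normal-approximation standard error of the click-through rate."""
    p = ctr(clicks, views)
    return math.sqrt(p * (1 - p) / views)

def upper_bound(clicks, views, z=1.96):
    """Optimistic estimate: point estimate plus z standard errors."""
    return ctr(clicks, views) + z * std_error(clicks, views)

def thompson_share(items, draws=10000, seed=0):
    """Fraction of displays each item would get under Thompson sampling.

    Assumption: each item's true rate has a Beta(1 + clicks, 1 + non-clicks)
    posterior, i.e. a uniform prior. On every draw we sample a rate for each
    item and display the item with the largest sampled rate.
    """
    rng = random.Random(seed)
    wins = [0] * len(items)
    for _ in range(draws):
        samples = [rng.betavariate(1 + c, 1 + v - c) for c, v in items]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

# (clicks, views) for the two news items in the example
items = [(2, 100), (250, 10000)]

for clicks, views in items:
    print(f"views={views:>6}  CTR={ctr(clicks, views):.4f}  "
          f"SE={std_error(clicks, views):.4f}  "
          f"upper={upper_bound(clicks, views):.4f}")

print("share of displays:", thompson_share(items))
```

Item 1's 2% estimate comes with a standard error of about 1.4 percentage points, versus about 0.16 for item 2, so item 1's optimistic upper bound (about 4.7%) is well above item 2's (about 2.8%). That is the sense in which item 1 is "potentially good". Under the sampling scheme neither item is shut out: item 2 is exploited for what is known, while item 1 keeps being explored until its estimate firms up.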

So what do we have here? Very practical and clear examples of why we care about variance, the weakness of point estimates, and expanding the notion of diversification to combining certain good results with uncertain not-that-good results.

Tuesday, November 16, 2010

November Analytics magazine on BI

The November issue of INFORMS Analytics magazine has a bunch of interesting articles about business analytics and predictive analytics from a managerial point of view.