Showing posts with label teaching. Show all posts

Tuesday, September 05, 2017

My videos for “Business Analytics using Data Mining” now publicly available!

Five years ago, in 2012, I decided to experiment in improving my teaching by creating a flipped classroom (and semi-MOOC) for my course “Business Analytics Using Data Mining” (BADM) at the Indian School of Business. I initially designed the course at University of Maryland’s Smith School of Business in 2005 and taught it until 2010. When I joined ISB in 2011 I started teaching multiple sections of BADM (which was started by Ravi Bapna in 2006), and the course was fast growing in popularity. Repeating the same lectures in multiple course sections made me realize it was time for scale! I therefore created 30+ videos, covering various supervised methods (k-NN, linear and logistic regression, trees, naive Bayes, etc.) and unsupervised methods (principal components analysis, clustering, association rules), as well as important principles such as performance evaluation, the notion of a holdout set, and more.

I created the videos to support teaching with our textbook “Data Mining for Business Analytics” (the 3rd edition and a SAS JMP edition came out last year; R edition coming out this month!). The videos highlight the key points in different chapters, (hopefully) motivating the watcher to read more in the textbook, which also offers more examples. The videos’ order follows my course teaching, but the topics are mostly independent.

The videos were a big hit in the ISB courses. Since moving to Taiwan, I've created and offered a similar flipped BADM course at National Tsing Hua University, and the videos are also part of the Statistics.com Predictive Analytics series. I’ve since added a few more topics (e.g., neural nets and discriminant analysis).

The audience for the videos (and my courses and textbooks) is non-technical folks who need to understand the logic and uses of data mining, at the managerial level. The videos are therefore about problem solving, and hence the "Business Analytics" in the title. They are different from the many excellent machine learning videos and MOOCs in focus and in technical level -- a basic statistics course that covers linear regression and some business experience should be sufficient for understanding the videos.
For 5 years, and until last week, the videos were only available to past and current students. However, the word spread and many colleagues, instructors, and students have asked me for access. After 5 years, and in celebration of the first R edition of our textbook Data Mining for Business Analytics: Concepts, Techniques, and Applications in R, I decided to make it happen. All 30+ videos are now publicly available on my BADM YouTube playlist.


Currently the videos cater only to those who understand English. I opened the option for community-contributed captions, in the hope that folks will contribute captions in different languages to help make the knowledge propagate further.

This new playlist complements a similar set of videos, on "Business Analytics Using Forecasting" (for time series), that I created at NTHU and made public last year, as part of a MOOC offered on FutureLearn with the next round opening in October.

Finally, I’ll share that I shot these videos while I was living in Bhutan. They are all homemade -- I tried to filter out barking noises and to time the recording when ceremonies were not held close to our home. If you’re interested in how I made the materials and what lessons I learned for flipping my first course, check out my 2012 post.

Saturday, February 07, 2015

Teaching spaces: "Analytics in a Studio"

My first semester at NTHU has been a great learning experience. I introduced and taught two new courses in our new Business Analytics concentration (data mining and forecasting). Both courses met once a week for a 3-hour session for a full semester (18 weeks). Although I've taught these courses in different forms, in different countries, and to different audiences, I had a special discovery this time. I discovered the critical role of the learning space in the quality of teaching and learning, specifically for a topic that combines technical, creative, and communication skills.

"Case study" classroom
In my many years of experience as a student and later as a professor at multiple universities, I've experienced two types of spaces: a lecture hall and a "case study" classroom. While the latter is more conducive to in-class discussions, both spaces put the instructor (and his/her slides) in the front, separated from most of the students, and place the students in rows. In both cases the instructor is typically standing or moving around, while the students are immobile. Not being exposed to alternatives, I am ashamed to say that I never doubted this arrangement. Until this semester.

Like all discoveries, it started from a challenge: the classroom allocated for my courses was a wide room with two long rows, hardly any space for the instructor and no visibility of the slides for most of the students on the sides. My courses had 20-30 students each. My first attempt was to rearrange the tables to create a U-shape, so that students could see each other and the slides. In hindsight, I was trying to create more of a "case study" environment. After one session I realized it didn't work. The U was too long and narrow and there was a feeling of suffocation. And stagnancy. One of my classes was transferred to a case-type classroom. I was relieved. But for the other class there was no such classroom available. I examined a few different classrooms, but they were all lecture halls suitable for larger audiences.

Teams tackle a challenge using a whiteboard
And then, I discovered "the studio". Intended for design workshops, this was a room with no tables or chairs, with walls that are whiteboards plus double-sided whiteboards on wheels. In a corner was a stack of hard sponge blocks and a few low foldable square tables. There's a projector and a screen. I decided to take the plunge with the data mining course, since it is designed as a blended course where class time is devoted to discussions and hands-on assignments and experiences. [Before coming to class, students read and watch videos, take a short quiz, and contribute to an online discussion].

Here is how we used the space: At least half of each session engaged teams of students in a problem/question that they needed to tackle using a whiteboard. The challenges I came up with emerged from the interaction with the students - from the online discussion board, from discussing the online quizzes, and from confusion/difficulties in the pre-designed in-class assignments. After each team worked on their board, we all moved from board to board, the team explained their approach, and I highlighted something about each solution/attempt. This provided great learning for everyone, including myself, since different teams usually approached the problems in different ways. And they encountered different problems or insights.
Students give feedback on other teams' proposals

The setup was also conducive to team project feedback. After each team presented their proposal, the other teams provided them feedback by writing on their "wall" (whiteboard). This personal touch - rather than an email or discussion board - seems to make a difference in how the feedback is given and perceived.

Smartphones were often used to take photos of different boards - their own as well as others' boards.

Student demos software to others
During periods of the sessions where students needed to work on laptops, many chose to spread out on the floor - a more natural posture for many folks than sitting at a desk. Some used the sponges to place their laptops. A few used a square table where 4 people faced each other.

We also used the space to start class with a little stretching and yoga! The students liked the space. So did two colleagues (Prof. Rob Hyndman and Prof. Joao Moreira) who teach analytics courses at their universities and visited my courses. Some students complained at first about sitting on the hard floor, so I tried to make sure they don't sit for long, or at least not passively. My own "old school" bias made me forget how it feels to be passively sitting.

Visitor Prof. Moreira experiences the studio
Although I could see the incredible advantages during the semester, I waited till its end to write this post. My perspective now is that teaching analytics in a studio is revolutionary. The space supports deeper learning, beneficial collaboration both within groups and across groups, better personalization of the teaching level by stronger communication between the instructor and students, and overall a high-energy and positive experience for everyone. One reason that makes "analytics in a studio" so powerful is the creativity aspect in data analytics. You use statistical and data mining foundations, but the actual problem-solving requires creativity and out-of-the-box thought.

From my experience, the requirements for "analytics in a studio" to work are:
  1. Students must come prepared to class with the needed technical basics (e.g., via reading/video watching/etc.) 
  2. The instructor must be flexible in terms of the specifics taught. I came into class focused on 2-3 main points students needed to learn, I had in-class assignments, and designed teams-on-whiteboards challenges on-the-fly. 
  3. The instructor is no longer physically in the center, but s/he must be an effective integrator, challenger, and guide of the directions taken. This allows students to unleash their abilities, but in a constructive way. It also helps avoid a feeling of "what did we learn?"
How does "analytics in a studio" scale to larger audiences? I am not sure. While class sizes of many Analytics programs are growing to meet the demand, top programs and educators should carefully consider the benefits of smaller class sizes in terms of learning and learning experience. And they should carefully choose their spaces.

Friday, December 19, 2014

New curriculum design guidelines by American Statistical Association: Who will teach?

The American Statistical Association published new "Curriculum Guidelines for Undergraduate Programs in Statistical Science". This is the first update to the guidelines since 2000.
The executive summary lists the key points:
  1. Increased importance of data science
  2. Real applications
  3. More diverse models and approaches
  4. Ability to communicate
This set sounds right on target with what is expected of statisticians in industry (the authors of the report include prominent statisticians in industry). It highlights the current narrow focus of statistics programs as well as their lack of real-world usability. 

I found three notable mentions in the descriptions of the above points:
Point #1: "Students should be fluent in higher-level programming languages and facile with database systems."
Point #2: "Students require exposure to and practice with a variety of predictive and explanatory models in addition to methods for model-building and assessment."
Point #3: "Students need to be able to communicate complex statistical methods in basic terms to managers and other audiences and to visualize results in an accessible manner"
Agree! But - are Statistics faculty qualified to teach these topics/skills? Since these capabilities are not built into most Statistics graduate programs, faculty in Statistics departments typically have not been exposed to these topics, nor to methods for teaching them (two different skills!). While one can delegate programming to computer science instructors, a gap is being created between the students' abilities and the Statistics faculty abilities.

Point #2 talks about prediction and explanation - an extremely important distinction for both practicing and research statisticians. This topic is still quite blurred in the Statistics community as well as in many other domains, and textbooks have still not caught up, thereby creating a gap in needed teaching materials.

Point #3 is an interesting one: while data visualization is a key concept in Statistics, it is typically used in the context of the Exploratory Data Analysis, where charts and summaries are used by the statistician to understand the data prior to analysis. Point #3 talks about a different use of visualization, for the purpose of communication between the statistician and the stakeholder. This requires a different approach to visualization, different from classic classes on box plots, histograms, and computing percentiles.

To summarize: great suggestions for improving the undergrad curriculum. But, successful implementation requires professional development for most faculty teaching in such programs.

Let me add my own key point, which is a critical issue underlying many data scandals and sagas: "Students need to understand what population their final cleaned sample generalizes to". The issue of generalization, not just in the sense of statistical inference, is at the heart of using data to come up with insights and decisions for new records and/or in new situations. After sampling, cleaning (!!), pre-processing, and analyzing the data, you often end up with results that are relevant to a very restricted population, which is far from what you initially intended.

As an aside: note the use of the term "Data Science" in the report - a term now claimed by statisticians, operations researchers, computer scientists and anyone trying to ride the new buzz. What does it mean here? The report reads (page 7):
Although a formal definition of data science is elusive, we concur with the StatsNSF committee statement that data science comprises the “science of planning for, acquisition, management, analysis of, and inference from data.”
Oops - what about non-inference uses such as prediction? and communication?

Tuesday, January 15, 2013

What does "business analytics" mean in academia?

But what exactly does this mean?
In the recent ISIS conference, I organized and moderated a panel called "Business Analytics and Big Data: How it affects Business School Research and Teaching". The goal was to tackle the ambiguity in the terms "Business Analytics" and "Big Data" in the context of business school research and teaching. I opened with a few points:

  1. Some research b-schools are posting job ads for tenure-track faculty in "Business Analytics" (e.g., University of Maryland; Google "professor business analytics position" for plenty more). What does this mean? What is supposed to be the background of these candidates, and where are they supposed to publish to get promoted? ("The Journal of Business Analytics"?)
  2. A recent special issue of the top scholarly journal Management Science was devoted to "Business Analytics". What types of submissions fall under this label? What types do not?
  3. Many new "Business Analytics" programs have been springing up in business schools worldwide. What is new about their offerings? 

Panelists Anitesh, Ram and David - photo courtesy of Ravi Bapna
The panelists were a mix of academics (Prof Anitesh Barua from UT Austin and Prof Ram Chellapah from Emory University) and industry (Dr. David Hardoon, SAS Singapore). The audience was also a mixed crowd of academics mostly from MIS departments (in business schools) and industry experts from companies such as IBM and Deloitte.

The discussion took various twists and turns with heavy audience discussion. Here are several issues that emerged from the discussion:

  • Is there any meaning to BA in academia or is it just the use of analytics (=data tools) within a business context? Some industry folks said that BA is only meaningful within a business context, not research wise.
  • Is BA just a fancier name for statisticians in a business school or does it convey a different type of statistician? (similar to the adoption of "operations management" (OM) by many operations research (OR) academics)
  • The academics on the panel made the point that BA has been changing the flavor of research in terms of adding a discovery/exploratory dimension that does not typically exist in social science and IS research. Rather than only theorize-then-test-with-data, data are now explored in further detail using tools such as visualization and micro-level models. The main concern, however, was that it is still very difficult to publish such research in top journals.
  • With respect to "what constitutes a BA research article", Prof. Ravi Bapna said "it's difficult to specify what papers are BA, but it is easy to spot what is not BA".
  • While machine learning and data mining have been around for some good time, and the methods have not really changed, the application of both within a business context has become more popular due to friendlier software and stronger computing power. These new practices are therefore now an important core in MBA and other business programs. 
  • One type of b-school program that seems to lag behind on the BA front is the PhD program. Are we equipping our PhD students with abilities to deal with and take advantage of large datasets for developing theory? Are PhD programs revising their curriculum to include big data technologies and machine learning capabilities as required core courses?
Some participants claimed that BA is just another buzzword that will go away after some time, so we need not worry about defining it or demystifying it. After all, the software vendors coin such terms, create a buzz, and finally the buzz moves on. Whether this is the case with BA or with Big Data is yet to be seen. In the meantime, we should ponder whether we are really doing something new in our research, and if so, pinpoint what exactly it is and how to formulate it as requirements for a new era of researchers.

Thursday, October 04, 2012

Flipping and virtualizing learning

Adopting new technology for teaching has been one of my passions, and luckily my students have been understanding even during glitches or choices that turn out to be ineffective (such as the mobile/Internet voting technology that I wrote about last year). My goal has been to use technology to make my courses more interactive: I use clickers for in-class polling (to start discussions and assess understanding, not for grading!); last year, after realizing that my students were constantly on Facebook, I finally opened a Facebook account and ran a closed FB group for out-of-class discussions; in my online courses on statistics.com I created interactive lessons (slides with media, quizzes, etc.) using Udutu.com. On the pedagogical side, I have tried to focus on hands-on learning: team projects took the place of exams, alongside in-class presentations and homework that get your hands dirty.

But all these were just baby steps, preparing me for the big leap. In the last month, I have been immersed in a complete transformation of one of my on-ground courses: The new approach is a combination of a new technology and a recent pedagogical movement. The pedagogical side is called 'flipping the classroom', where class time is not spent on one-directional lecturing but rather on discussions and other interactive activities. The technological leap is the move towards a Massive Open Online Course (MOOC) – but in my case a "moderate open online course". As a first step, the course will be open only to the community of the Indian School of Business (students, alumni, faculty and staff). The long term plan is to open it up globally.

The course Business Analytics using Data Mining is opening in less than two weeks. I've been working round-the-clock creating content for the online and on-ground components, figuring out the right technologies that can support all the requirements, and collaborating with colleagues at CrowdANALYTIX and at Hansa Cequity to integrate large local datasets and a platform for running data mining contests into the course.

Here are the ingredients that I found essential:
  • You need strong support from the university! Luckily, ISB is a place that embraces innovation and is willing to evaluate cutting-edge teaching approaches.
  • A platform that is easy for a (somewhat tech-savvy) instructor to design, to upload materials, to update, to interact with participants, and in general, to run. If you are a control freak like me, the last thing you want is to need to ask someone else to upload, edit, or change things. After researching many possibilities, I decided to use the Google platform. Not the new Google Course Builder platform (who has time for programming in JavaScript?), but rather a unique combination of Google Sites, Google Drive, Google Groups, YouTube embedding, etc. The key is Google Sites, which is an incredibly versatile tool (and free! thanks Google!). Another advantage of Google Sites is that you have the solid backbone of Google behind you. If your university uses Google Apps for Education, all the better (we hope to move there soon...)
  • Definitely worthwhile to invest in good video editing software. This was a painful experience. After starting with one program that was causing grief, I switched to Camtasia Studio, and very quickly purchased a license. It is an incredibly powerful yet simple-to-use program for recording video+screen+audio and then editing (cutting out coughs, for instance)
  • Hardware for lecture videos: Use a good webcam that also has a good mic. I learned that audio quality is the biggest reason for people to stop watching a video. Getting the Thimphu street dogs to stop barking is always a challenge. If you're in a power-outage prone area, make sure to get a back-up battery (UPS).
  • Have several people go over the course platform to make sure that all the links work, the videos stream, etc. Also, get someone to assist with participants' technical queries. There are always those who need hand-holding.
The way the course will work at ISB is that the ISB community can join the online component (lecture videos, guided reading, online forum, contests). Registered students will also attend on-ground meetings that will focus on discussions, project-based learning, and other interactive activities. 

We opened registration to the community today and there are already more than 200 registrants. I guess everyone is curious! Whether the transformation will be a huge success or will die out quietly is yet to be seen. But for sure, there will be insights and learning for all of us.


Tuesday, August 07, 2012

The mad rush: Masters in Analytics programs

The recent trend among mainstream business schools is opening a graduate program or a concentration in Business Analytics (BA). Googling "MS Business Analytics" reveals lots of big players offering such programs. A few examples (among many others) are:

These programs are intended (aside from making money) to bridge the knowledge gap between the "data or IT team" and the business experts. Graduates should be able to lead analytics teams in companies, identifying opportunities where analytics can add value, understanding pitfalls, being able to figure out the needed human and technical resources, and most importantly -- communicating analytics with top management. Unlike "marketing analytics" or other domain-specific programs, Business Analytics programs are "tools" oriented.

As a professor of statistics, I feel a combination of excitement and pain. The word Analytics is clearly more attractive than Statistics. But it is also broader in two senses. First, it combines methods and tools from a wider set of disciplines: statistics, operations research, artificial intelligence, computer science. Second, although technical skills are required to some degree, the focus is on the big picture and how the tools fit into the business process. In other words, it's about Business Analytics.

I am excited about the trend of BA programs because finally they are able to force disciplines such as statistics into considering the large picture and fitting in both in terms of research and teaching. Research is clearly better guided by real problems. The top research journals are beginning to catch up: Management Science has an upcoming special issue on Business Analytics. As for teaching, it is exciting to teach students who are thirsty for analytics. The challenge is for instructors with PhDs in statistics, operations, computer science or other disciplines to repackage the technical knowledge into a communicable, interesting and useful curriculum. Formulas or algorithms, as beautiful as they might appear to us, are only tolerated when their beauty is clearly translated into meaningful and useful knowledge. Considering the business context requires a good deal of attention and often modifying our own modus operandi (we've all been brainwashed by our research discipline).

But then, there's the painful part of the missed opportunity for statisticians to participate as major players (or is it envy?). The statistics community seems to be going through this cycle of "hey, how did we get left behind?". This happened with data mining, and is now happening with data analytics. The great majority of Statistics programs continuously fail to be the leaders of the non-statistics world. Examining the current BA trend, I see that

  1. Statisticians are typically not the leaders of these programs. 
  2. Business schools who lack statistics faculty (and that's typical) are either hiring non-research statisticians as adjunct faculty to teach statistics and data mining courses or else these courses are taught by faculty from other areas such as information systems and operations.
  3. "Data Analytics" or "Analytics" degrees are still not offered by mainstream Statistics departments. For example, North Carolina State U has an Institute for Advanced Analytics that offers an MS in Analytics degree. Yet, this does not appear to be linked to the Statistics Department's programs. Carnegie Mellon's Heinz Business College offers a Master's degree with a concentration in BI and BA, yet the Statistics department offers a Masters in Statistical Practice.
My greatest hope is that a new type of "analytics" research faculty member evolves. The new breed, while having deep knowledge in one field, will also possess more diverse knowledge and openness to other analytics fields (statistical modeling, data mining, operations research methods, computing, human-computer visualization principles). At the same time, for analytics research to flourish, the new breed of academic must have a foot in a particular domain, any domain, be it in the social sciences, humanities, engineering, life-sciences, or other. I can only imagine the exciting collaboration among such groups of academics, as well as the value that they bring to research, teaching and knowledge dissemination to other fields.

Wednesday, March 07, 2012

Forecasting + Analytics = ?

Quantitative forecasting is an age-old discipline, highly useful across different functions of an organization: from  forecasting sales and workforce demand to economic forecasting and inventory planning.

Business schools have offered courses with titles such as "Time Series Forecasting", "Forecasting Time Series Data", "Business Forecasting", more specialized courses such as "Demand Planning and Sales Forecasting" or even graduate programs with the title "Business and Economic Forecasting". Simple "Forecasting" is also popular. Such courses are offered at the undergraduate, graduate and even executive education levels. All these might convey the importance and usefulness of forecasting, but they are far from conveying the coolness of forecasting.

I've been struggling to find a better term for the courses that I teach on-ground and online, as well as for my recent book (with the boring name Practical Time Series Forecasting). The name needed to convey that we're talking about forecasting, particularly about quantitative data-driven forecasting, plus the coolness factor. Today I discovered it! Prof Refik Soyer from GWU's School of Business will be offering a course called "Forecasting for Analytics". A quick Google search did not find any results with this particular phrase -- so the credit goes directly to Refik. I also like "Forecasting Analytics", which links it to its close cousins "Predictive Analytics" and "Visual Analytics", all members of the Business Analytics family.


Monday, April 25, 2011

Google Spreadsheets for teaching probability?

In business schools it is common to teach statistics courses using Microsoft Excel, due to its wide accessibility and the familiarity of business students with the software. There is a large debate regarding this practice, but at this point the reality is clear: the figure that I am familiar with is about 50% of basic stat courses in b-schools use Excel and 50% use statistical software such as Minitab or JMP.

Another trend is moving from offline software to "cloud computing" -- software such as www.statcrunch.com offers basic stat functions in an online, collaborative, social-networky style.

Following the popularity of spreadsheet software and the cloud trend, I asked myself whether the free Google Spreadsheets can actually do the job. This is part of my endeavors to find free (or at least widely accessible) software for teaching basic concepts. While Google Spreadsheets does have quite an extensive function list, I discovered that its current computing capability is very limited. For example, computing binomial probabilities using the function BINOMDIST is limited to a sample size of about 130 (I did report this problem). Similarly, HYPGEOMDIST results in overflow errors for reasonably small sample and population sizes.


From the old days when we used to compute binomial probabilities manually, I am guessing that whoever programmed these functions forgot to use the tricks that avoid computing high factorials in n-choose-k type calculations...
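
To illustrate the kind of trick I have in mind, here is a minimal Python sketch (not Google's code, and only my guess at what a fix might look like). It works on the log scale and uses the log-gamma function instead of factorials, so n-choose-k never becomes a huge number. The function names log_choose, binom_pmf and hypergeom_pmf are my own, chosen for illustration.

```python
import math

def log_choose(n, k):
    """Log of n-choose-k via log-gamma, so no large factorial is ever formed."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed on the log scale."""
    if k < 0 or k > n:
        return 0.0
    # Handle the degenerate cases separately to avoid log(0).
    if p == 0.0:
        return 1.0 if k == 0 else 0.0
    if p == 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = log_choose(n, k) + k * math.log(p) + (n - k) * math.log1p(-p)
    return math.exp(log_pmf)

def hypergeom_pmf(k, pop_size, pop_successes, n_draws):
    """P(X = k) successes in n_draws without replacement from a population
    of pop_size items, of which pop_successes are successes."""
    lowest = max(0, n_draws - (pop_size - pop_successes))
    highest = min(n_draws, pop_successes)
    if k < lowest or k > highest:
        return 0.0
    log_pmf = (log_choose(pop_successes, k)
               + log_choose(pop_size - pop_successes, n_draws - k)
               - log_choose(pop_size, n_draws))
    return math.exp(log_pmf)

# A sample size far beyond the roughly 130 limit mentioned above.
print(binom_pmf(500, 1000, 0.5))
print(hypergeom_pmf(50, 10000, 1000, 500))
```

Only sums and differences of moderate-sized logarithms are computed, and the result is exponentiated once at the end, so sample sizes in the thousands pose no overflow problem.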