Showing posts with label teaching business data mining. Show all posts

Tuesday, September 05, 2017

My videos for “Business Analytics using Data Mining” now publicly available!

Five years ago, in 2012, I decided to experiment in improving my teaching by creating a flipped classroom (and semi-MOOC) for my course “Business Analytics Using Data Mining” (BADM) at the Indian School of Business. I initially designed the course at University of Maryland’s Smith School of Business in 2005 and taught it until 2010. When I joined ISB in 2011 I started teaching multiple sections of BADM (which was started by Ravi Bapna in 2006), and the course was fast growing in popularity. Repeating the same lectures in multiple course sections made me realize it was time for scale! I therefore created 30+ videos, covering various supervised methods (k-NN, linear and logistic regression, trees, naive Bayes, etc.) and unsupervised methods (principal components analysis, clustering, association rules), as well as important principles such as performance evaluation, the notion of a holdout set, and more.

I created the videos to support teaching with our textbook “Data Mining for Business Analytics” (the 3rd edition and a SAS JMP edition came out last year; the R edition comes out this month!). The videos highlight the key points in different chapters, (hopefully) motivating the viewer to read more in the textbook, which also offers more examples. The videos’ order follows my course teaching, but the topics are mostly independent.

The videos were a big hit in the ISB courses. Since moving to Taiwan, I've created and offered a similar flipped BADM course at National Tsing Hua University, and the videos are also part of the Statistics.com Predictive Analytics series. I’ve since added a few more topics (e.g., neural nets and discriminant analysis).

The audience for the videos (and my courses and textbooks) is non-technical folks who need to understand the logic and uses of data mining, at the managerial level. The videos are therefore about problem solving, and hence the "Business Analytics" in the title. They are different from the many excellent machine learning videos and MOOCs in focus and in technical level -- a basic statistics course that covers linear regression and some business experience should be sufficient for understanding the videos.
For five years, until last week, the videos were available only to past and current students. But word spread, and many colleagues, instructors, and students have asked me for access. In celebration of the first R edition of our textbook Data Mining for Business Analytics: Concepts, Techniques, and Applications in R, I decided to make it happen. All 30+ videos are now publicly available on my BADM YouTube playlist.


Currently the videos cater only to those who understand English. I've opened the option for community-contributed captions, in the hope that folks will contribute captions in different languages to help the knowledge propagate further.

This new playlist complements a similar set of videos, on "Business Analytics Using Forecasting" (for time series), that I created at NTHU and made public last year, as part of a MOOC offered on FutureLearn with the next round opening in October.

Finally, I’ll share that I shot these videos while I was living in Bhutan. They are all homemade -- I tried to filter out barking noises and to time the recording when ceremonies were not held close to our home. If you’re interested in how I made the materials and what lessons I learned for flipping my first course, check out my 2012 post.

Wednesday, August 19, 2015

Categorical predictors: how many dummies to use in regression vs. k-nearest neighbors

Recently I've had discussions with several instructors of data mining courses about a fact that is often left out of many books but is quite important: the different treatment of dummy variables in different data mining methods.

From http://blog.excelmasterseries.com
Statistics courses that cover linear or logistic regression teach us to be careful when including a categorical predictor variable in our model. Suppose we have a categorical variable with m categories (e.g., m countries). First, we convert it into m binary variables called dummy variables, D1, D2,..., Dm (e.g., D1=1 if Country=Japan and 0 otherwise; D2=1 if Country=USA and 0 otherwise, etc.). Then, we include only m-1 of the dummy variables in the regression model. The major point is to exclude one of the m dummy variables to avoid redundancy. The excluded dummy's category is called the "reference category". Mathematically, it does not matter which dummy you exclude, but the resulting coefficients are interpreted relative to the reference category, so if interpretation is important it is useful to choose the reference category as the one we most want to compare against.

In linear and logistic regression models, including all m variables will lead to perfect multicollinearity, which will typically cause failure of the estimation algorithm. Smarter software will identify the problem and drop one of the dummies for you. That is why every statistics book or course on regression will emphasize the need to drop one of the dummy variables.

Now comes the surprising part: when using categorical predictors in machine learning algorithms such as k-nearest neighbors (kNN) or classification and regression trees, we keep all m dummy variables. The reason is that in such algorithms we do not create linear combinations of all predictors. A tree, for instance, will choose a subset of the predictors. If we leave out one dummy, then if that category differs from the other categories in terms of the output of interest, the tree will not be able to detect it! Similarly, dropping a dummy in kNN would not incorporate the effect of belonging to that category into the distance used.
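To make the contrast concrete, here is a minimal pandas sketch showing the two codings side by side (the column and category names are hypothetical, not from any particular dataset):

```python
import pandas as pd

# Toy categorical predictor with m = 3 categories (hypothetical data).
df = pd.DataFrame({"Country": ["Japan", "USA", "India", "USA", "Japan"]})

# Regression coding: m - 1 dummies. drop_first=True drops the
# alphabetically first category ("India"), which becomes the reference.
X_reg = pd.get_dummies(df["Country"], drop_first=True)

# kNN / tree coding: keep all m dummies so every category is directly
# available to splits and distance computations.
X_ml = pd.get_dummies(df["Country"])

print(X_reg.shape[1], X_ml.shape[1])  # prints "2 3"
```

Note that with the m-1 coding, records from the reference category become all-zeros rows: a tree cannot split on "is this record from India?", and in kNN that category's contribution to the distance is lost.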

The only case where dummy variable inclusion is treated equally across methods is for a two-category predictor, such as Gender. In that case a single dummy variable will suffice in regression, kNN, CART, or any other data mining method.

Saturday, February 07, 2015

Teaching spaces: "Analytics in a Studio"

My first semester at NTHU has been a great learning experience. I introduced and taught two new courses in our new Business Analytics concentration (data mining and forecasting). Both courses met once a week for a 3-hour session for a full semester (18 weeks). Although I've taught these courses in different forms, in different countries, and to different audiences, I made a special discovery this time: the critical role of the learning space in the quality of teaching and learning, especially for a topic that combines technical, creative, and communication skills.

"Case study" classroom
In my many years of experience as a student and later as a professor at multiple universities, I've experienced two types of spaces: a lecture hall and a "case study" classroom. While the latter is more conducive to in-class discussions, both spaces put the instructor (and his/her slides) in the front, separated from most of the students, and place the students in rows. In both cases the instructor is typically standing or moving around, while the students are immobile. Not being exposed to alternatives, I am ashamed to say that I never doubted this arrangement. Until this semester.

Like all discoveries, it started from a challenge: the classroom allocated for my courses was a wide room with two long rows, hardly any space for the instructor and no visibility of the slides for most of the students on the sides. My courses had 20-30 students each. My first attempt was to rearrange the tables to create a U-shape, so that students could see each other and the slides. In hindsight, I was trying to create more of a "case study" environment. After one session I realized it didn't work. The U was too long and narrow and there was a feeling of suffocation. And stagnancy. One of my classes was transferred to a case-type classroom. I was relieved. But for the other class there was no such classroom available. I examined a few different classrooms, but they were all lecture halls suitable for larger audiences.

Teams tackle a challenge using a whiteboard
And then, I discovered "the studio". Intended for design workshops, this was a room with no tables or chairs, with walls that are whiteboards plus double-sided whiteboards on wheels. In a corner was a stack of hard sponge blocks and a few low foldable square tables. There's a projector and a screen. I decided to take the plunge with the data mining course, since it is designed as a blended course where class time is devoted to discussions and hands-on assignments and experiences. [Before coming to class, students read and watch videos, take a short quiz, and contribute to an online discussion].

Here is how we used the space: At least half of each session engaged teams of students in a problem/question that they needed to tackle using a whiteboard. The challenges I came up with emerged from the interaction with the students - from the online discussion board, from discussing the online quizzes, and from confusion/difficulties in the pre-designed in-class assignments. After each team worked on their board, we all moved from board to board, the team explained their approach, and I highlighted something about each solution/attempt. This provided great learning for everyone, including myself, since different teams usually approached the problems in different ways. And they encountered different problems or insights.
Students give feedback on other teams' proposals

The setup was also conducive to team project feedback. After each team presented their proposal, the other teams provided feedback by writing on the presenting team's "wall" (whiteboard). This personal touch - rather than an email or discussion board - seems to make a difference in how the feedback is given and perceived.

Smartphones were often used to take photos of different boards - their own as well as others'.

Student demos software to others
During periods of the sessions when students needed to work on laptops, many chose to spread out on the floor - a more natural posture for many folks than sitting at a desk. Some used the sponge blocks to support their laptops. A few used a square table where 4 people faced each other.

We also used the space to start class with a little stretching and yoga! The students liked the space. So did two colleagues (Prof. Rob Hyndman and Prof. Joao Moreira) who teach analytics courses at their universities and visited my courses. Some students complained at first about sitting on the hard floor, so I tried to make sure they didn't sit for long, or at least not passively. My own "old school" bias made me forget how it feels to be passively sitting.

Visitor Prof. Moreira experiences the studio
Although I could see the incredible advantages during the semester, I waited till its end to write this post. My perspective now is that teaching analytics in a studio is revolutionary. The space supports deeper learning, beneficial collaboration both within groups and across groups, better personalization of the teaching level through stronger communication between the instructor and students, and overall a high-energy and positive experience for everyone. One reason "analytics in a studio" is so powerful is the creativity aspect of data analytics: you use statistical and data mining foundations, but the actual problem-solving requires creativity and out-of-the-box thinking.

From my experience, the requirements for "analytics in a studio" to work are:
  1. Students must come prepared to class with the needed technical basics (e.g., via reading/video watching/etc.) 
  2. The instructor must be flexible in terms of the specifics taught. I came into class focused on 2-3 main points students needed to learn, brought prepared in-class assignments, and designed teams-on-whiteboards challenges on the fly. 
  3. The instructor is no longer physically in the center, but s/he must be an effective integrator, challenger, and guide of the directions taken. This allows students to unleash their abilities, but in a constructive way. It also helps avoid a feeling of "what did we learn?"
How does "analytics in a studio" scale to larger audiences? I am not sure. While class sizes of many Analytics programs are growing to meet the demand, top programs and educators should carefully consider the benefits of smaller class sizes in terms of learning and learning experience. And they should carefully choose their spaces.

Thursday, November 28, 2013

Running a data mining contest on Kaggle

Following the success last year, I've decided once again to introduce a data mining contest in my Business Analytics using Data Mining course at the Indian School of Business. Last year, I used two platforms: CrowdAnalytix and Kaggle. This year I am again using Kaggle. They offer free competition hosting for university instructors, called Kaggle InClass.

Setting up a competition on Kaggle is not trivial, and I'd like to share some tips that I discovered to help colleagues. Even if you successfully hosted a Kaggle contest a while ago, some things have changed (as I've discovered). With some assistance from the Kaggle support team, who are extremely helpful, I was able to decipher the process. So here goes:

Step #1: get your dataset into the right structure. Your initial dataset should include input and output columns for all records (assuming that the goal is to predict the outcome from the inputs). It should also include an ID column with running index numbers.

  • Save this as an Excel or CSV file. 
  • Split the records into two datasets: a training set and a test set. 
  • Keep the training and test datasets in separate CSV files. For the test set, remove the outcome column(s).
  • Kaggle will split the test set into private and public subsets, and will score each of them separately. Results for the public records will appear on the leaderboard; only you will see the results for the private subset. If you want to assign the records yourself to public/private, create a column Usage in the test dataset and type Private or Public for each record.
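The steps above can be scripted. Here is a sketch of the splitting step in Python with pandas (the column names, the 70/30 split, and the 50/50 public/private assignment are my own illustrative choices, not Kaggle requirements):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy full dataset: ID, inputs, and outcome (hypothetical columns).
df = pd.DataFrame({
    "ID": range(1, 101),
    "x1": rng.normal(size=100),
    "outcome": rng.integers(0, 2, size=100),
})

# Randomly assign ~70% of records to training, the rest to test.
is_train = rng.random(100) < 0.7
train = df[is_train]
test = df[~is_train].drop(columns=["outcome"])  # remove outcome from test

# Optional: fix the public/private split yourself via a Usage column.
test = test.assign(
    Usage=np.where(rng.random(len(test)) < 0.5, "Public", "Private"))

train.to_csv("train.csv", index=False)
test.to_csv("test.csv", index=False)
```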

Step #2: open a Kaggle InClass account and start a competition using the wizard. Filling in the Basic Details and Entry & Rules pages is straightforward.

Step #3: The tricky page is Your Data. Here you'll need to follow this sequence to get it working:

  1. Choose the evaluation metric to be used in the competition. Kaggle has a bunch of different metrics to choose from. In my two Kaggle contests, I actually wanted a metric that was not on the list, and voila! the support team was able to help by activating a metric that was not generally available for my competition. Last year I used a lift-type measure. This year it is an average-cost-per-observation metric for a binary classification task. In short, if you don't find exactly what you're looking for, it is worth asking the folks at Kaggle.
  2. After the evaluation metric is set, upload a solution file (CSV format). This file should include only an ID column (with the IDs for all the records that participants should score), and the outcome column(s). If you include any other columns, you'll get error messages. The first row of your file should include the names of these columns.
  3. After you've uploaded a solutions file, you'll be able to see whether it was successful or not. Aside from error messages, you can see your uploaded files. Scroll to the bottom and you'll see the file that you've submitted; or if you submitted multiple times, you'll see all the submitted files; if you selected a random public/private partition, the "derived solution" file will include an extra column with labels "public" and "private". It's a good idea to download this file, so that you can later compare your results with the system.
  4. After the solution file has been successfully uploaded and its columns mapped, you must upload a "sample submission file". This file is used to map the columns in the solutions file with what needs to be measured by Kaggle. The file should include an ID column like that in the solution file, plus a column with the predictions. Nothing more, nothing less. Again, the first row should include the column names. You'll have an option to define rules about allowed values for these columns.
  5. After successfully submitting the sample submission file, you will be able to test the system by submitting (mock) solutions in the "submission playground". One good test is using the naive rule (in a classification task, submit all 0s or all 1s). Compare your result to the one on Kaggle to make sure everything is set up properly.
  6. Finally, in the "Additional data files" you upload the two data files: the training dataset (which includes the ID, input and output columns) and the test dataset (which includes the ID and input columns). It is also useful to upload a third file, which contains a sample valid submission. This will help participants see what their file should look like, and they can also try submitting this file to see how the system works. You can use the naive-rule submission file that you created earlier to test the system.
  7. That's it! The rest (Documentation, Preview and Overview) are quite straightforward. After you're done, you'll see a button "submit for review". You can also share the contest with another colleague prior to releasing it. Look for "Share this competition wizard with a coworker" on the Basic Details page.
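For the sanity check in step 5, you can score the naive-rule submission locally and compare against what Kaggle reports. Below is a sketch using a made-up average-cost-per-observation metric; the IDs, outcomes, and cost values are all hypothetical, not those of the actual competition:

```python
import pandas as pd

# Hypothetical solution file (what you uploaded to Kaggle)...
solution = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                         "outcome": [0, 1, 0, 0, 1]})

# ...and a naive-rule submission predicting the majority class (all 0s).
submission = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                           "prediction": [0, 0, 0, 0, 0]})

# Example scoring: average cost per observation, with assumed costs of
# 1.0 for a missed positive and 0.1 for a false alarm.
merged = solution.merge(submission, on="ID")
cost = (1.0 * ((merged.outcome == 1) & (merged.prediction == 0))
        + 0.1 * ((merged.outcome == 0) & (merged.prediction == 1))).mean()
print(round(cost, 2))  # compare this to the score Kaggle reports
```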
If I've missed tips or tricks that others have used, please do share. My current competition, "predicting cab booking cancellation" (using real data from YourCabs in Bangalore) has just started, and it will be open not only to our students, but to the world. 
Submission deadline: Midnight Dec 22, 2013, India Standard Time. All welcome!

Tuesday, November 05, 2013

A Tale of Two (Business Analytics) Courses

I have been teaching two business analytics elective MBA-level courses at ISB. One is called "Business Analytics Using Data Mining" (BADM) and the other, "Forecasting Analytics" (FCAS). Although we share the syllabi for both courses, I often receive the following question, in one variant or another:
What is the difference between the two courses?
The short answer is: BADM is focused on analyzing cross-sectional data, while FCAS is focused on time series data. This answer clarifies the issue to data miners and statisticians, but sometimes leaves aspiring data analytics students perplexed. So let me elaborate.

What is the difference between cross-sectional data and time series data?
Think photography. Cross-sectional data are like a snapshot in time. We might have a large dataset on a large set of customers, with their demographic information and their transactional information summarized in some form (e.g., number of visits thus far). Another example is a transactional dataset, with information on each transaction, perhaps including a flag of whether it was fraudulent. A third is movie ratings on an online movie rental website. You have probably encountered multiple examples of such datasets in the Statistics course. BADM introduces methods that use cross-sectional data for predicting the outcomes of new records. In contrast, time series data are like a video, collected over time. FCAS focuses on approaches and methods for forecasting a series into the future. Data examples include daily traffic, weekly demand, monthly disease outbreaks, and so forth. 
How are the courses similar?
The two courses are similar in terms of flavor and focus: they both introduce the notion of business analytics, where you identify business opportunities and challenges that can potentially be tackled with data mining or statistical tools. They are both technical courses, not in the mathematical sense, but in that we do hands-on work (and a team project) with real data, learning and applying different techniques, and experiencing the entire process from business problem definition to deployment back into the business environment.
In both courses, a team project is pivotal. Teams use real data to tackle a potentially real business problem/opportunity. You can browse presentations and reports from previous years to get an idea. We also use the same software packages in both courses, called XLMiner and TIBCO Spotfire. For those on the Hyderabad campus, BADM and FCAS students will see the same instructor in both courses this year (yes, that's me).
How do the courses differ in terms of delivery?
Since last year, I have "flipped" BADM and turned it into a MOOC-style course. This means that students are expected to do some work online before each class, so that in class we can focus on hands-on data mining, higher level discussions, and more. The online component will also be open to the larger community, where students can interact with alumni and others interested in analytics. FCAS is still offered in the more traditional lecture-style mode.
Is there overlap between the courses?
While the two courses share the data mining flavor and the general business analytics approaches, they have very little overlap in terms of methods, and even then, the implementations are different. For example, while we use linear regression in both cases, it is used in different ways when predicting with cross-sectional data vs. forecasting with time series.
So which course should I take? Should I take both?
Being completely biased, it's difficult for me to tell you not to take either of these courses. However, I will say that these courses require a large investment of time and effort. If you are taking other heavy courses this term, you might want to stick with only one of BADM or FCAS. Taking both courses will give you a stronger and broader skill set in data analytics, so for those interested in working in the business analytics field, I'd suggest taking both. Finally, if you register for FCAS only, you'll still be able to join the online component of BADM without registering. Although it's not as extensive as taking the course, you'll get a glimpse of data mining with cross-sectional data.
Finally, a historical note: when I taught a similar course at the University of Maryland (in 2004-2010), it was a 14-week semester-long course. In that course, which was mostly focused on cross-sectional methods, I included a chunk on forecasting, so it was a mix. However, the separation into two dedicated courses is more coherent, gives more depth, does more justice to these extremely useful methods and approaches, and allows gaining first-hand experience in the uses of these different types of data structures that are commonly encountered in any organization.

Tuesday, January 22, 2013

Business analytics student projects a valuable ground for industry-academia ties

Since October 2012, I have taught multiple courses on data mining and on forecasting. Teams of students worked on projects spanning various industries, from retail to eCommerce to telecom. Each project presents a business problem or opportunity that is translated into a data mining or forecasting problem. Using real data, the team then executes the analytics solution, evaluates it and presents recommendations. A select set of project reports and presentations is available on my website (search for 2012 Nov and 2012 Dec projects).

For projects this year, we used three datasets from regional sources (thanks to our industry partners Hansa Cequity and TheBargain.in). One is a huge dataset from an Indian retail chain of hypermarkets. Another is data on electronic gadgets on online shopping sites in India. A third is a large survey on mobile usage conducted in India. These datasets were also used in several data mining contests that we set up during the course through CrowdANALYTIX.com and through Kaggle.com. The contests were open to the public, and indeed submissions came in from around the world.

Business analytics courses are an excellent ground for industry-academia partnerships. Unlike one-way interactions such as industry guest lectures, internships, or student site visits, a business analytics project conducted by student teams (with faculty guidance) creates value both for the industry partner who shares the data and for the students. Students who have gained a basic understanding of data analytics can be creative about new uses that companies have not considered (this can be achieved through "ideation contests"). Companies can also use this ground for piloting or testing out the use of their data for addressing goals of interest, with little investment. Students get first-hand experience with regional data and problems, and can showcase their project as they interview for positions that require such expertise.

So what is the catch? Building a strong relationship requires good, open-minded industry partners and a faculty member who can lead such efforts. It is a new role for most faculty teaching traditional statistics or data mining courses. Managing data confidentiality, creating data mining contests, and initiating and maintaining open communication channels with all stakeholders are nontrivial. But well worth the effort.


Thursday, October 04, 2012

Flipping and virtualizing learning

Adopting new technology for teaching has been one of my passions, and luckily my students have been understanding even during glitches or choices that turned out to be ineffective (such as the mobile/Internet voting technology that I wrote about last year). My goal has been to use technology to make my courses more interactive: I use clickers for in-class polling (to start discussions and assess understanding, not for grading!); last year, after realizing that my students were constantly on Facebook, I finally opened a Facebook account and ran a closed FB group for out-of-class discussions; and in my online courses on statistics.com I created interactive lessons (slides with media, quizzes, etc.) using Udutu.com. On the pedagogical side, I have tried to focus on hands-on learning: team projects replaced exams, supplemented by in-class presentations and homework that gets your hands dirty.

But all these were just baby steps, preparing me for the big leap. In the last month, I have been immersed in a complete transformation of one of my on-ground courses: The new approach is a combination of a new technology and a recent pedagogical movement. The pedagogical side is called 'flipping the classroom', where class time is not spent on one-directional lecturing but rather on discussions and other interactive activities. The technological leap is the move towards a Massive Open Online Course (MOOC) – but in my case a "moderate open online course". As a first step, the course will be open only to the community of the Indian School of Business (students, alumni, faculty and staff). The long term plan is to open it up globally.

The course Business Analytics using Data Mining is opening in less than two weeks. I've been working round-the-clock creating content for the online and on-ground components, figuring out the right technologies that can support all the requirements, and collaborating with colleagues at CrowdANALYTIX and at Hansa Cequity to integrate large local datasets and a platform for running data mining contests into the course.

Here are the ingredients that I found essential:
  • You need strong support from the university! Luckily, ISB is a place that embraces innovation and is willing to evaluate cutting-edge teaching approaches.
  • A platform that is easy for a (somewhat tech-savvy) instructor to design, upload materials to, update, interact with participants on, and, in general, run. If you are a control freak like me, the last thing you want is to have to ask someone else to upload, edit, or change things. After researching many possibilities, I decided to use the Google platform. Not the new Google Course Builder platform (who has time for programming in Javascript?), but rather a unique combination of Google Sites, Google Drive, Google Groups, YouTube embedding, etc. The key is Google Sites, which is an incredibly versatile tool (and free! thanks Google!). Another advantage of Google Sites is that you have the solid backbone of Google behind you. If your university uses Google Apps for Education, all the better (we hope to move there soon...)
  • It is definitely worthwhile to invest in good video editing software. This was a painful experience: after starting with one program that was causing grief, I switched to Camtasia Studio and very quickly purchased a license. It is an incredibly powerful yet simple-to-use tool for recording video+screen+audio and then editing (cutting out coughs, for instance).
  • Hardware for lecture videos: Use a good webcam that also has a good mic. I learned that audio quality is the biggest reason for people to stop watching a video. Getting the Thimphu street dogs to stop barking is always a challenge. If you're in a power-outage prone area, make sure to get a back-up battery (UPS).
  • Have several people go over the course platform to make sure that all the links work, the videos stream, etc. Also, get someone to assist with participants' technical queries. There are always those who need hand-holding.
The way the course will work at ISB is that the ISB community can join the online component (lecture videos, guided reading, online forum, contests). Registered students will also attend on-ground meetings that will focus on discussions, project-based learning, and other interactive activities. 

We opened registration to the community today and there are already more than 200 registrants. I guess everyone is curious! Whether the transformation will be a huge success or will die out quietly is yet to be seen. But for sure, there will be insights and learning for all of us.


Wednesday, August 17, 2011

Where computer science and business meet

Data mining is taught very differently at engineering schools and at business schools. At engineering schools, data mining is taught more technically, deciphering how different algorithms work. In business schools the focus is on how to use algorithms in a business context.

Business students with a computer science background can now enjoy both worlds: take a data mining course with a business focus, and supplement it with the free course materials from Stanford Engineering school's Machine Learning course (including videos of lectures and handouts by Prof Andrew Ng). There are a bunch of other courses with free materials as part of the Stanford Engineering Everywhere program.

Similarly, computer science students with a business background can take advantage of MIT's Sloan School of Management Open Courseware program, and in particular their Data Mining course (last offered in 2003 by Prof Nitin Patel). Unfortunately, there are no lecture videos, but you do have access to handouts.

And for instructors in either world, these are great resources!


Friday, October 09, 2009

SAS On Demand: Enterprise Miner -- Update

Following up on my previous posting about using SAS Enterprise Miner via the On Demand platform: from continued communication with experts at SAS, it turns out that with EM version 5.3, which is the one available through On Demand, there is no way to work with (or even access) non-SAS files. Their suggested solution is to use some other SAS product, like SAS BASE or even SAS JMP (which is available through the On Demand platform), to convert your CSV files to SAS data files...

From both a pedagogical and practical point of view, I am reluctant to introduce SAS EM through On Demand to my MBA students. They will dislike the idea of downloading, learning, and using yet another software package (even if it is a client) just for the purpose of file conversion (from ordinary CSV files into SAS data files).

So at this point it seems as though SAS EM via the On Demand platform may be useful in SAS-based courses that use SAS data files. Hopefully SAS will upgrade the version to the latest, which is supposed to be able to handle non-SAS data files.

Saturday, October 03, 2009

SAS On Demand: Enterprise Miner

I am in the process of trying out SAS Enterprise Miner via the (relatively new) SAS On Demand for Academics. In our MBA data mining course at Smith, we introduce SAS EM. In the early days, we'd get individual student licenses and have each student install the software on their computer. However, the software took too much disk space, and it was also very awkward to circulate a packet of CDs among multiple students. We then moved to the server option, where SAS EM is available on the Smith School portal. Although that solved the individual installation and storage issues, the portal version is too slow to be of practical use for even a modest project. Disconnects and other problems have kept students away. So now I am hoping that the On Demand service that SAS offers (which they call SODA) will work.

For the benefit of other struggling instructors, here's my experience thus far: I have been unable to access any non-SAS data files, and therefore unable to evaluate the product. The On Demand version installed is EM 5.3, which is still very awkward in terms of importing data, especially non-SAS data. It requires uploading files to the SAS server via FTP, then opening SAS EM, creating a new project, and inserting a line or two of SAS code into the non-obvious "startup code" tab. The code includes a LIBNAME statement to create a path to one's library, and a FILENAME statement to reach files in that library (thank goodness I learned SAS programming as an undergrad!). Definitely not for the faint of heart, and I suspect that MBAs won't love this either.
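For anyone attempting the same route, the startup code in question looks roughly like this (a hypothetical sketch; the server directory path is a placeholder -- substitute the directory shown under "how to use this directory" in your own account):

```sas
/* Hypothetical "startup code" for SAS EM 5.3 on SODA.
   The directory path below is a placeholder for your own
   course directory on the SAS server. */
libname mylib "/courses/u_school.edu1/i_123456/c_001/saslib";
filename rawcsv "/courses/u_school.edu1/i_123456/c_001/saslib/mydata.csv";
```

The LIBNAME statement makes the server directory available as a SAS library (here nicknamed mylib), and the FILENAME statement assigns a fileref to an individual uploaded file in it.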

I've been in touch with SAS support and thus far we haven't solved the data access issue, although they helped me find the path where my files were sitting (after logging in to SAS On Demand for Academics and clicking on your course, click on "how to use this directory").

If you have been successful with this process, please let me know!
I will post updates when I conquer this, one way or another.


Tuesday, June 10, 2008

Resources for instructors of data mining courses in b-schools

With the increasing popularity of data mining courses being offered in business schools (at the MBA and undergraduate levels), a growing number of faculty are becoming involved. Instructors come from diverse backgrounds: statistics, information systems, machine learning, management science, marketing, and more.

Since our textbook Data Mining for Business Intelligence came out, I've received requests from many instructors to share materials, information, and other resources. At last, I have launched BZST Teaching, a forum for instructors teaching data mining in b-schools. The forum is open to instructors only, with a host of materials and communication options available. It is founded on the idea of sharing, with the expectation that members will contribute to the pool.

If you are interested in joining, please email me directly.

Wednesday, September 19, 2007

Webcast on Analytics in the Classroom

Tomorrow at 11:00 EST I will be giving a webcast describing several term projects by MBAs in my data mining class. Students have been working on real business projects in my class for 4 years now, with many of the projects leading to important insights for the companies that provided the data (in most cases the students' workplaces).

For each of several cases I will describe the business objective; we'll look at the data via interactive visualization using Spotfire, and then examine some of the analyses and findings.

The webcast is organized by Spotfire (now a division of TIBCO), whose interactive visualization software we have been using in class via their b-school education outreach program.

To join the webcast tomorrow, please register: Analytics in the Classroom - Giving Today's MBAs a Competitive Advantage, by Dr. Galit Shmueli, Univ. of MD