The neat recent Wall Street Journal article "Netflix Aims to Refine Art of Picking Films" (Nov 20, 2007) was sent to me by Moshe Cohen, one of my dedicated former data-mining-course students. In the article, a Netflix spokesman demystifies some of the winning techniques in the $1 million Netflix contest. OK, not really demystifying, but revealing two interesting insights:
1) Some teams joined forces by combining their predictions to obtain improved predictions (without disclosing their actual algorithms to each other). Today, for instance, the third-best team on the Netflix Leaderboard is "When Gravity and Dinosaurs Unite", which is the result of two teams combining their predictions (Gravity from Hungary and Dinosaur Planet from the US). This is an example of the "portfolio approach", which says that combining predictions from a variety of methods (and sometimes a variety of datasets) can lead to higher performance, just like stock portfolios.
2) AT&T, who is currently in the lead, takes an approach that includes 107 different techniques (blended in different ways). You can get a glimpse of these methods in their publicly available document written by Robert Bell, Yehuda Koren, and Chris Volinsky (kudos for the "open-source"!). They use regression models, k-nearest-neighbor methods, collaborative filtering, "portfolios" of the different methods, etc. Again, this shows that "looking" at data from multiple views is usually very beneficial. Like painkillers, a variety is useful because sometimes one works but other times another works better.
Please note that this does NOT suggest that a portfolio approach with painkillers is recommended!
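To make the portfolio idea concrete, here is a minimal sketch in Python of blending two sets of predictions. Everything in it is an assumption for illustration: the ratings are made-up toy numbers (not Netflix data), `model_a` and `model_b` are hypothetical stand-ins for two teams' predictions, and the equal weights are the simplest possible choice rather than what any contestant actually used. Performance is measured with RMSE, the error measure used in the contest.

```python
import math

def rmse(predictions, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))

def blend(prediction_sets, weights):
    """Weighted average of several models' predictions, rating by rating."""
    total_weight = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, row)) / total_weight
        for row in zip(*prediction_sets)
    ]

# Toy data (invented for illustration): true ratings and two models' guesses.
actual = [4.0, 3.0, 5.0, 2.0, 1.0]
model_a = [4.5, 2.5, 4.0, 2.5, 1.5]  # hypothetical predictions from team A
model_b = [3.5, 3.5, 5.5, 1.0, 1.0]  # hypothetical predictions from team B

# Equal weights are an assumption; weights could instead be tuned on held-out data.
blended = blend([model_a, model_b], [0.5, 0.5])

print("RMSE of model A:", round(rmse(model_a, actual), 3))
print("RMSE of model B:", round(rmse(model_b, actual), 3))
print("RMSE of blend:  ", round(rmse(blended, actual), 3))
```

In this toy example the two models tend to err in opposite directions, so their errors partly cancel and the blend beats both. That is not guaranteed in general: when the models make very similar mistakes, averaging helps little, which is why variety among the methods matters. Note also that the blend needs only each model's predictions, not its algorithm.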
3 comments:
You wrote: You can get a glimpse of these methods in their publicly available document written by Robert Bell, Yehuda Koren, and Chris Volinsky (kudos for the "open-source"!).
But according to the leaders' website, this was part of the competition rules. So make sure you direct your kudos to Netflix for requiring this full disclosure.
Indeed Peter, but does Netflix require disclosure only to them or to the entire public?
The entire public. According to the final bullet in the "Terms and Conditions in a Nutshell" in the Netflix Prize Rules,
"To win and take home either prize...you must share your method with (and non-exclusively license it to) Netflix, and you must describe to the world how you did it and why it works."
Thank you, Netflix!! :)