An exciting new dataset is out there for us data aficionados! Netflix, the huge online movie rental company, has announced a $1 million prize for whoever can improve upon its Cinematch algorithm for predicting movie ratings. The competition started at the beginning of the month and has already created a lot of buzz. The company has released a huge training set containing millions of movie ratings. Competing teams can use this dataset to develop prediction algorithms and then submit their predictions for a test set.
The training dataset contains more than 100 million ratings from a random sample of 480,000 (unidentifiable) users on 18,000 movies.
The $1 million grand prize goes to the team that can reduce Cinematch's RMSE (root mean squared error) on the test set by 10%. There are also more modest $50,000 "progress prizes".
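To make the prize criterion concrete, here is a minimal Python sketch of RMSE and of the "10% better" target. It assumes the 10% is a relative reduction of Cinematch's test-set RMSE; the function names `rmse` and `prize_target` and the sample ratings are hypothetical, for illustration only, and do not come from Netflix.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def prize_target(baseline_rmse, improvement=0.10):
    """Hypothetical helper: RMSE needed to beat the baseline by the given fraction."""
    return baseline_rmse * (1 - improvement)


# Hypothetical example: four true star ratings and a model's predictions.
actual = [4, 3, 5, 1]
predicted = [3.5, 3, 4, 2]
print(rmse(predicted, actual))  # squared errors 0.25, 0, 1, 1 -> sqrt(2.25 / 4) = 0.75
```

Because the errors are squared before averaging, a few badly wrong predictions cost more than many slightly wrong ones, which is what makes squeezing out the last few percent of RMSE so hard.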
Putting aside the monetary incentive and the goal of beating Cinematch on the test set, this is a great dataset for research. And Netflix has been generous enough to allow the data to be used for research purposes.
Another fun aspect is reading the postings on the competition forum! The various opinions, questions, and answers are a feast for anyone interested in online communities.