Thursday, May 20, 2010

Google's new prediction API

I just learned of the new Prediction API by Google -- in brief, you upload a training set with up to 1 million records and let Google's engine build an algorithm trained on the data. Then, upload a new dataset for prediction, and Google will apply the learned algorithm to score those data.

On the user's side, this is a total black box: you have no idea which algorithms are tried or which one is chosen (probably an ensemble). The predictions are therefore useful for their accuracy rather than for explaining how the inputs relate to the outcome. For researchers, this is a great tool for getting a predictive accuracy benchmark. I foresee future data mining students uploading their data to the Google Prediction API to see how well they could potentially do by mining the data themselves!
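
To make the benchmarking idea concrete, here is a rough sketch of how a student might compare their own model against black-box predictions on a holdout set. Nothing here calls the actual Prediction API: the function names `accuracy` and `gap_to_benchmark` are my own, and the labels are toy values invented for illustration. It assumes you have already obtained the black-box predictions somehow and saved them as a list.

```python
def accuracy(predicted, actual):
    """Fraction of records whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one record")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


def gap_to_benchmark(own_predictions, benchmark_predictions, actual):
    """Benchmark accuracy minus your own accuracy (positive means you are behind)."""
    return accuracy(benchmark_predictions, actual) - accuracy(own_predictions, actual)


# Toy holdout labels, invented purely for illustration (not real results).
actual = ["spam", "ham", "spam", "ham", "ham"]
benchmark = ["spam", "ham", "spam", "ham", "spam"]  # pretend black-box output
own = ["spam", "ham", "ham", "ham", "spam"]         # pretend student model output

print("benchmark accuracy:", accuracy(benchmark, actual))
print("own accuracy:", accuracy(own, actual))
print("gap to benchmark:", gap_to_benchmark(own, benchmark, actual))
```

A positive gap suggests there is still accuracy left on the table; a gap near zero or below suggests your own model is already doing about as well as the black box on that holdout set.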

From Google's perspective this API presents a terrific opportunity to improve their own algorithms on a wide set of data.

Someone mentioned that there are interesting bits in the FAQ. I like their answer to "How accurate are the predictions?", which is "more data and cleaner data always triumphs over clever algorithms".

Right now the service is free (if you get an invitation), but it looks like it will eventually be a paid service. Hopefully they will have an "academic version"!
