Monday, April 21, 2008

Good predictions by a wrong model?

Are explaining and predicting the same? This long-standing debate in the philosophy of science goes back to Hempel & Oppenheim's 1948 paper, which equated the logical structure of explanation and prediction (in effect saying that the two are the same, except that in explanation the phenomenon has already happened, while in prediction it has not yet occurred). Later on, it was recognized that the two are in fact very different.

When it comes to statistical modeling, how are the two different? Do we model data differently when the goal is to explain rather than to predict? In a recent paper co-authored with Otto Koppius of Erasmus University, we show how modeling differs at every step.

Let's take the argument to an extreme: can a wrong model lead to correct predictions? Here's an interesting example. We know that the ancient Ptolemaic astronomical model, which postulates that the universe revolves around the Earth, is wrong. Yet it turns out that this model generated very good predictions of planetary motion, speed, brightness, and size, as well as of eclipse times. Its predictions are easy to compute and accurate enough that they still serve as engineering approximations today, and they were used in navigation until not so long ago.
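To make this concrete, here is a toy sketch in Python. It assumes Earth and Mars move on coplanar circular orbits around the Sun (the radii and periods below are rough illustrative values, not Ptolemy's actual parameters) and compares the direction of Mars seen from Earth with the direction given by a geocentric "deferent plus epicycle" model. In this idealized case the two agree to rounding error, and the epicycle model even reproduces the apparent backwards (retrograde) motion of the planet: it gets the observable pattern right while telling the wrong story about what circles what.

```python
import cmath
import math

# Toy parameters (assumed for illustration; roughly Earth and Mars, not Ptolemy's values):
# coplanar circular orbits, radii in astronomical units, periods in years.
R_EARTH, R_MARS = 1.0, 1.52
W_EARTH, W_MARS = 2 * math.pi / 1.0, 2 * math.pi / 1.88  # angular speeds, rad/year


def sun_centered_direction(t):
    """Angle of Mars as seen from Earth when both circle the Sun."""
    earth = R_EARTH * cmath.exp(1j * W_EARTH * t)
    mars = R_MARS * cmath.exp(1j * W_MARS * t)
    return cmath.phase(mars - earth)


def earth_centered_direction(t):
    """Angle of Mars from a geocentric deferent-plus-epicycle model."""
    # The epicycle's center rides on a large circle (the deferent) around Earth...
    deferent = R_MARS * cmath.exp(1j * W_MARS * t)
    # ...while the planet rides on a small circle (the epicycle) around that center.
    epicycle = R_EARTH * cmath.exp(1j * (W_EARTH * t + math.pi))
    return cmath.phase(deferent + epicycle)


def angle_gap(a, b):
    """Smallest signed difference between two angles, in radians."""
    return math.remainder(a - b, 2 * math.pi)


times = [k / 100 for k in range(1000)]  # ten years, sampled every 0.01 year
max_gap = max(abs(angle_gap(sun_centered_direction(t), earth_centered_direction(t)))
              for t in times)

# Apparent retrograde motion: the predicted direction sometimes moves backwards.
retrograde = any(angle_gap(earth_centered_direction(t2), earth_centered_direction(t1)) < 0
                 for t1, t2 in zip(times, times[1:]))

print(f"largest disagreement: {max_gap:.2e} rad, retrograde motion: {retrograde}")
```

Real planetary orbits are elliptical, which is why the historical model needed extra refinements, but the sketch shows the basic point: predictive agreement alone does not tell you which model is "true."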

So how does a wrong model produce good predictions? It's all about the difference between causality and association. A "correct" model is one that identifies the causal structure. But for a good predictive model, all we need are good associations!
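A small simulation can illustrate the gap between association and causation. The data-generating process below is a hypothetical assumption chosen for illustration: `x` causes `y`, while `z` is merely a side effect of `x` with no influence on `y`. A regression of `y` on `z` (the "wrong" model) predicts held-out data almost as well as the regression on the true cause, yet it would badly mislead anyone who used it to decide whether changing `z` changes `y`.

```python
import random


# Hypothetical data-generating process (an assumption for illustration):
#   x is the true cause of y; z is a side effect of x with no influence on y.
def simulate(n, seed, z_shift=0.0):
    rng = random.Random(seed)
    xs, zs, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = x + rng.gauss(0, 0.3) + z_shift  # z tracks x closely
        y = 2.0 * x + rng.gauss(0, 0.5)      # y depends on x only
        xs.append(x)
        zs.append(z)
        ys.append(y)
    return xs, zs, ys


def fit_line(u, v):
    """Ordinary least-squares fit of v = intercept + slope * u."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    sxy = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    sxx = sum((a - mu) ** 2 for a in u)
    slope = sxy / sxx
    return slope, mv - slope * mu


def r_squared(u, v, slope, intercept):
    """Share of the variance in v explained by the fitted line."""
    mv = sum(v) / len(v)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(u, v))
    ss_tot = sum((b - mv) ** 2 for b in v)
    return 1 - ss_res / ss_tot


x_tr, z_tr, y_tr = simulate(5000, seed=1)  # training data
x_te, z_te, y_te = simulate(5000, seed=2)  # held-out test data

right = fit_line(x_tr, y_tr)  # "correct" model: uses the cause
wrong = fit_line(z_tr, y_tr)  # "wrong" model: uses the side effect

r2_right = r_squared(x_te, y_te, *right)
r2_wrong = r_squared(z_te, y_te, *wrong)

# Intervene: push z up by 1 while leaving x alone. Same seed, so only z changes.
_, _, y_shifted = simulate(5000, seed=2, z_shift=1.0)
predicted_effect = wrong[0]                          # what the wrong model expects
actual_effect = (sum(y_shifted) - sum(y_te)) / 5000  # what actually happens

print(f"test R^2: cause {r2_right:.2f}, side effect {r2_wrong:.2f}")
print(f"effect of raising z by 1: predicted {predicted_effect:.2f}, actual {actual_effect:.2f}")
```

Under these assumptions both models score well on held-out data, but only the causal one gives the right answer to "what happens if we change z?" That is the sense in which a model can be good for prediction while being wrong for explanation.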
