## Monday, April 21, 2008

### Good predictions by wrong model?

Are explaining and predicting the same? This age-old debate in the philosophy of science goes back to Hempel & Oppenheim's 1948 paper, which equates the logical structure of prediction and explanation (in effect claiming they are the same, except that in explanation the phenomenon has already occurred, while in prediction it has not). Later work recognized that the two are in fact very different.

When it comes to statistical modeling, how are the two different? Do we model data differently when the goal is to explain rather than to predict? In a recent paper co-authored with Otto Koppius from Erasmus University, we show how modeling differs at every step.

Let's take the argument to an extreme: can a wrong model lead to correct predictions? Here's an interesting example: although we know that the ancient Ptolemaic astronomical model, which postulates that the universe revolves around the Earth, is wrong, it turns out that this model generated very good predictions of planetary motion, speed, brightness, and size, as well as eclipse times. The predictions are easy to compute and accurate enough that they still serve today as engineering approximations, and they were even used in navigation until not so long ago.

So how does a wrong model produce good predictions? It's all about the difference between causality and association. A "correct" model is one that identifies the causality structure. But for a good predictive model all we need are good associations!

## Tuesday, April 15, 2008

### Are conditional probabilities intuitive?

Somewhere in the early 90's I started as a teaching assistant for the "intro to probability" course. Before introducing conditional probabilities, I recall presenting the students with the "Let's make a deal" problem that was supposed to show them that their intuition is often wrong and therefore they should learn about laws of probability, and especially conditional probability and Bayes' Rule. This little motivation game was highlighted in last week's NYT with an extremely cool interactive interface: welcome to the Monty Hall Problem!

The problem is nicely described in Wikipedia:

> Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

The initial thought that crosses one's mind is "it doesn't matter if you switch or not" (i.e., a probability of 1/2 that the car is behind each of the two closed doors). It turns out that switching is the optimal strategy: if you switch, the probability of winning the car is 2/3, but if you stay it is only 1/3.

How can this be? Note that the door the host opens is chosen so that it has a goat behind it. In other words, new information comes in once the door is opened. The idea behind the solution is to condition on the information that the opened door had a goat, so we look at event pairs such as "goat-then-car" and "goat-then-goat". In probability language, we move from P(car behind door 1) to P(car behind door 1 GIVEN goat behind door 3).
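For readers who find the argument hard to believe, a quick simulation settles it empirically. Here is a minimal sketch (the function name `play` and the trial count are my own choices, not from any particular source) that plays the game many times under each strategy:

```python
import random

def play(switch, trials=100_000):
    """Simulate the Monty Hall game and return the fraction of wins."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)     # door hiding the car
        choice = random.randrange(3)  # contestant's initial pick
        # Host opens a door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != choice and d != car)
        if switch:
            # Switch to the one remaining unopened door
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == car)
    return wins / trials

print(play(switch=True))   # close to 2/3
print(play(switch=False))  # close to 1/3
```

Running this shows the switching strategy winning roughly twice as often as staying, matching the 2/3 vs. 1/3 probabilities.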

The Tierney Lab, by NYT's blogger John Tierney, writes about the psychology behind the deception in this game. [Thanks to Thomas Lotze for pointing me to this posting!] He quotes a paper by Fox & Levav (2004) that gets to the core of why people get deceived:
> People seem to naturally solve probability puzzles by partitioning the set of possible events {Door 1; Door 2; Door 3}, editing out the possibilities that can be eliminated (the door that was revealed by the host), and counting the remaining possibilities, treating them as equally likely (each of two doors has a ½ probability of containing the prize).
In other words, they ignore the host. And then comes the embarrassing part about asking MBAs who took a probability course, and they too get it wrong. The authors conclude with a suggestion to teach probability differently:
> We suggest that introductory probability courses shouldn't fight this but rather play to these natural intuitions by starting with an explanation of probability in terms of interchangeable events and random sampling.

What does this mean? My interpretation is to use trees when teaching conditional probabilities. Looking at a tree for the Monty Hall game (assuming that you initially choose door 1) shows the asymmetry of the different options and the effect of the car location relative to your initial choice. I agree that trees are a much more intuitive and easy way to compute and understand conditional probabilities. But I'm not sure how to pictorially show Bayes' Rule in an intuitive way. Ideas anyone?
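To make the tree idea concrete, here is a sketch of the same computation done branch by branch, assuming the contestant initially picks door 1 and the host opens door 3 (the dictionary names are mine, chosen for illustration). Each leaf of the tree is a prior probability for the car's location multiplied by the host's conditional probability of opening door 3, and Bayes' Rule is just renormalizing over the surviving branches:

```python
from fractions import Fraction

# Tree for the Monty Hall game, assuming the contestant picks door 1.
# Prior: the car is equally likely to be behind any door.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Host's branch: probability he opens door 3, given the car's location.
#   car behind 1: he opens door 2 or 3 at random -> 1/2
#   car behind 2: he must open door 3            -> 1
#   car behind 3: he never opens door 3          -> 0
p_open3_given_car = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

# Bayes' Rule: P(car = d | host opens 3)
p_open3 = sum(prior[d] * p_open3_given_car[d] for d in prior)
posterior = {d: prior[d] * p_open3_given_car[d] / p_open3 for d in prior}

print(posterior[1])  # staying wins with probability 1/3
print(posterior[2])  # switching wins with probability 2/3
```

The asymmetry lives entirely in the host's branch of the tree: he behaves differently depending on where the car is, which is exactly the information the "ignore the host" intuition throws away.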

## Wednesday, April 02, 2008

### Data Mining Cup 2008 releases data today

Although the call for this competition has been out for a while on KDnuggets.com, today is the day when the data and the task description are released. This data mining competition is aimed at students. The prizes might not sound that attractive to students (participation in KDD 2008, "the world's largest international conference for Knowledge Discovery and Data Mining", August 24-27, 2008 in Las Vegas), so I'd say the real prize is cracking the problem and winning!

An interesting related story that I recently heard from Chris Volinsky of the BellKor team (currently in first place) is the high level of collaboration that competing teams have been exhibiting during the Netflix Prize. Although you'd think the $1 million would be sufficient incentive not to share, it turns out that the fun of the challenge leads teams to collaborate and share ideas! You can see some of this collaboration on the Netflix Prize Forum.