The problem is nicely described in Wikipedia:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

The initial thought that crosses one's mind is "it doesn't matter if you switch or not" (i.e. probability of 1/2 that the car is behind each of the two closed doors). Turns out that switching is the optimal strategy: if you switch there's a probability of 2/3 to win the car, but if you stay it's only 1/3.
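If the 2/3 figure is hard to believe, a quick simulation can make it tangible. The sketch below is only an illustration, not part of the original puzzle statement: the function names `play` and `win_rate` are made up here, and it assumes the host picks at random whenever both unchosen doors hide goats.

```python
import random


def play(switch, rng):
    """Play one round and return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host knows where the car is and opens a goat door that is not the pick.
    # Assumption: if two doors qualify, he chooses between them at random.
    host = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither the first pick nor the opened door.
        pick = next(d for d in doors if d != pick and d != host)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning probability of a strategy by repeated play."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

With this many rounds the two estimates should land close to 1/3 and 2/3 respectively.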

How can this be? Note that the door the host opens is chosen so that it has a goat behind it. In other words, new information comes in once the door is opened. The idea behind the solution is to condition on that information: the host, who knows where the car is, opened that particular door and revealed a goat. We therefore look at event pairs such as "goat-then-car" and "goat-then-goat". In probability language, we move from P(car behind door 1) to P(car behind door 1 GIVEN the host opened door 3 and revealed a goat).
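Written out, the conditioning step looks roughly like this. The notation is introduced here for illustration and is not from the original description: C_i means "the car is behind door i" and H_3 means "the host opens door 3", with the contestant holding door 1. It also assumes that the host picks at random when he is free to open either door 2 or door 3.

```latex
% Contestant has picked door 1.
% C_i = car is behind door i,  H_3 = host opens door 3 (notation assumed here).
% Assumption: when both door 2 and door 3 hide goats, the host picks one at random.
\begin{align*}
P(H_3 \mid C_1) &= \tfrac12, \quad P(H_3 \mid C_2) = 1, \quad P(H_3 \mid C_3) = 0 \\
P(C_1 \mid H_3) &= \frac{P(H_3 \mid C_1)\,P(C_1)}{\sum_{i=1}^{3} P(H_3 \mid C_i)\,P(C_i)}
  = \frac{\tfrac12 \cdot \tfrac13}{\tfrac12 \cdot \tfrac13 + 1 \cdot \tfrac13 + 0 \cdot \tfrac13}
  = \frac{1/6}{1/2} = \tfrac13 \\
P(C_2 \mid H_3) &= \frac{1 \cdot \tfrac13}{\tfrac12} = \tfrac23
\end{align*}
```

The host's knowledge enters only through the likelihoods in the first line. If he opened door 3 at random and merely happened to reveal a goat, the likelihood under C_2 would drop to 1/2 and the two closed doors would come out equally likely.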

TierneyLab, the NYT blog by John Tierney, discusses the psychology behind the deception in this game. [Thanks to Thomas Lotze for pointing me to this posting!] Tierney quotes a paper by Fox & Levav (2004) that gets to the core of why people get deceived:

People seem to naturally solve probability puzzles by partitioning the set of possible events {Door 1; Door 2; Door 3}, editing out the possibilities that can be eliminated (the door that was revealed by the host), and counting the remaining possibilities, treating them as equally likely (each of two doors has a ½ probability of containing the prize).

In other words, they ignore the host. And then comes the embarrassing part: MBAs who had taken a probability course were asked the same question, and they too got it wrong. The authors conclude with a suggestion to teach probability differently:

We suggest that introductory probability courses shouldn’t fight this but rather play to these natural intuitions by starting with an explanation of probability in terms of interchangeable events and random sampling.

What does this mean? My interpretation is to use trees when teaching conditional probabilities. Looking at a tree for the Monty Hall game (assuming that you initially choose door 1) shows the asymmetry of the different options and the effect of the car location relative to your initial choice. I agree that trees are a much more intuitive and easier way to compute and understand conditional probabilities. But I'm not sure how to show Bayes' Rule pictorially in an intuitive way. Ideas, anyone?
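For anyone who wants to check the tree by machine rather than by hand, here is one possible sketch. It enumerates the leaves of the tree for an initial choice of door 1 with exact fractions. The names `leaves`, `joint` and `posterior` are invented for this example, and it assumes the host picks at random when he has two goat doors to choose from. One way to read Bayes' Rule off the tree: take the leaf you care about and divide it by the sum of all leaves that share the same observation.

```python
from collections import defaultdict
from fractions import Fraction

PICK = 1            # the contestant's initial choice, as in the text
DOORS = (1, 2, 3)


def leaves():
    """Yield (car, opened, probability) for every leaf of the tree."""
    for car in DOORS:
        p_car = Fraction(1, 3)  # first level: where the car is
        # Second level: doors the host may open (not the pick, not the car).
        options = [d for d in DOORS if d != PICK and d != car]
        for opened in options:
            # Assumption: the host chooses uniformly among his allowed doors.
            yield car, opened, p_car * Fraction(1, len(options))


# Joint probability of each (car location, opened door) leaf.
joint = defaultdict(Fraction)
for car, opened, p in leaves():
    joint[(car, opened)] += p


def posterior(car, opened):
    """P(car location | opened door): one leaf over all leaves with that door opened."""
    total = sum(p for (c, o), p in joint.items() if o == opened)
    return joint[(car, opened)] / total


for car, opened, p in leaves():
    print(f"car behind {car} -> host opens {opened}: {p}")
print("P(car behind 1 | host opens 3) =", posterior(1, 3))
print("P(car behind 2 | host opens 3) =", posterior(2, 3))
```

The asymmetry shows up in the leaves themselves: the branch with the car behind door 1 splits in two, while the branch with the car behind door 2 leads to a single leaf.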