Tuesday, September 12, 2006

What are decision trees?

The term "decision tree" has been used in two very different contexts, which causes some confusion. In the context of the decision sciences (or decision making), it means a tree structure that assists in decision making by mapping out the different courses of action and assigning costs and probabilities to the different scenarios. There is a good description on the MindTools website.
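To make the decision-making meaning concrete, here is a minimal sketch in Python (the product-launch actions, probabilities, and payoffs are made up for illustration) of how such a tree is evaluated: each course of action branches into scenarios with assigned probabilities and payoffs, and the action with the highest expected value is preferred.

```python
# Minimal sketch of evaluating a decision-sciences decision tree.
# The actions, scenarios, probabilities, and payoffs are hypothetical.

actions = {
    "launch product": [   # (probability, payoff) for each scenario
        (0.6, 100_000),   # strong demand
        (0.4, -40_000),   # weak demand
    ],
    "do nothing": [
        (1.0, 0),
    ],
}

# Expected value of an action = sum of probability * payoff over its scenarios
def expected_value(scenarios):
    return sum(p * payoff for p, payoff in scenarios)

for action, scenarios in actions.items():
    print(f"{action}: expected value = {expected_value(scenarios):,.0f}")

# The recommended course of action is the one with the highest expected value
best = max(actions, key=lambda a: expected_value(actions[a]))
print("best action:", best)
```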

In contrast, "decision tree" is also a popular name for a classification tree (or regression tree), a data mining method for predicting an outcome from a set of predictor variables (see, for example, the description on Resample.com). Two well-known types of classification tree algorithms are CART (implemented in software such as CART, SAS Enterprise Miner, and the Excel add-in XLMiner) and C4.5 (implemented in SPSS). An alternative algorithm, which is more statistically oriented and widely used in marketing, is CHAID (implemented in multiple software packages).

Both types of decision trees are very useful tools in business applications and decision making. Both use a tree structure and can generate rules, but otherwise they are quite different in what they are used for and how they operate. The decision-sciences decision tree relies on an expert to build the scenarios and to assess the costs and probabilities of events. In contrast, the data-mining decision tree uses a large database of historical data to derive rules that relate an outcome of interest to a set of predictor variables.
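To make the data-mining meaning concrete as well, here is a minimal sketch that assumes Python's scikit-learn library and a made-up toy dataset: it fits a CART-style classification tree to a handful of "historical" records and prints the resulting if/then rules.

```python
# Minimal sketch of a data-mining decision tree (CART-style) in scikit-learn.
# The "historical" records and variable names below are hypothetical.
from sklearn.tree import DecisionTreeClassifier, export_text

# Predictor variables [age, income] and the outcome of interest (bought: 1/0)
X = [[25, 30_000], [40, 60_000], [35, 45_000],
     [50, 80_000], [23, 20_000], [45, 70_000]]
y = [0, 1, 0, 1, 0, 1]

# Fit a shallow classification tree to the historical data
tree = DecisionTreeClassifier(max_depth=2)
tree.fit(X, y)

# The fitted tree can be read off as a set of if/then rules
print(export_text(tree, feature_names=["age", "income"]))
```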

To see how much confusion the use of the same term for the two tools causes, check out the definition of Decision Tree on Wikipedia. The first paragraph refers to decision theory, while all the rest is the data mining version... So next time, when decision trees are mentioned, make sure to find out which tool they are talking about!
