Decision Trees and Random Forests

Decision tree learning is a method commonly used in data mining. A tree reaches a conclusion through a series of simple steps, and the idea is that many small decisions combine into a good overall prediction.

In a decision tree, each internal node is a decision: a test on one of the input variables. Each leaf represents a value of the target variable given the values of the input variables along the path from the root to that leaf, so the final prediction is the leaf an observation ends up in.
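The structure described above can be sketched in Python. This is an illustrative sketch only: the Node and Leaf classes, the predict function, the binary "<=" tests, and the example tree are all hypothetical and not taken from any particular library.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: str  # the predicted value of the target variable


@dataclass
class Node:
    feature: str      # input variable tested at this internal node
    threshold: float  # go left when x[feature] <= threshold, else right
    left: "Union[Node, Leaf]"
    right: "Union[Node, Leaf]"


def predict(tree: Union[Node, Leaf], x: dict) -> str:
    # Walk a single path from the root; the leaf reached is the prediction.
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.value


# A hypothetical two-level tree for deciding whether to play outside.
tree = Node("rain_mm", 1.0,
            left=Node("temp_c", 10.0, left=Leaf("stay in"), right=Leaf("play")),
            right=Leaf("stay in"))

print(predict(tree, {"rain_mm": 0.0, "temp_c": 22.0}))  # play
print(predict(tree, {"rain_mm": 5.0, "temp_c": 22.0}))  # stay in
```

Each observation follows exactly one root-to-leaf path, which is why the path's tests fully determine the prediction.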

Using previous data, the goal is to specify branches of choices that lead to good predictions in new scenarios. A tree can be “learned” by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset, a recursive procedure called recursive partitioning. The recursion stops when every observation at a node has the same value of the target variable, or when further splitting no longer improves the predictions.
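Recursive partitioning can be sketched in a few lines of Python, assuming numeric inputs, binary splits of the form x[j] <= t, and Gini impurity as the measure of how mixed a node is. Gini is one common choice; the text does not name a criterion. “No longer adds value” is interpreted here as “no split lowers the weighted impurity”. The function names and the tuple representation of internal nodes are hypothetical.

```python
from collections import Counter


def gini(labels):
    # Gini impurity: 0 when every label in the node is the same.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    # Try every (feature, threshold) pair; a split must beat the unsplit node.
    best, best_score = None, gini(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def grow(X, y):
    # Stopping rule 1: the node is pure.
    if len(set(y)) == 1:
        return y[0]
    split = best_split(X, y)
    # Stopping rule 2: no split improves impurity, so make a majority-class leaf.
    if split is None:
        return Counter(y).most_common(1)[0][0]
    j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    # Recurse on each derived subset; internal nodes are (feature, threshold, left, right).
    return (j, t,
            grow([X[i] for i in left], [y[i] for i in left]),
            grow([X[i] for i in right], [y[i] for i in right]))


def predict(tree, row):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


X1, y1 = [[1], [2], [3], [10], [11], [12]], ["a", "a", "a", "b", "b", "b"]
X2, y2 = [[1, 1], [1, 5], [5, 1], [5, 5]], ["no", "no", "no", "yes"]
print(grow(X1, y1))  # (0, 3, 'a', 'b')
print(grow(X2, y2))  # (0, 1, 'no', (1, 1, 'no', 'yes'))
```

The second example needs two levels of recursion: the first split leaves one mixed subset, which is then partitioned again on the other variable.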

CART

Classification And Regression Tree (CART) analysis is an umbrella term for both kinds of decision trees. Regression trees (numeric target) and classification trees (categorical target) share the same recursive structure but also differ in some respects, such as the criterion used to decide where to split. By default the tree() function in R (from the tree package) uses CART, and two of its arguments, “mincut” and “mindev”, control when a split is made: mincut sets the minimum number of observations allowed in a child node, and mindev requires a node's deviance to be at least that fraction of the root node's deviance before the node is split.
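A minimal sketch of the split search for a single numeric input shows where regression and classification trees differ: the search loop is identical, and only the impurity function changes. Here the sum of squared errors is used for regression and size-weighted Gini impurity for classification; these are common choices, not necessarily the defaults of R's tree(). The hypothetical min_child parameter plays a role similar to mincut, and best_threshold, sse, and gini_total are illustrative names only.

```python
from collections import Counter


def sse(values):
    # Regression impurity: sum of squared deviations from the node mean.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def gini_total(labels):
    # Classification impurity: Gini index scaled by node size, so children add up.
    n = len(labels)
    return n * (1.0 - sum((c / n) ** 2 for c in Counter(labels).values()))


def best_threshold(x, y, impurity, min_child=5):
    # Same search for both tasks; only `impurity` differs.
    # min_child (similar in spirit to mincut) is the smallest allowed child size.
    pairs = sorted(zip(x, y))
    best_t, best_imp = None, impurity(y)
    for i in range(min_child, len(pairs) - min_child + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated by a threshold
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = impurity(left) + impurity(right)
        if total < best_imp:
            best_t, best_imp = (pairs[i - 1][0] + pairs[i][0]) / 2, total
    return best_t, best_imp


x = [1, 2, 3, 4, 5, 6, 7, 8]
y_num = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]  # regression target
y_cls = ["a", "a", "a", "a", "b", "b", "b", "b"]  # classification target
print(best_threshold(x, y_num, sse, min_child=2))
print(best_threshold(x, y_cls, gini_total, min_child=2))  # (4.5, 0.0)
```

Raising min_child can rule out a split that would isolate a handful of observations, trading a little fit for less overfitting.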

Three main benefits of trees over linear models and some other supervised learning techniques are that (1) they handle nonlinearity automatically, (2) they include interactions automatically, and (3) they can handle nonconstant variance.
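To see why the first two benefits come for free, consider a hypothetical depth-two tree that splits on $x_1$ at $s$ and then splits each branch on $x_2$ at its own threshold ($t$ on the left, $u$ on the right), with leaf values $c_1, \dots, c_4$. Its prediction function is a sum of products of indicator functions:

```latex
\hat f(x_1, x_2) =
    c_1\,\mathbf{1}\{x_1 \le s\}\,\mathbf{1}\{x_2 \le t\}
  + c_2\,\mathbf{1}\{x_1 \le s\}\,\mathbf{1}\{x_2 > t\}
  + c_3\,\mathbf{1}\{x_1 > s\}\,\mathbf{1}\{x_2 \le u\}
  + c_4\,\mathbf{1}\{x_1 > s\}\,\mathbf{1}\{x_2 > u\}
```

The step functions make $\hat f$ nonlinear in each input, and each product of indicators in different variables is an interaction term: the effect of $x_2$ depends on which side of $s$ the value of $x_1$ falls. Neither the nonlinear transforms nor the interaction terms had to be specified in advance.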

Random Forests

While CART is an efficient algorithm for choosing a single tree, many different trees may fit the data similarly well. An alternative approach is to use random forests.

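Random forests grow many classification trees and combine them by “model averaging.” Each tree is fit to a bootstrap sample of the data (rows drawn with replacement), and only a random subset of the input variables is considered at each split, so the trees differ from one another. The forest predicts by majority vote across the trees (or, for regression trees, by averaging their predictions). Averaging many decorrelated trees reduces the variance of any single tree, so a random forest generally produces a better prediction model than CART.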
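The recipe above (bootstrap samples, a random subset of variables at each split, majority vote) can be sketched with the Python standard library alone. This is a minimal illustration, not the implementation of any R or Python package: the function names and the n_features parameter are hypothetical, and as a simplification a node becomes a leaf if none of its sampled variables improves Gini impurity.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def grow(X, y, n_features, rng):
    if len(set(y)) == 1:
        return y[0]
    # Only a random subset of the input variables is considered at this split.
    features = rng.sample(range(len(X[0])), n_features)
    best, best_score = None, gini(y)
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (j, t), score
    if best is None:
        return majority(y)  # simplification: no sampled variable helped
    j, t = best
    L = [i for i, row in enumerate(X) if row[j] <= t]
    R = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            grow([X[i] for i in L], [y[i] for i in L], n_features, rng),
            grow([X[i] for i in R], [y[i] for i in R], n_features, rng))


def predict_tree(tree, row):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) rows with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow([X[i] for i in idx], [y[i] for i in idx], n_features, rng))
    return forest


def predict_forest(forest, row):
    # Every tree gets one equal vote; the most common class wins.
    return majority([predict_tree(tree, row) for tree in forest])


X = [[i, i % 3] for i in range(10)] + [[i + 10, i % 3 + 10] for i in range(10)]
y = ["a"] * 10 + ["b"] * 10
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 0]))    # a
print(predict_forest(forest, [19, 12]))  # b
```

Note that every tree is weighted equally in the vote; the diversity comes from the bootstrap samples and the random variable subsets, not from weighting the trees.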