As we discussed in the regression tree post, growing a single large tree can lead to a model that overfits the training data and produces poor results on the test data. Tree pruning reduces overfitting by creating smaller trees. Pruning isn't only used for regression trees; we make use of it in classification trees as well. As the word itself suggests, the process involves cutting the tree back to a smaller one. We can do pruning in two ways.

Pre-pruning or early stopping

This means stopping before the full tree is even created. The idea is to build the tree only as long as the decrease in the RSS (residual sum of squares) due to each split exceeds some threshold. As soon as the RSS decrease produced by the next split is lower than the given threshold, we stop growing the tree. This can be shortsighted: a split that looks weak early on might be followed by a split that gives a large reduction in the RSS further down the tree. Thus, we try to make use of the more complex post pruning.

Post Pruning

In post pruning, we grow a large tree $T_0$ and then prune it back to obtain a subtree with the lowest test error rate. The problem is that we don't want to go through every possible subtree and estimate its test error rate, because there are far too many of them. Cost complexity pruning, also known as weakest link pruning, helps us with that. It introduces a nonnegative tuning parameter $\alpha$, and we only consider the sequence of subtrees indexed by $\alpha$. For each value of $\alpha$ there is a subtree $T \subset T_0$ that minimizes

$\displaystyle \sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} (y_i - \widehat{y}_{R_m})^2 + \alpha |T|$

In this expression:

- $|T|$ is the number of terminal nodes of $T$.
- $R_m$ is the rectangle corresponding to the $m$th terminal node.
- $\widehat{y}_{R_m}$ is the mean of the training observations in $R_m$.

When $\alpha = 0$, the minimizing subtree is just $T_0$. As $\alpha$ grows, every extra terminal node costs more, so the minimizing subtree gets smaller.

Steps involved in building a Regression Tree using Tree Pruning

1. Split the data to grow a large tree, stopping only when each terminal node contains fewer than some minimum number of observations. For example, we keep dividing until each region has fewer than 20 data points.
2. Apply cost complexity pruning to the large tree to get the sequence of best subtrees as a function of $\alpha$. The idea is to minimize the cost complexity function $C_\alpha(T) = R(T) + \alpha |T|$. Here $|T|$ is the number of leaves of the tree and $R(T)$ is the loss function calculated across the leaves (for a regression tree, the RSS term of the sum above).
3. Use K-fold cross-validation to choose $\alpha$. Put simply, divide the training data into K smaller parts (folds) and fit the model K times, each time holding out a different fold as validation data. Then calculate the mean squared error on each held-out fold as a function of $\alpha$ and average it over the K folds. (We will discuss k-fold cross-validation in the feature engineering post.)
4. Pick the $\alpha$ value with the minimum average error.
5. Return the subtree from step 2 that corresponds to the chosen value of $\alpha$.

Using sklearn to see the pruning effect on trees

We will use a simple dataset to check the effect of pruning on a decision tree. Let's first get the data and split it into training and test sets.

```python
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
```

This is a classification problem, so scikit-learn's DecisionTreeClassifier does not use the RSS. It uses the sample-weighted total impurity of the leaves (Gini impurity by default) as $R(T)$. The cost_complexity_pruning_path method of the DecisionTreeClassifier class gives us two things: the applicable values of $\alpha$, and the total leaf impurity of the pruned subtree for each of them.

```python
clf = DecisionTreeClassifier()
path = clf.cost_complexity_pruning_path(X_train, y_train)
path
```

This gives:

```
{'ccp_alphas': array([0.        , 0.00226647, 0.00464743, 0.0046598 , 0.0056338 ,
       0.00704225, 0.00784194, 0.00911402, 0.01144366, 0.018988  ,
       0.02314163, 0.03422475, 0.32729844]),
 'impurities': array([0.        , 0.00453294, 0.01847522, 0.02313502, 0.02876883,
       0.03581108, 0.04365302, 0.05276704, 0.0642107 , 0.0831987 ,
       0.10634033, 0.14056508, 0.46786352])}
```

This contains two NumPy arrays, ccp_alphas and impurities. The first alpha is 0, which corresponds to the full, unpruned tree; its leaves are pure, so their total impurity is 0. The last alpha prunes the tree down to a single root node. We can plot the two arrays on a graph to see the relation.

```python
ccp_alphas, impurities = path.ccp_alphas, path.impurities
plt.figure(figsize=(10, 6))
plt.plot(ccp_alphas, impurities)
plt.xlabel("effective alpha")
plt.ylabel("total impurity of leaves")
plt.show()
```

Finding an optimal value of alpha using Python

As we already know, there is a strong relation between alpha and the depth of the tree. To see it, we first train one tree for each effective alpha:

```python
clfs = []
for ccp_alpha in ccp_alphas:
    # Each tree is grown fully and then pruned with its own ccp_alpha
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    clfs.append(clf)
```

Then we plot the depth of each tree against alpha. The last tree is only a root node, so it is left out with [:-1].

```python
tree_depths = [clf.tree_.max_depth for clf in clfs]
plt.figure(figsize=(10, 6))
plt.plot(ccp_alphas[:-1], tree_depths[:-1])
plt.xlabel("effective alpha")
plt.ylabel("tree depth")
plt.show()
```

We can find the relation using this plot. A larger alpha prunes more nodes, so the depth of the tree can only stay the same or decrease as alpha grows.

Use the following code to find the relation between alpha and accuracy on the test set.

```python
from sklearn.metrics import accuracy_score

acc_scores = [accuracy_score(y_test, clf.predict(X_test)) for clf in clfs]
plt.figure(figsize=(10, 6))
plt.grid()
plt.plot(ccp_alphas[:-1], acc_scores[:-1])
plt.xlabel("effective alpha")
plt.ylabel("Accuracy scores")
plt.show()
```

We can clearly see that accuracy is very good at an alpha of around 0.013. Keep in mind that this compares the trees on the test set. The steps above instead choose $\alpha$ with K-fold cross-validation on the training data, which leaves the test set untouched for the final evaluation. Here is the notebook for the code in this post.
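The steps listed under "Steps involved in building Regression Tree using Tree Pruning" choose $\alpha$ with K-fold cross-validation rather than the test set. Below is a minimal sketch of steps 3 to 5 in scikit-learn on the same breast cancer data. It rests on a few assumptions of ours:

- The candidate alphas come from the pruning path of a tree grown on the whole training set.
- 5 folds are used.
- Because this is a classification tree, we maximize the mean cross-validated accuracy instead of minimizing the mean squared error.

cross_val_score refits the tree on each training fold with the given ccp_alpha, so growing and pruning are repeated inside every fold.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 2: candidate alphas from the pruning path of a tree grown on all training data
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas[:-1]  # drop the last alpha, which leaves only the root

# Step 3: mean 5-fold CV accuracy for each alpha; cross_val_score grows and
# prunes a new tree on each training fold using the given ccp_alpha
cv_scores = [
    cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=alpha), X_train, y_train, cv=5
    ).mean()
    for alpha in ccp_alphas
]

# Step 4: the highest mean accuracy corresponds to the lowest average error
best_alpha = ccp_alphas[int(np.argmax(cv_scores))]

# Step 5: refit on the whole training set with the chosen alpha
final_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)
print(f"best alpha: {best_alpha:.5f}")
print(f"leaves: {final_tree.get_n_leaves()}, test accuracy: {final_tree.score(X_test, y_test):.3f}")
```

The test set is only touched once, at the very end. Its accuracy is therefore an honest estimate for the chosen tree, unlike picking alpha straight from the test-accuracy plot.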
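The Post Pruning section defines the cost complexity criterion $\displaystyle \sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} (y_i - \widehat{y}_{R_m})^2 + \alpha |T|$. Here is a small sketch that evaluates it for a fitted scikit-learn DecisionTreeRegressor. The helper name cost_complexity is our own (hypothetical, not part of scikit-learn). It relies on the tree's apply method to find the leaf (region $R_m$) each training observation falls into. The four-point dataset is made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def cost_complexity(tree, X, y, alpha):
    """Hypothetical helper: RSS over the leaves of a fitted tree plus alpha * |T|."""
    leaf_ids = tree.apply(X)  # terminal node (region R_m) of each observation
    rss = 0.0
    for leaf in np.unique(leaf_ids):
        y_leaf = y[leaf_ids == leaf]
        # y_hat_{R_m}: mean of the training responses that fall in this leaf
        rss += np.sum((y_leaf - y_leaf.mean()) ** 2)
    return rss + alpha * tree.get_n_leaves()  # |T| = number of terminal nodes


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 10.0, 11.0])
stump = DecisionTreeRegressor(max_depth=1).fit(X, y)  # one split -> two leaves

print(cost_complexity(stump, X, y, alpha=0.0))  # 1.0 (RSS only)
print(cost_complexity(stump, X, y, alpha=2.0))  # 5.0 (RSS + 2 * 2 leaves)
```

The stump splits between 1 and 2, leaving residuals of ±0.5 in each leaf. So the RSS is 1.0, and every unit of $\alpha$ adds the number of leaves (2) to the cost.

As we read scikit-learn's documentation, its $R(T)$ is the sample-weighted impurity of the leaves. For the squared-error criterion, that is this RSS divided by the number of training samples. So scikit-learn's ccp_alpha plays the role of $\alpha / N$ in the formula above.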
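The rule from the "Pre-pruning or early stopping" section is to keep splitting only while the RSS decrease exceeds a threshold. It can be expressed with scikit-learn's min_impurity_decrease parameter. The documentation defines the weighted impurity decrease of a split as `N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)`. With the default squared-error criterion, each impurity is a mean squared error, so this works out to the RSS decrease of the split divided by the number of training samples N. The sketch below compares the number of leaves with and without early stopping. The synthetic data and the threshold of 5.0 are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (an arbitrary example, not from this post)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

rss_threshold = 5.0  # hypothetical minimum RSS decrease required to split

# Fully grown tree: keeps splitting until every leaf is pure
full_tree = DecisionTreeRegressor(random_state=0).fit(X, y)

# Early stopping: min_impurity_decrease is measured on the RSS / N scale
early_tree = DecisionTreeRegressor(
    min_impurity_decrease=rss_threshold / len(y), random_state=0
).fit(X, y)

print("full tree leaves:", full_tree.get_n_leaves())
print("early-stopped tree leaves:", early_tree.get_n_leaves())
```

The early-stopped tree is much smaller. As the section warns, it may also miss a useful split that only pays off deeper in the tree.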