As you can see, most of the figure is red! What happened? The tree has no way to decide where to split along either of the two features: each split is as good as any other. In the end it picks one at random, which often leads to a suboptimal choice. In this case it split on feature 2 <= 0.0137. Not a smart move.

For the second split it does not do much better either.
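To see why every split looks equally good, here is a small sketch that computes the impurity decrease for a few candidate splits. It assumes an XOR-style layout on a tiny symmetric grid, which is a stand-in for illustration and not the dataset used in the figure; `gini` and `split_gain` are hypothetical helper names.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_gain(points, labels, feature, threshold):
    """Impurity decrease from splitting on points[feature] <= threshold."""
    left = [lab for pt, lab in zip(points, labels) if pt[feature] <= threshold]
    right = [lab for pt, lab in zip(points, labels) if pt[feature] > threshold]
    n = len(labels)
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(labels) - weighted


# Assumed toy data: a symmetric 4x4 grid with XOR labels
# (class 1 when exactly one coordinate is positive).
coords = [-1.5, -0.5, 0.5, 1.5]
points = [(a, b) for a in coords for b in coords]
labels = [int((a > 0) != (b > 0)) for a, b in points]

# Try three thresholds on each of the two features.
gains = {
    (feature, threshold): split_gain(points, labels, feature, threshold)
    for feature in (0, 1)
    for threshold in (-1.0, 0.0, 1.0)
}
for key, gain in gains.items():
    print(key, gain)
```

On this toy grid every row and column contains the same number of points from each class, so both children of any axis-aligned split stay perfectly mixed and the gain is the same for all candidates. A greedy tree has nothing to prefer.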

Let's nudge our tree

The idea behind bumping is that we can break the symmetry of the problem (or escape the local minimum) by training a decision tree on a random subsample of the data. This is similar to bagging. The hope is that in the subsample there will be a preferred split so the tree can pick it.

We fit several trees on different bootstrap samples (sampling with replacement) and choose the one with the best performance on the full training set as the winner.
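The procedure just described could be sketched roughly as follows with scikit-learn. This is a minimal sketch under assumptions, not the code behind the figures: the function name `bump`, the XOR-style toy data, `max_depth=2` and `n_rounds=20` are all made up for illustration. It also keeps a tree fit on the full data as a candidate, so the winner can never score worse than the plain tree; that is a choice of this sketch rather than something stated above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def bump(X, y, n_rounds=20, max_depth=2, seed=0):
    """Fit trees on bootstrap samples, keep the best one on the full data."""
    rng = np.random.default_rng(seed)
    n = len(y)

    # Assumption of this sketch: start from the tree fit on the full data.
    best_tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    best_tree.fit(X, y)
    best_score = best_tree.score(X, y)

    for _ in range(n_rounds):
        # Bootstrap sample: draw n row indices with replacement.
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
        tree.fit(X[idx], y[idx])

        # Candidates are always compared on the full training set.
        score = tree.score(X, y)
        if score > best_score:
            best_tree, best_score = tree, score

    return best_tree, best_score


# Assumed toy data: XOR-style labels on the square [-1, 1] x [-1, 1].
data_rng = np.random.default_rng(42)
X = data_rng.uniform(-1, 1, size=(400, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

baseline = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
tree, score = bump(X, y)
print("plain tree:", baseline.score(X, y))
print("bumped tree:", score)
```

Note that unlike bagging, only a single tree comes out at the end, so the result stays as easy to inspect as an ordinary decision tree.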

The more rounds of bumping we do, the more likely we are to escape, though every extra round also costs more CPU time.