What is this?

This is an interactive demonstration (and explanation) of the gradient boosting algorithm applied to a classification problem. Boosting makes a decision ('blue' or 'orange') by iteratively building many simpler classifiers (decision trees in our case).
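To make the iterative idea concrete, here is a minimal from-scratch sketch of gradient boosting for binary classification, using sklearn regression trees as the weak learners. The dataset, depth, number of trees, and learning rate are illustrative assumptions, not the demo's exact settings.

```python
# Minimal gradient boosting sketch for binary classification with log-loss.
# Each tree is fit to the negative gradient (the residuals y - p).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)   # 'xor'-like toy dataset

learning_rate = 0.3
F = np.zeros(len(X))        # ensemble decision function (log-odds)
trees = []
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-F))            # current probabilities
    residuals = y - p                       # negative gradient of log-loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    F += learning_rate * tree.predict(X)    # small step toward the target
    trees.append(tree)

accuracy = np.mean((F > 0) == (y > 0.5))    # training accuracy
```

Each tree only needs to correct what the previous trees got wrong, which is why many shallow trees can fit a pattern no single shallow tree could.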

Challenges

Can you ...

- overfit the model? (from some point the test loss should increase)
- achieve test loss < 0.02 on the 'xor' dataset (the second one) using depth = 2? The same for depth = 1?
- achieve test loss < 0.1 on the spiral dataset using the minimal depth of trees?
- fit the 'stripes' dataset with trees of depth 1? Can you explain why this is possible?
- get the minimal possible loss on the dataset with several embedded circles?

Comments

Each time you change some parameter, gradient boosting is recomputed from scratch (yes, the algorithm is quite fast).

If the learning rate is small, the target doesn't change much between iterations. As a result, consecutive trees have similar structure.

To fight this, different randomizations are introduced: subsampling (train each tree on a random part of the data) and random subspaces (build each tree on a random subset of the features). Since there are only 2 variables in the demo, random rotations are used instead of random subspaces.
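A sketch of these randomizations (all sizes and names here are illustrative assumptions): subsampling rows, taking a random subspace of features, and rotating a 2-D dataset by a random angle.

```python
import numpy as np

rng = np.random.RandomState(1)
X = rng.randn(200, 5)                    # 200 samples, 5 features

# subsampling: each tree sees a random half of the data
rows = rng.choice(200, size=100, replace=False)
# random subspace: each tree sees a random subset of the features
cols = rng.choice(5, size=3, replace=False)
X_tree = X[np.ix_(rows, cols)]           # what one tree is trained on

# with only 2 features, rotate the plane by a random angle instead
X2 = rng.randn(200, 2)
theta = rng.uniform(0, 2 * np.pi)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X2_rot = X2 @ R.T                        # rotation preserves distances
```

A rotation keeps all the information in the data but changes which axis-aligned splits are available to a tree, so it plays the same decorrelating role that feature subsampling plays in higher dimensions.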

From another demo you can find out why a large learning rate is a bad idea and a small learning rate is recommended. Did you notice? If you set the learning rate high (without using the Newton-Raphson update), only the first several trees make a serious contribution; the other trees are almost unused.
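You can check this claim numerically (the setup below is an illustrative assumption): with a high learning rate, measure how much each tree shifts the ensemble's predictions and compare the first tree against the last.

```python
# With learning rate 1.0, early trees absorb most of the signal and
# later trees fit ever-smaller residuals, contributing almost nothing.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(2)
X = rng.uniform(-1, 1, size=(500, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

F = np.zeros(len(X))
contributions = []
for _ in range(30):
    p = 1.0 / (1.0 + np.exp(-np.clip(F, -30, 30)))
    tree = DecisionTreeRegressor(max_depth=2).fit(X, y - p)
    step = 1.0 * tree.predict(X)         # learning rate = 1.0 (high)
    contributions.append(np.mean(np.abs(step)))
    F += step
```

`contributions` decays quickly: once the residuals are nearly zero, each additional tree has almost nothing left to fit.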

The datasets from other playgrounds are too easy for gradient boosting; that's why some challenging datasets were added.

Updating tree leaves using the Newton-Raphson method is typically ignored in ML courses for simplicity, but in practice it is a cheap way to build efficient small ensembles.
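A hedged sketch of what a Newton-Raphson leaf update looks like for log-loss: instead of keeping the regression tree's plain mean of residuals, each leaf value is set to the sum of gradients divided by the sum of Hessians over the samples in that leaf (the toy numbers below are illustrative).

```python
import numpy as np

def newton_leaf_value(y_leaf, p_leaf, eps=1e-12):
    # negative gradient of log-loss per sample is (y - p);
    # its second derivative (hessian) is p * (1 - p)
    g = np.sum(y_leaf - p_leaf)
    h = np.sum(p_leaf * (1.0 - p_leaf))
    return g / (h + eps)

# toy leaf: three positives, one negative, current probability 0.4 each
y_leaf = np.array([1.0, 1.0, 1.0, 0.0])
p_leaf = np.full(4, 0.4)
value = newton_leaf_value(y_leaf, p_leaf)   # (3.0 - 1.6) / 0.96
```

Because the Hessian shrinks as predictions become confident, the Newton step automatically takes larger steps where the model is still uncertain, which is why it helps small ensembles converge faster than a fixed learning rate alone.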

There are many other things about GB you can find out from this demo. Enjoy!