
(I'm far from an expert. These are just musings from a junior statistician who has dealt with different, but loosely analogous, issues. My answer might be out of context.)

Suppose we have a new sample to be predicted and an oracle with access to a much larger training set. Then maybe the "best" and most honest prediction is to say "I predict with 60% probability that this belongs in the Red class rather than the Blue class".

I'll give a more concrete example. Imagine that, in our very large training set, there is a large set of samples that are very similar to our new sample. Of these, 60% are Red and 40% are Blue, and there appears to be nothing to distinguish the Reds from the Blues. In such a case, a 60%/40% prediction is clearly the only one a sane person can make.

Of course, we don't have such an oracle; instead, we have lots of trees. Simple decision trees that output only a class label are incapable of making these 60%/40% predictions, so each tree makes a discrete prediction (Red or Blue, nothing in between). If this new sample falls just on the Red side of the decision surface, you will find that almost all of the trees predict Red rather than Blue. Each tree pretends to be more certain than it is, and together they start a stampede towards a biased prediction.
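To make the stampede concrete, here is a toy simulation. It is a sketch under strong simplifying assumptions, not a real random forest: every hypothetical "tree" is collapsed to a single leaf holding a bootstrap sample of the new sample's neighbourhood (60% Red), and it reports only its majority label. The function name `hard_vote_fraction` and all of the sizes are made up for illustration.

```python
import random


def hard_vote_fraction(p_red: float = 0.6, n_points: int = 1000,
                       n_per_tree: int = 200, n_trees: int = 500,
                       seed: int = 0) -> float:
    """Toy model of the 'stampede' (not a real random forest).

    The neighbourhood of the new sample contains n_points training points,
    a fraction p_red of which are Red. Each hypothetical tree is reduced to a
    single leaf holding a bootstrap sample of that neighbourhood, and it
    reports only its majority label. Returns the share of trees voting Red.
    """
    rng = random.Random(seed)
    n_red = round(p_red * n_points)
    labels = [1] * n_red + [0] * (n_points - n_red)  # 1 = Red, 0 = Blue

    red_votes = 0
    for _ in range(n_trees):
        sample = rng.choices(labels, k=n_per_tree)  # bootstrap resample for this tree
        if sum(sample) / n_per_tree > 0.5:          # discrete label: Red or Blue, nothing in between
            red_votes += 1
    return red_votes / n_trees


# The oracle would say 0.6 Red; the share of Red votes comes out close to 1.0.
print(hard_vote_fraction())
```

Because each bootstrap sample is usually majority Red, nearly every tree votes Red, so the vote share overstates the 60% that the oracle would report.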

The problem is that we tend to misinterpret the decision from a single tree. When a single tree assigns a node to the Red class, we should not interpret that as a 100%/0% prediction from the tree. (I'm not just saying that we 'know' that it's probably a bad prediction. I'm saying something stronger, i.e. that we should be careful about what we interpret as being the tree's prediction.) I can't concisely expand on how to fix this, but it is possible to borrow ideas from other areas of statistics about how to construct 'fuzzier' splits within a tree, in order to encourage a single tree to be more honest about its uncertainty. Then it should be possible to meaningfully average the predictions from a forest of trees.
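Here is one possible sketch of what a 'fuzzier' split could look like, assuming a one-dimensional feature and single-split trees (stumps). The sigmoid membership weight and the `temperature` parameter are my own illustrative choices, one of several ways to soften a split, and not a specific published method; all function names and numbers are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically safe logistic function, used as the degree of membership in the right branch."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hard_stump(x: float, threshold: float, p_left: float, p_right: float) -> float:
    """Ordinary split: x goes entirely to one leaf, which reports a discrete label
    (1.0 = Red, 0.0 = Blue) rather than its Red proportion."""
    p_red = p_right if x > threshold else p_left
    return 1.0 if p_red > 0.5 else 0.0


def soft_stump(x: float, threshold: float, p_left: float, p_right: float,
               temperature: float) -> float:
    """Fuzzy split: x belongs to the right leaf with weight sigmoid((x - t) / temperature),
    and each leaf reports its Red proportion. As temperature -> 0 the weight becomes a
    hard 0/1 assignment."""
    w_right = sigmoid((x - threshold) / temperature)
    return w_right * p_right + (1.0 - w_right) * p_left


def forest_average(predict, x: float, thresholds, **kwargs) -> float:
    """Average one stump per threshold (standing in for trees grown on different bootstraps)."""
    return sum(predict(x, t, **kwargs) for t in thresholds) / len(thresholds)


# Hypothetical numbers: Red proportion 0.3 left of the split, 0.7 right of it,
# and a new sample x = 0.05 just on the Red side of every tree's threshold.
thresholds = [-0.04, -0.02, 0.0, 0.02, 0.04]
hard = forest_average(hard_stump, 0.05, thresholds, p_left=0.3, p_right=0.7)
soft = forest_average(soft_stump, 0.05, thresholds, p_left=0.3, p_right=0.7, temperature=0.5)
print(f"hard votes: {hard:.2f}, fuzzy splits: {soft:.2f}")  # hard votes: 1.00, fuzzy splits: 0.51
```

Note that the soft version changes two things at once: the sample's membership is fuzzy, and the leaves report proportions instead of labels. With `temperature` shrunk towards zero, the soft stump reports 0.7, the right leaf's proportion, so the fuzziness is what pulls the near-boundary sample back towards 50/50.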

I hope this helps a little. If not, I hope to learn from any responses.