If you didn’t have the chance to check the first part and the second part of my notes on the course Structuring Machine Learning Projects, I encourage you to do that ASAP, but in any case you can read this third part on its own.

.

Error Analysis

Carrying out error analysis

To carry out error analysis, you should find a set of examples that your algorithm misclassified in your dev set. Look at those misclassified examples for false positives and false negatives, and count up the number of errors that fall into various different categories.

By counting up the fraction of examples that are misclassified in different ways, you can often prioritize what to work on next.
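The counting step above can be sketched in a few lines. This is a minimal illustration, not code from the course: the categories and the reviewed examples are made up, and in practice you would fill them in while looking at real misclassified dev-set images.

```python
from collections import Counter

# Hypothetical error-analysis log for a cat classifier: for each
# misclassified dev-set image reviewed by hand, note every category
# that applies (an image can fall into more than one).
reviewed = [
    {"dog"}, {"great cat"}, {"blurry"}, {"great cat", "blurry"},
    {"incorrectly labeled"}, {"dog", "blurry"},
    # ... in practice, one entry per examined example
]

counts = Counter()
for categories in reviewed:
    counts.update(categories)

total = len(reviewed)
for category, n in counts.most_common():
    # Each fraction is a ceiling: even fixing every error in this
    # category improves overall error by at most this share.
    print(f"{category}: {n}/{total} = {100 * n / total:.0f}%")
```

The printed fractions are exactly the "% of total" row of the error-analysis table, and they bound how much each fix can help.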

Take away

Error analysis: manually examining mistakes that your algorithm is making can give you insights into what to do next.

Ceiling on performance: Upper bound on how much you could improve performance by working on one problem.

How to carry out error analysis: evaluate multiple ideas in parallel using a table, with one row per misclassified example and one column per category of error. The conclusion of this process gives you an estimate of how worthwhile it might be to work on each of these different categories of errors.

.

Cleaning up: Incorrectly labeled data

When it comes to incorrectly labeled examples, DL algorithms are quite robust to random errors in the training set, but they are less robust to systematic errors.

Now, this discussion has focused on what to do about incorrectly labeled examples in your training set. What about incorrectly labeled examples in your dev set or test set? If you’re worried about their impact, the recommendation is to add one extra column during error analysis, so that you can also count up the number of examples where the label Y was incorrect.

| Image | Dog | Great Cat | Blurry | Incorrectly labeled | Comments |
|---|---|---|---|---|---|
| ... | | | | | |
| 98 | | | | ✓ | Labeler missed cat in background |
| 99 | | | | | |
| 100 | | | | ✓ | Drawing of a cat. Not a real cat. |
| % of total | 8% | 43% | 61% | 6% | |

Example A: is it worthwhile going in to try to fix up this 6% of incorrectly labeled examples?

- Overall dev set error: 10%
- Errors due to incorrect labels: 0.6%
- Errors due to other causes: 9.4%

Our system has 90% overall accuracy and 10% error. In this case, 6% of the errors are due to incorrect labels, and 6% of 10% is 0.6%. Then you should look at the errors due to all other causes: if you make 10% error on your dev set and 0.6% of that is because the labels are wrong, the remaining 9.4% is due to other causes, such as misrecognizing dogs as cats, great cats, or blurry images.

So in this case, going in and fixing these incorrect labels is maybe not the most important thing to do right now.

.

Example B: is it worthwhile going in to try to fix up this 2% of incorrectly labeled examples?

- Overall dev set error: 2%
- Errors due to incorrect labels: 0.6%
- Errors due to other causes: 1.4%

Now let’s say the overall error is down to 2%, but 0.6% of it is still due to incorrect labels. Now a very large fraction of your errors, 0.6% divided by 2%, which is 30% rather than 6%, is due to incorrectly labeled examples.

When such a high fraction of your mistakes, as measured on your dev set, is due to incorrect labels, then it seems much more worthwhile to fix up the incorrect labels in your dev set.
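The arithmetic behind the two examples is just a ratio; a quick check (the helper function name is mine, not from the course):

```python
def label_error_share(overall_error, errors_due_to_labels):
    """Fraction of dev-set mistakes attributable to wrong labels."""
    return errors_due_to_labels / overall_error

# Example A: 10% overall error, 0.6% of it caused by bad labels.
share_a = label_error_share(0.10, 0.006)   # 6% of mistakes

# Example B: 2% overall error, the same 0.6% caused by bad labels.
share_b = label_error_share(0.02, 0.006)   # 30% of mistakes

print(f"Example A: {share_a:.0%} of mistakes are label errors")
print(f"Example B: {share_b:.0%} of mistakes are label errors")
```

Same absolute amount of label noise, but as the system improves, that noise accounts for an ever larger share of the remaining errors, which is why fixing labels becomes worthwhile in Example B.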

Conclusions

First, deep learning researchers sometimes like to say things like, “I just fed the data to the algorithm. I trained it and it worked.” There is a lot of truth to that in the deep learning era, but manually checking the examples can really help you prioritize where to go next.

Take away

Manually checking the examples can really help you prioritize where to go next, so it is really valuable to carry out an error analysis and then set a path based on your numbers.

It’s really important that your dev and test sets come from the same distribution. But if your training set comes from a slightly different distribution, often that’s a pretty reasonable thing to do.

.

Build your first system quickly, then iterate

If you’re working on a brand new machine learning application, one piece of advice I often give people is that you should build your first system quickly and then iterate.

Depending on the area of application, the guidelines below will help you prioritize when you build your system.

- Set up development/test set and metrics: set up a target.
- Build an initial system quickly:
  - Training set: fit the parameters.
  - Development set: tune the parameters.
  - Test set: assess the performance.
- Use Bias/Variance analysis & Error analysis to prioritize next steps.

“Build your first system quickly, then iterate” applies less strongly if you’re working on an application area in which you have significant prior experience. It also applies less strongly if there’s a significant body of academic literature that you can draw on for pretty much the exact same problem you’re building.

Take away