What is the Confusion Matrix in Machine Learning?

In machine learning, and specifically in statistical classification, a confusion matrix (also called an error matrix) summarizes how a classifier's predictions compare with the true labels.





A confusion matrix is a table that is frequently used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. It allows visualization of the performance of an algorithm.





It makes it easy to identify confusion between classes, e.g. one class being commonly mislabeled as the other. Most performance measures are computed from the confusion matrix.









This article covers:





1. What the confusion matrix is and why you need to use it.

2. How to calculate a confusion matrix for a 2-class classification problem from scratch.

3. How to create a confusion matrix in Python.









Confusion Matrix:





A confusion matrix is a summary of prediction results on a classification problem.

The number of correct and incorrect predictions is summarized with count values and broken down by each class. This is the key to the confusion matrix.

The confusion matrix shows the ways in which your classification model gets confused when it makes predictions.

It gives us insight not only into the errors being made by a classifier but, more importantly, into the types of errors that are being made.













                     Predicted Class 1    Predicted Class 2
Actual Class 1              TP                   FN
Actual Class 2              FP                   TN

Here,

• Class 1: Positive

• Class 2: Negative







Definition of the Terms:





• Positive (P): Observation is positive (for example: is an apple).

• Negative (N): Observation is not positive (for example: is not an apple).

• True Positive (TP): Observation is positive, and is predicted to be positive.

• False Negative (FN): Observation is positive, but is predicted negative.

• True Negative (TN): Observation is negative, and is predicted to be negative.

• False Positive (FP): Observation is negative, but is predicted positive.
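These four counts can be tallied directly from paired lists of actual and predicted labels. A minimal sketch (the encoding 1 = positive, 0 = negative is an assumption for illustration):

```python
# Tally TP, FN, TN, FP from paired actual/predicted labels.
# Assumes 1 = positive class, 0 = negative class (illustrative encoding).
def tally(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp, fn, tn, fp

print(tally([1, 1, 0, 0], [1, 0, 0, 1]))  # (1, 1, 1, 1): one of each outcome
```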









Classification Rate/Accuracy:





Classification rate, or accuracy, is given by the relation:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

However, there are problems with accuracy alone. It assumes equal costs for both kinds of errors. A 99% accuracy can be excellent, good, mediocre, poor, or terrible depending on the problem.
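This caveat is easy to demonstrate with a sketch on imbalanced toy data (the numbers here are illustrative, not from the article): a "model" that always predicts the majority class scores 99% accuracy while never finding the single positive.

```python
# Accuracy trap on imbalanced data: always predicting the majority class
# still yields 99% accuracy (toy data for illustration).
actual    = [1] + [0] * 99    # one positive, 99 negatives
predicted = [0] * 100         # a "model" that always predicts negative

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(accuracy)  # 0.99 -- yet recall on the positive class is 0
```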









Recall:





Recall is the ratio of the total number of correctly classified positive examples to the total number of positive examples. High recall indicates that the class is correctly recognized (a small number of FN).









The recall is given by the relation:

Recall = TP / (TP + FN)
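As a quick sketch of the formula (the counts here are taken from the worked example later in this article):

```python
def recall(tp, fn):
    # Proportion of actual positives the model correctly identified.
    return tp / (tp + fn)

print(recall(tp=100, fn=5))  # 100/105, roughly 0.95
```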








Precision:





To get the value of precision, we divide the total number of correctly classified positive examples by the total number of predicted positive examples. High precision indicates that an example labeled as positive is indeed positive (a small number of FP).

Precision is given by the relation:

Precision = TP / (TP + FP)

High recall, low precision: most of the positive examples are correctly recognized (low FN), but there are a lot of false positives (high FP).





Low recall, high precision: we miss a lot of positive examples (high FN), but those we predict as positive are indeed positive (low FP).
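The two trade-off regimes can be sketched with toy counts (these numbers are illustrative, not from the article):

```python
def precision(tp, fp):
    # Proportion of positive predictions that were actually positive.
    return tp / (tp + fp)

def recall(tp, fn):
    # Proportion of actual positives that were correctly identified.
    return tp / (tp + fn)

# High recall, low precision: every positive is caught, but with many false alarms.
print(recall(tp=10, fn=0), precision(tp=10, fp=40))   # 1.0 0.2

# Low recall, high precision: many positives are missed, but the flagged ones are right.
print(recall(tp=2, fn=8), precision(tp=2, fp=0))      # 0.2 1.0
```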









F-measure:





Since we have two measures (precision and recall), it helps to have a single measure that represents both. We calculate an F-measure, which uses the harmonic mean instead of the arithmetic mean, since the harmonic mean punishes extreme values more:

F-measure = (2 × Recall × Precision) / (Recall + Precision)

The F-measure will always be nearer to the smaller of precision and recall.

Let's consider an example in which we have (effectively) infinite data elements of class B and a single element of class A, and the model predicts class A for every instance in the test data.

Here,

Precision : 0.0

Recall : 1.0









Now:





The arithmetic mean: 0.5

The harmonic mean: 0.0

Taking the arithmetic mean suggests the model is 50% correct, despite this being nearly the worst possible outcome! Taking the harmonic mean instead, the F-measure is 0.
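This penalty is easy to verify numerically; a minimal sketch of both means for the degenerate precision/recall pair above:

```python
def f_measure(p, r):
    # Harmonic mean of precision and recall (defined as 0 when both are 0).
    return 2 * p * r / (p + r) if (p + r) else 0.0

p, r = 0.0, 1.0                 # the degenerate classifier above
print((p + r) / 2)              # arithmetic mean: 0.5
print(f_measure(p, r))          # harmonic mean (F-measure): 0.0
```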









Example to interpret confusion matrix:













Here is an example confusion matrix, with the terms (TP, FN, FP, TN) and the row and column totals added:

                 Predicted: Yes    Predicted: No    Total
Actual: Yes         TP = 100          FN = 5         105
Actual: No          FP = 10           TN = 50         60
Total                  110               55          165

Now,

Classification Rate/Accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN) = (100 + 50) / (100 + 5 + 10 + 50) ≈ 0.91









Recall: Recall tells us, when the actual class is yes, how often the model predicts yes.

Recall = TP / (TP + FN) = 100 / (100 + 5) ≈ 0.95









Precision: Precision tells us, when the model predicts yes, how often it is correct.

Precision = TP / (TP + FP) = 100 / (100 + 10) ≈ 0.91









F-measure:

F-measure = (2 × Recall × Precision) / (Recall + Precision) = (2 × 0.95 × 0.91) / (0.95 + 0.91) ≈ 0.93
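The results above can be reproduced in a few lines; note that rounding to two decimal places gives 0.91 for accuracy (150/165 ≈ 0.909) and 0.93 for the F-measure:

```python
# Metrics for the worked example: TP=100, FN=5, FP=10, TN=50.
tp, fn, fp, tn = 100, 5, 10, 50

accuracy  = (tp + tn) / (tp + tn + fp + fn)
recall    = tp / (tp + fn)
precision = tp / (tp + fp)
f_measure = 2 * recall * precision / (recall + precision)

print(round(accuracy, 2), round(recall, 2), round(precision, 2), round(f_measure, 2))
# 0.91 0.95 0.91 0.93
```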









Here is Python code which shows how to create a confusion matrix for a model's predictions. For this, we import the confusion_matrix function from the sklearn library, which helps us generate the confusion matrix.









Below is the Python implementation of the above explanation :





Note that this program might not run on the GeeksforGeeks IDE, but it runs easily on your local Python interpreter, provided you have installed the required libraries.









# Python script for confusion matrix creation.

from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

results = confusion_matrix(actual, predicted)

print('Confusion Matrix :')
print(results)
print('Accuracy Score :', accuracy_score(actual, predicted))
print('Report : ')
print(classification_report(actual, predicted))









OUTPUT ->





Confusion Matrix :
[[4 2]
 [1 3]]





Accuracy Score: 0.7





Report :





              precision    recall  f1-score   support

           0       0.80      0.67      0.73         6
           1       0.60      0.75      0.67         4

 avg / total       0.72      0.70      0.70        10













Full Machine Learning Series

http://bit.ly/2Ufe34U