How accurate is your model?

What is the process of evaluating your model?

Why do we need to evaluate our model?

cancer

no cancer

Supervised learning and classification problems.

Supervised Learning

Classification Problem

Test train split using sklearn

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset (file name as used in the post)
df = pd.read_csv('Ecommerce Customers')

# Feature columns and the target column
X = df[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]
y = df['Yearly Amount Spent']

# Hold out 30% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```
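To make the role of `test_size` concrete, here is a minimal sketch of a shuffle-and-split written with only Python's standard library. It is not how `train_test_split` is implemented; the helper `simple_train_test_split`, its `seed` parameter and the toy `rows` list are hypothetical names used purely for illustration.

```python
import random


def simple_train_test_split(rows, test_size=0.3, seed=0):
    """Shuffle a copy of rows and hold out a test_size fraction (illustrative helper)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    # The first n_test shuffled rows become the test set, the rest the training set
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for 10 data points
train_rows, test_rows = simple_train_test_split(rows, test_size=0.3)
print(len(train_rows), len(test_rows))  # 7 3
```

With 10 rows and `test_size=0.3`, 3 rows are held out for evaluation and the model would only ever be trained on the other 7.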

test_size

0.3

Accuracy

$accuracy = \frac{number\ of\ correct\ responses}{Total\ number\ of\ test\ cases}$
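The accuracy formula can be sketched in a few lines of plain Python. The function name `accuracy` and the small label lists below are made up for illustration; in practice the predictions would come from your trained model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical true labels and model predictions for five test cases
y_test_example = ["cat", "dog", "dog", "cat", "dog"]
predictions_example = ["cat", "dog", "cat", "cat", "dog"]

print(accuracy(y_test_example, predictions_example))  # 0.8 (4 of 5 correct)
```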

predictions

X_test

y_test

Unbalanced data is the type of dataset in which you have more outcomes for one type of data and fewer outcomes for others.

dog

90%
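The cat-dog case is easy to reproduce. The sketch below assumes a stand-in "model" that ignores its input and always answers dog; the label lists are built to match the 90 dog and 10 cat pictures in the example, and the variable names are illustrative.

```python
# Labels matching the cat-dog example: 90 dogs and 10 cats
y_true = ["dog"] * 90 + ["cat"] * 10
# A "model" that ignores its input and always answers dog
y_pred = ["dog"] * len(y_true)

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy_always_dog = correct / len(y_true)
cats_found = sum(1 for t, p in zip(y_true, y_pred) if t == "cat" and p == "cat")

print(accuracy_always_dog)  # 0.9
print(cats_found)           # 0
```

The accuracy looks healthy at 90%, yet not a single cat is identified, which is why accuracy alone is misleading on unbalanced data.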

Recall

It is the ability of your model to find all the relevant cases in your dataset.

$recall = \frac{number\ of\ true\ positives}{number\ of\ true\ positives + number\ of\ false\ negatives}$

$recall = \frac{volcanos\ correctly\ identified}{volcanos\ correctly\ identified + volcanos\ incorrectly\ labelled\ to\ not\ erupt\ tomorrow}$

$recall = \frac{20}{20 + 0}$

$recall = 1$

recall

precision

Precision

Ability of a model to identify only the relevant data points.

$precision = \frac{number\ of\ true\ positives}{number\ of\ true\ positives + number\ of\ false\ positives}$

$precision = \frac{volcanos\ correctly\ identified}{volcanos\ correctly\ identified + volcanos\ incorrectly\ labelled\ to\ erupt\ tomorrow}$

$precision = \frac{20}{20 + 80}$

$precision = 0.2$
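The volcano numbers can be checked by counting true positives, false positives and false negatives directly. This sketch assumes 100 volcanoes in total, which is consistent with the 20 and 80 used in the formulas; all variable names are illustrative.

```python
# Assumed: 100 volcanoes, 20 of which actually erupt the next day (True = erupts)
actual = [True] * 20 + [False] * 80
# The model from the example predicts that every volcano erupts
predicted = [True] * 100

tp = sum(1 for a, p in zip(actual, predicted) if a and p)      # erupting, flagged
fp = sum(1 for a, p in zip(actual, predicted) if not a and p)  # quiet, but flagged
fn = sum(1 for a, p in zip(actual, predicted) if a and not p)  # erupting, missed

volcano_recall = tp / (tp + fn)
volcano_precision = tp / (tp + fp)

print(tp, fp, fn)                         # 20 80 0
print(volcano_recall, volcano_precision)  # 1.0 0.2
```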

Precision and Recall using a Venn Diagram

Total number of times you stopped shooting: x

Total number of times you stopped shooting with a survivor ship ahead of you: y

Total number of times you stopped shooting with an opposition spaceship ahead of you: z

Precision

$precision = \frac{Correctly\ stopped\ shooting}{Total\ Number\ of\ Shooting\ stops}$

Recall

$recall = \frac{Correctly\ stopped\ shooting}{Total\ Number\ of\ times\ survivor\ ships\ in\ front}$

Precision vs Recall using Examples

Case 1: Disease detection (Covid 19)

False Negative

False Positive

Recall

Accuracy

Recall

Case 2: Email Detection (Spam or Not spam)

False Positive

False Negative

Precision

F1-score

It is the harmonic mean of precision and recall.

$F1\text{-}score = \frac{2 * precision * recall}{precision + recall}$
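As a quick check of the harmonic mean, 2 * precision * recall / (precision + recall), the sketch below applies it to the volcano example (precision 0.2, recall 1.0). The helper name `f1_score_from` is hypothetical, and returning 0.0 when both inputs are zero is a convention assumed here to avoid division by zero.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


volcano_f1 = f1_score_from(0.2, 1.0)
arithmetic_mean = (0.2 + 1.0) / 2

print(round(volcano_f1, 3))  # 0.333
print(arithmetic_mean)       # 0.6
```

The harmonic mean stays close to the weaker of the two values, so a model cannot hide poor precision behind perfect recall the way a plain average would allow.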

Confusion matrix

True Positive

False Positive

True Negative

False Negative

confusion matrix

sklearn

```python
from sklearn.metrics import confusion_matrix

# y_test holds the true test values
# and predictions holds the values predicted by the model
print(confusion_matrix(y_test, predictions))
```

recall

precision

```python
from sklearn.metrics import classification_report

print(classification_report(y_test, predictions))
```

Related post: Introduction to KNN | K-nearest neighbor classification algorithm using Examples (March 22, 2020, 7 mins read)



Evaluating regression problems

Mean Absolute Error

$MAE = \frac{1}{n}\sum\left | Y_a - Y_e \right |$

where $Y_a$ is the actual value and $Y_e$ is the value predicted by the model.

sklearn

```python
from sklearn import metrics

metrics.mean_absolute_error(y_test, predictions)
```
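To see what the MAE formula does step by step, here is a minimal sketch in plain Python. The function name `mean_absolute_error_manual` and the sample numbers are invented for illustration only.

```python
def mean_absolute_error_manual(actual, predicted):
    """Average of |Y_a - Y_e| over all data points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted values
actual_values = [3, 5, 2, 7]
predicted_values = [2.5, 5, 4, 8]

# Absolute errors are 0.5, 0, 2 and 1, so MAE = 3.5 / 4
mae = mean_absolute_error_manual(actual_values, predicted_values)
print(mae)  # 0.875
```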

Mean Square Error

$MSE = \frac{1}{n}\sum (Y_a - Y_e)^{2}$

sklearn

```python
from sklearn import metrics

metrics.mean_squared_error(y_test, predictions)
```
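The difference between MAE and MSE shows up most clearly with an outlier. The sketch below uses two made-up sets of predictions with the same total absolute error: one is slightly off everywhere, the other is perfect except for one large miss. Function and variable names are illustrative.

```python
def mae_manual(actual, predicted):
    """Mean of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse_manual(actual, predicted):
    """Mean of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [10, 10, 10, 10]
evenly_off = [12, 12, 12, 12]    # every prediction is off by 2
one_outlier = [10, 10, 10, 18]   # three perfect predictions, one off by 8

print(mae_manual(actual, evenly_off), mae_manual(actual, one_outlier))  # 2.0 2.0
print(mse_manual(actual, evenly_off), mse_manual(actual, one_outlier))  # 4.0 16.0
```

MAE cannot tell the two sets of predictions apart, while MSE is four times larger for the one containing the outlier.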

100 square unit

Root Mean Square Error

MSE

RMSE

MSE

$RMSE = \sqrt{\frac{1}{n}\sum (Y_a - Y_e)^{2}}$

sklearn

```python
import numpy as np
from sklearn import metrics

np.sqrt(metrics.mean_squared_error(y_test, predictions))
```
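The same calculation can be sketched without NumPy or sklearn. The function name `rmse_manual` and the sample values are hypothetical; the point is only that taking the square root brings the error back to the units of the target.

```python
import math


def rmse_manual(actual, predicted):
    """Square root of the mean squared error."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Hypothetical values, e.g. amounts in dollars
actual_spend = [10, 10, 10, 10]
predicted_spend = [10, 10, 10, 18]

# MSE is 64 / 4 = 16 (dollars squared), so RMSE is 4 (dollars)
rmse = rmse_manual(actual_spend, predicted_spend)
print(rmse)  # 4.0
```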

What is considered as a good metric value for your model?

You are happy with your machine learning model and excited to share it with your client. You go to your clients, show them the new model, and can't wait to run it on new production data. Checking your eagerness, the client asks one question: how accurate is your model? What should your answer be?

Today we are going to talk about the problem of measuring the accuracy of your model, along with some code samples that will help you calculate it easily. By the end of the post, you should know and understand the common ways of evaluating a machine learning model.

Machine learning is a way of predicting results for inputs that the model has not seen before. For example, we can predict stock prices from the historical data of a particular stock, which can tell us whether it would be profitable to buy that stock on a particular day. Evaluating a model means checking how accurate it is when test data, a piece of data that the model has never seen before, is passed to it.

In general, an evaluation score is a good metric to have at hand when you talk about your model with people in other teams who don't work with the technology. No model is 100% correct, and there is no perfect score you need to achieve; it depends on the case. Also, some wrong answers are better than others. For example, suppose you are building a cancer detector based on patients' test reports. You would rather tell a patient that they might have "cancer" and have them cleared by a real cancer test than output "no cancer" when the person actually has cancer.

Supervised learning problems are problems where the outcomes for the training data are already known, for example a data set of housing prices in an area. A classification problem is a subset of supervised learning where the outcomes are divided into two or more classes, for example whether a person has cancer or not.
All these models can be evaluated on the following parameters.

Generally, we divide the total dataset into two parts. The first part is known as the training data and the other is known as the test data. The idea behind this division is to use the test data only for evaluation, to find out whether the model we are using for the given dataset is good enough or whether we should try a different one. With scikit-learn, the `train_test_split` function divides your data frame into these parts. Its `test_size` argument is the fraction of the data that you want in your test split; `0.3` means that the test split will be 30% of the data.

Accuracy is a simple calculation where you divide the number of data points evaluated correctly by the total number of data points. We first calculate the `predictions` corresponding to the given `X_test`, and then compare these predictions with `y_test`, which holds the real outcomes for those inputs. We will talk more about how these values are calculated in later posts.

Accuracy is one of the easiest ways to evaluate the performance of your model. However, it is not the best metric to use when the data available to you is unbalanced. For example, in a cat-dog image dataset of 100 pictures, you might have 90 pictures of dogs and 10 pictures of cats.
In such a case, if you are checking accuracy and your model always returns "dog", no matter what picture is thrown at it, the accuracy will still be 90%, even though the model never identifies a single cat. Metrics such as recall and precision are mostly used when the data is imbalanced like this.

Consider a model that classifies every volcano as erupting the next day, evaluated on a data set in which 20% of the volcanoes actually erupt the next day. Every eruption is caught and none is missed, so recall comes out as 20 / (20 + 0) = 1. Although this value of recall is great, it has to be read together with the value of precision, and in most cases both values are considered. For the same volcano problem, the model correctly flags 20 volcanoes but wrongly flags the other 80, so precision is only 20 / (20 + 80) = 0.2. In general, the goal is to maximize both recall and precision.

Consider a simple online game. The rules are simple: you have to keep shooting the spaceships of the opposition until you see a survivor spaceship coming toward you. Now suppose your model has been playing the game for a long time and has collected a lot of data. The data points fall into three categories: the total number of times you stopped shooting (x), the times you stopped shooting with a survivor ship ahead of you (y), and the times you stopped shooting with an opposition spaceship ahead of you (z). A Venn diagram helps explain these occurrences.

Precision is the number of times you took the correct action out of all the times you took an action. So precision is given by the intersection divided by the blue part of the Venn diagram, i.e. the number of correct shooting stops divided by the total number of shooting stops. Recall is the number of times you took the correct action out of all the times you were supposed to take an action. So recall is given by the intersection divided by the orange part of the Venn diagram, i.e. the number of correct shooting stops divided by the total number of times a survivor ship was in front of you.

Choosing your evaluation metric really depends on the problem that you are trying to solve. A few examples will help you decide when to use one metric and when to use the other. Let's say you are working on a model that tries to detect whether a person has COVID-19. In this particular case, the cost of a false negative is more than the cost of a false positive.
If you predict that someone does not have the disease when they are actually carrying it (a false negative), the consequences can be bad: they might go out and start spreading it. On the other hand, if you predict that someone is carrying the disease when in reality they are not (a false positive), it is not as big an issue, because further tests will obviously be done on them. In this case, we want an evaluation metric whose value drops as the number of false negatives increases. Looking at the formulas of recall and precision, we know that the metric we want to use is recall.

In the spam detection problem, the cost of a false positive is more than the cost of a false negative. If your model starts predicting important mails as spam and sending them to the spam folder, it would be really bad. So, in this case, you will want a metric whose value drops as the number of false positives increases. Again, looking at the formulas, the right metric for such a problem is precision.

The F1-score lets us consider both precision and recall while evaluating our model.

A confusion matrix is a matrix in which we count all the positives and negatives: true positives, false positives, true negatives and false negatives. It also helps you evaluate your machine learning model in a better way. Consider the confusion matrix for a case where patients' data was tested for a specific disease. Out of a total of 165 patients, our model produced the following results.

True positives: our model predicted that 100 patients were carrying the disease, and they actually were carrying it.

True negatives: our model predicted that 50 patients were not carrying the disease, and they actually were not carrying it.

False positives: our model predicted that 10 patients were carrying the disease, and they actually were not carrying it. This is also known as a Type I error.

False negatives: our model predicted that 5 patients were not carrying the disease, and they actually were carrying it. This is also known as a Type II error.

We can go forward and calculate the values of accuracy, recall, precision and F1-score from this confusion matrix.
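As a worked example, the four counts from the 165-patient case (100 true positives, 50 true negatives, 10 false positives, 5 false negatives) are enough to compute every metric discussed so far. This is a plain-Python sketch with illustrative variable names; rounding to three decimals is only for display.

```python
# Counts taken from the 165-patient example
tp, tn, fp, fn = 100, 50, 10, 5

total = tp + tn + fp + fn
accuracy = (tp + tn) / total        # correct predictions over all patients
recall = tp / (tp + fn)             # disease carriers that were found
precision = tp / (tp + fp)          # predicted carriers that really were carriers
f1 = 2 * precision * recall / (precision + recall)

print(total)  # 165
print(round(accuracy, 3), round(recall, 3), round(precision, 3), round(f1, 3))
# 0.909 0.952 0.909 0.93
```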
To print the confusion matrix of a model in sklearn, use the `confusion_matrix` function from `sklearn.metrics`. You can also print a report containing values like recall, precision, etc. using `classification_report`. For more information, read the related post on K-nearest neighbor classification.

Regression problems are a little different from classification problems, as the output is a continuous value rather than one of a few classes. Here you can actually see how far off your predicted value was from the real value. Here are three ways in which regression models can be evaluated.

Mean Absolute Error (MAE) is the average absolute difference between the predicted and the actual values, and it is the simplest error to calculate for your model. In sklearn you can get it with `metrics.mean_absolute_error`. The only problem with this metric is that it doesn't punish outliers: a prediction that is very far from the actual value is not penalized extra, and its error gets averaged out by the other values.

Mean Square Error (MSE) is the mean of the squared differences between the predicted and the actual values. In sklearn you can get it with `metrics.mean_squared_error`. The only problem with this value is that it is hard to interpret, because it is expressed in squared units. For example, you can't really tell someone that your model is off by 100 square units.

To solve this problem with MSE, we use the Root Mean Square Error (RMSE), obtained by taking the square root of the MSE. Because RMSE is in the same units as the target, we can read the proficiency of our model directly from it. You can calculate it in Python using NumPy and sklearn.

As we have already discussed, a good metric value really depends on the type of model you are evaluating. You will always have to check with the people who are actually going to use your model. Hope you liked the post; do leave your thoughts on what you use to evaluate your model and what you think is a good metric value for it.