Classification Metrics in Machine Learning

  • Posted by Rehan
  • Date: August 18, 2021

Introduction

In this blog, we will implement classification metrics in Machine Learning from scratch and then compare our results with the corresponding scikit-learn functions. Before going into detail, you should go through our previous blog on machine learning: https://ainewgeneration.com/logistic-regression/. The purpose of this blog is to familiarize you with the metrics used to measure prediction performance in classification systems. Now let's get started: we will build every classification metric from scratch and compare the results with the pre-built scikit-learn implementations.

Table of Contents

  1. Pseudo Data for Classification Metrics
  2. Calculate the Model Predictions
  3. Calculate the Model Accuracy
  4. Calculate the Model Error Rate
  5. Calculate the Model Precision and Recall
  6. Calculate the TPR and FPR for ROC Curve
  7. Compute and Plot the ROC Curve

Pseudo Data for Classification Metrics

Suppose there are 20 binary observations whose target values are:

true_labels = [1,0,0,1,0,0,1,0,0,1,1,0,0,0,0,1,0,1,0,1]

Suppose that your machine learning model returns the following prediction probabilities:

pred_probs = [0.886, 0.375, 0.174, 0.817, 0.574, 0.319, 0.812, 0.314, 0.098, 0.741, 0.847, 0.202, 0.31, 0.073, 0.179, 0.917, 0.64, 0.388, 0.116, 0.72]

We are going to use the pseudo data true_labels as our y_test and pred_probs as the probabilities predicted by our model.
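For context, in a real project these probabilities would come from a trained classifier, such as the logistic regression model covered in our previous blog. The following is only a minimal sketch using synthetic data (the make_classification dataset here is unrelated to the pseudo data above); in the rest of this post we simply work with the hard-coded lists.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data just for illustration; any classifier with predict_proba works the same way.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # probability of the positive class (label 1)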

Calculate the Model Predictions

We begin by writing a function from scratch called predict() that accepts a list of prediction probabilities and a threshold value as input and computes the final predictions output by the model. If a prediction probability is less than the threshold, the prediction is the negative class (i.e. 0). If a prediction probability is greater than or equal to the threshold, the prediction is the positive class (i.e. 1).

def predict(pred_prob, threshold):
    pred = []
    for i in range(len(pred_prob)):
        if pred_prob[i] >= threshold:  # probability at or above the threshold -> positive class
            pred.append(1)
        else:                          # probability below the threshold -> negative class
            pred.append(0)
    return pred

Next, we will invoke the predict() function to calculate the model predictions using a threshold value of 0.5 and the pred_probs list.

thresh = 0.5
pred_probs = [0.886, 0.375, 0.174, 0.817, 0.574, 0.319, 0.812, 0.314, 0.098, 0.741, 0.847, 0.202, 0.31, 0.073, 0.179, 0.917, 0.64, 0.388, 0.116, 0.72]

preds = predict(pred_probs,thresh)
print("Model Predictions: ", preds)

output: Model Predictions: [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1]
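As a side note, the same thresholding can be written in one line with NumPy; this is just an optional alternative sketch, and the loop-based predict() above is what we use throughout the post.

import numpy as np

# Vectorized equivalent of predict(): compare every probability with the
# threshold and convert the resulting boolean mask to 0/1 labels.
preds_np = (np.array(pred_probs) >= thresh).astype(int).tolist()
print("Model Predictions (NumPy): ", preds_np)  # same list as above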

Calculate the Model Accuracy

Now we will create a function from scratch called acc_score() that accepts as input a list of true labels and a list of model predictions, and then we will calculate the model accuracy.

def acc_score(true_value, pred_value):
    acc = 0
    for i in range(len(true_value)):
        if true_value[i] == pred_value[i]:  # count predictions that match the true label
            acc += 1

    acc = acc / len(true_value)  # fraction of correct predictions
    return acc

Next, we will compute the accuracy score using our acc_score() function, passing in the true labels and the model predictions we calculated above.

true_labels = [1,0,0,1,0,0,1,0,0,1,1,0,0,0,0,1,0,1,0,1]  # as defined above
accuracy = acc_score(true_labels, preds)
print("Model Accuracy: ", accuracy)

output: Model Accuracy 0.85

Next, we will use Scikit-Learn’s accuracy_score() function to check that the value we computed using acc_score() is correct.

from sklearn.metrics import accuracy_score
print("Model Accuracy", accuracy_score(true_labels, preds))

output: Model Accuracy 0.85

As we can see above, the model accuracy computed by our function and by Scikit-Learn's function is the same: 0.85.

Calculate the Model Error Rate

Error rate = 1 – Model Accuracy

We will create a function from scratch called error_rate() that accepts as input a list of true labels and a list of model predictions, and then we will calculate the model error rate.

def error_rate(true_value, pred_value):
    err = 1 - acc_score(true_value, pred_value)  # error rate is the complement of accuracy
    return err

Next, we compute the model error rate for the true labels and the model predictions.

error = error_rate(true_labels, preds)
print("Model Error Rate: ", error)

output: Model Error Rate:  0.15000000000000002
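Scikit-Learn does not provide a dedicated error-rate function, but since the error rate is simply one minus the accuracy, we can sanity-check our value with accuracy_score; a minimal sketch:

from sklearn.metrics import accuracy_score

# Error rate = 1 - accuracy; this should match our error_rate() result (0.15)
print("Model Error Rate: ", 1 - accuracy_score(true_labels, preds))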

Calculate the Model Precision and Recall

Before understanding precision and recall, you must understand the following four terms: True Positive, True Negative, False Positive, and False Negative.

True Positive (TP): the model predicted positive and the actual label is also positive. Example: the model predicted India will win the World Cup, and India actually wins.

True Negative (TN): the model predicted negative and the actual label is also negative. Example: the model predicted India will not win the World Cup, and India actually does not win.

False Positive (FP): the model predicted positive but the actual label is negative. Example: the model predicted India will win the World Cup, but India does not win.

False Negative (FN): the model predicted negative but the actual label is positive. Example: the model predicted India will not win the World Cup, but India actually wins.

Source: https://medium.com/@shrutisaxena0617/precision-vs-recall-386cf9f89488
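To make these four counts concrete for our pseudo data, here is a quick sketch using Scikit-Learn's confusion_matrix; the from-scratch function below counts exactly the same quantities.

from sklearn.metrics import confusion_matrix

# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(true_labels, preds).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)  # TP: 7 TN: 10 FP: 2 FN: 1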

We will create a function from scratch called prec_recall_score() that accepts as input a list of true labels and a list of model predictions and returns both the model precision and recall.

def prec_recall_score(labels, preds):
    tp = 0
    tn = 0
    fp = 0
    fn = 0
    for i in range(len(preds)):
        if preds[i] == 1 and labels[i] == 1:  # true positive
            tp += 1

        if preds[i] == 0 and labels[i] == 0:  # true negative
            tn += 1

        if preds[i] == 1 and labels[i] == 0:  # false positive
            fp += 1

        if preds[i] == 0 and labels[i] == 1:  # false negative
            fn += 1

    prec = tp / (tp + fp)
    recall = tp / (tp + fn)

    return prec, recall

Now we will use our prec_recall_score function to compute precision and recall for the true labels and the model predictions we calculated previously.

precision, recall = prec_recall_score(true_labels, preds)
print("Precision = ", precision)
print("Recall = ", recall)

output: Precision =  0.7777777777777778
        Recall =  0.875

Next, we will use Scikit-Learn’s precision_score() and recall_score() to verify that our calculations above are correct:

from sklearn.metrics import precision_score, recall_score
print("Precision = ", precision_score(true_labels, preds))
print("Recall = ", recall_score(true_labels, preds))

output: Precision =  0.7777777777777778
        Recall =  0.875

As we can see above, the precision and recall values computed by our prec_recall_score function and by Scikit-Learn's functions are the same.

Calculate the TPR and FPR for ROC Curve

The true positive rate (TPR) is the fraction of actual positives that the model correctly labels as positive, while the false positive rate (FPR) is the fraction of actual negatives that the model incorrectly labels as positive.

TPR (True Positive Rate) = Recall = TP / (TP + FN)

FPR (False Positive Rate) = FP / (FP + TN)

We will create a function from scratch called TPR_FPR_score that is nearly identical to the prec_recall_score function we wrote previously, but computes and returns the TPR and FPR.

def TPR_FPR_score(labels, preds):
    tp = 0
    tn = 0
    fp = 0
    fn = 0
    for i in range(len(preds)):
        if preds[i] == 1 and labels[i] == 1:  # true positive
            tp += 1

        if preds[i] == 0 and labels[i] == 0:  # true negative
            tn += 1

        if preds[i] == 1 and labels[i] == 0:  # false positive
            fp += 1

        if preds[i] == 0 and labels[i] == 1:  # false negative
            fn += 1

    TPR = tp / (tp + fn)
    FPR = fp / (fp + tn)
    return TPR, FPR
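As a quick check at the 0.5 threshold, we can call TPR_FPR_score on the predictions we computed earlier; with TP = 7, FN = 1, FP = 2, and TN = 10, this gives TPR = 7/8 = 0.875 and FPR = 2/12 ≈ 0.167.

# TPR and FPR for the 0.5-threshold predictions computed earlier
tpr, fpr = TPR_FPR_score(true_labels, preds)
print("TPR = ", tpr)  # 0.875
print("FPR = ", fpr)  # 0.16666666666666666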

Compute and Plot the ROC Curve

ROC (Receiver Operating Characteristic) curve: the ROC curve shows how a classification model performs at every possible threshold value. It is built from two quantities:

TPR (True Positive Rate) = Recall = TP / (TP + FN)

FPR (False Positive Rate) = FP / (FP + TN)

The TPR will be plotted against the FPR in what is called the Receiver Operating Characteristic (ROC) curve.

ROC curve of a binary classification model.
Source: https://www.researchgate.net/figure/ROC-curve-of-binary-classification-of-abnormal-TB-cancer-positives-and-normal_fig8_323143806

AUC (Area Under the ROC Curve): AUC measures the entire two-dimensional area underneath the whole ROC curve (think integral calculus) from (0, 0) to (1, 1). We will compute this value for our data at the end of the post.


We will create a function from scratch called roc_curve_computer that accepts as input (in this exact order) the true labels, the prediction probabilities, and a list of threshold values. The function computes and returns the True Positive Rate (TPR, also called recall) and the False Positive Rate (FPR), one value of each for every threshold in the list passed to the function.

As an example, calling the roc_curve_computer function with the input true_labels = [1, 0, 1, 0, 0], pred_probs = [0.875, 0.325, 0.6, 0.09, 0.4], and thresholds = [0.00, 0.25, 0.50, 0.75, 1.00] yields the output TPR = [1.0, 1.0, 1.0, 0.5, 0.0] and FPR = [1.0, 0.6666, 0.0, 0.0, 0.0].

def roc_curve_computer(labels, prob, thres):
    TPR = []
    FPR = []
    for i in range(len(thres)):
        preds = predict(prob, thres[i])        # binarize the probabilities at this threshold
        try:
            tpr, fpr = TPR_FPR_score(labels, preds)
        except ZeroDivisionError:              # no positives or no negatives in the labels
            tpr, fpr = 0, 0
        TPR.append(tpr)
        FPR.append(fpr)
    return TPR, FPR

Next, we will use our roc_curve_computer function along with the threshold values thresholds = [x/100 for x in range(101)] to compute the TPR and FPR lists.

thresholds = [x/100 for x in range(101)]
TPR, FPR = roc_curve_computer(true_labels, pred_probs, thresholds)

Now we will use the following function to plot the ROC curve. Pass the FPR and TPR that we calculated above into the function.

import matplotlib.pyplot as plt

def plot_roc_curve(fpr, tpr, label=None):
    plt.plot(fpr, tpr, linewidth=2, label=label)
    plt.plot([0, 1], [0, 1], 'k--')  # dashed diagonal line (a random classifier)
    plt.title('Receiver Operating Characteristic', fontsize=12)
    plt.axis([-0.015, 1.0, 0, 1.02])
    plt.xlabel('False Positive Rate (Fall-Out)', fontsize=12)
    plt.ylabel('True Positive Rate (Recall)', fontsize=12)
    plt.grid(True)

plt.figure(figsize=(6, 4))
plot_roc_curve(FPR, TPR)
plt.show()

Next, we will compare our plot with the plot generated by Scikit-Learn's roc_curve function. We will use Scikit-Learn's roc_curve function to calculate the false positive rates and the true positive rates.

from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(true_labels, pred_probs)

We then pass the false positive rates and true positive rates obtained from the Scikit-Learn function as input to the plot_roc_curve function in order to compare the two ROC curves:

plt.figure(figsize=(6, 4))
plot_roc_curve(fpr, tpr)
plt.show()

From the two plots above, it is clear that the curve calculated by our function and the one produced with the sklearn function are essentially the same.
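Finally, as a small sketch, we can also check the AUC mentioned earlier by comparing the area under our hand-built curve with Scikit-Learn's roc_auc_score; for this pseudo data both values come out to about 0.979.

from sklearn.metrics import auc, roc_auc_score

# sklearn.metrics.auc applies the trapezoidal rule to the (FPR, TPR) points;
# it accepts our FPR list even though it is in decreasing order.
auc_manual = auc(FPR, TPR)
auc_sklearn = roc_auc_score(true_labels, pred_probs)

print("AUC from our TPR/FPR lists: ", auc_manual)
print("AUC from roc_auc_score:     ", auc_sklearn)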

End Notes

I hope this blog helped you better understand classification metrics from scratch, in both theory and practice. In coming blogs, we will go deeper into more metrics.

Tags: machine learning, metrics
