Classification Metrics in Machine Learning
Introduction
In this blog, we will implement classification metrics for Machine Learning from scratch and then match our results against the corresponding functions in the sklearn library. Before going into detail, you should go through our previous blog on machine learning: https://ainewgeneration.com/logistic-regression/. The purpose of this blog is to familiarize you with the metrics used to measure prediction performance in classification systems. Now let's get started, build every classification metric from scratch, and compare the results with the pre-built sklearn functions.
Table of Content
- Pseudo Data for Classification Metrics
- Calculate Model Predictions
- Calculate the Model Accuracy
- Calculate the Model Error Rate
- Calculate the Model Precision and Recall
- Calculate the TPR and FPR for ROC Curve
- Compute and Plot the ROC Curve
Pseudo Data for Classification Metrics
Suppose there are 20 binary observations whose target values are:
true_labels = [1,0,0,1,0,0,1,0,0,1,1,0,0,0,0,1,0,1,0,1]
Suppose that your machine learning model returns the following prediction probabilities:
pred_probs =
[0.886,0.375,0.174,0.817,0.574,0.319,0.812,0.314,0.098,0.741,0.847,0.202,0.31,0.073,0.179,0.917,0.64,0.388,0.116,0.72]
We are going to use the pseudo data true_labels as our y_test and pred_probs as the probabilities predicted by our model.
Calculate Model Predictions
We begin by writing a function from scratch called predict()
that accepts as input a list of prediction probabilities and a threshold value, and computes the final predictions output by the model. If a prediction probability is below the threshold value, the prediction is the negative case (i.e. 0); if it is greater than or equal to the threshold value, the prediction is the positive case (i.e. 1).
def predict(pred_prob, threshold):
    pred = []
    for i in range(len(pred_prob)):
        # positive class when the probability meets or exceeds the threshold
        if pred_prob[i] >= threshold:
            pred.append(1)
        else:
            pred.append(0)
    return pred
Next, we will invoke the predict()
function to calculate the model predictions using a threshold value of 0.5 and the pred_probs list.
thresh = 0.5
pred_probs = [0.886,0.375,0.174,0.817,0.574,0.319,0.812,0.314,0.098,0.741,0.847,0.202,0.31,0.073,0.179,0.917,0.64,0.388,0.116,0.72]
preds = predict(pred_probs,thresh)
print("Model Predictions: ", preds)
output: Model Predictions: [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1]
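To see how the threshold shapes the predictions, here is a quick sketch (the 0.7 value is an arbitrary choice for illustration and is not part of the walkthrough above): raising the threshold turns the borderline probabilities 0.574 and 0.64 into negative predictions.

strict_preds = predict(pred_probs, 0.7)  # stricter, illustrative threshold
print("Model Predictions at 0.7: ", strict_preds)
# [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]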
Calculate the Model Accuracy
Now we will create a function from scratch called acc_score()
that accepts as input a list of true labels and a list of model predictions, and then we will calculate the model accuracy.
def acc_score(true_value, pred_value):
    acc = 0
    for i in range(len(true_value)):
        if true_value[i] == pred_value[i]:
            acc += 1
    acc = acc/len(true_value)
    return acc
Next, we will compute the accuracy score using our acc_score()
function, passing as input the true labels and the model predictions we calculated above.
true_labels = [1,0,0,1,0,0,1,0,0,1,1,0,0,0,0,1,0,1,0,1]  # as defined above
accuracy = acc_score(true_labels, preds)
print("Model Accuracy: ", accuracy)
output: Model Accuracy 0.85
Next, we will use Scikit-Learn’s accuracy_score() function to check that the value we computed using acc_score()
is correct.
from sklearn.metrics import accuracy_score
print("Model Accuracy",accuracy_score(preds,labels))
output: Model Accuracy 0.85
As we can see from the above, the model accuracy computed by our function and by Scikit-Learn's function is the same: 0.85.
Calculate the Model Error Rate
Error rate = 1 – Model Accuracy
We will create a function from scratch called error_rate()
that accepts as input a list of true labels and a list of model predictions, and then we will calculate the model error rate.
def error_rate(true_value, pred_value):
    err = 1 - acc_score(true_value, pred_value)
    return err
Next, we compute the model error rate for the true labels and the model predictions.
error = error_rate(true_labels, preds)
print("Model Error Rate: ", error)
output: Model Error Rate: 0.15000000000000002
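Scikit-Learn has no dedicated error-rate function, but as a quick sanity check (a minimal sketch using the accuracy_score import shown above) we can subtract the accuracy from 1:

from sklearn.metrics import accuracy_score

# Error rate is simply 1 minus accuracy
print("Model Error Rate: ", 1 - accuracy_score(true_labels, preds))
# output: Model Error Rate:  0.15000000000000002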
Calculate the Model Precision and Recall
Before understanding precision and recall, you must understand the four outcomes of a binary prediction: True Positive, True Negative, False Positive, and False Negative.
True Positive (TP): the model predicts positive and the actual label is positive. Example: the model predicts India will win the World Cup and India actually wins.
True Negative (TN): the model predicts negative and the actual label is negative. Example: the model predicts India will not win the World Cup and India actually does not win.
False Positive (FP): the model predicts positive but the actual label is negative. Example: the model predicts India will win the World Cup but India does not win.
False Negative (FN): the model predicts negative but the actual label is positive. Example: the model predicts India will not win the World Cup but India actually wins.

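As a quick cross-check on these four counts for our pseudo data (a minimal sketch; Scikit-Learn's confusion_matrix is not used elsewhere in this post), we can tabulate them directly from the true labels and the threshold-0.5 predictions:

from sklearn.metrics import confusion_matrix

# For binary labels the matrix is [[TN, FP], [FN, TP]], rows = actual, columns = predicted
tn, fp, fn, tp = confusion_matrix(true_labels, preds).ravel()
print("TP:", tp, " TN:", tn, " FP:", fp, " FN:", fn)
# output: TP: 7  TN: 10  FP: 2  FN: 1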
We will create a function from scratch called prec_recall_score()
that accepts as input a list of true labels and a list of model predictions, and
returns both the model precision and recall.
def prec_recall_score(labels, preds):
    tp = 0
    tn = 0
    fp = 0
    fn = 0
    for i in range(len(preds)):
        if preds[i] == 1 and labels[i] == 1:  # true positive
            tp += 1
        if preds[i] == 0 and labels[i] == 0:  # true negative
            tn += 1
        if preds[i] == 1 and labels[i] == 0:  # false positive
            fp += 1
        if preds[i] == 0 and labels[i] == 1:  # false negative
            fn += 1
    prec = tp/(tp+fp)
    recall = tp/(tp+fn)
    return prec, recall
Now we will use our prec_recall_score()
function to compute precision and recall for the true labels and the model predictions we calculated previously.
precision, recall = prec_recall_score(true_labels, preds)
print("Precision = ", precision)
print("Recall = ", recall)
output: Precision = 0.7777777777777778
Recall = 0.875
Next, we will use Scikit-Learn’s precision_score()
and recall_score()
to verify that our calculations above are correct:
from sklearn.metrics import precision_score, recall_score
print("Precision = ", precision_score(true_labels, preds))
print("Recall = ", recall_score(true_labels, preds))
output: Precision = 0.7777777777777778
Recall = 0.875
As we can see from the above, our prec_recall_score()
function and Scikit-Learn's functions return the same values of precision and recall.
Calculate the TPR and FPR for ROC Curve
The true positive rate (TPR) measures the fraction of actual positive cases that the model correctly identifies, while the false positive rate (FPR) measures the fraction of actual negative cases that the model incorrectly flags as positive.
TPR (True Positive Rate) = Recall = TP / (TP + FN)
FPR (False Positive Rate) = FP / (FP + TN)
We will create a function from scratch called TPR_FPR_score
that is nearly identical to the prec_recall_score
function we wrote previously, and that computes and returns the TPR and FPR.
def TPR_FPR_score(labels, preds):
    tp = 0
    tn = 0
    fp = 0
    fn = 0
    for i in range(len(preds)):
        if preds[i] == 1 and labels[i] == 1:  # true positive
            tp += 1
        if preds[i] == 0 and labels[i] == 0:  # true negative
            tn += 1
        if preds[i] == 1 and labels[i] == 0:  # false positive
            fp += 1
        if preds[i] == 0 and labels[i] == 1:  # false negative
            fn += 1
    TPR = tp/(tp+fn)
    FPR = fp/(fp+tn)
    return TPR, FPR
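As a quick sanity check (not part of the original walkthrough), we can call TPR_FPR_score on the threshold-0.5 predictions computed earlier. With TP = 7, FN = 1, FP = 2, and TN = 10 this gives TPR = 7/8 and FPR = 2/12:

tpr_05, fpr_05 = TPR_FPR_score(true_labels, preds)
print("TPR at threshold 0.5: ", tpr_05)  # 0.875
print("FPR at threshold 0.5: ", fpr_05)  # 0.16666666666666666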
Compute and Plot the ROC Curve
ROC (Receiver Operating Characteristic) curve: the ROC curve shows how a classification model performs at every possible threshold value. The curve is built from two rates:
TPR (True Positive Rate) = Recall = TP / (TP + FN)
FPR (False Positive Rate) = FP / (FP + TN)
The TPR will be plotted against the FPR in what is called the Receiver Operating Characteristic (ROC) curve.
AUC (Area Under the ROC Curve): the AUC measures the entire two-dimensional area underneath the ROC curve (think integral calculus) from (0, 0) to (1, 1).
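We do not compute the AUC by hand in this post, but as a minimal sketch (Scikit-Learn's roc_auc_score is not used elsewhere here), the area can be obtained directly from the true labels and the prediction probabilities:

from sklearn.metrics import roc_auc_score

# AUC is computed from the probabilities, not from thresholded predictions
print("AUC: ", roc_auc_score(true_labels, pred_probs))  # roughly 0.979 for this pseudo data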
We will create a function from scratch called roc_curve_computer
that accepts as input (in this exact order) the true labels, the prediction probabilities, and a list of threshold values. For each threshold value in the list, the function computes the True Positive Rate (TPR, also called recall) and the False Positive Rate (FPR), and it returns the two resulting lists.
As an example, calling the roc_curve_computer
function with the inputs true_labels = [1, 0, 1, 0, 0],
pred_probs = [0.875, 0.325, 0.6, 0.09, 0.4],
and thresholds = [0.00, 0.25, 0.50, 0.75, 1.00]
yields the output TPR = [1.0, 1.0, 1.0, 0.5, 0.0]
and FPR = [1.0, 0.6666, 0.0, 0.0, 0.0].
def roc_curve_computer(labels, prob, thres):
    TPR = []
    FPR = []
    for i in range(len(thres)):
        preds = predict(prob, thres[i])
        try:
            tpr, fpr = TPR_FPR_score(labels, preds)
        except ZeroDivisionError:
            # if a threshold leaves no positives or no negatives, fall back to 0
            tpr, fpr = 0, 0
        TPR.append(tpr)
        FPR.append(fpr)
    return TPR, FPR
Next, we will use our roc_curve_computer
function along with the threshold values thresholds = [x/100 for x in range(101)]
to compute the TPR and FPR lists.
thresholds = [x/100 for x in range(101)]
TPR, FPR = roc_curve_computer(true_labels, pred_probs, thresholds)
Now we will use the following function to plot the ROC curve. Pass the FPR and TPR that we calculated above into the function.
import matplotlib.pyplot as plt

def plot_roc_curve(fpr, tpr, label=None):
    plt.plot(fpr, tpr, linewidth=2, label=label)
    plt.plot([0, 1], [0, 1], 'k--')  # dashed diagonal line
    plt.title('Receiver Operating Characteristic', fontsize=12)
    plt.axis([-0.015, 1.0, 0, 1.02])
    plt.xlabel('False Positive Rate (Fall-Out)', fontsize=12)
    plt.ylabel('True Positive Rate (Recall)', fontsize=12)
    plt.grid(True)

plt.figure(figsize=(6, 4))
plot_roc_curve(FPR, TPR)
plt.show()

Next, we will compare our plot to the plot generated by Scikit-Learn's roc_curve
function. We will use Scikit-Learn's roc_curve
function to calculate the false positive rates and the true positive rates.
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(true_labels, pred_probs)
We will pass the false positive rates and the true positive rates obtained from Scikit-Learn's function as input to the plot_roc_curve
function in order to compare the ROC curves:
plt.figure(figsize=(6, 4))
plot_roc_curve(fpr, tpr)
plt.show()

From the above two results, it is clear that the curve calculated by our function and the one produced by the sklearn function are similar.
End Notes
I hope this blog helped you better understand classification metrics from scratch, in both theory and practice. In a coming blog, we will go deeper into more metrics.