Gradient Boosting In Machine Learning
Introduction
In this blog, we will focus on Gradient Boosting in Machine Learning. Before reading further, you should be comfortable with Ensemble Learning (https://ainewgeneration.com/random-forest-in-machine-learning/) and the Boosting algorithm in Machine Learning (https://ainewgeneration.com/boosting-in-machine-learning/). Gradient Boosting is an algorithm in which a set of machine learning models, generally weak learners, is combined to produce a strong predictive model; usually a Decision Tree is used as the base learner. We typically prefer the Gradient Boosting algorithm when dealing with complex datasets.
Table of Contents
- Gradient Boosting In Machine Learning.
- Parallel Ensemble.
- Sequential Ensemble.
- AdaBoost vs GradientBoost.
- AdaBoost.
- GradientBoost.
- Working of Gradient Boosting Algorithm.
- Hyperparameters: Learning rate and n_estimators.
- Implementation of Gradient Boosting In Machine Learning
Gradient Boosting In Machine Learning
Gradient Boosting is a sequential ensemble method: a collection of weak machine learning models, where the base learner is usually a Decision Tree. There are two types of machine learning ensembles: parallel ensembles and sequential ensembles.
- Parallel Ensemble: In a parallel ensemble, a set of models is trained in parallel and their predictions are aggregated together to make the final prediction. The base estimators are typically strong models.
Random Forest and Bagging are examples of the Parallel Ensemble model.

- Sequential Ensemble: In a sequential ensemble, a set of models is trained sequentially. The base estimators are typically weak models.
In a sequential ensemble, the first model is built on the dataset; the model then identifies where it is failing and gives higher weight to the misclassified records, which are passed on to the next model so it can improve on the data the previous model got wrong, and so on. The models are trained one after another, as illustrated in the sketch below.
Gradient Boosting and AdaBoost are examples of the Sequential Ensemble model.
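As a quick illustration, here is a minimal sketch (using scikit-learn on a synthetic dataset, with illustrative rather than tuned parameters) that instantiates one ensemble of each kind side by side:
# Parallel ensemble (Random Forest) vs. sequential ensemble (Gradient Boosting).
# The dataset and parameter values below are only for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Parallel: many trees trained independently, predictions aggregated.
parallel = RandomForestClassifier(n_estimators=100, random_state=0)

# Sequential: shallow trees trained one after another, each correcting the
# errors of the ensemble built so far.
sequential = GradientBoostingClassifier(n_estimators=100, max_depth=2, random_state=0)

print("Random Forest CV accuracy    :", cross_val_score(parallel, X, y, cv=5).mean())
print("Gradient Boosting CV accuracy:", cross_val_score(sequential, X, y, cv=5).mean())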

AdaBoost vs GradientBoost.
In a sequential boosting algorithm, the key point is to identify where the model has misclassified the data and to make the next base learner focus more on that misclassified data, step by step, in order to increase the accuracy.
AdaBoost
In the AdaBoost algorithm, the sample weights are updated for all the misclassified data: the records the model could not predict correctly have their weights increased, meaning more weight is given to those records for the next model in the sequence.

In the square vs. circle binary classification example above, the model has misclassified some circles and squares. In AdaBoost, the misclassified points are assigned higher weights than the points the model classified correctly.
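For intuition, here is a rough numerical sketch of the AdaBoost-style reweighting step (a simplification of the real algorithm; the labels and predictions are made up for illustration): misclassified samples have their weights increased, and the weights are renormalised before the next weak learner is trained.
import numpy as np

# Toy illustration of AdaBoost-style sample reweighting (simplified).
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, 1, 1])        # weak learner's predictions
w = np.full(len(y_true), 1 / len(y_true))   # start with uniform weights

err = np.sum(w[y_pred != y_true])           # weighted error rate
alpha = 0.5 * np.log((1 - err) / err)       # weight ("say") of this learner
w = w * np.exp(-alpha * y_true * y_pred)    # boost the misclassified samples
w = w / w.sum()                             # renormalise
print(w)                                    # misclassified samples now carry larger weight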
Gradient Boosting
In the Gradient Boosting algorithm, the first step is similar to AdaBoost: the model focuses on the data points it misclassified. But instead of assigning higher weights to the misclassified points, it uses residuals, i.e. the error between the true label and the predicted label. Misclassified examples have large residuals (loss gradients). Gradient Boosting minimizes the error between predicted and true values in order to increase the accuracy of the next model.

In the square vs. circle example above, the model has again misclassified some circles and squares. The Gradient Boosting algorithm computes the residual as well as its magnitude; the magnitude indicates in which direction and by how much the model should adjust its behavior in order to build a new model with higher accuracy.

Working of Gradient Boosting Algorithm.

In the Gradient Boosting algorithm we compute a loss, which directly indicates how our model is performing. Suppose we are using the mean squared error (MSE) loss, which measures the squared difference between the true label and the predicted label. If this difference is small, the model accuracy is high; otherwise the model has low accuracy. To minimize the difference between the true label and the predicted label, we can use a gradient boosting algorithm.

To decrease the MSE of the model, we want the model to predict correctly. The loss function is optimized with the help of gradient descent: we update our predictions based on the learning rate until we reach a point where the MSE is low.

Therefore, we are essentially revising the predictions so that the residuals move closer to 0 (or to a minimum) and the predicted values get close enough to the actual values, for example as in the sketch below.
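To make this concrete, here is a tiny hand-rolled sketch (not the scikit-learn internals; the numbers are made up): with squared-error loss, the negative gradient of the loss with respect to the current prediction is simply the residual, so each boosting step nudges the predictions toward the targets by a fraction (the learning rate) of that residual.
import numpy as np

# For MSE loss L = 0.5 * (y - F)^2, the negative gradient w.r.t. the
# prediction F is the residual (y - F), so gradient descent on the
# predictions just adds a fraction of the residual at each step.
y = np.array([3.0, -1.0, 2.0, 7.0])
F = np.full_like(y, y.mean())       # initial prediction: the mean
learning_rate = 0.1

for step in range(50):
    residual = y - F                # negative gradient of the MSE loss
    F = F + learning_rate * residual

print(np.round(F, 3))               # predictions move toward the targets
print(np.round(y - F, 3))           # residuals shrink toward 0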
Hyperparameters: Learning rate and n_estimators.
Hyperparameters are a key part of learning algorithms; tuning them can increase the accuracy of the model. In the Gradient Boosting algorithm you will mainly deal with the learning rate and n_estimators.
Learning rate: The learning rate indicates how fast the model learns. It is denoted by α; in scikit-learn's GradientBoostingClassifier the default is 0.1. In Gradient Boosting, the residual (error) left by the previous model is multiplied by the learning rate. The lower the learning rate, the less prone the model is to overfitting, but the slower it learns.
n_estimators: n_estimators is the number of trees used by the model. If the learning rate is low, n_estimators usually needs to be larger (often on the order of 50-100 or more), but be careful: too many trees can make the model prone to overfitting.
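In scikit-learn, both knobs are passed straight to the estimator; a minimal, untuned example (the values are only illustrative):
from sklearn.ensemble import GradientBoostingClassifier

# A small learning rate paired with a larger number of trees (illustrative values).
gb = GradientBoostingClassifier(learning_rate=0.05, n_estimators=200, random_state=0)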
Maths behind Gradient Boosting Algorithm
Suppose the target value is y_actual and the feature values are x. On the basis of the features x, the model predicts the target y_pred. The difference between y_pred and y_actual is called the residual or error. Gradient boosting builds successive trees by minimizing this error.
Let's say the model output y, when fit to only 1 decision tree, is given by:
y = A1 + (B1 ∗ x) + e1
where e1 is the loss or residual from this decision tree.
In gradient boosting, each consecutive decision tree is fit on the loss (residual) of the previous one. Therefore, the successive decision trees are represented as:
e1 = A2 + (B2 ∗ x) + e2
e2 = A3 + (B3 ∗ x) + e3
Note that we stop at 3 decision trees here for illustration, but an actual gradient boosting model typically uses 50-100 decision trees (weak learners) or more.
Summing all three equations, the final model is given by:
y = A1 + A2 + A3 + (B1 ∗ x) + (B2 ∗ x) + (B3 ∗ x) + e3
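The same idea in code: a rough sketch (using scikit-learn's DecisionTreeRegressor as the weak learner on made-up data, with illustrative parameters) in which each successive tree is fit to the residual left by the model built so far:
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Each tree fits the residual (the e_i in the equations above) left by
# the current model, and its prediction is added with a learning rate.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())      # start from the mean
trees = []

for _ in range(100):                        # 100 weak learners
    residual = y - prediction               # error of the current model
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("Final mean squared residual:", np.mean((y - prediction) ** 2))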
Implementation of Gradient Boosting In Machine Learning
We will use the Titanic dataset to predict whether a passenger survived or not based on a number of features.
You can find the dataset on Kaggle: https://www.kaggle.com/c/titanic/data
Importing the required libraries
%matplotlib notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_curve, auc
Loading the training and test sets
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
train.head()

Set “PassengerId” variable as index
train.set_index("PassengerId", inplace=True)
test.set_index("PassengerId", inplace=True)

Generate training target set (y_train)
y_train = train["Survived"]
output:
PassengerId
1 0
2 1
3 1
4 1
5 0
..
887 0
888 1
889 0
890 1
891 0
Name: Survived, Length: 891, dtype: int64
Delete column “Survived” from train set
train.drop(labels="Survived", axis=1, inplace=True)
Shapes of train and test sets
print("Train shape :" ,train.shape)
print("Test shape :", test.shape)
output:
Train shape : (891, 10)
Test shape : (418, 10)
Join train and test sets to form a new train_test set
train_test = pd.concat([train, test])
train_test.head()

Delete columns that are not used as features for training and prediction
columns_to_drop = ["Name", "Age", "SibSp", "Ticket", "Cabin", "Parch", "Embarked"]
train_test.drop(labels=columns_to_drop, axis=1, inplace=True)
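The next cells use a frame named train_test_dummies, whose creation is missing here; a plausible reconstruction (assuming the remaining categorical column "Sex" is one-hot encoded, which also matches the 4-feature shapes printed below) is:
train_test_dummies = pd.get_dummies(train_test, columns=["Sex"])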
Replace the nulls with 0.0
train_test_dummies.fillna(value=0.0, inplace=True)
Generate feature sets (X)
X_train = train_test_dummies.values[0:891]
X_test = train_test_dummies.values[891:]
X_train.shape, X_test.shape
output:
((891, 4), (418, 4))
Scaling the dataset
scaler = MinMaxScaler()
X_train_scale = scaler.fit_transform(X_train)
X_test_scale = scaler.transform(X_test)
Split training feature and target sets into training and validation subsets
X_train_sub, X_validation_sub, y_train_sub, y_validation_sub = train_test_split(X_train_scale, y_train, random_state=0)
Train with Gradient Boosting algorithm
Compute the accuracy scores on the training and validation sets when training with different learning rates
learning_rates = [0.05, 0.1, 0.25, 0.5, 0.75, 1]
for learning_rate in learning_rates:
    gb = GradientBoostingClassifier(n_estimators=20, learning_rate=learning_rate, max_features=2, max_depth=2, random_state=0)
    gb.fit(X_train_sub, y_train_sub)
    print("Learning rate: ", learning_rate)
    print("Accuracy score (training): {0:.3f}".format(gb.score(X_train_sub, y_train_sub)))
    print("Accuracy score (validation): {0:.3f}".format(gb.score(X_validation_sub, y_validation_sub)))
    print()
output:
Learning rate: 0.05
Accuracy score (training): 0.789
Accuracy score (validation): 0.780
Learning rate: 0.1
Accuracy score (training): 0.792
Accuracy score (validation): 0.780
Learning rate: 0.25
Accuracy score (training): 0.816
Accuracy score (validation): 0.803
Learning rate: 0.5
Accuracy score (training): 0.826
Accuracy score (validation): 0.834
Learning rate: 0.75
Accuracy score (training): 0.831
Accuracy score (validation): 0.789
Learning rate: 1
Accuracy score (training): 0.831
Accuracy score (validation): 0.789
Output confusion matrix and classification report of Gradient Boosting algorithm on validation set
gb = GradientBoostingClassifier(n_estimators=20, learning_rate = 0.5, max_features=2, max_depth = 2, random_state = 0)
gb.fit(X_train_sub, y_train_sub)
predictions = gb.predict(X_validation_sub)
print("Confusion Matrix:")
print(confusion_matrix(y_validation_sub, predictions))
print()
print("Classification Report")
print(classification_report(y_validation_sub, predictions))
output:
Confusion Matrix:
[[131   8]
 [ 29  55]]
Classification Report
              precision    recall  f1-score   support

           0       0.82      0.94      0.88       139
           1       0.87      0.65      0.75        84

    accuracy                           0.83       223
   macro avg       0.85      0.80      0.81       223
weighted avg       0.84      0.83      0.83       223
End Notes
I hope Gradient Boosting in Machine Learning was clearly explained, from the basics to the implementation. In the next article, we will go through XGBoost.