Gradient Boosting In Machine Learning

  • Posted by Rehan
  • Date September 19, 2021

Introduction

In this blog, we will focus on Gradient Boosting in Machine Learning. You should be clear on the concepts of Ensemble Learning (https://ainewgeneration.com/random-forest-in-machine-learning/) and the Boosting algorithm in Machine Learning (https://ainewgeneration.com/boosting-in-machine-learning/). Gradient Boosting is an algorithm in which a set of weak learners, usually Decision Trees as base learners, is combined to build a strong predictive model. We usually prefer the Gradient Boosting algorithm when we are dealing with complex datasets.

Table of Contents

  1. Gradient Boosting In Machine Learning
    • Parallel Ensemble
    • Sequential Ensemble
  2. AdaBoost vs Gradient Boosting
    • AdaBoost
    • Gradient Boosting
  3. Working of Gradient Boosting Algorithm
    • Hyperparameters: Learning rate and n_estimators
  4. Implementation of Gradient Boosting In Machine Learning

Gradient Boosting In Machine Learning

Gradient Boosting is a sequential ensemble method, a collection of weak machine learning models in which the base learner is usually a Decision Tree. There are two types of ensembles in Machine Learning: the parallel ensemble and the sequential ensemble.

  • Parallel Ensemble: In a parallel ensemble, a set of models is trained in parallel and their predictions are aggregated together to make the final prediction. The base estimator models are typically strong models.

Random Forest and Bagging are examples of the Parallel Ensemble Model.

  • Sequential Ensemble: In a sequential ensemble, a set of models is trained sequentially. The base estimator models are typically weak models.

In a sequential ensemble, the first model is built on the dataset; the model then identifies where it fails and gives higher weight to the data it classified wrongly, which is passed on to the next model so it can improve on those errors, and so on. The models work sequentially, trained one by one.

Gradient Boosting and AdaBoost are examples of the Sequential Ensemble Model.
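
To make the difference concrete, here is a small sketch (not part of the original tutorial) that trains both kinds of ensembles with scikit-learn; the synthetic dataset and parameter values are illustrative assumptions only.

# Illustrative sketch: parallel ensemble (Random Forest) vs sequential ensemble (Gradient Boosting)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Parallel ensemble: trees are trained independently and their votes are aggregated.
rf = RandomForestClassifier(n_estimators=100, random_state=0)

# Sequential ensemble: each tree is trained on the mistakes of the previous ones.
gb = GradientBoostingClassifier(n_estimators=100, random_state=0)

print("Random Forest  :", cross_val_score(rf, X, y, cv=5).mean())
print("Gradient Boost :", cross_val_score(gb, X, y, cv=5).mean())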

AdaBoost vs Gradient Boosting

In a sequential boosting algorithm, the key point is to identify where the model has misclassified the data and to make the next base learner focus on that misclassified data, in a sequential way, so as to increase accuracy.

AdaBoost

In the AdaBoost boosting algorithm, the weights are updated for all the misclassified data: records that the model cannot predict correctly have their weights increased, meaning more weight is given to those records for the next model in the sequence to predict.

In the example above, a square vs. circle binary classification model misclassifies between circles and squares. In the AdaBoost model, the misclassified data points are assigned higher weights than those that the model classified correctly.
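
As a quick illustration (not from the original post), scikit-learn's AdaBoostClassifier implements exactly this re-weighting scheme; the synthetic dataset and settings below are assumptions, and the default base learner is a decision stump (a depth-1 tree).

# Illustrative sketch: AdaBoost re-weights samples so later stumps focus on earlier mistakes.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X, y)

# estimator_weights_ shows how much say each weak learner gets in the final vote.
print(ada.estimator_weights_[:5])
print("Training accuracy:", ada.score(X, y))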

Gradient Boosting

In the Gradient Boosting algorithm, the first step is similar to AdaBoost: the model focuses on the data points it misclassified. But instead of assigning a higher weight to each wrongly classified data point, it uses the residual, which is the error between the true label and the predicted label. Misclassified examples have large residuals, or loss gradients. Gradient boosting minimizes the error between predicted and true values in order to increase the accuracy of the new model.

In the example above, a square vs. circle binary classification model misclassifies between circles and squares. The Gradient Boosting algorithm computes the residual as well as its magnitude; the magnitude indicates which way the model should push and adjust its behavior in order to produce a new model with higher accuracy.

Working of Gradient Boosting Algorithm.

Image source: "Gradient Boosting Trees for Classification: A Beginner's Guide" by Aratrika Pal, The Startup (Medium).

In the Gradient Boosting algorithm we compute a loss, which directly indicates how our model is performing. Suppose we use the mean squared error (MSE) loss, which measures the squared difference between the true label and the predicted label. If the difference between the true label and the predicted label is small, the model's accuracy is high; otherwise the model has low accuracy. To minimize the difference between the true label and the predicted label, we can use a gradient boosting algorithm.

In order to decrease the MSE of the model, we want the model to predict correctly. The loss function can be optimized with gradient descent: we update our predictions based on the learning rate, moving toward values where the MSE is low.

Therefore, we are essentially refining the predictions so that the residual is as close to 0 as possible (or at its minimum) and the predicted values are close enough to the actual values.
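
To make this concrete, here is a toy sketch of the idea (not scikit-learn's actual implementation), assuming squared-error loss and shallow regression trees as weak learners: each tree is fit to the current residuals, and the predictions are nudged toward the targets by the learning rate.

# Toy sketch of gradient boosting for squared-error loss (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_estimators = 50

# Start from a constant prediction (the mean of the targets).
pred = np.full_like(y, y.mean())
trees = []

for _ in range(n_estimators):
    residual = y - pred                      # negative gradient of MSE w.r.t. the prediction
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)                    # fit the next weak learner to the residuals
    pred += learning_rate * tree.predict(X)  # nudge predictions toward the targets
    trees.append(tree)

print("Final MSE:", np.mean((y - pred) ** 2))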

Hyperparameters: Learning rate and n_estimators.

Hyperparameters are a key part of learning algorithms, and tuning them can increase the accuracy of the learning model. In the Gradient Boosting algorithm you will mainly deal with the learning rate and n_estimators.

Learning rate: The learning rate indicates how fast the model learns. It is denoted by α (in scikit-learn's GradientBoostingClassifier the default is 0.1). In gradient boosting, the residual or error predicted by each new tree is multiplied by the learning rate before it is added to the model. The lower the learning rate, the less prone the model is to overfitting, but the slower it learns.

n_estimators: n_estimators is the number of trees used by the model. If the learning rate is low, n_estimators should be larger, roughly 50-100, but be careful: too many trees can make the model prone to overfitting.
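
A small sketch of this trade-off (the dataset and the exact values are illustrative assumptions): a large learning rate with few trees versus a small learning rate with many trees often reach similar accuracy.

# Illustrative sketch of the learning_rate / n_estimators trade-off.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

fast = GradientBoostingClassifier(learning_rate=0.5, n_estimators=20, random_state=0)
slow = GradientBoostingClassifier(learning_rate=0.05, n_estimators=200, random_state=0)

for name, model in [("lr=0.5,  20 trees ", fast), ("lr=0.05, 200 trees", slow)]:
    model.fit(X_tr, y_tr)
    print(name, "validation accuracy:", round(model.score(X_val, y_val), 3))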

Maths behind Gradient Boosting Algorithm

Suppose the target value is y_actual and the feature values are x. On the basis of the features x, the model predicts the target y_pred. The difference between y_pred and y_actual is called the residual or error. Gradient boosting builds successive trees by minimizing this error.

Let's say the model output y, when fit with only one decision tree, is given by:

y = A1 + (B1 * X) + e1, where e1 is the loss or residual from this decision tree.

In gradient boosting, we fit consecutive decision trees on the loss or residual of the previous tree. Therefore, when gradient boosting is applied to this model, the successive decision trees will be represented as:

e1 = A2 + (B2 * X) + e2

e2 = A3 + (B3 * X) + e3

Note that here we stop at 3 decision trees for illustration, but an actual gradient boosting model uses on the order of 50-100 decision trees (weak learners).

Summing all three equations, the final model is given by:

y = A1 + A2 + A3 + (B1 * X) + (B2 * X) + (B3 * X) + e3
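
The three equations can be mirrored directly in code. Below is a toy sketch (not from the original post) that uses depth-1 decision trees as the weak learners; each stage is fit to the residual left by the previous stage, and the final prediction is the sum of the stage outputs, just as in the last equation.

# Toy illustration of the equations above: y ≈ stage1 + stage2 + stage3 + e3.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel()

residual = y
stage_outputs = []

for _ in range(3):                          # three weak learners, as in the equations
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    stage_outputs.append(stump.predict(X))
    residual = residual - stage_outputs[-1] # e_i becomes the target for the next stage

final_pred = sum(stage_outputs)             # the summed model
print("MSE after one stage   :", round(float(np.mean((y - stage_outputs[0]) ** 2)), 4))
print("MSE after three stages:", round(float(np.mean((y - final_pred) ** 2)), 4))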

Implementation of Gradient Boosting In Machine Learning

We will use the Titanic dataset to predict whether a passenger survived or not based on a number of features.

You can find the dataset on Kaggle: https://www.kaggle.com/c/titanic/data

Importing required libraries

%matplotlib notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_curve, auc

Loading the training and test datasets


train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
train.head()

Set “PassengerId” variable as index


train.set_index("PassengerId", inplace=True)
test.set_index("PassengerId", inplace=True)

Generate training target set (y_train)


y_train = train["Survived"]

output: 
PassengerId
1      0
2      1
3      1
4      1
5      0
      ..
887    0
888    1
889    0
890    1
891    0
Name: Survived, Length: 891, dtype: int64

Delete column “Survived” from train set


train.drop(labels="Survived", axis=1, inplace=True)

Shapes of train and test sets


print("Train shape :" ,train.shape)
print("Test shape :", test.shape)

output: 
Train shape : (891, 10)
Test shape : (418, 10)

Join train and test sets to form a new train_test set

train_test = pd.concat([train, test])
train_test.head()

Delete columns that are not used as features for training and prediction


columns_to_drop = ["Name", "Age", "SibSp", "Ticket", "Cabin", "Parch", "Embarked"]
train_test.drop(labels=columns_to_drop, axis=1, inplace=True)

Replace the nulls with 0.0


train_test_dummies.fillna(value=0.0, inplace=True)

Generate feature sets (X)

X_train = train_test_dummies.values[0:891]
X_test = train_test_dummies.values[891:]

X_train.shape, X_test.shape

output: 
((891, 4), (418, 4))

Scaling the dataset

scaler = MinMaxScaler()
X_train_scale = scaler.fit_transform(X_train)
X_test_scale = scaler.transform(X_test)

Split training feature and target sets into training and validation subsets


X_train_sub, X_validation_sub, y_train_sub, y_validation_sub = train_test_split(X_train_scale, y_train, random_state=0)

Train with Gradient Boosting algorithm

Compute the accuracy scores on the training and validation sets when training with different learning rates


learning_rates = [0.05, 0.1, 0.25, 0.5, 0.75, 1]
for learning_rate in learning_rates:
    gb = GradientBoostingClassifier(n_estimators=20, learning_rate = learning_rate, max_features=2, max_depth = 2, random_state = 0)
    gb.fit(X_train_sub, y_train_sub)
    print("Learning rate: ", learning_rate)
    print("Accuracy score (training): {0:.3f}".format(gb.score(X_train_sub, y_train_sub)))
    print("Accuracy score (validation): {0:.3f}".format(gb.score(X_validation_sub, y_validation_sub)))
    print()

output:
Learning rate:  0.05
Accuracy score (training): 0.789
Accuracy score (validation): 0.780

Learning rate:  0.1
Accuracy score (training): 0.792
Accuracy score (validation): 0.780

Learning rate:  0.25
Accuracy score (training): 0.816
Accuracy score (validation): 0.803

Learning rate:  0.5
Accuracy score (training): 0.826
Accuracy score (validation): 0.834

Learning rate:  0.75
Accuracy score (training): 0.831
Accuracy score (validation): 0.789

Learning rate:  1
Accuracy score (training): 0.831
Accuracy score (validation): 0.789

Output confusion matrix and classification report of Gradient Boosting algorithm on validation set


gb = GradientBoostingClassifier(n_estimators=20, learning_rate = 0.5, max_features=2, max_depth = 2, random_state = 0)
gb.fit(X_train_sub, y_train_sub)
predictions = gb.predict(X_validation_sub)

print("Confusion Matrix:")
print(confusion_matrix(y_validation_sub, predictions))
print()
print("Classification Report")
print(classification_report(y_validation_sub, predictions))

output:
Confusion Matrix:
[[131   8]
 [ 29  55]]

Classification Report
              precision    recall  f1-score   support

           0       0.82      0.94      0.88       139
           1       0.87      0.65      0.75        84

    accuracy                           0.83       223
   macro avg       0.85      0.80      0.81       223
weighted avg       0.84      0.83      0.83       223

End Notes

I hope Gradient Boosting in Machine Learning has been clearly explained, from the basics through to the implementation. In the next article, we will go through XGBoost in machine learning.

Tag: machine learning
