Car Evaluation using Machine Learning

  • Posted by Natasha
  • Date September 1, 2021

Introduction

As the title suggests, in this blog we will predict car evaluations using machine learning. We will carry out the prediction with three different algorithms: Logistic Regression, Decision Tree and Random Forest. This is a classification problem, i.e. we need to classify each car as unacceptable, acceptable, good or very good based on a number of features. To follow along, you can download the dataset from the link below.

Car Evaluation dataset: https://drive.google.com/file/d/1pqL9NdCrX1_xJYLZqStzpMmyEYXq0ot4/view?usp=sharing

Also, if you have not gone through our blogs on the algorithms used here, we recommend doing that first. The links for each of them are given below.

Logistic Regression: https://ainewgeneration.com/logistic-regression/

Decision Tree: https://ainewgeneration.com/decision-tree-in-machine-learning/

Random Forest: https://ainewgeneration.com/random-forest-in-machine-learning/

Table of Contents

  1. Data Analysis and Preprocessing
  2. Car Evaluation prediction using Logistic Regression
  3. Car Evaluation prediction using Decision Tree
  4. Car Evaluation prediction using Random Forest

1. Data Analysis and Preprocessing

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,mean_squared_error

First things first. As in every project, our first task is to import all the libraries that are going to be used in our code.

df=pd.read_csv(r"D:\Project_Datasets\Car Evaluation\car_evaluation.csv")

After importing the libraries, we read our dataset, which is a comma-separated values (CSV) file, into a pandas DataFrame named df.

df.head()
col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']


df.columns = col_names

col_names

output:
['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
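
As we can see, the columns of our dataset have not been named, so here we name them and then print the names. The output of this code is shown above. Note that read_csv treats the first line of a file as the header by default. This file has no header row, so its first record was read as column names and is overwritten here. That is most likely why info() reports 1727 rows instead of the 1728 rows the Car Evaluation dataset is usually distributed with. Passing header=None to read_csv would keep that first record.
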
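
The effect of the default header handling can be seen in a minimal sketch. The three records and the names raw, df_default and df_named are made up for illustration; header and names are standard read_csv parameters.

```python
import io
import pandas as pd

# Three made-up records in the same headerless layout as the dataset
raw = (
    "vhigh,vhigh,2,2,small,low,unacc\n"
    "vhigh,vhigh,2,2,small,med,unacc\n"
    "low,low,4,4,big,high,vgood\n"
)
col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

# Default behaviour: the first record is consumed as the header row
df_default = pd.read_csv(io.StringIO(raw))
df_default.columns = col_names

# header=None keeps every record; names supplies the column names directly
df_named = pd.read_csv(io.StringIO(raw), header=None, names=col_names)

print(len(df_default), len(df_named))
```

With header=None, the renaming step is no longer needed and no record is lost.
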
df.head()
df.describe()
df.info()

output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1727 entries, 0 to 1726
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   buying    1727 non-null   object
 1   maint     1727 non-null   object
 2   doors     1727 non-null   object
 3   persons   1727 non-null   object
 4   lug_boot  1727 non-null   object
 5   safety    1727 non-null   object
 6   class     1727 non-null   object
dtypes: object(7)
memory usage: 94.6+ KB

Here, we use the info() function to inspect each column of the dataset. Next, we use isnull().sum() to check whether there are any null values in the data, as shown below.

df.isnull().sum()

output:
buying      0
maint       0
doors       0
persons     0
lug_boot    0
safety      0
class       0
dtype: int64

The following code segments and their outputs show how often each value occurs for every feature in our dataset.

df["buying"].value_counts()

output:
med      432
high     432
low      432
vhigh    431
Name: buying, dtype: int64

df["maint"].value_counts()

output:
med      432
high     432
low      432
vhigh    431
Name: maint, dtype: int64

df["doors"].value_counts()

output:
5more    432
4        432
3        432
2        431
Name: doors, dtype: int64

df["persons"].value_counts()

output:
more    576
4       576
2       575
Name: persons, dtype: int64

df["lug_boot"].value_counts()

output:
big      576
med      576
small    575
Name: lug_boot, dtype: int64

df["safety"].value_counts()

output:
med     576
high    576
low     575
Name: safety, dtype: int64

df["class"].value_counts()

output:
unacc    1209
acc       384
good       69
vgood      65
Name: class, dtype: int64

As we can see, all the features of our dataset are categorical, so we will create dummy variables for the independent variables.
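
To make the idea of dummy variables concrete, here is a small sketch on a hypothetical four-row sample (the toy frame and its values are ours, not taken from the dataset). It also counts how many dummy columns the six features should produce, based on the value counts shown above.

```python
import pandas as pd

# Hypothetical four-row sample; only "safety" is encoded here
toy = pd.DataFrame({
    "safety": ["low", "high", "med", "low"],
    "class": ["unacc", "acc", "acc", "unacc"],
})

# Each distinct value of "safety" becomes its own indicator column
encoded = pd.get_dummies(toy, columns=["safety"])
print(list(encoded.columns))

# One dummy column per distinct value of each feature in the full dataset
values_per_feature = {
    "buying": 4, "maint": 4, "doors": 4,
    "persons": 3, "lug_boot": 3, "safety": 3,
}
n_dummy_columns = sum(values_per_feature.values())
print(n_dummy_columns)
```

The total of 21 agrees with the number of feature columns reported later when we check the shape of the train and test sets.
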

df = pd.get_dummies(df,columns=["buying",
"maint","doors","persons","lug_boot","safety"])

The class column of our dataset is our target variable, so we rename it to target to make that explicit. You can skip this step if you want.

df=df.rename(columns={"class":"target"})
df.head()

Next, we collect the names of our independent variables as shown below. Here, we include all features except our target (or dependent) variable. Note that cat_cols is not used again in the rest of the code.

cat_cols=list(set(df.columns)-{"target"})

Since the target variable is also categorical, we use LabelEncoder to convert its class names into integer codes. We then separate the target y from the features X.

le=LabelEncoder()

y=le.fit_transform(df["target"])
X = df.drop(["target"], axis=1)
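
It helps to know which integer LabelEncoder assigns to which class. The sketch below uses a short made-up list of labels; le_demo, mapping and decoded are illustrative names. LabelEncoder sorts the class names, so the codes follow alphabetical order rather than the quality order of the cars.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample of target labels
labels = ["unacc", "acc", "good", "vgood", "unacc"]

le_demo = LabelEncoder()
codes = le_demo.fit_transform(labels)

# classes_ holds the sorted class names; the position is the integer code
mapping = {name: code for code, name in enumerate(le_demo.classes_)}
print(mapping)

# inverse_transform turns integer codes back into class names
decoded = le_demo.inverse_transform(codes)
print(list(decoded))
```

The same inverse_transform call can be used on model predictions to get readable class names back.
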

Then, just as we do for every machine learning project, we split the dataset into train and test sets. With test_size = 0.33, this is roughly a 67:33 split.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 42)

After this, we check the shape, i.e. the number of rows and columns, of our train and test sets.

X_train.shape, X_test.shape

output:
((1157, 21), (570, 21))
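
The row counts can be checked by hand. This is a small arithmetic sketch, assuming the test set size is rounded up and the remaining rows go to the train set, which matches the shapes shown above.

```python
import math

n_samples = 1727   # rows in df, from df.info()
test_size = 0.33   # value passed to train_test_split

# Test rows are rounded up; the remaining rows form the train set
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```
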

2. Car Evaluation prediction using Logistic Regression

Finally, it's time to build a model. First, we will build a model using Logistic Regression.

from sklearn.linear_model import LogisticRegression

clf=LogisticRegression()
clf.fit(X_train, y_train)

output:
LogisticRegression()

After building a model, we will then predict the values of our test dataset.

test_pred=clf.predict(X_test)

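After making the predictions, we calculate the mean squared error between the test labels and the predicted values. Keep in mind that MSE is a regression metric: here it is computed on the integer codes that LabelEncoder assigned to the classes, so its value depends on that numbering and should not be read as an error rate.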

mean_squared_error(y_test, test_pred)

output:
0.37894736842105264
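
To see how much the mean squared error depends on the way the classes are numbered, consider this small sketch. The labels, the predictions and the second numbering (ordinal) are made up for illustration, and mse and accuracy are plain Python helpers, not the scikit-learn functions.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists of numbers."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def accuracy(a, b):
    """Fraction of positions where the two lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Made-up true labels and predictions
y_true = ["unacc", "acc", "good", "unacc"]
y_hat = ["unacc", "unacc", "good", "acc"]

alphabetical = {"acc": 0, "good": 1, "unacc": 2, "vgood": 3}  # order LabelEncoder uses
ordinal = {"unacc": 0, "acc": 1, "good": 2, "vgood": 3}       # hypothetical alternative

mse_alpha = mse([alphabetical[c] for c in y_true], [alphabetical[c] for c in y_hat])
mse_ord = mse([ordinal[c] for c in y_true], [ordinal[c] for c in y_hat])
acc = accuracy(y_true, y_hat)

print(mse_alpha, mse_ord, acc)
```

The same two mistakes give a different MSE under each numbering, while the accuracy stays the same. This is why accuracy is the more reliable number to compare models on here.
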

Our last task is to calculate the accuracy of the predictions.

accuracy_score(y_test, test_pred)

output:
0.9017543859649123
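
To put this accuracy in context, it is useful to know what a trivial model would score. The sketch below uses the class counts from the value_counts() output earlier and computes the accuracy of always predicting the most frequent class over the whole dataset. The variable names are ours.

```python
# Class counts taken from df["class"].value_counts()
class_counts = {"unacc": 1209, "acc": 384, "good": 69, "vgood": 65}

total = sum(class_counts.values())
majority_class = max(class_counts, key=class_counts.get)

# Accuracy obtained by predicting the majority class for every car
baseline_accuracy = class_counts[majority_class] / total

print(majority_class, round(baseline_accuracy, 2))
```

Any useful model should score clearly above this baseline of about 70%.
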

3. Car Evaluation prediction using Decision Tree

After carrying out the prediction using Logistic Regression, we move on to building a model using a Decision Tree. First we import the classifier and then fit it to our training data. Note that we use the Gini index as the attribute selection measure and limit the tree to a maximum depth of 5.

from sklearn.tree import DecisionTreeClassifier

max_depth = 5
clf_gini = DecisionTreeClassifier(criterion='gini', max_depth=max_depth, random_state=0)
# clf_gini = DecisionTreeClassifier(criterion='gini', random_state=0)

# fit the model
clf_gini.fit(X_train, y_train)

output:
DecisionTreeClassifier(max_depth=5, random_state=0)
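
The Gini index measures how mixed the classes in a node are: it is 1 minus the sum of the squared class proportions, so a pure node scores 0. A minimal sketch of that formula follows; gini_impurity is our own helper, not a scikit-learn function.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node, given the number of samples per class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print(gini_impurity([10, 0]))       # pure node
print(gini_impurity([5, 5]))        # two classes, evenly mixed
print(gini_impurity([1, 1, 1, 1]))  # four classes, evenly mixed

# Impurity of the full dataset, using the class counts shown earlier
root_impurity = gini_impurity([1209, 384, 69, 65])
```

At each split, the tree picks the feature that reduces this impurity the most.
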

After building a model, we will then predict the values of our test dataset.

y_pred_gini = clf_gini.predict(X_test)

After making the predictions, we calculate the accuracy of our model using the following code:

print('Model accuracy score with criterion gini index: {0:0.4f}'.format(accuracy_score(y_test, y_pred_gini)))

output:
Model accuracy score with criterion gini index: 0.8404

4. Car Evaluation prediction using Random Forest

Next, we move on to building a model using Random Forest. First we import the classifier, which is part of sklearn's ensemble module. Then we fit the model to our training data. Note that this Random Forest uses 1000 estimators, i.e. 1000 decision trees. Finally, we make predictions on the test set as shown in the code below:

from sklearn.ensemble import RandomForestClassifier

rf=RandomForestClassifier(n_estimators=1000, random_state=0)
rf.fit(X_train,y_train)
y_pred = rf.predict(X_test)
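
The idea of combining many trees can be sketched as a simple majority vote. This is a simplification for illustration only: scikit-learn's RandomForestClassifier averages the class probabilities predicted by its trees instead of counting hard votes. The function name and the five votes below are made up.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Made-up predictions of five trees for a single car
votes = ["unacc", "acc", "unacc", "unacc", "good"]
forest_prediction = majority_vote(votes)

print(forest_prediction)
```

Because each tree sees a different sample of the data, their individual mistakes tend to cancel out in the combined prediction.
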

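After this, we calculate the mean squared error of the predictions made by our model on the test set. As we can see in the output, it is about 0.14. Since it is computed on the integer class codes, it should not be read as a 14% error rate; the share of misclassified cars is 1 minus the accuracy reported below, i.e. about 3.7%.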

mean_squared_error(y_test, y_pred)

output:
0.14210526315789473

Finally, we use the accuracy_score metric to evaluate our model. The accuracy comes out to about 96%, the highest of the three models in this blog (about 90% for Logistic Regression and 84% for the depth-limited Decision Tree).

accuracy_score(y_test, y_pred)

output:
0.9631578947368421
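
Since the good and vgood classes are rare in this dataset, a high overall accuracy can still hide weak results on them. One way to check is to compute the recall of each class separately. The sketch below uses made-up labels and predictions, and per_class_recall is our own helper.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}

# Made-up true labels and predictions
true_labels = ["unacc", "unacc", "acc", "acc", "good", "vgood"]
pred_labels = ["unacc", "unacc", "acc", "unacc", "acc", "vgood"]

recall = per_class_recall(true_labels, pred_labels)
print(recall)
```

On the real data, the same check can be run on the class names obtained from le.inverse_transform(y_test) and le.inverse_transform(y_pred).
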

End Notes

We hope this blog gave you a good idea of how to train models and make predictions using different machine learning algorithms. We will come up with more such blogs in the future.
