Machine Learning is one of the fastest-moving areas of science and technology today, driven largely by the abundance of data. Machine learning algorithms help industries find solutions to real-world problems, from predicting future sales to powering smart assistants at home.
Machine Learning is classified into Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
Supervised Learning is a method in which both the inputs and the corresponding outputs are known to the machine; it is further classified into Regression and Classification.
In this article, we will focus on one of the most basic algorithms in machine learning: linear regression.
Table of Contents
- Linear Regression
- Types of Linear Regression
- Evaluation Metrics for Regression Analysis
- Real Time Examples
- Assumptions in Linear Regression algorithm
- Benefits of Linear Regression
Linear Regression is one of the classic machine learning algorithms. It identifies the relationship between two or more variables, which lets us predict the value of one variable from given values of the others. In other words, linear regression establishes the relationship between a dependent variable and an independent variable. This relationship can be understood with the help of the equation of a line:
Y = mx+c
Where Y is the dependent variable (the predicted value), x is the independent variable (the input), m is the slope, and c is the intercept.
In machine learning and regression literature, the same equation is usually written in the form:
Y = w0 + w1x
Where w0 is the intercept on the y-axis, w1 is the slope of the line, x is the explanatory variable, and Y is the response variable.
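As a minimal sketch of fitting Y = w0 + w1x, we can estimate the intercept and slope from a handful of points with NumPy's `polyfit`. The data values here are hypothetical, chosen to lie roughly on the line y = 2x + 1:

```python
import numpy as np

# Toy data roughly following y = 2x + 1 (hypothetical values for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

# np.polyfit with degree 1 returns [slope, intercept] of the best-fit line
w1, w0 = np.polyfit(x, y, 1)
print(f"intercept w0 = {w0:.2f}, slope w1 = {w1:.2f}")

# Predict the response for a new input using the fitted line
y_pred = w0 + w1 * 6.0
print(f"prediction at x = 6: {y_pred:.2f}")
```

The fitted slope and intercept land close to the true values 2 and 1, and new predictions are just the line evaluated at the new input.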
Types of Linear Regression
The two major types of Linear Regression are Simple Linear Regression and Multiple Linear Regression.
1. Simple Linear Regression:
Simple Linear Regression is where there is only one independent variable, and the model establishes its relationship with the dependent variable. The relationship between the variables can be understood with the help of the equation of the line. The equation is as follows,
Y = mx+c
Where Y is the dependent variable (the predicted value), x is the independent variable (the input), and m is the slope.
If m > 0, then X (the independent variable) and Y (the dependent variable) have a positive relationship: Y increases as X increases.
If m < 0, then X (the independent variable) and Y (the dependent variable) have a negative relationship: Y decreases as X increases.
2. Multiple Linear Regression:
Multiple Linear Regression is where more than one independent variable exists, and the model establishes their relationship with the dependent variable. The equation is as follows,
Y = m0 + m1x1 + m2x2 + m3x3 + … + mNxN
Where Y is the dependent variable, m0 is the intercept, and m1, m2, m3, …, mN are the slopes (coefficients) of the independent variables x1, x2, …, xN.
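A minimal sketch of multiple linear regression can be written with NumPy's least-squares solver. The data below are hypothetical, generated exactly from y = 1 + 2x1 + 3x2 so the recovered coefficients are easy to check:

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (for illustration)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Prepend a column of ones so least squares also estimates the intercept m0
A = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
m0, m1, m2 = coeffs
print(f"m0 = {m0:.2f}, m1 = {m1:.2f}, m2 = {m2:.2f}")
```

Because the data are noise-free, the solver recovers the intercept and slopes exactly; with real data the estimates would only approximate the true relationship.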
Evaluation Metrics for Regression Analysis
There are four main metrics used to evaluate a regression model’s performance: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Adjusted R squared, and R squared.
1) Mean Squared Error (MSE): One of the most commonly used evaluation metrics in linear regression. It is the average of the squared differences between the actual and predicted values.
2) Root Mean Squared Error (RMSE): The square root of the MSE, i.e. the square root of the average squared difference between the actual and predicted values, expressed in the same units as the target.
3) Adjusted R squared: A version of R squared that accounts for the number of features, rewarding only those that actually improve the model. Adjusted R2 is never higher than R2.
4) R squared or Coefficient of Determination: The ratio of the variation explained by the model to the total variation. Its value lies between 0 and 1.
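The four metrics above can be computed directly from their definitions. This sketch uses small hypothetical actual/predicted values and assumes a single predictor (p = 1) for the adjusted R squared:

```python
import numpy as np

# Hypothetical actual and predicted values (for illustration only)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])

n, p = len(y_true), 1  # n samples, p predictors (assumed to be 1 here)

mse = np.mean((y_true - y_pred) ** 2)          # average squared error
rmse = np.sqrt(mse)                            # square root of MSE

ss_res = np.sum((y_true - y_pred) ** 2)        # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2) # total sum of squares
r2 = 1 - ss_res / ss_tot                       # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalizes extra predictors

print(f"MSE={mse:.4f} RMSE={rmse:.4f} R2={r2:.4f} adj_R2={adj_r2:.4f}")
```

Note that adjusted R squared comes out slightly below R squared, as expected.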
Real Time Examples
Let us now discuss some real-time scenarios as an example for the Linear Regression algorithm.
1) Suppose we are given medical data about some patients. The dataset consists of numerical information such as the patients’ blood pressure, age, etc. This is a classic example of Simple Linear Regression, as we can build and train a linear regression model to predict blood pressure for different age groups. Here, age is clearly the input/independent variable, and blood pressure is the output/dependent variable. The equation can be written as follows,
Blood Pressure(Y) = m*Age+c
2) Now let us consider a dataset containing details about cars. The data consist of mileage, car model, type, sale price, etc. The sale price is the dependent variable, and it depends on the mileage, model, and type. The equation can be written as follows,
Sale Price (Y) = m0 + m1*Mileage + m2*Model + m3*Type
Assumptions in Linear Regression Algorithm
These assumptions help a data scientist decide whether a dataset is suitable for a linear regression model. The assumptions are as follows:
1. A linear relationship between features and the target variable:
The Linear Regression algorithm assumes that the relationship between the independent variable (X) and the dependent variable (Y) is linear.
2. No Multicollinearity between features:
There must be no correlation between the independent variables (x1, x2, …, xN); the independent variables must not relate to each other.
In simple words, the independent variables must not be highly correlated with each other.
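One quick way to screen for multicollinearity is to inspect the pairwise correlation matrix of the features. This sketch uses synthetic data in which a hypothetical feature x2 is nearly a copy of x1, while x3 is unrelated noise:

```python
import numpy as np

# Synthetic features (hypothetical, for illustration):
# x2 is almost a scaled copy of x1 -> high collinearity; x3 is independent noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.01, size=100)
x3 = rng.normal(size=100)

# Rows of the input are treated as variables; the result is a 3x3 matrix
corr = np.corrcoef([x1, x2, x3])
print(corr.round(2))
# A pairwise |correlation| close to 1 (here between x1 and x2) flags
# multicollinearity; x1 vs x3 stays close to 0
```

In practice, a highly correlated pair like x1 and x2 suggests dropping or combining one of the features before fitting the model.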
3. No Heteroscedasticity:
The residuals must have the same spread (variance) for all values of X. This can be verified with a residual plot; the assumption is violated when the points form a funnel shape.
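The funnel shape can also be checked numerically by comparing the spread of the residuals at small versus large values of X. This sketch uses synthetic data whose noise deliberately grows with x, so the heteroscedasticity shows up as a larger residual spread in the upper half:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)

# Synthetic heteroscedastic data (hypothetical): noise scale grows with x,
# producing the funnel shape described above
y = 2 * x + rng.normal(scale=0.5 * x)

# Fit a line and compute the residuals
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Compare residual spread in the lower vs upper half of x;
# a clearly larger spread on one side suggests heteroscedasticity
low_std = residuals[:100].std()
high_std = residuals[100:].std()
print(f"residual std: low-x half = {low_std:.2f}, high-x half = {high_std:.2f}")
```

If the residuals satisfied the assumption, the two spreads would be roughly equal; here the upper half is noticeably wider.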
Benefits of Linear Regression
- Deploys and performs well in online settings
I hope this blog helped you understand the details of Linear Regression. In the next blog, we will cover the implementation of Linear Regression using a house price prediction dataset: https://ainewgeneration.com/house-price-prediction-using-linear-regression/.