AI Next Generation

Data Analysis with Pandas

  • Posted by Rehan
  • Date September 18, 2021

Introduction

In this article, we will learn how to use Pandas for data analysis. Pandas is a very useful Python library for manipulating and analyzing data. It can load datasets from various file formats such as CSV, Excel, JSON, and HTML. Before getting into Pandas, we recommend learning the basics of NumPy; if you are not sure what NumPy is, please go through this link: https://ainewgeneration.com/numpy/.

Pandas is one of the most important libraries in Data Science: it is a fast, flexible, powerful, and easy-to-use open-source data analysis library built on top of the Python programming language.

Table of Contents

  1. Import Pandas Library.
  2. Loading Data in Pandas.
  3. Reading Data in Pandas.
  4. Describing the Dataset.
  5. Sorting the Dataset.
  6. Adding & Dropping Columns in the Dataset.
  7. Saving the Dataset.
  8. Filtering the Dataset.
  9. Finding Null values in the Dataset.
  10. Removing Null Values from the Dataset.
  11. Groupby Analysis in the Dataset.
  12. Data Types.
  13. Renaming Columns in the Dataset.

Dataset link – https://www.kaggle.com/mariotormo/complete-pokemon-dataset-updated-090420

Import Pandas Library

The first step is to import the Pandas library. The standard convention is to import pandas as “pd” so that you do not have to type pandas every time you use the library.

import pandas as pd

Loading Data in Pandas

To load different types of dataset files into Pandas, we use the matching pd.read_fileType(file_path) function, as listed below:

  • CSV file with pd.read_csv(file_path)
  • Excel file with pd.read_excel(file_path)
  • JSON file with pd.read_json(file_path)
  • HTML file with pd.read_html(file_path), which returns a list of DataFrames, one per table found
df = pd.read_csv("pokemon_data.csv")
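
If you do not have the dataset file at hand, the following minimal sketch shows the same read_csv call on a few rows of in-memory text. The rows and the names csv_text and sample_df are made up for illustration and only imitate the layout of the Pokemon dataset.

```python
import io

import pandas as pd

# Hypothetical rows in the same layout as the Pokemon dataset (not the real file).
csv_text = """#,Name,Type 1,Type 2,HP,Attack
1,Bulbasaur,Grass,Poison,45,49
4,Charmander,Fire,,39,52
7,Squirtle,Water,,44,48
"""

# read_csv accepts a file path or any file-like object;
# StringIO stands in for a file on disk here.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)
print(sample_df.columns.tolist())
```

Empty fields, such as the missing Type 2 values above, are loaded as null (NaN) values.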

Reading Data in Pandas

  • Reading the column names (headers) of the dataset
df.columns

output : Index(['#', 'Name', 'Type 1', 'Type 2', 'HP', 'Attack', 'Defense', 'Sp. Atk','Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')
  • Displaying the first 5 rows with .head(); similarly, .tail() displays the last 5 rows
df.head()
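
Both methods also take an optional number of rows. A small sketch, assuming a made-up six-row table (sample_df is a hypothetical name, not the article's dataset):

```python
import pandas as pd

# Hypothetical six-row table used only to illustrate head() and tail().
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Charmeleon", "Charizard"],
    "HP": [45, 60, 80, 39, 58, 78],
})

first_three = sample_df.head(3)   # first 3 rows instead of the default 5
last_two = sample_df.tail(2)      # last 2 rows

print(first_three)
print(last_two)
```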

Describing the Dataset

df.describe()
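
By default, describe() summarizes the numeric columns with the count, mean, standard deviation, minimum, quartiles, and maximum. A minimal sketch on made-up values (sample_df and summary are hypothetical names):

```python
import pandas as pd

# Hypothetical values, chosen only to keep the arithmetic easy to follow.
sample_df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "HP": [45, 39, 44, 60],
})

# Only the numeric column HP is summarized; the text column Name is skipped.
summary = sample_df.describe()

print(summary)
print(summary.loc["mean", "HP"])
```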

Sorting the Dataset

A dataset can be sorted by any column with sort_values, which sorts in ascending order by default. To sort in descending order, pass “ascending = False”. To sort by multiple columns, pass the column names as a list; you can add a matching list such as “ascending = [1,0]”, where 1 (True) means ascending and 0 (False) means descending for the column in the same position.

df.sort_values(["Name","Attack"]).head()
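
The sketch below illustrates both cases on a small made-up table: a single-column descending sort, and a two-column sort with one direction per column. The rows and variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical rows, not taken from the real dataset file.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Charmeleon"],
    "Type 1": ["Grass", "Grass", "Fire", "Fire"],
    "Attack": [49, 62, 52, 64],
})

# Single column, highest Attack first.
by_attack_desc = sample_df.sort_values("Attack", ascending=False)

# Two columns: Type 1 ascending (True), then Attack descending (False) within each type.
by_type_then_attack = sample_df.sort_values(["Type 1", "Attack"], ascending=[True, False])

print(by_attack_desc)
print(by_type_then_attack)
```

sort_values returns a new DataFrame and leaves sample_df unchanged.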

Adding & Dropping Columns in the Dataset

To add a column to the dataset, assign the values to a new column name, as in the code below. Here, we have used index-based slicing, i.e., iloc: columns 4 through 9 (HP through Speed) are summed across each row.

df["Total"] = df.iloc[:,4:10].sum(axis=1)
df.head()

To delete columns from the dataset, use dataframe.drop(columns=[column_names]):

df = df.drop(columns = ["Total","#"])
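
Position-based slicing breaks silently if the column order changes, so one alternative is to select the columns by name. This is a sketch on made-up rows with only three stat columns; stat_columns, sample_df, and trimmed are hypothetical names.

```python
import pandas as pd

# Hypothetical rows with fewer stat columns than the real dataset.
sample_df = pd.DataFrame({
    "#": [1, 4],
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39],
    "Attack": [49, 52],
    "Defense": [49, 43],
})

# Sum the named stat columns across each row (axis=1).
stat_columns = ["HP", "Attack", "Defense"]
sample_df["Total"] = sample_df[stat_columns].sum(axis=1)

# drop returns a new DataFrame; sample_df itself keeps the "#" column.
trimmed = sample_df.drop(columns=["#"])

print(trimmed)
```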

Saving the Dataset

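To save the dataset as a CSV file, an Excel file, or another supported format, use the matching dataframe.to_fileType(file_path) method. Pass index=False if you do not want the row index written as an extra column. Excel output needs the .xlsx file extension and an installed Excel writer such as openpyxl.

#To save in csv file format
df.to_csv("modified.csv")

#To save in excel file format
df.to_excel("modified.xlsx")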
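
A quick way to check a saved file is to load it back. The sketch below writes a made-up table to a temporary folder and reads it again; the folder, the rows, and the variable names are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical table to save and reload.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "modified.csv")
    # index=False keeps the row index out of the file,
    # so no extra unnamed column appears when reloading.
    sample_df.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

print(reloaded)
```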

Filtering the Dataset

Filtering the dataset with [Type 1 = Grass, Type 2 = Poison and HP > 70]. Each condition goes in its own parentheses, and the conditions are combined with & (and).

filter_data = df[(df["Type 1"]=="Grass") & (df["Type 2"] == "Poison") & (df["HP"]>70)]

#Reset Index (reset_index returns a new DataFrame, so assign the result to keep it)
filter_data = filter_data.reset_index(drop=True)
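
The same pattern on a small made-up table makes the effect of the filter and of reset_index easy to see. The rows and variable names here are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows, not taken from the real dataset file.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Venusaur", "Tangela", "Charizard"],
    "Type 1": ["Grass", "Grass", "Grass", "Fire"],
    "Type 2": ["Poison", "Poison", None, "Flying"],
    "HP": [45, 80, 65, 78],
})

# Each comparison yields a boolean Series; & keeps rows where all are True.
mask = (
    (sample_df["Type 1"] == "Grass")
    & (sample_df["Type 2"] == "Poison")
    & (sample_df["HP"] > 70)
)

# Without reset_index the surviving row would keep its old label (1).
filtered = sample_df[mask].reset_index(drop=True)

print(filtered)
```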

Finding Null values in the Dataset

df.isnull().sum()

output :
Name            0
Type 1          0
Type 2        386
HP              0
Attack          0
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64

Removing Null Values from the Dataset

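The dropna function in Pandas drops every row of the dataset that has a null value in any column. With inplace=True the DataFrame is modified directly instead of a new one being returned.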

df.dropna(inplace =True)

#Recheck for null values
df.isnull().sum()

output : 
Name          0
Type 1        0
Type 2        0
HP            0
Attack        0
Defense       0
Sp. Atk       0
Sp. Def       0
Speed         0
Generation    0
Legendary     0
dtype: int64
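
Dropping every row with a null value can remove a lot of data (here, every Pokemon without a second type). As a sketch on made-up rows, the example below counts nulls and then drops rows only when a chosen column is null, using the subset parameter; the table and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical rows with nulls in two different columns.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Charizard"],
    "Type 2": ["Poison", None, None, "Flying"],
    "HP": [45, 39, None, 78],
})

# Number of null values per column.
null_counts = sample_df.isnull().sum()

# Drop rows that have a null in any column (returns a new DataFrame).
cleaned_all = sample_df.dropna()

# Drop rows only when HP is null; nulls in Type 2 are tolerated.
cleaned_hp = sample_df.dropna(subset=["HP"])

print(null_counts)
print(cleaned_all)
print(cleaned_hp)
```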

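Groupby Analysis in the Dataset

The groupby function in Pandas groups the rows that share the same value in one or more columns, so that each group can be summarized (for example with mean, sum, or count) for better analysis of the dataset.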

  • Groupby with mean :
#numeric_only=True skips text columns such as Name, which cannot be averaged
df.groupby(["Type 1"]).mean(numeric_only=True).sort_values("Defense",ascending = False)

  • Groupby with sum :

  • Groupby with count : (Type 1 & Type 2)
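
The three aggregations listed above can be sketched as follows on a small made-up table. This is an illustration under assumed data, not the original code of the article; all rows and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical rows, not taken from the real dataset file.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Charizard", "Squirtle"],
    "Type 1": ["Grass", "Grass", "Fire", "Fire", "Water"],
    "Type 2": ["Poison", "Poison", None, "Flying", None],
    "HP": [45, 60, 39, 78, 44],
    "Defense": [49, 63, 43, 78, 65],
})

# Mean Defense per Type 1, highest first.
mean_defense = sample_df.groupby("Type 1")["Defense"].mean().sort_values(ascending=False)

# Sum of the selected numeric columns per Type 1.
stat_sums = sample_df.groupby("Type 1")[["HP", "Defense"]].sum()

# Number of rows per (Type 1, Type 2) pair; rows with a null Type 2
# are left out because groupby drops null keys by default.
pair_counts = sample_df.groupby(["Type 1", "Type 2"])["Name"].count()

print(mean_defense)
print(stat_sums)
print(pair_counts)
```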

Data Types

df.dtypes

output:
Name          object
Type 1        object
Type 2        object
HP             int64
Attack         int64
Defense        int64
Sp. Atk        int64
Sp. Def        int64
Speed          int64
Generation     int64
Legendary       bool
Total          int64
dtype: object
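
Data types matter because they decide which operations apply to a column. As a small sketch on made-up rows, the example below selects only the numeric columns and converts a boolean column to integers; the table and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical rows with text, integer, and boolean columns.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Mewtwo"],
    "HP": [45, 106],
    "Attack": [49, 110],
    "Legendary": [False, True],
})

# Keep only numeric columns; text and boolean columns are excluded.
numeric_only = sample_df.select_dtypes(include="number")

# astype returns a converted copy: False becomes 0 and True becomes 1.
legendary_as_int = sample_df["Legendary"].astype(int)

print(sample_df.dtypes)
print(numeric_only.columns.tolist())
print(legendary_as_int.tolist())
```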

Renaming Columns in the Dataset

df = df.rename(columns={"Generation":"GP"})
df.head()
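
rename takes a dictionary that maps old column names to new ones and returns a new DataFrame, which is why the result is assigned back to df above. A minimal sketch on made-up rows (the table, the new names, and the variable names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical rows, not taken from the real dataset file.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Chikorita"],
    "Sp. Atk": [65, 49],
    "Generation": [1, 2],
})

# Several columns can be renamed in one call; unlisted columns are kept as they are.
renamed = sample_df.rename(columns={"Generation": "GP", "Sp. Atk": "Special Attack"})

print(renamed.columns.tolist())
print(sample_df.columns.tolist())
```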

End Notes

I hope you understood the concepts of using Pandas for Data Analysis. You may now start Machine Learning with Python here: https://ainewgeneration.com/category/machine-learning/ and later, you can learn Deep Learning from: https://ainewgeneration.com/category/deep-learning/.

Tag:data analysis

Copyright © 2021 AI New Generation
