Data Analysis with Pandas
Introduction
In this article, we look at how to use Pandas for data analysis. Pandas is a widely used Python library for manipulating and analyzing data. It can load datasets from many file formats, including CSV, Excel, JSON, and HTML. Before getting into Pandas, we recommend learning the basics of NumPy; if you are not sure what NumPy is, please go through this link: https://ainewgeneration.com/numpy/.
Pandas is one of the most important libraries in data science: it is a fast, flexible, powerful, and easy-to-use open-source data analysis library built on top of the Python programming language.
Table of Contents
- Import Pandas Library
- Loading Data in Pandas
- Reading Data in Pandas
- Describing the Dataset
- Sorting the Dataset
- Adding & Dropping Columns in the Dataset
- Saving the Dataset
- Filtering the Dataset
- Finding Null values in the Dataset
- Removing Null Values from the Dataset
- Groupby Analysis in the Dataset
- Data Types
- Renaming Columns in the Dataset
Dataset link – https://www.kaggle.com/mariotormo/complete-pokemon-dataset-updated-090420
Import Pandas Library
The first step is to import the Pandas library. By convention it is imported under the alias “pd”, so you do not have to type pandas every time you use the library.
import pandas as pd
Loading Data in Pandas
To load different types of dataset files, Pandas provides a family of pd.read_<fileType>(file_path) functions:
- CSV file with pd.read_csv(file_path)
- Excel file with pd.read_excel(file_path)
- JSON file with pd.read_json(file_path)
- HTML file with pd.read_html(file_path), which returns a list of DataFrames, one per table found
df = pd.read_csv("pokemon_data.csv")
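If pokemon_data.csv is not at hand, the loading step can still be tried out on in-memory text. The following is a minimal sketch: the three sample rows and the name sample_df are made up for illustration and only mimic part of the real file's layout.

```python
import io

import pandas as pd

# Hypothetical sample rows that mimic a few columns of pokemon_data.csv.
csv_text = """#,Name,Type 1,Type 2,HP,Attack
1,Bulbasaur,Grass,Poison,45,49
4,Charmander,Fire,,39,52
7,Squirtle,Water,,44,48
"""

# read_csv accepts a file path or any file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))

# Empty fields (the missing "Type 2" values) are read as NaN.
print(sample_df.shape)
print(sample_df["Type 2"].isnull().sum())
```

The same pattern should work for a real file: replace the StringIO object with the path to the CSV.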
Reading Data in Pandas
- Reading the column names (headers) of the dataset
df.columns
output : Index(['#', 'Name', 'Type 1', 'Type 2', 'HP', 'Attack', 'Defense', 'Sp. Atk','Sp. Def', 'Speed', 'Generation', 'Legendary'],
dtype='object')
- Displaying the first 5 rows using .head(); similarly, .tail() displays the last 5 rows
df.head()
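Both methods also accept the number of rows to return. A small sketch on a made-up DataFrame (sample_df and its values are illustrative, not taken from the dataset file):

```python
import pandas as pd

# Illustrative data; the real dataset has many more rows and columns.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander",
             "Charmeleon", "Charizard", "Squirtle"],
    "HP": [45, 60, 80, 39, 58, 78, 44],
})

first_rows = sample_df.head()    # first 5 rows by default
last_two = sample_df.tail(2)     # last 2 rows

print(first_rows)
print(last_two)
```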

Describing the Dataset
df.describe()
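By default, describe() summarizes only the numeric columns (count, mean, standard deviation, minimum, quartiles, and maximum). The sketch below, on a made-up DataFrame, shows the default call and the include="all" option that also covers text columns:

```python
import pandas as pd

# Illustrative data, not taken from the dataset file.
sample_df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "HP": [40, 60, 80, 100],
})

numeric_summary = sample_df.describe()           # numeric columns only
all_summary = sample_df.describe(include="all")  # text columns as well

print(numeric_summary)
print(all_summary)
```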

Sorting the Dataset
A dataset can be sorted by any column with sort_values, which takes an optional ascending parameter (True by default). To sort in descending order, pass “ascending = False”. To sort by multiple columns, pass the column names in a list, optionally with a matching list such as “ascending = [0,1]” that holds one flag per column: 1 (True) for ascending and 0 (False) for descending.
df.sort_values(["Name","Attack"]).head()
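When sorting by several columns, the ascending list lines up position by position with the list of columns: True (or 1) sorts that column in ascending order, False (or 0) in descending order. A minimal sketch on made-up rows:

```python
import pandas as pd

# Illustrative data, not taken from the dataset file.
sample_df = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Grass", "Fire"],
    "Name": ["Bulbasaur", "Charmander", "Ivysaur", "Charmeleon"],
    "Attack": [49, 52, 62, 64],
})

# "Type 1" ascending (A to Z), then "Attack" descending within each type.
sorted_df = sample_df.sort_values(["Type 1", "Attack"], ascending=[True, False])

print(sorted_df["Name"].tolist())
# ['Charmeleon', 'Charmander', 'Ivysaur', 'Bulbasaur']
```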

Adding & Dropping Columns in the Dataset
To add a column to the dataset, use the new column name as the key and assign the values to it, as in the code below. Here, we have used position-based slicing, i.e., iloc, to select the six stat columns at positions 4 to 9 (HP through Speed).
df["Total"] = df.iloc[:,4:10].sum(axis=1)
df.head()
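Position-based slicing breaks silently if the column order changes, so selecting the columns by name can be a safer alternative. This is a sketch on made-up rows with only three stat columns; stat_columns is a hypothetical helper list:

```python
import pandas as pd

# Illustrative data with a reduced set of stat columns.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39],
    "Attack": [49, 52],
    "Defense": [49, 43],
})

# Select the columns by name and add them up row by row (axis=1).
stat_columns = ["HP", "Attack", "Defense"]
sample_df["Total"] = sample_df[stat_columns].sum(axis=1)

print(sample_df)
```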

Columns can be deleted with dataframe.drop(columns=[column_names]):
df = df.drop(columns = ["Total","#"])
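Note that drop returns a new DataFrame and leaves the original untouched, which is why the result is assigned back to df above. A small sketch on made-up data:

```python
import pandas as pd

# Illustrative data, not taken from the dataset file.
sample_df = pd.DataFrame({
    "#": [1, 4],
    "Name": ["Bulbasaur", "Charmander"],
    "Total": [143, 134],
})

trimmed = sample_df.drop(columns=["Total", "#"])

print(list(trimmed.columns))    # only "Name" is left
print(list(sample_df.columns))  # the original still has all three columns
```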

Saving the Dataset
To save the dataset as a CSV file, an Excel file, or another file type, use the matching dataframe.to_<fileType> method.
#To save in csv file format
df.to_csv("modified.csv")
#To save in excel file format (the file name needs an Excel extension such as .xlsx)
df.to_excel("modified.xlsx")
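By default, to_csv also writes the row index as an unnamed first column, which then shows up as an extra column when the file is read back. Passing index=False avoids this. The sketch below uses an in-memory buffer instead of a file, and made-up data:

```python
import io

import pandas as pd

# Illustrative data, not taken from the dataset file.
sample_df = pd.DataFrame({"Name": ["Bulbasaur", "Charmander"], "HP": [45, 39]})

# Default behaviour: the header row starts with an empty index column.
header_with_index = sample_df.to_csv().splitlines()[0]

# With index=False only the data columns are written.
buffer = io.StringIO()
sample_df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(header_with_index)
print(list(reloaded.columns))
```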
Filtering the Dataset
Filtering the dataset with the conditions Type 1 = Grass, Type 2 = Poison and HP > 70:
filter_data = df[(df["Type 1"] == "Grass") & (df["Type 2"] == "Poison") & (df["HP"] > 70)]
#Reset the index so the filtered rows are numbered from 0
filter_data = filter_data.reset_index(drop=True)
filter_data
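Each comparison in the filter produces a boolean Series, and & combines them row by row; the parentheses are needed because & binds more tightly than == and >. Naming the individual conditions can make a long filter easier to read. A sketch on made-up rows (the variable names are illustrative):

```python
import pandas as pd

# Illustrative data, not taken from the dataset file.
sample_df = pd.DataFrame({
    "Name": ["Venusaur", "Bulbasaur", "Charizard", "Vileplume"],
    "Type 1": ["Grass", "Grass", "Fire", "Grass"],
    "Type 2": ["Poison", "Poison", "Flying", "Poison"],
    "HP": [80, 45, 78, 75],
})

# One boolean Series per condition.
is_grass = sample_df["Type 1"] == "Grass"
is_poison = sample_df["Type 2"] == "Poison"
high_hp = sample_df["HP"] > 70

# Keep only the rows where all three conditions are True.
filtered = sample_df[is_grass & is_poison & high_hp].reset_index(drop=True)

print(filtered)
```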

Finding Null values in the Dataset
df.isnull().sum()
output :
Name 0
Type 1 0
Type 2 386
HP 0
Attack 0
Defense 0
Sp. Atk 0
Sp. Def 0
Speed 0
Generation 0
Legendary 0
dtype: int64
Removing Null Values from the Dataset
The dropna function in Pandas drops the rows of the dataset that contain null values.
df.dropna(inplace=True)
#Recheck for null values
df.isnull().sum()
output :
Name 0
Type 1 0
Type 2 0
HP 0
Attack 0
Defense 0
Sp. Atk 0
Sp. Def 0
Speed 0
Generation 0
Legendary 0
dtype: int64
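Dropping every row with a null value can discard a large part of the data (here, every row without a second type). Depending on the analysis, restricting dropna to certain columns or filling the gaps with fillna may be preferable. A minimal sketch on made-up rows; the placeholder text "None" is an arbitrary choice:

```python
import pandas as pd

# Illustrative data with missing "Type 2" values.
sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type 2": ["Poison", None, None],
})

null_counts = sample_df.isnull().sum()

# Drop rows only when "Type 2" is missing.
dropped = sample_df.dropna(subset=["Type 2"])

# Keep all rows and fill the missing values with a placeholder instead.
filled = sample_df.fillna({"Type 2": "None"})

print(null_counts)
print(len(dropped), len(filled))
```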
Groupby Analysis in Dataset
The groupby function in Pandas splits the dataset into groups based on the values of one or more columns, so that each group can be aggregated and compared.
- Groupby with mean :
df.groupby(["Type 1"]).mean(numeric_only=True).sort_values("Defense", ascending=False)

- Groupby with sum :

- Groupby with count : (Type 1 & Type 2)
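The code for the sum and count variants is not shown above. The following is a sketch of how the three aggregations might look on a small made-up DataFrame; it is not necessarily the exact code used in this article.

```python
import pandas as pd

# Illustrative data, not taken from the dataset file.
sample_df = pd.DataFrame({
    "Type 1": ["Grass", "Grass", "Fire", "Fire"],
    "Type 2": ["Poison", "Poison", "Flying", "Dragon"],
    "HP": [45, 60, 78, 78],
    "Attack": [49, 62, 84, 130],
})

stats = ["HP", "Attack"]

# Mean and sum of the numeric stats for each primary type.
mean_by_type = sample_df.groupby("Type 1")[stats].mean()
sum_by_type = sample_df.groupby("Type 1")[stats].sum()

# Number of rows for each (Type 1, Type 2) combination.
count_by_types = sample_df.groupby(["Type 1", "Type 2"])["HP"].count()

print(mean_by_type)
print(sum_by_type)
print(count_by_types)
```

Selecting the numeric columns before aggregating keeps text columns out of the calculation.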

Data Types
df.dtypes
output:
Name object
Type 1 object
Type 2 object
HP int64
Attack int64
Defense int64
Sp. Atk int64
Sp. Def int64
Speed int64
Generation int64
Legendary bool
Total int64
dtype: object
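Once the data types are known, a column can be converted with astype, and columns can be picked by type with select_dtypes. A small sketch on made-up rows:

```python
import pandas as pd

# Illustrative data, not taken from the dataset file.
sample_df = pd.DataFrame({
    "Name": ["Mewtwo", "Pidgey"],
    "HP": [106, 40],
    "Legendary": [True, False],
})

# Convert the boolean flag to integers (True -> 1, False -> 0).
sample_df["Legendary"] = sample_df["Legendary"].astype(int)

# Keep only the numeric columns.
numeric_part = sample_df.select_dtypes(include="number")

print(sample_df.dtypes)
print(list(numeric_part.columns))
```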
Renaming Columns in the Dataset
df = df.rename(columns={"Generation":"GP"})
df.head()
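Several columns can be renamed in one call by adding more entries to the dictionary; columns that are not listed keep their names. In the sketch below the new names are arbitrary examples:

```python
import pandas as pd

# Illustrative data, not taken from the dataset file.
sample_df = pd.DataFrame({"Sp. Atk": [65], "Sp. Def": [65], "Speed": [45]})

# Map old column names to new ones; "Speed" is left as it is.
renamed = sample_df.rename(columns={"Sp. Atk": "sp_atk", "Sp. Def": "sp_def"})

print(list(renamed.columns))
```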

End Notes
I hope you now understand the basics of using Pandas for data analysis. You can start Machine Learning with Python here: https://ainewgeneration.com/category/machine-learning/ and later move on to Deep Learning here: https://ainewgeneration.com/category/deep-learning/.