
lautibursese/Data_Engineering_-2

Exploratory Data Analysis

This project is part of the A2-Capacitacion tutorials.

Description

Exploratory Data Analysis (EDA) is an approach (or philosophy) to data analysis that employs a variety of techniques, mostly graphical, to:

  • Maximize insight into a data set;
  • Uncover underlying structure;
  • Extract important variables;
  • Detect outliers and anomalies;
  • Test underlying assumptions;
  • Develop parsimonious models; and
  • Determine optimal factor settings.

What does it include?

  • Pandas head (a first look at the data)
  • Pandas shape (number of rows and columns)
  • Mean
  • Pandas describe (summary statistics)
  • Seaborn distplot (shows the overall distribution of continuous variables)
  • Pandas skew (calculates the skewness of each column)
  • Pandas kurt (returns the unbiased kurtosis over the requested axis)
  • Scatter plot
  • Box plot
  • Seaborn heatmap
  • Correlation coefficient
  • Pair plot
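The first few items (head, shape, mean, describe) give a quick numeric overview of a table. The sketch below illustrates them. The README does not name its dataset, so the sketch uses a small randomly generated DataFrame. The columns x, y and z are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: the README does not name its dataset,
# so a small random DataFrame is generated purely for illustration.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "x": rng.normal(loc=10.0, scale=2.0, size=200),
    "y": rng.exponential(scale=3.0, size=200),
})
df["z"] = 0.5 * df["x"] + rng.normal(scale=1.0, size=200)

print(df.head())      # first 5 rows: a first look at the data
print(df.shape)       # (number of rows, number of columns)
print(df.mean())      # mean of each numeric column
print(df.describe())  # count, mean, std, min, quartiles and max per column
```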
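Skewness measures how asymmetric a distribution is. Kurtosis measures how heavy its tails are. Pandas' skew and kurt use bias-corrected ("unbiased") sample estimators, and kurt reports excess kurtosis (Fisher's definition, so a normal distribution gives roughly 0).

The sketch below writes those two estimators out by hand and checks them against pandas. Both the data and the helper names sample_skew and sample_excess_kurtosis are hypothetical and chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one symmetric column and one right-skewed column.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "normal": rng.normal(size=500),
    "right_skewed": rng.exponential(size=500),
})

def sample_skew(s: pd.Series) -> float:
    """Adjusted Fisher-Pearson skewness (bias-corrected sample estimator)."""
    n = s.count()
    d = s - s.mean()
    m2 = (d ** 2).mean()   # second central moment
    m3 = (d ** 3).mean()   # third central moment
    g1 = m3 / m2 ** 1.5    # biased (population-style) skewness
    return float(np.sqrt(n * (n - 1)) / (n - 2) * g1)

def sample_excess_kurtosis(s: pd.Series) -> float:
    """Bias-corrected excess kurtosis (about 0 for normally distributed data)."""
    n = s.count()
    d = s - s.mean()
    s2 = (d ** 2).sum()
    s4 = (d ** 4).sum()
    scale = n * (n + 1) * (n - 1) / ((n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return float(scale * s4 / s2 ** 2 - correction)

print(df.skew())  # per-column skewness as computed by pandas
print(df.kurt())  # per-column excess kurtosis as computed by pandas
for col in df.columns:
    print(col, sample_skew(df[col]), sample_excess_kurtosis(df[col]))
```

A positive skew means a longer right tail, which is what the exponential column should show.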
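Pearson's correlation coefficient is the covariance of two variables divided by the product of their standard deviations, so it always lies between -1 and 1. The sketch below computes it three ways and checks that they agree: by hand, with pandas, and with SciPy (which also returns a two-sided p-value). The data and the helper name pearson_r are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data with a strong positive linear relationship.
rng = np.random.default_rng(seed=1)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)
df = pd.DataFrame({"x": x, "y": y})

def pearson_r(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson's r: covariance divided by the product of standard deviations."""
    da, db = a - a.mean(), b - b.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))

r_manual = pearson_r(x, y)
r_pandas = df["x"].corr(df["y"])         # pandas uses Pearson by default
r_scipy, p_value = stats.pearsonr(x, y)  # coefficient and two-sided p-value
print(r_manual, r_pandas, r_scipy, p_value)
```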

Libraries

  • Pandas
  • Matplotlib
  • Seaborn
  • NumPy
  • SciPy
  • scikit-learn (sklearn)