
e181337/data_analysis


data_analysis for imbalanced data

In this notebook, I apply statistical methods to the analysis of an imbalanced data set. The analysis starts with the basics: a null check, descriptive statistics, and handling of missing values. The numerical columns are right-skewed, so the Shapiro-Wilk and Anderson-Darling tests are applied to check whether the data depart from a normal distribution. Outliers in the numerical columns are detected with the interquartile range (IQR) rule. For the categorical columns, a chi-square test checks whether their distributions differ across the target classes. Because the data set is imbalanced, correlation analysis is carried out on a subset balanced by undersampling.
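As a rough illustration of the normality checks and the IQR rule described above, here is a minimal sketch. It is not the notebook's own code: the synthetic log-normal sample, the function names `normality_report` and `iqr_outlier_mask`, the 5% significance level, and the 1.5 multiplier on the IQR are all assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed sample standing in for one numerical column (assumption).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)


def normality_report(values, alpha=0.05):
    """Run Shapiro-Wilk and Anderson-Darling against a normal distribution."""
    values = np.asarray(values, dtype=float)
    _, shapiro_p = stats.shapiro(values)
    ad = stats.anderson(values, dist="norm")
    # Anderson-Darling reports critical values per significance level (in percent);
    # pick the one closest to 5%.
    idx = int(np.argmin(np.abs(np.asarray(ad.significance_level) - 5.0)))
    return {
        "skewness": float(stats.skew(values)),  # > 0 means right-skewed
        "shapiro_p": float(shapiro_p),
        "shapiro_rejects": bool(shapiro_p < alpha),
        "anderson_stat": float(ad.statistic),
        "anderson_rejects_5pct": bool(ad.statistic > ad.critical_values[idx]),
    }


def iqr_outlier_mask(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


report = normality_report(x)
outliers = iqr_outlier_mask(x)
print(report)
print("flagged outliers:", int(outliers.sum()), "of", x.size)
```

A rejection from either test only indicates that the sample is inconsistent with normality at the chosen level. With large samples, even small departures tend to be flagged, so it can help to read the test results alongside the skewness value and a histogram.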

Application of Shapiro-Wilk, Anderson-Darling, and chi-square tests
Correlation analysis for imbalanced data
Outlier detection
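The chi-square test and the correlation analysis on undersampled data can be sketched as follows. This is a hedged example rather than the notebook's implementation: the data are synthetic, the helpers `contingency_table` and `random_undersample` are hypothetical, and plain random undersampling of the majority class is assumed because the specific undersampling method is not stated here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 5000

# Synthetic imbalanced binary target with roughly 5% positives (assumption).
y = (rng.random(n) < 0.05).astype(int)
# Categorical feature whose category frequencies depend on the target class.
cat = np.where(
    y == 1,
    rng.choice(3, size=n, p=[0.6, 0.3, 0.1]),
    rng.choice(3, size=n, p=[0.2, 0.3, 0.5]),
)
# Numerical feature whose mean is shifted by 1 for the positive class.
num = rng.normal(loc=y * 1.0, scale=1.0)


def contingency_table(categories, target):
    """Count observations for every (category, target class) pair."""
    cats, classes = np.unique(categories), np.unique(target)
    return np.array(
        [[np.sum((categories == c) & (target == t)) for t in classes] for c in cats]
    )


def random_undersample(target, rng):
    """Return sorted indices with every class cut down to the minority class size."""
    classes, counts = np.unique(target, return_counts=True)
    n_min = counts.min()
    picks = [
        rng.choice(np.flatnonzero(target == c), size=n_min, replace=False)
        for c in classes
    ]
    return np.sort(np.concatenate(picks))


# Chi-square test of independence between the categorical feature and the target.
table = contingency_table(cat, y)
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi2={chi2:.1f}, dof={dof}, p={p_value:.3g}")

# Pearson correlation with the target on the full data and on the balanced subset.
idx = random_undersample(y, rng)
corr_full = np.corrcoef(num, y)[0, 1]
corr_balanced = np.corrcoef(num[idx], y[idx])[0, 1]
print(f"corr (full)={corr_full:.3f}, corr (undersampled)={corr_balanced:.3f}")
```

On the full data, the rare positive class contributes little variance to the target, which tends to shrink the correlation coefficient. Balancing the classes first makes the relationship between the feature and the target easier to see, at the cost of discarding majority-class rows.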
