In this notebook, I apply statistical methods to the analysis of imbalanced data. It starts with the basics: a null check, a data description, and handling of missing values. The numerical columns are right-skewed. Shapiro-Wilk and Anderson-Darling tests are applied to show that the data are not normally distributed. Outliers in the numerical columns are detected with the interquartile range (IQR). A chi-square test is applied to the categorical columns to test whether their distributions differ between the values of the target column. Correlation analysis for the imbalanced data set is carried out using undersampling methods.
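The outlier step can be illustrated with a minimal sketch of the usual Tukey rule, which flags values more than 1.5 IQR beyond the quartiles. This is not the notebook's own code: the function name `iqr_outliers`, the multiplier `k=1.5`, and the sample values are assumptions made for illustration, and plain Python is used so the example runs without extra libraries.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) under the Tukey IQR rule.

    `k` is the fence multiplier; 1.5 is the conventional choice and is an
    assumption here, not a value taken from the notebook.
    """
    # Quartiles with linear interpolation between neighbouring data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Anything outside the fences is reported as an outlier.
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical right-skewed sample: one large value in a long right tail.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
lower, upper, outliers = iqr_outliers(sample)
print(f"fences: [{lower}, {upper}], outliers: {outliers}")
```

With right-skewed columns, most flagged points tend to fall above the upper fence, so it is worth inspecting them before dropping anything, especially when the minority class is small.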
e181337/data_analysis